Oracle® Retail Data Model Implementation and Operations Guide
Release 11.3.1

Part Number E20363-02

4 ETL Implementation and Customization

This chapter discusses the ETL (Extraction, Transformation and Loading) procedures you use to populate an Oracle Retail Data Model warehouse. It includes the following topics:

The Role of ETL in the Oracle Retail Data Model

Figure 2-1 shows the three layers in the Oracle Retail Data Model warehouse environment: the optional staging layer, the foundation layer, and the access layer. As shown in Figure 4-1, you use two types of ETL (extraction, transformation and loading) to populate these layers:

Figure 4-1 ETL Flow Diagram


Creating Source-ETL for Oracle Retail Data Model

ETL that populates the staging layer or the foundation layer of an Oracle Retail Data Model warehouse with data from a transactional system is known as source-ETL.

Because of the large number of available transactional applications and the multiple versions that may be in use, source-ETL is not provided with Oracle Retail Data Model. You must write your own source-ETL scripts using Oracle Warehouse Builder or another ETL or mapping tool.

Oracle By Example:

See the following OBE tutorials for more information on Oracle Warehouse Builder:
  • "Setting Up the Oracle Warehouse Builder 11g Release 2 Environment"

  • "Improved User Interface, Usability, and Productivity With OWB 11g"

  • "Using Data Transformation Operators with Source and Target Operators"

  • "Working with Pluggable Mappings"

  • "Examining Source Data Using Data Profiling with Database 11g Release 2"

To access the tutorials, open the Oracle Learning Library in your browser by following the instructions in "Oracle Technology Network", and then search for the tutorials by name.

The following topics provide general information about writing source-ETL:

Source-ETL Design Considerations

Keep the following points in mind when designing and writing source-ETL for Oracle Retail Data Model:

  • You can populate the calendar data using the calendar population scripts provided with Oracle Retail Data Model. (The calendar population scripts populate data for the business and Gregorian calendars; other calendars must be populated using source-ETL.) See Oracle Retail Data Model Reference for more information on the Oracle Retail Data Model calendar population scripts.

  • Populate the tables in the following order:

    1. Lookup tables

    2. Reference tables

    3. Base tables

  • Analyze the tables in one category before loading the tables in the next category (for example, analyze the lookup tables before loading the reference tables). Additionally, you must analyze all of the tables loaded by the source-ETL process before executing the intra-ETL processes.

    See:

    The topic about analyzing tables, indexes and clusters in Oracle Database Administrator's Guide.

ETL Architecture for Oracle Retail Data Model Source-ETL

ETL typically extracts data from the transactional system, checks for data quality, cleanses the data and ensures consistency of terms, currency, units of measures, and so on, as it consolidates and populates the physical objects in the data warehouse with 'clean' data.

The fundamental services upon which data acquisition is constructed are as follows:

  • Data sourcing

  • Data movement

  • Data transformation

  • Data loading

From a logical architecture perspective, there are many different ways to configure these building blocks to deliver data acquisition services. The major architectural styles, which cover a range of options that can be targeted within a data warehousing architecture, include:

  • Batch Extract, Transform, and Load and Batch Extract, Load, Transform, Load

    Batch Extract, Transform and Load (ETL) and Batch Extract, Load, Transform, Load (ELTL) are the traditional architectures in a data warehouse implementation. The difference between them is where the transformation is performed: outside or inside the database.

  • Batch Hybrid Extract, Transform, Load, Transform, Load

    Batch Hybrid Extract, Transform, Load, Transform, Load (ETLTL) is a hybrid strategy. This strategy provides the most flexibility to remove hand coding approaches to transformation design, apply a metadata-driven approach, and still be able to leverage the data processing capabilities of the enterprise warehouse. In this targeted design, the transformation processing is first performed outside the warehouse as a pre-processing step before loading the staging tables, and then further transformation processing is performed within the data warehouse before the final load into the target tables.

  • Real-time Extract, Transform, Load

    Real-time Extract, Transform, Load (rETL) is appropriate when service levels for data freshness demand more up-to-date information in the data warehousing environment. In this approach, the transactional system must actively publish events of interest so that the rETL processes can extract them from a message bus (queue) on a timely basis. A message-based paradigm is used with publish and subscribe message bus structures or point-to-point messaging with reliable queues.

When designing source-ETL for Oracle Retail Data Model, use the architecture that best meets your business needs.

Creating a Source to Target Mapping Document for the Source-ETL

Before you begin building your extract systems, create a logical data interface document that maps the relationship between original source columns and target destination columns in the tables. This document ties the very beginning of the ETL system to the very end.

Columns in the data mapping document are sometimes combined. For example, the source database, table name, and column name could be combined into a single target column. The information within the concatenated column would be delimited with a period. Regardless of the format, the content of the logical data mapping document has been proven to be the critical element required to sufficiently plan ETL processes.

Designing a Plan for Rectifying Source-ETL Data Quality Problems

Data cleaning consists of all the steps required to clean and validate the data feeding a table and to apply known business rules to make the data consistent. The perspectives of the cleaning and conforming steps are less about the upside potential of the data and more about containment and control.

If there are data quality problems, then build a plan, in agreement with IT and business users, for how to rectify these problems.

Answer the following questions:

  • Is data missing?

  • Is the data wrong or inconsistent?

  • Should the problem be fixed in the source systems?

Set up the following processes and programs:

  • Data quality measurement process.

  • Data quality reporting and action program and people responsibility.

Designing Source-ETL Workflow and Jobs Control

All data movement among ETL processes is composed of jobs. An ETL workflow executes these jobs in the proper sequence and with respect to the necessary dependencies. General ETL tools, such as Oracle Warehouse Builder, support this kind of workflow, job design, and execution control.

The following list includes tips for when you design ETL jobs and workflow:

  • Use a common structure across all jobs (source system to transformer to target data warehouse).

  • Have a one-to-one mapping from source to target.

  • Define one job per source table.

  • Apply a generic job structure and template jobs to allow for rapid development and consistency.

  • Use an optimized job design to leverage Oracle load performance based on data volumes.

  • Design parameterized jobs to allow for greater control over job performance and behavior.

  • Maximize parallel execution of jobs.

Designing Source-ETL Exception Handling

Your ETL tool or your developed mapping scripts generate status and error handling tables. All ETL procedures log status and errors into a log table. Execution status may be monitored using an ETL tool or by examining the log table.
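
As a hedged illustration, the following sketch shows one shape such a log table and logging routine might take; the names src_etl_log and log_src_etl_status are hypothetical and are not objects delivered with Oracle Retail Data Model.

-- Hypothetical status/error log table for source-ETL jobs.
CREATE SEQUENCE src_etl_log_seq;

CREATE TABLE src_etl_log (
  log_key    NUMBER        PRIMARY KEY,
  job_name   VARCHAR2(100) NOT NULL,
  status     VARCHAR2(20),            -- for example RUNNING, SUCCESS, ERROR
  error_msg  VARCHAR2(4000),
  log_time   TIMESTAMP     DEFAULT SYSTIMESTAMP
);

-- Hypothetical logging routine; the autonomous transaction keeps the log
-- row even if the calling ETL job rolls back.
CREATE OR REPLACE PROCEDURE log_src_etl_status (
  p_job_name  IN VARCHAR2,
  p_status    IN VARCHAR2,
  p_error_msg IN VARCHAR2 DEFAULT NULL
) AS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO src_etl_log (log_key, job_name, status, error_msg)
  VALUES (src_etl_log_seq.NEXTVAL, p_job_name, p_status, p_error_msg);
  COMMIT;
END;
/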

Writing Source-ETL that Loads Efficiently

Whether you are developing mapping scripts and loading into a staging layer or directly into the foundation layer, the goal is to get the data into the warehouse in the most expedient manner. To achieve good performance during the load, begin by focusing on where the data to be loaded resides and how you load it into the database. For example, do not use a serial database link or a single JDBC connection to move large volumes of data. The most common and preferred mechanism for loading large volumes of data is loading from flat files.

The following topics discuss best practices for ensuring your source-ETL loads efficiently:

Using a Staging Area for Flat Files

The area where flat files are stored before being loaded into the staging layer of a data warehouse system is commonly known as a staging area. The overall speed of your load is determined by:

  • How quickly the raw data can be read from the staging area.

  • How quickly the raw data can be processed and inserted into the database.

Recommendations: Using a Staging Area

Stage the raw data across as many physical disks as possible to ensure that reading it is not a bottleneck during the load.

Also, if you are using the Oracle Exadata Database Machine, the best place to stage the data is in an Oracle Database File System (DBFS) stored on the Exadata storage cells. DBFS creates a mountable cluster file system that you can use to access files stored in the database. Create the DBFS in a separate database on the Database Machine. This allows the DBFS to be managed and maintained separately from the data warehouse.

Mount the file system using the DIRECT_IO option to avoid thrashing the system page cache while moving the raw data files in and out of the file system.

See:

Oracle Database SecureFiles and Large Objects Developer's Guide for more information on setting up DBFS.

Preparing Raw Data Files for Source-ETL

To parallelize the data load, Oracle Database must be able to logically break up the raw data files into chunks, known as granules. To ensure balanced parallel processing, the number of granules is typically much higher than the number of parallel server processes. At any given point in time, a parallel server process is allocated one granule to work on. After a parallel server process completes working on its granule, another granule is allocated until all of the granules are processed and the data is loaded.

Recommendations: Preparing Raw Data Files for Source-ETL

Follow these recommendations:

  • Delimit each row using a known character, such as a new line or a semicolon. This ensures that Oracle can look inside the raw data file and determine where each row of data begins and ends, so that multiple granules can be created within a single file.

  • If a file is not position-able and seek-able (for example, the file is compressed or is a zip file), then the file cannot be broken up into granules and the entire file is treated as a single granule. In this case, only one parallel server process can work on the entire file. To parallelize the loading of compressed data files, use multiple compressed data files. The number of compressed data files used determines the maximum parallel degree used by the load.

  • When loading multiple data files (compressed or uncompressed):

    • Use a single external table, if at all possible

    • Make the files similar in size

    • Make the size of the files a multiple of 10 MB

  • If you must have files of different sizes, list the files from largest to smallest.

  • By default, Oracle assumes that the flat file has the same character set as the database. If this is not the case, specify the character set of the flat file in the external table definition to ensure the proper character set conversions can take place (see the sketch after this list).
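
A minimal sketch of an external table that follows these recommendations is shown below. The directory name, file names, columns, and the WE8ISO8859P1 character set are illustrative assumptions only; they are not objects or settings delivered with Oracle Retail Data Model.

-- Hypothetical staging directory and external table over several
-- similarly sized flat files (largest listed first).
CREATE OR REPLACE DIRECTORY src_stage_dir AS '/u01/etl/stage';

CREATE TABLE sales_ext (
  sale_day     VARCHAR2(10),
  store_id     NUMBER,
  item_id      NUMBER,
  sale_amount  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY src_stage_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE CHARACTERSET WE8ISO8859P1
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('sales_1.dat', 'sales_2.dat', 'sales_3.dat')
)
REJECT LIMIT UNLIMITED
PARALLEL;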

Source-ETL Data Loading Options

Oracle offers several data loading options:

  • External table or SQL*Loader

  • Oracle Data Pump (import and export)

  • Change Data Capture and Trickle feed mechanisms (such as Oracle GoldenGate)

  • Oracle Database Gateways to open systems and mainframes

  • Generic Connectivity (ODBC and JDBC)

The approach that you take depends on the source and format of the data you receive.

Recommendations: Loading Flat Files

If you are loading from files into Oracle Database you have two options: SQL*Loader or external tables.

Using external tables offers the following advantages:

  • Allows transparent parallelization inside the database. You can avoid staging data and apply transformations directly on the file data using arbitrary SQL or PL/SQL constructs when accessing external tables. SQL*Loader requires you to load the data as-is into the database first.

  • Parallelizing loads with external tables enables more efficient space management compared to SQL*Loader, where each individual parallel loader is an independent database session with its own transaction. For highly partitioned tables this could potentially lead to a lot of wasted space.

Create an external table using the standard CREATE TABLE statement. However, to load from flat files the statement must include information about where the flat files reside outside the database. The most common approach when loading data from an external table is to issue a CREATE TABLE AS SELECT (CTAS) statement or an INSERT AS SELECT (IAS) statement into an existing table.
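
The following sketch shows a CTAS load from the hypothetical sales_ext external table defined in the earlier sketch; an arbitrary transformation (TO_DATE on the character column) is applied directly to the file data during the load. The table and column names are assumptions for illustration only.

-- CTAS load: creates and populates a staging table from the external table,
-- applying a transformation as the file data is read.
CREATE TABLE stg_sales
AS
SELECT TO_DATE(sale_day, 'YYYY-MM-DD') AS sale_day,
       store_id,
       item_id,
       sale_amount
FROM   sales_ext;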

Parallel Direct Path Load Source-ETL

A direct path load parses the input data according to the description given in the external table definition, converts the data for each input field to its corresponding Oracle Database data type, then builds a column array structure for the data. These column array structures are used to format Oracle data blocks and build index keys. The newly formatted database blocks are then written directly to the database, bypassing the standard SQL processing engine and the database buffer cache.

The key to good load performance is to use direct path loads wherever possible:

  • A CREATE TABLE AS SELECT (CTAS) statement always uses direct path load.

  • A simple INSERT AS SELECT (IAS) statement does not use direct path load. To achieve a direct path load with an IAS statement, you must add the APPEND hint to the command.

Direct path loads can also run in parallel. To set the parallel degree for a direct path load, either:

  • Add the PARALLEL hint to the CTAS statement or an IAS statement.

  • Set the PARALLEL clause on both the external table and the table into which the data is loaded.

    After the parallel degree is set:

    • A CTAS statement automatically performs a direct path load in parallel.

    • An IAS statement does not automatically perform a direct path load in parallel. To enable an IAS statement to perform direct path load in parallel, you must alter the session to enable parallel DML by executing the following statement.

      alter session enable parallel DML;
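
The statements above can be combined into a short sketch. The names reuse the hypothetical sales_ext and stg_sales objects from the earlier sketches, and the degree of parallelism (8) is an arbitrary assumption.

-- Enable parallel DML in the session so that the IAS below can use a
-- parallel direct path load.
ALTER SESSION ENABLE PARALLEL DML;

-- APPEND requests a direct path load; the PARALLEL hints request parallel
-- execution for both the insert and the read of the external table.
INSERT /*+ APPEND PARALLEL(stg_sales, 8) */ INTO stg_sales
SELECT /*+ PARALLEL(sales_ext, 8) */
       TO_DATE(sale_day, 'YYYY-MM-DD'), store_id, item_id, sale_amount
FROM   sales_ext;
COMMIT;

-- Alternatively, set the PARALLEL attribute on both tables; a CTAS against
-- the same source then performs a parallel direct path load automatically.
ALTER TABLE sales_ext PARALLEL 8;
ALTER TABLE stg_sales PARALLEL 8;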
      

Partition Exchange Load for Oracle Retail Data Model Source-ETL

A benefit of partitioning is the ability to load data quickly and easily, with minimal impact on the business users, by using the EXCHANGE PARTITION command. The EXCHANGE PARTITION command enables you to swap the data in a nonpartitioned table into a particular partition in your partitioned table. The EXCHANGE PARTITION command does not physically move data; instead, it updates the data dictionary to exchange a pointer from the partition to the table and vice versa.

Because there is no physical movement of data, an exchange does not generate redo and undo. In other words, an exchange is a sub-second operation and far less likely to impact performance than any traditional data-movement approaches such as INSERT.

Recommendations: Partitioning Tables

Partition the larger tables and fact tables in the Oracle Retail Data Model warehouse.

Example 4-1 Using Exchange Partition Statement with a Partitioned Table

Assume that there is a large table called Sales, which is range partitioned by day. At the end of each business day, data from the online sales system is loaded into the Sales table in the warehouse.

The following steps ensure the daily data gets loaded into the correct partition with minimal impact to the business users of the data warehouse and optimal speed:

  1. Create external table for the flat file data coming from the online system

  2. Using a CTAS statement, create a nonpartitioned table called tmp_sales that has the same column structure as Sales table

  3. Build any indexes that are on the Sales table on the tmp_sales table

  4. Issue the EXCHANGE PARTITION command.

    Alter table Sales exchange partition p2 with
        table tmp_sales including indexes without validation;
    
  5. Gather optimizer statistics on the newly exchanged partition using incremental statistics.

The EXCHANGE PARTITION command in this example swaps the definitions of the named partition and the tmp_sales table, so the data instantaneously exists in the right place in the partitioned table. Moreover, with the inclusion of the INCLUDING INDEXES and WITHOUT VALIDATION clauses, Oracle swaps the index definitions and does not check whether the data actually belongs in the partition, so the exchange is very quick.

Note:

The assumption being made in this example is that the data integrity was verified at data extraction time. If you are unsure about the data integrity, omit the WITHOUT VALIDATION clause so that the database checks the validity of the data.
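
The steps in this example can be sketched end to end as follows. The work table and index names, and the external table sales_ext_day, are hypothetical; the statistics call shows one common approach (incremental statistics through DBMS_STATS) rather than a procedure prescribed by this guide.

-- Step 2: build a nonpartitioned work table with the same structure as Sales.
CREATE TABLE tmp_sales
AS
SELECT * FROM sales_ext_day;    -- hypothetical external table over the daily file

-- Step 3: build the same indexes that exist on the Sales table.
CREATE INDEX tmp_sales_item_idx ON tmp_sales (item_key);

-- Step 4: swap the loaded table into the target partition.
ALTER TABLE sales
  EXCHANGE PARTITION p2 WITH TABLE tmp_sales
  INCLUDING INDEXES WITHOUT VALIDATION;

-- Step 5: with incremental statistics enabled, gathering statistics on the
-- table refreshes only the statistics for the exchanged partition.
BEGIN
  DBMS_STATS.SET_TABLE_PREFS(USER, 'SALES', 'INCREMENTAL', 'TRUE');
  DBMS_STATS.GATHER_TABLE_STATS(USER, 'SALES', granularity => 'AUTO');
END;
/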

Customizing Intra-ETL for the Oracle Retail Data Model

The Oracle Retail Data Model supports the use of ETL tools such as Oracle Warehouse Builder to define the workflow to execute the intra-ETL process. You can, of course, write your own Intra-ETL. However, an intra-ETL component is delivered with Oracle Retail Data Model that is a process flow designed using the Oracle Warehouse Builder Workflow component. This process flow is named ORDM_INTRA_ETL_FLW.

As shown in Figure 4-2, the ORDM_INTRA_ETL_FLW process flow uses the data in the Oracle Retail Data Model base, reference, and lookup tables to populate all of the other Oracle Retail Data Model structures. Within this package the dependency of each individual program is implemented and enforced so that each program executes in the proper order.

Figure 4-2 ORDM Main Intra-ETL Process Flow


You can change the original intra-ETL script for your specific requirements. However, perform a complete impact analysis before you make the change. Package the changes as a patch to the original Oracle Retail Data Model intra-ETL mapping.

The ORDM_INTRA_ETL_FLW process flow consists of the following sub-processes; it includes the dependencies among the individual sub-process flows and executes them in the proper order:

ORDM_DERIVED_FLW

The ORDM_DERIVED_FLW sub-process flow contains all the PL/SQL package code for populating derived tables, based on the content of the base, reference, and lookup tables.

Figure 4-3 shows the ORDM_DERIVED_FLW sub-process flow for populating derived tables.

Figure 4-3 Intra-ETL Derived Process Flow


After the ORDM_DERIVED_FLW starts successfully, it moves to the fork. The FORK activity starts the derived-population ETL programs, which run in parallel. For each activity, a record is inserted into a control table, its state is set to RUNNING, and the respective ETL program is executed. Once the ETL program completes successfully, the control table record that was inserted before execution is updated with COMPLETED-SUCCESS status; otherwise, it is updated with COMPLETED-ERROR status.

The AND activity waits for all the parallel activities to run to completion. The flow then switches to the next activity, for example END_SUCCESS.

This sub-process uses the following technologies:

  • Table: Whenever the ETL package is executed, data is inserted into a derived table based on the values of the ETL parameters in the DWC_ETL_PARAMETER control table.

ORDM_AGG_N_DEP_FLW

For each activity, the ORDM_AGG_N_DEP_FLW sub-process flow invokes a PL/SQL procedure that refreshes materialized views. The activities in the sub-process flow are all independent and can therefore run in parallel. This sub-process flow depends on the ORDM_DERIVED_FLW sub-process; that is, ORDM_AGG_N_DEP_FLW is executed only after ORDM_DERIVED_FLW completes successfully.

Figure 4-4 shows the ORDM_AGG_N_DEP_FLW sub-process flow for refreshing all independent materialized views.

Figure 4-4 Intra-ETL Independent MV Process Flow


After the ORDM_AGG_N_DEP_FLW is initiated and starts successfully, the flow moves to the fork. The FORK process makes the aggregates run in parallel. The AND activity specifies that all the parallel aggregates must complete; the flow then switches over to the next activity (for example, END_SUCCESS).

This sub-process uses the following technologies:

  • Materialized View: A materialized view holds the aggregation data. Whenever it is refreshed, the modified data is reflected in the corresponding aggregate table, which leads to a significant increase in query execution performance. Moreover, the use of materialized views allows Oracle Retail Data Model to take advantage of the Oracle query rewrite feature for better SQL optimization and, hence, improved performance.

  • FAST Refresh: This refresh type refreshes the aggregates with only the incremental data (inserted and modified rows) in the base and derived tables since the immediately previous refresh. This incremental refresh leads to much better performance and, hence, a shorter intra-ETL window (a refresh sketch follows this list).
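
As a hedged illustration, a single aggregate materialized view can be fast-refreshed with DBMS_MVIEW; the view name below is a placeholder, not an actual Oracle Retail Data Model aggregate (the delivered names are listed in Oracle Retail Data Model Reference).

-- Fast (incremental) refresh of one aggregate materialized view.
BEGIN
  DBMS_MVIEW.REFRESH(list => 'DWA_SAMPLE_AGGR_MV', method => 'F');
END;
/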

ORDM_AGG_DEP_FLW

For each activity, the ORDM_AGG_DEP_FLW sub-process flow invokes a PL/SQL procedure that refreshes materialized views. The activities in the sub-process flow have dependencies among them. This sub-process flow depends on the ORDM_DERIVED_FLW sub-process; that is, ORDM_AGG_DEP_FLW is executed only after ORDM_DERIVED_FLW runs successfully.

Figure 4-5 shows the ORDM_AGG_DEP_FLW sub-process flow for refreshing all dependent materialized views.

Figure 4-5 Intra-ETL Aggregate Process Flow


After the ORDM_AGG_DEP_FLW is initiated and starts successfully, the process flow moves to the fork. The FORK process makes the aggregates run in parallel. The AND activity specifies that all the parallel aggregates must complete; the flow then switches over to the next activity (for example, END_SUCCESS).

This sub-process uses the following technologies:

  • Materialized View: A materialized view holds the aggregation data. Whenever it is refreshed, the modified data is reflected in the corresponding aggregate table, which leads to a significant increase in query execution performance. Moreover, the use of materialized views allows Oracle Retail Data Model to take advantage of the Oracle query rewrite feature for better SQL optimization and, hence, improved performance.

  • FAST Refresh: This refresh type refreshes the aggregates with only the incremental data (inserted and modified rows) in the base and derived tables since the immediately previous refresh. This incremental refresh leads to much better performance and, hence, a shorter intra-ETL window.

OLAP_MAP Mapping Flow

The OLAP_MAP mapping invokes the PKG_ORDM_OLAP_ETL_AW_LOAD.OLAP_ETL_AW_BUILD function of the OLAP ETL package, which loads data from Oracle Retail Data Model reference and derived tables into Oracle Retail Data Model analytic workspace dimensions and cubes, respectively, and calculates the forecast data. It reads the OLAP ETL parameters from the DWC_OLAP_ETL_PARAMETER table.

Figure 4-6 shows the OLAP_MAP mapping that invokes the OLAP ETL package.

Figure 4-6 OLAP Map Process Flow


ORDM_MNNG_FLW

The mining process flow, ORDM_MNNG_FLW, first refreshes the mining source materialized views and then refreshes the mining models.

Figure 4-7 shows the mining process flow, ORDM_MNNG_FLW.

Figure 4-7 Mining Flow Process


Performing an Initial Load of an Oracle Retail Data Model Warehouse

Performing an initial load of an Oracle Retail Data Model is a multistep process:

  1. Load the reference, lookup, and base tables of the Oracle Retail Data Model warehouse by executing the source-ETL that you have written using the guidelines given in "Creating Source-ETL for Oracle Retail Data Model".

  2. Load the remaining structures in the Oracle Retail Data Model, by taking the following steps:

    1. Update the parameters in DWC_ETL_PARAMETER control table in the ordm_sys schema so that the ETL can use this information (that is, the beginning and end date of the ETL period) when loading the derived and aggregate tables and views.

      For an initial load of an Oracle Retail Data Model warehouse, specify the values shown in the following table.

      Column           Value
      PROCESS_NAME     'ORDM-INTRA-ETL'
      FROM_DATE_ETL    The beginning date of the ETL period.
      TO_DATE_ETL      The ending date of the ETL period.

      See:

      Oracle Retail Data Model Reference for more information on the DWC_ETL_PARAMETER control table.
    2. Update the Oracle Retail Data Model OLAP ETL parameters in DWC_OLAP_ETL_PARAMETER control table in the ordm_sys schema to specify the build method and other build characteristics so that the ETL can use this information when loading the OLAP cube data.

      For an initial load of the analytic workspace, specify values following the guidelines in Table 4-1 (a combined update sketch for both control tables appears after these steps).

      Table 4-1 Explanation of Load Parameters in DWC_OLAP_ETL_PARAMETER Along with (typical) Initial Load Values

      Column Name Value

      BUILD_METHOD

      Use the build method parameter to indicate a full or a fast (partial) refresh. The following are the possible values for BUILD_METHOD:

      • C: Complete refresh clears all dimension values before loading. (Default value).

      • F: Fast refresh of a cube materialized view, which performs an incremental refresh and re-aggregation of only changed rows in the source table.

      • ?: Fast refresh if possible, and otherwise a complete refresh.

      • P: Recomputes rows in a cube materialized view that are affected by changed partitions in the detail tables.

      • S: Fast solve of a compressed cube. A fast solve reloads all the detail data and re-aggregates only the changed values.

      Note:

      In a fast refresh, only changed rows are inserted in the cube and the affected areas of the cube are re-aggregated.

      The C, S, and ? methods always succeed and can be used on any cube.

      The F and P methods require that the cube have a materialized view that was created as a fast or a rewrite materialized view.

      For initial load, specify C (complete refresh).

      BUILD_METHOD_TYPE

      HISTORICAL or INCREMENTAL, indicating whether this is an initial load of the OLAP analytic workspace or an incremental load.

      For initial load, specify HISTORICAL.

      CALC_FCST

      One of the following values depending on whether you calculate forecast cubes:

      • Y specifies calculate forecast cubes.

      • N specifies do not calculate forecast cubes.

      For initial load, specify Y.

      CUBENAME

      One of the following values that specifies the cubes you build:

      • ALL specifies a build of the cubes in the Oracle Retail Data Model analytic workspace.

      • cubename[[|cubename]...] specifies one or more cubes to build.

      For initial load, specify ALL.

      FCST_MTHD

      If the value for the CALC_FCST column is Y, then specify AUTO; otherwise, specify NULL. Another valid value is MANUAL, which sets the forecasting approach to APPMANUAL instead of APPAUTO (APPAUTO and APPMANUAL are internal terms used by the Oracle OLAP forecasting command). This parameter is ignored if the CALC_FCST column is N.

      For initial load, specify AUTO.

      FCST_ST_MO

      If the value for the CALC_FCST column is Y, then specify a value in the form BY YYYY MX, which is the "end business month" of the historical period; otherwise, specify NULL. This parameter is ignored if the CALC_FCST column is N. X is the month number in a year.

      For example:

      BY 2011 M7, or BY 2011 M11

      For initial load, specify BY 2012 M1

      HIST_ST_MO

      If the value for the CALC_FCST column is Y, then specify a value in the form BY YYYY MX, which is the "start business month" of the historical data; otherwise, specify NULL. This parameter is ignored if the CALC_FCST column is N. X is the month number in a year.

      For example: BY 2011 M7, or BY 2011 M11

      For initial load, specify BY 2010 M1

      MAXJOBQUEUES

      A decimal value that specifies the number of parallel processes to allocate to this job. (Default value is 4.) The value that you specify varies depending on the setting of the JOB_QUEUE_PROCESSES database initialization parameter.

      For initial load, specify 4

      NO_FCST_YRS

      If the value for the CALC_FCST column is Y, specify a decimal value for how many years of forecast data to calculate; otherwise, specify NULL. This parameter is ignored if the CALC_FCST column is N.

      For initial load, specify 2

      OTHER1

      Not used. Specify NULL.

      OTHER2

      Not used. Specify NULL.

      PROCESS_NAME

      'ORDM-OLAP-ETL'


    3. Execute the intra-ETL as described in "Executing the Default Oracle Retail Data Model Intra-ETL".
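
The two parameter updates in step 2 above can be performed with ordinary UPDATE statements against the control tables. The following is a minimal sketch using the typical initial-load values from Table 4-1; the column datatypes and literal formats are assumptions, so check the control-table definitions in Oracle Retail Data Model Reference before using it.

-- Step 2.a: set the ETL period for the intra-ETL (dates are illustrative).
UPDATE dwc_etl_parameter
SET    from_date_etl = DATE '2010-01-01',
       to_date_etl   = DATE '2012-01-31'
WHERE  process_name  = 'ORDM-INTRA-ETL';

-- Step 2.b: typical initial-load values for the OLAP ETL parameters.
UPDATE dwc_olap_etl_parameter
SET    build_method      = 'C',
       build_method_type = 'HISTORICAL',
       calc_fcst         = 'Y',
       cubename          = 'ALL',
       fcst_mthd         = 'AUTO',
       fcst_st_mo        = 'BY 2012 M1',
       hist_st_mo        = 'BY 2010 M1',
       no_fcst_yrs       = 2,
       maxjobqueues      = 4
WHERE  process_name      = 'ORDM-OLAP-ETL';

COMMIT;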

Executing the Default Oracle Retail Data Model Intra-ETL

Oracle Retail Data Model provides you with a database package named PKG_INTRA_ETL_PROCESS that is a complete Intra-ETL process. This intra-ETL process is composed of individual population programs (database packages and MV refresh scripts). This package includes the dependency for each individual program and executes the programs in the proper order.

You can execute the intra-ETL packages provided with Oracle Retail Data Model in the following ways.

Executing the ORDM_INTRA_ETL_FLW Workflow from Oracle Warehouse Builder

You can execute the ORDM_INTRA_ETL_FLW process from within Oracle Warehouse Builder.

To deploy the ORDM_INTRA_ETL_FLW process flow, take the following steps: 

  1. Confirm that Oracle Warehouse Builder Workflow has been installed as described in Oracle Retail Data Model Installation Guide.

  2. Within Oracle Warehouse Builder, go to the Control Center Manager.

  3. Select OLAP_PFLW, then select AGR_PFLW, then select the main process flow ORDM_INTRA_ETL_FLW.

  4. Right-click ORDM_INTRA_ETL_FLW and select Set Action.

    • If this is the first deployment, set action to Create.

    • If this is a later deployment, set action to Replace.

    Deploy the process flow.

After the deployment finishes successfully, ORDM_INTRA_ETL_FLW is ready to execute.

See:

Oracle Warehouse Builder Sources and Targets Guide for information about Oracle Warehouse Builder.

Executing the Intra-ETL Without Using Oracle Warehouse Builder

You do not have to execute the Intra-ETL as a workflow in Oracle Warehouse Builder. You can, instead, execute it as follows:

Executing the Intra-ETL by Using the PKG_INTRA_ETL_PROCESS.RUN Procedure

You can use the PKG_INTRA_ETL_PROCESS.RUN procedure to start the Intra-ETL process. This procedure can be invoked manually, or by another process such as Source-ETL, or according to a predefined schedule such as Oracle Job Scheduling.

The database package PKG_INTRA_ETL_PROCESS is a complete Intra-ETL process composed of individual population programs (database packages and MV refresh scripts). This package includes dependency for each individual program and executes the programs in the proper order.

PKG_INTRA_ETL_PROCESS.RUN does not accept parameters. This procedure calls the other programs in the correct order to load the data for the current day (according to the Oracle system date). The PKG_INTRA_ETL_PROCESS.RUN procedure uses the DWC_ control tables to track the loading progress and results.
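
As a hedged illustration, the procedure can be invoked from SQL*Plus or scheduled with Oracle Scheduler; the job name and schedule below are assumptions, not objects delivered with Oracle Retail Data Model.

-- Manual invocation of the delivered intra-ETL process.
BEGIN
  pkg_intra_etl_process.RUN;
END;
/

-- Optional: schedule a nightly run with DBMS_SCHEDULER (sketch only).
BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'ORDM_INTRA_ETL_NIGHTLY',
    job_type        => 'PLSQL_BLOCK',
    job_action      => 'BEGIN pkg_intra_etl_process.RUN; END;',
    repeat_interval => 'FREQ=DAILY;BYHOUR=2',
    enabled         => TRUE);
END;
/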

Refreshing the Data in Oracle Retail Data Model Warehouse

The section, "Performing an Initial Load of an Oracle Retail Data Model Warehouse" describes how to perform an initial load of an Oracle Retail Data Model data warehouse. After this initial load, you must load new data into your Oracle Retail Data Model data warehouse regularly so that it can serve its purpose of facilitating business analysis.

To load new data into your Oracle Retail Data Model warehouse, you extract the data from one or more operational systems and copy that data into the warehouse. The challenge in data warehouse environments is to integrate, rearrange and consolidate large volumes of data over many systems, thereby providing a new unified information base for business intelligence.

The successive loads and transformations must be scheduled and processed in a specific order that is determined by your business needs. Depending on the success or failure of the operation or parts of it, the result must be tracked and subsequent, alternative processes might be started.

You can do a full incremental load of the relational tables and views, OLAP cubes, and data mining models simultaneously, or you can refresh the data sequentially, as follows:

  1. Refreshing Oracle Retail Data Model Relational Tables and Views

  2. Refreshing Oracle Retail Data Model OLAP Cubes

  3. Refreshing Oracle Retail Data Model Data Mining Models

In either case, you can manage errors during the execution of the intra-ETL as described in "Managing Errors During Oracle Retail Data Model Intra-ETL Execution".

Refreshing Oracle Retail Data Model Relational Tables and Views

Refreshing the relational tables and views in an Oracle Retail Data Model is a multi-step process:

  1. Refresh the reference, lookup, and base tables in the Oracle Retail Data Model warehouse with transactional data by executing the source-ETL that you have written.

  2. Update the parameters of the DWC_ETL_PARAMETER control table in the ordm_sys schema. For an incremental load of an Oracle Retail Data Model warehouse, specify the values shown in the following table (that is, the beginning and end date of the ETL period).

    Column           Value
    PROCESS_NAME     'ORDM-INTRA-ETL'
    FROM_DATE_ETL    The beginning date of the ETL period.
    TO_DATE_ETL      The ending date of the ETL period.

    See:

    Oracle Retail Data Model Reference for more information on the DWC_ETL_PARAMETER control table.
  3. Refresh the derived tables and aggregate tables (which are materialized views) in Oracle Retail Data Model by executing the DRVD_FLOW and AGGR_FLOW subprocesses of the ORDM_INTRA_ETL_FLW process flow. See "Executing the Default Oracle Retail Data Model Intra-ETL" for more information.

See also:

Oracle Warehouse Builder Sources and Targets Guide

Refreshing Oracle Retail Data Model OLAP Cubes

On a scheduled basis you must update the OLAP cube data with the relational data that has been added to the Oracle Retail Data Model data warehouse since the initial load of the OLAP cubes.

You can execute the Oracle Retail Data Model ETL to update the OLAP cubes in the following ways:

  • Refresh all of the data in the warehouse by executing the Oracle Warehouse Builder Workflow ORDM_INTRA_ETL_FLW in one of the ways that are described in "Executing the Default Oracle Retail Data Model Intra-ETL".

    The OLAP cubes are populated through OLAP_MAP, which is part of the Oracle Retail Data Model intra-ETL main workflow, ORDM_INTRA_ETL_FLW.

  • Refresh only the OLAP cube data by executing the OLAP_MAP Oracle Warehouse Builder mapping in the Oracle Warehouse Builder control center.

    Note:

    You must refresh the corresponding materialized views of the OLAP cubes you are refreshing before you execute OLAP_MAP. (For the mapping between OLAP cubes and materialized views, refer to Oracle Retail Data Model Reference.)

Take these steps to perform an incremental load of the analytic workspace that is part of the Oracle Retail Data Model warehouse:

  1. Update the aggregate tables which are materialized views in Oracle Retail Data Model. See "Refreshing Oracle Retail Data Model Relational Tables and Views" for more information.

  2. Execute the intra-ETL to load the cube data in one of the ways described in "Executing the Default Oracle Retail Data Model Intra-ETL".

  3. If necessary, to recover from errors during the execution of OLAP_MAP, take the following steps.

    1. Change the value of the BUILD_METHOD column of the ordm_sys.DWC_OLAP_ETL_PARAMETER table to "C".

    2. In Oracle Warehouse Builder, rerun the OLAP_MAP map.
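
The first recovery step (changing BUILD_METHOD to "C") can be done with a simple update; the following is a sketch, with the column assumed to be character-typed.

UPDATE ordm_sys.dwc_olap_etl_parameter
SET    build_method = 'C'
WHERE  process_name = 'ORDM-OLAP-ETL';
COMMIT;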

Refreshing Oracle Retail Data Model Data Mining Models

The MINING_FLW sub-process flow of the ORDM_INTRA_ETL_FLW process flow triggers the data mining model refresh. After the initial load of the warehouse, it is recommended that you refresh the data mining models monthly. Refreshing the data mining models is integrated into the MINING_FLW sub-process flow. You can also refresh the data mining models manually.

The way you refresh a data mining model varies depending on whether you want to refresh all of the models or only one model:

  • To manually refresh all mining models, call the following procedure.

    PKG_ordm_mining.REFRESH_MODEL(MONTH_CODE, P_PROCESS_NO)
    

    This procedure performs the following tasks for each model:

    1. Refreshes the mining source materialized views based on the latest data from ordm_sys schema.

    2. Trains each model on the new training data.

    3. Applies each model onto the new apply data set.

  • To manually re-create only one mining model, you can call the corresponding procedure. For example, to re-create the employee combination model, you can call the following procedure.

    create_emp_cmbntn_glmr_model;
    

    "Tutorial: Customizing the Customer Life Time Value Prediction Data Mining Model" provides detailed instructions for refreshing a single data mining model.

Managing Errors During Oracle Retail Data Model Intra-ETL Execution

This topic discusses how you can identify and manage errors during intra-ETL execution. It contains the following topics:

Monitoring the Execution of the Intra-ETL Process

Two ordm_sys schema control tables, DWC_INTRA_ETL_PROCESS and DWC_INTRA_ETL_ACTIVITY, monitor the execution of the intra-ETL process. These tables are documented in Oracle Retail Data Model Reference.

Each normal run (as opposed to an error-recovery run) of a separate intra-ETL execution performs the following steps:

  1. Inserts a record into the DWC_INTRA_ETL_PROCESS table with a monotonically increasing system generated unique process key, SYSDATE as process start time, RUNNING as the process status, and an input date range in the FROM_DATE_ETL and TO_DATE_ETL columns.

  2. Invokes each of the individual intra-ETL programs in the appropriate order of dependency. Before the invocation of each program, the procedure inserts a record into the intra-ETL Activity detail table, DWC_INTRA_ETL_ACTIVITY, with values for:

    • ACTIVITY_KEY, a system generated unique activity key.

    • PROCESS_KEY, the process key value corresponding to the intra-ETL process.

    • ACTIVITY_NAME, an individual program name.

    • ACTIVITY_DESC, a suitable activity description.

    • ACTIVITY_START_TIME, the value of SYSDATE.

    • ACTIVITY_STATUS, the value of RUNNING.

  3. Updates the corresponding record in the DWC_INTRA_ETL_ACTIVITY table for the activity end time and activity status after the completion of each individual ETL program (either successfully or with errors). For successful completion of the activity, the procedure updates the status as 'COMPLETED-SUCCESS'. When an error occurs, the procedure updates the activity status as 'COMPLETED-ERROR', and also updates the corresponding error detail in the ERROR_DTL column.

  4. Updates the record corresponding to the process in the DWC_INTRA_ETL_PROCESS table for the process end time and status, after the completion of all individual intra-ETL programs. When all the individual programs succeed, the procedure updates the status to 'COMPLETED-SUCCESS'; otherwise, it updates the status to 'COMPLETED-ERROR'.

You can monitor the execution state of the intra-ETL, including current process progress and the time taken by individual programs or by the complete process, by viewing the contents of the DWC_INTRA_ETL_PROCESS and DWC_INTRA_ETL_ACTIVITY tables that correspond to the maximum process key. Monitoring can be done both during and after the execution of the intra-ETL procedure.
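
For example, a query along the following lines (a sketch using only the columns named above) shows the progress of the most recent run:

SELECT a.activity_name,
       a.activity_status,
       a.activity_start_time,
       a.error_dtl
FROM   dwc_intra_etl_activity a
WHERE  a.process_key = (SELECT MAX(process_key) FROM dwc_intra_etl_process)
ORDER  BY a.activity_start_time;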

Recovering an Intra ETL Process

To recover an intra-ETL process:

  1. Identify the errors by looking at the corresponding error details that are tracked against the individual programs in the DWC_INTRA_ETL_ACTIVITY table.

  2. Correct the causes of the errors.

  3. Re-invoke the intra-ETL process.

The ORDM_INTRA_ETL_FLW process identifies whether it is a normal run or a recovery run by referring to the DWC_INTRA_ETL_ACTIVITY table. During a recovery run, ORDM_INTRA_ETL_FLW executes only the necessary programs. For example, if a derived-population error occurred during the previous run, the recovery run executes only the individual derived-population programs that produced errors in the previous run. After their successful completion, the run executes the aggregate population programs and the materialized view refreshes in the appropriate order.

In this way, the intra-ETL error recovery is almost transparent, without involving the data warehouse or ETL administrator. The administrator must only correct the causes of the errors and re-invoke the intra-ETL process. The intra-ETL process identifies and executes the programs that generated errors.

Troubleshooting Intra-ETL Performance

To troubleshoot the performance of the intra-ETL:

Checking the Execution Plan

Use SQL Developer or another tool to view the package body of the code generated by Oracle Warehouse Builder.

For example, take the following steps to examine a map:

  1. Copy out the main query statement from the code viewer.

    Copy from "CURSOR "AGGREGATOR_c" IS …." to the end of the query, which is right above the next "CURSOR "AGGREGATOR_c$1" IS".

  2. In a SQL Developer worksheet, issue the following statement to turn on parallel DML:

    Alter session enable parallel dml;
    
  3. Paste the main query statement into another SQL Developer worksheet and view the execution plan by pressing F6.

    Carefully examine the execution plan to make sure the mapping runs according to a valid plan.

Monitoring PARALLEL DML Executions

Check that you run the mapping in parallel mode by executing the following SQL statement to count the executed "Parallel DML/Query" operations:

column name format a50
column value format 999,999
SELECT NAME, VALUE 
FROM GV$SYSSTAT
WHERE UPPER (NAME) LIKE '%PARALLEL OPERATIONS%'
  OR UPPER (NAME) LIKE '%PARALLELIZED%'
  OR UPPER (NAME) LIKE '%PX%'
;

If you run the mapping in parallel mode, you should see "DML statements parallelized" increase by 1 (one) every time the mapping is invoked. If you do not see this increase, then the mapping was not invoked as "parallel DML".

If you see "queries parallelized" increased by 1 (one) instead, then typically it means that the SELECT statement inside of the INSERT was parallelized, but that INSERT itself was not parallelized.

Troubleshooting Data Mining Model Creation

After the data mining models are created, check the error log in the ordm_sys.dwc_intra_etl_activity table. For example, execute the following code.

set line 160
col ACTIVITY_NAME format a30
col ACTIVITY_STATUS format a20
col error_dtl format a80
select activity_name, activity_status,  error_dtl from dwc_intra_etl_activity;

If all models are created successfully, the activity_status is "COMPLETED-SUCCESS" for every row. If the activity_status is "COMPLETED-ERROR" for a certain step, check the ERROR_DTL column and fix the problem accordingly.

The following examples illustrate how to troubleshoot some common error messages returned in ERROR_DTL and ACTIVITY_NAME when working with Oracle Retail Data Model:

Example 4-2 Troubleshooting an ORA-20991 Error for Oracle Retail Data Model

Assume that the returned error is ORA-20991: Message not available ... [Language=ZHS]CURRENT_MONTH_KEY.

This error may happen when there is not enough data in the DWR_BSNS_MO table. For example, if the calendar data is populated with 2004~2009 data, the mining model refresh for Year 2010 may result in this error.

To fix this error, execute the Oracle Retail Data Model calendar utility script again to populate the calendar with sufficient data. For example:

Execute Calendar_Population.run('2005-01-01',10);

See Oracle Retail Data Model Reference for information on the calendar population utility script.

Example 4-3 Troubleshooting the "Message not available ... [Language=ZHS]" Error

Assume that the returned error is Message not available ... [Language=ZHS].

'ZHS' is a code for a language. The language name it relates to can appear as a different name depending on the database environment. This error happens when ordm_sys.DWC_MESSAGE.LANGUAGE does not contain messages for the current language.

Check the values in the DWC_MESSAGE table and, if required, update them to the language code specified by the Oracle session variable USERENV('lang').

Example 4-4 Troubleshooting an ORA-40113 Error for Oracle Retail Data Model

Assume that the returned error is ORA-40113: insufficient number of distinct target values, for 'create_chrn_dt_model'.

This error happens when the target column for the training model contains only one value or no value when it is expecting more than one value.

For example, for the customer churn prediction model, the target column is dmv_cust_acct_src.chrn_ind.

To troubleshoot this error:

  1. Execute a SQL query to check if there are enough values in this column.

    Using the customer churn prediction model as an example, issue the following statement.

    select chrn_ind, count(*) from DMV_CUST_ACCT_SRC  group by chrn_ind;
    

    The following is a result of the query.

    CHRN_IND   COUNT(*)
    -------- ----------
           0      10000
    
    
  2. Check whether the dwr_cust.prmry_eff_to_dt column has few non-null values.

  3. Execute the following statement to refresh the mining source materialized views:

    exec pkg_ordm_mining.refresh_mining_source;
    

Example 4-5 Troubleshooting an ORA-40112 Error for Oracle Retail Data Model

Assume that the returned error is:

ORA-40112: insufficient number of valid data rows, for "create_chrn_dt_model"

For this model, the target column is dmv_cust_acct_src.chrn_ind.

To troubleshoot this error:

  1. Execute the following SQL statement:

    select count(chrn_ind) from dmv_cust_acct_src;
    
  2. Check that the value returned by this query is greater than 0 (zero) and similar to the number of customers. If the number is 0 or too small, check the data in the source tables of the mining source materialized view, dmv_cust_acct_src:

    DWB_RTL_TRX
    DWR_CUST
    

Example 4-6 Troubleshooting an ORG-11130 Error for Oracle Retail Data Model

Assume that the returned error is ORG-11130:no data found in the collection, for "create_sentiment_svm_model".

This error occurs when there is not enough data in the source table for customer sentiment model training: dm_cust_cmnt.

To ensure that some text is loaded for customer sentiment analysis:

  1. Issue the following SQL statement:

    SELECT cust_key, count(cust_cmmnt)
    from dm_cust_cmnt
    group by cust_key;
    
  2. Check the number of text comments returned from dm_cust_cmnt.

  3. If there is not enough data in the dm_cust_cmnt table, check the ETL logic of the dm_cust_intrqacn_cmnt table first, and then check the data in the base interaction event tables.