Oracle® Airlines Data Model Implementation and Operations Guide 11g Release 2 (11.2), Part Number E26211-02
This chapter discusses the ETL (extraction, transformation and loading) programs you use to populate an Oracle Airlines Data Model warehouse. It includes the following topics:
ETL for the Foundation Layer of an Oracle Airlines Data Model Warehouse
Performing an Initial Load of an Oracle Airlines Data Model Warehouse
Refreshing the Data in an Oracle Airlines Data Model Warehouse
Managing Errors During Oracle Airlines Data Model Intra-ETL Execution
Figure 2-1, "Layers of an Oracle Airlines Data Model Warehouse" illustrates the three layers in an Oracle Airlines Data Model warehouse environment: the optional staging layer, the foundation layer, and the access layer. You use two types of ETL (extraction, transformation and loading) to populate these layers:
Source-ETL. ETL that populates the staging layer (if any) and the foundation layer (that is, the base, reference, and lookup tables) with data from the OLTP system is known as source-ETL.
Oracle Airlines Data Model does not include source-ETL scripts. You must create source-ETL yourself using your understanding of your OLTP system and your customized Oracle Airlines Data Model. See "ETL for the Foundation Layer of an Oracle Airlines Data Model Warehouse" for more information on creating source-ETL.
Intra-ETL. ETL that populates the access layer (that is, the derived tables, aggregate tables, materialized views, OLAP cubes, and data mining models) using the data in the foundation layer is known as intra-ETL.
Oracle Airlines Data Model does include intra-ETL. You can modify the default intra-ETL to populate a customized access layer from a customized foundation layer. See "Customizing Intra-ETL for the Oracle Airlines Data Model" for more information on the intra-ETL.
ETL that populates the foundation layer of an Oracle Airlines Data Model warehouse (that is, the base, reference, and lookup tables) with data from an OLTP system is known as source-ETL.
You populate the foundation layer of an Oracle Airlines Data Model warehouse by writing your own source-ETL scripts using Oracle Warehouse Builder or another ETL tool and then executing those scripts.
The following topics provide general information about writing source-ETL:
Creating a Source to Target Mapping Document for the Source-ETL
Designing a Plan for Rectifying Source-ETL Data Quality Problems
Keep the following points in mind when designing and writing source-ETL for Oracle Airlines Data Model:
You can populate the calendar data by using the calendar population scripts provided with Oracle Airlines Data Model and described in Oracle Airlines Data Model Reference.
Populate the tables in the following order:
Lookup tables
Reference tables
Base tables
Analyze the tables in one category before loading the tables in the next category (for example, analyze the lookup tables before loading the reference tables). Additionally, you must analyze all of the tables loaded by the source-ETL process before executing the intra-ETL processes.
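For example, the following is a minimal sketch of analyzing one loaded table with DBMS_STATS. The schema name is the oadm_sys schema used throughout this guide; the table name is only a placeholder:

-- Hypothetical example: gather statistics on a loaded lookup table
-- before loading the reference tables. The table name is a placeholder.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'OADM_SYS',
    tabname          => 'MY_LOOKUP_TABLE',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/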
See:
The topic about analyzing tables, indexes, and clusters in Oracle Database Administrator's Guide.

ETL first extracts data from the original sources, assures the quality of the data, cleans the data, and makes the data consistent across the original sources. ETL then populates the physical objects with the "clean" data so that query tools, report writers, dashboards and so on can access the data.
The fundamental services upon which data acquisition is constructed are as follows:
Data sourcing
Data movement
Data transformation
Data loading
From a logical architecture perspective, there are many different ways to configure these building blocks for delivering data acquisition services. The major architectural styles, which cover a range of options that can be targeted within a data warehousing architecture, include:
Batch Extract, Transform, and Load and Batch Extract, Load, Transform, Load
Batch Extract, Transform and Load (ETL) and Batch Extract, Load, Transform, Load (ELTL) are the traditional architectures in data warehouse implementation. The difference between them is whether the transformation is performed inside or outside the database.
Batch Hybrid Extract, Transform, Load, Transform, Load
Batch Hybrid Extract, Transform, Load, Transform, Load (ETLTL) is a hybrid strategy. This strategy provides the most flexibility to remove hand coding approaches to transformation design, apply a metadata-driven approach, and still be able to leverage the data processing capabilities of the enterprise warehouse. In this targeted design, the transformation processing is first performed outside the warehouse as a pre-processing step before loading the staging tables, and then further transformation processing is performed within the data warehouse before the final load into the target tables.
Real-time Extract, Transform, Load
Real-time Extract, Transform, Load (rETL) is appropriate when service levels for data freshness demand more up-to-date information in the data warehousing environment. In this approach, the OLTP system must actively publish events of interest so that the rETL processes can extract them from a message bus (queue) on a timely basis. A message-based paradigm is used with publish and subscribe message bus structures or point-to-point messaging with reliable queues.
When designing source-ETL for Oracle Airlines Data Model, use the architecture that best meets your business needs.
Before you begin building your extract systems, create a logical data interface document that maps the relationship between original source fields and target destination fields in the tables. This document ties the very beginning of the ETL system to the very end.
Columns in the data mapping document are sometimes combined. For example, the source database, table name, and column name could be combined into a single target column. The information within the concatenated column would be delimited with a period. Regardless of the format, the content of the logical data mapping document has been proven to be the critical element required to sufficiently plan ETL processes.
Data cleaning consists of all the steps required to clean and validate the data feeding a table and to apply known business rules to make the data consistent. The perspectives of the cleaning and conforming steps are less about the upside potential of the data and more about containment and control.
If there are data quality problems, then build a plan, in agreement with IT and business users, for how to rectify these problems.
Answer the following questions:
Is data missing?
Is the data wrong or inconsistent?
Should the problem be fixed in the source systems?
Then, set up the following processes and programs:
Set up a data quality measurement process.
Set up the data quality reporting and action program, and assign responsibility for it.
All data movement among ETL processes is composed of jobs. An ETL workflow executes these jobs in the proper sequence and with regard to the necessary dependencies. General ETL tools, such as Oracle Warehouse Builder, support this kind of workflow, job design, and execution control.
The following are some tips for designing ETL jobs and workflows:
Use a common structure across all jobs (source system to transformer to target data warehouse).
Have a one-to-one mapping from source to target.
Define one job per source table.
Apply a generic job structure and template jobs to allow for rapid development and consistency.
Use an optimized job design to leverage Oracle load performance based on data volumes.
Design parameterized jobs to allow for greater control over job performance and behavior.
Maximize parallel execution of jobs.
Your ETL tool or your developed mapping scripts generate status and error handling tables.
As a general principle, all ETL processes log status and errors to a table. You monitor execution status using an ETL tool or by querying this log table directly.
Whether you are developing mapping scripts and loading into a staging layer or directly into the foundation layer, the goal is to get the data into the warehouse in the most expedient manner. To achieve good performance during the load, you must begin by focusing on where the data to be loaded resides and how you load it into the database. For example, you should not use a serial database link or a single JDBC connection to move large volumes of data. The most common and preferred mechanism for loading large volumes of data is loading from flat files.
The following topics discuss best practices for ensuring your source-ETL loads efficiently:
The area where flat files are stored before being loaded into the staging layer of a data warehouse system is commonly known as the staging area. The overall speed of your load is determined by:
How quickly the raw data can be read from the staging area.
How quickly the raw data can be processed and inserted into the database.
Recommendations: Using a Staging Area
Stage the raw data across as many physical disks as possible to ensure that reading it is not a bottleneck during the load.
Also, if you are using the Exadata Database Machine, the best place to stage the data is in an Oracle Database File System (DBFS) stored on the Exadata storage cells. DBFS creates a mountable cluster file system that you can use to access files stored in the database. Create the DBFS in a separate database on the Database Machine. This allows the DBFS to be managed and maintained separately from the data warehouse.

Mount the file system using the DIRECT_IO option to avoid thrashing the system page cache while moving the raw data files in and out of the file system.
See:
Oracle Database SecureFiles and Large Objects Developer's Guide for more information on setting up DBFS.

In order to parallelize the data load, Oracle Database must be able to logically break up the raw data files into chunks, known as granules. To ensure balanced parallel processing, the number of granules is typically much higher than the number of parallel server processes. At any given point in time, a parallel server process is allocated one granule to work on. After a parallel server process completes working on its granule, another granule is allocated until all of the granules are processed and the data is loaded.
Recommendations: Preparing Raw Data Files for Source-ETL
Follow these recommendations:
Delimit each row using a known character such as a new line or a semicolon. This ensures that Oracle can look inside the raw data file and determine where each row of data begins and ends in order to create multiple granules within a single file.
If a file is not position-able and seek-able (for example, a compressed or zip file), then the file cannot be broken up into granules and the whole file is treated as a single granule. In this case, only one parallel server process can work on the entire file. To parallelize the loading of compressed data files, use multiple compressed data files. The number of compressed data files used determines the maximum parallel degree used by the load.
When loading multiple data files (compressed or uncompressed):
Use a single external table, if at all possible
Make the files similar in size
Make the size of the files a multiple of 10 MB
If you must have files of different sizes, list the files from largest to smallest.

By default, Oracle assumes that the flat file has the same character set as the database. If this is not the case, specify the character set of the flat file in the external table definition to ensure the proper character set conversions can take place.
Oracle offers several data loading options:
External table or SQL*Loader
Oracle Data Pump (import and export)
Change Data Capture and Trickle feed mechanisms (such as Oracle Golden Gate)
Oracle Database Gateways to open systems and mainframes
Generic Connectivity (ODBC and JDBC)
The approach that you take depends on the source and format of the data you receive.
Recommendations: Loading Flat Files
If you are loading from files into Oracle you have two options: SQL*Loader or external tables.
Using external tables offers the following advantages:
Allows transparent parallelization inside the database.

You can avoid staging data and apply transformations directly on the file data using arbitrary SQL or PL/SQL constructs when accessing external tables. SQL*Loader requires you to load the data as-is into the database first.
Parallelizing loads with external tables enables more efficient space management compared to SQL*Loader, where each individual parallel loader is an independent database session with its own transaction. For highly partitioned tables this could potentially lead to a lot of wasted space.
You can create an external table using the standard CREATE TABLE statement. However, to load from flat files the statement must include information about where the flat files reside outside the database. The most common approach when loading data from an external table is to issue a CREATE TABLE AS SELECT (CTAS) statement or an INSERT AS SELECT (IAS) statement into an existing table.
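The following is a minimal sketch of this pattern. The directory, file, table, and column names are illustrative only, and the CHARACTERSET clause shows where a non-default flat-file character set would be declared:

-- Hypothetical staging directory and external table over delimited flat files.
CREATE DIRECTORY pnr_stage_dir AS '/u01/stage/pnr';

CREATE TABLE pnr_ext (
  pnr_nbr       VARCHAR2(20),
  bkg_dt        VARCHAR2(10),
  tot_fare_amt  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY pnr_stage_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    CHARACTERSET AL32UTF8
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
  )
  LOCATION ('pnr_01.dat', 'pnr_02.dat')
)
REJECT LIMIT UNLIMITED
PARALLEL;

-- A CTAS from the external table uses direct path load and can apply
-- transformations (here, a string-to-date conversion) on the file data.
CREATE TABLE stage_pnr AS
SELECT pnr_nbr,
       TO_DATE(bkg_dt, 'YYYY-MM-DD') AS bkg_dt,
       tot_fare_amt
FROM   pnr_ext;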
A direct path load parses the input data according to the description given in the external table definition, converts the data for each input field to its corresponding Oracle data type, then builds a column array structure for the data. These column array structures are used to format Oracle data blocks and build index keys. The newly formatted database blocks are then written directly to the database, bypassing the standard SQL processing engine and the database buffer cache.
The key to good load performance is to use direct path loads wherever possible:
A CREATE TABLE AS SELECT (CTAS) statement always uses direct path load.

A simple INSERT AS SELECT (IAS) statement does not use direct path load. In order to achieve direct path load with an IAS statement you must add the APPEND hint to the command.
Direct path loads can also run in parallel. To set the parallel degree for a direct path load, either:
Add the PARALLEL hint to the CTAS statement or the IAS statement.

Set the PARALLEL clause on both the external table and the table into which the data is loaded.
After the parallel degree is set:
A CTAS statement automatically performs a direct path load in parallel.

An IAS statement does not automatically perform a direct path load in parallel. In order to enable an IAS statement to perform direct path load in parallel, you must alter the session to enable parallel DML by executing the following statement.
alter session enable parallel DML;
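For example, here is a minimal sketch that combines the APPEND hint, the PARALLEL hints, and parallel DML. It reuses the hypothetical pnr_ext and stage_pnr objects from the earlier sketch, and the degree of parallelism shown is arbitrary:

-- Hypothetical parallel, direct path IAS load into the staging table
-- created in the earlier sketch.
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(t, 8) */ INTO stage_pnr t
SELECT /*+ PARALLEL(s, 8) */
       pnr_nbr,
       TO_DATE(bkg_dt, 'YYYY-MM-DD'),
       tot_fare_amt
FROM   pnr_ext s;

COMMIT;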
A benefit of partitioning is the ability to load data quickly and easily with minimal impact on the business users by using the EXCHANGE PARTITION command. The EXCHANGE PARTITION command enables swapping the data in a nonpartitioned table into a particular partition in your partitioned table. The EXCHANGE PARTITION command does not physically move data; instead, it updates the data dictionary to exchange a pointer from the partition to the table and vice versa.

Because there is no physical movement of data, an exchange does not generate redo and undo. In other words, an exchange is a sub-second operation and far less likely to impact performance than any traditional data-movement approach such as INSERT.
Recommendations: Partitioning Tables
Partition the larger tables and fact tables in the Oracle Airlines Data Model warehouse.
Example 4-1 Using Exchange Partition Statement with a Partitioned Table
Assume that there is a large table called DWB_PNR_H, which is range partitioned by day. At the end of each business day, data from the online airlines system is loaded into the DWB_PNR_H table in the warehouse.
The following steps ensure the daily data gets loaded into the correct partition with minimal impact to the business users of the data warehouse and optimal speed:
Create an external table for the flat file data coming from the online system.
Using a CTAS statement, create a nonpartitioned table called tmp_pnr that has the same column structure as the DWB_PNR_H table.
Build any indexes that are on the DWB_PNR_H table on the tmp_pnr table.
Issue the EXCHANGE PARTITION command.
Alter table dwb_pnr_h exchange partition SYS_P1926 with table tmp_pnr including indexes without validation;
Gather optimizer statistics on the newly exchanged partition using incremental statistics.
The EXCHANGE PARTITION command in this example swaps the definitions of the named partition and the tmp_pnr table, so the data instantaneously exists in the right place in the partitioned table. Moreover, with the inclusion of the INCLUDING INDEXES and WITHOUT VALIDATION clauses, Oracle swaps index definitions and does not check whether the data actually belongs in the partition, so the exchange is very quick.
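The following is a minimal sketch of steps 2 through 5 of this example, reusing the hypothetical pnr_ext external table from the earlier sketches; the index and column names are placeholders, and the partition name is the one shown above:

-- Step 2: create the nonpartitioned work table with a direct path CTAS.
CREATE TABLE tmp_pnr AS
SELECT /*+ PARALLEL(s, 8) */ * FROM pnr_ext s;

-- Step 3: build the same indexes on tmp_pnr that exist on DWB_PNR_H
-- (placeholder index name and column).
CREATE INDEX tmp_pnr_bkg_dt_idx ON tmp_pnr (bkg_dt);

-- Step 4: swap the work table into the daily partition.
ALTER TABLE dwb_pnr_h
  EXCHANGE PARTITION sys_p1926 WITH TABLE tmp_pnr
  INCLUDING INDEXES WITHOUT VALIDATION;

-- Step 5: gather statistics on the newly exchanged partition.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname     => 'OADM_SYS',
    tabname     => 'DWB_PNR_H',
    partname    => 'SYS_P1926',
    granularity => 'AUTO');
END;
/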
Note:
The assumption being made in this example is that the data integrity was verified at data extraction time. If you are unsure about the data integrity, omit the WITHOUT VALIDATION clause so that the Database checks the validity of the data.

Intra-ETL is delivered as a component of Oracle Airlines Data Model. The intra-ETL is delivered as a PL/SQL package named PKG_INTRA_ETL_PROCESS, which is a complete intra-ETL process with procedures that populate the access layer.
The PKG_INTRA_ETL_PROCESS package is composed of individual sub-process procedures and functions that respect the dependency of each individual program. The main procedures execute in the following order:
Populate_Dimension - Populates the dimension (DWM_) tables based on the content of the reference (DWR_) tables.

Populate_Derived - Populates the derived (DWD_) tables based on the content of the base (DWB_) tables.

Populate_Aggregate - Refreshes all of the aggregate (DWA_) tables using data from the dimension (DWM_) and derived (DWD_) tables.

Populate_Aw - Loads data from Oracle Airlines Data Model aggregate (DWA_) tables into the Oracle Airlines Data Model Analytical Workspace and calculates the forecast data. It reads OLAP ETL parameters from the DWC_OLAP_ETL_PARM table.

Populate_Mining - Triggers the data mining models.
The PKG_INTRA_ETL_PROCESS package is designed to work with the default Oracle Airlines Data Model. If you have customized the model, you also need to customize the intra-ETL. To customize the intra-ETL delivered with Oracle Airlines Data Model, take the following steps:
Familiarize yourself with the PKG_INTRA_ETL_PROCESS package. In particular, understand the procedures, data flows, and maps as described in Oracle Airlines Data Model Reference, and how the procedures use the DWC_ETL_PARAMETER and DWC_OLAP_ETL_PARM control tables, and the DWC_INTRA_ETL_PROCESS and DWC_INTRA_ETL_ACTIVITY control tables. Review the following topics: "Performing an Initial Load of the Access Layer", "Refreshing the Access Layer of an Oracle Airlines Data Model Warehouse", and "Managing Errors During Oracle Airlines Data Model Intra-ETL Execution".
Identify the changes that you have made to the Oracle Airlines Data Model and the changes that must be made to the intra-ETL to support those model changes.
Make a copy of the packages that you need to change. Give the copies a different name.
Make the changes you identified in Step 2 to the packages you created in Step 3.
Optionally, you can create new intra-ETL from scratch - either by writing your own PL/SQL code or by using an ETL tool such as Oracle Warehouse Builder.
Performing an initial load of an Oracle Airlines Data Model is a multistep process:
Load the foundation layer of the Oracle Airlines Data Model warehouse (that is, the reference, lookup, and base tables) as described in "Performing an Initial Load of the Foundation Layer".
Load the access layer of the Oracle Airlines Data Model warehouse (that is, the derived and aggregate tables, the tables that dimension the derived and aggregate tables, materialized views, OLAP cubes, and data mining models) as described in "Performing an Initial Load of the Access Layer".
You perform the initial load of the foundation layer using source-ETL that you create. See "ETL for the Foundation Layer of an Oracle Airlines Data Model Warehouse" for more information on creating this ETL.
To perform an initial load of the access layer of the Oracle Airlines Data Model warehouse (that is, the derived and aggregate tables, materialized views, OLAP cubes, and data mining models), take the following steps:
Update the parameters in the DWC_ETL_PARAMETER control table in the oadm_sys schema so that the ETL can use this information (that is, the beginning and end date of the ETL period) when loading the data into the access layer.

For an initial load of an Oracle Airlines Data Model warehouse, specify the values shown in the following table.
Columns | Value
---|---
PROCESS_NAME | 'OADM-INTRA-ETL'
FROM_DATE_ETL | The beginning date of the ETL period.
TO_DATE_ETL | The ending date of the ETL period.
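For example, the following is a minimal sketch of setting these values with SQL, assuming the FROM_DATE_ETL and TO_DATE_ETL columns are DATE columns; the dates shown are placeholders:

-- Hypothetical update of the ETL window before an initial load.
UPDATE oadm_sys.dwc_etl_parameter
SET    from_date_etl = TO_DATE('2011-01-01', 'YYYY-MM-DD'),
       to_date_etl   = TO_DATE('2011-12-31', 'YYYY-MM-DD')
WHERE  process_name  = 'OADM-INTRA-ETL';

COMMIT;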
See:
Oracle Airlines Data Model Reference for more information on the DWC_ETL_PARAMETER control table.

Update the Oracle Airlines Data Model OLAP ETL parameters in the DWC_OLAP_ETL_PARM control table in the oadm_sys schema to specify the build method and other build characteristics so that the ETL can use this information when loading the OLAP cube data.
For an initial load of the analytic workspace, specify values following the guidelines in Table 4-1.
Table 4-1 Values of Oracle Airlines Data Model OLAP ETL Parameters in the DWC_OLAP_ETL_PARM Table for Initial Load
Column Name | Value
---|---
... | ...
... | One of the following values that specifies the cubes you want to build: ...
... | A decimal value that specifies the number of parallel processes to allocate to this job. (Default value is ...)
... | One of the following values depending on whether you want to calculate forecast cubes: ...
... | If the value for the ...
... | If the value for the ...
... | If the value for the ...
... | If the value for the ...
... | Specify ...
... | Specify ...
Execute the intra-ETL in one of the ways described in "Executing the Default Oracle Airlines Data Model Intra-ETL ".
You execute the PKG_INTRA_ETL_PROCESS process flow from an Oracle client tool (for example, SQL*Plus) by issuing the following statement.
EXEC PKG_INTRA_ETL_PROCESS.RUN( )
The PL/SQL code executed by PKG_INTRA_ETL_PROCESS:

Reads the values from the DWC_ETL_PARAMETER and DWC_OLAP_ETL_PARM control tables in the oadm_sys schema before executing the mappings in the correct order.

Tracks the result of each table load in the DWC_INTRA_ETL_PROCESS and DWC_INTRA_ETL_ACTIVITY control tables.
Executing the PKG_INTRA_ETL_PROCESS package from within an Oracle client provides the ability to monitor the execution of the process. However, you can simply execute the RUN procedure within the PKG_INTRA_ETL_PROCESS PL/SQL package.
In either case, you can execute the intra-ETL explicitly or invoke its execution from some other program or process (for example, the source-ETL process after its successful execution) or through a predefined schedule (for example, using the Oracle Scheduler).
"Performing an Initial Load of the Access Layer" describes how to perform an initial load of an Oracle Airlines Data Model data warehouse. After this initial load, you must load new data into your Oracle Airlines Data Model data warehouse regularly so that it can serve its purpose of facilitating business analysis.
To load new data into your Oracle Airlines Data Model warehouse, you extract the data from one or more operational systems and copy that data into the warehouse. The challenge in data warehouse environments is to integrate, rearrange and consolidate large volumes of data over many systems, thereby providing a new unified information base for business intelligence.
The successive loads and transformations must be scheduled and processed in a specific order and are determined by your business needs. Depending on the success or failure of the operation or parts of it, the result must be tracked and subsequent, alternative processes might be started.
You can do a full incremental load of the Oracle Airlines Data Model warehouse, or you can refresh the data sequentially:
Refreshing the Foundation Layer of Oracle Airlines Data Model Warehouse
Refreshing the Access Layer of an Oracle Airlines Data Model Warehouse
In either case, you can manage errors during the execution of the intra-ETL as described in "Managing Errors During Oracle Airlines Data Model Intra-ETL Execution".
You refresh the foundation layer using source-ETL scripts that you wrote using Oracle Warehouse Builder or another ETL tool. For more information on creating source-ETL, see "ETL for the Foundation Layer of an Oracle Airlines Data Model Warehouse".
Refreshing the access layer of an Oracle Airlines Data Model is a multi-step process. You can do a full incremental load of the access layer all at one time by executing the PKG_INTRA_ETL_PROCESS package as described in "Executing the Default Oracle Airlines Data Model Intra-ETL", or you can refresh the data sequentially:
In either case, you can manage errors during the execution of the intra-ETL as described in "Managing Errors During Oracle Airlines Data Model Intra-ETL Execution".
After you have refreshed the foundation layer of the Oracle Airlines Data Model, you can refresh only the relational tables and views in the access layer of an Oracle Airlines Data Model by taking the following steps:

Update the parameters of the DWC_ETL_PARAMETER control table in the oadm_sys schema. For an incremental load of an Oracle Airlines Data Model warehouse, specify the values shown in the following table (that is, the beginning and end date of the ETL period).
Columns | Value
---|---
PROCESS_NAME | 'OADM-INTRA-ETL'
FROM_DATE_ETL | The beginning date of the ETL period.
TO_DATE_ETL | The ending date of the ETL period.
See:
Oracle Airlines Data Model Reference for more information on the DWC_ETL_PARAMETER control table.

Refresh the tables by executing the following procedures in the PKG_INTRA_ETL_PROCESS PL/SQL package as described in "Executing the Default Oracle Airlines Data Model Intra-ETL". Execute the procedures in the following order (see the sketch after this list):
Populate_Dimension
Populate_Derived
Populate_Aggregate
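A minimal sketch of this sequence from SQL*Plus follows; it assumes these procedures take no arguments, like the RUN procedure shown earlier, so check Oracle Airlines Data Model Reference for the actual parameter lists:

-- Hypothetical sequential refresh of the access layer relational objects.
BEGIN
  pkg_intra_etl_process.populate_dimension;
  pkg_intra_etl_process.populate_derived;
  pkg_intra_etl_process.populate_aggregate;
END;
/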
On a scheduled basis you must update the OLAP cube data with the relational data that has been added to the Oracle Airlines Data Model data warehouse since the initial load of the OLAP cubes.
Take these steps to refresh only the OLAP cube data in the Oracle Airlines Data Model warehouse:
Ensure that the underlying aggregate tables are refreshed. See "Refreshing the Access Layer Relational Tables in the Oracle Airlines Data Model" for more information.
Execute the PKG_INTRA_ETL_PROCESS.Populate_Aw procedure to load the cube data.
If necessary, recover from errors that happen during the execution of Populate_Aw by taking the following steps (see the sketch after these steps).

Change the value of the BUILD_METHOD column of the DWC_OLAP_ETL_PARM table to "C".

Re-execute PKG_INTRA_ETL_PROCESS.Populate_Aw.
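A minimal sketch of this recovery follows; it assumes the DWC_OLAP_ETL_PARM table holds a single current parameter row, so add a WHERE clause if your table holds more than one row:

-- Switch to a complete (C) build.
UPDATE oadm_sys.dwc_olap_etl_parm
SET    build_method = 'C';

COMMIT;

-- Rerun the cube load.
EXEC PKG_INTRA_ETL_PROCESS.Populate_Aw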
The PKG_INTRA_ETL_PROCESS.Populate_Mining procedure triggers the data mining model refreshment as part of the initial load of the warehouse. After the initial load of the warehouse, it is recommended that you refresh the data mining models monthly.
After you have refreshed the OLAP cubes, you can also refresh the data mining models. In this case, the way you refresh a data mining model varies depending on whether you want to refresh all of the data mining models or only one data mining model:
To manually refresh all data mining models, call the following procedure.
oadm_sys.pkg_oadm_mining.refresh_model(p_month_code,p_process_no)
This procedure performs the following tasks for each data mining model:
Refreshes the source materialized views for the data mining model based on the latest data from the oadm_sys schema.
Trains each data mining model on the new training data.
Applies each data mining model onto the new apply data set.
To manually re-create only one data mining model, you can call the corresponding oadm_sys.pkg_oadm_mining.create_ procedure.
For example, to re-create the Customer Life Time Value regression data mining model, call the following procedure.
oadm_sys.pkg_oadm_mining.create_cust_ltv_svm_rgrsn(p_month_cd);
"Tutorial: Customizing the Customer Life Time Value Prediction Data Mining Model" provides detailed instructions for refreshing a single data mining model.
This topic discusses how you can identify and manage errors during intra-ETL execution. It contains the following topics:
Two oadm_sys schema control tables, DWC_INTRA_ETL_PROCESS and DWC_INTRA_ETL_ACTIVITY, monitor the execution of the intra-ETL process. These tables are documented in Oracle Airlines Data Model Reference.
Each normal run of the PKG_INTRA_ETL_PROCESS package (as opposed to an error-recovery run) performs the following steps:
Inserts a record into the DWC_INTRA_ETL_PROCESS table with a monotonically increasing, system-generated unique process key, SYSDATE as the process start time, RUNNING as the process status, and the input date range in the FROM_DATE_ETL and TO_DATE_ETL fields.

Invokes each of the individual PKG_INTRA_ETL_PROCESS procedures in the appropriate order of dependency. Before the invocation of each program, the procedure inserts a record into the intra-ETL activity detail table, DWC_INTRA_ETL_ACTIVITY, with a system-generated unique activity key, the process key value corresponding to the intra-ETL process, the individual program name as the activity name, a suitable activity description, SYSDATE as the activity start time, and RUNNING as the activity status.

Updates the corresponding record in the DWC_INTRA_ETL_ACTIVITY table with the activity end time and activity status after the completion of each individual ETL program (either successfully or with errors). For successful completion of the activity, the procedure updates the status to 'COMPLETED-SUCCESS'. If an error occurs, the procedure updates the activity status to 'COMPLETED-ERROR', and also updates the corresponding error detail in the ERROR_DTL column.

Updates the record corresponding to the process in the DWC_INTRA_ETL_PROCESS table with the process end time and status after the completion of all individual intra-ETL programs. If all of the individual programs succeed, the procedure updates the status to 'COMPLETED-SUCCESS'; otherwise, it updates the status to 'COMPLETED-ERROR'.
You can monitor the execution state of the intra-ETL, including current process progress, time taken by individual programs, or the complete process, by viewing the contents of the DWC_INTRA_ETL_PROCESS and DWC_INTRA_ETL_ACTIVITY tables corresponding to the maximum process key. Monitoring can be done both during and after the execution of the intra-ETL procedure.
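For example, a query along the following lines returns the activity rows for the most recent run. The PROCESS_KEY column name is assumed from the process key described above; check Oracle Airlines Data Model Reference for the exact column names:

-- Hypothetical monitoring query: activities for the latest intra-ETL run.
SELECT activity_name, activity_status, error_dtl
FROM   oadm_sys.dwc_intra_etl_activity
WHERE  process_key = (SELECT MAX(process_key)
                      FROM   oadm_sys.dwc_intra_etl_process);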
To recover a run of the PKG_INTRA_ETL_PROCESS package:
Identify the errors by looking at the corresponding error details that are tracked against the individual programs in the DWC_INTRA_ETL_ACTIVITY table.
Correct the causes of the errors.
Re-execute the PKG_INTRA_ETL_PROCESS package.
The procedures in the PKG_INTRA_ETL_PROCESS package identify whether it is a normal run or a recovery run by referring to the DWC_INTRA_ETL_ACTIVITY table. During a recovery run, PKG_INTRA_ETL_PROCESS executes only the procedures needed for recovery. For example, in the case of a derived population error as part of the previous run, this recovery run executes the individual derived population programs that produced errors in the previous run. After their successful completion, the run executes the aggregate population programs and materialized view refresh in the appropriate order.
In this way, the intra-ETL error recovery is almost transparent, without involving the data warehouse or ETL administrator. The administrator must only correct the causes of the errors and re-invoke the PKG_INTRA_ETL_PROCESS package. PKG_INTRA_ETL_PROCESS identifies and executes the programs that generated errors.
To troubleshoot the performance of the intra-ETL:
Check the execution plan as described in "Checking the Execution Plan".
Monitor parallel DML executions as described in "Monitoring PARALLEL DML Executions".
Check that data mining models were created correctly as described in "Troubleshooting Data Mining Model Creation".
Use SQL Developer or other tools to view the package body of the code generated by Oracle Warehouse Builder.
For example, take the following steps to examine a map:
Copy out the main query statement from the code viewer.

Copy from "CURSOR "AGGREGATOR_c" IS …." to the end of the query, which is right above another "CURSOR "AGGREGATOR_c$1" IS".
In a SQL Developer worksheet, issue the following statement to turn on parallel DML:
Alter session enable parallel dml;
Paste the main query statement into another SQL Developer worksheet and view the execution plan by pressing F6.
Carefully examine the execution plan to make sure the mapping runs according to a valid plan.
Check that you are running the mapping in parallel mode by executing the following SQL statement to count the executed parallel DML and query statements:

column name format a50
column value format 999,999

SELECT name, value
FROM   gv$sysstat
WHERE  UPPER(name) LIKE '%PARALLEL OPERATIONS%'
   OR  UPPER(name) LIKE '%PARALLELIZED%'
   OR  UPPER(name) LIKE '%PX%';
If you run the mapping in parallel mode, you should see "DML statements parallelized" increase by 1 (one) every time the mapping is invoked. If you do not see this increase, then the mapping was not invoked as parallel DML.
If you see "queries parallelized" increased by 1 (one) instead, then typically it means that the SELECT
statement inside of the INSERT
was parallelized, but that INSERT
itself was not parallelized.
After the data mining models are created, check the error log in the oadm_sys.dwc_intra_etl_activity table. For example, execute the following code.
set line 160
col ACTIVITY_NAME format a30
col ACTIVITY_STATUS format a20
col error_dtl format a80

select activity_name, activity_status, error_dtl from dwc_intra_etl_activity;
If all models are created successfully, the activity_status is "COMPLETED-SUCCESS" for every model-creation step. If the activity_status is "COMPLETED-ERROR" for a certain step, check the ERROR_DTL column and fix the problem accordingly.
The following examples illustrate how to troubleshoot some common error messages returned in ERROR_DTL and ACTIVITY_NAME when working with Oracle Airlines Data Model:
Example 4-2, "Troubleshooting the "Message not available ... [Language=ZHS]" Error"
Example 4-3, "Troubleshooting ORA-40112: insufficient number of valid data rows,"
Example 4-4, "Troubleshooting ORA-40113: insufficient number of distinct target values"
Example 4-2 Troubleshooting the "Message not available ... [Language=ZHS]" Error
Assume that the returned error is Message not available ... [Language=ZHS]
.
'ZHS' is a code for a language. The language name it relates to can appear as a different name depending on the database environment. This error happens when oadm_sys.DWC_MESSAGE.LANGUAGE does not contain messages for the current language.
Check the values in the DWC_MESSAGE table and, if required, update them to the language code specified by the Oracle session variable USERENV('lang').
Example 4-3 Troubleshooting ORA-40112: insufficient number of valid data rows,
Assume that the returned error is ORA-40112: insufficient number of valid data rows, for the "create_cust_ltv_svm_rgrsn" model.

For this model, the target column is oadm_sys.dmv_cust_ltv_src.tot_cpn_amt.
To troubleshoot this error:
Execute the following SQL statements.
SELECT count(tot_cpn_amt) FROM dmv_cust_ltv_src;
SELECT count(frqtflr_card_key) FROM dmv_cust_ltv_src;
Check that the values returned by the above two queries are the same and greater than 0 (zero). If the value returned by the first SELECT statement is smaller than the value returned by the second SELECT statement, then check the source tables and materialized views of dmv_cust_ltv_src.
Example 4-4 Troubleshooting ORA-40113: insufficient number of distinct target values
Assume that the returned error is ORA-40113: insufficient number of distinct target values, for the "create_ffp_pred_svm" model.

For a two-class classification model, the target column should have two distinct values. This error happens when the target column for the training model contains only one value or no values when it is expecting two values.

For example, for the frequent fliers prediction among non-frequent fliers SVM model, the target column is oadm_sys.dmv_ffp_pred_src.ff_ind.
To troubleshoot this error:
Execute a SQL query to check whether there are enough values in this column. Using the frequent flier prediction SVM model as an example, issue the following statement.
SELECT ff_ind, count(*) FROM dmv_ffp_pred_src GROUP BY ff_ind;
The result of the query is shown below.
FF_IND   COUNT(*)
------   --------
     1       1296
     0        990
Check the following tables to make sure that there are both frequent fliers and non-frequent fliers by issuing the following query.
SELECT NVL2(frqtflr_nbr,'FFP','Non_FFP') AS ffp_ind, count(*) FROM oadm_sys.dwd_bkg_fact GROUP BY NVL2(frqtflr_nbr,'FFP','Non_FFP');
The result of the query is shown below.

FFP_IND   COUNT(*)
-------   --------
Non-FFP      77353
FFP          42647
Execute the following statement to refresh the mining source materialized views:
exec pkg_oadm_mining.refresh_mining_source;