Oracle® Retail Data Model Implementation and Operations Guide Release 11.3.1 Part Number E20363-02
This appendix provides information about sizing and configuring an Oracle Retail Data Model warehouse. It contains the following topics:
Calculation Factors When Making a Data Volume Calculation for an Oracle Retail Data Model Warehouse
Formula to Determine Minimum Disk Space Requirements for an Oracle Retail Data Model Warehouse
Configuring a Balanced System for Oracle Retail Data Model
Businesses now demand more information sooner and are delivering analytics from their Enterprise Data Warehouse (EDW) to an ever-widening set of users and applications. To keep up with this increase in demand, the EDW must now be near real-time and highly available. Regardless of the design or implementation of a data warehouse, the initial key to good performance lies in the hardware configuration used. This has never been more evident than with the recent increase in the number of data warehouse appliances in the market.
But how do you go about sizing such a system? You must first understand how much throughput capacity is required for your system and how much throughput each individual CPU or core in your configuration can drive. Thus, the first task is to calculate the database space requirement of your data warehouse.
There are two sources of data volume estimates in a data warehouse environment:
The estimated volume of raw data extracted from source systems. This estimate affects the ETL system configuration and the staging layer database space in the data warehouse system. Because this value is determined by your specific transactional systems, you must calculate this information yourself.
The space needed for data stored to support the objects defined in the default Oracle Retail Data Model schema. This appendix provides information you can use to make this calculation.
Calculation Factors When Making a Data Volume Calculation for an Oracle Retail Data Model Warehouse
Consider the following calculation factors when making a data volume calculation:
Calculate data unit volume for each type of data:
Reference and lookup tables data. Assume this data is permanently stored.
Base tables data (transaction data). Assume that this data is stored within its life cycle.
Star schema (derived and summary). Assume that this data is stored within its life cycle.
Calculate the retention period for each type of data (see the sketch after this list).
Define how many months or years of each type of table to retain.
Calculate data growth.
Assume that the annual growth rate applies to both transaction and reference data and to data in the star schema.
Assume that the annual change rate applies only to reference data.
Calculate staging area data requirements, if a staging area is proposed.
Tip:
Multiply the daily ETL volume by the number of days data is held for problem resolution and reruns of transformations with new extracts from source systems.
Calculate data volume for indexes, temporary tables, and transaction logs.
Calculate the space requirement for business intelligence tools, such as cubes, and data mining.
Consider the redo log and Oracle ASM space requirement.
Consider the RAID architecture [RAID 1, 0+1, 5].
Consider the backup strategy.
Consider the compression factor, if compression is applied.
Consider the OS and file system disk space requirements.
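As referenced in the retention item above, the following minimal sketch shows one way to turn retention and growth assumptions into numbers. The monthly volume, retention period, and growth rate are illustrative assumptions only, not recommendations.

# Illustrative retention and growth calculation; all values are
# assumed examples.
transaction_tb_per_month = 0.5   # base/transaction data volume (TB)
retention_months = 24            # life cycle for transaction data
annual_growth_rate = 0.20        # applies to transaction, reference, and star schema data

retained_tb = transaction_tb_per_month * retention_months
print(f"Transaction data retained: {retained_tb:.1f} TB")   # 12.0 TB

# A simple year-over-year view of the retained volume under growth:
for year in range(1, 4):
    grown = retained_tb * (1 + annual_growth_rate) ** (year - 1)
    print(f"Year {year}: {grown:.1f} TB of retained transaction data")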
Formula to Determine Minimum Disk Space Requirements for an Oracle Retail Data Model Warehouse
Use the following formula, based on the factors outlined in "Calculation Factors When Making a Data Volume Calculation for an Oracle Retail Data Model Warehouse", to determine the minimum disk space requirements for an Oracle Retail Data Model warehouse.
Disk Space Minimum Requirements = Raw data size * Database space factor * (1 + GrthperY)^nY * OS and file system factor * Compress factor * Storage redundancy factor
where:
Raw data size = reference and lookup data per year + base/transaction data per year + derived and summary data per year + staging data + other data (OLAP/Data Mining)
Database space factor = indexes + temporary tables + logs
GrthperY = growth rate per year
nY = number of years of growth
OS and file system factor = the space needed to install, configure, and maintain the operating system and database
Storage redundancy factor = ASM disk space and RAID factor [RAID 1 = 2, RAID 5 = 1.25 or 1.33]
Compress factor = depends on how you apply compression. Running on an Oracle Exadata Database Machine can produce large disk space savings through compression.
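The following sketch applies the formula above end to end. Every input value (the volumes in TB, the factors, the growth rate, and the planning horizon) is an assumed example, not a recommendation; substitute your own estimates.

# Minimal sizing sketch for the disk space formula above.
# All input values are illustrative assumptions.
raw_data_size_tb = (
    0.5    # reference and lookup data per year (TB)
    + 4.0  # base/transaction data per year (TB)
    + 1.5  # derived and summary (star schema) data per year (TB)
    + 1.0  # staging data (TB)
    + 0.5  # other data: OLAP cubes, data mining (TB)
)

database_space_factor = 1.8   # indexes + temporary tables + logs
growth_per_year = 0.20        # GrthperY: 20% annual growth
n_years = 3                   # nY: planning horizon in years
os_fs_factor = 1.1            # OS and file system overhead
compress_factor = 0.5         # e.g., ~2x compression; use 1.0 if none
redundancy_factor = 2.0       # RAID 1 = 2, RAID 5 = 1.25 or 1.33

minimum_disk_tb = (
    raw_data_size_tb
    * database_space_factor
    * (1 + growth_per_year) ** n_years
    * os_fs_factor
    * compress_factor
    * redundancy_factor
)

print(f"Minimum disk space requirement: {minimum_disk_tb:.1f} TB")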
Configuring a Balanced System for Oracle Retail Data Model
Many data warehouse operations are based upon large table scans and other I/O-intensive operations, which perform vast quantities of random I/Os. To achieve optimal performance, the hardware configuration must be sized end to end to sustain this level of throughput. This type of hardware configuration is called a balanced system. In a balanced system, all components - from the CPU to the disks - are orchestrated to work together to guarantee the maximum possible I/O throughput. I/O performance is always a key consideration for data warehouse designers and administrators. The typical workload in a data warehouse is especially I/O intensive, with operations such as large data loads and index builds, creation of materialized views, and queries over large volumes of data. Design the underlying I/O system for a data warehouse to meet these heavy requirements.
To create a balanced system, answer the following questions:
How many CPUs are required? What speed is required?
What amount of memory is required? Data warehouses do not have the same memory requirements as mission-critical transactional applications.
How many I/O bandwidth components are required? What is the desired I/O speed?
Each component must be able to provide sufficient I/O bandwidth to ensure a well-balanced I/O system.
The following topics provide more information about configuring a balanced system for Oracle Retail Data Model:
Maintaining High Throughput in an Oracle Retail Data Model Warehouse
Configuring I/O in an Oracle Retail Data Model for Bandwidth not Capacity
Testing the I/O System Before Building the Oracle Retail Data Model Warehouse
Balanced Hardware Configuration Guidelines for Oracle Retail Data Model
Maintaining High Throughput in an Oracle Retail Data Model Warehouse
The hardware configuration and data throughput requirements for a data warehouse are unique mainly because of the sheer size and volume of data. Before you begin sizing the hardware configuration for your data warehouse, estimate the highest throughput requirement to determine whether the current or proposed hardware configuration can deliver the necessary performance. When estimating throughput, use the following criteria:
The amount of data accessed by queries during peak time, and the acceptable response time
The amount of data that is loaded within a window of time
Configuring I/O in an Oracle Retail Data Model for Bandwidth not Capacity
Based on the calculated data volume and the highest throughput requirement, you can estimate the I/O throughput needed by the back-end ETL processes and the front-end business intelligence applications per unit of time. Typically, a value of approximately 200 MB per second of I/O throughput per core is a good planning number for designing a balanced system. All subsequent critical components on the I/O path - the host bus adapters, Fibre Channel connections, the switch, the controller, and the disks - must be sized appropriately.
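For example, here is a short calculation based on the 200 MB per second per core planning number; the 8 GB/s peak requirement is an assumed example value.

# Rough core-count estimate from a peak I/O throughput requirement,
# using the ~200 MB/s-per-core planning figure cited above.
import math

peak_io_throughput_mb_s = 8000   # assumed peak requirement: 8 GB/s
throughput_per_core_mb_s = 200   # planning number per core

cores_needed = math.ceil(peak_io_throughput_mb_s / throughput_per_core_mb_s)
print(f"Cores required to sustain peak I/O: {cores_needed}")  # -> 40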
When running a data warehouse on Oracle Real Application Clusters (Oracle RAC), it is just as important to size the cluster interconnect with the same care and caution you would use for the I/O subsystem throughput.
Configure the storage subsystem for a data warehouse to be simple, efficient, highly available, and scalable. An easy way to achieve this is to apply the S.A.M.E. (Stripe and Mirror Everything) methodology. S.A.M.E. can be implemented at the hardware level, by using Oracle ASM (Automatic Storage Management), or by using a combination of both. There are many variables in sizing the I/O systems, but one basic rule of thumb is that the data warehouse system should have multiple disks for each CPU (at least two disks for each CPU at a bare minimum) to achieve optimal performance.
A data warehouse designer plans for future growth of a data warehouse. There are several approaches to handling the growth in a system, and the key consideration is to be able to grow the I/O system without compromising on the I/O bandwidth. You should not, for example, add four disks to an existing system of 20 disks and grow the database by adding a new tablespace striped across only the four new disks: the new data would be confined to the bandwidth of those four disks. A better solution would be to add new tablespaces striped across all 24 disks, and over time also convert the existing tablespaces striped across 20 disks to be striped across all 24 disks.
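A quick sketch of the arithmetic behind this guideline; the 50 MB/s per-disk sequential rate is an assumed example figure.

# Why striping across only the new disks hurts: sequential bandwidth
# scales with the number of disks in the stripe. 50 MB/s per disk is
# an assumed example figure.
per_disk_mb_s = 50

old_stripe = 20 * per_disk_mb_s    # existing tablespaces across 20 disks
narrow_stripe = 4 * per_disk_mb_s  # new tablespace on only the 4 new disks
wide_stripe = 24 * per_disk_mb_s   # tablespaces striped across all 24 disks

print(f"20-disk stripe: {old_stripe} MB/s")     # 1000 MB/s
print(f"4-disk stripe:  {narrow_stripe} MB/s")  # 200 MB/s bottleneck
print(f"24-disk stripe: {wide_stripe} MB/s")    # 1200 MB/s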
Testing the I/O System Before Building the Oracle Retail Data Model Warehouse
When creating a data warehouse on a new system, test the I/O bandwidth before creating all of the database data files to validate that the expected I/O levels are being achieved. On most operating systems, you can perform the test using simple scripts to measure the performance of reading and writing large test files.
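Here is a minimal sketch of such a test, assuming Python is available on the host. Operating system caching can inflate the read figure, so treat the results as rough; purpose-built I/O calibration tools give more realistic numbers.

# Crude sequential I/O check: write then read a large test file and
# report MB/s. The file path and sizes are assumed example values.
import os
import time

TEST_FILE = "io_test.dat"        # scratch file on the target volume
CHUNK = 8 * 1024 * 1024          # 8 MB per write
TOTAL_MB = 2048                  # 2 GB test file

buf = os.urandom(CHUNK)
start = time.time()
with open(TEST_FILE, "wb") as f:
    for _ in range(TOTAL_MB * 1024 * 1024 // CHUNK):
        f.write(buf)
    f.flush()
    os.fsync(f.fileno())         # force data to disk before timing stops
print(f"Write: {TOTAL_MB / (time.time() - start):.0f} MB/s")

start = time.time()
with open(TEST_FILE, "rb") as f:
    while f.read(CHUNK):         # sequential read of the whole file
        pass
print(f"Read: {TOTAL_MB / (time.time() - start):.0f} MB/s")

os.remove(TEST_FILE)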
Balanced Hardware Configuration Guidelines for Oracle Retail Data Model
You can use the following tips for a balanced hardware configuration:
Total throughput = #cores x 100-200 MB per second (depends on the chip set)
Total host bus adaptor (HBA) throughput = Total core throughput
Note:
If total core throughput is 1.6 GB/s, you need four 4 Gbit HBAs (see the sketch after this list).
Use one disk controller per HBA port (throughput capacity must be equal).
Switches must have the same capacity as the HBAs and disk controllers.
Use a maximum of ten physical disks per controller (that is, use smaller drives: 146 or 300 GB).
Use a minimum of 4 GB of memory per core (8 GB if using compression).
Interconnect bandwidth equals I/O bandwidth (InfiniBand).
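The sketch below, referenced from the Note above, checks the HBA arithmetic and applies the disks-per-core and disks-per-controller tips. It assumes a 4 Gbit Fibre Channel HBA delivers roughly 400 MB/s of usable bandwidth, which is a common rule of thumb rather than a figure from this guide.

# Back-of-the-envelope check of the HBA sizing note above.
import math

cores = 16
throughput_per_core_mb_s = 100            # 100-200 MB/s, chipset dependent
total_throughput_mb_s = cores * throughput_per_core_mb_s   # 1600 MB/s

hba_usable_mb_s = 400                     # per 4 Gbit HBA (assumption)
hbas_needed = math.ceil(total_throughput_mb_s / hba_usable_mb_s)
print(f"HBAs required: {hbas_needed}")    # -> 4, matching the Note

# Disks: at least two physical disks per core, at most ten per controller.
disks_min = cores * 2
controllers_min = math.ceil(disks_min / 10)
print(f"Minimum disks: {disks_min}, controllers: {controllers_min}")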
Oracle now provides the Oracle Exadata Database Machine, which combines industry-standard hardware from Oracle, Oracle Database 11g Release 2, and Oracle Exadata Storage Server Software to create a faster, more versatile database machine. It is a completely scalable and fault-tolerant package for all data management, especially for data warehousing.
Oracle also has a series of Optimized Warehouse Reference configurations that help customers take the risk out of designing and deploying Oracle data warehouses. Using extensive field experience and technical knowledge, Oracle and its hardware partners have developed a choice of data warehouse reference configurations that can support various sizes, user populations, and workloads. These configurations are fast and reliable, and they can easily scale from 500 GB to over 100 TB on single and clustered servers to support tens to thousands of users.