Changes in This Release for Oracle Data Mining Concepts Guide
Changes in this release for Oracle Data Mining Concepts.
Changes in Oracle Data Mining 18c
The following changes are documented in Oracle Data Mining Concepts for 18c.
New Features
The following features are new in this release:
New Mining Function
Time Series
Time series analysis provides forecasts of future values based on past history; for example, forecasting sales from the prior sequence of sales. Forecasting is a critical component of business and governmental decision making.
See Time Series.
New Algorithms
-
Random Forest
Random Forest is a powerful machine learning algorithm. It uses an ensemble method that combines multiple trees built with random feature selection. Effectively, individual trees are built in random subspaces and combined using the bagging ensemble method.
Random Forest is a very popular algorithm that performs well on many benchmarks. It is already part of Oracle R Enterprise (ORE), but that implementation is based on a public R package. Implementing it as kernel code brings significant performance and scalability benefits.
See Random Forest.
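The bagging-plus-random-subspace idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Oracle's in-database implementation: each "tree" is simplified to a one-split decision stump, and the names fit_stump, fit_forest, and predict are hypothetical. Each tree is trained on a bootstrap sample of the rows (bagging) and considers only a random subset of the features (random subspace); the trees are combined by majority vote.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, features):
    """Hypothetical one-split 'tree': choose the (feature, threshold) pair from
    `features` that misclassifies the fewest rows when each side predicts its
    majority label. Returns (feature, threshold, left_label, right_label)."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            l_lab, r_lab = majority(left), majority(right)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # every candidate split was one-sided: predict the majority
        lab = majority(labels)
        return (features[0], 0.0, lab, lab)
    return best[1:]


def fit_forest(rows, labels, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    n_cols = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bagging: a bootstrap sample of the rows, drawn with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        # Random subspace: this tree only considers a random subset of features.
        feats = rng.sample(range(n_cols), n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict(forest, row):
    # Each stump votes; the forest returns the majority vote.
    votes = [l_lab if row[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return majority(votes)
```

Full implementations grow deep trees and typically resample features at every split; because a stump has only one split, sampling once per tree is equivalent here.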
-
Enhanced Explicit Semantic Analysis Machine Learning Algorithm to Support Classification
Explicit Semantic Analysis (ESA) is exposed in Oracle Database 12c Release 2 only as a topic model, under FEATURE_EXTRACTION. It typically uses hundreds of thousands of explicit features. The algorithm can easily be adapted to perform Classification, addressing use cases with hundreds of thousands of classes: a challenging Classification problem that the current Oracle Advanced Analytics (OAA) algorithms do not address appropriately. Large-scale text classification is very important in the context of big data. Extending ESA to Classification significantly enhances our offering in the text classification domain and allows OAA to address use cases that are currently intractable for the product.
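Using ESA for Classification can be sketched by treating each class as one explicit concept: the training documents of a class are pooled into a TF-IDF vector, and a new document is assigned to the concept (class) with the highest cosine similarity. This is a simplified illustration under those assumptions, with hypothetical function names and a smoothed IDF chosen only for the example; it is not Oracle's ESA implementation.

```python
import math
from collections import Counter


def tfidf_concepts(docs_by_class):
    """Build one TF-IDF vector per class; each class plays the role of an ESA concept."""
    # Pool the training documents of each class into one term-count vector.
    tf = {c: Counter(w for d in docs for w in d.lower().split())
          for c, docs in docs_by_class.items()}
    n = len(tf)
    df = Counter(w for counts in tf.values() for w in counts)
    # Smoothed IDF (assumption): words shared by every class keep a weight of 1.
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}
    concepts = {c: {w: cnt * idf[w] for w, cnt in counts.items()}
                for c, counts in tf.items()}
    return concepts, idf


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def classify(text, concepts, idf):
    # Weight the input the same way, ignoring words never seen in training.
    counts = Counter(w for w in text.lower().split() if w in idf)
    vec = {w: c * idf[w] for w, c in counts.items()}
    # ESA-style scoring: similarity to each explicit concept; predict the best match.
    return max(concepts, key=lambda c: cosine(vec, concepts[c]))
```

Because scoring is one sparse similarity per concept, the same pattern extends to very large numbers of classes, which is the use case the text describes.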
-
Neural Network
The Neural Network algorithm is a biologically inspired approach where a collection of interconnected units (neurons) learn to approximate a function. Neural Networks are appropriate for nonlinear approximation in both Classification and Regression problems.
Neural networks are powerful algorithms that can learn arbitrary nonlinear functions. They have been used successfully in a number of hard problems, including nonlinear regression, time series, computer vision, and speech recognition.
See Neural Network.
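A tiny forward pass shows how interconnected units combine into a nonlinear function. The sketch below assumes a network with two inputs, two ReLU hidden units, and one linear output, with hand-picked (not learned) weights that reproduce XOR exactly, a function no linear model can represent. The names and the choice of ReLU are illustrative assumptions; a real Neural Network model learns its weights from data.

```python
def relu(z):
    return max(0.0, z)


# Hand-picked (not trained) weights for a 2-input, 2-hidden-unit, 1-output network.
W_HIDDEN = [[1.0, 1.0], [1.0, 1.0]]
B_HIDDEN = [0.0, -1.0]
W_OUT = [1.0, -2.0]
B_OUT = 0.0


def forward(x):
    """One forward pass: weighted sums, ReLU in the hidden layer, linear output."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_HIDDEN, B_HIDDEN)]
    return sum(w * h for w, h in zip(W_OUT, hidden)) + B_OUT
```

For the four binary inputs, forward returns 0, 1, 1, 0: the second hidden unit switches on only when both inputs are 1 and cancels the first unit's contribution.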
-
CUR Decomposition-based Algorithm for Attribute and Row Importance
The CUR algorithm allows users to find the columns (attributes) and rows that best explain their data. This algorithm has gained popularity because it lets users gain insight into their data in easily understandable terms. In contrast, decomposition methods such as Singular Value Decomposition (SVD) derive implicit features that are hard to interpret. CUR tries to use the insights derived from SVD but translates them in terms of the original rows and columns.
CUR-based attribute and row importance can be used both to provide data insight and as a data filter ahead of additional analytical processing. This is the first Oracle Advanced Analytics (OAA) algorithm that singles out not only important columns but also important rows.
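CUR methods commonly rank columns and rows by statistical leverage scores derived from SVD: the squared entries of the top right singular vectors score the columns, and those of the top left singular vectors score the rows. The Python sketch below assumes rank k = 1 for brevity and finds the top singular vectors by power iteration instead of a full SVD. It is illustrative only; the function names are hypothetical and not part of any Oracle API.

```python
import math


def top_singular_vectors(A, iters=200):
    """Power iteration on A^T A; returns (u, v), the top left/right singular vectors."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        AtAv = [sum(A[i][j] * Av[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in AtAv))
        if norm == 0.0:
            raise ValueError("matrix has no component along the starting vector")
        v = [x / norm for x in AtAv]
    Av = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return u, v


def leverage_scores(A):
    """Rank-1 leverage scores (column_scores, row_scores); each list sums to 1.
    Higher scores mark the original columns and rows that best explain A."""
    u, v = top_singular_vectors(A)
    return [x * x for x in v], [x * x for x in u]
```

Selecting the highest-scoring columns and rows then yields a subset of the original data, which is what makes the result easy to interpret.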
-
Exponential Smoothing
Exponential Smoothing Methods (ESM) are widely used for making predictions from time series data. Although originally thought to be less flexible and accurate than competitors such as ARIMA, ESM has more recently been shown to cover a broader class of models and has been extended to increase both its descriptive realism and its accuracy. Oracle ESM includes many of these recent extensions, a total of 14 models, including the popular Holt (trend) and Holt-Winters (trend and seasonality) models, and can handle irregular time series intervals.
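Holt's linear-trend method, one of the models named above, maintains a smoothed level and a smoothed trend and updates both at each observation. The sketch below is a textbook version with a simple initialization (first value as the level, first difference as the trend); the function name and parameters are illustrative, and it does not reproduce Oracle ESM's model selection, seasonality, or irregular-interval handling.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Holt's linear (trend) exponential smoothing.

    series: at least two observations; alpha, beta: smoothing weights in (0, 1].
    Returns forecasts for the next `horizon` time steps.
    """
    # Simple initialization (assumption): first value and first difference.
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        # Blend the new observation with the one-step-ahead prediction.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Blend the observed change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]
```

On a perfectly linear series such as 10, 12, 14, 16, 18, the level and trend stay exact, so the next three forecasts are 20, 22, and 24.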
Algorithm Enhancements
-
Algorithm Meta Data Registration
Algorithm meta data registration simplifies and streamlines the integration of new algorithms into the R extensibility framework. It provides a uniform, consistent approach to registering new algorithm functions and their settings.
As a result, the GUI can seamlessly pick up and support such new algorithms.
-
Alternating Direction Method of Multipliers (ADMM)
A new distributed solver for Generalized Linear Models (GLM), Alternating Direction Method of Multipliers (ADMM), is introduced.
See GLM Solvers.
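In global-consensus ADMM, each data partition fits a local model, the local estimates are averaged into a consensus value, and scaled dual variables push the local estimates toward agreement. The sketch below applies that pattern to a one-parameter least-squares model (y approximately w * x). It is a hypothetical illustration of the general technique, not Oracle's GLM solver; the function name and the rho and iters parameters are assumptions.

```python
def admm_least_squares(partitions, rho=1.0, iters=200):
    """Global-consensus ADMM for the 1-D model y ~ w * x over partitioned data.

    Each partition only needs its own sums; just the scalars w_i and u_i are
    exchanged, which is what makes the scheme suitable for distributed data.
    """
    k = len(partitions)
    w = [0.0] * k  # local estimates, one per partition
    u = [0.0] * k  # scaled dual variables
    z = 0.0        # consensus (global) estimate
    for _ in range(iters):
        for i, part in enumerate(partitions):
            sxx = sum(x * x for x, _ in part)
            sxy = sum(x * y for x, y in part)
            # Local step: argmin_w 0.5*sum((y - w*x)^2) + (rho/2)*(w - z + u_i)^2
            w[i] = (sxy + rho * (z - u[i])) / (sxx + rho)
        # Consensus step: average the dual-corrected local estimates.
        z = sum(w[i] + u[i] for i in range(k)) / k
        # Dual step: accumulate each partition's disagreement with the consensus.
        for i in range(k):
            u[i] += w[i] - z
    return z
```

The result converges to the same coefficient as fitting all the data at once (total sum of x*y divided by total sum of x*x), even though no partition ever sees another partition's rows.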
-
Association Rules Sampling
A new specialized sampling approach is introduced for Association Rules.
See Improved Sampling.
New Administrative Tasks
IMPORT and EXPORT Serialized Models
Machine learning models can be exported in serialized object form. Serialized models can be moved to another platform for scoring.
See Oracle Data Mining User’s Guide and Oracle Database PL/SQL Packages and Types Reference.
Deprecated Features
The following features are deprecated in this release and may be desupported in a future release. See Oracle Database Upgrade Guide for a complete list of deprecated features in this release.
-
The *GET_MODEL_DETAILS functions are deprecated and are replaced with Model Detail Views. See Oracle Data Mining User’s Guide.
Desupported Features
See Oracle Database Upgrade Guide for a complete list of desupported features in this release.
Other Changes
The following is an additional change in Oracle Data Mining Concepts for 18c:
The “Oracle Data Mining with R Extensibility” topic has moved from the Introduction to Oracle Data Mining chapter to a new chapter, R Extensibility.