Oracle® R Enterprise User's Guide Release 11.2 for Linux, Solaris, AIX, and Windows Part Number E26499-05
R is an open source statistical programming language and environment. For information about R, see the R Project for Statistical Computing at http://www.r-project.org.
R provides an environment for statistical computing, including:
An easy-to-use language
A powerful graphical environment for visualization
Many out-of-the-box statistical techniques
R packages (An R package is a set of related functions, help files, and data files; currently, there are more than 3340 R packages.)
The R Console graphical user interface for analyzing data interactively
R's rapid adoption has earned it a reputation as a new statistical software standard.
Oracle R Enterprise is a component of the Oracle Advanced Analytics Option of Oracle Database Enterprise Edition. For detailed information about Oracle R Enterprise, including links to software downloads, go to Oracle R Enterprise at http://www.oracle.com/technetwork/database/options/advanced-analytics/r-enterprise/index.html.
Oracle R Enterprise allows users to perform statistical analysis on data stored in tables in an Oracle Database. Oracle R Enterprise has these components:
The Oracle R Enterprise R transparency layer. The transparency layer is a collection of packages that support mapping of R data types to Oracle Database objects and generate SQL transparently in response to R expressions on mapped data types. The transparency layer allows an R user to directly interact with database-resident data using R language constructs. This enables R users to work with data too large to fit into the memory of a user's desktop system.
The Oracle statistics engine, a collection of statistical functions and procedures corresponding to commonly used statistical libraries. The statistics engine packages execute in Oracle Database.
SQL extensions supporting R engine execution through the database on the database server. These SQL extensions enable productizing R scripts, that is, running R scripts in a lights-out mode.
Oracle R Connector for Hadoop, an R package that runs MapReduce jobs. It enables R users to work directly with an Oracle Hadoop cluster, executing computations written in the R language on data resident in HDFS, Oracle Database, or local files.
The components of Oracle R Enterprise are described in Chapter 3.
Oracle R Connector for Hadoop is a related product that is part of the Big Data Appliance.
Oracle R Enterprise also includes functions that perform most common or base statistical procedures; see Chapter 4 for more information.
The rest of this chapter describes Oracle R Enterprise Architecture, Oracle R Enterprise Data Types, and Oracle R Enterprise Supported Configurations.
Oracle R Enterprise has three components, including the connector for Hadoop:
The Client R Engine is a collection of R packages that allows you to connect to an Oracle Database and to interact with data in that database.
You can use any R commands from the client. In addition, the client supplies these functions:
The R SQL Transparency framework intercepts R functions for scalable in-database execution
Interception of data transforms, statistical functions, and Oracle R Enterprise-specific functions
Interactive display of graphical results and flow control as in open source R
Submission of R closures (functions) for execution in the Oracle Database
The Server is a collection of PL/SQL procedures and libraries that augment Oracle Database with the capabilities required to support an Oracle R Enterprise client. The R engine is also installed on Oracle Database to support embedded R execution. Oracle Database spawns R engines, which can provide data parallelism.
The Oracle R Enterprise Database engine provides this functionality:
Scale to large datasets
Access to tables, views, and external tables in the database, as well as those accessible through database links
Use SQL query parallel execution
Use in-database statistical and data mining functionality
Oracle Database spawns R engines to support database-managed parallelism and to provide lights-out (unattended) scheduled execution of R scripts, that is, scheduling or triggering R scripts packaged inside a PL/SQL or SQL query. Oracle R Enterprise provides efficient data transfer to and from the spawned engines. Embedded R execution can be used to emulate MapReduce-style programming.
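To make the MapReduce-style pattern concrete, here is a minimal conceptual sketch in Python. It is not Oracle R Enterprise code: `group_apply` and `mean_delay` are hypothetical names, the rows are invented sample data, and a thread pool merely stands in for the R engines that the database would spawn. The sketch partitions rows by a key, runs a function on each partition in a worker (the map step), and then combines the per-partition results (the reduce step).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_apply(rows, key, func, workers=2):
    """Partition rows by key, run func on each partition in a worker pool,
    and return a dict mapping each key value to its result."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[key]].append(row)
    keys = sorted(partitions)
    # Each worker stands in for one spawned engine handling one partition.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(func, (partitions[k] for k in keys)))
    return dict(zip(keys, results))


def mean_delay(partition):
    """Per-partition computation (the 'map' side)."""
    return sum(r["delay"] for r in partition) / len(partition)


# Invented sample data, for illustration only.
rows = [
    {"carrier": "AA", "delay": 10.0},
    {"carrier": "UA", "delay": 5.0},
    {"carrier": "AA", "delay": 30.0},
    {"carrier": "UA", "delay": 15.0},
]

per_carrier = group_apply(rows, "carrier", mean_delay)
# Combining the per-partition results is the 'reduce' side.
worst_mean_delay = max(per_carrier.values())
```

In this sketch the caller supplies only the per-partition function; partitioning, scheduling, and collecting results are handled by the framework, which is the division of labor the paragraph above describes for database-managed parallelism.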
There are several data types specific to Oracle R Enterprise; see Oracle R Enterprise Data Types for details.
Oracle R Connector for Hadoop (ORCH) is an R package that provides an interface between the local R environment and Hadoop. You install and load this package just as you would any other R package. Using R functions, you can copy data between R memory, the local file system, and HDFS. You can schedule R programs to execute as Hadoop MapReduce jobs and return the results to any of those locations.
ORCH is preinstalled on Oracle Big Data Appliance, but it is licensed separately as one of the Oracle Big Data Connectors. You can install ORCH on a Hadoop cluster other than one on an Oracle Big Data Appliance.
For information about ORCH, see the Oracle Big Data Connectors User's Guide (http://docs.oracle.com/cd/E27101_01/doc.10/e27365/toc.htm), part of the Oracle Big Data Documentation library (http://docs.oracle.com/cd/E27101_01/index.htm).
Oracle R Enterprise introduces a variant of many R data types. The name of the Oracle R Enterprise data type is the name of the corresponding R data type prefixed by ore. (for example, ore.frame corresponds to data.frame). These data types establish a mapping between an R object and a database table or view. The mapping tracks metadata of the Oracle Database object, which in turn aids in SQL query generation. These data types form the foundation of the Oracle R Enterprise transparency layer.
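The idea of a proxy type that holds only metadata and turns operations into SQL can be illustrated with a small conceptual sketch. It is written in Python against an in-memory SQLite database and does not use or reproduce the Oracle R Enterprise API: the class `TableProxy`, its methods, and the `flights` table are all hypothetical, and the SQL is built by simple string formatting from trusted identifiers.

```python
import sqlite3


class TableProxy:
    """Hypothetical stand-in for a database-backed data frame.

    Holds only metadata (table name, columns, filter); rows stay in the database.
    """

    def __init__(self, conn, table, columns, where=None):
        self.conn = conn
        self.table = table
        self.columns = list(columns)
        self.where = where

    def __getitem__(self, columns):
        # Column selection narrows the SELECT list; no data is copied.
        return TableProxy(self.conn, self.table, columns, self.where)

    def filter(self, condition):
        # Row filtering is recorded as a WHERE clause.
        if self.where is None:
            clause = condition
        else:
            clause = f"({self.where}) AND ({condition})"
        return TableProxy(self.conn, self.table, self.columns, clause)

    def sql(self, select_list=None):
        # Generate the query from the tracked metadata.
        select_list = select_list or ", ".join(self.columns)
        query = f"SELECT {select_list} FROM {self.table}"
        if self.where:
            query += f" WHERE {self.where}"
        return query

    def mean(self, column):
        # The aggregate runs in the database; only one value comes back.
        return self.conn.execute(self.sql(f"AVG({column})")).fetchone()[0]

    def pull(self):
        # Explicitly materialize the rows in client memory.
        return self.conn.execute(self.sql()).fetchall()


# Invented sample table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (carrier TEXT, distance REAL, delay REAL)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?)",
    [("AA", 500.0, 10.0), ("AA", 1200.0, 30.0), ("UA", 800.0, 5.0)],
)

flights = TableProxy(conn, "flights", ["carrier", "distance", "delay"])
long_haul = flights.filter("distance > 600")[["carrier", "delay"]]
generated_sql = long_haul.sql()
mean_delay = long_haul.mean("delay")
```

Filtering and column selection here only update the proxy's metadata. A query is sent to the database only when a result is requested, which is why such a design can work with tables larger than client memory.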
The following R data types have been overloaded for transparent in-database execution:
Character, Integer, Numeric and Logical vectors
Factors
Data Frame
Matrix is overloaded in two situations:
Linear algebra cross-products
Creating input matrices for advanced analytics
For more information and examples, see Oracle R Enterprise Transparency Framework.
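Cross-products are a natural fit for in-database execution because each entry of the cross-product of a matrix X with itself is a sum of products over rows, which SQL can express as an aggregate. The following conceptual sketch in Python shows this with an in-memory SQLite table. It is an illustration of the general idea, not a description of how Oracle R Enterprise implements it; the function `crossprod_sql` and the `obs` table are hypothetical.

```python
import sqlite3


def crossprod_sql(conn, table, columns):
    """Compute the cross-product (X transposed times X) of the given numeric
    columns with a single SQL aggregate query."""
    # One SUM(a * b) term per matrix entry, in row-major order.
    terms = [f"SUM({a} * {b})" for a in columns for b in columns]
    row = conn.execute(f"SELECT {', '.join(terms)} FROM {table}").fetchone()
    n = len(columns)
    # Reshape the flat result row into an n-by-n nested list.
    return [list(row[i * n:(i + 1) * n]) for i in range(n)]


# Invented sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x1 REAL, x2 REAL)")
conn.executemany(
    "INSERT INTO obs VALUES (?, ?)",
    [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)],
)

xtx = crossprod_sql(conn, "obs", ["x1", "x2"])
```

However many rows the table holds, only the small n-by-n result leaves the database.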
Oracle R Enterprise consists of a client and a server. The client and the server run on Microsoft Windows (32-bit and 64-bit), Oracle Linux, Red Hat Linux, Solaris, or IBM AIX. The server is installed in an Oracle Database, to which the client connects.
Oracle R Enterprise also runs on Oracle Exadata machines with the Linux or Solaris operating system and on SPARC SuperCluster. For details, see Prerequisites.
Installation of Oracle R Enterprise is described in Chapter 2.