33.4 Creating a Model that Includes Text Mining
Learn how to create a model that includes text mining.
Oracle Data Mining supports unstructured text within columns of VARCHAR2, CHAR, CLOB, BLOB, and BFILE, as described in the following table:
Table 33-2 Column Data Types That May Contain Unstructured Text
Data Type | Description |
---|---|
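BFILE and BLOB | Oracle Data Mining interprets BLOB and BFILE as text only if you identify the columns as text when you create the model. If you do not identify the columns as text, then CREATE_MODEL returns an error. |
CLOB | Oracle Data Mining interprets CLOB as text. |
CHAR | Oracle Data Mining interprets CHAR as categorical by default. You can identify columns of CHAR as text when you create the model. |
VARCHAR2 | Oracle Data Mining interprets VARCHAR2 with data length > 4000 as text. Oracle Data Mining interprets VARCHAR2 with data length <= 4000 as categorical by default. You can identify these columns as text when you create the model. |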
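Because the interpretation depends on each column's data type and length, it can help to inspect the source table before building the model. The query below is a minimal sketch against the standard USER_TAB_COLUMNS dictionary view; the table name MY_DOCS is hypothetical.

```sql
-- List the data type and declared length of each column in a
-- hypothetical source table, to see which columns may need to be
-- identified explicitly as text.
SELECT column_name,
       data_type,
       data_length
FROM   user_tab_columns
WHERE  table_name = 'MY_DOCS'   -- hypothetical table name
ORDER  BY column_id;
```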
The settings described in the following table control the term extraction process for text attributes in a model. Instructions for specifying model settings are in "Specifying Model Settings".
Table 33-3 Model Settings for Text
Setting Name | Data Type | Setting Value | Description |
---|---|---|---|
ODMS_TEXT_POLICY_NAME | VARCHAR2(4000) | Name of an Oracle Text policy object created with CTX_DDL.CREATE_POLICY | Affects how individual tokens are extracted from unstructured text. See "Creating a Text Policy". |
ODMS_TEXT_MAX_FEATURES | INTEGER | 1 <= value <= 100000 | Maximum number of features to use from the document set (across all documents of each text column) passed to CREATE_MODEL. Default is 3000. |
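As a minimal sketch, the text settings can be stored in a settings table as name/value pairs. The table name my_text_settings, the policy name MY_TEXT_POLICY, and the value 5000 are assumptions chosen for illustration; the policy must already exist.

```sql
-- Hypothetical settings table; the two-column layout follows the
-- usual settings-table convention for DBMS_DATA_MINING.
CREATE TABLE my_text_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

BEGIN
  -- Name of an existing Oracle Text policy (hypothetical name).
  INSERT INTO my_text_settings (setting_name, setting_value)
  VALUES ('ODMS_TEXT_POLICY_NAME', 'MY_TEXT_POLICY');

  -- Override the default of 3000 features; must be between 1 and 100000.
  INSERT INTO my_text_settings (setting_name, setting_value)
  VALUES ('ODMS_TEXT_MAX_FEATURES', '5000');

  COMMIT;
END;
/
```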
A model can include one or more text attributes. A model with text attributes can also include categorical and numerical attributes.
To create a model that includes text attributes:
- Create an Oracle Text policy object.
- Specify the model configuration settings that are described in Table 33-3.
- Specify which columns must be treated as text and, optionally, provide text transformation instructions for individual attributes.
- Pass the model settings and text transformation instructions to DBMS_DATA_MINING.CREATE_MODEL.
Note:
All algorithms except O-Cluster can support columns of unstructured text.
The use of unstructured text is not recommended for association rules (Apriori).
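The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not a definitive implementation: the policy MY_TEXT_POLICY, the source table my_docs (with columns doc_id, comments, and label), the model name MY_TEXT_MODEL, and the settings table my_text_settings are all hypothetical. The settings table is assumed to exist already and to contain the ODMS_TEXT_POLICY_NAME setting pointing at the same policy.

```sql
-- Create the Oracle Text policy.
-- Assumes the user has privileges to execute CTX_DDL.
BEGIN
  ctx_ddl.create_policy('MY_TEXT_POLICY');   -- hypothetical policy name
END;
/

-- Identify the text column and create the model.
DECLARE
  xformlist dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  -- Mark the hypothetical COMMENTS column as text. The expression is the
  -- column itself (no transformation); the attribute specification
  -- 'TEXT' tells CREATE_MODEL to treat it as unstructured text.
  dbms_data_mining_transform.SET_TRANSFORM(
    xformlist,
    'comments',   -- attribute name
    NULL,         -- attribute subname
    'comments',   -- expression
    NULL,         -- reverse expression
    'TEXT');      -- attribute specification

  -- Pass the settings table and the transformation list.
  dbms_data_mining.CREATE_MODEL(
    model_name          => 'MY_TEXT_MODEL',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'my_docs',
    case_id_column_name => 'doc_id',
    target_column_name  => 'label',
    settings_table_name => 'my_text_settings',
    xform_list          => xformlist);
END;
/
```

Because no algorithm setting is supplied in this sketch, the model is built with the default algorithm for the chosen mining function.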