Oracle® Text Reference 11g Release 2 (11.2) Part Number E24436-02 |
|
|
PDF · Mobi · ePub |
This chapter describes the CTX_QUERY
PL/SQL package you can use for generating query feedback, counting hits, and creating stored query expressions.
Note:
You can use this package only when your index type isCONTEXT
. This package does not support the CTXCAT
index type.The CTX_QUERY
package includes the following procedures and functions:
Name | Description |
---|---|
BROWSE_WORDS | Returns the words around a seed word in the index. |
COUNT_HITS | Returns the number hits to a query. |
EXPLAIN | Generates query expression parse and expansion information. |
HFEEDBACK | Generates hierarchical query feedback information (broader term, narrower term, and related term). |
REMOVE_SQE | Removes a specified stored query expression from the SQL tables. |
RESULT_SET | Executes a query and generates a result set. |
STORE_SQE | Executes a query and stores the results in stored query expression tables. |
This procedure enables you to browse words in an Oracle Text index. Specify a seed word and BROWSE_WORDS
returns the words around it in the index, and an approximate count of the number of documents that contain each word.
This feature is useful for refining queries. You can identify the following words:
Unselective words (words that have low document count)
Misspelled words in the document set
Syntax 1: To Store Results in Table
ctx_query.browse_words(
index_name IN VARCHAR2, seed IN VARCHAR2, restab IN VARCHAR2, browse_id IN NUMBER DEFAULT 0, numwords IN NUMBER DEFAULT 10, direction IN VARCHAR2 DEFAULT BROWSE_AROUND, part_name IN VARCHAR2 DEFAULT NULL
);
Syntax 2: To Store Results in Memory
ctx_query.browse_words(
index_name IN VARCHAR2, seed IN VARCHAR2, resarr IN OUT BROWSE_TAB, numwords IN NUMBER DEFAULT 10, direction IN VARCHAR2 DEFAULT BROWSE_AROUND, part_name IN VARCHAR2 DEFAULT NULL
);
Specify the name of the index. You can specify schema.name
. Must be a local index.
Specify the seed word. This word is lexed before browse expansion. The word need not exist in the token table. seed must be a single word. Using multiple words as the seed will result in an error.
Specify the name of the result table. You can enter restab as schema.name
. The table must exist before you call this procedure, and you must have INSERT
permissions on the table. This table must have the following schema.
Column | Datatype |
---|---|
browse_id | number |
word | varchar2(64) |
doc_count | number |
Existing rows in restab are not deleted before BROWSE_WORDS
is called.
Specify the name of the result array. resarr
is of type ctx_query.browse_tab
.
type browse_rec is record ( word varchar2(64), doc_count number ); type browse_tab is table of browse_rec index by binary_integer;
Specify a numeric identifier between 0 and 232. The rows produced for this browse have a value of in the browse_id
column in restab
. When you do not specify browse_id
, the default is 0.
Specify the number of words returned.
Specify the direction for the browse. You can specify one of:
value | behavior |
---|---|
BEFORE |
Browse seed word and words alphabetically before the seed. |
AROUND |
Browse seed word and words alphabetically before and after the seed. |
AFTER |
Browse seed word and words alphabetically after the seed. |
Symbols CTX_QUERY.BROWSE_BEFORE
, CTX_QUERY.BROWSE_AROUND
, and CTX_QUERY.BROWSE_AFTER
are defined for these literal values as well.
Specify the name of the index partition to browse.
Browsing Words with Result Table
begin ctx_query.browse_words('myindex','dog','myres',numwords=>5,direction=>'AROUND'); end; select word, doc_count from myres order by word; WORD DOC_COUNT -------- ---------- CZAR 15 DARLING 5 DOC 73 DUNK 100 EAR 3
Browsing Words with Result Array
set serveroutput on; declare resarr ctx_query.browse_tab; begin ctx_query.browse_words('myindex','dog',resarr,5,CTX_QUERY.BROWSE_AROUND); for i in 1..resarr.count loop dbms_output.put_line(resarr(i).word || ':' || resarr(i).doc_count); end loop; end;
Returns the number of hits for the specified query. You can call COUNT_HITS
in exact or estimate mode. Exact mode returns the exact number of hits for the query. Estimate mode returns an upper-bound estimate but runs faster than exact mode.
exec CTX_QUERY.COUNT_HITS( index_name IN VARCHAR2, text_query IN VARCHAR2, exact IN BOOLEAN DEFAULT TRUE, part_name IN VARCHAR2 DEFAULT NULL ) RETURN NUMBER;
exec CTX_QUERY.COUNT_HITS_CLOB_QUERY( index_name IN VARCHAR2, text_query IN CLOB, exact IN BOOLEAN DEFAULT TRUE, part_name IN VARCHAR2 DEFAULT NULL ) RETURN NUMBER;
Specify the index name.
Specify the query.
Specify TRUE
for an exact count. Specify FALSE
for an upper-bound estimate.
Specifying FALSE
returns a less accurate number but runs faster. Specifying FALSE
might return a number which is too high if rows have been updated or deleted since the last FULL
index optimize. Optimizing in full mode removes these false hits, and then EXACT
set to FALSE
will return the same number as EXACT
set to TRUE
.
Specify the name of the index partition to query.
If the query contains structured criteria, then you should use SELECT COUNT(*)
.
If the index was created with the TRANSACTIONAL
parameter, then COUNT_HITS
will include pending rowids as well as those that have been synchronized.
Use CTX_QUERY.EXPLAIN
to generate explain plan information for a query expression. The EXPLAIN
plan provides a graphical representation of the parse tree for a Text query expression. This information is stored in a result table.
This procedure does not execute the query. Instead, this procedure can tell you how a query is expanded and parsed before you enter the query. This is especially useful for stem, wildcard, thesaurus, fuzzy, soundex, or about queries. Parse trees also show the following information:
ABOUT
query normalization
Query expression optimization
Stop-word transformations
Breakdown of composite-word tokens
Knowing how Oracle Text evaluates a query is useful for refining and debugging queries. You can also design your application so that it uses the explain plan information to help users write better queries.
exec CTX_QUERY.EXPLAIN(
index_name IN VARCHAR2, text_query IN VARCHAR2, explain_table IN VARCHAR2, sharelevel IN NUMBER DEFAULT 0, explain_id IN VARCHAR2 DEFAULT NULL, part_name IN VARCHAR2 DEFAULT NULL
);
exec CTX_QUERY.EXPLAIN_CLOB_QUERY( index_name IN VARCHAR2, text_query IN CLOB, explain_table IN VARCHAR2, sharelevel IN NUMBER DEFAULT 0, explain_id IN VARCHAR2 DEFAULT NULL, part_name IN VARCHAR2 DEFAULT NULL );
Specify the name of the index to be queried.
Specify the query expression to be used as criteria for selecting rows.
When you include a wildcard, fuzzy, or soundex operator in text_query
, this procedure looks at the index tables to determine the expansion.
Wildcard, fuzzy (?), and soundex (!) expression feedback does not account for lazy deletes as in regular queries.
Specify the name of the table used to store representation of the parse tree for text_query. You must have at least INSERT
and DELETE
privileges on the table used to store the results from EXPLAIN
.
See Also:
"EXPLAIN Table" in Appendix A, "Oracle Text Result Tables" for more information about the structure of the explain table.Specify whether explain_table
is shared by multiple EXPLAIN
calls. Specify 0 for exclusive use and 1 for shared use. Default is 0 (single-use).
When you specify 0, the system automatically truncates the result table before the next call to EXPLAIN
.
When you specify 1 for shared use, this procedure does not truncate the result table. Only results with the same explain_id
are updated. When no results with the same explain_id
exist, new results are added to the EXPLAIN
table.
Specify a name that identifies the explain results returned by an EXPLAIN
procedure when more than one EXPLAIN
call uses the same shared EXPLAIN
table. Default is NULL
.
Specify the name of the index partition to query.
To create an explain table called test_explain
for example, use the following SQL statement:
create table test_explain( explain_id varchar2(30), id number, parent_id number, operation varchar2(30), options varchar2(30), object_name varchar2(64), position number, cardinality number);
To obtain the expansion of a query expression such as comp% OR ?smith, use CTX_QUERY.EXPLAIN
as follows:
ctx_query.explain( index_name => 'newindex', text_query => 'comp% OR ?smith', explain_table => 'test_explain', sharelevel => 0, explain_id => 'Test');
Retrieving Data from Explain Table
To read the explain table, you can select the columns as follows:
select explain_id, id, parent_id, operation, options, object_name, position from test_explain order by id;
The output is ordered by ID to simulate a hierarchical query:
EXPLAIN_ID ID PARENT_ID OPERATION OPTIONS OBJECT_NAME POSITION ----------- ---- --------- ------------ ------- ----------- -------- Test 1 0 OR NULL NULL 1 Test 2 1 EQUIVALENCE NULL COMP% 1 Test 3 2 WORD NULL COMPTROLLER 1 Test 4 2 WORD NULL COMPUTER 2 Test 5 1 EQUIVALENCE (?) SMITH 2 Test 6 5 WORD NULL SMITH 1 Test 7 5 WORD NULL SMYTHE 2
CTX_QUERY.EXPLAIN
does not support the use of query templates.
You cannot use CTX_QUERY.EXPLAIN
with remote queries.
If the query utilizes themes (for example, with an ABOUT
query), then a knowledge base must be installed. Such a knowledge base may or may not have been installed with Oracle Text. For more information on knowledge bases, see Oracle Text Application Developer's Guide.
In English or French, this procedure generates hierarchical query feedback information (broader term, narrower term, and related term) for the specified query.
Broader term, narrower term, and related term information is obtained from the knowledge base. However, only knowledge base terms that are also in the index are returned as query feedback information. This increases the chances that terms returned from HFEEDBACK
produce hits over the currently indexed document set.
Hierarchical query feedback information is useful for suggesting other query terms to the user.
Note:
CTX_QUERY.HFEEDBACK
requires an installed knowledge base. A knowledge base may or may not have been installed with Oracle Text. For more information on knowledge bases, see Oracle Text Application Developer's Guide.
CTX_QUERY.HFEEDBACK
is only supported in English and French.
exec CTX_QUERY.HFEEDBACK( index_name IN VARCHAR2, text_query IN VARCHAR2, feedback_table IN VARCHAR2, sharelevel IN NUMBER DEFAULT 0, feedback_id IN VARCHAR2 DEFAULT NULL, part_name IN VARCHAR2 DEFAULT NULL );
exec CTX_QUERY.HFEEDBACK_CLOB_QUERY( index_name IN VARCHAR2, text_query IN CLOB, feedback_table IN VARCHAR2, sharelevel IN NUMBER DEFAULT 0, feedback_id IN VARCHAR2 DEFAULT NULL, part_name IN VARCHAR2 DEFAULT NULL );
Specify the name of the index for the text column to be queried.
Specify the query expression to be used as criteria for selecting rows.
Specify the name of the table used to store the feedback terms.
See Also:
"HFEEDBACK Table" in Appendix A, "Oracle Text Result Tables" for more information about the structure of the explain table.Specify whether feedback_table
is shared by multiple HFEEDBACK
calls. Specify 0 for exclusive use and 1 for shared use. Default is 0 (single-use).
When you specify 0, the system automatically truncates the feedback table before the next call to HFEEDBACK
.
When you specify 1 for shared use, this procedure does not truncate the feedback table. Only results with the same feedback_id
are updated. When no results with the same feedback_id exist, new results are added to the feedback table.
Specify a value that identifies the feedback results returned by a call to HFEEDBACK
when more than one HFEEDBACK
call uses the same shared feedback table. Default is NULL
.
Specify the name of the index partition to query.
Create a result table to use with CTX_QUERY.HFEEDBACK
as follows:
CREATE TABLE restab ( feedback_id VARCHAR2(30), id NUMBER, parent_id NUMBER, operation VARCHAR2(30), options VARCHAR2(30), object_name VARCHAR2(80), position NUMBER, bt_feedback ctxsys.ctx_feedback_type, rt_feedback ctxsys.ctx_feedback_type, nt_feedback ctxsys.ctx_feedback_type ) NESTED TABLE bt_feedback STORE AS res_bt NESTED TABLE rt_feedback STORE AS res_rt NESTED TABLE nt_feedback STORE AS res_nt;
CTX_FEEDBACK_TYPE is a system-defined type in the CTXSYS
schema.
See Also:
"HFEEDBACK Table" in Appendix A, "Oracle Text Result Tables" for more information about the structure of theHFEEDBACK
table.The following code calls the HFEEDBACK
procedure with the query computer industry.
BEGIN ctx_query.hfeedback (index_name => 'my_index', text_query => 'computer industry', feedback_table => 'restab', sharelevel => 0, feedback_id => 'query10' ); END;
The following code extracts the feedback data from the result table. It extracts broader term, narrower term, and related term feedback separately from the nested tables.
DECLARE i NUMBER; BEGIN FOR frec IN ( SELECT object_name, bt_feedback, rt_feedback, nt_feedback FROM restab WHERE feedback_id = 'query10' AND object_name IS NOT NULL ) LOOP dbms_output.put_line('Broader term feedback for ' || frec.object_name || ':'); i := frec.bt_feedback.FIRST; WHILE i IS NOT NULL LOOP dbms_output.put_line(frec.bt_feedback(i).text); i := frec.bt_feedback.NEXT(i); END LOOP; dbms_output.put_line('Related term feedback for ' || frec.object_name || ':'); i := frec.rt_feedback.FIRST; WHILE i IS NOT NULL LOOP dbms_output.put_line(frec.rt_feedback(i).text); i := frec.rt_feedback.NEXT(i); END LOOP; dbms_output.put_line('Narrower term feedback for ' || frec.object_name || ':'); i := frec.nt_feedback.FIRST; WHILE i IS NOT NULL LOOP dbms_output.put_line(frec.nt_feedback(i).text); i := frec.nt_feedback.NEXT(i); END LOOP; END LOOP; END;
The following output is for the preceding example, which queries on computer industry:
Broader term feedback for computer industry: hard sciences Related term feedback for computer industry: computer networking electronics knowledge library science mathematics optical technology robotics satellite technology semiconductors and superconductors symbolic logic telecommunications industry Narrower term feedback for computer industry: ABEND - abnormal end of task AT&T Starlans ATI Technologies, Incorporated ActivCard Actrade International Ltd. Alta Technology Amiga Format Amiga Library Services Amiga Shopper Amstrat Action Apple Computer, Incorporated ..
Note:
TheHFEEDBACK
information you obtain depends on the contents of your index and knowledge base and as such might differ from the sample shown.CTX_QUERY.HFEEDBACK
does not support the use of query templates.
The CTX_QUERY.REMOVE_SQE
procedure removes the specified stored query expression.
CTX_QUERY.REMOVE_SQE(query_name IN VARCHAR2);
Specify the name of the stored query expression to be removed.
begin ctx_query.remove_sqe('disasters'); end;
This procedure executes an XML query and generates a result set in XML. The result set interface can return data views that are difficult to express in SQL, such as top N by category queries.
See Also:
Oracle Text Application Developer's Guide for details on how to use the result set interfaceCTX_QUERY.RESULT_SET ( index_name IN VARCHAR2, query IN VARCHAR2, result_set_descriptor IN CLOB, result_set IN OUT NOCOPY CLOB, part_name IN VARCHAR2 DEFAULT NULL);
Specify the index against which to execute the query.
Specify the query string.
Specify the result set descriptor in XML. It describes what the result set should contain. See "The Input Result Set Descriptor" for more details.
Specify the output result set. If this variable is NULL
on input, a session-duration temporary lob will be allocated and returned to the user. The user is responsible for deallocating this temporary lob. See "The Output Result Set XML" for more details.
Specify the index partition name. If the index is global, part_name
must be NULL
. If the index is partitioned and part_name
is not NULL
, then the query will only be evaluated for the given partition. If the index is partitioned and part_name
is NULL
, then the query will be evaluated for all partitions.
The Input Result Set Descriptor
The result set descriptor is an XML message which describes what to calculate for the result set. The elements present in the result set descriptor and the order in which they occur serve as a simple template, specifying what to include in the output result set. That is, there should be the list of hit rowids, then a count, then a token count, and so on. The attributes of the elements specify the parameters and options to the specific operations, such as number of hits in the list of rowids, estimate versus exact count, and so on.
The result set descriptor itself is XML conforming to the following DTD:
<!ELEMENT ctx_result_set_descriptor (hitlist?, group*, count?)> <!ELEMENT hitlist (rowid?, score?, sdata*)> <!ELEMENT group(count?)> <!ELEMENT count EMPTY> <!ELEMENT rowid EMPTY> <!ELEMENT score EMPTY> <!ELEMENT sdata EMPTY> <!ATTLIST group sdata CDATA #REQUIRED> <!ATTLIST hitlist start_num_hit integer #REQUIRED> <!ATTLIST hitlist end_num_hit integer #REQUIRED> <!ATTLIST hitlist order PCDATA #IMPLIED> <!ATTLIST count exact (TRUE|FALSE) "FALSE"> <!ATTLIST sdata name CDATA #REQUIRED>
The following is a description of the possible XML elements for the result set descriptor:
ctx_result_set_descriptor
This is the root element for the result set descriptor. The parent element is none, as are the available attributes.
The possible child elements are:
Zero or more hitlist
elements.
Zero or more group
elements.
At most one count
element.
group
The group
element causes the generated result set to include a group breakdown. In other words, a breakdown of the results by SDATA
section values. The parent element is ctx_result_set_descriptor
, and the available attributes are:
sdata
Specifies the name of the SDATA
section to use for grouping. It is required.
Possible child elements of group
are:
At most one count
element.
hitlist
The hitlist
element controls inclusion of a list of hit documents. The parent element is ctx_result_set_descriptor
, and the available attributes are:
start_hit_num
This specifies the starting document hit to be included in the generated result set. This can be set to any positive integer less than or equal to 2048. For example, if start_hit_num
is 21, then the result set will include document hits starting from the 21st document hit. This element is required.
end_hit_num
This specifies the last document hit to be included in the generated result set. This can be set to any positive integer less than or equal to 2048. For example, if end_hit_num
is 40, then the result set will include document hits up to the 40th document hit. This element is required.
order
This is an optional attribute that specifies the order for the documents in the generated result set. The value is a list similar to a SQL ORDER
BY
statement, except that, instead of column names, they can either be SCORE
or SDATA
section names. In the following example, MYDATE
and MYPRICE
are the SDATA
section names:
(order = "SCORE DESC, MYDATE, MYPRICE DESC")
The possible child elements are:
At most one rowid
element.
At most one score
element.
At most one sdata
element.
count
This element causes the generated result set to include a count of the number of hit documents. The parent elements are:
ctx_result_set_descriptor
group
The available attributes are:
exact
This is to estimate mode. Set to true
or false
. It is required, and the default is false
.
The possible child elements are none.
rowid
This child element causes the generated result set to include rowid information for each hit. The parent element is hitlist
. There are no attributes and no possible child elements.
score
This child element causes the generated result set to include score information for each hit.
The parent element is hitlist
.
There are no available attributes, and no possible child elements.
sdata
This child element causes the generated result set to include sdata
values for each hit.
The parent element is hitlist
.
The available attribute is name
. This specifies the name of the sdata
section. It is required.
There are no child elements.
The output result set XML is XML conforming to the following DTD:
<!ELEMENT ctx_result_set (hitlist?, groups*, count?)> <!ELEMENT hitlist (hit*)> <!ELEMENT hit(rowid?, score?, sdata*)> <!ELEMENT groups (group*)> <!ELEMENT group (count?)> <!ELEMENT count CDATA> <!ELEMENT rowid CDATA> <!ELEMENT score CDATA> <!ELEMENT sdata CDATA> <!ATTLIST groups sdata CDATA #REQUIRED> <!ATTLIST group value CDATA #REQUIRED> <!ATTLIST sdata name CDATA #REQUIRED>
The following is a description of the list of possible XML elements for the output result set:
ctx_result_set
This is the root element for the generated result set. There are no attributes. The parent is none. The possible child elements are:
At most one hitlist
element.
Zero or more groups
elements.
groups
This delimits the start of a group breakdown section. The parent element is ctx_result_set
. The available attributes are:
sdata
This is the name of the sdata
section used for grouping.
The possible child elements are:
Zero or more group
elements.
group
This delimits the start of a GROUP
BY
value. The parent element is the groups
element. The available attributes are:
value
This is the value of the sdata
section.
The possible child elements are at most one count
element.
hitlist
This delimits the start of hitlist
information. The parent element is ctx_result_set
, while the children are zero or more hit
elements. There are no attributes.
hit
This delimits the start of the information for a particular document within a hitlist
. The parent element is hitlist
, and there are no available attributes. The possible child elements are:
Zero or one rowid
elements.
Zero or one score
element.
Zero or one sdata
element.
rowid
This is the rowid of the document, so the content is the rowid of the document. The parent element is the hit
element. There are no child elements, and no available attributes.
score
This is the score of the document. The parent element is the hit element. The content is the numeric score. There are no available attributes, and no possible child elements.
sdata
This is the SDATA
value or values for the document. The parent element is the hit
element, and the available attribute is name
, which is the name of the sdata
section. There are no possible child elements available. The content is the SDATA
section value, which, for DATE
values, is in the format "YYYY-MM-DD HH24:MI:SS", depending upon the actual values being stored.
count
This is the document hit count. The parent element is the ctx_result_set
element or the group
element. It contains the numeric hit count, and has no attributes, and no possible child elements.
This call to ctx_query.result_set()
with the specified XML result_set_descriptor
will generate the following information in the form of XML:
top 20 hits displaying, score, rowid, author SDATA
section value, and pubDate SDATA
section value, order by pubDate SDATA
section value DESC
and score DESC
total doc hit count for the text query
counts group by pubDate SDATA
section values
counts group by author SDATA
section values
declare rs clob; begin dbms_lob.createtemporary(rs, true, dbms_lob.session); ctx_query.result_set('docidx', 'oracle', ' <ctx_result_set_descriptor> <count/> <hitlist start_hit_num="1" end_hit_num="5" order="pubDate desc, score desc"> <score/> <rowid/> <sdata name="author"/> <sdata name="pubDate"/> </hitlist> <group sdata="pubDate"> <count/> </group> <group sdata="author"> <count/> </group> </ctx_result_set_descriptor> ', rs); dbms_lob.freetemporary(rs); exception when others then dbms_lob.freetemporary(rs); raise; end; /
The XML output store in the result set output clob will resemble the following:
<ctx_result_set> <hitlist> <hit> <score>3</score><rowid>AAAPoEAABAAAMWsAAC</rowid> <sdata name="AUTHOR">John</sdata> <sdata name="PUBDATE">2001-01-03 00:00:00</sdata> </hit> <hit> <score>3</score><rowid>AAAPoEAABAAAMWsAAG</rowid> <sdata name="AUTHOR">John</sdata> <sdata name="PUBDATE">2001-01-03 00:00:00</sdata> </hit> <hit> <score>3</score><rowid>AAAPoEAABAAAMWsAAK</rowid> <sdata name="AUTHOR">John</sdata> <sdata name="PUBDATE">2001-01-03 00:00:00</sdata> </hit> <hit> <score>3</score><rowid>AAAPoEAABAAAMWsAAO</rowid> <sdata name="AUTHOR">John</sdata> <sdata name="PUBDATE">2001-01-03 00:00:00</sdata> </hit> <hit> <score>3</score><rowid>AAAPoEAABAAAMWsAAS</rowid> <sdata name="AUTHOR">John</sdata> <sdata name="PUBDATE">2001-01-03 00:00:00</sdata> </hit> </hitlist> <count>100</count> <groups sdata="PUBDATE"> <group value="2001-01-01 00:00:00"><count>25</count></group> <group value="2001-01-02 00:00:00"><count>50</count></group> <group value="2001-01-03 00:00:00"><count>25</count></group> </groups> <groups sdata="AUTHOR"> <group value="John"><count>50</count></group> <group value="Mike"><count>25</count></group> <group value="Steve"><count>25</count></group> </groups> </ctx_result_set>
This procedure creates a stored query expression. Only the query definition is stored.
Stored query expressions support all of the CONTAINS
query operators. Stored query expressions also support all of the special characters and other components that can be used in a query expression, including other stored query expressions.
Users are allowed to create and remove stored query expressions owned by them. Users are allowed to use stored query expressions owned by anyone. The CTXSYS
user can create or remove stored query expressions for any user.
CTX_QUERY.STORE_SQE(query_name IN VARCHAR2, text_query IN VARCHAR2);
CTX_QUERY.STORE_SQE_CLOB_QUERY(query_name IN VARCHAR2, text_query IN CLOB);
Specify the name of the stored query expression to be created.
Specify the query expression to be associated with query_name
.
begin ctx_query.store_sqe('disasters', 'hurricanes | earthquakes'); end;