27.5 Example: Building an ESA Model with a Wiki Dataset
This example uses the FEATURE_COMPARE function with an Explicit Semantic Analysis (ESA) model to compare first a pair of similar texts and then a pair of dissimilar texts.
The example shows an ESA model built against a 2005 Wiki dataset rendering over 200,000 features. The documents are mined as text and the document titles are given as the feature IDs.
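The build step itself is not listed here. As a rough sketch only, a model of this kind might be created with DBMS_DATA_MINING.CREATE_MODEL as follows. The source table WIKI_ARTICLES (with a TITLE column and a TEXT column), the settings table ESA_WIKI_SETTINGS, and the Oracle Text policy ESA_WIKI_POLICY are hypothetical names chosen for illustration; only the model name esa_wiki_mod comes from the queries in this example. The TITLE column is passed as the case ID so that document titles become the feature IDs.

```sql
-- Hypothetical source table, one row per article:
--   WIKI_ARTICLES(TITLE VARCHAR2, TEXT CLOB)

-- Settings table read by CREATE_MODEL (table name is an assumption).
CREATE TABLE esa_wiki_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(4000)
);

DECLARE
  v_xlst DBMS_DATA_MINING_TRANSFORM.TRANSFORM_LIST;
BEGIN
  -- Oracle Text policy used to tokenize the text column (policy name is an assumption).
  CTX_DDL.CREATE_POLICY('ESA_WIKI_POLICY');

  -- Select the ESA algorithm, turn on automatic data preparation,
  -- and point the model at the text policy.
  INSERT INTO esa_wiki_settings VALUES
    (DBMS_DATA_MINING.ALGO_NAME, DBMS_DATA_MINING.ALGO_EXPLICIT_SEMANTIC_ANALYS);
  INSERT INTO esa_wiki_settings VALUES
    (DBMS_DATA_MINING.PREP_AUTO, DBMS_DATA_MINING.PREP_AUTO_ON);
  INSERT INTO esa_wiki_settings VALUES
    (DBMS_DATA_MINING.ODMS_TEXT_POLICY_NAME, 'ESA_WIKI_POLICY');
  COMMIT;

  -- Declare the TEXT column as unstructured text so it is mined as text.
  DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM(
    v_xlst, 'TEXT', NULL, 'TEXT', 'TEXT',
    'TEXT(POLICY_NAME:ESA_WIKI_POLICY)');

  -- Build the feature extraction model; TITLE supplies the feature IDs.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'ESA_WIKI_MOD',
    mining_function     => DBMS_DATA_MINING.FEATURE_EXTRACTION,
    data_table_name     => 'WIKI_ARTICLES',
    case_id_column_name => 'TITLE',
    settings_table_name => 'ESA_WIKI_SETTINGS',
    xform_list          => v_xlst);
END;
/
```

The settings shown are a minimal set. A real build against a dataset of this size would likely tune further ESA and text settings, which this sketch does not attempt to reproduce.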
Similar Texts
SELECT 1-FEATURE_COMPARE(esa_wiki_mod USING 'There are several PGA tour golfers from South Africa' text AND USING 'Nick Price won the 2002 Mastercard Colonial Open' text) similarity FROM DUAL;
SIMILARITY
----------
.258
FEATURE_COMPARE returns a distance, so a smaller number represents more similar texts. Subtracting the distance from 1, as these queries do, gives a similarity score.
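To see the two values side by side, a query along the following lines could return the raw distance next to the derived similarity. It reuses the esa_wiki_mod model and the texts from the similar-texts query; only the column alias distance is new. With the similarity of .258 reported above, the distance would be 1 - .258 = .742.

```sql
-- Sketch: return the raw FEATURE_COMPARE distance alongside 1 - distance.
SELECT FEATURE_COMPARE(esa_wiki_mod
         USING 'There are several PGA tour golfers from South Africa' text
         AND USING 'Nick Price won the 2002 Mastercard Colonial Open' text) distance,
       1 - FEATURE_COMPARE(esa_wiki_mod
         USING 'There are several PGA tour golfers from South Africa' text
         AND USING 'Nick Price won the 2002 Mastercard Colonial Open' text) similarity
FROM DUAL;
```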
Dissimilar Texts
SELECT 1-FEATURE_COMPARE(esa_wiki_mod USING 'There are several PGA tour golfers from South Africa' text AND USING 'John Elway played quarterback for the Denver Broncos' text) similarity FROM DUAL;
SIMILARITY
----------
.007