Working with KeyMatches


KeyMatches are an extension of Google Enterprise's KeyMatch functionality. A KeyMatch provides a separate set of data that can be used alongside the standard Lucene search results. The KeyMatch feature is largely independent of the search functionality and can be used entirely on its own. There are three KeyMatch types available. The descriptions below (from Google's support documentation) explain how the types differ.

KeyMatch types (none of the criteria are case-sensitive), illustrated with the search query "Abraham Lincoln":

KeywordMatch
  Criteria: A word that must appear anywhere in the query.
  If search query is "Abraham Lincoln": KeywordMatches = "Abraham" and "Lincoln"
  Reason for KeyMatch: If your KeywordMatch is "Abraham Lincoln", the search query must include both "Abraham" and "Lincoln" to trigger this KeywordMatch. To get a KeywordMatch for either "Abraham" or "Lincoln," enter two KeywordMatches: one for "Abraham" and one for "Lincoln."

PhraseMatch
  Criteria: A phrase that appears anywhere in the query. For the phrase to match, all of the words must be present, the order of the words must be the same with no intervening words, and any hyphens in the query must be matched.
  If search query is "Abraham Lincoln": PhraseMatch = "Abraham Lincoln," "President Abraham Lincoln," "Abraham Lincoln president," and "young Abraham Lincoln"
  Reason for KeyMatch: These are all phrase KeyMatches because the words appear in the order entered in the search query, "Abraham Lincoln." "Abraham the Tall Lincoln" is not a PhraseMatch because "the Tall" separates the phrase "Abraham Lincoln."

ExactMatch
  Criteria: The phrase must exactly match the query.
  If search query is "Abraham Lincoln": ExactMatch = "Abraham Lincoln"
  Reason for KeyMatch: Only "Abraham Lincoln" is an ExactMatch for the query. "President Abraham Lincoln" and "Abraham Lincoln's" are not ExactMatches.
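The three matching rules can be sketched as plain functions. This is an illustrative Python sketch, not the library's implementation: the function names and the simple whitespace tokenizer are assumptions made for the example.

```python
def tokenize(text):
    """Lowercase and split on whitespace (matching is not case-sensitive)."""
    return text.lower().split()


def keyword_match(entry, query):
    """Every word of the entry appears somewhere in the query, in any order."""
    entry_words = tokenize(entry)
    query_words = set(tokenize(query))
    return bool(entry_words) and all(w in query_words for w in entry_words)


def phrase_match(entry, query):
    """The entry's words appear in the query in order, with nothing in between."""
    e, q = tokenize(entry), tokenize(query)
    if not e:
        return False
    return any(q[i:i + len(e)] == e for i in range(len(q) - len(e) + 1))


def exact_match(entry, query):
    """The query consists of exactly the entry's words and nothing else."""
    e = tokenize(entry)
    return bool(e) and e == tokenize(query)
```

Because the tokenizer splits only on whitespace, a hyphenated or possessive token such as "Lincoln's" stays intact and does not match "Lincoln", which is consistent with the ExactMatch description.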

 

There are two main parts to the KeyMatch feature. The first is the KeyMatchSearcher, which runs a reverse query: instead of running the user's query against the KeyMatch entries, it treats each KeyMatch entry as a search query and runs it against the user's query. For instance, if you pass the query string "hello world", the string is tokenized and each KeyMatch entry is then run as a search query against those tokens.
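The reverse-query idea can be sketched by treating the user's query as a tiny one-document index and running each entry against it. This is a hypothetical Python illustration, not the KeyMatchSearcher's actual code: the entry dictionaries, the type labels "keyword", "phrase", and "exact", and every function name are assumptions.

```python
from collections import defaultdict


def index_query(query):
    """Index the user's query as a single document: token -> positions."""
    positions = defaultdict(list)
    tokens = query.lower().split()
    for i, token in enumerate(tokens):
        positions[token].append(i)
    return positions, len(tokens)


def entry_hits(terms_text, match_type, positions, length):
    """Run one KeyMatch entry as a query against the indexed user query."""
    terms = terms_text.lower().split()
    if not terms or any(t not in positions for t in terms):
        return False  # every type needs all of the entry's terms present
    if match_type == "keyword":
        return True
    # Start positions where the entry's terms occur consecutively.
    starts = [s for s in positions[terms[0]]
              if all(s + k in positions[t] for k, t in enumerate(terms))]
    if match_type == "phrase":
        return bool(starts)
    if match_type == "exact":
        # The phrase must cover the whole query.
        return 0 in starts and len(terms) == length
    raise ValueError(f"unknown match type: {match_type}")


def reverse_search(entries, query):
    """Return the entries whose terms match the user's query."""
    positions, length = index_query(query)
    return [e for e in entries
            if entry_hits(e["terms"], e["type"], positions, length)]
```

With entries for "hello" (keyword), "hello world" (exact), and "world hello" (phrase), the query "Hello World" would return the first two: the third fails because its words appear in the wrong order.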

The second part is the set of source classes. There are two built-in sources: ExportedDataKeyMatchSource and LuceneDataKeyMatchSource. Each of these inherits from the KeyMatchSource base class, which requires its own KeyMatchSearcher object. LuceneDataKeyMatchSource also takes a KeyMatchFieldDefinition object on creation.

The KeyMatchFieldDefinition class has three fields, each of which maps to a field name in the Lucene index. The keyword field names the index field from which the KeyMatch keywords are pulled; the title field names the index field that holds the KeyMatch title; and the url field names the index field that holds the KeyMatch URL. There is also an optional field for defining a specific source name to search on. If the Lucene index has multiple sources indexed, this field can be used to restrict the lookup to a single one.
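As a rough illustration of how such a field mapping might be applied, the following Python sketch maps an indexed document (modeled as a plain dict) to a KeyMatch record. It does not reproduce the real KeyMatchFieldDefinition class: the attribute names, the "source" index field, and the to_keymatch helper are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldDefinition:
    """Hypothetical stand-in: index field names for each KeyMatch value."""
    keyword_field: str
    title_field: str
    url_field: str
    source_name: Optional[str] = None  # optional: limit to one indexed source


def to_keymatch(doc, fields, source_field="source"):
    """Build a KeyMatch record from one document, or None if it is skipped."""
    if fields.source_name is not None and doc.get(source_field) != fields.source_name:
        return None  # document belongs to a different source
    if fields.keyword_field not in doc:
        return None  # no KeyMatch keywords stored on this document
    return {
        "keywords": doc[fields.keyword_field],
        "title": doc.get(fields.title_field, ""),
        "url": doc.get(fields.url_field, ""),
    }
```

Leaving source_name unset accepts documents from every source, which mirrors the optional nature of that field.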

Each of the source classes also has a ReadKeyMatchData() method. LuceneDataKeyMatchSource takes an IndexSearcher object, which it uses to query the index for KeyMatch data. ExportedDataKeyMatchSource takes a string path to a file. Generally, this .csv file is exported from the Google Enterprise dashboard, but a file can be built with other programs as well. Each row contains four columns: keywords, KeyMatch type, title, and URL.
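To make the file format concrete, here is a minimal Python sketch that reads rows in the column order keywords, KeyMatch type, title, url. It is not the ExportedDataKeyMatchSource implementation; the function name, the sample rows, and the type labels used in the sample are assumptions, and a real export may also contain a header row that would need to be skipped.

```python
import csv
import io


def read_keymatch_rows(lines):
    """Parse rows of: keywords, KeyMatch type, title, url."""
    entries = []
    for row in csv.reader(lines):
        if len(row) < 4:
            continue  # skip blank or malformed rows
        keywords, match_type, title, url = (cell.strip() for cell in row[:4])
        entries.append({"keywords": keywords, "type": match_type,
                        "title": title, "url": url})
    return entries


# Made-up sample data; a title containing a comma must be quoted.
sample = io.StringIO(
    "abraham lincoln,PhraseMatch,Lincoln Biography,http://example.com/lincoln\n"
    'civil war,KeywordMatch,"Civil War, an Overview",http://example.com/civil-war\n'
)
entries = read_keymatch_rows(sample)
```

For a file on disk, the same function can be called with `open(path, newline="")`, since csv.reader accepts any iterable of lines.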

There is also a wrapper class that helps manage multiple sources. The KeyMatchSearchCollector lets you register a KeyMatchSearcher (which can be obtained from a source object once it is built) under a source name. You can assign multiple KeyMatchSearcher objects to the same source name. The collector also provides methods for searching a single source, multiple sources, or all sources at once.
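The collector pattern described here can be sketched as a mapping from source name to a list of searchers. This Python sketch is only an illustration of the idea: the class name, method names, and the choice to model a searcher as a callable that takes a query and returns a list are assumptions, not the KeyMatchSearchCollector API.

```python
from collections import defaultdict


class SearchCollector:
    """Hypothetical collector: groups searchers under source names."""

    def __init__(self):
        self._searchers = defaultdict(list)

    def add(self, source_name, searcher):
        """Register a searcher; one name may hold several searchers."""
        self._searchers[source_name].append(searcher)

    def search_source(self, source_name, query):
        """Run the query through every searcher registered for one source."""
        results = []
        for searcher in self._searchers.get(source_name, []):
            results.extend(searcher(query))
        return results

    def search_sources(self, source_names, query):
        """Search several named sources; results are keyed by source name."""
        return {name: self.search_source(name, query) for name in source_names}

    def search_all(self, query):
        """Search every registered source."""
        return self.search_sources(list(self._searchers), query)
```

Keying the combined results by source name keeps it clear which source produced each KeyMatch; a flat merged list would be an equally reasonable design.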