Base interface for objects that can match a document (text stream) based on a set of query rules (a topic). Uses an IndexedDocument to store token maps.
Matchers are constructed from rule sets by IDocumentMatcherFactory,
or from query strings by IMatcherParser implementations.
They are used by the DocumentClassifier.
Document matchers can also be used as term extractors. This allows complex query rules to be developed so
that terms are extracted from a document only if the matcher criterion is satisfied. This is important,
for example, in clustering, where documents can be associated by shared keywords (see the
ResultKeywordClusterer class).
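The match-gated extraction described above can be sketched as follows. This is only an illustration: SimpleDocument and AllTermsMatcher are hypothetical stand-ins for IndexedDocument and an IDocumentMatcher implementation, and the "all terms must be present" rule is an assumed criterion, not the library's actual matching logic.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class MatchGatedExtractionSketch {

    /** Hypothetical stand-in for IndexedDocument: a set of lower-cased tokens. */
    static final class SimpleDocument {
        final Set<String> tokens = new HashSet<>();

        SimpleDocument(String text) {
            for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
        }
    }

    /** Hypothetical matcher: the criterion is that every required term is present. */
    static final class AllTermsMatcher {
        private final Set<String> terms = new LinkedHashSet<>();

        AllTermsMatcher(String... required) {
            for (String t : required) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }

        boolean matches(SimpleDocument doc) {
            return doc.tokens.containsAll(terms);
        }

        /** Adds the matcher's terms to termsSet only if the whole criterion is satisfied. */
        void extractTerms(SimpleDocument doc, Set<String> termsSet) {
            if (matches(doc)) {
                termsSet.addAll(terms);
            }
        }
    }

    /** Convenience wrapper: returns the terms extracted from text, or an empty set. */
    public static Set<String> extract(String text, String... required) {
        Set<String> out = new LinkedHashSet<>();
        new AllTermsMatcher(required).extractTerms(new SimpleDocument(text), out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract("Solar panel prices fall", "solar", "panel"));
        System.out.println(extract("Solar flares disrupt radio", "solar", "panel"));
    }
}
```

In this sketch the second document contains "solar" but yields no terms, because the matcher criterion as a whole is not met. That is the behavior that makes a matcher useful as a keyword source for clustering.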
| Method Summary | |
void |
addAttribute(java.lang.String attrName,
java.lang.Object attrValue)
Adds a DocumentMatcher attribute. Attributes are what is added to documents that are matched by the IDocumentMatcher. |
void |
addTermsAsAttributes(java.lang.String termAttribute,
java.lang.String delimiter)
Calling this method will cause all of the Terms and Phrases contained in this matcher to be added as a matcher attribute or "tag". |
void |
extractTerms(IndexedDocument fromDocument,
java.util.HashMap termsMap)
Extracts the matching terms contained in the document. |
void |
extractTerms(IndexedDocument fromDocument,
java.util.Set termsSet)
|
java.lang.Object |
getAttribute(java.lang.String attrName)
|
java.util.Iterator |
getAttributeNames()
|
DocumentMatchBean |
getMatchCriteria(IndexedDocument document)
Returns a DocumentMatchBean containing the match criteria (the category or categories that specify the 'reason' or context of the match). |
DocumentMatchBean |
getMatchCriteria(IndexedDocument document,
java.util.Map termsMap)
Returns a DocumentMatchBean containing the match criteria (the category or categories that specify the 'reason' or context of the match). |
java.lang.String |
getName()
|
java.util.Set |
getPhraseSet()
Returns the set of phrases in all contained PhraseDocumentMatchers. |
java.util.Set |
getTermSet()
Returns the set of terms in all contained TermDocumentMatchers. |
boolean |
isStopWord(IndexedDocument document)
Adds stop word support. |
boolean |
matches(IndexedDocument document)
Returns true if the matcher matches the IndexedDocument, false otherwise. |
java.lang.String |
render()
Renders a human-readable version of the matcher's logic. |
void |
setName(java.lang.String name)
Sets the name, a unique key that identifies this IDocumentMatcher. |
| Methods inherited from interface com.raritantechnologies.utils.tagging.ITermExtractor |
extractTerms |
| Methods inherited from interface com.raritantechnologies.searchApp.IConfigurable |
initialize |
| Method Detail |
public void setName(java.lang.String name)
public java.lang.String getName()
public DocumentMatchBean getMatchCriteria(IndexedDocument document)
public DocumentMatchBean getMatchCriteria(IndexedDocument document,
java.util.Map termsMap)
public boolean matches(IndexedDocument document)
public java.util.Set getTermSet()
public java.util.Set getPhraseSet()
public void addAttribute(java.lang.String attrName,
java.lang.Object attrValue)
public java.lang.Object getAttribute(java.lang.String attrName)
public java.util.Iterator getAttributeNames()
public void addTermsAsAttributes(java.lang.String termAttribute,
java.lang.String delimiter)
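As a rough illustration of how the attribute methods fit together, here is a minimal sketch. MatcherAttributesSketch is a hypothetical class, not the library's implementation, and the way addTermsAsAttributes joins the terms into a single delimited value is an assumption inferred from the delimiter parameter.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the attribute-related methods of a matcher. */
public class MatcherAttributesSketch {
    private final Map<String, Object> attributes = new LinkedHashMap<>();
    private final Set<String> terms = new LinkedHashSet<>();

    public MatcherAttributesSketch(String... matcherTerms) {
        for (String t : matcherTerms) {
            terms.add(t);
        }
    }

    public void addAttribute(String attrName, Object attrValue) {
        attributes.put(attrName, attrValue);
    }

    public Object getAttribute(String attrName) {
        return attributes.get(attrName);
    }

    public Iterator<String> getAttributeNames() {
        return attributes.keySet().iterator();
    }

    /** Assumption: the terms are joined with the delimiter into one attribute value. */
    public void addTermsAsAttributes(String termAttribute, String delimiter) {
        addAttribute(termAttribute, String.join(delimiter, terms));
    }

    /** Builds a matcher, tags it, and lists its attributes as name=value lines. */
    public static String demo() {
        MatcherAttributesSketch m = new MatcherAttributesSketch("solar", "photovoltaic");
        m.addAttribute("topic", "energy");
        m.addTermsAsAttributes("keywords", ";");
        StringBuilder sb = new StringBuilder();
        for (Iterator<String> it = m.getAttributeNames(); it.hasNext(); ) {
            String name = it.next();
            sb.append(name).append('=').append(m.getAttribute(name)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```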
public void extractTerms(IndexedDocument fromDocument,
java.util.HashMap termsMap)
fromDocument - The document to be processed.
termsMap - The map into which the extracted terms should be put. The
key of the map is the term that hit. The value is an
AttributeWordTagger that can be used to mark up the
document.
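The map contract described above (key is the term that hit, value is a tagger used for markup) might be consumed along these lines. SimpleTagger is a hypothetical stand-in for AttributeWordTagger, whose real API is not shown here, and the token list stands in for IndexedDocument. The generic type parameters are also an addition; the documented signature uses a raw java.util.HashMap.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;

public class ExtractTermsMapSketch {

    /** Hypothetical stand-in for AttributeWordTagger: wraps a word in a tag. */
    static final class SimpleTagger {
        private final String attribute;

        SimpleTagger(String attribute) {
            this.attribute = attribute;
        }

        String tag(String word) {
            return "<" + attribute + ">" + word + "</" + attribute + ">";
        }
    }

    /** Puts each matcher term found in the document into termsMap, keyed by the term. */
    static void extractTerms(List<String> docTokens, List<String> matcherTerms,
                             HashMap<String, SimpleTagger> termsMap) {
        for (String term : matcherTerms) {
            if (docTokens.contains(term)) {
                termsMap.put(term, new SimpleTagger("hit"));
            }
        }
    }

    /** Extracts terms from the text, then uses the map values to mark up the text. */
    public static String markUp(String text, List<String> matcherTerms) {
        List<String> tokens = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\s+"));
        HashMap<String, SimpleTagger> termsMap = new HashMap<>();
        extractTerms(tokens, matcherTerms, termsMap);

        StringBuilder out = new StringBuilder();
        for (String token : tokens) {
            SimpleTagger tagger = termsMap.get(token);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(tagger == null ? token : tagger.tag(token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(markUp("wind and solar power", Arrays.asList("solar", "wind")));
    }
}
```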
public void extractTerms(IndexedDocument fromDocument,
java.util.Set termsSet)
public java.lang.String render()
public boolean isStopWord(IndexedDocument document)
TermDocumentMatcher.