com.raritantechnologies.concept.classifier
Interface IDocumentMatcher

All Superinterfaces:
IConfigurable, ITermExtractor
All Known Implementing Classes:
AccrueDocumentMatcher, AndDocumentMatcher, AndNotDocumentMatcher, BasicDocumentMatcher, CompositeDocumentMatcher, CountDocumentMatcher, FieldValueMatcher, NearDocumentMatcher, NotDocumentMatcher, PhraseDocumentMatcher, RangeDocumentMatcher, TermDocumentMatcher, WildcardDocumentMatcher

public interface IDocumentMatcher
extends ITermExtractor

Base interface for objects that can match a document (text stream) based on a set of query rules (a topic). Uses an IndexedDocument to store token maps.

Constructed from rule sets by IDocumentMatcherFactory or query strings by IMatcherParser implementations.

Used by the DocumentClassifier.

Document matchers can also be used as term extractors. This allows complex query rules to be written so that terms are extracted from a document only if the matcher criterion is satisfied. This matters, for example, in clustering, where documents can be associated by shared keywords (see the ResultKeywordClusterer class).
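The gating behaviour described above can be illustrated with a minimal sketch. It does not use the real Raritan classes: GatedExtractionSketch, its constructor, and the setOf helper are hypothetical stand-ins, and a plain set of tokens stands in for IndexedDocument. It assumes the matcher criterion is "all required terms are present".

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a matcher that doubles as a term extractor.
// Not the Raritan implementation; a Set<String> replaces IndexedDocument.
final class GatedExtractionSketch {
    private final Set<String> required;  // terms that must all be present to match
    private final Set<String> keywords;  // terms worth extracting once matched

    GatedExtractionSketch(Set<String> required, Set<String> keywords) {
        this.required = required;
        this.keywords = keywords;
    }

    /** The matcher criterion: every required term occurs in the document. */
    boolean matches(Set<String> docTokens) {
        return docTokens.containsAll(required);
    }

    /** Adds keywords found in the document, but only if the criterion holds. */
    void extractTerms(Set<String> docTokens, Set<String> termsSet) {
        if (!matches(docTokens)) {
            return; // criterion not satisfied: extract nothing
        }
        for (String keyword : keywords) {
            if (docTokens.contains(keyword)) {
                termsSet.add(keyword);
            }
        }
    }

    /** Small helper for building sets in examples. */
    static Set<String> setOf(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }
}
```

With required = {java} and keywords = {search, cluster}, a document containing "java" and "search" contributes "search" to the set, while a document without "java" contributes nothing even if it contains "search".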


Developed by Raritan Technologies Inc.

Author:
Ted Sullivan

Method Summary
 void addAttribute(java.lang.String attrName, java.lang.Object attrValue)
          DocumentMatcher attributes: these are the values ("tags") added to documents that are matched by the IDocumentMatcher.
 void addTermsAsAttributes(java.lang.String termAttribute, java.lang.String delimiter)
          Calling this method will cause all of the Terms and Phrases contained in this matcher to be added as a matcher attribute or "tag".
 void extractTerms(IndexedDocument fromDocument, java.util.HashMap termsMap)
          Extracts the matching terms contained in the document.
 void extractTerms(IndexedDocument fromDocument, java.util.Set termsSet)
           
 java.lang.Object getAttribute(java.lang.String attrName)
           
 java.util.Iterator getAttributeNames()
           
 DocumentMatchBean getMatchCriteria(IndexedDocument document)
          Returns a DocumentMatchBean containing the match criteria (the category or categories that specify the 'reason' or context of the match).
 DocumentMatchBean getMatchCriteria(IndexedDocument document, java.util.Map termsMap)
          Returns a DocumentMatchBean containing the match criteria (the category or categories that specify the 'reason' or context of the match).
 java.lang.String getName()
           
 java.util.Set getPhraseSet()
          Returns the set of phrases in all contained PhraseDocumentMatchers.
 java.util.Set getTermSet()
          Returns the set of terms in all contained TermDocumentMatchers.
 boolean isStopWord(IndexedDocument document)
          Adds stop word support.
 boolean matches(IndexedDocument document)
          Returns true if the matcher matches the IndexedDocument, false otherwise.
 java.lang.String render()
          Renders a human-readable version of the matcher's logic.
 void setName(java.lang.String name)
          Sets the name, a unique key that identifies this IDocumentMatcher.
 
Methods inherited from interface com.raritantechnologies.utils.tagging.ITermExtractor
extractTerms
 
Methods inherited from interface com.raritantechnologies.searchApp.IConfigurable
initialize
 

Method Detail

setName

public void setName(java.lang.String name)
Sets the name, a unique key that identifies this IDocumentMatcher.


getName

public java.lang.String getName()

getMatchCriteria

public DocumentMatchBean getMatchCriteria(IndexedDocument document)
Returns a DocumentMatchBean containing the match criteria (the category or categories that specify the 'reason' or context of the match).


getMatchCriteria

public DocumentMatchBean getMatchCriteria(IndexedDocument document,
                                          java.util.Map termsMap)
Returns a DocumentMatchBean containing the match criteria (the category or categories that specify the 'reason' or context of the match). Adds any contained terms or phrases to the termsMap.


matches

public boolean matches(IndexedDocument document)
Returns true if the matcher matches the IndexedDocument, false otherwise.
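
As a rough illustration of how matches() might compose across implementations such as TermDocumentMatcher, AndDocumentMatcher and NotDocumentMatcher, here is a minimal sketch. Everything in it is a hypothetical stand-in: MatcherSketch, Doc, Matcher and the term/and/not factories are not the Raritan API, and Doc is only a set of lower-cased tokens rather than a real IndexedDocument.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins only; the real classes are not reproduced here.
final class MatcherSketch {

    /** Stand-in for IndexedDocument: a set of lower-cased word tokens. */
    static final class Doc {
        final Set<String> tokens = new HashSet<>();

        Doc(String text) {
            for (String token : text.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        }
    }

    /** Stand-in for the matches() part of IDocumentMatcher. */
    interface Matcher {
        boolean matches(Doc document);
    }

    /** Matches when the document contains the given term. */
    static Matcher term(String term) {
        String needle = term.toLowerCase();
        return doc -> doc.tokens.contains(needle);
    }

    /** Matches when every child matcher matches. */
    static Matcher and(Matcher... children) {
        return doc -> {
            for (Matcher child : children) {
                if (!child.matches(doc)) {
                    return false;
                }
            }
            return true;
        };
    }

    /** Matches when the child matcher does not match. */
    static Matcher not(Matcher child) {
        return doc -> !child.matches(doc);
    }
}
```

A topic such as "solar AND NOT wind" would then be built as and(term("solar"), not(term("wind"))) and evaluated with a single matches(doc) call.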


getTermSet

public java.util.Set getTermSet()
Returns the set of terms in all contained TermDocumentMatchers.


getPhraseSet

public java.util.Set getPhraseSet()
Returns the set of phrases in all contained PhraseDocumentMatchers.


addAttribute

public void addAttribute(java.lang.String attrName,
                         java.lang.Object attrValue)
DocumentMatcher attributes: these are the values ("tags") added to documents that are matched by the IDocumentMatcher.


getAttribute

public java.lang.Object getAttribute(java.lang.String attrName)

getAttributeNames

public java.util.Iterator getAttributeNames()

addTermsAsAttributes

public void addTermsAsAttributes(java.lang.String termAttribute,
                                 java.lang.String delimiter)
Calling this method will cause all of the Terms and Phrases contained in this matcher to be added as a matcher attribute or "tag".
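
The documentation does not say how the terms and phrases are stored under the attribute. The sketch below assumes one plausible reading: they are joined into a single string with the given delimiter and stored under termAttribute. TermsAsAttributeSketch is a hypothetical stand-in, not the Raritan implementation, and the sorted ordering is an arbitrary choice made to keep the result predictable.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in; the joined-string behaviour is an assumption.
final class TermsAsAttributeSketch {
    private final Set<String> terms = new TreeSet<>();
    private final Set<String> phrases = new TreeSet<>();
    private final Map<String, Object> attributes = new LinkedHashMap<>();

    TermsAsAttributeSketch(Collection<String> terms, Collection<String> phrases) {
        this.terms.addAll(terms);
        this.phrases.addAll(phrases);
    }

    void addAttribute(String attrName, Object attrValue) {
        attributes.put(attrName, attrValue);
    }

    Object getAttribute(String attrName) {
        return attributes.get(attrName);
    }

    Iterator<String> getAttributeNames() {
        return attributes.keySet().iterator();
    }

    /** Joins terms first, then phrases, and stores the result as one attribute. */
    void addTermsAsAttributes(String termAttribute, String delimiter) {
        List<String> all = new ArrayList<>(terms);
        all.addAll(phrases);
        addAttribute(termAttribute, String.join(delimiter, all));
    }
}
```

Under this assumption, a matcher holding the terms "solar" and "wind" and the phrase "renewable energy" would, after addTermsAsAttributes("keywords", "|"), expose the attribute keywords = "solar|wind|renewable energy".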


extractTerms

public void extractTerms(IndexedDocument fromDocument,
                         java.util.HashMap termsMap)
Extracts the matching terms contained in the document.

Parameters:
fromDocument - The Document to be processed
termsMap - The map into which the extracted terms should be put. The key of the map is the term that hit. The value is an AttributeWordTagger that can be used to mark up the document.

extractTerms

public void extractTerms(IndexedDocument fromDocument,
                         java.util.Set termsSet)

render

public java.lang.String render()
Renders a human-readable version of the matcher's logic.


isStopWord

public boolean isStopWord(IndexedDocument document)
Adds stop word support. This is typically done by checking whether the matcher's terms are stop words, by calling the IndexedDocument method isStopWord(string). See TermDocumentMatcher.
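
A minimal sketch of the delegation described above, assuming a single-term matcher that simply asks the document whether its term is a stop word. StopWordSketch, Doc and TermMatcher are hypothetical stand-ins, and how the real IndexedDocument decides what counts as a stop word is not shown in this page.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins for IndexedDocument and TermDocumentMatcher.
final class StopWordSketch {

    /** Stand-in document that knows its own stop word list. */
    static final class Doc {
        private final Set<String> stopWords = new HashSet<>();

        Doc(String... stopWords) {
            for (String word : stopWords) {
                this.stopWords.add(word.toLowerCase());
            }
        }

        boolean isStopWord(String word) {
            return stopWords.contains(word.toLowerCase());
        }
    }

    /** Stand-in single-term matcher that delegates to the document. */
    static final class TermMatcher {
        private final String term;

        TermMatcher(String term) {
            this.term = term;
        }

        boolean isStopWord(Doc document) {
            return document.isStopWord(term);
        }
    }
}
```

A caller could use this check to skip matchers whose only term is something like "the", which would otherwise match nearly every document.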