com.raritantechnologies.concept.classifier
Class AccrueDocumentMatcher

java.lang.Object
  extended by com.raritantechnologies.concept.classifier.BasicDocumentMatcher
      extended by com.raritantechnologies.concept.classifier.AccrueDocumentMatcher
All Implemented Interfaces:
IConfigurable, IDocumentMatcher, ITermExtractor

public class AccrueDocumentMatcher
extends BasicDocumentMatcher
implements IDocumentMatcher

Computes the weighted average of scores of a set of contained document matchers.

The combined score is determined by the average score of the child DocumentMatchers multiplied by a weighting factor that favors matches spread across the set of child matchers.
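As a rough illustration of the scoring described above, the following self-contained sketch averages the child scores and then scales the result by a factor that grows with the fraction of children that matched. The exact weighting formula is not given in this documentation, so the linear boost used here, along with the class name AccrueWeightSketch and the method combinedScore, is an assumption made for illustration and is not the library's implementation.

```java
/** Hypothetical sketch; not the actual AccrueDocumentMatcher implementation. */
final class AccrueWeightSketch {

    /**
     * Averages the child scores, then boosts the average by how widely
     * the matches are spread across the children.
     * ASSUMPTION: the boost is linear in the fraction of matching children;
     * the real formula is not stated in this documentation.
     */
    static double combinedScore(double[] childScores, double multipleMatchWeight) {
        if (childScores.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        int matched = 0;
        for (double score : childScores) {
            sum += score;
            if (score > 0.0) {
                matched++; // a child with a positive score counts as a match
            }
        }
        double average = sum / childScores.length;
        double spread = (double) matched / childScores.length;
        return average * (1.0 + multipleMatchWeight * spread);
    }

    public static void main(String[] args) {
        // Same total score, concentrated in one child versus spread over three.
        double[] concentrated = {0.9, 0.0, 0.0};
        double[] spreadOut = {0.3, 0.3, 0.3};
        System.out.println(combinedScore(concentrated, 0.5));
        System.out.println(combinedScore(spreadOut, 0.5));
    }
}
```

Under this assumed formula, the spread-out case scores higher than the concentrated case even though both have the same plain average, which is the behavior the description attributes to the weighting factor.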

XML Configuration Template:
   <DocumentMatcher class="com.raritantechnologies.concept.classifier.AccrueDocumentMatcher"
                       minMatchers="[ minimum number of matchers needed to reach threshold (default=2) ]"
                       maxFailures="[ maximum number of matchers that can fail before Accrue matcher fails (default=2) ]"
                       multipleMatchWeight="[ weighting factor for matches across matchers ]" >

     <!-- two or more child matchers -->
     <DocumentMatcher class="[ class of com.raritantechnologies.concept.classifier.IDocumentMatcher ]" >

     </DocumentMatcher>

     <!-- etc. . . -->

   </DocumentMatcher>
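The minMatchers and maxFailures attributes in the template above can be read as two gates on the set of child results. The sketch below shows one plausible interpretation: the accrue matcher passes only when enough children match and not too many fail. How the real class combines the two settings is not documented here, so the AND combination, the class name AccrueThresholdSketch, and the method passes are assumptions.

```java
/** Hypothetical sketch; not the actual AccrueDocumentMatcher implementation. */
final class AccrueThresholdSketch {

    /**
     * ASSUMPTION: the matcher passes when at least minMatchers children match
     * and no more than maxFailures children fail. The real class may combine
     * these settings differently.
     */
    static boolean passes(boolean[] childMatched, int minMatchers, int maxFailures) {
        int matched = 0;
        for (boolean m : childMatched) {
            if (m) {
                matched++;
            }
        }
        int failed = childMatched.length - matched;
        return matched >= minMatchers && failed <= maxFailures;
    }

    public static void main(String[] args) {
        // Both thresholds at 2, the defaults stated in the template.
        System.out.println(passes(new boolean[] {true, true, false}, 2, 2));
        System.out.println(passes(new boolean[] {true, false, false, false}, 2, 2));
    }
}
```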
 

Developed by Raritan Technologies Inc.

Author:
Ted Sullivan

Constructor Summary
AccrueDocumentMatcher()
           
AccrueDocumentMatcher(java.util.ArrayList childMatchers)
           
AccrueDocumentMatcher(java.util.ArrayList childMatchers, int minMatchers, double multipleMatchWeight)
           
AccrueDocumentMatcher(java.util.ArrayList childMatchers, int minMatchers, int maxFailures, double multipleMatchWeight)
           
 
Method Summary
protected  void collectPhraseSet(java.util.HashSet phraseSet)
           
protected  void collectTermSet(java.util.HashSet termSet)
           
 void extractTerms(IndexedDocument fromDocument, java.util.HashMap termsMap)
          Extracts the matching terms contained in the document.
 void extractTerms(IndexedDocument fromDocument, java.util.Set termsSet)
           
 DocumentMatchBean getMatchCriteria(IndexedDocument document, java.util.Map termsMap)
          Computes the average of the child scores (sum of scores / number of matchers).
 void initialize(org.w3c.dom.Element elem)
          Initializes the object from an XML tag or element.
 boolean isStopWord(IndexedDocument document)
          Adds stop word support.
 boolean matches(IndexedDocument document)
          Returns true if the matcher matches the IndexedDocument, false otherwise.
 java.lang.String render()
          Renders a human-readable version of the matcher's logic.
 void setMaxFailures(int maxFailures)
          Sets the maximum number of match failures that can occur.
 void setMinMatchers(int minMatchers)
          Sets the minimum number of child matchers that need to match.
 void setMultipleMatchWeight(double multipleMatchWeight)
          Sets the weighting factor for multiple-matcher matches.
 
Methods inherited from class com.raritantechnologies.concept.classifier.BasicDocumentMatcher
addAttribute, addTerms, addTermsAsAttributes, extractTerms, getAttribute, getAttributeNames, getMatchCriteria, getName, getPhraseSet, getTermSet, setName
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface com.raritantechnologies.concept.classifier.IDocumentMatcher
addAttribute, addTermsAsAttributes, getAttribute, getAttributeNames, getMatchCriteria, getName, getPhraseSet, getTermSet, setName
 
Methods inherited from interface com.raritantechnologies.utils.tagging.ITermExtractor
extractTerms
 

Constructor Detail

AccrueDocumentMatcher

public AccrueDocumentMatcher()

AccrueDocumentMatcher

public AccrueDocumentMatcher(java.util.ArrayList childMatchers)

AccrueDocumentMatcher

public AccrueDocumentMatcher(java.util.ArrayList childMatchers,
                             int minMatchers,
                             double multipleMatchWeight)

AccrueDocumentMatcher

public AccrueDocumentMatcher(java.util.ArrayList childMatchers,
                             int minMatchers,
                             int maxFailures,
                             double multipleMatchWeight)
Method Detail

setMinMatchers

public void setMinMatchers(int minMatchers)
Sets the minimum number of child matchers that need to match.


setMaxFailures

public void setMaxFailures(int maxFailures)
Sets the maximum number of match failures that can occur.


setMultipleMatchWeight

public void setMultipleMatchWeight(double multipleMatchWeight)
Sets the weighting factor for multiple-matcher matches.


getMatchCriteria

public DocumentMatchBean getMatchCriteria(IndexedDocument document,
                                          java.util.Map termsMap)
Computes the average of the child scores (sum of scores / number of matchers). Because non-matching children still count toward the number of matchers, they lower the score of the accrue matcher.

Specified by:
getMatchCriteria in interface IDocumentMatcher
Overrides:
getMatchCriteria in class BasicDocumentMatcher
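
The averaging described for getMatchCriteria can be sketched as follows. This is a minimal stand-alone illustration, not the library's code: it assumes a non-matching child contributes a score of 0 while still counting in the denominator, and the class name AccrueAverageSketch and method averageScore are hypothetical.

```java
/** Hypothetical sketch; not the actual AccrueDocumentMatcher implementation. */
final class AccrueAverageSketch {

    /**
     * Plain average: sum of child scores divided by the number of children.
     * ASSUMPTION: a non-matching child is represented by a score of 0.0.
     */
    static double averageScore(double[] childScores) {
        if (childScores.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double score : childScores) {
            sum += score;
        }
        return sum / childScores.length;
    }

    public static void main(String[] args) {
        // Adding a non-matching child (0.0) pulls the average down.
        System.out.println(averageScore(new double[] {0.8, 0.6}));
        System.out.println(averageScore(new double[] {0.8, 0.6, 0.0}));
    }
}
```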

isStopWord

public boolean isStopWord(IndexedDocument document)
Description copied from interface: IDocumentMatcher
Adds stop word support. This is typically done by checking whether the matcher's terms are stop words by calling the IndexedDocument method isStopWord( string ). See TermDocumentMatcher.

Specified by:
isStopWord in interface IDocumentMatcher
Overrides:
isStopWord in class BasicDocumentMatcher

matches

public boolean matches(IndexedDocument document)
Description copied from interface: IDocumentMatcher
Returns true if the matcher matches the IndexedDocument, false otherwise.

Specified by:
matches in interface IDocumentMatcher
Specified by:
matches in class BasicDocumentMatcher

extractTerms

public void extractTerms(IndexedDocument fromDocument,
                         java.util.HashMap termsMap)
Description copied from interface: IDocumentMatcher
Extracts the matching terms contained in the document.

Specified by:
extractTerms in interface IDocumentMatcher
Specified by:
extractTerms in class BasicDocumentMatcher

extractTerms

public void extractTerms(IndexedDocument fromDocument,
                         java.util.Set termsSet)
Specified by:
extractTerms in interface IDocumentMatcher

initialize

public void initialize(org.w3c.dom.Element elem)
Description copied from interface: IConfigurable
Initializes the object from an XML tag or element. This method is called by the Framework as part of the application initialization. See ConfigurationManager, XMLConfigurationManager, XMLSearchFieldMapFactory, XMLSearchSourceFactory. Configurable objects that are owned or contained by other configurable objects will be initialized by the parent object.

Specified by:
initialize in interface IConfigurable

collectTermSet

protected void collectTermSet(java.util.HashSet termSet)
Specified by:
collectTermSet in class BasicDocumentMatcher

collectPhraseSet

protected void collectPhraseSet(java.util.HashSet phraseSet)
Specified by:
collectPhraseSet in class BasicDocumentMatcher

render

public java.lang.String render()
Description copied from interface: IDocumentMatcher
Renders a human-readable version of the matcher's logic.

Specified by:
render in interface IDocumentMatcher