com.raritantechnologies.concept.classifier
Class DocumentClassifier

java.lang.Object
  extended by com.raritantechnologies.concept.classifier.DocumentClassifier
All Implemented Interfaces:
IConfigurable, IFieldFormatter, IGatewayOutputProcessor, IResultSetProcessor

public class DocumentClassifier
extends java.lang.Object
implements IGatewayOutputProcessor, IResultSetProcessor, IFieldFormatter

Uses a set of IDocumentMatchers to tag one or more documents (IResult objects).

XML Configuration Template:
  <OutputProcessor class="com.raritantechnologies.concept.classifier.DocumentClassifier"
                      name="[ name of this classifier ]"
                      nestedResultField="[ name of field to put nested classification results ]"
                      matcherNameField="[ name of result field that contains matcher name ]"
                      useMatcherNameAsKey="[ true|false( default ) - if true - matcherName is used as matcher Key ]"
                      keywordsField="[ name of result field to add matched terms ]"
                      unmatchedValue="[ name of value to add if result not matched ]"
                      unmatchedField="[ name of field to add the unmatchedValue to ]"
                      scoreField="[ name of field that gets match scores ]"
                      excerptField="[ optional excerpt field that will get snippets from matching DocumentMatchers ]"
                      excerptWidth="[ character width of excerpts (default=100) ]"
                      highlightPrefix="[ prefix tag for marking hit terms in the excerpt ]"
                      highlightPostfix="[ postfix tag for marking hit terms in the excerpt ]"
                      stopWordsFile="[ File that contains list of stop words ]"
                      debug="[true|false(default) | select]"
                      tokenDelimiter="[ custom token delimiter ]" >

    <!-- IndexedFields are the set of fields in input IResults that will be indexed for the purpose of classification. -->
    <IndexedFields>
      <!-- Unfiltered field: -->
      <Field ID="[ ID of result field to be indexed ]" useFieldNameDelimiter="true|false(default)" 
                fieldAliases="[ comma separated list of name(s) of field aliases to use for fielded indexing ]"
                fieldWeight="[ relevance weighting for this field (floating point value) ]" />

      <Field ID="[ ID of result field to be indexed ]" >
        <StringFilter class="[ class of com.raritantechnologies.utils.filter.IStringFilter ]" >

        </StringFilter>
      </Field>
    </IndexedFields>

    <StopWords>
      <StopWord>[ stop word ]</StopWord>
      <!-- etc. -->
    </StopWords>

    <!-- Use a QueryMatcher Parser Search Source to create a set of IDocumentMatchers -->
    <QueryMatcherParser searchSource="[ name of search source to obtain query list ]"
                           queryField="[ name of field that contains query to parse ]" >

      <!-- MatcherParser that will convert the query string into an IDocumentMatcher instance -->
      <MatcherParser class="[ class of com.raritantechnologies.concept.classifier.IMatcherParser ]" >

      </MatcherParser>

      <InputQuery>
        <Field ID=" query field " value=" query value " />
      </InputQuery>

      <ResultMap>
        <Field ID=" result field ID " attributeName=" name of IDocumentMatcher attribute " />
      </ResultMap>

    </QueryMatcherParser>

    <!-- Alternatively, an IDocumentMatcherFactory can be used to create the set of IDocumentMatchers -->
    <DocumentMatcherFactory class="[ class of com.raritantechnologies.concept.classifier.IDocumentMatcherFactory ]" >

    </DocumentMatcherFactory>

  </OutputProcessor>

  <!-- can also be used as a FieldFormatter: -->
  <FieldFormatter class="com.raritantechnologies.concept.classifier.DocumentClassifier" >

  </FieldFormatter>
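
The constructor DocumentClassifier(org.w3c.dom.Element elem) and the initialize(org.w3c.dom.Element elem) method both take a DOM element, so a filled-in copy of the template has to be parsed first. The following is a minimal sketch of that step using only the JDK XML parser. The class name ClassifierConfigSketch, the helper methods, and all attribute values are illustrative assumptions and are not part of the library; only the attribute and element names come from the template above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ClassifierConfigSketch {

    // A filled-in instance of the template; every value is made up for illustration.
    static final String CONFIG =
        "<OutputProcessor class=\"com.raritantechnologies.concept.classifier.DocumentClassifier\""
      + " name=\"topics\" nestedResultField=\"classification\""
      + " scoreField=\"score\" excerptWidth=\"80\">"
      + "<IndexedFields>"
      + "<Field ID=\"title\" fieldWeight=\"2.0\"/>"
      + "<Field ID=\"body\"/>"
      + "</IndexedFields>"
      + "</OutputProcessor>";

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Configuration is not well-formed XML", e);
        }
    }

    // Reads excerptWidth, falling back to the documented default of 100.
    static int excerptWidth(Element elem) {
        String raw = elem.getAttribute("excerptWidth"); // empty string when absent
        return raw.isEmpty() ? 100 : Integer.parseInt(raw.trim());
    }

    // Collects the ID attribute of each Field under the first IndexedFields element.
    static List<String> indexedFieldIds(Element elem) {
        List<String> ids = new ArrayList<>();
        NodeList groups = elem.getElementsByTagName("IndexedFields");
        if (groups.getLength() == 0) {
            return ids;
        }
        NodeList fields = ((Element) groups.item(0)).getElementsByTagName("Field");
        for (int i = 0; i < fields.getLength(); i++) {
            ids.add(((Element) fields.item(i)).getAttribute("ID"));
        }
        return ids;
    }

    public static void main(String[] args) {
        Element elem = parse(CONFIG);
        System.out.println("name = " + elem.getAttribute("name"));
        System.out.println("excerptWidth = " + excerptWidth(elem));
        System.out.println("indexed fields = " + indexedFieldIds(elem));
        // With the library on the classpath, elem would then be passed on, e.g.:
        //   new DocumentClassifier(elem);
    }
}
```

Checking the parsed element before handing it over is optional, but it surfaces malformed XML (such as an unclosed tag) before the classifier is constructed.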
 

Developed by Raritan Technologies Inc.

Author:
Ted Sullivan

Field Summary
 
Fields inherited from interface com.raritantechnologies.searchApp.IFieldFormatter
TEMPLATE
 
Constructor Summary
DocumentClassifier()
           
DocumentClassifier(org.w3c.dom.Element elem)
           
 
Method Summary
 IndexedDocument classifyResult(java.lang.String sessionID, IResult result)
           
 void dataComplete()
          Data feed is complete.
 java.lang.String formatField(java.lang.String fieldVal)
          Reformats a field value.
 java.lang.String formatField(java.lang.String sessionID, java.lang.String fieldVal)
          Reformats a field value.
 void formatResultField(IResult result)
          Formats a result field "in place".
 void formatResultField(java.lang.String sessionID, IResult result)
          Formats a result field "in place", incorporating session context.
 java.lang.String getConfigurationXML()
           
 java.lang.String getConfigurationXML(java.lang.String configurationTemplate)
           
static java.lang.String getDocument(IResult result, java.util.List indexedFields, java.util.Map stringFilterMap, java.util.Set fieldDelimiterSet, java.lang.String fieldDelimiter)
           
static IDocumentMatcher getDocumentMatcher(java.lang.String classifierName, java.lang.String matcherName)
           
static IDocumentMatcher getDocumentMatcher(java.lang.String sessionID, java.lang.String classifierName, java.lang.String matcherName)
           
static java.util.Map getDocumentMatchers(java.lang.String classifierName)
           
static java.util.Map getDocumentMatchers(java.lang.String sessionID, java.lang.String classifierName)
           
 java.lang.String getFieldName()
          Returns the name of the result field that this formatter can reformat.
 java.util.List getIndexedFields()
           
 java.lang.String getMatcherNameField()
           
 IDocumentMatcher[] getMatchingMatchers(java.lang.String sessionID, java.lang.String documentText)
          Returns an array of the IDocumentMatchers that match the given String.
 java.lang.String getNestedResultField()
           
 void initialize(org.w3c.dom.Element elem)
          Initialize from the XML Element.
 void initialize(org.w3c.dom.Element outputProcElem, ISearchFieldMap sfMap)
          Initialize the GatewayOutputProcessor from XML Configuration Element.
 void initialize(java.util.Map initParams)
          Used for dynamic initialization (connection, collection name, file name, etc.)
 java.lang.String processData(IResultSet data)
          Returns the name of the XML file created or appended.
 void processResultSet(java.lang.String sessionID, IResultSet data)
          Processes the IResultSet.
static void removeDocumentClassifier(DocumentClassifier docClassifier)
           
 void setNestedResultField(java.lang.String nestedResultField)
           
 void setSessionID(java.lang.String sessionID)
           
static void updateDocMatcher(java.lang.String sessionID, java.lang.String docClassifierName, IDocumentMatcher docMatcher, boolean copyOldAttributes)
          Updates a DocumentClassifier by adding or replacing an IDocumentMatcher.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DocumentClassifier

public DocumentClassifier()

DocumentClassifier

public DocumentClassifier(org.w3c.dom.Element elem)

Method Detail

getDocumentMatcher

public static IDocumentMatcher getDocumentMatcher(java.lang.String classifierName,
                                                  java.lang.String matcherName)

getDocumentMatcher

public static IDocumentMatcher getDocumentMatcher(java.lang.String sessionID,
                                                  java.lang.String classifierName,
                                                  java.lang.String matcherName)

getDocumentMatchers

public static java.util.Map getDocumentMatchers(java.lang.String classifierName)

getDocumentMatchers

public static java.util.Map getDocumentMatchers(java.lang.String sessionID,
                                                java.lang.String classifierName)

setNestedResultField

public void setNestedResultField(java.lang.String nestedResultField)

getNestedResultField

public java.lang.String getNestedResultField()

getMatcherNameField

public java.lang.String getMatcherNameField()

getIndexedFields

public java.util.List getIndexedFields()

processData

public java.lang.String processData(IResultSet data)
Description copied from interface: IGatewayOutputProcessor
Returns the name of the XML file created or appended.

Specified by:
processData in interface IGatewayOutputProcessor

processResultSet

public void processResultSet(java.lang.String sessionID,
                             IResultSet data)
Description copied from interface: IResultSetProcessor
Processes the IResultSet.

Specified by:
processResultSet in interface IResultSetProcessor

formatResultField

public void formatResultField(IResult result)
Description copied from interface: IFieldFormatter
Formats a result field "in place".

Specified by:
formatResultField in interface IFieldFormatter
Parameters:
result - The result object that is to be formatted.

formatResultField

public void formatResultField(java.lang.String sessionID,
                              IResult result)
Description copied from interface: IFieldFormatter
Formats a result field "in place", incorporating session context.

Specified by:
formatResultField in interface IFieldFormatter
Parameters:
sessionID - The session key needed to lookup any session content stored in the session data cache.
result - The result object that is to be formatted.

classifyResult

public IndexedDocument classifyResult(java.lang.String sessionID,
                                      IResult result)

getMatchingMatchers

public IDocumentMatcher[] getMatchingMatchers(java.lang.String sessionID,
                                              java.lang.String documentText)
Returns an array of the IDocumentMatchers that match the given String.
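
The general idea of testing one document string against a set of matchers and returning the ones that hit can be sketched as follows. This is not the library's implementation: the IDocumentMatcher interface is not shown on this page, so SketchMatcher, anyTerm, tokenize, and getMatching are hypothetical stand-ins, and simple lowercase term lookup is an assumed matching rule chosen only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class MatchingSketch {

    /** Hypothetical stand-in for a document matcher; not the real IDocumentMatcher. */
    interface SketchMatcher {
        String getName();
        boolean matches(Set<String> tokens);
    }

    // Builds a matcher that hits when the document contains any of the given terms.
    static SketchMatcher anyTerm(final String name, final String... terms) {
        return new SketchMatcher() {
            public String getName() {
                return name;
            }
            public boolean matches(Set<String> tokens) {
                for (String term : terms) {
                    if (tokens.contains(term)) {
                        return true;
                    }
                }
                return false;
            }
        };
    }

    // Lowercases the text and splits it on runs of non-alphanumeric characters.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Returns, in registration order, every matcher that matches the document text.
    static SketchMatcher[] getMatching(List<SketchMatcher> matchers, String documentText) {
        Set<String> tokens = tokenize(documentText);
        List<SketchMatcher> hits = new ArrayList<>();
        for (SketchMatcher matcher : matchers) {
            if (matcher.matches(tokens)) {
                hits.add(matcher);
            }
        }
        return hits.toArray(new SketchMatcher[0]);
    }

    public static void main(String[] args) {
        List<SketchMatcher> matchers = Arrays.asList(
                anyTerm("sports", "football", "tennis"),
                anyTerm("finance", "stocks", "bonds"));
        for (SketchMatcher hit : getMatching(matchers, "Tennis finals draw record crowds")) {
            System.out.println(hit.getName());
        }
    }
}
```

Tokenizing the document once and reusing the token set for every matcher avoids rescanning the text per matcher, which matters when the matcher set is large.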


getDocument

public static java.lang.String getDocument(IResult result,
                                           java.util.List indexedFields,
                                           java.util.Map stringFilterMap,
                                           java.util.Set fieldDelimiterSet,
                                           java.lang.String fieldDelimiter)

getFieldName

public java.lang.String getFieldName()
Description copied from interface: IFieldFormatter
Returns the name of the result field that this formatter can reformat.

Specified by:
getFieldName in interface IFieldFormatter

formatField

public java.lang.String formatField(java.lang.String sessionID,
                                    java.lang.String fieldVal)
Description copied from interface: IFieldFormatter
Reformats a field value.

Specified by:
formatField in interface IFieldFormatter
Parameters:
sessionID - The session key needed to lookup any session content stored in the session data cache.
fieldVal - The field value to be reformatted.
Returns:
The reformatted field value.

formatField

public java.lang.String formatField(java.lang.String fieldVal)
Description copied from interface: IFieldFormatter
Reformats a field value.

Specified by:
formatField in interface IFieldFormatter
Parameters:
fieldVal - The field value to be reformatted.
Returns:
The reformatted field value.

dataComplete

public void dataComplete()
Description copied from interface: IGatewayOutputProcessor
Data feed is complete.

Specified by:
dataComplete in interface IGatewayOutputProcessor

initialize

public void initialize(java.util.Map initParams)
Description copied from interface: IGatewayOutputProcessor
Used for dynamic initialization (connection, collection name, file name, etc.)

Specified by:
initialize in interface IGatewayOutputProcessor

initialize

public void initialize(org.w3c.dom.Element outputProcElem,
                       ISearchFieldMap sfMap)
Description copied from interface: IGatewayOutputProcessor
Initialize the GatewayOutputProcessor from XML Configuration Element.

Specified by:
initialize in interface IGatewayOutputProcessor

initialize

public void initialize(org.w3c.dom.Element elem)
Description copied from interface: IResultSetProcessor
Initialize from the XML Element.

Specified by:
initialize in interface IResultSetProcessor

getConfigurationXML

public java.lang.String getConfigurationXML()
Specified by:
getConfigurationXML in interface IGatewayOutputProcessor

getConfigurationXML

public java.lang.String getConfigurationXML(java.lang.String configurationTemplate)
Specified by:
getConfigurationXML in interface IFieldFormatter

updateDocMatcher

public static void updateDocMatcher(java.lang.String sessionID,
                                    java.lang.String docClassifierName,
                                    IDocumentMatcher docMatcher,
                                    boolean copyOldAttributes)
Updates a DocumentClassifier by adding or replacing an IDocumentMatcher. The IDocumentMatcher must have a unique name.
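
The add-or-replace behaviour keyed on a unique matcher name can be pictured with a name-indexed map. The sketch below is an assumption-laden illustration, not the library's code: UpdateMatcherSketch and SketchMatcher are hypothetical, and the handling of copyOldAttributes (carrying over attributes from the replaced matcher when the new one does not define them) is a guess at the parameter's intent, since this page does not document it.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class UpdateMatcherSketch {

    /** Hypothetical matcher record; the real IDocumentMatcher is not shown on this page. */
    static final class SketchMatcher {
        final String name;
        final String query;
        final Map<String, String> attributes = new HashMap<>();

        SketchMatcher(String name, String query) {
            this.name = name;
            this.query = query;
        }
    }

    // Matchers indexed by their unique name.
    private final Map<String, SketchMatcher> byName = new LinkedHashMap<>();

    // Adds the matcher, or replaces an existing matcher that has the same name.
    // Assumed semantics: when copyOldAttributes is true, attributes of the replaced
    // matcher are carried over unless the new matcher already defines them.
    void update(SketchMatcher matcher, boolean copyOldAttributes) {
        SketchMatcher old = byName.put(matcher.name, matcher);
        if (old != null && copyOldAttributes) {
            for (Map.Entry<String, String> entry : old.attributes.entrySet()) {
                matcher.attributes.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
    }

    SketchMatcher get(String name) {
        return byName.get(name);
    }

    int size() {
        return byName.size();
    }

    public static void main(String[] args) {
        UpdateMatcherSketch registry = new UpdateMatcherSketch();

        SketchMatcher first = new SketchMatcher("sports", "football");
        first.attributes.put("owner", "editorial");
        registry.update(first, false);

        // Same name, so this replaces the first matcher instead of adding a second one.
        registry.update(new SketchMatcher("sports", "football OR tennis"), true);

        System.out.println(registry.size());
        System.out.println(registry.get("sports").query);
        System.out.println(registry.get("sports").attributes);
    }
}
```

Because the name is the map key, two matchers with the same name cannot coexist, which is one reason a unique name would be required.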


removeDocumentClassifier

public static void removeDocumentClassifier(DocumentClassifier docClassifier)

setSessionID

public void setSessionID(java.lang.String sessionID)