com.raritantechnologies.concept
Class TermExtractorDocumentKeywordProcessor

java.lang.Object
  extended by com.raritantechnologies.concept.AbstractDocumentKeywordProcessor
      extended by com.raritantechnologies.concept.TermExtractorDocumentKeywordProcessor
All Implemented Interfaces:
IConfigurable, IDocumentKeywordProcessor, IGatewayOutputProcessor, IResultSetProcessor

public class TermExtractorDocumentKeywordProcessor
extends AbstractDocumentKeywordProcessor
implements IDocumentKeywordProcessor

Uses an ITermExtractor to extract keywords from documents. Works with RelatedDocumentProcessor. Term Maps can be obtained from local term extractors or from the System cache.

XML Configuration Template:
  <DocumentProcessor class="com.raritantechnologies.concept.TermExtractorDocumentKeywordProcessor" >

    <!-- One or more TermExtractor elements: -->
    <TermExtractor class="[ class of com.raritantechnologies.utils.tagging.ITermExtractor ]"
                      resultKeyField="[ field where keywords will be stored ]"
                      textFields="[ comma separated field list of fields to process for keywords ]"
                      documentFields="[ list of document url fields ]" >

    </TermExtractor>

    <!-- etc . . . -->

  </DocumentProcessor>
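
As an illustration of how the template above might be turned into the org.w3c.dom.Element that initialize(org.w3c.dom.Element) accepts, the following is a minimal sketch using only the standard JDK XML parser. The class name TermExtractorConfigDemo, the extractor class com.example.MyTermExtractor, and the field names (keywords, title, abstract, url) are hypothetical placeholders, not part of the library. How the processor itself interprets the attributes is not documented here, so the attribute parsing below is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class TermExtractorConfigDemo {

    // Sample configuration that follows the template. The extractor class
    // and all field names are hypothetical placeholders.
    static final String SAMPLE =
        "<DocumentProcessor class=\"com.raritantechnologies.concept.TermExtractorDocumentKeywordProcessor\">"
      + "<TermExtractor class=\"com.example.MyTermExtractor\""
      + " resultKeyField=\"keywords\""
      + " textFields=\"title,abstract\""
      + " documentFields=\"url\"/>"
      + "</DocumentProcessor>";

    // Parse the XML string and return its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    // Collect the comma-separated textFields of every TermExtractor child.
    // (Assumed interpretation of the attribute, based on the template text.)
    static List<String> textFields(Element processor) {
        List<String> fields = new ArrayList<>();
        NodeList extractors = processor.getElementsByTagName("TermExtractor");
        for (int i = 0; i < extractors.getLength(); i++) {
            Element extractor = (Element) extractors.item(i);
            for (String field : extractor.getAttribute("textFields").split(",")) {
                String trimmed = field.trim();
                if (!trimmed.isEmpty()) {
                    fields.add(trimmed);
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Element elem = parse(SAMPLE);
        System.out.println(elem.getAttribute("class"));
        System.out.println(textFields(elem));
        // With the library on the classpath, elem could then be handed to
        // new TermExtractorDocumentKeywordProcessor().initialize(elem).
    }
}
```

The sketch stops short of calling the processor so that it compiles without the library; only the final commented step depends on it.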
 

Developed by Raritan Technologies.

Author:
Ted Sullivan

Field Summary
 
Fields inherited from class com.raritantechnologies.concept.AbstractDocumentKeywordProcessor
documents, resKeyField
 
Constructor Summary
TermExtractorDocumentKeywordProcessor()
           
 
Method Summary
 java.lang.String getConfigurationXML()
           
protected  void getWords(IResult result, java.lang.String text, java.lang.String resultKey)
          Subclasses must implement this method: extract keywords from the text for the document given by resultKey.
 void initialize(org.w3c.dom.Element elem)
          Initialize from the XML Element.
 boolean isKeyword(WordCount wordCount)
           
 
Methods inherited from class com.raritantechnologies.concept.AbstractDocumentKeywordProcessor
addWord, addWord, dataComplete, dataComplete, getDocuments, getDocuments, getKeywordAssociations, getKeywords, getWordCounts, getWordDocumentMap, getWordDocumentMap, initialize, initialize, processData, processResult, processResultSet, reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface com.raritantechnologies.concept.IDocumentKeywordProcessor
dataComplete, dataComplete, getDocuments, getKeywordAssociations, getKeywords, getWordDocumentMap, getWordDocumentMap, processResultSet, reset
 
Methods inherited from interface com.raritantechnologies.searchApp.IResultSetProcessor
initialize
 
Methods inherited from interface com.raritantechnologies.searchApp.dataCollection.IGatewayOutputProcessor
initialize, initialize, processData
 

Constructor Detail

TermExtractorDocumentKeywordProcessor

public TermExtractorDocumentKeywordProcessor()
Method Detail

getWords

protected void getWords(IResult result,
                        java.lang.String text,
                        java.lang.String resultKey)
Description copied from class: AbstractDocumentKeywordProcessor
Subclasses must implement this method: extract keywords from the text for the document given by resultKey. The implemented method should call the addWord() method with each keyword or word.

Specified by:
getWords in class AbstractDocumentKeywordProcessor
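
The contract described above (extract keywords from the text, then call addWord for each one) can be illustrated with a self-contained sketch. Everything in it is a hypothetical stand-in: SketchProcessor is not AbstractDocumentKeywordProcessor, the addWord(String, String) signature is assumed because the real overloads are not documented on this page, and the IResult parameter of the real getWords is omitted. A naive tokenizer takes the place of an ITermExtractor.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class GetWordsSketch {

    // Hypothetical stand-in for the abstract processor; NOT the real class.
    abstract static class SketchProcessor {
        private final Map<String, Integer> counts = new LinkedHashMap<>();

        // Assumed signature: records one occurrence of a word. The resultKey
        // is accepted to mirror the contract but is not used by this stand-in.
        protected void addWord(String word, String resultKey) {
            counts.merge(word, 1, Integer::sum);
        }

        // Subclasses extract keywords from text and call addWord for each.
        protected abstract void getWords(String text, String resultKey);

        Map<String, Integer> counts() {
            return counts;
        }
    }

    // Naive tokenizer standing in for a real term extractor.
    static class TokenizingProcessor extends SketchProcessor {
        @Override
        protected void getWords(String text, String resultKey) {
            for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!token.isEmpty()) {
                    addWord(token, resultKey);
                }
            }
        }
    }

    // Convenience entry point: run the sketch processor over one document.
    static Map<String, Integer> extract(String text, String resultKey) {
        TokenizingProcessor processor = new TokenizingProcessor();
        processor.getWords(text, resultKey);
        return processor.counts();
    }

    public static void main(String[] args) {
        System.out.println(extract("Term extraction finds term candidates", "doc-1"));
    }
}
```

In the real class the extraction step is delegated to the configured ITermExtractor rather than a tokenizer; the sketch only shows the shape of the getWords-to-addWord hand-off.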

isKeyword

public boolean isKeyword(WordCount wordCount)
Specified by:
isKeyword in class AbstractDocumentKeywordProcessor

initialize

public void initialize(org.w3c.dom.Element elem)
Description copied from interface: IResultSetProcessor
Initialize from the XML Element.

Specified by:
initialize in interface IResultSetProcessor
Overrides:
initialize in class AbstractDocumentKeywordProcessor

getConfigurationXML

public java.lang.String getConfigurationXML()
Specified by:
getConfigurationXML in interface IGatewayOutputProcessor