com.raritantechnologies.concept
Class WordCountDocKeywordProcessor

java.lang.Object
  extended by com.raritantechnologies.concept.AbstractDocumentKeywordProcessor
      extended by com.raritantechnologies.concept.WordCountDocKeywordProcessor
All Implemented Interfaces:
IConfigurable, IDocumentKeywordProcessor, IGatewayOutputProcessor, IResultSetProcessor

public class WordCountDocKeywordProcessor
extends AbstractDocumentKeywordProcessor
implements IDocumentKeywordProcessor

This processor calculates relative word frequencies. A keyword is defined as a word that occurs in a percentage of the documents that falls within a configured range (see minWordFrequency and maxWordFrequency in the template below).
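
The range rule described above can be sketched as follows. This is a minimal illustration, not the class's actual implementation: KeywordRangeSketch and its methods are hypothetical names, and the sketch assumes that both bounds are inclusive and expressed as percentages from 0 to 100.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration; not part of the com.raritantechnologies API.
class KeywordRangeSketch {

    // Percentage of documents that contain the word.
    static double documentPercentage(int docsContainingWord, int totalDocs) {
        if (totalDocs <= 0) {
            return 0.0;
        }
        return 100.0 * docsContainingWord / totalDocs;
    }

    // Assumption: both bounds are inclusive percentages (0-100).
    static boolean inRange(double percentage, double minPct, double maxPct) {
        return percentage >= minPct && percentage <= maxPct;
    }

    // Returns the words whose document percentage lies within [minPct, maxPct].
    static Set<String> selectKeywords(Map<String, Integer> docCounts, int totalDocs,
                                      double minPct, double maxPct) {
        Set<String> keywords = new TreeSet<>();
        for (Map.Entry<String, Integer> e : docCounts.entrySet()) {
            double pct = documentPercentage(e.getValue(), totalDocs);
            if (inRange(pct, minPct, maxPct)) {
                keywords.add(e.getKey());
            }
        }
        return keywords;
    }

    public static void main(String[] args) {
        // Number of documents (out of 10) in which each word occurs.
        Map<String, Integer> docCounts = new LinkedHashMap<>();
        docCounts.put("the", 10);    // 100%: too common
        docCounts.put("search", 3);  // 30%: inside the range
        docCounts.put("zebra", 1);   // 10%: too rare

        System.out.println(selectKeywords(docCounts, 10, 20.0, 50.0)); // prints [search]
    }
}
```

Filtering on both ends drops words that appear almost everywhere (typical stop words) as well as words too rare to characterize the result set.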

XML Configuration Template:
  <DocumentProcessor class="com.raritantechnologies.concept.WordCountDocKeywordProcessor"
                        resultKeyField="[ field where keywords will be stored ]"
                        textFields="[ comma separated field list ]"
                        documentFields="[ list of document url fields ]"
                        minWordFrequency="[ percentage value for minimum word frequency considered as keyword ]"
                        maxWordFrequency="[ percentage value for maximum word frequency considered as keyword ]" >

  </DocumentProcessor>
 

Developed by Raritan Technologies.

Author:
Ted Sullivan

Field Summary
 
Fields inherited from class com.raritantechnologies.concept.AbstractDocumentKeywordProcessor
documents, resKeyField
 
Constructor Summary
WordCountDocKeywordProcessor()
           
 
Method Summary
 java.lang.String getConfigurationXML()
           
 double getHighRange()
           
 double getLowRange()
           
protected  void getWords(IResult result, java.lang.String text, java.lang.String resultKey)
          Subclasses must implement this method: extract keywords from the text for the document given by resultKey.
 void initialize(org.w3c.dom.Element elem)
          Initialize the processor from an XML Element.
 boolean isKeyword(WordCount wordCount)
           
 void setHighRange(double highRange)
           
 void setLowRange(double lowRange)
           
 
Methods inherited from class com.raritantechnologies.concept.AbstractDocumentKeywordProcessor
addWord, addWord, dataComplete, dataComplete, getDocuments, getDocuments, getKeywordAssociations, getKeywords, getWordCounts, getWordDocumentMap, getWordDocumentMap, initialize, initialize, processData, processResult, processResultSet, reset
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface com.raritantechnologies.concept.IDocumentKeywordProcessor
dataComplete, dataComplete, getDocuments, getKeywordAssociations, getKeywords, getWordDocumentMap, getWordDocumentMap, processResultSet, reset
 
Methods inherited from interface com.raritantechnologies.searchApp.IResultSetProcessor
initialize
 
Methods inherited from interface com.raritantechnologies.searchApp.dataCollection.IGatewayOutputProcessor
initialize, initialize, processData
 

Constructor Detail

WordCountDocKeywordProcessor

public WordCountDocKeywordProcessor()

Method Detail

getWords

protected void getWords(IResult result,
                        java.lang.String text,
                        java.lang.String resultKey)
Description copied from class: AbstractDocumentKeywordProcessor
Subclasses must implement this method: extract keywords from the text for the document given by resultKey. The implemented method should call the addWord() method with each keyword or word.

Specified by:
getWords in class AbstractDocumentKeywordProcessor
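
A getWords-style implementation might look roughly like the sketch below. It is an assumption-laden illustration only: GetWordsSketch is a hypothetical stand-in class, the two-argument addWord shown here is an assumed shape (the real addWord overloads are not documented on this page), and the IResult parameter is omitted so that the example is self-contained.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in; does not extend AbstractDocumentKeywordProcessor.
class GetWordsSketch {

    // word -> keys of the documents in which the word occurs
    final Map<String, Set<String>> wordDocuments = new HashMap<>();

    // Assumed shape of addWord: record that the word occurs in the given document.
    void addWord(String word, String resultKey) {
        wordDocuments.computeIfAbsent(word, k -> new HashSet<>()).add(resultKey);
    }

    // Splits the text on anything that is not a letter or digit,
    // lower-cases each token, and reports it through addWord.
    void getWords(String text, String resultKey) {
        if (text == null) {
            return;
        }
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                addWord(token.toLowerCase(Locale.ROOT), resultKey);
            }
        }
    }

    public static void main(String[] args) {
        GetWordsSketch sketch = new GetWordsSketch();
        sketch.getWords("Search engines index documents.", "doc1");
        sketch.getWords("Documents, once indexed, are searchable.", "doc2");

        // "documents" occurs in both documents.
        System.out.println(sketch.wordDocuments.get("documents").size()); // prints 2
    }
}
```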

isKeyword

public boolean isKeyword(WordCount wordCount)
Specified by:
isKeyword in class AbstractDocumentKeywordProcessor

initialize

public void initialize(org.w3c.dom.Element elem)
Description copied from interface: IResultSetProcessor
Initialize the processor from an XML Element.

Specified by:
initialize in interface IResultSetProcessor
Overrides:
initialize in class AbstractDocumentKeywordProcessor
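
As a rough sketch of what reading the configuration template could involve, the example below parses a DocumentProcessor element with the standard org.w3c.dom API and extracts the frequency bounds. InitializeSketch is a hypothetical class, the attribute values ("keywords", "title, body", "url", 5, 40) are made up for illustration, and both the fallback values (0 and 100) and the mapping of minWordFrequency and maxWordFrequency onto lowRange and highRange are assumptions that this page does not confirm.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical illustration of attribute handling; not the real initialize().
class InitializeSketch {
    String resultKeyField;
    String[] textFields;
    double lowRange;
    double highRange;

    void initialize(Element elem) {
        // getAttribute returns an empty string when the attribute is absent.
        resultKeyField = elem.getAttribute("resultKeyField");
        String fields = elem.getAttribute("textFields").trim();
        textFields = fields.isEmpty() ? new String[0] : fields.split("\\s*,\\s*");
        // Assumed defaults: 0 and 100 when a bound is not configured.
        lowRange = parsePercent(elem.getAttribute("minWordFrequency"), 0.0);
        highRange = parsePercent(elem.getAttribute("maxWordFrequency"), 100.0);
    }

    static double parsePercent(String value, double fallback) {
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        return Double.parseDouble(value.trim());
    }

    // Parses an XML string and returns its root element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<DocumentProcessor class=\"com.raritantechnologies.concept.WordCountDocKeywordProcessor\""
            + " resultKeyField=\"keywords\" textFields=\"title, body\" documentFields=\"url\""
            + " minWordFrequency=\"5\" maxWordFrequency=\"40\"></DocumentProcessor>";

        InitializeSketch sketch = new InitializeSketch();
        sketch.initialize(parse(xml));
        System.out.println(sketch.lowRange + " to " + sketch.highRange); // prints 5.0 to 40.0
    }
}
```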

setLowRange

public void setLowRange(double lowRange)

getLowRange

public double getLowRange()

setHighRange

public void setHighRange(double highRange)

getHighRange

public double getHighRange()

getConfigurationXML

public java.lang.String getConfigurationXML()
Specified by:
getConfigurationXML in interface IGatewayOutputProcessor