com.raritantechnologies.concept
Class AbstractDocumentKeywordProcessor

java.lang.Object
  extended by com.raritantechnologies.concept.AbstractDocumentKeywordProcessor
All Implemented Interfaces:
IConfigurable, IDocumentKeywordProcessor, IGatewayOutputProcessor, IResultSetProcessor
Direct Known Subclasses:
KeywordFieldsDocumentKeywordProcessor, ProfilerDocumentKeywordProcessor, TermExtractorDocumentKeywordProcessor, WordCountDocKeywordProcessor

public abstract class AbstractDocumentKeywordProcessor
extends java.lang.Object
implements IDocumentKeywordProcessor

Base class for document-keyword processors. It implements the basic functionality except for keyword detection, which subclasses supply by implementing the getWords( ) and isKeyword( ) methods. The getWords( ) implementation should detect keywords in the text and add each one to the processor by calling the superclass method addWord( word, docKey, result ).
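The subclass contract described above can be sketched as follows. This is a minimal illustration, not the library's implementation: IResult, WordCount and BaseProcessor are hypothetical stand-ins written so that the example compiles on its own, and the "seen at least twice" rule in isKeyword( ) is an arbitrary choice made for the demonstration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class KeywordProcessorSketch {

    /** Hypothetical stand-in for the library's IResult (one search result). */
    interface IResult { String getValue(String field); }

    /** Hypothetical stand-in for the library's WordCount. */
    static final class WordCount {
        final String word;
        int count;
        WordCount(String word) { this.word = word; }
    }

    /** Hypothetical stand-in for the base class: only the parts this sketch needs. */
    abstract static class BaseProcessor {
        private final Map<String, WordCount> counts = new LinkedHashMap<>();

        // Mirrors the three-argument addWord( ) listed in the Method Summary.
        protected void addWord(String word, String docKey, IResult result) {
            counts.computeIfAbsent(word, WordCount::new).count++;
        }

        protected abstract void getWords(IResult result, String text, String docKey);

        public abstract boolean isKeyword(WordCount wordCount);

        // Returns the words that pass isKeyword( ), in first-seen order.
        List<String> keywords() {
            List<String> out = new ArrayList<>();
            for (WordCount wc : counts.values()) {
                if (isKeyword(wc)) out.add(wc.word);
            }
            return out;
        }
    }

    /** Example subclass: splits on non-letters and keeps words seen at least twice. */
    static final class SimpleProcessor extends BaseProcessor {
        @Override
        protected void getWords(IResult result, String text, String docKey) {
            for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (!token.isEmpty()) addWord(token, docKey, result);
            }
        }

        @Override
        public boolean isKeyword(WordCount wordCount) {
            return wordCount.count >= 2;
        }
    }

    static List<String> demo() {
        SimpleProcessor p = new SimpleProcessor();
        IResult result = field -> "Search engines index text. Engines rank text.";
        p.getWords(result, result.getValue("body"), "doc-1");
        return p.keywords();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [engines, text]
    }
}
```

In the real class the caller would not invoke getWords( ) directly; the sketch does so only to keep the example short.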

XML Configuration Template:
  <DocumentProcessor class="[ some subclass of com.raritantechnologies.concept.AbstractDocumentKeywordProcessor ]"
                        resultKeyField="[ field where keywords will be stored ]"
                        textFields="[ comma separated field list ]"
                        documentFields="[ list of document url fields ]" >

  </DocumentProcessor>
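
As an illustration of how a filled-in template might be read, the sketch below parses a sample element with the standard javax.xml.parsers and org.w3c.dom APIs and extracts the three attributes. The class name com.example.MyKeywordProcessor and the field names are hypothetical, and splitting documentFields on commas is an assumption; the template only states that textFields is comma separated.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class ConfigTemplateSketch {

    // Hypothetical values: the class name and field names below are made up.
    static final String XML =
          "<DocumentProcessor class=\"com.example.MyKeywordProcessor\"\n"
        + "                   resultKeyField=\"keywords\"\n"
        + "                   textFields=\"title, body\"\n"
        + "                   documentFields=\"url\" >\n"
        + "</DocumentProcessor>";

    /** Parses the XML string and returns its root element. */
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    /** Splits a comma separated attribute value into trimmed field names. */
    static String[] splitList(String value) {
        return value.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        Element elem = parse(XML);
        System.out.println(elem.getAttribute("class"));
        System.out.println(elem.getAttribute("resultKeyField"));
        System.out.println(Arrays.toString(splitList(elem.getAttribute("textFields"))));
        System.out.println(Arrays.toString(splitList(elem.getAttribute("documentFields"))));
    }
}
```

The Element returned by parse( ) is the kind of object that initialize(org.w3c.dom.Element) accepts.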
 

Developed by Raritan Technologies.

Author:
Ted Sullivan

Field Summary
protected  java.util.HashMap documents
           
protected  java.lang.String resKeyField
           
 
Constructor Summary
AbstractDocumentKeywordProcessor()
           
 
Method Summary
protected  void addWord(java.lang.String word, java.lang.String docKey, IResult result)
           
protected  void addWord(java.lang.String word, java.lang.String docKey, IResult result, int[] positions)
           
 void dataComplete()
          Data feed is complete.
 void dataComplete(boolean computeRelatedDocs)
           
 java.util.Map getDocuments()
          returns Map of document Key -> Document
 java.util.List getDocuments(WordCount wc)
           
 java.util.Map getKeywordAssociations(int minAssociationDistance, boolean returnRanked)
          Returns a sorted map of keyword --> java.util.HashMap of associated keywords. The associated keywords map contains Keyword.AssociatedKeywordData objects defining the strength of the association and a set of sentences containing the keyword and the associated keyword.
 java.util.Map getKeywords()
          returns Map of keyword text -> Keyword object.
 java.util.List getWordCounts(WordCountComparator sortBy)
           
 OrderedMap getWordDocumentMap()
          Returns an OrderedMap of keyword --> List of documents containing the keyword.
 OrderedMap getWordDocumentMap(WordCountComparator sortBy)
          Returns a map of keyword --> List of documents containing the keyword.
protected abstract  void getWords(IResult result, java.lang.String text, java.lang.String docKey)
          Subclasses must implement this method: extract keywords from the text for the document given by docKey.
 void initialize(org.w3c.dom.Element elem)
          Initialize the processor from an XML Element.
 void initialize(org.w3c.dom.Element outputProcElem, ISearchFieldMap sfMap)
          Initialize the GatewayOutputProcessor from XML Configuration Element.
 void initialize(java.util.Map initParams)
          Dynamic initialization.
abstract  boolean isKeyword(WordCount wordCount)
           
 java.lang.String processData(IResultSet data)
          returns name of XML File created/appended.
protected  void processResult(IResult result)
           
 void processResultSet(java.lang.String sessionID, IResultSet data)
          Processes the IResultSet.
 void reset()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface com.raritantechnologies.searchApp.dataCollection.IGatewayOutputProcessor
getConfigurationXML
 

Field Detail

documents

protected java.util.HashMap documents

resKeyField

protected java.lang.String resKeyField
Constructor Detail

AbstractDocumentKeywordProcessor

public AbstractDocumentKeywordProcessor()
Method Detail

reset

public void reset()
Specified by:
reset in interface IDocumentKeywordProcessor

processData

public java.lang.String processData(IResultSet data)
Description copied from interface: IGatewayOutputProcessor
returns name of XML File created/appended.

Specified by:
processData in interface IGatewayOutputProcessor

processResultSet

public void processResultSet(java.lang.String sessionID,
                             IResultSet data)
Description copied from interface: IResultSetProcessor
Processes the IResultSet.

Specified by:
processResultSet in interface IDocumentKeywordProcessor

processResult

protected void processResult(IResult result)

getWords

protected abstract void getWords(IResult result,
                                 java.lang.String text,
                                 java.lang.String docKey)
Subclasses must implement this method: extract keywords from the text for the document given by docKey. The implemented method should call the addWord( ) method with each keyword or word.


addWord

protected void addWord(java.lang.String word,
                       java.lang.String docKey,
                       IResult result)

addWord

protected void addWord(java.lang.String word,
                       java.lang.String docKey,
                       IResult result,
                       int[] positions)

getWordDocumentMap

public OrderedMap getWordDocumentMap()
Returns an OrderedMap of keyword --> List of documents containing the keyword. OrderedMap keys are sorted by word count (descending).

Specified by:
getWordDocumentMap in interface IDocumentKeywordProcessor

getWordDocumentMap

public OrderedMap getWordDocumentMap(WordCountComparator sortBy)
Returns an OrderedMap of keyword --> List of documents containing the keyword. The OrderedMap can be ordered alphabetically or by word count (word frequency).
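
The two orderings can be illustrated with standard collections. This is a sketch only: a LinkedHashMap stands in for OrderedMap, a plain Comparator stands in for WordCountComparator, and the words, counts and document keys are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WordDocumentMapSketch {

    // Invented sample data: word frequency and the documents containing each word.
    static final Map<String, Integer> COUNTS = new HashMap<>();
    static final Map<String, List<String>> DOCS = new HashMap<>();
    static {
        COUNTS.put("search", 5); DOCS.put("search", Arrays.asList("doc-1", "doc-2"));
        COUNTS.put("index", 3);  DOCS.put("index", Arrays.asList("doc-1"));
        COUNTS.put("rank", 4);   DOCS.put("rank", Arrays.asList("doc-2"));
    }

    // Descending word count; ties fall back to alphabetical order.
    static final Comparator<String> BY_FREQUENCY = (a, b) -> {
        int cmp = Integer.compare(COUNTS.get(b), COUNTS.get(a));
        return cmp != 0 ? cmp : a.compareTo(b);
    };

    static final Comparator<String> ALPHABETICAL = Comparator.naturalOrder();

    /** Copies the map into a LinkedHashMap whose key order follows sortBy. */
    static Map<String, List<String>> order(Map<String, List<String>> docsByWord,
                                           Comparator<String> sortBy) {
        List<String> words = new ArrayList<>(docsByWord.keySet());
        words.sort(sortBy);
        Map<String, List<String>> ordered = new LinkedHashMap<>();
        for (String word : words) {
            ordered.put(word, docsByWord.get(word));
        }
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println(order(DOCS, BY_FREQUENCY).keySet()); // prints [search, rank, index]
        System.out.println(order(DOCS, ALPHABETICAL).keySet()); // prints [index, rank, search]
    }
}
```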

Specified by:
getWordDocumentMap in interface IDocumentKeywordProcessor

getWordCounts

public java.util.List getWordCounts(WordCountComparator sortBy)

getDocuments

public java.util.List getDocuments(WordCount wc)

dataComplete

public void dataComplete()
Description copied from interface: IResultSetProcessor
Data feed is complete.

Specified by:
dataComplete in interface IDocumentKeywordProcessor

dataComplete

public void dataComplete(boolean computeRelatedDocs)
Specified by:
dataComplete in interface IDocumentKeywordProcessor

isKeyword

public abstract boolean isKeyword(WordCount wordCount)

getDocuments

public java.util.Map getDocuments()
returns Map of document Key -> Document

Specified by:
getDocuments in interface IDocumentKeywordProcessor

getKeywords

public java.util.Map getKeywords()
returns Map of keyword text -> Keyword object.

Specified by:
getKeywords in interface IDocumentKeywordProcessor

getKeywordAssociations

public java.util.Map getKeywordAssociations(int minAssociationDistance,
                                            boolean returnRanked)
Returns a sorted map of keyword --> java.util.HashMap of associated keywords. The associated keywords map contains Keyword.AssociatedKeywordData objects defining the strength of the association and a set of sentences containing the keyword and the associated keyword. If returnRanked is true, returns a map of keyword --> OrderedMap of AssociatedKeywordData objects in which the associated keyword with the highest number of associations is returned first.
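
The shape of the returned structure can be sketched with standard collections. Everything here is an assumption made for illustration: AssociationData is a hypothetical stand-in for Keyword.AssociatedKeywordData, two keywords are treated as associated when they occur in the same sentence, and strength is the number of such sentences. The meaning of minAssociationDistance is not documented on this page, so the sketch leaves it out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class KeywordAssociationSketch {

    /** Hypothetical stand-in for Keyword.AssociatedKeywordData. */
    static final class AssociationData {
        int strength;                                        // number of shared sentences
        final Set<String> sentences = new LinkedHashSet<>(); // the shared sentences
    }

    // Invented sample data.
    static final List<String> SENTENCES = Arrays.asList(
            "Search engines index documents.",
            "Engines rank documents.",
            "Fast engines rank results.");
    static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("engines", "index", "rank"));

    static Map<String, Map<String, AssociationData>> associations(
            List<String> sentences, Set<String> keywords, boolean returnRanked) {
        Map<String, Map<String, AssociationData>> result = new TreeMap<>();
        for (String sentence : sentences) {
            // Collect the keywords that occur in this sentence.
            Set<String> present = new TreeSet<>();
            for (String token : sentence.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (keywords.contains(token)) present.add(token);
            }
            // Record every ordered pair of distinct co-occurring keywords.
            for (String a : present) {
                for (String b : present) {
                    if (a.equals(b)) continue;
                    AssociationData d = result
                            .computeIfAbsent(a, k -> new TreeMap<>())
                            .computeIfAbsent(b, k -> new AssociationData());
                    d.strength++;
                    d.sentences.add(sentence);
                }
            }
        }
        if (returnRanked) {
            for (Map.Entry<String, Map<String, AssociationData>> e : result.entrySet()) {
                e.setValue(rank(e.getValue()));
            }
        }
        return result;
    }

    /** Reorders associated keywords so that the strongest association comes first. */
    static Map<String, AssociationData> rank(Map<String, AssociationData> assoc) {
        List<Map.Entry<String, AssociationData>> entries = new ArrayList<>(assoc.entrySet());
        entries.sort((x, y) -> Integer.compare(y.getValue().strength, x.getValue().strength));
        Map<String, AssociationData> ranked = new LinkedHashMap<>();
        for (Map.Entry<String, AssociationData> e : entries) {
            ranked.put(e.getKey(), e.getValue());
        }
        return ranked;
    }

    public static void main(String[] args) {
        System.out.println(associations(SENTENCES, KEYWORDS, false).get("engines").keySet());
        System.out.println(associations(SENTENCES, KEYWORDS, true).get("engines").keySet());
    }
}
```

With the sample data, "engines" shares two sentences with "rank" and one with "index", so the ranked view lists "rank" first while the unranked view keeps alphabetical order.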

Specified by:
getKeywordAssociations in interface IDocumentKeywordProcessor

initialize

public void initialize(java.util.Map initParams)
Description copied from interface: IResultSetProcessor
Dynamic initialization.

Specified by:
initialize in interface IResultSetProcessor

initialize

public void initialize(org.w3c.dom.Element outputProcElem,
                       ISearchFieldMap sfMap)
Description copied from interface: IGatewayOutputProcessor
Initialize the GatewayOutputProcessor from XML Configuration Element.

Specified by:
initialize in interface IGatewayOutputProcessor

initialize

public void initialize(org.w3c.dom.Element elem)
Description copied from interface: IResultSetProcessor
Initialize the processor from an XML Element.

Specified by:
initialize in interface IResultSetProcessor