com.raritantechnologies.searchApp.dataCollection
Class ResultTermsCollector

java.lang.Object
  extended by com.raritantechnologies.searchApp.dataCollection.GatewayProcessorFilter
      extended by com.raritantechnologies.searchApp.dataCollection.ResultTermsCollector
All Implemented Interfaces:
ICollectionIndexer, IConfigurable, IGatewayOutputProcessor, IResultSetProcessor

public class ResultTermsCollector
extends GatewayProcessorFilter

GatewayProcessorFilter that extracts a set of terms from a result set. Can be used to generate a lookup source, a browse list, a "hot topics" list, etc.

Uses an InMemorySearchSource to cache the terms as they are extracted. The cache de-duplicates the terms and counts the occurrences of each one.
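
The de-duplicate-and-count behavior described above can be pictured with a small map-based sketch. This is only an illustration under stated assumptions: TermCounter and its methods are hypothetical names invented here, and the trimming of whitespace is an assumption; none of this is the InMemorySearchSource API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a term cache; NOT the InMemorySearchSource API.
class TermCounter {
    // One entry per distinct term, mapped to the number of times it was seen.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Record one occurrence; a repeated term only increments its count.
    // Trimming and skipping blank terms are assumptions of this sketch.
    void add(String term) {
        if (term == null) {
            return;
        }
        String key = term.trim();
        if (key.isEmpty()) {
            return;
        }
        counts.merge(key, 1, Integer::sum);
    }

    // Number of times the term was recorded, or 0 if it was never seen.
    int count(String term) {
        return counts.getOrDefault(term, 0);
    }

    // Number of distinct terms held in the cache.
    int distinctTerms() {
        return counts.size();
    }

    public static void main(String[] args) {
        TermCounter counter = new TermCounter();
        for (String t : new String[] {"java", "search", "java", "xml"}) {
            counter.add(t);
        }
        System.out.println(counter.distinctTerms() + " distinct terms; java seen "
                + counter.count("java") + " times");
    }
}
```

Keeping one entry per distinct term means the cache grows with the vocabulary of the result set rather than with the number of results processed.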

XML Configuration Template:
 <GatewayOutputProcessor class="com.raritantechnologies.searchApp.dataCollection.ResultTermsCollector" >

    <Terms>
      <!-- Extracts a term result from a set of single value fields -->
      <Term>

         <!-- Optional IResultMatcher filter to select results for term extraction -->
         <IncludeFilter class="[ class of com.raritantechnologies.searchApp.IResultMatcher ]" >

         </IncludeFilter>

         <!-- Optional IResultMatcher filter to exclude results from term extraction -->
         <ExcludeFilter class="[ class of com.raritantechnologies.searchApp.IResultMatcher ]" >

         </ExcludeFilter>
         
         <Field ID="[ the fieldID in the term result ]"
                   primarySourceID="[fieldID in input result]"
                   inputDelimiter="[ delimiter for input fields ]" />

         <Field ID="[the fieldID in the term result]"
                   value="[a fixed value]" 
                   matchPattern="[optional regular expression pattern to extract tokens]" />

         
         <Field ID="[the fieldID in the term result]"
                   primarySourceID="[fieldID in input result]"
                   secondarySourceID="[backup field ID if first is empty]"
                   matchPattern="[optional regular expression pattern to extract tokens]" />
 
         <!-- multiple value fields: create results with all permutations -->
         <Field ID="[another field ID of term]"
                   primarySourceID="[fieldID in input result]"
                   multiple="true"
                   matchPattern="[optional regular expression pattern to extract tokens]" />

         <Field ID="[the fieldID in the term result]"
                   primarySourceID="[fieldID in input result]"
                   secondarySourceID="[backup field ID if first is empty]" >

            <!-- Alternatively - can use an ITermExtractor to generate terms -->
            <TermExtractor class="[ class of com.raritantechnologies.utils.tagging.ITermExtractor ]" >

            </TermExtractor>
         </Field>

      </Term>

      <!-- Extracts a term result from nested results -->
      <Term nestedPath="/path/to/nested/result" >
         <Field ID="[the fieldID in the term result],[alternate field in term result]"
                   primarySourceID="[fieldID in input result]"
                   secondarySourceID="[backup field ID if first is empty]" 
                   matchPattern="[optional regular expression pattern to extract tokens]" />

         <Field ID="[another field ID of term]"
                   primarySourceID="[fieldID in input result]"
                   secondarySourceID="[backup field ID if first is empty]"
                   matchPattern="[optional regular expression pattern to extract tokens]" />
      </Term>
    
    </Terms>

    <!-- Proxy output processor -->
    <OutputProcessor class="[IGatewayOutputProcessor class]" >
      <!-- OutputProcessor details -->
    </OutputProcessor>

    <!-- Additional result set filters -->
    <PostProcessor class="[GatewayProcessorFilter class ]" >
      <!-- PostProcessor details -->
    </PostProcessor>
 </GatewayOutputProcessor>
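
The Field attributes in the template suggest three steps: fall back to secondarySourceID when the primary field is empty, pull tokens out with matchPattern, and, for multiple value fields, emit one term result per combination of values. The sketch below shows one plausible reading of those steps. FieldSketch and its methods are hypothetical names, and reading "all permutations" as the cross product of the field values is an assumption, not confirmed behavior of ResultTermsCollector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the Field attributes; not the library's code.
class FieldSketch {
    // Use the secondary value only when the primary is missing or empty.
    static String resolve(String primary, String secondary) {
        return (primary != null && !primary.isEmpty()) ? primary : secondary;
    }

    // Return every substring matching the pattern; with no pattern, the whole value.
    static List<String> tokens(String value, String matchPattern) {
        List<String> out = new ArrayList<>();
        if (value == null || value.isEmpty()) {
            return out;
        }
        if (matchPattern == null) {
            out.add(value);
            return out;
        }
        Matcher m = Pattern.compile(matchPattern).matcher(value);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Build one row per combination, taking one value from each field in order.
    // A field with no values yields no rows at all (an assumption of this sketch).
    static List<List<String>> combinations(List<List<String>> fieldValues) {
        List<List<String>> rows = new ArrayList<>();
        rows.add(new ArrayList<>());
        for (List<String> values : fieldValues) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> row : rows) {
                for (String v : values) {
                    List<String> extended = new ArrayList<>(row);
                    extended.add(v);
                    next.add(extended);
                }
            }
            rows = next;
        }
        return rows;
    }

    public static void main(String[] args) {
        String raw = resolve("", "alpha, beta");
        List<String> topics = tokens(raw, "[a-z]+");
        List<String> kinds = Arrays.asList("doc");
        System.out.println(combinations(Arrays.asList(topics, kinds)));
    }
}
```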
 

Developed by Raritan Technologies.

Author:
Ted Sullivan

Field Summary
 
Fields inherited from class com.raritantechnologies.searchApp.dataCollection.GatewayProcessorFilter
postProcessorFilters
 
Fields inherited from interface com.raritantechnologies.searchApp.dataCollection.ICollectionIndexer
ADD, CREATE, DELETE, UPDATE
 
Constructor Summary
ResultTermsCollector()
           
 
Method Summary
 void dataComplete()
          Data feed is complete.
protected  IResultSet filterData(IResultSet data)
           
 void initialize(org.w3c.dom.Element outputProcElem, ISearchFieldMap sfMap)
          Initialize the GatewayOutputProcessor from XML Configuration Element.
 java.lang.String processData(IResultSet data)
          Overrides the superclass method.
 
Methods inherited from class com.raritantechnologies.searchApp.dataCollection.GatewayProcessorFilter
addOutputProcessor, addPostProcessor, filterResultSet, getConfigurationXML, initialize, initialize, processResultSet, sendToOutput, setIndexMode, setResultMatcher, setUserMatcher
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ResultTermsCollector

public ResultTermsCollector()
Method Detail

processData

public java.lang.String processData(IResultSet data)
Overrides the superclass method.

Specified by:
processData in interface IGatewayOutputProcessor
Overrides:
processData in class GatewayProcessorFilter

filterData

protected IResultSet filterData(IResultSet data)
Overrides:
filterData in class GatewayProcessorFilter

dataComplete

public void dataComplete()
Description copied from class: GatewayProcessorFilter
Data feed is complete.

Specified by:
dataComplete in interface IGatewayOutputProcessor
Overrides:
dataComplete in class GatewayProcessorFilter

initialize

public void initialize(org.w3c.dom.Element outputProcElem,
                       ISearchFieldMap sfMap)
Description copied from interface: IGatewayOutputProcessor
Initialize the GatewayOutputProcessor from XML Configuration Element.

Specified by:
initialize in interface IGatewayOutputProcessor
Overrides:
initialize in class GatewayProcessorFilter