com.raritantechnologies.federated.html
Class HTMLSearchSourceFactory

java.lang.Object
  extended by com.raritantechnologies.searchApp.XMLSearchSourceFactory
      extended by com.raritantechnologies.federated.html.HTMLSearchSourceFactory
All Implemented Interfaces:
IXMLSearchSourceFactory

public class HTMLSearchSourceFactory
extends XMLSearchSourceFactory

Responsible for constructing HTMLSearchSource objects from <SourceType> Configuration XML tags. The HTMLSearchSource is used to search web site search engines.

XML Configuration Template:
  <SourceType name="[unique search source name]" 
                 type="HTMLSearchSource" 
                 displayName="[ displayable name]"
                 sourceFactoryClass="com.raritantechnologies.federated.html.HTMLSearchSourceFactory" 
                 queryProcessor="com.raritantechnologies.federated.html.HTMLQueryProcessor"
                 resultIsXML="true|false(default)"
                 cacheCookieKey="[ key to cache cookies from site ]"
                 cookieResultField="[ result field to put cookie String in ]"
                 IDField="[ field to use as unique ID ]"
                 URLField="[ field with fullText URL ]"
                 titleField="[ field with document Title ]" >

  <!-- Describes mapping of query input parameters to the HTML SearchProcess and of the      -->
  <!-- abstract or normalized field names to the field name at the HTML source.              -->
  <Fields>

    <!-- The value of the ID field in the query will be inserted into the SearchProcess      -->
    <!-- element at the xPath location specified in the xPath parameter. The sourceName      -->
    <!-- field defines the name of the field at the HTML source.                             -->
    <Field ID="[ abstract field name ]"
              xPath="[ xPath within the SearchProcess: e.g. '/SearchProcess/Step/params/param[@formName='q']/@value' ]"
              sourceName="[ field name at source (e.g. 'q')]"/>

    <!-- Describes complex formatting needed for a field value.       -->
    <!-- FormatField uses com.raritantechnologies.utils.format classes -->

    <FormatField value="{(KY)_KY_}"
             xPath="/SearchProcess/Step/params/param[@formName='term']/@value" />

    <!-- FilterField uses com.raritantechnologies.utils.filter classes -->

    <FilterField class="com.raritantechnologies.utils.filter.IStringFilter"
             xPath="/SearchProcess/Step/params/param[@formName='term']/@value" />

    <Field ID="[ boolean ID ]" xPath="[ path to search form value ]" >
      <!-- FieldLookup allows user selected value to be looked up for use in FormatField -->
      <!-- used by Block: {(TI)(BOOL(TI TTL TI_OP ())) _TTL_[Title]} 
           see com.raritantechnologies.utils.format.BlockFormatter javadocs for details -->
      <FieldMap ID="TIBOOL" name="TI_OP">   
         <Choice abstractVal="AND" sourceVal="AND" />
         <Choice abstractVal="OR"  sourceVal="OR" />
      </FieldMap>
    </Field>

  </Fields>
 
  <SecurityModel>
     <search>[ Public | Restricted ]</search>
  </SecurityModel>

  <!-- Restricted sites require a LoginProcess which defines how login parameters are to be handled -->
  <LoginMap>
      <UserName xPath="[ xPath to userName in LoginProcess ]" />
      <Password xPath="[ xPath to password in LoginProcess ]" />
  </LoginMap>

  <LoginProcess>
    <Step type="[getURL|getURLSocket|postURL|postURLSocket]" URL="[ login form URL ]" >
      <params>
        <param formName="[ name of parameter in html form ]" value="[ form value ]" alwaysOutput="[true|false(default) - use for blank values ]" />
      </params>
    </Step>
  </LoginProcess>

  <!-- For NTLM Authentication - use this format: -->
  <LoginProcess UserName="[user name]" 
                   Password="[password]" 
                   PasswordEnc="[DES encrypted DB password]" />

  <!-- The SearchProcess describes the search form that will be sent to the search site: -->
  <!-- It can consist of one or more "Steps" depending on the site -->
    <SearchProcess 
          outputStep="[ step number (from 1) that generates output - if no value: the last step will be used ]" >
    <Step type="[getURL|getURLSocket|getNTLMAuthorized|postURL|postURLSocket]" URL="[the URL that the form should be sent to]" >
      <params>
        <param formName="[ name of parameter in html form ]" value="[ form value ]" alwaysOutput="[true|false(default) - use for blank values ]" />
        <param formName="[ name of form parameter ]" value="" alwaysOutput="[true|false(default) - use for blank values ]" />

        <!-- etc. . . -->

      </params>
    </Step>
  </SearchProcess>

  <!-- The PageProcess describes how paging commands (get or post) will be sent to the search site: -->
  <PageProcess mapFrom="[ xPath within result to get paging data ]" method="[tagMap|  ]" >
    <TotalDocs mapFrom="[ xPath within result to get total docs e.g. '/Records/Page/TotalDocs' ]" />
    <Step type="[getURL|getURLSocket|postURL|postURLSocket]" URL="[the URL that the form should be sent to]" >
      <params>
        <param formName="[ name of parameter in html form ]" value="[ form value ]" alwaysOutput="[true|false(default) - use for blank values ]" />

        <!-- computed parameter -->
        <param formName="[ name of parameter in html form ]" value="" computeFrom="[ compute formula with PAGE_NUM as placeholder for page Number with 1 representing the first page ]" />

        <param formName="[ name of parameter ]" computeFrom="[ some string with pattern {COMPUTE_FROM:[ formula ]} ]" />

      </params>
    </Step>
  </PageProcess>

  <ScraperConfigFile>[ path to the HTMLScraper configuration File ]</ScraperConfigFile>
  <OutputTransformer>[ path to the XSL file that translates the raw XML to result XML ]</OutputTransformer>

  <!-- Optional FieldFormatters section -->
  <FieldFormatters>

    <Formatter formatterClass="[ class of com.raritantechnologies.searchApp.IFieldFormatter ]" >

    </Formatter>

    <!-- etc. . . -->

  </FieldFormatters>

  <ResultSetAttributes>
    <Attribute name="[ attribute name ]" xPath="[ xPath of value within HTMLScraper XML output ]" />
    <Attribute name="[ another ]" xPath="[ its xPath ]" />
    <!-- etc... -->
  </ResultSetAttributes>

  </SourceType>
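
As a minimal sketch of what the Field xPath mapping in the template describes, the following Java uses only the standard JDK DOM and XPath APIs to write a query value into a SearchProcess document at the configured location. The class name FieldXPathSketch, its helper methods, and the example URL are hypothetical and are not part of the Raritan API. How HTMLSearchSource actually applies the mapping is not shown in this documentation.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the Raritan API.
final class FieldXPathSketch {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Writes a value into the node (here an attribute) selected by xPath. */
    static void setValue(Document searchProcess, String xPath, String value) {
        try {
            Node target = (Node) XPathFactory.newInstance().newXPath()
                    .evaluate(xPath, searchProcess, XPathConstants.NODE);
            if (target == null) {
                throw new IllegalArgumentException("No node at " + xPath);
            }
            target.setNodeValue(value);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad xPath: " + xPath, e);
        }
    }

    /** Reads back the string value of the node selected by xPath. */
    static String getValue(Document doc, String xPath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xPath, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad xPath: " + xPath, e);
        }
    }

    public static void main(String[] args) {
        // A one-step SearchProcess with a single form parameter named 'q'.
        Document doc = parse(
              "<SearchProcess>"
            + "<Step type='getURL' URL='http://example.com/search'>"
            + "<params><param formName='q' value=''/></params>"
            + "</Step>"
            + "</SearchProcess>");

        // Same xPath form as the example given for the Field element.
        String xPath = "/SearchProcess/Step/params/param[@formName='q']/@value";
        setValue(doc, xPath, "federated search");
        System.out.println(getValue(doc, xPath));
    }
}
```

Because the xPath ends in /@value, the selected node is the attribute itself, so setting its node value changes the form parameter that would be sent to the search site.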
 

Developed by Raritan Technologies Inc.

Author:
Ted Sullivan

Constructor Summary
HTMLSearchSourceFactory()
           
 
Method Summary
static void addSteps(org.w3c.dom.Document processDoc, org.w3c.dom.Element source)
           
static org.w3c.dom.Document cloneProcessDoc(java.lang.String processType, org.w3c.dom.Element searchProcess)
           
 SearchSource[] createSearchSources(org.w3c.dom.Element sourceElem, ISearchFieldMapFactory factory)
          Extracts data from the XML configuration to create HTMLSearchSource objects.
 
Methods inherited from class com.raritantechnologies.searchApp.XMLSearchSourceFactory
changeBasePath, initializeFieldFormatters, initializeSource, initializeStaticFields
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HTMLSearchSourceFactory

public HTMLSearchSourceFactory()

Method Detail

createSearchSources

public SearchSource[] createSearchSources(org.w3c.dom.Element sourceElem,
                                          ISearchFieldMapFactory factory)
Extracts data from the XML configuration to create HTMLSearchSource objects. Needs to set:

the HTMLScraper config Document

the LoginProcess Document template (if necessary)

the loginProcessMap (a map of UserName --> XPath to set the value in the LoginProcess Document, and Password --> XPath to set the value in the LoginProcess Document)

the SearchProcess Document of the HTMLSearchSource

Creates an HTMLSearchField for each field in the SearchProcess. (Note: each needs an ID attribute that maps to a FieldID.) Sets the HTMLSearchSource "setSourceField" property to the XPath of the field value within the SearchProcess template.

HTMLSearchField properties:

PageSizePath: "PageSize" mapped to the XPath for setting the page size, if applicable

StartRecPath: "StartRec" mapped to the XPath for setting the start record, if applicable

The Map is generated by the HTMLSearchSourceFactory from the Config XML for an HTMLSearchSource.

Specified by:
createSearchSources in interface IXMLSearchSourceFactory
Specified by:
createSearchSources in class XMLSearchSourceFactory
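
The loginProcessMap mentioned in the description pairs UserName and Password with the XPaths given in the LoginMap element of the configuration. A minimal sketch of building such a map with the standard JDK DOM API might look like the following. LoginMapSketch, parseRoot, and readLoginMap are hypothetical names, and the example XPaths are illustrative; the factory's real implementation is not shown in this documentation.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration class; not part of the Raritan API.
final class LoginMapSketch {

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Maps each child element name of LoginMap (UserName, Password)
     * to the value of its xPath attribute.
     */
    static Map<String, String> readLoginMap(Element sourceType) {
        Map<String, String> map = new LinkedHashMap<>();
        NodeList loginMaps = sourceType.getElementsByTagName("LoginMap");
        if (loginMaps.getLength() == 0) {
            return map; // no LoginMap configured, so nothing to map
        }
        NodeList children = loginMaps.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child instanceof Element) {
                Element entry = (Element) child;
                map.put(entry.getTagName(), entry.getAttribute("xPath"));
            }
        }
        return map;
    }

    public static void main(String[] args) {
        Element source = parseRoot(
              "<SourceType name='demo'>"
            + "<LoginMap>"
            + "<UserName xPath=\"/LoginProcess/Step/params/param[@formName='user']/@value\"/>"
            + "<Password xPath=\"/LoginProcess/Step/params/param[@formName='pass']/@value\"/>"
            + "</LoginMap>"
            + "</SourceType>");
        readLoginMap(source).forEach((key, xPath) -> System.out.println(key + " --> " + xPath));
    }
}
```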

cloneProcessDoc

public static org.w3c.dom.Document cloneProcessDoc(java.lang.String processType,
                                                   org.w3c.dom.Element searchProcess)
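
The source gives no description for this method. Judging only from the signature, one plausible reading is that it copies a process element from the configuration into a standalone Document that can then be modified per query. That reading is an assumption, and the sketch below is not the library's implementation: ProcessCopySketch, newDocument, and copyProcess are hypothetical names, and the code uses only the standard JDK DOM API.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical illustration class; not part of the Raritan API.
final class ProcessCopySketch {

    /** Creates an empty DOM Document. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }

    /**
     * Deep-copies a process element into a new Document whose root is named rootName.
     * Attributes and all descendants are copied; the original element is not modified.
     */
    static Document copyProcess(String rootName, Element process) {
        Document copy = newDocument();
        Element root = copy.createElement(rootName);
        copy.appendChild(root);

        // Copy the attributes of the process element (for example outputStep).
        NamedNodeMap attrs = process.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            root.setAttribute(attr.getNodeName(), attr.getNodeValue());
        }

        // importNode(child, true) makes a deep copy owned by the new Document.
        for (Node child = process.getFirstChild(); child != null; child = child.getNextSibling()) {
            root.appendChild(copy.importNode(child, true));
        }
        return copy;
    }

    public static void main(String[] args) {
        Document config = newDocument();
        Element searchProcess = config.createElement("SearchProcess");
        searchProcess.setAttribute("outputStep", "1");
        searchProcess.appendChild(config.createElement("Step"));

        Document copy = copyProcess("SearchProcess", searchProcess);
        System.out.println(copy.getDocumentElement().getTagName());
    }
}
```

Working on a copy keeps the configured template unchanged, so values written for one query do not leak into the next.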

addSteps

public static void addSteps(org.w3c.dom.Document processDoc,
                            org.w3c.dom.Element source)