com.raritantechnologies.federated.html
Class HTMLSearchSourceFactory
java.lang.Object
com.raritantechnologies.searchApp.XMLSearchSourceFactory
com.raritantechnologies.federated.html.HTMLSearchSourceFactory
- All Implemented Interfaces:
- IXMLSearchSourceFactory
- public class HTMLSearchSourceFactory
- extends XMLSearchSourceFactory
Responsible for constructing HTMLSearchSource
objects from <SourceType> configuration XML tags. An HTMLSearchSource is used to query
the search engines of web sites.
XML Configuration Template:
<SourceType name="[unique search source name]"
type="HTMLSearchSource"
displayName="[ displayable name ]"
sourceFactoryClass="com.raritantechnologies.federated.html.HTMLSearchSourceFactory"
queryProcessor="com.raritantechnologies.federated.html.HTMLQueryProcessor"
resultIsXML="true|false(default)"
cacheCookieKey="[ key to cache cookies from site ]"
cookieResultField="[ result field to put cookie String in ]"
IDField="[ field to use as unique ID ]"
URLField="[ field with fullText URL ]"
titleField="[ field with document Title ]" >
<!-- Describes mapping of query input parameters to the HTML SearchProcess and of the -->
<!-- abstract or normalized field names to the field name at the HTML source. -->
<Fields>
<!-- The value of the ID field in the query will be inserted into the SearchProcess -->
<!-- element at the xPath location specified in the xPath parameter. The sourceName -->
<!-- field defines the name of the field at the HTML source. -->
<Field ID="[ abstract field name ]"
xPath="[ xPath within the SearchProcess: e.g. '/SearchProcess/Step/params/param[@formName='q']/@value' ]"
sourceName="[ field name at source (e.g. 'q')]"/>
<!-- Describes complex formatting needed for a field value. -->
<!-- FormatField uses com.raritantechnologies.utils.format classes -->
<FormatField value="{(KY)_KY_}"
xPath="/SearchProcess/Step/params/param[@formName='term']/@value" />
<!-- FilterField uses com.raritantechnologies.utils.filter classes -->
<FilterField class="com.raritantechnologies.utils.filter.IStringFilter"
xPath="/SearchProcess/Step/params/param[@formName='term']/@value" />
<Field ID="[ boolean ID ]" xPath="[ path to search form value ]" >
<!-- FieldLookup allows user selected value to be looked up for use in FormatField -->
<!-- used by Block: {(TI)(BOOL(TI TTL TI_OP ())) _TTL_[Title]}
see com.raritantechnologies.utils.format.BlockFormatter javadocs for details -->
<FieldMap ID="TIBOOL" name="TI_OP">
<Choice abstractVal="AND" sourceVal="AND" />
<Choice abstractVal="OR" sourceVal="OR" />
</FieldMap>
</Field>
</Fields>
<SecurityModel>
<search>[ Public | Restricted ]</search>
</SecurityModel>
<!-- Restricted sites require a LoginProcess which defines how login parameters are to be handled -->
<LoginMap>
<UserName xPath="[ xPath to userName in LoginProcess ]" />
<Password xPath="[ xPath to password in LoginProcess ]" />
</LoginMap>
<LoginProcess>
<Step type="[getURL|getURLSocket|postURL|postURLSocket]" URL="[ login form URL ]" >
<params>
<param formName="[ name of parameter in html form ]" value="[ form value ]" alwaysOutput="[true|false(default) - use for blank values ]" />
</params>
</Step>
</LoginProcess>
<!-- For NTLM Authentication - use this format: -->
<LoginProcess UserName="[user name]"
Password="[password]"
PasswordEnc="[DES encrypted DB password]" />
<!-- The SearchProcess describes the search form that will be sent to the search site: -->
<!-- It can consist of one or more "Steps" depending on the site -->
<SearchProcess
outputStep="[ step number (from 1) that generates output - if no value: the last step will be used ]" >
<Step type="[getURL|getURLSocket|getNTLMAuthorized|postURL|postURLSocket|]" URL="[the URL that the form should be sent to]" >
<params>
<param formName="[ name of parameter in html form ]" value="[ form value ]" alwaysOutput="[true|false(default) - use for blank values ]" />
<param formName="[ name of form parameter ]" value="" alwaysOutput="[true|false(default) - use for blank values ]" />
<!-- etc. . . -->
</params>
</Step>
</SearchProcess>
<!-- The PageProcess describes how paging commands (get or post) will be sent to the search site: -->
<PageProcess mapFrom="[ xPath within result to get paging data ]" method="[tagMap| ]" >
<TotalDocs mapFrom="[ xPath within result to get total docs e.g. '/Records/Page/TotalDocs' ]" />
<Step type="[getURL|getURLSocket|postURL|postURLSocket]" URL="[the URL that the form should be sent to]" >
<params>
<param formName="[ name of parameter in html form ]" value="[ form value ]" alwaysOutput="[true|false(default) - use for blank values ]" />
<!-- computed parameter -->
<param formName="[ name of parameter in html form ]" value="" computeFrom="[ compute formula with PAGE_NUM as placeholder for page Number with 1 representing the first page ]" />
<param formName="[ name of parameter ]" computeFrom="[ some string with pattern {COMPUTE_FROM:[ formula ]} ]" />
</params>
</Step>
</PageProcess>
<ScraperConfigFile>[ path to the HTMLScraper configuration File ]</ScraperConfigFile>
<OutputTransformer>[ path to the XSL file that translates the raw XML to result XML ]</OutputTransformer>
<!-- Optional FieldFormatters section -->
<FieldFormatters>
<Formatter formatterClass="[ class of com.raritantechnologies.searchApp.IFieldFormatter ]" >
</Formatter>
<!-- etc. . . -->
</FieldFormatters>
<ResultSetAttributes>
<Attribute name="[ attribute name ]" xPath="[ xPath of value within HTMLScraper XML output ]" />
<Attribute name="[ another ]" xPath="[ its xPath ]" />
<!-- etc... -->
</ResultSetAttributes>
</SourceType>
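The <Field> comments in the template say that a query value is inserted into the SearchProcess at the location named by the xPath attribute. The following is a minimal sketch of that substitution using only the standard JAXP DOM and XPath APIs. The class SearchProcessFieldDemo, its helper methods, and the example.com URL are hypothetical and are not part of the Raritan library; how HTMLSearchSource actually performs the substitution is not shown on this page.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of com.raritantechnologies.
class SearchProcessFieldDemo {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Sets the attribute or text node selected by xPath; returns false if nothing matched. */
    static boolean setValue(Document doc, String xPath, String value) {
        try {
            XPath xp = XPathFactory.newInstance().newXPath();
            Node target = (Node) xp.evaluate(xPath, doc, XPathConstants.NODE);
            if (target == null) {
                return false;
            }
            target.setNodeValue(value);
            return true;
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + xPath, e);
        }
    }

    /** Reads the string value at xPath (empty string if nothing matched). */
    static String readValue(Document doc, String xPath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xPath, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + xPath, e);
        }
    }

    public static void main(String[] args) {
        // A one-step SearchProcess with a single blank form parameter named "q".
        Document searchProcess = parse(
            "<SearchProcess><Step type=\"getURL\" URL=\"http://example.com/search\">"
          + "<params><param formName=\"q\" value=\"\"/></params></Step></SearchProcess>");
        String xPath = "/SearchProcess/Step/params/param[@formName='q']/@value";
        setValue(searchProcess, xPath, "federated search");
        System.out.println(readValue(searchProcess, xPath));
    }
}
```

Note that an XPath ending in /@value only matches when the param element already carries a value attribute, which is why the template declares value="" for parameters that are filled in at query time.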
Developed by Raritan Technologies Inc.
- Author:
- Ted Sullivan
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
HTMLSearchSourceFactory
public HTMLSearchSourceFactory()
createSearchSources
public SearchSource[] createSearchSources(org.w3c.dom.Element sourceElem,
ISearchFieldMapFactory factory)
- Extracts data from the XML configuration to create HTMLSearchSource objects.
Needs to set the following on the HTMLSearchSource:
the HTMLScraper config Document
the LoginProcess Document template (if necessary)
the loginProcessMap (a map of UserName --> XPath to set the value in the LoginProcess Document,
and Password --> XPath to set the value in the LoginProcess Document)
the SearchProcess Document
Creates an HTMLSearchField for each field in the SearchProcess.
(Note: each field needs an ID attribute that maps to the FieldID.)
Sets the HTMLSearchSource "setSourceField" property to the XPath, within the SearchProcess
template, of the field value.
HTMLSearchField properties:
PageSizePath: "PageSize" mapped to the XPath for setting the page size, if applicable
StartRecPath: "StartRec" mapped to the XPath for setting the start record, if applicable
The Map is generated by the HTMLSearchSourceFactory from the Config XML for an
HTMLSearchSource.
- Specified by:
createSearchSources in interface IXMLSearchSourceFactory
- Specified by:
createSearchSources in class XMLSearchSourceFactory
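The description above mentions a loginProcessMap that maps UserName and Password to XPath locations in the LoginProcess Document. As a rough illustration, the sketch below reads a <LoginMap> element of the shape shown in the configuration template into such a map with the standard DOM API. The class LoginMapDemo, its methods, and the sample XPath strings are hypothetical; the factory's real implementation is not documented on this page.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of com.raritantechnologies.
class LoginMapDemo {

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Builds a map from each child tag of the first LoginMap element
     * (for example UserName, Password) to the value of its xPath attribute.
     * Returns an empty map when the source has no LoginMap.
     */
    static Map<String, String> readLoginMap(Element sourceType) {
        Map<String, String> result = new LinkedHashMap<>();
        NodeList maps = sourceType.getElementsByTagName("LoginMap");
        if (maps.getLength() == 0) {
            return result;
        }
        NodeList children = maps.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element entry = (Element) child;
                result.put(entry.getTagName(), entry.getAttribute("xPath"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Element sourceType = parseRoot(
            "<SourceType name=\"demo\"><LoginMap>"
          + "<UserName xPath=\"/LoginProcess/Step/params/param[@formName='user']/@value\"/>"
          + "<Password xPath=\"/LoginProcess/Step/params/param[@formName='pass']/@value\"/>"
          + "</LoginMap></SourceType>");
        // Print each login parameter with the XPath it should be written to.
        readLoginMap(sourceType).forEach((key, xPath) -> System.out.println(key + " --> " + xPath));
    }
}
```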
cloneProcessDoc
public static org.w3c.dom.Document cloneProcessDoc(java.lang.String processType,
org.w3c.dom.Element searchProcess)
addSteps
public static void addSteps(org.w3c.dom.Document processDoc,
org.w3c.dom.Element source)
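The two static helpers above are listed without descriptions. Judging only by their signatures, one plausible reading is that cloneProcessDoc builds a fresh process Document from a configured process element, and addSteps copies <Step> elements into it. The sketch below shows that idea with the standard DOM importNode call. It is a guess for illustration, not the library's behavior: the class ProcessDocDemo and the methods cloneProcess and copySteps are hypothetical stand-ins.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of com.raritantechnologies.
class ProcessDocDemo {

    private static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }

    /** Parses an XML string and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return builder().parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Assumed behavior: new Document whose root is named processType and holds copies of the template's Steps. */
    static Document cloneProcess(String processType, Element template) {
        Document doc = builder().newDocument();
        doc.appendChild(doc.createElement(processType));
        copySteps(doc, template);
        return doc;
    }

    /** Assumed behavior: deep-copies each direct Step child of source under the root of processDoc. */
    static void copySteps(Document processDoc, Element source) {
        NodeList children = source.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE && "Step".equals(child.getNodeName())) {
                // importNode creates a copy owned by processDoc; the template is left untouched.
                processDoc.getDocumentElement().appendChild(processDoc.importNode(child, true));
            }
        }
    }

    public static void main(String[] args) {
        Element template = parseRoot(
            "<SearchProcess><Step type=\"getURL\" URL=\"http://example.com/search\">"
          + "<params><param formName=\"q\" value=\"\"/></params></Step></SearchProcess>");
        Document copy = cloneProcess("SearchProcess", template);
        System.out.println(copy.getElementsByTagName("Step").getLength());
    }
}
```

Working on a per-query copy rather than on the configured template would keep one query's parameter values from leaking into the next, which may be why a clone helper exists at all.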