com.raritantechnologies.HTML
Class HTMLScraperPageImportRenderer
java.lang.Object
com.raritantechnologies.searchApp.taglibrary.PageImportRenderer
com.raritantechnologies.HTML.HTMLScraperPageImportRenderer
- All Implemented Interfaces:
- IConfigurable, IPageContextRenderer
- public class HTMLScraperPageImportRenderer
- extends PageImportRenderer
- implements IPageContextRenderer
Uses an HTMLScraper to retrieve the final page. A scraper may be
needed to perform a deep scrape that reaches a specific page described by a SearchProcess
and a set of Scraper XML configuration files.
A login process may also be used to retrieve pages from secure sites.
Use the URLPageImportRenderer if a simple GET or
POST request is sufficient. The HTMLScraperPageImportRenderer
is intended for "heavy-duty" page imports that may require several steps.
XML Configuration Template:
<SystemObject type="PageImportRenderer" name="[The SysObject Name]"
configurableClass="com.raritantechnologies.HTML.HTMLScraperPageImportRenderer"
scraperConfig=" scraper config file name "
resultIsHTML="false|true" >
<Fields>
<Field ID="[the field ID]" xPath="path into SearchProcess" />
<Field ID="[the field ID]" xPath="path into SearchProcess" >
<StringFilter class="[ an IStringFilter ]" >
</StringFilter>
</Field>
</Fields>
<LoginProcess>
</LoginProcess>
<SearchProcess>
</SearchProcess>
<OutputStringFilter class="[ class of com.raritantechnologies.utils.filter.IStringFilter ]" >
</OutputStringFilter>
</SystemObject>
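As a minimal sketch of how a filled-in copy of this template might look once parsed into the `org.w3c.dom.Element` that `initialize` receives, the following uses only the standard JAXP and DOM APIs. The object name, scraper file name, field ID, and xPath value are hypothetical placeholders, and the class `ScraperRendererConfigSketch` is not part of the Raritan framework. In a real deployment the framework performs the parsing itself.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ScraperRendererConfigSketch {

    // Hypothetical filled-in instance of the template; every value is a placeholder.
    static final String SAMPLE_CONFIG =
        "<SystemObject type=\"PageImportRenderer\" name=\"ExampleImport\"\n"
      + "    configurableClass=\"com.raritantechnologies.HTML.HTMLScraperPageImportRenderer\"\n"
      + "    scraperConfig=\"example-scraper.xml\"\n"
      + "    resultIsHTML=\"true\">\n"
      + "  <Fields>\n"
      + "    <Field ID=\"query\" xPath=\"/example/path\" />\n"
      + "  </Fields>\n"
      + "  <SearchProcess></SearchProcess>\n"
      + "</SystemObject>";

    // Parses the XML text and returns its root element (the SystemObject tag).
    static Element parseRoot(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    // Collects each Field's ID and xPath attribute, in document order.
    static Map<String, String> fieldPaths(Element root) {
        Map<String, String> result = new LinkedHashMap<>();
        NodeList fields = root.getElementsByTagName("Field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            result.put(field.getAttribute("ID"), field.getAttribute("xPath"));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Element root = parseRoot(SAMPLE_CONFIG);
        System.out.println("scraperConfig = " + root.getAttribute("scraperConfig"));
        System.out.println("resultIsHTML  = " + Boolean.parseBoolean(root.getAttribute("resultIsHTML")));
        System.out.println("fields        = " + fieldPaths(root));
    }
}
```

The sketch only reads the attributes named in the template; how the renderer itself interprets them is not shown here.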
Developed by Raritan Technologies Inc.
- Author:
- Ted Sullivan
Method Summary
- java.lang.String getPage(RaritanPageContext pContext)
  Returns an HTML page or page fragment given a set of request parameters.
- void initialize(org.w3c.dom.Element elem)
  Initializes the object from an XML tag or element.
- java.lang.String render(RaritanPageContext pContext)
  Returns the tag body.
Methods inherited from class com.raritantechnologies.searchApp.taglibrary.PageImportRenderer
addPageElement, getAddPersistent, getConfigurationXML, getFragmentFile, getPageHeader, getPageName, getPageTrailer, setAddPersistent, setPageHeader, setPageName, setPageTrailer, setStringFilter
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
HTMLScraperPageImportRenderer
public HTMLScraperPageImportRenderer()
render
public java.lang.String render(RaritanPageContext pContext)
- Description copied from interface:
IPageContextRenderer
- Returns the tag body.
- Specified by:
render in interface IPageContextRenderer
- Overrides:
render in class PageImportRenderer
getPage
public java.lang.String getPage(RaritanPageContext pContext)
- Description copied from class:
PageImportRenderer
- Returns an HTML page or page fragment given a set of request parameters.
- Overrides:
getPage in class PageImportRenderer
- Parameters:
pContext - contains request and session parameters needed to execute the page
retrieval.
- Returns:
- A string containing the page data.
initialize
public void initialize(org.w3c.dom.Element elem)
- Description copied from interface:
IConfigurable
- Initializes the object from an XML tag or element.
This method is called by the Framework as part of the application initialization.
See ConfigurationManager, XMLConfigurationManager, XMLSearchFieldMapFactory, XMLSearchSourceFactory.
Configurable objects that are owned or contained by other configurable objects will be initialized
by the parent object.
- Specified by:
initialize in interface IConfigurable
- Overrides:
initialize in class PageImportRenderer
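The note that contained configurable objects are initialized by their parent can be illustrated with a small, self-contained sketch. Everything here is a stand-in: the `Configurable` interface only mirrors the `initialize(org.w3c.dom.Element)` signature documented above, and `RendererConfig` and `FieldConfig` are hypothetical classes, not the framework's own. It is one plausible shape for the pattern, not the actual implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NestedInitSketch {

    // Stand-in for IConfigurable; only the method shape is taken from the documentation.
    interface Configurable {
        void initialize(Element elem);
    }

    // Hypothetical contained object: configured from one <Field> element.
    static class FieldConfig implements Configurable {
        String id;
        String xPath;

        @Override
        public void initialize(Element elem) {
            id = elem.getAttribute("ID");
            xPath = elem.getAttribute("xPath");
        }
    }

    // Hypothetical parent object: it creates and initializes its own children.
    static class RendererConfig implements Configurable {
        final List<FieldConfig> fields = new ArrayList<>();

        @Override
        public void initialize(Element elem) {
            NodeList nodes = elem.getElementsByTagName("Field");
            for (int i = 0; i < nodes.getLength(); i++) {
                FieldConfig field = new FieldConfig();
                field.initialize((Element) nodes.item(i)); // the parent drives child initialization
                fields.add(field);
            }
        }
    }

    // Plays the role of the framework: parses the XML and initializes only the top-level object.
    static RendererConfig load(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        RendererConfig config = new RendererConfig();
        config.initialize(root);
        return config;
    }

    public static void main(String[] args) throws Exception {
        RendererConfig config = load(
            "<SystemObject><Fields><Field ID=\"query\" xPath=\"/example/path\"/></Fields></SystemObject>");
        System.out.println(config.fields.size() + " field(s) initialized by the parent");
    }
}
```

In this arrangement the caller touches only the top-level object, and each parent decides which child elements to hand to which contained object.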