- In the HTMLScraperFilter demo, as part of the snippet extraction process.
The configuration combines a URLContentFilter, which retrieves the content
at the URL, with an HTMLFilter whose ExtractHTMLSectionFilter EventProcessor
extracts the HTML section containing the snippet:
<StringFilter class="com.raritantechnologies.utils.filter.SequentialStringFilter" >
<!-- ============================================================== -->
<!-- URLContentFilter: Transforms the URL into its content -->
<!-- ============================================================== -->
<StringFilter class="com.raritantechnologies.utils.filter.URLContentFilter"
useURLSocket="false"
requestMethod="get" />
<!-- =================================================================== -->
<!-- HTMLFilter: Extracts the portion of the HTML document between the -->
<!-- #BeginEditable "content" comment and the #EndEditable HTML comment -->
<!-- =================================================================== -->
<StringFilter class="com.raritantechnologies.HTML.filter.HTMLFilter" >
<EventProcessor class="com.raritantechnologies.HTML.filter.ExtractHTMLSectionFilter" >
<StartComparator class="com.raritantechnologies.utils.comparators.AndComparator" >
<Comparator class="com.raritantechnologies.utils.comparators.StringContainsComparator"
contains="#BeginEditable" />
<Comparator class="com.raritantechnologies.utils.comparators.StringContainsComparator"
contains="content" />
</StartComparator>
<EndComparator class="com.raritantechnologies.utils.comparators.StringContainsComparator"
contains="#EndEditable" />
</EventProcessor>
</StringFilter>
</StringFilter>
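The start/end comparator logic configured above can be sketched in plain Java. This is only an illustration of the matching rule (start at the first comment containing both "#BeginEditable" and "content", stop at the next comment containing "#EndEditable"); it is not the framework's implementation. The class and method names are hypothetical, and whether ExtractHTMLSectionFilter keeps or drops the delimiting comments is an assumption (here they are dropped).

```java
public class SectionExtractSketch {

    // Hypothetical helper: returns the markup between the start comment
    // (contains "#BeginEditable" AND "content") and the end comment
    // (contains "#EndEditable"). Returns "" if no such section exists.
    static String extractSection(String html) {
        int pos = 0;
        int start = -1; // index just past the start comment, once found
        while (true) {
            int open = html.indexOf("<!--", pos);
            if (open < 0) {
                return "";
            }
            int close = html.indexOf("-->", open + 4);
            if (close < 0) {
                return "";
            }
            String comment = html.substring(open + 4, close);
            if (start < 0) {
                // Mirrors the AndComparator: both substrings must be present.
                if (comment.contains("#BeginEditable") && comment.contains("content")) {
                    start = close + 3;
                }
            } else if (comment.contains("#EndEditable")) {
                // Mirrors the EndComparator: first matching comment ends the section.
                return html.substring(start, open);
            }
            pos = close + 3;
        }
    }

    public static void main(String[] args) {
        String page = "<html><!-- #BeginEditable \"nav\" -->menu<!-- #EndEditable -->"
                + "<!-- #BeginEditable \"content\" --><p>Snippet</p><!-- #EndEditable --></html>";
        System.out.println(extractSection(page));
    }
}
```

Note that the "nav" region is skipped because its start comment does not contain "content", which is the reason the configuration uses an AndComparator rather than a single StringContainsComparator.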
- In the Google SOAPSearchSource, to clean up the encoded HTML markup that
Google puts into the XML response so that it displays correctly. (This is a
common XML/HTML problem with the '<' character.)
<!-- unencode < and > markup so that this displays as intended -->
<OutputFilter class="com.raritantechnologies.utils.filter.SequentialStringFilter" >
<StringFilter class="com.raritantechnologies.utils.filter.ReplaceSubstringFilter"
inPattern="&amp;lt;" outPattern="&lt;" replace="ALL" />
<StringFilter class="com.raritantechnologies.utils.filter.ReplaceSubstringFilter"
inPattern="&amp;gt;" outPattern="&gt;" replace="ALL" />
</OutputFilter>
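The effect of the two chained replacements can be sketched in plain Java: every escaped `&lt;` becomes a literal `<`, then every escaped `&gt;` becomes a literal `>`. This is a minimal illustration using `String.replace`, not the ReplaceSubstringFilter code itself; the class and method names are hypothetical.

```java
public class EntityUnescapeSketch {

    // Hypothetical helper: applies the two replace-all steps in sequence,
    // in the same order as the filters in the configuration above.
    static String unescapeAngleBrackets(String text) {
        return text.replace("&lt;", "<").replace("&gt;", ">");
    }

    public static void main(String[] args) {
        String encoded = "&lt;b&gt;Raritan&lt;/b&gt; search results";
        System.out.println(unescapeAngleBrackets(encoded));
    }
}
```

Only the two angle-bracket entities are handled, matching the configuration; other entities such as `&amp;` are left untouched.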
- In the HTMLFilter demo, as part of a complex hyperlink rewriting operation
used to integrate the standard Javadoc pages with example pages like this one:
<StringFilter class="com.raritantechnologies.utils.filter.SequentialStringFilter" >
<StringFilter class="com.raritantechnologies.utils.filter.RegExprStringFilter"
inPattern="(.*)/com/raritantechnologies(.*)"
outPattern="/FrameworkDocumentation/JavadocPage.jsp?javadoc=/doc/com/raritantechnologies$2" />
<StringFilter class="com.raritantechnologies.utils.filter.ConcatenateFilter" >
. . .
</StringFilter>