com.raritantechnologies.utils
Class Stemmer

java.lang.Object
  extended by com.raritantechnologies.utils.Stemmer

public class Stemmer
extends java.lang.Object

Stemmer, an implementation of the Porter stemming algorithm enhanced to produce more accurate stems and to generate word variations.
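For orientation, the first step of the original Porter algorithm (Step 1a) removes plural endings with ordered suffix rules: sses becomes ss, ies becomes i, ss is kept, and a final s is dropped. The sketch below implements only that step of the published algorithm. It is not this class's enhanced implementation, and the class name PorterStep1a is hypothetical.

```java
// Hypothetical sketch of Porter Step 1a only; not the Stemmer class's code.
class PorterStep1a {
    // The first matching rule wins, so longer suffixes are checked first.
    static String step1a(String w) {
        if (w.endsWith("sses")) return w.substring(0, w.length() - 2); // caresses -> caress
        if (w.endsWith("ies"))  return w.substring(0, w.length() - 2); // ponies -> poni
        if (w.endsWith("ss"))   return w;                              // caress -> caress
        if (w.endsWith("s"))    return w.substring(0, w.length() - 1); // cats -> cat
        return w;                                                      // no rule applies
    }

    public static void main(String[] args) {
        for (String w : new String[] {"caresses", "ponies", "caress", "cats"}) {
            System.out.println(w + " -> " + step1a(w));
        }
    }
}
```

Later Porter steps then work on the result (for example, turning a final i back into y in some contexts), which is why intermediate stems such as poni are not dictionary words.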

The Stemmer class transforms a word into its root form. The input word can be provided one character at a time (by calling add()), or all at once by calling one of the various stem(something) methods.


Developed by Raritan Technologies.

Author:
Kepler Gelotte

Constructor Summary
Stemmer()
           
 
Method Summary
 void add(char ch)
          Add a character to the word being stemmed.
 void add(char[] w, int wLen)
          Adds the first wLen characters of the char[] array w to the word being stemmed.
static java.lang.String comparative(java.lang.String s)
          Returns the comparative (-er) form of a word, e.g. tall + er.
 char[] getResultBuffer()
          Returns a reference to a character buffer containing the results of the stemming process.
 int getResultLength()
          Returns the length of the word resulting from the stemming process.
 java.util.Set getStemVariants(java.lang.String term)
          Returns a Set containing all stemmed variations of term.
 java.util.Set getStemVariants(java.lang.String term, boolean includeTerm)
           
 java.lang.String getStemVariants(java.lang.String term, java.lang.String delimiter)
          Returns the stemmed variations of term as a String.
static java.lang.String[] irregularPast(java.lang.String s)
          Returns the irregular past tense form(s) of a word.
static java.lang.String irregularPlural(java.lang.String s)
           
static java.lang.String irregularSingular(java.lang.String s)
           
static boolean isPlural(java.lang.String s)
           
static void main(java.lang.String[] args)
          Test program for demonstrating the Stemmer.
static java.lang.String past(java.lang.String s)
          Returns the past tense (-ed) of a word, e.g. walk + ed.
static java.lang.String plural(java.lang.String s)
          Returns the plural (-s) form of a word, e.g. dog + s; the same -s suffix also forms the third-person singular present, e.g. sing + s.
static java.lang.String progressive(java.lang.String s)
          Returns the progressive (-ing) form of a word, e.g. say + ing.
static java.lang.String singular(java.lang.String s)
           
 void stem()
          Stem the word placed into the Stemmer buffer through calls to add().
static java.lang.String[] stemPhrase(java.lang.String phrase)
           
static java.lang.String stemWord(java.lang.String word)
           
static java.lang.String superlative(java.lang.String s)
          Returns the superlative (-est) form of a word.
 java.lang.String toString()
          After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength() (which is generally more efficient).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Stemmer

public Stemmer()
Method Detail

stemWord

public static java.lang.String stemWord(java.lang.String word)

stemPhrase

public static java.lang.String[] stemPhrase(java.lang.String phrase)

add

public void add(char ch)
Add a character to the word being stemmed. When you are finished adding characters, call stem() to stem the word.


add

public void add(char[] w,
                int wLen)
Adds the first wLen characters of the char[] array w to the word being stemmed. This is equivalent to repeated calls of add(char ch), but faster.
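A minimal sketch of how the two add overloads can relate, assuming a growable char[] buffer. WordBuffer and its doubling growth policy are hypothetical, not the Stemmer's internals. The bulk overload copies with System.arraycopy, which is one reason a single bulk call can be faster than wLen separate calls.

```java
import java.util.Arrays;

// Hypothetical stand-in for a stemmer's input buffer; not the library's code.
class WordBuffer {
    private char[] b = new char[8]; // initial capacity (assumed)
    private int len = 0;            // number of characters added so far

    // Appends one character, doubling the array when it is full.
    void add(char ch) {
        if (len == b.length) b = Arrays.copyOf(b, b.length * 2);
        b[len++] = ch;
    }

    // Appends the first wLen characters of w with one bulk copy.
    void add(char[] w, int wLen) {
        if (len + wLen > b.length) b = Arrays.copyOf(b, Math.max(b.length * 2, len + wLen));
        System.arraycopy(w, 0, b, len, wLen);
        len += wLen;
    }

    @Override
    public String toString() {
        return new String(b, 0, len); // only the characters actually added
    }

    public static void main(String[] args) {
        WordBuffer one = new WordBuffer();
        for (char c : "generalization".toCharArray()) one.add(c);

        WordBuffer bulk = new WordBuffer();
        bulk.add("generalizations!".toCharArray(), 14); // first 14 characters only
        System.out.println(one + " " + bulk);           // generalization generalization
    }
}
```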


toString

public java.lang.String toString()
After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength() (which is generally more efficient).


getResultLength

public int getResultLength()
Returns the length of the word resulting from the stemming process.


getResultBuffer

public char[] getResultBuffer()
Returns a reference to a character buffer containing the results of the stemming process. You also need to consult getResultLength() to determine the length of the result.
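To illustrate why getResultLength() must be consulted, here is a hypothetical sketch in which stemming shortens the word in place by lowering a length field. InPlaceResult and its toy stem(), which only drops a trailing "ing" and is not Porter's rules, are assumptions for illustration. The returned array still holds the original characters past the result length.

```java
// Hypothetical sketch: in-place shortening leaves stale characters in the array.
class InPlaceResult {
    private final char[] b;
    private int k; // length of the current result

    InPlaceResult(String word) {
        b = word.toCharArray();
        k = b.length;
    }

    // Toy stem step (assumed): drop a trailing "ing" by shrinking k, not the array.
    void stem() {
        if (k > 3 && b[k - 3] == 'i' && b[k - 2] == 'n' && b[k - 1] == 'g') k -= 3;
    }

    char[] getResultBuffer() { return b; } // shared reference, not a copy
    int getResultLength()    { return k; }

    public static void main(String[] args) {
        InPlaceResult s = new InPlaceResult("walking");
        s.stem();
        char[] buf = s.getResultBuffer();
        System.out.println(buf.length);                              // 7: array still holds "walking"
        System.out.println(new String(buf, 0, s.getResultLength())); // walk
    }
}
```

Reading the shared buffer directly avoids allocating a new String, which is the efficiency trade-off the description mentions.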


getStemVariants

public java.util.Set getStemVariants(java.lang.String term)
Stems term and returns a Set containing all of its stemmed variations.


getStemVariants

public java.util.Set getStemVariants(java.lang.String term,
                                     boolean includeTerm)

getStemVariants

public java.lang.String getStemVariants(java.lang.String term,
                                        java.lang.String delimiter)
Stems term and returns its stemmed variations as a String.
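The method descriptions do not say how variations are produced. Purely as an assumption, the sketch below treats them as the input plus a fixed suffix list, returns them as an insertion-ordered Set, and shows one plausible reading of the delimiter overload: the same variations joined into a single String. VariantSketch and SUFFIXES are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical: variants = word + each suffix in a small fixed list.
// The real getStemVariants may generate different or more variants.
class VariantSketch {
    static final String[] SUFFIXES = {"", "s", "ed", "ing"}; // assumed list

    static Set<String> variants(String word) {
        Set<String> out = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        for (String suf : SUFFIXES) out.add(word + suf);
        return out;
    }

    // Assumed meaning of the delimiter overload: the same variants joined into one String.
    static String variants(String word, String delimiter) {
        return String.join(delimiter, variants(word));
    }

    public static void main(String[] args) {
        System.out.println(variants("walk"));      // [walk, walks, walked, walking]
        System.out.println(variants("walk", "|")); // walk|walks|walked|walking
    }
}
```

A Set fits this use because two generation paths can yield the same string, and duplicates are discarded automatically.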


stem

public void stem()
Stem the word placed into the Stemmer buffer through calls to add(). The method returns nothing; retrieve the result with getResultLength()/getResultBuffer() or toString().


plural

public static java.lang.String plural(java.lang.String s)
Returns the plural (-s) form of a word, e.g. dog + s; the same -s suffix also forms the third-person singular present, e.g. sing + s.
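A minimal sketch of regular English -s suffixation, assuming common spelling rules. It is not the library's plural(): it ignores irregular nouns and exceptions such as consonant doubling (quiz, quizzes). PluralSketch is hypothetical.

```java
// Hypothetical sketch of regular -s / -es suffixation; irregular forms are out of scope.
class PluralSketch {
    static boolean isVowel(char c) { return "aeiou".indexOf(c) >= 0; }

    static String plural(String s) {
        int n = s.length();
        // consonant + y becomes -ies (city -> cities); vowel + y just adds s (day -> days)
        if (n > 1 && s.endsWith("y") && !isVowel(s.charAt(n - 2))) {
            return s.substring(0, n - 1) + "ies";
        }
        // sibilant endings take -es (box -> boxes, church -> churches)
        if (s.endsWith("s") || s.endsWith("x") || s.endsWith("z")
                || s.endsWith("ch") || s.endsWith("sh")) {
            return s + "es";
        }
        return s + "s"; // default: dog -> dogs, sing -> sings
    }

    public static void main(String[] args) {
        for (String w : new String[] {"dog", "box", "city", "day", "church"}) {
            System.out.println(w + " -> " + plural(w));
        }
    }
}
```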


isPlural

public static boolean isPlural(java.lang.String s)

singular

public static java.lang.String singular(java.lang.String s)

past

public static java.lang.String past(java.lang.String s)
Returns the past tense (-ed) of a word, e.g. walk + ed.
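A hypothetical sketch of regular -ed past-tense formation under simple spelling rules (final e, consonant + y). It is not the library's past(), and it deliberately omits consonant doubling (stop, stopped) and irregular verbs.

```java
// Hypothetical sketch of regular -ed suffixation; irregular verbs are out of scope.
class PastSketch {
    static boolean isVowel(char c) { return "aeiou".indexOf(c) >= 0; }

    static String past(String s) {
        int n = s.length();
        if (s.endsWith("e")) return s + "d";                      // bake -> baked
        if (n > 1 && s.endsWith("y") && !isVowel(s.charAt(n - 2)))
            return s.substring(0, n - 1) + "ied";                 // carry -> carried
        return s + "ed";                                          // walk -> walked, play -> played
    }

    public static void main(String[] args) {
        for (String w : new String[] {"walk", "bake", "carry", "play"}) {
            System.out.println(w + " -> " + past(w));
        }
    }
}
```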


irregularPast

public static java.lang.String[] irregularPast(java.lang.String s)
Returns the irregular past tense form(s) of a word.
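Because irregularPast returns String[], a verb can presumably map to more than one past form. The sketch below assumes a lookup-table design and returns an empty array when no irregular form is known; the table contents and IrregularPastSketch are hypothetical, and the real method may behave differently (for example, by returning null).

```java
import java.util.Map;

// Hypothetical lookup-table sketch; the real table and miss behavior are unknown.
class IrregularPastSketch {
    // A verb may have more than one accepted irregular past form.
    static final Map<String, String[]> TABLE = Map.of(
        "go",   new String[] {"went"},
        "spit", new String[] {"spat", "spit"});

    // Returns a copy of the known form(s), or an empty array if none are listed.
    static String[] irregularPast(String s) {
        return TABLE.getOrDefault(s, new String[0]).clone(); // clone protects the table
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", irregularPast("spit"))); // spat, spit
        System.out.println(irregularPast("walk").length);             // 0
    }
}
```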


irregularPlural

public static java.lang.String irregularPlural(java.lang.String s)

irregularSingular

public static java.lang.String irregularSingular(java.lang.String s)

progressive

public static java.lang.String progressive(java.lang.String s)
Returns the progressive (-ing) form of a word, e.g. say + ing.
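A hypothetical sketch of regular -ing formation covering final ie (die, dying) and silent e (make, making), while keeping ee and short words such as be intact. It is not the library's progressive(), and consonant doubling (run, running) is omitted.

```java
// Hypothetical sketch of regular -ing suffixation.
class ProgressiveSketch {
    static String progressive(String s) {
        int n = s.length();
        if (s.endsWith("ie")) return s.substring(0, n - 2) + "ying";   // die -> dying
        if (n > 2 && s.endsWith("e") && !s.endsWith("ee"))
            return s.substring(0, n - 1) + "ing";                      // make -> making
        return s + "ing";                                             // say -> saying, see -> seeing, be -> being
    }

    public static void main(String[] args) {
        for (String w : new String[] {"say", "make", "die", "see", "be"}) {
            System.out.println(w + " -> " + progressive(w));
        }
    }
}
```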


comparative

public static java.lang.String comparative(java.lang.String s)
Returns the comparative (-er) form of a word, e.g. tall + er.


superlative

public static java.lang.String superlative(java.lang.String s)
Returns the superlative (-est) form of a word.
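Comparative and superlative formation share the same spelling adjustments, so this hypothetical sketch routes both through one helper. It assumes simple rules (final e, consonant + y) and is not the library's code; consonant doubling (big, bigger) and periphrastic forms (more careful) are omitted.

```java
// Hypothetical sketch: one helper serves both -er and -est.
class DegreeSketch {
    static boolean isVowel(char c) { return "aeiou".indexOf(c) >= 0; }

    static String addDegree(String s, String suffix) {
        int n = s.length();
        if (s.endsWith("e")) return s + suffix.substring(1);          // large -> larger, largest
        if (n > 1 && s.endsWith("y") && !isVowel(s.charAt(n - 2)))
            return s.substring(0, n - 1) + "i" + suffix;              // happy -> happier, happiest
        return s + suffix;                                            // tall -> taller, tallest
    }

    static String comparative(String s) { return addDegree(s, "er"); }
    static String superlative(String s) { return addDegree(s, "est"); }

    public static void main(String[] args) {
        for (String w : new String[] {"tall", "large", "happy"}) {
            System.out.println(w + " -> " + comparative(w) + ", " + superlative(w));
        }
    }
}
```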


main

public static void main(java.lang.String[] args)
Test program for demonstrating the Stemmer. It reads text from standard input, stems each word, and writes the result to standard output.
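A minimal sketch of a driver in the spirit of main(), assuming that words are runs of ASCII letters and that output is one lowercase stem per line. The placeholder toyStem applies only Porter Step 1a, not this class's algorithm, and StemDriver is hypothetical. Isolating the loop in process(Reader, Writer) lets it be exercised with in-memory streams as well as standard input and output.

```java
import java.io.*;
import java.util.Locale;

// Hypothetical driver: read text, stem each word, write one result per line.
class StemDriver {
    // Placeholder stemmer (Porter Step 1a only); not the Stemmer's algorithm.
    static String toyStem(String w) {
        if (w.endsWith("sses") || w.endsWith("ies")) return w.substring(0, w.length() - 2);
        if (w.endsWith("ss")) return w;
        if (w.endsWith("s")) return w.substring(0, w.length() - 1);
        return w;
    }

    // Splits each line on runs of non-letters and skips the empty tokens split can produce.
    static void process(Reader in, Writer out) throws IOException {
        BufferedReader r = new BufferedReader(in);
        PrintWriter p = new PrintWriter(out);
        String line;
        while ((line = r.readLine()) != null) {
            for (String tok : line.split("[^A-Za-z]+")) {
                if (!tok.isEmpty()) p.println(toyStem(tok.toLowerCase(Locale.ROOT)));
            }
        }
        p.flush();
    }

    public static void main(String[] args) throws IOException {
        process(new InputStreamReader(System.in), new OutputStreamWriter(System.out));
    }
}
```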