org.itc.irst.tcc.sre.util
Class PorterStemmer

java.lang.Object
  extended by org.itc.irst.tcc.sre.util.PorterStemmer
All Implemented Interfaces:
Stemmer

public class PorterStemmer
extends java.lang.Object
implements Stemmer

Stemmer implementing the Porter Stemming Algorithm. The Stemmer class transforms a word into its root form. The input word can be provided one character at a time (by calling add()), or all at once by calling one of the various stem(something) methods.
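
The two input styles described above can be sketched as follows. ToyStemmer is a hypothetical stand-in written only for this illustration: it mirrors the add()/stem()/reset()/toString() call shape but merely strips one trailing 's', so it is not the Porter algorithm and not this class's implementation.

```java
import java.util.Arrays;

// Hypothetical stand-in mirroring the call shape of the stemmer.
// It only strips one trailing 's'; it is NOT the Porter algorithm.
class ToyStemmer {
    private char[] b = new char[16]; // working buffer
    private int i = 0;               // number of characters currently held

    public void reset() { i = 0; }

    public void add(char ch) {
        if (i == b.length) b = Arrays.copyOf(b, b.length * 2); // grow when full
        b[i++] = ch;
    }

    // Returns true if the result differs from the input.
    public boolean stem() {
        if (i > 1 && b[i - 1] == 's') { i--; return true; }
        return false;
    }

    // Convenience form: takes the whole word at once.
    public String stem(String s) {
        reset();
        for (char ch : s.toCharArray()) add(ch);
        stem();
        return toString();
    }

    @Override public String toString() { return new String(b, 0, i); }

    public static void main(String[] args) {
        ToyStemmer st = new ToyStemmer();
        // Style 1: feed one character at a time, then call stem().
        for (char ch : "cats".toCharArray()) st.add(ch);
        boolean changed = st.stem();
        System.out.println(st + " " + changed); // prints "cat true"
        st.reset(); // needed before adding the next word character by character
        // Style 2: hand over the whole word at once.
        System.out.println(st.stem("dogs"));    // prints "dog"
    }
}
```

With the real class, the same sequence of calls would apply, with results determined by the Porter rules rather than by this toy rule.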


Field Summary
private  char[] b
           
private  boolean dirty
           
private static int EXTRA
           
private  int i
           
private static int INC
           
private  int j
           
private  int k
           
private  int k0
           
(package private) static org.apache.log4j.Logger logger
          Define a static logger variable so that it references the Logger instance named PorterStemmer.
private static PorterStemmer stemmer
          A prototype for a feature factory, such that only one instance of the class can ever exist.
 
Constructor Summary
PorterStemmer()
           
 
Method Summary
 void add(char ch)
          Add a character to the word being stemmed.
 void add(char[] ch)
          Add an array of characters to the word being stemmed.
private  boolean cons(int i)
           
private  boolean cvc(int i)
           
private  boolean doublec(int j)
           
private  boolean ends(java.lang.String s)
           
 char[] getResultBuffer()
          Returns a reference to a character buffer containing the results of the stemming process.
 int getResultLength()
          Returns the length of the word resulting from the stemming process.
static PorterStemmer getStemmer()
           
private  int m()
           
static void main(java.lang.String[] args)
          Test program for demonstrating the Stemmer.
(package private)  void r(java.lang.String s)
           
 void reset()
          reset() resets the stemmer so it can stem another word.
(package private)  void setto(java.lang.String s)
           
 boolean stem()
          Stem the word placed into the Stemmer buffer through calls to add().
 boolean stem(char[] word)
          Stem a word contained in a char[].
 boolean stem(char[] word, int wordLen)
          Stem a word contained in a leading portion of a char[] array.
 boolean stem(char[] wordBuffer, int offset, int wordLen)
          Stem a word contained in a portion of a char[] array.
 boolean stem(int i0)
           
 java.lang.String stem(java.lang.String s)
          Stem a word provided as a String.
private  void step1()
           
private  void step2()
           
private  void step3()
           
private  void step4()
           
private  void step5()
           
private  void step6()
           
 java.lang.String toString()
          After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength(), which is generally more efficient.
private  boolean vowelinstem()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

logger

static org.apache.log4j.Logger logger
Define a static logger variable so that it references the Logger instance named PorterStemmer.


stemmer

private static PorterStemmer stemmer
A prototype for a feature factory, such that only one instance of the class can ever exist.


b

private char[] b

i

private int i

j

private int j

k

private int k

k0

private int k0

dirty

private boolean dirty

INC

private static final int INC
See Also:
Constant Field Values

EXTRA

private static final int EXTRA
See Also:
Constant Field Values
Constructor Detail

PorterStemmer

public PorterStemmer()
Method Detail

reset

public void reset()
reset() resets the stemmer so it can stem another word. If you invoke the stemmer by calling add(char) and then stem(), you must call reset() before starting another word.

Specified by:
reset in interface Stemmer

add

public void add(char ch)
Add a character to the word being stemmed. When you are finished adding characters, you can call stem() to process the word.

Specified by:
add in interface Stemmer

add

public void add(char[] ch)
Add an array of characters to the word being stemmed. When you are finished adding characters, you can call stem() to process the word.

Specified by:
add in interface Stemmer

toString

public java.lang.String toString()
After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength(), which is generally more efficient.

Overrides:
toString in class java.lang.Object

getResultLength

public int getResultLength()
Returns the length of the word resulting from the stemming process.

Specified by:
getResultLength in interface Stemmer

getResultBuffer

public char[] getResultBuffer()
Returns a reference to a character buffer containing the results of the stemming process. You also need to consult getResultLength() to determine the length of the result.

Specified by:
getResultBuffer in interface Stemmer
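
Because the returned buffer may be longer than the stemmed word, only the first getResultLength() characters should be read. The sketch below shows that pattern in isolation; the buffer contents and the length of 3 are hypothetical values chosen for illustration, not output captured from this class.

```java
class ResultBufferDemo {
    // Copy only the valid prefix of the buffer into a String.
    static String asString(char[] buffer, int length) {
        return new String(buffer, 0, length);
    }

    public static void main(String[] args) {
        // Hypothetical state: the buffer still holds leftover characters
        // beyond the reported result length.
        char[] buffer = "running".toCharArray();
        int length = 3;
        System.out.println(asString(buffer, length)); // prints "run"
        System.out.println(new String(buffer));       // prints "running"
    }
}
```

Reading the buffer directly avoids allocating a String per word, which is presumably why the documentation calls it more efficient than toString().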

cons

private final boolean cons(int i)

m

private final int m()

vowelinstem

private final boolean vowelinstem()

doublec

private final boolean doublec(int j)

cvc

private final boolean cvc(int i)

ends

private final boolean ends(java.lang.String s)

setto

void setto(java.lang.String s)

r

void r(java.lang.String s)

step1

private final void step1()

step2

private final void step2()

step3

private final void step3()

step4

private final void step4()

step5

private final void step5()

step6

private final void step6()

stem

public java.lang.String stem(java.lang.String s)
Stem a word provided as a String. Returns the result as a String.

Specified by:
stem in interface Stemmer

stem

public boolean stem(char[] word)
Stem a word contained in a char[]. Returns true if the stemming process resulted in a word different from the input. You can retrieve the result with getResultLength()/getResultBuffer() or toString().

Specified by:
stem in interface Stemmer

stem

public boolean stem(char[] wordBuffer,
                    int offset,
                    int wordLen)
Stem a word contained in a portion of a char[] array. Returns true if the stemming process resulted in a word different from the input. You can retrieve the result with getResultLength()/getResultBuffer() or toString().

Specified by:
stem in interface Stemmer
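
The offset/length form is useful when words sit inside a larger character buffer and should not be copied out first. The following sketch finds each run of letters in a char[] and hands its offset and length to a callback. SpanStemmer and the trailing-'s' rule are hypothetical, introduced only for this example; with the real class the callback would call stem(wordBuffer, offset, wordLen) and then read the result.

```java
import java.util.ArrayList;
import java.util.List;

class SpanStemDemo {
    // Hypothetical callback: stems buf[offset .. offset+len) and returns the result.
    interface SpanStemmer {
        String stem(char[] buf, int offset, int len);
    }

    // Walk the buffer, passing each run of letters to the stemmer by offset and length.
    static List<String> stemAll(char[] text, SpanStemmer stemmer) {
        List<String> out = new ArrayList<>();
        int start = -1; // start of the current word, or -1 when between words
        for (int p = 0; p <= text.length; p++) {
            boolean letter = p < text.length && Character.isLetter(text[p]);
            if (letter && start < 0) {
                start = p;
            } else if (!letter && start >= 0) {
                out.add(stemmer.stem(text, start, p - start));
                start = -1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        char[] text = "cats chase dogs".toCharArray();
        // Toy rule (NOT the Porter algorithm): drop one trailing 's'.
        SpanStemmer toy = (buf, off, len) ->
            (len > 1 && buf[off + len - 1] == 's')
                ? new String(buf, off, len - 1)
                : new String(buf, off, len);
        System.out.println(stemAll(text, toy)); // prints [cat, chase, dog]
    }
}
```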

stem

public boolean stem(char[] word,
                    int wordLen)
Stem a word contained in a leading portion of a char[] array. Returns true if the stemming process resulted in a word different from the input. You can retrieve the result with getResultLength()/getResultBuffer() or toString().

Specified by:
stem in interface Stemmer

stem

public boolean stem()
Stem the word placed into the Stemmer buffer through calls to add(). Returns true if the stemming process resulted in a word different from the input. You can retrieve the result with getResultLength()/getResultBuffer() or toString().

Specified by:
stem in interface Stemmer

stem

public boolean stem(int i0)
Specified by:
stem in interface Stemmer

main

public static void main(java.lang.String[] args)
Test program for demonstrating the Stemmer. It reads a file and stems each word, writing the result to standard output. Usage: Stemmer file-name


getStemmer

public static PorterStemmer getStemmer()