com.raritantechnologies.concept
Class WordCount

java.lang.Object
  extended by com.raritantechnologies.concept.WordCount

public class WordCount
extends java.lang.Object

Tracks how often a single word occurs in each document of a set. Maintains a map from document name to the number of times the word occurred in that document. Used to generate word statistics on documents.


Developed by Raritan Technologies.

Author:
Ted Sullivan
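
The class description above suggests a per-word map from document name to occurrence count. A minimal sketch of that structure follows. WordCountSketch is a hypothetical stand-in and not the Raritan source, and the behavior of addDocument (one call records one occurrence) is an assumption that this page does not confirm.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; not the actual WordCount source.
final class WordCountSketch {
    private final String word;
    // document name -> number of times the word occurred in that document
    private final Map<String, Integer> documentCounts = new HashMap<>();

    WordCountSketch(String word) {
        this.word = word;
    }

    // Assumption: one call records one occurrence of the word in the document.
    void addDocument(String documentName) {
        documentCounts.merge(documentName, 1, Integer::sum);
    }

    String getWord() {
        return word;
    }

    Map<String, Integer> getDocumentCounts() {
        return documentCounts;
    }

    // Number of distinct documents in which the word has been recorded.
    int getNumberOfDocuments() {
        return documentCounts.size();
    }
}
```

Under these assumptions, calling addDocument twice with the same name leaves one map entry whose count is 2.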

Constructor Summary
WordCount(java.lang.String word)
           
 
Method Summary
 void addCounts(WordCount counts)
           
 void addDocument(java.lang.String documentName)
           
 void addDocument(java.lang.String documentName, int[] wordPositions)
           
 double getAverageWordDensity(java.util.HashMap docWordCountMap)
           
 double getAverageWordDensity(java.util.HashMap docWordCountMap, int minDocSize)
          Returns average word density.
 java.util.Map getDocumentCounts()
           
 double getDocumentFrequency(int totalDocuments)
          Returns the ratio of the number of documents in which this word occurs to the total number of documents.
 int getMaxWordsIn()
           
 int getNumberOfDocuments()
           
 int getTotalCounts()
          Returns the total number of times the word occurred in the set of documents.
 java.lang.String getWord()
           
 int[] getWordPositions(java.lang.String docKey)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordCount

public WordCount(java.lang.String word)

Method Detail

getWord

public java.lang.String getWord()

getDocumentCounts

public java.util.Map getDocumentCounts()

addDocument

public void addDocument(java.lang.String documentName)

addDocument

public void addDocument(java.lang.String documentName,
                        int[] wordPositions)

addCounts

public void addCounts(WordCount counts)

getDocumentFrequency

public double getDocumentFrequency(int totalDocuments)
Returns the ratio of the number of documents in which this word occurs to the total number of documents.
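
The ratio described here can be sketched as a plain division. DocumentFrequencySketch and its two-argument helper are hypothetical: the real method takes only totalDocuments and presumably reads the document count from the instance. Returning 0.0 for a non-positive total is also an assumption.

```java
// Hypothetical helper for illustration; not the actual WordCount source.
final class DocumentFrequencySketch {
    // Ratio of documents containing the word to all documents.
    // Assumption: a non-positive total yields 0.0 instead of a division by zero.
    static double documentFrequency(int documentsWithWord, int totalDocuments) {
        if (totalDocuments <= 0) {
            return 0.0;
        }
        // Cast before dividing so the result is not truncated by integer division.
        return (double) documentsWithWord / totalDocuments;
    }

    public static void main(String[] args) {
        // A word found in 3 of 12 documents.
        System.out.println(documentFrequency(3, 12)); // prints 0.25
    }
}
```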


getTotalCounts

public int getTotalCounts()
Returns the total number of times the word occurred in the set of documents.
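
One plausible way to obtain this total is to sum the per-document counts. The sketch below assumes the counts are held as a map of document name to Integer; TotalCountsSketch is a hypothetical name, and the real class may instead keep a running total.

```java
import java.util.Map;

// Hypothetical helper for illustration; not the actual WordCount source.
final class TotalCountsSketch {
    // Sums the occurrence counts over every document in the map.
    static int totalCounts(Map<String, Integer> documentCounts) {
        int total = 0;
        for (int count : documentCounts.values()) {
            total += count;
        }
        return total;
    }
}
```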


getAverageWordDensity

public double getAverageWordDensity(java.util.HashMap docWordCountMap)

getAverageWordDensity

public double getAverageWordDensity(java.util.HashMap docWordCountMap,
                                    int minDocSize)
Returns average word density.

Parameters:
docWordCountMap - map of document name to the total number of words in that document.
minDocSize - minimum number of words a document must contain to be included. If minDocSize <= 0, all documents are used to compute the average density.
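
The page does not define "word density", so the following is only a sketch under stated assumptions: the density of a document is taken to be the word's occurrences divided by the document's total word count, and the result is the mean over the documents that pass the minDocSize filter. WordDensitySketch, the documentCounts argument, and the handling of documents missing from docWordCountMap are all hypothetical.

```java
import java.util.Map;

// Hypothetical helper for illustration; not the actual WordCount source.
final class WordDensitySketch {
    // documentCounts:  document name -> occurrences of the word in that document
    // docWordCountMap: document name -> total number of words in that document
    // minDocSize:      documents with fewer words are skipped; <= 0 disables the filter
    static double averageWordDensity(Map<String, Integer> documentCounts,
                                     Map<String, Integer> docWordCountMap,
                                     int minDocSize) {
        double densitySum = 0.0;
        int documentsUsed = 0;
        for (Map.Entry<String, Integer> entry : documentCounts.entrySet()) {
            Integer totalWords = docWordCountMap.get(entry.getKey());
            // Assumption: documents with unknown or zero size are ignored.
            if (totalWords == null || totalWords <= 0) {
                continue;
            }
            // Apply the size filter only when minDocSize is positive.
            if (minDocSize > 0 && totalWords < minDocSize) {
                continue;
            }
            densitySum += (double) entry.getValue() / totalWords;
            documentsUsed++;
        }
        // Assumption: with no qualifying documents the average is reported as 0.0.
        return documentsUsed == 0 ? 0.0 : densitySum / documentsUsed;
    }
}
```

With a positive minDocSize, short documents no longer contribute, which keeps a single hit in a very small document from dominating the average.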

getNumberOfDocuments

public int getNumberOfDocuments()

getMaxWordsIn

public int getMaxWordsIn()

getWordPositions

public int[] getWordPositions(java.lang.String docKey)