The following tools were provided by Ted Dunning of New Mexico State
University. They overlap somewhat with the tools we have developed
so far, but offer extra facilities and are often faster:
1. hwcount - count tokens, like sort | uniq -c but faster.
2. fwords - a fast version of words for segmenting English text.
3. cgram - convert text into character n-grams.
4. grams - no man page, but cat file | grams 3
   prints all bigrams in file.
5. compare - compare frequencies of strings in two files.
6. chi2 - several measures of how "sticky" words are.
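
For readers without these tools to hand, here is a minimal Python sketch of the operations that hwcount, cgram and grams are described as performing: counting tokens, and extracting character and word n-grams. It is not Ted Dunning's code. The function names are made up for illustration, tokens are assumed to be whitespace-separated, and the use of a hash table for counting is our guess at why hwcount is faster than sort | uniq -c.

```python
from collections import Counter


def count_tokens(tokens):
    """Tally tokens in a hash table; same counts as `sort | uniq -c`, no sorting."""
    return Counter(tokens)


def char_ngrams(text, n):
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    # Assumption: tokens are separated by whitespace.
    tokens = "the cat saw the dog and the cat ran".split()
    for token, freq in count_tokens(tokens).most_common(3):
        print(freq, token)
    print(char_ngrams("text", 3))
    print(word_ngrams(tokens, 2)[:3])
```

The real tools may tokenise differently and print their results in another format, so treat this only as a statement of what is being computed.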
Chris Brew
8/7/1998