There ought to be a difference between things which are
frequent in all documents (e.g. "of the") and those which
are frequent in only some (e.g. "sherlock holmes").
- The binomial model and its relative, the Poisson
distribution, don't take account of the "burstiness"
of words: once a word has appeared in a document, it
tends to appear there again.
- The negative binomial does; it was used by Church to find
"interesting words" and by Mosteller and Wallace
to discriminate authorship.
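
One rough way to see the contrast is to compare the variance of a word's per-document counts with its mean. Under a Poisson model the two are equal, so a ratio well above 1 suggests burstiness, and in that case a negative binomial can be fitted by matching moments. The sketch below is only an illustration: the counts are made up, the function names are hypothetical, and the method-of-moments fit is a simple stand-in rather than the procedure used by Church or by Mosteller and Wallace.

```python
from statistics import mean, pvariance


def dispersion_index(counts):
    """Variance-to-mean ratio of per-document counts.

    About 1 under a Poisson model; much larger for bursty words.
    Assumes the word occurs at least once, so the mean is positive.
    """
    return pvariance(counts) / mean(counts)


def fit_negative_binomial(counts):
    """Method-of-moments estimate (r, p) for a negative binomial.

    Uses mean = r(1-p)/p and variance = r(1-p)/p**2.
    Returns None when variance <= mean, where no such fit exists.
    """
    m = mean(counts)
    v = pvariance(counts)
    if v <= m:
        return None
    p = m / v
    r = m * m / (v - m)
    return r, p


# Made-up counts of a word in ten documents (illustrative only, not real data).
even_counts = [4, 5, 6, 5, 4, 6, 5, 5, 4, 6]      # spread evenly, like "of the"
bursty_counts = [0, 0, 0, 25, 0, 0, 0, 25, 0, 0]  # concentrated, like "sherlock holmes"

print(dispersion_index(even_counts))
print(dispersion_index(bursty_counts))
print(fit_negative_binomial(bursty_counts))
```

Both made-up words have the same mean count (5 per document), so a model driven by the mean alone cannot tell them apart; only the variance separates them. For the evenly spread word the fit returns None, since the counts are no more dispersed than a Poisson would allow.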
Chris Brew
8/7/1998