${\chi}^2$

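Fill the table with the differences between the observed frequencies and the frequencies we would expect if sherlock and holmes occurred independently of each other.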

Deviations from expectation

                        holmes         $\neg {\tt holmes}$    Total
sherlock                7 - 0.05       0 - 6.95               -
$\neg {\tt sherlock}$   39 - 45.95     7023 - 7016.05         -
Total                   -              -                      -
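The expected frequencies in this table fit the usual independence assumption: expected count = row total x column total / grand total, computed from the observed counts 7, 0, 39 and 7023 (grand total 7069). A minimal sketch, assuming that is how the expected values were obtained (the function name `expected_frequencies` is hypothetical), reproduces the deviations:

```python
# Observed counts from the table above.
# Rows: sherlock / not sherlock; columns: holmes / not holmes.
observed = [[7, 0],
            [39, 7023]]

def expected_frequencies(table):
    """Expected count per cell if rows and columns were independent
    (assumed method): row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_frequencies(observed)

# Observed minus expected, cell by cell.
deviations = [[o - e for o, e in zip(orow, erow)]
              for orow, erow in zip(observed, expected)]

for row in deviations:
    print([round(d, 2) for d in row])
# [6.95, -6.95]
# [-6.95, 6.95]
```

Every deviation has the same magnitude, about 6.95: in a 2 x 2 table the row and column totals leave only one free difference. The table's 0.05 is the rounded form of roughly 0.0456.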

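From these differences we can build a statistic called ${\chi}^2$. For each cell, square the difference between the observed frequency $f_o$ and the expected frequency $f_e$, divide by $f_e$, and sum over all the cells of the table: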

\begin{displaymath}
{\chi}^2 = \sum \frac{(f_o - f_e)^2}{f_e}\end{displaymath}

This will be large when we are not dealing with word confetti, that is, when the words are not scattered independently of one another.
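The formula translates directly into a loop over cells. A minimal sketch follows; the name `chi_squared` and both toy tables are invented for illustration. When the observed counts match the independence expectation exactly (pure confetti), ${\chi}^2$ is 0; moving counts onto the diagonal while keeping the same row and column totals makes it large.

```python
def chi_squared(observed, expected):
    """Sum of (f_o - f_e)**2 / f_e over all cells of two equally shaped tables."""
    total = 0.0
    for orow, erow in zip(observed, expected):
        for f_o, f_e in zip(orow, erow):
            total += (f_o - f_e) ** 2 / f_e
    return total

# Toy table whose counts are exactly what independence predicts
# (row totals 30 and 90, column totals 40 and 80, grand total 120).
exp = [[10.0, 20.0], [30.0, 60.0]]
obs = [[10, 20], [30, 60]]

# Same totals, but the counts cluster on the diagonal.
skewed = [[30, 0], [10, 80]]

print(chi_squared(obs, exp))               # 0.0
print(round(chi_squared(skewed, exp), 2))  # 80.0
```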

Contributions to ${\chi}^2$

                        holmes                           $\neg {\tt holmes}$                    Total
sherlock                $\frac{(7 - 0.05)^2}{0.05}$      $\frac{(0 - 6.95)^2}{6.95}$            -
$\neg {\tt sherlock}$   $\frac{(39 - 45.95)^2}{45.95}$   $\frac{(7023 - 7016.05)^2}{7016.05}$   -
Total                   -                                -                                      -
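The four contributions can be checked with a few lines of Python, using the observed counts and the rounded expected frequencies exactly as they appear in the table (the `cells` dictionary is only an illustrative container). Almost all of the total comes from the sherlock/holmes cell, whose expected frequency is tiny.

```python
# (observed, expected) pairs copied from the table, expected values as rounded there.
cells = {
    ("sherlock", "holmes"): (7, 0.05),
    ("sherlock", "not holmes"): (0, 6.95),
    ("not sherlock", "holmes"): (39, 45.95),
    ("not sherlock", "not holmes"): (7023, 7016.05),
}

# One (f_o - f_e)**2 / f_e term per cell.
contributions = {cell: (f_o - f_e) ** 2 / f_e for cell, (f_o, f_e) in cells.items()}

for cell, value in contributions.items():
    print(cell, round(value, 3))
print("chi-squared:", round(sum(contributions.values()), 2))  # about 974.06
```

Because that expected frequency is so small, rounding matters: the table's 0.05 stands for about 0.0456, and with unrounded expected frequencies the sum is nearer 1070 than 974. Either value is enormous.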

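Summing these four contributions, using the rounded expected frequencies shown in the table, gives

\begin{displaymath}
{\chi}^2 \approx 966.05 + 6.95 + 1.05 + 0.01 = 974.06\end{displaymath}

which you can look up in a table of the ${\chi}^2$ distribution. A $2 \times 2$ table has one degree of freedom, and the critical value at the 0.001 level is 10.83, so a value this large is very unlikely to have arisen by chance.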
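A $2 \times 2$ table has $(2-1)(2-1) = 1$ degree of freedom, and with one degree of freedom the ${\chi}^2$ tail probability has a closed form: such a variable is the square of a standard normal $Z$, so $P(X \ge x) = \mathrm{erfc}(\sqrt{x/2})$. That lets the table lookup be sketched with only the standard library. The helper name `chi2_sf_1df` is hypothetical; where SciPy is available, `scipy.stats.chi2.sf(x, 1)` computes the same probability.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability P(X >= x) for chi-squared with 1 degree of freedom.

    X = Z**2 for standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))

# Sanity checks against familiar table values for 1 degree of freedom.
print(round(chi2_sf_1df(3.841), 3))  # 0.05
print(round(chi2_sf_1df(10.83), 4))  # 0.001

# Roughly the sum of the four contributions in the table.
statistic = 974.06
p = chi2_sf_1df(statistic)
print(p < 0.001)  # True: far below any conventional threshold
```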



Chris Brew
8/7/1998