
Statistical models of language

For language modelling, an especially useful form of the conditional probability equivalence:

\begin{displaymath}
P(A,B) \equiv P(B\vert A) \times P(A)\end{displaymath}

is:

\begin{displaymath}
P(w_1,w_2,\ldots,w_n) \equiv P(w_n\vert w_1,w_2,\ldots,w_{n-1}) \times P(w_1,w_2,\ldots,w_{n-1})\end{displaymath}

and this can be applied repeatedly to give:

\begin{displaymath}
P(w_1,w_2,\ldots,w_n) \equiv P(w_n\vert w_1,\ldots,w_{n-1}) \times P(w_{n-1}\vert w_1,\ldots,w_{n-2}) \times \cdots \times P(w_2\vert w_1) \times P(w_1)\end{displaymath}

This is nice because it shows how to estimate the probability of the whole string from contributions of the individual words. It also points up the possibility of approximations. A particularly simple one is to assume that the contexts don't affect the individual word probabilities:

\begin{displaymath}
P(w_1,w_2,\ldots,w_n) \approx P(w_1) \times P(w_2) \times \cdots \times P(w_n)\end{displaymath}

We can get estimates of the P(w_k) terms from word frequencies. This is just word confetti in another form, and it misses out the same crucial facts about patterns of usage.
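
To make this unigram approximation concrete, here is a minimal sketch in Python; the language, the toy corpus and the function names are illustrative choices, not anything prescribed by these notes. It estimates each P(w_k) as a relative frequency and multiplies the estimates together, so a string gets the same score however its words are ordered.

\begin{verbatim}
from collections import Counter

# Toy corpus; real estimates would come from a much larger text collection.
corpus = "the cat sat on the mat the dog sat on the log".split()

counts = Counter(corpus)
total = sum(counts.values())

def unigram_prob(word):
    # Relative-frequency estimate of P(w); unseen words simply get zero here.
    return counts[word] / total

def string_prob(words):
    # Unigram approximation: P(w1,...,wn) is roughly P(w1) x ... x P(wn).
    p = 1.0
    for w in words:
        p *= unigram_prob(w)
    return p

print(string_prob("the cat sat".split()))
print(string_prob("sat the cat".split()))  # identical score: word order is ignored
\end{verbatim}
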
But the following approximation, which restricts the context to a single preceding word, is much more realistic:

\begin{displaymath}
P(w_1,w_2,\ldots,w_n) \approx P(w_1) \times P(w_2\vert w_1) \times P(w_3\vert w_2) \times \cdots \times P(w_n\vert w_{n-1})\end{displaymath}

This is called a bigram model. It gets much closer to reality than does word-confetti, because it takes limited account of the relationships between successive words. The next section describes an application of such a model to the task of language identification.
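
Before turning to that application, here is a minimal sketch of the bigram approximation, again in Python with an illustrative toy corpus of my own choosing. It estimates each P(w_k | w_{k-1}) as the relative frequency count(w_{k-1}, w_k) / count(w_{k-1}) and multiplies the conditional estimates together; unseen bigrams simply receive probability zero, since smoothing is not discussed in this section.

\begin{verbatim}
from collections import Counter

# Toy corpus; real estimates would come from a much larger text collection.
corpus = "the cat sat on the mat the dog sat on the log".split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))

def bigram_prob(prev, word):
    # Relative-frequency estimate of P(word | prev); zero if the pair is unseen.
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

def string_prob(words):
    # Bigram approximation:
    # P(w1,...,wn) is roughly P(w1) x P(w2|w1) x ... x P(wn|w_{n-1}).
    p = unigram_counts[words[0]] / sum(unigram_counts.values())
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p

print(string_prob("the cat sat on the mat".split()))
print(string_prob("mat the on sat cat the".split()))  # scrambled order scores lower
\end{verbatim}

Unlike the unigram sketch above, this model assigns different scores to different orderings of the same words, which is exactly the limited account of word-to-word relationships that makes it useful.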


Chris Brew
8/7/1998