
Markov models

A Markov model is a random process in which the probability of the next state depends only on the current state, so the probability of a whole sequence factorises as:

\begin{displaymath}
p(S) = p(s_1 \ldots s_n) = p(s_1) \prod_{i=2}^{n} p(s_i \mid s_{i-1})
\end{displaymath}

This is another form of the bigram model which we saw earlier. Any process whose output distribution depends only on the last k states can be viewed as a Markov process: treat each sequence of k state labels in the small machine as the label of a single state in a much bigger machine, as shown in figure 8.2.
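
Before moving to higher orders, here is a minimal Python sketch of scoring a string under the first-order model above; the two-state machine and all of its probabilities are invented purely for illustration:

\begin{verbatim}
# Scoring a string under a first-order (bigram) Markov model:
# p(S) = p(s_1) * product over i of p(s_i | s_{i-1}).
# The two-state model below is invented for illustration.

initial = {"a": 0.5, "b": 0.5}                    # p(s_1)
transition = {("a", "a"): 0.6, ("a", "b"): 0.4,   # p(s_i | s_{i-1})
              ("b", "a"): 0.9, ("b", "b"): 0.1}

def prob(s):
    """Probability of the sequence s under the model."""
    p = initial.get(s[0], 0.0)
    for prev, cur in zip(s, s[1:]):
        p *= transition.get((prev, cur), 0.0)     # unseen transitions get 0
    return p

print(prob("abab"))  # 0.5 * 0.4 * 0.9 * 0.4 = 0.072
\end{verbatim}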
  
Figure 8.2: Relabelling a Markov process

The relabelled model has transitions:

\begin{displaymath}
p(s_{i+1} \ldots s_{i+k} \mid s_i \ldots s_{i+k-1}) = p(s_{i+k} \mid s_i \ldots s_{i+k-1})
\end{displaymath}

These larger models are called kth-order Markov models. For language identification, 4th-order Markov models are quite effective (you take four characters of context into account). Given 6000 words of a data-intensive linguistics research report as training data, you get the results shown in figure 8.3.
  
Figure 8.3: Randomly generated strings from several orders of Markov model

\begin{verbatim}
0 hm 1 imuando~doc ni leotLs Aiqe1pdt6cf tlc.teo...
...not words can be said to specify appear McDonald. 1989
\end{verbatim}
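
Such models are easy to sketch in code. The following Python fragment is a toy sketch, not the program behind figure 8.3; the training file report.txt and all other names are assumptions. It estimates a kth-order character model by maximum-likelihood counting and then samples random strings from it:

\begin{verbatim}
# Sketch: maximum-likelihood kth-order character model, plus random
# generation in the spirit of figure 8.3. All names are made up.
import random
from collections import Counter, defaultdict

def train(text, k):
    """Count (k+1)-grams: counts[context][next_char]."""
    counts = defaultdict(Counter)
    for i in range(len(text) - k):
        counts[text[i:i + k]][text[i + k]] += 1
    return counts

def generate(counts, k, length):
    """Repeatedly draw the next character given the last k characters."""
    out = list(random.choice(list(counts)))   # start from a seen context
    while len(out) < length:
        context = "".join(out[-k:]) if k else ""
        dist = counts.get(context)
        if not dist:                          # dead end: unseen context
            break
        chars, weights = zip(*dist.items())
        out.append(random.choices(chars, weights=weights)[0])
    return "".join(out)

text = open("report.txt").read()              # hypothetical training text
model = train(text, k=4)
print(generate(model, k=4, length=60))
\end{verbatim}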

The numbers down the side of figure 8.3 are the orders of the models: 0 is word-confetti, 1 is bigrams, and anything above that is a higher-order model. This is not quite what we want, because the unsmoothed models are brittle: at high orders you get verbatim reproduction of the training material, and texts which differ from the training material often receive zero probability. To fix this you either need lots of training data or a good technique for smoothing the probabilities. We'll come to smoothing in section 8.3; for the moment we note the problem and move on to formalising the task of deciding which language a particular string came from.
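
To see the brittleness concretely, here is a short continuation of the hypothetical sketch above (ignoring the probability of the initial k characters): a single 5-gram that never occurred in training sends the probability of an entire text to zero.

\begin{verbatim}
# Log-probability of a text under the unsmoothed 4th-order model
# from the sketch above; one unseen 5-gram yields minus infinity.
import math

def log_prob(counts, k, text):
    total = 0.0
    for i in range(len(text) - k):
        context, nxt = text[i:i + k], text[i + k]
        dist = counts.get(context)
        if not dist or nxt not in dist:
            return float("-inf")              # zero probability
        total += math.log(dist[nxt] / sum(dist.values()))
    return total

print(log_prob(model, 4, "intensive linguistics"))  # finite, if seen
print(log_prob(model, 4, "qqqqzzzz"))               # -inf, almost surely
\end{verbatim}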


Chris Brew
8/7/1998