Strictly, the probability of observing a particular test string $S = s_1 s_2 \ldots s_n$ given a $k$-th order Markov model like $M_{\mathrm{Spanish}}$ or $M_{\mathrm{English}}$ is:

\[
P(S \mid M) = p(s_1 \ldots s_k) \prod_{i=k+1}^{n} p(s_i \mid s_{i-k} \ldots s_{i-1})
\]
but for practical purposes it is just as good to drop the leading term: variations in it are massively outweighed by the contribution of the terms in the product. You can rearrange the product by grouping together terms which involve the same $(k+1)$-gram (for example, pulling together all instances of ``th''), to get

\[
P(S \mid M) \approx \prod_{w} p(w)^{T(w)}
\]

For instance, in the test string ``the thin moth'' the bigram ``th'' occurs three times, so with $k=1$ the single factor $p(\mathrm{h} \mid \mathrm{t})$ appears cubed.
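The grouping step is just a matter of counting $(k+1)$-gram occurrences. A minimal sketch in Python (the function name `kplus1_grams` is illustrative, not from the text):

```python
from collections import Counter

def kplus1_grams(s, k):
    """Map each (k+1)-gram of s to the number of times it occurs."""
    return Counter(s[i:i + k + 1] for i in range(len(s) - k))

# With k = 1 (a bigram model), all instances of "th" group together:
counts = kplus1_grams("the thin moth", 1)
# counts["th"] is 3, so p(h | t) would be raised to the third power
```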
where $T(w)$ is the number of times the $(k+1)$-gram $w$ occurs in the test string. (NB: Dunning gets this formula wrong, using a product instead of an exponent; the next one is right.) As is usually the case when working with probabilities, taking logarithms helps to keep the numbers stable. This gives:

\[
\log P(S \mid M) \approx \sum_{w} T(w) \log p(w)
\]

We can compare these log probabilities for different languages, and choose the language model which is most likely to have generated the given string. If the language models sufficiently reflect the languages, comparing the models will lead us to the right conclusions about the languages.

The question remaining is how to get reliable estimates of the $p$s, and this is where statistical language modellers really spend their lives. Everything up to now is common ground shared, in one way or another, by almost all applications; what remains is task-specific and crucially important to the success of the enterprise.
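The whole comparison can be sketched end to end. This is an illustrative implementation under stated assumptions, not the text's prescribed one: character bigrams ($k=1$), and add-one smoothing over a fixed alphabet as a stand-in for the estimation problem the text defers:

```python
import math
from collections import Counter

def train_bigram_model(text):
    """Bigram counts and context (first-character) counts from training text."""
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    contexts = Counter(text[i] for i in range(len(text) - 1))
    return bigrams, contexts

def log_likelihood(test, model, vocab_size):
    """Sum over distinct bigrams w of T(w) * log p(w), where p(w) is an
    add-one-smoothed conditional probability (our illustrative choice)."""
    bigrams, contexts = model
    t = Counter(test[i:i + 2] for i in range(len(test) - 1))
    return sum(
        count * math.log((bigrams[w] + 1) / (contexts[w[0]] + vocab_size))
        for w, count in t.items()
    )

# Toy demonstration with made-up "languages" over the alphabet {a, b, c, d},
# so vocab_size = 4; the model trained on "ab..." strings should win on "abab".
model_ab = train_bigram_model("ababababab")
model_cd = train_bigram_model("cdcdcdcdcd")
winner = max(["ab", "cd"],
             key=lambda name: log_likelihood(
                 "abab", {"ab": model_ab, "cd": model_cd}[name], 4))
```

With real data one would train each model on a corpus of the relevant language and pick the model whose log likelihood on the test string is largest; the smoothing scheme is exactly the kind of task-specific estimation choice the text is pointing at.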