This chapter has introduced the basics of probability and statistical
language modelling:
- Events are things which might or might not happen.
- Many processes can be thought of as long sequences of
equivalent trials. Counting outcomes over such sequences
yields estimates of probabilities (see the first sketch
after this list).
- Bayes' theorem lets you unpack probabilities into
contributions from different sources. The resulting
conditional probabilities provide a means for reasoning
probabilistically about causal relationships between
events, even if you are guessing some of the parameters
(the theorem is restated after this list).
- There is a close connection between bigrams, contingency tables
and conditional probabilities (see the bigram sketch below).
- It is often worthwhile to work with simplified models of
probabilistic processes, because they allow you to get
estimates of useful quantities which are otherwise inaccessible.
- In language processing you need to be alert to the consequences
of limited training data, which can mean that the theoretically
ideal answer needs adjustment to work in the real world (one
such adjustment is sketched below).
- Language identification is a relatively simple illustration of
these ideas (a toy version appears below).
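
To make the counting point concrete, here is a minimal Python
sketch of relative-frequency estimation. The function name and
the coin-flip data are invented for illustration, not taken from
the chapter.

    from collections import Counter

    def relative_frequencies(outcomes):
        """Estimate event probabilities by counting over a
        sequence of equivalent trials."""
        counts = Counter(outcomes)
        total = sum(counts.values())
        return {event: n / total for event, n in counts.items()}

    # Twenty coin flips, treated as equivalent trials
    trials = list("HTHHTHTTHHHTHTHTTHHT")
    print(relative_frequencies(trials))  # {'H': 0.55, 'T': 0.45}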
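
For reference, Bayes' theorem in its usual form, with A the
event of interest and B the observed evidence:

    P(A | B) = P(B | A) P(A) / P(B)

Reading the right-hand side as the contribution of A to B,
weighted by how plausible A is to begin with, is what lets you
unpack a probability into contributions from different sources.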
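
The bigram connection can also be made concrete. The sketch
below estimates conditional probabilities P(next | current) from
bigram counts; the cells of the implied contingency table are
the bigram counts, and the row totals are the unigram counts of
the first word. The function name and the example sentence are
invented for illustration.

    from collections import Counter

    def bigram_conditional_probs(words):
        # Each bigram count is one cell of the contingency table;
        # the row totals are the unigram counts of the first word.
        bigrams = Counter(zip(words, words[1:]))
        unigrams = Counter(words[:-1])
        return {(w1, w2): n / unigrams[w1]
                for (w1, w2), n in bigrams.items()}

    words = "the cat sat on the mat".split()
    probs = bigram_conditional_probs(words)
    print(probs[("the", "cat")])  # 0.5: "cat" follows one of the two occurrences of "the"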
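
One standard adjustment for limited training data is add-one
(Laplace) smoothing, which reserves a little probability mass
for events that never occurred in training. Whether or not this
is the particular scheme the chapter favours, it illustrates the
point; the counts below are invented.

    def laplace_probability(count, total, vocabulary_size):
        # Maximum likelihood would give count / total, which is
        # zero for unseen events; adding one to every count fixes that.
        return (count + 1) / (total + vocabulary_size)

    # An unseen event gets a small probability rather than an impossible zero
    print(laplace_probability(0, total=1000, vocabulary_size=5000))   # ~0.00017
    print(laplace_probability(20, total=1000, vocabulary_size=5000))  # 0.0035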
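
Finally, a toy language identifier in the spirit of the chapter:
train a smoothed character-bigram model per language, then pick
the language whose model gives the input the highest log
probability. The model names, training sentences, and helper
functions are all invented for illustration; a real system would
train on far more data.

    import math
    from collections import Counter

    def char_bigram_model(text, vocab_size=256):
        """Smoothed character-bigram model trained on sample text."""
        bigrams = Counter(zip(text, text[1:]))
        unigrams = Counter(text[:-1])

        def log_prob(a, b):
            # Add-one smoothing so unseen bigrams do not zero the score
            return math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab_size))

        return log_prob

    def identify(text, models):
        """Return the language whose model scores the text highest."""
        def score(log_prob):
            return sum(log_prob(a, b) for a, b in zip(text, text[1:]))
        return max(models, key=lambda lang: score(models[lang]))

    models = {
        "english": char_bigram_model("the quick brown fox jumps over the lazy dog"),
        "german": char_bigram_model("der schnelle braune fuchs springt ueber den faulen hund"),
    }
    print(identify("the fox", models))  # english, on these toy training samples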
In chapter 9 we add basic information theory to the repertoire
which we have already developed, and show how this tool applies
to a word-clustering problem. Then, in a later chapter, we bring
back the n-gram models introduced in the current chapter,
combining them with information-theoretic ideas to explain the
training algorithm that makes it possible for part-of-speech
taggers and speech recognisers to work as well as they do.