
Why statistical parsing?

Statistical parsing methods solve the problem outlined above by allowing the system to assign scores to analyses. A parser which is a component of a natural language understanding system might pass on only the highest-scoring analysis to downstream components, or a more sophisticated control regime can generate successively less plausible analyses as and when the downstream components require them.
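To make this control regime concrete, here is a minimal sketch in Python; the analyses and scores are invented for illustration, and no particular system's interface is being described. The parser hands over scored analyses, the single best one is consumed first, and less plausible alternatives are produced only if a downstream component asks for them.

def ranked_analyses(scored_analyses):
    # Yield (analysis, score) pairs in order of decreasing score;
    # downstream components consume only as many alternatives as they need.
    for analysis, score in sorted(scored_analyses,
                                  key=lambda pair: pair[1], reverse=True):
        yield analysis, score

# Invented toy output of a scoring parser.
analyses = [("parse-A", 0.62), ("parse-B", 0.30), ("parse-C", 0.08)]

stream = ranked_analyses(analyses)
print(next(stream))   # ('parse-A', 0.62): the most plausible analysis
# A downstream component calls next(stream) again only if it decides
# it needs a less plausible alternative.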

The simplest way to assign such scores is to use a probabilistic grammar. It is worth having a fairly watertight definition of what this means. We start with the notion of a grammar.

What is a grammar? Conventional (non-probabilistic) grammars can be seen as collections of rules which precisely specify the class of sentence structures that conform to the rules of the language. In defining the class of legal structures, the grammar also indirectly defines the class of legal sentences: every word sequence which has at least one legal sentence structure is a legal sentence. Many sentences have far more than one structure.
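As a concrete illustration, here is a toy context-free grammar written as a Python dictionary; it is invented for exposition, not a serious grammar of English. Under these rules the word sequence ``I saw the man with the telescope'' has two legal structures, because the prepositional phrase can attach either to the noun phrase or to the verb phrase, so it is a legal, and ambiguous, sentence.

# Each nonterminal maps to the list of right-hand sides it can expand to.
toy_grammar = {
    "S":   [["NP", "VP"]],
    "NP":  [["Pro"], ["Det", "N"], ["NP", "PP"]],
    "VP":  [["V", "NP"], ["VP", "PP"]],
    "PP":  [["P", "NP"]],
    "Pro": [["I"]],
    "Det": [["the"]],
    "N":   [["man"], ["telescope"]],
    "V":   [["saw"]],
    "P":   [["with"]],
}

# The two structures of "I saw the man with the telescope":
#   (S (NP I) (VP (V saw) (NP (NP the man) (PP with the telescope))))
#   (S (NP I) (VP (VP (V saw) (NP the man)) (PP with the telescope)))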
For the moment we are saying only that a sentence is legal if it has a legal analysis. This is a purely declarative notion, which abstracts away from the details of the procedures needed to actually find the analysis or analyses of a particular sentence. Suffice it to say that the parser uses the grammar to find analyses, and that the whole process can be seen as a form of deduction.
What is a probabilistic grammar? A probabilistic grammar is the same as a conventional grammar, except that in addition to assigning a (possibly empty) set of legal structures to each word sequence of the language, it also gives a probability to each such structure. The probability of a word sequence is the sum of the probabilities of all its structures. One effect of this is that word sequences which have no legal analysis according to the grammar have zero probability. An intuitive way to think about the probability of a sentence is as the answer to the question ``If I pick a sentence at random from the legal sentences of this grammar, what is the chance of getting this one?'' This question is well-defined even though many languages (and all natural languages) contain an infinite number of sentences. As sentences get longer, their probabilities become vanishingly small, with the effect that, for a well-behaved (consistent) grammar, the total probability summed over all sentences is unity.
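To make these definitions concrete, here is a sketch in Python using a probabilistic context-free grammar (PCFG), one common kind of probabilistic grammar; the rule probabilities are invented for illustration and extend the toy grammar above. In a PCFG the probability of a structure is the product of the probabilities of the rules used to build it, and the probability of a word sequence is, as stated above, the sum over all its structures.

from math import prod

# Invented rule probabilities; over each left-hand side they sum to one.
rule_probs = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Pro",)): 0.4, ("NP", ("Det", "N")): 0.4, ("NP", ("NP", "PP")): 0.2,
    ("VP", ("V", "NP")): 0.7, ("VP", ("VP", "PP")): 0.3,
    ("PP", ("P", "NP")): 1.0,
    ("Pro", ("I",)): 1.0, ("Det", ("the",)): 1.0,
    ("N", ("man",)): 0.5, ("N", ("telescope",)): 0.5,
    ("V", ("saw",)): 1.0, ("P", ("with",)): 1.0,
}

def tree_prob(tree):
    # The probability of a structure is the product of the probabilities
    # of every rule used in it. Trees are (label, child, ...) tuples;
    # a leaf is a bare string.
    if isinstance(tree, str):
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return rule_probs[(label, rhs)] * prod(tree_prob(c) for c in children)

the_man   = ("NP", ("Det", "the"), ("N", "man"))
telescope = ("NP", ("Det", "the"), ("N", "telescope"))
with_pp   = ("PP", ("P", "with"), telescope)
subject   = ("NP", ("Pro", "I"))

# The two structures of "I saw the man with the telescope".
noun_attachment = ("S", subject,
                   ("VP", ("V", "saw"), ("NP", the_man, with_pp)))
verb_attachment = ("S", subject,
                   ("VP", ("VP", ("V", "saw"), the_man), with_pp))

print(tree_prob(noun_attachment))   # about 0.00224
print(tree_prob(verb_attachment))   # about 0.00336
# The probability of the word sequence: the sum over its structures.
print(tree_prob(noun_attachment) + tree_prob(verb_attachment))   # about 0.0056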
Once again we have said nothing much about the mechanisms by which you find the probabilities of these sentences. The probabilistic grammar just tells you what structures can exist, and how to deduce the probability of each structure. It is possible to define probabilistic grammars for which efficient parsing algorithms are provably impossible (such models may still be practically useful if there are efficient ways to approximate the ideal answer; see Bod (1995) for an example).

When we are working with a probabilistic grammar, the parser has the job of interpreting the grammar by building a record of the analyses which go with each input sentence. Unlike a conventional parser it must also keep track of the probability of each analysis, but like its counterpart it must be efficient enough to make the testing and development of improved grammars a practical proposition. Here, as in many other combinatorial problems, it is not enough to hope that sufficiently fast machines will one day become available: the combinatorics of the problem will defeat any plausible machine unless care is taken to design algorithms which avoid unnecessary work.

We must take this seriously, because probabilistic grammars tend to be highly underconstrained (in many cases they are induced from corpus data rather than written by people). This lack of constraint means that probabilistic systems must cope with very large numbers of analyses. And to the extent that probabilistic systems are capable of working with naturally occurring language, we face the consequences of long sentences, large vocabularies and unexpected constructions. All of these factors force a serious concern with the efficiency of the algorithms we choose.
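The text so far names no particular algorithm, but probabilistic CKY chart parsing is one standard example of avoiding unnecessary work, sketched below in Python for a grammar in Chomsky normal form; the grammar is an invented CNF version of the toy PCFG above, with the unary chains folded into the lexical rules. The dynamic-programming chart records the best probability for each span exactly once, so shared subanalyses are never re-derived even when the number of complete analyses grows exponentially with sentence length; replacing the maximization with a sum would compute the total probability of the sentence instead.

def viterbi_cky(words, lexical, binary):
    # chart[i][j] maps a nonterminal label to the best probability of
    # any analysis of words[i:j] rooted in that label.
    n = len(words)
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = dict(lexical.get(w, {}))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):          # split point
                for left, p_left in chart[i][k].items():
                    for right, p_right in chart[k][j].items():
                        for parent, p_rule in binary.get((left, right), {}).items():
                            p = p_rule * p_left * p_right
                            if p > chart[i][j].get(parent, 0.0):
                                chart[i][j][parent] = p
    return chart[0][n]

# Invented CNF grammar: word -> {label: prob} and child pair -> {parent: prob}.
lexical = {"I": {"NP": 0.4}, "the": {"Det": 1.0}, "man": {"N": 0.5},
           "telescope": {"N": 0.5}, "saw": {"V": 1.0}, "with": {"P": 1.0}}
binary = {("NP", "VP"): {"S": 1.0}, ("Det", "N"): {"NP": 0.4},
          ("NP", "PP"): {"NP": 0.2}, ("V", "NP"): {"VP": 0.7},
          ("VP", "PP"): {"VP": 0.3}, ("P", "NP"): {"PP": 1.0}}

words = "I saw the man with the telescope".split()
print(viterbi_cky(words, lexical, binary)["S"])
# about 0.00336: the probability of the better of the two structures,
# matching the verb-attachment parse computed earlier.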

