Cross entropy

In the previous section we developed the idea that entropy is a measure of the expected information gain from seeing the next symbol of a ticker tape. The formula for this quantity, the entropy, is:

\begin{displaymath}
H(w) = - \sum_w p(w) \log p(w)\end{displaymath}
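As a quick check on this formula, consider the simplest possible tape, one whose symbols are produced by a fair coin with $p(\mathrm{heads}) = p(\mathrm{tails}) = \frac{1}{2}$; taking logarithms to base 2 (an assumption carried through the examples below), the entropy is exactly one bit per symbol:

\begin{displaymath}
H(w) = - \left( \frac{1}{2} \log_2 \frac{1}{2} + \frac{1}{2} \log_2 \frac{1}{2} \right) = 1
\end{displaymath}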

Now we imagine that we are still watching a ticker tape whose behaviour is still controlled by $P(w)$, but that we have only imperfect knowledge $P_{M}(w)$ of the probabilities. That is, when we see w we assess our information gain as $-\log p_{M}(w)$, not as the correct $-\log p(w)$. Over time we will see symbols occurring with their true distribution, so our estimate of the information content of the signal will be:

\begin{displaymath}
H(w; P_{M}) = - \sum_w p(w) \log p_{M}(w)\end{displaymath}
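To see the definition in action, here is a short Python sketch (the three-symbol distributions p_true and p_model are invented for the example, and logarithms are taken to base 2) that computes both the entropy of a true distribution and its cross-entropy under an imperfect model:

\begin{verbatim}
import math

# Invented example: a true distribution P(w) over three symbols,
# and an imperfect model P_M(w) over the same symbols.
p_true = {"a": 0.5, "b": 0.25, "c": 0.25}
p_model = {"a": 0.6, "b": 0.3, "c": 0.1}

def entropy(p):
    """H(w) = - sum_w p(w) log2 p(w), in bits per symbol."""
    return -sum(p[w] * math.log2(p[w]) for w in p)

def cross_entropy(p, p_m):
    """H(w; P_M) = - sum_w p(w) log2 p_M(w): symbols arrive with their
    true probabilities p(w), but each one is scored as -log2 p_M(w)."""
    return -sum(p[w] * math.log2(p_m[w]) for w in p)

print("entropy       H(w)      =", entropy(p_true))                 # 1.5 bits
print("cross-entropy H(w; P_M) =", cross_entropy(p_true, p_model))  # about 1.63 bits
\end{verbatim}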

The quantity $H(w; P_{M})$ is called the cross-entropy of the signal with respect to the model $P_{M}$. It is a remarkable and important fact that the cross-entropy with respect to any incorrect probabilistic model is greater than the entropy with respect to the correct model.
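The standard argument for this fact is brief. Writing the logarithms as natural logarithms (changing the base only rescales everything by a positive constant), and using the inequality $\ln x \leq x - 1$, which holds with equality just at $x = 1$:

\begin{displaymath}
H(w; P_{M}) - H(w) = - \sum_w p(w) \ln \frac{p_{M}(w)}{p(w)}
\geq - \sum_w p(w) \left( \frac{p_{M}(w)}{p(w)} - 1 \right)
= \sum_w p(w) - \sum_w p_{M}(w) = 0
\end{displaymath}

with equality only when $p_{M}(w) = p(w)$ for every symbol w with $p(w) > 0$, that is, only when the model is correct.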

The reason that this fact is important is that it provides us with a justification for using cross-entropy as a tool for evaluating models. It lets you organize the search for a good model in the following way: start from some initial model, propose alterations to it, measure the cross-entropy of each altered model against the signal, and keep the alterations that lower it (a sketch of such a search appears below). If you are able to find a scheme which guarantees that the alterations to the model will improve the cross-entropy, then so much the better; but even if not every change is an improvement, the algorithm may still eventually yield good models.
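What follows is a minimal sketch of such a search in Python, under invented assumptions: the model is simply a probability distribution over three symbols, the true distribution is known so that the cross-entropy can be computed exactly rather than estimated from data, and each alteration moves a little probability mass from one randomly chosen symbol to another. Only the overall shape of the loop comes from the discussion above: propose a change, and keep it if it lowers the cross-entropy.

\begin{verbatim}
import math
import random

# Invented three-symbol example: the true distribution controlling
# the tape, and a deliberately poor uniform starting model.
p_true = {"a": 0.5, "b": 0.25, "c": 0.25}
model = {"a": 1.0 / 3, "b": 1.0 / 3, "c": 1.0 / 3}

def cross_entropy(p, p_m):
    """H(w; P_M) = - sum_w p(w) log2 p_M(w)."""
    return -sum(p[w] * math.log2(p_m[w]) for w in p)

def perturb(p_m, step=0.05):
    """Propose an altered model by moving a little probability
    mass from one randomly chosen symbol to another."""
    new = dict(p_m)
    src, dst = random.sample(list(new), 2)
    moved = min(step, new[src] - 1e-6)  # keep every probability positive
    new[src] -= moved
    new[dst] += moved
    return new

random.seed(0)
best = cross_entropy(p_true, model)
for _ in range(2000):
    candidate = perturb(model)
    score = cross_entropy(p_true, candidate)
    if score < best:  # keep only alterations that lower the cross-entropy
        model, best = candidate, score

print("final model:", model)
print("its cross-entropy:", best)
print("entropy of the true distribution:",
      -sum(p * math.log2(p) for p in p_true.values()))
\end{verbatim}

Because every accepted alteration lowers the cross-entropy, and the cross-entropy is bounded below by the entropy, a search of this kind cannot keep improving indefinitely; in a realistic setting the cross-entropy would of course be estimated from the observed signal rather than computed from a known $P(w)$.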



 
Chris Brew
8/7/1998