
Introduction

Recall the following passage from the introduction to this book:

Imagine a cup of word confetti made by cutting up a copy of ``A Case of Identity'' (or sherlock_words). Now imagine picking words out of the cup, one at a time. On each occasion, you note the word and put it back.

Given that there are 7070 words in the cup, and 7 of them are sherlock, the probability of picking sherlock out of the cup is $p(\mbox{\tt sherlock}) = 7/7070 \approx 0.00099$. This is the fraction of draws on which you expect to see sherlock if you repeat the draw many times, putting the word back each time. Similarly, since 46 of the words are holmes, $p(\mbox{\tt holmes}) = 46/7070 \approx 0.0065$.
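The arithmetic above is a relative-frequency estimate: count how often a word occurs and divide by the total number of words. A minimal Python sketch follows. The function name `unigram_probabilities` and the filler token `other` are assumptions made for illustration, and the cup is a stand-in built from the counts quoted above, not the actual text of the story.

```python
from collections import Counter


def unigram_probabilities(words):
    """Map each distinct word to count(word) / total number of words."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Stand-in cup: 7 copies of "sherlock", 46 of "holmes", and a hypothetical
# filler token for the remaining words, giving 7070 words in all.
cup = ["sherlock"] * 7 + ["holmes"] * 46 + ["other"] * (7070 - 7 - 46)

probs = unigram_probabilities(cup)
print(round(probs["sherlock"], 5))  # 0.00099
print(round(probs["holmes"], 4))    # 0.0065
```

With a real word list in place of the stand-in cup, the same function would give the estimate for every word at once.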



Chris Brew
8/7/1998