$P(W_n = \text{holmes} \mid W_{n-1} = \text{sherlock}) = 1$
We also have notation for the joint event of the $(n-1)$th word being ``Sherlock'' and the $n$th being ``Holmes''. This is:
$P(W_n = \text{holmes}, W_{n-1} = \text{sherlock})$
Because we are absolutely certain of the identity of the next word once we have seen ``Sherlock'', it follows that:
$P(W_n = \text{holmes}, W_{n-1} = \text{sherlock}) = P(W_{n-1} = \text{sherlock})$
In general, for any pair of words $w_1$ and $w_2$, we will have:
$P(W_n = w_2, W_{n-1} = w_1) = P(W_n = w_2 \mid W_{n-1} = w_1)\,P(W_{n-1} = w_1)$
which is usually written more compactly:
$P(w_n, w_{n-1}) = P(w_n \mid w_{n-1})\,P(w_{n-1})$

While it is true that $P(\text{holmes}_n \mid \text{sherlock}_{n-1}) = 1$, it is definitely not true that $P(\text{sherlock}_{n-1} \mid \text{holmes}_n) = 1$, because the word ``holmes'' occurs frequently in contexts where it is preceded by something other than ``sherlock''. If someone tells us that the 354th word of the story is ``holmes'' (I haven't checked), then we cannot be certain that the 353rd is ``sherlock''. There is a better than even chance, but we cannot be sure. It remains the case that:
$P(w_n, w_{n-1}) = P(w_{n-1} \mid w_n)\,P(w_n)$
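The asymmetry between the two conditional probabilities can be checked numerically. The following is a minimal sketch, assuming a tiny made-up token sequence (it is not taken from the story) and simple relative-frequency estimates over adjacent word pairs. The function names `p_joint`, `p_prev`, `p_cur`, `p_cur_given_prev` and `p_prev_given_cur` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter

# Hypothetical toy text, invented for illustration (not from the story).
tokens = ("sherlock holmes smiled . mr holmes rose . "
          "sherlock holmes laughed .").split()

# Count every adjacent pair (previous word, current word).
bigrams = Counter(zip(tokens, tokens[1:]))
n_pairs = sum(bigrams.values())


def p_joint(prev, cur):
    """Estimate P(W_n = cur, W_{n-1} = prev) by relative frequency."""
    return bigrams[(prev, cur)] / n_pairs


def p_prev(prev):
    """Estimate P(W_{n-1} = prev) by summing over all current words."""
    return sum(c for (a, _), c in bigrams.items() if a == prev) / n_pairs


def p_cur(cur):
    """Estimate P(W_n = cur) by summing over all previous words."""
    return sum(c for (_, b), c in bigrams.items() if b == cur) / n_pairs


def p_cur_given_prev(prev, cur):
    """P(W_n = cur | W_{n-1} = prev) = joint / P(W_{n-1} = prev)."""
    return p_joint(prev, cur) / p_prev(prev)


def p_prev_given_cur(prev, cur):
    """P(W_{n-1} = prev | W_n = cur) = joint / P(W_n = cur)."""
    return p_joint(prev, cur) / p_cur(cur)


print("P(holmes_n | sherlock_{n-1}) =", p_cur_given_prev("sherlock", "holmes"))
print("P(sherlock_{n-1} | holmes_n) =", p_prev_given_cur("sherlock", "holmes"))
print("P(holmes_n, sherlock_{n-1})  =", p_joint("sherlock", "holmes"))
print("P(sherlock_{n-1})            =", p_prev("sherlock"))
```

In this toy text every ``sherlock'' is followed by ``holmes'', so the first conditional probability is 1 and the joint probability equals the probability of ``sherlock'' alone. Only two of the three occurrences of ``holmes'' follow ``sherlock'', so the reverse conditional probability is 2/3: better than even, but not certain.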