P(Wn = holmes | Wn-1 = sherlock)
and the difference between this and
P(Wn-1 = sherlock | Wn = holmes)
This is a clear case: the second expression is the probability that Wn-1 is "sherlock" given that Wn is "holmes", and because more than one word can precede "holmes", it isn't 1.
You may be confused about why anyone would care about
P(Wn-1 = sherlock | Wn = holmes)
in which case you should remember the possibility that you are reading the text backwards from end to beginning!
You should also be familiar with the idea of joint probability
P(Wn = holmes, Wn-1 = sherlock)
which is just the probability of the two events occurring together.
And you should be aware that
P(Wn = holmes, Wn-1 = sherlock)
  = P(Wn-1 = sherlock | Wn = holmes) P(Wn = holmes)
  = P(Wn = holmes | Wn-1 = sherlock) P(Wn-1 = sherlock)
The second product is for people reading in the ordinary way, and the first is for those of us who read backwards (don't do this at home - especially with crime novels).
The usual form of Bayes' theorem is
P(Wn = holmes | Wn-1 = sherlock) = P(Wn-1 = sherlock | Wn = holmes) P(Wn = holmes) / P(Wn-1 = sherlock)
This is a form which lets people who were fed the text backwards convert their knowledge into a form which will be useful for prediction when working forwards. Of course there are variations which apply to all kinds of situations more realistic than this one.
The general point is that all this algebra lets you work with information which is relatively easy to get in order to infer things which you can count less reliably or not at all. See the example about twins below to get more of an intuition about this.
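To make the relationship between the forwards, backwards and joint probabilities concrete, here is a minimal sketch in Python. The ten-word text and all the function names (p_joint, p_forward, p_backward, p_prev, p_curr) are invented for illustration, and the probabilities are plain relative frequencies over adjacent word pairs, which is an assumption of this sketch rather than anything the discussion above requires.

```python
from collections import Counter

# Toy text, invented for illustration (not taken from any real corpus).
words = ("sherlock holmes met mycroft holmes and then "
         "sherlock holmes left").split()

# Every adjacent pair (Wn-1, Wn) in the text.
pairs = list(zip(words, words[1:]))
pair_counts = Counter(pairs)
prev_counts = Counter(prev for prev, _ in pairs)  # counts for Wn-1
curr_counts = Counter(curr for _, curr in pairs)  # counts for Wn
total = len(pairs)


def p_joint(prev, curr):
    """P(Wn = curr, Wn-1 = prev) as a relative frequency."""
    return pair_counts[(prev, curr)] / total


def p_forward(curr, prev):
    """P(Wn = curr | Wn-1 = prev): reading in the ordinary way."""
    return pair_counts[(prev, curr)] / prev_counts[prev]


def p_backward(prev, curr):
    """P(Wn-1 = prev | Wn = curr): reading backwards."""
    return pair_counts[(prev, curr)] / curr_counts[curr]


def p_prev(word):
    """P(Wn-1 = word)."""
    return prev_counts[word] / total


def p_curr(word):
    """P(Wn = word)."""
    return curr_counts[word] / total


forward = p_forward("holmes", "sherlock")
backward = p_backward("sherlock", "holmes")

# Bayes' theorem: recover the forward probability from backward knowledge.
bayes = backward * p_curr("holmes") / p_prev("sherlock")

print("forward :", forward)
print("backward:", backward)
print("joint   :", p_joint("sherlock", "holmes"))
print("bayes   :", bayes)
```

In this toy text "mycroft" also precedes "holmes", so the backward probability falls below 1, while both products give the same joint probability and the Bayes line recovers the forward value.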