The idea behind Hidden Markov Models is the following: let's say that one way of tagging the sentence "Jane will spot Will" is noun-modal-verb-noun, and we'll calculate a probability associated with this tagging. We need two things. First, how likely is it that a noun is followed by a modal, a modal by a verb, and a verb by a noun? These probabilities need to be high in order for this tagging to be likely. They are called the transition probabilities. The second set of probabilities we need to calculate is this: what is the probability that a noun will be the word "Jane", that a modal will be the word "will", and so on? These also need to be relatively high for our tagging to be likely. They are called the emission probabilities.

So, here are sentences with their corresponding tags, and we're going to calculate the emission probabilities, that is, the probability that if a word is, say, a noun, that word will be "Mary" or "Jane", etc. To do this, we again build a counting table like this one, where, for example, the entry in the "Mary" row and the noun column is four, because "Mary" appears four times as a noun. Now, in order to find the probabilities, we divide each column by the sum of its entries, and we obtain the following numbers. And here's a graphical representation of this table: if we know that a word is a noun, the probability of it being "Mary" is four over nine, "Jane" is two over nine, "Will" is one over nine, and "Spot" is two over nine. The other words are zero. Same thing for modal and for verb. And notice that words can appear repeatedly here, like "Will", which appears as a noun and also as a modal. That is no problem.

So, now let's calculate the transition probabilities. These are the probabilities that one part of speech follows another. First, in order to get the whole picture, we'll add starting and ending tags to each sentence, and we'll treat these tags as parts of speech as well. And now we make a table of counts. In this table, we count the number of appearances of each pair of parts of speech. For example, the three in the noun row and the modal column corresponds to the three occurrences of a noun followed by a modal. Now, to find the probabilities, we divide each row by the sum of its entries. In this way, if we look at, say, the modal row, the probability that the next part of speech is a noun is one quarter, that it's a verb is three quarters, and that it's a modal or the end of the sentence is zero. And here's a nice graph of our transition probabilities: we draw our parts of speech and arrows between them, with the transition probabilities attached.

As a final step, we combine the two previous graphs to form a Hidden Markov Model. We have our words, which are called the observations because they are the things we observe when we read the sentences. And the parts of speech are called the hidden states, since they are the ones we don't know and have to infer based on the words. Among the hidden states we have the transition probabilities, and between the hidden states and the observations we have the emission probabilities. And that's it. That's a Hidden Markov Model.
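
To make the emission-probability step concrete, here is a minimal sketch in Python. The actual example sentences appear on screen rather than in the transcript, so the toy tagged corpus below is an assumption, chosen so that it reproduces the counts quoted above (for instance, "Mary" as a noun four times out of nine nouns). Words are lowercased so that "Will" the name and "will" the modal share one spelling, as in the on-screen tables.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus (an assumption; the lecture's sentences are on
# screen, not in the transcript). Tags: N = noun, M = modal, V = verb.
tagged_sentences = [
    [("mary", "N"), ("jane", "N"), ("can", "M"), ("see", "V"), ("will", "N")],
    [("spot", "N"), ("will", "M"), ("see", "V"), ("mary", "N")],
    [("will", "M"), ("jane", "N"), ("spot", "V"), ("mary", "N")],
    [("mary", "N"), ("will", "M"), ("pat", "V"), ("spot", "N")],
]

# Count (tag, word) pairs and total occurrences of each tag, then divide
# each count by the tag's total -- the "divide each column by the sum of
# its entries" step from the lecture.
pair_counts = Counter()
tag_counts = Counter()
for sentence in tagged_sentences:
    for word, tag in sentence:
        pair_counts[(tag, word)] += 1
        tag_counts[tag] += 1

emission = defaultdict(dict)
for (tag, word), count in pair_counts.items():
    emission[tag][word] = count / tag_counts[tag]

# The noun column: mary 4/9, jane 2/9, will 1/9, spot 2/9.
print(emission["N"])
```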
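The transition-probability step can be sketched the same way. Only the tag sequences matter here; they come from the same assumed corpus, padded with starting and ending tags, and they reproduce the modal row quoted above (one quarter noun, three quarters verb).

```python
from collections import Counter, defaultdict

# Tag sequences from the same hypothetical corpus as above.
tag_sequences = [
    ["N", "N", "M", "V", "N"],
    ["N", "M", "V", "N"],
    ["M", "N", "V", "N"],
    ["N", "M", "V", "N"],
]

# Wrap each sequence in start/end tags, count consecutive tag pairs, then
# divide each count by the first tag's total -- the "divide each row by
# the sum of its entries" step.
bigram_counts = Counter()
from_counts = Counter()
for tags in tag_sequences:
    padded = ["<s>"] + tags + ["</s>"]
    for prev, nxt in zip(padded, padded[1:]):
        bigram_counts[(prev, nxt)] += 1
        from_counts[prev] += 1

transition = defaultdict(dict)
for (prev, nxt), count in bigram_counts.items():
    transition[prev][nxt] = count / from_counts[prev]

# The modal row from the lecture: {'V': 0.75, 'N': 0.25}.
print(transition["M"])
```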
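Finally, the probability the opening promises for the tagging noun-modal-verb-noun of "Jane will spot Will" is the product of one transition and one emission probability per word, plus the closing transition to the end tag. The fractions below are read off the two tables sketched above, so they inherit the same corpus assumption; the resulting number is illustrative, not quoted from the lecture.

```python
from fractions import Fraction as F

# Probabilities taken from the tables built in the sketches above
# (hard-coded as fractions so this snippet is self-contained).
transition = {
    ("<s>", "N"): F(3, 4), ("N", "M"): F(1, 3), ("M", "V"): F(3, 4),
    ("V", "N"): F(1, 1),   ("N", "</s>"): F(4, 9),
}
emission = {
    ("N", "jane"): F(2, 9), ("M", "will"): F(3, 4),
    ("V", "spot"): F(1, 4), ("N", "will"): F(1, 9),
}

words = ["jane", "will", "spot", "will"]  # "Jane will spot Will", lowercased
tags = ["N", "M", "V", "N"]               # the candidate tagging

score = F(1)
prev = "<s>"
for word, tag in zip(words, tags):
    score *= transition[(prev, tag)] * emission[(tag, word)]
    prev = tag
score *= transition[(prev, "</s>")]       # close with the end tag

print(score, float(score))  # 1/2592, roughly 0.000386
```

In practice, products of many small probabilities like this are usually computed as sums of log-probabilities to avoid numerical underflow on longer sentences.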