4 – NLPND POS 03 Bigrams V1

Now, of course, we can't really expect the Lookup Table method to work all the time. Here's a new example where the sentences get a bit more complicated. Our data is formed by the sentences "Mary will see Jane," "Will will see Mary," and "Jane will see Will," and the tags we'll use are noun, modal, and verb. Our goal is to tag the sentence "Mary will see Will." So we build a normal Lookup Table like this, and let's tag our sentence. Mary gets correctly tagged as a noun. Then will gets correctly tagged as a modal, and see gets correctly tagged as a verb. And here's the problem: Will always gets tagged as a modal, since it appears three times as a modal and only two times as a noun, but in this sentence we know Will is a noun, since it's referring to our friend Will. This is a problem. In particular, Lookup Tables won't work very well if a word can have two different tags, since they will always pick the most common tag the word is associated with, no matter the context.

So now the question is, how do we take the context into account? Well, the simplest way to think of context is to look at each word's neighbors. So, for instance, we can tag pairs of consecutive words. For example, here we tag the consecutive pair see-Jane as verb-noun one time, since that pair appears once with the tags verb-noun.

And now, on to tagging our sentence. We'll tag the first word using the previous table, so let's tag Mary as a noun. Then, for each following word, we'll use the previous word and the bigram Lookup Table to find the pair of tags that corresponds to them. So, for example, to tag the word will, we look at the pair Mary-will, where Mary is a noun, and we see that the most common tagging is the one where will is a modal. So we'll tag will as a modal. We continue, tagging see as a verb and the second Will correctly as a noun, since the pair see-Will is tagged as verb-noun.

Notice that I was a bit vague with this algorithm. The reason is that there are many ways to define the details here, and most of them work well. I encourage you to think of ways to refine this. For example, let's say you're tagging a word based on itself and the previous one, and your Lookup Table actually corrects the tagging of the previous word: do you keep the existing tag, or use the new one? These are all decisions based on what fits the data best. Of course, bigrams are not the end of the story; we can do this with three words at a time, or even more. These are called n-grams, and they're also very good at part-of-speech tagging.
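To make the single-word Lookup Table concrete, here is a minimal sketch in Python. It isn't the lesson's actual implementation; the tag labels, data structures, and function name are my own choices for illustration. It builds the word-to-tag counts from the three example sentences and shows why the final "will" gets the wrong tag.

```python
from collections import Counter, defaultdict

# Training sentences and their part-of-speech tags (the lesson's example data).
sentences = [
    ["mary", "will", "see", "jane"],
    ["will", "will", "see", "mary"],
    ["jane", "will", "see", "will"],
]
tags = [
    ["NOUN", "MODAL", "VERB", "NOUN"],
    ["NOUN", "MODAL", "VERB", "NOUN"],
    ["NOUN", "MODAL", "VERB", "NOUN"],
]

# Lookup table: count how often each word appears with each tag.
word_tag_counts = defaultdict(Counter)
for sentence, sentence_tags in zip(sentences, tags):
    for word, tag in zip(sentence, sentence_tags):
        word_tag_counts[word][tag] += 1

def lookup_tag(word):
    """Tag a word with its most frequent tag, ignoring context."""
    return word_tag_counts[word].most_common(1)[0][0]

print([lookup_tag(w) for w in ["mary", "will", "see", "will"]])
# -> ['NOUN', 'MODAL', 'VERB', 'MODAL']
# The last "will" is tagged MODAL (3 modal vs. 2 noun occurrences), but it should be NOUN.
```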
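And here is one possible sketch of the bigram version, again just an illustration of the idea rather than the lesson's code: the first word is tagged from the single-word table, and every later word is tagged from the most common tag pair seen for its (previous word, current word) pair. How ties or unseen pairs are handled is left out, since those are exactly the details the lesson says you can define in several reasonable ways.

```python
from collections import Counter, defaultdict

sentences = [
    ["mary", "will", "see", "jane"],
    ["will", "will", "see", "mary"],
    ["jane", "will", "see", "will"],
]
tags = [
    ["NOUN", "MODAL", "VERB", "NOUN"],
    ["NOUN", "MODAL", "VERB", "NOUN"],
    ["NOUN", "MODAL", "VERB", "NOUN"],
]

# Single-word table for the first word of a sentence.
word_tag_counts = defaultdict(Counter)
# Bigram table: counts of tag pairs for each pair of consecutive words.
pair_tag_counts = defaultdict(Counter)

for sentence, sentence_tags in zip(sentences, tags):
    for word, tag in zip(sentence, sentence_tags):
        word_tag_counts[word][tag] += 1
    for i in range(len(sentence) - 1):
        word_pair = (sentence[i], sentence[i + 1])
        tag_pair = (sentence_tags[i], sentence_tags[i + 1])
        pair_tag_counts[word_pair][tag_pair] += 1

def bigram_tag(sentence):
    """Tag the first word by itself, then each later word by its (previous, current) word pair."""
    result = [word_tag_counts[sentence[0]].most_common(1)[0][0]]
    for i in range(1, len(sentence)):
        word_pair = (sentence[i - 1], sentence[i])
        # Most common (previous_tag, current_tag) for this word pair; keep the current word's tag.
        tag_pair = pair_tag_counts[word_pair].most_common(1)[0][0]
        result.append(tag_pair[1])
    return result

print(bigram_tag(["mary", "will", "see", "will"]))
# -> ['NOUN', 'MODAL', 'VERB', 'NOUN']
# The pair see-Will was only ever tagged verb-noun, so the final "will" is now correct.
```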
