7 – Part-of-Speech Tagging

Remember parts of speech from school? Nouns, pronouns, verbs, adverbs, et cetera. Identifying how words are used in a sentence can help us better understand what is being said. It can also point out relationships between words and help recognize cross references. NLTK, again, makes things pretty easy for us. You can pass tokens or words to the pos_tag function, which returns a tag for each word identifying its part of speech. Notice how it has correctly labelled the first utterance of “lie” as a verb, while marking the second one as a noun. Refer to the NLTK documentation for more details on what each tag means.

One of the cool applications of part-of-speech tagging is parsing sentences. Here’s an example from the NLTK book that uses a custom grammar to parse an ambiguous sentence. Notice how the parser returns both valid interpretations. It is much easier to see the difference when we visualize the parse trees: I shot an elephant while I was in my pajamas, versus, I shot an elephant and the elephant was in my pajamas. How he got into my pajamas, I don’t know.
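
For anyone following along without the screen, here is a minimal sketch of both steps. The grammar is modelled on the "elephant in my pajamas" example from the NLTK book. The helper names tag_tokens and parse_all, and the constants, are made up for this illustration and are not part of NLTK. The tagging helper is only defined, not called, because it needs a tagger model that has to be downloaded first.

```python
import nltk

# Toy grammar modelled on the "elephant in my pajamas" example from the
# NLTK book. The PP can attach to the verb phrase or to the noun phrase,
# which is where the ambiguity comes from.
GROUCHO_GRAMMAR = nltk.CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | Det N PP | 'I'
VP -> V NP | VP PP
Det -> 'an' | 'my'
N -> 'elephant' | 'pajamas'
V -> 'shot'
P -> 'in'
""")


def tag_tokens(tokens):
    """Return (word, tag) pairs for a list of tokens (hypothetical helper).

    Needs NLTK's tagger model, fetched once with nltk.download(); the
    exact resource name depends on the installed NLTK version.
    """
    return nltk.pos_tag(tokens)


def parse_all(tokens, grammar=GROUCHO_GRAMMAR):
    """Return every parse tree the grammar allows (hypothetical helper)."""
    parser = nltk.ChartParser(grammar)
    return list(parser.parse(tokens))


SENTENCE = "I shot an elephant in my pajamas".split()
TREES = parse_all(SENTENCE)

for tree in TREES:
    print(tree)  # bracketed form; tree.pretty_print() draws the tree as text
```

Once the tagger model is installed, something like tag_tokens("I always lie down to tell a lie".split()) would show the tags for a sentence of that kind. That sentence is just an assumed stand-in, since the transcript does not quote the one used on screen.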