10 – Summary

We have covered a number of text processing steps. Let’s summarize what a typical workflow looks like. Starting with a plain text sentence, you first normalize it by converting to lowercase and removing punctuation, and then you split it up into words using a tokenizer. Next, you can remove stop words to reduce the vocabulary you have to deal with. Depending on your application, you may then choose to apply a combination of stemming and lemmatization to reduce words to the root or stem form. It is common to apply both, lemmatization first, and then stemming. This procedure converts a natural language sentence into a sequence of normalized tokens which you can use for further analysis.

Dr. Serendipity에서 더 알아보기

지금 구독하여 계속 읽고 전체 아카이브에 액세스하세요.

Continue reading