10 - Summary - Dr. Serendipity

We have covered a number of text processing steps, so let's summarize what a typical workflow looks like. Starting with a plain text sentence, you first normalize it by converting it to lowercase and removing punctuation, and then you split it into words using a tokenizer. Next, you can remove stop words to reduce the vocabulary you have to deal with. Depending on your application, you may then apply stemming, lemmatization, or both to reduce each word to its stem or root form. When both are used, it is common to lemmatize first and then stem. This procedure converts a natural language sentence into a sequence of normalized tokens that you can use for further analysis.
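The workflow above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production pipeline: the `STOP_WORDS` set, the `LEMMAS` lookup table, and the `crude_stem` helper are hypothetical stand-ins invented for this example. In practice you would likely reach for a library such as NLTK, which provides a stop word corpus, `WordNetLemmatizer`, and `PorterStemmer`.

```python
import re

# Hypothetical, deliberately tiny stop word list (illustration only).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were",
              "of", "to", "in", "on", "and", "or", "it"}

# Hypothetical lookup table standing in for a dictionary-based lemmatizer.
LEMMAS = {"mice": "mouse", "ran": "run", "better": "good"}


def normalize(text):
    """Lowercase the text and replace punctuation with spaces."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize(text):
    """Split normalized text into words on whitespace."""
    return text.split()


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(token):
    """Map a token to its dictionary form if the toy table knows it."""
    return LEMMAS.get(token, token)


def crude_stem(token):
    """Strip one common suffix. A toy stand-in, not the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence):
    """Normalize, tokenize, remove stop words, lemmatize, then stem."""
    tokens = tokenize(normalize(sentence))
    tokens = remove_stop_words(tokens)
    tokens = [lemmatize(t) for t in tokens]
    return [crude_stem(t) for t in tokens]


print(preprocess("The mice were jumping over the boxes, and it was raining!"))
# → ['mouse', 'jump', 'over', 'box', 'rain']
```

Lemmatization runs before stemming here so that irregular forms such as "mice" are resolved by the lookup table before any suffix is stripped, matching the order described above.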

