1 – Feature Extraction

Once our text is clean and normalized, we need to transform it into features that can be used for modeling. For instance, treating each document as a bag of words allows us to compute simple statistics that characterize it, such as how often each word occurs. These statistics can be improved by assigning appropriate weights to words using a TF-IDF scheme, which enables a more accurate comparison between documents. For certain applications, we may need numerical representations of individual words, and for that we can use word embeddings, which are an efficient and powerful method. In this lesson, you will learn all these techniques for extracting relevant features from text data.
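To make the first two ideas concrete, here is a minimal sketch of bag-of-words counts and TF-IDF weights in plain Python. The toy corpus, the function names `bag_of_words` and `tf_idf`, and the weighting formula (raw count times `log(N / document frequency)`) are assumptions chosen for illustration; libraries commonly use smoothed or normalized variants.

```python
import math
from collections import Counter

# Toy corpus (hypothetical), assumed already cleaned and normalized.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

def bag_of_words(doc):
    """Count how often each word occurs in a document, ignoring word order."""
    return Counter(doc.split())

def tf_idf(docs):
    """Weight each term count by log(N / document frequency)."""
    bags = [bag_of_words(d) for d in docs]
    n_docs = len(bags)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for bag in bags for term in bag)
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bag.items()}
        for bag in bags
    ]

bags = [bag_of_words(d) for d in docs]
weights = tf_idf(docs)

print(bags[0])     # raw counts for the first document
print(weights[0])  # TF-IDF weights for the first document
```

In this sketch, a word that appears in only one document (such as "cat") receives a higher weight than a word of the same count that appears in several documents (such as "sat"), which is the effect the TF-IDF scheme is meant to achieve.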
