So far, we’ve looked at representations that try to characterize an entire document or collection of words as one unit. As a result, the kinds of inferences we can make are also typically at the document level: the mixture of topics in a document, document similarity, document sentiment, et cetera. For a deeper analysis of text, we need a numerical representation for each word. If you’ve dealt with categorical variables in data analysis or tried to perform multi-class classification, you may have come across the term one-hot encoding. That is one way of representing words: treat each word like a class, and assign it a vector that has a one in a single pre-determined position for that word and zeros everywhere else. Does that look familiar? Yeah, it’s just like the bag-of-words idea, except that we keep a single word in each bag and build a vector for it.
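As a quick sketch of the idea, here is what one-hot encoding a tiny vocabulary might look like in Python. The vocabulary and function names are illustrative assumptions for this example, not something from the lecture itself:

```python
# Illustrative sketch: one-hot encoding for a small, fixed vocabulary.
vocabulary = ["cat", "dog", "fish"]

# Assign each word a pre-determined position in the vector.
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    """Return a vector with a 1 at the word's position and 0 everywhere else."""
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("cat"))   # [1, 0, 0]
print(one_hot("dog"))   # [0, 1, 0]
print(one_hot("fish"))  # [0, 0, 1]
```

Notice that each vector is as long as the vocabulary and says nothing about how words relate to one another, which is exactly the limitation that motivates richer word representations.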