1 – Introduction

Hello, this is Luis. Welcome to the Topic Modeling section. While classification is an interesting supervised learning problem and a lot of tasks fall under that category, there’s a whole world of further unsupervised problems that I find fascinating. One of these is Topic Modeling. In this section, we’ll study a model, which given a set of documents, it classifies them into different topics. The bag-of-words approach we’ve learned previously tries to represent documents in a dataset directly using the words that appear in them. But often, these words are predicated on some underlying parameters are very among documents such as a topic being discussed. In this section, we’ll begin with discussing this hidden or latent variables. Then, we’ll look at a specific technique used to estimate them called Latent Dirichlet Allocation. Finally, you’ll get to apply the knowledge you’ve learned to perform topic modeling on a collection of documents.

%d 블로거가 이것을 좋아합니다: