7 – Solution Picking Topics

So let’s think. In the distribution on the left, we’re very likely to pick a point, say, here close to a corner or the edges. Let’s say, for example, close to politics. That means our article is 80 percent about politics, 10 percent about sports, and 10 percent about science. On the distribution in the … Read more
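For a concrete picture of what picking such a point looks like, here is a minimal sketch using NumPy’s Dirichlet sampler; the three topic names and the small alpha values are illustrative, not from the lesson.

```python
import numpy as np

rng = np.random.default_rng(0)
topics = ["politics", "sports", "science"]

# A small alpha pushes samples toward the corners of the triangle (simplex),
# so most documents end up dominated by a single topic.
alpha = [0.1, 0.1, 0.1]
mixture = rng.dirichlet(alpha)

for topic, weight in zip(topics, mixture):
    print(f"{topic}: {weight:.0%}")
```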

6 – Quiz Picking Topics

Let’s sidetrack a bit. Let’s say we’re at a party, and this party is in a triangular room, and these black dots are people, and they’re roaming around the party. Now let’s say we locate some food in one corner, some dessert in another corner, and some music in the other one. So people … Read more

5 – Matrices

So, the idea for building our LDA model will be to factor our Bag of Words matrix on the left into two matrices, one indexing documents by topic and the other indexing topics by word. In this video, I’ll be more specific about what these matrices mean. Here’s how we calculate our Bag of Words … Read more
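As a rough sketch of the shapes involved, assuming NumPy; the counts of 3 documents, 2 topics, and 5 words are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_documents, n_topics, n_words = 3, 2, 5

# Documents-by-topics matrix: each row is a topic mixture for one document.
doc_topic = rng.dirichlet([0.5] * n_topics, size=n_documents)   # shape (3, 2)

# Topics-by-words matrix: each row is a word distribution for one topic.
topic_word = rng.dirichlet([0.5] * n_words, size=n_topics)      # shape (2, 5)

# Their product approximates the (normalized) bag of words matrix.
doc_word = doc_topic @ topic_word                               # shape (3, 5)
print(doc_word.shape)
```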

4 – Matrix Multiplication

Well, let’s see. The first part has 500 times 10 parameters, which is 5,000. The second part has 10 times 1,000, which is 10,000. So, together, they have 15,000 parameters. This is much better than 500,000. This is called Latent Dirichlet Allocation, or LDA for short. LDA is an example of matrix factorization. We’ll … Read more
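The arithmetic, spelled out with the numbers from the transcript:

```python
n_documents, n_words, n_topics = 500, 1000, 10

direct = n_documents * n_words                           # 500 * 1,000 = 500,000 parameters
factored = n_documents * n_topics + n_topics * n_words   # 5,000 + 10,000 = 15,000 parameters

print(direct, factored)  # 500000 15000
```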

3 – Latent Variables

Well, the answer is, we need one arrow for each document-word pair. Since we have 500 documents and 1,000 words, the number of parameters is the number of documents times the number of words. This is 500 times 1,000, which is 500,000. This is too many parameters to figure out. Is there any way … Read more

2 – Bag Of Words

Let’s start with a regular bag of words model. If you think about the Bag of Words model graphically, it represents the relationship between a set of document objects and a set of word objects. It’s very simple. Let’s say we’ve got an article like this one and we look at what words are in … Read more
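A minimal sketch of the counting step behind a bag of words, using only Python’s standard library; the example sentence is made up.

```python
from collections import Counter

document = "the quick brown fox jumps over the lazy dog the fox"

# Bag of words: ignore word order, keep only how often each word appears.
bag_of_words = Counter(document.lower().split())

print(bag_of_words)  # Counter({'the': 3, 'fox': 2, 'quick': 1, ...})
```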

16 – Outro

That’s it. Great job. In this lesson, we’ve learned topic modeling and document categorization using Latent Dirichlet Allocation. This will give us the mixture model of topics in a new document and the probabilities of these topics generating all the words. Now, in the following lab, you’ll be able to put all this into practice … Read more

15 – Combining the Models

So, now let’s put this all together and study how to get these two matrices in the LDA model based on their respective Dirichlet distributions. The rough idea is, as we just saw, that the entries of the first matrix come from picking points in the distribution alpha. The entries of the second matrix come from … Read more
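A rough sketch of that combination, assuming NumPy; the matrix sizes and the alpha and beta values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n_documents, n_topics, n_words = 3, 3, 4

# Rows of the document-topic matrix are draws from the Dirichlet with parameter alpha.
alpha = [0.7] * n_topics
doc_topic = rng.dirichlet(alpha, size=n_documents)

# Rows of the topic-word matrix are draws from the Dirichlet with parameter beta.
beta = [0.7] * n_words
topic_word = rng.dirichlet(beta, size=n_topics)

# Multiplying them gives the probability of each word in each document.
doc_word = doc_topic @ topic_word
print(doc_word.round(2))
```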

14 – Sample a Word

Now, we’ll do the same thing for topics and words. Let’s say for the sake of visualization that we only have four words: space, climate, vote, and rule. Now, we have a different Dirichlet distribution, beta. This one is similar to the previous one but it is three-dimensional; it’s not around a triangle but it’s … Read more
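A minimal sketch of drawing one word distribution per topic from beta, assuming NumPy; the topic names and the beta value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
words = ["space", "climate", "vote", "rule"]
topics = ["science", "politics", "sports"]

# With four words the Dirichlet lives on a tetrahedron rather than a triangle;
# a small beta again pushes each sample toward a corner.
beta = [0.1] * len(words)
topic_word = rng.dirichlet(beta, size=len(topics))

for topic, row in zip(topics, topic_word):
    print(topic, dict(zip(words, row.round(2))))
```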

13 – Sample A Word

Now, we’ll do the same thing for topics and words. Let’s say for the sake of visualization that we only have four words: space, climate, vote, and rule. Now we have a different distribution, beta. This one is similar to the previous one but it is three-dimensional; it’s not around a triangle but it’s around … Read more

12 – Sample A Topic

Let’s start by picking some topics for our documents. We start with some Dirichlet distribution with parameters alpha, and the parameter should be small for the distribution to be spiky towards the sides, which means, if we pick a point somewhere in the distribution, it will most likely be close to a corner or at … Read more
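To see the effect of a small alpha, here is a minimal NumPy sketch contrasting it with a large alpha; the values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

# Small alpha: samples land near the corners of the triangle (spiky mixtures).
spiky = rng.dirichlet([0.1, 0.1, 0.1], size=5)

# Large alpha: samples land near the center (even mixtures).
flat = rng.dirichlet([10, 10, 10], size=5)

print(spiky.round(2))
print(flat.round(2))
```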

11 – Sample A Topic

Let’s start by picking some topics for our documents. We start with some Dirichlet distribution with parameters alpha, and the parameters should be small for the distribution to be spiky towards the sides, which means, if we pick a point somewhere in the distribution, it will most likely be close to a corner or at … Read more

10 – Latent Dirichlet Allocation

So, now let’s build our LDA model. The idea is the following: we’ll have our documents here, let’s say these three documents, and then we’ll generate some fake documents, like these three over here. The way we generate them is with the topic model, and then what we do is we compare the generated documents … Read more
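A rough sketch of the generative side of that comparison, assuming NumPy; the vocabulary, topic count, and document length are made up.

```python
import numpy as np

rng = np.random.default_rng(5)
words = ["space", "climate", "vote", "rule"]
n_topics, doc_length = 3, 8

# Topic model: a topic mixture for the document and a word distribution per topic.
doc_topic = rng.dirichlet([0.1] * n_topics)
topic_word = rng.dirichlet([0.1] * len(words), size=n_topics)

# Generate one fake document: pick a topic for each word slot, then pick a word.
fake_document = []
for _ in range(doc_length):
    topic = rng.choice(n_topics, p=doc_topic)
    word = rng.choice(words, p=topic_word[topic])
    fake_document.append(word)

print(" ".join(fake_document))
```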

1 – Introduction

Hello, this is Luis. Welcome to the Topic Modeling section. While classification is an interesting supervised learning problem and a lot of tasks fall under that category, there’s a whole world of further unsupervised problems that I find fascinating. One of these is Topic Modeling. In this section, we’ll study a model, which given a … Read more

9 – T-SNE

Word embeddings need to have high dimensionality in order to capture sufficient variations in natural language, which makes them super hard to visualize. T-SNE, which stands for t-Distributed Stochastic Neighbor Embedding, is a dimensionality reduction technique that can map high dimensional vectors to a lower dimensional space. It’s kind of like PCA, Principal Component Analysis, … Read more
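A minimal sketch using scikit-learn’s TSNE, with random vectors standing in for real word embeddings.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in for real word embeddings: 100 "words" with 300-dimensional vectors.
embeddings = np.random.default_rng(6).normal(size=(100, 300))

# t-SNE maps the vectors to 2-D while trying to keep nearby words nearby.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
points_2d = tsne.fit_transform(embeddings)

print(points_2d.shape)  # (100, 2)
```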

8 – Embeddings For Deep Learning

Word embeddings are fast becoming the de facto choice for representing words, especially for use in deep neural networks. But why do these techniques work so well? Doesn’t it seem almost magical that you can actually do arithmetic with words, like woman minus man plus king equals queen? The answer might lie in the … Read more
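A sketch of that word arithmetic with gensim, assuming the downloadable glove-wiki-gigaword-100 vectors; any pretrained embedding would do.

```python
import gensim.downloader as api

# Load pretrained word vectors (downloads on first use).
vectors = api.load("glove-wiki-gigaword-100")

# "king + woman - man" should land near "queen" in the embedding space.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # e.g. [('queen', ...)]
```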

7 – GloVe

Word2vec is just one type of word embedding. Recently, several other related approaches have been proposed that are really promising. GloVe, or Global Vectors for Word Representation, is one such approach that tries to directly optimize the vector representation of each word just using co-occurrence statistics, unlike word2vec, which sets up an ancillary prediction … Read more
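A minimal sketch of the co-occurrence counts GloVe starts from (not the GloVe training itself); the toy corpus and window size are made up.

```python
from collections import defaultdict

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
window = 2

# Count how often each pair of words appears within a small window of each other;
# GloVe then fits word vectors so their dot products match these statistics.
cooccurrence = defaultdict(int)
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if i != j:
                cooccurrence[(word, tokens[j])] += 1

print(cooccurrence[("sat", "on")])
```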

6 – Word2Vec

Word2Vec is perhaps one of the most popular examples of word embeddings used in practice. As the name Word2Vec indicates, it transforms words to vectors. But what the name doesn’t give away is how that transformation is performed. The core idea behind Word2Vec is this: a model that is able to predict a given word, … Read more
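A minimal sketch of training Word2Vec on a toy corpus with gensim, assuming gensim 4.x (older versions use size instead of vector_size).

```python
from gensim.models import Word2Vec

# Tiny toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]

# sg=1 selects skip-gram (predict context words from the given word);
# sg=0 would select CBOW (predict the word from its context).
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["cat"].shape)  # (50,)
```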

5 – Word Embeddings

One-hot encoding usually works in some situations but breaks down when we have a large vocabulary to deal with, because the size of our word representation grows with the number of words. What we need is a way to control the size of our word representation by limiting it to a fixed-size vector. In other … Read more
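A minimal sketch of a fixed-size embedding lookup table, assuming NumPy; the vocabulary and dimension are made up.

```python
import numpy as np

vocabulary = ["cat", "dog", "mat", "rug"]
embedding_dim = 4  # fixed, no matter how large the vocabulary grows

# Embedding lookup table: one fixed-size dense vector per word.
rng = np.random.default_rng(7)
embedding_matrix = rng.normal(size=(len(vocabulary), embedding_dim))

word_to_index = {word: i for i, word in enumerate(vocabulary)}
print(embedding_matrix[word_to_index["cat"]])  # a 4-dimensional vector
```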

4 – One-Hot Encoding

So far, we’ve looked at representations that tried to characterize an entire document or collection of words as one unit. As a result, the kinds of inferences we can make are also typically at a document level: the mixture of topics in the document, document similarity, document sentiment, et cetera. For a deeper analysis of text, … Read more
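A minimal sketch of one-hot encoding with a tiny made-up vocabulary.

```python
vocabulary = ["cat", "dog", "mat", "rug"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    # A vector of zeros with a single 1 at the word's index.
    vector = [0] * len(vocabulary)
    vector[word_to_index[word]] = 1
    return vector

print(one_hot("dog"))  # [0, 1, 0, 0]
```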