Well, let’s see. The first part has 500 times 10 parameters, which is 5,000. The second part has 10 times 1,000, which is 10,000. So together they have 15,000 parameters, which is much better than 500,000. This is called Latent Dirichlet Allocation, or LDA for short. LDA is an example of matrix factorization, and we’ll see how.
The idea is the following: we go from the Bag of Words model on the left to the LDA model on the right. The Bag of Words model on the left says that the probability of, say, the word “tax” being generated by the second document is the label of this arrow. In the LDA model on the right, that probability is calculated from these arrows: we multiply each P of t given z on the top by the corresponding P of z given d on the bottom, and add up the products over all the topics.
This formula reminds us a bit of matrix multiplication, in the following way. We can put all the probabilities from the left model in one big matrix. Then the idea is to write this big Bag of Words matrix as the product of a tall, skinny matrix indexed by documents and topics and a wide, flat matrix indexed by topics and terms. In this case, the entry corresponding to, say, the second document and the term “tax” in the Bag of Words matrix will be equal to the inner product of the corresponding row and column in the matrices on the right. And as before, if the matrices are big, say 500 documents, 10 topics, and 1,000 terms, the Bag of Words matrix has 500,000 entries, whereas the two matrices in the topic model combined have 15,000 entries.
But aside from being much simpler, the LDA model has a huge advantage: it gives us a set of topics that we can use to group the documents. Here we’re calling them science, politics, and sports, but in real life the algorithm will just produce some topics, and it’ll be up to us to look at the associated words and decide what the common theme of those words is.
We’ll keep them as science, politics, and sports for clarity, but think of them as topic 1, topic 2, and topic 3.
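To make the matrix picture concrete, here is a minimal sketch in plain Python. The probabilities and the small vocabulary are invented for illustration and are not taken from the slides; only the three topic names, the word “tax”, and the 500, 10, and 1,000 sizes come from the discussion above. The helper names `term_given_doc` and `count_entries` are hypothetical.

```python
# Toy numbers, invented for illustration only.
topics = ["science", "politics", "sports"]
terms = ["space", "vote", "tax", "ball"]  # hypothetical vocabulary

# P(z | d): one row per document, one column per topic (tall, skinny matrix).
doc_topic = [
    [0.7, 0.2, 0.1],  # first document
    [0.1, 0.8, 0.1],  # second document
]

# P(t | z): one row per topic, one column per term (wide, flat matrix).
topic_term = [
    [0.6, 0.1, 0.1, 0.2],  # science
    [0.1, 0.4, 0.4, 0.1],  # politics
    [0.1, 0.1, 0.1, 0.7],  # sports
]


def term_given_doc(d, t):
    """P(t | d): sum over topics z of P(t | z) * P(z | d)."""
    return sum(topic_term[z][t] * doc_topic[d][z] for z in range(len(topics)))


# The Bag of Words matrix is the product of the two smaller matrices:
# each entry is the inner product of a document row and a term column.
doc_term = [
    [term_given_doc(d, t) for t in range(len(terms))]
    for d in range(len(doc_topic))
]

# Probability of "tax" in the second document (row index 1).
p_tax_doc2 = doc_term[1][terms.index("tax")]
print(round(p_tax_doc2, 2))  # 0.34


def count_entries(n_docs, n_topics, n_terms):
    """Entries in the Bag of Words matrix vs. the two factor matrices."""
    bow = n_docs * n_terms
    lda = n_docs * n_topics + n_topics * n_terms
    return bow, lda


bow_entries, lda_entries = count_entries(500, 10, 1000)
print(bow_entries, lda_entries)  # 500000 15000
```

Each row of `doc_term` still sums to 1, so the product is again a table of probabilities of terms given documents. This sketch only multiplies two matrices that are already given; it does not show how LDA learns them from data.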