11 – Sample A Topic

Let’s start by picking some topics for our documents. We start with a Dirichlet distribution with parameters alpha. These parameters should be small so that the distribution is spiky towards the sides. This means that if we pick a point somewhere in the distribution, it will most likely be close to a corner, or at least to an edge. Let’s say we pick this point close to the politics corner, which generates the following values: 0.1 for science, 0.8 for politics and 0.1 for sports. These values represent the mixture of topics for this particular document, and they also give us a multinomial distribution, theta. Now, from this distribution, we start picking topics: science with a 10 percent probability, politics with an 80 percent probability and sports with a 10 percent probability. So we might pick, say, politics, science, politics, sports, and so on.

We do this for several documents, so each document is a point in this Dirichlet distribution. Let’s say document one is here, which gives us this multinomial distribution. Document two is here, which gives us this other one, and we do this for all the documents. Finally, we merge all these vectors to get the first matrix: the matrix that indexes documents by their corresponding topics.
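As a rough illustration of the sampling steps above, here is a minimal Python sketch that uses only the standard library. It assumes three topics (science, politics, sports) and a symmetric alpha of 0.1; the helper names `sample_dirichlet`, `sample_topics` and `document_topic_matrix` are hypothetical. The Dirichlet draw uses the standard trick of normalizing independent Gamma samples, and `random.choices` stands in for picking topics from the multinomial theta.

```python
import random

# Assumed topic set from the example; helper names below are hypothetical.
TOPICS = ["science", "politics", "sports"]


def sample_dirichlet(alpha, rng):
    """Draw one topic mixture theta ~ Dirichlet(alpha).

    Independent Gamma(a_i, 1) draws, normalized to sum to 1,
    are distributed as Dirichlet(alpha).
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_topics(theta, n_words, rng):
    """Pick a topic for each of n_words slots, using theta as the probabilities."""
    return rng.choices(TOPICS, weights=theta, k=n_words)


def document_topic_matrix(n_docs, alpha, rng):
    """One row per document: that document's topic mixture theta."""
    return [sample_dirichlet(alpha, rng) for _ in range(n_docs)]


rng = random.Random(0)
alpha = [0.1, 0.1, 0.1]  # assumed small alpha: mixtures land near the corners

matrix = document_topic_matrix(4, alpha, rng)
for row in matrix:
    print([round(p, 2) for p in row])

# The politics-heavy mixture from the text.
theta = [0.1, 0.8, 0.1]
print(sample_topics(theta, 10, rng))
```

Each row of `matrix` is one document's theta, so stacking the rows gives the document-topic matrix. With alpha this small, most rows put most of their weight on a single topic; raising alpha (say to 10) pulls the rows towards an even mix instead.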