12 – Sample A Topic

Let’s start by picking some topics for our documents. We begin with a Dirichlet distribution with parameter alpha. Alpha should be small (below 1) so that the distribution is spiky towards the sides, which means that if we pick a point somewhere in the distribution, it will most likely be close to a corner, or at least to an edge. Let’s say we pick a point close to the politics corner, which gives us the following values: 0.1 for science, 0.8 for politics, and 0.1 for sports. These values represent the mixture of topics for this particular document. They also give us a multinomial distribution, theta.

Now, from this distribution, we’ll start picking topics. That means we’ll pick science with a 10 percent probability, politics with an 80 percent probability, and sports with a 10 percent probability. So we’ll pick some topics, say politics, science, politics, sports, and so on.

We do this for several documents, so each document is a point in this Dirichlet distribution. Let’s say document one is here, which gives us this multinomial distribution, and document two is here, which gives us this other one, and we do this for all the documents. Now we merge all these vectors to get the first matrix, the matrix that indexes documents with their corresponding topics.
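To make the sampling steps concrete, here is a minimal sketch in Python with NumPy. It is only an illustration under stated assumptions, not the lecture's own code: the alpha value of 0.3, the number of documents, the number of topic picks per document, and the variable names are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

topics = ["science", "politics", "sports"]

# Assumed values for illustration only (not from the lecture).
alpha = [0.3, 0.3, 0.3]   # small alpha (< 1) pushes samples towards corners and edges
n_docs = 4                # hypothetical number of documents
n_picks = 8               # hypothetical number of topic picks per document

# Each row is one point in the Dirichlet distribution: the topic mixture
# (the multinomial parameters) for one document.
# Stacking the rows gives the document-topic matrix.
doc_topic = rng.dirichlet(alpha, size=n_docs)

# For each document, pick topics according to its own mixture.
picked = []
for d, theta in enumerate(doc_topic):
    picks = rng.choice(topics, size=n_picks, p=theta)
    picked.append(picks)
    print(f"document {d + 1}: mixture={np.round(theta, 2)} picks={list(picks)}")
```

Each row of `doc_topic` sums to 1, and with a small alpha most rows put the bulk of their weight on one or two topics, which is the "spiky towards the sides" behaviour described above.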
