12 – MLND SL NB Solution Naive Bayes Algorithm

So the way to do this is to divide each one by the sum of both. This makes sure that they add to one. For the first one, we have one over 12 divided by one over 12 plus one over 40, which is 10 over 13. And for the second one, we have one over 40 divided by one over 12 plus one over 40, which is three over 13. So there we go. The answers are 10 over 13 for spam and three over 13 for ham. So, for this particular email, we conclude that it is very likely to be spam.

Now, what happens in general? Well, let's say we have a bunch of words that we use as features to tell if the email is spam or not. Say, easy, money, cheap, et cetera. Our first step is to flip the event and the conditional using Bayes' theorem. Then we make the naive assumption, that the words appear independently of each other, to split this into a product of simple factors that we can quickly calculate by looking at our data. We do this both for spam and ham, and we get some values that don't add to one. As a final step, we normalize to get our final probabilities of our email being spam or ham. And that's it. That's how the Naive Bayes algorithm works.
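The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the course's own code: the function names `normalize` and `naive_bayes_posterior` are made up here, and the priors and per-word probabilities in the second call are assumed values chosen so that the products come out to one over 12 and one over 40. They are not taken from the transcript.

```python
from fractions import Fraction
from math import prod


def normalize(scores):
    """Divide each score by the sum of all scores so they add to one."""
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


def naive_bayes_posterior(priors, likelihoods, words):
    """Prior times the product of per-word probabilities, then normalized.

    Hypothetical helper: `priors` maps label -> P(label), and
    `likelihoods` maps label -> {word: P(word | label)}.
    """
    scores = {
        label: prior * prod(likelihoods[label][w] for w in words)
        for label, prior in priors.items()
    }
    return normalize(scores)


# The two unnormalized values from the worked example.
posterior = normalize({"spam": Fraction(1, 12), "ham": Fraction(1, 40)})
print(posterior["spam"], posterior["ham"])  # 10/13 3/13

# Assumed inputs (illustrative only) that reproduce 1/12 and 1/40.
priors = {"spam": Fraction(3, 8), "ham": Fraction(5, 8)}
likelihoods = {
    "spam": {"easy": Fraction(1, 3), "money": Fraction(2, 3)},
    "ham": {"easy": Fraction(1, 5), "money": Fraction(1, 5)},
}
general = naive_bayes_posterior(priors, likelihoods, ["easy", "money"])
print(general["spam"], general["ham"])  # 10/13 3/13
```

Using `Fraction` keeps the arithmetic exact, so the output matches the hand calculation; with real data you would more likely use floats, or sums of logarithms once there are many words.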
