35 – Gradient Descent vs. Perceptron Algorithm

So let’s compare the Perceptron algorithm and the Gradient Descent algorithm. In the Gradient Descent algorithm, every point updates the weights: we change each weight from $w_i$ to $w_i + \alpha(y - \hat{y})x_i$, where $\alpha$ is the learning rate. In the Perceptron algorithm, not every point changes the weights, only the misclassified ones: if a point is misclassified, we add $\alpha x_i$ to $w_i$ when its label is positive, and subtract $\alpha x_i$ when its label is negative.

Now the question is: are these two things the same? Well, let’s remember that in the Perceptron algorithm the labels are one and zero, and the predictions $\hat{y}$ are also one and zero. So if a point is correctly classified, then $y - \hat{y}$ is zero, because $y$ equals $\hat{y}$, and the weights don’t change. If a point is labeled blue, then $y = 1$, and if it’s misclassified, the prediction must be $\hat{y} = 0$, so $y - \hat{y} = 1$ and we add $\alpha x_i$. Similarly, if a point is labeled red, then $y = 0$ and $\hat{y} = 1$, so $y - \hat{y} = -1$ and we subtract $\alpha x_i$. This may not be super clear right away, but if you stare at the screen for long enough, you’ll realize that the right and the left are exactly the same thing. The only difference is that on the left, $\hat{y}$ can take any value between zero and one, whereas on the right, $\hat{y}$ can take only the values zero or one. It’s pretty fascinating, isn’t it?

But let’s study Gradient Descent even more carefully. Both in the Perceptron algorithm and in the Gradient Descent algorithm, a point that is misclassified tells the line to come closer, because eventually it wants the line to surpass it so it can be on the correct side. Now, what happens if the point is correctly classified? Well, the Perceptron algorithm says to do absolutely nothing. In the Gradient Descent algorithm, you are still changing the weights. But what is that change doing? If we look carefully, the point is telling the line to go farther away. And this makes sense, right? If you’re a correctly classified blue point in the blue region, you’d like to be even deeper into the blue region, so your prediction gets even closer to one and your error gets even smaller. Similarly for a red point in the red region. So it makes sense that the point tells the line to go farther away, and that’s precisely what the Gradient Descent algorithm does: the misclassified points ask the line to come closer, and the correctly classified points ask the line to go farther away. The line listens to all the points and takes steps in such a way that it eventually arrives at a pretty good solution.
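To make the comparison concrete, here is a minimal Python sketch of both update rules, assuming labels in {0, 1}, a step function for the Perceptron’s discrete prediction, and a sigmoid for Gradient Descent’s continuous one. The function names and the sample point are illustrative, not from the lecture.

```python
import numpy as np

def step(z):
    """Discrete prediction used by the Perceptron: 1 if z >= 0, else 0."""
    return 1.0 if z >= 0 else 0.0

def sigmoid(z):
    """Continuous prediction used by Gradient Descent: a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_update(w, b, x, y, alpha=0.1):
    """Only misclassified points change the weights: (y - y_hat) is +1 for a
    misclassified positive point, -1 for a misclassified negative one, 0 otherwise."""
    y_hat = step(np.dot(w, x) + b)
    w = w + alpha * (y - y_hat) * x
    b = b + alpha * (y - y_hat)
    return w, b

def gradient_descent_update(w, b, x, y, alpha=0.1):
    """Same formula, but y_hat is continuous, so every point moves the line:
    misclassified points pull it closer, correct ones push it farther away."""
    y_hat = sigmoid(np.dot(w, x) + b)
    w = w + alpha * (y - y_hat) * x   # (y - y_hat) lies strictly between -1 and 1
    b = b + alpha * (y - y_hat)
    return w, b

# Hypothetical example: a positive point (y = 1) on the wrong side of the line.
w, b = np.array([1.0, -1.0]), 0.0
x, y = np.array([-2.0, 1.0]), 1.0
print(perceptron_update(w, b, x, y))        # moves the weights by exactly +alpha * x
print(gradient_descent_update(w, b, x, y))  # same direction, scaled by (y - y_hat)
```

Running both updates on the same misclassified point shows they move the weights in the same direction; the only difference is the magnitude, since the Perceptron’s step is all-or-nothing while Gradient Descent scales the step by how wrong the prediction is.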
