16 – Closed Form Solution

So here’s an interesting observation: in order to minimize the mean squared error, we do not actually need to use gradient descent or the tricks. We can actually do this in closed mathematical form. Let me show you.

Here’s our data, x_1, y_1 all the way to x_m, y_m, and in this case m is five. The areas of the squares represent our squared error. So our inputs are x_1 up to x_m, our labels are y_1 up to y_m, and our predictions are of the form y_i hat equals w_1 x_i plus w_2, where w_1 is the slope of the line and w_2 is the y-intercept. The mean squared error is given by this formula over here. Notice that I’ve written the error as a function of w_1 and w_2, since given any w_1 and w_2 we can calculate the predictions and then the error.

Now, as we know from calculus, in order to minimize this error we need to take the derivatives with respect to the two variables w_1 and w_2 and set them both equal to zero. We calculate the derivatives (you can see the full calculation in the instructor notes) and we get these two formulas. Now we just need to find the w_1 and w_2 that make both of these equations equal to zero. So what do we have now? We have a system of two equations and two unknowns, which we can easily solve using linear algebra.

So now the question is, why don’t we do this all the time? Why do we have to go through many gradient descent steps instead of just solving a system of equations and unknowns? Well, think about this. If you didn’t have only two dimensions in the input but you had n, then you would have n equations with n unknowns. Solving a system of n equations with n unknowns is very expensive when n is big, because at some point in the solution we have to invert an n by n matrix. Inverting a huge matrix takes a lot of time and a lot of computing power, so for large n this is simply not feasible. This is why we use gradient descent instead.
Gradient descent will not necessarily give us the exact answer, but it will get us pretty close to the best answer, which gives us a solution that fits our data pretty well. But if we had infinite computing power, we would just solve this system and solve linear regression in one step.
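
The transcript points at formulas that appear only on the slides, so here is one way to write the two-weight case out. This is a sketch under an assumption: the error uses a 1/(2m) scaling, which is a common convention but is not confirmed by the transcript. Any positive constant in front gives the same minimizing w_1 and w_2.

```latex
\begin{align*}
\hat{y}_i &= w_1 x_i + w_2 \\
E(w_1, w_2) &= \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}_i - y_i \right)^2 \\[4pt]
\frac{\partial E}{\partial w_1} &= \frac{1}{m} \sum_{i=1}^{m} \left( w_1 x_i + w_2 - y_i \right) x_i = 0 \\
\frac{\partial E}{\partial w_2} &= \frac{1}{m} \sum_{i=1}^{m} \left( w_1 x_i + w_2 - y_i \right) = 0 \\[4pt]
w_1 &= \frac{m \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{m \sum_i x_i^2 - \left( \sum_i x_i \right)^2} \\
w_2 &= \frac{1}{m} \left( \sum_i y_i - w_1 \sum_i x_i \right)
\end{align*}
```

The last two lines come from solving the two derivative equations as a linear system in w_1 and w_2. They are valid as long as the x_i are not all identical, since otherwise the denominator is zero.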
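
To make the "solve the system in one step" idea concrete, here is a minimal Python sketch using NumPy. The function names `closed_form_line` and `normal_equation_fit` and the five data points are made up for illustration; they are not from the lecture. The first function handles the two-weight case directly, and the second handles n input dimensions by solving the corresponding linear system, which is the step that becomes expensive as n grows.

```python
import numpy as np


def closed_form_line(x, y):
    """Return (w1, w2) minimizing the mean squared error of y_hat = w1 * x + w2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    m = x.size
    sum_x, sum_y = x.sum(), y.sum()
    sum_xx, sum_xy = (x * x).sum(), (x * y).sum()
    # Denominator of the slope formula; zero when every x is the same.
    denom = m * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    w1 = (m * sum_xy - sum_x * sum_y) / denom
    w2 = (sum_y - w1 * sum_x) / m
    return w1, w2


def normal_equation_fit(X, y):
    """General case with n features: returns the n weights followed by the intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Append a column of ones so the last weight acts as the intercept.
    A = np.column_stack([X, np.ones(X.shape[0])])
    # Solve (A^T A) w = A^T y. The cost of this solve grows quickly with n.
    return np.linalg.solve(A.T @ A, A.T @ y)


# Hypothetical data with m = 5 points, matching the size used in the lecture.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

w1, w2 = closed_form_line(x, y)
w_general = normal_equation_fit(x.reshape(-1, 1), y)
print("slope and intercept:", w1, w2)
print("same fit via the general solver:", w_general)
```

Both functions should agree on one-dimensional data. The general version calls `np.linalg.solve` rather than forming an explicit inverse, which is the usual practice, but the work still grows roughly with the cube of n, which is the cost the lecture is warning about.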
