Okay. So now we’ll do the same thing as we did before, tuning the weights in the neural network to better classify our points. But we’re going to do it formally, so fasten your seat belts, because math is coming.

On your left, you have a single perceptron with the input vector, the weights and the bias, and the sigmoid function inside the node. On the right, we have the formula for the prediction, which is the sigmoid function of the linear function of the input: ŷ = σ(Wx + b). And below, we have the formula for the error, which is the average over all points of the blue term for the blue points and the red term for the red points.

In order to descend from Mount Errorest, we calculate the gradient. The gradient is simply the vector formed by all the partial derivatives of the error function with respect to the weights w1 up to wn and the bias b. These partial derivatives correspond to the edges over here.

Now, what do we do in a multilayer perceptron? This time it’s a little more complicated, but it’s pretty much the same thing. We have our prediction, which is simply a composition of functions, namely matrix multiplications and sigmoids. The error function is pretty much the same, except the ŷ is a bit more complicated. And the gradient is pretty much the same thing; it’s just much, much longer. It’s a huge vector where each entry is the partial derivative of the error with respect to one of the weights, and these entries just correspond to all the edges.

If we want to write this more formally, we recall that the prediction is a composition of sigmoids and matrix multiplications, where these are the matrices, and the gradient is just going to be formed by all these partial derivatives. Here it looks like a matrix, but in reality it’s just a long vector.

And gradient descent is going to do the following: we take each weight w_ij^(k) and update it by subtracting a small number, the learning rate times the partial derivative of E with respect to that same weight.
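To make the single-perceptron case concrete, here is a minimal NumPy sketch of the prediction, the error, and the gradient described above. The function names (`predict`, `error`, `gradient`) and the data layout (rows of `X` are points) are illustrative choices, not from the video; the error is the average cross-entropy, matching the "blue term / red term" formula.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(W, b, x):
    # y_hat = sigmoid(Wx + b): the perceptron's prediction for one point x
    return sigmoid(np.dot(W, x) + b)

def error(W, b, X, y):
    # average cross-entropy over all m points:
    # E = -(1/m) * sum( y*ln(y_hat) + (1 - y)*ln(1 - y_hat) )
    y_hat = sigmoid(X @ W + b)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

def gradient(W, b, X, y):
    # the gradient is the vector of partial derivatives of E with respect
    # to w_1..w_n and b; for sigmoid + cross-entropy these simplify to
    # averages of (y_hat - y) times the inputs
    m = len(y)
    y_hat = sigmoid(X @ W + b)
    dW = (X.T @ (y_hat - y)) / m
    db = np.mean(y_hat - y)
    return dW, db
```

A quick way to trust such a sketch is to compare the analytic gradient against a finite-difference estimate of E, which should agree to several decimal places.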
This is the gradient descent step, and it gives us the new, updated weight w_ij^(k) prime. Taking that step for every weight gives us a whole new model, with new weights that will classify the points much better.
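The update rule above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the course's code: it assumes the network's weights are stored as one matrix per layer, with a matching list of gradient matrices, and the name `learning_rate` is my own.

```python
import numpy as np

def gradient_descent_step(weights, grads, learning_rate=0.01):
    # one update for every weight in the network:
    #   w_ij^(k)' = w_ij^(k) - learning_rate * dE/dw_ij^(k)
    # `weights` and `grads` are lists of NumPy matrices, one per layer
    return [W - learning_rate * dW for W, dW in zip(weights, grads)]
```

Applied repeatedly, each step moves every weight a little bit against its partial derivative, which is exactly the descent from the error mountain described earlier.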