So, our goal is to train our neural network. In order to do this, we have to define the error function. So, let’s look again at what the error function was for perceptrons. Here’s our perceptron. On the left, we have our input vector with entries x_1 up to x_n, and a one for the bias unit. Then we have the edges with weights W_1 up to W_n, and b for the bias unit. Finally, we can see that this perceptron uses a sigmoid function, and the prediction is defined as y-hat equals sigmoid of Wx plus b. And as we saw, this function gives us a measure of the error, of how badly each point is being classified. Roughly speaking, it’s a very small number if the point is correctly classified, and a measure of how far the point is from the line if the point is incorrectly classified. So, what are we going to do to define the error function in a multilayer perceptron? Well, as we saw, our prediction is simply a combination of matrix multiplications and sigmoid functions. So the error function can be the exact same thing, right? It can be the exact same formula, except now, y-hat is just a bit more complicated. And still, this function will tell us how badly a point gets misclassified, except now it’s looking at a more complicated boundary.
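Here is a minimal NumPy sketch of the idea above: the same error formula is applied in both cases, and only the way y-hat is computed changes. It assumes the error function is the cross-entropy from the earlier discussion of perceptrons, and all the weight and input values are hypothetical numbers chosen just for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def error(y, y_hat):
    # Cross-entropy for one point: near zero when y_hat matches the
    # label y, and large when the point is confidently misclassified.
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

x = np.array([0.5, -1.0])          # one input point (hypothetical values)

# --- Single perceptron: y_hat = sigmoid(Wx + b) ---
W = np.array([0.8, 0.3])           # hypothetical weights
b = -0.1
y_hat_perceptron = sigmoid(np.dot(W, x) + b)

# --- Multilayer perceptron: matrix multiplications and sigmoids, ---
# --- so y_hat is more complicated, but the error formula is the same ---
W1 = np.array([[0.8, 0.3],
               [-0.5, 1.2]])       # hidden-layer weights (hypothetical)
b1 = np.array([0.1, -0.2])
W2 = np.array([0.4, -0.7])         # output-layer weights (hypothetical)
b2 = 0.05
hidden = sigmoid(W1 @ x + b1)
y_hat_mlp = sigmoid(np.dot(W2, hidden) + b2)

y = 1                              # true label for this point
print(error(y, y_hat_perceptron))  # error under the simple boundary
print(error(y, y_hat_mlp))         # same formula, more complex boundary
```

Both calls use the identical error formula; the multilayer version only differs in how the prediction y-hat was produced.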