Hello. In this notebook, I’ll be showing you how to train a neural network with PyTorch. Remember that in the previous part we built a neural network, but it wasn’t able to actually tell us what the digit was in these images. What we want is to be able to pass an image into some function and have that function return a probability distribution that tells us, for example, that this is an image of a four. So, what we’re going to do here is train a neural network to approximate this function. The way you do that is you give it an image, you tell it the right answer, and the neural network learns how to approximate the function. That is what we’re going to be looking at in this notebook. As a quick reminder of how training actually works: we pass in an image along with its correct category, its label, and we measure the difference between our network’s prediction and the true label of that image. That difference is called the loss. There are different ways to measure it, for example, the mean squared loss or the cross-entropy, which we’ve seen, and you use this loss to update the weights. We do the updating through an iterative process called gradient descent. The idea is that you have the gradient of your loss function, which is basically the slope of the loss function, and it always points in the direction of fastest change. We want to minimize our loss because that means our predictions are as close as possible to the true labels. With gradient descent, since the gradient points in the direction of fastest increase, stepping in the opposite direction gets us toward the bottom of our loss function the fastest way possible. With multilayer neural networks, we do this through backpropagation. Backpropagation is really just an application of the chain rule.
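As a rough illustration of the two ideas above, here is a minimal sketch of gradient descent on a single weight, with the gradient worked out by hand using the chain rule. The data point, starting weight, and learning rate are illustrative assumptions, not values from the notebook.

```python
# Toy problem: find the weight w so that w * x matches y.
# All numbers here are illustrative assumptions.
x, y = 2.0, 6.0          # one training example; the ideal weight is 3.0
w = 0.0                  # starting weight
learning_rate = 0.1

losses = []
for step in range(20):
    prediction = w * x                   # forward pass
    loss = (prediction - y) ** 2         # squared-error loss
    losses.append(loss)

    # Chain rule: dL/dw = dL/dprediction * dprediction/dw
    dloss_dprediction = 2 * (prediction - y)
    dprediction_dw = x
    grad_w = dloss_dprediction * dprediction_dw

    # Step against the gradient, because the gradient points uphill
    w = w - learning_rate * grad_w

print(f"learned w = {w:.4f}, final loss = {losses[-1]:.6f}")
```

In a real network the same multiplication of per-step gradients is carried across every layer, which is the bookkeeping that backpropagation automates.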
What this means is that a change in W1 propagates all the way through the network from left to right, all the way to the loss, which is script L. Similarly, changes propagate backwards through the network to W1, our weights. To update our weights, we need to know the gradient of the loss with respect to those weights. To get that, we can just multiply the gradients of each step in this series of operations. The first thing we need to do in PyTorch is define our loss function. You’ll usually see the loss assigned to a variable called criterion. If we’re using the softmax output, we want our loss, our criterion, to be the cross-entropy loss, which we can get with nn.CrossEntropyLoss. (This criterion applies the softmax itself, so it takes the raw scores from the network rather than probabilities.) Later, we put in the output from our network and our true labels, our targets, to get the actual loss. We’ll also need an optimizer. Once we have a loss, we can calculate our gradients, and once we have those gradients, the optimizer uses them to actually update all the weights and parameters in our network. For this, I’m going to be using stochastic gradient descent, or SGD. There are also more advanced optimizers like Adam, but we won’t be worrying about those right now. To calculate gradients, PyTorch uses a module called autograd. What autograd does is that if you have a tensor and you tell it that it requires a gradient, it tracks every single operation that happens on that tensor. Then, at the end of your operations, you can say, for example, z.backward(), and it does a backward pass through all the operations. It kept track of all of them, and it knows the gradient function for each one, so it can go backwards and eventually calculate all the gradients that you want. I’ll show you how this works. First, I’m just going to import stuff like normal. Here, I’m just going to create a random tensor.
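The autograd demonstration narrated here can be sketched roughly as follows. The variable names x, y, and z follow the narration; the exact cells in the notebook may differ.

```python
import torch

# A 2x2 tensor sampled from a standard normal distribution.
# requires_grad=True asks autograd to track operations on x.
x = torch.randn(2, 2, requires_grad=True)
print(x)

y = x ** 2            # elementwise square
print(y)
print(y.grad_fn)      # the backward function recorded for the power operation

z = y.mean()          # reduce to a single scalar
print(z)
print(x.grad)         # None, because no backward pass has run yet

z.backward()          # compute dz/dx and store it in x.grad
print(x.grad)
print(x / 2)          # z is the mean of 4 squared values, so dz/dx = 2x/4 = x/2
```

The last two printed tensors should match, since the analytic gradient of the mean of four squared values is x/2.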
This is just going to be a two-by-two tensor, randomly sampled from a normal distribution. We’re going to say requires_grad=True. This tells PyTorch to track all the operations that happen on this tensor using autograd, so eventually we’ll be able to calculate the gradient for it. That’s what our tensor looks like. We can say y equals x squared and print y, so it has squared our values. If we print y.grad_fn, this shows the operation that was done to get y. We squared x to create y, so if you look at the gradient function for y, it’s a power: we took x to the power of two. When we eventually go backwards through these operations, autograd is going to use this gradient function to calculate the gradient. Now if we take the mean of y, call it z, and print that, it’s just a single number. We can check the gradients of x and y, but there’s nothing there. That’s because we haven’t actually done a backward pass through these operations yet, so we haven’t asked PyTorch, or autograd, to calculate the gradients. To actually calculate our gradients, we need to go backward through these operations. For instance, if we want the gradient of z with respect to x, we say z.backward(), and this calculates the gradient of z with respect to x. Then we can print out x.grad. Since z is the mean of the four squared values of x, we should expect the gradient of z with respect to x to be x over two, so I’ll print that out too. We get our gradient, and it is the same as x over two. So we calculated the gradient, and it is what we expect. Now I’m just going to grab our data and build the network like we did in the last part. Now we have our model, and I’m going to actually train it. Like I said before, we need to define our loss, or our criterion. So, criterion is equal to nn.CrossEntropyLoss(), and then we can define our optimizer: optimizer equals optim.SGD.
For the optimizer, we need to pass in the parameters that we actually want to optimize. In this case, we want to optimize all the parameters of our model, so we say model.parameters(), and then we need to set the learning rate, which here is going to be 0.01. The general process with PyTorch is this. First, we make a forward pass through the network to get our output: the logits, or a softmax, whatever the output of your network is. Then we use that output to calculate the loss. Once you have the loss, you can do a backward pass through the network with loss.backward(), and this calculates the gradients for all the parameters in your network. Then, with the gradients, you can take a step with the optimizer to update the weights. The first thing I’m going to do is print out the weights before our little training pass, model.fc1.weight. Then we get our images and labels like normal, but the first thing you actually have to do in the training pass is call optimizer.zero_grad(). What this does is zero out all the gradients that are on the tensors, the weights and biases, that are being trained. The reason you need to do this is that every time you do a backward pass, like loss.backward(), it accumulates the gradients. That means that if you go through twice, it adds the two gradient calculations together; if you do it a third time, it adds all three; if you do it a fourth time, it adds all four. In general, you don’t want that. You just want to do one pass, get your gradients, use those gradients to train, and then on the next pass calculate new gradients and use those to train. If you don’t call zero_grad, you are summing up, accumulating, gradients over multiple training passes and multiple batches. So the first thing in your training pass is typically to make sure that you zero out the gradients.
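A minimal sketch of the steps just described, including what happens when gradients are not zeroed. The Network class, its layer sizes, and the random batch are hypothetical stand-ins for the model and train loader from the notebook; only the attribute name fc1, the criterion, the optimizer, and the learning rate of 0.01 come from the narration.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)


class Network(nn.Module):
    """Hypothetical stand-in for the network built in the previous part."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)   # layer sizes are assumptions
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)               # raw scores (logits), one per digit


model = Network()

# Random stand-in batch; the notebook takes these from its train loader.
images = torch.randn(64, 784)
labels = torch.randint(0, 10, (64,))

criterion = nn.CrossEntropyLoss()        # takes logits and integer class labels
learning_rate = 0.01
optimizer = optim.SGD(model.parameters(), lr=learning_rate)

weights_before = model.fc1.weight.detach().clone()

# Without zero_grad, a second backward pass adds to the stored gradients.
criterion(model(images), labels).backward()
grad_once = model.fc1.weight.grad.clone()
criterion(model(images), labels).backward()
grad_twice = model.fc1.weight.grad.clone()   # sum of two identical passes

# One training pass
optimizer.zero_grad()                    # clear the accumulated gradients
output = model(images)                   # forward pass
loss = criterion(output, labels)         # how far off the predictions are
loss.backward()                          # backward pass fills .grad on each parameter
grad = model.fc1.weight.grad.clone()
optimizer.step()                         # update the weights

weights_after = model.fc1.weight.detach().clone()
print("loss:", loss.item())
print("largest weight change:", (weights_after - weights_before).abs().max().item())
```

With plain SGD and no momentum, the updated weights equal the old weights minus the learning rate times the gradient, which can be checked by comparing weights_after against weights_before - learning_rate * grad.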
Now we can do our forward pass. The output is going to be model.forward, passing in our images. We calculate the loss with our criterion, where we pass in our output and the true labels, and then we do our backward pass. This calculates the gradients for all of our parameters, and here I can actually print out the gradients. Now, with the gradients, we take an optimizer step and update our weights. That’s what the weights looked like before, here are our gradients, and now we can look at our updated weights. Basically, it took our gradients, scaled them by the learning rate, and subtracted them from our existing parameters, our weights, and now we have updated weights. This whole cell is really just a basic training pass. We get some images and labels from our train loader, which loads our data, pass them forward through our model, calculate the loss, do a backward pass through our network, and then take an optimizer step. You can wrap this whole thing in a for loop that goes through all of your images and labels, everything in your dataset, and updates your network, and that’s basically how your network learns. So, now we can take this training pass and put it in a loop. What you typically do is define the number of epochs you want to train for. One epoch is one pass through the entire dataset, so three epochs is three passes through the entire dataset. I’m just defining some things which I’ll use. So, for e in range(epochs). Then I’m defining a running loss. We’re going to be printing out our loss as we’re training, so we can see that the training loss, the loss of our network, is actually dropping. As our network learns and its predictions become better, the loss is going to decrease, and we just want to see that happening as we’re training. So, we get our images and labels from the train loader.
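The loop being narrated here might look roughly like the sketch below. To keep it self-contained it trains on synthetic data rather than the digit images, and the model, the names trainloader and print_every, the batch size, and the flattening with view are all assumptions for illustration, not the notebook's exact code.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-in for the digit dataset: 512 random "images" of shape 1x28x28.
# Labels follow a fixed random linear rule so there is a pattern to learn.
fake_images = torch.randn(512, 1, 28, 28)
fake_labels = (fake_images.view(512, -1) @ torch.randn(784, 10)).argmax(dim=1)
trainloader = DataLoader(TensorDataset(fake_images, fake_labels),
                         batch_size=64, shuffle=True)

# Hypothetical stand-in for the notebook's model.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

epochs = 3            # one epoch is one pass through the whole dataset
print_every = 4       # how often (in steps) to report the running loss
steps = 0
epoch_losses = []

for e in range(epochs):
    running_loss = 0.0
    epoch_loss = 0.0
    for images, labels in trainloader:
        steps += 1
        images = images.view(images.shape[0], -1)   # flatten to (batch, 784)

        optimizer.zero_grad()                # clear old gradients
        output = model(images)               # forward pass
        loss = criterion(output, labels)
        loss.backward()                      # backward pass
        optimizer.step()                     # weight update

        # loss is a scalar tensor; .item() extracts the Python number
        running_loss += loss.item()
        epoch_loss += loss.item()

        if steps % print_every == 0:
            print(f"Epoch {e + 1}/{epochs}  loss: {running_loss / print_every:.4f}")
            running_loss = 0.0

    epoch_losses.append(epoch_loss / len(trainloader))
```

Swapping the synthetic loader for the real train loader and the stand-in model for the notebook's network should leave the body of the loop unchanged.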
We’re going to keep track of our steps, that is, how many times we’ve actually trained. Again, we resize the images. Okay, now our training pass. So, optimizer.zero_grad(). Now our forward pass: output equals model.forward, loss equals criterion. Then our backward pass, and then our weight update step. Here, our loss is just a single number, which means it’s a scalar tensor. To actually get this number out of the scalar tensor and sum it into the running loss, which starts at zero, we need to call loss.item(). Every so often we’re going to print out our training loss. Okay, and we can see, if I just scroll down, that our training loss is dropping as it’s training, which is a good sign. Now with the network trained, we can see how it performs. Here, we are passing an image of the number six to our network, and our network has learned that this is a six. We can look at different numbers: it knows that a two is a two, and it knows that a three is a three, so it’s really cool. Using backpropagation, we were able to train our neural network to identify these images of handwritten digits. This is useful for things like scanning documents, right? You can imagine scanning over a document, taking a whole bunch of images of it, passing those images into a network, and having the network tell your computer what is in the document just from an image of it. In the next part, you’ll be able to, again, build your own network, and train it this time. Cheers.