Hello everyone, welcome back. In this video, we'll be using a pre-trained network to solve the challenging problem of building a classifier for our cat and dog images. These pre-trained networks were trained on ImageNet, a massive dataset of over one million labeled images from 1,000 different categories. They're available from torchvision, in the torchvision.models module. We have six different architectures to choose from, and there's a nice breakdown of the performance of each of these models: for each one, such as AlexNet, we get the top-1 error and the top-5 error. The numbers attached to some of these networks, 19, 11, 34, and so on, usually indicate the number of layers in the model, so the larger the number, the larger the model. A larger model generally gives you better accuracy and lower errors, but it also takes longer to compute predictions and to train. So when you're choosing one of these, you need to think about the tradeoff between accuracy and speed.

All of these networks are built from convolutional layers, which exploit patterns and regularities in images. I'm not going to get into the details here, but if you want to learn more about them, you can watch this video. These networks are typically very deep, meaning they have dozens or even hundreds of layers, and they were trained on the massive ImageNet dataset. It turns out they work astonishingly well as feature detectors for images they weren't trained on. Using a pre-trained network like this on a dataset it hasn't seen before is called transfer learning: what was learned from the ImageNet dataset is transferred to your dataset. Here, we're going to use transfer learning to train our own network to classify our cat and dog photos, and you'll see that we get really good performance with very little work on our side.

Again, you can download these models from torchvision.models, so we include models in our imports right here. Most of these pre-trained models require a 224 by 224 image as input. You'll also need to match the normalization used when the models were trained on ImageNet: each color channel of the images was normalized separately, and you can see the means and the standard deviations listed here. I'm going to leave it up to you to define the transformations for the training data and the testing data. Once you're done, come back and we'll keep going.

Now, let's see how to actually load in one of these models. Here, I'm going to use the DenseNet-121 model. It has very high accuracy on the ImageNet dataset, and the 121 tells us that it has 121 layers. To load it in our code, we just say model = models.densenet121(pretrained=True). This downloads the pre-trained network, the weights and parameters themselves, and loads it into our model. Then we can look at the architecture of this model. This is what the DenseNet architecture looks like: you'll notice we have this features part here, and then a bunch of these layers.
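As a rough reference, here's how that setup might look in code. The specific augmentations in the training transforms (rotation, random crop, horizontal flip) are my own choices, since the transforms are left as an exercise, but the 224 by 224 input size and the normalization values are the standard ImageNet ones:

```python
from torchvision import transforms, models

# Standard ImageNet normalization: per-channel means and standard deviations
# used when these models were originally trained.
normalize = transforms.Normalize([0.485, 0.456, 0.406],
                                 [0.229, 0.224, 0.225])

# Training transforms with some augmentation (these particular augmentations
# are just one reasonable choice); test transforms only resize and crop.
train_transforms = transforms.Compose([transforms.RandomRotation(30),
                                       transforms.RandomResizedCrop(224),
                                       transforms.RandomHorizontalFlip(),
                                       transforms.ToTensor(),
                                       normalize])

test_transforms = transforms.Compose([transforms.Resize(255),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      normalize])

# Download the ImageNet-pretrained weights for DenseNet-121 and load the model.
model = models.densenet121(pretrained=True)
print(model)  # shows a `features` section followed by a `classifier` section
```

You would typically pair these transforms with something like datasets.ImageFolder and a DataLoader to feed in the cat and dog images.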
Each of these is a convolutional layer, which, again, I'm not going to talk about here; you don't really need to understand it to be able to use this network. There are two main parts we're interested in: the features part, and then, if we scroll all the way to the bottom, the classifier part. The classifier has been defined as a linear layer, a fully connected dense layer, with 1,024 input features and 1,000 output features. Again, the ImageNet dataset has 1,000 different classes, so the number of outputs of this network is 1,000, one for each of those classes. The thing to know is that this whole network was trained on ImageNet. The features will work for other datasets, but the classifier itself has been trained specifically for ImageNet. So the classifier is the part we need to retrain. We want to keep the features part static: we don't update it, we only update the classifier.

The first thing we need to do, then, is freeze our feature parameters. To do that, we loop through the parameters in our model and set requires_grad to False. This means that when we run our tensors through the model, it's not going to calculate gradients for those parameters or keep track of all those operations. So first, this ensures that our feature parameters don't get updated, but it also speeds up training, because we're not tracking those operations for the features.

Next, we need to replace the classifier with our own. Here, I'm going to use a couple of new things. I'm going to use the Sequential module available from PyTorch: you give it a list of operations you want to perform, and it automatically passes a tensor through them sequentially. You can also pass in an OrderedDict to name each of these layers. So I'll show you how this works. We want a fully connected layer, which I'll name fc1, going from 1,024 inputs to 500 units for this hidden layer. Then we pass that through a ReLU activation, and then through another fully connected layer, which will be our output layer: 500 to 2, since we have cat and dog, so we want two outputs here. Finally, our output is going to be LogSoftmax, like before. And that's how we define the classifier. Now we can take this classifier, built just from fully connected layers, and attach it to model.classifier. So the new, untrained classifier we built is attached to our model, and the model also has the features part. The features part remains frozen: we're not going to update those weights, but we do need to train our new classifier.

Now, if we want to train this network, DenseNet-121 is really deep: it has 121 layers. If we try to train it on the CPU like normal, it's going to take pretty much forever. So instead, we can use the GPU. GPUs are built specifically for doing a lot of linear algebra computations in parallel, and our neural networks are basically just a lot of linear algebra computations. If we run them on the GPU, they're done in parallel and we get something like a 100x speedup. In PyTorch, it's pretty straightforward to use the GPU.
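Before getting into the GPU details, here's a minimal sketch of the two steps just described, freezing the feature parameters and attaching the new 1,024 to 500 to 2 classifier (the layer names in the OrderedDict are just the ones mentioned above):

```python
from collections import OrderedDict
from torch import nn

# Freeze the feature parameters so autograd doesn't track them and they
# won't be updated during training.
for param in model.parameters():
    param.requires_grad = False

# Replace the ImageNet classifier (1024 -> 1000) with our own: two fully
# connected layers going 1024 -> 500 -> 2, with LogSoftmax on the output.
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 500)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(500, 2)),
    ('output', nn.LogSoftmax(dim=1))
]))

model.classifier = classifier
```

Since the new classifier's parameters are created with requires_grad set to True by default, they are the only part of the model that will actually train.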
The idea is that your model has all these parameters, tensors that are sitting in memory on your computer, but you can move them over to the GPU by calling model.cuda(). What this does is move the parameters of your model to the GPU, and then all of the computation and processing is done on the GPU. Similarly, if you have a tensor, like your images, and you want to run it through your model, you have to make sure the tensors you put through the model are on the GPU if your model is on the GPU; you just have to make those match up. To move a tensor from the CPU to the GPU, you again just call images.cuda(), and that moves the images tensor to the GPU. Oftentimes you'll also want to move your model and your tensors back from the GPU to your local memory and CPU. To do that, you call model.cpu() or images.cpu(), and that brings your tensors back from the GPU to run on your CPU.

Now, I'm going to give you a demonstration of how this all works and of the amazing speedup we get by using the GPU. Here, I'm going to loop for cuda in [False, True]. That way, I can try it once where we're not using the GPU and once where we are. Let's define our criterion, which is going to be the negative log likelihood loss like we'd normally use, and define our optimizer. Remember that we only want to update the parameters of the classifier, so we just pass in model.classifier.parameters(). This works because it updates the parameters of our classifier but leaves the parameters of the feature detector part of the model static. Then, what I typically do is say: if cuda, move the model to the GPU; otherwise, leave it on the CPU.

Then I'm going to write a little training loop. We get our inputs and our labels, wrap them in Variables like normal, and then, if cuda is enabled, so if we have a GPU, we move the inputs and labels over to the GPU as well. We're using the GPU now, and we're also using this pre-trained network, but in general you do the training loop exactly the same way you've been doing it with the feedforward networks you've been building. First, I define a start time just so I can time things, then you do your training pass like normal: a forward pass through the model, calculate the loss, do your backward pass, and finally update your weights with the optimizer. I'm going to break out of this training loop after the first few iterations, because I just want to time the difference between using a GPU and not using one. The very first batch to go through the training loop tends to take longer than the other batches, so I take the first three or four and average over those, to get a better sense of how long it actually takes to process one batch. Then we just print out our training times. We can see that without the GPU, each batch takes about five and a half seconds to go through this training step, whereas with the GPU it only takes about 0.012 seconds. That's a speedup of well over 100 times.
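Here's roughly what that timing experiment could look like. It assumes a DataLoader named trainloader already exists; the choice of the Adam optimizer and the learning rate are also assumptions, and I'm skipping the Variable wrapping since recent PyTorch versions work with tensors directly:

```python
import time
from torch import nn, optim

for cuda in [False, True]:
    # Negative log likelihood loss pairs with the LogSoftmax output.
    criterion = nn.NLLLoss()
    # Only the classifier's parameters go to the optimizer, so the frozen
    # feature weights stay static. Optimizer choice and lr are assumptions.
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)

    # Move the model's parameters to the GPU, or keep them on the CPU.
    if cuda:
        model.cuda()
    else:
        model.cpu()

    start = time.time()
    for ii, (inputs, labels) in enumerate(trainloader):  # assumes a DataLoader named trainloader
        if cuda:
            # Input tensors must live on the same device as the model.
            inputs, labels = inputs.cuda(), labels.cuda()

        # Standard training pass: forward, loss, backward, optimizer step.
        optimizer.zero_grad()
        output = model(inputs)
        loss = criterion(output, labels)
        loss.backward()
        optimizer.step()

        if ii == 3:  # only time the first few batches
            break

    print(f"CUDA = {cuda}; average time per batch: {(time.time() - start) / (ii + 1):.3f} seconds")
```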
Here I set cuda manually, but you can also check whether a GPU is available by calling torch.cuda.is_available(), which returns True or False depending on whether you have a CUDA-capable GPU (see the sketch below). Okay, from here I'm going to let you finish training this model. You can either continue with the DenseNet model that's already loaded, or you can try ResNet, which is also a good model to try out. I also really like VGGNet; I think that one's pretty good. It's really up to you. Cheers.
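And for reference, that availability check might look something like this (model here is assumed to be the DenseNet loaded earlier):

```python
import torch

# Check for a CUDA-capable GPU and fall back to the CPU otherwise.
cuda = torch.cuda.is_available()

if cuda:
    model.cuda()
else:
    model.cpu()

print(f"Training on {'GPU' if cuda else 'CPU'}")
```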