So, in this video, I'll be showing you how to use transfer learning to train a network that can properly classify those images of cats and dogs. What we'll do here is use a pre-trained network to detect and extract features from the images. This approach works really well for solving many challenging problems in computer vision and other fields. Specifically, these networks have been trained on the ImageNet dataset, a giant dataset with over a million labeled images in a thousand different categories. These networks are built from convolutional layers, which exploit the patterns and structure we see in images and learn from them. They have dozens or even hundreds of convolutional layers, so they are extremely deep compared to what you've seen so far. This combination of a massive dataset with a massive neural network has come to be known as deep learning. What's really cool is that once these networks are trained on ImageNet, they work amazingly well as feature detectors on images that aren't in ImageNet. So, you can, for instance, apply these networks to images of cats and dogs and actually extract the features, then use those features as inputs to a new classifier which you can train to accurately classify your cat and dog photos.

You can download these pre-trained networks using torchvision.models, so we're going to include that in our imports; otherwise it's pretty much what you've seen before. Now we have our models available. Most of these models from torchvision, and also from other deep learning frameworks like Keras, require the images to be 224 by 224; that's the size of the images the networks were originally trained on. You'll also need to match the normalization that was used when the models were trained. The color channels (red, green, and blue) were normalized separately, and the means and standard deviations for each channel are listed here. So, I'll let you actually define the transforms: the transforms for the training data and the transforms for the test and validation data. You build your transforms like I showed you before, and you should note that you need to make the images 224 by 224, you need to normalize them with these means and standard deviations, and you'd probably want to use data augmentation on the training images.

As an example of transfer learning, I'm going to show you how to do this with a model called DenseNet. We can load the model like so: model = models.densenet121. There are a few DenseNet variants, and the number afterwards typically tells you how many layers are in the network. In general, more layers means better accuracy, but we'll just use 121 here since we don't need crazy accuracy right now. You also pass pretrained=True, which will download the model to your computer if you don't already have it.

Now we can look at the architecture. Here we see that this DenseNet architecture has a part called features, which is built up of all these convolutional layers. If you scroll down quite a ways, because this is a very deep network, at the end you see a part called classifier, which is just one linear transformation that takes the features from the features part and maps them to 1,000 output units. For transfer learning, the features part, all these layers up here, works really well with other datasets. However, this classifier has a thousand outputs.
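To make that concrete, here's a rough sketch of the imports, transforms, and data loading described above. The folder paths, the particular augmentations, and the batch size are placeholder assumptions on my part; the means and standard deviations are the standard ImageNet values.

```python
import torch
from torch import nn, optim
from torchvision import datasets, transforms, models

# Training transforms: data augmentation, crop to 224x224, then normalize
# with the ImageNet channel means and standard deviations.
train_transforms = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

# Test/validation transforms: no augmentation, just resize, center-crop to
# 224x224, and the same normalization.
test_transforms = transforms.Compose([
    transforms.Resize(255),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

# Hypothetical folder layout for the cat/dog images.
train_data = datasets.ImageFolder('Cat_Dog_data/train', transform=train_transforms)
test_data = datasets.ImageFolder('Cat_Dog_data/test', transform=test_transforms)

trainloader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(test_data, batch_size=64)
```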
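Loading the pre-trained DenseNet-121 and printing it to inspect the features section and the classifier looks roughly like this:

```python
# Download the pre-trained DenseNet-121 (it's cached locally after the first
# download), then print it to inspect the architecture.
model = models.densenet121(pretrained=True)
print(model)
# The printout shows a long `features` section of convolutional layers and,
# at the very end, the `classifier`: a single Linear layer mapping 1,024
# features to 1,000 ImageNet classes.
```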
That classifier has been trained specifically for the ImageNet dataset, so this is the part we're going to replace with a new classifier built for our dataset. Then we will train this new classifier on our dataset and just use the features coming out of the features section as its input.

The first thing we want to do is freeze the parameters in the features section of DenseNet. We don't want to train or otherwise calculate gradients for that part; we just want to keep it static and use it as a feature detector. You can basically do for param in model.parameters(), which goes through all the parameters in the model, and set param.requires_grad = False. This shuts off gradient calculation for those parameters, so they won't be updated when we do the training steps.

Now we're just going to define our classifier. This is similar to what we built before, where you build the classifier as a sequential model. I'm just using an OrderedDict here to get our first linear transformation, into a ReLU, into another linear transformation for our output. We have two outputs here, one for cat and one for dog. The input features for this classifier come from the features part. We can see that the original classifier has 1,024 input features, which means the output of the features section is 1,024 values. We want to use that as the input for our classifier, so we have 1,024 input values here as well. In general, you're going to need to match the size of the output from the features part with the input of your classifier. Now that we have our classifier defined, we just need to replace the DenseNet classifier with ours, and the network is ready to be trained.

The problem, though, is that now we're using a really deep neural network. Like I said before, this DenseNet has 121 different layers, so if you try to train it on a CPU like we did before, it's going to take a very long time. Instead, we're going to use the GPU to do the calculations. Using the GPU you can get speedups of 100 times, 500 times; it's pretty crazy. PyTorch, like pretty much all the other frameworks, uses a library called CUDA to run our networks on the GPU. If you're wondering more about CUDA, it's all here: CUDA is a software library that runs on NVIDIA GPUs and uses them to efficiently compute linear algebra operations. Remember that a lot of the work in deep learning networks is basically just linear algebra, matrix multiplications and that sort of thing, and these computations can be done really fast on a GPU using CUDA. The idea with PyTorch is that we move our model parameters and other tensors to the GPU with something like model.to('cuda'), and we can move them back from the GPU to the CPU with model.to('cpu').

I'm going to demonstrate how fast this actually is by comparing how long it takes to do a forward and backward pass with and without a GPU. Here is a fairly normal training pass that you've seen before: I define my criterion and define my optimizer. But remember that we don't want to train the features part, just the classifier part, so in the optimizer I'm passing in model.classifier.parameters().
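Putting the freezing and classifier-replacement steps into code, a sketch (continuing from the DenseNet loaded earlier) could look like the following; the 500-unit hidden layer and the LogSoftmax output are my own choices for the example, not something fixed by the method.

```python
from collections import OrderedDict
from torch import nn

# Freeze the feature parameters so we don't compute gradients for them or
# update them during training.
for param in model.parameters():
    param.requires_grad = False

# New classifier: 1,024 inputs (matching the output of the features section)
# and 2 outputs, one for cat and one for dog.
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(1024, 500)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(500, 2)),
    ('output', nn.LogSoftmax(dim=1)),
]))

# Replace DenseNet's 1,000-way classifier with ours. The new layers have
# requires_grad=True by default, so only this part gets trained.
model.classifier = classifier
```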
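And here's a sketch of the criterion, the optimizer, and the CPU-versus-GPU timing comparison just described, assuming the trainloader from the earlier sketch. The NLLLoss/Adam pairing, the learning rate, and timing only three batches are my assumptions, and the 'cuda' pass of the loop of course needs an NVIDIA GPU to run.

```python
import time
from torch import nn, optim

# Negative log likelihood loss pairs with the LogSoftmax output above.
criterion = nn.NLLLoss()

for device in ['cpu', 'cuda']:
    # Move the model (and therefore its parameters) to the device under test.
    model.to(device)

    # Only the classifier's parameters go to the optimizer, so the frozen
    # feature layers are never updated. Re-created per device so its state
    # lives on the right device.
    optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)

    start = time.time()
    for ii, (inputs, labels) in enumerate(trainloader):
        # Move the input and label tensors to the same device as the model.
        inputs, labels = inputs.to(device), labels.to(device)

        # One forward and backward pass plus a parameter update.
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()

        if ii == 2:  # time just three batches
            break

    print(f"Device = {device}; time per batch: {(time.time() - start) / 3:.3f} seconds")
```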
Here I'm looping over two devices, CPU and CUDA. To get our model onto whichever device we're using, you say model.to(device). As usual we're getting our images and labels from the training loader, and to use the GPU we need to move those to the device as well: inputs.to(device) and labels.to(device). When the device is cpu, it'll move them to the CPU; when the device is cuda, it'll move them to our GPU. That's basically all you need to do to change your code so your networks run on a GPU. Comparing running this code on a CPU versus a GPU, we get a huge time savings: one batch takes over five seconds on the CPU but only about nine milliseconds on the GPU. So, this is just a massive improvement.

We can also write device-agnostic code, which will automatically use CUDA if it's available on your machine. Basically, you set the device to torch.device("cuda") if CUDA is available and otherwise use the CPU, and that becomes the device variable. Then any time you have a tensor or a model, you just say tensor.to(device) or model.to(device). If it's already on the CPU, it'll stay on the CPU; if it's already on the GPU, it'll stay there; otherwise, it'll be moved to the appropriate device.

From here I'm going to let you finish training the model. Pretty much everything is exactly the same, except that you're only training the classifier, which you've defined like this, and you're using the GPU now. All right. Good luck.
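As a starting point for that exercise, the device-agnostic pattern mentioned above looks roughly like this:

```python
# Use CUDA if it's available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# From here on, send the model and every batch of tensors to that device.
model.to(device)
# ...and inside your training and validation loops:
# inputs, labels = inputs.to(device), labels.to(device)
```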