Hello, welcome back. In this video and notebook, I’m going to be talking about inference and validation. Inference means that we have a trained network and we’re using it to make predictions. Neural networks have this issue where they tend to perform too well on the training data and aren’t able to generalize to other data. For instance, if you trained on a bunch of images and then gave the network an image it hasn’t seen before, it could do a really poor job of predicting what’s in that image. The network learns the training data too closely and is bad at generalizing; this is called overfitting.
To test for overfitting while training, we perform inference with our network on a test set, or validation set. This is a set of data that looks like the training data but isn’t in the training set, so the network hasn’t seen it before. That way, when we test on this validation set, we can see how well the network generalizes to data it hasn’t seen.
To get started, I’m going to import our packages, PyTorch and torchvision, like normal. Here again, I’m loading the Fashion-MNIST dataset. You’ll notice that I’m also downloading the test data, which is the data we’re going to be using for validation. So we have our training data, which we get from the train set and train loader, and our test (validation) data from the test loader.
Here, I built a network that’s a bit more advanced than what I’ve shown you so far, because I wanted to have an arbitrary number of hidden layers. Basically, I wanted to be able to pass in a list like this, and then it will automatically build three hidden layers with these numbers of units in each of those layers.
To do this I used nn.ModuleList. The idea is that it works like a normal list, where you can add things to it, extend it and so on, but PyTorch is able to track the modules that you add to this module list. If you just used a normal Python list, PyTorch wouldn’t be able to track them, and you wouldn’t see those modules and operations ending up in your model. I’m not going to get too much into how this works, because I wrote it out here, so feel free to read all of this at your leisure.
The major thing you should note in this new network is that I’ve added dropout. Remember from before that dropout randomly deletes some of the connections between the layers as the network is training. This forces units in the network to learn a variety of features from the input data, which helps the network generalize. To use dropout, you create this operation, this module, like you normally would, and give it a probability of dropping units. Then we add dropout in the forward pass like we would with other modules and operations: just call self.dropout and pass in a tensor.
The second thing to notice is that I’m returning the log softmax from the forward pass. I’m doing this because, remember, softmax returns a probability distribution. The problem is that a lot of the time you’re going to get values that are really close to zero or really close to one. Because floating-point numbers are represented imprecisely, this can lead to instabilities in calculations and can build up a lot of inaccuracies overall. The way to solve this is to take the log of the softmax, which moves those values away from zero and one into ordinary negative numbers, like negative four.
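As a rough sketch of the kind of network described above (not the notebook's exact code), the class below builds its hidden layers from a list of sizes with nn.ModuleList, applies dropout after each hidden layer, and returns the log softmax. The class name `Network`, the argument names, and the choice of ReLU activations are assumptions made for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F


class Network(nn.Module):
    """Fully connected classifier with an arbitrary number of hidden layers.

    Names and the ReLU activation are illustrative assumptions.
    """

    def __init__(self, input_size, output_size, hidden_layers, drop_p=0.5):
        super().__init__()
        sizes = [input_size] + list(hidden_layers)
        # nn.ModuleList registers every layer, so PyTorch tracks its parameters.
        self.hidden_layers = nn.ModuleList(
            [nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
        )
        self.output = nn.Linear(sizes[-1], output_size)
        self.dropout = nn.Dropout(p=drop_p)

    def forward(self, x):
        for linear in self.hidden_layers:
            x = F.relu(linear(x))
            x = self.dropout(x)  # only active while the model is in training mode
        # Return log-probabilities rather than probabilities.
        return F.log_softmax(self.output(x), dim=1)


model = Network(784, 10, [516, 256], drop_p=0.5)
log_ps = model(torch.randn(64, 784))  # one batch of 64 flattened 28x28 images
print(log_ps.shape)
```

Because the layers live in an nn.ModuleList, `model.parameters()` includes the weights and biases of every hidden layer; with a plain Python list they would be missing from the model and never updated by the optimizer.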
So this helps keep the computation stable, helps with precision, and in general it’s just a lot easier to work in the log space of probability functions.
Here, I’m creating the model like normal: 784 input units, 10 outputs, and two hidden layers, one with 516 units and one with 256 units. I’m setting my dropout probability to 0.5. Since I’m using the log softmax as my output, I need to use the negative log-likelihood loss, so that’s my criterion here. It works basically the same as the cross-entropy loss you saw before, except that it expects the log softmax as its input. I’m also using the Adam optimizer. Like I mentioned before, this is a variation on stochastic gradient descent that uses momentum, and it ends up training your networks faster than plain stochastic gradient descent.
Now I’m going to show you how to write the validation code. The idea with the validation pass is that we want to take data from the test dataset, run it through our network, and measure the loss on the test data, but also the accuracy. So, how well is our network actually doing on data it hasn’t seen before? Like normal, I’m going to start off by grabbing some images and labels. This time we’re getting them from the test loader rather than the train loader, so these are our test images. We reshape our images and then pass them through the network. Again, these images are coming from our test set, so we get the output for our test set, and we can record our test loss with the criterion. We’re not interested in doing backpropagation with this loss; we really just want to measure the loss on the test dataset.
That is measuring the loss, but now I also want to measure the accuracy, that is, how well our network is predicting the correct label for the test dataset. So here, like normal, I’m going to calculate our probabilities. So, torch.exp.
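Here is a small sketch of that single validation step. To stay self-contained it uses random stand-in "images" and labels and a single linear layer in place of the real test batch and network, which are assumptions for illustration only. It also checks the point made above that the negative log-likelihood loss on a log softmax gives the same number as cross-entropy on the raw scores.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins (assumptions): random data instead of a real Fashion-MNIST test batch.
images = torch.randn(64, 784)
labels = torch.randint(0, 10, (64,))

linear = nn.Linear(784, 10)            # toy replacement for the full network
logits = linear(images)                # raw class scores
output = F.log_softmax(logits, dim=1)  # what the network's forward pass returns

criterion = nn.NLLLoss()
test_loss = criterion(output, labels).item()  # measured only, no backward pass

# Cross-entropy on the raw scores gives the same value.
ce_loss = F.cross_entropy(logits, labels).item()

# exp undoes the log, recovering the softmax probabilities.
ps = torch.exp(output)
print(ps.shape, ps.sum(dim=1)[:3])
```

Each row of `ps` is a probability distribution over the 10 classes, so it sums to one up to floating-point error.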
Again, our output is the log softmax, and the inverse of the log is the exponential, so this gives us back our softmax distribution. Let’s check that this works so far. Cool. Those are our probabilities, and we can check out the shape: 64 by 10.
I want to see how well our network is predicting the correct label, so I need to compare the correct label with what the network is predicting. To get what the network is predicting, I need to take the highest probability from our softmax output. So this is ps.max, and I want to take it along dimension 1, so across the 10 classes. What this actually does is give me two different tensors. The first tensor holds the actual highest probabilities, and the second tensor holds the index of the highest probability. This index is the interesting thing, because it tells us which class has the highest probability in the softmax output.
Since the second tensor is telling us the predicted classes, I’m going to index with [1] so that it gives us the predicted classes. Then we need to compare this with our true labels, and I’m going to call the result equality. Let’s see what this looks like. Here we get another tensor with ones where we made the correct prediction and zeros where we made an incorrect prediction.
Now to measure the accuracy. Again, the accuracy is basically just how many times the network got the prediction correct out of all the predictions it made. Since these are all ones and zeros, we just need to add up all the correct ones and then divide by the total number of predictions. The really simple way to do this is to take the mean. So, equality.mean. A lot of the time you’re going to run into problems with the type of tensor that you have, and this is an example.
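The comparison step can be sketched with a tiny hand-made probability tensor so the result is easy to check by eye; the values and the three-class setup are made up for illustration. Depending on the PyTorch version, the comparison returns a byte or a boolean tensor, so this sketch converts to float before taking the mean, for the reason discussed next.

```python
import torch

# Hand-made softmax outputs for 4 examples and 3 classes (illustrative values).
ps = torch.tensor([
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
    [0.2, 0.5, 0.3],
])
labels = torch.tensor([1, 0, 1, 1])

# max along dim=1 returns (highest probabilities, index of the highest probability).
top_p, top_class = ps.max(dim=1)

# Elementwise comparison: true where the prediction matches the label.
equality = top_class == labels

# Convert to float so that mean() is defined, then average.
accuracy = equality.type(torch.FloatTensor).mean()
print(top_class.tolist(), accuracy.item())  # [1, 0, 2, 1] 0.75
```

Three of the four predictions match their labels, so the accuracy comes out as 0.75.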
The tensor equality has the type torch.ByteTensor, and the error is saying that mean is not implemented for this type of tensor. So this is our problem right here, equality.mean. We actually need to convert this to a float tensor, which does have a mean function. To do that, we call equality.type and give it a different data type, so torch.FloatTensor. What this does is convert the equality tensor, which is a byte tensor, into a float tensor, and then we can get the mean from that. Now we have our accuracy.
We’re going to want to run this validation pass a bunch of times, and we also need to go all the way through our test loader. So what I’d like to do is wrap this thing up in its own function. It looks something like this. We define a validation function, passing in a model, our data (the test loader), and a criterion, and we set up a variable where we can sum up our accuracy. Then we loop through the images and labels in our data. Basically, we’re looping through all the batches in our test loader, calculating the test loss and accuracy for each, summing them all up, and returning the totals.
Now that we have this as a function, we can put it into our normal training loop and measure validation as we’re training. Every certain number of steps, which we’ve set with print_every, we’re going to do our validation. What I’m going to do here is say torch.no_grad. This basically turns off gradient tracking for all of our tensors. We want to do this because when we’re doing validation we don’t really care about the gradients, so it’s just going to speed up the computation in our validation pass. Then we get our test loss and our accuracy from our validation function. It’s also important to know that, now that we have dropout, we don’t want dropout on while we’re doing validation.
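A minimal sketch of such a validation function follows. The function and argument names, the random stand-in dataset, and the one-layer toy model are assumptions for illustration rather than the notebook's exact code. The function returns sums over batches, so the caller divides by the number of batches to get averages.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def validation(model, testloader, criterion):
    """Sum the loss and the per-batch accuracy over every batch in testloader."""
    test_loss = 0.0
    accuracy = 0.0
    for images, labels in testloader:
        images = images.view(images.shape[0], -1)  # flatten to (batch, 784)
        output = model(images)                     # log-probabilities
        test_loss += criterion(output, labels).item()

        ps = torch.exp(output)                     # back to probabilities
        equality = labels == ps.max(dim=1)[1]      # predicted class vs. true label
        accuracy += equality.type(torch.FloatTensor).mean().item()
    return test_loss, accuracy


# Stand-ins (assumptions): random data shaped like Fashion-MNIST and a toy model.
torch.manual_seed(0)
fake_test = TensorDataset(torch.randn(128, 1, 28, 28), torch.randint(0, 10, (128,)))
testloader = DataLoader(fake_test, batch_size=64)
model = nn.Sequential(nn.Linear(784, 10), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()

with torch.no_grad():  # gradients are not needed for validation
    test_loss, accuracy = validation(model, testloader, criterion)

avg_loss = test_loss / len(testloader)
avg_acc = accuracy / len(testloader)
print(f"Test loss: {avg_loss:.3f}, test accuracy: {avg_acc:.3f}")
```

Averaging the per-batch accuracies equals the overall accuracy only when every batch has the same size, which holds for this stand-in data; with a smaller final batch it is a close approximation.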
If it is on, it’s going to make the performance of our network look worse than it really is, because when you’re doing inference after training you’re going to have dropout off. So we want to make sure that dropout is off while we’re doing validation. The way we can do that is by putting the model in evaluation mode, which we do with model.eval. Afterwards we want to turn dropout back on, so we do that with model.train. So to put our model into a mode where it can do inference and validation, we use model.eval, and if we want it to be training, we say model.train. Just for good measure, I also set the model to training mode up here, just in case.
Now we can check out what this looks like. We can see that we get our test loss printed out, as well as our accuracy. The accuracy starts fairly low, around 0.7, and then over time, as the network trains, it gets higher.
With the network trained, we can use it for inference, so again, making actual predictions. Pretty much everything is the same as before. We need to remember to put our model in evaluation mode to turn off dropout, and we want to use with torch.no_grad to turn off our gradients. Then we can do a forward pass and get our probabilities: we get the log softmax from the model, take the exponential to get the softmax, and then we have our probability distribution and can see what the network predicts.
In the next part we’re going to look at how to save and load models. It doesn’t make sense to train a completely new model every time you need one, so typically you’ll train your model and save it to a checkpoint, which is what a saved trained model is typically called. Later you can load it up to either do inference or keep training. See you in the next video.
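To recap the mode switching for inference, here is a small sketch using a toy model with a dropout layer and a random stand-in image; both are assumptions for illustration, not the trained network from the notebook.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-in for the trained network (assumption), including a dropout layer.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(256, 10),
    nn.LogSoftmax(dim=1),
)
image = torch.randn(1, 784)  # stand-in for one flattened test image

model.eval()                 # evaluation mode: dropout is turned off
with torch.no_grad():        # no gradient tracking during inference
    ps_a = torch.exp(model(image))  # log softmax -> softmax probabilities
    ps_b = torch.exp(model(image))

predicted_class = ps_a.argmax(dim=1).item()
print(predicted_class, torch.allclose(ps_a, ps_b))

model.train()                # back to training mode: dropout is on again
```

With dropout off, two forward passes on the same input give the same probabilities, which is what you want when making predictions.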