Hello everyone and welcome back. In this video, I’m going to show you how you actually build neural networks with PyTorch. At the end of this notebook, which I’ll provide for you, you’ll build your own neural network. So, let’s get started.

The first step is to import things like normal. To import PyTorch, we do import torch. We’re also going to import a package called torchvision, which is associated with PyTorch and lets us download and use some existing datasets. The dataset we’re going to use is called MNIST. MNIST is just a bunch of images of hand-drawn digits, 0 through 9. It’s used to train networks or other machine learning models so that they can classify the images into those digits. So an image that shows an eight will be classified as the digit 8. That’s what we’ll be using for this example.

Here I’m using torchvision to download and load the training data. This is the MNIST dataset that we can grab and put into trainset. I’m saying download equals True, so if it doesn’t exist on disk already, it will be downloaded for us. I’m also providing a transform. What this does is read in the images and apply these transforms to give us the dataset that we feed into our network. The transform looks like this: the first thing we do is convert the images into PyTorch tensors, and then we normalize them. What normalizing means in this case is that our images are grayscale, and each pixel is a float from zero to one, but we actually want our pixels to go from negative one to one. So this subtracts 0.5, which is the mean we pass in, from each color channel, from each pixel, and then it divides by 0.5.
So, what this does is move the range of zero to one to negative one half to one half, and then you divide by 0.5, or multiply by two, which stretches the range out to minus one to one. This just makes it easier for our neural network to learn, and you’ll pretty much always want to normalize the data going into a neural network.

The other thing to note is that here I’ve set a batch size of 64. That basically means that when we get our data out of trainloader, it’s going to give us 64 images at a time. Okay, so let me go and run this to load our data, and now we can see what one of these images looks like. Here is one example, and this is the number seven. This is just a hand-drawn digit that we are going to be classifying with our neural network.

So, here is the network that we’ll be building as an example with MNIST. The MNIST images are 28 by 28, so what we actually want to do is transform them into a vector that’s 28 times 28 units long, which is 784. This is the size of our input layer. We’re going to take our images, convert them into vectors, and then pass them in here as our inputs. We’re going to use two hidden layers, the first with 128 units and the second with 64 units, and this is going to go to an output layer with 10 units. The output layer has 10 units because we want to classify these digits and there are 10 of them, right? So this unit corresponds to the digit 0, the next one to 1, then 2, and so on.

The number of units in each of these hidden layers, and the number of hidden layers you actually use, is somewhat arbitrary. In general, the more units you have in a layer and the more layers you have, the better your network will be able to fit the data. A large part of training neural networks is actually finding the best number of units and the best number of layers to use in your network. For activation functions, in the hidden layers we’re going to be using ReLUs, and for the output we’re going to be using softmax.
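To make the two activations just mentioned concrete, here is a small sketch that applies them to a made-up row of scores. The tensor values are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn.functional as F

# One "batch" containing a single row of three made-up scores.
scores = torch.tensor([[-1.0, 0.0, 2.0]])

# ReLU keeps positive values and replaces negative values with zero.
hidden = F.relu(scores)

# Softmax turns the scores in each row into values that sum to one.
probs = F.softmax(scores, dim=1)

print(hidden)
print(probs, probs.sum(dim=1))
```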
So, remember that softmax actually takes the values of these output units and transforms them into a probability distribution. What that means is that it squishes all these values to be between zero and one and then divides by their total, so that if you sum up the values of all these units, you get one. This is a discrete probability distribution that gives us the probability that whatever our input is belongs to each class. To train this network, we’re going to need a loss, and we’re using a cross-entropy loss. What this does is compare the prediction from the softmax layer with the true category, and then you can use the difference, that loss, to update the weights in your network.

Okay, so it’s time to build this network. Firstly, we need to import a few modules from PyTorch. So, from torch import nn; this is like neural network functions. We can also import torch.nn.functional as F. These are more functions that are specifically around neural networks, and this is just a functional form of them.

With those imported, we can start building our network. The general, most basic way you do this in PyTorch is by creating a class. It doesn’t really matter what you call it; here I’m calling the class Network, and it’s subclassed from nn.Module. Then in the init function, you first need to call super. Basically, what super does is call functions or attributes of whatever class this is subclassing from. So what this line is going to do is call the init method of nn.Module.

With that done, we’re going to define the layers we’re using, the operations that will define the architecture of our neural network. To do this, we’ll call the first one self.fc1, for fully connected layer. So fc1 is an nn.Linear. If you look at the documentation, this applies a linear transformation to the incoming data. Here we need to put in our input size and then the output size.
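A minimal sketch of what `nn.Linear` does, using toy sizes (4 inputs, 3 outputs) that are assumptions chosen only to keep the printout small. The layer stores a weight matrix and a bias vector and computes `x @ weight.T + bias` for every row of the batch.

```python
import torch
from torch import nn

# A linear layer taking 4 input features and producing 3 output features.
layer = nn.Linear(4, 3)

# A batch of 2 input vectors.
x = torch.randn(2, 4)
out = layer(x)

# The same computation written out by hand.
manual = x @ layer.weight.T + layer.bias

print(out.shape)           # torch.Size([2, 3])
print(layer.weight.shape)  # torch.Size([3, 4])
print(layer.bias.shape)    # torch.Size([3])
```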
So, we know that we have 784 units in our input layer and we want our first hidden layer to have 128 units. For the second layer we can do something similar, 128 to 64. Then there’s the output layer, which I’ll call fc3, and this goes from 64 to 10. That’s pretty much all the operations that we need to define in this part.

To get this to work as a neural network, we need to define a forward function. All PyTorch networks that you build from nn.Module need to have this forward function. What it expects is that x is a PyTorch tensor, and then what you’re going to do is basically just pass this tensor through each of your layers, each of your operations. So for example, to do the first layer, we write self.fc1(x). This passes the tensor x through our first layer, through this first linear operation, and that gives us another tensor. Then we can take that tensor and apply the ReLU activation function to it, and then go on to the next one. So this just keeps going like this.

Finally, we do our softmax function. Here we actually need to define the dimension that we want to calculate the softmax across. If you remember, earlier I said that our images are actually going to come in batches, and there are going to be 64 different images per batch, and the way that looks in the tensor size is that the batch size comes first. So that would be dimension zero. Then the actual vector that we’re passing through the network is the second dimension. So this tensor here is going to end up being 64, which is the batch size, by 10. We actually want to take our softmax across the dimension that has 10 values, and so that’s dimension one here. And finally, you just return this.

So, we can create our model, our network, like this and then see what it looks like. This basically just tells us all the modules, the operations, that we registered in the init function. So, we created these linear operations, these linear transformations.
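Putting the pieces described above together, the class could be sketched roughly as follows. The class name `Network` and the layer names `fc1`, `fc2`, `fc3` follow the walkthrough; the random `fake_batch` is an assumption standing in for real flattened MNIST images.

```python
import torch
from torch import nn
import torch.nn.functional as F


class Network(nn.Module):
    def __init__(self):
        # Run nn.Module's own initialization so layers get registered.
        super().__init__()
        self.fc1 = nn.Linear(784, 128)  # input layer -> first hidden layer
        self.fc2 = nn.Linear(128, 64)   # first hidden -> second hidden
        self.fc3 = nn.Linear(64, 10)    # second hidden -> output layer

    def forward(self, x):
        # x is expected to have shape (batch_size, 784).
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # dim=1 is the dimension holding the 10 class scores.
        x = F.softmax(self.fc3(x), dim=1)
        return x


model = Network()
print(model)  # lists the modules registered in __init__

# Random stand-in for a batch of 64 flattened images.
fake_batch = torch.randn(64, 784)
ps = model(fake_batch)
print(ps.shape)  # torch.Size([64, 10])
```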
The weights and biases are automatically created for us. We can see that with model.fc1.weight, and I’ll print this out. Then I’ll also print model.fc1.bias. So, we can see we have these parameters here that have automatically been created and initialized for us. If we want to go in and re-initialize these parameters, the biases and weights, then we can get to our bias with .bias. Then to get the underlying data, you say .data, and now we can fill it in place with zeros. So now the bias that’s attached to this linear operation in the first fully connected layer is filled with zeros. Similarly, you can look at your weight’s .data, and suppose we want to use a normal distribution. So, we want to initialize these weights randomly from a normal distribution with a standard deviation of 0.01. There we go. It’s really convenient because the bias and weight matrices are always available to you in your model. You just do model, dot whatever layer, whatever operation, and then .weight, and you get your weights out.

Now that we have a network, I’m going to pass some data through it and see what it looks like. We get our data from the trainloader, and what it returns is images and then labels for each of those images. The labels are the categories that the images belong to. So if there’s an image of a zero then the label is zero, if there’s an image of a five the label is five, and so on. What we do is call next(iter(trainloader)). The trainloader is an iterable; to get an iterator you can step through, you call iter on it, and to get the first value, you call next. Basically, every time you call next on this iterator, it’s going to give you the next batch of your training data.

With the images, we actually need to resize them now, because remember they come as 28 by 28 images and we need to make each of them a 784-long vector. So, we say images.resize_ with our batch size, then a second dimension, which is like the number of color channels for instance, and then 784.
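The steps in this part of the walkthrough can be sketched as below. Since downloading MNIST is out of scope for a short example, the `trainloader` here is built from random tensors, which is purely an assumption for illustration; the real one yields the same shapes. The final line uses the hard-coded batch size of 64.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Re-initialize a layer's parameters in place.
fc1 = nn.Linear(784, 128)
fc1.bias.data.fill_(0)             # bias filled with zeros
fc1.weight.data.normal_(std=0.01)  # weights drawn from N(0, 0.01^2)

# Stand-in loader with random "images" shaped like MNIST batches.
fake_images = torch.rand(256, 1, 28, 28)
fake_labels = torch.randint(0, 10, (256,))
trainloader = DataLoader(TensorDataset(fake_images, fake_labels), batch_size=64)

# iter() gives an iterator over batches; next() pulls the first batch.
images, labels = next(iter(trainloader))
print(images.shape, labels.shape)  # torch.Size([64, 1, 28, 28]) torch.Size([64])

# Flatten each 28x28 image into a 784-long vector, in place.
images.resize_(64, 1, 784)
print(images.shape)  # torch.Size([64, 1, 784])
```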
A somewhat better way to do this is to pass in the batch size that you get from the images tensor itself. This is the first element of the shape. You want to do this because a lot of the time you don’t actually know what the batch size is going to be beforehand. So, in general, you don’t want to hard-code that into your network or your tensors or anywhere. So, you do something like that.

With the images resized, we can now pass them through our network. To do this, you say model.forward. I’m going to put the result into ps, which just stands for probabilities. So, we forward pass our images; we’re just going to take the first image. That’ll calculate our probabilities. Now, I’m going to use this function from the helper file; helper is a file I wrote that sits next to this notebook and makes things easier and nicer. I want to look at what our image is and then how our network classified it, so, what its prediction for the class is. Here we’re going to use images[0]. This time, we actually need to convert it back to a 28 by 28 image, so I’m going to use view here. View is basically like resizing except that it returns a new tensor. So, resize_ has an underscore at the end, which means it’s done in place, but view is the same as resizing, reshaping, except it returns a new tensor. So, we can pass in our image and then pass in the probability distribution. I typed that wrong.

So, you can see here, the image that we passed through our network is a seven, and it’s trying to make some predictions, but since the weights are random and we haven’t trained it, the network has no idea what the digit actually is. It’s just giving us random guesses where all the probabilities are roughly the same. So, it’s roughly a uniform distribution right now because we haven’t trained it yet.

PyTorch provides a more convenient way of building a model called nn.Sequential. So, I’ll show you how to do this. First, we want to define our hyperparameters.
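Before moving on, here is a hedged sketch of the forward pass and reshaping steps just described. The one-layer `model` and the random `images` are stand-ins assumed for illustration, not the network or data from the notebook, and the plotting helper is left out because its interface is not given here.

```python
import torch
from torch import nn

# Stand-in for the untrained classifier: any module mapping 784 -> 10 would do.
model = nn.Sequential(nn.Linear(784, 10), nn.Softmax(dim=1))

# Random stand-in for one batch of MNIST images.
images = torch.rand(64, 1, 28, 28)

# Use the batch size from the tensor itself instead of hard-coding 64.
images.resize_(images.shape[0], 1, 784)

# Forward pass on the first image only; images[0] has shape (1, 784).
ps = model.forward(images[0])
print(ps.shape)  # torch.Size([1, 10])

# view returns a new tensor object (sharing the same underlying data)
# with the requested shape; `images` itself keeps its shape.
img = images[0].view(28, 28)
print(img.shape, images.shape)
```

With untrained weights, the ten values in `ps` carry no information about the digit, which is the behavior described above.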
So, hyperparameters are parameters that define the architecture of your network, whereas the parameters are the weights and biases themselves. For instance, your input size is going to be 784. We can define the sizes of our hidden layers, so this would be 128 and 64, and then we can define the output size, which is equal to 10.

With nn.Sequential we can say our model is this. Basically, you just pass in the operations, the transformations, the modules that you want to apply. So, an nn.Linear with the input size and then the first hidden size. That’s the first module. Then we know we want this to go through a ReLU activation, so nn.ReLU, and then the second linear transformation and then another ReLU, and finally our output and then the softmax. When we print out the model, we see it’s a Sequential model, and then we have this linear transformation, ReLU, linear transformation, ReLU, linear, and then softmax. This is practically the same as the network that we built before, except that it’s much simpler to write and takes fewer lines of code. Now we can do a forward pass through this and see that it behaves the same. So, we have another seven. Since our network hasn’t been trained yet, it doesn’t know what the seven actually is.

Another cool thing you can do with Sequential is pass in an OrderedDict, an ordered dictionary, to name each of the layers in your sequential model. To get an OrderedDict, do from collections import OrderedDict. Then we can build our Sequential model. OrderedDict takes in a list of tuples that build the dictionary keys and values. Here, the keys are going to be the names of our layers and the values are going to be the operations themselves. So, we can name our first layer fc1, for instance, and give it our normal linear layer, and then fill out the rest of the network. So, we call this one fc1, and then our first ReLU is relu1, then fc2, relu2, output, softmax. One thing to note and remember is that in a dictionary the keys have to be unique.
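Both ways of building the model with `nn.Sequential` could be sketched as follows. The variable names `input_size`, `hidden_sizes`, `output_size`, and `named_model` are assumptions made for this example; the layer names follow the ones mentioned above.

```python
from collections import OrderedDict

import torch
from torch import nn

# Hyperparameters that define the architecture.
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Unnamed version: modules are applied in the order they are passed in.
model = nn.Sequential(
    nn.Linear(input_size, hidden_sizes[0]),
    nn.ReLU(),
    nn.Linear(hidden_sizes[0], hidden_sizes[1]),
    nn.ReLU(),
    nn.Linear(hidden_sizes[1], output_size),
    nn.Softmax(dim=1),
)

# Named version: each key must be unique, hence relu1 and relu2.
named_model = nn.Sequential(OrderedDict([
    ("fc1", nn.Linear(input_size, hidden_sizes[0])),
    ("relu1", nn.ReLU()),
    ("fc2", nn.Linear(hidden_sizes[0], hidden_sizes[1])),
    ("relu2", nn.ReLU()),
    ("output", nn.Linear(hidden_sizes[1], output_size)),
    ("softmax", nn.Softmax(dim=1)),
]))

print(named_model)
print(named_model.fc2)  # named layers are reachable as attributes

# Random stand-in batch to check the output shape.
ps = model(torch.randn(5, input_size))
```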
So, you have to name this one relu1 or whatever, and this other ReLU activation has to have a different name. Now we can look at our model. You can see that each of our layers now has a name attached to it. So, fc2 is this linear operation, and we can actually get fc2 out here, and there you go. They’re named. You can access them just like the attributes we defined in the original network and go forward from there.

So now, it’s your turn to build a network. You can use any of the methods that I talked about here. What I want you to do is build a network to classify the images with three hidden layers: the first one should have 400 units, the second should have 200 units, and the last one should have 100 units. Make sure you use ReLU activation functions on each of them and softmax on the output layer. Here, you’re not training the network yet, so it’s not going to be able to make predictions. But in the next video and notebook, you’ll learn how to do that, and so you’ll be able to train your network. See you in the next video. Cheers.