# 4-6-1-6. Neural Networks in PyTorch

Hello everyone and welcome back.

So, in this notebook and series of videos,

I’m going to be showing you a more powerful way to build neural networks in PyTorch.

So, in the last notebook, you saw how you can calculate the output for

a network using tensors and matrix multiplication.

But PyTorch has this nice module, nn,

that has a lot of nice classes and methods and

functions that allow us to build large neural networks in a very efficient way.

So, to show you how this works,

we’re going to be using a dataset called MNIST.

So, MNIST is a whole bunch of grayscale handwritten digits:

0, 1, 2, 3,

4, and so on through 9.

Each of these images is 28 by 28 pixels and the goal is

to actually identify what the number is in these images.

So, that dataset consists of each of these images and

it’s labeled with the digit that is in that image.

So, ones are labeled one,

twos are labeled two and so on.

So, what we can do is we can actually show

our network an image and the correct label, and

then it learns how to actually determine what the number in the image is.

This dataset is available through the torchvision package.

So, this is a package that sits alongside PyTorch,

that provides a lot of nice utilities like

datasets and models for doing computer vision problems.

You can run this cell to download and load the MNIST dataset.

What it does is it gives us back an object which I’m calling trainloader.

So, we can turn this trainloader into an iterator with iter, and then

this will allow us to start getting data out of

it, or we can actually just use this in a loop,

in a for loop, and so we can get our images and labels out

of this generator with for image, label in trainloader.

One thing to notice is that when I created the trainloader,

I set the batch size to 64.

So, what that means is that every time we get a set of images and labels out,

we’re actually getting 64 images out from our data loader.

So, then if you look at the shape and the size of these images,

we’ll see that they are 64 by 1 by 28 by 28.

So, 64 images, and then one color channel, so it’s grayscale,

and then 28 by 28 pixels is the shape of these images, and so we can see that here.

Then our labels have a shape of 64, so it’s just a vector that’s 64 elements, with

a label for each of our images, and we can see what

one of these images looks like. This is a nice number four.

So, what we’re going to do here is build

a multi-layer neural network using the methods that we saw before.

By that I mean you’re going to initialize some weight matrices and

some bias vectors and use those to calculate the output of this multi-layer network.

Specifically, we want to build this network with 784 input units,

256 hidden units, and 10 output units.

So, 10 output units,

one for each of our classes.

So, the 784 input units,

this comes from the fact that this type of network is

called a fully connected network or a dense network.

We want to think of our inputs as just one vector.

So, our images are actually this 28 by 28 image,

but we want to put a vector into our network and so what we need to do is

actually convert this 28 by 28 image into a vector and so,

784 is 28 times 28.

When we actually take this 28 by 28 image and flatten it into

a vector then it’s going to be 784 elements long.

So, now what we need to do is take each of our batches,

which is 64 by 1 by 28 by 28, and then

convert it into another tensor with shape 64 by 784.

This is going to be the tensor that’s the input to our network.

So, go and give this a shot.

So again, build the network with 784 input units,

256 hidden units, and 10 output units, and you’re going to be

generating your own random initial weight and bias matrices. Cheers.

# Neural networks with PyTorch

Deep learning networks tend to be massive, with dozens or hundreds of layers; that’s where the term “deep” comes from. You can build one of these deep networks using only weight matrices as we did in the previous notebook, but in general it’s very cumbersome and difficult to implement. PyTorch has a module, nn, that provides a nice way to efficiently build large neural networks.

Now we’re going to build a larger network that can solve a (formerly) difficult problem, identifying text in an image. Here we’ll use the MNIST dataset, which consists of greyscale handwritten digits. Each image is 28×28 pixels; you can see a sample below.

Our goal is to build a neural network that can take one of these images and predict the digit in the image.

First up, we need to get our dataset. This is provided through the torchvision package. The code below will download the MNIST dataset, then create training and test datasets for us. Don’t worry too much about the details here, you’ll learn more about this later.

We have the training data loaded into trainloader and we make that an iterator with iter(trainloader). Later, we’ll use this to loop through the dataset for training, like

for image, label in trainloader:
    ## do things with images and labels

You’ll notice I created the trainloader with a batch size of 64, and shuffle=True. The batch size is the number of images we get in one iteration from the data loader and pass through our network, often called a batch. And shuffle=True tells it to shuffle the dataset every time we start going through the data loader again. But here I’m just grabbing the first batch so we can check out the data. We can see below that images is just a tensor with size (64, 1, 28, 28). So, 64 images per batch, 1 color channel, and 28×28 images.
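To make the batch shapes concrete without downloading anything, here is a minimal sketch that uses random tensors as a stand-in for MNIST. The names `fake_images`, `fake_labels`, and `fake_loader` are made up for this illustration; only the shapes, the batch size of 64, and shuffle=True mirror the text.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data with MNIST's per-image shape (1 channel, 28x28).
# The values are random noise, not real digits.
fake_images = torch.randn(256, 1, 28, 28)
fake_labels = torch.randint(0, 10, (256,))

fake_loader = DataLoader(TensorDataset(fake_images, fake_labels),
                         batch_size=64, shuffle=True)

# Grab just the first batch, as described in the text
batch_images, batch_labels = next(iter(fake_loader))
print(batch_images.shape)  # torch.Size([64, 1, 28, 28])
print(batch_labels.shape)  # torch.Size([64])

# Looping over the loader yields one batch per iteration
num_batches = sum(1 for _ in fake_loader)
print(num_batches)  # 256 examples / 64 per batch = 4
```

With the real trainloader the shapes of each batch should look the same; only the number of batches changes with the size of the dataset.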

First, let’s try to build a simple network for this dataset using weight matrices and matrix multiplications. Then, we’ll see how to do it using PyTorch’s nn module which provides a much more convenient and powerful method for defining network architectures.

The networks you’ve seen so far are called fully-connected or dense networks. Each unit in one layer is connected to each unit in the next layer. In fully-connected networks, the input to each layer must be a one-dimensional vector (which can be stacked into a 2D tensor as a batch of multiple examples). However, our images are 28×28 2D tensors, so we need to convert them into 1D vectors. Thinking about sizes, we need to convert the batch of images with shape (64, 1, 28, 28) to have a shape of (64, 784), where 784 is 28 times 28. This is typically called flattening: we flatten the 2D images into 1D vectors.
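As a small sketch of flattening, assuming a random tensor in place of a real batch (the variable names here are illustrative), the batch dimension is kept and the remaining dimensions are collapsed:

```python
import torch

batch = torch.randn(64, 1, 28, 28)  # random stand-in for a batch of MNIST images

# Keep the batch dimension and collapse the rest.
# -1 lets PyTorch infer the remaining size: 1 * 28 * 28 = 784
flat = batch.view(batch.shape[0], -1)
print(flat.shape)  # torch.Size([64, 784])

# torch.flatten with start_dim=1 is another way to write the same thing
flat_alt = torch.flatten(batch, start_dim=1)
print(torch.equal(flat, flat_alt))  # True
```

Using `batch.shape[0]` rather than a hard-coded 64 keeps the code working if the last batch of a dataset is smaller than the others.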

Previously you built a network with one output unit. Here we need 10 output units, one for each digit. We want our network to predict the digit shown in an image, so what we’ll do is calculate probabilities that the image is of any one digit or class. This ends up being a discrete probability distribution over the classes (digits) that tells us the most likely class for the image. That means we need 10 output units for the 10 classes (digits). We’ll see how to convert the network output into a probability distribution next.

Exercise: Flatten the batch of images, images. Then build a multi-layer network with 784 input units, 256 hidden units, and 10 output units using random tensors for the weights and biases. For now, use a sigmoid activation for the hidden layer. Leave the output layer without an activation; we’ll add one that gives us a probability distribution next.
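One possible way to approach this exercise is sketched below. It is not the only solution. It assumes a random tensor in place of the real `images` batch, and the names `activation`, `W1`, `b1`, `W2`, and `b2` are choices made for this sketch.

```python
import torch

torch.manual_seed(7)  # arbitrary seed so the sketch is repeatable

def activation(x):
    """Sigmoid activation function."""
    return 1 / (1 + torch.exp(-x))

images = torch.randn(64, 1, 28, 28)        # stand-in for a real batch
inputs = images.view(images.shape[0], -1)  # flatten to (64, 784)

# Random weights and biases for the hidden and output layers
W1 = torch.randn(784, 256)
b1 = torch.randn(256)
W2 = torch.randn(256, 10)
b2 = torch.randn(10)

hidden = activation(torch.mm(inputs, W1) + b1)  # shape (64, 256)
out = torch.mm(hidden, W2) + b2                 # shape (64, 10), no activation yet
print(out.shape)  # torch.Size([64, 10])
```

The bias vectors have one element per unit in their layer, and broadcasting adds them to every row of the batch.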

Now we have 10 outputs for our network. We want to pass in an image to our network and get out a probability distribution over the classes that tells us the likely class(es) the image belongs to. Something that looks like this:

Here we see that the probability for each class is roughly the same. This is representing an untrained network, it hasn’t seen any data yet so it just returns a uniform distribution with equal probabilities for each class.

To calculate this probability distribution, we often use the softmax function. Mathematically this looks like

$$
\sigma(x_i) = \frac{e^{x_i}}{\sum_{k=1}^{K} e^{x_k}}
$$

where $K$ is the number of classes. What this does is squish each input $x_i$ between 0 and 1 and normalize the values to give you a proper probability distribution where the probabilities sum up to one.

Exercise: Implement a function softmax that performs the softmax calculation and returns probability distributions for each example in the batch. Note that you’ll need to pay attention to the shapes when doing this. If you have a tensor a with shape (64, 10) and a tensor b with shape (64,), doing a/b will give you an error because PyTorch will try to do the division across the columns (called broadcasting) but you’ll get a size mismatch. The way to think about this is for each of the 64 examples, you only want to divide by one value, the sum in the denominator. So you need b to have a shape of (64, 1). This way PyTorch will divide the 10 values in each row of a by the one value in each row of b. Pay attention to how you take the sum as well. You’ll need to define the dim keyword in torch.sum. Setting dim=0 takes the sum across the rows while dim=1 takes the sum across the columns.
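A minimal sketch of one way to write this function follows. It assumes a random (64, 10) tensor named `scores` in place of the real network output, and it compares the result against PyTorch’s built-in `torch.softmax` as a sanity check.

```python
import torch

def softmax(x):
    exp_x = torch.exp(x)
    # Sum over dim=1 (the 10 class scores in each row), then reshape
    # (64,) -> (64, 1) so each row is divided by its own sum
    return exp_x / torch.sum(exp_x, dim=1).view(-1, 1)

scores = torch.randn(64, 10)  # stand-in for the network output `out`
probabilities = softmax(scores)

print(probabilities.shape)       # torch.Size([64, 10])
print(probabilities.sum(dim=1))  # every entry should be very close to 1

# The built-in function computes the same quantity
print(torch.allclose(probabilities, torch.softmax(scores, dim=1), atol=1e-6))
```

This direct version can overflow when the inputs are very large, since `torch.exp` grows quickly. In practice the built-in function is usually the safer choice.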

## Building networks with PyTorch

PyTorch provides a module nn that makes building networks much simpler. Here I’ll show you how to build the same one as above with 784 inputs, 256 hidden units, 10 output units and a softmax output.

Let’s go through this bit by bit.

class Network(nn.Module):

Here we’re inheriting from nn.Module. Combined with super().__init__() this creates a class that tracks the architecture and provides a lot of useful methods and attributes. It is mandatory to inherit from nn.Module when you’re creating a class for your network. The name of the class itself can be anything.

self.hidden = nn.Linear(784, 256)

This line creates a module for a linear transformation, $x\mathbf{W}+b$, with 784 inputs and 256 outputs and assigns it to self.hidden. The module automatically creates the weight and bias tensors which we’ll use in the forward method. You can access the weight and bias tensors once the network (net) is created with net.hidden.weight and net.hidden.bias.
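To see what the module holds, here is a short sketch using a standalone layer and a random input (both are illustrative stand-ins). It prints the parameter shapes and checks that calling the layer matches the linear transformation written out by hand.

```python
import torch
from torch import nn

hidden = nn.Linear(784, 256)

# nn.Linear stores its weight as (out_features, in_features)
print(hidden.weight.shape)  # torch.Size([256, 784])
print(hidden.bias.shape)    # torch.Size([256])

x = torch.randn(64, 784)  # random stand-in for a flattened batch

# The same transformation written by hand: multiply by the transposed weight, add the bias
manual = x @ hidden.weight.T + hidden.bias
print(torch.allclose(hidden(x), manual, atol=1e-5))
```

Because the weight is stored as (256, 784), it has to be transposed to match the $x\mathbf{W}+b$ form used in the text.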

self.output = nn.Linear(256, 10)

Similarly, this creates another linear transformation with 256 inputs and 10 outputs.

self.sigmoid = nn.Sigmoid()
self.softmax = nn.Softmax(dim=1)

Here I defined operations for the sigmoid activation and softmax output. Setting dim=1 in nn.Softmax(dim=1) calculates softmax across the columns.

def forward(self, x):

PyTorch networks created with nn.Module must have a forward method defined. It takes in a tensor x and passes it through the operations you defined in the __init__ method.

x = self.hidden(x)
x = self.sigmoid(x)
x = self.output(x)
x = self.softmax(x)

Here the input tensor x is passed through each operation and reassigned to x. We can see that the input tensor goes through the hidden layer, then a sigmoid function, then the output layer, and finally the softmax function. It doesn’t matter what you name the variables here, as long as the inputs and outputs of the operations match the network architecture you want to build. The order in which you define things in the __init__ method doesn’t matter, but you’ll need to sequence the operations correctly in the forward method.

Now we can create a Network object.

You can define the network somewhat more concisely and clearly using the torch.nn.functional module. This is the most common way you’ll see networks defined, as many operations are simple element-wise functions. We normally import this module as F, with import torch.nn.functional as F.

### Activation functions

So far we’ve only been looking at the sigmoid activation function, but in general any function can be used as an activation function. The only requirement is that for a network to approximate a non-linear function, the activation functions must be non-linear. Here are a few more examples of common activation functions: Tanh (hyperbolic tangent), and ReLU (rectified linear unit).
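The sketch below evaluates the three activation functions on a few sample values, then illustrates the non-linearity requirement: two stacked linear layers with no activation in between reduce to a single linear layer. The layer sizes and variable names are arbitrary choices for this illustration.

```python
import torch
from torch import nn

x = torch.linspace(-3, 3, steps=7)  # evenly spaced values from -3 to 3
print(torch.sigmoid(x))  # values between 0 and 1
print(torch.tanh(x))     # values between -1 and 1
print(torch.relu(x))     # negative values clipped to 0

# Two linear layers with no activation in between...
torch.manual_seed(0)
lin1, lin2 = nn.Linear(4, 8), nn.Linear(8, 3)
inp = torch.randn(5, 4)
stacked = lin2(lin1(inp))

# ...are equivalent to one linear layer with combined weight and bias
W = lin2.weight @ lin1.weight            # shape (3, 4)
b = lin2.weight @ lin1.bias + lin2.bias  # shape (3,)
collapsed = inp @ W.T + b
print(torch.allclose(stacked, collapsed, atol=1e-5))
```

Putting a non-linear function such as ReLU between the two layers breaks this equivalence, which is what lets deeper networks represent more than a single linear map.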

In practice, the ReLU function is used almost exclusively as the activation function for hidden layers.

### Your Turn to Build a Network

Exercise: Create a network with 784 input units, a hidden layer with 128 units and a ReLU activation, then a hidden layer with 64 units and a ReLU activation, and finally an output layer with a softmax activation as shown above. You can use a ReLU activation with the nn.ReLU module or F.relu function.
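One possible solution is sketched here, using the F.relu route and the layer names fc1, fc2, and fc3. The random input at the end is only a stand-in for a flattened batch of images, and other correct solutions exist.

```python
import torch
from torch import nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 128)  # input -> first hidden layer
        self.fc2 = nn.Linear(128, 64)   # first -> second hidden layer
        self.fc3 = nn.Linear(64, 10)    # second hidden layer -> output

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Softmax over dim=1 so each row becomes a probability distribution
        return F.softmax(self.fc3(x), dim=1)

model = Network()
ps = model(torch.randn(64, 784))  # random stand-in for a flattened batch
print(ps.shape)  # torch.Size([64, 10])
```

Calling `model(...)` rather than `model.forward(...)` is the usual convention, since it goes through the standard nn.Module call path.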

### Initializing weights and biases

The weights and such are automatically initialized for you, but it’s possible to customize how they are initialized. The weights and biases are tensors attached to the layer you defined, you can get them with model.fc1.weight for instance.

It’s good practice to name your layers by their type of network, for instance ‘fc’ to represent a fully-connected layer. As you code your solution, use fc1, fc2, and fc3 as your layer names.

For custom initialization, we want to modify these tensors in place. These are actually autograd Variables, so we need to get back the actual tensors with model.fc1.weight.data. Once we have the tensors, we can fill them with zeros (for biases) or random normal values.
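As an alternative sketch of in-place initialization, the same result can be reached either by modifying the tensors inside `torch.no_grad()` or with the helpers in `torch.nn.init`. The standalone `fc1` layer below is a stand-in for `model.fc1`, and the standard deviation of 0.01 is just an example value.

```python
import torch
from torch import nn

fc1 = nn.Linear(784, 128)  # stand-in for model.fc1

# Option 1: modify the parameters in place without tracking gradients
with torch.no_grad():
    fc1.bias.zero_()
    fc1.weight.normal_(mean=0.0, std=0.01)

# Option 2: the nn.init helpers perform the same in-place fills
nn.init.zeros_(fc1.bias)
nn.init.normal_(fc1.weight, mean=0.0, std=0.01)

print(fc1.bias.abs().sum())  # 0 after zeroing
print(fc1.weight.std())      # should be close to 0.01
```

Both options leave the parameters attached to the layer, so the network uses the new values on its next forward pass.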

### Forward pass

Now that we have a network, let’s see what happens when we pass in an image.

As you can see above, our network has basically no idea what this digit is. It’s because we haven’t trained it yet, all the weights are random!

### Using nn.Sequential

PyTorch provides a convenient way to build networks like this, where a tensor is passed sequentially through operations: nn.Sequential. Using this to build the equivalent network:

The operations are available by passing in the appropriate index. For example, if you want to get the first Linear operation and look at the weights, you’d use model[0].

You can also pass in an OrderedDict to name the individual layers and operations, instead of using incremental integers. Note that dictionary keys must be unique, so each operation must have a different name.

Now you can access layers either by integer or by name.

In the next notebook, we’ll see how we can train a neural network to accurately predict the numbers appearing in the MNIST images.

# Import necessary packages

import numpy as np
import torch

import helper

import matplotlib.pyplot as plt

# The MNIST datasets are hosted on yann.lecun.com that has moved under CloudFlare protection
# Reference: https://github.com/pytorch/vision/issues/1938

from six.moves import urllib
opener = urllib.request.build_opener()
urllib.request.install_opener(opener)

### Run this cell

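from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,)),
                                ])

# Download and load the training data (the download directory is an assumed location)
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Turn the loader into an iterator and grab the first batch
dataiter = iter(trainloader)
images, labels = next(dataiter)
print(type(images))
print(images.shape)
print(labels.shape)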

plt.imshow(images[1].numpy().squeeze(), cmap='Greys_r');

out = # output of your network, should have shape (64,10)

def softmax(x):
    ## TODO: Implement the softmax function here

# Here, out should be the output of the network in the previous exercise with shape (64,10)
probabilities = softmax(out)

# Does it have the right shape? Should be (64, 10)
print(probabilities.shape)
# Does it sum to 1?
print(probabilities.sum(dim=1))

from torch import nn

class Network(nn.Module):
    def __init__(self):
        super().__init__()

        # Inputs to hidden layer linear transformation
        self.hidden = nn.Linear(784, 256)
        # Output layer, 10 units - one for each digit
        self.output = nn.Linear(256, 10)

        # Define sigmoid activation and softmax output
        self.sigmoid = nn.Sigmoid()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        # Pass the input tensor through each of our operations
        x = self.hidden(x)
        x = self.sigmoid(x)
        x = self.output(x)
        x = self.softmax(x)

        return x

# Create the network and look at its text representation
model = Network()
model

import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        # Inputs to hidden layer linear transformation
        self.hidden = nn.Linear(784, 256)
        # Output layer, 10 units - one for each digit
        self.output = nn.Linear(256, 10)

    def forward(self, x):
        # Hidden layer with sigmoid activation
        # (torch.sigmoid is used because F.sigmoid is deprecated)
        x = torch.sigmoid(self.hidden(x))
        # Output layer with softmax activation
        x = F.softmax(self.output(x), dim=1)

        return x

print(model.fc1.weight)
print(model.fc1.bias)

# Set biases to all zeros
model.fc1.bias.data.fill_(0)

# sample from random normal with standard dev = 0.01
model.fc1.weight.data.normal_(std=0.01)

# Grab some data
images, labels = next(dataiter)

# Resize images into a 1D vector, new shape is (batch size, color channels, image pixels)
images.resize_(64, 1, 784)
# or images.resize_(images.shape[0], 1, 784) to automatically get batch size

# Forward pass through the network
img_idx = 0
ps = model.forward(images[img_idx,:])

img = images[img_idx]
helper.view_classify(img.view(1, 28, 28), ps)

# Hyperparameters for our network
input_size = 784
hidden_sizes = [128, 64]
output_size = 10

# Build a feed-forward network
model = nn.Sequential(nn.Linear(input_size, hidden_sizes[0]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[0], hidden_sizes[1]),
                      nn.ReLU(),
                      nn.Linear(hidden_sizes[1], output_size),
                      nn.Softmax(dim=1))
print(model)

# Forward pass through the network and display output
images.resize_(images.shape[0], 1, 784)
ps = model.forward(images[0,:])
helper.view_classify(images[0].view(1, 28, 28), ps)

print(model[0])
model[0].weight

from collections import OrderedDict
model = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(input_size, hidden_sizes[0])),
    ('relu1', nn.ReLU()),
    ('fc2', nn.Linear(hidden_sizes[0], hidden_sizes[1])),
    ('relu2', nn.ReLU()),
    ('output', nn.Linear(hidden_sizes[1], output_size)),
    ('softmax', nn.Softmax(dim=1))]))
model

print(model[0])
print(model.fc1)