# 4-6-1-14. Inference and Validation

Hey there. So now,

we’re going to start talking about inference and validation.

So, when you have your trained network,

you typically want to use it for making predictions.

This is called inference,

it’s a term borrowed from statistics.

However, neural networks have a tendency to perform too well on

your training data and they aren’t able to

generalize to data that your network hasn’t seen before.

This is called overfitting.

This happens because as you’re training more and more and more on your training set,

your network starts to pick up correlations and

patterns that are in your training set but they aren’t

in the more general dataset of all possible handwritten digits.

So, to test for overfitting,

we measure the performance of the network on data that isn’t in the training set.

This data is usually called the validation set or the test set.

So, while we measure the performance on the validation set,

we also try to reduce overfitting through regularization, such as dropout.

So, in this notebook,

I’ll show you how we can both look at

our validation set and also use dropout to reduce overfitting.

So, to get the training set for your data from PyTorch,

we say train=True for FashionMNIST.

To get our test set,

we’re actually going to set train=False here.

Here, I’m just defining the model like we did before.

So, the goal of validation is to measure

our model’s performance on data that is not part of our training set.

But what we mean by performance is up to you,

up to the developer, the person who’s writing the code.

A lot of times, it’ll just be the accuracy.

So, like how many correct classifications did

our model make compared to all of the predictions?

And other options for metrics are precision and recall,

and the top five error rate.
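As a rough illustration of the metrics just mentioned, here is a minimal sketch of accuracy alongside per-class precision and recall. The prediction and label tensors are made up for the example, and `precision_recall` is a hypothetical helper, not something from the notebook.

```python
import torch

# Hypothetical predictions and true labels for a 3-class problem
pred = torch.tensor([0, 1, 1, 2, 1, 0, 2, 2])
true = torch.tensor([0, 1, 2, 2, 1, 1, 2, 0])

def precision_recall(pred, true, cls):
    """Precision and recall for one class, treated as the positive class."""
    tp = ((pred == cls) & (true == cls)).sum().item()  # true positives
    fp = ((pred == cls) & (true != cls)).sum().item()  # false positives
    fn = ((pred != cls) & (true == cls)).sum().item()  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Accuracy: fraction of predictions that match the labels
accuracy = (pred == true).float().mean().item()
p1, r1 = precision_recall(pred, true, cls=1)
print(f'accuracy={accuracy:.3f} precision={p1:.3f} recall={r1:.3f}')
```

Accuracy gives one number for the whole set, while precision and recall are computed per class, which can matter when some classes are much rarer than others.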

So here, I’ll show you how to actually measure the accuracy on the validation set.

So first, I’m going to do a forward pass with one batch from the test set.

So, for this test batch, we get our probabilities.

That’s 64 examples in a batch,

and then 10 columns, one for each of the classes.

So, the accuracy, we want to see if our model made

the correct prediction of the class given the image.

We can consider the prediction to be whichever class has the highest probability.

So, for this, we can use the topk method on our tensors.

This returns the k highest values.

So, if we pass in one,

then this is going to give us the one highest value.

This one highest value is the most likely class that our network is predicting.

So, for the first ten examples,

in this batch of test data that I grabbed,

we see that the class four and class five are what are being predicted for these.

So, remember that this network actually hasn’t been trained yet,

and so it’s just making these guesses randomly because

it doesn’t really know anything about the data yet.

So, top-k actually returns a tuple with two tensors.

So, the first tensor is the actual probability values,

and the second tensor is the class indices themselves.

So typically, we just want this top class here.

So, I’m calling topk here and I’m separating out the probabilities and the classes.

So, we’ll just use this top class going forward.

So, now that we have the predicted classes from our network,

we can compare that with the true labels.

So, we can say top_class == labels.

The only trick here is that we need to make

sure our top_class tensor and the labels tensor have the same shape,

so that this equality operates the way we expect.

So, labels from the test loader is actually a 1D tensor with 64 elements,

but top class itself is a 2D tensor, 64 by one.

So here, I’m just changing the shape of labels to match the shape of top_class.

This gives us this equals tensor.

We can actually see what it looks like.

So, it gives us a bunch of zeros and ones.

So, zeros are where they don’t match,

and then ones are where they do match.

Now, we have this tensor that’s all just a bunch of zeros and ones.

So, if we want to know the accuracy, right?

We can just sum up all the correct things,

all the correct predictions,

and then divide by the total number of predictions.

If your tensor is all zeros and ones,

that’s actually equivalent to taking the mean.

So for that, we can do torch.mean,

but the problem is that equals is actually a byte tensor,

and torch.mean won’t work on byte tensors.

So, we actually need to convert equals into a float tensor.

If we do that, then we can actually see our accuracy for

this one particular batch is 15.6 percent.

So, this is roughly what we expect.

So, our network hasn’t been trained yet.

It’s making pretty much random guesses.

That means that we should see our accuracy be about one in ten for

any particular image because it’s just uniformly guessing one of the classes, okay?

So here, I’m going to have you actually implement this validation loop,

where you’ll pass in data from

the test set through the network and calculate the loss and the accuracy.

So, one thing to note, I think I mentioned this before.

For the validation pass, we’re not actually going to be doing any training.

So, we don’t need the gradients.

So, you can actually speed up your code a little bit if you turn off the gradients.

So, using this context,

with torch.no_grad(), you can put your validation pass in here.

So, for images and labels in your test loader, do the validation pass here.

So, I’ve basically built a classifier for you, set all this up.

Here’s the training pass, and then it’s up to you to implement the validation pass,

and then print out the accuracy.

All right. Good luck,

and if you get stuck or want any help,

be sure to check out my solution.


# Inference and Validation

Now that you have a trained network, you can use it for making predictions. This is typically called inference, a term borrowed from statistics. However, neural networks have a tendency to perform too well on the training data and aren’t able to generalize to data that hasn’t been seen before. This is called overfitting and it impairs inference performance. To test for overfitting while training, we measure the performance on data not in the training set called the validation set. We avoid overfitting through regularization such as dropout while monitoring the validation performance during training. In this notebook, I’ll show you how to do this in PyTorch.

As usual, let’s start by loading the dataset through torchvision. You’ll learn more about torchvision and loading data in a later part. This time we’ll be taking advantage of the test set which you can get by setting train=False here:

testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)

The test set contains images just like the training set. Typically you’ll see 10-20% of the original dataset held out for testing and validation with the rest being used for training.

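import torch
from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# Download and load the training data (loader settings mirror the test loader)
trainset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

# Download and load the test data
testset = datasets.FashionMNIST('~/.pytorch/F_MNIST_data/', download=True, train=False, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)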

Here I’ll create a model like normal, using the same one from my solution for part 4.

from torch import nn, optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)

        return x

The goal of validation is to measure the model’s performance on data that isn’t part of the training set. Performance here is up to the developer to define though. Typically this is just accuracy, the percentage of classes the network predicted correctly. Other options are precision and recall and top-5 error rate. We’ll focus on accuracy here. First I’ll do a forward pass with one batch from the test set.

model = Classifier()

# Grab one batch of images and labels from the test set
images, labels = next(iter(testloader))

# Get the class probabilities
ps = torch.exp(model(images))
# Make sure the shape is appropriate, we should get 10 class probabilities for 64 examples
print(ps.shape)

## torch.Size([64, 10])

With the probabilities, we can get the most likely class using the ps.topk method. This returns the $k$ highest values. Since we just want the most likely class, we can use ps.topk(1). This returns a tuple of the top-$k$ values and the top-$k$ indices. If the highest value is the fifth element, we’ll get back 4 as the index.
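To make the behaviour of topk for $k > 1$ concrete, here is a small sketch that uses it to compute the top-5 error rate mentioned earlier. The random `ps` and `labels` tensors are stand-ins for a real batch, so the resulting number is meaningless; only the shapes and the logic matter.

```python
import torch

torch.manual_seed(0)  # make the random example repeatable

# Stand-in for `ps`: 64 rows of class probabilities over 10 classes
ps = torch.softmax(torch.randn(64, 10), dim=1)
labels = torch.randint(0, 10, (64,))  # stand-in for the true labels

# Indices of the 5 most likely classes per example, shape (64, 5)
_, top5_class = ps.topk(5, dim=1)

# A row counts as correct if the true label appears anywhere in its top 5
hits = (top5_class == labels.view(-1, 1)).any(dim=1)
top5_error = 1 - hits.float().mean().item()
print(f'Top-5 error: {top5_error * 100:.1f}%')
```

The first column of `top5_class` is the same class index that `ps.topk(1, dim=1)` would return, since topk sorts its results from largest to smallest by default.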

top_p, top_class = ps.topk(1, dim=1)
# Look at the most likely classes for the first 10 examples
print(top_class[:10,:])

"""
tensor([[5],
        [3],
        [5],
        [9],
        [5],
        [9],
        [5],
        [5],
        [9],
        [5]])
"""

Now we can check if the predicted classes match the labels. This is simple to do by equating top_class and labels, but we have to be careful of the shapes. Here top_class is a 2D tensor with shape (64, 1) while labels is 1D with shape (64). To get the equality to work out the way we want, top_class and labels must have the same shape.

If we do

equals = top_class == labels

equals will have shape (64, 64); try it yourself. The comparison broadcasts, so the one element in each row of top_class is compared with every element in labels, which returns 64 True/False boolean values for each row. Reshaping labels to match top_class avoids this:

equals = top_class == labels.view(*top_class.shape)
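The shape issue is easier to see on a tiny example. This sketch uses three made-up predictions and labels instead of a batch of 64; the names `wrong` and `right` are only for illustration.

```python
import torch

top_class = torch.tensor([[2], [0], [1]])  # shape (3, 1), like the topk output
labels = torch.tensor([2, 1, 1])           # shape (3,), like labels from a loader

# Mismatched shapes broadcast: every prediction is compared with every label
wrong = top_class == labels                # shape (3, 3)

# Matching shapes compare element by element
right = top_class == labels.view(*top_class.shape)  # shape (3, 1)

print(wrong.shape)
print(right.shape)
print(right.squeeze(1))
```

Only the reshaped comparison gives one True/False value per example, which is what the accuracy calculation needs.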

Now we need to calculate the percentage of correct predictions. equals has binary values, either 0 or 1. This means that if we just sum up all the values and divide by the number of values, we get the fraction of correct predictions. This is the same operation as taking the mean, so we can get the accuracy with a call to torch.mean. If only it were that simple. If you try torch.mean(equals), you’ll get an error:

RuntimeError: mean is not implemented for type torch.ByteTensor

This happens because equals has type torch.ByteTensor but torch.mean isn’t implemented for tensors with that type. (In more recent PyTorch versions the comparison returns a torch.bool tensor instead, which torch.mean also rejects.) So we’ll need to convert equals to a float tensor. Note that torch.mean returns a scalar tensor; to get the actual value as a float we’ll need to do accuracy.item().

accuracy = torch.mean(equals.type(torch.FloatTensor))
print(f'Accuracy: {accuracy.item()*100}%')

"""
Accuracy: 14.0625%
"""

The network is untrained so it’s making random guesses and we should see an accuracy around 10%. Now let’s train our network and include our validation pass so we can measure how well the network is performing on the test set. Since we’re not updating our parameters in the validation pass, we can speed up the code by turning off gradients using torch.no_grad():

# turn off gradients
with torch.no_grad():
    # validation pass here
    for images, labels in testloader:
        ...

Exercise: Implement the validation loop below and print out the total accuracy after the loop. You can largely copy and paste the code from above, but I suggest typing it in because writing it out yourself is essential for building the skill. In general you’ll always learn more by typing it rather than copy-pasting. You should be able to get an accuracy above 80%.

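model = Classifier()
criterion = nn.NLLLoss()
# The optimizer definition is missing from this listing; Adam with lr=0.003 is an assumption
optimizer = optim.Adam(model.parameters(), lr=0.003)

epochs = 30
steps = 0

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    # trainloader is a DataLoader over the training set (train=True)
    for images, labels in trainloader:
        # Clear gradients left over from the previous step
        optimizer.zero_grad()

        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    else:
        ## TODO: Implement the validation pass and print out the validation accuracy
        print(f'Accuracy: {accuracy.item()*100}%')

%matplotlib inline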

import matplotlib.pyplot as plt

plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)
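For reference, here is one way the validation pass from the exercise might be written. This is a sketch under stated assumptions, not the notebook's solution: it uses random tensors in place of Fashion-MNIST so it runs without a download, a much smaller model, only two epochs, and `argmax` as a shortcut for `topk(1)`.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic stand-ins for Fashion-MNIST: 784 features, 10 classes (assumption)
train_ds = TensorDataset(torch.randn(256, 784), torch.randint(0, 10, (256,)))
test_ds = TensorDataset(torch.randn(128, 784), torch.randint(0, 10, (128,)))
trainloader = DataLoader(train_ds, batch_size=64, shuffle=True)
testloader = DataLoader(test_ds, batch_size=64)

model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(),
                      nn.Linear(32, 10), nn.LogSoftmax(dim=1))
criterion = nn.NLLLoss()
optimizer = optim.Adam(model.parameters(), lr=0.003)

train_losses, test_losses = [], []
for e in range(2):
    running_loss = 0.0
    for images, labels in trainloader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    # Validation pass: gradients are not needed here
    test_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in testloader:
            log_ps = model(images)
            test_loss += criterion(log_ps, labels).item()
            top_class = log_ps.argmax(dim=1)  # most likely class per example
            correct += (top_class == labels).sum().item()
            total += labels.shape[0]

    # Store per-batch averages so both curves are on the same scale
    train_losses.append(running_loss / len(trainloader))
    test_losses.append(test_loss / len(testloader))
    accuracy = correct / total
    print(f'Epoch {e + 1}: validation accuracy {accuracy * 100:.1f}%')
```

Counting correct predictions across all batches, instead of averaging per-batch accuracies, keeps the result exact even when the last batch is smaller than the others.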

## Overfitting

If we look at the training and validation losses as we train the network, we can see a phenomenon known as overfitting.

The network learns the training set better and better, resulting in lower training losses. However, it starts having problems generalizing to data outside the training set leading to the validation loss increasing. The ultimate goal of any deep learning model is to make predictions on new data, so we should strive to get the lowest validation loss possible. One option is to use the version of the model with the lowest validation loss, here the one around 8-10 training epochs. This strategy is called early-stopping. In practice, you’d save the model frequently as you’re training then later choose the model with the lowest validation loss.
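The early-stopping idea can be sketched as keeping a copy of the weights whenever the validation loss improves. The validation losses below are made up, the model is a tiny stand-in, and `track_best` is a hypothetical helper name; in a real loop each value would come from a validation pass after one epoch of training.

```python
import copy
import torch
from torch import nn

def track_best(model, val_loss, best):
    """Keep a copy of the weights whenever the validation loss improves."""
    if val_loss < best['loss']:
        best['loss'] = val_loss
        best['state'] = copy.deepcopy(model.state_dict())
        return True
    return False

model = nn.Linear(4, 2)  # tiny stand-in model
best = {'loss': float('inf'), 'state': None}

# Made-up validation losses that fall, then rise as overfitting sets in
val_losses = [0.52, 0.44, 0.41, 0.43, 0.47]
best_epoch = None
for epoch, val_loss in enumerate(val_losses, start=1):
    # (a real loop would train for one epoch here)
    if track_best(model, val_loss, best):
        best_epoch = epoch

# Restore the weights from the epoch with the lowest validation loss
model.load_state_dict(best['state'])
print(best_epoch, best['loss'])
```

The `deepcopy` matters because `state_dict()` returns references to the live parameters, which would otherwise keep changing as training continues.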

The most common method to reduce overfitting (outside of early-stopping) is dropout, where we randomly drop input units. This forces the network to share information between weights, increasing its ability to generalize to new data. Adding dropout in PyTorch is straightforward using the nn.Dropout module.

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

        # Dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)

        # Now with dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))

        # output so no dropout here
        x = F.log_softmax(self.fc4(x), dim=1)

        return x

During training we want to use dropout to prevent overfitting, but during inference we want to use the entire network. So, we need to turn off dropout during validation, testing, and whenever we’re using the network to make predictions. To do this, you use model.eval(). This sets the model to evaluation mode where the dropout probability is 0. You can turn dropout back on by setting the model to train mode with model.train(). In general, the pattern for the validation loop will look like this, where you turn off gradients, set the model to evaluation mode, calculate the validation loss and metric, then set the model back to train mode.

# turn off gradients
with torch.no_grad():

    # set model to evaluation mode
    model.eval()

    # validation pass here
    for images, labels in testloader:
        ...

# set model back to train mode
model.train()
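To see what switching modes actually does, this small sketch applies a standalone nn.Dropout layer to a vector of ones in both modes. The input tensor is made up for the demonstration.

```python
import torch
from torch import nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.2)
x = torch.ones(1000)

dropout.train()          # training mode: units are zeroed at random
train_out = dropout(x)

dropout.eval()           # evaluation mode: dropout passes the input through
eval_out = dropout(x)

dropped = (train_out == 0).float().mean().item()
print(train_out.unique())       # surviving units are scaled by 1 / (1 - p)
print(f'fraction dropped: {dropped:.2f}')
print(torch.equal(eval_out, x))
```

In training mode the surviving values are scaled up by 1 / (1 - p) so the expected activation stays the same, which is why nothing needs rescaling when dropout is turned off for evaluation.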

Exercise: Add dropout to your model and train it on Fashion-MNIST again. See if you can get a lower validation loss or higher accuracy.

## TODO: Define your model with dropout added

## TODO: Train your model with dropout, and monitor the training progress with the validation loss and accuracy


%matplotlib inline

import matplotlib.pyplot as plt

plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)

## Inference

Now that the model is trained, we can use it for inference. We’ve done this before, but now we need to remember to set the model in inference mode with model.eval(). You’ll also want to turn off autograd with the torch.no_grad() context.

# Import helper module (should be in the repo)
import helper

model.eval()

# Grab one batch from the test set
images, labels = next(iter(testloader))
img = images[0]
# Convert 2D image to 1D vector
img = img.view(1, 784)

# Calculate the class probabilities (softmax) for img
with torch.no_grad():
    output = model.forward(img)

ps = torch.exp(output)

# Plot the image and probabilities
helper.view_classify(img.view(1, 28, 28), ps, version='Fashion')
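The inference steps can also be wrapped in a small function. This is a minimal sketch: `predict` is a hypothetical helper, the model is an untrained stand-in, and the image is random noise, so the predicted class carries no meaning.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Untrained stand-in for the classifier (assumption, for illustration only)
model = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Dropout(p=0.2),
                      nn.Linear(64, 10), nn.LogSoftmax(dim=1))

def predict(model, img):
    """Return (class index, probability) for a single 28x28 image tensor."""
    model.eval()                   # turn off dropout
    with torch.no_grad():          # skip autograd bookkeeping
        log_ps = model(img.view(1, -1))
    ps = torch.exp(log_ps)         # log-probabilities -> probabilities
    top_p, top_class = ps.topk(1, dim=1)
    return top_class.item(), top_p.item()

img = torch.randn(1, 28, 28)       # random stand-in for a test image
cls, prob = predict(model, img)
print(cls, round(prob, 3))
```

Because the function sets evaluation mode itself, repeated calls on the same image return the same result even though the model contains a dropout layer.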

## Next Up!

In the next part, I’ll show you how to save your trained models. In general, you won’t want to train a model every time you need it. Instead, you’ll train once, save it, then load the model when you want to train more or use it for inference.

Full Code:

import torch
from torchvision import datasets, transforms

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

from torch import nn, optim
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)

        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)

        return x

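model = Classifier()

# Grab one batch from the test loader defined in the data-loading step
images, labels = next(iter(testloader))

# Get the class probabilities
ps = torch.exp(model(images))
# Make sure the shape is appropriate, we should get 10 class probabilities for 64 examples
print(ps.shape)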

## torch.Size([64, 10])

top_p, top_class = ps.topk(1, dim=1)
# Look at the most likely classes for the first 10 examples
print(top_class[:10,:])

"""
tensor([[5],
        [3],
        [5],
        [9],
        [5],
        [9],
        [5],
        [5],
        [9],
        [5]])
"""

equals = top_class == labels.view(*top_class.shape)

accuracy = torch.mean(equals.type(torch.FloatTensor))
print(f'Accuracy: {accuracy.item()*100}%')

"""
Accuracy: 14.0625%
"""

model = Classifier()
criterion = nn.NLLLoss()
# The optimizer definition is missing from this listing; Adam with lr=0.003 is an assumption
optimizer = optim.Adam(model.parameters(), lr=0.003)

epochs = 30
steps = 0

train_losses, test_losses = [], []
for e in range(epochs):
    running_loss = 0
    # trainloader is a DataLoader over the training set (train=True)
    for images, labels in trainloader:
        # Clear gradients left over from the previous step
        optimizer.zero_grad()

        log_ps = model(images)
        loss = criterion(log_ps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

    else:
        ## TODO: Implement the validation pass and print out the validation accuracy
        print(f'Accuracy: {accuracy.item()*100}%')

import matplotlib.pyplot as plt

plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

        # Dropout module with 0.2 drop probability
        self.dropout = nn.Dropout(p=0.2)

    def forward(self, x):
        # make sure input tensor is flattened
        x = x.view(x.shape[0], -1)

        # Now with dropout
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.dropout(F.relu(self.fc2(x)))
        x = self.dropout(F.relu(self.fc3(x)))

        # output so no dropout here
        x = F.log_softmax(self.fc4(x), dim=1)

        return x

## TODO: Train your model with dropout, and monitor the training progress with the validation loss and accuracy

import matplotlib.pyplot as plt

plt.plot(train_losses, label='Training loss')
plt.plot(test_losses, label='Validation loss')
plt.legend(frameon=False)

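# Import helper module (should be in the repo)
import helper

model.eval()

images, labels = next(iter(testloader))
img = images[0].view(1, 784)
with torch.no_grad():
    ps = torch.exp(model(img))

helper.view_classify(img.view(1, 28, 28), ps, version='Fashion')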