Training Neural Networks with Validation using PyTorch
Last Updated: 19 Aug, 2021
Neural networks are a biologically inspired programming paradigm around which deep learning is built. Python provides various libraries with which you can create and train neural networks over given data, and PyTorch is one such library that gives us utilities to build and train neural networks easily. When it comes to neural networks it becomes essential to choose an optimal architecture and hyperparameters. While training a neural network the training loss keeps decreasing as long as the learning rate is reasonable, but it's important that our network performs well not only on the data it was trained on but also on data it has never seen before. One way to measure this is by introducing a validation set and keeping track of how the network performs on it. In this article we'll see how we can keep track of the validation loss at each epoch and also save the model weights that give the lowest validation loss.
Installing PyTorch
Installing PyTorch is pretty similar to installing any other Python library. We can use pip or conda to install it:-
pip install torch torchvision
This command will install PyTorch along with torchvision which provides various datasets, models, and transforms for computer vision. To install using conda you can use the following command:-
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
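Once installed, you can quickly verify the setup, for example by printing the library versions and checking whether a GPU is visible. This is just a minimal sanity check and not required for the rest of the tutorial:-
import torch
import torchvision

# Print library versions and check whether CUDA is available
print(torch.__version__)          # e.g. 1.9.0 (your version may differ)
print(torchvision.__version__)    # e.g. 0.10.0
print(torch.cuda.is_available())  # True if a compatible GPU and driver are present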
Loading Data
For this tutorial, we are going to use the MNIST dataset that's provided in the torchvision library. In deep learning we often train our neural networks in batches of a certain size; DataLoader is a data-loading utility in PyTorch that creates an iterable over those batches of the dataset. Let's start by loading our data:-
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
transform = transforms.Compose([
    transforms.ToTensor()
])
In the above code, we declared a variable called transform which essentially helps us transform the raw data into the defined format. Here our transform simply takes the raw data and converts it to a tensor. A tensor is a fancy way of saying an n-dimensional matrix.
train = datasets.MNIST('', train=True, transform=transform, download=True)
train, valid = random_split(train, [50000, 10000])
Now we download the raw data and apply the transform over it to convert it to tensors; the train argument tells whether the data being loaded is the training or the testing split. In the end, we split the train dataset into two datasets of 50,000 and 10,000 data points, which become our train and valid sets.
trainloader = DataLoader(train, batch_size=32)
validloader = DataLoader(valid, batch_size=32)
Now we have created DataLoaders over the above datasets with a batch size of 32. Now that we have the data, let's start creating our neural network.
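As a quick sanity check (purely illustrative, not required for training), you can pull one batch out of the trainloader defined above and inspect its shape; with a batch size of 32 and 28x28 grayscale images you should see the shapes shown in the comments:-
# Grab a single batch from the DataLoader and inspect its shapes
images, labels = next(iter(trainloader))
print(images.shape)  # torch.Size([32, 1, 28, 28])
print(labels.shape)  # torch.Size([32])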
Building our Model
There are 2 ways we can create neural networks in PyTorch i.e. using the Sequential() method or using the class method. We’ll use the class method to create our neural network since it gives more control over data flow. The format to create a neural network using the class method is as follows:-
from torch import nn
class Model(nn.Module):
    def __init__(self):
        # Define the model layers here
        ...

    def forward(self, x):
        # Define the forward pass here
        ...
So in the __init__() method we define our layers and other variables and in the forward() method we define our forward pass i.e. how data flows through the layers.
import torch
from torch import nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        # Flatten the batch of images to shape (batch_size, 784)
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Network()
if torch.cuda.is_available():
    model = model.cuda()
In the above code, we defined a neural network with the following architecture:-
- Input Layer: 784 nodes. MNIST images are of dimension 28*28, i.e. 784 pixels, so when flattened each image becomes the input to the neural network with 784 input nodes.
- Hidden Layer 1: 256 nodes
- Hidden Layer 2: 128 nodes
- Output Layer: 10 nodes, for 10 classes i.e. numbers 0-9
nn.Linear() or Linear Layer is used to apply a linear transformation to the incoming data. If you are familiar with TensorFlow it’s pretty much like the Dense Layer.
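To make this concrete, here is a tiny, self-contained sketch (illustrative only, using random data) showing how nn.Linear maps a batch of 784-dimensional inputs to 256-dimensional outputs:-
import torch
from torch import nn

# A linear layer computes y = x @ W.T + b
fc = nn.Linear(784, 256)
x = torch.randn(32, 784)   # a dummy batch of 32 flattened images
y = fc(x)
print(y.shape)             # torch.Size([32, 256])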
In the forward() method we start by flattening the image, then pass it through each layer and apply the activation function. After that, we create an instance of our neural network, and lastly we check whether the machine has a GPU; if it does, we transfer our model there for faster computation.
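As another quick check (again just an illustrative sketch that reuses the model created above), you can feed a dummy MNIST-sized batch through the network and confirm that it produces one score per class:-
# Forward a dummy batch of 32 single-channel 28x28 images
dummy = torch.randn(32, 1, 28, 28)
if torch.cuda.is_available():
    dummy = dummy.cuda()   # keep the data on the same device as the model
out = model(dummy)
print(out.shape)           # torch.Size([32, 10]) -- one score per digit class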
Defining Criterion and Optimizer
Optimizers define how the weights of the neural network are to be updated, in this tutorial we’ll use SGD Optimizer or Stochastic Gradient Descent Optimizer. Optimizers take model parameters and learning rate as the input arguments. There are various optimizers you can try like Adam, Adagrad, etc.
The criterion is the loss that you want to minimize which in this case is the CrossEntropyLoss() which is the combination of log_softmax() and NLLLoss().
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)
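If you want to convince yourself of the log_softmax() + NLLLoss() equivalence, this small sketch (illustrative only, using random logits and labels) computes the same loss both ways:-
logits = torch.randn(4, 10)            # raw model outputs for 4 samples
targets = torch.tensor([3, 7, 0, 1])   # ground-truth class indices

loss_ce = nn.CrossEntropyLoss()(logits, targets)
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)
print(loss_ce.item(), loss_nll.item())  # the two values match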
Training Neural Network with Validation
The training step in PyTorch is almost identical every time you train a model. But before implementing it, let's learn about the two modes of the model object:-
- Training Mode: Set by model.train(), it tells your model that you are training it, so layers like dropout, which behave differently during training and testing, can behave accordingly (see the short sketch after this list).
- Evaluation Mode: Set by model.eval(), it tells your model that you are evaluating or testing it.
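Here is a tiny sketch (illustrative only; our model above has no dropout layers) showing how a Dropout layer behaves differently in the two modes:-
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))  # roughly half the values zeroed, the rest scaled by 2

drop.eval()
print(drop(x))  # identity: all ones, dropout is disabled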
Even though you don't strictly need them here (our model has no such layers), it's still better to know about them. Now that we have that clear, let's understand the training steps:-
- Move data to GPU (Optional)
- Clear the gradients using optimizer.zero_grad()
- Make a forward pass
- Calculate the loss
- Perform a backward pass using loss.backward() to calculate the gradients
- Take optimizer step using optimizer.step() to update the weights
The validation and testing steps are similar, but there you only make a forward pass and calculate the loss. A simple training loop without validation is written like the following:-
from tqdm import tqdm  # progress bar; install with pip install tqdm if needed

epochs = 5
for e in range(epochs):
    train_loss = 0.0
    for data, labels in tqdm(trainloader):
        # Transfer Data to GPU if available
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()

        # Clear the gradients
        optimizer.zero_grad()
        # Forward Pass
        target = model(data)
        # Find the Loss
        loss = criterion(target, labels)
        # Calculate gradients
        loss.backward()
        # Update Weights
        optimizer.step()
        # Accumulate the loss for this epoch
        train_loss += loss.item()

    print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)}')
If you add the validation loop it'll be the same, but with only a forward pass and loss calculation. However, it may happen that the last epoch isn't the one that gave you the lowest validation loss. To tackle this we can initialize the minimum validation loss to np.inf, and whenever the current validation loss is less than that, we save the state dictionary of the model, which we can load later like a checkpoint. state_dict is an OrderedDict object that maps each layer to its parameter tensor.
import numpy as np

epochs = 5
min_valid_loss = np.inf

for e in range(epochs):
    train_loss = 0.0
    model.train()     # Optional when not using model-specific layers
    for data, labels in trainloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()

        optimizer.zero_grad()
        target = model(data)
        loss = criterion(target, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    valid_loss = 0.0
    model.eval()      # Optional when not using model-specific layers
    for data, labels in validloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()

        target = model(data)
        loss = criterion(target, labels)
        valid_loss += loss.item()

    print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)} \t\t Validation Loss: {valid_loss / len(validloader)}')

    if min_valid_loss > valid_loss:
        print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving The Model')
        min_valid_loss = valid_loss

        # Saving State Dict
        torch.save(model.state_dict(), 'saved_model.pth')
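To reuse the saved checkpoint later, you can load the state dictionary back into a freshly created model. This is a minimal sketch; saved_model.pth is simply the file name used above:-
# Recreate the architecture and load the best weights back in
best_model = Network()
best_model.load_state_dict(torch.load('saved_model.pth'))
best_model.eval()  # switch to evaluation mode before running inference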
After running the training loop you should see the training and validation loss printed for each epoch, along with a message whenever the model is saved; the exact values will vary between runs.
Complete Code
import torch
from torch import nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, random_split
import numpy as np

# Load MNIST, convert images to tensors and split into train/validation sets
transform = transforms.Compose([
    transforms.ToTensor()
])

train = datasets.MNIST('', train=True, transform=transform, download=True)
train, valid = random_split(train, [50000, 10000])

trainloader = DataLoader(train, batch_size=32)
validloader = DataLoader(valid, batch_size=32)

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(28*28, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

model = Network()
if torch.cuda.is_available():
    model = model.cuda()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

epochs = 5
min_valid_loss = np.inf

for e in range(epochs):
    train_loss = 0.0
    model.train()
    for data, labels in trainloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()

        optimizer.zero_grad()
        target = model(data)
        loss = criterion(target, labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()

    valid_loss = 0.0
    model.eval()
    for data, labels in validloader:
        if torch.cuda.is_available():
            data, labels = data.cuda(), labels.cuda()

        target = model(data)
        loss = criterion(target, labels)
        valid_loss += loss.item()

    print(f'Epoch {e+1} \t\t Training Loss: {train_loss / len(trainloader)} \t\t Validation Loss: {valid_loss / len(validloader)}')

    if min_valid_loss > valid_loss:
        print(f'Validation Loss Decreased({min_valid_loss:.6f}--->{valid_loss:.6f}) \t Saving The Model')
        min_valid_loss = valid_loss
        torch.save(model.state_dict(), 'saved_model.pth')