How to Implement Various Optimization Algorithms in PyTorch?

Last Updated : 23 Jul, 2025

Optimization algorithms are an essential aspect of deep learning, and PyTorch provides a wide range of optimization algorithms to help us train our neural networks effectively. In this article, we will explore various optimization algorithms in PyTorch and demonstrate how to implement them. We will use a simple neural network for the demonstration.

NOTE: If PyTorch is not installed on your system, install it by running the following command in your terminal or command prompt:

pip install torch torchvision

This will install the PyTorch module along with torchvision, which is a package that provides access to popular datasets, model architectures, and image transformations for PyTorch. Once you have installed these modules, you should be able to run the code without any errors.

Implementations

Import Libraries:

First, we need to import the required libraries. We will be using the PyTorch framework, so we will import the torch library. We will also use the MNIST dataset to train our neural network, so we will import the torchvision library.

Python3

import torch
import torchvision
import torchvision.transforms as transforms

Load Data:

Next, we will load the MNIST dataset and prepare it for training. We will normalize the data and create batches of data using the DataLoader class.

Python3

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

trainset = torchvision.datasets.MNIST(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64,
                                          shuffle=True, num_workers=2)

Output:

Files already downloaded and verified

Build Neural Network Model:

We will define a simple neural network with two hidden layers, each with 128 neurons, and an output layer with 10 neurons, one for each digit. We will use the ReLU activation function for the hidden layers. The output layer returns raw scores (logits) rather than softmax probabilities, because the cross-entropy loss used in the next step applies log-softmax internally; applying softmax inside the model as well would apply it twice and weaken the training signal.

Python3

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(784, 128)
        self.fc2 = torch.nn.Linear(128, 128)
        self.fc3 = torch.nn.Linear(128, 10)

    def forward(self, x):
        # Flatten each 28x28 image into a 784-element vector
        x = x.view(-1, 784)
        x = torch.nn.functional.relu(self.fc1(x))
        x = torch.nn.functional.relu(self.fc2(x))
        # Return raw logits; CrossEntropyLoss applies log-softmax itself
        x = self.fc3(x)
        return x

net = Net()
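The interaction between the output layer and the loss is easy to check in isolation. The sketch below uses made-up logits and labels (both hypothetical) to show that cross-entropy on raw logits equals negative log-likelihood on log-softmax outputs, and that feeding softmax probabilities into the same loss gives a different value.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical raw scores for 4 samples and 10 classes, plus made-up labels.
logits = torch.randn(4, 10) * 5
labels = torch.tensor([1, 0, 3, 9])

# cross_entropy expects raw logits and applies log-softmax internally.
loss_on_logits = F.cross_entropy(logits, labels)
manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

# Passing probabilities instead means softmax is effectively applied twice.
# For 10 classes this value cannot fall below roughly 1.46, however
# confident and correct the predictions are.
loss_on_probs = F.cross_entropy(F.softmax(logits, dim=1), labels)

print(loss_on_logits.item(), manual.item(), loss_on_probs.item())
```

The first two values agree, which is why a model trained with CrossEntropyLoss normally returns logits and applies softmax only when probabilities are needed for reporting.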

Loss Function and Optimization Algorithm:

We will use the cross-entropy loss function to train our neural network. We will also use various optimization algorithms, such as stochastic gradient descent (SGD), Adam, Adagrad, and Adadelta, to train our neural network. We will define these optimization algorithms and their hyperparameters as follows:

Python3

criterion = torch.nn.CrossEntropyLoss()

# SGD optimizer
optimizer_sgd = torch.optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

# Adam optimizer
optimizer_adam = torch.optim.Adam(net.parameters(), lr=0.01, betas=(0.9, 0.999))

# Adagrad optimizer
optimizer_adagrad = torch.optim.Adagrad(net.parameters(), lr=0.01)

# Adadelta optimizer
optimizer_adadelta = torch.optim.Adadelta(net.parameters(), rho=0.9)
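To illustrate what one of these optimizers does on each step, the following sketch reproduces SGD with momentum by hand and compares it with torch.optim.SGD. It assumes the default settings (no dampening, no Nesterov momentum, no weight decay) and a toy loss, sum(w^2); the names w_ref, w_manual and buf are illustrative only.

```python
import torch

torch.manual_seed(0)

lr, momentum = 0.01, 0.9

# The same starting point for the library optimizer and the manual update.
w_ref = torch.randn(3, requires_grad=True)
w_manual = w_ref.detach().clone()

opt = torch.optim.SGD([w_ref], lr=lr, momentum=momentum)
buf = torch.zeros_like(w_manual)  # momentum buffer for the manual version

for _ in range(3):
    # Library update on the toy loss sum(w^2)
    opt.zero_grad()
    loss = (w_ref ** 2).sum()
    loss.backward()
    opt.step()

    # Manual update: the gradient of sum(w^2) is 2 * w
    grad = 2 * w_manual
    buf = momentum * buf + grad      # accumulate a running direction
    w_manual = w_manual - lr * buf   # step along the accumulated direction

print(w_ref.detach(), w_manual)
```

Adam, Adagrad, and Adadelta follow the same zero_grad / backward / step pattern but keep additional per-parameter statistics to scale each update.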

Now, Train the Neural Network:

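We will now train our neural network for 10 epochs and print the loss and accuracy after each epoch. Note that the loop below calls step() on all four optimizers for every batch, so the four update rules are applied one after another to the same parameters using the same gradients. This shows the mechanics of each optimizer, but it is not a comparison between them; to compare optimizers, train a separate copy of the model with each one.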

Python3

# Train the neural network, stepping every optimizer on each batch
# Use whichever device the model's parameters already live on
device = next(net.parameters()).device
for epoch in range(10):
    running_loss = 0.0
    correct = 0
    total = 0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        # move inputs and labels to the same device as the model
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer_sgd.zero_grad()
        optimizer_adam.zero_grad()
        optimizer_adagrad.zero_grad()
        optimizer_adadelta.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer_sgd.step()
        optimizer_adam.step()
        optimizer_adagrad.step()
        optimizer_adadelta.step()
        running_loss += loss.item()
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Epoch: %d | Loss: %.3f | Accuracy: %.3f %%' %
          (epoch + 1, running_loss / len(trainloader), 100 * correct / total))

Sample output (exact values will differ from run to run):

Epoch: 1 | Loss: 1.589 | Accuracy: 42.224 %
Epoch: 2 | Loss: 1.377 | Accuracy: 51.298 %
Epoch: 3 | Loss: 1.314 | Accuracy: 54.116 %
Epoch: 4 | Loss: 1.272 | Accuracy: 55.800 %
Epoch: 5 | Loss: 1.249 | Accuracy: 57.118 %
Epoch: 6 | Loss: 1.223 | Accuracy: 57.998 %
Epoch: 7 | Loss: 1.204 | Accuracy: 58.720 %
Epoch: 8 | Loss: 1.191 | Accuracy: 59.426 %
Epoch: 9 | Loss: 1.181 | Accuracy: 59.916 %
Epoch: 10 | Loss: 1.176 | Accuracy: 60.258 %
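For a side-by-side comparison, each optimizer needs its own copy of the model with identical starting weights. The sketch below shows one way to set this up. To stay small and self-contained it uses random synthetic data and a tiny model in place of MNIST and Net, so base_model, X, y, and optimizer_factories are all illustrative assumptions, and the resulting losses say nothing about performance on real data.

```python
import copy
import torch

torch.manual_seed(0)

# Synthetic stand-in for a dataset: 256 samples, 20 features, 3 classes.
X = torch.randn(256, 20)
y = torch.randint(0, 3, (256,))

# Small stand-in model; every optimizer starts from a copy of these weights.
base_model = torch.nn.Sequential(
    torch.nn.Linear(20, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 3),
)
criterion = torch.nn.CrossEntropyLoss()

# Same hyperparameters as in the article, wrapped so each run gets a
# fresh optimizer bound to its own model copy.
optimizer_factories = {
    "SGD": lambda p: torch.optim.SGD(p, lr=0.01, momentum=0.9),
    "Adam": lambda p: torch.optim.Adam(p, lr=0.01, betas=(0.9, 0.999)),
    "Adagrad": lambda p: torch.optim.Adagrad(p, lr=0.01),
    "Adadelta": lambda p: torch.optim.Adadelta(p, rho=0.9),
}

final_losses = {}
for name, make_optimizer in optimizer_factories.items():
    model = copy.deepcopy(base_model)  # identical starting weights
    optimizer = make_optimizer(model.parameters())
    for step in range(50):
        optimizer.zero_grad()
        loss = criterion(model(X), y)
        loss.backward()
        optimizer.step()  # only this optimizer touches this copy
    final_losses[name] = loss.item()

for name, value in final_losses.items():
    print(f"{name}: final loss {value:.3f}")
```

Because every run starts from the same weights and sees the same data, differences in the final loss can be attributed to the optimizer and its hyperparameters.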

Use different optimization algorithms for different parts of the model

Python3

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Define a neural network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))
        x = self.pool(torch.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = self.fc3(x)
        return x

# Define the training dataset and data loader
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform)
trainloader = DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

# Move the model to the GPU if one is available, otherwise keep it on the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = Net().to(device)

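# Define the optimization algorithms, one per group of layers.
# net.parameters() does not accept a layer name, so each optimizer is given
# the parameters of specific submodules. The convolutional layers are grouped
# with fc3 so that every parameter is updated by exactly one optimizer.
sgd_params = (list(net.conv1.parameters()) + list(net.conv2.parameters())
              + list(net.fc3.parameters()))
optimizers = [optim.SGD(sgd_params, lr=0.001, momentum=0.9),
              optim.Adagrad(net.fc2.parameters(), lr=0.001),
              optim.Adam(net.fc1.parameters(), lr=0.001)]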


# Train the neural network using different optimization algorithms
for epoch in range(10):
    running_loss = 0.0
    correct = 0
    total = 0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data
        # move data and target to the GPU
        inputs, labels = inputs.to(device), labels.to(device)
        for optimizer in optimizers:
            optimizer.zero_grad()
        outputs = net(inputs)
        
        EntropyLoss = nn.CrossEntropyLoss()(outputs, labels)
        fc1_loss = nn.L1Loss()(net.fc1.weight, torch.zeros_like(net.fc1.weight))
        fc2_loss = nn.L1Loss()(net.fc2.weight, torch.zeros_like(net.fc2.weight))
        total_loss = EntropyLoss + fc1_loss + fc2_loss
        total_loss.backward()
        
        for optimizer in optimizers:
            optimizer.step()
        running_loss += total_loss.item()
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
    print('Epoch: %d | Loss: %.3f | Accuracy: %.3f %%' %
          (epoch + 1, running_loss / len(trainloader), 100 * correct / total))

Sample output (exact values will differ from run to run):

Files already downloaded and verified
Epoch: 1 | Loss: 1.634 | Accuracy: 41.848 %
Epoch: 2 | Loss: 1.436 | Accuracy: 50.932 %
Epoch: 3 | Loss: 1.367 | Accuracy: 54.456 %
Epoch: 4 | Loss: 1.318 | Accuracy: 56.632 %
Epoch: 5 | Loss: 1.287 | Accuracy: 58.154 %
Epoch: 6 | Loss: 1.270 | Accuracy: 59.088 %
Epoch: 7 | Loss: 1.247 | Accuracy: 60.192 %
Epoch: 8 | Loss: 1.235 | Accuracy: 60.676 %
Epoch: 9 | Loss: 1.226 | Accuracy: 61.344 %
Epoch: 10 | Loss: 1.220 | Accuracy: 61.608 %
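A point worth checking when splitting a model across optimizers is which parameters each optimizer actually receives. Module.parameters() takes only a boolean recurse flag, so a call such as net.parameters('fc3') does not select the fc3 layer: the string is treated as a truthy flag and every parameter is returned. The sketch below demonstrates this on a small hypothetical model (TinyNet) and then verifies that per-submodule optimizers cover all parameters without overlap.

```python
import torch.nn as nn
import torch.optim as optim


class TinyNet(nn.Module):
    """Hypothetical stand-in model with three linear layers."""

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, 4)
        self.fc3 = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc3(self.fc2(self.fc1(x)))


net = TinyNet()

# Each Linear layer has a weight and a bias, so 3 layers give 6 tensors.
n_all = len(list(net.parameters()))
n_with_string = len(list(net.parameters('fc3')))  # string acts as recurse=True
n_fc3_only = len(list(net.fc3.parameters()))      # selects fc3 only
print(n_all, n_with_string, n_fc3_only)

# One optimizer per submodule.
optimizers = [
    optim.SGD(net.fc3.parameters(), lr=0.001, momentum=0.9),
    optim.Adagrad(net.fc2.parameters(), lr=0.001),
    optim.Adam(net.fc1.parameters(), lr=0.001),
]

# Collect the identity of every parameter owned by some optimizer.
owned = [id(p)
         for opt in optimizers
         for group in opt.param_groups
         for p in group["params"]]

no_overlap = len(owned) == len(set(owned))
full_coverage = set(owned) == {id(p) for p in net.parameters()}
print(no_overlap, full_coverage)
```

A check like this guards against two failure modes: parameters updated by more than one optimizer, and parameters that no optimizer updates at all.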

Advantages and disadvantages of implementing various Optimization Algorithms in PyTorch

Advantages:

Improved training performance: Using different optimization algorithms for different parts of the model can improve training by letting each part learn with an update rule and learning rate that suit it.
Better convergence: Some optimization algorithms work better for particular types of layers or architectures. Combining several optimizers lets us take advantage of their respective strengths to achieve better convergence.
Regularization: Different optimization algorithms can have different regularizing effects on the model, which can help reduce overfitting and improve the model's ability to generalize.

Disadvantages:

Increased complexity: Using multiple optimization algorithms adds complexity, which can require more training time and resources and makes the code harder to maintain and debug.
Risk of instability: Using several optimization algorithms can make training less stable, especially if they are given overlapping parameters, because different algorithms may then update the same parameter in conflicting or oscillating ways.
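When the goal is mainly a different learning rate for each part of the model, rather than a different algorithm, a single optimizer with parameter groups avoids much of this complexity. The following is a minimal sketch with a hypothetical two-layer model; the learning rates are arbitrary illustration values.

```python
import torch.nn as nn
import torch.optim as optim

# Hypothetical two-layer model used only for illustration.
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# One optimizer, two parameter groups. A group may override any default
# hyperparameter; groups that do not override it inherit the default.
optimizer = optim.SGD(
    [
        {"params": model[0].parameters(), "lr": 1e-3},  # slower first layer
        {"params": model[2].parameters()},              # uses default lr
    ],
    lr=1e-2,
    momentum=0.9,
)

lrs = [group["lr"] for group in optimizer.param_groups]
print(lrs)
```

Each parameter still belongs to exactly one group, so there is no risk of two update rules acting on the same parameter.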

worldhello
