Multinomial Logistic Regression with PyTorch

Last Updated : 24 Apr, 2025

Logistic regression is a popular machine learning algorithm for binary classification tasks. It models the probability of the output variable (also known as the dependent variable) given the input variables (also known as the independent variables). It is a linear model: like linear regression, it computes a weighted sum of the inputs, and it then passes that score through the logistic (sigmoid) function, which maps any real number to a probability between 0 and 1.
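Written out, for an input vector x with weight vector w and bias b (notation introduced here for illustration), binary logistic regression models the probability of the positive class as:

```latex
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}
```

The probability of the other class is 1 minus this value.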

Logistic regression has several variants, including binary logistic regression, multinomial logistic regression, and ordinal logistic regression. Binary logistic regression is used for binary classification tasks, where there are only two possible outcomes for the output variable. Multinomial logistic regression, also known as softmax regression, is used for multi-class classification tasks, where there are more than two possible outcomes for the output variable. Ordinal logistic regression is used for ordered multi-class classification tasks, where the outcomes have a natural ordering (e.g. low, medium, high).

In this article, we will focus on implementing multinomial logistic regression using PyTorch.

Data Preparation

For our example, we will be using the famous Iris dataset, which contains measurements of the sepal length, sepal width, petal length, and petal width for three species of iris flowers (Iris setosa, Iris versicolor, and Iris virginica). The goal is to predict the species of an iris flower based on these measurements.

Before training our multinomial logistic regression model, we need to preprocess our data. In this case, we will standardize each input feature to have a mean of 0 and a standard deviation of 1. Putting all features on a common scale typically helps gradient-based training converge faster and more reliably.
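A minimal sketch of this z-score normalization, (x - mean) / std computed per feature, on a toy tensor rather than the Iris data; standardize() is a hypothetical helper. The statistics come from the training rows only and are reused unchanged for any other split:

```python
import torch

def standardize(train: torch.Tensor, other: torch.Tensor):
    """Z-score both tensors using per-feature statistics of `train` only."""
    mean = train.mean(dim=0)
    std = train.std(dim=0)  # unbiased estimate, same default as torch.std
    return (train - mean) / std, (other - mean) / std

# Toy data: 5 samples, 2 features on very different scales
train = torch.tensor([[1.0, 100.0],
                      [2.0, 200.0],
                      [3.0, 300.0],
                      [4.0, 400.0],
                      [5.0, 500.0]])
other = torch.tensor([[3.0, 300.0]])  # equals the training mean

train_std, other_std = standardize(train, other)
print(train_std.mean(dim=0))  # approximately zero for each feature
print(train_std.std(dim=0))   # approximately one for each feature
print(other_std)              # zeros, since the row equals the training mean
```

After standardization both features live on the same scale, even though the second one was originally 100 times larger.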

We can load and prepare the dataset with scikit-learn and PyTorch as follows:

Python3

# Import the necessary libraries
import torch
import torch.nn as nn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from torch.utils.data import TensorDataset, DataLoader
from torchinfo import summary  # third-party package: pip install torchinfo

# Load the Iris dataset
iris = load_iris()

# Convert the data to PyTorch tensors
X = torch.tensor(iris.data, dtype=torch.float32)
y = torch.tensor(iris.target, dtype=torch.long)

# Split the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Normalize the inputs with statistics computed on the training set only,
# so that no information from the validation set leaks into training
mean = torch.mean(X_train, dim=0)
std = torch.std(X_train, dim=0)
X_train = (X_train - mean) / std
X_val = (X_val - mean) / std

# Create PyTorch Datasets
train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)

# Define the data loaders
batch_size = 16
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

In this code, we begin by importing the required packages: torch and torch.nn, load_iris and train_test_split from scikit-learn, TensorDataset and DataLoader from PyTorch, and summary from torchinfo (used later to print the model summary).

We load the Iris dataset using load_iris() from scikit-learn. The data is then converted to PyTorch tensors using torch.tensor(), with the data type set through dtype: torch.float32 for the input features and torch.long for the labels, since nn.CrossEntropyLoss expects integer class indices as targets.

Next, we split the dataset into training and validation sets using train_test_split() from scikit-learn. We set test_size to 0.2, which reserves 20% of the 150 samples (30 samples) for validation, and random_state to 42 for reproducibility.

We then normalize the input data using the mean and standard deviation calculated from the training data only, and apply those same statistics to the validation data. This gives each input feature a similar scale, which helps training, while keeping the validation set unseen during preprocessing.

We create PyTorch Datasets using TensorDataset(), which pairs the input and label tensors for each set. Then, we create PyTorch DataLoaders using DataLoader() to serve the data in batches. We use a batch size of 16 for both sets, shuffle the training data on every epoch (shuffle=True), and keep the validation data in a fixed order (shuffle=False), since shuffling does not affect evaluation.
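With 150 samples and test_size=0.2, the training split holds 120 samples, so a batch size of 16 gives 8 batches per epoch: seven full batches and a final batch of 8, because DataLoader keeps an incomplete last batch unless drop_last=True is set. A quick check of that arithmetic, using random tensors as a stand-in for the real training split:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in for the Iris training split: 120 samples, 4 features, 3 classes
X_train = torch.randn(120, 4)
y_train = torch.randint(0, 3, (120,))

train_loader = DataLoader(TensorDataset(X_train, y_train), batch_size=16, shuffle=True)

# Size of every batch produced in one pass (one epoch) over the data
batch_sizes = [inputs.shape[0] for inputs, labels in train_loader]
print(len(train_loader))  # 8 batches per epoch
print(batch_sizes)        # [16, 16, 16, 16, 16, 16, 16, 8]
```

Shuffling only changes which samples land in each batch, not the batch sizes.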

Now that we have preprocessed and split our data, we can move on to building our multinomial logistic regression model using PyTorch.

Model Creation

Multinomial logistic regression generalizes logistic regression to dependent variables with three or more categories. It computes one linear score per category and converts the scores jointly into probabilities with the softmax function, so the probabilities of all categories sum to 1. The category with the highest probability is then selected as the predicted outcome.
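With K classes and one weight vector w_k and bias b_k per class (notation assumed here, extending the binary case), the model can be written as:

```latex
P(y = k \mid \mathbf{x}) = \frac{\exp(\mathbf{w}_k^\top \mathbf{x} + b_k)}{\sum_{j=1}^{K} \exp(\mathbf{w}_j^\top \mathbf{x} + b_j)},
\qquad
\hat{y} = \arg\max_{k} \, P(y = k \mid \mathbf{x})
```

With K = 2, this reduces to the sigmoid of the difference between the two class scores, which is binary logistic regression.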

We can implement multinomial logistic regression using PyTorch by defining a neural network with a single linear layer and a softmax activation function. The linear layer takes in the input data and outputs a vector of logits (i.e. unnormalized log probabilities), which are then passed through the softmax function to obtain a vector of probabilities.
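To see what the softmax step does, here is a minimal sketch with made-up logits for two samples and three classes (the values are arbitrary). Softmax is applied along dim=1, the class dimension, as in the model's forward() method:

```python
import torch

# Hypothetical logits for 2 samples and 3 classes (e.g. the output of nn.Linear(4, 3))
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 3.0]])

probs = torch.softmax(logits, dim=1)  # normalize each row into probabilities
print(probs.sum(dim=1))               # each row sums to 1
print(probs.argmax(dim=1))            # tensor([0, 2])
print(logits.argmax(dim=1))           # tensor([0, 2]): same predicted classes
```

Because softmax is monotonic, it does not change which class has the highest score; it only rescales the scores into probabilities.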

Python3

class LogisticRegression(nn.Module):
    def __init__(self, input_size, num_classes):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, x):
        out = self.linear(x)
        out = nn.functional.softmax(out, dim=1)
        return out

In the above code, we define a PyTorch module called LogisticRegression that inherits from nn.Module. Its single linear layer maps input_size input features (4 for Iris) to num_classes outputs (3 for Iris), one score per class. The forward() method passes the input x through this linear layer and then applies softmax across the class dimension (dim=1), so each row of the output is a probability distribution over the classes.
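One caveat about this design: nn.CrossEntropyLoss, which is used for training later in this article, applies log-softmax internally and expects raw logits. Because forward() above already returns softmax probabilities, softmax is effectively applied twice during training. The model still learns, but the gradients tend to be smaller, and with three classes the loss can never fall below log(1 + 2/e) ≈ 0.55, even for a perfectly confident correct prediction. A common alternative, sketched here, assumes you are free to change the model; the class name LogitsLogisticRegression and its predict_proba() method are hypothetical. forward() returns logits, and softmax is applied only when probabilities are needed:

```python
import math
import torch
import torch.nn as nn

class LogitsLogisticRegression(nn.Module):
    """Multinomial logistic regression that returns raw logits (hypothetical variant)."""

    def __init__(self, input_size: int, num_classes: int):
        super().__init__()
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # No softmax here: nn.CrossEntropyLoss applies log-softmax itself
        return self.linear(x)

    @torch.no_grad()
    def predict_proba(self, x: torch.Tensor) -> torch.Tensor:
        # Apply softmax only when class probabilities are actually needed
        return torch.softmax(self.forward(x), dim=1)

model = LogitsLogisticRegression(input_size=4, num_classes=3)
x = torch.randn(5, 4)
print(model(x).shape)                     # torch.Size([5, 3]) logits
print(model.predict_proba(x).sum(dim=1))  # each row sums to 1

# Loss floor when softmax outputs are fed to CrossEntropyLoss (3 classes):
# even a perfect one-hot "probability" vector still gives log(1 + 2/e)
criterion = nn.CrossEntropyLoss()
perfect_probs = torch.tensor([[1.0, 0.0, 0.0]])
floor = criterion(perfect_probs, torch.tensor([0])).item()
print(round(floor, 4), round(math.log(1 + 2 / math.e), 4))  # both about 0.5514
```

With this variant, the evaluation code does not need to change, since the argmax of the logits gives the same class as the argmax of the probabilities.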

Print the model summary

We instantiate the model by creating an instance of the LogisticRegression class with input_size=4 and num_classes=3.

If a GPU is available, we set the device to 'cuda:0'; otherwise, we set it to 'cpu'. Next, we move the model to that device with the to() method, so that its parameters live on the same hardware that will run the training computations.

Finally, we print the model summary with torchinfo's summary(), passing an example input size of (16, 4), which matches one batch of 16 samples with 4 features each.

Python3

# Define the model
model = LogisticRegression(input_size=4, num_classes=3)

# Check for cuda
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Move the model to the device
model = model.to(device)
summary(model, input_size=(16,4))

Output:

==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
LogisticRegression                       [16, 3]                   --
├─Linear: 1-1                            [16, 3]                   15
==========================================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
Total mult-adds (M): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
==========================================================================================
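The 15 parameters come from the 4 × 3 weight matrix of the linear layer plus one bias per class (3). The same count can be checked without torchinfo, assuming only torch is installed:

```python
import torch.nn as nn

linear = nn.Linear(4, 3)  # same shape as the model's only layer
print(linear.weight.shape, linear.bias.shape)  # torch.Size([3, 4]) torch.Size([3])

# Count every trainable parameter element
num_params = sum(p.numel() for p in linear.parameters() if p.requires_grad)
print(num_params)  # 15 = 4 * 3 weights + 3 biases
```

Note that nn.Linear stores its weight as (out_features, in_features), which is why the shape prints as [3, 4].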

Define a loss function and an optimizer

To train our model, we need to define a loss function and an optimizer. For this example, we will use the cross-entropy loss and the stochastic gradient descent (SGD) optimizer.

Python3

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.002)

In the above code, we define our loss function as nn.CrossEntropyLoss(), which is commonly used for multi-class classification problems. We also define our optimizer as torch.optim.SGD(), which implements stochastic gradient descent with a learning rate of 0.002. We pass in our model parameters using model.parameters().
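nn.CrossEntropyLoss expects a float tensor of scores with shape (batch, num_classes) and integer class indices with shape (batch,), which is why the labels were created with dtype=torch.long. For each sample it takes the negative log-softmax value at the true class and, with the default reduction='mean', averages over the batch. A small check of that definition, using made-up logits:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])  # shape (batch=2, num_classes=3)
labels = torch.tensor([0, 2])             # class indices, dtype torch.int64

criterion = nn.CrossEntropyLoss()         # reduction="mean" by default
loss = criterion(logits, labels)

# Same value computed by hand: mean of -log_softmax at each true class
manual = -F.log_softmax(logits, dim=1)[torch.arange(2), labels].mean()
print(loss.item(), manual.item())         # the two values match
```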

Now that we have defined our model, loss function, and optimizer, we can move on to training the model using our training dataset.

Training and Validation

To train the model, we need a few training parameters. The batch size (16) and the device (CPU or GPU) were already set above, so here we only define the number of epochs, that is, how many full passes over the training data to make.

Then, we start training the model using a nested loop. In the outer loop, we iterate over the specified number of epochs. In the inner loop, we iterate over the batches of data in the training DataLoader.

For each batch, we move the input data and labels to the specified device using the to() method. Then, we perform a forward pass through the model to get its predictions and compute the loss between those predictions and the true labels with the loss function defined earlier.

Next, we clear the gradients left over from the previous batch with optimizer.zero_grad(), because PyTorch accumulates gradients across backward() calls by default. We then perform backpropagation with loss.backward() to compute the gradients of the loss with respect to the model's parameters, and update the parameters with the optimizer's step() method.
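The reason for calling optimizer.zero_grad() is that backward() adds new gradients to the values already stored in each parameter's .grad attribute instead of overwriting them. A tiny illustration with a single scalar parameter (w is hypothetical, not part of the model):

```python
import torch

w = torch.tensor(3.0, requires_grad=True)  # hypothetical single parameter
optimizer = torch.optim.SGD([w], lr=0.1)

(w ** 2).backward()        # d(w^2)/dw = 2 * w = 6
first = w.grad.item()

(w ** 2).backward()        # a second backward() adds to the stored gradient
accumulated = w.grad.item()

optimizer.zero_grad()      # clears the stored gradient before the next pass
(w ** 2).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 6.0 12.0 6.0
```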

Every 100 epochs, we print the loss of the last mini-batch processed in that epoch.

Python3

# Define training parameters
num_epochs = 1000

# Train the model
for epoch in range(num_epochs):
    for i, (inputs, labels) in enumerate(train_loader):
        # Move inputs and labels to the device
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Print training loss for each epoch
    if (epoch+1)%100 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))

Output:

Epoch [100/1000], Loss: 0.8293
Epoch [200/1000], Loss: 0.8145
Epoch [300/1000], Loss: 0.9410
Epoch [400/1000], Loss: 0.9489
Epoch [500/1000], Loss: 0.7749
Epoch [600/1000], Loss: 0.8242
Epoch [700/1000], Loss: 0.7823
Epoch [800/1000], Loss: 0.7413
Epoch [900/1000], Loss: 0.7628
Epoch [1000/1000], Loss: 0.7471

In this output, each printed value is the loss of the last mini-batch in every 100th epoch, so it is noisy. It starts at 0.8293 at epoch 100, climbs to 0.9489 at epoch 400, then falls to its lowest printed value, 0.7413, at epoch 800 and ends at 0.7471. The loss does not decrease with every epoch, but it trends downward overall, which indicates that the model is learning from the training data.
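Because loss.item() in the training loop is the loss of the last mini-batch only, a smoother signal is the average loss over all batches in the epoch. A minimal sketch, where train_one_epoch() is a hypothetical helper, nn.Linear stands in for the model, and random tensors stand in for the Iris training data:

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

def train_one_epoch(model, loader, criterion, optimizer, device):
    """Run one training epoch and return the sample-weighted mean loss."""
    model.train()
    total_loss, total_samples = 0.0, 0
    for inputs, labels in loader:
        inputs, labels = inputs.to(device), labels.to(device)
        loss = criterion(model(inputs), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # criterion returns the batch mean, so weight it by the batch size
        total_loss += loss.item() * inputs.size(0)
        total_samples += inputs.size(0)
    return total_loss / total_samples

torch.manual_seed(0)
X = torch.randn(120, 4)            # stand-in features
y = torch.randint(0, 3, (120,))    # stand-in labels
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

device = torch.device("cpu")
model = nn.Linear(4, 3).to(device)  # stand-in for the logistic regression model
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_losses = [train_one_epoch(model, loader, criterion, optimizer, device) for _ in range(5)]
print(epoch_losses)
```

Weighting by batch size keeps the smaller final batch (8 samples here) from counting as much as a full batch of 16.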

There are a few ways we could further improve the model's performance. One is to tune hyperparameters such as the learning rate, the batch size, and the number of epochs. Adding hidden layers with more neurons would also add capacity, but the model would then be a multilayer perceptron rather than logistic regression. We can also try other optimization algorithms, such as Adam or SGD with momentum, to update the model parameters. Additionally, L1 or L2 regularization can help prevent overfitting, and, where it suits the data, techniques such as data augmentation can enlarge the training set and further reduce overfitting.
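As a sketch of two of these options: torch.optim.Adam replaces SGD, its weight_decay argument adds an L2 penalty, and an L1 penalty can be added to the loss by hand. Here nn.Linear and a random mini-batch stand in for the real model and data, and the values lr=0.01, weight_decay=1e-4 and l1_lambda=1e-4 are illustrative assumptions, not tuned settings:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)              # stand-in for LogisticRegression(4, 3)
criterion = nn.CrossEntropyLoss()

# Adam instead of plain SGD; weight_decay applies an L2 penalty to the parameters
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
l1_lambda = 1e-4                     # hypothetical strength of the L1 penalty

inputs = torch.randn(16, 4)          # one random mini-batch for illustration
labels = torch.randint(0, 3, (16,))

loss = criterion(model(inputs), labels)
# L1 penalty on the weight matrix only (biases are often left unpenalized)
loss = loss + l1_lambda * model.weight.abs().sum()

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```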

Evaluations

After training our model, we evaluate its performance on the validation set by looping over the validation dataset and computing the model's predictions for each example. We then calculate the accuracy of the model by comparing its predictions to the true labels.

We first wrap the evaluation code in a "torch.no_grad()" context, which tells PyTorch that we don't need to keep track of the gradients during this evaluation. This can save memory and computation time.

Next, we initialize "correct" and "total" counters to keep track of the number of correctly classified examples and the total number of examples, respectively. We then loop over the validation set using the "val_loader" created earlier.

For each batch in the validation set, we move the inputs and labels to the device (either CPU or GPU), just like we did during training. We then compute the model's predictions by passing the inputs through the model, and taking the argmax of the output tensor to get the predicted class labels.

We update the "total" counter with the number of examples in this batch, and add the number of correctly classified examples to the "correct" counter. Note that we use the ".item()" method to convert the PyTorch tensor to a scalar value.

Finally, we print the validation accuracy as a percentage, which is calculated as the number of correctly classified examples divided by the total number of examples in the validation set, multiplied by 100.

Python3

# Evaluate the model on the validation set
model.eval()  # evaluation mode; no effect on this model, but good practice
with torch.no_grad():
    correct = 0
    total = 0
    for inputs, labels in val_loader:
        # Move inputs and labels to the device
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Compute the model's predictions: the class with the highest score
        outputs = model(inputs)
        predicted = torch.argmax(outputs, dim=1)

        # Compute the accuracy
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Validation Accuracy: {:.2f}%'.format(100 * correct / total))

Output:

Validation Accuracy: 90.00%

The validation accuracy is 90.00%, meaning the model correctly classified 27 of the 30 validation samples. This suggests that the model generalizes reasonably well to unseen data, although with only 30 samples each misclassification shifts the accuracy by about 3.3 percentage points, so the estimate is coarse.

Results and Conclusion

In this tutorial, we implemented multinomial logistic regression using PyTorch and trained the model on the Iris dataset. We then evaluated the model's performance on a validation set and achieved an accuracy of 90%.

This model can be used in various applications such as digit recognition in postal services, automated bank check processing, and many more.

In conclusion, PyTorch provides a powerful framework for implementing various machine learning models, including logistic regression. Its GPU support and parallelized tensor operations can significantly speed up training for larger models and datasets, although a model as small as this one trains comfortably on a CPU. With some adjustments to hyperparameters and regularization techniques, our multinomial logistic regression model can be further improved and applied to various real-world problems.

swapnilvishwakarma7
