Logistic regression is a popular machine learning algorithm used for binary classification tasks. It models the probability of the output variable (also known as the dependent variable) given the input variables (also known as the independent variables). It is a linear algorithm that applies a logistic function to the output of a linear regression model, which transforms the continuous output into a probability between 0 and 1.
Logistic regression has several variants, including binary logistic regression, multinomial logistic regression, and ordinal logistic regression. Binary logistic regression is used for binary classification tasks, where there are only two possible outcomes for the output variable. Multinomial logistic regression, also known as softmax regression, is used for multi-class classification tasks, where there are more than two possible outcomes for the output variable. Ordinal logistic regression is used for ordered multi-class classification tasks, where the outcomes have a natural ordering (e.g. low, medium, high).
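The relationship between the binary and multinomial variants can be checked numerically. The following is a minimal sketch in plain Python, with illustrative scores chosen only for this example: it shows that the logistic (sigmoid) function applied to a score z gives the same probability as a two-class softmax over the scores [z, 0], and that softmax extends naturally to three classes.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(scores):
    """Softmax: maps a list of scores to probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Binary case: a single score z gives the probability of the positive class
z = 0.8  # illustrative value
p_binary = sigmoid(z)

# The same model written as a two-class softmax over the scores [z, 0]
p_two_class = softmax([z, 0.0])

# Multi-class case: three illustrative scores, one per class
p_three_class = softmax([2.0, 1.0, 0.1])

print(p_binary, p_two_class[0])
print(p_three_class, sum(p_three_class))
```

The first printed pair agrees up to floating-point rounding, which is why binary logistic regression can be viewed as a special case of the multinomial model.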
In this article, we will focus on implementing multinomial logistic regression using PyTorch.
Data Preparation
For our example, we will be using the famous Iris dataset, which contains measurements of the sepal length, sepal width, petal length, and petal width for three species of iris flowers (Iris setosa, Iris versicolor, and Iris virginica). The goal is to predict the species of an iris flower based on these measurements.
Before training our multinomial logistic regression model, we need to preprocess our data. In this case, we will normalize our input data to have a mean of 0 and a standard deviation of 1. This helps to ensure that our model trains more efficiently and effectively.
We can load and prepare the dataset using PyTorch as follows:
Python3
# Import the necessary libraries
import torch
from sklearn.datasets import load_iris
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split
import torch.nn as nn
from torchinfo import summary
# Load the Iris dataset
iris = load_iris()
# Convert the data to PyTorch tensors
X = torch.tensor(iris.data, dtype=torch.float32)
y = torch.tensor(iris.target, dtype=torch.long)
# Normalize the input data
mean = torch.mean(X, dim=0)
std = torch.std(X, dim=0)
X = (X - mean) / std
# Split the dataset into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
# Create PyTorch Datasets
train_dataset = TensorDataset(X_train, y_train)
val_dataset = TensorDataset(X_val, y_val)
# Define the data loaders
batch_size = 16
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
In this code, we begin by importing the required packages: torch and torch.nn, load_iris and train_test_split from scikit-learn, TensorDataset and DataLoader from PyTorch, and summary from torchinfo.
We load the Iris dataset using load_iris() from scikit-learn. The data is then converted to PyTorch tensors using torch.tensor() and specifying the data type using dtype. Here, we use dtype=torch.float32 for the input data and dtype=torch.long for the output data.
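We then normalize the input data by subtracting the per-feature mean and dividing by the per-feature standard deviation. Note that the code computes these statistics over the full dataset before splitting, so the validation samples contribute to them; computing them from the training split alone avoids this leakage. Normalization gives each input feature a similar scale, which generally helps gradient-based training.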
Next, we split the dataset into training and validation sets using train_test_split() from scikit-learn. We specify the test_size as 0.2, which means that 20% of the data is reserved for validation, and the random_state as 42 for reproducibility.
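Normalization statistics can also be taken from the training split only, so that the validation samples play no part in preprocessing. The following is a minimal sketch of that ordering (split first, then normalize). It assumes random stand-in tensors in place of the Iris features and a hypothetical index-based 80/20 split, so the names and values here are illustrative rather than part of the tutorial's code.

```python
import torch

torch.manual_seed(0)

# Stand-in for the 150 x 4 Iris feature matrix (random values, not real data)
X = torch.randn(150, 4) * 2.0 + 5.0

# Hypothetical 80/20 split using shuffled indices
perm = torch.randperm(X.shape[0])
train_idx, val_idx = perm[:120], perm[120:]
X_train, X_val = X[train_idx], X[val_idx]

# Statistics come from the training rows only
mean = X_train.mean(dim=0)
std = X_train.std(dim=0)

# The same training statistics are applied to both splits
X_train_norm = (X_train - mean) / std
X_val_norm = (X_val - mean) / std

print(X_train_norm.mean(dim=0), X_train_norm.std(dim=0))
```

With this ordering the training features have mean 0 and standard deviation 1 (up to rounding), while the validation features are only approximately standardized, which mirrors how unseen data would be treated.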
We create PyTorch Datasets using TensorDataset(), which takes in the input and output tensors for each set. Then, we create PyTorch DataLoaders using DataLoader() to batch the data during training and validation. Here, we set the batch size to 16 for both loaders, shuffle the training data by setting shuffle=True, and keep the validation data in a fixed order by setting shuffle=False.
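To see what the batching described above produces, here is a small sketch that iterates once over a DataLoader. It assumes zero-filled stand-in tensors with the shape of the training split (120 samples, since 80% of 150 is 120); only the shapes matter for this illustration.

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Stand-in tensors with the training-set shape (120 samples, 4 features)
features = torch.zeros(120, 4)
targets = torch.zeros(120, dtype=torch.long)

loader = DataLoader(TensorDataset(features, targets), batch_size=16, shuffle=True)

# Collect the size of every batch produced in one pass over the data
batch_sizes = [inputs.shape[0] for inputs, _ in loader]
print(len(loader), batch_sizes)
```

Because 120 is not a multiple of 16, the loader yields seven full batches and a final batch of 8 samples. Any per-epoch average of the loss should account for that shorter last batch.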
Now that we have loaded, preprocessed, and split our data, we can move on to building our multinomial logistic regression model using PyTorch.
Model Creation
Multinomial logistic regression is a type of logistic regression that is used when there are three or more categories in the dependent variable. It computes one linear score per category, converts the scores into probabilities with the softmax function, and selects the category with the highest probability as the predicted outcome.
We can implement multinomial logistic regression using PyTorch by defining a neural network with a single linear layer and a softmax activation function. The linear layer takes in the input data and outputs a vector of logits (i.e. unnormalized log probabilities), which are then passed through the softmax function to obtain a vector of probabilities.
Python3
class LogisticRegression(nn.Module):
    def __init__(self, input_size, num_classes):
        super(LogisticRegression, self).__init__()
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, x):
        out = self.linear(x)
        out = nn.functional.softmax(out, dim=1)
        return out
In the above code, we define a PyTorch module called LogisticRegression that inherits from nn.Module. We define a single linear layer with input size input_size (4 for the Iris features) and output size num_classes (3 for the Iris species). We then define the forward() method, which takes in the input data x and passes it through the linear layer and the softmax activation function.
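One point worth checking is how the softmax output interacts with nn.CrossEntropyLoss, which this tutorial uses later. That loss expects raw logits and applies log-softmax internally. The following sketch uses a hypothetical variant of the model, named LogitRegression here, that returns logits, together with random stand-in inputs and labels. It compares the loss computed on logits with the loss computed on probabilities.

```python
import math
import torch
import torch.nn as nn

class LogitRegression(nn.Module):
    """Hypothetical variant that returns raw logits instead of probabilities."""
    def __init__(self, input_size, num_classes):
        super().__init__()
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, x):
        return self.linear(x)  # no softmax here

torch.manual_seed(0)
model = LogitRegression(input_size=4, num_classes=3)
x = torch.randn(16, 4)          # random stand-in batch
y = torch.randint(0, 3, (16,))  # random stand-in labels

logits = model(x)
criterion = nn.CrossEntropyLoss()

# CrossEntropyLoss on logits equals negative log-likelihood on log-softmax(logits)
loss_logits = criterion(logits, y)
loss_manual = nn.functional.nll_loss(torch.log_softmax(logits, dim=1), y)

# Feeding probabilities instead applies the normalization a second time
probs = torch.softmax(logits, dim=1)
loss_probs = criterion(probs, y)

# With three classes and inputs confined to [0, 1], the loss cannot
# drop below log(1 + 2/e), which is roughly 0.55
floor = math.log(1 + 2 / math.e)

print(loss_logits.item(), loss_manual.item(), loss_probs.item(), floor)
```

If the model returns logits during training, probabilities can still be obtained at prediction time by calling torch.softmax on the output, and the predicted class is unchanged because softmax preserves the ordering of the scores.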
Print the model summary
Instantiate our model by creating an instance of the LogisticRegression class with input_size=4 and num_classes=3.
If a GPU is available, we set the device to 'cuda:0'; otherwise, we set it to 'cpu'. Next, we move the model to the specified device using the to() method. This is done so that the model can utilize the hardware resources of the device during training.
Print the model summary using summary() from torchinfo.
Python3
# Define the model
model = LogisticRegression(input_size=4, num_classes=3)
# Check for cuda
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
# Move the model to the device
model = model.to(device)
summary(model, input_size=(16,4))
Output:
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
LogisticRegression                       [16, 3]                   --
├─Linear: 1-1                            [16, 3]                   15
==========================================================================================
Total params: 15
Trainable params: 15
Non-trainable params: 0
Total mult-adds (M): 0.00
==========================================================================================
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.00
Estimated Total Size (MB): 0.00
==========================================================================================
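The figure of 15 trainable parameters in the summary can be reproduced by hand. A linear layer with 4 inputs and 3 outputs has a 3 x 4 weight matrix and a bias vector of length 3. The short sketch below counts them with named_parameters(), using a standalone nn.Linear as a stand-in for the model's layer.

```python
import torch.nn as nn

# Standalone layer with the same shape as the model's linear layer
linear = nn.Linear(in_features=4, out_features=3)

# Weight matrix: one row per class, one column per feature -> 3 x 4 = 12
# Bias vector: one entry per class -> 3
counts = {name: p.numel() for name, p in linear.named_parameters()}
total = sum(counts.values())

print(counts, total)
```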
Define a loss function and an optimizer
To train our model, we need to define a loss function and an optimizer. For this example, we will use cross-entropy loss and stochastic gradient descent (SGD) optimizer.
Python3
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.002)
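In the above code, we define our loss function as nn.CrossEntropyLoss(), which is commonly used for multi-class classification problems. Note that nn.CrossEntropyLoss() expects raw logits and applies log-softmax internally; because forward() already applies softmax, the normalization is applied twice here, which flattens the gradients and slows training. Returning the output of the linear layer directly from forward() avoids this. We also define our optimizer as torch.optim.SGD(), which implements stochastic gradient descent, with a learning rate of 0.002. We pass in our model parameters using model.parameters().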
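To make the optimizer's role concrete, the sketch below checks that one step of plain SGD (no momentum, no weight decay) matches the update rule parameter = parameter - lr * gradient. It assumes a standalone nn.Linear layer and a random stand-in batch rather than the tutorial's model and data.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)  # stand-in for the tutorial's model
criterion = nn.CrossEntropyLoss()
lr = 0.002
optimizer = torch.optim.SGD(model.parameters(), lr=lr)

x = torch.randn(16, 4)          # random stand-in batch
y = torch.randint(0, 3, (16,))  # random stand-in labels

loss = criterion(model(x), y)
optimizer.zero_grad()
loss.backward()

# Expected result of the plain SGD rule, computed by hand before stepping
expected = [p.detach() - lr * p.grad for p in model.parameters()]

optimizer.step()
updated = [p.detach().clone() for p in model.parameters()]

print(all(torch.allclose(e, u) for e, u in zip(expected, updated)))
```

With a learning rate as small as 0.002, each step moves the parameters only slightly, which is one reason the tutorial trains for many epochs.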
Now that we have defined our model, loss function, and optimizer, we can move on to training the model using our training dataset.
Training and Validation
To train our model, we first define the number of epochs. The batch size was already set when creating the DataLoaders, and the device (CPU or GPU) was selected when the model was instantiated.
Then, we start training the model using a nested loop. In the outer loop, we iterate over the specified number of epochs. In the inner loop, we iterate over the batches of data in the training DataLoader.
For each batch, we move the input data and labels to the specified device using the to() method. Then, we perform a forward pass through the model using the input data to get the model's predictions. We calculate the loss using the predicted outputs and the actual labels using the specified loss function.
Before backpropagation, we zero out the gradients using optimizer.zero_grad(), because PyTorch accumulates gradients across backward() calls by default. We then call the backward() method on the loss to compute the gradients with respect to the model's parameters, and update the parameters using the optimizer's step() method.
Every 100 epochs, we print the loss of the last mini-batch of that epoch using the print() function.
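The reason for clearing gradients before each backward pass can be seen in a tiny example. The sketch below uses a two-element tensor with a made-up linear function, independent of the tutorial's model, and shows that a second backward() call adds to the stored gradient unless it is reset first.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward pass: the gradient of sum(3 * w) with respect to w is [3, 3]
(3 * w).sum().backward()
first = w.grad.clone()

# Second backward pass without clearing: gradients are added, giving [6, 6]
(3 * w).sum().backward()
accumulated = w.grad.clone()

# Clearing the gradient first restores the single-pass value
w.grad.zero_()
(3 * w).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```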
Python3
# Define training parameters
num_epochs = 1000
# Train the model
for epoch in range(num_epochs):
    for i, (inputs, labels) in enumerate(train_loader):
        # Move inputs and labels to the device
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Print the loss of the last batch every 100 epochs
    if (epoch+1)%100 == 0:
        print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
Output:
Epoch [100/1000], Loss: 0.8293
Epoch [200/1000], Loss: 0.8145
Epoch [300/1000], Loss: 0.9410
Epoch [400/1000], Loss: 0.9489
Epoch [500/1000], Loss: 0.7749
Epoch [600/1000], Loss: 0.8242
Epoch [700/1000], Loss: 0.7823
Epoch [800/1000], Loss: 0.7413
Epoch [900/1000], Loss: 0.7628
Epoch [1000/1000], Loss: 0.7471
In this output, each printed value is the loss of the last mini-batch of every 100th epoch, so it fluctuates from line to line (it rises to 0.9489 at epoch 400, for example). Overall it trends downward, from 0.8293 at epoch 100 to 0.7471 at epoch 1000, indicating that the model is learning from the training data. The slow decrease is consistent with the small learning rate and with the criterion receiving softmax probabilities instead of logits: with three classes, the loss then cannot fall below log(1 + 2/e) ≈ 0.55.
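A smoother training curve can be obtained by averaging the loss over all samples in an epoch instead of reporting a single mini-batch. The following is a minimal sketch of that bookkeeping. It assumes random stand-in features, labels generated from a hypothetical random linear rule so that they are learnable, a standalone nn.Linear model that outputs logits, and an illustrative learning rate of 0.1; none of these values come from the tutorial.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

torch.manual_seed(0)

# Random stand-in data with the same shapes as the Iris training split
X = torch.randn(120, 4)
true_w = torch.randn(4, 3)        # hypothetical rule that generates the labels
y = (X @ true_w).argmax(dim=1)
loader = DataLoader(TensorDataset(X, y), batch_size=16, shuffle=True)

model = nn.Linear(4, 3)           # outputs logits
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(20):
    running_loss = 0.0
    seen = 0
    for inputs, labels in loader:
        loss = criterion(model(inputs), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # Weight each batch loss by its size so the shorter last batch counts correctly
        running_loss += loss.item() * inputs.size(0)
        seen += inputs.size(0)
    epoch_losses.append(running_loss / seen)

print(epoch_losses[0], epoch_losses[-1])
```

The per-epoch average still includes some noise from shuffling, but it is far less sensitive to which samples happen to land in the final batch.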
There are a few ways we can further improve our model's performance. One way is to adjust hyperparameters such as the learning rate, the batch size, and the number of epochs. We can also use a different optimization algorithm, such as Adam or SGD with momentum, to update the model parameters. Additionally, we can try L1 or L2 regularization to reduce overfitting. Adding hidden layers and neurons is another option, although the model is then a neural network rather than a logistic regression.
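Two of these options map directly onto optimizer arguments in PyTorch. The sketch below shows how they might be configured. The learning rates, weight decay, and momentum values are illustrative assumptions, not tuned settings, and a standalone nn.Linear layer stands in for the model.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 3)  # stand-in for the tutorial's model

# Adam in place of plain SGD; weight_decay adds an L2 penalty on the parameters
adam = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)

# SGD with momentum is another drop-in alternative
sgd_momentum = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

print(adam.defaults["weight_decay"], sgd_momentum.defaults["momentum"])
```

Only one optimizer would be used in an actual training loop; both are created here just to show the arguments side by side.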
Evaluations
After training our model, we evaluate its performance on the validation set by looping over the validation dataset and computing the model's predictions for each example. We then calculate the accuracy of the model by comparing its predictions to the true labels.
We first wrap the evaluation code in a "torch.no_grad()" context, which tells PyTorch that we don't need to keep track of the gradients during this evaluation. This can save memory and computation time.
Next, we initialize "correct" and "total" counters to keep track of the number of correctly classified examples and the total number of examples, respectively. We then loop over the validation set using the "val_loader" created earlier.
For each batch in the validation set, we move the inputs and labels to the device (either CPU or GPU), just like we did during training. We then compute the model's predictions by passing the inputs through the model, and taking the argmax of the output tensor to get the predicted class labels.
We update the "total" counter with the number of examples in this batch, and add the number of correctly classified examples to the "correct" counter. Note that we use the ".item()" method to convert the PyTorch tensor to a scalar value.
Finally, we print the validation accuracy as a percentage, which is calculated as the number of correctly classified examples divided by the total number of examples in the validation set, multiplied by 100.
Python3
# Evaluate the model on the validation set
with torch.no_grad():
    correct = 0
    total = 0
    for inputs, labels in val_loader:
        # Move inputs and labels to the device
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Compute the model's predictions
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)

        # Compute the accuracy
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Validation Accuracy: {:.2f}%'.format(100 * correct / total))
Output:
Validation Accuracy: 90.00%
The validation accuracy is displayed as 90.00%. The validation set holds 30 samples (20% of the 150 Iris samples), so this corresponds to 27 correct predictions. This suggests that the model generalizes reasonably well to unseen data, although an estimate from such a small set is coarse.
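A single accuracy figure does not show which species are being confused. One way to look further is a confusion matrix. The sketch below builds one with torch.bincount. The helper name confusion_matrix and the label and prediction tensors are made up for illustration and are not results from the trained model.

```python
import torch

def confusion_matrix(labels, predicted, num_classes):
    """Rows are true classes, columns are predicted classes."""
    # Encode each (true, predicted) pair as a single index, then count occurrences
    flat = labels * num_classes + predicted
    counts = torch.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)

# Made-up labels and predictions for illustration, not results from the model
labels = torch.tensor([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
predicted = torch.tensor([0, 0, 1, 2, 1, 2, 2, 1, 2, 2])

cm = confusion_matrix(labels, predicted, num_classes=3)

# Overall accuracy is the diagonal total divided by the number of samples
accuracy = cm.diag().sum().item() / cm.sum().item()

# Per-class recall: correct predictions for a class divided by its true count
per_class_recall = cm.diag().float() / cm.sum(dim=1).float()

print(cm)
print(accuracy, per_class_recall)
```

In the evaluation loop, the labels and predicted tensors from each batch could be concatenated and passed to such a helper after the loop finishes.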
Results and Conclusion
In this tutorial, we implemented multinomial logistic regression using PyTorch and trained the model on the Iris dataset. We then evaluated the model's performance on a validation set of 30 samples and achieved an accuracy of 90%.
The same approach can be applied to other multi-class problems, such as handwritten digit recognition in postal services and automated bank check processing.
In conclusion, PyTorch provides a powerful framework for implementing various machine learning models, including logistic regression. With the ability to use GPUs and parallelize computations, PyTorch can significantly speed up the training process and enable the creation of highly accurate models. With some adjustments to hyperparameters and regularization techniques, our multinomial logistic regression model can be further improved and applied to various real-world problems.