How to handle sequence padding and packing in PyTorch for RNNs?
Last Updated: 23 Jul, 2025
Many datasets contain sequences of variable length, while recurrent neural networks (RNNs) expect inputs of uniform shape when processed in batches. To address this, sequence padding and packing techniques are used, particularly in PyTorch, a popular deep learning framework. This article demonstrates how sequence padding ensures uniformity in sequence lengths by adding zeros to shorter sequences, while sequence packing compresses padded sequences for efficient processing in RNNs.
Sequence Padding and Packing for RNNs
Training Recurrent Neural Networks (RNNs) can be tricky when dealing with sequences of different lengths. Imagine we have a batch of 8 sequences with lengths 6, 5, 4, 7, 2, 3, 8, and 7.
This is where padding comes in: all sequences are padded to the maximum length (8 in this case) with meaningless values. This creates an 8x8 matrix for computation, even though most sequences are shorter, and wastes processing power because the RNN performs 64 computations instead of the 42 actually needed.
This is where packing plays an important role: it stores the sequences in a data structure that preserves their original lengths from before padding. The RNN model can then process only the non-padded portion of each sequence, effectively reducing the computational overhead.
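To make the arithmetic concrete, here is a quick back-of-the-envelope check in plain Python, using the example batch of lengths above:
Python3
lengths = [6, 5, 4, 7, 2, 3, 8, 7]

padded_steps = len(lengths) * max(lengths)  # 8 sequences x 8 time steps = 64
actual_steps = sum(lengths)                 # 42 real elements
print(f"Padded: {padded_steps}, actual: {actual_steps}, wasted: {padded_steps - actual_steps}")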
Implementation of Sequence Padding and Sequence Packing
- The code imports the necessary modules from PyTorch (torch and torch.nn.utils.rnn), defines a list of example sequences with variable lengths (sequences), and converts each sequence to a PyTorch tensor using a list comprehension (sequences_tensor).
- The pad_sequence function from torch.nn.utils.rnn pads the sequences with zeros up to the maximum length, ensuring that all sequences have the same length. The batch_first=True argument specifies that the batch dimension should be the first dimension of the resulting tensor.
- The code then calculates the actual length of each sequence and finally demonstrates how to pack the padded batch using pack_padded_sequence().
Python3
import torch
import torch.nn.utils.rnn as rnn_utils

# Define example sequences of variable length
sequences = [
    [1, 2, 3],
    [4, 5],
    [6, 7, 8, 9],
    [10]
]

# Convert sequences to PyTorch tensors
sequences_tensor = [torch.tensor(seq) for seq in sequences]

# Padding: pad every sequence with zeros to the length of the longest one
padded_sequences = rnn_utils.pad_sequence(sequences_tensor, batch_first=True)
print("Padded sequences:", "\n", padded_sequences)

# Packing: record the actual length of each sequence...
sequence_lengths = torch.tensor([len(seq) for seq in sequences])

# ...and pack the padded batch so the RNN can skip the padded positions
packed_sequences = rnn_utils.pack_padded_sequence(padded_sequences, sequence_lengths, batch_first=True, enforce_sorted=False)
print("\nPacked sequences:", packed_sequences)
Output:
Padded sequences:
 tensor([[ 1,  2,  3,  0],
        [ 4,  5,  0,  0],
        [ 6,  7,  8,  9],
        [10,  0,  0,  0]])

Packed sequences: PackedSequence(data=tensor([ 6,  1,  4, 10,  7,  2,  5,  8,  3,  9]), batch_sizes=tensor([4, 3, 2, 1]), sorted_indices=tensor([2, 0, 1, 3]), unsorted_indices=tensor([1, 2, 0, 3]))
The first part of the output is a 2-dimensional PyTorch tensor representing the padded sequences. Each row corresponds to a sequence, and each column to an element within that sequence. For example,
- First Sequence (Row 1):
- Original sequence: [1, 2, 3]
- Padded sequence: [1, 2, 3, 0]
- The original sequence had three elements, so it was padded with a zero to match the length of the longest sequence in the batch (which is four).
- Second Sequence (Row 2):
- Original sequence: [4, 5]
- Padded sequence: [4, 5, 0, 0]
- The original sequence had two elements, so it was padded with two zeros to match the length of the longest sequence in the batch.
The same applies to the remaining rows.
In the packed sequence:
- data: contains the flattened non-padded elements from the padded sequences.
- batch_sizes: indicates how many elements are present at each time step, reflecting the varying sequence lengths within the batch.
- sorted_indices and unsorted_indices: record the permutation used to sort the batch by length internally (and to undo the sorting), since enforce_sorted=False was passed.
This packed sequence can then be fed into your recurrent neural network (RNN) model during training, allowing it to efficiently process variable-length sequences.
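If you need the padded view back, for example after running the packed batch through an RNN, pad_packed_sequence() reverses the packing. A minimal sketch, continuing from the variables defined above:
Python3
# Unpack: recover the padded tensor and the original (unsorted) lengths
unpacked, lengths = rnn_utils.pad_packed_sequence(packed_sequences, batch_first=True)
print(unpacked)  # identical to padded_sequences
print(lengths)   # tensor([3, 2, 4, 1]) -- the original lengths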
Handling Sequence Padding and Packing in PyTorch for RNNs
This code implements a basic RNN model for sequence processing tasks using PyTorch's nn.Module class, handling variable-length input sequences with sequence packing and unpacking.
The forward method takes input sequences (text) and their lengths (text_lengths). Inside the forward method, the embedded batch is packed, run through the RNN, and unpacked again.
Python3
import torch
import torch.nn as nn
import torch.optim as optim
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Define RNN Model
class RNN(nn.Module):
    def __init__(self, input_dim, embedding_dim, hidden_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(input_dim, embedding_dim)
        self.rnn = nn.RNN(embedding_dim, hidden_dim)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, text, text_lengths):
        embedded = self.embedding(text)
        # Pack the embedded batch so the RNN skips padded positions
        packed_embedded = pack_padded_sequence(embedded, text_lengths.cpu(), batch_first=True, enforce_sorted=False)
        packed_output, _ = self.rnn(packed_embedded)
        # Unpack back to a padded tensor; output_lengths holds the true lengths
        output, output_lengths = pad_packed_sequence(packed_output, batch_first=True)
        # output[:, -1, :] would read padding for the shorter sequences, so
        # index each sequence's last *real* time step instead
        batch_indices = torch.arange(output.size(0))
        last_outputs = output[batch_indices, output_lengths - 1]
        return self.fc(last_outputs)
Pad Sequences
- The example sequences are padded to the maximum length using torch.nn.utils.rnn.pad_sequence(). This is necessary for processing sequences in batches.
- The lengths of the original sequences are computed and stored in a tensor (sequence_lengths). This is necessary for sequence packing.
Python3
sequences = [
    torch.tensor([1, 2, 3]),
    torch.tensor([4, 5]),
    torch.tensor([6, 7, 8, 9]),
    torch.tensor([10])
]

# Pad to the longest sequence and record the original lengths
padded_sequences = torch.nn.utils.rnn.pad_sequence(sequences, batch_first=True)
sequence_lengths = torch.tensor([len(seq) for seq in sequences])
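As an optional sanity check, printing the shapes confirms the batch was padded to the longest sequence:
Python3
print(padded_sequences.shape)  # torch.Size([4, 4]) -- 4 sequences, max length 4
print(sequence_lengths)        # tensor([3, 2, 4, 1])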
Instantiate the Model
An instance of the RNN model is created with example dimensions for the input, embedding, hidden, and output layers, which determine the architecture of the model. INPUT_DIM is 11 because the token IDs in the toy data run from 0 to 10 (index 0 also serves as the padding value).
Python3
INPUT_DIM = 11
EMBEDDING_DIM = 100
HIDDEN_DIM = 256
OUTPUT_DIM = 1
model = RNN(INPUT_DIM, EMBEDDING_DIM, HIDDEN_DIM, OUTPUT_DIM)
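To inspect the resulting architecture, you can print the model and compute its parameter count; the count below is computed at runtime rather than hard-coded:
Python3
print(model)  # shows the embedding, rnn and fc layers with their sizes

total_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {total_params:,}")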
Forward Pass with Packed Sequences
The model is called with the padded sequences (padded_sequences) and their lengths (sequence_lengths). Inside the model's forward method, sequence packing is performed using pack_padded_sequence().
Python3
outputs = model(padded_sequences, sequence_lengths)
print("Output shape:", outputs.shape)
Output:
Output shape: torch.Size([4, 1])
The output shape torch.Size([4, 1]) indicates that the model produced a tensor with 4 rows and 1 column.
- Rows: The first dimension of the tensor corresponds to the batch size. In this case, there are 4 sequences in the batch.
- Columns: The second dimension corresponds to the output dimension of the model. Since OUTPUT_DIM = 1, each sequence produces a single output value.
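To close the loop, here is a minimal sketch of a single training step using these outputs. The binary targets are hypothetical, made up purely for illustration (this is also where the torch.optim import from earlier comes into play):
Python3
# Hypothetical binary targets for the 4 sequences, purely for illustration
targets = torch.tensor([[1.0], [0.0], [1.0], [0.0]])

criterion = nn.BCEWithLogitsLoss()  # treats the single output value as a logit
optimizer = optim.Adam(model.parameters(), lr=1e-3)

optimizer.zero_grad()
outputs = model(padded_sequences, sequence_lengths)  # shape: (4, 1)
loss = criterion(outputs, targets)
loss.backward()
optimizer.step()
print("Loss:", loss.item())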
Conclusion
In conclusion, sequence padding brings every sequence in a batch to a common length by appending zeros to shorter sequences, while sequence packing compresses the padded batch so that recurrent neural networks (RNNs) in PyTorch process only the real, non-padded elements.