Lab (Bounding Box)
11/27/24, 8:29 AM Untitled6.ipynb - Colab
Bounding box prediction is a fundamental task in object detection, in which a model predicts the coordinates of a rectangular box enclosing an object in an image.
Objective
Train a Convolutional Neural Network (CNN) to predict bounding box coordinates for a single object in an image.
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import numpy as np
import matplotlib.pyplot as plt
Purpose: These libraries support dataset loading (torchvision), network construction and training (torch.nn, torch.optim), numerical work (NumPy), and visualization (Matplotlib).
2. Dataset Preparation
We will use the MNIST dataset, which contains images of handwritten digits, and generate bounding boxes around each digit.
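class MNISTWithBoundingBoxes(torch.utils.data.Dataset):
    def __init__(self, train=True):
        self.dataset = datasets.MNIST(
            root='./data',
            train=train,
            download=True,
            transform=transforms.ToTensor()
        )

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        img, label = self.dataset[idx]
        # Assumed box definition (this method is not shown in the printout):
        # the tightest box around all nonzero pixels, in pixel coordinates
        ys, xs = torch.nonzero(img[0] > 0, as_tuple=True)
        bbox = torch.stack([xs.min(), ys.min(), xs.max(), ys.max()]).float()
        return img, label, bbox

# Initialize DataLoaders
train_dataset = MNISTWithBoundingBoxes(train=True)
test_dataset = MNISTWithBoundingBoxes(train=False)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)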
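The printout does not show how each ground-truth box is derived. A common choice, assumed here, is the tightest box around every pixel brighter than a threshold. The NumPy sketch below applies that rule to a toy 6x6 image; `tight_bbox` and its `threshold` parameter are hypothetical names for illustration, not part of the original lab.

```python
import numpy as np


def tight_bbox(img, threshold=0.0):
    """Return [x_min, y_min, x_max, y_max] of pixels brighter than threshold.

    img is a 2-D array indexed as img[row, col], so rows map to y and columns to x.
    """
    ys, xs = np.nonzero(img > threshold)
    if xs.size == 0:
        raise ValueError("image has no foreground pixels")
    return [int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())]


# A toy 6x6 "digit": foreground occupies rows 1-3 and columns 2-4
toy = np.zeros((6, 6))
toy[1, 3] = 1.0
toy[2, 2:5] = 0.8
toy[3, 3] = 0.5
print(tight_bbox(toy))  # [2, 1, 4, 3]
```

Note the axis swap: `np.nonzero` returns row indices first, so the y coordinates come before the x coordinates.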
The model takes the image as input and outputs the bounding box coordinates $[x_{\min}, y_{\min}, x_{\max}, y_{\max}]$.
class BoundingBoxModel(nn.Module):
    def __init__(self):
        super(BoundingBoxModel, self).__init__()
        # Two conv blocks: 28x28 -> 14x14 -> 7x7 feature maps
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )
        # Regression head: flattened 32x7x7 features -> 4 box coordinates
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, 4)  # 4 outputs: [x_min, y_min, x_max, y_max]
        )

    def forward(self, x):
        return self.fc(self.backbone(x))
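The `32 * 7 * 7` input size of the first linear layer follows from the layer settings. Each Conv2d and MaxPool2d layer (at their default dilation of 1) maps a spatial size $n$ to $\lfloor (n + 2p - k)/s \rfloor + 1$. The short sketch below traces a 28x28 MNIST image through the backbone with that formula; `conv_out` is a hypothetical helper name.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length along one spatial axis: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28  # MNIST images are 28x28
size = conv_out(size, kernel=3, stride=1, padding=1)  # Conv2d(1, 16): 28 -> 28
size = conv_out(size, kernel=2, stride=2)             # MaxPool2d(2, 2): 28 -> 14
size = conv_out(size, kernel=3, stride=1, padding=1)  # Conv2d(16, 32): 14 -> 14
size = conv_out(size, kernel=2, stride=2)             # MaxPool2d(2, 2): 14 -> 7
flat_features = 32 * size * size                      # 32 channels of 7x7 maps
print(size, flat_features)  # 7 1568
```

The padding of 1 with a 3x3 kernel keeps the size unchanged, so only the two pooling layers shrink the maps.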
Because predicting four continuous coordinates is a regression problem, bounding box prediction uses Mean Squared Error (MSE) as the loss function.
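For a batch of $N$ boxes, the loss averages the squared error over all $4N$ coordinates: $\mathcal{L} = \frac{1}{4N}\sum_{i=1}^{N}\sum_{j=1}^{4}(\hat{b}_{ij} - b_{ij})^2$, where $\hat{b}_{ij}$ is predicted coordinate $j$ of box $i$ and $b_{ij}$ is its ground truth. The NumPy sketch below computes this by hand on two made-up boxes; `nn.MSELoss()` with its default `reduction='mean'` computes the same quantity.

```python
import numpy as np


def mse(pred, target):
    """Mean of squared differences over every coordinate of every box (N boxes x 4 coords)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.mean((pred - target) ** 2))


# Two made-up boxes in [x_min, y_min, x_max, y_max] form
pred = [[2.0, 1.0, 5.0, 4.0],
        [0.0, 0.0, 3.0, 3.0]]
target = [[2.0, 2.0, 4.0, 4.0],
          [1.0, 0.0, 3.0, 5.0]]
print(mse(pred, target))  # (0 + 1 + 1 + 0 + 1 + 0 + 0 + 4) / 8 = 0.875
```

Squaring means a 2-pixel error on one coordinate costs four times as much as a 1-pixel error.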
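# Setup (this cell is missing from the printout; the Adam optimizer and learning rate are assumptions)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BoundingBoxModel().to(device)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# Training Loop
epochs = 5
for epoch in range(epochs):
    model.train()
    total_loss = 0
    for imgs, _, bboxes in train_loader:
        imgs, bboxes = imgs.to(device), bboxes.to(device)
        # Forward pass
        pred_bboxes = model(imgs)
        # Compute loss
        loss = criterion(pred_bboxes, bboxes)
        # Backward pass and parameter update
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f"Epoch {epoch + 1}/{epochs}, Loss: {total_loss / len(train_loader):.4f}")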
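`test_loader` is created during dataset preparation but is not used by any surviving cell. A minimal sketch of a test-set evaluation follows; `evaluate` is a hypothetical helper, and the demo at the bottom runs it on synthetic tensors and a toy linear model rather than on MNIST.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device):
    """Average per-batch loss over a loader, without tracking gradients."""
    model.eval()  # inference mode (only matters for layers such as dropout or batch norm)
    total_loss = 0.0
    with torch.no_grad():
        for imgs, _, bboxes in loader:
            imgs, bboxes = imgs.to(device), bboxes.to(device)
            total_loss += criterion(model(imgs), bboxes).item()
    return total_loss / len(loader)


if __name__ == "__main__":
    # Synthetic stand-in data: 8 random 1x28x28 "images", dummy labels, random boxes
    torch.manual_seed(0)
    imgs = torch.rand(8, 1, 28, 28)
    labels = torch.zeros(8, dtype=torch.long)
    boxes = torch.rand(8, 4) * 27
    loader = DataLoader(TensorDataset(imgs, labels, boxes), batch_size=4)
    toy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 4))
    print(evaluate(toy_model, loader, nn.MSELoss(), torch.device("cpu")))
```

In the lab it would be called as `evaluate(model, test_loader, criterion, device)`. Because it averages per-batch losses, the smaller final batch (10,000 test images do not divide evenly by 32) is weighted slightly more than a true per-sample mean.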
7. Visualize Results
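# Reconstructed helper (only plt.legend()/plt.show() survive): green = ground truth, red = prediction
def visualize_bbox(img, gt_bbox, pred_bbox):
    plt.imshow(img.squeeze().numpy(), cmap='gray')
    for (x0, y0, x1, y1), color, name in [(gt_bbox, 'g', 'Ground truth'), (pred_bbox, 'r', 'Predicted')]:
        plt.gca().add_patch(plt.Rectangle((x0, y0), x1 - x0, y1 - y0, fill=False, edgecolor=color, label=name))
    plt.legend()
    plt.show()

# Visualize a sample from the test set
img, _, bbox = test_dataset[0]
pred_bbox = model(img.unsqueeze(0).to(device)).cpu().detach().numpy()[0]
visualize_bbox(img, bbox.numpy(), pred_bbox)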
Expected Results
Predicted Bounding Box: the coordinates the model predicts for a digit in the test set.
Ground Truth Bounding Box: the actual bounding box coordinates, shown for comparison.
Visualization: (image not preserved in this copy)
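The lab compares the two boxes visually. One common way to put a number on that comparison, not part of the original lab, is Intersection over Union (IoU): the overlap area divided by the combined area, which is 1.0 for identical boxes and 0.0 for disjoint ones. A minimal sketch, treating coordinates as continuous; `iou` is a hypothetical helper name.

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x_min, y_min, x_max, y_max] boxes (continuous coordinates)."""
    # Corners of the overlapping rectangle (may be empty)
    ix_min, iy_min = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix_max, iy_max = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


gt = [4, 4, 20, 24]
pred = [6, 4, 22, 24]  # same size, shifted 2 pixels to the right
print(iou(gt, pred))   # 280 / 360, about 0.778
```

With the lab's variables, `iou(bbox.numpy(), pred_bbox)` would score the visualized sample.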
https://colab.research.google.com/drive/1uyLIsxk740OFLmhYmj27Jfeqk9PgdLkC#scrollTo=bYJ61DzZpw-E&printMode=true