
Understanding and Implementing Faster R-CNN

Most of the current SOTA object detection models are built on top of the groundwork laid by the Faster R-CNN model. Faster R-CNN is an object detection model that identifies objects in an image and draws bounding boxes around them, while also classifying what those objects are. It's a two-stage detector:

1. Stage 1: Proposes potential regions in the image that might contain objects. This is handled by the Region Proposal Network (RPN).
2. Stage 2: Uses these proposed regions to predict the class of the object and refines the bounding box to better match the object.


The Architecture of Faster R-CNN

Faster R-CNN Architecture (diagram)


Stage 1: Region Proposal Network (RPN)

Backbone Network:

● The image passes through a convolutional network (such as ResNet or VGG16).
● This extracts important features from the image and creates a feature map.

Anchors:

● Anchors are boxes of different sizes and shapes placed over points on the feature map.
● Each anchor box represents a possible object location.
● At every point on the feature map, anchor boxes are generated with different sizes and aspect ratios, as sketched below.
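
As a concrete illustration, here is a minimal sketch of dense anchor generation in plain PyTorch. The stride, sizes, and aspect ratios below are illustrative assumptions, not the values of any particular implementation:

import torch

def make_anchors(feat_h, feat_w, stride=16,
                 sizes=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    # One set of anchors is centered on every feature-map cell;
    # 'stride' maps cell coordinates back to image pixels.
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in sizes:
                for r in ratios:
                    h = s * r ** 0.5   # aspect ratio r = h / w, area ~ s * s
                    w = s / r ** 0.5
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return torch.tensor(anchors)  # (feat_h * feat_w * 9, 4) as (x1, y1, x2, y2)

anchors = make_anchors(feat_h=2, feat_w=2)
print(anchors.shape)  # torch.Size([36, 4])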

Classification of Anchors:

● The RPN predicts whether each anchor box is background (no object) or foreground (contains an object).
● Positive (foreground) anchors: boxes with high overlap with actual objects.
● Negative (background) anchors: boxes with little or no overlap with objects. A sketch of this overlap-based labeling follows.
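
Overlap is measured with intersection-over-union (IoU). Below is a hedged sketch of the labeling rule, using the 0.7/0.3 IoU thresholds from the original paper (frameworks usually make these configurable) and hypothetical anchor and ground-truth tensors:

import torch
from torchvision.ops import box_iou

anchors = torch.tensor([[0.0, 0.0, 100.0, 100.0],
                        [20.0, 20.0, 120.0, 120.0],
                        [300.0, 300.0, 400.0, 400.0]])
gt_boxes = torch.tensor([[5.0, 5.0, 105.0, 105.0]])  # hypothetical ground truth

iou = box_iou(anchors, gt_boxes)      # (num_anchors, num_gt)
max_iou, _ = iou.max(dim=1)           # best overlap per anchor

labels = torch.full((len(anchors),), -1)  # -1 = ignored by the loss
labels[max_iou >= 0.7] = 1                # foreground
labels[max_iou < 0.3] = 0                 # background
print(labels)  # tensor([ 1, -1,  0]) -- foreground, ignored, background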

Bounding Box Refinement:

● The RPN also refines the anchor boxes to better align them with the actual objects by predicting offsets (adjustments), parameterized as shown below.
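
The offsets follow the standard box-delta parameterization used throughout the R-CNN family: the network regresses normalized shifts of the box center and log-scale changes of width and height, rather than raw coordinates. A minimal sketch:

import math

def encode_deltas(anchor, gt):
    # Both boxes given as (center_x, center_y, width, height).
    ax, ay, aw, ah = anchor
    gx, gy, gw, gh = gt
    tx = (gx - ax) / aw        # center shift, normalized by anchor size
    ty = (gy - ay) / ah
    tw = math.log(gw / aw)     # log-scale width change
    th = math.log(gh / ah)     # log-scale height change
    return tx, ty, tw, th

print(encode_deltas((50, 50, 100, 100), (55, 50, 120, 90)))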

Loss functions:

I) Classification loss: helps the model decide if the anchor is background or foreground.
II) Regression loss: helps adjust the anchor boxes to fit the objects more precisely.
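
In practice the classification loss is a binary cross-entropy over the anchor labels and the regression loss is a smooth L1 over the box deltas of foreground anchors only. A hedged sketch with hypothetical tensors:

import torch
import torch.nn.functional as F

num_anchors = 8
objectness_logits = torch.randn(num_anchors)       # raw RPN scores
pred_deltas = torch.randn(num_anchors, 4)          # predicted (tx, ty, tw, th)
target_deltas = torch.randn(num_anchors, 4)        # encoded ground-truth deltas
labels = torch.tensor([1, 0, 0, 1, 0, 0, 0, 1])    # 1 = foreground, 0 = background

cls_loss = F.binary_cross_entropy_with_logits(objectness_logits, labels.float())
fg = labels == 1
reg_loss = F.smooth_l1_loss(pred_deltas[fg], target_deltas[fg])  # foreground only
rpn_loss = cls_loss + reg_loss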


Stage 2: Object Classification and Box Refinement

Region Proposals:

● After the RPN, we get region proposals (refined boxes that likely contain objects).

ROI Pooling:

● The region proposals have different sizes, but the neural network needs fixed-size inputs.
● ROI Pooling resizes all region proposals to a fixed size by dividing them into smaller regions and applying pooling, making them uniform (see the sketch below).
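
torchvision ships this operation directly. A minimal sketch using torchvision.ops.roi_align (the ROI Align variant used by modern implementations; the feature map, proposal, and spatial_scale values are made up for illustration):

import torch
from torchvision.ops import roi_align

feature_map = torch.randn(1, 256, 50, 50)  # (batch, channels, H, W)
# One proposal: (batch index, x1, y1, x2, y2) in image coordinates
proposals = torch.tensor([[0.0, 10.0, 10.0, 200.0, 300.0]])

# spatial_scale maps image coordinates onto the feature map (1/16 for a
# stride-16 backbone); every proposal comes out as a fixed 7x7 grid.
pooled = roi_align(feature_map, proposals, output_size=(7, 7),
                   spatial_scale=1 / 16)
print(pooled.shape)  # torch.Size([1, 256, 7, 7])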

Object Classification:

● Each region proposal is passed through a small network to predict the category (e.g., dog, car, etc.) of the object inside it.
● Cross-entropy loss is used to classify the objects into categories (see the snippet below).
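
A short illustration of that loss with hypothetical per-proposal logits. The head predicts num_classes + 1 scores per proposal, where index 0 is the background class:

import torch
import torch.nn.functional as F

logits = torch.randn(128, 3)               # 128 proposals, 2 classes + background
gt_classes = torch.randint(0, 3, (128,))   # assigned class per proposal
cls_loss = F.cross_entropy(logits, gt_classes)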

Bounding Box Refinement (Again):

● The region proposals are refined again to better match the actual objects, using offsets.
● This uses regression loss to adjust the proposals.

Multi-task Learning:

● The network in stage 2 learns both to predict object categories and to refine bounding boxes at the same time.

Inference (Testing/Prediction Time):

● Top Region Proposals: During testing, the model generates a large number of region proposals, but only the top proposals (with the highest classification scores) are passed to the second stage.
● Final Predictions: The second stage predicts the final categories and bounding boxes.
● Non-Max Suppression: A technique called Non-Max Suppression (NMS) is applied to remove duplicate or overlapping boxes, keeping only the best ones, as illustrated below.
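
A small NMS illustration with torchvision.ops.nms and made-up boxes: the two heavily overlapping boxes collapse to the one with the higher score, while the distant box survives:

import torch
from torchvision.ops import nms

boxes = torch.tensor([[10.0, 10.0, 100.0, 100.0],
                      [12.0, 12.0, 102.0, 102.0],    # near-duplicate of the first
                      [200.0, 200.0, 300.0, 300.0]])
scores = torch.tensor([0.9, 0.8, 0.7])

keep = nms(boxes, scores, iou_threshold=0.5)  # indices of surviving boxes
print(keep)  # tensor([0, 2])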

Training:

Two ways to train:

1. Train in stages: first train the Region Proposal Network (RPN), then the classifier and regressor.
2. Train together: train both stages at the same time (faster and more efficient). The PyTorch walkthrough below uses this joint scheme.

Implement and Fine-Tune Faster R-CNN in PyTorch

Step 1: Install Required Libraries

pip install torch torchvision

Step 2: Import Required Modules

import torch
from torch.utils.data import DataLoader
import torchvision
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
import torchvision.transforms as T

Step 3: Load Pre-trained Faster R-CNN Model

PyTorch's torchvision provides a Faster R-CNN model pre-trained on COCO. You can modify this for your own dataset by changing the number of classes in the final layer.

# Load the pre-trained Faster R-CNN model with a ResNet-50 backbone
# (torchvision >= 0.13 prefers weights="DEFAULT" over the deprecated
# pretrained=True)
model = fasterrcnn_resnet50_fpn(pretrained=True)

# Number of classes (your dataset classes + 1 for background)
num_classes = 3  # for example, 2 classes + background

# Get the number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features

# Replace the head of the model with a new one (sized for the number
# of classes in your dataset)
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

Step 4: Prepare the Dataset

● Faster R-CNN requires images and corresponding annotations (bounding boxes and labels).
● Your dataset should return images and target dictionaries that include bounding boxes (boxes) and labels (labels).

Create your own dataset class if necessary: you can use torchvision.datasets.ImageFolder and provide bounding boxes in annotation files, or create a custom Dataset class.

# Define transformations (e.g., resizing, normalization)
transform = T.Compose([
    T.ToTensor(),
])

# Custom Dataset class or using an existing one
class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, transforms=None):
        # Initialize dataset paths and annotations here
        self.transforms = transforms
        # Your dataset logic (image paths, annotations, etc.)

    def __getitem__(self, idx):
        # Load image
        img = ...  # Load your image here

        # Load corresponding bounding boxes and labels
        boxes = ...  # Load or define bounding boxes
        labels = ...  # Load or define labels

        # Create a target dictionary
        target = {}
        target["boxes"] = torch.tensor(boxes, dtype=torch.float32)
        target["labels"] = torch.tensor(labels, dtype=torch.int64)

        # Apply transforms
        if self.transforms is not None:
            img = self.transforms(img)

        return img, target

    def __len__(self):
        # Return the length of your dataset
        # (assumes self.data was populated in __init__)
        return len(self.data)

Step 5: Set Up Data Loader

# Load dataset
dataset = CustomDataset(transforms=transform)

# Split into train and validation sets
indices = torch.randperm(len(dataset)).tolist()
train_dataset = torch.utils.data.Subset(dataset, indices[:-50])
valid_dataset = torch.utils.data.Subset(dataset, indices[-50:])

# Create data loaders. Detection batches hold variable-sized images and
# per-image target dicts, so the collate_fn keeps them as tuples instead
# of stacking them into a single tensor.
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True,
                          collate_fn=lambda x: tuple(zip(*x)))
valid_loader = DataLoader(valid_dataset, batch_size=4, shuffle=False,
                          collate_fn=lambda x: tuple(zip(*x)))

Step 6: Set Up Training Loop

Now set up the optimizer and training loop. For Faster R-CNN, it's common to use SGD or Adam as the optimizer.

# Move model to GPU if available
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
model.to(device)

# Set up the optimizer
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9,
                            weight_decay=0.0005)

# Learning rate scheduler
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3,
                                               gamma=0.1)

# Train the model
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    train_loss = 0.0

    # Training loop
    for images, targets in train_loader:
        images = list(image.to(device) for image in images)
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass: in train mode the model returns a dict of losses
        # (RPN objectness/regression and ROI-head classification/regression)
        loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())

        # Backward pass
        losses.backward()
        optimizer.step()

        train_loss += losses.item()

    # Update the learning rate
    lr_scheduler.step()
    print(f'Epoch: {epoch + 1}, Loss: {train_loss / len(train_loader)}')

print("Training complete!")

Step 7: Evaluate the Model

After training, you can evaluate the model on the validation set or use it for inference on new images.

# Set the model to evaluation mode: it now takes only images and
# returns per-image prediction dicts instead of losses
model.eval()

# Test on the validation set
with torch.no_grad():
    for images, targets in valid_loader:
        images = list(img.to(device) for img in images)
        predictions = model(images)

        # Example: print the bounding boxes and labels for the first image
        print(predictions[0]['boxes'])
        print(predictions[0]['labels'])
        break  # inspect only the first batch in this example

Step 8: Inference

To run inference on a new image:

from PIL import Image

# Load image
img = Image.open("path/to/your/image.jpg").convert("RGB")

# Apply the same transformation as for training
img = transform(img).to(device)

# Model prediction: the model expects a list of 3D (C, H, W) tensors,
# so there is no need to add a batch dimension
model.eval()
with torch.no_grad():
    prediction = model([img])

# Print the predicted bounding boxes and labels
print(prediction[0]['boxes'])
print(prediction[0]['labels'])
