Object Detection Slides
Object Detection
[Example image: detected objects labeled "boat" and "person"]
• Features are extracted with a CNN for every image region. With roughly 2,000 region
proposals per image, N images require about N*2,000 CNN feature extractions.
• The entire object-detection process in RCNN involves three models: a CNN for feature
extraction, an SVM classifier to identify objects, and a regression model to tighten the
bounding boxes.
• All these steps combined make RCNN very slow: it takes around 40-50 seconds to
make predictions for each new image (the per-region pipeline is sketched below).
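A minimal sketch of this per-region pipeline; every helper passed in (selective_search, cnn_features, svm_classify, bbox_regress) is a hypothetical placeholder, not a real API:

```python
# Hypothetical R-CNN pipeline sketch; all helpers are placeholders passed in by the caller.
def rcnn_detect(image, selective_search, cnn_features, svm_classify, bbox_regress):
    detections = []
    proposals = selective_search(image)              # ~2,000 region proposals per image
    for (x1, y1, x2, y2) in proposals:
        region = image[y1:y2, x1:x2]                 # crop (the real pipeline warps it to the CNN input size)
        feats = cnn_features(region)                 # one full CNN forward pass *per region*
        label, score = svm_classify(feats)           # SVM classifier on the CNN features
        refined_box = bbox_regress(feats, (x1, y1, x2, y2))  # separate bounding-box regressor
        detections.append((label, score, refined_box))
    return detections

# With N images this is roughly N * 2,000 CNN forward passes,
# which is why R-CNN takes tens of seconds per image at test time.
```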
RCNN-PROBLEM
• Spatial pyramid pooling (SPP-net) introduces a new problem: parameters below the SPP
layer cannot be updated during training.
Fast RCNN
• Instead of running a CNN 2,000 times per image, we can run it just once per image and get
all the regions of interest (regions containing some object).
• In Fast RCNN, we feed the input image to the CNN, which in turn generates the
convolutional feature maps.
• We then use an RoI pooling layer to reshape all the proposed regions to a fixed size, so that
they can be fed into a fully connected network.
• A softmax layer on top of the fully connected network outputs the classes. Alongside it, a
linear regression layer is used in parallel to output bounding-box coordinates for the
predicted classes.
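A rough sketch of this forward pass in PyTorch: torchvision's roi_pool is a real function, but the backbone output, layer sizes, and class count below are illustrative assumptions, not Fast RCNN's exact configuration:

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_pool

class FastRCNNHead(nn.Module):
    """Simplified Fast RCNN head: RoI pooling, shared FC layer, then class + box branches."""
    def __init__(self, channels=512, pool_size=7, num_classes=21):
        super().__init__()
        self.pool_size = pool_size
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(channels * pool_size * pool_size, 4096),
            nn.ReLU(),
        )
        self.cls_score = nn.Linear(4096, num_classes)      # softmax branch: class scores
        self.bbox_pred = nn.Linear(4096, num_classes * 4)  # parallel branch: box coordinates

    def forward(self, feature_map, rois):
        # The backbone has already run ONCE on the whole image to produce feature_map;
        # every proposed region is cropped from it and pooled to a fixed size.
        pooled = roi_pool(feature_map, rois, output_size=self.pool_size, spatial_scale=1 / 32)
        x = self.fc(pooled)
        return self.cls_score(x), self.bbox_pred(x)

# Usage: one 512x512 image -> one 16x16x512 feature map, proposals given in image coordinates.
feature_map = torch.randn(1, 512, 16, 16)
rois = [torch.tensor([[296.0, 192.0, 400.0, 300.0]])]       # (x1, y1, x2, y2) per image
cls_logits, box_deltas = FastRCNNHead()(feature_map, rois)
print(cls_logits.shape, box_deltas.shape)                   # torch.Size([1, 21]) torch.Size([1, 84])
```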
Cropping Features: RoI Pool
The model takes an input image of size 512x512x3 (width x height x RGB), and VGG16 maps
it to a 16x16x512 feature map.
Note that the output's width and height are exactly 32 times smaller than the input
image (512/32 = 16). That is important because all RoIs must be scaled down by this
factor.
Cropping Features: RoI Pool
Example: scaling an RoI corner from the input image down to the feature map by that factor of 32:
• x: 296/32 = 9.25
• y: 192/32 = 6
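The same scaling as a tiny worked example; the flooring step at the end is an assumption about how RoI Pool typically snaps fractional coordinates onto the feature-map grid:

```python
# Scale an RoI corner from image coordinates down to feature-map coordinates.
stride = 512 // 16          # backbone downsampling factor = 32
x_img, y_img = 296, 192     # RoI corner in the 512x512 input image

x_fm = x_img / stride       # 296 / 32 = 9.25  -> fractional
y_fm = y_img / stride       # 192 / 32 = 6.0   -> exact

# RoI Pool cannot address a fractional cell, so (assumption) the coordinate is
# floored onto the 16x16 feature-map grid, losing a little spatial precision.
print(int(x_fm), int(y_fm))  # 9 6
```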
Cropping Features: RoI Pool
After the RoI pooling layer there is a fully connected
layer with a fixed input size. Because our RoIs have different
sizes, we have to pool them into the same size
(3x3x512 in our example). At this point our mapped
RoI has a size of 4x6x512, and as you can imagine
we cannot divide 4 evenly by 3.
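A small sketch of that pooling step using torchvision's roi_pool; the feature values are random and the RoI is given directly in feature-map cells, so only the shapes matter here:

```python
import torch
from torchvision.ops import roi_pool

feature_map = torch.randn(1, 512, 16, 16)          # VGG16 output for the 512x512 image
# One RoI given in feature-map cells (batch_index, x1, y1, x2, y2): 4 cells wide, 6 tall.
roi = torch.tensor([[0.0, 9.0, 6.0, 13.0, 12.0]])

pooled = roi_pool(feature_map, roi, output_size=3, spatial_scale=1.0)
print(pooled.shape)  # torch.Size([1, 512, 3, 3])
# 4 columns and 6 rows do not divide evenly into a 3x3 grid, so RoI Pool quantizes
# the bins unevenly, which loses some spatial precision.
```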
Problems with Fast RCNN
• Fast RCNN still has a problem area: it relies on selective search to generate region
proposals, which runs outside the network, is slow, and becomes the bottleneck at test time.
• YOLO is extremely fast because it passes the entire image through a CNN at once,
rather than making predictions on many individual regions of the image.
• The key idea behind YOLO is to use a single neural network to predict the
bounding boxes and class probabilities for objects in an image
YOLO (You Only Look Once!)
• YOLO divides the input image into a grid of cells and predicts the presence of objects in
each cell.
• If an object is detected in a cell, the algorithm also predicts the bounding box and the
class for the object.
• The bounding box coordinates and class probabilities are then used to localize and
classify the objects.
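A rough sketch of the YOLO output layout, assuming the original YOLOv1 settings (S=7 grid, B=2 boxes per cell, C=20 classes); the tensor is random and the decoding only illustrates the shapes:

```python
import torch

S, B, C = 7, 2, 20                      # grid size, boxes per cell, classes (YOLOv1 defaults)
# One forward pass predicts everything at once: each cell outputs B boxes
# (x, y, w, h, confidence) plus C class probabilities.
pred = torch.randn(S, S, B * 5 + C)     # shape (7, 7, 30)

boxes = pred[..., : B * 5].reshape(S, S, B, 5)      # per-cell box predictions
class_probs = pred[..., B * 5 :].softmax(dim=-1)    # per-cell class distribution

# Score that "cell (i, j), box b contains class k" = box confidence * class probability.
scores = boxes[..., 4:5] * class_probs.unsqueeze(2)  # shape (S, S, B, C)
print(scores.shape)
```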
Confusion Matrix
• To create a confusion matrix, we need four attributes:
• True Positives (TP): The model predicted a label that matches the ground truth.
• True Negatives (TN): The model did not predict a label, and none is present in the
ground truth.
• False Positives (FP): The model predicted a label that is not part of the ground truth.
• False Negatives (FN): The model did not predict a label that is part of the ground
truth.
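For detection, these counts are usually obtained by matching predicted boxes to ground-truth boxes with an IoU threshold; below is a minimal single-class sketch assuming greedy one-to-one matching at IoU ≥ 0.5 (not any particular benchmark's official protocol):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def count_tp_fp_fn(pred_boxes, gt_boxes, iou_thresh=0.5):
    """Greedy matching: a prediction is a TP if it overlaps an unmatched ground-truth box."""
    matched, tp, fp = set(), 0, 0
    for p in pred_boxes:
        best = max(range(len(gt_boxes)), key=lambda i: iou(p, gt_boxes[i]), default=None)
        if best is not None and best not in matched and iou(p, gt_boxes[best]) >= iou_thresh:
            matched.add(best)
            tp += 1
        else:
            fp += 1
    fn = len(gt_boxes) - len(matched)   # ground-truth boxes that no prediction matched
    return tp, fp, fn

tp, fp, fn = count_tp_fp_fn([(10, 10, 50, 50)], [(12, 12, 48, 48)])
precision, recall = tp / (tp + fp), tp / (tp + fn)   # both 1.0 for this toy example
```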
Detection evaluation