CSE4261 Lecture-12

Object Classification: An Introduction

Prof. Dr. Shamim Akhter


Professor, Dept. of CSE
Ahsanullah University of Science and Technology
Object Classification

[Figure: example images labelled CAT and DOG]

Typically, recognizing that you have many instances of an object in an
image involves finding the position of each instance in the image and
being able to distinguish between them. To do that, we need to be able
to find the position of each instance and its borders.

Object localization (single instance) / object detection (multiple instances)
Object Localization
• Determine the location of one or more objects (for example, people or
  cars) in an image and draw a rectangular bounding box around each one.
  – Localization typically refers to when an image contains only one
    instance of an object, while detection refers to when several
    instances of an object appear in an image.
• Classification: Give a label to an image, or in other words,
  “understand” what is in an image.
  – For example, an image of a cat may have the label “cat”.
• Classification and localization: Give a label to an image and
  determine the borders of the object contained in it (and typically
  draw a rectangle around the object).
• Object detection: This term is used when you have multiple instances
  of an object in an image.
  – In object detection, you want to find all the instances of several
    object classes (for example, people, cars, signs, etc.) and draw
    bounding boxes around them.
• Instance segmentation: You want to label each pixel of the image with
  a specific class for each separate instance, to be able to find the
  exact limits of each object instance.
• Semantic segmentation: You want to label each pixel of the image with
  a specific class.
  – The difference from instance segmentation is that you don’t
    distinguish between several instances of the same class: all pixels
    belonging to cars will be labelled as “car”.
  – In instance segmentation, you can still tell how many instances of
    a car you have and where exactly each one is.
Object Classification and Localization
We will assume that in the images we have only one instance of a
specific object, and the task is to determine what kind of object it is
and draw a bounding box (a rectangle) around it.

How to create a bounding box around the object?


Sliding Window Approach
• Cut a small portion of your input image (of size x × y) starting from
  the top-left corner. The window has dimensions wx × wy, with wx < x
  and wy < y.
• Use a pre-trained network to classify the image portion that you cut.
• Now shift this window by an amount we call the stride (s) toward the
  right and then downward, and use the network to classify this second
  portion.
• Once the sliding window has covered the entire image, choose the
  position of the window that gives the highest classification
  probability. This position will give you the bounding box of your
  object (see the sketch after this list).

[Figure: a graphical illustration of the sliding window approach]
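As a rough illustration of the loop described above, here is a minimal
Python sketch. The `classify` callable is a hypothetical stand-in for a
pre-trained network, not a real library call:

    import numpy as np

    def sliding_window_detect(image, classify, wx, wy, stride):
        """Slide a wx-by-wy window over `image` and return the window
        (left, top, wx, wy) with the highest classification probability.
        `classify` maps an image patch to the probability that it
        contains the object (hypothetical interface)."""
        h, w = image.shape[:2]
        best_prob, best_box = -1.0, None
        for top in range(0, h - wy + 1, stride):
            for left in range(0, w - wx + 1, stride):
                patch = image[top:top + wy, left:left + wx]
                prob = classify(patch)
                if prob > best_prob:
                    best_prob, best_box = prob, (left, top, wx, wy)
        return best_box, best_prob

    # Toy usage: a fake "classifier" that scores patches by brightness.
    image = np.zeros((100, 100))
    image[40:70, 50:80] = 1.0
    print(sliding_window_detect(image, lambda p: p.mean(), 30, 30, stride=5))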
Problems and Limitations:
Sliding Window Approach
• Depending on the choice of wx, wy, and s, we may not be able to
cover the entire image.
• How do you choose wx, wy, and s? What if the object is larger or
smaller?
• What if our object spans across two windows?
We could solve the third problem by using s = 1 to be sure that we cover all
possible cases, but the first two problems are not so easy to solve.

To address the window size problem, we should try all possible sizes and all
possible proportions.

Do you see any problem here?


The number of evaluations that you would need to run through the network
quickly gets out of control and becomes computationally infeasible (see
the rough count below).
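As a back-of-the-envelope illustration (the image size, window sizes,
and stride here are made-up numbers, not from the slides), counting the
windows over all candidate sizes shows how fast the number grows:

    # Rough count of windows for a 640x480 image, stride 4, trying all
    # window sizes in steps of 32 pixels (illustrative numbers only).
    img_w, img_h, s = 640, 480, 4
    total = 0
    for wx in range(32, img_w + 1, 32):       # candidate window widths
        for wy in range(32, img_h + 1, 32):   # candidate window heights
            nx = (img_w - wx) // s + 1        # horizontal positions
            ny = (img_h - wy) // s + 1        # vertical positions
            total += nx * ny
    print(total)  # about 1.3 million network evaluations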
Multi-task Learning
A better approach is to use multi-task learning. The idea is to build a
network that learns, at the same time, two things:

• the class of the object;
• the position of the bounding box.

We need to minimize a linear combination of the two loss functions,

    L = L_class + α · L_bbox

where α is an additional hyper-parameter that needs to be tuned.
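A minimal PyTorch sketch of such a combined loss follows. The specific
loss choices here (cross-entropy for the class, smooth L1 for the box)
are common defaults assumed for illustration, not prescribed by the
slides:

    import torch
    import torch.nn as nn

    ce_loss = nn.CrossEntropyLoss()  # classification loss (L_class)
    l1_loss = nn.SmoothL1Loss()      # bounding-box regression loss (L_bbox)
    alpha = 1.0                      # the hyper-parameter to tune

    def multitask_loss(class_logits, box_pred, class_target, box_target):
        # Linear combination of the two losses, as described above.
        return (ce_loss(class_logits, class_target)
                + alpha * l1_loss(box_pred, box_target))

    # Toy batch: 8 examples, 3 classes, 4 box coordinates each.
    loss = multitask_loss(torch.randn(8, 3), torch.rand(8, 4),
                          torch.randint(0, 3, (8,)), torch.rand(8, 4))
    print(loss.item())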
Region-Based CNN (R-CNN)
• Girshick et al. proposed Selective Search: first propose about 2000
  regions from the image and then, instead of classifying a huge number
  of windows, classify just those 2000 regions (a runnable sketch using
  OpenCV follows this list).
• Selective Search uses a classical (non-learned) approach to determine
  which regions may contain an object.
  – The first step in the algorithm is to segment the image, using
    pixel intensities and graph-based methods.
  – After this step, adjacent regions are grouped based on similarities
    of the following features: color similarity, texture similarity,
    size similarity, and shape compatibility.
  – This produces about 2000 candidate region proposals.
  – These 2000 candidate region proposals are warped into a square and
    fed into a pre-trained convolutional neural network (AlexNet) that
    produces a 4096-dimensional feature vector as output.
  – The extracted features are fed into an SVM to classify the presence
    of the object within that candidate region proposal.
  – The algorithm also predicts four (4) offset values to increase the
    precision of the bounding box.
  – For example, given a region proposal, the algorithm may have
    predicted the presence of a person, but the face of that person
    within the region proposal could have been cut in half. The offset
    values help adjust the bounding box of the region proposal.
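The region-proposal stage can be reproduced with OpenCV's built-in
Selective Search (a quick sketch; it requires the opencv-contrib-python
package, and "input.jpg" is a placeholder path):

    import cv2

    image = cv2.imread("input.jpg")  # placeholder input image

    ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
    ss.setBaseImage(image)
    ss.switchToSelectiveSearchFast()  # trades some quality for speed

    rects = ss.process()              # array of (x, y, w, h) proposals
    print(len(rects), "region proposals")
    for (x, y, w, h) in rects[:2000]: # R-CNN keeps roughly 2000 of them
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 1)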
Problems with R-CNN
• It still takes a huge amount of time to train the network, as you
  would have to classify 2000 region proposals per image.
• It cannot be run in real time, as it takes around 47 seconds per test
  image.
• The Selective Search algorithm is a fixed algorithm, so no learning
  happens at that stage. This can lead to the generation of bad
  candidate region proposals.
Fast R-CNN

• The approach is similar to the R-CNN algorithm. But instead of feeding
  the region proposals to the CNN, we feed the whole input image to the
  CNN to generate a convolutional feature map.
• From the convolutional feature map, we identify the region proposals
  and warp them into squares, and by using a RoI pooling layer we
  reshape them into a fixed size so that they can be fed into a fully
  connected layer (see the RoI pooling sketch after this list).
• From the RoI feature vector, we use a softmax layer to predict the
  class of the proposed region and also the offset values for the
  bounding box.
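A minimal sketch of the RoI pooling step using torchvision's built-in
operator (the feature-map shape, proposal boxes, and 16x downsampling
factor are assumptions for illustration):

    import torch
    from torchvision.ops import roi_pool

    # Hypothetical feature map from a backbone CNN: batch of 1,
    # 256 channels, 32x32 spatial size (input image downsampled 16x).
    feature_map = torch.randn(1, 256, 32, 32)

    # Two region proposals in input-image coordinates: (x1, y1, x2, y2).
    proposals = [torch.tensor([[16., 16., 240., 240.],
                               [64., 128., 300., 400.]])]

    # RoI pooling turns each arbitrarily sized proposal into a fixed
    # 7x7 grid, so it can be fed into a fully connected layer.
    # spatial_scale maps image coordinates to feature-map coordinates.
    pooled = roi_pool(feature_map, proposals, output_size=(7, 7),
                      spatial_scale=1.0 / 16.0)
    print(pooled.shape)  # torch.Size([2, 256, 7, 7])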
Faster R-CNN
• Shaoqing Ren et al. developed an object detection algorithm that
  eliminates the Selective Search algorithm and lets the network learn
  the region proposals.
• Similar to Fast R-CNN, the image is provided as input to a
  convolutional network, which produces a convolutional feature map.
• Instead of running a Selective Search algorithm on the feature map to
  identify the region proposals, a separate network (the Region Proposal
  Network) is used to predict the region proposals.
• The predicted region proposals are then reshaped using a RoI pooling
  layer and used to classify the image within each proposed region and
  to predict the offset values for the bounding boxes.
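A pre-trained Faster R-CNN can be loaded directly from torchvision (a
quick usage sketch, assuming a recent torchvision version; the input
tensor here is random, standing in for a real image):

    import torch
    from torchvision.models.detection import fasterrcnn_resnet50_fpn

    # Faster R-CNN with a ResNet-50 FPN backbone, pre-trained on COCO.
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    # The model takes a list of 3xHxW tensors with values in [0, 1].
    image = torch.rand(3, 480, 640)
    with torch.no_grad():
        predictions = model([image])

    # Each prediction holds bounding boxes, class labels, and scores.
    print(predictions[0]["boxes"].shape, predictions[0]["scores"][:5])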
You Only Look Once (YOLO) Method
• In 2015, Redmon J. et al. proposed a new method to do object
  detection: they called it YOLO (You Only Look Once).
• This method is fast and is often used in real-time applications.
  – The network can perform all the necessary tasks (detect where the
    objects are, classify multiple objects, etc.) in one pass.
• The main idea of the method is to reframe the detection problem as one
  single regression problem,
  – from the pixels of the image as inputs, to the bounding box
    coordinates and class probabilities.
How YOLO Works
• Dividing the image into cells
  – The first step is to divide the image into S × S cells. For each
    cell, we predict whether an object is in the cell and, if so, what
    it is.
    • Only one object will be predicted per cell, so one cell cannot
      predict multiple objects.
  – Then, for each cell, a certain number (B) of bounding boxes that
    should contain the objects are predicted, together with the
    likelihood of each of the C object classes.
  – A class confidence (a number) is also predicted for each bounding
    box.

As an example, take cell D3 in the figure: this cell will predict the
presence of a mouse and then a certain number B of bounding boxes (the
yellow rectangles). Similarly, cell B2 will predict the presence of the
bottle and B bounding boxes (the red rectangles), all at the same time.
• For each bounding box (B in total), there are four values: x, y, w, h.
  These are the position of the center, the width, and the height. Note
  that the position of the center is given relative to the cell
  position, not as an absolute value (a decoding sketch follows below).
• For each bounding box (B in total), there is a confidence score, a
  number that reflects how likely it is that the box contains the
  object. In particular, at training time, if we indicate the
  probability of the cell containing the object as Pr(Object), the
  confidence is calculated as follows:

    confidence = Pr(Object) × IOU

  where IOU indicates the Intersection Over Union, which is calculated
  using the training data.
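To make the cell-relative encoding concrete, here is a small decoding
sketch. The normalization conventions (x, y relative to the cell; w, h
relative to the whole image) follow the original YOLO paper and are an
assumption here:

    def decode_cell_box(cell_row, cell_col, x, y, w, h, S, img_w, img_h):
        """Convert one YOLO-style box prediction into absolute image
        coordinates (left, top, width, height)."""
        center_x = (cell_col + x) / S * img_w  # x in [0, 1] within the cell
        center_y = (cell_row + y) / S * img_h  # y in [0, 1] within the cell
        box_w, box_h = w * img_w, h * img_h    # w, h relative to the image
        return (center_x - box_w / 2, center_y - box_h / 2, box_w, box_h)

    # Example: S = 7 grid, a box centered 30% into cell (3, 4) of a
    # 448x448 input image.
    print(decode_cell_box(3, 4, 0.3, 0.3, 0.2, 0.4, S=7,
                          img_w=448, img_h=448))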
IOU (Intersection Over Union)
• This is a fully supervised task.
• This means that we will need to learn where the bounding boxes are and
  compare them to some given ground truth. We need a metric to quantify
  how good the overlap is between the predicted bounding boxes and the
  ground truth. This is typically done with the IOU (Intersection Over
  Union):

    IOU = (area of intersection) / (area of union)

  In the ideal case of perfect overlap we have IOU = 1, while if there
  is no overlap at all we have IOU = 0.
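A minimal Python implementation of the metric:

    def iou(box_a, box_b):
        """Intersection Over Union of two boxes given as (x1, y1, x2, y2)."""
        # Coordinates of the intersection rectangle.
        ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # perfect overlap -> 1.0
    print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # no overlap -> 0.0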
In the original paper, the authors were inspired by the GoogLeNet model.
The network has 24 convolutional layers followed by two dense layers
(the last one having 1470 neurons; do you see why?).

Hint: each bounding box carries 4 box parameters plus 1 confidence
score, so with S = 7, B = 2, and C = 20 classes, the output size is
S × S × (B × 5 + C) = 7 × 7 × 30 = 1470.
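A quick check of the arithmetic (the values S = 7, B = 2, C = 20 are
those used in the original YOLO paper):

    S, B, C = 7, 2, 20             # grid size, boxes per cell, classes
    box_params, conf_score = 4, 1  # x, y, w, h plus one confidence score
    print(S * S * (B * (box_params + conf_score) + C))  # 1470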
