Computer Vision - Compressed

Overview of the YOLO
Object Detection Algorithm

Presented by:
-Benamara Ichrak
-Chouchaoui Med Bachir
-Messar Aya
OUR WORK :
01 YOLO Overview
Challenges and limits of YOLO
02 YOLO Architecture
Convolutional Layers and Feature Extraction
03 How YOLO Works
Forward Pass and Predictions
04 Training Process
Loss Functions and data augmentation
01
YOLO
Overview
02
YOLO
Architecture
ARCHITECTURE
YOLO predicts bounding boxes and class labels in a single pass through the network.
That is why it is called YOU ONLY LOOK ONCE.
INPUT LAYER
Grid Cells
● Image resized to 448x448 (typically)
● Divided into an SxS grid of cells
● S = 7 in the paper
● Each grid cell covers 64x64 px (448 / 7 = 64)
● Each cell is responsible for predicting one object
Normalization
Raw image Values (0-255):
Pixel (0,0) : R: 220 G: 150 B: 100
Normalization(0-1) :
normalized_value = pixel_value / 255
Pixel (0,0) R: 0.86 G: 0.59 B: 0.39
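The normalization above can be checked with a minimal sketch. The function name normalize_pixel is hypothetical and only illustrates the division by 255 on the example pixel.

```python
def normalize_pixel(rgb):
    """Scale raw 0-255 channel values to the 0-1 range."""
    return tuple(value / 255 for value in rgb)

# Example pixel (0,0) from the slide: R=220, G=150, B=100.
r, g, b = normalize_pixel((220, 150, 100))
print(round(r, 2), round(g, 2), round(b, 2))  # 0.86 0.59 0.39
```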
Each grid cell predicts:
● Bounding boxes (typically, each cell predicts multiple boxes).
● Confidence scores: how sure the model is that a box contains an object, and how accurate the predicted box is.
● Class probabilities: how likely the object is to belong to a specific class (e.g., fire, cat).
Non-Maximum Suppression :
Non-Maximum Suppression (NMS) is
a post-processing technique used in
object detection to eliminate redundant
or overlapping bounding boxes,
retaining only the most accurate
prediction for each object.
How it works :
NMS keeps the box with the highest
confidence score and discards the
lower-scoring boxes that overlap it
too much. It repeats this process
with the remaining boxes until no
more boxes are left to evaluate.
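The procedure can be sketched as follows. This is a minimal, class-agnostic version under stated assumptions: boxes are given as (x1, y1, x2, y2) corners, overlap is measured with IoU, and the threshold of 0.5 is an illustrative choice. The names iou and nms are hypothetical helpers, not taken from the slides.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring box still in play
        keep.append(best)
        # Discard every remaining box that overlaps the kept one too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In practice this filtering is usually run separately for each class, so that overlapping objects of different classes do not suppress each other.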
The predicted bounding box is represented as (x, y, w, h, c)
● Center point (x, y) : relative to the cell
● Width/Height (w, h) : relative to the whole image
● Confidence (c) : the confidence score of the box
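To make the two reference frames concrete, here is a minimal sketch that converts a box given in pixels into this encoding. It assumes a 448x448 image and S = 7. The function name encode_box and the example box are hypothetical.

```python
def encode_box(cx, cy, w, h, image_size=448, S=7):
    """Convert a pixel box (center cx, cy and size w, h) to the YOLO encoding.

    Returns the (row, col) of the responsible cell and (x, y, w, h), where
    x, y are offsets inside that cell and w, h are fractions of the image.
    """
    cell_size = image_size / S
    col = min(int(cx // cell_size), S - 1)
    row = min(int(cy // cell_size), S - 1)
    x = cx / cell_size - col      # horizontal offset inside the cell
    y = cy / cell_size - row      # vertical offset inside the cell
    return row, col, (x, y, w / image_size, h / image_size)

row, col, box = encode_box(cx=224, cy=160, w=112, h=224)
print(row, col, box)  # 2 3 (0.5, 0.5, 0.25, 0.5)
```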
How bounding boxes are encoded
Let's use a simple example where S = 3, each cell predicts one bounding box
(B = 1), and objects are either dog (c1) or human (c2).
Example
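One way to write this example down in code is sketched below. The per-cell layout [x, y, w, h, c, c1, c2] is an assumption based on the box representation given earlier, and the confidence target is set to 1 wherever an object is present, which is a common simplification. The helper names and the example box are hypothetical.

```python
S, B, C = 3, 1, 2        # grid size, boxes per cell, number of classes
DOG, HUMAN = 0, 1        # class indices for c1 and c2

def empty_label(S, B, C):
    """S x S grid of zero vectors, each of length B * 5 + C."""
    return [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]

def set_object(label, row, col, box, class_id):
    """Write one object into its cell as [x, y, w, h, c, c1, c2]."""
    x, y, w, h = box
    cell = label[row][col]
    cell[0:5] = [x, y, w, h, 1.0]   # assumed confidence target: 1 if an object exists
    cell[5 + class_id] = 1.0        # one-hot class entry

label = empty_label(S, B, C)
set_object(label, row=1, col=1, box=(0.5, 0.4, 0.3, 0.6), class_id=DOG)
print(label[1][1])  # [0.5, 0.4, 0.3, 0.6, 1.0, 1.0, 0.0]
print(label[0][0])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```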
Multiple bounding boxes
What happens if we predict multiple bounding boxes per cell (B>1) ?

We simply extend the vector y with the values of each additional box.
Network:
● 24 convolutional layers
● 2 fully connected layers
Prediction Vector
Output layer :
7x7x30
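The depth of 30 follows from the grid setup: each cell predicts B boxes of 5 values (x, y, w, h, c) plus C class probabilities. With S = 7, B = 2 and C = 20 classes, as in the original YOLO configuration, this gives 7x7x30. A minimal sketch (the function name output_shape is hypothetical):

```python
def output_shape(S, B, C):
    """Shape of the prediction tensor: S x S cells, B * 5 + C values per cell."""
    return (S, S, B * 5 + C)

print(output_shape(7, 2, 20))  # (7, 7, 30)
print(output_shape(3, 1, 2))   # (3, 3, 7)
```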
03
How YOLO
works
Forward Pass and Predictions :
Forward pass in YOLO involves running the input
image through a CNN (Convolutional Neural Network)
to generate feature maps, which are then used to
predict both bounding boxes and class probabilities for
each grid cell.
Multiple objects in one image :
To detect multiple objects in an image,
YOLO divides the image into grid cells,
where each cell is responsible for
predicting objects whose centers fall
within it. This approach enables
simultaneous detection and classification
of multiple objects in a single forward
pass.
For each grid cell,
the network predicts a vector
containing the box coordinates,
confidence score, and class
probabilities described earlier.
So, if YOLO divides an image
into a grid of 16 cells (4x4),
and each grid cell predicts a
vector of size 7, the output
volume is 4x4x7, i.e. 112 values.
Let’s form our training dataset:
Prediction:
○ The final output is a list of bounding boxes, each with:
■ Predicted object class.
■ Bounding box coordinates (x, y, w, h).
■ Confidence score.
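A minimal sketch of how such a list could be read out of the prediction grid is shown below. It assumes one box per cell with the layout [x, y, w, h, c, p1..pC], a confidence threshold of 0.5, and the cell-relative center convention described earlier. The function name decode and the sample values are hypothetical, and this is a simplified illustration rather than the exact post-processing of any YOLO release.

```python
def decode(pred, class_names, conf_threshold=0.5):
    """Turn an S x S grid of [x, y, w, h, c, p1..pC] vectors into detections."""
    S = len(pred)
    detections = []
    for row in range(S):
        for col in range(S):
            x, y, w, h, conf, *probs = pred[row][col]
            if conf < conf_threshold:
                continue                      # skip low-confidence cells
            class_id = max(range(len(probs)), key=lambda i: probs[i])
            # Convert the cell-relative center to image-relative coordinates.
            cx, cy = (col + x) / S, (row + y) / S
            detections.append((class_names[class_id], (cx, cy, w, h), conf))
    return detections

S = 3
pred = [[[0.0] * 7 for _ in range(S)] for _ in range(S)]
pred[1][1] = [0.5, 0.5, 0.4, 0.6, 0.9, 0.8, 0.2]
print(decode(pred, ["dog", "human"]))
# [('dog', (0.5, 0.5, 0.4, 0.6), 0.9)]
```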
YOU ONLY LOOK ONCE :
YOLO is called "You Only Look Once"
because it processes the entire image
in a single forward pass through the
network to detect and classify objects.
This unified and efficient approach
contrasts with older models that
required multiple stages to perform
object detection.
1st Issue that we can have :
The algorithm might detect
multiple bounding rectangles
for a given object.
IoU Concept:
The Intersection over Union
(IoU) is a key metric in object
detection, used to evaluate the
accuracy of predicted bounding
boxes against ground truth
bounding boxes. It measures how
much overlap exists between the
two boxes and is represented as a
ratio.
IoU = Area of Overlap / Area of Union
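As a worked illustration, the sketch below computes the overlap area divided by the union area for two boxes. It assumes boxes are given as (x1, y1, x2, y2) corners. The function name iou and the example boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle (empty if the boxes do not meet).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    overlap = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - overlap
    return overlap / union if union > 0 else 0.0

predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(round(iou(predicted, ground_truth), 3))  # 0.143 (overlap 1, union 7)
print(iou(predicted, predicted))               # 1.0
```

An IoU of 1 means the two boxes coincide, and an IoU of 0 means they do not overlap at all.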
2nd Issue that we can have :
After the neural network has
detected all the objects, you
apply non-max suppression and
get unique bounding boxes.
Another issue can remain:
what if a single cell contains
the center of two objects?
A single vector can represent
only one class.
So how do you represent two
classes?
There is one set of values for
the dog and one for the person,
so instead of a seven-dimensional
vector, we can use a vector of
size 14 that simply concatenates
the two vectors.
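The concatenation can be sketched in a few lines. The per-object layout [x, y, w, h, c, p_dog, p_person] and the numeric values are assumptions made only for illustration.

```python
# Assumed per-object layout: [x, y, w, h, c, p_dog, p_person].
dog_vector = [0.5, 0.6, 0.3, 0.4, 1.0, 1.0, 0.0]
person_vector = [0.5, 0.4, 0.2, 0.7, 1.0, 0.0, 1.0]

# The cell's label is the two 7-value vectors placed end to end.
cell_vector = dog_vector + person_vector
print(len(cell_vector))  # 14
```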
04
Optimizing Object
Detection Performance
Any
Questions ?