
Overview of the YOLO

Object Detection Algorithm


Presented by:
-Benamara Ichrak
-Chouchaoui Med Bachir
-Messar Aya

OUR WORK:

01 YOLO Overview: challenges and limits of YOLO
02 YOLO Architecture: convolutional layers and feature extraction
03 How YOLO Works: forward pass and predictions
04 Training Process: loss functions and data augmentation
01 YOLO Overview

02 YOLO Architecture

ARCHITECTURE

YOLO predicts bounding boxes and class labels in a single pass through the network.
That is why it is called YOU ONLY LOOK ONCE.
INPUT LAYER

Grid Cells

● Image resized to 448×448 (typically)
● Divided into an S×S grid of cells
● S = 7 in the paper
● Each grid cell spans 64×64 px (448 / 7 = 64)
● Each cell is responsible for predicting one object

Normalization
Raw image values (0-255):
Pixel (0,0): R: 220 G: 150 B: 100

Normalized (0-1):
normalized_value = pixel_value / 255

Pixel (0,0): R: 0.86 G: 0.59 B: 0.39
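The scaling above can be checked with a toy 1×1 NumPy "image" holding the slide's pixel values (the array itself is an illustrative stand-in for a real 448×448 input):

```python
import numpy as np

# Toy 1x1 "image" holding the slide's raw pixel (R, G, B) values.
raw = np.array([[[220, 150, 100]]], dtype=np.uint8)

# Scale every channel from the 0-255 range into 0-1.
normalized = raw.astype(np.float32) / 255.0

print(normalized[0, 0])  # approximately [0.86 0.59 0.39]
```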

Each grid cell predicts:
● Bounding boxes (typically, each cell predicts multiple boxes).
● Confidence scores: how sure the model is that a box contains an object, and how accurate the predicted box is.
● Class probabilities: how likely the object belongs to a specific class (e.g., fire, cat...).

Non-Maximum Suppression:

Non-Maximum Suppression (NMS) is a post-processing technique used in object detection to eliminate redundant or overlapping bounding boxes, retaining only the most accurate prediction for each object.

How it works: NMS filters out overlapping bounding boxes by keeping the one with the highest confidence score and discarding others with lower scores. It repeats this process until no more boxes are left to evaluate.
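The procedure above can be sketched in plain Python. The corner box format (x1, y1, x2, y2) and the 0.5 overlap threshold are illustrative choices, not taken from the slides:

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes that
    overlap it too much, and repeat until no boxes are left."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two near-duplicate detections of one object plus one distant box:
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 and is dropped
```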

The predicted bounding box is represented as (x, y, w, h, c)

● Center point (x, y): relative to the cell

● Width/height (w, h): relative to the whole image

● Confidence (c): how sure the model is that the box contains an object
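Because (x, y) are cell-relative while (w, h) are image-relative, a small decoding step recovers pixel coordinates. A minimal sketch, assuming the 448×448 input and S = 7 from the slides (the helper `decode_box` is a hypothetical name, not from the paper):

```python
def decode_box(row, col, x, y, w, h, S=7, img_size=448):
    """Convert one YOLO-style box to absolute pixel coordinates.

    (x, y): object's center relative to its grid cell, in [0, 1].
    (w, h): box size relative to the whole image, in [0, 1].
    """
    cell = img_size / S    # 64 px per cell when S = 7
    cx = (col + x) * cell  # absolute center x
    cy = (row + y) * cell  # absolute center y
    return cx, cy, w * img_size, h * img_size

# Center in the middle of cell (3, 3); box covers half the image each way:
print(decode_box(3, 3, 0.5, 0.5, 0.5, 0.5))  # (224.0, 224.0, 224.0, 224.0)
```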

How are bounding boxes encoded?

Let's use a simple example where S = 3, each cell predicts one bounding box (B = 1), and objects are either dog (c1) or human (c2).

Example:
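Under that setup (S = 3, B = 1, two classes), each cell's label is a 7-value vector. A sketch of one possible encoding; the component ordering and the helper `encode_cell` are illustrative assumptions, not from the paper:

```python
# Label vector per cell: [c, x, y, w, h, p_dog, p_human] (size 7).
def encode_cell(has_object, x=0, y=0, w=0, h=0, is_dog=False, is_human=False):
    if not has_object:
        return [0, 0, 0, 0, 0, 0, 0]  # empty cell: all zeros
    return [1, x, y, w, h, int(is_dog), int(is_human)]

# A dog centered in the middle of its cell, covering 30% x 40% of the image:
y_cell = encode_cell(True, 0.5, 0.5, 0.3, 0.4, is_dog=True)
print(y_cell)  # [1, 0.5, 0.5, 0.3, 0.4, 1, 0]
```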

Multiple bounding boxes

What happens if we predict multiple bounding boxes per cell (B > 1)?

We simply augment y with one more (x, y, w, h, c) tuple per extra box.
Network:

● 24 convolutional
layers

● 2 fully connected
layers

Prediction Vector

Output layer: 7×7×30
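The 7×7×30 shape follows from S = 7, B = 2 boxes per cell, and C = 20 PASCAL VOC classes in the original paper, since each cell outputs B × 5 + C numbers:

```python
S, B, C = 7, 2, 20        # grid size, boxes per cell, PASCAL VOC classes
per_cell = B * 5 + C      # each box: x, y, w, h, confidence (5 numbers)
total = S * S * per_cell  # total outputs of the final layer

print((S, S, per_cell), total)  # (7, 7, 30) 1470
```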

03 How YOLO Works

Forward Pass and Predictions:
Forward pass in YOLO involves running the input
image through a CNN (Convolutional Neural Network)
to generate feature maps, which are then used to
predict both bounding boxes and class probabilities for
each grid cell.

Multiple objects in one image:
To detect multiple objects in an image,
YOLO divides the image into grid cells,
where each cell is responsible for
predicting objects whose centers fall
within it. This approach enables
simultaneous detection and classification
of multiple objects in a single forward
pass.

For each grid cell in YOLO, the network predicts a vector containing the information that we saw before.

So, if YOLO divides an image into a grid of 16 cells (4×4), and each grid cell predicts a vector of size 7, the output volume will have size 112 (16 × 7).

Let’s form our training dataset:

Prediction:

○ The final output is a list of bounding boxes, each with:
■ Predicted object class.
■ Bounding box coordinates (x, y, w, h).
■ Confidence score.

YOU ONLY LOOK ONCE:
YOLO is called "You Only Look Once"
because it processes the entire image
in a single forward pass through the
network to detect and classify objects.
This unified and efficient approach
contrasts with older models that
required multiple stages to perform
object detection.

1st issue that we can have:

The algorithm might detect multiple bounding rectangles for a given object.
IoU Concept:
The Intersection over Union
(IoU) is a key metric in object
detection, used to evaluate the
accuracy of predicted bounding
boxes against ground truth
bounding boxes. It measures how
much overlap exists between the
two boxes and is represented as a
ratio.

IoU = Area of Overlap / Area of Union
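Plugging two concrete boxes into the formula makes the ratio tangible. Corner format (x1, y1, x2, y2); the values are chosen for illustration:

```python
# Ground truth and predicted boxes in corner format (x1, y1, x2, y2):
gt   = (0, 0, 100, 100)
pred = (50, 50, 150, 150)

# Area of overlap: width times height of the intersection rectangle.
overlap = (max(0, min(gt[2], pred[2]) - max(gt[0], pred[0]))
           * max(0, min(gt[3], pred[3]) - max(gt[1], pred[1])))  # 50 * 50

# Area of union: both areas minus the double-counted overlap.
union = 100 * 100 + 100 * 100 - overlap  # 17500

iou = overlap / union
print(round(iou, 3))  # 0.143
```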

2nd issue that we can have:

After the neural network has detected all the objects and you apply non-max suppression, you get these unique bounding boxes. But there could be another issue:
what if a single cell contains the centers of two objects?

This vector can represent only one class.
So how do you represent two classes?
We have one value for dog and one value for person, so instead of a seven-dimensional vector we can use a vector of size 14, where we simply concatenate these two vectors.

04 Optimizing Object Detection Performance

Any Questions?

