Computer Vision - Compressed
OUR WORK:
01 YOLO Overview
Challenges and limits of YOLO
02 YOLO Architecture
Convolutional Layers and Feature Extraction
03 How YOLO Works
Forward Pass and Predictions
04 Training Process
Loss Functions and data augmentation.
01 YOLO Overview
02 YOLO Architecture
ARCHITECTURE
YOLO predicts bounding boxes and class labels in a single pass through the network.
That is why it is called YOU ONLY LOOK ONCE.
INPUT LAYER
Grid Cells
Normalization
Raw image values (0-255):
Pixel (0,0): R: 220, G: 150, B: 100
Normalization (0-1):
normalized_value = pixel_value / 255
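As a minimal sketch of the normalization rule above, the following applies it to the example pixel. The function name `normalize_pixel` is an illustrative choice, not something from the slides.

```python
def normalize_pixel(rgb):
    """Scale 8-bit channel values (0-255) into the range 0-1."""
    return tuple(channel / 255 for channel in rgb)


# R, G, B values of pixel (0,0) from the example above
pixel = (220, 150, 100)
normalized = normalize_pixel(pixel)
print([round(v, 3) for v in normalized])  # [0.863, 0.588, 0.392]
```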
Each grid cell predicts:
● Bounding boxes (typically, each cell predicts multiple boxes).
● Confidence scores, which indicate how sure the model is that a box contains an object and how accurate the predicted box is.
● Class probabilities, which indicate how likely the object is to belong to a specific class (e.g., fire, cat).
Non-Maximum Suppression:
Non-Maximum Suppression (NMS) is a post-processing technique used in object detection to eliminate redundant or overlapping bounding boxes, retaining only the most accurate prediction for each object.
How it works:
NMS filters out overlapping bounding boxes by keeping the one with the highest confidence score and discarding others with lower scores. It repeats this process until no more boxes are left to evaluate.
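The procedure can be sketched in a few lines of plain Python. This is an illustrative version under stated assumptions, not the code of any particular YOLO release: boxes are given as (x1, y1, x2, y2) corners, overlap is measured with Intersection over Union, and the 0.5 threshold is an arbitrary example value.

```python
def iou(a, b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes that survive, best score first."""
    # Candidates sorted by confidence, highest first
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)          # highest-scoring box still in play
        keep.append(best)
        # Discard every remaining box that overlaps the kept one too much
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Two near-duplicate detections of one object plus one separate object
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

The second box overlaps the first with an IoU of about 0.82, so it is discarded, while the third box does not overlap at all and is kept.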
The predicted bounding box is represented as (x, y, w, h, c): the box center (x, y), its width w and height h, and the confidence score c.
How are bounding boxes encoded?
Let's use a simple example where S = 3, each cell predicts one bounding box (B = 1), and objects are either dog (c1) or human (c2).
Example
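One possible way to build the target grid for this S = 3, B = 1 example is sketched below. The vector order [x, y, w, h, c, c1, c2], the normalized input format, and the function name `encode_targets` are assumptions made for illustration; the slides do not fix them.

```python
S = 3                        # the grid is S x S
CLASSES = ["dog", "human"]   # c1, c2


def encode_targets(objects, s=S, num_classes=len(CLASSES)):
    """Build an S x S grid of target vectors [x, y, w, h, c, c1, c2].

    Each object is (x_center, y_center, width, height, class_index),
    with coordinates normalized to 0-1 relative to the whole image.
    """
    empty = [0.0] * (5 + num_classes)
    grid = [[list(empty) for _ in range(s)] for _ in range(s)]
    for xc, yc, w, h, cls in objects:
        col = min(int(xc * s), s - 1)   # cell that contains the box center
        row = min(int(yc * s), s - 1)
        x_cell = xc * s - col           # center offset inside that cell
        y_cell = yc * s - row
        one_hot = [1.0 if k == cls else 0.0 for k in range(num_classes)]
        grid[row][col] = [x_cell, y_cell, w, h, 1.0] + one_hot
    return grid


# A dog centered in the image, covering 40% of its width and 30% of its height
target = encode_targets([(0.5, 0.5, 0.4, 0.3, 0)])
print(target[1][1])  # [0.5, 0.5, 0.4, 0.3, 1.0, 1.0, 0.0]
print(target[0][0])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Only the middle cell, which contains the dog's center, gets a non-zero vector; the other eight cells stay empty.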
Multiple bounding boxes
Network:
● 24 convolutional layers
● 2 fully connected layers
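Prediction Vector
Output layer: 7 x 7 x 30, i.e. S x S x (5B + C) with S = 7 grid cells per side, B = 2 boxes per cell, and C = 20 classes.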
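To make the 30 values per cell concrete, here is a small sketch that splits one cell's vector into its parts. The layout used (B groups of x, y, w, h, confidence followed by C class scores) is an assumption for illustration; actual implementations may order the values differently.

```python
S, B, C = 7, 2, 20   # grid size, boxes per cell, number of classes


def split_cell_vector(vec, b=B, c=C):
    """Split one cell's prediction vector into boxes and class scores.

    Assumed layout: b groups of (x, y, w, h, confidence), then c class scores.
    """
    if len(vec) != b * 5 + c:
        raise ValueError("unexpected vector length")
    boxes = [tuple(vec[i * 5:(i + 1) * 5]) for i in range(b)]
    class_scores = list(vec[b * 5:])
    return boxes, class_scores


cell = [float(i) for i in range(B * 5 + C)]   # dummy values 0..29
boxes, class_scores = split_cell_vector(cell)
print(len(cell), len(boxes), len(class_scores))  # 30 2 20
print(S * S * len(cell))                         # 1470
```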
03 How YOLO Works
Forward Pass and Predictions:
The forward pass in YOLO runs the input image through a CNN (Convolutional Neural Network) to generate feature maps, which are then used to predict both bounding boxes and class probabilities for each grid cell.
Multiple objects in one image :
To detect multiple objects in an image,
YOLO divides the image into grid cells,
where each cell is responsible for
predicting objects whose centers fall
within it. This approach enables
simultaneous detection and classification
of multiple objects in a single forward
pass.
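The rule that an object belongs to the cell containing its center can be sketched as follows. The object names and center coordinates are hypothetical values chosen for illustration, and centers are assumed to be normalized to 0-1 across the image.

```python
def responsible_cell(x_center, y_center, grid_size):
    """Return (row, col) of the grid cell containing a normalized center."""
    col = min(int(x_center * grid_size), grid_size - 1)
    row = min(int(y_center * grid_size), grid_size - 1)
    return row, col


# Hypothetical object centers in one image, as (x, y)
centers = {"dog": (0.20, 0.70), "person": (0.65, 0.40), "cat": (0.90, 0.95)}
cells = {name: responsible_cell(x, y, grid_size=4)
         for name, (x, y) in centers.items()}
print(cells)  # {'dog': (2, 0), 'person': (1, 2), 'cat': (3, 3)}
```

Each of the three objects lands in a different cell, so all three can be predicted from the same output tensor.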
For each grid cell, the network predicts a vector containing the information described earlier: the box coordinates, the confidence score, and the class probabilities.
So, if YOLO divides an image into a grid of 16 cells (4x4) and each grid cell predicts a vector of size 7, the output volume contains 4 x 4 x 7 = 112 values.
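The same arithmetic can be checked with a short sketch. The helper names are illustrative, and the per-cell length of 5 values per box plus one score per class is the convention assumed here.

```python
def vector_length(boxes_per_cell, num_classes):
    """Values per cell: 5 per box (x, y, w, h, c) plus one score per class."""
    return boxes_per_cell * 5 + num_classes


def output_size(grid_rows, grid_cols, vec_len):
    """Total number of values in the output volume."""
    return grid_rows * grid_cols * vec_len


small = output_size(4, 4, vector_length(1, 2))
print(vector_length(1, 2), small)  # 7 112
```

With a 7 x 7 grid, 2 boxes per cell, and 20 classes, the same functions give 30 values per cell and 1470 values overall.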
Let’s form our training dataset:
Prediction:
YOU ONLY LOOK ONCE :
YOLO is called "You Only Look Once"
because it processes the entire image
in a single forward pass through the
network to detect and classify objects.
This unified and efficient approach
contrasts with older models that
required multiple stages to perform
object detection.
1st Issue that we can have :
IoU Concept:
The Intersection over Union (IoU) is a key metric in object detection, used to evaluate the accuracy of predicted bounding boxes against ground truth bounding boxes. It measures how much the two boxes overlap and is expressed as a ratio:
IoU = area of intersection / area of union
The value ranges from 0 (no overlap) to 1 (the boxes coincide exactly).
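A minimal sketch of the computation, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the example boxes are made-up values.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if there is one
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (50, 50, 150, 150)
ground_truth = (100, 100, 200, 200)
score = iou(predicted, ground_truth)
print(round(score, 3))  # 0.143
```

Here the intersection is 50 x 50 = 2500 and the union is 10000 + 10000 - 2500 = 17500, giving 2500 / 17500.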
2nd Issue that we can have :
This vector can describe only one object, and therefore only one class. So how do you represent two classes in the same cell? There is one set of values for the dog and another set for the person. Instead of a 7-dimensional vector, we can use a vector of size 14, obtained by concatenating the two vectors.
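The concatenation can be sketched as below. The value order [x, y, w, h, c, c1, c2] within each 7-value slot and all the numbers are assumptions for illustration, as is padding an unused slot with zeros.

```python
SLOT_LEN = 7
EMPTY = [0.0] * SLOT_LEN

# x, y, w, h, c, c1 (dog), c2 (human)
dog = [0.5, 0.6, 0.3, 0.4, 1.0, 1.0, 0.0]
person = [0.5, 0.4, 0.2, 0.8, 1.0, 0.0, 1.0]


def concat_slots(*slots, slots_per_cell=2):
    """Concatenate one vector per slot; unused slots are filled with zeros."""
    if len(slots) > slots_per_cell:
        raise ValueError("more objects than slots in this cell")
    padded = list(slots) + [EMPTY] * (slots_per_cell - len(slots))
    return [value for slot in padded for value in slot]


both = concat_slots(dog, person)   # dog and person share the cell
only_dog = concat_slots(dog)       # second slot stays empty
print(len(both), len(only_dog))    # 14 14
```

The cell vector always has 14 values, so the network output keeps a fixed shape whether a cell holds zero, one, or two objects.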
04 Optimizing Object Detection Performance
Any Questions?