
YOLO Algorithm for Object Detection
Contents

What is Object Detection?

Two-stage object detection

What is YOLO?

How does YOLO work?

Intersection over Union (IoU)

Average Precision (AP)

YOLO, YOLOv2, YOLO9000, YOLOv3, YOLOv4+

Why the YOLO algorithm is important

Object Detection
Object detection is a computer vision task that involves detecting various objects (e.g., people, cars, chairs, stones, buildings, and animals) in digital images or videos.
It answers two basic questions:

What is the object?
This question seeks to identify the object in a specific image.

Where is it?
This question seeks to establish the exact location of the object within the image.
Two-stage object detection

• Two-stage object detection refers to the use of algorithms that break the object detection problem into the following two stages:
• Detecting possible object regions.
• Classifying the image in those regions into object classes.

Popular two-stage algorithms like Fast R-CNN and Faster R-CNN typically use a Region Proposal Network (RPN) that proposes regions of interest that might contain objects.

The output from the RPN is then fed to a classifier that classifies the regions into classes.

While this gives accurate results with a high mean Average Precision (mAP), it requires multiple passes over the same image, slowing down the detection speed of the algorithm and preventing real-time detection.
What is YOLO? You Only Look Once
• YOLO is an algorithm proposed by Redmon et al. in a research article published at the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), where it won the OpenCV People's Choice Award.
• In comparison with other object detection algorithms, YOLO proposes the use of an end-to-end neural network that makes predictions of bounding boxes and class probabilities all at once.
How does YOLO work?
• The YOLO algorithm works by dividing the image into an S x S grid of equally sized cells.
• Each grid cell is responsible for the detection and localization of the object whose center falls inside it.
• Each cell predicts B bounding boxes, with coordinates relative to the cell, along with the object label and the probability that an object is present in the cell.
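The grid-responsibility rule above can be sketched in a few lines of Python. The grid size S and the image dimensions are illustrative assumptions, not values fixed by the slides:

```python
# Sketch of YOLO's grid assignment: the cell whose region contains
# an object's center is responsible for detecting that object.
# S = 7 and the 448x448 image size are illustrative choices.

def responsible_cell(center_x, center_y, img_w, img_h, S=7):
    """Return (row, col) of the grid cell containing the box center."""
    col = int(center_x / img_w * S)
    row = int(center_y / img_h * S)
    # Clamp in case the center lies exactly on the right/bottom edge.
    return min(row, S - 1), min(col, S - 1)

# A box centered at (224, 112) in a 448x448 image falls in cell (1, 3).
print(responsible_cell(224, 112, 448, 448))  # (1, 3)
```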
Residual blocks

First, the image is divided into various grids. Each grid has a dimension of S x S. The following image shows how an input image is divided into grids.

Image source: https://www.guidetomlandai.com/assets/img/computer_vision/grid.png
Bounding box regression

• A bounding box is an outline that highlights an object in an image.
• Every bounding box in the image consists of the following attributes:
• Width (bw)
• Height (bh)
• Class (for example, person, car, traffic light, etc.), represented by the letter c.
• Bounding box center (bx, by)
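The attributes listed above can be represented as a small structure. The field names mirror the slide's notation (bx, by, bw, bh, c); the class name `BoundingBox` and the corner-conversion helper are illustrative choices, not part of the original:

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    bx: float  # center x
    by: float  # center y
    bw: float  # width
    bh: float  # height
    c: str     # class label, e.g. "person"

    def corners(self):
        """Convert center/size form to (x1, y1, x2, y2) corner form."""
        return (self.bx - self.bw / 2, self.by - self.bh / 2,
                self.bx + self.bw / 2, self.by + self.bh / 2)

box = BoundingBox(bx=50, by=40, bw=20, bh=10, c="car")
print(box.corners())  # (40.0, 35.0, 60.0, 45.0)
```

The corner form is the one used later in the IoU computation.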
Intersection over Union
• Intersection over Union is a popular metric to measure localization accuracy and calculate localization errors in object detection models.
• To calculate the IoU between a prediction and the ground truth, we first take the area of intersection between the predicted bounding box and the corresponding ground-truth bounding box. Following this, we calculate the total area covered by the two bounding boxes—also known as the union.
• The intersection divided by the union gives us the ratio of the overlap to the total area, providing a good estimate of how close the predicted bounding box is to the ground truth.

Note: A localization error occurs when an object from the target category is detected with a misaligned bounding box (0.1 <= overlap < 0.5).
IoU
• Intersection over Union ensures that the predicted bounding boxes closely match the real boxes of the objects.
• This eliminates unnecessary bounding boxes that do not match the characteristics of the objects (like height and width). The final detection will consist of unique bounding boxes that fit the objects well.
Other methods bring forth a lot of duplicate predictions, because multiple cells predict the same object with different bounding boxes. YOLO makes use of Non-Maximal Suppression to deal with this issue.

• In Non-Maximal Suppression, YOLO suppresses all bounding boxes that have lower probability scores.
• YOLO achieves this by first looking at the probability scores associated with each detection and taking the largest one. Following this, it suppresses the bounding boxes having the largest Intersection over Union with the current high-probability bounding box.
• This step is repeated until the final bounding boxes are obtained.

Average Precision (AP)

• Average Precision is calculated as the area under a precision vs. recall curve for a set of predictions.
• Recall is calculated as the ratio of the model's true-positive predictions for a class to the total number of existing labels for that class.
• Precision, on the other hand, refers to the ratio of true positives to the total number of predictions made by the model.
• The area under the precision vs. recall curve gives us the Average Precision per class for the model. The average of this value, taken over all classes, is termed the mean Average Precision (mAP).

Note: In object detection, precision and recall are not measured over class predictions but over bounding-box predictions, to measure the detection performance. An IoU value >= 0.5 is taken as a positive prediction, while an IoU value < 0.5 is a negative prediction.
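The AP definition above can be sketched for a toy set of predictions. The detections, ground-truth count, and the simple (non-interpolated) area computation are illustrative assumptions; real benchmarks often use interpolated precision:

```python
def average_precision(is_tp, num_ground_truth):
    """Area under the precision-recall curve for one class.

    is_tp: True/False flags for detections sorted by descending
           confidence (True = matched a ground-truth box with IoU
           above the positive threshold, e.g. 0.5).
    """
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for flag in is_tp:
        if flag:
            tp += 1
        else:
            fp += 1
        precision = tp / (tp + fp)
        recall = tp / num_ground_truth
        # Rectangle under the curve between consecutive recall points.
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap

# 3 ground-truth objects; detections (best confidence first): TP, FP, TP.
print(average_precision([True, False, True], 3))  # 5/9 ≈ 0.556
```

Averaging this value over all classes would give the mAP described above.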
YOLO Architecture

• Inspired by the GoogLeNet architecture, the first YOLO architecture has a total of 24 convolutional layers with 2 fully connected layers at the end.
YOLO Timeline
The differences: YOLOv2, YOLO9000, YOLOv3, YOLOv4+

• YOLOv2: detects even the smallest of objects in groups, improves localization accuracy, and introduces anchor boxes. It increased the network's mean Average Precision by introducing batch normalization.

• YOLO9000: an algorithm to detect more classes than COCO, capable of detecting more than 9000 classes. However, it provides a lower mean Average Precision compared to YOLOv2.

• YOLOv3: uses the much more complex DarkNet-53 as the model backbone—a 106-layer neural network complete with residual blocks and up-sampling networks.

• YOLOv4+: adds Weighted Residual Connections and Cross Mini-Batch Normalization. YOLOv5 is an open-source project based on the YOLO model pre-trained on the COCO dataset.
Importance of YOLO
• The YOLO algorithm is important for the following reasons:
• Speed: This algorithm improves the speed of detection because it can predict
objects in real-time.
• High accuracy: YOLO is a predictive technique that provides accurate results
with minimal background errors.
• Learning capabilities: The algorithm has excellent learning capabilities that
enable it to learn the representations of objects and apply them in object detection.
Contents

1. Introduction
2. What is YOLO?
3. How YOLO detects objects in real time
4. YOLOv1, YOLOv2, YOLOv3
5. Implementation of YOLO with OpenCV
6. Performance chart: YOLO vs. other models
7. Real scenarios with YOLO
8. Conclusion
Introduction

 Object detection is a computer technology related to computer vision and image processing.

 It is widely used in computer vision tasks such as activity recognition, face detection, face recognition, and video object co-segmentation. It is also used in tracking objects, for example tracking a ball during a football match, tracking the movement of a cricket bat, or tracking a person in a video.

 Methods for object detection generally fall into either machine learning-based approaches or deep learning-based approaches.

 YOLO mainly falls under the deep learning-based approach.


YOLO?

 You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. On a Pascal Titan X it processes images at 30 FPS and has a mAP of 57.9% on COCO test-dev.

 The YOLO family of models is a series of end-to-end deep learning models designed for fast object detection.

 There are three main variations of the approach to date: YOLOv1, YOLOv2, and YOLOv3.
How YOLO Works

 Prior detection systems repurpose classifiers or localizers to perform detection. They apply the model to an image at multiple locations and scales; high-scoring regions of the image are considered detections.

 YOLO uses a totally different approach.

 It applies a single neural network to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.

 It looks at the whole image at test time, so its predictions are informed by global context in the image.
 YOLO uses the Darknet network.
 The YOLO network splits the input image into a grid of S×S cells.
 Each grid cell predicts B bounding boxes and their objectness scores, along with their class predictions.

 Coordinates of the B bounding boxes: YOLO predicts 4 coordinates for each bounding box (bx, by, bw, bh) with respect to the corresponding grid cell.
 Objectness score (P0): indicates the probability that the cell contains an object.
 Class prediction: if the bounding box contains an object, the network predicts the probability of each of K classes.

 The predicted bounding boxes may look something like the following.

 Finally, the confidence (objectness) score for the bounding box and the class prediction are combined into one final score that gives the probability that this bounding box contains a specific type of object.

 It turns out that most of these boxes will have very low confidence scores, so we only keep the boxes whose final score is above some threshold.
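The score combination and thresholding described above might look like this sketch. The probability values and the 0.5 threshold are illustrative assumptions:

```python
def final_scores(objectness, class_probs):
    """Per-class confidence: P(object) * P(class | object)."""
    return {cls: objectness * p for cls, p in class_probs.items()}

def keep_box(objectness, class_probs, threshold=0.5):
    """Keep a box only if its best class score clears the threshold."""
    scores = final_scores(objectness, class_probs)
    best_cls = max(scores, key=scores.get)
    if scores[best_cls] >= threshold:
        return best_cls, scores[best_cls]
    return None  # box discarded as low-confidence

# A box that is 90% likely to contain an object, mostly a "dog":
# final score 0.9 * 0.8 = 0.72, above the threshold, so it is kept.
best = keep_box(0.9, {"dog": 0.8, "cat": 0.2})
print(best[0])  # dog
```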

 Non-maximum Suppression (NMS) intends to cure the problem of multiple detections of the same object.
 Non-maximum Suppression

 Non-maximum Suppression, or NMS, uses a very important function called "Intersection over Union", or IoU.

 Define a box using its two corners (upper left and lower right): (x1, y1, x2, y2), rather than the midpoint and height/width.

 Find the coordinates (xi1, yi1, xi2, yi2) of the intersection of two boxes, where:

 xi1 = maximum of the x1 coordinates of the two boxes
 yi1 = maximum of the y1 coordinates of the two boxes
 xi2 = minimum of the x2 coordinates of the two boxes
 yi2 = minimum of the y2 coordinates of the two boxes

 The area of intersection is given by this formula:
 area_intersection = (xi2 - xi1) * (yi2 - yi1)

 Calculate the area of the union:

 union_area = (area of box 1 + area of box 2) - area_intersection

 Therefore IoU = area_intersection / union_area
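The corner-based formulas above translate directly into Python. One detail the slide formula leaves implicit: the intersection sides must be clamped to zero, otherwise disjoint boxes produce a negative "intersection":

```python
def iou(box1, box2):
    """Intersection over Union for boxes given as (x1, y1, x2, y2) corners."""
    xi1 = max(box1[0], box2[0])
    yi1 = max(box1[1], box2[1])
    xi2 = min(box1[2], box2[2])
    yi2 = min(box1[3], box2[3])
    # Clamp to zero so non-overlapping boxes get zero intersection.
    area_intersection = max(0, xi2 - xi1) * max(0, yi2 - yi1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union_area = area1 + area2 - area_intersection
    return area_intersection / union_area

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.143
```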

 To implement non-max suppression, the steps are:

 Select the box that has the highest score.

 Compute its overlap with all other boxes, and remove boxes that overlap it more than a certain threshold, which we call iou_threshold.
 Go back to the two steps above and iterate until no boxes remain to be processed.
YOLOv1 vs. YOLOv2 vs. YOLOv3

 YOLOv1: uses the Darknet framework trained on the ImageNet-1000 dataset; runs at 30 FPS. It could not find small objects when they appeared as a cluster, and it found difficulty generalizing to objects whose dimensions differ from the trained images.

 YOLOv2: the second version of YOLO, named YOLO9000; runs at 40 FPS. It adds a higher-resolution classifier, fine-grained features, and multi-scale training, and uses the Darknet-19 architecture.

 YOLOv3: the previous version improved incrementally; runs at 45 FPS. It adds Feature Pyramid Networks (FPN). Where the predecessor YOLOv2 used Darknet-19 as the feature extractor, YOLOv3 uses the Darknet-53 network, which has 53 convolutional layers.
Basic Implementation of YOLO with OpenCV

 We can use OpenCV to implement the YOLO algorithm, as it is really simple.

 Step 1: Install the dependencies for Windows/Linux
 Python 3, OpenCV

 Step 2: Install DarkNet/YOLO
 DarkNet: originally, the YOLO algorithm was implemented in the DarkNet framework.
 Download the CFG and WEIGHTS files and the COCO class names.

 Step 3: Run the command or code needed to render the input image or video.
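As a rough sketch, the DarkNet route in Steps 2 and 3 typically looks like the commands below. These follow the pjreddie DarkNet instructions linked in the references; repository URLs, file names, and weights may have changed since, so treat them as an example rather than an exact recipe:

```shell
# Build DarkNet from source (Linux).
git clone https://github.com/pjreddie/darknet
cd darknet
make

# Download pre-trained YOLOv3 weights (the CFG file ships with the repo).
wget https://pjreddie.com/media/files/yolov3.weights

# Run detection on an image; the result is written to predictions.jpg.
./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg
```

OpenCV's DNN module can load the same CFG and WEIGHTS files for inference from Python, which is the approach this section refers to.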
Performance chart: YOLO vs. other models
Real scenarios with YOLO

 The core of a self-driving car's brain is YOLO object detection.

 Reasons why YOLO is used in self-driving cars:

 ---> Extremely fast
 ---> Contextually aware
 ---> A generalized network
Conclusion

 YOLO, a unified model for object detection, is simple to construct and can be trained directly on full images. YOLOv3 is much faster (45 frames per second) than other object detection algorithms.

 YOLO is the fastest general-purpose object detector and pushes the state of the art in real-time object detection. YOLO also generalizes well to new domains, making it ideal for applications that rely on fast, robust object detection.
References

 https://ieeexplore.ieee.org/document/7780460
 https://ieeexplore.ieee.org/abstract/document/8740604
 https://ieeexplore.ieee.org/document/8621865
 https://medium.com/@venkatakrishna.jonnalagadda/object-detection-yolo-v1-v2-v3-c3d5eca2312a
 https://towardsdatascience.com/real-time-object-detection-with-yolo-9dc039a2596b
 https://pjreddie.com/darknet/yolo/
