
International Journal of Computing and Information Technology, Vol. 1, No. 2, 2022
ISSN: 2790-170X   DOI: 10.56028/ijcit.1.2.11

Overview of YOLO Object Detection Algorithm

Chengjuan Wan, Yuxuan Pang, Shanzhen Lan*

Communication University of China


[email protected]

Abstract
As an important research direction in the field of computer vision, object detection has developed
rapidly and many mature algorithms have emerged. The YOLO (You Only Look Once) series of
algorithms implements one-stage detection based on regression ideas, showing preeminent speed
and strong generalization across a variety of datasets. This paper gives a brief introduction to the
current mainstream deep learning object detection algorithms, then focuses on reviewing the
principles and optimization process of the YOLO series, and summarizes the latest breakthroughs
in the YOLO algorithms, hoping to provide a reference for research on related topics.
Keywords
object detection; deep learning; YOLO.


1. Object detection algorithm

Traditional object detection algorithms generate candidate regions by exhaustive sliding-window
search and then extract hand-crafted features for machine learning classifiers, with poor
performance in both speed and accuracy. With the rapid development of AI technology, they have
gradually been replaced by deep learning methods.

Fig.1. Timeline of object detection algorithm

1.1. Two-stage detection algorithm based on region ideas


The two-stage detection algorithm is represented by the series of R-CNN networks. Since 2014,
when region proposals and CNN features replaced the sliding window and hand-designed features
to build the RCNN framework, SPP-Net and Fast RCNN have appeared in succession. To make up
for the time-consuming step of selectively searching for candidate boxes, Shaoqing Ren proposed
the Faster RCNN algorithm, which applies a Region Proposal Network in place of selective search
to generate candidate regions and realizes weight sharing, ultimately forming a typical end-to-end
two-stage detection process. Nowadays, two-stage detection algorithms also include FPN,
Cascade-RCNN, Libra RCNN, Hybrid Task RCNN, PSS-Det, and so on, all of which perform well in
terms of accuracy and recall.

1.2. One-stage detection algorithm based on regression ideas


The one-stage detection algorithm is represented by YOLO, SSD, and RetinaNet. In 2016, Redmon
proposed YOLOv1, which performs classification and bounding box regression on candidate boxes
directly, without an intermediate region proposal stage, bringing real-time performance at the
sacrifice of a small amount of precision. In the same year, Liu proposed the SSD algorithm, which
combines ideas from YOLO and Faster RCNN to address the localization accuracy problem and
additionally adds multi-scale feature maps for prediction. Since then, the YOLO series of
algorithms has continued to develop; it is highlighted in the following sections.

1.3. Anchor-free object detection algorithm


The above mainstream algorithms realize object detection based on bounding boxes, which brings
several problems: the irregular appearance of objects makes a bounding box contain some
non-object areas, interfering with the detection result; hyperparameters such as the number, size,
and aspect ratio of the bounding boxes need to be adjusted for each dataset; and the large number
of bounding boxes may lead to an imbalance of positive and negative samples, affecting training. In
2018, Law H proposed CornerNet, converting the object detection problem into a key-point
detection problem: a single convolutional neural network predicts two key points at the upper-left
and lower-right corners of the object, from which the prediction box is obtained. Where CornerNet
focuses on object edges, in 2019 Duan K proposed CenterNet, which adds detection of the object's
center point and then regresses the other properties of the object from the location of that center.
To this day, the CornerNet-Squeeze, FCOS, and TTFNet algorithms have all performed well.

2. YOLO algorithm

To make up for the shortcoming that Faster RCNN has difficulty meeting real-time requirements
in detection speed, Joseph Redmon, Santosh Divvala, Ali Farhadi, and others proposed YOLO,
which regards the object detection framework as a spatial regression problem. A single neural
network obtains bounding boxes and class predictions from the complete image in one pass.
Having gone through version updates from v1 to v5, YOLO has become one of the mainstream
frameworks for object detection.

Fig.2. YOLO algorithm implementation process

2.1. YOLOv1
YOLOv1 achieves the prediction of object classes and bounding box regression on a complete
image directly through a CNN. Its network structure is built on the GoogLeNet model. First, the
input image is resized to a fixed size (448x448) and divided into SxS grid cells, each of which is
responsible for detecting the object falling on it and predicting the confidence, class, and location
of that object. Second, features are extracted from the input image using the CNN. Finally, the
optimal result is obtained by applying Non-Maximum Suppression (NMS). Each grid cell detects an
object and refines the detected bounding box by regression, so the framework can extract features
directly from the input image to predict the object bounding boxes and class probabilities.
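A minimal sketch of the NMS step mentioned above (a generic greedy implementation, not the YOLOv1 source; the IoU threshold of 0.5 and the function name are assumptions):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring box and discard the
    remaining boxes whose IoU with it exceeds iou_thresh.
    boxes is (N, 4) in (x1, y1, x2, y2) format, scores is (N,)."""
    order = scores.argsort()[::-1]          # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of box i with every remaining box
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= iou_thresh]  # keep only weakly overlapping boxes
    return keep
```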
The YOLO detection system divides the 448x448x3 input image into a 7x7 grid, and the size of its
predicted output tensor is expressed as:
S × S × (B × 5 + C)
where SxS is the number of grid cells the input image is divided into, corresponding to the feature
map resolution; B is the number of bounding boxes generated per grid cell; 5 is the number of
predicted parameters per box (x, y, w, h, confidence); and C is the number of classes that can be
detected (20). Compared with Fast RCNN at 0.5 fps and Faster RCNN at 7 fps, YOLO runs at 45 fps,
a large improvement in running speed, but it still has some deficiencies in prediction accuracy and
is prone to more localization errors, especially for small objects.
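As a concrete check of the formula, the following minimal sketch computes the output tensor size for the standard YOLOv1 configuration, assuming B = 2 boxes per cell as in the original paper:

```python
# Minimal sketch of the YOLOv1 output tensor size, assuming S = 7, B = 2, C = 20.
S, B, C = 7, 2, 20
per_cell = B * 5 + C       # 2 boxes x (x, y, w, h, confidence) + 20 class scores = 30
total = S * S * per_cell   # 7 x 7 x 30 = 1470 predicted values per image
print(f"Output tensor: {S}x{S}x{per_cell} ({total} values)")
```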

2.2. YOLOv2
Relative to the v1 version, YOLOv2 mainly improves prediction accuracy, speed, and the number of
object categories that can be identified; the objects detected by YOLOv2 extend to 9,000 categories.
YOLOv2 optimizes in the following ways: it uses a simpler feature extraction network, Darknet-19,
to replace the GoogLeNet network; batch normalization (BN) is introduced to speed up the
convergence of the network and enhance generalization; a high-resolution classifier is trained to
accommodate higher-resolution images; a pooling layer is removed to increase the output
resolution of the convolutional layers; the ImageNet classification dataset and the COCO detection
dataset are jointly trained using a WordTree; a pass-through layer is added, connecting the last
output 3x3x512 layer with a preceding convolutional layer so that high-resolution and
low-resolution image features are combined, with features acquired from the earlier layers at a
resolution of 26x26; to adapt to images of different sizes, a new input resolution (from 320x320 to
608x608) is selected every ten iterations during training; and the fully connected layers are
removed, with the prior boxes (anchor boxes) automatically searched for using the k-means
clustering algorithm, improving detection performance.
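The anchor-box search can be illustrated with a rough sketch. This is not the authors' code; it only assumes YOLOv2's 1 - IoU distance between box shapes (widths and heights only), and the function names are hypothetical:

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IoU between (w, h) pairs, treating all boxes as if they share one center."""
    inter = np.minimum(boxes[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)

def kmeans_anchors(wh, k=5, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs into k prior boxes using a 1 - IoU distance."""
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, centroids), axis=1)  # highest IoU = smallest distance
        new = np.array([wh[assign == i].mean(axis=0) if np.any(assign == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

# Usage on hypothetical box shapes: anchors = kmeans_anchors(np.random.rand(1000, 2), k=5)
```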

2.3. YOLOv3
In 2018, the original author proposed the YOLOv3 algorithm, which inherits the ideas of YOLOv1
and YOLO9000 and achieves a balance between speed and detection accuracy. In terms of network
structure, YOLOv2 had already removed all the fully connected layers of the first-generation
YOLOv1, and YOLOv3 further abolishes the pooling layers that are common in convolutional
neural networks: the down-sampling previously done by pooling layers is achieved instead by
increasing the stride of the convolutional kernels, which greatly improves speed. Another
important improvement of YOLOv3 is the ability to output feature maps at three different scales,
13x13, 26x26, and 52x52, enhancing the detection of small objects while somewhat weakening
performance on large objects. Finally, YOLOv3 no longer uses Softmax to classify each box; instead
it uses multiple independent logistic classifiers, operating only on the anchor box with the highest
object likelihood score.
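The switch from Softmax to independent logistic classifiers can be seen in a minimal sketch; the logits and class names below are hypothetical, and this is not the YOLOv3 implementation itself:

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()                    # scores compete and must sum to 1

def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))  # each class is scored independently

logits = np.array([2.0, 1.8, -1.0])       # hypothetical logits for "person", "woman", "car"
print(softmax(logits))  # mutually exclusive: "person" and "woman" split the probability mass
print(sigmoid(logits))  # multi-label: both "person" and "woman" can score high at once
```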

Fig.3. YOLOv3 structure

2.4. YOLOv4
In April 2020, Alexey Bochkovskiy improved on YOLOv3 and proposed a new algorithm for
efficient object detection, YOLOv4. It is characterized by integration: it introduces the new data
augmentation method Mosaic and Self-Adversarial Training (SAT), proposes improved SAM and
PAN modules, and uses Cross mini-Batch Normalization (CmBN). Its architecture keeps the Head
from YOLOv3, uses SPP and PAN in the Neck, and adopts CSPDarkNet53 as the Backbone.
YOLOv4 can be split into four sections: the input terminal, the Backbone, the Neck, and the
prediction part. Among them, the input side mainly includes Mosaic, CmBN, and SAT; the backbone
network includes the CSPDarknet53 network, the Mish activation function, and DropBlock; the
Neck part includes the SPP module and the FPN + PAN structure; and the prediction part mainly
uses the improved loss function CIOU_Loss, with the bounding box filtering NMS replaced by
DIOU_nms.
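As a hedged illustration of the distance term behind DIOU_nms, the sketch below computes DIoU for two boxes using its standard definition (IoU minus a normalized center-distance penalty); it is not taken from the YOLOv4 source:

```python
def diou(box_a, box_b):
    """DIoU between two boxes in (x1, y1, x2, y2) format."""
    # intersection and union
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter)
    # squared distance between the two box centers
    rho2 = ((box_a[0] + box_a[2]) / 2 - (box_b[0] + box_b[2]) / 2) ** 2 + \
           ((box_a[1] + box_a[3]) / 2 - (box_b[1] + box_b[3]) / 2) ** 2
    # squared diagonal of the smallest box enclosing both
    c2 = (max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])) ** 2 + \
         (max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])) ** 2
    return iou - rho2 / c2

# Example with two overlapping boxes
print(diou((0, 0, 10, 10), (2, 2, 12, 12)))
```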

2.5. YOLOv5
Two months after YOLOv4, some researchers launched the YOLOv5 algorithm. In terms of
accuracy metrics its performance is on par with YOLOv4, it far exceeds v4 in speed, and its model
size (27 MB) is much smaller than that of YOLOv4 (245 MB), giving it a strong advantage in model
deployment. The YOLOv5 structure is broadly similar to that of YOLOv4, but with differences: its
input adopts Mosaic data augmentation, adaptive anchor box calculation, and adaptive image
scaling; the framework includes the Focus structure (sketched after this paragraph), a Backbone
with a CSP structure, and a Neck with an FPN structure. Through the improvements of the YOLO
series and comparison with the RCNN series, YOLO achieves "you only look once" relative to the
extract-then-classify pipeline of the RCNN series: YOLO unifies detection as a single regression
problem, while RCNN divides the detection result into object class (a classification problem) and
object location (a regression problem). This range of improvements gives YOLO a lead in the speed
of object detection.
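A rough sketch of the slicing idea behind the Focus structure, which rearranges space into channels before the first convolution; the 640x640 input size is an assumption, not stated in the text:

```python
import numpy as np

def focus_slice(x):
    """Focus-style slicing: sample every other pixel to form four sub-images and
    stack them along the channel axis, so a (C, H, W) input becomes (4C, H/2, W/2).
    Assumes H and W are even."""
    return np.concatenate([x[:, 0::2, 0::2],
                           x[:, 1::2, 0::2],
                           x[:, 0::2, 1::2],
                           x[:, 1::2, 1::2]], axis=0)

img = np.random.rand(3, 640, 640)   # a hypothetical 640x640 RGB input
print(focus_slice(img).shape)       # -> (12, 320, 320)
```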


Fig.4. YOLOv5s structure

3. Summary and Outlook

This paper focuses on the field of object detection and introduces the development and
optimization process of the YOLO series of algorithms. To meet application needs in light-weight
deployment, small-object detection, and the balance between precision and speed, extension
algorithms such as PP-YOLOv2, PP-YOLO Tiny, YOLO-Fastest, and YOLObile have continued to
emerge and are worth further research.

References
[1] ZHOU X Y, GONG W, FU W L, et al. Application of deep learning in object detection[J]. In
Proceedings of the IEEE/ACIS 16th International Conference on Computer and Information
Science, 2017, 132(5): 631-634.
[2] QIAN X, LIN S, CHENG G, et al. Object detection in remote sensing images based on improved
bounding box regression and multi-level features fusion[J]. Remote Sens, 2020, 12(1): 143-164.
[3] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J].
International Journal of Computer Vision, 2015, 115(3): 211-252.
[4] LIU L, OUYANG W, WANG X, et al. Deep learning for generic object detection: a survey[J].
International Journal of Computer Vision, 2020, 128: 261-318.
[5] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection
and semantic segmentation[J]. IEEE Conference on Computer Vision and Pattern Recognition,
2014, 81(1): 580-587.
[6] UIJLINGS J, SANDE K, GEVERS T, et al. Selective search for object recognition[J]. International
Journal of Computer Vision, 2013, 104(2): 154-171.
[7] CORTES C, VAPNIK V. Integrated series in information systems[M]. Berlin: Springer, 1995:
207-235.

[8] REN S, HE K, GIRSHICK R, et al. Faster RCNN: towards real-time object detection with region
proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39:
1137-1149.
