Deep Learning YOLOv2
Abstract—While object detection has been a widely documented branch of computer vision, such applications mainly consist of re-utilizing region proposal classification networks (R-CNNs) to perform detection on many proposed regions and, as a result, predicting multiple times for any given image. You Only Look Once (YOLO) aims to provide a new approach to object detection by having a single neural network perform bounding box proposal and class prediction directly on the input images, acting closer to what is referred to as a Fully Connected Neural Network. This review paper aims to cover the main features of the YOLO network by providing the reader with a comprehensive look into the model's architecture, performance and real-life application scenarios throughout its multiple iterations in recent years.
Index Terms—YOLO, Object Detection, Real-time, FCNN, YOLO9000
I. INTRODUCTION
With the human eye being the epitome of object detection, it is only natural for science to seek the development of strategies that aim to replicate this function and apply it to day-to-day living. Thoughts of self-driving cars that can scan the road ahead and make decisions based solely on their immediate surroundings were once far-fetched ideas that, nowadays, are an ever evolving reality that may very well become the new norm. In 2016, amid the light-speed evolution of neural network research, the authors of the YOLO network characterized typical object detection systems as re-purposed classifiers [1] which would evaluate a test image for different objects at variable scales and locations. The deformable parts model (DPM), documented in 2011 by P. Felzenszwalb, outlines an approach which uses a sliding window technique, where a filter of a specified size is run at evenly spaced locations over the target image, essentially treating object detection as a binary classification problem [2]. Another well documented approach to object detection is the R-CNN (Region-Based Convolutional Network), which utilizes region proposal methods on a given image and performs object detection via a two-step process, as depicted in Figure 1. Firstly, potential bounding boxes are generated around likely objects, and afterwards, a classifier is run on each region of interest (ROI), ultimately predicting whether a region is an object or not [4]. While these examples demonstrate a vast understanding of computer vision and, overall, are very valid approaches to object detection in a timely fashion, the authors of the YOLO network present a straightforward, 3-step mechanism to identify objects in an image that originated the model name. They claim the model only needs to look once at an image to predict what objects are present and what they are [1].

Fig. 1. Example of a typical R-CNN Architecture. [3]

The main premise behind the YOLO networks is the use of a single convolutional neural network capable enough to work on full images and predict bounding boxes as well as class probabilities for those boxes. This unified approach to object detection brings several benefits, as described by the authors in the first paper, but also introduces a series of limitations involving spatial constraints, where smaller objects that appear in clusters, such as bird flocks, are often missed entirely. Other limitations presented in the original paper include sensitivity to image aspect ratios and incorrect localizations [1]. Months later, in December of 2016, two of the four original authors presented a new iteration of the YOLO network, aptly named YOLO9000 due to its ability to detect over 9000 object categories. This new publication aims not only to tackle the main difficulties encountered with the first model, but also to massively increase performance and accuracy when compared to existing models at the time [5]. The ambitious YOLO9000 was a huge step toward making object detection comparable in scale to object classification, as object detection datasets were typically limited to less than a few hundred possible tags, such as Microsoft's COCO challenge [7] or the Pascal Visual Object Classes (VOC) challenge [6], while classification datasets commonly reached upwards of 100,000 possible categories spanning millions upon millions of entries. One notable example of the sheer scale of classification datasets is the largest multimedia collection currently available, YFCC100M, containing around 99 million images and 1 million videos [8]. This bibliographic review article aims to introduce the reader to both iterations of the YOLO network, focusing on each version's features, limitations and performance when put to the test against common challenge datasets and other networks designed for object detection.
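As a concrete illustration of the two-step R-CNN pipeline described above, the following Python sketch separates the proposal and classification stages; propose_regions and classify_region are hypothetical placeholder stubs, not functions from any actual R-CNN implementation:

# Sketch of the two-step detection pipeline depicted in Figure 1.
# Both helpers are hypothetical stubs: a real system would use a
# region proposal method (e.g. selective search) and a CNN classifier.

def propose_regions(image):
    # Placeholder: would return candidate boxes (x, y, w, h)
    # drawn around likely objects in the image.
    return []

def classify_region(image, box):
    # Placeholder: would crop the region of interest (ROI) and run
    # a classifier on it, returning a label and a confidence score.
    return "background", 0.0

def two_step_detect(image):
    detections = []
    for box in propose_regions(image):              # step 1: propose ROIs
        label, score = classify_region(image, box)  # step 2: classify each ROI
        if label != "background":
            detections.append((box, label, score))
    return detections

Note that the classifier runs once per proposed region, which is precisely the repeated per-image prediction cost that YOLO's single-pass design avoids.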
II. YOLOV1
The first iteration of the YOLO network, referred to in this paper as YOLOv1, is heavily focused on a "unified" way of performing detection on real-time images, using the entire image frame, rather than smaller-sized filters [9], as a means to obtain contextual information about the objects in any given image. It works by dividing the input frame into a grid of size S × S, making each resulting grid cell responsible for identifying whatever object falls within it. Each cell is also responsible for returning 6 prediction values [1], namely x, y, w, h, confidence, and C, respectively associated with the position (x, y) of the bounding box in relation to the bounds of the grid cell, the dimensions (w, h) relative to the whole frame, the confidence value (Intersection over Union) between the predicted box and the ground truth box, and finally, the conditional class probability vector (C), Pr(Class_i|Object). The formula used to obtain class-specific confidence scores for each bounding box is given by:

$$\Pr(\mathrm{Class}_i \mid \mathrm{Object}) \ast \Pr(\mathrm{Object}) \ast \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}} = \Pr(\mathrm{Class}_i) \ast \mathrm{IOU}_{\mathrm{pred}}^{\mathrm{truth}} \tag{1}$$

Fig. 2. YOLOv1 Model Functionality. The original image is split into a S × S grid of individually capable cells which predict B bounding boxes and C probabilities. [1]

The architecture employs 1 × 1 reduction layers as a means to reduce dimensionality before the more expensive 3 × 3 convolutions, an approach documented by M. Lin, Q. Chen and S. Yan in the 2013 paper 'Network in Network' [12]. A graphic depiction of the full network architecture may be consulted in Figure 3.
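To illustrate the reduction pattern just described, here is a minimal PyTorch sketch of a 1 × 1 reduction layer followed by a 3 × 3 convolution; the channel counts and the LeakyReLU activations are illustrative assumptions rather than the exact YOLOv1 layer configuration:

import torch.nn as nn

# The 1x1 convolution first shrinks the channel dimension, so the
# more expensive 3x3 convolution operates on fewer input channels,
# following the 'Network in Network' reduction idea [12].
reduction_block = nn.Sequential(
    nn.Conv2d(512, 256, kernel_size=1),             # 1x1 reduction layer
    nn.LeakyReLU(0.1),
    nn.Conv2d(256, 512, kernel_size=3, padding=1),  # 3x3 convolution
    nn.LeakyReLU(0.1),
)

Since the cost of the 3 × 3 convolution scales with its number of input channels, halving the channels with the 1 × 1 layer roughly halves the cost of the spatial filtering that follows.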
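Similarly, the per-cell prediction scheme and Equation 1 can be made concrete with a short NumPy sketch that decodes one grid cell's raw output into bounding boxes and class-specific confidence scores. The flat vector layout and all names below are our own assumptions for illustration, following the description above and Figure 2:

import numpy as np

S, B, C = 7, 2, 20  # grid size, boxes per cell, classes (illustrative values)

def decode_cell(cell_output, row, col):
    # cell_output is assumed to hold B blocks of [x, y, w, h, confidence]
    # followed by the C conditional class probabilities Pr(Class_i|Object).
    class_probs = cell_output[B * 5:]
    boxes = []
    for b in range(B):
        x, y, w, h, conf = cell_output[b * 5:(b + 1) * 5]
        # (x, y) are offsets within the grid cell; map them to image
        # coordinates, while (w, h) are already relative to the frame.
        cx, cy = (col + x) / S, (row + y) / S
        # Equation 1: Pr(Class_i|Object) * Pr(Object) * IOU = Pr(Class_i) * IOU,
        # where conf stands in for Pr(Object) * IOU.
        class_scores = class_probs * conf
        boxes.append(((cx, cy, w, h), class_scores))
    return boxes

# Example: decode random predictions for the cell at row 3, column 4.
cell = np.random.rand(B * 5 + C)
for box, scores in decode_cell(cell, row=3, col=4):
    print(box, scores.argmax(), scores.max())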