
Faster R-CNN and YOLO based Vehicle detection: A Survey
Conference Paper · April 2021

DOI: 10.1109/ICCMC51019.2021.9418274

Proceedings of the Fifth International Conference on Computing Methodologies and Communication (ICCMC 2021)
IEEE Xplore Part Number: CFP21K25-ART
Faster R-CNN and YOLO based Vehicle detection: A Survey
2021 5th International Conference on Computing Methodologies and Communication (ICCMC) | 978-1-6654-0360-3/20/$31.00 ©2021 IEEE | DOI: 10.1109/ICCMC51019.2021.9418274
Madhusri Maity, Sriparna Banerjee, Sheli Sinha Chaudhuri

Electronics and Telecommunication Engineering Department
Jadavpur University
Kolkata, India
Abstract— Automatic detection of moving vehicles plays a crucial and challenging role in intelligent traffic surveillance. Numerous research projects aiming at accurate detection and tracking of vehicles have been carried out, and the methods designed under these projects have found use in various important applications, e.g. minimizing fatal accidents, which mainly occur due to driver negligence, poor visibility during inclement weather, improper illumination, etc. At present, several deep neural networks have been proposed for object detection. This paper presents a comprehensive review of existing vehicle detection and tracking methods based on the Faster Region-based Convolutional Neural Network (Faster R-CNN) and You Only Look Once (YOLO). In this survey, we have divided the existing vehicle detection methods into groups depending on the architecture (Faster R-CNN/YOLO) used as the backbone of each method. We have organized the entire survey in chronological order so that the interrelations between the proposed methods can be highlighted. Apart from performing in-depth analyses of the existing methods, we have described the architectures of Faster R-CNN, YOLO and their proposed variants in detail for better understanding. We conclude the paper by listing the limitations of the existing works and the unexplored aspects of this research topic, and we also shed some light on the future scope of this research area.

Keywords—Vehicle detection; Faster R-CNN; YOLO; Proposed variants; Survey

I. INTRODUCTION

In recent years, vehicle detection has become a popular research topic among researchers working in related fields due to its societal importance. Every year, a large number of people die worldwide in fatal accidents, which are mainly caused by driver negligence or by poor visibility during inclement weather conditions. The report [1], based on the National Crime Records Bureau's Accidental Deaths and Suicides in India data, stated that hundreds of people died in 2014, mainly in two states of India (Andhra Pradesh and Telangana), in accidents caused by poor visibility during inclement weather conditions. Another report [2], published on the website of the U.S. Department of Transportation, Federal Highway Administration, and based on data collected by NHTSA over a span of 10 years (2007–2016), stated that approximately 21% of the total annual vehicle crashes in the U.S. are weather-related, i.e. they occur during inclement weather. According to the data published in [3], about 90% of road accident deaths in India are caused by rash or negligent driving. These statistics, published by several significant sources, clearly show the importance of accurate vehicle detection in the real world.

In the past decade, numerous methods have been designed for accurate detection and tracking of vehicles. Although traditional vehicle detection algorithms such as the Gaussian mixture model (GMM) [4] give promising results, they fail to perform well when illumination changes occur or in the presence of background clutter. Deep learning methods have an inherent feature extraction capability, which makes them more attractive to researchers than traditional methods, because it largely eliminates the classification errors caused by erroneous handcrafted feature extraction. Since convolutional neural networks (CNNs) are designed to artificially replicate the functional capabilities of the human cognitive system, they perform better than traditional methods in various computer vision tasks. Hence, in this survey we focus mainly on vehicle detection methods based on deep neural networks like Faster R-CNN and YOLO.

The rest of this survey is organized as follows. R-CNN and its proposed variants are described in Section II, and the vehicle detection models designed based on Faster R-CNN are discussed in Section III. The architectures of several versions of the YOLO detector are studied in detail in Section IV, and the methods designed based on these architectures are discussed in Section V. Finally, Section VI concludes the survey by listing the unexplored aspects and future work in this research topic.

II. BRIEF INTRODUCTION TO R-CNN AND ITS PROPOSED VARIANTS

A. Region-based Convolutional Network (R-CNN) [5]:
R-CNN is one of the earliest deep neural networks designed to perform object detection. R-CNN trains a CNN for object detection on object proposals generated by selective search, which produces around
2000 candidate boxes per image. Each candidate box is then warped to a fixed size and given as input to the CNN, which acts as a feature extractor and produces a 4096-dimensional feature vector as output. These features are fed to SVM classifiers to perform classification. In addition to classification, R-CNN also predicts four offset values to refine the precision of each bounding box. The system overview of R-CNN is pictorially represented in Fig. 1.

Fig. 1. System overview of R-CNN [5]

Limitations of R-CNN:
i. Huge time complexity, which makes R-CNN unsuitable for real-life applications.
ii. Inaccurate generation of candidate region proposals, due to the absence of an inherent learning capability in the selective search algorithm.

B. Fast R-CNN [6]:
Girshick [6] proposed a modified deep neural network, namely Fast R-CNN, to overcome the huge time complexity of R-CNN. Unlike R-CNN, Fast R-CNN does not feed the 2000 region proposals generated from an image by selective search to the CNN individually to generate their convolutional feature maps; instead, it feeds the entire image as input to the CNN. The regions corresponding to the proposals are identified on the feature map of the entire image and resized to a fixed size by an RoI pooling layer. Then a softmax layer identifies the object present within each region proposal, and a bounding-box regressor predicts four offset values. The system overview of Fast R-CNN is given in Fig. 2.

Fig. 2. System overview of Fast R-CNN [6]

Limitations of Fast R-CNN:
i. Although the introduction of the RoI pooling layer has reduced the time complexity of Fast R-CNN compared to R-CNN to some extent, the problem of inaccurate region proposal generation still exists, because, as in R-CNN, the region proposals in Fast R-CNN are detected using the non-learning selective search algorithm.

C. Faster R-CNN [7]:
In order to further reduce the time complexity and to generate accurate region proposals, Faster R-CNN was designed in [7] by merging Fast R-CNN with a novel fully convolutional network, namely the Region Proposal Network (RPN). The RPN not only generates high-quality region proposals but also simultaneously predicts object bounds and objectness scores at each position. The system overview of Faster R-CNN is given in Fig. 3.

Fig. 3. System overview of Faster R-CNN [7]

Due to the efficiency of Faster R-CNN in generating accurate region proposals, as well as its capability of reducing the time complexities of R-CNN and Fast R-CNN to a large extent, Faster R-CNN has been used by many researchers as the backbone of the deep neural architectures they designed for vehicle detection and tracking, as discussed in the following section.

III. METHODS DESIGNED BASED ON FASTER R-CNN ARCHITECTURE

1. Fan et al. [8] (2016): In this method, the authors performed vehicle detection by modifying a few model parameters such as the training scale, the test scale and the number of proposals. The authors in [8] showed how the performance of Faster R-CNN in vehicle detection varies with different values of the training scale, the test scale and the number of proposals on the KITTI dataset [9].

2. Espinosa et al. [10]: In this paper, the authors performed a comparative analysis of the performance of AlexNet [11] and Faster R-CNN in moving vehicle detection using a video of an urban area. They used VGG16 [12] as the feature extractor in the Faster R-CNN architecture and concluded that Faster R-CNN achieves a better F1-score in moving vehicle detection.

3. H. Nguyen [13]: In this work, the author pointed out that although Faster R-CNN performs better than other well-known deep neural networks like R-CNN, Fast R-CNN and AlexNet, it fails to perform well in cases of heavy occlusion, large-scale vehicle variation or truncation of small vehicles. In order to overcome these shortcomings of the traditional Faster R-CNN model, H. Nguyen designed an improved Faster R-CNN architecture in which he adopted the MobileNet
architecture [14] to build the base convolutional layers of the designed architecture. He replaced the Non-Maximum Suppression (NMS) algorithm in Faster R-CNN with the soft-NMS algorithm. The traditional NMS algorithm processes each class separately and removes a proposal when its Intersection over Union (IoU) with a higher-scoring neighbouring box of the same class exceeds a pre-defined threshold. This property of the NMS algorithm often leads to missed vehicle detections when heavy vehicle occlusion occurs. To overcome this drawback, NMS is replaced with soft-NMS in the proposed architecture. Instead of discarding neighbouring proposals, soft-NMS decays their scores according to their level of overlap with the winning proposal, and thus reduces the errors occurring in detection tasks. The author also substituted the RoI pooling layer of Faster R-CNN with a context-aware pooling layer to fully preserve the contextual information, and used the depth-wise separable convolution structure of the MobileNet architecture to perform object classification and bounding-box coordinate adjustment.

4. Mo et al. [15]: The authors designed this Faster R-CNN based deep neural network primarily to perform vehicle detection in aerial images. Initially, the authors performed data augmentation using their proposed oversampling and stitching based data augmentation method, in order to address the difficulties arising from the small size of vehicles in aerial images as well as the imbalance between positive and negative samples. The authors used ResNet101 [16] as the feature extractor in their architecture. The traditional ResNet101 model possesses four pooling layers, which shrink the feature map generated for an image of size $32 \times 32$ to a size of $2 \times 2$, leading to a huge loss of information. To overcome this information loss, the authors amplified the feature maps using bilinear interpolation. The authors also designed a joint loss function by combining the losses of horizontal bounding boxes and oriented bounding boxes, so that their architecture can detect horizontal and oriented vehicles simultaneously.

IV. YOLO AND ITS EVOLUTION

A. YOLO version 1 [17]:
Redmon et al. designed this object detection network to reduce the huge run-time complexity of R-CNN and its proposed variants. Unlike R-CNN and its variants, YOLO does not require region proposals to localize and classify objects; instead, it divides the entire image into an $S \times S$ grid, and within each grid cell it predicts $m$ bounding boxes. Each bounding box predicts a class probability and offset values. The bounding boxes whose predicted class probabilities fall below a certain threshold are suppressed. A pictorial representation of the steps of object detection using YOLO version 1 is given in Fig. 4.

Fig. 4. Steps of object detection using YOLO version 1 [17]

Limitations of YOLO version 1:
i. The maximum number of objects detected by a YOLO detector always depends on the dimension of the grid, as YOLO can detect only one object per grid cell. For example, if the size of the grid is $S \times S$, the maximum number of objects detected is $S^2$.
ii. As the YOLO detector detects at most one object per grid cell, it performs erroneous detection when more than one object exists within a grid cell.

B. YOLO version 2 [18]:
Redmon and Farhadi [18] proposed an improved version of YOLO, also known as YOLO9000, which not only outperforms state-of-the-art methods like Fast R-CNN and Faster R-CNN in terms of detection performance but also performs detection within a reasonable amount of time. In this version of the YOLO detector, the authors made various changes to the architecture of YOLO version 1 in order to overcome its limitations.

Some notable architectural changes made to YOLO version 1 are:
a. Introduction of Batch Normalization layers: Adding batch normalization after all convolutional layers improves the performance of the detector and allows the dropout layers to be removed without overfitting.
b. YOLO version 1 pre-trains its classifier network on images of dimension $224 \times 224$ and then increases the input dimension to $448 \times 448$ for detection. This sudden increase in image resolution decreases the performance of YOLO version 1. To overcome this drawback, in YOLO version 2 the classification network is fine-tuned on images of dimension $448 \times 448$ for 10 epochs, so that it can gradually adjust to high-resolution images. Hence, the drop in mAP (mean Average Precision) that YOLO version 1 suffers due to the sudden increase in image dimension is avoided.
c. Unlike YOLO version 1, this improved model does not predict the offset values using fully connected layers on top of the convolutional layers; instead, it removes the fully connected layers from the architecture and predicts offsets and objectness scores using anchor boxes. Although the use of anchor boxes slightly reduces the mAP compared to the model without anchor boxes, it increases the recall.
d. Dimension clusters: Instead of using hand-picked anchor box dimensions, the authors in [18] choose the anchor boxes (priors) by running the k-means algorithm on the bounding boxes of the training set, in combination with their proposed distance metric defined in (1), which makes the learning process easier.

$$d(\text{box}, \text{centroid}) = 1 - \mathrm{IOU}(\text{box}, \text{centroid}) \qquad (1)$$

The authors chose $k = 5$ in their work, as it achieves a good trade-off between the network's performance and complexity.

The other significant characteristics of the model that need to be mentioned are:
e. Direct location prediction: This characteristic mainly deals with the stability of the model, since the introduction of anchor boxes makes the model unstable to some extent, especially during the early iterations of training. To increase the stability of the model, the authors constrained the predicted offsets of each bounding-box centre, relative to its grid cell, to the range [0, 1] using a logistic activation.
f. Fine-grained features: Many state-of-the-art methods like Faster R-CNN run on feature maps of different resolutions in order to adapt the network to objects of different sizes. In YOLO version 2, instead of running the network on feature maps of different resolutions, the authors simply added a passthrough layer that concatenates the high-resolution features with the low-resolution features by stacking adjacent features into different channels instead of different spatial locations.
g. Multi-scale training: Unlike YOLO version 1, which trains the network using images of resolution $448 \times 448$, YOLO version 2 trains the network using images of different resolutions. Every 10 batches, the network randomly chooses a new input resolution. Since the network down-samples its input by a factor of 32, the resolutions are multiples of 32 ranging from 320 to 608. This characteristic helps the network adjust to different resolutions and perform efficiently irrespective of the image resolution.

C. YOLO version 3 [19]:
YOLO version 3 is an improved version of the YOLO detector designed by Redmon and Farhadi [19]. YOLO version 3 does not use a softmax classifier to predict the classes of detected objects, because softmax allows the prediction of only one class per object and thus fails to handle multi-label prediction efficiently. To overcome this drawback, YOLO version 3 uses independent logistic classifiers for each class, which allows it to handle multi-label prediction efficiently.

Unlike YOLO version 2, which uses Darknet-19 as its feature extractor, YOLO version 3 uses a hybrid feature extractor, Darknet-53, which combines the design of Darknet-19 with residual connections. The YOLO version 3 architecture has several shortcut connections, which increase its performance in detecting small objects but decrease its performance in detecting medium and large objects.

D. YOLO version 4 [20]
YOLO version 4 is designed taking inspiration from several Bag-of-Freebies and Bag-of-Specials object detection methods. Bag-of-Freebies methods increase only the training cost of the detector, without affecting its inference cost, but improve its accuracy, while Bag-of-Specials methods increase the inference cost slightly but improve the accuracy significantly.

Apart from these modifications, other improvements made in the YOLO version 4 model are the selection of optimal hyper-parameter values using genetic algorithms, the introduction of data augmentation methods like Self-Adversarial Training (SAT) and Mosaic, and modifications of existing methods such as Cross mini-Batch Normalization (CmBN) and the Spatial Attention Module (SAM).

E. YOLO version 5 [21]
Unlike the previous versions of YOLO, which were developed using the Darknet research framework, this is the first version of YOLO developed in the PyTorch framework. This makes YOLO version 5 much more production-ready than its previous versions, as PyTorch is much more easily configurable than Darknet.

Another notable improvement in this version of YOLO is its run-time: YOLO version 5 is much faster than its predecessors. The inference speed of YOLO version 5 is 140 frames per second, while that of YOLO version 4 is 50 frames per second when implemented using the same PyTorch library as YOLO version 5.

V. METHODS DESIGNED BASED ON YOLO ARCHITECTURE

1. Xu et al. [22]: In this work, the authors performed vehicle detection in aerial images using a modified YOLO version 3 network. The authors increased the depth of the YOLO version 3 network by increasing the number of convolutional layers to 75, as they empirically found that at this depth the network achieves desirable performance in detecting vehicles in aerial images. The YOLO version 3 architecture proposed in [19] cannot detect vehicles in aerial images efficiently due to the small size of the vehicles and the complex background of the images. As the top-level features provide more information about small objects, the authors in [22] mainly modified the connections between the up-sampling and down-sampling layers in order to preserve more top-level features, so that small vehicles can be detected accurately.

2. Ghoreyshi et al. [23]: The authors designed two different vehicle detection networks in this work to detect vehicles in images collected from Iranian websites. The vehicle images used for training and testing the networks in this work bear a lot of similarities among them. The first network is designed by merging the ResNet network [16] and the Single Shot Detector (SSD) [24]: ResNet is used for feature extraction and SSD is used for object localization. The second network that the authors have designed for vehicle detection in this work is inspired by the
architecture of the VGG network and is a modified version of YOLO. Some significant characteristics of this YOLO-based network are:
A. Most convolutional layers have filters of size $3 \times 3$.
B. The convolutional layers mostly have the same number of filters, except when the size of the feature maps is halved; in such cases, the number of filters is doubled to preserve the time complexity per layer.
C. The convolutional layers perform down-sampling using a stride of 2.
D. The final layer of the network is a softmax layer whose number of output neurons is equal to the number of output classes.

3. Rahman et al. [25] proposed a three-step real-time wrong-way vehicle detection method.
The first step of the method deals with vehicle detection, which is done using the YOLO version 3 detector [19] discussed briefly in Section IV.
The second step of the method deals with tracking the detected vehicles using a centroid tracker. In this step, the bounding boxes of the vehicles detected by the YOLO version 3 detector in the previous step are fed as inputs, and the centroid of each bounding box is calculated to determine the current position of the vehicle. The centroid tracker algorithm is designed based on the assumption that the position of a vehicle changes very little between consecutive frames of a video. Here, the tracking method depends on the camera view and is therefore configured manually. A region of interest is first initialized, and a vehicle is tracked only if the computed centroid of its bounding box lies within the region of interest; an identification number is then assigned to that vehicle, and the details of the vehicle are entered in the tracking list under the assigned identification number. As the vehicle moves, the centroid of its bounding box changes, and the details of the vehicle are updated in the tracking list as long as the centroid lies within the region of interest. Once the centroid of the bounding box of a vehicle goes out of the region of interest, the details of the vehicle are removed from the tracking list.
The third step of the algorithm deals with detecting the direction of the vehicles. In [25], the authors determined the direction of a vehicle from the height of its centroid, using the following logic:
a. When the centroid of the bounding box of a vehicle first comes within the region of interest, its centroid height is computed as $H_1$.
b. As the vehicle moves, its centroid height changes along with its position. If the updated centroid lies within the region of interest, its height is computed again; let this height be $H_2$.
c. Based on the comparison of $H_1$ and $H_2$, the method predicts whether the vehicle is coming towards the camera, which is considered to be the right direction in this work. In other cases, the vehicles are considered to be moving in the wrong direction.

4. Zhou et al. [26] primarily designed this method to perform vehicle detection in satellite images. For this purpose, the authors chose a modified YOLO version 3 network. The modifications to YOLO version 3 take into account the facts that vehicles in satellite images are very small and that the background of satellite images interferes with accurate vehicle detection. The notable changes made to YOLO version 3 to adapt it to the characteristics of satellite images are listed below:
a. The authors trained the network using an image set comprising satellite images.
b. The anchors are chosen using the k-means algorithm, and the bounding boxes are generated from these anchors.

5. Doan et al. [27] designed this method to perform vehicle detection and counting using the YOLO version 4 network [20] and the DeepSORT network [28]. The YOLO version 4 network is used in this work to predict the bounding-box coordinates, the classes of the detected objects and their confidence scores. DeepSORT is used to track the detected objects. The Kalman filter in DeepSORT helps track objects by using their previous states to predict their states in the subsequent frames. It also helps avoid duplicate tracking of vehicles by setting a threshold in the first frame.
However, as the Kalman filter handles each detected object independently, no connection can be established between detected objects and tracked objects. In this work, the authors solved this problem by using the squared Mahalanobis distance to incorporate the uncertainty of the Kalman filter estimates, together with the Hungarian algorithm to associate the data.
Counting of detected vehicles is done using the YOLO version 4 network in combination with the DeepSORT network. In the counting phase, the outputs of the YOLO version 4 network are fed as inputs to the DeepSORT network, which assigns an identification number to a vehicle when its bounding-box coordinates indicate that it has entered a pre-determined region of interest for the first time; only then is the counter corresponding to the object class of that vehicle incremented by one.

VI. CONCLUSION, UNEXPLORED ASPECTS AND FUTURE SCOPE OF WORK

After studying the methodologies proposed in each work included in this survey, we conclude that there is room for improvement, especially with respect to run-time complexity. In this survey, we have studied several variants of R-CNN and YOLO, but we have found that the existing methods are mostly designed based on the architectures of only a few of them. So, future work in this research area can focus on designing a vehicle detection method based on the YOLO version 5 architecture.
Apart from performing vehicle detection, another important aspect is to track vehicles properly to prevent collisions. After going through these works, we conclude that tracking methods should be improved, as the tracking methods proposed to date are mostly manual and depend solely on the camera view.

The similarities between different classes of vehicles, as well as the small size of vehicles in satellite images, also require fine-tuning of network parameters to achieve the desired results.

REFERENCES

[1] M. Ramu, "Poor visibility due to bad weather is killing hundreds in accidents", The Hindu, 2015. https://www.thehindu.com/news/cities/Hyderabad/poor-visibility-due-to-bad-weather-is-killing-hundreds-in-accidents/article7439794.ece, Accessed 9 October 2019.
[2] Federal Highway Administration, "Road Weather Management Program", U.S. Department of Transportation, 2018. https://ops.fhwa.dot.gov/weather/q1_roadimpact.htm, Accessed 25 February 2021.
[3] https://timesofindia.indiatimes.com/india/90-deaths-on-roads-due-to-rash-driving-ncrb/articleshow/61898677.cms, Accessed 25 February 2021.
[4] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking", IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Fort Collins, 1999.
[5] R. Girshick, J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation", IEEE Conf. on Computer Vision and Pattern Recognition, USA, June 2014.
[6] R. Girshick, "Fast R-CNN", IEEE Int. Conf. on Computer Vision (ICCV), Chile, December 2015.
[7] S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", 2015, arXiv:1506.01497v3.
[8] Q. Fan, L. Brown and J. Smith, "A Closer Look at Faster R-CNN for Vehicle Detection", IEEE Intelligent Vehicles Symposium, Sweden, 2016.
[9] A. Geiger, P. Lenz and R. Urtasun, "Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite", Conference on Computer Vision and Pattern Recognition (CVPR), USA, 2012.
[10] J. E. Espinosa, S. A. Velastin and J. W. Branch, "Vehicle Detection Using Alex Net and Faster R-CNN Deep Learning Models: A Comparative Study", Int. Visual Informatics Conference, Malaysia, November 2017.
[11] A. Krizhevsky, I. Sutskever and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks", Communications of the ACM, 60(6), 2017.
[12] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition", Int. Conf. on Learning Representations (ICLR), May 2015.
[13] H. Nguyen, "Improving Faster R-CNN Framework for Fast Vehicle Detection", Hindawi Mathematical Problems in Engineering, 2019, Article ID 3808064.
[14] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. W., M. Andreetto and H. Adam, "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications", 2017, arXiv:1704.04861v1.
[15] N. Mo and L. Yan, "Improved Faster RCNN Based on Feature Amplification and Oversampling Data Augmentation for Oriented Vehicle Detection in Aerial Images", Remote Sensing, 2020.
[16] K. He, X. Zhang, S. Ren and J. Sun, "Deep Residual Learning for Image Recognition", 2015, arXiv:1512.03385v1.
[17] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection", 2016, arXiv:1506.02640v5.
[18] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger", arXiv:1612.08242v1.
[19] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement", 2018, arXiv:1804.02767v1.
[20] A. Bochkovskiy, C.-Y. Wang and H.-Y. Mark Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection", 2020, arXiv:2004.10934v1.
[21] G. Jocher, https://github.com/ultralytics/yolov5, 2020.
[22] B. Xu, B. Wang and Y. Gu, "Vehicle Detection in Aerial Images Using Modified YOLO", IEEE Int. Conf. on Communication Technology, China, 2019.
[23] A. M. Ghoreyshi, A. AkhavanPour and A. Bossaghzadeh, "Simultaneous Vehicle Detection and Classification Model based on Deep YOLO Networks", Int. Conf. on Machine Vision and Image Processing, Iran, 2020.
[24] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu and A. C. Berg, "SSD: Single Shot MultiBox Detector", 2016, arXiv:1512.02325v5.
[25] Z. Rahman, A. M. Ami and M. A. Ullah, "A Real-Time Wrong-Way Vehicle Detection Based on YOLO and Centroid Tracking", IEEE Region 10 Symposium (TENSYMP), June 2020.
[26] L. Zhou, J. Liu and L. Chen, "Vehicle detection based on remote sensing image of YOLOv3", IEEE Int. Conf. on Information Technology, Networking, Electronic and Automation Control, June 2020.
[27] T.-N. Doan and M.-T. Truong, "Real-time vehicle detection and counting based on YOLO and DeepSORT", IEEE Int. Conf. on Knowledge and Systems Engineering, Vietnam, 2020.
[28] F. Yu, W. Li, Q. Li, Y. Liu, X. Shi and J. Yan, "POI: Multiple Object Tracking with High Performance Detection and Appearance Feature", 2016, arXiv:1610.06136v1.