Research Article
A Real-Time Vehicle Counting, Speed Estimation, and
Classification System Based on Virtual Detection Zone and YOLO
Received 7 May 2021; Revised 30 July 2021; Accepted 7 October 2021; Published 2 November 2021
Copyright © 2021 Cheng-Jian Lin et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
In recent years, vehicle detection and classification have become essential tasks of intelligent transportation systems, and real-time,
accurate vehicle detection from image and video data for traffic monitoring remains challenging. The most noteworthy challenges
are real-time system operation to accurately locate and classify vehicles in traffic flows and working around total occlusions that
hinder vehicle tracking. For real-time traffic monitoring, we present a traffic monitoring approach that overcomes the
abovementioned challenges by employing convolutional neural networks that utilize You Only Look Once (YOLO). Real-time
traffic monitoring systems have attracted significant attention from traffic management departments, and digitally processing
and analyzing their video streams in real time is crucial for extracting reliable data on traffic flow. Therefore, this
study presents a real-time traffic monitoring system based on a virtual detection zone, Gaussian mixture model (GMM), and
YOLO to increase the vehicle counting and classification efficiency. GMM and a virtual detection zone are used for vehicle
counting, and YOLO is used to classify vehicles. Moreover, the distance and time traveled by a vehicle are used to estimate the
speed of the vehicle. In this study, the Montevideo Audio and Video Dataset (MAVD), the GRAM Road-Traffic Monitoring data
set (GRAM-RTM), and our own collected data sets are used to verify the proposed method. Experimental results indicate that the
proposed method with YOLOv4 achieves the highest classification accuracy of 98.91% and 99.5% on the MAVD and GRAM-RTM
data sets, respectively. Moreover, the proposed method with YOLOv4 also achieves the highest classification accuracy of 99.1%,
98.6%, and 98% under daytime, nighttime, and rainy conditions, respectively. In addition, the average absolute percentage error of
vehicle speed estimation with the proposed method is about 7.6%.
networks, such as single-shot detection (SSD) [9], Faster R-CNN [10], YOLOv3 [11], and YOLOv4 [12], have been implemented for traffic detection using deep learning object detectors [13]. For example, Biswas et al. [14] implemented SSD to estimate traffic density. Yang et al. [15] proposed a multitasking-capable Faster R-CNN method that uses a single image to generate three-dimensional (3D) space coordinate information for an object with monocular vision to facilitate autonomous driving. Huang et al. [8] proposed a single-stage deep neural network called YOLOv3 and applied it to data sets generated in different environments to improve its real-time detection accuracy. Hu et al. [16] proposed an improved YOLOv4-based video stream vehicle target detection algorithm to improve the detection speed. In addition, the most noteworthy challenges associated with traffic monitoring systems are real-time operation for accurately locating and classifying vehicles in traffic flows and total occlusions that hinder vehicle tracking. Therefore, YOLO was developed as a regression-based, high-performance algorithm for the real-time detection of and statistics collection from vehicle flows.

The robustness of YOLOv3 and YOLOv4 to road marking detection improves their accuracy in small target detection. The model is based on the TensorFlow framework to enhance the real-time monitoring of traffic-flow problems by an intelligent transportation system [17]. The YOLOv3 network comprises 53 layers. It uses the Feature Pyramid Network for pedestrian detection to handle general multiscale object detection problems and deep residual network (ResNet) ideas to extract image features, achieving a trade-off between detection speed and detection accuracy [18]. In addition to leveraging anchor boxes with predesigned scales and aspect ratios to predict vehicles of different sizes, YOLOv3 and YOLOv4 can realize real-time vehicle detection with a top-down architecture [19]. Moreover, a real-time vehicle detection and classification system can perform foreground extraction, vehicle detection, vehicle feature extraction, and vehicle classification [20]. To test the proposed method for vehicle classification, a vehicle-feature-based virtual detection zone and virtual detection line, which are predefined for each frame in a video, are used for vehicle feature computation [21]. Grents et al. [22] proposed a video-based system that uses a convolutional neural network to count vehicles, classify vehicles, and determine the vehicle speed. Tabassum et al. [23, 24] applied YOLO and a transfer learning approach to recognize and classify native vehicles on Bangladeshi roads. Therefore, YOLO can be used to obtain a better matching map.

To address vehicle counting and classification problems in real-time traffic monitoring, this study presents a real-time traffic monitoring system based on a virtual detection zone, Gaussian mixture model (GMM), and YOLO to increase the vehicle counting and classification efficiency. GMM and a virtual detection zone are used for vehicle counting, and YOLO is used to classify vehicles. Moreover, the distance and time traveled by a vehicle are used to estimate the speed of the vehicle. The major contributions of this study are described as follows: (1) a real-time traffic monitoring system is developed to perform real-time vehicle counting, vehicle speed estimation, and vehicle classification; (2) the virtual detection zone, GMM, and YOLO are used to increase vehicle counting and classification efficiency; (3) the distance and time traveled by a vehicle are used to estimate the vehicle speed; and (4) the MAVD, GRAM-RTM, and our collected data sets are used to compare various methods, and the proposed method with YOLOv4 achieves the highest classification accuracy on all three data sets.

The remainder of this study is organized as follows. Section 2 describes the materials and methods, including data set preparation, the vehicle counting method, and the vehicle classification method. Section 3 presents the results of and a discussion on the proposed real-time vehicle counting, speed estimation, and classification system based on a virtual detection zone and YOLO. Finally, Section 4 presents a few concluding remarks and an outline for future research on real-time traffic monitoring.

2. Materials and Methods

To count vehicles from traffic videos, this study proposes a real-time vehicle counting, speed estimation, and classification system based on the virtual detection zone and YOLO. We combined a vehicle detection method with a classification system on the basis of two conditions between the virtual detection zone and the virtual detection lane line. To detect vehicles, a Gaussian mixture model (GMM) is applied to detect moving objects in each frame of a traffic video. Figure 1 shows a flowchart of the vehicle counting and classification process used in the proposed real-time vehicle counting, speed estimation, and classification system. First, traffic videos are collected to train the image data and to perform vehicle classification verification. Next, GMM and a virtual detection zone are used for vehicle counting. Finally, YOLO is used to perform vehicle classification in real time. The three steps are described as follows:

Part 1: Collect traffic videos from online cameras. In this study, traffic videos were collected from online cameras and used for image data training and vehicle classification verification, as described in Section 2.1.

Part 2: Perform vehicle counting using GMM and the virtual detection zone. To realize real-time vehicle counting, object detection and recognition are performed. A virtual detection lane line and a virtual detection zone are used to perform vehicle counting and speed estimation, as described in Sections 2.2 and 2.4, respectively (a simplified sketch of Parts 2 and 3 follows this list).

Part 3: Perform vehicle classification and speed estimation using the YOLOv3 and YOLOv4 algorithms.
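To make Parts 2 and 3 concrete, the following Python sketch combines OpenCV's MOG2 background subtractor (a Gaussian mixture model) with a simple virtual detection zone check. It is an illustrative simplification written by us, not the authors' implementation: the video path, zone coordinates, and area threshold are placeholder values, OpenCV 4.x is assumed, and the YOLO classification step is only indicated by a comment.

import cv2

# Minimal sketch of GMM-based detection plus a virtual detection zone (assumes OpenCV 4.x).
cap = cv2.VideoCapture("traffic.mp4")                       # placeholder path to a traffic video
bg_model = cv2.createBackgroundSubtractorMOG2()             # GMM-based background model
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
zone = (100, 300, 500, 360)                                 # hypothetical virtual detection zone (x1, y1, x2, y2)
count = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg = bg_model.apply(frame)                              # foreground mask produced by the GMM
    fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)       # suppress small noise blobs
    contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) < 800:                        # ignore blobs too small to be vehicles
            continue
        x, y, w, h = cv2.boundingRect(c)
        cx, cy = x + w // 2, y + h // 2
        if zone[0] <= cx <= zone[2] and zone[1] <= cy <= zone[3]:
            count += 1
            # A real system must track each blob (or use the virtual detection lane line)
            # so that the same vehicle is not counted in consecutive frames, and it would
            # pass the crop frame[y:y + h, x:x + w] to YOLO for classification.

cap.release()
print("vehicles counted:", count)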
2.1. Data Set Preparation. The data set used in this study was prepared by collecting traffic videos recorded with online cameras installed along various roads in Taiwan. Image data
[Figure 1: Flowchart of the proposed system — an online camera supplies traffic video (Part 1: collect traffic video), GMM and a virtual detection zone are used for vehicle counting (Part 2), and YOLO is used for vehicle classification (Part 3) before the process ends.]
[Table residue: vehicle types and lengths — 1 sedan 3.6–5.5; 2 truck >5.5–11; 3 scooter 1–2.5.]
Figure 3: Architecture of the visual classifier based on the YOLO algorithm for verifying the vehicle classification (input 416 × 416 × 3; feature depths 32, 64, 128, 256, 512, 1024; output classes: sedan, truck, scooter, bus, hlinkcar, flinkcar).
Darknet-53 [11]. In Darknet-53, alternating convolution kernels are used, and after each convolution layer, a batch normalization layer is used for normalization. The leaky rectified linear unit function is used as the activation function, the pooling layer is discarded, and the step size of the convolution kernel is increased to reduce the size of the feature map. The YOLOv3 model uses ResNet for feature extraction and subsequently uses the feature pyramid top-down and lateral connections to generate three features with sizes of 13 × 13 × 1024, 26 × 26 × 512, and 52 × 52 × 256 px. The final output depth is (5 + class) × 3, which indicates that the following parameters are predicted: four basic parameters and the credibility of a box across three regression bounding boxes as well as the possibility of each class being contained in the bounding box. YOLOv3 uses the sigmoid function to score each class. When the class score is higher than the threshold, the object is considered to belong to a given category, and any object can simultaneously have multiple class identities without conflict.
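The head output described above can be made concrete with a short sketch. The NumPy example below is our own illustration, not the authors' code; the 6-class setting, the random tensor, and the 0.5 threshold are assumptions. It reshapes a 13 × 13 map of depth (5 + class) × 3 into per-anchor predictions and applies the per-class sigmoid scoring with a threshold, so a box can keep several class labels at once.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical raw output of the 13 x 13 head for a 6-class model:
# 3 anchors per cell, each with 4 box parameters, 1 objectness value,
# and 6 class scores, i.e. a depth of (5 + 6) * 3.
num_classes = 6
raw = np.random.randn(13, 13, 3 * (5 + num_classes)).astype(np.float32)

# Reshape so each anchor's prediction vector can be read directly.
pred = raw.reshape(13, 13, 3, 5 + num_classes)

box_params = pred[..., 0:4]            # the four basic box parameters (still raw)
objectness = sigmoid(pred[..., 4])     # credibility that the box contains an object
class_score = sigmoid(pred[..., 5:])   # independent per-class sigmoid scores

# Multi-label decision: every class above the threshold is kept, so one
# box may carry several class identities without conflict.
keep = class_score > 0.5
print(pred.shape, objectness.shape, keep.sum())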
The loss function of YOLOv3 is mainly divided into four parts. Part A denotes the loss of the identified center coordinates, used to predict (x, y) of the bounding box, and it is only valid for the highest-scoring predicted target. Part B is the loss of the width and height (w, h) of the predicted bounding box; so that the error value reflects bounding boxes of different sizes fairly, the square root of the width and height is predicted instead of the width and height directly. Part C is the loss of the predicted object category, where each box corresponds to a grid cell: if the center of the detected object lies in a cell, that cell is marked with the bounding box (x, y, w, h) together with the category information indicating which object in the image the cell predicts. Part D denotes the loss of the credibility of the predicted object, computed for each bounding box to indicate whether the bounding box has predicted an object; when no object is predicted, a credibility prediction penalty λ_noobj = 0.5 is applied. The loss is defined as follows:

\[
\begin{aligned}
\text{Loss} ={}& \underbrace{\lambda_{\text{coord}}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\ell_{ij}^{\text{obj}}\Big[\big(x_{i}-\hat{x}_{i}\big)^{2}+\big(y_{i}-\hat{y}_{i}\big)^{2}\Big]}_{A}
+\underbrace{\lambda_{\text{coord}}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\ell_{ij}^{\text{obj}}\Big[\big(\sqrt{w_{i}}-\sqrt{\hat{w}_{i}}\big)^{2}+\big(\sqrt{h_{i}}-\sqrt{\hat{h}_{i}}\big)^{2}\Big]}_{B}\\
&+\underbrace{\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\ell_{i}^{\text{obj}}\big(p_{i}(c)-\hat{p}_{i}(c)\big)^{2}}_{C}
+\underbrace{\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\ell_{ij}^{\text{obj}}\big(C_{i}-\hat{C}_{i}\big)^{2}
+\lambda_{\text{noobj}}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B}\ell_{ij}^{\text{noobj}}\big(C_{i}-\hat{C}_{i}\big)^{2}}_{D}
\end{aligned}
\tag{1}
\]

where (x_i, y_i) is the location of the centroid of the anchor box and (w_i, h_i) are the width and height of the anchor box, C_i is the objectness, i.e., the confidence score of whether there is an object or not, and p_i(c) is the class probability used in the classification loss.

YOLOv4 is the latest algorithm of the YOLO series. Built on the basis of YOLOv3, it scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy; its network architecture is shown in Figure 5. Compared with YOLOv3, YOLOv4-tiny is an extended version in which the original Darknet-53 network is augmented with a CSP network. The backbone is CSPOSANet, proposed as a Cross Stage Partial Network (CSPNet) combined with a One-Shot Aggregation Network (OSANet), plus Partial in Computational Blocks (PCB) technology. CSPNet can be applied to different CNN architectures to reduce the number of parameters and the amount of computation while improving accuracy. OSANet is derived from the OSA module in VoVNet, whose central idea improves on the DenseNet module: at the end, all layers are connected so that the input is consistent with the number of output channels. PCB technology makes the model more flexible because it can be adjusted according to the structure to achieve the best accuracy–speed balance.

The loss function remains the same as that of the YOLOv4 model and consists of three parts: classification loss, regression loss, and confidence loss [28]. The classification loss and confidence loss remain the same as in the YOLOv3 model, but complete intersection over union (CIoU) is used to replace the mean-squared error (MSE) to optimize the regression loss [29]. The CIoU-based loss function is shown as follows:

\[
\begin{aligned}
\text{Loss} ={}& 1-\mathrm{IoU}+\frac{\rho^{2}\big(b,b^{gt}\big)}{c^{2}}+\alpha\upsilon
-\sum_{i=0}^{S^{2}}\sum_{j=0}^{B} I_{ij}^{\text{obj}}\Big[\hat{C}_{i}\log C_{i}+\big(1-\hat{C}_{i}\big)\log\big(1-C_{i}\big)\Big]\\
&-\lambda_{\text{noobj}}\sum_{i=0}^{S^{2}}\sum_{j=0}^{B} I_{ij}^{\text{noobj}}\Big[\hat{C}_{i}\log C_{i}+\big(1-\hat{C}_{i}\big)\log\big(1-C_{i}\big)\Big]
-\sum_{i=0}^{S^{2}}\sum_{c\in \text{classes}} I_{ij}^{\text{obj}}\Big[\hat{p}_{i}(c)\log p_{i}(c)+\big(1-\hat{p}_{i}(c)\big)\log\big(1-p_{i}(c)\big)\Big]
\end{aligned}
\tag{2}
\]

where S² represents the S × S grid; each grid cell generates B candidate boxes, and each candidate box obtains a corresponding bounding box through the network, so that S × S × B bounding boxes are formed in total. If there is no object (noobj) in a box, only the confidence loss of that box is calculated. The confidence loss uses the cross-entropy error and is divided into two parts, with object (obj) and without object (noobj); the noobj part is scaled by the weight coefficient λ_noobj to reduce its contribution. The classification loss also uses the cross-entropy error: when the j-th anchor box of the i-th grid cell is responsible for a certain ground truth, the bounding box generated by this anchor box contributes to the classification loss.
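To make the regression term in equation (2) concrete, the following sketch computes the CIoU loss 1 − IoU + ρ²(b, b^gt)/c² + αυ for a single pair of axis-aligned boxes, following the definition of Zheng et al. [29]. It is our own illustrative Python example rather than code from the proposed system, and the (x1, y1, x2, y2) box format and the sample coordinates are assumptions.

import math

def ciou_loss(box, box_gt):
    """CIoU loss 1 - IoU + rho^2/c^2 + alpha*v for boxes given as (x1, y1, x2, y2)."""
    bx1, by1, bx2, by2 = box
    gx1, gy1, gx2, gy2 = box_gt

    # Intersection over union of the two boxes.
    iw = max(0.0, min(bx2, gx2) - max(bx1, gx1))
    ih = max(0.0, min(by2, gy2) - max(by1, gy1))
    inter = iw * ih
    union = (bx2 - bx1) * (by2 - by1) + (gx2 - gx1) * (gy2 - gy1) - inter
    iou = inter / union

    # rho^2: squared distance between the two box centres.
    rho2 = ((bx1 + bx2 - gx1 - gx2) / 2) ** 2 + ((by1 + by2 - gy1 - gy2) / 2) ** 2

    # c^2: squared diagonal length of the smallest box enclosing both boxes.
    cw = max(bx2, gx2) - min(bx1, gx1)
    ch = max(by2, gy2) - min(by1, gy1)
    c2 = cw ** 2 + ch ** 2

    # v penalises aspect-ratio mismatch; alpha is its trade-off weight.
    v = (4.0 / math.pi ** 2) * (math.atan((gx2 - gx1) / (gy2 - gy1))
                                - math.atan((bx2 - bx1) / (by2 - by1))) ** 2
    alpha = v / (1.0 - iou + v + 1e-9)

    return 1.0 - iou + rho2 / c2 + alpha * v

# Example: a predicted box slightly offset from its ground truth.
print(ciou_loss((10, 10, 50, 90), (12, 8, 55, 92)))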
2.4. Speed Estimation. The real-time vehicle speed is also calculated in this study. Figure 6 shows the video images taken along the direction parallel to the length of the car (defined as the y-axis) and parallel to the width of the car (defined as the x-axis). First, as per the scale in the video, the yellow line (referred to as L) in the red circle has a length of 4 m in accordance with traffic laws. A GMM is used to draw a virtual detection zone (blue box) on the road to be tested (referred to as Q). The green box is the car frame (referred to as C), and the midpoint of the car is Ct.

\[
u_{0}=\frac{L_{AB}(\mathrm{px})}{L_{AB}(\mathrm{m})},\qquad \alpha=\frac{u_{0}}{u_{0}'},
\tag{3}
\]

\[
\text{If } \alpha>1,\quad u_{i}=u_{0}\,\alpha^{i},\qquad u_{j}=\frac{u_{0}}{\alpha^{j}},
\tag{4}
\]
[Figure 5: Network architecture of YOLOv4 — CSPDarkNet53 backbone (608 × 608 input downsampled through 304, 152, 76, 38, and 19) with three YOLO heads producing 19 × 19, 38 × 38, and 76 × 76 outputs of depth (5 + class) × 3.]
Figure 6: Diagram of speed estimation ((a): real picture on the video; (b): control scale).
where u_0 is the scale, i is the scale of the blue box, j is the scale of the green box, px is the length measured in the video (in pixels), and m is the actual length (in metres). The parameter α denotes the increase or decrease of the scale per unit length along the y-axis. If α > 1, the speed calculation is performed using equation (4).

To calculate the line segment through Ct parallel to L (referred to as L*), the algorithm computes the distance y along L* between A and B. It then restores y through its scale relationship to the actual line-segment length x, where x denotes the distance traveled by the vehicle in Q:

\[
x = y_{0}u_{0}+\sum_{i=1}^{p} y_{i}u_{i}+\sum_{j=1}^{p} y_{j}u_{j}.
\tag{5}
\]

In the calculation process, the program determines the frame rate of the video and counts the number of frames for which the vehicle travels in Q (referred to as p). Equation (6) is used to find the travel time of the vehicle from A to B in Q:

\[
t=\frac{p}{\mathrm{fps}},
\tag{6}
\]

\[
v=\frac{x}{t},
\tag{7}
\]

\[
v'=x\times\frac{3.6}{t}.
\tag{8}
\]

Equation (7) gives the vehicle speed in m/s; after unit conversion (m/s to km/h), equation (8) provides the vehicle speed.
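As a small, self-contained illustration of equations (3) and (6)–(8) (our own example with hypothetical numbers, ignoring the per-row scale correction of equations (4) and (5)), a vehicle that needs p = 45 frames to cross a zone whose pixel length corresponds to 25 m in a 30 fps video travels for t = 1.5 s, i.e., about 16.7 m/s or 60 km/h:

def pixels_per_metre(l_ab_px, l_ab_m=4.0):
    """Scale u0 from equation (3): a marking of known length (4 m here) measured in pixels."""
    return l_ab_px / l_ab_m

def vehicle_speed_kmh(y_px, p_frames, fps, u0):
    """Speed per equations (6)-(8); the per-row scale correction of equations (4)-(5) is omitted."""
    x_m = y_px / u0            # pixel displacement inside the zone converted to metres
    t = p_frames / fps         # travel time in the detection zone, equation (6)
    return (x_m / t) * 3.6     # m/s from equation (7), converted to km/h as in equation (8)

u0 = pixels_per_metre(80.0)                   # e.g. the 4 m marking spans 80 px -> 20 px per metre
print(vehicle_speed_kmh(500.0, 45, 30, u0))   # 500 px ~ 25 m crossed in 1.5 s -> 60.0 km/h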
3. Results and Discussion

All experiments in this study were performed using the YOLO algorithm under the Darknet framework, and the program was written in Python 2.7. To validate the real-time traffic monitoring system, we used a real-world data set to perform vehicle detection, vehicle counting, speed estimation, and classification. In this study, three test data sets were used to evaluate the proposed method. One of these data sets was mainly derived from traffic video images of online cameras on various roads in Taiwan, and it contains 12,761 training images and 3,190 testing images. Second, the Montevideo Audio and Video Dataset (MAVD), which contains data on different levels of traffic activity and social use characteristics in Montevideo city, Uruguay, was used as the other traffic data set [30]. Finally, the GRAM Road-Traffic Monitoring (GRAM-RTM) data set [21] has four categories (i.e., cars, trucks, vans,
and big-trucks). The total number of different objects in each sequence is 256 for M-30, 235 for M-30-HD, and 237 for Urban 1. In this study, the definition of accuracy is based on the classification of vehicles in the database: in the video verification, if the manual classification and the proposed system's classification of a vehicle are the same, the count is considered correct; otherwise, it is counted as a wrong vehicle count.

3.1. Vehicle Counting. Seven input videos of the road, each ranging in length between 3 and 5 minutes, were recorded at 10 am and 8 pm. In addition, eleven input videos of the road in the rain were also recorded for testing. Each frame in these traffic videos was captured at 30 fps. The first experimental results of real-time vehicle counting using the proposed method during the day are summarized in Table 2. The symbols S and L denote small and large vehicles, respectively. The vehicle counting accuracy of the proposed method at 10 am was 95.5%. The second experimental results of real-time vehicle counting using the proposed method during the night are summarized in Table 3. The vehicle counting accuracy of the proposed method at 8 pm was 98.5%. In addition, the third experimental results of real-time vehicle counting using the proposed method in the rain are summarized in Table 4. The vehicle counting accuracy of the proposed method was 94%. Screenshots of vehicle detection with the proposed real-time vehicle counting and classification system are depicted in Figure 7, where the detected vehicles are represented as green rectangles.

Table 2: Real-time vehicle counting using the proposed method in the daytime.
Video no.   Actual (S / L / Total)   Estimated (S / L / Total)
1           31 / 7 / 38              29 / 6 / 35
2           18 / 0 / 18              18 / 0 / 18
3           16 / 5 / 21              16 / 5 / 21
4           28 / 0 / 28              25 / 0 / 25
5           22 / 3 / 25              20 / 3 / 23
6           11 / 0 / 11              11 / 0 / 11
7           11 / 1 / 12              9 / 0 / 9

Table 3: Real-time vehicle counting using the proposed method at night.
Video no.   Actual (S / L / Total)   Estimated (S / L / Total)
1           23 / 8 / 31              23 / 7 / 30
2           31 / 5 / 36              29 / 5 / 34
3           15 / 2 / 17              14 / 1 / 15
4           10 / 3 / 13              9 / 3 / 12
5           19 / 4 / 23              19 / 4 / 23
6           11 / 2 / 13              11 / 2 / 13
7           35 / 5 / 38              34 / 3 / 37

Table 4: Real-time vehicle counting using the proposed method on rainy days.
Video no.   Actual (S / L / Total)   Estimated (S / L / Total)
1           9 / 0 / 9                9 / 0 / 9
2           12 / 0 / 12              10 / 1 / 11
3           13 / 0 / 13              13 / 0 / 13
4           7 / 0 / 7                8 / 0 / 8
5           11 / 1 / 12              14 / 1 / 15
6           15 / 0 / 15              17 / 0 / 17
7           10 / 0 / 10              13 / 0 / 13
8           7 / 0 / 7                10 / 0 / 10
9           12 / 0 / 12              15 / 0 / 15
10          12 / 0 / 12              14 / 0 / 14
11          17 / 0 / 17              19 / 0 / 19

Vehicle counting in online videos is delayed due to network stoppages, or the target vehicle may be blocked by other vehicles on the screen, which causes the count to be missed. In addition, poor lighting in the rain and at night affects the vehicle recognition capabilities of YOLOv3 and YOLOv4. These challenges can be overcome by using a stable network connection and by adjusting the camera brightness, respectively. Therefore, the novelty of this study is to solve the problem of unclear recognition in the rain.

3.2. Speed Estimation. In this subsection, the vehicle speed is estimated using the proposed method. Table 5 lists the actual and the estimated speeds of the vehicles. The results indicate that the average absolute percentage error of vehicle speed estimation was about 7.6%. The use of online video for vehicle speed estimation can cause large speed errors due to network delays; therefore, network stability is essential to reduce the percentage error in the speed estimation.

3.3. Comparison Results Using the MAVD and GRAM-RTM Data Sets. The MAVD traffic data set [30] and the GRAM Road-Traffic Monitoring (GRAM-RTM) data set [21] were used for evaluating the vehicle counting performance of the proposed method. The videos were recorded with a GoPro Hero 3 camera at a frame rate of 30 fps and a resolution of 1920 × 1080 px. We analyzed 10 videos, and the vehicle counting accuracy of the proposed method at 10 am for the MAVD traffic data set was 93.84%. Vehicle classification results of the proposed method using the MAVD traffic data set are listed in Table 6.

In summary, three data sets, namely, MAVD, GRAM-RTM, and our collected data sets, were used to verify the proposed method and the Faster R-CNN method [10]. The MAVD training and testing samples contain vehicles belonging to four categories (i.e., cars, buses, motorcycles, and trucks). The GRAM-RTM data set has four categories (i.e., cars, trucks, vans, and big-trucks). The total number of different objects in each sequence is as follows: 256 for M-30, 235 for M-30-HD, and 237 for Urban 1. Table 7 shows the classification accuracy results of the three data sets using various methods. In Table 7, the
Figure 7: Screenshots from the proposed real-time vehicle counting, speed estimation, and classification system.
Table 5: The actual and the estimated vehicle speeds using the proposed method.
Vehicle ID Actual speed Estimated speed Difference Error (%)
1 60 63 3 5
2 70 75 5 7
3 72 63 −9 12.5
4 99 100 1 1
5 84 85 1 1
6 67 60 −7 10
7 73 71 −2 2.7
8 67 64 −3 4.4
9 37 43 6 16
10 73 77 4 5
11 55 50 −5 9
12 48 54 6 12.5
13 111 127 16 14.4
14 79 75 −4 5
15 69 71 2 2.8
16 82 75 −7 8.5
17 83 73 −10 12
Average error 7.6
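The reported average in Table 5 is the mean of the per-vehicle Error (%) values; the short check below (our own verification, not part of the original system) reproduces it:

# Error (%) column of Table 5; the mean reproduces the reported average error.
errors = [5, 7, 12.5, 1, 1, 10, 2.7, 4.4, 16, 5, 9, 12.5, 14.4, 5, 2.8, 8.5, 12]
print(round(sum(errors) / len(errors), 1))    # -> 7.6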
Table 7: Classification accuracy results of three data sets using various methods.
Data set                     Method                             Accuracy (%)   FPS
MAVD                         Faster RCNN [10]                   97.21          5
MAVD                         Proposed method with YOLOv3        97.66          15
MAVD                         Proposed method with YOLOv4        98.91          15
GRAM-RTM                     Faster RCNN [10]                   91.54          5
GRAM-RTM                     Proposed method with YOLOv3        98.02          15
GRAM-RTM                     Proposed method with YOLOv4        99.5           15
Our data set (daytime)       Faster RCNN [10]                   97.7           5
Our data set (daytime)       Proposed method with YOLOv3        98             15
Our data set (daytime)       Proposed method with YOLOv4        99.1           15
Our data set (night time)    Faster RCNN [10]                   93.59          5
Our data set (night time)    Proposed method with YOLOv3        98             15
Our data set (night time)    Proposed method with YOLOv4        98.6           15
Our data set (rainy day)     Faster RCNN [10]                   87.5           5
Our data set (rainy day)     Proposed method with YOLOv3        90             15
Our data set (rainy day)     Proposed method with YOLOv4        98             15
Table 8: Classification accuracy results of various methods using the GRAM-RTM data set.
Method                                 Accuracy (%)
Faster RCNN [10]                       91.54
Gomaa et al. [31]                      96.8
Abdelwahab [32]                        93.51
Our proposed method with YOLOv3        98.02
Our proposed method with YOLOv4        99.5
[6] Á. Llamazares, E. J. Molinos, and M. Ocaña, "Detection and tracking of moving obstacles (DATMO): a review," Robotica, vol. 38, no. 5, pp. 761–774, 2020.
[7] C. Liu, D. Q. Huynh, Y. Sun, M. Reynolds, and S. Atkinson, "A vision-based pipeline for vehicle counting, speed estimation, and classification," IEEE Transactions on Intelligent Transportation Systems, pp. 1–14, 2020.
[8] Y.-Q. Huang, J.-C. Zheng, S.-D. Sun, C.-F. Yang, and J. Liu, "Optimized YOLOv3 algorithm and its application in traffic flow detections," Applied Sciences, vol. 10, Article ID 3079, 2020.
[9] W. Liu, D. Anguelov, D. Erhan et al., "SSD: single shot multibox detector," in Proceedings of Computer Vision—ECCV 2016, pp. 21–37, Amsterdam, Netherlands, October 2016.
[10] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[11] X. Zhang and X. Zhu, "Vehicle detection in the aerial infrared images via an improved YOLOv3 network," in Proceedings of the 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), pp. 372–376, Wuxi, China, July 2019.
[12] A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, "YOLOv4: optimal speed and accuracy of object detection," 2020, https://arxiv.org/abs/2004.10934v1.
[13] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," pp. 1–22, 2018, http://arxiv.org/abs/1804.02767.
[14] D. Biswas, H. Su, C. Wang, A. Stevanovic, and W. Wang, "An automatic traffic density estimation using single shot detection (SSD) and MobileNet-SSD," Physics and Chemistry of the Earth, Parts A/B/C, vol. 110, pp. 176–184, 2019.
[15] W. Yang, Z. Li, C. Wang, and J. Li, "A multi-task Faster R-CNN method for 3D vehicle detection based on a single image," Applied Soft Computing, vol. 95, Article ID 106533, 2020.
[16] X. Hu, Z. Wei, and W. Zhou, "A video streaming vehicle detection algorithm based on YOLOv4," in Proceedings of the 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), pp. 2081–2086, Chongqing, China, March 2021.
[17] C. Y. Cao, J. C. Zheng, Y. Q. Huang, J. Liu, and C. F. Yang, "Investigation of a promoted You Only Look Once algorithm and its application in traffic flow monitoring," Applied Sciences, vol. 9, Article ID 3619, 2019.
[18] H. Zhou, L. Wei, C. P. Lim, D. Creighton, and S. Nahavandi, "Robust vehicle detection in aerial images using bag-of-words and orientation aware scanning," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 12, pp. 7074–7085, 2018.
[19] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944, Honolulu, HI, USA, July 2017.
[20] C.-Y. Chen, Y.-M. Liang, and S.-W. Chen, "Vehicle classification and counting system," in Proceedings of the 2014 International Conference on Audio, Language and Image Processing (ICALIP), pp. 485–490, Shanghai, China, July 2014.
[21] N. Seenouvong, U. Watchareeruetai, C. Nuthong, K. Khongsomboon, and N. Ohnishi, "Vehicle detection and classification system based on virtual detection zone," in Proceedings of the 2016 13th International Joint Conference on Computer Science and Software Engineering (JCSSE), Khon Kaen, Thailand, July 2016.
[22] A. Grents, V. Varkentin, and N. Goryaev, "Determining vehicle speed based on video using convolutional neural network," Transportation Research Procedia, vol. 50, pp. 192–200, 2020.
[23] S. Tabassum, M. S. Ullah, N. H. Al-Nur, and S. Shatabda, "Native vehicles classification on Bangladeshi roads using CNN with transfer learning," in Proceedings of the 2020 IEEE Region 10 Symposium (TENSYMP), Dhaka, Bangladesh, June 2020.
[24] S. Tabassum, S. Ullah, N. H. Al-Nur, and S. Shatabda, "Poribohon-BD: Bangladeshi local vehicle image dataset with annotation for classification," Data in Brief, vol. 33, Article ID 106465, 2020.
[25] LabelImg, https://github.com/tzutalin/labelImg (accessed on 5 March 2018).
[26] A. Nurhadiyatna, B. Hardjono, A. Wibisono et al., "Improved vehicle speed estimation using Gaussian mixture model and hole filling algorithm," in Proceedings of the 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Sanur Bali, Indonesia, September 2013.
[27] A. Ghosh, M. S. Sabuj, H. H. Sonet, S. Shatabda, and D. M. Farid, "An adaptive video-based vehicle detection, classification, counting, and speed-measurement system for real-time traffic data collection," in Proceedings of the 2019 IEEE Region 10 Symposium (TENSYMP), Kolkata, India, June 2019.
[28] L. Wu, J. Ma, Y. Zhao, and H. Liu, "Apple detection in complex scene using the improved YOLOv4 model," Agronomy, vol. 11, no. 3, p. 476, 2021.
[29] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU loss: faster and better learning for bounding box regression," in Proceedings of the 2020 AAAI Conference on Artificial Intelligence, pp. 12993–13000, New York, NY, USA, February 2020.
[30] P. Zinemanas, P. Cancela, and M. Rocamora, "MAVD: a dataset for sound event detection in urban environments," in Proceedings of the 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2019), New York, NY, USA, October 2019.
[31] A. Gomaa, M. M. Abdelwahab, M. Abo-Zahhad, T. Minematsu, and R.-I. Taniguchi, "Robust vehicle detection and counting algorithm employing a convolution neural network and optical flow," Sensors, vol. 19, no. 20, Article ID 4588, 2019.
[32] M. A. Abdelwahab, "Accurate vehicle counting approach based on deep neural networks," in Proceedings of the 2019 International Conference on Innovative Trends in Computer Engineering (ITCE'2019), Aswan, Egypt, February 2019.