A Real-Time Vehicle Counting, Speed Estimation, and Classification System Based On Virtual Detection Zone and YOLO


Hindawi
Mathematical Problems in Engineering
Volume 2021, Article ID 1577614, 10 pages
https://doi.org/10.1155/2021/1577614

Research Article
A Real-Time Vehicle Counting, Speed Estimation, and
Classification System Based on Virtual Detection Zone and YOLO

Cheng-Jian Lin,1,2 Shiou-Yun Jeng,3 and Hong-Wei Liao1


1 Department of Computer Science and Information Engineering, National Chin-Yi University of Technology, Taichung 411, Taiwan
2 College of Intelligence, National Taichung University of Science and Technology, Taichung 404, Taiwan
3 Department of Business Administration, Asia University, Taichung 413, Taiwan

Correspondence should be addressed to Cheng-Jian Lin; [email protected]

Received 7 May 2021; Revised 30 July 2021; Accepted 7 October 2021; Published 2 November 2021

Academic Editor: Teen-Hang Meen

Copyright © 2021 Cheng-Jian Lin et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In recent years, vehicle detection and classification have become essential tasks of intelligent transportation systems, and real-time, accurate vehicle detection from image and video data for traffic monitoring remains challenging. The most noteworthy challenges are operating in real time to accurately locate and classify vehicles in traffic flows and working around total occlusions that hinder vehicle tracking. For real-time traffic monitoring, we present a traffic monitoring approach that overcomes these challenges by employing convolutional neural networks that utilize You Only Look Once (YOLO). A real-time traffic monitoring system has been developed, and it has attracted significant attention from traffic management departments. Digitally processing and analyzing traffic videos in real time is crucial for extracting reliable data on traffic flow. Therefore, this study presents a real-time traffic monitoring system based on a virtual detection zone, a Gaussian mixture model (GMM), and YOLO to increase the vehicle counting and classification efficiency. The GMM and the virtual detection zone are used for vehicle counting, and YOLO is used to classify vehicles. Moreover, the distance and time traveled by a vehicle are used to estimate its speed. In this study, the Montevideo Audio and Video Dataset (MAVD), the GARM Road-Traffic Monitoring data set (GRAM-RTM), and our own collected data sets are used to verify the proposed method. Experimental results indicate that the proposed method with YOLOv4 achieved the highest classification accuracies of 98.91% and 99.5% on the MAVD and GRAM-RTM data sets, respectively. Moreover, the proposed method with YOLOv4 also achieves the highest classification accuracies of 99.1%, 98.6%, and 98% in the daytime, at night, and on a rainy day, respectively. In addition, the average absolute percentage error of vehicle speed estimation with the proposed method is about 7.6%.

1. Introduction

Traffic monitoring with an intelligent transportation system provides solutions to various challenges, such as vehicle counting, speed estimation, accident detection, and assisted traffic surveillance [1-5]. A traffic monitoring system essentially serves as a framework to detect the vehicles that appear in a video image and estimate their position while they remain in the scene. In the case of complex scenes with various vehicle models and high vehicle density, accurately locating and classifying vehicles in traffic flows is difficult [6, 7]. Moreover, limitations occur in vehicle detection due to environmental changes, different vehicle features, and relatively low detection speeds [8]. Therefore, an algorithm must be developed for a real-time traffic monitoring system with the capabilities of real-time computation and accurate vehicle detection; the accurate and quick detection of vehicles from traffic images or videos has both theoretical and practical significance.

With the rapid development of computer vision and artificial intelligence technologies, object detection algorithms based on deep learning have been widely investigated. Such algorithms can extract features automatically through machine learning; thus, they possess a powerful image abstraction ability and an automatic high-level feature representation capability.
A few excellent object detection networks, such as single-shot detection (SSD) [9], Faster R-CNN [10], YOLOv3 [11], and YOLOv4 [12], have been implemented for traffic detection using deep learning object detectors [13]. For example, Biswas et al. [14] implemented SSD to estimate traffic density. Yang et al. [15] proposed a multitasking-capable Faster R-CNN method that uses a single image to generate three-dimensional (3D) space coordinate information for an object with monocular vision to facilitate autonomous driving. Huang et al. [8] proposed a single-stage deep neural network based on YOLOv3 and applied it to data sets generated in different environments to improve its real-time detection accuracy. Hu et al. [16] proposed an improved YOLOv4-based video stream vehicle target detection algorithm to solve the problem of detection speed. The most noteworthy challenges associated with traffic monitoring systems remain real-time operation for accurately locating and classifying vehicles in traffic flows and total occlusions that hinder vehicle tracking. Therefore, YOLO, a regression-based, high-performance algorithm, was adopted for the real-time detection of, and statistics collection from, vehicle flows.

The robustness of YOLOv3 and YOLOv4 in road marking detection improves their accuracy in small target detection. A model based on the TensorFlow framework has been used to enhance the real-time monitoring of traffic-flow problems by an intelligent transportation system [17]. The YOLOv3 network comprises 53 layers. It uses the Feature Pyramid Network for pedestrian detection to handle general multiscale object detection problems and deep residual network (ResNet) ideas to extract image features, achieving a trade-off between detection speed and detection accuracy [18]. In addition to leveraging anchor boxes with predesigned scales and aspect ratios to predict vehicles of different sizes, YOLOv3 and YOLOv4 can realize real-time vehicle detection with a top-down architecture [19]. Moreover, a real-time vehicle detection and classification system can perform foreground extraction, vehicle detection, vehicle feature extraction, and vehicle classification [20]. To test a method for vehicle classification, a vehicle-feature-based virtual detection zone and a virtual detection line, which are predefined for each frame in a video, can be used for vehicle feature computation [21]. Grents et al. [22] proposed a video-based system that uses a convolutional neural network to count vehicles, classify vehicles, and determine the vehicle speed. Tabassum et al. [23, 24] applied YOLO and a transfer learning approach to recognize and classify native vehicles on Bangladeshi roads. Therefore, YOLO can be used to obtain a better matching map.

To address vehicle counting and classification problems in real-time traffic monitoring, this study presents a real-time traffic monitoring system based on a virtual detection zone, a Gaussian mixture model (GMM), and YOLO to increase the vehicle counting and classification efficiency. The GMM and the virtual detection zone are used for vehicle counting, and YOLO is used to classify vehicles. Moreover, the distance and time traveled by a vehicle are used to estimate its speed. The major contributions of this study are as follows: (1) a real-time traffic monitoring system is developed to perform real-time vehicle counting, vehicle speed estimation, and vehicle classification; (2) the virtual detection zone, GMM, and YOLO are used to increase vehicle counting and classification efficiency; (3) the distance and time traveled by a vehicle are used to estimate the vehicle speed; and (4) the MAVD, GRAM-RTM, and our collected data sets are used to verify various methods, with the proposed method with YOLOv4 achieving the highest classification accuracy on all three data sets.

The remainder of this study is organized as follows. Section 2 describes the materials and methods, including data set preparation, the vehicle counting method, and the vehicle classification method. Section 3 presents the results of, and a discussion on, the proposed real-time vehicle counting, speed estimation, and classification system based on a virtual detection zone and YOLO. Finally, Section 4 presents a few concluding remarks and an outline for future research on real-time traffic monitoring.

2. Materials and Methods

To count vehicles from traffic videos, this study proposes a real-time vehicle counting, speed estimation, and classification system based on the virtual detection zone and YOLO. We combined a vehicle detection method with a classification system on the basis of two conditions between the virtual detection zone and the virtual detection lane line. To detect vehicles, a Gaussian mixture model (GMM) is applied to detect moving objects in each frame of a traffic video. Figure 1 shows a flowchart of the vehicle counting and classification process used in the proposed system. First, traffic videos are collected to train the image data and to perform vehicle classification verification. Next, the GMM and the virtual detection zone are used for vehicle counting. Finally, YOLO is used to perform vehicle classification in real time. The three steps are described as follows:

Part 1: Collect traffic videos from online cameras. Traffic videos were collected from online cameras and used for image data training and vehicle classification verification, as described in Section 2.1.

Part 2: Perform vehicle counting using the GMM and the virtual detection zone. To realize real-time vehicle counting, object detection and recognition are performed. A virtual detection lane line and a virtual detection zone are used to perform vehicle counting and speed estimation, as described in Sections 2.2 and 2.4, respectively.

Part 3: Perform vehicle classification and speed estimation using the YOLOv3 and YOLOv4 algorithms.

2.1. Data Set Preparation. The data set used in this study was prepared by collecting traffic videos recorded with online cameras installed along various roads in Taiwan.

Table 1: Vehicle classification.

Class | Vehicle  | Length (m)
1     | Sedan    | 3.6–5.5
2     | Truck    | >5.5–11
3     | Scooter  | 1–2.5
4     | Bus      | 7–12.2
5     | Hlinkcar | 15–18
6     | Flinkcar | 18–20

Figure 1: Flowchart of the vehicle counting and classification process (Start; collect traffic videos from an online camera and train the image data (Part 1); use the GMM and virtual detection zone for vehicle counting (Part 2); use YOLO for vehicle classification (Part 3); estimate the vehicle speed; End).

Image data were extracted from the traffic videos using a script, and labeling was performed using the open-source application LabelImg [25]. According to the common types of vehicles on the road announced by the Directorate General of Highways, Ministry of Transportation and Communications (MOTC) in Taiwan, this study defines six vehicle classes, namely, sedans, trucks, scooters, buses, hlinkcars, and flinkcars, for the training process; the vehicle lengths of these six classes are listed in Table 1. In this study, we used YOLO to perform vehicle classification without using the length of the vehicle.
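For context, LabelImg's YOLO export writes one text file per image, with one line per labeled object in the form "class x_center y_center width height", where the four coordinates are normalized by the image dimensions. A hypothetical annotation for a frame containing a sedan (class 0) and a scooter (class 2) might therefore read:

    0 0.512 0.634 0.210 0.115
    2 0.803 0.471 0.052 0.068

The class indices and values here are illustrative only; they simply follow the order of Table 1.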

2.2. Vehicle Counting. To count vehicles, a GMM is used for background subtraction in complex environments to identify the regions of moving objects. The GMM is reliable in the background extraction and foreground segmentation process, so the characteristics of a moving object in video surveillance are easier to detect [26, 27]. The virtual detection zone is predefined in each video and used for vehicle feature computation. When a vehicle enters the virtual detection zone and crosses the virtual detection lane line, the GMM is used for vehicle counting. The vehicle counting window is depicted in Figure 2.

Figure 2: Object detection window, showing the virtual detection zone and the virtual detection lane line.
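As a concrete illustration of this counting stage, the following is a minimal Python 3 sketch using OpenCV's GMM-based background subtractor; the video path, the zone and lane-line coordinates, and the blob-size threshold are assumptions rather than values from this study:

    import cv2

    cap = cv2.VideoCapture("traffic.mp4")               # hypothetical input video
    subtractor = cv2.createBackgroundSubtractorMOG2()   # GMM background model
    ZONE = (100, 200, 540, 420)   # virtual detection zone (x1, y1, x2, y2), assumed
    LANE_Y = 320                  # y-coordinate of the virtual detection lane line, assumed
    count = 0

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)                  # foreground segmentation
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN,
                                cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5)))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        for c in contours:
            if cv2.contourArea(c) < 500:                # discard small noise blobs
                continue
            x, y, w, h = cv2.boundingRect(c)
            cx, cy = x + w // 2, y + h // 2             # blob centroid
            in_zone = ZONE[0] <= cx <= ZONE[2] and ZONE[1] <= cy <= ZONE[3]
            if in_zone and abs(cy - LANE_Y) < 5:        # centroid reaches the lane line
                count += 1
    print("vehicles counted:", count)

A deployable version would also associate blobs across frames so that one vehicle crossing the lane line increments the count exactly once.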

2.3. Vehicle Detection and Classification. This study uses the YOLO algorithm to classify vehicles into six classes, and a validation method is used for verifying the vehicle classification in the collected videos. A visual classifier based on the YOLO algorithm is used to verify the vehicle classification capability. Figure 3 depicts the architecture of this visual classifier, which classifies each vehicle into one of the six classes (sedan, truck, scooter, bus, hlinkcar, and flinkcar). In the training process, when a vehicle belonging to one of the six classes is detected, all bounding boxes are extracted, their classes are manually labeled, and the labeled data are passed to the YOLO model for classifying the vehicle.

Figure 3: Architecture of the visual classifier based on the YOLO algorithm for verifying the vehicle classification.

Figure 4: YOLOv3 model architecture (Darknet-53 backbone: a 416 × 416 × 3 input is downsampled through 208 × 208, 104 × 104, 52 × 52, 26 × 26, and 13 × 13 feature maps, with three detection outputs of depth (5 + class) × 3).

The YOLOv3 model architecture displayed in Figure 4 was used in this study. Images of size 416 × 416 px were input into the Darknet-53 network. This feature extraction network comprises 53 convolutional layers, and thus, it is called Darknet-53 [11]. In Darknet-53, alternating convolution kernels are used, and after each convolution layer, a batch normalization layer is used for normalization. The leaky rectified linear unit function is used as the activation function, the pooling layer is discarded, and the stride of the convolution kernel is increased to reduce the size of the feature map. The YOLOv3 model uses ResNet ideas for feature extraction and subsequently uses the feature pyramid's top-down and lateral connections to generate three feature maps with sizes of 13 × 13 × 1024, 26 × 26 × 512, and 52 × 52 × 256 px. The final output depth is (5 + class) × 3, which indicates that the following parameters are predicted across three regression bounding boxes: the four basic box parameters and the credibility (objectness) of the box, as well as the probability of each class being contained in the bounding box. YOLOv3 uses the sigmoid function to score each class. When a class score is higher than the threshold, the object is considered to belong to the given category, and any object can simultaneously have multiple class identities without conflict.
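A minimal sketch of this independent per-class sigmoid scoring, using made-up logits for a single predicted box, is shown below; the class order follows Table 1, and the 0.5 threshold is an assumption:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    CLASSES = ["sedan", "truck", "scooter", "bus", "hlinkcar", "flinkcar"]
    logits = np.array([2.3, -1.0, -3.2, 0.4, -2.5, -4.0])   # hypothetical raw scores
    scores = sigmoid(logits)                                # each class scored independently
    labels = [c for c, s in zip(CLASSES, scores) if s > 0.5]
    print(labels)   # ['sedan', 'bus']: several labels can coexist, unlike with softmax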
The loss function of YOLOv3 is mainly divided into four parts. A denotes the loss of the identified center coordinates, used to predict (x, y) of the bounding box, and it is only valid for the highest-confidence predicted target. B is the loss of the predicted width and height (w, h); so that the error reflects bounding boxes of different sizes fairly, the square roots of the width and height are predicted instead of the width and height directly. C is the loss of the predicted object category: treating each grid cell as a unit, if the center of a detected object falls in a cell, the cell is marked with a bounding box (x, y, w, h) together with the category information indicating which object in the image the cell predicts. D denotes the loss of the credibility (confidence) of the predicted object, calculated for each bounding box to indicate when the bounding box predicts an object; when no object is predicted, the confidence prediction is penalized with λ_noobj = 0.5. The loss is defined as follows:

\begin{aligned}
\text{Loss} ={}& \underbrace{\lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \ell_{ij}^{\text{obj}} \left[ \left(x_i - \hat{x}_i\right)^2 + \left(y_i - \hat{y}_i\right)^2 \right]}_{A} \\
&+ \underbrace{\lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \ell_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right]}_{B} \\
&+ \underbrace{\sum_{i=0}^{S^2} \ell_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2}_{C} \\
&+ \underbrace{\sum_{i=0}^{S^2} \sum_{j=0}^{B} \ell_{ij}^{\text{obj}} \left(C_i - \hat{C}_i\right)^2 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \ell_{ij}^{\text{noobj}} \left(C_i - \hat{C}_i\right)^2}_{D},
\end{aligned} \tag{1}

where (x_i, y_i) is the location of the centroid of the anchor box and (w_i, h_i) are its width and height. C_i is the objectness, i.e., the confidence score of whether there is an object or not, and p_i(c) is the class probability used in the classification loss.
YOLOv4 is the latest algorithm of the YOLO series. Built on the basis of YOLOv3, it scales both up and down and is applicable to small and large networks while maintaining optimal speed and accuracy; the network architecture is shown in Figure 5. Compared with YOLOv3, YOLOv4-tiny is an extended version in which the original Darknet-53 network is augmented with a cross stage partial (CSP) network. The backbone is CSPOSANet, proposed as the Cross Stage Partial Network (CSPNet) plus the One-Shot Aggregation Network (OSANet), together with Partial in Computational Blocks (PCB) technology. CSPNet can be applied to different CNN architectures to reduce the number of parameters and the amount of calculation while improving accuracy. OSANet is derived from the OSA module in VoVNet, whose central idea improves on the DenseNet module: at the end, all layers are connected so that the input is consistent with the number of output channels. PCB technology makes the model more flexible because the structure can be adjusted to achieve the best accuracy-speed balance.

Figure 5: YOLOv4 model architecture (CSPDarkNet53 backbone with an SPP + PAN neck and three YOLO heads producing 19 × 19, 38 × 38, and 76 × 76 outputs of depth (5 + class) × 3).

The loss function remains the same as that of the YOLOv4 model and consists of three parts: classification loss, regression loss, and confidence loss [28]. The classification loss and confidence loss remain the same as in the YOLOv3 model, but the complete intersection over union (CIoU) is used in place of the mean-squared error (MSE) to optimize the regression loss [29]. The loss function with the CIoU term is as follows:

\begin{aligned}
\text{LOSS} ={}& 1 - \text{IoU} + \frac{\rho^2\left(b, b^{gt}\right)}{c^2} + \alpha \upsilon \\
&- \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{\text{obj}} \left[ \hat{C}_i \log C_i + \left(1 - \hat{C}_i\right) \log\left(1 - C_i\right) \right] \\
&- \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{\text{noobj}} \left[ \hat{C}_i \log C_i + \left(1 - \hat{C}_i\right) \log\left(1 - C_i\right) \right] \\
&- \sum_{i=0}^{S^2} I_{ij}^{\text{obj}} \sum_{c \in \text{classes}} \left[ \hat{p}_i(c) \log p_i(c) + \left(1 - \hat{p}_i(c)\right) \log\left(1 - p_i(c)\right) \right],
\end{aligned} \tag{2}

where S² represents the S × S grid; each grid cell generates B candidate boxes, and each candidate box obtains its corresponding bounding box through the network, so that S × S × B bounding boxes are formed in total. If there is no object (noobj) in a box, only the confidence loss of the box is calculated. The confidence loss function uses the cross-entropy error and is divided into two parts: object (obj) and no object (noobj). The noobj loss is scaled by the weight coefficient λ to reduce the contribution of the noobj part. The classification loss function also uses the cross-entropy error. When the j-th anchor box of the i-th grid cell is responsible for a certain ground truth, the bounding box generated by this anchor box contributes to the classification loss.
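To make the CIoU regression term of equation (2) concrete (the 1 − IoU + ρ²(b, b^gt)/c² + αυ part), the following stand-alone Python sketch follows the definitions in [29] for axis-aligned boxes given as (center x, center y, width, height); it is an illustration, not the implementation used in this study:

    import math

    def ciou_loss(box, gt):
        cx, cy, w, h = box
        gx, gy, gw, gh = gt
        # IoU of the two boxes
        ix1, iy1 = max(cx - w / 2, gx - gw / 2), max(cy - h / 2, gy - gh / 2)
        ix2, iy2 = min(cx + w / 2, gx + gw / 2), min(cy + h / 2, gy + gh / 2)
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        iou = inter / (w * h + gw * gh - inter)
        # rho^2: squared distance between the box centers
        rho2 = (cx - gx) ** 2 + (cy - gy) ** 2
        # c^2: squared diagonal of the smallest box enclosing both boxes
        ex1, ey1 = min(cx - w / 2, gx - gw / 2), min(cy - h / 2, gy - gh / 2)
        ex2, ey2 = max(cx + w / 2, gx + gw / 2), max(cy + h / 2, gy + gh / 2)
        c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
        # upsilon: aspect-ratio consistency term; alpha: its trade-off weight
        v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(w / h)) ** 2
        alpha = v / (1 - iou + v)
        return 1 - iou + rho2 / c2 + alpha * v

    print(ciou_loss((50, 50, 20, 10), (55, 52, 22, 11)))   # small value for well-matched boxes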

2.4. Speed Estimation. The real-time vehicle speed is also calculated in this study. Figure 6 shows the video images taken along the direction parallel to the length of the car (defined as the y-axis) and parallel to the width of the car (defined as the x-axis). First, as the scale reference in the video, the yellow line (referred to as L) in the red circle has a length of 4 m in accordance with traffic laws. A GMM is used to draw a virtual detection zone (blue box) on the road to be tested (referred to as Q). The green box is the car frame (referred to as C), and the midpoint of the car is C_t.

Figure 6: Diagram of speed estimation ((a) real picture from the video; (b) control scale).

u_0 = \frac{L_{AB}(\text{px})}{L_{AB}(\text{m})}, \qquad \alpha = \frac{u_0}{u_0'}, \tag{3}

\text{If } \alpha > 1: \quad u_i = u_0 \alpha^i, \quad u_j = \frac{u_0}{\alpha^j}, \tag{4}

where u_0 is the scale, i indexes the scale of the blue box, j indexes the scale of the green box, px is the length in the video (in pixels), and m is the actual length (in meters). The parameter α denotes the increase or decrease of the scale per unit length along the y-axis. If α > 1, the speed calculation is performed using equation (4).

To calculate the line segment through C_t parallel to L (referred to as L*), the algorithm computes the distance y along L* between A and B. It then restores y, through its scale relationship, to the actual line segment length x, where x denotes the distance traveled by the vehicle in Q:

x = y_0 u_0 + \sum_{i=1}^{p} y_i u_i + \sum_{j=1}^{p} y_j u_j. \tag{5}

In the calculation process, the program determines the frame rate of the video and calculates the number of frames for which the vehicle travels in Q (referred to as p). Equation (6) gives the travel time of the vehicle from A to B in Q, equation (7) gives the vehicle speed in m/s, and, after unit conversion (m/s to km/h), equation (8) gives the final vehicle speed:

t = \frac{p}{\text{fps}}, \tag{6}

v = \frac{x}{t}, \tag{7}

v' = x \times \frac{3.6}{t}. \tag{8}
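As a worked illustration of equations (6) through (8), the speed computation reduces to a few lines once the recovered distance x and the frame count p are known; the sample values below are hypothetical (a Python 3 sketch):

    def vehicle_speed_kmh(distance_m, frames_in_zone, fps=30):
        t = frames_in_zone / fps      # equation (6): travel time in seconds
        v = distance_m / t            # equation (7): speed in m/s
        return v * 3.6                # equation (8): conversion to km/h

    # A vehicle covering 25 m of the detection zone in 30 frames at 30 fps:
    print(vehicle_speed_kmh(25.0, 30))   # 90.0 km/h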

3. Results and Discussion

All experiments in this study were performed using the YOLO algorithm under the Darknet framework, and the program was written in Python 2.7. To validate the real-time traffic monitoring system, we used real-world data to perform vehicle detection, vehicle counting, speed estimation, and classification. Three test data sets were used to evaluate the proposed method. The first was derived mainly from traffic video images of online cameras on various roads in Taiwan and contains 12,761 training images and 3,190 testing images. The second was the Montevideo Audio and Video Dataset (MAVD), which contains data on different levels of traffic activity and social use characteristics in Montevideo city, Uruguay [30]. The third, the GARM Road-Traffic Monitoring (GRAM-RTM) data set [21], has four categories (i.e., cars, trucks, vans, and big-trucks); the total number of different objects in each sequence is 256 for M-30, 235 for M-30-HD, and 237 for Urban 1. In this study, the definition of accuracy is based on the classification of vehicles in the database: in the video verification, if the manual classification and the classification by the proposed system agree for a vehicle, the count is correct; otherwise, it is recorded as a wrong vehicle count.

3.1. Vehicle Counting. Seven input videos of the road, each between 3 and 5 minutes long, were recorded at 10 am and 8 pm. In addition, eleven input videos of the road in the rain were recorded for testing. Each frame in these traffic videos was captured at 30 fps. The first experimental results, for real-time vehicle counting using the proposed method during the day, are summarized in Table 2; the symbols S and L denote small and large vehicles, respectively. The vehicle counting accuracy of the proposed method at 10 am was 95.5%. The second experimental results, for counting during the night, are summarized in Table 3; the vehicle counting accuracy at 8 pm was 98.5%. The third experimental results, for counting in the rain, are summarized in Table 4; the vehicle counting accuracy was 94%. Screenshots of vehicle detection with the proposed real-time vehicle counting and classification system are depicted in Figure 7, where the detected vehicles are represented as green rectangles.

Table 2: Real-time vehicle counting using the proposed method in the daytime.

Video no. | Actual (S / L / Total) | Estimated (S / L / Total)
1 | 31 / 7 / 38 | 29 / 6 / 35
2 | 18 / 0 / 18 | 18 / 0 / 18
3 | 16 / 5 / 21 | 16 / 5 / 21
4 | 28 / 0 / 28 | 25 / 0 / 25
5 | 22 / 3 / 25 | 20 / 3 / 23
6 | 11 / 0 / 11 | 11 / 0 / 11
7 | 11 / 1 / 12 | 9 / 0 / 9

Table 3: Real-time vehicle counting using the proposed method in the night time.

Video no. | Actual (S / L / Total) | Estimated (S / L / Total)
1 | 23 / 8 / 31 | 23 / 7 / 30
2 | 31 / 5 / 36 | 29 / 5 / 34
3 | 15 / 2 / 17 | 14 / 1 / 15
4 | 10 / 3 / 13 | 9 / 3 / 12
5 | 19 / 4 / 23 | 19 / 4 / 23
6 | 11 / 2 / 13 | 11 / 2 / 13
7 | 35 / 5 / 38 | 34 / 3 / 37

Table 4: Real-time vehicle counting using the proposed method on a rainy day.

Video no. | Actual (S / L / Total) | Estimated (S / L / Total)
1 | 9 / 0 / 9 | 9 / 0 / 9
2 | 12 / 0 / 12 | 10 / 1 / 11
3 | 13 / 0 / 13 | 13 / 0 / 13
4 | 7 / 0 / 7 | 8 / 0 / 8
5 | 11 / 1 / 12 | 14 / 1 / 15
6 | 15 / 0 / 15 | 17 / 0 / 17
7 | 10 / 0 / 10 | 13 / 0 / 13
8 | 7 / 0 / 7 | 10 / 0 / 10
9 | 12 / 0 / 12 | 15 / 0 / 15
10 | 12 / 0 / 12 | 14 / 0 / 14
11 | 17 / 0 / 17 | 19 / 0 / 19

Vehicle counting in online videos can be delayed by network stoppages, and the target vehicle may be blocked by other vehicles on the screen, causing a count to be missed. In addition, poor lighting in the rain and at night affects the vehicle recognition capabilities of YOLOv3 and YOLOv4. These challenges can be addressed by using a stable network connection and by adjusting the camera brightness, respectively. The novelty of this study thus includes addressing the problem of unclear recognition in the rain.

3.2. Speed Estimation. In this subsection, the vehicle speed is estimated using the proposed method. Table 5 lists the actual and the estimated speeds of the vehicles. The results indicate that the average absolute percentage error of vehicle speed estimation was about 7.6%. Using online video for vehicle speed estimation can cause large speed errors due to network delays; network stability is therefore essential to reduce the percentage error of the speed estimation.
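The 7.6% figure is the mean of the per-vehicle absolute percentage errors in Table 5. As a quick check, the first three (actual, estimated) speed pairs from Table 5 reproduce the tabulated per-vehicle errors (a Python 3 sketch):

    pairs = [(60, 63), (70, 75), (72, 63)]            # km/h, from Table 5
    errors = [abs(est - act) * 100.0 / act for act, est in pairs]
    print([round(e, 1) for e in errors])              # [5.0, 7.1, 12.5]
    print(round(sum(errors) / len(errors), 1))        # 8.2 for this subset; 7.6 over all 17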
3.3. Comparison Results Using the MAVD and GRAM-RTM Data Sets. The MAVD traffic data set [30] and the GARM Road-Traffic Monitoring (GRAM-RTM) data set [21] were used for evaluating the vehicle counting performance of the proposed method. The videos were recorded with a GoPro Hero 3 camera at a frame rate of 30 fps and a resolution of 1920 × 1080 px. We analyzed 10 videos, and the vehicle counting accuracy of the proposed method at 10 am for the MAVD traffic data set was 93.84%. Vehicle classification results of the proposed method using the MAVD traffic data set are listed in Table 6.

In summary, three data sets, namely, MAVD, GRAM-RTM, and our collected data sets, were used to verify the proposed method and the Faster RCNN method [10]. The MAVD training and testing samples contain vehicles belonging to four categories (i.e., cars, buses, motorcycles, and trucks). The GRAM-RTM data set has four categories (i.e., cars, trucks, vans, and big-trucks). The total number of different objects in each sequence is as follows: 256 for M-30, 235 for M-30-HD, and 237 for Urban 1. Table 7 shows the classification accuracy results for the three data sets using various methods.
Figure 7: Screenshots from the proposed real-time vehicle counting, speed estimation, and classification system.

Table 5: The actual and the estimated vehicle speeds (km/h) using the proposed method.

Vehicle ID | Actual speed | Estimated speed | Difference | Error (%)
1 | 60 | 63 | 3 | 5
2 | 70 | 75 | 5 | 7
3 | 72 | 63 | −9 | 12.5
4 | 99 | 100 | 1 | 1
5 | 84 | 85 | 1 | 1
6 | 67 | 60 | −7 | 10
7 | 73 | 71 | −2 | 2.7
8 | 67 | 64 | −3 | 4.4
9 | 37 | 43 | 6 | 16
10 | 73 | 77 | 4 | 5
11 | 55 | 50 | −5 | 9
12 | 48 | 54 | 6 | 12.5
13 | 111 | 127 | 16 | 14.4
14 | 79 | 75 | −4 | 5
15 | 69 | 71 | 2 | 2.8
16 | 82 | 75 | −7 | 8.5
17 | 83 | 73 | −10 | 12
Average error: 7.6

Table 6: Vehicle classification results of the proposed method using the MAVD traffic data set.

Video no. | Total number of vehicles (S / L / Total) | Number of counted vehicles (S / L / Total)
1 | 8 / 0 / 8 | 8 / 0 / 8
2 | 5 / 0 / 5 | 4 / 0 / 4
3 | 4 / 2 / 6 | 3 / 2 / 5
4 | 4 / 0 / 4 | 3 / 0 / 3
5 | 3 / 0 / 3 | 3 / 1 / 4
6 | 1 / 0 / 1 | 1 / 0 / 1
7 | 7 / 2 / 9 | 7 / 2 / 9
8 | 11 / 0 / 11 | 10 / 0 / 10
9 | 9 / 0 / 9 | 9 / 0 / 9
10 | 9 / 0 / 9 | 8 / 0 / 8

In Table 7, the proposed method with YOLOv4 achieves the highest classification accuracy: 98.91% and 99.5% on the MAVD and GRAM-RTM data sets, respectively. Moreover, three different environments (i.e., daytime, night time, and rainy day) are used to verify the proposed method, and the experimental results indicate that the proposed method with YOLOv4 also achieves the highest classification accuracy in each environment: 99.1%, 98.6%, and 98% in the daytime, at night, and on a rainy day, respectively.

Recently, researchers have adopted various methods for vehicle classification using the GRAM-RTM data set, such as Faster RCNN [10], CNN [31], and DNN [32]. We therefore use the same GRAM-RTM data set to compare the proposed method with these methods. Table 8 shows the comparison results: the proposed method with YOLOv4 performs better than the other methods.
other methods.
Table 7: Classification accuracy results for the three data sets using various methods.

Data set | Method | Accuracy (%) | FPS
MAVD | Faster RCNN [10] | 97.21 | 5
MAVD | Proposed method with YOLOv3 | 97.66 | 15
MAVD | Proposed method with YOLOv4 | 98.91 | 15
GRAM-RTM | Faster RCNN [10] | 91.54 | 5
GRAM-RTM | Proposed method with YOLOv3 | 98.02 | 15
GRAM-RTM | Proposed method with YOLOv4 | 99.5 | 15
Our data set (daytime) | Faster RCNN [10] | 97.7 | 5
Our data set (daytime) | Proposed method with YOLOv3 | 98 | 15
Our data set (daytime) | Proposed method with YOLOv4 | 99.1 | 15
Our data set (night time) | Faster RCNN [10] | 93.59 | 5
Our data set (night time) | Proposed method with YOLOv3 | 98 | 15
Our data set (night time) | Proposed method with YOLOv4 | 98.6 | 15
Our data set (rainy day) | Faster RCNN [10] | 87.5 | 5
Our data set (rainy day) | Proposed method with YOLOv3 | 90 | 15
Our data set (rainy day) | Proposed method with YOLOv4 | 98 | 15

Table 8: Classification accuracy results of various methods using the GRAM-RTM data set.

Method | Accuracy (%)
Faster RCNN [10] | 91.54
Gomaa et al. [31] | 96.8
Abdelwahab [32] | 93.51
Proposed method with YOLOv3 | 98.02
Proposed method with YOLOv4 | 99.5

4. Conclusions

In this study, a real-time traffic monitoring system based on a virtual detection zone, GMM, and YOLO is proposed for increasing the vehicle counting and classification efficiency. The GMM and the virtual detection zone are used for vehicle counting, and YOLO is used to classify vehicles. Moreover, the distance and time traveled by a vehicle are used to estimate its speed. The MAVD, GRAM-RTM, and our collected data sets are used to verify the proposed method. Experimental results indicate that the proposed method with YOLOv4 achieved the highest classification accuracies of 98.91% and 99.5% on the MAVD and GRAM-RTM data sets, respectively. Moreover, the proposed method with YOLOv4 also achieves the highest classification accuracies of 99.1%, 98.6%, and 98% in the daytime, at night, and on a rainy day, respectively. In addition, the average absolute percentage error of vehicle speed estimation with the proposed method is about 7.6%. Therefore, the proposed method can be applied to vehicle counting, speed estimation, and classification in real time.

However, the proposed method has a few limitations. The vehicles appearing in the video are assumed to be inside the virtual detection zone; thus, the width of the virtual detection zone should be sufficiently large for counting the vehicles. In future work, we will focus on algorithm acceleration and model simplification.

Data Availability

The MAVD and GRAM-RTM traffic data sets are available at https://zenodo.org/record/3338727#.YBD8B-gzY2w and https://gram.web.uah.es/data/datasets/rtm/index.html.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

Acknowledgments

This research was funded by the Ministry of Science and Technology of the Republic of China, grant number MOST 110-2221-E-167-031-MY2.

References

[1] Y. Mo, G. Han, H. Zhang, X. Xu, and W. Qu, "Highlight-assisted nighttime vehicle detection using a multi-level fusion network and label hierarchy," Neurocomputing, vol. 355, pp. 13–23, 2019.
[2] D. Feng, C. Haase-Schuetz, L. Rosenbaum et al., "Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges," IEEE Transactions on Intelligent Transportation Systems, vol. 22, no. 3, 2019.
[3] Z. Liu, Y. Cai, H. Wang et al., "Robust target recognition and tracking of self-driving cars with radar and camera information fusion under severe weather conditions," IEEE Transactions on Intelligent Transportation Systems, 2021.
[4] Y. Qian, J. M. Dolan, and M. Yang, "DLT-NET: joint detection of drivable areas, lane lines, and traffic objects," IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 11, pp. 4670–4679, 2020.
[5] Y. Cai, L. Dai, H. Wang et al., "Pedestrian motion trajectory prediction in intelligent driving from far shot first-person perspective video," IEEE Transactions on Intelligent Transportation Systems, pp. 1–16, 2021.
[6] Á. Llamazares, E. J. Molinos, and M. Ocaña, "Detection and tracking of moving obstacles (DATMO): a review," Robotica, vol. 38, no. 5, pp. 761–774, 2020.
[7] C. Liu, D. Q. Huynh, Y. Sun, M. Reynolds, and S. Atkinson, "A vision-based pipeline for vehicle counting, speed estimation, and classification," IEEE Transactions on Intelligent Transportation Systems, pp. 1–14, 2020.
[8] Y.-Q. Huang, J.-C. Zheng, S.-D. Sun, C.-F. Yang, and J. Liu, "Optimized YOLOv3 algorithm and its application in traffic flow detections," Applied Sciences, vol. 10, Article ID 3079, 2020.
[9] W. Liu, D. Anguelov, D. Erhan et al., "SSD: single shot multibox detector," in Proceedings of Computer Vision - ECCV 2016, pp. 21–37, Amsterdam, Netherlands, October 2016.
[10] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[11] X. Zhang and X. Zhu, "Vehicle detection in the aerial infrared images via an improved YOLOv3 network," in Proceedings of the 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), pp. 372–376, Wuxi, China, July 2019.
[12] A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, "YOLOv4: optimal speed and accuracy of object detection," 2020, https://arxiv.org/abs/2004.10934v1.
[13] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," pp. 1–22, 2018, https://arxiv.org/abs/1804.02767.
[14] D. Biswas, H. Su, C. Wang, A. Stevanovic, and W. Wang, "An automatic traffic density estimation using single shot detection (SSD) and MobileNet-SSD," Physics and Chemistry of the Earth, Parts A/B/C, vol. 110, pp. 176–184, 2019.
[15] W. Yang, Z. Li, C. Wang, and J. Li, "A multi-task Faster R-CNN method for 3D vehicle detection based on a single image," Applied Soft Computing, vol. 95, Article ID 106533, 2020.
[16] X. Hu, Z. Wei, and W. Zhou, "A video streaming vehicle detection algorithm based on YOLOv4," in Proceedings of the 2021 IEEE 5th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), pp. 2081–2086, Chongqing, China, March 2021.
[17] C. Y. Cao, J. C. Zheng, Y. Q. Huang, J. Liu, and C. F. Yang, "Investigation of a promoted You Only Look Once algorithm and its application in traffic flow monitoring," Applied Sciences, vol. 9, Article ID 3619, 2019.
[18] H. Zhou, L. Wei, C. P. Lim, D. Creighton, and S. Nahavandi, "Robust vehicle detection in aerial images using bag-of-words and orientation aware scanning," IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 12, pp. 7074–7085, 2018.
[19] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944, Honolulu, HI, USA, July 2017.
[20] C.-Y. Chen, Y.-M. Liang, and S.-W. Chen, "Vehicle classification and counting system," in Proceedings of the 2014 International Conference on Audio, Language and Image Processing (ICALIP), pp. 485–490, Shanghai, China, July 2014.
[21] N. Seenouvong, U. Watchareeruetai, C. Nuthong, K. Khongsomboon, and N. Ohnishi, "Vehicle detection and classification system based on virtual detection zone," in Proceedings of the 2016 13th International Joint Conference on Computer Science and Software Engineering (JCSSE), Khon Kaen, Thailand, July 2016.
[22] A. Grents, V. Varkentin, and N. Goryaev, "Determining vehicle speed based on video using convolutional neural network," Transportation Research Procedia, vol. 50, pp. 192–200, 2020.
[23] S. Tabassum, M. S. Ullah, N. H. Al-Nur, and S. Shatabda, "Native vehicles classification on Bangladeshi roads using CNN with transfer learning," in Proceedings of the 2020 IEEE Region 10 Symposium (TENSYMP), Dhaka, Bangladesh, June 2020.
[24] S. Tabassum, S. Ullah, N. H. Al-nur, and S. Shatabda, "Poribohon-BD: Bangladeshi local vehicle image dataset with annotation for classification," Data in Brief, vol. 33, Article ID 106465, 2020.
[25] LabelImg, https://github.com/tzutalin/labelImg (accessed on 5 March 2018).
[26] A. Nurhadiyatna, B. Hardjono, A. Wibisono et al., "Improved vehicle speed estimation using Gaussian mixture model and hole filling algorithm," in Proceedings of the 2013 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Sanur Bali, Indonesia, September 2013.
[27] A. Ghosh, M. S. Sabuj, H. H. Sonet, S. Shatabda, and D. M. Farid, "An adaptive video-based vehicle detection, classification, counting, and speed-measurement system for real-time traffic data collection," in Proceedings of the 2019 IEEE Region 10 Symposium (TENSYMP), Kolkata, India, June 2019.
[28] L. Wu, J. Ma, Y. Zhao, and H. Liu, "Apple detection in complex scene using the improved YOLOv4 model," Agronomy, vol. 11, no. 3, p. 476, 2021.
[29] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU loss: faster and better learning for bounding box regression," in Proceedings of the 2020 AAAI Conference on Artificial Intelligence, pp. 12993–13000, New York, NY, USA, February 2020.
[30] P. Zinemanas, P. Cancela, and M. Rocamora, "MAVD: a dataset for sound event detection in urban environments," in Proceedings of the 4th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2019), New York, NY, USA, October 2019.
[31] A. Gomaa, M. M. Abdelwahab, M. Abo-Zahhad, T. Minematsu, and R.-I. Taniguchi, "Robust vehicle detection and counting algorithm employing a convolution neural network and optical flow," Sensors, vol. 19, no. 20, Article ID 4588, 2019.
[32] M. A. Abdelwahab, "Accurate vehicle counting approach based on deep neural networks," in Proceedings of the 2019 International Conference on Innovative Trends in Computer Engineering (ITCE'2019), Aswan, Egypt, February 2019.