Research Article
A Real-Time Object Detector for Autonomous Vehicles
Based on YOLOv4
Rui Wang,1 Ziyue Wang,1 Zhengwei Xu,2 Chi Wang,1 Qiang Li,1 Yuxin Zhang,1 and Hua Li1
1Changchun University of Science and Technology, School of Computer Science and Technology, Changchun, Jilin 130022, China
2Chengdu University of Technology, Department of Geophysics, Chengdu, Sichuan 610059, China
Received 21 October 2021; Revised 25 November 2021; Accepted 26 November 2021; Published 10 December 2021
Copyright © 2021 Rui Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Object detection is an important part of autonomous driving technology. To ensure the safe running of vehicles at high speed, real-time and accurate detection of all objects on the road is required. How to balance the speed and accuracy of detection has been a hot research topic in recent years. This paper puts forward a one-stage object detection algorithm based on YOLOv4, which improves the detection accuracy and supports real-time operation. The backbone of the algorithm doubles the stacking times of the last residual block of CSPDarkNet53. The neck of the algorithm replaces the SPP with the RFB structure, improves the PAN structure of the feature fusion module, and adds the attention mechanisms CBAM and CA to the backbone and neck; finally, the overall width of the network is reduced to 3/4 of the original, so as to reduce the model parameters and improve the inference speed. Compared with YOLOv4, the algorithm in this paper improves the mean average precision on the KITTI dataset by 2.06% and on the BDD dataset by 2.95%. When the detection accuracy is almost unchanged, the inference speed of this algorithm is increased by 9.14%, and it can detect in real time at more than 58.47 FPS.
generates region proposals in the first stage and performs bbox regression and object classification prediction on these regions in the second stage, e.g., R-CNN [8], Fast R-CNN [9], Faster R-CNN [10], and R-FCN [11]. Two-stage algorithms usually have a high accuracy but a relatively slow detection speed. One-stage algorithms, such as SSD [12] and YOLO [13], perform classification and regression in just one stage. These methods generally have a lower accuracy but a high detection speed. In recent years, object detectors combining various optimization methods have been widely studied [14–18] in order to take advantage of both types of method. MS-CNN [14], a two-stage object detection algorithm, improves detection speed through a series of intermediate layers. RFBNet [18], a one-stage algorithm, proposes receptive field blocks that expand the receptive field to improve accuracy. However, previous studies [14–17] can no longer keep the detection speed above 30 fps, one of the prerequisites for autonomous driving, when the input resolution reaches 512 × 512 or higher. This indicates that the previous schemes are incomplete in terms of the trade-off between accuracy and speed and are therefore difficult to apply in the field of autonomous driving.

The problem with most object detection algorithms is that large objects are easily detected, while small objects are often ignored by the detector. It is extremely dangerous to miss pedestrians, traffic lights, and traffic signs in autonomous driving. In recent years, many feature fusion algorithms for small object detection have been proposed [19–22]. Kaiming He proposed SPPNet [19] in 2014 to extract features from regions of any aspect ratio, which provided an idea for detection algorithms such as YOLOv3 [23] and YOLOv4 [24]. FPN [20] is a multiscale feature fusion network structure; it combines high-level semantic features and low-level location features to effectively improve the detection accuracy of small targets. PANet [21] is an improved version of FPN, which adopts a top-down and bottom-up transmission mode to eliminate the problem of information loss from the bottom features to the high features. ASFF [22] is a novel feature fusion strategy, which reduces the conflict and inconsistency between different feature layers through adaptive spatial feature fusion and improves the effectiveness of the feature pyramid.

In addition, some researchers [25, 26] try to add P6 and P7 detection layers after P5, which has a 32-times downsampling rate, to improve the detection accuracy of small objects, but this brings a huge computational cost and speed loss. The YOLO series of algorithms [13, 23, 24, 27] is among the faster one-stage algorithms, especially YOLOv4. It improves on the low accuracy of YOLO [13], YOLOv2 [27], and YOLOv3 by combining the advantages of a large number of excellent models and adding a large number of training tricks. However, both YOLOv4 and the previous algorithms are trained and optimized for MS-COCO [28], which requires a large number of categories to be detected and whose context is highly variable. These models are therefore suboptimal when applied to the field of autonomous driving. Accordingly, this paper proposes a new method that improves the accuracy of the model by embedding the RFB module [18] into the backbone network, optimizing the PAN, and adding the attention modules CBAM [29] and CA [33], and that reduces the computation and improves the real-time performance by scaling the width of the network.

2. Related Work

YOLO [13] differs from the two-stage algorithms that use region proposals to obtain regions of interest. Instead, it detects objects by segmenting the image into grid cells. Its output layer information includes bbox coordinates, confidence, and classification scores. Therefore, it can detect multiple objects in a single stage, and its speed is much faster than that of two-stage algorithms. However, because it predicts coordinates directly rather than based on anchors, it is difficult for it to detect small objects. YOLOv2 [27] adds a BN layer after each convolution layer, applies anchor-based bbox prediction and multiscale training, and uses a passthrough layer to fuse fine-grained features, which improves the accuracy compared with YOLO. YOLOv3 [23] goes further: its backbone DarkNet53 applies residual connections to solve the problem of vanishing gradients in deep networks; FPN feature fusion retains the fine-grained features of small objects; and multiscale prediction lets the network detect objects of different sizes. It shows a more obvious improvement compared with YOLO and YOLOv2. The structure of YOLOv4 [24] is shown in Figure 1. On the basis of YOLOv3, it tries a large number of excellent methods and training tricks from recent years. The backbone CSPDarkNet53 is DarkNet53 integrated with the CSP structure [31]. The SPP module [19] after the backbone significantly increases the receptive field but hardly affects the inference speed. The repeated feature extraction of the PAN [21] structure alleviates the serious information loss that occurs when bottom information is transferred to the top in FPN. As with YOLOv3, prediction is carried out on three different scales to detect objects of different sizes. The inference speed of YOLOv4 is faster than that of YOLO and YOLOv2 because it only consists of 1 × 1 and 3 × 3 small convolution layers. The parameters of the backbone with the CSP structure are greatly reduced, and the information exchange between layers is greatly improved. Therefore, the inference speed and accuracy are better than those of YOLOv3, and it can also satisfy the high real-time requirement of an autonomous driving system. However, generally speaking, its accuracy is still lower than that of two-stage algorithms, and it is not optimized for the many small objects in autonomous driving scenes. To make up for this, we use YOLOv4, which has a lower complexity than two-stage algorithms, and improve its accuracy and speed through additional methods, so as to design a more efficient detector for autonomous driving.

Figure 1: YOLOv4 structure.
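As a concrete illustration of this output layout (a sketch under assumed sizes, not the authors' code), the snippet below splits a YOLO-style prediction tensor with na × (4 + 1 + nc) channels per grid cell into box, objectness, and class terms; the anchor count, class count, and grid size used here are assumptions.

```python
# Illustrative sketch of a YOLO-style head output layout; na, nc, and the grid
# size are assumed values, not the configuration used in this paper.
import torch

na, nc = 3, 3                      # anchors per scale, number of classes (assumed)
grid = 13                          # grid size of one detection scale (assumed)
pred = torch.randn(1, na * (4 + 1 + nc), grid, grid)   # raw head output

# Reshape so the last dimension holds [tx, ty, tw, th, objectness, class scores]
pred = pred.view(1, na, 4 + 1 + nc, grid, grid).permute(0, 1, 3, 4, 2)
box = pred[..., :4]                  # bbox regression terms
obj = pred[..., 4:5].sigmoid()       # confidence
cls = pred[..., 5:].sigmoid()        # classification scores
print(box.shape, obj.shape, cls.shape)   # torch.Size([1, 3, 13, 13, 4]) ...
```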
Since SENet [32] shone in the last ImageNet classification competition in 2017, plug-and-play attention modules, which can be directly applied to existing neural networks because of their flexibility, have become popular in computer vision tasks. CBAM [29] takes into account the location information ignored by the SE module: it reduces the number of channels and uses a large-kernel convolution to exploit location information, which gives it better interpretability than the SE module. CA [33] is a newly proposed attention module. In order to alleviate the loss of location information caused by 2D global pooling, it decomposes channel attention into two parallel 1D feature encoding processes, so that location information is effectively embedded into channel attention.

Traditional object detection algorithms usually use mean square error (MSE, L2) or smooth L1 [9] loss to regress the center point coordinates and the width and height of the bbox directly, i.e., {x_center, y_center, w, h}, or the upper left and lower right corners, i.e., {x_top left, y_top left, x_bottom right, y_bottom right}. Anchor-based object detection algorithms instead regress the offsets, that is, {x_offset, y_offset, w_offset, h_offset}. But regressing the bbox directly treats the four bbox values as independent variables, without considering the correlation between them, and during training it is biased toward large objects, because the loss of small objects is inherently small. Therefore, to better deal with this problem, IoU loss [34] was proposed to treat the bbox as a whole in regression and take the ground truth (GT) into account. IoU has scale invariance, so it solves the problem that the loss grows with scale in direct regression. With continuous improvement by researchers, GIoU loss [30] was then proposed. In addition to IoU, GIoU loss also considers the shape and direction of the object, solving the problem that IoU loss cannot reflect the degree of overlap and returns no gradient when the IoU is zero. DIoU loss [35] replaces the GIoU penalty term of maximizing the overlap area with the minimum circumscribed rectangle by minimizing the Euclidean distance between the bbox and GT center points, so as to accelerate convergence. CIoU loss [35] further considers the aspect ratio on the basis of DIoU. This year, some researchers put forward EIoU loss [36], arguing that the relative aspect ratio in CIoU loss cannot reflect the real differences in width and height and their confidence, so the width loss and height loss are calculated separately and then added up.

The autonomous driving scene is different from daily life scenes in that it does not need to pay attention to unimportant classes. Therefore, most of the advanced models optimized for MS-COCO [28] are suboptimal for it. KITTI [37] is a common dataset in autonomous driving scenes. It is collected in urban areas, rural areas, and on expressways. Each image has up to 15 cars and more than 30 pedestrians, with various degrees of occlusion and truncation. BDD100k [38] is a large and diverse public driving dataset released by the Berkeley AI Research (BAIR) lab in recent years, covering different weather conditions, day and night, as well as different lighting conditions and occlusion. This paper proposes two algorithms based on YOLOv4. The first algorithm improves the accuracy by adding the CSP [31] structure into feature fusion, inserting attention mechanisms, and using the EIoU regression loss function to accelerate model convergence. The second algorithm improves the detection accuracy of dense small objects by inserting the RFB [18] module. Finally, the width is reduced to 3/4 of the original to improve the inference speed, as shown in Figure 2.
Figure 2: Structure of the proposed network (backbone with CBAM, RFB module, CSP-based neck, and predictor).
3. Proposed Work

According to YOLOv4 [24], an anchor-based one-stage detection algorithm is generally composed of a backbone, a neck, and a predictor head. The first model proposed in this paper inserts the attention mechanism into the bottleneck of the residual structure and adds the CSP structure into the neck.

3.1. Backbone. CSPDarkNet53 of YOLOv4 is an excellent backbone, which can handle the task of feature extraction in most detection scenes. The first model proposed in this paper continues to use CSPDarkNet53 and only adds the CA attention module into the bottleneck (see Figure 3). The effectiveness of attention mechanisms has been fully verified in many detection models; they can greatly increase the feature extraction ability while adding only a small number of parameters. In order to further enhance the feature extraction ability of the backbone in complex traffic scenes, the second model doubles the number of repetitions of the last residual stage (i.e., increases it to 8). In the experiments, it was found to be better to change the attention mechanism to CBAM and to move the insertion position outside the residual structure and inside the CSP structure, as shown in Figure 4(b).

The CBAM [29] and CA [33] modules are shown in Figure 5. Both CBAM and CA are attention mechanisms that mix channel and spatial attention. Compared with the channel-only attention mechanism SE [32], they make the neural network pay more attention to the object areas containing important information, suppress irrelevant information, and improve the overall accuracy of object detection. Figure 3 shows the insertion position of the CA attention mechanism in model 1.

3.2. Neck. For a CNN, the later layers are richer in semantic information. YOLOv4 uses SPP [19] after the backbone to increase the receptive field of the network. Compared with the pure pooling of SPP, RFB [18] draws lessons from Inception in its structure, adopts horizontally connected and fused network layers, and increases the receptive field while reducing the amount of calculation through dilated convolution, which makes it more robust. As shown in Figure 6, the RFB block is composed of 3 × 3 convolutions and three dilated convolution layers.
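To make the structure just described concrete, the following is a minimal sketch of an RFB-style block in the spirit of [18], with three parallel branches ending in dilated convolutions; the dilation rates (1, 3, 5), channel widths, and the use of a shortcut are illustrative assumptions rather than the exact settings used here.

```python
# Minimal sketch of an RFB-style block (idea from [18]); branch widths and
# dilation rates are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn

class RFBLite(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        c = c_out // 4
        # Each branch: 1x1 reduction, a small conv, then a 3x3 dilated conv
        self.branch1 = nn.Sequential(
            nn.Conv2d(c_in, c, 1), nn.Conv2d(c, c, 3, padding=1, dilation=1))
        self.branch2 = nn.Sequential(
            nn.Conv2d(c_in, c, 1), nn.Conv2d(c, c, 3, padding=1),
            nn.Conv2d(c, c, 3, padding=3, dilation=3))
        self.branch3 = nn.Sequential(
            nn.Conv2d(c_in, c, 1), nn.Conv2d(c, c, 5, padding=2),
            nn.Conv2d(c, c, 3, padding=5, dilation=5))
        self.fuse = nn.Conv2d(3 * c, c_out, 1)       # concatenate branches, then fuse
        self.shortcut = nn.Conv2d(c_in, c_out, 1)    # residual shortcut

    def forward(self, x):
        y = torch.cat([self.branch1(x), self.branch2(x), self.branch3(x)], dim=1)
        return self.fuse(y) + self.shortcut(x)

out = RFBLite(1024, 512)(torch.randn(1, 1024, 16, 16))
print(out.shape)   # torch.Size([1, 512, 16, 16])
```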
Figure 4: CSPN block structures (a) and (b); variant (b) places CBAM inside the CSP structure, after the stacked bottlenecks.
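A hedged sketch of the arrangement in Figure 4(b) follows: an attention module placed after the stacked bottlenecks inside one path of a CSP block. The simplified CBAM, channel split, and activation choices below are assumptions for illustration only, not the paper's implementation.

```python
# Sketch of a CSP block with attention inserted after the residual stack
# (cf. Figure 4(b)); SimpleCBAM is a stand-in channel+spatial attention.
import torch
import torch.nn as nn

class SimpleCBAM(nn.Module):
    def __init__(self, c, r=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(c, c // r, 1), nn.ReLU(), nn.Conv2d(c // r, c, 1))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca                                     # channel attention
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                                  # spatial attention

class Bottleneck(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.block = nn.Sequential(nn.Conv2d(c, c, 1), nn.SiLU(),
                                   nn.Conv2d(c, c, 3, padding=1), nn.SiLU())

    def forward(self, x):
        return x + self.block(x)

class CSPWithAttention(nn.Module):
    def __init__(self, c, n=2):
        super().__init__()
        self.split1 = nn.Conv2d(c, c // 2, 1)
        self.split2 = nn.Conv2d(c, c // 2, 1)
        self.blocks = nn.Sequential(*[Bottleneck(c // 2) for _ in range(n)],
                                    SimpleCBAM(c // 2))   # attention after the residual stack
        self.fuse = nn.Conv2d(c, c, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.blocks(self.split1(x)), self.split2(x)], dim=1))

print(CSPWithAttention(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```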
PAN [21] is a feature enhancement structure for feature fusion. It adopts a top-down and bottom-up transmission mode to eliminate the loss of feature information from the bottom features to the high features. However, the layers inside PAN are connected with ordinary convolutions. The CSP [31] structure has already shown its advantages in the backbone: it strengthens the information exchange between channels and reduces the amount of calculation. Therefore, adding the CSP structure to the layers between PAN is more refined and has fewer parameters than the CSP structure in CSPDarkNet53 (see Figure 4).

3.3. Predictor Head. In object detection, the conflict between the classification and regression tasks is a well-known problem, so a prediction head for classification and regression is widely used in most detectors. YOLOv4 follows the predictor head of YOLOv3, which consists of one 3 × 3 and one 1 × 1 convolution layer. The final predicted output channel is na × (4 + 1 + nc), where na is the number of anchors in each detection layer and nc is the number of classes. The proposed work follows this structure.

3.4. Loss Function. For an object detection model, the loss function is generally the sum of the confidence loss, the classification loss, and the bbox regression loss. Binary cross entropy (BCE) is used for the confidence loss and the classification loss, and EIoU loss is used for the bbox regression loss. The confidence and classification losses are

\[
L_{obj} = -\frac{1}{N}\sum_{i}\Big[ O_{i}\ln \hat{C}_{i} + (1-O_{i})\ln\big(1-\hat{C}_{i}\big) \Big], \tag{2}
\]

\[
L_{cls} = -\frac{1}{N_{pos}}\sum_{i\in pos}\sum_{j\in cls}\Big[ O_{ij}\ln \hat{C}_{ij} + (1-O_{ij})\ln\big(1-\hat{C}_{ij}\big) \Big]. \tag{3}
\]

In formula (1), λ1, λ2, and λ3 are the coefficients of each loss, which are hyperparameters. In formula (2), O_i ∈ [0, 1] represents the IoU of the predicted bounding box and the ground truth, Ĉ_i = sigmoid(C_i), where C_i is the predicted value, and N is the number of positive and negative samples. In formula (3), O_ij ∈ {0, 1} indicates whether the jth class is present in the ith predicted bounding box, Ĉ_ij = sigmoid(C_ij), where C_ij is the predicted value, and N_pos is the number of positive samples. In formula (4), ρ²(b, b^gt) denotes the Euclidean distance between the center points of the bbox and the GT, C is the diagonal of the smallest circumscribed rectangle of the two boxes, and C_w, C_h are the width and height of the minimum circumscribed rectangle.
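Formulas (1) and (4) do not survive in legible form here; based on the surrounding description and the EIoU definition in [36], they presumably take the following form (a reconstruction, not necessarily the authors' exact notation):

```latex
% Presumed forms of formulas (1) and (4), reconstructed from the surrounding
% text and the EIoU definition in [36]; an assumption, not the authors' notation.
\begin{align}
  L &= \lambda_{1} L_{obj} + \lambda_{2} L_{cls} + \lambda_{3} L_{EIoU}, \tag{1}\\
  L_{EIoU} &= 1 - IoU
      + \frac{\rho^{2}\!\left(b, b^{gt}\right)}{C^{2}}
      + \frac{\rho^{2}\!\left(w, w^{gt}\right)}{C_{w}^{2}}
      + \frac{\rho^{2}\!\left(h, h^{gt}\right)}{C_{h}^{2}}. \tag{4}
\end{align}
```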
Figure 5: Attention modules: (a) CBAM; (b) CA.
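For concreteness, the following is a minimal sketch of the coordinate attention idea shown in Figure 5(b), following [33]: 2D global pooling is replaced by two parallel 1D poolings along height and width, and the resulting encodings re-weight the feature map. The reduction ratio and layer choices are assumptions rather than the configuration used in this paper.

```python
# Minimal sketch of coordinate attention (idea from [33]); reduction ratio and
# layer choices are assumptions, not this paper's configuration.
import torch
import torch.nn as nn

class CoordAttention(nn.Module):
    def __init__(self, c, r=16):
        super().__init__()
        m = max(8, c // r)
        self.conv1 = nn.Sequential(nn.Conv2d(c, m, 1), nn.BatchNorm2d(m), nn.ReLU())
        self.conv_h = nn.Conv2d(m, c, 1)
        self.conv_w = nn.Conv2d(m, c, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        x_h = x.mean(dim=3, keepdim=True)                      # pool along W -> (n, c, h, 1)
        x_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # pool along H -> (n, c, w, 1)
        y = self.conv1(torch.cat([x_h, x_w], dim=2))           # shared encoding of both directions
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                        # (n, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))    # (n, c, 1, w)
        return x * a_h * a_w                                   # re-weight along both directions

print(CoordAttention(64)(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```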
Figure 6: RFB block structure.
3.5. The Performance of Different Models. The parameter counts and computation of the different network models are shown in Table 1. All models are tested at 512 × 512 resolution with FP16 precision.

It can be seen that proposed work (1) has 11.61M fewer parameters than YOLOv4 and 6.35M fewer than YOLOv3. The parameters of proposed work (2) are reduced by 41.3% and 36.1%, respectively, compared with YOLOv4 and YOLOv3. In addition, from the perspective of FLOPs, the proposed work greatly reduces the complexity. At the same time, in terms of model size, proposed work (2) only occupies 72.1 MB, which is 40.9% less than YOLOv4; this largely stems from the CSP structure introduced in the neck and the 3/4 reduction in overall width. It is therefore suitable for deployment in autonomous driving.

4. Experiment

4.1. Dataset. In the experiments, we used KITTI [37] and BDD100k [38], which are commonly used in autonomous driving research. The KITTI dataset consists of 7481 training images and 7518 test images, covering three classes: Car, Cyclist, and Pedestrian. Since the test set has no labels, the training set and the validation set are obtained by randomly dividing the original training set into two halves [39, 40]. The BDD100k dataset is composed of 70,000 training images, 10,000 validation images, and 20,000 test images, covering ten classes: person, rider, car, bus, truck, bike, motor, traffic light, traffic sign, and train. The ratio of the training set to the validation set is 7:1. There are about 1.46 million object instances in the training and validation sets, of which about 0.8 million are car instances, while only 151 are train instances. This kind of unbalanced distribution among categories leads to a decline in the network's feature extraction ability, so train, rider, and motor are ignored in the final evaluation. The final BDD dataset includes seven classes: person, car, bus, truck, bike, traffic light, and traffic sign. Since we only study the differences between models, 1/5 of the training set and validation set is randomly sampled as the final dataset. The experiments were carried out on Ubuntu 18.04 with an NVIDIA Quadro M4000, CUDA 10.1, and cuDNN v7.6.5. Inference speed depends on the hardware; the inference FPS reported in this paper is measured on an NVIDIA RTX 2080Ti.
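As a sketch of the KITTI split described above (a random half/half division of the 7481 labeled images into training and validation lists), the snippet below builds such lists; the directory layout and file names are assumptions, not the authors' exact setup.

```python
# Sketch of randomly splitting the labeled KITTI training images into two halves;
# the directory layout and file names below are assumptions.
import random
from pathlib import Path

random.seed(0)
image_dir = Path("kitti/training/image_2")          # assumed KITTI layout
ids = sorted(p.stem for p in image_dir.glob("*.png"))
random.shuffle(ids)

half = len(ids) // 2
train_ids, val_ids = ids[:half], ids[half:]

Path("train.txt").write_text("\n".join(train_ids))
Path("val.txt").write_text("\n".join(val_ids))
print(len(train_ids), "train /", len(val_ids), "val images")
```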
Figure 7: Precision-recall curves comparing YOLOv4, ours (1), and ours (2); the panels include the traffic light AP50 and [email protected] curves.
Figure 8: (a) YOLOv4 inference results. (b) Proposed work (1) inference. (c) Proposed work (2) inference.
fourth row, model 1 can supplement the traffic sign detection that YOLOv4 gets wrong. In rows 5 and 6, model 1 and model 2 can find more small objects than YOLOv4. The weather in the first row and the last row is better, and the detection boxes of the improved algorithm are more accurate.

Based on these results, model 1 and model 2 can significantly improve the detection accuracy, so as to improve driving stability and efficiency, prevent fatal accidents, meet the needs of the real-time object detection task in autonomous driving, and have practical application value.

5. Conclusions

Real-time object detection technology is of great significance in the field of autonomous driving. Aimed at the problem of insufficient accuracy of one-stage detectors in autonomous driving scenes, this paper, based on YOLOv4, replaces SPP with the RFB structure after the backbone, integrates the less computation-intensive CSP structure into the neck, and adds the CBAM and CA attention mechanisms to make the neural network pay more attention to the object areas containing important information, suppress irrelevant information, and improve detection accuracy. The experimental results show that the improved model 1 has higher accuracy than the original YOLOv4 in the object detection task: the mAP is improved by 2.06% on the KITTI validation set and by 2.95% on the BDD validation set. The mAP50 of model 2 is increased by 1.73%, and the inference speed is increased by 4.83 fps, which verifies the effectiveness of the improved algorithm. It provides a theoretical reference for further practical application. In follow-up work, some researchers [7, 41, 42] are concerned with how to improve detection accuracy at night and under bad weather conditions, and further improving the detection accuracy under such conditions will also be our next research direction.

Data Availability

All data included in this study can be downloaded from the official websites of KITTI and BDD100k or obtained by contacting the corresponding authors.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was financially supported by the Natural Science Foundation of Jilin Province (no. 20200201053JC).

References

[1] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, Las Vegas, NV, USA, June 2016.
[2] F. Liu, B. Liu, C. Sun, M. Liu, and X. Wang, "Deep learning approaches for link prediction in social network services," in Proceedings of the International Conference on Neural Information Processing, pp. 425–432, Springer, Daegu, South Korea, November 2013.
[3] X. Dai, "Hybridnet: a fast vehicle detection system for autonomous driving," Signal Processing: Image Communication, vol. 70, pp. 79–88, 2019.
[4] M. Bassani, L. Rossetti, and L. Catani, "Spatial analysis of road crashes involving vulnerable road users in support of road safety management strategies," Transportation Research Procedia, vol. 45, pp. 394–401, 2020.
[5] C. Zhang, Y. Liu, D. Zhao, and Y. Su, "Roadview: a traffic scene simulator for autonomous vehicle simulation testing," in Proceedings of the 17th International IEEE Conference on Intelligent Transportation Systems (ITSC), pp. 1160–1165, IEEE, Qingdao, China, October 2014.
[6] G. S. R. Satyanarayana, S. Majhi, and S. K. Das, "A vehicle detection technique using binary images for heterogeneous and lane-less traffic," IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–14, 2021.
[7] Z. Liu, Y. Cai, H. Wang et al., "Robust target recognition and tracking of self-driving cars with radar and camera information fusion under severe weather conditions," IEEE Transactions on Intelligent Transportation Systems, no. 99, pp. 1–14, 2021.
[8] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587, Columbus, OH, USA, June 2014.
[9] R. Girshick, "Fast R-CNN," in Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448, Santiago, Chile, December 2015.
[10] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: towards real-time object detection with region proposal networks," in Proceedings of Advances in Neural Information Processing Systems, pp. 91–99, Montreal, Quebec, Canada, December 2015.
[11] J. Dai, Y. Li, K. He, and J. Sun, "R-FCN: object detection via region-based fully convolutional networks," in Proceedings of Advances in Neural Information Processing Systems, pp. 379–387, Barcelona, Spain, December 2016.
[12] W. Liu, D. Anguelov, D. Erhan et al., "SSD: single shot multibox detector," in Proceedings of the European Conference on Computer Vision, pp. 21–37, Springer, Amsterdam, Netherlands, October 2016.
[13] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, Las Vegas, NV, USA, June 2016.
[14] Z. Cai, Q. Fan, R. S. Feris, and N. Vasconcelos, "A unified multi-scale deep convolutional neural network for fast object detection," in Proceedings of the European Conference on Computer Vision, pp. 354–370, Springer, Amsterdam, Netherlands, October 2016.
[15] X. Hu, X. Xu, Y. Xiao et al., "SINet: a scale-insensitive convolutional neural network for fast vehicle detection," IEEE Transactions on Intelligent Transportation Systems, vol. 20, no. 3, pp. 1010–1019, 2019.
[16] Q. Zhao, Y. Wang, T. Sheng, and Z. Tang, "Comprehensive feature enhancement module for single-shot object detector," in Proceedings of the Asian Conference on Computer Vision, Springer, Perth, Australia, December 2018.
[17] S. Zhang, L. Wen, X. Bian, Z. Lei, and S. Z. Li, "Single-shot refinement neural network for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4203–4212, Salt Lake City, UT, USA, June 2018.
[18] S. Liu, D. Huang, and Y. Wang, "Receptive field block net for accurate and fast object detection," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 385–400, Munich, Germany, September 2018.
[19] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 9, pp. 1904–1916, 2015.
[20] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2117–2125, Honolulu, HI, USA, July 2017.
[21] S. Liu, Q. Lu, H. Qin, J. Shi, and J. Jia, "Path aggregation network for instance segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8759–8768, Salt Lake City, UT, USA, June 2018.
[22] S. Liu, D. Huang, and Y. Wang, "Learning spatial fusion for single-shot object detection," 2019, https://arxiv.org/abs/1911.09516.
[23] J. Redmon and A. Farhadi, "YOLOv3: an incremental improvement," 2018, https://arxiv.org/abs/1804.02767.
[24] A. Bochkovskiy, C.-Y. Wang, and H.-Y. Mark Liao, "YOLOv4: optimal speed and accuracy of object detection," 2020, https://arxiv.org/abs/2004.10934.
[25] Y. Cai, T. Luan, H. Gao et al., "YOLOv4-5D: an effective and efficient object detector for autonomous driving," IEEE Transactions on Instrumentation and Measurement, vol. 70, 2021.
[26] M. Tan, R. Pang, and Q. V. Le, "EfficientDet: scalable and efficient object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, June 2020.
[27] J. Redmon and A. Farhadi, "YOLO9000: better, faster, stronger," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7263–7271, Honolulu, HI, USA, July 2017.
[28] T.-Y. Lin, M. Maire, S. Belongie et al., "Microsoft COCO: common objects in context," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 740–755, Zurich, Switzerland, September 2014.
[29] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: convolutional block attention module," in Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19, Munich, Germany, September 2018.
[30] H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, "Generalized intersection over union: a metric and a loss for bounding box regression," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 658–666, Long Beach, CA, USA, June 2019.
[31] C.-Y. Wang, H.-Y. Mark Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, "CSPNet: a new backbone that can enhance learning capability of CNN," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), Seattle, WA, USA, June 2020.
[32] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7132–7141, Salt Lake City, UT, USA, June 2018.
[33] Q. Hou, D. Zhou, and J. Feng, "Coordinate attention for efficient mobile network design," 2021, https://arxiv.org/abs/2103.02907.
[34] J. Yu, Y. Jiang, Z. Wang, Z. Cao, and T. Huang, "UnitBox: an advanced object detection network," in Proceedings of the 24th ACM International Conference on Multimedia, pp. 516–520, Amsterdam, Netherlands, October 2016.
[35] Z. Zheng, P. Wang, W. Liu, J. Li, R. Ye, and D. Ren, "Distance-IoU loss: faster and better learning for bounding box regression," in Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), New York, NY, USA, February 2020.
[36] Y.-F. Zhang, W. Ren, Z. Zhang, Z. Jia, L. Wang, and T. Tan, "Focal and efficient IoU loss for accurate bounding box regression," 2021, https://arxiv.org/abs/2101.08158.
[37] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3354–3361, IEEE, Providence, Rhode Island, June 2012.
[38] F. Yu, W. Xian, Y. Chen et al., "BDD100K: a diverse driving video database with scalable annotation tooling," 2018, https://arxiv.org/abs/1805.04687.
[39] J. Choi, D. Chun, H. Kim, and H.-J. Lee, "Gaussian YOLOv3: an accurate and fast object detector using localization uncertainty for autonomous driving," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 502–511, Seoul, South Korea, October 2019.
[40] B. Wu, F. Iandola, P. H. Jin, and K. Keutzer, "SqueezeDet: unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 129–137, Honolulu, HI, USA, July 2017.
[41] A. Bell, T. Mantecón, C. Díaz, C. R. del-Blanco, F. Jaureguizar, and N. García, "A novel system for nighttime vehicle detection based on foveal classifiers with real-time performance," IEEE Transactions on Intelligent Transportation Systems, 2021.
[42] M. Hnewa and H. Radha, "Object detection under rainy conditions for autonomous vehicles: a review of state-of-the-art and emerging techniques," IEEE Signal Processing Magazine, vol. 38, no. 1, pp. 53–67, 2020.