
Received 22 September 2024, accepted 10 October 2024, date of publication 15 October 2024, date of current version 5 November 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3480976

YOLOv8n-FAWL: Object Detection for Autonomous Driving Using YOLOv8 Network on Edge Devices

ZIBIN CAI, RONGRONG CHEN, ZIYI WU, AND WUYANG XUE
School of Electronic and Electrical Engineering, Zhaoqing University, Zhaoqing 526061, China
Corresponding author: Rongrong Chen ([email protected])

This work was supported by the National College Students Innovation and Entrepreneurship Training Program under Grant 202310580013.

ABSTRACT In the field of autonomous driving, common challenges include difficulties in detecting small vehicles and pedestrians on the road, high computational demands of algorithms, and low accuracy of detection algorithms. This paper proposes a YOLOv8n-FAWL object detection algorithm tailored for edge computing, incorporating the following three improvements: (1) The Faster-C2f-EMA module is created, designed through the synergy of the FasterNet architecture and the concept of EMA modules, effectively addressing the challenge of suboptimal feature extraction for small objects. (2) The WIOU loss function is adopted to resolve the issue of imbalanced training samples. (3) The LAMP pruning technique is applied to reduce the model parameters and complexity, thereby enhancing the overall model accuracy. The experimental results show that, compared to the baseline model, the proposed algorithm achieves improvements of 6.2% and 4.5% in [email protected], and 3.8% and 2.7% in [email protected]:0.95, on the Udacity and BDD100K-tiny datasets, respectively. In addition, the model parameters were reduced by 49.2% and 46%. The model achieved real-time performance at 54 FPS, thereby advancing the development of autonomous driving technology.

INDEX TERMS YOLOv8n, autonomous driving, LAMP, FasterNet, EMA, WIOU.

I. INTRODUCTION
In the 21st century, automobiles have become an indispensable means of transportation. The number of newly registered vehicles and driver licenses worldwide is rapidly increasing. This rapid growth in the number of motor vehicles has also resulted in other issues such as traffic accidents, congestion, and environmental pollution.

Object detection plays a pivotal role in autonomous driving technologies. It is necessary to ensure the correct execution of driving decisions, path planning, obstacle avoidance, and compliance with traffic rules within autonomous driving systems by detecting and recognizing various traffic entities, including vehicles, pedestrians, cyclists, traffic signs, and road obstacles [1]. Moreover, the efficiency and accuracy of object detection directly affect the real-time response capability and overall performance of autonomous driving systems, particularly in complex and dynamically changing traffic environments. Therefore, high-precision object detection is crucial.

However, object detection in autonomous driving presents numerous technical and practical challenges. First, the diversity and complexity of visual environments pose significant difficulties for recognition, such as variations in lighting conditions and identification of small objects, which can reduce the accuracy of object detection [2]. Second, low-quality samples in object detection model training may adversely affect the overall performance of the model [3]. Finally, the computational capabilities of embedded hardware and the portability of models limit the complexity of object detection algorithms. These applications must achieve real-time performance on platforms with limited computational resources [4].

In view of these challenges, the current research focus has shifted towards achieving lightweight object detection models while ensuring detection accuracy. This not only helps improve the real-time response capability of autonomous driving systems but also facilitates the deployment and application of models on embedded devices with limited computational power [5].

Therefore, to address the aforementioned challenges, this study proposes a YOLOv8n-Faster-C2f-EMA-WIOU-LAMP (FAWL) model. The primary contributions of this model are as follows:

1. This study devised a module named Faster-C2f-EMA, which, upon comparison with the original model across two datasets, significantly enhanced the accuracy of the model in detecting small objects while maintaining its inference speed.

2. We adopted the WIoU [6] loss function, which reduces the harmful gradients generated by suboptimal training samples and enhances the overall detection performance of the model.

3. We propose employing the LAMP [7] pruning algorithm to reduce the model parameters and computational complexity while simultaneously enhancing the overall accuracy of the model, making it suitable for deployment on embedded devices. Compared with existing algorithms, the pruned model, when evaluated on two datasets, demonstrates significant improvements in both detection accuracy and speed for the YOLOv8n-FAWL algorithm.

4. We conducted tests on a high-performance Jetson Orin Nano edge computing device, achieving an improved model inference speed of 54 FPS, which meets the requirements for real-time detection. This validates the feasibility and effectiveness of the lightweight framework of the YOLOv8n-FAWL model in practical applications.

The rest of this paper is organized as follows: Section II introduces the evolution of YOLO series algorithms and lightweight model techniques. Section III details the YOLOv8n-FAWL algorithm. In Section IV, we present the experimental datasets, parameter settings, and evaluation metrics, along with the results of the ablation studies and comparative experiments. Section V summarizes the proposed algorithm and provides an outlook for future work.

II. RELATED WORK

A. YOLO FOR OBJECT DETECTION
In the field of autonomous driving, target detection algorithms play a crucial role in enabling vehicles to perceive their surroundings in real time. The You Only Look Once (YOLO) series of algorithms has emerged as a standout choice because of its exceptional real-time detection performance and commendable accuracy, making it an integral part of autonomous driving vision perception systems.

In the evolution of Deep Convolutional Neural Networks (DCNNs) for object detection, YOLO series algorithms stand out prominently. Since Redmon et al. [8] introduced YOLOv1, the YOLO series has adopted an innovative single-stage detection method, revolutionizing the traditional multi-stage detection paradigm and significantly accelerating inference speed. Although YOLOv1 initially lagged behind Faster R-CNN [9] in terms of accuracy, its outstanding real-time performance makes it highly attractive for practical applications.

Subsequent YOLO versions, such as YOLOv4 [10], have further improved detection accuracy by incorporating advanced deep learning techniques. YOLOv4 combines a cross-stage partial network (CSPNet) [11] and Spatial Pyramid Pooling (SPP) [12] to enhance the feature extraction and object detection capabilities. These enhancements contributed to the continuous improvement of the YOLO series in the field of autonomous driving.

In terms of current usage, YOLOv5 [13] and YOLOv7 [14] are the two most widely accepted algorithms. Compared to YOLOv4, YOLOv5 improved in model structure, training strategy, and performance. It can effectively reduce redundant computations and improve computational efficiency. However, YOLOv5 has certain drawbacks: for example, it still has some deficiencies in small object detection, and its detection effect on dense objects also needs improvement.

YOLOv7 proposes a new training strategy called the Trainable Bag of Freebies (TOF) to enhance the performance of real-time object detectors. The TOF method encompasses a series of trainable techniques, such as data augmentation and MixUp. Applying TOF to three different types of object detectors (SSD, RetinaNet, and YOLOv3) can significantly improve the accuracy and generalization ability of object detectors. However, YOLOv7 is also constrained by low-quality annotated data, model structures, and hyperparameters, which can lead to performance degradation in certain scenarios.

The latest YOLOv8 [15] further refined its network architecture and computational flow to enhance response speed and accuracy in high-speed driving environments. Depending on the depth and width of the network, YOLOv8 can be categorized as YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x.

FIGURE 1. YOLOv8n structure diagram.




As shown in Figure 1, the YOLOv8n model architecture primarily comprises a backbone, a neck, and a head network. The backbone section of YOLOv8 is largely similar to that of YOLOv5, leveraging the CSP (Cross Stage Partial) concept but substituting the C3 modules with C2f modules. The C2f module, featuring a dense residual structure, enables YOLOv8 to obtain richer gradient flow information while maintaining a lightweight design. At the end of the backbone, the widely used SPPF (Spatial Pyramid Pooling Fast) module is utilized. The SPPF layer enhances the receptive field and captures feature information across different levels.

The neck part employs the PAN-FPN feature fusion method to strengthen the representation of the entire feature hierarchy as well as the fusion and utilization of information through a bottom-up path augmentation approach.

Finally, the head section obtains the target class information by decoupling the classification and detection processes through the three prediction branches.
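For readers who want a concrete picture of the SPPF layer mentioned above, the following PyTorch sketch reproduces its widely documented structure: a 1×1 channel reduction, three chained 5×5 max-pooling operations (emulating 5/9/13 pooling kernels), concatenation, and a final 1×1 convolution. The layer widths and the ConvBNSiLU helper are illustrative assumptions, not code taken from the paper.

```python
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    """Convolution + BatchNorm + SiLU, the basic block used in YOLOv8-style backbones."""
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPF(nn.Module):
    """Spatial Pyramid Pooling - Fast: chained 5x5 max-pools enlarge the receptive field cheaply."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = ConvBNSiLU(c_in, c_hidden, 1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.cv2 = ConvBNSiLU(c_hidden * 4, c_out, 1)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)    # receptive field ~5x5
        y2 = self.pool(y1)   # ~9x9
        y3 = self.pool(y2)   # ~13x13
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))

if __name__ == "__main__":
    feat = torch.randn(1, 256, 20, 20)   # a dummy backbone feature map
    print(SPPF(256, 256)(feat).shape)    # torch.Size([1, 256, 20, 20])
```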
In summary, despite the significant progress made by the YOLO series in the field of autonomous driving, challenges remain in areas such as adapting to low-quality annotated data, improving the small-object detection performance, and streamlining deployment on embedded platforms. Therefore, this study focused on lightweighting YOLOv8n. By integrating novel techniques and methods, the model achieves real-time performance while further enhancing detection accuracy and robustness, providing a more reliable perception capability for autonomous driving systems.

B. MODEL LIGHTWEIGHTING TECHNIQUES
In the realm of model lightweighting, the common approaches include replacing the backbone and pruning.

In the approach of replacing the backbone with a lightweight network, networks such as ShuffleNet [16], MobileNet [17], and GhostNet [18] are commonly used. They effectively reduce floating-point operations (FLOPs) but may increase memory access latency or data manipulation overhead. To address this issue, this study draws inspiration from the design philosophy of the FasterNet [19] network and creates a Faster-C2f module. However, the experimental results showed subpar accuracy, which we speculated was due to insufficient feature scale extraction by Faster-C2f. To improve this, we integrated the EMA attention mechanism [20] into Faster-C2f and redesigned it as Faster-C2f-EMA. As detailed in Section 4.4.5, the experimental results demonstrate that Faster-C2f-EMA achieves significant improvements in accuracy and robustness for small object detection.

Pruning reduces the size of a network by eliminating redundant connections or neurons. Techniques such as DeepLayer [21] and TropNNC [22] are examples of structured pruning: DeepLayer reduces complexity by progressively discarding entire layers, whereas TropNNC, based on the principles of tropical geometry, aims to decrease model complexity by reducing redundant connections or neurons in the network. Slimming [23], on the other hand, is an example of fine-grained pruning that trims parameters by removing unimportant weights or neurons. Zhang et al. [24] proposed an adaptive pruning method for optimizing lightweight transformers (PAOLTransformer), which uses norm information to assess the contribution of each element in the model to the output and automates the pruning process through reinforcement learning to achieve the optimal compression ratio. PSE-Net [25] is a new approach for channel pruning in convolutional neural networks that accelerates supernetwork training using a parallel subnet training algorithm and employs a prior distribution sampling strategy to identify optimal substructures subject to resource constraints. The LAMP algorithm adopted in this study integrates hierarchical automatic pruning with adaptive pruning rate adjustments customized to the characteristics of each layer, thereby effectively eliminating redundant data in the model.

Finally, after being accelerated with TensorRT [26], the improved model was successfully deployed on the embedded device Jetson Orin Nano, meeting the standards for real-time detection and laying a solid foundation for the future development and application of lightweight autonomous driving algorithms.

III. ENHANCED LIGHTWEIGHT YOLOV8N ALGORITHM

A. SYSTEM OVERVIEW
To address the challenges of low accuracy and limited portability inherent in traditional autonomous driving detection networks, this study presents a streamlined YOLOv8n-FAWL model. The model integrates enhancements across three key areas of the YOLOv8n network, and its core architecture is depicted in Figure 2. Initially, the self-designed Faster-Block-EMA module replaces the original C2f module in the backbone. Following this, the Wise-IoU loss function is introduced as a replacement for the original loss function in the YOLOv8 detection mechanism. In the final stage, we employ the LAMP pruning algorithm to compress the model and reduce its complexity.

FIGURE 2. YOLOv8n-FAWL structure diagram.

B. FASTER-C2F-EMA LIGHTWEIGHT MODULE
To address the issue of the large model size affecting the detection speed in autonomous driving detection tasks, this study employs a self-designed Faster-Block-EMA module to replace the bottleneck part of the original C2f module in YOLOv8n. This results in a new module named Faster-C2f-EMA, as illustrated in Figure 3.

FIGURE 3. Faster-C2f-EMA module structure diagram.

The design emphasis of this module is to accelerate inference speed and enhance the detection of small objects. Integrating the FasterBlock module effectively improved the inference speed; however, there are limitations in terms of feature extraction capability.



To address this issue and strengthen the detection of small objects, this study integrated an attention mechanism into the Faster-C2f module. EMA, with its spatial structure, exhibits more significant improvements in handling small objects than classical attention mechanisms such as ECA [27], CBAM [28], and CA [29]. As shown in the middle part of Figure 3, the EMA attention mechanism is embedded into the second-to-last layer of the FasterBlock, recovering the lost features while extracting the relevant information from the backbone network more efficiently. A comparative experiment is described in Section 4.4.5.

This improvement not only reduces the number of parameters and computational complexity, but also enhances the overall model performance and improves the precision in detecting small objects.
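The exact layer arrangement of the Faster-Block-EMA module is given in Figure 3 and is not reproduced here; the sketch below only illustrates the two ideas the text combines: FasterNet's partial convolution (PConv), which convolves a fraction of the channels and passes the rest through untouched, and an attention stage inserted before the final pointwise convolution. The SimpleChannelAttention placeholder stands in for the EMA attention of Ouyang et al. [20]; the channel split ratio and layer widths are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """FasterNet-style partial convolution: a 3x3 conv over the first c/ratio channels, identity elsewhere."""
    def __init__(self, channels, ratio=4):
        super().__init__()
        self.c_conv = channels // ratio
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, 3, 1, 1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.c_conv, x.shape[1] - self.c_conv], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

class SimpleChannelAttention(nn.Module):
    """Placeholder for the EMA attention stage (squeeze-and-excitation style channel gating)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.SiLU(),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)

class FasterBlockEMA(nn.Module):
    """PConv -> 1x1 expand -> attention -> 1x1 project, with a residual connection."""
    def __init__(self, channels, expansion=2):
        super().__init__()
        hidden = channels * expansion
        self.pconv = PConv(channels)
        self.pw1 = nn.Sequential(nn.Conv2d(channels, hidden, 1, bias=False),
                                 nn.BatchNorm2d(hidden), nn.SiLU())
        self.attn = SimpleChannelAttention(hidden)   # stands in for EMA attention
        self.pw2 = nn.Conv2d(hidden, channels, 1, bias=False)

    def forward(self, x):
        return x + self.pw2(self.attn(self.pw1(self.pconv(x))))

if __name__ == "__main__":
    x = torch.randn(1, 64, 40, 40)
    print(FasterBlockEMA(64)(x).shape)   # torch.Size([1, 64, 40, 40])
```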
C. WISE-IOU LOSS FUNCTION
This study implemented the Wise-IoU loss function in YOLOv8 as a replacement for the original loss mechanism to mitigate the inaccuracies resulting from low-quality annotations during training and to enhance the overall performance. Wise-IoU can discriminate the quality of annotations during training, thereby avoiding adverse gradients caused by inferior annotations. It also reduces the penalties associated with geometric measurements and minimizes the interference during the training process. These advantages promote adaptive focal learning within the network, leading to improvements in the model and enhanced generalization capabilities, as illustrated in equations (1), (2), and (3):

$$ \mathcal{L}_{\mathrm{IoU}} = 1 - \mathrm{IoU} = 1 - \frac{W_i H_i}{S_u} \tag{1} $$

$$ \mathcal{L}_{\mathrm{WIoUv1}} = \mathcal{R}_{\mathrm{WIoU}} \, \mathcal{L}_{\mathrm{IoU}} \tag{2} $$

$$ \mathcal{R}_{\mathrm{WIoU}} = \exp\!\left( \frac{(x - x_{gt})^2 + (y - y_{gt})^2}{\left(W_g^2 + H_g^2\right)^{*}} \right) \tag{3} $$

where W_g and H_g represent the width and height of the smallest enclosing box that encompasses both the predicted and ground-truth bounding boxes, W_i and H_i denote the width and height of the intersection area between the predicted box and the ground-truth box, and S_u signifies the union area of the predicted box and the ground-truth box. R_WIoU stands for Region-based Weighted Intersection over Union. A diagram illustrating the predicted and true bounding boxes is shown in Figure 4.

FIGURE 4. Schematic of the WIoU parameter.

To prevent R_WIoU from generating harmful gradients that hinder convergence, W_g and H_g were decoupled from the computation graph (the '*' in the formula above indicates this operation). Restricting the range of R_WIoU values to [1, e) significantly amplifies the importance of low-quality anchor boxes. In contrast, the Loss Intersection over Union (L_IoU) has a range of [0, 1], which reduces the R_WIoU value for high-quality anchor boxes and places greater emphasis on the distance between center points.

The dynamic non-monotonic focus mechanism assesses the quality of anchor boxes using "outlyingness" rather than IoU, thereby providing an intelligent gradient gain allocation strategy. This strategy reduces the competitiveness of high-quality anchor boxes while mitigating the harmful gradients generated by low-quality samples. Consequently, the WIoU can focus more on medium-quality anchor boxes, enhancing the overall performance of the detector.
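As a sanity check on equations (1)-(3), the following sketch computes the WIoU v1 loss for axis-aligned boxes in (x1, y1, x2, y2) format. It is a minimal re-implementation written for this summary, not the authors' training code; the detach of the enclosing-box term mirrors the '*' operation described above.

```python
import torch

def wiou_v1_loss(pred, target, eps=1e-7):
    """WIoU v1 for boxes given as (x1, y1, x2, y2) tensors of shape (N, 4)."""
    # Intersection width/height (W_i, H_i) and area
    inter_w = (torch.min(pred[:, 2], target[:, 2]) - torch.max(pred[:, 0], target[:, 0])).clamp(min=0)
    inter_h = (torch.min(pred[:, 3], target[:, 3]) - torch.max(pred[:, 1], target[:, 1])).clamp(min=0)
    inter = inter_w * inter_h

    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    union = area_p + area_t - inter + eps                      # S_u

    loss_iou = 1.0 - inter / union                             # Eq. (1)

    # Center distances and smallest enclosing box (W_g, H_g)
    cx_p, cy_p = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    cx_t, cy_t = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # Eq. (3): the enclosing-box term is detached (the '*' operation) to avoid harmful gradients
    r_wiou = torch.exp(((cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2) / (wg ** 2 + hg ** 2 + eps).detach())

    return (r_wiou * loss_iou).mean()                          # Eq. (2), averaged over the batch

if __name__ == "__main__":
    pred = torch.tensor([[10., 10., 50., 50.]], requires_grad=True)
    gt = torch.tensor([[12., 12., 48., 52.]])
    print(wiou_v1_loss(pred, gt))
```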

D. LAMP PRUNING ALGORITHM
Despite the improvements made to the model as mentioned earlier, its parameters and computational load remain considerable when deployed on embedded devices. To address this challenge, we use a pruning algorithm called LAMP to streamline the model.




LAMP introduces an innovative global pruning importance score. The LAMP score serves as a rescaled weight magnitude that estimates the model-level distortion caused by pruning. Thus, connections with scores below the designated threshold are pruned. Global pruning using LAMP scores is analogous to global pruning using dynamically determined layer-wise sparsity thresholds. Notably, LAMP scores are computationally efficient, require no hyperparameters, and are independent of model-specific knowledge. The formula for the LAMP score is shown in Equation (4):

$$ \mathrm{score}(u; W) = \frac{(W[u])^2}{\sum_{v \ge u} (W[v])^2} \tag{4} $$

In this equation, u and v index parameters within the network, and W denotes the weight parameters of the neural network; W[u] is the weight of the u-th parameter. The term $\sum_{v \ge u} (W[v])^2$ represents the sum of squares of all weights starting from the u-th weight (inclusive) up to the last weight in the current layer. This formula rescales the magnitude of each weight by computing its ratio to the sum of the squares of all subsequent weights in the same layer, resulting in a layer-adaptive pruning importance score. This score aims to approximate the distortion in the model output caused by pruning, enabling efficient neural network pruning through the automatic selection of sparsity levels for each layer without the need for hyperparameter tuning.
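To make Equation (4) concrete, the snippet below computes LAMP scores for each layer and selects one global threshold, following the description above. It is an illustrative re-implementation written for this summary, not the authors' pruning pipeline; in practice the scores would be computed for every prunable layer of the detector.

```python
import numpy as np

def lamp_scores(weights):
    """LAMP scores for one layer (Eq. 4): w_u^2 / sum of w_v^2 over all v with |w_v| >= |w_u|."""
    flat = weights.reshape(-1)
    order = np.argsort(flat ** 2)                  # ascending squared magnitude
    sq_sorted = flat[order] ** 2
    # suffix sums: for the u-th smallest weight, sum of squares of itself and all larger weights
    suffix = np.cumsum(sq_sorted[::-1])[::-1]
    scores = np.empty_like(flat)
    scores[order] = sq_sorted / suffix
    return scores.reshape(weights.shape)

def global_prune_masks(layer_weights, sparsity=0.5):
    """Keep the (1 - sparsity) fraction of weights with the highest LAMP scores across all layers."""
    scores = [lamp_scores(w) for w in layer_weights]
    all_scores = np.concatenate([s.reshape(-1) for s in scores])
    threshold = np.quantile(all_scores, sparsity)   # one global threshold -> layer-adaptive sparsity
    return [s > threshold for s in scores]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layers = [rng.normal(size=(16, 8)), rng.normal(scale=0.1, size=(32, 16))]
    for i, mask in enumerate(global_prune_masks(layers, sparsity=0.5)):
        print(f"layer {i}: kept {mask.mean():.2%} of weights")
```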
The pruning process, based on the importance of the weights, is illustrated in Figure 5. After the model input, LAMP calculates the scores, resulting in LAMP scores. The pruning algorithm then removes the weights with lower scores, leading to a pruned model. Finally, fine-tuning is performed as needed to recover some of the performance loss.

FIGURE 5. Schematic diagram of pruning.

In this study, we controlled the extent of model pruning by introducing an additional parameter, Speed_up, which is numerically equivalent to the ratio of the model's computational cost before pruning to its computational cost after pruning. For instance, if the computational cost of the model before pruning is 8.2 GFLOPs and the target computational cost after pruning is 5.47 GFLOPs, Speed_up can be set to 1.5. This allows the model to be pruned automatically until it reaches the target computational cost, at which point the pruning process stops. Similarly, if the target computational cost after pruning is 4.1 GFLOPs, Speed_up can be set to 2.
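The Speed_up parameter can be read as a stopping criterion: prune until the estimated computational cost has dropped by the requested factor. The loop below illustrates this idea with a toy per-channel FLOPs estimate; the function name and the channel-level granularity are assumptions made for illustration, not the authors' implementation.

```python
def prune_until_speed_up(channel_scores, channel_flops, speed_up=1.5):
    """Remove the lowest-scoring channels until baseline_flops / remaining_flops >= speed_up.

    channel_scores : importance score (e.g. LAMP-style) per prunable channel group
    channel_flops  : FLOPs attributed to each channel group
    Returns the set of channel indices to remove.
    """
    baseline = sum(channel_flops)
    remaining = baseline
    removed = set()
    # visit channels from least to most important
    for idx in sorted(range(len(channel_scores)), key=lambda i: channel_scores[i]):
        if baseline / remaining >= speed_up:
            break
        removed.add(idx)
        remaining -= channel_flops[idx]
    return removed

if __name__ == "__main__":
    # e.g. 8.2 GFLOPs spread over 8 channel groups; stop once the 8.2 / remaining ratio reaches 1.5
    flops = [1.025] * 8
    scores = [0.9, 0.1, 0.5, 0.05, 0.7, 0.3, 0.8, 0.6]
    drop = prune_until_speed_up(scores, flops, speed_up=1.5)
    remaining = sum(flops) - sum(flops[i] for i in drop)
    print(f"removed {sorted(drop)}, remaining cost ~ {remaining:.2f} GFLOPs")
```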
IV. EXPERIMENTS AND ANALYSIS

A. TRAINING ENVIRONMENT AND METHODOLOGY
The research in this study involved training the model on the server side and conducting an initial model evaluation, followed by deployment to the embedded device Jetson Orin Nano with acceleration through TensorRT for model validation. The server-side experimental environment configurations used in this study are presented in Table 1, and the training parameters are shown in Table 2.

TABLE 1. Server-side environment.

TABLE 2. Training parameter setting.
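The server-side training runs can be reproduced along these lines with the Ultralytics API; the dataset YAML path and the hyperparameter values below are placeholders standing in for the settings listed in Table 2, which is not reproduced here.

```python
from ultralytics import YOLO   # assumes the Ultralytics package is installed

# Start from pre-trained YOLOv8n weights, as all models in this study did
model = YOLO("yolov8n.pt")

# Placeholder hyperparameters: substitute the values from Table 2
model.train(
    data="udacity.yaml",   # hypothetical dataset configuration file
    epochs=350,            # the paper reports that training stabilized after 350 iterations
    imgsz=640,
    batch=16,
    device=0,
)
```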
For this experiment, two classic datasets in the field of autonomous driving, the Udacity Self-Driving Car Dataset [30] and BDD100K-tiny, were used for testing. First, we experimented with the model improvement methods on the Udacity dataset. The final improvements were then validated on the BDD100K-tiny dataset.

The Udacity Self-Driving Car Dataset, originally designed for autonomous vehicle algorithm competitions, contains 15,000 urban road images. In this experiment, the entire dataset was randomly divided into three distinct subsets, training, validation, and testing, with a distribution ratio of 6:2:2. As depicted in Figure 6, the target points in this dataset were predominantly concentrated around the central point, and small targets constituted a significant proportion of the dataset.

FIGURE 6. Udacity location and size distribution of the object center point: (a) target position distribution in the dataset; (b) target size distribution in the dataset.

The BDD100K [31] dataset, released by the University of California, Berkeley, is a large-scale and diverse dataset for research in the field of autonomous driving. From the 70,000 images in the dataset, 20,000 containing at least two types of objects were randomly selected. These images were also randomly divided into three subsets, training, validation, and testing, following the same 6:2:2 ratio, to form a new dataset named BDD100K-tiny, as shown in Figure 7. Compared with the Udacity dataset, the target distribution in this dataset was more uniform, and the proportion of small targets was relatively smaller.

FIGURE 7. BDD100K-tiny location and size distribution of the object center point: (a) target position distribution in the dataset; (b) target size distribution in the dataset.
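A 6:2:2 random split like the one described for both datasets can be reproduced with a few lines; the file layout below is hypothetical and only illustrates the procedure.

```python
import random
from pathlib import Path

def split_dataset(image_dir, ratios=(0.6, 0.2, 0.2), seed=42):
    """Randomly partition image files into train/val/test lists with a 6:2:2 ratio."""
    files = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(files)
    n = len(files)
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    return {
        "train": files[:n_train],
        "val": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],
    }

if __name__ == "__main__":
    splits = split_dataset("datasets/udacity/images")   # hypothetical path
    for name, items in splits.items():
        print(name, len(items))
```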




Subsequently, the enhanced model was deployed on the Jetson Orin Nano platform to evaluate its performance, with the specific test environment parameters for the Jetson Orin Nano delineated in Table 3.

TABLE 3. Embedded test environment.

B. EVALUATION METRICS
This study primarily focused on several evaluation metrics.

(1) Precision (P) indicates the proportion of correctly predicted positive samples among all predicted positive samples, calculated according to Equation (5):

$$ P = \frac{TP}{TP + FP} \times 100\% \tag{5} $$

Recall (R) represents the proportion of correctly predicted positive samples among all actual positive samples and is calculated using Equation (6):

$$ R = \frac{TP}{TP + FN} \times 100\% \tag{6} $$

Here, TP denotes the number of true positive samples that were correctly identified, FP represents the number of false positive samples (negative samples misidentified as positive), and FN indicates the number of false negative samples (positive samples incorrectly classified as negative).

(2) The mean average precision (mAP) is the mean of the individual average precisions (AP). The AP represents the area under the precision-recall (P-R) curve, which is computed according to equations (7) and (8):

$$ AP = \int_{0}^{1} P(r)\,dr \tag{7} $$

$$ mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i \tag{8} $$

Specifically, the metrics used are [email protected] and [email protected]:0.95. The difference between the two lies in the IoU threshold; [email protected]:0.95 represents the mean mAP computed over IoU thresholds ranging from 0.5 to 0.95.

(3) To evaluate the suitability of deploying models onto embedded devices, the study concluded by testing [email protected], [email protected]:0.95, and frames per second (FPS) on the Jetson Orin Nano platform. The FPS is calculated according to Equation (9):

$$ FPS = \frac{N(P)}{T(P)} \tag{9} $$

where N(P) represents the total number of images processed and T(P) denotes the time taken to process these images.




(4) This study used Giga Floating-Point Operations Per Second (GFLOPs) and the number of parameters to measure the complexity of the models.
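The sketch below ties metrics (5)-(9) together for a single class: precision and recall from TP/FP/FN counts, AP as the area under a precision-recall curve, and FPS from processing time. It is a simplified illustration (for example, AP is integrated with the trapezoidal rule rather than the COCO-style interpolation).

```python
import numpy as np
import time

def precision_recall(tp, fp, fn):
    """Eq. (5) and (6)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def average_precision(recalls, precisions):
    """Eq. (7): area under the P-R curve (trapezoidal approximation)."""
    order = np.argsort(recalls)
    return float(np.trapz(np.asarray(precisions)[order], np.asarray(recalls)[order]))

def mean_average_precision(ap_per_class):
    """Eq. (8): mean of the per-class APs."""
    return sum(ap_per_class) / len(ap_per_class)

def measure_fps(infer_fn, images):
    """Eq. (9): number of processed images divided by total processing time."""
    start = time.perf_counter()
    for img in images:
        infer_fn(img)
    return len(images) / (time.perf_counter() - start)

if __name__ == "__main__":
    print(precision_recall(tp=80, fp=20, fn=10))
    print(average_precision([0.1, 0.5, 0.9], [1.0, 0.8, 0.6]))
    print(mean_average_precision([0.62, 0.48, 0.55]))
    print(measure_fps(lambda x: sum(x), [[1, 2, 3]] * 1000))
```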

C. EXPERIMENTAL RESULTS ANALYSIS

1) TRAINING ANALYSIS OF THE IMPROVED MODEL
During the training process with the Udacity Self-Driving Car Dataset, all models utilized pre-trained weights. Upon completion of training, the results for the YOLOv8n and lightweight YOLOv8n-FAWL models are presented in Figures 8(a) and 8(b), respectively. After 350 iterations, both models stabilized, and the improved model achieved significant improvements over the original model in terms of [email protected] and [email protected]:0.95. This indicates that YOLOv8n-FAWL outperformed the original YOLOv8n in terms of object detection accuracy, thereby validating the effectiveness of the proposed enhancements.

FIGURE 8. Original model and improved model metric comparison: (a) Training result of [email protected]; (b) Training result of [email protected]:0.95.

2) COMPARISON EXPERIMENT ANALYSIS OF DIFFERENT MODELS
Table 4 compares the YOLOv8n-FAWL algorithm on the server side with mainstream object detection models, including Faster R-CNN, SSD [32], YOLOv3-tiny [33], YOLOv5s, YOLOv7-tiny, YOLOv8n, RT-DETR-r18 [34], and YOLOv10n [35], which have been among the top-performing models in recent years, on the Udacity dataset.

TABLE 4. Udacity dataset comparison experiment.

The experimental results show that among the ten tested models, RT-DETR-r18, an advanced algorithm developed in recent years, achieved good detection performance. However, owing to its significant number of parameters and computational requirements, it is less suitable for deployment on embedded devices. Meanwhile, the latest SOTA algorithm, YOLOv10n, exhibited slightly lower performance metrics than YOLOv5s.

The lightweight versions within the YOLO series, such as YOLOv3-tiny, YOLOv5s, and YOLOv7-tiny, demonstrate faster detection speeds. Among them, YOLOv3-tiny lags slightly behind the other models in detection accuracy, whereas YOLOv5s and YOLOv7-tiny strike a balance between parameter count, computational complexity, and accuracy. However, their overall performance still falls short of that of YOLOv8n.

To validate the generalization performance of the model, comparative experiments were conducted on the BDD100K-tiny dataset, and YOLOv8n-FAWL achieved good results in terms of both accuracy and model size compared to several other models.

TABLE 5. BDD100K-tiny dataset comparison experiment.

On the Udacity and BDD100K-tiny datasets, the improved algorithm exhibited different levels of performance enhancement, with a more significant improvement on the Udacity dataset. This is because the variation in the image backgrounds in the BDD100K-tiny dataset limits the ability of the model to process detailed information to a certain extent. However, the performance improvement was still considerable, indicating that YOLOv8n-FAWL still possessed stronger detection capabilities than the original algorithm under complex lighting conditions.

3) ABLATION EXPERIMENT ANALYSIS
On the server side, we conducted ablation experiments on the Udacity dataset; the results are presented in Table 6.

TABLE 6. Udacity ablation experiment results.

Compared with YOLOv8n, the YOLOv8n-W model, which solely replaces the loss function with WIOU, exhibits a slight improvement in [email protected]. YOLOv8n-FA, however, showed moderate enhancements in both precision and recall rates. While reducing the number of model parameters and computational complexity, it achieved a 2.6% increase in average precision [email protected] and a 1.7% increase in [email protected]:0.95. This indicates that the Faster-C2f-EMA module effectively minimizes the model size while enhancing the feature extraction capability of the backbone network, ultimately improving the accuracy of the model.

After replacing the loss function with Wise-IoU, YOLOv8n-FAW's [email protected]:0.95 increased by 1.4%. This demonstrates that the ability of the model to handle low-quality samples during training was enhanced, resulting in significant optimization of its overall performance.



Upon applying LAMP pruning to YOLOv8n-FAW, as indicated in Table 5, the model attained a peak accuracy when the speed_up factor was set to 1.2. Consequently, we designated this optimized model YOLOv8n-FAWL. Figure 9 shows a comparison of the model channel configurations before and after pruning.

FIGURE 9. Channel contrast diagram.

Numerical results alone may not fully convey the improvement in detection performance. To visually demonstrate the improved detection capabilities resulting from these modifications, this study first presents a heatmap visualization, as shown in Figure 10. This figure compares the heatmaps generated by the YOLOv8n-FAWL and YOLOv8n algorithms. As shown in the heatmaps, the YOLOv8n-FAWL model exhibits a higher degree of focus on small objects than the YOLOv8n model. The results indicate that the YOLOv8n-FAWL algorithm pays more attention to feature information, resulting in higher sensitivity to target detection and better performance.

FIGURE 10. Heat map comparison.

Figure 11 shows a comparison of the detection results between the YOLOv8n-FAWL and YOLOv8n algorithms on the Udacity and BDD100K-tiny datasets, showing the actual detection outcomes under various conditions, including daytime, nighttime, significant changes in lighting conditions, and environments with numerous small targets. It is evident that the YOLOv8n-FAWL algorithm exhibits superior generalization and applicability in different scenarios; it can detect small targets in various environments. By contrast, the YOLOv8n network struggles to detect smaller targets. Additionally, compared to the YOLOv8n algorithm, the YOLOv8n-FAWL algorithm achieves similar confidence levels when detecting large and medium-sized targets, demonstrating its robustness. This underscores the potential of YOLOv8n-FAWL for applications in diverse autonomous driving scenarios with high accuracy.

FIGURE 11. Model ablation experiment results comparison chart.

In summary, this study enhances the YOLOv8n model by introducing the Faster-C2f-EMA module to improve its backbone C2f module. This modification reduces the model's parameter count and computational load while enhancing detection accuracy. Additionally, replacing the loss function with Wise-IoU effectively handles low-quality samples during training, thereby improving the model's overall performance and generalization capabilities. To further lighten the model, the LAMP pruning algorithm is utilized, which compresses the model while maintaining the original network structure and image feature representation, thereby eliminating the redundant weights in each layer. This results in improved robustness of the model and an additional increase in detection precision. Therefore, the improved YOLOv8n-FAWL model, which is characterized by fewer parameters and computations yet achieves the highest recognition accuracy, is suitable for deployment on embedded devices.

The final improved model, YOLOv8n-FAWL, further reduces the number of parameters and computational requirements to 49.2% and 74.4% of their original values, respectively. Compared with the original model, the model's average precision metrics, [email protected] and [email protected]:0.95, increased by 6.2% and 3.8%, respectively.

We also conducted ablation experiments on the BDD100K-tiny dataset, as shown in Table 7.

TABLE 7. BDD100K-tiny ablation experiment results.

The final improved model, YOLOv8n-FAWL, again reduces the number of parameters and computational requirements to 49.2% and 74.4% of their original values, respectively. Compared with the original model, the average precision metrics of the model, [email protected] and [email protected]:0.95, increased by 4.5% and 2.7%, respectively. In summary, the improved model has made significant progress in reducing the parameter count and computational intensity, while also achieving considerable enhancement in model precision. It exhibits good portability and is suitable for autonomous driving detection tasks in complex road environments.

4) EMBEDDED DEVICE EXPERIMENT ANALYSIS
To ensure real-time inference of the model on edge devices, the following acceleration measures were taken for the model proposed in this paper.




First, the improved model was converted into the universal ONNX model file format. Subsequently, TensorRT parses the ONNX model to create an FP32 engine model file. This approach maintains floating-point precision, resulting in a minimal impact on the accuracy of the detection model, thereby demonstrating its broad applicability.
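The conversion chain described above (PyTorch weights to ONNX, then to an FP32 TensorRT engine) can be scripted roughly as follows. This is a hedged sketch: it assumes the Ultralytics YOLO API for the ONNX export and the standard trtexec tool for engine building on the Jetson; the exact flags and file paths may differ from the authors' setup.

```python
import subprocess
from ultralytics import YOLO   # assumes the Ultralytics package is installed

# 1) Export the trained model to the portable ONNX format
model = YOLO("yolov8n-fawl.pt")          # hypothetical path to the pruned, fine-tuned weights
onnx_path = model.export(format="onnx", imgsz=640)

# 2) Build an FP32 TensorRT engine on the Jetson Orin Nano with trtexec
#    (FP32 is the default precision, so no --fp16/--int8 flag is passed)
subprocess.run(
    ["trtexec", f"--onnx={onnx_path}", "--saveEngine=yolov8n-fawl.engine"],
    check=True,
)
```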
In this study, an external USB camera was used to measure real-time FPS data while deploying the model on the Jetson Orin Nano for live detection. Concurrently, the converted model was employed to perform inference on the validation set images, and the results were saved as JSON files for testing the average precision at [email protected] and [email protected]:0.95.
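Real-time FPS with a USB camera can be measured with a loop of this shape; OpenCV's VideoCapture is assumed for the camera, and the inference call is a placeholder for the TensorRT engine (any callable can be timed this way).

```python
import time
import cv2  # OpenCV is assumed for camera capture

def measure_camera_fps(infer, camera_index=0, num_frames=200):
    """Run `infer` on frames from a USB camera and report frames per second (Eq. 9)."""
    cap = cv2.VideoCapture(camera_index)
    processed = 0
    start = time.perf_counter()
    while processed < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        infer(frame)            # placeholder for TensorRT engine inference on one frame
        processed += 1
    cap.release()
    elapsed = time.perf_counter() - start
    return processed / elapsed if elapsed > 0 else 0.0

if __name__ == "__main__":
    dummy_infer = lambda frame: cv2.resize(frame, (640, 640))   # stands in for the detector
    print(f"{measure_camera_fps(dummy_infer):.1f} FPS")
```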




TABLE 8. Results of the embedded deployment experiment.

As shown in Table 8, except for YOLOv8n, which maintained its accuracy after conversion, all other models experienced some degree of accuracy loss. In terms of the average precision metrics [email protected] and [email protected]:0.95, as well as the FPS metric, YOLOv8n-FAWL achieved the highest scores, meeting the requirements for real-time detection in vehicle-mounted testing.

5) SUPPLEMENTARY NOTE DISCUSSION
In the original YOLOv8n network, the backbone section employs the concept of multi-scale feature fusion, which plays a crucial role in the accuracy of the model. Therefore, this study aimed to improve the feature extraction capability by modifying the C2f module in the backbone.




As shown in Table 9, experiments were conducted in which the C2f module in the backbone was replaced with Faster-C2f, as well as with Faster-C2f fused with an attention mechanism. It can be observed that the Faster-C2f-EMA designed in this study achieved the best results in all aspects.

TABLE 9. Ablation experiment results of module replacement locations.

V. CONCLUSION
For the autonomous driving recognition task studied in this paper, considering the significant variations in lighting conditions and the uneven distribution of target scales across the two datasets, as well as the need to ensure the model's portability and real-time performance, we made improvements to the high-performance YOLOv8n model. This has led to the creation of a lightweight detection model called YOLOv8n-FAWL. Through a series of ablation experiments and comparative evaluations with other object detection models, including experimental tests on embedded boards, the following conclusions were drawn:

(1) By integrating the Faster-Block from the FasterNet network with an EMA attention mechanism, the bottleneck module in the C2f segment of the YOLOv8n backbone was replaced, creating a lightweight Faster-C2f-EMA module. In addition, the Wise-IoU loss function was introduced as a replacement for the CIoU loss function. The model was then compressed using the LAMP pruning algorithm. The computational complexity and parameter size of the improved lightweight model were reduced to 74.3% and 49.2% of those of the original model, respectively. In server-side evaluations, the enhanced model shows a 6.1% increase in [email protected] and a 3.7% increase in [email protected]:0.95 over the original YOLOv8n, indicating that the enhanced model not only reduced its volume but also significantly enhanced recognition accuracy while remaining lightweight.

(2) Compared with the current mainstream autonomous driving detection methods, the lightweight autonomous driving detection model improved from YOLOv8n in this study can, after being accelerated by TensorRT, be deployed on embedded devices to achieve rapid and precise recognition while meeting the requirements for real-time detection. This validates the feasibility and portability of the YOLOv8n-FAWL detection model and offers valuable information for the potential deployment of future autonomous driving detection models on mobile devices.

(3) A limitation of this study is that the improved model does not demonstrate a significant increase in inference speed. In subsequent research, we plan to perform model distillation while maintaining accuracy to further reduce the number of model parameters and alleviate the computational burden on hardware systems in autonomous driving applications.

ACKNOWLEDGMENT
The authors would like to thank Prof. Rongrong Chen and Prof. Wuyang Xue for their guidance in the experiments and for revising the article during the research period. This article solely utilized AI to assist in translating sentences.

REFERENCES
[1] O. Tuzel, F. Porikli, and P. Meer, "Pedestrian detection via classification on Riemannian manifolds," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 10, pp. 1713–1727, Oct. 2008.
[2] S. Ren, K. He, and R. Girshick, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 28, 2015.
[3] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2999–3007.
[4] R. Huang, J. Pedoeem, and C. Chen, "YOLO-LITE: A real-time object detection algorithm optimized for non-GPU computers," in Proc. IEEE Int. Conf. Big Data, Dec. 2018, pp. 2503–2510.
[5] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767.
[6] Z. Tong, Y. Chen, Z. Xu, and R. Yu, "Wise-IoU: Bounding box regression loss with dynamic focusing mechanism," 2023, arXiv:2301.10051.
[7] J. Lee, S. Park, S. Mo, S. Ahn, and J. Shin, "Layer-adaptive sparsity for the magnitude-based pruning," 2020, arXiv:2010.07611.
[8] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[9] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[10] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," 2020, arXiv:2004.10934.
[11] C.-Y. Wang, H.-Y. Mark Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, "CSPNet: A new backbone that can enhance learning capability of CNN," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jun. 2020, pp. 1571–1580.
study after being accelerated by TensorRT can be deployed Workshops (CVPRW), Jun. 2020, pp. 1571–1580.




[12] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, Sep. 2015.
[13] X. Zhu, S. Lyu, X. Wang, and Q. Zhao, "TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops (ICCVW), Oct. 2021, pp. 2778–2788.
[14] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 7464–7475.
[15] P. Yan, W. Wang, G. Li, Y. Zhao, J. Wang, and Z. Wen, "A lightweight coal gangue detection method based on multispectral imaging and enhanced YOLOv8n," Microchemical J., vol. 199, Apr. 2024, Art. no. 110142.
[16] X. Zhang, X. Zhou, M. Lin, and J. Sun, "ShuffleNet: An extremely efficient convolutional neural network for mobile devices," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 6848–6856.
[17] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, "MobileNets: Efficient convolutional neural networks for mobile vision applications," 2017, arXiv:1704.04861.
[18] K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, "GhostNet: More features from cheap operations," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 1577–1586.
[19] J. Chen, S.-H. Kao, H. He, W. Zhuo, S. Wen, C.-H. Lee, and S.-H. G. Chan, "Run, don't walk: Chasing higher FLOPS for faster neural networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2023, pp. 12021–12031.
[20] D. Ouyang, S. He, G. Zhang, M. Luo, H. Guo, J. Zhan, and Z. Huang, "Efficient multi-scale attention module with cross-spatial learning," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Jun. 2023, pp. 1–5.
[21] S. Gao, F. Huang, W. Cai, and H. Huang, "Network pruning via performance maximization," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 9266–9276.
[22] K. Fotopoulos, P. Maragos, and P. Misiakos, "TropNNC: Structured neural network compression using tropical geometry," 2024, arXiv:2409.03945.
[23] Z. Liu, J. Li, Z. Shen, G. Huang, S. Yan, and C. Zhang, "Learning efficient convolutional networks through network slimming," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2755–2763.
[24] X. Zhang, J. Sun, J. Wang, Y. Jin, L. Wang, and Z. Liu, "PAOLTransformer: Pruning-adaptive optimal lightweight transformer model for aero-engine remaining useful life prediction," Rel. Eng. Syst. Saf., vol. 240, Dec. 2023, Art. no. 109605.
[25] S. Wang, T. Xie, H. Liu, X. Zhang, and J. Cheng, "PSE-Net: Channel pruning for convolutional neural networks with parallel-subnets estimator," Neural Netw., vol. 174, Jun. 2024, Art. no. 106263.
[26] G. Jocher, A. Chaurasia, A. Stoken, J. Borovec, Y. Kwon, J. Fang, K. Michael, D. Montes, J. Nadar, P. Skalski, and Z. Wang, "Ultralytics/YOLOv5: v6.1—TensorRT, TensorFlow Edge TPU and OpenVINO export and inference," Zenodo, Tech. Rep., 2022.
[27] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, and Q. Hu, "ECA-Net: Efficient channel attention for deep convolutional neural networks," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2020, pp. 11531–11539.
[28] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," in Proc. Eur. Conf. Comput. Vis. (ECCV), 2018, pp. 3–19.
[29] Q. Hou, D. Zhou, and J. Feng, "Coordinate attention for efficient mobile network design," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 13708–13717.
[30] A. Buyval, A. Gabdullin, R. Mustafin, and I. Shimchik, "Realtime vehicle and pedestrian tracking for DiDi Udacity self-driving car challenge," in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2018, pp. 2064–2069.
[31] F. Yu, H. Chen, X. Wang, W. Xian, Y. Chen, F. Liu, V. Madhavan, and T. Darrell, "BDD100K: A diverse driving dataset for heterogeneous multitask learning," 2018, arXiv:1805.04687.
[32] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot MultiBox detector," in Proc. 14th Eur. Conf. Comput. Vis. (ECCV), Amsterdam, The Netherlands, Oct. 2016, pp. 21–37.
[33] L. Fu, Y. Feng, J. Wu, Z. Liu, F. Gao, Y. Majeed, A. Al-Mallahi, Q. Zhang, R. Li, and Y. Cui, "Fast and accurate detection of kiwifruit in orchard using improved YOLOv3-tiny model," Precis. Agricult., vol. 22, no. 3, pp. 754–776, Jun. 2021.
[34] Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, and J. Chen, "DETRs beat YOLOs on real-time object detection," 2023, arXiv:2304.08069.
[35] A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, and G. Ding, "YOLOv10: Real-time end-to-end object detection," 2024, arXiv:2405.14458.

ZIBIN CAI was born in Shaoguan, Guangdong, China, in 2001. He is currently pursuing the bachelor's degree with the School of Electronic and Electrical Engineering, Zhaoqing University.

RONGRONG CHEN received the Bachelor of Engineering degree from Beijing Institute of Technology, in 2006, and the Master of Science degree from Blekinge Institute of Technology, in 2008. Currently, she is an Associate Professor and the Dean of the Department of Electronics and Communication Engineering, School of Electronics and Electrical Engineering, Zhaoqing University. Her research interests include digital signal processing, fundamentals of computer application and C++ programming, signals and systems, and related fields.

ZIYI WU was born in Yunfu, Guangdong, China, in 2004. He is currently pursuing the bachelor's degree with the School of Electronic and Electrical Engineering, Zhaoqing University.

WUYANG XUE received the B.S. and M.S. degrees in electrical engineering and the Ph.D. degree in information and communication engineering from Shanghai Jiao Tong University, Shanghai, China, in 2014, 2017, and 2022, respectively. His research interests include robotic navigation, path planning, obstacle avoidance, and deep learning.

