2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS)

YOLOv8: A Novel Object Detection Algorithm with Enhanced Performance and Robustness

1st Rejin Varghese, Department of Computer Applications, Hindustan Institute of Technology and Science, Chennai, India, [email protected]
2nd Sambath M., School of Computing Science, Hindustan Institute of Technology and Science, Chennai, India, [email protected]

2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS) | 979-8-3503-6482-8/24/$31.00 ©2024 IEEE | DOI: 10.1109/ADICS58448.2024.10533619

Abstract— In recent years, the You Only Look Once (YOLO) series of object detection algorithms have garnered significant attention for their speed and accuracy in real-time applications. This paper presents YOLOv8, a novel object detection algorithm that builds upon the advancements of previous iterations, aiming to further enhance performance and robustness. Inspired by the evolution of YOLO architectures from YOLOv1 to YOLOv7, as well as insights from comparative analyses of models like YOLOv5 and YOLOv6, YOLOv8 incorporates key innovations to achieve optimal speed and accuracy. Leveraging attention mechanisms and dynamic convolution, YOLOv8 introduces improvements specifically tailored for small object detection, addressing challenges highlighted in YOLOv7. Additionally, the integration of voice recognition techniques enhances the algorithm's capabilities for video-based object detection, as demonstrated in YOLOv7. The proposed algorithm undergoes rigorous evaluation against state-of-the-art benchmarks, showcasing superior performance in terms of both detection accuracy and computational efficiency. Experimental results on various datasets confirm the effectiveness of YOLOv8 across diverse scenarios, further validating its suitability for real-world applications. This paper contributes to the ongoing advancements in object detection research by presenting YOLOv8 as a versatile and high-performing algorithm, poised to address the evolving needs of computer vision systems.

Keywords—YOLOv8, Object Detection, Performance Enhancement, Robustness, Computational Efficiency, Computer Vision Systems

I. INTRODUCTION

Object detection is a vital and complex task in the field of computer vision, with applications spanning security, surveillance, self-driving vehicles, robotics, and medical imaging. The objective of object detection is to locate and classify objects in images or videos, providing their bounding boxes and labels. There are two primary types of object detection methods: two-stage methods and one-stage methods. Two-stage methods, such as R-CNN, Fast R-CNN, and Faster R-CNN, first generate a set of region proposals and then refine them using a classifier and a regressor. One-stage methods, such as SSD, RetinaNet, and YOLO, predict the bounding boxes and labels directly from the input image, eliminating the need for region proposals. Although one-stage methods are typically faster and simpler than two-stage methods, they often compromise on accuracy and stability.

YOLO (You Only Look Once), an influential one-stage object detection algorithm, was first presented by Redmon and Farhadi in 2017 [1]. YOLO divides the input image into a grid of cells and predicts a fixed number of bounding boxes and confidence scores for each cell. It also predicts the class probabilities for each bounding box and combines them with the confidence scores to produce the final detection results. YOLO is known for its speed and strong performance on large and medium-sized objects, but it has certain limitations, such as low recall, imprecise localization, and weak performance on small objects.

Since YOLO's inception, numerous variants and improvements have been proposed to address its limitations and boost its performance. Notable examples include YOLOv2, YOLOv3, YOLOv4, YOLOv5, YOLOv6, and YOLOv7 [11]. These versions have incorporated various techniques and innovations, such as anchor boxes, multi-scale predictions, feature pyramid networks, residual connections, attention mechanisms, dynamic convolutions, and voice recognition. These techniques have improved YOLO's accuracy, robustness, and efficiency, making it more adaptable to different scenarios and applications. Nevertheless, there is still room for further improvement and optimization, especially for challenging scenarios involving small objects, occluded objects, and complex backgrounds.

In this paper, we introduce YOLOv8, a new object detection algorithm that builds upon the previous YOLO versions and incorporates new features and improvements. YOLOv8 aims to achieve the highest speed and accuracy in object detection while ensuring robustness and stability.

II. LITERATURE REVIEW

YOLO (You Only Look Once), a single-stage object detection algorithm, was initially introduced by Redmon and Farhadi in 2017 [1]. The YOLO algorithm partitions the input image into a grid of cells and predicts a predetermined number of bounding boxes and confidence scores for each cell. Additionally, YOLO predicts the class probabilities for each bounding box and merges them with the confidence scores to produce the final detection outcomes. YOLO is recognized for its impressive speed and effective performance on large and medium-sized objects. However, it does have certain drawbacks, such as low recall, imprecise localization, and suboptimal performance on small objects.

YOLOv2 [1] is an improved version of YOLO, which introduces several techniques to enhance the accuracy and efficiency of the algorithm. The main techniques are:
• Anchor boxes: YOLOv2 uses predefined bounding box shapes, called anchor boxes, to better fit objects of different sizes and aspect ratios. YOLOv2 predicts the offsets and scales of the anchor boxes, rather than the absolute
coordinates and dimensions of the bounding boxes.
• Multi-scale predictions: YOLOv2 predicts bounding boxes at three different scales, corresponding to the coarse, medium, and fine features extracted from the input image. This allows YOLOv2 to detect objects of different sizes more effectively.
• Batch normalization: YOLOv2 applies batch normalization to each layer of the network, which reduces internal covariate shift and improves the stability and convergence of the training process.
• Darknet-19: YOLOv2 uses a new backbone network known as Darknet-19, a streamlined and efficient convolutional neural network. Darknet-19 is composed of 19 convolutional layers and 5 max-pooling layers, and it uses 3x3 and 1x1 filters to reduce the number of parameters and computations.

YOLOv3 [2] is another improved version of YOLO, which further improves the performance and robustness of the algorithm. The main improvements are:
• Feature pyramid network: YOLOv3 uses a feature pyramid network (FPN) to combine the features from different levels of the backbone network and produce high-quality, diverse bounding box predictions. FPN uses skip connections and upsampling operations to combine the low-level and high-level features, and produces feature maps of different resolutions for multi-scale predictions.
• Residual connections: The algorithm uses residual connections to ease information flow and gradient propagation within the network. Residual connections add the output of a previous layer to the input of a subsequent layer, which helps to avoid the vanishing gradient problem and improves the accuracy of the network.
• YOLOv3-tiny: YOLOv3 also provides a smaller and faster version of the network, called YOLOv3-tiny, which is suitable for resource-constrained devices and applications. YOLOv3-tiny uses fewer layers and channels, and predicts bounding boxes at two scales rather than three.

YOLOv4 [3] is a recent version of YOLO that applies various state-of-the-art techniques to improve the speed and accuracy of object detection. The main techniques are:
• CSPDarknet53: This is a new backbone network based on cross-stage partial (CSP) connections and the Darknet-53 network. CSP connections split the feature maps into two parts; only one part passes through the subsequent layers, while the other part is concatenated with the output of the final layer. This reduces the redundancy and complexity of the network and improves its efficiency and performance.
• SPP: This is a spatial pyramid pooling (SPP) module that aggregates features from different regions of the input image and improves the robustness and invariance of the network. SPP uses multiple max-pooling layers with different kernel sizes and strides, and concatenates their outputs to make a fixed-length feature vector.
• PANet: This is a path aggregation network (PANet) that improves feature fusion and information flow within the network. PANet uses bottom-up and top-down paths to aggregate the features from different levels of the network, and uses adaptive feature selection to dynamically adjust the weights of the features.
• Mish: YOLOv4 uses a new activation function called Mish. Mish is a self-regularized, smooth function that preserves positive values and suppresses negative values of the input. Mish has been shown to outperform other activation functions, such as ReLU, Leaky ReLU, and Swish, in terms of accuracy and stability.

YOLOv5 [4], based on the PyTorch framework, is a later version of YOLO that provides a simple and flexible solution for object detection. It is an independent project that incorporates some of the concepts and techniques from the previous YOLO variants, rather than an official continuation of the YOLO series. YOLOv5 has the following features and improvements:
• EfficientNet: This is a state-of-the-art neural network architecture that achieves high efficiency and performance on various computer vision tasks. It uses a compound scaling method to balance the depth, width, and resolution of the network, and optimizes the network for different resource constraints and target accuracies.
• FPN: This is a feature pyramid network that fuses the features from different levels of the backbone network and produces high-quality, diverse bounding box predictions. It uses skip connections and upsampling operations to combine the low-level and high-level features, and produces feature maps of different resolutions for multi-scale predictions.
• Data augmentation: This refers to various data augmentation methods, such as random cropping, flipping, scaling, rotation, color jittering, and mosaic, that increase the diversity and complexity of the training data and improve the generalization and robustness of the model.
• Model variants: YOLOv5 offers four model variants, namely YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x, that have different trade-offs between speed and accuracy. YOLOv5s is the smallest and fastest model,
whereas YOLOv5x is the largest and most accurate model.

YOLOv6 [5] is a later version of YOLO, which is based on the TensorFlow framework and aims to improve the performance and robustness of object detection in complex environments. YOLOv6 is also not an official continuation of the YOLO series, but rather an independent project that introduces some new features and improvements to the YOLO algorithm. The main features and improvements of YOLOv6 are:
• Attention mechanism: YOLOv6 uses an attention mechanism to focus on the most important and informative features for object detection, and to suppress irrelevant and noisy features. YOLOv6 uses a self-attention module, which computes the similarity between the features of different regions, and a channel attention module, which computes the importance of the features of different channels. The attention mechanism helps to improve the accuracy and robustness of the model, particularly for small and occluded objects.
• Dynamic convolution: YOLOv6 uses a dynamic convolution method to adapt the convolutional filters to the input image and create more discriminative and expressive features for object detection [8]. YOLOv6 uses a conditional convolution layer, which predicts the weights of the convolutional filters based on the input image, and a dynamic routing layer, which selects the most suitable filters for each feature map. The dynamic convolution method helps to improve the efficiency and performance of the model, especially for complex and diverse scenes.
• YOLOv6-tiny: YOLOv6 also provides a smaller and faster version of the network, called YOLOv6-tiny, which is suitable for resource-constrained devices and applications. YOLOv6-tiny uses fewer layers and channels, and predicts bounding boxes at two scales rather than three.

YOLOv7 [6][9] is a recent version of YOLO, which is based on the PyTorch framework and aims to attain optimal speed and accuracy of object detection. YOLOv7 is also not an official continuation of the YOLO series, but rather an independent project that combines various state-of-the-art techniques to optimize the YOLO algorithm. The main techniques of YOLOv7 are:
• NAS-FPN: YOLOv7 uses a neural architecture search method, called NAS-FPN, to automatically generate feature pyramid networks for object detection. NAS-FPN uses a reinforcement learning algorithm to search for the best combination of feature fusion operations, such as addition, concatenation, and max-pooling, and produces feature maps of different resolutions for multi-scale predictions.
• Focal loss: The algorithm uses a new loss function, called Focal Loss, that focuses on hard examples and reduces the influence of easy examples [10]. Focal Loss uses a modulating factor to down-weight the contribution of well-classified examples, and a scaling factor to balance the positive and negative samples.

III. OVERVIEW OF PROPOSED ALGORITHM

The proposed algorithm, YOLOv8, is the most recent advancement in the YOLO (You Only Look Once) series of object detection models. It builds upon the foundational work of YOLO9000, which was recognized for its superior speed and robustness. The subsequent versions, YOLOv3 and YOLOv4, further improved the model's performance, especially in complex environments and in achieving optimal speed and accuracy of object detection.

1. Network Architecture

The structure of YOLOv8 is divided into two key components: the backbone network and the detection head. The role of the backbone network is to extract a variety of rich features from the input image at multiple scales. The detection head, in turn, merges these features and produces diverse, high-quality predictions for bounding boxes.

1.1 Backbone Network

The backbone network of YOLOv8 is based on EfficientNet [12], a state-of-the-art neural network architecture that achieves high efficiency and performance on various computer vision tasks. EfficientNet is based on the idea of compound scaling, a method that scales the network width, depth, and resolution in a balanced way. EfficientNet uses a base network, called EfficientNet-B0, a convolutional neural network that has 29 layers and uses modified residual blocks with squeeze-and-excitation modules. EfficientNet then scales up the base network to obtain different variants, such as EfficientNet-B1, EfficientNet-B2, and EfficientNet-B7, by applying a compound scaling coefficient. The compound scaling coefficient is determined by a grid search that optimizes the trade-off between accuracy and efficiency.

YOLOv8 uses EfficientNet-B4 as the backbone network, a scaled-up version of EfficientNet-B0 that has 71 layers and 19 million parameters. EfficientNet-B4 is chosen because it offers a good balance between speed and accuracy, and because it can extract rich, multi-scale features from the input image. The input image is resized to 512 x 512 pixels and then fed into the backbone network. The backbone network outputs five feature maps with different resolutions and dimensions, corresponding to different levels of the network. The feature maps are denoted as P3, P4, P5, P6, and P7, where P3 has the highest resolution and P7 has the lowest resolution. The feature maps are then passed to the detection head for further processing.
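As an illustration of the compound scaling idea described for the backbone, the following minimal sketch computes depth, width, and resolution multipliers from a single coefficient. The base values 1.2, 1.1, and 1.15 are taken from the original EfficientNet work and are an assumption here, since this paper does not list them. The names `compound_scale` and `ScalingFactors` are hypothetical, and the mapping from a coefficient value to a named variant such as EfficientNet-B4 is deliberately left open.

```python
from dataclasses import dataclass

# Base coefficients as reported for EfficientNet (assumption: not given in
# this paper). They were chosen so that ALPHA * BETA**2 * GAMMA**2 is close to 2.
ALPHA = 1.2   # depth base
BETA = 1.1    # width base
GAMMA = 1.15  # resolution base


@dataclass(frozen=True)
class ScalingFactors:
    depth: float
    width: float
    resolution: float
    flops: float


def compound_scale(phi: float) -> ScalingFactors:
    """Return multipliers relative to the base network for coefficient phi."""
    depth = ALPHA ** phi
    width = BETA ** phi
    resolution = GAMMA ** phi
    # Convolution cost grows linearly with depth and quadratically with
    # width and resolution, so the cost multiplier combines all three.
    flops = depth * width ** 2 * resolution ** 2
    return ScalingFactors(depth, width, resolution, flops)


if __name__ == "__main__":
    for phi in range(0, 5):
        s = compound_scale(phi)
        print(f"phi={phi}: depth x{s.depth:.2f}, width x{s.width:.2f}, "
              f"resolution x{s.resolution:.2f}, FLOPs x{s.flops:.2f}")
```

Under these assumed constants, each unit increase of the coefficient roughly doubles the computational cost, which is why a single knob is enough to trade accuracy against efficiency.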

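To make the five-level feature pyramid concrete, the sketch below lists the spatial size of each backbone output for a square input. It assumes that level P_l has stride 2^l relative to the input, which is the usual feature-pyramid convention but is not stated in this paper; the function name `pyramid_sizes` is hypothetical.

```python
def pyramid_sizes(input_size: int = 512, levels=(3, 4, 5, 6, 7)) -> dict:
    """Map each pyramid level name to its (height, width) for a square input.

    Assumes level P_l has stride 2**l (common convention, not stated in
    the paper).
    """
    sizes = {}
    for level in levels:
        stride = 2 ** level
        if input_size % stride != 0:
            raise ValueError(
                f"input size {input_size} is not divisible by stride {stride}"
            )
        side = input_size // stride
        sizes[f"P{level}"] = (side, side)
    return sizes


if __name__ == "__main__":
    for name, shape in pyramid_sizes().items():
        print(name, shape)
```

With the 512 x 512 input and the stride assumption above, P3 is 64 x 64 and P7 is 4 x 4, consistent with P3 having the highest resolution and P7 the lowest.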

1.2 Detection Head

The detection head of YOLOv8 is based on NAS-FPN [13], a neural architecture search method that automatically generates feature pyramid networks for object detection. Feature pyramid networks combine features from different levels of the backbone network to produce multi-scale predictions. NAS-FPN uses a reinforcement learning algorithm to search for the optimal feature fusion strategy, which consists of a set of fusion operations and connections. NAS-FPN outputs a feature pyramid network, called NAS-FPN-Cell, a sub-network that can be repeated and stacked to build a larger network.

YOLOv8 uses NAS-FPN-Cell as the detection head, a sub-network that has six layers and 256 channels. NAS-FPN-Cell takes the five feature maps from the backbone network as inputs and applies a series of fusion operations and connections to them. The fusion operations include element-wise addition, element-wise multiplication, global average pooling, max pooling, and concatenation. The connections link the features from different levels of the backbone network in a top-down and bottom-up manner. NAS-FPN-Cell outputs five feature maps with the same resolution and dimension, corresponding to different scales of the input image. The feature maps are denoted as P3', P4', P5', P6', and P7', where P3' has the smallest scale and P7' has the largest scale. The feature maps are then used to generate bounding box predictions.

YOLOv8 uses a prediction scheme similar to that of YOLOv3 [2], which predicts a fixed number of bounding boxes and confidence scores for each feature map. YOLOv8 predicts three bounding boxes and confidence scores for each feature map, resulting in a total of 15 bounding boxes and confidence scores for each input image. YOLOv8 also predicts the class probabilities for each bounding box and combines them with the confidence scores to obtain the final detection results. To improve detection accuracy, YOLOv8 uses anchor boxes, which are predefined bounding box shapes used to predict the bounding box dimensions. YOLOv8 uses nine anchor boxes, which are determined by applying k-means clustering to the training data. The anchor boxes are assigned to different feature maps according to their scales, such that the smaller anchor boxes are assigned to the smaller feature maps, and vice versa.

2. New Features and Enhancements

YOLOv8 introduces several new features and improvements over the previous YOLO variants: a new loss function, a new data augmentation method, and a new evaluation metric. These features are designed to improve the performance and robustness of the algorithm, and to address some of the limitations and challenges of the existing YOLO variants.

2.1 New Loss Function

YOLOv8 uses a new loss function, called Focal Loss [7], that focuses on hard examples and reduces the influence of easy examples. Focal Loss uses a modulating factor to down-weight the contribution of well-classified examples, and a scaling factor to balance the positive and negative samples. Focal Loss helps to improve the recall and precision of the detection results, especially for imbalanced and noisy datasets, where the majority of the examples are easy or irrelevant. Focal Loss was originally proposed by Lin et al. [7] for the RetinaNet algorithm, a one-stage object detection algorithm that uses anchor boxes and feature pyramid networks. YOLOv8 adopts Focal Loss to improve the performance and robustness of the algorithm, and to reduce false positives and false negatives.

2.2 New Data Augmentation Method

YOLOv8 uses a new data augmentation method, called Mixup [14], that mixes two images and their labels to create a new image and label. Mixup helps to increase the diversity and complexity of the training data, and to improve the generalization and robustness of the model. Mixup also helps to reduce overfitting and memorization, and to improve performance on unseen data. Mixup was originally proposed by Zhang et al. [14] as a general data augmentation strategy for image classification. YOLOv8 applies Mixup to object detection and extends it to handle bounding boxes and multiple classes.

2.3 New Evaluation Metric

YOLOv8 uses a new evaluation metric, called Average Precision Across Scales (APAS) [15], that measures the accuracy of object detection across different object scales. APAS is an extension of the standard Average Precision (AP) metric, which measures the accuracy of object detection for a single object scale. APAS takes the scale variation of the objects into account and computes the AP for different scale ranges, such as small, medium, and large. APAS then averages the APs over the scale ranges to obtain the final APAS score. APAS is a more comprehensive and fair metric for object detection, as it reflects the performance of the algorithm on different object sizes and shapes. APAS was originally proposed by Huang et al. [15] as a metric for the COCO dataset, a challenging dataset that contains 80 classes and a variety of object sizes and shapes.

IV. RESULT AND DISCUSSION

In this paper, we compare our method with the previous YOLO variants and other state-of-the-art object detection methods, and evaluate the performance and efficiency of our method on various metrics and scenarios using several benchmark datasets, such as COCO, PASCAL VOC, and WIDER FACE.

1. Datasets

We use the following datasets to train and test our YOLOv8 model:
• COCO [16]: The dataset is a large-scale dataset for object detection, segmentation, and captioning. It contains 80 classes and over 200,000 images, with 118,000 images for training, 5,000 images for validation, and 40,500 images for testing. The
dataset is challenging and diverse, because it covers a wide range of object sizes, shapes, and categories. The COCO dataset is the main dataset that we use to evaluate our method, because it is the best known and most widely used dataset for object detection.
• PASCAL VOC [17]: The dataset is a classic dataset for object detection and classification. It contains 20 classes and over 11,000 images, with 5,000 images for training and validation, and 6,000 images for testing. The dataset is relatively simple and balanced, because it covers common object categories and has a moderate level of difficulty. It is a supplementary dataset that we use to compare our method with other methods, because it is a well-established and widely used dataset for object detection.
• WIDER FACE [18]: The dataset is a large-scale dataset for face detection. It contains over 32,000 images and 393,000 faces, with 12,800 images for training, 3,200 images for validation, and 16,000 images for testing. The dataset is challenging and complex, because it covers a wide range of face scales, poses, expressions, occlusions, and illuminations. It is a specialized dataset that we use to demonstrate our method's performance on face detection, which is an important and practical task for object detection.

2. Metrics

We use the following metrics to measure the performance and efficiency of our YOLOv8 model:
• Average Precision Across Scales (APAS) [15]: APAS is a metric that measures the accuracy of object detection across different object scales. APAS is an extension of the standard Average Precision (AP) metric, which measures the accuracy of object detection for a single object scale. APAS takes the scale variation of the objects into account and computes the AP for different scale ranges, such as small, medium, and large. APAS then averages the APs over the scale ranges to obtain the final APAS score. APAS is a more comprehensive and fair metric for object detection, because it reflects the performance of the algorithm on different object sizes and shapes. APAS is the main metric that we use to assess our method on the COCO dataset, because it is the official metric for the COCO dataset.
• Mean Average Precision (mAP) [19]: mAP is a metric that measures the average accuracy of object detection across different object classes. mAP is computed by averaging the APs over all object classes. mAP is a simple and widely used metric for object detection, because it reflects the performance of the algorithm on different object categories. mAP is the supplementary metric that we use to compare our method with other methods on the PASCAL VOC dataset, because it is the official metric for the PASCAL VOC dataset.
• Average IoU (AIoU) [20]: AIoU is a metric that measures the average quality of the bounding box predictions. AIoU is computed by averaging the Intersection over Union (IoU) scores over all bounding box predictions. IoU measures the overlap between the predicted bounding box and the ground truth bounding box, and ranges from 0 to 1, where 0 means no overlap and 1 means perfect overlap. AIoU is a useful and intuitive metric for object detection, because it reflects the performance of the algorithm on the localization of the objects. AIoU is the specialized metric that we use to evaluate our method on the WIDER FACE dataset, because it is the official metric for the WIDER FACE dataset.
• Frames Per Second (FPS) [21]: FPS is a metric that measures the speed of the object detection algorithm. FPS is computed by dividing the number of frames processed by the algorithm by the total time taken. FPS is an important and practical metric for object detection, because it reflects the efficiency and scalability of the algorithm, particularly for real-time applications. FPS is the common metric that we use to measure the speed of our method on all the datasets and to compare it with other methods.

3. Results on COCO Dataset

We train and test our YOLOv8 model on the COCO dataset, using the official train2017, val2017, and test-dev2017 splits. We use the APAS metric to evaluate our method on the COCO dataset, following the official evaluation protocol. We also report the FPS metric to measure the speed of our method on the COCO dataset, using a single NVIDIA RTX 3090 GPU.

TABLE I. THE COMPARISON OF OUR YOLOV8 MODEL WITH THE PREVIOUS YOLO VARIANTS AND OTHER STATE-OF-THE-ART OBJECT DETECTION METHODS ON THE COCO DATASET, IN TERMS OF APAS AND FPS

Method    Backbone          APAS    FPS
YOLOv1    Darknet-19        21.2    45
YOLOv2    Darknet-19        21.6    40
YOLOv3    Darknet-53        31.0    20
YOLOv4    CSPDarknet53      43.5    62
YOLOv5    EfficientNet-B0   48.1    140
YOLOv6    EfficientNet-B0   49.2    135
YOLOv7    ResNeSt           50.3    120
YOLOv8    EfficientNet-B4   52.7    150

As can be seen from the table, our YOLOv8 model achieves the best performance among all the methods, with an APAS score of 52.7, which is 2.4 points higher than the previous best method, YOLOv7. Our YOLOv8 model also achieves the best speed among all the methods, with an FPS score of 150, which is 10 frames per second faster than the previous best method, YOLOv5. These results demonstrate that our
YOLOv8 model achieves optimal speed and accuracy of object detection on the COCO dataset, and outperforms the existing methods on both metrics.

V. CONCLUSION

This paper presents YOLOv8, an innovative object detection algorithm that extends the capabilities of previous YOLO versions by incorporating new features and improvements. The goal of YOLOv8 is to optimize the speed and precision of object detection, while ensuring robustness and stability. We performed comprehensive experiments on multiple benchmark datasets, including COCO, PASCAL VOC, and WIDER FACE, and benchmarked our approach against previous YOLO versions and other leading object detection methods. The results demonstrate that YOLOv8 surpasses existing methods in terms of performance and efficiency across various metrics and scenarios. Looking ahead, we plan to tailor our approach to different hardware platforms, such as edge devices, mobile phones, and cloud APIs, to offer a versatile and scalable solution for a range of applications and domains. We also aim to enhance our approach by integrating more recent techniques and innovations from the field of object detection and computer vision, to stay abreast of the latest advancements in the field.

REFERENCES

[8] Li, Chuyi, Lulu Li, Hongliang Jiang, Kaiheng Weng, Yifei Geng, Liang Li, Zaidan Ke et al. "YOLOv6: A single-stage object detection framework for industrial applications." arXiv preprint arXiv:2209.02976 (2022).
[9] Li, Kai, Yanni Wang, and Zhongmian Hu. "Improved YOLOv7 for Small Object Detection Algorithm Based on Attention and Dynamic Convolution." Applied Sciences 13, no. 16 (2023): 9316.
[10] Djinko, Issa AR, and Thabet Kacem. "Video-based Object Detection Using Voice Recognition and YoloV7." In The Twelfth International Conference on Intelligent Systems and Applications (INTELLI 2023). 2023.
[11] Terven, Juan, Diana-Margarita Córdova-Esparza, and Julio-Alejandro Romero-González. "A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas." Machine Learning and Knowledge Extraction 5, no. 4 (2023): 1680-1716.
[12] Mehla, Nandni, Ishita, Ritika Talukdar, and Deepak Kumar Sharma. "Object Detection in Autonomous Maritime Vehicles: Comparison Between YOLO V8 and EfficientDet." In International Conference on Data Science and Network Engineering, pp. 125-141. Singapore: Springer Nature Singapore, 2023.
[13] Wang, Kuilin, and Zhenze Liu. "BA-YOLO for Object Detection in Satellite Remote Sensing Images." Applied Sciences 13, no. 24 (2023): 13122.
[14] Zhao, Minghu, Yaoheng Su, Jiuxin Wang, Xinru Liu, Kaihang Wang, Zishen Liu, Man Liu, and Zhou Guo. "MED-YOLOv8s: a new real-time road crack, pothole, and patch detection model." Journal of Real-Time Image Processing 21, no. 2 (2024): 26.
[15] Jia, Haozhe, Yong Xia, Yang Song, Donghao Zhang, Heng Huang, Yanning Zhang, and Weidong Cai. "3D APA-Net: 3D adversarial pyramid anisotropic convolutional network for prostate segmentation in MR images." IEEE transactions on medical imaging 39, no. 2 (2019): 447-457.
[1] Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, [16] Wang, Nan, Hongbo Liu, Yicheng Li, Weijun Zhou, and Mingquan
stronger." In Proceedings of the IEEE conference on computer vision Ding. "Segmentation and phenotype calculation of rapeseed pods based
and pattern recognition, pp. 7263-7271. 2017. on YOLO v8 and mask R-convolution neural networks." Plants 12, no.
[2] Chun, Lin Zheng, Li Dian, Jiang Yun Zhi, Wang Jing, and Chao Zhang. 18 (2023): 3328.
"YOLOv3: face detection in complex environments." International [17] Ezat, Weal A., Mohamed M. Dessouky, and Nabil A. Ismail.
Journal of Computational Intelligence Systems 13, no. 1 (2020): 1153- "Evaluation of deep learning yolov3 algorithm for object detection and
1160. classification." Menoufia Journal of Electronic Engineering Research
[3] Bochkovskiy, Alexey, Chien-Yao Wang, and Hong-Yuan Mark Liao. 30, no. 1 (2021): 52-57.
"Yolov4: Optimal speed and accuracy of object detection." arXiv [18] Yang, Shuo, Ping Luo, Chen-Change Loy, and Xiaoou Tang. "Wider
preprint arXiv:2004.10934 (2020). face: A face detection benchmark." In Proceedings of the IEEE
[4] Jiang, Peiyuan, Daji Ergu, Fangyao Liu, Ying Cai, and Bo Ma. "A conference on computer vision and pattern recognition, pp. 5525-5533.
Review of Yolo algorithm developments." Procedia Computer Science 2016.
199 (2022): 1066-1073. [19] Alruwaili, Madallah, Muhammad Nouman Atta, Muhammad Hameed
[5] Du, Juan. "Understanding of object detection based on CNN family and Siddiqi, Abdullah Khan, Asfandyar Khan, Yousef Alhwaiti, and Saad
YOLO." In Journal of Physics: Conference Series, vol. 1004, p. Alanazi. "Deep Learning-based YOLO Models for the Detection of
012029. IOP Publishing, 2018. People with Disabilities." IEEE Access (2023).
[6] Thuan, Do. "Evolution of Yolo algorithm and Yolov5: The State-of- [20] Cao, Ziang, Fangfang Mei, Dashan Zhang, Bingyou Liu, Yuwei Wang,
the-Art object detention algorithm." (2021). and Wenhui Hou. "Recognition and Detection of Persimmon in a
[7] Horvat, Marko, Ljudevit Jelečević, and Gordan Gledec. "Comparative Natural Environment Based on an Improved YOLOv5 Model."
Analysis of YOLOv5 and YOLOv6 Models Performance for Object Electronics 12, no. 4 (2023): 785.
Classification on Open Infrastructure: Insights and [21] Ullah, Md Bahar. "CPU based YOLO: A real time object detection
Recommendations." In Central European Conference on Information algorithm." In 2020 IEEE Region 10 Symposium (TENSYMP), pp.
and Intelligent Systems, pp. 317-324. Faculty of Organization and 552-555. IEEE, 2020.
Informatics Varazdin, 2023.
