p-ISSN 2083-0157, e-ISSN 2391-6761 IAPGOŚ 2/2024 89

http://doi.org/10.35784/iapgos.6056 received: 01.04.2024 | revised: 27.04.2024 | accepted: 19.06.2024 | available online: 30.06.2024

PERFORMANCE EVALUATION FOR FACE MASK DETECTION BASED ON MULT MODIFICATION OF YOLOv8 ARCHITECTURE
Muna Jaffer Al-Shamdeen, Fawziya Mahmood Ramo
University of Mosul, College of Computer Science and Mathematics, Mosul, Iraq

Abstract. This work aims to engineer a robust system capable of real-time detection that accurately discerns whether individuals are adhering to or neglecting face mask mandates across a diverse range of scenarios encompassing images, videos, and live camera streams. The study improves the YOLOv8n architecture for face mask detection by building two modified versions of the YOLOv8n model that improve its feature extraction and prediction network. The proposed YOLOv8n-v1 integrates a residual network backbone into the YOLOv8n architecture by replacing the first two layers of YOLOv8n with ResNet_Stem and ResNet_Block modules to improve the model's feature extraction ability, and replaces the Spatial Pyramid Pooling Fast (SPPF) module with a Spatial Pyramid Pooling-Cross Stage Partial (SPPCSP) module, which combines SPP and CSP to create a network that is both effective and efficient. The proposed YOLOv8n-v2 is built by integrating GhostConv and ResNet_Downsample modules into the proposed YOLOv8n-v1 backbone. All models were tested and evaluated on two datasets. The first is the MJFR dataset, which contains 23,621 images collected by the authors of this paper from four distinct datasets, all of which were used for face mask detection. The second is the MSFM object detection dataset, which was collected from groups of real-life videos and images based on the curriculum learning technique. Model performance is assessed using the following metrics: mean average precision (mAP50), mAP50-95, recall (R), and precision (P). Both versions of the proposed YOLOv8n outperform the original model in terms of accuracy on both datasets. Finally, the system was successfully deployed in a medical clinic affiliated with a medical complex, where it showed high efficiency in various aspects of work and contributed effectively to improving public health and safety.
Keywords: YOLOv8, object detection, detection algorithm, residual network


Peer-reviewed article/revised paper, IAPGOŚ 2/2024, 89–95

Introduction

Object detection applications have become embedded in our daily lives, including people counting, face detection, text detection, face mask detection, pose detection, animal detection, vehicle detection, and more. Object detection is an important and powerful computer vision method that focuses on classifying (identifying) and localizing multiple objects in images, videos, and even real-time video. The process of identifying the correct location of one or more objects using bounding boxes, which are rectangular shapes drawn around the objects, is called image localization [6]. Typically, object detectors consist of three basic parts: 1) the backbone, which extracts features from the provided image; 2) the feature network, which receives input from the backbone at various feature levels and generates a list of fused features that reflect the key aspects of the image; 3) the final class/box network, which predicts the class and position of each object using the fused features [7, 19].

The spread of the 2019 coronavirus disease, commonly referred to as COVID-19, is a major concern for everyone in the world. It is an infectious illness that has affected human life all around the world. According to medical professionals, the virus may spread through either direct or indirect contact with an infected person [25]. The world was greatly affected by the COVID-19 pandemic: globally, its contagious spread had affected nearly 172 million people as of May 2021 [16, 24]. Wearing a face mask is among the most effective strategies for avoiding the transmission of COVID-19, in accordance with the World Health Organization (WHO). Face masks are required in many nations, especially in public areas. Face masks were also previously worn to protect health against air pollution and in the medical domain [15].

Deep learning is a branch of machine learning that has recently grown in prominence and is based on artificial neural networks used to model and solve complex problems [1]. To learn features at various levels of abstraction, deep learning uses architectures with several layers between the input and output layers, with higher-level learned features expressed in terms of lower-level ones [14, 27]. The key characteristic of deep learning is that the layers of features are learned gradually from data using a general-purpose learning technique instead of being handcrafted by human engineers [10, 29].

In 2023, Yunus Eği presented a YOLOv7 deep learning-based warning system that distinguishes in real time among people who wear masks, people who do not, and people who wear masks improperly. In terms of mAP@0.5, the detection results for all classes, wearing the mask, not wearing the mask, and improper mask wearing are 0.718, 0.464, 0.922, and 0.763, respectively; the class of not wearing the mask has a much higher mAP@0.5 than the other classes [5]. In 2022, Vrigkas et al. presented three versions of the YOLO model, namely YOLOv3, YOLOv4, and YOLOv4-tiny, to recognize persons wearing masks using the Facemask picture dataset, which consists of 4866 images for the mask and no-mask classes, carefully chosen to correspond to real-world settings; the experimental results show that YOLOv4 performs better than YOLOv3 and YOLOv4-tiny [26]. In 2022, Kumar et al. suggested ETL-YOLOv4 for face mask detection, a modified version of tiny YOLOv4 that improves the feature extraction and prediction network. A dense SPP network is first added to the feature extraction network, and two additional detection layers are then added. The suggested ETL-YOLOv4 achieved 9.93% higher mAP, 5.75% higher average precision (AP) for faces with masks, and 16.6% higher AP for the face mask region in comparison to its original baseline form [11]. In 2021, Loey et al. presented a hybrid model for face mask detection using deep transfer learning and classical machine learning. The hybrid model consists of two parts. The first part uses a Residual Neural Network (ResNet50) for feature extraction, while the second part classifies face masks using support vector machines (SVM), decision trees, and ensemble algorithms. Three different masked-face datasets were used in the experiments: the Real-World Masked Face Dataset (RMFD), the Simulated Masked Face Dataset (SMFD), and the Labeled Faces in the Wild (LFW). During testing, the SVM obtained the highest detection accuracy compared to the decision tree and ensemble classifiers. The SVM attained an accuracy of 99.64% on RMFD, 99.49% on SMFD, and 100% on LFW [12].

After studying the research related to this subject, and in order to improve face mask detection results, the raw YOLOv8n model was used and studied and its results were analysed; modifications were then made to the head and backbone network architecture of the raw YOLOv8n model. The results were evaluated using standard metrics (P, R, mAP@0.5 and mAP@0.5:0.95), and very efficient results were obtained compared to the original YOLOv8n. The contributions of this paper are as follows.
1) Impact on public health and safety: this work addresses a critical societal issue by contributing to the development of automated systems that promote public health and safety. The real-time face mask detection system has the potential to assist authorities, businesses, and institutions in enforcing mask-wearing protocols during public health emergencies, thereby reducing the risk of disease transmission and enhancing community well-being.
2) Building and compilation of large datasets: two datasets were created for face mask detection, addressing the challenges posed by varying environmental conditions.
3) Two new hybrid models are proposed that combine the YOLOv8n deep learning model with a residual network backbone tailored for face mask detection, giving both proposed models higher accuracy than the original model.
4) Real-time accuracy: achieving real-time face mask detection without compromising the accuracy of the predictions. Real-time applications require swift decision-making, but maintaining a high level of precision in distinguishing between masked and unmasked individuals is imperative.
The remainder of this work is arranged as follows: Section 1 presents the original YOLOv8n architecture, Section 2 covers the proposed YOLOv8n, Section 3 describes the datasets, Section 4 covers the analysis of results, and Section 5 gives the conclusions.

1. YOLOv8 model

YOLOv8 is a newer version that builds on the success of YOLOv7. It is an object detection neural network model that is able to identify objects in images and videos. The model is made up of a backbone and a head. The backbone extracts features from the given image, while the head is responsible for detecting objects based on these features [18]. It uses a modified version of the ResNet architecture as its backbone network and introduces several new techniques, such as multiscale prediction and cross-scale connections, to further improve object detection performance [4, 28]. The backbone network architecture of YOLOv8n is presented in table 1.

Table 1. Backbone network architecture of YOLOv8n [20]

No. of layer  From  Repeat  Module type  Filter  Stride  Padding
0             -1    1       Conv         64      2       1
1             -1    1       Conv         128     2       1
2             -1    3       C2f          ---     ---     ---
3             -1    1       Conv         256     2       1
4             -1    6       C2f          ---     ---     ---
5             -1    1       Conv         512     2       1
6             -1    6       C2f          ---     ---     ---
7             -1    1       Conv         1024    2       1
8             -1    3       C2f          ---     ---     ---
9             -1    1       SPPF         ---     ---     ---

1.1. Explanation of the backbone network

The backbone network plays an important role in feature extraction in the YOLOv8 architecture: it processes the given image to generate a collection of feature maps that are then used by the detection head to predict the final object detections. The backbone is made up of a succession of convolutional layers arranged in a specific pattern to capture features at different spatial scales. In this architecture, the backbone consists of several types of modules that are repeated multiple times with varying numbers of filters, kernel sizes, and stride values [28]. Here is a brief explanation of each module:

Convolutional layer (Conv): This module performs a 2D convolution operation on the input feature map using a set of learned filters. The filters are learned during training to capture specific patterns in the data. The number of filters, kernel size, and stride values can be adjusted for each convolutional layer in the architecture [23].

Cross Stage Partial Network (C2f): This module is a modified version of the ResNet block, which uses skip connections to help gradients flow through the network during training. The C2f module consists of two convolutional layers with an identical number of filters and kernel sizes, followed by a skip connection that adds the input feature map to the output of the second convolutional layer [23]. Fig. 1 shows the structures of C2f, BottleNeck, and CBS, respectively.

Fig. 1. The Structure of (a) C2f, (b) BottleNeck, (c) CBS

Spatial Pyramid Pooling Fast (SPPF): This module performs a pooling operation on the input feature map at multiple scales to capture features at different spatial resolutions. The pooled features are merged and passed through a convolutional layer to generate a single feature map [22]. The backbone network is designed to capture features at different spatial scales, which is necessary for detecting objects of different sizes and at different distances from the camera. The C2f modules and the SPPF module help improve the flow of gradients through the network and capture features at different spatial resolutions [9, 21]. The SPPF architecture is depicted in figure 2.
Fig. 2. Architecture of SPPF [22]

1.2. Explanation of the head network

The head network consists of a series of layers that progressively refine the feature maps generated by the backbone. First, an upsampling layer doubles the resolution of the feature maps, which helps to recover spatial information that may have been lost during the downsampling performed by the backbone. Then, the head network concatenates the upsampled feature maps with feature maps from the backbone network that have been selected to have a compatible resolution. This permits the head to incorporate both low-level and high-level features from the input image [9]. Next, the concatenated feature maps are passed through a sequence of convolutional layers and other operations that reduce the dimensions of the feature maps while increasing their depth. This is done to prepare the feature maps for the final detection layer. Table 2 shows the head network architecture of YOLOv8n [22].

Table 2. Head network architecture of YOLOv8n [22]

No. of layer  From      Repeat  Module type  Filter  Stride  Padding
10            -1        1       Upsample     ---     ---     ---
11            [-1,6]    1       Concat       ---     ---     ---
12            -1        3       C2f          ---     ---     ---
13            -1        1       Upsample     ---     ---     ---
14            [-1,4]    1       Concat       ---     ---     ---
15            -1        3       C2f          ---     ---     ---
16            -1        1       Conv         256     3x3     2
17            [-1,12]   1       Concat       ---     ---     ---
18            -1        3       C2f          ---     ---     ---
19            -1        1       Conv         512     3x3     2
20            [-1,9]    1       Concat       ---     ---     ---
21            -1        3       C2f          ---     ---     ---
[15, 18, 21]            ---     Detect       ---     ---     ---

Finally, the head network passes the processed feature maps through a detection layer that predicts bounding boxes and class probabilities for the objects present in the given image. The detection layer uses anchor boxes, which are predefined boxes of various sizes and aspect ratios, to predict the location and size of objects in the image [9, 20]. The structure of YOLOv8n is displayed in figure 3 [13].

Fig. 3. YOLOv8n network architecture [13]

2. Proposed YOLOv8n

In the proposed YOLOv8n-v1, a modified architecture of the YOLOv8n model has been built that integrates a ResNet backbone into the YOLOv8n architecture to create a hybrid model. Key changes and features include:
• The ResNet_Stem and ResNet_Block modules are integrated with the YOLOv8n backbone: the first two layers are removed and replaced with ResNet_Stem and ResNet_Block. ResNet is a form of convolutional neural network (CNN) that is known for its ability to learn deep representations from data. ResNets have been shown to achieve state-of-the-art results on a variety of computer vision tasks, including object detection. The ResNet backbone consists of multiple essential components, each specifically tailored to enhance the feature representation. The ResNet_Stem module begins the backbone with a convolutional layer followed by batch normalization, ReLU, and max pooling, while the ResNet_Block contains a pair of 3x3 convolutional layers, each followed by batch normalization and ReLU activation. The blocks also incorporate a skip connection with a 1x1 convolution to match the dimensions when necessary. Fig. 4 shows the architecture of ResNet_Stem and ResNet_Block.
• The SPPF module is replaced with SPPCSP. The SPPCSP architecture combines spatial pyramid pooling (SPP) and cross stage partial networks (CSP) to create a network that is both effective and efficient. The SPP layer extracts features at multiple scales from an image, and the CSP layer, which divides an input into multiple stages and then partially connects the stages, enables the network to learn more complex features. Figure 5 shows the SPPCSP architecture.

Fig. 4. The architecture of (a) the ResNet_Stem module, (b) the ResNet_Block module

Fig. 5. SPPCSP module architecture

The configuration of the proposed YOLOv8n-v1 is presented in figure 6.
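As a cross-check on the layer wiring listed in Tables 1 and 2 for the original YOLOv8n, the following sketch traces the spatial resolution of the feature map through the backbone and head. It is a minimal illustration under stated assumptions: a square 640×640 input (not given in the paper), every Conv layer halving the resolution (stride 2 in Table 1, and the value 2 listed for layers 16 and 19 in Table 2 read as the stride, as in the Ultralytics YOLOv8 configuration), and every Upsample layer doubling it. The names `LAYERS`, `DETECT_SOURCES` and `trace_resolutions` are hypothetical.

```python
# Rows follow Tables 1 and 2: (module type, source layers); -1 means the previous layer.
LAYERS = [
    ("Conv", [-1]), ("Conv", [-1]), ("C2f", [-1]), ("Conv", [-1]), ("C2f", [-1]),
    ("Conv", [-1]), ("C2f", [-1]), ("Conv", [-1]), ("C2f", [-1]), ("SPPF", [-1]),
    ("Upsample", [-1]), ("Concat", [-1, 6]), ("C2f", [-1]),
    ("Upsample", [-1]), ("Concat", [-1, 4]), ("C2f", [-1]),
    ("Conv", [-1]), ("Concat", [-1, 12]), ("C2f", [-1]),
    ("Conv", [-1]), ("Concat", [-1, 9]), ("C2f", [-1]),
]
DETECT_SOURCES = [15, 18, 21]  # inputs of the Detect layer in Table 2


def trace_resolutions(input_size=640):
    """Return the output resolution of every layer for a square input
    (input_size=640 is an assumption, not a value from the paper)."""
    sizes = []
    for index, (module, sources) in enumerate(LAYERS):
        inputs = []
        for src in sources:
            layer = index + src if src < 0 else src
            inputs.append(input_size if layer < 0 else sizes[layer])
        # Concat only makes sense if all of its inputs share one resolution.
        if len(set(inputs)) != 1:
            raise ValueError(f"layer {index}: mismatched inputs {inputs}")
        size = inputs[0]
        if module == "Conv":        # stride-2 convolution halves the map
            size //= 2
        elif module == "Upsample":  # upsampling doubles the map
            size *= 2
        sizes.append(size)
    return sizes


sizes = trace_resolutions()
detect_sizes = [sizes[i] for i in DETECT_SOURCES]
print(detect_sizes)  # [80, 40, 20]
```

Under these assumptions every Concat in Table 2 joins maps of equal resolution, and the detection layer receives three scales (80, 40 and 20 cells per side), which is consistent with the statement that the head concatenates feature maps of compatible resolution.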

Fig. 6. The architecture of the proposed YOLOv8n-v1

The details of the proposed YOLOv8n-v1 backbone are presented in table 3.

Table 3. Backbone network architecture of proposed YOLOv8n-v1

No. of layer  From  Repeat  Module type   Filter  Stride  Padding
0             -1    1       ResNet_Stem   ---     ---     ---
1             -1    3       ResNet_Block  ---     ---     ---
2             -1    4       C2f           ---     ---     ---
3             -1    1       Conv          256     3x3     2
4             -1    5       C2f           ---     ---     ---
5             -1    1       Conv          512     3x3     2
6             -1    5       C2f           ---     ---     ---
7             -1    1       Conv          1024    3x3     2
8             -1    5       C2f           ---     ---     ---
9             -1    1       SPPCSP        ---     ---     ---

In the proposed YOLOv8n-v2, the ResNet_Stem, ResNet_Block, ResNet_Downsample, GhostConv, and SPPCSP modules are integrated with the YOLOv8n backbone to enhance the model's feature extraction. The first two layers are removed and replaced with ResNet_Stem and ResNet_Block. The fourth and sixth layers (layers 3 and 5 in table 4), which are Conv modules, are replaced with GhostConv modules. The architecture of the GhostConv module is shown in Fig. 7. The eighth and ninth layers are replaced by ResNet_Downsample and ResNet_Block. The ResNet_Downsample modules perform downsampling using 1x1 convolutions, batch normalization, and ReLU activation. The last layer is replaced by the SPPCSP module. Table 4 shows the backbone architecture of the proposed YOLOv8n-v2. Figure 8 shows the architecture of the proposed YOLOv8n-v2.

Table 4. Backbone network architecture of proposed YOLOv8n-v2

No. of layer  From  Repeat  Module type        Filter  Stride  Padding
0             -1    1       ResNet_Stem        ---     ---     ---
1             -1    3       ResNet_Block       ---     ---     ---
2             -1    3       C2f                ---     ---     ---
3             -1    1       GhostConv          245     3x3     2
4             -1    6       C2f                ---     ---     ---
5             -1    1       GhostConv          512     3x3     2
6             -1    6       C2f                ---     ---     ---
7             -1    1       ResNet_Downsample  1024    3x3     2
8             -1    6       ResNet_Block       ---     ---     ---
9             -1    1       SPPCSP             ---     ---     ---

Fig. 7. The GhostConv module structure

Fig. 8. Architecture of the proposed YOLOv8n-v2

3. Datasets

In this paper, two object detection datasets have been used, namely the MJFR and MSFM datasets. MJFR is an object detection dataset collected by the authors and available on the Roboflow platform; it was cloned from four different repositories on the Roboflow computer vision platform. Data augmentation has been applied to improve the model's performance. MJFR contains 23621 images in total, including 20658 images for training, 1952 for validation, and 1011 for testing. The dataset consists of images containing individuals with and without face masks. Special attention was given to ensuring accurate ground-truth annotations and to checking that the class label of each object in the dataset is either 'mask' or 'no-mask'. The MSFM object detection dataset was created from groups of real-life images and videos taken with a webcam and a mobile camera, based on the curriculum learning technique, and annotated by the authors of this paper using the YOLOv8n model. Curriculum learning relies on the idea of arranging data into categories or levels from easy to hard in a way that makes the model learn better and more effectively. Training starts with the easiest data and gradually increases the complexity. The MSFM dataset contains 19601 images in total, including 14293 images for training, 4204 for validation, and 1104 for testing.

4. Results and discussions

Experiments were conducted by training the raw YOLOv8n model and both versions of the proposed YOLOv8n on both datasets. The models are compared in terms of P, R, mAP@0.5 and mAP@0.5:0.95, since these metrics determine which model performs better in terms of overall detection. The metrics employed for the quantitative examination of the models are described as follows:
• Precision is the ratio of correctly classified positive samples (true positives) to all samples classified as positive (true positives plus false positives), whether they were correctly classified or not [2].
• Recall is the ratio of true positives to all actual positive samples (true positives plus false negatives). It gauges how well the model can locate positive samples [2].
• mAP@0.5 determines a score by comparing each detected box to the ground-truth bounding box at an IoU threshold of 0.5. The higher the score, the more accurate the model's detections are [17].
• mAP@0.5:0.95 denotes the average mAP over different IoU thresholds, from 0.5 to 0.95 in steps of 0.05 [12, 17].

To apply the system in a medical clinic, two external Full HD 1080p web cameras with five-meter cables were used, and a graphical user interface (GUI) for the proposed system was built with the Qt Designer application and connected to Python using the PyQt5 package. Colaboratory by Google (also known as Google Colab) is a product of Google Research: a runtime environment based on Jupyter Notebook that enables us to train our deep learning and machine learning models on CPUs, GPUs, and TPUs.
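Returning to the metric definitions listed earlier in this section, the following minimal sketch shows how precision and recall can be counted at an IoU threshold of 0.5. The box format (x1, y1, x2, y2), the greedy matching rule, the helper names, and the two example boxes are assumptions made for illustration; the sketch does not reproduce the full mAP computation, which additionally averages precision over recall levels and classes.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(detections, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching; detections are assumed to be
    sorted by descending confidence. Returns (TP, FP, FN)."""
    unmatched = list(ground_truth)
    tp = fp = 0
    for det in detections:
        best = max(unmatched, key=lambda gt: iou(det, gt), default=None)
        if best is not None and iou(det, best) >= iou_threshold:
            unmatched.remove(best)  # each ground-truth box is matched once
            tp += 1
        else:
            fp += 1
    return tp, fp, len(unmatched)


def precision(tp, fp):
    """TP / (TP + FP): share of detections that are correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """TP / (TP + FN): share of ground-truth objects that were found."""
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical example: one good detection, one spurious detection, one missed face.
ground_truth = [(10, 10, 50, 50), (60, 60, 100, 100)]
detections = [(12, 12, 50, 50), (200, 200, 240, 240)]
tp, fp, fn = count_matches(detections, ground_truth)
print(tp, fp, fn, precision(tp, fp), recall(tp, fn))
```

Repeating the count at thresholds 0.5, 0.55, ..., 0.95 and averaging the resulting average-precision values is what the mAP@0.5:0.95 metric summarizes.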
Colab is well suited to data analytics and deep learning because it removes the computing limits of local devices [8]. Personal Colab notebooks are kept in the user's Google Drive account. The training, validation, and testing results were saved on Google Drive and are accessible for further use. The platform also gives users access to Google Drive, which is crucial for importing data and saving files. Colaboratory can also be defined as a data analysis platform that integrates text, code, and code outputs into one document, enables anyone to create and run arbitrary Python code over the web, and is particularly well suited to machine learning and data analysis [3].

The raw YOLOv8n model, the proposed YOLOv8n-v1, and the proposed YOLOv8n-v2 were each trained for 100 epochs on the MJFR and MSFM datasets. The performance results obtained from the validation and testing of the raw YOLOv8n model on both datasets are shown in table 5.

Table 5. Evaluation results of the raw YOLOv8n model on the MJFR and MSFM datasets

Table 6 shows the evaluation outcomes of the proposed YOLOv8n-v1 model on the MJFR and MSFM datasets.

Table 6. Evaluation results of the proposed YOLOv8n-v1 on the MJFR and MSFM datasets

Table 7 shows the evaluation outcomes of the proposed YOLOv8n-v2 model on the MJFR and MSFM datasets.

Table 7. Evaluation results of the proposed YOLOv8n-v2 on the MJFR and MSFM datasets

The comparison of validation and testing results for the original model, the proposed YOLOv8n-v1, and the proposed YOLOv8n-v2 can be seen in table 8.

In terms of P, as indicated in table 8, both versions of the proposed YOLOv8n achieve better performance on the MJFR and MSFM datasets and have a higher ratio of true positives to the total number of detected objects than the raw model in both testing and validation, except in the MSFM testing result, where P is 0.943 for the proposed YOLOv8n-v1 and 0.948 for the original model.

In terms of R, as indicated in table 8, both versions of the proposed model outperform the raw YOLOv8n model in all cases, for both validation and testing results and for both datasets.

In terms of mAP@0.5, both versions of the proposed model achieve better accuracy than the original model, as evidenced by the mAP50 values on both datasets and for both validation and testing. In the MJFR validation results, mAP50 is 0.929 for the proposed YOLOv8n-v1 and 0.932 for the proposed YOLOv8n-v2, compared to 0.913 for the original model, a difference of 1.6 and 1.9 percentage points for the proposed YOLOv8n-v1 and YOLOv8n-v2, respectively. In the MJFR testing results, mAP50 is 0.920 and 0.922 for the proposed YOLOv8n-v1 and YOLOv8n-v2, respectively, compared to 0.909 for the original model, a difference of 1.1 and 1.3 percentage points, respectively.

In the MSFM testing results, the original YOLOv8n model achieved an mAP50 of 0.939, while the proposed YOLOv8n-v1 achieved 0.953 and the proposed YOLOv8n-v2 achieved 0.95, a difference of 1.4 and 1.1 percentage points for v1 and v2, respectively. In the MSFM validation results, mAP50 is 0.94 and 0.935 for the proposed YOLOv8n-v1 and YOLOv8n-v2, respectively, compared to 0.907 for the base model, a difference of 3.3 and 2.8 percentage points for v1 and v2, respectively.

In terms of mAP50-95, as seen in table 8, both versions of the proposed model achieve a better mAP@0.5:0.95 than the original model on both datasets and for both validation and testing, except in the MJFR testing result, where mAP@0.5:0.95 is 0.523 for the proposed YOLOv8n-v1 and 0.529 for the original model.

Table 8. Comparative analysis of the YOLOv8n model, the proposed YOLOv8n-v1 model, and the proposed YOLOv8n-v2 model on the MJFR and MSFM datasets
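The mAP50 differences quoted in this section are absolute gaps between scores on a 0 to 1 scale, expressed in percentage points. The short sketch below recomputes them from the mAP50 values reported in the text; the dictionary layout and the names `REPORTED_MAP50` and `gain_in_points` are illustrative assumptions, and no values beyond those quoted in the text are used.

```python
# mAP50 values as quoted in the text (dataset, split) -> model -> score.
REPORTED_MAP50 = {
    ("MJFR", "validation"): {"original": 0.913, "v1": 0.929, "v2": 0.932},
    ("MJFR", "testing"): {"original": 0.909, "v1": 0.920, "v2": 0.922},
    ("MSFM", "validation"): {"original": 0.907, "v1": 0.940, "v2": 0.935},
    ("MSFM", "testing"): {"original": 0.939, "v1": 0.953, "v2": 0.950},
}


def gain_in_points(base, new):
    """Absolute difference between two scores, in percentage points."""
    return round((new - base) * 100, 1)


gains = {
    key: {
        model: gain_in_points(scores["original"], scores[model])
        for model in ("v1", "v2")
    }
    for key, scores in REPORTED_MAP50.items()
}

for (dataset, split), result in gains.items():
    print(dataset, split, result)
```

The recomputed gaps match the figures given in the text (for example 1.6 and 1.9 points on MJFR validation), which confirms that the reported percentages are absolute differences rather than relative improvements.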

In summary, it can be concluded that both versions of the proposed model architecture (the proposed YOLOv8n-v1 and the proposed YOLOv8n-v2) are more accurate than the original model and can be used for face mask detection.

The accuracy (mAP50) of the original, proposed YOLOv8n-v1, and proposed YOLOv8n-v2 models was also compared on different image sizes for both validation and testing. As seen in Fig. 9, the proposed YOLOv8n-v1 and proposed YOLOv8n-v2 models outperform the original model at all image sizes used in the comparison for the validation results on the MJFR dataset.

Fig. 9. Comparative analysis of mAP50 for the original YOLOv8n, proposed YOLOv8n-v1, and proposed YOLOv8n-v2 at different image sizes in the validation results on the MJFR dataset

Figure 10 shows that the proposed YOLOv8n-v1 and proposed YOLOv8n-v2 models outperform the original model at all image sizes used in the comparison for the validation results on the MSFM dataset.

Fig. 10. Comparative analysis of mAP50 for the original YOLOv8n, proposed YOLOv8n-v1, and proposed YOLOv8n-v2 at different image sizes in the validation results on the MSFM dataset

Some detection results on MJFR test images for the YOLOv8n model, the proposed YOLOv8n-v1, and the proposed YOLOv8n-v2 can be seen in figures 11, 12 and 13, respectively. A red box denotes a masked face; a pink box denotes a non-masked face. As these figures show, both versions of the proposed model can detect some objects in the images that are not detected by the original model, and the confidence scores of both versions of the proposed model are better than those of the original model.

Fig. 11. Images from the test dataset evaluated by the original YOLOv8n

Fig. 12. Images from the test dataset evaluated by the proposed YOLOv8n-v1

Fig. 13. Images from the test dataset evaluated by the proposed YOLOv8n-v2
5. Conclusion

Owing to the rapid spread of the COVID-19 pandemic, face masks must be worn in daily life, particularly in public areas, to limit transmission of the disease. The present work aims to build an intelligent system that detects with high accuracy whether a person is wearing a mask across a wide range of scenarios, gives a sound alert to anyone who is not wearing one, and improves the YOLOv8n model for face mask detection. Both versions of the proposed YOLOv8n were applied to two datasets, MJFR and MSFM, which were collected and built by the authors of this paper and which enable the models to accurately detect both masked and unmasked faces. The proposed YOLOv8n_v1 and YOLOv8n_v2 models achieved significant improvements in object detection accuracy compared to the original model on the MSFM and MJFR datasets and in real-time operation. Both versions of the proposed YOLOv8n model outperform the original YOLOv8n model in terms of accuracy in validation and testing evaluations.

The experimental results show that both versions of the proposed YOLOv8n outperform the original model in testing and validation on both datasets in terms of mAP50, a standard object detection metric. On the MJFR dataset, the proposed YOLOv8n_v2 outperforms the proposed YOLOv8n_v1 in both testing and validation mAP50, whereas on the MSFM dataset the proposed YOLOv8n_v1 outperforms the proposed YOLOv8n_v2 in both testing and validation mAP50. The relative performance of the two proposed models therefore depends on the dataset.

As future work, the system's capabilities can be extended to include real-time social distancing monitoring, which can be valuable for enforcing physical distancing measures. By detecting and flagging instances of close proximity between individuals, the system can aid in maintaining safe distancing guidelines in crowded areas.

M.Sc. Muna Jaffer Al-Shamdeen
e-mail: [email protected]
She received her B.Sc. degree in Computer Science from the University of Mosul, College of Computer Sciences & Mathematics, Department of Computer Science, Iraq, in 2005. She received her M.Sc. degree in Computer Science from the same university and department in 2011. She has worked in the same department from 2005 until the present.
Research interests: digital image processing, computer vision, remote sensing, pattern recognition.
https://orcid.org/0000-0002-2806-532X

Prof. Fawziya Mahmood Ramo
e-mail: [email protected]
Obtained a B.A. degree in Computer Science in 1992, an M.A. degree in Computer Architecture in 2001 and a Ph.D. in Artificial Intelligence in 2007, and the rank of assistant professor in 2013.
Research interests: computer science, artificial intelligence and machine learning.
https://orcid.org/0000-0002-7510-0482
