http://doi.org/10.35784/iapgos.6056 received: 01.04.2024 | revised: 27.04.2024 | accepted: 19.06.2024 | available online: 30.06.2024
Abstract. This work aims to build a robust real-time detection system that accurately distinguishes individuals who wear face masks from those who do not, across images, videos, and live camera streams. The study improves the YOLOv8n architecture for face mask detection by building two modified versions of the model that strengthen its feature extraction and prediction networks. The proposed YOLOv8n-v1 integrates a residual network (ResNet) backbone into the YOLOv8n architecture: the first two layers of YOLOv8n are replaced with ResNet_Stem and ResNet_Block modules to improve feature extraction, and the Spatial Pyramid Pooling Fast (SPPF) module is replaced with a Spatial Pyramid Pooling-Cross Stage Partial (SPPCSP) module, which combines SPP and CSP to create a network that is both effective and efficient. The proposed YOLOv8n-v2 is built by integrating Ghostconv and ResNet_Downsampling modules into the proposed YOLOv8n-v1 backbone. All models were tested and evaluated on two datasets. The first is the MJFR dataset, which contains 23,621 images collected by the authors of this paper from four distinct datasets, all of which were used for face mask detection. The second is the MSFM object detection dataset, which was collected from real-life videos and images based on the curriculum learning technique. Model performance is assessed using the following metrics: mean average precision (mAP50), mAP50-95, recall (R) and precision (P). Both versions of the proposed YOLOv8n outperform the original model in terms of accuracy on both datasets. Finally, the system was successfully deployed in a medical clinic affiliated with a medical complex, where it showed high efficiency in various aspects of work and contributed effectively to improving public health and safety.
Keywords: YOLOv8, object detection, detection algorithm, residual network
of experiment shows that YOLOv4 presents better performance than YOLOv3 and YOLOv4-tiny [26]. In 2022, Kumar et al. suggested ETL-YOLOv4 for face mask detection, a modified version of tiny YOLOv4 that improves the feature extraction and prediction networks. A dense SPP network is first added to the feature extraction network, and two additional detection layers are then added. The suggested ETL-YOLOv4 achieved 9.93% higher mAP, 5.75% higher average precision (AP) for faces with masks, and 16.6% higher AP for the face mask region in comparison with its original baseline form [11]. In 2021, Loey et al. presented a hybrid model for face mask detection using deep transfer learning and classical machine learning. The hybrid model consists of two parts. The first part uses a Residual Neural Network (Resnet50) for feature extraction, while the second part classifies face masks using support vector machines (SVM), decision trees and ensemble algorithms. Three face mask datasets were used in the experiments: the Real-World Masked Face Dataset (RMFD), the Simulated Masked Face Dataset (SMFD), and the Labeled Faces in the Wild (LFW). During testing, the SVM obtained the highest detection accuracy compared with the decision tree and ensemble classifiers: 99.64% on RMFD, 99.49% on SMFD, and 100% on LFW [12].

After reviewing the related research, and in order to improve face mask detection results, the raw YOLOv8n model was used and studied and its results were analysed; modifications were then made to the head and backbone network architecture of the raw YOLOv8n model. The results were evaluated using standard metrics (P, R, mAP@0.5 and mAP@0.5:95), and very good results were obtained compared to the original YOLOv8n. The contributions of this paper are as follows.

1) Impact on Public Health and Safety: this work addresses a critical societal issue by contributing to the development of automated systems that promote public health and safety. The real-time face mask detection system has the potential to assist authorities, businesses, and institutions in enforcing mask-wearing protocols during public health emergencies, thereby reducing the risk of disease transmission and enhancing community well-being.
2) Building and compilation of large datasets: two datasets were created for face mask detection, addressing the challenges posed by varying environmental conditions.
3) Two new hybrid models: the proposed models combine the YOLOv8n deep learning model with a residual network backbone tailored for face mask detection, and both achieve higher accuracy than the original model.
4) Real-Time Accuracy: achieving real-time face mask detection without compromising the accuracy of the predictions. Real-time applications require swift decision-making, but maintaining a high level of precision in distinguishing between masked and unmasked individuals is imperative.

The remainder of this work is arranged as follows: section 2 presents the original YOLOv8n architecture, section 3 covers the proposed YOLOv8n, section 4 describes the datasets, section 5 covers the analysis of results, and section 6 covers the conclusions.

1. YOLOv8 model

YOLOv8 is a newer version that builds on the success of YOLOv7. It is an object detection neural network model that can identify objects in images and videos. The model is made up of a backbone and a head. The backbone extracts features from the given image, while the head is responsible for detecting objects based on these features [18]. It uses a modified version of the ResNet architecture as its backbone network and introduces several new techniques, such as multiscale prediction and cross-scale connections, to further improve object detection performance [4, 28]. The backbone network architecture of YOLOv8n is presented in Table 1.

Table 1. Backbone network architecture of YOLOv8n [20]

No. of layer   From   Repeat   Module type   Filter   Stride   Padding
0              -1     1        Conv          64       2        1
1              -1     1        Conv          128      2        1
2              -1     3        C2f           -        -        -
3              -1     1        Conv          256      2        1
4              -1     6        C2f           -        -        -
5              -1     1        Conv          512      2        1
6              -1     6        C2f           -        -        -
7              -1     1        Conv          1024     2        1
8              -1     3        C2f           -        -        -
9              -1     1        SPPF          -        -        -

1.1. Explanation of the backbone network

The backbone network plays an important role in feature extraction in the YOLOv8 architecture: it processes the given image to generate a collection of feature maps that are then used by the detection head to predict the final object detections. The backbone is made up of a succession of convolutional layers arranged in a specific pattern to capture features at different spatial scales. In this architecture, the backbone consists of several types of modules that are repeated multiple times with varying numbers of filters, kernel sizes, and stride values [28]. Here is a brief explanation of each module:

Convolutional layer (Conv): This module performs a 2D convolution operation on the input feature map using a set of learned filters. The filters are learned during training to capture specific patterns in the data. The number of filters, kernel size, and stride values can be adjusted for each convolutional layer in the architecture [23].

Cross Stage Partial Network (C2f): This module is a modified version of the ResNet block, which uses skip connections to help gradients flow through the network during training. The C2f module is made up of two convolutional layers with an identical number of filters and identical kernel sizes, followed by a skip connection that adds the input feature map to the output of the second convolutional layer [23]. Fig. 1 shows the structure of C2f, BottleNeck and CBS, respectively.

Fig. 1. The Structure of (a) C2f, (b) BottleNeck, (c) CBS

Spatial Pyramid Pooling Fast (SPPF): This module performs a pooling operation on the input feature map at multiple scales to capture features at different spatial resolutions. The pooled features are merged and passed through a convolutional layer to generate a single feature map [22]. The backbone network is designed to capture features at different spatial scales, which are necessary for detecting objects at different sizes and distances from the camera. The C2f modules and the SPPF module help improve the flow of gradients through the network and capture features at different spatial resolutions [9, 21]. The SPPF architecture is depicted in Fig. 2.
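The stride and padding values in Table 1 determine how quickly the backbone shrinks the feature maps. The following is a minimal sketch of that arithmetic, not the authors' code. It assumes a 3x3 kernel for every Conv row, a square 640x640 input, and that the C2f and SPPF rows leave the spatial size unchanged; none of these three values is stated in Table 1. The names `BACKBONE`, `conv_output_size` and `trace_backbone` are hypothetical.

```python
# Rows follow Table 1: (layer, module, filters, stride, padding).
# C2f and SPPF rows list no stride, so they are ASSUMED to keep the spatial size.
BACKBONE = [
    (0, "Conv", 64, 2, 1),
    (1, "Conv", 128, 2, 1),
    (2, "C2f", None, None, None),
    (3, "Conv", 256, 2, 1),
    (4, "C2f", None, None, None),
    (5, "Conv", 512, 2, 1),
    (6, "C2f", None, None, None),
    (7, "Conv", 1024, 2, 1),
    (8, "C2f", None, None, None),
    (9, "SPPF", None, None, None),
]


def conv_output_size(size, kernel, stride, padding):
    """Standard convolution output-size formula for one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_backbone(input_size=640, kernel=3):
    """Return {layer index: feature-map side length} for a square input.

    input_size=640 and kernel=3 are assumptions, not values from Table 1.
    """
    sizes = {}
    size = input_size
    for layer, module, _filters, stride, padding in BACKBONE:
        if module == "Conv":
            size = conv_output_size(size, kernel, stride, padding)
        sizes[layer] = size
    return sizes


if __name__ == "__main__":
    for layer, side in trace_backbone().items():
        print(f"layer {layer}: {side}x{side}")
```

Under these assumptions each of the five stride-2 Conv layers halves the side length, so a 640x640 input reaches 20x20 at layer 9, while layers 4 and 6 sit at 80x80 and 40x40. This is one way to read the statement that the backbone captures features at different spatial scales.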
p-ISSN 2083-0157, e-ISSN 2391-6761 IAPGOŚ 2/2024 91
Fig. 2. Architecture of SPPF [22]

1.2. Explanation of the head network

The head network consists of a series of layers that progressively refine the feature maps generated by the backbone. First, an upsampling layer is used to double the resolution of the feature maps, which helps to recover spatial information that may have been lost during the downsampling performed by the backbone. Then, the head network concatenates the upsampled feature maps with feature maps from the backbone network that have been selected to have a compatible resolution. This permits the head to incorporate both low-level and high-level features from the input image [9]. Next, the concatenated feature maps are passed through a sequence of convolutional layers and other operations that reduce the dimensions of the feature maps while increasing their depth. This is done to prepare the feature maps for the final detection layer. Table 2 shows the head network architecture of YOLOv8n [22].

Table 2. Head network architecture of YOLOv8n [22]

No. of layer   From           Repeat   Module type   Filter   Stride   Padding
10             -1             1        Upsample      -        -        -
11             [-1,6]         1        Concat        -        -        -
12             -1             3        C2f           -        -        -
13             -1             1        Upsample      -        -        -
14             [-1,4]         1        Concat        -        -        -
15             -1             3        C2f           -        -        -
16             -1             1        Conv          256      3x3      2
17             [-1,12]        1        Concat        -        -        -
18             -1             3        C2f           -        -        -
19             -1             1        Conv          512      3x3      2
20             [-1,9]         1        Concat        -        -        -
21             -1             3        C2f           -        -        -
-              [15, 18, 21]   -        Detect        -        -        -

2. Proposed YOLOv8n

In the proposed YOLOv8n-v1, a modified architecture of the YOLOv8n model has been built that integrates a ResNet backbone into the YOLOv8n architecture to create a hybrid model. Key changes and features include:

1) The ResNet_Stem and ResNet_Block modules are integrated into the YOLOv8n backbone: the first two layers are removed and replaced with ResNet_Stem and ResNet_Block. ResNet is a form of convolutional neural network (CNN) that is known for its ability to learn deep representations from data. ResNets have been shown to achieve state-of-the-art results on a variety of computer vision tasks, including object detection. The ResNet backbone consists of multiple essential components, each specifically tailored to enhance the feature representation. The ResNet_Stem module initiates the backbone with a convolutional layer followed by batch normalization, ReLU and max pooling, while the ResNet_Block contains a pair of 3x3 convolutional layers, each followed by batch normalization and ReLU activation. The block also incorporates a skip connection with a 1x1 convolution to match the dimensions when necessary. Fig. 4 shows the architecture of ResNet_Stem and ResNet_Block.
2) The SPPF module is replaced with SPPCSP. The SPPCSP architecture combines spatial pyramid pooling (SPP) and cross stage partial networks (CSP) to create a network that is both effective and efficient. The SPP layer extracts features at multiple scales from an image, and the CSP layer, which divides the input into multiple stages and then partially connects the stages, enables the network to learn more complex features. Fig. 5 shows the SPPCSP architecture.
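The skip connection described for ResNet_Block, including the 1x1 convolution that matches dimensions when necessary, can be illustrated with a deliberately simplified toy. This is not the authors' ResNet_Block: it works on the channel vector of a single pixel, replaces the 3x3 convolutions with 1x1 channel mixing, and omits batch normalization. The names `relu`, `conv1x1` and `residual_block` are hypothetical and the weights in the usage lines are made up for illustration.

```python
def relu(vector):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, value) for value in vector]


def conv1x1(weights, vector):
    """A 1x1 convolution at one pixel is a matrix-vector product over channels.

    weights has shape [out_channels][in_channels].
    """
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2, w_skip=None):
    """Compute ReLU(F(x) + shortcut(x)) for the channel vector x of one pixel.

    F is two channel-mixing layers with a ReLU in between (a stand-in for the
    pair of 3x3 convolutions in the text). shortcut is the identity, or a 1x1
    projection w_skip when the output channel count differs from the input.
    """
    hidden = relu(conv1x1(w1, x))
    residual = conv1x1(w2, hidden)
    shortcut = x if w_skip is None else conv1x1(w_skip, x)
    if len(shortcut) != len(residual):
        raise ValueError("shortcut and residual branch must have equal channels")
    return relu([r + s for r, s in zip(residual, shortcut)])


if __name__ == "__main__":
    zeros_2x2 = [[0.0, 0.0], [0.0, 0.0]]
    # With an all-zero residual branch the block reduces to ReLU(x).
    print(residual_block([1.0, -2.0], zeros_2x2, zeros_2x2))
    # Going from 2 to 3 channels needs a projection on the shortcut.
    w1 = [[0.0, 0.0]] * 3
    w2 = [[0.0, 0.0, 0.0]] * 3
    w_skip = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(residual_block([1.0, 2.0], w1, w2, w_skip))
```

The point of the sketch is the design choice rather than the arithmetic: when the residual branch contributes nothing, the input still passes through the shortcut, which is why skip connections are commonly credited with easing gradient flow in deep networks.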
Fig. 6. The Architecture of the proposed YOLOv8n-v1

Fig. 8. Architecture of the proposed YOLOv8n-v2

In summary, both versions of the proposed model architecture (proposed YOLOv8n-v1 and proposed YOLOv8n-v2) are more accurate than the original model and can be used for face mask detection.

The accuracy (mAP50) of the original, proposed YOLOv8n_v1 and proposed YOLOv8n_v2 models is compared at different image sizes for both validation and testing. As seen in Fig. 9, the proposed YOLOv8n_v1 and proposed YOLOv8n_v2 models outperform the original model at all image sizes used in the comparison for the validation results on the MJFR dataset. Fig. 10 shows that the proposed YOLOv8n_v1 and proposed YOLOv8n_v2 models outperform the original model at all image sizes used in the comparison for the validation results on the MSFM dataset.
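For readers unfamiliar with the metric, mAP50 treats a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The sketch below shows that matching rule together with precision and recall for a single class on a single image. It is a simplified illustration, not the evaluation code behind the reported results: it uses greedy matching in descending score order and does not compute the full average-precision curve. The function names and the example boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def precision_recall(predictions, truths, threshold=0.5):
    """Greedy matching of (score, box) predictions to ground-truth boxes.

    Each ground-truth box can be matched at most once; a prediction is a true
    positive when its best unmatched IoU is at least `threshold`.
    """
    matched = set()
    true_positives = 0
    for _score, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_index, best_iou = None, threshold
        for index, truth in enumerate(truths):
            if index in matched:
                continue
            overlap = iou(box, truth)
            if overlap >= best_iou:
                best_index, best_iou = index, overlap
        if best_index is not None:
            matched.add(best_index)
            true_positives += 1
    false_positives = len(predictions) - true_positives
    false_negatives = len(truths) - true_positives
    precision = true_positives / (true_positives + false_positives) if predictions else 0.0
    recall = true_positives / (true_positives + false_negatives) if truths else 0.0
    return precision, recall


if __name__ == "__main__":
    truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
    predictions = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60))]
    print(precision_recall(predictions, truths))
```

In the usage lines one prediction matches a ground-truth face and one does not, while one ground-truth face is missed, so both precision and recall come out at 0.5.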
Fig. 11. Images from the test dataset evaluated by original YOLOv8n
Fig. 12. Images from the test dataset evaluated by proposed YOLOv8n_v1
Fig. 13. Images from the test dataset evaluated by proposed YOLOv8n_v2
5. Conclusion

Because of the fast spread of the COVID-19 pandemic, face masks must be worn in daily life, particularly in public areas, to avoid transmission of the disease. The present work aims to build an intelligent system that detects with high accuracy whether a person is wearing a mask, gives a sound alert to any person who is not wearing one across a wide range of scenarios, and improves the YOLOv8n model for face mask detection. Both versions of the proposed YOLOv8n were applied to two datasets, MJFR and MSFM, which were collected and built by the authors of this paper and which enable the models to accurately detect both masked and unmasked faces. The proposed YOLOv8n_v1 and YOLOv8n_v2 models achieved significant improvements in object detection accuracy compared to the original model on the MSFM and MJFR datasets and in real time. Both versions of the proposed YOLOv8n model outperform the original YOLOv8n model in terms of accuracy in validation and testing evaluations.

The experimental results show that both versions of the proposed YOLOv8n outperform the original model in both testing and validation results for mAP50, an object detection metric, on both datasets. The proposed YOLOv8n-v2 model outperforms the proposed YOLOv8n-v1 in both testing and validation results for mAP50 on the MJFR dataset, while on MSFM the proposed YOLOv8n-v1 outperforms the proposed YOLOv8n-v2 in both testing and validation results for mAP50. This shows that the performance of both proposed models depends on the dataset.

As future work, the system's capabilities can be extended to include real-time social distancing monitoring, which can be valuable for enforcing physical distancing measures. By detecting and notifying instances of proximity between individuals, the system can aid in maintaining safe distancing guidelines in crowded areas.

References

[1] Ahuja A. S. et al.: Artificial intelligence in ophthalmology: A multidisciplinary approach. Integrative Medicine Research 11(4), 2022, 100888.
[2] Al-Shamdeen M. J., Younis A. N., Younis H. A.: Metaheuristic algorithm for capital letters images recognition. Computer Science 16(2), 2020, 577–588.
[3] Bhujel S., Shakya S.: Rice Leaf Diseases Classification Using Discriminative Fine Tuning and CLR on EfficientNet. Journal of Soft Computing Paradigm 4(3), 2022, 172–187.
[4] Chabi Adjobo E. et al.: Automatic Localization of Five Relevant Dermoscopic Structures Based on YOLOv8 for Diagnosis Improvement. Journal of Imaging 9(7), 2023, 148.
[5] Deng J. et al.: Retinaface: Single-stage dense face localisation in the wild. arXiv preprint arXiv: 1905.00641, 2019.
[6] Diwan T., Anirudh G., Tembhurne J. V.: Object detection using YOLO: Challenges, architectural successors, datasets and applications. Multimedia Tools and Applications 82(6), 2023, 9243–9275.
[7] Elharrouss O. et al.: Backbones-review: Feature extraction networks for deep learning and deep reinforcement learning approaches. arXiv preprint arXiv: 2206.08016, 2022.
[8] Gunawan T. S. et al.: Development of video-based emotion recognition using deep learning with Google Colab. TELKOMNIKA (Telecommunication Computing Electronics and Control) 18(5), 2020, 2463–2471.
[9] Ju R. Y., Cai W.: Fracture Detection in Pediatric Wrist Trauma X-ray Images Using YOLOv8 Algorithm. arXiv preprint arXiv: 2304.05071, 2023.
[10] Kelleher J. D.: Deep learning. MIT Press, 2019.
[11] Kumar A., Kalia A., Kalia A.: ETL-YOLO v4: A face mask detection algorithm in era of COVID-19 pandemic. Optik 259, 2022, 169051.
[12] Loey M. et al.: A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic. Measurement 167, 2021, 108288.
[13] Lou H. et al.: DC-YOLOv8: Small-Size Object Detection Algorithm Based on Camera Sensor. Electronics 12(10), 2023, 2323.
[14] Mbunge E. et al.: Application of deep learning and machine learning models to detect COVID-19 face masks-A review. Sustainable Operations and Computers 2, 2021, 235–245.
[15] Mohammed Ali F. A., Al-Tamimi M. S.: Face mask detection methods and techniques: A review. International Journal of Nonlinear Analysis and Applications 13(1), 2022, 3811–3823.
[16] Nowrin A. et al.: Comprehensive review on facemask detection techniques in the context of covid-19. IEEE Access 9, 2021, 106839–106864.
[17] Padilla R., Netto S. L., Da Silva E. A.: A survey on performance metrics for object-detection algorithms. In: 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), IEEE, 2020.
[18] Phan Q. B., Nguyen T.: A Novel Approach for PV Cell Fault Detection using YOLOv8 and Particle Swarm Optimization, 2023.
[19] Rajeshwari P. et al.: Object detection: an overview. Int. J. Trend Sci. Res. Dev. (IJTSRD) 3(1), 2019, 1663–1665.
[20] Reis D. et al.: Real-Time Flying Object Detection with YOLOv8. arXiv preprint arXiv: 2305.09972, 2023.
[21] Solawetz J.: What is YOLOv8? The Ultimate Guide, 2023, [https://blog.roboflow.com/whats-new-in-yolov8/] (available: 1.01.2024).
[22] Talaat F. M., ZainEldin H.: An improved fire detection approach based on YOLO-v8 for smart cities. Neural Computing and Applications, 2023, 1–16.
[23] Terven J., Cordova-Esparza D.: A comprehensive review of YOLO: From YOLOv1 and beyond. arXiv preprint arXiv: 2304.00501, 2023.
[24] Tian Y. et al.: Role of masks in mitigating viral spread on networks. Physical Review E 108(1), 2023, 014306.
[25] Vibhuti et al.: Face mask detection in COVID-19: a strategic review. Multimedia Tools and Applications 81(28), 2022, 40013–40042.
[26] Vrigkas M. et al.: Facemask: A new image dataset for the automated identification of people wearing masks in the wild. Sensors 22(3), 2022, 896.
[27] Wani M. A. et al.: Advances in deep learning. Springer, 2020.
[28] Wu W. et al.: Application of local fully Convolutional Neural Network combined with YOLO v5 algorithm in small target detection of remote sensing image. PloS one 16(10), 2021, e0259283.
[29] Yunus E.: YOLO V7 and Computer Vision-Based Mask-Wearing Warning System for Congested Public Areas. Journal of the Institute of Science and Technology 13(1), 2023, 22–32.
She received her B.Sc. degree in Computer Science from the Department of Computer Science, College of Computer Sciences & Mathematics, University of Mosul, Iraq, in 2005, and her M.Sc. degree in Computer Science from the same university and department in 2011. She has worked in the same department since 2005.
Research interests: digital image processing, computer vision, remote sensing, pattern recognition.
https://orcid.org/0000-0002-2806-532X

Obtained a B.A. degree in Computer Science in 1992, an M.A. degree in Computer Architecture in 2001 and a Ph.D. in Artificial Intelligence in 2007, and became an assistant professor in 2013.
Research interests: computer science, artificial intelligence and machine learning.
https://orcid.org/0000-0002-7510-0482