Final Report Machine Learning, UIT-HCM 2022

Tran Anh Dung – 19521167

Article

Real-Time Facemask Detection Using Transfer Learning Based Deep Neural Network
Tran Anh Dung
19521167
University of Information Technology

Abstract: The COVID-19 pandemic disrupted people’s livelihoods and hindered global trade and
transportation. During the pandemic, the World Health Organization recommended that
masks be worn to protect against the virus, and covering one’s face with a mask became the
standard. Many public service providers encourage clients to wear masks, but manually checking
whether masks are worn is exhausting. This paper offers a deep learning solution for identifying
masks worn over faces in public places to minimize coronavirus community transmission. The main
contribution of the proposed work is a real-time system that determines whether a person on a
webcam, or in a still image, is wearing a mask.
The ensemble method makes high accuracy easier to achieve and considerably improves
detection speed. In addition, applying transfer learning to pre-trained models
and testing stringently on an objective dataset led to a dependable and
inexpensive solution. The findings support the application’s potential for use in real-world
settings, contributing to the reduction in pandemic transmission. Compared with existing
methodologies, the proposed method delivers improved accuracy, specificity, precision, recall, and
F-measure performance in two-class outputs. An appropriate balance is kept between the number of
parameters and the inference time of the various models.
Keywords: deep learning, facemask, computer vision, CNN, COVID-19
1. Introduction
The emergence of the novel coronavirus (COVID-19) presents unprecedented challenges to many
health systems globally. In March 2020, the World Health Organization (WHO) declared COVID-19 a
pandemic as the disease continued to spread and decimate populations, especially in
vulnerable countries [1]. Since the outbreak of the pandemic, artificial intelligence techniques have
also been used to develop contact tracing apps, social distancing tools, smart wearable devices,
and remote monitoring of patients in isolation and quarantine facilities [2]. For
instance, [3] conducted a systematic review of machine learning and deep learning techniques
applied to detect whether a person is wearing a face mask during the pandemic. Research
supporting mask wearing in public locations to prevent COVID-19 transmission is
advancing rapidly. Disease spread can be slowed by physically separating sick people from others,
taking additional precautions, and minimizing the probability of transmission per interaction. A
mask minimizes transmissibility per encounter in laboratory and clinical settings by limiting the
transmission of contaminated respiratory particles. When public mask-wearing is widespread, it
effectively reduces virus spread [4]. Wearing a face mask or other covering over the nose and mouth
has been shown to significantly reduce the risk of coronavirus spread by limiting the forward distance
traveled by an individual’s exhaled air [5]. Face mask detection is the task of determining whether or
not someone is wearing a mask. In computer vision and pattern recognition, face detection is a crucial
component, and faces are recognized using various machine learning techniques [6]. The existing
systems have several flaws, including high feature complexity and low detection accuracy. Face
detection approaches based on deep convolutional neural networks (CNNs) have become popular for
increasing detection performance [7]. Even though many researchers have worked hard to develop
fast face detection and recognition algorithms, there is a significant difference between ‘detection of
the face under a mask’ and ‘detection of a mask over the face’. In practice, it is difficult to spot
incorrect mask use. The key challenge is the dataset limitation: mask-wearing status datasets are often
small and merely identify the presence of masks. There are very few studies on detecting masks over
the face in the literature. The proposed research intends to create a technique that can accurately
detect masks over the face in public places (such as airports, train stations, crowded markets, and bus
stops) to prevent coronavirus transmission, thus contributing to public health. Furthermore, a model
that detects faces with or without a mask in public is difficult to train from scratch. As a result,
transfer learning is used to transfer learned kernels from networks trained on a large dataset for
similar face detection tasks.
In this pandemic situation, there is a need to monitor whether people are wearing masks to control the
spread of COVID-19. People must be alerted to wear masks properly in public places by
comparing the captured image with the datasets. When CCTV cameras record videos, the faces appear
small, hazy, and low-resolution. Because people do not always look straight at the camera, the face
angle changes. These real-world videos differ significantly from those obtained by webcams or selfie
cameras, making face mask recognition in practice much more difficult. Residual blocks were
integrated into the depth-wise separable convolutional layer developed by Inception V3.
The main contributions of the proposed work (techniques and benefits) are as follows:
1. A real-time system was built for determining whether a person on a webcam is wearing a mask.
2. A balanced facemask dataset with a nearly one-to-one class ratio was generated using random
oversampling (ROS).
3. An ensemble object detection approach combined a one-stage and a two-stage detector to recognize objects
in real-time video streams with a short inference time (high speed) and high accuracy.
4. Transfer learning was utilized in Inception-V3 for fusing high-level semantic information in diverse feature
maps by extracting new features from learned characteristics.
5. An improved affine (bounding box) transformation was applied to the cropped region of interest (ROI), as
face size and location vary widely.

In this paper, the Inception V3 model is used. Inception V3 is a widely used image
classification model that combines ideas from multiple researchers and was developed over
several years [15]. A high-level diagram of the Inception V3 model is shown in Figure 1.

Figure 1. High-Level Diagram of Inception V3 Model [15]


As shown in Figure 1, the architecture of the Inception V3 model can be divided into several
inception blocks. Each inception block can contain a different combination of Convolution,
AvgPool, MaxPool, Concat, Dropout, Fully Connected, and Softmax layers.
As the diagram shows, the Inception V3 model can have more than one softmax output during the
learning process. The model uses factorized convolutions, which not
only speed up training by reducing the number of connections but also
help prevent overfitting by reducing the number of parameters to learn [15].
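To make the parameter saving from factorized convolutions concrete, the short sketch below counts weights for two factorizations commonly associated with Inception V3: one 5 x 5 convolution replaced by two stacked 3 x 3 convolutions, and one 3 x 3 convolution replaced by a 1 x 3 followed by a 3 x 1. The channel count of 64 and the helper names are illustrative assumptions, not values taken from this report.

```python
def conv_weights(kernel_h, kernel_w, channels_in, channels_out):
    """Number of weights in one convolution layer (biases ignored)."""
    return kernel_h * kernel_w * channels_in * channels_out


def relative_saving(original, factorized):
    """Fraction of weights removed by the factorization."""
    return 1 - factorized / original


channels = 64  # illustrative channel count (assumption)

# One 5x5 convolution versus two stacked 3x3 convolutions.
single_5x5 = conv_weights(5, 5, channels, channels)
two_3x3 = 2 * conv_weights(3, 3, channels, channels)

# One 3x3 convolution versus a 1x3 followed by a 3x1 (asymmetric factorization).
single_3x3 = conv_weights(3, 3, channels, channels)
asymmetric = conv_weights(1, 3, channels, channels) + conv_weights(3, 1, channels, channels)

print(f"5x5 -> two 3x3: {relative_saving(single_5x5, two_3x3):.0%} fewer weights")
print(f"3x3 -> 1x3 + 3x1: {relative_saving(single_3x3, asymmetric):.0%} fewer weights")
```

The saving is independent of the channel count chosen here, because the channel terms cancel in the ratio.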
2. Literature Review
Transfer learning is an approach in which knowledge acquired by a CNN from related
data is used to solve a new problem. A deep learning network pretrained on previous datasets can be
fine-tuned to achieve high accuracy with a smaller dataset. The deep learning methods used for
face mask detection are discussed below. Sethi et al. [8] proposed a multigranularity masked face
recognition model developed using MobileNetV2 and achieved 94% accuracy. Sen et al. [9] built a
system that differentiates people who wear face masks from those who do not, using a series of
photographs and videos. The suggested method employed the MobileNetV2 model with Python’s PyTorch
and OpenCV for mask detection, with 79.24% accuracy. Balaji et al. [10] described an entrance system
for public locations that distinguishes persons who wear masks from those who do not. Furthermore, if a
person violates the rule of wearing a facemask, the device produces a beep as an alert. The video
was captured with a Raspberry Pi camera and then converted into pictures for further processing.
The use of masks significantly slows the virus’s spread, according to Cheng et al. [11], who
determined that YOLO v3-tiny (You Only Look Once) can detect mask use in real time. It is also
small, fast, and well suited to real-time detection and mobile hardware deployment. Sakshi et al. [12]
created a face mask detector based on the MobileNetV2 architecture using Keras/TensorFlow. The
model was adapted to support face mask recognition in real-time video or still pictures. The
ultimate goal is to employ computer vision to apply the concept in high-density areas, such as
hospitals, healthcare facilities, and educational institutions. Using a feature pyramid and
focal loss, a single-stage object detector can detect dense objects in images over several layers. Jiang
et al. [13] proposed a two-stage detector that achieves high accuracy and speeds comparable to
the single-stage detector. It divides a picture into G x G grids, each providing N bounding box
predictions. Each bounding box can only have one class during the prediction, preventing the
network from finding smaller items. Redmon et al. [14] introduced YOLO, which uses a one-phase
prediction strategy with impressive inference time, but its localization accuracy was low for small
objects. In YOLOv2, batch normalization, a high-resolution classifier, and anchor boxes were
added to the YOLO network.
YOLOv3 is an improved version of YOLOv2, featuring a new feature extraction network, a better
backbone classifier, and multiscale prediction; it achieved classification accuracy comparable to that
of a single-shot detector (SSD). Furthermore, YOLOv3’s inference demands significant CPU resources,
making it unsuitable for embedded systems. SSD networks outperform YOLO networks due to their compact
convolutional filters, extensive feature maps, and prediction across multiple scales. The YOLO
network has two fully connected layers, while the SSD network uses convolutional
layers of various sizes. The region-based convolutional neural network (R-CNN) presented by Girshick
et al. [16] was the first large-scale CNN implementation for object detection and localization. The
model generated state-of-the-art results when tested on standard datasets. R-CNN first extracts a set
of object proposals using a selective search strategy and then predicts objects and their classes using
an SVM (support vector machine) classifier. He et al. [17] introduced SPPNet, which is a classification
system that gathers features and feeds them into a fully connected layer. SPPNet can create feature
maps for the whole image in a single pass, resulting in a nearly 20-fold speedup in object
detection over R-CNN. Both the detector and the regressor are trained simultaneously without
changing the network configuration. Even though this integration breaks through the Fast R-CNN
speed bottleneck, the subsequent detection stage has redundant computation.
Users can also be detected in still images, but this works efficiently only when they remain stationary,
posing a problem for real-time implementation. Capturing the user’s image and then determining the
presence/absence of a mask takes more time and is a little more complicated than processing a video
stream. Inception V3 identifies the presence/absence of a mask with better accuracy than
MobileNetV2. The video analysis method can be used for face mask detection. Of all the approaches
proposed in the literature, Inception V3 appears to be the most promising for face mask detection, as it
uses a fast and accurate object detection algorithm. The Inception V3 approach allows accurate
determination of mask wearing in a video and identification/extraction of the pixels associated with
each individual.
The existing literature has some limitations, which are summarized as follows:
a. Various models have been pretrained on standard datasets, but only a limited number of datasets
cover face masks.
b. Because facemask detection datasets are limited, varying degrees of occlusion and semantics
are essential for numerous mask types.
c. None of the existing approaches is ideal for real-time video surveillance systems.
According to Roy et al., surveillance devices are constrained by a lack of processing power and
memory [18]. As a result, these devices necessitate efficient object detection models capable of
performing real-time surveillance while using minimal memory and maintaining high accuracy.
Although one-stage detectors are suitable for video surveillance in many applications, they have
limited accuracy [19]. Two-stage detectors offer accurate detection for complex input at
the expense of high computing time [20]. These factors call for a combined
surveillance device that saves computing time while improving accuracy. Handling an
imbalanced dataset is important for better classification [21,22]. Deep learning architectures are
rapidly being applied to facemask detection to prevent COVID-19 spread using transfer
learning-based deep neural networks [23-30]. Other deep learning [31-38] and optimization
algorithms are used to solve various optimization problems [39-47]. However, several gaps in using
deep learning systems for real-time implementation and prevention strategies must be addressed, as
indicated in [48-53].
3. Proposed Methodology
3.1. Convolutional Neural Network for face mask detection

A CNN is a deep feed-forward neural network that works on the principles of weight sharing and spatial
feature extraction, with lower computational cost [54]. The CNN architecture was initially
introduced by LeCun in 1989 and designed to process visual imagery [55]. Recent
improvements in CNN architectures for object detection have led to strong performance by several
CNN-based models in detecting face masks. CNN-based models adopt the architecture of artificial
neural networks (ANNs). A CNN can be thought of as a classifier that extracts and processes hierarchical
features from image data. The network uses activation functions and training algorithms to
adaptively learn spatial hierarchies of image features [56]. Thus, images and their labels are given as
input, and feature learning is done automatically. In a CNN, the first layer takes the input image.
Instead of having only an input layer and an output layer, CNNs have additional types of layers, or
building blocks, called convolutional layers, pooling layers, and fully connected layers [57]. The
convolutional layer is the core module of a CNN; it convolves the input image with learnable filters to
extract its features. Every filter is composed of neurons that detect features in the layer inputs.
The convolutional layer learns image features from local groups of pixels by using small squares of
input data. Each filter is slid over the input image to produce a feature map. Another feature map
is produced by sliding another filter over the input image. The number of filters determines the
number of feature maps. To reduce the dimensionality of individual feature maps while retaining the
crucial information, subsampling is used [58]. A pooling layer is periodically added
between two successive convolutional layers to reduce redundant representations from the
preceding layers, and hence it helps control overfitting. Average pooling and max pooling are the
typical pooling operations in convolutional neural networks. Max pooling is more suitable when the
pooled features are very sparse, whereas average pooling allows these networks to act on different
frequencies at each layer while downsampling the images to increase invariance and reduce
redundancy [59]. The pooling layer simply reduces the number of neurons of the previous
convolutional layer. Subsampling can be of varying types: average, sum, maximum, etc. For
maximum subsampling, a spatial neighborhood is specified (such as a 3 x 3 window), and the
largest element of the rectified feature map within that window is picked. The new feature maps are
convolved again, and subsampling is performed on the resulting features until the fully connected
layer is reached.
Thus, the fully connected layer is responsible for mapping extracted features (from pooling and
convolutional layers) into the final output, such as object detection [54].
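As an illustration of the subsampling step described above, the following minimal sketch applies max pooling and average pooling to a small feature map in plain Python. The 4 x 4 feature map, the 2 x 2 window, and the function name `pool_2d` are assumptions chosen for readability; real networks use library implementations.

```python
def pool_2d(feature_map, window=2, stride=2, reduce_fn=max):
    """Slide a window over a 2-D feature map and reduce each patch to one value."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for top in range(0, rows - window + 1, stride):
        pooled_row = []
        for left in range(0, cols - window + 1, stride):
            # Collect every value that falls inside the current window.
            patch = [
                feature_map[r][c]
                for r in range(top, top + window)
                for c in range(left, left + window)
            ]
            pooled_row.append(reduce_fn(patch))
        pooled.append(pooled_row)
    return pooled


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

max_pooled = pool_2d(feature_map)                   # [[6, 2], [2, 9]]
avg_pooled = pool_2d(feature_map, reduce_fn=mean)   # [[3.5, 1.0], [1.0, 6.0]]
print(max_pooled, avg_pooled)
```

Both variants shrink the 4 x 4 map to 2 x 2, which is the reduction in neurons that the pooling layer provides.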
3.2 Transfer learning for face mask detection
Deep neural networks perform better at image classification than other algorithms, but
training them requires high computational power and other resources, and it is time-consuming.
Deep learning-based transfer learning evolved to make training faster and more cost-effective.
Transfer learning transfers the trained knowledge of a neural network, in the form
of its parametric weights, to a new model. It boosts the performance of the new model
even when it is trained on a small dataset. There are several pre-trained models, such as InceptionV3,
Xception, MobileNet, MobileNetV2, VGG16, and ResNet50 [60,61,62,63,64,65], that are trained
on the ImageNet dataset, which contains about 14 million images. InceptionV3 is a 48-layer
convolutional neural network architecture developed by Google.
In this work, a transfer learning-based approach is proposed that uses the InceptionV3
pre-trained model to classify whether people are wearing face masks. The last
layer of InceptionV3 is removed, and the model is fine-tuned by adding a flattening layer, followed by a
dense layer of 512 neurons with ReLU activation and a dropout rate of 0.2, and finally a
dense output layer with one neuron and a sigmoid activation function that classifies whether a
person is wearing a mask. This transfer learning model is trained for 30 epochs, with each epoch
having 116 steps. The schematic representation of the proposed methodology is shown in Fig. 1. The
architecture of the proposed model is shown in Fig. 2.
Fig. 1: Schematic representation of the proposed work



Fig. 2: Architecture of the proposed model
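The classification head described in this section can be sketched in Keras roughly as follows. This is a minimal sketch rather than the report's actual training code: the function name `build_mask_classifier`, the frozen base network, the Adam optimizer, and the binary cross-entropy loss are assumptions, while the layer sizes (512-unit ReLU layer, dropout of 0.2, one sigmoid output) and the 224 x 224 input follow the text.

```python
import tensorflow as tf


def build_mask_classifier(input_shape=(224, 224, 3), weights="imagenet"):
    """InceptionV3 without its top layer, plus the head described in the text."""
    base = tf.keras.applications.InceptionV3(
        include_top=False,      # drop the original ImageNet classification layer
        weights=weights,        # "imagenet" loads pre-trained weights; None skips the download
        input_shape=input_shape,
    )
    base.trainable = False      # assumption: keep the pre-trained features frozen

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # single mask / no-mask probability
    ])
    # Optimizer and loss are assumptions; the report does not name them.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model
```

Under these assumptions, training with the schedule given in the text would look like `model.fit(train_data, epochs=30, steps_per_epoch=116)`, where `train_data` is a hypothetical batched dataset of images and 0/1 labels.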


3.3 Training and Deployment
Figure 3 illustrates the proposed face mask detection system, which is implemented in two
phases (training and deployment). The training phase consists of 11 steps ranging from dataset
collection to image classification. The dataset is divided into two classes: with_mask and
without_mask. In the first step, frames are extracted using OpenCV, the labels are one-hot encoded,
and the images are resized to 224x224. The unequal class sizes are then balanced by
computing the imbalance ratio and applying random oversampling (ROS). Image augmentation and face
detection are applied, and the images pass through many convolutional layers, which extract feature
maps. In the next step, transfer learning is implemented by replacing the last prediction layer of the
pretrained model with new prediction layers and fine-tuning the result. Finally, in the last step, the
fine-tuned Inception V3 classification model is applied to classify images.
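The imbalance computation and random oversampling step can be sketched in plain Python as below. The file names, the 7-to-3 class counts, and the helper names are hypothetical and only serve to show the mechanics: minority-class samples are duplicated at random until every class matches the largest one.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for item in items + extra:
            out_samples.append(item)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: 7 images with masks, 3 without.
paths = [f"img_{i}.jpg" for i in range(10)]
labels = ["with_mask"] * 7 + ["without_mask"] * 3

balanced_paths, balanced_labels = random_oversample(paths, labels)
print(imbalance_ratio(labels), imbalance_ratio(balanced_labels))
```

Oversampling is usually applied to the training split only, so that duplicated images do not leak into the test set.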

(a) Training phase



(b) Deployment phase

Fig 3. The proposed model for face mask detection

The proposed face mask detection system is simple and user-friendly.
It requires only a web camera as hardware and processes the captured video. The web
camera can be placed where the entrance of a shop, hotel, office, etc., is visible so that face masks
can be easily detected. In the proposed methodology, the video is processed using transfer learning
and an efficient deep learning method for detecting the face mask. The dataset used is the
face mask dataset, consisting of 8,731 images separated into two folders for training and testing
in a 7:3 ratio.

(a) Samples of people wearing masks from the dataset



(b) Samples of people not wearing masks from the dataset

Fig 4. Face mask dataset, samples for each class from the dataset
Learning more features using learning algorithms is difficult due to the face mask dataset’s small
size and various image complexities. Transfer learning based on deep learning is used to pass
knowledge learned from a source task to a related target task.
3.4. Dataset Characteristics
This paper conducted experiments using the face mask dataset, i.e., a dataset collected from Kaggle that
consists of 8,731 face images with a minimum size of 128 x 128. The faces in this dataset have
different orientations and degrees of occlusion, with many angles. Therefore, during the training
process, we collect multiple edges to minimize loss and ease training as much as possible.
3.5. Image Preprocessing and Face Detection
Data augmentation is a widely used approach for getting the most out of an
unbalanced data source. Frames were extracted from the videos and resized to 224x224. In a CNN, the
initial layers are in charge of extracting generic visual elements such as edges and textures. The
subsequent layers look for more specific features on the basis of the preceding ones. This procedure is
repeated over numerous layers until high-level semantic features can be detected. Finally, the
classification is carried out using a traditional neural network. A variety of training sets can be
obtained by applying changes to the photos, such as rotations, translations, and zooming.
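One way to apply the rotations, translations, and zooming mentioned above is with Keras preprocessing layers, as in the sketch below. The specific factors (0.05, 0.1, 0.1) and the helper name `augment_batch` are assumptions for illustration; the report does not state which augmentation ranges were used.

```python
import tensorflow as tf

# Random geometric transforms applied independently to each image.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomRotation(0.05),         # up to +/- 5% of a full turn (18 degrees)
    tf.keras.layers.RandomTranslation(0.1, 0.1),  # shift up to 10% of height and width
    tf.keras.layers.RandomZoom(0.1),              # zoom in or out by up to 10%
])


def augment_batch(images):
    """Return a randomly transformed copy of a batch of images."""
    # training=True forces the random layers to be active outside of model.fit().
    return augment(images, training=True)


# Stand-in batch of four 224x224 RGB images with values in [0, 1).
batch = tf.random.uniform((4, 224, 224, 3))
augmented = augment_batch(batch)
print(augmented.shape)
```

The output keeps the input shape, so the augmented batch can be fed to the classifier without further resizing.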
The transfer learning approach is preferable when only limited samples are available in the training set.
Then, the dataset can be enlarged by arranging the faces differently on a
template. Face detection is then achieved by removing the image borders that carry no useful
information. For this purpose, an effective approach called rapid object detection with a boosted
cascade of simple features is used, specifically the Haar cascade classifier.
The face detection step detects the face region using the Haar cascade classifier proposed by
Viola and Jones [66]. This classifier extracts features with the Haar wavelet technique using a 24 x 24
window and uses AdaBoost to remove redundant features and find a sequence of classifiers
𝑓1, 𝑓2, 𝑓3, …, 𝑓𝑘. The output of the cascade classifier is then used as the input to the Inception V3
model to detect the regions of face masks.

$$f_{w,b}(I) = \begin{cases} 1, & \text{if } \sum_{x,y} w(x, y)\, I(x, y) + b > 0 \\ 0, & \text{otherwise} \end{cases}$$
The weights w(x, y) in this formula take only three possible values {+1, -1, 0}: white for +1, black
for -1, and transparent for 0. As Fig. 5 shows, each pattern must also be symmetric under x-reflection
and y-reflection.

Fig 5. Rectangle features shown relative to the enclosing detection window
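The weak classifier defined by the formula above can be worked through on a tiny example. The sketch below uses a hypothetical 4 x 4 three-rectangle pattern (black, white, white, black columns), made-up pixel intensities, and an assumed bias of -500; it only illustrates how the weighted sum and threshold interact.

```python
def weak_classifier(image, weights, bias):
    """Return 1 if the weighted pixel sum plus the bias is positive, else 0."""
    total = sum(
        weights[y][x] * image[y][x]
        for y in range(len(image))
        for x in range(len(image[0]))
    )
    return 1 if total + bias > 0 else 0


# Three-rectangle pattern: -1 (black) on the outer columns, +1 (white) in the middle.
# It is unchanged by mirroring left-right or top-bottom.
weights = [[-1, +1, +1, -1] for _ in range(4)]

bright_center = [[50, 200, 200, 50] for _ in range(4)]  # weighted sum = 4 * 300 = 1200
flat_patch = [[100, 100, 100, 100] for _ in range(4)]   # weighted sum = 0

bias = -500  # assumed threshold term
print(weak_classifier(bright_center, weights, bias))  # 1200 - 500 > 0
print(weak_classifier(flat_patch, weights, bias))     # 0 - 500 <= 0
```

A patch with a bright center and dark sides fires the classifier, while a uniform patch does not, which is the kind of contrast cue that AdaBoost combines across many such features.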


4. Results and Discussion
The experiments for this work were conducted in the Google Colab environment. To
evaluate the performance of the transfer learning model, several performance metrics are used,
namely Accuracy, Precision, Sensitivity, Specificity, Intersection over Union (IoU), Matthews
Correlation Coefficient (MCC), and the cross-entropy classification error (CE). CE is formulated in
terms of 𝑌𝑖 and 𝑝𝑖, where 𝑌𝑖 represents the one-hot encoded vector and 𝑝𝑖 represents the predicted
probability.
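$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{3}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{4}$$

$$\text{IoU} = \frac{TP}{TP + FP + FN} \tag{5}$$

$$\text{MCC} = \frac{(TP \times TN) - (FP \times FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{6}$$

$$\text{CE} = -\sum_{i=1}^{n} Y_i \log_2(p_i) \tag{7}$$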

The other performance metrics are formulated in terms of True Positive (TP), False Positive (FP),
True Negative (TN), and False Negative (FN). TP, FP, TN, and FN are arranged in a grid-like
structure called the confusion matrix. For this work, two confusion matrices are constructed for
evaluating the performance of the model during training and testing. The two confusion matrices are
shown in Fig. 5, whereas Fig. 6 shows the comparison of precision, loss, and accuracy
during training and testing of the model.
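The metrics above can be computed directly from the four confusion-matrix counts, as in the following sketch. The counts (90, 10, 85, 15) are hypothetical and are not results from this work; MCC uses the standard definition with a square root in the denominator, and the cross-entropy uses the standard negative sign with a base-2 logarithm.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute the confusion-matrix metrics; assumes no denominator is zero."""
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "iou": tp / (tp + fp + fn),
        "mcc": (tp * tn - fp * fn) / mcc_denominator,
    }


def cross_entropy(one_hot, probabilities):
    """Cross-entropy between a one-hot label and predicted probabilities (log base 2)."""
    # Terms with a zero label contribute nothing, so they are skipped.
    return -sum(y * math.log2(p) for y, p in zip(one_hot, probabilities) if y > 0)


# Hypothetical confusion-matrix counts for illustration only.
metrics = binary_metrics(tp=90, fp=10, tn=85, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")

print(cross_entropy([1, 0], [0.5, 0.5]))
```

A production version would guard against zero denominators, which occur when a class is never predicted or never present.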

Fig. 5. Confusion Matrix

Fig. 6. Comparison of performance of the proposed model during training



Fig. 7. Output from proposed model


5. Conclusion
Based on my review of leading approaches, the proposed deep learning model is built using transfer
learning with InceptionV3. In this work, image augmentation techniques are used to enhance the
performance of the model, as they increase the diversity of the training data. The model achieved 100%
accuracy when tested on the training dataset. Furthermore, by employing larger volumes of data, the
model can be extended to classify the type of mask and to implement a facial recognition system that
identifies, at large scale, whether people in crowds are wearing masks. The ensemble approach not only
aids in reaching high accuracy but also significantly improves detection speed. Furthermore, transfer
learning on pretrained models and rigorous testing on an unbiased dataset resulted in a reliable and
low-cost solution. The findings support this application’s viability in real-world scenarios, thus
helping to prevent pandemic spread. Compared with existing approaches, the proposed method
achieved better performance in terms of accuracy, specificity, precision, recall, and F-measure in
two-class output. Future work can be expanded to include other mask-wearing issues to improve
accuracy. The developed model can be deployed on surveillance devices for biometric
applications, especially in polluted industries, with facial landmark detection and face masks.

REFERENCES
[1] Yan, Z. Unprecedented pandemic, unprecedented shift, and unprecedented opportunity. Hum. Behav.
Emerg. Technol. 2020.
[2] Shi, F.; Wang, J.; Shi, J.; Wu, Z.; Wang, Q.; Tang, Z.; et al. Review of artificial intelligence techniques in
imaging data acquisition, segmentation, and diagnosis for COVID-19. IEEE Rev. Biomed. Eng. 2021.
[3] Abboah-Offei, M.; Salifu, Y.; Adewale, B.; Bayuo, J.; Ofosu-Poku, R.; Opare-Lokko, E.B.A. A rapid review of
the use of face mask in preventing the spread of COVID-19. Int. J. Nurs. Stud. Adv. 2021, 3, 100013.
[4] Howard, J.; Huang, A.; Li, Z.; Tufekci, Z.; Zdimal, V.; van der Westhuizen, H.-M.; von Delft, A.; Price,
A.; Fridman, L.; Tang, L.-H.; et al. An evidence review of face masks against COVID-19. Proc. Natl. Acad.
Sci. USA 2021, 118, e2014564118.
[5] Godoy, L.R.G.; Jones, A.E.; Anderson, T.N.; Fisher, C.L.; Seeley, K.M.; Beeson, E.A.; Zane, H.K.;
Peterson, J.W.; Sullivan, P.D. Facial protection for healthcare workers during pandemics: A scoping review.
BMJ Glob. Health 2020, 5, e002553.
[6] Nanni, L.; Ghidoni, S.; Brahnam, S. Handcrafted vs. non-handcrafted features for computer vision
classification. Pattern Recognit. 2017, 71, 158–172.
[7] Erhan, D.; Szegedy, C.; Toshev, A.; Anguelov, D. Scalable Object Detection using Deep Neural Networks.
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA,
23–28 June 2014; pp. 2147–2154.
[8] Sethi, S.; Kathuria, M.; Kaushik, T. Face mask detection using deep learning: An approach to reduce risk
of Coronavirus spread. J. Biomed. Inform. 2021, 120, 103848.
[9] Sen, S.; Sawant, K. Face mask detection for COVID-19 pandemic using pytorch in deep learning. IOP
Conf. Ser. Mater. Sci. Eng. 2021, 1070, 012061.
[10] Balaji, S.; Balamurugan, B.; Kumar, T.A.; Rajmohan, R.; Kumar, P.P. A Brief Survey on AI Based Face
Mask Detection System for Public Places. Ir. Interdiscip. J. Sci. Res. 2021, 5, 108–117.
[11] Cheng, G.; Li, S.; Zhang, Y.; Zhou, R. A Mask Detection System Based on Yolov3-Tiny. Front. Soc. Sci.
Technol. 2020, 2, 33–41.
[12] Sakshi, S.; Gupta, A.K.; Yadav, S.S.; Kumar, U. Face Mask Detection System using CNN. In
Proceedings of the 2021 IEEE International Conference on Advanced Computing and Innovative
Technologies in Engineering (ICACITE), Greater Noida, India, 4–5 March 2021; pp. 212–216.
[13] Jiang, M.; Fan, X.; Yan, H. RetinaMask: A Face Mask Detector. 2020. Available online:
http://arxiv.org/abs/2005.03950 (accessed on 5 April 2021).
[14] Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object
detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern
Recognition, Las Vegas, NV, USA, 27–30 June 2016; Volume 2016, pp. 779–788.
[15] Patel, K. Architecture comparison of AlexNet, VGGNet, ResNet, Inception, DenseNet. 2020. Available online:
https://towardsdatascience.com/architecture-comparison-of-alexnet-vggnet-resnet-inceptiondensenet-beb8b116866d
[16] Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-Based Convolutional Networks for Accurate
Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 142–158.
[17] He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual
Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916.
[18] Roy, B.; Nandy, S.; Ghosh, D.; Dutta, D.; Biswas, P.; Das, T. MOXA: A Deep Learning Based
Unmanned Approach For Real-Time Monitoring of People Wearing Medical Masks. Trans. Indian Natl. Acad.
Eng. 2020, 5, 509–518.
[19]. Ionescu, R.T.; Alexe, B.; Leordeanu, M.; Popescu, M.; Papadopoulos, D.P.; Ferrari, V. How hard can it
be? Estimating the difficulty of visual search in an image. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2157–2166.
[20] Soviany, P.; Ionescu, R.T. Optimizing the Trade-Off between Single-Stage and Two-Stage Deep Object
Detectors using Image Difficulty Prediction. In Proceedings of the 2018 20th International Symposium on
Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), Timisoara, Romania, 20–23
September 2018. [CrossRef]
[21] Devi Priya, R.; Sivaraj, R.; Anitha, N.; Devisurya, V. Forward feature extraction from imbalanced
microarray datasets using wrapper based incremental genetic algorithm. Int. J. Bio-Inspired Comput. 2020,
16, 171–180. [CrossRef]
[22] Devi Priya, R.; Sivaraj, R.; Anitha, N.; Rajadevi, R.; Devisurya, V. Variable population sized PSO for
highly imbalanced dataset classification. Comput. Intell. 2021, 37, 873–890.
[23] Chen, K. MMDetection: Open MMLab Detection Toolbox and Benchmark. arXiv 2019,
arXiv:1906.07155
[24] Goyal, H.; Sidana, K.; Singh, C. A real time face mask detection system using convolutional neural
network. Multimed. Tools Appl. 2022, 81, 14999–15015. [CrossRef]
[25] Farman, H.; Khan, T.; Khan, Z.; Habib, S.; Islam, M.; Ammar, A. Real-Time Face Mask Detection to
Ensure COVID-19 Precautionary Measures in the Developing Countries. Appl. Sci. 2022, 12, 3879.
[CrossRef]
[26] Mbunge, E.; Simelane, S.; Fashoto, S.G.; Akinnuwesi, B.; Metfula, A.S. Application of deep learning
and machine learning models to detect COVID-19 face masks—A review. Sustain. Oper. Comput. 2021, 2,
235–245. [CrossRef]
[27] Tomás, J.; Rego, A.; Viciano-Tudela, S.; Lloret, J. Incorrect Facemask-Wearing Detection Using
Convolutional Neural Networks with Transfer Learning. Healthcare 2021, 9, 1050. [CrossRef] [PubMed]
[28] Jiang, X.; Gao, T.; Zhu, Z.; Zhao, Y. Real-Time Face Mask Detection Method Based on YOLOv3.
Electronics 2021, 10, 837. [CrossRef]
[29] Hussain, S.; Yu, Y.; Ayoub, M.; Khan, A.; Rehman, R.; Wahid, J.; Hou, W. IoT and Deep Learning Based
Approach for Rapid Screening and Face Mask Detection for Infection Spread Control of COVID-19. Appl.
Sci. 2021, 11, 3495. [CrossRef]
[30] Awan, M.J.; Bilal, M.H.; Yasin, A.; Nobanee, H.; Khan, N.S.; Zain, A.M. Detection of COVID-19 in
Chest X-ray Images: A Big Data Enabled Deep Learning Approach. Int. J. Environ. Res. Public Health 2021,
18, 10147. [CrossRef]
[31] Ardabili, S.; Mosavi, A.; Várkonyi-Kóczy, A.R. Systematic review of deep learning and machine
learning models in biofuels research. In Engineering for Sustainable Future; Springer: Cham, Switzerland,
2020; pp. 19–32. [CrossRef]
[32] Abdelminaam, D.S.; Ismail, F.H.; Taha, M.; Taha, A.; Houssein, E.H.; Nabil, A. Coaid-deep: An
optimized intelligent framework for automated detecting COVID-19 misleading information on Twitter. IEEE
Access 2021, 9, 27840–27867. [CrossRef] [PubMed]
[33] Emadi, M.; Taghizadeh-Mehrjardi, R.; Cherati, A.; Danesh, M.; Mosavi, A.; Scholten, T. Predicting and
Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran. Remote Sens. 2020,
12, 2234. [CrossRef]
[34] Salama AbdELminaam, D.; Almansori, A.M.; Taha, M.; Badr, E. A deep facial recognition system using
intelligent computational algorithms. PLoS ONE 2020, 15, e0242269. [CrossRef]
[35] Mahmoudi, M.R.; Heydari, M.H.; Qasem, S.N.; Mosavi, A.; Band, S.S. Principal component analysis to
study the relations between the spread rates of COVID-19 in high risks countries. Alex. Eng. J. 2020, 60,
457–464. [CrossRef]
[36] Ardabili, S.; Mosavi, A.; Várkonyi-Kóczy, A.R. Advances in Machine Learning Modeling Reviewing
Hybrid and Ensemble Methods. In Engineering for Sustainable Future; Springer: Cham, Switzerland, 2020;
pp. 215–217. [CrossRef]
[37] AbdElminaam, D.S.; ElMasry, N.; Talaat, Y.; Adel, M.; Hisham, A.; Atef, K.; Mohamed, A.; Akram, M.
HR-Chat bot: Designing and Building Effective Interview Chat-bots for Fake CV Detection. In Proceedings
of the 2021 International
Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 26–27 May 2021; pp.
403–408. [CrossRef]
[38] Rezakazemi, M.; Mosavi, A.; Shirazian, S. ANFIS pattern for molecular membranes separation
optimization. J. Mol. Liq. 2018, 274, 470–476. [CrossRef]
[39] Torabi, M.; Hashemi, S.; Saybani, M.R.; Shamshirband, S.; Mosavi, A. A Hybrid clustering and
classification technique for forecasting short-term energy consumption. Environ. Prog. Sustain. Energy 2018,
38, 66–76. [CrossRef]
[40] Ardabili, S.; Abdolalizadeh, L.; Mako, C.; Torok, B.; Mosavi, A. Systematic Review of Deep Learning
and Machine Learning for Building Energy. Front. Energy Res. 2022, 10, 786027. [CrossRef]
[41] Houssein, E.H.; Hassaballah, M.; Ibrahim, I.E.; AbdElminaam, D.S.; Wazery, Y.M. An automatic
arrhythmia classification model based on improved Marine Predators Algorithm and Convolutions Neural
Networks. Expert Syst. Appl. 2021, 187, 115936. [CrossRef]
[42] Deb, S.; Abdelminaam, D.S.; Said, M.; Houssein, E.H. Recent Methodology-Based Gradient-Based
Optimizer for Economic Load Dispatch Problem. IEEE Access 2021, 9, 44322–44338. [CrossRef]
[43] Elminaam, D.S.A.; Neggaz, N.; Ahmed, I.A.; Abouelyazed, A.E.S. Swarming Behavior of Harris Hawks
Optimizer for Arabic Opinion Mining. Comput. Mater. Contin. 2021, 69, 4129–4149. [CrossRef]
[44] Band, S.S.; Ardabili, S.; Sookhak, M.; Chronopoulos, A.T.; Elnaffar, S.; Moslehpour, M.; Csaba, M.;
Torok, B.; Pai, H.-T.; Mosavi, A. When Smart Cities Get Smarter via Machine Learning: An In-Depth
Literature Review. IEEE Access 2022, 10, 60985–61015. [CrossRef]
[45] Mohammadzadeh, S.D.; Kazemi, S.-F.; Mosavi, A.; Nasseralshariati, E.; Tah, J.H. Prediction of
compression index of fine-grained soils using a gene expression programming model. Infrastructures 2019, 4,
26. [CrossRef]
[46] Deb, S.; Houssein, E.H.; Said, M.; Abdelminaam, D.S. Performance of Turbulent Flow of Water
Optimization on Economic Load Dispatch Problem. IEEE Access 2021, 9, 77882–77893. [CrossRef]
[47] Abdul-Minaam, D.S.; Al-Mutairi, W.M.E.S.; Awad, M.A.; El-Ashmawi, W.H. An Adaptive
Fitness-Dependent Optimizer for the One-Dimensional Bin Packing Problem. IEEE Access 2020, 8,
97959–97974. [CrossRef]
[48] Mosavi, A.; Golshan, M.; Janizadeh, S.; Choubin, B.; Melesse, A.M.; Dineva, A.A. Ensemble models of
GLM, FDA, MARS, and RF for flood and erosion susceptibility mapping: A priority assessment of
sub-basins. Geocarto Int. 2020, 2541–2560. [CrossRef]
[49] Mercaldo, F.; Santone, A. Transfer learning for mobile real-time face mask detection and localization. J.
Am. Med. Inform. Assoc. 2021, 28, 1548–1554. [CrossRef]
[50] Teboulbi, S.; Messaoud, S.; Hajjaji, M.A.; Mtibaa, A. Real-Time Implementation of AI-Based Face Mask
Detection and Social Distancing Measuring System for COVID-19 Prevention. Sci. Program. 2021, 2021,
8340779. [CrossRef]
[51] Hussain, D.; Ismail, M.; Hussain, I.; Alroobaea, R.; Hussain, S.; Ullah, S.S. Face Mask Detection Using
Deep Convolutional Neural Network and MobileNetV2-Based Transfer Learning. Wirel. Commun. Mob.
Comput. 2022, 2022, 1536318. [CrossRef]
[52] Shaban, H.; Houssein, E.H.; Pérez-Cisneros, M.; Oliva, D.; Hassan, A.Y.; Ismaeel, A.A.; AbdElminaam,
D.S.; Deb, S.; Said, M. Identification of parameters in photovoltaic models through a runge kutta optimizer.
Mathematics 2021, 9, 2313. [CrossRef]


[53] Houssein, E.H.; Abdelminaam, D.S.; Hassan, H.N.; Al-Sayed, M.M.; Nabil, E. A hybrid barnacles
mating optimizer algorithm with support vector machines for gene selection of microarray cancer
classification. IEEE Access 2021, 9, 64895–64905. [CrossRef]
Vibhuti; Jindal, N.; Singh, H.; Rana, P.S. Face mask detection in COVID-19: A strategic review.
Multimedia Tools Appl. 2022, 1–30. [CrossRef] [PubMed]
[54] M. Jiang, X. Fan, Retinamask: a face mask detector. ArXiv 2020.
[55] S.V. Militante, N.V. Dionisio, Deep learning implementation of facemask and physical distancing
detection with alarm systems, Proceeding - 2020 3rd Int. Conf. Vocat. Educ. Electr. Eng. Strength. Framew.
Soc. 5.0 through Innov. Educ. Electr. Eng. Informatics Eng. ICVEE 2020, 2020
[56] R. Yamashita, M. Nishio, RKG. Do, K. Togashi, Convolutional neural networks: an overview and
application in radiology, Insights Imaging (2018)
[57] AR. Pathak, M. Pandey, S. Rautaray, Application of deep learning for object detection, in: Procedia
Comput. Sci., vol. 132, Elsevier B.V., 2018, pp. 1706–1717
[58] M. Mahdianpari, B. Salehi, M. Rezaee, F. Mohammadimanesh, Y. Zhang, Very deep convolutional neural
networks for complex land cover mapping using multispectral remote sensing imagery, Remote Sens. (2018)
[59] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, et al., Recent advances in convolutional neural
networks, Pattern Recognit. (2018)
[60] Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for
computer vision. CoRR abs/1512.00567 (2015)
[61] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. CoRR abs/1610.02357
(2016)
[62] Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.:
Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR abs/1704.04861
(2017)
[63] Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A., Chen, L.: Inverted residuals and linear bottlenecks:
Mobile networks for classification, detection and segmentation. CoRR abs/1801.04381 (2018)
[64] Liu, S., Deng, W.: Very deep convolutional neural network based image classification using small
training sample size. In: 2015 3rd IAPR Asian Conference on Pattern Recognition (ACPR). pp. 730–734
(2015)
[65] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. CoRR abs/1512.03385
(2015)
[66] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in
Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition.
CVPR 2001.
