
Covid-19 Face Mask Detection Using TensorFlow, Keras and OpenCV

Arjya Das, Mohammad Wasif Ansari, Rohini Basak
Department of Information Technology, Jadavpur University, Kolkata, India
[email protected], [email protected], [email protected]

2020 IEEE 17th India Council International Conference (INDICON), DOI: 10.1109/INDICON49873.2020.9342585

Abstract—The COVID-19 pandemic has rapidly affected our day-to-day life, disrupting world trade and movements. Wearing a protective face mask has become a new normal. In the near future, many public service providers will ask customers to wear masks correctly to avail of their services. Therefore, face mask detection has become a crucial task to help global society. This paper presents a simplified approach to achieve this purpose using some basic Machine Learning packages such as TensorFlow, Keras, OpenCV and Scikit-Learn. The proposed method detects the face in the image correctly and then identifies whether it has a mask on it or not. As a surveillance task performer, it can also detect a face along with a mask in motion. The method attains accuracy up to 95.77% and 94.58% respectively on two different datasets. We explore optimized values of parameters of the Sequential Convolutional Neural Network model to detect the presence of masks correctly without causing over-fitting.

Keywords—Coronavirus, Covid-19, Machine Learning, Face Mask Detection, Convolutional Neural Network, TensorFlow

I. INTRODUCTION

According to the World Health Organization (WHO)'s official Situation Report 205, coronavirus disease 2019 (COVID-19) has globally infected over 20 million people, causing over 0.7 million deaths [1]. Individuals with COVID-19 have had a wide range of reported symptoms, from mild manifestations to serious illness. Respiratory problems such as shortness of breath or difficulty in breathing are among them. Elderly people with lung disease appear to be at higher risk of serious complications from COVID-19 [2]. Some common human coronaviruses that infect people around the world are 229E, HKU1, OC43 and NL63. Before debilitating individuals, viruses like 2019-nCoV, SARS-CoV and MERS-CoV infect animals and evolve into human coronaviruses [3]. Persons with respiratory problems can expose anyone in close contact with them to infective droplets. The surroundings of an infected individual can also cause contact transmission, as virus-carrying droplets may land on nearby surfaces [4].

To curb certain respiratory viral ailments, including COVID-19, wearing a clinical mask is very necessary. The public should be aware of whether to put on the mask for source control or prevention of COVID-19. Potential advantages of the use of masks lie in reducing exposure risk from an infectious individual during the "pre-symptomatic" period and in reducing the stigmatization of individuals wearing masks to restrain the spread of the virus. WHO stresses prioritizing medical masks and respirators for health care assistants [4]. Therefore, face mask detection has become a crucial task in present global society.

Face mask detection involves detecting the location of the face and then determining whether it has a mask on it or not. The issue is closely related to general object detection, which detects classes of objects. Face identification deals with distinguishing a specific class of entities, i.e. faces. It has numerous applications, such as autonomous driving, education, surveillance, and so on [5]. This paper presents a simplified approach to serve the above purpose using basic Machine Learning (ML) packages such as TensorFlow, Keras, OpenCV and Scikit-Learn.

The rest of the paper is organized as follows: Section II explores related work associated with face mask detection. Section III discusses the nature of the datasets used. Section IV presents the details of the packages incorporated to build the proposed model. Section V gives an overview of our method. Experimental results and analysis are reported in Section VI. Section VII concludes and draws the line towards future work.

II. RELATED WORK

In a face detection method, a face is detected in an image that has several attributes in it. According to [21], research into face detection requires expression recognition, face tracking and pose estimation. Given a solitary image, the challenge is to identify the face in the picture. Face detection is a difficult task because faces vary in size, shape, color, etc. and are not immutable. The task becomes harder still for an unclear image occluded by something else or not facing the camera, and so forth. The authors in [22] consider that occlusive face detection comes with two major challenges:
1) the unavailability of sufficiently large datasets containing both masked and unmasked faces, and 2) the absence of facial expression cues in the covered area. Utilizing the locally linear embedding (LLE) algorithm and dictionaries trained on a very large pool of masked faces and synthesized normal faces, several missing expressions can be recovered and the dominance of facial cues can be mitigated to a great extent. According to the work reported in [11], convolutional neural networks (CNNs) in computer vision come with a strict constraint regarding the size of the input image. The prevalent practice is to reconfigure the images before fitting them into the network to overcome this limitation.

Here the main challenge of the task is to detect the face in the image correctly and then identify whether it has a mask on it or not. In order to perform surveillance tasks, the proposed method should also detect a face along with a mask in motion.

III. DATASET

Two datasets have been used for experimenting with the current method. Dataset 1 [16] consists of 1376 images, of which 690 images show people wearing face masks and the remaining 686 show people who do not wear face masks. As shown in Fig. 1, it mostly contains frontal face poses with a single face in the frame and with the same type of mask, in white color only.

Fig. 1. Samples from Dataset 1 including faces without masks and with masks

Dataset 2 from Kaggle [17] consists of 853 images, and its faces are annotated either with a mask or without a mask. As shown in Fig. 2, some face collections include head turn, tilt and slant, with multiple faces in the frame and with different types of masks in different colors as well.

Fig. 2. Samples from Dataset 2 including faces without masks and with masks

IV. INCORPORATED PACKAGES

A. TensorFlow

TensorFlow, an interface for expressing machine learning algorithms, is utilized for implementing ML systems in production across many areas of computer science, including sentiment analysis, voice recognition, geographic information extraction, computer vision, text summarization, information retrieval, computational drug discovery and flaw detection [18]. In the proposed model, the whole Sequential CNN architecture (consisting of several layers) uses TensorFlow as its backend. It is also used to reshape the data (images) in the data processing step.

B. Keras

Keras provides fundamental abstractions and building blocks for the creation and shipping of ML solutions with high iteration velocity. It takes full advantage of the scalability and cross-platform capabilities of TensorFlow. The core data structures of Keras are layers and models [19]. All the layers used in the CNN model are implemented using Keras. Along with the conversion of the class vector to the binary class matrix in data processing, it helps to compile the overall model.

C. OpenCV

OpenCV (Open Source Computer Vision Library), an open-source computer vision and ML software library, is utilized to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, follow eye movements, remove red eyes from pictures taken using flash, find similar pictures in an image database, recognize scenery and set up markers to overlay it with augmented reality, and so forth [20]. The proposed method makes use of these features of OpenCV for resizing and color conversion of the data images.

V. THE PROPOSED METHOD

The proposed method consists of a cascade classifier and a pre-trained CNN which contains two 2D convolution layers connected to layers of dense neurons. The algorithm for face mask detection is as follows:
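The original algorithm listing does not survive in this text extraction. The following is a minimal sketch of the detection loop implied by the description — a Haar cascade for face localization followed by the trained CNN classifier built in the next sections. The model file name, cascade choice and label ordering are illustrative assumptions, not taken from the paper.

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

IMG_SIZE = 100                                    # faces are resized to 100 x 100 grayscale
model = load_model("mask_model.h5")               # assumed file name of the trained Sequential CNN
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
labels = {0: "with mask", 1: "without mask"}      # assumed to match label_dict from data processing

cap = cv2.VideoCapture(0)                         # webcam stream for surveillance-style detection
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = cv2.resize(gray[y:y + h, x:x + w], (IMG_SIZE, IMG_SIZE)) / 255.0
        pred = model.predict(face.reshape(1, IMG_SIZE, IMG_SIZE, 1), verbose=0)
        label = labels[int(np.argmax(pred))]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    cv2.imshow("Face mask detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```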
A. Data Processing

Data preprocessing involves the conversion of data from a given format to a much more user-friendly, desired and meaningful format. The data can be in any form, such as tables, images, videos, graphs, etc. This organized information fits an information model or schema and captures the relationships between different entities [6]. The proposed method deals with image and video data using NumPy and OpenCV.

a) Data Visualization: Data visualization is the process of transforming abstract data into meaningful representations using knowledge communication and insight discovery through encodings. It is helpful to study a particular pattern in the dataset [7].

The total number of images in the dataset is visualized in both categories – 'with mask' and 'without mask'. The statement categories=os.listdir(data_path) lists the directories in the specified data path, so the variable categories now looks like: ['with mask', 'without mask']. Then, to find the number of labels, we enumerate those categories using labels=[i for i in range(len(categories))], which sets the labels as: [0, 1]. Finally, each category is mapped to its respective label using label_dict=dict(zip(categories,labels)); zip() returns an iterator of tuples in which the items of the passed iterables are paired together consecutively. The mapped variable label_dict looks like: {'with mask': 0, 'without mask': 1}.
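These few setup lines can be sketched as follows, assuming the dataset is organized into one sub-directory per class under a directory referred to here as data_path (the directory name is an illustrative assumption).

```python
import os

data_path = "dataset"                          # assumed folder with one sub-directory per class
categories = os.listdir(data_path)             # e.g. ['with mask', 'without mask']
labels = [i for i in range(len(categories))]   # e.g. [0, 1]
label_dict = dict(zip(categories, labels))     # e.g. {'with mask': 0, 'without mask': 1}
print(categories, labels, label_dict)
```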
b) Conversion of RGB image to Gray image: Modern descriptor-based image recognition systems regularly work on grayscale images, without elaborating on the method used to convert from color to grayscale. This is because the color-to-grayscale method is of little consequence when robust descriptors are used. Introducing nonessential information could increase the size of the training data required to achieve good performance. As grayscale simplifies the algorithm and diminishes the computational requirements, it is utilized for extracting descriptors instead of working on color images directly [8].

Fig. 3. Conversion of an RGB image to a grayscale image of 100x100 size

We use the function cv2.cvtColor(input_image, flag) for changing the color space, where flag determines the type of conversion [9]. In this case, the flag cv2.COLOR_BGR2GRAY is used for gray conversion. Deep CNNs require a fixed-size input image, so we need a fixed common size for all the images in the dataset. Using cv2.resize(), the grayscale image is resized to 100 x 100.
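Taken together, the grayscale conversion and resizing form the image-loading loop sketched below; the loop structure and variable names are assumptions consistent with the description rather than the authors' exact code.

```python
import os
import cv2

img_size = 100
data, target = [], []

for category in categories:                            # from the data-visualization step
    folder = os.path.join(data_path, category)
    for img_name in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, img_name))
        if img is None:                                # skip unreadable files
            continue
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # BGR image to grayscale
        resized = cv2.resize(gray, (img_size, img_size))  # fixed 100 x 100 input
        data.append(resized)
        target.append(label_dict[category])
```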
c) Image Reshaping: The input during classification of an image is a three-dimensional tensor, where each channel has a prominent unique pixel. All the images must have the same size, corresponding to the 3D feature tensor. However, images typically do not have the same size, and consequently neither do their corresponding feature tensors [10]. Most CNNs can only accept fixed-size images. This engenders several problems throughout data collection and the implementation of the model. However, reconfiguring the input images before feeding them into the network helps to surmount this constraint [11].

The images are normalized to bring the pixel values into the range between 0 and 1. Then they are converted into 4-dimensional arrays using data=np.reshape(data,(data.shape[0], img_size,img_size,1)), where 1 indicates a grayscale image. As the final layer of the neural network has 2 outputs – with mask and without mask, i.e. a categorical representation – the labels are converted to categorical form.
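In code, the normalization, reshaping and label conversion described above might look like the sketch below; keras.utils.to_categorical is assumed for the conversion, since the paper only states that the class vector is converted to a binary class matrix.

```python
import numpy as np
from tensorflow.keras.utils import to_categorical

data = np.array(data) / 255.0                                    # normalize pixel values to [0, 1]
data = np.reshape(data, (data.shape[0], img_size, img_size, 1))  # 4D tensor, 1 = grayscale channel
target = to_categorical(np.array(target))                        # class vector -> binary class matrix
print(data.shape, target.shape)                                  # e.g. (1376, 100, 100, 1) (1376, 2)
```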
B. Training of Model

a) Building the model using CNN architecture: CNNs have become dominant in miscellaneous computer vision tasks [12]. The current method makes use of a Sequential CNN. The first convolution layer is followed by Rectified Linear Unit (ReLU) and MaxPooling layers. The convolution layer learns 200 filters. The kernel size is set to 3 x 3, which specifies the height and width of the 2D convolution window. As the model should be aware of the expected input shape, the first layer in the model needs to be provided with information about the input shape; the following layers can perform automatic shape inference [13]. In this case, input_shape is specified as data.shape[1:], which returns the dimensions of the data array from index 1. The default padding is "valid", where the spatial dimensions are allowed to shrink and the input volume is not zero-padded. The activation parameter of the Conv2D class is set to "relu". ReLU represents an approximately linear function that retains the properties of linear models, which makes it easy to optimize with gradient-descent methods; considering performance and generalization in deep learning, it compares favorably with other activation functions [14]. Max Pooling is used to reduce the spatial dimensions of the output volume. The pool size is set to 3 x 3, and the resulting output has a shape (number of rows or columns) of shape_of_output = (input_shape - pool_size + 1) / strides, where strides has the default value (1, 1) [15].
As shown in Fig. 4, the second convolution layer has 100 filters and its kernel size is likewise set to 3 x 3. It is followed by ReLU and MaxPooling layers. To feed the data into the fully connected part of the CNN, the output is passed through a Flatten layer, which transforms the matrix of features into a vector that can be fed into a fully connected neural network classifier. To reduce overfitting, a Dropout layer with a 50% chance of setting inputs to zero is added to the model. Then a Dense layer of 64 neurons with a ReLU activation function is added. The final (Dense) layer, with two outputs for the two categories, uses the Softmax activation function.

Fig. 4. Convolutional Neural Network architecture
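Assembling the layers exactly as described (200 and 100 Conv2D filters with 3 x 3 kernels, ReLU, 3 x 3 MaxPooling, Flatten, 50% Dropout, a Dense layer of 64 neurons, and a 2-way Softmax output) gives roughly the following Keras model; treat it as a sketch consistent with the text rather than the authors' exact code.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dropout, Dense

model = Sequential([
    # First convolution block: 200 filters, 3x3 kernel, ReLU, 3x3 max pooling
    Conv2D(200, (3, 3), activation="relu", input_shape=data.shape[1:]),
    MaxPooling2D(pool_size=(3, 3)),
    # Second convolution block: 100 filters, 3x3 kernel, ReLU, 3x3 max pooling
    Conv2D(100, (3, 3), activation="relu"),
    MaxPooling2D(pool_size=(3, 3)),
    # Flatten the feature maps and classify with a small fully connected head
    Flatten(),
    Dropout(0.5),                      # 50% dropout to reduce overfitting
    Dense(64, activation="relu"),
    Dense(2, activation="softmax"),    # two outputs: with mask / without mask
])
model.summary()
```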
The learning process needs to be configured first with the compile method [13]. Here the "adam" optimizer is used. categorical_crossentropy, also known as multiclass log loss, is used as the loss function (the objective that the model tries to minimize). As the problem is a classification problem, the metric is set to "accuracy".
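This corresponds to a single compile call, sketched below.

```python
model.compile(optimizer="adam",
              loss="categorical_crossentropy",  # multiclass log loss
              metrics=["accuracy"])
```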

b) Splitting the data and training the CNN model: After setting up the blueprint to analyze the data, the model needs to be trained on a specific dataset and then tested against a different dataset. A proper model and an optimized train-test split help to produce accurate results while making a prediction. The test size is set to 0.1, i.e. 90% of the dataset undergoes training and the remaining 10% is used for testing purposes. The validation loss is monitored using ModelCheckpoint. Next, the images in the training set and the test set are fitted to the Sequential model. Here, 20% of the training data is used as validation data. The model is trained for 20 epochs (iterations), which maintains a trade-off between accuracy and the chance of overfitting. Fig. 5 depicts a visual representation of the proposed model.

Fig. 5. Overview of the Model
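A sketch of the split-and-train step, assuming scikit-learn's train_test_split and a Keras ModelCheckpoint callback that keeps the weights with the lowest validation loss (the checkpoint file name pattern is an illustrative assumption).

```python
from sklearn.model_selection import train_test_split
from tensorflow.keras.callbacks import ModelCheckpoint

# 90% of the data for training, 10% held out for testing
train_data, test_data, train_target, test_target = train_test_split(
    data, target, test_size=0.1)

# Save the model whenever the validation loss improves
checkpoint = ModelCheckpoint("model-{epoch:03d}.h5",
                             monitor="val_loss",
                             save_best_only=True,
                             verbose=1)

history = model.fit(train_data, train_target,
                    epochs=20,
                    validation_split=0.2,   # 20% of the training data used for validation
                    callbacks=[checkpoint])

print(model.evaluate(test_data, test_target))
```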

VI. RESULT AND ANALYSIS

The model is trained, validated and tested on the two datasets. On dataset 1, the method attains accuracy of up to 95.77% (shown in Fig. 7). Fig. 6 depicts how this optimized accuracy mitigates the cost of error. Dataset 2 is more versatile than dataset 1, as it has multiple faces in the frame and different types of masks in different colors as well. The model attains an accuracy of 94.58% on dataset 2, as shown in Fig. 9. Fig. 8 depicts the contrast between training and validation loss corresponding to dataset 2. One of the main reasons behind achieving this accuracy lies in MaxPooling. It provides rudimentary translation invariance to the internal representation along with a reduction in the number of parameters the model has to learn. This sample-based discretization process down-samples the input representation (the image) by reducing its dimensionality. The number of neurons has an optimized value of 64, which is not too high; a much higher number of neurons and filters can lead to worse performance. The optimized filter values and pool size help to filter out the main portion (the face) of the image so that the presence of a mask is detected correctly without causing over-fitting.

Fig. 6. # epochs vs loss corresponding to dataset 1

Fig. 7. # epochs vs accuracy corresponding to dataset 1

The system can efficiently detect partially occluded faces, whether occluded by a mask, hair or a hand. It considers the occlusion degree of four regions – nose, mouth, chin and eyes – to differentiate between an annotated mask and a face covered by a hand. Therefore, only a mask covering the face fully, including the nose and chin, is treated as "with mask" by the model.
Fig. 8. # epochs vs loss corresponding to dataset 2

Fig. 9. # epochs vs accuracy corresponding to dataset 2

The main challenges faced by the method comprise varying angles and lack of clarity. Indistinct moving faces in the video stream make detection more difficult. However, following the trajectories across several frames of the video helps to reach a better decision – "with mask" or "without mask".
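One simple way to exploit several consecutive frames, as suggested above, is to smooth the per-frame predictions before committing to a label. The majority-vote buffer below is an illustrative sketch, not something specified in the paper.

```python
from collections import deque, Counter

recent = deque(maxlen=15)   # labels from the last 15 frames (window size is an assumption)

def smoothed_label(frame_label):
    """Return the majority label over recent frames instead of the raw per-frame output."""
    recent.append(frame_label)               # e.g. "with mask" or "without mask"
    return Counter(recent).most_common(1)[0][0]
```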
VII. CONCLUSIONS

In this paper, we first briefly explained the motivation of the work. Then, we illustrated the learning and performance task of the model. Using basic ML tools and simplified techniques, the method has achieved reasonably high accuracy. It can be used for a variety of applications. Wearing a mask may be obligatory in the near future, considering the COVID-19 crisis, and many public service providers will ask customers to wear masks correctly to avail of their services. The deployed model will contribute immensely to the public health care system. In the future, it can be extended to detect whether a person is wearing the mask properly or not. The model can be further improved to detect whether the mask is virus-prone or not, i.e. whether the type of mask is surgical, N95 or neither.

REFERENCES

[1] W.H.O., "Coronavirus disease 2019 (COVID-19): situation report, 205", 2020.
[2] "Coronavirus Disease 2019 (COVID-19) – Symptoms", Centers for Disease Control and Prevention, 2020. [Online]. Available: https://www.cdc.gov/coronavirus/2019-ncov/symptoms-testing/symptoms.html.
[3] "Coronavirus — Human Coronavirus Types — CDC", Cdc.gov, 2020. [Online]. Available: https://www.cdc.gov/coronavirus/types.html.
[4] W.H.O., "Advice on the use of masks in the context of COVID-19: interim guidance", 2020.
[5] M. Jiang, X. Fan and H. Yan, "RetinaMask: A Face Mask detector", arXiv.org, 2020. [Online]. Available: https://arxiv.org/abs/2005.03950.
[6] B. Suvarnamukhi and M. Seshashayee, "Big Data Concepts and Techniques in Data Processing", International Journal of Computer Sciences and Engineering, vol. 6, no. 10, pp. 712-714, 2018. doi: 10.26438/ijcse/v6i10.712714.
[7] F. Hohman, M. Kahng, R. Pienta and D. H. Chau, "Visual Analytics in Deep Learning: An Interrogative Survey for the Next Frontiers", IEEE Transactions on Visualization and Computer Graphics, vol. 25, no. 8, pp. 2674-2693, Aug. 2019. doi: 10.1109/TVCG.2018.2843369.
[8] C. Kanan and G. Cottrell, "Color-to-Grayscale: Does the Method Matter in Image Recognition?", PLoS ONE, vol. 7, no. 1, p. e29740, 2012. doi: 10.1371/journal.pone.0029740.
[9] "Changing Colorspaces — OpenCV-Python Tutorials 1 Documentation", Opencv-python-tutroals.readthedocs.io, 2020. [Online]. Available: https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_colorspaces/py_colorspaces.html.
[10] M. Hashemi, "Enlarging smaller images before inputting into convolutional neural network: zero-padding vs. interpolation", Journal of Big Data, vol. 6, no. 1, 2019. doi: 10.1186/s40537-019-0263-7.
[11] S. Ghosh, N. Das and M. Nasipuri, "Reshaping inputs for convolutional neural network: Some common and uncommon methods", Pattern Recognition, vol. 93, pp. 79-94, 2019. doi: 10.1016/j.patcog.2019.04.009.
[12] R. Yamashita, M. Nishio, R. Do and K. Togashi, "Convolutional neural networks: an overview and application in radiology", Insights into Imaging, vol. 9, no. 4, pp. 611-629, 2018. doi: 10.1007/s13244-018-0639-9.
[13] "Guide to the Sequential model - Keras Documentation", Faroit.com, 2020. [Online]. Available: https://faroit.com/keras-docs/1.0.1/getting-started/sequential-model-guide/.
[14] C. Nwankpa, W. Ijomah, A. Gachagan and S. Marshall, "Activation Functions: Comparison of Trends in Practice and Research for Deep Learning", arXiv.org, 2020. [Online]. Available: https://arxiv.org/abs/1811.03378.
[15] K. Team, "Keras documentation: MaxPooling2D layer", Keras.io, 2020. [Online]. Available: https://keras.io/api/layers/pooling_layers/max_pooling2d/.
[16] "prajnasb/observations", GitHub, 2020. [Online]. Available: https://github.com/prajnasb/observations/tree/master/experiements/data.
[17] "Face Mask Detection", Kaggle.com, 2020. [Online]. Available: https://www.kaggle.com/andrewmvd/face-mask-detection.
[18] "TensorFlow White Papers", TensorFlow, 2020. [Online]. Available: https://www.tensorflow.org/about/bib.
[19] K. Team, "Keras documentation: About Keras", Keras.io, 2020. [Online]. Available: https://keras.io/about.
[20] "OpenCV", Opencv.org, 2020. [Online]. Available: https://opencv.org/.
[21] D. Meena and R. Sharan, "An approach to face detection and recognition", 2016 International Conference on Recent Advances and Innovations in Engineering (ICRAIE), Jaipur, 2016, pp. 1-6. doi: 10.1109/ICRAIE.2016.7939462.
[22] S. Ge, J. Li, Q. Ye and Z. Luo, "Detecting Masked Faces in the Wild with LLE-CNNs", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 426-434. doi: 10.1109/CVPR.2017.53.