Masked Face Recognition Using Deep Learning

ABSTRACT

Facial recognition is a widely used technology. However, it performs poorly when the
user's face is partially covered, a problem that has become pressing now that wearing
masks is the new normal since the outbreak of COVID-19. In this paper, we propose an
access control system capable of recognizing users' identities even when they are
wearing masks. For the proposed system, we create our dataset by automatically adding
masks to existing datasets of unmasked facial images. A deep learning architecture is
developed by pairing different current models and loss functions to train on these
datasets. The model is applied to our system, which combines a desktop camera
application with a backend server to simulate an access control system.

TABLE OF CONTENTS
Acknowledgement
Abstract
1. INTRODUCTION
   1.1 Overview
   1.2 Objective
2. LITERATURE SURVEY
   2.1 DCNN Pipeline
   2.2 Loss Function
3. PREAMBLE
   3.1 Existing System
   3.2 Proposed System
   3.3 Methodology
4. REQUIREMENT SPECIFICATION
   4.1 Hardware
   4.2 Software
5. DESIGN
   5.1 Masked Face Recognition Pipeline
   5.2 Dataset Creation
6. IMPLEMENTATION
   6.1 Masked Face Recognition Model
   6.2 Masked Face Recognition Process
7. RESULT
   7.1 Masked Face Recognition Model
   7.2 InceptionResNetV1 with ArcFace Loss
   7.3 InceptionResNetV1 with Triplet Loss
   7.4 System
8. CONCLUSION
REFERENCES
Appendix

LIST OF TABLES
2.1 Literature Survey Summary
5.4 The raw datasets used
7.1 Test result of models with different configurations
7.4.1 Test result with different threshold values with method 1
7.4.3 Test result with Euclidean distance and cosine similarity

LIST OF FIGURES
3.3.1 Simplified recognition flow chart
5.1 DCNN pipeline design
5.2 Flow chart for recognition steps
5.3 Sample images after applying MaskTheFace to the LFW dataset
7.1.1 t-SNE analysis with 10 random people, each with 3 images
7.2 Test accuracy of ArcFace vs. triplet loss with the same architecture, InceptionResNetV1
7.2.1 Validation graphs of pre-trained InceptionResNetV1 with ArcFace in epochs 1, 5, 6, 10, 11, 15
7.2.2 Evaluation matrix of the test result of InceptionResNetV1 with ArcFace, and confusion matrix
7.3.1 Training with semi-hard triplet pairs for 1 epoch and 5 epochs
7.3.2 Training with both semi-hard and hard triplets for 5 epochs with weights 1, 5 and 10 for hard triplets
7.4.2 Visualization of validation result for our best model (ArcFace) and conditional probability against distance difference with a fitted customized sigmoid function
7.4.4 Loss, metrics, false positives and negatives during training of the prototypical network
7.4.5 Metrics against threshold values for Euclidean distance and for cosine similarity
8.1 Face recognized, mask not detected
8.2 Masked face recognized
8.3 Mask detected, face not recognized
8.4 Mask not detected, face not recognized
Masked Face Recognition Using Deep Learning

Chapter 1
INTRODUCTION
1.1 Overview
Face recognition is one of the most important applications of machine learning. It has been
widely used in products like access control systems. However, the technology still cannot offer
satisfactory performance in particular conditions, including when the face is half covered.
Some people wear masks due to health problems or privacy concerns. However, not many
studies had been done on masked face recognition, as these were relatively rare cases
until 2020. With the outbreak of COVID-19 worldwide, wearing masks in public areas has
become the new normal, or even a regulation in some places, including Bengaluru, making
recognition of masked faces an imminent need for improving the current technology.
The main goal of this project is to find a feasible solution to automatically recognize people’s
faces even when they are wearing masks, in order to compensate for the inadequate accuracy
offered by the conventional face recognition technology when the users’ faces are partially
covered. As deep learning is the most popular method to approach conventional face
recognition problems, we also decided to use deep learning to train robust models in this
project. We will pair various robust models and loss functions that are commonly used
in deep learning and face recognition, and feed them the dataset we create ourselves in
the project.
To test our model in a real-world situation, we choose to build an access control system which
supports masked face recognition. A server is set up to simulate a practical situation that allows
users to register and upload images themselves and compares the incoming data from the
camera application against the user database to calculate the recognition result.
Given the inadequate accuracy of existing conventional face recognition when matching
masked faces, access control systems must either be temporarily abandoned or require
the users to remove their masks beforehand. This not only makes the process inconvenient
for users but also increases the risk of COVID-19 infection. We believe masked face
recognition can greatly improve the existing system, allowing users to remain
contactless, without removing their masks, in an automatic access control system.
1.2 Objective
The goal of this project is to train a face recognition model which is capable of identifying
people even when they are wearing masks and integrate the model into an access control
system.


1.2.1 Create datasets of masked face images


Our first goal is to create datasets of masked face images, which are not readily
available online. We will add masks automatically to normal face images, allowing us to
utilize the large number of existing datasets built for unmasked face recognition.
1.2.2 Develop a masked face recognition model
Our second goal is to train a masked face recognition model, with the dataset created by
ourselves, which can recognize static facial images with masks. We would like to look into the
methods used in current face recognition technology including various pre-trained models and
loss functions, and develop a deep learning architecture to train our model. We will also
evaluate different parameters and choose a threshold that minimizes false-positive
cases, as required for an access control system.


Chapter 2
LITERATURE SURVEY
Sinno Jialin Pan et al., [1] give a comprehensive overview of transfer learning for
classification, regression and clustering as developed in the machine learning and data
mining areas. There has also been a large amount of work on transfer learning for
reinforcement learning in the machine learning literature. The main benefits of transfer
learning include the saving of resources and improved efficiency when training new
models. The need for transfer learning may arise when the data can easily become
outdated.

M. Tan et al., [10] show that EfficientNet-B7 achieves a state-of-the-art 84.3% top-1
accuracy on ImageNet while being 8.4x smaller and 6.1x faster at inference than the best
existing ConvNet, by studying model scaling and identifying that carefully balancing
network depth, width and resolution leads to better performance.

Wadii Boulila et al., [13] propose a training model for mask detection based on the
Single-Shot Multibox Detector (SSD) and You Only Look Once (YOLO) version 2, which can
also be used to monitor social distancing. The testing of this model is performed on
complex images including face turning, glasses, bearded faces, and scarf images, and
attains 93.4% accuracy. An off-line step creates a DL model that is able to detect and
locate facemasks; it is a light-weight DL model suited to edge devices and provides
excellent results for object detection.

Table 2.1 Literature Survey Summary

1. (2010) A survey on transfer learning — S. J. Pan and Q. Yang
   Methodology: A comprehensive survey of transfer learning for classification, regression and clustering.
   Advantage: The main benefits of transfer learning include the saving of resources and improved efficiency when training new models.
   Disadvantage: The need for transfer learning may arise when the data can easily become outdated.

2. (2012) Learning Face Representation from Scratch — Dong Yi, Zhen Lei, Shengcai Liao and Stan Z. Li
   Methodology: A semi-automatic way to collect face images from the Internet, building a large-scale dataset containing about 10,000 subjects and 500,000 images.
   Advantage: Commercial picture search engines can be used to supplement the dataset; more efficient annotating software and algorithms can be created.
   Disadvantage: The face pairs of the photographs in each batch are not covered; generating complete face pairings is left to future work.

3. (2015) FaceNet: A unified embedding for face recognition and clustering — F. Schroff, D. Kalenichenko and J. Philbin
   Methodology: A deep convolutional network trained to directly optimize the embedding.
   Advantage: The model only requires minimal alignment (a tight crop around the face area).
   Disadvantage: The paper also notes an accuracy trade-off with the number of model parameters; with fewer parameters the picture is not as clear.

4. (2015) Deep Learning Face Attributes in the Wild — Ziwei Liu, Ping Luo, Xiaogang Wang and Xiaoou Tang
   Methodology: A novel deep learning framework for attribute prediction in the wild. It cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags but pre-trained differently.
   Advantage: The filters used save redundant computation, which enables evaluating images of arbitrary size in real time and allows images of arbitrary size as input without any normalization.
   Disadvantage: The face regions cannot be separated from the background and the other body parts.

5. (2018) Focal loss for dense object detection — T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár
   Methodology: Focal loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
   Advantage: Applicable to two-stage detectors, such as Region-based CNN (R-CNN) and its successors.
   Disadvantage: When p is very close to 0 (when y = 0) or 1, easily classified examples with large p_t > 0.5 can still incur a loss of non-trivial magnitude.

6. (2018) Fine-tuning CNN image retrieval with no human annotation — F. Radenović, G. Tolias and O. Chum
   Methodology: Fine-tunes CNNs for image retrieval on a large collection of unordered images in a fully automated manner.
   Advantage: The proposed method does not require any manual annotation and yet achieves top performance on standard benchmarks.
   Disadvantage: It requires a very large amount of data in order to perform better than other techniques.

7. (2018) Face Segmentation — Yuval Nirkin
   Methodology: Uses the information present in the edge map of the image and, through some preprocessing, separates the head from the background clutter.
   Advantage: Removes unwanted portions of the picture, providing better accuracy.
   Disadvantage: Face identification is done in 2-D intensity images, and the results do not extend to a 3-D system.

8. (2018) VGGFace2: A dataset for recognising faces across pose and age — Q. Cao, L. Shen, W. Xie, O. M. Parkhi and A. Zisserman
   Methodology: Trains ResNet-50 to assess face recognition performance using the new dataset.
   Advantage: Creates a pipeline for collecting a high-quality dataset, VGGFace2, with a wide range of pose and age, and demonstrates deep models (ResNet-50 and SENet) trained on VGGFace2.
   Disadvantage: Some classes contain a mixture of faces of more than one person, or overlap with another class in the dataset.

9. (2019) ArcFace: Additive angular margin loss for deep face recognition — J. Deng, J. Guo, N. Xue and S. Zafeiriou
   Methodology: The proposed ArcFace has a clear geometric interpretation due to its exact correspondence to the geodesic distance on the hypersphere.
   Advantage: The Additive Angular Margin Loss (ArcFace) obtains highly discriminative features for face recognition.
   Disadvantage: Massive data storage burden; the ML technology used in face detection requires powerful data storage that may not be available to all users.

10. (2020) EfficientNet: Rethinking model scaling for convolutional neural networks — M. Tan and Q. V. Le
    Methodology: EfficientNet-B7 achieves a state-of-the-art 84.3% top-1 accuracy on ImageNet while being 8.4x smaller and 6.1x faster at inference than the best existing ConvNet.
    Advantage: Studies model scaling and identifies that carefully balancing network depth, width and resolution can lead to better performance.
    Disadvantage: A missing piece prevents even better accuracy and efficiency.

11. (2020) Masked Face Recognition Dataset and Application — Zhongyuan Wang, Guangcheng Wang and Baojin Huang
    Methodology: Proposes three types of masked face datasets: the Masked Face Detection Dataset, the Real-world Masked Face Recognition Dataset and the Simulated Masked Face Recognition Dataset.
    Advantage: The face-eye-based multi-granularity model achieves 95% recognition accuracy, greater by about 5% than current models.
    Disadvantage: The related task of face mask recognition, i.e. identifying whether a person is wearing a mask as required or not, is not covered.

12. (2020) Masked Face Recognition for Secure Authentication — Aqeel Anwar and Arijit Raychowdhury
    Methodology: Masked faces are recognized with low false-positive rates and high overall accuracy, without requiring the user dataset to be recreated by taking new pictures for authentication.
    Advantage: An open-source tool, MaskTheFace, can be used to mask faces, resulting in the creation of a large dataset of masked faces (24,771 images) that can be used to train an effective facial recognition system with target accuracy for masked faces.
    Disadvantage: The dataset faces are not consistent or aligned, making the dataset a little harder to use.

13. (2021) A Deep Learning-based Approach for Real-time Facemask Detection — Wadii Boulila and Ayyub Alzahem
    Methodology: An off-line step creates a DL model that is able to detect and locate facemasks.
    Advantage: A light-weight DL model suited to edge devices; it provides excellent results for object detection.
    Disadvantage: Testing is performed on complex images including face turning, glasses, bearded faces, and scarf images; the testing accuracy is 93.4%.


Chapter 3
PREAMBLE
3.1 Existing System
When existing face recognition systems are presented with a masked face, they fail to
identify the person, rendering the system unusable. The need for a face recognition
system which can recognize masked faces has become evident in the wake of the ongoing
situation.
3.2 Proposed System
In the proposed system, we train a face recognition model capable of identifying people
even when they are wearing masks, and integrate the model into an access control system.
The design goals are:
• Cost effective: the real-time system should be affordable and built from
  cost-effective components.
• Fast: the algorithm should be on par with, or faster than, existing ones.
• Accuracy: the algorithm should recognize masked faces accurately.

3.3 Methodology

Figure 3.3.1: Simplified Recognition Flow Chart


• Face detection detects the face from the camera. It returns the coordinates of the
  bounding box where the face appears in the original image.
• The system checks whether the user is wearing a mask, and prompts the user to wear
  one if they are not.
• The user is granted access if their face is recognised. The face data is stored in a
  secure storage unit.
A minimal sketch of this flow follows.
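
To make the flow concrete, the following is a minimal Python sketch of one pass through
this pipeline. The helper callables (detect_face, has_mask, recognize) are hypothetical
placeholders for the modules described in later chapters, not implementations from this
report.

from typing import Callable, Optional

def process_frame(frame, detect_face: Callable, has_mask: Callable,
                  recognize: Callable) -> Optional[str]:
    """One pass of the simplified recognition flow in Figure 3.3.1."""
    box = detect_face(frame)                  # bounding box of the face, or None
    if box is None:
        return None                           # no face detected: keep idling
    x1, y1, x2, y2 = box
    face = frame[y1:y2, x1:x2]                # crop to the detected face
    if not has_mask(face):
        print("Please wear a mask")           # warning; authentication continues
    return recognize(face)                    # user name if recognised, else None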


Chapter 4
REQUIREMENT SPECIFICATION
The study of the requirement specification focuses on the functioning of the system. It
allows the developer or analyst to understand the system functions to be carried out,
the performance level to be obtained and the corresponding interfaces to be established.
4.1 Hardware
1. GPU for model training
2. Camera for image capture
3. Storage Units
4.2 Software
1. Python
2. PyCharm
3. Anaconda
4. TensorFlow
5. Keras
6. OpenCV
Python is a high-level, interpreted, general-purpose programming language. Its design
philosophy emphasizes code readability with the use of significant indentation. Python is
dynamically-typed and garbage-collected. It supports multiple programming paradigms,
including structured (particularly procedural), object-oriented and functional programming. It
is often described as a "batteries included" language due to its comprehensive standard library.
The core principles of Python are summarized as follows:
• Beautiful is better than ugly.
• Explicit is better than implicit.
• Simple is better than complex.
• Complex is better than complicated.
• Readability counts.
PyCharm is a dedicated Python Integrated Development Environment (IDE) providing a wide
range of essential tools for Python developers, tightly integrated to create a convenient
environment for productive Python, web, and data science development. We have used the
Community Edition of PyCharm, which is free and open-source. It is used for smart and
intelligent Python development, including code assistance, refactoring, visual debugging, and
version control integration.


Anaconda is a distribution of the Python and R programming languages for scientific


computing (data science, machine learning applications, large-scale data processing, predictive
analytics, etc.), that aims to simplify package management and deployment. The distribution
includes data-science packages suitable for Windows, Linux, and macOS. It is developed and
maintained by Anaconda, Inc., which was founded by Peter Wang and Travis Oliphant in 2012.
As an Anaconda, Inc. product, it is also known as Anaconda Distribution or Anaconda
Individual Edition, while other products from the company are Anaconda Team Edition and
Anaconda Enterprise Edition, both of which are not free.
Package versions in Anaconda are managed by the package management system conda. This
package manager was spun out as a separate open-source package as it ended up being useful
on its own and for things other than Python. There is also a small, bootstrap version of
Anaconda called Miniconda, which includes only conda, Python, the packages they depend on,
and a small number of other packages.
TensorFlow is a free and open-source software library for machine learning and artificial
intelligence. It can be used across a range of tasks but has a particular focus on training and
inference of deep neural networks. TensorFlow was developed by the Google Brain team for
internal Google use in research and production. The initial version was released under the
Apache License 2.0 in 2015. Google released the updated version of TensorFlow, named
TensorFlow 2.0, in September 2019. TensorFlow can be used in a wide variety of programming
languages, most notably Python, as well as JavaScript, C++, and Java.


Chapter 5
DESIGN
5.1 Masked Face Recognition Pipeline
A supervised DCNN pipeline was designed to solve the facial recognition problem in this
project. We choose InceptionResNetV1 as the deep convolutional layers and ArcFace as the
loss function because they gave the highest performance in our experiments. All facial
images are first resized to 128x128x3: if the image size is too large, more time and GPU
memory are needed during training; if it is too small, many important features cannot be
extracted by the DCNN. Then, normalization and data augmentation are done before fitting
the data into the InceptionResNetV1 classifier, which outputs a 512-dimensional image
embedding. Finally, ArcFace is used to compute the cost and perform optimization; a
sketch of the ArcFace loss follows the component list below.

Figure 5.1: DCNN Pipeline Design

• Normalization and Data Augmentation: normalizing the data before training a deep
  network stabilizes the data distribution so that gradients do not vanish. Data
  augmentation includes random horizontal or vertical flipping to increase the amount
  of data and reduce overfitting.
• Deep Convolutional Layers: many convolutional and pooling layers. Convolutional
  layers extract the important features from the images; pooling layers reduce the size
  of the feature maps so that the model stays relatively small. The last layer is a
  global average pooling or generalized mean pooling, which further reduces the size.
• Fully-connected Layer: a layer that resizes the output from the convolutional layers
  to 512 dimensions, a common embedding size in deep learning. A 512-dimensional
  embedding is enough to represent most of the important features of an image while
  remaining efficient to compute.


• Loss Function: a function to calculate the cost between the predictions and the real
  labels so that optimization can be done through backpropagation.
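
As a concrete illustration of the loss-function block, below is a minimal PyTorch sketch
of an ArcFace head. The hyper-parameters s = 64 and m = 0.5 are the common defaults from
the ArcFace paper, and num_classes defaults to the 10,575 CASIA identities here purely
for illustration; the report does not state the exact values used.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ArcFaceHead(nn.Module):
    """Additive angular margin loss: the target-class logit becomes
    s * cos(theta + m) instead of s * cos(theta)."""
    def __init__(self, embedding_dim=512, num_classes=10575, s=64.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, embedding_dim))
        nn.init.xavier_uniform_(self.weight)
        self.s, self.m = s, m

    def forward(self, embeddings, labels):
        # Cosine similarity between L2-normalised embeddings and class centres.
        cosine = F.linear(F.normalize(embeddings), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cosine.size(1)).bool()
        # Add the angular margin m only to the target-class angle.
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return F.cross_entropy(self.s * logits, labels)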

Figure 5.2: Flow Chart for Recognition Steps

The overall flow of the recognition process in our system is shown in Figure 5.2. Face
detection is always in use, and the whole recognition process is initiated only when a
face is detected. The image is then passed to mask detection, which determines whether a
normal face recognition model or the masked face recognition model is used for feature
extraction, depending on whether the user is wearing a mask. The extracted embedding is
then sent to the backend server where the user's data is stored, and embedding matching
is done on the server to tell whether the user is valid. The result is logged and sent
back to the camera application: if the user is valid, access is granted and a blue box
is shown with the name of the valid user; otherwise access is denied and a red box is
shown with a warning.

5.2 Dataset Creation

A deep learning model always requires a large amount of diverse data to train a robust
model, so the dataset is essential in our project. However, as no masked face image
dataset was available online, we had to create our own dataset for the project.


5.2.1 Adding Masks Automatically

MaskTheFace is a GitHub package that can add different kinds of masks to normal face
images. The output masked face images are of satisfactory quality, without problematic
cases such as masks being added to wrong positions on the face, with a few sample images
shown in Figure 5.3.

Figure 5.3: Sample Images After Applying MaskTheFace to the LFW Dataset

This package enables us to make use of the large number of datasets of normal face
images currently available. Thus, the CASIA, VGGFace2 and LFW datasets are used as the
raw data in our project, with detailed information given in Table 5.4.

Table 5.4: The Raw Datasets Used

Datasets   | Number of identities | Number of images
CASIA      | 10,575               | 494,414
VGGFace2   | 9,131                | 3,310,000
LFW        | 5,749                | 13,233
We plan to add green and blue surgical masks, white N95 masks, white KN95 masks and
black cloth masks to the images in the aforementioned datasets. Adding different types
of masks makes the model more robust, so it is less sensitive to the colour or type of
mask worn by the person in the image. In some cases, the facial images in the original
datasets are of low quality and the MaskTheFace package may fail to detect the faces
correctly; those images are not processed. Nevertheless, as multiple masks are added to
each image, the actual number of output images in our datasets is comparable to or even
larger than in the original datasets. The images from the processed CASIA and VGGFace2
are used for training, and the images from the processed LFW are split into validation
and test sets. A sketch of this batch-masking step is shown below.
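
A minimal sketch of the batch-masking step follows. Here mask_face is a hypothetical
wrapper around MaskTheFace (taking an image path and a mask type, returning a PIL image
or None on detection failure); the package's actual interface is not reproduced in this
report.

import os

MASK_TYPES = ["surgical_green", "surgical_blue", "N95", "KN95", "cloth_black"]

def build_masked_dataset(src_dir, dst_dir, mask_face):
    """Apply every mask type to every raw image; skip images whose face
    cannot be detected, as described above."""
    for name in os.listdir(src_dir):
        for mask in MASK_TYPES:
            masked = mask_face(os.path.join(src_dir, name), mask)
            if masked is None:
                continue                      # low-quality image: not processed
            out = f"{os.path.splitext(name)[0]}_{mask}.jpg"
            masked.save(os.path.join(dst_dir, out))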


Chapter 6
IMPLEMENTATION
6.1 Masked Face Recognition Model

6.1.1 InceptionResNetV1

We wanted to use transfer learning to train the facial recognition model in order to let
the loss converge faster and save training time: since our masked face dataset has more
than 2 million facial images, training would take very long from scratch. A Python
package, facenet-pytorch, provides a pre-trained InceptionResNetV1, trained on VGGFace2,
a large facial image dataset with more than 3,000,000 images. This pre-trained model
achieves 99.65% accuracy on the LFW facial image dataset. We simply loaded the
pre-trained InceptionResNetV1 as the classifier and used PyTorch, a deep learning
framework, to keep training it with our masked face dataset. We did not need to modify
the output size of the 1-layer fully connected layer, because the last linear layer of
InceptionResNetV1 already outputs 512 dimensions. After the loss and accuracy converge,
the model can generate a representative 512-dimensional embedding containing the
important features of the facial image.
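
Loading the pre-trained backbone is a one-liner with facenet-pytorch. The training loop
below is only an outline, with masked_loader and arcface_head standing in for our
dataset loader and the loss head (they are not defined here).

import torch
from facenet_pytorch import InceptionResnetV1

# Pre-trained on VGGFace2; the last linear layer already outputs 512 dimensions.
model = InceptionResnetV1(pretrained='vggface2').train()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# for images, labels in masked_loader:        # our masked-face dataset
#     loss = arcface_head(model(images), labels)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()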

6.1.2 SE-ResNeXt-101

We could not find an SE-ResNeXt-101 model trained on facial image datasets; most
pre-trained SE-ResNeXt-101 models were trained on ImageNet, a large object recognition
dataset. A model pre-trained for object recognition is not suitable for transfer
learning in facial recognition, as its feature extraction can only extract object
features; using such a model is no better than training from scratch with weight
initialization. Our method is therefore to train an SE-ResNeXt-101 from scratch without
transfer learning. The Python package timm provides many pre-implemented image models,
SE-ResNeXt-101 among them. We loaded SE-ResNeXt-101 from timm and trained it with the
same method and hyper-parameters as InceptionResNetV1.
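
A sketch of that setup with timm is shown below; the exact model name can differ between
timm versions, so it should be checked with timm.list_models('*seresnext*').

import timm
import torch.nn as nn

# num_classes=0 makes timm return pooled features instead of class logits.
backbone = timm.create_model('seresnext101_32x4d', pretrained=False, num_classes=0)
embed = nn.Linear(backbone.num_features, 512)  # project features to a 512-d embedding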


6.2 System

Our system consists of three parts. The camera application captures images from the user
and stores them; the system returns the recognition result, which is reflected in the
camera application; and the system also allows users to register and upload their own
images.

6.2.1 Masked Face Recognition Process

A camera application is built with OpenCV from skeleton code available online and
handles the major part of the recognition process. Multiprocessing is used in the
application: one process grabs images from the camera and puts them into a queue, and
another gets images from the queue and displays them in an image widget. Multiple steps,
shown in the flow chart earlier (Figure 5.2), process each incoming image and perform
embedding matching on the server, with the results shown in the camera application. A
minimal sketch of the two-process capture loop follows.
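
A minimal sketch of the two-process capture loop is shown below; the display here uses a
plain OpenCV window rather than the application's image widget.

import cv2
from multiprocessing import Process, Queue

def grab(queue, source=0):
    """Producer: grab frames from the camera and put them into the queue."""
    cap = cv2.VideoCapture(source)
    while True:
        ok, frame = cap.read()
        if ok and not queue.full():
            queue.put(frame)

def show(queue):
    """Consumer: take frames from the queue and display them."""
    while True:
        cv2.imshow('camera', queue.get())
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

if __name__ == '__main__':
    q = Queue(maxsize=2)   # a small queue keeps the displayed stream current
    Process(target=grab, args=(q,), daemon=True).start()
    show(q)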

Face Detection

We use MTCNN, a well-known face detection package, in our face detection module. It
returns the coordinates of the bounding box where the face appears in the original
image. As users are authenticated one by one in an access control system, only the first
face detected by MTCNN in the captured image is processed. The cropped facial image
inside the bounding box, rather than the whole image, is then passed to the following
steps.
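
A sketch of this step with the facenet-pytorch MTCNN is shown below. The min_face_size
value is illustrative (Section 7.4.1 describes how a minimum face size was chosen
experimentally).

from facenet_pytorch import MTCNN
from PIL import Image

mtcnn = MTCNN(keep_all=False, min_face_size=60)  # illustrative minimum face size

img = Image.open('frame.jpg')                 # a frame captured by the camera
boxes, probs = mtcnn.detect(img)              # bounding boxes and confidences
if boxes is not None:
    x1, y1, x2, y2 = boxes[0]                 # only the first face is processed
    face = img.crop((x1, y1, x2, y2))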

Face Mask Detection

MaskDetection is a package developed by AIZOOTech, and we use it in our mask detection
module. As wearing a mask has become mandatory in some places, such as Hong Kong, a
warning is shown when the user is not wearing a mask, though the authentication
procedure does not stop. More importantly, the mask information is useful in the next
step and can increase the accuracy of the system.

Embedding Extraction

Since the conventional face recognition model still performs better than our model when
more facial features are visible, we use the pre-trained InceptionResNetV1 to extract
the embedding when the user is not wearing a mask. Otherwise, our masked face
recognition model is used to extract the embedding, as sketched below.
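
The switch itself is simple; a minimal sketch, with both models passed in as callables:

def extract_embedding(face, wearing_mask, masked_model, normal_model):
    """Pick the extractor from the mask-detection result: the pre-trained
    InceptionResnetV1 for unmasked faces, our masked-face model otherwise."""
    model = masked_model if wearing_mask else normal_model
    return model(face)    # a 512-dimensional embedding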


Pseudo Code

import os
import cv2
import numpy as np

# ---- TensorFlow version check: fall back to v1-compatible behaviour on TF 2.x
import tensorflow
if tensorflow.__version__.startswith('1.'):
    import tensorflow as tf
else:
    import tensorflow.compat.v1 as tf
    tf.disable_v2_behavior()
print("Tensorflow version: ", tf.__version__)

img_format = {'png', 'PNG', 'jpg', 'JPG', 'JPEG', 'bmp', 'BMP'}

def video_init(camera_source=0, resolution="1080", to_write=False, save_dir=None):
    '''
    :param camera_source: camera index or the path of a video clip
    :param resolution: '480', '720' or '1080'; set None for videos
    :param to_write: whether to record
    :param save_dir: the folder to save the recording
    :return: cap, height, width, writer
    '''
    writer = None
    resolution_dict = {"480": [480, 640], "720": [720, 1280], "1080": [1080, 1920]}

    # ---- camera source connection
    cap = cv2.VideoCapture(camera_source)

    # ---- resolution decision
    if resolution_dict.get(resolution) is not None:
        width = resolution_dict[resolution][1]
        height = resolution_dict[resolution][0]
        cap.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        cap.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
    else:
        height = cap.get(cv2.CAP_PROP_FRAME_HEIGHT)  # default 480
        width = cap.get(cv2.CAP_PROP_FRAME_WIDTH)    # default 640
        print("video size is auto set")

    '''
    ref: https://docs.opencv.org/master/dd/d43/tutorial_py_video_display.html
    FourCC is a 4-byte code used to specify the video codec; the list of available
    codes can be found at fourcc.org. It is platform dependent: in Fedora DIVX, XVID,
    MJPG, X264, WMV1 and WMV2 work (XVID is preferable; MJPG gives large files, X264
    very small ones); in Windows DIVX; in OSX MJPG (.mp4), DIVX (.avi), X264 (.mkv).
    The FourCC code is passed as cv2.VideoWriter_fourcc(*'MJPG') for MJPG.
    '''
    if to_write is True:
        fourcc = cv2.VideoWriter_fourcc(*'XVID')
        save_path = 'demo.avi'
        if save_dir is not None:
            save_path = os.path.join(save_dir, save_path)
        writer = cv2.VideoWriter(save_path, fourcc, 30, (int(width), int(height)))

    return cap, height, width, writer

def stream(pb_path, node_dict, ref_dir, camera_source=0, resolution="480",
           to_write=False, save_dir=None):
    frame_count = 0
    FPS = "loading"
    face_mask_model_path = r'face_mask_detection.pb'
    margin = 40
    id2class = {0: 'Mask', 1: 'NoMask'}
    batch_size = 32
    threshold = 0.8
    display_mode = 0
    label_type = 0
    cap, height, width, writer = video_init(camera_source=camera_source,
                                            resolution=resolution,
                                            to_write=to_write, save_dir=save_dir)

    # ---- face detection init
    # (model loading and the collection of reference image paths are elided in this
    # report excerpt; `paths`, `len_ref_path`, `model_shape` and `ites` come from
    # that elided setup)

    # ---- compute reference embeddings batch by batch
    for i in range(ites):
        num_start = i * batch_size
        num_end = np.minimum(num_start + batch_size, len_ref_path)
        batch_data_dim = [num_end - num_start]
        batch_data_dim.extend(model_shape[1:])
        batch_data = np.zeros(batch_data_dim, dtype=np.float32)

        for idx, path in enumerate(paths[num_start:num_end]):
            # cv2.imdecode handles paths with non-ASCII characters, unlike cv2.imread
            img = cv2.imdecode(np.fromfile(path, dtype=np.uint8), 1)
            # (filling batch_data and running the model are elided in this excerpt)

    # (the per-frame detection/recognition loop is elided in this excerpt)
    if writer is not None:
        writer.release()

if __name__ == "__main__":
    camera_source = 0  # usb camera or laptop camera
    # The camera source can also be the path of a clip, for example:
    # camera_source = r"rtsp://192.168.0.137:8554/hglive"
    # camera_source = r"C:\Users\User\Downloads\demo01.avi"

    # pb_path: the frozen face recognition model
    pb_path = r"C:\Users\realk\Documents\Project\pb\pb_model_select_num=15.pb"
    node_dict = {'input': 'input:0',
                 'keep_prob': 'keep_prob:0',
                 'phase_train': 'phase_train:0',
                 'embeddings': 'embeddings:0',
                 }

    # ref_dir: a folder containing the reference images for face recognition
    ref_dir = r"C:\Users\realk\Documents\Project\dataset\database"

    # resolution: '480', '720' or '1080'; set None when the input is a video file
    stream(pb_path, node_dict, ref_dir, camera_source=camera_source,
           resolution="480", to_write=False, save_dir=None)


Chapter 7
RESULT
7.1 Masked Face Recognition Model

This section covers our experiments with different models and loss functions, comparing
their results through the training, validation and testing processes. The best model is
attained with InceptionResNetV1 and ArcFace, which is also used as the final model in
the access control system. The following table shows the best result achieved with each
combination of model and loss function. Our model's ability to extract similar
embeddings for images of the same person is also illustrated by the t-SNE analysis in
Figure 7.1.1, where images from the same person form obvious clusters.

Table 7.1: Test Result of Models with Different Configurations

Model    | Architecture      | Loss function | Pre-trained | Accuracy
Model 1  | InceptionResNetV1 | ArcFace       | Yes         | 95.85%
Model 2  | InceptionResNetV1 | Triplet loss  | Yes         | 94.28%
Model 3  | SE-ResNeXt-101    | ArcFace       | No          | 93.51%

Figure 7.1.1: t-SNE Analysis with 10 Random People, Each with 3 Images
Figure 7.1.2: Distribution of Two Validation Sets

The two validation sets and the test set, prepared with the method described earlier,
were used to evaluate the performance of our model, with a sample visualization of the
validation result shown in Figure 7.1.2. For simplicity, validation set 1, which
contains pairs of images from the same person, will be called same-class pairs, and
validation set 2, which contains pairs of images from different people, will be called
different-class pairs.


7.2 InceptionResNetV1 with ArcFace Loss

Besides using triplet loss for optimization, we also try ArcFace to optimize the
pre-trained InceptionResNetV1.

Figure 7.2: The test accuracy of ArcFace is slightly better than the result of triplet
loss with the same model architecture, InceptionResNetV1

We choose a multistep decay scheduler for training: an equally spaced five-step learning
rate with values 0.1, 0.01, 0.001, 0.0001 and 0.00001 (the learning rate is reduced at
the end of epochs 5, 10, 15 and 20), and the training loss drops sharply after every
learning-rate decay. However, we cannot draw any conclusion from the training result
alone, because overfitting may have occurred; the validation and test results must be
analysed together with it. The IOU curve, one of the validation results, shows that the
model obtains its best result at epoch 11: after epoch 11 the validation result no
longer improves even though the training loss continues to drop, meaning the model is
best at epoch 11 and overfitting occurs after that. In PyTorch, this schedule
corresponds to the MultiStepLR sketch below.
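
A minimal sketch of the schedule, with a placeholder parameter standing in for the real
model and a hypothetical train_one_epoch step:

import torch
from torch.optim.lr_scheduler import MultiStepLR

params = [torch.nn.Parameter(torch.zeros(1))]  # placeholder for model.parameters()
optimizer = torch.optim.SGD(params, lr=0.1)
# Divide the learning rate by 10 at the end of epochs 5, 10, 15 and 20,
# giving 0.1, 0.01, 0.001, 0.0001 and 0.00001.
scheduler = MultiStepLR(optimizer, milestones=[5, 10, 15, 20], gamma=0.1)

# for epoch in range(25):
#     train_one_epoch(...)   # hypothetical training step
#     scheduler.step()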

Figure 7.2.1 shows the validation graphs in epochs 0, 5, 6, 10, 11, 15, 16, 20 and 21.
We observe that the validation result mirrors the training result: after the
learning-rate decays at the end of epochs 5 and 10, the validation result also improves
considerably each time. However, some overfitting still occurs after training for too
many epochs: we do not observe an obvious improvement in the validation graphs for the
last two learning-rate decays, even though the training loss still drops.


Figure 7.2.1: Validation Graphs of Pre-Trained InceptionResNetV1 with ArcFace in Epochs
1, 5, 6, 10, 11, 15

Figure 7.2.2: Evaluation Matrix of the Test Result of InceptionResNetV1 with ArcFace,
and Confusion Matrix

After obtaining the model with the lowest IOU value, we use it for testing on the test
set. The test result of this model (95.85%) is the best among all models in our
experiment.


7.3 InceptionResNetV1 with Triplet Loss

Although FaceNet suggests that semi-hard triplet pairs are preferred for training a face
recognition model, we observe that the model trained with semi-hard triplet pairs fails
to learn to distinguish images of different people throughout training. Figure 7.3.1
shows that the model eventually produces a large distance difference for many same-class
pairs and a small distance difference for even more different-class pairs. This
situation is alleviated once we make use of both semi-hard and hard triplets, or add a
larger weight to the loss calculated from hard triplets, as shown in Figure 7.3.2. By
contrast, the training result when using only hard triplets is more normal, as shown in
Figure 7.3.3. A sketch of the mining scheme follows the figures.

Figure 7.3.1: Training with Semi-Hard Triplet Pairs for 1 Epoch And 5 Epochs

Figure 7.3.2: Training with Both Semi-Hard and Hard Triplets for 5 Epochs with Weights
1, 5 and 10 for Hard Triplets

Figure 7.3.3: Training with Hard Triplet Pairs for 1 Epoch And 5 Epochs
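
The following is a minimal sketch of the mining scheme discussed above: within a batch,
negatives are split into semi-hard and hard by their distance to the hardest positive,
and the hard-triplet loss gets a larger weight (weights 1, 5 and 10 were compared in
Figure 7.3.2). The margin value here is illustrative, not taken from the report.

import torch
import torch.nn.functional as F

def mined_triplet_loss(emb, labels, margin=0.2, hard_weight=5.0):
    dist = torch.cdist(emb, emb)                       # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-identity mask
    losses = []
    for a in range(emb.size(0)):
        pos = same[a].clone(); pos[a] = False
        neg = ~same[a]
        if not pos.any() or not neg.any():
            continue
        d_ap = dist[a][pos].max()                      # hardest positive distance
        d_an = dist[a][neg]
        semi = d_an[(d_an > d_ap) & (d_an < d_ap + margin)]  # semi-hard negatives
        hard = d_an[d_an <= d_ap]                            # hard negatives
        if semi.numel():
            losses.append(F.relu(d_ap - semi.min() + margin))
        if hard.numel():
            losses.append(hard_weight * F.relu(d_ap - hard.min() + margin))
    return torch.stack(losses).mean() if losses else emb.sum() * 0.0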

The best model trained with hard triplets achieves an accuracy of 94.28% on our test
set. Visualization of the changes in the validation set during training is shown in
Figure 7.3.4. Figure 7.3.5 shows the decrease in training loss and IOU during training,
together with the final test result. The actual test result is roughly the same as the
IOU we use as the validation metric, with the lowest IOU reaching 0.054 and the test
accuracy 94.28% (≈ 100% − 5.4%). There is also a validation loss computed on a manually
downloaded real masked face image dataset of only around 100 images; since that dataset
is too small, which made its loss unstable, and the IOU value already reflects the
performance of our model effectively, that dataset was abandoned.

Figure 7.3.4: Best model (triplet) with triplet loss after 5, 12, 20 and 28 epochs

Figure 7.3.5: Training Loss and IOU of the Best Model (Triplet) During Training and Test Result of
The Best Model (Triplet)


7.4 System
7.4.1 Face Detection

MTCNN works very well in our testing, both in real-world conditions and on the
downloaded images prepared for system testing. However, since no GPU is available on our
local machine, looping over the image many times is too computationally expensive for a
real-time application. Nevertheless, we want to keep real-time face detection so that
the bounding box is shown correctly in the camera application, which means MTCNN has to
run on every frame. To make the streaming more fluent, we set a minimum face size that
MTCNN will detect; after some experiments, this is set to 1/64 of the displayed screen
size. This does not affect the effectiveness of the system, as it is reasonable for the
user to walk close to the camera for face recognition, but it greatly improves the
fluency of the streaming.

7.4.2 Embedding Matching

The methods introduced earlier to calculate the threshold and implement embedding
matching are used. As there are 129 × 128 = 16512 pairs available, the
⌊16512 × 0.05%⌋ = 8th value is used as the threshold to keep roughly 99.95% precision
when comparing pairs of images. With Euclidean distance and cosine similarity, the
thresholds calculated are 0.7879859 and 0.6895391 respectively. With the first method,
which compares all 3 embeddings available for each user, the image threshold can be set
to 1, 2 or 3, meaning a user is considered valid only if at least 1, 2 or 3 of their
images have an embedding distance smaller than the threshold. The detailed comparison
results are shown in Table 7.4.1, and a sketch of the threshold selection appears after
the table.

Table 7.4.1: Test Result with Different Threshold Values with Method 1

Metrics                 | Threshold | Image threshold | Accuracy | Precision | Recall
Euclidean distance      | 0.7879859 | 1               | 72.8%    | 96.4%     | 61.7%
                        |           | 2               | 58.2%    | 99%       | 37.9%
                        |           | 3               | 43.3%    | 100%      | 15.3%
Cosine similarity       | 0.6895391 | 1               | 74.7%    | 96.0%     | 64.8%
                        |           | 2               | 60.5%    | 99.1%     | 41.4%
                        |           | 3               | 44.9%    | 100%      | 17.6%
Conditional probability | 0.9995    | 1               | 76.2%    | 95.2%     | 67.8%
                        |           | 2               | 62.8%    | 99.2%     | 44.8%
                        |           | 3               | 47.7%    | 100%      | 21.8%
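
A sketch of how such a threshold can be derived, assuming the 8th value means the 8th
smallest different-class pair distance (so that roughly 0.05% of different-class pairs
fall below it); for cosine similarity the ordering would be reversed.

import numpy as np

def pick_threshold(diff_class_dists, quantile=0.0005):
    """Return the floor(N * 0.05%)-th smallest different-class distance,
    keeping roughly 99.95% precision on pairwise comparisons."""
    d = np.sort(np.asarray(diff_class_dists))
    k = int(len(d) * quantile)      # floor(16512 * 0.0005) = 8
    return d[max(k - 1, 0)]         # the 8th smallest value (0-indexed)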


Another metric proposed by us, conditional probability, is calculated based on the
validation set (visualization shown in Figure 7.4.2) from our masked face recognition
model, as mentioned earlier. The raw estimates are noisy, which significantly affects
the conditional probability result. Therefore, we fit a customized sigmoid function to
the conditional probability calculated from the original values; the fitted curve is
very smooth and is what we use to estimate the conditional probability. The problematic
original conditional probability values and the smooth fitted sigmoid function are shown
in Figure 7.4.2. We can also easily set the conditional probability threshold to 0.9995
to keep a high precision, consistent with how we obtained the thresholds for Euclidean
distance and cosine similarity. Conditional probability seems to perform best, as shown
in Table 7.4.1, with precision comparable to Euclidean distance and cosine similarity
but with much higher recall, and thereby higher accuracy, at every image threshold.

Figure 7.4.2: Visualization of Validation Result for Our Best Model (ArcFace) and
Conditional Probability Against Distance Difference with a Fitted Customized Sigmoid
Function
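
A sketch of the fit using scipy is shown below; dists and is_same stand in for the
validation pair distances and same-class indicators (not defined here), and the initial
guess p0 is illustrative.

import numpy as np
from scipy.optimize import curve_fit

def sigmoid(d, a, b):
    # P(same identity | embedding distance d), decreasing in d.
    return 1.0 / (1.0 + np.exp(a * (d - b)))

# (a, b), _ = curve_fit(sigmoid, dists, is_same, p0=(10.0, 0.7))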
The second method of embedding matching uses a prototypical network: the mean of the 3
embeddings for each user is used as that user's prototype. This is extremely useful when
the database is very large, as only 1/3 of the calculations and comparisons of method 1
need to be done. The experiment results in Table 7.4.3 also reflect its excellent
performance, with precision, recall and accuracy all much better than the method-1
results with image threshold 1 shown in Table 7.4.1. To keep a high precision with
comparable accuracy and recall, the result with Euclidean distance as the metric seems
best; thus, we use Euclidean distance with threshold 0.7879859 in our system, as
sketched below.
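
A sketch of the prototype matching used in our system follows; prototypes maps each user
to the mean of their three registered embeddings.

import numpy as np

THRESHOLD = 0.7879859   # Euclidean threshold derived above

def match_prototype(embedding, prototypes):
    """Accept the nearest prototype if its distance is below the threshold."""
    names = list(prototypes)
    dists = [np.linalg.norm(embedding - prototypes[n]) for n in names]
    best = int(np.argmin(dists))
    return names[best] if dists[best] < THRESHOLD else None

# prototypes = {user: np.mean(embs, axis=0) for user, embs in registered.items()}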

We also study how accuracy, precision and recall vary with the threshold value when
prototypes are used. The threshold values retrieved with the previously mentioned
method, 0.7879859 for Euclidean distance and 0.6895391 for cosine similarity, almost
achieve the highest accuracy while keeping a high precision. This shows that our method
for deciding the threshold value is robust and applicable. The relationship between the
metrics and the threshold value is illustrated in the following plots.

Figure 7.4.5: Metrics Against Threshold Values for Euclidean Distance and Metrics Against
Threshold Values for Cosine Similarity

A further experiment evaluates the possibility of expanding the database. We first
reduce the database to the 50 people who appear in the valid testing data, then add the
missing people back one by one while keeping the valid and invalid testing data
unchanged. The relationship between the metrics and the increasing database size is
shown in Figure 7.4.6. The performance is very consistent and is affected by only very
few individual users (in our case, precision and accuracy drop slightly just once, due
to the addition of one user). Therefore, we believe our system will remain as performant
as the current one after a moderate expansion.

Figure 7.4.6: Metrics During Expanding Database Size (From 50 To 100 People)

An important caveat is that the above experiments are done on individual static images,
whereas our actual system streams the incoming images of users taken by the camera, so
many images are available for embedding matching. Therefore, the performance of our
system in a real-world situation is not limited by the 81.3% accuracy attained here.


Chapter 8
CONCLUSION
To summarize this project on a masked face recognition model for an access control
system: we have addressed the problems of masked face recognition. In the access control
system, we implemented a camera application for taking in the stream and a database for
storing users' information; users store their details in the storage unit and can then
be recognized and authenticated by our system.

In our primary focus, the masked face recognition model, we explored different deep
learning methods, especially deep convolutional neural networks with transfer learning,
for representative embedding learning. We created a new masked face image dataset for
training the DCNN model. We also implemented our DCNN pipeline for embedding learning
and found that the DCNN architecture InceptionResNetV1, along with ArcFace loss,
achieved the highest accuracy, 95.85%, in our experiments. With the embedding learned
from the model, we used a prototypical network for embedding matching to recognize
users' identities. By integrating further steps such as face detection and mask
detection, the model was finally applied to our access control system.

The whole access control system was fully functional with our masked face recognition model
in the real-world situation. In the camera application, face detection, mask detection and
embedding extraction were used to capture a face and convert it to an embedding to be
matched against the stored user embeddings.

In conclusion, we have achieved our goal of building an access control system with face
recognition that works with and without masks, and it attains high accuracy in identity
verification. Future work should tackle the questions of how to further improve the
accuracy of the masked face recognition model, how to improve data security in the
system, and how to prevent spoofing attacks using 3D modelling.


REFERENCES
1. S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on
   Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–1359, 2010.
2. D. Yi, Z. Lei, S. Liao, and S. Z. Li, "Learning face representation from scratch," 2012.
3. F. Schroff, D. Kalenichenko, and J. Philbin, "FaceNet: A unified embedding for face
   recognition and clustering," in 2015 IEEE Conference on Computer Vision and Pattern
   Recognition (CVPR), pp. 815–823, 2015.
4. Z. Liu, P. Luo, X. Wang, and X. Tang, "Deep learning face attributes in the wild," 2015.
5. T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object
   detection," 2018.
6. F. Radenović, G. Tolias, and O. Chum, "Fine-tuning CNN image retrieval with no human
   annotation," 2018.
7. Y. Nirkin, "Face segmentation,"
   https://github.com/YuvalNirkin/face_segmentation#deep-face-segmentation-in-extremely-hard-conditions, 2018.
8. Q. Cao, L. Shen, W. Xie, O. M. Parkhi, and A. Zisserman, "VGGFace2: A dataset for
   recognising faces across pose and age," 2018.
9. J. Deng, J. Guo, N. Xue, and S. Zafeiriou, "ArcFace: Additive angular margin loss for
   deep face recognition," 2019.
10. M. Tan and Q. V. Le, "EfficientNet: Rethinking model scaling for convolutional
    neural networks," 2020.
11. Z. Wang, G. Wang, and B. Huang, "Masked face recognition dataset and application," 2020.
12. A. Anwar and A. Raychowdhury, "Masked face recognition for secure authentication," 2020.
13. W. Boulila and A. Alzahem, "A deep learning-based approach for real-time facemask
    detection," 2021.
