
2022 2nd International Conference on Intelligent Technologies (CONIT)

Karnataka, India. June 24-26, 2022

Comparative Analysis of Transfer Learning CNN for Face Recognition
Janvi Nandre, Swarnim Rai, B. R. Kanawade
Department of Information Technology
International Institute of Information Technology, Pune, India

2022 International Conference on Intelligent Technologies (CONIT) | 978-1-6654-8407-7/22/$31.00 ©2022 IEEE | DOI: 10.1109/CONIT55038.2022.9847946

Abstract—The practice of identifying people by looking at their faces is known as face recognition (FR). This technology is frequently used in biometric authentication, surveillance technologies, security systems, law enforcement, real-time attendance systems, smart cards, and other applications. Face recognition works in two stages: first, a method for extracting facial features is applied, followed by pattern classification. Deep learning, particularly the convolutional neural network (CNN), has recently made important contributions to face recognition technology. Deep learning employs numerous processing layers to learn data representations at varying levels of feature extraction. Since the achievements of DeepFace and DeepID, this technology has reshaped the research landscape of facial recognition. Deep learning has dramatically improved state-of-the-art performance and aided the development of effective real-world applications. This study examines the performance of three of the most popular CNN architectures for face recognition. In the proposed work, transfer learning is used to deploy pre-trained CNN models for face recognition, namely VGG16, ResNet-50, and MobileNet. Training and validation accuracy and loss were used as criteria for assessing and improving the performance of the CNN models. The 5 Celebrity Faces Dataset from Kaggle, as well as a local dataset, were used to train and test the models. Face recognition was implemented in two ways: static and webcam-based. The models developed for this research can be used in real-time attendance and surveillance systems.

Keywords—Machine Learning · Face Recognition · Convolution Neural Network (CNN) · Neural Networks · Deep Learning · Transfer Learning · VGG16 · ResNet-50 · MobileNet

I. INTRODUCTION
The market for face recognition technology is growing rapidly as a result of breakthroughs in artificial intelligence (AI), deep learning, and machine learning (ML) technologies. Facial recognition refers to a system that recognizes people based on their faces. Face recognition requires just a digital imaging device to create and gather the images and data required to establish and record the subject's biometric face pattern. Unlike other methods of authentication such as passwords, email verification, or fingerprint identification, biometric facial recognition relies on unique mathematical and dynamic patterns, making it one of the safest and most effective methods. There are numerous approaches to implementing FR. The use of CNNs for deep learning is an important advance. A CNN can be applied in a variety of ways. The first way entails training the model completely from scratch: the architecture of the pre-trained model is employed, and the dataset is used to train it. The second method, used in large dataset situations, is to employ transfer learning with features from a pre-trained CNN. The pre-trained model is employed to extract features when the dataset is small or the classification problem is similar. In the proposed study, transfer learning was employed to develop a face recognition system using three alternative CNN architectures.

II. LITERATURE OVERVIEW
Researchers have previously developed many algorithms and methodologies for facial recognition. These are covered in this section. Massoli et al. [1] suggested a training strategy for fine-tuning a state-of-the-art SeNet-50 architecture to extract resolution-robust deep features. The SeNet-50 model was trained on photos from the VGGFace2 dataset with a randomly chosen resolution and a cooperative teacher-student learning strategy from the literature. The authors evaluated the model on the IARPA Janus Benchmark B (IJB-B), IARPA Janus Benchmark C (IJB-C), Queen Mary University of London (QMUL) SurvFace, TinyFace, and Surveillance Cameras Face Database (SCface) datasets and obtained good cross-resolution recognition accuracy. Sanchez-Moreno et al. [2] proposed a framework for real-time facial identification that requires only moderate hardware and pairs deep learning algorithms like FaceNet with classic classifiers like support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF) to function in an unconstrained environment. The model detected faces using YOLO-Face and used preprocessing techniques such as bicubic interpolation for picture resampling, the L2 method for normalization, and color adjustments. FaceNet with the softmax and cross-entropy loss function was used for the recognition stage. Face detection is performed on the Face Detection Data Set and Benchmark (FDDB), Wider Face and Honda/UCSD CelebA data sets and is evaluated based on accuracy, precision, and recall rate, whereas face recognition is performed on the Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) data sets, with the YTF dataset achieving 99.1 percent recognition accuracy at a real-time speed of 24 frames per second. Almabdy et al. [3] evaluated FR performance using pre-trained convolutional neural networks (AlexNet and ResNet-50 models) for feature extraction with a support vector machine for classification, and transfer learning for both feature extraction and classification using a pre-trained CNN (AlexNet model). The evaluation was performed on the Olivetti Research Laboratory (ORL), Georgia Tech Face, Frontalized Labeled Faces in the Wild (F_LFW), FEI face, Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) datasets. ResNet with SVM took the least time to train; however, transfer learning with AlexNet consistently performed well.
Kumar et al. [4] developed an image recognition Artificial Neural Network (ANN) with OpenCV and



Authorized licensed use limited to: Nirma University Institute of Technology. Downloaded on June 13,2024 at 13:39:05 UTC from IEEE Xplore. Restrictions apply.
Euclidean distance, as well as an Android-based navigation system with ultrasonic sensors, to address the issue of smart navigation for visually impaired people. The model had a 95% accuracy rate for obstacle detection and a 90% accuracy rate for face recognition. Razavian et al. [5] demonstrated that Deep-Net features can serve as the prime candidate in vision tasks. They retrieved features from the OverFeat network and used them as a general representation for various object classification tasks, scene recognition, and attribute description. For recognition across a variety of vision tasks, the feature representation of shape 4096x1 was fed to an SVM classifier or simple linear classifiers. They also claimed that the features retrieved using OverFeat trained on the ImageNet database can be used for a variety of visual identification tasks. Prakash et al. [5] suggested a CNN-based automated facial recognition system with transfer learning. The CNN uses weights learned from the pre-trained VGG-16 model. The extracted features were sent to the fully connected layer with softmax activation for classification. Face recognition accuracy reached 100% on AT&T dataset photos and 96.5% on Yale dataset photos. According to the findings, face recognition using a CNN with transfer learning exceeds the PCA technique in terms of classification accuracy. To address the limitations of video-based face recognition (VFR), Ding and Tao [6] examined a comprehensive CNN-based framework. The CNN learns blur-robust representations from training data that combines still images with artificially blurred versions of them. They proposed a trunk-branch ensemble CNN model (TBE-CNN) to make CNN features more robust to pose variations and occlusion. They also improved the triplet loss to strengthen the discriminative power of the representations learned by TBE-CNN, and evaluated the model on the PaSC, YouTube, and COX Face datasets. Bah et al. [7] proposed a new method for improving the accuracy of a face recognition system by combining the Local Binary Pattern (LBP) technique with several advanced image processing strategies, including Contrast Adjustment, Histogram Equalization, Bilateral Filter, and Image Blending with 0.5 alpha value and 181 x 181-pixel values. The proposed method outperformed other handcrafted FR algorithms in terms of face detection and accuracy. The approach, however, fails to tackle the issues of occlusion and concealed faces. Jayaraman et al. [8] thoroughly analyzed numerous feature-based automatic facial recognition systems and the factors that influence them, such as pose, variation, and illumination. They discussed face recognition approaches in the forensic domain, addressed future directions for face recognition technology, and provided a comprehensive study of numerous common datasets. Kortli et al. [9] investigated some well-known approaches for FR and provided a taxonomy of the strategies as well as a comparison study in terms of complexity, robustness, accuracy, and discrimination. They also examined a number of prominent datasets used for FR. Ali [10] used transfer learning on VGGFace to recognize faces with dark skin, primarily Ethiopian faces. On local datasets, the accuracy was greater than 95%.

III. PRELIMINARIES

A. Convolutional Neural Networks (CNN)
Convolutional neural networks are a form of neural network that is frequently employed to tackle image processing problems. They are a network architecture made specifically for image datasets. CNNs are generally used for object detection, but they may also be used in natural language processing applications. One of the key reasons convolutional neural networks are so significant in deep learning and artificial intelligence today is that they are useful in these fast-growing domains. CNNs are based on the basic principles that govern how the human retina functions: each neuronal bundle in the retina is responsible for one overlapping region of the overall picture.

Image processing tasks have two key characteristics that influence how they should be approached. The first is that pictures are massive in terms of input; even a 100x100 image has 10,000 pixels, so each node in a fully connected network would need 10,000 weights. The second is that location is important. Pixels that are close to a feature's center position in a picture are crucial, whereas pixels that are far away are largely unimportant. As a result, a fully connected network is not an acceptable design for imaging tasks, as it would be computationally intractable as well as inefficient.

CNNs solve this challenge by reducing the number of weights that must be learned for each node in an intelligent, locality-aware manner. CNNs have the critical property of being location invariant, which means that a feature appearing in one area of an image is recognized no better or worse than the identical feature appearing anywhere else in the picture.

Convolution is the CNN's main operation. A convolution starts by defining a small feature and then attempts to locate the picture segment that most closely fits that feature. A rolling window is applied to the image, and for each position of the window, a summary statistic of "fitness" is obtained. The winning window is the one that most closely matches the feature. A feature map is the outcome of applying a convolution to an image for a specific feature, and it takes the form of a matrix of fit statistics. Pooling is another key step. Pooling splits the feature map's fit statistics into groups and computes a summary statistic from each. A good pooling layer will lower the feature map's complexity while substantially compressing it, making it much more computationally feasible. Finally, ReLU layers are used by CNNs to create sparsity. ReLU is an activation function that sets to zero the value a node passes to the next layer whenever that node's output is zero or negative, thus "deadening" those nodes.

CNNs work by layering Convolution > ReLU > Pooling. The first several layers in the stack learn feature maps for extremely small portions of the image, usually just a few pixels in size. A ReLU layer keeps the relevant feature maps and disables the irrelevant ones. The feature maps are then pooled, and later layers learn features over larger regions (e.g. a dozen pixels), which are again passed through a ReLU, pooled again, and so on. In a fully trained network, nodes at successive layers respond to patterns that exist at varying sizes in the picture corpus, resulting in a vertical stack of ever more sophisticated feature maps. Finally, there is a fully connected layer at the end of the network that combines the votes from each of the high-level detectors over what the network is trying to distinguish. Fully connected layers can be stacked so that the network can vote on deeper combinations of features, which can further improve decision-making.
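The Convolution > ReLU > Pooling stack described in this section can be illustrated with a minimal pure-Python sketch. The 5x5 toy image, the 2x2 edge kernel, and the function names are illustrative assumptions, not part of the study; real CNNs learn their kernels and use optimized tensor libraries.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    As in most deep learning libraries, this is technically cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Set every zero or negative response to zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Non-overlapping size x size max pooling (incomplete edge windows are dropped)."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Toy image (assumed): a bright vertical stripe on a dark background.
image = [[0, 1, 1, 0, 0] for _ in range(5)]
# Toy kernel (assumed): responds positively to a dark-to-bright vertical edge.
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d_valid(image, kernel)  # 4x4 matrix of "fit statistics"
activated = relu(feature_map)              # negative responses are zeroed
pooled = max_pool(activated, size=2)       # 2x2 summary of the feature map
print(feature_map[0], activated[0], pooled)
```

The kernel here has only four weights, reused at every window position, which is the weight reduction contrasted above with the 10,000 weights per node of a fully connected layer on a 100x100 image.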

Fig. 1. CNN basic architecture

B. Transfer Learning
Transfer learning is an AI-ML subfield that aims to adapt information gained from one task (the source task) to a different but related task (the target task). Transfer learning is the endeavor to use what is learned in one task to improve generalization in another. Transfer learning involves three steps: obtaining an open-source pre-trained model from a third party, repurposing the model, and fine-tuning it for the target scenario.

C. VGG16
The University of Oxford proposed the VGG16 CNN architecture. It was one of the well-known models submitted to ILSVRC-2014. On ImageNet [11], a collection of over 14 million images classified into 1000 classes, the model achieves 92.7 percent top-five test accuracy. It outperforms AlexNet by sequentially replacing large kernel-sized filters (11 and 5 in the first and second convolutional layers, respectively) with multiple 3×3 kernel-sized filters.

D. ResNet50
ResNet50 is a 50-layer deep convolutional neural network. Microsoft constructed and trained the model in 2015. This model was also trained using approximately 1 million images from the ImageNet dataset. Like VGG-16, it can classify up to 1000 objects and was trained on colour pictures of 224x224 pixels.

E. MobileNet
The MobileNet model, with 28 layers, is TensorFlow's first mobile computer vision model and is intended for use in mobile apps. MobileNet employs depthwise separable convolutions, which considerably decrease the number of parameters compared to a network of the same depth with standard convolutions. Lightweight deep neural networks are constructed as a result.

F. Face recognition
Facial recognition is a system that can identify people based on their faces. It is based on complex mathematical AI and machine learning algorithms that gather, record, and evaluate facial traits in order to match them with photos of people and, in some cases, data about them in a database. Facial recognition is a subset of biometrics, which also includes palm printing, fingerprint scanning, eye scanning, and signature identification.

IV. DATASET

A. Dataset Collection
In this research, two datasets were used. The first is the 5 Celebrity Faces Dataset from Kaggle. This is a small dataset that can be used to experiment with computer vision and image processing techniques. It features a training directory with 14-20 photographs of each celebrity and contains a total of 118 images with different scales, illumination, pose, and resolution. The training set has 93 photos, whereas the testing set contains 25.
For the second dataset, a new approach was used: a program built with OpenCV takes samples from a camera, detects faces using the Haar cascade classifier, and returns cropped images containing the faces. Using this method, it is possible to produce a local dataset with an arbitrary number of photos for a specific class, each with a different pose and lighting variation.

Fig. 2. 5 Celebrity Faces Dataset [12]

B. Data Augmentation
Before the training and validation images were fed to a convolutional neural network, they were augmented. Convolutional neural networks require a large quantity of data, but our dataset was insufficient, so we performed data augmentation to prevent over-fitting. Simple geometric transformations such as translation, scale change, horizontal flip, and zooming are used to obtain augmented data from the original images. ImageDataGenerator from the Keras library helps achieve this efficiently. In each epoch, the augmented data replaces the original training and validation data before passing through the CNN model, which is how the model sees multiple variations of images from the same class.

V. EXPERIMENTS AND METHODOLOGY
The main purpose of this research was to see how well convolutional neural networks performed in face recognition. To begin, the pre-trained models and weights were imported using the Keras [13] API library with the TensorFlow backend. The pre-trained model was then frozen and rebuilt using a new customized fully connected layer. Along with an additional layer, a new hidden layer of neurons was added. This additional layer flattened the image input, reducing it to

a single dimension. The layers were then combined into a single model object with a softmax output activation. Next, the model was compiled with the Adam optimizer, accuracy as the metric, and a loss function. During the preprocessing stage, the dataset images were augmented using the ImageDataGenerator from the Keras package. The model was then trained using the 5 Celebrity Faces Dataset as well as a local dataset constructed from pictures collected in real time, with 224 × 224-pixel input images. For all models, the number of epochs was set to 5 while tuning for the best batch size. After obtaining the most efficient batch size, the models were trained for 50 epochs to perform the comparative analysis. Finally, the models were tested using randomly selected images from the test set. In our experiment, the MobileNet model performed admirably, achieving 100% training accuracy on the 7th epoch, while the VGG16 model reached 100% training accuracy on the 12th. ResNet failed to attain 100 percent training accuracy. In total, 50 epochs were used to train and validate the models.

VI. EXPERIMENTAL RESULTS

Face recognition and classification are performed on the datasets by the pre-trained models. The batch size and epoch parameters are tuned before testing to see the accuracy results for training and validation.

A. Training Result Analysis

During the training process, the metrics observed were training and validation accuracy. MobileNet was the quickest to train and provide accurate results, whereas VGG16 took longer to train because of its complexity. In the batch size variation experiment, batch sizes of 2, 4, 8, 16, and 32 were tried. According to Table I, the larger the batch size, the more accurate the training and validation results tend to be. Using a batch size of 32, the VGG16, ResNet50 and MobileNet models achieve 100%, 85% and 100% accuracy respectively. This high accuracy could be attributed to the size of our dataset. In all three transfer learning models, the best batch size across all variations of the experiment is 32. We therefore chose a batch size of 32 for our number-of-epochs experiments. Table II shows the training and validation accuracy of the three models for up to 50 epochs. VGG16 achieved the highest validation accuracy.

The training and validation accuracy and loss for our models are as follows:

Fig. 3. VGG16 Training and Validation Accuracy

Fig. 4. VGG16 Training and Validation Loss
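The model construction described in Section V (a frozen pre-trained base, a flatten layer, a new hidden layer, a softmax output, and the Adam optimizer) could look roughly like the following Keras sketch. It is an illustration under stated assumptions, not the authors' code: the function name, the hidden layer width of 256, and the categorical cross-entropy loss are assumptions, since the paper does not state the layer sizes or name the loss function.

```python
def build_transfer_model(num_classes, hidden_units=256, input_shape=(224, 224, 3)):
    """Frozen VGG16 base with a new fully connected head (illustrative sketch).

    `hidden_units` and the loss function are assumptions; the paper only says
    a new hidden layer, softmax, Adam, and accuracy were used.
    """
    # Imported lazily so this file can be loaded without TensorFlow installed.
    import tensorflow as tf

    # Pre-trained convolutional base without its original 1000-class head.
    base = tf.keras.applications.VGG16(
        weights="imagenet", include_top=False, input_shape=input_shape
    )
    base.trainable = False  # freeze the pre-trained weights

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Flatten(),                                  # reduce to one dimension
        tf.keras.layers.Dense(hidden_units, activation="relu"),     # new hidden layer
        tf.keras.layers.Dense(num_classes, activation="softmax"),   # one output per person
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",  # assumed; requires one-hot labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_transfer_model(num_classes=5)` would download the ImageNet weights on first use. Swapping `VGG16` for `tf.keras.applications.ResNet50` or `tf.keras.applications.MobileNet` gives the other two bases compared in the paper, although each of those expects its own input preprocessing.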

TABLE I. EXPERIMENTS IN THE TRAINING DATA STAGE USING BATCH SIZE VARIATIONS

Pre-trained    Batch size 2     Batch size 4     Batch size 8     Batch size 16    Batch size 32
model          Train   Val      Train   Val      Train   Val      Train   Val      Train   Val
               Acc.    Acc.     Acc.    Acc.     Acc.    Acc.     Acc.    Acc.     Acc.    Acc.
VGG16          0.9182  0.0113   0.9636  0.9841   0.9844  0.9751   0.9844  0.9857   0.9857  0.9844
Resnet50       0.4516  0.3200   0.6344  0.3600   0.7312  0.3600   0.8495  0.5200   0.7572  0.4400
MobileNetV2    0.9677  0.6400   1.0000  0.8400   1.0000  0.9600   1.0000  0.9600   1.0000  0.9800
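One way to read the batch sizes in Table I is as the number of weight updates per epoch. The short calculation below assumes only the 93 training images of the 5 Celebrity Faces Dataset are used and that a final partial batch still counts as one step; the size of the local dataset is not given in the paper, so it is left out.

```python
import math

TRAIN_IMAGES = 93  # training photos in the 5 Celebrity Faces Dataset
BATCH_SIZES = (2, 4, 8, 16, 32)

# A partial final batch still triggers one weight update, hence the ceiling.
steps_per_epoch = {b: math.ceil(TRAIN_IMAGES / b) for b in BATCH_SIZES}

for batch_size, steps in steps_per_epoch.items():
    print(f"batch size {batch_size:>2}: {steps} weight updates per epoch")
```

Under these assumptions a batch size of 2 gives 47 updates per epoch while a batch size of 32 gives only 3, so each update at the larger batch size averages the gradient over many more images.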

TABLE II. EXPERIMENTS IN THE TRAINING DATA STAGE USING EPOCH NUMBER VARIATIONS

Pre-trained    10 epochs        20 epochs        30 epochs        40 epochs        50 epochs
model          Train   Val      Train   Val      Train   Val      Train   Val      Train   Val
               Acc.    Acc.     Acc.    Acc.     Acc.    Acc.     Acc.    Acc.     Acc.    Acc.
VGG16          0.7957  0.6400   1.0000  0.8400   1.0000  0.8800   1.0000  0.8800   1.0000  0.9200
Resnet50       0.5054  0.3600   0.6559  0.4400   0.8602  0.4400   0.7957  0.4800   0.8602  0.5200
MobileNetV2    1.0000  0.8000   1.0000  0.8400   1.0000  0.8800   1.0000  0.8800   1.0000  0.8800

B. Testing Result Analysis
Face recognition was performed on a random sample from the test set, with the developed models employed as classifiers. The recognized face is labeled with the person's name in the display window, and this technique may be used for both image and video face recognition. A separate function for real-time recognition was also created. The function combines real-time face detection using the Haar cascade classifier with recognition using any of the models. All three models provided significant results in terms of accuracy.
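The labeling step described above, turning a model's softmax output into the name shown in the window, can be sketched as follows. The function name and the class names are hypothetical placeholders; in practice the probabilities would come from one of the trained models applied to a face crop.

```python
def label_face(probabilities, class_names):
    """Return the most likely class name and its probability.

    `probabilities` is one softmax output vector; `class_names` lists the
    people in the same order as the model's output units.
    """
    if len(probabilities) != len(class_names):
        raise ValueError("one probability per class name is required")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], probabilities[best]


# Hypothetical example: three enrolled people and one softmax output.
names = ["person_a", "person_b", "person_c"]
name, confidence = label_face([0.05, 0.80, 0.15], names)
print(f"{name} ({confidence:.0%})")
```

A closed-set classifier like this always returns one of the enrolled names, so a face that belongs to nobody in the training set is still assigned to its nearest class.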

Fig. 5. ResNet50 Training and Validation Accuracy

Fig. 6. ResNet50 Training and Validation Loss


Fig. 9. Image Classification Results by the Models.

VII. CONCLUSION
In this study, transfer learning was employed to create a face recognition system. The proposed research compared the models in terms of accuracy, loss, training duration, and recognition loss. The models developed in the study can be used in real-time attendance systems as well as static image face recognition systems. As all three models are pre-trained on large datasets, they can be used with any large dataset for real-time face recognition. The objective of future research is to create a robust real-time facial recognition system based on deep learning for use in crime detection and forensics.
Fig. 7. MobileNet Training and Validation Accuracy

Fig. 8. MobileNet Training and Validation Loss

REFERENCES
[1] Massoli, F. V., Amato, G., & Falchi, F. (2020). Cross-resolution learning for face recognition. Image and Vision Computing, 99, 103927.
[2] Sanchez-Moreno, A. S., Olivares-Mercado, J., Hernandez-Suarez, A., Toscano-Medina, K., Sanchez-Perez, G., & Benitez-Garcia, G. (2021). Efficient Face Recognition System for Operating in Unconstrained Environments. Journal of Imaging, 7(9), 161.
[3] Almabdy, S., & Elrefaei, L. (2019). Deep convolutional neural network-based approaches for face recognition. Applied Sciences, 9(20), 4397.
[4] Kumar, P. M., Gandhi, U., Varatharajan, R., Manogaran, G., Jidhesh, R., & Vadivel, T. (2019). Intelligent face recognition and navigation system using neural learning for smart security in Internet of Things. Cluster Computing, 22(4), 7733-7744.
[5] Prakash, R. M., Thenmoezhi, N., & Gayathri, M. (2019, November). Face recognition with convolutional neural network and transfer learning. In 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT) (pp. 861-864). IEEE.

[6] Ding, C., & Tao, D. (2017). Trunk-branch ensemble convolutional neural networks for video-based face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 1002-1014.
[7] Bah, S. M., & Ming, F. (2020). An improved face recognition algorithm and its application in attendance management system. Array, 5, 100014.
[8] Jayaraman, U., Gupta, P., Gupta, S., Arora, G., & Tiwari, K. (2020). Recent development in face recognition. Neurocomputing, 408, 231-245.
[9] Kortli, Y., Jridi, M., Al Falou, A., & Atri, M. (2020). Face recognition systems: A survey. Sensors, 20(2), 342.
[10] Ali, N. (2021). Exploring Transfer Learning on Face Recognition of Dark Skinned, Low Quality and Low Resource Face Data. arXiv preprint arXiv:2101.10809.
[11] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). Imagenet: A large-scale hierarchical image database. In 2009 IEEE
[12] https://www.kaggle.com/datasets/dansbecker/5-celebrity-faces-dataset
[13] F. Chollet et al., “Keras,” https://keras.io,
