Comparative Analysis of Transfer Learning CNN For Face Recognition
2022 International Conference on Intelligent Technologies (CONIT) | 978-1-6654-8407-7/22/$31.00 ©2022 IEEE | DOI: 10.1109/CONIT55038.2022.9847946
Abstract—The practice of identifying people by looking at their faces is known as face recognition (FR). This technology is frequently used in biometric authentication, surveillance technologies, security systems, law enforcement, real-time attendance systems, smart cards, and other applications. Face recognition works in two stages: first, facial features are extracted, and then the resulting patterns are classified. Deep learning, particularly the convolutional neural network (CNN), has recently made important contributions to face recognition technology. Deep learning uses multiple processing layers to learn data representations with varying levels of feature abstraction. Since the success of DeepFace and DeepID, this technology has reshaped the research landscape of face recognition, dramatically improving state-of-the-art performance and enabling effective real-world applications. This study examines the performance of three of the most popular CNN architectures for face recognition. In the proposed work, transfer learning is used to deploy pre-trained CNN models for face recognition, namely VGG16, ResNet-50, and MobileNet. Training and validation accuracy and loss were used as criteria for improving the performance of the CNN models. The 5 Celebrity Faces Dataset from Kaggle, as well as a local dataset, were used to train and test the models. Face recognition was implemented in two ways: static (image-based) and webcam-based. The models developed in this research can be used in real-time attendance and surveillance systems.

Keywords—Machine Learning · Face Recognition · Convolutional Neural Network (CNN) · Neural Networks · Deep Learning · Transfer Learning · VGG16 · ResNet-50 · MobileNet

I. INTRODUCTION
The market for face recognition technology is growing rapidly as a result of breakthroughs in artificial intelligence (AI), deep learning, and machine learning (ML). Facial recognition refers to a system that recognizes people based on their faces. Face recognition requires only a digital imaging device to capture the images and data needed to establish and record the subject's biometric face pattern. Unlike other methods of authentication, such as passwords, email verification, or fingerprint identification, biometric facial recognition relies on unique mathematical and dynamic patterns, making it one of the safest and most successful. There are numerous approaches to implementing FR, and the use of CNNs for deep learning is an important advance. CNNs can be applied in several ways. The first entails training the model completely from scratch: only the architecture of a pre-trained model is reused, and the model is trained on the target dataset, which suits situations with large datasets. The second is to employ transfer learning with features from a pre-trained CNN: the pre-trained model is used to extract features when the dataset is small or the classification problem is similar. In the proposed study, transfer learning was employed to develop a face recognition system using three alternative CNN architectures.

II. LITERATURE OVERVIEW
Previously, researchers developed many algorithms and methodologies for facial recognition; these are reviewed below. Massoli et al. [1] suggested a training strategy for fine-tuning a state-of-the-art SENet-50 architecture to extract resolution-robust deep features. The SENet-50 model was trained on images from the VGGFace2 dataset at randomly chosen resolutions, using a cooperative teacher-student learning strategy from the literature. The authors evaluated the model on the IARPA Janus Benchmark B (IJB-B), IARPA Janus Benchmark C (IJB-C), Queen Mary University of London (QMUL) SurvFace, TinyFace, and Surveillance Cameras Face Database (SCface) datasets and obtained good cross-resolution recognition accuracy. Sanchez-Moreno et al. [2] proposed a framework for real-time facial identification in unconstrained environments that requires only moderate hardware and pairs deep learning algorithms such as FaceNet with classic classifiers such as support vector machine (SVM), K-nearest neighbor (KNN), and random forest (RF). The system detected faces using YOLO-Face and applied preprocessing such as bicubic interpolation for image resampling, L2 normalization, and color adjustments. FaceNet with the softmax and cross-entropy loss function was used for the recognition stage. Face detection was performed on the Face Detection Data Set and Benchmark (FDDB), WIDER FACE, Honda/UCSD, and CelebA datasets and evaluated in terms of accuracy, precision, and recall, whereas face recognition was performed on the Labeled Faces in the Wild (LFW) and YouTube Faces (YTF) datasets, achieving 99.1 percent recognition accuracy on YTF at a real-time speed of 24 frames per second. Almabdy et al. [3] evaluated FR performance using pre-trained convolutional neural networks (AlexNet and ResNet-50) for feature extraction with a support vector machine for classification, and transfer learning with a pre-trained CNN (AlexNet) for both feature extraction and classification. The evaluation was performed on the Olivetti Research Laboratory (ORL), Georgia Tech Face, Frontalized Labeled Faces in the Wild (F_LFW), FEI Face, Labeled Faces in the Wild (LFW), and YouTube Faces (YTF) datasets. ResNet with SVM required the least training time, whereas AlexNet with transfer learning performed consistently well.
Kumar et al. [4] developed an image recognition Artificial Neural Network (ANN) with OpenCV and
Authorized licensed use limited to: Nirma University Institute of Technology. Downloaded on June 13,2024 at 13:39:05 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. CNN basic architecture
A. Transfer Learning
Transfer learning is a subfield of AI and ML that aims to adapt knowledge gained on one task (the source task) to a different but related task (the target task); in other words, it uses what is learned in one task to improve generalization in another. Applying transfer learning involves three steps: obtaining an open-source, third-party pre-trained model; repurposing the model; and fine-tuning it for the target scenario.
B. VGG16
The VGG16 CNN architecture was proposed by the University of Oxford and was one of the well-known models submitted to ILSVRC-2014. On ImageNet [11], a collection of over 14 million images, the model achieves 92.7 percent top-five test accuracy on the 1000-class ILSVRC classification task. It improves on AlexNet by replacing the large kernel-sized filters (11 × 11 and 5 × 5 in the first and second convolutional layers, respectively) with multiple 3 × 3 kernel-sized filters stacked one after another.
IV. DATASET
A. Dataset Collection
Two datasets were used in this research. The first is the 5 Celebrity Faces Dataset from Kaggle [12], a small dataset suited to experimenting with computer vision and image processing techniques. It has a training directory with 14-20 photographs of each celebrity and contains a total of 118 images with different scales, illumination, poses, and resolutions. The training set has 93 images, and the testing set contains 25.
The second dataset was built with a new approach: a program written with OpenCV takes samples from a camera, detects faces using a Haar cascade classifier, and returns cropped images containing the faces. With this method, it is possible to produce a local dataset with an unlimited number of photos for a specific class, each with a different pose and lighting variation.
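The local-dataset collector described above can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes the opencv-python package (which ships the Haar cascade files under `cv2.data.haarcascades`), the frontal-face cascade `haarcascade_frontalface_default.xml`, a one-folder-per-person output layout, and 224 × 224 crops to match the model input. The names `crop_box` and `collect_samples`, the output folder, and the detector parameters are hypothetical choices.

```python
"""Sketch of a local-dataset collector: webcam frames -> Haar face detection -> cropped images.

Assumptions (not from the paper): opencv-python is installed, one output
folder per person, 224 x 224 crops, and the detector parameters below.
"""
from pathlib import Path

FACE_SIZE = (224, 224)  # assumed to match the models' 224 x 224 input


def crop_box(frame, box, margin=0.0):
    """Return the region of `frame` (an H x W x C array) inside an (x, y, w, h) box.

    `margin` optionally pads the box by a fraction of its size; the result is
    clamped to the image borders.
    """
    x, y, w, h = (int(v) for v in box)
    dx, dy = int(w * margin), int(h * margin)
    height, width = frame.shape[:2]
    top, bottom = max(0, y - dy), min(height, y + h + dy)
    left, right = max(0, x - dx), min(width, x + w + dx)
    return frame[top:bottom, left:right]


def collect_samples(person, out_dir="local_dataset", num_samples=100, camera_index=0):
    """Capture frames until `num_samples` face crops of `person` have been saved."""
    import cv2  # imported here so crop_box can be used without OpenCV

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    target = Path(out_dir) / person  # one sub-folder per class
    target.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(camera_index)
    saved = 0
    try:
        while saved < num_samples:
            ok, frame = cap.read()
            if not ok:  # camera unavailable or stream ended
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for box in detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
                face = cv2.resize(crop_box(frame, box), FACE_SIZE)
                cv2.imwrite(str(target / f"{person}_{saved:04d}.jpg"), face)
                saved += 1
                if saved >= num_samples:
                    break
    finally:
        cap.release()
    return saved
```

A call such as `collect_samples("person_a", num_samples=200)` (hypothetical class name) fills `local_dataset/person_a/`; moving the head and changing the lighting while it runs is what produces the pose and illumination variation mentioned above.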
a single dimension. Using the softmax activation function for the output layer, all of the layers were then grouped into a single model object. Next, the model was compiled with the Adam optimizer, accuracy as the metric, and a loss function. During the preprocessing stage, the dataset images were augmented using the ImageDataGenerator from the Keras package [13]. The models were then trained on the 5 Celebrity Faces Dataset as well as on a local dataset constructed from pictures collected in real time, using 224 × 224-pixel input images. For all models, the number of epochs was first set to 5 while tuning for the best batch size. After the most effective batch size was obtained, the models were trained for 50 epochs to perform the comparative analysis. Finally, the models were tested on randomly selected images from the test set. In our experiment, the MobileNet model performed well, reaching 100% training accuracy at the 7th epoch, while the VGG16 model reached 100% training accuracy at the 12th epoch. ResNet-50 did not reach 100% training accuracy within the 50 training and validation epochs.
learning models, the best batch size for all variations of the experiment is 32. We therefore chose a batch size of 32 for the epoch-number experiments. Table II shows the training and validation accuracy of the three models for up to 50 epochs. VGG16 achieved the highest validation accuracy.
The training and validation accuracy and loss for our models are as follows:
TABLE I. EXPERIMENTS IN THE TRAINING DATA STAGE USING BATCH SIZE VARIATIONS
Resnet50 0.4516 0.3200 0.6344 0.3600 0.7312 0.3600 0.8495 0.5200 0.7572 0.4400
MobileNetV2 0.9677 0.6400 1.0000 0.8400 1.0000 0.9600 1.0000 0.9600 1.0000 0.9800
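The batch-size search summarized in Table I (a short 5-epoch run per candidate, keeping the batch size with the best validation accuracy) can be sketched in plain Python. The `evaluate` callback is hypothetical: it stands for "train a fresh model with this batch size for this many epochs and return its validation accuracy", and in practice it would wrap a Keras training run. The candidate batch sizes and the toy scoring function in the usage example are illustrative assumptions only.

```python
from typing import Callable, Dict, Iterable, Tuple


def select_batch_size(
    evaluate: Callable[[int, int], float],
    candidates: Iterable[int],
    trial_epochs: int = 5,
) -> Tuple[int, Dict[int, float]]:
    """Run one short trial per candidate batch size and return the best one.

    `evaluate(batch_size, epochs)` is a hypothetical callback that trains a
    fresh model and returns its validation accuracy.
    """
    scores: Dict[int, float] = {}
    for batch_size in candidates:
        scores[batch_size] = evaluate(batch_size, trial_epochs)
    if not scores:
        raise ValueError("at least one candidate batch size is required")
    # max() returns the first maximal key, so ties go to the earliest candidate.
    best = max(scores, key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Toy stand-in for real training, for illustration only.
    def toy_evaluate(batch_size: int, epochs: int) -> float:
        return 1.0 - abs(batch_size - 32) / 256

    best, scores = select_batch_size(toy_evaluate, [8, 16, 32, 64, 128])
    print(best, scores)
```

Because each trial retrains from scratch, the search cost grows linearly with the number of candidates; keeping the trial short (5 epochs) and training only the winning batch size for the full 50 epochs keeps that cost manageable.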
TABLE II. EXPERIMENTS IN THE TRAINING DATA STAGE USING EPOCH NUMBER VARIATIONS
Resnet50 0.5054 0.3600 0.6559 0.4400 0.8602 0.4400 0.7957 0.4800 0.8602 0.5200
MobileNetV2 1.0000 0.8000 1.0000 0.8400 1.0000 0.8800 1.0000 0.8800 1.0000 0.8800
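The model construction and training setup described at the start of this section (a pre-trained backbone, flattening to a single dimension, a softmax output layer, the Adam optimizer with accuracy as the metric, ImageDataGenerator augmentation, 224 × 224 inputs, a batch size of 32, and 50 epochs) can be sketched with tf.keras. This is a hedged reconstruction, not the authors' code. The frozen backbone, the categorical cross-entropy loss, the augmentation parameters, and the one-sub-folder-per-person layout expected by `flow_from_directory` are assumptions, and the sketch assumes a TensorFlow 2.x release that still provides `tf.keras.preprocessing.image.ImageDataGenerator` (it is deprecated in recent releases). `first_epoch_reaching` is a hypothetical helper for reading results such as the epoch at which training accuracy first reaches 100%.

```python
"""Hedged tf.keras sketch of the transfer-learning setup (not the authors' code)."""
from typing import Optional, Sequence

# Keras constructor name and matching preprocessing module for each backbone.
BACKBONES = {
    "VGG16": ("VGG16", "vgg16"),
    "ResNet50": ("ResNet50", "resnet50"),
    "MobileNetV2": ("MobileNetV2", "mobilenet_v2"),
}
INPUT_SHAPE = (224, 224, 3)  # 224 x 224 RGB inputs, as stated in the text
BATCH_SIZE = 32              # best batch size reported in the text
EPOCHS = 50                  # epochs used for the comparative runs


def build_model(backbone: str, num_classes: int):
    """Frozen ImageNet backbone -> Flatten -> softmax classifier, compiled with Adam."""
    import tensorflow as tf  # lazy import: first_epoch_reaching does not need TensorFlow

    ctor_name, _ = BACKBONES[backbone]
    base = getattr(tf.keras.applications, ctor_name)(
        include_top=False, weights="imagenet", input_shape=INPUT_SHAPE)
    base.trainable = False  # assumption: backbone used as a fixed feature extractor
    x = tf.keras.layers.Flatten()(base.output)  # flatten feature maps to one dimension
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs=base.input, outputs=outputs)
    # The text only says "a loss function"; categorical cross-entropy is assumed.
    model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    return model


def make_generators(backbone: str, train_dir: str, val_dir: str):
    """Augmented training and plain validation iterators over one-folder-per-class directories."""
    import tensorflow as tf

    _, module_name = BACKBONES[backbone]
    preprocess = getattr(tf.keras.applications, module_name).preprocess_input
    image_data_generator = tf.keras.preprocessing.image.ImageDataGenerator
    # Augmentation parameters are illustrative assumptions.
    train_aug = image_data_generator(preprocessing_function=preprocess, rotation_range=15,
                                     zoom_range=0.1, horizontal_flip=True)
    val_aug = image_data_generator(preprocessing_function=preprocess)
    common = dict(target_size=INPUT_SHAPE[:2], batch_size=BATCH_SIZE, class_mode="categorical")
    train_gen = train_aug.flow_from_directory(train_dir, **common)
    val_gen = val_aug.flow_from_directory(val_dir, shuffle=False, **common)
    return train_gen, val_gen


def train(backbone: str, train_dir: str, val_dir: str):
    """Train one backbone for EPOCHS epochs and return (model, history dict)."""
    train_gen, val_gen = make_generators(backbone, train_dir, val_dir)
    model = build_model(backbone, num_classes=train_gen.num_classes)
    history = model.fit(train_gen, validation_data=val_gen, epochs=EPOCHS)
    return model, history.history


def first_epoch_reaching(values: Sequence[float], target: float = 1.0) -> Optional[int]:
    """1-based index of the first epoch whose value reaches `target`, or None if never."""
    for epoch, value in enumerate(values, start=1):
        if value >= target:
            return epoch
    return None
```

For example, `model, hist = train("MobileNetV2", "data/train", "data/val")` (hypothetical paths) followed by `first_epoch_reaching(hist["accuracy"])` returns the first epoch with 100% training accuracy, or `None` if it is never reached, which is the kind of comparison made above between MobileNet, VGG16, and ResNet-50.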
B. Testing Result Analysis
Face recognition was performed on a random sample from the test set, with the developed models employed as classifiers. Each detected face is labeled with the person's name in the display window, and the technique can be used for both image and video face recognition. A separate function for real-time recognition was also created; it performs real-time face detection with the Haar cascade classifier and recognition with any of the three models.
All three models produced significant results in terms of accuracy.
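The real-time function described above (Haar-cascade detection followed by classification with one of the trained models, with the predicted name drawn on the display window) might look like the following sketch. It assumes opencv-python and NumPy, a trained Keras model, class names in the order of the training generator's `class_indices`, and the `preprocess_input` function matching the chosen backbone; `decode_prediction` and `run_webcam` are hypothetical names, and the detector parameters are illustrative.

```python
"""Hedged sketch of real-time recognition: Haar detection + CNN classification."""
from typing import Callable, Sequence, Tuple


def decode_prediction(probs: Sequence[float], class_names: Sequence[str]) -> Tuple[str, float]:
    """Map a softmax output vector to (class name, probability) by taking the arg-max."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best], float(probs[best])


def run_webcam(model, class_names: Sequence[str], preprocess: Callable,
               camera_index: int = 0) -> None:
    """Label every detected face in the webcam stream; press 'q' to quit."""
    import cv2  # lazy imports: decode_prediction works without OpenCV or NumPy
    import numpy as np

    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            for (x, y, w, h) in detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5):
                face = cv2.resize(frame[y:y + h, x:x + w], (224, 224))
                face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)  # Keras loaders yield RGB images
                batch = preprocess(np.expand_dims(face.astype("float32"), axis=0))
                name, prob = decode_prediction(model.predict(batch, verbose=0)[0], class_names)
                cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
                cv2.putText(frame, f"{name} {prob:.2f}", (x, max(0, y - 10)),
                            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
            cv2.imshow("Face recognition", frame)
            if cv2.waitKey(1) & 0xFF == ord("q"):
                break
    finally:
        cap.release()
        cv2.destroyAllWindows()
```

With a tf.keras training generator, `class_names = sorted(train_gen.class_indices, key=train_gen.class_indices.get)` recovers the label order, and `preprocess` would be, for example, `tf.keras.applications.vgg16.preprocess_input` for a VGG16 model. For static images, the same detection and `decode_prediction` steps apply to a single image loaded with `cv2.imread`.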
VII. CONCLUSION
In this study, transfer learning was employed to create a face recognition system. The proposed research compared the models in terms of accuracy, loss, training duration, and recognition loss. The models developed in this study can be used in real-time attendance systems as well as in static-image face recognition systems. Because all three models are pre-trained on large datasets, they can also be adapted to larger datasets for real-time face recognition. Future research aims to build a robust, real-time, deep-learning-based face recognition system for use in crime detection and forensics.
Fig. 7. MobileNet Training and Validation Accuracy
Fig. 8. MobileNet Training and Validation Loss
REFERENCES
[1] Massoli, F. V., Amato, G., & Falchi, F. (2020). Cross-resolution learning for face recognition. Image and Vision Computing, 99, 103927.
[2] Sanchez-Moreno, A. S., Olivares-Mercado, J., Hernandez-Suarez, A., Toscano-Medina, K., Sanchez-Perez, G., & Benitez-Garcia, G. (2021). Efficient face recognition system for operating in unconstrained environments. Journal of Imaging, 7(9), 161.
[3] Almabdy, S., & Elrefaei, L. (2019). Deep convolutional neural network-based approaches for face recognition. Applied Sciences, 9(20), 4397.
[4] Kumar, P. M., Gandhi, U., Varatharajan, R., Manogaran, G., Jidhesh, R., & Vadivel, T. (2019). Intelligent face recognition and navigation system using neural learning for smart security in Internet of Things. Cluster Computing, 22(4), 7733-7744.
[5] Prakash, R. M., Thenmoezhi, N., & Gayathri, M. (2019, November). Face recognition with convolutional neural network and transfer learning. In 2019 International Conference on Smart Systems and Inventive Technology (ICSSIT) (pp. 861-864). IEEE.
[6] Ding, C., & Tao, D. (2017). Trunk-branch ensemble convolutional neural networks for video-based face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4), 1002-1014.
[7] Bah, S. M., & Ming, F. (2020). An improved face recognition algorithm and its application in attendance management system. Array, 5, 100014.
[8] Jayaraman, U., Gupta, P., Gupta, S., Arora, G., & Tiwari, K. (2020). Recent development in face recognition. Neurocomputing, 408, 231-245.
[9] Kortli, Y., Jridi, M., Al Falou, A., & Atri, M. (2020). Face recognition systems: A survey. Sensors, 20(2), 342.
[10] Ali, N. (2021). Exploring transfer learning on face recognition of dark skinned, low quality and low resource face data. arXiv preprint arXiv:2101.10809.
[11] Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: A large-scale hierarchical image database. In 2009 IEEE
[12] 5 Celebrity Faces Dataset. Kaggle. https://www.kaggle.com/datasets/dansbecker/5-celebrity-faces-dataset
[13] Chollet, F., et al. Keras. https://keras.io