Automatic Face Recognition System based on Data Augmentation and Transfer Learning


2023 Third International Conference on Theoretical and Applicative Aspects of Computer Science (ICTAACS) | 979-8-3503-8585-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICTAACS60400.2023.10449574

Hattab Abdessalam
LaSTIC laboratory, computer science department
Batna 2 University
Batna, Algeria
[email protected]

Behloul Ali
LaSTIC laboratory, computer science department
Batna 2 University
Batna, Algeria
[email protected]

Abstract—In an era of ubiquitous digital identity verification, biometrics, and facial recognition in particular, has been a focal point of research. While facial recognition performs impressively under controlled conditions, its efficacy degrades in uncontrolled environments. To address this challenge, we present a robust face recognition system inspired by the Xception model. We use Data Augmentation and Transfer Learning to train our proposed Deep CNN model on limited face datasets. These two techniques not only improved recognition accuracy but also guarded against overfitting. In tests on the Georgia Tech and ORL benchmark datasets under a two-fold cross-validation protocol, our system surpasses contemporary state-of-the-art face recognition methods, achieving 100% accuracy on both datasets.

Index Terms—Face recognition, Deep Learning, Convolutional Neural Networks, Transfer Learning, Data Augmentation, Xception model.

I. INTRODUCTION
With the rapid evolution of technology, the adoption of automatic facial identification systems has surged, primarily for personal identification. Among biometric methods, face recognition stands out as one of the most non-intrusive and user-friendly approaches, as it does not require active user cooperation. Traditional face recognition techniques have demonstrated commendable proficiency in facial identification tasks. However, their efficacy declines dramatically under uncontrolled conditions, such as variable illumination, diverse poses, and changing facial expressions [1]. Deep Learning (DL) techniques, particularly Convolutional Neural Networks (CNNs), have gained increasing attention recently, demonstrating remarkable accuracy in various image recognition domains, including biometrics [2], [3].

This paper proposes a robust facial recognition system built on a CNN architecture inspired by the Xception architecture [4]. To overcome the challenge of training our proposed CNN model with limited data, we applied Data Augmentation (DA) and Transfer Learning (TL) techniques. By combining a Deep CNN with DA and TL, we anticipate that our system will excel in addressing the challenges of face recognition. Our primary goal is to tackle the challenges of facial identification in unconstrained environments and of training Deep Learning models with limited datasets. This study aims to evaluate our system's performance and compare it with recent face recognition systems.

The rest of this paper is structured as follows: Section II covers related work, Section III introduces our system, Section IV analyzes the results in detail and compares them to recent systems, and Section V concludes the paper.

II. RELATED WORK
In recent years, face recognition has attracted significant research attention. In this section, we provide an overview of selected research in this domain.

Aldhahab et al. [5] developed an approach that localizes the eyes, nose, and mouth regions within each facial image. From these localized regions, they extracted features using two algorithms: the Discrete Wavelet Transform (DWT) and Vector Quantization (VQ). This approach yielded commendable accuracy rates when tested across various facial image databases. See and Noor [6] introduced a hybrid method for face recognition that combines the Complete Gabor Filter (CGF) with the Random Forest (RF) classifier. The approach attained a satisfactory accuracy rate when tested on the Georgia Tech database.

Ouyang et al. [7] employed the Improved Kernel Linear Discriminant Analysis (IKLDA) technique in conjunction with Probabilistic Neural Networks (PNNs) for facial recognition. Their method yielded encouraging results, especially on datasets containing a limited number of samples. Drawing on directional coding, Ouslimani et al. [8] introduced a collection of rotation-invariant features tailored for texture classification. These features were successfully applied to face recognition and demonstrated exceptional accuracy when assessed across various face image databases. Sapijaszko and Mikhael [9] designed a facial recognition system that incorporated a preprocessing algorithm aimed at enhancing the quality of facial images. This system employed a combination of DWT and DCT techniques to extract features from the improved images. A multilayer sigmoid neural network was employed in the classification stage. Hattab and Behloul

Authorized licensed use limited to: Nirma University Institute of Technology. Downloaded on June 13,2024 at 13:39:07 UTC from IEEE Xplore. Restrictions apply.
[10] employed the SIFT technique to pinpoint significant regions within facial images. They then applied the ALTP method to extract features from the identified regions. For the classification phase, they adopted the k-Nearest Neighbors (k-NN) algorithm. Their approach demonstrated a notably superior accuracy rate compared to traditional manual feature engineering methods.

Coskun et al. [11] implemented an eight-layer Convolutional Neural Network (CNN). To enhance accuracy, the authors incorporated batch normalization after both the first and last Convolutional layers. Almabdy and Elrefaei [12] introduced a facial identification system that relies on two pre-trained deep CNN models, namely AlexNet [13] and ResNet-50 [14]. These models were used to construct two feature vectors, which were then concatenated to create a final feature vector with 6144 dimensions. The authors employed a Support Vector Machine (SVM) for the classification stage. Zeghina et al. [15] adopted the Harris Detector to pinpoint significant regions within facial images. A custom CNN was then deployed to recognize faces based on the identified regions.

Wang et al. [16] introduced a hybrid system for face identification. They first used Local Binary Patterns (LBP) to generate LBP images, which were then employed as input for a Deep CNN. This approach yielded a satisfactory level of accuracy in face recognition. In another notable work [17], a CNN architecture was trained on widely adopted multi-sample databases. The authors then applied this pre-trained architecture to facial recognition tasks across various datasets. To enhance the recognition rate, they augmented intra-class variation by implementing the K Class Feature Transfer (KCFT) technique. Hattab and Behloul [18] extracted face features using eight residual blocks from the Xception model pre-trained on the ImageNet database. They then used LinearSVC for the classification stage. The proposed method achieved acceptable results on the ORL and Georgia Tech databases. In an independent study, Kamencay et al. [19] presented a facial identification system based on feature extraction using a Convolutional Neural Network. This system was evaluated on the ORL database and achieved an impressive accuracy. The authors of [20] used pre-trained architectures such as AlexNet-v2 [21] and VGG16 [22] to extract facial features, followed by a LinearSVC classifier. This method obtained an impressive accuracy on the ORL database, although the models it used comprised tens of millions of parameters.

III. THE PROPOSED SYSTEM
In the past decade, Deep Learning has achieved remarkable success in the field of image recognition, particularly with the advent of CNNs. One standout model in this domain is Xception [4], renowned for its robust performance in image recognition tasks. Xception takes inspiration from the Inception-v3 [23] network but employs depthwise separable Convolution layers in place of inception modules. It has consistently outperformed other well-known deep CNN models, including Inception-v3 [23], ResNet [14], VGG16 [22], and AlexNet [21]. The Xception network is composed of 14 residual blocks and has over 22 million parameters. Fig. 1 (a) provides a visual representation of the Xception model's architecture.

One notable limitation of the Xception model is that it requires a substantial amount of data to effectively train all its parameters [24]. This poses a significant challenge when applying the model to tasks with small datasets. To address this limitation, researchers have applied two promising approaches: Data Augmentation and Transfer Learning. These techniques offer remedies for data constraints and boost the model's performance. With Data Augmentation, researchers create synthetic data by applying a variety of transformations and modifications to the existing dataset. The enriched dataset substantially expands both the volume and diversity of the training set, enabling the model to learn from more extensive examples.

Moreover, Transfer Learning serves as a cornerstone in improving a model's performance and its ability to generalize when faced with smaller datasets. The model reuses the features and knowledge acquired from training on larger datasets and applies them to the new dataset. In short, both Data Augmentation and Transfer Learning present promising avenues for tackling training challenges and enhancing performance.

To develop a robust facial recognition system using the Xception model while minimizing computational complexity, it is imperative to decrease the number of residual blocks within the Xception network. Through a series of experiments incorporating Data Augmentation, we have engineered a robust facial recognition system relying on the Xception architecture, which was trained on the ImageNet database [25]. This dataset encompasses over 1.4 million images, with approximately 17% of them containing at least one face [26]. Our experimental findings highlight that a CNN model inspired by the Xception architecture, comprising just six residual blocks, is an exceptionally effective approach for achieving precise facial recognition results, particularly when dealing with small facial datasets.

This paper introduces a robust CNN model for facial recognition, drawing inspiration from the pre-trained Xception model. Our proposed model has a streamlined architecture consisting of only six residual blocks, followed by Global Average Pooling (GAP) and Softmax layers. Within our model, we used a total of 12 Separable Convolutional layers, 5 Convolutional layers, and 4 MaxPooling layers. The GAP layer significantly decreases the feature dimensions from 19×19×728 to 728. Notably, our architecture has approximately four million parameters, striking a balance between effectiveness and computational efficiency. Fig. 1 (b) illustrates our CNN architecture inspired by the Xception model.
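As a rough sanity check on the layer counts and parameter budget quoted above, the convolution weights of a six-block Xception-style network can be tallied by hand. The sketch below is illustrative only: the channel widths (32, 64, 128, 256, 728) are an assumption taken from the reference Xception layout [4], not something this paper lists, and biases, batch-normalization parameters, and the Softmax layer are ignored. Pooling layers carry no weights and are omitted.

```python
# Tally convolution weights for an Xception-style network cut after six blocks.
# ASSUMPTION: channel widths follow the reference Xception layout; the paper
# does not state them. Biases, batch norm, and the classifier are not counted.

def conv_params(kernel, c_in, c_out):
    """Weights of a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def sep_conv_params(kernel, c_in, c_out):
    """Weights of a depthwise separable convolution:
    one kernel x kernel filter per input channel, then a 1x1 pointwise conv."""
    return kernel * kernel * c_in + c_in * c_out


def count_truncated_xception():
    convs = 0
    sep_convs = 0
    total = 0

    # Block 1: two plain 3x3 convolutions (3 -> 32 -> 64 channels).
    for c_in, c_out in [(3, 32), (32, 64)]:
        total += conv_params(3, c_in, c_out)
        convs += 1

    # Blocks 2-4: two separable convolutions plus a 1x1 convolution
    # on the residual shortcut.
    for c_in, c_out in [(64, 128), (128, 256), (256, 728)]:
        total += sep_conv_params(3, c_in, c_out)
        total += sep_conv_params(3, c_out, c_out)
        total += conv_params(1, c_in, c_out)
        sep_convs += 2
        convs += 1

    # Blocks 5-6: three separable convolutions each at 728 channels,
    # with an identity shortcut (no extra weights).
    for _ in range(2):
        for _ in range(3):
            total += sep_conv_params(3, 728, 728)
            sep_convs += 1

    return convs, sep_convs, total


convs, sep_convs, total = count_truncated_xception()
print(convs, sep_convs, total)  # 5 12 4319112

# Global Average Pooling keeps one value per channel.
gap_in = 19 * 19 * 728
gap_out = 728
print(gap_in, gap_out)  # 262808 728
```

Under these assumptions the tally gives 5 convolutional and 12 separable convolutional layers with about 4.3 million weights, which is in line with the figures reported in the text, and it shows GAP shrinking 262,808 activations to 728.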

Fig. 1. a) The Xception architecture [4], b) Our proposed model.

To attain exceptional recognition accuracy, we employ a carefully designed training strategy. During the training phase, we freeze the first five pre-trained residual blocks, which are responsible for extracting low-level features. We retrain the sixth residual block, which captures high-level features capable of identifying characteristic facial traits. At the same time, we train the Softmax classifier, which assigns the appropriate class label to the facial image. Moreover, we use Data Augmentation to enrich the training dataset. The augmentation consists of geometric transformations: shifting height and width within a range of 0.1, rotating images within a range of 10 degrees, applying zoom within a range of 0.15, and using a constant fill mode. Fig. 2 shows facial images generated through Data Augmentation.

Fig. 2. A face image after applying the Data Augmentation technique.

IV. EXPERIMENTS
This paper introduces a facial recognition system based on Data Augmentation and Transfer Learning. To assess the effectiveness and robustness of our system, we employed two evaluation protocols: a train-test split protocol (80% for training and 20% for testing) and a two-fold cross-validation protocol. We train our model for a maximum of 30 epochs using the Adam optimizer and apply early stopping if validation accuracy does not improve over 10 consecutive epochs, to mitigate the risk of overfitting.

We carried out a series of experiments on the Georgia Tech and ORL databases to comprehensively assess the system's performance. We then compared our system's recognition rates with recent research efforts. All experiments were carried out using the Keras open-source library within the Google Colab environment.

A. Data sets:
We performed evaluation experiments on two distinct facial databases:
1) The Georgia Tech database: comprises 750 face images of 50 individuals, with 15 images per person. The images have a resolution of 640×480 pixels and were captured under diverse conditions, with variations in scale, facial expression, and lighting. Fig. 3 provides visual samples from the Georgia Tech database.

Fig. 3. Samples from the Georgia Tech database.

2) The ORL database: comprises 400 images of 40 individuals, with ten images per person. Each image has dimensions of 112×92 pixels. The images were captured under diverse conditions, including variations in illumination, facial expression, time of capture, and the presence of glasses. Fig. 4 provides visual examples from the ORL database.

Fig. 4. Samples from the ORL database.
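To make the two-fold cross-validation protocol concrete for databases of this shape, the following sketch splits each subject's images into two halves and swaps the roles of the halves between folds. It is a minimal illustration, not the authors' code: splitting per subject, the random seed, and the handling of the odd image count in the Georgia Tech database are all assumptions, and the function name `two_fold_splits` is hypothetical.

```python
import random


def two_fold_splits(n_subjects, images_per_subject, seed=0):
    """Build two (train, test) splits in which every subject's images are
    divided between the two halves. ASSUMPTION: per-subject (stratified)
    splitting; the paper does not say how its folds were formed."""
    rng = random.Random(seed)
    half_a, half_b = [], []
    for subject in range(n_subjects):
        # (subject, image index) pairs stand in for image files here.
        images = [(subject, i) for i in range(images_per_subject)]
        rng.shuffle(images)
        cut = images_per_subject // 2
        half_a.extend(images[:cut])
        half_b.extend(images[cut:])
    # Fold 1 trains on half A and tests on half B; fold 2 swaps the roles.
    return [(half_a, half_b), (half_b, half_a)]


def mean_accuracy(fold_accuracies):
    """The reported score is the average of the per-fold accuracies."""
    return sum(fold_accuracies) / len(fold_accuracies)


orl_folds = two_fold_splits(n_subjects=40, images_per_subject=10)
gt_folds = two_fold_splits(n_subjects=50, images_per_subject=15)

print(len(orl_folds[0][0]), len(orl_folds[0][1]))  # 200 200
print(len(gt_folds[0][0]), len(gt_folds[0][1]))    # 350 400
```

With ten images per person, ORL divides evenly into 200 training and 200 test images per fold. With fifteen images per person, the Georgia Tech halves are necessarily unequal (7 and 8 images per subject in this sketch).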

B. Experimental results:
Our architecture demonstrated exceptional recognition rates on the Georgia Tech and ORL databases, achieving 100% accuracy with an 80% training and 20% testing split. Our system achieved the same 100% accuracy on both databases using a two-fold cross-validation protocol. Table I lists the accuracy rates achieved by our proposed system.

TABLE I
THE RECOGNITION RATE OF OUR SYSTEM ON THE GEORGIA TECH AND ORL DATABASES

Evaluation protocol          Georgia Tech database   ORL database
Train: 80%, Test: 20%        100%                    100%
Two-fold cross-validation    100%                    100%

Fig. 5 displays the accuracy and loss curves of our proposed system with the ORL database split into 20% for testing and 80% for training. The accuracy curve shows our model achieving 100% accuracy by the eighth iteration, while the loss curve demonstrates rapid convergence, reaching below 0.2 by the tenth iteration.

Fig. 5. Accuracy and loss graphs depicting the effectiveness of the suggested model on the ORL database, utilizing 20% of data for testing and 80% for training.

To assess our system's performance, we compared its accuracy rates with recent techniques commonly used in the domain of facial identification. The results, presented in Table II, show that our system's accuracy rates on the Georgia Tech database outperformed those of recent works. Furthermore, Table III illustrates the efficacy of our face recognition system when evaluated on the ORL database. Notably, the system introduced by Hattab and Behloul [20] also obtained a recognition accuracy of 100% on the ORL dataset. However, their system employed tens of millions of parameters, whereas our system uses only approximately 4 million parameters.

TABLE II
COMPARISON OF OUR ACCURACY RATE WITH RECENT STUDIES ON THE GEORGIA TECH DATABASE.

Method                       Accuracy   Evaluation protocol
Aldhahab et al. [5]          98.40%     5-fold cross-validation
See and Noor [6]             95.10%     5-fold cross-validation
Coskun et al. [11]           94.80%     Test: 34%, Train: 66%
Almabdy and Elrefaei [12]    98.31%     Test: 20%, Train: 80%
Zeghina et al. [15]          97.41%     Test: 20%, Train: 80%
Hattab and Behloul [18]      99.47%     2-fold cross-validation
Proposed                     100%       Test: 20%, Train: 80%
                             100%       2-fold cross-validation

TABLE III
COMPARISON OF OUR ACCURACY RATE WITH RECENT STUDIES ON THE ORL DATABASE.

Method                       Accuracy   Evaluation protocol
Ouyang et al. [7]            97.22%     Test: 20%, Train: 80%
Ouslimani et al. [8]         98.61%     Test: 20%, Train: 80%
Sapijaszko and Mikhael [9]   98.8%      Test: 20%, Train: 80%
Hattab and Behloul [10]      99.75%     5-fold cross-validation
Wang et al. [16]             96.6%      Test: 33%, Train: 67%
Min et al. [17]              97.77%     Test: 90%, Train: 10%
Hattab and Behloul [18]      99.75%     2-fold cross-validation
Kamencay et al. [19]         98.30%     Test: 20%, Train: 80%
Hattab and Behloul [20]      100%       5-fold cross-validation
Proposed                     100%       Test: 20%, Train: 80%
                             100%       2-fold cross-validation

V. CONCLUSION
In this research paper, we introduce an effective facial recognition system relying on the Xception model, employing six residual blocks followed by GAP and Softmax layers. To overcome the inherent challenges of training Deep Learning architectures with limited datasets, we used Data Augmentation and Transfer Learning techniques, enabling us to achieve remarkable accuracy rates under unconstrained conditions.

Our proposed system achieves high accuracy rates under a two-fold cross-validation protocol, surpassing recent methods. Specifically, we achieve 100% accuracy on both the Georgia Tech and ORL benchmark databases. Moreover, we replicate this accuracy with an 80% training and 20% testing data split.

However, further optimization is required to use our system on resource-constrained devices. Additionally, we recognize the potential benefits of incorporating anti-spoofing techniques to enhance the system's ability to detect face spoofing attacks.

Our future research will primarily focus on optimizing the proposed system and integrating robust anti-spoofing algorithms. Subsequently, we aim to apply this improved system in the domain of face-iris multimodal biometric identification.

REFERENCES

[1] S. G. Kong, J. Heo, B. R. Abidi, J. Paik, and M. A. Abidi, “Recent advances in visual and infrared face recognition—a review,” Computer vision and image understanding, vol. 97, no. 1, pp. 103–135, 2005.
[2] A. Hattab and A. Behloul, “A robust iris recognition approach based on transfer learning,” International Journal of Computing and Digital Systems, vol. 13, no. 1, pp. 1065–1080, 2023.
[3] A. Hattab and A. Behloul, “An illumination-robust face recognition approach based on convolutional neural network,” in Modelling and Implementation of Complex Systems: Proceedings of the 7th International Symposium, MISC 2022, Mostaganem, Algeria, October 30-31, 2022. Springer, 2022, pp. 135–149.
[4] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 1251–1258.
[5] A. Aldhahab, T. Alobaidi, A. Q. Althahab, and W. B. Mikhael, “Applying multiresolution analysis to vector quantization features for face recognition,” in 2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS). IEEE, 2019, pp. 598–601.
[6] Y. C. See and N. M. Noor, “Integrating complete gabor filter to the random forest classification algorithm for face recognition,” Journal of Engineering Science and Technology, vol. 14, no. 2, pp. 859–874, 2019.
[7] A. Ouyang, Y. Liu, S. Pei, X. Peng, M. He, and Q. Wang, “A hybrid improved kernel LDA and PNN algorithm for efficient face recognition,” Neurocomputing, vol. 393, pp. 214–222, 2020.
[8] F. Ouslimani, A. Ouslimani, and Z. Ameur, “Rotation-invariant features based on directional coding for texture classification,” Neural Computing and Applications, vol. 31, pp. 6393–6400, 2019.
[9] G. M. Sapijaszko and W. B. Mikhael, “Facial Recognition System Using Mixed Transform and Multilayer Sigmoid Neural Network Classifier,” Circuits, Systems, and Signal Processing, vol. 39, pp. 6142–6161, 2020.
[10] A. Hattab and A. Behloul, “A robust face recognition method based on altp and sift,” in Advances in Communication Technology, Computing and Engineering. RGN Publications, 2021, pp. 155–169. [Online]. Available: https://rgnpublications.com/ICACTCE2021/manuscripts/015-128.pdf
[11] M. Coskun, A. Ucar, O. Yildirim, and Y. Demir, “Face recognition based on convolutional neural network,” Proceedings of the International Conference on Modern Electrical and Energy Systems, MEES 2017, vol. 2018-Janua, pp. 376–379, 2017.
[12] S. Almabdy and L. Elrefaei, “Feature extraction and fusion for face recognition systems using pre-trained convolutional neural networks,” International Journal of Computing and Digital Systems, vol. 9, pp. 1–7, 2021.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” Advances in neural information processing systems, vol. 25, pp. 1097–1105, 2012.
[14] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770–778.
[15] A. O. Zeghina, O. Zoubia, and A. Behloul, “Face Recognition Based on Harris Detector and Convolutional Neural Networks,” in International Symposium on Modelling and Implementation of Complex Systems. Springer, 2020, pp. 163–171.
[16] M. Wang, Z. Wang, and J. Li, “Deep convolutional neural network applies to face recognition in small and medium databases,” in 2017 4th International Conference on Systems and Informatics (ICSAI). IEEE, 2017, pp. 1368–1372.
[17] R. Min, S. Xu, and Z. Cui, “Single-sample face recognition based on feature expansion,” IEEE Access, vol. 7, pp. 45 219–45 229, 2019.
[18] A. Hattab and A. Behloul, “Face-iris multimodal biometric recognition system based on deep learning,” Multimedia Tools and Applications, pp. 1–28, 2023.
[19] P. Kamencay, M. Benco, T. Mizdos, and R. Radil, “A new method for face recognition using convolutional neural network,” Advances in Electrical and Electronic Engineering, vol. 15, no. 4, pp. 663–672, 2017.
[20] A. Hattab and A. Behloul, “New approaches for automatic face recognition based on deep learning models and local handcrafted altp,” EAI Endorsed Transactions on Scalable Information Systems, vol. 9, no. 34, pp. e11–e11, 2022.
[21] A. Krizhevsky, “One weird trick for parallelizing convolutional neural networks,” arXiv preprint arXiv:1404.5997, 2014.
[22] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[23] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818–2826.
[24] I. Kandel and M. Castelli, “Transfer learning with convolutional neural networks for diabetic retinopathy image classification. A review,” Applied Sciences, vol. 10, no. 6, p. 2021, 2020.
[25] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009, pp. 248–255.
[26] K. Yang, J. Yau, L. Fei-Fei, J. Deng, and O. Russakovsky, “A Study of Face Obfuscation in ImageNet,” arXiv preprint arXiv:2103.06191, 2021.
