
Received 23 January 2024, accepted 9 February 2024, date of publication 14 February 2024, date of current version 23 February 2024.

Digital Object Identifier 10.1109/ACCESS.2024.3366451

Deep Learning Based Multi Pose Human Face Matching System
MUHAMMAD SOHAIL1, IJAZ ALI SHOUKAT1, ABD ULLAH KHAN2,3 (Member, IEEE),
HARAM FATIMA1, MOHSIN RAZA JAFRI2, MUHAMMAD AZFAR YAQUB4,5 (Member, IEEE),
AND ANTONIO LIOTTA4 (Senior Member, IEEE)
1 Department of Computing, Riphah International University, Faisalabad Campus, Faisalabad 38000, Pakistan
2 Department of Computer Sciences, National University of Sciences and Technology, Balochistan Campus, Quetta 87000, Pakistan
3 Department of Electronics and Information Convergence Engineering, Kyung Hee University, Suwon, Gyeonggi-do 17104, South Korea
4 Faculty of Engineering, Free University of Bozen-Bolzano, 39100 Bolzano, Italy
5 Department of Electrical and Computer Engineering, COMSATS University Islamabad, Islamabad 44000, Pakistan

Corresponding author: Muhammad Azfar Yaqub ([email protected])


This work was supported by the Open Access Publishing Fund of the Free University of Bozen-Bolzano.

ABSTRACT Current techniques for multi-pose human face matching yield suboptimal outcomes because
of the intricate nature of pose equalization and face rotation. Deep learning models such as YOLO-V5
that have been proposed to tackle these complexities, suffer from slow frame matching speeds and therefore
exhibit low face recognition accuracy. Concerning this, certain literature investigated multi-pose human face
detection systems; however, those studies are of elementary level and do not adequately analyze the utility
of those systems. To fill this research gap, we propose a real-time face matching algorithm based on YOLO-
V5. Our algorithm utilizes multi-pose human patterns and considers various face orientations, including
organizational faces and left, right, top, and bottom alignments, to recognize multiple aspects of people.
Using face poses, the algorithm identifies face positions in a dataset of images obtained from mixed pattern
live streams, and compares faces with a specific piece of the face that has a relatively similar spectrum for
matching with a given dataset. Once a match is found, the algorithm displays the face on Google Colab,
collected during the learning phase with the Robo-flow key, and tracks it using the YOLO-V5 face monitor.
Alignment variations are broken up into different positions, where each type of face is uniquely learned to
have its own study demonstrated. This method offers several benefits for identifying and monitoring humans
using their labeling tag as a pattern name, including high face-matching accuracy and minimal slowdown
owing to face-pose variations. Furthermore, the algorithm addresses the face rotation issue by introducing a
mixture of error functions for execution time, accuracy loss, frame-wise failure, and identity loss, attempting
to guide the authenticity of the produced image frame. Experimental results confirm the effectiveness of the
algorithm in terms of improved accuracy and reduced delay in the face-matching paradigm.

INDEX TERMS Deep learning, face recognition, pattern matching, YOLO-V5.

I. INTRODUCTION
Over the last three decades, facial recognition has garnered significant attention due to its perceived ease of use as an image analysis and pattern recognition application. Two of the most important reasons to understand the trend are: firstly, the diverse range of commercial and legal requirements, and secondly, the ubiquity of relevant technologies such as smartphones, digital cameras, and GPUs [1]. Despite the advancements made in machine learning and recognition systems, their performance remains constrained by real-world conditions. For instance, accurately identifying facial images in unconstrained environments characterized by lighting variations, diverse postures, facial expressions, partial occlusion, disguises, or camera movement continues to pose daunting challenges. In other words, existing technologies are still lagging behind the visual capabilities of the human mind [2].

The associate editor coordinating the review of this manuscript and approving it for publication was Ghulam Muhammad.
2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License.
26046 For more information, see https://creativecommons.org/licenses/by/4.0/ VOLUME 12, 2024
M. Sohail et al.: Deep Learning Based Multi Pose Human Face Matching System

FIGURE 1. Structure of multi-pose human face recognition.

Recently, machine learning approaches have gained significant success in computer vision applications, specifically face recognition [3]. Two major applications of face recognition are face identification and face verification. In face identification, facial images of a person are used to determine that person's identity, whereas in face verification, a face image and a claimed identity are given to the system to verify whether the image belongs to that specific person or not [4].

A major problem associated with face detection is detection accuracy. The scale of a face within the same image can vary dramatically, which challenges the detector [5]. Deep learning-based approaches, e.g., the Region-based Convolutional Neural Network (RCNN) [6], have improved face recognition performance significantly compared to traditional algorithms, e.g., AdaBoost and Deep Pyramid Deformable (DPM) [7], [8]. Subsequent machine learning algorithms, i.e., Spatial Pyramid Pooling (SPP-net) [9], Fast Region-based Convolutional Neural Network (Fast-RCNN) [10], Faster Region-based Convolutional Neural Network (Faster-RCNN) [11], and Region-based Fully Convolutional Network (R-FCN) [12], improved accuracy and speed over time, and significant progress has thus been achieved in the face recognition domain. Still, many challenges need to be addressed, especially face recognition duration. Already available methods discuss feature selection based on dual systems [13] and single platform [14], which are comparable to strict scientific edge image retrieval approaches utilizing analytic infrastructure and local factors-centered approach tools. The limitations associated with [13] and [14] are blurry descriptions and missed facial expressions in foregrounded faces, as well as shading problems in small areas. To address the limitations of the already available methods discussed in Section II, the authors therein chose one of the advanced algorithms, i.e., You Only Look Once (YOLO), and presented their work. However, they lacked significant improvement in speed in multiple human face matching and higher precision.

To overcome the limitations associated with already available methods, a modified version of YOLO-V5 [15] is presented to improve face matching from different poses and reduce face recognition time. For this purpose, detecting face benchmarks or recovering the depth appearance of a face from an image is utilized, which is relatively straightforward compared to the inherent difficulty in resolving position fluctuations. The YOLO-V5 method is utilized for multi-pose human face matching, enabling the evaluation or detection of individuals using their face photos captured in various postures. This approach has prompted the development of current rotation estimation techniques such as Insight position, recursive convolution photograph, detailed points, facial expression, and smoke, discussed in [16]. Figure 1 presents a framework for the multi-pose detection challenge, including classifying the multi-pose face into a (left, right, front, top, bottom) appearance while maintaining identification order. The proposed method uses a multi-pose base YOLO-V5 to train a dataset based on face orientation and alignments between frontal and rotated faces in multiple poses.

The major contributions of this study are summarized below.
1) We proposed a simple and robust YOLO-V5-based algorithm for real-time multi-pose human face-matching and recognition. The algorithm learns modeling across multi-pose faces and forehead appearances in input images to identify all types of rotation images, such as front, left, right, and side views.
2) By introducing a mixture of error functions for execution time, accuracy loss, frame-wise failure, and identity loss, the face rotation challenge is handled. Additionally, a fundamental weight matrix is used in the training process to increase accuracy.


Consequently, a higher accuracy of up to 99% and a faster face-matching speed of about 34 s is achieved. Furthermore, actual experiments are carried out to show that the YOLO-V5 formalization, matching, and identification created excellent results in real-world circumstances.

A. PROBLEM MOTIVATION
Current multi-pose human face matching techniques face challenges in handling pose equalization and face rotation, resulting in suboptimal outcomes. Deep learning models, such as YOLO-V5, proposed to address these complexities, suffer from slow frame matching speeds, leading to lower face recognition accuracy. Existing literature on multi-pose human face detection lacks comprehensive analysis, leaving a research gap for practical and effective face matching algorithms. To bridge this gap, we present a real-time face matching algorithm based on YOLO-V5. Our method leverages multi-pose human patterns and various face orientations to improve recognition accuracy. By addressing face rotation with error functions, our algorithm offers high accuracy and reduced delay, making it valuable for identifying and monitoring humans in real-world applications.

II. RESEARCH BACKGROUND
The research background is discussed in this section in detail.

A. DEPTH POSE ALIGNMENT OF IMAGE
Depth pose alignment plays a crucial role in face recognition systems, especially when dealing with images captured under different poses. Conventional face recognition algorithms often encounter difficulties when faced with images where the subject's face is not directly facing the camera, leading to compromised performance. The primary objective of depth pose alignment is to address these challenges by transforming the pose of the face in the image into a standardized frontal view. This standardization greatly facilitates the recognition algorithm in accurately matching and identifying faces. A comprehensive analysis of this concept can be found in the literature, as presented below.

The existing computational intelligence learning models are unable to accurately distinguish faces in images with varying perspectives [17]. This is because changes in surface and pattern caused by shifting perspectives often outweigh the differences between individuals, as highlighted in [18] and [19]. The more recently developed face recognition techniques can be broadly categorized into two groups [20]. The first group includes single-stage techniques such as edge detection and feature extraction, which represent bottom-up style as deep features. These techniques have been successfully applied to in-depth pose face recognition from images. The second group is multi-pose area approaches, which integrate the internal capabilities of face angles into a shared latent region that allows for the concept of multi-face recognition. Both of these approaches are elaborated in [21] and [22].

The authors in [23] proposed a pose estimation method based on bin classification. The proposed method is designed to accurately estimate head poses using a deep learning approach. They utilized predicted probabilistic labels to regress with a discrete Gaussian distribution, which models the diverse range of true head poses. This Gaussian distribution is used to supervise the deep neural network by employing a maximum mean discrepancy loss. Additionally, the authors also introduced a spatial channel-aware residual attention structure to enhance the intrinsic pose features, further improving the prediction accuracy and speeding up the training convergence process.

Accessible position in face recognition pertains to the system's ability to process face images captured in diverse real-world situations. This encompasses scenarios where the face may be partially obscured or only partly visible. Additionally, accessible position handling includes situations where the face is not directly facing the camera, but rather tilted or captured at an angle. Literature given below discusses the accessible position in detail.

Sign language recognition (SLR) depends on three main channels of information, i.e., hand gesture, body pose, and facial expression. The authors in [24] utilized SMPL-X, a modern parametric model that allows the extraction of 3D body shape, face, and hand information from a single image. By using this comprehensive 3D reconstruction, the authors conducted SLR and found that it resulted in greater accuracy compared to recognizing information from raw RGB images or 2D skeletons. Additionally, the authors highlighted the significance of combining information from all three channels to achieve the best recognition outcomes.

The authors in [25] proposed a method to improve the speed and accuracy of a face recognition system. The system utilized a combination of mixed methods and strategies, incorporating deep learning and machine learning techniques. The project consisted of four primary stages. Initially, the Histogram of Oriented Gradients (HOG) was applied to swiftly identify faces in digital images. Following successful face detection, a customized facial landmark estimation process was employed to delineate five distinct facial regions. Subsequently, the segmented face was passed through a pre-trained facial model for recognition purposes. The study's results indicate that face recognition algorithms designed to operate in real-time using modern deep learning techniques can be efficiently deployed on inexpensive computing devices.

B. BLAZE POSE
The authors in [26] proposed a novel technique that can estimate 3D face shapes and animatable details that are unique to an individual but vary with expressions. Their proposed approach, named Detailed Expression Capture and Animation (DECA), is trained to produce a UV displacement map from a low-dimensional latent representation that contains both person-specific and generic expression parameters.


The authors developed a regressor that can predict detail, shape, albedo, expression, pose, and illumination parameters using only one image. The proposed model is trained on images captured in-the-wild, without any paired 3D supervision. Consequently, the model achieved significant improvement in shape reconstruction accuracy over two benchmarks.

In [27], a new network for 3D face reconstruction, called CED-Net, is presented. The network incorporates contextual information at both the shape and feature level. The loss function is constrained by considering the shape context relationship, where the Euclidean norm and vector angle similarity are computed for each contextual vector. To incorporate contextual information at the feature level, the network uses a local feature correlation modulator in its center section. This allows the network to capture the relationship between facial features from a spatial perspective.

A face tracking method for Human Robot Interaction (HRI) is discussed in [28]. The proposed method is presented for face detection using the Viola-Jones algorithm, while face tracking is achieved using the Kanade-Lucas-Tomasi (KLT) algorithm with different pose conditions. The camera motion is controlled based on the displacement between the frames, which is obtained from the tracking result of the previous stage. Real-time experiments show that the proposed system can successfully track human faces even when the subjects are wearing glasses, hats, or in lateral face postures.

The authors in [29] presented a method to overcome the challenges associated with face recognition. In face recognition applications, a major obstacle is the significant differences between profile and frontal faces. Existing techniques address this issue by either synthesizing frontal faces or by learning pose invariance. The authors propose a new approach using Lie algebra theory to investigate how rotating a face in 3D space affects the process of generating deep features with CNNs. The paper demonstrates that face rotation in the image space is equivalent to an additional residual component in the CNN feature space, which is determined solely by the rotation. Based on this finding, the paper proposes a Lie Algebraic Residual Network (LARNet) to address the issue of pose robust face recognition. The LARNet consists of a residual subnet for decoding rotation information from input face images, and a gating subnet to learn the rotation magnitude and control the strength of the residual component involved in the feature learning process.

C. SINGLE AND MULTIPLE MODELS FOR FACE RECOGNITION SYSTEMS
In face recognition systems, two main approaches are commonly used: the single model approach and the multiple models approach.

The single model approach involves training a single model on a dataset containing images of various individuals. While simple and efficient, this approach may struggle to handle diverse face variations and can result in reduced accuracy in challenging scenarios.

On the other hand, the multiple models approach utilizes a collection of specialized models, each trained to excel in recognizing specific subsets of faces or handling particular challenges. This approach enhances the system's performance and robustness, especially in dealing with variations in face appearance, pose, and lighting conditions. However, it introduces increased complexity and computational overhead in managing and maintaining multiple models.

The authors in [30] summarize the CNN-based methods used for face identification selecting face models from a larger population. The authors provide an outline of the latest developments in this field and examine the current state-of-the-art CNN-based face recognition and verification systems.

A method based on YOLO for facial recognition is presented in [31]. The authors therein proposed a system for recognizing facial expressions in a smart classroom setting. To achieve improved results, the authors employed YOLO to extract face images from high-resolution videos of multiple students. After pre-processing the images, a self attention based model called Vision Transformer (ViT) is utilized to recognize facial expressions. The authors then utilize the classified facial expressions to help teachers analyze the learning status of their students and provide suggestions for improving teaching effectiveness.

The authors in [32] proposed an online platform for face recognition. They prove that the proposed platform provides features such as user and criminal information management and real-time facial recognition for identifying criminals through a live stream camera feed. The system is designed for use by two types of users: police employees and administrators who have higher-level access and database maintenance responsibilities. The Haar Cascade algorithm is used and extended for efficient real-time recognition. The website is developed following the MVC pattern and includes a live feed section with video filters to optimize recognition results. The development process involved extensive research on face recognition algorithms and related platforms, requirements definition, persona and scenario development, communication with stakeholders, heuristic evaluation, and feedback collection via a questionnaire. The approach was successful in achieving its goals, as evidenced by the results of the feedback analysis.

To overcome the limitations associated with face recognition by using video surveillance cameras, a dataset is presented in [33], wherein it is shown that, though deep learning models render impressive performance in facial recognition, they perform poorly in surveillance scenarios. It is further shown that the accuracy of face recognition depends not only on the structure of the model but also on the quality and diversity of the training samples. It is demonstrated that the existing multi-pose face datasets do not include complete top-view face samples, which limits the accuracy of the models trained on them.


FIGURE 2. Changing the background Tier of the YOLO-V5 model [38].

A YOLO-V4 based scheme for head detection and people counting is presented in [34]. It is shown that counting accuracy is affected by pedestrians blocking each other and occluding their heads, especially in crowded areas.

The authors in [35] discussed a method for public identification. Increasing attention has been given to face recognition due to its importance in public identity verification and security, as well as in information management and digital entertainment. Existing face recognition systems encounter various challenges such as pose variation, illumination variation, and occlusion issues. The authors propose a face recognition system based on the VGG16 deep learning model to address these problems. To achieve robust pose and view variant face recognition, the system utilizes MultiTask Convolutional Neural Network (MT-CNN) for face detection and VGGNet for face recognition. A real-time database of facial images of 50 subjects was used for evaluation, and cross-validation accuracy was used to assess the system's performance. The proposed method achieved an improved accuracy of 95.80%, 77.50%, and 98.20% under extreme uncontrolled conditions for ORL, FERET, and the real-time face database, respectively.

Similarly, the authors in [36] highlighted the recent progression in 2D face recognition and pointed out that the existing literature had limitations with respect to lighting conditions, poses, and face spoofing. 3D face recognition provides a solution to these limitations. However, constructing a suitable database for 3D face recognition is a major challenge. To overcome this challenge, the authors present a new database called CAS-AIR-3D Face, which contains 24713 videos from 3093 individuals captured by Intel RealSense SR305. This database includes three modalities: color, depth, and near infrared, and contains variations in pose, expression, occlusion, and distance. The data are preprocessed using a face alignment method, and a Point Cloud Spherical Cropping Method (SCM) is applied to remove background noise in the depth images. The authors also design an evaluation protocol for fair comparison and perform extensive experiments with different backbone networks to provide different baselines on this database. According to the authors, CAS-AIR-3D Face is the largest low-quality 3D face database in terms of the number of individuals and the sample variations.

III. PROPOSED YOLO-V5 BASED MODEL
In our proposed model, the architecture of YOLO-V5 is reconfigured by incorporating a CSP darknet slim layer and adding a p6 shell at the neck level for optimal results. During the paired learning phase, the framework's source obtained from Robo-flow can consist of one or more multi-pose expressions. The representation of multiple human faces is achieved by collecting the frontal view and other aligned faces from the provided dataset [37].

A. MODEL STRUCTURE
A detailed detection mechanism based on YOLO-V5 is shown in Figure 2. Architectural application for the proposed model is presented in Figure 2(a). In Figure 2(c), a direction bundle termed CBS is demonstrated, which further emerges as a fully-connected stratum, packet similarity unit, and a fractal dimension storage facility visual stimuli activity. This is used in a variety of contexts. The ending summary for the upstairs,


which includes structure framing, bravery, classifying, etc. is detailed in Figure 2(c). The observations in this investigation lead to the further development of YOLO-V5 to generate a head classifier that provides better results. If there is an apparent lack of different point tags, the terminal version sixteen must be six, and the number of individuals must be scaled accordingly. The stem infrastructure employed to recover the true awareness level in Figure 2(d) is detailed as YOLO-V5. The exploration of the spine surface using YOLO-V5 for feature extraction is an example of this experiment's acknowledgment. Restricted fulfillment concern updates are explored in Figure 2(e). Additionally, instead of integrating clear knowledge and outcomes from trained neural levels, the signal is decomposed into multiple proportions. A midpoint travels through a CBS place, and the number of superior throat sections, subsequent Cns layer, as shown in Figure 2(f). The back half flows below the surrounding matrix; after that, all pairs are merged and delivered through a different broadcast region. In Figure 2(g), the 3 technique categories are ready (7, 5, 3) inside this profile identity (13, 9, 5 with YOLO-V5).

B. OVERVIEW OF SIGNIFICANT CHANGES
The latest advancements in the YOLO-V5 framework have significantly enhanced its capabilities. One crucial modification is the inclusion of a preview retrieval portion. However, this cutting-edge feature incurs a significant learning rate penalty. The measurement system used in the blocks has also been improved, making it more impactful and useful in various functions. The locations of objects are now more accurately determined, which improves the identification's robustness. The diacritical marks level has also been transformed, resulting in increased predictive accuracy and reduced workload while maintaining efficiency. Additionally, the overall computer has been revamped, and the SPP domain has been altered, resulting in considerably higher resolution and making the YOLO-V5 more suitable for face identification. Furthermore, a pooling tier 6 trade part with a distance of 48 has been integrated to enable the processing of larger photos. The design includes multiple data amplification strategies for universal feature extraction, such as webcam movement and sketching, which have been found to be ineffective for biometric technology. The platform's efficiency has been boosted by extending the display, and the technology is exceptionally adept at detecting patterns in pitch-black environments. The CSP connectivity in this system is distinct from other varieties, making it a highly suitable computer vision feature for integrated or wearable devices despite its small size. Table 1 provides further details on the advancements and accomplishments in the feature selection design.

TABLE 1. Ablations reading output on the custom or real and fake dataset with new changing of this YOLO-V5 model.

C. WORKING OF YOLO-V5
YOLO [39] is a popular object detection algorithm, largely utilized in machine learning and computer vision applications. The complete framework of the YOLO algorithm is explained in Figure 3.

There are three major components of the YOLO algorithm, explained as:

1) Backbone: The initial stage of the YOLO algorithm is referred to as the backbone, which is primarily accountable for feature extraction from the input image. A CNN is usually employed as the backbone, pre-trained on a massive dataset. Its primary objective is to recognize high-level characteristics in the image, including edges, corners, and textures. These features are subsequently forwarded to the neck, the subsequent component in the process.
2) Neck: The neck constitutes the second phase of the YOLO algorithm, with the primary task of merging the features obtained from the backbone to develop an array of feature maps. It commonly comprises a set of convolutional layers that employ spatial filters to condense the size and intricacy of the feature maps. The ultimate goal of the neck is to create a condensed depiction of the input image that can be handled more efficiently by the subsequent phase, i.e., the head.
3) Head: The last component of the YOLO algorithm is known as the head, which is accountable for identifying the objects in the input image. The head comprises multiple convolutional layers that implement object detection algorithms to the feature maps generated by the neck. In this phase, the head anticipates a sequence of bounding boxes encircling each object present in the image, together with their respective class labels. Moreover, the head executes non-maximum


suppression (NMS) to get rid of overlapping bounding boxes, enhancing the accuracy of the detection process.

FIGURE 3. Complete framework of YOLO-V5.
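The non-maximum suppression step performed by the head can be illustrated with a minimal pure-Python sketch of the common greedy variant. It is not taken from the paper or from the YOLO-V5 code base; the corner box format `(x1, y1, x2, y2)`, the function names `iou` and `nms`, and the 0.5 IoU threshold are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) corners in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any
        # higher-scoring box that has already been kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one face plus one separate face.
faces = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
confidences = [0.90, 0.80, 0.75]
print(nms(faces, confidences))  # prints [0, 2]
```

In the usage example, the second box overlaps the first with an IoU of about 0.92, so it is suppressed, while the third box does not overlap anything and is kept.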

1) MATHEMATICS BEHIND YOLO ALGORITHM FIGURE 4. Dataset creation flowchart.


The optimization of the loss function is a crucial aspect of
the YOLO algorithm during training. This function is entirely
reliant on the sum-squared error, as expressed in [40]:
$$
\begin{aligned}
&\lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{\omega_i} - \sqrt{\hat{\omega}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (C_i - \hat{C}_i)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left[ (C_i - \hat{C}_i)^2 \right] \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2
\end{aligned}
$$

The initial part of the equation, $\lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} [(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2]$, calculates the loss between the predicted position of the bounding box and the actual position of the bounding box by using the $(x_{center}, y_{center})$ coordinates [39]. $\mathbb{1}_{i}^{obj}$ represents whether the object exists in a specific cell i of the grid made over the detection, and $\mathbb{1}_{ij}^{obj}$ represents that the jth bounding box predictor is in that specific cell i. In the second part, the loss function $\lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} [(\sqrt{\omega_i} - \sqrt{\hat{\omega}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2]$ calculates the error in the predicted width and height of the bounding box.

The confidence score, which determines the presence of an object within the bounding box, is computed and incorporated into the loss function to account for errors in object detection. $\mathbb{1}_{ij}^{obj}$ has the value '1' if the object is present in the bounding box and '0' otherwise. The last part of the function, $\sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2$, is responsible for the class probability loss. Whenever there is no object, YOLO does not care about the classification error [39]. The loss function is optimized during the training process and is responsible for minimizing the classification error of an object present in a particular grid cell and reducing the error in the coordinates of the bounding box.

IV. EXPERIMENTATION
Experimental work carried out for the proposed model is shown in Figure 8.

A. DATASET CREATION
For the implementation of the proposed model, a Private and RFFD-based database is utilized and accessed through Kaggle. The initial step is creating the samples prior to learning, with 1680 face images, as shown in Figure 4.

B. DATASET TRAINING
The subsequent phase entails employing the YOLO-V5 algorithm for training the model, and the weight file obtained from this process is utilized for testing, as depicted in Figure 5.
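Training in this phase minimizes the loss function given before Section IV. As a rough illustration only, the following Python sketch evaluates a sum-squared loss of that shape for one bounding box predictor per grid cell (B = 1). It follows the weighting of the original YOLO formulation [40], in which the object confidence term is unweighted and the no-object term has its own weight. The function name `yolo_style_loss`, the dictionary keys, the parameter `lambda_noobj`, and the default weights 5 and 0.5 are assumptions made for this sketch and are not taken from the authors' implementation.

```python
import math
from typing import Dict, List


def yolo_style_loss(pred: List[Dict], truth: List[Dict],
                    lambda_coord: float = 5.0,
                    lambda_noobj: float = 0.5) -> float:
    """Sum-squared YOLO-style loss over grid cells (one box per cell).

    Each cell is a dict with keys x, y, w, h, conf, probs; truth cells also
    carry has_obj. Widths and heights must be non-negative (square roots).
    """
    total = 0.0
    for p, t in zip(pred, truth):
        if t["has_obj"]:
            # Localization: centre coordinates.
            total += lambda_coord * ((p["x"] - t["x"]) ** 2 + (p["y"] - t["y"]) ** 2)
            # Localization: square roots of width and height.
            total += lambda_coord * (
                (math.sqrt(p["w"]) - math.sqrt(t["w"])) ** 2
                + (math.sqrt(p["h"]) - math.sqrt(t["h"])) ** 2
            )
            # Confidence for cells that contain an object.
            total += (p["conf"] - t["conf"]) ** 2
            # Class probabilities, only counted when an object is present.
            total += sum((pc - tc) ** 2 for pc, tc in zip(p["probs"], t["probs"]))
        else:
            # Confidence for empty cells, down-weighted.
            total += lambda_noobj * (p["conf"] - t["conf"]) ** 2
    return total


truth_cells = [
    {"has_obj": True, "x": 0.5, "y": 0.5, "w": 0.25, "h": 0.25, "conf": 1.0, "probs": [1.0, 0.0]},
    {"has_obj": False, "x": 0.0, "y": 0.0, "w": 0.0, "h": 0.0, "conf": 0.0, "probs": [0.0, 0.0]},
]
pred_cells = [
    {"x": 0.5, "y": 0.5, "w": 0.25, "h": 0.25, "conf": 1.0, "probs": [1.0, 0.0]},
    {"x": 0.1, "y": 0.1, "w": 0.1, "h": 0.1, "conf": 0.2, "probs": [0.5, 0.5]},
]
loss_value = yolo_style_loss(pred_cells, truth_cells)
print(loss_value)
```

In this example the first cell is predicted perfectly, so only the empty cell contributes, through its confidence error; its class probabilities are ignored, matching the statement that classification error is not penalized when no object is present.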

26052 VOLUME 12, 2024


M. Sohail et al.: Deep Learning Based Multi Pose Human Face Matching System

FIGURE 5. Dataset training flowchart.

FIGURE 7. The dataset after conversion from raw images to labeled faces with bounding boxes.

FIGURE 6. Image classes are described as real and fake faces.

A distinctive YOLO-V5 recognition model is generated in Google Colab from the samples and is then trained on them. Upon completion of the retraining process, the model can be evaluated using images. This process began with the dataset collection in 2019, as outlined in [41]. The collection comprised around 150,000 captioned features across a frequency range and profile trait filtration, along with 3,892 augmented raw snapshots used to establish fake and genuine face detection. Figure 7 shows the outcomes of this phase.

FIGURE 8. Validation Flowchart.

C. CLASSIFICATION
The third phase involves the identification and classification of multi-pose human faces using a pre-trained model. During the training of the YOLO-V5 framework, the process commences with 100-150 iterations, also known as epochs. The results are shown in the categories of real face and fake face, as shown in Figure 6, and are calculated with an image width of 416, a batch size of 16, and 100 epochs.

1) PERFORMANCE IMPROVEMENT
The last 50% of the dataset is utilized to achieve improvement by dividing the data into blocks that are equivalent in terms of accuracy. The pixel count is primarily 415 on the seventh line and is further categorized into averages, top speeds, and estimation squares. Ultimately, the results are presented using confidence scores, frame categories, and certainty numbers, as depicted in Figure 8.

D. PREDICTION
In the next phase, results are predicted based on the proposed model, as shown in Figure 10. The outcomes are represented by predicted bounding boxes, precision levels, and image classes. The recognition results of an entity type are then deposited in the summary dataset, and subsequently, the entries in the document registry are exhibited on the page. The report segment is designed to present the face-matching accuracy in numerical terms, along with the user's name, date, and time of participation. All these transformations are illustrated in Figure 9, which depicts the data's appearance at the GPU-based endpoint, following the completion of the entire process.

FIGURE 9. Output of testing images with their accuracy.

E. OUTCOME
The final stage of the proposed model is the outcome. At this phase, the final results are presented and a report is prepared.
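The prediction and outcome stages describe each result as a bounding box, a confidence level, and a class, later summarized in a report with the accuracy, user name, date, and time. A minimal Python sketch of such records is given here for illustration. The class names `Detection` and `ReportEntry`, the field names, the helper functions, and the 0.5 confidence cutoff are all hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class Detection:
    """One predicted face (hypothetical structure, for illustration only)."""
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
    confidence: float                       # score in [0, 1]
    label: str                              # e.g. "real" or "fake"


@dataclass
class ReportEntry:
    """One report row: matching accuracy with user name and timestamp."""
    user_name: str
    accuracy_percent: float
    timestamp: datetime


def keep_confident(detections: List[Detection], threshold: float = 0.5) -> List[Detection]:
    """Drop detections whose confidence is below the assumed threshold."""
    return [d for d in detections if d.confidence >= threshold]


def to_report(user_name: str, detection: Detection, when: datetime) -> ReportEntry:
    """Turn a detection into a report row, with confidence as a percentage."""
    return ReportEntry(user_name, round(detection.confidence * 100, 1), when)


detections = [
    Detection((10, 20, 110, 140), 0.97, "real"),
    Detection((200, 30, 260, 100), 0.31, "fake"),
]
kept = keep_confident(detections)
entry = to_report("example_user", kept[0], datetime(2024, 1, 1, 12, 0))
print(entry)
```

Keeping the detection and the report row as separate types mirrors the two stages in the text: prediction produces boxes, scores, and classes, while the outcome stage only needs the numeric accuracy and the participant details.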


V. EVALUATION METRICS OF PROPOSED MODEL
In Equation 1, C represents the total number of specific problem domains. If the participant's center does not fall within this element, the image will not be retrieved. Equation 1 is used to determine the exact position of the customer's estimation shot.

$$C_i^j = P_{i,j} \times \mathrm{IOU}_{pred}^{truth} \qquad (1)$$

The term $C_i^j$ represents the strength proportion of the j-th shape identification frame of the i-th template in terms of the bandwidth strokes. The parameters i and j correspond to the point of the i-th and j-th elements in the equation. $P_{i,j}$ is used to determine whether there is a body or target, where it equals 1 if there is a figure in the j-th panel and 0 otherwise. The output anticipate is a frequently used parameter that bridges the gap between the selected and fundamental constitutional images. The accuracy of the identified image increases with a higher threshold ratio. The loss value of the model is defined in Equation 2.

$$\text{Loss}_{Value} = \text{Loss}_{Box} + \text{Loss}_{Classification} + \text{Loss}_{Object} \qquad (2)$$

L-bx, l-clas, and l-obj serve as threshold markers for filtering out false positives, measuring classification accuracy, and identifying loss attributes respectively, throughout the preceding computation. The class label used for structuring the element for the object is demonstrated in the above formula. The pattern grid cell is utilized for the categorization of both the illustration and shaping of the basic, as shown in Equation 3.

$$b'_{bx} = 1 - \mathrm{IoU} + \frac{p^2(b, b_2)}{c^2} \, P_{i,j} \times \mathrm{IOU}_{pred}^{truth} \qquad (3)$$

The assessment of image quality is imperative to ensure the development of an accurate image detection model that can identify individuals in the model effectively. When comparing the F1 rate value and the rate, the former exhibits higher values, thus representing the key measure of the efficiency of our model, as illustrated in Equation 4 [42].

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \times P \times R}{P + R} \qquad (4)$$

The recall ratio of 100, the performance rate of 0.985, and F1 score of 0.998 were obtained from the equation presented. This methodology was effectively applied to the generated dataset as described in Equation 4. Furthermore, based on the fundamental principles of score analysis, it is evident that the proposed approach delivers enhanced outcomes.

VI. DISCUSSIONS
Ultimately, the primary objective of this study is to identify facial landmarks using a vast database containing various poses and subcategories of genuine and counterfeit appearances. This is commonly used to differentiate between a genuine image and one with numerous distinguishing features as well as bogus heads. The dataset employed in this research yields superior outcomes compared to prior datasets, as demonstrated in the comparison presented in Table 2. To establish the data source, it is necessary to gather pictures in JPG format, followed by cataloging or marking each image with a box. The marker then generates a file name as an output. Lastly, the file scripts are integrated with the information for image preparation [43].

TABLE 2. Comparison of the dataset with previous datasets.

Table 2 compares the dataset design and evaluation to ensure the validity of this research. The dataset comprises meticulously enhanced and altered facial landmarks that are composites of several individuals differentiated by glasses, facial features, neck, or natural appearance. Furthermore, it examines each pattern in every round. This approach is consistent with previous studies that employed both supervised and unsupervised techniques to construct the YOLO-V5 framework, which was utilized in this study, resulting in an accuracy of 99%. However, in contrast to previous investigations, the raw data used here is applied to recognize facial images from real-time streams.

VII. EXPERIMENTAL RESULTS
Experimental results performed for the proposed research work are discussed here.

A. GRAPHICAL INTERFACE OF DATASET AND METRICS USED IN YOLO-V5 ON GPU
The point at which the graph starts to slope downwards (in the multi-pose method) after 100 iterations at mAP@0.5, Precision, and Recall represents the threshold for reduced output, in accordance with the desired learning objectives. The results of this can be seen in Figure 10.

The graphical representation of the RFFD dataset training and implementation with the YOLO-V5 model for multi-pose human face recognition is displayed in Figure 10, depicting the bounding box accuracy and the result precision of the images. The second block demonstrates the image classification accuracy, while the third block highlights the precision accuracy of the dataset. The final block illustrates the recall accuracy of the model, and the results are based on mean average precision and time with various parameters' accuracy [43]. The outcomes are a consequence of the learning process, as demonstrated by the graph in Figure 10, where the maximum accuracy of 0.99 is achieved


at 100 epochs. Consequently, increasing the number of iterations above 100 yields greater rewards.

FIGURE 10. Graphical Representation of Dataset Training with YOLO-V5 Metrics.

The accuracy of face recognition is assessed separately on both GPU and CPU levels for specific reasons. This approach allows researchers and developers to compare the system's performance under different computational configurations. By doing so, they can understand the hardware's impact, evaluate scalability, and make informed decisions about resource allocation based on speed and accuracy. The distinction between GPU and CPU accuracy provides valuable insights into how each configuration affects the face recognition system, ensuring a well-informed deployment strategy.

B. FACE RECOGNITION ACCURACY RESULTS OF PROPOSED MODEL AT GPU
Previous studies on human cross profile detection models utilizing YOLO-V5 and multiple databases have shown significant performance. However, many experiments have struggled to accurately identify natural human appearances, and improving their accuracy has posed challenges. The proposed strategy is compared with prior research in Table 3. In 2021, an enrollment individual recognition system was developed to enhance education and employment processes. Individuals trained their YOLO-V5 framework for template matching using 1280 facial landmark images. The quality of the study was based on retrieval but rather implies reliability precision to the facial expression of sentient pixels occurring in various components termed (simple, mild, tough) structures [49]. Although it takes a long time to discover individuals, as it requires 5000 cycles, they also struggle to find many facial landmarks, as shown in Table 3 below.

Table 3 presents a comprehensive overview of various face detection studies utilizing YOLO-V5 as the base framework. Among the methods, the YOLO-V5 based Attendance system [43] exhibits robust accuracy at the basic frame level (97.5%), but its performance falters in more challenging scenarios (76%). Similarly, the YOLO-V5 based Live human detection method [50] achieves commendable accuracy at the basic level (95.6%) but faces notable difficulties in handling tough frames (68%). While the Structure of multi-patterns with YOLO-V5 [51] demonstrates promising results at the basic level (93.1%), it reveals limitations in more complex scenarios (80%). On the other hand, the YOLO5Face algorithm [38] showcases remarkable accuracy at both the basic (96%) and middle (95%) frame levels but encounters a decline in performance in challenging frames (86%). The YOLO-V5 based live streaming human face detection algorithm [52] also performs well at the basic level (91.9%), but its accuracy decreases in tough frames (74.9%). The Angular measurement method [53] exhibits commendable accuracy at the basic (94%) and middle (93%) levels, but it faces challenges in tough scenarios (80%). Although the Human multi-pose recognizer with YOLO-V5 [54] achieves high accuracy at all frame levels (95.5%, 94.5%, and 88% at basic, middle, and tough levels, respectively), it still shows potential limitations in more difficult scenarios.

The proposed method, a modified version of YOLO-V5, emerges as a groundbreaking solution to the shortcomings of existing methods in face detection. Based on the results presented in Table 3, the proposed model showcases better performance, surpassing all other available methods across all frame levels. With a perfect 100% accuracy at the basic frame level and an impressive 99.5% accuracy at the middle frame level, the proposed model demonstrates its ability to accurately detect faces in both simple and moderately complex scenarios. Most notably, the proposed method excels in tackling tough frame-level situations, achieving a remarkable 99% accuracy, showcasing its robustness and adaptability to challenging real-world conditions. The utilization of the Private and RFFD database, consisting of 1680 faces, further enhances the model's ability to generalize and detect faces with high precision. The proposed method's superior performance can be attributed to its effective modifications


TABLE 3. Accuracy comparison of YOLO-V5 algorithms with this proposed YOLO-V5 model (online GPU).

and optimizations to the YOLO-V5 architecture, making it a pioneering advancement in the field of face detection and a promising foundation for future research and real-world applications.

C. CPU BASED INTERFACE FOR MULTI-POSE FACE RECOGNITION
The Python interface is utilized to convey the facial recognition results to the CPU, where Indian celebrity images are procured from online search engines to assess the efficacy of our YOLO-V5 model interface. The interface's impact on retention is gauged by several factors, including accuracy, the number of participants, and the identification of users along with their respective names. Furthermore, the interface architecture is agnostic, meaning that it can be accessed without any language constraints [56]. The multi-pose recognition model for human faces is acquired using Python, and the recognition time for each individual takes approximately 2 seconds, as demonstrated in Table 4.

The research project involving Classifier edition is noteworthy for its use of rigorous tasks, including 460 visuals sourced from Web search and examination of a limited dataset of only 15 images to derive conclusions for mouth matching, resulting in a reliability of 87% based on 14 images. However, a previous report highlights the unreliability of sample sizes, with only 90 individuals identified and misleading results generated, resulting in a completely false value of 15%. Profile pairing is also a time-consuming process, from ideation to obtaining results using supplied information on individuals. Nevertheless, YOLO-V5 utilized in this experiment yields superior benefits when compared to other versions, such as YOLO-V3, which used 40 screenshots to evaluate their model with multiple categories. Despite the challenge of head identification with only 36 screenshots, a precise pairing of mouth and six different illustrations resulted in useful output. However, the accuracy of prediction is inferior compared to the conceptual framework presented in this study. The database comprises sixty photos of facial expressions with seven categories, and the RFFD sets are used to create YOLO-V4 patterns for each individual. While the personal YOLO-V4 shows advancements over past versions, it requires less data and takes longer than the YOLO-V5 employed in this work.

Their approach yields an 8% error rate within a lengthy timeframe of seventy seconds, requiring significant effort, and offering minimal return with a 2% false positive rate. Thus, in comparison to other models, our system proves to be more accurate. The experimentation was conducted through digital means, specifically photographs, on a runtime NVidia P100-PC-64GB with 52280 MB of storage, using a computer vision learning algorithm. The system executes image face detection, processing 40 photos containing multiple faces. The face-matching stage for the 40 photographs was completed within 5 seconds, including the pre-treatment phase. The average time for each portrait's face recognition is 0.02 seconds. Refer to Figure 11 for a visual representation of some of the matching results.

FIGURE 11. Multi-pose face recognition system results in images.

Based on the analysis of the first iteration of the YOLO model, the authors used 160 images from the internet, out of which only 15-20 were analyzed, and proved to be


successful in identifying profiles with an accuracy of 86% for 14 images [57]. However, six photos were not recognized, leading to false positives and limiting the combined outcome to 71 people, resulting in a false incidence rate of 15%. Profile identification also requires a significant amount of time, from contributing ideas to receiving conclusions using provided information about an individual. However, the YOLO-V5 model employed in this investigation offers superior outcomes compared to other editions. In contrast to the YOLO-V3 model, which used 41 images to evaluate their model across four categories, our model uses only 36 images for profile identification, resulting in more accurate classification performance. In addition, the YOLO-V4 model employs RFFD sources of data involving sixty photos of human facial expressions with seven categories, resulting in advancements over older designs but requiring less information and running slower than the YOLO-V5 model used in this report.

TABLE 4. Average recognition measurement of performance with other earlier utilised YOLO editions for multi-pose appearance identification (Desktop-based).

On the other hand, the YOLOv5 model used in 2022 mostly employed seventeen images to evaluate its prediction performance, requiring a lot of effort and producing a 9% false cost in seventy milliseconds. However, our system corresponds accurately in a shorter amount of time and yields a one-part incorrect ratio, making it more reliable than other models.

For face detection, the records used include 1680 images, consisting of 1420 input samples for the training phase, 220 images for validation accuracy, and 40 samples for testing data. The distribution pattern is 92.9% for the training set, 4.5% for priority basis, and 2.6% for system testing. The metrics evaluated during training and testing time give good results based on precision, recall, and mean average precision. The output of TensorBoard, including loading the model results and running it, is shown in Figure 12.

FIGURE 12. Result of mean average precision: the y-axis provides the IoU threshold numbers and the x-axis describes the efficiency of the system.

In the context of human detection, overall accuracy serves as an impartial estimate that is widely accepted for edge detection, such as characterization and indexing. Throughout this experiment, this metric is employed to select significant features, determine the head structure, and identify individuals, all while achieving an impressive 100% accuracy and a 0.97 reliability score for the exact textual location of selected features, such as a face pattern with a profile name and serial number. This is noteworthy as it enables the retrieval of critical information. The potential for superior composite classification performance is estimated to be 96%. In a previous assessment, the vectorized platform was evaluated using an IoU benchmark of 0.5. In this survey, the benchmark was raised to 0.6, resulting in a production of 0.95 accuracy and reliability in diagnosis, as depicted in Figure 11. The precision outcomes of the multi-pose human faces recognition model are represented in the form of a graph, as illustrated in Figure 13.

The parameter of interest is the number of true positives divided by the sum of true positives and false positives in the confusion matrix. As previously demonstrated, the value at a bounding box target of 0.5 is 100%, which is represented by the formula 1/(1+1)=0.5. This precision rate is excellent for both the RFFD dataset and COCO dataset. In the case of multi-pose face detection using YOLO-V5, the recall value is 100%, represented as a range from 0.5 to 1, as shown in Figure 14.

The outcome of the parameter is calculated as the ratio of true positive results to the sum of true positive and false


negative outcomes, within the recall vector. As previously mentioned, this ratio is 100% when a single threshold value is used, which can be represented by the equation 1/(1+0) = 1.

FIGURE 13. Result of precision: the y-axis displays the IoU points and the x-axis shows the accuracy of the system.

FIGURE 14. Outcome of recall: the y-axis defines the IoU series and the x-axis shows the correctness of the design.

While previous studies have reported some losses in the training dataset, this study proposes the use of three different loss functions during training, namely boxing loss, classification loss, and face recognition loss. The severity of the loss due to inadequate training is depicted in Figures 15, 16, and 17, ranging from 95 to 0 and extending beyond infinity. The peak of each graph represents the magnitude of the loss, while the spread of the error function represents the precision of the model. Figure 15 illustrates the boxing loss in the dataset.

FIGURE 15. Box loss throughout the training period of the samples: the y-axis shows the range of the samples and the x-axis displays the iterations, with the loss at a decreasing level.

During the training phase of the dataset, all images are considered as objects due to the classification process. The loss of images is minimal at 0.5%. Following the boxing training of the given dataset, the classification loss of the dataset is depicted in Figure 16.

FIGURE 16. Classification loss throughout the pre-processing step of the samples, plotted on the y-axis against the training period on the x-axis, in decreasing order.

Therefore, each set of photographs requires a unique identifier during the training phase to be recognized and identified later. While there may be some minor losses in a categorized training dataset, there are no losses at 225.


Figure 17 displays the face-matching loss of the dataset after the boxing and classification training.

FIGURE 17. Face-matching loss during the training time of the dataset: the y-axis shows the quantity of data and the x-axis shows the loss over the epochs, from lowest to highest.

At the outset of the photo training process, there was a 2% incidence of item loss. Upon declaration of 35 instances, the algorithm selectively identifies only those which had been previously activated for this purpose, resulting in occasional shortfalls.

VIII. CONCLUSION AND DISCUSSION
This work primarily focuses on utilizing the YOLO-V5 framework for image processing to classify individual interactions and body language from assertive postures, left and right sides, head orientation, and angular orientation of shapes via webcam. This approach proves to be efficient in addressing problems that require quickness and effectiveness. Our research findings indicate that a highly certified dataset, along with an active GPU and CPU, can efficiently handle such scenarios. Cross-facial expression identification in responsive sight is a crucial research subject closely related to our daily routine. Processor-based actual template matching techniques facilitate human-computer interactions, reducing instances and criminals. In-hospital treatment and consultation benefit from an interpretive approach, significantly reducing the need for awareness efforts and resource expenditure, thereby improving our quality of life. Advanced analytics has emerged as a comprehensive education requirement as the world's methodological capabilities continue to evolve. This approach involved gathering and analyzing blockchain materials to extract the suggested YOLO-V5 techniques and head improvements, resulting in a significant increase in the accuracy and efficiency of human identification. The directional percentage, moving up shapes, stylistic layering, x and y-directional actions, and angles and height were consistently picked for this method. The predictor was trained using the sample, resulting in precise individual identification. Mouth detection and face recognition in low-light environments have also improved, addressing difficulties regarding actual posture evaluation when fitted with mirrored surfaces or protective suits. We aim to conduct experiments from various angles, assuming the social world to be free of distortions. Our observations demonstrate the possibility of separating this group of actions. However, different databases, including the multifractal element of three-dimensional graphics, present harsh and complex drawings that require significant research and resources to understand multimedia webcams before handling large amounts of data. Path length assessment is the primary focus for further development, given its poorer efficiency and unclear shots when recognizing a user and the database from a distant location. Our objective is to assert super-intelligent categorization and determine the most suitable procedure to maintain the accuracy and reliability of human activity recognition. We chose the RFFD dataset and a personalized sample using the YOLO-V5 model for greater concentrations, despite its high response rate, being acceptable for our investigation.

REFERENCES
[1] Y. Kortli, M. Jridi, A. A. Falou, and M. Atri, "A novel face detection approach using local binary pattern histogram and support vector machine," in Proc. Int. Conf. Adv. Syst. Electric Technol. (IC_ASET), Mar. 2018, pp. 28–33.
[2] I. Adjabi, A. Ouahabi, A. Benzaoui, and A. Taleb-Ahmed, "Past, present, and future of face recognition: A review," Electronics, vol. 9, no. 8, p. 1188, Jul. 2020.
[3] X. Sun, P. Wu, and S. C. H. Hoi, "Face detection using deep learning: An improved faster RCNN approach," Neurocomputing, vol. 299, pp. 42–50, Jul. 2018.
[4] S. Setiowati, E. L. Franita, and I. Ardiyanto, "A review of optimization method in face recognition: Comparison deep learning and non-deep learning methods," in Proc. 9th Int. Conf. Inf. Technol. Electr. Eng. (ICITEE), Oct. 2017, pp. 1–6.
[5] W. Chen, H. Huang, S. Peng, C. Zhou, and C. Zhang, "YOLO-face: A real-time face detector," Vis. Comput., vol. 37, no. 4, pp. 805–813, 2021.
[6] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2014, pp. 580–587.
[7] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., 2001, pp. 1–12.
[8] S. Ioffe and D. A. Forsyth, "Probabilistic methods for finding people," Int. J. Comput. Vis., vol. 43, no. 1, pp. 45–68, 2001.
[9] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, Sep. 2015.
[10] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Dec. 2015, pp. 1440–1448.
[11] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 28, 2015, pp. 1–12.
[12] J. Dai, "R-FCN: Object detection via region-based fully convolutional networks," in Proc. Adv. Neural Inf. Process. Syst., vol. 29, 2016, pp. 1–9.
[13] M. Bizjak, P. Peer, and Ž. Emeršic, "Mask R-CNN for ear detection," in Proc. 42nd Int. Conv. Inf. Commun. Technol., Electron. Microelectron. (MIPRO), May 2019, pp. 1624–1628.
[14] D. Li, Z. Li, R. Luo, J. Deng, and S. Sun, "Multi-pose facial expression recognition based on generative adversarial network," IEEE Access, vol. 7, pp. 143980–143989, 2019.


[15] F. Zhang, J. Gao, H. Zhou, J. Zhang, K. Zou, and T. Yuan, "Three-dimensional pose detection method based on keypoints detection network for tomato bunch," Comput. Electron. Agricult., vol. 195, Apr. 2022, Art. no. 106824.
[16] M. D. Putro, Wahyono, and K.-H. Jo, "Multiple layered deep learning based real-time face detection," in Proc. 5th Int. Conf. Sci. Technol. (ICST), vol. 1, Jul. 2019, pp. 1–5.
[17] Z. Ren and X. Xue, "Research on multi pose facial feature recognition based on deep learning," in Proc. 5th Int. Conf. Mech., Control Comput. Eng. (ICMCCE), Dec. 2020, pp. 1427–1433.
[18] S. Ruan, C. Tang, X. Zhou, Z. Jin, S. Chen, H. Wen, H. Liu, and D. Tang, "Multi-pose face recognition based on deep learning in unconstrained scene," Appl. Sci., vol. 10, no. 13, p. 4669, Jul. 2020.
[19] S. B. Ahmed, S. F. Ali, J. Ahmad, M. Adnan, and M. M. Fraz, "On the frontiers of pose invariant face recognition: A review," Artif. Intell. Rev., vol. 53, no. 4, pp. 2571–2634, Apr. 2020.
[20] M. Ben Gamra and M. A. Akhloufi, "A review of deep learning techniques for 2D and 3D human pose estimation," Image Vis. Comput., vol. 114, Oct. 2021, Art. no. 104282.
[21] M. Toshpulatov, W. Lee, S. Lee, and A. H. Roudsari, "Human pose, hand and mesh estimation using deep learning: A survey," J. Supercomput., vol. 78, no. 6, pp. 7616–7654, Apr. 2022.
[22] P. Gao, K. Lu, J. Xue, L. Shao, and J. Lyu, "A coarse-to-fine facial landmark detection method based on self-attention mechanism," IEEE Trans. Multimedia, vol. 23, pp. 926–938, 2021.
[23] Y. Zhang, K. Fu, J. Wang, and P. Cheng, "Learning from discrete Gaussian label distribution and spatial channel-aware residual attention for head pose estimation," Neurocomputing, vol. 407, pp. 259–269, Sep. 2020.
[24] A. Kratimenos, "3D hands, face and body extraction for sign language recognition," in Proc. Sign Lang. Recognit., Transl. Prod. (SLRTP) Workshop-Extended Abstr., vol. 4, 2020, pp. 1–4.
[25] Y. S. Ismael, "Deep learning based real-time face recognition system," NeuroQuantology, vol. 20, no. 6, pp. 7355–7366, 2022.
[26] Y. Feng, H. Feng, M. J. Black, and T. Bolkart, "Learning an animatable detailed 3D face model from in-the-wild images," ACM Trans. Graph., vol. 40, no. 4, pp. 1–13, Aug. 2021.
[27] L. Zhu, S. Wang, Z. Zhao, X. Xu, and Q. Liu, "CED-net: Contextual encoder–decoder network for 3D face reconstruction," Multimedia Syst., vol. 28, no. 5, pp. 1713–1722, Oct. 2022.
[28] M. D. Putro and K.-H. Jo, "Real-time face tracking for human–robot interaction," in Proc. Int. Conf. Inf. Commun. Technol. Robot. (ICT-ROBOT), Sep. 2018, pp. 1–4.
[29] X. Yang, X. Jia, D. Gong, D.-M. Yan, Z. Li, and W. Liu, "LARNet: Lie algebra residual network for face recognition," in Proc. Int. Conf. Mach. Learn., 2021, pp. 11738–11750.
[30] A. Bansal, R. Ranjan, C. D. Castillo, and R. Chellappa, "Deep CNN face recognition: Looking at the past and the future," in Deep Learning-Based Face Analytics. Springer, 2021, pp. 1–20. [Online]. Available: https://doi.org/10.1007/978-3-030-74697-1_1
[39] D. Thuan, "Evolution of YOLO algorithm and YOLOv5: The state-of-the-art object detection algorithm," Bachelor's thesis, DIN16SP, Inf. Technol., Oulu Univ. Appl. Sci., Finland, 2021.
[40] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 779–788.
[41] A. A. Una, E. Haque, N. S. Ritu, Z. T. Haque, and R. S. Opal, "Classification technique for face-spoof detection in artificial neural networks using concepts of machine learning," Bachelor's thesis, Dept. Comput. Sci. Eng., Brac Univ., Bangladesh, 2021.
[42] V. Shinde, N. Jagtap, and H. Shukla, "Deep learning based face-mask and shield detection," in Proc. Int. Conf. Comput. Intell. Comput. Appl. (ICCICA), Nov. 2021, pp. 1–4.
[43] Mardiana, M. A. Muhammad, and Y. Mulyani, "Library attendance system using YOLOv5 faces recognition," in Proc. Int. Conf. Converging Technol. Electr. Inf. Eng. (ICCTEIE), Oct. 2021, pp. 68–72.
[44] H. Wu, D. Ma, Z. Mao, and J. Sun, "SSRFD: Single shot real-time face detector," Appl. Intell., vol. 52, no. 10, pp. 11916–11927, 2022.
[45] F. A. M. Ali and M. S. Al-Tamimi, "Face mask detection methods and techniques: A review," Int. J. Nonlinear Anal. Appl., vol. 13, no. 1, pp. 3811–3823, 2022.
[46] Q. Xu, Z. Zhu, H. Ge, Z. Zhang, and X. Zang, "Effective face detector based on YOLOv5 and superresolution reconstruction," Comput. Math. Methods Med., vol. 2021, pp. 1–9, Nov. 2021.
[47] S. Dooley, G. Z. Wei, T. Goldstein, and J. P. Dickerson, "Are commercial face detection models as biased as academic models?" 2022, arXiv:2201.10047.
[48] A. Douklias, L. Karagiannidis, F. Misichroni, and A. Amditis, "Design and implementation of a UAV-based airborne computing platform for computer vision and machine learning applications," Sensors, vol. 22, no. 5, p. 2049, Mar. 2022.
[49] A. Ali-Gombe, E. Elyan, C. F. Moreno-García, and J. Zwiegelaar, "Face detection with YOLO on edge," in Proc. 22nd Eng. Appl. Neural Netw. Conf. (EANN). Cham, Switzerland: Springer, 2021, pp. 284–292.
[50] G. Castellano, B. De Carolis, N. Marvulli, M. Sciancalepore, and G. Vessio, "Real-time age estimation from facial images using YOLO and efficientnet," in Proc. Comput. Anal. Images Patterns, 19th Int. Conf. (CAIP). Cham, Switzerland: Springer, Sep. 2021, pp. 275–284.
[51] A. Wang, X. Cao, L. Lu, X. Zhou, and X. Sun, "Design of efficient human head statistics system in the large-angle overlooking scene," Electronics, vol. 10, no. 15, p. 1851, Jul. 2021.
[52] A. Ghimire, N. Werghi, S. Javed, and J. Dias, "Real-time face recognition system," 2022, arXiv:2204.08978.
[53] I. H. Al Amin and F. H. Arby, "Implementation of YOLO-v5 for a real time social distancing detection," J. Appl. Informat. Comput., vol. 6, no. 1, pp. 1–6, Jul. 2022.
[54] N. Kim, J.-H. Kim, and C. S. Won, "FAFD: Fast and accurate face detector," Electronics, vol. 11, no. 6, p. 875, Mar. 2022.
[55] R. Chatterjee, A. Chatterjee, S. H. Islam, and M. K. Khan, "An object detection-based few-shot learning approach for multimedia quality assessment," Multimedia Syst., vol. 29, no. 5, pp. 1–14, Oct. 2023.
[31] X. Ling, J. Liang, D. Wang, and J. Yang, ‘‘A facial expression recognition [56] N. Darapaneni, A. K. Evoori, V. B. Vemuri, T. Arichandrapandian,
system for smart learning based on YOLO and vision transformer,’’ in G. Karthikeyan, A. R. Paduri, D. Babu, and J. Madhavan, ‘‘Automatic face
Proc. 7th Int. Conf. Comput. Artif. Intell., 2021, pp. 178–182. detection and recognition for attendance maintenance,’’ in Proc. IEEE 15th
[32] E. Michos, ‘‘Development of an online platform for real-time facial Int. Conf. Ind. Inf. Syst. (ICIIS), 2020, pp. 236–241.
recognition,’’ Postgraduate thesis, Masters HCI, Joint Program ECE CEID, [57] H. Deshpande, A. Singh, and H. Herunde, ‘‘Comparative analysis on
Univ. Patras, Greece, 2021. YOLO object detection with OpenCV,’’ Int. J. Res. Ind. Eng., vol. 9, no. 1,
[33] N. Wang, Z. Wang, Z. He, B. Huang, L. Zhou, and Z. Han, ‘‘A tilt-angle pp. 46–64, 2020.
face dataset and its validation,’’ in Proc. IEEE Int. Conf. Image Process. [58] J. Yu and W. Zhang, ‘‘Face mask wearing detection algorithm based on
(ICIP), Sep. 2021, pp. 894–898. improved YOLO-v4,’’ Sensors, vol. 21, no. 9, p. 3263, May 2021.
[34] Z. Zhang, S. Xia, Y. Cai, C. Yang, and S. Zeng, ‘‘A soft-YoloV4 for high- [59] Y. Liu, R. Liu, S. Wang, D. Yan, B. Peng, and T. Zhang, ‘‘Video face
performance head detection and counting,’’ Mathematics, vol. 9, no. 23, detection based on improved SSD model and target tracking algorithm,’’
p. 3096, Nov. 2021. J. Web Eng., vol. 21, no. 2, pp. 545–568, 2022.
[35] K. Bhangale, P. Ingle, R. Kanase, and D. Desale, ‘‘Multi-view multi-
pose robust face recognition based on VGGNet,’’ in Proc. 2nd Int. Conf.
Image Process. Capsule Netw. (ICIPCN). Cham, Switzerland: Springer, MUHAMMAD SOHAIL received the bachelor’s
Feb. 2022, pp. 414–421. degree in computer engineering from the Univer-
[36] Q. Li, X. Dong, W. Wang, and C. Shan, ‘‘CAS-AIR-3D face: A low-quality, sity of Engineering and Technology (UET), Taxila,
multi-modal and multi-pose 3D face database,’’ in Proc. IEEE Int. Joint and the master’s degree in computer science
Conf. Biometrics (IJCB), Aug. 2021, pp. 1–8. from the Riphah College of Computing, Riphah
[37] T. Xie, Z. Chen, M. Cao, P. Hu, Y. Zeng, and Z. Pan, ‘‘Face detection in VR International University, Faisalabad, in 2023. His
games,’’ in Proc. 3rd Int. Conf. Control Comput. Vis., Aug. 2020, pp. 7–10. research interests include smart cities, smart trans-
[38] D. Qi, W. Tan, Q. Yao, and J. Liu, ‘‘YOLO5Face: Why reinventing a face portation, advanced driving assistance systems, the
detector,’’ in Proc. Comput. Vis.–ECCV Workshops, Tel Aviv, Israel. Cham, Internet of Vehicles, smart education, and machine
Switzerland: Springer, 2023, pp. 228–244. learning.


IJAZ ALI SHOUKAT received the Ph.D. degree in computer science from Universiti Teknologi Malaysia (UTM). His Ph.D. thesis is on information security and applied cryptography. Currently, he is an Associate Professor with the Computing Department, Riphah International University, Faisalabad Campus. He has extensive academic, industry, and research experience. His academic achievement was recognized with a talent reward through the Outstanding Talent Support Scheme of the Punjab Information Technology Board, Government of Punjab, Pakistan.

ABD ULLAH KHAN (Member, IEEE) received the Ph.D. degree in computer science from the Ghulam Ishaq Khan Institute of Engineering Science and Technology, Pakistan, in 2021. He is currently an Assistant Professor with the National University of Science and Technology, Pakistan. He is also a Founding Member of the Highly Secure and Spectrum-Efficient Network (HiSEN) Research Group. He has over 20 journal publications in diverse reputed venues, such as IEEE Internet of Things Journal, IEEE Transactions on Network Science and Engineering, IEEE Transactions on Network and Service Management, IEEE Wireless Communications Letters, Future Generation Computer Systems (Elsevier), and Journal of Network and Computer Applications (Elsevier). His research interests include resource allocation and management in wireless networks and network security. He is also an active reviewer for IEEE Network, IEEE Internet of Things Journal, IEEE Systems Journal, IEEE Access, IEEE Transactions on Network Science and Engineering, IEEE Transactions on Vehicular Technology, IEEE Transactions on Intelligent Transportation Systems, and Computer Communications (Elsevier).

HARAM FATIMA received the B.S. degree in computer science from Government College University Faisalabad, in 2020, and the M.S. degree in computer science from Riphah International University, Faisalabad Campus, Pakistan. Her research interests include machine learning and multi-pose face recognition.

MOHSIN RAZA JAFRI received the Ph.D. degree in computer science from Università Ca’ Foscari Venezia, in 2019. He has been an Assistant Professor of computer science with the National University of Sciences and Technology (NUST), Pakistan, since June 2019. His research interests include communication system design, wireless sensor networks, and underwater sensor networks. He has contributed to developing network simulators and energy-efficient algorithms for wireless communication. Moreover, he has also developed stochastic models for the performance analysis of wireless sensor networks.

MUHAMMAD AZFAR YAQUB (Member, IEEE) received the bachelor’s degree from COMSATS University Islamabad (CUI), Pakistan, in 2007, the master’s degree from Lancaster University, U.K., in 2010, and the Ph.D. degree from the School of Computer Science and Engineering (SCSE), Kyungpook National University (KNU), Republic of Korea, in 2019. He is currently an RTDA Research Assistant with the Faculty of Engineering, Free University of Bozen-Bolzano, Italy. Previously, he was a Lecturer with the Department of Electrical and Computer Engineering, CUI, from 2008 to 2021, where he was an Assistant Professor, from 2021 to 2023. His research interests include future internet architectures, information-centric networks, CCN/NDN, wireless ad-hoc networks, sensor networks, connected vehicles, and video streaming. He is an ACM Member and serves as a TPC/reviewer for several conferences and journals.

ANTONIO LIOTTA (Senior Member, IEEE) is currently a Full Professor with the Faculty of Computer Science, Free University of Bozen-Bolzano, Italy, where he teaches data science and machine learning. Previously, he was the Founding Director of the Data Science Research Centre, University of Derby, U.K. He is credited with over 350 publications involving, overall, more than 150 coauthors. His research interests include artificial intelligence theories and applications, particularly artificial vision, e-health, intelligent networks, and intelligent systems. He is the Editor-in-Chief of the Internet of Things (Springer) book series (springer.com/series/11636) and an associate editor of several prestigious journals.

Open Access funding provided by ‘Libera Università di Bolzano’ within the CRUI CARE Agreement
