Artificial Intelligence-Enabled Deep Learning Model For Multimodal Biometric Fusion
https://doi.org/10.1007/s11042-024-18509-0
Abstract
The goal of information security is to prevent unauthorized access to data. There are several conventional ways to confirm user identity, such as passwords, user names, and keys. These conventional methods are limited: they can be stolen, lost, copied, or cracked. Multimodal biometric identification systems have attracted attention because they are more secure and achieve higher recognition efficiency than unimodal biometric systems. Single-modal biometric recognition systems perform poorly in real-world public security operations because of poor biometric data quality, and current multimodal fusion methods suffer from drawbacks such as low generalization and single-level fusion. This study presents a novel multimodal biometric fusion model that enhances accuracy and generalization through artificial intelligence. Various fusion methods, encompassing pixel-level, feature-level, and score-level fusion, are integrated through deep neural networks. At the pixel level, we employ spatial, channel, and intensity fusion strategies to optimize the fusion process. At the feature level, modality-specific branches and jointly optimized representation layers establish robust dependencies between modalities through backpropagation. Finally, at the score level, fusion techniques such as Rank-1 and modality evaluation are used to blend matching scores. To validate the model's effectiveness, we construct a virtual homogeneous multimodal dataset using simulated operational data. Experimental results show significant improvements compared to single-modal algorithms, with a 2.2 percentage point increase in accuracy achieved through multimodal feature fusion. The score fusion method surpasses single-modal algorithms by 3.5 percentage points, reaching a retrieval accuracy of 99.6%.
1 Introduction
In recent years, the significance of biological characteristics in solving criminal cases has
been growing. Biometrics, including fingerprints, faces, DNA, and voiceprints, have been
established as ministerial-level databases and have become crucial tools in crime inves-
tigation due to their uniqueness and specificity [1]. The idea behind biological theories
of crime, which date back to the nineteenth century, is that a person’s biological makeup
determines whether or not they are likely to commit crimes. Genetic, hormonal, or neuro-
logical predispositions can be acquired (by accident or disease) or inherited (existing from
birth), making some people more likely to commit crimes. Since crime is determined by
society, no one can be a born criminal. A connection has to be drawn from a few more
universal characteristics, such as risk-taking, impulsivity, and violence. Darwin’s theories
of natural selection and evolution had an impact on early biological explanations of crime.
According to theories like the degeneration theory, persons who used drugs like alcohol
and opium had morally deteriorating characteristics that may be passed on to future gener-
ations. In the past, eugenics initiatives like the Third Reich’s have been justified by biologi-
cal theories of criminality, particularly those advanced by Lombroso and B. A. Morel. The
development of neurology in the last decades of the twentieth century made genetic inves-
tigations into criminal behavior possible. These studies look at the interactions between
certain chemicals in the brain called neurotransmitters and various environmental behav-
iors that lead to criminal behavior. Twin adoption studies are one popular approach for this.
Biometrics identifies a person from measurements of their biological or behavioral traits. Since most of these traits are inherent, they cannot be learned, forgotten, or taken away. Everybody has a different fingerprint, and fingerprints can be measured in several ways: image-based measurement looks for similarities between an individual's fingerprint picture and fingertip images in the database, while minutiae-based measurement matches ridge features. Fingerprints are used for both identification and verification and offer a high security level. Facial characteristics such as the separation between the nose, mouth, and ears, the length of the face, and the color of the skin are used to confirm identity. The distinctive patterns observed in the eye can be used for both recognition and identification, although retinal analysis equipment is costly, which makes it less widely used. For voice, pitch, tone, and modulation are measured, among other things; because people's voices can be similar, voice offers medium-level security and is mostly used for verification. DNA is distinct and enduring throughout life, so its security is strong and it is useful for both verification and identification. However, as the
size of the database expands, the increasing security requirements cannot be met by sin-
gle-modal biometric recognition algorithms. Furthermore, the interference caused by low-
quality outlier samples is significant due to the dependence of single-modality identifica-
tion on data quality. To address these challenges, multimodal fusion recognition algorithms
have emerged as a solution to improve recognition performance and security [2].
Multimodal big data exhibit great volume, diversity, velocity, and veracity, much like conventional big data, but their diversity stands out more than their other qualities. Multimodal big data are made up of many modalities, each with an independent distribution and each describing a portion of the same objects of interest, and intricate relationships exist between modalities. Many multimodal applications can perform even better if the fusion representations concealed in the cross-modality and inter-modality relationships are fully modeled. Multimodal authentication systems extract features from several modalities, including fingerprints, retinas, and finger veins, and the RSA algorithm is employed for both key creation and encryption. The optimized fuzzy genetic algorithm (OFGA) combines the fuzzy technique with a genetic algorithm that incorporates crossover and mutation to improve convergence. In such an optimization procedure, the genetic algorithm first generates the starting population at random, and every randomly selected member of the population is fully vector-based and represents a candidate technique.
Single-modal biometric identification algorithms are unable to keep up with the grow-
ing security needs as the database gets larger. An unprecedented rate of data generation
and collection is occurring due to the widespread deployment of heterogeneous networks.
These data, which are sometimes called big data, have the following qualities: high veloc-
ity, high diversity, high volume, and high veracity. A key technique in multimodal data
mining is multimodal data fusion, which combines data from many sources, distributions,
and kinds into a global space where intermodality and cross-modality may be uniformly
represented. Authentication systems may be made more accurate and secure by combining
fuzzy approaches with optimization methodologies. ELBP, a contemporary texture-based dimensionality reduction approach, uses a local binary pattern feature vector to describe entropy details in one dimension. The combination of fuzzy systems with rank-optimization algorithms has been shown to perform better at multimodal biometric recognition and verification than other well-known approaches.
To recreate the crime and the occurrence, scientists now employ advanced technology
such as 3-D computers, high-performance liquid chromatography, mass spectrometry, and
DNA testing. Trace elements and organic compounds up to a few hundred molecules apart
can be distinguished by modern forensic science. Forensic science can establish the occurrence of a crime, the identity of the offender, or a link to a crime through the examination of physical evidence, the administration of tests, the analysis of the resulting information, and clear, succinct reporting of the relationships found, with the forensic scientist's testimony resting on objective facts derived from scientific understanding and available to both the prosecution and the defense. Forensic science has grown to be an essential component of many criminal trials and sentencing.
Image fusion approaches at the pixel level directly combine the data from the input images for further computational processing. Feature-level image fusion approaches instead extract pertinent characteristics such as edges, textures, and pixel intensities, which are then combined into richer fused features. Multibiometric
systems frequently use fusion at the score level. To improve system performance overall,
this approach calculates the recognition results for each unimodal system independently
and then fuses the recognition score results into a single multimodal system. First, the
score vectors for each characteristic in the classification procedure are computed indepen-
dently and normalized to the lowest EER value. Ultimately, a decision-making process is
achieved by applying the targeted threshold that fulfills the highest possible fused system
performance.
The fusion of different biometric modalities is categorized into pixel, feature, and
score fusion at different layers in the existing literature [3–12]. Raw data fusion at
the pixel layer focuses on eliminating image gaps and ensuring accurate image regis-
tration, which is achieved through sensor fusion techniques [5]. Feature layer fusion concentrates on features extracted from different biological characteristics, and its effectiveness in improving recognition accuracy has been confirmed [6–8]. Score layer fusion research emphasizes unique comparison techniques, and the final matching score
is the sum of individual matching scores [9, 10]. Score fusion methods described in
the literature combine static and dynamic gait parameters to enhance recognition accu-
racy [11]. The goal of pixel-level image fusion is to combine many input images with complementary details of the same scene into a composite image. The input images, often referred to as source images, are taken with various parameter settings, on different imaging equipment, or from a single kind of sensor. Point-cloud data, for example, have a higher information density than derived lidar DSMs because they provide a wider range of raw characteristics from which structural information can be extracted. The fused composite image should be more informative for human or machine perception than any individual input. This benefit makes image fusion techniques very important in many applications that use two or more images of the same scene.
The central limit theorem (CLT), a mathematical concept, makes the sampling distribution of the mean an important notion in mathematics and statistics. The Kalman filter, one type of sensor fusion technique commonly employed in navigation and control technologies, estimates unknown values by utilizing data inputs from many sources. Compared to individual predictions made with a single measurement approach, these filters can estimate unknown values with far more accuracy. Convolutional neural networks (CNNs) are utilized in sensor fusion systems to detect transition movements in medical applications. A Bayesian network approach to sensor fusion is a graphical model that uses a directed acyclic graph to represent a set of variables and their conditional relationships.
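Since the Kalman filter is mentioned only in passing, the following one-dimensional sketch is a generic textbook example rather than anything specific to this paper: it fuses a stream of noisy measurements of a nearly constant quantity into a lower-variance estimate.

```python
import numpy as np

def kalman_1d(measurements, process_var=1e-4, meas_var=0.25):
    """Minimal 1-D Kalman filter: fuse noisy measurements of a (nearly)
    constant quantity into a lower-variance running estimate."""
    x, p = 0.0, 1.0                      # initial state estimate and its variance
    estimates = []
    for z in measurements:
        p += process_var                 # predict step: uncertainty grows
        k = p / (p + meas_var)           # Kalman gain
        x += k * (z - x)                 # update with the new measurement
        p *= (1 - k)
        estimates.append(x)
    return np.array(estimates)
```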
Pixel-level image fusion merges many input images into a fused image that, compared to any of the input images, is in principle more informative for human or machine perception. Feature fusion combines handcrafted features with the multi-layer features of a single CNN model taken at different stages. Score-level fusion, on the other hand, connects each such stage to its own classifier, and the automatic fusion of these classifiers' scores leads to the recognition of, for example, the scene geometry type.
However, several limitations exist in the current multimodal fusion algorithms.
Firstly, there is a lack of sufficient generalization, making it challenging to migrate
fusion levels and modes. Most fusion algorithms are designed for a single fusion level,
such as pixel layer fusion or feature layer fusion. Secondly, there is a scarcity of data
to properly evaluate these methods. The limited availability of biometric information
hinders comprehensive testing, often resulting in miniature virtual homologous datasets.
These small-scale datasets make it difficult to explain the remarkable performance of
fusion methods. Lastly, the practical utility of these algorithms remains unclear as they
fail to address the identification of low-quality data, thereby falling short of meeting the
practical needs of public safety agencies.
Motivated by these research gaps, a multimodal algorithm based on a deep learning biometric fusion model is proposed in this paper. The three biometric data types of face, fingerprint, and iris are the main research objects. The limitations of existing algo-
rithms are addressed, and the actual application requirements of public security agen-
cies are aimed to be met. Various levels and facets of multimodal biometric retrieval
techniques, including pixel, feature, and score-level fusion, are investigated. Flexibility,
scalability, and generalization in fusion selection on multiple levels are offered by the
proposed model. Furthermore, the negative impact of single-modal outliers on recogni-
tion performance is effectively mitigated, and the influence of low-quality or inaccurate
data is overcome.
To evaluate the performance of the proposed model, a simulated multimodal dataset
comprising over a thousand individuals with similar characteristics is created. This dataset
addresses the limitations of scale and recognition difficulty found in existing research and
simulates real-case scenarios with low-quality data processing. The main contributions of
work for this paper are as follows:
1. Proposing a versatile multi-modal biometric fusion model for enhancing public security.
This model addresses low-quality data issues and corrects single-modal recognition
errors, making it suitable for real-world security applications.
2. Introducing a flexible fusion model with selectivity and generalization, independent
of specific modalities and neural network performance. Users can tailor fusion layers,
fusion modes, backbone networks, and fusion levels to meet specific needs, enhancing
adaptability.
3. Creating a substantial virtual multimodal dataset with 2712 individuals and 40,482
images, based on face, fingerprint, and iris data. This dataset replicates real-world low-
quality scenarios and surpasses existing research in scale and recognition complexity.
2 Related work
Feature layer fusion involves extracting separate features from each modality and fusing
them to create a composite feature for matching. Fusion features tend to be more expressive
and contain richer fusion information compared to single features [14, 19–24]. Neverthe-
less, the curse of dimensionality and compatibility issues may arise due to the increase
in feature dimensions and features from different sources [14, 20, 21]. The findings of
Ghaderi et al. [25] demonstrated the effectiveness of the hybrid GFFN-GA that was used,
indicating its potential for use in defining soil horizons for future event prediction. Conse-
quently, the likelihood of a good outcome can be raised by choosing an appropriate model
based on the information in a complementary and cohesive manner. A limitation metric
for evaluating the effectiveness of artificial intelligence (AI), and deep learning ensemble-
based models, in particular, is uncertainty quantification (UQ). Nevertheless, UQ’s capac-
ity to use existing AI-based techniques is constrained not just by the amount of processing
power available, but also by the need for several performances to track model instabilities,
modifications to topology, and optimization procedures [26].
One typical feature selection method used to find the key characteristics in a dataset is
sensitivity analysis. Sensitivity analysis involves perturbing each input feature one at a time
and analyzing the machine learning model’s response to ascertain the rank of each feature
[27]. It was shown that the ranking parameters with SA can be more dependable as they
cover a wider range of uncertainty. Sensitivity index-based model evaluation revealed that
the suspended load’s contribution to the anticipated bed load is negligible [28].
Artificial intelligence (AI) is the simulation of human intellect in machines through programming. AI aims to replicate advanced functions of the human brain, allowing computers to mimic people and complete tasks by observing, imitating, and evolving, accomplished through algorithms, improved data, and pattern recognition. Within artificial intelligence, deep learning has become a cutting-edge discipline that is changing how computers learn and handle challenging jobs. Neural networks, the computational models that underpin deep learning, are inspired by the neural connections seen in the human brain.
Score layer fusion focuses on the matching scores provided by each uni-modal method. Score layer fusion has gained attention because it achieves strong effects in multimodal fusion and effectively balances retention of the original information against data-processing difficulty. Currently, score fusion techniques can be categorized into three types: (1)
Transformation-based score fusion methods normalize the scores of various modalities to
the same interval and combine them into the final fusion score [29, 30]; (2) Classifier-
based score fusion methods treat matching scores from different classifiers as a feature
vector and concatenate them to form a new score space, where each potential matching
score is viewed as an element of the feature vector; and (3) The score fusion method utiliz-
ing a rank-level fusion approach uses a single-modal algorithm without creating low-level
semantic or pixel characteristics or interacting at a low level with multi-modal data. As the
fusion progresses from the pixel layer to the feature layer and score layer, the amount of
retained original data gradually decreases, along with the complexity of data processing.
Both pixel features and semantic features have their advantages and drawbacks based on
their application scenarios.
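As a generic illustration of the first, transformation-based category, the sketch below min-max normalizes each modality's scores to a common interval and combines them with a weighted sum rule; the specific normalization and combination rules used by the cited methods [29, 30] are not spelled out here, so these choices are assumptions.

```python
import numpy as np

def min_max_normalize(scores):
    """Map one modality's matching scores onto [0, 1]."""
    s = np.asarray(scores, dtype=np.float64)
    return (s - s.min()) / (s.max() - s.min() + 1e-12)

def transformation_based_fusion(score_lists, weights=None):
    """Normalize each modality's scores to a common interval, then combine
    them with a (possibly weighted) sum rule into the final fusion scores."""
    normalized = np.stack([min_max_normalize(s) for s in score_lists])
    if weights is None:
        weights = np.ones(len(score_lists)) / len(score_lists)
    return np.asarray(weights) @ normalized   # one fused score per candidate
```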
Fusion occurs at the sensor (or raw data) and feature levels in pre-classification systems,
and at the match score, rank, and decision levels in post-classification schemes. Because
the match scores, rankings, and judgments can be easily accessed and processed, post-clas-
sification fusion procedures are rather common in the literature. Rank-level fusion is com-
monly applied in multi-biometric identification systems, where each classifier assigns a
rank to each enrolled identity. However, it is not confined to these systems. A better match
is often indicated by a higher rank. Rank-level fusion approaches aim to create a consensus
rank for each identity by combining the ranking output from each of the separate biometric
subsystems.
Existing research suffers from limited integration across fusion levels, hindering the
transferability and applicability of multi-modal fusion approaches. Moreover, current stud-
ies focus on specific tasks and datasets, lacking verification on larger multi-modal datasets.
Pixel layer fusion faces challenges of computational intensity and reduced interpretability,
while feature layer fusion requires further exploration and addressing compatibility issues.
Future research should aim to develop practical and generalizable multi-modal fusion mod-
els, evaluate them on larger datasets, and address the needs of public security agencies and
low-quality biometric data.
In this study, we propose a multi-modal biometric fusion model that incorporates three
fusion-level methods. The model is designed to be practical and generalizable, independ-
ent of specific modalities or neural networks. The experimental evaluation is conducted on
a multi-modal dataset consisting of fingerprints, irises, and faces from 2712 individuals,
testing the resilience and efficacy of the model. The findings aim to bridge the research gap
and provide insights into the development of effective multi-modal fusion techniques.
3 Proposed methodology
This paper presents a deep learning-based multimodal biometric fusion model, as depicted
in Fig. 1, which achieves a high-level fusion in multimodal biometric fusion. The model
comprises three fusion algorithms: the score layer (in red), the feature layer (in blue), and
the pixel layer (in orange). Each fusion level involves both a training phase and a feature
phase, resulting in fusion features or fusion scores for retrieval. Importantly, this model
offers superior adaptability and generalization in real-world applications, as it is not lim-
ited by specific neural network performance or biometric modalities. To solve security
concerns, biometrics can play a big part in safeguarding a variety of newly developed IoT
devices. It also presents an intriguing window of opportunity to enhance the usefulness and
security of IoT. Human physical characteristics, such as fingerprints and faces, are used in
biometric recognition to identify or confirm identity. Thanks to developments in sensing
technology, it eliminates the disadvantage of password-based verification and is growing
in popularity. To restrict unwanted access to IoT devices and services, a single-modal bio-
metric authentication system authenticates users using information from a single biomet-
ric feature. Compared with traditional password-based authentication, a biometric-based solution provides robust security and ease of use.
In the pixel-level fusion method, this paper employs three primary fusion techniques: chan-
nel fusion, intensity fusion, and spatial fusion, focusing on image data. These methods
combine various modalities to create a fused image fed as a complete input to the backbone
network for training and retrieval, as depicted in Fig. 2.
Common methodologies at the pixel level include multi-resolution transformation approaches, principal component analysis (PCA), and color space-based algorithms. In color space-based fusion, images are converted from the RGB color model to another color representation using techniques such as the Brovey transform and the IHS (intensity-hue-saturation) transform, the latter of which separates spatial and spectral information effectively. PCA-based image fusion combines several images by reducing data dimensionality and identifying their crucial components. Channel fusion com-
bines grayscale fingerprint and iris images with color face images by converting them into
three-channel images during training. This adjustment ensures data consistency and miti-
gates information imbalances between modalities. The fused image comprises nine chan-
nels, with the first three channels representing facial information, the middle three channels
containing iris information, and the last three channels capturing fingerprint information.
Specifically, the input data, initially a three-dimensional [224, 224, 3] modal image, is
transformed through channel fusion into a [224, 224, 9] data format, which is then for-
warded to the backbone network for further training.
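To make the channel fusion step concrete, the following is a minimal sketch (not the authors' released code) of stacking three aligned modality images into a nine-channel input with NumPy and Pillow; the file paths and the 224 × 224 target size are illustrative assumptions.

```python
import numpy as np
from PIL import Image

def load_as_rgb(path, size=(224, 224)):
    """Load an image, convert grayscale modalities to 3 channels, and resize."""
    img = Image.open(path).convert("RGB").resize(size)
    return np.asarray(img, dtype=np.float32) / 255.0      # (224, 224, 3)

def channel_fusion(face_path, iris_path, fingerprint_path):
    """Concatenate face, iris, and fingerprint images along the channel axis."""
    face = load_as_rgb(face_path)
    iris = load_as_rgb(iris_path)                          # grayscale iris becomes 3 channels
    fingerprint = load_as_rgb(fingerprint_path)
    return np.concatenate([face, iris, fingerprint], axis=-1)  # (224, 224, 9)

# Example: fused = channel_fusion("face.png", "iris.png", "finger.png")
```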
Intensity fusion performs pixel-intensity-weighted fusion of the input images. The $i$-th fused image is computed as

$$\mathrm{Input}_i = N_1 \times \mathrm{Picture}(\mathrm{Face}_i) + N_2 \times \mathrm{Picture}(\mathrm{Iris}_i) + N_3 \times \mathrm{Picture}(\mathrm{Fingerprint}_i),$$

and the fused images are sent into the backbone network for training. Spatial fusion spatially stitches and compresses the input images: the three input images are stitched horizontally, and the fused image is then scaled back to the pre-fusion size so that the input size matches that of a single image. The stitched and compressed images are fed to the backbone network for training.
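A minimal sketch of the intensity-weighted and spatial fusion rules described above, assuming the inputs are already aligned float arrays in [0, 1] of the same shape; the equal weights N1 = N2 = N3 = 1/3 are placeholders, since the paper does not state the weight values.

```python
import numpy as np
from PIL import Image

def intensity_fusion(face, iris, fingerprint, n1=1/3, n2=1/3, n3=1/3):
    """Pixel-intensity weighted fusion: Input_i = N1*face + N2*iris + N3*fingerprint."""
    return n1 * face + n2 * iris + n3 * fingerprint        # same shape as each input

def spatial_fusion(face, iris, fingerprint, size=(224, 224)):
    """Stitch the three images horizontally, then rescale to the single-image size."""
    stitched = np.concatenate([face, iris, fingerprint], axis=1)   # width is tripled
    img = Image.fromarray((stitched * 255).astype(np.uint8))
    return np.asarray(img.resize(size), dtype=np.float32) / 255.0
```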
In our model, we utilized a specific input resolution for consistency and optimal per-
formance. The choice of this resolution was based on a balance between computational
efficiency and the quality of information captured. It’s essential to consider the trade-offs
between resolution and computational resources in deep learning models.
In our case, the chosen resolution was [224, 224, 3] for the input data. This resolution
was selected as a common standard for various image-based deep learning tasks, balancing
the need to capture essential visual information with the computational demands of pro-
cessing large images. However, it’s worth noting that this choice of resolution may not be
universally optimal for all scenarios, and the selection of input resolution can vary based
on the specific requirements and constraints of the task at hand.
The feature layer fusion focuses on combining the features extracted from each modality
after the training of input data, resulting in fusion features for retrieval. The overall fusion
process is illustrated in Fig. 3.
The feature layer fusion framework comprises modality-specific branches based on con-
volutional neural networks (CNNs) and a joint representation layer. These branches extract
modality-specific features through independent uni-modal networks, and the joint repre-
sentation layer is trained to enhance the interdependencies among different modalities. The
joint representation layer plays a crucial role in fusing multi-modal feature representations
to improve the final joint biometric representation.
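As an illustration of this architecture, the sketch below wires three modality-specific CNN branches into a jointly trained representation layer in PyTorch. The choice of ResNet50 branches, the 512-dimensional joint embedding, and the classifier head are assumptions made for illustration; the paper leaves the exact layer sizes to the chosen backbone.

```python
import torch
import torch.nn as nn
from torchvision import models

class FeatureLevelFusion(nn.Module):
    """Three modality-specific CNN branches feeding a joint representation layer."""
    def __init__(self, embed_dim=512, num_classes=2712):
        super().__init__()
        # Independent uni-modal backbones (ResNet50 used here for illustration).
        self.branches = nn.ModuleDict({
            m: nn.Sequential(*list(models.resnet50(weights=None).children())[:-1])
            for m in ("face", "iris", "fingerprint")
        })
        # Joint representation layer fusing the concatenated modality features.
        self.joint = nn.Sequential(nn.Linear(3 * 2048, embed_dim), nn.ReLU(inplace=True))
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, face, iris, fingerprint):
        feats = [self.branches[m](x).flatten(1)
                 for m, x in (("face", face), ("iris", iris), ("fingerprint", fingerprint))]
        joint = self.joint(torch.cat(feats, dim=1))   # fused feature used for retrieval
        return joint, self.classifier(joint)          # backprop ties the branches together
```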
Step 3 Determine the high-frequency selection strategy for the fused image F according to the gradient coefficient matrix of each scale and direction of images A and B, as shown in the following formula:

$$I_{l,k}^{F}(i,j)=\begin{cases}I_{l,k}^{A}(i,j), & \mathrm{grad}\,I_{l,k}^{A}(i,j)\ \geq\ \mathrm{grad}\,I_{l,k}^{B}(i,j)\\ I_{l,k}^{B}(i,j), & \mathrm{grad}\,I_{l,k}^{A}(i,j)\ <\ \mathrm{grad}\,I_{l,k}^{B}(i,j)\end{cases} \tag{3}$$
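A small sketch of the selection rule in Eq. (3), assuming the sub-band coefficient matrices of images A and B at scale l and direction k, together with their gradient magnitude maps, have already been computed by the preceding (not shown) decomposition steps.

```python
import numpy as np

def select_high_frequency(coeff_a, coeff_b, grad_a, grad_b):
    """Keep, at every position (i, j), the coefficient whose source image has the
    larger gradient magnitude, as in Eq. (3)."""
    return np.where(grad_a >= grad_b, coeff_a, coeff_b)
```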
For the score layer fusion method, this paper proposes two different fusion methods,
namely the Rank1-based evaluation method and the modality-based evaluation method.
The main research object of score layer fusion is the comparison score output by each
modality classifier, which is then fused according to "rules or theories" to output the final
comparison score, which is used as the basis for open set retrieval. Since the comparison
score is produced by the algorithm, the score is tied to the performance of the uni-modal algorithm. The general idea of assigning confidence to the scores is shown in Fig. 4, and the specific fusion methods in the score layer fusion stage are shown in Fig. 5.
In the modality-based evaluation method, a difference formula is used to calculate the score value of each single modality, and the comparison scores of the modality with the highest score value are selected as the final comparison scores for all claimants. The score value is calculated as follows:

$$D_t=\sum_{q=1}^{C-1}\frac{s[q-1]-s[q]}{q} \tag{4}$$
Here, D represents the score value of the modality classifier, C represents the capacity of the candidate pool, and s[q] represents the score of the q-th candidate given by the modality classifier for a certain claimant. Equation (4) measures the decision scores of each modality and selects the result of the modality with the highest score as the final fusion result.
In the Rank1-based evaluation method, for each claimant the sum of the differences between each modality's optimal candidate and its remaining candidates is calculated as the score value, and the result of the modality with the highest score is selected as the final decision result. The score value is calculated as follows:

$$D_t=\frac{\sum_{q=1}^{C}\left(s[0]-s[q]\right)}{C-1} \tag{5}$$

According to formula (5), for each claimant the distances between the first-ranked search result and the remaining ranked results are computed to measure the score value D of each uni-modal classifier.
The modality evaluation method measures the overall performance of each uni-modal classifier and trusts all comparison scores of the uni-modal classifier with the highest score value. The Rank1 evaluation measures the first-ranked retrieval result of each claimant for each single modality and, for that claimant, trusts the comparison scores of the modality with the highest Rank1 score value. In both cases, score layer fusion decides which uni-modal decision to trust according to the score value.
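The two evaluation rules can be sketched as follows. This is an interpretation of Eqs. (4) and (5): `scores` holds one modality's candidate comparison scores, sorted in descending order, and the fused decision simply trusts the modality whose score value D is largest.

```python
import numpy as np

def modality_evaluation(scores):
    """Eq. (4): D = sum over q = 1..C-1 of (s[q-1] - s[q]) / q, for descending scores."""
    s = np.sort(np.asarray(scores, dtype=np.float64))[::-1]
    c = len(s)
    return float(sum((s[q - 1] - s[q]) / q for q in range(1, c)))

def rank1_evaluation(scores):
    """Eq. (5): average gap between the top candidate and the remaining candidates."""
    s = np.sort(np.asarray(scores, dtype=np.float64))[::-1]
    return float(np.sum(s[0] - s[1:]) / (len(s) - 1))

def fuse_by_score_value(per_modality_scores, evaluate=rank1_evaluation):
    """Trust the comparison scores of the modality with the highest score value D."""
    best = max(per_modality_scores, key=lambda m: evaluate(per_modality_scores[m]))
    return best, per_modality_scores[best]

# Example: fuse_by_score_value({"face": [...], "iris": [...], "fingerprint": [...]})
```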
Unlike the feature layer, which requires the backbone networks of the three modalities to be trained jointly through shared backpropagation, the score layer only needs each modality's backbone network to be trained separately, and the backbone networks remain naturally decoupled. This makes score layer fusion training less difficult than feature layer fusion training. In addition, the single-modal models converge easily, each network is more interpretable, and the models can be trained in parallel.
In this paper, using multimodal input, a fusion method based on pixel layer, feature layer,
and score layer is proposed to construct a multimodal biometric fusion model. The fusion of
the pixel layer and the feature layer needs to fuse the original biometric data of the modal-
ity, and complete the individual retrieval through the feature vector. The process is shown
in Fig. 6. Let the multimodal fusion biometrics obtained by the claimant (referring to the
individual who holds the biometric data and waits for verification by the biometric identi-
fication method) be $Q_i(q_1, q_2, \ldots, q_n)$, $i \in [1, n]$. From the candidate list (candidates are individuals in the biometric reference database whose biometric data are similar to the claimant's), the multimodal fusion biometric obtained by one of the candidates is $G_i(g_1, g_2, \ldots, g_n)$, $i \in [1, n]$. In this paper, similarity is measured with two metrics, Euclidean distance and cosine similarity, and the candidate list is sorted to output the retrieval results.
Due to the problem of inconsistent lengths of different feature vectors, to unify the similarity comparison between $Q_i$ and $G_i$, the model normalizes the length of the feature vectors to 10, that is, $\|Q_i\|_2=\sqrt{\sum_{i=1}^{n}q_i^2}=\|G_i\|_2=\sqrt{\sum_{i=1}^{n}g_i^2}=10$. For the Euclidean distance similarity, the distance between the feature vectors of any two images is:

$$0\ \leq\ \sqrt{\sum_{i=1}^{n}\left(q_i-g_i\right)^2}\ \leq\ \sqrt{2\sum_{i=1}^{n}\left(q_i^2+g_i^2\right)}=20 \tag{6}$$

Fig. 6 Pixel layer and feature layer multi-modal biometric fusion retrieval
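A minimal sketch of the length normalization and the two similarity measures used for ranking candidates; the target norm of 10 and the [0, 20] range of the Euclidean distance follow from Eq. (6).

```python
import numpy as np

def normalize_to_length(v, target_norm=10.0):
    """Scale a feature vector so that its L2 norm equals the target (10 in this paper)."""
    v = np.asarray(v, dtype=np.float64)
    return v * (target_norm / np.linalg.norm(v))

def euclidean_distance(q, g):
    """Distance between two normalized feature vectors; lies in [0, 20] per Eq. (6)."""
    return float(np.linalg.norm(normalize_to_length(q) - normalize_to_length(g)))

def cosine_similarity(q, g):
    """Cosine similarity, the second metric used for ranking candidates."""
    q, g = np.asarray(q, dtype=np.float64), np.asarray(g, dtype=np.float64)
    return float(np.dot(q, g) / (np.linalg.norm(q) * np.linalg.norm(g)))
```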
5.1 Experimental data
This study focuses on three distinct physiological traits: the face, fingerprint, and iris,
which are highly relevant to real-world security scenarios. To address the scarcity of large-
scale multimodal biometric datasets and concerns about biometric information security, we
amalgamate public unimodal datasets [31] into virtual homologous multimodal datasets.
For iris and fingerprint data, we utilize the publicly available CASIA-IrisV4-Interval
and CASIA-FingerprintV5 biometric datasets. Additional data were sourced from local
public security agencies using the same collection equipment and procedures as the open
datasets. The CASIA-FingerprintV5 dataset includes fingerprint data from 500 individuals
for eight fingers, categorized into homogeneous and heterogeneous fingerprints to expand
the dataset size, with minimal impact on model performance [32–34].
The face data are extracted from the WebFace260M [34–37] public web face dataset, which is characterized by noisy images, low resolution, and overall challenging recognition. To
improve data quality, this dataset undergoes specific cleaning techniques.
5.1.1 Dataset processing
The dataset used in this study consists of multi-modal biometric data, including iris, finger-
print, and face images. The combined iris dataset comprises 2712 categories, with each cat-
egory containing a minimum of 5 data samples. To ensure consistency and alignment, the
fingerprint and face data were limited to the first 2712 individuals and synchronized with the
iris data.
The iris and fingerprint images in the dataset were captured using a high-fidelity imag-
ing device, resulting in high-resolution and high-quality data. However, it is important to note
that these images do not directly represent the low-resolution and low-quality images typically
encountered in real crime scenes. To simulate realistic conditions, certain modifications were
applied to the iris and fingerprint images.
To emulate real-world scenarios, the resolution of the iris and fingerprint images was reduced to 1/4 of the original. Additionally, the iris images retained only 20% of the original data quality, while the fingerprint images retained 30%. These modifications introduce the challenges associated with low-resolution, lower-quality biometric data and improve the robustness of the proposed model. For a detailed overview of the dataset composition, please refer to Table 1. By documenting the dataset, alignment procedures, applied modifications, and the omission of preprocessing steps, this study ensures transparency and enables readers to understand the dataset's characteristics, limitations, and suitability for evaluating the proposed model in real-world biometric recognition tasks.
An Internet of Things network with thousands or perhaps hundreds of thousands of sensors produces both trivial and nontrivial data, and processing unimportant data can degrade the accuracy of data fusion algorithms, so the most important and pertinent characteristics are selected for fusion. The process of combining data is dynamic by nature: contradictory data produce results that defy logic, so data fusion algorithms must handle contradictory data with extreme caution. Data alignment and correlation must also be addressed before fusion, which is especially typical in wireless sensor networks (WSNs), and data imperfections and inconsistencies must be handled effectively. Nevertheless, training a new deep learning model from scratch is time-consuming and therefore unsuitable for live multimodal data applications.
There are no hard and fast rules for picking a backbone network in the multimodal biometric fusion model. To demonstrate the efficacy of the fusion approach, only traditional convolutional neural networks are used in this paper. We choose the VGG16 network, the ResNet50 network, the DenseNet169 network, and a VGG16 network with a batch normalization layer inserted before each activation function as backbone structures to implement and test the multimodal biometric fusion model. Public security organizations operate in non-typical settings in their actual applications, so the demands on model retrieval performance are relatively high. Accordingly, this work uses mAP (mean average precision), Rank-N (where N = 1, 5, and 10), and CMC (cumulative matching characteristic) curves as measures of performance. mAP measures how accurate an algorithm is on average, whereas Rank-N shows how well the correct identity is retrieved within the top N results. Biometric model performance is typically evaluated using the CMC curve, which plots the Rank-N index over a range of N.
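For reference, here is a compact sketch of how Rank-N accuracy and the CMC curve can be computed from a probe-gallery similarity matrix; this is a generic implementation of the metrics named above, not the authors' evaluation script, and it assumes every probe identity appears in the gallery.

```python
import numpy as np

def cmc_curve(similarity, probe_ids, gallery_ids, max_rank=50):
    """similarity: (num_probes, num_gallery) matrix, larger = more similar.
    Returns CMC values for ranks 1..max_rank; Rank-N accuracy is cmc[N-1].
    Assumes every probe identity is present in the gallery."""
    order = np.argsort(-similarity, axis=1)            # best match first
    ranked_ids = np.asarray(gallery_ids)[order]        # gallery ids per probe, ranked
    hits = ranked_ids == np.asarray(probe_ids)[:, None]
    first_hit = hits.argmax(axis=1)                    # rank index of the true identity
    return np.array([np.mean(first_hit <= r) for r in range(max_rank)])

# Example: cmc = cmc_curve(sim, probe_ids, gallery_ids)
#          rank1, rank5, rank10 = cmc[0], cmc[4], cmc[9]
```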
Given the unavailability of genuinely low-quality biometric data, the dataset underwent the low-quality processing described above. To assess the impact of fusion methods on accuracy,
we initially trained modality-specific networks using two distance measures: Euclidean dis-
tance and cosine similarity, defined in Eqs. (4) and (6).
Table 2 displays the single-modal recognition results, indicating that Euclidean distance
metric learning yields significantly higher accuracy than cosine similarity. The Euclidean
distance metric employs DenseNet as the backbone network for single-modal recognition,
achieving the highest mAP of 93.7% for the iris and the lowest of 85.9% for the face. Nota-
bly, the Euclidean distance metric outperforms the cosine similarity metric by 7.3, 11.5,
and 2.4 percentage points for face, fingerprint, and iris mAPs, respectively. The highest
Rank1 accuracy is 96.1% for the iris and 91.2% for the face, with varying accuracy across
different backbone networks. Overall, face accuracy lags behind fingerprint and iris, with
iris recognition showing the best performance, highlighting the unique and distinctive
nature of iris images.
The CMC curves for single-modal biometric retrieval based on Euclidean distance and
cosine similarity are depicted in Fig. 8. Figure 8(a) represents the Euclidean distance met-
ric, and Fig. 8(b) the cosine similarity metric. The CMC curves clearly illustrate that the
single-modal biometric retrieval accuracy using Euclidean distance significantly outper-
forms that of cosine similarity. In biometric retrieval scenarios, Euclidean distance emerges
as a more accurate metric compared to cosine similarity.
Table 3 presents the mAP and Rank1 accuracy results for pixel layer fusion. Among the
three fusion methods within the pixel layer fusion, spatial fusion exhibits the lowest per-
formance, whereas channel fusion and intensity fusion demonstrate similar effectiveness.
When compared to the individual modalities, channel fusion and intensity fusion outper-
form fingerprint and face by 2.8 and 6.7 percentage points, respectively, yet they fall 1.1
percentage points short of iris unimodal accuracy.
The Euclidean distance metric excels, surpassing the cosine similarity metric by 17.5,
6.7, and 8.1 percentage points, respectively. Notably, the highest Rank1 accuracy, achieved
with the intensity fusion method, remains only 0.2 percentage points lower than the iris
unimodal accuracy. Among the four backbone networks, the VGG network enhanced with
batch normalization before the activation function demonstrates the best performance,
though it’s worth noting that for intensity fusion, DenseNet stands out as the top-perform-
ing option.
In Table 4, we present the mAP and Rank1 accuracy for the fusion of feature layers, dem-
onstrating the notable precision improvement achieved through the combination of these
layers. Both the VGG16 and VGG-BN networks deliver a commendable mAP accuracy
of 96.7%. The best fusion retrieval accuracy surpasses the highest single-modal biometric
retrieval accuracy by 3.0%. The cosine similarity metric yields the highest retrieval accu-
racy, outperforming the unimodal result by 1.8 percentage points, while the Euclidean dis-
tance metric achieves a 1.2 percentage point advantage over cosine similarity.
Comparing the best and worst unimodal biometric retrieval accuracy, the VGG16-BN
network attains a Rank1 accuracy that is 2.2 percentage points higher. This underscores the
effectiveness of the feature layer fusion method, which differs from pixel layer fusion by
incorporating both single-modal image features and semantic features. This is substantiated
by the enhanced retrieval accuracy seen in both measurement techniques.
In Table 5, we present the mAP and Rank1 accuracy results for the score layer fusion. The
two score layer fusion methods have a significant and positive impact on enhancing the
accuracy of fusion retrieval. The best fusion retrieval accuracy, as evaluated by the Rank1
method, outperforms the best single-modal biometric retrieval accuracy by a noteworthy
4.4 percentage points. Similarly, the modal evaluation method indicates that the best fusion
retrieval accuracy surpasses the best single-modal biometric retrieval accuracy by a sub-
stantial 5.4 percentage points. Furthermore, the best retrieval accuracy, using the Euclidean
distance metric, excels by 1.3 percentage points compared to the best retrieval accuracy
using cosine similarity.
DenseNet stands out as the preferred choice for score layer fusion, demonstrating supe-
rior performance. The best Rank1 accuracy, according to the Rank1 evaluation method, is
3 percentage points higher than the highest single-modal Rank1 accuracy, while the best
Rank1 accuracy according to the modal evaluation method surpasses the single-modal
Rank1 accuracy by an impressive 3.6 percentage points. Even the accuracy of the cosine
similarity-based method shows improvement compared to the single-modal accuracy.
Achieving a 99.6% mAP involves various optimization strategies. Regularization techniques, such as dropout and L1/L2 regularization, help prevent overfitting. Training strategies, like early stopping and learning rate schedules, aid in avoiding overtraining and local minima. Fine-tuning hyperparameters, including learning rates and network architecture, also contributes to the final retrieval accuracy.
Fig. 9 mAP (A) and Rank1 accuracy (B) versus backbone (VGG16, VGG16-BN, ResNet50, DenseNet169) for the intensity, feature, and modality-evaluation fusion methods under Euclidean (E) and cosine (C) metrics
Fig. 10 CMC curves of different fusion methods. (a) Euclidean distance. (b) Cosine distance
6 Conclusion
This research proposes a multimodal method based on a deep learning biometric fusion model. The three primary study objects are the facial, fingerprint, and iris biometric data types. In addition to addressing the shortcomings of current algorithms, the model targets the real-world application needs of public security organizations. Pixel, fea-
ture, and score-level fusion are among the levels and aspects of multimodal biometric
retrieval approaches that are examined. The suggested paradigm offers flexibility, scal-
ability, and generality in fusion selection on several levels. Additionally, low-quality or
erroneous data is no longer a factor, and the detrimental effects of single-modal out-
liers on recognition performance are successfully reduced. This work focuses on pic-
ture data and uses three main fusion strategies in the pixel-level fusion method: channel
fusion, intensity fusion, and spatial fusion. These techniques blend several modalities
to produce a fused picture that is fed into the backbone network as a whole for training
and retrieval. We use the publicly accessible CASIA-FingerprintV5 and CASIA-IrisV4-
Interval biometric databases for iris and fingerprint data. Using the identical tools and
protocols as the publicly available datasets, more data were obtained from nearby public
security organizations. The CASIA-FingerprintV5 dataset includes fingerprint data for eight fingers from 500 people. The fingerprint data are divided into homogeneous and heterogeneous categories to increase the dataset size, with minimal effect on model performance. Unfortunately, pixel-level fusion does not significantly improve fusion retrieval
performance. It is noteworthy that the best mAP and the best Rank1 accuracy of the feature layer fusion approach lag behind those of the score layer fusion by 2.3 and 2.4 percentage points, respectively. On the other hand, retrieval
accuracy may be significantly increased using score layer fusion; the highest accuracy
possible is 99.6%. Moreover, it is more feasible and resource-efficient for real-world
applications since it simply necessitates the training of modality-specific networks.
These efforts will contribute to developing robust and practical fusion models that can
effectively address the needs of public security agencies and accommodate low-quality
biometric data.
Funding This research did not receive funding in any form.
Declarations
This research does not involve any animal or human participation.
References
1. Toygar Ö, Babalola FO, Bitirim Y (2020) FYO: A Novel Multimodal Vein Database With Palmar, Dorsal and Wrist Biometrics. IEEE Access 8:82461–8247. https://doi.org/10.1109/ACCESS.2020.2991475
2. Kumar P, Mukherjee S, Saini R, Kaushik P, Roy PP, Dogra DP (2019) Multimodal Gait Recogni-
tion With Inertial Sensor Data and Video Using Evolutionary Algorithm. IEEE Trans Fuzzy Syst
27(5):956–965. https://doi.org/10.1109/TFUZZ.2018.2870590
3. Atenco JC, Moreno JC, Ramírez JM (2023) Deep Learning Convolutional Network for Bimodal
Biometric Recognition with Information Fusion at Feature Level. IEEE Lat Am Trans 21(5):652–
661. https://doi.org/10.1109/TLA.2023.10130837
4. Yuan C, Jiao S, Sun X, Wu QMJ (2022) MFFFLD: A Multimodal-Feature-Fusion-Based Finger-
print Liveness Detection. IEEE Transactions on Cognitive and Developmental Systems 14(2):648–
661. https://doi.org/10.1109/TCDS.2021.3062624
5. Huang Y, Ma H, Wang M (2023) Multimodal Finger Recognition Based on Asymmetric Networks
With Fused Similarity. IEEE Access 11:17497–17509. https://doi.org/10.1109/ACCESS.2023.
3242984
6. Hammad M, Liu Y, Wang K (2019) Multimodal Biometric Authentication Systems Using Con-
volution Neural Network Based on Different Level Fusion of ECG and Fingerprint. IEEE Access
7:26527–26542. https://doi.org/10.1109/ACCESS.2018.2886573
7. Kanhangad V, Kumar A, Zhang D (2008) "Comments on “An Adaptive Multimodal Biometric
Management Algorithm,” in IEEE Transactions on Systems, Man, and Cybernetics. Part C (Appli-
cations and Reviews) 38(6):841–843. https://doi.org/10.1109/TSMCC.2008.2001570
8. Poh N et al (2009) Benchmarking Quality-Dependent and Cost-Sensitive Score-Level Multimodal
Biometric Fusion Algorithms. IEEE Trans Inf Forensics Secur 4(4):849–866. https://doi.org/10.
1109/TIFS.2009.2034885
9. Snelick R, Uludag U, Mink A, Indovina M, Jain A (2005) Large-scale evaluation of multimodal
biometric authentication using state-of-the-art systems. IEEE Trans Pattern Anal Mach Intell
27(3):450–455. https://doi.org/10.1109/TPAMI.2005.57
10. Poh N, Kittler J, Bourlai T (2010) Quality-Based Score Normalization With Device Qualitative
Information for Multimodal Biometric Fusion. IEEE Transactions on Systems, Man, and Cybernet-
ics - Part A: Systems and Humans 40(3):539–554. https://doi.org/10.1109/TSMCA.2010.2041660
11. Shekhar S, Patel VM, Nasrabadi NM, Chellappa R (2014) Joint Sparse Representation for Robust
Multimodal Biometrics Recognition. IEEE Trans Pattern Anal Mach Intell 36(1):113–126. https://
doi.org/10.1109/TPAMI.2013.109
12. Walia GS, Jain G, Bansal N, Singh K (2020) Adaptive Weighted Graph Approach to Generate Mul-
timodal Cancelable Biometric Templates. IEEE Trans Inf Forensics Secur 15:1945–1958. https://
doi.org/10.1109/TIFS.2019.2954779
13. Monwar MM, Gavrilova ML (2009) “Multimodal Biometric System Using Rank-Level Fusion
Approach,” in IEEE Transactions on Systems, Man, and Cybernetics. Part B (Cybernetics)
39(4):867–878. https://doi.org/10.1109/TSMCB.2008.2009071
14. Haghighat M, Abdel-Mottaleb M, Alhalabi W (2016) Discriminant Correlation Analysis: Real-
Time Feature Level Fusion for Multimodal Biometric Recognition. IEEE Trans Inf Forensics Secur
11(9):1984–1996. https://doi.org/10.1109/TIFS.2016.2569061
15. Nguyen K, Denman S, Sridharan S, Fookes C (2015) Score-Level Multibiometric Fusion Based
on Dempster-Shafer Theory Incorporating Uncertainty Factors. IEEE Transactions on Human-
Machine Systems 45(1):132–140. https://doi.org/10.1109/THMS.2014.2361437
16. Conti V, Militello C, Sorbello F, Vitabile S (2010) “A Frequency-based Approach for Features
Fusion in Fingerprint and Iris Multimodal Biometric Identification Systems,” in IEEE Transactions
on Systems, Man, and Cybernetics. Part C (Applications and Reviews) 40(4):384–395. https://doi.
org/10.1109/TSMCC.2010.2045374
17. Poh N, Windridge D, Mottl V, Tatarchuk A, Eliseyev A (2010) Addressing Missing Values in Ker-
nel-Based Multimodal Biometric Fusion Using Neutral Point Substitution. IEEE Trans Inf Foren-
sics Secur 5(3):461–469. https://doi.org/10.1109/TIFS.2010.2053535
18. Zhang X, Cheng D, Jia P, Dai Y, Xu X (2020) An Efficient Android-Based Multimodal Biometric
Authentication System With Face and Voice. IEEE Access 8:102757–102772. https://doi.org/10.
1109/ACCESS.2020.2999115
19. Guo BH, Nixon MS, Carter JN (2019) Soft Biometric Fusion for Subject Recognition at a Distance.
IEEE Trans Biomet Behav Identity Sci 1(4):292–301. https://doi.org/10.1109/TBIOM.2019.29439
34
20. Parashar A, Parashar A, Abate AF, Shekhawat RS, Rida I (2023) Real-time gait biometrics for sur-
veillance applications: A review. In Image and Vision Computing 138:104784. https://doi.org/10.
1016/j.imavis.2023.104784
21. Zhang H, Li S, Shi Y, Yang J (2019) Graph Fusion for Finger Multimodal Biometrics. IEEE Access
7:28607–28615. https://doi.org/10.1109/ACCESS.2019.2902133
22. Veeramachaneni K, Osadciw LA, Varshney PK (2005) “An adaptive multimodal biometric manage-
ment algorithm,” in IEEE Transactions on Systems, Man, and Cybernetics. Part C (Applications
and Reviews) 35(3):344–356. https://doi.org/10.1109/TSMCC.2005.848191
23. Sultana M, Paul PP, Gavrilova ML (2018) Social Behavioral Information Fusion in Multimodal Biom-
etrics. IEEE Trans Syst Man Cybernetics: Systems 48(12):2176–2187. https://doi.org/10.1109/TSMC.
2017.2690321
24. Rida, I., Al-Maadeed, N., Al-Maadeed, S., & Bakshi, S. (2018). A comprehensive overview of feature
representation for biometric recognition. In Multimedia Tools and Applications 79 7–8: 4867–4890.
Springer Science and Business Media LLC. https://doi.org/10.1007/s11042-018-6808-5
25. Ghaderi A, Shahri AA, Larsson S (2022) A visualized hybrid intelligent model to delineate Swedish
fine-grained soil layers using clay sensitivity. CATENA 214:106289
26. Abbaszadeh Shahri A, Shan C, Larsson S (2022) A novel approach to uncertainty quantification in
groundwater table modeling by automated predictive deep learning. Nat Resour Res 31(3):1351–1373
27. Naik DL, Kiran R (2021) A novel sensitivity-based method for feature selection. Journal of Big Data
8:1–16
28. Asheghi R, Hosseini SA, Saneie M, Shahri AA (2020) Updating the neural network sediment load
models using different sensitivity analysis methods: a regional application. J Hydroinf 22(3):562–577
29. Paul PP, Gavrilova ML, Alhajj R (2014) Decision Fusion for Multimodal Biometrics Using Social Net-
work Analysis. IEEE Trans Syst Man Cybern: Syst 44(11):1522–1533. https://doi.org/10.1109/TSMC.
2014.2331920
30. Bahrampour S, Nasrabadi NM, Ray A, Jenkins WK (2016) Multimodal Task-Driven Dictionary Learn-
ing for Image Classification. IEEE Trans Image Process 25(1):24–38. https://doi.org/10.1109/TIP.
2015.2496275
31. Iula A, Micucci M (2022) Multimodal Biometric Recognition Based on 3D Ultrasound Palmprint-
Hand Geometry Fusion. IEEE Access 10:7914–7925. https://doi.org/10.1109/ACCESS.2022.3143433
32. Jiang RM, Sadka AH, Crookes D (2010) “Multimodal Biometric Human Recognition for Percep-
tual Human-Computer Interaction,” in IEEE Transactions on Systems, Man, and Cybernetics. Part C
(Applications and Reviews) 40(6):676–681. https://doi.org/10.1109/TSMCC.2010.2050476
33. Rahman A et al (2021) Multimodal EEG and Keystroke Dynamics Based Biometric System Using
Machine Learning Algorithms. IEEE Access 9:94625–94643. https://doi.org/10.1109/ACCESS.2021.
3092840
34. Fox NA, Gross R, Cohn JF, Reilly RB (2007) Robust Biometric Person Identification Using Automatic
Classifier Fusion of Speech, Mouth, and Face Experts. IEEE Trans Multimedia 9(4):701–714. https://
doi.org/10.1109/TMM.2007.893339
35. Xin Y et al (2018) Multimodal Feature-Level Fusion for Biometrics Identification System on IoMT
Platform. IEEE Access 6:21418–21426. https://doi.org/10.1109/ACCESS.2018.2815540
36. Toh KA, Yau WY (2004) Combination of hyperbolic functions for multimodal biometrics data fusion.
IEEE Trans Syst Man Cybern Part B 34(2):1196–1209. https://doi.org/10.1109/TSMCB.2003.821868
37. Talreja V, Valenti MC, Nasrabadi NM (2021) Deep Hashing for Secure Multimodal Biometrics. IEEE
Trans Inf Forensics Secur 16:1306–1321. https://doi.org/10.1109/TIFS.2020.3033189
38. Li J, Hong D, Gao L, Yao J, Zheng K, Zhang B, Chanussot J (2022) Deep learning in multimodal
remote sensing data fusion: A comprehensive review. Int J Appl Earth Obs Geoinformation (Vol. 112,
p. 102926). Elsevier BV. https://doi.org/10.1016/j.jag.2022.102926
39. Wang Y, Shi D, Zhou W (2022) Convolutional Neural Network Approach Based on Multimodal Bio-
metric System with Fusion of Face and Finger Vein Features. Sensors (Basel) 22(16):6039. https://doi.
org/10.3390/s22166039
40. Gavrilova M, Luchak I, Sudhakar T, Tumpa SN (2022) Artificial Intelligence in Biometrics: Uncov-
ering Intricacies of Human Body and Mind. In Learning and Analytics in Intelligent Systems (pp.
123–169). Springer International Publishing. https://doi.org/10.1007/978-3-030-93052-3_7
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under
a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted
manuscript version of this article is solely governed by the terms of such publishing agreement and applicable
law.
* Mohammad Shabaz
[email protected]
Haewon Byeon
[email protected]
Vikas Raina
[email protected]
Mukta Sandhu
[email protected]
Ismail Keshta
[email protected]
Mukesh Soni
[email protected]
Khaled Matrouk
[email protected]
Pavitar Parkash Singh
[email protected]
T. R. Vijaya Lakshmi
[email protected]

1 Department of Digital Anti-Aging Healthcare, Inje University, Gimhae, Republic of Korea 50834
2 CSE, SET, Mody University of Science and Technology, NH-52, Lakshmangarh, Sikar, Rajasthan 332311, India
3 Shri Vishwakarma Skill University, Palwal, India
4 Model Institute of Engineering and Technology Jammu, Jammu, J&K, India
5 Computer Science and Information Systems Department, College of Applied Sciences, AlMaarefa University, Riyadh, Saudi Arabia
6 Dr. D. Y. Patil Vidyapeeth, Pune, D. Y. Patil School of Science & Technology, Tathawade, Pune, India
7 Computer Engineering Department, Al-Hussein Bin Talal University, Ma'an, Jordan
8 Department of Management, Lovely Professional University, Phagwara, India
9 Mahatma Gandhi Institute of Technology, Gandipet, Hyderabad, India