
Journal of the Franklin Institute 361 (2024) 107306

Contents lists available at ScienceDirect

Journal of the Franklin Institute


journal homepage: www.elsevier.com/locate/fi

Face recognition method based on fusion of improved MobileFaceNet and adaptive Gamma algorithm☆

Jingwei Li, Yipei Ding *, Zhiyu Shao, Wei Jiang
School of Electrical and Energy Power Engineering, Yangzhou University, Yangzhou, Jiangsu 225127, PR China

ARTICLE INFO

Keywords: Face recognition; MobileFaceNet; Convolutional neural network; Style attention mechanism; Gamma correction

ABSTRACT

The MobileFaceNet face recognition algorithm is a relatively mainstream face recognition algorithm at present. Its small memory footprint and fast running speed make it widely used in embedded devices. Because the face image acquisition capability of embedded devices is limited, the accuracy of face recognition is often reduced by uneven illumination and poor exposure quality. To solve this problem, a face recognition algorithm based on the fusion of MobileFaceNet and an adaptive Gamma algorithm is proposed. In image preprocessing, the proposed algorithm proceeds as follows. Firstly, adaptive Gamma correction is used to improve the brightness of the face image. Then, the edges of the face image are enhanced by the Laplace operator. Finally, a linear weighted fusion is performed between the Gamma-corrected image and the edge-enhanced image to obtain the preprocessed face image. At the same time, we improve the traditional MobileFaceNet network: a Style-based Recalibration Module (SRM) attention mechanism is added to its bottleneck layer, using the mean and standard deviation of the input features to improve the ability to capture global information and to emphasize the more important feature information. Finally, the proposed method was verified on the LFW and AgeDB face test sets. The experimental results show that the adaptive Gamma algorithm proposed in this paper, together with the improved MobileFaceNet, achieves a face recognition accuracy of 99.27 % on the LFW dataset and 90.18 % on the AgeDB dataset, while increasing the model size by only 0.4 MB and reducing the processing time per image by 4 ms. The method can effectively improve the accuracy of face recognition and has good application prospects on embedded devices, and thus has certain practical significance.

1. Introduction

Face recognition is a biometric identification technology based on human face shape features. Compared with fingerprint, iris, and other biometric identification technologies, it has the advantages of being non-contact and non-mandatory, and it has been widely used in finance, security, traffic, public security, and other fields [1,2]. The key to face recognition is to extract effective recognition features from face images under different illumination, angles, postures, and other factors, so as to reduce the intra-class gap and increase the inter-class gap and thereby distinguish different individuals. Therefore, how to extract more robust features to effectively recognize faces has become the key to solving the problem [3].

☆ This work was supported by the Open Project Program of Engineering Research Center of High-efficiency and Energy-saving Large Axial Flow Pumping Station, Jiangsu Province, Yangzhou University (grant number ECHEAP2022–018).
* Corresponding author. E-mail address: [email protected] (Y. Ding).
https://doi.org/10.1016/j.jfranklin.2024.107306
Received 29 August 2023; Received in revised form 13 July 2024; Accepted 26 September 2024. Available online 27 September 2024.
0016-0032/© 2024 The Franklin Institute. Published by Elsevier Inc. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
In the dynamic field of deep learning, the quest for efficient models on resource-limited devices is paramount. The MobileNet series, spearheaded by Google, represents a significant stride in deploying computer vision applications on mobile and embedded systems [4]. Commencing with MobileNetV1 in 2017, Google introduced lightweight convolutional neural networks (CNNs) featuring depthwise separable convolutions, effectively curbing parameters and computational overhead for devices with limited resources [5]. Subsequent iterations like MobileNetV2 incorporated inverted residuals and linear bottlenecks, further refining adaptability for mobile image classification [6]. MobileNetV3, introduced in 2019, continued the efficiency journey with optimized separable convolutions, improved activation functions, and Squeeze-and-Excitation (SE) modules [7,8]. Despite these advancements, the efficacy of such networks, including MobileNetV3, in facial recognition tasks remained suboptimal. While MobileNetV3 offers a lightweight solution suitable for mobile and embedded systems, its complexity, computational demands, and limited adaptability for facial recognition tasks may hinder its effectiveness in this specific domain. To address this issue, Chen et al. proposed a lightweight network called MobileFaceNet specifically designed for face recognition [9]. The main goal of MobileFaceNet is to optimize facial feature extraction by replacing the global average pooling layer with a global depthwise convolutional layer (GDConv), which accounts for the different importance of different positions in the final feature map [10]. At the same time, the Parametric Rectified Linear Unit (PReLU) is used instead of the ReLU activation function; PReLU refines ReLU by adding a learnable correction parameter [11]. A normalization layer is also introduced to accelerate model convergence and prevent overfitting. To further compress the model size and better achieve accurate facial recognition on embedded devices, Xiao et al. reduced the model parameters by reducing the number of layers in MobileFaceNet [12], then replaced the PReLU in the original model with the h-ReLU6 activation function [13], and, by introducing the ECA module [14], learned the importance of each feature channel, improving both the accuracy and the compactness of MobileFaceNet [15]. Similarly, Hu et al. optimized the model structure by transitioning from bottleneck modules to sandglass modules, strategically adjusting the spatial configuration of convolutions to enhance feature extraction while minimizing information loss [16]. In addition, Zaferani et al. proposed a CNN-based model optimized for mobile devices: using the Multi-Task Cascaded Convolutional Neural Network (MTCNN) for face detection, they modified the MobileFaceNet model and trained it with the margin distillation cost function, with enhancements such as dense blocks and depthwise separable convolutions boosting model performance [17]. However, the above methods are not accurate enough for face recognition application scenarios that demand high accuracy, so a method that preserves the small model size while improving the accuracy of face recognition is urgently needed.
One significant factor influencing the performance of facial recognition systems is the quality of input images. Illumination variations, shadows, and uneven lighting can significantly degrade the quality of facial images, making it challenging for algorithms to extract discriminative features accurately [18,19]. Traditional preprocessing methods, such as gamma correction [20], histogram equalization [21], and linear stretching [22], have been employed to enhance image quality and improve feature extraction. Among these techniques, adaptive gamma image correction has garnered attention due to its ability to address lighting variations effectively [23]. Adaptive gamma image correction is a data-driven approach that adjusts the gamma correction parameter based on the characteristics of input images. By dynamically modifying the gamma value according to the pixel distribution and luminance levels, this technique can mitigate the adverse effects of uneven lighting and improve the overall quality of facial images. Moreover, adaptive gamma correction offers advantages over static methods by providing tailored adjustments for each image, thereby enhancing the robustness and accuracy of facial recognition systems. Adaptive gamma correction algorithms have been widely used in face recognition. James and Chandy [24] proposed an adaptive correction algorithm for non-uniformly lit images by improving the two-dimensional gamma function, which achieved certain results. Zhang et al. [25] further used an artificially set average brightness value of 128 as the standard for evaluating the parameter γ and proposed an improved two-dimensional gamma correction algorithm for adaptive adjustment. However, although correction algorithms based on two-dimensional gamma functions can achieve good processing results, their computational complexity is often high and they are not suitable for large-scale facial databases [26]. Considering that the algorithm in this paper is mainly intended for mobile embedded devices with limited computing power, the two-dimensional gamma algorithm obviously does not meet our needs, so a suitable one-dimensional adaptive gamma algorithm is required [27,28].
This paper introduces enhancements to MobileFaceNet by refining the model structure and incorporating a Style-based Recalibration Module (SRM) into the Depthwise structure of the original network. At the same time, after multi-task cascaded convolutional neural networks (MTCNN) crop the face, the dataset is preprocessed using the proposed one-dimensional adaptive gamma algorithm. We investigate the effectiveness of adaptive gamma correction in enhancing image quality, preserving important facial features, and facilitating more accurate feature extraction. The paper uses the above methods to improve the face recognition method and verifies the algorithm by comparing the receiver operating characteristic (ROC) curve and area under the curve (AUC) on the LFW face test set, with the aim of improving the accuracy of the model.

2. Materials and methods

2.1. Gamma correction

Gamma correction is used for face image enhancement. It establishes a nonlinear relationship between the gamma value and the image pixel values and makes appropriate corrections based on the specific value of each pixel, so that pixels severely affected by lighting are corrected more significantly.

Firstly, all pixel values of the input image are normalized so that their values fall in [0, 1]. Secondly, a nonlinear mapping of the pixel values is performed using the given γ, as shown in Eq. (1):

$f(I) = I^{\gamma}$ (1)

where $I$ is the normalized pixel value, $f(I)$ is the converted pixel value, and $\gamma$ is the gamma value.

When γ < 1, the grayscale values of the image increase, the overall brightness increases, and the contrast is enhanced. When γ > 1, the grayscale values of the image decrease, but the contrast can also be enhanced to a certain extent.

Finally, the obtained pixel values are de-normalized so that their range expands back to [0, 255], yielding the gamma-corrected image. Fig. 1 shows the effect of different gamma values on the image gray scale.
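As a minimal illustration of Eq. (1), the following NumPy sketch applies a fixed gamma to an 8-bit grayscale image; the function name is ours, and the adaptive choice of γ is described later in Section 2.4.1.

```python
import numpy as np

def gamma_correction(image: np.ndarray, gamma: float) -> np.ndarray:
    """Apply f(I) = I^gamma (Eq. (1)) to an 8-bit grayscale image."""
    normalized = image.astype(np.float64) / 255.0   # map pixels to [0, 1]
    corrected = np.power(normalized, gamma)         # nonlinear mapping, Eq. (1)
    return np.clip(corrected * 255.0, 0, 255).astype(np.uint8)  # back to [0, 255]

# gamma < 1 brightens a dark image; gamma > 1 darkens an overexposed one.
dark = np.full((4, 4), 40, dtype=np.uint8)
print(gamma_correction(dark, 0.5))  # pixel value 40 -> ~101, visibly brighter
```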

2.2. Feature extraction network

The feature extraction network stands as a pivotal element in the realm of deep learning, meticulously crafted to discern abstract features within input data, thus furnishing the model with insightful representations. These networks assume a vital role in diverse domains such as computer vision, natural language processing, and beyond, finding extensive utility in tasks ranging from image recognition to speech processing and text analysis.

At the heart of these networks lies their hierarchical structure, typically comprising multiple layers of convolution, pooling, and activation functions. Convolutional layers extract local features from input data through convolution operations, adept at capturing spatial patterns and structural nuances. Subsequent pooling layers serve to diminish the spatial dimensions of the feature map, preserving crucial information and mitigating computational costs. The infusion of activation functions introduces nonlinearity, empowering the network to apprehend more intricate features and patterns.

A prototypical feature extraction network unfolds in a stratified fashion, with lower levels entrusted to capture rudimentary and localized features, while higher echelons progressively distill more abstract and overarching features. This methodical layer-by-layer extraction process empowers the network to gradually fathom the hierarchical intricacies of input data, thereby enhancing its capability to represent intricate patterns adeptly.

It is noteworthy that feature extraction networks are frequently integrated with task-specific networks, forging holistic end-to-end deep learning models. In certain instances, these networks manifest as pretrained models, harnessing universal feature extraction prowess via training on expansive datasets. Leveraging transfer learning, the features gleaned from pretrained models can be seamlessly applied to specific tasks, expediting training timelines and refining overall model performance.

In essence, the architecture of feature extraction networks is designed to adeptly capture salient information from input data, endowing deep learning models with robust representations. The continuous evolution and refinement of these networks have propelled the advancement of deep learning across diverse domains, empowering models to glean profound insights and interpret complex real-world datasets more effectively.

2.2.1. MobileNetV1 and MobileNetV2

MobileNetV1 is a network architecture released by Google that proposed the concept of Depthwise Separable Convolutions (DSC) [29]. The core idea is to split the convolution into two parts: depthwise convolution and pointwise convolution [30]. Depthwise convolution is a convolution that does not cross channels: each channel of a feature map has an independent convolution kernel that acts only on that channel, as shown in Fig. 2(a). The number of channels in the output feature map equals the number of channels in the input feature map, so depthwise convolution alone cannot increase or reduce the feature dimensionality. Pointwise convolution is used for feature merging and for dimensionality increase or reduction, typically implemented with a 1 × 1 convolution.

Take as an example an input feature layer with M channels and spatial dimensions DF × DF, and an output feature layer with N channels:

Fig. 1. Gamma function curves.


Fig. 2. Schematic diagram of depthwise separable convolution.

The number of parameters of a standard convolutional layer is $D_K \times D_K \times M \times N$, where $D_K \times D_K$ is the width and height of the convolutional kernel. Applying one kernel at a single position of the feature layer costs $D_K \times D_K \times M$ multiply-add operations, and this is repeated at $D_F \times D_F$ output positions. Standard convolution has a total of $N$ kernels, so its computational cost $P_1$ is shown in Eq. (2):

$P_1 = D_K \times D_K \times M \times N \times D_F \times D_F$ (2)

Similarly, the depthwise convolution kernel has size $D_K \times D_K \times M$ and is applied at $D_F \times D_F$ positions in total. The pointwise convolution kernel has size $1 \times 1 \times M$; there are $N$ of them, and each is likewise applied $D_F \times D_F$ times. The computational complexity of DSC is therefore composed of two parts, depthwise convolution and pointwise convolution, and its computational cost $P_2$ is shown in Eq. (3):

$P_2 = D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F$ (3)

Comparing Eqs. (2) and (3), the reduction ratio $P_3$ in parameters and multiply-add operations is shown in Eq. (4):

$P_3 = \frac{P_2}{P_1} = \frac{D_K \times D_K \times M \times D_F \times D_F + M \times N \times D_F \times D_F}{D_K \times D_K \times M \times N \times D_F \times D_F} = \frac{1}{N} + \frac{1}{D_K^2}$ (4)

where $M$ is the number of channels in the input feature layer (the number of input feature maps); $N$ is the number of channels in the output feature layer (the number of feature maps generated by the convolution); $D_F \times D_F$ denotes the width and height of the input feature layer, i.e., the spatial dimensions of the input feature maps; $D_K \times D_K$ is the width and height of the standard convolutional kernel, determining the receptive field of each kernel; and $P_1$ is the computational cost of standard convolution, i.e., the amount of computation required to apply the standard convolutional kernels to the input feature layer. The specific operations are shown in Eqs. (2) to (4).

Table 1
MobileNetV2 network structure.

Input              Convolution       z    c      n    v
224 × 224 × 3      Conv2d            –    32     1    2
112 × 112 × 32     bottleneck        1    16     1    1
112 × 112 × 16     bottleneck        6    24     2    2
56 × 56 × 24       bottleneck        6    32     3    2
28 × 28 × 32       bottleneck        6    64     4    2
14 × 14 × 64       bottleneck        6    96     3    1
14 × 14 × 96       bottleneck        6    160    3    2
7 × 7 × 160        bottleneck        6    320    1    1
7 × 7 × 320        Conv2d 1 × 1      –    1280   1    1
7 × 7 × 1280       Avgpool 7 × 7     –    –      1    –
1 × 1 × 1280       Conv2d 1 × 1      –    f      –    –
From Eq. (4), it can be seen that DSC significantly reduces the number of parameters and the computational complexity of the multiply-add operations compared with standard convolution, improving the computational efficiency of the model.
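To illustrate Eqs. (2)-(4) concretely, the following PyTorch sketch builds a depthwise separable convolution and checks the parameter ratio against 1/N + 1/D_K²; it deliberately omits the BN and activation layers that MobileNet-style blocks insert between the two stages, and the class name is ours.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """A minimal sketch of a depthwise separable convolution block."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        # Depthwise: one D_K x D_K kernel per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2,
                                   groups=in_ch, bias=False)
        # Pointwise: 1 x 1 convolution merges channels and changes dimension.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# Parameter-count check for M = 64, N = 128, D_K = 3.
standard = nn.Conv2d(64, 128, 3, padding=1, bias=False)
dsc = DepthwiseSeparableConv(64, 128)
p_std = sum(p.numel() for p in standard.parameters())  # 3*3*64*128 = 73728
p_dsc = sum(p.numel() for p in dsc.parameters())       # 3*3*64 + 64*128 = 8768
print(p_dsc / p_std)  # ~0.119, matching 1/128 + 1/9 from Eq. (4)
```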
Depthwise convolution itself cannot change the number of channels, so the input channel count equals the output channel count. If the number of input channels is small, depthwise convolution can only work in low dimensions, and the final convolution effect may be poor. Therefore, before the depthwise convolutional layer of MobileNetV1, MobileNetV2 [31] uses a 1 × 1 convolution to raise the dimensionality, so that the network can extract features in a higher-dimensional space, making the extracted features more global. At the same time, to solve the problem of information loss caused by the low-dimensional ReLU operation, the last ReLU is replaced with a linear activation function. The network structure of MobileNetV2 is shown in Table 1, where z represents the channel expansion factor, c the number of output channels, n the number of repetitions, v the stride, and f the number of output channels of the final 1 × 1 convolution.

2.2.2. MobileFaceNet
MobileFaceNet is an improved version of MobileNetV2. MobileNetV2 uses a global average pooling layer, yet different pixels in the same image carry different weights [32]; global average pooling averages these weights away, so the network's performance naturally decreases. Therefore, in MobileFaceNet, a 7 × 7 × 512 separable convolution replaces the original global average pooling layer [33]. At the same time, the Parametric Rectified Linear Unit (PReLU) replaces the ReLU activation function; PReLU refines ReLU by adding a learnable correction parameter [34]. A normalization layer is also introduced to accelerate model convergence and prevent overfitting [35]. The network structure of MobileFaceNet is shown in Table 2.

Table 2
MobileFaceNet network structure.

Input             Convolution            z    c     n    v
112 × 112 × 3     Conv3 × 3              –    64    1    2
56 × 56 × 64      Depthwise Conv3 × 3    –    64    1    1
56 × 56 × 64      bottleneck             2    64    5    2
28 × 28 × 64      bottleneck             4    128   1    2
14 × 14 × 128     bottleneck             2    128   6    1
14 × 14 × 128     bottleneck             4    128   1    2
7 × 7 × 128       bottleneck             2    128   2    1
7 × 7 × 128       Conv1 × 1              –    512   1    1
7 × 7 × 512       Linear GDConv7 × 7     –    512   1    1
1 × 1 × 512       Linear Conv1 × 1       –    128   1    1

The network structure not only retains the dimension-raising and dimension-reducing layers of MobileNetV2 and the linear activation function, but also introduces a 7 × 7 separable convolution before the fully connected layer to replace the original average pooling layer, making the features extracted by the network more generalized and global.
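As a sketch of the global depthwise convolution (GDConv) idea, the snippet below shows how a 7 × 7 depthwise convolution with no padding collapses the final 7 × 7 × 512 feature map to 1 × 1 × 512 with a learned weight per spatial position, unlike global average pooling, which weights all positions equally. The layer instantiation is illustrative, not the authors' exact code.

```python
import torch
import torch.nn as nn

# GDConv: depthwise (groups=512) 7 x 7 convolution over the whole feature map.
gdconv = nn.Conv2d(512, 512, kernel_size=7, groups=512, bias=False)

x = torch.randn(1, 512, 7, 7)   # final feature map of MobileFaceNet
print(gdconv(x).shape)          # torch.Size([1, 512, 1, 1])
```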

2.3. Style-based recalibration module (SRM)

Generally speaking, different face images have different attribute features. Face image features are extracted by the convolutional layers for classification and recognition. However, different face images may have different styles, and these style attributes further affect the feature extraction of convolutional networks, thereby affecting the accuracy of subsequent recognition and classification [36].

SRM explicitly merges style information into convolutional neural network representations through feature recalibration. It dynamically estimates the relative importance of each style feature based on the style attributes of different face images, and then dynamically adjusts the feature weights according to that importance. This allows the network to focus on meaningful style information while ignoring unnecessary style information. The structure of the SRM module is shown in Fig. 3, where C is the number of channels, H is the height of the feature maps, W is the width of the feature maps, and d is the number of style features.

The SRM module consists of two main components: style pooling and style integration. The style pooling layer extracts style features from each channel by summarizing feature responses across the spatial dimensions. Style integration generates the corresponding style weights from these style features through channel-wise operations, and recalibrates the feature maps with the style weights to emphasize or suppress their information. The SRM module can significantly improve the performance of the network with minimal additional computation.

Fig. 3. Style-based Recalibration Module.

2.3.1. Style pooling

The style pooling layer includes two parts: an average pooling layer and a standard deviation pooling layer. Consider a feature map $X \in \mathbb{R}^{N_d \times C \times H \times W}$ produced during convolution, where $N_d$ is the number of samples in the mini-batch. Firstly, the average pooling layer is applied, as shown in Eq. (5):



$\mu_{nc} = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{nchw}$ (5)

where $\mu_{nc}$ is the average pooling result of the original feature map, $x_{nchw}$ is the original feature map, and $n$, $c$, $h$, and $w$ index the $N_d$, $C$, $H$, and $W$ dimensions, respectively. Secondly, the standard deviation between the average pooling result and the original input feature map is calculated, as shown in Eq. (6):

$\sigma_{nc} = \sqrt{\frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} \left(x_{nchw} - \mu_{nc}\right)^{2}}$ (6)

where $\sigma_{nc}$ is the result after the standard deviation pooling layer. Finally, the output of the style pooling layer is shown in Eq. (7):

$t_{nc} = [\mu_{nc}, \sigma_{nc}]$ (7)

where $t_{nc}$ is the feature vector after passing through the style pooling layer.

2.3.2. Style integration

The style integration layer consists of three parts: the channel-wise fully connected (CFC) layer, batch normalization (BN), and the activation layer [37]. The output of the style pooling layer is used as the input to the style integration layer. The specific operations are shown in Eqs. (8) to (12):

$z_{nc} = w_c \cdot t_{nc}$ (8)

$\mu_c^{(z)} = \frac{1}{N_d} \sum_{n=1}^{N_d} z_{nc}$ (9)

$\sigma_c^{(z)} = \sqrt{\frac{1}{N_d} \sum_{n=1}^{N_d} \left(z_{nc} - \mu_c^{(z)}\right)^{2}}$ (10)

$\hat{z}_{nc} = \gamma_c \left(\frac{z_{nc} - \mu_c^{(z)}}{\sigma_c^{(z)}}\right) + \beta_c$ (11)

$g_{nc} = \frac{1}{1 + e^{-\hat{z}_{nc}}}$ (12)

where $z_{nc}$ is the output obtained by multiplying the style feature vector by the learnable parameter $w_c \in \mathbb{R}^{C \times 2}$; $\mu_c^{(z)}$ and $\sigma_c^{(z)}$ are the mean and standard deviation of $z_{nc}$ over the mini-batch; $\hat{z}_{nc}$ is the normalized feature vector; $g_{nc}$ is the activated (gating) feature vector; and $\gamma_c$ and $\beta_c$ are the affine transformation parameters of the normalization layer.
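To make Eqs. (5)-(12) concrete, the following is a minimal PyTorch sketch of an SRM block. The class and parameter names are ours, and a standard BatchNorm1d stands in for the per-channel normalization of Eqs. (9)-(11), whose affine parameters play the roles of γ_c and β_c; it is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class SRM(nn.Module):
    """A minimal sketch of the Style-based Recalibration Module."""
    def __init__(self, channels: int):
        super().__init__()
        # Channel-wise fully connected (CFC) layer: one (w_mu, w_sigma)
        # pair per channel, combining the two style features (Eq. (8)).
        self.cfc = nn.Parameter(torch.zeros(channels, 2))
        self.bn = nn.BatchNorm1d(channels)  # Eqs. (9)-(11)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c = x.shape[:2]
        flat = x.flatten(2)                        # (N, C, H*W)
        mu = flat.mean(dim=-1)                     # style pooling, Eq. (5)
        sigma = flat.std(dim=-1, unbiased=False)   # Eq. (6)
        t = torch.stack((mu, sigma), dim=-1)       # style vector t_nc, Eq. (7)
        z = (t * self.cfc).sum(dim=-1)             # CFC, Eq. (8)
        g = torch.sigmoid(self.bn(z))              # BN + sigmoid, Eqs. (9)-(12)
        return x * g.view(n, c, 1, 1)              # channel recalibration

# Recalibrate a batch of 64-channel feature maps.
srm = SRM(64)
print(srm(torch.randn(8, 64, 14, 14)).shape)  # torch.Size([8, 64, 14, 14])
```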

2.4. Enhanced algorithms and improved models

The enhanced algorithm and improved model comprise three key stages. Initially, during data preprocessing, the MTCNN face detection method is applied to identify faces, and the detected face images are cropped to a standardized size of 112 × 112 pixels.


Additionally, an adaptive gamma correction method is employed to enhance the quality of the images. Subsequently, in the feature
extraction stage, an advanced MobileFaceNet face feature extraction network is utilized, incorporating an SRM module to enhance
spatial and channel-wise recalibration. Finally, the third stage involves the output of face classification results, where the face loss
function ArcFace is applied to supervise the training process, ensuring the extraction of highly discriminative features for accurate face
recognition.

2.4.1. Adaptive gamma correction


In the process of face recognition, the quality of collected face images can directly affect the efficiency of face recognition, so a
feasible algorithm to enhance image quality is crucial.
Although traditional gamma correction can improve overall image brightness, it fails to enhance the edges and textures of the image, yet edges and textures play an important role in effectively maintaining face features and ensuring face recognition accuracy. To address this limitation, we first apply gamma correction to brighten the image and then use the Laplacian operator to sharpen its edges. The calculation process and convolution template are shown in Eqs. (13) and (14):
$f(e) = f(x+1, y) + f(x-1, y) + f(x, y+1) + f(x, y-1) - 3 \times f(x, y)$ (13)

$h(x, y) = \begin{bmatrix} 0 & 1 & 0 \\ 1 & -3 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ (14)

where $x$ and $y$ represent coordinates in the two-dimensional image plane ($x$ along the horizontal direction, $y$ along the vertical direction), and $f(e)$ denotes the extracted edge features.
Finally, the overall brightness of the image is adaptively adjusted based on the enhanced edges to achieve satisfactory imaging
quality in terms of brightness, edge and texture.
Before preprocessing a face image, its exposure status and brightness uniformity are first determined. When the face image is dark, the gamma correction parameter should be set in the (0, 1) range; conversely, if the face image is too bright or overexposed, the gamma parameter should be greater than 1. At this point, the brightness of the face image has been preliminarily improved. However, because the brightness adjustment changes image details, this paper uses the Laplace operator to extract the edge features of the image, obtaining a detail-enhanced face image. The extracted edge features are then weighted and fused with the gamma-corrected image using a coefficient factor, and the image brightness is finely adjusted again to obtain the preprocessed image.
The paper proposes an adaptive algorithm for the one-dimensional gamma correction parameter to avoid the inconvenience of manually adjusting parameters and to improve the preprocessing efficiency for large-scale facial image databases. Firstly, the average value of all image pixels is calculated and used to judge the overall exposure quality of the image: taking the grayscale value 128 as the reference, if the average pixel value exceeds 128 the image is considered bright, otherwise it is considered dark. The normalized magnitude of the deviation of the image mean from 128 is then used as the adaptive gamma correction parameter, as shown in Eq. (15):

$f(I) = I^{\gamma}, \quad \gamma = \begin{cases} \dfrac{|\mathrm{mean}(I) - 128|}{128}, & \mathrm{mean}(I) < 128 \\ \dfrac{128}{|\mathrm{mean}(I) - 128|}, & \mathrm{mean}(I) > 128 \end{cases}$ (15)

where $I$ is the pixel value of a pixel in the image and $f(I)$ is the converted pixel value.

2.4.2. Weighted fusion of images


Simply applying gamma correction and edge feature extraction separately cannot effectively enhance recognizable features while adjusting contrast; gamma correction alone may even weaken or alter image details. Considering this, the paper further weights and fuses the two images resulting from gamma correction and edge feature extraction to improve the accuracy of face recognition. The calculation formula is shown in Eq. (16):

$f'(I) = f(I) + \lambda \times f(e)$ (16)

where $f(I)$ is the gamma-corrected image, $\lambda$ is the weighting factor, and $f(e)$ contains the feature details after edge extraction. After extensive experimentation, $\lambda$ is taken as 1.2.

Because the fusion of detail features may also change the brightness information of the image, we then calculate the image mean M: if M > 128, the overall grayscale values of the image are reduced; conversely, the grayscale values of all pixels are increased so that the average brightness approaches 128. Finally, the preprocessed face image is obtained. In this way, the fused image not only improves brightness but also retains detailed information. The flowchart of the algorithm is shown in Fig. 4.

Fig. 4. Adaptive Gamma correction.
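The following is a minimal end-to-end sketch of the preprocessing chain of Sections 2.4.1-2.4.2 (Eqs. (13)-(16)). The epsilon guard at mean(I) = 128 and the additive mean shift for the final brightness fine-tuning are our assumptions, since the paper only states that grayscale values are raised or lowered toward 128; function and variable names are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve

LAPLACE_KERNEL = np.array([[0, 1, 0],
                           [1, -3, 1],
                           [0, 1, 0]], dtype=np.float64)  # template of Eq. (14)

def preprocess_face(image: np.ndarray, lam: float = 1.2) -> np.ndarray:
    """Sketch: adaptive gamma (Eq. (15)) + Laplace edges (Eq. (13)) + fusion (Eq. (16))."""
    img = image.astype(np.float64)
    mean = img.mean()
    # Adaptive gamma parameter, Eq. (15).
    if mean < 128:
        gamma = abs(mean - 128) / 128.0
    else:
        gamma = 128.0 / max(abs(mean - 128), 1e-6)  # guard: our addition
    corrected = np.power(img / 255.0, gamma) * 255.0
    # Edge details extracted from the brightened image, Eqs. (13)-(14).
    edges = convolve(corrected, LAPLACE_KERNEL, mode="nearest")
    # Linear weighted fusion with the empirically chosen lambda = 1.2, Eq. (16).
    fused = corrected + lam * edges
    # Fine brightness adjustment: shift the mean back toward 128 (assumed additive).
    fused += 128.0 - fused.mean()
    return np.clip(fused, 0, 255).astype(np.uint8)

dark_face = (np.random.rand(112, 112) * 60).astype(np.uint8)  # stand-in image
print(preprocess_face(dark_face).mean())  # average brightness near 128
```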

2.4.3. MobileFaceNet integrating SRM modules


In the feature extraction network, the paper adds SRM to the traditional MobileFaceNet, which further improves the robustness and globality of feature extraction. An SRM module is added to the bottleneck layer of the MobileFaceNet network. SRM first extracts style information from each channel of the feature map through style pooling, and then estimates the recalibration weight of each channel through channel-independent style integration. By incorporating the relative importance of each style into the feature map, SRM effectively enhances the representation ability of the CNN. The key point is that it is lightweight, introducing very few parameters. After placing the SRM module in the bottleneck layer following the Depthwise operation, the feature extraction network can dynamically enhance useful feature expressions based on the style information of the feature map, suppress possible noise, and improve the robustness and representativeness of feature extraction. The operation of adding a style attention mechanism is similar to the architecture of MobileNetV3; the difference is that MobileNetV3 adds an SE module to its network architecture, whereas the SRM module performs better than SENet in practice. The structure of the improved bottleneck layer is shown in Fig. 5.

Fig. 5. MobileFaceNet-SRM bottleneck layer structure.

2.5. Face recognition method process

The process is mainly divided into three parts.

(1) The first stage is data preprocessing. The MTCNN face detection method is used to detect face images, the detected face images are cropped to 112 × 112 pixels, and the adaptive gamma correction method is then used to enhance the images.
(2) The second stage is feature extraction, based on the improved MobileFaceNet face feature extraction network with the added SRM module.
(3) The third stage outputs the face classification results. The face loss function ArcFace is used to supervise the training process and obtain highly discriminative face features, as sketched below.

The overall implementation of the method is based on PyTorch. The process of the face recognition method is shown in Fig. 6.
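For reference, the snippet below is a compact sketch of an ArcFace loss head (additive angular margin) as it is commonly implemented in PyTorch; the scale s and margin m are the defaults from the ArcFace paper, not values stated in this article, and the function is not the authors' training code.

```python
import torch
import torch.nn.functional as F

def arcface_logits(embeddings, weight, labels, s=64.0, m=0.5):
    """ArcFace head sketch: add angular margin m to the target-class angle."""
    # Cosine similarity between L2-normalized embeddings and class centers.
    cosine = F.linear(F.normalize(embeddings), F.normalize(weight))
    theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, num_classes=weight.size(0)).bool()
    cosine_m = torch.where(target, torch.cos(theta + m), cosine)
    return s * cosine_m  # feed to cross-entropy

emb = torch.randn(4, 128)        # 128-d features from MobileFaceNet
w = torch.randn(10575, 128)      # one center per CASIA-WebFace identity
labels = torch.tensor([0, 5, 42, 9999])
loss = F.cross_entropy(arcface_logits(emb, w, labels), labels)
print(loss.item())
```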

3. Experiment and analysis

We conducted relevant experiments on the CASIA-WebFace dataset to verify that the improved preprocessing method and model improve face recognition efficiency.

Fig. 6. Face recognition method flowchart.


3.1. Evaluation method

3.1.1. Histogram evaluation method


The histogram is one of the image quality evaluation methods: it evaluates the quality of an image by analyzing the distribution of pixel values. Specifically, a histogram of the pixel value distribution is obtained by statistically analyzing the pixel values of an image. Histograms reflect the contrast, brightness, saturation, and other information of an image, and can be used to determine whether issues such as overexposure, underexposure, or noise are present; the more distinct the peaks in the histogram, the better the image quality. In general, the histogram of an image should be balanced, meaning that the proportions of the histograms of the color channels should be similar, without significant offset. If the proportion of one channel's histogram is too large or too small, that channel may suffer from overexposure or underexposure and needs to be adjusted. In addition, if there are obvious peaks in the histogram, the image has distinct features, and such images usually have good quality.
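As a small illustration of this check, the sketch below computes a 256-bin grayscale histogram and flags likely under- or over-exposure from where the mass concentrates; the thresholds and function name are illustrative choices of ours, not values from the paper.

```python
import numpy as np

def exposure_hint(image: np.ndarray) -> str:
    """Flag likely exposure problems from the grayscale histogram."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    p = hist / hist.sum()
    if p[:64].sum() > 0.6:    # mass piled at the dark end
        return "underexposed"
    if p[192:].sum() > 0.6:   # mass piled at the bright end
        return "overexposed"
    return "balanced"

print(exposure_hint(np.full((8, 8), 20, dtype=np.uint8)))  # underexposed
```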

3.1.2. Evaluation of face recognition models


Accuracy and receiver operating characteristic (ROC) curves are used as evaluation indicators. The accuracy calculation formula is shown in Eq. (17):

$Acc = \frac{TP + TN}{TP + TN + FP + FN}$ (17)

In the ROC curve, the horizontal axis represents the false positive rate (FPR), the proportion of negative instances that the classifier identifies as positive among all negative instances. The vertical axis is the true positive rate (TPR), the proportion of positive instances correctly identified among all positive instances. For a given classifier, an (FPR, TPR) point pair is obtained from its performance on the test samples, mapping the classifier to a point on the ROC plane. By adjusting the threshold used by the classifier, a curve passing through (0, 0) and (1, 1) is obtained: the ROC curve of the classifier. The calculation formulas for FPR and TPR are shown in Eqs. (18) and (19):

$FPR = \frac{FP}{TN + FP}$ (18)

$TPR = \frac{TP}{TP + FN}$ (19)

where TP is true positive, FP is false positive, FN is false negative, and TN is true negative.
Although the ROC curve represents classifier performance intuitively and usefully, a single numerical value is often desired to indicate the quality of a classifier; hence the area under the ROC curve (AUC) is used. Typically, AUC values range from 0.5 to 1.0, with larger AUC representing better performance. AUC is a standard measure of the quality of classification models.
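As an illustration of how these indicators can be computed in practice, the sketch below uses scikit-learn, with made-up similarity scores standing in for face-pair scores; it is not the authors' evaluation code.

```python
import numpy as np
from sklearn.metrics import accuracy_score, auc, roc_curve

labels = np.array([1, 1, 0, 1, 0, 0, 1, 0])   # 1 = same person, 0 = different
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.85, 0.2])

# ROC curve: sweeping the decision threshold yields (FPR, TPR) pairs, Eqs. (18)-(19).
fpr, tpr, thresholds = roc_curve(labels, scores)
print("AUC =", auc(fpr, tpr))

# Accuracy at one fixed threshold, Eq. (17).
pred = (scores >= 0.5).astype(int)
print("Acc =", accuracy_score(labels, pred))
```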

3.2. Experimental design

To evaluate the performance of the method, we first assess the effectiveness of the adaptive gamma correction algorithm by comparing histogram pixel distributions. During the experiment, we first perform MTCNN face cropping on the entire dataset, followed by the adaptive gamma correction algorithm. Images with biased lighting conditions are selected for histogram pixel distribution analysis, from which preliminary conclusions are drawn. Then, the original dataset and the enhanced dataset are used to train MobileFaceNet and the improved MobileFaceNet, respectively. The feasibility of the proposed method is further verified through the test results on the LFW and AgeDB test sets.
The experiment was implemented in Python under PyTorch. The training and testing steps run on an NVIDIA GTX 3060 GPU. The total number of training epochs is set to 100 and the batch size to 128. The learning rate follows a cosine schedule: the maximum learning rate is set to 0.01, and the minimum learning rate is 0.01 times the maximum. The weight decay parameter is set to 5 × 10⁻⁴. The stochastic gradient descent (SGD) strategy is used to optimize the model.
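This configuration can be wired up in PyTorch roughly as follows; the momentum value and the placeholder model are assumptions, since the paper states only the epoch count, batch size, learning rates, weight decay, and the use of SGD.

```python
import torch

model = torch.nn.Linear(512, 10575)  # placeholder for MobileFaceNet-SRM + ArcFace head

EPOCHS, MAX_LR = 100, 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=MAX_LR,
                            momentum=0.9, weight_decay=5e-4)  # momentum assumed
# Cosine schedule decaying from MAX_LR down to 0.01 x MAX_LR.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS, eta_min=0.01 * MAX_LR)

for epoch in range(EPOCHS):
    # ... one pass over CASIA-WebFace (batch size 128) would go here ...
    scheduler.step()
```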

3.3. Sample database description

The paper uses the CASIA-WebFace dataset as the training dataset. CASIA-WebFace is a large-scale face recognition dataset that includes 494,414 images of 10,575 individuals, each with different angles, expressions, and lighting conditions. All face images were automatically collected from public image libraries on the internet and thus possess high diversity and authenticity. First, the MTCNN face detection method is used to re-detect the CASIA-WebFace images, the detected face images are cropped to 112 × 112 pixels, and adaptive gamma correction is then applied to enhance the images.

The test dataset is the Labeled Faces in the Wild (LFW) dataset, a very popular face recognition dataset created by artificial intelligence researchers in the United States. The dataset contains 13,233 face images of 5,749 individuals, many of whom have multiple different photos. These face images are collected from real scenes on the internet, with high diversity and authenticity. Each face image pair drawn from the LFW dataset is labeled as the same or a different person, which makes LFW an ideal dataset for testing face recognition algorithms. The AgeDB dataset contains a total of 12,440 facial images from 440 people.

3.4. Experimental results

3.4.1. Preprocessing results


The effect of correcting images with uneven lighting using this method is shown in Fig. 7. The algorithm effectively solves the problems of uneven lighting and poor exposure quality. The enhanced image displays more details than the original, better adjusting the overall brightness of the face image and making images with poor exposure quality clearer. Meanwhile, because the method only needs to compute the average brightness of the image and derive the appropriate correction parameter from it, its computational complexity is very low. It does not impose a computational burden on the face recognition system, which significantly improves the preprocessing efficiency of face images.

The corresponding histograms are also shown in Fig. 7. Taking Fig. 7(a) and (c) as examples, the main peak of the original image's histogram is concentrated on the left, and the overall brightness of the image is low. Conversely, for an overexposed picture such as Fig. 7(e), the overall brightness is high and the main peak of the original histogram is concentrated on the right. After correction, the pixel values are more evenly distributed. As can be seen from the histogram, the distribution of pixels with values above 150 is effectively enhanced, giving the histogram a more obvious bimodal distribution and highlighting the texture and details of the image.

3.4.2. Improved model


In order to test the effectiveness of the algorithm and objectively reflect the training effect of different models and datasets, we train the model before and after improvement on CASIA-WebFace before and after enhancement, and then compare the test results on the LFW and AgeDB test sets. The statistical results are shown in Table 3.

Fig. 7. Picture and Histogram Comparison.

Table 3
Statistical results of face recognition models.

Models               Data set                 Accuracy (%)        Model size (MB)   Time (ms/image)
                                              LFW       AgeDB
MobileFaceNet        CASIA-Webface            99.10     90.03     4.9               117
MobileFaceNet        CASIA-Webface-Gamma      99.15     90.05     4.9               103
MobileFaceNet-SRM    CASIA-Webface            99.18     90.12     5.3               125
MobileFaceNet-SRM    CASIA-Webface-Gamma      99.27     90.18     5.3               113

(1) Based on the original model and original dataset

The results of the feature extraction network MobileFaceNet trained on the CASIA-WebFace training set are shown in the first row of Table 3. As an excellent mobile face recognition model, MobileFaceNet achieves an accuracy of 99.10 % on the LFW dataset and 90.03 % on the AgeDB dataset, with a model size of only 4.9 MB and a recognition time of 117 ms per image. The ROC curve based on the LFW dataset is shown in Fig. 8.

Fig. 8. ROC curve based on the original model and original dataset.

(2) Based on the original model and enhanced dataset

Comparing the second row of Table 3 with the first, the model remains unchanged while the MobileFaceNet network is trained on the CASIA-WebFace training set processed with adaptive gamma correction. Compared with the network trained on the original training dataset, the face recognition accuracy improves by 0.05 %, reaching 99.15 % on the LFW dataset; similarly, the accuracy on the AgeDB dataset improves by 0.02 %, reaching 90.05 %, and the recognition time per image decreases by 14 ms. The ROC curve based on the LFW dataset is shown in Fig. 9.

Fig. 9. ROC curve based on the original model and enhanced dataset.

(3) Based on improved model and original dataset

Comparing the third row of Table 3 with the first, the dataset remains unchanged. After integrating the SRM module, the feature extraction network can dynamically enhance useful feature expressions based on the style information of the feature map. Training the improved feature extraction network MobileFaceNet-SRM on the original training dataset CASIA-WebFace improves the recognition accuracy by 0.08 %, reaching 99.18 % on the LFW dataset. It also performs well on the AgeDB dataset, with a 0.09 % improvement and an accuracy of 90.12 %. The recognition time per image is 125 ms. The ROC curve based on the LFW dataset is shown in Fig. 10.

Fig. 10. ROC curve based on improved model and original dataset.

(4) Based on improved models and enhanced datasets

Comparing the fourth row of Table 3 with the first, the MobileFaceNet network fused with the SRM module, together with the adaptive gamma correction proposed in the paper, effectively improves the face recognition performance of the model. The accuracy improves by 0.17 % over the original model on the original dataset, reaching 99.27 % on the LFW dataset. Testing with the AgeDB dataset also shows improved accuracy: compared to the original model, the accuracy improves by 0.15 %, giving a face recognition accuracy of 90.18 %. In contrast to the initial model and dataset, the processing time per image decreases by 4 ms. Compared with the second and third rows there is also a clear improvement, and this result is the best. The ROC curve based on the LFW dataset is shown in Fig. 11.

Fig. 11. ROC curve based on improved models and enhanced datasets.

3.4.3. Discussion
According to the experimental statistical results in Table 3, the performance of the different methods can be compared. The image preprocessing method proposed in the paper (the adaptive Gamma correction method) performs grayscale transformation on face images and enhances image details, better preserving the relative grayscale changes and detail integrity of the image; making images with poor exposure quality clearer effectively enhances the dataset.
At the same time, we improved the MobileFaceNet network by adding the SRM style attention mechanism to the Depthwise stage of the bottleneck module in the traditional network structure. SRM first extracts style information from each channel of the feature map through style pooling, and then estimates the recalibration weight of each channel through channel-independent style integration. By incorporating the relative importance of each style into the feature map, SRM effectively enhances the representation ability of the CNN. The focus is on being lightweight, with very few parameters introduced. From Table 3, we can see that while effectively improving the model's recognition performance, the memory occupied increases by only 0.4 MB, so the module adapts well to the lightweight face recognition network MobileFaceNet and has good application prospects on embedded devices.
Partial zoomed views of the ROC curves of the four experiments in Table 3 are shown in Fig. 12. By comparing the AUC areas of the four experiments, it can be clearly and intuitively seen that the ROC curve obtained from the improved MobileFaceNet combined with the adaptive gamma correction algorithm has a significantly larger AUC area than the other experiments. Finally, the method was validated through experiments on the LFW dataset, and the accuracy of face recognition reaches 99.27 % while the model size does not change significantly. As a comparison, Xiao et al. also introduced an attention module: they introduced the ECA module, learned the importance of each feature channel, and used the h-ReLU6 activation function to replace the PReLU in the original model, resulting in an improved recognition accuracy of 98.52 % on the LFW test set. Hu et al. improved the structure of MobileFaceNet by optimizing the bottleneck module into a sandglass module, improving the relative positions of depthwise convolution and pointwise convolution, and increasing the number of output channels of the sandglass module to reduce information loss during feature compression and enhance facial spatial feature extraction; their method achieves an accuracy of 99.15 % on the LFW test set while effectively compressing the model size. Zaferani et al. also improved MobileFaceNet by training with the margin distillation cost function and, to improve the performance of the model, used dense blocks and depthwise separable convolutions; their method improves accuracy by 0.017 % on the LFW test set. This paper demonstrates that the proposed method's face recognition performance is superior to the above three improved algorithms while maintaining the model size, which has certain practical significance.

Fig. 12. ROC Curve and AUC.

4. Conclusion

In the research process of face recognition, the quality of face images and the feature extraction ability of the network play a crucial role: how effectively features are extracted from face images directly affects the recognition rate. A face recognition method based on the fusion of an improved MobileFaceNet and an adaptive Gamma algorithm is proposed to address the low accuracy caused by external lighting and other factors in face image recognition. By applying the proposed adaptive gamma correction to face images to address issues such as uneven lighting and poor exposure quality, the enhanced image displays more details than the original. Additionally, the traditional MobileFaceNet network model is improved: the SRM module is added to traditional MobileFaceNet to enhance the network's feature extraction capability. Finally, the method was validated experimentally; its face recognition accuracy reaches 99.27 % on the LFW dataset and 90.18 % on the AgeDB dataset, while the memory occupied increases by only 0.4 MB and the processing time per image decreases by 4 ms. Its recognition performance is superior to traditional face recognition algorithms.

In the future, other loss functions will be explored to further improve the verification accuracy of the improved MobileFaceNet model, such as the AdaCos face loss function, whose dynamically adaptive parameters affect the classification probability differently at each iteration, so that a reasonable scaling factor can be generated dynamically according to the convergence of the model to accelerate its convergence. This will further improve the recognition efficiency of lightweight face recognition models.

CRediT authorship contribution statement

Jingwei Li: Formal analysis. Yipei Ding: Writing – review & editing, Methodology. Zhiyu Shao: Data curation. Wei Jiang:
Writing – original draft.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jfranklin.2024.107306.

References

[1] A. Sepas-Moghaddam, F. Pereira, P. Correia, Face recognition: a novel multi-level taxonomy based survey, IET Biom. (2020) 1–12.
[2] S. Du, R. Ward, Face recognition under pose variations, J. Franklin Inst. (2006) 596–613.
[3] V.B.T. Shoba, I.S. Sam, A hybrid features extraction on face for efficient face recognition, Multimed. Tools Appl. 79 (31) (2020) 22595–22616.
[4] A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam, MobileNets: efficient convolutional neural networks for mobile vision applications, arXiv preprint arXiv:1704.04861, 2017.
[5] T. Sheng, C. Feng, S. Zhuo, X. Zhang, L. Shen, M. Aleksic, A quantization-friendly separable convolution for MobileNets, in: 2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), IEEE, 2018, pp. 14–18.
[6] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L.-C. Chen, MobileNetV2: inverted residuals and linear bottlenecks, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 4510–4520.
[7] A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, V. Vasudevan, et al., Searching for MobileNetV3, in: ICCV, 2019, pp. 1314–1324.
[8] J. Hu, L. Shen, G. Sun, Squeeze-and-excitation networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[9] S. Chen, Y. Liu, X. Gao, Z. Han, MobileFaceNets: efficient CNNs for accurate real-time face verification on mobile devices, in: Chinese Conference on Biometric Recognition, 2018, pp. 428–438.
[10] Z. Shi, X. Liu, K. Shi, L. Dai, J. Chen, Video frame interpolation via generalized deformable convolution, IEEE Trans. Multimedia (2021) 426–439.
[11] B. Xu, N. Wang, T. Chen, M. Li, Empirical evaluation of rectified activations in convolutional network, arXiv preprint arXiv:1505.00853, 2015.
[12] J. Xiao, G. Jiang, H. Liu, A lightweight face recognition model based on MobileFaceNet for limited computation environment, EAI Endorsed Trans. Internet Things (2022) 1–9.
[13] H. Yuan, H. Chen, S. Liu, J. Lin, X. Luo, A deep convolutional neural network for detection of rail surface defect, in: Proceedings of the 2019 IEEE Vehicle Power and Propulsion Conference (VPPC), Hanoi, Vietnam, 2019, pp. 1–4.
[14] Q. Wang, B. Wu, P. Zhu, P. Li, W. Zuo, Q. Hu, ECA-Net: efficient channel attention for deep convolutional neural networks, in: CVPR, 2020.
[15] P. Zhang, F. Zhao, P. Liu, M. Li, Efficient lightweight attention network for face recognition, IEEE Access (2022) 31740–31750.
[16] J. Hu, W. Meng, J. Zhao, Face recognition method based on improved MobileFaceNet, Semicond. Optoelectron. (2022) 164–168.
[17] H. Zaferani, K. Kiani, Rastgoo, Real-time face verification on mobile devices using margin distillation, Multimed. Tools Appl. (2023) 44155–44173.
[18] B. Ergen, Facial landmark based region of interest localization for deep facial expression recognition, Tehnički Vjesnik (2022) 38–44.
[19] S.H. Wan, Y. Xia, L.Y. Qi, Y.H. Yang, M. Atiquzzaman, Automated colorization of a grayscale image with seed points propagation, IEEE Trans. Multimedia (2020) 1756–1768.
[20] Z.K. Zhuang, Y.S. Yin, X.X. Sun, L. Lei, Design of a wide voltage swing high-precision gamma correction circuit for AMOLED, Chin. J. Liq. Cryst. Dis. (2021) 529–537.
[21] C.F.J. Kuo, H.C. Wu, Gaussian probability bi-histogram equalization for enhancement of the pathological features in medical images, Int. J. Imaging Syst. Technol. (2019) 132–145.
[22] J. Ao, C.B. Ma, Adaptive stretching method for underwater image color correction, Int. J. Pattern Recogn. (2018) 1–14.
[23] M.I. Ashiba, H.I. Ashiba, M.S. Tolba, A.S. El-Fishawy, F.E. Abd El-Samie, An efficient proposed framework for infrared night vision imaging system, Multimed. Tools Appl. (2020) 23111–23146.
[24] S.P. James, D.A. Chandy, Devignetting fundus images via Bayesian estimation of illumination component and gamma correction, Biocybern. Biomed. Eng. (2021) 1071–1092.
[25] H.Z. Zhang, J.S. Liang, H.B. Jiang, Y.F. Cai, X. Xu, Lane line recognition based on improved 2D-gamma function and variable threshold Canny algorithm under complex environment, Meas. Control (2020) 1694–1708.
[26] A.C. Maioli, A.G.M. Schmidt, Exact solution to the Lippmann-Schwinger equation for an elliptical billiard, Phys. (2019) 51–62.
[27] H.L. Zhang, A. Jolfaei, M.A. Alazab, A face emotion recognition method using convolutional neural network and image edge computing, IEEE Access (2019) 159081–159089.
[28] Z.X. Huang, R. Huang, B.H. Zhao, C. Su, H. Du, Gamma correction with adjustable segmentation for OLED-on-silicon microdisplay, Chin. J. Liq. Cryst. Dis. (2020) 825–830.
[29] A. Saad, J. Ahmed, A. Elaraby, Classification of bird sound using high- and low-complexity convolutional neural networks, (2022) 187–193.
[30] X.Z. Xu, Y.Y. Ding, Z.H. Lv, Z.N. Li, Optimized pointwise convolution operation by ghost blocks, Electron. Res. Arch. (2023) 3187–3199.
[31] V. Ayumi, E. Ermatita, A. Abdiansah, H. Noprisson, Y. Jumaryadi, M. Purba, M. Utami, E.D. Putra, Transfer learning for medicinal plant leaves recognition: a comparison with and without a fine-tuning strategy, Int. J. Adv. Comput. (2022) 138–144.
[32] S.H. Wang, M.A. Khan, V. Govindaraj, S.L. Fernandes, Z.Q. Zhu, Y.D. Zhang, Deep rank-based average pooling network for Covid-19 recognition, CMC-Comput. Mater. Con. (2022) 2797–2813.
[33] S. Bera, V.K. Shrivastava, Effect of pooling strategy on convolutional neural network for classification of hyperspectral remote sensing images, IET Image Process. (2020) 480–486.
[34] J. Crnjanski, M. Krstic, A. Totovic, N. Pleros, D. Gvozdic, Adaptive sigmoid-like and PReLU activation functions for all-optical perceptron, Opt. Lett. (2021) 2003–2006.
[35] S. Watanabe, H. Yamana, Overfitting measurement of convolutional neural networks using trained network weights, Int. J. Data Sci. Anal. (2022) 261–278.
[36] F. Peng, L.P. Yin, L.B. Zhang, M. Long, CGR-GAN: CG facial image regeneration for antiforensics based on generative adversarial network, IEEE Trans. Multimedia (2020) 2511–2525.
[37] K. Zaitsu, S. Noda, T. Ohara, T. Murata, S. Funatsu, K. Ogata, A. Ishii, A. Iguchi, Optimal inter-batch normalization method for GC/MS/MS-based targeted metabolomics with special attention to centrifugal concentration, Anal. Bioanal. Chem. (2019) 6983–6994.
