Computer Vision2
Article info
Article history: Received 28 April 2020; Accepted 15 June 2020
Keywords: Convolutional Neural Networks (CNN); Local Binary Patterns (LBP); Ensemble learning; Face recognition

Abstract
Background and Objective: The success rate of face recognition is affected by illumination, expression, posture changes, and other factors, owing to the low generalization ability of a single convolutional neural network. A new face recognition method based on parallel ensemble learning of convolutional neural networks (CNN) and local binary patterns (LBP) is proposed to solve this problem. It also helps to improve the low pedestrian detection rate caused by occlusion.
Methods: First, the LBP operator is employed to extract features of the face texture. After that, 10 convolutional neural networks with 5 different network structures are trained to extract further features and optimize the network parameters, and each produces a classification result with the Softmax function after the fully connected layer. Finally, parallel ensemble learning is used to generate the final face recognition result by majority voting.
Results: With this method, the recognition rates on the ORL and Yale-B face datasets increase to 100% and 97.51%, respectively. The experiments show that the proposed approach not only improves tolerance to illumination, expression, and posture, but also improves the accuracy of face recognition and alleviates the poor generalization performance of the model, which is normally caused by the learning algorithm being trapped in a local minimum. Moreover, the proposed method is combined with a pedestrian detection model to form a hybrid model, and the results show that the detection rate is improved by 11.2%.
Conclusion: In summary, the proposed approach outperforms the other methods compared on both face datasets.
© 2020 Elsevier B.V. All rights reserved.
https://doi.org/10.1016/j.cmpb.2020.105622
0169-2607/© 2020 Elsevier B.V. All rights reserved.
2 J. Tang, Q. Su and B. Su et al. / Computer Methods and Programs in Biomedicine 197 (2020) 105622
The convolutional neural network can detect facial features directly from an image, but it also learns the image noise at the same time. Both HOG and LBP can preprocess face images and reduce noise interference, making the image features more distinct. HOG focuses on the presentational and shape features of the target, whereas LBP extracts its texture features; LBP is therefore better suited to extracting facial features. In this paper, LBP is employed to extract facial texture features and thereby reduce the influence of illumination and expression, and CNNs with skip connections are used for parallel convolution processing, so as to reduce the training time and improve the classification accuracy. Meanwhile, by introducing parallel ensemble learning and connecting two or more convolutional neural networks with different structures in parallel for face recognition, the diversity of the ensemble members is improved and the generalization ability of the network is enhanced. A face recognition method based on parallel ensemble learning of LBP and CNN is thus proposed in this paper, which effectively improves face recognition accuracy.

Two experiments are conducted. First, the ORL [8] and Yale-B [13] face datasets are used to test the accuracy of the proposed CNN model on the face recognition problem. In detail, PCA, HOG-CNN, CNN and the proposed method are compared, and the results demonstrate that our method outperforms the others. Second, our face recognition method is combined with a CNN model for pedestrian detection; we call this the hybrid model. With the proposed face recognition model, the hybrid model improves the pedestrian detection accuracy, which is otherwise degraded by occlusion. Intuitively, the face is a small part of the whole person in an image. Therefore, even if half of the person is occluded, the pedestrian can still be detected as long as his/her face is clear.

2. Literature review

Tahira et al. [1] used PCA to reduce the dimensionality of face images for face recognition. Face image acquisition is influenced by illumination, expression and posture, which produces large differences between images of the same individual and reduces the face recognition rate [2]. The face recognition approach based on local binary patterns introduced by Ahonen et al. [3] divides a face image into several regions for recognition. The approach based on the multi-direction local binary pattern proposed by Liu et al. [4] acquires the feature vector by replacing a single pixel with the regional mean value, which not only introduces the spatial information of the whole image, but also reduces the dimensions of the image. In this way, the face recognition rate under complex illumination is significantly improved. The face recognition method based on a hybrid model of CNN and LBP proposed by Wang et al. [5] can effectively overcome the poor grayscale stability of CNN and reduce the influence of illumination, expression and posture changes. The approach based on a hybrid model of HOG and CNN proposed by Ahamed et al. [6] performs face recognition by feeding the shape of the target into a CNN.

There are still some problems in the above methods. For example, the recognition rate of PCA drops greatly when illumination intensity and posture change. HOG is more focused on extracting the presentational and shape features of the target than specific facial feature points. A large number of experiments show that increasing the depth and width of the network improves the accuracy. However, as the convolutional neural network deepens, the number of parameters and the amount of computation grow sharply. A deeper network can lead to a rapid saturation of accuracy, and after saturation, a higher error rate occurs as the number of training iterations increases [7].

Therefore, parallel ensemble learning based on LBP and CNN is proposed in this paper to extract face image features, which reduces the depth of the convolutional neural network, improves the accuracy of classification and reduces the influence of illumination, expression and posture changes. The main difference between our approach and existing approaches is that ours first uses LBP to extract facial features. Afterwards, the features are fed into ResNet and classified, where ResNet is a cutting-edge CNN model for classifying images.

3. Face recognition based on parallel ensemble learning of LBP and CNN

This paper studies face recognition based on parallel ensemble learning of LBP and CNN. LBP is first employed to analyze the texture of the input images. CNN is then used to obtain the facial features of the LBP-processed images. Finally, parallel ensemble learning is used to improve the poor generalization performance of the CNN caused by the learning algorithm being trapped in a local minimum, so as to better distinguish different faces. The implementation flow of the proposed approach is given in Fig. 1, and the implementation is described in detail as follows.

4. Local Binary Pattern (LBP)

The local binary pattern, with its simple principle, low computational complexity, grayscale invariance, and illumination insensitivity, can extract the texture features of images and fuse the overall features of an image.

In this paper, the LBP model improved by Ojala [8] and other researchers is used, which obtains the texture features of images by varying the radius of the circle and the number of sampling pixels on it. Bilinear interpolation is used to obtain the gray value of a sampling point that does not fall at the center of a pixel, which makes the algorithm more robust. In the improved LBP, the radius and the number of pixels are denoted by R and P, respectively. The LBP algorithm for different radii and numbers of pixels is demonstrated in Fig. 2.

The basic idea of LBP is to compare the gray value of every neighboring pixel with that of the center pixel. Taking a radius of 1 and 8 neighboring pixels as an example, the center point is the base point, and its gray value is compared with the gray values of the 8 pixels in its neighborhood. If the gray value of a neighboring pixel is greater than or equal to that of the center pixel, that neighbor is marked 1; otherwise, it is marked 0. The LBP-based image feature extraction procedure is shown in Fig. 3. The formula of LBP can be expressed as:

$$\mathrm{LBP}_{P,R}(x, y) = \sum_{n=0}^{P-1} 2^{n}\, s(i_n - i_{x,y}) \tag{1}$$

$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{2}$$

Here, $\mathrm{LBP}_{P,R}(x, y)$ represents the LBP texture feature with center pixel $(x, y)$, radius $R$, and $P$ neighboring pixels; $i_n$ represents the gray value of the $n$th neighboring pixel, and $i_{x,y}$ represents the gray value of the center pixel.

The texture features extracted by LBP are insensitive to changes in lighting, and they are less affected by illumination variations than the original image. Facial features of the same individual under various illumination intensities are extracted by LBP, and the effect is shown in Fig. 4. In addition, as shown in Fig. 4, LBP not only captures the facial features with little influence from illumination, but also preserves adequate details.
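As an illustration of Eqs. (1) and (2), the following is a minimal NumPy sketch of the circular LBP operator with bilinear interpolation. It is not the authors' implementation: the function names (`bilinear`, `lbp_code`, `lbp_image`), the neighbor ordering (starting at the right-hand neighbor and going counter-clockwise), the small tolerance `eps`, and the choice to skip border pixels are all assumptions made for this example.

```python
import math
import numpy as np


def bilinear(img, y, x):
    """Gray value at a fractional position (y, x), by bilinear interpolation."""
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1 = min(y0 + 1, img.shape[0] - 1)
    x1 = min(x0 + 1, img.shape[1] - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * img[y0, x0] + dx * img[y0, x1]
    bottom = (1 - dx) * img[y1, x0] + dx * img[y1, x1]
    return (1 - dy) * top + dy * bottom


def lbp_code(img, y, x, P=8, R=1.0, eps=1e-9):
    """LBP_{P,R} at the center pixel (row y, column x), following Eqs. (1)-(2)."""
    center = img[y, x]
    code = 0
    for n in range(P):
        angle = 2.0 * math.pi * n / P
        # Assumed ordering: n = 0 is the right-hand neighbor, then counter-clockwise.
        # Offsets are rounded so axis-aligned neighbors land exactly on pixel centers.
        off_y = round(-R * math.sin(angle), 6)
        off_x = round(R * math.cos(angle), 6)
        i_n = bilinear(img, y + off_y, x + off_x)
        # s(i_n - i_{x,y}) from Eq. (2); eps only absorbs floating-point round-off.
        s = 1 if i_n - center >= -eps else 0
        code += (2 ** n) * s
    return code


def lbp_image(gray, P=8, R=1.0):
    """LBP code for every pixel whose whole neighborhood lies inside the image."""
    img = np.asarray(gray, dtype=float)
    m = int(math.ceil(R))  # border margin that is skipped (assumption)
    h, w = img.shape
    out = np.zeros((h - 2 * m, w - 2 * m), dtype=np.int64)
    for y in range(m, h - m):
        for x in range(m, w - m):
            out[y - m, x - m] = lbp_code(img, y, x, P, R)
    return out


if __name__ == "__main__":
    patch = np.array([[1, 9, 1],
                      [2, 5, 7],
                      [1, 3, 1]])
    print(lbp_image(patch, P=4, R=1.0))
    # Adding a constant brightness offset leaves the codes unchanged.
    print(lbp_image(patch + 50, P=4, R=1.0))
```

Because Eq. (1) depends only on the sign of the differences $i_n - i_{x,y}$, adding the same offset to every pixel does not change the code, which is one way to see the insensitivity to illumination described above.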
Fig. 1. Flow Chart for the Implementation Method Introduced in the Paper.
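The last stage of the flow, in which each network produces a Softmax classification and the ensemble decides by majority voting, can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names, the array layout `(n_networks, n_samples, n_classes)`, and the tie-breaking rule (the lowest class index wins) are not taken from the paper, which does not specify them.

```python
import numpy as np


def softmax(logits):
    """Row-wise Softmax over the last axis, shifted for numerical stability."""
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)


def majority_vote(labels, n_classes):
    """Combine per-network labels of shape (n_networks, n_samples) by voting.

    Ties go to the lowest class index (an assumption of this sketch).
    """
    labels = np.asarray(labels)
    winners = []
    for sample_votes in labels.T:  # votes of all networks for one sample
        counts = np.bincount(sample_votes, minlength=n_classes)
        winners.append(int(np.argmax(counts)))
    return np.array(winners)


def ensemble_predict(logits_per_network):
    """logits_per_network: array of shape (n_networks, n_samples, n_classes)."""
    probs = softmax(np.asarray(logits_per_network, dtype=float))
    labels = np.argmax(probs, axis=-1)  # one predicted label per network and sample
    return majority_vote(labels, n_classes=probs.shape[-1])


if __name__ == "__main__":
    # Three hypothetical networks, two samples, three classes.
    logits = np.array([
        [[2.0, 1.0, 0.0], [0.0, 0.0, 3.0]],
        [[0.0, 5.0, 0.0], [0.0, 0.0, 1.0]],
        [[4.0, 1.0, 1.0], [0.0, 2.0, 1.0]],
    ])
    print(ensemble_predict(logits))
```

In the paper's setting the first axis would hold the 10 trained networks; the voting step itself is independent of how the individual networks are built.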
Fig. 3. Calculation Process with the LBP Algorithm.

famous Batch Normalization [10] for skip connection [7] via CNN. The network structure in Fig. 5 is one of the CNN models used in this paper, which consists of four convolutional layers, five maximum pooling layers and one stacking model of Inception modules.
Table 1
Convolutional Neural Network Structure.

     Type  Kernel Size   Type  Kernel Size   Type  Kernel Size   Type  Kernel Size   Type  Kernel Size
1    C1    3 × 3         C1    3 × 3         C1    3 × 3         C1    3 × 3         C1    3 × 3
2    P1    2 × 2         P1    2 × 2         P1    2 × 2         P1    2 × 2         P1    2 × 2
3    C2    3 × 3         C2    3 × 3         C2    3 × 3         C2    3 × 3         C2    3 × 3
4    P2    2 × 2         P2    2 × 2         P2    2 × 2         P2    2 × 2         P2    2 × 2
5    I_C1  1 × 1         I_C1  1 × 1         I_C1  1 × 1         I_C1  1 × 1         I_C1  1 × 1
     I_C2  3 × 3         I_C2  3 × 3         I_C2  3 × 3         I_C2  3 × 3         I_C2  3 × 3
     I_C3  5 × 5         I_C3  5 × 5         I_C3  5 × 5         I_C3  5 × 5         I_C3  3 × 3
     I_P1  3 × 3         I_P1  3 × 3         I_P1  3 × 3         I_P1  3 × 3         I_P1  3 × 3
6    P3    2 × 2         P3    2 × 2         P3    2 × 2         P3    2 × 2         P3    2 × 2
7    C3    1 × 3         C3    1 × 3         C3    1 × 3         P4    2 × 2         P4    2 × 2
8    C4    3 × 1         C4    3 × 1         C4    3 × 1         C3    1 × 3         C3    1 × 3
9    P4    2 × 2         P4    2 × 2         P4    2 × 2         C4    3 × 1         C4    3 × 1
10   FC    –             FC    –             C5    1 × 3         P5    2 × 2         P5    2 × 2
11   S     –             S     –             C6    3 × 1         FC    –             FC    –
12                                           P5    2 × 2         S     –             S     –
13                                           FC    –
14                                           S     –
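The layer sequences of Table 1 can also be written down as plain data, which makes it easy to check a statement such as "four convolutional layers, five maximum pooling layers and one stacking model of Inception modules" against each column. The sketch below is only a transcription aid: the name `STRUCTURES`, the helper `count_layers`, and the numbering of the columns 1 to 5 from left to right are assumptions of this example, and the Inception block (I_C1, I_C2, I_C3, I_P1) is collapsed into a single entry.

```python
# Layer sequences transcribed from Table 1, one list per column (left to right).
# "C" = convolution, "P" = max pooling, "FC" = fully connected, "S" = Softmax.
COMMON_HEAD = ["C1", "P1", "C2", "P2", "Inception", "P3"]

STRUCTURES = {
    1: COMMON_HEAD + ["C3", "C4", "P4", "FC", "S"],
    2: COMMON_HEAD + ["C3", "C4", "P4", "FC", "S"],
    3: COMMON_HEAD + ["C3", "C4", "P4", "C5", "C6", "P5", "FC", "S"],
    4: COMMON_HEAD + ["P4", "C3", "C4", "P5", "FC", "S"],
    # Column 5 has the same sequence as column 4; in Table 1 it differs only in
    # the I_C3 kernel inside the Inception block (3 x 3 instead of 5 x 5).
    5: COMMON_HEAD + ["P4", "C3", "C4", "P5", "FC", "S"],
}


def count_layers(layers):
    """Count convolutional, pooling and Inception entries outside the Inception block."""
    return {
        "conv": sum(1 for name in layers if name.startswith("C")),
        "pool": sum(1 for name in layers if name.startswith("P")),
        "inception": layers.count("Inception"),
    }


if __name__ == "__main__":
    for index, layers in STRUCTURES.items():
        print(index, count_layers(layers))
```

Under this transcription, the description of Fig. 5 (four convolutional layers, five pooling layers, one Inception stack) matches the fourth and fifth columns.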
Table 3
Recognition Results of Each Method in ORL Face Image Dataset.

Method             Recognition Rate /%
Scheme 1           92.00
Scheme 2           99.50
Scheme 3           93.50
Scheme 4           96.6
Proposed Method    100.00

Table 4
Recognition Results of Each Method in Yale-B Face Image Dataset.

that of PCA, is over 30% higher than that of HOG-CNN and is slightly higher than that of LBP-CNN. Because the illumination variation in the Yale-B face image dataset is relatively large, the recognition rate is reduced when the original image features are used. However, the proposed approach adopts LBP to extract texture features, which greatly reduces the influence of illumination changes. The proposed method is more robust to changes in illumination than the three methods compared, and its extraction of face detail features and its recognition rate are much better than those of Scheme 3.

In summary, the proposed method in this paper is superior to the three contrast schemes on both datasets. Compared with Schemes 1, 2 and 3, the proposed method has stronger adaptability and higher accuracy in face recognition situations with large changes in illumination and expression.

For the identification of the ORL and Yale-B face image datasets, the average elapsed time of the proposed method is shown in Table 5. The recognition time is less than 40 ms in all cases, which meets the real-time processing requirements.

8. Conclusion

The face recognition approach based on the parallel ensemble learning of LBP and CNN introduced in this paper adopts LBP to extract texture features as training data for the parallel CNNs, and is finally applied to face recognition. LBP is mainly used to extract facial texture features, because it can reduce the influence of illumination on facial features and improve the face recognition accuracy. In the CNN, the Inception module is used to increase the width of the network, Batch Normalization is employed to reduce the training time, and the skip connection is employed to improve the accuracy of face recognition. With parallel ensemble learning, the method no longer relies on a single network structure, which greatly improves the accuracy and generalization ability of the proposed approach. In the experiments, we compared the proposed approach with three other methods: PCA, HOG-CNN and CNN. The final results show that the proposed approach is more effective in face recognition, and its recognition accuracy is promising.

References

[1] Z. Tahira, H.M. Asif, Effect of averaging techniques on PCA algorithm and its performance evaluation in face recognition applications, in: International Conference on Computing, Electronic and Electrical Engineering (ICE Cube), 2018, pp. 1–6.
[2] D.N. Parmar, B.B. Mehta, Face recognition methods & applications, CoRR abs/1403.0485 (2014).
[3] T. Ahonen, A. Hadid, M. Pietikainen, Face Description with Local Binary Patterns: application to Face Recognition, IEEE Trans. Pattern Anal. Mach. Intell. 28 (12) (2006) 2037–2041.
[4] J. Liu, Y. Chen, S. Sun, Face recognition based on multi-direction local binary pattern, in: 3rd IEEE International Conference on Computer and Communications (ICCC), 2017, pp. 1606–1610.
[5] M. Wang, Z. Wang, J. Li, Deep convolutional neural network applies to face recognition in small and medium databases, in: 4th International Conference on Systems and Informatics (ICSAI), 2017, pp. 1368–1372.
[6] H. Ahamed, I. Alam, M.M. Islam, HOG-CNN based real time face recognition, in: International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE), 2018, pp. 1–4.
[7] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[8] T. Ojala, M. Pietikainen, T. Maenpaa, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (7) (2002) 971–987.
[9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S.E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9.
[10] S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing internal covariate shift, in: International Conference on Machine Learning (ICML), 2015, pp. 448–456.
[11] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2818–2826.
[12] CBCL pedestrian database, website: http://cbcl.mit.edu/software-datasets/PedestrianData.html. Last accessed: 10/27/2019.
[13] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005, pp. 886–893.