Proof: An Efficient Fuzzy Deep Learning Approach To Recognize 2D Faces Using Fadf and Resnet-164 Architecture
DOI:10.3233/JIFS-211114
IOS Press
K. Seethalakshmi a,*, S. Valli b, T. Veeramakali c, K.V. Kanimozhi d, S. Hemalatha e and M. Sambath f
a Department of Computer Science and Engineering, Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Chennai, Tamil Nadu, India
b Department of Computer Science and Engineering, College of Engineering, Anna University, Chennai,
c Department of Data science and Business Systems, School of Computing, SRM Institute of Science
Abstract. Fuzzy-based deep learning is highly modular and accurate. An Adaptive Fuzzy Anisotropy Diffusion Filter (FADF) is used to remove noise from the image while preserving edges and lines and improving the smoothing effect. Edge and noise information is detected through pre-edge detection using fuzzy contrast enhancement, post-edge detection using a fuzzy morphological gradient filter, and a noise detection technique. A Convolution Neural Network (CNN) with the ResNet-164 architecture is used for automatic feature extraction. The resultant feature vectors are classified using ANFIS deep learning. The top-1 error rate is reduced from 21.43% to 18.8%, and the top-5 error rate is reduced to 2.68%. The proposed work achieves a high accuracy rate at low computation cost: a recognition rate of 99.18% and an accuracy of 98.24% are achieved on standard datasets. Experimental results are better than those of the existing techniques on the FACES 94, FERET, Yale-B, CMU-PIE, and JAFFE datasets and other state-of-the-art datasets.
Keywords: Fuzzy anisotropy diffusion, edge detection, contrast enhancement, CNN (ResNet), feature extraction, ANFIS
deep learning
and classifying them. Preprocessing enhances the contrast of the image, eliminates noise from the image through filters, and preserves the edge and line information. A number of noise removal methods are available to deal with Gaussian noise, salt-and-pepper noise and Poisson noise. The Wiener filter, median filter, and linear filter are used for removing noise.

The convolution neural network (CNN) overcomes the drawback of the artificial neural network (ANN), which is computationally expensive. A CNN [2] automatically extracts features from the image while reducing the number of parameters required to set up the model. It allows image-specific features to be encoded into the architecture, making it appropriate for image-focused tasks. Common layers such as hidden layers, pooling layers, and convolution layers are stacked as a neural network to form a CNN. Stacking multiple hidden layers on top of each other is called deep learning. CNN architectures include Vgg-vd16 [4], Vgg-vd19 [5], ResNet-101 [6] and AlexNet [3]. One of these architectures is selected based on the system performance and the error rate. In this work ResNet-101 architecture is used to evaluate the accuracy of the system.

Feature extraction and dimensionality reduction for face recognition have been attempted using LBP [7], RLBP [9], texture (GLCM) [13], local directional ternary pattern [14], sparse manifold subspace learning [8], PCA [10], LDA [11] and ADA [12]. Classification has been achieved using machine learning methods such as K-NN [15], SVM [16], and neural networks [17] to identify the correct individual among the testing images. Deep learning is an extension of neural networks from machine learning concepts.

The proposed work removes noise using fuzzy anisotropy diffusion and preserves the edges by despeckling the noise. The preprocessed image is given as input to the convolution neural network ResNet-164 architecture to extract the feature vector. Finally, fuzzy min-max hyperbox deep learning is used for classification. The related works are presented next.

2. Related works

A deep convolution neural network, HyperFace, addresses four challenging tasks: face detection, landmark localization, pose estimation and gender recognition [1]. HyperFace is divided into two variants, HyperFace-ResNet and Fast-HyperFace, to accomplish state-of-the-art performance and a high face indicator to increase the rapidity of the algorithm. They capture both local and global information, and the experimental results are significantly better than those of the other competing algorithms. The AFLW face dataset and the Pascal face dataset are compared.

A CNN has convolutional layers, pooling layers and fully connected layers [18]. It is a type of deep learning model that learns very fast. A weighted mixture deep neural network automatically extracts features for facial expression recognition [19]. The authors used the CK+ [35], JAFFE [37], and Oulu-CASIA [36] datasets and obtained recognition accuracies of 0.970, 0.922, and 0.923, respectively.

A CNN architecture with five convolution layers followed by one max pooling layer and three fully connected layers followed by Softmax is presented in [3]. GPUs and non-saturating neurons are used to improve the performance of the system. Dropout is adopted in the fully connected layers to reduce overfitting. The ImageNet LSVRC dataset is used in the implementation. They achieved a top-5 error rate of 15.3%, compared to an error rate of 26.2% for the next entry.

The authors of [4] address two problems: first, how a large-scale dataset can be accumulated by a combination of automation and humans in the loop; second, the complexity of deep neural networks. They also discuss data purity, performance rate and time complexity. The standard LFW and YTF face benchmark datasets are used in the implementation. They achieved an accuracy of 98.95% on LFW and 97.3% on YTF.

A very deep neural network architecture reduces the top-1 and top-5 error rates using very small 3 × 3 convolution filter layers [5]. The ConvNet architecture achieved a top-1 error rate of 25.5% and a top-5 error rate of 8.0%. The ImageNet dataset is used in the implementation, and the results are better than the existing state-of-the-art results.

The deep residual learning (ResNet) architecture is implemented so that deeper convolution neural networks can be trained than with the existing networks [6]. The ImageNet dataset is used with 152 layers, which is 8× deeper than VGG Net, with 3 × 3 convolutions with 512 filters followed by average pooling and a 1000-way fully connected layer. An error rate of 3.57% was obtained, which is the best when compared to the existing results.

A fuzzy deep neural network with sparse autoencoder (FDNNSA) is used to understand human intention based on human emotions and information such as age, gender, and region, in which the fuzzy
K. Seethalakshmi et al. / An efficient fuzzy deep learning approach to recognize 2D faces 3
C-means (FCM) is used to cluster the input data and FDNNSA is used to detect the intention of the human [38]. The feature extraction ability is improved by a fuzzy technique that removes redundancy: a fuzzy removing redundancy restricted Boltzmann machine (F3RBM) is developed, and its features are imported into an SVM, which attains fast and high-precision automatic classification of dissimilar samples [39].

The proposed system enhances the contrast and preserves the edges, thereby improving the quality. The performance and accuracy of the system are increased by deep learning. The next section gives the block diagram of the proposed system.

3. Proposed work

The proposed system has three stages. Preprocessing is done through FADF to improve the quality of the image. Features are extracted using the CNN ResNet-164 architecture, and a deep fuzzy classifier is used in

done such as image preprocessing, fuzzy inference system and diffusion iteration. Image preprocessing is divided into edge detection and noise removal. Edge detection is a two-step process: pre-edge detection is done through a fuzzy contrast enhancement technique, and post-edge detection is done by a morphological gradient operation to improve the performance of the edge detection technique. Noise detection is done using an adaptive median filter. The membership values obtained from edge detection and noise removal are used in the fuzzy inference system.

3.1.1. Fuzzy preprocessing
A. Pre-edge detection using fuzzy contrast equalization
Fuzzy contrast equalization is used to improve the brightness of the input image based on its intensity values. The levels low, medium and high serve as the membership functions, and A, B, and C are the fuzzy rules applied to improve the contrast of the input image. Initially the fuzzy limits are set based on the membership functions, then the fuzzy rules are applied, and finally
Next, the resultant output image is given as input to edge detection using fuzzy logic.

$$ F = \bigcup_{m=1}^{M} \bigcup_{n=1}^{N} \frac{\mu_{mn}}{g_{mn}} \quad \text{with } \mu_{mn} \in [0, 1] \tag{1} $$

In Equation (1), M × N is the number of pixels in the image, g is the gray term based on the brightness, $g_{mn}$ is the intensity value of the (m,n)th pixel, and $\mu_{mn}$ is the membership value used to enhance the brightness of the image.

B. Edge detection using fuzzy logic
After fuzzy contrast equalization the edge detec-

Fig. 3. Edge detection using fuzzy rules.

C. Post edge detection using fuzzy morphological gradient
The fuzzy morphological gradient is applied to smooth the detected edge. The morphological closing operation, dilation followed by erosion, smooths the edges over several iterations. The iterations are carried out for different dilation depths. Morphological smoothing is done by the opening operation followed by the closing operation, which removes the dark and bright noise artifacts. Here, dark and bright are the membership functions. Figure 4 illustrates the image obtained after performing the fuzzy morphological gradient operation.

Fig. 4. Result obtained using morphological operation.

edge feature.

D. Noise Removal
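The text states that noise detection uses an adaptive median filter but gives no parameters. As a rough illustration only, the following Python sketch implements the textbook adaptive median filter for impulse noise on a 2D list of gray levels. The function name `adaptive_median_filter`, the `max_window` limit of 7, and the clipping of the window at the image border are assumptions made for this example, not details of the proposed system.

```python
from statistics import median_low


def adaptive_median_filter(image, max_window=7):
    """Textbook adaptive median filter on a 2D list of gray levels.

    max_window is an assumed upper bound on the (odd) window size.
    """
    rows, cols = len(image), len(image[0])
    output = [row[:] for row in image]
    for r in range(rows):
        for c in range(cols):
            window = 3
            while True:
                half = window // 2
                # Collect the neighbourhood, clipped at the image border.
                values = [
                    image[i][j]
                    for i in range(max(0, r - half), min(rows, r + half + 1))
                    for j in range(max(0, c - half), min(cols, c + half + 1))
                ]
                z_min, z_max = min(values), max(values)
                z_med = median_low(values)
                if z_min < z_med < z_max:
                    # The median is not an impulse: replace the pixel only
                    # if the pixel itself looks like an impulse.
                    if not (z_min < image[r][c] < z_max):
                        output[r][c] = z_med
                    break
                # The median may be an impulse: try a larger window.
                window += 2
                if window > max_window:
                    output[r][c] = z_med
                    break
    return output
```

On a flat 5 × 5 patch with a single bright impulse in the centre, the impulse is replaced by the surrounding gray level while the other pixels are left unchanged.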
3.1.3. Diffusion coefficient iteration
The defuzzification output is the fuzzy coefficient used to control the diffusion coefficient during the iteration. The gradient and the degree of edge and noise are used to control the speed of the iteration. When the number of iterations is small, information in the image is lost, whereas more iterations give more information about the image.

3.2. Convolution neural network (CNN)
The obtained edge and contrast features are given as input to the CNN architecture consisting of resid-

When the number of layers is increased, degradation occurs. But overlapping and over fitting

tion is performed along with zero padding operation, which increases the dimension without adding any extra parameter. Second, the projection shortcut operation is performed to map the dimensionality vector. For both shortcuts the common stride value is taken as 2.

In Fig. 3, F(X) + X performs the feed forward neural network operation with shortcut connection. In this work, however, the shortcut operation performs the identity mapping. The building block is defined using Equation (3).

$$ y = F(x, \{W_i\}) + x \tag{3} $$

$$ y = F(x, \{W_i\}) + W_s x \tag{4} $$
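To make the two shortcut forms in Equations (3) and (4) concrete, here is a minimal Python sketch of a residual building block on plain vectors. It assumes, as in the common ResNet formulation, that F consists of two weight layers with a ReLU between them; the helper names `relu`, `matvec` and `residual_block` and the use of dense matrices instead of convolutions are simplifications for illustration, not the architecture used in this work.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2, ws=None):
    """Return y = F(x, {W_i}) + x, or y = F(x, {W_i}) + W_s x if ws is given."""
    # Residual mapping F: two weight layers with a ReLU in between (assumed).
    f = matvec(w2, relu(matvec(w1, x)))
    # Identity shortcut when dimensions match, projection shortcut otherwise.
    shortcut = x if ws is None else matvec(ws, x)
    return [fi + si for fi, si in zip(f, shortcut)]
```

With zero weights in F the block reduces to the shortcut alone, which is the property that lets very deep stacks fall back to an identity mapping.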
Fig. 6. ResNet architecture for different input images with 19 parameter layers and shortcut for each layer.
the combination of both neural networks and fuzzy logic principles. It has the ability to identify the advantages of both in a single framework. It consists of a set of fuzzy IF–THEN rules. Hence, ANFIS is used to classify the features obtained from the ResNet-164 architecture.
The ANFIS deep classifier produces better results than existing machine learning algorithms such as SVM and K-NN. The next section discusses the results and compares them with existing works.

$$ \text{accuracy} = \frac{tp + tn}{tp + fp + fn + tn} \tag{5} $$

4.1. Results based on accuracy

The proposed DCNN ResNet-164 architecture model provides a better accuracy rate than the existing approaches [23, 27]. The authors of [27] achieved accuracy rates of 90.58%, 90.02% and 90.58% for the Yale-B dataset with illuminated images, JAFFE with pose variation images and CMU-PIE with expression variation images, respectively. This work raises the accuracy to 95.62%, 97.51% and 97.23%. Biao Yang et al. [23] achieved accuracies of 94.98% and 90.86% on the CK+ and JAFFE datasets using CNN-VGG. Our approach achieved accuracies of 98.24% and 97.51% on CK+ and JAFFE. Table 1 also gives the accuracy obtained by CNN

Table 1
Accuracy Rate Obtained from Different Deep Learning Models

Methods                                        Accuracy (%)
                                               Yale-B   CK+     JAFFE   CMU-PIE
CNN [23]                                       –        93.12   88.92   –
p-CNN [27]                                     90.58    –       90.02   90.58
CNN-VGG [23]                                   –        94.98   90.86   –
gACNN [30]                                     –        81.07   –       –
Proposed Fuzzy Deep ResNet-164 architecture    95.62    98.24   97.51   97.23
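The accuracy figures in this section are ratios of confusion-matrix counts. As a small worked sketch, the Python function below computes accuracy from the counts tp, fp, fn and tn, together with precision and recall in their standard forms. The function name `classification_metrics` and the convention of returning 0.0 for an empty denominator are assumptions for this example, and the counts in the usage line are made-up numbers, not results from this work.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    precision = ratio(tp, tp + fp)  # standard definition
    recall = ratio(tp, tp + fn)
    return accuracy, precision, recall


# Illustrative counts only.
accuracy, precision, recall = classification_metrics(tp=90, fp=5, fn=10, tn=95)
```

For these illustrative counts the accuracy is 185/200, the precision 90/95 and the recall 90/100.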
4.2. Comparison of error rate

The top-1 error rate is the percentage of test images for which the target class is not the first prediction. The top-5 error rate is the percentage of test images for which the target class is not among the first five predictions. Table 2 compares the top-1 and top-5 error rates. The ResNet model is tested and validated according to the error rate obtained. K. He et al. [6] achieved a 21.43% top-1 error rate and a 5.71% top-5 error rate. This approach reduced the top-1 error rate to 18.8% and the top-5 error rate to 3.03% for the training dataset. Increasing the number of layers along with the shortcut parameter reduced the top-5 error rate further to 2.68%. Equation (6) is used for

Table 2
Error Rate (%) from Top-1 to Top-5 for Validation

Model                  Top-1 Error   Top-5 Error
VGG-16 [24]            28.07         9.33
Google Net [25]        –             9.15
PReLu-Net [26]         24.27         7.38
ResNet-152 [6]         21.43         5.71
Proposed ResNet-164    18.8          3.03

The VGG classifier used by the authors of [22] is compared with the proposed DCNN ResNet-164 architecture model on the Extended Yale-B and CMU-PIE datasets. In the proposed work, the preprocessing model, along with ResNet, improves the recognition rate of the system. Anisotropy diffusion filter is

Table 3
Recognition Rate Based On Classifier and Preprocessing

Methods                                     Recognition Rate (%)
                                            Extended Yale-B   CMU-PIE
VGG classifier + Original image [22]        65.69             95.91
VGG Classifier + Preprocessed Image [22]    85.86             96.98
In-Net + CNN Classifier [22]                91.82             98.94
ResNet Classifier + Original image          95.62             97.23
ResNet Classifier + Filtered Image          96.35             99.18

obtained by several existing methods. In this work the training job is run for around 450 iterations on a Tesla V100 GPU with 16 GB of memory. A minimum of 8 GB of memory is sufficient for training a deep learning system. CNN is used for feature extraction and produces an accuracy of 98.05% on the Face 94 dataset, 94.56% on the Face 95 dataset, 95.23% on the Face 96 dataset and 99.26

Table 4
Accuracy Based on Standard Dataset

Method             Accuracy (%)
                   Face 94 dataset   Face 95 dataset   Face 96 dataset   Grimace dataset
PCA [31]           72.10             69.87             70.95             74.79
LDA [32]           79.39             76.61             78.34             81.93
LBP [33]           85.93             80.47             84.14             86.45
DL + LBP [28]      93.6              90.6              91.6              96.6
Proposed Method    98.05             94.56             95.23             99.26
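The top-1 and top-5 error rates compared in Section 4.2 can be computed from per-class scores as in the short Python sketch below. It assumes each sample comes with a list of class scores and an integer class label; the function name `top_k_error` and the toy scores are illustrative and do not come from the experiments reported here.

```python
def top_k_error(scores, labels, k):
    """Percentage of samples whose true label is not among the k best scores."""
    misses = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(
            range(len(sample_scores)),
            key=lambda i: sample_scores[i],
            reverse=True,
        )
        if label not in ranked[:k]:
            misses += 1
    return 100.0 * misses / len(labels)


# Toy example with three samples and three classes (illustrative values).
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
top1 = top_k_error(toy_scores, toy_labels, k=1)
top2 = top_k_error(toy_scores, toy_labels, k=2)
```

Because every top-1 hit is also a top-5 hit, the top-5 error can never exceed the top-1 error, which matches the pattern in Table 2.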
of the recognition rate in which actual outcome with respect to the expected outcome. Accuracy, as given by Equation (5), is how close the measured value is to the actual value. Precision, which is given by Equation (8), is the proportion of images identified as correct that are truly correct. Recall is the proportion of correct images that the system identifies as correct, and is given by Equation (9). True positive, tp, means that a correct face image is identified as the correct image. True negative, tn, means that an incorrect face image is identified as the incorrect image. False positive, fp, means that an incorrect face image is identified as the correct image, and false negative, fn, means that a correct face image is identified as the incorrect image. Figure 8 depicts the confusion matrix for the fuzzy deep learning approach.

Fig. 8. Confusion matrix based on fuzzy deep learning.

$$ \text{recall} = \frac{tp}{tp + fn} \tag{9} $$

The FERET dataset consists of 856 faces and contains 2413 facial images under different poses. The extended Yale-B dataset has 16128 images with 9 poses and 64 illumination conditions of 28 individuals. In this work, the extended Yale-B dataset is used. For training, 252 images of 9 poses and the maximally illuminated images under 45º illumination conditions are used, resulting in 1260 images. The other images are used for testing. The JAFFE dataset is also used; 251 images with 7 different expression variations from the JAFFE dataset are used for experimentation. The seven different expression variations are happiness, sadness, fear, disgust, anger, contempt and

The fuzzy-based anisotropy diffusion filter removes noise to give better clarity to the image in the preprocessing stage. The proposed model improves the accuracy rate without any loss of information. Compared to the existing techniques, the proposed work performs better with respect to recognition rate, accuracy, and top-1 and top-5 error rates. The obtained features are classified using the deep ANFIS classifier, whose results are better than those of the existing machine learning approaches. A huge dataset is trained and tested using the deep learning approach, which provides better results than the existing works. As future work, better preprocessing and segmentation algorithms are needed for occluded and partially occluded images. New features can be formulated. 3D images can also be trained and tested in future. A fuzzy recurrent neural network (FRNN) will be used to reduce the time and space by implementing

[1] R. Ranjan and R. Chellappa, Hyper Face: A Deep Multi-task
Multi-view Facial Expression Recognition, IEEE Transactions on Multimedia 18(12) (2016), 2528–2536.
[3] A. Krizhevsky, I. Sutskever and G.E. Hinton, ImageNet classification with deep convolutional neural networks, Proceeding in International Conference on Advance in Neural Information Processing System (2012), 1097–1105.
[4] O.M. Parkhi, A. Vedaldi and A. Zisserman, Deep face recognition, British Machine Vision Conference (2015), 1–12.
[5] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition. (2014), 1409–1556.
[6] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, IEEE Conference on Computer Vision Pattern Recognition (2016), 770–778.
[7] J.-Y. Jung, S.-W. Kim, C.-H. Yoo, W.-J. Park and S.-J. Ko, LBP-Ferns-Based Feature Extraction for Robust Facial
Recognition, IEEE Transactions on Consumer Electronics 62(4) (2016), 446–453.
[8] M. Shao, M. Ma and Y. Fu, Sparse Manifold Subspace Learning, Springer, Low-Rank and Sparse Modeling for Visual Analysis (2014), 117–132.
[9] W. Deng, J. Hu and J. Guo, Compressive Binary Patterns: Designing a Robust Binary Face Descriptor with Random-Field Eigenfilters, IEEE Transactions on Pattern Analysis and Machine Intelligence 41(3) (2019), 758–767.
[10] S.X. Wu, H.-T. Wai, L. Li and A. Scaglione, A Review of Distributed Algorithms for Principal Component Analysis, Proceedings of the IEEE 106(8) (2018), 1321–1340.
[11] H. Zhao, Z. Wang and F. Nie, A New Formulation of Linear Discriminant Analysis for Robust Dimensionality Reduction, IEEE Transactions on Knowledge and Data Engineering 31(4) (2018), 629–640.
[12] T. Luo, F. Nie and D. Yi, Dimension Reduction for Non-Gaussian Data by Adaptive Discriminative Analysis, IEEE Transactions on Cybernetics 49(3) (2019), 933–946.
[13] B. Xiao, K. Wang, X. Bi, W. Li and J. Han, 2D-LBP: An Enhanced Local Binary Feature for Texture Image Classification, IEEE Transactions on Circuits and Systems for Video Technology 29(9) (2018), 2796–2808.
[14] B. Ryu, A.R. Rivera, J. Kim and O. Chae, Local Directional Ternary Pattern for Facial Expression Recognition, IEEE Transactions On Image Processing 26(12) (2017), 6006–6018.
[15] Q. Liu and C. Liu, A Novel Locally Linear KNN Method with Applications to Visual Recognition, IEEE Transactions on Neural Networks And Learning Systems 28(9) (2017), 2010–2020.
[16] S. Wang, B. Pan, H. Chen and Q. Ji, Thermal Augmented Expression Recognition, IEEE Transactions on Cybernetics 48(7) (2018), 2203–2214.
[17] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng and Y. Ma, PCANet: A Simple Deep Learning Baseline for Image Classification? IEEE Transactions on Image Processing 24(12) (2015), 5017–5032.
[18] Y. Liu, X. Yuan, X. Gong, Z. Xie and F. Fang and Z. Luo, Conditional convolution neural network enhanced random forest for facial expression Recognition, Elsevier, Pattern
with an Ensemble of Regression Trees, IEEE Conference on Computer Vision and Pattern Recognition (2014), 1867–1874.
[21] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image Recognition, International Conference on Learning Representations 6 (2015), 1–14.
[22] O.M. Parkhi, A. Vedaldi and A. Zisserman, Deep face recognition, British Machine Vision Conference 1 (2015), 6.
[23] B. Yang, J. Cao, R. Ni and Y. Zhang, Facial Expression Recognition Using Weighted Mixture Deep Neural Network Based on Double-Channel Facial Images, IEEE Access 6 (2018), 4630–4640.
[24] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, International Conference on Learning Representations 6 (2015), 1–14.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, Going deeper with convolutions, IEEE Conference on Computer Vision and Pattern Recognition (2015), 1–9.
[26] K. He, X. Zhang, S. Ren and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, IEEE International Conference on Computer Vision (2016), 1026–1034.
[27] X. Yin and X. Liu, Multi-Task Convolutional Neural Network for Pose-Invariant Face Recognition, IEEE Transactions on Image Processing 27(2) (2018), 964–975.
[28] A. Vinay, A. Gupta, A. Bharadwaj, A. Srinivasan, K. Alasubramanya Murthy and S. Natarajan, Deep Learning on Binary Patterns for Face Recognition, International Conference on Computational Intelligence and Data Science, Elsevier, 132 (2018), 76–83.
[29] A.R. Rivera, J.R. Castillo and O.O. Chae, Local directional number pattern for face analysis: Face and expression recognition, IEEE Transaction on Image Processing 22(5) (2013), 1740–1752.
[30] Y. Li, J. Zeng, S. Shan and X. Chen, Occlusion Aware Facial Expression Recognition Using CNN with Attention Mechanism, IEEE Transactions on Image Processing 28(5) (2019), 2439–2450.
[31] J. Yang, D. Zhang, F. Alejandro Frangi and J.-yu. Yang, Two-dimensional PCA: a new approach to appearance-based face representation and recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 26(1) (2004), 131–137.
[32] L.-F. Chen, H.-Y.M. Liao, M.-T. Ko, J.-C. Lin and G.-J. Yu, A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recognition 33(10) (2000), 1713–1726.
[33] Z. Guo, L. Zhang and D. Zhang, A completed modeling of local binary pattern operator for texture classification, IEEE Transactions on Image Processing 19(6) (2010), 1657–1663.
[34] P. Perona and J. Malik, Scale-Space and Edge Detection Using Anisotropic Diffusion, IEEE Transactions on Pattern Analysis and Machine Intelligence 12(7) (1990), 629–639.
[35] P. Lucey, J.F. Cohn, T. Kanade, J. Saragih, Z. Ambadar and I. Matthews, The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-
Image and Vision Computing 29(9) (2011), 607–619. Retrieved from https://paperswithcode.com/dataset/oulu-casia.
[37] M. Lyons, S. Akamatsu, M. Kamachi and J. Gyoba, Coding facial expressions with Gabor wavelets, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition (1998), 200–205. Retrieved from https://paperswithcode.com/dataset/jaffe.
[38] L. Chen, W. Su, M. Wu, W. Pedrycz and K. Hirota, A Fuzzy Deep Neural Network With Sparse Autoencoder for Emotional Intention Understanding in Human–Robot Interaction, IEEE Transactions on Fuzzy Systems 28(7) (2020), 1252–1264.
[39] X. Lu, L. Meng, C. Chen and P. Wang, Fuzzy Removing Redundancy Restricted Boltzmann Machine: Improving Learning Speed and Classification Accuracy, IEEE Transactions on Fuzzy Systems 28(10) (2020), 2495–2509.