
Advanced International Journal of Multidisciplinary Research, Volume 1, Issue 1, July – August 2023

Enhanced Convolutional Neural Network for Robust Facial Expression Recognition on Fer2013 and Natural Image Datasets
Prof. Prakash Sangle
Assistant Professor, Department of Computer Engineering,
Veermata Jijabai Technological Institute, Mumbai

Abstract: In order to study the application of convolutional neural networks to facial expression recognition, a 10-layer convolutional neural network model is designed to recognize facial expressions. The last layer uses the Softmax function to output the classification results of the expressions. First, the convolution and pooling algorithms of convolutional neural networks were studied and the structure of the model was designed. Second, in order to display the features extracted by the convolutional layers more vividly, the extracted features are visualized in the form of feature maps. The convolutional neural network model in this work was tested on the Fer-2013 data set, and the experimental results demonstrated the superiority of its recognition rate. Since the Fer-2013 dataset contains data collected in an experimental environment, a self-made facial expression data set in the natural state was created in order to verify the generalization ability of the model, and a series of preprocessing steps such as cropping, grayscale conversion and pixel adjustment was performed on the face images. The trained model, previously applied to the Fer-2013 dataset, was then tested on the new dataset. The experiment yielded promising results, including a recognition accuracy rate as high as 85.1%.

Keywords: Convolutional neural network; Expression recognition; Deep learning; Feature extraction; Image classification.

1. Introduction

With the development of artificial intelligence, human-computer interaction has been widely studied. Letting machines learn human emotions from human expressions is an important part of research on human-computer interaction. Facial expression recognition is an interdisciplinary subject in a broad sense, and its research involves computer vision, graphics and image processing, and psychology. Research on facial expression recognition can promote better human-computer interaction and is an indispensable part of it. The purpose of studying facial expression recognition may lie in one or more of the following fields:

I. To better understand human emotions in human-computer interaction, thereby improving the experience of human-computer interaction;
II. To track and recognize facial expressions in video clips;
III. To research facial expression encoding modes, which are more conducive to transmitting and storing pictures of facial expressions.
mode, which is more conducive to

Facial expression recognition has broad application prospects in fields such as security, psychology, medical care, customer satisfaction analysis, and online teaching. Human facial expressions have been studied since the 1970s, and human expressions have been classified. In traditional expression recognition systems, the process is divided into feature extraction and expression classification. The methods for extracting facial features include the Gabor filter [1], the histogram of oriented gradients (HOG), the discrete cosine transform (DCT) and the scale-invariant feature transform (SIFT), after which SVM [2] or PCA [3] is used to perform facial expression classification. With the development of deep learning, deep learning has also been applied to expression recognition. A deep neural network model can extract image features and classify images at the same time, which brings great convenience to expression recognition.

In computer vision, convolutional neural networks perform better than other neural networks in processing graphics and images due to their convolution and pooling operations. This paper designs a new convolutional neural network structure to extract and classify expression features. The model in this work draws on the idea of the VGG network, designs a convolutional network structure, and adjusts the parameters of the network structure. Inspired by the GoogleNet [4] network, a 1*1 convolution kernel was added to the first layer of the convolutional neural network to increase the nonlinear representation of the input. Finally, in the fully connected layers, the complexity of the model was reduced by discarding some neurons.

The contributions of this work can be summarized in the following points:

I. A new convolutional neural network structure is designed based on the idea of the VGG network.
II. The model is trained on the Fer-2013 data set and the accuracy of model recognition is verified.
III. A self-made data set of human facial expressions in a natural state is created, and the generalization ability of the model is verified on this data set.

2. Literature Work

The convolutional neural network is divided into two processes, forward propagation and back propagation. Forward propagation performs the convolution and pooling operations, which extract and process image features. Back propagation uses the BP algorithm to transmit errors, so that an optimization algorithm can update the model parameters.

In the 2012 ILSVRC challenge, Krizhevsky et al. [5] used deep convolutional neural networks for image classification and achieved good results. Since then, convolutional neural networks have been widely used in image recognition. Chen et al. [6] studied convolution and pooling algorithms for facial expression recognition, pointed out some limitations of fixed pooling, and proposed a dynamic adaptive pooling algorithm.

Lu et al. [7] and Jeon et al. [8] each designed a convolutional neural network model for expression recognition, but the accuracy of expression recognition was not ideal. Arriaga et al. [9] designed a convolutional neural network that simultaneously recognizes gender and expression. Xu et al. [10] designed a parallel convolutional neural network to recognize expressions, reducing the training time of the model. In order to improve recognition accuracy, a convolutional neural network is often combined with another model for expression recognition [11] [12]. For example, Wang et al. [11] merged a convolutional neural network with a support vector machine: the convolutional neural network only extracts features, and the support vector machine replaces the fully connected layer for classification. Huang et al. [13] and Li et al. [14] each proposed cross-connection convolutional neural networks, in which different convolutional layers extract different features and cross-connections retain the features of the different layers to improve the recognition rate. Qian et al. [15] used a convolutional neural network to extract facial expression features from different perspectives, so that the extracted features are more accurate and detailed and more conducive to classifying expressions.

3. CNN Structure Design

The convolutional neural network is a kind of neural network with unique advantages in extracting image features. Expression recognition is a supervised classification task, which uses labeled expression pictures to train a convolutional neural network classification model. The forward propagation of the convolutional neural network model consists of the convolution and pooling operations, the back propagation algorithm is used to transfer errors, and the stochastic gradient descent (SGD) optimization algorithm is used to train and optimize the parameters of the model. The convolutional neural network designed in this article for expression recognition consists of an input layer, 4 convolutional layers, 3 pooling layers, 2 fully connected layers and a SoftMax layer. Its structure is shown in Figure 1.

The parameters of each convolutional layer, pooling layer and fully connected layer are detailed in Table 1.

3.1. Convolution Layer

The convolutional layer of the convolutional neural network performs convolution operations on facial expression pictures to extract facial expression features. The input layer directly uses the image pixels as the input value, and a convolution operation is then performed on the input value.

Figure 1. The proposed structure of the convolutional neural network.

Table 1. Parameters of convolutional neural network structure.


| Network layer | Input dimension | Convolution kernel size | Pooling area size | Stride | Padding | Output dimension |
|---|---|---|---|---|---|---|
| Layer 1 (convolution) | 48*48 | 32@1*1 | – | 1 | None | 32@48*48 |
| Layer 2 (convolution) | 32@48*48 | 32@5*5 | – | 1 | None | 32@44*44 |
| Layer 3 (pooling) | 32@44*44 | – | 2*2 | 2 | None | 32@22*22 |
| Layer 4 (convolution) | 32@22*22 | 64@3*3 | – | 1 | None | 64@20*20 |
| Layer 5 (pooling) | 64@20*20 | – | 2*2 | 2 | None | 64@10*10 |
| Layer 6 (convolution) | 64@10*10 | 128@3*3 | – | 1 | Yes | 128@10*10 |
| Layer 7 (pooling) | 128@10*10 | – | 2*2 | 2 | None | 128@5*5 |
| Layer 8 (fully connected) | 1*3200 | – | – | – | – | 1*2048, Dropout (0.6) |
| Layer 9 (fully connected) | 1*2048 | – | – | – | – | 1*1024, Dropout (0.4) |
| Layer 10 (SoftMax) | 1*1024 | – | – | – | – | 1*7 |
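For concreteness, the structure in Table 1 can be expressed directly in Keras, the platform used for the experiments in Section 4. The following is a minimal sketch reconstructed from the table; details the paper does not state, such as the placement of the ReLU activations and the compile settings beyond SGD with a learning rate of 0.01, are assumptions.

```python
# Minimal Keras sketch of the Table 1 architecture (activation placement and
# compile details beyond SGD with lr = 0.01 are assumptions).
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),                           # 48*48 grayscale input
    layers.Conv2D(32, 1, activation="relu"),                   # Layer 1: 32@1*1 -> 32@48*48
    layers.Conv2D(32, 5, activation="relu"),                   # Layer 2: 32@5*5, no padding -> 32@44*44
    layers.MaxPooling2D(2),                                    # Layer 3: 2*2, stride 2 -> 32@22*22
    layers.Conv2D(64, 3, activation="relu"),                   # Layer 4: 64@3*3 -> 64@20*20
    layers.MaxPooling2D(2),                                    # Layer 5: -> 64@10*10
    layers.Conv2D(128, 3, padding="same", activation="relu"),  # Layer 6: padded -> 128@10*10
    layers.MaxPooling2D(2),                                    # Layer 7: -> 128@5*5
    layers.Flatten(),                                          # 128*5*5 = 3200
    layers.Dense(2048, activation="relu"),                     # Layer 8
    layers.Dropout(0.6),
    layers.Dense(1024, activation="relu"),                     # Layer 9
    layers.Dropout(0.4),
    layers.Dense(7, activation="softmax"),                     # Layer 10: 7 expression classes
])
model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```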

This article uses convolution kernels of different sizes for feature extraction. Convolution kernels of different sizes correspond to different receptive fields; therefore, different convolution kernels are used to extract expression features at different receptive fields. The expression of the convolution layer is shown in (1).

$C_i = f(x * w_i + b_i)$  (1)

Where $C_i$ represents the output of the ith convolution, $f(\cdot)$ represents the activation function, for which the rectified linear unit (ReLU) was selected, $x$ represents the input image value, $*$ represents the convolution operation, $w_i$ represents the ith convolution kernel, and $b_i$ represents the bias of the ith convolution kernel.

The expression of the ReLU function is shown in (2).

$ReLU(y) = \begin{cases} y, & y \ge 0 \\ 0, & y < 0 \end{cases}$  (2)

This work uses a total of 4 convolutional layers. The convolution kernel sizes are 1*1, 5*5, 3*3 and 3*3, and the numbers of convolution kernels are 32-32-64-128. After each convolution layer, the activation layer is output. A 1*1 convolution kernel before the second convolution layer is used to increase the nonlinear representation of the input, deepen the network structure of the model, and improve the expressive ability of the model. The input image is a 48*48 matrix. After convolution with 32 (1*1) convolution kernels, 32 (48*48) feature maps are output. The second convolution layer uses a 5*5 convolution kernel to first extract features in a large receptive field, after which the kernel size is reduced to extract features in smaller areas. Convolving the 48*48 feature maps with 5*5 convolution kernels yields 32 feature maps of size (48-5+1)*(48-5+1). Using 32 convolution kernels means that 32 different local expression features are extracted. The third and fourth convolutional layers each use 3*3 convolution kernels. The specific parameter values of each layer of the network are shown in Table 1.

Each convolutional layer of the convolutional neural network performs feature extraction. This paper fuses the features extracted by the different convolution kernels in each layer and visually displays the extracted feature maps. A facial expression picture from the Fer2013 dataset was used for demonstration. The feature extraction after each convolution operation is shown in Figure 2.

Figure 2. Features after convolution (the original image, followed by feature maps after the first-layer 1*1, second-layer 5*5, third-layer 3*3 and fourth-layer 3*3 convolution kernels).

3.2. Pooling Layer

The pooling layer of a convolutional neural network is usually designed after the convolutional layer. The number of feature maps increases as the number of convolutional layers increases, and this increase in feature dimension can cause a dimensionality disaster, so a pooling layer is usually added after the convolutional layer for dimensionality reduction. This work uses the maximum pooling operation to retain the most salient feature in each pooled area. The pooling layer can be expressed as shown in (3).

$S_i = down(\max(y_{a,b})), \quad a, b \in p_i$  (3)

Where $S_i$ represents the maximum pooling result of the ith pooling area, $down(\cdot)$ represents the downsampling process (retaining the maximum value of the pooling area), $y_{a,b}$ represents a value in the pooling area, and $p_i$ represents the ith pooling area.
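To make equations (1)–(3) and the feature-map sizes concrete, the following NumPy sketch (illustrative, not the paper's code) applies one 5*5 valid convolution with ReLU and one 2*2 maximum pooling to a 48*48 input, reproducing the 48 → 44 → 22 size reduction of layers 2 and 3.

```python
# Illustrative NumPy sketch of equations (1)-(3): valid convolution + ReLU,
# then 2*2 max pooling with stride 2. Not the paper's code.
import numpy as np

def conv2d_valid(x, w, b):
    """Eq. (1) with f = ReLU: one 2-D kernel, 'valid' (unpadded) convolution."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return np.maximum(out, 0)          # Eq. (2): ReLU

def max_pool2x2(x):
    """Eq. (3): keep the maximum of each non-overlapping 2*2 pooling window."""
    h, w = x.shape[0] // 2, x.shape[1] // 2
    return x[:2 * h, :2 * w].reshape(h, 2, w, 2).max(axis=(1, 3))

x = np.random.rand(48, 48)             # a 48*48 grayscale face image
w = np.random.randn(5, 5)              # one 5*5 convolution kernel
c = conv2d_valid(x, w, b=0.1)
s = max_pool2x2(c)
print(c.shape, s.shape)                # (44, 44) (22, 22): (48-5+1) and 44/2
```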

The third layer of the network structure in this work is a pooling layer, and the feature map input to this layer is 44*44. The pooling area is 2*2, so each 2*2 region of the feature map forms a pooling window, and each pooling window yields one maximum pooling result. Therefore, the final pooling result for the feature map is (44/2)*(44/2).

3.3. Fully Connected Layer

In the fully connected layer, the neurons are connected to those in the previous layer, thereby converting the feature dimensions into one-dimensional data. The last pooling layer in this work is connected to the fully connected layer. The last pooling layer outputs 128 feature maps of size 5*5, which are converted into one-dimensional data of length 128 × 5 × 5 = 3200; this 1×3200 vector is then input into the fully connected layer. The fully connected layer is represented by (4).

$Full = f(w \times z + b)$  (4)

Where $Full$ represents the output of the fully connected layer, $f(\cdot)$ is the ReLU activation function, $w$ represents the connection weights, $z$ is the value input to the fully connected layer, and $b$ is the bias. In order to reduce the complexity of the network structure and prevent overfitting, random deactivation (Dropout) of neurons is used.

3.4. SoftMax

The last layer of the network structure in this work is the SoftMax function, which classifies the 7 facial expressions. There are 7 neurons in this layer, and each neuron represents an expression category. For each input face picture, the 7 neurons of the SoftMax layer output probabilities between 0 and 1, and the neuron with the largest output probability indicates the predicted expression. The representation of SoftMax classification is shown in (5).

$p(y = c \mid m; w) = \dfrac{e^{w_c \times m}}{\sum_{i=1}^{k} e^{w_i \times m}}$  (5)

Where $p(y = c \mid m; w)$ represents the probability that the input picture $m$ is of expression type $c$, $w$ is the weight parameter value to be fitted, and $k$ is the total number of categories, 7. The value of the expression type $c$ lies in {0, 1, 2, 3, 4, 5, 6}.

4. Experiment

The experiments in this work are implemented in Python and are based on the Keras deep learning platform. In addition, Python was used to reproduce the models from two other papers for comparison. In order to make a fair comparison of the experimental results, a unified data set was used for training the different models.

4.1. Dataset

This article uses two data sets: one is the Fer2013 [16] facial expression data set, and the other is produced as part of this work. The Fer2013 expression database has 35,886 facial expression pictures, including 28,708 in the training set, 3,589 in the validation set and 3,589 in the test set. The size of each grayscale image is 48*48. There are 7 expressions in the data set: angry, disgusted, fearful, happy, sad, surprised, and neutral.

The Fer2013 data set was collected in a laboratory environment, so it cannot fully verify the model's recognition of human expressions in the natural state. Therefore, a search for pictures of human expressions in the natural state was performed on the Internet, and the collected pictures were analyzed. Their size, pixels, background, etc. were preprocessed, and the pictures were uniformly converted into grayscale images. Finally, a small data set was formed. The facial expressions in the self-made data set are divided into 7 types of expressions, with a total of 396 pictures. This work uses the above two data sets to jointly verify the performance of the proposed convolutional neural network model.

4.2. Model Training

In order to train a more accurate model and use the expression pictures more efficiently, the expression data library was augmented through a series of random transformations, such as those shown in Figure 3.

Figure 3. Data augmentation.

The loss function used in this work is the multi-class cross-entropy loss function, shown in (6).

$loss = -\sum_{i=1}^{n} \left( y_{i1} \log a_{i1} + \ldots + y_{i7} \log a_{i7} \right)$  (6)

Where $a$ is the actual output value of a neuron and $y$ is the expected output value. The training goal is to minimize the loss value, using the backpropagation algorithm to propagate the error value and the SGD optimization algorithm to update the parameter values along the direction of gradient descent. The gradient used by the SGD algorithm is shown in (7).

$\dfrac{\partial loss(\theta)}{\partial \theta_{i1}} = -\sum_{i=1}^{n} \dfrac{y_{i1}}{a_{i1}}$  (7)

Therefore, the parameters are updated as in (8).

$\theta_j = \theta_j - a \dfrac{\partial loss}{\partial \theta_j}$  (8)

Where $\theta_j$ is the parameter to be updated, $a$ is the learning rate, and $\frac{\partial loss}{\partial \theta_j}$ is the step taken along the gradient descent direction. The learning rate in this model is 0.01. In order to allow the training to converge to the best result, this work sets the learning rate to gradually decay as the number of training iterations increases, so the learning step size gradually decreases.

This work first uses the training set of the Fer2013 data set to train the model, and then uses the validation set to verify the recognition accuracy. When the accuracy on the validation set decreases and the loss value increases, the training is stopped.
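This training recipe (random augmentation, SGD at 0.01 with gradual decay, and stopping once validation performance degrades) maps onto standard Keras utilities roughly as follows, continuing the earlier sketches. The augmentation ranges, decay factor and patience values are illustrative assumptions, since the paper does not state them.

```python
# One plausible Keras training setup for Section 4.2 (augmentation ranges,
# decay factor and early-stopping patience are assumptions).
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

augment = ImageDataGenerator(rotation_range=10,       # random transformations,
                             width_shift_range=0.1,   # cf. Figure 3
                             height_shift_range=0.1,
                             zoom_range=0.1,
                             horizontal_flip=True)

callbacks = [
    # Stop when validation performance degrades, as described in the text.
    EarlyStopping(monitor="val_accuracy", patience=5, restore_best_weights=True),
    # Gradually decay the learning rate from its initial value of 0.01.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]

history = model.fit(augment.flow(x_train, y_train, batch_size=64),
                    validation_data=(x_val, y_val),
                    epochs=100, callbacks=callbacks)
```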

4.3. Experimental Results and Analysis

This article reproduces, in Python, the convolutional neural network model proposed by Lu et al. [7] and the LeNet-5 model proposed by Li et al. [14]. The Fer2013 data set was used for training, and the accuracy was calculated on the test set. The training results of the model in this paper, the model of Lu et al. [7], and the model of Li et al. [14] are shown in Figures 4–6.

Figure 4. Training results of the suggested model.

Figure 5. Training results of the Lu et al. model.

Figure 6. Training results of the Li et al. model.

For the training results, we selected the trained models with the highest accuracy on the validation set. The accuracy of the various models on the test set is summarized in Table 2. It can be seen from this table that the accuracy of the suggested model on the test set is relatively high, at 72.92%.

Table 2. Accuracy of each model.

| Ref | Number of iterations | Validation set accuracy | Test set accuracy |
|---|---|---|---|
| [10] | 76 | 0.5811 | 0.6455 |
| [11] | – | – | 0.7074 |
| [12] | 116 | 0.5646 | 0.7142 |
| This paper | 76 | 0.6400 | 0.7292 |

The model in this work is trained on the Fer2013 data set and the trained model is saved; the saved model is then used to identify the pictures in the self-made data set. As an example, a single-picture recognition result is shown in Figure 7.

Figure 7. Expression recognition results.
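A single-picture recognition result such as the one in Figure 7 can be reproduced from the saved model along the following lines; the file names are hypothetical, and the label order follows the data set listing in Section 4.1.

```python
# Sketch of saving the trained model and classifying one self-made image
# (file names are illustrative; label order follows Section 4.1).
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

LABELS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

model.save("fer_cnn.h5")                 # persist the Fer2013-trained model
model = load_model("fer_cnn.h5")

img = image.load_img("face.png", color_mode="grayscale", target_size=(48, 48))
x = image.img_to_array(img)[np.newaxis] / 255.0    # shape (1, 48, 48, 1)
probs = model.predict(x)[0]                        # SoftMax probabilities, Eq. (5)
print(LABELS[int(np.argmax(probs))], float(probs.max()))
```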

The entire self-made data set was identified, and the confusion matrix of the identification results is shown in Table 3.

Table 3. Confusion matrix for the self-made data set.


| Actual \ Predicted | Angry | Disgust | Fear | Happy | Sad | Surprise | Neutral | Recognition rate |
|---|---|---|---|---|---|---|---|---|
| Angry | 33 | 0 | 3 | 0 | 0 | 1 | 0 | 89.19% |
| Disgust | 5 | 10 | 0 | 2 | 3 | 0 | 2 | 45.45% |
| Fear | 0 | 0 | 22 | 0 | 1 | 13 | 2 | 57.89% |
| Happy | 1 | 0 | 1 | 134 | 0 | 0 | 5 | 95.04% |
| Sad | 0 | 0 | 2 | 1 | 20 | 0 | 7 | 66.66% |
| Surprise | 0 | 0 | 1 | 2 | 0 | 39 | 1 | 90.70% |
| Neutral | 1 | 0 | 2 | 2 | 1 | 0 | 79 | 92.94% |
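A confusion matrix and per-class recognition rates of this kind can be computed from the model's predictions roughly as follows; the use of scikit-learn and the variable names x_selfmade and y_selfmade (the 396 preprocessed images and their one-hot labels) are assumptions for illustration.

```python
# Sketch: confusion matrix and per-class recognition rate on the self-made set
# (scikit-learn usage and variable names are assumptions).
import numpy as np
from sklearn.metrics import confusion_matrix

y_pred = np.argmax(model.predict(x_selfmade), axis=1)   # predicted class per image
y_true = np.argmax(y_selfmade, axis=1)                  # one-hot labels -> indices

cm = confusion_matrix(y_true, y_pred, labels=range(7))  # rows: actual, cols: predicted
per_class_rate = cm.diagonal() / cm.sum(axis=1)         # e.g. 33/37 = 89.19% for angry
overall = cm.diagonal().sum() / cm.sum()                # 337/396 = 85.10% in Table 3
print(per_class_rate, overall)
```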

From the confusion matrix, one can see that the recognition accuracy for the happy, neutral, and surprised expressions is relatively high, while recognition is relatively poor for the disgust and fear expressions. Regarding disgust, people express it very differently from one another; therefore, when recognizing disgust expressions, the recognition results are relatively scattered and may be assigned to various expressions. When recognizing fear expressions, the model more readily predicts surprise, mainly because of the features extracted around the eyes: both fear and surprise tend to make people's pupils dilate, so fear can be recognized as surprise. The overall accuracy rate on the self-made natural expression data set is 337/396 = 85.10%.

In expression recognition, we analyzed several difficulties of facial expressions and expression recognition. Human beings are complex creatures with a rich inner world. The expressions on human faces are sometimes intertwined with multiple emotions; for example, a facial expression may contain surprise, anger, and helplessness at the same time, which makes recognition difficult.

Sometimes different expressions of human beings may express the same emotion, and the same expression may convey different emotions for different people, which requires strict extraction of the subtle features of human faces. Finally, human facial features have their own characteristics and cannot be generalized. For example, in the expression recognition process, the expressions of people with big eyes are more likely to be recognized as surprise or fear.

5. Conclusion

Unlike traditional facial expression recognition methods that rely heavily on manual feature engineering, convolutional neural networks (CNNs) possess the remarkable ability to automatically and implicitly learn relevant features directly from raw image data. By utilizing only the pixel values of facial images as input, the CNN model is capable of discovering complex hierarchical patterns that represent various emotional expressions. In this study, a customized CNN architecture is developed specifically for facial expression recognition, leveraging the inherent strengths of CNNs in spatial feature extraction and image classification tasks.

The proposed model is initially trained using the widely adopted FER2013 dataset, which contains a large and diverse collection of labeled facial expression images. The experimental results obtained from this dataset highlight the superior performance of the designed network in terms of classification accuracy and robustness, affirming the effectiveness of the architectural choices made in the model's design. To further evaluate the model's capability to generalize beyond controlled environments, a supplementary dataset composed of facial expression images collected from natural, real-world settings was created and utilized. These images underwent preprocessing steps such as grayscale conversion, cropping, and pixel normalization to align them with the training data format.

The evaluation on this custom dataset revealed that the CNN maintained a high level of recognition performance, indicating good generalization capacity across varied contexts and image conditions. However, it is important to note that CNNs typically require large-scale datasets to achieve optimal learning outcomes, as their depth and parameter complexity necessitate extensive exposure to diverse training examples. Consequently, the development of more comprehensive and representative facial expression datasets, especially those depicting spontaneous expressions in real-life scenarios, would further enhance the model's applicability and reliability in practical emotion recognition systems. As such, the continued accumulation and integration of naturalistic facial expression images is essential to advance the generalization and real-world readiness of CNN-based recognition models.

References

1. Kumbhar, M., Jadhav, A., & Patil, M. (2012). Facial Expression Recognition Based on Image Feature. International Journal of Computer and Communication Engineering, 1(2), 117–119.
2. Reddy, C. V. R., Reddy, U. S., & Kishore, K. V. K. (2019). Facial Emotion Recognition Using NLPCA and SVM. Traitement du Signal, 36(1), 13–22.
3. Deng, H.-B., Jin, L.-W., Zhen, L.-X., & Huang, J.-C. (2005). A New Facial Expression Recognition Method Based on Local Gabor Filter Bank and PCA plus LDA. International Journal of Information Technology, 11(11), 86–96.
4. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going Deeper with Convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).
5. Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems (Vol. 25, pp. 1097–1105).
6. Chen, H., & Qiu, X. (2019). Research on Expression Recognition Based on Convolutional Neural Network and Pooling Algorithm. Computer Technology and Development, 29(1), 61–65.
7. Lu, G., He, J., & Yan, J. (2016). Convolutional Neural Network for Facial Expression Recognition. Journal of Nanjing University of Posts and Telecommunications: Natural Science Edition, 36(1), 16–22.
8. Jeon, J., Park, S., Kim, J., & Kim, J. (2016). A Real-Time Facial Expression Recognizer Using Deep Neural Network. In Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication (Article No. 94, pp. 1–4).
9. Arriaga, O., Valdenegro-Toro, M., & Plöger, P. (2017). Real-Time Convolutional Neural Networks for Emotion and Gender Classification. arXiv preprint arXiv:1710.07557.
10. Xu, L., Zhang, S., & Zhao, J. (2019). Constructing a Parallel Convolutional Neural Network Expression Recognition Algorithm. Chinese Journal of Image and Graphics, 24(2), 227–236.
11. Wang, Z., Li, H., & Zhang, R. (2019). Expression Recognition Integrating Convolutional Neural Network and Support Vector Machine. Computer Engineering and Design, 40(12), 3594–3600.
12. Sun, X., Pan, T., & Ren, F. (2016). Facial Expression Recognition Based on ROI-KNN Convolutional Neural Network. Journal of Automation, 42(6), 883–891.
13. Huang, Q., & Wang, Q. (2019). Facial Expression Recognition Based on Cross-Connection Feature Fusion Network. Computer Engineering and Design, 40(10), 2969–2973.

14. Li, Y., Lin, X., & Jiang, M. (2018). Facial Expression Recognition Based on Cross-Connection LeNet-5 Network. Journal of Automation, 44(1), 176–182.
15. Qian, Y., Shao, J., Ji, X., & Li, Y. (2018). Multi-View Facial Expression Recognition Based on Improved Convolutional Neural Network. Computer Engineering and Applications, 54(24), 12–19.
16. Goodfellow, I. J., Erhan, D., Carrier, P. L., Courville, A., Mirza, M., Hamner, B., ... & Bengio, Y. (2013). Challenges in Representation Learning: A Report on Three Machine Learning Contests. Neural Networks, 64, 59–63.
