Deep Learning-Based Recognition of Facial Expressions
https://doi.org/10.22214/ijraset.2022.48474
International Journal for Research in Applied Science & Engineering Technology (IJRASET)
ISSN: 2321-9653; IC Value: 45.98; SJ Impact Factor: 7.538
Volume 10 Issue XII Dec 2022- Available at www.ijraset.com
Abstract: We used deep convolutional neural networks to build a system for recognizing facial expressions of emotion. We created two distinct models. The first was a proposed CNN architecture trained on the FER-2013 dataset; it classified expressions into 7 categories with an accuracy of 67.18%. The second was produced using the FER-2013 dataset and a transfer learning strategy; it categorized expressions into 4 groups with an accuracy of 75.55%. We also provide a mobile web application that executes our FER models quickly on-device. We introduce generic assessment standards, general face recognition databases, and face recognition research for real-world scenarios, and we present a prospective analysis of facial recognition, which has emerged as the field's most promising direction for future advancement.
I. INTRODUCTION
The face is the most expressive and communicative part of a human [1], and improving human-machine interaction to allow communication between the two entities is a major area of current research.
Our objectives in this study were to apply emotion detection models to real-world scenarios as well as to better understand and enhance their performance. To increase accuracy, we adopted a number of strategies from recent papers, including transfer learning, data augmentation, class weighting, auxiliary data, and ensembling.
We also examined our models using error analysis and various interpretability methods, and we used our findings to create a mobile app that executes our models on-device.
Recently, academics have shown an interest in creating FER systems utilising machine learning (ML) and deep learning (DL) techniques [9]. This interest is paving the way for the creation of reliable FER systems as well as the discovery of novel FER parameters. Typically, visible-light cameras are employed to capture the pictures needed for the categorization of facial expressions, since they are widely accessible both as standalone cameras and as attachments for inexpensive portable devices like phones and tablets.
Despite the numerous studies on the subject, identifying facial emotions from photographs taken by visible-light cameras remains challenging due to commonplace circumstances like shadows, reflections [10], and obscurity (or low light). Along with the face, the captured image also contains scenery, background objects, and many other distractors, so extracting the face from the image in order to study the facial emotions becomes a burden. Working with thermal pictures helps resolve these problems by taking into account the temperature distribution in facial muscles and offering improved facial expression categorization.
This study discusses the face recognition development process and related technologies, such as early algorithms, synthetic features [8] and classifiers, deep learning, and other stages. Next, we discuss studies on facial recognition in realistic settings. Finally, we introduce the general assessment standards and facial recognition databases.
II. RELATED WORK
The top three finishers in the 2014 ImageNet object recognition contest all employed a CNN strategy, with the GoogLeNet architecture attaining an astounding 6.66% classification error rate.
To study the FER problem, a new deep neural network design known as an "AU-Aware" architecture was put forth in [24]. Convolution layers and max-pooling layers make up the bottom of the layer stack in an AU-Aware architecture, which is used to create a comprehensive representation of the face.
The Japanese Female Facial Expression (JAFFE) Database is one such database, and the Cohn-Kanade Database is another. A multi-step, two-class facial expression classification scheme was devised by Kyperountas et al. [11], with results published on the JAFFE and MMI databases.
The best two-class classifier is chosen from a large pool of classifiers at each stage of the procedure. This aided the authors in
developing a more effective FER system. A two-step strategy for categorizing facial expressions was suggested by Ali et al.
A histogram of oriented gradients (HOG) was utilized to extract the face characteristics, and a sparse representation classifier (SRC)
was employed to identify the facial emotions.
To learn hierarchical features, a multilayer Restricted Boltzmann Machine (RBM) is utilized. The network's outputs are then combined into features that are used to train a linear SVM classifier to recognize the six fundamental expressions.
FER2013 was designed by Goodfellow et al. as a Kaggle competition to encourage researchers to develop better FER systems. The top three teams all used CNNs trained discriminatively with image transformations [3]. The winner, Yichuan Tang [13], achieved 71.2% accuracy by using the primal objective of an SVM, specifically the L2-SVM loss, as the training loss function [4].
III. DATASETS
With a wide variety of available datasets, FER is a well-researched area. We used FER2013 as our primary dataset and CK+ and
JAFFE as auxiliary datasets to increase accuracy on its test set. In order to fine-tune our models so they perform better in real-world
circumstances, we also generated our own web app dataset.
A. FER2013 Dataset
The FER-2013 dataset consists of 48x48-pixel grayscale pictures of human facial expressions. It classifies the photos into 7 emotion categories: happy, sad, angry, disgusted, surprised, afraid, and neutral. There are 35,887 photos in all.
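As a rough sketch of how the dataset can be loaded, assuming the standard Kaggle CSV release (a fer2013.csv file with emotion, pixels, and Usage columns; this layout comes from the public release, not from anything stated above):

    import numpy as np
    import pandas as pd

    EMOTIONS = ['Angry', 'Disgust', 'Fear', 'Happy', 'Sad', 'Surprise', 'Neutral']

    df = pd.read_csv('fer2013.csv')
    # Each row stores one face as 2304 space-separated pixel values.
    pixels = np.stack([np.array(s.split(), dtype=np.uint8) for s in df['pixels']])
    images = pixels.reshape(-1, 48, 48, 1)   # 35,887 48x48 grayscale faces
    labels = df['emotion'].to_numpy()        # integer class labels 0-6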
C. MMI
The MMI [35] database includes more than 20 subjects of European, Asian, or South American ancestry (44% female), ranging in age from 19 to 62. Subjects were told to present 79 series of facial expressions, six of which constitute prototypical emotions, and a picture sequence with neutral faces was acquired at the start and conclusion of each session. From each sequence, we took static frames to create 11,500 photos.
Figure 2: Our web app dataset contains images of each emotion class
IV. ARCHITECTURE
This model comprises three stages of convolutional and max-pooling layers, a fully connected (FC) layer of size 1024, and a softmax output layer. The three convolutional layers use 32, 32, and 64 filters with kernel sizes of 5x5, 4x4, and 5x5, respectively.
The max-pooling layers employ 3x3 kernels with a stride of 2.
ReLU served as the activation function. We used these features, together with batch normalization at each layer and 30% dropout after the final FC layer, to improve performance. To optimize the cross-entropy loss, we trained the model for 300 iterations using stochastic gradient descent with a momentum of 0.9.
The initial learning rate is 0.1, the batch size 128, and the weight decay 0.0001. If the validation accuracy does not increase for 10 epochs, the learning rate is cut in half.
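A minimal Keras sketch of this architecture follows, assuming TensorFlow 2.x; the 'same' padding and the exact placement of batch normalization are assumptions where the description above is silent.

    import tensorflow as tf
    from tensorflow.keras import layers, models, optimizers

    def build_fer_cnn(num_classes=7):
        model = models.Sequential([
            # Stage 1: 32 filters of size 5x5
            layers.Conv2D(32, (5, 5), padding='same', activation='relu',
                          input_shape=(48, 48, 1)),
            layers.BatchNormalization(),
            layers.MaxPooling2D(pool_size=(3, 3), strides=2),
            # Stage 2: 32 filters of size 4x4
            layers.Conv2D(32, (4, 4), padding='same', activation='relu'),
            layers.BatchNormalization(),
            layers.MaxPooling2D(pool_size=(3, 3), strides=2),
            # Stage 3: 64 filters of size 5x5
            layers.Conv2D(64, (5, 5), padding='same', activation='relu'),
            layers.BatchNormalization(),
            layers.MaxPooling2D(pool_size=(3, 3), strides=2),
            layers.Flatten(),
            layers.Dense(1024, activation='relu'),
            layers.Dropout(0.3),  # 30% dropout after the final FC layer
            layers.Dense(num_classes, activation='softmax'),
        ])
        # The 0.0001 weight decay could be added via an L2 kernel_regularizer
        # on each layer; it is omitted here to keep the sketch short.
        model.compile(optimizer=optimizers.SGD(learning_rate=0.1, momentum=0.9),
                      loss='categorical_crossentropy', metrics=['accuracy'])
        return model

    # Halve the learning rate when validation accuracy stalls for 10 epochs.
    lr_schedule = tf.keras.callbacks.ReduceLROnPlateau(
        monitor='val_accuracy', factor=0.5, patience=10)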
In traditional convolutional neural network topologies, subsampling layers come after convolution layers. The subsampling layer decreases the size of the feature maps and also introduces a weak form of rotation and translation invariance.
A. Transfer Learning
The FER2013 dataset is small and unbalanced, and we found that using transfer learning significantly improved our model's accuracy. We looked into transfer learning using the pre-trained models ResNet50, SeNet50, and VGG16 [14] together with the Keras VGG-Face library. We resized FER2013's 48x48 grayscale pictures and replicated the single channel to match the RGB images of at least 197x197 pixels that these networks expect as input [12].
Despite having just 16 layers, significantly shallower than ResNet50 and SeNet50, VGG16 has many more parameters. We kept the pre-trained layers completely frozen and added two FC layers of sizes 4096 and 1024, respectively, with 50% dropout.
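A sketch of this transfer-learning head, assuming the keras-vggface package stands in for the Keras VGG-Face library mentioned above; preprocessing details beyond resizing and channel replication are assumptions, not details from the text.

    import tensorflow as tf
    from tensorflow.keras import layers, models
    from keras_vggface.vggface import VGGFace  # pip install keras-vggface

    # FER2013 images are 48x48 grayscale; the pre-trained networks expect
    # RGB inputs of at least 197x197, so resize and replicate the channel.
    def to_rgb_197(gray_batch):            # gray_batch: (N, 48, 48, 1)
        resized = tf.image.resize(gray_batch, (197, 197))
        return tf.image.grayscale_to_rgb(resized)

    base = VGGFace(model='vgg16', include_top=False,
                   input_shape=(197, 197, 3), pooling='avg')
    base.trainable = False                 # keep pre-trained layers frozen

    model = models.Sequential([
        base,
        layers.Dense(4096, activation='relu'),
        layers.Dense(1024, activation='relu'),
        layers.Dropout(0.5),               # 50% dropout, as described above
        layers.Dense(4, activation='softmax'),  # the 4 expression groups
    ])

In use, to_rgb_197 would be applied to each batch before calling model.fit, so only the new FC head is trained while the frozen backbone acts as a fixed feature extractor.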
D. Android Application
Technology advancement has led to an increase in mobile device usage in recent years, and as a consequence, deep learning and machine learning models may be deployed on mobile devices. Due to the storage restrictions of mobile devices, deep learning models should be optimized before being used on them. Therefore, an optimized version (.pb) of the original model is generated first. Here, the transfer learning model was optimized and made available as a mobile Android application.
The Android application has a camera activity that captures a photo of the other person. After capture, the color image is converted into a 48x48-pixel grayscale image. The application then predicts which expression class the image falls into. The predicted expression is then converted from text to audio in order to assist persons who are blind.
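For illustration, a Python equivalent of the app's preprocessing step (the actual app runs on Android; this OpenCV sketch and its function name are ours, not the paper's):

    import cv2
    import numpy as np

    def preprocess_capture(bgr_frame):
        """Convert a captured color frame into the 1x48x48x1 grayscale
        tensor the FER model expects, scaled to [0, 1]."""
        gray = cv2.cvtColor(bgr_frame, cv2.COLOR_BGR2GRAY)
        face = cv2.resize(gray, (48, 48), interpolation=cv2.INTER_AREA)
        return face.astype(np.float32)[None, :, :, None] / 255.0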
VI. PROPOSED METHOD
Although several facial expression recognition datasets are accessible online, their image size, color, and particularly their format, as well as their labeling and directory structures, all differ significantly. FER projects may be divided into two categories based on their methodology.
To easily check the dataset's structure, they identified more than 30 facial characteristic points around the image's eyes, mouth, and brows in order to distinguish facial expressions. For that, they used 80 face photos from the pre-dataset with 128 x 128 pixel resolution and equal lighting, distance, and background [17] settings. After putting this approach into practice, they found that it produces an output with a 92.1% recognition rate.
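In the same spirit, a hedged sketch of landmark-based feature extraction, using dlib's 68-point predictor as a stand-in for the 30+ characteristic points described above (the predictor model file and normalization are our assumptions):

    import dlib
    import numpy as np

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat')

    def landmark_features(gray_image):
        """Return normalized (x, y) landmark coordinates around the eyes,
        mouth, and brows for the first detected face, or None."""
        faces = detector(gray_image)
        if not faces:
            return None
        shape = predictor(gray_image, faces[0])
        pts = np.array([(p.x, p.y) for p in shape.parts()], dtype=np.float32)
        # Normalize by the face box so features are scale-invariant.
        box = faces[0]
        return (pts - (box.left(), box.top())) / (box.width(), box.height())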
An improved method based on PCA (Principal Component Analysis) is used to recognize a face from a digital facial image. In this study, they break the picture down into small tuples of feature images, or eigenfaces. They first generate a training dataset of the more than 30 types of photos stated above in order to compare results. Once the uploaded face picture had been pre-processed, it was compared against the training data previously added to the dataset.
When numerous face photos are supplied, the success rate is highest, but processing time is long. For this study, they used the FACE94 [14] database, which resulted in a 35% reduction in processing time compared to standard PCA. With this new technique, they also achieved a 100% recognition rate.
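One way to realize this eigenface pipeline, sketched with scikit-learn (the component count and nearest-neighbor matcher are our assumptions, not details from the study):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    # X_train: (n_samples, 128*128) flattened grayscale faces; y_train: labels
    def fit_eigenfaces(X_train, y_train, n_components=50):
        # PCA decomposes each face into weights over the eigenface basis.
        pca = PCA(n_components=n_components, whiten=True).fit(X_train)
        clf = KNeighborsClassifier(n_neighbors=1).fit(
            pca.transform(X_train), y_train)
        return pca, clf

    def recognize(pca, clf, face_flat):
        # Project the query face onto the eigenfaces, then match it
        # against the projected training set.
        return clf.predict(pca.transform(face_flat[None, :]))[0]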
VII. CONCLUSION
At the conclusion of this project, we can say that we learned a great deal from several sources and used a variety of techniques to finish it successfully. When we started this project, we wanted to apply FER models to the real world and attain the highest accuracy possible.
Next, we looked at a number of models, including shallow CNNs and pre-trained networks based on SeNet50, ResNet50, and VGG16 [18]. We used class weights, data augmentation, and supplementary datasets to reduce FER2013's innate class imbalance. The highest accuracy we were able to attain was 75.8%, achieved by ensembling the seven models.
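The ensembling step amounts to averaging per-class probabilities across models; a minimal sketch (the model list and preprocessing are placeholders, not the paper's exact procedure):

    import numpy as np

    def ensemble_predict(models, x):
        """Average the softmax outputs of all models over a preprocessed
        image batch x and return the winning class per image."""
        probs = np.mean([m.predict(x) for m in models], axis=0)
        return np.argmax(probs, axis=1)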
Through network interpretability, we also discovered that our models learned to concentrate on essential facial cues for emotion recognition. Additionally, by creating a mobile web application with real-time recognition speeds, we showed that FER models can be used in the real world.
By creating our own training datasets and fine-tuning our architecture, we overcame data-mismatch concerns and ran on-device with low memory, storage, and compute requirements.
Notably, [7] reports an accuracy of 82.40% on 10% test samples while taking only 980 chosen frontal photographs into account, and its effectiveness is lower than that of the suggested technique. The proposed FER's performance should be compared with other deep learning techniques, particularly CNN-based ones, because it is built on a DCNN model through transfer learning (TL).
When considering only frontal pictures, the study using SCAE plus CNN [3] demonstrates an accuracy of 92.52% on the KDEF dataset. On the JAFFE dataset, the hybrid CNN and RNN approach [56] has an accuracy of 94.91%.
X. CODE
We developed the code according to the project's requirements; the repository below shows the implementation of facial expression recognition using deep learning.
https://github.com/PrajjawalTiwari29/FER-code
XI. CONTRIBUTIONS
A. Prajjawal Tiwari
1) Project management.
2) Dataset.
B. Palak Singh
1) Mobile web app development.
2) Future work.
C. Navneet Kumar
1) Preprocessing of auxiliary data and datasets.
2) Interpretability of networks and error analysis.
D. Arpit Rai
1) Methods.
2) Conclusion.
E. Prbhav Attray
1) Models.
2) Mobile web app.
XII. ACKNOWLEDGEMENTS
We are truly thankful to Assistant Professor Mr. Nizam Uddin Khan of the Computer Science and Engineering department at IMS Engineering College, Ghaziabad, for his help in applying our research to the real world.
It is our privilege to express our sincere regards to our project guide, Prof. Mr. Nizam Khan, for his valuable inputs, able guidance, encouragement, cooperation, and constructive criticism throughout the duration of our project.
We sincerely thank the Project Assessment Committee members for their support and for enabling us to present the project on the topic "Recognizing Facial Expressions Using Deep Learning."
REFERENCES
[1] Y. Tang, “Deep Learning using Support Vector Machines,” in International Conference on Machine Learning (ICML) Workshops, 2013.
[2] Quinn M., Sivesind G., and Reis G., “Real-time Emotion Recognition From Facial Expressions”, 2017.
[3] Wang J., and Mbuthia M., “FaceNet: Facial Expression Recognition Based on Deep Convolutional Neural Network”, 2018.
[4] Challenges in representation learning: Facial expression recognition challenge. https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge
[5] H. Jung et al., "Development of deep learning-based facial expression recognition system", Proc. 21st Korea-Jpn. Joint Workshop Frontiers Comput. Vis.
(FCV), pp. 1-4, 2015.
[6] Martina Rescigno, Matteo Spezialetti, and Silvia Rossi, "Personalized models for facial emotion recognition through transfer learning," Springer, 2020.
[7] A. Sehgal and N. Kehtarnavaz, "Guidelines and benchmarks for deployment of deep learning models on smartphones as real-time apps", Machine Learning and
Knowledge Extraction, vol. 1, no. 1, pp. 450-465, 2019.
[8] I. M. Revina and W. S. Emmanuel, "A survey on human face expression recognition techniques," Journal of King Saud University - Computer and Information Sciences, 2018.
[9] Lei Xu, Minrui Fei, Wenju Zhou and Aolei Yang, "Face expression recognition based on convolutional neural network", Australian & New Zealand Control
Conference (ANZCC), pp. 115-118, 2018.
[10] M. Banerjee, S. Bose, A. Kundu, and M. Mukherjee, "A Comparative Study: Java Vs Kotlin Programming in Android Application Development," International Journal of Advanced Research in Computer Science, vol. 9, no. 3, pp. 41-45, 2018.
[11] C. Tang, "Twelve-layer deep convolutional neural network with stochastic pooling for tea category classification on GPU platform," Multimedia Tools and Applications, vol. 77, pp. 22821-22839, 2018.
[12] "A pansharpening strategy employing spectral graph wavelet transformations and convolutional neural networks," International Journal of Remote Sensing, vol.
42, pp. 2898-2919, April 2021. N. Saxena and R. Balasubramanian
[13] G. Wen, Z. Hou, H. Li, D. Li, L. Jiang, and E. Xun, "Ensemble of Deep Neural Networks with Probability-Based Fusion for Facial Expression Recognition," Cognitive Computation, vol. 9, pp. 597-610, 2017.
[14] Porcu, Floris, and Atzori, "Evaluation of Data Augmentation Techniques for Facial Expression Recognition Systems," Electronics, vol. 9, 1892, 2020.
[15] A. Mahendran and A. Vedaldi, "Visualizing Deep Convolutional Neural Networks Using Natural Pre-images," International Journal of Computer Vision, vol. 120, pp. 233-255, 2016.
[16] M. J. Lyons, S. Akamatsu, M. Kamachi, J. Gyoba, and J. Budynek, The Japanese Female Facial Expression (JAFFE) Database. Available at http://www.kasrl.org/jaffe.html (accessed 1 February 2021).
[17] D. S. Guttery, "Improved Breast Cancer Classification Through Combining Graph Convolutional Network with Convolutional Neural Network," Information Processing and Management, vol. 58, article ID 102439, 2021.
[18] S. Sharma, S. Kumar, and R. Mehra, "An optimised CNN in combination with a successful pooling method for the multi-classification of breast cancer," IET Image Processing, Early Access, 2021, doi: 10.1049/ipr2.12074.