Bachelor of Technology in Computer Science and Engineering
CERTIFICATE
Certified that the project work entitled "Customer Review using Facial Expression
Recognition" is a bona fide work done by Ms. Alicia George Thottukandathil
(RET18CS022), Ms. Ann Mohan Chacko (RET18CS037), Ms. Anupa Anna Sam
(RET18CS042) and Ms. Arathi Jayachandran (RET18CS044) of the Eighth Semester,
in partial fulfillment of the requirements for the award of the Degree of Bachelor of
Technology in Computer Science and Engineering from APJ Abdul Kalam
Technological University, Kerala, during the academic year 2021-2022.
Acknowledgements

We are highly indebted to our seminar coordinators, Dr. Jisha G, Assistant Professor,
Department of CSE, Ms. Meenu Mathew, Assistant Professor, Department of CSE,
and Mr. Ajith S, Assistant Professor, Department of CSE, for their valuable support.

It is indeed our pleasure and a moment of satisfaction for us to express our sincere
thanks to our project guide, Mr. Paul Augustine, Assistant Professor, Department of CSE,
for his patience, his priceless advice, and all the wisdom he has shared with us.

Last but not least, we would like to express our sincere gratitude towards all other
teachers and friends for their continuous support and constructive ideas.
Abstract

Humans use facial expressions to indicate their emotions and, on occasion, their needs.
It could be a smiling or a frowning expression. Our words do not always have the same
impact as our actions. Nowadays, customer satisfaction is crucial for businesses and
organisations. Conducting surveys and handing out questionnaires to clients are examples
of the manual procedures used to measure it. Marketers and businesses, however, are
looking for quick ways to get relevant and timely feedback from potential customers.
Contents

Acknowledgements
Abstract
List of Figures
1 Introduction
  1.1 Preamble
    1.1.1 Image Representation
    1.1.2 Image
  1.2 Motivation
  1.3 Scope and Objectives
  1.4 Summary
2 Literature Survey
4 Proposed Method
  4.1 Problem Definition
  4.2 Scope of the Work
  4.3 Methodology
    4.3.1 Face Detection
    4.3.2 Facial Feature Extraction
    4.3.3 Emotion Recognition
  4.4 System Architecture
  4.5 Design of the System
    4.5.1 Front End
    4.5.2 Back End
    4.5.3 Use Case Diagram
    4.5.4 Sequence Diagrams
  4.6 Modular Division
    4.6.1 Data Collection
    4.6.2 Emotion Detection Training
    4.6.3 Model Building
    4.6.4 Model Training
    4.6.5 Model Testing
    4.6.6 Front End Development
  4.7 Algorithms
    4.7.1 Convolutional Neural Network (CNN/ConvNets)
    4.7.2 Adam Algorithm
5 Results
Glossary
References
List of Figures

4.1 Architecture
4.2 Use Case Diagram
4.3 Training Process
4.4 Detection Process
4.5 Samples from FER2013 Dataset
4.6 Model Layers
4.7 Model Layers
4.8 Front End
4.9 Result Displayed
4.10 Pooling Layer
4.11 Fully Connected Layer
4.12 Comparison of Adam to Other Optimization Algorithms
Chapter 1
Introduction
1.1 Preamble
Since the ability to consistently recognise human emotions has a wide range of applications,
emotion detection has long been a focus of computer vision research. Manual approaches
such as satisfaction surveys, interviews, and focus groups can be used to assess customer
satisfaction, but these procedures are inefficient and ineffective in terms of cost, time,
and data reliability. Facial expressions are a form of nonverbal communication and are
one of the most important nonverbal components of interpersonal communication because
they carry emotional significance. They are a unique way for us to convey our feelings
and gratitude. In the context of customer satisfaction, a negative feedback sentiment is
frequently associated with a lower perceived quality of service.
We plan to focus on the issue of emotion recognition and develop a system that can
identify emotions in real time. This project makes use of a Kaggle dataset containing
48x48-pixel grayscale photos of faces. TensorFlow is a powerful deep learning library
in Python that allows you to run deep convolutional neural networks for tasks such as
handwritten digit classification, image pre-processing, and recognition. For loading,
displaying, and analysing the data, various libraries such as NumPy, Pandas, and
Matplotlib were employed. Keras, one of the libraries used to build deep learning models,
is also employed in this model, with TensorFlow used as the backend.

This project contains many of the characteristics required for accurate facial expression
recognition. Using convolutional neural networks, we constructed a model that classifies
faces as one of the seven basic emotions: happy, sad, anger, neutral, disgust, fear, and
surprise. The goal of this work is to use facial expression analysis to detect clients'
positive, negative, or neutral emotions.
A typical facial expression recognition (FER) pipeline consists of three stages:

• Face detection
• Feature extraction
• Expression categorization

First, a face image is extracted from an input image, followed by the detection of facial
components (e.g., eyes and nose) or landmarks in the face region. Second, various spatial
and temporal features are derived from the facial components. Third, using the retrieved
features, pre-trained FE classifiers such as a support vector machine (SVM), AdaBoost,
and random forest produce the recognition results.
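As an illustration of this classical pipeline (not the CNN approach used later in this report), the following minimal sketch assumes pre-extracted landmark feature vectors X and emotion labels y; the data here is a random placeholder and the names are hypothetical.

# Hypothetical sketch of the classical FER pipeline: features extracted from
# detected faces are fed to a conventional classifier such as an SVM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Placeholder data: 500 samples of 68 (x, y) landmarks flattened to 136 values,
# labelled with one of the 7 basic emotions (0-6).
X = np.random.rand(500, 136)           # landmark/geometry features per face
y = np.random.randint(0, 7, size=500)  # emotion labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = SVC(kernel="rbf")                # one of the pre-trained FE classifiers named above
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))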
1.1.1 Image Representation

Several open standards have been proposed for creating, modifying, storing, and exchanging
digital images. These standards define the image file format, image encoding techniques,
and the form of additional information known as metadata.
1.1.2 Image
The dimensions (width and height) of an image are determined by the number of pixels.
For example, if an image's dimensions are 500 x 400 (width x height), the image contains
200,000 pixels in total.
A pixel is a point in the image that takes on a specific shade, opacity, or colour. It is
usually represented in one of the following forms:
• Grayscale - A pixel is an integer with a value ranging from 0 to 255 (0 is black and
255 is white).
• RGB - A pixel is made up of three integers ranging from 0 to 255, representing the
intensities of red, green, and blue.
• RGBA - An extension of RGB with an additional alpha field that reflects the
pixel's opacity.
Red, green, and blue are the three channels in an RGB image. RGB channels, which
are used in computer displays and image scanners, roughly follow the colour receptors in
the human eye.
If the RGB image is 24-bit (the industry standard as of 2005), each red, green, and
blue channel contains 8 bits; in other words, the image is made up of three images (one
for each channel), each of which stores discrete pixel values with brightness intensities
between 0 and 255. In a 48-bit RGB image (extremely high colour depth), each channel
is made up of 16 bits.
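To make these representations concrete, the following minimal sketch (using NumPy and OpenCV, both listed later among the project's tools; the file name is hypothetical) shows how a colour image is stored as a three-dimensional array and converted to grayscale:

import cv2
import numpy as np

# Load an image as an H x W x 3 array of 8-bit BGR values (OpenCV's channel order).
img = cv2.imread("face.jpg")            # hypothetical file name
print(img.shape, img.dtype)             # e.g. (400, 500, 3) uint8

# Each pixel holds three intensities in the range 0-255.
print(img[0, 0])                        # e.g. [ 34  67 120 ]

# Grayscale conversion collapses the three channels into one 0-255 value per pixel.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(gray.shape)                       # (400, 500)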
Figure 1.2: Different Channels of an Image
Figure 1.4: 3D Array
1.2 Motivation
Human emotions and intents are expressed through facial expressions, and the key
component of a facial expression system is determining an efficient and effective feature.
Facial expressions send nonverbal cues, and they play a vital part in interpersonal
relationships. Automatic facial expression recognition can be a useful feature in natural
human-machine interfaces, as well as in behavioural science and medical practice. Face
detection and localisation in a cluttered scene, facial feature extraction, and facial emotion
classification are all difficulties that an automatic Facial Expression Recognition (FER)
system must overcome. FER systems can be used in a variety of settings, including
computer interfaces, health-care systems, and social marketing. However, due to the
subtle and fleeting motions of the foreground people and the noisy background environment
in real-world images and videos, facial expression analysis is extremely difficult. Extracting
features is a critical step in a FER system for detecting emotions, and the classification of
emotions is the second critical stage. Although researchers have sought to fix the problems
in real-world applications, there are still three major issues to be addressed: illumination
variation, subject dependence, and head pose.
1.3 Scope and Objectives
1. Using facial expression recognition, detect the emotions that a customer is experiencing.
1.4 Summary
Using facial expression recognition, our study provides an analytical review of a client's
experience while using a product or service. We use a recorded video of a client using
the product to detect the emotions shown by the customer while using it. The FER2013
dataset, which we obtained from Kaggle, was used to train a neural network model. Face
detection was done with OpenCV, and training and testing were done with TensorFlow.
We provide a video file as input, and each frame is extracted from the video to detect the
emotions displayed in it. A bar graph is created based on the total number of times each
emotion was shown, and a report is generated indicating whether the customer's review
was favourable, negative, or neutral.
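A minimal sketch of this per-frame loop is shown below; detect_emotion stands in for the trained classifier described in Chapter 4 and is hypothetical here:

import cv2
from collections import Counter

def analyse_review(video_path, detect_emotion):
    """Count the emotion detected in each frame of a recorded customer video."""
    counts = Counter()
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()           # grab the next frame
        if not ok:
            break                        # end of the video file
        emotion = detect_emotion(frame)  # e.g. "Happy", "Sad", ... (hypothetical helper)
        if emotion is not None:
            counts[emotion] += 1
    cap.release()
    return counts                        # frequencies used for the bar graph and report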
Chapter 2
Literature Survey
Marek Kowalski, Jacek Naruniec, and Tomasz Trzcinski [4] presented Deep Alignment
Network (DAN), a robust face alignment method based on convolutional neural networks.
They claim that, unlike current face alignment algorithms, DAN performs face alignment
based on whole-face images, making it robust to large variations in both initialization
and head pose. Using heatmaps of landmarks to transfer information about landmark
locations across DAN stages allowed them to work on entire face images instead of patches
extracted locally around landmarks. Extensive performance evaluation on two different
tasks shows a reduction of the state-of-the-art failure rate by a significant margin of more
than 70%.
Kai Wang, Xiaojiang Peng, Jianfei Yang, and Debin Meng [3] presented a facial expression
recognition approach for real-world conditions that addresses occlusion and pose variation.
The authors designed a number of additional experiments on FER datasets for these
conditions and proposed a novel "Region Attention Network (RAN)" that captures the
importance of facial regions and landmarks. They also evaluated their data collection
approach and conducted extensive studies on FER-Plus and Affect-Net.
Ivona Tautke, Tomasz Trzcinski, and Adam Bielski [1] shared a work-in-progress technique
that performs emotion recognition using information extracted from facial landmarks. The
findings, based on the JAFFE dataset, showed improved precision and indicated areas for
further development. The authors conclude that the proposed method has considerable
potential to outperform existing methods.
B. Hasani and M. H. Mahoor [6] suggested a 3-dimensional convolutional neural network
approach for FER in video frames. The model consists of 3D Inception-ResNet layers
followed by an LSTM unit, capturing the spatial relations within images of faces as well
as the temporal relations between distinct frames of the video. Facial landmark points are
also employed in the network design, which focuses on facial landmarks rather than other
well-known facial patches that are less informative and may not contribute to significant
facial expressions.
Sivo Prasad Raju, Saumya A, and Dr. Romi Murthy [2] proposed an architecture in
which facial emotions/expressions are classified using convolutional neural networks (CNN).
To obtain good accuracy during the training phase, the authors employed the Japanese
Female Facial Expression (JAFFE) dataset of facial emotion images for training the CNN.
The CNN was used to detect the tiredness or alertness of drivers in real time in the context
of hybrid vehicle driving.
Chapter 3
The following tools and libraries are used in this project (a quick environment check
script follows the list):

• Anaconda
• OpenCV
• TensorFlow
• Keras
• Seaborn
• Matplotlib
• Numpy
• Pandas
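As a sanity check that the Anaconda environment provides these dependencies, a small script such as the following can be used; this is a sketch, not part of the original report:

# Quick environment check for the libraries listed above; prints each
# library's installed version when run inside the Anaconda environment.
import cv2                # OpenCV
import tensorflow as tf
import keras
import seaborn as sns
import matplotlib
import numpy as np
import pandas as pd

for name, mod in [("OpenCV", cv2), ("TensorFlow", tf), ("Keras", keras),
                  ("Seaborn", sns), ("Matplotlib", matplotlib),
                  ("NumPy", np), ("Pandas", pd)]:
    print(f"{name}: {mod.__version__}")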
Chapter 4
Proposed Method
4.1 Problem Definition

Human emotions and intents are expressed through facial expressions, and the key
component of a facial expression system is determining an efficient and effective feature.
Facial expressions send nonverbal cues, and they play a vital part in interpersonal
relationships. Automatic facial expression recognition can be a useful feature in natural
human-machine interfaces, as well as in behavioural research. Face detection and
localisation in a cluttered scene, facial feature extraction, and facial emotion classification
are all difficulties that an automatic Facial Expression Recognition system must overcome.
4.2 Scope of the Work

• The goal is to create a system that generates an analytical review based on a
customer's facial expression detection.
4.3 Methodology
4.3.1 Face Detection

The first and most important step in face recognition is face detection. It locates human
faces in images. It is a type of object detection that can be used for a variety of purposes,
including security, biometrics, law enforcement, entertainment, and personal safety. It is
used for surveillance and object tracking by recognising faces in real time. The image is
imported first by specifying its location. The image is then converted from RGB to
grayscale, since faces are easier to detect in grayscale. The image is then edited, which
may involve resizing, cropping, blurring, and sharpening if needed. The following phase
is image segmentation, which is used for contour detection.
The next step is to figure out where the human faces are in a frame or image. Some
characteristics of the human face are universal, such as the nose region being brighter
than the eye region, and the eye region being darker than its surrounding pixels. The
x, y, w, and h coordinates are then obtained, which define a rectangular box in the image
showing the location of the face, or the region of interest. The programme can then draw
a rectangle around the recognised face in the target area. A number of distinct detection
techniques, such as smile detection and eye detection, are used in conjunction with one
another.
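A minimal OpenCV sketch of this detection step, assuming the Haar cascade files bundled with OpenCV (the image file name is hypothetical):

import cv2

# Load the pre-trained Haar cascade shipped with OpenCV for frontal faces.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread("customer_frame.jpg")            # hypothetical input frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)      # faces are easier to detect in grayscale

# Each detection is returned as (x, y, w, h) - the region of interest.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # box around the face

cv2.imwrite("detected.jpg", img)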
A facial expression recognition system is a computer-based technology that uses algorithms
to detect faces, code facial expressions, and recognise emotional states in real time. It does
so by evaluating faces in pictures or video using computer-powered cameras installed in
laptops, cellphones, and digital signage systems, as well as cameras mounted on computer
screens. Facial analysis using computer-assisted cameras commonly includes three processes:

2. Recognition of facial landmarks: data on facial features is collected from the faces that
have been detected. Two examples are detecting the shape of facial components and
characterising the texture of the skin in a facial area.
Facial expressions are a vital means of communication for people and animals alike. They
can be used to study human behaviour and psychological traits, and they are also employed
in a number of medical procedures and treatments. In this section, we interpret the
sentiment portrayed in an image using photographs of facial expressions and portraits of
faces.
• There are two sorts of blocks in MobileNetV2. One is a residual block with a stride
of 1; the other, used for downsizing, is a block with a stride of 2.
• Another 1x1 convolution is used in the third layer, but this time without a non-
linearity. The argument is that if ReLU were applied again, deep networks would only
have the power of a linear classifier on the non-zero volume part of the output domain.
• There is also an expansion factor t; t = 6 was used in each of the primary experiments.
• If the input had 64 channels, the internal output would have 64 x t = 64 x 6 = 384
channels (a sketch of such a block follows this list).
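The following is a minimal Keras sketch of one such inverted residual block (stride 1, expansion factor t = 6) under these assumptions; it illustrates the block structure only and is not the exact architecture used in this project:

import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual_block(x, filters, stride=1, t=6):
    """MobileNetV2-style block: 1x1 expand -> 3x3 depthwise -> 1x1 project (linear)."""
    in_channels = x.shape[-1]
    # 1x1 expansion: widen the channels by the expansion factor t.
    y = layers.Conv2D(in_channels * t, 1, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    # 3x3 depthwise convolution (stride 2 for the downsizing variant).
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(6.0)(y)
    # 1x1 projection with NO non-linearity (the "linear bottleneck").
    y = layers.Conv2D(filters, 1, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # Residual connection only for stride-1 blocks with matching channel counts.
    if stride == 1 and in_channels == filters:
        y = layers.Add()([x, y])
    return y

inputs = tf.keras.Input(shape=(48, 48, 64))       # e.g. a 64-channel feature map
outputs = inverted_residual_block(inputs, filters=64, stride=1, t=6)
tf.keras.Model(inputs, outputs).summary()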
Figure 4.1: Architecture
4.5.1 Front End

Tkinter is the Python binding to the Tk GUI toolkit and is Python's standard GUI
library. A bar graph showing how often each emotion was shown is drawn using plt.bar().
With Tkinter, the Tkinter module is imported, the GUI application's main window is
created, one or more widgets are added to it, and the main event loop is entered to take
action against each event triggered by the user.
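A minimal sketch of such a front end follows; the per-emotion counts here are hypothetical placeholders and this is not the project's actual GUI code:

import tkinter as tk
import matplotlib.pyplot as plt

# Hypothetical per-emotion counts produced by the detection back end.
counts = {"Happy": 120, "Sad": 15, "Anger": 8, "Neutral": 60,
          "Disgust": 3, "Fear": 5, "Surprised": 40}

def show_report():
    # Bar graph of how often each emotion was shown, drawn with plt.bar().
    plt.bar(list(counts.keys()), list(counts.values()))
    plt.xlabel("Emotion")
    plt.ylabel("Frequency")
    plt.title("Emotions detected in the customer video")
    plt.show()

# Main window with a single button that triggers the report.
root = tk.Tk()
root.title("Customer Review using Facial Expression Recognition")
tk.Button(root, text="Show emotion report", command=show_report).pack(padx=20, pady=20)
root.mainloop()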
4.5.2 Back End
The back end of the project is the code that runs on the server. It is responsible for
receiving requests from clients and for the logic that sends the appropriate data back to
them. The back end also includes the database, which persistently stores all of the data
for the application. We used Python for our back-end development because it has several
powerful libraries with a large amount of pre-written code. This saves developers from
writing code from scratch, which speeds up development time. Python is therefore an
ideal choice of language for back-end development.
4.5.4 Sequence Diagrams
Figure 4.4: Detection Process
4.6 Modular Division
4.6.1 Data Collection

The data consists of grayscale images of faces at a resolution of 48x48 pixels. The goal
is to categorise each face into one of seven categories based on the emotion expressed
in the facial expression (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Neutral, 5=Sad,
6=Surprised). There are 28,709 examples in the training set and 3,589 examples in the
public test set.

The required modules are imported to allow the script to access system resources. The
user then inputs an emotion (in the form of text), and the script uses the given data path
to look up and return sample pictures of that emotion from the dataset.
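A minimal sketch of this step is given below. It assumes the Kaggle fer2013.csv layout (an emotion label column and a space-separated pixels column); the file path is hypothetical and the label order follows the mapping stated above:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Label order as stated above (0=Angry ... 6=Surprised).
emotions = ["Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprised"]

df = pd.read_csv("fer2013.csv")   # hypothetical path to the Kaggle CSV

def show_samples(emotion_name, n=5):
    """Display n sample 48x48 faces for the emotion typed in by the user."""
    label = emotions.index(emotion_name)
    rows = df[df["emotion"] == label].head(n)
    for i, pixels in enumerate(rows["pixels"]):
        face = np.array(pixels.split(), dtype=np.uint8).reshape(48, 48)
        plt.subplot(1, n, i + 1)
        plt.imshow(face, cmap="gray")
        plt.axis("off")
    plt.show()

show_samples(input("Enter an emotion: "))   # e.g. "Happy"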
4.6.3 Model Building

• Conv2D Layer: This layer generates a tensor of outputs by convolving the layer
input with a convolution kernel.
• Dense Layer: This is a simple layer of neurons in which each neuron receives
input from all of the neurons in the previous layer, hence the name. Dense layers
are used to classify images based on the output of the convolutional layers.
• Pooling Layer: This layer down-samples each feature map by summarising the values
in each patch. The end result is a set of down-sampled, or pooled, feature maps that
highlight the most prominent feature in each patch (a minimal model sketch combining
these layers follows this list).
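The following is a minimal sketch of a CNN combining these layers for 48x48 grayscale input and 7 emotion classes; the exact layer counts and filter sizes are assumptions, not the trained architecture shown in Figures 4.6 and 4.7:

import tensorflow as tf
from tensorflow.keras import layers, models

# Small CNN for 48x48 grayscale faces and 7 emotion classes.
model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),                 # pooled feature maps
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),        # dense layer over convolutional output
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),       # one probability per emotion
])
model.summary()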
Figure 4.7: Model Layers
4.6.4 Model Training

• The model was trained for a total of 48 epochs, which took about 12 hours. Adam
was chosen as the optimiser to improve the model's accuracy. The learning rate is
set at 0.001, and the ReLU and SoftMax activation functions are used.
• The dataset is separated into training and validation sets, with 128 images in each
batch (a compile-and-fit sketch follows this list).
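Under these settings, the training call might look roughly as follows; this sketch reuses the model from the earlier model-building sketch and substitutes random placeholder arrays for the real FER2013 data, so it is not the report's exact script:

import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for the real FER2013 arrays: 48x48x1 images, one-hot labels.
x_train = np.random.rand(1024, 48, 48, 1).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 7, 1024), 7)
x_val = np.random.rand(256, 48, 48, 1).astype("float32")
y_val = tf.keras.utils.to_categorical(np.random.randint(0, 7, 256), 7)

# Adam optimiser with the learning rate stated above; 48 epochs, batches of 128.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    batch_size=128,
    epochs=48,
)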
4.6.5 Model Testing
Videos of users expressing various emotions were used to test the model. The count of
each emotion displayed in the video input was then calculated.
A graph depicting the frequency of each emotion was displayed. The 7 emotions were
split into 3 lists: Satisfied (Happy, Surprised), Not Satisfied (Fear, Disgust, Anger) and
Neutral (Neutral). The total number of emotions in each of the three lists was computed,
and the highest of the three sums was determined. The final review, based on this
aggregate over the three lists, is then displayed.
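A minimal sketch of this grouping step, reusing the hypothetical per-emotion counts from the earlier video loop (note that the text above does not assign "Sad" to any of the three lists, so this sketch leaves it ungrouped as well):

from collections import Counter

# Hypothetical emotion counts from the per-frame detection loop.
counts = Counter({"Happy": 120, "Surprised": 40, "Neutral": 60,
                  "Fear": 5, "Disgust": 3, "Anger": 8, "Sad": 15})

# Grouping exactly as described above.
groups = {
    "Satisfied": ["Happy", "Surprised"],
    "Not Satisfied": ["Fear", "Disgust", "Anger"],
    "Neutral": ["Neutral"],
}

totals = {name: sum(counts[e] for e in members) for name, members in groups.items()}
review = max(totals, key=totals.get)    # the group with the highest sum wins
print(totals)                           # e.g. {'Satisfied': 160, 'Not Satisfied': 16, 'Neutral': 60}
print("Final review:", review)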
Figure 4.9: Result Displayed
4.7 Algorithms
4.7.1 Convolutional Neural Network (CNN/ConvNets)

This is a deep learning algorithm that can take an image as input, assign importance
(learnable weights and biases) to different aspects or objects in the image, and distinguish
them from one another. When compared to other classification methods, the amount of
pre-processing required by a ConvNet is significantly less. Basic techniques necessitate
the hand-engineering of filters, but with enough training, ConvNets can learn these
filters/characteristics. Individual neurons respond to stimuli only inside the receptive
field, a small section of the visual field. To cover the entire visual field, a series of such
fields can be stacked on top of each other. A ConvNet can capture the spatial and temporal
correlations in a picture by using the appropriate filters. The architecture allows for better
fitting to the image collection because of the reduced number of parameters and the
reusability of weights. In other words, the network can be trained to understand the
sophistication of the image. The objective of the ConvNet is to compress the images in
such a way that they are easy to analyse while still retaining the crucial components that
allow for accurate prediction. This is required for developing an architecture capable of
learning features while also being scalable to large datasets.
1. Kernel: The Convolution Layer retrieves high-level information such as edges from
an input image. Convolutional networks need not have only one convolutional layer.
Low-level features such as edges, colours, and gradient direction are usually captured
by the first ConvLayer. As layers are added, the architecture adapts to high-level
characteristics as well, giving us a network that understands the images in the dataset
as comprehensively as we do.
2. The Pooling Layer: Like the convolutional layer, this layer is in charge of reducing
the spatial size of the convolved feature. Dimensionality reduction reduces the amount
of computing power required to process the data. It also makes it possible to extract
dominant features that are rotation and position invariant, which is useful during the
model's training phase. There are two types of pooling: max pooling and average
pooling. Max pooling returns the highest value from the portion of the image covered
by the kernel. Average pooling, on the other hand, returns the average of all values
in the kernel region of the image. Max pooling can also act as a noise reducer: it
discards noisy activations altogether while de-noising and reducing dimensionality.
Average pooling, in contrast, is a noise-suppression approach that merely reduces
dimensionality. As a result, max pooling generally outperforms average pooling
(a small numerical comparison of the two follows this list).
3. Fully Connected Layer: Adding a fully-connected layer is a (usually) low-cost way
of learning non-linear combinations of the high-level features represented by the output
of the convolutional layers; this layer may therefore be learning a non-linear function in
that space. Having converted the image into a form suitable for a multi-level perceptron,
we flatten it into a column vector. The flattened output is fed to a feed-forward neural
network, and backpropagation is applied in every round of training. Over a series of
epochs, the model learns to differentiate between dominating and certain low-level
features in images and classifies them using the Softmax classification technique.
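To illustrate the difference between the two pooling operations on a single 4x4 feature-map patch, the following standalone numerical sketch (not code from the report) applies a 2x2 window with stride 2:

import numpy as np

# One 4x4 feature map, pooled with a 2x2 window and stride 2.
fmap = np.array([[1, 3, 2, 0],
                 [4, 8, 1, 1],
                 [0, 2, 9, 5],
                 [1, 1, 3, 7]], dtype=float)

# Split into non-overlapping 2x2 patches: shape (2, 2, 2, 2).
patches = fmap.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3)

max_pooled = patches.max(axis=(2, 3))   # keeps the strongest activation in each patch
avg_pooled = patches.mean(axis=(2, 3))  # averages (and so smooths) each patch

print(max_pooled)   # [[8. 2.] [2. 9.]]
print(avg_pooled)   # [[4. 1.] [1. 6.]]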
4.7.2 Adam Algorithm:
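As background on the optimiser named above, Adam [7] keeps exponentially decaying averages of the gradients (first moment) and of the squared gradients (second moment), corrects both for initialisation bias, and scales each parameter step accordingly. The following minimal sketch of the update rule is written from the published algorithm rather than taken from this report:

import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameter theta, given the current gradient."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2       # second-moment (uncentred variance)
    m_hat = m / (1 - beta1 ** t)                  # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimise f(theta) = theta^2, whose gradient is 2*theta.
theta, m, v = 5.0, 0.0, 0.0
for t in range(1, 5001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
print(round(theta, 3))   # close to 0 after enough steps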
Figure 4.12: Comparison of Adam to Other Optimization Algorithms
Chapter 5
Results
A model was created and trained with the FER2013 dataset to detect 7 emotions: happy,
sad, anger, fear, surprise, disgust and neutral. The output is a graphical representation
of the emotions detected, together with one of 3 results: satisfied, not satisfied, or neutral.

The model's performance was evaluated during training and the following result was
obtained:

• Accuracy - 82%
Chapter 6
6.1 Conclusion
A method for analysing customer emotions has been demonstrated in this project. The
goal was to classify facial expressions into one of seven emotions using models trained on
the FER2013 dataset. The project offers a CNN-based emotion detection model that makes
use of OpenCV, CNN modules, the TensorFlow framework, Keras, and Matplotlib. Faces
are detected in the video input, converted to grayscale, and fed into our classification model.
With the Adam optimizer, an overall accuracy of 82% was achieved.
This research can be further analysed and studied in order to generate more accurate
models using various algorithms and image processing techniques. With more people
participating in this field of research, there is a chance that a fully automated facial
expression detection system with 100% accuracy could eventually be developed.
As a result, the range of emotions that can be recognised from facial expressions has
increased significantly, making customer prediction and decision-making much easier.
Proactive marketing or product design strategies may be established to improve the
business operations and competitive power of the corresponding enterprises. The system
can be used in a variety of real-world applications, including healthcare, marketing, and
the video game industry.
6.2 Recommendation
• Users should upload videos with good resolution and suitable lighting to help with
the detection of faces and expressions. This may help to increase the model's overall
accuracy.
• A larger dataset for training is required to improve the model's performance.
Glossary
3. OpenCV: OpenCV (Open Source Computer Vision Library) is an open-source
computer vision and machine learning software library. It was built to provide
a common infrastructure for computer vision applications and to accelerate the
use of machine perception in commercial products.

5. Max Pooling: Max pooling is a pooling operation that calculates the maximum
value for patches of a feature map and uses it to create a downsampled (pooled)
feature map. It is usually used after a convolutional layer.

7. Fully Connected Layer: A fully connected layer is simply a feed-forward neural
network. Fully connected layers form the last few layers of the network.
References
[1] Ivona Tautke, Tomasz Trzcinski, Adam Bielski, "I Know How You Feel: Emotion
Recognition with Facial Landmarks", 2020.

[2] Sivo Prasad Raju, Saumya A and Dr. Romi Murthy, "Facial Expression Detection
using Different CNN Architecture Hybrid Vehicle Driving", Centre for Communications,
International Institute of Information Technology, 2020.

[3] Kai Wang, Xiaojiang Peng, Jianfei Yang, Debin Meng, "Region Attention Networks
for Pose and Occlusion Robust Facial Expression Recognition", 2019.

[4] Marek Kowalski, Jacek Naruniec, Tomasz Trzcinski, "Deep Alignment Network: A
Convolutional Neural Network for Robust Face Alignment", 2017.

[5] Deepesh Lekhak, "Facial Expression Recognition System using Convolutional Neural
Network", Tribhuwan University Institute of Engineering.

[6] B. Hasani and M. H. Mahoor, "Facial Expression Recognition Using Enhanced Deep
3D Convolutional Neural Networks", in Proceedings of CVPRW, IEEE, 2017.

[7] Diederik P. Kingma and Jimmy Ba, "Adam: A Method for Stochastic Optimization",
arXiv preprint arXiv:1412.6980, 2014.

[8] J. Hamm, C.G. Kohler, R.C. Gur, R. Verma, "Automated Facial Action Coding System
for dynamic analysis of facial expressions in neuropsychiatric disorders", Journal of
Neuroscience Methods, 200(2), 237-256, 2011. https://doi.org/10.1016/j.jneumeth.2011.06.023

[10] K.K. Lee, "Human Expression and Intention via Motion Analysis: Learning, Recognition
and System Implementation", The Chinese University of Hong Kong, 2004.

[11] Z. Zeng, J.T., M. Liu, T. Zhang, N. Rizzoto, Z. Zhang, "Bimodal HCI-related Affect
Recognition", in 2004 International Conference on Multimodal Interfaces, State College,
PA, USA, 2004. https://doi.org/10.1145/1027933.1027958

[13] M. Pantic, L.J.M. Rothkrantz, "An Expert System for Multiple Emotional Classification
of Facial Expressions", in 1999 IEEE International Conference on Tools with Artificial
Intelligence, Chicago, IL, USA, 1999. https://doi.org/10.1109/TAI.1999.809775