
Project Report On

Customer Review using Facial Expression Recognition

Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology
in

Computer Science Engineering


By
Alicia George Thottukandathil (RET18CS022)

Ann Mohan Chacko (RET18CS037)

Anupa Anna Sam (RET18CS042)

Arathi Jayachandran (RET18CS044)

Under the guidance of


Mr. Paul Augustine

Department of Information Technology


Rajagiri School of Engineering and Technology
Rajagiri Valley, Kakkanad, Kochi, 682039
June 2022
DEPARTMENT OF INFORMATION TECHNOLOGY
RAJAGIRI SCHOOL OF ENGINEERING AND TECHNOLOGY
RAJAGIRI VALLEY, KAKKANAD, KOCHI, 682039

CERTIFICATE

Certified that the project work entitled “Customer Review using Facial Expression
Recognition” is a bonafide work done by Ms. Alicia George Thottukandathil
(RET18CS022), Ms. Ann Mohan Chacko (RET18CS037), Ms. Anupa
Anna Sam (RET18CS042) and Ms. Arathi Jayachandran (RET18CS044),
of Eighth Semester in partial fulfillment of the requirements for the award of the Degree
of Bachelor of Technology in Computer Science and Engineering from APJ Abdul Kalam
Technological University, Kerala during the academic year 2021-2022.

Dr. Dhanya P M, Head of Department, Associate Professor, Dept. of CSE, RSET
Dr. Jisha G, Project Coordinator, Asst. Professor, Dept. of CSE, RSET
Mr. Paul Augustine, Project Guide, Asst. Professor, Dept. of CSE, RSET

Internal Examiner External Examiner


ACKNOWLEDGEMENTS

We wish to express our sincere gratitude towards Dr. P. S. Sreejith, Principal of RSET, and
Dr. Dhanya P. M., Associate Professor and Head of the Department of Computer Science
Engineering, for providing us the opportunity to present the project “Customer Review
using Facial Expression Recognition”.

We are highly indebted to our seminar coordinators, Dr. Jisha G, Assistant Professor,
Department of CSE, Ms. Meenu Mathew, Assistant Professor, Department of CSE,
and Mr. Ajith S, Assistant Professor, Department of CSE, for their valuable support.

It is indeed our pleasure and a moment of satisfaction for us to express our sincere
thanks to our project guide Mr. Paul Augustine, Assistant Professor, Department of CSE
for his patience and all priceless advice and for all the wisdom he has shared with us.

Last but not least, we would like to express our sincere gratitude towards all other
teachers and friends for their continuous support and constructive ideas.

Alicia George Thottukandathil


Ann Mohan Chacko
Anupa Anna Sam
Arathi Jayachandran
ABSTRACT

Facial expressions are used by humans to indicate their emotions and, on rare
occasions, their needs. It could be a smiling or a frowning expression. Our words don’t
always have the same impact as our actions. Nowadays, customer satisfaction is crucial
for businesses and organisations. Manual procedures include conducting surveys and
handing out questionnaires to clients. Marketers and businesses, however, are seeking
quick ways to get relevant and timely feedback from potential customers.

In this study, we employ machine-learning approaches to develop a new way of
assessing client satisfaction using facial emotion recognition. The goal is to adapt image
processing technology to understand human behaviours based on a variety of people’s
facial expressions, and then use that data to assess customer reviews in the retail indus-
try. It also employs some of Python’s most advanced features to create software that
can recognise human expression in real time. The goal is to create a facial expression
recognition classification model using OpenCV, CNN models, the TensorFlow framework,
Keras, and Matplotlib, and then deploy the trained model to a web interface and design
a customer review page using a Flask API. The information is examined and divided into
two categories: satisfied and not satisfied customers.
Contents

Acknowledgements ii

Abstract iii

List of Figures vi

1 Introduction 1
1.1 Preamble . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Image Representation . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Scope and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Literature Survey 7

3 Hardware and Software Specification 9


3.1 Hardware Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2 Software Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

4 Proposed Method 10
4.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.2 Scope of the Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.3 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.3.1 Face Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
4.3.2 Facial Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . 11
4.3.3 Emotion recognition . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.4 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4.5 Design of the System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

4.5.1 Front End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.5.2 Back End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.5.3 Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.5.4 Sequence Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.6 Modular Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.6.1 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.6.2 Emotion Detection Training . . . . . . . . . . . . . . . . . . . . . . 17
4.6.3 Model Building . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.6.4 Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.6.5 Model Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.6.6 Front End Development . . . . . . . . . . . . . . . . . . . . . . . . 20
4.7 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.7.1 Convolutional Neural Network (CNN/ConvNets): . . . . . . . . . . 22
4.7.2 Adam Algorithm: . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

5 Results 27

6 Conclusion and Future Scope 28


6.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6.2 Recommendation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6.3 Future Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Glossary 30

References 31

List of Figures

1.1 Color Image . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3


1.2 Different Channels of an Image . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Representation of the Channels as 2D Arrays . . . . . . . . . . . . . . . . . 4
1.4 3D Array . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

4.1 Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2 Use Case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.3 Training Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.4 Detection Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.5 Samples from FER2013 Dataset . . . . . . . . . . . . . . . . . . . . . . . . 17
4.6 Model Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.7 Model Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.8 Front End . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.9 Result Displayed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
4.10 Pooling Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.11 Fully Connected Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.12 Comparison of Adam to Other Optimization Algorithms . . . . . . . . . . 26

Chapter 1

Introduction

1.1 Preamble

Since the ability to consistently recognise human emotions has a wide range of appli-
cations, emotion detection has long been a focus of computer vision research. Manual
approaches such as satisfaction surveys, interviews, and focus groups can be used
to assess customer satisfaction. These procedures are inefficient and ineffective in terms
of cost, time, and data reliability. Nonverbal communication is done through facial ex-
pressions. Facial expressions are one of the most important nonverbal components in
interpersonal communication because they contain emotional significance. They are a
unique way for us to convey our feelings and gratitude. A negative feedback sentiment is
frequently associated with a lower perceived quality of service in the context of customer
satisfaction.

We plan to focus on the issue of emotion recognition and develop a system that can
identify emotions in real time. This project makes use of a Kaggle dataset containing
48x48-pixel grayscale photos of faces. TensorFlow is a powerful deep learning library
in Python that allows you to run deep convolutional neural networks for tasks such as
handwritten digit classification, image pre-processing, and recognition. For loading,
displaying, and analysing the data, libraries such as NumPy, Pandas, and Matplotlib were
employed. Keras, one of the libraries used to code deep learning models, is also used in
this model, with TensorFlow as its backend.

This project contains many of the characteristics required for accurate facial expression
recognition. Using Convolutional Neural Networks, we constructed a model that accurately
classifies faces as one of the seven basic emotions: happy, sad, anger, neutral, disgust,

fear, and surprise. The goal of this research is to use facial expression analysis to detect
clients’ positive, negative, or neutral emotions.

Traditional FER approaches consist of three primary processes:

• Face and facial component identification

• Feature extraction

• Expression categorization.

First, a face image is extracted from an input image, followed by the detection of facial
components (e.g., eyes and nose) or landmarks in the face region. Second, from the
facial components, various spatial and temporal features are derived. Third, utilising the
retrieved features, pre-trained FE classifiers such as a support vector machine (SVM),
AdaBoost, and random forest give recognition results.

1.1.1 Image Representation

We can represent an image in a variety of ways in computer science. The majority of
the time, it pertains to how information is sent, such as how colour is digitally coded, and
how an image is saved, i.e., how an image file is constructed.

To create, alter, store, and exchange digital photographs, several open standards have
been proposed. These standards define the picture file format, image encoding techniques,
and the form of additional information known as metadata.

1.1.2 Image

The dimensions (height and breadth) of an image are determined by the number of
pixels. For example, if an image’s dimensions are 500 × 400 (width x height), the image’s
total number of pixels is 200000.

A pixel is a point on the image that takes on a specific shade, opacity, or colour. It is
usually represented in one of the following forms:

• Grayscale - A pixel is an integer with a value ranging from 0 to 255 (0 is black and
255 is white).

• RGB - A pixel is made up of three numbers ranging from 0 to 255. (the integers
represent the intensity of red, green, and blue).

• RGBA - It’s an extension of RGB with an additional alpha field that reflects the
image’s opacity.

Red, green, and blue are the three channels in an RGB image. RGB channels, which
are used in computer displays and image scanners, roughly follow the colour receptors in
the human eye.

If the RGB image is 24-bit (the industry standard as of 2005), each red, green, and
blue channel contains 8 bits—in other words, the image is made up of three images (one
for each channel), each of which may store discrete pixels with traditional brightness
intensities between 0 and 255. In a 48-bit RGB image (extremely high colour depth),
each channel is instead made up of 16 bits.
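As a brief illustration of these representations, the sketch below (assuming OpenCV is installed; the file name sample.jpg is a placeholder, not a file from this project) loads a colour image as a 3D array and inspects its channels:

import cv2

# Load a colour image as a (height, width, 3) array of 8-bit values;
# "sample.jpg" is a placeholder file name used only for illustration.
img = cv2.imread("sample.jpg")
print(img.shape, img.dtype)        # e.g. (400, 500, 3) uint8 for a 500 x 400 image

# Split into the three channels; OpenCV stores them in B, G, R order,
# and each channel is a 2D array of values from 0 to 255.
blue, green, red = cv2.split(img)

# A grayscale version holds a single 0 (black) to 255 (white) value per pixel.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(gray.shape)                  # e.g. (400, 500)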

Figure 1.1: Color Image

Figure 1.2: Different Channels of an Image

Figure 1.3: Representation of the Channels as 2D Arrays

Figure 1.4: 3D Array

1.2 Motivation

Human emotions and intents are expressed through facial expressions, and the facial
expression system’s key component is determining an efficient and effective feature. Non-
verbal cues are sent by facial expressions, and they play a vital part in interpersonal
relationships. Automatic facial expression recognition can be a useful feature in natural
human-machine interfaces, as well as in behavioural science and medical practice. Face
identification and positioning in a chaotic scene, facial feature extraction, and facial emo-
tion classification are all difficulties that an autonomous Facial Expression Recognition
system must overcome. FER systems can be used in a variety of settings, including com-
puter interfaces, health-care systems, and social marketing. However, due to the delicate
and fleeting motions of the foreground people and the noisy background environment in
real-world images/videos, face expression analysis is extremely difficult. In a FER system
for detecting emotions, extracting features is a critical step. The second critical stage in
a FER system is the classification of emotions. Despite the fact that researchers have
sought to fix the problems in real-world applications, there are still three major issues to
be addressed (illumination variation, subject-dependence, and head pose).

1.3 Scope and Objectives

1. Using facial expression recognition, detect the emotions that a customer is experi-
encing.

2. To correctly identify emotions and provide an analysis based on them.

3. Determine whether or not the customer is satisfied with the product.

1.4 Summary

Using facial expression recognition, our study provides an analytical review of a client
using a product or service. We use a recorded video of a client using the product to
detect the emotions shown by the customer while using it. The FER2013 dataset, which
we obtained from Kaggle, was used to train a neural network model. Face detection was
done with OpenCV, and training and testing were done with TensorFlow. We provide
a video file as input, and each frame is retrieved from the video to detect the emotions
displayed in that frame. A bar graph is created based on the total number of times each
emotion was shown. A report is then generated indicating whether the customer left a
favourable, negative, or neutral review.

Chapter 2

Literature Survey

Marek Kowalski, Jacek Naruniec, and Tomasz Trzcinski [4] have presented the Deep
Alignment Network (DAN), a robust face alignment method based on convolutional neural
networks. They claim that, unlike current face alignment algorithms, the Deep Alignment
Network performs face alignment based on entire facial images rather than local patches,
making it exceptionally robust to large variations in both initialization and head pose. By
using heatmaps of landmarks to pass information about landmark locations across DAN
stages, they were able to work on whole face images instead of patches extracted locally
around landmarks. When applied to two different tasks, extensive performance evaluation
showed a reduction of the state-of-the-art failure rate by a significant margin of more
than 70%.

Kai Wang, Xiaojiang Peng, Jianfei Yang, and Debin Meng [3] have presented a facial
expression recognition system aimed at real-world conditions, where pose variations and
occlusions commonly occur. The authors designed a number of additional experiments on
these aspects using FER datasets, and proposed a novel “Region Attention Network (RAN)”
that captures the importance of facial regions and landmarks. They also evaluated their
data collection approach and conducted extensive studies on FERPlus and AffectNet.

Ivona Tautke, Tomasz Trzcinski, and Adam Bielski [1] have shared a work-in-progress
technique for facial emotion detection that extracts information from facial landmarks.
The findings, based on the JAFFE dataset, revealed a possible area for further development
as well as improved precision. The authors conclude that the proposed method has
considerable potential to outperform existing methods.

B. Hasani and M. H. Mahoor [6] suggested a 3-dimensional convolutional neural net-
work approach for FER in video frames. This model uses 3D Inception-ResNet layers
followed by an LSTM unit that captures the spatial relationships inside face images as
well as the temporal relationships between distinct video frames. Facial landmark points
are also employed as inputs in their network design, which focuses on the regions around
facial landmarks rather than other facial patches that are less useful and may be unable
to convey significant facial expressions.

Sivo Prasad Raju, Saumya A, and Dr. Romi Murthy [2] proposed an architecture in
which facial emotions/expressions are classified using convolutional neural networks (CNNs).
To obtain good accuracy during the training phase, the authors employed the Japanese
Female Facial Expression (JAFFE) dataset of facial emotion images for training the CNN.
The CNN has also been utilised to detect tiredness or alertness of drivers in real time in
the context of hybrid vehicle driving.

Deepesh Lekhak [5] suggested an automated facial expression recognition system that
can recognise and locate facial landmarks in a cluttered scene, extract a set of facial
movements, and classify facial emotions. This model was created using a convolutional
neural network based on the “LeNet” architecture and the Kaggle facial expression
(FER2013) dataset, which has seven facial expression class labels: happy, sad, surprise,
disgust, fear, anger, and neutral.

Chapter 3

Hardware and Software Specification

3.1 Hardware Specification

• Local Machine (Intel(R) Core(TM) i7-8700 CPU @ 3.20 GHz, 3.19 GHz, 5 GB RAM,
64-bit OS)

• Good quality camera

3.2 Software Specification

• Languages used - Python

• Anaconda

• OpenCV

• TensorFlow

• Keras

• Seaborn

• Matplotlib

• Numpy

• Pandas

Chapter 4

Proposed Method

4.1 Problem Definition

Human emotions and intents are expressed through facial expressions, and the facial
expression system’s key component is determining an efficient and effective feature. Non-
verbal cues are sent by facial expressions, and they play a vital part in interpersonal
relationships. Automatic facial expression recognition can be a useful feature in natural
human-machine interfaces, as well as in behavioural research. Face identification and po-
sitioning in a chaotic scene, facial feature extraction, and facial emotion classification are
all difficulties that an autonomous Facial Expression Recognition system must overcome.

4.2 Scope of the Work

• The goal is to create a system that generates an analytical review based on a cus-
tomer’s facial expression detection.

• A trained model for recognising facial expressions.

• Customer satisfaction with a product is analysed graphically and textually.

4.3 Methodology

4.3.1 Face Detection

The first and most important step in face recognition is face detection. It recognises
persons in photos. It’s a type of object detection that can be used for a variety of
purposes, including security, biometrics, law enforcement, entertainment, and personal
safety. It is used for surveillance and item tracking by recognising faces in real time. The
image is imported initially by specifying its location. The image is then converted from

RGB to grayscale since faces are easier to discern in grayscale. The image is then edited,
which may involve resizing, cropping, blurring, and sharpening if needed. The following
phase is image segmentation, which is used for contour detection.
The next step is to figure out where the human faces are in a frame or image. Some
characteristics of the human face are universal, such as the nose region being brighter
than the eye region, and the eye region being darker than its surrounding pixels. The x, y,
w, and h coordinates are then given, which construct a rectangle box in the image to
show the location of the face or the region of interest in the image. The programme can
then draw a rectangle box around the recognised face in the target area. A number of
distinct detection techniques, such as smile detection, eye detection, and so on, are used
in conjunction with one another.
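As a rough illustration of this pipeline (not necessarily the project's exact code), the sketch below uses OpenCV's bundled Haar cascade to detect faces in a single frame, draw a bounding box, and crop the region of interest; the file names are placeholders.

import cv2

# Load the Haar cascade face detector that ships with OpenCV.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

frame = cv2.imread("frame.jpg")                    # placeholder input image
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)     # faces are easier to detect in grayscale

# Each detection is returned as (x, y, w, h) - the region of interest.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)

for (x, y, w, h) in faces:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)  # draw the bounding box
    roi = gray[y:y + h, x:x + w]                   # face crop passed on to the classifier

cv2.imwrite("detected.jpg", frame)                 # save the annotated frame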

4.3.2 Facial Feature Extraction

A facial expression recognition system uses algorithms to detect faces, code facial expres-
sions, and recognise emotional states in real time because it is a computer-based tech-
nology. It does so by using computer-powered cameras installed in laptops, cellphones,
and digital signage systems, as well as cameras mounted on computer screens, to evaluate
faces in pictures or video. Three processes are commonly included in facial analysis using
computer-assisted cameras:

1. Face recognition: Face recognition in a scenario, an image, or video footage.

2. Recognition of facial landmarks: The data on facial features is collected from faces
that have been discovered. Two instances are detecting the shape of facial compo-
nents and characterising the texture of skin in a facial area.

3. Classification of facial expressions and emotions: Analyzing and categorising diverse


facial traits and/or changes in appearance into expression-interpretive categories
such as facial muscle activations such as smile or frown, emotion categories such as
happy or anger, and attitude categories such as disliking/liking.

4.3.3 Emotion recognition

Face expressions are a vital means of communication for people and animals alike. Facial
expressions can be used to study human behaviour and psychological features. It’s also

employed in a number of medical procedures and treatments. In this section, we’ll in-
terpret the sentiment portrayed in the image using photographs of facial expressions and
portraits of faces.

4.4 System Architecture

• There are two sorts of blocks in MobileNetV2. One is a residual block with a stride
of 1. The other, used for downsizing, is a block with a stride of 2.

• Both sorts of blocks have three layers.

• The first layer is a 1x1 convolution with ReLU6.

• The second layer is the depthwise convolution.

• Another 1x1 convolution is used in the third layer, but this time there is no non-
linearity. The assertion is that if ReLU were applied again, deep networks would only
have the power of a linear classifier on the non-zero-volume part of the output domain.

• There is also an expansion factor t; t = 6 was utilized in each of the primary
experiments.

• If the input had 64 channels, the internal (expanded) output would have
64 x t = 64 x 6 = 384 channels.

• The MobileNetV2 model is built from these two distinct block types, as illustrated
in Figure 4.1 and in the sketch that follows it.

Figure 4.1: Architecture
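To make the block structure above concrete, here is a minimal Keras sketch of one such inverted residual block (1x1 expansion with ReLU6, depthwise convolution, then a linear 1x1 projection); the filter counts and the helper name are illustrative assumptions, not the project's exact implementation.

from tensorflow.keras import layers

def inverted_residual_block(x, filters, stride=1, expansion=6):
    """MobileNetV2-style block: expand -> depthwise -> linear projection (sketch only)."""
    in_channels = x.shape[-1]

    # 1x1 expansion convolution with ReLU6 non-linearity.
    y = layers.Conv2D(in_channels * expansion, 1, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(max_value=6.0)(y)

    # 3x3 depthwise convolution (the stride-2 variant is used for downsizing).
    y = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(max_value=6.0)(y)

    # 1x1 projection with no activation (the linear bottleneck).
    y = layers.Conv2D(filters, 1, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)

    # Residual connection only for the stride-1 block with matching channels.
    if stride == 1 and in_channels == filters:
        y = layers.Add()([x, y])
    return y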

4.5 Design of the System

4.5.1 Front End

Tkinter is the Python binding to the Tk GUI toolkit and is also Python’s standard GUI
library. A bar graph showing how often each emotion was detected is drawn using plt.bar().
A typical Tkinter application imports the tkinter module, creates the main application
window, adds one or more widgets to it, and enters the main event loop to take action
against each event triggered by the user.
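A minimal sketch of such a Tkinter front end is given below; the window title, labels, and the analyse_video() callback are illustrative placeholders rather than the project's actual code.

import tkinter as tk
from tkinter import filedialog

def analyse_video():
    # Placeholder callback: in the real application this would run the
    # emotion-detection pipeline on the selected video and show the bar graph.
    path = filedialog.askopenfilename(title="Select review video")
    print("Selected:", path)

root = tk.Tk()                          # create the main application window
root.title("Customer Review")           # illustrative title
tk.Label(root, text="Upload a video of the customer using the product").pack(pady=10)
tk.Button(root, text="Choose video", command=analyse_video).pack(pady=10)
root.mainloop()                         # enter the main event loop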

4.5.2 Back End

The back end of the project is the code that runs on the server. It is responsible for
receiving requests from the clients and for the logic that sends the appropriate data back
to the client. The back end also includes the database, which persistently stores all of the
data for the application. We used Python for our back-end development because it has
several powerful libraries with large amounts of pre-written code. This saves developers
from writing code from scratch, which speeds up development time. Python is therefore
an ideal choice as the language for back-end development.
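The abstract mentions exposing the customer review page through a Flask API; purely as an illustrative sketch (the route name, file handling, and the predict_review() helper are assumptions, not the project's actual code), a minimal endpoint for receiving an uploaded review video could look like this:

from flask import Flask, request, jsonify

app = Flask(__name__)

def predict_review(video_path):
    # Hypothetical helper: run face detection and the trained emotion model
    # on each frame, then aggregate the counts into a final verdict.
    return {"review": "satisfied"}

@app.route("/review", methods=["POST"])
def review():
    video = request.files["video"]      # uploaded customer video
    video.save("upload.mp4")            # placeholder storage location
    return jsonify(predict_review("upload.mp4"))

if __name__ == "__main__":
    app.run(debug=True)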

4.5.3 Use Case Diagram

Figure 4.2: Use Case Diagram

4.5.4 Sequence Diagrams

Figure 4.3: Training Process

Figure 4.4: Detection Process

4.6 Modular Division

4.6.1 Data Collection

The data consists of grayscale images of faces at a resolution of 48x48 pixels. The goal
is to categorise each face into one of seven categories based on the emotion expressed
in the facial expression (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Neutral, 5=Sad,
6=Surprised). There are 28,709 examples in the training set and 3,589 examples in the
public test set.

Figure 4.5: Samples from FER2013 Dataset
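For context, the sketch below shows one common way to load this data, assuming the Kaggle fer2013.csv layout with 'emotion', 'pixels', and 'Usage' columns; the report does not show its exact loading code, so this is only an assumption-laden illustration.

import numpy as np
import pandas as pd

# Each row of fer2013.csv holds an emotion label, a string of 48*48
# space-separated pixel values, and a Usage column marking the splits.
data = pd.read_csv("fer2013.csv")

pixels = np.array([np.array(p.split(), dtype="float32") for p in data["pixels"]])
images = pixels.reshape(-1, 48, 48, 1) / 255.0    # normalise grayscale values to [0, 1]
labels = data["emotion"].values

is_train = (data["Usage"] == "Training").values
x_train, y_train = images[is_train], labels[is_train]
x_test, y_test = images[~is_train], labels[~is_train]
print(x_train.shape)    # 28,709 training examples; the rest form the public/private test sets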

4.6.2 Emotion Detection Training

The required modules were imported to allow the program to access system resources. The
user then inputs an emotion (in the form of text), and the program uses the given data path
to look up that emotion on the system and return matching pictures from the dataset.

4.6.3 Model Building

Convolutional Neural Networks are made up of numerous layers, including Conv2D,
MaxPooling2D, Dense, and Dropout.

• Conv2D Layer: This layer generates a tensor of outputs by convolving the layer
input with a convolution kernel.

• Dense Layer: This layer is a simple layer of neurons in which each neuron receives
input from all of the neurons in the previous layer, hence the name. Dense Layers
are used to identify images based on convolutional layer output.

• MaxPooling2D Layer: Maximum pooling, often known as max pooling, is a pooling
procedure that determines the maximum, or largest, value in each feature map
patch. The end result is a set of downsampled or pooled feature maps that highlight
the patch’s most prominent feature.

Figure 4.6: Model Layers

Figure 4.7: Model Layers
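The actual layer configuration used in the project is shown in Figures 4.6 and 4.7; the following is only a simplified sketch of how such a model is assembled from the layers described above, with illustrative (not the project's exact) filter counts.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),                        # 48x48 grayscale faces
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),                                   # regularisation
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(7, activation="softmax"),                  # seven emotion classes
])
model.summary()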

4.6.4 Model Training

• The Model was trained for a total of 48 epochs, which took about 12 hours. Adam
has been chosen as the optimiser to improve the Model’s accuracy. The learning
rate is set at 0.001, and the activation functions ReLU and SoftMax are utilised.

• The dataset is separated into training and validation sets, with 128 photos in each
batch.

• EarlyStopping is used to minimise overfitting when training the model, and
ReduceLROnPlateau is used to reduce the learning rate by a factor of 0.2 once
learning has reached a plateau.

• Finally, a .h5 file is created to store the trained model (model.h5).
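A hedged sketch of this training setup is given below; it assumes the model and data arrays from the earlier sketches, and the loss function and callback patience values are assumptions since the report does not state them.

from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau
from tensorflow.keras.optimizers import Adam

# Adam with a learning rate of 0.001, as described above; the loss is an assumption.
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", factor=0.2, patience=3),
]

history = model.fit(x_train, y_train,
                    validation_data=(x_test, y_test),
                    batch_size=128,          # 128 images per batch, as described above
                    epochs=48,               # the report trains for 48 epochs
                    callbacks=callbacks)

model.save("model.h5")                       # store the trained model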

4.6.5 Model Testing

Videos of users expressing various emotions were used to test the model. The count of
each emotion displayed in the video input was then calculated.
A graph depicting the frequency of each emotion was displayed. The 7 emotions were
split into 3 lists: Satisfied (Happy, Surprised), Not Satisfied (Fear, Disgust, Anger) and
Neutral (Neutral). The total number of emotions in each of the three lists was computed,
and the highest of the three sums was determined. The final review is then displayed,
which is based on the aggregate of the three lists.
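A minimal sketch of this aggregation step is shown below; the emotion_counts dictionary is a hypothetical per-video result of counting the classifier's frame-level predictions, not data from the project.

# Hypothetical per-video counts produced by running the classifier on every frame.
emotion_counts = {"Happy": 40, "Surprised": 5, "Fear": 2, "Disgust": 0,
                  "Anger": 3, "Neutral": 10, "Sad": 4}

groups = {
    "Satisfied": ["Happy", "Surprised"],
    "Not Satisfied": ["Fear", "Disgust", "Anger"],
    "Neutral": ["Neutral"],
}

# Sum the counts inside each group and pick the group with the highest total.
totals = {name: sum(emotion_counts.get(e, 0) for e in members)
          for name, members in groups.items()}
verdict = max(totals, key=totals.get)
print(totals, "->", verdict)          # e.g. {'Satisfied': 45, ...} -> Satisfied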

4.6.6 Front End Development

Figure 4.8: Front End

Figure 4.9: Result Displayed

4.7 Algorithms

4.7.1 Convolutional Neural Network (CNN/ConvNets):

This is a Deep Learning system that can take an image as input and assign value (learn-
able weights and biases) to different aspects/objects in the image, as well as distinguish
them from one another. When compared to other classification methods, the amount of
pre-processing required by a ConvNet is significantly less. Basic techniques necessitate
the hand-engineering of filters, but with the right training, ConvNets can learn these fil-
ters/characteristics. Individual neurons can only respond to stimuli inside the Receptive
Field, which is a small section of the visual field. To cover the entire visual field, a series
of equivalent fields can be piled on top of each other. A ConvNet may capture the spatial
and temporal correlations in a picture by using the appropriate filters. The architecture
allows for better fitting to the image collection because of the reduced number of parame-
ters and the reusability of weights. To put it in other words, the network could be taught
to recognise the level of sophistication in an image. The objective of the ConvNet is to
compress the images in such a way that they are easy to analyse while yet maintaining
crucial components that allow for accurate prediction. This is required for developing an
architecture capable of learning features while also being scalable to large datasets.

1. Kernel: The Convolution Layer retrieves high-level information such as edges from
an input image. There is no requirement for Convolutional Networks to have only
one Convolutional Layer. Low-level data such as edges, colours, gradient direction,
and so on are usually recorded by the first ConvLayer. Now that layers have been
added, the architecture responds to the High-Level characteristics as well, providing
us a network that knows the photos in the database as well as we do.

2. The Pooling layer: Like the Convolutional Layer, this layer is in charge of min-
imizing the Convolved Feature’s spatial size. Dimensionality reduction reduces the
amount of computing power required to process the data. It is also possible to ex-
tract rotational and positional invariant dominant features, which is useful during
the model’s training phase. Maximum and average pooling are the two different
types of pooling. Max Pooling returns the highest value from the picture that
the Kernel covers. Average Pooling, on the other hand, calculates the average of

all values in the image’s Kernel region. Max Pooling can also be used as a noise
reducer. It eliminates all noisy activations while de-noising and reducing dimen-
sionality. Average Pooling, on the other hand, is a noise-suppression approach that
reduces dimensionality. As a result, Max Pooling outperforms Average Pooling.

Figure 4.10: Pooling Layer
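To make the pooling computation above concrete, here is a small numeric illustration with an assumed 4x4 feature map and a 2x2 window with stride 2:

import numpy as np

feature_map = np.array([[1, 3, 2, 4],
                        [5, 6, 1, 2],
                        [7, 2, 8, 1],
                        [3, 4, 5, 0]])

# Rearrange into 2x2 windows with stride 2, giving a 2x2 pooled map.
windows = feature_map.reshape(2, 2, 2, 2).swapaxes(1, 2)
print(windows.max(axis=(2, 3)))     # max pooling:     [[6 4], [7 8]]
print(windows.mean(axis=(2, 3)))    # average pooling: [[3.75 2.25], [4.  3.5]]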

3. Fully Connected Layer: It is a (usually) low-cost method to use a Fully-Connected
layer to learn non-linear combinations of high-level information represented by the
convolution layer’s output. This layer may be learning a nonlinear function in this
case. Now that we’ve converted the image to a Multi-Level Perceptron-compatible
format, we’ll flatten it into a column vector. Backpropagation is used in each round
of training to transmit the flattened output to a feed-forward neural network. Using
the Softmax Classification technique, the model can differentiate between dominat-
ing and certain low-level features in images across multiple epochs and classify them.

Figure 4.11: Fully Connected Layer

4.7.2 Adam Algorithm:

Adam is an optimization algorithm that can be used instead of the classical stochastic
gradient descent procedure to iteratively update network weights based on the training
data. Stochastic gradient descent maintains a single, constant learning rate for all weight
updates. With Adam, each network weight has its own learning rate, which adapts as
learning progresses. Individual adaptive learning rates for the different parameters are
estimated from the first and second moments of the gradients. According to its authors,
Adam combines the benefits of two earlier extensions of stochastic gradient descent. The
Adaptive Gradient Algorithm (AdaGrad) maintains a per-parameter learning rate, which
improves performance on problems with sparse gradients. Root Mean Square Propagation
(RMSProp) also maintains per-parameter learning rates, which are adapted based on the
average of recent gradient magnitudes for each weight (i.e., how quickly it is changing);
this makes the method suitable for both stationary and non-stationary problems. Adam
draws on both AdaGrad and RMSProp: rather than adapting the learning rates from the
average of the first moments alone, it also makes use of the average of the second moments
of the gradients. Concretely, the algorithm computes an exponential moving average of
the gradient and of the squared gradient, with the parameters beta1 and beta2 controlling
the decay rates of these moving averages. Because the moving averages are initialised
at zero and beta1 and beta2 are close to 1.0 (as recommended), the moment estimates
are biased towards zero; Adam corrects for this by computing the biased estimates first
and then applying a bias correction. Adam is the suggested strategy to utilise because it
consistently outperforms RMSProp.
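To make the description above concrete, the sketch below implements a single Adam update step as defined by Kingma and Ba [7]; in practice the frameworks used in this project provide this optimiser ready-made, so this is illustrative only.

import numpy as np

def adam_step(theta, g, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters theta given gradient g (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * g           # moving average of the gradient
    v = beta2 * v + (1 - beta2) * g ** 2      # moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v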

Figure 4.12: Comparison of Adam to Other Optimization Algorithms

Chapter 5

Results

A model was created and trained with the FER2013 dataset to detect 7 emotions - happy,
sad, anger, fear, surprise, disgust and neutral. The output is a graphical representation of
the emotions detected and one of 3 results: satisfied, not satisfied or neutral.

The model’s performance was evaluated during training and the following results were
obtained:

• Loss: 0.014

• Accuracy: 82%

Chapter 6

Conclusion and Future Scope

6.1 Conclusion

The method for analysing customer emotions has been demonstrated in this project. The
goal was to classify facial expressions into one of seven emotions using models trained on
the FER2013 dataset. This project offers a CNN-based emotion detection model that
makes use of OpenCV, CNN models, the TensorFlow framework, Keras, and Matplotlib.
Faces are recognised in the video input, grayscaled, and sent into our classification model.
With the Adam optimizer, an overall accuracy of 82% was achieved.
This research can be further analysed and studied in order to generate more accurate
models using various algorithms and image processing techniques. With more people par-
ticipating in this field of research, there is a chance that a fully automated facial expression
detection system with close to 100% accuracy could eventually be developed.
As a result, the range of emotions evoked by facial expressions has increased significantly,
making customer prediction and decision-making much easier. Proactive marketing or
product design strategies may be established to improve corporate operations and com-
petitive power of the corresponding enterprises. It can be used in a variety of real-world
applications, including healthcare, marketing, and the video game industry.

6.2 Recommendation

• Users should upload videos with good resolution and suitable lighting to help with
the detection of faces and expressions. This may help to increase the model’s overall
accuracy.

• The availability of a larger dataset for training is required to improve the model’s

performance.

• Customization of the application to allow it to be utilised by a variety of businesses.

6.3 Future Scope

• Introduction of more emotions that can be detected.

• Capability to detect faces at all angles.

Glossary

1. TensorFlow: TensorFlow is a Python library for fast numerical computing created
and released by Google. It is the most used deep learning framework and it has
pre-trained models that easily help with image classification.

2. CNN: A convolutional neural network (CNN) is a type of artificial neural network
used in image recognition and processing that is specifically designed to process
pixel data.

3. Open CV: OpenCV (Open Source Computer Vision Library) is an open source
computer vision and machine learning software library. It was built to provide
a common infrastructure for computer vision applications and to accelerate the
use of machine perception in the commercial products.

4. FER2013: Facial emotion recognition (FER) is an important topic in the fields
of computer vision and artificial intelligence. The FER2013 dataset contains
approximately 30,000 facial grayscale images of different expressions with size
restricted to 48×48, and its main labels can be divided into 7 types: 0=Angry,
1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral.

5. Max Pooling: Max Pooling is a pooling operation that calculates the max-
imum value for patches of a feature map, and uses it to create a downsam-
pled (pooled) feature map. It is usually used after a convolutional layer.

6. Bounding box: A bounding box is an imaginary rectangle that serves as a point
of reference for object detection and creates a collision box for that object.

7. Fully Connected Layer: A fully connected layer is simply a feed-forward neural
network. Fully connected layers form the last few layers in the network.

References

[1] Ivona Tautke, Tomasz Trzcinski, Adam Bielski, “I Know how You Feel: Emotion
Recognition with Facial Landmarks”, 2020

[2] Sivo Prasad Raju, Saumya A and Dr. Romi Murthy, “Facial Expression Detection
using Different CNN Architecture Hybrid Vehicle Driving”, Centre for Communica-
tions, International Institute of Information Technology, 2020

[3] Kai Wang, Xiaojiang Peng, Jianfei Yang, Debin Meng, “Region Attention Networks
for Pose and Occlusion Robust Facial Expression Recognition”, 2019

[4] Marek Kowalski, Jacek Naruniec, Tomasz Trzcinski, “Deep Alignment Network: A
convolutional neural network for robust face alignment”, 2017

[5] Deepesh Lekhak, “Facial Expression Recognition System using Convolutional Neural
Network”, Tribhuwan University Institute of Engineering.

[6] B. Hasani and M. H. Mahoor, “Facial expression recognition using enhanced deep 3d
convolutional neural networks,” in Proceedings of CVPRW. IEEE, 2017.

[7] Kingma, Diederik P., and Jimmy Ba. ”Adam: A method for stochastic optimization.”
arXiv preprint arXiv:1412.6980 (2014).

[8] J. Hamm, C.G. Kohler, R.C. Gur, R. Verma, “Automated Facial Ac-
tion Coding System for dynamic analysis of facial expressions in neuropsy-
chiatric disorders,” Journal of neuroscience methods, 200(2), 237-256, 2011.
https://doi.org/10.1016/j.jneumeth.2011.06.023

[9] M. Nishiyama, H. Kawashima, T. Hirayama, “Facial Expression representation based
on timing structures in faces”, in 2005 Workshop on Analysis and Modeling of Faces
and Gestures, Beijing, China, 2005. https://doi.org/10.1007/11564386_12

[10] K.K. Lee, Human expression and intention via motion analysis: Learning, recognition
and system implementation, The Chinese University of Hong Kong Hong, 2004.

[11] Z. Zeng, J.T., M. Liu, T. Zhang, N. Rizzoto, Z. Zhang, “Bimodal HCI-related Affect
Recognition” in 2004 International Conference Multimodal Interfaces, State College,
PA, USA, 2004. https://doi.org/10.1145/1027933.1027958

[12] A. A. Marsh, H. A. Elfenbein, N. Ambady, “Nonverbal “accents”: cultural differ-
ences in facial expressions of emotion”, Psychological Science, 14(4), 373-376, 2003.
https://doi.org/10.1111/1467-9280.24461

[13] M. Pantic, L.J.M. Rothkrantz, “An expert system for multiple emotional classifica-
tion of facial expressions” in 1999 IEEE International Conference on Tools with Arti-
ficial Intelligence, Chicago, IL, USA, 1999. https://doi.org/10.1109/TAI.1999.809775
