Human Expression Detection Using Computer Vision
B.E. Project Report - ‘A’
Submitted in partial fulfillment of the requirements
For the degree of
Bachelor of Engineering
in
Computer Engineering
by
Ms. Pranjal Ghan 15CE2008
Ms. Kirti Mahale 15CE1016
Mr. Pradnyesh Gumaste 15CE2022
Mr. Tanuj Jain 15CE1097
Supervisor
Prof. Dr. Leena Ragha
Co-Supervisor
Mrs. Harsha Saxena
CERTIFICATE
This is to certify that the project ‘A’ titled
“Human Expression Detection Using Computer Vision”
is a bonafide work done by
Ms. Pranjal Ghan, Ms. Kirti Mahale, Mr. Pradnyesh Gumaste and Mr. Tanuj Jain.
Supervisor Co-Supervisor
(Prof. Dr. Leena Ragha) (Mrs. Harsha Saxena)
We declare that this written submission represents our ideas in our own words and, where
others’ ideas or words have been included, we have adequately cited and referenced the
original sources. We also declare that we have adhered to all principles of academic honesty
and integrity and have not misrepresented, fabricated or falsified any idea/data/fact/source in
our submission. We understand that any violation of the above will be cause for disciplinary
action by the Institute and can also evoke penal action from the sources which have thus not
been properly cited or from whom proper permission has not been taken when needed.
Date : . . . /. . . /. . . . . .
Project Report Approval for B.E.
This is to certify that the project entitled “Human Expression Detection Using Computer
Vision” is a bonafide work done by Ms. Pranjal Ghan, Ms. Kirti Mahale, Mr. Pradnyesh
Gumaste and Mr. Tanuj Jain under the supervision of Prof. Dr. Leena Ragha and Mrs. Harsha
Saxena. This dissertation has been approved for the award of the Bachelor’s Degree in
Computer Engineering, University of Mumbai.
Examiners :
1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Supervisors :
1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Principal :
..............................
Date : . . . /. . . /. . . . . .
Place : . . . . . . . . . . . .
Acknowledgement
We take this opportunity to express our profound gratitude and deep regards to our supervisor
Prof. Dr. Leena Ragha and co-supervisor Mrs. Harsha Saxena for their exemplary guidance,
monitoring and constant encouragement throughout the completion of this report. We are truly
grateful for their efforts to improve our technical writing skills. The blessings, help and
guidance given by them from time to time shall carry us a long way in the journey of life on
which we are about to embark.
We take this privilege to express our sincere thanks to Dr. Ramesh Vasappanavara, Principal,
RAIT, for providing the necessary facilities. We are also thankful to Dr. Leena Ragha, Head of
the Department of Computer Engineering, Project Co-ordinator Mrs. Smita Bharne and
Project Co-coordinator Mrs. Bhavana Alte, Department of Computer Engineering, RAIT,
Nerul, Navi Mumbai, for their generous support.
Last but not least, we would also like to thank all those who have directly or indirectly helped
us in the completion of this report.
Abstract
Classroom environments are affected by legions of factors that are difficult for college
supervising authorities to detect. Evaluating the student-teacher interaction by observing
student behaviour from outside the class can provide only a shallow understanding of what
actually happens within the classroom. To gain a deeper understanding, the facial expressions
of the students can be evaluated. Facial expressions are among the most important cues for
sensing human emotions and behavioural aspects amongst humans. Neural networks, and deep
learning in general, are far more effective at categorizing such emotions owing to their robust
designs and accuracy in prediction. We also contrast our deep learning approach with
conventional shallow-learning-based approaches and show that a convolutional neural network
is far more effective at learning representations of facial expression data.
Contents

Abstract

1 Introduction
  1.1 Overview
  1.2 Objective
  1.3 Motivation
  1.4 Problem Definition
  1.5 Organization of Report

2 Literature Survey
  2.1 Research Paper Survey
  2.2 Similar Existing Project Comparison
  2.3 Analysis

3 Proposal
  3.1 Problem Statement
  3.2 Proposed Work
  3.3 Proposed Methodology
    3.3.1 Face Detection: Using the Viola-Jones Algorithm
      3.3.1.1 Using Haar Features
      3.3.1.2 AdaBoost Training
      3.3.1.3 Cascade Classifier
    3.3.2 Convolutional Neural Network (CNN)
  3.4 Hardware & Software Requirement
    3.4.1 Software Requirements
    3.4.2 Hardware Requirements

4 Planning and Formulation

5 Design of System
  5.1 Diagrams with Explanation

6 Proposed Results
  6.1 Proposed Results & Analysis
    6.1.1 Dataset
  6.2 Project Outcomes

7 Conclusion

8 Future Work

References

Appendix

List of Figures

List of Tables

List of Algorithms
Chapter 1
Introduction
Human emotions run the gamut from extreme to almost neutral facial expressions. If this huge
range of emotions is carefully analyzed, a great deal of information can be obtained about the
people expressing them. Each emotion has a unique expression style, and with proper
interpretation these styles can be identified to recognize particular emotions. In a classroom
specifically, students express a wide range of emotions. These expressions can convey a lot of
information about the lecture and the students' comprehension of it. Computer vision is a
technology that can automate this task by using deep learning models to analyze such data.
With the help of the analyzed data, teachers and responsible educational committees can work
out ways to improve the lectures being conducted.
1.1 Overview
Deep learning is a rapidly emerging technology. Given the wave of automation currently
affecting several industries, deep learning and computer vision are clearly apt technologies for
our project. As a case in point, consider the live example of computer vision deployed in the
city of Mumbai: a trained model captures images of fast-moving vehicles, and if a vehicle's
speed is above the speed limit, a fine is sent directly to the driver. Another application of
computer vision is the automation robots being developed for smart homes. Robots such as
Vector can detect human emotions and react according to environment variables. It is quite
evident that computer vision is becoming prominent both in automating everyday tasks and in
the making of intelligent machines.
1.2 Objective
The main objective of this project is to leverage computer vision technology in order to
understand and evaluate the various emotions displayed by students in a classroom
environment. Using deep learning and computer vision to automate the analysis, we aim to
understand the various expressions displayed by students during a lecture. With the
information obtained, our project can be used to improve classroom teaching and to devise
new ways of motivating students during lectures. The extracted information can be used by
teachers and college authorities to improve learning. Several parameters, such as the quality of
the lecture and the level of student interest, can be measured and acted upon using the
designed system.
1.3 Motivation
Nowadays, the classroom environment is affected by several factors. One of the most common
issues is students' lack of attention during the lecture, which tends to set in when the lecture
begins to cover complex topics or intricate details. Another common issue is that lecturers fail
to bring energy to the topic they are teaching, and as a result the students lose interest in the
subject. There are many such factors that can influence the overall efficiency of a lecture. Our
project was inspired by these issues faced in the classroom environment. We aim to address
them by implementing state-of-the-art neural networks to capture live images of students in a
lecture and to analyze the overall emotions expressed by the students during it. This
information can then be used to devise solutions that improve overall lecture efficiency.
1.5 Organization of report
The rest of the report is organized as follows. The literature survey and work related to our
study are covered in Chapter 2. The motivation that leads us to our proposal is discussed in
Chapter 3. The planning and formulation of our research are described in Chapter 4, followed
by the design of the system in Chapter 5. The proposed results and the dataset are presented in
Chapter 6. We conclude the report in Chapter 7, and Chapter 8 provides an insight into the
future work of the project.
Chapter 2
Literature Survey
There are existing systems that already use various techniques for detecting emotions in
images, and different techniques suit different environments. We therefore surveyed several
techniques; the survey is presented in Section 2.1.

2.1 Research Paper Survey
Figure 2.1: Gabor filters of various sizes, lengths and orientations
2. Human Facial Expression Recognition from Static Images using Shape and Appearance
Feature

Naveen Kumar H N et al. [6] proposed a different approach for emotion recognition from
static images. They used a HOG detector to recognize expressions in images: HOG feature
vectors are extracted from the training images and used to train the classifier. Characteristics
of local shape and gradient structure are well captured by HOG features, which are relatively
invariant to local geometric and photometric transformations, so small pose variations do not
affect the performance of the FER system. The HOG feature vector provides information
about the shape and appearance of the face, which are better characterized by intensity
gradients and edge directions. The HOG feature extraction algorithm divides the static image
of a face into small spatial regions known as cells. Cells can be rectangular or circular. The
image is divided into cells of size N×N pixels, and gradients are computed for each cell.

The extracted HOG features are then given as input to a group of Support Vector Machines
(SVMs). An SVM is a discriminative classifier defined by a separating hyperplane. SVMs are
non-parametric and hence improve on the robustness associated with artificial neural networks
and other non-parametric classifiers. The purpose of using SVMs is to obtain acceptable
results in a fast, accurate and simple manner. Using this method [6], they reached an accuracy
of 92.56%.
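As an illustration, the following is a minimal sketch of this HOG-plus-SVM pipeline using
scikit-image and scikit-learn. The `faces` and `labels` arrays are placeholders standing in for a
real training set, and the cell and block sizes are our assumptions rather than values taken from
the paper [6].

import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def extract_hog(image):
    # Divide the face into cells, compute a gradient-orientation histogram
    # per cell, and concatenate the block-normalized histograms.
    return hog(image, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

faces = np.random.rand(100, 48, 48)          # placeholder grayscale face crops
labels = np.random.randint(0, 6, size=100)   # placeholder emotion labels

X = np.array([extract_hog(face) for face in faces])
clf = LinearSVC().fit(X, labels)             # a group of one-vs-rest linear SVMs
print(clf.predict(X[:5]))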
4. A Deep Neural Network Driven Feature Learning Method for Multi-view Facial
Expression Recognition

In this paper, Tong Zhang et al. [8] used a deep neural network to detect expressions in the
Multi-PIE and BU-3DFE datasets, which contain faces showing six different emotions under
different lighting conditions. For facial feature extraction they [8] used the SIFT method,
which annotates fixed points around the nose, mouth and eyes. The scale-invariant feature
transform (SIFT) is a feature detection algorithm in computer vision used to detect and
describe local features in images. These features were then used to build a feature vector of
dimension 128 for each image. To improve multi-pose analysis of the image, they [8] used a
DNN in which 1D convolution is applied rather than 2D convolution. Finally, the result is
passed to a CNN layer, which predicts the emotion shown by the face in the image.
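For reference, a minimal OpenCV sketch of computing the 128-dimensional SIFT descriptors
mentioned above; detecting keypoints over the whole image is a simplification, since the
paper [8] places fixed points around the nose, mouth and eyes, and the image path is
hypothetical.

import cv2

gray = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical image
sift = cv2.SIFT_create()
# Each descriptor is a 128-dimensional vector, matching the dimension in [8].
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(descriptors.shape)  # (number of keypoints, 128)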
2.2 Similar Existing Project Comparison

Amazon Go is a chain of checkout-free stores built around computer vision technology.
Amazon posits that the computer vision technology can determine when an object is taken
from a shelf and who has taken it. If an item is returned to the shelf, the system is also able to
remove that item from the customer’s virtual basket. After buyers finish their shopping, they
can walk directly out of the store and the money is deducted via a connected bank account.
The system uses facial recognition to ensure that it bills the correct buyer for the respective
items. Amazon also states that this system helps reduce shoplifting by alerting the store
owners to any suspicious activity, such as passing items between customers or hiding items
under clothes. The Amazon Go store implements state-of-the-art technology to achieve an
excellent accuracy rate at detecting and constantly monitoring the shopping environment.
2.3 Analysis
Comparison of surveyed research papers:

1. Facial Expression Recognition via Deep Learning. Authors: Abir Fathallah, Lotfi Abdi,
Ali Douik. Publication: IEEE/ACS 14th International Conference on Computer Systems and
Applications, 2017 [2]. Description: a CNN was implemented on the CK+ dataset; the
accuracy obtained was 97%. Limitation: could not detect complex emotions.

2. A Deep Neural Network Driven Feature Learning Method for Multi-view Facial
Expression Recognition. Authors: Tong Zhang, Wenming Zheng, Zhen Cui, Yuan Zong,
Jingwei Yan and Keyu Yan. Publication: IEEE Transactions on Multimedia, Volume 18,
Issue 12, December 2016 [8]. Description: a deep neural network with SIFT features was
implemented on the Multi-PIE and BU-3DFE datasets; the accuracy obtained was 96%.
Limitation: the dataset used was limited in size.

3. Facial Expression Recognition based on Support Vector Machine using Gabor Wavelet
Filter. Authors: Sagor Chandro Bakchy, Mst. Jannatul Ferdous, Ananna Hoque Sathi,
Krishna Chandro Ray, Faisal Imran, Md. Meraj Ali. Publication: 2nd International
Conference on Electrical and Electronic Engineering (ICEEE), 2017 [1]. Description: a
Support Vector Machine with Gabor wavelets was implemented on the FER2013 dataset; the
accuracy obtained was 84%. Limitation: the dataset was limited and the accuracy was a
compromise.

4. SenTion: A framework for Sensing Facial Expressions. Authors: Rahul Islam, Karan
Ahuja, Sandip Karmakar, Ferdous Barbhuiya. Publication: arXiv, August 2016 [10].
Description: inter-vector angles and histogram features were implemented on the CK+ and
JAFFE datasets; the accuracy obtained was 95%. Limitation: the algorithm lacks compression
techniques.

5. Analysis of Facial Expressions from Video Images using PCA. Authors: Praseeda
Lekshmi V., Dr. M. Sasikumar, Naveen S. Publication: World Congress on Engineering, July
2008, Vol. I [7]. Description: the original paper implementing analysis of expressions from
video images; the accuracy obtained was 88%. Limitation: it was developed on the basis of
only 10 images.

6. DeXpression: Deep Convolutional Neural Network for Expression Recognition. Authors:
Peter Burkert, Felix Trier, Muhammad Zeshan Afzal, Andreas Dengel and Marcus Liwicki.
Publication: arXiv, August 2016 [9]. Description: a CNN was used for emotion detection on
the MMI and CKP databases, achieving accuracies of 99.6% and 98.36% respectively.
Limitation: cannot detect more complex emotions.
After finishing the survey and studying the plethora of techniques relating to emotion
detection, we concluded that convolutional neural networks provide excellent results. It is
observed that the FER-2013 database yields the lowest accuracy, because it represents images
of real-life scenarios: it was gathered from the results of Google image searches for each
emotion and its synonyms.
Chapter 3
Proposal
3.1 Problem Statement

The classroom is affected by many distractions and factors. The quality of classroom teaching
can be degraded by a plethora of causes that may not be superficially visible. We plan to
create an automated system which will assist college authorities and teachers in evaluating the
emotions expressed by students during a lecture. By evaluating these data, the quality of the
lecture and the attention of the students can be extracted and used to improve overall lecture
quality. In the implementation, we will leverage computer vision technology to detect the
faces of students, feed the extracted facial regions to a neural network, and classify the
emotions into one of six common emotion classes.
3.2 Proposed Work
In this project, we implement computer vision technology in order to detect the emotions
expressed by students in a classroom environment. The first stage is face detection from a live
stream. A frame is captured at buffered intervals using OpenCV and converted into a
grayscale image of size 48×48 pixels. Face detection is performed using the Viola-Jones
algorithm, specifically the frontal-face Haar cascade. The cropped face image is then fed to
the convolutional neural network model for feature extraction and emotion detection. The
convolutional neural network trains itself to detect emotions from these images. The output is
one of the following six emotion classes (a code sketch of this first stage follows the list):
• Neutral
• Interested
• Bored
• Frustrated
• Confused
• Laughing
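A minimal sketch of this first stage, assuming OpenCV's stock frontal-face Haar cascade and
the default camera; the capture timing and the hand-off to the classifier are placeholders, not
the final implementation.

import cv2

# OpenCV's pretrained frontal-face Haar cascade, shipped with the library.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)            # live video from the default camera
ret, frame = cap.read()              # grab one buffered frame
if ret:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Slide the cascade over the frame at multiple scales.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in faces:
        # Crop the detected face and resize it to the CNN's 48x48 input.
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48))
        # `face` would now be passed to the emotion classifier.
cap.release()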
3.3.1 Face Detection: Using the Viola-Jones Algorithm

In our system we only need the frontal faces, normalized in scale, from the input images.
Therefore it is important to localize and extract the facial region from an image and exclude
the unnecessary parts, to reduce the computation required for feature extraction and
processing. We use the Viola-Jones face detection method, as it is a robust algorithm capable
of processing fast enough for real-time systems.

In Viola-Jones we construct an intermediate image, known as the integral image, that is used
to reduce the computational complexity and improve the performance of our face detection
algorithm. Viola-Jones comprises three techniques for face detection:

3.3.1.1 Using Haar Features
The Haar features used for face detection are rectangular and are evaluated using the integral
image. Figure 3.2 shows different types of Haar features, which correspond to properties
common to human faces. The eye region is darker than the upper cheeks, so the second type
of Haar feature in Figure 3.2 is used to detect that facial region, and another Haar feature
targets the nose-bridge region, which is brighter than the cheeks, as shown in Figure 3.3.
Here, using these features we can find the locations of the eyes, the bridge of the nose and the
mouth by calculating

Value of feature = Σ(pixels in black area) − Σ(pixels in white area)    (3.1)
This is used for facial edge detection, and the output is a horizontal high-value line. In the
Haar stage we use a 24×24-pixel sub-window on the image, within which there are 162,336
possible features that can be extracted and used for facial region detection. We create a kernel
from a Haar feature to extract this line and apply the kernel to the whole image; it gives a high
output only where the image values match the kernel, which is our expected output.
Figure 3.3: Haar features used to recognize the eye and nose-bridge regions of the face
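To sketch how Equation 3.1 is evaluated cheaply, the integral image reduces any rectangle
sum to four array lookups; the two-rectangle feature placement and sizes below are illustrative
assumptions.

import numpy as np

def integral_image(img):
    # ii[y, x] = sum of all pixels above and to the left of (y, x), inclusive.
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, y, x, h, w):
    # Sum over img[y:y+h, x:x+w] using four lookups in the integral image.
    total = ii[y + h - 1, x + w - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0 and x > 0:
        total += ii[y - 1, x - 1]
    return total

img = np.random.randint(0, 256, (24, 24))   # one 24x24 sub-window
ii = integral_image(img)

# A two-rectangle Haar feature in the spirit of Equation 3.1: the sum of the
# darker (top) strip minus the sum of the brighter (bottom) strip.
value = rect_sum(ii, 4, 4, 6, 12) - rect_sum(ii, 10, 4, 6, 12)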
3.3.1.2 AdaBoost Training

AdaBoost is used to optimize the process of detecting the face. The term "boosting" refers to
building, at each stage, a classifier that is more powerful than its parts, assembled from simple
weak classifiers computed over different features. As we saw in the previous step, a very large
number of features can be computed from an image, so to avoid redundant features that are
less important to the classifier we use the AdaBoost algorithm. It finds the important features
through a weighted combination of weak classifiers; a feature is retained only if its weak
classifier performs better than random guessing. The algorithm thus constructs a strong
classifier as a linear combination of weighted simple weak classifiers.
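A toy sketch of this idea, where threshold "stumps" play the role of the weak classifiers built
from individual Haar-feature responses; the data, labels and number of rounds are made up
for illustration.

import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])   # one feature value per window
y = np.array([-1, -1, -1, 1, 1, 1])            # +1 = face, -1 = non-face
w = np.ones(len(X)) / len(X)                   # uniform sample weights

stumps = []
for _ in range(3):                             # three boosting rounds
    # Pick the threshold stump with the lowest weighted error.
    thresh, err = min(((t, (w * (np.where(X > t, 1, -1) != y)).sum())
                       for t in X), key=lambda p: p[1])
    err = max(err, 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)      # weight of this weak classifier
    pred = np.where(X > thresh, 1, -1)
    w *= np.exp(-alpha * y * pred)             # re-weight: boost the mistakes
    w /= w.sum()
    stumps.append((alpha, thresh))

# Strong classifier: sign of the weighted sum of the weak classifiers.
strong = np.sign(sum(a * np.where(X > t, 1, -1) for a, t in stumps))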
3.3.1.3 Cascade Classifier

A cascade classifier is used to combine many of the features efficiently. The term "cascade"
refers to the series of filter stages composing the resultant classifier. It works by discarding
non-face images early, to avoid unnecessary work and spend more time on images with
probable face regions. The cascade classifier is therefore composed of stages, each containing
a strong classifier, so that the output of each stage can be used to discard non-face regions, as
shown in Figure 3.4.
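Conceptually, evaluating the cascade on one sub-window looks like the sketch below, where
`stages` is a hypothetical list of strong-classifier callables.

def cascade_detect(window, stages):
    # Each stage is a strong classifier; a window must pass every stage.
    for classify in stages:
        if not classify(window):   # rejected: stop immediately, so most
            return False           # non-face windows exit after a cheap stage
    return True                    # accepted by all stages: a probable face

Because the early stages are cheap and reject most non-face sub-windows immediately, the
expensive later stages run on only a small fraction of the image.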
Figure 3.4: Cascade Classifier
3.3.2 Convolutional Neural Network (CNN)

1. Convolution Layer

The convolution layer's parameters consist of a set of learnable filters that learn over time.
Every filter is small spatially (along width and height), but extends through the full depth of
the input volume. During the forward pass, we slide (more precisely, convolve) each filter
across the width and height of the input volume and compute dot products between the entries
of the filter and the input at each position.
As we slide the filter over the width and height of the input volume, we produce a
2-dimensional activation map that gives the responses of that filter at every spatial position.
Intuitively, the network will learn filters that activate when they see some type of visual
feature, such as an edge of some orientation or a blotch of some color on the first layer, or
eventually entire honeycomb or wheel-like patterns on higher layers of the network. Each
convolution layer has an entire set of filters, and each of them produces a separate
2-dimensional activation map. We stack these activation maps along the depth dimension to
produce the output volume. The spatial extent of this connectivity is a hyperparameter called
the receptive field of the neuron (equivalently, the filter size). In our dataset the images have a
size of 48×48 pixels; the first convolution layer has 32 filters of size 5×5 and the next
convolution layer has 64 filters of size 5×5.
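A minimal Keras sketch of the stack just described; the 32 and 64 filters of size 5×5 come
from the text, while the pooling, dropout and dense-layer choices are our assumptions.

from tensorflow.keras import layers, models

model = models.Sequential([
    # 48x48 grayscale input; 5x5 filters shrink each side by 4 (no padding).
    layers.Conv2D(32, (5, 5), activation="relu", input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),                   # 44x44 -> 22x22
    layers.Conv2D(64, (5, 5), activation="relu"),  # 22x22 -> 18x18
    layers.MaxPooling2D((2, 2)),                   # 18x18 -> 9x9
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(6, activation="softmax"),         # six emotion classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])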
Spatial arrangement: three hyperparameters control the size of the output volume:

1. Depth
2. Stride
3. Zero-padding

The spatial size of the output volume is a function of:

1. The input volume size (W)
2. The receptive field size of the convolution layer neurons (F)
3. The stride with which they are applied (S)
4. The amount of zero padding used on the border (P)

The formula for calculating how many neurons fit is given by

(W − F + 2P)/S + 1

In our dataset, an input of 48×48 with a receptive field size of 5×5, zero padding of 0 and a
stride of 1 generates an output of 44×44.
1.1 Backpropagation

The backward pass of a convolution operation, for both the data and the weights, is also a
convolution, but with spatially flipped filters. With the help of backpropagation we obtain the
updated weights of the filters.
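A one-dimensional numerical sketch of this claim; the toy loss L = sum(y) is an assumption
made only so the gradient can be checked by finite differences.

import numpy as np

x = np.random.randn(8)                    # input signal
w = np.random.randn(3)                    # filter

# Forward pass: 'valid' cross-correlation, as computed by CNN layers.
y = np.correlate(x, w, mode="valid")
dy = np.ones_like(y)                      # upstream gradient for L = sum(y)

# Backward pass w.r.t. the input: a full convolution with the spatially
# flipped filter (np.convolve flips its second argument internally).
dx = np.convolve(dy, w, mode="full")

# Finite-difference check of each input gradient.
eps = 1e-6
for j in range(len(x)):
    xp = x.copy()
    xp[j] += eps
    num = (np.correlate(xp, w, mode="valid").sum() - y.sum()) / eps
    assert abs(num - dx[j]) < 1e-4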
2. Pooling Layer

Its function is to progressively reduce the spatial size of the representation, to reduce the
amount of parameters and computation in the network, and also to control overfitting. The
pooling layer operates independently on every depth slice of the input and resizes it spatially.
The most common form is a max pooling layer with filters of size 2×2 applied with a stride of
2, which downsamples every depth slice in the input by 2 along both width and height,
discarding 75% of the activations. Every MAX operation in this case takes a max over 4
numbers (a little 2×2 region in some depth slice). The depth dimension remains unchanged.
More generally, the pooling layer accepts a volume of size W1×H1×D1 and requires two
hyperparameters:

1. Spatial extent (F)
2. Stride (S)

It produces a volume of size W2×H2×D2, where:

W2 = (W1 − F)/S + 1    (3.4)
H2 = (H1 − F)/S + 1    (3.5)
D2 = D1    (3.6)
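A tiny NumPy sketch of 2×2 max pooling with stride 2 on one depth slice, consistent with the
formulas above:

import numpy as np

def max_pool_2x2(x):
    # 2x2 max pooling, stride 2, on one depth slice (H and W must be even).
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
y = max_pool_2x2(x)     # 4x4 -> 2x2; each output is a max over 4 numbers
print(y)                # [[ 5.  7.] [13. 15.]]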
3.4 Hardware & Software Requirement

3.4.1 Software Requirements

• Python: a popular and easy-to-use language for image processing algorithms
• Keras: an open-source library written in Python that allows fast implementation of deep
neural network models
Chapter 4

Planning and Formulation
Chapter 5
Design of System
• DFD Level 0
The expression detection system takes an image containing a face as input. The system then
tries to predict the emotion expressed by the face in the image.
• DFD Level 1
The expression detection system takes an image containing a face as input and detects facial
features that are given to the CNN as input to predict the emotions.
The facial features are detected using the Viola-Jones face detection algorithm. The detected
features of the faces are then passed to the CNN to predict emotions.
• DFD Level 2
The face detection algorithm detects facial features such as the eyes, nose and mouth. These
features are passed to the CNN for feature extraction and emotion grouping, yielding grouped
emotions such as angry, happy, sad, laughing, etc.
USE CASE DIAGRAM
This diagram shows the interaction between the users and the system in detail.
The use cases "Interact with the System" and "Provide face" are performed by the user. The
other use cases, "Detect face", "Classify Emotions" and "Display Emotions", are performed
by the system.
SEQUENCE DIAGRAM
This diagram shows the sequence of the processes in order. The first layer depicts the input
layer. The middle layer represents the core application layer. The third and final layer
represents the model, which carries out the various functions of the system such as feature
extraction, feature reduction and classification.
Chapter 6
Proposed Results
6.1 Proposed Results & Analysis

6.1.1 Dataset

We used the FER-2013 database, which was created as part of a larger project that included a
competition on Kaggle.com. We were able to look to that project to understand the dataset
and the sort of results we could hope to achieve. In particular, the winner of the competition
achieved a test accuracy of 71% by cleverly implementing a CNN whose output layer fed into
a linear SVM. We relabelled the database to suit a classroom scenario: Angry was changed to
Frustrated, Neutral was kept as it is, Happy was changed to Laughing, Sad was changed to
Bored, Disgust was interpreted as Confused, and Surprised was converted to Interested.
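A sketch of this relabelling as a lookup table, keyed by the standard FER-2013 class indices
(0 Angry, 1 Disgust, 2 Fear, 3 Happy, 4 Sad, 5 Surprise, 6 Neutral); the report does not state
how the Fear class was handled, so it is left unmapped here.

# Classroom relabelling of the FER-2013 classes described above.
CLASSROOM_LABELS = {
    0: "Frustrated",   # Angry
    1: "Confused",     # Disgust
    3: "Laughing",     # Happy
    4: "Bored",        # Sad
    5: "Interested",   # Surprise
    6: "Neutral",      # Neutral
}

def relabel(fer_class):
    # Returns None for classes the report does not remap (e.g. Fear).
    return CLASSROOM_LABELS.get(fer_class)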
Figure 6.2: Multiple Face Detection
Chapter 7
Conclusion
Computer vision is an extremely robust technology that can be used in a myriad of
applications. Powered by neural networks, any system implementing computer vision can be
trained to achieve a significantly high level of accuracy at detecting and recognizing objects in
a real-world environment. In our system, the convolutional neural network can achieve
remarkable results compared to other technologies that could be applied to the same problem.
The fast computation and accurate predictions obtained from the CNN model give us better
insight into the emotions expressed by the students. The categories of emotions expressed by
the students can indicate a great deal about the lecture in general. This system will prove
extremely helpful not only at improving teaching quality, but also at determining the level of
attention displayed by the students during the lecture. Based on the data obtained, further
improvements can be made to the classroom teaching pattern.
Algorithm 1 Algorithm for Face Detection
1: Start
2: Read the live video stream from the camera
3: while TRUE do
4:    Take a frame from the video
5:    Convert the RGB frame into a grayscale image:
6:    for each pixel do
7:        grayscale value = (0.3 * R) + (0.59 * G) + (0.11 * B)
8:        Store this grayscale value in the array
9:    end for
10:   Store the array as an image
11:   Detect face scales in the image using the cascade classifier
12:   for each (x, y, w, h) in faces do
13:       Draw a rectangle around the face: rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
14:   end for
15:   Show the image with the detected face
16: end while
17: Destroy the image window
18: Release the camera
19: Stop
Chapter 8
Future Work
Pertaining to the future work on this project, we believe there are two major areas of focus
that would improve our emotion recognition performance. The first improvement is to
calibrate the architecture of the CNN model to match more accurately the problem our project
focuses on. Examples of this calibration would be removing redundant parameters and adding
new ones. Furthermore, adjusting the dropout probability and experimenting to find ideal
stride sizes can also be pursued in future work. The second improvement area is the
adaptation of the data sets to represent a real-time recognition environment, in order to
produce a model that can be used in real-life application scenarios. A case in point would be
simulating noisy backgrounds in the images, which can help the model produce accurate
recognitions. Overall, our models achieve satisfactory results on the FER-2013 data set with a
much simpler CNN architecture. More work is necessary to make the real-time system robust
outside laboratory conditions, and it is possible that a deeper, more carefully calibrated CNN
could improve results.
References
[1] Sagor Chandro Bakchy, Mst. Jannatul Ferdous, Ananna Hoque Sathi, Krishna Chandro
Ray, Faisal Imran, Md. Meraj Ali, "Facial Expression Recognition based on Support Vector
Machine using Gabor Wavelet Filter", 2nd International Conference on Electrical & Electronic
Engineering (ICEEE), 27-29 December 2017.

[2] Abir Fathallah, Lotfi Abdi, Ali Douik, "Facial Expression Recognition via Deep
Learning", IEEE/ACS 14th International Conference on Computer Systems and Applications,
2017.

[3] P. Viola and M. J. Jones, "Robust real-time face detection", International Journal of
Computer Vision, Vol. 57, No. 2, pp. 137-154, 2004.

[5] R. Chellappa, C. L. Wilson, S. Sirohey, "Human and Machine Recognition of Faces: A
Survey", Proc. IEEE, Vol. 83, No. 5, pp. 705-741, 1995.

[6] Naveen Kumar H N, Jagadeesha S, Amith K Jain, "Human Facial Expression Recognition
from Static Images using Shape and Appearance Feature", 2nd International Conference on
Applied and Theoretical Computing and Communication Technology (iCATccT), December
2016.

[7] Praseeda Lekshmi V., Dr. M. Sasikumar, Naveen S, "Analysis of Facial Expressions from
Video Images using PCA", Proceedings of the World Congress on Engineering 2008, Vol. I,
WCE 2008, July 2-4, 2008, London, U.K.

[8] Tong Zhang, Wenming Zheng, Zhen Cui, Yuan Zong, Jingwei Yan and Keyu Yan, "A
Deep Neural Network Driven Feature Learning Method for Multi-view Facial Expression
Recognition", IEEE Transactions on Multimedia, Volume 18, Issue 12, December 2016.

[9] Peter Burkert, Felix Trier, Muhammad Zeshan Afzal, Andreas Dengel and Marcus
Liwicki, "DeXpression: Deep Convolutional Neural Network for Expression Recognition",
arXiv:1509.05371v2, 17 August 2016.

[10] Rahul Islam, Karan Ahuja, Sandip Karmakar, Ferdous Barbhuiya, "SenTion: A
framework for Sensing Facial Expressions", arXiv:1608.04489, August 2016.
Appendix