
Facial Emotion Detection to Assess Learner's State of Mind in an Online Learning System


Moutan Mukhopadhyay, Bengal Institute of Technology, Kolkata, West Bengal, India, [email protected]
Saurabh Pal, Bengal Institute of Technology, Kolkata, West Bengal, India, [email protected]
Anand Nayyar, Duy Tan University, Da Nang, Vietnam, [email protected]
Pijush Kanti Dutta Pramanik, National Institute of Technology, Durgapur, West Bengal, India, [email protected]
Niloy Dasgupta, Bengal Institute of Technology, Kolkata, West Bengal, India, [email protected]
Prasenjit Choudhury, National Institute of Technology, Durgapur, West Bengal, India, [email protected]

ABSTRACT
Despite the success and popularity of online learning systems, they still lack the ability to dynamically adapt suitable pedagogical methods to the changing emotions and behaviour of the learner, as can be done in the face-to-face mode of learning. This makes the learning process mechanized, which significantly affects the learning outcome. To resolve this, the first and necessary step is to assess the emotion of a learner and identify the change of emotions during a learning session. Usually, images of facial expressions are analysed to assess one's state of mind. However, human emotions are far more complex, and these psychological states may not be reflected only through the basic emotion of a learner (i.e., by analysing a single image), but through a combination of two or more emotions reflected on the face over a period of time. From a real survey, we derived four complex emotions, each a combination of basic human emotions, that are often experienced by a learner during a learning session. To capture these combined emotions correctly, we considered a fixed set of continuous image frames instead of discrete images. We built a CNN model to classify the basic emotions and then identify the states of mind of the learners. The outcome is verified mathematically as well as by surveying the learners. The results show 65% and 62% accuracy, respectively, for emotion classification and state of mind identification.

CCS Concepts
• Applied computing➝Interactive learning environments • Applied computing➝E-learning • Computing methodologies➝Computer vision • Human-centered computing➝Gestural input

Keywords
Online learning systems; Emotion detection; Facial expression; Machine learning; CNN; Image processing; Combined emotion; State of mind

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
ICIIT 2020, February 19–22, 2020, Hanoi, Viet Nam
© 2020 Association for Computing Machinery.
ACM ISBN 978-1-4503-7659-4/19/07…$15.00
DOI: https://doi.org/10.1145/3385209.3385231

1. INTRODUCTION
The continuous advancement in digital technologies and multimedia has proliferated the use of online learning systems (OLSs) [1] [2], settling the constraints of the face-to-face mode of learning like cost, time restriction, space requirement, unavailability, etc. OLS advantages like ubiquity, flexibility, availability (in terms of pedagogy and resources), multimedia support, and user-centricity (more user control over learning) have made it quite popular among all categories of learners. However, OLSs still face the challenge of being not so learner-friendly. An OLS treats all learners the same throughout a learning session, not considering their varied emotional and psychological fitness for learning. But in practice, learners are different, and so is their capability to fathom and process the learning instructions and contents. Furthermore, during a particular learning session, a learner may go through various mental and emotional states, which directly affect the learning process.

Failing to grasp and process information while learning may eventually lead the learner to get confused, feel strained (not comprehending), and feel bored and weary (losing attention and interest). This may trigger changes in learner behaviour, making him/her disorientated, frustrated, depressed, grim, or angry, which ultimately results in skipping and dropping out of learning sessions. Thus, an unstable and downcast emotion of a learner leads to a low or negative feeling, which results in poor learning performance.

In face-to-face or human tutoring, the emotional and psychological changes in the learner are easily identified by the teacher through a human's natural cognizance and experience. Based on the changed emotion and behaviour of the learner, the teacher may follow an appropriate pedagogical method and approach. This brings attention to the conclusive fact that the teaching and learning process needs to adapt as per the emotional and psychological state of the learner.

A learner's emotional involvement reflects the degree of engagement in learning, also referred to as affective engagement [3]. As an OLS has no intelligence to recognize human emotion and the system is more mechanized, it is inherently unable to understand the affective engagement of a learner while learning. Therefore, it has to be explicitly armoured to do that. For automatic affective engagement detection, generally, two approaches are adopted [3], as shown in Table 1. For an OLS, the computer vision based approach is more suitable and has a wider application scope compared to the sensor data analysis approach.

Among the perceptual assessments (eye movement, facial expression, and gesture and posture), the human face reflects one's internal emotion and psychological state most prominently. For this reason, among the three visual cue assessment approaches, facial expression is the most potent and effective way to assess the affective engagement of a learner. Facial expression (or emotion) detection uses robust geometric and pattern matching algorithms to assess muscle tone and stretching on the face to find the instantaneous emotion of a person.

Generally, human emotions change due to internal and external stimuli and events. Figure 1 shows the basic and universally accepted human facial emotions [4]. Likewise, in a learning scenario, a learner's emotion changes as per the learning actions he/she encounters. These emotions are temporary, discrete, and change frequently before one can realize it. The different emotions that develop during a learning process reflect the different states of mind of the learner. The state of mind is a long-term outlook, an overall experience of a learner, whereas emotions are the components of the state of mind. Frequently changing emotions cannot capture the actual feeling of a learner. Thus, finding a long-lasting state of mind is more useful than a temporary emotion. Analysing this would help in identifying the psychological state of the learner and its probable effect on the learning activity, which can be used as a metric for making learning more efficient and instructional pedagogy more anthropomorphized.

Figure 1. Basic human facial emotions: Surprise, Fear, Disgust, Anger, Happiness, Sadness, Neutral

1.1 Problem Description
Most facial emotion detection algorithms identify one of the seven basic emotions [4] listed in Figure 1. However, human emotions are far more complex and may contain shades of more than one emotion. Therefore, the psychological states may not be revealed only through a basic emotion, but through a combination of two or more emotions which may be reflected on the face over a period of time. Expression of these combined emotions is generally continuous and not very discrete; i.e., they do not change very abruptly. So, explicitly choosing one emotion at an instant of time may fail to assess the exact emotion.

Similarly, a learner also experiences a combination of different states of mind, as listed in Figure 2, during the learning process. These states of mind may last for a short duration and transform into other emotions over time. Thus, identifying the actual emotion of a learner during the learning session is challenging.

1.2 Problem Statement
In view of the above discussions, in this paper, we precisely address the following three problems:
a) Identifying the change of emotions reflecting the state of mind of the learner during a learning session.
b) Aggregating the emotions of the learner over time to assess the combinatorial emotions during the learning session.
c) Analysing these combinatorial emotions as feedback for the learning session on learning content, pedagogical method, etc.

1.3 Contribution of this Paper
The paper has a two-fold contribution as follows:
a) We find different patterns of single or combined emotions of a learner during the learning session.
b) We propose a definite model to assess the state of mind of a learner from his/her different emotional expressions at any point of time in a learning session.

1.4 Implication of the Work
The proposed work has the following important implications in a personalized OLS and beyond:
a) Depending on the change of emotion of a learner, an intelligent OLS may adapt to a suitable pedagogical method (e.g., changing the learning contents, changing the pace of tutoring, etc.).
b) By analysing the found pattern of emotions and changes in emotions, suitable learning contents may be designed and recommended to the learner.
c) The proposed model and solution are not limited to OLSs but can be used in many other applications, such as human resource analysis (e.g., during an interview), assessing a driver's condition (e.g., sleepy, tired, etc.), online product feedback, real-time TV show recommendation, etc.

1.5 Organization of the Paper
The rest of the paper is organized as follows. Section 2 mentions the related work. Section 3 presents the details of detecting the emotion of a learner; in this section, the model for combinatorial emotions, the method and algorithm for detecting emotions, the classification technique, and the practical experiment are presented. The results of the experiment and their analysis are discussed in Section 4. Section 5 concludes the paper, mentioning the scope of future work based on this paper.

Table 1. Automatic affective engagement detection approaches

Sensor data analysis based approach
- Uses: Bio-sensors
- Measures: Heartbeat, EEG data, blood pressure, skin response
- Method: Sensor analysis
- Advantages: Accurate finding of the learner's physiological state; alertness and arousal are quite well detected.
- Disadvantages: Requires specialized sensors, which may not be convenient for a real-life learning environment; the emotions detected are too few to be useful for wide online learning applications.

Computer vision based approach
- Uses: Camera
- Measures: Eye movement, facial expression, gesture and posture
- Method: Feature extraction and analysis from captured images
- Advantages: Low cost; easy to set up and use; the wide availability of cameras in different devices makes detecting a learner's engagement widely supportable; computer vision can be trained to assess the learner the same way a teacher does in face-to-face teaching.
- Disadvantages: Lacking in accuracy; the result's accuracy depends on the dataset and algorithm used for detection.

Figure 2. Different states of mind which may arise in the learner during the learning process: Confusion, Satisfaction, Dissatisfaction, Frustration, Tense, Anxiety, Delight

2. RELATED WORK
The ability of computers to detect human emotions has encouraged its wide application in education and learning systems. Emotion detection has been successful in detecting a learner's alertness and attention [5], getting opinions and feedback [6], and assessing a learner's feelings and affective state [7]. In regular classroom learning settings, emotion detection by head posture and facial expression detection can find the students' attentiveness and synchronization rate [8]. In comparison to regular classroom learning, emotion detection has especially benefited the e-learning system. Due to its immense application scope, research on emotion detection is quite popular, and the advancement in technology has accelerated it. One of the key applications of emotion detection in e-learning is personalized learning support, whereby learning is adapted and personalized based on the learner's emotion to fit the learner's suitability for learning. In [7], facial expression is detected for emotion recognition (sad, happy, etc.) to set the difficulty level of the task assigned to the learner; based on the learner's emotion, the system tries to make the learning happy and joyful. Other applications, like teaching process improvement, are also used in e-learning. While teaching over the internet, a teacher can use a facial emotion detection technique to know the learners' feedback remotely and thus adapt the teaching methodology likewise [9].

Many advanced computer vision algorithms have been used to detect emotions from facial expressions over the past few years. Facial expression detection involves processes like facial landmark detection, facial feature extraction, and classification. In [10], automatic facial expression detection was performed by extracting features using wavelet transformation and classifying emotions with the K-nearest neighbours (KNN) algorithm. Typically, a large number of features exist on one's face; therefore, the authors considered principal component analysis (PCA) for selecting facial features. In another approach, Krithika et al. [11] used the Viola-Jones algorithm and local binary patterns (LBP) for detecting the face, facial expression, eye and head movement, so as to detect the attention, boredom, etc. of the learner while learning. Machine learning and deep learning algorithms have proved to be efficient in terms of implementation and results as compared to other contemporary systems [12]. Chen et al. [13] showed that a machine learning technique like linear SVM gives a better result for identifying relevant and irrelevant facial expressions using Gabor wavelet and shape features. In another approach, Haar cascades were used for detecting eye and mouth features, combined with a neural network, to provide much better emotion detection [14]. Yang et al. [15] proposed a deep neural network (DNN) model using vectorized facial features; the human facial expressions are represented as vectors, allowing the DNN to be trained with high accuracy. It is observed that among the many advanced machine learning techniques, the convolutional neural network (CNN) has proved to be much more efficient in terms of automated feature extraction, lesser input, and classification accuracy [16]. In [16], two facial expression detection methods, namely autoencoder and CNN, were compared; it was found that the CNN is statistically able to predict emotions with higher accuracy. Similarly, in [17], a CNN achieved better accuracy even using a small-sized dataset (EmotiW). A CNN is not only used for detecting the face and facial expression but can also be fine-tuned to detect important parts of the face instead of the full face. Moreover, the CNN model works better than the existing models on multiple datasets, including FER-2013, CK+, FERG and JAFFE [18] [9] [19]. Conventionally, emotion detection is performed on a static image, but detecting emotion from facial expressions in a video is a challenge. Zhang et al. [20] proposed that two CNN models and a deep belief network (DBN) model, combined as a hybrid model, can extract facial expressions very well from a running video.

Even though a lot of work has been done on emotion detection, it has all focused on finding the basic emotions, while the complex emotions, natural to the human face during the learning process, are ignored.

Table 2. Learner's state of mind, its implication in learning, and Plutchik's combined emotion composition

Learner's state of mind | Implication in learning | Combination of emotions (Plutchik's theorem)
Confused | Learner is not sure about a concept, or feels difficulty in comprehending the topic | Surprise + Anticipation
Satisfied | Successfully completed the learning task (reading, understanding, solving, etc.) | Not available
Dissatisfied | Not understanding the topic, or the learner is not happy with the content | Surprise + Sadness
Frustrated | Performs repeatedly poorly on a given learning task | Not available

Figure 3. Basic emotion patterns for the states of mind of the surveyed learners

3. EMOTION DETECTION OF A LEARNER
3.1 Modelling the Combinatorial Emotions and State of Mind
3.1.1 Existing set of combined emotions
Assessing the learner's state of mind during a learning session helps to know how a learner feels about the learning material. Typically, humans do not explicitly express their feelings by a single discrete emotion but rather by an orchestration of many emotions, either simultaneously or in sequence. These emotions altogether lead to a complex emotion, often determined by a combination of the primary emotions, as postulated by Robert Plutchik [21]. Based on the general applicability for all kinds of learners and the wide application scope in learning, we have chosen four types of state of mind for assessment, as shown in Table 2.

It is found that states of mind like Satisfied and Frustrated are not classified in Plutchik's theory of emotion [21]. Though Plutchik has theorized many emotions, not all of the complex emotions mentioned above are defined by a combination of basic emotions. Furthermore, "Anticipation" is not identified as a basic emotion and is not detected computationally [21]. Therefore, there is a need for an emotion pattern finding that defines these complex emotions in terms of basic emotion combinations.

3.1.2 Deriving combined emotions of a learner
To find the complex emotions of a learner from the combinations of basic emotions, a survey was conducted on 150 candidates studying at the undergraduate level. Each student was given a small learning unit, composed of reading material along with puzzles and programs. The learning material was chosen such that it might evoke the considered states of mind (Confusion, Satisfaction, Dissatisfaction, Frustration) in the learner. The facial emotions of each student while going through the learning material were recorded on video. The state of mind of each student was monitored and noted by an expert. Similarly, after completing the learning session, the learner was prompted to

give feedback on his/her emotional states during the learning session; i.e., they were asked to select from the four options: Confusion, Satisfaction, Dissatisfaction, and Frustration.

Both the expert's and learners' feedback were compared to rule out discrepancies. Cases with similar feedback were considered for further assessment. Only 115 candidates out of the 150 learners were found to have feedback similar to that of the expert. The recorded videos were analysed to find the emotion patterns of each selected candidate. Each video was split into image frames, whereby each image was analysed through a standard emotion detection API (Windows Azure) [22] to find the basic emotion levels. The predicted confidences of each emotion class for each candidate were added. The basic emotion patterns for the states of mind are shown in Figure 3(a-d), as derived from the expert's and learners' feedback. For example, Figure 3(a) shows that a learner who feels confused exhibits a higher basic emotion pattern for neutral and shows either surprise or sad emotions, whereas in Figure 3(d), a candidate who is frustrated exhibits both sad and angry emotions along with neutral.

Table 3 summarises the identified combinations of basic emotion patterns for each state of mind of the 115 students after analysing their emotions. The found combinatorial emotions are neutral to the occurrence order and magnitude of the basic emotions.

Table 3. Basic emotion combination pattern for each state of mind

Combinatorial emotions | State of mind of a learner
Neutral + Surprise/Sad | Confusion
Happy + Neutral | Satisfaction/Delight
Neutral + Sadness | Disappointment/Dissatisfaction
Sad + Angry + Neutral | Frustration

3.2 Emotion Detection
Emotion can be detected from a facial image by facial expression pattern matching. In this regard, machine learning tools are quite sophisticated in identifying facial expressions and further classifying them into emotions. We chose the CNN for facial expression identification and emotion detection. The reason for choosing the CNN, among the other available machine learning tools, is that a CNN, a supervised deep-learning algorithm, automatically identifies the features, represents the most discriminative ones, and hence allows for better performance. In CNN-based approaches, the input image is filtered to produce feature maps, where each feature map is then passed through connected neural network layers for recognition of the facial expression and identification of the emotion class. The CNN gives better accuracy than the other neural network-based classifiers [12].

3.3 Identifying Learner's State of Mind
In a learning process, it is observed that the emotion of a learner does not change instantly; the emotion transformation happens gradually. Furthermore, the emotion changes stay on the learner's face for a period of time (at least 3 to 5 seconds). For this reason, a complex emotion, or the learner's state of mind, cannot be detected by judging only one facial image. A sequence of images over a period of time is required to detect the state of mind of the learner while studying. For generalization, we consider that a human emotion transformation may last within a time frame of approximately 6 seconds. To identify the change in the learner's emotion, a group of 6 images is taken in sequence by capturing images at a rate of one image per second. This creates a window of size 6, representing the learner's images for the last 6 seconds, as shown in Figure 4. For each image, the facial expression is identified to assess the score of each class of basic emotion (as listed in Figure 1), as shown in Figure 5. The dominant emotion of the face image is the one having a very high confidence score. Since a face can display multiple shades of emotion, a different score value for each class of emotion is obtained. To normalize the classifier prediction error and exclude minor detected emotions, a threshold value of 10% is set for selecting the appropriate emotion class scores. In the window, for an image, all those emotions are selected whose confidence score is more than 10%.

To find the emotion pattern from the 6 images in a window, the mean of the confidence scores is calculated for each of the respective emotions, as given by Equation (1).

MX = (1/6) ∑ CX,i , summed over the images i = 1 to 6 in the window    (1)

where X = A, D, F, H, N, S, R, and CX,i is the confidence score of emotion class X in the i-th image. The means of the confidences of the corresponding emotions angry, disgust, fear, happy, neutral, sad and surprise are represented by MA, MD, MF, MH, MN, MS, and MR respectively.

The set ME of the means of all detected emotions' confidence scores for the 6 images in the window is defined in Equation (2), and the emotion pattern (EP) is defined through Equations (3) to (6).

ME = {MA, MD, MF, MH, MN, MS, MR}    (2)
E1 = max(ME)    (3)
E2 = max(ME − {E1})    (4)
E3 = max(ME − {E1, E2})    (5)
EP = {E1, E2, E3}    (6)

Equation (6) represents the three prominent emotions E1, E2, E3, selected from ME with emotion confidence scores E1 > E2 > E3. The confidence score allows choosing the three most prominent emotions. These three emotions, without considering their occurrence order and magnitude, form the emotion pattern for the six-second time frame. The selected emotion pattern is mapped to Table 3 to find the relevant state of mind for the particular time period.

Figure 4. Basic emotion pattern recognition from a series of images

Figure 5. The confidence scores of the basic emotions in an image frame

3.4 Learner's State of Mind Identification Model in an E-learning System
The state of mind identification process is incorporated into the Learning Management System (LMS) to achieve real-time learning adaptation, as shown in Figure 6. The web camera takes a video of the learner, from which frames (images of the learner) are grabbed at a frequency of 1 frame/sec.

Figure 6. Emotion detection modelling in an e-learning system: Video → Video frame capture → Image → Emotion detection → Emotions → State of mind identification → State of mind → Learning Management System → Adapted learning instruction

The emotion detection module uses the trained CNN classifier to detect the different emotions in the image. The state of mind identification module takes a sequence of 6 images as a window frame to identify the emotion pattern of the learner for the last 6 seconds. The learner's state of mind is identified from the derived emotion pattern.

Further, the identified state of mind is sent to the LMS as feedback on the learner's emotion. Based on the identified state of mind of a learner, the LMS adapts the instructional or pedagogical method in real time, making it more suitable for the particular learner.
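As a rough illustration of this loop (not the authors' code), the sketch below grabs one webcam frame per second with OpenCV and slides a six-image window over the per-frame emotion scores. The three callables stand in for components the paper describes but does not name.

```python
import time
from collections import deque

import cv2  # OpenCV, assumed available for webcam capture

WINDOW_SIZE = 6  # six frames at 1 frame/sec = the six-second window of Section 3.3

def monitor_learner(classify_emotions, state_of_mind, notify_lms):
    """Grab one webcam frame per second and report the learner's state of mind.

    The callables are placeholders: classify_emotions(frame) returns a dict of
    emotion confidences (the CNN of Section 3.5.1), state_of_mind(window)
    returns a label or None (Section 3.3), and notify_lms(label) is the
    feedback hook into the LMS.
    """
    window = deque(maxlen=WINDOW_SIZE)  # sliding 6-image window
    capture = cv2.VideoCapture(0)       # default webcam
    try:
        while True:
            ok, frame = capture.read()
            if ok:
                window.append(classify_emotions(frame))
                if len(window) == WINDOW_SIZE:
                    label = state_of_mind(list(window))
                    if label is not None:
                        notify_lms(label)  # LMS adapts pedagogy in real time
            time.sleep(1.0)  # 1 frame/sec, as in Section 3.4
    finally:
        capture.release()
```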
3.5 Experiment
The experiment for detecting the state of mind is conducted in the following three phases:
1. CNN, a deep learning algorithm, is implemented for detecting emotions from images.
2. The learner's state of mind is identified over a period of time.
3. Learner verification is performed on the output to assess the correctness of the state of mind detected from the learner's emotion pattern.
Each step is discussed in detail in the following subsections.

3.5.1 Working with CNN
3.5.1.1 Modelling and implementation
Figure 7. CNN model and its layered structure: Convolution → Max Pooling → Convolution → Convolution → Max Pooling → Convolution → Convolution → Max Pooling → Flatten → Dense (1024) → Dense (1024) → Output

The structure of our CNN model is given in Figure 7. It consists of five convolution layers with the ReLU activation function, three pooling layers, two fully connected layers, and the output layer. The functionality and the parameters set for each layer are as follows:
- The convolution layers produce feature maps from the images. The convolution kernels are set at a size of 3 x 3.
- The max pooling layers reduce the dimensions without losing important features and patterns. Each max pooling layer is set with a stride value of 2 and a pooling window of 2 x 2.
- The flatten layer converts the 2-dimensional data into 1-dimensional data that can be fed to a fully connected layer.
- The dense (or fully connected) layers combine two or more neural network layers. The 1-dimensional data from the flatten layer is fed as input to the input nodes of the first dense layer.
- The output layer has 7 nodes with the SoftMax activation function, where each node stands for a different emotion class.

The CNN model is implemented using Python. The built model is further trained and tested on a facial expression dataset for accuracy.
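A minimal Keras sketch of the Figure 7 stack follows. Only the layer ordering, the 3 x 3 kernels, the 2 x 2 stride-2 pooling, the two 1024-unit dense layers, and the 7-node SoftMax output come from the paper; the filter counts, dropout rates, optimizer, and loss are assumptions (the dropout and batch settings anticipate the training step described next).

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    """CNN of Figure 7: five 3x3 convolutions, three 2x2/stride-2 max-pooling
    layers, a flatten layer, two 1024-unit dense layers, and a 7-way SoftMax
    output. Filter counts, dropout rates, optimizer, and loss are assumptions;
    the paper does not report them."""
    return keras.Sequential([
        keras.Input(shape=(48, 48, 1)),  # FER2013 images: 48x48, grayscale
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(256, 3, activation="relu", padding="same"),
        layers.Conv2D(256, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Flatten(),
        layers.Dense(1024, activation="relu"),
        layers.Dropout(0.5),  # dropout regularization, per Section 3.5.1.2
        layers.Dense(1024, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(7, activation="softmax"),  # one node per basic emotion
    ])

model = build_model()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# Training per Section 3.5.1.2: mini-batches, ~50 epochs, 80/20 train/test split.
# model.fit(x_train, y_train, batch_size=64, epochs=50,
#           validation_data=(x_test, y_test))
```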
3.5.1.2 Training step
A CNN model requires a lot of labelled images for training. Here, the CNN model is trained with the Facial Expression Recognition (FER2013) dataset [23] [24], where each human face image in the dataset is labelled with the emotion reflected by the respective facial expression. The data consists of 48x48 pixel grayscale images of faces. The faces are more or less centred and occupy about the same amount of space in each image. The FER dataset consists of 28,709 samples, in which 7 types of emotions are depicted, namely Anger, Disgust, Fear, Happy, Sad, Surprise, and Neutral. The FER dataset is split into two sets: a training set (consisting of 80% of the sample data) and a test set (consisting of 20% of the sample data). The training set is fed to the CNN for training. Two techniques have been used to speed up the training process:
- Image batches as input to the CNN instead of single images.
- The dropout regularization technique, to get better performance from the model.

3.5.1.3 Testing step
We run our CNN model for around 50 epochs (an epoch being one forward and backward pass of the dataset through the CNN) and learn about the performance and accuracy of the model. After that, we use the testing images to test the model. The model is also tested in real-time analysis on several input video sequences and webcam sequences for the specific emotions depicted in each frame. The result is duly noted for each frame, and errors are recorded for any misclassification that occurs; accordingly, suitable measures are taken.

3.5.2 Learner verification
Emotion and state of mind are not quantifiable and often are not formally expressible. Therefore, identifying the implicit learner state of mind needs appropriate verification by the person concerned. For this reason, to measure the accuracy of our emotion model and emotion pattern recognition approach, the classified emotion pattern needs the learner's verification.

The approach was assessed and verified by 40 candidates on a graduate-level course. A short online tutorial followed by a test session on machine learning was carried out with the students. While each learner was in the learning session, a video recording of the candidate was made to identify the learner's state of mind at different times. After the learning session finished, the recorded video was analysed frame by frame to find the emotion pattern, and thus the state of mind, over time. The emotion pattern detection mechanism was followed to derive the state of mind for every 6 seconds.

The learner's identified states of mind were aggregated across the entire learning session. The candidates were prompted to give their feedback on the correctness of the assessed aggregated state of mind. It is hypothesized that a learner is truly able to report his/her state of mind correctly. The detected states of mind of a learner during a learning session are shown in Figure 8.

Figure 8. Learner's state of mind detected over time

4. RESULT ANALYSIS
Based on the two experiments – a) detecting and classifying emotion, and b) detecting the state of mind (emotion pattern) – the obtained results are shown and analysed separately for accuracy. The results are evaluated by four parameters, as described in Table 4,

and using three measures, as described in Table 5 and defined in Equations (7) to (9).

Table 4. Performance evaluation parameters

True Positive | The model correctly predicts an observation that exists.
False Positive | The model incorrectly predicts an observation.
True Negative | The model does not predict an observation that does not exist.
False Negative | The model misses predicting, or cannot recognize, an observation that exists.

Table 5. Performance evaluation metrics

Accuracy | An intuitive performance measure: the ratio of correct observations made to the total observations made.
Precision | The ratio of correct positive observations made to all positive observations predicted.
Recall | The ratio of correct positive observations made to all actually positive observations, whether predicted or missed.

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (7)
Precision = TP / (TP + FP)    (8)
Recall = TP / (TP + FN)    (9)
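As a quick worked check of Equations (7) to (9), the following snippet reproduces the values reported later in Table 7 from its raw counts:

```python
# Worked check of Equations (7)-(9), using the counts reported in Table 7.
tp, fp, tn, fn = 15, 8, 10, 7

accuracy = (tp + tn) / (tp + tn + fp + fn)  # 25/40 = 0.625 -> reported as 0.62
precision = tp / (tp + fp)                  # 15/23 = 0.652 -> reported as 0.65
recall = tp / (tp + fn)                     # 15/22 = 0.682 -> reported as 0.68

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}, recall={recall:.3f}")
```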
4.1 Accuracy of the CNN Model
The CNN model is tested for accuracy, precision and recall over the test data and other real data. For the FER2013 dataset, the model gives a test accuracy of around 65%, as shown in Table 6. In the FER2013 dataset, 'happy' is the most dominant emotion. Thus, the CNN model trained on it gives better accuracy in recognizing 'happy' and 'neutral' facial expressions than other emotions. Detecting 'disgust' and 'fear' has a lower accuracy rate. This inaccuracy is due to misclassifications of emotions, such as deducing 'sad' instead of 'fear' and 'angry' instead of 'disgust'.

Table 6. The performance measure of CNN for emotion detection

Metric | Value
Accuracy | 0.65
Precision | 0.69
Recall | 0.43

4.2 Performance Measure of the State of Mind Identification Model
The assessment of the accuracy of the proposed emotion model and emotion pattern recognition approach is carried out by manual verification of the predicted outputs. The learner verification against the predicted output for correctly identified, wrongly identified, and unidentified states of mind is shown in Table 7.

Table 7. The performance measure of the proposed approach

Metric | Value
True Positive | 15
False Positive | 8
True Negative | 10
False Negative | 7
Accuracy | 0.62
Precision | 0.65
Recall | 0.68

4.3 Critical Analysis
Accuracy depends on the ratio of correctly identified states of mind to the total number of observations. In our approach, the CNN shows 65% accuracy in classifying the basic emotions (listed in Figure 1). This leads to the conclusion that roughly 3 to 4 out of every 10 emotions analysed are wrong. In our proposed approach, we used a combinatorial emotion pattern for detecting the state of mind, so an incorrect emotion prediction leads to lower accuracy in the state of mind detection.

The experimental result exhibits quite a high false positive count. This denotes that the model is quite susceptible to predicting states of mind that differ from the learner's actual state of mind. Misclassifying the emotion 'fear' as 'sad' may lead to the learner being assigned 'confusion' as a state of mind. Similarly, misclassifying 'disgust' as 'angry' wrongly detects the learner as having 'confusion' when it should actually be 'frustration'. The high false negative count indicates that the model often misses revealing the actual feelings of the learner; for instance, failing to detect 'sad' and 'angry' could result in 'neutral' feelings and thus in not detecting the 'confusion' or 'frustration' state of mind. Some factors which lead to the non-identification of emotions are:
- Captured images often do not contain the whole face or a frontal view.
- The learner's frequent movement leads to instability in face recognition.
- Illumination is an important factor in capturing a clear image; poor lighting or very sharp lighting on the face affects emotion detection.
- The use of spectacles or glasses leads to poor performance in emotion detection.
- A contemptuous or inexpressive learner shows no change in emotion.
- A mature learner does not show much facial expression in a learning environment.

5. CONCLUSIONS AND FUTURE WORK
This paper proposes a method to assess the state of mind of learners during an online learning session. We have identified four complex emotions (confusion, dissatisfaction, satisfaction, and frustration) that are combinations of the basic emotions. We have also established the fact that considering a single image capture for assessing the emotion is not sufficient. That is why we have considered a window of six image frames, captured by the webcam, to assess the state of mind of the learner. Taking six image frames is also led by the consideration that human emotions are continuous and not very discrete; that is, they do not change abruptly; rather, it takes a while (though very little) for the state of mind to change.

In classifying a learner's emotions, our proposed CNN model performs fairly, with 65% accuracy, while the proposed model for identifying the learner's state of mind gives an accuracy of 62%. The results show that the state of mind identification for a learner is not very expressive; there is a lot of scope to improve. In fact, considering only facial expression images is not sufficient to assess human emotions correctly. Identification of emotions and the state of mind can be made more accurate by considering other facial features like eye movement, eye blinks, eye gaze change,

eyebrow movement, and other subtle facial expressions such as curves and micro-expressions.

6. REFERENCES
[1] S. Pal, P. K. D. Pramanik and P. Choudhury, "A Step Towards Smart Learning: Designing an Interactive Video-Based M-Learning System for Educational Institutes," International Journal of Web-Based Learning and Teaching Technologies, vol. 14, no. 4, pp. 26-48, 2019.
[2] S. Pal, P. K. D. Pramanik, T. Majumdar and P. Choudhury, "A semi-automatic metadata extraction model and method for video-based e-learning contents," Education and Information Technologies, vol. 24, no. 6, pp. 3243-3268, 2019.
[3] M. A. A. Dewan, M. Murshed and F. Lin, "Engagement detection in online learning: a review," Smart Learning Environments, vol. 6, no. 1, pp. 1-20, 2019.
[4] M. Dubey and L. Singh, "Automatic Emotion Recognition Using Facial Expression: A Review," International Research Journal of Engineering and Technology, vol. 3, no. 2, pp. 488-492, 2016.
[5] S. L. Happy, A. Dasgupta, P. Patnaik and A. Routray, "Automated Alertness and Emotion Detection for Empathic Feedback during e-Learning," in IEEE Fifth International Conference on Technology for Education, Kharagpur, 2013.
[6] H. H. Binali, C. Wu and V. Potdar, "A new significant area: Emotion detection in E-learning using opinion mining techniques," in IEEE International Conference on Digital Ecosystems and Technologies, Istanbul, 2009.
[7] M. Saneiro, O. C. Santos, S. Salmeron-Majadas and J. G. Boticario, "Towards Emotion Detection in Educational Scenarios from Facial Expressions and Body Movements through Multimodal Approaches," The Scientific World Journal, vol. 2014, Article ID 484873, 2014.
[8] K. Fujii, P. Marian, D. Clark, Y. Okamoto and J. Rekimoto, "Sync Class: Visualization System for In-Class Student Synchronization," in 9th Augmented Human International Conference, Seoul, 2018.
[9] A. Sun, Y.-J. Li, Y.-M. Huang and Q. Li, "Using facial expression to detect emotion in e-learning system: A deep learning method," in International Symposium on Emerging Technologies for Education (SETE 2017), Cape Town, South Africa, 2017.
[10] J. Ou, "Classification Algorithms Research on Facial Expression Recognition," in International Conference on Solid State Devices and Materials Science, Hainan, China, 2012.
[11] L. B. Krithika and G. G. Lakshmi Priya, "Student Emotion Recognition System (SERS) for e-learning Improvement Based on Learner Concentration Metric," Procedia Computer Science, vol. 85, pp. 767-776, 2016.
[12] I. M. Revina and W. R. SamEmmanuel, "A Survey on Human Face Expression Recognition Techniques," Journal of King Saud University - Computer and Information Sciences, 2018.
[13] L. Chen, C. Zhou and L. Shen, "Facial Expression Recognition Based on SVM in E-learning," IERI Procedia, vol. 2, pp. 781-787, 2012.
[14] D. Yang, A. Alsadoon, P. Prasad, A. K. Singh and A. Elchouemi, "An Emotion Recognition Model Based on Facial Recognition in Virtual Learning Environment," in 6th International Conference on Smart Computing and Communications (ICSCC 2017), Kurukshetra, 2017.
[15] G. Yang, J. S. Y. Ortoneda and J. Saniie, "Emotion Recognition Using Deep Neural Network with Vectorized Facial Features," in IEEE International Conference on Electro/Information Technology (EIT), Rochester, Michigan, USA, 2018.
[16] P. R. Dachapally, "Facial Emotion Detection Using Convolutional Neural Networks and Representational Autoencoder Units," arXiv:1706.01509, 2017.
[17] H.-W. Ng, D. V. Nguyen, V. Vonikakis and S. Winkler, "Deep Learning for Emotion Recognition on Small Datasets Using Transfer Learning," in ACM International Conference on Multimodal Interaction, Seattle, 2015.
[18] S. Minaee and A. Abdolrashidi, "Deep-Emotion: Facial Expression Recognition Using Attentional Convolutional Network," arXiv:1902.01019, 2019.
[19] M. M. Taghi Zadeh, M. Imani and B. Majidi, "Fast Facial Emotion Recognition Using Convolutional Neural Networks and Gabor Filters," in 5th Conference on Knowledge-Based Engineering and Innovation, Iran University of Science and Technology, Tehran, Iran, 2019.
[20] S. Zhang, X. Pan, Y. Cui, X. Zhao and L. Liu, "Learning Affective Video Features for Facial Expression Recognition via Hybrid Deep Learning," IEEE Access, vol. 7, pp. 32297-32304, 2019.
[21] M. M. Abbasi and A. Beltiukov, "Summarizing Emotions from Text Using Plutchik's Wheel of Emotions," in Scientific Conference on Information Technologies for Intelligent Decision Making Support, Ufa, 2019.
[22] Microsoft Azure, "Face," Microsoft, 2019. [Online]. Available: https://azure.microsoft.com/en-in/services/cognitive-services/face/. [Accessed 1 December 2019].
[23] S. Li and W. Deng, "Deep Facial Expression Recognition: A Survey," arXiv preprint, pp. 1-25, 2018.
[24] R. Verma, "fer2013," Kaggle Inc, 26 May 2018. [Online]. Available: https://www.kaggle.com/deadskull7/fer2013. [Accessed 22 November 2019].
