Classifying Emotional Engagement in Online Learning Via Deep Learning Architecture
International Journal of Advanced Engineering, Management and Science (IJAEMS)
Peer-Reviewed Journal
ISSN: 2454-1311 | Vol-10, Issue-5; Jul-Aug, 2024
Journal Home Page: https://ijaems.com/
DOI: https://dx.doi.org/10.22161/ijaems.105.2
Email: [email protected]
Received: 08 May 2024; Received in revised form: 16 Jun 2024; Accepted: 24 Jun 2024; Available online: 02 Jul 2024
Abstract— The world has seen a phenomenal rise in online learning over the past decade, with universities
shifting courses to online modes, MOOCs (Massive Open Online Courses) emerging, and laptop- and tablet-based
initiatives being extensively promoted. However, educators face significant challenges in analyzing learning
environments due to issues like lack of in-person cues, small video size, etc. To address these challenges, it is
crucial to analyze the engagement levels of online classes. Of the various subcategories of engagement,
emotional engagement is often overlooked, yet it is integral to analysis and deterministic in its approach. In
response, we developed a deep learning architecture to analyze emotional engagement in online classes. Our
method utilizes a ResNet50-based algorithm, refined through experimentation with various techniques such as
transfer learning, optimizers, and pre-trained weights. The model adds a unique layer to the analysis of different
algorithms used for engagement detection in academia while also achieving stellar rates of 81.34% validation
accuracy and 81.04% training accuracy. Unlike other models, our approach employs high-quality image data
for training, ensuring more reliable results. Moreover, we constructed a novel framework for applying emotional
engagement to real-world scenarios, thus bridging the pre-existing gap between implementation and academia.
The integration of this technology into online learning has immense potential, and can bring with it a shift in the
quality of education. By fostering a safe and healthy learning space for every student, we can significantly
enhance the effectiveness of online education systems.
Keywords— deep learning, emotional engagement, engagement, framework, online learning, ResNet-50
(Sobieszczuk-Nowicka et al., 2018; Mashoedah et al., 2018). It is also difficult for educators to understand the class dynamics and environment in online modes. As a result, students' emotional well-being can't be catered to. In turn, since students' participation is highly impacted by the direct attention and support they obtain from teachers, students are prompted to leave the class or disengage from lessons (Azlan et al., 2020).

To initiate change, it is necessary to systematically analyze online classes. The principal approach for analyzing learning environments is to monitor student engagement levels. Engagement can be defined as "the interaction between the time, effort and other relevant resources invested by both students and their institutions to optimize the student experience while also enhancing the learning outcomes and development of students as well as the performance of the institution" (Trowler, 2010). There are multiple types or sub-categories of engagement within the educational setting. Researchers agree that cognitive, emotional and behavioral engagement are the most deterministic. Cognitive engagement refers to the willingness and effort to grasp more difficult concepts and try challenging puzzles, behavioral engagement refers to concentration and attention on the material, and emotional engagement refers to the presence of positive emotions such as interest and enthusiasm with regard to the material being taught (Hasnine et al., 2023).

This paper has limited its scope to emotional engagement due to its comprehensiveness and significance, along with the elusiveness of its quantifiability in pre-existing frameworks. According to Patrick et al., the premise is simple: "the more emotionally involved students are with their environment while studying a subject, the more engaged they are, and the more support students get with managing their emotional states, the more they can pay attention in classes" (Patrick et al., 2007). In other words, student engagement is directly proportional to their achievement (Skinner et al., 1998). Hence, it is crucial for achieving learning goals and receiving quality education.

Many methods are used to gauge emotional engagement. Traditionally, educators rely on quizzes and questionnaires at the end of sessions, but these are prone to demand characteristics and are susceptible to the student's angle of analysis (McCambridge et al., 2012). They also require a lot of effort from both the students and the educators. Hence, automation has been brought into the limelight, significantly shifting the potential scope of emotional engagement analysis. Our research delves into the field of automated analysis through the usage of deep learning.

Deep Learning (DL) is a subset of machine learning that utilizes multi-layered neural networks, called deep neural networks, to imitate the intricate decision-making capability of the human brain. These deep neural networks are trained on vast amounts of data to enable them to identify phenomena, observe patterns in information, and make predictions and decisions. They only need to be trained once, after which they can efficiently be used for purposes ranging from medical diagnosis to voice-enabled machinery (Goodfellow et al., 2016). Many deep learning algorithms are used to create neural networks. This paper focuses on ResNet-50, which was developed by Microsoft researchers in 2015 and was designed to enable better performance through its residual connections. Its name derives from its characteristic feature of having 50 layers in its network.

One particular machine learning technique that we will use in the study is transfer learning. In theoretical terms, transfer learning can be defined as a method where a model trained on one task is used as the starting point for a model on a second task. By reusing the learned features from the first task, the model can work more efficiently and quickly even with a small amount of data (Ali et al., 2023).

Fig. I The process of transfer learning

Fig. II Deep learning architecture
The key contributions of this paper are:
• This paper proposes a model that has been trained to detect the emotional engagement levels of students in real time, achieving 81.34% validation accuracy and 81.01% test accuracy.
• This paper adds to the existing research in this field by methodically experimenting with 4+ datasets, 3+ algorithms, and a wide range of machine learning techniques to determine which is more lightweight and yields better results, along with the learning rate and epoch number at which it does so.
• The model uses high-quality data, a feature of datasets that is rarely seen in research in this field.
• This paper also aims to provide a modified framework that prioritizes privacy by analyzing student videos on their own devices and provides visual, easy-to-navigate, graphical summaries to educators. It will also enable a student support system to assist students in dire emotional states.

In terms of potential limitations in our research, a prevalent issue is the scarcity of available high-quality data, which reduces the accuracy of models and their ability to learn relevant features. Moreover, there may be biases due to deep learning models mirroring the innate biases of the training data. For example, cultural accessories such as bindis and headscarves may not be properly identified by the model and hence may create discrepancies. The model may also have difficulty interpreting mixed emotions, since it is trained on artificially emotive images.

II. METHOD
This research was carried out on Google Colab with a T4 GPU, using the Python libraries Keras and TensorFlow. The dataset was uploaded to Google Drive, where file paths were used to reference the images and train the model on them. Initially, the employed system underwent training with the FER-2013 dataset, which contains 30,000+ images of people of different cultures and ages. However, due to low image quality and lack of color, the Facial Expressions Training Data was chosen instead. This dataset is a high-quality, coloured dataset consisting of 29,000+ images (96 by 96 pixels). It was taken from Kaggle, a public dataset publishing platform.

To pre-process the data, multiple steps were taken. The labeled data was first sorted into its respective emotion class folders and split into validation, training and testing sets using a 10-80-10 split. Training and validation data were shuffled to ensure random selection.
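As an illustration, the sorting, 10-80-10 splitting and loading described above could be implemented roughly as in the following sketch; the directory paths, batch size and random seed are assumptions, not the exact values used in this study.

```python
# Illustrative sketch of the class-folder sort and 10-80-10 split (paths are hypothetical).
import os
import random
import shutil

import tensorflow as tf

SRC = "/content/drive/MyDrive/facial_expressions"  # one sub-folder per emotion class
DST = "/content/drive/MyDrive/fer_split"           # will hold train/, val/, test/

random.seed(42)
for emotion in sorted(os.listdir(SRC)):
    files = os.listdir(os.path.join(SRC, emotion))
    random.shuffle(files)                           # shuffle before splitting
    n = len(files)
    parts = {"train": files[: int(0.8 * n)],        # 80% training
             "val":   files[int(0.8 * n): int(0.9 * n)],  # 10% validation
             "test":  files[int(0.9 * n):]}               # 10% testing
    for split, names in parts.items():
        out_dir = os.path.join(DST, split, emotion)
        os.makedirs(out_dir, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(SRC, emotion, name), os.path.join(out_dir, name))

# Load the splits as batched pipelines of 96x96 colour images referenced by file path.
train_ds = tf.keras.utils.image_dataset_from_directory(
    os.path.join(DST, "train"), image_size=(96, 96), batch_size=32, shuffle=True)
val_ds = tf.keras.utils.image_dataset_from_directory(
    os.path.join(DST, "val"), image_size=(96, 96), batch_size=32, shuffle=True)
test_ds = tf.keras.utils.image_dataset_from_directory(
    os.path.join(DST, "test"), image_size=(96, 96), batch_size=32, shuffle=False)
```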
The employed CNN (Convolutional Neural Network) architecture was integral to this study. We experimented with MobileNet, ResNet-50 and EfficientNet, evaluating which would be better for the chosen objective. While all of them converged as epochs increased, ResNet-50 had the best overall performance since it gave higher accuracies even at smaller epochs. Additionally, transfer learning proved to be a crucial technique to increase the speed and accuracy of the model. We used a pre-trained ResNet50 model from Keras Applications.

To construct the architecture, we removed the fully connected layers at the top of the pre-trained models to enable customization of layers. 8 output classes were added, namely 'Happy', 'Sad', 'Contempt', 'Surprised', 'Neutral', 'Fear', and 'Anger'.

In terms of the layers in the models, the functional transfer learning layers were followed by alternating flatten and dense layers. These dense layers were composed of 2048 neurons. For activation, ReLU was used to prevent gradients from saturating and hence solve the issue of vanishing gradients. In the final layer, Softmax was used, which helped training converge at a faster rate.

Moreover, the model weights pre-trained on the standard ImageNet dataset were used. These weights were locked into the models to ensure learned representations are not lost. After the convolutional layers, global average pooling was used to reduce the amount of computation required while retaining important features. In terms of optimizers, we initially implemented Adam, which is a standard method to help the model converge faster. However, upon analysis, we deemed SGD (Stochastic Gradient Descent) to be better suited due to how well it converged to more optimal solutions.

In this study, loss calculation was done through sparse categorical cross entropy. In comparison to other methods, it saves time in memory as well as computation. The key metric we used to measure the success of the model was training accuracy, which estimates the potential of a model.
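As an illustration of the setup described above, the following Keras sketch assembles a frozen ImageNet ResNet50 base with global average pooling, flatten and dense layers, a softmax output, an SGD optimizer and sparse categorical cross-entropy. The learning rate and the exact arrangement of the flatten/dense blocks are illustrative assumptions, not the precise configuration trained in this study.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

NUM_CLASSES = 8  # output classes, as described above

# Pre-trained ResNet50 from Keras Applications, without its fully connected top.
base = ResNet50(weights="imagenet", include_top=False, input_shape=(96, 96, 3))
base.trainable = False  # lock the ImageNet weights so learned representations are kept

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),                   # cut computation, keep key features
    layers.Flatten(),
    layers.Dense(2048, activation="relu"),             # ReLU avoids saturating gradients
    layers.Dense(NUM_CLASSES, activation="softmax"),   # final softmax classification layer
])

model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),  # SGD preferred over Adam here
    loss="sparse_categorical_crossentropy",                 # integer labels, memory-efficient
    metrics=["accuracy"],
)

# history = model.fit(train_ds, validation_data=val_ds, epochs=20)
```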
III. RESULTS
i) Model
The final model is an 8-layer sequential classification model, composed of a pre-trained layer along with alternating dense and flattening layers. The usage of high-quality data and experimentation with parameters has resulted in a lightweight yet high-performance structure. In essence, this model analyses student expressions to accurately classify their emotional engagement states.

Fig. V: Confusion matrix

In summary, the confusion matrix indicates that the model has a high accuracy (80.8%) and performs well in terms of precision (88.9%) and recall (89.7%) for class 1. However, it has a relatively high false positive rate (92.6%) and a low false negative rate (10.3%).
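For reference, per-class figures of this kind can be derived from a confusion matrix as in the following sketch; it assumes the `model` and `test_ds` objects from the earlier sketches and is not the evaluation script used in this study.

```python
# Illustrative sketch: deriving per-class metrics from model predictions.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

y_true, y_pred = [], []
for images, labels in test_ds:                      # iterate once so labels stay aligned
    probs = model.predict(images, verbose=0)
    y_true.extend(labels.numpy())
    y_pred.extend(np.argmax(probs, axis=1))

cm = confusion_matrix(y_true, y_pred)
accuracy = accuracy_score(y_true, y_pred)

cls = 1                                             # class of interest, as reported above
tp = cm[cls, cls]
fp = cm[:, cls].sum() - tp                          # predicted as cls but actually another class
fn = cm[cls, :].sum() - tp                          # actually cls but predicted as another class
tn = cm.sum() - tp - fp - fn

precision = tp / (tp + fp)
recall = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)
false_negative_rate = fn / (fn + tp)
print(accuracy, precision, recall, false_positive_rate, false_negative_rate)
```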
ii) Framework
This framework is designed to be an extension app for online learning platforms such as Zoom. Currently, the market does not host any such platform, with the closest alternative being Engagement Hub, an extension on the Zoom Marketplace that allows users to automatically transcribe and analyze meeting recordings. This lack of implementation may be a result of how restricted engagement analysis via deep learning architecture is to academia.
Following are the steps of the devised framework:
• At the start of any session, an automated message will be displayed on all student devices to notify them that they are being recorded and analyzed. This will be similar to the pre-existing feature on Zoom that notifies participants when screen recording is turned on by a user. Through this feature, the privacy rights of students will be protected.
• The cycle of emotional engagement analysis will repeat at a set interval of time, for example, every 2 minutes. On each student's device, their camera will be connected to the framework and a screenshot will be taken.
• Through a basic AI (Artificial Intelligence) algorithm, the student's face will be detected. Then, facial features of the image will be extracted by mapping facial points. For both face detection and feature extraction, the OpenCV library will be used, which provides ready-to-use methods with advanced capabilities.
• The extracted image will then be run through an emotional engagement detection model, where it will be pre-processed and then analyzed. Through methods like transfer learning, optimization, and pooling layers, the model is fine-tuned to accurately predict the emotion of the student (a minimal client-side sketch of these detection and classification steps follows this list).
• The student's name is then extracted from their name label. The name and its associated emotion classification are encrypted and sent to the teacher's device.
• At the teacher's device, all data is decrypted and entered into an array. This process will run in the backend, where it can't be accessed by the teacher.
• The emotion classification data of the array will then be used to generate a pie graph (see the teacher-side sketch after this list). This will be an easy-to-read, understandable format for educators to quickly access and analyze. The graph will be available during screen sharing and be readily movable across the educator's page. This will ensure ease and efficiency.
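The following client-side sketch illustrates the capture, face-detection, classification and encryption steps above. The model file name, the emotion label order and the key-exchange mechanism are assumptions for illustration only.

```python
# Client-side sketch of the capture -> detect -> classify -> encrypt steps above.
import json

import cv2
import numpy as np
import tensorflow as tf
from cryptography.fernet import Fernet

# Class order must match the training folders; this list is illustrative.
EMOTIONS = ["Anger", "Contempt", "Fear", "Happy", "Neutral", "Sad", "Surprised"]
model = tf.keras.models.load_model("engagement_model.h5")   # hypothetical trained model file
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
cipher = Fernet(Fernet.generate_key())  # in practice, a key shared with the teacher's device

def analyze_frame(student_name: str) -> bytes:
    """Take one screenshot, classify the student's emotion, return an encrypted payload."""
    cap = cv2.VideoCapture(0)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return b""

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return b""

    x, y, w, h = faces[0]                                    # use the first detected face
    face = cv2.resize(frame[y:y + h, x:x + w], (96, 96))     # match the training image size
    face = np.expand_dims(face.astype("float32"), axis=0)
    probs = model.predict(face, verbose=0)[0]
    emotion = EMOTIONS[int(np.argmax(probs))]

    payload = json.dumps({"name": student_name, "emotion": emotion}).encode()
    return cipher.encrypt(payload)                           # sent on to the teacher's device
```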
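A corresponding teacher-side sketch, equally illustrative, shows how the decrypted classifications could be aggregated into the pie graph mentioned above.

```python
# Teacher-side sketch of the decrypt -> aggregate -> pie graph steps above.
import json
from collections import Counter

import matplotlib.pyplot as plt
from cryptography.fernet import Fernet

def summarize(encrypted_payloads, key: bytes) -> None:
    """Decrypt the received (name, emotion) payloads and plot a class-wide pie graph."""
    cipher = Fernet(key)
    records = [json.loads(cipher.decrypt(p)) for p in encrypted_payloads]

    counts = Counter(r["emotion"] for r in records)       # backend array of classifications
    plt.pie(list(counts.values()), labels=list(counts.keys()), autopct="%1.0f%%")
    plt.title("Class emotional engagement (current interval)")
    plt.show()
```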
Following are the steps for student support in the devised framework:
• For long-term courses that engage with students for more than 3 sessions, educators can turn on settings to enable student support. Data arrays from each session will be automatically stored on the teacher's device. This data will be encrypted to prevent privacy invasion. It will be loaded back onto the streaming platform architecture in the backend when the next session starts.
• The data arrays in the backend will be analyzed, and if a student is flagged as having shown negative emotions such as 'sad', 'angry', 'contempt', 'fear', etc. repeatedly (i.e. more than 7 times in an average session of 30 minutes), their device will be contacted (a minimal sketch of this flagging rule follows this list). An automated support message will be sent asking for their consent to take further action. Moreover, if all students show negative emotions consistently, this will be an indication to the educator to make their sessions more engaging.
• If the student gives consent, they will be prompted to take one or more of three actions:
  • They can contact their teacher or other trusted staff, with whom they can then share their concerns. This method would be best suited for issues with the learning style or course load.
  • They can contact professional therapists or psychologists. We will suggest trained experts they can reach out to. This method is suited for personal issues, such as mental health disorders, financial issues, health-related challenges, etc.
  • We can also collaborate with high-quality therapist AI bots. This would work best for students who have minor problems and aren't willing to spend a lot or aren't comfortable with professional therapy.
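A minimal sketch of the flagging rule above (more than 7 negative detections in a session) might look as follows; the session record structure is an assumption.

```python
# Illustrative sketch of the flagging rule: more than 7 negative detections per session.
from collections import Counter

NEGATIVE = {"Sad", "Anger", "Contempt", "Fear"}
THRESHOLD = 7  # detections per ~30-minute session, as described above

def students_to_contact(session_records):
    """session_records: list of (student_name, emotion) tuples collected during one session."""
    negative_counts = Counter(
        name for name, emotion in session_records if emotion in NEGATIVE)
    return [name for name, count in negative_counts.items() if count > THRESHOLD]
```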
IV. DISCUSSION
For the purpose of this study, we developed a ResNet50-based classification model, with the aim of analyzing different architectures, datasets, parameters, etc. to develop the most accurate and efficient version. This was accompanied by the construction of a framework detailing the real-time process of image extraction, feature detection, emotion classification and data storage. The student support system is unique among pre-existing research in its ability to actually utilize the emotion classification data to assist students who are struggling.

This study's results are promising, both in terms of model analysis and framework development. The high training accuracy reflects that the model architecture and hyperparameters are well suited to the task. The ResNet50 model is particularly noteworthy due to its performance and lightweight characteristics. Additionally, the features in the dataset are highly predictive of the target variable.

While the research objectives were met, it is essential to consider the limitations as well. Due to the lack of available data, the potential of this model was stunted to some extent. With resources like more computational power, it could have had better performance. As predicted, the model may also have difficulties in real-world scenarios, where lighting, angles, accessories, etc. may distort faces in images and lead to inaccurate predictions. The training data may also be artificial in its expression of specific emotions, leading to disparity with real-life analysis scenarios, since students don't portray singular emotions in real life but rather have mixed emotions that the model may get confused by.