Identification of Emotions From Facial Gestures in
ABSTRACT Educational models currently integrate a variety of technologies and computer applications that seek to improve learning environments. With this objective, information technologies have increasingly adapted to assume the role of educational assistants that support the teacher, the students, and the areas in charge of educational quality. One technology that is gaining strength in the academic field is computer vision, which is used to monitor and identify the state of mind of students during the teaching of a subject. To do this, machine learning algorithms monitor student gestures and classify them to identify the emotions they convey in a teaching environment. These systems allow the evaluation of emotional aspects based on two main elements. The first is the generation of an image database with the emotions produced in a learning environment, such as interest, commitment, boredom, concentration, relaxation, and enthusiasm. The second is an emotion recognition system based on the recognition of facial gestures using non-invasive techniques. This work applies techniques for the recognition and processing of facial gestures and the classification of emotions focused on learning. The system helps the tutor in a face-to-face education modality and allows them to evaluate emotional aspects and not only cognitive ones. This arises from the need to create a base of images of spontaneous emotions in learning, since most of the works reviewed focus on acted-out emotions.
in the field of information technologies (IT) are mentioned, which have revealed new forms of support for education strategies in the higher technical-professional field, especially in the evaluation of emotional aspects of people through different biometric techniques, with emphasis on the recognition of facial patterns, which allows capturing relevant information for the analysis and development of strategies in the educational field. In the work carried out by [8], it is mentioned that the recognition of biometric patterns is a method for the identification of individuals through facial characteristics. This analysis can be carried out in practice with various tools and applied to the various disciplines where the recognition of faces and their emotions is required.

Another group of works mentions that the face is a three-dimensional object subject to different degrees of luminosity, positioning, expression, and other factors that need to be identified from patterns generated from the acquired images [9]. Another of the works reviewed mentions that most emotion recognition systems analyze the voice, as well as the words that are pronounced or written. For example, a high, shaky, rushed tone can indicate fear; more complex systems also analyze gestures and even consider the environment, along with facial expressions. Emotion recognition systems typically learn to link an emotion to its outward manifestation from large, categorized data sets. Gartner estimates that in the immediate future, one in ten devices will have emotion recognition technology.

In works such as [10], a study is carried out on existing technological solutions on the market that use artificial intelligence techniques to identify students' emotions during a learning activity. These solutions use facial recognition, voice recognition, and text analysis techniques to identify students' emotions in real time. For example, Affectiva is a platform that uses facial recognition technology to measure students' emotions as they interact with digital learning content. Emotiva is a voice analysis system that can detect emotions in real time in students' speech. Mursion is a simulation platform that uses artificial intelligence technology to simulate interactions between students and teachers, allowing teachers to measure students' emotions in real time and adjust their teaching accordingly. Smart Sparrow uses machine learning algorithms to measure students' attention and emotion as they interact with adaptive learning content. Knewton is an adaptive learning platform that uses data analytics technology to monitor student progress and adjust learning content based on their needs and emotions. BrainCo uses wearable sensors to measure students' brain activity and detect their levels of concentration and emotion during learning. Classcraft is a gamification platform that uses data analytics technology to measure students' motivation and engagement while playing online educational games. Importantly, the selection of a platform should be based on the specific needs of each learning scenario and on the ethics and privacy of student data.

While these solutions offer significant benefits, some issues are important to be aware of. The main one is precision, since it can vary depending on the quality of the input data and the complexity of the algorithms used; these solutions are not foolproof and may make mistakes when identifying students' emotions. The collection and use of student emotional data may also raise ethical and privacy issues, so it is important to ensure that students are informed about the use of their data and that appropriate steps are taken to protect their privacy. The algorithms used in these solutions may have unintended biases, which can lead to inaccurate or discriminatory results; it is therefore important to consider the diversity of the student population and ensure that solutions are fair and equitable. Finally, some of these solutions can be expensive and may not be available to all educational institutions, so it is important to carefully weigh the costs and benefits before investing in one.

This work proposes the design of an emotion identification system based on the recognition of gestures on the faces of students during the teaching process in a specific subject [12]. The gestures on a face are generally aligned with the emotion of a person; this factor is exploited by an artificial intelligence algorithm that uses a neural network previously trained on a data set containing a large volume of frames generated from streaming videos, which makes it possible to classify gestures and determine the emotion experienced by a student [13]. The proposed algorithm is developed in Python and uses several libraries that are responsible for managing neural networks, as well as functions for data analysis.

The novelty of a system responsible for identifying emotions lies in its ability to provide real-time feedback on the emotional state of students. This can help educators tailor their teaching and improve student learning. In addition, these types of systems can help identify long-term patterns of behavior and emotions, which can help educators develop more effective interventions to help students learn [14].

For example, if an emotion recognition system detects that a student is frustrated, the educator can step in and offer additional help to address the challenges the student is facing. Similarly, if the system detects that a student is bored or disinterested, the educator can modify their teaching to keep the student more engaged. In addition, the integration of emotion recognition systems can provide greater objectivity to the learning assessment process, since it can help educators assess the impact of different teaching methods on the emotional state of students. In general, emotion recognition systems have the potential to improve the learning experience of students and help educators improve the quality of their teaching.

II. MATERIALS AND METHODS
For the development of the method, three fundamental bases are considered: image databases, affective computing, and emotion recognition systems with artificial intelligence. These bases guarantee the functioning of the identification of
the emotions of the students through the gestures that their faces generate in a didactic environment.

FIGURE 1. Methodology for data processing and generation of a dataset.

A. IMAGE DATABASE
Databases (DB) have seen great growth in recent years in the analysis and construction of multidimensional data warehouses. These are created for a specific purpose and represent a collection of large volumes of data that can be text, images, audio, etc. In general, data warehouses represent a fundamental part of the infrastructure for the development of data recognition and processing applications [15], [16]. Figure 1 shows the methodology of a data warehouse, which generally integrates a data planning and capture process, processing, data storage and classification, and dataset generation. The extraction stage consists of planning and data capture; in this stage, the sources and types of data to be collected, the infrastructure necessary to acquire them, and the process that those involved will follow are specified. The transformation stage consists of data processing that allows the identification and extraction of characteristics from the captured data; then maximum and time-difference techniques are applied. The last stage consists of storing and classifying the information using a title or label that allows the information to be recognized, through a loading process [17].

Planning and data capture is the process by which an image of the real world is obtained through a sensor (camera, scanner, etc.) that will then be processed and manipulated. An image is a two-dimensional function symbolized by f(x,y), where x and y are spatial coordinates. A continuous image can be represented by a matrix of N rows and M columns; this matrix is the digital representation of the image. For the correct treatment and study of the image in an artificial vision system, it is important to consider certain factors such as light, interference, image background, and resolution. The nature of the light, its position, and how the light reflects off the object can affect the quality of the image that the vision system is expected to process. Inadequate lighting can overload the image with unnecessary information such as shadows, reflections, and high contrasts, which decreases the performance of the vision system. The two relevant lighting factors are light intensity and the position of the light source.

Preprocessing is the transformation of one image into another; that is, from one image a modified one is obtained, the purpose of which is to make the subsequent analysis simpler and more reliable. There are countless image preprocessing techniques, but very few satisfy a low computational cost; among them, there are methods in the space domain and in the frequency domain. In the transformation of the data, the use of techniques and filters that improve the image is common; among these is the handling of histograms. That is to say, given the digital representation of an image as an arrangement of N rows by M columns, an M×N matrix is determined, in which the digital representation of the bitmap is given by the distribution function f(m,n), for n ∈ [0, N − 1] and m ∈ [0, M − 1]; typically, N and M are powers of 2. Another parameter to consider is the resolution of an image, that is, the number of pixels that describe it; a typical measurement is in terms of "pixels per inch" (PPI). Therefore, the quality of the representation as well as the size of the image depends on the resolution, which in turn determines the memory requirements for the graphic file to be generated.

Another important parameter in handling images is the size of the image, that is, its actual dimensions in terms of width and height once printed, while the file size refers to the amount of physical memory needed to store the image information (the digitized image) on any computer storage medium. Certainly, the resolution of the image strongly conditions these two concepts: since the number of pixels in the digitized image is fixed, increasing the size of the image reduces the resolution and vice versa. Contrast is another factor widely used in the transformation of the data; it consists of increasing or decreasing the slope of the 45-degree straight line that maps input gray levels to output gray levels (with the precaution of not exceeding the limits 0-255). The transformation corresponding to the contrast change is:

v_O(m, n) = (v_I(m, n) − 2^(y−1)) tan ϕ + 2^(y−1)    (1)

where y is the scale in bits, v_I and v_O are the input and output values, respectively, at pixel (m,n), and the angle ϕ defines the slope of the linear contrast transformation.
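As an illustration, equation (1) maps gray levels around the mid-gray value 2^(y−1) with slope tan ϕ. The following minimal NumPy sketch applies it to an 8-bit grayscale image; the function and variable names are ours, introduced only for this example.

```python
import numpy as np

def adjust_contrast(image: np.ndarray, phi_degrees: float, bits: int = 8) -> np.ndarray:
    """Apply the contrast change of Eq. (1): v_O = (v_I - 2^(y-1)) * tan(phi) + 2^(y-1)."""
    mid = 2 ** (bits - 1)                     # 2^(y-1), the mid-gray level
    slope = np.tan(np.radians(phi_degrees))   # tan(phi), slope of the mapping
    out = (image.astype(np.float64) - mid) * slope + mid
    # Clip so the 0..255 limits mentioned in the text are not exceeded.
    return np.clip(out, 0, 2 ** bits - 1).astype(np.uint8)

# phi = 45 degrees leaves the image unchanged; larger angles raise contrast.
gray = np.random.randint(0, 256, (480, 640), dtype=np.uint8)  # stand-in image
high_contrast = adjust_contrast(gray, phi_degrees=60.0)
```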
The data warehouse contains information relevant to the object of study; therefore, it must be tested to identify its strengths and weaknesses. Based on the experience and analysis of the tests, data is added to or removed from the warehouse. This mechanism is repeated continuously to maintain a balanced database, which implies that the classes or labels have approximately the same number of images [18]. In addition, the
data must be representative and correspond to spontaneous images, avoiding the storage of acted images. The data must also be reliable, without errors or repetitions.

B. AFFECTIVE COMPUTING
Affective computing is an area of Artificial Intelligence (AI) that arises from the need to provide computer equipment with a certain capacity to interact with people. This task is carried out using computer vision techniques and machine learning algorithms; the objective of machine-human interaction is for the system to be capable of producing an effective response in people [19]. Affective computing is interdisciplinary and is applicable in areas such as computer science, psychology, and cognitive science. In addition, it plays an important role in the development of intelligent interfaces applied to education and educational software. According to [20], [21], affective computing is subdivided into four research areas:

• The analysis and characterization of affective states that identify, through natural interactions, the relationships between affect and cognitive processes in learning.
• The automatic recognition of affective states by analyzing facial expressions and extracting features from linguistic expressions, posture, gaze tracking, and heart rate, among others.
• The adaptation of systems to respond to a particular affective state of the users.
• The design of avatars that show appropriate affective states for better interaction with the user.

Affective computing, from an interpretive point of view, requires that the concept of emotion be precisely defined, since it can be confused with concepts such as affect, feeling, or motivation. With this consideration, it is established that affection is a process of social interaction between two or more people [22]; affection is something that is transferred, for example, by giving a gift or visiting a sick person. Feelings are the mental expression of emotions; that is, once the emotion is encoded in the brain, the person can identify the specific emotion they are experiencing: joy, grief, anger, loneliness, sadness, shame, etc. Motivation is a set of processes involved in the activation, direction, and persistence of behavior, which allows us to bring about changes in life in general. For its part, emotion is a state of mind produced by an event or memory; it occurs every day in our daily lives and plays an important role in non-verbal communication.

Concerning emotions, these are classified into two groups: primary or basic, and secondary or alternative. In [23], six basic emotions are identified (anger, disgust, fear, happiness, sadness, and surprise), together with some characteristics that appear on the person's face, as shown in Figure 2.

Secondary or alternative emotions are complex emotions that appear after the primary emotions and depend on the situation and the context of the person. For example, a person who is afraid (primary emotion) may turn that fear into anger or rage (secondary emotion) and provoke an aggressive reaction. A model of valence and intensity dimensions is also used in this case to describe an emotion more precisely [24], [25], as shown in Figure 3.

Emotions generally generate expressions, and these are classified into internal and external expressions. Internal expressions can be signals generated by the body, such as blood pressure, sweating, and electroencephalography signals, while external expressions can be facial gestures, the sound of the voice, body posture, body movement, and body language [28].

All the activities carried out by people generate emotions, and in line with the environment where this work is carried out, emotions are also focused on learning. These emotions are produced in students when they perform different activities, manifesting a variety of affective states in learning contexts [29]. Among the emotions aligned with learning are commitment, boredom, frustration, stress, focus, interest, relaxation, etc. According to several of the related works, the emotions of confusion, frustration, or boredom appear in students when they perform exercises that require techniques or information with which they are not familiar, and these can be considered negative for student learning.

In [30], a model related to the emotions of learning is proposed, divided into four quadrants: quadrant I shows an appraisal of admiration, satisfaction, or curiosity resulting in positive constructive learning; quadrant II depicts an appraisal of disappointment, perplexity, or confusion resulting in negative constructive learning; quadrant III depicts an appraisal of frustration, rejection, or misconceptions that results in negative learning; and quadrant IV shows an appraisal of optimism and new research that translates into positive learning [21], [31], [32].

C. EMOTION RECOGNITION SYSTEM
For the design of emotion recognition systems, various methods of extraction and classification of biometric parameters are used. Parameters are extracted from user-generated gestures, for example, from a person's facial expressions. Facial expression analysis is applied in different areas of interest such as education, video games, and telecommunications, to name a few [33]; in addition, it is one of the most used techniques in human-computer interaction. Facial expression recognition is an intelligent system that identifies a person's face and from it obtains certain characteristics that it analyzes and processes to determine the person's affective state [34]. The objective of facial recognition is, from the incoming image, to find matching data for the same face in a set of training images in a database. The great difficulty lies in ensuring that this process is carried out in real time, something that is not within the reach of the similar works reviewed above.

For the development of the emotion recognition system through gesture classification, several technical requirements of the system are considered. In the design of algorithms, the requirements may vary depending on the specific use; however, when
choosing the data set, it is important to choose a quality data set that contains a variety of emotions and facial expressions. This data is used to train and test the emotion recognition model.

In addition, in the development of the algorithm, it is necessary to select the characteristics that will be used to recognize emotions. Characteristics may include the intensity of facial expressions, the position of the eyes and mouth, and the duration of the expressions. There are different machine-learning algorithms and signal-processing techniques that can be used to recognize emotions; it is important to choose the appropriate classification model for the selected data set and features. Before feeding the data to the model, it is important to perform proper preprocessing such as data normalization, denoising, and dimensionality reduction. Another important step is the evaluation of the emotion recognition model using a test data set to measure its accuracy and generalizability. Likewise, once the emotion recognition model has been developed, it is important to integrate it with an application so that it can be used in real life.

Regarding the evaluation, several techniques and metrics are considered to evaluate the emotion recognition algorithm. For example, we consider precision, which measures the proportion of correct classifications made by the model; it is one of the most common metrics and is used to assess the overall performance of the model. Sensitivity measures the model's ability to correctly identify a specific emotion; it is especially important if one wants to recognize specific emotions, such as sadness or happiness. Specificity measures the model's ability to correctly identify the absence of a specific emotion, for example, the absence of negative emotions such as sadness or anger. The F value combines precision and sensitivity in a single measure; it is useful when one wants a balance between accuracy and the ability to correctly identify a specific emotion. The confusion matrix is a table showing the actual classes against the model's predictions; it is a useful tool for evaluating false positive and false negative rates and for identifying patterns of error in the model.
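These metrics can be computed directly with scikit-learn. The following sketch uses hypothetical label lists for a six-emotion test set; the class names mirror Table 1, but the values are illustrative only.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

# Hypothetical ground truth and predictions; in the real system these come
# from the annotated test set and the trained classifier, respectively.
y_true = ["commitment", "boredom", "frustration", "focused", "interested",
          "neutral", "boredom", "focused"]
y_pred = ["commitment", "boredom", "frustration", "focused", "neutral",
          "neutral", "boredom", "interested"]

print("Accuracy:", accuracy_score(y_true, y_pred))
# Per-class precision, recall (sensitivity), and F1 score in a single report.
print(classification_report(y_true, y_pred, zero_division=0))
# Confusion matrix: rows are true classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
```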
D. METHODOLOGY
For the development of this work, an institution of higher technical education is considered. The students participating in the study are at the third level, and as per the regulations of the institution, the objective of the system was explained to the population and their consent was requested to capture facial images for research purposes only. The data capture is carried out in two stages. In the first, 20 students participate, of which 12 are men and 8 are women, with an age range of 18 to 20 years. In the second stage, 18 students participate, of which 10 are men and 8 are women, in the same age range. The two groups of students take the subject of databases and are part of two parallels: the first group belongs to parallel "A" and the second to parallel "B". The teacher is the same for both parallels; therefore, the methodology is the same for the total population.

To capture the data, a controlled environment is established that corresponds to a laboratory with a maximum capacity of 24 students. An Intel RealSense depth camera has been installed in this laboratory to capture images of the participants. The laboratory has 24 computers that the students use to solve different exercises on the subject. The emotion recognition application is housed in the institution's data center, where two virtual servers have been created: one processes and stores the image signals, and the other captures and stores images under a specific name. The name consists of an identifier, a number, and the exact date and time.

The class sessions are separate for the two groups of students; however, the methodology for each session is similar, as are the various activities, which consist of theoretical components and practical development with exercises related to a specific problem. During the design stage of the activities, different learning outcomes are evaluated; these are based on the eight types of learning exposed in the learning theory of [35]. Within the classes, the teacher has three sessions, each of 60 minutes. As it is a three-hour class, the attention of the students can diminish; therefore, it is the teacher's task to use a methodology that improves the interest of the students. The class schedule adds to the strain of the continuous sessions, since classes are in the evening and 75% of the students work. These factors hinder the teacher's efforts to maintain the interest and concentration of students in the different topics of the subject. Among the techniques used by the teacher is giving higher priority to practical sessions and activities that require student participation.

The mechanics of the class that uses the recognition system are such that each student has at their disposal a computer and a webcam, in addition to the depth camera that each laboratory has. Having the cameras allows the identification of gestures when the students are carrying out practical activities on their computers [36]. Instead, when a conceptual review of a topic is needed or the teacher is the focus of the class, the depth cameras provide the full perspective for monitoring.

To carry out facial recognition, a database of several faces that express facial gestures is needed. These faces should denote a variety of expressions such as happiness, sadness, boredom, surprise, etc. Another aspect that these images must have is variation in light conditions, whether people wear glasses or not, and even whether they close their eyes or wink. It is recommended that these images be collected in the environment where facial recognition will be applied. The variety of images obtained from the faces contributes to the performance of the algorithms applied in the system. There is no exact number of images and no guide specifying precisely how many images are needed for an AI algorithm to classify emotions; the only existing reference is that the larger and more varied the volume of images, the better the precision of the algorithm.

Initially, many images are needed to train a classifier so that it can discern between the presence and non-presence
of an object. For example, in the first stage, the algorithm detects whether there is a face; therefore, positive images are required, that is, photos with faces, and negative images, which are images that do not contain faces. Then, the features of all the images are extracted; for this, an automatic learning approach is used, and the training proceeds as shown in Figure 4.

OpenCV has several pretrained classifiers, not only for human faces but also for eyes, smiles, and animal faces, among others. To use face detection with a Haar cascade in OpenCV, it is necessary to use the detectMultiScale module, which detects objects according to the classifier used. This allows obtaining a bounding rectangle locating the object to be found within an image; for this, the following arguments must be specified (a usage sketch follows the list):

• Image: the image on which the face detector will act.
• ScaleFactor: this parameter specifies how much the image will be scaled down. For example, if 1.1 is entered, the image will be reduced by 10%; with 1.3 it will be reduced by 30%, thus creating a pyramid of images. It is important to note that with a very high value, some detections are lost, while very small values such as 1.01 take longer to process, since there will be more images to analyze; in addition, false positives can increase, that is, detections presented as objects or faces that are not.
• MinNeighbors: this parameter specifies how many neighbors each candidate rectangle must have for it to be retained; that is, a window is obtained that will go through the image looking for faces, and this parameter relates all the delimiting rectangles of the same face. Therefore, minNeighbors specifies the minimum number of bounding boxes, or neighbors, that a face must have to be detected as such.
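A minimal sketch of this call, assuming OpenCV's bundled frontal-face Haar cascade; the input file name is a placeholder.

```python
import cv2

# Load one of OpenCV's bundled Haar cascades for frontal faces.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("classroom_frame.jpg")        # placeholder input frame
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)   # cascades operate on grayscale

# scaleFactor=1.1 shrinks the image 10% per pyramid level; minNeighbors=5
# keeps only candidates confirmed by at least five overlapping windows.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    # Draw the bounding rectangle returned for each detected face.
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```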
The detection of the students' gestures while they are receiving a synchronous class requires the generation of a data set that the recognition algorithm takes for its training; for this, haarcascade_frontalface is used. This algorithm allows the storage of a specific number of faces for system training, and this process can be done from a video provided by the students or from streaming. In this case, the number of images is 350 per student, and the students have been asked to simulate many expressions in the videos, where gesticulation helps to improve the accuracy of the training. For this task, a convolutional network designed in Kaggle Notebooks is used with the Keras, TensorFlow, and OpenCV libraries for image processing [36]. In addition, the Matplotlib library is used in the design to generate a graph where the model is evaluated; to do this, the loss and precision values of both the training and validation phases are identified. The confusion matrix is generated with the Sklearn library to evaluate the accuracy of the classification [37].

The data set generated for the first training session and tests corresponds to the first group of students, which has been named parallel "A", and six classes are evaluated: commitment, boredom, frustration, focused, interested, and neutral, with a total of 7,000 images corresponding to 20 students. Table 1 indicates the number of images in each category.

TABLE 1. Dataset generated for the first training session with the detection of six emotions applied in the tests of a convolutional neural network with 7,000 images

Commitment  Boredom  Frustration  Focused  Interested  Neutral
1850        1680     592          1140     922         816
Total: 7000

The 7,000 images correspond to the gestures identified within each of the six categories, and these are processed as part of the training of the convolutional network. Figure 5 shows the architecture of the convolutional network, which has three layers, each of which has a convolution with an activation function and an additional max pooling layer. At the end of these, flatten and dropout layers and an output layer with an activation function were added.
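In Keras, the three-block architecture described for Figure 5 could be sketched as follows. The filter counts, dropout rate, and 150x150 grayscale input size are assumptions for illustration, since the exact hyperparameters are not listed; only the overall layer structure follows the description.

```python
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 6  # commitment, boredom, frustration, focused, interested, neutral

model = keras.Sequential([
    layers.Input(shape=(150, 150, 1)),           # grayscale face crops
    # Three convolution blocks, each with an activation and max pooling.
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),
    layers.MaxPooling2D(),
    # Flatten and dropout before the activation-equipped output layer.
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```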
Facial expression recognition depends on four steps, shown in Figure 6. The first step is to detect the faces in an image by applying the histogram of oriented gradients algorithm. Next, the facial landmark estimation algorithm is used, which allows the identification of 68 landmarks on each face. In the third step, 128 measurements are created for each face through deep learning, corresponding to the unique characteristics of the face; finally, with the unique characteristics of each face, the name of the person is determined.

Figure 7 shows the stages that the system performs to correctly identify the emotion. The initial stage validates that the image received by the recognizer contains a face; if the algorithm does not find one, it discards the image. Next, in the process, a gray filter is applied to eliminate the different color channels and to later detect some important parts of the face such as the nose, eyes, eyebrows, and mouth. In the next stage, a technique is applied to mark the facial points on the detected parts; in addition, an initial reference point is placed at the center of the nose, and various facial points are identified on the parts of the face. In the next stage, geometric calculations of the distance between the initial reference point and each facial point detected on the face are made. The result of the calculations is a matrix of facial characteristics that is processed by a support vector algorithm together with its emotion tag so that it can learn to classify facial expressions. Finally, the trained neural network is sent more vectors of facial features to assess whether the algorithm has learned to classify gestures and recognize emotions.
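The four-step pipeline (HOG face detection, 68-point landmarks, 128-dimensional encodings, identification) is the one popularized by dlib; the sketch below uses the face_recognition wrapper library under the assumption that this tooling matches the description. File names and the enrollment lists are placeholders.

```python
import face_recognition

# Placeholder enrollment data: encodings and names of known students.
known_encodings: list = []
known_names: list = []

image = face_recognition.load_image_file("student.jpg")  # placeholder path

# Step 1: detect faces with the HOG-based detector.
locations = face_recognition.face_locations(image, model="hog")

# Step 2: estimate the 68 facial landmarks of each detected face.
landmarks = face_recognition.face_landmarks(image, face_locations=locations)

# Step 3: compute a 128-dimensional encoding per face via deep learning.
encodings = face_recognition.face_encodings(image, known_face_locations=locations)

# Step 4: compare each encoding with the known ones to name the person.
for encoding in encodings:
    matches = face_recognition.compare_faces(known_encodings, encoding)
    if True in matches:
        print("Identified:", known_names[matches.index(True)])
```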
For the construction of the model, it is possible to use a video previously stored in a repository, in which the face must simulate several gestures that will later be separated into frames and used for training the neural network. In addition, in the process of data acquisition and training, it is possible to acquire a greater volume of data with the use of streaming video. Obtaining the images from a streaming
video has the advantage that the frames represent the most natural state possible of the people and even help to identify variations in the environment such as lighting, objects, attenuation, etc.

For the generation of the streaming video, it is necessary to specify the faceClassif function, where the face detector to be used is assigned; in addition, a counter initialized to "0" is established, which is responsible for counting all the faces that are stored for training. The library reads each frame and resizes it to 640 pixels wide, applying grayscale. In the next step, a function is created where all the detected faces are stored. To analyze each one of them, a cycle is generated in which a rectangle is drawn around each face; the face is then cut out and resized to 150 x 150 pixels so that all stored faces have the same size, each face is stored in the repository, and finally the counter is incremented by 1.
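A minimal sketch of this capture loop, assuming OpenCV with a Haar cascade as the faceClassif detector; the video source and output folder are placeholders.

```python
import os
import cv2

# Face detector assigned to faceClassif, as described in the text.
faceClassif = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

output_dir = "dataset/student_01"       # placeholder repository folder
os.makedirs(output_dir, exist_ok=True)

cap = cv2.VideoCapture("student_video.mp4")  # file or streaming source
count = 0                                    # counter for stored faces

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Resize each frame to 640 pixels wide, keeping the aspect ratio.
    h, w = frame.shape[:2]
    frame = cv2.resize(frame, (640, int(h * 640 / w)))
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    for (x, y, fw, fh) in faceClassif.detectMultiScale(gray, 1.1, 5):
        # Surround the face with a rectangle, then cut and normalize it.
        cv2.rectangle(frame, (x, y), (x + fw, y + fh), (0, 255, 0), 2)
        face = cv2.resize(frame[y:y + fh, x:x + fw], (150, 150))
        cv2.imwrite(os.path.join(output_dir, f"face_{count}.jpg"), face)
        count += 1

cap.release()
```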
For training, the obtModel function is created.
TABLE 2. Description of the test set for algorithm verification

Description  Number of images  Models used
Angry        6                 2,5,8,13,21,25
Contented    9                 5,6,11,13,23,26,28,35,37
Disgusted    7                 2,10,13,18,28,30,35
Fearful      8                 3,10,16,21,25,26,37,18
Happy        4                 2,6,11,37
Neutral      6                 3,5,10,16,23,26
Sad          8                 6,8,16,21,25,28,30,35
Surprised    6                 3,8,11,18,23,30

TABLE 3. Confusion matrix built with the results obtained for the test with 54 images

Prediction  Entries    Accuracy
1           5, 1, 1    0.71
2           7, 1       0.87
3           7, 1       0.87
4           1, 7, 1    0.78
5           3, 1
6           1, 3       0.75
7           8, 1
8           1, 1, 5    0.4
Recall: 0.83  0.77  1  0.87  0.75  0.5  1  0.83   Overall: 80%
TABLE 7. Results of the precision and standard deviation taken in two sessions for the evaluation of the recognition model

Classification      Session 1  Session 2
Average accuracy    65.7%      69.9%
Standard deviation  2.3%       1.3%

TABLE 8. Comparison of the emotions captured by the system among 5 randomly selected users in the development of two learning activities

Classification  A1    A2     A3     A4     A5     Total
Coincidence     33    47     53     48     55     236
Total data      44    75     61     81     69     330
Average         0.75  0.626  0.868  0.592  0.797  77.66%
A = Participant

In this calculation, the precision of the balance is represented by the mean plus or minus the standard deviation; for example, the accuracy of AGR is 65.7%.

The results of the final validation of the method are presented in Figure 8; the validation was carried out with a comparison of the emotion recognition system between the two groups participating in the study. Five people were selected, three men and two women, aged between 19 and 22 years. The participants carried out two activities: reading a scientific article and carrying out three design exercises for a relational database using Microsoft SQL. The results were obtained by recording and counting the times the recognizer matched the emotion classification with the student's reported feelings.
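The per-participant agreement rates of Table 8 are simple ratios of coincidences to total observations; a small sketch of the calculation, with the values copied from the table:

```python
# Coincidences and total observations per participant, from Table 8.
coincidence = {"A1": 33, "A2": 47, "A3": 53, "A4": 48, "A5": 55}
total       = {"A1": 44, "A2": 75, "A3": 61, "A4": 81, "A5": 69}

for p in coincidence:
    # Fraction of frames where the recognizer matched the reported emotion.
    print(f"{p}: {coincidence[p] / total[p]:.3f}")

# Pooled agreement rate over all observations (236 / 330).
overall = sum(coincidence.values()) / sum(total.values())
print(f"Pooled rate: {overall:.3f}")
```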
IV. DISCUSSION
Based on the literature review of various facial expression recognition systems, it was determined that the Python programming language is aligned with the implementation needs exposed in this work. Furthermore, the OpenCV library for parsing and processing images or videos is suitable for training support vector machines [38]. Several works propose the use of headbands with sensors for the identification of emotions; however, procedures involving devices that people must wear can cause rejection because they are considered invasive. Therefore, the inclusion of AI and computer vision was planned in the design, for which it was determined to validate that the image received by the recognizer contains a face and, if one is not found, to discard the image. Subsequently, in the process, a gray filter is applied to eliminate the different color channels and to later detect some important parts of the face such as the nose, eyes, eyebrows, and mouth [39], [40]. By applying a facial point technique, it is possible to detect an initial reference point at the center of the nose and to identify various facial points on parts of the face [41].

According to the results obtained, it is established that the proposed method can be applied in an educational environment, where the ideal setting is one in which students have a personal computer. Having personal cameras guarantees the results obtained from the emotion recognition system; in addition, it is important to establish that the performance tests were carried out in computer laboratories, where activities depend on the exclusive use of computer equipment. Among the tests carried out, the system was also evaluated in traditional classrooms, in which only one camera is available; even though the model is capable of recognizing people, its effectiveness in recognizing gestures and the subsequent emotion is limited by the camera's line of sight to the student. This means that if a student is not at a direct camera angle, the model assumes that the person does not exist and generates a person-not-detected notification. In addition, the number of students marks another limitation for the identification of gestures [42], [43]. Other works that use sensors to detect emotions may have greater success in an environment such as the one described; however, costs become a limitation that must be considered.

This work guarantees the application of the model in face-to-face educational environments, focusing on providing comprehensive help to the teacher [44]. Considering that in certain courses it is a requirement to have computer devices, it is possible to implement the system so that it acts as an assistant to the tutor. With accurate information about the emotions reflected by the students, teachers can adjust their teaching methodologies. The potential of the system allows it to be integrated into the student service system; for this, it will be necessary to add methods and libraries that allow not only the identification of gestures and emotions in students but also the recognition of people by the system [33]. The inclusion of these characteristics will be presented in the second stage of this work; in addition, it has been proposed to improve the recognition of emotions so that the system can be applied to a hybrid education model and its functionality verified.

There are several comparisons between an emotion recognition system like the one proposed and other works that use different techniques. Some comparisons focus on the accuracy of emotion recognition models, while others consider factors such as processing speed and ease of implementation. For example, in a study published in 2020 [12], various emotion recognition algorithms were compared using standard databases. The results indicated that the deep neural network models had the best overall accuracy results; however, models based on spectral features were also found to have comparable results and were faster to train and evaluate.

Another 2019 study [21] compared different signal processing methods for emotion recognition in speech. In this case, several machine learning algorithms were compared, including convolutional neural networks and support vector machines. The results indicated that methods based on spectral features and prosodic features obtained the best overall accuracy results. The choice of the best method for emotion recognition will depend on the context of the application, the quality of the available data, and the skills of the development team; therefore, it is important to do a thorough evaluation of the available options before making a decision.

In another study carried out in 2020 [11], different signal processing methods for emotion recognition in EEG (electroencephalogram) signals were compared. The methods evaluated included convolutional neural networks, recurrent neural networks,
[5] R. Kobai and H. Murakami, "Effects of interactions between facial expressions and self-focused attention on emotion," PLoS ONE, vol. 16, 2021.
[6] E. G. Krumhuber, D. Küster, S. Namba, D. Shah, and M. G. Calvo, "Emotion recognition from posed and spontaneous dynamic expressions: Human observers versus machine analysis," Emotion, vol. 21, 2021.
[7] F. Makhmudkhujaev, M. Abdullah-Al-Wadud, M. T. B. Iqbal, B. Ryu, and O. Chae, "Facial expression recognition with local prominent directional pattern," Signal Processing: Image Communication, vol. 74, 2019.
[8] K. R. Scherer, H. Ellgring, A. Dieckmann, M. Unfried, and M. Mortillaro, "Dynamic facial expression of emotion and observer inference," Frontiers in Psychology, vol. 10, 2019.
[9] M. Aziz and M. Aman, "Decision Support System For Selection Of Expertise Using Analytical Hierarchy Process Method," IAIC Transactions on Sustainable Digital Innovation (ITSDI), vol. 1, 2021.
[10] B. Gaudelus et al., "Improving facial emotion recognition in schizophrenia: A controlled study comparing specific and attentional focused cognitive remediation," Frontiers in Psychiatry, vol. 7, 2016.
[11] S. Volynets, D. Smirnov, H. Saarimaki, and L. Nummenmaa, "Statistical pattern recognition reveals shared neural signatures for displaying and recognizing specific facial expressions," Social Cognitive and Affective Neuroscience, vol. 15, 2020.
[12] G. Lou and H. Shi, "Face image recognition based on convolutional neural network," China Communications, vol. 17, 2020.
[13] Y. Takahashi, S. Murata, H. Idei, H. Tomita, and Y. Yamashita, "Neural network modeling of altered facial expression recognition in autism spectrum disorders based on predictive processing framework," Scientific Reports, vol. 11, 2021.
[14] G. Yang, X. Xi, and Y. Yin, "Finger vein recognition based on a personalized best bit map," Sensors, vol. 12, 2012.
[15] R. Shyam and Y. N. Singh, "Identifying individuals using multimodal face recognition techniques," 2015, vol. 48.
[16] A. Villar, M. T. Zarrabeitia, P. Fdez-arroyabe, and A. Santurtún, "Integrating and analyzing medical and environmental data using ETL and Business Intelligence tools," International Journal of Biometeorology, vol. 62, p. 1085, 2018.
[17] W. Villegas-Ch, X. Palacios-Pacheco, and S. Luján-Mora, "Artificial intelligence as a support technique for university learning," 2019, pp. 1–6.
[18] X. Palacios-Pacheco, W. Villegas-Ch, and S. Luján-Mora, "Application of data mining for the detection of variables that cause university desertion," Communications in Computer and Information Science, vol. 895, 2019.
[19] W. Villegas-Ch, S. Luján-Mora, and D. Buenaño-Fernandez, "Data mining toolkit for extraction of knowledge from LMS," 2017, vol. Part F1346, pp. 31–35.
[20] J. Liu, J. Tong, J. Han, F. Yang, and S. Chen, "Affective Computing Applications in Distance Education," 2013.
[21] H.-J. So, J.-H. Lee, and H.-J. Park, "Affective Computing in Education: Platform Analysis and Academic Emotion Classification," International Journal of Advanced Smart Convergence, vol. 8, 2019.
[22] E. Yadegaridehkordi, N. F. Noor, M. N. B. Ayub, H. B. Affal, and N. B. Hussin, "Affective computing in education: A systematic review and future research," Computers and Education, vol. 142, 2019.
[23] A. Alblushi, "Face Recognition Based on Artificial Neural Network: A Review," Artificial Intelligence & Robotics Development Journal, 2021.
[24] E. Roidl, F. W. Siebert, M. Oehl, and R. Höger, "Introducing a multivariate model for predicting driving performance: The role of driving anger and personal characteristics," Journal of Safety Research, vol. 47, 2013.
[25] K. R. Scherer and E. Coutinho, "How music creates emotion: A multifactorial process approach," The emotional power of music, 2013.
[26] J. H. Chai, "Notice of Retraction: Study on harmonious human-computer interaction model based on affective computing for web-based education," 2009 1st International Conference on Information Science and Engineering, ICISE 2009, 2009.
[27] B. García, E. L. Serrano, S. P. Ceballos, E. J. Cisneros-Cohernour, G. C. Arroyo, and Y. E. Díaz, "Las competencias docentes en entornos virtuales: un modelo para su evaluación," RIED. Revista Iberoamericana de Educación a Distancia, vol. 21, 2017.
[28] M. Á. R. Catalán, R. G. Pérez, R. B. Sánchez, O. B. García, and L. V. Caro, "Las emociones en el aprendizaje online," RELIEVE - Revista Electrónica de Investigación y Evaluación Educativa, vol. 14, 2014.
[29] E. Yadegaridehkordi, N. F. B. M. Noor, M. N. B. Ayub, H. B. Affal, and N. B. Hussin, "Affective computing in education: A systematic review and future research," Computers and Education, vol. 142, 2019.
[30] E. Ivanova and G. Borzunov, "Optimization of machine learning algorithm of emotion recognition in terms of human facial expressions," 2020, vol. 169.
[31] R. W. Picard, "Affective Computing," Pattern Recognition, no. 321, 1995.
[32] P. Näykki, J. Isohätälä, and S. Järvelä, "'You really brought all your feelings out' – Scaffolding students to identify the socio-emotional and socio-cognitive challenges in collaborative learning," Learning, Culture and Social Interaction, vol. 30, 2021.
[33] N. A. A. Aziz et al., "Awareness and Readiness of Malaysian University Students for Emotion Recognition System," International Journal of Integrated Engineering, vol. 13, 2021.
[34] P. Partila, M. Voznak, and J. Tovarek, "Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System," Scientific World Journal, vol. 2015, 2015, doi: 10.1155/2015/573068.
[35] T. M. Wani, T. S. Gunawan, S. A. A. Qadri, M. Kartiwi, and E. Ambikairajah, "A Comprehensive Review of Speech Emotion Recognition Systems," vol. 9, 2021, doi: 10.1109/ACCESS.2021.3068045.
[36] D. Singh, "Human Emotion Recognition System," International Journal of Image, Graphics and Signal Processing, vol. 4, 2012.
[37] S. Emami and V. P. Suciu, "Facial Recognition using OpenCV," Journal of Mobile, Embedded and Distributed Systems, vol. 4, no. 1, 2012.
[38] J. Sigut, M. Castro, R. Arnay, and M. Sigut, "OpenCV Basics: A Mobile Application to Support the Teaching of Computer Vision Concepts," IEEE Transactions on Education, vol. 63, no. 4, 2020, doi: 10.1109/TE.2020.2993013.
[39] M. Naveenkumar and V. Ayyasamy, "OpenCV for Computer Vision Applications," Proceedings of National Conference on Big Data and Cloud Computing (NCBDC'15), no. March 2015, 2016.
[40] H. Adusumalli, D. Kalyani, R. K. Sri, M. Pratapteja, and P. V. R. D. P. Rao, "Face Mask Detection Using OpenCV," 2021, doi: 10.1109/ICICV50876.2021.9388375.
[41] Y. Kumar and M. Mahajan, "Machine learning based speech emotions recognition system," International Journal of Scientific and Technology Research, vol. 8, no. 7, 2019.
[42] R. Alhalaseh and S. Alasasfeh, "Machine-learning-based emotion recognition system using EEG signals," Computers, vol. 9, no. 4, 2020, doi: 10.3390/computers9040095.
[43] Y. Uranishi, "OpenCV: Open source computer vision library," Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers, vol. 72, no. 5, 2018, doi: 10.3169/ITEJ.72.736.
[44] H. J. Yun and J. Cho, "Affective domain studies of K-12 computing education: a systematic review from a perspective on affective objectives," Journal of Computers in Education, vol. 9, 2022.

WILLIAM EDUARDO VILLEGAS is a professor of Information Technology at the Universidad de Las Américas (Quito, Ecuador). He has a Ph.D. in computer science from the University of Alicante, a master's degree in communications networks, and is a systems engineer specializing in artificial intelligence and robotics. His main research topics include web applications, data mining, and e-learning. He has participated in various conferences as a speaker on topics such as ICT in education and how it improves educational quality and student learning. His main articles focus on the design of ICT systems, models, and prototypes applied to different academic environments, especially with the use of Big Data and Artificial Intelligence as a basis for the creation of intelligent educational environments. In addition, his interests and scientific articles integrate cybersecurity techniques and methods for data protection in all environments that use ICT as a communication channel. At the moment, he has a wide variety of articles indexed in scientific journals arising from research projects supported by the universities of Ecuador.