This article has been accepted for publication in IEEE Access. This is the author's version, which has not been fully edited; content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2023.3267007

Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI

Identification of Emotions from Facial Gestures in a Teaching Environment with the use of Machine Learning Techniques

WILLIAM VILLEGAS-CH.1, JOSELIN GARCÍA-ORTIZ1, AND SANTIAGO SÁNCHEZ-VITERI2
1 Escuela de Ingeniería en Tecnologías de la Información, FICA, Universidad de Las Américas, Quito 170125, Ecuador
2 Departamento de Sistemas, Universidad Internacional del Ecuador, 170411 Quito, Ecuador
Corresponding author: William Villegas-Ch. ([email protected]).

ABSTRACT Educational models currently integrate a variety of technologies and computer applications that seek to improve learning environments. With this objective, information technologies have increasingly adapted to assume the role of educational assistants that support the teacher, the students, and the areas in charge of educational quality. One of the technologies gaining strength in the academic field is computer vision, which is used to monitor and identify the state of mind of students during the teaching of a subject. To do this, machine learning algorithms monitor student gestures and classify them to identify the emotions they convey in a teaching environment. These systems allow the evaluation of emotional aspects based on two main elements: the first is the generation of an image database of the emotions produced in a learning environment, such as interest, commitment, boredom, concentration, relaxation, and enthusiasm; the second is an emotion recognition system based on the recognition of facial gestures using non-invasive techniques. This work applies techniques for the recognition and processing of facial gestures and the classification of emotions focused on learning. The system helps the tutor in a face-to-face education modality and allows the evaluation of emotional aspects and not only cognitive ones. The work arises from the need to create a base of images focused on spontaneous learning emotions, since most of the works reviewed focus on acted-out emotions.

INDEX TERMS Computer vision, emotion recognition, neural networks, teaching.

I. INTRODUCTION

Smart classrooms aim to improve the pedagogical activities that take place in them, through the analysis of the classroom context and its adaptation using different forms of content presentation or varying teaching methodologies [1]. The classroom context refers to all the metrics that make it possible to determine what happens inside the school, for example, the type of activity that is being carried out in the classroom, and the disposition or level of attention of the students concerning the class, among others [2]. Based on this information, a smart classroom determines what needs to be improved in the classroom and recommends it to the teacher. These recommendations can range from a change in teaching practice to modifying the way content is presented, for example, using slides or augmented reality.

Currently, smart classrooms propose the use of different information and communication technologies (ICT) that contribute to the identification of the emotional state of students regarding the contents, to offer more experiential forms of teaching that reinforce memory and understanding [3]. The recognition of facial gestures in people is a complex multidisciplinary problem that has not yet been fully resolved [4]. The incorporation of new technologies such as depth sensors or high-resolution cameras, as well as the greater processing capacity of current devices, allow the development of new technologies capable of detecting different gestures and acting in real time [5]. Facial expressions focus on the identification of gestures, which express affective states such as joy, surprise, fear, sadness, anger, and disgust. It is difficult to find databases of facial images that represent learning-focused secondary emotions, such as bored, engaged, excited, focused, interested, and relaxed [6]. The use of emotion recognition systems to detect emotions focused on teaching can help tutors to assess emotional aspects and not only cognitive processes [7].

In several of the works reviewed, the existing advances in the field of information technologies (IT) are mentioned, which have revealed new forms of support for education strategies in the higher technical-professional field, especially in the evaluation of emotional aspects of people with the application of different biometric techniques, with emphasis on the recognition of facial patterns, which allows capturing relevant information for the analysis and development of strategies in the educational field. In the work carried out by [8], it is mentioned that the recognition of biometric patterns is a method for the identification of individuals through facial characteristics. This analysis can be developed practically with various tools and applied to various disciplines where the recognition of faces and their emotions is required.

Another group of works mentions that the face, as a three-dimensional object, is subject to different degrees of luminosity, positioning, expressions, and other factors that need to be identified from patterns generated from the acquired images [9]. Another of the works reviewed mentions that most emotion recognition systems analyze the voice, as well as the words that are pronounced or written. For example, a high, shaky, rushed tone can indicate fear, and more complex systems also analyze gestures and even consider the environment, along with facial expressions. Emotion recognition systems typically learn to link an emotion to its outward manifestation from large, categorized data sets. Gartner estimates that in the immediate future, one in ten devices will have emotion recognition technology.

In works such as [10], a study is carried out on existing technological solutions on the market that use artificial intelligence techniques to identify students' emotions during a learning activity. These solutions use facial and voice recognition, and text analysis techniques, to identify students' emotions in real-time. For example, Affectiva is a platform that uses facial recognition technology to measure students' emotions as they interact with digital learning content. Emotiva is a voice analysis system that can detect emotions in real-time in students' speech. Mursion is a simulation platform that uses artificial intelligence technology to simulate interactions between students and teachers, allowing teachers to measure students' emotions in real-time and adjust their teaching accordingly. Smart Sparrow uses machine learning algorithms to measure students' attention and emotion as they interact with adaptive learning content. Knewton is an adaptive learning platform that uses data analytics technology to monitor student progress and adjust learning content based on their needs and emotions. BrainCo uses wearable sensors to measure students' brain activity and detect their levels of concentration and emotion during learning. Classcraft is a gamification platform that uses data analytics technology to measure students' motivation and engagement while playing online educational games. Importantly, the selection of a platform should be based on the specific needs of each learning scenario and the ethics and privacy of student data.

While these solutions offer significant benefits, some issues are important to be aware of. The main one is precision, since this can vary depending on the quality of the input data and the complexity of the algorithms used; it is important to note that these solutions are not foolproof and may make mistakes when identifying students' emotions. The collection and use of student emotional data may also raise ethical and privacy issues, so it is important to ensure that students are informed about the use of their data and that appropriate steps are taken to protect their privacy. The algorithms used in these solutions may have unintended biases, which can lead to inaccurate or discriminatory results; it is important to consider the diversity of the student population and ensure that solutions are fair and equitable. Finally, some of these solutions can be expensive and may not be available to all educational institutions, so it is important to carefully weigh the costs and benefits before investing in a solution.

This work proposes the design of an emotion identification system, based on the recognition of gestures in the faces of students during the teaching process in a specific subject [12]. The gestures on the faces are generally aligned with the emotion of a person; it is this factor that is taken advantage of by an artificial intelligence algorithm that uses a neural network previously trained with a data set that contains a large volume of frames generated through streaming videos, which allow classifying gestures and determining the emotion generated by a student [13]. The proposed algorithm is developed in Python and uses several libraries that are responsible for managing neural networks, as well as functions for data analysis.

The novelty of having a system that is responsible for identifying emotions lies in its ability to provide real-time feedback on the emotional state of students. This can help educators tailor their teaching and improve student learning. In addition, these types of systems can help identify long-term patterns of behavior and emotions, which can help educators develop more effective interventions to help students learn [14].

For example, if an emotion recognition system detects that a student is frustrated, the educator can step in and offer additional help to address the challenges the student is facing. Similarly, if the system detects that a student is bored or disinterested, the educator can modify their teaching to keep the student more engaged. In addition, the integration of emotion recognition systems can provide greater objectivity to the learning assessment process, since it can help educators to assess the impact of different teaching methods on the emotional state of students. In general, emotion recognition systems have the potential to improve the learning experience of students and help educators improve the quality of their teaching.

II. MATERIALS AND METHODS

For the development of the method, three fundamental bases are considered: image databases, affective computing, and emotion recognition systems with artificial intelligence. These bases guarantee the functioning of the identification of the emotions of the students, through the gestures that their faces generate in a didactic environment.

FIGURE 1. Methodology for data processing and generation of a dataset.

A. IMAGE DATABASE

Databases (DB) have had great growth in recent years in the analysis and construction of multidimensional data warehouses. These are created for a specific purpose and represent a collection of large volumes of data that can be text, images, audio, etc. In general, data warehouses represent a fundamental part of the infrastructure for the development of data recognition and processing applications [15], [16]. Figure 1 shows the methodology of a data warehouse, which generally integrates a data planning and capture process, processing, data storage and classification, and dataset generation. The extraction stage consists of planning and data capture; in this stage, the sources and types of data that are collected, the necessary infrastructure to acquire them, and the process that those involved will follow are specified. The transformation stage consists of data processing that allows the identification and extraction of characteristics from the captured data; then, maximum and time-difference techniques are applied. The last stage consists of storing and classifying the information using a title or label that allows the information to be recognized, through a loading process [17].

Planning and data capture is the process by which an image of the real world is obtained through a sensor (camera, scanner, etc.) that will then be processed and manipulated. An image is a two-dimensional function symbolized by f(x,y), where x and y are spatial coordinates. A continuous image can be represented by a matrix of N rows and M columns; this matrix is the digital representation of the image. For the correct treatment and study of the image in an artificial vision system, it is important to consider certain factors such as light, interference, image background, and resolution. The nature of the light, its position, and how the light reflects off the object can affect the quality of the image that the vision system has to process. Inadequate lighting can overload the image with unnecessary information such as shadows, reflections, and high contrasts, which decreases the performance of the vision system. The two relevant lighting factors are light intensity and the position of the light source.

Preprocessing is the transformation of one image into another; that is, from one image a modified one is obtained, the purpose of which is to make the subsequent analysis simpler and more reliable. There are countless image preprocessing techniques, but very few satisfy a low computational cost. Among these, there are methods in the space domain and in the frequency domain. In the transformation of the data, the use of techniques and filters that allow improvement of the image is common; among these is the handling of histograms. That is, given the digital representation of an image by means of an arrangement of N rows by M columns, an M×N matrix is determined, in which the digital representation of the bitmap is given by the distribution function f(m,n), for n ∈ [0, N − 1] and m ∈ [0, M − 1]; typically N and M are powers of 2. Another parameter to consider is the resolution of an image; this is the number of pixels that describe it, and a typical measurement is in terms of "pixels per inch" (PPI). Therefore, the quality of the representation, as well as the size of the image, depends on the resolution, which in turn determines the memory requirements for the graphic file to be generated.

Another important parameter in handling images is the size of the image, that is, its actual dimensions in terms of width and height once printed, while the file size refers to the amount of physical memory needed to store the image information, that is, the digitized image on any computer storage medium. Certainly, the resolution of the image strongly conditions these two concepts, since the number of pixels in the digitized image is fixed and therefore increasing the size of the image reduces the resolution and vice versa. Contrast is another factor widely used in the transformation of the data; this consists of increasing or decreasing the slope of the 45-degree straight line that maps input gray levels to output gray levels (with the precaution of not exceeding the limits 0-255). The transformation corresponding to the contrast change is:

v_O(m,n) = (v_I(m,n) - 2^{y-1}) \tan\varphi + 2^{y-1}   (1)

where y is the scale in bits, v_I and v_O are the input and output values, respectively, evaluated at the pixel (m,n), and the angle φ corresponds to the properties of the linear transformation of contrasts, specifically its slope.
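As an illustration of Eq. (1), the following minimal sketch applies the contrast transformation pixel-wise with NumPy, assuming an 8-bit grayscale image (so y = 8) loaded with OpenCV; the function name, file name, and angle value are only illustrative and are not taken from the paper.

```python
import cv2
import numpy as np

def adjust_contrast(gray: np.ndarray, phi_degrees: float, bits: int = 8) -> np.ndarray:
    """Apply the linear contrast transformation of Eq. (1).

    gray: single-channel image with values in [0, 2**bits - 1].
    phi_degrees: angle of the transformation line; 45 degrees leaves the
    image unchanged, larger angles increase contrast.
    """
    mid = 2 ** (bits - 1)                    # 2^(y-1), the mid-gray level
    slope = np.tan(np.radians(phi_degrees))  # tan(phi)
    out = (gray.astype(np.float32) - mid) * slope + mid
    # Keep the result inside the valid range (0-255 for 8-bit images)
    return np.clip(out, 0, 2 ** bits - 1).astype(np.uint8)

# Example usage (the file name is only a placeholder)
image = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
if image is not None:
    higher_contrast = adjust_contrast(image, phi_degrees=60)
```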

The data warehouse contains information relevant to the object of study; therefore, it must be tested to identify its strengths and weaknesses. Based on the experience and analysis of the tests, data is added to or removed from the warehouse. This mechanism is repeated continuously to obtain a balanced database, which implies that the classes or labels have approximately the same number of images [18]. In addition, the data must be representative and correspond to spontaneous images, avoiding the storage of acted images. The data must also be reliable, without errors or repetitions.

B. AFFECTIVE COMPUTING

Affective computing is an area of Artificial Intelligence (AI) that arises from the need to provide computer equipment with a certain capacity to interact with people. This task is carried out using computer vision techniques and machine learning algorithms; the objective of machine-human interaction is for the system to be capable of producing an effective response in people [19]. Affective computing is interdisciplinary and is applicable in different areas such as computer science, psychology, and cognitive science. In addition, it plays an important role in the development of intelligent interfaces applied to education or educational software. According to [20], [21], affective computing is subdivided into four research areas:

• The analysis and characterization of affective states that identify, through natural interactions, the relationships between affect and cognitive processes in learning.
• Automatic recognition of affective states by analyzing facial expressions and extracting features from linguistic expressions, posture, gaze tracking, and heart rate, among others.
• The adaptation of the systems to respond to a particular affective state of the users.
• The design of avatars that show appropriate affective states for a better interaction with the user.

Affective computing, from an interpretive point of view, requires that the concept of emotion be precisely defined, since it can be confused with concepts such as affect, feeling, or motivation. With this consideration, it is established that affection is a process of social interaction between two or more people [22]; giving affection is something that is transferred, for example, giving a gift or visiting a sick person. Feelings are the mental expression of emotions; that is, when the emotion is encoded in the brain, the person can identify the specific emotion they are experiencing: joy, grief, anger, loneliness, sadness, shame, etc. Motivation is a set of processes involved in the activation, direction, and persistence of behavior, which allows us to cause changes in life in general. For its part, emotion is a state of mind produced by an event or memory that occurs every day in our daily lives and plays an important role in non-verbal communication.

Concerning emotions, these are classified into two groups: primary or basic, and secondary or alternative. In [23], six basic emotions were identified (anger, disgust, fear, happiness, sadness, and surprise), together with some characteristics that appear on the person's face, as shown in Figure 2. Secondary or alternative emotions are complex emotions that appear after the primary emotions and depend on the situation and the context of the person. For example, a person who is afraid (primary emotion) may turn that into anger or rage (secondary emotion) and provoke an aggressive reaction. A model of valence and intensity dimensions is also used in this case to describe an emotion more precisely [24], [25], as shown in Figure 3.

Emotions generally generate expressions, and these are classified into internal and external expressions. Internal expressions can be signals generated by the body, such as blood pressure, sweating, and electroencephalography signals, and external expressions can be facial gestures, the sound of the voice, body posture, body movement, and body language [28].

All the activities carried out by people generate emotions and, according to the environment where this work is carried out, emotions are also focused on learning. These emotions are produced in students when they perform different activities, manifesting a variety of affective states in learning contexts [29]. Among the emotions aligned with learning are commitment, boredom, frustration, stress, focus, interest, relaxation, etc. According to several of the related works, the emotions of confusion, frustration, or boredom appear in students when they perform exercises that require techniques or information with which they are not familiar, and they can be considered negative for student learning.

In [30], a model related to the emotions of learning is proposed, divided into four quadrants: Quadrant I shows an evaluation of admiration, satisfaction, or curiosity resulting in positive constructive learning; Quadrant II depicts an appraisal of disappointment, perplexity, or confusion resulting in negative constructive learning; Quadrant III depicts an appraisal of frustration, rejection, or misconceptions that result in negative learning; and Quadrant IV shows an assessment of optimism and new inquiry that translates into positive learning [21], [31], [32].

C. EMOTION RECOGNITION SYSTEM

For the design of emotion recognition systems, various methods of extraction and classification of biometric parameters are used. Parameters are extracted from user-generated gestures, for example, by using a person's facial expressions. Facial expression analysis is applied in different areas of interest such as education, video games, and telecommunications, to name a few [33]. In addition, it is one of the most used in human-computer interactions. Facial expression recognition is an intelligent system that identifies a person's face and from it obtains certain characteristics that it analyzes and processes to know the person's affective state [34]. The objective of facial recognition is, from the incoming image, to find a series of data of the same face in a set of training images in a database. The great difficulty lies in ensuring that this process is carried out in real-time, something that is not within the reach of similar works that have been previously reviewed.

For the development of the emotion recognition system through gesture classification, several technical requirements of the system are considered.

FIGURE 2. Examples of recognizable gestures on people's faces that demonstrate an emotion.

In the design of algorithms, the requirements may vary depending on the specific use; however, according to the needs of the institution that participates in this study, the programming language is considered first, since it is necessary to have experience in the language used, for example, Python, as this is the most used language for the development of image recognition systems. For image recognition, there are several image processing libraries in Python, such as OpenCV, Pillow, and Scikit-image, which are used to read, process, and manipulate images; it is important to perform an analysis of their use, depending on what is required. For the implementation of image recognition, it is also necessary to have experience in machine learning libraries such as TensorFlow, Keras, PyTorch, and Scikit-learn, among others.

Concerning hardware, it is considered that most image recognition algorithms require a large number of hardware resources for processing, so it is important to have a computer with a good amount of RAM and a high-end processor; the use of a graphics card can speed up the processing of information. In addition, it is necessary to have a data set that is representative of the images to be recognized; this data is used to train the machine learning model. For the proper choice of algorithms and the selection of hyperparameters, it is necessary to know statistics and model theory.

FIGURE 3. Model of valence and intensity dimensions that describe an emotion [26], [27].

Regarding the design of the algorithm in Python for the recognition of emotions, several technical details are taken into account, such as the programming language (Python is used, with the OpenCV and NumPy libraries) and the choice of the emotion detection method, which is based on face detection and the analysis of facial expressions. The algorithm uses a pre-trained machine learning classification model to recognize facial expressions. The model is trained on labeled data sets to recognize six universal emotions: happiness, sadness, fear, surprise, disgust, and anger. Regarding the input data, these are images or video sequences; the output data are the emotions recognized for each image or video frame.
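A minimal sketch of this per-frame pipeline is shown below, assuming a pre-trained Keras classifier saved as emotion_model.h5 with a 48×48 grayscale input and six output classes; the file names, input size, and label order are illustrative assumptions, not the exact configuration used in this work.

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Assumed artifacts: a Haar cascade for face detection and a trained classifier
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
emotion_model = load_model("emotion_model.h5")          # hypothetical file
LABELS = ["happiness", "sadness", "fear", "surprise", "disgust", "anger"]

def emotions_in_frame(frame):
    """Return one predicted emotion label per face found in a BGR frame."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    predictions = []
    for (x, y, w, h) in faces:
        roi = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
        probs = emotion_model.predict(roi.reshape(1, 48, 48, 1), verbose=0)[0]
        predictions.append(LABELS[int(np.argmax(probs))])
    return predictions

# Video input: each frame of the stream is classified independently
capture = cv2.VideoCapture("class_session.mp4")          # or 0 for a webcam
while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break
    print(emotions_in_frame(frame))
capture.release()
```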

In choosing the data set, it is important to choose a quality data set that contains a variety of emotions and facial expressions. This data is used to train and test the emotion recognition model.

In addition, in the development of the algorithm, it is necessary to select the characteristics that will be used to recognize emotions. Characteristics may include the intensity of facial expressions, the position of the eyes and mouth, and the duration of the expressions. There are different machine-learning algorithms and signal-processing techniques that can be used to recognize emotions, and it is important to choose the appropriate classification model for the selected data set and features. Before feeding the data to the model, it is important to perform proper preprocessing such as data normalization, denoising, and dimensionality reduction. Another important step is the evaluation of the emotion recognition model using a test data set, to measure its accuracy and generalizability. Likewise, once the emotion recognition model has been developed, it is important to integrate it with an application so that it can be used in real life.

Regarding the evaluation, several techniques and metrics are considered to evaluate the emotion recognition algorithm. For example, we consider the precision, which measures the proportion of correct classifications made by the model; it is one of the most common metrics and is used to assess the overall performance of the model. Sensitivity measures the model's ability to correctly identify a specific emotion; it is especially important if you want to recognize specific emotions, such as sadness or happiness. Specificity measures the model's ability to correctly identify the absence of a specific emotion, for example, if you want to identify the absence of negative emotions such as sadness or anger. The F value is a measure that combines precision and sensitivity in a single figure; it is useful when you want to find a balance between accuracy and the ability to correctly identify a specific emotion. The confusion matrix is a table showing the actual classes and the model's predictions; it is a useful tool for evaluating false positive and false negative rates and for identifying patterns of error in the model.
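These metrics can be computed directly with scikit-learn once the test labels and the model predictions are available; the following minimal sketch uses made-up label vectors and is only illustrative of the calls involved.

```python
from sklearn.metrics import (accuracy_score, recall_score, f1_score,
                             confusion_matrix)

# Hypothetical ground-truth labels and model predictions
y_true = ["happy", "sad", "fear", "happy", "anger", "disgust", "surprise", "sad"]
y_pred = ["happy", "sad", "fear", "sad",   "anger", "disgust", "surprise", "happy"]

print("Accuracy:", accuracy_score(y_true, y_pred))
# Sensitivity (recall) per class, e.g. how often "sad" is correctly identified
print("Recall per class:", recall_score(y_true, y_pred, average=None, zero_division=0))
# F value combining precision and sensitivity
print("F1 per class:", f1_score(y_true, y_pred, average=None, zero_division=0))
# Rows are the actual classes, columns the predicted classes
print(confusion_matrix(y_true, y_pred))
```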
D. METHODOLOGY

For the development of this work, an institution of higher technical education is considered. The students participating in the study are at the third level and, as per the regulations of the institution, the objective of the system was explained to the population and their consent was requested to be able to capture facial images for research purposes only. The data capture is carried out in two stages: in the first, 20 students participate, of which 12 are men and 8 are women, with an age range of 18 to 20 years; in the second stage, 18 students participate, of which 10 are men and 8 are women, who are in the same age range. The two groups of students take the subject of databases and belong to two parallel sections: the first group belongs to parallel "A" and the second to parallel "B". The teacher is the same for both parallels; therefore, the methodology is the same for the total population.

To capture the data, a controlled environment is established that corresponds to a laboratory with a maximum capacity of 24 students. An Intel RealSense depth camera has been installed in this laboratory to capture images of the participants. The laboratory has 24 computers that the students use to solve different exercises on the subject. The emotion recognition application is housed in the institution's data center, where two virtual servers have been created: one of them processes and stores the image signals, and the other captures and stores images with a specific name. The name consists of an identifier, a number, and the exact date and time.

The class sessions are separate for the two groups of students; however, the methodology for each session is similar, as are the various activities, which consist of theoretical components and practical development with exercises related to a specific problem. During the design stage of the activities, different learning outcomes are evaluated; these are based on the eight types of learning exposed in the learning theory of [35]. Within the classes, the teacher has three sessions, each of 60 minutes. As it is a three-hour class, the attention of the students can be diminished; therefore, it is the teacher's task to use a methodology that allows improving the interest of the students. The class schedule compounds the continuous sessions, since the classes are in the evening and 75% of the students work. These factors hinder the teacher's efforts to maintain the interest and concentration of students in the different topics of the subject. Among the techniques used by the teacher is giving higher priority to practical sessions and activities that require student participation.

The mechanics of the class that uses the recognition system assume that each student has at their disposal a computer and a webcam, in addition to the depth camera that each laboratory has. Having the cameras allows the identification of gestures when the students are carrying out practical activities on their computers [36]. Instead, when a conceptual review of a topic is needed or the teacher is the focus of the class, the depth cameras provide the full perspective for monitoring.

To carry out facial recognition, a database of several faces that express a facial gesture is needed. These faces should denote a variety of expressions such as happiness, sadness, boredom, and surprise. Another aspect that these images must have is variation in light conditions, whether people wear glasses or not, and even whether they close their eyes or wink. It is recommended that these images be collected in the environment where facial recognition will be applied. The variety of images obtained from the faces contributes to the performance of the algorithms applied in the system. There is no exact number of images and no guide to precisely how many images are needed for an AI algorithm to classify emotions; the only existing reference is that the larger and more varied the volume of images, the better the precision of the algorithm.

Initially, many images are needed to train a classifier so that it can discern between the presence and non-presence of an object. For example, in the first stage, the algorithm detects whether there is a face; therefore, positive images are required, that is, photos that contain faces, and negative images, which are images that do not contain faces. Then, the features of all the images are extracted and, for this, an automatic learning approach is used; the training proceeds as shown in Figure 4.

OpenCV has several pretrained classifiers, not only for human faces but also for eyes, smiles, and animal faces, among others. To use face detection with a Haar cascade in OpenCV, it is necessary to use the detectMultiScale method, which helps to detect objects according to the classifier used. This allows obtaining a bounding rectangle where the object to be found within an image is located; for this, the following arguments must be specified (a short sketch follows the list):

• Image: the image where the face detector will act.
• ScaleFactor: this parameter specifies how much the image will be scaled down. For example, if 1.1 is entered, it means that the image will be reduced by 10%; with 1.3 it will be reduced by 30%, thus creating a pyramid of images. It is important to note that if we give a very high number, some detections are lost, while for very small values such as 1.01, it will take longer to process, since there will be more images to analyze, and false positives can also increase, which are detections presented as objects or faces but which are not.
• MinNeighbors: this parameter specifies how many neighbors each candidate rectangle must have to retain it; that is, a window is obtained that will go through the image looking for faces, so this parameter relates all the delimiting rectangles of the same face. Therefore, minNeighbors specifies the minimum number of bounding boxes, or neighbors, that a face must have to be detected as such.
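A minimal face-detection sketch with these arguments might look as follows; the image path is a placeholder and the parameter values are only examples.

```python
import cv2

# Load one of the pretrained Haar cascades shipped with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("students.jpg")                 # placeholder image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# scaleFactor builds the image pyramid; minNeighbors filters weak candidates
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    # Draw the bounding rectangle returned for each detected face
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("students_detected.jpg", image)
```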
The detection of the students' gestures, while they are receiving a synchronous class, requires the generation of a data set that the recognition algorithm takes for its training; for this, haarcascade_frontalface is used. This algorithm allows the storage of a specific number of faces for system training, and this process can be done from a video provided by the students or from streaming. In this case, the number of images is 350 per student, and the students have been asked to simulate many expressions in the videos, where gesticulation helps to improve the accuracy of the training. For this task, a convolutional network designed in Kaggle Notebooks is used with the Keras, TensorFlow, and OpenCV libraries for image processing [36]. In addition, the Matplotlib library is used in the design to generate a graph where the model is evaluated; to do this, the loss and precision values of both the training and validation phases are identified. The confusion matrix is generated with the Sklearn library to evaluate the accuracy of the classification [37].

The data set generated for the first training session and tests corresponds to the first group of students, which has been named parallel "A", and six classes are evaluated: commitment, boredom, frustration, focused, interested, and neutral, with a total of 7,000 images corresponding to 20 students. Table 1 indicates the number of images that each category has.

TABLE 1. Dataset generated for the first training session with the detection of six emotions applied in the tests of a convolutional neural network with 7,000 images

  Commitment  Boredom  Frustration  Focused  Interested  Neutral
  1850        1680     592          1140     922         816
  Total: 7000

The 7,000 images correspond to the gestures identified within each of the six categories, and these are processed as part of the training of the convolutional network. Figure 5 shows the architecture of the convolutional network, which has three layers, each of which has a convolution with an activation function and an additional max pooling layer. At the end of these, the flatten and dropout layers and an output layer with an activation function were added.
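Following that description, a minimal Keras sketch of such an architecture could look like the block below; the filter counts, kernel sizes, input resolution, and dropout rate are assumptions for illustration, since the exact hyperparameters are not reported here.

```python
from tensorflow.keras import layers, models

NUM_CLASSES = 6  # commitment, boredom, frustration, focused, interested, neutral

model = models.Sequential([
    # Three convolution + max-pooling blocks, as described for Figure 5
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(48, 48, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    # Flatten, dropout, and the output layer with its activation function
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```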

FIGURE 4. Creation of an image database.

FIGURE 5. The architecture of a convolutional network in an emotion recognition model.

FIGURE 6. Detection phases applied in a convolutional neural network.

Facial expression recognition depends on the four steps shown in Figure 6. The first step is to detect faces in an image by applying the histogram of oriented gradients algorithm. Next, the facial landmark estimation algorithm is used, which allows the identification of 68 landmarks on each face. In the third step, 128 measurements are created for each face through deep learning, which correspond to the unique characteristics of the faces; finally, with the unique characteristics of each face, the name of the person is determined.

Figure 7 shows the stages that the system performs to correctly identify the emotion. The initial stage validates that the image received by the recognizer contains a face; if the algorithm does not find one, it discards the image. Next, a gray filter is applied to eliminate the different color channels and then detect some important parts of the face, such as the nose, eyes, eyebrows, and mouth. In the following stage, a technique is applied to mark the facial points on the detected parts; in addition, an initial reference point is placed in the center of the nose, and various facial points are identified on the parts of the face. In the next stage, the geometric calculations of the distance between the initial point of reference and each facial point detected on the face are made. The result of the calculations is a matrix of facial characteristics that is processed by a support vector algorithm together with its emotion label, so that it can learn to classify facial expressions. Finally, the trained model is sent further vectors of facial features to assess whether the algorithm has learned to classify gestures and recognize emotions.

For the construction of the model, it is possible to use a video previously stored in a repository, in which the face must simulate several gestures that will later be separated into frames and used for training the artificial neural network. In addition, in the process of data acquisition and training of the network, it is possible to acquire a greater volume of data with the use of streaming video. Obtaining the images from a streaming video has the advantage that the frames represent the most natural state possible of the people and even help to identify the variations of the environment such as lighting, objects, attenuation, etc.

For the generation of the streaming video, it is necessary to specify the faceClassif function, where the face detector to be used is assigned; in addition, a counter initialized to "0" is established, which is responsible for counting all the faces that are stored for training. The library reads each frame and resizes it to 640 pixels wide, applying grayscale. In the next step, a function is created where all the detected faces are stored. To analyze each one of them, a cycle is generated in which a rectangle is drawn surrounding each face; the face is then cut and resized to 150 × 150 pixels so that all stored faces have the same size, each face is stored in the repository, and finally the counter is incremented by 1.
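A minimal sketch of this capture loop is given below; the classifier file, video source, output folder, and frame sizes follow the description above, but the exact paths and helper structure are assumptions (only the name faceClassif comes from the text).

```python
import os
import cv2

# Face detector assigned to the classifier described in the text
faceClassif = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

output_dir = "dataset/student_01"              # placeholder destination folder
os.makedirs(output_dir, exist_ok=True)

capture = cv2.VideoCapture("student_01.mp4")   # stored video, or 0 for streaming
count = 0                                      # counts the stored faces

while capture.isOpened() and count < 350:      # 350 images per student
    ok, frame = capture.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    frame = cv2.resize(frame, (640, int(h * 640 / w)))   # resize to 640 px wide
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, fw, fh) in faceClassif.detectMultiScale(gray, 1.1, 5):
        face = cv2.resize(frame[y:y + fh, x:x + fw], (150, 150))
        cv2.imwrite(os.path.join(output_dir, f"face_{count}.jpg"), face)
        count += 1                             # increment the counter by 1

capture.release()
```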

FIGURE 7. Architecture for gesture detection and emotion identification.

For training, the obtModel function is created. This function requires the method to be used and facesData, the array where the faces with their different emotions are stored, together with labels, which are the labels of each of the faces corresponding to each emotion. There are several options for training the recognizer: if the method to use is EigenFaces, then cv2.face.EigenFaceRecognizer_create() is assigned to emotion_recognizer; if FisherFaces is used, cv2.face.FisherFaceRecognizer_create() is assigned; or, if the method used is LBPH, cv2.face.LBPHFaceRecognizer_create() is assigned.
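A minimal sketch of such an obtModel helper is shown below; it requires the opencv-contrib-python package, which provides the cv2.face module, and it assumes that facesData is a list of equally sized grayscale face images and that labels holds one integer label per image.

```python
import cv2
import numpy as np

def obtModel(method, facesData, labels):
    """Create and train a recognizer with the chosen method."""
    if method == "EigenFaces":
        emotion_recognizer = cv2.face.EigenFaceRecognizer_create()
    elif method == "FisherFaces":
        emotion_recognizer = cv2.face.FisherFaceRecognizer_create()
    elif method == "LBPH":
        emotion_recognizer = cv2.face.LBPHFaceRecognizer_create()
    else:
        raise ValueError(f"Unknown method: {method}")

    # facesData: grayscale images of identical size; labels: one int per face
    emotion_recognizer.train(facesData, np.array(labels))
    emotion_recognizer.write(f"model_{method}.xml")   # persist the trained model
    return emotion_recognizer
```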
III. RESULTS

In the first evaluation of the system, a dataset consisting of 54 images was used, which correspond to 18 students belonging to groups "A" and "B" out of a total of 38 students. For the dataset, the students were asked to model three images each, which are included in the description of Table 2. The students were randomly selected, and what is sought is to carry out a test to evaluate the operation of the emotion recognition algorithm; the model subjects have been identified as 2, 3, 5, 6, 8, 10, 11, 13, 16, 18, 21, 23, 25, 26, 28, 30, 35, and 37.

TABLE 2. Description of the test set for algorithm verification

  Description  Number of images  Models used
  Angry        6                 2,5,8,13,21,25
  Contented    9                 5,6,11,13,23,26,28,35,37
  Disgusted    7                 2,10,13,18,28,30,35
  Fearful      8                 3,10,16,21,25,26,37,18
  Happy        4                 2,6,11,37
  Neutral      6                 3,5,10,16,23,26
  Sad          8                 6,8,16,21,25,28,30,35
  Surprised    6                 3,8,11,18,23,30

In the table, the first column indicates the description or name of the evaluated emotion, the second column contains the number of images present in the test set related to each emotion, and the last column, "Models used", indicates the identifiers of the model subjects detected. In this training stage, it was necessary to generate the feature matrix for the test data set. It had 54 rows (number of images) and 136 columns (number of features). The matrix with the results was stored in a file with a "txt" extension; this task is performed by a routine written in the Python language.

Once the "X" matrix is obtained, the next step evaluates the 136 characteristics of each one of the images with the hypothesized functions corresponding to the 8 emotion classifiers. The result of this operation is a matrix with 8 rows and 54 columns. The number of rows corresponds to the number of classifiers (one per emotion); in this way, row 1 contains the outputs of the classifier associated with the emotion with code 1 (Anger), row 2 with code 2 (Happy), and so on. The number of columns, for its part, is equal to the number of images in the test set; thus, for example, column 1 contains the outputs of the 8 classifiers for the first image. The final output of the algorithm is, for each image, the row number for which the maximum probability value was calculated. The results obtained made it possible to build the confusion matrix in Table 3.

TABLE 3. Confusion matrix built with the results obtained for the test with 54 images

  Prediction  1  2  3  4  5  6  7  8  Accuracy
  1       5  1  1           0.71
  2          7  1           0.87
  3             7  1        0.87
  4          1  7  1        0.78
  5             3  1
  6          1  3           0.75
  7             8  1
  8       1  1  5           0.4
  Recall  0.83  0.77  1  0.87  0.75  0.5  1  0.83  80 %

From the matrix shown in the table, it is possible to deduce that the operation in the first evaluation of the system presented several errors with which the operation must be adjusted; for example, for code 1 (Angry) six images were expected, but the result obtained was five hits, and one image was recognized as neutral. This can be observed in several of the emotions evaluated in the confusion matrix. Another result is that for the disgusted and sad emotions, the algorithm has recognized 100 % of the images.

After the training stage and the first evaluation, the operation of the system is evaluated in a production environment. In this, the system is implemented, obtaining several results whose fundamental base is the identification of the gestures on the faces of the students. Figure 8 shows the result of the process carried out on the images, which consists of applying a gray filter to the figure (left-right). Subsequently, the parts of the face are detected and the initial center point is found, which will be used as a reference for the different facial points.
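A sketch of this facial-point and feature computation is shown below, using dlib's 68-landmark shape predictor as one possible implementation; the library choice, the predictor file shape_predictor_68_face_landmarks.dat, and the use of the nose tip as the reference point are assumptions consistent with the description, not necessarily the exact tools used in this work. The (x, y) offsets of the 68 landmarks with respect to the reference point yield 136 values, matching the size of the feature matrix described above.

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Standard 68-landmark model distributed with dlib examples (assumed path)
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def face_feature_vector(gray):
    """Return the 136-value feature vector for the first face in a grayscale image."""
    faces = detector(gray)
    if not faces:
        return None                      # no face: the image is discarded
    shape = predictor(gray, faces[0])
    points = np.array([(shape.part(i).x, shape.part(i).y) for i in range(68)])
    reference = points[30]               # landmark 30 approximates the nose tip
    offsets = points - reference         # (x, y) offset of each facial point
    return offsets.flatten()             # 68 landmarks x 2 coordinates = 136 features
```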

Once the facial points have been obtained, the pertinent geometric calculations are made from the distance between the initial reference point (the center of the nose) and the other facial points, with their respective position coordinates (x, y), obtaining the matrix of features. Table 4 shows the precision obtained by the emotion recognition system, resulting from the algorithm applied with a cross-validation of 10 instances for the training of the neural network. The right column shows the average accuracy of each emotion class. According to the literature review, the emotions that are present in learning are established in the first column; in addition, what is sought with this limitation in emotions is to adjust the system so that there is not a high percentage of false positives.

FIGURE 8. Application of filters and identification of facial points.

TABLE 4. Accuracy obtained in cross-validation in emotion recognition

  Classification       Average Accuracy
  Bored                68.57 %
  Hooked               71.83 %
  Excited              70.99 %
  Focused              75.82 %
  Interested           92.07 %
  Relaxed              100 %
  Average accuracy     52.78 %
  Standard deviation   13.70 %

The statistical analysis process was applied to the resulting data to find anomalous results referring to a possible correlation between the gender of the participants and their emotions. For this, the Pearson test carried out on the samples of the entire population is considered. In the process, each gesture was categorized ordinally; according to the results obtained, there is no evidence of an existing relationship between the gender of the participants and their emotions. In a subsequent analysis, the bivariate correlation calculation was used to obtain the Pearson correlation coefficient between the averages of the emotions presented by each participant. Table 5 presents the results of the tests with negative correlation values. Among the most significant results, positive correlations were found between hooked/bored and excited (.441), and between focused and excited (.987); this means that when the focused value increases, the excited value also intensifies.

TABLE 5. Result of existing correlations between emotions

  Classification   Engaged/bored   Relaxed   Excited
  Interested       -0.501          -0.316    -0.225
  Engaged/bored    1               0.223     0.441
  Focused          0.134           -0.435    0.987
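The bivariate correlation described above can be reproduced with SciPy as in the following minimal sketch; the emotion-average vectors are invented placeholder values, not data from the study.

```python
from scipy import stats

# Hypothetical per-participant averages for two emotion classes
focused = [0.61, 0.72, 0.55, 0.80, 0.67, 0.74]
excited = [0.58, 0.70, 0.52, 0.79, 0.66, 0.71]

r, p_value = stats.pearsonr(focused, excited)
print(f"Pearson correlation coefficient: {r:.3f} (p = {p_value:.3f})")
```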
For the evaluation of the system, a greater number of sessions is added, for which a second data set is considered, generated from the second group, named parallel "B". The generation of the second data set maintains the same guidelines, such as the generation of 350 images per student from a video or streaming; the objective is to increase the number of facial expressions for the recognition of emotions. In this session, the images correspond to the students while they carry out an activity, considering the experience of group "A". The activity proposed for the detection of mood is the reading of concepts on the application of relationships in databases; the objective is to improve the detection of the bored emotion, this being the one that had the greatest impact in group "A". A total of 6,300 captured images belonging to the 18 students in the group were obtained; these images are attached to the process to create an additional evaluation session of the system. Table 6 shows the total number of images in the corpus, annexing the sessions where the boredom emotion was captured.

TABLE 6. Samples taken for the generation of the image dataset

  Emotion      Session 1   Session 2
  Bored        1850        890
  Hooked       1680        1540
  Excited      592         1175
  Focused      1140        974
  Interested   922         933
  Relaxed      816         788

Table 7 shows the mean precision and the standard deviation of the two sessions that were considered for the evaluation of the proposed model. According to the results obtained, it can be identified that the performance of the machine learning algorithm in the recognition task has been effective. For this, an evaluation was carried out using five-fold cross-validation. The precision calculation is performed using the standard deviation, for which it is possible to use two formulas: one of them is used if the measured data represent an entire population, while a second formula is used if the measured data come from only a sample of the population. For this work, a set of samples is used, so formula (3) for the standard deviation of a set of samples is applied:

σ = \sqrt{ \sum (x - \mu)^2 / n }   (2)

σ = \sqrt{ \sum (x - \mu)^2 / (n - 1) }   (3)

As with calculating the mean deviation, the first step is to find the mean of the data values using the set of measurements for each factor. In the second step, the squared difference is calculated by subtracting the mean from each data value and squaring the result; the square of the difference will always be positive in each of the five sample data values. The average of these squared differences represents the variance of the data set, and the standard deviation is the square root of the variance. Using this calculation, the precision is reported by giving the mean plus or minus the standard deviation; for example, the accuracy of AGR is 65.7%.
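A quick check of formula (3) with NumPy (sample standard deviation, ddof=1) on made-up accuracy values might look like this:

```python
import numpy as np

# Hypothetical per-fold accuracies for one evaluation session
accuracies = np.array([0.652, 0.671, 0.648, 0.665, 0.649])

mean = accuracies.mean()
sample_std = accuracies.std(ddof=1)   # divides by n - 1, as in formula (3)

print(f"accuracy = {mean:.3f} ± {sample_std:.3f}")
```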

TABLE 7. Results of the precision and standard deviation taken in two the tests carried out, the system was also evaluated in tra-
sessions for the evaluation of the recognition model
ditional classrooms, in which only one camera is available,
Classification Session 1 Session 2 even though the model is capable of recognizing people, its
Average accuracy 65.7% 69.9% effectiveness in recognizing gestures and subsequent emotion
Standard deviation 2.3% 1.3% is limited to the line of sight of the camera with the student.
This means that if a student does not have a direct camera
TABLE 8. Comparison of the emotions captured by the system among 5 angle, the model assumes that a person does not exist and
randomly selected users in the development of two learning activities
generates a person not detected notification. In addition, the
Classification A 1 A2 A3 A4 A5 Total number of students marks another limitation for the identifi-
Coincidence 33 47 53 48 55 236 cation of gestures [42], [43]. Other works that use sensors to
Total data 44 75 61 81 69 330 detect emotions may have greater success in an environment
Average 0.75 0.626 0.868 0.592 0.797 77.66%
A= Participant such as the one described, however, costs become a limitation
that must be considered.
This work guarantees the application of the model in
this calculation the precision of the balance is represented by face-to-face educational environments, focusing on provid-
giving the mean, plus or minus the standard deviation. For ing comprehensive help to the teacher [44]. Considering
example, the accuracy of AGR is 65.7%. that in certain chairs it is a requirement to have computer
The results of the final validation of the method are pre- devices, it is possible to implement the system and act as
sented in Figure 8, where it was carried out with a com- an assistant to the tutor. Having accurate information about
parison of the emotion recognition system between the two the emotions reflected by the students, they can adjust their
groups participating in the study. Five people were selected, teaching methodologies. The potential of the system allows
three men and two women, aged between 19 and 22 years. it to be integrated into the student service system, for this
The participants carried out two activities; reading a scientific it will be necessary to add methods and libraries that allow
article and carrying out three design exercises for a relational not only to identify of gestures and emotions in students, but
database using Microsoft SQL. The results were obtained by it will also be necessary for the system to recognize people
recording and counting the times the recognizer matched the [33]. The inclusion of these characteristics in the system will
emotion classification with the student’s feelings. be presented in the second stage of this work, in addition, it
IV. DISCUSSION
Based on the literature review of various facial expression recognition systems, it was determined that the Python programming language is aligned with the needs of this work, and that the OpenCV library for parsing and processing images or videos is suitable for training support vector machines [38]. Several works propose headbands with sensors for the identification of emotions; however, devices that people must wear can cause rejection because they are considered invasive. Therefore, AI and computer vision were adopted in the design, and it was decided to validate that the image received by the recognizer contains a face and, if none is found, to discard the image. Subsequently, a gray filter is applied to remove the color channels, and important parts of the face such as the nose, eyes, eyebrows, and mouth are detected [39], [40]. By applying a facial-point technique, it is possible to place an initial reference point at the center of the nose and to identify further facial points on other parts of the face [41].
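As an illustration (not the exact implementation used in this work), the face-validation and preprocessing steps described above can be sketched with OpenCV's bundled Haar cascades: the frame is converted to grayscale, discarded if no face is found (triggering the "person not detected" notification), and coarse facial regions such as the eyes are then located inside the detected face. The camera index, the detection thresholds, and the use of Haar cascades in place of the facial-point model are illustrative assumptions.

import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def preprocess(frame):
    # Gray filter: remove the color channels before any detection step.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Validate that the image actually contains a face; otherwise discard it.
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    # Keep the largest detected face and look for the eye regions inside it.
    x, y, w, h = max(faces, key=lambda r: r[2] * r[3])
    face_roi = gray[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(face_roi)
    return face_roi, eyes

cap = cv2.VideoCapture(0)  # camera pointed at the student (index is illustrative)
ok, frame = cap.read()
cap.release()
if not ok or preprocess(frame) is None:
    print("person not detected")  # notification described in the text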
According to the results obtained, the proposed method can be applied in an educational environment, ideally one in which each student has a personal computer. Personal cameras guarantee the results obtained from the emotion recognition system; in addition, it is important to note that the performance tests were carried out in computer laboratories, where the activities depend on the exclusive use of computer equipment. Among the tests carried out, the system was also evaluated in traditional classrooms, in which only one camera is available; although the model is still capable of recognizing people, its effectiveness in recognizing gestures, and therefore emotions, is limited by the camera's line of sight to the student. This means that if a student is not within a direct camera angle, the model assumes that the person does not exist and generates a "person not detected" notification. In addition, the number of students is another limitation for the identification of gestures [42], [43]. Other works that use sensors to detect emotions may have greater success in an environment such as the one described; however, their cost is a limitation that must be considered.
This work guarantees the application of the model in face-to-face educational environments, focusing on providing comprehensive help to the teacher [44]. Considering that certain courses require students to use computer equipment, it is possible to implement the system so that it acts as an assistant to the tutor. With accurate information about the emotions reflected by the students, tutors can adjust their teaching methodologies. The potential of the system allows it to be integrated into the student service system; for this, it will be necessary to add methods and libraries that allow not only the identification of gestures and emotions in students but also the recognition of who each person is [33]. The inclusion of these characteristics will be presented in the second stage of this work; in addition, it is proposed to improve the recognition of emotions so that the system can be applied to a hybrid education model and its functionality verified.
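Regarding the person-recognition capability planned for that second stage, one possible starting point is the LBPH face recognizer available in OpenCV's contrib module, which can be trained on labeled face crops of enrolled students; the file names and student identifiers below are hypothetical, and this is a sketch rather than the method adopted in this work.

import cv2
import numpy as np

# Requires the opencv-contrib-python package for the cv2.face module.
recognizer = cv2.face.LBPHFaceRecognizer_create()

# Hypothetical enrollment data: grayscale face crops and integer student IDs.
enrollment = {"student_01.png": 1, "student_01_b.png": 1, "student_02.png": 2}
images = [cv2.imread(path, cv2.IMREAD_GRAYSCALE) for path in enrollment]
labels = np.array(list(enrollment.values()))
recognizer.train(images, labels)

# At run time, a new face crop is assigned to the closest enrolled student.
probe = cv2.imread("current_frame_face.png", cv2.IMREAD_GRAYSCALE)
student_id, distance = recognizer.predict(probe)
print(f"recognized student {student_id} (distance {distance:.1f})")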
There are several possible comparisons between an emotion recognition system like the one proposed and other works that use different techniques. Some comparisons focus on the accuracy of the emotion recognition models, while others consider factors such as processing speed and ease of implementation. For example, in a study published in 2020 [12], various emotion recognition algorithms were compared using standard databases. The results indicated that deep neural network models had the best overall accuracy; however, models based on spectral features were found to have comparable results and were faster to train and evaluate.
Another study, from 2019 [21], compared different signal processing methods for emotion recognition in speech. In this case, several machine learning algorithms were compared, including convolutional neural networks and support vector machines. The results indicated that methods based on spectral and prosodic features obtained the best overall accuracy. The choice of the best method for emotion recognition will depend on the context of the application, the quality of the available data, and the skills of the development team; it is therefore important to evaluate the available options thoroughly before making a decision.
In another study carried out in 2020 [11], different signal processing methods for emotion recognition in EEG (electroencephalogram) signals were compared. The methods evaluated included convolutional neural networks, recurrent neural networks, and SVMs (support vector machines). The results indicated that recurrent neural networks had the best overall accuracy. A 2020 study [42] compared different deep-learning models for emotion recognition in facial images; the models evaluated included convolutional and recurrent neural networks, and those based on convolutional neural networks obtained the best precision. In a 2019 study [29], different feature selection methods for emotion recognition in speech were compared; the methods evaluated included LDA (linear discriminant analysis), SVM, and decision trees, and LDA and SVM had the best overall accuracy. Finally, a 2018 study [16] compared different signal processing methods for emotion recognition in video signals; the methods evaluated included SVM, convolutional neural networks, and recurrent neural networks, and convolutional neural networks had the best accuracy.
V. CONCLUSIONS
The gesture recognition system for the identification of emotions gives the teacher an additional variable with which to improve the teaching method. According to the results obtained, students generate a variety of gestures during the teaching process, and these are generally linked to a specific emotion. By identifying the emotions of the students during a teaching process, the teacher receives feedback on the method and can make decisions that improve the teaching environment. For the teacher, having real-time information on the students' state of mind is undoubtedly an advantage over any other model for supervising the impact of teaching on the student. Regarding other works, it should be noted that there are several recognition algorithms and libraries for image processing; however, their applicability is focused on students with other types of abilities. Our proposal focuses on monitoring a common environment, and we are preparing future work adapted to the needs of hybrid and online educational models, these being the future of education, where the teacher's role will rely on ICT with greater emphasis.
In the training of the neural network, a validation accuracy of over 70% has been achieved; this percentage is adequate considering the results reported in similar works that have used other platforms. These results guarantee the effectiveness of the model and of the classification of student gestures, with which it is possible to identify with certainty the emotions of a given student population.
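As a reference for reproducing this kind of measurement, the sketch below shows how a small convolutional network could be trained on emotion-labeled face images and its validation accuracy monitored with TensorFlow/Keras; the directory name, architecture, and hyperparameters are illustrative assumptions and do not correspond to the exact network used in this work.

import tensorflow as tf

# Hypothetical dataset layout: one sub-folder per emotion (interest, boredom, ...).
common = dict(validation_split=0.2, seed=42, image_size=(96, 96), batch_size=32)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "gesture_dataset", subset="training", **common)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "gesture_dataset", subset="validation", **common)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(len(train_ds.class_names), activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# val_accuracy is the validation percentage discussed above (reported as over 70%).
history = model.fit(train_ds, validation_data=val_ds, epochs=20)
print(f"best validation accuracy: {max(history.history['val_accuracy']):.1%}")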
Among the positive characteristics of the developed model, generating the training set from the students' own images improves the recognition of gestures. For this, the data set was collected from all the students through a video; however, in certain cases it is better to use streaming to obtain the images and generate the data set. The objective is to prevent participants from consciously acting out their expressions, so that the images obtained are as natural as possible.
In the validation of results, it is recommended to compare the emotion recognition system over a greater number of sessions. In addition, it is important to include greater gender variety in the group, as well as people in a higher age range than the one included in this study. By widening the age range, the aim is to build a training data set that can be used for the future inclusion of the system in an online education model; in these models it is common for students to be older, which implies that their physical traits are different.
When analyzing the data, it was identified that the classes of the data set were unbalanced, so a greater number of sessions was carried out to increase the number of facial expressions associated with the boredom emotion. In the same way, teaching activities should be varied and include those that usually present problems for students, such as reading literary articles; this helps to increase the production of emotions and to adjust the different parameters of the model.
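Besides collecting additional sessions, a common way to compensate for under-represented classes such as boredom is to weight the training loss by class frequency; the following sketch uses scikit-learn's helper with an illustrative label distribution, and is offered as one option rather than the procedure followed in this work.

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative label distribution in which boredom is under-represented.
labels = np.array(["interest"] * 420 + ["concentration"] * 380 + ["boredom"] * 90)
classes = np.unique(labels)
weights = compute_class_weight(class_weight="balanced", classes=classes, y=labels)
print(dict(zip(classes, weights)))  # larger weight for the rare class

# The mapping {class index: weight} can then be passed to the training call,
# e.g. model.fit(..., class_weight=dict(enumerate(weights))), so that rare
# expressions contribute proportionally more to the loss.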
According to the results obtained from the system, there is a level of precision that guarantees its use in an education system and provides important information to tutors. However, several limitations may affect the effectiveness of an AI algorithm designed to detect emotions in university students. Among these limitations, AI algorithms can be trained with limited or biased data sets, which can affect the accuracy of emotion detection; this can lead to erroneous results and a lack of reliability. There is also the difficulty of detecting subtle emotions: sometimes emotions are subtle and hard to see even for humans, so AI algorithms may also struggle to detect emotions that are not so obvious.
AI algorithms can likewise have difficulty understanding the context in which an emotion occurs, since without context an algorithm can misinterpret emotions and produce incorrect results. People often experience multiple emotions simultaneously, which makes accurate detection challenging; AI algorithms may have difficulty detecting mixed emotions and may provide inaccurate or conflicting results. Furthermore, emotions and the ways they are expressed vary between cultures and geographic regions, so AI algorithms designed to detect emotion in college students may not be accurate across cultures.
REFERENCES
[1] N. Agustini, N. Nursalam, T. Sukartini, G. K. Pranata, N. W. Suniyadewi, and I. D. A. Rismayanti, "Teaching Methodologies Regarding Palliative Care Competencies on Undergraduate Nursing Students: A Systematic Review," Journal of International Dental and Medical Research, vol. 14, 2021.
[2] A. K. H. AlSaedi and A. H. H. AlAsadi, "A new hand gestures recognition system," Indonesian Journal of Electrical Engineering and Computer Science, vol. 18, 2019.
[3] H. Y. Lai, H. Y. Ke, and Y. C. Hsu, "Real-time hand gesture recognition system and application," Sensors and Materials, vol. 30, 2018.
[4] J. Zhou, D. Yungbluth, C. N. Vong, A. Scaboo, and J. Zhou, "Estimation of the maturity date of soybean breeding lines using UAV-based multispectral imagery," Remote Sensing, vol. 11, 2019.
[5] R. Kobai and H. Murakami, "Effects of interactions between facial expressions and self-focused attention on emotion," PLoS ONE, vol. 16, 2021.
[6] E. G. Krumhuber, D. Küster, S. Namba, D. Shah, and M. G. Calvo, "Emotion recognition from posed and spontaneous dynamic expressions: Human observers versus machine analysis," Emotion, vol. 21, 2021.
[7] F. Makhmudkhujaev, M. Abdullah-Al-Wadud, M. T. B. Iqbal, B. Ryu, and O. Chae, "Facial expression recognition with local prominent directional pattern," Signal Processing: Image Communication, vol. 74, 2019.
[8] K. R. Scherer, H. Ellgring, A. Dieckmann, M. Unfried, and M. Mortillaro, "Dynamic facial expression of emotion and observer inference," Frontiers in Psychology, vol. 10, 2019.
[9] M. Aziz and M. Aman, "Decision Support System For Selection Of Expertise Using Analytical Hierarchy Process Method," IAIC Transactions on Sustainable Digital Innovation (ITSDI), vol. 1, 2021.
[10] B. Gaudelus et al., "Improving facial emotion recognition in schizophrenia: A controlled study comparing specific and attentional focused cognitive remediation," Frontiers in Psychiatry, vol. 7, 2016.
[11] S. Volynets, D. Smirnov, H. Saarimaki, and L. Nummenmaa, "Statistical pattern recognition reveals shared neural signatures for displaying and recognizing specific facial expressions," Social Cognitive and Affective Neuroscience, vol. 15, 2020.
[12] G. Lou and H. Shi, "Face image recognition based on convolutional neural network," China Communications, vol. 17, 2020.
[13] Y. Takahashi, S. Murata, H. Idei, H. Tomita, and Y. Yamashita, "Neural network modeling of altered facial expression recognition in autism spectrum disorders based on predictive processing framework," Scientific Reports, vol. 11, 2021.
[14] G. Yang, X. Xi, and Y. Yin, "Finger vein recognition based on a personalized best bit map," Sensors, vol. 12, 2012.
[15] R. Shyam and Y. N. Singh, "Identifying individuals using multimodal face recognition techniques," 2015, vol. 48.
[16] A. Villar, M. T. Zarrabeitia, P. Fdez-arroyabe, and A. Santurtún, "Integrating and analyzing medical and environmental data using ETL and Business Intelligence tools," International Journal of Biometeorology, vol. 62, p. 1085, 2018.
[17] W. Villegas-Ch, X. Palacios-Pacheco, and S. Luján-Mora, "Artificial intelligence as a support technique for university learning," 2019, pp. 1–6.
[18] X. Palacios-Pacheco, W. Villegas-Ch, and S. Luján-Mora, "Application of data mining for the detection of variables that cause university desertion," Communications in Computer and Information Science, vol. 895, 2019.
[19] W. Villegas-Ch, S. Luján-Mora, and D. Buenaño-Fernandez, "Data mining toolkit for extraction of knowledge from LMS," 2017, vol. Part F1346, pp. 31–35.
[20] J. Liu, J. Tong, J. Han, F. Yang, and S. Chen, "Affective Computing Applications in Distance Education," 2013.
[21] H.-J. So, J.-H. Lee, and H.-J. Park, "Affective Computing in Education: Platform Analysis and Academic Emotion Classification," International Journal of Advanced Smart Convergence, vol. 8, 2019.
[22] E. Yadegaridehkordi, N. F. Noor, M. N. B. Ayub, H. B. Affal, and N. B. Hussin, "Affective computing in education: A systematic review and future research," Computers and Education, vol. 142, 2019.
[23] A. Alblushi, "Face Recognition Based on Artificial Neural Network: A Review," Artificial Intelligence & Robotics Development Journal, 2021.
[24] E. Roidl, F. W. Siebert, M. Oehl, and R. Höger, "Introducing a multivariate model for predicting driving performance: The role of driving anger and personal characteristics," Journal of Safety Research, vol. 47, 2013.
[25] K. R. Scherer and E. Coutinho, "How music creates emotion: A multifactorial process approach," The Emotional Power of Music, 2013.
[26] J. H. Chai, "Notice of Retraction: Study on harmonious human-computer interaction model based on affective computing for web-based education," 2009 1st International Conference on Information Science and Engineering, ICISE 2009, 2009.
[27] B. García, E. L. Serrano, S. P. Ceballos, E. J. Cisneros-Cohernour, G. C. Arroyo, and Y. E. Díaz, "Las competencias docentes en entornos virtuales: un modelo para su evaluación," RIED. Revista Iberoamericana de Educación a Distancia, vol. 21, 2017.
[28] M. Á. R. Catalán, R. G. Pérez, R. B. Sánchez, O. B. García, and L. V. Caro, "Las emociones en el aprendizaje online," RELIEVE - Revista Electrónica de Investigación y Evaluación Educativa, vol. 14, 2014.
[29] E. Yadegaridehkordi, N. F. B. M. Noor, M. N. B. Ayub, H. B. Affal, and N. B. Hussin, "Affective computing in education: A systematic review and future research," Computers and Education, vol. 142, 2019.
[30] E. Ivanova and G. Borzunov, "Optimization of machine learning algorithm of emotion recognition in terms of human facial expressions," 2020, vol. 169.
[31] R. W. Picard, "Affective Computing," Pattern Recognition, no. 321, 1995.
[32] P. Näykki, J. Isohätälä, and S. Järvelä, ""You really brought all your feelings out" – Scaffolding students to identify the socio-emotional and socio-cognitive challenges in collaborative learning," Learning, Culture and Social Interaction, vol. 30, 2021.
[33] N. A. A. Aziz et al., "Awareness and Readiness of Malaysian University Students for Emotion Recognition System," International Journal of Integrated Engineering, vol. 13, 2021.
[34] P. Partila, M. Voznak, and J. Tovarek, "Pattern Recognition Methods and Features Selection for Speech Emotion Recognition System," Scientific World Journal, vol. 2015, 2015, doi: 10.1155/2015/573068.
[35] T. M. Wani, T. S. Gunawan, S. A. A. Qadri, M. Kartiwi, and E. Ambikairajah, "A Comprehensive Review of Speech Emotion Recognition Systems," vol. 9, 2021, doi: 10.1109/ACCESS.2021.3068045.
[36] D. Singh, "Human Emotion Recognition System," International Journal of Image, Graphics and Signal Processing, vol. 4, 2012.
[37] S. Emami and V. P. Suciu, "Facial Recognition using OpenCV," Journal of Mobile, Embedded and Distributed Systems, vol. 4, no. 1, 2012.
[38] J. Sigut, M. Castro, R. Arnay, and M. Sigut, "OpenCV Basics: A Mobile Application to Support the Teaching of Computer Vision Concepts," IEEE Transactions on Education, vol. 63, no. 4, 2020, doi: 10.1109/TE.2020.2993013.
[39] M. Naveenkumar and V. Ayyasamy, "OpenCV for Computer Vision Applications," Proceedings of the National Conference on Big Data and Cloud Computing (NCBDC'15), 2016.
[40] H. Adusumalli, D. Kalyani, R. K. Sri, M. Pratapteja, and P. V. R. D. P. Rao, "Face Mask Detection Using OpenCV," 2021, doi: 10.1109/ICICV50876.2021.9388375.
[41] Y. Kumar and M. Mahajan, "Machine learning based speech emotions recognition system," International Journal of Scientific and Technology Research, vol. 8, no. 7, 2019.
[42] R. Alhalaseh and S. Alasasfeh, "Machine-learning-based emotion recognition system using EEG signals," Computers, vol. 9, no. 4, 2020, doi: 10.3390/computers9040095.
[43] Y. Uranishi, "OpenCV: Open source computer vision library," Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers, vol. 72, no. 5, 2018, doi: 10.3169/ITEJ.72.736.
[44] H. J. Yun and J. Cho, "Affective domain studies of K-12 computing education: a systematic review from a perspective on affective objectives," Journal of Computers in Education, vol. 9, 2022.

FIRST A. AUTHOR William Eduardo Villegas is a professor of Information Technology at the Universidad de Las Américas (Quito, Ecuador). He has a Ph.D. in computer science from the University of Alicante, holds a master's degree in communications networks, and is a systems engineer specializing in artificial intelligence and robotics. His main research topics include web applications, data mining, and e-learning. He has participated in various conferences as a speaker on topics such as ICT in education and how they improve educational quality and student learning. His main articles focus on the design of ICT systems, models, and prototypes applied to different academic environments, especially using Big Data and Artificial Intelligence as a basis for the creation of intelligent educational environments. In addition, his interests and scientific articles integrate cybersecurity techniques and methods for data protection in all environments that use ICT as a communication channel. He currently has a wide variety of articles indexed in scientific journals that arise from research projects supported by the universities of Ecuador.
SECOND B. AUTHOR Joselin García is a software engineer at Capmation Inc. (Quito, Ecuador), where her work focuses on software development and cybersecurity. Joselin is a Cybersecurity Engineer from the Universidad de las Américas, where she has participated in various research projects, demonstrating her ability to manage autonomous systems and applications that integrate emerging technologies such as artificial intelligence and data analysis. In addition, she has participated as a co-author of several scientific articles published in high-impact journals.

THIRD C. AUTHOR, JR. Santiago Sánchez Viteri was born in Quito, Ecuador, in 1984. He received the Systems Engineering degree from the Salesiana Polytechnic University (UPS), Ecuador, in 2017 and is currently pursuing a master's degree in Telematics Management at the same university. He has worked in the area of Telecommunications and IT Networking at the Universidad Internacional del Ecuador since 2010 and has participated in more than 10 indexed articles on telecommunications, big data, and innovation. He has been a professor of computer science at the Universidad Internacional del Ecuador, where he is currently an administrator of computer servers and internet connection equipment and administers the Moodle and CANVAS learning management systems.