2023IEEE

This article has been accepted for inclusion in a future issue of this journal.
Content is final as presented, with the exception of pagination.
Approaches, Applications,
and Challenges in
Physiological Emotion
Recognition—A Tutorial
Overview
By Y EKTA S AID C AN , B HARGAVI M AHESH , AND E LISABETH A NDRÉ , Senior Member IEEE
ABSTRACT | An automatic emotion recognition system can recognition, such as experiment design, properties of different
serve as a fundamental framework for various applications physiological modalities, existing datasets, suitable machine
in daily life from monitoring emotional well-being to improv- learning algorithms for physiological data, and several appli-
ing the quality of life through better emotion regulation. cations. It aims to provide the necessary psychological and
Understanding the process of emotion manifestation becomes physiological backgrounds through various emotion theories
crucial for building emotion recognition systems. An emo- and the physiological manifestation of emotions, thereby lay-
tional experience results in changes not only in interpersonal ing a foundation for emotion recognition. Finally, the tutorial
behavior but also in physiological responses. Physiological discusses open research directions and possible solutions.
signals are one of the most reliable means for recogniz-
KEYWORDS | Affective computing; deep learning; emotion
ing emotions since individuals cannot consciously manipulate
recognition; physiological signals; wearable.
them for a long duration. These signals can be captured
by medical-grade wearable devices, as well as commercial
smart watches and smart bands. With the shift in research I. I N T R O D U C T I O N
direction from laboratory to unrestricted daily life, commercial Emotions serve a significant role in human lives as they
devices have been employed ubiquitously. However, this shift assist in decision-making and forging social relationships.
has introduced several challenges, such as low data quality, The short-lasting emotional responses distinguish them-
dependency on subjective self-reports, unlimited movement- selves from affective states, such as mood or stress.
related changes, and artifacts in physiological signals. This However, enduring negative emotions may have severe
tutorial provides an overview of practical aspects of emotion effects if they are not managed well early. They may inhibit
learning among students [1], lead to burnout among work-
ers [2], and eventually lead to mental health disorders,
Manuscript received 30 September 2022; revised 1 March 2023 and 9 May such as anxiety- and mood-related disorders, schizophre-
2023; accepted 8 June 2023. This work was supported in part by the Deutsche nia, and substance abuse [3].
Forschungsgemeinschaft (DFG) through the Leibniz Award of Elisabeth André
under Grant AN 559/10-1 and in part by the Bavarian Ministry of Science and Automatic recognition of emotions (specifically negative
Arts through the ForDigitHealth project. (Corresponding author: Yekta Said Can.)
emotions, such as sadness, anxiety, fatigue, and anger) can
The authors are with the Chair for Human-Centered Artificial Intelligence,
University of Augsburg, 86159 Augsburg, Germany (e-mail: [email protected]; contribute significantly to a prescreening tool to prevent
[email protected]; [email protected]). adverse health consequences. Suppose that one is driv-
Digital Object Identifier 10.1109/JPROC.2023.3286445 ing a car for a long distance under time pressure and
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
P ROCEEDINGS OF THE IEEE 1

This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition
cannot afford to rest sufficiently. This condition reduces teach emotional responses to people with certain condi-
one’s attention on the road and makes one vulnerable to tions, such as autism, monitor enduring negative emotions,
mistakes and accidents. The U.S. Department of Trans- report emotions to physicians and psychologists through
portation claims that driving-related errors cause around a prescreening tool, track workers in dangerous lines of
95% of fatal road accidents [4]. A huge proportion of work, and notify authorities in the case of accumulated
these driving errors are caused by drowsiness or fatigue. fatigue, anxiety, or stress, thereby decreasing work acci-
However, an intelligent emotion detection system in a dents.
car, which can continuously monitor and detect fatigue or An ideal emotion recognition cycle in the wild com-
drowsiness using our physiological signals, could save lives prises emotion recognition and regulation components
by preventing accidents. Sending personalized alerts to the (see Fig. 1 for details). This tutorial provides the neces-
driver ahead of time to pause for a coffee break and change sary background and guidelines for developing such an
the music tempo or ambient temperature could ensure a emotion recognition system. Section II briefly describes
safer and more comfortable driving experience. the evolution of theories on how emotions are caused,
The advancement in sensing technologies has enabled represented, and regulated. Section III describes the phys-
computer scientists to develop automatic emotion recog- iological correlates of emotions and presents empirical
nition tools. Facial expressions [5] and speech [6] are evidence for emotion manifestation through physiological
adopted for emotion recognition due to the ease of changes in the body. Section IV describes the physiological
associating typical facial expressions and speech with emo- signals, devices to obtain them, and discriminatory fea-
tions. Physiology-based solutions have emerged as another tures. Section V presents the guidelines for designing and
alternative for emotion recognition research due to their implementing scientific experiments for data collection,
suitability for continuous monitoring in everyday life and along with prominent public datasets. Section VI describes
relatively fewer privacy issues. Wearable devices have state-of-the-art machine learning and deep learning tech-
emerged as pervasive instruments for passive quantitative niques appropriate for physiological time-series data. This
data collection. More than 330 million smart watches, article concludes with open research issues, insights, and
fitness trackers, and similar wearable devices have been recommendations for recognizing emotions in the wild.
sold, and the market has been growing each year [7]. Most
wearable devices can capture physiological, environmen- II. B A C K G R O U N D
tal, and activity-related information without interfering Several psychophysiologists proposed emotion theories to
with the user’s activities, making them a promising can- model the elicitation of an emotional experience. Although
didate for emotion recognition, especially in daily life. most of the emotion elicitation theories bear similarities in
Numerous emotion recognition studies in labora- the psychological and physiological elements that consti-
tory environments have been conducted over the past tute an emotional experience, they differ in the occurrence
decade [8], [9], and several public datasets are created order of these elements or the depth of description of
in these settings. However, the focus of research has the underlying process. Nevertheless, these theories laid
recently shifted from laboratory to daily life [10] since the foundation for emotion representation and regula-
emotion recognition in the laboratory differs significantly tion. Emotion representation frameworks consist of single
from daily life in terms of emotional stimulus charac- or multidimensional spaces where several emotions are
teristics, responses, and labeling [11], [12]. People can arranged. Such frameworks for emotion representation
differentiate between the artificial stimuli induced in the promote emotion modeling—detection and recognition.
laboratory and daily emotional stimuli that matter to This section concludes with the emotion regulation theory
them and react accordingly [11]. Researchers proposed that describes the potential strategies at various stages of
several emotion recognition techniques and tested them an emotional experience where individuals can regulate
in the wild [13], [14], [15]. Shu et al. [16] describe their emotions.
a framework for emotion recognition using physiologi-
cal signals and emphasize that emotion recognition in
the wild faces several challenges apart from the stimuli A. Emotion Elicitation
itself, such as emotion labeling and intersubject variability. Emotion refers to a change in the mental state arising
Saganowski et al. [12] systematically reviewed the litera- from a complex interaction between a stimulus in the
ture on wearable devices for emotion recognition in daily external environment and the internal state of an indi-
life and noted that most studies involved laboratory data. vidual. Although details of the emotion definition have
The approaches developed in a laboratory setting do not been controversial, the theories around emotion elicitation
have sufficient robustness to be employed in a real-time have converged on the crucial components of emotion,
monitoring system. Precise and robust emotion recogni- which define the characteristics of an emotional response.
tion in daily life is crucial for developing emotion-aware These theories, however, differ based on the process details
systems (i.e., personal agents or robots) that employ the of an emotional experience. James [18], [19] proposed
user’s emotions as feedback to adapt its behavior. It can that the physiological response precedes an emotional
be used to find personalized emotion regulation strategies, experience. According to this theory, an emotional stimulus
2 P ROCEEDINGS OF THE IEEE

Fig. 1. Ideal emotion recognition system for daily life is shown. It should continuously monitor the signals, and if it detects negative emotions,
it should suggest appropriate relaxation methods (emotion regulation support scheme) to return individuals to their baseline state [17].
activates the sensory cortex, thereby eliciting peripheral involves a sequence of appraisal checks, the response
responses. The feedback from the peripheral responses to which varies over different personalities and cultures.
then triggered an emotional response. This theory empha- Emotion researchers have proposed several such variables
sizes that the stimulus elicits a specific response pattern influencing the appraisal. Meanwhile, Ekman et al. [26]
that governs emotion quality. However, the theory does not challenged the theory of undifferentiated physiological
explain how physiological responses are initiated. Later, responses using empirical studies and showed that auto-
Cannon [20] argued against the specificity of physio- nomic responses are specific to emotions. Although there
logical responses for a given emotion and claimed that have been several propositions regarding emotion elici-
different emotions could elicit an undifferentiated physio- tation throughout history, the class of appraisal theories
logical (autonomic) response, such as an increased heart is preferred the most. Fig. 2 depicts the main compo-
rate (HR). Following Cannon’s empirical understanding, nents and the sequence of occurrence of these components
Schachter [21] proposed that different stimuli are likely based on the appraisal theory of emotions. The research
to produce similar physiological arousal, but a specific concludes that there is an endless and inconsistent list
emotional experience is produced by the cognitive process of components leading to an emotional experience in an
of consciously attributing the arousal to characteristics of individual. However, it is worthwhile to consider various
the stimulus. Therefore, according to this theory, attribut- factors that influence the subjectivity of the cognitive
ing arousal to different characteristics of the eliciting appraisal.
stimulus produces different emotions. Several researchers,
including Arnold [22], Scherer [23], and Lazarus [24],
argued against the conscious attribution of the physiologi-
cal arousal and instead claimed that the cognitive appraisal
of stimulus or the situation with respect to the individual’s
goals is likely to occur unconsciously, and it precedes
the physiological arousal. This concept of appraisal gave
rise to appraisal theories of emotion. While the first level
of appraisal focused on the situation itself, Lazarus fur-
thered his theory by adding the concept of coping or
secondary appraisal of a potentially dangerous situation by
individuals based on their capabilities. Roseman et al. [25]
suggested that the subjective evaluation of the situa- Fig. 2. Sequence of emotion elicitation based on the appraisal.
Cognitive appraisal of the external situation and the internal state
tion concerning an individual’s goals and accountability
of the individual result in emotions that further trigger various
also influences emotions, thereby making an appraisal physiological and behavioral responses. The changes in the mental
individualistic. The component process model proposed and physiological states of an individual constitute an emotional
by Scherer [23] suggested that a cognitive appraisal experience.

defined by the scales: pleasure, arousal, and dominance

(PAD) dimensions [32]. The PAD model was based on the
premise that emotions are the foundations for cognitive
judgments. Plutchik [33] used a hybrid approach with
the emotion wheel made of eight primary emotions rep-
resented with different colors and intensities on a polar
coordinate system, thereby establishing the spatial rela-
tionship between them. The intensity of the emotions is
proportional to the color intensity, and opposite emotions
are placed diagonally opposite in the wheel. Meanwhile,
the mixtures of primary emotions are presented in the
spaces in the outermost layer. Prior work [30] has revealed
that emotions represented as categories are better at cap-
turing subjective experiences through self-reports than the
commonly used dimensions such as valence and arousal.
Scherer [23] identified a robust alternative to categorical
emotions with eight cognitive dimensions leading to a
cognitive appraisal—novelty, pleasantness, fairness of the
situation, and the individual’s perception of goal, coping
Fig. 3. Russell’s circumplex model of affect [31] depicting emotions ability, accountability, morality, and self-consistency. Each
on a 2-D space. V stands for valence, and A stands for arousal. Can basic emotion was found to exhibit a specific appraisal
and Ersoy [35] selected the five highlighted emotions for their study. profile along the eight dimensions, and these appraisal
The figure becomes lighter when the valence is more positive. When
profiles have been used to distinguish different emotion
the arousal increases, the red color becomes more evident (similar to
categories qualitatively [34].
an alarm).
C. Emotion Regulation
B. Emotion Representation
Emotions are helpful when they enhance our
Emotions have been represented as discrete or cate- decision-making and motivate socially appropriate
gorical emotion states and in continuous or dimensional behaviors. Nevertheless, they could also be unhelpful
emotion space. Over the past decades, different sets when they are inappropriate for a given situation or are
of primary emotions have been categorized by emotion of inappropriate intensity, higher frequency, and longer
researchers. James [19] identified emotion categories, duration. Emotion regulation is required when these
such as fear, grief, love, and rage as coarse emotions as unhelpful emotions lead to collateral damage or harm to
they involve strong physiological changes. Ekman [27] oneself or others. Emotion recognition systems have the
proposed a finite set of emotions having distinctive physio- potential to assist in emotion regulation.
logical signatures and universal signals, and called them One needs to assume a positive goal in order to regulate
basic emotions, having common features, such as rapid emotions. Such a goal could be to feel less sad or to lead
onset, short duration, automatic appraisal, and coher- a healthy lifestyle. Emotion regulation could be intrinsic,
ent responses. The basic emotions constitute anger, fear, where an individual regulates one’s own emotions, such
sadness, enjoyment, disgust, and surprise. However, the as encouraging oneself after a job rejection, or extrinsic,
universality in the definition of basic emotions limited where an individual regulates another person’s emotions,
the representation of the complexity of the emotion gen- such as a parent consoling a child. Individuals have differ-
eration process among different individuals. Researchers ent strategies to regulate emotions, and not all strategies
extended the list of basic emotions to 15 [28], 17 [29], and work. Hence, one must find the emotion regulation strat-
27 emotions [30]. However, the similarity among the emo- egy that works for them. Gross [36] proposed the process
tions could not be gauged with such emotion categories model of emotion regulation, which is a framework for
though they could be broadly classified into positive and identifying emotion regulation strategies at several steps
negative emotions. Conversely, the dimensional model rep- involved in emotion generation. The steps involved in emo-
resents emotions in a continuous multidimensional space tion generation and the regulation strategies are depicted
that denotes a systematic relationship between different on a time axis in Fig. 4. Each step presents a potential
emotions. A prominent example is the Circumplex model opportunity for regulation. The first strategy is situation
proposed by Russell [31], which is defined by two orthog- selection, where an individual can choose the situation
onal dimensions: valence and arousal. The two dimensions that will have the least negative emotional impact on
depicted the subjective experience and the extent of physi- the future. This strategy is also used in cognitive behav-
ological activity. An example of its application is presented ioral therapy, where the interventions increase a person’s
in Fig. 3. Another commonly adopted model is a 3-D space exposure to positive state-inducing activities. However,

behavior by specifying relations among different modal-

ities [40]. Theories also help researchers accurately
hypothesize and model the antecedents and consequences
of emotion. While emotion theories help explain a model’s
predictions, they also assist in reasoning the variance in
the predictions. Hence, theories act as a foundation for
prediction models.
III. P H Y S I O L O G I C A L C O R R E L A T E S
OF EMOTIONS
An emotional experience constitutes changes in the psy-
Fig. 4. Steps involved in the emotion regulation process [36]. chological and physiological states in response to a
An individual can regulate their emotions at various stages through stimulus. Early studies reported specific physiological and
their choices. They can select situations that may have a lesser behavioral patterns for different emotions [26]. Later, the
negative emotional impact or modify the existing situation to avoid
studies investigated how the human brain, which hosts
negative emotions. In addition, they can choose which aspects of a
situation to focus on and which meaning to attach to an aspect. Once
the emotion-processing center of the human body and also
the emotion is generated, they can choose to modulate the responses regulates the organs of the human body that it innervates.
by suppressing or expressing them differently. The human nervous system mainly comprises the cen-
tral and peripheral nervous systems. The central nervous
system includes the brain, its stem, and the spinal cord,
interventions for situation selection are challenging since whereas the peripheral nervous system includes the net-
it is hard to gauge one’s intrinsic feelings about different work of nerves passing through different types of muscles.
situations, mainly when driven by an impact bias. Another The peripheral system is further divided into autonomic
strategy is situation modification, where an individual can and somatic nervous systems. These two systems play a
physically alter an existing situation, such as moving away primary role in regulating the physiological and behavioral
from a negative emotion-eliciting scene, person, or object. responses to emotions. The autonomic nervous system
In addition, one can choose to focus on a certain favorable (ANS) is further divided into two branches: the SNS,
aspect of the given situation. This strategy is known as responsible for stimulative functions, and the parasympa-
attention deployment. When facing a situation and a par- thetic nervous system (PNS), responsible for restorative
ticular aspect that elicit negative emotions, one can choose functions in the body. The ANS traverses the end effectors,
to attach a meaning to that aspect that may elicit more pos- including smooth cardiac muscles and glands that are
itive emotions. This strategy is known as cognitive change, predominantly involuntary. The somatic nervous system
and one way to achieve it is through cognitive reappraisal. makes up the nerves in the skeletal muscles that are often
Once the emotion has been evoked, one can modulate one voluntary. The following paragraphs describe, with the
or more of the behavioral, experiential, and physiological help of previous research, the manifestation of emotions
response tendencies, such as using physical exercise as through different physiological responses in coordination
an intervention. While adaptive forms of emotion regu- with ANS.
lation are vital for the successful functioning of humans
in daily life, the autonomic and behavioral responses
due to regulation may overlap with those of emotion A. Brain Responses
expression. Hence, it necessitates the consideration of The brain, along with serving several necessary func-
emotion regulation while detecting emotions. For example, tions in our daily lives, plays a significant role in emotion
studies have shown that emotion regulation strategies, expression and regulation. Various regions of the brain
such as suppression through facial expressions, result in are involved in emotion processing, such as the amygdala,
decreased facial activity [37] but an increase in sympa- prefrontal cortex, insula, and cingulate cortex. Amygdala
thetic nervous system (SNS) activity, such as increased is known to be involved in a negative emotional response.
blood pressure [38]. However, the self-reported subjective When the emotion processing regions are active, several
experiences remained unchanged. In contrast, the regula- neurons located in the cerebral cortex communicate by
tion strategy of cognitive reappraisal decreased HR and generating electric potentials synchronously. This neural
corrugator muscle activity [39]. Therefore, understanding activity collectively results in electric activity, which can
the impact of various regulation strategies potentially aids be measured by placing electrodes on the scalp. Studies
better emotion recognition, provided that such interfer- show that individuals exhibited relatively higher neural
ence in physiological responses to emotions is carefully activity in the left prefrontal cortex for positive emotions
modeled. and higher right prefrontal cortex activity for negative
Emotion elicitation and regulation theories pro- emotions [41], [42]. The neural activity is measured in
vide interrelated components that explain or predict terms of the asymmetry in the neural activation of the
characteristics of human emotions and corresponding left and right prefrontal cortices [43]. Furthermore, such

neural activity in active regions of the brain demands more that send and receive muscle contraction information. The
oxygen and nutrients, and this results in increased blood depolarization of motor neurons upon contraction results
flow to that region. in electrical activity measurable from the skin surface.
B. Cardiac Responses E. Respiration Responses

The cardiovascular system comprises the heart and Respiratory organs, mainly lungs, are dually innervated
blood vessels. The heart is responsible for pumping blood as well. Respiratory sinus arrhythmia (RSA), a phe-
to all parts of the body. It has specialized muscle cells that nomenon where the heart contracts and relaxes as a func-
generate electrical impulses that initiate the heart con- tion of respiration due to the inherent coupling between
tractions or heartbeat. Activation of SNS due to negative breathing and blood pumped by the heart, is a noninvasive
emotions or various stress stimuli results in the release index of parasympathetic activity as it arises from the
of substances called neurotransmitters that bind to the fluctuations in the vagal control [47]. The chemoreceptors
cardiac muscles stimulating an increase in the heartbeat in the arteries detect small decreases in the amount of oxy-
rate or HR, whereas activation of PNS due to relaxation gen or increases in carbon dioxide and trigger respiration
or positive emotions results in a decrease in the rate and activity. Negative emotions, such as anger, trigger a higher
force of heart contractions. Any change in contractions respiration rate than positive emotions [48].
is associated with a change in the electrical activity of
the heart. Since the cardiovascular system is dually inner- F. Behavioral Responses
vated, i.e., simultaneously controlled by both sympathetic
and parasympathetic branches of ANS, the end response Although closely tied to the ANS-mediated responses
measurements will not reveal the activity of individual described above, we categorize action tendencies or
branches due to the reciprocal control due to dual innerva- expressions driven by underlying changes in the physio-
tion. For instance, an increase in HR could be influenced by logical state as behavioral responses. Behavioral changes
increased activity of the sympathetic branch or decreased resulting from emotions include changes in facial expres-
activity of the parasympathetic branch, or a combination of sions, gait, speech properties, body postures, gestures, and
both where either of the activities dominates [44]. Hence, so on. For instance, speech is influenced by respiration
specific features, such as HR variability (HRV), should be rate. A variation in respiration rate triggered by SNS
considered to differentiate the two types of activation. affects the air pressure below the larynx. The variation
in air pressure affects the opening and closing of the
vocal folds, thereby resulting in variations in voice inten-
C. Skin Responses sity [49]. Emotion-specific variations in speech have been
The outer layer of the skin is capable of conducting studied [50]. Furthermore, underlying emotions are found
electricity but offers a certain level of resistance. The to activate various facial muscles, resulting in facial expres-
middle, dermis layer of our skin, comprises the blood sions. Ekman et al. [26] conducted a pioneering study on
vessels and sweat glands. Sweat glands, mainly inner- facial expressions and autonomic responses, discovering
vated by the SNS, produce moisture to facilitate grasping that the activation of prototypical facial muscles or action
during the fight–flight reaction. When SNS is activated units is associated with corresponding changes in the ANS
due to emotional stimuli, the emitted neurotransmitters activity. A specific combination of action units is involved
induce changes in the resistance (or conductance) of the in particular emotions [27]. For instance, negative emo-
skin. According to secretion theory [45], the changes in tions, such as sadness, activate the action units near the
skin conductance are triggered by sweat gland activity. eyebrows, whereas positive emotions, such as happiness,
Furthermore, due to the evaporation of sweat, the skin activate the action units of the cheek.
temperature (ST) reduces as well.
IV. M E A S U R E M E N T O F E M O T I O N
D. Muscle Responses RESPONSES
This section is dedicated to exploring the measurement
ANS activity elicited by emotions can lead to changes techniques of the modalities that were discussed in
in muscle activity, both voluntary and involuntary. Invol- Section III. We will examine the features of each modality
untary muscle movements include tensing up of shoulders and how they are utilized for detecting emotions (see
and twitching due to the activation of the SNS [46]. Table 1).
Voluntary muscle movements may include facial expres-
sions. Even though the boundary between involuntary
and voluntary muscle movements is not always clear, A. Brain Activity
both types of movements can generate electrical activity Electroencephalogram (EEG) involves recording and
in the muscles. The skeletal muscle fibers that make up amplifying the collective electrical signals generated by bil-
the muscle tissue are innervated by the motor neurons. lions of nerve cells through the use of electrodes and wires
Motor neurons are a part of the somatic nervous system attached to the scalp. Despite its ability to offer researchers

Table 1 Activity Types and Corresponding Measurement Types
tremendous time resolution, the spatial resolution of EEG wave duration, peak amplitude, instantaneous frequency,
is relatively low, and it requires multiple electrodes to complexity, and energy [60]. In frequency-domain anal-
be placed at various locations on the head. Nonetheless, ysis, brain rhythms are very well established. Gamma
EEG remains a valuable tool for investigating phase tran- waves can be found over 30 Hz and related to activity
sitions in response to emotional stimuli [51]. Functional in fronto-central areas. They have the highest frequencies
neuroimaging techniques, including positron emission and can be used to monitor regions related to volun-
tomography (PET) and functional magnetic resonance tary movements, cognitive functioning, learning, memory,
imaging (fMRI), have been utilized to investigate the and processing information [61]. Beta waves are between
impact of emotion on the limbic system [52]. Researchers 14 and 30 Hz and are related to activity in the pari-
discovered emotion-related increases in cerebral blood etal, somatosensory, frontal, and motor areas. They are
flow or blood-oxygen-level-dependent signals in corti- seen during awakened states, and they are correlated
cal, limbic, and paralimbic regions. This suggested that with memory, focus, and problem-solving functions. Alpha
specific brain regions have specialized functions for emo- rhythms are between 8 and 13 Hz and are related to
tional operations. To investigate this specificity, researchers occipital and parietal regions. Alpha rhythms are made
induced visual, auditory, and recall-based stimuli to rec- up of the subconscious activity of the brain, and they
ognize emotions by analyzing the activated regions using are related to relaxed and mediated mind states. Another
PET and fMRI technologies [53], [54]. In addition, EEG is rhythm is the theta rhythm and related to the hip-
noninvasive, fast, and cost-effective, making it a preferred pocampus region. They are commonly observed under
method for investigating the brain’s responses to emotional drowsy, daydreaming, and sleep states. The last rhythm
stimuli [55]. EEG is commonly combined with speech [56] is delta waves, and they are the slowest brain waves.
and facial expression [57] data to improve the robustness They can be observed during deep sleep states. Frequency-
of emotion recognition systems. Recently, new EEG devices domain features are mostly built up on well-established
have emerged in the market, which offers several advan- brain rhythms. δ, θ, α, β, γ, θ/α, β/α, (θ + α)/β, θ/β, γ/δ,
tages, such as unobtrusiveness, affordability, portability, mean, median, variance, standard deviation, and reflec-
and ease-of-use. These devices, such as the Emotiv Epoch tion coefficients are commonly used frequency-domain
14-channel, the Emotiv Insight 5-channel, and the Omnifit features.
Brain 2-channel headsets, are typically equipped with
10–20 electrodes and can be utilized to capture raw EEG B. Electrical Activity of Heart
data. There are two methods for measuring heart activ-
1) Preprocessing: There are two types of artifacts that ity: electrocardiography (ECG) and photoplethysmography
can affect EEG data: technical (extrinsic) artifacts and (PPG). ECG sensors use multiple electrodes placed sym-
physiological (intrinsic) artifacts [58]. Technical artifacts metrically on specific areas of the body to measure the
include noise from electrode misplacement, powerline heart’s electrical activity, resulting in an ECG signal with
interference, and other electromagnetic interferences, essential information, including the R peak, which is com-
while physiological artifacts include eye movements and monly used for extracting emotion-specific features [62].
blinks (electrooculogram artifacts), muscle activities (elec- On the other hand, PPG sensors measure the changes
tromyogram artifacts), and cardiac activities. Frequency- in blood volume by measuring the extent of reflection
domain filters, such as a bandpass filter between absorbed by skin-reflected infrared light initially emitted
0.5 and 60 Hz, can remove most technical artifacts. How- by a light-emitting diode, resulting in a PPG signal that
ever, removing physiological artifacts is more complex and can be used to estimate R peaks from the peaks of blood
requires the use of threshold-based time-domain filters and volume (refer Fig. 5). Although PPG data have lower
independent component analysis techniques [59]. quality and are more susceptible to motion artifacts under
physically active situations, they offer greater unobtrusive-
2) Feature Extraction: EEG features can be divided into ness and can be used without interrupting users during
two groups: time and frequency domains. Time-domain long experiments in daily life. Therefore, sensors should
features can be listed as mean, median, variance, stan- be selected based on the performance requirement, exper-
dard deviation, skewness, kurtosis, zero crossing rate, iment duration, and environment of the study. Devices

providing raw ECG data include BIOPAC’s MP150, MP35, C. Muscle Activity
Shimmer Sensing 3, Polar H9, Polar H10, Firstbeat Body- Electromyography (EMG) utilizes electrodes to quan-
guard 2 and 3, Zephyr HxM, and Bitalino (r)evolution. tify the electrical activity changes in muscles as a
Wristbands such as Empatica E3 and E4, Samsung Galaxy result of contraction. The facial and trapezius muscles
S1 and S2, Angel, Polar Verity Sense, and finger sensors, are the most extensively examined muscles for emo-
such as CorSense, UFI model 1020, and BIOPAC BioNo- tional responses [68]. Facial muscle activity is commonly
madix PPGED-R, provide raw PPG data. employed for emotion recognition and is recognized via
1) Preprocessing: Robust artifact detection and removal the facial action coding system (FACS) [69]. While the
algorithms are applied before processing the PPG data. visual inspection is subjective in nature and has poten-
In the literature, several frequency- and time-domain filters tial coding errors, facial EMG is an objective method
have been used. Generally, every data point is compared with fewer true negatives than [46]. However, facial
with the local average for time-domain filters. A data EMG measurement may be intrusive and alter the par-
point is labeled as an artifact if the percentage of differ- ticipant’s natural behavior. Facial expressions resulting
ence is greater than a certain threshold (approximately from muscle activity will be discussed in greater depth in
20% [63]). The commonly used frequency-domain fil- Section IV-G. Yet, the importance of bodily expressions of
ters include Butterworth high-pass filters with a cutoff emotions is currently being investigated as they have been
frequency of 1 Hz to eliminate baseline wander, low- found to correlate with facial expressions during social
pass filters with a cutoff around 25 Hz to eliminate interactions [70].
high-frequency artifacts (also from other sensors, such as 1) Preprocessing: The EMG signal is often affected by
EMG), and band rejection filters to eliminate power line noise. The possible noises include the motion artifacts
interference between 50 and 60 Hz [64]. The removed arising from user motion or cable and electrode inter-
artifact data points can be replaced using different interpo- faces, inherent device noise, and ambient noise [71].
lation techniques. The cubic spline interpolation is one of Frequency-domain filters are applied to remove artifacts
the most commonly used techniques since it has a structure in specific frequency bands [72]. In addition, adaptive
similar to the heart activity signal. prediction error filters have been proposed for eliminating
2) Feature Extraction: HR is commonly used to estimate nonstationary artifacts affected by factors such as stimu-
the degree of emotions. It can be calculated by counting lation intensity, fatigue, and the contraction level of the
the number of heartbeats per minute. Alternatively, the muscle [71].
time interval between consecutive R peaks called the RR 2) Feature Extraction: Muscle activity signals obtained
interval or interbeat interval (IBI) is used. IBI has an from the EMG sensor include the superposition of actions
inverse relationship with HR. HRV is another widely used of numerous motor units. Therefore, they need to be
measure for heart activity, and it can be computed from the decomposed to reveal the mechanisms of muscle and
distribution of RR intervals over a time interval. Variation nerve control. The decomposition is commonly performed
in HRV corresponds to SNS and PNS activities. using wavelet spectrum matching and principle compo-
HRV features can be extracted from time and frequency nent analysis of wavelet coefficients [71]. Commonly
domains. Mean HR, standard deviation of IBI, mean RR, extracted features include wavelet-based features [73],
root mean square of successive differences (RMSSD) of Mel-frequency cepstral coefficients [72], and statistical fea-
respiration rate (RR) intervals, and the percentage of tures, such as mean, standard deviation, rms, peak loads,
successive RR intervals that differ from the previous RR and gaps per minute. Furthermore, muscle tremors are
interval by more than 50 ms (pNN50) are considered the known to be signs of different emotions [74], and they
most distinctive time-domain features. The IBI data should can be detected around 11 Hz using frequency-domain
be converted to the frequency domain to extract frequency- analysis.
domain features. Since R-peaks are not equidistant, either
the IBI signal needs to be resampled to obtain equidistant
samples in order to use fast Fourier transform or methods D. Skin Activity
such as Lomb–Scargle periodogram [65] can be used. After Electrodermal activity (EDA) is the activity of the skin
the conversion to the frequency domain, powers in very where the electrical properties change based on the emo-
low, low, prevalent low, high, and prevalent high-frequency tion a person experiences. EDA is measured in terms
ranges and the ratio of power in low- to high-frequency of change in skin conductance estimated by passing a
ranges are commonly extracted. small amount of current through silver–silver chloride
Several nonlinear features of HRV [66] are evaluated electrodes. An instantaneous surge in skin conductance
using various state-space domain entropy-related mea- constitutes the phasic component of EDA. Darrow [75]
sures. The most commonly used measures are the standard found a correlation between the sweat gland activity and
deviations of the Poincare plots, approximate and sample the phasic skin conductance response (SCR) upon expo-
entropy, correlation dimension, recurrence, and fluctuation sure to an emotional stimulus; however, there is a delay
slopes [67]. of a few seconds between the two. The dc component

Fig. 5. Recorded signals from a laboratory experiment comprising four phases. In the first phase, the baseline is shown. In the second phase,
participants are induced with mental stress using TSST. The changes in BVP and EDA signals can be observed in the stress phase. The third
is a recovery phase using breathing exercises. The last phase is a physical activity phase with increased acceleration, EDA, and BVP signal
activities.
of EDA is the skin conductance level (SCL) and is either tool) and pyEDA [80]. The tonic component is used
low or high in resting and activated states, respectively. for long-term baseline measurement using statistical fea-
Although EDA is a good approximation of SNS activity tures, such as the mean, minimum, maximum, standard
and an easy yet inexpensive way to measure, it is unre- deviation, quartile deviation, 20th percentile and 80th
liable when the subject moves or the external temperature percentile of values over an interval, and first and second
conditions vary. Furthermore, researchers must be cautious derivative features. For short-term arousal detection, fea-
while measuring the EDA signal due to the factors such as tures from the phasic component, such as the peaks count
the contact between the electrodes and the measurement over a specific duration, the total number of peaks above
area, the salinity of the electrolyte, skin area preparation, a certain high threshold (one micro Siemens) over a dura-
the controllability of the stimulus, and respiration matter. tion, the delay between stimulus and peak response, peak
EDA is a promising signal for emotion recognition along amplitude, and rise and recovery times, are measured.
with the heart activity signal. Measuring instruments, a) Skin temperature: Besides emotions, STs are
such as Shimmer 3 GSR+, ProComp Infiniti, Bitalino affected by various factors, such as weather and physical
(r)evolution, BIOPAC MP150, and wrist devices such as activity. Previous research has shown that increased blood
Movisens EDAMove 4, Empatica E3, and E4, are widely flow due to arousal induces about 0.1 ◦ C–0.2 ◦ C change
used to measure EDA, which provides raw data [68]. in ST [81]. With controlled external factors, such subtle
changes in the ST resulting from an emotional response
1) Preprocessing: EDA increases with physical activ- can be measured. Often, ST is combined with additional
ity and changes in temperature as they cause sweating. biosignals to get a more robust recognition performance.
Therefore, a multimodal approach with physical sensors is Standard time-domain statistical features of ST signals are
required to isolate the effect of emotional state changes used in the literature.
on EDA. Physical activity measured using accelerometer
sensors and external temperature changes inferred by ST
sensors can be useful. There are several preprocessing E. Blood Pressure
tools for cleaning the EDA signal. Though wavelet-based High-arousal negative emotions cause an increase in
artifact removal techniques are common in the lit- blood pressure levels, whereas low-arousal positive emo-
erature [76], [77], supervised machine learning-based tions can decrease them [82]. Recently, commercial
techniques [78] for artifact removal exist. Manually anno- wearables have been equipped with blood pressure sen-
tated data labeled by experts for artifacts are used to train sors, namely, ASUS VivoWatch BP (HC-A04) and Omron
supervised models. HeartGuide. These devices make it possible to monitor
blood pressure levels continuously. Systolic and diastolic
2) Feature Extraction: The EDA signal has two com- components of blood pressure can be used as features.
ponents: SCL and SCRs. SCL is a slow-changing dc
component, also called the tonic component. In contrast,
SCR is an event-related and short-term component of the F. Respiration
EDA and is also called the phasic component. There are Furthermore, respiration data are used to decouple
open-source tools for analyzing the EDA signal, such as the EDA data from the effects of breathing. Respiration
cvxEDA [79] (a convex optimization-based EDA analysis measurement is inexpensive as it involves a simple belt

containing a piezoelectric device. However, one should electroglottography, it is easier to capture emotion-related
beware of possible issues during the measurement. For patterns of speech in microphone audio data. Recent
example, the tightness of the chest strap may lead to advances in machine learning have resulted in learning
either ceiling effect or inaccurate recordings, the dis- emotional feature representations from speech data [94].
comfort caused by the strap leading to new breathing Furthermore, transformer-based speech emotion models
patterns, or voluntary controlled breathing. Breathing have led to improved recognition of positive and negative
rate and amplitude can be indirectly measured using emotions, with good generalization and robustness across
transducer-based sensors that rely on chest cavity expan- different domains, speakers, and genders [95].
sion [83], [84]. PPG data from wearable devices can be In addition, recent research has demonstrated that
used to derive respiration rate [85]. Statistical features, alterations in body posture can indicate changes in affec-
such as minimum, maximum, mean, and standard devia- tive states [96], [97]. Consequently, numerous studies
tion of respiration rate, mean and standard deviation of have investigated the utilization of body postures and
the first and second derivatives, and frequency-domain movements for emotion recognition [98]. Specific body
features such as spectral power [16], are extracted. postures, such as head tilts and clenched fists, have
In addition, nonlinear features are extracted using recur- been linked to the expression of specific emotions [99],
rence quantification analysis, deterministic chaos, and [100], suggesting their involvement in nonverbal com-
detrended fluctuation techniques [86]. munication and emotion perception. Moreover, recent
studies have revealed that body movements [101] includ-
G. Measurement of Behavioral Responses ing measures such as the velocity of joints, acceleration,
Behavioral responses are best suited for noncon- and jerk, and other gesture-specific features such as
tact measurement. Behavioral responses are commonly height, angle, and movement direction of the hands and
combined with physiological signals to obtain a more arms, body movement trends, head movement, symme-
accurate emotion recognition system. Yang et al. [87] try [102], and gait [103] can carry information relevant to
combined several behavioral (facial expression, speech, emotions.
and keystroke) and physiological (blood volume, EDA,
and ST) modalities and achieved 89% accuracy for binary
emotion recognition. One of the advantages of deep H. Contextual Information
learning approaches is their ability to effectively utilize Context influences an emotional experience but is chal-
multimodal data, which includes information from phys- lenging to obtain in an uncontrolled setting, such as daily
iology, facial expressions, and speech. Moreover, facial life. Nevertheless, the system’s robustness can be increased
muscle activity has independently aided emotion recog- by adding contextual information to the physiological sig-
nition. The measurement has started with facial EMG, nals. Activity-associated context actively acquired from the
but, recently, RGB cameras have been used more com- user in combination with HRV significantly increased the
monly to capture emotions from facial muscle activity. stress detection performance in the wild (around 25%
The discovery of action units by Ekman et al. [26] led increase in F1-score) [104]. Since the active acquisition
to the development of the FACS. This system represents of context from the users may interrupt them, passive
facial expression prototypes in terms of the location of acquisition using smartphone data may provide helpful
action units on the face [69], and geometry-based facial insights. Passive context based on physical activity and
feature extraction approaches that involve the position, location, smartphone activity (calls, SMS, applications,
size, and shape of facial landmarks were developed to battery status, and screen usage), and ambient condi-
detect these action units [88]. In addition, appearance- tions (light and weather) can detect stress independent
based approaches that utilize color intensity and texture of physiological signals [105]. Context based on smart-
of facial features, such as spatial filters and local binary phone activity has been used in addition to physiological
patterns [89], have also been explored. Early approaches data, such as EDA [106]. A 10%–15% increase in stress
to facial emotion recognition primarily relied on tradi- recognition accuracy was reported when weather data (air
tional emotion classification methods that utilized these temperature, humidity, and air pressure), activity informa-
extracted features from facial expressions. However, with tion, and physical activity intensity were added to HRV and
the availability of large datasets and advancements in EDA signals [107].
computing technology, recent research has introduced
deep learning approaches that can inherently capture the
nuances of facial expressions from images and directly I. Combination of Multiple Physiological Signals
classify them into emotions [90], [91]. and Interdependencies With Other Modalities
Speech signal has been widely combined with physiolog- Emotion recognition studies often combine multimodal
ical signals [92], [93] and improved emotion recognition physiological signals to obtain a more comprehensive view
performance. Emotion-specific variations in the speech of emotional states [74]. Adding more modalities can
were identified several decades ago [50]. While the elec- eliminate the drawbacks of individual signals and develop
trical activity of the vocal cords can be measured through more robust systems. Soleymani et al. [108] investigated

the interactions between EEG signals and facial expres- The behavioral (i.e., speech and body movements) and
sions for emotion recognition. In particular, they show muscle-based responses (such as facial expressions) are
that informative features of EEG signals originated to a more robust in controlled environments than physiological
large extent from facial expressions. Insights on potential signals. Moreover, researchers obtain robust performance
artifacts in channels of affect-related information could with EEG signals, especially in laboratory or controlled
be deployed when designing fusion processes and, thus, environments. In more controlled situations, they can be
contribute to a more reliable emotion recognition process. preferred. However, it is challenging to monitor speech and
The multimodal fusion process is of three different facial expressions in the wild, and users will be reluctant
types: early, intermediate, and late [16]. to wear EEG devices in daily life although they are more
reliable. Therefore, the story is different for daily life emo-
1) Early Fusion: This type of fusion occurs at the feature
tion recognition. User’s self-reports reflecting the issues
level by selecting the features from multiple signals and
such as comfort and utility are more important for daily
combining them to form a single input for feature extrac-
life [112]. Wrist-worn devices have advantages in these
tion or classification. Fabiano and Canavan [109] used a
aspects, but they have lower data quality [113]. Therefore,
feature-level fusion and showed a 10%–15% improvement
the selection of modalities and wearable devices is a mul-
in valence, arousal, and dominance recognition. However,
tivariate problem, and researchers should make a tradeoff
this fusion is suitable for synchronized input signals.
by evaluating their application-specific requirements.
2) Intermediate Fusion: This kind of fusion can overlook
synchronization issues by leveraging feature extraction
from different time lengths. Furthermore, by comparing K. Unexpected Observations
previous instances with the current ones, probabilities for In some of the experiments, researchers observed unex-
imperfect instances can be statistically predicted [16]. pected phenomena while analyzing the data. The most
Methods using hidden Markov models and Bayesian net- common ones are observed during the emotion elicitation
works are practical for dealing with these situations. phases of experiments. Wagner et al. [114] observed that
Shin et al. [110] used a Bayesian network to fuse features all classification algorithms had particular problems in
from EEG and ECG for recognizing comic, fear, sadness, separating pleasure and sadness that they found surpris-
joy, anger, and disgust emotions and increased the accuracy ing. After further analysis, it is revealed that listening
by 35.78%. to sad music may elicit positive feelings [115]. Emotions
are complex phenomena, and assumptions made while
3) Late Fusion: This type involves aggregating results
designing experiments for emotion elicitation might not
generated by different classifiers to obtain a final result,
hold on some participants. In another case, researchers
often through voting. The classifiers can be trained
noticed that some participants did not report stress in the
separately on each modality, hence not requiring syn-
arithmetic phase of the trier social stress test (TSST) in the
chronization [16]. Wang et al. [111] applied three SVM
questionnaires [116]. They recruited the participants from
(RBF kernel) classifiers to power spectral, Higuchi frac-
a university, and they saw that students from mathematics
tal dimension, and Lempel–Ziv complexity features. They
or computer science departments tend to report low stress
integrated these classifiers by employing a weighted fusion
in the arithmetic phase of TSST. Therefore, to detect these
strategy that computes confidence estimation on each class
unexpected observations, perceived emotion self-reports
by each classifier. They evaluated their approach on the
can be collected and cross-referenced with the elicitated
DEAP dataset (on EEG data) and showed that this late
emotional context (whether the participant watches a sad
fusion method outperforms the performance of individual
video or stress is induced) to validate whether the experi-
classifiers and the early fusion methods.
enced emotion is the same as the elicitated one. Moreover,
although multimodality yields generally better results, it is
J. Insights not always the case [117]. Sometimes, research using only
Multimodality has advantages such as increased redun- ECG or EEG data showed better or sometimes worse than
dancy, i.e., when one signal fails to detect emotion in the multimodal approaches. One signal can be dominantly
a specific situation, thereby improving prediction perfor- better than others for a task or all signals can be noisy
mance. Furthermore, specific signals can be used to detect in similar intervals, and they could not contribute to the
and remove artifacts from other signals. For example, EDA performance of others. In these cases, multimodality does
is very sensitive to physical activity and increased room not necessarily improve performance. In the context of
temperature. Under such conditions, changes in EDA could interpersonal differences, a study found that women and
be falsely regarded as increased arousal or valence. Accel- men do not react the same way and also showed different
eration and ST data can be used for cleaning the artifacts patterns in physiological (skin conductance) recording.
in EDA data [78]. In addition, some signals, such as ST Women were found to be more emotionally expressive
and respiration, achieve better results when combined than men [118]. Individuals using an emotional regulation
with additional biosignals [16]. The selection of modalities strategy, such as suppression, yield different physiological
depends mostly on application type and environment. responses to emotions than those [119].

V. E X P E R I M E N T A L D E S I G N F O R an emotion recognition study involving participants

P H Y S I O L O G I C A L D ATA C O L L E C T I O N from a specific age group or cultural background may
A N D E X I S T I N G D ATA S E T S not be generalized to other populations. Depending
Data collection in emotion research has no consensus on on the context of the study, a larger and more diverse
emotion elicitation and measurement methods owing to sample population size is crucial to overcome the
the highly subjective nature of emotions. However, sev- interindividual variability and the effect of confound-
eral measures can be adopted during data collection to ing variables. Sample size can be obtained using
capture emotions reliably and facilitate effective emotion appropriate statistical tools, such as G*Power [140],
recognition. The most important factors to be considered Krejcie and Morgan’s formula [141], or Cochran’s
during data collection are sample population, emotion sample size formula [142], by specifying the allowed
stimulus, modalities measured, emotion annotation pos- margins of error.
sibilities, and sensing equipment [132]. Below are some 4) Measurement: As described in Section III, the
points to consider. physiological manifestation of emotions makes it
1) Emotional stimuli: Stimuli are characterized by cat- possible to identify emotions through multiple
egorical emotions, such as happy and sad, and are modalities. Depending on measurement convenience,
employed for emotion elicitation. Appraisal-based the selected physiological measures should include
theories of emotion elicitation have emphasized that major ANS responses. Cardiac and electroder-
the emotion elicited in an individual is specific to the mal responses are helpful for autonomic activity
stimulus and its appraisal. Therefore, due to individ- estimation. Research suggests seeking convergent
ual differences, perception of the stimulus may vary evidence across multiple responses for a particu-
from the intended emotion or its intensity. Research lar emotion [143]. Furthermore, since emotions are
has progressed from eliciting strong emotions in a short-lived, the timing of physiological measurement
laboratory to measuring emotions in real life, thereby is important. This is especially true when emotional
dealing with low-intensity emotions. Therefore, it is stimuli produce a less intense emotional response.
crucial to ensure: 1) the stimulus for a specific cat- Measuring devices play a crucial role in data col-
egory of emotion should be verified to elicit the lection. Medical-grade devices are often not suited
intended emotion; 2) to the required intensity; and for real-life data collection. Therefore, researchers
3) no other overlapping emotion is elicited. An exam- are directed toward more unobtrusive and easy-to-
ple of a verified stimulus for inducing stress is the use devices. However, scientifically validated devices
TSST [133]. It is a method consisting of a public should be chosen for the data collection.
interview and arithmetic tests to induce stress and is 5) Annotation: Most often used means of self-reporting
widely used for stress response elicitation. TSST has are Likert scales of valence and arousal. While
been clinically validated to induce a stress response self-reported data are the closest reflection of an
in most of the population and is characterized by individual’s emotion, it is prone to several errors, such
novelty, uncontrollability, unpredictability, and socio- as inaccurate understanding of scales or negligence in
evaluative threat [134]. reporting. Timing of self-reporting is also crucial as
2) Emotion regulation: It is an innate process that the reports may be affected by failure to recall events
may take place alongside emotion expression [135]. or the obtrusiveness of prompts.
Participants may use different emotion regulation 6) Context: Models built on artificially elicited emotions
strategies to modify their subjective emotional expe- in laboratories cannot be generalized to the real-
riences or responses during measurement. This may life environment. The psychophysiological responses
result in inaccurate physiological responses. Depend- to artificial stimuli do not represent those in real
ing on age, culture, and personality, participants may life. Although real-life data collection has more issues
adopt different regulation strategies [136], [137]. when compared to collecting data in a controlled lab-
Different regulation strategies may influence physiol- oratory environment, the research direction is toward
ogy differently. Research has shown that participants developing real-life and daily emotion recognition
who used suppression to regulate their emotions, systems. However, data collection in real life poses
in contrast to reappraisal as a regulation strategy, several challenges. First, the subtlety of the emotional
showed higher physiological responses to emotional responses is a hurdle for annotation in real life. Unlike
stimuli [138]. Therefore, it is important to instruct the ability to control the stimuli in the lab, a real-life
the participants not to adopt an emotion regulation scenario requires considerable contextual information
strategy during the experiment. to be recorded. Individual-specific information, such
3) Sample population: Research has shown that cul- as personality, demographics, and health conditions
tural differences influence physiological emotion that potentially impact emotional responses, are likely
responses [139]. Depending on the emotion regula- to yield better confidence in the computed emotion
tion strategy used, age is also a factor influencing recognition models. The self-reported and sensor-
physiological responses [137]. Therefore, results from based contextual information about the participant

Table 2 Comparison of Physiological Datasets Collected for Emotion Recognition. A Stands for Arousal, V Stands for Valence, and D Stands for
Dominance
and the experimental conditions, such as physical video segments from commercial movies and assessed
activity type and intensity, location, and ambient con- them on valence, arousal, and dominance scales. EEG,
ditions, is necessary to reason for the anomalies in the ECG, EDA, and ST were collected. In addition, face and
emotion recognition models as they tend to influence body videos were recorded using six cameras.
the physiological modalities. For example, an increase
3) DREAMER [122]: The DREAMER dataset was col-
in the EDA signal could result from physical activ-
lected from 23 participants in a controlled environment.
ity, environment and weather changes, or emotional
Scenes from commercial movies were selected to induce
stimuli. Furthermore, collecting data in the laboratory
different emotions. EEG and ECG signals were recorded.
and in real life from each participant could increase
The participants assessed arousal, valence, and dominance
the robustness of the systems. Responses to emotional
levels on a scale from 1 to 5. The dataset was collected
stimuli can be more accurately modeled in a con-
using portable and low-cost wearable devices, which are
trolled environment, and these personalized models
viable options for real-life data collection. However, the
could be adapted to a real-life environment.
dataset has restricted access and is available upon request.
A. Existing Datasets for Emotion Recognition 4) WESAD [123]: The Wearable Stress and Affect Detec-
tion (WESAD) dataset3 was collected from 15 participants
In this section, we provide the prominent emotion recog-
in the laboratory environment. The experiment included
nition datasets that consist of physiological signals (see
amusement, stress, meditation, and recovery conditions.
Table 2). Although most of these datasets are recorded
Positive and negative affect schedule (PANAS), state-trait
in laboratory environments, recently, new studies created
anxiety inventory (STAI), and additional Likert scale ques-
datasets recorded in real-life environments [131], which
tions (stress, frustration, happy, and sad) were used as
would help researchers to improve emotion research in
self-reports. The physiological signals recorded were ECG,
real life or the wild.
EDA, EMG, PPG, respiration, accelerometer, and ST. The
1) DEAP [120]: The Database for Emotion Analysis experiment duration was about 2 h.
Using Physiological Signals (DEAP) dataset1 was collected
5) AMIGOS [124]: A dataset for Multimodal research of
from 32 participants in a laboratory environment. Par-
affect, personality traits, and mood on the Individuals and
ticipants were asked to watch annotated 1-min music
GrOupS (AMIGOS) dataset4 was gathered in two exper-
videos and evaluate them on arousal, valence, dominance,
imental settings. First, 40 participants watched 16 short
likability, and familiarity scales. EEG, PPG, EDA, EMG, elec-
emotional videos (50–150 s) in the laboratory environ-
trooculography, respiration, and temperature signals were
ment. Second, the participants watched four longer videos
collected. In addition, frontal face videos were recorded for
individually and in groups. EEG, ECG, and EDA signals
22 participants.
were recorded. High-quality frontal face and body videos
2) MAHNOB-HCI [121]: Similar to the DEAP dataset, were also recorded. Participants reported their valence,
the MAHNOB-HCI dataset2 was also recorded in a lab-
oratory environment. Twenty-seven participants watched 3 WESAD access link: https://ubicomp.eti.uni-siegen.de/home/
datasets/icmi18/
1 DEAP access link: http://www.eecs.qmul.ac.uk/mmv/datasets/deap/ 4 AMIGOS access link: http://www.eecs.qmul.ac.uk/mmv/datasets/
2 MAHNOB-HCI access link: https://mahnob-db.eu/hci-tagging/ amigos

Table 3 Performance of Varying Machine Learning Algorithms for Recognizing Emotions. The Accuracies Are Two-Class by Default If Not Reported
Otherwise. LALV Is Low Arousal Low Valence, LAHV Is Low Arousal High Valence, HALV Is High Arousal Low Valence, and HAHV Is High Arousal High
Valence. A: Arousal; V: Valence; Acc: Accuracy; and RBC: Radial-Basis Classifier
arousal, control, familiarity, liking, and basic emotions and an automobile environment. Ten participants provided
were also evaluated externally. They also collected the big psychological data during real-word driving tasks under
five questionnaires for personality-related information and the 30-km zone, the 50-km zone, highway, freeway, and
PANAS questionnaire for mood-related data. tunnel conditions. At the end of the driving task, they filled
perceived workload questionnaire. ECG, EDA, and ST data
6) CASE [125]: The Continuously Annotated Signals of
were recorded.
Emotion (CASE) dataset5 consists of real-time annotated
emotions of participants while watching videos in the 10) DSDRWDT [129]: The Detecting Stress During
laboratory environment. Twenty videos whose emotional Real-World Driving Tasks (DSDRWDT) dataset9 was
content is verified by previous studies were selected. ECG, recorded in an automobile environment. 17 participants
BVP, EMG, EDA, respiration, and ST signals were recorded provided psychological data during watching real-world
from 30 participants. In addition, valence and arousal driving tasks. The duration of sessions was between 54 and
levels were reported by the participants. 93 min. HR and EDA data were recorded. Perceived stress
scores were collected for each session.
7) ASCERTAIN [126]: the databASe for impliCit pER-
sonaliTy and Affect recognition (ASCERTAIN) dataset6 11) EMOTIONS [130]: The EMOTIONS dataset10 was
includes big-five personality scales and emotional self- recorded once a day, in a session lasting around 25 min,
ratings of 58 participants. EEG, ECG, EDA, and facial for over twenty days. It was recorded by one participant.
activity data were recorded, while the participants watched Eight emotions (neutral, anger, hate, grief, joy, platonic
audiovisual clips. Arousal, valence, and personality were love, romantic love, and reverence) were annotated for
collected using self-reports. each session. PPG, EDA, EMG, and respiration data were
recorded.
8) EMDB [127]: The Emotional Movie DataBase
(EMDB) dataset7 was recorded in a laboratory environ- 12) DAPPER [131]: The DAPPER dataset11 was recorded
ment. Thirty-two participants provided psychological data in an ambulatory environment, unlike the abovemen-
during watching 52 emotional film clips, which took tioned ones collected in a laboratory; 142 participants
around 40 s each. HR and EDA data were recorded. provided psychological recordings, whereas only 88 pro-
Arousal, valence, and dominance were recorded as the vided physiological recordings over five days. Emotions
ground truth. were annotated using the experience sampling method
(ESM), and detailed descriptions of everyday emotional
9) RWDADW [128]: The Real World Driving to Assess
experiences were obtained using the day reconstruction
Driver Workload (RWDADW) dataset8 was recorded in
method. ESM comprises arousal and valance ratings and
5 CASE access link: https://gitlab.com/karan-shr/case_dataset/tree/
ver_SciData_0 9 DSDRWDT access link: https://physionet.org/content/drivedb/1.0.0/
6 ASCERTAIN access link: https://ascertain-dataset.github.io/ 10 EMOTIONS access link: https://dam-prod2.media.mit.edu/x/2022/
7 EMDB access request at: [email protected] 01/05/SetA.tar.gz
8 RWDADW access link: https://www.hcilab.org/wp-content/uploads/ 11 DAPPER access link: https://www.synapse.org/#!Synapse:syn2241
hcilab_driving_dataset.zip 8021/files/

PANAS questions for ten selected emotions. HR, EDA, and the ECG signals. Due to its suitability to high dimension
acceleration data were recorded. data, RF was also tested for emotion recognition, and
it achieved around 70% accuracy for two-class arousal
VI. M A C H I N E L E A R N I N G A P P R O A C H E S and valence classification, and outperformed other tra-
Emotion recognition systems are based on supervised ditional methods [148]. Wen et al. [147] applied RF to
learning and consist of binary or multiclass classifiers. recognize emotional states, such as baseline, amusement,
The inputs to these classifiers are various signals, and the anger, grief, and fear using heart activity, EDA, and blood
output class labels correspond to an emotional state (i.e., oxygen saturation signals. They achieved 74% accuracy
different emotion types and levels). Early studies employed for quinary classification on their dataset consisting of
traditional classifiers to recognize emotions. Classification 477 cases of 101 subjects while watching emotional videos.
tools can be listed as linear discriminant analysis (LDA),
quadratic discriminant analysis (QDA), k-nearest neighbor
(kNN), random forest (RF), and support vector machine
B. Deep Learning Approaches
(SVM). With the advancements in deep learning algo- After the improvements in deep learning algorithms,
rithms, multilayer perceptron (MLP), convolutional neural they are also widely used for emotion recognition. The
networks (CNNs), and long short-term memory (LSTM) researchers first tested MLP, an artificial neural network
techniques are also tested for recognizing emotions (see that generally outperformed other traditional algorithms.
Table 3). It was among one of the best-performing classifiers [149].
In one of the preliminary works, Wagner et al. [114]
applied MLP and compared the results with LDF and
A. Traditional Machine Learning Approaches kNN. MLP classifier achieved better results than the other
Traditional algorithms and their advantages and dis- classifiers when applied to ECG, EDA, EMG, and respira-
advantages can be described briefly as follows: the SVM tion data for emotions such as joy and anger (88,64%
algorithm defines a hyperplane that separates data points for valence detection and 94.32% for arousal detection).
belonging to different classes with the largest spatial mar- However, the best-performing classifier changed with the
gin. Although originally designed as a linear classifier, SVM selected emotion and feature selection technique. The MLP
can be scaled to perform nonlinear classification using classifier was also applied to PPG, EDA, and ACC data for
different kernel functions efficiently. However, it is used stress level detection and achieved better results (92.15%
predominantly for binary classification of emotions [16]. accuracy for binary stress classification) than LDA, SVM,
kNN is an algorithm that assigns a class to a new data kNN, logistic regression, and RF [113].
point based on the classes of its k closest data points and CNN is another type of deep, feed-forward neural net-
is rather straightforward to implement. Nevertheless, kNN work. They achieved significant success in the image
requires storing all training data, which causes increased domain [16], and recently, researchers have applied them
complexities in time and space. Kernel SVM and kNN to physiological signals, such as EEG, EMG, and ECG.
being nonlinear classifiers compute the decision boundary In one of the preliminary studies, Martinez et al. [157]
accurately depending on their hyperparameters, which can tested several CNN architectures on BVP and EDA sig-
cause overfitting and decrease the generalization capabil- nals for recognizing four emotional states (relaxation,
ity. The generalization capability of LDA is better when anxiety, excitement, and fun) and achieved better results
compared with the mentioned nonlinear classifiers [16]. than using traditional techniques (70% accuracy for
It assigns instances to classes with a projection of the fun and excitement and 60% accuracy for relaxation
feature values to a new subspace. The classification per- and anxiety). CNNs were also used for automatically
formance of RF is typically higher for high-dimensional extracting high-level features from physiological signals.
data. However, the decision tree classifier has an issue Kanjo et al. [13] extracted features from the EDA signal
of overfitting, which can be alleviated by assigning class using CNN architecture and achieved 95% accuracy and
labels with the results of several decision trees in the RF outperformed the usage of handcrafted features (which
classifier. has 83%) for five-class valence detection. Graph CNNs
SVM is commonly used for recognizing emotions. are also used for recognizing emotions from physio-
It is applied to different public datasets. Around logical signals. They are appropriate for the irregular
0.6–0.7 F1-scores for recognizing arousal and valence in structure of EEG data and can discover the intrinsic rela-
two-class classification in the ASCERTAIN dataset [126] tionship between various EEG channels. The graph CNN
and 45%–50% accuracy for three-class arousal and valence algorithm achieved higher accuracies with the EEG sig-
classification in the MAHNOB-HCI dataset [144] are nals of the SEED dataset reaching 94.24% [158]. After
reported. LDA is another widely used classifier for rec- the success of graph CNNs with EEG signals, they were
ognizing emotions. It achieved around 80% accuracy for also applied to a combination of physiological signals.
differentiating stress from the cognitive load by analyz- Wierciński et al. [150] reported that they achieved 70%
ing the EDA signal [145] and around 80% accuracy for accuracy for valence and arousal detection when the graph
recognizing two-class valence and arousal levels from CNN algorithm was applied to EEG, ECG, and EDA signals

on the AMIGOS dataset. They further stated that EEG alone and 88% accuracy for arousal detection. The transformer
achieved better accuracy (75% accuracy for arousal and architectures achieved promising results on these public
valence detection) compared to the multimodal approach. datasets.
However, it can be inferred that the performance of
graph CNN algorithms for recognizing emotions using C. Insights
physiological signals (except for EEG) is not investigated
Deep learning approaches improved the emotion recog-
comprehensively.
nition results by analyzing physiological signals on promi-
In recent years, recurrent neural networks (RNNs) have
nent public datasets. However, it is important to note
had remarkable success in various areas, such as speech
that deep learning approaches require a huge amount of
recognition, language modeling, translation, and image
data compared to traditional classifiers. Therefore, if the
captioning, due to their structure being suitable for time
dataset size and the number of data points are limited,
series. LSTM is a special type of RNN capable of learning
it is advised to use traditional approaches. CNN-based
long-term dependencies and overcoming the vanishing
techniques automatize the feature extraction phase, and
gradients problem of RNN. LSTM is commonly applied to
RNN-based techniques use previous and current data for
the output of CNN for recognizing emotions using CNN
enhanced predictions. The performance of providing raw
as an automatic feature extractor [151], [152]. In these
data to classifiers, usage of handcrafted features, and
studies, Kim and Jo [151] achieved 78.72% and 79.03%
automatic feature extraction with CNN depend on the
for recognizing valence and arousal on the DEAP dataset,
application and data. As an example, CNN requires a larger
and Dar et al. [152] achieved 99.0% accuracy for the AMI-
amount of data for automatically extracting features.
GOS dataset and 90.8% for the DREAMER dataset in
If the data are limited, handcrafted features can be used
four class classification (high-arousal, high-valence, low-
instead of automatically extracted features. Architecture
valence, and low-arousal areas). In some cases, LSTM is
and hyperparameter selection are other challenging tasks
directly applied to the raw physiological data [153] and
for researchers that change with varying applications. It is
handcrafted feature sets [154]. Awais et al. [153] applied
also important to note that other metrics, such as privacy
LSTM to a combination of raw signals (i.e., ECG, EMG,
and explainability, are as crucial as classification perfor-
BVP, EDA, ST, and respiration) and achieved 97%, 94.2%,
mance. To protect the users’ privacy, researchers applied
93.9%, and 95.2% accuracies for detecting amusement,
differential privacy (DP) [159] and federated learning
boredom, relax, and scared emotions, respectively, on the
(FL) [160] approaches with a tradeoff in the performance.
CASE dataset. On the other hand, Umematsu et al. [154]
Another issue is the lack of information about the
achieved 83% accuracy in predicting the next day’s stress
decision-making process of deep learning. By automatiz-
level by applying LSTM to the features obtained from EDA,
ing the feature extraction process with CNNs and using
ST, ACC, mobile phone usage, and location data in their
deep learning for classification, the emotion recognition
local dataset. RNN variants are the most common clas-
systems have turned into black boxes with high accu-
sifiers for recognizing emotion levels from physiological
racy. Although several studies applied explainable methods
signals.
for face [161] and speech-based [162] emotion recogni-
Another important issue for processing time-series
tion systems, there are only a few explainable AI works
signals using deep learning methods is aggregating infor-
for recognizing emotions from physiological signals. For
mation from the raw signal by giving more importance to
example, Liew et al. [163] evaluated and analyzed contri-
the most relevant parts [155]. The attention mechanism
butions of individual features and feature interactions for
technique employs attention weights to restrict processing
representing human emotions by employing the Shapley
to relevant information independent of their distances.
additive explanation values method on multimodal DEAP,
Transformers can be regarded as one of the most pros-
DREAMER, and AMIGOS datasets.
perous attention-based techniques. They have been first
implemented for natural language processing (NLP) and
VII. P R A C T I C A L A P P L I C A T I O N S O F
employ attention mechanisms to analyze sequences of
EMOTION RECOGNITION STUDIES
words and are appropriate for use in other applications,
Emotion recognition systems have a wide range of appli-
such as time-series forecasting, medical, physiological sig-
cations in various fields, such as the workplace, education,
nal analysis, and human activity recognition [155]. Recent
automobile, healthcare, and other areas. By continuously
studies also use these architectures for recognizing emo-
monitoring physiological signals in real time, these sys-
tions from physiological signals (see [156] and [155]).
tems can detect and interpret emotions, and adapt their
Yang et al. [156] combined CNN architectures with con-
responses and actions accordingly.
former blocks and tested them on PPG, EDA, and ST
data from the K-Emocon dataset. They achieved 77.37%
and 79.42% accuracies for detecting valence and arousal A. Workplace and Office
levels. Vazquez et al. [155] tested a transformer model Researchers aimed to recognize emotions in workplaces,
(by combining it with a 1-D CNN) on ECG data of the considering that individuals spend a significant amount of
AMIGOS dataset and achieved 83% accuracy for valence time in these settings, and given that emotion recognition

Table 4 Summary of Practical Applications Using Physiological Signals That Use Emotion Recognition Systems. V: Valence and A: Arousal
systems have the potential to improve workers’ well-being, robots potentially replacing them. Liu et al. [167] used
reduce work-related accidents, and enhance productivity various classification models, including kNN, regression
(see Table 4). In a study by Al Jassmi et al. [164], the tree, Bayesian network, and SVM, to analyze physiolog-
researchers explored the relationship between workers’ ical signals (ECG, EDA, and EMG) and recognize five
emotions and their productivity, discovering a moderate distinct emotions (anxiety, engagement, boredom, frustra-
positive correlation. This prompted them to develop an tion, and anger) during interactions, achieving an accuracy
automated emotion recognition system for construction of around 80%.
workers. By utilizing blood volume pulse (BVP), RR, gal-
vanic skin response (GSR), skin temperature (TEMP), B. Automotive Environment
and HR data, they were able to accurately detect work- Given that people spend a significant amount of time
ers’ positive and negative emotions with a 98% accuracy in their cars, monitoring their emotions and intervening
rate, using an RF classifier. The authors conducted a when necessary could help reduce accidents, injuries, and
four-day field experiment at a prefabricated stone con- fatalities. Emotion research in automotive environments
struction factory to collect data for their study. Using has focused on identifying and mitigating conditions, such
virtual reality technology, Sun et al. [165] designed envi- as fatigue, confusion, nervousness, distraction, and stress
ronments with varying heights, including ground level, that can impact drivers in automotive environments [181].
4 m, and 8 m. The researchers found a statistically sig- Nonintuitive user interfaces, complex navigation systems,
nificant difference in anxiety levels as indicated by EDA ambiguous traffic signs, and intricate routing can cause
signals in response to the different heights. In a subsequent confusion. Nervousness is another affective state charac-
study, Lee et al. [166] utilized PPG, EDA, and ST signals terized by heightened arousal levels and can negatively
to determine workers’ perceived risk levels in hazardous impact decision-making processes. Li and Ji [182] pro-
occupations. They applied an SVM classifier and obtained posed a method based on dynamic Bayesian networks to
an 81.2% accuracy rate for binary classification. detect fatigue, confusion, and nervousness from physio-
With the increasing adoption of robotics technology in logical signals, facial features, and gaze data from both
factories, there has been a significant focus on developing synthetic and real-world environments.
and improving the accuracy of these systems. However, Earlier stages of fatigue can impact driving performance
researchers have also explored the emotions of work- by reducing physiological vigilance/arousal, slowing down
ers during human–robot interactions, given that this is sensorimotor processes, and impairing information pro-
a relatively new experience for workers with a fear of cessing, leading to slower reaction times and decreased

ability to respond to urgent situations, ultimately increas- and RF classifiers on the DEAP dataset. The researchers
ing the risk of accidents. As a result, fatigue has been created a smart virtual therapist that recognizes human
extensively studied in the automotive environment. Craw- emotions using physiological signals (EEG, ECG, and EDA)
ford [183] suggested that physiological signals are the and provides encouragement, suggestions, and adapts its
most reliable indicators of driver fatigue, which has been voice parameters to the scenario accordingly.
corroborated by numerous studies (e.g., [184], [185], Pain is a combination of sensory and emotional experi-
and [176]) that use physiological signals to estimate driver ences. It can be difficult for infants, anesthetized patients,
fatigue and drowsiness. and people with speech impairments to communicate their
Research has shown that increased driver stress, pain. Self-reports have been the traditional method of
whether short term or long term, can have negative effects gathering data from patients with serious illnesses or
on decision-making ability, driver awareness, and reaction those who have undergone surgery. Nevertheless, these
times in automotive environments [181]. As a result, there reports have a subjective nature and may not always be
is a growing interest in developing methods to detect feasible to obtain in real time, such as during surgical
stress levels in drivers. In one pioneer study, Healey and procedures. Automated pain assessment can be help-
Picard [129] presented a method that employed HR, ful in alleviating suffering, but more improvements are
EEG, and respiration data to assess drivers’ stress levels. needed before it can be clinically adopted. Researchers
EDA signals were also employed with an LDA classifier have developed various machine-learning techniques to
to detect driver stress, and around 80% accuracies were detect pain and mental illnesses. For example, Lopez-
obtained [186]. Martinez and Picard [171] attempted detecting pain using
a MultiTask Neural Network classifier along with SVM
C. Education and e-Learning and RF classifiers using ECG and EDA data from the
BioVid Heat Pain Database [188] and achieved around
Emotion recognition research has found another impor-
80% accuracy. Subramaniam and Dass [172] achieved
tant application in the field of education, particularly in
95% accuracy using a CNN-LSTM classifier on the same
improving e-learning technologies compared to traditional
dataset. Depression is another frequently researched men-
learning methods. By monitoring the emotions of both
tal illness. Chen et al. [173] investigated the physiological
teachers and students, emotion-aware e-learning systems
signals of depression patients and control groups while
have the potential to enhance receptiveness and productiv-
inducing emotions in the laboratory. They computed and
ity. Umematsu et al. [154] detected student stress utilizing
presented a significant statistical difference between these
LSTM classifiers on physiological signals, mobile phone
groups. Cai et al. [174] produced a physiological dataset
usage, location, and behavioral surveys, achieving 83%
that included 213 participants (92 of whom had depres-
accuracy for daily stress level detection. In another study,
sion and 121 were normal controls). EEG signals were
Shen et al. [168] identified four emotions that commonly
recorded during the resting state and sound stimulation.
arise during learning engagement (confusion, boredom,
They applied kNN, decision tree, SVM, and NN clas-
hopefulness, and engagement) and employed SVM on
sifiers and obtained a maximum of 79% accuracy for
EDA, PPG, and EEG signals to detect them with 86% accu-
detecting depression. In addition, emotion recognition sys-
racy. The performance of the emotion-aware e-learning
tems have the potential to enhance the quality of life
system was compared with a baseline e-learning scheme.
for individuals with various genetic disorders, such as
Their experiment prototype offered appropriate interven-
autism, by aiding in the perception and expression of
tions based on the emotional state of the learner. The
emotions. Sarabadani et al. [175] induced emotions using
emotion-aware e-learning system was found to be effective
images on 15 children diagnosed with autism disorder and
in reducing the number of required interventions and
collected ECG, EDA, respiration, and ST. They detected
improving the effectiveness of the e-learning system.
binary arousal and valence with around 80% accuracy
using an ensemble of kNN, LDA, and SVM classifiers. After
D. Healthcare detecting the emotions of children with autism disorder,
The use of physiological data analysis has demon- some studies also try to intervene with social robots to
strated potential in the identification of mental disorders, teach them to perceive and express emotions better [189].
such as depression, panic disorder, anxiety, and phobias. Another interesting application is the detection of emotion
Researchers have been focused on detecting fear and pho- during equine-assisted therapy (EAT), which is a therapy
bia automatically using physiological data. In one study, type that uses horse-related activities to alleviate mental
Handouzi et al. [169] exposed participants to anxiogenic health issues. Althobaiti et al. [179] applied SVM, LDA,
(the environment that causes anxiety and fear) virtual and kNN classifiers to ECG, EMG, and EEG signals recorded
environments to identify anxiety levels in phobic indi- during horse-related activities (looking, grooming, and
viduals. They applied the SVM classifier to BVP data leading) and achieved an F1-score of 78.27% for valence
and achieved 76% accuracy in detecting anxiety levels. and 65.49% for arousal detection.
In another study, Bălan et al. [170] developed an auto- When it comes to emotion regulation, individuals often
matic emotion recognition model using SVM, LDA, kNN, regulate their emotions and other affective states passively.

Fig. 6. Heart activity signal obtained from a PPG sensor during a study in the wild. Artifacts and data gaps in heart activity signal can be
seen when the subject moves (during an increased activity in the acceleration signal) [187].
However, certain regulation strategies, such as emotion a three-axis accelerometer, facial recording, and game
suppression [36], are known to have a more negative screen recording, and achieved around 70% accuracy with
impact than a positive impact. Technology can help people SVM, decision tree, and RF classifiers. In another study,
identify appropriate strategies through experimentation. AlZoubi et al. [178] applied deep neural networks to ECG,
While research has shown that emotion regulation is EDA, EMG, BVP, and respiration signals collected during
often hard to detect with a visual inspection, physiological PlayerUnknown’s Battlegrounds (PUBG) gameplay. They
modalities are promising in validating the efficacy of the achieved around 80% for detecting arousal and valence
interventions for regulation. Slow, controlled breathing has levels. Emotions were also analyzed during touristic travels
been known to regulate affect positively. Several vibrotac- to design and manage tourism experiences better. Kim
tile methods, such as Doppel [190], ambienBeat [191], and Fesenmaier [180] monitored the EDA signals of two
and BoostMeUp [192], have been introduced as the travelers during their touristic visit to Philadelphia (the
means for affect regulation. They provide heartbeat-like United States of America) and demonstrated the changes
stimulation on the wrist. Physiological measurements of in signals in different activities.
respiration and HRV due to controlled breathing induced
by these devices are measured. There are more applica- VIII. R E S E A R C H I S S U E S F O R E M O T I O N
tions to monitor breathing and encourage slower breathing RECOGNITION IN THE WILD
during daily activities, such as Just Breathe [193] and Emotion recognition in the wild or real-world settings
Calm Commute [194]. Furthermore, skin conductance can involves detecting and identifying emotions in
measure the extent of regulation using such applications. uncontrolled and unpredictable environments. However,
However, more studies are required for assessing the effec- several challenges and limitations must be overcome to
tiveness and validity of such technological interventions achieve accurate emotion recognition in such scenarios,
and the affect regulation strategies adopted by the indi- including device limitations, data quality concerns
viduals [195]. (as depicted in Fig. 6), labeling difficulties, privacy
considerations, and more. A few of the challenges are
E. Other Applications described in the following.
The application of emotion recognition is not restricted
to industries such as the workplace, automotive, health- A. Issues Related to Devices
care, and education. It also has a significant role in 1) Selection of Unobtrusive Devices and Access to Raw
enhancing user experience, such as in the field of affec- Data: In order to develop an emotion recognition system
tive gaming, where emotions are detected to enhance the suitable for everyday use, one should employ unobtrusive
gaming experience of players. Yang et al. [177] detected devices, such as smart bands, watches, or straps that can be
anger, boredom, frustration, happiness, and fear emo- worn without much discomfort (refer Fig. 7 for examples
tions during the FIFA2016 video game by analyzing of unobtrusive wrist-worn devices). However, most of the
ECG, EDA, EMG, respiration, and body movement with renowned commercial smart band/watch providers, such

modalities and select the most appropriate interpolation

technique (i.e., one that captures the modality characteris-
tics) to fill in the gaps.
B. Issues Related to Data Annotation

1) Reliability of Self-Report Questionnaires and Emotion
Awareness: To train supervised machine learning algo-
rithms, the physiological data require the ground truth
depicting emotions and their intensity. In laboratory exper-
iments, researchers may establish the intended emotion
Fig. 7. Medical-grade devices are shown in the top row. At the
and intensity level of the stimulus as the ground truth.
bottom, unobtrusive wrist-worn devices are demonstrated [68].
The ground truth for emotions outside the laboratory is
typically obtained through ecological momentary assess-
ment, such as self-report questionnaires, as the context
as Apple Watch, Fitbit, and Microsoft Band 2 [Microsoft and induced emotion level of participants in their daily
ceased support for software development kit (SDK)], lives are unknown to the experimenter. However, the
do not provide access to raw data for research purposes. reliability of self-reports is questionable because they are
After the release of Samsung Galaxy Gear S3, Samsung subjective and dependent on factors such as the individual,
stopped providing IBI data, which was used for HRV culture, and gender, as described in Section V. In addition,
feature calculation, and instead started providing only some individuals may try to conceal their true emotional
HR data. Often, the devices provide processed data and state in experiments, or they may have difficulty accessing
insights related to the user’s health via their proprietary and expressing their own emotions. When considering a
algorithms and applications rather than providing raw data general model capable of recognizing the emotions of all
for research purposes. When researchers aim to develop people, subjective self-reports can decrease accuracy. Fur-
a multimodal system that includes multiple physiological thermore, self-reports are challenging to obtain frequently
modalities, such as HRV, EDA, ACC, ST, and BVP, the options in real time as the emotions occur, leading to delays in
for unobtrusive smart bands become more limited. As a labeling. This can result in the loss of valuable information
result, researchers are often directed toward expensive, and affect the accuracy of the emotion recognition model.
research-oriented bands, such as Empatica E3, E4, and Q
sensor instead of off-the-shelf commercial bands. 2) Necessity for a Substantial Amount of Labeling: Emo-
tion recognition studies in the wild rely on self-reports
2) Battery Life: Continuous data from sensors are nec- collected from users as the ground truth. Although more
essary for monitoring the mental health of individuals frequent and correct labels can result in better-trained
in their daily life. However, unobtrusive smart bands or models, it is challenging for participants to provide
watches have limitations when it comes to battery life. self-reports frequently and accurately during their daily
When all sensors are active, the state-of-the-art batteries routines as this process is time-consuming and demands
of these devices can only endure for a few hours. In our increased compliance from the participants. Therefore,
tests with devices that provide raw data, Samsung Gear S, researchers try to balance this out by finding optimal
S2, and S3 lasted around 4 h, while Microsoft Band 2 (with intervals for collecting self-reports without causing signif-
the latest SDK before support ceased) lasted approximately icant inconvenience to users. Machine learning methods
8 h [113]. Empatica E4 wristband (a research-oriented involving deep learning generally outperform traditional
band with no display) lasted longer than these commercial methods, but they require a significant amount of labeled
devices, with a duration of about 48 h as stated on the training data for robust models. This further increases
website [196]. Commercial devices need to be charged at the demand for annotated data. Recently, semisupervised
least once a day, which makes users hesitant to use them in methods (SSMs) have been proposed for decreasing the
everyday life. This limitation forces researchers to develop need for labels. These methods can generate labels for
more energy-efficient emotion monitoring methods. unannotated data points by making use of the existing
3) Data Quality and Artifacts: Unobtrusive smart bands labeled data. Although researchers recently started using
offer lower data quality and lower sampling frequen- SSM techniques for emotion recognition [197], their use
cies compared to medical-grade systems. They are more in research is still limited.
susceptible to artifacts, which can complicate the deci-
sion process of affect recognition systems. To develop C. Issues Related to Emotion Classes
a robust system, modality-specific artifact detection and 1) Division of Self-Report Scales Into Classes: Self-report
removal algorithms should be developed. Furthermore, collection in the wild involves Likert or Self-Assessment
since the movement of the wrist is almost unrestricted, Manikin (SAM) scales with different resolutions. After the
data gaps can occur during intense activity. To address this data collection, the scale is divided into a number of emo-
issue, researchers need to investigate the characteristics of tion levels or classes for emotion recognition. However,

defining a general threshold for dividing low and high involves the data collection process, which demands spe-
levels of emotions is challenging, given the subjectivity cific procedures. Ethical approval for the experiment
of self-reports and the potential for variation in baselines protocol and informed consent must be obtained from
across individuals. A fixed threshold might decrease the the ethical boards before collecting data from participants.
performance of affect recognition systems. In the litera- During the data collection, informed consent must be
ture, a common technique is to use a fixed threshold, which obtained from the participants by clarifying the purpose of
can be calculated as the number of scales/the number of the study, the data that will be collected, and the rights of
classes. Suppose that we used a 10 Likert scale for emotion participants with respect to their data and contact persons,
detection. We want to detect two-class emotion levels. both verbally and in writing. Another crucial ethical ele-
If we use a fixed threshold of “5” and decide the emotion ment related to the experiments is the emotional stimulus.
level accordingly, we might misclassify the people with Inducing negative effects (i.e., anger, stress, and sadness)
enduring high emotion levels and classify all their data can be challenging because of the ethical constraints [199].
as high emotion. However, investigating the baseline with Generally, researchers use low-intensity emotion induction
questionnaires and increasing or decreasing their baselines techniques, namely, IAPS images, movie clips, emotional
dynamically will improve the performance. In other words, videos, and music, which are approved by the ethical
person-specific thresholds might increase the accuracy. committees. However, this can create a problem when the
Automatic clustering methods, such as K-means clustering, models cannot learn high-intensity responses as in daily
can also be employed to assign self-reports to the desired lives since they are not present in the training data [11].
number of affect levels. Furthermore, privacy must be ensured during stor-
ing and processing of the data. The most important
2) Data Sparsity: As mentioned previously, especially,
step is the anonymization of the information. Instead
deep learning algorithms require a huge amount of data
of anonymization, researchers sometimes also applied
for training. Otherwise, they may overfit, learn the noise in
pseudonymization in which data without personal infor-
the data, and cannot be generalized to other applications.
mation are stored along with a table that maps the subjects
In order to overcome this issue, researchers first try to
to their identity. However, without accessing this table, it is
increase the amount of data synthetically. In a recent study,
impossible to get the identity of the subjects. The following
Nita et al. [198] augmented an ECG dataset with a consid-
example can be provided to clarify the difference between
erable amount of representative ECG samples that were
anonymization and pseudonymization. In pseudonymiza-
created by randomizing, concatenating, and resampling
tion, P32’s physiological data and a table that maps P32
realistic ECG signals in the DREAMER dataset. By applying
to the participant’s real name are stored separately. On the
a seven-layer CNN classifier, they achieved an accuracy
other hand, in anonymization, it is stated that a patient
of 95.16% to detect valence, 85.56% for arousal, and
has the corresponding physiological data, and there is no
77.54% for dominance and increased the baseline (without
way to get the identity of this patient. Both techniques
data augmentation) drastically. When the local dataset
are allowed in different privacy protection laws, such as
size is relatively small, another technique is applying deep
General Data Protection Regulation.
transfer learning (DTL) techniques from prominent large
The second stage pertains to the implementation of emo-
datasets. In DTL, parameters are learned from a relatively
tion recognition technologies in real life. A crucial concern
large dataset, and they are adapted to the local dataset.
is the access rights to physiological data and outcomes. For
In the literature, DTL techniques were applied from the
instance, if employers can access their employees’ stress,
SEED dataset to the DREAMER dataset, and it is reported
anxiety, and workload data, they may exploit it unethically.
that DTL is beneficial in comparison to traditional machine
Potential misuse may include assigning more tasks to
learning techniques. Another problem occurs when data
workers with low mental workloads or terminating those
are imbalanced in terms of class labels. Especially in the
with intense anxiety or stress. Another instance is that
wild, datasets have fewer negative labels than positive
health insurance companies can determine the likelihood
labels. In this case, machine learning algorithms have the
of mental health disorders and charge higher contribution
tendency to classify data points as the majority classes.
premiums to those affected. In addition, the presence of
In order to avoid this issue, researchers can randomly
hidden biases in the training data used for these systems
undersample the majority class and balance the dataset.
can lead to unfair or discriminatory outcomes. These exam-
Another technique is called Synthetic Minority Oversam-
ples highlight the significance of ensuring user data privacy
pling Technique (SMOTE), and it increases the size of the
and addressing ethical concerns.
minority class by creating synthetic data points.
D. Privacy and Ethical Concerns E. Privacy Preserving Machine Learning for Affect
Collecting and processing physiological signals require Recognition From Physiological Signals
careful consideration as they carry sensitive, health-related Researchers proposed FL and DP approaches for
information about individuals. Privacy and ethical con- addressing privacy concerns that occurred during machine
cerns must be addressed in two stages. The first stage learning processes. The FL approach uploads the model

parameters obtained from the sensitive physiological data males and 98.85% for females). They reported that there is
instead of the data itself [160]. Although FL has been a significant difference in how both genders communicate
widely applied for facial features and speech for affect their emotions using the arousal self-reports. On the other
recognition [200], [201], [202], it is rarely used for hand, they also stated that gender does not have an effect
recognizing affects from physiological signals. Can and on the EDA signal during subtle human–computer inter-
Ersoy [160] applied FL for predicting binary perceived action tasks. However, more comprehensive experiments
stress using heart activity. Each client trained an MLP are needed for more accurate conclusions. The research
classifier on local data and shared model parameters community should encourage people to create more open
for each update. The parameters were then averaged real-life datasets with this homogeneity. Another state-of-
by using the FedAvg algorithm. FL was also applied to the-art solution to the generalizability and transferability
multimodal physiological signals. Nandi and Xhafa [203] problem of traditional machine learning algorithms (sta-
developed an FL-based Fed-ReMECS framework for rec- tistical models) is causal representation learning [207].
ognizing arousal and valence levels. They validated their Although causal representation learning has several pos-
neural network-based FL approach on EDA and respiration sible real-world applications in different fields, such as
data from the DEAP dataset. In these studies, researchers health care, marketing, political science, and online adver-
applied FL without sacrificing the affect recognition perfor- tising, and has achieved promising performances, it has not
mances. been applied to physiological signals for emotion recogni-
Although FL improved the process of training mod- tion, but it can solve the abovementioned problems.
els in terms of privacy, the privacy vulnerabilities of The development of accurate and reliable emotion
the stochastic gradient descent (SGD) algorithm remain recognition systems for real-world environments is a com-
unsolved. The DP mechanism can be explained as injecting plex and challenging task. It demands interdisciplinary
noise into each model client or server, perturbing the collaboration and encourages the development of new
updates, and restricting gradient leakage between client techniques and methodologies.
and server [204]. DP can be applied alone without FL
settings. In a physiological signal-based activity recog-
nition case, the noise is added to the data directly so IX. C O N C L U S I O N S A N D F U T U R E
that personal information is lost, but activity data can PERSPECTIVES
still be used by compromising on the performance to an The purpose of this tutorial was to provide guidance for
extent [159]. It further improved the privacy vulnerabil- new researchers entering the field of emotion recognition.
ities when applied together with FL on speech emotion It covered the essential steps of developing an emotion
recognition tasks [205]. However, a combination of FL recognition system, including understanding the theories
and DP has not been applied to the physiological data for of emotion and their regulation, the physiological and
recognizing emotions yet. psychological basis of emotions, designing scientific experi-
ments for studying emotions, utilizing wearable devices for
capturing physiological modalities, identifying prominent
F. Generalizability features of each modality, and applying both traditional
Another issue is the generalizability of the emotion machine learning and deep learning methods for analyzing
recognition research. Unfortunately, most of the studies physiological data.
are published on private datasets, which makes it difficult Emotion elicitation and regulation theories have pro-
to apply new techniques to these datasets and creates a vided a framework for understanding the factors that
question about repeatability. On the other hand, as pre- contribute to the experience of emotions and their expres-
viously mentioned in Section V, many of the current sions, which can aid in the development of more accurate
datasets were collected in controlled laboratory settings emotion recognition models. Research has demonstrated
with artificial stimuli, such as watching movie clips or that emotions are expressed through various psychological,
listening to music. It is widely known that emotional physiological, and behavioral modalities. Multimodality
responses in such laboratory environments can differ from has been shown to enhance the performance of emotion
those in natural daily life situations where the stimuli may recognition systems. We emphasize the importance of mul-
be more personal and subjectively appraised with greater timodality and selecting appropriate ones considering their
intensity [11]. Furthermore, since most of the research is advantages and disadvantages of each modality for specific
conducted at universities, participants are generally col- environments and application goals.
lege students of a certain age. However, if these algorithms Another crucial consideration is the choice of machine
are applied to the general population, the participant learning techniques. While many studies prioritize per-
should be selected from different ages, cultures, gender, formance and accuracy, other important factors, such as
and social status homogeneously. Liapis et al. [206] exam- privacy and explainability, also need to be taken into
ined the effect of gender on stress recognition using EDA account when designing emotion recognition systems.
signals. They trained gender-specific models and achieved Unfortunately, many existing research works overlook
high accuracy for detecting stress (94.80% accuracy for these factors, and it is essential to explicitly address and

discuss them during the development and deployment of emotion recognition systems, especially using physiolog-
such systems. ical signals. By considering the aspects of emotions,
As research progresses toward real-life emotion data col- utilizing multimodality, and addressing ethical considera-
lection and recognition, there are several open challenges tions, researchers can develop more robust and effective
that need to be addressed, including selecting good-quality emotion recognition systems that can contribute to a wide
unobtrusive devices, handling low-quality data, and using range of applications in fields such as psychology, health-
subjective self-reports as ground truth. This tutorial aims care, human–computer interaction, and social robotics. ■
to provide the necessary information for future research in
addressing these challenges. Acknowledgment
In summary, this tutorial covers various aspects from This work was carried out within the framework of the AI
theoretical foundations to practical implementation of Production Network Augsburg.
REFERENCES
[1] A. Rowe and J. Fitness, “Understanding the role of emotion detection,” Inf. Fusion, vol. 49, pp. 46–56, J. Pers. Social Psychol., vol. 39, no. 6,
negative emotions in adult learning and Sep. 2019. pp. 1161–1178, Dec. 1980.
achievement: A social functional perspective,” [14] F. A. Alskafi, A. H. Khandoker, and H. F. Jelinek, [32] A. Mehrabian, “Pleasure-arousal-dominance: A
Behav. Sci., vol. 8, no. 2, p. 27, Feb. 2018. “A comparative study of arousal and valence general framework for describing and measuring
[2] R. Erickson and W. Grove, “Why emotions matter: dimensional variations for emotion recognition individual differences in temperament,” Current
Age, agitation, and burnout among registered using peripheral physiological signals acquired Psychol., J. Diverse Perspect. Diverse Psychol. Issues,
nurses,” Online J. Issues Nursing, vol. 13, no. 1, from wearable sensors,” in Proc. 43rd Annu. Int. vol. 14, no. 4, pp. 261–292, Dec. 1996.
pp. 1–12, Oct. 2007. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Nov. 2021, [33] R. Plutchik, “The nature of emotions: Human
[3] M. R. Kamdar and M. J. Wu, “PRISM: A pp. 1104–1107. emotions have deep evolutionary roots, a fact that
data-driven platform for monitoring mental [15] P. Schmidt, R. Dürichen, A. Reiss, may explain their complexity and provide tools for
health,” in Proc. Pacific Symp., Biocomputing, K. Van Laerhoven, and T. Plötz, “Multi-target clinical practice,” Amer. Scientist, vol. 89, no. 4,
Singapore: World Scientific, Jan. 2016, affect detection in the wild: An exploratory study,” pp. 344–350, 2001.
pp. 333–344. in Proc. 23rd Int. Symp. Wearable Comput., [34] D. Sander, D. Grandjean, and K. R. Scherer,
[4] 2015 Motor Vehicle Crashes: Overview, Traffic Sep. 2019, pp. 211–219. “A systems approach to appraisal mechanisms in
Safety Facts: Research Note, National Highway [16] L. Shu et al., “A review of emotion recognition emotion,” Neural Netw., vol. 18, no. 4,
Traffic Safety Administration, Washington, DC, using physiological signals,” Sensors, vol. 18, pp. 317–352, May 2005.
USA, 2016, pp. 1–9. no. 7, p. 2074, Jun. 2018. [35] Y. S. Can and C. Ersoy, “Smart affect monitoring
[5] L. C. De Silva, T. Miyasato, and R. Nakatsu, “Facial [17] Y. S. Can, N. Chalabianloo, D. Ekiz, with wearables in the wild: An unobtrusive
emotion recognition using multi-modal J. Fernandez-Alvarez, G. Riva, and C. Ersoy, mood-aware emotion recognition system,” IEEE
information,” in Proc. Int. Conf. Inf., Commun. “Personal stress-level clustering and decision-level Trans. Affect. Comput., early access, Dec. 27, 2022,
Signal Process. (ICICS), Theme, Trends Inf. Syst. smoothing to enhance the performance of doi: 10.1109/TAFFC.2022.3232483.
Eng. Wireless Multimedia Commun., 1997, ambulatory stress detection with smartwatches,” [36] J. J. Gross, “Emotion regulation: Affective,
pp. 397–401. IEEE Access, vol. 8, pp. 38146–38163, 2020. cognitive, and social consequences,”
[6] B. Schuller, G. Rigoll, and M. Lang, “Hidden [18] W. James, “What is an emotion?” Mind, vol. 9, Psychophysiology, vol. 39, no. 3, pp. 281–291,
Markov model-based speech emotion no. 34, pp. 188–205, 1884. [Online]. Available: May 2002.
recognition,” in Proc. IEEE Int. Conf. Acoust., http://www.jstor.org/stable/2246769 [37] H. A. Demaree, B. J. Schmeichel, J. L. Robinson,
Speech, Signal Process. (ICASSP), vol. 2, Apr. 2003, [19] W. James, The Principles of Psychology, vol. 1. J. Pu, D. E. Everhart, and G. G. Berntson, “Up- and
p. 2. New York, NY, USA: Henry Holt, 1890. down-regulating facial disgust: Affective, vagal,
[7] IDC. (Mar. 10, 2020). Wearables Unit Shipments [20] W. B. Cannon, “The James–Lange theory of sympathetic, and respiratory consequences,” Biol.
Worldwide by Vendor From 2014 to 2019 (in emotions: A critical examination and an Psychol., vol. 71, no. 1, pp. 90–99, Jan. 2006.
Millions) [Graph]. Accessed: Feb. 25, 2023. alternative theory,” Amer. J. Psychol., vol. 39, [38] C. R. Harris, “Cardiovascular responses of
[Online]. Available: https://www.statista.com/ nos. 1–4, p. 106, Dec. 1927. embarrassment and effects of emotional
statistics/515634/wearables-shipments- [21] S. Schachter, The Interaction of Cognitive and suppression in a social setting,” J. Pers. Social
worldwide-by-vendor/ Physiological Determinants of Emotional State Psychol., vol. 81, no. 5, pp. 886–897, 2001.
[8] B. Zhao, Z. Wang, Z. Yu, and B. Guo, (Advances in Experimental Social Psychology). [39] J. Zaehringer, C. Jennen-Steinmetz, C. Schmahl,
“EmotionSense: Emotion recognition based on [22] M. B. Arnold, Emotion and Personality. New York, G. Ende, and C. Paret, “Psychophysiological effects
wearable wristband,” in Proc. IEEE SmartWorld, NY, USA: Columbia Univ. Press, 1960. of downregulating negative emotions: Insights
Ubiquitous Intell. Comput., Adv. Trusted Comput., [23] K. R. Scherer, “Appraisal considered as a process from a meta-analysis of healthy adults,” Frontiers
Scalable Comput. Commun., Cloud Big Data of multilevel sequential checking,” in Appraisal Psychol., vol. 11, p. 470, Apr. 2020.
Comput., Internet People Smart City Innov. (Smart- Processes in Emotion: Theory, Methods, Research, [40] J. Gratch and S. Marsella, “A domain-independent
World/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), vol. 92, no. 120. 2001, p. 57. framework for modeling emotion,” Cogn. Syst.
Oct. 2018, pp. 346–355. [24] R. S. Lazarus, Psychological Stress and the Coping Res., vol. 5, no. 4, pp. 269–306, Dec. 2004.
[9] B. Nakisa, M. N. Rastgoo, A. Rakotonirainy, Process. New York, NY, USA: McGraw-Hill, 1966. [41] R. J. Davidson, P. Ekman, C. D. Saron,
F. Maire, and V. Chandran, “Long short term [25] I. J. Roseman, A. A. Antoniou, and P. E. Jose, J. A. Senulis, and W. V. Friesen,
memory hyperparameter optimization for a neural “Appraisal determinants of emotions: Constructing “Approach-withdrawal and cerebral asymmetry:
network based emotion recognition framework,” a more accurate and comprehensive theory,” Emotional expression and brain physiology: I,”
IEEE Access, vol. 6, pp. 49325–49338, 2018. Cogn. Emotion, vol. 10, no. 3, pp. 241–277, 1996. J. Pers. Social Psychol., vol. 58, no. 2, pp. 330–341,
[10] S. Saganowski et al., “Emotion recognition using [26] P. Ekman, R. W. Levenson, and W. V. Friesen, 1990.
wearables: A systematic literature “Autonomic nervous system activity distinguishes [42] R. J. Davidson, “Anterior cerebral asymmetry and
review—Work-in-progress,” in Proc. IEEE Int. Conf. among emotions,” Science, vol. 221, no. 4616, the nature of emotion,” Brain Cogn., vol. 20,
Pervasive Comput. Commun. Workshops (PerCom pp. 1208–1210, Sep. 1983. no. 1, pp. 125–151, Sep. 1992.
Workshops), Mar. 2020, pp. 1–6. [27] P. Ekman, “An argument for basic emotions,” Cogn. [43] J. A. Coan and J. J. B. Allen, “Frontal EEG
[11] R. W. Picard, “Automating the recognition of stress Emotion, vol. 6, nos. 3–4, pp. 169–200, May 1992. asymmetry as a moderator and mediator of
and emotion: From lab to real-world impact,” [28] R. S. Lazarus and B. N. Lazarus, Passion and emotion,” Biol. Psychol., vol. 67, nos. 1–2,
IEEE Multimedia Mag., vol. 23, no. 3, pp. 3–7, Reason: Making Sense of Our Emotions. New York, pp. 7–50, Oct. 2004.
Jul. 2016. NY, USA: Oxford University Press, 1994. [44] J. T. Larsen, G. G. Berntson, K. M. Poehlmann,
[12] S. Saganowski, B. Perz, A. Polak, and P. Kazienko, [29] P. Ekman, Basic Emotions. Hoboken, NJ, USA: T. A. Ito, and J. T. Cacioppo, “The
“Emotion recognition for everyday life using Wiley, 1999, ch. 3, pp. 45–60. [Online]. Available: psychophysiology of emotion,” in Handbook of
physiological signals from wearables: A systematic https://onlinelibrary.wiley.com/doi/abs/10.1002/ Emotions, M. Lewis, J. M. Haviland-Jones, and
literature review,” IEEE Trans. Affect. Comput., 0470013494.ch3 L. F. Barrett, Eds. The Guilford Press, 2008,
early access, May 20, 2022, doi: 10.1109/TAFFC. [30] A. S. Cowen and D. Keltner, “Self-report captures pp. 180–195.
2022.3176135. 27 distinct categories of emotion bridged by [45] J. Tarchanoff, “Galvanic phenomena in the human
[13] E. Kanjo, E. M. G. Younis, and C. S. Ang, “Deep continuous gradients,” Proc. Nat. Acad. Sci. USA, skin during stimulation of the sensory organs and
learning analysis of mobile physiological, vol. 114, no. 38, pp. E7900–E7909, Sep. 2017. during various forms of mental activity,” Pflugers
environmental and location sensor data for [31] J. A. Russell, “A circumplex model of affect,” Arch. fur die Gesamte Physiologiedes Menschen und

der Tiere, vol. 46, no. 1, pp. 46–55, 1890. Expert Syst., vol. 34, no. 6, Dec. 2017, Art. no. e2105573118.
[46] J. T. Cacioppo and R. E. Petty, “Electromyograms Art. no. e12219. [83] M. Chu et al., “Respiration rate and volume
as measures of extent and affectivity of [65] T. Ruf, “The Lomb–Scargle periodogram in measurements using wearable strain sensors,” NPJ
information processing,” Amer. Psychol., vol. 36, biological rhythm research: Analysis of incomplete Digit. Med., vol. 2, no. 1, pp. 1–9, Feb. 2019.
no. 5, pp. 441–456, 1981. and unequally spaced time-series,” Biol. Rhythm [84] T. Nguyen, H. Okada, Y. Takei, A. Takei, and
[47] G. G. Berntson, J. T. Cacioppo, and K. S. Quigley, Res., vol. 30, no. 2, pp. 178–201, Apr. 1999. M. Ichiki, “A band-aid type sensor for wearable
“Respiratory sinus arrhythmia: Autonomic origins, [66] V. Houshyarifar and M. C. Amirani, “Early physiological monitoring,” in Proc. 21st Int. Conf.
physiological mechanisms, and detection of sudden cardiac death using Poincaré Solid-State Sens., Actuators Microsyst.
psychophysiological implications,” plots and recurrence plot-based features from (Transducers), Jun. 2021, pp. 1432–1435.
Psychophysiology, vol. 30, no. 2, pp. 183–196, HRV signals,” TURKISH J. Electr. Eng. Comput. Sci., [85] A. A. Alian and H. K. Shelley,
Mar. 1993. vol. 25, no. 2, pp. 1541–1553, 2017. “Photoplethysmography,” Best Pract. Res. Clin.
[48] S. E. Taylor, “Asymmetrical effects of positive and [67] J. Bolea, E. Pueyo, M. Orini, and R. Bailón, Anaesthesiol., vol. 28, no. 4, pp. 395–406, 2014.
negative events: The mobilization-minimization “Influence of heart rate in non-linear HRV indices [86] G. Valenza, A. Lanata, and E. P. Scilingo, “The role
hypothesis.,” Psychol. Bull., vol. 110, no. 1, as a sampling rate effect evaluated on supine and of nonlinear dynamics in affective valence and
pp. 67–85, Jul. 1991. standing,” Frontiers Physiol., vol. 7, p. 501, arousal recognition,” IEEE Trans. Affect. Comput.,
[49] P. J. Davis, S. P. Zhang, A. Winkworth, and Nov. 2016. vol. 3, no. 2, pp. 237–249, Apr. 2012.
R. Bandler, “Neural control of vocalization: [68] Y. S. Can, B. Arnrich, and C. Ersoy, “Stress [87] K. Yang et al., “Behavioral and physiological
Respiratory and emotional influences,” J. Voice, detection in daily life scenarios using smart signals-based deep multimodal approach for
vol. 10, no. 1, pp. 23–38, Jan. 1996. phones and wearable sensors: A survey,” mobile emotion recognition,” IEEE Trans. Affect.
[50] K. Scherer, “Vocal communication of emotion: A J. Biomed. Informat., vol. 92, Apr. 2019, Comput., vol. 14, no. 2, pp. 1082–1097,
review of research paradigms,” Speech Commun., Art. no. 103139. Apr./Jun. 2023.
vol. 40, nos. 1–2, pp. 227–256, Apr. 2003. [69] P. Ekman and W. V. Friesen, “Facial action coding [88] M. Valstar and M. Pantic, “Fully automatic facial
[51] S. M. Alarcão and M. J. Fonseca, “Emotions system (FACS) [database record],” APA PsycTests, action unit detection and temporal analysis,” in
recognition using EEG signals: A survey,” IEEE 1978, doi: 10.1037/t27734-000. Proc. Conf. Comput. Vis. Pattern Recognit.
Trans. Affect. Comput., vol. 10, no. 3, pp. 374–393, [70] B. D. Gelder, “Why bodies? Twelve reasons for Workshop (CVPRW), Jun. 2006, p. 149.
Jul. 2019. including bodily expressions in affective [89] C. Shan, S. Gong, and P. W. McOwan, “Facial
[52] K. L. Phan, T. Wager, S. F. Taylor, and I. Liberzon, neuroscience,” Phil. Trans. Roy. Soc. B, Biol. Sci., expression recognition based on local binary
“Functional neuroanatomy of emotion: A vol. 364, no. 1535, pp. 3475–3484, Dec. 2009. patterns: A comprehensive study,” Image Vis.
meta-analysis of emotion activation studies in PET [71] M. B. I. Reaz, M. S. Hussain, and F. Mohd-Yasin, Comput., vol. 27, no. 6, pp. 803–816, May 2009.
and fMRI,” NeuroImage, vol. 16, no. 2, “Techniques of EMG signal analysis: Detection, [Online]. Available: https://www.sciencedirect.
pp. 331–348, Jun. 2002. processing, classification and applications,” Biol. com/science/article/pii/S0262885608001844
[53] A. R. Damasio et al., “Subcortical and cortical Procedures Online, vol. 8, no. 1, pp. 11–35, [90] S. Li and W. Deng, “Deep facial expression
brain activity during the feeling of self-generated Dec. 2006. recognition: A survey,” IEEE Trans. Affect. Comput.,
emotions,” Nature Neurosci., vol. 3, no. 10, [72] J. Chen, T. Ro, and Z. Zhu, “Emotion recognition vol. 13, no. 3, pp. 1195–1215, Jul. 2022.
pp. 1049–1056, Oct. 2000. with audio, video, EEG, and EMG: A dataset and [91] F. Z. Canal et al., “A survey on facial emotion
[54] A. R. Hariri, S. Y. Bookheimer, and J. C. Mazziotta, baseline approaches,” IEEE Access, vol. 10, recognition techniques: A state-of-the-art
“Modulating emotional responses: Effects of a pp. 13229–13242, 2022. literature review,” Inf. Sci., vol. 582, pp. 593–617,
neocortical network on the limbic system,” [73] M. Sharma, J. Darji, M. Thakrar, and Jan. 2022, doi: 10.1016/j.ins.2021.10.005.
NeuroReport, vol. 11, no. 1, pp. 43–48, Jan. 2000. U. R. Acharya, “Automated identification of sleep [92] M. Ali, A. H. Mosa, F. A. Machot, and K.
[55] C. P. Niemic, “Studies of emotion: A theoretical disorders using wavelet-based features extracted Kyamakya, “Emotion recognition involving
and empirical review of psychophysiological from electrooculogram and electromyogram physiological and speech signals: A
studies of emotion,” J. Undergraduate Res., vol. 1, signals,” Comput. Biol. Med., vol. 143, Apr. 2022, comprehensive review,” in Recent Advances in
no. 1, pp. 15–18, Fall 2002. Art. no. 105224. Nonlinear Dynamics and Synchronization (Studies
[56] Y. Jiang, W. Li, M. S. Hossain, M. Chen, [74] G. Giannakakis, D. Grigoriadis, K. Giannakaki, in Systems, Decision and Control), vol. 109,
A. Alelaiwi, and M. Al-Hammadi, “A snapshot O. Simantiraki, A. Roniotis, and M. Tsiknakis, K. Kyamakya, W. Mathis, R. Stoop, J. Chedjou,
research and implementation of multimodal “Review on psychological stress detection using and Z. Li, Eds. Cham, Switzerland: Springer,
information fusion for data-driven emotion biosignals,” IEEE Trans. Affect. Comput., vol. 13, 2018, doi: 10.1007/978-3-319-58996-1_13.
recognition,” Inf. Fusion, vol. 53, pp. 209–221, no. 1, pp. 440–460, Jan. 2022. [93] J. Kim and E. André, “Emotion recognition using
Jan. 2020. [75] C. W. Darrow, “Sensory, secretory, and electrical physiological and speech signal in short-term
[57] X. Huang et al., “Multi-modal emotion analysis changes in the skin following bodily excitation,” observation,” in Perception and Interactive
from facial expressions and J. Exp. Psychol., vol. 10, no. 3, pp. 197–226, Technologies. Kloster Irsee, Germany: Springer,
electroencephalogram,” Comput. Vis. Image Jun. 1927. Jun. 2006, pp. 53–64.
Understand., vol. 147, pp. 114–124, Jun. 2016. [76] J. Shukla, M. Barreda-Ángeles, J. Oliver, and [94] M. B. Akçay and K. Oğuz, “Speech emotion
[58] A. Khosla, P. Khandnor, and T. Chand, D. Puig, “Efficient wavelet-based artifact removal recognition: Emotional models, databases,
“A comparative analysis of signal processing and for electrodermal activity in real-world features, preprocessing methods, supporting
classification methods for different applications applications,” Biomed. Signal Process. Control, modalities, and classifiers,” Speech Commun.,
based on EEG signals,” Biocybern. Biomed. Eng., vol. 42, pp. 45–52, Apr. 2018. vol. 116, pp. 56–76, Jan. 2020. [Online].
vol. 40, no. 2, pp. 649–690, Apr. 2020. [77] W. Chen, N. Jaques, S. Taylor, A. Sano, S. Fedor, Available: https://www.sciencedirect.
[59] I. Winkler, S. Haufe, and M. Tangermann, and R. W. Picard, “Wavelet-based motion artifact com/science/article/pii/S0167639319302262
“Automatic classification of artifactual removal for electrodermal activity,” in Proc. 37th [95] J. Wagner et al., “Dawn of the transformer era in
ICA-components for artifact removal in EEG Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), speech emotion recognition: Closing the valence
signals,” Behav. Brain Functions, vol. 7, pp. 1–15, Aug. 2015, pp. 6223–6226. gap,” IEEE Trans. Pattern Anal. Mach. Intell., early
Aug. 2011. [78] S. Taylor, N. Jaques, W. Chen, S. Fedor, A. Sano, access, Mar. 31, 2023, doi: 10.1109/TPAMI.
[60] I. Stancin, M. Cifrek, and A. Jovic, “A review of and R. Picard, “Automatic identification of 2023.3263585.
EEG signal features and their application in driver artifacts in electrodermal activity data,” in Proc. [96] B. D. Gelder, “Towards the neurobiology of
drowsiness detection systems,” Sensors, vol. 21, 37th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. emotional body language,” Nature Rev. Neurosci.,
no. 11, p. 3786, May 2021. (EMBC), Aug. 2015, pp. 1934–1937. vol. 7, no. 3, pp. 242–249, Mar. 2006, doi:
[61] P. A. Abhang, B. W. Gawali, and S. C. Mehrotra, [79] A. Greco, G. Valenza, A. Lanata, E. P. Scilingo, and 10.1038/nrn1872.
“Technical aspects of brain rhythms and speech L. Citi, “CvxEDA: A convex optimization approach [97] A. Mehrabian, Nonverbal Communication.
parameters,” in Introduction to EEG- and to electrodermal activity processing,” IEEE Trans. Evanston, IL, USA: Routledge, 2017.
Speech-Based Emotion Recognition. New York, NY, Biomed. Eng., vol. 63, no. 4, pp. 797–804, [98] A. Kleinsmith and N. Bianchi-Berthouze, “Affective
USA: Academic, 2016, ch. 3, pp. 51–79. Apr. 2016. body expression perception and recognition: A
[62] S. Greene, H. Thapliyal, and A. Caban-Holt, [80] S. A. H. Aqajari, E. K. Naeini, M. A. Mehrabadi, survey,” IEEE Trans. Affect. Comput., vol. 4, no. 1,
“A survey of affective computing for stress S. Labbaf, N. Dutt, and A. M. Rahmani, “PyEDA: pp. 15–33, Jan. 2013.
detection: Evaluating technologies in stress An open-source Python toolkit for pre-processing [99] Z. Witkower and J. L. Tracy, “Bodily
detection for better health,” IEEE Consum. and feature extraction of electrodermal activity,” communication of emotion: Evidence for
Electron. Mag., vol. 5, no. 4, pp. 44–56, Oct. 2016. Proc. Comput. Sci., vol. 184, pp. 99–106, extrafacial behavioral expressions and available
[63] B. Cinaz, B. Arnrich, R. La Marca, and G. Tröster, Jan. 2021. coding systems,” Emotion Rev., vol. 11, no. 2,
“Monitoring of mental workload levels during an [81] C. Kappeler-Setz, Multimodal Emotion and Stress pp. 184–193, May 2018, doi: 10.1177/
everyday life office-work scenario,” Pers. Recognition. Zürich, Switzerland: ETH Zürich, 1754073917749880.
Ubiquitous Comput., vol. 17, no. 2, pp. 229–239, 2012. [100] L. D. Lopez, P. J. Reschke, J. M. Knothe, and
Feb. 2013. [82] A. M. Gordon and W. B. Mendes, “A large-scale E. A. Walle, “Postural communication of emotion:
[64] M. Adnane and A. Belouchrani, “Heartbeats study of stress, emotions, and blood pressure in Perception of distinct poses of five discrete
classification using QRS and T waves daily life using a digital platform,” Proc. Nat. Acad. emotions,” Frontiers Psychol., vol. 8, p. 710,
autoregressive features and RR interval features,” Sci. USA, vol. 118, no. 31, Aug. 2021, May 2017, doi: 10.3389/fpsyg.2017.00710.

[101] F. Ahmed, A. S. M. H. Bari, and M. L. Gavrilova, Affect. Comput., vol. 3, no. 1, pp. 18–31, emotion regulation: Divergent consequences for
“Emotion recognition from body movement,” IEEE Jan. 2012. experience, expression, and physiology,” J. Pers.
Access, vol. 8, pp. 11761–11781, 2020, doi: [121] M. Soleymani, J. Lichtenauer, T. Pun, and Social Psychol., vol. 74, no. 1, pp. 224–237, 1998.
10.1109/ACCESS.2019.2963113. M. Pantic, “A multimodal database for affect [139] J. L. Tsai, R. W. Levenson, and K. McCoy, “Cultural
[102] J. Wu, Y. Zhang, S. Sun, Q. Li, and X. Zhao, recognition and implicit tagging,” IEEE Trans. and temperamental variation in emotional
“Generalized zero-shot emotion recognition from Affect. Comput., vol. 3, no. 1, pp. 42–55, response,” Emotion, vol. 6, no. 3, pp. 484–497,
body gestures,” Appl. Intell., vol. 52, Jan. 2012. 2006.
pp. 8616–8634, Jun. 2022. [122] S. Katsigiannis and N. Ramzan, “DREAMER: A [140] F. Faul, E. Erdfelder, A.-G. Lang, and A. Buchner,
[103] S. Xu et al., “Emotion recognition from gait database for emotion recognition through EEG “G∗ Power 3: A flexible statistical power analysis
analyses: Current research and future directions,” and ECG signals from wireless low-cost program for the social, behavioral, and biomedical
IEEE Trans. Computat. Social Syst., early access, off-the-shelf devices,” IEEE J. Biomed. Health sciences,” Behav. Res. Methods, vol. 39, no. 2,
Dec. 5, 2022, doi: 10.1109/TCSS.2022.3223251. Informat., vol. 22, no. 1, pp. 98–107, Jan. 2018. pp. 175–191, May 2007.
[104] V. Mishra et al., “Investigating the role of context [123] P. Schmidt, A. Reiss, R. Duerichen, C. Marberger, [141] R. V. Krejcie and D. W. Morgan, “Determining
in perceived stress detection in the wild,” in Proc. and K. Van Laerhoven, “Introducing WESAD, a sample size for research activities,” Educ. Psychol.
ACM Int. Joint Conf. Int. Symp. Pervasive multimodal dataset for wearable stress and affect Meas., vol. 30, no. 3, pp. 607–610, Sep. 1970.
Ubiquitous Comput. Wearable Comput., Oct. 2018, detection,” in Proc. 20th ACM Int. Conf. [142] W. Cochran, W. Cochran, and A. Bouclier,
pp. 1708–1716. Multimodal Interact., Oct. 2018, pp. 400–408. Sampling Techniques (Wiley Series in Probability
[105] P. Kostopoulos, A. I. Kyritsis, M. Deriaz, and [124] J. A. Miranda-Correa, M. K. Abadi, N. Sebe, and and Statistics). Hoboken, NJ, USA: Wiley, 1977.
D. Konstantas, “Stress detection using smart I. Patras, “AMIGOS: A dataset for affect, [143] R. W. Levenson, “The autonomic nervous system
phone data,” in eHealth 360◦ . Berlin, Germany: personality and mood research on individuals and and emotion,” Emotion Rev., vol. 6, no. 2,
Springer, 2017, pp. 340–351. groups,” IEEE Trans. Affect. Comput., vol. 12, pp. 100–112, Mar. 2014.
[106] A. Sano and R. W. Picard, “Stress recognition no. 2, pp. 479–493, Apr. 2021. [144] Y.-L. Hsu, J.-S. Wang, W.-C. Chiang, and
using wearable sensors and mobile phones,” in [125] K. Sharma, C. Castellini, E. L. van den Broek, C.-H. Hung, “Automatic ECG-based emotion
Proc. Hum. Assoc. Conf. Affect. Comput. Intell. A. Albu-Schaeffer, and F. Schwenker, “A dataset of recognition in music listening,” IEEE Trans. Affect.
Interact., Sep. 2013, pp. 671–676. continuous affect annotations and physiological Comput., vol. 11, no. 1, pp. 85–99, Jan. 2020.
[107] Y. S. Can et al., “Real-life stress level monitoring signals for emotion analysis,” Sci. Data, vol. 6, [145] C. Setz, B. Arnrich, J. Schumm, R. La Marca,
using smart bands in the light of contextual no. 1, pp. 1–13, Oct. 2019. G. Tröster, and U. Ehlert, “Discriminating stress
information,” IEEE Sensors J., vol. 20, no. 15, [126] R. Subramanian, J. Wache, M. K. Abadi, from cognitive load using a wearable EDA device,”
pp. 8721–8730, Aug. 2020. R. L. Vieriu, S. Winkler, and N. Sebe, “ASCERTAIN: IEEE Trans. Inf. Technol. Biomed., vol. 14, no. 2,
[108] M. Soleymani, S. Asghari-Esfeden, Y. Fu, and Emotion and personality recognition using pp. 410–417, Mar. 2010.
M. Pantic, “Analysis of EEG signals and facial commercial sensors,” IEEE Trans. Affect. Comput., [146] F. Agrafioti, D. Hatzinakos, and A. K. Anderson,
expressions for continuous emotion detection,” vol. 9, no. 2, pp. 147–160, Apr. 2018. “ECG pattern analysis for emotion detection,”
IEEE Trans. Affect. Comput., vol. 7, no. 1, [127] S. Carvalho, J. Leite, S. Galdo-Álvarez, and IEEE Trans. Affect. Comput., vol. 3, no. 1,
pp. 17–28, Jan. 2016. Ó. F. Gonçalves, “The emotional movie database pp. 102–115, Jan. 2012.
[109] D. Fabiano and S. Canavan, “Emotion recognition (EMDB): A self-report and psychophysiological [147] W. Wen, G. Liu, N. Cheng, J. Wei, P. Shangguan,
using fused physiological signals,” in Proc. 8th Int. study,” Appl. Psychophysiol. Biofeedback, vol. 37, and W. Huang, “Emotion recognition based on
Conf. Affect. Comput. Intell. Interact. (ACII), no. 4, pp. 279–294, Dec. 2012. multi-variant correlation of physiological signals,”
Sep. 2019, pp. 42–48. [128] S. Schneegass, B. Pfleging, N. Broy, F. Heinrich, IEEE Trans. Affect. Comput., vol. 5, no. 2,
[110] D. Shin, D. Shin, and D. Shin, “Development of and A. Schmidt, “A data set of real world driving pp. 126–140, Apr. 2014.
emotion recognition interface using complex to assess driver workload,” in Proc. 5th Int. Conf. [148] D. Ayata, Y. Yaslan, and M. E. Kamasak, “Emotion
EEG/ECG bio-signal for interactive contents,” Automot. User Interfaces Interact. Veh. Appl., based music recommendation system using
Multimedia Tools Appl., vol. 76, no. 9, Oct. 2013, pp. 150–157. wearable physiological sensors,” IEEE Trans.
pp. 11449–11470, May 2017. [129] J. A. Healey and R. W. Picard, “Detecting stress Consum. Electron., vol. 64, no. 2, pp. 196–203,
[111] S. Wang, J. Du, and R. Xu, “Decision fusion for during real-world driving tasks using May 2018.
EEG-based emotion recognition,” in Proc. Int. physiological sensors,” IEEE Trans. Intell. Transp. [149] V. Kolodyazhniy, S. D. Kreibig, J. J. Gross,
Conf. Mach. Learn. Cybern. (ICMLC), vol. 2, Syst., vol. 6, no. 2, pp. 156–166, Jun. 2005. W. T. Roth, and F. H. Wilhelm, “An affective
Jul. 2015, pp. 883–889. [130] R. W. Picard, E. Vyzas, and J. Healey, “Toward computing approach to physiological emotion
[112] M. Umair, N. Chalabianloo, C. Sas, and C. Ersoy, machine emotional intelligence: Analysis of specificity: Toward subject-independent and
“HRV and stress: A mixed-methods approach for affective physiological state,” IEEE Trans. Pattern stimulus-independent classification of
comparison of wearable heart rate sensors for Anal. Mach. Intell., vol. 23, no. 10, film-induced emotions,” Psychophysiology, vol. 48,
biofeedback,” IEEE Access, vol. 9, pp. 1175–1191, 2001. no. 7, pp. 908–922, Jul. 2011.
pp. 14005–14024, 2021. [131] X. Shui, M. Zhang, Z. Li, X. Hu, F. Wang, and [150] T. Wierciński, M. Rock, R. Zwierzycki,
[113] Y. S. Can, N. Chalabianloo, D. Ekiz, and C. Ersoy, D. Zhang, “A dataset of daily ambulatory T. Zawadzka, and M. Zawadzki, “Emotion
“Continuous stress detection using wearable psychological and physiological recording for recognition from physiological channels using
sensors in real life: Algorithmic programming emotion research,” Sci. Data, vol. 8, no. 1, graph neural network,” Sensors, vol. 22, no. 8,
contest case study,” Sensors, vol. 19, no. 8, pp. 1–12, Jun. 2021. p. 2980, Apr. 2022.
p. 1849, Apr. 2019. [132] B. Mahesh, T. Hassan, E. Prassler, and J. Garbas, [151] B. H. Kim and S. Jo, “Deep physiological affect
[114] J. Wagner, J. Kim, and E. André, “From “Requirements for a reference dataset for network for the recognition of human emotions,”
physiological signals to emotions: Implementing multimodal human stress detection,” in Proc. IEEE IEEE Trans. Affect. Comput., vol. 11, no. 2,
and comparing selected methods for feature Int. Conf. Pervasive Comput. Commun. Workshops pp. 230–243, Apr. 2020.
extraction and classification,” in Proc. IEEE Int. (PerCom Workshops), Mar. 2019, pp. 492–498. [152] M. N. Dar, M. U. Akram, S. G. Khawaja, and
Conf. Multimedia Expo, Jul. 2005, pp. 940–943. [133] B. M. Kudielka, D. H. Hellhammer, and C. A. N. Pujari, “CNN and LSTM-based emotion
[115] D. Huron and J. K. Vuoskoski, “On the enjoyment Kirschbaum, “Ten years of research with the trier charting using physiological signals,” Sensors,
of sad music: Pleasurable compassion theory and social stress test–revisited,” in Social Neuroscience: vol. 20, no. 16, p. 4551, Aug. 2020.
the role of trait empathy,” Frontiers Psychol., Integrating Biological and Psychological [153] M. Awais et al., “LSTM-based emotion detection
vol. 11, p. 1060, May 2020. Explanations of Social Behavior, E. Harmon-Jones using physiological signals: IoT framework for
[116] Y. S. Can, D. Gokay, D. R. Kılıç, D. Ekiz, and P. Winkielman, Eds. The Guilford Press, 2007, healthcare and distance learning in COVID-19,”
N. Chalabianloo, and C. Ersoy, “How laboratory pp. 56–83. IEEE Internet Things J., vol. 8, no. 23,
experiments can be exploited for monitoring stress [134] S. S. Dickerson and M. E. Kemeny, “Acute stressors pp. 16863–16871, Dec. 2021.
in the wild: A bridge between laboratory and daily and cortisol responses: A theoretical integration [154] T. Umematsu, A. Sano, S. Taylor, and R. W. Picard,
life,” Sensors, vol. 20, no. 3, p. 838, Feb. 2020. and synthesis of laboratory research,” Psychol. “Improving students’ daily life stress forecasting
[117] E. Grande, “From physiological signals to Bull., vol. 130, no. 3, pp. 355–391, 2004. using LSTM neural networks,” in Proc. IEEE EMBS
emotions: An integrative literature review,” [135] J. J. Gross, “The emerging field of emotion Int. Conf. Biomed. Health Informat. (BHI),
Bachelor Thesis, Universität Stuttgart, Stuttgart, regulation: An integrative review,” Rev. Gen. May 2019, pp. 1–4.
Germany, 2022, no. 44. Psychol., vol. 2, no. 3, pp. 271–299, Sep. 1998. [155] J. Vazquez-Rodriguez, G. Lefebvre, J. Cumin, and
[118] A. M. Kring and A. H. Gordon, “Sex differences in [136] S. Scheibe, G. Sheppes, and U. M. Staudinger, J. L. Crowley, “Transformer-based self-supervised
emotion: Expression, experience, and physiology,” “Distract or reappraise? Age-related differences in learning for emotion recognition,” 2022,
J. Pers. Social Psychol., vol. 74, no. 3, pp. 686–703, emotion-regulation choice,” Emotion, vol. 15, arXiv:2204.05103.
1998. no. 6, pp. 677–681, 2015. [156] K. Yang et al., “Mobile emotion recognition via
[119] J. J. Gross and R. W. Levenson, “Hiding feelings: [137] M. N. Shiota and R. W. Levenson, “Effects of aging multiple physiological signals using
The acute effects of inhibiting negative and on experimentally instructed detached convolution-augmented transformer,” in Proc. Int.
positive emotion,” J. Abnormal Psychol., vol. 106, reappraisal, positive reappraisal, and emotional Conf. Multimedia Retr. New York, NY, USA:
no. 1, pp. 95–103, Feb. 1997. behavior suppression.,” Psychol. Aging, vol. 24, Association for Computing Machinery, Jun. 2022,
[120] S. Koelstra et al., “DEAP: A database for emotion no. 4, pp. 890–900, Dec. 2009. pp. 562–570.
analysis using physiological signals,” IEEE Trans. [138] J. J. Gross, “Antecedent- and response-focused [157] H. P. Martinez, Y. Bengio, and G. N. Yannakakis,

“Learning deep physiological models of affect,” K. Cheng, “Physiological signal analysis for regulation,” in Proc. 14th Int. Conf. Tangible,
IEEE Comput. Intell. Mag., vol. 8, no. 2, pp. 20–33, patients with depression,” in Proc. 4th Int. Conf. Embedded, Embodied Interact. New York, NY, USA:
May 2013. Biomed. Eng. Informat. (BMEI), vol. 2, Oct. 2011, Association for Computing Machinery, Feb. 2020,
[158] T. Zhang, X. Wang, X. Xu, and C. L. P. Chen, pp. 805–808. pp. 17–30.
“GCB-Net: Graph convolutional broad network [174] H. Cai et al., “A pervasive approach to EEG-based [192] J. Costa, F. Guimbretière, M. F. Jung, and
and its application in emotion recognition,” IEEE depression detection,” Complexity, vol. 2018, T. Choudhury, “BoostMeUp: Improving cognitive
Trans. Affect. Comput., vol. 13, no. 1, pp. 379–388, pp. 1–13, Feb. 2018. performance in the moment by unobtrusively
Jan. 2022. [175] S. Sarabadani, L. C. Schudlo, A. A. Samadani, and regulating emotions with a smartwatch,”
[159] N. Saleheen et al., “mSieve: Differential A. Kushski, “Physiological detection of affective Proc. ACM Interact., Mobile, Wearable
behavioral privacy in time series of mobile sensor states in children with autism spectrum disorder,” Ubiquitous Technol., vol. 3, no. 2, pp. 1–23,
data,” in Proc. ACM Int. Joint Conf. Pervasive IEEE Trans. Affect. Comput., vol. 11, no. 4, Jun. 2019.
Ubiquitous Comput. New York, NY, USA: pp. 588–600, Oct. 2020. [193] P. E. Paredes et al., “Just breathe: In-car
Association for Computing Machinery, Sep. 2016, [176] J. Gwak, M. Shino, and A. Hirao, “Early detection interventions for guided slow breathing,” Proc.
pp. 706–717. of driver drowsiness utilizing machine learning ACM Interact., Mobile, Wearable Ubiquitous
[160] Y. S. Can and C. Ersoy, “Privacy-preserving based on physiological signals, behavioral Technol., vol. 2, no. 1, pp. 1–23, Mar. 2018.
federated deep learning for wearable IoT-based measures, and driving performance,” in Proc. 21st [194] S. Balters, M. L. Mauriello, S. Y. Park, J. A. Landay,
biomedical monitoring,” ACM Trans. Internet Int. Conf. Intell. Transp. Syst. (ITSC), Nov. 2018, and P. E. Paredes, “Calm commute: Guided slow
Technol., vol. 21, no. 1, pp. 1–17, Feb. 2021. pp. 1794–1800. breathing for daily stress management in drivers,”
[161] P. Prajod, T. Huber, and E. André, “Using [177] W. Yang, M. Rifqi, C. Marsala, and A. Pinna, Proc. ACM Interact., Mobile, Wearable Ubiquitous
explainable AI to identify differences between “Physiological-based emotion detection and Technol., vol. 4, no. 1, pp. 1–19, Mar. 2020.
clinical and experimental pain detection models recognition in a video game context,” in Proc. Int. [195] P. Miri et al., “Challenges in evaluating
based on facial expressions,” in Proc. Int. Conf. Joint Conf. Neural Netw. (IJCNN), Jul. 2018, technological interventions for affect regulation,”
Multimedia Modeling. Berlin, Germany: Springer, pp. 1–8. IEEE Trans. Affect. Comput., early access,
2022, pp. 311–322. [178] O. AlZoubi, B. AlMakhadmeh, M. B. Yassein, and May 19, 2022, doi: 10.1109/TAFFC.2022.
[162] W. Zhang and B. Y. Lim, “Towards relatable W. Mardini, “Detecting naturalistic expression of 3175687.
explainable AI with the perceptual process,” in emotions using physiological signals while playing [196] (2022). Empatica Website. Accessed: Aug. 2022.
Proc. CHI Conf. Hum. Factors Comput. Syst. video games,” J. Ambient Intell. Hum. Comput., [Online]. Available: https://www.empatica.com/
New York, NY, USA: Association for Computing vol. 14, pp. 1133–1146, Feb. 2023. [197] J. Luo, Y. Tian, H. Yu, Y. Chen, and M. Wu,
Machinery, Apr. 2022, pp. 1–24. [179] T. Althobaiti, S. Katsigiannis, D. West, and “Semi-supervised cross-subject emotion
[163] W. S. Liew, C. K. Loo, and S. Wermter, “Emotion N. Ramzan, “Examining human-horse interaction recognition based on stacked denoising
recognition using explainable genetically by means of affect recognition via physiological autoencoder architecture using a fusion of
optimized fuzzy ART ensembles,” IEEE Access, signals,” IEEE Access, vol. 7, pp. 77857–77867, multi-modal physiological signals,” Entropy,
vol. 9, pp. 61513–61531, 2021. 2019. vol. 24, no. 5, p. 577, Apr. 2022.
[164] H. A. Jassmi, S. Ahmed, B. Philip, F. Al Mughairbi, [180] J. Kim and D. R. Fesenmaier, “Measuring emotions [198] S. Nita, S. Bitam, M. Heidet, and A. Mellouk,
and M. Al Ahmad, “E-happiness physiological in real time: Implications for tourism experience “A new data augmentation convolutional neural
indicators of construction workers’ productivity: A design,” J. Travel Res., vol. 54, no. 4, pp. 419–429, network for human emotion recognition based on
machine learning approach,” J. Asian Archit. Jul. 2015. ECG signals,” Biomed. Signal Process. Control,
Building Eng., vol. 18, no. 6, pp. 517–526, [181] C. D. Katsis, G. Rigas, Y. Goletsis, and vol. 75, May 2022, Art. no. 103580.
Nov. 2019. D. I. Fotiadis, “Emotion recognition in car [199] F. Larradet, R. Niewiadomski, G. Barresi,
[165] G. Sun, F. Pang, Q. Liu, Y. Lin, L. Xu, and Y. Meng, industry,” in Emotion Recognition: A Pattern D. G. Caldwell, and L. S. Mattos, “Toward emotion
“Research on recognizable physiological signals of Analysis Approach. New York, NY, USA: Wiley, recognition from physiological signals in the wild:
workers working at heights,” in Proc. Int. Conf. 2015, pp. 515–544. Approaching the methodological issues in real-life
Man-Mach.-Environ. Syst. Eng. Berlin, Germany: [182] X. Li and Q. Ji, “Active affective state detection data collection,” Frontiers Psychol., vol. 11,
Springer, 2020, pp. 75–82. and user assistance with dynamic Bayesian p. 1111, Jul. 2020.
[166] B. G. Lee, B. Choi, H. Jebelli, and S. Lee, networks,” IEEE Trans. Syst., Man, Cybern. A, Syst. [200] K. Somandepalli et al., “Federated learning for
“Assessment of construction workers’ perceived Humans, vol. 35, no. 1, pp. 93–105, Jan. 2005. affective computing tasks,” in Proc. 10th Int. Conf.
risk using physiological data from wearable [183] A. Crawford, “Fatigue and driving,” Ergonomics, Affect. Comput. Intell. Interact. (ACII), Oct. 2022,
sensors: A machine learning approach,” J. Building vol. 4, no. 2, pp. 143–154, 1961. pp. 1–8.
Eng., vol. 42, Oct. 2021, Art. no. 102824. [184] Z. Li, K. Jiao, M. Chen, Y. Yang, C. Wang, and [201] S. Latif, S. Khalifa, R. Rana, and R. Jurdak,
[167] C. Liu, P. Rani, and N. Sarkar, “An empirical study S. Qi, “Spectral analysis of heart rate variability as “Federated learning for speech emotion
of machine learning techniques for affect a quantitative indicator of driver mental fatigue,” recognition applications,” in Proc. 19th ACM/IEEE
recognition in human–robot interaction,” in Proc. SAE Trans., vol. 111, pp. 249–253, Jan. 2002. Int. Conf. Inf. Process. Sensor Netw. (IPSN),
IEEE/RSJ Int. Conf. Intell. Robots Syst., Aug. 2005, [185] G. Yang, Y. Lin, and P. Bhattacharya, “A driver Apr. 2020, pp. 341–342.
pp. 2662–2667. fatigue recognition model using fusion of multiple [202] P. Chhikara, P. Singh, R. Tekchandani, N. Kumar,
[168] L. Shen, M. Wang, and R. Shen, “Affective features,” in Proc. IEEE Int. Conf. Syst., Man and M. Guizani, “Federated learning meets human
e-learning: Using ‘emotional’ data to improve Cybern., vol. 2, Oct. 2005, pp. 1777–1784. emotions: A decentralized framework for
learning in pervasive learning environment,” [186] Y. Liu and S. Du, “Psychological stress level human–computer interaction for IoT
J. Educ. Technol. Soc., vol. 12, no. 2, pp. 176–189, detection based on electrodermal activity,” Behav. applications,” IEEE Internet Things J., vol. 8, no. 8,
2009. Brain Res., vol. 341, pp. 50–53, Apr. 2018. pp. 6949–6962, Apr. 2021.
[169] W. Handouzi, C. Maaoui, A. Pruski, and [187] D. Ekiz, Y. S. Can, and C. Ersoy, “Long short-term [203] A. Nandi and F. Xhafa, “A federated learning
A. Moussaoui, “Objective model assessment for memory network based unobtrusive workload method for real-time emotion state classification
short-term anxiety recognition from blood volume monitoring with consumer grade smartwatches,” from multi-modal streaming,” Methods, vol. 204,
pulse signal,” Biomed. Signal Process. Control, IEEE Trans. Affect. Comput., vol. 14, no. 2, pp. 340–347, Aug. 2022.
vol. 14, pp. 217–227, Nov. 2014. pp. 895–905, Apr./Jun. 2021. [204] C. Dwork and A. Roth, “The algorithmic
[170] O. Bălan, A. Moldoveanu, and M. Leordeanu, [188] S. Walter et al., “The biovid heat pain database foundations of differential privacy,” Found. Trends
“A machine learning approach to automatic data for the advancement and systematic Theor. Comput. Sci., vol. 9, nos. 3–4, pp. 211–407,
phobia therapy with virtual reality,” in Modern validation of an automated pain recognition 2014.
Approaches to Augmentation of Brain Function. system,” in Proc. IEEE Int. Conf. Cybern. (CYBCO), [205] T. Feng, R. Peri, and S. Narayanan, “User-level
Berlin, Germany: Springer, 2021, pp. 607–636. Jun. 2013, pp. 128–131. differential privacy against attribute inference
[171] D. Lopez-Martinez and R. Picard, “Multi-task [189] K. D. Bartl-Pokorny et al., “Robot-based attack of speech emotion recognition in federated
neural networks for personalized pain recognition intervention for children with autism spectrum learning,” 2022, arXiv:2204.02500.
from physiological signals,” in Proc. 7th Int. Conf. disorder: A systematic literature review,” IEEE [206] A. Liapis, C. Katsanos, D. Sotiropoulos, M. Xenos,
Affect. Comput. Intell. Interact. Workshops Demos Access, vol. 9, pp. 165433–165450, 2021. and N. Karousos, “Stress recognition in
(ACIIW), Oct. 2017, pp. 181–184. [190] R. T. Azevedo, N. Bennett, A. Bilicki, J. Hooper, human–computer interaction using physiological
[172] S. D. Subramaniam and B. Dass, “Automated F. Markopoulou, and M. Tsakiris, “The calming and self-reported data: A study of gender
nociceptive pain assessment using physiological effect of a new wearable device during the differences,” in Proc. 19th Panhellenic Conf.
signals and a hybrid deep learning network,” IEEE anticipation of public speech,” Sci. Rep., vol. 7, Informat., Oct. 2015, pp. 323–328.
Sensors J., vol. 21, no. 3, pp. 3335–3343, no. 1, p. 2285, May 2017. [207] B. Schölkopf et al., “Toward causal representation
Feb. 2021. [191] K. Y. Choi and H. Ishii, “AmbienBeat: Wrist-worn learning,” Proc. IEEE, vol. 109, no. 5,
[173] Y. Chen, I. Hung, M. Huang, C. Hou, and mobile tactile biofeedback for heart rate rhythmic pp. 612–634, May 2021.

ABOUT THE AUTHORS

Yekta Said Can received the B.Sc., M.Sc., Elisabeth André (Senior Member, IEEE)
and Ph.D. degrees from Boğaziçi University, is currently a Full Professor of computer
Istanbul, Turkey, in 2012, 2014, and 2020, science and the Founding Chair of human-
respectively. centered artificial intelligence with the Uni-
He has worked as a Teaching Assistant versity of Augsburg, Augsburg, Germany,
at Boğaziçi University for six years dur- where she is also a Co-Speaker of the Bavar-
ing his Ph.D. degree. After obtaining his ian Research Association ForDigitHealth.
Ph.D. degree, he worked as a Postdoctoral She has a long track record in multimodal
Researcher in a European Union’s Horizon human–machine interaction, embodied con-
2020 ERC project (UrbanOccupations) for applying computer vision versational agents, social robotics, affective computing, and social
techniques to retrieve information from historical documents for signal processing.
two years. He is currently working on recognizing emotions and Dr. André is a member of the prestigious Academy of Europe, the
stress at the University of Augsburg, Augsburg, Germany, as a German Academy of Sciences Leopoldina, and the CHI Academy.
Postdoctoral Researcher. His research interests include biometrics, Her work has won many awards, including the Gottfried Wil-
document analysis, physiological signal processing, affective and helm Leibniz Prize, the most important research funding award in
wearable computing, and machine learning. Germany. In 2013, she was awarded a European Association for
Artificial Intelligence (EurAI) fellowship. In 2019, she was named
one of the ten most influential figures in the history of AI in Germany
by the National Society for Informatics (GI). From 2019 to 2022, she
has served as the Editor-in-Chief of IEEE TRANSACTIONS ON AFFECTIVE
Bhargavi Mahesh received the M.Sc. COMPUTING.
degree in autonomous systems from the
Bonn-Rhein-Sieg University of Applied Sci-
ences, Sankt Augustin, Germany, in 2019.
She is currently working toward the Ph.D.
degree at the Chair of Human-Centered Arti-
ficial Intelligence, University of Augsburg,
Augsburg, Germany.
From 2019 to 2022, she worked as a
Researcher at the Smart Sensing Electronics Division, Fraunhofer
Institute for Integrated Circuits, Erlangen, Germany. Her research
interests include affective computing and social signal processing.

2023IEEE

Uploaded by

Copyright:

Available Formats

2023IEEE

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

2023IEEE

Uploaded by

Copyright:

Available Formats

This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

P ROCEEDINGS OF THE IEEE 1

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

2 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

P ROCEEDINGS OF THE IEEE 3

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

defined by the scales: pleasure, arousal, and dominance

4 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

behavior by specifying relations among different modal-

P ROCEEDINGS OF THE IEEE 5

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

B. Cardiac Responses E. Respiration Responses

6 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

Table 1 Activity Types and Corresponding Measurement Types

P ROCEEDINGS OF THE IEEE 7

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

8 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

P ROCEEDINGS OF THE IEEE 9

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

10 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

P ROCEEDINGS OF THE IEEE 11

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

V. E X P E R I M E N T A L D E S I G N F O R an emotion recognition study involving participants

12 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

P ROCEEDINGS OF THE IEEE 13

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

14 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

P ROCEEDINGS OF THE IEEE 15

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

16 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

P ROCEEDINGS OF THE IEEE 17

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

18 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

P ROCEEDINGS OF THE IEEE 19

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

modalities and select the most appropriate interpolation

B. Issues Related to Data Annotation

20 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

P ROCEEDINGS OF THE IEEE 21

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

22 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

P ROCEEDINGS OF THE IEEE 23

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

24 P ROCEEDINGS OF THE IEEE

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

P ROCEEDINGS OF THE IEEE 25

Can et al.: Approaches, Applications, and Challenges in Physiological Emotion Recognition

26 P ROCEEDINGS OF THE IEEE