0% found this document useful (0 votes)
27 views20 pages

Deep Learning Approaches For Stress Detection: A Survey

This article presents a survey on deep learning approaches for stress detection, highlighting the significance of reliable stress detection techniques in improving quality of life. It discusses various data sources used for stress detection, including physiological signals, speech, facial expressions, and social media content, and emphasizes the advantages of deep learning over traditional machine learning methods. The paper also outlines publicly available datasets and identifies key research directions for future studies in the field.

Uploaded by

rajasekhar.ch
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views20 pages

Deep Learning Approaches For Stress Detection: A Survey

This article presents a survey on deep learning approaches for stress detection, highlighting the significance of reliable stress detection techniques in improving quality of life. It discusses various data sources used for stress detection, including physiological signals, speech, facial expressions, and social media content, and emphasizes the advantages of deep learning over traditional machine learning methods. The paper also outlines publicly available datasets and identifies key research directions for future studies in the field.

Uploaded by

rajasekhar.ch
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 20

This article has been accepted for publication in IEEE Transactions on Affective Computing.

This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1

Deep Learning Approaches for Stress Detection:


A Survey
Maria Kyrou, Ioannis Kompatsiaris, Senior Member, IEEE,
and Panagiotis C. Petrantonakis, Senior Member, IEEE

Abstract—Stress has a severe impact on individuals irrespective of age, sex, work, or background. The reliable development of stress
detection techniques enhances the social, educational, physical, economic, and professional quality of life, preventing chronic stress
and proposing alleviation strategies. Research studies examine psychological, cognitive, behavioral, and physiological reactions to
identify stress adequately. Deep Learning (DL) has received significant attention in recent years as it deals with high-dimensional,
heterogeneous data and automatically learns representative features. This paper presents a survey on stress detection with recent DL
approaches, leveraging data from all possible sources (physiological, speech, facial expressions, gestures, and social media content).
The methodological outlines, the best results, and the main contributions of each study are discussed. We also describe publicly
available datasets used by several of the presented works. Finally, we emphasize various open issues within the field of research and
highlight key directions for future work.

Index Terms—Deep learning, neural networks, stress detection.

1 I NTRODUCTION

S TRESS can promote several pathological conditions and


disorders. The impact is higher if stress becomes chronic,
affecting the immune system and systematically damaging
Numerous studies have explored stress detection
through traditional machine learning (ML) [8], [9], referred
to as shallow learning. These studies often utilize physio-
multiple organs and tissues [1], [2]. Early detection could logical signals from various sensors as they are known for
prevent the damage, i.e., by measuring stress in daily life their effectiveness and accuracy in indicating stress [10]. Ad-
scenarios such as work, or providing a feedback warning ditionally, research has investigated the impact of stress on
in life-threatening situations for patients with, e.g., autism speech signals [11], [12], as well as how behavioral signals
[3], [4]. With computers increasingly developing the ability such as head pose, body posture, and movement can reveal
to recognize and express affect, the field of Affective Com- stress levels [13], [14]. Changes in typing patterns have
puting concerned with the connection to, origins in, and also been studied concerning stress [15], [16]. Expanding on
impact on emotion [5], has received growing attention in these methods, researchers have integrated multiple modal-
recent decades. Following Russell and the circumplex model ities to assess how different sensor measurements contribute
of affect [6], stress lies in the upper left quadrant of the to stress recognition [17].
emotional valence-arousal space, showing high arousal and While shallow learning techniques can be powerful,
negative valence characteristics. they often become inefficient when handling large-scale
Studies in the field, consider the psychological, cog- data. In contrast, deep learning (DL), which dominates the
nitive, behavioral, and physiological responses to detect contemporary ML field, offers the potential to address this
stress efficiently. These responses are measured by collect- challenge effectively. DL is particularly suited for managing
ing physiological signals, such as Blood Volume Pressure large-scale, high-dimensional data from multiple sources
(BVP), Electrocardiography (ECG), Electromyogram (EMG), such as speech, audio, and video [18]. Furthermore, shallow
Galvanic Skin Response (GSR), Respiration, and behavioral learning methods require time-consuming manual feature
signals such as speech, body movement, head position, extraction, which can lead to overfitting and necessitate
gestures or leveraging facial expressions and context from additional feature selection techniques [19]. This manual
social media [7]. process is also data-dependent and lacks generalizability
across different domains [20], [21]. A key advantage of
• M. Kyrou is with the Artificial Intelligence and Information Analysis DL is its ability to learn representative data attributes auto-
Laboratory, Dept. of Informatics, Aristotle University of Thessaloniki, matically through a general-purpose learning process, thus
Greece, GR 54124, and with Information Technologies Institute, Centre enhancing efficiency and applicability [22].
for Research and Technology - Hellas, Greece, GR 57001.
E-mail: [email protected]
Surveys in the field have reviewed studies on
• I. Kompatsiaris is with the Information Technologies Institute, Centre for physiology-based stress detection [8], [9], [23], and on
Research and Technology - Hellas, Greece, GR 57001. several modalities using ML techniques [24]. Here, we
E-mail: [email protected] survey studies that use at least one DL technique to detect
• P.C. Petrantonakis is with the Dept. of Electrical and Computer Engineer-
ing, Aristotle University of Thessaloniki, Greece, GR 54124. stress from various signals. The main aims of this survey
E-mail: [email protected] are to highlight the superiority of the DL approaches in
Manuscript received January 5th, 2023. Revised February 1, 2024 and August comparison with the traditional ML ones, to stress out
19, 2024 the peculiarities of the DL-based stress detection and the

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 2

solutions offered by the surveyed studies, and to point Biochemical indicators involve catecholamines, copeptin,
out the important research directions for future studies in prolactin hormones in the blood, and alpha-amylase and
the field. To the best of our knowledge, there is no other cortisol in the saliva [32].
survey concerning stress detection with DL approaches Chronic stress, depression, and anxiety, if not timely
which both consider all possible types of data (physiolog- detected and managed, carry heavy consequences on the
ical, speech, facial expressions, gestures, social media) and individual and society. The person becomes unable to cope
integrate information from multiple modalities. In addition, with daily routine and work, and physical and mental
we also provide information about publicly available bench- problems arise, resulting in a higher cost to health service
mark datasets attempting to delve into their attributes (e.g., providers [33], too. Other possible applications of stress
modalities, number of subjects, study environment) giving detection frameworks can be driver stress monitoring in
insights into the challenges associated with them within the automobile environments, passenger stress detection and
specific context of utilizing DL for stress detection. alleviation in commercial flights in airplanes, and support-
The organization of this paper is as follows. In Section 2, ing psychologists in online therapy sessions by continu-
we define stress, point out the significance of stress detec- ously monitoring the stress level of patients [34]. In addi-
tion, and introduce the main characteristics and applications tion, education could benefit from detecting and evaluating
of DL. Section 3 discusses the different data types and their mental stress affecting students’ performance adversely. By
relation to stress. Moreover, publicly available, benchmark monitoring indicators of students’ engagement and stress,
datasets are presented. In Section 4, we attempt to introduce educators can gain insights into the impact of different
the scientific methods, data analysis, and results of studies teaching content, and effectively adjust their teaching speed
in the stress detection area. In section 5, we analyze the and methods [35]. Tracking stress levels is also crucial in the
contribution of DL to the stress detection field and highlight industry, because workers in stressful situations are prone to
the advantages of such an approach. Moreover, we examine mistakes, and lacking safety measures constitutes the main
different aspects of DL-based stress detection research that reason for human-related problems in this area. [36]. In sum,
should be considered in future endeavors. Section 6 con- the development of robust stress detection mechanisms as-
cludes the paper. sists in improving the quality of social, academic, physical,
financial, and professional life [37].
2 BACKGROUND 2.2 Deep Learning
2.1 Stress DL has gained attention during the past several years by
All living beings have experienced stressors, which are situ- introducing a new ML breakthrough. Two essential factors
ations causing stress and stemming from both positive and led to the adoption of DL in modern technologies. The first
negative events. In general, stress arises when homeostasis is the continuous interest in Big Data analytics, where a huge
is threatened or perceived threatened [25]. Stress can be amount of information needs to be processed. As mentioned
distinguished between ‘eustress’ and considered beneficial, before, conventional shallow learning fails to manage these
enhancing a person’s productivity and motivation, and ‘dis- massive amounts of unstructured, heterogeneous data and
tress’, which produces negative emotional and physiological the manual extraction of representative features becomes
responses [26]. Depending upon the temporal pattern and impracticable. The second factor lies in the advances in par-
duration, stress is discerned in acute (short-term), episodic allel computing architectures, namely GPUs, which allow
(occurs periodically), and chronic stress (long-term) [27]. the efficient computation of network’s weights [38].
Acute stress may occur because of daily life demands such Commonly, DL employs Neural Network architectures
as work pressure, resulting in headaches, upset stomachs, and models which comprise multiple layers processing non-
and other symptoms. If acute stress episodes become more linear information. These layers correspond to many lev-
frequent, people suffer from episodic stress and symptoms els of abstraction and hierarchical feature representations
like anxiety and hypertension. The more severe type of where higher-level concepts help to define lower-level con-
stress, chronic stress, may arise from experiences such as cepts and vice versa. DL architectures were inspired by
wars and abuse and could become fatal [28]. Depending human information processing mechanisms that are also
on the type and duration of the stressor, the genetic back- hierarchically structured [39].
ground, age, and sex of the individual, and the stress con- DL algorithms have been applied to a tremendous vari-
text, the neuronal ensembles through the Central Nervous ety of applications and domains. Lately, DL algorithms are
System produce a different response. This stress-response also widespread for stress detection since they are capable
system is of vital importance for survival but prolonged of handling the continuous and real-time collection of multi-
exposure leads to chronic stress, and mental and physical dimensional data from wearable devices [40], [41], capture
health problems [29]. Evaluation of the quantitative stress the non-linear correlation across different data modalities
response is a key challenge because every person perceives [42] or generate and classify user-scope attributes [43]. Thus,
and reacts distinctly to the same stimuli. In addition, differ- a comprehensive review of the subject is timely and neces-
ent situations uniquely affect the same person [30]. Besides, sary for the stress detection field.
several common physiological or biochemical responses to
stress have been found. Some biophysiological markers in- 3 S IGNALS AND DATASETS
clude an increase in Heart Rate (HR), the blood supply to the The main characteristics of the signals used for stress de-
muscles, muscle tension, respiratory rate, skin temperature, tection and how they are related to the stress biologi-
sweat release, cognitive activity, and pupil dilation [9], [31]. cal/behavioral substrate are described in this section.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 3

3.1 Physiological signals 3.1.7 Body temperature


3.1.1 Electrocardiography (ECG) Physiological stress typically results in a rise in body tem-
ECG records the electrical activity of the heart measuring the perature [59]. In addition, several studies employ a body
potential difference of electrodes placed on particular body temperature sensor to eliminate artifacts in other signals,
parts [44]. An ECG signal has three main components: The such as GSR which is directly affected by temperature [60].
P-wave, QRS-complex, and T-wave. Heart Rate Variability
(HRV) can be defined as the variation in time between 3.1.8 Accelerometer
two consecutive heartbeats (R peaks) [45]. To measure the An accelerometer sensor records the acceleration along three
HRV, time, frequency, and nonlinear domain analysis are axes with gravity. It is a commonly used sensor for activity
used. The autonomic nervous system (ANS) is responsible recognition in terms of velocity and displacement [61]. This
for regulating cardiac activity by balancing the Sympa- sensor is suitable for real-life scenarios to detect behavior
thetic Nervous System (SNS) and Parasympathetic Nervous related to stress and in the removal of other physiological
System (PNS). SNS or PNS activity is detectable via their signals’ artifacts coupled with motion [62].
relation with low and high-frequency ranges respectively.
Increased activity in SNS, which can be caused by stress,
results in cardio-acceleration [46]. 3.2 Other signals or behavioral data
3.1.2 Blood Volume Pressure (BVP) and Photoplethysmo- 3.2.1 Speech
gram (PPG) When a person is under stress the speech production process
BVP signal, also known as PPG measures the blood volume varies. Different stressors affect speech production, namely
changes in the human body. Sympathetic activation as a the articulators, breathing rate, and muscle tension, or in-
consequence of stress, pain, or fear, causes changes in HR crease the vocal effort due to a noisy environment, experi-
and stroke volume, and thus, BVP constitutes a measure of ence, or a perceived threat. Speaking under stress increases
HRV as well [47]. subglottal pressure and pitch (fundamental frequency) [63].
Besides, in real-world environments speech signal is also
3.1.3 Galvanic Skin Response (GSR) affected by other factors like noise, accent change, and
GSR, also known as Electrodermal Activity (EDA), refers language variability, hindering automatic speech systems
to the increase in the electrical conductance of the skin performance [64].
or decrease in resistance across the palms of the hands
or the feet due to changes of contraction and dilation of 3.2.2 Facial expressions
blood vessels in the skin and the sweat gland secretion. It
reflects unconscious cognitive and emotional changes. GSR Face can depict a person’s emotional state through changing
is a good measure for SNS activation and thus is broadly expressions [65]. In [66], stress is found to have a positive
used to define stress levels [48], [49]. correlation with valence and a negative with arousal from
images with emotional facial expressions. Facial expressions
3.1.4 Electromyography (EMG) of emotion are linked to stress responses [67]. In particular,
Electrical currents generated in muscles during contraction various facial feature types are postulated to be connected
are measured by EMG signal, controlled by the nervous to stress and anxiety. These features comprise head (head
system [50]. The activation of SNS during stress experience movement, skin color, HR (facial PPG)), eyes (blink rate,
elevates muscle tone and sometimes produces shivering. eyelid response, eye aperture, eyebrow movements), mouth
Increased EMG activity due to mental stress is usually (mouth shape, lip deformation, lip corner puller/depressor,
observed and measured in the upper trapezius muscle and lip pressor), gaze (saccadic eye movements, gaze spatial
can cause musculoskeletal disorders [51], [52]. distribution, gaze direction) and pupil (pupil size variation,
pupil ratio variation) [68]. For example, under stressful
3.1.5 Electroencephalogram (EEG)
situations eyebrows frown frequently [69], head motility is
EEG is the prevalent method to access brain activity and increased and more rapid [70], mouth activity and blink rate
is used in clinical diagnosis and biomedical research [53]. are augmented [68], [69].
Stress hormones in the human body provide feedback to it
[54]. Quantification of stress from brain patterns is feasible
3.2.3 Gestures
by utilizing EEG. It is evidenced that there is a correlation
between levels of stress and EEG power [55] and that Negative stimuli induce stress in a person and the sponta-
different brainwaves reveal brain mental states. neous or involuntary response of his/her body, other than
the face, is called micro-gesture. If accurately detected, these
3.1.6 Respiration gestures reveal hidden emotional states such as stress and
Alterations in breathing patterns during stressful situations nervousness. Typical micro-gestures involve folding arms,
have been observed. In particular, when a person is in playing or adjusting the hair, folding arms behind the body,
a high arousal state, notably anger and stress, or during or crossing legs [71]. Moreover, gestures produced while
mental load and sustained attention, irregular and quickly interacting with a smartphone have been examined to access
varying respiration patterns are produced, as opposed to stress states. These gestures include ‘tap’, ‘scroll’, ‘swipe’,
slow respiration in a relaxed state [56], [57]. Respiration and ‘text writing’ by extracting features like the applied
can also influence the HRV, and thus, is often used as a pressure, the size of the touch, the duration of the interaction
supplementary input to HRV analysis related to stress [58]. [72], [73].

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 4

3.2.4 Social media content scores indicate a stressed person). In addition, big-five
Social media have set a new means of communication as personality trait scores [82] and participants’ self-ratings
gradually more people tend to share their inner thoughts provide the potential to connect the personality traits with
and moods, unfold everyday events, and interact with the emotional states.
friends. This behavior may enclose valuable pieces of in- 3.3.1.5 Cognitive Load, Affect and Stress Recogni-
formation about their emotional states. Studies have shown tion (CLAS): CLAS [83] database contains PPG, ECG, and
that the context of a post, the language used, the social EDA recordings from 62 individuals. The subjects partici-
network structure, the frequency of interactions, etc. can be pated in interactive tasks such as mathematical and logical
employed to detect stress, anxiety, or post-traumatic stress problems and perceptive tasks which involved emotional
disorder (PTSD) [74], [43]. images and video clips. The available labels consist of stress,
valence, and arousal.
3.3.1.6 Alcohol and Drug Abuse Research Program
3.3 Benchmark Datasets
(ADARP): ADARP [84] dataset consists of HR, EDA, tem-
Data is of great importance when deploying and evalu- perature, and bodily movement signals recorded from 11
ating ML algorithms. The performance, fairness, robust- individuals facing alcohol use disorder. The study aims
ness, safety, and scalability of Artificial Intelligence (AI) to exploit physiological signals and self-reports to detect
systems are highly influenced by data [75]. Towards this mental disorders and stress.
end, benchmark datasets are widely used by the community
to evaluate their proposed methods and carry out a direct 3.3.2 Speech datasets
comparison of their performance. Below we present in detail
the benchmark datasets for emotional state-stress detection 3.3.2.1 Speech Under Simulated and Actual Stress
that are publicly available (see also Table 1). (SUSAS): This dataset [85] includes speech under stress by
different speaking styles and in noisy environments. The
3.3.1 Physiological signal datasets dataset is formed by 35 aircraft communication words con-
sidered to be ambiguous (e.g., six-fix). A total of 32 speakers
3.3.1.1 DeepBreath: In this study, breathing patterns
took part, producing 16,000 utterances. In addition, pilots in
were collected as temperature changes around the nostril
Apache helicopter flight conditions were also employed as
using a low-cost thermal camera, to recognize people’s psy-
speakers.
chological stress levels. The tasks implemented and aimed
to induce mental stress were the Stroop Colour Word test
[76] and a mathematics test. The difficulty varied, in the 3.3.3 Facial expression datasets
expectation of eliciting a range of stress levels (none, low, 3.3.3.1 Extended Cohn-Kanade Dataset (CK+): The
and high). Whole sessions took 63 to 72 minutes for each of CK+ dataset [86] is intended for action unit and facial
the 8 subjects [77], [78]. emotion expression detection use. It includes 593 image
3.3.1.2 Stress Recognition in Automobile Drivers sequences from 123 subjects. The image sequence incorpo-
(SRAD): SRAD [79] dataset includes records from four rates the onset (neutral expression) to the peak formation of
physiological signals: ECG, EMG, GSR, and respiration. A the facial expressions. There are six basic emotion classes
total of 17 subject recordings are publicly available with a namely anger, disgust, fear, happiness, sadness, surprise,
duration of 65 to 93 minutes. The dataset was created during and contempt as another special facial expression. The
real-world driving tasks to determine a driver’s relative studies that exploit the CK+ dataset for stress recognition,
stress level. Three distinct driving conditions, namely rest, employ either anger, fear, and sadness [96] or, anger, fear,
highway, and city driving are present and correspond to and disgust [97] as stress-related facial expressions.
three stress levels, low, medium, and high, respectively. 3.3.3.2 Oulu-CASIA: Oulu-CASIA [88] contains fa-
3.3.1.3 Wearable Stress and Affect Detection (WE- cial expressions from 80 people. The six emotional expres-
SAD): For this data collection [80] a chest-worn device, sions are happiness, sadness, surprise, anger, fear, and dis-
RespiBAN Professional, and a wrist-worn device, Empatica gust. Expressions are captured in three distinct illumination
E4, were used. Chest sensors recorded ECG, GSR, EMG, conditions: normal, weak, and dark using Near Infrared
temperature, acceleration, and respiration. The signals from (NIR) and Visible light (VIS). Anger, fear, and sadness labels
the wrist device are BVP, GSR, temperature, and accelera- are associated with stress class in [96].
tion. In addition, meditation followed the amusement and 3.3.3.3 Karolinska Directed Emotional Faces
stress conditions for the subjects to ’de-excite’. The experi- (KDEF): KDEF [87] dataset contains 4900 pictures of human
ment lasted about two hours and 15 subjects participated in facial expressions. The facial expressions of 70 individuals
the experiment. produce angry, fearful, disgusted, sad, happy, surprised,
3.3.1.4 A multimodal databASe for impliCit pER- and neutral emotional states. Anger, fear, and disgust labels
sonaliTy and Affect recognitIoN (ASCERTAIN): ASCER- are associated with stress class in [97].
TAIN [81] is a multimodal affective dataset that includes 3.3.3.4 Keimyung University Facial Expression of
both physiological responses, namely EEG, ECG, GSR, and Drivers (KMU-FED): KMU-FED database [89] was created
data recorded from wearable and commercial sensors and for facial expression recognition in an actual driving envi-
facial activity recorded from a webcam while 58 subjects ronment. A NIR camera was attached to the dashboard or
were watching affective movie clips. The self-reported tags steering wheel and captured 55 image sequences from 12
are valence and arousal scores and can be used as ground subjects in a real vehicle driving environment. The emo-
truth labels for stress (i.e. high arousal and low valence tional expressions are happiness, sadness, surprise, anger,

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 5

TABLE 1: Publicly Available Datasets Used in Studies Recognizing Stress With DL Approaches.

Number of
Name Labels Modalities Study Environment
Subjects

ECG, EMG, RESP,


SRAD [79] Stress Levels (low, medium, high) 17 Real world driving
GSR (hand and foot)
ECG, GSR, EMG, RESP (chest),
WESAD [80] neutral, amusement, stress 15 BVP (wrist), GSR, ACC, Laboratory
TEMP (chest and wrist)
arousal, valence, engagement, liking, familiarity
ASCERTAIN extraversion, agreeableness, openness, emotional 58 ECG, EEG, GSR, EMO Laboratory
[81] stability, conscientiousness
CLAS [83] stress, valence, arousal 62 PPG, ECG, GSR Laboratory

ADARP [84] stress event markers, 5-point scale for 11 HR, GSR, TEMP, bodily movements Everyday life
negative/positive emotions, alcohol-related
cravings, pain-discomfort
DeepBreath Stress Levels (none, low, high) 8 RESP Laboratory
[77], [78]
stress, slow, fast, soft, loud, question,
SUSAS [85] 32 Speech Laboratory
clear, angry, Lombard effect
anger, disgust, fear, happiness, sadness,
CK+ [86] 123 Facial image sequences Laboratory
surprise, contempt
KDEF [87] angry, fearful, disgusted, sad, happy, surprised, 70 Facial images Laboratory
neutral
happiness, sadness, surprise, anger,
Oulu-CASIA 80 Facial Videos Laboratory
fear, disgust
[88]
happiness, sadness, surprise, anger,
KMU-FED [89] 12 Facial image sequences Real world driving
fear, disgust
FERET [90] N/A 1199 Facial images Laboratory

ANUStressDB stressed, not stressed 35 Facial Videos Laboratory


[91]
BR, HR, EDA (perinasal and
NASA TLX, STAI, Type A/B Personality scores,
MDVFDD [92] 68 palm), PRV, FACS signals, Driving simulator
stressor/event by session
Facial Videos
valence, activation, happiness, sadness, surprise,
NNIME [93] 44 audio, video, ECG Laboratory
anger, frustration, neutral
HR, HRV, BR, BM, step count,
sleep duration and quality,
TILES-2018 [94] variety of psychologically validated 212 Real working
speech, proximity,
questionnaires conditions
environmental data
pre and post mental health surveys, Passive and automatic sensing
StudentLife 48 Everyday life
spring and cumulative GPA data from phones
[95]

fear, and disgust. Anger, fear, and sadness labels are associ- 3.3.4 Multimodal datasets
ated with stress class in [96].
3.3.4.1 A multimodal dataset for various forms of
distracted driving (MDVFDD): The MDVFDD dataset [92]
3.3.3.5 Face Recognition Technology (FERET): The was obtained during a controlled experiment on a driv-
FERET program database [90] is a large dataset of face ing simulator. The recorded physiological signals are the
images. It contains 14,126 face images of 1199 individuals perinasal EDA, palm EDA, HR, and breathing rate. Apart
collected in 15 sessions between 1993 and 1996. The datasets from these explanatory variables, facial expression signals,
were essentially created for the development, evaluation, biographical and psychometric covariates, and eye tracking
and comparison of face recognition algorithms as part of data are also obtained, as well as, response variables like
the FERET program. speed, acceleration, brake force, steering, and lane position
signals. The 68 subjects drove the same highway under
3.3.3.6 ANU Stress database (ANUStressDB): Ther- four distinct conditions: no distraction, cognitive distraction,
mal spectrum and visible spectrum videos of the faces of emotional distraction, and sensorimotor distraction.
35 individuals were recorded during this experiment while 3.3.4.2 NTHU-NTUA Chinese Interactive Multi-
they watched films [91]. These films were labeled as stressed modal Emotion Corpus (NNIME): NNIME [93] is a mul-
or not stressed and validated by the participants, too. In timodal database consisting of audio, video, and ECG
total, each individual has six videos of each type. recordings. The 44 subjects had prior professional acting

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 6

experiences. The database comprises 102 dyadic interaction of mental state. Most studies employ a combination of
sessions with roughly 11 hours’ worth of audio-video data physiological sensors for collecting data.
of participants spontaneously performing a short scene that
targeted one of the six pre-defined affects (anger, sadness, 4.2.1 ECG
happiness, frustration, neutral, and surprise). A large pool ECG is a widely utilized physiological signal in studies
of raters (including self-reports) provides the ground truth aiming at discerning human stress. The psychological strain
for valence, activation, and six emotions which may serve associated with stress manifests its impact on HR and blood
for stress recognition evaluation. pressure. Consequently, it becomes feasible to ascertain the
3.3.4.3 Tracking IndividuaL performancE with Sen- presence or absence of stress in an individual through
sors, year 2018 (TILES-2018): Data collection originates from monitoring cardiac activity [98].
212 hospital employees in California [94]. While at work, Choosing the window length. By analyzing the ECG wave-
ECG, breathing patterns, and activity levels were recorded, form, researchers can identify specific patterns and abnor-
and feature extraction from their speech was conducted. malities that may indicate individuals’ stress levels. Within
They also wore a wristband collecting their HR, sleep qual- the framework of handling continuous time series data, such
ity, and activity continuously. Environment sensors were as ECG recordings, a critical consideration revolves around
used to track staff movement within the hospital. Partici- determining the optimal window length for temporally seg-
pants self-reported their stress levels daily. Their answers menting the signals for subsequent analysis. Traditionally,
may be used to distinguish between stressed days and researchers have employed long-term ECG recordings span-
relaxed days. ning 24 hours or short-term 5-minute snapshots [99] in an
3.3.4.4 StudentLife: Passive and automatic sensing attempt to capture precise statistical features related to HRV
data from phones are collected in StudentLife study [95]. [100]. However, the chosen window length must strike a
The data facilitated in accessing mental health, academic balance, incorporating data characteristics while facilitating
performance, and behavioral trends of 48 Dartmouth stu- real-time analysis and maintaining computational efficiency.
dents over a 10-week term. The StudentLife app which To address this challenge, researchers have delved into
ran on students’ phones, automatically measured human utilizing ultra-short-term ECG signals for stress monitoring.
behaviors 24/7 without user interaction. Also, there are pre A commonly adopted window length focuses on 10 seconds
and post-mental health surveys and spring and cumulative of ECG data [99], [101], [102], [100], [103], [104], [105]. Nev-
GPA ground truths for evaluating mental health and aca- ertheless, there has been an exploration into even briefer
demic performance respectively. Participants’ responses to time windows, such as 6 seconds [106] and 3 seconds [107],
the EMA questions (stress reports) can be used as ground demonstrating the ongoing efforts to optimize the balance
truth labels for stress. between distinctive aspects of the data, computational ef-
ficiency in real-time analysis, and memory allocation re-
sources.
4 D EEP N EURAL N ETWORKS FOR STRESS DETEC - Feature extraction. The analysis of HRV facilitates the
TION identification and understanding of stress patterns within
4.1 Methodology individuals. This approach, which encompasses temporal
We performed the following queries on Google Scholar, dynamics, frequency distribution, and nonlinear character-
Pubmed, IEEE Xplore, and Elsevier websites to collect the istics, underscores its efficacy in identifying variations in
papers for the survey: stress detection + deep learning, autonomic regulation. Hence, HRV parameters have been
stress detection + neural networks, stress detection + GSR frequently applied in the context of stress recognition [99],
or EDA or ECG or HR or EEG or speech or gestures or [103], [104], [105], [108], [109], [40], [110], [111]. Si-
facial expressions or social media, GSR or EDA or ECG multaneously, the efficacy of classification is profoundly
or HR or EEG or speech or gestures or facial expressions impacted by the quality and quantity of these features, and
or social media + NN or CNN or LSTM or RNN or Bi- additional steps, e.g., feature selection, are often required.
LSTM or GAN or VGG or Transfer Learning or Multitask These approaches, distinguished by their ad-hoc character-
Learning or self-supervised learning. We chose only papers istics, necessitate a substantial time investment and heavily
published between 2015 and 2022 to find out the latest trend rely on expert knowledge for successful implementation.
in stress detection analysis. Additional studies were found Moreover, their sensitivity to noise introduces an additional
by carefully reviewing the references of each publication layer of complexity, diminishing their robustness in real-
and throughout the preparation of the current study. We world applications [106].
selected papers applying at least one DL technique as a clas- An alternative strategy involves directly harnessing the
sification method of stress. All studies with accompanied information embedded within the R peaks of ECG signals.
implementation details are tabulated in Table 2. Toward this end, authors in [104] derived the spectrum
from the positions of R peaks and used it as input to a CNN
model. The detection error rate (ER) was improved using
4.2 Physiological signals DL compared to conventional ML methods and further
Physiological signals are the most common signals in as- reduced in an optimized version of the proposed approach
sessing stress in humans. Distinct emotions affect the ANS [105]. Ramteke et al. [109] also compared the performance
differently, and the effects may be identified via physio- of features extracted by HRV and conventional ML to DL
logical markers. Furthermore, these responses are difficult models (Inception and Inception-LSTM networks) and RR
for someone to imitate, thus providing a reliable indicator interval (RRI) time series. In the study by Bu et al. [112] and

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 7

Giannakakis et al. [113], authors employed the differential and labeling of large-scale data sets. The TL’s potential
RRI, which signifies the increment between successive RR is also examined in [106] attempting to detect drivers’
intervals. Substituting the RRI data with its differential stress in three levels from short-time windows in real-time,
counterpart, which constitutes a significant factor in the exploring CNN-based pre-trained networks. The Xception-
derivation of indices for HRV evaluation, has substantially based model [116] reached the highest accuracy with only
enhanced the training performance of the LSTM network one physiological signal, for three levels of stress detection,
[112]. under real-world conditions and no handcrafted features.
Raw ECG input. Instead of resorting to feature extraction, Ishaque et al. [117] exploited TL to detect stress from small
an alternative approach involves implementing end-to-end datasets. An autoencoder (AE) modified the data from the
methods utilizing raw ECG signals. Contemporary DL tech- SWELL benchmark dataset [118], which was then used to
niques enable the automated construction of reliable fea- pre-train a 1D CNN and a novel 1D VGG16. These pre-
tures directly from the data. Hwang et al. [99] used raw ECG trained networks are then employed to classify ECG signals
signals for monitoring mental stress. In the proposed model, from WESAD [80]. CNN + AE + TL showed the best results
‘Deep ECGNet’, CNNs could capture the ECG character- in terms of accuracy compared to VGG16 + AE + TL, RF
istics with the entire full shape information of the signal. + AE, and other related studies that classified stress using
After the one CNN stage, two RNN layers generate feature AE, CNN, or TL techniques. This work demonstrated the
vectors using sequential features extracted by the CNN capacity to develop DL approaches for stress detection even
stage. Zhang et al. [101] introduced DL models with at- with limited data using again TL approaches making that
tention mechanisms to detect physiological stress from raw research approach an interesting direction for future work.
ECG signals. A CNN-BiLSTM architecture is deployed and Self-supervised learning (SSL) is adopted in [119] to
compared with several DL models with attention mecha- compensate for the little available human-labeled data. In
nisms. Results indicated that CNN-attention-based BiLSTM this two-step training process, high-level data representa-
outperformed all other models in terms of accuracy. In [103] tions are learned through an unsupervised pre-training task,
the CNN-BiLSTM model’s performance was compared with and the pre-trained weights are transferred to a network
conventional ML classifiers and handcrafted features ex- responsible for stress recognition. Sarkar and Etemad [120]
tracted from the RR interval or HRV. A 22.8% improvement compared SSL to fully supervised learning and achieved re-
in terms of accuracy was achieved with the CNN-BiLSTM markable performance using significantly less labeled data.
model and ECG signals of 10s, proving its potential for ECG visual representation. The remarkable success of
real-time stress monitoring and the potential of attention CNNs in the realm of image processing serves as a notable
mechanisms for stress detection revealing a temporal aspect paradigm of DL achievements. The ability of CNNs to auto-
of the stress expression in ECG that should be further matically learn and extract meaningful features from images
investigated. The short-time 10s window is also utilized in has greatly improved the accuracy and efficiency of im-
[102] to recognize stress/burnout in ECG data of healthcare age classification tasks. Following this, research approaches
workers. Authors proposed a 1-D CNN model, which is a have investigated strategies for transforming the ECG signal
modified version of AlexNet [114]. A wider window of 30s into a visual representation, leveraging CNNs’ inherent
ECG is utilized by Behinaein et al. [115] in their end-to-end capabilities in processing and categorizing structured visual
network (with a convolutional subnetwork, a transformer data. Sardeshpande and Thool [110] transformed the 1D
encoder, and a fully connected (FC) subnetwork). Authors signal to a 2D time-frequency representation. Features from
also perform fine-tuning using 1%, 5%, and 10% data to the last convolutional layer were also extracted for catego-
calibrate the model due to little data from each subject. rization with classical classifiers which drastically improved
The issue of limited accessible data was addressed by the their performance, proving the effectiveness of this kind
data augmentation approach in [113], with encouraging of network as a feature extractor as well. In addition, the
results. The proposed 1D Deep Wide CNN outperformed utilization of 2D representations of 1D signals proved to be
single-kernel networks, exhibiting less variability among the valuable when using CNN architectures that are constructed
different experimental phases and signal transformations. to be efficient in 2D or 3D signal analysis. Huang et al. [111]
Transfer and Self-supervised Learning. Since training a neu- extracted the Inter-Beat Intervals (IBIs) from ECG signals
ral network with a limited dataset is challenging, leveraging and transferred them to images that were used as input to
a pre-trained model for fine-tuning becomes more feasible, a CNN. Performance accuracy of detecting cognitive stress
especially when the distribution of data to be learned is with the proposed methodology was significantly higher
similar to that of the pre-trained model. Cho et al. [100] compared to time-domain features from IBIs and a simple
concentrated on a method using raw ECG signals in short ANN. Ahmad and Khan [121] constructed images based on
windows for ambulatory and laboratory stress detection. R-R peaks and temporal correlation from ECG signals. The
Deep CNN is utilized for training a large benchmark [79] images were used as inputs to CNN models which used
and a smaller dataset. The authors also trained the small decision-level fusion to recognize stress in five levels. Fusion
dataset with the Transfer Learning (TL) method using the of decision information appeared to be more effective in this
network pre-trained on the benchmark dataset, adjusting work highlighting also this approach for stress detection
and re-training some layers. Interestingly, the best result utilizing multiple DL classifiers and features.
was achieved with the shortest window. The utilization of In sum, studies with ECG have demonstrated the superi-
the TL method in this work highlighted the potential of ority of the DL approach in comparison with the traditional
incremental fine-tuning approaches to stress detection that ML ones. CNNs and LSTMs comprise popular network
would compensate for difficulties in reliable data gathering architectures and are often combined to deploy a deeper

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 8

model. Moreover, real-time stress detection is shown to While EEG has low spatial resolution, 2D and 3D maps
be possible using ECG data. Acquiring large datasets of can be extracted from EEG signals, preserving spatial brain
physiological signals, such as ECG, is not often feasible or information by leveraging the scalp’s electrode locations
practical, thus researchers have investigated TL approaches, and time-frequency analysis as in [127]. The authors ex-
fine-tuning and calibration of the models, or augmentation amined 2D and 3D AlexNet networks, either pre-trained or
of the existing data. Methods of transforming the 1D ECG with random initial weights, also suggesting the applicabil-
signal to a 2D image and feeding it to a CNN model are ity of TL in stress detection.
considered in some studies exploiting the power of CNNs Work-related stress could impinge fatigue, mental over-
in image-like data classification. load, productivity, safety, and health. In [128], EEG signals
are classified by a simple DNN to predict the user’s stress
4.2.2 PPG and relieve fatigue from multitasking. Moreover, the need
for limiting stress on construction workers is highlighted in
Over the past decade, there has been a notable increase in
[129], where DL addressed varying stress levels, as opposed
the prevalence of wearable technology. Authors in [122],
to the SVM algorithms which are often constrained to a
aimed to exploit the PPG sensor from wearable smart-
binary classification framework.
watches to detect both stress states and stressor types. IBI
Studies concerning the EEG signal for stress detection
and BVP values were extracted from the PPG time series
propose that real-time and accurate predictions can be
and then transformed into 2D images in the spatial and
achieved employing DL techniques in various contexts and
frequency domain wishing to utilize the potentials of CNNs.
applications such as simulated driving, work, or even with
They also developed conventional ML classifiers given the
people with autism spectrum disorders. In addition, the
lighter computational demands compared to DL. Can and
superiority of DL approaches is also confirmed for EEG.
Ersou [123] were interested in preserving the security of
Moreover, DL approaches require less time for feature engi-
data deriving from wearable devices. On the grounds of
neering as they require minor pre-processing of the signals.
this, they applied Federated Learning (FL) in two use-case
Finally, TL approaches are also applied in EEG-based stress
scenarios and three datasets with PPG-based heart activ-
detection highlighting the value of this approach under
ity data. FL combined with MLP yielded higher accuracy
limited data situations for physiological signals.
than traditional learning or SVM in a binary classification
scheme. In this work, data privacy issues were highlighted
4.2.4 Respiration
and taken into consideration paving the way for more such
approaches in future stress detection implementations using Cho et al. [77] introduced an automatic stress recognition
FL. method from breathing patterns using a low-cost thermal
imaging camera. A two-dimensional respiration variability
4.2.3 EEG spectrogram (RVS) automatically transformed from one-
dimensional signal sequences to avoid the extraction of
Detecting stress through the analysis of EEG signals using handcrafted features. Considering the data needs of DL,
DL techniques presents an interesting and promising area of the authors augmented the RVS dataset by proposing a
research. DL models can be trained to recognize patterns in unidirectional sliding cropper with a square window. A
the recorded brain electrical activity associated with stress CNN model architecture is employed for the classification
and foster timely interventions or support. In particular, scheme. Fernández and Anishchenko [130] presented a
EEG allows the estimation of a person’s stress in real-time stress detection technique based on respiratory signals taken
due to its high temporal resolution. This can become benefi- by a bioradar designed at Bauman Moscow State Technical
cial when strategies for stress mitigation need to be devised. University. The method is based on radar signal modulation
Driving, for example, is a context where these approaches by contractions of the heart and vessels, as well as reciprocal
are particularly relevant, as suggested by Halim and Rehan movements of the chest wall and abdomen generated by
[124] who studied driving-induced stress by simulating respiratory muscle activity, which are the major causes of
normal and stressed driving conditions. They followed a such signal variations in a person at a steady state.
traditional methodology of extracting and selecting EEG Although breathing patterns for stress detection are not
features. However, recent advances in DL hold promise used as extensively as the previous physiological signals,
in optimizing and refining these processes, addressing the from the works mentioned above it is evident that it is a
constraints associated with the feature extraction and classi- reliable modality for stress detection. Furthermore, 1D to
fication stages. 2D transformation of the signals seems to be a ubiquitous
With a view to this end, the superiority in terms of fewer technique for stress detection in physiological signals as
time requirements for training and classification accuracy it favors the more efficient exploitation of the capacities
of DL over shallow is featured in [125]. CNN models with of the CNN models. Finally, the key advantage of stud-
raw EEG signal inputs outperformed classical ML at com- ies presented in this section lies in the non-contact, non-
putation time and accuracy performance, as well as proved invasive data acquisition techniques as they do not require
excellent feature extractors for ML algorithms. EEG-based the subject to wear any sensor.
stress detection in adolescents with autism is studied in
[126] providing also a breathing entrainment system for
stress alleviation. The LSTM approach yielded the highest 4.3 Speech
classification accuracy as it effectively captures temporal Speech constitutes a crucial communication channel con-
information in EEG. taining human status information such as gender, age, and

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 9

emotional state making it a vital source of information in nisms have been proven valuable for speech processing in
everyday life [131], but also in communication between other applications as well. TL is again one technique that
humans and intelligent systems [132]. Speech emotion when leveraged seems to contribute to the performance of
recognition remains a challenge due to sound diversity and the models, especially for small datasets.
cultural expression. Notwithstanding, speech signals are
readily recorded using microphones, instead of bio-signal- 4.4 Facial expressions
based methods, which need direct sensor attachment to the
Facial and action-based cues. Stress is strongly related to facial
body.
expressions of fear, sadness, anger, and disgust. Conse-
Studies have developed innovative methods for enhanc-
quently, during episodes of anxiety, observable facial man-
ing stress recognition in speech by addressing various chal-
ifestations of these negative emotions are to be expected.
lenges in feature extraction and speaker variability. Huang
Pediaditis et al. [141] extracted several features from the
et al. [133] segmented speech into verbal, nonverbal, and
facial area related to head motion, blink rate, eye movement,
silent parts, using CNN-based models to extract features
mouth, and heart rate from facial color in order to explore
from each segment. These features were processed with an
the signs of anxiety. Besides, the small size of the dataset
LSTM equipped with an attention mechanism. It was shown
and the lack of personal baselines probably hindered the
that important characteristics in stress speech recognition
performance of the recognition algorithm. Zhang et al. [142]
tasks lie in both verbal and nonverbal sounds (e.g., laughter)
introduced a video-based stress detection network with
within an utterance. In addition, the efficacy of CNNs as
attention mechanisms to exploit users’ action cues along
feature extractors and the processing power of LSTMs for
with facial features.
sequential data are both showcased in this paper. Similarly,
Handling dataset challenges. To address the limitations
Han et al. [132] applied LSTM structures to capture temporal
of small datasets, recent studies have employed various
information from MFCC features by averaging the output
techniques such as data augmentation, advanced network
sequence and using the last frame-level output, emphasiz-
architectures, and TL. Toward this end, Zhang et al. [96]
ing the role of LSTMs in handling sequential speech data.
performed data augmentation to compensate for the small
Shin et al. [134] developed a speaker-invariant approach
dataset. In their work, Multi-task Cascaded Convolutional
to manage variations in individual speech patterns. They
Networks (MTCNN) are first trained for face detection and
used a spectral-temporal encoder and a multi-head attention
key landmark location. The employed CNN for classifi-
mechanism to capture local and global speech relationships.
cation uses the structure of the LeNet-5 model [143] to
In addition, adversarial multi-task training was used to dis-
connect low-level features learned by the network and high-
tinguish stress-related characteristics from speaker-specific
level features. This approach returned the best accuracy
traits.
results compared to the same CNN model but without
OpenSMILE toolkit. In the literature for Speech Emo-
connections. Prasetio et al. [144] developed a system that
tion Recognition (SER), the open-source toolkit OpenSMILE
divides images into three parts, eyes, nose, and mouth, and
[135] has been widely employed for audio feature ex-
features are extracted and augmented by combining them.
traction and classification of speech signals. Partila et al.
In the last step, a CNN classifies data into three classes,
[131] extracted several features with the assistance of this
surpassing previous works that employed SVM. The lack of
toolkit and proved that modern models such as CNNs have
a benchmark dataset for stress/no stress classification led
the potential to be used in the context of stress detection
Almeida and Rodrigues [97] to investigate TL with fine-
despite the need for large amounts of data. Avila et al. [136]
tuning techniques.
proposed a new set of modulation features and functionals-
In a nutshell, DL approaches outperform traditional ML
based pooling mechanisms and investigated their applica-
ones when using facial expressions as is demonstrated with
bility in stress detection context compared to the benchmark
other signals as well. The small datasets appear to be an
OpenSMILE features. A CNN model and the proposed fea-
important issue for this research area and TL has been used
tures outperformed other shallow and deep classifiers in the
to alleviate it. The stress emotional state is assumed to be
most difficult task involving nine classes. Interestingly, CNN
significantly related to other emotional states in terms of
also efficiently replaced the need for statistical pooling, a
facial expressions and respective labels in the data sets are
step that boosted performance in previous works [137].
used to represent the stress state.
Transfer and semi-supervised Learning. TL is considered in
[138] aiming to detect PTSD. The authors propose train-
ing a DBN model with data derived from a large speech 4.5 Gestures and body movement patterns
dataset [139] and then transferring the knowledge learned Chen et al. [71] developed a dataset for recognizing sponta-
to compute deep features for two small datasets. The new neous micro-gestures, which are subtle movements elicited
deep representations of data provide higher accuracy results by inner emotions. They captured four types of data during
when compared to traditional raw features and the SVM their experiments: RGB videos, silhouette images, depth
classifier. Finally, the lack of labeled data in [140] is compen- videos, and 3D skeleton coordinates. To model the temporal
sated by adopting semi-supervised learning with autoen- aspect of these gestures, they employed a Hidden Markov
coders. The findings show that the suggested approaches Model (HMM) to analyze sequences of 3D skeleton data.
increase recognition performance by learning priors from A Deep Belief Network (DBN), leveraging Restricted Boltz-
unlabeled data in settings with few labeled instances. mann Machines (RBMs), was used to estimate the HMM
In sum, speech-based stress detection is dominated by states. Additionally, a two-layer Fully-Connected Network
attention mechanisms. This is expected as attention mecha- was trained to detect the gestures.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 10


TABLE 2: Studies for Stress Detection

Ref Dataset Model(s) Modalities Best Performance Validation

17.3±9.2
[104] Their Own Dataset CNN, LDA, SVM ECG 4-fold CV
(Error Rate CNN)
CNN, SVM, KNN,
[110] Their Own Dataset ECG 97.22% (Acc. CNN) 5-fold CV
LDA, DT

DNN, DT, kNN,


[100] SRAD [79], Their Mental ECG 93.8% (AUC DNN) 10-fold CV
LR, RF, SVM
Arithmetic Data Set
Deep ECGNet, 87.39%, 73.96%
[99] Their Own 2 Datasets ECG 5-fold CV
several ML classifiers (Acc. Deep ECGNet)
MLP, NB, SVM,
[146] Their Own Dataset ECG 83% (AUC C4.5) LOOCV
Adaboost, C4.5
[117] SWELL [118] VGG16, CNN ECG 98.99% (Acc. CNN + AE + TL) 10-fold CV,
various
statistical
tests
[109] SRAD [79], Their Own SVM, shallow ANN, Inception and ECG 97.19% (Acc.) Hold-out set
Dataset Inception-LSTM
[107] DriveDB CNN ECG 83.55%, 98.77% (Acc. 2,3 classes) 4-fold CV

[112] Their Own Dataset LSTM HRV 87.9% (Acc. 2 classes) LOOCV

[101] Their Own Dataset CNN-BiLSTM, DL with attention ECG 86.8% (Acc. attention) 5-fold CV
mechanism
[103] Their Own Dataset CNN-BiLSTM, shallow ML ECG 86.5% (Acc. CNN-BiLSTM) 5-fold CV

[111] Their Own Dataset CNN ECG 92.8% (Acc.) Hold-out set

[115] WESAD [80], SWELL-KW CNNs ECG 91.1% (Acc. WESAD) LOSO CV
[118]
[102] Their Own Dataset CNN (AlexNet [114]) ECG 99.397% (Acc.) 5-fold CV

[106] SRAD [79] CNN-based pre-trained networks ECG 98.11% (Acc. Xception) Hold-out set

[121] Their Own Dataset CNN ECG 85.45% (Acc.) 10-fold CV

[105] Their Own Dataset CNN ECG 9.2 ± 5.7% (ER) 4-fold CV

[120] SWELL CNN with SSL ECG 98% (Acc.) 10-fold CV

[119] WESAD, RML SSL ECG 94.09% (Acc. WESAD) 10-fold CV

98%, 94.5% (Acc. CNN


[108] Their Own Dataset CNN, KNN, MLP, SVM HRV 10-fold CV
cognitive, emotional stress)
[40] Their Own Dataset LSTM HRV 80% (Acc.) Hold-out set

[122] WESAD, Their Own CNN, shallow ML classifiers PPG 99.18% (Acc. stress detection, Extra 5,10-fold CV,
Dataset Trees), 98.5% (Acc. stressor types, LOSO CV
CNN)
[123] Their Own Datasets FL + MPL, SVM PPG 89.55% (Acc. FL + MPL) Hold-out set

[126] Their Own Dataset SVM, CNNs, LSTMs, LSTM-CNN EEG 93.27% (Acc. LSTM) Hold-out set

97.95% (Acc. Ensemble


[124] Their Own Dataset NN, SVM, RF EEG 10 fold-CV
method)
[128] Their Own Dataset DNN EEG 80.13% (Acc.) Hold-out set

[129] Their Own Dataset DNN, CNN, SVM EEG 86.62% (Acc. DNN) Hold-out set

[125] Their Own Dataset CNN, shallow ML models EEG 96% (Acc. CNN) Hold-out set

[127] DEAP [147] pre-trained AlexNet EEG 86.12% (Acc. 3D maps) 10 fold-CV

[77] DeepBreath [77], [78] CNN, 3 shallow NNs RESP 84.59%, 56.52% (Acc. CNN binary, 8-fold LOSO
multi-class) CV
[130] Their Own Dataset MLP RESP 94,44% (Acc.) 10-fold
LOSO CV
SRAD [79],
MT-NN, DRCN-MT-NN, 96% (F-score MT-NN,
[148], MDVFDD [92], HR, GSR Hold-out set
LR, linear SVM, RBF SVM AUC DRCN-MT-NN)
[149] Their Own Dataset
[150] SRAD [79] CNN HR, GSR 95.67% (Acc.) LOOCV

[151] WESAD [80] LSTM, RF, Least-Squares Boosting, ECG, GSR >90% (Acc. NARX) LOSO CV
NARX
[152] ASCERTAIN [81], CLAS CNN ECG, GSR 75.5%, 69.9% (Acc. ASCERTAIN, Hold-out set
[83] CLAS)
[153] ASCERTAIN [81], CLAS GRNN, CNN with/without TL ECG, GSR 78.8%, 72.6% (Acc. ASCERTAIN, Hold-out set
[83] CLAS CNN with TL)
Continued on next page

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 11


TABLE 2 – Continued from previous page
Ref Dataset Model(s) Modalities Best Performance Validation

[154] ASCERTAIN [81], CLAS CNN using MMTM ECG, GSR 85%, 80.3% (Acc. ASCERTAIN, Hold-out set
[83] CLAS)
[155] Their Own Dataset CNN BVP, GSR 49.23% (Acc. 3 classes) 5-fold CV

[156] Their Own Dataset shallow ML models, LSTM BVP, GSR 90% (AUC LSTM) Hold-out set

[157] Their Own Dataset DT, RNN, CRNN BVP, GSR 71.6% (AUC RNN) 3-fold CV

FFDNN, RF, DT, AdaBoost, GSR, ECG, BVP, 95.21%, 84.32%


[158] WESAD [80] LOSO CV
kNN, LDA, Kernel SVM RESP, EMG, TEMP (Acc. FFDNN 2, 3 classes)
[159] WESAD [80] CNN GSR, ECG, BVP, 96.62% (Acc.) LOSO CV
RESP, EMG, TEMP,
ACC
CNN (chest signals), GSR, ECG, BVP, 99.65%, 99.80%
[160] WESAD [80] 10-fold CV
MLP (wrist signals) RESP, EMG, TEMP (Acc. CNN, MLP)
[161] WESAD [80] Hierarchical CNN GSR, ECG, BVP, 87.7% (Acc.) 4-fold CV
RESP, EMG, TEMP,
ACC
CNN-LSTM, SVM, RF,
[162] Their Own Dataset ECG, RESP 83.9% (Acc. CNN-LSTM) 5-fold CV
kNN, LR, DT
92%, 90%, 89% (group 1,
[163] Their Own Dataset CNN BR, HR, HRV, GSR 7-fold LOSO
group 2, all-driver)
CV
[164] SRAD [79] MLP, PILAE with Adaboost ECG, EMG, FGSR, 90.09 (Acc. PILAE with Adaboost, LOOCV
HGSR, HR, RESP FGSR)
[165] WESAD [80], ADARP [84] k-NN, DT, CNN ACC, EDA, BVP, 98.86% (Acc.) LOSO CV,
TEMP Hold-out set
[166] Their Own Dataset CNN HR, RESP, EDA, 90% (Acc. all signals) Hold-out set
EEG
[136] SUSAS [85] CNN, SVM, DNN Speech 70% (Acc. CNN) 3-fold CV

[133] NNIME [93] CNN-LSTM Speech 52% (Acc.) Hold-out set

[132] Their Own Dataset LSTM-SVM, LSTM-Softmax Speech 66.4% (Acc. LSTM-SVM) Hold-out set

Recordings of 112 87.9%, 87.5%


[131] CNN, SVM, kNN Speech Hold-out set
emergency line (Acc. SVM, CNN)
98.01%
[134] Multimodal dataset [132], CRNN-Attention Speech Hold-out set
(Acc. on SUSAS 2 classes)
SUSAS dataset [85]
[138] TIMIT [139], Their Own DBN with TL Speech 74.99% (Acc.) LOSO CV
Datasets
[140] AEC, ABC, EMO, SUSAS semisupervised AE Speech 63.6% (UAR) 5-fold CV
[85], GeWEC
[142] Their Own Dataset TSDNet, RF, Gaussian NB, DT, Facial Expressions 85.42% (Acc. TSDNet) Hold-out set
FDASSNN

[144] FERET [90] CNN Facial Expressions 95.9% (Acc.) 5-fold CV

[141] Their Own Dataset MLP, NB, Bayes network, SVM, DT Facial Expressions 73% (Acc. MLP) LOOCV

[96] CK+ [86], Oulu-CASIA MTCNN Facial Expressions 99.3% (Acc. on KMU-FED) 10-fold CV
[88], KMU-FED [89]

[97] KDEF [87], CK+ [86], Their VGG16 with TL Facial Expressions 92.1% (Acc.) Hold-out set
Own Dataset
[71] Their Own Dataset DBN micro-Gestures 60.6% (Acc.) 4-fold LOSO
CV
[145] Their Own Dataset Attention-Based CNN-BiLSTM Body movement 76.22% (Acc.) LOSO CV

[167] Tweets from Weibo, FGM+CNN, LR, SVM, RF, GBDT Tweets from Social 91.55% (Acc. FGM+CNN) 5-fold CV
Twitter Media
[168] Twitter history MTL, STL Tweets 80% (AUC MTL) plot ROC

[169] TILES [94] Bi-LSTM Multimodal ≃65% (Acc.) Hold-out set

[170] StudentLife [95] LSTM, CNN, CNN-LSTM Multimodal 62.83% (Acc. LSTM) Hold-out set

[171] Their Own Dataset MTL-NN, MTMKL, HBLR Multimodal 81.5% (Acc. MTL-NN) 5-fold CV

[42] Their Own Dataset CNN-LSTM Multimodal 92.8% (Acc.) Hold-out set

[172] SNAPSHOT study [173] LSTM, SVM, LR Multimodal 83.6% (Acc. LSTM) 5-fold CV

[174] SNAPSHOT MTMKL, HBDPP, NN with MTL Multimodal 86.07% (Acc. for stress) 5-fold CV

[41] Their Own Dataset DNN Multimodal 96.05% (Acc.) Hold-out set

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 12

A person’s body movement patterns may also reveal convolutional and recurrent layers is also employed in [149]
signs of stress. Shin et al. [145] exploited an IR-UWB radar for domain adaption. Lee et al. [150] were also interested
to capture signals in a non-contact manner and proved that in driving stress detection. Their multimodal CNN model
compared to ACC signal denoting motion from wearables, architecture was able to learn representative stressed pat-
a higher performance detecting stress can also be achieved. terns using two-dimensional nonlinear input. A valuable
The attention mechanism in the proposed classification addition to the discussion of efficient real-time detection
model aims to identify dependencies in consecutive stress- was added by Martino and Delmastro [151] who were
induced data at both the local and global levels. inspired by e-health systems and attempted to detect stress
This area of research on stress detection seems to have with a higher resolution, i.e., in multiple levels, to provide
limited representation in the literature probably due to a enhanced feedback to users.
lack of publicly available data sets. Based on the strong link Instead of developing personalized models, Radhika and
between body-related expressions and stressful situations, Oruganti [152], [153], [154] were interested in subject-
this particular area should be further investigated in the independent stress detection with ECG and GSR data. Their
future. efforts concentrated on searching for the optimal level to
concatenate ECG and GSR in a CNN model [152]. Results
indicate that deep multimodal fusion on convolutional lay-
4.6 Social media content
ers is optimum compared to fusion on fully connected lay-
Lin et al. [167] studied the recognition of stress based on ers. In addition, TL is examined and improved the model’s
social interactions in social networks. The proposed archi- accuracy performance [153]. In [154], a Multimodal Transfer
tecture model is a CNN with cross-autoencoders to gener- Module (MMTM) is added after the last convolutional layers
ate user-level content attributes from tweet-level attributes. for intermediate fusion of the two modalities, and late fusion
Benton et al. [168] focused on estimating suicide risk and by taking the maximum classification probabilities. This
mental health in a DL framework utilizing Twitter users’ method along with TL, further enhanced the classification
history data. The mental conditions examined are neu- performance.
roatypicality, suicide attempt, depression, anxiety, eating, Seo et al. [162] aimed at recognizing mental stress from
panic, schizophrenia, bipolar disorder, PTSD, and gender. raw ECG and respiration signals. The Deep ECG net they
MTL predictions for anxiety were shown to reduce the ER propose contains two convolutional layers for each signal
compared to single-task models. Performance was further that extract features and two LSTM layers obtaining sequen-
improved when gender was included as an auxiliary task tial information about the features extracted.
within the MTL framework. Multiple physiological signals. Approaches for detecting
Again, this subfield appears to have fewer DL-based stress using physiological signals from more than two
studies for stress detection. Nevertheless, based on the vast modalities have also been investigated. Recognizing driving
information flow on social media and its relation to the stress by multiple modalities is discussed in [163] using a
emotional traits of the users, this line of research should CNN model architecture and statistical features and in [164],
be further investigated. suggesting stacked AE for automatically extracting features.
In the latter, experiments with single input signals indicated
4.7 Multimodal signals that FGSR alone can produce similar results, thus the trade-
off between cost and performance is discussed. Similarly,
4.7.1 Multiple physiological signals in [165], the performance of stress detection algorithms is
GSR and heart-related signals. In many research studies aimed compared using a single ACC, EDA, BVP, or temperature
at monitoring stress, the inclusion and analysis of several against using all the aforementioned signals. The authors
physiological data is a prevalent strategy. GSR and heart- in [166] combined HR, RESP, and EDA along with EEG
related signals, for instance, are frequently coupled to iden- data to enhance the performance of a CNN but reduce the
tify stress. The key reason is that these sensors may be convenience of the wireless platform.
included in wearable devices, interfering with the user as Studies on WESAD and multiple physiological signals. The
little as possible and allowing for real-time analysis and following four works investigate this multimodal approach
feedback. Studies in the field have examined applications in using the WESAD dataset. In [160] an end-to-end deep CNN
the context of user experience analysis [155] by classifying was proposed, consisting of identical 1D convolutional
BVP and EDA signals from wearables with an end-to-end blocks, for chest signals, and a multilayer perceptron for
DL architecture (CNN) without feature extraction. Other wrist signals. The CNN in [161] is hierarchical in the sense
applications include the real-time monitoring of adults and that 1D CNN sub-subnetworks exist for each input signal,
older adults [157], [156], where BVP and GSR are acquired and 1D CNN subnetworks exist for each device (wrist or
by wearable devices, handcrafted features are extracted and chest-worn) while performing better when compared to
classified by LSTM and RNN networks. traditional ML classifiers used in a previous study [175].
Detecting stress in driving conditions is discussed in In [158] statistical features are extracted and classified with
[148], [149], [150]. Saeed et al. [148], [149] focused on shallow ML classifiers or a simple deep feedforward NN
detecting stress in driving conditions utilizing HR and which provides an improvement in accuracy recognition.
GSR signals derived by wrist-worn devices. The extracted In [159] Fourier transforms, cube root, and constant Q trans-
features are classified by a multitask NN (MT-NN) with a form (CQT) are applied to examine different representations
specific layer for each subject (subjects-as-tasks) to provide of the signals as input to a CNN model with cube root
personalized models in [148]. A hybrid model of temporal showing improved results in the model’s accuracy.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 13

Concisely, heart-related signals and GSR signals are or Hierarchical Bayesian learning when combined with
more frequently combined to detect stress in various con- MTL.
texts, such as driving, or in different age groups, such as Despite the limited literature on multimodal stress de-
older adults and students. The fusion of more than two tection, it appears as a promising research path. While the
signals is also investigated in the literature, not leading to multiple physiological signals approaches did not seem to
significantly better results, whereas the issue of the fusion provide significant gains in contrast with a single-signal ap-
techniques arises as a fundamental module in the classifi- proach, the multimodal stress detection area seems to lead
cation framework that has to be appropriately dealt with. to the enhancement of the respective models probably due
Again TL has also been utilized in this line of research but to the heterogeneity of the modalities under consideration.
not so extensively as in the case of a single physiological
signal.
5 D ISCUSSION
4.7.2 Diverse modalities Detecting stress in an efficient and timely way helps circum-
Gaballah et al. [169] leveraged speech, location, and circa- vent its adverse implications on a human’s general health.
dian rhythm to add contextual information that presumably On that account, there is a growing interest in capturing the
enhances stress recognition performance. The authors sug- traits that stress states exhibit by utilizing multiple sensors
gested a Bi-LSTM model to ensure feedback from future for data collection and analysis.
time steps. Based on experimental results, adding contex- Physiological signals and their alternations during
tual information improved both accuracy and F1 scores stressful moments are highly examined in studies aiming
when audio with location was included. They were further to recognize stress. Among them, the sensors that track
improved when audio with location and circadian rhythm heart activity are the most popular and usually embedded
were included. in wearable devices. The need for real-time detection led
Aristizabal et al. [41] examined stress detection with the researchers to consider raw or minor-processed ECG
physiological signals in combination with behavioral mea- data in short-time windows (e.g., 10s). In some studies,
sures intending to alleviate stress in workplaces. The au- techniques for converting the 1D signals to 2D images as
thors designed two DL models, one that takes as input only input to CNN-based models are considered. In addition,
the physiological signals, and a second which receives both pre-trained DL networks helped deal with the little data
physiological data and behavioral measures. The proposed in the field. Results indicate that leveraging DL methods
method demonstrated that self-reports along with physi- boosted also the performance compared to algorithms based
ological data greatly contributed to more accurate stress on hand-crafted features and traditional ML. Recording the
detection. EEG activity in a non-invasive, wireless way is proposed in
Rastgoo et al. [42] classified drivers’ stress employing some studies to deploy DL algorithms that also suggest that
ECG signals, vehicle dynamics data, and contextual data. less time in classifying mental states is required compared
They propose a DL multimodal fusion framework consist- to shallow ML.
ing of CNN to automatically extract features and LSTM to Exploiting non-contact means such as thermal imaging
classify stress in three levels. The data collection procedure cameras and bioradar to capture alterations in breathing
involved different driving scenarios to induce different lev- patterns related to stress is also a promising field since in
els of stress. Comparison with handcrafted features revealed certain environments such as hospitals, schools, or work
the effectiveness of the DL method. wearing sensors may be uncomfortable or distressing and
Acikmese and Alptekin [170] attempted to predict the the use of cameras brings up privacy concerns.
stress levels of students from passive mobile data. Towards The fusion of multiple physiological signals formulates
this direction, authors leveraged data from StudentLife the proposal to capitalize on various data as input to classi-
dataset [95]. Sensor data include three types of data: i) fiers. The trade-off between cost and performance should be
activity and audio inference, ii) conversation, light sensor, taken into account when deploying these methods since one
phone charge and phone lock data, and iii) Bluetooth and can leverage the additional information for detecting stress
Wi-Fi data. Other than sensor data, deadlines, phone logs, responses with boosted accuracy, yet in some cases combin-
running apps, SMS, and the time of day are used. The ing several sensor data deteriorates models’ performance
proposed classifier is an LSTM network and reached higher and is bothersome for users.
accuracy than CNN and CNN-LSTM. Regarding the studies based on speech signals, it is evi-
Umematsu et al. [172] attempted to forecast students’ dent that DL approaches can also serve as feature extractors,
daily life stress using LSTMs. The data gathered from a 30- proposing new representations for data. CNNs extracted
day study measured Sleep, Networks, Affect, Performance, features related to emotion and sound, DBN proposed new
Stress, and Health using Objective Techniques (SNAPSHOT) sets of deep features and LSTM networks could capture the
[173] and included physiological, mobile phone, and behav- temporal information when used to generate features.
ioral survey data from college students. They proposed a Detecting stress based on facial expressions has the ad-
LSTM model for classification and SVM, and LR models for vantage of non-contact recording since cameras are often
comparison. The best accuracy was obtained using LSTM. employed to track facial regions. Deep network architec-
Jaques et al. [174] and [171] exploited the potential of NN tures played an essential role not only in the classification
for MTL in the same dataset (SNAPSHOT) to perform stage but also in pre-processing data, discriminating ex-
personalized stress detection since a more complex model pressionless and expressive face images, detecting face and
presents a higher generalization ability compared to SVMs alignment, or performing complementary classifications to

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 14

first discriminate emotions related to stress. Observing body to inter-instance and intra-instance variations [120]. Conse-
movement and micro-gesture activity has also recently been quently, these models can be adapted for different tasks
established as a means to quantify stress levels. within the same domain, offering a robust alternative to
Leveraging the large amount of data derived from social traditional fully-supervised methods.
media users and analyzing them with DL algorithms to
detect mental health issues timely can potentially reduce 5.1.2 Time, space and computational requirements
the risk of anxiety or even suicide. Finally, incorporating The DL methods offer great potential for interpreting huge
environmental, behavioral, or smartphone data into phys- amounts of information as they are robust and ideal both
iological signals increased the overall performance of DL for offline and online processing (e.g., in wearable devices
algorithms which are suitable for handling heterogeneous for real-time stress assessment) requiring also less compu-
multidimensional data. tational time for feature extraction and classification. Be-
sides, deep architectures can be trained to provide multi-
level classification as opposed to binary supervised shallow
5.1 Limitations and open issues
algorithms [129].
5.1.1 Data requirements A promising characteristic of deep architectures, espe-
cially CNNs is that apart from being powerful classifiers,
DL approaches demand a sufficient volume of data and DL
they show excellent performance in deriving deep fea-
models are capable of learning complex patterns and rela-
tures as input to deep and shallow classification models.
tionships within them. However, this capability also makes
This unsupervised feature learning partially automates and
these models prone to overfitting, i.e., instead of learn-
simplifies the learning process as it may propose data-
ing generalized features they memorize the learning data.
driven solutions that are independent of the classification
Hence, for a model to effectively generalize to previously
task. Considering the diverse traits stress conveys among
unknown data and perform well in the real-world setting, it
individuals, context, stress type, or sensors, the concept of
needs to be trained on a varied and representative dataset.
auto-generated features is extremely advantageous in stress
However, the publicly available benchmark datasets are
recognition problems. Furthermore, the burden of extensive
limited and usually small, impeding the proper validation
data processing or comprehensive knowledge of datasets is
of deployed models. In addition, it is common to provide
alleviated. To illustrate, the efficacy of HRV parameter-based
ground truth labels for binary classification, hindering the
methods relies upon the accuracy of the R-peak detection
multi-level stress assessment or lacking task-related data
algorithm. On top of that, manual feature extraction from
[107]. These limitations may lead to deploying models that
data frequently necessitates additional feature selection to
are not robust enough. To address this, researchers often
disclose the most valuable attributes or combinations of
conduct their own experiment studies to record the data of
them for the model, adding complexity to the process.
interest tailored to their specific task or domain. The main
challenges in this procedure are to design a protocol able
to effectively induce stress in participants and imitate real- 5.1.3 Explainable AI
world conditions in a controlled setting while maintaining Taking into consideration the studies in this survey, it is
ethical standards and without affecting the quality of the evident that significant performances in detecting stress in
recorded data. Meanwhile, obtaining a profusion of data humans are reached, while researchers reported that DL
that is representative of different individuals to compensate models often achieved better accuracy scores when com-
for inter-variability problems and of probably rare condi- pared to other traditional ML algorithms. However, models’
tions (e.g., autism, PTSD) should also be taken into account decisions may be considered as a ’black box’ to the end
when designing an experimental protocol. Despite the dif- user, and the parameters that led to these remarkable per-
ficulties in recruiting participants in data collection exper- formances and decision-making processes remain hidden.
iments, the inclusion of an adequate number of subjects This uncertainty becomes larger when more complex and
leads to improved efficiency in stress detection algorithms, opaque models (i.e., DL models) are deployed. Toward this
prevents over-fitting problems, and enhances generalization end, explainable AI (XAI) has become an important research
performance. The inadequacy of available training data is topic, to interpret a machine learning model decision [176],
also confronted by augmenting and synthesizing a dataset. making them more transparent and interpretable to users.
TL based on deep pre-trained models is also regarded in In particular, AI systems developed for medical domains,
many studies as taking advantage of networks trained with such as mental health and stress will be highly benefited
large datasets and applying the transferred knowledge to by XAI as both experts and end-users will gain access
the target data. Following these strategies, researchers can to information regarding a model’s output. Utilizing XAI,
achieve state-of-the-art performance even with limited data. one can explore the contribution of each input feature to
Moreover, the deep TL approach reduces the computational the model. It also allows analysis of the impact of pre-
cost and time of deploying robust models from scratch. processing steps such as signal filtering or normalization of
Besides, it is often hypothesized that the data distribution the signals on the predictions. Additionally, XAI facilitates
to be learned is similar to that of the pre-trained model performance comparison of different models on the same
to effectively apply TL [100]. To address this challenge for task. For instance, Chalabianloo et al. [177] used SHAP
small datasets, the strategy of freezing certain layers within values, which are unified measures of feature importance,
the pre-trained network can be implemented. Moreover, the to investigate which features were most important and
feature manifolds learned using SSL typically are resilient how different scaling techniques affected the results from

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 15

two types of models. This understanding can inform better is evidenced that DL approaches favor the higher perfor-
preprocessing strategies to enhance model performance. mance of stress detection systems irrespective of the modal-
ity used, e.g., physiological signals, and facial expressions.
5.1.4 Labeling The necessity of stress detection is discussed, as well as,
the most important traits of the signals used for this task
The validity of the training data directly impacts a model’s and their relation to stress. Moreover, we describe publicly
performance. Nevertheless, the ground truth labels of available datasets adopted in research identifying stress. It
datasets are often insufficient or of low quality, and labeling is remarked that DL models tend to outperform common
techniques should also be considered by researchers in shallow machine learning, having great potential for stress
data preparation stages [178]. The data collectors usually detection applications. Overall, the presented works in this
annotate and assign labels to their constructed datasets or survey define the main current research road toward DL-
employ crowd-sourcing methods to reduce time and cost, based stress detection and pave the way for the next steps
especially in large datasets. In the context of the stress de- for this endeavor. In sum, the construction of new, large-
tection problem, the issue becomes apparent in cases when scale, multi-modal data sets for stress detection, explainable
publicly available datasets do not exist, the data are unla- models for stress detection based on DL architectures, and
beled or the ground truth provided refers to fewer classes the exploitation of the multifaceted expression of stress to
(e.g. binary case) than desired, hindering the deployment capture signals in a non-invasive manner should lead the
of algorithms assigning multi-level predictions. To address research attempts for this field in the foreseeable future.
this, researchers in the field often conduct their experiments
for collecting and annotating signals, also exploring stress
responses in the intended context (e.g. driving) by induc- ACKNOWLEDGMENTS
ing various types of stress (mental, cognitive, physical). This research is part of the MuseIT project “Multisen-
In addition, depending on the availability of ground truth sory, User-centered, Shared cultural Experiences through
labels and the scope of the study, researchers consider either Interactive Technologies” that is funded by the European
the self-reports of participants or the context of the study Union under the Horizon Europe Framework Program and
protocol to construct labeled datasets. Future directions the grant agreement N°101061441. Views and opinions ex-
could also consider SSL, which enables models to learn from pressed are however, those of the author(s) only and do not
the data themselves without relying on explicit labels. This necessarily reflect those of the European Union or European
approach holds promise, especially for stress detection in Research Executive Agency. Neither the European Union
the wild from wearable sensors, when ground truth from nor the granting authority can be held responsible for them.
participants or the context is difficult to obtain [179], [180].

R EFERENCES
5.1.5 Privacy
[1] H. Yaribeygi, Y. Panahi, H. Sahraei, T. P. Johnston, and A. Sahe-
AI systems for predicting stress levels have been devel- bkar, “The impact of stress on body function: A review,” EXCLI
oped in different aspects of daily life such as work, school journal, vol. 16, p. 1057, 2017.
and education, healthcare units, or driving. A major pri- [2] A. Mariotti, “The effects of chronic stress on health: new insights
into the molecular mechanisms of brain–body communication,”
vacy concern arises when personal information is stored Future science OA, vol. 1, no. 3, 2015.
and analyzed [181], [182]. Users may be concerned about [3] L. Xia, A. S. Malik, and A. R. Subhani, “A physiological signal-
the possibility of misuse or unauthorized access to their based method for early mental-stress detection,” Biomedical Signal
Processing and Control, vol. 46, pp. 18–32, 2018.
personally identifiable information. To that end, privacy- [4] R. W. Picard, “Automating the recognition of stress and emotion:
preserving studies have explored the possibility of respect- From lab to real-world impact,” IEEE MultiMedia, vol. 23, no. 3,
ing users’ personally identifiable information. In [123], [183] pp. 3–7, 2016.
and [184], FL solutions for storing and sharing the locally [5] ——, “Affective computing-mit media laboratory perceptual
computing section technical report no. 321,” Cambridge, MA, vol.
trained ML model and its parameters instead of the raw 2139, 1995.
biomedical data of each individual are discussed. Moreover, [6] J. A. Russell, “A circumplex model of affect.” Journal of personality
a two-layer privacy-preserving platform architecture [185] and social psychology, vol. 39, no. 6, p. 1161, 1980.
[7] S. Greene, H. Thapliyal, and A. Caban-Holt, “A survey of af-
has been proposed. These architectures aim to provide pri-
fective computing for stress detection: Evaluating technologies
vacy guarantees while still enabling effective collaborative in stress detection for better health,” IEEE Consumer Electronics
learning and model sharing. Although various studies using Magazine, vol. 5, no. 4, pp. 44–56, 2016.
traditional ML approaches take into account the privacy [8] S. S. Panicker and P. Gayathri, “A survey of machine learning
techniques in physiology based mental stress detection systems,”
issue of stress detection, only one study of those surveyed Biocybernetics and Biomedical Engineering, vol. 39, no. 2, pp. 444–
here deals with such an aspect. Further work should be 469, 2019.
dedicated to this direction. By properly addressing privacy [9] G. Giannakakis, D. Grigoriadis, K. Giannakaki, O. Simantiraki,
issues, trust in AI systems could be enhanced. A. Roniotis, and M. Tsiknakis, “Review on psychological stress
detection using biosignals,” IEEE Transactions on Affective Com-
puting, 2019.
[10] A. Anusha, J. Jose, S. Preejith, J. Jayaraj, and S. Mohanasankar,
6 C ONCLUSION “Physiological signal based work stress detection using unob-
trusive sensors,” Biomedical Physics & Engineering Express, vol. 4,
In this paper, we presented an overview of the works no. 6, p. 065001, 2018.
[11] R. Sabo, M. Rusko, A. Ridzik, and J. Rajčáni, “Stress, arousal, and
that apply at least one DL technique for detecting stress stress detector trained on acted speech database,” in International
and also take into account various modalities of data. It Conference on Speech and Computer. Springer, 2016, pp. 675–682.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 16

[12] O. Simantiraki, G. Giannakakis, A. Pampouchidou, and M. Tsik- [34] Y. S. Can, B. Arnrich, and C. Ersoy, “Stress detection in daily life
nakis, “Stress detection from speech using spectral slope mea- scenarios using smart phones and wearable sensors: A survey,”
surements,” in Pervasive Computing Paradigms for Mental Health. Journal of biomedical informatics, vol. 92, p. 103139, 2019.
Springer, 2016, pp. 41–50. [35] N. Gao, W. Shao, M. S. Rahaman, and F. D. Salim, “n-gage:
[13] G. Giannakakis, D. Manousos, V. Chaniotakis, and M. Tsiknakis, Predicting in-class emotional, behavioural and cognitive engage-
“Evaluation of head pose features for stress detection and classifi- ment in the wild,” Proceedings of the ACM on Interactive, Mobile,
cation,” in 2018 IEEE EMBS International Conference on Biomedical Wearable and Ubiquitous Technologies, vol. 4, no. 3, pp. 1–26, 2020.
& Health Informatics (BHI). IEEE, 2018, pp. 406–409. [36] O. Attallah, “An effective mental stress state detection and eval-
[14] B. Arnrich, C. Setz, R. La Marca, G. Tröster, and U. Ehlert, “What uation system using minimum number of frontal brain elec-
does your chair know about your stress level?” IEEE Transactions trodes,” Diagnostics, vol. 10, no. 5, p. 292, 2020.
on Information Technology in Biomedicine, vol. 14, no. 2, pp. 207–214, [37] M. F. Rizwan, R. Farhad, F. Mashuk, F. Islam, and M. H. Imam,
2009. “Design of a biosignal based stress detection system using
[15] E. A. Sağbaş, S. Korukoglu, and S. Balli, “Stress detection via machine learning techniques,” in 2019 International Conference
keyboard typing behaviors by using smartphone sensors and on Robotics, Electrical and Signal Processing Techniques (ICREST).
machine learning techniques,” Journal of medical systems, vol. 44, IEEE, 2019, pp. 364–368.
no. 4, pp. 1–12, 2020. [38] K. Ota, M. S. Dao, V. Mezaris, and F. G. D. Natale, “Deep
[16] S. D. Gunawardhane, P. M. De Silva, D. S. Kulathunga, and S. M. learning for mobile multimedia: A survey,” ACM Transactions on
Arunatileka, “Non invasive human stress detection using key Multimedia Computing, Communications, and Applications (TOMM),
stroke dynamics and pattern variations,” in 2013 International vol. 13, no. 3s, pp. 1–22, 2017.
Conference on Advances in ICT for Emerging Regions (ICTer). IEEE, [39] A. Khamparia and K. M. Singh, “A systematic review on deep
2013, pp. 240–247. learning architectures and applications,” Expert Systems, vol. 36,
[17] J. Aigrain, M. Spodenkiewicz, S. Dubuisson, M. Detyniecki, no. 3, p. e12400, 2019.
D. Cohen, and M. Chetouani, “Multimodal stress detection from [40] L. V. Coutts, D. Plans, A. W. Brown, and J. Collomosse, “Deep
multiple assessments,” IEEE Transactions on Affective Computing, learning with wearable based heart rate variability for prediction
vol. 9, no. 4, pp. 491–506, 2016. of mental and general health,” Journal of Biomedical Informatics,
[18] C. Janiesch, P. Zschech, and K. Heinrich, “Machine learning and vol. 112, p. 103610, 2020.
deep learning,” Electronic Markets, pp. 1–11, 2021. [41] S. Aristizabal, K. Byun, N. Wood, A. F. Mullan, P. M. Porter,
[19] M. R. Anderson and M. Cafarella, “Input selection for fast feature C. Campanella, A. Jamrozik, I. Z. Nenadic, and B. A. Bauer, “The
engineering,” in 2016 IEEE 32nd International Conference on Data feasibility of wearable and self-report stress detection measures
Engineering (ICDE), 2016, pp. 577–588. in a semi-controlled lab environment,” IEEE Access, vol. 9, pp.
[20] G. Farias, S. Dormido-Canto, J. Vega, G. Rattá, H. Vargas, G. Her- 102 053–102 068, 2021.
mosilla, L. Alfaro, and A. Valencia, “Automatic feature extraction [42] M. N. Rastgoo, B. Nakisa, F. Maire, A. Rakotonirainy, and
in large fusion databases by using deep learning approach,” V. Chandran, “Automatic driver stress level classification using
Fusion Engineering and Design, vol. 112, pp. 979–983, 2016. multimodal deep learning,” Expert Systems with Applications, vol.
[21] B. Chikhaoui and F. Gouineau, “Towards automatic feature ex- 138, p. 112793, 2019.
traction for activity recognition from wearable sensors: a deep [43] H. Lin, J. Jia, Q. Guo, Y. Xue, Q. Li, J. Huang, L. Cai, and
learning approach,” in 2017 IEEE International Conference on Data L. Feng, “User-level psychological stress detection from social
Mining Workshops (ICDMW). IEEE, 2017, pp. 693–702. media using deep neural network,” in Proceedings of the 22nd
ACM international conference on Multimedia, 2014, pp. 507–516.
[22] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol.
521, no. 7553, pp. 436–444, 2015. [44] A. Alberdi, A. Aztiria, and A. Basarab, “Towards
an automatic early stress recognition system for office
[23] S. Gedam and S. Paul, “A review on mental stress detection using
environments based on multimodal measurements: A
wearable sensors and machine learning techniques,” IEEE Access,
review,” Journal of Biomedical Informatics, vol. 59, pp. 49–
vol. 9, pp. 84 045–84 066, 2021.
75, 2016. [Online]. Available: \url{https://www.sciencedirect.
[24] S. Sharma, G. Singh, and M. Sharma, “A comprehensive review com/science/article/pii/S1532046415002750}
and analysis of supervised-learning and soft computing tech- [45] G. Tan, T. K. Dao, L. Farmer, R. J. Sutherland, and R. Gevirtz,
niques for stress diagnosis in humans,” Computers in Biology and “Heart rate variability (hrv) and posttraumatic stress disorder
Medicine, vol. 134, p. 104450, 2021. (ptsd): a pilot study,” Applied psychophysiology and biofeedback,
[25] G. P. Chrousos, “Stress and disorders of the stress system,” Nature vol. 36, no. 1, pp. 27–35, 2011.
reviews endocrinology, vol. 5, no. 7, p. 374, 2009. [46] U. R. Acharya, K. P. Joseph, N. Kannathal, C. M. Lim, and J. S.
[26] M. Le Fevre, J. Matheny, and G. S. Kolt, “Eustress, distress, Suri, “Heart rate variability: a review,” Medical and biological
and interpretation in occupational stress,” Journal of managerial engineering and computing, vol. 44, no. 12, pp. 1031–1051, 2006.
psychology, vol. 18, no. 7, pp. 726–744, 2003. [47] W. Handouzi, C. Maaoui, A. Pruski, and A. Moussaoui, “Objec-
[27] J. Bakker, M. Pechenizkiy, and N. Sidorova, “What’s your current tive model assessment for short-term anxiety recognition from
stress level? detection of stress patterns from gsr sensor data,” in blood volume pulse signal,” Biomedical Signal Processing and
2011 IEEE 11th international conference on data mining workshops. Control, vol. 14, pp. 217–227, 2014.
IEEE, 2011, pp. 573–580. [48] M. Sharma, S. Kacker, and M. Sharma, “A brief introduction and
[28] M. Abouelenien, M. Burzo, and R. Mihalcea, “Human acute stress review on galvanic skin response,” Int J Med Res Prof, vol. 2, pp.
detection via integration of physiological signals and thermal 13–17, 2016.
imaging,” in Proceedings of the 9th ACM international conference [49] M. Liu, D. Fan, X. Zhang, and X. Gong, “Human emotion recogni-
on pervasive technologies related to assistive environments, 2016, pp. tion based on galvanic skin response signal feature selection and
1–8. svm,” in 2016 International Conference on Smart City and Systems
[29] M. Gunnar and K. Quevedo, “The neurobiology of stress and Engineering (ICSCSE). IEEE, 2016, pp. 157–160.
development,” Annu. Rev. Psychol., vol. 58, pp. 145–173, 2007. [50] M. B. I. Reaz, M. S. Hussain, and F. Mohd-Yasin, “Techniques
[30] C. Xu, Y. Xu, S. Xu, Q. Zhang, X. Liu, Y. Shao, X. Xu, L. Peng, of emg signal analysis: detection, processing, classification and
and M. Li, “Cognitive reappraisal and the association between applications,” Biological procedures online, vol. 8, no. 1, pp. 11–35,
perceived stress and anxiety symptoms in covid-19 isolated peo- 2006.
ple,” Frontiers in Psychiatry, vol. 11, 2020. [51] J. Wijsman, B. Grundlehner, J. Penders, and H. Hermens, “Trapez-
[31] S. D. Kreibig, “Autonomic nervous system activity in emotion: A ius muscle emg as predictor of mental stress,” ACM Transactions
review,” Biological psychology, vol. 84, no. 3, pp. 394–421, 2010. on Embedded Computing Systems (TECS), vol. 12, no. 4, pp. 1–20,
[32] T. Iqbal, A. Elahi, P. Redon, P. Vazquez, W. Wijns, and A. Shahzad, 2013.
“A review of biophysiological and biochemical indicators of [52] R. Luijcks, H. J. Hermens, L. Bodar, C. J. Vossen, J. Van Os, and
stress for connected and preventive healthcare,” Diagnostics, R. Lousberg, “Experimentally induced stress validated by emg
vol. 11, no. 3, p. 556, 2021. activity,” PloS one, vol. 9, no. 4, p. e95215, 2014.
[33] Y. Lecrubier, “The burden of depression and anxiety in general [53] S. A. Hosseini and M. A. Khalilzadeh, “Emotional stress recog-
medicine,” The Journal of clinical psychiatry, vol. 62, no. suppl 8, nition system using eeg and psychophysiological signals: Using
pp. 4–9, 2001. new labelling process of eeg signals in emotional stress state,” in

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 17

2010 international conference on biomedical engineering and computer [73] M. Ciman, K. Wac, and O. Gaggi, “isensestress: Assessing stress
science. IEEE, 2010, pp. 1–6. through human-smartphone interaction analysis,” in 2015 9th In-
[54] G. Giannakakis, D. Grigoriadis, and M. Tsiknakis, “Detection of ternational conference on pervasive computing technologies for health-
stress/anxiety state from eeg features during video watching,” in care (PervasiveHealth). IEEE, 2015, pp. 84–91.
2015 37th Annual International Conference of the IEEE Engineering in [74] G. Coppersmith, C. Harman, and M. Dredze, “Measuring post
Medicine and Biology Society (EMBC). IEEE, 2015, pp. 6034–6037. traumatic stress disorder in twitter,” in Proceedings of the Inter-
[55] G. Jun and K. G. Smitha, “Eeg based stress level identification,” in national AAAI Conference on Web and Social Media, vol. 8, no. 1,
2016 IEEE international conference on systems, man, and cybernetics 2014.
(SMC). IEEE, 2016, pp. 003 270–003 274. [75] N. Sambasivan, S. Kapania, H. Highfill, D. Akrong, P. Paritosh,
[56] M. Lee, J. Moon, D. Cheon, J. Lee, and K. Lee, “Respiration signal and L. M. Aroyo, ““everyone wants to do the model work, not
based two layer stress recognition across non-verbal and verbal the data work”: Data cascades in high-stakes ai,” in proceedings of
situations,” in Proceedings of the 35th Annual ACM Symposium on the 2021 CHI Conference on Human Factors in Computing Systems,
Applied Computing, 2020, pp. 638–645. 2021, pp. 1–15.
[57] A. Hernando, J. Lázaro, A. Arza, J. M. Garzón, E. Gil, P. Laguna, [76] J. R. Stroop, “Studies of interference in serial verbal reactions.”
J. Aguiló, and R. Bailón, “Changes in respiration during emo- Journal of experimental psychology, vol. 18, no. 6, p. 643, 1935.
tional stress,” in 2015 Computing in Cardiology Conference (CinC). [77] Y. Cho, N. Bianchi-Berthouze, and S. J. Julier, “Deepbreath: Deep
IEEE, 2015, pp. 1005–1008. learning of breathing patterns for automatic stress recognition
[58] J. Choi, B. Ahmed, and R. Gutierrez-Osuna, “Development and using low-cost thermal imaging in unconstrained settings,” in
evaluation of an ambulatory stress monitor based on wearable 2017 Seventh International Conference on Affective Computing and
sensors,” IEEE transactions on information technology in biomedicine, Intelligent Interaction (ACII). IEEE, 2017, pp. 456–463.
vol. 16, no. 2, pp. 279–286, 2011. [78] Y. Cho, S. J. Julier, N. Marquardt, and N. Bianchi-Berthouze, “Ro-
[59] C. H. Vinkers, R. Penning, J. Hellhammer, J. C. Verster, J. H. bust tracking of respiratory rate in high-dynamic range scenes
Klaessens, B. Olivier, and C. J. Kalkman, “The effect of stress on using mobile thermal imaging,” Biomedical optics express, vol. 8,
core and peripheral body temperature in humans,” Stress, vol. 16, no. 10, pp. 4480–4503, 2017.
no. 5, pp. 520–530, 2013. [79] J. A. Healey and R. W. Picard, “Detecting stress during real-world
[60] H. Kim, Y.-S. Kim, M. Mahmood, S. Kwon, F. Epps, Y. S. Rim, driving tasks using physiological sensors,” IEEE Transactions on
and W.-H. Yeo, “Wireless, continuous monitoring of daily stress intelligent transportation systems, vol. 6, no. 2, pp. 156–166, 2005.
and management practice via soft bioelectronics,” Biosensors and [80] P. Schmidt, A. Reiss, R. Duerichen, C. Marberger, and
Bioelectronics, vol. 173, p. 112764, 2021. K. Van Laerhoven, “Introducing wesad, a multimodal dataset for
[61] N. Ravi, N. Dandekar, P. Mysore, and M. L. Littman, “Activity wearable stress and affect detection,” in Proceedings of the 20th
recognition from accelerometer data,” in Aaai, vol. 5, no. 2005. ACM international conference on multimodal interaction, 2018, pp.
Pittsburgh, PA, 2005, pp. 1541–1546. 400–408.
[62] E. Garcia-Ceja, V. Osmani, and O. Mayora, “Automatic stress de- [81] R. Subramanian, J. Wache, M. K. Abadi, R. L. Vieriu, S. Winkler,
tection in working environments from smartphones’ accelerom- and N. Sebe, “Ascertain: Emotion and personality recognition
eter data: a first step,” IEEE journal of biomedical and health using commercial sensors,” IEEE Transactions on Affective Com-
informatics, vol. 20, no. 4, pp. 1053–1060, 2015. puting, vol. 9, no. 2, pp. 147–160, 2016.
[63] J. H. Hansen and S. Patil, “Speech under stress: Analysis, mod- [82] P. Costa and R. Mccrae, “Neo-pi-r professional manual: revised
eling and recognition,” in Speaker classification I. Springer, 2007, neo personality and neo five-factor inventory (neo-ffi). odessa, fl,
pp. 108–137. psychological assessment resources,” Psychol. Assess, vol. 4, pp.
[64] H. Kurniawan, A. V. Maslov, and M. Pechenizkiy, “Stress de- 26–42, 1992.
tection from speech and galvanic skin response signals,” in [83] V. Markova, T. Ganchev, and K. Kalinkov, “Clas: A database for
Proceedings of the 26th IEEE International Symposium on Computer- cognitive load, affect and stress recognition,” in 2019 International
Based Medical Systems. IEEE, 2013, pp. 209–214. Conference on Biomedical Innovations and Applications (BIA). IEEE,
[65] D. F. Dinges, R. L. Rider, J. Dorrian, E. L. McGlinchey, N. L. 2019, pp. 1–4.
Rogers, Z. Cizman, S. K. Goldenstein, C. Vogler, S. Venkataraman, [84] R. K. Sah, M. McDonell, P. Pendry, S. Parent, H. Ghasemzadeh,
and D. N. Metaxas, “Optical computer recognition of facial and M. J. Cleveland, “Adarp: A multi modal dataset for stress
expressions associated with stress induced by performance de- and alcohol relapse quantification in real life setting,” arXiv
mands,” Aviation, space, and environmental medicine, vol. 76, no. 6, preprint arXiv:2206.14568, 2022.
pp. B172–B182, 2005. [85] J. H. Hansen and S. E. Bou-Ghazale, “Getting started with susas:
[66] T.-D. Tran, J. Kim, N.-H. Ho, H.-J. Yang, S. Pant, S.-H. Kim, A speech under simulated and actual stress database,” in Fifth
and G.-S. Lee, “Stress analysis with dimensions of valence and European Conference on Speech Communication and Technology, 1997.
arousal in the wild,” Applied Sciences, vol. 11, no. 11, p. 5194, [86] P. Lucey, J. F. Cohn, T. Kanade, J. Saragih, Z. Ambadar, and
2021. I. Matthews, “The extended cohn-kanade dataset (ck+): A com-
[67] J. S. Lerner, R. E. Dahl, A. R. Hariri, and S. E. Taylor, “Facial ex- plete dataset for action unit and emotion-specified expression,”
pressions of emotion reveal neuroendocrine and cardiovascular in 2010 ieee computer society conference on computer vision and
stress responses,” Biological psychiatry, vol. 61, no. 2, pp. 253–260, pattern recognition-workshops. IEEE, 2010, pp. 94–101.
2007. [87] D. Lundqvist, A. Flykt, and A. Öhman, “Karolinska directed
[68] G. Giannakakis, M. Pediaditis, D. Manousos, E. Kazantzaki, emotional faces,” Cognition and Emotion, 1998.
F. Chiarugi, P. G. Simos, K. Marias, and M. Tsiknakis, “Stress [88] G. Zhao, X. Huang, M. Taini, S. Z. Li, and M. PietikäInen, “Facial
and anxiety detection using facial cues from videos,” Biomedical expression recognition from near-infrared videos,” Image and
Signal Processing and Control, vol. 31, pp. 89–101, 2017. Vision Computing, vol. 29, no. 9, pp. 607–619, 2011.
[69] M. N. H. Mohd, M. Kashima, K. Sato, and M. Watanabe, “Facial [89] M. Jeong and B. C. Ko, “Driver’s facial expression recognition in
visual-infrared stereo vision fusion measurement as an alterna- real-time for safe driving,” Sensors, vol. 18, no. 12, p. 4270, 2018.
tive for physiological measurement,” J. Biomedical Image Process- [90] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, “The feret
ing (JBIP), vol. 1, no. 1, pp. 34–44, 2014. evaluation methodology for face-recognition algorithms,” IEEE
[70] G. Giannakakis, D. Manousos, P. Simos, and M. Tsiknakis, “Head Transactions on pattern analysis and machine intelligence, vol. 22,
movements in context of speech during stress induction,” in 2018 no. 10, pp. 1090–1104, 2000.
13th IEEE International Conference on Automatic Face & Gesture [91] N. Sharma, A. Dhall, T. Gedeon, and R. Goecke, “Thermal spatio-
Recognition (FG 2018). IEEE, 2018, pp. 710–714. temporal data for stress recognition,” EURASIP Journal on Image
[71] H. Chen, X. Liu, X. Li, H. Shi, and G. Zhao, “Analyze spontaneous and Video Processing, vol. 2014, no. 1, pp. 1–12, 2014.
gestures for emotional stress state recognition: A micro-gesture [92] S. Taamneh, P. Tsiamyrtzis, M. Dcosta, P. Buddharaju, A. Khatri,
dataset and analysis with deep learning,” in 2019 14th IEEE M. Manser, T. Ferris, R. Wunderlich, and I. Pavlidis, “A multi-
International Conference on Automatic Face & Gesture Recognition modal dataset for various forms of distracted driving,” Scientific
(FG 2019). IEEE, 2019, pp. 1–8. data, vol. 4, no. 1, pp. 1–21, 2017.
[72] M. Ciman and K. Wac, “Individuals’ stress assessment using [93] H.-C. Chou, W.-C. Lin, L.-C. Chang, C.-C. Li, H.-P. Ma, and C.-
human-smartphone interaction analysis,” IEEE Transactions on C. Lee, “Nnime: The nthu-ntua chinese interactive multimodal
Affective Computing, vol. 9, no. 1, pp. 51–65, 2016. emotion corpus,” in 2017 Seventh International Conference on Affec-

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 18

tive Computing and Intelligent Interaction (ACII). IEEE, 2017, pp. [112] N. Bu, M. Fukami, and O. Fukuda, “Pattern recognition of mental
292–298. stress levels from differential rri time series using lstm networks,”
[94] K. Mundnich, B. M. Booth, M. l’Hommedieu, T. Feng, B. Girault, in 2021 IEEE 3rd Global Conference on Life Sciences and Technologies
J. L’hommedieu, M. Wildman, S. Skaaden, A. Nadarajan, J. L. Vil- (LifeTech). IEEE, 2021, pp. 408–411.
latte et al., “Tiles-2018, a longitudinal physiologic and behavioral [113] G. Giannakakis, E. Trivizakis, M. Tsiknakis, and K. Marias, “A
data set of hospital workers,” Scientific Data, vol. 7, no. 1, pp. novel multi-kernel 1d convolutional neural network for stress
1–26, 2020. recognition from ecg,” in 2019 8th International Conference on
[95] R. Wang, F. Chen, Z. Chen, T. Li, G. Harari, S. Tignor, X. Zhou, Affective Computing and Intelligent Interaction Workshops and Demos
D. Ben-Zeev, and A. T. Campbell, “Studentlife: assessing mental (ACIIW). IEEE, 2019, pp. 1–4.
health, academic performance and behavioral trends of college [114] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classi-
students using smartphones,” in Proceedings of the 2014 ACM fication with deep convolutional neural networks,” Advances in
international joint conference on pervasive and ubiquitous computing, neural information processing systems, vol. 25, pp. 1097–1105, 2012.
2014, pp. 3–14. [115] B. Behinaein, A. Bhatti, D. Rodenburg, P. Hungler, and
[96] J. Zhang, X. Mei, H. Liu, S. Yuan, and T. Qian, “Detecting negative A. Etemad, “A transformer architecture for stress detection from
emotional stress based on facial expression in real time,” in 2019 ecg,” in 2021 International Symposium on Wearable Computers, 2021,
IEEE 4th International Conference on Signal and Image Processing pp. 132–134.
(ICSIP). IEEE, 2019, pp. 430–434. [116] F. Chollet, “Xception: Deep learning with depthwise separable
[97] J. Almeida and F. Rodrigues, “Facial expression recognition sys- convolutions,” in Proceedings of the IEEE conference on computer
tem for stress detection with deep learning,” in ICEIS (1), 2021, vision and pattern recognition, 2017, pp. 1251–1258.
pp. 256–263. [117] S. Ishaque, N. Khan, and S. Krishnan, “Comprehending the
[98] W. D. Scherz, J. Baun, R. Seepold, N. M. Madrid, and J. A. impact of deep learning algorithms on optimizing for recurring
Ortega, “A portable ecg for recording and flexible development impediments associated with stress prediction using ecg data
of algorithms and stress detection,” Procedia computer science, vol. through statistical analysis,” Biomedical Signal Processing and Con-
176, pp. 2886–2893, 2020. trol, vol. 74, p. 103484, 2022.
[99] B. Hwang, J. You, T. Vaessen, I. Myin-Germeys, C. Park, and B.- [118] S. Koldijk, M. Sappelli, S. Verberne, M. A. Neerincx, and
T. Zhang, “Deep ecgnet: An optimal deep learning framework W. Kraaij, “The swell knowledge work dataset for stress and
for monitoring mental stress using ultra short-term ecg signals,” user modeling research,” in Proceedings of the 16th international
TELEMEDICINE and e-HEALTH, vol. 24, no. 10, pp. 753–772, conference on multimodal interaction, 2014, pp. 291–298.
2018. [119] S. Rabbani and N. Khan, “Contrastive self-supervised learning
[100] H.-M. Cho, H. Park, S.-Y. Dong, and I. Youn, “Ambulatory for stress detection from ecg data,” Bioengineering, vol. 9, no. 8, p.
and laboratory stress detection based on raw electrocardiogram 374, 2022.
signals using a convolutional neural network,” Sensors, vol. 19, [120] P. Sarkar and A. Etemad, “Self-supervised learning for ecg-based
no. 20, p. 4408, 2019. emotion recognition,” in ICASSP 2020-2020 IEEE International
[101] P. Zhang, F. Li, L. Du, R. Zhao, X. Chen, T. Yang, and Z. Fang, Conference on Acoustics, Speech and Signal Processing (ICASSP).
“Psychological stress detection according to ecg using a deep IEEE, 2020, pp. 3217–3221.
learning model with attention mechanism,” Applied Sciences,
[121] Z. Ahmad and N. M. Khan, “Multi-level stress assessment using
vol. 11, no. 6, p. 2848, 2021.
multi-domain fusion of ecg signal,” in 2020 42nd Annual Inter-
[102] A. Mahajan, M. K. Shetty, M. Girish, M. D. Gupta, and national Conference of the IEEE Engineering in Medicine & Biology
A. Gupta, “Building an ai model on ecg data for identifying Society (EMBC). IEEE, 2020, pp. 4518–4521.
burnout/stressed healthcare workers involved in covid-19 man-
[122] S. Elzeiny and M. Qaraqe, “Automatic and intelligent stressor
agement,” in 2021 Fourth International Conference on Electrical,
identification based on photoplethysmography analysis,” IEEE
Computer and Communication Technologies (ICECCT). IEEE, 2021,
Access, vol. 9, pp. 68 498–68 510, 2021.
pp. 1–6.
[103] P. Zhang, F. Li, R. Zhao, R. Zhou, L. Du, Z. Zhao, X. Chen, and [123] Y. S. Can and C. Ersoy, “Privacy-preserving federated deep
Z. Fang, “Real-time psychological stress detection according to learning for wearable iot-based biomedical monitoring,” ACM
ecg using deep learning,” Applied Sciences, vol. 11, no. 9, p. 3838, Transactions on Internet Technology (TOIT), vol. 21, no. 1, pp. 1–17,
2021. 2021.
[104] J. He, K. Li, X. Liao, P. Zhang, and N. Jiang, “Real-time detection [124] Z. Halim and M. Rehan, “On identification of driving-induced
of acute cognitive stress using a convolutional neural network stress using electroencephalogram signals: A framework based
from electrocardiographic signal,” IEEE Access, vol. 7, pp. 42 710– on wearable safety-critical scheme and machine learning,” Infor-
42 717, 2019. mation Fusion, vol. 53, pp. 66–79, 2020.
[105] J. He and N. Jiang, “Optimizing probability threshold of convolu- [125] U. M. Al-Saggaf, S. F. Naqvi, M. Moinuddin, S. A. Alfakeh,
tion neural network to improve hrv-based acute stress detection and S. S. A. Ali, “Performance evaluation of eeg based mental
performance,” in 2019 41st Annual International Conference of the stress assessment approaches for wearable devices,” Frontiers in
IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, Neurorobotics, vol. 15, p. 819448, 2022.
2019, pp. 5318–5321. [126] A. Sundaresan, B. Penchina, S. Cheong, V. Grace, A. Valero-
[106] M. Amin, K. Ullah, M. Asif, A. Waheed, S. U. Haq, M. Zareei, and Cabré, and A. Martel, “Evaluating deep learning eeg-based men-
R. Biswal, “Ecg-based driver’s stress detection using deep trans- tal stress classification in adolescents with autism for breathing
fer learning and fuzzy logic approaches,” IEEE Access, vol. 10, entrainment bci,” Brain Informatics, vol. 8, no. 1, pp. 1–12, 2021.
pp. 29 788–29 809, 2022. [127] A. Martı́nez-Rodrigo, B. Garcı́a-Martı́nez, Á. Huerta, and R. Al-
[107] K. Tzevelekakis, Z. Stefanidi, and G. Margetis, “Real-time stress caraz, “Detection of negative stress through spectral features of
level feedback from raw ecg signals for personalised, context- electroencephalographic recordings and a convolutional neural
aware applications using lightweight convolutional neural net- network,” Sensors, vol. 21, no. 9, p. 3050, 2021.
work architectures,” Sensors, vol. 21, no. 23, p. 7802, 2021. [128] C.-Y. Liao, R.-C. Chen, and S.-K. Tai, “Emotion stress detection
[108] M. Moridani, Z. Mahabadi, and N. Javadi, “Heart rate variability using eeg signal and deep learning technologies,” in 2018 IEEE
features for different stress classification.” Bratislavske Lekarske International Conference on Applied System Invention (ICASI). IEEE,
Listy, vol. 121, no. 9, pp. 619–627, 2020. 2018, pp. 90–93.
[109] R. B. Ramteke and V. R. Thool, “Heart rate variability-based [129] H. Jebelli, M. M. Khalili, and S. Lee, “Mobile eeg-based workers’
mental stress detection using deep learning approach,” in Applied stress recognition by applying deep neural network,” in Advances
Information Processing Systems. Springer, 2022, pp. 51–61. in informatics and computing in civil and construction engineering.
[110] K. Sardeshpande and V. R. Thool, “Psychological stress detection Springer, 2019, pp. 173–180.
using deep convolutional neural networks,” in International Con- [130] J. R. M. Fernández and L. Anishchenko, “Mental stress detection
ference on Computer Vision and Image Processing. Springer, 2019, using bioradar respiratory signals,” Biomedical signal processing
pp. 180–189. and control, vol. 43, pp. 244–249, 2018.
[111] J. Huang, X. Luo, and X. Peng, “A novel classification method for [131] P. Partila, J. Tovarek, J. Rozhon, and J. Jalowiczor, “Human
a driver’s cognitive stress level by transferring interbeat intervals stress detection from the speech in danger situation,” in Mobile
of the ecg signal to pictures,” Sensors, vol. 20, no. 5, p. 1340, 2020. Multimedia/Image Processing, Security, and Applications 2019, vol.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 19

10993. International Society for Optics and Photonics, 2019, p. [150] J. Lee, H. Lee, and M. Shin, “Driving stress detection using
109930U. multimodal convolutional neural networks with nonlinear rep-
[132] H. Han, K. Byun, and H.-G. Kang, “A deep learning-based stress resentation of short-term physiological signals,” Sensors, vol. 21,
detection algorithm with speech signal,” in proceedings of the no. 7, p. 2381, 2021.
2018 workshop on audio-visual scene understanding for immersive [151] F. Di Martino and F. Delmastro, “High-resolution physiological
multimedia, 2018, pp. 11–15. stress prediction models based on ensemble learning and recur-
[133] K.-Y. Huang, C.-H. Wu, Q.-B. Hong, M.-H. Su, and Y.-H. Chen, rent neural networks,” in 2020 IEEE Symposium on Computers and
“Speech emotion recognition using deep neural network con- Communications (ISCC). IEEE, 2020, pp. 1–6.
sidering verbal and nonverbal speech sounds,” in ICASSP 2019- [152] K. Radhika and V. R. M. Oruganti, “Deep multimodal fusion for
2019 IEEE International Conference on Acoustics, Speech and Signal subject-independent stress detection,” in 2021 11th International
Processing (ICASSP). IEEE, 2019, pp. 5866–5870. Conference on Cloud Computing, Data Science & Engineering (Con-
[134] H.-K. Shin, H. Han, K. Byun, and H.-G. Kang, “Speaker-invariant fluence). IEEE, 2021, pp. 105–109.
psychological stress detection using attention-based network,” [153] ——, “Transfer learning for subject-independent stress detection
in 2020 Asia-Pacific Signal and Information Processing Association using physiological signals,” in 2020 IEEE 17th India Council
Annual Summit and Conference (APSIPA ASC). IEEE, 2020, pp. International Conference (INDICON). IEEE, 2020, pp. 1–6.
308–313. [154] ——, “Stress detection using cnn fusion,” in TENCON 2021-2021
[135] F. Eyben, M. Wöllmer, and B. Schuller, “Opensmile: the munich IEEE Region 10 Conference (TENCON). IEEE, 2021, pp. 492–497.
versatile and fast open-source audio feature extractor,” in Proceed- [155] M. Maier, D. Elsner, C. Marouane, M. Zehnle, and C. Fuchs,
ings of the 18th ACM international conference on Multimedia, 2010, “Deepflow: Detecting optimal user experience from physiological
pp. 1459–1462. data using deep neural networks.” in AAMAS, 2019, pp. 2108–
[136] A. R. Avila, S. R. Kshirsagar, A. Tiwari, D. Lafond, 2110.
D. O’Shaughnessy, and T. H. Falk, “Speech-based stress classi- [156] R. K. Nath, H. Thapliyal, and A. Caban-Holt, “Machine learning
fication based on modulation spectral features and convolutional based stress monitoring in older adults using wearable sensors
neural networks,” in 2019 27th European Signal Processing Confer- and cortisol as stress biomarker,” Journal of Signal Processing
ence (EUSIPCO). IEEE, 2019, pp. 1–5. Systems, pp. 1–13, 2021.
[137] A. R. Avila, Z. Akhtar, J. F. Santos, D. O’Shaughnessy, and [157] F. Albertetti, A. Simalastar, and A. Rizzotti-Kaddouri, “Stress
T. H. Falk, “Feature pooling of modulation spectrum features detection with deep learning approaches using physiological sig-
for improved speech emotion recognition in the wild,” IEEE nals,” in International Conference on IoT Technologies for HealthCare.
Transactions on Affective Computing, vol. 12, no. 1, pp. 177–188, Springer, 2020, pp. 95–111.
2018.
[158] P. Bobade and M. Vani, “Stress detection with machine learning
[138] D. Banerjee, K. Islam, K. Xue, G. Mei, L. Xiao, G. Zhang, R. Xu, and deep learning using multimodal physiological data,” in 2020
C. Lei, S. Ji, and J. Li, “A deep transfer learning approach for Second International Conference on Inventive Research in Computing
improved post-traumatic stress disorder diagnosis,” Knowledge Applications (ICIRCA). IEEE, 2020, pp. 51–57.
and Information Systems, vol. 60, no. 3, pp. 1693–1724, 2019.
[159] M. Gil-Martin, R. San-Segundo, A. Mateos, and J. Ferreiros-
[139] J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and
Lopez, “Human stress detection with wearable sensors using
D. S. Pallett, “DARPA TIMIT acoustic-phonetic continous speech
convolutional neural networks,” IEEE Aerospace and Electronic
corpus CD-ROM. NIST speech disc 1-1.1,” p. 27403, Feb. 1993.
Systems Magazine, vol. 37, no. 1, pp. 60–70, 2022.
[140] J. Deng, X. Xu, Z. Zhang, S. Frühholz, and B. Schuller,
[160] R. Li and Z. Liu, “Stress detection using deep neural networks,”
“Semisupervised autoencoders for speech emotion recognition,”
BMC Medical Informatics and Decision Making, vol. 20, no. 11, pp.
IEEE/ACM Transactions on Audio, Speech, and Language Processing,
1–10, 2020.
vol. 26, no. 1, pp. 31–43, 2017.
[161] A. Kumar, K. Sharma, and A. Sharma, “Hierarchical deep neu-
[141] M. Pediaditis, G. Giannakakis, F. Chiarugi, D. Manousos, A. Pam-
ral network for mental stress state detection using iot based
pouchidou, E. Christinaki, G. Iatraki, E. Kazantzaki, P. G. Simos,
biomarkers,” Pattern Recognition Letters, vol. 145, pp. 81–87, 2021.
K. Marias et al., “Extraction of facial features as indicators of
stress and anxiety,” in 2015 37th Annual International Conference [162] W. Seo, N. Kim, S. Kim, C. Lee, and S.-M. Park, “Deep ecg-
of the IEEE Engineering in Medicine and Biology Society (EMBC). respiration network (deeper net) for recognizing mental stress,”
IEEE, 2015, pp. 3711–3714. Sensors, vol. 19, no. 13, p. 3021, 2019.
[142] H. Zhang, L. Feng, N. Li, Z. Jin, and L. Cao, “Video-based stress [163] K. Wang, Y. L. Murphey, Y. Zhou, X. Hu, and X. Zhang, “De-
detection through deep learning,” Sensors, vol. 20, no. 19, p. 5552, tection of driver stress in real-world driving environment using
2020. physiological signals,” in 2019 IEEE 17th International Conference
[143] Y. Li, X.-Z. Lin, and M.-Y. Jiang, “Facial expression recognition on Industrial Informatics (INDIN), vol. 1. IEEE, 2019, pp. 1807–
with cross-connect lenet-5 network,” Zidonghua Xuebao/Acta Au- 1814.
tomatica Sinica, vol. 44, pp. 176–182, 01 2018. [164] K. Wang and P. Guo, “An ensemble classification model with
[144] B. H. Prasetio, H. Tamura, and K. Tanno, “The facial stress unsupervised representation learning for driving stress recogni-
recognition based on multi-histogram features and convolutional tion using physiological signals,” IEEE transactions on intelligent
neural network,” in 2018 IEEE International Conference on Systems, transportation systems, vol. 22, no. 6, pp. 3303–3315, 2020.
Man, and Cybernetics (SMC). IEEE, 2018, pp. 881–887. [165] R. Holder, R. K. Sah, M. Cleveland, and H. Ghasemzadeh, “Com-
[145] J. Shin, J. Moon, B. Kim, J. Eom, N. Park, and K. Lee, “Attention- paring the predictability of sensor modalities to detect stress
based stress detection exploiting non-contact monitoring of from wearable sensor data,” in 2022 IEEE 19th Annual Consumer
movement patterns with ir-uwb radar,” in Proceedings of the 36th Communications & Networking Conference (CCNC). IEEE, 2022,
Annual ACM Symposium on Applied Computing, 2021, pp. 637–640. pp. 557–562.
[146] R. Castaldo, W. Xu, P. Melillo, L. Pecchia, L. Santamaria, and [166] K. Masood and M. A. Alghamdi, “Modeling mental stress using a
C. James, “Detection of mental stress due to oral academic exam- deep learning framework,” IEEE Access, vol. 7, pp. 68 446–68 454,
ination via ultra-short-term hrv analysis,” in 2016 38th Annual 2019.
International Conference of the IEEE Engineering in Medicine and [167] H. Lin, J. Jia, J. Qiu, Y. Zhang, G. Shen, L. Xie, J. Tang, L. Feng, and
Biology Society (EMBC). IEEE, 2016, pp. 3805–3808. T.-S. Chua, “Detecting stress based on social interactions in social
[147] S. Koelstra, C. Muhl, M. Soleymani, J.-S. Lee, A. Yazdani, networks,” IEEE Transactions on Knowledge and Data Engineering,
T. Ebrahimi, T. Pun, A. Nijholt, and I. Patras, “Deap: A database vol. 29, no. 9, pp. 1820–1833, 2017.
for emotion analysis; using physiological signals,” IEEE transac- [168] A. Benton, M. Mitchell, and D. Hovy, “Multi-task learning
tions on affective computing, vol. 3, no. 1, pp. 18–31, 2011. for mental health using social media text,” arXiv preprint
[148] A. Saeed and S. Trajanovski, “Personalized driver stress detection arXiv:1712.03538, 2017.
with multi-task neural networks using physiological signals,” [169] A. Gaballah, A. Tiwari, S. Narayanan, and T. H. Falk, “Context-
arXiv preprint arXiv:1711.06116, 2017. aware speech stress detection in hospital workers using bi-lstm
[149] A. Saeed, T. Ozcelebi, J. Lukkien, J. B. van Erp, and S. Trajanovski, classifiers,” in ICASSP 2021-2021 IEEE International Conference on
“Model adaptation and personalization for physiological stress Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021, pp.
detection,” in 2018 IEEE 5th International Conference on Data 8348–8352.
Science and Advanced Analytics (DSAA). IEEE, 2018, pp. 209–216. [170] Y. Acikmese and S. E. Alptekin, “Prediction of stress levels with

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/
This article has been accepted for publication in IEEE Transactions on Affective Computing. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2024.3455371

JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 20

lstm and passive mobile sensors,” Procedia Computer Science, vol. Maria Kyrou received the Diploma degree in
159, pp. 658–667, 2019. Production and Management Engineering in
[171] S. Taylor, N. Jaques, E. Nosakhare, A. Sano, and R. Picard, “Per- 2017 from Democritus University of Thrace
sonalized multitask learning for predicting tomorrow’s mood, (DUTH), Greece and the Postgraduate Diploma
stress, and health,” IEEE Transactions on Affective Computing, (MSc) in Digital Media - Computational Intelli-
vol. 11, no. 2, pp. 200–213, 2017. gence in 2020 from Aristotle University of Thes-
[172] T. Umematsu, A. Sano, S. Taylor, and R. W. Picard, “Improving saloniki (AUTH), Greece. She is currently work-
students’ daily life stress forecasting using lstm neural net- ing toward the Ph.D. degree with the Artificial
works,” in 2019 IEEE EMBS International Conference on Biomedical Intelligence Information Analysis Laboratory, De-
& Health Informatics (BHI). IEEE, 2019, pp. 1–4. partment of Informatics at AUTH. She is working
[173] A. Sano, S. Taylor, A. W. McHill, A. J. Phillips, L. K. Barger, as a research associate in Centre for Research
E. Klerman, R. Picard et al., “Identifying objective physiological and Technology Hellas (CERTH). Her research interests include emo-
markers and modifiable behaviors for self-reported stress and tions recognition, signal processing, machine learning and computa-
mental health status using wearable sensors and mobile phones: tional intelligence.
observational study,” Journal of medical Internet research, vol. 20,
no. 6, p. e9410, 2018.
[174] N. Jaques, S. Taylor, E. Nosakhare, A. Sano, and R. Picard, “Multi-
task learning for predicting health, stress, and happiness,” in
NIPS Workshop on Machine Learning for Healthcare, 2016.
[175] A. Kumar, A. Sharma, and A. Arora, “Anxious depression pre-
diction in real-time social data,” arXiv preprint arXiv:1903.10222,
2019.
[176] P. Linardatos, V. Papastefanopoulos, and S. Kotsiantis, “Explain-
able ai: A review of machine learning interpretability methods,”
Entropy, vol. 23, no. 1, p. 18, 2020. Ioannis Kompatsiaris received the Ph.D. de-
[177] N. Chalabianloo, Y. S. Can, M. Umair, C. Sas, and C. Ersoy, gree in 3-D model-based image sequence cod-
“Application level performance evaluation of wearable devices ing from the Aristotle University of Thessaloniki,
for stress classification with explainable ai,” Pervasive and Mobile in 2001. He is currently a Research Director at
Computing, vol. 87, p. 101703, 2022. Information Technologies Institute/Centre for Re-
[178] T. Fredriksson, D. I. Mattos, J. Bosch, and H. H. Olsson, “Data search and Technology Hellas, the Head of Mul-
labeling: An empirical investigation into industrial challenges timedia Knowledge and Social Media Analytics
and mitigation strategies,” in International Conference on Product- Laboratory and Director of the Institute. He has
Focused Software Process Improvement. Springer, 2020, pp. 202–216. co-authored 129 papers in refereed journals, 46
[179] O. T. Başaran, Y. S. Can, E. André, and C. Ersoy, “Relieving the book chapters, and more than 420 papers in in-
burden of intensive labeling for stress monitoring in the wild by ternational conferences. He holds eight patents.
using semi-supervised learning,” Frontiers in Psychology, vol. 14, His research interests include semantic multimedia analysis, indexing
2023. and retrieval, social media and big data analysis, knowledge structures,
[180] H. Yu and A. Sano, “Semi-supervised learning for wearable- reasoning and personalization for multimedia applications, eHealth, se-
based momentary stress detection in the wild,” Proceedings of the curity, and environmental applications. He is a Senior Member of the
ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, IEEE and a member of the ACM. He has been the Co-Organizer of
vol. 7, no. 2, pp. 1–23, 2023. various international conferences and workshops and has served as a
[181] D. Naous and T. Mettler, “Mental health monitoring at work: regular Reviewer, an Associate Editor, and a Guest Editor for a number
Iot solutions and privacy concerns,” in International Conference on of journals and conferences.
Well-Being in the Information Society. Springer, 2022, pp. 37–45.
[182] G. M. Slavich, S. Taylor, and R. W. Picard, “Stress measurement
using speech: Recent advancements, validation issues, and ethi-
cal and privacy considerations,” Stress, vol. 22, no. 4, pp. 408–413,
2019.
[183] C.-R. Shyu, K. T. Putra, H.-C. Chen, Y.-Y. Tsai, K. T. Hossain,
W. Jiang, and Z.-Y. Shae, “A systematic review of federated
learning in the healthcare area: From the perspective of data
properties and applications,” Applied Sciences, vol. 11, no. 23, p.
11191, 2021. Panagiotis C. Petrantonakis received the
[184] M. A. Fauzi, B. Yang, and B. Blobel, “Comparative analysis be- Diploma degree in electrical and computer en-
tween individual, centralized, and federated learning for smart- gineering and the Ph.D. degree in signal pro-
watch based stress detection,” Journal of Personalized Medicine, cessing and machine learning from the Aristo-
vol. 12, no. 10, p. 1584, 2022. tle University of Thessaloniki, Greece, in 2007
[185] O. Mazhelis, A. Hämäläinen, T. Asp, and P. Tyrväinen, “Towards and 2011, respectively. From 2012 to 2016 he
enabling privacy preserving smart city apps,” in 2016 IEEE was a postdoctoral researcher at the Institute
International Smart Cities Conference (ISC2). IEEE, 2016, pp. 1– of Molecular Biology and Biotechnology of the
7. Foundation for Research and Technology – Hel-
las (FORTH) and from 2017 to 2022 a postdoc-
toral researcher at the Information Technologies
Institute of the Centre for Research and Technology – Hellas (CERTH).
Currently, he is an Assistant Professor in the Department of Electrical
and Computer Engineering at Aristotle University of Thessaloniki. He
has published more than 50 papers in peer reviewed journals, book
chapters and conferences in the field of signal processing and machine
learning with applications on biomedical engineering, neural information
processing and large scale data analysis. He serves as an Associate
Editor for IEEE Signal Processing Letters and he is a Senior Member of
the IEEE.

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

You might also like