Dai 2021
PII: S0169-2607(21)00281-9
DOI: https://doi.org/10.1016/j.cmpb.2021.106207
Reference: COMM 106207
Please cite this article as: Ruixuan Dai , Chenyang Lu , Linda Yun , Eric Lenze , Michael Avidan ,
Thomas Kannampallil , Comparing Stress Prediction Models using Smartwatch Physiological Sig-
nals and Participant Self-reports, Computer Methods and Programs in Biomedicine (2021), doi:
https://doi.org/10.1016/j.cmpb.2021.106207
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.
Corresponding Author:
Thomas Kannampallil, PhD
660 S. Euclid Ave, Campus Box 8054,
St Louis, MO 63110
[email protected]
314-273-7801
Abstract
Recent advances in wearable technology have facilitated the non-obtrusive monitoring of physiological
signals, creating opportunities to monitor and predict stress. Researchers have utilized machine learning
methods using these physiological signals to develop stress prediction models. Many of these prediction
models have utilized objective stressor tasks (e.g., a public speaking task or solving math problems).
Alternatively, the subjective user responses with self-reports have also been used for measuring stress.
In this paper, we describe a methodological approach (a) to compare the prediction performance of
models developed using objective markers of stress with participant-reported subjective markers of
stress from self-reports; and (b) to develop personalized stress models by accounting for inter-individual
differences. Towards this end, we conducted a laboratory-based study with 32 healthy volunteers.
Participants completed a series of stressor tasks—social, cognitive and physical—wearing an
instrumented commercial smartwatch that collected physiological signals and participant responses
using timed self-reports. After extensive data preprocessing using a combination of signal processing
techniques, we developed two types of models: objective stress models using the stressor tasks as
labels; and subjective stress models using participant responses to each task as the label for that stress
task. We trained and tested several machine learning algorithms—support vector machine (SVM),
random forest (RF), gradient boosted trees (GBT), AdaBoost, and Logistic Regression (LR)—and
evaluated their performance. SVM had the best performance for the models using the objective stressor
(i.e., stressor tasks) with an AUROC of 0.790 and an F-1 score of 0.623. SVM also had the highest
performance for the models using the subjective stress (i.e., participant self-reports) with an AUROC of
0.726 and an F-1 score of 0.520. Model performance improved with a personalized threshold model to
an AUROC of 0.775 and an F-1 score of 0.599. The performance of the stress models using an
instrumented commercial smartwatch was comparable to similar models from other state-of-the-art
laboratory-based studies. However, the subjective stress models had a lower performance, indicating
the need for further research on the use of self-reports for stress-related studies. The improvement in
performance with the personalized threshold-based models provide new directions for building stress
prediction models.
Highlights
Instrumented a commercial smartwatch for stress modeling using multi-stage data processing
Compared stress prediction models using objective task-based stressors with subjective EMAs
Developed personalized stress models with a per-participant threshold accounting for inter-individual differences in stress
Keywords
Introduction
Stress is a common health concern and chronic stress is associated with the development of depression
and anxiety [1], immune function dysregulation [2,3], cardiovascular disease [4], decreased work-related
performance [5], quality of life [6], and drug use [7]. In the United States, over 50% of the adult working
population have reported that their work productivity is affected by stress [8], resulting in disease, absenteeism, presenteeism, and staff turnover. Such losses in productivity cost nearly 187 billion dollars [9].
The accumulation of daily stress contributes to chronic stress. In spite of its significant impact, routine
measurement of stress is challenging. The complexities of measurement arise from difficulties in
discerning appropriate physiological and subjective measures of stress, and the considerable intra- and
inter-individual differences in the manifestations of stress [10]. Objective measurements have relied on
measurement of cortisol and inflammatory cytokines [11,12] that have been used as successful
objective proxies for measuring stress [13]. However, such measurements are not pragmatic in real-
world, routine stress situations. The most widely used method for measuring stress is through
questionnaires. Survey scales such as the Perceived Stress Scale (PSS) [14], and Depression Anxiety
Stress Scales (DASS) [15] have been shown to be effective in measuring perceived stress in different
cohorts [16,17]. Although useful, these self-reported questionnaires are time-consuming, suffer from
recall bias and provide only a snapshot view of an individual’s perceived stress [10].
Newer approaches using mobile or internet-enabled devices, such as ecological momentary assessments
(EMA) and participant self-reports, can considerably increase the sampling frequency, providing insights
into individuals’ activities, affect and behaviors [18]. Within this context, EMAs or self-reports afford a
viable mechanism to detect and characterize the evolution and progression of stress [19]. However,
even EMAs and self-reports with a high frequency of contact have been shown to reduce response quality and response rates over time [19,20]. EMAs and self-reports are also affected by recall bias, fabrication, and falsification in reporting [21].
The exponential growth and adoption of wearable technology has afforded new opportunities to
measure and monitor a number of physiological signals including skin conductance, skin temperature,
electrocardiogram (ECG) and photoplethysmogram (PPG). Physiological measurements derived from
these sensors have also been used to develop machine learning-based prediction models [21–24]. For
developing these models, most of these studies have relied on laboratory-based trials where artificial
stress stimuli were induced. For example, Hovsepian et al. used a combination of ECG and respiration
inductive plethysmography to predict stress using machine learning algorithms [21]. Similarly, King et al.
focused on a cohort of pregnant women to develop stress models from laboratory-based studies and
translated these models for “in the wild” studies [23]. Several other studies have also developed similar
models using a combination of sensors relying primarily on laboratory-based studies [22,25].
Although these models had relatively high performance in laboratory-based settings, there are several
challenges. First, most of these studies used a combination of multiple body-worn sensors that are
pragmatically difficult to translate for real-world clinical applications. The chest belts that were used in
several of these studies [21–23], are cumbersome to use in free-living situations, limiting user
compliance and data yield. Second, many of the machine learning models that were developed using the
laboratory-based trials may not be applicable in free-living settings, where we would need to rely on
participant self-reports rather than specific induced stress stimuli.
To address these gaps, we had the following methodological and research objectives: first, to instrument
a commercially available smartwatch for detecting and predicting stress in controlled scenarios. Such
measurements with a commercial smartwatch—using both physiological measurements and associated
self-reports—have translational potential for use in clinical settings. Second, to compare the objective
markers of stress with participant-reported subjective markers of stress from self-reports. Such a
comparison can help in establishing the viability of using self-reports as a potential proxy measurement
mechanism for stress. Finally, to develop an approach for creating personalized stress models by
accounting for inter-individual differences.
Method
Participants
32 healthy volunteers were recruited through flyers posted across the campus at Washington University
in St. Louis. Respondents who met the inclusion criteria—between 18 and 69 years of age, with no heart
disease, not pregnant at the time of recruitment, and not having an implanted pacemaker—were
screened over the phone; if participants met all inclusion criteria, they were recruited for the study. All
participants received a $25 Amazon gift card. The institutional review board of Washington University
approved this study, and written consents were obtained from all participants (IRB#2019-04150).
Study Smartwatch
We instrumented a commercial smartwatch, Fossil Gen4 Explorist [26], to collect physiological and
motion signals, and to deliver self-reports. This smartwatch is equipped with a photoplethysmogram
(PPG) sensor as well as a six-axis inertial measurement unit (IMU), and runs on Google's Wear OS system
[27]. On the Wear OS infrastructure, we designed an optimized data collection application with auto
triggered self-reports. From the smartwatch, we collected PPG raw waveform at 200Hz and IMU motion
data at 50Hz. The instrumented watch lasted for approximately 15 hours on a single charge with
continuous data collection. The data collected from the smartwatch was initially stored locally, and then
uploaded to a secure server for further analysis (see Figure 1B).
In the laboratory-based phase, recruited participants first completed two surveys on paper: the 10-item
Perceived Stress Scale (PSS) [14] and the 42-item Depression Anxiety Stress Scales (DASS) [15]; both of
these surveys have been shown to be effective in the measurement of perceived stress and depression
[28,29]. After the completion of the surveys, participants were asked to wear the study smartwatch on
their non-dominant hand.
The laboratory-based phase included several stages (see Figure 1A for the sequence). First, participants
had a 20-minute resting period, during which they watched a relaxing nature-oriented program, as a
“non-stressed” period [21,22,30] (we refer to this period as “video-based resting period”). During this
period, participants were left alone in a room and asked to relax as much as possible. If participants felt
stressed or uncomfortable during this video-based resting period, they were instructed to discuss with
the study coordinator to potentially stop their continued participation. No participants withdrew during
this video-based resting period.
Figure 1. A. Sequence of activities for the laboratory-based phase of the study. B. Fossil Gen4
Explorist smartwatch instrumented for this study. C. Computer-based math tasks used for the
study.
After the first 20-minute resting period, participants were provided with general instructions regarding a
series of tasks related to public speaking, mental arithmetic, and cold stressor. These tasks
corresponded to social, cognitive, and physical stressors and have been applied as stress-inducing
stimuli in controlled settings in previous laboratory-based studies [21,22,31]. Participants completed
each of these stressor tasks in the same order, with 5-minute resting and recovery periods between
each stressor task. We asked the participants to hold their watch-wearing hand as still as possible during
the laboratory-based phase, as the physiological signals recorded from the smartwatch are vulnerable to
physical motion [32].
The first task was the public speaking task. For this task, participants were given a topic, a 4-minute
preparation time, and then were asked to speak to the study coordinator and a researcher in the room
for a period of 4 minutes. At the end of the public speaking task, participants were given a 5-minute rest
and recovery period. Next, participants were given instructions regarding a mental arithmetic task. This
task was completed on a computer using an application that we developed (see Figure 1C). The task
involved mentally adding the digits of a number, and then adding the total to the original number. For
example, if the initial number was 234, the sum of the digits was 9, and the next number would be 243.
On the application, there was a countdown timer (for 4 minutes), and an indicator for the number of
errors that the participant made. If a participant made three consecutive errors, they had to restart the
task. Participants were asked to achieve at least 20 accurate responses. Participants completed the
arithmetic task twice in succession, once standing up and then once seated on a chair, in that order. At
the end of the arithmetic task, participants were given another 5-minute rest and recovery period.
The last task was the cold stressor task. We used a custom-made solid stainless-steel cylinder (10
centimeters high, 5 centimeters diameter) that is routinely used for cold allodynia tasks [33]. The rod
was kept in the refrigerator for a period of 12 hours and measured approximately 4°C at study time.
The task involved each participant holding the rod for a period of 90 seconds on each hand (first, in their
dominant hand, followed by the non-dominant, in that order). If the pain was unbearable, participants
were asked to release their hold prior to the end of the testing period (i.e., 90 seconds). At the end of
the cold stressor task, participants were asked to watch another nature-oriented program for a period
of 20 minutes as a relaxation period.
As previously described, at the end of each stressor task, there was a 5-minute resting and recovery
period that potentially allowed for the stressor to subside prior to the next exposure. Additionally, at the
end of each stressor task, participants were automatically sent a self-report on the watch that asked
them regarding their "stress" with a 4-option response: [Happy] [Stressed] [Tired] [Neutral]. This self-report was based on scales used in previous studies of the measurement of stress [21–23]. For example, King et al. showed that the "Happy" response was negatively correlated with the intended stress and can therefore be used as an indicator of non-stress [23]. The "Stressed" and "Neutral" responses were direct indicators of self-reported (i.e., subjective) stress and non-stress, respectively.
At the completion of the final resting and recovery period, participants were given instructions for the
field-based phase on wearing the watch, charging, and responding to self-reports. In addition,
participants were given the smartwatch, its charger, and a paper-based physical activity tracker.
Participants recorded their physical activity that they participated in while wearing the watch along with
the start and finish times on the paper-based physical activity tracker. Additionally, all participants had a
return date and time scheduled at the end of the field phase such that all materials could be collected.
At the return visit, all participants were given a $25 Amazon gift card for their study participation. The
data collected in the field phase was exploratory to evaluate the viability of collecting self-reports and
collecting physiological data regarding stress in free living situations.
Based on a participant’s self-reported response after a task, we labeled the preceding task as “stressed”
or “not stressed.” Our framework for analysis compared the performance of stress prediction models
developed using objective stressors and the subjective responses (i.e., self-reports) for each of the
stressor tasks (see Figure 2). This approach established the potential for using subjective responses as a
proxy for stressors.
Figure 2. Framework for analysis: comparing the performance of machine learning models
across objective measurements of stressor tasks with subjective participant responses as
measurements.
Data Preprocessing
We conducted several data pre-processing tasks to translate the raw data file into interpretable
physiological readings. Across all study participants, there was an average sampling rate of 206.02 Hz for
the PPG sensor and 48.81 Hz for the IMU sensor. As the sampling rates were not stable, we
synchronized data based on the timestamps of each sensor event and then re-sampled the synchronized
PPG and IMU data at 200Hz with Hermite spline interpolation [36]. The re-sampled data were
segmented into sliding windows with a window size of 60 seconds and a step size of 20 seconds. The 60-
second window size has been previously used for stress-related studies [21,23].
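This resampling-and-windowing step can be sketched as follows (SciPy's PchipInterpolator, a cubic Hermite-type interpolant, stands in for the Hermite spline of [36]; the rates and window sizes mirror the values stated above):

```python
import numpy as np
from scipy.interpolate import PchipInterpolator  # cubic Hermite-type interpolant

def resample_uniform(timestamps, values, fs=200.0):
    """Interpolate irregularly sampled sensor data onto a uniform 200 Hz grid."""
    t_uniform = np.arange(timestamps[0], timestamps[-1], 1.0 / fs)
    return t_uniform, PchipInterpolator(timestamps, values)(t_uniform)

def sliding_windows(n_samples, fs=200.0, win_s=60.0, step_s=20.0):
    """Yield (start, end) sample indices for 60 s windows with a 20 s step."""
    win, step = int(win_s * fs), int(step_s * fs)
    for start in range(0, n_samples - win + 1, step):
        yield start, start + win
```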
Intense movement and poor physical contact between the PPG sensor and skin can potentially degrade
the quality of the PPG signal. We first employed a forward-backward Butterworth bandpass filter to
remove the noise outside of heart rate and respiration band with cutoff frequency of 0.15Hz and 4Hz in
each window [37]. To screen out motion artifacts and poor signal sequences within the heart and
respiration band, we further utilized a sliding sub-window approach, which divided each 60-second
window into 10-second sub-windows with a 2-second step size. The motion detector [38] and heartbeat
pattern detector [39] were then applied on each of these 10-second sub-windows. The motion detector
detects movement based on the IMU sensor. The heartbeat pattern detector can validate whether a
PPG waveform matches a valid heartbeat pattern.
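The band-limiting step can be sketched with SciPy (a zero-phase, i.e., forward-backward, Butterworth bandpass over the 0.15-4 Hz band; the filter order is our assumption, as it is not specified above):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_ppg(signal, fs=200.0, low=0.15, high=4.0, order=4):
    """Forward-backward (zero-phase) Butterworth bandpass keeping the
    heart-rate and respiration band of the PPG signal."""
    sos = butter(order, [low, high], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)
```

Second-order sections (`output="sos"`) keep the filter numerically stable at such low normalized cutoff frequencies.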
Only the sub-windows that passed both detectors were marked as valid signals. Once we iterated
through all sub-windows within the 60-second window, valid consecutive sub-windows were merged
into a larger “valid” segment. Features were extracted only from these valid segments of signals. This
approach helped in eliminating short invalid signal periods within 60-second windows and increased the
availability of testing samples (see similar approach in [23]). Figure 3 shows an example of how we
translated the raw PPG signal into “valid” and “invalid” segments. We set a threshold of 25% for the
invalid period. If the invalid period was more than this threshold, the entire 60-second window was
discarded.
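The merge-and-threshold logic can be sketched as follows (operating on one boolean validity flag per sub-window; the overlap bookkeeping for the 2-second-stepped sub-windows is omitted for brevity):

```python
def valid_runs(flags):
    """Merge consecutive valid sub-windows (True flags) into
    (start, end) runs over the sub-window sequence."""
    runs, start = [], None
    for i, ok in enumerate(flags):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs

def keep_window(flags, max_invalid=0.25):
    """Keep the 60 s window only if at most 25% of it is invalid."""
    return 1.0 - sum(flags) / len(flags) <= max_invalid
```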
We also retrieved self-reported responses from participants at the end of each stressor task. The self-
reported responses were grouped into two categories: "stressed" (responses of Stressed) and "not stressed" (responses of Happy, Tired, or Neutral). These responses were used as labels for the
analysis for determining subjective stress patterns (see section on “Objective and subjective models of
stress”).
Figure 3. Overview of the multi-stage data processing and machine learning pipeline for objective and subjective stress. (LOSO: leave one subject out.)
Feature Extraction
Features were extracted from valid segments in the 60-second window after pre-processing. We used
the peak detection method with adaptive filtering [40] to extract the inter-beat interval (IBI) series from
PPG data. IBI refers to the time interval between individual heart beats. Figure 3 displays the waveforms
from the PPG sensor. Each peak (marked by a red dot) represents one heartbeat. By detecting the peaks
of the waveform and calculating the horizontal distances between the dots, we can extract the time
series of the IBI.
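A simplified version of this extraction step (SciPy's generic find_peaks stands in for the adaptive-filtering peak detector of [40]; the minimum peak distance is our assumption):

```python
import numpy as np
from scipy.signal import find_peaks

def ibi_series(ppg, fs=200.0):
    """Detect heartbeat peaks in a filtered PPG window and return the
    inter-beat interval (IBI) series in seconds."""
    # require peaks >= 0.25 s apart, i.e. heart rate <= 240 bpm
    peaks, _ = find_peaks(ppg, distance=int(0.25 * fs))
    return np.diff(peaks) / fs
```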
Heart Rate Variability (HRV) features and other non-HRV features were derived from the IBI series. We
also used the Detrended Fluctuation Analysis (DFA), a method to measure the statistical self-similarity of
a signal, to determine non-stationarity within the IBI series [41].
In addition, we extracted respiration-related features, which have been known to be associated with
stress [21,42]. Though the smartwatch sensors do not directly provide respiratory rate, we estimated
respiratory rate based on three respiratory-induced variations from the IBI series and the PPG signal,
similar to what has been used in previous research [43,44]: respiratory-induced amplitude variation
(RIAV), respiration-induced intensity variation (RIIV), and respiratory-induced frequency variation
(RIFV). RIAV is the change in peripheral pulse strength, caused by reduced ventricular filling, and is the
peak height change in the raw PPG waveform. RIIV is the change of perfusion baseline, caused by the
intrathoracic pressure, and is the intensity change in each peak-to-valley in the raw PPG waveform.
Finally, RIFV is the change of heart rate, caused by the autonomic response to the respiration cycle, and
is represented as the heartbeat interval change. We interpolated these three variations using linear
interpolation at 100Hz. Since respiration is cyclic, these variations are also expected to be cyclic. We applied a Fast Fourier Transform (FFT) with a Hamming window to calculate the major frequency of each respiration-induced variation in the data window. The major sinusoidal frequency in the FFT was used as an estimate
of the respiratory rate.
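The FFT estimation step can be sketched as follows (the search band for plausible breathing frequencies is our assumption, not a value from the study):

```python
import numpy as np

def dominant_frequency(x, fs=100.0, band=(0.1, 0.5)):
    """Respiratory-rate estimate: the strongest FFT frequency of a
    Hamming-windowed respiration-induced variation signal, searched
    within a plausible breathing band (in Hz)."""
    x = (x - np.mean(x)) * np.hamming(len(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spectrum = np.abs(np.fft.rfft(x))
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return freqs[mask][np.argmax(spectrum[mask])]
```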
Table 1. Features that were extracted from the IBI and PPG signals. HRV features [45]: SDNN,
standard deviation of the IBI of normal heartbeats; RMSSD, root mean square of successive
differences between normal heartbeats; SDSD, standard deviation of differences between
adjacent IBI; pNNX, percentage of successive IBIs that differ by more than X milliseconds.
Non-HRV features are other features extracted from the IBI time series. Respiratory-related
features: FFT_RIIV, major frequency of the respiratory-induced intensity variations; FFT_RIAV,
major frequency of the respiratory-induced amplitude variations; FFT_RIFV, major frequency
of the respiratory-induced frequency variations.
Category                        Features
HRV features                    SDNN, RMSSD, SDSD, pNN20, pNN50,
                                low frequency (LF) energy (0.04-0.15 Hz),
                                high frequency (HF) energy (0.15-0.40 Hz),
                                LF/HF energy ratio (LF_HF)
Non-HRV inter-beat              mean, median, minimum, maximum,
interval features               interquartile range (iqr), 20th percentile,
                                80th percentile, detrended fluctuation
                                analysis (DFA), heart rate
Respiration-related features    FFT_RIIV, FFT_RIAV, FFT_RIFV
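Several of the time-domain HRV features in Table 1 follow directly from the IBI series; a minimal sketch (IBIs in milliseconds; the choice of sample vs. population standard deviation is ours):

```python
import numpy as np

def hrv_time_features(ibi_ms):
    """Time-domain HRV features (as defined in Table 1) from an IBI
    series given in milliseconds."""
    ibi_ms = np.asarray(ibi_ms, dtype=float)
    diffs = np.diff(ibi_ms)
    return {
        "SDNN": np.std(ibi_ms, ddof=1),
        "RMSSD": np.sqrt(np.mean(diffs ** 2)),
        "SDSD": np.std(diffs, ddof=1),
        "pNN20": 100.0 * np.mean(np.abs(diffs) > 20),
        "pNN50": 100.0 * np.mean(np.abs(diffs) > 50),
    }
```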
A total of 20 features were extracted (see Table 1). All features were standardized using a normalization
method, where the median was removed, and each feature was divided by its interquartile range. As
there are large differences in an individual’s physiological signal manifestations, we applied this
normalization method on each individual’s feature data to alleviate the subject-specific components in
the feature data [21,23,30].
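The normalization described above (remove the median, divide by the interquartile range, applied separately to each individual's data) is equivalent to scikit-learn's RobustScaler and can be sketched as:

```python
import numpy as np

def robust_normalize(features):
    """Median/IQR normalization, column-wise (rows = windows,
    columns = features), applied to one individual's feature matrix."""
    features = np.asarray(features, dtype=float)
    med = np.median(features, axis=0)
    q75, q25 = np.percentile(features, [75, 25], axis=0)
    return (features - med) / (q75 - q25)
```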
We used multiple machine learning models for both objective and subjective stress detection, including support vector machine (SVM), random forest (RF), AdaBoost, gradient boosting (GB) and
logistic regression (LR). These models have been widely applied in the literature on developing similar
health-related models [23,46]. These models also generate probability estimates for each prediction. By
tuning the threshold probabilities, it is possible to achieve a desired sensitivity or specificity. We used a
fixed threshold for the prediction, where a signal with a probability > 0.5 was categorized as stressed. The hyperparameters for each model were tuned with a grid search to achieve the highest F1-score (see Supplementary Materials). The F1-score is the harmonic mean of precision and recall, and summarizes model performance well on an imbalanced dataset [47]. When training the models, we up-sampled the minority
class to avoid skewed prediction on the majority class, as we have more data in the resting period (non-
stressor) than in the stressor tasks. We applied leave-one-subject-out (LOSO) cross validation; in other
words, we evaluated each participant’s data with a model trained on all the other participants’ data.
This ensured that there was no overlap for each participant between the training and validation dataset.
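The training loop described above can be sketched with scikit-learn (LeaveOneGroupOut implements LOSO; the SVM hyperparameters here are placeholders, not the tuned values from the grid search):

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC
from sklearn.utils import resample

def loso_accuracy(X, y, subjects):
    """Leave-one-subject-out evaluation with minority-class
    up-sampling in each training fold."""
    scores = []
    for train, test in LeaveOneGroupOut().split(X, y, groups=subjects):
        Xt, yt = X[train], y[train]
        # up-sample the minority class to match the majority count
        minority_label = 0 if (yt == 1).sum() > (yt == 0).sum() else 1
        mask = yt == minority_label
        Xm, ym = resample(Xt[mask], yt[mask],
                          n_samples=int((~mask).sum()), random_state=0)
        Xt = np.vstack([Xt[~mask], Xm])
        yt = np.concatenate([yt[~mask], ym])
        clf = SVC(kernel="rbf").fit(Xt, yt)
        scores.append(clf.score(X[test], y[test]))
    return float(np.mean(scores))
```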
After choosing the model with the best F1-score, we ran the feature selection algorithms to eliminate
the highly correlated [48] and unimportant features. For the SVM model with radial basis function (RBF)
kernels, we employed the multi-kernel learning for feature selection. The feature importance was
ranked based on the kernel weight coefficient for each feature [49]. For the tree-based models, the
feature importance was derived from the Sklearn Python package [50]. For the LR, the weight coefficient
of each feature was regarded as feature importance. Feature selection helped in trimming the model
and avoiding overfitting.
For both objective and subjective stress models, we computed the area under receiver operating curve
(AUROC), accuracy, sensitivity (recall), specificity, and precision (positive predictive value). All the
evaluations were run 10 times, and an average performance metric with standard deviations was used
for all reported results.
In the machine learning models described above, we first used the same threshold of probability
estimates (=0.5) for each participant. In other words, we classified data signals as stressed if the
probability estimates exceeded this threshold. We also developed models with a personalized threshold for subjective stress detection. Subjective self-reported responses usually suffer from individual differences in reporting [51]. To address the challenge induced by these inter- and intra-individual
differences in the experience of stress, we incorporated a personalized self-perception of stress
threshold by exploiting the correlation between the model prediction threshold and the participant’s
cumulative PSS score. Towards this end, we extracted the best threshold using a grid search, to obtain
the best prediction accuracy score for each participant in the training set. Then, based on the best
threshold, we used ridge regression [52] to fit the relationship between the best threshold and a
participant’s PSS score in the training set. We generated the threshold for each individual from the
regression model in the testing set, and used this threshold to classify signals as stressed or as not
stressed for the individual. This approach personalizes the stress detection with the generated threshold
from the PSS score regression model, while avoiding the complexity of training a personalized machine
learning model for each individual (See Figure 3, bottom half).
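The threshold-personalization step can be sketched as follows (the ridge penalty alpha and the clipping range are our assumptions; best_thresholds would come from the per-participant grid search described above):

```python
import numpy as np
from sklearn.linear_model import Ridge

def fit_threshold_model(pss_train, best_thresholds):
    """Fit a ridge regression from training participants' PSS scores to
    the probability threshold that maximized each one's accuracy."""
    X = np.asarray(pss_train, dtype=float).reshape(-1, 1)
    return Ridge(alpha=1.0).fit(X, best_thresholds)

def personalized_threshold(model, pss_score):
    """Predict a held-out participant's threshold from their PSS score."""
    t = float(model.predict([[pss_score]])[0])
    return float(np.clip(t, 0.05, 0.95))  # keep within a usable range
```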
To investigate the differences in the features between stressed and non-stressed groups (in both
subjective and objective models), we performed one-way analysis of variance (ANOVA) tests on all the normalized features to test whether the mean values differed significantly between the stressed and non-stressed groups of signals. We used a p-value < 0.05 for significance, unless
otherwise specified.
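With two groups, a one-way ANOVA reduces to a single F-test on the group means; a sketch with SciPy:

```python
import numpy as np
from scipy.stats import f_oneway

def feature_differs(stressed_vals, non_stressed_vals, alpha=0.05):
    """One-way ANOVA on a single normalized feature: returns the
    p-value and whether the group means differ significantly."""
    _, p = f_oneway(stressed_vals, non_stressed_vals)
    return p, p < alpha
```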
Results
General Characteristics
All participants (n=32) successfully completed the entire study protocol. However, data on two
participants were not used for final analysis due to partial malfunction of the smartwatch. Participants
were primarily female (n=24), with an average age of 36 years (S.D.=12.6, Supplementary Figure S1).
The average PSS score across participants was 11 (S.D.=5.0; Supplementary Figure S2), indicating that participants on average had low or moderate perceived stress. Based on the PSS score ranges [28,54,55], 10 participants reported moderate stress, 22 participants reported low or no stress, and no participants reported high stress. The average stress score on the DASS scale was 5.2 (S.D.=4.5). A
moderate positive relationship was observed between the PSS and DASS (r=0.53, p<0.005).
All tasks were completed by all participants, except for nine participants who completed only part of the cold stressor task (Supplementary Figure S3). In total, we captured 1700 minutes of PPG
signal data, with a 1160-minute resting period, and a 540-minute stressor period.
Predicting Stressed and Non-Stressed Periods from Stressor Tasks (Objective Stress)
We first investigated the ability of machine learning models to predict the objective stress. Based on the
objective stress definition, we labeled the data during the stressor tasks as “stressed” and video-based
resting period as “not-stressed”. A fixed threshold of 0.5 was adopted for the probability output from
the machine learning models: i.e., if the probability was greater than 0.5, we classified the signal as
stressed (and vice-versa).
We found that the SVM outperformed the other machine learning models, with an F-1 Score of 0.623,
highlighting that these models have predictive capabilities of differentiating stress and resting periods
(See Table 2).
Table 2. Predictions of stressed and non-stressed periods using multiple machine learning algorithms for
objective stress. Mean (S.D.) are reported.
Next, we investigated whether the machine learning models can differentiate between the three
induced stressor tasks, i.e., the social (speech), cognitive (math), and physical (cold) stressors. As the
SVM achieved the best performance for differentiating stress and non-stress periods, we evaluated its
performance on each of the stressor tasks.
Table 3. Predictions of social, cognitive and physical stressor tasks using the SVM model. Mean (S.D.) are
reported.
Figure 4. Clusters of various stressor activities based on t-distributed stochastic neighbor embedding (t-
SNE). The social and cognitive stressors (red and yellow dots) are clustered separately from the video-
based resting phase (green dots). The cold stressors could not be differentiated from the video-based
resting tasks.
We investigated the potential causes for the lower performance of the physical (cold) stressor tasks.
Using t-distributed stochastic neighbor embedding (t-SNE), an unsupervised approach, we visualized the
extracted features across the three stressor tasks (See Figure 4). The t-SNE plot showed that features
from the social and cognitive stressor tasks (yellow and red dots) were separated from the large cluster
of resting tasks (dark green dots). We can observe a relatively clear separation boundary between the
social and cognitive stressors and the video-based resting period, but no clear separation boundary
between physical stressor and video-based resting period. This lack of a clear boundary or separation
between the physical stressor and the resting period potentially explains the lower performance on
physical stressor.
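The visualization step can be sketched with scikit-learn (the perplexity value is our assumption; the study's t-SNE settings are not stated):

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_2d(features, random_state=0):
    """Project feature windows to 2-D with t-SNE for visual inspection
    of how the stressor tasks cluster."""
    return TSNE(n_components=2, perplexity=30.0, init="pca",
                random_state=random_state).fit_transform(features)
```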
Predicting Stressed and Non-Stressed Periods from Self-reported Responses (Subjective Stress)
For predicting the subjective stress, we used the participants’ self-reported responses after each
stressor task as the ground truth. When a participant’s self-reported response was “not stressed,” we
labeled the data during that stressor task as "non-stressed" (and vice-versa). Nearly 48% of stressor tasks
were labelled as “stressed” based on participants’ self-reported responses. We first used the fixed
threshold of 0.5 as the cutoff to classify the stressed and non-stressed (same as the objective stress
model).
Table 4. Predictions of stressed and non-stressed periods using multiple machine learning algorithms for
subjective stress. Mean (S.D.) are reported (based on the fixed threshold of 0.5).
Similar to objective stress, SVM achieved the highest F-1 score (0.520). However, the model
performance was lower, indicating that the subjective stress was potentially harder for machine learning
algorithms to detect.
We performed one-way ANOVA tests on all features (from Table 1) for each stressor task comparing
self-reported responses of “stressed” and “non-stressed” (See Figure 5). Comparing each stressor with
the video-based resting (i.e., social stressor-resting (S-R), cognitive stressor-resting (C-R), physical
stressor-resting (P-R)), we found that the HR during social and cognitive stressors was significantly
higher than during video-based resting. The SDNN and LF were significantly lower during the cognitive
stressor. SDSD and RMSSD were higher for the social stressor. Other HRV features did not show
significant differences between the stressor and video-based resting.
Similarly, respiration-related features such as the RIAV_FFT and the RIIV_FFT were significantly higher
during the cognitive stressor. In contrast, comparisons between the physical stressor and the resting
phases showed fewer features with significant differences. The physical stressor also induced smaller
heart rate and SDNN changes than the other stressors (See Supplementary Figure S6). Similarly, we found
that periods labelled using self-reported stress (i.e., subjective stress) had fewer significant differences
on the considered features compared with the objective stressor models (See last column of Figure 5),
consistent with their lower machine learning performance.
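A feature comparison of this kind can be run with SciPy's one-way ANOVA. The heart-rate values below are synthetic stand-ins with an assumed effect size, not the study's data:

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(1)

# Hypothetical per-window heart rates (bpm) for one stressor-vs-resting
# comparison; the elevation during the stressor is an assumed effect size.
hr_stressor = rng.normal(82, 5, size=100)
hr_resting = rng.normal(72, 5, size=100)

# One-way ANOVA; with exactly two groups this is equivalent to an
# independent-samples t-test (F = t^2).
stat, p_value = f_oneway(hr_stressor, hr_resting)
significant = p_value < 0.05
```

Repeating this test per feature and per stressor-resting pair, and marking cells by sign and significance, yields a summary like Figure 5.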
Figure 5. Differences of means for each feature between stressed and non-stressed (in both subjective
and objective stress). Each row represents a feature; each column represents the comparison of
stressed and non-stressed groups. Colors represent positive (red) or negative (blue) differences. For
example, the heart rate (HR) is significantly higher (i.e., red) during the social stressor compared to
video-based resting. (S: social stressor; C: cognitive stressor; P: physical stressor; ALL: all three stressors;
R: video-based resting; Reported_1: stressed based on self-reported response; Reported_0: not stressed
based on self-reported response and video-based resting; *p<0.05, **p<0.005).
We analyzed the differences in PSS scores between participants who reported stress and those who did
not, based on their self-reported responses for each of the three stressor tasks, using Wilcoxon rank-
sum tests (See Table 5). We observed that the mean PSS score of participants reporting being stressed
was larger than that of those reporting not stressed, for both the social and cognitive stressors, although
the differences were not statistically significant. However, the differences in means between the non-
stressed and stressed groups could potentially be used to improve the subjective stress prediction
models by utilizing these scores as a personalized stress threshold.
Table 5. Differences in the PSS scores for participants reporting as stressed or non-stressed with each of
the stressor tasks (on self-reports). Note that the p values were not statistically significant between the
stressed and non-stressed groups for each of the tasks.
Stressor task        Difference of mean PSS score    p value
Social (Speech)      3.14                            0.085
Cognitive (Math)     3.18                            0.055
Physical (Cold)      1.67                            0.243
Towards this end, we extracted the best prediction threshold, i.e., the cutoff achieving the highest
classification accuracy for each individual, using a grid search over the probability outputs of the
machine learning models and each participant's response labels (stressed or not stressed). Table 6
shows the prediction results using the best threshold. The results show a substantial improvement over
those with the fixed threshold (=0.5) (See Table 4).
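The per-participant grid search described above can be sketched as follows; the probabilities, labels, and grid resolution are illustrative assumptions, not the study's actual values:

```python
import numpy as np

def best_threshold(probs, labels, grid=np.linspace(0.05, 0.95, 19)):
    """Return the probability cutoff that maximizes classification accuracy
    for one participant (requires that participant's ground-truth labels)."""
    accuracies = [np.mean((probs >= t).astype(int) == labels) for t in grid]
    return float(grid[int(np.argmax(accuracies))])

# Hypothetical per-task stress probabilities from a trained model, and the
# participant's self-report labels (1 = stressed, 0 = not stressed)
probs = np.array([0.20, 0.35, 0.40, 0.55, 0.70, 0.90])
labels = np.array([0, 0, 1, 1, 1, 1])

t_best = best_threshold(probs, labels)  # here a cutoff below the fixed 0.5
```

Because this search consumes the participant's own labels, it is an upper bound on personalization; the regression-based threshold discussed later removes that dependence.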
Table 6. Model performance using the best threshold for each participant. Mean (S.D.) are
reported.
To investigate the relationship between the PSS scores and the best probability threshold, we measured
the degree of association using Pearson's correlation coefficient, and found a negative correlation
between the PSS score and the best probability threshold. Participants reporting that they were stressed
on their self-reports tended to have a higher PSS score (i.e., higher perceived stress) and a lower
threshold for being predicted as stressed. With a lower threshold, the machine learning models are
more likely to predict the signals as stressed.
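This association can be computed with SciPy's `pearsonr`; the PSS scores and thresholds below are fabricated solely to illustrate the negative direction of the correlation reported above:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical PSS scores and per-participant best thresholds, constructed
# only to mirror the reported pattern: higher PSS -> lower threshold
pss = np.array([10.0, 14.0, 18.0, 22.0, 26.0, 30.0])
thresholds = np.array([0.65, 0.60, 0.50, 0.45, 0.38, 0.30])

# Pearson's correlation coefficient and its two-sided p value
r, p = pearsonr(pss, thresholds)  # r is negative for this data
```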
To take advantage of this observation, we trained a ridge regression model between the best threshold
and the PSS score on the training set. Then, on the test set, we generated the threshold for each
participant from the trained regression model. Unlike the best threshold, which requires the ground-
truth labels in the test set to be selected for each participant, the regression model is obtained from the
training set alone; because participants in the test set do not rely on any ground-truth labels, the
approach can be generalized to new, unseen participants.
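A minimal sketch of this regression step, assuming hypothetical PSS scores, best thresholds, and regularization strength:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical training-set pairs: each participant's PSS score and the
# best threshold found for them on the training data
pss_train = np.array([[10.0], [14.0], [18.0], [22.0], [26.0], [30.0]])
thr_train = np.array([0.65, 0.60, 0.50, 0.45, 0.38, 0.30])

# Fit PSS -> threshold with ridge regression (alpha is an assumed setting)
model = Ridge(alpha=1.0).fit(pss_train, thr_train)

# For a held-out participant, only the PSS score is needed to obtain a
# personalized threshold; no ground-truth stress labels are required.
thr_new = float(model.predict(np.array([[20.0]]))[0])
```

At prediction time, windows whose model probability exceeds `thr_new` would be labeled stressed for that participant.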
Table 7 shows the LOSO cross-validation results with the personalized threshold retrieved from the
linear regression. The SVM model had the best performance, with an F-1 score of 0.599. The
performance of the models dropped compared to the best-threshold models (Table 6) but was still
considerably better than with the fixed threshold (Table 4).
Table 7. Model performance using the threshold derived from the linear regression. Mean (S.D.) are
reported.
Discussion
The impact of stress on the development of chronic diseases is well acknowledged; stress has been
labeled a "silent killer" [10]. In spite of its considerable effects on long-term health, we have limited
mechanistic characterization of the causal underpinnings of stress. An understanding of the interplay
between subjective and objective markers of stress can provide preliminary insights to address the
deleterious effects of stress. There is limited consensus in the current literature regarding the
associations between subjective and objective stress. For example, Föhr et al. found that subjective, self-
reported stress was associated with objective HRV-based stress and recovery, but was affected by
several external factors (e.g., physical activity and body composition). Others have noted that objective
and subjective measures of stress, in fact, measure different things and may have different pathogenic
consequences [6,7], although both could lead to adverse outcomes [8]. We investigated subjective
stress and objective stress separately using data-driven machine learning approaches and evaluated
potential associations between them.
Towards this end, we conducted a laboratory-based study collecting physiological data from a
smartwatch alongside objective markers of stress and subjective participant responses (self-reports) on
episodes of stress. Using machine learning techniques, we compared the performance of models that
used physiological signals to detect stress against both the objective markers and the subjective
participant responses. We found that the performance of subjective stress models was lower than that
of the objective stress models; however, the use of personalized thresholds derived from standardized
scales such as the PSS improved the performance of the subjective models. This study, conducted using
a commercial off-the-shelf smartwatch, affords new opportunities and directions for the study of stress
in routine situations. We discuss directions for future research on the measurement of stress using
smartwatches.
First, as opposed to prior studies that used multiple body-worn sensing devices [21,22], our study
utilized only a single smartwatch, and achieved reasonable performance compared to similar studies
[21–23]. One highlight of our study is that the smartwatch did not add stress to participants, in contrast
to the complex body-worn devices used in other studies [56], which helps in capturing robust,
potentially unbiased signals. With the advances in wearable technology and modeling techniques, there
is considerable potential for improving the performance of stress prediction models. Closely related is
the fact that we derived respiration-related features from the raw PPG signal; some of these features
(RIAV_FFT, RIFV_FFT) showed significant differences between periods of stress and resting and were
used in the models (Supplementary Figures S4 and S5). This suggests potential for utilizing respiration-
related features for stress prediction. Although we are currently limited to the respiration-related
features available from the PPG sensors, this provides a direction for future research.
Second, although commercial smartwatches, such as the one we used, afford real-time, unobtrusive
capabilities for physiological signal monitoring, effective signal processing remains challenging. We
created a multi-stage data processing pipeline that could be used in future studies relying on
physiological signals from smartwatches. The forward-backward filter that we used can remove high-
frequency noise and baseline drift without introducing phase shifts. Given the impact of motion artifacts
on smartwatch-based sensing, our multi-stage approach could be used to mitigate noise in smartwatch
signal data. Our approach involved creating sliding windows and associated sub-windows (10 s duration
with a 2 s step size) for feature extraction and noise elimination. Within the sub-windows, the
combination of the motion detector and heartbeat pattern detection can detect both motion artifacts
and poor contact of the PPG sensor, helping eliminate noisy data. This contributes to a higher data yield,
as we can preserve data after eliminating noisy spikes. Features extracted from windows free of motion
artifacts and noise are likely to introduce fewer confounders into the prediction models.
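A forward-backward (zero-phase) filter of this kind can be sketched with SciPy's `filtfilt`. The sampling rate, filter order, and cutoff frequencies below are illustrative assumptions, not the study's actual parameters, and the signal is synthetic:

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 25.0  # assumed PPG sampling rate in Hz (the device's rate may differ)

# Band-pass Butterworth design: the low cutoff suppresses baseline drift,
# the high cutoff suppresses high-frequency noise.
b, a = butter(N=2, Wn=[0.5, 8.0], btype="bandpass", fs=fs)

# Synthetic PPG-like signal: a 1.2 Hz pulse, a slow upward drift, and noise
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(2)
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.5 * t / 60 + 0.1 * rng.normal(size=t.size)

# filtfilt applies the filter forward and then backward, so the phase
# shifts of the two passes cancel (zero-phase filtering)
filtered = filtfilt(b, a, ppg)
```

The backward pass makes the overall response zero-phase, which keeps beat timings intact for the downstream HRV features.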
Third, our approach of utilizing inter-individual differences as an input for model prediction offers new
directions for modeling stress in free-living situations. Although in our case the model performance did
not improve much compared with the objective stress model, this approach affords a realistic way to
personalize stress models. Previous studies (e.g., Hovsepian et al. [21]) have explored the potential for
developing personalized models based on each participant's training data. However, such models
depend on a limited amount of participant training data and often do not scale to real-world
applications. Additionally, such models tend to overfit to each participant, restricting generalizability.
Smets et al. conducted a large-scale stress study with wearable sensors in a free-living setting [30]; their
prediction model had an F-1 score of 0.4. In contrast, our approach utilizes a generic model, relying on a
standardized stress scale with a personalized threshold for each participant. In our models, we
investigated the relationship between the survey responses and the threshold. Additional variables such
as age, work environment, lifestyle, and behaviors could potentially be incorporated to build a more
informed personalized threshold. Such an approach can provide new directions towards a precision
medicine approach for stress, utilizing personalized models that adjust for the physiological stress
response [30].
Finally, although we found that the models using subjective stressors had lower performance, our
methodological approach has several pragmatic uses. Much of the research on self-reports, especially
those using non-standard scales, has used them as a "gold standard," relying on their ecological validity
rather than using objective measurements for comparison [57,58]. In this study, we compared the
objective and subjective (self-reported) measures of stress using a novel machine learning-based
comparison. Our mechanism for such a comparison relied on appropriately timed smartwatch-based
self-reports [20]. Self-reports delivered via smartwatches offer a sustainable approach to capturing
subjective markers, as they provide a quick and easy mechanism for participants to respond to
questions. Our response rate during the laboratory-based study was 100% across all tasks (n=96).
Although not reported in this paper, the overall response rate of the self-reports during the free-living
phase was nearly 90%, with an average response time of <30 seconds, showing the potential for
participant compliance with such short self-reports. As such, the smartwatch-based self-report approach
can be particularly useful for capturing subjective responses in a variety of settings.
We acknowledge several limitations of this study. This was a single-site study with 32 participants, and
as such the results may not be generalizable; in particular, the laboratory-based study is unlikely to
account for the complexities associated with capturing physiological signals in free-living settings. We
did not counterbalance the order of presentation of the stressor tasks, which may have affected stress
perception in the later tasks, such as the cold stressor. It is also possible that the stressor tasks did not
induce the necessary stress in the participants, which may have affected their subjective self-reported
responses. The stressor tasks were developed from previously used experiments with wearable sensors
[21]; these tasks were simplified versions of the Trier Social Stress Test [59] and the physical (cold)
stressor. Although previous literature has shown that deep learning techniques could potentially
address motion artifacts and noisy data (see, e.g., [60]), the relatively small set of participants made it
difficult to apply such models to our data. As such, we relied on our proposed data modeling algorithms
to remove noisy fragments of the physiological signals and ensure maximum data availability.
This was an exploratory study comparing the performance of machine learning-based models for stress
prediction using subjective and objective markers of stress. Although the subjective stress models had
lower performance than the objective stress models, additional research with a larger sample of
participants is required. Our approach, however, establishes a foundation for commercial smartwatch-
based stress detection, for developing and comparing objective and subjective stress prediction models,
and for personalized stress prediction models. Finally, we did not report on the data collected for a day
after participants completed the laboratory-based portion of the study. The purpose of data collection
during this phase was to evaluate the viability of collecting self-reports in the wild and mapping them to
corresponding physiological signals from a smartwatch. However, because of the relatively small data
sample (1 day), we could not perform any meaningful computational analyses. We are currently
exploring the possibility of extending our models to stress prediction in free-living situations.
Data Availability
De-identified data that support the findings are available from the corresponding author upon
reasonable request.
Acknowledgments
This research was supported in part by a grant-in-aid from the Division of Clinical and Translational
Research of the Department of Anesthesiology, Washington University School of Medicine, St. Louis,
Missouri; and in part, by the Healthcare Innovation Lab and the Institute for Informatics at BJC
HealthCare and Washington University School of Medicine. The authors would also like to give special
thanks to Dr. Simon Haroutounian for providing the physical stressor equipment.
Author Contributions
R.D., T.K., C.L., L.Y., M.A., and E.L. contributed to the design and implementation of the study. R.D. and
L.Y. collected the data. R.D. performed the data analysis under the supervision of T.K. and C.L. All
authors contributed to the interpretation of results and the final review of the manuscript.
Competing Interests
The authors declare no competing interests.
Declaration of interests
The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.
References
[1] H.M. van Praag, Can stress cause depression?, World J. Biol. Psychiatry. 6 (2005) 5–22.
https://doi.org/10.1080/15622970510030018.
[2] J.N. Morey, I.A. Boggero, A.B. Scott, S.C. Segerstrom, Current directions in stress and human
immune function, Curr. Opin. Psychol. 5 (2015) 13–17.
https://doi.org/10.1016/j.copsyc.2015.03.007.
[3] S.C. Segerstrom, G.E. Miller, Psychological stress and the human immune system: A meta-analytic
study of 30 years of inquiry, Psychol. Bull. 130 (2004) 601–630. https://doi.org/10.1037/0033-
2909.130.4.601.
[4] C.A. Low, K. Salomon, K.A. Matthews, Chronic life stress, cardiovascular reactivity, and subclinical
cardiovascular disease in adolescents, Psychosom. Med. 71 (2009) 927–931.
https://doi.org/10.1097/PSY.0b013e3181ba18ed.
[5] S. Arora, N. Sevdalis, D. Nestel, M. Woloshynowych, A. Darzi, R. Kneebone, The impact of stress
on surgical performance: A systematic review of the literature, Surgery. 147 (2010) 318-330.e6.
https://doi.org/10.1016/j.surg.2009.10.007.
[6] Í.J.S. Ribeiro, R. Pereira, I. V. Freire, B.G. de Oliveira, C.A. Casotti, E.N. Boery, Stress and Quality of
Life Among University Students: A Systematic Literature Review, Heal. Prof. Educ. 4 (2018) 70–77.
https://doi.org/10.1016/j.hpe.2017.03.002.
[7] R. Sinha, Chronic stress, drug use, and vulnerability to addiction, Ann. N. Y. Acad. Sci. 1141 (2008)
105–130. https://doi.org/10.1196/annals.1441.030.
[9] J. Hassard, K.R.H. Teoh, G. Visockaite, P. Dewe, T. Cox, The cost of work-related stress to society:
A systematic review, J. Occup. Health Psychol. 23 (2018) 1–17.
https://doi.org/10.1037/ocp0000069.
[10] S.M. Goodday, S. Friend, Unlocking stress and forecasting its consequences with digital
technology, Npj Digit. Med. 2 (2019) 1–5. https://doi.org/10.1038/s41746-019-0151-8.
[11] O. Parlak, S.T. Keene, A. Marais, V.F. Curto, A. Salleo, Molecularly selective nanoporous
membrane-based wearable organic electrochemical device for noninvasive cortisol sensing, Sci.
Adv. 4 (2018) eaar2904. https://doi.org/10.1126/sciadv.aar2904.
[12] M.D. Hladek, S.L. Szanton, Y.E. Cho, C. Lai, C. Sacko, L. Roberts, J. Gill, Using sweat to measure
cytokines in older adults compared to younger adults: A pilot study, J. Immunol. Methods. 454
(2018) 1–5. https://doi.org/10.1016/j.jim.2017.11.003.
[13] C. Kirschbaum, S. Wüst, H.G. Faig, D.H. Hellhammer, Heritability of cortisol responses to human
corticotropin-releasing hormone, ergometry, and psychological stress in humans, J. Clin.
Endocrinol. Metab. 75 (1992) 1526–1530. https://doi.org/10.1210/jcem.75.6.1464659.
[14] S. Cohen, T. Kamarck, R. Mermelstein, A global measure of perceived stress., J. Health Soc.
Behav. 24 (1983) 385–396. https://doi.org/10.2307/2136404.
[15] P.J. Norton, Depression Anxiety and Stress Scales (DASS-21): Psychometric analysis across four
racial groups, Anxiety, Stress Coping. 20 (2007) 253–265.
https://doi.org/10.1080/10615800701309279.
[16] E. Remor, Psychometric properties of a European Spanish version of the Perceived Stress Scale
(PSS), Span. J. Psychol. 9 (2006) 86–93. https://doi.org/10.1017/S1138741600006004.
[17] R. Siqueira Reis, A.A. Ferreira Hino, C. Romélio Rodriguez Añez, Perceived Stress Scale: Reliability
and Validity Study in Brazil, J. Health Psychol. 15 (2010) 107–114.
https://doi.org/10.1177/1359105309346343.
[18] S. Shiffman, A.A. Stone, M.R. Hufford, Ecological Momentary Assessment, Annu. Rev. Clin.
Psychol. 4 (2008) 1–32. https://doi.org/10.1146/annurev.clinpsy.3.022806.091415.
[19] R. Wang, F. Chen, Z. Chen, T. Li, G. Harari, S. Tignor, X. Zhou, D. Ben-Zeev, A.T. Campbell,
Studentlife: Assessing mental health, academic performance and behavioral trends of college
students using smartphones, in: UbiComp 2014 - Proc. 2014 ACM Int. Jt. Conf. Pervasive
Ubiquitous Comput., Association for Computing Machinery, Inc, New York, New York, USA, 2014:
pp. 3–14. https://doi.org/10.1145/2632048.2632054.
[21] K. Hovsepian, M. Al’absi, E. Ertin, T. Kamarck, M. Nakajima, S. Kumar, CStress: Towards a gold
standard for continuous stress assessment in the mobile environment, in: UbiComp 2015 - Proc.
2015 ACM Int. Jt. Conf. Pervasive Ubiquitous Comput., ACM Press, New York, New York, USA,
2015: pp. 493–504. https://doi.org/10.1145/2750858.2807526.
[22] K. Plarre, A. Raij, S.M. Hossain, A.A. Ali, M. Nakajima, M. Al’Absi, E. Ertin, T. Kamarck, S. Kumar,
M. Scott, D. Siewiorek, A. Smailagic, L.E. Wittmers, Continuous inference of psychological stress
from sensory measurements collected in the natural environment, in: Proc. 10th ACM/IEEE Int.
Conf. Inf. Process. Sens. Networks, IPSN’11, 2011: pp. 97–108.
[23] Z.D. King, J. Moskowitz, B. Egilmez, S. Zhang, L. Zhang, M. Bass, J. Rogers, R. Ghaffari, L.
Wakschlag, N. Alshurafa, micro-Stress EMA: A Passive Sensing Framework for Detecting in-the-
wild Stress in Pregnant Mothers, Proc. ACM Interactive, Mobile, Wearable Ubiquitous Technol. 3
(2019) 1–22. https://doi.org/10.1145/3351249.
[24] P. Schmidt, A. Reiss, R. Duerichen, K. Van Laerhoven, Introducing WeSAD, a multimodal dataset
for wearable stress and affect detection, in: ICMI 2018 - Proc. 2018 Int. Conf. Multimodal
Interact., Association for Computing Machinery, Inc, 2018: pp. 400–408.
https://doi.org/10.1145/3242969.3242985.
[25] E. Smets, G. Schiavone, E.R. Velazquez, W. De Raedt, K. Bogaerts, I. Van Diest, C. Van Hoof,
Comparing task-induced psychophysiological responses between persons with stress-related
complaints and healthy controls: A methodological pilot study, Heal. Sci. Reports. 1 (2018) e60.
https://doi.org/10.1002/hsr2.60.
[28] S. Cohen, G. Williamson, Perceived stress in a probability sample of the United States, Soc.
Psychol. Heal. 13 (1988) 31–67. https://doi.org/10.1111/j.1559-1816.1983.tb02325.x.
[29] A. Osman, J.L. Wong, C.L. Bagge, S. Freedenthal, P.M. Gutierrez, G. Lozano, The Depression
Anxiety Stress Scales-21 (DASS-21): Further Examination of Dimensions, Scale Reliability, and
Correlates, J. Clin. Psychol. 68 (2012) 1322–1338. https://doi.org/10.1002/jclp.21908.
[30] E. Smets, E. Rios Velazquez, G. Schiavone, I. Chakroun, E. D'Hondt, W. De Raedt, J. Cornelis, O.
Janssens, S. Van Hoecke, S. Claes, I. Van Diest, C. Van Hoof, Large-scale wearable data reveal
digital phenotypes for daily-life stress detection, Npj Digit. Med. 1 (2018) 67.
https://doi.org/10.1038/s41746-018-0074-9.
[31] P. Karthikeyan, M. Murugappan, S. Yaacob, A review on stress inducement stimuli for assessing
human stress using physiological signals, in: Proc. - 2011 IEEE 7th Int. Colloq. Signal Process. Its
Appl. CSPA 2011, 2011: pp. 420–425. https://doi.org/10.1109/CSPA.2011.5759914.
[32] E. Smets, W. De Raedt, C. Van Hoof, Into the Wild: The Challenges of Physiological Stress
Detection in Laboratory and Ambulatory Settings, IEEE J. Biomed. Heal. Informatics. 23 (2019)
463–473. https://doi.org/10.1109/JBHI.2018.2883751.
[33] L. Ventzel, C.S. Madsen, A.B. Jensen, A.R. Jensen, T.S. Jensen, N.B. Finnerup, Assessment of acute
oxaliplatin-induced cold allodynia: A pilot study, Acta Neurol. Scand. 133 (2016) 152–155.
https://doi.org/10.1111/ane.12443.
[34] H. Selye, Stress without distress, in: Psychopathol. Hum. Adapt., Springer, 1976: pp. 137–146.
[35] S.D. Kreibig, Autonomic nervous system activity in emotion: A review, Biol. Psychol. 84 (2010)
394–421. https://doi.org/10.1016/j.biopsycho.2010.03.010.
[36] P.M.F. Nielsen, I.J. Le Grice, B.H. Smaill, P.J. Hunter, Mathematical model of geometry and fibrous
structure of the heart, Am. J. Physiol. - Hear. Circ. Physiol. 260 (1991) H1365--H1378.
https://doi.org/10.1152/ajpheart.1991.260.4.h1365.
[37] F. Foroozan, M. Mohan, J.S. Wu, Robust Beat-To-Beat Detection Algorithm for Pulse Rate
Variability Analysis from Wrist Photoplethysmography Signals, in: ICASSP, IEEE Int. Conf. Acoust.
Speech Signal Process. - Proc., 2018: pp. 2136–2140.
https://doi.org/10.1109/ICASSP.2018.8462286.
[38] T. Hao, C. Bi, G. Xing, R. Chan, L. Tu, MindfulWatch, Proc. ACM Interactive, Mobile, Wearable
Ubiquitous Technol. 1 (2017) 1–19. https://doi.org/10.1145/3130922.
[40] L. Dos Santos, J.J. Barroso, E.E.N. Macau, M.F. de Godoy, Application of an automatic adaptive
filter for Heart Rate Variability analysis, Med. Eng. Phys. 35 (2013) 1778–1785.
https://doi.org/10.1016/j.medengphy.2013.07.009.
[41] T. Penzel, J.W. Kantelhardt, L. Grote, J.H. Peter, A. Bunde, Comparison of detrended fluctuation
analysis and spectral analysis for heart rate variability in sleep and sleep apnea, IEEE Trans.
Biomed. Eng. 50 (2003) 1143–1151. https://doi.org/10.1109/TBME.2003.817636.
[42] A. Hernando, J. Lázaro, E. Gil, A. Arza, J.M. Garzón, R. López-Antón, C. De La Camara, P. Laguna, J.
Aguiló, R. Bailón, Inclusion of Respiratory Frequency Information in Heart Rate Variability Analysis
for Stress Assessment, IEEE J. Biomed. Heal. Informatics. 20 (2016) 1016–1025.
https://doi.org/10.1109/JBHI.2016.2553578.
[43] W. Karlen, S. Raman, J.M. Ansermino, G.A. Dumont, Multiparameter Respiratory Rate Estimation
From the Photoplethysmogram, IEEE Trans. Biomed. Eng. 60 (2013) 1946–1953.
https://doi.org/10.1109/TBME.2013.2246160.
[44] D. Jarchi, D. Salvi, L. Tarassenko, D.A. Clifton, Validation of instantaneous respiratory rate using
reflectance ppg from different body positions, Sensors. 18 (2018) 3705.
https://doi.org/10.3390/s18113705.
[45] F. Shaffer, J.P. Ginsberg, An Overview of Heart Rate Variability Metrics and Norms, Front. Public
Heal. 5 (2017) 258. https://doi.org/10.3389/fpubh.2017.00258.
[46] A. Bogomolov, B. Lepri, M. Ferron, F. Pianesi, A.S. Pentland, Pervasive stress recognition for
sustainable living, in: 2014 IEEE Int. Conf. Pervasive Comput. Commun. Work. PERCOM Work.
2014, 2014: pp. 345–350. https://doi.org/10.1109/PerComW.2014.6815230.
[47] S. Al-Azani, E.S.M. El-Alfy, Using Word Embedding and Ensemble Learning for Highly Imbalanced
Data Sentiment Analysis in Short Arabic Text, in: Procedia Comput. Sci., 2017: pp. 359–366.
https://doi.org/10.1016/j.procs.2017.05.365.
[48] M.A. Hall, Correlation-based Feature Selection for Machine Learning, 1999.
[51] M.B. Solhan, T.J. Trull, S. Jahng, P.K. Wood, Clinical Assessment of Affective Instability: Comparing
EMA Indices, Questionnaire Reports, and Retrospective Recall, Psychol. Assess. 21 (2009) 425–
436. https://doi.org/10.1037/a0016869.
[52] D.M. Hawkins, The Problem of Overfitting, J. Chem. Inf. Comput. Sci. 44 (2004) 1–12.
https://doi.org/10.1021/ci0342472.
[53] L. Van Der Maaten, G. Hinton, Visualizing data using t-SNE, J. Mach. Learn. Res. 9 (2008) 2579–
2625.
[54] M.J. Perera, C.E. Brintz, O. Birnbaum-Weitzman, F.J. Penedo, L.C. Gallo, P. Gonzalez, N. Gouskova,
C.R. Isasi, E.L. Navas-Nacher, K.M. Perreira, S.C. Roesch, N. Schneiderman, M.M. Llabre, Factor
structure of the Perceived Stress Scale-10 (PSS) across English and Spanish language responders
in the HCHS/SOL Sociocultural Ancillary Study., Psychol. Assess. 29 (2017) 320–328.
https://doi.org/10.1037/pas0000336.
7 (2019) e13978. https://doi.org/10.2196/13978.
[57] J. Kim, T. Nakamura, H. Kikuchi, T. Sasaki, Y. Yamamoto, Co-Variation of Depressive Mood and
Locomotor Dynamics Evaluated by Ecological Momentary Assessment in Healthy Humans, PLoS
One. 8 (2013) e74979. https://doi.org/10.1371/journal.pone.0074979.
[58] A.E. Cain, C.A. Depp, D. V. Jeste, Ecological momentary assessment in aging research: A critical
review, J. Psychiatr. Res. 43 (2009) 987–996. https://doi.org/10.1016/j.jpsychires.2009.01.014.
[59] C. Kirschbaum, K.M. Pirke, D.H. Hellhammer, The “Trier social stress test” - A tool for
investigating psychobiological stress responses in a laboratory setting, in: Neuropsychobiology,
1993: pp. 76–81. https://doi.org/10.1159/000119004.
[60] Y. Shen, M. Voisin, A. Aliamiri, A. Avati, A. Hannun, A. Ng, Ambulatory atrial fibrillation
monitoring using wearable photoplethysmography with deep learning, in: Proc. ACM SIGKDD Int.
Conf. Knowl. Discov. Data Min., Association for Computing Machinery, New York, NY, USA, 2019:
pp. 1909–1916. https://doi.org/10.1145/3292500.3330657.