Context-Aware Stress Monitoring Using Wearable and Mobile Technologies in Everyday Settings
Abstract
Background
Daily monitoring of stress is a critical component of maintaining optimal physical and
mental health. Physiological signals and contextual information have recently emerged
as promising indicators for detecting instances of heightened stress. Nonetheless,
developing a real-time monitoring system that utilizes both physiological and contextual
data to anticipate stress levels in everyday settings while also gathering stress labels
from participants represents a significant challenge.
Objective
We present a monitoring system that objectively tracks daily stress levels by utilizing
both physiological and contextual data in a daily-life environment. Additionally, we
have integrated a smart labeling approach to optimize the ecological momentary
assessment (EMA) collection, which is required for building machine learning models for
stress detection. We propose a three-tier Internet-of-Things-based system architecture
to address the challenges.
Methods
A group of university students (n=11) consisting of both males (n=4) and females
(n=7) with ages ranging from 18 to 37 years (Mean = 22.91, SD = 5.05) were recruited
from the University of California, Irvine. During a period of two weeks, the students
wore a smartwatch that continuously monitored their physiology and activity levels. A
context-logging application was also installed on their smartphone. They were asked to
respond to several EMAs daily through a smart EMA query system. We employed three
different machine learning algorithms to evaluate the performance of our system. The
mean decrease impurity approach was employed to identify the most significant features.
The k-nearest neighbor imputation technique was used to fill in the missing contextual
features.
Conclusion
We proposed a system for monitoring daily-life stress using both physiological and
smartphone data. The system includes a smart query module to capture high-quality
labels. This is the first system to employ both physiology and context data for stress
monitoring and to include a smart query system for capturing frequent self-reported
data throughout the day.
Introduction
Based on recent reports, a remarkable 70% of individuals in the United States have
encountered at least one symptom of stress within a given month [1]. Long-term stress
can lead to a compromised immune system, cancer, cardiovascular disease, depression,
diabetes, and substance addiction, among other serious effects [1]. In light of these
consequences, the routine monitoring of stress levels has become increasingly essential.
Thus, developing dependable techniques for promptly detecting human stress is
paramount.
The utilization of physiological signals as a modality for identifying stress has been
extensively explored in the literature [2, 3]. Among the physiological signals, the
photoplethysmograph (PPG) signal is considered a valuable information source for
stress detection [4]. This signal is influenced by the cardiac, vascular, and autonomic
nervous systems, which are all known to be impacted by stress [5]. With the rapid
development of wearable technologies, PPG signals can now be conveniently monitored
in daily life settings using cost-effective wearable devices [2]. Moreover, the
advancement of context-logging mobile applications has furnished a mechanism for
continuously monitoring and tracking a user’s contextual information, which
encompasses location, activities, weather, and other pertinent factors, in real-time.
Existing research has already illustrated the importance of this contextual information
in comprehending and detecting stressful events experienced by individuals [6].
Real-time monitoring of physiological signals and contextual data presents a
formidable challenge. The acquisition of physiological signals via smartwatches and
wearable devices is particularly prone to motion artifact noise [7], necessitating
extensive filtering and processing to enable their use in stress detection algorithms
within daily life settings. Moreover, developing a daily life stress monitoring system
mandates access to real-time stress level labels from participants, a task that poses
several challenges [8]. The timing of label querying is critical, requiring careful selection
of moments when participants are not engaged in activities such as sleeping, studying,
or working, to ensure optimal participation and reliable labels. Additionally, capturing
the moments most conducive to experiencing stressful situations is crucial [8].
Beyond these individual obstacles, designing a robust and accurate system that captures
both physiological and contextual information in real-time while querying labels from
participants is an even greater challenge.
In this study, we present a context-aware daily life stress monitoring system that
leverages physiological and contextual data and incorporates a smart label querying
Related Works
This section presents an overview of the related works as summarized in Table 1. The
majority of the existing research works in stress detection are conducted in laboratory
settings or controlled environments [9–11]. In these studies, participants are typically
required to wear wearable devices while engaging in a sequence of experimental tasks,
such as viewing a series of images or videos or being exposed to stressful activities.
During the study, various kinds of bio-signals, such as PPG, Electrocardiogram (ECG),
Electrodermal Activity (EDA), and Electroencephalogram (EEG) are recorded and
employed for building models for stress detection. Despite the remarkable performance
obtained by these controlled experimental methods, such algorithms are not feasible for
usage in real-world stress detection systems. The data gathered in daily life is
susceptible to contextual confounders and motion artifact noise resulting from
movements and routine activities. Moreover, the type of stress encountered in daily life
scenarios can substantially differ from that induced in the controlled laboratory
setting [8].
Recent advancements in stress detection methods have involved using physiological
signals collected in real-world settings [12–16]. This is achieved through the use of
wearable devices, such as smartwatches or smart wristbands, which continuously collect
physiological data from participants. Multiple questionnaires are sent randomly
throughout the day to gather information on stress levels. Finally, machine learning
techniques and statistical algorithms are applied to the collected data to build a stress
model. A disadvantage of these studies is the absence of contextual information in their
stress models, which can result in less reliable stress detection algorithms. The
importance of contextual data in stress detection tasks has already been extensively
demonstrated in the literature [17].
Can et al. [16] propose an objective stress detection system that uses smart bands
and contextual information, such as weather information and activity type (e.g., lecture,
presentation, or relaxation). However, one of the major limitations of this study is its
semi-controlled setting. In their study, the data was obtained during an eight-day
training event, where all the participants followed a predetermined schedule, including
designated training days, free days, midterm presentations, and other similar activities.
Consequently, the contextual data in this study was recorded manually and
is limited to the time and date of the predetermined schedule.
Only a limited number of studies have investigated the integration of physiological
signals and contextual information in a non-controlled, real-world setting [18, 19].
Methods
Study
Starting in November of 2021, we recruited a sample of college students (n = 11) from
the University of California, Irvine, via flyers and faculty outreach. The participants,
comprising both male (n = 4) and female (n = 7) populations, ranged in age from 18 to
37 years (Mean = 22.91, SD = 5.05). The students were enrolled on a rolling basis
and each participated for a total of 2 weeks. During the enrollment process,
participants reviewed our study information document and were then asked whether
they agreed to continue their participation. Consent was obtained verbally and
then documented in an Excel file by the research team
member running the participant session. As a component of the enrollment process,
students were instructed to download 2 mobile applications (one foreground app to
provide EMAs and one background app to perform passive mobile logging) and were
equipped with a smartwatch. Throughout the 2-week period, while wearing the
smartwatch that continuously measured physiology and activity levels, students were
prompted to complete multiple daily EMAs that were triggered by a smart EMA query
system.
The experimental procedures involving human subjects described in this paper were
approved by the Institutional Review Board (IRB) at the University of California,
Irvine.
System Architecture
The architecture of our proposed system is shown in Figure 1. The system comprises
three primary layers that facilitate the collection of physiological and movement data,
capturing contextual data, and querying labels.
Sensor Layer
This study uses Samsung Galaxy Gear Sport Watches as the wearable device. This
smartwatch is equipped with sensors capable of recording PPG (20Hz), accelerometer,
and gyroscope (movement) signals. We designed a custom smartwatch application for
Samsung Galaxy Gear Sport Watches running on the Tizen operating system to gather
these unprocessed PPG and movement signals. The data collected by the watch is
transferred to the cloud layer when it is connected to a local Wi-Fi network, and in the
absence of such a network, the data is transmitted via Bluetooth to a smartphone. Two
services and a user interface (UI) are included in the raw signal acquisition program.
The first service collects sensor data in two-minute recording intervals that take
place every fifteen minutes and delivers the data to the cloud.
Edge Layer
We use the AWARE framework [20] to capture contextual data in everyday settings.
AWARE is an open-source mobile instrumentation framework for logging, sharing, and
reusing mobile context. AWARE uses smartphone built-in sensors to capture daily life
logging information such as phone battery level, weather, location, screen status, etc. In
Cloud Layer
A smart EMA query system (S-EMA) is implemented on the cloud to query labels
throughout the day. The following is a summary of the main rules used by the
S-EMA module to trigger EMAs:
• Sending EMAs only between 7 AM and midnight.
• Sending EMAs only when the user is wearing the watch, as determined from the
accelerometer data.
• Sending EMAs only when the collected data is recent (the watch may record the
data without the Internet and sync it later).
• Querying labels about seven times per day. The frequency of querying
labels is adjusted dynamically to ensure that approximately seven labels are
captured daily. The waiting period between queries is calculated based on the
initial wear time of the watch to achieve this target.
The stress levels in the EMAs are listed as “not at all” (1), “a little bit” (2), “some” (3),
“a lot” (4), and “extremely” (5).
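The triggering rules above can be sketched as a single decision function. This is a minimal illustration rather than the S-EMA implementation: the function names, the 30-minute freshness threshold, and the even spacing of queries between first wear time and midnight are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# All names and thresholds here are hypothetical, chosen only for illustration.
START_HOUR = 7                         # rule 1: no EMAs before 7 AM (window ends at midnight)
MAX_DATA_AGE = timedelta(minutes=30)   # assumed meaning of "recent" data
TARGET_PER_DAY = 7                     # rule 4: aim for about seven labels per day


def waiting_period(first_wear, target=TARGET_PER_DAY):
    """Spread the target number of EMAs evenly from first wear time to midnight."""
    midnight = (first_wear + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (midnight - first_wear) / target


def should_send_ema(now, is_worn, last_sample, first_wear, last_ema=None, sent_today=0):
    """Return True only if every rule allows an EMA at time `now`."""
    if now.hour < START_HOUR:                 # rule 1: allowed hours
        return False
    if not is_worn:                           # rule 2: watch must be worn
        return False
    if now - last_sample > MAX_DATA_AGE:      # rule 3: data must be recent
        return False
    if sent_today >= TARGET_PER_DAY:          # rule 4: daily target reached
        return False
    reference = last_ema if last_ema is not None else first_wear
    return now - reference >= waiting_period(first_wear)
```

Under these assumptions, a watch first worn at 10:00 leaves 14 hours until midnight, so the waiting period between queries is 2 hours.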
Contextual data collected from the AWARE framework (at the Edge layer), labels
queried through EMAs (at the Edge layer), and raw physiological (PPG) and movement
(accelerometer, gyroscope, and gravity) signals captured by the wearable platform (at
the Sensor layer) are sent to and stored in the cloud for cleaning, filtering,
preprocessing, and use in the predictive models.
Preprocessing
The PPG signals stored in the cloud layer are collected from wearable devices and hence
are prone to noise. To mitigate this issue, a number of preprocessing and filtering
techniques are applied to the raw PPG signals in order to prepare them for further
analysis. To detect stress as a stimulus in human subjects, a variety of features are
extracted from these signals, such as heart rate, heart rate variability, and breathing
rate, to name a few. The raw contextual data collected from the AWARE framework are
too broad to be informative on their own. A feature extraction module is therefore
designed to transform the raw contextual logging data into informative contextual
life-logging features. These extracted features serve as inputs for our predictive
machine-learning models. Data Imputation and Feature Selection are two
postprocessing techniques employed to improve our models’ performance.
Feature Extraction
In the feature extraction module, we use the HeartPy library [21] to process the clean
PPG signals and extract PPG peaks and PPG-relevant features, including heart rate
variability. HeartPy is a Python heart rate analysis toolkit designed to handle
noisy PPG data. Using this library, the following 12 features are
extracted from the PPG signals: BPM, IBI, SDNN, SDSD, RMSSD, PNN20, PNN50,
HR mad, SD1, SD2, S, and BR. Table 2 outlines the definitions of these features for
reference.
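To make several of these features concrete, the sketch below computes the time-domain ones from a list of inter-beat intervals using their common textbook definitions. It is not HeartPy's code, and HeartPy's own implementation may differ in details (for example, in how successive differences are handled); the function name `hrv_features` and the use of the population standard deviation are assumptions for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def hrv_features(ibi_ms):
    """Compute basic time-domain features from inter-beat intervals (milliseconds)."""
    # Differences between successive inter-beat intervals
    diffs = [b - a for a, b in zip(ibi_ms, ibi_ms[1:])]
    abs_diffs = [abs(d) for d in diffs]
    mean_ibi = mean(ibi_ms)
    return {
        "BPM": 60000.0 / mean_ibi,                    # beats per minute
        "IBI": mean_ibi,                              # mean inter-beat interval
        "SDNN": pstdev(ibi_ms),                       # std of intervals
        "SDSD": pstdev(diffs),                        # std of successive differences
        "RMSSD": sqrt(mean([d * d for d in diffs])),  # root mean square of differences
        "PNN20": sum(d > 20 for d in abs_diffs) / len(abs_diffs),
        "PNN50": sum(d > 50 for d in abs_diffs) / len(abs_diffs),
    }


features = hrv_features([800, 810, 790, 805, 795])
print(features["BPM"])  # 75.0
```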
The raw contextual data captured from the AWARE framework are presented in Table 3.
These raw contextual data are not immediately usable in our predictive models and
thus require further processing. To address this issue, we have implemented a feature
extraction module to translate the raw data into numerical features that can be utilized
by the models.
In Table 3, the “Values” column details the range, type, and units for each raw
feature. The “Cut-offs” column lists the threshold values utilized to convert the raw
Data Labeling
The EMA protocol is designed to trigger a maximum of seven times per day and prompt
participants to indicate their stress level on a five-point Likert scale: (1) not at all, (2) a
little bit, (3) some, (4) a lot, and (5) extremely. The stress level reports, along with the
corresponding timestamps, are recorded in the cloud for subsequent analysis. Each
15-minute timing window of collected physiological and contextual data is then labeled
based on the closest subsequent EMA query. The label distribution is shown in Figure 2.
Fig 2. The distribution of reported stress levels. Number of reported labels per stress level (SL): SL1 = 288, SL2 = 142, SL3 = 120, SL4 = 20, SL5 = 21.
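The window labeling rule described above can be sketched as a lookup of the first EMA that follows each window. This is an illustrative reading, not the authors' code: the function name `label_windows`, the use of the window start time as the reference point, and the choice to leave a window unlabeled when no later EMA exists are assumptions.

```python
from bisect import bisect_left


def label_windows(window_starts, ema_times, ema_levels):
    """Assign each window the stress level of the first EMA at or after its start.

    window_starts and ema_times are timestamps in seconds; ema_times must be sorted.
    Windows with no later EMA receive None.
    """
    labels = []
    for start in window_starts:
        i = bisect_left(ema_times, start)  # index of first EMA with time >= start
        labels.append(ema_levels[i] if i < len(ema_times) else None)
    return labels


# Four 15-minute windows (900 s apart) and two EMAs
print(label_windows([0, 900, 1800, 2700], [1000, 2000], [2, 4]))  # [2, 2, 4, None]
```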
Data Imputation
The contextual features obtained from the AWARE framework are sourced from various
sensors integrated within smartphones. As a result, certain 15-minute timing windows
may exhibit missing data for some features, which can occur due to differences in the
frequency of the sensors or technical limitations of a specific sensor. In order to
optimize the construction of efficient machine learning models, it is recommended to
Stress Detection
We evaluate the efficacy of our proposed stress detection system through binary
classification. In this classification, instances of “no stress” (represented by a stress level
of 1) are assigned a value of 0, while instances of “a little bit,” “some,” “a lot,” and
“extremely” (represented by stress levels greater than or equal to 2) are assigned a value
of 1.
The main reason for classifying the samples into “stress” and “no stress” is to have a
more balanced distribution of labels since some classes such as “extremely” and “a lot”
are rare. As shown in Figure 2, with this categorization, there will be 288 samples with
label 0 (“no stress”) and 303 samples with label 1 (“stress”).
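The binary class counts follow directly from the per-level counts in Figure 2, as the short check below shows. The assignment of the four smaller counts to SL2 through SL5 is read from the figure and does not affect the totals; the helper name `binarize` is illustrative.

```python
def binarize(level):
    """Map a five-point stress level to 0 ("no stress") or 1 ("stress")."""
    return 0 if level == 1 else 1


# Reported label counts per stress level, as read from Figure 2
counts = {1: 288, 2: 142, 3: 120, 4: 20, 5: 21}

binary_counts = {0: 0, 1: 0}
for level, n in counts.items():
    binary_counts[binarize(level)] += n

print(binary_counts)  # {0: 288, 1: 303}
```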
Feature Selection
In this work, feature selection constitutes a pivotal stage, and its inclusion can
significantly enhance the performance of our model. This is largely due to the inherent
constraints associated with the AWARE features. Specifically, some features may
exhibit consistent values over time, thereby rendering them less reliable and less critical
for classification. Additionally, certain features may present significant quantities of
missing data due to the challenges encountered in the collection of contextual data, such
as sensor malfunction. Furthermore, the filtering and cleaning of the PPG signals,
necessary to eliminate motion artifact noise, may result in the loss of critical
information from the signals. As a consequence, the extraction of features from PPG
signals may be rendered less dependable. It is, therefore, imperative to identify and
Performance Metrics
In order to evaluate the performance of our stress monitoring system, we use F1-score
as a quality metric. The F1-score is a measure of a test’s accuracy used in
statistical analyses of binary classification. It is derived from the test’s precision and
recall, where precision is the proportion of “true positive” results to “all positive
results,” including those incorrectly identified as positive, and recall is the proportion of
“true positive” results to “all samples that should have been identified as positive.” In
diagnostic binary classification, recall is also referred to as sensitivity, while precision is
also referred to as positive predictive value. The F1-score is calculated as the harmonic
mean of precision and recall:
$$F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$
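As a small worked example of this formula, the function below computes the F1-score from confusion counts. The function name and the convention of returning 0 when a denominator is zero are choices made for this sketch.

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# precision = 8 / 10 = 0.8, recall = 8 / 16 = 0.5
print(round(f1_score(tp=8, fp=2, fn=8), 4))  # 0.6154
```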
Results
In our research, a cross-validation technique [26] was utilized to evaluate the
performance of our classification models. Cross-validation is a widely employed
procedure for estimating the performance of a machine-learning model on
unseen data. The process involves repeatedly training a model on a subset of the data,
testing it on the remaining data, and averaging the results across repetitions.
We employed a 5-fold cross-validation method. To ensure that
there is no overlap of user data in the train and test splits, the splits were created based
on user IDs.
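One way to build such user-disjoint folds is sketched below. This is a minimal illustration under assumptions, not the authors' splitting code: the function name `user_folds` and the round-robin assignment of users to folds are hypothetical choices.

```python
def user_folds(user_ids, n_splits=5):
    """Yield (train_idx, test_idx) pairs in which no user appears on both sides.

    user_ids holds one user ID per sample, in sample order.
    """
    users = sorted(set(user_ids))
    for k in range(n_splits):
        # Round-robin assignment: every n_splits-th user goes to fold k
        test_users = set(users[k::n_splits])
        train_idx = [i for i, u in enumerate(user_ids) if u not in test_users]
        test_idx = [i for i, u in enumerate(user_ids) if u in test_users]
        yield train_idx, test_idx


# 11 users with 4 samples each
sample_users = ["u%02d" % (i % 11) for i in range(44)]
for train_idx, test_idx in user_folds(sample_users):
    print(len(train_idx), len(test_idx))
```

With 11 users and 5 folds, one fold holds out three users and the other four hold out two users each.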
In order to ensure objectivity and prevent any potential biases, we adopted a fresh
start approach for each iteration of the stress detection model. We disregarded any
prior knowledge or information from previous stress models or the data from the current
test users. The final performance of the model was computed as the
mean of the performances of the individual stress models across folds.
The performance of our stress assessment algorithm
using solely PPG data with 5-fold cross-validation is summarized in Table
4. The table shows that KNN with k=7 exhibited the best performance, with an
F1-score of 56%. However, an F1-score of 56% is not promising for binary classification.
As a subsequent measure, we resolved to augment our model with contextual
information to enhance its performance. The results of the stress assessment algorithm
that incorporates both PPG and contextual data are summarized in Table 5. This
assessment was carried out using a 5-fold cross-validation technique. According to the
table, the Random Forest model with a depth of 5, employing the top five features
chosen by the Gini index algorithm, attained the best performance, with an F1-score
of 70%. This outcome is a 14 percentage point increase over the PPG-only model (56%),
underscoring the noteworthy role of contextual data in stress detection techniques.
Table 5. Validation accuracy of our stress assessment algorithm using both PPG and contextual data
Classifiers | F1 (%) | Selected Contextual Features | Selected PPG Features
Random Forest (d=3) | 70 | weather, wind speed, device off, location | BPM
KNN (k=9) | 62 | weather, wind speed, device off, location, speed | BPM, S, SD2, SDSD, SDNN, BR
XGBoost | 64 | weather, wind speed, device off, location, speed | BPM, S, SD2, SDSD, SDNN, BR, IBI
The most important contextual features, as determined by our analysis, are weather,
wind speed, device off, and location. Regarding the PPG signal, beats per minute
(BPM) was identified as the most relevant feature for stress detection. The
KNN and XGBoost classifiers performed worse than the Random Forest model. For these two
classifiers, the top 11 and 12 features were selected, respectively.
[Figure residue: per-feature values BPM 0.12, weather 0.15]
To observe the impact of each feature on the model’s output based on the feature
values, we employ a beeswarm plot. Figure 4 presents a beeswarm plot that
summarizes the complete distribution of SHAP values for each feature. Utilizing SHAP
values, this plot showcases the effects of each feature on the model output. Features are
sorted by the total sum of SHAP value magnitudes over all samples. The color of the
plot demonstrates the feature value, with red indicating high and blue indicating low.
This analysis shows that a high value for the device off feature (that is, more time
during which the device is not in use) results in a lower predicted stress value. As
expected, a higher BPM
increases the predicted stress value. For the weather feature, our findings suggest that
when the weather condition is in the mid-range, such as mist, clouds, or rain, it
increases the probability of stress, whereas when the weather condition is clear, it
reduces the likelihood of stress. Furthermore, higher wind speed also increases the
predicted stress value.
One notable finding is that a lower value for the location
feature indicates a higher predicted stress level, while higher values indicate lower
predicted stress levels. According to our designated ranges for the location feature, a
lower value corresponds to being on the university premises. On the other
hand, higher values of the location feature correspond to UCI housing or an outdoor
location, which results in decreased predicted stress levels. This observation suggests
that the location of an individual can play a significant role in their stress levels, with
certain locations associated with higher levels of stress.
Personalization
In order to evaluate the impact of personalization on our stress detection algorithm, we
conducted an experiment using data from three subjects with substantial amounts of
data (S111, S912, and S731). To achieve this, we trained our model on data from all
other subjects in the first stage, and then tested it on half of the data from one of the
selected subjects (e.g., S111). In the second stage, we customized the model using the
second half of the S111 data for training (in addition to data from other subjects) and
the first half of the S111 data for testing. We utilized the Random Forest classifier to
demonstrate the performance changes. Table 6 shows the results, indicating that
personalization improves the prediction performance by approximately 10%. All the
extracted PPG and contextual features were used in our models for this experiment.
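The two-stage procedure can be sketched as the construction of two train/test splits for one target subject. This is an interpretation of the description above rather than the authors' code: the function name `personalization_splits`, the tuple layout of the samples, the chronological halving, and the use of the first half as the test set in both stages are assumptions.

```python
def personalization_splits(samples, target):
    """Build the generic and personalized train/test splits for one subject.

    samples: list of (subject_id, features, label) tuples, assumed to be in
    chronological order within each subject.
    """
    others = [s for s in samples if s[0] != target]
    own = [s for s in samples if s[0] == target]
    half = len(own) // 2
    first_half, second_half = own[:half], own[half:]
    # Stage 1: train on all other subjects only
    generic = {"train": others, "test": first_half}
    # Stage 2: also train on the target subject's second half
    personalized = {"train": others + second_half, "test": first_half}
    return generic, personalized
```

Testing both stages on the same held-out half keeps the comparison between the generic and personalized models like for like.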
Discussion
Capturing real-time features and signals while collecting labels from participants in
daily life is challenging. We propose a real-time monitoring system using a three-level
architecture. It includes a sensor layer with a Samsung Gear Sport watch, an edge
layer with mobile apps using the AWARE framework, and a cloud layer to store data
securely in a database.
Our real-time multi-tier system architecture was able to achieve an F1-score of 70%
for the task of stress detection. In addition, we have shown that personalization has a
positive impact on our stress detection models, resulting in approximately a 10%
improvement. This observation suggests that having a personalized model for the
participants could result in improved performance for stress prediction models.
Despite the successful implementation of a multi-tier system, one limitation of our
work is the occasional absence of contextual data for certain timing windows. The
contextual data is captured by the AWARE framework through different phone sensors,
services, and APIs. Therefore, the limitations and potential inaccuracies of these sensors,
services, and APIs, may result in missing features for some of the captured context data.
The second limitation of our research is associated with our label query system.
Specifically, our labeling system functions on a time-event paradigm, wherein it solicits
EMAs from study participants at pre-determined intervals of T hours. As a result, there
exists a possibility that our system may not trigger an EMA request during instances
when an individual is undergoing stress. Conversely, the system may initiate an EMA
during moments when participants are resting or occupied with work, leading to an
unsatisfactory experience and subsequently, an increase in missing EMA submissions. In
future work, we intend to implement a smarter query system to overcome the mentioned
challenges with the purpose of (1) accurately identifying the time frames during which
Conclusion
In this work, we proposed a context-aware daily-life stress monitoring system using
physiological and smartphone data. A smart query module, which uses accelerometer
signals collected from a watch, is implemented in order to capture sufficient and
high-quality labels. To the best of our knowledge, this is the first work presenting a
daily-life stress monitoring system employing both physiology and context data with a
smart query system to capture a sufficient number of EMAs throughout the day.
According to our results, we were able to achieve an F1-score of up to 70% using a
Random Forest classifier.
Acknowledgments
References
1. Li, Russell and Liu, Zhandong. Stress detection using deep neural networks.
BMC Medical Informatics and Decision Making: Springer; 2020.
2. Yekta Said Can, Niaz Chalabianloo, Deniz Ekiz, and Cem Ersoy. Continuous
Stress Detection Using Wearable Sensors in Real Life: Algorithmic Programming
Contest Case Study. Sensors: MDPI; 2019.
3. Ishaque, Syem and Rueda, Alice and Nguyen, Binh and Khan, Naimul and
Krishnan, Sridhar. Physiological signal analysis and classification of stress from
virtual reality video game. 2020 42nd Annual International Conference of the
IEEE Engineering in Medicine & Biology Society (EMBC); 2020.
4. Peter H Charlton, Patrick Celka, Bushra Farukh, Phil Chowienczyk, and Jordi
Alastruey. Assessing mental stress from the photoplethysmogram: a numerical
study. Physiological measurement: IOP Publishing; 2018.
5. Allen J. Photoplethysmography and its application in clinical physiological
measurement. Physiol. Meas.; 2007.
6. Kostopoulos, Panagiotis and Kyritsis, Athanasios I and Deriaz, Michel and
Konstantas, Dimitri. Stress detection using smart phone data. eHealth 360°:
International Summit on eHealth, Budapest, Hungary, June 14-16, 2016, Revised
Selected Papers: Springer; 2017.
7. Seok, Dongyeol and Lee, Sanghyun and Kim, Minjae and Cho, Jaeouk and Kim,
Chul. Motion artifact removal techniques for wearable EEG and PPG sensor
systems. Frontiers in Electronics, volume 2, page 685513: Frontiers Media
SA; 2021.
12. Yu, Han and Sano, Akane. Semi-Supervised Learning and Data Augmentation in
Wearable-based Momentary Stress Detection in the Wild. arXiv preprint; 2022.
13. Sah, Ramesh Kumar and McDonell, Michael and Pendry, Patricia and Parent,
Sara and Ghasemzadeh, Hassan and Cleveland, Michael J. ADARP: A Multi
Modal Dataset for Stress and Alcohol Relapse Quantification in Real Life Setting.
arXiv preprint; 2022.
14. Tazarv, Ali and Labbaf, Sina and Reich, Stephanie M and Dutt, Nikil and
Rahmani, Amir M and Levorato, Marco. Personalized stress monitoring using
wearable sensors in everyday settings. 2021 43rd Annual International Conference
of the IEEE Engineering in Medicine & Biology Society (EMBC), pages
7332-7335: IEEE; 2021.
15. Battalio, Samuel L and Conroy, David E and Dempsey, Walter and Liao, Peng
and Menictas, Marianne and Murphy, Susan and Nahum-Shani, Inbal and Qian,
Tianchen and Kumar, Santosh and Spring, Bonnie. Sense2Stop: a
micro-randomized trial using wearable sensors to optimize a just-in-time-adaptive
stress management intervention for smoking relapse prevention. Contemporary
Clinical Trials, volume 109, page 106534: Elsevier; 2021.
16. Can, Yekta Said and Chalabianloo, Niaz and Ekiz, Deniz and Ersoy, Cem.
Continuous stress detection using wearable sensors in real life: Algorithmic
programming contest case study. Sensors, volume 19, number 8, page 1849:
MDPI; 2019.
17. Wang, Weichen and Mirjafari, Shayan and Harari, Gabriella and Ben-Zeev, Dror
and Brian, Rachel and Choudhury, Tanzeem and Hauser, Marta and Kane, John
and Masaba, Kizito and Nepal, Subigya and others. Social sensing: assessing
social functioning of patients living with schizophrenia using mobile phone
sensing. Proceedings of the 2020 CHI Conference on Human Factors in Computing
Systems, pages 1-15; 2020.
21. Van Gent, Paul and Farah, Haneen and Van Nes, Nicole and Van Arem, Bart.
HeartPy: A novel heart rate algorithm for the analysis of noisy signals.
Transportation Research Part F: Traffic Psychology and Behaviour, volume 66,
pages 368-378: Elsevier; 2019.
22. Batista, Gustavo EAPA and Monard, Maria Carolina and others. A study of
K-nearest neighbour as an imputation method. HIS, volume 87, number 251-260,
page 48; 2002.
23. Peterson, Leif E. K-nearest neighbor. Scholarpedia, volume 4, number 2, page
1883; 2009.
24. Breiman, Leo. Random forests. Machine Learning, volume 45, pages 5-32:
Springer; 2001.
25. Chen, Tianqi and Guestrin, Carlos. XGBoost: A scalable tree boosting system.
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pages 785-794; 2016.
26. Larson, Selmer C. The shrinkage of the coefficient of multiple correlation. Journal
of Educational Psychology, volume 22, number 1, page 45: Warwick & York; 1931.
27. Lundberg, Scott M and Lee, Su-In. A unified approach to interpreting model
predictions. Advances in Neural Information Processing Systems, volume 30; 2017.