November, 2024
By Group 9
Individual Contributions:
Table Of Contents
Introduction
Datasets Overview
Introduction
The rapid evolution of wearable devices, such as the latest iterations of the Apple Watch, has
significantly enhanced health monitoring and fitness tracking capabilities. These devices
generate rich physiological datasets, enabling advanced machine learning and deep learning
techniques to analyze and predict metrics such as heart rate and activity levels. This project,
utilizing data from the Harvard Dataverse, is divided into three phases, each exploring different
aspects of wearable data analysis.
This phased approach showcases the utility of wearable device data and machine learning in
enabling advanced health and fitness tracking solutions.
Datasets Overview
The datasets come from a convenience sample of 46 participants (26 women) who wore two
devices, an Apple Watch Series 2 and a Fitbit Charge HR2. Participants completed a 65-minute
protocol with 40 minutes of total treadmill time and 25 minutes of sitting or lying time. Indirect
calorimetry was used to measure energy expenditure. The outcome variable for the study was
the activity class: lying, sitting, walking self-paced, 3 METs, 5 METs, and 7 METs.
Minute-by-minute heart rate, steps, distance, and calories from the Apple Watch and Fitbit were
included in four different machine learning models. The analysis dataset includes 3656 and
2608 minutes of Apple Watch and Fitbit data, respectively.
● aw_fb_data:
○ Shape: 6264 rows × 20 columns
○ Columns: Includes general demographic data (example: age, gender, height,
weight), physical activity metrics (example: steps, heart rate, calories, distance),
entropy-based metrics (example: entropy_heart), and device-specific information.
Columns are labeled more generically (example: steps, hear_rate, device).
● data_for_weka_aw:
○ Shape: 3656 rows × 18 columns
○ Columns: Similar to aw_fb_data but focuses specifically on Apple Watch data.
Columns are prefixed with Applewatch (example: Applewatch.Steps_LE,
Applewatch.Heart_LE). Includes additional entropy-based metrics and trimmed
activity labels.
● data_for_weka_fb:
○ Shape: 2608 rows × 18 columns
○ Columns: Structured similarly to data_for_weka_aw but focuses on Fitbit data.
Columns are prefixed with Fitbit (example: Fitbit.Steps_LE, Fitbit.Heart_LE).
Here, aw_fb_data serves as a master dataset combining both devices, suitable for
comparisons or general analyses. data_for_weka_aw and data_for_weka_fb focus
on individual device data, which is likely cleaned and formatted for specific analyses or machine
learning tasks.
1. Size:
○ aw_fb_data has the largest number of rows (6264 = 3656 + 2608), consistent with it
combining the two device-specific datasets.
○ data_for_weka_aw and data_for_weka_fb have fewer rows, likely filtered or
device-specific subsets.
2. Column Names:
○ aw_fb_data uses generic column names applicable to any device.
○ data_for_weka_aw and data_for_weka_fb prefix column names with Apple
Watch or Fitbit, indicating device-specific datasets.
3. Focus:
○ aw_fb_data is a combined or general dataset, encompassing data from multiple
devices.
○ data_for_weka_aw and data_for_weka_fb are tailored for individual devices,
focusing on Apple Watch and Fitbit, respectively.
4. Activity Label:
○ In aw_fb_data, the activity label is activity.
○ In the device-specific datasets, the activity label is activity_trimmed.
To further understand the datasets, we computed high-level summary statistics for each of them.
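As a rough illustration of this kind of summary, the sketch below computes the per-device mean and sample variance with pandas. The tiny DataFrame is a made-up stand-in for aw_fb_data, and the device labels are assumptions; only the column names steps, hear_rate, and device follow the dataset description above.

```python
import pandas as pd

# Toy stand-in for aw_fb_data; every value here is invented for illustration.
df = pd.DataFrame({
    "device": ["apple watch"] * 3 + ["fitbit"] * 3,  # label spelling is an assumption
    "steps": [10.0, 150.0, 320.0, 2.0, 11.0, 20.0],
    "hear_rate": [70.0, 95.0, 130.0, 68.0, 80.0, 101.0],
})

# Per-device mean and sample variance (pandas uses ddof=1 for "var").
summary = df.groupby("device")[["steps", "hear_rate"]].agg(["mean", "var"])
print(summary)
```

On the real files, the same call could be run after loading each CSV with `pd.read_csv`.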
In the combined dataset, aw_fb_data, metrics like steps and heart rate exhibit moderate
variability, with a mean step count of 109.56 and a variance of 49,638.91. The heart rate data is
similarly distributed, with a mean of 86.14 and a variance of 820.73. Notably, the "steps times
distance" metric shows extreme variability, with a mean of 590.04 and a variance exceeding 16
million, suggesting significant outliers or skewed data. Entropy metrics for heart rate and steps
are centered around 6 with low variability, indicating consistent patterns across
participants.
The Apple Watch dataset, data_for_weka_aw, shows higher step counts (mean: 180.25)
compared to Fitbit but also greater variability (variance: 72,596.79). Heart rate measurements
have a mean of 91.25 and lower variability, demonstrating stability in readings. Entropy metrics
for steps and heart rate are consistent at approximately 6, mirroring patterns in the combined
dataset. Resting heart rate in the Apple Watch dataset shows less variability (variance: 142.32),
highlighting more consistent readings.
In contrast, the Fitbit dataset, data_for_weka_fb, demonstrates significantly lower mean step
counts (10.47) but higher variability in distance traveled (variance: 4,433.80), reflecting a
broader range of physical activity levels. The Fitbit dataset also exhibits a stronger correlation
between steps and heart rate (mean correlation: 0.727) compared to the Apple Watch dataset
(mean correlation: 0.006), suggesting better synchronization between these metrics. Entropy
metrics in the Fitbit dataset are slightly lower than in the Apple Watch dataset, with higher
variability, which may indicate differences in activity patterns or measurement algorithms.
Overall, the Apple Watch dataset tends to provide more consistent readings with lower
variability, making it suitable for analyses requiring stability. In contrast, the Fitbit dataset
captures a wider range of activities, as evidenced by its higher variance in several metrics like
steps, calories, and distance. The combined dataset, aw_fb_data, integrates data from both
devices but introduces variability due to differences in measurement approaches. Each dataset
offers unique strengths, allowing researchers to select the most appropriate data source based
on the specific objectives of their analysis.
Exploratory Data Analysis (EDA)
Statistical Summary:
Activity Categories:
1. Lying
2. Self Pace walk
3. Running 3 METs
4. Running 5 METs
5. Sitting
6. Running 7 METs
Visualizations:
1. Age Distribution: The data skews slightly towards younger ages, with a peak around
25–30 years.
2. Height and Weight Distribution:
○ Height clusters around 160–180 cm.
○ Weight peaks between 60–70 kg, with a smaller spread than height.
3. Activity Class Distribution:
○ Some activities (example: "Lying" and "Self Pace walk") dominate the dataset.
○ Activities like "Running 7 METs" occur less frequently, indicating a potential class
imbalance.
EDA Summary for data_for_weka_aw.csv
1. Data Overview
2. Missing Values
3. Summary Statistics
The plot above shows the distributions of key numeric variables from the data_for_weka_aw.csv
dataset.
● Observations:
○ Age is fairly evenly distributed, with a peak around 25–30 years.
○ Height and weight have a normal distribution centered around their means (~170
cm and ~70 kg).
○ Apple Watch metrics, like steps and calories, show skewed distributions,
indicating most users fall into lower activity ranges.
○ Heart rate data clusters between 60–100 bpm, with fewer outliers at higher rates.
The aw_fb_data dataset contains 6264 entries and 20 columns. Its structure can be summarized as follows:
● Numeric Columns: Includes metrics such as age, height, weight, steps, heart rate,
calories, distance, and others.
● Categorical Columns: device, activity.
● Target/Relevant Insights: Possible correlations between activity type, health metrics,
and device usage.
Here are the visual insights:
Machine Learning Pipeline Implementation
To ensure a streamlined and reproducible approach for model training, testing, and
hyperparameter tuning, we designed and implemented a custom Python class,
ClassifierPipeLine. This class allowed us to efficiently build, evaluate, and optimize machine
learning models while incorporating necessary preprocessing steps. Below, we outline its key
functionalities and the process followed for our analysis.
The ClassifierPipeLine class provided a structured framework for model training and testing. It
facilitated the seamless integration of the training data, validation through cross-validation, and
evaluation on test data. The class ensured that models were trained using the optimal
hyperparameters and that their performance was tested under consistent conditions, thereby
enabling reliable comparisons between different classifiers.
The ClassifierPipeLine class includes the capability to build machine learning pipelines with
preprocessing steps and classifiers integrated into a unified structure. This ensures that
necessary data transformations, such as standardization or feature engineering, are
consistently applied to both the training and testing phases.
For example, during our analysis, scaling transformations were integrated into the pipeline for
models like K-Nearest Neighbors (KNN) to ensure optimal performance. This modular approach
ensured that preprocessing steps were reusable across different models and configurations.
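The idea of bundling preprocessing with a classifier can be sketched with scikit-learn's Pipeline. This is not the ClassifierPipeLine class itself, whose code is not shown here; the synthetic data, the step names "scale" and "clf", and the choice of five neighbors are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the wearable features and activity labels.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# The scaler is fitted on training folds only and reused unchanged on test data.
knn_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", KNeighborsClassifier(n_neighbors=5)),
])

cv_scores = cross_val_score(knn_pipeline, X_train, y_train, cv=5)
knn_pipeline.fit(X_train, y_train)
test_accuracy = knn_pipeline.score(X_test, y_test)
print(cv_scores.mean(), test_accuracy)
```

Because scaling lives inside the pipeline, cross-validation never sees statistics computed from the held-out fold.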
● Random Forest: Parameters like the number of estimators, maximum depth, and feature
selection strategies were explored.
● KNN: The number of neighbors and distance metrics were optimized.
● Gradient Boosting: Learning rate, number of estimators, and tree depth were fine-tuned.
The best-performing hyperparameters were automatically identified, and the models were
retrained with these settings on the entire training dataset.
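A minimal sketch of such a search, assuming scikit-learn's GridSearchCV is an acceptable stand-in for the tuning logic, is shown below. The grids, the synthetic data, and the 3-fold setting are hypothetical and kept small so the example runs quickly; they are not the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)

# Hypothetical search spaces mirroring the parameters listed above.
search_spaces = {
    "random_forest": (RandomForestClassifier(random_state=0), {
        "clf__n_estimators": [50, 100],
        "clf__max_depth": [None, 5],
        "clf__max_features": ["sqrt", "log2"],
    }),
    "knn": (KNeighborsClassifier(), {
        "clf__n_neighbors": [3, 5, 7],
        "clf__metric": ["euclidean", "manhattan"],
    }),
    "gradient_boosting": (GradientBoostingClassifier(random_state=0), {
        "clf__learning_rate": [0.05, 0.1],
        "clf__n_estimators": [50],
        "clf__max_depth": [2, 3],
    }),
}

best = {}
for name, (clf, grid) in search_spaces.items():
    pipe = Pipeline([("scale", StandardScaler()), ("clf", clf)])
    # refit=True retrains the best configuration on all of the training data.
    search = GridSearchCV(pipe, grid, cv=3, scoring="accuracy", refit=True)
    search.fit(X, y)
    best[name] = (search.best_params_, search.best_score_)
    print(name, search.best_params_, round(search.best_score_, 3))
```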
1. Evaluation Metrics
● Accuracy: Measures the proportion of correctly classified instances out of the total
instances.
● Precision: Measures the proportion of predicted positives that are actually positive.
\[ \text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}} \]
● Recall (Sensitivity or True Positive Rate): Measures the model’s ability to capture all
relevant positive cases.
\[ \text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}} \]
● F1 Score: The harmonic mean of precision and recall. It’s useful when you want a
balance between precision and recall, especially with imbalanced classes.
\[ \text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \]
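For a multi-class problem such as activity recognition, these metrics are usually computed per class in a one-vs-rest fashion. The sketch below does this from a small confusion matrix; the counts and the three-class subset are invented for illustration and are not results from this project.

```python
def per_class_metrics(confusion, labels):
    """Per-class precision, recall, and F1.

    Rows of `confusion` are actual classes, columns are predicted classes.
    """
    n = len(labels)
    metrics = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fp = sum(confusion[r][i] for r in range(n)) - tp  # predicted as i, actually other
        fn = sum(confusion[i]) - tp                       # actually i, predicted as other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


labels = ["Lying", "Sitting", "Running 7 METs"]
confusion = [  # made-up counts
    [40, 10, 0],
    [10, 30, 0],
    [0, 0, 20],
]
accuracy = sum(confusion[i][i] for i in range(len(labels))) / sum(map(sum, confusion))
results = per_class_metrics(confusion, labels)
for label, values in results.items():
    print(label, values)
print("accuracy", accuracy)
```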
2. Confusion Matrix
● A confusion matrix gives a detailed breakdown of correct and incorrect classifications:
○ True Positives (TP) are instances of an activity that were correctly predicted as that
activity.
○ False Positives (FP) are instances predicted as an activity that was not actually
performed.
○ True Negatives (TN) are instances correctly predicted as not belonging to the activity.
○ False Negatives (FN) are instances of an activity that the model failed to predict.
● The confusion matrix identifies which activities are misclassified, giving guidance for
further model development.
● Confusion matrices for the four models are shown below:
Results:
Gradient Boosting (83):
● Running 7 METs: Precision = 0.93, Recall = 0.89, F1 = 0.90
● Sitting: Precision = 0.71, Recall = 0.71, F1 = 0.71
● Surpassed other models in both accuracy and precision across the majority of classes.
● Identifies subtle patterns through a process of iterative boosting.
● More resource-intensive than basic algorithms.
● Some challenges in differentiating low-intensity activities.
● In alignment with Random Forest findings, heart rate features (norm_heart, heart_rate)
are the most prominent in the rankings.
Predicting Heart Rate Using Apple Watch Data with LSTM Model
Introduction
Wearable devices such as the Apple Watch provide a unique opportunity to monitor and predict
physiological parameters like heart rate during various physical activities, which is helpful for
athletes. Accurate heart rate prediction can enable more effective fitness tracking and health
monitoring. In this phase, we use data collected from Apple Watch sensors, whose readings are
considered close to the gold-standard ECG measurement according to this paper
(mc.ncbi.nlm.nih.gov/articles/PMC6444219), to build a Long Short-Term Memory (LSTM) neural
network model that predicts heart rate over an 8-minute interval after training on the remaining
56 minutes.
This part of the project focuses on modeling heart rate fluctuations in response to activity
intensity using time-series data collected from 46 participants. The goal is to demonstrate how
temporal dependencies in sequential data can be effectively captured using LSTM to achieve
accurate predictions.
Preprocessing
Features Utilized:
Preprocessing Steps
1. Trimming Data:
○ Each participant’s data was trimmed to ensure a consistent number of entries (64
data points) across individuals.
2. Feature Scaling:
○ Both the input features and target heart rate values were normalized to the range
[0, 1] using Min-Max Scaling, ensuring uniformity across different units.
3. Sequence Generation:
○ Data was transformed into sequential samples, where 20 consecutive time steps
were used as input to predict the heart rate at the next time step. This sliding
window approach allowed the model to capture temporal patterns in the data.
4. One-Hot Encoding:
○ Categorical variables, in this case, activity types, were one-hot encoded to
provide distinct representations for the model.
5. Train-Test Split:
○ For each participant, 80% of the data was used for training, and 20% was
reserved for testing.
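The steps above can be sketched for a single participant with NumPy. The random values, the two numeric features, and the six activity codes are placeholders. The chronological split applied to the windows is an assumption, since the text does not say whether the split happens before or after windowing, and scaling the full series before splitting simply follows the order of the listed steps.

```python
import numpy as np

WINDOW = 20    # time steps per input sequence (from the text)
N_POINTS = 64  # data points kept per participant after trimming (from the text)


def min_max_scale(values):
    """Scale each column to [0, 1]; constant columns are left at 0."""
    lo, hi = values.min(axis=0), values.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (values - lo) / span


def one_hot(codes, n_classes):
    """Turn integer activity codes into one-hot rows."""
    return np.eye(n_classes)[codes]


def make_sequences(features, target, window=WINDOW):
    """Each sample holds `window` consecutive steps; the label is the next step."""
    X = np.stack([features[i:i + window] for i in range(len(features) - window)])
    y = target[window:]
    return X, y


rng = np.random.default_rng(0)
steps = rng.uniform(0, 200, size=N_POINTS)        # placeholder signal
heart_rate = rng.uniform(60, 160, size=N_POINTS)  # placeholder signal
activity = rng.integers(0, 6, size=N_POINTS)      # placeholder activity codes

numeric = min_max_scale(np.column_stack([steps, heart_rate]))
features = np.hstack([numeric, one_hot(activity, 6)])
target = numeric[:, 1]  # scaled heart rate

X, y = make_sequences(features, target)
split = int(0.8 * len(X))  # chronological 80/20 split (assumption)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
print(X.shape, X_train.shape, X_test.shape)
```

With 64 points and a 20-step window, each participant yields 44 samples. In a stricter setup the scaler would be fitted on the training portion only, to avoid leaking test-set ranges.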
Model Design
Model Architecture
The LSTM model was designed to leverage the temporal nature of the dataset with the following
components:
1. Input Layer:
○ Accepts sequences of 20 time steps, with each step containing scaled feature
values.
2. Bidirectional LSTM Layers:
○ Two stacked layers of Bidirectional LSTM units (128 and 64 units, respectively)
were used to capture temporal dependencies in both forward and backward
directions. L2 regularization was applied to reduce overfitting.
3. Dropout Layers:
○ A dropout probability of 0.3 was applied after each LSTM layer to further reduce
overfitting.
4. Dense Output Layer:
○ The final dense layer produces a single scalar value representing the predicted
heart rate.
5. Loss Function:
○ Mean Squared Error (MSE) was used to minimize the difference between actual
and predicted heart rate values.
6. Optimizer:
○ Adam optimizer was used with a learning rate of 0.00005, ensuring efficient
convergence during training.
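One way the architecture above could be expressed in Keras is sketched below. The layer sizes, dropout rate, loss, optimizer, and learning rate come from the description; the number of input features, the L2 factor, and the choice to regularize the kernel weights are assumptions, because the text does not state them.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

WINDOW = 20       # time steps per sequence (from the text)
N_FEATURES = 8    # hypothetical feature count
L2_FACTOR = 1e-4  # hypothetical regularization strength


def build_model(window=WINDOW, n_features=N_FEATURES):
    model = tf.keras.Sequential([
        layers.Input(shape=(window, n_features)),
        # First Bi-LSTM returns the full sequence so the second can consume it.
        layers.Bidirectional(layers.LSTM(
            128, return_sequences=True,
            kernel_regularizer=regularizers.l2(L2_FACTOR))),
        layers.Dropout(0.3),
        layers.Bidirectional(layers.LSTM(
            64, kernel_regularizer=regularizers.l2(L2_FACTOR))),
        layers.Dropout(0.3),
        layers.Dense(1),  # single scalar: the (scaled) heart rate
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
        loss="mse",
    )
    return model


model = build_model()
model.summary()
```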
Training Process
1. Early Stopping:
○ The training was stopped early if validation loss did not improve for 25
consecutive epochs, preventing overfitting and unnecessary computations.
2. Training Parameters:
○ Batch Size: 32 samples.
○ Epochs: 150 maximum, with early stopping applied.
3. Sequence Input:
○ Input sequences consisted of 20 time steps, allowing the model to predict the
heart rate for the next minute.
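The early stopping rule can be illustrated without any deep learning framework. The function below is a simplified sketch of patience-based stopping with the patience of 25 and the 150-epoch cap from the text; the function name and the simulated validation losses are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=25, max_epochs=150):
    """Return (epoch at which training ends, epoch with the best validation loss).

    Epochs are 1-indexed. Training ends once `patience` consecutive epochs
    have passed without a new best validation loss, or at `max_epochs`.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch


# Simulated run: the loss improves for 40 epochs, then plateaus at a worse value.
simulated = [1.0 / e for e in range(1, 41)] + [0.5] * 110
stop_epoch, best_epoch = early_stopping_epoch(simulated)
print(stop_epoch, best_epoch)
```

In this simulated run the best epoch is 40 and training ends at epoch 65, after 25 epochs without improvement.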
Performance Metrics
● Average RMSE: The model achieved an average Root Mean Squared Error (RMSE) of
2.34 bpm across all participants.
● Alignment: Predictions closely matched actual heart rate values, particularly during
low-to-moderate activity intensity phases.
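Since the model is trained on scaled targets, reporting RMSE in bpm presumably requires mapping predictions back to the original range first. The sketch below shows that calculation under two assumptions: that each participant's RMSE is computed after inverting the Min-Max scaling, and that the reported figure is the mean of the per-participant values. All numbers are invented.

```python
import math


def inverse_min_max(scaled, lo, hi):
    """Map values from [0, 1] back to the original range [lo, hi]."""
    return [lo + s * (hi - lo) for s in scaled]


def rmse(actual, predicted):
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical scaled test-set values for two participants.
participants = {
    "p01": {"hr_min": 60.0, "hr_max": 160.0,
            "actual": [0.10, 0.20, 0.30, 0.40],
            "predicted": [0.12, 0.18, 0.33, 0.40]},
    "p02": {"hr_min": 50.0, "hr_max": 150.0,
            "actual": [0.50, 0.60],
            "predicted": [0.53, 0.60]},
}

per_participant = {}
for pid, data in participants.items():
    actual_bpm = inverse_min_max(data["actual"], data["hr_min"], data["hr_max"])
    predicted_bpm = inverse_min_max(data["predicted"], data["hr_min"], data["hr_max"])
    per_participant[pid] = rmse(actual_bpm, predicted_bpm)

average_rmse = sum(per_participant.values()) / len(per_participant)
print(per_participant, average_rmse)
```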
Visualization of Results
Figures below show the actual vs. predicted heart rate values for selected participants:
Comparison with State of the Art
The study that collected the data was conducted in 2018; since then, the hardware in both the
Fitbit and the Apple Watch has improved.
The Apple Watch Series 10 and the latest Fitbit models seem to cater to different user needs.
The Apple Watch Series 10 boasts a thinner design with the largest display yet, enhanced sleep
apnea notifications, water depth and temperature sensing, and advanced health metrics through
watchOS 11, including a new Vitals app for monitoring key overnight health data. It integrates
seamlessly with the Apple ecosystem, allowing for extensive app usage, notifications, and
contactless payments via Apple Pay.
In contrast, Fitbit emphasizes affordability and battery life, with some models lasting several
days on a single charge compared to the Apple Watch's 18-hour battery life. Fitbit devices excel
in basic fitness tracking features like heart rate monitoring and sleep analysis but offer fewer
smartwatch functionalities. They are compatible with both iOS and Android devices, making
them versatile for a broader audience. Overall, the choice between the two largely depends on
whether users prioritize comprehensive smartwatch capabilities or extended fitness tracking
features at a lower price point.
Similar to our findings above, the latest Apple Watch seems to outperform the Fitbit, which is
no surprise considering the cost difference between them.