Master Project
Abstract
CHAPTER 1. INTRODUCTION
4.4. Age
6.2.1 Testing the Decision Tree classifier model with HRV parameters
REFERENCES
Master Thesis 1
CHAPTER 1. INTRODUCTION
“Stress” is commonly defined as the external pressures on a person's bodily and
mental well-being, whether they be physical or psychological. Stress is a process
of interpreting and adjusting to external events, not merely a stimulus or a
reaction. Humans are frequently affected by perceived and potential stress, which
leaves them open to psychological issues and negative effects on their physical
health. An increased level of stress may cause an individual to become disorganized
and unsure about their objectives and aspirations. This can make it difficult for
them to excel in their lives and to manage their time well in accordance with the
situation. Several studies suggest that mental stress can degrade the performance
of individuals (Kiselica et al., 1994). To develop a deeper understanding of this
critical issue, in-depth research has been conducted in this thesis. Additionally,
with the assistance of machine learning techniques, a predictive model of mental
stress has been generated to understand how stress can hamper academic
performance. The model provides better insight into academic stress among
students and could help improve coordination between students, parents and
teachers.
Chapter 1 introduces the thesis topic, including the overall aim and research
targets. A comprehensive scientific study was applied to the concept, and the
theme was broken down into objectives so that the findings can be evaluated in a
systematic way. Chapter 2 covers the underlying theory, with a brief description
of the technology, the fundamental algorithms and the governing concepts related
to machine learning (ML). Chapter 3 describes the methodology that has been
followed and examines the options that were considered for conducting the
experiment and study. Chapter 4 gives detailed insights into the dataset and
provides an overview of the data analysis approach. Chapter 5 discusses the
advantages of using the IBM Watson Studio platform and the steps involved in
implementing and configuring an AI machine learning experiment with it. The
experiment results are documented in Chapter 6 as dedicated tables, plot
diagrams and bar graphs. The thesis is concluded in Chapters 7 and 8 with an
investigative overview of the research, drawing conclusions, justifying the
thesis topic and giving recommendations for further research.
Research has consistently shown that students with very high levels of stress
have poor grade point averages (Deng et al., 2022). Depending on the levels and
factors involved, stress may or may not impede academic performance (Deng et al.,
2022). Students may experience excessive stress and feelings of academic burnout
if they are unable to manage and finish their workload in the designated time.
Additionally, students who are under a lot of stress tend to put off tasks such
as finishing projects and meeting deadlines (Lin et al., 2020). Naturally, this
affects their ability to study and the calibre of their work. This study takes a
general classification approach, classifying how students’ academic performance
is affected by stress.
The motivation behind this research work is to assist students and their
teachers/parents to develop a better understanding of determining the impact of
mental stress in an academic environment. Teachers/parents would be able to
provide extra support to their students that would give these students a fair
chance to achieve success.
By analysing higher education data gathered through a survey, this project aims
to assist students in achieving better academic performance while categorizing
their mental stress. An ML-based framework relating students’ mental stress to
their academic performance could help university academic staff as well as
students determine whether a particular student is reaching his/her maximum
academic potential. The primary objective is to establish a correlation between
the students’ perceived stress and their academic performance. The secondary
aim of this quantitative study is to predict students’ mental stress while taking
into account their Heart Rate Variability (HRV).
The main agenda and the focus of the study is broken down further into the
following objectives:
1. Determining the legitimacy of using ML algorithms for predicting student
performance in academics.
2. Determining if there is a possible correlation between students’ academic
performance and their mental stress.
3. Taking advantage of IBM Watson ML platform for implementing the
particular ML model that delivers the best accuracy, precision, recall and
F1 score when fed with data collected from university students.
4. Implementing an ML model to predict mental stress using Heart Rate
Variability.
5. Assessing how accurately HRV can be measured from a wrist-worn device such
as a smart watch.
potent tool to tackle various real-world issues, including spam filtering, image
recognition, speech analysis, and medical diagnosis, provided there is an ample
supply of labeled data (IBM, n.d.).
Decision trees are a type of machine learning algorithm that uses a tree-like
structure to make decisions as depicted in figure 2. The algorithm starts with a
root node, and branches off into internal nodes or decision nodes, which evaluate
available features to form homogenous subsets represented by leaf nodes. For
example, when deciding whether to go surfing, one might use a decision tree with
rules like "Is the temperature warm enough?" and "Is the wind too strong?". The
following decision rules can be followed to make the possible choice (IBM, 2023):
that are easier to classify. This iterative process allows subsequent trees to
classify observations that were previously misclassified. The final ensemble
model's predictions are determined by the weighted sum of the predictions made
by the preceding tree models (IBM, 2023).
Figure 4 shows the scheme adopted by typical decision tree algorithms, where
the trees grow level-wise.
In the Light Gradient Boosting Machine (LGBM), developed by Microsoft, the trees
instead grow leaf-wise, as shown in figure 5. At each step, LGBM splits the leaf
with the maximum delta loss.
Fig. 5 Leaf wise tree growth in which only the node having the highest delta loss is split.
Successive splitting occurs only on one side of the tree, thus, resulting in an
asymmetrical tree (Microsoft Corporation, 2023).
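The difference between the two growth strategies can be illustrated with a minimal Python sketch. It does not use LightGBM itself: the toy tree, its node names and its delta-loss values are hypothetical and serve only to show the order in which nodes would be split under each strategy.

```python
from collections import deque

# Hypothetical toy tree: node -> (delta loss gained by splitting it, children created).
# Nodes that do not appear as keys cannot be split any further.
TOY_TREE = {
    "root": (0.50, ["A", "B"]),
    "A": (0.30, ["A1", "A2"]),
    "B": (0.10, ["B1", "B2"]),
    "A1": (0.20, ["A1a", "A1b"]),
}


def leaf_wise_order(tree, max_splits):
    """Always split the current leaf with the highest delta loss."""
    leaves, order = ["root"], []
    for _ in range(max_splits):
        candidates = [leaf for leaf in leaves if leaf in tree]
        if not candidates:
            break
        best = max(candidates, key=lambda leaf: tree[leaf][0])
        leaves.remove(best)
        leaves.extend(tree[best][1])
        order.append(best)
    return order


def level_wise_order(tree, max_splits):
    """Split nodes level by level (breadth-first), ignoring the delta loss."""
    queue, order = deque(["root"]), []
    while queue and len(order) < max_splits:
        node = queue.popleft()
        if node in tree:
            order.append(node)
            queue.extend(tree[node][1])
    return order


print("leaf-wise :", leaf_wise_order(TOY_TREE, 3))
print("level-wise:", level_wise_order(TOY_TREE, 3))
```

With three splits, the leaf-wise strategy keeps descending on the side of the tree that offers the larger loss reduction, which is what produces the asymmetrical shape described above, whereas the level-wise strategy completes one level before moving to the next.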
Numerous machine learning (ML) projects have been developed for forecasting
student performance. Some of the research works related to the theme of this
thesis are described in the following sections.
Ahmed et al. (2021) created two machine learning models, an ANN model and a
Random Forest model, and performed a comparative analysis. TensorFlow was
integrated at the backend for both models. The models were developed to predict
academic success using students' previous academic evaluations and geographical
data. The authors found that the ANN can outperform the Random Forest model
when a sizeable amount of data is used. The increased interest among researchers
in applying data mining techniques to evaluate student data served as the
motivation for this study. Future work could include using recurrent neural
networks (RNN) to identify students who are about to drop out of college.
In their study (Rajendran et al., 2022), the authors utilized machine learning
techniques to develop models for predicting the academic performance of high
school students. The models took into account various socio-demographic
factors such as age, gender, obesity, average household income, family size, and
marital status of parents, as well as school-related variables like type of gender
education and academic level, and student-related variables such as stress and
lifestyle. The output variable considered in the models was the students' GPA.
The results showed that the gradient boosting method outperformed other
techniques, followed by random forest, in terms of generating better predictions.
The analysis of the models led to the conclusion that maintaining a health-
conscious lifestyle has a positive correlation with academic performance, while
the presence of stress has a negative impact. Nevertheless, the impact of gender
was not identified as a significant predictor of a student's academic performance.
This paper (Xu et al., 2017) addresses several new challenges in the field. The
authors propose an innovative approach for predicting the future performance of
students in degree programs based on their current and past performance. To
construct base predictors, they develop a course clustering method using a latent
factor model. They also introduce an ensemble-based progressive prediction
architecture to incorporate the evolving performance of students into the
prediction process. These data-driven methods can complement other
pedagogical approaches and provide valuable information for academic advisors.
The second part of the thesis focuses on a technique used to measure the mental
stress of the students, which is based on a phenomenon called Heart Rate
Variability (HRV). HRV analysis is increasingly used as a non-invasive tool for
assessing the Autonomic Nervous System (ANS) in the human body. Its analysis
and contextual application have gained importance due to its sensitivity to both
physiological and psychological environmental factors. Altered HRV measurements
are extensively used for monitoring the arrhythmic dysregulation of the ANS.
Additionally, HRV measurements are employed to monitor and assess sleep
patterns, stress levels, drowsiness, and the effects of prolonged strenuous
exercise training (Colom et al., 2010).
There are various approaches to reducing chronic stress and improving HRV.
One potential method involves modulating the autonomic nervous system. During
periods of stress, sympathetic nervous system responses become more
dominant, triggering the "fight or flight" response. This heightened sympathetic
activity leads to an increase in heart rate and a decrease in HRV. Conversely, the
parasympathetic nervous system, responsible for the "rest and digest" state,
becomes less active and restricted during stressful times (Welltory, 2023).
High HRV: “If the interval length variates and you are in a more relaxed
state then your HRV is high. This is mostly associated with good recovery.”
CHAPTER 3. METHODOLOGY
This chapter describes the methodology followed in the thesis.
The first phase consisted of a literature study. The study looked at previous
related work in the form of research articles, surveys, journals, and e-books. This
was done to familiarize the reader with the current state-of-the-art ML
techniques and to identify a research gap that justifies the present research.
Google Scholar was used to find these resources.
As per the Perceived Stress Scale (PSS) criteria, following are the thresholds
defined (NH Dept. of Administrative Services, n.d.):
The survey form consisted of five direct questions i.e., asking the students to
specify their age, gender, self-study hours, number of times they skipped their
school day and grades in the last three graded activities. The stress and cognitive
performance levels of the students were gauged through the two psychological
assessment scales (Perceived Stress Scale and Cognitive assessment scale, as
attached in the appendix).
The third phase was an experimental phase. In this phase, a detailed analysis was
performed with the help of a Bluetooth-enabled Apple smart watch. A total of four
students were considered for this experimental stage. Informed consent was
obtained from each of the four students, who were between 20 and 23 years old.
The experimental phase was conducted during the students' mid-semester exams.
The perceived mental stress and Heart Rate Variability (HRV) were measured with
the help of the Perceived Stress Scale (PSS) and the Apple smart watch,
respectively. The students were asked to wear the Apple watch during their exams.
The model was trained using various parameters derived from the Heart Rate
Variability. These HRV parameters were extracted from an original dataset, known
as the SWELL dataset. It consists of HRV indices computed from the multimodal
SWELL knowledge work dataset for research on stress and user modelling
(Hazer-Rau et al., 2020).
The original dataset consists of data that was captured using the following means
(Hazer-Rau et al., 2020):
As the third phase of this research work revolves around predicting the students'
mental stress from HRV gathered through a smartwatch, the model was developed
using only the time-domain HRV features, which are as follows (Shaffer &
Ginsberg, 2017):
Table 1 shows the expressions used to calculate the HRV parameters in the time
domain. Once the HRV data of the four students was extracted from the Apple
smart watch, these expressions were used to calculate the HRV parameters, which
were then fed to the ML model trained on the SWELL dataset in order to evaluate
the performance of the model.
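As an illustration of this step, the following Python sketch computes time-domain HRV parameters from a list of RR intervals in milliseconds. The parameter names follow those reported in the results tables (SDRR, MEAN_RR, MEDIAN_RR, RMSSD, SDRR_RMSSD, HR, pNN25) and the formulas are the common textbook definitions. Whether they match the expressions of Table 1 exactly is an assumption, in particular the use of the sample standard deviation and the computation of HR as the mean of the instantaneous rates. The function name and the example RR series are hypothetical.

```python
import math
import statistics


def hrv_time_domain(rr_ms):
    """Compute time-domain HRV parameters from RR intervals given in milliseconds."""
    if len(rr_ms) < 2:
        raise ValueError("at least two RR intervals are required")
    # Differences between successive RR intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    sdrr = statistics.stdev(rr_ms)  # assumption: sample standard deviation
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {
        "MEAN_RR": statistics.mean(rr_ms),
        "MEDIAN_RR": statistics.median(rr_ms),
        "SDRR": sdrr,
        "RMSSD": rmssd,
        "SDRR_RMSSD": sdrr / rmssd if rmssd else float("nan"),
        # assumption: HR (beats per minute) as the mean of the instantaneous rates
        "HR": statistics.mean(60000.0 / rr for rr in rr_ms),
        # percentage of successive differences larger than 25 ms
        "pNN25": 100.0 * sum(abs(d) > 25 for d in diffs) / len(diffs),
    }


# Hypothetical RR series, for illustration only
params = hrv_time_domain([800, 810, 790, 800])
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```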
Different wearable watches and gadgets have been studied and evaluated for HR
estimation. At an effective granularity of one minute, the Apple Watch gives the
best estimation performance (Hernando et al., 2018). This app stores the raw RR
values, with a precision of centiseconds, in the user’s Personal Health Record,
from which they can be exported in XML format using Apple’s Health App
(Hernando et al., 2018).
In this thesis, the typical metrics used to evaluate the performance of a
classification ML model are as follows:
Fig. 8. Confusion matrix consisting of True Positives (TP), False Positives (FP), False
Negatives (FN), True Negatives (TN) (Suresh, 2021)
3.2.3. Accuracy
3.2.4. Precision
3.2.5. Recall
Recall is defined as the ratio of correctly classified positive instances (TP)
to the total number of instances that actually belong to the positive class
(Singh et al., 2021).
3.2.6. F1 score
The F1 score is also known as the F-measure. It expresses the balance between
precision and recall (Singh et al., 2021).
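To make the four metrics concrete, the following Python sketch derives them from the entries of a confusion matrix using the standard definitions. The function name is hypothetical. The example counts (TP = 5, FP = 3, FN = 5, TN = 16) are the ones reported for Table 4 in Chapter 6.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive accuracy, precision, recall and F1 score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = classification_metrics(tp=5, fp=3, fn=5, tn=16)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For these counts the accuracy is 21/29, i.e. approximately 72.4%, which agrees with the accuracy stated for Table 4.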
Informed consent was obtained from all the participants while conducting this
research work. None of the participants was a minor. The survey as well as the
details of the four students participating in this research work are kept
anonymous for privacy reasons. The purpose of the research was restated to all
of the participants. Moreover, special permission to conduct the survey was
obtained from the departmental director of the university.
This research will assist students and their teachers/parents to develop a better
understanding of determining the impact of mental stress in an academic
environment. Teachers/parents would be able to provide extra support to their
students that would give these students, irrespective of their gender, a fair chance
to achieve success.
4.1. PSS-Score
Figure 9 depicts the perceived stress of the students. This feature is based on
the score obtained from a Perceived Stress Scale (PSS) questionnaire.
Fig. 9 PSS score calculated for all the 298 students. Among these, 70 students
had no stress, 171 students had medium stress level and 57 students had high
stress level
Cognitive ability refers to the capacity of the human brain to process, store and
retrieve information. It also refers to innate functions of the brain which include
attention, memory and reasoning ability. According to Sternberg and Sternberg
(2009), it is the essential psychological component for people to successfully
complete an activity.
Figure 10 shows the cognitive performance score of the students. This score (0-100)
has been recoded onto a scale of 0 to 10, with 0 being the lowest (worst) score
and 10 being the highest (best). The vertical axis shows the cognitive performance
score of the students, and the horizontal axis shows the number of students.
4.3. Gender
Both male and female students were considered in the survey. Male participants
were assigned a “0” coded value, whereas, the female participants were
represented by “1” as shown in figure 11.
4.4. Age
This feature represents the age of the students in whole years. The age of the
students was not codified.
As shown in figure 12, most students fall in the age group of 18 to 21 years. A
small number of students did not specify their age, depicted as missing in the
figure. The missing fields were left blank in the dataset on which the model was
trained.
This feature depicts the total number of hours spent by any student on his/her
self-study at home or on campus.
As shown in figure 13, approximately 120 students spent between 2 and 4 hours a
day studying by themselves, doing homework and other academic tasks. About 45
students preferred not to answer this question, depicted as missing in the figure.
The missing fields were left blank in the dataset on which the model was trained.
This feature represents the number of days any student was absent from the
class (i.e., absent for the whole working day) during the past three months in the
on-going semester.
As depicted in figure 14, the most frequent group, about 110 students, skipped
0 to 3 lecture days. About 38 students preferred not to answer this question,
shown as missing.
The performance of the students has been analysed through this feature.
Students were asked to state their scores, as percentages, in their last three
graded class activities. The average of these three scores was then classified
into three coded values, 0, 1 and 2, corresponding to percentage scores of
0-50%, 51-70% and 71-100%, respectively, with 0 representing the lowest and 2
the best academic performance.
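The coding rule described above can be sketched in a few lines of Python. The function name is hypothetical, and the handling of fractional averages that fall between two bands (for example 50.5%) is an assumption, since the text only defines the bands for whole percentages.

```python
def code_average_grade(scores):
    """Map the average of the graded activities (in percent) to the coded value 0, 1 or 2."""
    average = sum(scores) / len(scores)
    if average <= 50:
        return 0  # 0-50 %: lowest academic performance
    if average <= 70:
        return 1  # 51-70 %: assumption: averages between 50 and 51 are also coded as 1
    return 2      # 71-100 %: best academic performance


# Hypothetical scores of three students, for illustration only
for scores in ([40, 50, 60], [60, 70, 80], [80, 90, 100]):
    print(scores, "->", code_average_grade(scores))
```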
As shown in figure 15, 170 students had a normal performance in the last three
graded activities, with scores falling between 51-70%, and were thus classified
as 1. Around 66 students had a poor performance, with scores falling between
0-50%, classified as 0. Approximately 62 students had a good score, with an
average grade between 71-100%, classified as 2.
CHAPTER 6. RESULTS
The data compiled from the questionnaire-based survey and from the Apple smart
watch were analysed separately using IBM Watson ML experiments. The study was
divided into two phases, as described in the following sections.
Figure 16 shows the detailed progress map of the ML model building using IBM
Watson Studio. The experiment consists of various steps, as shown in the figure.
In the first step, the dataset is read by the machine. Then the dataset is split
into training and testing data. Afterwards, the data is cleaned in the
pre-processing stage. In the last step, the model that is best suited to the
dataset in terms of accuracy is selected. Users have the option to set the number
of models that they want to observe in the output. These models are fine-tuned by
applying feature engineering and hyperparameter optimization.
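The selection step can be illustrated with a minimal, library-free Python sketch of choosing a model by cross-validation accuracy. It is not the IBM Watson implementation: the two classifiers are simple stand-ins for the candidate estimators, the fold layout is an assumption, and all names and data are hypothetical.

```python
from collections import Counter


class MajorityClassifier:
    """Stand-in model that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestNeighbourClassifier:
    """Stand-in model that predicts the label of the closest training sample."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for row in X:
            dists = [sum((a - b) ** 2 for a, b in zip(row, ref)) for ref in self.X_]
            predictions.append(self.y_[dists.index(min(dists))])
        return predictions


def k_fold_indices(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


def cross_val_accuracy(make_model, X, y, k=3):
    """Mean accuracy of a freshly trained model over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        scores.append(sum(p == y[i] for p, i in zip(preds, test)) / len(test))
    return sum(scores) / len(scores)


def select_best(candidates, X, y, k=3):
    """Return the name of the candidate with the highest cross-validation accuracy."""
    results = {name: cross_val_accuracy(make, X, y, k) for name, make in candidates.items()}
    return max(results, key=results.get), results


# Hypothetical one-feature dataset, for illustration only
X = [[1.0], [9.0], [2.0], [8.0], [1.5], [8.5]]
y = [0, 1, 0, 1, 0, 1]
best, results = select_best(
    {"majority": MajorityClassifier, "nearest_neighbour": NearestNeighbourClassifier}, X, y
)
print(best, results)
```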
The two models that gave the best performance measures were chosen, namely the
LGBM classifier and the Snap Boosting Machine classifier, with cross-validation
accuracies of 71.4% and 71.0%, respectively. Feature engineering and
hyperparameter optimization were applied to further improve the accuracy of the
two models, thus generating different pipelines as shown in figure 16.
Table 2 shows the ranking of the dataset features by importance. As recommended
by the IBM platform, any feature with an importance above 65% should be
considered, while features with an importance below this threshold may be
neglected for optimum results.
As depicted in Table 2, the four most important features for the LGBM classifier
are the total number of days a student skipped class lectures, the perceived
stress scale score, the cognitive performance and the age of the students. This
shows that the stress level of the students, with a feature importance of 99%,
plays an important role in predicting the average grades of the students.
Table 3 shows various model measures of the LGBM classifier. The accuracy of the
LGBM classifier is 71.4%. It is affected by the size of the dataset and the missing
values in it, and could be improved further by increasing the dataset size.
Table 3. LGBM classifier model measures. The cross-validation accuracy score of
the LGBM classifier is 71.4%.
The macro F1 score is calculated by taking the arithmetic mean of all the
per-class F1 scores. This method treats all classes equally regardless of their
support values (Leung, 2022). Since the dataset is imbalanced, the macro F1
score of the model should be given importance; in our case it is 0.713,
signifying above-average performance of the model.
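The macro averaging described above can be sketched in Python as follows. The function names and the example labels are hypothetical; the labels are illustrative and are not taken from the thesis dataset.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 score of one class, treating that class as positive and all others as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, label) for label in labels) / len(labels)


# Hypothetical stress levels (0 = no stress, 1 = medium, 2 = high)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(f"macro F1 = {macro_f1(y_true, y_pred):.3f}")
```

Because every class contributes equally to the mean, a poorly predicted minority class lowers the macro F1 score even when the overall accuracy is dominated by the majority class.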
The accuracy of the model is derived from the confusion matrix. Tables 4, 5 and
6 show the confusion matrices for each stress level compared with the other two
classes.
Table 4. Confusion matrix predicting “0” stress level (no stress) against the other
two levels i.e., “1” (medium stress) and “2” (high stress)
From table 4, it can be observed that the overall accuracy of predicting the “0”
stress level is 72.4%. The True Positives (i.e., individuals who had no stress and
were correctly identified by the ML model) are equal to 5, and the False
Negatives (i.e., individuals who had no stress and were incorrectly identified as
having stress by the ML model) are also equal to 5. Similarly, the True Negatives
(i.e., individuals who had stress and were correctly identified by the ML model)
are equal to 16, and the False Positives (i.e., individuals who had stress and
were incorrectly identified as not having stress by the ML model) are equal to 3.
Table 5. Confusion matrix predicting “1” stress level (medium) against the
other two levels i.e., “0” (no stress) and “2” (high stress)
From table 5, it can be observed that the overall accuracy of predicting “1” stress
level is 65.5%, compared with the “0” (no stress) and “2” (high stress) output
classes.
Table 6. Confusion matrix predicting “2” stress level (high) against the other
two levels i.e., “0” (no stress) and “1” (medium stress)
Table 6 depicts that the overall accuracy of predicting “2” (high stress level) is
79.3%, compared with the “0” (no stress) and “1” (medium stress) output classes.
The ranking of the features by importance when using the Snap Boosting
classifier is shown in Table 7.
The most important feature for the Snap Boosting classifier is the cognitive
performance of the students. The importance of the perceived stress in this case
is only 16%, which implies that this model is not suitable for predicting the
average grades of the students based on their stress level.
From table 8, it can be deduced that the Snap Boosting model has more or less
the same performance measures as the LGBM classifier model. However, this model
is not recommended for predicting the grades of the students based on their
stress levels, as it shows a high correlation for cognitive performance only.
Using the modified version of the SWELL dataset, a model was built and
deployed on IBM Watson Studio. The top performing model in this case was the
Decision Tree classifier.
Table 10 depicts the confusion matrix of the Decision Tree classifier. It can be
seen that the accuracy of the classifier is 98.5%, which indicates that the model
is well suited for predicting the stress level of individuals using HRV
parameters.
Table 12. Feature ranking of Decision Tree classifier model. The features of the
dataset are ranked according to their corresponding correlation in predicting the
stress
As shown in table 12, the most important feature, i.e., the one with the highest
correlation in predicting the stress, is the median of the RR intervals between
heartbeats.
6.2.1 Testing the Decision Tree classifier model with HRV parameters
After developing this model, the next step was to gather the data related to HRV
of four students, in order to confirm the viability of using this model for HRV
measured through the Apple smart watch.
In parallel to measuring the stress levels of the students with the Apple smart
watch, the stress score of the students was also measured with the Perceived
Stress Scale.
Table 13. PSS score, cognitive assessment (CA) score, age and grade level (i.e.,
performance in the exams) of the four students
           Age  Grade level  CA score  PSS score, Sample 1   PSS score, Sample 2   PSS score, Sample 3
Student 1  22   1            29        14 (moderate stress)  09 (no stress)        13 (moderate stress)
Student 2  23   1            47        16 (moderate stress)  23 (high stress)      20 (moderate stress)
Student 3  20   1            39        19 (moderate stress)  16 (moderate stress)  20 (moderate stress)
Student 4  21   2            58        26 (high stress)      29 (high stress)      30 (high stress)
For each of the four students, three PSS samples were taken during the
examination period, one every 3 days. From table 13, it can be observed that
student 4 (an outlier) was constantly under high stress during the examination
days but, irrespective of the high stress level, achieved above-average grades
in the three exams. This may be due to the high cognitive performance score.
The HRV measurements were taken in two phases. In the first phase, the
measurements were taken in order to determine the baseline threshold level of
HRV parameters for the four students, whereas, the second phase was
implemented during the mid-term examination days.
Table 14. HRV parameters measured during relaxing stage of the four students
The HRV, during the relax stage, of the four students was recorded with the help
of the Apple smart watch and the corresponding parameters were calculated as
shown in table 14.
Table 15 illustrates the HRV measurements calculated during the stress stage
of the four students.
Considering the results in tables 14 and 15, it can be observed that the median
value of the RR interval increases during the stress stage of the students,
whereas the heart rate (HR) decreases.
After calculating the HRV parameters during both the relax and stress stages,
these parameters were fed into the previously deployed model on IBM cloud and
the model made predictions about the stress level of the students. As depicted in
table 16, all the predictions were made correctly.
Table 16. Predictions made by the Decision Tree Classifier model, deployed on
the IBM Cloud
Similarly, the HRV recording was taken into account before and during the mid-
term examination of the four students as shown in tables 17 and 18, respectively.
Table 17. HRV measurements during the examination days of the students (measurements
taken before the start of the exam)
Table 18. HRV measurements during the examination days of the students
(measurements taken while the students were taking their exam)
Measurements during exam
HRV Parameters   Student 1   Student 2   Student 3   Student 4
SDRR             385.751     124.941     141.86      32.84
MEAN_RR          1017.31     696.782     1028.88     792.47
MEDIAN_RR        970.72      710.4       1020.40     796.62
RMSSD            15.868      15.723      15.2304     7.556
SDRR_RMSSD       24.30       7.946       9.31422     4.347
HR               68.096      89.159      59.452      75.84
pNN25            10.466      8.06        10.4        0.4
After calculating the HRV parameters for both the before-exam and during-exam
stages, these parameters were fed into the model previously deployed on IBM Cloud,
and the model made predictions about the presence of mental stress in the students.
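As a rough illustration of how one set of HRV parameters could be packaged for a deployed model, the Python sketch below builds a scoring request body for student 4 from the values in Table 18. The `input_data`/`fields`/`values` layout is assumed here as the request shape of an online scoring endpoint, and the field order expected by the deployed model is likewise an assumption. The function name is hypothetical, and no request is actually sent.

```python
import json

# Field names as listed in Table 18; the order expected by the deployed model is an assumption
FIELDS = ["SDRR", "MEAN_RR", "MEDIAN_RR", "RMSSD", "SDRR_RMSSD", "HR", "pNN25"]


def build_scoring_payload(samples):
    """Build a request body from a list of dicts mapping parameter name to value."""
    return {
        "input_data": [
            {
                "fields": FIELDS,
                "values": [[sample[name] for name in FIELDS] for sample in samples],
            }
        ]
    }


# HRV parameters of student 4 measured during the exam (Table 18)
student_4 = {
    "SDRR": 32.84,
    "MEAN_RR": 792.47,
    "MEDIAN_RR": 796.62,
    "RMSSD": 7.556,
    "SDRR_RMSSD": 4.347,
    "HR": 75.84,
    "pNN25": 0.4,
}

payload = build_scoring_payload([student_4])
print(json.dumps(payload, indent=2))
```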
Table 13 showed that, according to the PSS, student 4 was constantly under high
stress during the examination days. However, the prediction in table 19 shows
that the DT classifier model predicted “no stress” for student 4, even though the
student was taking the exam.
CHAPTER 7. DISCUSSION
The first objective of this research work was to determine the feasibility of using
ML algorithms by establishing a correlation between students’ academic
performance and their mental stress level. In this respect the research appears
to be effective, as the two ML classifier models that were developed achieve an
accuracy of approximately 71%. The low accuracy is due to the small size of the
dataset and could be enhanced significantly by using a larger dataset.
Furthermore, the feature ranking of the two ML classifiers shows that, for the
LGBM classifier, the Perceived Stress Scale (PSS) score plays a significant role
in predicting the academic performance of the students. The higher the PSS
score, the lower the academic performance of the students.
The second research question was to explore the benefit of using the IBM Watson
ML platform. Implementing this research on the cloud-based IBM Watson ML
platform proved to be a viable and powerful solution, as one can use the automated
experiment approach of IBM ML without the need for extensive coding. In this
regard, the steps to build and implement the ML models using the IBM AutoAI
experiment have also been described in this thesis. The top performing ML model
was chosen on the basis of accuracy in addition to other measures such as F1
score, precision and recall.
The original SWELL dataset was altered by taking into account only the HRV time
domain parameters as mentioned in the previous section. The top performing
model in our case came out to be Decision Tree Classifier with a Cross Validation
accuracy of 98.1%.
After developing this model, the next step was to gather the HRV data of four
students in order to confirm the viability of using this model for HRV measured
through the Apple smart watch. As far as the question of how accurately the smart
watch measures HRV is concerned, the results of the analysis described in the
previous chapter confirm the suitability of using the Apple smart watch.
Furthermore, it was deduced that whenever the mean value of the RR interval
increases, the heart rate decreases and consequently the stress level increases.
Additionally, it was observed that the physiological stress measurement can
deviate from the subjective perception of stress. Another study revealed that
there might be a genetic moderation in the association between resting-state HRV
and perceived stress (Looser et al., 2023).
CHAPTER 8. CONCLUSIONS
This research work utilized a quantitative approach to analyse the impact of
stress on heart rate variability (HRV) using machine learning techniques. The
findings revealed a positive association between higher stress levels and
increased HRV, and stress in turn degraded the academic performance of the
students, as shown in the first phase of this study. In this context, the Snap
Boosting model had performance measures broadly similar to those of the
LGBM classifier model. However, this model is not recommended for predicting
the grades of the students based on their stress levels, as it shows a high
correlation only for cognitive performance.
The stress level was determined by extracting features from HRV analysis, and
a classification technique was employed using threshold values derived from the
training dataset. Accuracy-based performance measures were used to assess the
outcomes. Consequently, this study suggests that stress influences HRV, which
establishes the potential of HRV as an objective tool for assessing stress in
academic settings.
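The idea of deriving a threshold from training data can be illustrated with a single-feature decision stump. This is a simplified sketch, not the Decision Tree classifier produced by the AutoAI experiment. The function names, feature values and labels are hypothetical.

```python
def predict(value, threshold, direction):
    """Predict 1 or 0 for one feature value.

    direction = 1 predicts 1 when value > threshold;
    direction = -1 predicts 1 when value <= threshold.
    """
    above = value > threshold
    return int(above) if direction == 1 else int(not above)


def fit_threshold(values, labels):
    """Return (threshold, direction, training accuracy) of the best stump."""
    if not values or len(values) != len(labels):
        raise ValueError("values and labels must be non-empty and equal length")
    xs = sorted(set(values))
    # Candidate thresholds: midpoints between consecutive distinct values
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]
    best = None
    for thr in candidates:
        for direction in (1, -1):
            preds = [predict(v, thr, direction) for v in values]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if best is None or acc > best[2]:
                best = (thr, direction, acc)
    return best


# Hypothetical training data: one HRV feature and a binary stress label
values = [20, 25, 30, 45, 50, 60]
labels = [1, 1, 1, 0, 0, 0]
thr, direction, acc = fit_threshold(values, labels)
print(thr, direction, acc)
print(predict(28, thr, direction), predict(55, thr, direction))
```

A full decision tree repeats this search recursively over several features. The learned direction depends entirely on the training labels, so the sketch does not assume whether higher or lower feature values indicate stress.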
The following recommendations are made regarding the future scope of this
study:
• The accuracy of the model developed to predict the academic
performance of the students could be further enhanced by increasing the
size of the dataset, i.e., by conducting the survey on a larger scale.
• The dataset, gathered through a survey of student responses, appears to
be of limited reliability. Steps can be taken to reduce the human error
caused by negligence of the students taking part in the survey.
• The correlation between the physiological stress measurement and the
subjective perception of stress can be further explored by including more
test subjects under study.
• A smart Learning Management System (LMS) can be designed to give
recommendations to the students on ways to improve their academic
performance and to cope with mental stress, with suggestions such as
breathing exercises. Data mining could be further applied when
considering solutions for students with different performance/stress levels.
Figure 17 shows the proposed structure of a smart LMS database.
References
Analysis of Autonomic Nervous System. (n.d.). HRV - measuring parameter -
analysis.com/hrv/hrv-measuring-parameter.html
Colom, R., Karama, S., Jung, R. E., & Haier, R. J. (2010). Human intelligence
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3181994/
Deng, Y., Cherian, J., Khan, N. U. N., Kumari, K., Sial, M. S., Comite, U.,
Gavurova, B., & Popp, J. (2022). Family and Academic Stress and
https://doi.org/10.3389/fpsyt.2022.869337
https://drandreadinardo.com/2016/10/05/whats-your-stress-threshold/
Hazer-Rau, D., Meudt, S., Daucher, A., Spohrs, J., Hoffmann, H., Schwenker,
8220/20/8/2308
Hernando, D., Roca, S., Sancho, J., Alesanco, Á., & Bailón, R. (2018, August
IBM Watson. (n.d.). IBM Watson Studio - Overview. IBM. Retrieved April 3,
https://www.ibm.com/topics/decision-trees#:~:text=A%20decision%20tree%20is%20a,internal%20nodes%20and%20leaf%20nodes
https://www.ibm.com/topics/supervised-learning
https://www.ibm.com/topics/boosting
Kiselica, M. S., Baker, S. B., Thomas, R. N., & Reedy, S. (1994). Effects of
335–342. https://doi.org/10.1037/0022-0167.41.3.335
Koldijk, S., Sappelli, M., Verberne, S., Neerincx, M. A., & Kraaij, W. (2014). The
SWELL knowledge work dataset for stress and User Modeling Research.
Interaction. https://doi.org/10.1145/2663204.2663257
Kim, H.-G., Cheon, E.-J., Bai, D.-S., Lee, Y. H., & Koo, B.-H. (2018, February
28). Stress and heart rate variability: A meta-analysis and review of the
https://www.psychiatryinvestigation.org/journal/view.php?doi=10.30773%2Fpi.2017.08.17
Kim, Y., Yoon, H. Y., Kwon, I. K., Youn, I., & Han, S. (2022). Heart rate
https://doi.org/10.3390/s22062152
Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., & Liu, T.-Y. (2017, December
https://dl.acm.org/doi/10.5555/3294996.3295074#sec-ref
Lin, X.-J., Zhang, C.-Y., Yang, S., Hsu, M.-L., Cheng, H., Chen, J., & Yu, H.
(2020, June 3). Stress and its association with academic performance
https://bmcmededuc.biomedcentral.com/articles/10.1186/s12909-020-02095-4
https://towardsdatascience.com/micro-macro-weighted-averages-of-f1-score-clearly-explained-b603420b292f#:~:text=The%20macro%2Daveraged%20F1%20score,regardless%20of%20their%20support%20values.&text=The%20value%20of%200.58%20we,score%20in%20our%20classification%20report
Looser, V. N., Ludyga, S., & Gerber, M. (2023). Does heart rate variability
https://doi.org/10.1111/sms.14308
Metabolic Meals. (2021, June 20). A beginner's Guide to Heart Rate Variability
https://blog.mymetabolicmeals.com/hrv-guide/
est/Features.html#references
https://www.das.nh.gov/wellness/Docs/Percieved%20Stress%20Scale.
https://doi.org/10.17501/24246700.2021.7133
Rajendran, S., Chamundeswari, S., & Sinha, A. A. (2022). Predicting the aca-
https://doi.org/10.1016/j.ssaho.2022.100357
Singh, P., Singh, N., Singh, K. K., & Singh, A. (2021). Diagnosing of disease
821229-5.00003-3
Shi, Y., & Qu, S. (2022). The effect of cognitive ability on academic
https://doi.org/10.3389/fpsyg.2022.1014655
Shaffer, F., & Ginsberg, J. P. (2017, September 28). An overview of heart rate
Singh, N., Moneghetti, K. J., Christle, J. W., Hadley, D., Plews, D., & Froelicher,
the Era of using mHealth Technologies for Health and Exercise Training
https://doi.org/10.15420/aer.2018.27.2
https://medium.com/analytics-vidhya/what-is-a-confusion-matrix-d1c0f8feda5
Welltory, W. I. (2023, March 18). Stress and HRV - what's The connection:
https://welltory.com/stress-and-hrv/
Williams, S., Layard Horsfall, H., Funnell, J. P., Hanrahan, J. G., Khan, D. Z.,
https://doi.org/10.3390/cancers13195010
Xu, J., Moon, K. H., & van der Schaar, M. (2017). A machine learning approach
https://doi.org/10.1109/jstsp.2017.2692560
https://www.ocf.berkeley.edu/~jfkihlstrom/ConsciousnessWeb/Meditation/CFQ.htm
Annex
Survey questionnaire