
A COMPARATIVE STUDY ON STUDENT MENTAL HEALTH CLASSIFICATION BETWEEN TRADITIONAL METHODS AND MACHINE LEARNING ALGORITHMS

R.K. CHOTE

thesis submitted in partial fulfillment
of the requirements for the degree of
master of science in data science & society
at the school of humanities and digital sciences
of tilburg university
student number
2083104

committee
ir. F. Zamberlan
dr. M. van Wingerden

location
Tilburg University
School of Humanities and Digital Sciences
Department of Cognitive Science &
Artificial Intelligence
Tilburg, The Netherlands

date
July 4, 2022

acknowledgments
I want to thank ir. F. Zamberlan for his supervision and encouragement
during the entire process. My gratitude also extends to dr. M. van
Wingerden, who functioned as a second reader and provided me with
valuable feedback. Lastly, I want to thank my parents and sisters for
their unconditional support this entire year.
A COMPARATIVE STUDY ON STUDENT MENTAL HEALTH CLASSIFICATION BETWEEN TRADITIONAL METHODS AND MACHINE LEARNING ALGORITHMS

r.k. chote

Abstract
The prevalence of mental illness among students is on the rise, with
severe consequences for all those affected. Prevention and promotion
intervention strategies that identify students prior to the onset or
worsening of mental health problems could play a substantial role in
reducing this burden. Using the Mental Health Quotient questionnaire,
the purpose of this thesis was to investigate a feasible prevention
strategy that focuses on the early detection of students who exhibit
risk factors for poor mental health. Using the completed questionnaires
of n = 15,906 students between the ages of 18 and 24, a classification
prediction task was developed to classify individuals as either at-risk for
their mental health or normal/healthy. Machine learning models were
compared to a baseline linear model. Model selection included feature
selection and hyperparameter tuning using 10-fold cross-validation.
The results reveal that, in terms of F1-score and ROC-AUC, Naive Bayes,
K-nearest neighbors, Random Forest, and Feedforward Neural Network do
not outperform the baseline Binary Logistic Regression (BLR), and no
statistical difference in performance was found between BLR and the
Support Vector Machine (SVM). SVM and BLR performed the best, while
the Feedforward Neural Network performed the worst. Directions for future
research are given and the implications are discussed.

1 data source/code/ethics statement

Work on this thesis did not involve collecting data from human participants or
animals. The original owner of the data used in this thesis retains ownership
of the data during and after the completion of this thesis. The author of
this thesis acknowledges that they do not have any legal claim to this data
or code.

2 introduction

2.1 The need for mental health prevention and promotion strategies

Mental illnesses are becoming increasingly prevalent among university stu-


dents, with undergraduate students being particularly at risk during the
transition to university (Sheldon et al., 2012; Selvaraj & Bhat, 2018). Not
only do the affected students perform worse academically and drop out at
a greater rate in comparison to students who do not have mental illnesses,
but they are also often faced with overall poor health outcomes such as
higher comorbidity of physical illnesses and experiencing a low quality of life
(Bruffaerts et al., 2018; McCloughen et al., 2012; Leijdesdorff et al., 2020;
Lipson & Eisenberg, 2018). The dominant approach to relieving this burden
is treatment, which is primarily based on the ‘illness-oriented’ perspective of
psychiatry, in which mental health is viewed in terms of the presence and/or
absence of psychopathological symptoms (Thieme et al., 2015; Bohlmeijer
& Westerhof, 2020). Treatment consists of identifying individuals who are
currently suffering from a diagnosable mental disorder and offering them
treatment with the expectation that this will provide relief from the disorder
(National Research Council, 2009). The problem is, however, that a large
proportion of students with poor mental health receive no mental treatment
due to perceived barriers or lack of access to mental treatment (i.e., because
of low social economic status or limited mental health resources) (Horwitz
et al., 2020; Wu et al., 2007). Additionally, it is becoming more apparent
that the current view of mental health based on the ‘illness-oriented’ per-
spective does not capture the full scope of what it means to experience good
mental health (Margraf et al., 2020). That is, good mental health is not
just the absence of a mental disorder, but as described by the World Health
Organization, it is a complete state of being where wellbeing and positive
functioning are two of its core elements (Bolier et al., 2013; World Health
Organization, 2004). Both prevention, in combination with early detection
(i.e., screening) and promotion strategies, could play an important role in
complementing treatment of students by increasing the number of students
who are offered help (Colizzi et al., 2020; Keyes, 2007). Prevention of mental
illnesses involves identifying, monitoring, and managing risk factors with the
goal of acting before the mental health problems worsen or even prevent their
onset by providing timely treatment (Arango et al., 2018). Promotion of
mental health focuses on enhancing protective factors and healthy behaviors
with the goal of enhancing mental health, which in turn reduces the incidence
of psychiatric disease. Given the fact that multiple studies show that both
prevention and promotion are effective supplements and/or alternatives to
treatment and are cost- effective, more research is needed to explore ways in
which students could benefit from these interventions (Arango et al., 2018;
Arumugam, 2019; Mihalopoulos et al., 2011; Purtle et al., 2020).

2.2 Artificial Intelligence and Mental Health Research

Artificial intelligence (AI) prediction algorithms are already widely used in


the business sector and in some sectors of medicine, but the search for AI
applications in mental health has only recently gained popularity. As is
apparent from the systematic reviews written by Shatte et al. (2019) and
Thieme et al. (2020), which reveal a recent increase in machine learning
research in the field of mental health, with many articles addressing how
machine learning can be applied in mental health domains such as the
detection and diagnosis of mental illness. The appeal of machine learning
in mental health research may be explained not just by its technological
advantages but also by its methodological advantages in comparison to
the more traditional psychological methods. For instance, by integrating
non-linearity into the model, machine learning is able to model complex
phenomena that are not easily captured by commonly used linear models
in psychology, such as General Linear Models (GLM) (Jacobucci et al.,
2020). Also, in contrast to machine learning research, evaluating a model on
unseen data is uncommon in psychological research. Consequently, reported
model fit indicators are nearly always only related to the training sample
(i.e., in-sample data), which provides an overly optimistic perspective of
reality and overestimates the generalizability (Yarkoni and Westfall, 2017;
Khanzode & Sarode, 2020).
While numerous studies support and advocate for the use of machine
learning, a number of articles have also demonstrated that machine learn-
ing algorithms do not outperform traditional methods. For instance, the
systematic review by Christodoulou et al. (2019) in which the performance
of logistic regression was compared against the performance of machine
learning algorithms such as decision trees, random forests, artificial neural
networks, and support vector machines in clinical prediction modeling found
no superior performance of machine learning to that of logistic regression.
Therefore, as Christodoulou et al. (2019) point out, more research is needed
to define the situations in which machine learning methods have advantages


over traditional methods. This is especially relevant since the application of
machine learning in the absence of domain expertise can lead to misinformed
conclusions, which also stresses the importance of identifying which machine
learning methods should be used (Bone et al., 2015; Vélez, 2021).

2.3 Research Questions

In summary, current approaches to alleviating the burden of student mental


health may not be sufficient, and cost-effective alternatives and/or supple-
ments are needed. Given the increase in popularity of applying machine
learning in mental health research and the discrepancies in findings regarding
the superiority of machine learning in comparison to traditional methods, the
overall research goal is to explore if machine learning algorithms outperform
traditional methods in mental health prediction.
A classification prediction model that assesses students’ mental health
and determines whether they are at-risk or not at-risk could increase the
proportion of students who are helped by assisting mental healthcare prac-
titioners and universities with early detection of students’ mental health
problems (Becker, 2018; Bzdok & Meyer-Lindenberg, 2018). Therefore, the
main research question is as follows:

RQ1 How do Naive Bayes, K-nearest neighbor, Support Vector Machine,


Random Forest, and a Feedforward Neural Network perform in students’
mental health classification compared to Binary Logistic Regression?

To answer this research question, a classification prediction task is created


in which Binary Logistic Regression model will serve as the baseline against
which other models will be evaluated. To acquire a deeper understanding
of the performance of the models, a statistical model comparison and error
analysis will be performed.
The remaining part of this thesis is structured as follows: Section 3 provides
relevant literature regarding the use of machine learning in mental health
research. Section 4 of this thesis describes the methodology used. In Section
5, the results are presented. In Section 6, interpretations of these results
are provided, along with the research’s strengths and limitations, as well
as suggestions for future research. Section 7 concludes this thesis.

3 related work

This section discusses the existing literature on the use of machine learning
in mental health research based on the systematic reviews by Shatte et al.
(2019) and Thieme et al. (2020), concentrating on applications pertinent to


this thesis. Specifically, it covers articles that focus on machine learning
applications in relation to the detection of general mental health and
well-being, excluding those that focus on other applications such as
prognosis or treatment, or articles that focus on specific mental disorders.
General mental health and wellbeing are frequently assessed either di-
rectly or indirectly through either a single type of data source or using
multiple types of data sources. These studies frequently employ Ecological
Momentary Assessment methods, which involve the repeated collection of
real-time data on an individual’s behavior and experiences in their natural
environment (Schiffman et al., 2008). The most frequently used data source
type is wearable monitoring devices. For example, Sano et al. (2015) ob-
tained data from 66 participants using wearable sensors and smartphones,
which captured certain physiological patterns, as well as surveys, and used this data
to predict academic performance, sleep quality, perceived stress, and mental
health. A Support Vector Machine with a linear kernel and a Support Vector
Machine with a radial basis function kernel were used for classification, and
they concluded that the survey and personality information alone yielded
an 80% accuracy and that the data from the wearable sensors increased the
accuracy by around 8%. There are however some limitations regarding the
use of wearable sensors or smartphones to predict clinical outcomes. While
these technological advances are often cost-effective and are relatively easy
to use (Majumder et al., 2017), the review by Saeb et al. (2017) found
that the reported prediction accuracies of these studies are often inflated,
overestimating the true accuracy due to the validation procedures used. In addition,
most articles are exploring ways of integrating wearable devices as moni-
toring devices into the clinical setting, which could be valuable in its own
right, however it is also limiting since it only targets individuals who receive
treatment. Another popular data source type is social media. For example,
Joshi et al. (2018) utilized tweets (i.e., Twitter posts) from 200 Twitter
users to classify individuals as ’normal people,’ ’people who are prone to
mental illnesses,’ or ’people who are currently suffering from some type of
mental illness’. Multiple classifiers were trained using extracted features
from these tweets. An average of 85% accuracy was achieved in which
an ensemble classifier performed the best while a multi-layered perceptron
classifier performed the worst. Unfortunately, no information was provided
regarding the multi-layered perceptron set-up, therefore no explanation can
be provided regarding this surprising result. Utilizing social media as a data
source for training machine learning algorithms can give researchers unique
information about the individuals themselves. However, using social media
as a data source in the context of mental health may raise some serious
ethical concerns. As described in the systematic review by Wongkoblap et al.
(2017), very few articles considered ethics in relation to the privacy of their
participants’ personal or private data (Conway & O’Connor, 2016; Rubeis,
2022).
Finally, the research conducted by Srividya et al. (2018) appears to align
most closely with the objective of this thesis. The purpose of their research
was to identify the mentally distressed individuals in the target population.
Using a questionnaire designed in consultation with a psychologist, they
collected data from 656 participants. The target population consisted of high
school and college students (n = 300) and working professionals with less
than five years of experience from various organizations (n = 356). Clustering
was utilized to obtain labels, which were then used to train classification
algorithms. Logistic regression, Naive Bayes, Support Vector Machines,
Decision Trees, and K-Nearest Neighbors were the classification algorithms
used. Support vector machines and K-Nearest Neighbors performed best
as individual classifiers, obtaining just above 80% accuracy. Random forest
and the bagging ensemble method produced the best results, obtaining
around 95% accuracy. There are, however, a few limitations regarding this
paper. Firstly, while the questionnaire used to assess the mental health of
the participants was consulted with a psychologist, no reliability or validity
results were provided to verify the quality of the obtained data, which could
be problematic during generalization (Shrout & Lane, 2012). Secondly, the
logistic regression did not perform significantly worse than the more advanced
machine learning algorithms like Random Forest, which leaves the question
of whether machine learning algorithms are needed in psychological research
unresolved. That is, why use advanced machine learning techniques with the
potential of drawing wrong conclusions if traditional methods accomplish the
same results and provide more interpretability? Thirdly, a small sample size
(n = 300) was used, which could have overestimated the predictive accuracy
that was obtained (Cui & Gong, 2018).
In conclusion, the results and limitations of relevant articles were dis-
cussed, and currently, there is no prediction model for assessing the overall
mental health of students with a focus on early detection. While the pilot
study by Srividya et al. (2018) is a solid starting point, the limitations of
the questionnaire and the small sample size, the lack of consensus on which
algorithm should be used, and the question of whether machine learning is
necessary in psychological research indicate that there is still a significant
gap in the literature on this topic. Therefore, the research presented in this
thesis aims at proposing a classification model which may be utilized as
an early detection method to identify students who are at risk in terms of
their mental health, using both descriptive and contextual features from the
Sapien Labs’ Dynamic Dataset of Population Mental Wellbeing (Newson &
Thiagarajan, 2021). Inspired by the choice of algorithms of Joshi et al. (2018)
and Srividya et al. (2018), the aim of this thesis extends to determining
whether a baseline Binary Logistic Regression model is outperformed by
Naive Bayes, K-nearest neighbors, Support Vector Machine, Random Forest,
and a Feedforward Neural Network.

4 method

Figure 1: Conceptualization of methodology

4.1 Dataset description

The dataset used for this thesis is Sapien Labs’ Dynamic Dataset of Pop-
ulation Mental Wellbeing (Newson & Thiagarajan, 2021). In response to a
request for the dataset from the creators at https://sapienlabs.org, the devel-
opers gave downloadable access to the dataset. The Mental Health Quotient
(MHQ) questionnaire was administered online to collect information on the
mental health of the global population. Respondents were recruited through
Facebook and Google Ads-based outreach initiatives. Observations are peri-
odically added to the dataset, making it an ongoing collection. When the
thesis was written, the dataset comprised approximately 250,000 observations.
Each observation represents an individual who completed the questionnaire
and is described by two types of features: MHQ scores in their (a) raw form,
and (b) composite forms. The raw MHQ scores are comprised of 47 distinct
mental wellbeing indicators and 30 contextual descriptors (i.e., life context
metrics and descriptive features) that provide demographical information
and information about a person’s particular circumstances. The composite
MHQ scores are the aggregated scores of the 47 individual indicators across
six dimensions of mental health, along with an overall MHQ score that
represents the individual’s mental health as a whole. It is important to note
that, for the purpose of this thesis, only the 30 contextual descriptors will
be considered as possible input features for the classification model in which
the overall MHQ will function as a target variable. As there is a summative
relationship between the calculation of the overall MHQ score and the 47
mental wellbeing indicators, they are ineligible for use as input features.
Appendix A provides an overview of both the raw and composite features.


Lastly, the MHQ has been shown to be a valid and reliable questionnaire for
assessing mental health (Newson et al., 2021).

4.2 Pre-processing

The dataset, which is maintained by Sapien Labs, was provided in a relatively


clean and well-organized csv file, with each column titled according to the
questionnaire descriptions, which were included in the supporting
documentation. Consequently, only a few further steps were required to
ensure that the data was suitable for usage in the prediction analyses. Figure
2 provides an overview of all of the preprocessing steps.

Figure 2: Pre-processing pipeline

The first step was to select the observations that were most likely to be
part of the target group. Namely, individuals who are currently studying.
Therefore, only individuals who answered ‘studying’ to the question ‘What is
your current occupational status?’ were selected for further preprocessing. It
is also important to note that only individuals between the ages of 18 and 24
were included because this age group is assumed to be most representative
of undergraduate students who are particularly at-risk for mental health
problems. See Figure 3 for the number of individuals per current educational
level.

Figure 3: The number of individuals per educational level


Lastly, to protect the external validity of the study, only people from nations
with a minimum of 500 observations were included (Kukull & Ganguli,
2012). See Figure 4 for the number of individuals per country.

Figure 4: The number of observations per country

To ensure the quality of the observations, the second step involved


excluding certain individuals based on the exclusion criteria set by Newson
and Thiagarajan (2020), the questionnaire’s developers. Firstly, individuals
who completed the questionnaire in less than seven minutes or in excess of one
hour were excluded. Given that the average time to complete the questionnaire is
15-20 minutes, the responses from participants with completion times of less
than seven minutes or beyond one hour were considered too fast or too slow,
respectively, which in turn could have affected the quality
of their responses. Secondly, only participants who responded "understood"
to the question "Were the questions in this assessment easy to understand?"
were included in the analysis; those who responded "not understood" were
excluded. Lastly, those who provided unrealistic responses to the questions
"How many hours did you sleep last night?" and "When was your last meal?"
were excluded as well. The authors reasoned that these responses were
invalid because the participants may have submitted them under duress (i.e.,
physical stress, which could have affected their thinking).
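
To make these selection and exclusion steps concrete, the following is a
minimal sketch in pandas. The column names ('occupation', 'age', 'country',
'completion_minutes', 'understood') are hypothetical stand-ins for the
questionnaire-based column titles in the actual csv file.

    import pandas as pd

    df = pd.read_csv("mhq_data.csv")

    # Step 1: target-group selection (students aged 18-24 from countries
    # with at least 500 observations).
    df = df[df["occupation"] == "Studying"]
    df = df[df["age"].between(18, 24)]
    counts = df["country"].value_counts()
    df = df[df["country"].isin(counts[counts >= 500].index)]

    # Step 2: quality-based exclusion criteria (Newson & Thiagarajan, 2020).
    df = df[df["completion_minutes"].between(7, 60)]  # plausible completion time
    df = df[df["understood"] == "understood"]         # comprehension check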
The third step consisted of ‘data cleaning’ and ‘data reduction’ in which
missing values and outliers were analyzed and treated if needed, and re-
dundant information was omitted. Redundant information consisted of
features that were either uninformative to the target group (e.g., household
income) or unrelated to this particular prediction task (i.e., non-contextual


features). Often, the missing data were due to redundancy. For instance,
household income was one of the features with numerous missing values;
however, due to the fact that many students are not yet employed, the
feature was deemed uninformative and was therefore excluded from the
analysis. Boxplots were utilized to identify outliers, and the decision was
made not to eliminate them. The assumption was made that these outliers
could potentially be a natural reflection of the population. The feature "how
would you describe your overall mood?", for example, measured whether the
individual’s mood was negative or positive on a scale of 1 (very negative)
to 9 (very positive). The boxplot revealed a median score of 7 and a lower
whisker at 3; 251 individuals answered 1, making them outliers
according to this boxplot. Excluding these individuals from the dataset
could potentially result in an underrepresentation of the population and
degrade the quality of the research, although the contrary could also be true,
in which case these individuals would be overrepresented (Aguinis et al.,
2013).
The fourth step involved the dummy coding of all categorical features
and the target variable. For four features, a decision had to be made
regarding the way these features were represented. These features were: the
presence of a medical condition; whether or not the individual experienced a
specific life trauma; whether the individual experienced physical complaints;
and whether the individual currently uses substances like drugs or alcohol. On
the questionnaire, these features were defined as multiple-choice questions.
Respondents could either submit their responses manually or select them
from a predefined list. This resulted in a huge number of unique responses
as well as a significant number of missing values. For each of these features,
the decision was made to interpret missing values as a "no" response to the
particular question. For example, if an individual did not provide a response
to the life trauma-related question, the assumption was made that the
individual had not experienced life trauma. These instances are represented
as the integer value ‘0’. If an individual did answer the question by either a
manual response or choosing a pre-defined answer, the instance would be
represented by an integer value of ‘1’. In the case of the life trauma related
question, a ‘1’ would mean that the individual experienced a life trauma in
their life. To avoid a high-dimensional feature space, one-hot coding was
chosen rather than allowing each unique response to be a distinct attribute
(Fan & Li, 2006). The target variable, "overall MHQ", was one-hot coded as
well, with negative overall MHQ scores (i.e., < 0) representing the at-risk
class and positive overall MHQ scores (i.e., > 0) representing the normal or
healthy class (Newson & Thiagarajan, 2021).
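
As an illustration of this recoding, a sketch continuing from the filtered
dataframe above is given below; the column names and the list of categorical
features are hypothetical.

    import pandas as pd

    # Multiple-choice questions: any response (manual or predefined) -> 1,
    # a missing value is interpreted as a "no" -> 0.
    for col in ["medical_condition", "life_trauma",
                "physical_complaints", "substance_use"]:
        df[col] = df[col].notna().astype(int)

    # Remaining categorical features are dummy coded.
    categorical = ["gender", "education_level"]  # hypothetical examples
    df = pd.get_dummies(df, columns=categorical, drop_first=True)

    # Target: negative overall MHQ -> at-risk (1), positive -> healthy (0).
    df["at_risk"] = (df["overall_mhq"] < 0).astype(int)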
Finally, having ensured that the data was "tidy" and "clean," the non-
dummy coded features were standardized. The dataset was split at a
ratio of 80:20, with 80 percent of the data going into the training set. A
stratification technique was utilized to maintain the proportion of class
distribution depicted in Figure 5. The class distribution is roughly equal,
with 8058 individuals being labeled as normal/healthy and 7848 individuals
being labeled as at-risk. The total number of individuals involved in the
prediction analysis is 15,906.

Figure 5: Class distribution per gender
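
A sketch of this final step is given below. Note that, as one common
ordering, the scaler here is fitted on the training portion only, so that no
information from the test set leaks into training; the list of continuous
columns is hypothetical.

    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    X = df.drop(columns=["at_risk"])
    y = df["at_risk"]

    # Stratified 80:20 split preserves the class distribution in both sets.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.20, stratify=y, random_state=42)

    continuous = ["age", "sleep_hours"]  # hypothetical examples
    scaler = StandardScaler()
    X_train.loc[:, continuous] = scaler.fit_transform(X_train[continuous])
    X_test.loc[:, continuous] = scaler.transform(X_test[continuous])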

4.3 Cross-validation

Using stratified K-fold Cross-validation (K-fold) with k = 10, feature selection


and hyperparameter tuning will be performed for the prediction task. Cross-
validation is a data resampling method that is routinely utilized in machine
learning as it allows researchers to assess the generalization capabilities
of models by testing them on a validation set with the goal of preventing
overfitting (Refaeilzadeh et al., 2009). K-fold is a variation of cross-validation
that runs the cross-validation procedure k times, in which k is a parameter
that can be adjusted. The benefit of using K-fold as opposed to a single
train-validation split is that the dataset is divided into k different folds and
each portion of the dataset is used for both training and validation across the k runs. Stratification
enables each fold to have an accurate representation of the distributions in
the dataset, with particular emphasis devoted to the subgroups of gender,


current education level, and what country the individual lives in. Due to
time restrictions, feature selection and hyperparameter tuning are performed
separately as opposed to simultaneously. That is, feature selection is per-
formed first, and these selected features are then used during hyperparameter
tuning.
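
A minimal sketch of this set-up is shown below. One caveat: scikit-learn's
StratifiedKFold stratifies on the target label only, so stratifying on
additional subgroups (gender, education level, country) would require a
combined grouping key.

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    scores = cross_val_score(RandomForestClassifier(random_state=42),
                             X_train, y_train, cv=cv, scoring="f1")
    print(scores.mean().round(3), scores.std().round(3))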

4.4 Feature selection

Feature selection will consist of correlation analysis between the features and
target variable, as well as inter-correlation analysis to assess multicollinearity.
Since the target variable is binary, Point Biserial Correlation is used,
which allows for the investigation of the association between continuous and
binary variables (Kornbrot, 2014). The idea behind correlation analysis as
a feature selection method is to only retain the set of features that show a
strong linear relationship with the target variable and discard the features
that show a weak relationship. The assumption is made that features that
show a weak linear relationship with the target variable are not informative
and in turn could negatively affect the predictions results. Correlation ranges
from -1 to 1, in which 1 represents a perfect positive linear relationship and
-1 a perfect negative linear relationship. Different values within this range
of -1 and 1 will serve as a minimum correlation value that features have to
meet in order to be selected. For example, a minimum value of 0.4 would
mean that only features with a correlation value of ≤ -0.4 or ≥ 0.4 will be
selected.
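
A sketch of this selection step with SciPy's pointbiserialr, which returns
the correlation coefficient and its p-value, is shown below; the threshold of
0.1 is only one of the candidate values.

    from scipy.stats import pointbiserialr

    threshold = 0.1  # one candidate value; several thresholds are explored
    selected = [col for col in X_train.columns
                if abs(pointbiserialr(y_train, X_train[col])[0]) >= threshold]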
However, correlation analysis is restricted to linear connections. Conse-
quently, a tree-based feature selection method will also be utilized. Specifi-
cally, feature selection through Random Forest, which enables the exploration
of nonlinear relationships. Important features are identified by their impurity
importance, which is often measured by the Gini impurity index (Louppe,
2014; Nembrini et al., 2018). This operation will be carried out utilizing the
Boruta package (Kursa & Rudnicki, 2010).
Both feature selection via correlational analysis and feature selection
through Random Forest will yield a selection of features. To determine which
of these two will be utilized during the hyperparameter tuning and testing
phase, their performance on a Random Forest Classifier will be compared
using F1-score as an evaluation metric. See Section 4.6 for additional
information regarding the rationale behind the selected evaluation metric.
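
A sketch of the tree-based selection, using the Python implementation of
the Boruta package (BorutaPy), followed by the F1 comparison of the two
feature sets on a Random Forest Classifier; the Random Forest settings
shown are illustrative.

    from boruta import BorutaPy
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    rf = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=42)
    boruta = BorutaPy(rf, n_estimators="auto", random_state=42)
    boruta.fit(X_train.values, y_train.values)
    boruta_selected = X_train.columns[boruta.support_].tolist()

    # Compare both feature sets on a Random Forest via the mean F1-score.
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
    for name, feats in [("correlation", selected), ("boruta", boruta_selected)]:
        f1 = cross_val_score(RandomForestClassifier(random_state=42),
                             X_train[feats], y_train, cv=cv, scoring="f1")
        print(name, f1.mean().round(3))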
4.5 Algorithms and hyperparameter tuning

This section describes the algorithms used during the classification prediction
analysis and, if available, their hyperparameter search space.

Table 1: Hyperparameters and considered values per algorithm

Algorithm                    Hyperparameter            Considered values
K-nearest neighbor           K-neighbors               20, 40, 60, 80
                             Distance metric           Euclidean, Manhattan, Cosine
Support Vector Machine       Kernel type               Linear, Poly, RBF, Sigmoid
Random Forest                Splitting criteria        Gini, Entropy
                             Number of estimators      200, 400, 600, 800
Feedforward neural network   Optimizer                 RMSProp, SGD, ADAM
                             Learning rate             0.01, 0.001, 0.0001
                             Number of hidden layers   1, 2, 3
                             Dropout                   0, 0.5, 0.8
                             Batch size                32, 64, 128
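
Because feature selection and tuning are performed separately, each grid in
Table 1 can be searched on the selected features independently. The sketch
below shows the set-up for the KNN grid with scikit-learn's GridSearchCV;
the other grids follow the same pattern.

    from sklearn.model_selection import GridSearchCV, StratifiedKFold
    from sklearn.neighbors import KNeighborsClassifier

    param_grid = {"n_neighbors": [20, 40, 60, 80],
                  "metric": ["euclidean", "manhattan", "cosine"]}
    search = GridSearchCV(KNeighborsClassifier(), param_grid, scoring="f1",
                          cv=StratifiedKFold(n_splits=10, shuffle=True,
                                             random_state=42))
    search.fit(X_train[selected], y_train)
    print(search.best_params_, round(search.best_score_, 3))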

4.5.1 Binary Logistic Regression


Binary Logistic Regression (BLR) is a statistical method that belongs to
the class of statistical models called Generalized Linear Models (GLM).
This statistical method is performed with Statsmodels (Seabold & Perktold,
2010).
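
A minimal sketch of the baseline, assuming the features chosen during the
feature selection phase; Statsmodels requires an explicit intercept, and the
predicted probabilities are thresholded at 0.5 to obtain class labels.

    import statsmodels.api as sm

    X_tr = sm.add_constant(X_train[selected].astype(float))
    X_te = sm.add_constant(X_test[selected].astype(float))

    blr = sm.Logit(y_train, X_tr).fit()
    y_prob = blr.predict(X_te)            # predicted probabilities
    y_pred = (y_prob >= 0.5).astype(int)  # at-risk if probability >= 0.5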

4.5.2 Naive Bayes


Naive Bayes (NB) is a simple probabilistic classification machine learning
algorithm. NB is based on Bayes’ rule and assumes the conditional indepen-
dence of features given the target label. This machine learning algorithm is
performed with Scikit-learn (Pedregosa et al., 2011).

4.5.3 K-nearest-neighbor
K-nearest neighbor (KNN) is a supervised non-parametric machine learning
algorithm that can be used for classification or regression (Azadkia, 2019).
Its performance, which is heavily reliant on the employed distance metric,
is often similar to the most sophisticated classifiers in the literature (Abu


Alfeilat et al., 2019). The hyperparameter search space is comprised of
several different values for the parameter k, which determines the number
of neighbors the algorithm will consider when making predictions, and the
type of distance metric. Consideration will be given to Euclidean Distance,
Manhattan Distance, and Cosine Similarity as distance metrics. Euclidean
Distance and Cosine Similarity often perform similarly. However, Cosine
Similarity has the advantage of a normalized distance, which may result in
greater efficiency (Qian et al., 2004). This machine learning algorithm is
performed with Scikit-learn (Pedregosa et al., 2011).

4.5.4 Support Vector Machine


Support Vector Machine (SVM) is a machine learning algorithm commonly
used for classification problems in which the algorithm attempts to maximize
the margin between classes by constructing a hyperplane. The hyperparame-
ter space is made up of various kernel types. The most frequently employed
kernel is the Gaussian radial basis function (RBF), a general-purpose kernel
utilized when there is no prior knowledge of the data. The linear kernel is
employed when data is considered to be linearly separable; the polynomial
kernel permits the addition of d-degrees of polynomials; and the sigmoid
kernel employs the sigmoid function and is akin to a two-layered perceptron
(Hassan et al., 2014). This machine learning approach is implemented with
Scikit-learn (Pedregosa et al., 2011).
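
As an illustration, the kernel types could be compared under cross-validation
as sketched below, with all other hyperparameters left at scikit-learn's
defaults.

    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    for kernel in ["linear", "poly", "rbf", "sigmoid"]:
        f1 = cross_val_score(SVC(kernel=kernel), X_train[selected], y_train,
                             cv=10, scoring="f1")
        print(kernel, f1.mean().round(3))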

4.5.5 Random Forest


Random Forest (RF) is a member of the algorithm family known as decision
trees. Decision trees are sequential models that logically incorporate a series
of simple tests, each of which compares a numeric attribute to a threshold
level or a nominal attribute to a set of possible values (Kotsiantis, 2013). The
hyperparameter search space is comprised of the number of trees utilized in
the RF. A RF with a large number of trees is not always superior to one with
fewer trees. Therefore, increasing the number of trees is not always necessary;
there is a threshold beyond which there is no significant gain (Oshiro et al.,
2012). While both Entropy and Gini Impurity can be used as a splitting
criterion, and they perform comparably in general, Gini Impurity was chosen
because it is often the default option. Therefore, the splitting criterion was
not deemed a hyperparameter (Teng & Lee, 2019). This machine learning
approach is implemented with Scikit-learn (Pedregosa et al., 2011).
4.5.6 Feedforward Neural Network


A Feedforward Neural Network (FNN), also known as a multilayered percep-
tron, is a deep learning model with the objective of learning the parameter
values that produce the best approximation of some function f*. During
classification, the function f*(x) maps an input x onto one or multiple
classes y (Goodfellow et al., 2016). Stochastic gradient descent (SGD) and
gradient descent extension training algorithms RMSProp and ADAM are
among the optimizers under consideration. SGD is a simplification of the
more generic gradient descent training algorithm, and when appropriate
training data is provided, it is an effective learning algorithm. However, the
pace of convergence is limited by a noisy approximation of the true gradient.
Both RMSProp and ADAM are adaptive learning rate methods that permit not just
good performance but also faster convergence (Goodfellow, 2016). Despite
the fact that the RMSProp and ADAM algorithms alter the learning rate
during training as opposed to SGD, several initial learning rates will be
explored. The number of hidden layers under consideration spans from one
to three, with the number of nodes determined through trial and error. Two
hidden layers frequently outperform a single hidden layer, but they also
increase the model’s complexity, which may reduce its efficiency (Thomas
et al., 2016; Stathakis, 2009). In addition, dropout rates ranging from 0
to 0.8 (see Table 1), which are known to be effective, will be evaluated (Srivastava et al.,
2014). The batch sizes considered are 32, 64, and 128. Lastly, in order to
avoid overfitting, early stopping will be used. This deep learning algorithm
is performed with PyTorch (Paszke et al., 2019).
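
A minimal PyTorch sketch of such a network is given below, assuming the
configuration ultimately selected in Section 5.1 (two hidden layers of 100
ReLU nodes, ADAM with learning rate 0.001); the dropout rate is
illustrative, and the training loop and early stopping are reduced to a
single step for brevity.

    import torch
    import torch.nn as nn

    class FNN(nn.Module):
        def __init__(self, n_features, hidden=100, dropout=0.5):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(dropout),
                nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(dropout),
                nn.Linear(hidden, 1))  # one logit for binary classification

        def forward(self, x):
            return self.net(x)

    model = FNN(n_features=16)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    loss_fn = nn.BCEWithLogitsLoss()

    def train_step(x_batch, y_batch):
        # One optimization step on a mini-batch (e.g., of size 64); early
        # stopping would monitor the validation loss across epochs.
        optimizer.zero_grad()
        loss = loss_fn(model(x_batch).squeeze(1), y_batch.float())
        loss.backward()
        optimizer.step()
        return loss.item()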

4.6 Evaluation

The evaluation metrics used for the classification task are the receiver
operating characteristic curve (ROC-AUC), the F1 score, and a confusion
matrix. To gain more insights into the models’ performance, error analysis
will be performed. Finally, model comparison will be performed by measuring
statistical differences between the models.

4.6.1 Evaluation metrics


ROC-AUC quantifies the trade-off between the true positive rate and the
false positive rate and offers information on the model’s discriminatory power
(Fan et al., 2006). In the context of the prediction task, this score would
indicate the model’s ability to distinguish between students who are at risk
for mental health problems and those who are not. One of the drawbacks
of using ROC-AUC is that when applied to an imbalanced dataset, it can
portray an overly optimistic performance of a classifier (Movahedi et al.,
2020). Although steps are taken in this thesis to take the imbalance of the class
distribution into account when training the models (i.e., stratification), the
decision has been made to also include the F1-score as an evaluation metric.
The F1-score is the harmonic mean of the precision and recall scores. It
focuses more on the classification performance of the model with regard to
the positive class (Raschka, 2014). That is, the students who are at-risk
regarding their overall mental health. In comparison to ROC-AUC, it is
a more robust metric against imbalanced datasets (DeVries et al., 2021).
Lastly, the confusion matrix provides a summary table of the number of
true and false positives as well as true and false negatives (Rosenbusch et
al., 2021).
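
A sketch of the three metrics, with the at-risk class coded as the positive
label; the F1-score is the harmonic mean F1 = 2 * precision * recall /
(precision + recall), and y_pred and y_prob are the hard labels and
predicted probabilities of a fitted model (as in the baseline sketch above).

    from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

    print("F1:     ", round(f1_score(y_test, y_pred), 3))
    print("ROC-AUC:", round(roc_auc_score(y_test, y_prob), 3))
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()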

4.6.2 Error analysis and model comparison


The error analysis will consist of a search for anomalous feature value discrep-
ancies between correct and incorrect classifications. The strategy consists of
determining which observations were correctly or incorrectly classified per
model and examining patterns across features that are interesting. Also,
group disparate impact analysis will be performed and if patterns of biases
exist they will be discussed.
The comparison of models entails measuring statistical differences in
score distributions between models. These score distributions are obtained
through 1000 iterations of model training. Due to the computational limits of
the hardware, the algorithms are trained independently, with each iteration
employing a unique train-test split with an 80:20 ratio, of which 80% is
comprised of training data. F1-scores and ROC-AUC will both be used as
evaluation metrics. The goal is to see whether or not the differences observed
among the different distributions result from the differences in terms of
performance between the models or simply by chance. To accomplish this,
the means of the distributions will be compared to determine whether they
significantly differ. Firstly, normality checks are performed using the Shapiro-
Wilk test. The null hypothesis of the Shapiro-Wilk test is that the
score distribution is normally distributed. If the distributions are normally
distributed and the variances are equal a two-tailed Independent Samples
T-test will be performed to test whether the means of the distributions differ
significantly from each other. The homogeneity of variance will be assessed
through the Levene’s Test. If the distributions are normally distributed but
the variances are unequal a two-tailed Welch’s Test will be performed to test
whether the means of the distributions differ significantly from each other.
If the distributions appear to be nonnormal, the Wilcoxon Signed-Rank Test
is used (Bielza & Larrañaga, 2020). All statistical tests are performed using
an alpha of 0.05.
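
The decision rules above could be implemented as sketched below, given two
arrays of 1000 scores each (one per model); SciPy's shapiro, levene,
ttest_ind, and wilcoxon cover all four tests, with equal_var=False turning
the t-test into Welch's test.

    from scipy.stats import levene, shapiro, ttest_ind, wilcoxon

    def compare(baseline_scores, model_scores, alpha=0.05):
        # Normality of both score distributions (Shapiro-Wilk).
        normal = (shapiro(baseline_scores).pvalue > alpha and
                  shapiro(model_scores).pvalue > alpha)
        if not normal:
            return wilcoxon(baseline_scores, model_scores)
        # Homogeneity of variance (Levene); unequal variances -> Welch's test.
        equal_var = levene(baseline_scores, model_scores).pvalue > alpha
        return ttest_ind(baseline_scores, model_scores, equal_var=equal_var)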
5 results

This section describes the results of the classification prediction task. The
first part of this section consists of the cross-validation results. Namely, the
feature selection results and the classification scores that were obtained on
the validation sets. The second portion of this section will focus on the
results that were obtained during testing. Both validation and test results
are described by F1-scores, ROC-AUC scores, and confusion matrices. The
third portion of this section will focus on error analysis. This section will
conclude with a model comparison.

5.1 Cross-validation results

During the correlational feature selection phase, a number of potential


optimal threshold values were explored. The thresholds were initially varied
in increments of 0.1 from 0.1 to 1, and the results revealed that the Random Forest
Classifier produced the highest average F1-scores within the threshold range
of 0.1 to 0.3. This range was then further investigated using increments of 0.05,
with the aim of obtaining, if possible, an even better threshold value. The
optimal threshold value was 0.1, resulting in the selection of 16 features. In
addition, feature selection using Gini Importance was conducted, resulting
in the selection of 25 features.
The performances of both feature selections were compared against
each other, and the results revealed that the features obtained from the
correlational analysis yielded the greatest average F1-score across the
different folds, with a mean of 0.747 (SD = 0.015). In contrast, features
obtained through Gini Importance, yielded a mean F1-score of 0.719 (SD
= 0.013). Therefore, during the hyperparameter and testing phases, the 16
features selected by correlational analysis feature selection were utilized.
After feature selection, hyperparameter tuning was conducted. The
optimal value for k-neighbors in the KNN classifier was 60, and the optimal
distance metric was Cosine Similarity. SVM performed best when using a
linear kernel. The ideal splitting criterion for the RF model was Gini and
the optimal number of trees to use was 600. The FNN model performed
best when the ADAM optimizer was utilized with an initial learning rate
of 0.001 and a batch size of 64. Length (i.e., the number of hidden layers)
and width (i.e., number of nodes per hidden layer) were manipulated to test
various architectural configurations. Cross-validation results revealed that
the optimal number of hidden layers is two, with 100 nodes per hidden layer
and where each layer utilized a ReLU activation function.
The validation results are shown in Table 2, and Figure 6. The results
show that none of the models outperform the BLR baseline score, with SVM
performing the best out of all machine learning models obtaining a similar
score as BLR on both the F1 and ROC-AUC metric.
Table 2: Validation results

Model   F1-score   ROC-AUC
BLR*    0.762      0.765
NB      0.740      0.751
KNN     0.753      0.761
SVM     0.762      0.765
RF      0.741      0.745
FNN     0.707      0.748
* Baseline model

The confusion matrices show that all models are capable of distinguishing
between the negative class and positive class given the number of true
negatives and true positives. However, BLR, SVM, and RF tend to produce
fewer false negatives in comparison to the other models.

Figure 6: Confusion matrices based on classification validation results


5.2 Test results

The F1-scores and ROC-AUC scores for each model in the test set are
presented in Table 3. On both the F1-score and ROC-AUC, the outcomes
reveal that BLR performs slightly better than SVM. The run durations in
seconds indicate that, with the exception of
NB, the baseline logistic regression model is significantly faster than the
machine learning and deep learning models.

Table 3: Test results

Model   Run time (s)   F1-score   ROC-AUC
BLR*    0.039          0.764      0.768
NB      0.004          0.731      0.746
KNN     0.824          0.757      0.764
SVM     9.665          0.762      0.767
RF      4.838          0.731      0.737
FNN     1.689          0.719      0.753
* Baseline model

Figure 7 depicts the confusion matrices for all models on the test set.
Similar to the validation results, the confusion matrices demonstrate that all
models appear to be able to differentiate between the negative and positive
classes, as indicated by the true negative and true positive rates. Also similar
to the validation results, BLR, SVM, and RF tend to have a more balanced
distribution between true and false negatives and positives. In contrast to
NB, KNN, and FNN where the number of false negatives exceed the number
of false positives.
Figure 7: Confusion matrices based on test results

5.3 Error Analysis

This section discusses results of the error analysis. The results reveal a
couple of things. Firstly, there is an indication that extreme cases of mental
health were overall easier to classify than less extreme cases, where extreme
cases can be either a positive experience of mental health, such as being
very satisfied with one’s life, or a negative experience, such as describing
one’s mood as very negative. More specifically, when an individual has an
extremely positive or negative experience regarding a certain aspect of their
life, the number of correct classifications tends to be higher and the number
of incorrect classifications tends to be lower. This was a constant finding
across all models. An example is given in Figure 8, which shows the correct
and incorrect classifications of the BLR model on 3 features.
As seen in the figure, when someone is either very satisfied with their
life, indicated by a score of 8 or 9, or not satisfied at all, indicated by a score
of 1 or 2, the classifications are mostly correct, with only a small number
of incorrect classifications. The same goes for "overall mood" and "mental
alertness". When an individual provided a moderate response, indicated by
a 4 or 5, the number of incorrect classifications tends to be larger.

Figure 8: Incorrect and correct classification for BLR model on three features
However, whether an incorrect classification is deemed a false positive
or a false negative, seems to be influenced not just by whether a response
is moderate, but also by whether a response is negative or positive. For
example, the instances that were labeled as false positives by the better
performing models such as BLR and SVM seem to consist of individuals
who gave a moderate response which was either neutral or a bit more negative.
In contrast, the instances that were labeled as false negatives were often
individuals who responded either neutrally or a bit more positively. For instance,
as seen in Figure 9, the responses of individuals classified as false positive
by the SVM model on the feature ‘life satisfaction’ are mostly moderate and
slightly more negative (i.e., being less satisfied with life). On the other hand,
the responses of individuals classified as false negatives tend to be moderate
and slightly more positive.

Figure 9: SVM false positives and false negatives on the feature life satisfaction

Whether a response was moderate or a bit positive or negative seems


to matter less for the worse performing models such as FNN. While the
trend of being labeled a false positive still holds when a moderate or slightly
more negative response was given, most individuals who provided moderate
responses were classified as false negatives. Figure 10 provides an example of


both individuals who were labeled false positives and false negatives by the
model FNN on the feature life satisfaction. This is also apparent in Figure 6
in which the number of false negatives for the FNN model is much higher in
comparison to the number of false positives.

Figure 10: FNN false positives and false negatives on the feature life satisfaction

5.4 Model Comparison

This section discusses the model comparison results between the baseline
BLR and all other models. Shapiro-Wilk Normality tests showed that the
score distributions from each model, both F1-scores and ROC-AUC scores
seem to be normally distributed, as shown in Table 4.

Table 4: Shapiro-Wilk Normality Test results

         F1-scores          ROC-AUC scores
Model    W       P-value    W       P-value
BLR*     .998    .375       .997    .116
NB       .998    .272       .998    .445
KNN      .998    .427       .999    .824
SVM      .999    .578       .998    .159
RF       .999    .566       .999    .947
FNN      .999    .522       .998    .522
* Baseline model

Levene’s test was performed to test the homogeneity of variance between


the baseline distributions and the score distributions obtained by the ma-
chine learning algorithms. The results show that the assumption of equal
variances holds for all machine learning algorithms, except for the
distributions obtained by FNN. This applies to both the FNN F1-score
distribution, as well as the ROC-AUC distribution, p < 0.05. Therefore,


the statistical comparison between BLR and FNN was performed using the
Welch’s test, while the statistical comparison between BLR and all other
models was performed using an Independent Samples T-Test.
As visualized in Figure 11, the comparison between F1-score distributions
shows that SVM (M = 0.762, SD = 0.008) performs only slightly better
than BLR (M = 0.762, SD = 0.007). As seen in Table 5, this difference
was not significant. The results also show that BLR outperforms NB (M =
0.737, SD = 0.008), KNN (M = 0.753, SD = 0.008), RF (M = 0.741, SD
= 0.008), and FNN (M = 0.710, SD = 0.015). As seen in Table 5, these
differences in performances were significant.

Figure 11: F1-score score distributions per model

Note: The solid line represents the baseline mean.

Table 5: T-test results F1-score distributions

Model   df     t         p-value
NB      1998   70.486    .000*
KNN     1998   23.677    .000*
SVM     1998   -1.632    .103
RF      1998   59.453    .000*
FNN     1998   100.646   .000*
* p-value < .05

As visualized in Figure 12, the comparison between ROC-AUC score


distributions shows that SVM (M = 0.766, SD = 0.007) performs only
slightly better than BLR (M = 0.765, SD = 0.007). As seen in Table 6, this
difference was not significant. The results also show that BLR outperforms
NB (M = 0.749, SD = 0.007), KNN (M = 0.761, SD = 0.007), RF (M =
0.744, SD = 0.007), and FNN (M = 0.748, SD = 0.008). As seen in Table
6, these differences in performances were significant.
Figure 12: ROC-AUC score distributions per model

Note: The solid line represents the baseline mean.

Table 6: T-test results ROC-AUC distributions

Model   df     t         p-value
NB      1998   53.898    .000*
KNN     1998   15.282    .000*
SVM     1998   -1.780    .075
RF      1998   67.439    .000*
FNN     1998   109.860   .000*
* p-value < .05

6 discussion

6.1 Goal of the research

The prevalence of mental illness among students is becoming increasingly


concerning, and viable prevention strategies are needed. The goal of this
thesis was therefore to present a prediction model that focuses on early
detection of students who show indicators of being at risk for mental health
problems. However, the literature is ambiguous regarding the appropriate
methods and algorithms. On the one hand, there is a rise in the use of
machine learning applications in mental health research showing promising
results, but there are also several studies indicating that traditional psycho-
logical methods are not necessarily outperformed. Therefore, a classification
prediction task was created with the goal of categorizing individuals as either
at-risk or normal/healthy. To see whether machine learning algorithms
outperform traditional psychological methods, this thesis investigated the
research question: "How does the predictive performance of Naive Bayes, K-
nearest neighbor, Support Vector Machine, Random Forest, and Feedforward
Neural Network compare to that of a baseline Binary Logistic Regression in


classifying students as either at-risk or not at-risk for their mental health?".
The findings and their implications are discussed in the following sections.

6.2 Research question

The research question investigated the performance of the algorithms Naive


Bayes, K-nearest neighbor, Support Vector Machine, Random Forest, and
Feedforward Neural Network against a baseline Binary Logistic Regression
in classifying students as either at-risk or not at-risk for their mental health.
The comparison in performance indicates that (a) none of the machine
learning algorithms perform significantly better than the baseline BLR
model; (b) BLR performs significantly better than NB, KNN, RF, and
FNN; and (c) there is no significant difference between the performances
of BLR and SVM. This demonstrates that a traditional method such as
logistic regression can perform comparably to more advanced algorithms in
this particular prediction task and, in comparison to some algorithms, even
better. Overall, these findings are consistent with those of Christodoulou et
al. (2019). Their systematic review of 71 articles led to the conclusion that
there was no evidence that machine learning methods performed better than
logistic regression. There are a few possible explanations for this difference
in performance.
Firstly, machine learning algorithms are often best utilized when the
dataset consists of a large number of observations and a large number of
features (Rajkomar et al., 2018; Deo & Nallamothu, 2016; Luo et al., 2016).
As van der Ploeg et al. (2014) discuss, in comparison to classical modelling
techniques (e.g., logistic regression), machine learning may need over ten
times as many features to obtain an optimal performance. While this
thesis included a relatively substantial number of observations in comparison
to previous studies, only 16 features were utilized. Since the advantage of
machine learning lies in its ability to handle a large number of features more
effectively, it could be the case that machine learning algorithms will not out-
perform logistic regression when only a small number of prespecified features
are examined (Kononenko, 2001; van der Ploeg et al., 2014). In addition,
these 16 features were selected based on their linear relationship with the
target variable, leaving no features with a strong nonlinear relationship with
the target variable. It is possible that the machine learning algorithms were
underutilized, given that their strength is in identifying nonlinear relation-
ships in the data (Nusinovici et al., 2020). Kuhle et al. (2018) offered a
similar conclusion. They demonstrated that machine learning methods did
not outperform logistic regression on a prediction task that aimed to identify
pregnancies at risk for adverse outcomes. They attributed this disparity in
performance to a possible lack of complexity in the relationships between the


features and target variables. That is, the lack of complexity did not play to
the strengths of machine learning models, which are known to learn complex
non-linear relationships more easily (Rajkomar et al., 2019; Jacobucci et
al., 2020). In addition, nonlinear methods are frequently most effective
when solving prediction problems with a high signal-to-noise ratio (Walsh et
al., 2016). Signal-to-noise ratio relates to the dataset’s quality (Hastie et
al., 2019; Walsh et al., 2016). In contrast to domains such as engineering
and physical science, the signal-to-noise ratio in human medical research is
frequently low. Data noise can be problematic for machine learning algo-
rithms, as the algorithm may mistake it for a pattern and begin generalizing
from it (Cao & Zhang, 2020; Chen et al., 2019). In situations with a low
signal-to-noise ratio, machine learning methods may be less useful (Ennis et
al., 1998).
Secondly, the class distribution, as shown in Figure 5, was somewhat
unbalanced. More specifically, the dataset comprised a larger proportion
of normal/healthy individuals than at-risk individuals. Imbalances are fre-
quently observed in medical risk prediction, where the number of incident
disease cases are often lower (Christodoulou et al., 2019). Nusinovici et al.
(2020) concluded that machine learning techniques may not be warranted
in such cases. They compared the performances of logistic regression and
machine learning algorithms on an imbalanced dataset and a corrected im-
balanced dataset via a technique called Synthetic Minority Over-sampling
Technique (SMOTE). They found that machine learning algorithms did not
outperform logistic regression on the imbalanced dataset, but the perfor-
mance of algorithms such as SVM improved substantially on the corrected
dataset showing an equal class distribution. It is important to note that the
imbalanced dataset included in the study done by Nusinovici et al. (2020)
was more imbalanced than the dataset included in this thesis. Nevertheless,
this could be an explanation for why the number of false negatives was
substantially higher for most of the machine learning algorithms such as NB,
KNN, and FNN and why the performance of the machine learning algorithms
overall was worse.
Lastly, a somewhat unexpected finding was that FNN performed not only
significantly worse than BLR, but also performed the worst out of all other
algorithms. These findings do however appear to be consistent with those of
Joshi et al. (2018), in which a multi-layered perceptron was outperformed
by BLR and numerous algorithms covered in this thesis, including RF and
KNN. Whilst the other models achieved a minimum accuracy score of
86%, the MLP classifier achieved 83%. One potential explanation is that FNN is more
susceptible to overfitting. In BLR, model complexity is low, particularly
when interaction terms are absent, but FNN is more flexible and complicated,
making it more susceptible to overfitting (Dreiseitl & Ohno-Machado, 2002).


Another possible explanation is that the choice of hyperparameters may not
have been optimal, which is known to affect model performance substantially
(Claesen & De Moor, 2015).
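One standard remedy against such overfitting is dropout regularization (Srivastava et al., 2014). The following PyTorch sketch shows, purely for illustration, where a dropout layer would sit in a small feedforward classifier; the input dimension, layer sizes, and dropout rate are placeholders rather than the architecture tuned in this thesis.

import torch
import torch.nn as nn

# Illustrative feedforward binary classifier with dropout.
class FeedforwardNet(nn.Module):
    def __init__(self, n_features: int, dropout: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64),
            nn.ReLU(),
            nn.Dropout(dropout),  # randomly zeroes activations during training
            nn.Linear(64, 1),     # single logit for the binary target
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = FeedforwardNet(n_features=30)
model.train()  # dropout is active while training
model.eval()   # dropout is disabled at evaluation time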
As mentioned earlier, the results also indicated that there was no significant
difference in performance between SVM and BLR. These results are in line
with the findings of Verplancke et al. (2008), in which both support vector
machine and logistic regression were used to predict hospital mortality for
critically ill patients with haematological malignancies. They found no
statistical difference in discriminative power between logistic regression
and support vector machine, with both models achieving ROC-AUC scores of
around 80%. Since the choice of kernel type is crucial to the specific
prediction problem (Noble, 2006; Savas & Dovis, 2019), the similar
performance of SVM and BLR is likely due to the fact that SVM employed
a linear kernel, given that only features with a linear relationship to the
target variable were utilized. These results are comparable with those of
Danades et al. (2016) and Ahmad et al. (2018). These studies compared the
performance of SVM with a linear kernel to that of nonlinear algorithms
such as KNN and RF and showed that SVM with a linear kernel performed
much better. Similar to this thesis’ findings, KNN and RF performed poorly.
However, no explicit information was given regarding the linear or nonlinear
relationships in the data.
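The following sketch illustrates this point on synthetic data with an approximately linear decision boundary, comparing a linear-kernel SVM against logistic regression under 10-fold cross-validation; on such data the two models typically yield nearly identical ROC-AUC scores. The dataset and settings are stand-ins, not those of this thesis.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in with a largely linear relation to the target.
X, y = make_classification(n_samples=1000, n_informative=5, random_state=0)

models = {
    "BLR": LogisticRegression(max_iter=1000),
    "linear SVM": SVC(kernel="linear"),
}
for name, clf in models.items():
    # Standardize features inside the pipeline so scaling is refit per fold.
    pipe = make_pipeline(StandardScaler(), clf)
    scores = cross_val_score(pipe, X, y, cv=10, scoring="roc_auc")
    print(f"{name}: ROC-AUC = {scores.mean():.3f} +/- {scores.std():.3f}")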
The last point is that, although a group-differences analysis was conducted,
the models showed no bias toward predicting one group more or less
accurately. This may be a result of the stratification method utilized
during cross-validation. While the distribution of females and males was
substantially uneven from the start, no model displayed a preference for
correctly classifying females. Surprisingly, for certain features, the
proportion of correct classifications for males was somewhat higher than
that for females. While the literature makes it abundantly clear that gender
disparities exist in terms of mental illness prevalence, type, and severity
(Afifi, 2007), no significant differences in feature values were detected.
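For reference, the sketch below shows the stratification principle with scikit-learn on synthetic stand-in data: stratified splitting preserves the class proportions of the full dataset in every fold, so no test fold is dominated by the majority class.

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

# Stand-in data with a 70/30 class split, mimicking a mild imbalance.
X, y = make_classification(n_samples=1000, weights=[0.7, 0.3], random_state=0)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # The positive share in every test fold stays close to the global 30%.
    print(f"fold {fold}: positive share = {y[test_idx].mean():.2f}")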

6.3 Contributions, limitations, and suggestions for future research

This thesis contributes to the body of knowledge regarding the application of
machine learning to mental health research in a variety of ways. Firstly, this
is one of the first cross-regional studies that attempted to employ machine
learning algorithms to create a prediction model for the early detection of mental
health problems in students. However, to increase the study’s validity, future
research should use external datasets to validate their findings (Christodoulou
et al., 2019). Strengths of this thesis include a systematic approach to
model comparison, the use of a larger sample size than previous studies, and
the tuning of hyperparameters. In addition, it was previously unclear how
the performance of machine learning algorithms compares with that of
traditional methods in predicting student mental health. The findings of
this thesis show that traditional methods,
such as binary logistic regression, should continue to play a key role in mental
health research. This is especially relevant given that traditional methods
are not only simple to implement but are also easily interpretable, whereas
machine learning methods are often viewed as "black boxes" that do not
provide the user with readily available information regarding the importance
of individual features (Kuhle et al., 2018).
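As a small illustration of this interpretability advantage, the sketch below fits a scikit-learn logistic regression on synthetic placeholder data and reads its coefficients off as odds ratios, information that a tree ensemble or neural network does not expose as directly.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic placeholder data; feature indices stand in for named features.
X, y = make_classification(n_samples=500, n_features=5, n_informative=3,
                           random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# exp(coefficient) is the multiplicative change in the odds of the
# positive class per unit increase of the corresponding feature.
for i, coef in enumerate(clf.coef_[0]):
    print(f"feature {i}: odds ratio = {np.exp(coef):.2f}")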
However, several limitations need to be acknowledged. The first limita-
tion is the self-reporting nature of the questionnaire. Although participants
were encouraged to provide accurate responses, there is always the possibility
that they misunderstood the questions or provided socially desirable
responses. That said, there is no real alternative to this problem other
than assessing the psychological status of individuals in a clinical
setting, which is infeasible on a large scale. A second limitation is
that only a relatively small number of features were utilized. With more
variables or observations, it is plausible that the machine learning algorithms
included in this thesis could have been more discriminative. Future research
should investigate methods for utilizing not only the maximum number of
observations but also the maximum number of features. A third limitation is that while the run-time
of models was considered, the primary focus of this thesis was the comparison
of predictive performance. However, in real-life settings, implementation and
applicability could be just as important, which should be investigated by
future research. Lastly, the quality of the features used in the prediction
task could be a potential limitation. The average F1-scores and ROC-AUC
scores achieved by the models in this thesis using only demographic
and contextual characteristics are around 75%. As demonstrated by the error
analysis, although the models are capable of differentiating between the
positive and negative classes, there is a considerable number of incorrect
classifications, as reflected in the false positive and false negative rates. While
demographics alone can predict mental health to a limited extent, these
results seem to suggest that they do not provide sufficient information on
their own. For example, Sano et al. (2015) found an average of 80% accuracy
utilizing only personal data. However, when data from wearable sensors
was included, predictive performance increased. Typically, risk
prediction models use demographic patient data to forecast the occurrence
of a given unfavorable event in conjunction with psychological patient traits
that can be directly related to a particular mental condition, and these
features were not present in the current study (Shatte et al., 2019; Thieme
et al., 2020; Sheldon et al., 2012). Future research could study this topic
further by comparing the predictive performance of models containing solely
features relating to demographic characteristics to those containing both
demographic characteristics and various psychological aspects.
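To make the kind of error analysis referred to above concrete, the following sketch derives false positive and false negative rates from a confusion matrix, alongside F1 and ROC-AUC, using synthetic stand-in data and a plain logistic regression rather than the actual features and tuned models of this thesis.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a demographics-only feature set.
X, y = make_classification(n_samples=2000, weights=[0.7, 0.3], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# scikit-learn lays the confusion matrix out as [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_te, y_pred).ravel()
print(f"FP rate = {fp / (fp + tn):.2f}, FN rate = {fn / (fn + tp):.2f}")
print(f"F1 = {f1_score(y_te, y_pred):.2f}, "
      f"ROC-AUC = {roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]):.2f}")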

7 conclusion

In summary, the purpose of this thesis was to determine whether traditional
psychological methods are outperformed by machine learning algorithms
when predicting the mental health of students. The results of the prediction
analysis indicate that traditional approaches should not be abandoned just
yet. The implications of this thesis are as follows. More research and
resources are required to support mental health prevention and promotion
initiatives in real-world settings such as university life. There are numerous
initiatives that institutions, such as universities, can undertake to either
prevent the onset of mental health problems or promote positive mental
health, in which an early detection-focused prediction model could play
a significant role. Important aspects of future research should therefore
not only include the validation of models, but also their deployment and
applicability.
Bibliography

Abu Alfeilat, H. A., Hassanat, A. B., Lasassmeh, O., Tarawneh, A. S., Alhasanat, M. B., Eyal Salman, H. S., & Prasath, V. S. (2019). Effects of distance measure choice on k-nearest neighbor classifier performance: A review. Big Data, 7(4), 221-248. https://doi.org/10.1089/big.2018.0175
Afifi, M. (2007). Gender differences in mental health. Singapore Medical Journal, 48(5), 385. Retrieved from https://pubmed.ncbi.nlm.nih.gov/
Aguinis, H., Gottfredson, R. K., & Joo, H. (2013). Best-practice recommendations for defining, identifying, and handling outliers. Organizational Research Methods, 16(2), 270-301. https://doi.org/10.1177/1094428112470848
Ahmad, I., Basheri, M., Iqbal, M. J., & Rahim, A. (2018). Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion detection. IEEE Access, 6, 33789-33795. https://doi.org/10.1109/access.2018.2841987
Arango, C., Díaz-Caneja, C. M., McGorry, P. D., Rapoport, J., Sommer, I. E., Vorstman, J. A., ... & Carpenter, W. (2018). Preventive strategies for mental health. The Lancet Psychiatry, 5(7), 591-604. https://doi.org/10.1016/s2215-0366(18)30057-9
Arumugam, S. (2019). Strategies toward building preventive mental health. Indian Journal of Social Psychiatry, 35(3), 164. https://doi.org/10.4103/ijsp.ijsp_92_18
Azadkia, M. (2019). Optimal choice of k for k-nearest neighbor regression. arXiv preprint arXiv:1909.05495. https://doi.org/10.48550/arXiv.1909.05495
Becker, D., van Breda, W., Funk, B., Hoogendoorn, M., Ruwaard, J., & Riper, H. (2018). Predictive modeling in e-mental health: A common language framework. Internet Interventions, 12, 57-67. https://doi.org/10.1016/j.invent.2018.03.002
Bielza, C., & Larrañaga, P. (2020). Data-driven computational neuroscience: Machine learning and statistical models. Cambridge University Press. https://doi.org/10.1017/9781108642989
Bohlmeijer, E. T., & Westerhof, G. J. (2020). A new model for sustainable mental health: Integrating well-being into psychological treatment. In J. N. Kirby & P. Gilbert (Eds.), Making an impact on mental health: The applications of psychological research (pp. 153–188). Routledge. https://doi.org/10.4324/9780429244551-7
Bolier, L., Haverman, M., Westerhof, G. J., Riper, H., Smit, F., & Bohlmeijer, E. (2013). Positive psychology interventions: A meta-analysis of randomized controlled studies. BMC Public Health, 13(1), 1-20. https://doi.org/10.1186/1471-2458-13-119
Bone, D., Goodwin, M. S., Black, M. P., Lee, C. C., Audhkhasi, K., & Narayanan, S. (2015). Applying machine learning to facilitate autism diagnostics: Pitfalls and promises. Journal of Autism and Developmental Disorders, 45(5), 1121-1136. https://doi.org/10.1007/s10803-014-2268-6
Bruffaerts, R., Mortier, P., Kiekens, G., Auerbach, R. P., Cuijpers, P., Demyttenaere, K., ... & Kessler, R. C. (2018). Mental health problems in college freshmen: Prevalence and academic functioning. Journal of Affective Disorders, 225, 97-103. https://doi.org/10.1016/j.jad.2017.07.044
Bzdok, D., & Meyer-Lindenberg, A. (2018). Machine learning for precision psychiatry: Opportunities and challenges. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 3(3), 223-230. https://doi.org/10.1016/j.bpsc.2017.11.007
Cao, W., & Zhang, W. (2020). Machine learning of partial differential equations from noise data. arXiv preprint arXiv:2010.06507. https://doi.org/10.48550/arXiv.2010.06507
Chatmon, B. N. (2020). Males and mental health stigma. American Journal of Men's Health, 14(4), 1557988320949322. https://doi.org/10.1177/1557988320949322
Chen, Y., Zhang, M., Bai, M., & Chen, W. (2019). Improving the signal-to-noise ratio of seismological datasets by unsupervised machine learning. Seismological Research Letters, 90(4), 1552-1564. https://doi.org/10.1785/0220190028
Christodoulou, E., Ma, J., Collins, G. S., Steyerberg, E. W., Verbakel, J. Y., & Van Calster, B. (2019). A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. Journal of Clinical Epidemiology, 110, 12-22. https://doi.org/10.1016/j.jclinepi.2019.02.004
Claesen, M., & De Moor, B. (2015). Hyperparameter search in machine learning. arXiv preprint arXiv:1502.02127. https://doi.org/10.48550/arXiv.1502.02127
Colizzi, M., Lasalvia, A., & Ruggeri, M. (2020). Prevention and early intervention in youth mental health: Is it time for a multidisciplinary and trans-diagnostic model for care? International Journal of Mental Health Systems, 14(1), 1-14. https://doi.org/10.1186/s13033-020-00356-9
Conway, M., & O'Connor, D. (2016). Social media, big data, and mental health: Current advances and ethical implications. Current Opinion in Psychology, 9, 77-82. https://doi.org/10.1016/j.copsyc.2016.01.004
Cui, Z., & Gong, G. (2018). The effect of machine learning regression algorithms and sample size on individualized behavioral prediction with functional connectivity features. NeuroImage, 178, 622-637. https://doi.org/10.1016/j.neuroimage.2018.06.001
Danades, A., Pratama, D., Anggraini, D., & Anggriani, D. (2016, October). Comparison of accuracy level K-nearest neighbor algorithm and support vector machine algorithm in classification water quality status. In 2016 6th International Conference on System Engineering and Technology (ICSET) (pp. 137-141). IEEE. https://doi.org/10.1109/icsengt.2016.7849638
Deo, R. C., & Nallamothu, B. K. (2016). Learning about machine learning: The promise and pitfalls of big data and the electronic health record. Circulation: Cardiovascular Quality and Outcomes, 9(6), 618-620. https://doi.org/10.1161/circoutcomes.116.003308
DeVries, Z., Locke, E., Hoda, M., Moravek, D., Phan, K., Stratton, A., ... & Phan, P. (2021). Using a national surgical database to predict complications following posterior lumbar surgery and comparing the area under the curve and F1-score for the assessment of prognostic capability. The Spine Journal, 21(7), 1135-1142. https://doi.org/10.1016/j.spinee.2021.02.007
Dreiseitl, S., & Ohno-Machado, L. (2002). Logistic regression and artificial neural network classification models: A methodology review. Journal of Biomedical Informatics, 35(5-6), 352-359. https://doi.org/10.1016/s1532-0464(03)00034-0
Ennis, M., Hinton, G., Naylor, D., Revow, M., & Tibshirani, R. (1998). A comparison of statistical learning methods on the GUSTO database. Statistics in Medicine, 17(21), 2501-2508. https://doi.org/10.1002/(sici)1097-0258(19981115)17:21<2501::aid-sim938>3.0.co;2-m
Fan, J., & Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. arXiv preprint math/0602133. https://doi.org/10.4171/022-3/31
Fan, J., Upadhye, S., & Worster, A. (2006). Understanding receiver operating characteristic (ROC) curves. Canadian Journal of Emergency Medicine, 8(1), 19-20. https://doi.org/10.1017/s1481803500013336
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Hassan, U. K., Nawi, N. M., & Kasim, S. (2014, May). Classify a protein domain using sigmoid support vector machine. In 2014 International Conference on Information Science & Applications (ICISA) (pp. 1-4). IEEE.
Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction (Vol. 2, pp. 1-758). New York: Springer.
Horwitz, A. G., McGuire, T., Busby, D. R., Eisenberg, D., Zheng, K., Pistorello, J., ... & King, C. A. (2020). Sociodemographic differences in barriers to mental health care among college students at elevated suicide risk. Journal of Affective Disorders, 271, 123-130. https://doi.org/10.1016/j.jad.2020.03.115
Jacobucci, R., Littlefield, A. K., Millner, A. J., Kleiman, E., & Steinley, D. (2020). Pairing machine learning and clinical psychology: How you evaluate predictive performance matters. Center for Open Science. https://doi.org/10.31234/osf.io/2yber
Joshi, D. J., Makhija, M., Nabar, Y., Nehete, N., & Patwardhan, M. S. (2018, January). Mental health analysis using deep learning for feature extraction. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (pp. 356-359). https://doi.org/10.1145/3152494.3167990
Keyes, C. (2007). Towards a mentally flourishing society: Mental health promotion, not cure. Journal of Public Mental Health, 6(2), 4-7. https://doi.org/10.1108/17465729200700009
Khanzode, K. C. A., & Sarode, R. D. (2020). Advantages and disadvantages of artificial intelligence and machine learning: A literature review. International Journal of Library & Information Science (IJLIS), 9(1), 3. https://paper.researchbib.com/view/paper/275760
Kononenko, I. (2001). Machine learning for medical diagnosis: History, state of the art and perspective. Artificial Intelligence in Medicine, 23(1), 89-109. https://doi.org/10.1016/s0933-3657(01)00077-x
Kornbrot, D. (2014). Point biserial correlation. Wiley StatsRef: Statistics Reference Online. https://doi.org/10.1002/9781118445112.stat06227
Kotsiantis, S. B. (2013). Decision trees: A recent overview. Artificial Intelligence Review, 39(4), 261-283. https://doi.org/10.1007/s10462-011-9272-4
Kuhle, S., Maguire, B., Zhang, H., Hamilton, D., Allen, A. C., Joseph, K. S., & Allen, V. M. (2018). Comparison of logistic regression with machine learning methods for the prediction of fetal growth abnormalities: A retrospective cohort study. BMC Pregnancy and Childbirth, 18(1), 1-9. https://doi.org/10.1186/s12884-018-1971-2
Kukull, W. A., & Ganguli, M. (2012). Generalizability: The trees, the forest, and the low-hanging fruit. Neurology, 78(23), 1886-1891. https://doi.org/10.1212/wnl.0b013e318258f812
Kursa, M. B., & Rudnicki, W. R. (2010). Feature selection with the Boruta package. Journal of Statistical Software, 36, 1-13. https://doi.org/10.18637/jss.v036.i11
Leijdesdorff, S. M. J., Huijs, C. E. M., Klaassen, R. M. C., Popma, A., van Amelsvoort, T. A. M. J., & Evers, S. M. A. A. (2020). Burden of mental health problems: Quality of life and cost-of-illness in youth consulting Dutch walk-in youth health centres. Journal of Mental Health, 1-8. https://doi.org/10.1111/inm.12189
Lipson, S. K., & Eisenberg, D. (2018). Mental health and academic attitudes and expectations in university populations: Results from the healthy minds study. Journal of Mental Health, 27(3), 205-213. https://doi.org/10.1080/09638237.2017.1417567
Louppe, G. (2014). Understanding random forests: From theory to practice. arXiv preprint arXiv:1407.7502. https://doi.org/10.48550/arXiv.1407.7502
Luo, W., Phung, D., Tran, T., Gupta, S., Rana, S., Karmakar, C., ... & Berk, M. (2016). Guidelines for developing and reporting machine learning predictive models in biomedical research: A multidisciplinary view. Journal of Medical Internet Research, 18(12), e5870. https://doi.org/10.2196/jmir.5870
Majumder, S., Mondal, T., & Deen, M. J. (2017). Wearable sensors for remote health monitoring. Sensors, 17(1), 130. https://doi.org/10.3390/s17010130
Margraf, J., Zhang, X. C., Lavallee, K. L., & Schneider, S. (2020). Longitudinal prediction of positive and negative mental health in Germany, Russia, and China. PLoS One, 15(6), e0234997. https://doi.org/10.1371/journal.pone.0234997
McCloughen, A., Foster, K., Huws-Thomas, M., & Delgado, C. (2012). Physical health and wellbeing of emerging and young adults with mental illness: An integrative review of international literature. International Journal of Mental Health Nursing, 21(3), 274-288. https://doi.org/10.1111/j.1447-0349.2011.00796.x
Mihalopoulos, C., Vos, T., Pirkis, J., & Carter, R. (2011). The economic analysis of prevention in mental health programs. Annual Review of Clinical Psychology, 7, 169-201. https://doi.org/10.1146/annurev-clinpsy-032210-104601
Movahedi, F., Padman, R., & Antaki, J. F. (2020). Limitations of ROC on imbalanced data: Evaluation of LVAD mortality risk scores. arXiv preprint arXiv:2010.16253. https://doi.org/10.1016/j.healun.2021.01.1160
National Research Council. (2009). Preventing mental, emotional, and behavioral disorders among young people: Progress and possibilities. National Academies Press.
Nembrini, S., König, I. R., & Wright, M. N. (2018). The revival of the Gini importance? Bioinformatics, 34(21), 3711-3718. https://doi.org/10.1093/bioinformatics/bty373
Newson, J. J., & Thiagarajan, T. C. (2020). Assessment of population well-being with the Mental Health Quotient (MHQ): Development and usability study. JMIR Mental Health, 7(7), e17935. https://doi.org/10.2196/17935
Newson, J., & Thiagarajan, T. (2021). Dynamic dataset of global population mental wellbeing. https://doi.org/10.31234/osf.io/vtzne
Newson, J., Pastukh, V., & Thiagarajan, T. (2021). Reliability and validity of the Mental Health Quotient (MHQ). https://doi.org/10.31234/osf.io/n7e9p
Nick, T. G., & Campbell, K. M. (2007). Logistic regression. Topics in Biostatistics, 273-301.
Noble, W. S. (2006). What is a support vector machine? Nature Biotechnology, 24(12), 1565-1567. https://doi.org/10.1038/nbt1206-1565
Nusinovici, S., Tham, Y. C., Yan, M. Y. C., Ting, D. S. W., Li, J., Sabanayagam, C., ... & Cheng, C. Y. (2020). Logistic regression was as good as machine learning for predicting major chronic diseases. Journal of Clinical Epidemiology, 122, 56-69. https://doi.org/10.1016/j.jclinepi.2020.03.002
Oshiro, T. M., Perez, P. S., & Baranauskas, J. A. (2012, July). How many trees in a random forest? In International Workshop on Machine Learning and Data Mining in Pattern Recognition (pp. 154-168). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31537-4_13
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., ... Chintala, S. (2019). PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32 (pp. 8024–8035). Curran Associates, Inc. Retrieved from http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., ... others. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(Oct), 2825–2830.
Purtle, J., Nelson, K. L., Counts, N. Z., & Yudell, M. (2020). Population-based approaches to mental health: History, strategies, and evidence. Annual Review of Public Health, 41, 201-221. https://doi.org/10.1146/annurev-publhealth-040119-094247
Qian, G., Sural, S., Gu, Y., & Pramanik, S. (2004, March). Similarity between Euclidean and cosine angle distance for nearest neighbor queries. In Proceedings of the 2004 ACM Symposium on Applied Computing (pp. 1232-1237). https://doi.org/10.1145/967900.968151
Rajkomar, A., Dean, J., & Kohane, I. (2019). Machine learning in medicine. New England Journal of Medicine, 380(14), 1347-1358. https://doi.org/10.1056/nejmra1814259
Rajkomar, A., Oren, E., Chen, K., Dai, A. M., Hajaj, N., Hardt, M., ... & Dean, J. (2018). Scalable and accurate deep learning with electronic health records. NPJ Digital Medicine, 1(1), 1-10. https://doi.org/10.1038/s41746-018-0029-1
Raschka, S. (2014). An overview of general performance metrics of binary classifier systems. arXiv preprint arXiv:1410.5330. https://doi.org/10.48550/arXiv.1410.5330
Refaeilzadeh, P., Tang, L., & Liu, H. (2009). Cross-validation. Encyclopedia of Database Systems, 5, 532-538. https://doi.org/10.1007/978-0-387-39940-9_565
Rosenbusch, H., Soldner, F., Evans, A. M., & Zeelenberg, M. (2021). Supervised machine learning methods in psychology: A practical introduction with annotated R code. Social and Personality Psychology Compass, 15(2), e12579. https://doi.org/10.1111/spc3.12579
Rubeis, G. (2022). iHealth: The ethics of artificial intelligence and big data in mental healthcare. Internet Interventions, 28, 100518. https://doi.org/10.1016/j.invent.2022.100518
Saeb, S., Lonini, L., Jayaraman, A., Mohr, D. C., & Kording, K. P. (2017). The need to approximate the use-case in clinical machine learning. GigaScience, 6(5), gix019. https://doi.org/10.1093/gigascience/gix019
Sano, A., Phillips, A. J., Yu, A. Z., McHill, A. W., Taylor, S., Jaques, N., ... & Picard, R. W. (2015, June). Recognizing academic performance, sleep quality, stress level, and mental health using personality traits, wearable sensors and mobile phones. In 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN) (pp. 1-6). IEEE. https://doi.org/10.1109/bsn.2015.7299420
Savas, C., & Dovis, F. (2019). The impact of different kernel functions on the performance of scintillation detection based on support vector machines. Sensors, 19(23), 5219. https://doi.org/10.3390/s19235219
Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and statistical modeling with Python. In 9th Python in Science Conference (pp. 92-96). https://doi.org/10.25080/Majora-92bf1922-011
Selvaraj, P. R., & Bhat, C. S. (2018). Predicting the mental health of college students with psychological capital. Journal of Mental Health, 27(3), 279-287. https://doi.org/10.1080/09638237.2018.1469738
Shatte, A. B., Hutchinson, D. M., & Teague, S. J. (2019). Machine learning in mental health: A scoping review of methods and applications. Psychological Medicine, 49(9), 1426-1448. https://doi.org/10.1017/s0033291719000151
Shiffman, S., Stone, A. A., & Hufford, M. R. (2008). Ecological momentary assessment. Annual Review of Clinical Psychology, 4, 1-32. https://doi.org/10.1146/annurev.clinpsy.3.022806.091415
Shrout, P., & Lane, S. P. (2012). Psychometrics. In M. R. Mehl & T. S. Conner (Eds.), Handbook of research methods for studying daily life (pp. 302-320). Guilford Press.
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
Srividya, M., Mohanavalli, S., & Bhalaji, N. (2018). Behavioral modeling for mental health using machine learning algorithms. Journal of Medical Systems, 42(5), 1-12. https://doi.org/10.1007/s10916-018-0934-5
Stathakis, D. (2009). How many hidden layers and nodes? International Journal of Remote Sensing, 30(8), 2133-2147. https://doi.org/10.1080/01431160802549278
Teng, H. W., & Lee, M. (2019). Estimation procedures of using five alternative machine learning methods for predicting credit card default. In Handbook of Financial Econometrics, Mathematics, Statistics, and Machine Learning (pp. 3545-3572). https://doi.org/10.1142/9789811202391_0101
Thieme, A., Belgrave, D., & Doherty, G. (2020). Machine learning in mental health: A systematic review of the HCI literature to support the development of effective and implementable ML systems. ACM Transactions on Computer-Human Interaction (TOCHI), 27(5), 1-53. https://doi.org/10.1145/3398069
Thieme, A., Wallace, J., Meyer, T. D., & Olivier, P. (2015). Designing for mental wellbeing: Towards a more holistic approach in the treatment and prevention of mental illness. In Proceedings of the 2015 British HCI Conference (pp. 1-10). Association for Computing Machinery. https://doi.org/10.1145/2783446.2783586
Thomas, A. J., Walters, S. D., Gheytassi, S. M., Morgan, R. E., & Petridis, M. (2016). On the optimal node ratio between hidden layers: A probabilistic study. International Journal of Machine Learning and Computing, 6(5), 241. https://doi.org/10.18178/ijmlc.2016.6.5.605
van der Ploeg, T., Austin, P. C., & Steyerberg, E. W. (2014). Modern modelling techniques are data hungry: A simulation study for predicting dichotomous endpoints. BMC Medical Research Methodology, 14(1), 1-13. https://doi.org/10.1186/1471-2288-14-137
Vélez, J. I. (2021). Machine learning based psychology: Advocating for a data-driven approach. International Journal of Psychological Research, 14(1), 6-11. https://doi.org/10.21500/20112084.5365
Verplancke, T., Van Looy, S., Benoit, D., Vansteelandt, S., Depuydt, P., De Turck, F., & Decruyenaere, J. (2008). Support vector machine versus logistic regression modeling for prediction of hospital mortality in critically ill patients with haematological malignancies. BMC Medical Informatics and Decision Making, 8(1), 1-8. https://doi.org/10.1186/1472-6947-8-56
Walsh, I., Pollastri, G., & Tosatto, S. C. (2016). Correct machine learning on protein sequences: A peer-reviewing perspective. Briefings in Bioinformatics, 17(5), 831-840. https://doi.org/10.1093/bib/bbv082
Wongkoblap, A., Vadillo, M. A., & Curcin, V. (2017). Researching mental health disorders in the era of social media: Systematic review. Journal of Medical Internet Research, 19(6), e228. https://doi.org/10.2196/jmir.7215
World Health Organization. (2004). Promoting mental health: Concepts, emerging evidence, practice: Summary report. World Health Organization. https://apps.who.int/iris/bitstream/handle/10665/42940/9241591595.pdf?sequence=1&isAllowed=y
Wu, L. T., Pilowsky, D. J., Schlenger, W. E., & Hasin, D. (2007). Alcohol use disorders and the use of treatment services among college-age young adults. Psychiatric Services, 58(2), 192-200. https://doi.org/10.1176/appi.ps.58.2.192
Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science, 12(6), 1100-1122. https://doi.org/10.1177/1745691617693393
Zhang, H. (2004). The optimality of naive Bayes. AA, 1(2), 3.
APPENDIX A

Composite scores

Overall mental health:
Overall MHQ

Dimensions of mental health:
Core Cognition
Complex Cognition
Mood & Outlook
Drive & Motivation
Social Self
Mind-Body Connection

Individual indicators of mental health

Spectrum items:
Adaptability to change
Self-worth and confidence
Creativity and problem solving
Drive and motivation
Stability and calmness
Sleep quality
Self-control and impulsivity
Ability to learn
Coordination
Relationships with others
Emotional resilience
Planning and organization
Physical intimacy
Speech and language
Memory
Decision making and risk-taking
Social interactions and co-operation
Energy level
Curiosity, interest, and enthusiasm
Emotional control
Focus and concentration
Appetite regulation
Empathy
Sensory sensitivity
Self-image
Outlook and optimism
Selective attention

Problem items:
Restlessness and hyperactivity
Fear and anxiety
Susceptibility to infection
Aggression toward others
Avoidance and withdrawal
Unwanted, strange, or obsessive thoughts
Mood swings
Sense of being detached from reality
Nightmares
Addictions
Anger and irritability
Suicidal thoughts or intentions
Experience of pain
Guilt and blame
Hallucinations
Traumatic flashbacks
Repetitive or compulsive actions
Feelings of sadness, distress, and hopelessness
Physical health issues
Confusion or slowed thinking

Contextual factors
Age
Gender
Time of day
Country
State
Zip Code
Race/Ethnicity
Income
Employment
Job Role
Household
Education
How satisfied are you with your life in general?
In general, I get as much sleep as I need:
How regularly do you engage in physical exercise (30 minutes or more)?
How regularly do you socialize with friends in person?
Please select which substances you consume regularly (at least every
week)
Do you have a diagnosed medical disorder that significantly impacts
your way of life?
Medical Condition
Are you currently seeking treatment for any mental health concerns?
You answered "No" to the previous question. Please explain further by
selecting the following/ What kind of mental health support have you
sought/are you currently seeking?
Life Trauma
How would you describe your overall mood right now on a scale from
very negative to very positive?
How mentally alert are you feeling right now?
Approximately how many hours did you sleep last night?
Approximately how many hours ago was your last meal?
Physical complaints
Are you currently pregnant?
Covid-health impact
Covid-financial and social impact
