Machine Learning Techniques For Prediction of Mental Health
Abstract— Suicide is the second leading cause of death in the world for those aged 15-24, and there are about 800,000 victims of suicide yearly across all age groups, roughly one every 40 seconds. Behavioural health disorders, particularly depression, are the kind of health concern that not many are aware of, and there is no way one can get treatment for something one is not aware of. Classifying potentially disordered individuals is therefore the first step towards prevention. Lifestyle is what defines an individual best: income, age group, marital status, children, property owned, alcohol or tobacco consumption, medical expenditure, insurance and other types of investment, and many more. Using 76 such attributes, the model predicts whether an individual is a victim of depression or not. The proposed work applies eight mainstream machine learning algorithms, namely Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes (NB), Logistic Regression (LR), XGBoost (XGB), Gradient Boosting Classifier (GBC) and Artificial Neural Network (ANN), to build prediction models using a large dataset (a survey of 1,429 individuals), resulting in accurate and efficient predictions. By using various strategies and different models, this work attempts to obtain a clear and precise picture: the more precise the information, the better the models work, and the more suicide cases can be prevented. The best result obtained was an accuracy of 87.38 percent, achieved using the Support Vector Machine (SVM).
(SVM), Naïve Bayes (NB), Logistic Regression (LR), XGBoost (XGB), Gradient Boosting Classifier (GBC) and Artificial Neural Network (ANN).

A. Dataset

The dataset used has 76 attributes, some of which are given below with their details [11]:

1. surveyed: Individual identifier
2. village: Village identifier
3. cons_nondurable: Non-durable expenditure (USD)
4. asset_savings: Savings
5. cons_alcohol: Alcohol (USD)
6. cons_tobacco: Tobacco (USD)
7. saved_mpesa: Saved money using M-Pesa
8. durable_investment: Durable investments
9. ed_expenses: Education expenditure in the past month (USD)
10. early_survey: Psychology survey in the 1st wave (dummy)
11. depressed: Meets the epidemiological threshold for moderate depression (dummy)

B. Data Pre-processing

Data pre-processing is the process of converting raw data into meaningful information on which logical operations can be performed to derive results [3]. It is one of the most crucial steps in building a machine learning model, since in this phase we decide whether the information is useful to us and what insights can be drawn from it.

In this phase we clean the data: we remove null values and assign replacement values derived from similar records or from the mean of the data [13]. We may also encounter NA values, which must be handled, and this can be done in different ways [18]. Feature scaling is used to bring values of different ranges onto the same scale, so that operations can be performed easily.

The dataset contained a date column; since we did not want the prediction to be based on the date, or the date to act as an influencing factor, we dropped that column. Out of 86,868 values, 13,196 were NaN; we removed these and applied the median function to fill them.

MODEL SELECTION

There are many strategies that can be used to find relevant results. The algorithms can be characterised into two groups: supervised learning and unsupervised learning [14]. In supervised learning, the machine is trained using labelled information, and the result is then anticipated on the grounds of the data given to the system. Using this information, the model identifies a pattern from which the result is predicted; these methods are addressed as regression and classification. In unsupervised learning, the information given to the machine has no labels, so the machine has to find its own structure by connecting various parameters [4]. It is used in circumstances where we only need some connection between data points, or some hidden pattern that is nearly impossible to find with the human eye; since it is done without human intervention, the chance of anomaly also decreases.

In our dataset, the dependent (result) variable has two possible outcomes: the individual is either depressed or not depressed. Since the dataset is labelled, we applied classification algorithms of supervised learning. Eight distinct classification algorithms of machine learning were used:

1. Decision Tree (DT)
2. Random Forest (RF)
3. Support Vector Machine (SVM)
4. Naïve Bayes (NB)
5. Logistic Regression (LR)
6. XGBoost (XGB)
7. Gradient Boosting Classifier (GBC)
8. Artificial Neural Network (ANN)

A. Decision Tree (DT)

The decision tree is a concept that can be used for both classification and regression. It is mostly used for classification, as it is one of the most powerful and efficient tools [12]. A decision tree is a tree-like classifier: given a dataset as input, the algorithm builds a model from it, which can then be used to predict results for new input values; the result is anticipated on the basis of the input dataset [6]. In a decision tree, each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node is treated as a class label (terminal node) [17]. A decision tree is drawn upside down: it starts from the top, meaning its root is located at the top and its leaves at the bottom.

A decision tree has three kinds of nodes:

1. Chance node, represented by a circle
2. Decision node, represented by a square
3. End node, represented by a triangle

Construction of the decision tree

The decision tree is built on the basis of the outcomes of the dependent attribute. For that we have to find the entropy. Entropy measures the mix of "yes" and "no" (1 and 0) values in an attribute. The entropy of the class attribute is given by the formula:

E(S) = -p(yes) log2 p(yes) - p(no) log2 p(no)
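As an illustrative sketch (not taken from the paper), the entropy of a binary class column can be computed directly in Python; the 0/1 values here stand in for a dummy variable such as `depressed`:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the class proportions in `labels`."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

# A perfectly balanced column is maximally impure, a pure column has entropy 0.
print(entropy([0, 1, 0, 1]))  # 1.0
```

A node whose samples all share one class contributes zero entropy, which is why splits are chosen to drive the child nodes towards purity.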
Fig. 3: Support Vector Machine classifier

In SVM, class labels are denoted as -1 for the negative class and +1 for the positive class.

Considering the two child nodes produced by a split, the importance of node j is:

ni(j) = W(j) C(j) - W(left(j)) C(left(j)) - W(right(j)) C(right(j))

where:
ni(j) = importance of node j
W(j) = weighted number of samples reaching node j
C(j) = impurity level of node j
left(j), right(j) = child nodes from the left and right split on node j
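As a minimal sketch of the node-importance formula (the sample counts and impurity values below are hypothetical, not from the paper's dataset):

```python
def node_importance(w, c, w_left, c_left, w_right, c_right):
    """ni(j) = W(j)C(j) - W(left(j))C(left(j)) - W(right(j))C(right(j))."""
    return w * c - w_left * c_left - w_right * c_right

# Hypothetical split: 100 samples with impurity 0.5 at node j, split into
# 60 samples with impurity 0.25 and 40 samples with impurity 0.125.
ni = node_importance(100, 0.5, 60, 0.25, 40, 0.125)
print(ni)  # 30.0
```

A large ni(j) means the split at node j removed a large amount of weighted impurity; summing these values per splitting feature (and normalising) is how tree-based feature importances are typically derived.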
F. XGBoost (XGB)

Table 4: Evaluation parameters for Naïve Bayes

Bar Graph 1: Accuracy comparison of all the applied methods
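The overall pipeline of the work, median imputation, feature scaling, then training several classifiers and comparing their accuracies, can be sketched with scikit-learn. This is an illustrative sketch only: the synthetic data below stands in for the survey dataset, and XGBoost and the ANN are omitted because they require libraries beyond scikit-learn.

```python
# Sketch: impute missing values with the median, scale features, then
# compare the accuracy of six of the listed classifiers on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # binary target: depressed or not
X[rng.random(X.shape) < 0.1] = np.nan     # ~10% missing values, as in raw surveys

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "NB": GaussianNB(),
    "LR": LogisticRegression(),
    "GBC": GradientBoostingClassifier(random_state=0),
}

accuracies = {}
for name, model in models.items():
    pipe = make_pipeline(
        SimpleImputer(strategy="median"),  # fill NaN with the column median
        StandardScaler(),                  # scale features to a common range
        model,
    )
    pipe.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, pipe.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.4f}")
```

In the paper's setting the accuracies collected this way are what Bar Graph 1 compares across the eight methods.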
[7] Chancellor, Stevie, Eric P. S. Baumer, and Munmun De Choudhury. "Who is the 'Human' in Human-Centered Machine Learning: The Case of Predicting Mental Health from Social Media." Proceedings of the ACM on Human-Computer Interaction 3.CSCW (2019): 1-32.

[18] Jain, Tarun, et al. "Performance Prediction for Crop Irrigation Using Different Machine Learning Approaches." Examining the Impact of Deep Learning and IoT on Multi-Industry Applications. IGI Global, 2021. 61-79.