Volume 3, Issue 3, March 2025, International Journal of Modern Science and Research Technology

ISSN No: 2584-2706

Leveraging Machine Learning for Predicting Mental Health Outcomes: A Data-Driven Approach

Devesh Kumar¹, SOET Department, K.R. Mangalam University, Gurugram, India
Dr. Shweta Bansal², SOET Department, K.R. Mangalam University, Gurugram, India

IJMSRT25MAR035, www.ijmsrt.com, p. 132
DOI: https://doi.org/10.5281/zenodo.15099218

Abstract
This study examines how machine learning models can be used to predict mental health risks. It compares several algorithms: K-Nearest Neighbors (KNN), Logistic Regression (LR), Decision Trees (DT), Random Forest Classifier (RFC), AdaBoost Classifier, and Gradient Boosting Classifier. The SMOTEENN approach was applied to reduce class imbalance in the dataset, which improves both the balance of the data and the overall predictive performance of the models. Hyperparameter tuning optimized the model parameters and improved accuracy and F1 scores across all models. L1 and L2 regularization were applied to reduce overfitting and make the models more reliable. With these steps in place, the Random Forest Classifier outperformed the other algorithms, reaching an accuracy of 86.66%. These findings highlight the potential role of machine learning in the early detection and proactive management of mental health risks, and they suggest that data-driven approaches can give mental health professionals new insights. The study therefore contributes to the growing literature on mental health analytics and underscores the importance of robust methodologies for predicting mental health outcomes.
Keywords: SVM, KNN, SMOTE, Random Forest Classifier, Decision Trees

Introduction
General well-being and mental health are very important because they affect both individuals and society. The World Health Organization describes mental health as more than the mere absence of mental illness. In its description, mental health is a state of equilibrium in which an individual can use their full capacity, work productively, adapt to physical, psychological, and social environments, and participate in social life. Nevertheless, mental illness has become more common. One reason is that in high-stress technology workplaces, job demands usually far outstrip the available resources. Alongside this growing awareness, there is increasing recognition that preventing mental health risks and treating them effectively improves treatment outcomes, reduces the stigma attached to mental illness, and enhances workplace productivity.
Despite increasing awareness of mental health issues, there is still a lack of tools that predict who is likely to develop a mental health disorder. Traditional assessments depend heavily on self-reported data or clinician judgment, which are not fully reliable and are not very fine-grained. This presents a major opportunity for machine learning (ML) techniques, which can serve as a promising alternative. Data-driven ML approaches may discover intricate patterns and
interconnections within datasets that often go undetected with traditional methods. To this end, this study employs several machine learning algorithms: Decision Trees (DT), Random Forest Classifier (RFC), K-Nearest Neighbors (KNN), Logistic Regression (LR), and ensemble methods such as the AdaBoost and Gradient Boosting classifiers. Each algorithm has its own characteristics and strengths and suits different aspects of the problem. For instance, a Decision Tree is more interpretable, whereas ensemble methods such as Random Forest and AdaBoost improve predictive performance by combining multiple models.
Class imbalance is often one of the key challenges in predictive modeling with mental health data. In such data, the number of individuals who have sought treatment is typically much smaller than the number who have not. This imbalance can lead to biased models that favor the majority class. This study applies the SMOTE-ENN approach to address the problem. SMOTE generates synthetic samples for the minority class, and ENN removes noisy, misclassified instances. As a result, our models learn better from minority-class examples.
In addition to handling class imbalance, we tune the hyperparameters of our machine learning models to find optimal configurations. Hyperparameter tuning is a systematic search through parameter configurations for the one that yields the highest model performance. This step is critical because the choice of hyperparameters can cause large variations in accuracy and generalizability. We therefore use grid search and cross-validation to find optimal hyperparameter values for each algorithm.
Furthermore, two regularization procedures, L1 (Lasso) and L2 (Ridge), are employed to make the models robust to the data and to prevent overfitting. Regularization adds a penalty to the loss function that discourages overly complex models. Such models typically perform well on training data but poorly on unseen data. We also use these techniques to make our predictive models more robust and interpretable.
Our analysis shows that machine learning models can predict mental health risk cases fairly accurately. The Random Forest Classifier performed best, with an accuracy of 86.66%. This result suggests that machine learning methods could support assessment and intervention planning in mental health settings. If individuals at risk are identified early, they can be reached in time and appropriate interventions can be made.
It is hoped that exploring these methods will advance the understanding of the risk factors related to mental health. This work may also contribute to strategies that proactively improve mental health management, ultimately improving the quality of life of individuals affected by mental health disorders. The incorporation of machine learning into mental health assessment is promising. It may lead both to a better understanding of mental health disorders and to much-needed improvements in addressing mental health concerns in an increasingly demanding world.

Literature Review
Research on applying machine learning models to predict mental health risks is an increasingly important field. This study uses various algorithms, including K-Nearest Neighbors, Logistic Regression, Decision Trees,
Random Forest Classifier, AdaBoost Classifier, and Gradient Boosting Classifier, to examine their contributions to this highly relevant issue in both practice and research (Wang et al., 2020) [1], (Shin et al., 2020) [2].
Current research suggests that big data analytics and AI hold promise for mental health care. Machine learning techniques may have an advantage because they can incorporate a wider range of variables and observations into a model and predict outcomes without following pre-programmed rules (Wang et al., 2020). Such data-driven approaches have therefore been applied to predict various health outcomes, including mental disorders such as postpartum depression (Shin et al., 2020) [2].
This research addresses the dataset's class imbalance using the Synthetic Minority Over-sampling Technique with Edited Nearest Neighbors (SMOTE-ENN). This approach improves the balance of the dataset and the generalization of the developed models (Ahsan & Siddique, 2022) [4], (Shin et al., 2020), (Rosenfeld et al., 2019) [3].
Hyperparameter tuning, together with regularization methods, helped fine-tune the model parameters and prevent overfitting, with the aim of increasing the accuracy of the model's outputs. Preliminary results show that the Random Forest Classifier outperformed the other algorithms, achieving around 80% accuracy and an F1 score of 0.78 (Rosenfeld et al., 2019), (Akuamoah-Boateng et al., 2020) [5], (Wang et al., 2020) [1], (Shin et al., 2020) [2].
In [6], the authors represented social media posts as vector embeddings from large language models, such as OpenAI's GPT-3 embeddings. These embeddings capture semantic meaning and linguistic nuances indicative of mental health. Machine learning models were trained on more than 10,000 labeled posts using SVM, random forests, XGBoost, KNN, and neural networks. SVM achieved the highest accuracy, around 83%, for detecting signs of stress. GPT-3 embeddings captured more nuanced mental health signals than traditional textual analysis, offering a promising, effective, and scalable tool for screening for stress disorders from online data.
In [7], machine learning models, including XGBoost and regularized logistic regression, were developed to predict Type 2 diabetes risk among patients with mental illness. The authors used routine clinical data for 74,880 patients and derived 1,343 potential predictors from 51 variables covering demographic, diagnostic, treatment, and laboratory information. The best-performing model, XGBoost, identified patients at high risk of developing Type 2 diabetes approximately 2.7 years before diagnosis. It achieved an area under the ROC curve of 0.84 and provided early risk warnings for 31% of the patients who eventually developed Type 2 diabetes. This illustrates the potential of predictive analytics for preemptive health interventions in high-risk populations.
The study [8] reviews machine learning and deep learning methods in healthcare systems in response to the growing global burden of mental illness, including rising cases of depression and anxiety. Following the PRISMA methodology, the authors reviewed 33 articles covering various mental health conditions, including schizophrenia, bipolar disorder, and post-traumatic stress disorder (PTSD). The studies were grouped by methodology according to the conditions addressed, showing the broad range of ML and DL techniques used in mental health applications.
The work [9] is an integrative review that examines the integration of AI and ML
decision support systems into mental health care settings, reviewing literature from 2016 to 2021. A dominant theme it identified was trust and confidence. The study found significant barriers to the adoption of AI-based systems in clinical practice: uncertainty about clinician trust, end-user acceptance, and system transparency impedes effective implementation. The study therefore calls for more research into clinicians' attitudes toward AI, in order to build confidence and accelerate adoption in mental health care settings.
The systematic review [10] analyzed 184 studies that used machine learning (ML) to identify mental health (MH) disorders from multimodal data. These data were collected through audio and video recordings, social media interactions, smartphones, and wearable devices. The review emphasized the feature extraction and fusion phases. It found that neural network architectures have become widely popular for handling high-dimensional data and for modeling relationships between data modalities. Its findings suggest that combining different data sources improves accuracy in detecting MH disorders.
Recent research thus shows machine learning methods advancing toward the prediction of mental health outcomes, highlighting the roles of advanced algorithms, preprocessing techniques, regularization, and ethical considerations. Our contribution advances this understanding by combining SMOTE-ENN, hyperparameter tuning, and regularization to improve the predictive accuracy and applicability of ML models in mental health. In addition, previous studies have focused on individual models for mental health prediction. This study instead systematically evaluates a range of algorithms rather than only reporting the strengths and weaknesses of individual approaches.

Dataset Description
The dataset used in this study is the "Mental Health in Tech Survey," which contains 1,259 observations and 27 features. The survey is valuable for understanding people's experiences of mental health in the technology industry. Its features capture the demographics of the respondents, their working environments, and their attitudes toward mental health, which makes it well suited to predictive analytics.

Attribute            Description
Data Set Source      "Mental Health in Tech Survey" from Kaggle
Total Observations   1,259
Total Features       27
Purpose              To analyze mental health experiences in the technology sector
Feature Categories   Demographics, Workplace Environment, Mental Health Attitudes
Usage                Supports predictive analytics of mental health trends
Table 1: Summary of the dataset used in the study.

Methodology
This project was designed systematically to tackle the problem of predicting mental health risk with machine learning techniques. The approach has a few major steps: data preprocessing, model selection, hyperparameter tuning, and the use of several techniques to optimize the models. Together, these steps aim to produce robust models for predicting mental health conditions and to provide further insight into these concerns.
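The Introduction describes the L1 (Lasso) and L2 (Ridge) penalties used in the optimization step. For reference, they can be written for a binary logistic-regression model as shown below. This is a sketch in our own notation, not an objective stated by the authors. Here n is the number of samples, y_i ∈ {0, 1} the label of sample i, x_i its feature vector, w the weight vector over d features, b the intercept, σ the logistic function, and λ ≥ 0 a regularization strength that would be chosen during hyperparameter tuning.

```latex
\[
\mathcal{L}(\mathbf{w}, b) = -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i \log \hat{p}_i + (1 - y_i)\log\big(1 - \hat{p}_i\big) \Big],
\qquad \hat{p}_i = \sigma\big(\mathbf{w}^{\top}\mathbf{x}_i + b\big) = \frac{1}{1 + e^{-(\mathbf{w}^{\top}\mathbf{x}_i + b)}}
\]
\[
\mathcal{L}_{\text{L1}}(\mathbf{w}, b) = \mathcal{L}(\mathbf{w}, b) + \lambda \sum_{j=1}^{d} \lvert w_j \rvert,
\qquad
\mathcal{L}_{\text{L2}}(\mathbf{w}, b) = \mathcal{L}(\mathbf{w}, b) + \lambda \sum_{j=1}^{d} w_j^{2}
\]
```

A larger λ imposes a stronger penalty. The L1 term tends to set some weights exactly to zero, which gives a sparser model that is easier to read. The L2 term instead shrinks all weights smoothly toward zero. scikit-learn's LogisticRegression exposes the inverse of the regularization strength as its C parameter.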

Data Preprocessing
The first step is data preprocessing, which is essential before the raw data can be analyzed. In this step, missing data is handled, categorical variables are encoded, and numerical features are scaled so that all inputs are in a format that machine learning algorithms can use. Missing data could significantly skew the results, so gaps are filled either by mean imputation or through predictive modeling. One-hot encoding converts categorical variables, such as gender or employment status, into numerical form for model training. Finally, feature scaling standardizes the numerical features. Scaling speeds up convergence for gradient-based learners and keeps large-valued features from dominating scale-sensitive algorithms such as Support Vector Machines and K-Nearest Neighbors.

Model Selection
Second, we choose the machine learning algorithms. After preprocessing, the models considered were Decision Trees (DT), Random Forest Classifier (RFC), K-Nearest Neighbors (KNN), Logistic Regression (LR), AdaBoost Classifier, and Gradient Boosting Classifier. These models were chosen because they perform well on classification problems and several of them are interpretable. Decision Trees are particularly useful because each prediction follows an explicit, transparent decision path. Ensemble methods such as AdaBoost and Random Forests increase prediction accuracy by combining the outputs of many trees. KNN is simple yet effective, so it can also be applied in this setting. Logistic regression shows which features do or do not influence the target variable. Each model is assessed on how well it predicts mental health treatment-seeking behavior.

Handling Class Imbalance
Because the dataset is highly imbalanced, we address class imbalance by combining the SMOTE algorithm with ENN. Class imbalance occurs when one class is grossly underrepresented relative to another. Here, the "seeking treatment" class is underrepresented compared with the "not seeking treatment" class, and this imbalance skews model predictions toward the majority class. SMOTE adds synthetic minority-class examples. Each one is created by interpolating between a minority sample and one of its nearest minority-class neighbors. This balances the dataset and lets the model learn the patterns of both classes. ENN then filters the dataset by removing examples whose class disagrees with that of their nearest neighbors, since such examples are likely noise that could harm classification. This improves the quality of the training data.
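The following minimal, stdlib-only Python sketch makes the resampling step concrete by running SMOTE followed by ENN. It is an illustration under simplifying assumptions, not the authors' code or the imbalanced-learn implementation. The sketch assumes that:
• features are numeric (for example, after one-hot encoding);
• distances are Euclidean;
• the minority class is oversampled until it matches the majority count;
• the ENN step keeps a sample only when the most common label among its k nearest neighbors matches its own.
The function names are illustrative.

```python
import math
import random
from collections import Counter


def k_nearest(points, query, k, exclude=None):
    """Indices of the k points closest to `query` (Euclidean), skipping index `exclude`."""
    ranked = sorted((math.dist(p, query), i) for i, p in enumerate(points) if i != exclude)
    return [i for _, i in ranked[:k]]


def smote(minority, n_new, k, rng):
    """SMOTE: each synthetic sample lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(k_nearest(minority, minority[i], k, exclude=i))
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j])))
    return synthetic


def enn(X, y, k=3):
    """ENN (majority-vote variant): keep a sample only if the most common label
    among its k nearest neighbors equals its own; the rest are treated as noise."""
    kept_X, kept_y = [], []
    for i, (x, label) in enumerate(zip(X, y)):
        neighbor_labels = [y[j] for j in k_nearest(X, x, k, exclude=i)]
        if Counter(neighbor_labels).most_common(1)[0][0] == label:
            kept_X.append(x)
            kept_y.append(label)
    return kept_X, kept_y


def smote_enn(X, y, minority_label, k_smote=5, k_enn=3, seed=0):
    """Oversample the minority class up to the majority count, then clean with ENN."""
    rng = random.Random(seed)
    counts = Counter(y)
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(counts.values()) - counts[minority_label]
    k = min(k_smote, len(minority) - 1)  # cannot ask for more neighbors than exist
    X_res = list(X) + smote(minority, n_new, k, rng)
    y_res = list(y) + [minority_label] * n_new
    return enn(X_res, y_res, k_enn)
```

Consider a toy set of 10 majority points and 4 minority points in two well-separated clusters. For that set, smote_enn(X, y, minority_label=1) returns 20 points, 10 per class, and every synthetic point lies between two original minority points. In practice, the imbalanced-learn library provides a maintained implementation in imblearn.combine.SMOTEENN, used through its fit_resample(X, y) method. Either way, resampling should be applied to the training split only, so that the test set keeps the original class distribution.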

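The Data Preprocessing step described earlier in this section can also be sketched in plain Python. The helpers below show mean imputation, one-hot encoding, and standardization on single toy columns. The values are hypothetical, only the mean-imputation option (not predictive imputation) is shown, and the paper does not specify which columns were imputed or scaled. In a real pipeline, the imputation mean and scaling statistics would be computed on the training split and reused for the test split.

```python
from statistics import mean, pstdev


def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def one_hot(column):
    """Return the sorted category list and one 0/1 indicator row per value."""
    categories = sorted(set(column))
    rows = [[1 if value == c else 0 for c in categories] for value in column]
    return categories, rows


def standardize(column):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in column]


# Hypothetical toy columns, for illustration only.
ages = impute_mean([25, None, 35, 30])  # -> [25, 30, 35, 30]
categories, encoded = one_hot(["Male", "Female", "Male"])
scaled_ages = standardize(ages)
```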

Figure 1: Proposed Methodology

Hyperparameter Tuning and Regularization
To enhance the performance of our models, we use hyperparameter tuning techniques such as grid search and randomized search. Hyperparameter tuning adjusts the parameters of the machine learning algorithms to optimize performance metrics such as accuracy and F1 score. This step is important because the choice of hyperparameters can significantly affect the predictive capability of the models. We also apply L1 and L2 regularization to control overfitting.

Model Evaluation and Visualization
Lastly, we examine the behavior of these models using accuracy and F1 score. This evaluation shows how well the different algorithms predict which people are likely to seek treatment for their mental health. For clarity, the results are also presented as confusion matrices and ROC curves, which reveal the strengths and weaknesses of each model. This methodology provides a rigorous approach to developing a predictive model that can inform effective mental health interventions.

Results
The experiments yielded informative results on how well these classifiers predict mental health treatment-seeking behavior. Accuracy is one of the main metrics for evaluating a classifier. The F1 score, the harmonic mean of precision and recall, complements it. The following table summarizes the accuracy and F1 scores achieved by each classifier during testing.

CLASSIFIER                      ACCURACY SCORE   F1 SCORE
LOGISTIC REGRESSION             0.8293           0.8375
K-NEAREST NEIGHBORS (KNN)       0.7813           0.7771
DECISION TREE CLASSIFIER        0.8533           0.8614
RANDOM FOREST CLASSIFIER        0.8586           0.8684
ADABOOST CLASSIFIER             0.8213           0.8337
GRADIENT BOOSTING CLASSIFIER    0.8426           0.8543
Table 2: Accuracy and F1 scores of the different classifiers

Figure 2: Confusion Matrix of Random Forest Classifier before Hyperparameter Tuning
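The accuracy and F1 scores in Table 2 are both computed from the four counts of a binary confusion matrix: true positives, false positives, false negatives, and true negatives. The plain-Python sketch below computes them. The labels are hypothetical. The paper does not say whether its F1 scores are binary, macro-averaged, or weighted, so only the binary (positive-class) F1 is shown.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for binary labels, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical labels: 1 = seeks treatment, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)  # (3, 1, 1, 3)
print(accuracy(tp, fp, fn, tn), f1(tp, fp, fn))    # 0.75 0.75
```

In this example, precision and recall are both 3/4, so F1 also equals 0.75. When precision and recall differ, F1 is pulled toward the smaller of the two.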
Among all the classifiers used, the Random Forest Classifier had the best accuracy, 0.8586. After hyperparameter tuning, its accuracy improved further to 86.66% with the following parameter values:
• n_estimators: 159
• min_samples_split: 5
• min_samples_leaf: 1
• max_depth: 70
• bootstrap: False

The confusion matrices of the Random Forest Classifier before and after tuning are shown in Figures 2 and 3. A confusion matrix shows directly how the model's predictions compare with the actual outcomes. This helps explain findings to stakeholders who may not be familiar with more complicated metrics, and it reveals patterns of misclassification.

Figure 3: Confusion Matrix of Random Forest Classifier after Hyperparameter Tuning

The ROC curves of the Random Forest Classifier before and after hyperparameter tuning are shown in Figures 4 and 5. The ROC curve shows how well the model separates people who may need to visit mental health facilities (the positive class) from those who probably do not (the negative class) as the decision threshold varies. The area under the curve (AUC) condenses this into a single number: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The higher the AUC, the better the model is at identifying people who require treatment attention. This is particularly important in mental health care, because early diagnosis improves the chances of recovery.

Figure 4: ROC Curve of Random Forest Classifier before Tuning
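The ROC curve and AUC described above can be computed directly from predicted scores, such as the probability a classifier assigns to the positive ("needs treatment") class. The plain-Python sketch below sweeps the decision threshold to produce (false positive rate, true positive rate) points. It then computes AUC as the probability that a randomly chosen positive case outscores a randomly chosen negative one, counting ties as one half. The scores are hypothetical.

```python
def roc_points(y_true, scores, positive=1):
    """(FPR, TPR) pairs obtained by lowering the threshold through every distinct score."""
    n_pos = sum(1 for t in y_true if t == positive)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == positive and s >= threshold)
        fp = sum(1 for t, s in zip(y_true, scores) if t != positive and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four people (1 = needs treatment).
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_points(y_true, scores))  # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(roc_auc(y_true, scores))     # 0.75
```

Integrating these ROC points with the trapezoidal rule gives the same 0.75 here, which is how the area under a plotted ROC curve is usually reported.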

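The tuned values reported above, such as n_estimators and max_depth, come from the search procedure described under Hyperparameter Tuning and Regularization. Training a Random Forest is beyond a short example. The sketch below therefore shows the same mechanics on a small, hand-written k-nearest-neighbors classifier. Every combination in a hypothetical parameter grid is scored by k-fold cross-validated accuracy, and the best combination is kept. The code is plain Python for illustration only; the grid, data, and function names are assumptions, not the authors' setup. In scikit-learn, GridSearchCV and RandomizedSearchCV in sklearn.model_selection implement this search for any estimator.

```python
import itertools
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, n_neighbors, weighted):
    """Label x by a vote of its n_neighbors closest training points
    (votes weighted by inverse distance when `weighted` is True)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    votes = Counter()
    for point, label in nearest[:n_neighbors]:
        votes[label] += 1.0 / (math.dist(point, x) + 1e-9) if weighted else 1.0
    return votes.most_common(1)[0][0]


def k_fold_indices(n, k, seed=0):
    """Shuffle the row indices once, then deal them into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_val_accuracy(X, y, params, k=5):
    """Mean accuracy over k folds, training on k-1 folds and testing on the held-out one."""
    fold_scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        correct = sum(knn_predict(train_X, train_y, X[i], **params) == y[i] for i in fold)
        fold_scores.append(correct / len(fold))
    return sum(fold_scores) / len(fold_scores)


def grid_search(X, y, grid, k=5):
    """Evaluate every parameter combination in `grid`; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(X, y, params, k)
        if score > best_score:  # strict ">" keeps the first of any tied combinations
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical data: two well-separated one-dimensional clusters.
X = [(i * 0.1,) for i in range(10)] + [(10 + i * 0.1,) for i in range(10)]
y = [0] * 10 + [1] * 10
grid = {"n_neighbors": [1, 3, 5], "weighted": [False, True]}
print(grid_search(X, y, grid))  # ({'n_neighbors': 1, 'weighted': False}, 1.0)
```

Each combination is trained k times, so the cost grows multiplicatively with the size of the grid. For this reason, randomized search, which samples a fixed number of combinations, is often preferred for larger grids such as the Random Forest parameters listed above.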

Figure 5: ROC Curve of Random Forest Classifier after Tuning

In summary, these results indicate the strength of ensemble methods, especially the Random Forest Classifier, for mental health analytics. The analysis also shows that improving the performance of mental health outcome prediction models requires not only appropriate model selection but also hyperparameter tuning.

Conclusion and Future Work
The best performance was achieved by the Random Forest Classifier, which attained an accuracy of 86.66% after hyperparameter tuning. This improvement shows that tuning should be part of building such models if their predictive capability is to be maximized. This paper has discussed how machine learning algorithms can be applied to mental health analyses, and these capabilities could be meaningfully relevant to both researchers and practitioners in mental health.
Future work might expand the dataset to better represent the population, making the models more generalizable and robust for a wider audience. Deep learning and natural language processing may also offer further insights into the dynamics of mental health. Mental health professionals should be able to make real-time predictions and interventions with user-friendly tools or applications. The potential of such advanced analytics for managing mental health is promising, especially for early detection and intervention across diverse populations, which can lead to better mental health outcomes.

References
[1] Wang, W., Kiik, M., Peek, N., Curcin, V., Marshall, I. J., Rudd, A. G., ... & Bray, B. (2020). A systematic review of machine learning models for predicting outcomes of stroke with structured data. PLoS One, 15(6), e0234722.
[2] Shin, D., Lee, K. J., Adeluwa, T., & Hur, J. (2020). Machine learning-based predictive modeling of postpartum depression. Journal of Clinical Medicine, 9(9), 2899.
[3] Rosenfeld, A., Benrimoh, D., Armstrong, C., Mirchi, N., Langlois-Therrien, T., Rollins, C., ... & Yaniv-Rosenfeld, A. (2019). Big data analytics and AI in mental healthcare. arXiv preprint arXiv:1903.12071.
[4] Ahsan, M. M., & Siddique, Z. (2022). Machine learning-based heart disease diagnosis: A systematic literature review. Artificial Intelligence in Medicine, 128, 102289.
[5] Akuamoah-Boateng, K., Banguti, P., Starling, D., Mvukiyehe, J. P., Moses, B., Tuyishime, E., ... & Bethea, A. (2020). 1383: Effect of implementing a fundamental critical care support course in emerging critical care systems. Critical Care Medicine, 48(1), 668.
[6] Radwan, A., Amarneh, M., Alawneh, H., Ashqar, H. I., AlSobeh, A., & Magableh, A. A. A. R. (2024). Predictive analytics in mental health leveraging LLM embeddings and machine learning models for social media analysis. International Journal of
Web Services Research (IJWSR), 21(1), 1-22.
[7] Bernstorff, M., Hansen, L., Enevoldsen, K., Damgaard, J., Hæstrup, F., Perfalk, E., ... & Østergaard, S. D. (2024). Development and validation of a machine learning model for prediction of type 2 diabetes in patients with mental illness. Acta Psychiatrica Scandinavica.
[8] Iyortsuun, N. K., Kim, S. H., Jhon, M., Yang, H. J., & Pant, S. (2023). A review of machine learning and deep learning approaches on mental health diagnosis. Healthcare, 11(3), 285.
[9] Higgins, O., Short, B. L., Chalup, S. K., & Wilson, R. L. (2023). Artificial intelligence (AI) and machine learning (ML) based decision support systems in mental health: An integrative review. International Journal of Mental Health Nursing, 32(4), 966-978.
[10] Khoo, L. S., Lim, M. K., Chong, C. Y., & McNaney, R. (2024). Machine learning for multimodal mental health detection: A systematic review of passive sensing approaches. Sensors, 24(2), 348.