

Effective Analysis of Machine and Deep Learning Methods for Diagnosing Mental Health Using Social Media Conversations

Yashwanth Kasanneni, Achyut Duggal, R. Sathyaraj, and S. P. Raja

Abstract—The increasing incidence of mental health issues demands innovative diagnostic methods, especially within digital communication. Traditional assessments are challenged by the sheer volume of data and the nuanced language found on social media and other text-based platforms. This study seeks to apply machine learning (ML) to interpret these digital narratives and identify patterns that signal mental health conditions. We apply natural language processing (NLP) techniques to analyze sentiments and emotional cues across datasets from social media and other text-based communication. Using ML, deep learning, and transfer learning models such as bidirectional encoder representations (BERTs), robustly optimized BERT approach (RoBERTa), distilled BERT (DistilBERT), and generalized autoregressive pretraining for language understanding (XLNet), we assess their ability to detect early signs of mental health concerns. The results show that BERT, RoBERTa, and XLNet consistently achieve over 95% accuracy, highlighting their strong contextual understanding and effectiveness in this application. The significance of this research lies in its potential to revolutionize mental health diagnostics by providing a scalable, data-driven approach to early detection. By harnessing the power of advanced NLP models, this study offers a pathway to more timely and accurate identification of individuals in need of mental health support, thereby contributing to better outcomes in public health.

Index Terms—Bidirectional encoder representation (BERT), deep learning (DL), distilled BERT (DistilBERT), generalized autoregressive pretraining for language understanding (XLNet), machine learning (ML), mental health prediction, natural language processing (NLP), robustly optimized BERT approach (RoBERTa), social media analytics, transfer learning.

Received 11 June 2024; revised 24 September 2024; accepted 24 October 2024. Date of publication 8 November 2024; date of current version 31 January 2025. (Corresponding author: S. P. Raja.) The authors are with the School of Computer Science and Engineering (SCOPE), Vellore Institute of Technology, Vellore 632014, India (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TCSS.2024.3487168

I. INTRODUCTION

MENTAL health disorders represent a major challenge to public health, affecting millions worldwide and contributing significantly to the global burden of disease. Traditional methods of diagnosing and treating mental health issues often involve subjective assessments and therapy and can be time intensive.

With the use of big data and advanced computational technologies, machine learning (ML) and deep learning (DL) methods have emerged as powerful tools that can potentially transform the landscape of mental health diagnostics. The primary goal of employing ML and DL in this context is to improve the accuracy, efficiency, and accessibility of mental health diagnostics. These technologies offer the promise of detecting subtle patterns in large datasets that human clinicians might not easily recognize. This includes analyzing everything from electronic health records (EHRs) and clinical notes to voice recordings and social media posts. Such analyses can reveal insights into behavioral patterns, linguistic cues, and other markers that are indicative of mental health status.

This article seeks to explore the various ML and DL techniques that have been applied to detect mental health issues. It will cover a range of methods including, but not limited to, support vector machines, decision trees, recurrent neural networks (RNNs), and convolutional neural networks (CNNs). Each method has its strengths and limitations, which are examined considering their application to diverse types of data, including social media interactions, unstructured text, and real-time interaction data.

Furthermore, the integration of natural language processing (NLP) techniques to analyze text for sentiment and emotion provides a promising avenue for noninvasive mental health monitoring and intervention. The potential of these technologies to act as early warning systems, identifying individuals at risk and facilitating timely intervention, is an important point of discussion.

This introduction sets the stage for a detailed review and critical analysis of existing studies, highlighting the innovative ways in which ML and DL are being leveraged to address mental health challenges. It also discusses the ethical implications and practical hurdles in the application of these technologies, paving the way for an informed discussion on how these challenges can be overcome. Through this exploration, the article contributes to the broader discourse on mental health care, encouraging the responsible and effective integration of cutting-edge technologies in clinical settings.

Overall, the study makes the following contributions.
1) It was shown that other models were outperformed by generalized autoregressive pretraining for language understanding (XLNet), achieving over 97% accuracy.


2) The algorithms were trained across multiple datasets to ensure robustness and generalizability, enabling a comprehensive evaluation of their performance in diverse contexts.
3) The viability of using transfer learning to predict mental health conditions from textual data was studied. Complex texts were analyzed, and meticulous data preprocessing was performed using transfer learning and NLP techniques to ensure all critical information was retained.

II. MOTIVATION

In the digital era, social media platforms have evolved into vital spaces for personal expression, community building, and, notably, the sharing of mental health experiences. These platforms hold a wealth of information that, if harnessed responsibly, can offer unprecedented insights into public mental health trends and individual well-being. The impetus for this study is anchored in the imperative to better understand mental health discourse within these digital communities. By harnessing the capabilities of ML, we strive to develop tools that can detect patterns and sentiments related to mental health, potentially offering early identification of at-risk individuals and communities.

Despite the considerable growth in online mental health discussions, there remains a paucity of effective computational tools that can navigate the complexities of natural language to identify relevant mental health information. The nuanced nature of such discourse, where context is key and expressions are diverse, poses significant challenges to traditional text analysis techniques. The current landscape demands more sophisticated approaches that can not only discern the subtleties of language but do so with sensitivity and accuracy commensurate with the seriousness of mental health issues.

Furthermore, the detection of negative and toxic content on these platforms is critical in safeguarding users from harmful interactions that can exacerbate mental health conditions. There is a compelling need to refine these detection methods to contribute to safer online environments.

III. LITERATURE REVIEW

Khan et al. [1] focused on detecting abusive language in Urdu, a low-resource language, by developing the "dataset of Urdu abusive language" (DUAL). They applied DL techniques and models such as logistic regression, Gaussian Naïve Bayes, support vector machines (SVMs), and random forest, with random forest achieving the highest effectiveness. The inclusion of an attention layer in a bi-LSTM model, using custom Word2Vec embeddings, notably improved detection performance. This study is valuable for addressing resource scarcity in Urdu and enhancing detection accuracy with advanced models. However, the dataset's size and specific focus on Urdu may limit the findings' generalizability to other languages or more complex abusive content. Additionally, while random forest performs well, it might not fully capture the nuanced nature of abusive language, suggesting a need for more sophisticated approaches.

Attigeri et al. [2] developed and assessed various NLP models tailored for managing FAQs in educational environments, employing advanced tools such as TensorFlow, PyTorch, and TF-IDF vectorization. The models were fine-tuned using techniques such as stochastic gradient descent and dropout, addressing a gap in NLP applications within educational institutions. Their study provides a comprehensive evaluation of different NLP models, showcasing their effectiveness in handling diverse queries and the strength gained from using advanced technologies and optimization methods. While the study highlights the practical benefits of these models in improving student support and information distribution, it has certain limitations. The narrow scope of the datasets used may not fully capture the diversity of queries encountered in broader educational settings. Additionally, the study does not delve into the scalability of the models for larger or more complex datasets, nor does it thoroughly explore how these models could be integrated into existing educational systems, potentially restricting their practical application.

Sunar and Khalid [3] conducted a systematic review of NLP techniques in student feedback analysis, categorizing studies using Creswell's five-step process based on objectives, methods, models, and tools. The review highlights the need for more advanced NLP techniques and broader language support, suggesting future research to develop models and lexicons tailored to educational contexts. Although the review provides a comprehensive synthesis of the current literature and identifies key gaps, its dependence on existing studies might introduce biases, and it does not offer specific strategies for advancing NLP. Additionally, the focus on categorization may overlook nuanced or interdisciplinary approaches that could enhance feedback analysis.

Sufi and Khalil [4] introduced an artificial intelligence (AI)-based method for real-time disaster monitoring using social media, integrating named entity recognition (NER), sentiment analysis, CNNs, and the Getis-Ord Gi* algorithm. This approach effectively extracts location-based sentiments from tweets, enhancing disaster response with high accuracy and broad language coverage. The study's strengths include its innovative combination of AI and NLP techniques, which enable the accurate extraction and analysis of disaster-related data across multiple languages and regions. The reported high accuracy (97%), precision (0.93), and F1-score (0.90) validate the method's effectiveness in identifying disaster locations and assessing public sentiment. The study does have limitations, particularly due to its reliance on potentially noisy and incomplete social media data, which could affect the consistency of location intelligence. Moreover, despite supporting multiple languages, the accuracy of sentiment analysis may differ in non-English contexts, which could restrict its broader applicability. Further exploration is needed to assess the scalability of the approach for handling larger datasets or real-time global processing.


Nouman et al. [5] applied NLP for mental health prediction using a novel dataset from the Lyf Support app, with a BiGRU model showing superior accuracy. The study emphasizes the importance of well-labeled datasets, enhancing model performance for real-time mental health monitoring. A major strength is the use of a dataset carefully labeled by psychologists, ensuring reliable findings. The study is limited, however, by the small sample size and reliance on oversampling, which may limit the generalizability and robustness of the model. The study calls for further research with larger datasets to validate and enhance the applicability of the proposed models.

Kadam and Reddy [6] explored the prediction of mental health conditions from social media text using ML and DL models, focusing on complex datasets and nuanced language to enhance accuracy. The study expands the scope of mental health prediction to include conditions such as digital addiction and substance use disorders, demonstrating the superiority of DL over traditional methods. A key strength is the innovative approach to analyze a broader range of conditions, with DL effectively handling complex language patterns in social media. The reliance on social media data introduces potential biases, and the small, English-specific dataset may limit generalizability. The study also highlights challenges in practical implementation, such as the need for continuous model updates and managing evolving language patterns.

Dristy et al. [7] assessed ML classifiers for predicting mental health status from text data processed using NLP techniques, emphasizing feature extraction and model selection for high accuracy. The study highlights how traditional data processing methods can be adapted for mental health diagnostics, with decision trees and support vector machines proving effective when paired with TF-IDF scores and lexicon-based sentiment markers. The research's strength lies in its comprehensive evaluation of classifiers and the importance of feature engineering in boosting predictive accuracy. The study's small dataset could restrict the generalizability of its findings, and the focus on traditional NLP methods might overlook the potential benefits of recent DL advancements that could improve accuracy across more diverse datasets and languages.

Otter et al. [8] surveyed the use of DL in NLP tasks, particularly focusing on sentiment analysis crucial for mental health monitoring. They examined the adaptability of DL models in handling complex linguistic patterns to detect subtle changes in mood and emotions from text. The survey covers various DL architectures and transformers, highlighting their strengths in processing large-scale unstructured text data. The authors also discuss challenges such as the need for large datasets and high computational costs, proposing solutions such as transfer learning and model compression techniques. While the article offers a comprehensive overview of state-of-the-art models, its broad scope may limit in-depth discussion on specific models or challenges. Moreover, the practical implementation of proposed solutions such as transfer learning and model compression is not fully explored, leaving gaps in their applicability.

Varshney et al. [9] introduced an ensemble classification method for sentiment analysis to enhance mental health monitoring, focusing on improving the robustness and accuracy of detecting emotional states from text. This approach combines multiple ML algorithms, neural networks, and decision trees, leveraging their strengths to overcome individual weaknesses. The study emphasizes the method's effectiveness in handling complex linguistic features such as sarcasm, which are challenging for single-model systems, and integrates contextual and semantic analysis to refine sentiment detection accuracy. While the ensemble technique improves reliability and accuracy, its reliance on multiple models increases computational complexity and resource demands, potentially limiting scalability. Additionally, the study's applicability across different languages and cultural contexts may be constrained, affecting its generalizability.

Aggarwal et al. [10] explored using linguistic markers from social media posts for early mental health detection, proposing an NLP and DL-based model to analyze these elements. This approach offers a timely, noninvasive method for mental health monitoring, capitalizing on social media's widespread use for psychological assessment and intervention. The study's strength lies in its innovative use of social media as a scalable, real-time data source. The integration of DL with linguistic analysis enhances the model's ability to detect subtle mood and emotional changes, making it valuable for early intervention. The reliance on social media data introduces potential biases due to the varying quality and representativeness of the information, and the model's effectiveness might be limited by the accuracy of linguistic markers and the difficulty in generalizing findings across diverse cultural and linguistic contexts.

Mathin et al. [11] explored personalized mental health analysis using AI, employing NLP techniques to extract insights on anxiety, depression, and stress from user inputs. They utilized AI models such as decision tree, random forest, multinomial Naive Bayes, and XGBoost, with the combination of multinomial Naive Bayes and XGBoost achieving the highest accuracy. The study also evaluated the PSYCHE system, a wearable tool integrated with a smartphone, and introduced "Diary Bot," a chatbot for expressive writing to support mental well-being. This research demonstrates AI's potential in providing tailored therapeutic strategies, with high accuracy in mental health predictions. The inclusion of wearable technology and a chatbot enhances the practical applicability of the findings. The reliance on specific AI models might reduce the system's flexibility, and concentrating on predefined keywords could limit its ability to adapt to complex or emerging mental health issues. Furthermore, the PSYCHE system and Diary Bot need additional validation in diverse real-world environments to confirm the broader applicability of the findings.

Msosa et al. [12] explored using AI and NLP to predict mental health crises in individuals with depression, utilizing EHRs from Mersey Care that include both structured and unstructured data. The study employed random forest models, gradient boosting trees, and LSTM networks, with the LSTM network demonstrating the best performance. This research highlights the potential of integrating AI with EHR data to predict mental health crises, suggesting its use in clinical decision support tools. The study's strength lies in its comprehensive approach, leveraging a large dataset and combining data types to improve the predictive accuracy. The LSTM network's ability to capture temporal data aspects enhances its relevance for real-time clinical applications. However, the reliance on data from a single source may limit generalizability across different healthcare systems, and the high computational requirements of these models could challenge their widespread implementation, particularly in resource-limited settings.


TABLE I
CHARACTERISTIC COMPARISON TABLE
[Table I compares the previous studies [1]-[20] and the proposed study across the following characteristics: text classification, sentiment analysis, feature extraction, deep linguistic processing, language modeling, transfer learning, attention mechanisms, and data augmentation.]

Danner et al. [13] introduced a novel AI application for detecting depression using advanced transformer networks, analyzing clinical interviews with bidirectional encoder representation (BERT)-based models, GPT-3.5, and ChatGPT-4. They enhanced traditional datasets with simulated data to improve the model performance while addressing data protection concerns. This approach significantly outperforms previous methods in detecting depression from linguistic patterns, demonstrating AI's potential in revolutionizing mental health care through early detection and intervention. The study's strengths include the innovative use of advanced transformers and simulated data to overcome the data scarcity and improve the accuracy. The reliance on simulated data could restrict the model's applicability in real-world clinical environments. Despite addressing the ethical, legal, and social implications of using AI in mental health care, the study does not fully explore practical solutions to these challenges, leaving some important issues insufficiently addressed. Table I presents a comparative analysis of NLP techniques across multiple studies, with the proposed study distinguishing itself by incorporating a range of advanced methods.

Dixit et al. [14] utilized NLP techniques to assess mental health by targeting depression and anxiety markers in textual data, using sentiment analysis, emotion identification, and linguistic pattern detection. Their approach, applied to a diverse dataset from social media, online forums, and healthcare records, significantly improved F1 scores, recall, and accuracy. The study's strength lies in its comprehensive approach and diverse dataset, enhancing generalizability and accuracy in detecting mental health markers. The focus on textual data might miss nonverbal cues, and the complexity of the NLP techniques could pose challenges for implementation in resource-limited environments. The study suggests that future research should explore cross-cultural datasets and integrate multimodal data to overcome these challenges.

Serrano and Kwak [15] developed an emotional support AI (ESAI) system to assist individuals with mental health disorders, using NLP and ML. Trained on 160 000 Reddit posts, ESAI employs a Naive Bayes classification model to detect symptoms of various mental health disorders, offering a user-friendly interface for text and speech interactions. Preliminary results show a classification accuracy of around 70%, highlighting ESAI's potential to complement professional mental health care. The study's strengths include the innovative use of a large social media dataset and an accessible interface. The choice of a Naive Bayes model might limit the system's ability to capture complex linguistic nuances, and the 70% accuracy indicates that there is room for improvement. Additionally, using self-reported Reddit data could introduce biases, affecting the generalizability of the findings.

Ahmad et al. [16] explored the use of NLP for mental health detection in Malay text, focusing on sentiment analysis, emotion recognition, and linguistic pattern detection. The study addresses the challenges of applying NLP to Malay, such as the need for high-quality datasets and cultural nuances. By using techniques such as TF-IDF, Word2Vec, and GloVe, the research demonstrates NLP's potential for early mental health detection and intervention, particularly in underrepresented languages. The study's focus on Malay enhances the relevance and robustness of its findings, but limitations arise from the scarcity of high-quality datasets and the limited integration of cultural nuances into NLP models, which may affect the accuracy and applicability across diverse Malay-speaking populations.


Ahmad et al. [17] reviewed the integration of AI in mental health services, focusing on the analysis of electronic medical records and social media posts using NLP and ML algorithms. The study highlights AI's potential for early detection and intervention by identifying patterns and symptoms of mental disorders. It discusses the role of AI-powered virtual assistants, chatbots, VR, AR, and wearable devices in enhancing patient care through initial assessments, psychoeducation, and real-time monitoring. The review's strength lies in its comprehensive coverage of AI's multifaceted applications, showcasing a holistic approach to mental health care. However, the broad scope may limit the depth of analysis in specific areas, such as implementation challenges in clinical settings. Additionally, the review does not deeply explore ethical, legal, and social implications, including data privacy concerns and algorithmic bias, which are crucial for the responsible use of AI in mental health.

Lee and Kyung [18] explored using NLP to address mental health stigma by identifying and classifying stigmatized language in text. They developed the Mental Health Stigma Corpus (MHSC) to train a BERT model, achieving a 94% accuracy and a 91% F1 score despite the limited dataset. The study highlights NLP's potential to detect and reduce mental health stigma, even with small datasets, suggesting applicability to other underexplored fields with limited data. The research stands out for its innovative focus on mental health stigma and the successful application of BERT in a low-resource setting. Yet, the limited size of the MHSC could restrict the model's ability to generalize and fully capture subtle, implicit stigmatized language. Moreover, while the results are promising, the study does not explore the broader implications, such as how the model could be integrated into real-world systems or the ethical considerations of using AI to monitor and address stigma.

Siddik et al. [19] developed Psyche Conversa, a DL-based chatbot framework for detecting mental health states, integrating a keylogger, chat module, and DL models such as Conv-LSTM and BERT. BERT demonstrated the highest performance with a 75.3% accuracy on the Reddit mental health dataset. The chatbot also uses cognitive behavioral therapy (CBT) techniques to provide therapeutic responses. While the integration of multiple data sources enhances detection accuracy, the reliance on keyloggers raises privacy concerns that could limit user adoption. Additionally, the study's accuracy suggests room for improvement, and it does not fully address the ethical implications of invasive data collection methods.

Rani et al. [20] developed Saarthi, a mental health chatbot system that utilizes NLP and AI to deliver CBT and remote health monitoring. Saarthi provides personalized support for anxiety and depression, using AI algorithms to analyze user interactions and offer empathetic responses. It aims to manage symptoms, improve well-being, and connect patients with a supportive community and professionals, addressing the mental health professional shortage and reducing stigma. The integration of CBT with AI for personalized and accessible care, along with a focus on community support, are key strengths of the Saarthi system. That said, AI-driven interactions might not match the depth of traditional therapy, and its effectiveness across diverse cultural contexts and in managing complex conditions without professional oversight still needs further exploration.

Fig. 1. Flow diagram of the training process.

IV. METHODOLOGY

Fig. 1 shows a flowchart representing a comprehensive process for preparing and training models using ML, DL, or transfer learning techniques. It begins with loading the dataset, followed by data preprocessing to ensure the data are ready for further analysis. Depending on the chosen method (either transfer learning or a more traditional approach), the process includes lemmatizing the text data and performing label encoding to convert categorical labels into numerical values. The flowchart then branches into specific workflows based on the selected learning method. For ML, TF-IDF vectorization is applied, while DL involves tokenization and the generation of input IDs. If neither is specifically chosen, a text vectorization layer is used. The dataset is subsequently split into training and testing sets, allowing the model to be compiled and trained on the training data. Finally, the model undergoes evaluation and fine-tuning to optimize performance before deployment. This structured approach ensures robustness and adaptability across various datasets and methods.
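Purely as an illustration of the flow in Fig. 1, the following minimal Python sketch shows the shared preprocessing steps and the two branching paths (TF-IDF for classical ML, subword tokenization for transfer learning). The CSV file name, the "text" and "label" column names, and the checkpoint name are assumptions for the example, not the study's actual artifacts, and the NLTK stopword/WordNet corpora are assumed to be installed.

# Sketch of the Fig. 1 preprocessing pipeline (file/column names are illustrative).
import pandas as pd
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import AutoTokenizer

df = pd.read_csv("mental_health_posts.csv").dropna()   # load dataset

lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    # Lowercase, keep alphabetic tokens, drop stopwords, lemmatize.
    tokens = [t for t in text.lower().split()
              if t.isalpha() and t not in stop_words]
    return " ".join(lemmatizer.lemmatize(t) for t in tokens)

df["clean"] = df["text"].apply(preprocess)

# Label encoding: categorical condition labels -> integer ids.
labels = LabelEncoder().fit_transform(df["label"])

# Train/test split shared by all branches of the flowchart.
X_train, X_test, y_train, y_test = train_test_split(
    df["clean"], labels, test_size=0.2, random_state=42, stratify=labels)

# Branch 1 (classical ML): TF-IDF vectorization.
tfidf = TfidfVectorizer(max_features=20000)
X_train_tfidf = tfidf.fit_transform(X_train)
X_test_tfidf = tfidf.transform(X_test)

# Branch 2 (transfer learning): subword tokenization into input IDs
# (the checkpoint name is an assumption; any pretrained model would do).
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc_train = tok(list(X_train), truncation=True, padding=True, max_length=128)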


A. BERT, Robustly Optimized BERT Approach (RoBERTa), and Distilled BERT (DistilBERT)

1) Concept: These models rely on the Transformer architecture, distinguished by its innovative self-attention mechanism that dynamically assesses word relevance within a sentence to grasp intricate linguistic patterns and dependencies. These models are further characterized by their dual-phase learning process. Initially, they undergo extensive pretraining on voluminous text corpora, where BERT and RoBERTa employ masked language modeling (MLM) to predict randomly masked tokens within a context. DistilBERT further streamlines this approach through knowledge distillation, absorbing the distilled essence of BERT's capabilities but in a more compact and efficient form. RoBERTa diverges slightly by excluding the next sentence prediction (NSP) task, concentrating instead on optimizing MLM through varied data and training enhancements. BERT and RoBERTa share a similar architectural foundation, composed of multilayer bidirectional transformer encoders. BERT is available in two configurations: BERT Base, with 12 transformer blocks, and BERT Large, with 24 transformer blocks. RoBERTa scales up the BERT architecture, optimizing hyperparameters and training procedures, but fundamentally relies on the same transformer encoder structure. DistilBERT, in contrast, streamlines the BERT architecture by halving the number of Transformer blocks to 6, maintaining the multihead self-attention and feed-forward networks (FFNs) but with fewer parameters, facilitating a balance between performance and efficiency.

2) Mathematical Formulation: First, the initial embedding E_i, given in (1) for a token i, is the sum of its token embedding e_{token_i}, segment embedding e_{segment_i}, and positional embedding e_{position_i}:

    E_i = e_{token_i} + e_{segment_i} + e_{position_i}.    (1)

Then, self-attention calculates the query (Q) vectors given by (2), the key (K) vectors given by (3), and the value (V) vectors given by (4):

    Q = E \cdot W^Q    (2)
    K = E \cdot W^K    (3)
    V = E \cdot W^V.    (4)

Attention weights (A) are computed by

    A = softmax(QK^T / \sqrt{d_k}).    (5)

In the end, the output is given by Attention(Q, K, V) = A \cdot V. The output from the self-attention mechanism for each position passes through an FFN given by the following equation, which applies two linear transformations with rectified linear unit (ReLU) activation in between:

    FFN(x) = max(0, xW_1 + b_1)W_2 + b_2.    (6)

Each sublayer, including self-attention and FFNs, is wrapped with layer normalization and residual connections. For an input x, the output y after a sublayer with a residual connection and layer normalization is given by

    y = LayerNorm(x + SubLayer(x)).    (7)

For classification, the final hidden state of the [CLS] token is used. The logits are obtained by a linear transformation of this hidden state from

    Logits = HiddenState_{[CLS]} \cdot W + b.    (8)

A softmax function is applied to the logits to derive class probabilities from the function

    P(class) = softmax(Logits).    (9)


Fig. 2. Flow diagram of a transfer learning-based model [21].

Fig. 2 shows the architecture of a transformer-based language model for text understanding. The input is tokenized and passed through an embedding layer that combines token embeddings with positional encodings (E0, E1, ..., E7) to retain the order of words. The "CLS" token is used for classification tasks and is included at the beginning of the sequence. Each token embedding is then processed through a stack of 12 transformer encoder blocks, each consisting of multihead attention and feed-forward neural network layers, with add & norm steps following each sublayer. The output from the final transformer block goes through a classification layer with a fully connected neural network, Gaussian error linear unit (GELU) activation, and layer normalization (Norm). The result is then converted to a probability distribution over the possible vocabulary through a softmax function to generate the final output. Symbols used in the diagram include "CLS" for denoting the start and separation of sentences, Ei for positional encodings, and EToken for token embeddings.

3) Algorithm: Algorithm 1 describes a text classification process using Transformer models, involving data preprocessing, tokenization, model compilation, and iterative training with gradient updates.

Algorithm 1: Text Classification Using Transformers (BERT, RoBERTa, DistilBERT)
1. Load dataset D into a DataFrame df.
2. Preprocess df: clean and normalize each text instance t_i; tokenize, remove stopwords, and lemmatize using NLP libraries; then split into a training set D_train and a validation set D_val.
3. Load the tokenizer T and model M with pretrained weights (e.g., 'bert-base-uncased,' 'distilbert-base-uncased,' and 'roberta-base') depending on the chosen architecture.
4. Convert texts into tokens and map them to input IDs, attention masks A, and segment/type IDs S using T.
5. Create tensor slices or TensorFlow datasets from the encoded texts and labels for D_train and D_val.
6. Compile M with an optimizer (e.g., Adam or AdamW), learning rate \gamma, epsilon \epsilon, and clipnorm c if applicable.
7. Define the loss function L as sparse categorical cross-entropy:
       L(y, \hat{y}) = -\sum_{c=1}^{C} y_c \cdot \log(\hat{y}_c)
   where y is the true label, \hat{y} is the predicted label probability, and C is the number of classes.
8. Set metrics to track, such as accuracy Acc = (1/N) \sum_{i=1}^{N} 1(y_i = \hat{y}_i), where N is the number of instances.
9. For each epoch in a predefined number of epochs E:
   For each mini-batch b in D_train:
   a. Perform a forward pass to compute logit predictions P_b = M(b).
   b. Apply the softmax function to obtain predicted probabilities \hat{P}_b = softmax(P_b).
   c. Calculate the loss L_b = -\sum y_b \log(\hat{P}_b).
   d. Perform a backward pass to compute gradients \nabla L_b.
   e. Update the model parameters \theta using the gradients and optimizer: \theta = \theta - \gamma \nabla L_b.
10. Evaluate M on D_val to calculate the validation loss L_val and accuracy Acc_val.

4) Relevance: By harnessing the Transformer's self-attention, these models offer adaptive contextual intelligence, enabling a profound understanding of text nuances. This quality is particularly beneficial in analyzing content where contextual interpretation, such as differentiating between literal and figurative language, is critical, as in mental health discourse on social media platforms. The robust pretraining regimen equips these models to navigate and interpret the linguistic diversity of social media language, including idioms, colloquialisms, and abbreviations. Their capacity to parse and understand such varied linguistic expressions makes them invaluable tools for sentiment analysis and emotional state detection in digital communications. Their ability to process multilingual content and detect regional mental health discourse patterns marks a significant advancement in cross-cultural communication, contributing to global mental health support initiatives. Particularly, DistilBERT's efficiency underscores the potential for real-time sentiment analysis and emotional state detection, essential for monitoring harmful content, crisis intervention, and live customer support on digital platforms.

5) Pros/Cons:
a) Pros:
1) The deep bidirectional architecture of these models allows them to understand the context of words within a sentence better than traditional models. This is crucial for detecting nuanced expressions related to mental health.
2) Social media text often contains irregularities such as slang, abbreviations, and emojis. These models' robustness to different types of text can be an advantage when processing such data.
3) Since these models are pretrained on a large corpus, they come with a built-in understanding of language. They can recognize a range of language constructs that are beneficial when dealing with complex, real-world data such as tweets.
b) Cons:
1) The architectural complexity can make hyperparameter tuning and model optimization challenging, requiring significant expertise to achieve optimal performance.
2) The depth and capacity of these models, while beneficial for capturing linguistic nuances, also increase the risk of overfitting, especially on smaller datasets.
3) The large number of parameters and the complexity of the self-attention mechanism increase the computational demands for training and fine-tuning, particularly for BERT and RoBERTa.
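As a hedged, illustrative counterpart to Algorithm 1, the sketch below fine-tunes a BERT-family checkpoint for sequence classification with the Hugging Face Transformers library and PyTorch. The placeholder texts, labels, and hyperparameters are assumptions for the example, not the study's data or settings; the checkpoint name can be swapped for 'roberta-base' or 'distilbert-base-uncased'.

# Minimal PyTorch sketch of Algorithm 1 (placeholder data, assumed settings).
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

texts = ["i feel hopeless lately", "had a great day with friends"]  # placeholders
labels = [1, 0]                                                     # placeholders
num_labels = 2

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=num_labels)

enc = tok(texts, truncation=True, padding=True, max_length=128,
          return_tensors="pt")
ds = TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))
loader = DataLoader(ds, batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, eps=1e-8)
model.train()
for epoch in range(3):
    for input_ids, attention_mask, y in loader:
        optimizer.zero_grad()
        # Passing integer labels makes the model compute cross-entropy
        # (the sparse categorical loss of step 7) internally.
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=y)
        out.loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clipnorm
        optimizer.step()

model.eval()
with torch.no_grad():
    logits = model(**enc).logits
    preds = logits.argmax(dim=-1)                 # step 9b: argmax of softmax
    accuracy = (preds == torch.tensor(labels)).float().mean().item()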


B. XLNet

1) Concept: XLNet uses permutation language modeling, allowing it to learn from all possible word order permutations, thus gaining a deeper contextual understanding of language. XLNet's pretraining involves permutation language modeling, a technique that differs from traditional sequential predictions by considering every possible permutation of the input sentences. This approach enables XLNet to capture a richer language context than models trained in a single, fixed direction. Built upon the Transformer-XL architecture, XLNet benefits from an extended memory across longer text sequences, which allows it to maintain context effectively over large documents. The core of its architecture is a series of transformer layers that use self-attention mechanisms, enhanced to handle the permutation-based training approach. XLNet operates by considering all possible permutations of the input sentence or document, which allows it to model the probability of a word sequence without being biased toward a specific order. During its pretraining phase, XLNet uses the permutation language modeling technique along with a special two-stream self-attention mechanism. This mechanism comprises a content stream, which processes the actual words, and a query stream, which handles the positions of the masked words. This dual-stream attention allows XLNet to effectively predict the identity of the masked words by considering their context in all possible directions.

2) Mathematical Formulation: First, XLNet's PLM allows the model to learn from all possible permutations of the input sequence, enhancing its ability to understand the bidirectional context. For a sequence of tokens X = {x_1, x_2, ..., x_N}, PLM predicts a token x_t considering all possible permutations of the sequence. Given a permutation p, the probability of x_{p(t)} is modeled as

    P(x_{p(t)} | x_{p(1)}, ..., x_{p(t-1)}).    (10)

The objective is to maximize the expected log likelihood over all permutations p of the sequence:

    L = E_p[ \sum_{t=1}^{N} \log P(x_{p(t)} | x_{p(1)}, ..., x_{p(t-1)}; \theta) ].    (11)

Then, XLNet utilizes a distinct dual-stream approach to self-attention that involves separate streams for content and queries.

a) Content stream attention: For the content representation h_t^c at position t, we compute

    h_t^c = Attention(Q_c, K_c, V_c)    (12)

where Q_c, K_c, and V_c are the query, key, and value matrices for the content stream.

b) Query stream attention: The query representation h_t^q is used for predicting the masked token. It includes the context up to position t in permutation p. The query vector for the position to predict is given by

    h_t^q = Attention(Q_q, K_q, V_q)    (13)

where Q_q, K_q, and V_q are the query, key, and value matrices for the query stream, respectively. Building on Transformer-XL, XLNet introduces a recurrence mechanism with relative positional encoding to maintain context over long sequences. The self-attention mechanism for XLNet is given by

    Attention(Q, K, V) = softmax(QK^T / \sqrt{d_k} + S_{rel}) V.    (14)

Here, S_{rel} represents the relative positional encoding matrix, enhancing XLNet's ability to capture positional relationships within sequences.

Fig. 3. Working of an XLNet-based model [22].

In Fig. 3, the left side of the diagram illustrates the standard self-attention mechanism where each token attends to all others in a sequence, represented by the Q (query), K (key), and V (value) vectors and the subsequent output h_i. The right side depicts a masked two-stream attention model that separates the attention process into two streams: a content stream and a query stream, allowing the content stream to attend to itself while the query stream cannot, denoted by different attention masks.

3) Algorithm: Algorithm 2 describes a text classification process using XLNet, involving data preprocessing, tokenization, model compilation, and iterative training with gradient updates.

Algorithm 2: Text Classification Using XLNet
1. Load dataset D from a CSV file into a pandas DataFrame df.
2. Preprocess df: normalize each t_i in df by lowercasing and removing new lines and special characters, tokenize and remove stopwords using NLTK, lemmatize with SpaCy, and split into D_train and D_val.
3. Load the XLNet tokenizer T_XLNet and model M_XLNet with 'xlnet-base-cased' pretrained weights.
4. Convert D_train and D_val into input features: encode each t_i into input IDs and attention masks using T_XLNet, then structure and batch these into tensors.
5. Compile M_XLNet with:
   a. An Adam optimizer with a learning rate \gamma.
   b. The loss function L, defined as the cross-entropy loss, which can be expressed with log-softmax as
          L(y, \hat{y}) = -\sum_{i=1}^{K} y_i \cdot \log(softmax(z_i))
      where K is the number of classes.
   c. Metrics for model evaluation, with accuracy Acc = (1/N) \sum_{i=1}^{N} 1(y_i = \hat{y}_i), where N is the number of instances.
6. Train M_XLNet on D_train:
   a. For each epoch e out of a total of E epochs, iterate over mini-batches b in D_train.
   b. In each mini-batch b:
      i. Perform a forward pass with M_XLNet to compute the logits z_b.
      ii. Apply the log-softmax function to obtain the predicted probabilities p_b = log(softmax(z_b)).
      iii. Compute the negative log-likelihood loss L_b for p_b with respect to the true labels y_b.
      iv. Backpropagate the loss to compute the gradient \nabla L_b.
      v. Update the model parameters \theta using \nabla L_b through the optimizer.
   c. After each epoch, evaluate M_XLNet on D_val to compute the validation loss and accuracy.

4) Relevance: XLNet's permutation-based training allows it to understand the context of words in sentences comprehensively, without bias toward any order. This feature is crucial for analyzing mental health-related texts, where the significance of certain expressions can highly depend on their context. Mental health discussions in tweets or personal texts may involve complex expressions, idiomatic language, or subtle cues. XLNet's ability to model diverse word orders helps it to capture these nuances more effectively than models trained in unidirectional contexts.

Given the variable length of tweets and personal texts, from short tweets to potentially longer personal texts, XLNet's underlying Transformer-XL architecture enables it to maintain context over long sequences of text. This adaptability ensures effective processing across texts of varying lengths, capturing detailed context in longer discussions or narratives about mental health.
into two streams: a content stream and a query stream, allowing direction makes it exceptionally good at understanding


5) Pros/Cons:
a) Pros:
1) XLNet's permutation-based learning allows for a more nuanced understanding of context. This is particularly beneficial for mental health texts, where the meaning may subtly differ from the literal wording.
2) XLNet is adept at processing texts of varying lengths, from short tweets to longer personal narratives. This flexibility ensures effective context capture across different forms of expression related to mental health.
3) XLNet's ability to model language without a predefined direction makes it exceptionally good at understanding the complex, nuanced language often used in discussions about mental health, potentially leading to higher accuracy in identifying specific mental health issues.
b) Cons:
1) XLNet's sophisticated architecture and the requirement to handle permutations make it computationally intensive. Training and inference with XLNet might require significant computational resources, which could be a limiting factor for some applications.
2) Setting up XLNet, especially customizing and fine-tuning it for specific tasks such as mental health classification, can be more complex and time-consuming compared to simpler models. This complexity may pose challenges for teams with limited machine-learning expertise.
3) Owing to its robust language processing capabilities, XLNet could be susceptible to overfitting when applied to limited or homogenous datasets. This predisposition may diminish the model's proficiency in applying learned knowledge from the training datasets to practical, real-world textual applications.

C. Long Short-Term Memory (LSTM)

1) Concept: LSTMs represent an advanced type of RNN designed for mastery of prolonged data dependencies. Their development aimed to address traditional RNNs' shortcomings, notably the challenging vanishing gradient issue that hampers the learning process for extended sequences. LSTMs have significantly pushed forward the capabilities in the realm of sequential analysis and forecasting.

An LSTM harbors a sophisticated internal architecture to steward and reshape information over durations. This intricate setup empowers LSTMs to retain and advance vital information across protracted sequences, thus serving as a bridge over sizable temporal gaps. The architecture's core element that enables the conservation of data over extensive periods consists of various gates that manage information flow: these include the input gate, forget gate, and output gate.

The input gate is tasked with deciding the quantity of new data to be integrated into the cell state. The forget gate is responsible for identifying and eliminating data that is no longer needed for the cell's current task. The output gate's role is to select the portion of the cell state to be released during the current processing stage.

These components are pivotal in providing the LSTM with the discretion to both retain and omit information, tailoring its memory to the demands of tasks where long-term data retention is essential.

2) Mathematical Formulation: The gates used in an LSTM are as follows.

The forget gate, which is given in (15), decides what information is discarded from the cell state:

    f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f).    (15)

The input gate, which is given by (16), updates the cell state with new information:

    i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i),  \tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C).    (16)

The cell state given by (17) is updated by forgetting the selected information and adding the new candidate values:

    C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t.    (17)

The output gate decides the next hidden state, which is given by

    o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)    (18)
    h_t = o_t \odot \tanh(C_t).    (19)
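A minimal NumPy sketch of one LSTM cell step, implementing (15)-(19) directly, is given below; the dimensions and random weights are illustrative only.

# One LSTM cell step implementing (15)-(19); sizes/weights are illustrative.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

d_x, d_h = 16, 32                                  # input and hidden sizes
rng = np.random.default_rng(0)
W_f, W_i, W_C, W_o = (rng.normal(size=(d_h, d_x + d_h)) * 0.1 for _ in range(4))
b_f, b_i, b_C, b_o = (np.zeros(d_h) for _ in range(4))

def lstm_step(x_t, h_prev, C_prev):
    z = np.concatenate([h_prev, x_t])              # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)                   # (15) forget gate
    i_t = sigmoid(W_i @ z + b_i)                   # (16) input gate
    C_tilde = np.tanh(W_C @ z + b_C)               # (16) candidate state
    C_t = f_t * C_prev + i_t * C_tilde             # (17) cell state update
    o_t = sigmoid(W_o @ z + b_o)                   # (18) output gate
    h_t = o_t * np.tanh(C_t)                       # (19) hidden state
    return h_t, C_t

h, C = np.zeros(d_h), np.zeros(d_h)
for x_t in rng.normal(size=(10, d_x)):             # a toy 10-step sequence
    h, C = lstm_step(x_t, h, C)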


Fig. 4. Single unit of an LSTM network [23].

Fig. 4 illustrates the structure of an LSTM network. The diagram shows the flow of data through the LSTM's cell states (C_0, C_1, ..., C_t) and hidden states (h_0, h_1, ..., h_t). The "tanh" layers create a vector of new candidate values that could be added to the state. Multiplication ("x") signifies elementwise multiplication, serving as a gating mechanism in the LSTM, and addition ("+") denotes elementwise addition, which is used to update the cell states. The circular "σ" represents the sigmoid activation function, which outputs a value between 0 and 1, dictating the degree to which each component is let through a gate.

3) Algorithm: Algorithm 3 details text classification using an LSTM network, involving data preprocessing, label encoding, dataset splitting, and constructing an LSTM model with embedding and dense layers. The model is trained using binary cross-entropy loss and the SGD optimizer.

Algorithm 3: Text Classification Using LSTM
1. Load the dataset D into a DataFrame df and remove any missing values.
2. Apply a preprocessing function to clean, tokenize, remove stopwords, and stem each text instance t_i.
3. Encode the labels using a LabelEncoder, converting categorical labels into a numeric form suitable for classification.
4. Split the dataset into a training set (X_train, Y_train) and a testing set (X_test, Y_test).
5. Define a text vectorization layer to convert raw text into integer sequences, adapting it based on X_train.
6. Construct the LSTM model architecture:
   a. Use an input layer to receive sequences of token IDs.
   b. Apply the TextVectorization layer.
   c. Utilize an Embedding layer to map the integer sequences to dense vector representations.
   d. Add an LSTM layer, with the gate and state updates specified for each cell at each time step.
   e. Add Dense layers with ReLU activation functions for higher-level representation learning, and a final Dense layer with a sigmoid activation function for binary classification.
7. Compile the LSTM model with the binary cross-entropy loss function and the stochastic gradient descent (SGD) optimizer.
8. Train the LSTM model on the training data (X_train, Y_train) and validate its performance on the testing data (X_test, Y_test).

4) Relevance: LSTM networks have been instrumental in advancing NLP, particularly in applications such as the modeling of mental health conditions. LSTMs are designed to learn from sequence data and can remember long-term dependencies because of their unique architecture. This is especially valuable in NLP tasks where context and the order of words play a crucial role in understanding the meaning of a text, which is often the case in detecting and analyzing language for mental health assessment. LSTM models can be trained to detect sentiment in text data, which is a valuable feature for mental health monitoring. They can help to identify negative sentiment that might correlate with conditions such as depression and anxiety. Mental health conditions can be reflected through complex patterns of speech or writing that may include subtle cues about a person's mood or cognitive state. LSTMs can learn these complex patterns due to their ability to retain information over long sequences. Certain mental health conditions manifest through changes over time in how individuals express themselves. LSTMs can recognize shifts in linguistic patterns over time, which can be indicative of changes in mental health.

5) Pros/Cons:
a) Pros:
1) Extended Memory Retention: LSTMs excel at acquiring and retaining information across extensive sequences, outperforming conventional RNNs in this regard.
2) Flexibility: They can handle a variety of sequence lengths, from short to very long sequences, without needing the sequence length to be specified ahead of time.
3) Wide Applicability: LSTMs have been proven effective across a broad range of domains and tasks, especially where understanding temporal dynamics is crucial.
b) Cons:
1) Training Complexity: LSTMs are computationally intensive to train, requiring significant resources and time, especially as the sequence length and dataset size increase.
2) Risk of Overfitting: Without proper regularization, LSTMs can overfit on smaller datasets, learning to memorize rather than generalize.
3) Parameter Tuning: They come with a plethora of hyperparameters (related to both architecture and training) that can be challenging to tune effectively.
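As a concrete, hedged reading of Algorithm 3, the stack it describes (TextVectorization, Embedding, LSTM, ReLU Dense layers, sigmoid output) compiled with binary cross-entropy and SGD could be sketched in Keras as follows. The example texts, labels, vocabulary size, and layer widths are placeholders, not the study's dataset or settings.

# Keras sketch of the Algorithm 3 model (placeholder data, assumed sizes).
import numpy as np
import tensorflow as tf

train_texts = ["i cannot sleep and feel anxious", "enjoying time with family"]
train_labels = [1, 0]

vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=20000, output_sequence_length=100)
vectorizer.adapt(np.array(train_texts))             # step 5: adapt on X_train

model = tf.keras.Sequential([
    vectorizer,                                      # step 6b: vectorization
    tf.keras.layers.Embedding(20000, 128),           # step 6c: dense embeddings
    tf.keras.layers.LSTM(64),                        # step 6d: LSTM layer
    tf.keras.layers.Dense(32, activation="relu"),    # step 6e: hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output
])

model.compile(loss="binary_crossentropy",            # step 7
              optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              metrics=["accuracy"])

model.fit(np.array(train_texts), np.array(train_labels),
          epochs=3, batch_size=32)                   # step 8 (validation omitted)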


2) Mathematical Formulation: Like an LSTM network, a GRU cell is governed by a set of gating computations. The four quantities computed at each time step are as follows.
The update gate determines how much of the past information needs to be passed along to the future, calculated by

    z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)    (20)

The reset gate decides how much of the past information to forget, given by

    r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)    (21)

The candidate hidden state is a combination of the current input and the past hidden state, modulated by the reset gate, denoted by

    \tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)    (22)

The hidden state h_t is the final output of the GRU cell at time step t, combining the old hidden state and the candidate hidden state, as influenced by the update gate, shown in

    h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t    (23)

Fig. 5. Single unit of a GRU network [24].

Fig. 5 illustrates the architecture of a GRU. \sigma denotes the sigmoid activation function, which outputs a value between 0 and 1, and tanh represents the hyperbolic tangent activation function. The symbols \odot and + indicate elementwise multiplication and addition, respectively, while "1-" represents the operation of subtracting the value from one, essentially inverting it. These operations collectively enable the GRU to effectively capture dependencies from the input data (x_1, x_2, x_3, ..., x_n) and produce the output by updating the hidden states (h_0, h_1, ..., h_{t-1}).

3) Algorithm: Algorithm 4 outlines text classification using a GRU neural network, involving data preprocessing, label encoding, and text vectorization. The GRU model is constructed with embedding and dense layers, trained with binary cross-entropy loss and the SGD optimizer.

Algorithm 4: Text Classification Using GRU
1. Load the dataset D from a CSV file into a DataFrame df and remove any missing values.
2. Apply a function preprocessor to clean, tokenize, remove stopwords, and stem each text instance t_i.
3. Encode the labels {y_1, y_2, ..., y_n} to numeric form using label encoding, where each unique label is mapped to a unique integer.
4. Convert the preprocessed text data to a numerical format using TF-IDF vectorization or Count Vectorization.
5. Split the dataset into a training set (X_train, y_train) and a testing set (X_test, y_test).
6. Define a text vectorization layer TextVectorization to convert raw text into integer sequences, setting the maximum vocabulary size and sequence length based on the dataset.
7. Construct the GRU model architecture:
   a. Use an input layer to receive sequences of token IDs.
   b. Utilize an Embedding layer to map the integer sequences to dense vector representations of size 128.
   c. Add a GRU layer with 128 units, defining the update rules for the GRU operations.
   d. Add Dense layers with ReLU activation functions for higher-level representation learning.
   e. Use a final Dense layer with a sigmoid activation function to output the probability of the positive class.
8. Compile the GRU model with a binary cross-entropy loss function and use the Stochastic Gradient Descent (SGD) optimizer.
9. Train the GRU model on the training data (X_train, y_train) and validate its performance on the testing data (X_test, y_test).

4) Relevance: GRUs are designed to work with sequential data, which is a fundamental aspect of NLP. GRUs employ gating systems to regulate the transfer of information and are particularly useful for NLP tasks, including those related to mental health models. Just like LSTMs, GRUs are adept at capturing dependencies from long sequences of data. In the context of mental health, this means that GRUs can effectively use the context from a patient's earlier conversations or written text to inform the understanding of their current mental state. GRUs simplify the gating mechanism and have been shown to perform on par with LSTMs on certain tasks. This reduction in complexity can be particularly advantageous when modeling mental health conditions, where overfitting to the training data is a concern due to the nuanced and highly individual nature of mental health expression.
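To make Algorithm 4 concrete, the following is a minimal sketch of how such a GRU classifier could be assembled with TensorFlow/Keras. The file name, column names, vocabulary size, sequence length, and epoch count are illustrative assumptions rather than settings reported by the study.

```python
# Hedged sketch of Algorithm 4 (GRU text classifier); names and sizes are assumptions.
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("mental_health_posts.csv").dropna()            # step 1 (hypothetical file)
texts = df["text"].astype(str).tolist()
labels = LabelEncoder().fit_transform(df["label"])               # step 3

vectorizer = tf.keras.layers.TextVectorization(                  # step 6
    max_tokens=20000, output_sequence_length=200)
vectorizer.adapt(texts)
X = vectorizer(texts).numpy()                                    # integer token sequences

X_train, X_test, y_train, y_test = train_test_split(             # step 5
    X, labels, test_size=0.2, random_state=42)

model = tf.keras.Sequential([                                    # step 7
    tf.keras.layers.Embedding(input_dim=20000, output_dim=128),  # step 7b
    tf.keras.layers.GRU(128),                                    # step 7c
    tf.keras.layers.Dense(64, activation="relu"),                # step 7d
    tf.keras.layers.Dense(1, activation="sigmoid"),              # step 7e
])
model.compile(loss="binary_crossentropy", optimizer="sgd",       # step 8
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=5,                            # step 9
          validation_data=(X_test, y_test))
```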


5) Pros/Cons:
a) Pros:
1) Mastery of Persistent Dependencies: GRUs are adept at grasping enduring relationships within serial data, surpassing the capabilities of conventional RNNs.
2) Superior Resource Economy: GRUs are recognized for their greater resource efficiency relative to LSTM networks, attributed to their simplified parameter architecture.
3) Versatile Applicability: GRUs have shown prowess across an array of tasks dealing with sequential data, encompassing NLP and the analysis of time series.
b) Cons:
1) Still Prone to Overfitting: Despite their improvements over standard RNNs, GRUs can still suffer from overfitting, especially with smaller datasets.
2) Complexity: While simpler than LSTMs, GRUs are still more complex than basic RNNs, which can make them harder to train and optimize.
3) Limited Processing Power for Very Long Sequences: Despite their ability to handle long-term dependencies better than traditional RNNs, they might still struggle with extremely long sequences compared to some newer architectures such as Transformers.

E. Convolution Neural Network (CNN)

1) Concept: Designed to tackle gridlike data, with image processing being a prime example, the CNN framework is built on several key components such as convolutional layers, pooling layers, and densely connected layers. Convolutional layers process the input with multiple filters to produce feature maps, capturing key elements within the input. Pooling layers serve to simplify these feature maps by reducing their size, thus streamlining the data and lessening the need for computational resources. Following this, the densely connected layers utilize the streamlined feature maps for making predictions or classifying data.
CNNs distinguish themselves by maintaining the spatial relationships found in pixel data through the analysis of features extracted from small, localized segments of the input data. This stands in contrast to traditional neural networks, which typically convert an image into a 1-D array of pixels, thereby losing spatial structure. By keeping the image's spatial hierarchy, CNNs can more adeptly identify patterns.

2) Mathematical Formulation: The multiple layers of a CNN model are given as follows.
The convolutional layer utilizes filters to process the input and distill important features. For an input matrix X \in R^{n \times d}, where n is the sequence length and d is the embedding dimension, and a filter W \in R^{h \times d} of height h, the feature map c is generated by

    c_i = f\left( \sum_{m=1}^{h} \sum_{n=1}^{d} W_{m,n} X_{i+m-1,\,n} + b \right)    (24)

for i = 1, ..., (n - h + 1), where f is a nonlinear activation function (e.g., ReLU), and b is a bias term.
The pooling layer reduces the dimensionality of each feature map while retaining the most important information. Max pooling over a window of size p is defined by

    \hat{c}_j = \max_{1 \le k \le p} c_{j+k-1}    (25)

for j = 1, ..., (n - h + 1) - p + 1, effectively reducing the size of the feature map.
After flattening the pooled feature maps, the result is passed through one or more fully connected layers. For a flattened vector v \in R^{q} and weights W_{fc} \in R^{q \times r}, the output is given by

    z = W_{fc} v + b_{fc}    (26)

where b_{fc} is the bias term and r is the number of output neurons. For binary classification, the output layer often uses a sigmoid function to predict the probability p of the positive class:

    p = \sigma(z) = \frac{1}{1 + e^{-z}}    (27)

For multilabel classification, separate sigmoid units can be used for each class, allowing the model to predict multiple classes independently.

Fig. 6. Different layers of CNN [25].

Fig. 6 describes the feature maps in CNN layers. The center image illustrates the process of applying a convolution operation with a 1-D kernel of size 5 across the feature maps, highlighting the transformation of input data into a series of feature maps. To the right, a max pooling operation reduces dimensionality, selecting the maximum value in each feature region. Following pooling, the feature maps are concatenated into a single vector and subsequently fed into a fully connected layer for classification.

3) Algorithm: Algorithm 5 presents the text classification with a CNN, including preprocessing and encoding text data, constructing the model with embedding, convolutional, pooling, and dense layers, and training using binary cross-entropy loss and the SGD optimizer.

Algorithm 5: Text Classification Using CNN
1. Load the dataset D into a DataFrame df and remove any missing values.
2. Apply a preprocessing function to each text instance t_i to perform cleaning, tokenization, removal of stopwords, and stemming.
3. Encode the labels using a LabelEncoder, converting categorical labels into a numeric form suitable for classification.
4. Vectorize the preprocessed text data using a TextVectorization layer to convert text into integer sequences, setting the maximum vocabulary size and sequence length.
5. Split the dataset into a training set (X_train, y_train) and a testing set (X_test, y_test).
6. Construct the CNN model architecture:
   a. Use an Input layer to receive sequences of token IDs.
   b. Apply the TextVectorization layer.
   c. Utilize an Embedding layer to map the integer sequences to dense vector representations.
   d. Add Convolutional 1D layer(s) with ReLU activation to extract features from the sequence.
   e. Apply a GlobalMaxPooling1D layer to each feature map.
   f. Add Dense layers with ReLU activation for higher-level representation learning, followed by a final Dense layer with a sigmoid activation function to output the probability of the positive class.
7. Compile the CNN model with the binary cross-entropy loss function and use Stochastic Gradient Descent (SGD) as the optimizer.
8. Train the CNN model on the training data (X_train, y_train) and validate its performance on the testing data (X_test, y_test).

4) Relevance: CNNs, typically linked with analyzing visual data, have demonstrated effectiveness in diverse NLP applications, particularly within the mental health field. CNNs can be applied to text analysis by considering segments of text as analogous to image patches and finding patterns within those text "images." CNNs are excellent at automatic feature extraction. In text applications, convolutional layers can detect patterns such as n-grams (combinations of words) that might be indicative of mental health issues when analyzing transcripts or written communication. This is particularly valuable when dealing with extensive collections of EHRs, social media posts, or therapy session transcripts. Just as they do with images, CNNs can capture local dependencies in text (such as the proximity of certain words and phrases) and, with the appropriate architecture, can account for the order of words using temporal convolutions.
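As a concrete counterpart to Algorithm 5, the following is a hedged Keras sketch of the convolutional text classifier. The filter count, kernel size, and vocabulary size are illustrative assumptions, and the integer token sequences (X_train, X_test) are assumed to have been produced by the same vectorization step shown in the GRU sketch above.

```python
# Hedged sketch of Algorithm 5 (1-D CNN text classifier); all sizes are assumptions.
import tensorflow as tf

MAX_TOKENS = 20000                                                     # assumed vocabulary size
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=MAX_TOKENS, output_dim=128),  # step 6c
    tf.keras.layers.Conv1D(filters=128, kernel_size=5,
                           activation="relu"),                        # step 6d
    tf.keras.layers.GlobalMaxPooling1D(),                             # step 6e
    tf.keras.layers.Dense(64, activation="relu"),                     # step 6f
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer="sgd",            # step 7
              metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=5)  # step 8
```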


6) Pros/Cons:
a) Pros:
1) Efficiency: CNNs require fewer parameters compared to fully connected networks, making them more efficient to train.
2) Feature Extraction: They are capable of automatically detecting important features without any human supervision, thanks to their hierarchical structure.
3) Translational Invariance: Once trained, CNNs can recognize objects regardless of where they appear in the image.
b) Cons:
1) High Demand for Resources: The process of training CNNs requires significant computational power, especially when dealing with extensive datasets and intricate network designs.
2) Structured Data Preference: The architecture of CNNs is specifically tailored for processing structured, grid-patterned data, such as that found in images, which may restrict their use with varied data forms.
3) Opacity in Operation: CNNs, along with numerous DL models, often operate opaquely, posing difficulties in deciphering the mechanics of their decision-making processes.

F. Random Forest

1) Concept: The random forest algorithm is a robust and adaptable ML technique that works by building numerous decision trees while in the training phase. It then determines the most frequent class (in classification tasks) or calculates the average outcome (in regression tasks) from the individual trees. It falls under the umbrella of ensemble learning strategies, which leverage the collective outputs of multiple models to enhance accuracy and performance. Such an approach is especially beneficial in mitigating the issue of overfitting, a typical challenge with decision tree algorithms, as it aggregates the predictions from several trees to form a more universally applicable model. The essence of the random forest methodology is to develop a series of decision trees during the training process and to derive conclusions based on either the majority ruling for classifying data or on the averaged forecasts for regression analysis.

2) Mathematical Formulation: There are several steps included in implementing the random forest algorithm; they are as follows.
Given a dataset D with n features and m observations, we construct a series of decision trees {T_1, T_2, ..., T_K}. Each tree T_k is built on a bootstrap sample D_k \subset D.
Then, each tree T_k is grown by selecting the best splits from a random subset of features. The goal is to maximize the information gain (IG), which is given in (28), or minimize the Gini impurity (I) at each split, which is given in (29).
a) Information gain:

    IG(D_k, F) = H(D_k) - \sum_{i=1}^{s} \frac{|D_{k_i}|}{|D_k|} H(D_{k_i})    (28)

where H(D_k) is the entropy of D_k, and s is the number of splits for the feature F.
b) Gini impurity:

    I(D_k) = 1 - \sum_{j=1}^{c} P(y_j)^2    (29)

with c representing the number of classes.
Then, at each decision point within each tree, a number \phi of features is randomly selected from the total n, where typically \phi = \sqrt{n}. The split that provides the best separation according to the chosen metric (IG or Gini) is used.
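Before turning to the ensemble-level prediction in (30) and (31), a small numerical sketch shows how the split criteria in (28) and (29) can be evaluated; the toy label arrays below are invented purely for illustration.

```python
# Sketch of the split criteria in (28) and (29); the toy labels are illustrative only.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)                    # I(D_k) = 1 - sum_j P(y_j)^2

def information_gain(parent, splits):
    # IG(D_k, F) = H(D_k) - sum_i |D_ki| / |D_k| * H(D_ki)
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)

parent = np.array([0, 0, 0, 1, 1, 1, 1, 1])        # labels reaching a node
left, right = parent[:3], parent[3:]               # one candidate split
print("Gini(parent):", round(gini(parent), 3))
print("IG of split :", round(information_gain(parent, [left, right]), 3))
```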


After all the training, the prediction for an instance with feature vector x is given by the mode of the predictions from all individual trees

    C(x) = \mathrm{mode}\{T_1(x), T_2(x), ..., T_K(x)\}    (30)

where C(x) is the final classification outcome and T_k(x) is the prediction by the kth tree.
The random forest aims to minimize the cumulative error across all trees, optimizing the following objective function:

    \min \; \frac{1}{K} \sum_{k=1}^{K} \mathrm{Err}(T_k)    (31)

where Err(T_k) is the error rate of tree T_k on out-of-bag samples.

3) Technical Diagram: Fig. 7 represents the flowchart for random forest classification for diagnosing mental health. The training dataset undergoes bootstrapping to generate multiple subsets, each used to train a corresponding decision tree. The ensemble of decision trees then collectively votes to determine the final classification result.

Fig. 7. Flowchart for random forest algorithm [26].

4) Algorithm: Algorithm 6 outlines text classification using a Random Forest, involving data preprocessing and vectorization, model initialization with hyperparameters, and training with bootstrap samples and recursive splitting.

Algorithm 6: Text Classification Using Random Forest
1. Load the dataset D into a DataFrame df from the specified CSV file.
2. Define and apply a text cleaning function to each text instance t_i to obtain a cleaned text instance t_i'. Apply tokenization and removal of stopwords to t_i'. Apply stemming to each word in t_i'.
3. Split the dataset into a training set D_train and a testing set D_test.
4. Vectorize D_train and D_test using the TF-IDF method to obtain feature matrices X_train and X_test.
5. Initialize the Random Forest classifier M_RF with hyperparameters: number of estimators, criterion, maximum depth, minimum samples split, minimum samples leaf, and random state.
6. Train M_RF using X_train and corresponding labels Y_train:
   a. For each tree T_j in the ensemble M_RF:
      i. Create a bootstrap sample B_j by randomly selecting samples with replacement from X_train.
      ii. Grow the tree T_j by recursively choosing the best split based on the criterion (Gini or entropy).
      iii. Stop tree growth according to the stopping criteria (max depth, min samples at a leaf, etc.).
   b. Aggregate the trees to form the Random Forest classifier M_RF.
7. Make predictions on X_test using M_RF and evaluate the model's performance by calculating accuracy.

5) Relevance: The random forest technique, a form of ensemble learning, builds several decision trees when training and delivers an output that represents the most common class among these trees, particularly when applied to NLP for mental health purposes. Text data often get converted into a high-dimensional space, especially when using bag-of-words or TF-IDF approaches. Random forest can handle high-dimensional data effectively, making it well suited for NLP tasks. Mental health data often involve complex, nonlinear relationships that random forest can capture.

6) Pros/Cons:
a) Pros:
1) Accuracy: Random forest typically offers high accuracy and can handle both binary and multiclass classification problems effectively.
2) Robustness: Random forest exhibits greater resistance to overfitting compared to single decision trees and demonstrates a strong capability to manage data with a high level of variability.
3) Versatility: Capable of performing both classification and regression tasks, making it adaptable to a wide range of data modeling problems.
4) Significance of Features: The random forest algorithm can highlight the most influential features, which can be particularly useful in understanding key factors affecting mental health outcomes.
b) Cons:
1) Model Complexity: Random forest models can become quite complex, making them harder to interpret compared to single decision trees.
2) Computationally Intensive: Training many trees on large datasets can be computationally demanding and time-consuming.
3) Memory Consumption: The ensemble approach requires keeping multiple decision trees in memory, which can be a concern with very large datasets.
4) Less Intuitive Decisions: The decision process of a random forest model is not as straightforward to understand and explain as that of simpler models, which can be a drawback in applications where interpretability is critical.
pared to single decision trees.


G. Support Vector Classification (SVC)

1) Concept: SVC aims to determine an optimal hyperplane or multiple hyperplanes within a multidimensional space that separates the data points into distinct categories. This hyperplane or these hyperplanes serve as a boundary for decision-making, with data points on either side falling into different categories. In the context of binary classification, the objective is clear-cut: identify the hyperplane that most effectively bifurcates the two classes.
What sets SVC apart is its proficiency in not merely locating a decision boundary, but in identifying the boundary that allows for the greatest separation, or margin. This margin is the space between the hyperplane and the closest data points of both classes, which are referred to as support vectors. The principle is that a more substantial margin enhances the model's predictive accuracy on new, unseen data, thereby fortifying it against the risk of overfitting.
When dealing with data that do not naturally fall into a linear separation, SVC makes use of a method known as the kernel trick. By employing this technique, it is possible to project the input features into an expanded dimensional space where they can be linearly divided. Some of the well-known kernel functions used in this context include the linear, polynomial, radial basis function (RBF), and sigmoid kernels.

2) Mathematical Formulation: The steps involved in implementing an SVC are as follows.
Given training vectors x_i \in R^n and binary labels y_i \in {-1, 1}, the optimization problem for the linear SVC is formulated as

    \min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i    (32)

subject to

    y_i(w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad \forall i

where \xi_i are slack variables allowing for margin violation, and C > 0 is a regularization parameter.
The Lagrange dual of the above problem is given by

    \max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle    (33)

subject to

    0 \le \alpha_i \le C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0.

To handle nonlinearly separable data, the kernel trick is applied, given in

    K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)    (34)

This allows the SVM to operate in a transformed feature space without explicitly computing the mapping \phi.
For a new sample x, the decision function is given by

    f(x) = \mathrm{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(x_i, x) + b \right)    (35)

Fig. 8. Flowchart for functioning of SVC [27].

Fig. 8 outlines the flowchart for an SVC-based mental health issue detection mechanism. The kernel trick is a feature of SVC that allows it to handle nonlinear relationships. By applying the kernel trick, an SVC can project linearly inseparable data into a higher dimension where it is separable.

4) Algorithm: Algorithm 7 describes text classification using Linear SVC, involving data preprocessing, label encoding, and TF-IDF vectorization. The Linear SVC model is trained using a linear kernel and hinge loss optimization.

Algorithm 7: Text Classification Using SVC
1. Load the dataset D from a CSV file into a DataFrame df and remove any missing values.
2. Preprocess the text data in df using a function preprocessor, which includes tokenization, conversion to lowercase, removal of stopwords, and stemming. Represent the preprocessing transformation mathematically as:
   t_i' = \mathrm{Stem}(\mathrm{Tokenize}(\mathrm{ToLower}(\mathrm{Clean}(t_i))))
3. Encode the labels {y_1, y_2, ..., y_n} to numeric form using label encoding.
4. Vectorize the preprocessed text using the Term Frequency-Inverse Document Frequency (TF-IDF) method, mathematically represented as:
   X_{tfidf} = \mathrm{TFIDF}(t_i) = tf(t_i) \times idf(t_i)
   where tf(t_i) is the term frequency, and idf(t_i) is the inverse document frequency of term t_i' in the corpus.
5. Split the TF-IDF vectors X_{tfidf} and encoded labels into training (X_train, y_train) and testing sets (X_test, y_test).
6. Initialize the Linear Support Vector Classifier M_SVC with the linear kernel.
7. Train M_SVC on (X_train, y_train); the training involves solving the following optimization problem:
   \min_{w, b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \max\left(0, 1 - y_i'(w^T x_i + b)\right)
   Here, w represents the weights assigned to the features, b is the bias term, C is the penalty parameter, x_i is the feature vector, and y_i' is the label after encoding.

5) Relevance: SVM classifiers, particularly the SVC for classification tasks, are powerful tools in ML that can also be applied to NLP tasks in the mental health domain. SVCs are adept at categorizing text into different groups. In mental health applications, this could mean distinguishing between different emotional states, stress levels, or identifying language indicative of specific mental health disorders based on text data. Text data, when converted into numerical form using techniques such as term frequency-inverse document frequency (TF-IDF) and word embeddings, often exists in high-dimensional spaces. SVCs are particularly well suited for such high-dimensional feature spaces, as they are effective at handling complexity without the curse of dimensionality. One of the strengths of SVCs is their generalization ability. They tend to perform well on unseen data, which is important for developing NLP models that can generalize beyond the training data to real-world mental health assessments.
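A compact scikit-learn sketch of Algorithm 7 is given below. The file and column names and the penalty parameter C are illustrative assumptions; the TfidfVectorizer's lowercasing and stopword removal stand in for the fuller cleaning and stemming of step 2, and LinearSVC optimizes a regularized hinge-type objective corresponding to step 7.

```python
# Hedged sketch of Algorithm 7 (TF-IDF + LinearSVC); names and C are assumptions.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC

df = pd.read_csv("mental_health_posts.csv").dropna()            # step 1 (hypothetical file)
y = LabelEncoder().fit_transform(df["label"])                    # step 3
X_train, X_test, y_train, y_test = train_test_split(             # step 5
    df["text"].astype(str), y, test_size=0.2, random_state=42)

clf = make_pipeline(                                              # steps 2, 4, 6
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LinearSVC(C=1.0))
clf.fit(X_train, y_train)                                         # step 7
print("Accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```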
tal health assessments.


6) Pros/Cons:
a) Pros:
1) Effectiveness in High-Dimensional Spaces: SVC performs effectively even when the number of dimensions exceeds the number of samples.
2) Versatility: The use of different kernel functions (linear, polynomial, RBF, and sigmoid) makes SVC versatile for various types of data.
3) Robustness: It is robust against overfitting, especially in high-dimensional spaces.
b) Cons:
1) Scalability: SVC does not perform well with very large datasets because the training time can be cubic in the size of the dataset.
2) Output Interpretability: Unlike some other models, SVC does not directly provide probability estimates for classifications, making its outputs harder to interpret.
3) Sensitivity to Feature Scaling: SVC is sensitive to the scaling of the input features; thus, proper feature scaling is essential before applying SVC.

V. DATASETS

A. Dataset 1
This dataset is a structured collection of text posts derived from specific online community discussions focused on mental health. It is primarily composed of data in English, acquired via web scraping techniques from subreddit forums, which are known for their user-generated content on a myriad of subjects. Following the acquisition, the dataset underwent a rigorous cleansing process, employing a variety of NLP methods to ensure the text is free from noise and inconsistencies that are typical in raw online data.
The core element of the dataset is a collection of over 36 000 processed text entries, each carefully curated to ensure uniformity and clarity for analytical purposes. These entries are paired with binary labels that categorize them according to their relevance to a specific mental health condition, with one label representing posts that are indicative of such sentiments and another for those that are not. The dataset is characterized by a wide variety of discussions, as reflected by the numerous unique text entries it contains. This diversity highlights the range of conversations and emotional expressions present in the data, making it a rich resource for sentiment analysis and mental health studies.

B. Dataset 2
This dataset serves as a repository of user-generated comments curated from online platforms. It is meticulously organized into two primary sections. The first contains a wide array of user comments, which captures a spectrum of individual thoughts and conversations. The second part of the dataset applies a binary categorization system, which serves to identify whether a comment carries negative connotations that could potentially be harmful or triggering in the context of mental health. This is a critical aspect of the dataset, aiming to differentiate between neutral discourse and that which could be deemed "toxic" or detrimental to individuals facing mental health challenges.
With over 27 000 unique entries, the dataset represents a substantial breadth of data, suggesting its potential utility in various analytical applications. It can prove especially instrumental for delving into sentiment analysis, detecting harmful language, or gaining a deeper understanding of the discourse surrounding mental health on digital platforms.

C. Dataset 3
This dataset is a compilation of textual content sourced from posts within subreddits dedicated to mental health discussions. It is organized to include the title of the post, which acts as the header, and the main body of text, providing a deeper dive into the subject matter. Each post is associated with a "target" value ranging from 0 to 4, where each number corresponds to a specific mental health issue, such as stress, depression, bipolar disorder, personality disorder, and anxiety.
With its focus on mental health, the dataset's structure facilitates an in-depth analysis of how different conditions are discussed and presented in online communities. The target labels are essential for distinguishing the various mental health conditions discussed across the 4651 unique textual instances, allowing for detailed categorization and analysis.
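The cleansing applied to these datasets mirrors the clean, tokenize, remove-stopwords, and stem preprocessor used throughout Algorithms 4-7. The exact cleaning rules used by the authors are not specified, so the sketch below, based on NLTK, is only one plausible implementation; the regular expression and the choice of the Porter stemmer are assumptions.

```python
# Hedged sketch of the clean -> tokenize -> remove stopwords -> stem preprocessor.
# Requires the NLTK "punkt" and "stopwords" resources to be downloaded beforehand.
import re
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

STOPWORDS = set(stopwords.words("english"))
STEMMER = PorterStemmer()

def preprocessor(text: str) -> str:
    text = re.sub(r"http\S+|[^a-z\s]", " ", text.lower())   # clean: drop URLs and non-letters
    tokens = word_tokenize(text)                             # tokenize
    tokens = [t for t in tokens if t not in STOPWORDS]       # remove stopwords
    return " ".join(STEMMER.stem(t) for t in tokens)         # stem

print(preprocessor("I have been feeling anxious and cannot sleep lately."))
```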


TABLE II
RESULTS FROM DATASET 1

Model | Accuracy | Precision | Recall | F1 Score | Cohen's Kappa | Log Loss | AUC ROC | Hamming Loss | Jaccard Similarity Coeff. | Matthews Correlation | Balanced Accuracy
SVC | 95.6 | 96 | 96 | 96 | 91.2 | 1.51 | 95.63 | 4.4 | 91.58 | 91.21 | 95.63
Random Forest | 93.66 | 94 | 94 | 94 | 87.32 | 2.18 | 93.65 | 6.33 | 88.08 | 87.34 | 93.65
CNN | 95.66 | 96 | 96 | 96 | 91.33 | 7.36 | 98.87 | 4.33 | 91.69 | 91.41 | 95.65
LSTM | 96.5 | 97 | 97 | 97 | 93.02 | 12.4 | 98.47 | 3.49 | 93.25 | 93.08 | 96.52
GRU | 93.01 | 94 | 93 | 93 | 86.03 | 21.42 | 97.2 | 6.98 | 86.89 | 86.82 | 93
BERT | 96.96 | 97 | 97 | 97 | 93.14 | 14.24 | 99.18 | 3.42 | 93.35 | 93.16 | 96.58
RoBERTa | 97.67 | 98 | 98 | 98 | 95.34 | 11.38 | 99.56 | 2.32 | 95.45 | 95.34 | 97.66
DistilBERT | 97.54 | 98 | 98 | 98 | 95.08 | 11.19 | 99.66 | 2.45 | 95.2 | 95.08 | 97.54
XLNet | 97.09 | 97 | 97 | 97 | 94.17 | 8.89 | 99.55 | 2.9 | 94.34 | 94.2 | 97.07

TABLE III
RESULTS FROM DATASET 2

Model | Accuracy | Precision | Recall | F1 Score | Cohen's Kappa | Log Loss | AUC ROC | Hamming Loss | Jaccard Similarity Coeff. | Matthews Correlation | Balanced Accuracy
SVC | 81.63 | 83 | 82 | 82 | 76.95 | 52.25 | 96.5 | 18.36 | 69.16 | 77.01 | 81.68
Random Forest | 82.26 | 83 | 83 | 83 | 77.73 | 86.72 | 96.1 | 17.73 | 70.25 | 77.77 | 82.54
CNN | 79.76 | 80 | 80 | 80 | 74.63 | 94.57 | 95.1 | 20.23 | 66.46 | 74.65 | 79.91
LSTM | 72.01 | 74 | 72 | 73 | 64.84 | 165.11 | 91.2 | 27.98 | 56.84 | 65.05 | 72.14
GRU | 70.86 | 73 | 71 | 71 | 63.44 | 199.16 | 90 | 29.14 | 55.38 | 63.73 | 70.93
BERT | 83.33 | 84 | 84 | 84 | 79.14 | 98.67 | 95.2 | 16.67 | 71.59 | 79.21 | 83.8
RoBERTa | 82.53 | 83 | 83 | 83 | 78.14 | 79.22 | 95.2 | 17.47 | 70.32 | 78.2 | 83.07
DistilBERT | 81.63 | 82 | 82 | 82 | 76.99 | 95.51 | 95.3 | 18.36 | 68.99 | 77.04 | 82.06
XLNet | 81.63 | 82 | 82 | 82 | 76.98 | 84.96 | 94.5 | 18.36 | 69.09 | 76.98 | 81.87

VI. RESULTS

The results of our proposed models are summarized in Tables II–IV, each corresponding to the performance on datasets 1, 2, and 3, respectively. For dataset 1, which includes over 36 000 meticulously processed text entries focused on mental health discussions, our models displayed varied performance across several key metrics. The RoBERTa model stands out with the highest accuracy at 97.67%, coupled with a robust F1-score of 98%. This model's strength is further underscored by its low Log loss of 11.38 and the highest AUC ROC value at 99.56, demonstrating its superior capability in distinguishing between relevant and nonrelevant posts. Similarly, DistilBERT shows a commendable performance with an accuracy of 97.54% and an identical F1-score of 98%, though slightly behind RoBERTa in terms of AUC ROC and Log loss values. On the other hand, CNN and LSTM models, while performing well with accuracies of 95.66% and 96.5%, respectively, do not match the top models in discriminative power, as indicated by their respective AUC ROC scores and higher Log loss values. Random forest lags with an accuracy of 93.66% and an F1-score of 94%, showing its limitations in processing this dataset, which is further reflected in its relatively high Log loss of 21.42.
In dataset 2, which consists of user-generated comments, the BERT model outperforms others with an accuracy of 83.33% and an F1-score of 84%. This performance is particularly significant in the context of identifying and classifying mental health-related discourse, where BERT's lower Hamming loss of 16.67 signifies better precision in its predictions. Following closely, RoBERTa and random forest achieve accuracies of 82.53% and 82.26%, respectively, with RoBERTa maintaining a stronger AUC ROC value of 95.2 compared to random forest. However, GRU and LSTM models show noticeably weaker performance with accuracies of 70.86% and 72.01%, coupled with higher Log loss values, highlighting their struggle in effectively categorizing the more nuanced aspects of user comments in this dataset. Despite this, CNN, although not leading in accuracy, shows a balanced performance with an AUC ROC of 95.1 and a balanced accuracy of 79.91%, making it a reliable model for certain aspects of this dataset.
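The metrics reported in Tables II–IV can all be computed with scikit-learn, as the sketch below illustrates. The y_true and y_pred vectors here are tiny invented examples, and y_prob stands in for the probability or score output a classifier would normally provide.

```python
# Sketch of the evaluation metrics used in Tables II-IV; the vectors are toy examples.
import numpy as np
from sklearn import metrics

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3])   # predicted P(class = 1)

print("Accuracy          :", metrics.accuracy_score(y_true, y_pred))
print("Precision / Recall:", metrics.precision_score(y_true, y_pred),
      metrics.recall_score(y_true, y_pred))
print("F1 score          :", metrics.f1_score(y_true, y_pred))
print("Cohen's kappa     :", metrics.cohen_kappa_score(y_true, y_pred))
print("Log loss          :", metrics.log_loss(y_true, y_prob))
print("AUC ROC           :", metrics.roc_auc_score(y_true, y_prob))
print("Hamming loss      :", metrics.hamming_loss(y_true, y_pred))
print("Jaccard similarity:", metrics.jaccard_score(y_true, y_pred))
print("Matthews corr.    :", metrics.matthews_corrcoef(y_true, y_pred))
print("Balanced accuracy :", metrics.balanced_accuracy_score(y_true, y_pred))
```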


TABLE IV
RESULTS FROM DATASET 3

Model | Accuracy | Precision | Recall | F1 Score | Cohen's Kappa | Log Loss | AUC ROC | Hamming Loss | Jaccard Similarity Coeff. | Matthews Correlation | Balanced Accuracy
SVC | 92.22 | 92 | 92 | 92 | 84.44 | 2.68 | 92.2 | 7.77 | 85.57 | 84.45 | 92.21
Random Forest | 95.6 | 96 | 96 | 96 | 91.2 | 1.51 | 95.6 | 4.4 | 91.58 | 91.21 | 95.63
CNN | 88.23 | 91 | 91 | 91 | 81.95 | 44.03 | 96.9 | 9.02 | 83.44 | 81.96 | 90.97
LSTM | 89.68 | 90 | 90 | 90 | 79.42 | 26.18 | 96.9 | 10.31 | 81.27 | 79.96 | 89.8
GRU | 91.56 | 92 | 92 | 92 | 83.14 | 22.86 | 97 | 8.43 | 84.44 | 83.24 | 91.61
BERT | 95.53 | 96 | 96 | 96 | 91.06 | 18.94 | 99.1 | 4.46 | 91.49 | 91.07 | 95.53
RoBERTa | 95.22 | 95 | 95 | 95 | 90.45 | 23.81 | 99 | 4.77 | 90.89 | 90.46 | 95.22
DistilBERT | 95.46 | 95 | 95 | 95 | 90.92 | 22.41 | 99.1 | 4.53 | 91.31 | 90.92 | 95.46
XLNet | 95.39 | 95 | 95 | 95 | 90.87 | 22.67 | 99.1 | 4.39 | 91.27 | 90.84 | 95.39

TABLE V
STATE-OF-THE-ART COMPARISON TABLE

Previous Study | Accuracy | Precision | Recall | F1 Score
[1] | 0.96 | 0.97 | 0.98 | 0.96
[4] | 0.97 | 0.93 | 0.88 | 0.9
[7] | 0.93 | 0.93 | 0.95 | 0.94
[9] | 0.8029 | 0.805 | 0.8 | 0.805
[11] | 0.9 | - | - | -
[12] | 0.44 | 0.553 | 0.493 | 0.493
[13] | 0.78 | 0.79 | 0.78 | 0.78
[14] | 0.893 | 0.912 | 0.907 | 0.909
[15] | 0.7 | - | - | -
[18] | 0.94 | 0.96 | 0.96 | 0.98
[19] | 0.8354 | 0.8814 | 0.857 | 0.71
Proposed Study | 0.9767 | 0.98 | 0.98 | 0.98

In dataset 3, which categorizes posts based on specific mental health conditions, random forest emerges as the top performer with an accuracy of 95.6% and an F1-score of 96%, demonstrating its robust ability to classify text entries according to various mental health issues. BERT and RoBERTa also perform strongly with accuracies of 95.53% and 95.22%, respectively, and F1-scores of 96% and 95%, showcasing their consistent performance across multiple datasets. However, CNN shows a drop in accuracy to 88.23%, suggesting potential limitations in handling the specific mental health categorizations present in this dataset. Meanwhile, LSTM and GRU exhibit moderate performance, with LSTM achieving an accuracy of 89.68% and GRU at 91.56%, both maintaining reasonable precision and recall metrics but not surpassing the leading models. DistilBERT and XLNet also demonstrate strong results, closely mirroring the performance of BERT and RoBERTa, with balanced accuracies above 95% and low Hamming losses, indicating their reliability for text classification tasks within the mental health domain.
In conclusion, across all three datasets, RoBERTa and BERT consistently emerge as the top performers, particularly excelling in metrics such as accuracy, F1-score, and AUC ROC. Their strong results suggest that these models are well suited for handling complex and nuanced text classification tasks, especially in the context of mental health discussions, where accurate classification can be critical for understanding and addressing the underlying issues.

VII. DISCUSSION

This research embarked on an exploratory journey through the digital topography of mental health discussions, leveraging advanced ML techniques to distill meaningful patterns from vast textual datasets. The results presented in this study not only underscore the capabilities of various classifiers in text categorization but also highlight the nuanced differences in performance metrics across different datasets and modeling approaches. Notably, the study illustrates the efficacy of advanced NLP models such as RoBERTa, DistilBERT, and BERT in accurately detecting mental health-related posts, demonstrating impressive metrics across multiple datasets.
In dataset 1, our findings reveal a compelling narrative of ML efficacy. Ensemble methods and transformer-based models, particularly RoBERTa and DistilBERT, achieved outstanding accuracy, precision, recall, and F1 scores, all surpassing the 95% mark. These models also exhibited remarkable Cohen's Kappa


scores, signifying substantial agreement beyond chance—a testament to their robustness in discerning mental health-relevant posts amidst the online chatter. However, this high performance comes with significant computational demands. The time complexity of these transformer models is O(n^2) due to the self-attention mechanism, where n represents the sequence length. This quadratic complexity requires substantial computational resources, including high-performance GPUs with memory often exceeding 16 GB, making the training process computationally intensive and time-consuming, averaging around 32 h for models such as RoBERTa and DistilBERT.
Moving on to dataset 2, characterized by its binary classification of negative versus neutral comments, the models faced a more challenging task. Accuracy scores ranged from 70.86% to 83.33%, with BERT leading in performance. This suggests that more nuanced detection of toxic content may be required, potentially through incorporating more sophisticated contextual embeddings. The complexity of BERT's self-attention mechanism again highlights the balance between model accuracy and computational cost, as the sequence length significantly impacts processing time and resource requirements.
In contrast, models such as XLNet, which also rely on transformer architecture, have similar computational challenges due to their O(n^2) time complexity. However, XLNet typically requires slightly less training time, averaging around 27 h. This reduction in training time, while still significant, reflects the continuous improvements in transformer models to handle large-scale data more efficiently.
In dataset 3, representing a multiclass categorization task, the transformer models, particularly BERT and its optimized variant DistilBERT, continued to excel, achieving balanced accuracy scores well above 90%. These results underscore the efficacy of transformer models in handling complex classification tasks with multiple categories. Again, the computational cost associated with these models, given their O(n^2) time complexity, remains a crucial consideration, especially when scaling these models for real-world applications.
When considering nontransformer models, the study also explored the performance of LSTM and GRU models, which have a time complexity of O(n·m), where n represents the sequence length and m is the dimensionality of the hidden state. Unlike transformers, these models process sequences sequentially, resulting in a linear relationship between sequence length and computation time. Although LSTM and GRU models are less computationally intensive than transformers, they are potentially slower for very long sequences due to their sequential processing. LSTM models, for example, required approximately 7 h to train in this context, while GRU models, with their simpler architecture, were slightly more efficient, completing training in around 5 h.
Additionally, CNN models were employed for text classification tasks. The time complexity of CNNs primarily depends on the sequence length, the number of filters, and the kernel size, resulting in a complexity of O(n·f·k). CNNs, while not as powerful in capturing long-range dependencies as transformers or recurrent networks, offer a more efficient approach for tasks requiring local feature extraction. In this study, CNNs proved to be relatively efficient, with an average training time of approximately 16 200 s (around 4.5 h), making them a viable option for tasks with moderate computational resources.
The performance variance across the datasets can be attributed to the intrinsic complexities of the textual data. The subtleties of language, context-dependency of expressions, and the myriad ways in which mental health issues are communicated online make perfect precision inherently challenging. Nonetheless, the consistently lower log loss values for transformer models across all datasets hint at their superior confidence in predictions, likely due to their deeper contextual understanding gleaned from the training data. This deeper understanding, however, comes at the cost of increased computational complexity and training time.
These findings resonate with the growing consensus in the NLP community that transformer-based models, with their deep contextual understanding, generally outperform traditional ML approaches, especially in tasks involving rich, nuanced language data such as mental health discussions. This study reaffirms the importance of selecting appropriate models based not only on the dataset's nature and the specific nuances of the classification task but also considering the computational resources available for model training and deployment.
While the article discusses the performance evaluation of the models, it does not mention any real-world evaluation or practical applications of the approach. In addressing this gap, a case study involving a mental health organization deploying a real-time monitoring system using a fine-tuned RoBERTa model to identify social media posts indicating severe distress or suicidal intent demonstrated initial success. However, the model faced challenges when new slang or terminology appeared, which was not covered in the training data, necessitating periodic retraining to maintain accuracy and relevance. This retraining process, while crucial, is resource-intensive due to the model's O(n^2) complexity, further emphasizing the need for powerful computational infrastructure in real-world deployments. This, along with other examples such as the deployment of mental health chatbots and systems for detecting toxic content in online forums, underscores the need for continuous updates, human oversight, and robust strategies to ensure the effectiveness and ethical application of these methods in dynamic, real-world environments.
However, when considering real-world applicability, several challenges arise, particularly in deployment and the robustness of these methods in dynamic environments. Challenges include handling data drift—where evolving online discourse necessitates models to adapt continuously to new language patterns—and managing the computational demands required to scale these models in real-time applications. Additionally, ensuring the interpretability of these complex models is crucial, especially in sensitive domains such as mental health, where understanding the rationale behind a prediction is vital.
In conclusion, this research highlights the powerful capabilities of transformer models in text categorization tasks related to mental health, while also addressing the practical challenges in deploying these models effectively and ethically in real-world scenarios. The balance between model performance and computational resource demands remains a key consideration for future


applications, particularly in environments where real-time processing and adaptability are essential.

VIII. CONCLUSION

This article has provided a specific and thorough analysis of various ML and DL methods for detecting mental health issues. The findings highlight that ensemble methods and transformer-based models such as RoBERTa and DistilBERT have demonstrated exceptional performance in accurately identifying mental health-related posts, with accuracy, precision, recall, and F1 scores surpassing 95%. Additionally, BERT has shown proficiency in classifying negative or toxic comments with the highest accuracy among the models tested.
Despite these successes, the research faced several limitations. The inherent complexities of natural language and the contextual nuances in expressing mental health issues online posed significant challenges, impacting the predictive performance of the models across different datasets. Specifically, the detection of toxic content required more sophisticated detection of context and language subtleties, an area where even advanced models such as BERT struggled to some extent. Furthermore, the computational intensity associated with training large-scale models such as BERT and RoBERTa is considerable, making them difficult to deploy in resource-constrained environments. Another limitation is the potential risk of overfitting on smaller datasets, which could lead to reduced generalizability when applied to new data.
To address these challenges, future work should focus on refining these models to enhance their sensitivity to the subtleties of language used in mental health contexts. This could involve incorporating larger and more diverse datasets, including multilingual data, to improve the robustness and generalizability of the models. Additionally, exploring newer architectures and hybrid models could offer further improvements in the accurate detection of mental health issues from textual data. Future research should also consider the practical aspects of deploying these models in real-world scenarios, such as optimizing models for scalability and efficiency in dynamic environments, and ensuring they are interpretable and ethically sound in sensitive applications. By addressing these limitations and exploring these future directions, the research can contribute to more effective and reliable tools for mental health analysis using NLP techniques.

REFERENCES

[1] A. Khan, A. Ahmed, S. Jan, M. Bilal, and M. F. Zuhairi, "Abusive language detection in Urdu text: Leveraging deep learning and attention mechanism," IEEE Access, vol. 12, pp. 37418–37431, 2024, doi: 10.1109/ACCESS.2024.3370232.
[2] G. Attigeri, A. Agrawal, and S. V. Kolekar, "Advanced NLP models for technical university information chatbots: Development and comparative analysis," IEEE Access, vol. 12, pp. 29633–29647, 2024, doi: 10.1109/ACCESS.2024.3368382.
[3] A. S. Sunar and M. S. Khalid, "Natural language processing of student's feedback to instructors: A systematic review," IEEE Trans. Learn. Technol., vol. 17, pp. 741–753, 2024, doi: 10.1109/TLT.2023.3330531.
[4] F. K. Sufi and I. Khalil, "Automated disaster monitoring from social media posts using AI-based location intelligence and sentiment analysis," IEEE Trans. Computat. Social Syst., vol. 11, no. 4, pp. 4614–4624, Aug. 2024, doi: 10.1109/TCSS.2022.3157142.
[5] M. Nouman, H. Sara, S. Y. Khoo, M. P. Mahmud, and A. Z. Kouzani, "Mental health prediction through text chat conversations," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Gold Coast, Australia, 2023, pp. 1–6, doi: 10.1109/IJCNN54540.2023.10191849.
[6] D. P. Kadam and K. T. V. Reddy, "A study of machine learning models for predicting mental health through text analysis," in Proc. 1st DMIHER Int. Conf. Artif. Intell. Educ. Ind. 4.0 (IDICAIEI), Wardha, India, 2023, pp. 1–5, doi: 10.1109/IDICAIEI58380.2023.10406845.
[7] I. J. Dristy, A. M. Saad, and A. A. Rasel, "Mental health status prediction using ML classifiers with NLP-based approaches," in Proc. Int. Conf. Recent Progresses Sci. Eng. Technol. (ICRPSET), Rajshahi, Bangladesh, 2022, pp. 1–6, doi: 10.1109/ICRPSET57982.2022.10188544.
[8] D. W. Otter, J. R. Medina, and J. K. Kalita, "A survey of the usages of deep learning for natural language processing," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 2, pp. 604–624, Feb. 2021, doi: 10.1109/TNNLS.2020.2979670.
[9] C. J. Varshney, A. Sharma, and D. P. Yadav, "Sentiment analysis using ensemble classification technique," in Proc. IEEE Students Conf. Eng. Syst. (SCES), Prayagraj, India, 2020, pp. 1–6, doi: 10.1109/SCES50439.2020.9236754.
[10] V. Aggarwal, J. Kaur, T. Walia, and D. Kaur, "Harnessing linguistic markers for early mental health detection via social media," in Proc. Int. Conf. Adv. Comput. Commun. Technol. (ICACCTech), Banur, India, 2023, pp. 292–297, doi: 10.1109/ICACCTech61146.2023.00054.
[11] S. Mathin, D. S. Chandra, A. R. Sunkireddy, B. J. V. Varma, S. Hariharan, and V. Kukreja, "Personalized mental health analysis using artificial intelligence approach," in Proc. Int. Conf. Adv. Data Eng. Intell. Comput. Syst. (ADICS), Chennai, India, 2024, pp. 1–6, doi: 10.1109/ADICS58448.2024.10533648.
[12] Y. J. Msosa et al., "Trustworthy data and AI environments for clinical prediction: Application to crisis-risk in people with depression," IEEE J. Biomed. Health Inform., vol. 27, no. 11, pp. 5588–5598, Nov. 2023, doi: 10.1109/JBHI.2023.3312011.
[13] M. Danner et al., "Advancing mental health diagnostics: GPT-based method for depression detection," in Proc. 62nd Annu. Conf. Soc. Instrum. Control Eng. (SICE), Tsu, Japan, 2023, pp. 1290–1296, doi: 10.23919/SICE59929.2023.10354236.
[14] K. K. Dixit, S. Pundir, A. Shrivastava, C. P. Kumar, A. P. Srivastava, and P. Singh, "Analyzing textual data for mental health assessment: Natural language processing for depression and anxiety," in Proc. 10th IEEE Uttar Pradesh Sect. Int. Conf. Elect., Electron. Comput. Eng. (UPCON), Gautam Buddha Nagar, India, 2023, pp. 1796–1802, doi: 10.1109/UPCON59197.2023.10434291.
[15] G. Serrano and D. Kwak, "ESAI: An AI-based emotional support system to assist mental health disorders," in Proc. Congr. Comput. Sci. Comput. Eng. Appl. Comput. (CSCE), Las Vegas, NV, USA, 2023, pp. 1348–1354, doi: 10.1109/CSCE60160.2023.00226.
[16] Z. Ahmad, R. Maskat, and A. Mohamed, "Harnessing natural language processing for mental health detection in Malay text: A review," in Proc. 4th Int. Conf. Artif. Intell. Data Sci. (AiDAS), IPOH, Malaysia, 2023, pp. 29–35, doi: 10.1109/AiDAS60501.2023.10284653.
[17] A. Mittal, L. Dumka, and L. Mohan, "A comprehensive review on the use of artificial intelligence in mental health care," in Proc. 14th Int. Conf. Comput. Commun. Netw. Technol. (ICCCNT), Delhi, India, 2023, pp. 1–5, doi: 10.1109/ICCCNT56998.2023.10308255.
[18] M. H. Lee and R. Kyung, "Mental health stigma and natural language processing: Two enigmas through the lens of a limited corpus," in Proc. IEEE World AI IoT Congr. (AIIoT), Seattle, WA, USA, 2022, pp. 688–691, doi: 10.1109/AIIoT54504.2022.9817362.
[19] S. A. N. Siddik, B. M. Arifuzzaman, and A. Kalam, "Psyche conversa—A deep learning based chatbot framework to detect mental health state," in Proc. 10th Int. Conf. Inf. Commun. Technol. (ICoICT), Bandung, Indonesia, 2022, pp. 146–151, doi: 10.1109/ICoICT55009.2022.9914844.
[20] K. Rani, H. Vishnoi, and M. Mishra, "A mental health chatbot delivering cognitive behavior therapy and remote health monitoring using NLP and AI," in Proc. Int. Conf. Disruptive Technol. (ICDT), Greater Noida, India, 2023, pp. 313–317, doi: 10.1109/ICDT57929.2023.10150665.


[21] Towards Data Science. Accessed: Sep. 20, 2024. [Online]. Available: https://towardsdatascience.com/bert-explained-state-of-the-art-language-model-for-nlp-f8b21a9b6270
[22] ResearchGate. Accessed: Sep. 20, 2024. [Online]. Available: https://www.researchgate.net/figure/XLNet-architecture-a-Query-stream-attention-b-Content-stream-attention-and-c-The_fig3_379781094
[23] ML Review. Accessed: Sep. 20, 2024. [Online]. Available: https://blog.mlreview.com/understanding-lstm-and-its-diagrams-37e2f46f1714
[24] upGrad. Accessed: Sep. 20, 2024. [Online]. Available: https://www.upgrad.com/blog/basic-cnn-architecture/
[25] ResearchGate. Accessed: Sep. 20, 2024. [Online]. Available: https://www.researchgate.net/figure/The-architecture-of-a-gated-recurrent-unit-GRU-cell_fig2_350933270
[26] ResearchGate. Accessed: Sep. 20, 2024. [Online]. Available: https://www.researchgate.net/figure/Architecture-of-the-Random-Forest-algorithm_fig1_337407116
[27] ResearchGate. Accessed: Sep. 20, 2024. [Online]. Available: https://www.researchgate.net/figure/A-schematic-diagram-for-support-vector-machine-SVM-training-and-testing-process_fig3_350007804

Yashwanth Kasanneni is currently working toward the Bachelor of Technology degree in computer science and engineering with Vellore Institute of Technology, Vellore, India.
He is a creative and detail-driven individual specializing in crafting innovative solutions by exploring the intricate nuances. He has excelled in software development, high-performance coding, and backend web development. His expertise spans these critical areas, showcasing his ability to deliver robust and efficient solutions. He is resolute in his pursuit of surpassing personal and professional milestones and excited to contribute with a distinctive perspective and collaborate with like-minded individuals, fostering innovation. He has completed an internship at Enabled Analytics as a Junior Salesforce Developer. Additionally, he has worked extensively in the fields of OpenCV, machine learning, deep learning, and natural language processing.
Mr. Yashwanth is an active member of Anokha, an NGO club, Vellore, India. Anokha is a non-government organization that directly benefits thousands of children through various live welfare projects focused on education, healthcare, livelihood, and women empowerment. He has been actively involved in the club, participating in numerous outreach events to support these initiatives.

Achyut Duggal is currently working toward the Bachelor of Technology degree in computer science and engineering with Vellore Institute of Technology (VIT), Vellore, India.
He is a dedicated and innovative Developer in computer science with VIT. With a strong foundation in the field, he has excelled in various technical roles, showcasing his expertise in GUI development, full-stack mobile applications, and robotics. He has done internships at various organizations as a Software Developer and an Application Developer. He has worked in the fields of image processing, machine learning, and deep learning. His research interests include computer vision and retrieval augmented generation.
Mr. Achyut is a part of RoboVITics, the official robotics club of VIT, and served as the Vice-Chairperson of the club from 2023 to 2024. The club won the "Best Technical Club Award" at VIT under his tenure.

R. Sathyaraj completed his B.Tech., M.E., and Ph.D. degrees. He is a Professor with the School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamilnadu, India. His research interests include software fault prediction, natural language processing, machine learning, and deep learning.

S. P. Raja was born in Sathankulam, Tuticorin District, Tamilnadu, India. He completed his schooling at Sacred Heart Higher Secondary School, Sathankulam. He received the B.Tech. degree in information technology from Dr. Sivanthi Aditanar College of Engineering, Tiruchendur, Tamilnadu, in 2007, and the M.E. degree in computer science and engineering and the Ph.D. degree in image processing from Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu, in 2010 and 2016, respectively. Currently, he is working as an Associate Professor in the School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
