Effective Analysis of Machine and Deep Learning Methods For Diagnosing Mental Health

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 12, NO. 1, FEBRUARY 2025
Abstract—The increasing incidence of mental health issues demands innovative diagnostic methods, especially within digital communication. Traditional assessments are challenged by the sheer volume of data and the nuanced language found on social media and other text-based platforms. This study seeks to apply machine learning (ML) to interpret these digital narratives and identify patterns that signal mental health conditions. We apply natural language processing (NLP) techniques to analyze sentiments and emotional cues across datasets from social media and other text-based communication. Using ML, deep learning, and transfer learning models such as bidirectional encoder representations (BERTs), robustly optimized BERT approach (RoBERTa), distilled BERT (DistilBERT), and generalized autoregressive pretraining for language understanding (XLNet), we assess their ability to detect early signs of mental health concerns. The results show that BERT, RoBERTa, and XLNet consistently achieve over 95% accuracy, highlighting their strong contextual understanding and effectiveness in this application. The significance of this research lies in its potential to revolutionize mental health diagnostics by providing a scalable, data-driven approach to early detection. By harnessing the power of advanced NLP models, this study offers a pathway to more timely and accurate identification of individuals in need of mental health support, thereby contributing to better outcomes in public health.

Index Terms—Bidirectional encoder representation (BERT), deep learning (DL), distilled BERT (DistilBERT), generalized autoregressive pretraining for language understanding (XLNet), machine learning (ML), mental health prediction, natural language processing (NLP), robustly optimized BERT approach (RoBERTa), social media analytics, transfer learning.

Received 11 June 2024; revised 24 September 2024; accepted 24 October 2024. Date of publication 8 November 2024; date of current version 31 January 2025. (Corresponding author: S. P. Raja.)
The authors are with the School of Computer Science and Engineering (SCOPE), Vellore Institute of Technology, Vellore 632014, India (e-mail: [email protected]; [email protected]; [email protected]; [email protected]).
Digital Object Identifier 10.1109/TCSS.2024.3487168

I. INTRODUCTION

MENTAL health disorders represent a major challenge to public health, affecting millions worldwide and contributing significantly to the global burden of disease. Traditional methods of diagnosing and treating mental health issues often involve subjective assessments and therapy and can be time intensive.

With the use of big data and advanced computational technologies, machine learning (ML) and deep learning (DL) methods have emerged as powerful tools that can potentially transform the landscape of mental health diagnostics. The primary goal of employing ML and DL in this context is to improve the accuracy, efficiency, and accessibility of mental health diagnostics. These technologies offer the promise of detecting subtle patterns in large datasets that human clinicians might not easily recognize. This includes analyzing everything from electronic health records (EHRs) and clinical notes to voice recordings and social media posts. Such analyses can reveal insights into behavioral patterns, linguistic cues, and other markers that are indicative of mental health status.

This article seeks to explore the various ML and DL techniques that have been applied to detect mental health issues. It will cover a range of methods including, but not limited to, support vector machines, decision trees, recurrent neural networks (RNNs), and convolutional neural networks (CNNs). Each method has its strengths and limitations, which are examined considering their application to diverse types of data, including social media interactions, unstructured text, and real-time interaction data.

Furthermore, the integration of natural language processing (NLP) techniques to analyze text for sentiment and emotion provides a promising avenue for noninvasive mental health monitoring and intervention. The potential of these technologies to act as early warning systems, identifying individuals at risk and facilitating timely intervention, is an important point of discussion.

This introduction sets the stage for a detailed review and critical analysis of existing studies, highlighting the innovative ways in which ML and DL are being leveraged to address mental health challenges. It also discusses the ethical implications and practical hurdles in the application of these technologies, paving the way for an informed discussion on how these challenges can be overcome. Through this exploration, the article contributes to the broader discourse on mental health care, encouraging the responsible and effective integration of cutting-edge technologies in clinical settings.

Overall, the study makes the following contributions.
1) It was shown that generalized autoregressive pretraining for language understanding (XLNet) outperformed the other models, achieving over 97% accuracy.
2329-924X © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Zhejiang University. Downloaded on March 27,2025 at 03:34:01 UTC from IEEE Xplore. Restrictions apply.
KASANNENI et al.: EFFECTIVE ANALYSIS OF MACHINE AND DEEP LEARNING METHODS 275
2) The algorithms were trained across multiple datasets to ensure robustness and generalizability, enabling a comprehensive evaluation of their performance in diverse contexts.
3) The viability of using transfer learning to predict mental health conditions from textual data was studied. Complex texts were analyzed, and meticulous data preprocessing was performed using transfer learning and NLP techniques to ensure all critical information was retained.

II. MOTIVATION

In the digital era, social media platforms have evolved into vital spaces for personal expression, community building, and, notably, the sharing of mental health experiences. These platforms hold a wealth of information that, if harnessed responsibly, can offer unprecedented insights into public mental health trends and individual well-being. The impetus for this study is anchored in the imperative to better understand mental health discourse within these digital communities. By harnessing the capabilities of ML, we strive to develop tools that can detect patterns and sentiments related to mental health, potentially offering early identification of at-risk individuals and communities.

Despite the considerable growth in online mental health discussions, there remains a paucity of effective computational tools that can navigate the complexities of natural language to identify relevant mental health information. The nuanced nature of such discourse, where context is key and expressions are diverse, poses significant challenges to traditional text analysis techniques. The current landscape demands more sophisticated approaches that can not only discern the subtleties of language but do so with sensitivity and accuracy commensurate with the seriousness of mental health issues.

Furthermore, the detection of negative and toxic content on these platforms is critical in safeguarding users from harmful interactions that can exacerbate mental health conditions. There is a compelling need to refine these detection methods to contribute to safer online environments.

III. LITERATURE REVIEW

Khan et al. [1] focused on detecting abusive language in Urdu, a low-resource language, by developing the "dataset of Urdu abusive language" (DUAL). They applied DL techniques and models such as logistic regression, Gaussian Naïve Bayes, support vector machines (SVMs), and random forest, with random forest achieving the highest effectiveness. The inclusion of an attention layer in a bi-LSTM model, using custom Word2Vec embeddings, notably improved detection performance. This study is valuable for addressing resource scarcity in Urdu and enhancing detection accuracy with advanced models. However, the dataset's size and specific focus on Urdu may limit the findings' generalizability to other languages or more complex abusive content. Additionally, while random forest performs well, it might not fully capture the nuanced nature of abusive language, suggesting a need for more sophisticated approaches. Attigeri et al. [2] developed and assessed various NLP models tailored for managing FAQs in educational environments, employing advanced tools such as TensorFlow, PyTorch, and TF-IDF vectorization. The models were fine-tuned using techniques such as stochastic gradient descent and dropout, addressing a gap in NLP applications within educational institutions. Their study provides a comprehensive evaluation of different NLP models, showcasing their effectiveness in handling diverse queries and the strength gained from using advanced technologies and optimization methods. While the study highlights the practical benefits of these models in improving student support and information distribution, it has certain limitations. The narrow scope of the datasets used may not fully capture the diversity of queries encountered in broader educational settings. Additionally, the study does not delve into the scalability of the models for larger or more complex datasets, nor does it thoroughly explore how these models could be integrated into existing educational systems, potentially restricting their practical application.

Sunar and Khalid [3] conducted a systematic review of NLP techniques in student feedback analysis, categorizing studies using Creswell's five-step process based on objectives, methods, models, and tools. The review highlights the need for more advanced NLP techniques and broader language support, suggesting future research to develop models and lexicons tailored to educational contexts. Although the review provides a comprehensive synthesis of the current literature and identifies key gaps, its dependence on existing studies might introduce biases, and it does not offer specific strategies for advancing NLP. Additionally, the focus on categorization may overlook nuanced or interdisciplinary approaches that could enhance feedback analysis. Sufi and Khalil [4] introduced an artificial intelligence (AI)-based method for real-time disaster monitoring using social media, integrating named entity recognition (NER), sentiment analysis, CNNs, and the Getis-Ord Gi* algorithm. This approach effectively extracts location-based sentiments from tweets, enhancing disaster response with high accuracy and broad language coverage. The study's strengths include its innovative combination of AI and NLP techniques, which enable the accurate extraction and analysis of disaster-related data across multiple languages and regions. The reported high accuracy (97%), precision (0.93), and F1-score (0.90) validate the method's effectiveness in identifying disaster locations and assessing public sentiment. The study does have limitations, particularly due to its reliance on potentially noisy and incomplete social media data, which could affect the consistency of location intelligence. Moreover, despite supporting multiple languages, the accuracy of sentiment analysis may differ in non-English contexts, which could restrict its broader applicability. Further exploration is needed to assess the scalability of the approach for handling larger datasets or real-time global processing.

Nouman et al. [5] applied NLP for mental health prediction using a novel dataset from the Lyf Support app, with a BiGRU model showing superior accuracy. The study emphasizes the importance of well-labeled datasets, enhancing model performance for real-time mental health monitoring. A major strength is the use of a dataset carefully labeled by psychologists, ensuring reliable findings. The study falls short, however, due to the small sample size and reliance on oversampling, which may limit the
generalizability and robustness of the model. The study calls for further research with larger datasets to validate and enhance the applicability of the proposed models. Kadam and Reddy [6] explored the prediction of mental health conditions from social media text using ML and DL models, focusing on complex datasets and nuanced language to enhance accuracy. The study expands the scope of mental health prediction to include conditions such as digital addiction and substance use disorders, demonstrating the superiority of DL over traditional methods. A key strength is the innovative approach to analyze a broader range of conditions, with DL effectively handling complex language patterns in social media. The reliance on social media data introduces potential biases, and the small, English-specific dataset may limit generalizability. The study also highlights challenges in practical implementation, such as the need for continuous model updates and managing evolving language patterns.

Dristy et al. [7] assessed ML classifiers for predicting mental health status from text data processed using NLP techniques, emphasizing feature extraction and model selection for high accuracy. The study highlights how traditional data processing methods can be adapted for mental health diagnostics, with decision trees and support vector machines proving effective when paired with TF-IDF scores and lexicon-based sentiment markers. The research's strength lies in its comprehensive evaluation of classifiers and the importance of feature engineering in boosting predictive accuracy. The study's small dataset could restrict the generalizability of its findings, and the focus on traditional NLP methods might overlook the potential benefits of recent DL advancements that could improve accuracy across more diverse datasets and languages. Otter et al. [8] surveyed the use of DL in NLP tasks, particularly focusing on sentiment analysis crucial for mental health monitoring. They examined the adaptability of DL models in handling complex linguistic patterns to detect subtle changes in mood and emotions from text. The survey covers various DL architectures and transformers, highlighting their strengths in processing large-scale unstructured text data. The authors also discuss challenges such as the need for large datasets and high computational costs, proposing solutions such as transfer learning and model compression techniques. While the article offers a comprehensive overview of state-of-the-art models, its broad scope may limit in-depth discussion on specific models or challenges. Moreover, the practical implementation of proposed solutions such as transfer learning and model compression is not fully explored, leaving gaps in their applicability.

Varshney et al. [9] introduced an ensemble classification method for sentiment analysis to enhance mental health monitoring, focusing on improving the robustness and accuracy of detecting emotional states from text. This approach combines multiple ML algorithms, neural networks, and decision trees, leveraging their strengths to overcome individual weaknesses. The study emphasizes the method's effectiveness in handling complex linguistic features such as sarcasm, which are challenging for single-model systems, and integrates contextual and semantic analysis to refine sentiment detection accuracy. While the ensemble technique improves reliability and accuracy, its reliance on multiple models increases computational complexity and resource demands, potentially limiting scalability. Additionally, the study's applicability across different languages and cultural contexts may be constrained, affecting its generalizability. Aggarwal et al. [10] explored using linguistic markers from social media posts for early mental health detection, proposing an NLP and DL-based model to analyze these elements. This approach offers a timely, noninvasive method for mental health monitoring, capitalizing on social media's widespread use for psychological assessment and intervention. The study's strength lies in its innovative use of social media as a scalable, real-time data source. The integration of DL with linguistic analysis enhances the model's ability to detect subtle mood and emotional changes, making it valuable for early intervention. The reliance on social media data introduces potential biases due to the varying quality and representativeness of the information, and the model's effectiveness might be limited by the accuracy of linguistic markers and the difficulty in generalizing findings across diverse cultural and linguistic contexts.

Mathin et al. [11] explored personalized mental health analysis using AI, employing NLP techniques to extract insights on anxiety, depression, and stress from user inputs. They utilized AI models such as decision tree, random forest, multinomial Naive Bayes, and XGBoost, with the combination of multinomial Naive Bayes and XGBoost achieving the highest accuracy. The study also evaluated the PSYCHE system, a wearable tool integrated with a smartphone, and introduced "Diary Bot," a chatbot for expressive writing to support mental well-being. This research demonstrates AI's potential in providing tailored therapeutic strategies, with high accuracy in mental health predictions. The inclusion of wearable technology and a chatbot enhances the practical applicability of the findings. The reliance on specific AI models might reduce the system's flexibility, and concentrating on predefined keywords could limit its ability to adapt to complex or emerging mental health issues. Furthermore, the PSYCHE system and Diary Bot need additional validation in diverse real-world environments to confirm the broader applicability of the findings.

Msosa et al. [12] explored using AI and NLP to predict mental health crises in individuals with depression, utilizing EHRs from Mersey Care that include both structured and unstructured data. The study employed random forest models, gradient boosting trees, and LSTM networks, with the LSTM network demonstrating the best performance. This research highlights the potential of integrating AI with EHR data to predict mental health crises, suggesting its use in clinical decision support tools. The study's strength lies in its comprehensive approach, leveraging a large dataset and combining data types to improve the predictive accuracy. The LSTM network's ability to capture temporal data aspects enhances its relevance for real-time clinical applications. However, the reliance on data from a single source may limit generalizability across different healthcare systems, and the high computational requirements of these models could challenge their widespread implementation, particularly in resource-limited settings. Danner et al. [13] introduced a novel AI application for detecting depression using advanced transformer networks, analyzing clinical interviews with bidirectional encoder representation (BERT)-based models, GPT-3.5, and ChatGPT-4. They enhanced traditional datasets with simulated data to improve the model performance while addressing data protection concerns. This approach significantly
TABLE I
CHARACTERISTIC COMPARISON TABLE
Columns: Previous Study; Text Classification; Sentiment Analysis; Feature Extraction; Deep Linguistic Processing; Language Modeling; Transfer Learning; Attention Mechanisms; Data Augmentation. Rows: studies [1]–[20] and the proposed study. (The per-study check marks did not survive text extraction.)
outperforms previous methods in detecting depression from linguistic patterns, demonstrating AI's potential in revolutionizing mental health care through early detection and intervention. The study's strengths include the innovative use of advanced transformers and simulated data to overcome the data scarcity and improve the accuracy. The reliance on simulated data could restrict the model's applicability in real-world clinical environments. Despite addressing the ethical, legal, and social implications of using AI in mental health care, the study does not fully explore practical solutions to these challenges, leaving some important issues insufficiently addressed. Table I presents a comparative analysis of NLP techniques across multiple studies, with the proposed study distinguishing itself by incorporating a range of advanced methods.

Dixit et al. [14] utilized NLP techniques to assess mental health by targeting depression and anxiety markers in textual data, using sentiment analysis, emotion identification, and linguistic pattern detection. Their approach, applied to a diverse dataset from social media, online forums, and healthcare records, significantly improved F1 scores, recall, and accuracy. The study's strength lies in its comprehensive approach and diverse dataset, enhancing generalizability and accuracy in detecting mental health markers. The focus on textual data might miss nonverbal cues, and the complexity of the NLP techniques could pose challenges for implementation in resource-limited environments. The study suggests that future research should explore cross-cultural datasets and integrate multimodal data to overcome these challenges.

Serrano and Kwak [15] developed an emotional support AI (ESAI) system to assist individuals with mental health disorders, using NLP and ML. Trained on 160 000 Reddit posts, ESAI employs a Naive Bayes classification model to detect symptoms of various mental health disorders, offering a user-friendly interface for text and speech interactions. Preliminary results show a classification accuracy of around 70%, highlighting ESAI's potential to complement professional mental health care. The study's strengths include the innovative use of a large social media dataset and an accessible interface. The choice of a Naive Bayes model might limit the system's ability to capture complex linguistic nuances, and the 70% accuracy indicates that there is room for improvement. Additionally, using self-reported Reddit data could introduce biases, affecting the generalizability of the findings. Ahmad et al. [16] explored the use of NLP for mental health detection in Malay text, focusing on sentiment analysis, emotion recognition, and linguistic pattern detection. The study addresses the challenges of applying NLP to Malay, such as the need for high-quality datasets and cultural nuances. By using techniques such as TF-IDF, Word2Vec, and GloVe, the research demonstrates NLP's potential for early mental health detection and intervention, particularly in underrepresented languages. The study's focus on Malay enhances the relevance and robustness of its findings, but limitations arise from the scarcity of high-quality datasets and the limited integration of cultural nuances into NLP models, which may affect the accuracy and applicability across diverse Malay-speaking populations. Ahmad et al. [17]
K = E · W_K    (3)
V = E · W_V.    (4)

Attention weights (A) are computed by
A = softmax(QKᵀ / √d_k).    (5)
In the end, the output is given by Attention(Q, K, V) = A · V. The output from the self-attention mechanism for each position passes through an FFN given by the following equation, which applies two linear transformations with a rectified linear unit (ReLU) activation in between
FFN(x) = max(0, xW₁ + b₁)W₂ + b₂.    (6)
Each sublayer, including self-attention and FFNs, is wrapped with layer normalization and residual connections. For an input x, the output y after a sublayer with a residual connection and layer normalization is given by
y = LayerNorm(x + SubLayer(x)).    (7)
For classification, the final hidden state of the [CLS] token is used. The logits are obtained by a linear transformation of this hidden state
Logits = HiddenState_[CLS] · W + b.    (8)
A softmax function is applied to the logits to derive class probabilities
P(class) = softmax(Logits).    (9)

Fig. 2. Flow diagram of a transfer learning-based model [21].

Fig. 2 shows the architecture of a transformer-based language model for text understanding. The input is tokenized and passed through an embedding layer that combines token embeddings with positional encodings (E0, E1, …, E7) to retain the order of words. The "CLS" token is used for classification tasks and is included at the beginning of the sequence. Each token embedding is then processed through a stack of 12 transformer encoder blocks, each consisting of multihead attention and feed-forward neural network layers, with add & norm steps following
each sublayer. The output from the final transformer block goes through a classification layer with a fully connected neural network, Gaussian error linear unit (GELU) activation, and layer normalization (Norm). The result is then converted to a probability distribution over the possible vocabulary through a softmax function to generate the final output. Symbols used in the diagram include "CLS" for denoting the start and separation of sentences, E_i for positional encodings, and E_Token for token embeddings.

3) Algorithm: Algorithm 1 describes a text classification process using Transformer models, involving data preprocessing, tokenization, model compilation, and iterative training with gradient updates.

Algorithm 1: Text Classification Using Transformers (BERT, RoBERTa, DistilBERT)
1. Load dataset D into a DataFrame df.
2. Preprocess(df): Clean and normalize each text instance (t_i); tokenize, remove stopwords, and lemmatize using NLP libraries; then split into a training set (D_train) and a validation set (D_val).
3. Load the tokenizer T and model M with pretrained weights (e.g., 'bert-base-uncased', 'distilbert-base-uncased', or 'roberta-base') depending on the chosen architecture.
4. Convert texts into tokens and map them to input IDs, attention masks A, and segment/type IDs S using T.
5. Create tensor slices or TensorFlow datasets from the encoded texts and labels for D_train and D_val.
6. Compile M with an optimizer (e.g., Adam or AdamW), learning rate γ, epsilon ε, and clipnorm c if applicable.
7. Define the loss function L as sparse categorical cross-entropy:
L(y, ŷ) = −Σ_{c=1}^{C} y_c log(ŷ_c)
where y is the true label, ŷ is the predicted label probability, and C is the number of classes.
8. Set metrics to track, such as accuracy Acc = (1/N) Σ_{i=1}^{N} 1(y_i = ŷ_i), where N is the number of instances.
9. For each epoch E in a predefined number of epochs:
For each mini-batch b in D_train:
a. Perform a forward pass to compute logit predictions P_b = M(b).
b. Apply the softmax function to obtain predicted probabilities P̂_b = softmax(P_b).
c. Calculate the loss L_b = −Σ y_b log(P̂_b).
d. Perform a backward pass to compute the gradients ∇L_b.
e. Update the model parameters Θ using the gradients and the optimizer: Θ ← Θ − γ∇L_b.
10. Evaluate M on D_val to calculate the validation loss L_val and accuracy Acc_val.

4) Relevance: By harnessing the Transformer's self-attention, these models offer adaptive contextual intelligence, enabling a profound understanding of text nuances. This quality is particularly beneficial in analyzing content where contextual interpretation—such as differentiating between literal and figurative language—is critical, as in mental health discourse on social media platforms. The robust pretraining regimen equips these models to navigate and interpret the linguistic diversity of social media language, including idioms, colloquialisms, and abbreviations. Their capacity to parse and understand such varied linguistic expressions makes them invaluable tools for sentiment analysis and emotional state detection in digital communications. Their ability to process multilingual content and detect regional mental health discourse patterns marks a significant advancement in cross-cultural communication, contributing to global mental health support initiatives. Particularly, DistilBERT's efficiency underscores the potential for real-time sentiment analysis and emotional state detection, essential for monitoring harmful content, crisis intervention, and live customer support on digital platforms.

5) Pros/Cons:
a) Pros:
1) The deep bidirectional architecture of these models allows them to understand the context of words within a sentence better than traditional models. This is crucial for detecting nuanced expressions related to mental health.
2) Social media text often contains irregularities such as slang, abbreviations, and emojis. These models' robustness to different types of text can be an advantage when processing such data.
3) Since these models are pretrained on a large corpus, they come with a built-in understanding of language. They can recognize a range of language constructs that are beneficial when dealing with complex, real-world data such as tweets.
b) Cons:
1) The architectural complexity can make hyperparameter tuning and model optimization challenging, requiring significant expertise to achieve optimal performance.
2) The depth and capacity of these models, while beneficial for capturing linguistic nuances, also increase the risk of overfitting, especially on smaller datasets.
3) The large number of parameters and the complexity of the self-attention mechanism increase the computational demands for training and fine-tuning, particularly for BERT and RoBERTa.

B. XLNet

1) Concept: XLNet uses permutation language modeling, allowing it to learn from all possible word order permutations, thus gaining a deeper contextual understanding of language. XLNet's pretraining involves permutation language modeling, a technique that differs from traditional sequential predictions by considering every possible permutation of the input sentences. This approach enables XLNet to capture a richer language context than models trained in a single, fixed direction. Built upon the Transformer-XL architecture, XLNet benefits from an extended memory across longer text sequences, which allows it to maintain context effectively over large documents. The core of its architecture is a series of transformer layers that use self-attention.
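As a concrete illustration of (3)–(9) and of the forward and loss computations in Algorithm 1, the following NumPy sketch runs one toy, single-head encoder step with random weights and a [CLS]-style classification head. The dimensions, random weight matrices, and label here are illustrative assumptions, not the configuration of the pretrained models discussed above.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # Normalization over the feature dimension, as in (7).
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
seq_len, d_model, d_ff, n_classes = 8, 16, 32, 2

# Token + positional embeddings for one sequence; position 0 plays the [CLS] role.
E = rng.normal(size=(seq_len, d_model))

# Projections: Q = E W_Q, and K = E W_K (3), V = E W_V (4).
W_Q, W_K, W_V = (rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(3))
Q, K, V = E @ W_Q, E @ W_K, E @ W_V

# Attention weights (5), then the output Attention(Q, K, V) = A V.
A = softmax(Q @ K.T / np.sqrt(d_model))
attn_out = A @ V

# Residual connection + layer normalization around the sublayer (7).
x = layer_norm(E + attn_out)

# Position-wise FFN (6): two linear maps with ReLU in between, wrapped by (7).
W1, b1 = rng.normal(size=(d_model, d_ff)) * 0.1, np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)) * 0.1, np.zeros(d_model)
h = layer_norm(x + (np.maximum(0.0, x @ W1 + b1) @ W2 + b2))

# Classification head on the [CLS] hidden state: logits (8), probabilities (9).
W, b = rng.normal(size=(d_model, n_classes)) * 0.1, np.zeros(n_classes)
logits = h[0] @ W + b
probs = softmax(logits)

# Sparse categorical cross-entropy for one true label, as in step 7 of Algorithm 1.
true_label = 1
loss = -np.log(probs[true_label])
```

A real model stacks 12 such blocks with multihead attention and pretrained weights; transfer learning replaces the random matrices above with weights loaded from checkpoints such as 'bert-base-uncased'.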
Algorithm 2: Text Classification Using XLNet
1. Load dataset D from a CSV file into a pandas DataFrame df.
2. Preprocess df: normalize each t_i in df by lowercasing and removing new lines and special characters, tokenize and remove stopwords using NLTK, lemmatize with SpaCy, and split into D_train and D_val.
3. Load the XLNet tokenizer T_XLNet and model M_XLNet with 'xlnet-base-cased' pre-trained weights.
4. Convert D_train and D_val into input features: encode each t_i into input IDs and attention masks using T_XLNet, then structure and batch these into tensors.
5. Compile M_XLNet with:
   a. An Adam optimizer H with a learning rate g.
   b. The loss function L, defined as the cross-entropy loss, which can be expressed with log-softmax as
      L(y, ŷ) = −Σ_{i=1}^{K} y_i log softmax(z_i)
      where K is the number of classes.
   c. Metrics for model evaluation, with accuracy Acc as
      Acc = (1/N) Σ_{i=1}^{N} 1(y_i = ŷ_i)
      where N is the number of instances.
6. Train M_XLNet on D_train:
   a. For each epoch e out of a total of E epochs, iterate over mini-batches b in D_train.
   b. In each mini-batch b:
      i. Perform a forward pass with M_XLNet to compute the logits z_b.
      ii. Apply the log-softmax function to obtain the predicted probabilities p_b = log softmax(z_b).
      iii. Compute the negative log-likelihood loss L_b for p_b with respect to the true labels y_b.
      iv. Backpropagate the loss to compute the gradient ∇L_b.
      v. Update the model parameters H using ∇L_b.
   c. After each epoch, evaluate M_XLNet on D_val to compute the validation loss and accuracy.

…the complex, nuanced language often used in discussions about mental health, potentially leading to higher accuracy in identifying specific mental health issues.
b) Cons:
1) XLNet's sophisticated architecture and the requirement to handle permutations make it computationally intensive. Training and inference with XLNet might require significant computational resources, which could be a limiting factor for some applications.
2) Setting up XLNet, especially customizing and fine-tuning it for specific tasks such as mental health classification, can be more complex and time-consuming compared to simpler models. This complexity may pose challenges for teams with limited machine-learning expertise.
3) Owing to its robust language processing capabilities, XLNet could be susceptible to overfitting when applied to limited or homogeneous datasets. This predisposition may diminish the model's proficiency in applying learned knowledge from the training datasets to practical, real-world textual applications.

C. Long Short-Term Memory (LSTM)
1) Concept: LSTMs represent an advanced type of RNN designed for mastery of prolonged data dependencies. Their development aimed to address traditional RNNs' shortcomings, notably the challenging vanishing gradient issue that hampers the learning process for extended sequences. LSTMs have significantly pushed forward the capabilities in the realm of sequential analysis and forecasting.
An LSTM harbors a sophisticated internal architecture to steward and reshape information over long durations. This intricate setup empowers it to retain and advance vital information across protracted sequences, thus serving as a bridge over sizable temporal gaps. The architecture's core element that enables the conservation of data over extensive periods consists of various gates that manage information flow: these include the input gate, forget gate, and output gate.
The input gate is tasked with deciding the quantity of new data to be integrated into the cell state. The forget gate is responsible for identifying and eliminating data that is no longer needed for the cell's current task. The output gate's role is to select the portion of the cell state to be released during the current processing stage.
These components are pivotal in providing the LSTM with the discretion to both retain and omit information, tailoring its memory to the demands of tasks where long-term data retention is essential.
2) Mathematical Formulation: There are four main gates used in an LSTM; they are as follows.
The forget gate, given in (15), decides what information is discarded from the cell state:
f_t = σ(W_f[h_{t−1}, x_t] + b_f). (15)
The input gate, given by (16), updates the cell state with new information:
i_t = σ(W_i[h_{t−1}, x_t] + b_i),  C̃_t = tanh(W_C[h_{t−1}, x_t] + b_C). (16)
The cell state, updated by (17), forgets the selected information and adds the new candidate values:
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t. (17)
The output gate decides the next hidden state, which is given by
o_t = σ(W_o[h_{t−1}, x_t] + b_o) (18)
h_t = o_t ⊙ tanh(C_t). (19)
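The LSTM gate equations (15)–(19) can be traced concretely with a minimal single-unit step in plain Python (scalar weights chosen arbitrarily for illustration; this is a sketch of the formulation, not the paper's implementation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step for a single unit; W[k] weighs (h_prev, x_t) for gate k."""
    # (15) forget gate: how much of the old cell state to keep
    f_t = sigmoid(W["f"][0] * h_prev + W["f"][1] * x_t + b["f"])
    # (16) input gate and candidate cell state
    i_t = sigmoid(W["i"][0] * h_prev + W["i"][1] * x_t + b["i"])
    c_tilde = math.tanh(W["c"][0] * h_prev + W["c"][1] * x_t + b["c"])
    # (17) new cell state: forget old content, add gated candidate
    c_t = f_t * c_prev + i_t * c_tilde
    # (18)-(19) output gate and new hidden state
    o_t = sigmoid(W["o"][0] * h_prev + W["o"][1] * x_t + b["o"])
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t

# Illustrative weights: every gate weighs (h_prev, x_t) by 0.5, zero biases.
W = {k: (0.5, 0.5) for k in "fico"}
b = {k: 0.0 for k in "fico"}
h_t, c_t = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, W=W, b=b)
```

With a zero initial state, the forget gate has nothing to erase, so the new cell state is just the gated candidate, which (19) then squashes into the hidden state.
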
The reset gate decides how much of the past information to forget, given by
r_t = σ(W_r x_t + U_r h_{t−1} + b_r). (21)
The candidate hidden state is a combination of the current input and the past hidden state, modulated by the reset gate, denoted by
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t−1}) + b_h). (22)
The hidden state h_t is the final output of the GRU cell at time step t, combining the old hidden state and the candidate hidden state, as influenced by the update gate, shown in
h_t = z_t ⊙ h_{t−1} + (1 − z_t) ⊙ h̃_t. (23)
Fig. 5 illustrates the architecture of a GRU. "σ" denotes the sigmoid activation function, which outputs a value between 0 and 1. "tanh" represents the hyperbolic tangent activation function. The symbols "×" and "+" indicate elementwise multiplication and addition, respectively. "1−" represents the operation of subtracting the value from one, essentially inverting it. These operations collectively enable the GRU to effectively capture dependencies from input data (x1, x2, x3, …, xn) and produce the output by updating the hidden states (H0, H1, …, Ht−1).
3) Algorithm: Algorithm 4 outlines text classification using a GRU neural network, involving data preprocessing, label encoding, and text vectorization. The GRU model is constructed with embedding and dense layers, trained with binary cross-entropy loss and SGD optimizer. The final steps of Algorithm 4 read:
8. Compile the GRU model with a binary cross-entropy loss function and use the Stochastic Gradient Descent (SGD) optimizer.
9. Train the GRU model on the training data (X_train, y_train) and validate its performance on the testing data (X_test, y_test).
4) Relevance: GRUs are designed to work with sequential data, which is a fundamental aspect of NLP. GRUs employ gating systems to regulate the transfer of information and are particularly useful for NLP tasks, including those related to mental health models. Just like LSTMs, GRUs are adept at capturing dependencies from long sequences of data. In the context of mental health, this means that GRUs can effectively use the context from a patient's earlier conversations or written text to inform the understanding of their current mental state. GRUs simplify the gating mechanism and have been shown to perform on par with LSTMs on certain tasks. This reduction in complexity can be particularly advantageous when modeling mental health conditions, where overfitting to the training data is a concern due to the nuanced and highly individual nature of mental health expression.
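Equations (21)–(23), together with an update gate z_t of the analogous sigmoid form, can be sketched as a single-unit GRU step (the update-gate expression is our assumption, since its defining equation falls outside this excerpt; the weights are illustrative):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x_t, h_prev, p):
    """One single-unit GRU step; p holds scalar weights W, U and biases b."""
    # update gate z_t (assumed sigmoid form, analogous to the reset gate)
    z_t = sigmoid(p["Wz"] * x_t + p["Uz"] * h_prev + p["bz"])
    # (21) reset gate: how much past information to forget
    r_t = sigmoid(p["Wr"] * x_t + p["Ur"] * h_prev + p["br"])
    # (22) candidate hidden state, past state modulated by the reset gate
    h_tilde = math.tanh(p["Wh"] * x_t + p["Uh"] * (r_t * h_prev) + p["bh"])
    # (23) interpolate between the old state and the candidate via z_t
    return z_t * h_prev + (1.0 - z_t) * h_tilde

p = {"Wz": 0.5, "Uz": 0.5, "bz": 0.0,
     "Wr": 0.5, "Ur": 0.5, "br": 0.0,
     "Wh": 1.0, "Uh": 1.0, "bh": 0.0}
h_t = gru_step(x_t=1.0, h_prev=0.0, p=p)
```

Because (23) is a convex combination, the new hidden state always lies between the old state and the candidate, which is how the update gate trades off memory against new input.
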
5) Pros/Cons:
a) Pros:
1) Mastery of Persistent Dependencies: GRUs are adept at grasping enduring relationships within serial data, surpassing the capabilities of conventional RNNs.
2) Superior Resource Economy: GRUs are recognized for their greater resource efficiency relative to LSTM networks, attributed to their simplified parameter architecture.
3) Versatile Applicability: GRUs have shown prowess across an array of tasks dealing with sequential data, encompassing NLP and the analysis of time series.
b) Cons:
1) Still Prone to Overfitting: Despite their improvements over standard RNNs, GRUs can still suffer from overfitting, especially with smaller datasets.
2) Complexity: While simpler than LSTMs, GRUs are still more complex than basic RNNs, which can make them harder to train and optimize.
3) Limited Processing Power for Very Long Sequences: Despite their ability to handle long-term dependencies better than traditional RNNs, they might still struggle with extremely long sequences compared to some newer architectures such as Transformers.

E. Convolutional Neural Network (CNN)
1) Concept: Designed to tackle gridlike data, with image processing being a prime example, the CNN framework is built on several key components such as convolutional layers, pooling layers, and densely connected layers. Convolutional layers process the input with multiple filters to produce feature maps, capturing key elements within the input. Pooling layers serve to simplify these feature maps by reducing their size, thus streamlining the data and lessening the need for computational resources. Following this, the densely connected layers utilize the streamlined feature maps for making predictions or classifying data.
CNNs distinguish themselves by maintaining the spatial relationships found in pixel data through the analysis of features extracted from small, localized segments of the input data. This stands in contrast to traditional neural networks, which typically convert an image into a 1-D array of pixels, thereby losing spatial structure. By keeping the image's spatial hierarchy, CNNs can more adeptly identify patterns.
2) Mathematical Formulation: The multiple layers of a CNN model are given as follows.
The convolutional layer utilizes filters to process the input and distill important features. For an input matrix X ∈ R^{n×d}, where n is the sequence length and d is the embedding dimension, and a filter W ∈ R^{h×d} of height h, the feature map c is generated by
c_i = f( Σ_{m=1}^{h} Σ_{n=1}^{d} W_{(m,n)} X_{(i+m−1,n)} + b ) (24)
for i = 1 … (n − h + 1), where f is a nonlinear activation function (e.g., ReLU), and b is a bias term.
The pooling layer reduces the dimensionality of each feature map while retaining the most important information. Max pooling over a window of size p is defined by
ĉ_j = max_{1≤k≤p} c_{j+k−1} (25)
for j = 1 … (n − h + 1 − p + 1), effectively reducing the size of the feature map.
After flattening the pooled feature maps, the result is passed through one or more fully connected layers. For a flattened vector v ∈ R^q and weights W_fc ∈ R^{q×r}, the output is given by
z = W_fc v + b_fc (26)
where b_fc is the bias term and r is the number of output neurons.
For binary classification, the output layer often uses a sigmoid function to predict the probability p of the positive class:
p = σ(z) = 1 / (1 + e^{−z}). (27)
For multilabel classification, separate sigmoid units can be used for each class, allowing the model to predict multiple classes independently.

Fig. 6. Different layers of CNN [25].

Fig. 6 describes the feature maps in CNN layers. The center image illustrates the process of applying a convolution operation with a 1-D kernel of size 5 across the feature maps, highlighting the transformation of input data into a series of feature maps. To the right, a max pooling operation reduces dimensionality, selecting the maximum value in each feature region. Following pooling, the feature maps are concatenated into a single vector and subsequently fed into a fully connected layer for classification.
3) Algorithm: Algorithm 5 presents the text classification with a CNN, including preprocessing and encoding text data, constructing the model with embedding, convolutional, pooling, and dense layers, and training using binary cross-entropy loss and SGD optimizer.
4) Relevance: CNNs, typically linked with analyzing visual data, have demonstrated effectiveness in diverse NLP applications, particularly within the mental health field. CNNs can be applied to text analysis by considering segments of text as analogous to image patches and finding patterns within those text "images." CNNs are excellent at automatic feature extraction. In text applications, convolutional layers can detect patterns such as n-grams (combinations of words) that might be indicative of mental health issues when analyzing transcripts or written
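Equations (24)–(27) can be illustrated end to end on a toy 1-D input (the embedding dimension is collapsed to 1 for brevity, and the dense layer of (26) is reduced to a plain sum; all values are made up):

```python
import math

def conv1d(x, w, b):
    """(24): slide a filter of length h over x, with ReLU activation."""
    h = len(w)
    return [max(0.0, sum(w[m] * x[i + m] for m in range(h)) + b)
            for i in range(len(x) - h + 1)]

def max_pool(c, p):
    """(25): max pooling over windows of size p (stride 1)."""
    return [max(c[j:j + p]) for j in range(len(c) - p + 1)]

def sigmoid(z):
    """(27): squash a score into a probability for the positive class."""
    return 1.0 / (1.0 + math.exp(-z))

x = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0]   # toy 1-D embedded sequence, n = 6
c = conv1d(x, w=[0.5, -0.5], b=0.0)   # feature map of length n - h + 1 = 5
pooled = max_pool(c, p=2)             # pooled map of length 4
p_pos = sigmoid(sum(pooled))          # crude stand-in for the dense layer (26)
```

The filter here responds wherever the sequence decreases, so the feature map is nonzero only after the peak; pooling then keeps the strongest local responses.
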
After training, the prediction for an instance with feature vector x is given by the mode of the predictions from all the individual trees.
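The majority-vote rule described here can be sketched directly (a minimal illustration; the tree outputs are made up):

```python
from collections import Counter

def forest_predict(tree_predictions):
    """Random-forest classification: return the mode of the tree votes."""
    return Counter(tree_predictions).most_common(1)[0][0]

# e.g., five trees voting on one instance's class label
label = forest_predict([1, 0, 1, 1, 0])
```
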
TABLE II
RESULTS FROM DATASET 1

| Model | Accuracy | Precision | Recall | F1 Score | Cohen's Kappa | Log Loss | AUC ROC | Hamming Loss | Jaccard Similarity Coeff. | Matthews Correlation | Balanced Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVC | 95.6 | 96 | 96 | 96 | 91.2 | 1.51 | 95.63 | 4.4 | 91.58 | 91.21 | 95.63 |
| Random Forest | 93.66 | 94 | 94 | 94 | 87.32 | 2.18 | 93.65 | 6.33 | 88.08 | 87.34 | 93.65 |
| CNN | 95.66 | 96 | 96 | 96 | 91.33 | 7.36 | 98.87 | 4.33 | 91.69 | 91.41 | 95.65 |
| LSTM | 96.5 | 97 | 97 | 97 | 93.02 | 12.4 | 98.47 | 3.49 | 93.25 | 93.08 | 96.52 |
| GRU | 93.01 | 94 | 93 | 93 | 86.03 | 21.42 | 97.2 | 6.98 | 86.89 | 86.82 | 93 |
| BERT | 96.96 | 97 | 97 | 97 | 93.14 | 14.24 | 99.18 | 3.42 | 93.35 | 93.16 | 96.58 |
| RoBERTa | 97.67 | 98 | 98 | 98 | 95.34 | 11.38 | 99.56 | 2.32 | 95.45 | 95.34 | 97.66 |
| DistilBERT | 97.54 | 98 | 98 | 98 | 95.08 | 11.19 | 99.66 | 2.45 | 95.2 | 95.08 | 97.54 |
| XLNet | 97.09 | 97 | 97 | 97 | 94.17 | 8.89 | 99.55 | 2.9 | 94.34 | 94.2 | 97.07 |
TABLE III
RESULTS FROM DATASET 2

| Model | Accuracy | Precision | Recall | F1 Score | Cohen's Kappa | Log Loss | AUC ROC | Hamming Loss | Jaccard Similarity Coeff. | Matthews Correlation | Balanced Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVC | 81.63 | 83 | 82 | 82 | 76.95 | 52.25 | 96.5 | 18.36 | 69.16 | 77.01 | 81.68 |
| Random Forest | 82.26 | 83 | 83 | 83 | 77.73 | 86.72 | 96.1 | 17.73 | 70.25 | 77.77 | 82.54 |
| CNN | 79.76 | 80 | 80 | 80 | 74.63 | 94.57 | 95.1 | 20.23 | 66.46 | 74.65 | 79.91 |
| LSTM | 72.01 | 74 | 72 | 73 | 64.84 | 165.11 | 91.2 | 27.98 | 56.84 | 65.05 | 72.14 |
| GRU | 70.86 | 73 | 71 | 71 | 63.44 | 199.16 | 90 | 29.14 | 55.38 | 63.73 | 70.93 |
| BERT | 83.33 | 84 | 84 | 84 | 79.14 | 98.67 | 95.2 | 16.67 | 71.59 | 79.21 | 83.8 |
| RoBERTa | 82.53 | 83 | 83 | 83 | 78.14 | 79.22 | 95.2 | 17.47 | 70.32 | 78.2 | 83.07 |
| DistilBERT | 81.63 | 82 | 82 | 82 | 76.99 | 95.51 | 95.3 | 18.36 | 68.99 | 77.04 | 82.06 |
| XLNet | 81.63 | 82 | 82 | 82 | 76.98 | 84.96 | 94.5 | 18.36 | 69.09 | 76.98 | 81.87 |
discussed across the 4651 unique textual instances, allowing for detailed categorization and analysis.

VI. RESULTS

The results of our proposed models are summarized in Tables II–IV, each corresponding to the performance on datasets 1, 2, and 3, respectively. For dataset 1, which includes over 36 000 meticulously processed text entries focused on mental health discussions, our models displayed varied performance across several key metrics. The RoBERTa model stands out with the highest accuracy at 97.67%, coupled with a robust F1-score of 98%. This model's strength is further underscored by its low Log loss of 11.38 and the highest AUC ROC value at 99.56, demonstrating its superior capability in distinguishing between relevant and nonrelevant posts. Similarly, DistilBERT shows a commendable performance with an accuracy of 97.54% and an identical F1-score of 98%, though slightly behind RoBERTa in terms of AUC ROC and Log loss values. On the other hand, CNN and LSTM models, while performing well with accuracies of 95.66% and 96.5%, respectively, do not match the top models in discriminative power, as indicated by their respective AUC ROC scores and higher Log loss values. Random forest lags with an accuracy of 93.66% and an F1-score of 94%, showing its limitations in processing this dataset, while GRU trails all models at 93.01% accuracy, which is further reflected in its relatively high Log loss of 21.42.

In dataset 2, which consists of user-generated comments, the BERT model outperforms others with an accuracy of 83.33% and an F1-score of 84%. This performance is particularly significant in the context of identifying and classifying mental health-related discourse, where BERT's lower Hamming loss of 16.67 signifies better precision in its predictions. Following closely, RoBERTa and random forest achieve accuracies of 82.53% and 82.26%, respectively; random forest, however, records a slightly higher AUC ROC of 96.1 compared to RoBERTa's 95.2. However, GRU and LSTM models show noticeably weaker performance with accuracies of 70.86% and 72.01%, coupled with higher Log loss values, highlighting their struggle in effectively categorizing the more nuanced aspects of user comments in this dataset. Despite this, CNN, although not leading in accuracy, shows a balanced performance with an AUC ROC of 95.1 and a balanced accuracy of 79.91%, making it a reliable model for certain aspects of this dataset.

In dataset 3, which categorizes posts based on specific mental health conditions, random forest emerges as the top performer with an accuracy of 95.6% and an F1-score of 96%, demonstrating its
TABLE IV
RESULTS FROM DATASET 3

| Model | Accuracy | Precision | Recall | F1 Score | Cohen's Kappa | Log Loss | AUC ROC | Hamming Loss | Jaccard Similarity Coeff. | Matthews Correlation | Balanced Accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVC | 92.22 | 92 | 92 | 92 | 84.44 | 2.68 | 92.2 | 7.77 | 85.57 | 84.45 | 92.21 |
| Random Forest | 95.6 | 96 | 96 | 96 | 91.2 | 1.51 | 95.6 | 4.4 | 91.58 | 91.21 | 95.63 |
| CNN | 88.23 | 91 | 91 | 91 | 81.95 | 44.03 | 96.9 | 9.02 | 83.44 | 81.96 | 90.97 |
| LSTM | 89.68 | 90 | 90 | 90 | 79.42 | 26.18 | 96.9 | 10.31 | 81.27 | 79.96 | 89.8 |
| GRU | 91.56 | 92 | 92 | 92 | 83.14 | 22.86 | 97 | 8.43 | 84.44 | 83.24 | 91.61 |
| BERT | 95.53 | 96 | 96 | 96 | 91.06 | 18.94 | 99.1 | 4.46 | 91.49 | 91.07 | 95.53 |
| RoBERTa | 95.22 | 95 | 95 | 95 | 90.45 | 23.81 | 99 | 4.77 | 90.89 | 90.46 | 95.22 |
| DistilBERT | 95.46 | 95 | 95 | 95 | 90.92 | 22.41 | 99.1 | 4.53 | 91.31 | 90.92 | 95.46 |
| XLNet | 95.39 | 95 | 95 | 95 | 90.87 | 22.67 | 99.1 | 4.39 | 91.27 | 90.84 | 95.39 |
TABLE V
STATE-OF-THE-ART COMPARISON TABLE
robust ability to classify text entries according to various mental health issues. BERT and RoBERTa also perform strongly with accuracies of 95.53% and 95.22%, respectively, and F1-scores of 96% and 95%, showcasing their consistent performance across multiple datasets. However, CNN shows a drop in accuracy to 88.23%, suggesting potential limitations in handling the specific mental health categorizations present in this dataset. Meanwhile, LSTM and GRU exhibit moderate performance, with LSTM achieving an accuracy of 89.68% and GRU at 91.56%, both maintaining reasonable precision and recall metrics but not surpassing the leading models. DistilBERT and XLNet also demonstrate strong results, closely mirroring the performance of BERT and RoBERTa, with balanced accuracies above 95% and low Hamming losses, indicating their reliability for text classification tasks within the mental health domain.

In conclusion, across all three datasets, RoBERTa and BERT consistently emerge as the top performers, particularly excelling in metrics such as accuracy, F1-score, and AUC ROC. Their strong results suggest that these models are well suited for handling complex and nuanced text classification tasks, especially in the context of mental health discussions, where accurate classification can be critical for understanding and addressing the underlying issues.

VII. DISCUSSION

This research embarked on an exploratory journey through the digital topography of mental health discussions, leveraging advanced ML techniques to distill meaningful patterns from vast textual datasets. The results presented in this study not only underscore the capabilities of various classifiers in text categorization but also highlight the nuanced differences in performance metrics across different datasets and modeling approaches. Notably, the study illustrates the efficacy of advanced NLP models such as RoBERTa, DistilBERT, and BERT in accurately detecting mental health-related posts, demonstrating impressive metrics across multiple datasets.

In dataset 1, our findings reveal a compelling narrative of ML efficacy. Ensemble methods and transformer-based models, particularly RoBERTa and DistilBERT, achieved outstanding accuracy, precision, recall, and F1 scores, all surpassing the 95% mark. These models also exhibited remarkable Cohen's Kappa
scores, signifying substantial agreement beyond chance, a testament to their robustness in discerning mental health-relevant posts amidst the online chatter. However, this high performance comes with significant computational demands. The time complexity of these transformer models is O(n²) due to the self-attention mechanism, where n represents the sequence length. This quadratic complexity requires substantial computational resources, including high-performance GPUs with memory often exceeding 16 GB, making the training process computationally intensive and time-consuming, averaging around 32 h for models such as RoBERTa and DistilBERT.

Moving on to dataset 2, characterized by its binary classification of negative versus neutral comments, the models faced a more challenging task. Accuracy scores ranged from 70.86% to 83.33%, with BERT leading in performance. This suggests that more nuanced detection of toxic content may be required, potentially through incorporating more sophisticated contextual embeddings. The complexity of BERT's self-attention mechanism again highlights the balance between model accuracy and computational cost, as the sequence length significantly impacts processing time and resource requirements.

In contrast, models such as XLNet, which also rely on the transformer architecture, face similar computational challenges due to their O(n²) time complexity. However, XLNet typically requires slightly less training time, averaging around 27 h. This reduction in training time, while still significant, reflects the continuous improvements in transformer models to handle large-scale data more efficiently.

In dataset 3, representing a multiclass categorization task, the transformer models, particularly BERT and its optimized variant DistilBERT, continued to excel, achieving balanced accuracy scores well above 90%. These results underscore the efficacy of transformer models in handling complex classification tasks with multiple categories. Again, the computational cost associated with these models, given their O(n²) time complexity, remains a crucial consideration, especially when scaling these models for real-world applications.

When considering nontransformer models, the study also explored the performance of LSTM and GRU models, which have a time complexity of O(n·m), where n represents the sequence length and m is the dimensionality of the hidden state. Unlike transformers, these models process sequences sequentially, resulting in a linear relationship between sequence length and computation time. Although LSTM and GRU models are less computationally intensive than transformers, they are potentially slower for very long sequences due to their sequential processing. LSTM models, for example, required approximately 7 h to train in this context, while GRU models, with their simpler architecture, were slightly more efficient, completing training in around 5 h.

Additionally, CNN models were employed for text classification tasks. The time complexity of CNNs primarily depends on the sequence length, the number of filters, and the kernel size, resulting in a complexity of O(n·f·k). CNNs, while not as powerful in capturing long-range dependencies as transformers or recurrent networks, offer a more efficient approach for tasks requiring local feature extraction. In this study, CNNs proved to be relatively efficient, with an average training time of approximately 16 200 s (around 4.5 h), making them a viable option for tasks with moderate computational resources.

The performance variance across the datasets can be attributed to the intrinsic complexities of the textual data. The subtleties of language, the context-dependency of expressions, and the myriad ways in which mental health issues are communicated online make perfect precision inherently challenging. Nonetheless, the consistently lower log loss values for transformer models across all datasets hint at their superior confidence in predictions, likely due to their deeper contextual understanding gleaned from the training data. This deeper understanding, however, comes at the cost of increased computational complexity and training time.

These findings resonate with the growing consensus in the NLP community that transformer-based models, with their deep contextual understanding, generally outperform traditional ML approaches, especially in tasks involving rich, nuanced language data such as mental health discussions. This study reaffirms the importance of selecting appropriate models based not only on the dataset's nature and the specific nuances of the classification task but also on the computational resources available for model training and deployment.

While the article discusses the performance evaluation of the models, it does not mention any real-world evaluation or practical applications of the approach. In addressing this gap, a case study involving a mental health organization deploying a real-time monitoring system using a fine-tuned RoBERTa model to identify social media posts indicating severe distress or suicidal intent demonstrated initial success. However, the model faced challenges when new slang or terminology appeared that was not covered in the training data, necessitating periodic retraining to maintain accuracy and relevance. This retraining process, while crucial, is resource-intensive due to the model's O(n²) complexity, further emphasizing the need for powerful computational infrastructure in real-world deployments. This, along with other examples such as the deployment of mental health chatbots and systems for detecting toxic content in online forums, underscores the need for continuous updates, human oversight, and robust strategies to ensure the effectiveness and ethical application of these methods in dynamic, real-world environments.

However, when considering real-world applicability, several challenges arise, particularly in deployment and the robustness of these methods in dynamic environments. Challenges include handling data drift, where evolving online discourse necessitates that models adapt continuously to new language patterns, and managing the computational demands required to scale these models in real-time applications. Additionally, ensuring the interpretability of these complex models is crucial, especially in sensitive domains such as mental health, where understanding the rationale behind a prediction is vital.

In conclusion, this research highlights the powerful capabilities of transformer models in text categorization tasks related to mental health, while also addressing the practical challenges in deploying these models effectively and ethically in real-world scenarios. The balance between model performance and computational resource demands remains a key consideration for future
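The complexity figures discussed above can be made concrete with rough per-layer operation counts (a back-of-envelope sketch; constants and lower-order terms are deliberately ignored):

```python
def attention_ops(n):
    """Self-attention scales as O(n^2): every token attends to every token."""
    return n * n

def recurrent_ops(n, m):
    """LSTM/GRU scale as O(n*m): n sequential steps over hidden size m."""
    return n * m

def conv_ops(n, f, k):
    """1-D convolution scales as O(n*f*k): f filters of kernel size k."""
    return n * f * k

# Doubling the sequence length quadruples the attention cost,
# but only doubles the recurrent and convolutional costs.
r_attn = attention_ops(512) / attention_ops(256)          # -> 4.0
r_rnn = recurrent_ops(512, 128) / recurrent_ops(256, 128)  # -> 2.0
r_cnn = conv_ops(512, 100, 5) / conv_ops(256, 100, 5)      # -> 2.0
```

This asymmetry is exactly why the transformer models above demand far more GPU memory and training time as input documents grow longer.
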
applications, particularly in environments where real-time proc- [4] F. K. Sufi and I. Khalil, “Automated disaster monitoring from social
media posts using AI-based location intelligence and sentiment analysis,”
essing and adaptability are essential. IEEE Trans. Computat. Social Syst., vol. 11, no. 4, pp. 4614–4624,
Aug. 2024, doi: 10.1109/TCSS.2022.3157142.
[5] M. Nouman, H. Sara, S. Y. Khoo, M. P. Mahmud, and A. Z. Kouzani,
VIII. CONCLUSION “Mental health prediction through text chat conversations,” in Proc. Int.
Joint Conf. Neural Netw. (IJCNN), Gold Coast, Australia, 2023,
This article has provided a specific and thorough analysis of pp. 1–6, doi: 10.1109/IJCNN54540.2023.10191849.
various ML and DL methods for detecting mental health issues. [6] D. P. Kadam and K. T. V. Reddy, “A study of machine learning models
for predicting mental health through text analysis,” in Proc. 1st
The findings highlight that ensemble methods and transformer- DMIHER Int. Conf. Artif. Intell. Educ. Ind. 4.0 (IDICAIEI), Wardha,
based models such as RoBERTa and DistilBERT have demon- India, 2023, pp. 1–5, doi: 10.1109/IDICAIEI58380.2023.10406845.
strated exceptional performance in accurately identifying mental [7] I. J. Dristy, A. M. Saad, and A. A. Rasel, “Mental health status prediction
using ML classifiers with NLP-based approaches,” in Proc. Int. Conf.
health-related posts, with accuracy, precision, recall, and F1 Recent Progresses Sci. Eng. Technol. (ICRPSET), Rajshahi, Bangladesh,
scores surpassing 95%. Additionally, BERT has shown profi- 2022, pp. 1–6, doi: 10.1109/ICRPSET57982.2022.10188544.
ciency in classifying negative or toxic comments with the high- [8] D. W. Otter, J. R. Medina, and J. K. Kalita, “A survey of the usages of
deep learning for natural language processing,” IEEE Trans. Neural
est accuracy among the models tested. Netw. Learn. Syst., vol. 32, no. 2, pp. 604–624, Feb. 2021, doi: 10.1109/
Despite these successes, the research faced several limitations. The inherent complexities of natural language and the contextual nuances in expressing mental health issues online posed significant challenges, impacting the predictive performance of the models across different datasets. Specifically, the detection of toxic content required more sophisticated handling of context and language subtleties, an area where even advanced models such as BERT struggled to some extent. Furthermore, the computational intensity associated with training large-scale models such as BERT and RoBERTa is considerable, making them difficult to deploy in resource-constrained environments. Another limitation is the potential risk of overfitting on smaller datasets, which could lead to reduced generalizability when applied to new data.
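A standard mitigation for the overfitting risk noted above is early stopping on a held-out validation set: training halts once the validation loss stops improving. A framework-agnostic sketch (the patience value and loss sequence are illustrative, not taken from this study's experiments):

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return the epoch at which training would stop: the first epoch after
    the validation loss has failed to improve for `patience` consecutive epochs."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss       # new best; in practice, checkpoint weights here
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch  # stop; restore the checkpointed best weights
    return len(val_losses) - 1  # trained to the end without triggering

# Validation loss bottoms out at epoch 2, then rises -> stop at epoch 4.
losses = [0.62, 0.48, 0.41, 0.43, 0.45, 0.47]
print(early_stopping_epoch(losses, patience=2))
# → 4
```

On small mental health corpora, this kind of criterion keeps the model from memorizing training examples during the later fine-tuning epochs.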
To address these challenges, future work should focus on refining these models to enhance their sensitivity to the subtleties of language used in mental health contexts. This could involve incorporating larger and more diverse datasets, including multilingual data, to improve the robustness and generalizability of the models. Additionally, exploring newer architectures and hybrid models could offer further improvements in the accurate detection of mental health issues from textual data. Future research should also consider the practical aspects of deploying these models in real-world scenarios, such as optimizing them for scalability and efficiency in dynamic environments and ensuring they are interpretable and ethically sound in sensitive applications. By addressing these limitations and exploring these future directions, this research can contribute to more effective and reliable tools for mental health analysis using NLP techniques.
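One concrete route to the scalability and efficiency goals above is model compression. DistilBERT, one of the models evaluated in this study, is produced by knowledge distillation: a small student is trained to match the temperature-softened output distribution of a large teacher. A stdlib-only sketch of the soft-target loss (the logit values are illustrative, and this shows only the distillation term, not the full training objective):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T yields a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between teacher soft targets and student predictions,
    scaled by T^2 so the gradient magnitude stays comparable across T."""
    p = softmax(teacher_logits, temperature)   # teacher soft targets
    q = softmax(student_logits, temperature)   # student predictions
    ce = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return temperature ** 2 * ce

teacher = [3.0, 1.0, -1.0]   # large model's logits (illustrative)
student = [2.5, 1.2, -0.5]   # compressed model's logits (illustrative)
print(round(distillation_loss(teacher, student), 4))
```

The loss is minimized when the student reproduces the teacher's distribution, which is how a much smaller model can retain most of the teacher's accuracy while being cheap enough for resource-constrained deployment.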
294 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 12, NO. 1, FEBRUARY 2025
Yashwanth Kasanneni is currently working toward the Bachelor of Technology degree in computer science and engineering with Vellore Institute of Technology (VIT), Vellore, India.

He is a creative and detail-driven developer who has excelled in software development, high-performance coding, and backend web development, with a record of delivering robust and efficient solutions. He completed an internship at Enabled Analytics as a Junior Salesforce Developer and has worked extensively in the fields of OpenCV, machine learning, deep learning, and natural language processing.

Mr. Kasanneni is an active member of Anokha, a non-governmental organization in Vellore, India, that benefits thousands of children through live welfare projects focused on education, healthcare, livelihood, and women's empowerment. He has participated in numerous outreach events supporting these initiatives.

Achyut Duggal is currently working toward the Bachelor of Technology degree in computer science and engineering with Vellore Institute of Technology (VIT), Vellore, India.

He is a dedicated and innovative developer with a strong foundation in computer science and has excelled in various technical roles, with expertise in GUI development, full-stack mobile applications, and robotics. He has completed internships at several organizations as a Software Developer and an Application Developer and has worked in image processing, machine learning, and deep learning. His research interests include computer vision and retrieval-augmented generation.

Mr. Duggal is a member of RoboVITics, the official robotics club of VIT, and served as its Vice-Chairperson from 2023 to 2024, during which the club won the "Best Technical Club Award" at VIT.

R. Sathyaraj received the B.Tech., M.E., and Ph.D. degrees. He is a Professor with the School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, Tamilnadu, India. His research interests include software fault prediction, natural language processing, machine learning, and deep learning.

S. P. Raja was born in Sathankulam, Tuticorin District, Tamilnadu, India, and completed his schooling at Sacred Heart Higher Secondary School, Sathankulam. He received the B.Tech. degree in information technology from Dr. Sivanthi Aditanar College of Engineering, Tiruchendur, Tamilnadu, in 2007, and the M.E. degree in computer science and engineering and the Ph.D. degree in image processing from Manonmaniam Sundaranar University, Tirunelveli, Tamilnadu, in 2010 and 2016, respectively. He is currently an Associate Professor with the School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.