
Social media Based Depression Intensity Prediction Using NLP & DL 1BI23MC105

CHAPTER 1
SYNOPSIS
ABSTRACT
Depression has become a serious and growing concern in modern society. It is one of the leading
causes of suicide, particularly among teenagers and young adults. In recent years, large-scale social
and lifestyle disruptions have contributed significantly to increased emotional stress and mental
health issues. Prolonged isolation, reduced social interactions, and uncertainty about the future
have exacerbated feelings of loneliness, anxiety, and depression among individuals worldwide.

Typically, clinical psychologists diagnose depression through in-person interviews based on
established psychological assessment criteria. However, many individuals do not seek medical
attention during the early stages of depression due to stigma or lack of awareness. As a result, early
detection remains a critical challenge in mental health care.

Interestingly, people often express their emotions, thoughts, and mental states through social media
platforms such as Twitter. These platforms offer a valuable opportunity to understand users'
psychological well-being based on their posts, behaviors, and language use.

INTRODUCTION
In recent years, mental health disorders—particularly depression—have emerged as one of the
most critical challenges faced by individuals and communities worldwide. Depression not only
affects a person's emotional and psychological state but also severely impacts their social life,
productivity, and overall well-being. Traditional diagnostic methods rely heavily on clinical
interviews and self-reported symptoms, which can be both time-consuming and dependent on the
individual's willingness to seek help.

However, with the increasing digital presence of individuals on social networking platforms, there
exists an opportunity to monitor psychological health through behavioral and linguistic cues.
People often use platforms like Twitter to express their feelings, share personal experiences, and
communicate their mental state. These platforms act as informal diaries, making it feasible to
observe patterns indicative of depression.

DEPARTMENT OF MCA, BIT 2024-25 Page 1



This project proposes an intelligent system to predict depression and estimate its intensity using
social media data. The approach includes collecting tweets from users and extracting a variety of
features such as emotional tone, topic distributions, behavioral patterns, and user-level metadata.
A supervised machine learning framework is used, with models trained on weakly labeled data. In
particular, a Long Short-Term Memory (LSTM) neural network enhanced with the Swish
activation function is utilized to detect depression indicators with high accuracy.

By focusing on early detection through automated systems, this work aims to raise awareness and
potentially trigger timely interventions. The ultimate goal is to harness the power of machine
learning and social media to contribute to mental health monitoring in a non-intrusive and scalable
manner.

OBJECTIVES
The primary goal of this project is to detect signs of depression in social media users and estimate
the intensity of depressive symptoms using machine learning techniques. The specific objectives
of this study are as follows:

1. To design a depression detection framework that leverages user data from social media
platforms (e.g., Twitter) to predict depression and its severity.
2. To develop a weakly supervised approach for labeling social media data without requiring
manual annotations, facilitating large-scale data acquisition.
3. To extract a comprehensive set of user-level features, including emotional cues, topical
interests, behavioral patterns, and depression-related linguistic expressions.
4. To employ a deep learning model, specifically a Long Short-Term Memory (LSTM) network
enhanced with the Swish activation function, to accurately estimate depression intensity.
5. To perform comparative experiments against existing baseline models and evaluate the
proposed model using appropriate performance metrics such as Mean Squared Error (MSE)
and classification accuracy.
6. To explore behavioral patterns of depressed users, such as the frequent use of negative
words (e.g., “stress”, “sad”), personal pronouns, and late-night posting habits, for
psychological insight and improved model performance.
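
The behavioral cues named in objective 6 can be turned into simple user-level ratios. The following is a minimal sketch, not the project's actual feature extractor: the word lists, the `user_features` function name, and the late-night window (00:00 to 04:59) are assumptions made for illustration only.

```python
import re
from datetime import datetime

# Illustrative lexicons (assumed); the report does not publish its word lists.
NEGATIVE_WORDS = {"stress", "sad", "lonely", "hopeless", "tired"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}


def user_features(posts):
    """Compute three ratios for one user.

    posts: list of (text, datetime) pairs.
    """
    tokens = []
    late_night = 0
    for text, timestamp in posts:
        tokens.extend(re.findall(r"[a-z']+", text.lower()))
        if timestamp.hour < 5:  # assumed late-night window: 00:00-04:59
            late_night += 1
    n_tokens = max(len(tokens), 1)  # avoid division by zero
    n_posts = max(len(posts), 1)
    return {
        "negative_ratio": sum(t in NEGATIVE_WORDS for t in tokens) / n_tokens,
        "first_person_ratio": sum(t in FIRST_PERSON for t in tokens) / n_tokens,
        "late_night_ratio": late_night / n_posts,
    }


example = [
    ("I feel so sad", datetime(2024, 1, 1, 2, 30)),
    ("Great day with my team", datetime(2024, 1, 1, 14, 0)),
]
print(user_features(example))
```

Ratios rather than raw counts are used here so that users with many posts and users with few posts remain comparable.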


EXISTING METHOD

In recent years, several systems have been developed to identify depression using machine learning
(ML) techniques. These systems typically rely on traditional ML models such as Support Vector
Machines (SVM), Naive Bayes, and Random Forests. They classify user behavior and emotional
states based on manually engineered features extracted from social media text. While these
approaches have shown some success in binary depression detection, they face significant
limitations in capturing the deeper, sequential nature of user behavior over time.

Traditional ML-based systems often lack the ability to understand contextual and temporal
dependencies in the text, which are essential for detecting subtle signs of depression. These
systems also depend heavily on handcrafted features, which may not fully represent the emotional
and linguistic diversity of real-world user posts, especially in platforms like Twitter where users
express themselves informally and briefly.

Due to these limitations, existing systems struggle to accurately estimate the intensity of
depression or detect users in early stages. As a result, the need has emerged for more intelligent,
context-aware systems that can learn patterns from data over time. This gap has led to the
exploration of deep learning models, which are more suited to analyzing complex emotional
patterns in social media content.

In this project, we addressed the critical issue of depression detection by analyzing users’ social
media activity. Recognizing the growing trend of individuals expressing emotions online,
particularly on platforms like Twitter, we utilized this digital footprint to predict depressive
behavior and estimate depression intensity.

By leveraging a deep learning approach, specifically a Long Short-Term Memory (LSTM)
network, we were able to extract and analyze a diverse range of features such as emotional, topical,
behavioral, user-level, and depression-related n-grams. These features were instrumental in
modeling user behavior effectively and contributed to an improved understanding of the linguistic
and behavioral patterns associated with depression.


The proposed system proved to be more efficient than traditional machine learning models by
capturing temporal patterns and context more effectively. Overall, this method shows promising
potential in supporting early detection and awareness of depression through automated, scalable, and
data-driven analysis of social media behavior.

PROPOSED METHOD

The proposed system aims to detect signs of depression and estimate its severity by analyzing
users' behavior and content on social media, specifically Twitter. Traditional diagnostic methods
rely on in-person consultations and clinical interviews, which may not be feasible for individuals
in the early stages of depression. To address this gap, the proposed approach models the task as a
supervised learning problem and utilizes publicly available social media data to automatically
identify potentially depressed users. This not only enables early intervention but also allows for
continuous, large-scale monitoring of mental health trends.

To effectively capture the characteristics of depressive behavior, the system extracts a diverse set
of features from user tweets. These include emotional indicators (such as the tone of the message),
topical features (topics discussed), behavioral traits (like posting time and frequency), and
linguistic patterns such as the use of personal pronouns and specific depression-related keywords
or phrases. The tweets are weakly labeled using a self-supervised method that doesn't rely on
manual data annotation, making the system scalable and more efficient.
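
The report does not state the exact rule used for weak labeling. As an illustration only, one common heuristic is to treat an explicit self-report of a diagnosis as a positive label, the complete absence of depression-related terms as a negative label, and everything else as unlabeled. The patterns and the `weak_label` function below are hypothetical.

```python
import re

# Hypothetical patterns; the project's real labeling rules are not given.
SELF_REPORT = re.compile(
    r"\bi (?:was|am|have been|got) diagnosed with depression\b",
    re.IGNORECASE,
)
DEPRESSION_TERMS = re.compile(
    r"\b(?:depress\w*|hopeless|suicid\w*)\b",
    re.IGNORECASE,
)


def weak_label(tweets):
    """Return 1 (likely depressed), 0 (likely control) or None (abstain)."""
    if any(SELF_REPORT.search(t) for t in tweets):
        return 1
    if not any(DEPRESSION_TERMS.search(t) for t in tweets):
        return 0
    return None  # ambiguous user: leave out of the training set


print(weak_label(["I was diagnosed with depression last year"]))
print(weak_label(["Lovely weather today"]))
print(weak_label(["This movie is so depressing"]))
```

Abstaining on ambiguous users trades dataset size for label quality, which matters because no manual annotation is available to correct noisy labels.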

For prediction, the system employs a Long Short-Term Memory (LSTM) neural network, which
is capable of understanding the context and sequence of users’ posts over time. The Swish
activation function is used to enhance the model's learning capability. This deep learning model is
trained on the extracted features to perform two main tasks: classify whether a user is depressed
or not and estimate the level of depression. The system shows improved accuracy in identifying
depressed users by capturing subtle patterns in their online behavior and language usage that
traditional machine learning models often miss.
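
For reference, the Swish activation is commonly defined as swish(x) = x * sigmoid(beta * x), with beta = 1 in its simplest form. The sketch below shows only the function itself; where exactly it is applied inside the network is not specified in this report, and the `beta` parameter is included as an assumption.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def swish(x: float, beta: float = 1.0) -> float:
    """Swish activation: x * sigmoid(beta * x)."""
    return x * sigmoid(beta * x)


for value in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(value, round(swish(value), 4))
```

Unlike ReLU, Swish is smooth and lets small negative values pass through, which is one reason it is sometimes preferred in deep networks.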


Hardware Requirements

• Processor : Intel Core i7 (8th Gen or above)
• RAM : 8 GB or higher
• GPU : NVIDIA GPU with CUDA support (e.g., GTX 1650 or above); recommended for LSTM model training
• Storage : 256 GB SSD minimum

Software Requirements

• Programming Language : Python
• Deep Learning Framework : TensorFlow / Keras
• Libraries & Tools : scikit-learn, NLTK, Gensim, Matplotlib, NumPy, Pandas
• Front End : HTML, CSS (if UI is involved)
• Code Editor / IDE : Jupyter Notebook / JupyterLab
• Operating System : Windows 10 / Linux (Ubuntu preferred for model deployment)


CHAPTER 2
ABSTRACT
Depression has emerged as one of the most serious mental health issues affecting individuals
worldwide, particularly among younger demographics. With the increasing influence of online
platforms, people now frequently express their emotional states through social media, offering
valuable insight into their mental well-being. Traditional clinical methods of diagnosing depression
rely on face-to-face interactions with trained professionals. However, such methods often miss early
detection, as individuals are either unaware of their symptoms or hesitant to seek help.

This project addresses the challenge of early depression detection by leveraging user-generated
content from Twitter. The approach involves the use of a self-supervised learning technique to label
data weakly, followed by the extraction of a diverse range of features — including emotional cues,
behavioral patterns, topical interests, linguistic n-grams, and user-level attributes. These features are
then used to train a deep learning model based on a Long Short-Term Memory (LSTM) network
architecture, which is well-suited for understanding sequential and temporal patterns in text.

The proposed method aims not only to classify whether a user is depressed or not, but also to estimate
the level of depression intensity. Through comprehensive experimentation and analysis, the system
demonstrates improved performance compared to conventional machine learning baselines,
highlighting the potential of combining linguistic analysis with neural network models for mental health
assessment. This work contributes to the growing field of affective computing and could play a role in
building proactive mental health monitoring systems using social media data.


CHAPTER 3

INTRODUCTION
Depression is a common and serious mental health disorder that affects how individuals feel, think,
and handle daily activities. Despite its widespread impact, depression often goes undiagnosed due to
stigma, lack of awareness, or insufficient mental health infrastructure. With the exponential growth
in artificial intelligence (AI) and machine learning (ML), innovative approaches are being developed
to identify depressive symptoms early, based on behavioral, linguistic, and biometric data patterns.

Machine learning algorithms are capable of analyzing vast amounts of data, identifying subtle
patterns that might not be apparent to human evaluators. These patterns can emerge from text, voice,
facial expressions, social media activity, or even wearable sensor data. Leveraging such data, machine
learning models can predict the likelihood of depression with remarkable accuracy, offering potential
for early detection and timely intervention.

The traditional methods of diagnosing depression rely heavily on self-reported symptoms and
psychological evaluation, which can be subjective and inconsistent. In contrast, ML-based systems
are data-driven and can provide consistent, objective assessments. One popular approach includes
Natural Language Processing (NLP), where machine learning models analyze text from interviews,
conversations, or social media posts to detect depressive language markers such as reduced
vocabulary richness, negative sentiment, or use of first-person pronouns. Other techniques use audio
signal processing to capture changes in speech tone, energy, and rhythm that may correlate with
depressive states.

Deep learning models, particularly convolutional neural networks (CNNs) and recurrent neural
networks (RNNs), have been widely adopted in this domain due to their strong performance in pattern
recognition tasks. For example, CNNs can analyze micro-expressions and facial features to detect
emotional states, while RNNs are adept at processing sequential data such as speech or text, making
them ideal for temporal emotion recognition.

Recent studies have also introduced multimodal approaches that combine visual, audio, and textual
inputs to enhance accuracy and robustness. These models are trained on large annotated datasets of
individuals with and without depression, learning to differentiate between subtle cues. Moreover, as
with any data-driven AI system, ethical considerations such as data privacy, fairness, and bias
mitigation are crucial in designing responsible depression detection tools.

Despite the promising capabilities of these models, challenges remain. Factors such as limited labeled
data, variability in expression across cultures, and the complex nature of mental health disorders can
affect performance. However, ongoing research continues to refine feature extraction, improve model
generalization, and enhance interpretability.

Ultimately, machine learning offers a powerful supplement to traditional mental health screening
methods. By enabling scalable, non-invasive, and real-time monitoring of mental well-being, ML-
based depression detection systems have the potential to revolutionize how mental health is assessed
and managed in the digital age.

Depression is not merely a temporary feeling of sadness or a reaction to everyday challenges; it is a
serious medical condition that affects a person's mood, thoughts, and physical health. It often results
in a persistent feeling of emptiness, hopelessness, and disinterest in daily activities. According to the
World Health Organization (WHO), depression is one of the leading causes of disability worldwide,
affecting more than 264 million people of all ages. The condition can result from a complex
interaction of social, psychological, and biological factors. Chronic stress, traumatic experiences,
genetic predisposition, and imbalances in brain chemistry (such as serotonin, dopamine, and
norepinephrine levels) are among the most significant contributors. Moreover, depression is often
comorbid with other mental health disorders, such as anxiety or substance abuse, which can
complicate diagnosis and treatment.


CHAPTER 4

LITERATURE SURVEY:

| Author / Year | Objective | Methodology / Algorithm Used | Input Parameters | Result | Limitations |
|---|---|---|---|---|---|
| Y. Shen et al. (2020) [1] | Review NLP methods for depression detection on social media | Systematic Literature Review | Text data from social platforms | NLP is effective; need for multimodal methods | Lack of standard benchmarks, multimodal data |
| A. Reece, C. Danforth (2017) [2] | Detect depression via Instagram photos | Logistic Regression | Brightness, saturation, hue, metadata | Visual features correlate with depression | Limited to visual analysis, privacy risks |
| G. Gui et al. (2020) [3] | Propose a depression detection framework and identify research gaps | Content-, behavior-based & multimodal learning | Social media content, user metadata | Framework highlights challenges and directions | Ethical/privacy issues not fully addressed |
| S. Ghosh, T. Anwar (2021) [4] | Predict depression severity rather than binary classification | LSTM (Deep Learning) | Emotional, behavioral, topical features | Better estimation of severity | Needs rich annotated datasets |
| S. Anwar, S. Ghosh (2022) [5] | Detect symptoms from social media using BDI-II | SentenceBERT, semantic similarity | Twitter/Reddit posts, BDI-II symptoms | Can quantify specific symptoms | Accuracy depends on symptom model |
| Li et al. (2021) [6] | Improve vision-language learning | ALBEF: Vision-language transformer with contrastive learning | Image-text pairs | Outperforms VQA and retrieval benchmarks | Large-scale data requirement |
| Alayrac et al. (2022) [7] | Enable few-shot vision-language learning | Flamingo: Frozen LLM + Perceiver Resampler | Few image-text examples | Strong few-shot performance | Computationally intensive |
| Wang et al. (2021) [8] | Unify vision and language in one model | SimVLM: Transformer + Prefix Language Modeling | Image and text | Excellent across V+L and NLP tasks | Complex architecture |
| Wenli Zhang et al. (2023) [9] | Use domain knowledge in social media-based depression detection | DKDD: Knowledge-enhanced deep learning | Social media content + knowledge graph | High robustness and accuracy | Requires external knowledge integration |
| I. Triantafyllopoulos et al. (2023) [10] | Use affective and moral norms in detection | BERT + affective & morality features | Reddit posts, affect/moral lexicons | Enhanced classification performance | Cultural bias in morality norms |
| M. Trotzek et al. (2018) [11] | Early detection of depression from text | CNN + linguistic metadata | Reddit post histories | Effective early detection | Metadata privacy concerns |
| H. AlSagri, M. Ykhlef (2020) [12] | Improve detection using tweet content and user activity | ML classifiers (SVM, Naive Bayes, Decision Tree) | Tweet frequency, sentiment, keywords | Combined features outperform basic methods | Language-specific features may not generalize |
| A. Kumar et al. (2019) [13] | Predict anxious depression in real-time tweets | Text-based classification using pattern/keyword analysis | Real-time tweets, hashtags | 85.09% accuracy | Context-limited, language variation |
| R. Chatterjee et al. (2021) [14] | Apply naive Bayes to depression detection | Multinomial Naive Bayes | Facebook comment data | Performs well on small datasets | Simple model lacks nuance |
| Wesley Santos et al. (2023) [15] | Use transformer ensemble for multilingual depression detection | Mixture of Experts + Transformer models | Twitter text in multiple languages | High accuracy and multilingual support | High computational cost |


CHAPTER 5

METHODOLOGY

Depression detection involves identifying signs of mental health issues from user-generated content
on social media platforms. Deep learning techniques, especially recurrent neural networks, are highly
effective in modeling the linguistic and emotional patterns associated with depression. The following
outlines the detailed methodology adopted for this project:

1. Data Collection
To build a depression detection model, a labeled dataset of social media posts is required. The dataset
includes:
• Depressed user posts: Texts sourced from mental health-related communities (e.g.,
subreddits like r/depression, r/SuicideWatch).
• Non-depressed user posts: Collected from general interest communities, serving as the
control group.
Each data point consists of the user's posts along with relevant metadata like timestamps and post
frequency.

2. Preprocessing the Data


The collected social media posts are preprocessed before training the model. Key preprocessing steps
include:
• Tokenization: Splitting text into individual words or tokens.
• Lowercasing: Converting all characters to lowercase for uniformity.
• Stopword Removal: Removing commonly used words that do not carry significant meaning.
• Lemmatization: Reducing words to their base forms (e.g., “running” to “run”).
• Padding: Standardizing input length by padding or truncating sequences to a fixed size.
• Vectorization: Converting text to numerical format using pretrained word embeddings like
GloVe.
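
The preprocessing steps above can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions: the stopword list is a tiny placeholder, lemmatization is omitted (it would normally come from a library such as NLTK), and integer token ids stand in for the GloVe lookup. The names `tokenize`, `build_vocab`, and `encode` are hypothetical.

```python
import re

# Tiny illustrative stopword list (assumed). "i" is kept on purpose, since
# first-person pronouns are treated as a depression cue in this report.
STOPWORDS = {"a", "an", "the", "is", "am", "are", "to", "and", "of", "so"}
PAD, UNK = 0, 1  # reserved ids for padding and unknown tokens


def tokenize(text):
    """Lowercase, split into word tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]


def build_vocab(token_lists):
    """Assign an integer id to every token seen in the training posts."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 2)  # ids 0 and 1 are reserved
    return vocab


def encode(tokens, vocab, max_len):
    """Map tokens to ids, then truncate or pad to exactly max_len."""
    ids = [vocab.get(t, UNK) for t in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


posts = ["I am so tired of everything", "The day is fine"]
token_lists = [tokenize(p) for p in posts]
vocab = build_vocab(token_lists)
print([encode(t, vocab, max_len=5) for t in token_lists])
```

In a full pipeline, each integer id would index a row of a pretrained embedding matrix rather than being used directly.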

3. Choosing and Training a Neural Network


Several deep learning architectures were considered for this task, including:
1. Recurrent Neural Networks (RNNs)
o Effective in modeling sequence data like natural language.
o Captures the contextual flow of words across posts.
2. Long Short-Term Memory Networks (LSTMs)
o Handles long-term dependencies and emotional tone across sequences.
o Suitable for identifying depression-related patterns over multiple posts.

3. Bidirectional LSTMs (BiLSTMs)


o Processes the sequence in both forward and backward directions.
o Enhances contextual understanding of each word in the post.
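
For reference, the standard LSTM cell behind options 2 and 3 can be written as below. These are the textbook update equations rather than a formulation taken from this project; the symbols are introduced here for illustration: x_t is the input embedding at step t, h_t the hidden state, c_t the cell state, W, U, and b learned parameters, \sigma the logistic sigmoid, and \odot the element-wise product.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

A bidirectional LSTM runs one such cell forward and another backward over the sequence and concatenates the two hidden states at each step, h_t = [\overrightarrow{h}_t ; \overleftarrow{h}_t].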

Training the Model


• Posts are fed into the BiLSTM model.
• Binary Cross Entropy is used as the loss function.
• The model is optimized using the Adam optimizer.
• Dropout and batch normalization are applied to reduce overfitting.
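
As an illustration of the loss named above, binary cross-entropy averages -[y log p + (1 - y) log(1 - p)] over the batch, where y is the true label and p the predicted probability of the "depressed" class. The sketch below covers only the loss; the optimizer, dropout, and batch normalization would come from the deep learning framework. The clipping constant `eps` is an assumption added for numerical safety.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over a batch of labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Confident, correct predictions give a low loss; uncertain ones a higher loss.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```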

4. Evaluating the Model


After training, the model is evaluated on unseen data. The following metrics are used:
• Accuracy: Overall correctness of the predictions.
• Precision: How many of the predicted “depressed” users are actually depressed.
• Recall: How many of the actual depressed users are correctly identified.
• F1-Score: Balance between precision and recall.
• ROC-AUC Curve: Evaluates model performance across different classification thresholds.
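
The threshold-based metrics listed above follow directly from the confusion-matrix counts. The sketch below is a plain-Python illustration with hypothetical function names; it also includes Mean Squared Error, which the objectives name for the intensity estimate. ROC-AUC is left out because it requires ranking predicted probabilities rather than hard labels.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = depressed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mean_squared_error(y_true, y_pred):
    """MSE between true and predicted continuous intensity scores."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


labels = [1, 1, 1, 0, 0, 0, 0, 0]
preds = [1, 1, 0, 1, 0, 0, 0, 0]
print(classification_metrics(labels, preds))
```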

5. Deployment for Real-World Detection

Once trained and validated, the model can be deployed in real-world applications such as:

1. Mental Health Monitoring Tools: Integrated into platforms for early warning alerts.
2. Social Media Screening Tools: Used by moderators to identify at-risk users.
3. Therapy Support Systems: Assisting mental health professionals in tracking patient
behavior online.

6. Challenges in Depression Detection


1. Ambiguous Language: Slang, sarcasm, and figurative speech can reduce model accuracy.
2. Privacy Concerns: User consent and ethical data handling are critical.
3. Generalization: Models trained on one platform may not perform well on another.
4. Imbalanced Data: More non-depressed samples can lead to biased learning.
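
One common mitigation for the imbalance noted in item 4 is to weight each class inversely to its frequency when computing the loss. The sketch below uses the formula n_samples / (n_classes * class_count), the same heuristic scikit-learn applies for its "balanced" class weights. It is offered as one possible remedy, not as the approach taken in this project, and the function name is hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}


# 8 control users and 2 depressed users: the minority class gets the larger weight.
print(balanced_class_weights([0] * 8 + [1] * 2))
```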


CHAPTER 6

REFERENCES

1. Shen, Y., Rudzicz, F., & Araki, K. (2020). Text-based depression detection on social media
posts: A systematic literature review. Journal of Medical Internet Research, 22(3), e15649.
2. Reece, A. G., & Danforth, C. M. (2017). Instagram photos reveal predictive markers of
depression. EPJ Data Science, 6(1), 1-12.
3. Gui, G., Xu, L., Liu, Y., & Gong, Y. (2020). Depression detection on social media: A
classification framework and research challenges. IEEE Access, 8, 23577-23591.
4. Ghosh, S., & Anwar, T. (2021). Depression intensity estimation via social media: A deep
learning approach. In Proceedings of the International Joint Conference on Neural Networks
(IJCNN).
5. Anwar, S., & Ghosh, S. (2022). DepressMind: A system for mining Twitter and Reddit to
analyze depression symptoms. In Proceedings of the 2022 IEEE International Conference on
Big Data (Big Data).
6. Li, J., Selvaraju, R. R., Gotmare, A., et al. (2021). Align before fuse: Vision and language
representation learning with momentum distillation. Advances in Neural Information
Processing Systems (NeurIPS).
7. Alayrac, J.-B., Donahue, J., Luc, P., et al. (2022). Flamingo: A visual language model for few-
shot learning. arXiv preprint arXiv:2204.14198.
8. Wang, W., Li, X., Ma, Y., et al. (2021). SimVLM: Simple visual language model pretrained
with weak supervision. arXiv preprint arXiv:2108.10904.
9. Zhang, W., Sun, Y., Zhu, M., & Lin, H. (2023). Depression detection using digital traces on
social media: A knowledge-aware deep learning approach. Information Processing &
Management, 60(2), 103218.
10. Triantafyllopoulos, I., He, Z., & Bennett, K. (2023). Depression detection in social media posts
using affective and social norm features. Computers in Human Behavior Reports, 9, 100237.
11. Trotzek, M., Koitka, S., & Friedrich, C. M. (2018). Utilizing neural networks and linguistic
metadata for early detection of depression indications in text sequences. IEEE Transactions on
Knowledge and Data Engineering.
12. AlSagri, H., & Ykhlef, M. (2020). Machine learning approach for depression detection in
Twitter using content and user interaction features. Journal of Ambient Intelligence and
Humanized Computing.
13. Kumar, A., Sharma, A., & Arora, A. (2019). Anxious depression prediction in real-time social
data. Procedia Computer Science, 152, 202-210.
14. Chatterjee, R., Mandal, S., & Barman, S. (2021). Depression detection using multinomial naive
theorem. In Proceedings of the 2021 International Conference on Intelligent Technologies
(CONIT).
15. Santos, W., Silva, E., Correia, D., et al. (2023). Mental health prediction from social media
text using mixture of experts. IEEE Access, 11, 37788-37800.
