
Depression Analysis from Social Media Data in Bangla Language Applying Deep Recurrent Neural Network

Presented By
Durjoy Bapery (Student ID: 150229)
Abdul Hasib Uddin (Student ID: 150236)
Overview
Introduction
Classification
Motivation
Our Contribution
Related works
Methodology
Result
Conclusion & Future Work
Introduction

Large volumes of user-generated data on social media

Sentiment analysis
Classification
Motivation
According to the WHO, depression was ranked as the third leading cause of the global burden of disease in 2004 and is projected to move into first place by 2030. In Bangladesh, a national survey on mental health documented that depression was found in:
 4.6% of the adult population
 1% of children [9]

Depressed people may commit harmful acts, ranging from suicide to killing others, out of depression.
Depression Statistics Graph
[Figure: percentage of persons with depression by age group (20 and over, 20-39, 40-54, 55 and over), shown separately for men and women.]
Our Contribution
Related Works

A Depression Detection Model Based on Sentiment Analysis in Micro-blog Social Network [2]
Models: Machine learning
Dataset: 6,013 micro-blog posts from Sina Micro-blog, segmented into 50,000 sub-sentences
Result: Precision 80%

Exploring human emotion via Twitter [10]
Models: Unigram model and unigram model with POS tagging; Multinomial Naïve Bayes classifier
Dataset: 4,232 tweets from Sentiment140, manually labelled
Result: Unigram (81%) and unigram with POS (79.5%) for 4-way classification; unigram (66%) and unigram with POS (64.8%) for 5-way classification

Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep Recurrent models [4]
Models: Long Short Term Memory (LSTM)
Dataset: 9,337 posts (Facebook, Twitter, YouTube, news portals, product reviews); 6,698 (72%) Bangla and 2,639 (28%) Romanized Bangla
Result: Accuracy of 78% for Bangla and 55% for Romanized Bangla
Related Works (continued)

Multilingual Sentiment Analysis: An RNN-Based Framework for Limited Data [5]
Models: Recurrent Neural Network (LSTM and GRU)
Dataset: 9,478,095 Amazon, 8,539 Yelp, and 68,170 competition restaurant reviews as the training dataset; 2,045 Spanish, 932 Turkish, 1,635 Dutch, and 2,529 Russian restaurant reviews as the testing datasets
Result: English 87.06%, Spanish 84.21%, Turkish 74.36%, Dutch 81.77%, Russian 85.61%

BUSEM at SemEval-2017 Task 4 Sentiment Analysis with Word Embedding and Long Short Term Memory RNN Approaches [6]
Models: Support Vector Machine (SVM), Random Forest (RF), Naïve Bayes (NB), and Long Short Term Memory (LSTM)
Dataset: SemEval-2016 Task 4 Subtask A Twitter train and test dataset
Result: LSTM model 62.6%, SVM model 62.8%
Proposed Method
 Our proposed method consists of two main steps: creating the dataset and applying the deep recurrent models.

Creating Dataset
The Creating Dataset step is divided into the following sub-steps:

1. Collect raw data (Twitter, Google Sheets)
2. Data pre-processing (letters, numbers, stop characters)
3. Data labelling (5,000 tweets labelled manually)
4. Data post-processing (removing redundancy, stratifying)
5. Data vectorization (integer-level encoding)
6. Split dataset (training data 80%, validation data 10%, test data 10%)
Collecting Raw Data

For creating our own dataset, we applied the following steps:

 We collected about 5,000 raw Bangla tweets from Twitter.
 We collected 210 depressive Bangla posts from our friends using a Google Form.
Data Pre-processing

 For the data pre-processing task, we applied our own algorithm: we created a whitelist of Bangla alphanumeric characters, punctuation, and space, and kept only the whitelisted characters.
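As a rough illustration, a whitelist filter of this kind could be sketched in Python as below; the exact character set, function name, and example tweet are assumptions, not the project's actual code.

```python
# Hypothetical sketch of the whitelist-based pre-processing step (illustrative only).
# Keeps Bangla characters, digits, basic punctuation, and spaces; drops everything else.

BANGLA_CHARS = {chr(c) for c in range(0x0980, 0x0A00)}  # Bengali Unicode block U+0980-U+09FF
DIGITS = set("0123456789")
PUNCTUATION = set(".,!?|।")                              # includes the Bangla danda
WHITELIST = BANGLA_CHARS | DIGITS | PUNCTUATION | {" "}

def clean_tweet(text: str) -> str:
    """Replace non-whitelisted characters with spaces and collapse repeated spaces."""
    kept = "".join(ch if ch in WHITELIST else " " for ch in text)
    return " ".join(kept.split())

print(clean_tweet("আমি খুব একা 😞 #sad"))  # -> "আমি খুব একা" (emoji, '#', Latin letters dropped)
```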
Data Labelling

 Our dataset was labelled manually by a Sociology student into two classes:
1. Depressed
2. Non-depressed
 After labelling we got:
1. 984 depressive tweets
2. 27 negative but non-depressive tweets
3. 195 neutral tweets
4. 2,708 positive tweets
Data Post-processing

 Removing redundancies.
 Down-sampling the non-depressed data to balance it with the 588 depressed samples.
 Stratifying the depressed and non-depressed data.
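One plausible way to perform the redundancy removal and down-sampling, sketched with pandas; the file and column names (labelled_tweets.csv, text, label) are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sketch of the post-processing step (file and column names are assumed).
df = pd.read_csv("labelled_tweets.csv")        # columns: text, label in {"depressed", "non_depressed"}
df = df.drop_duplicates(subset="text")         # remove redundant tweets

depressed = df[df["label"] == "depressed"]
non_depressed = df[df["label"] == "non_depressed"].sample(
    n=len(depressed), random_state=42          # down-sample to match the depressed class size
)
balanced = pd.concat([depressed, non_depressed]).sample(frac=1, random_state=42)  # shuffle
```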
Data Vectorization

 To vectorize our dataset, we applied sentence-level integer encoding.
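The slides do not state which tool was used for the encoding; one common realization is the Keras Tokenizer, sketched below with illustrative example tweets, vocabulary size, and sequence length.

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

texts = ["আমি খুব একা", "আজ দিনটা ভালো"]   # cleaned tweets (illustrative)
labels = [1, 0]                             # 1 = depressed, 0 = non-depressed

MAX_WORDS, MAX_LEN = 10000, 50              # assumed vocabulary size and sequence length

tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(texts)                       # build the word-to-integer mapping
sequences = tokenizer.texts_to_sequences(texts)     # each tweet becomes a list of integers
X = pad_sequences(sequences, maxlen=MAX_LEN, padding="post")
y = np.array(labels)
```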
Data Splitting

 For the hyper-parameter tuning steps, we split our entire dataset into three parts:

i. Training dataset (80%)
ii. Validation dataset (10%)
iii. Testing dataset (10%)

 While applying 10-fold cross-validation, our entire dataset was split into two parts on each fold:
i. Training dataset (90%)
ii. Validation dataset (10%)
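A sketch of both splitting schemes using scikit-learn; the stratified splits and random seed below are assumptions based on the percentages in the slides, not the authors' code.

```python
from sklearn.model_selection import train_test_split, StratifiedKFold

# X, y: the integer-encoded tweets and labels from the vectorization step.
# Hypothetical 80/10/10 split for the hyper-parameter tuning stage.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42)

# 10-fold cross-validation: 90% training / 10% validation on each fold.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for train_idx, val_idx in skf.split(X, y):
    X_tr, X_va = X[train_idx], X[val_idx]
    y_tr, y_va = y[train_idx], y[val_idx]
    # ...train and validate one model per fold...
```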
Applied Methods
 For our thesis work, we have applied two distinct models for analyzing depression from Bangla social media data. The methodologies are:

i. Applying a Gated Recurrent Unit (GRU) Recurrent Neural Network for depression analysis on Bangla data.
ii. Applying a Long Short Term Memory (LSTM) Recurrent Neural Network for depression analysis on Bangla data.
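The exact architectures are not given in the slides; a minimal Keras sketch of the two model families, with assumed layer sizes and hyper-parameters, might look like this.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, GRU, LSTM, Dense

VOCAB_SIZE, EMBED_DIM = 10000, 64              # assumed values, not the thesis settings

def build_model(cell="lstm", units=64):
    """Small recurrent binary classifier; cell is 'lstm' or 'gru'."""
    recurrent = LSTM(units) if cell == "lstm" else GRU(units)
    model = Sequential([
        Embedding(VOCAB_SIZE, EMBED_DIM),      # integer-encoded tweets -> dense vectors
        recurrent,                             # single recurrent layer
        Dense(1, activation="sigmoid"),        # depressed vs. non-depressed
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

lstm_model = build_model("lstm")
gru_model = build_model("gru")
```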
Diagram for the Applied Methods

1. Train and validate the model (training dataset 80%, validation dataset 10%).
2. Hyper-parameter tuning for LSTM and GRU (size, batch size, number of epochs, number of LSTM and GRU layers).
3. If training and validation are not complete, return to step 1; otherwise measure accuracy on the test dataset (10%).
4. If hyper-parameter tuning is not complete, return to step 2; otherwise compare test accuracies and select the best model.
5. Apply 10-fold cross-validation on the selected model using the entire dataset.
6. Calculate the average validation accuracy.
7. Compare the results of the LSTM and GRU models.
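A compressed sketch of this workflow (a small grid search followed by 10-fold cross-validation of the selected configuration); the candidate hyper-parameter values below are placeholders, not the values actually tuned in the thesis.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical hyper-parameter grid (placeholder values).
candidates = [
    {"cell": "gru",  "units": 32, "batch_size": 32, "epochs": 10},
    {"cell": "gru",  "units": 64, "batch_size": 64, "epochs": 20},
    {"cell": "lstm", "units": 32, "batch_size": 32, "epochs": 10},
    {"cell": "lstm", "units": 64, "batch_size": 64, "epochs": 20},
]

# Tuning loop: train on the 80% split, validate on 10%, score on the 10% test split.
best = None
for cfg in candidates:
    model = build_model(cfg["cell"], cfg["units"])
    model.fit(X_train, y_train, validation_data=(X_val, y_val),
              batch_size=cfg["batch_size"], epochs=cfg["epochs"], verbose=0)
    _, test_acc = model.evaluate(X_test, y_test, verbose=0)
    if best is None or test_acc > best[1]:
        best = (cfg, test_acc)

# 10-fold cross-validation of the selected configuration on the entire dataset.
fold_acc = []
for tr, va in StratifiedKFold(n_splits=10, shuffle=True, random_state=42).split(X, y):
    model = build_model(best[0]["cell"], best[0]["units"])
    model.fit(X[tr], y[tr], batch_size=best[0]["batch_size"],
              epochs=best[0]["epochs"], verbose=0)
    fold_acc.append(model.evaluate(X[va], y[va], verbose=0)[1])
print("Average validation accuracy:", np.mean(fold_acc))
```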
Results of Tuning GRU-RNN Model Hyper-parameters

Comparing GRU model test accuracies
Accuracy vs. loss for 10-fold cross-validation (GRU)

 Average validation accuracy: 83.33%


Results of Tuning LSTM-RNN Model Hyper-parameters

Comparing LSTM model test accuracies
Accuracy vs. loss for 10-fold cross-validation (LSTM)

 Average validation accuracy: 84.4%


GRU vs. LSTM Test Accuracy

 GRU performs better than LSTM in 8 implementations.
 LSTM performs better than GRU in 5 implementations.
GRU vs. LSTM 10-Fold Cross-Validation Accuracies

 The GRU learning curve is more stable than the LSTM learning curve.


Conclusion & Future Work
 In this work, we have applied deep recurrent models to analyze Bangla sentences and predict human depression. We have applied our proposed methods on a small dataset.

 Applying these methods to larger datasets may lead to more accurate predictions.
 Other deep learning models can be applied to depression analysis in further research.
 Other types of depression can be detected.
 The causes behind the depression can be predicted.
References
1. “Depression.” World Health Organization, World Health Organization, 4 July 2017,
www.who.int/mental_health/management/depression/en/, Last Visited 21st July, 2018
2. Wang, Xinyu, et al. "A depression detection model based on sentiment analysis in micro-blog social
network." Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, Berlin,
Heidelberg, 2013.
3. Hassan, Asif, et al. "Sentiment Analysis on Bangla and Romanized Bangla Text (BRBT) using Deep
Recurrent models." arXiv preprint arXiv:1610.00369 (2016).
4. Can, Ethem F., Aysu Ezen-Can, and Fazli Can. "Multilingual Sentiment Analysis: An RNN-Based
Framework for Limited Data." arXiv preprint arXiv:1806.04511 (2018).
5. Ayata, Deger, Murat Saraclar, and Arzucan Ozgur. "BUSEM at SemEval-2017 Task 4A Sentiment
Analysis with Word Embedding and Long Short Term Memory RNN Approaches." Proceedings of the
11th International Workshop on Semantic Evaluation (SemEval-2017). 2017.
6. Yin, Wenpeng, et al. "Comparative study of cnn and rnn for natural language processing." arXiv
preprint arXiv:1702.01923 (2017).
References
8. Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical
machine translation." arXiv preprint arXiv:1406.1078 (2014).
9. Helal, Ahmed. “Depression: Let's Talk.” The Daily Star, The Daily Star, 1 Apr. 2017,
www.thedailystar.net/health/depression-lets-talk-1384978, Last Visited 21st July, 2018.
10. Riyadh, Abu Zonayed, Nasif Alvi, and Kamrul Hasan Talukder. "Exploring human emotion via Twitter."
Computer and Information Technology (ICCIT), 2017 20th International Conference of. IEEE, 2017.
11. Beal, Vangie. “Stop Words.” The Five Generations of Computers - Webopedia Reference,
www.webopedia.com/TERM/S/stop_words.html.

12. “National Center for Health Statistics.” Centers for Disease Control and Prevention, Centers for Disease
Control and Prevention, 16 Oct. 2014, www.cdc.gov/nchs/products/databriefs/db167.htm.
End of the Presentation

Thank you…
