Enhancing Depressive Post Detection in Bangla - A Comparative Study of TF-IDF, BERT and FastText Embeddings
Research Article
Received: 8th May 2024; Accepted: 30th June 2024; Published: 1st July 2024
Abstract: Due to the massive adoption of social media, the detection of users’ depression through social media analytics
bears significant importance, particularly for underrepresented languages, such as Bangla. This study introduces a
well-grounded approach to identify depressive social media posts in Bangla, by employing advanced natural
language processing techniques. The dataset used in this work, annotated by domain experts, includes both
depressive and non-depressive posts, ensuring high-quality data for model training and evaluation. To address the
prevalent issue of class imbalance, we utilised random oversampling for the minority class, thereby enhancing the
model's ability to accurately detect depressive posts. We explored various numerical representation techniques,
including Term Frequency – Inverse Document Frequency (TF-IDF), Bidirectional Encoder Representations from
Transformers (BERT) embedding and FastText embedding, by integrating them with a deep learning-based
Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BiLSTM) model. The results
obtained through extensive experimentation indicate that the BERT approach outperformed the others,
achieving an F1-score of 84%. This indicates that BERT, in combination with the CNN-BiLSTM architecture,
effectively recognises the nuances of Bangla texts relevant to depressive contents. Comparative analysis with the
existing state-of-the-art methods demonstrates that our approach with BERT embedding performs better than others
in terms of evaluation metrics and the reliability of dataset annotations. Our research contributes significantly to
the development of reliable tools for detecting depressive posts in the Bangla language. By highlighting the efficacy
of different embedding techniques and deep learning models, this study paves the way for improved mental health
monitoring through social media platforms.
Keywords: BERT; Bi-LSTM; CNN; Depression; FastText; Post Detection; TF-IDF; Text Classification
Saad Ahmed Sazan, Mahdi H. Miraz and A B M Muntasir Rahman, “Enhancing Depressive Post Detection in Bangla: A Comparative
Study of TF-IDF, BERT and FastText Embeddings”, Annals of Emerging Technologies in Computing (AETiC), Print ISSN: 2516-0281, Online
ISSN: 2516-029X, pp. 34-49, Vol. 8, No. 3, 1st July 2024, Published by International Association for Educators and Researchers (IAER),
DOI: 10.33166/AETiC.2024.03.003, Available: http://aetic.theiaer.org/archive/v8/v8n3/p3.html.
AETiC 2024, Vol. 8, No. 3 35
1. Introduction
Depression is a serious condition characterised by worsening of negative emotions [1]. Prolonged
sadness or enduring a difficult situation that causes continuous, unbearable suffering can lead to
depression. If left untreated, depression can even lead to suicide [2]. According to the World Health
Organization (WHO), an estimated 280 million people, i.e. 3.8% of the global population, experience
depression, including 5% of adults and 5.7% of senior citizens (60 years or older)1. In 2022, 49,369 people
worldwide took their own lives due to depressive disorders, and the rate of suicide continues to rise2. In this
era of a digitally connected world, social media is everywhere, serving as a platform for people to express
themselves, communicate and share information [3]. However, social media often becomes a space where
people express mental health issues such as depression [4]. It is important to accurately and promptly
identify these depressive posts on social media so that early intervention, timely support and potentially
life-saving actions can be taken. Machine learning approaches have emerged as powerful tools to address
this challenge. They can process and analyse large amounts of data, uncovering patterns that may not be
visible to the human eye. The importance of detecting depressive posts on social media stems from the need
for timely assistance and intervention. Social media platforms provide a unique window into the thoughts
and emotions of individuals in real time. By analysing the language used in posts, machine learning models
can identify linguistic cues and patterns associated with depression. This automated detection can then
trigger alerts for mental health professionals or support systems, ensuring that the individuals promptly
receive the help they need. This scalable approach offers a level of objectivity and consistency that is difficult
to achieve through manual monitoring on its own.
While there has been noteworthy progress in detecting depressive posts in widely spoken languages,
such as English, not much advancement in this domain is apparent for less widely spoken languages, such
as Bangla. Bangla, spoken by over 230 million people worldwide3, mainly in Bangladesh and the Indian
state of West Bengal, presents unique linguistic challenges for natural language processing (NLP) [5]. The
scarcity of Bangla datasets, linguistic resources and pre-trained models makes it even more difficult to
detect depressive posts in this language. Figure 1 demonstrates a significant rise in depressive disorders in
Bangladesh over the past three decades4. This linear upward trend highlights a critical public health
concern, underscoring the importance of identifying individuals suffering from depression and
implementing effective interventions. Addressing this growing mental health issue is crucial for improving
the well-being and quality of life of the Bangladeshi population.
1 https://www.who.int/news-room/fact-sheets/detail/depression#:~:text=Approximately%20280%20million%20people%20in
2 https://www.kff.org/mental-health/issue-brief/a-look-at-the-latest-suicide-data-and-change-over-the-last-decade/
3 https://bangladeshus.com/roots-of-the-bangla-language/#:~:text=The%20Bangla%20language%2C%20also%20known%20as%20Bengali%2C%20is
4 https://vizhub.healthdata.org/gbd-compare/
www.aetic.theiaer.org
and FastText generates word embeddings that can handle out-of-vocabulary words by utilising sub-word
information.
By comparing the performance of TF-IDF, BERT and FastText embeddings for Bangla
depressive post detection, this study seeks to identify the most effective approach. The goal is to
enhance the understanding of depressive language in Bangla and pave the way for developing
tools that can support mental health monitoring on social media. The key contributions of our work
include:
1. Effectively Handling Class Imbalance: We addressed the issue of class imbalance in our dataset,
so that our model predicts the minority class more reliably.
2. Enhancing Performance of Text Representation Techniques: Our research demonstrates that
while Term Frequency-Inverse Document Frequency (TF-IDF) effectively captures the essential
features, BERT embeddings provide a more comprehensive understanding of the texts. This
indicates that while simpler methods (e.g. TF-IDF) are useful, advanced embeddings (e.g. BERT)
can offer significant benefits in capturing the nuanced semantics of Bangla depressive posts. This
finding underscores the importance of choosing the right text representation technique based on
the specific requirements as well as constraints of the dataset and the application.
3. Proposing a Novel Custom CNN-BiLSTM Model: We introduce a custom model combining
Convolutional Neural Networks (CNN) and Bidirectional Long Short-Term Memory (BiLSTM)
networks. This hybrid approach captures local patterns and long-term dependencies, resulting in
highly accurate predictions of depressive posts. This model represents a significant advancement
in the automatic detection of depressive symptoms from the texts.
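To make the simplest of the three representations concrete, the following is a minimal sketch of TF-IDF weighting. It assumes the textbook unsmoothed form, tf(t, d) × log(N / df(t)), and plain white-space tokenisation; both are assumptions made for illustration, as are the function name `tf_idf` and the toy posts, which are not taken from the dataset.

```python
import math
from collections import Counter


def tf_idf(tokenised_docs):
    """Return one {term: weight} dict per document (unsmoothed TF-IDF)."""
    n_docs = len(tokenised_docs)

    # Document frequency: the number of documents containing each term.
    doc_freq = Counter()
    for tokens in tokenised_docs:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenised_docs:
        counts = Counter(tokens)
        vectors.append({
            # Relative term frequency times inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Toy, white-space-tokenised posts (illustrative only).
docs = [post.split() for post in ["আমি ভালো নেই", "আমি ভালো আছি", "আজ আমি একা"]]
for vector in tf_idf(docs):
    print(vector)
```

A term that occurs in every post receives a weight of zero, whereas terms confined to a few posts are weighted up. Library implementations typically add smoothing and normalisation on top of this basic form.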
2. Related Works
While there have been numerous studies conducted in widely spoken languages such as English, there
is a notable lack of research in Bangla regarding depressive text analysis. This section provides a
comprehensive literature review on this topic.
The study by Uddin et al. [7] delved into the understanding of depression through Bangla social media
data. The researchers utilised advanced models such as Gated Recurrent Unit (GRU) and Long Short-Term
Memory (LSTM) Recurrent Neural Networks. The development of a small yet meticulously organised
dataset consisting of Bangla tweets is the novelty of their work. The findings of their research demonstrated
that adjusting the settings of these models, also known as hyper-parameter tuning, significantly impacted
their accuracy. It has also been reported that the GRU models outperformed the LSTM models, particularly
with the smaller dataset used. However, this justification was solely based on the achieved accuracy scores,
which can sometimes be misleading. In fact, the reduced size of the dataset can result in a less generalisable
model. This implies that the model may perform well on the reduced dataset but poorly on new, unseen
data due to the lack of diverse training examples. This constitutes a significant research gap in their work.
Another study by Uddin et al. [8] deployed a Long Short-Term Memory (LSTM) Deep Recurrent
Network for depression analysis on Bangla social media data. The study involved the creation of a small
dataset of Bangla tweets, which was then stratified. The paper demonstrated the impact of hyper-parameter
tuning on the efficacy of depression analysis on a small Bangla social media dataset. The data was sourced
from Twitter, with 5,000 Bangla tweets collected through repeated sampling, allowing for random
repetitions of tweets. The 5,000 Bangla tweets were categorised into four groups: depressive (984 tweets),
non-depressive (2,930 tweets), ambiguous (699 tweets) and incomplete sentences (387 tweets). The initial
dataset exhibited an imbalance, with 2,930 non-depressive tweets and only 984 depressive tweets, which
could lead to accuracy and overfitting issues. Consequently, 984 non-depressive tweets were chosen to
balance the dataset with 984 depressive tweets, excluding ambiguous and incomplete tweets. The study
applied the LSTM Deep Recurrent model to analyse Bangla tweets for predicting human depression. The
findings revealed that the LSTM model with a size of 128, batch size of 25 with 10 epochs and 5 layers with
20 epochs achieved high depression detection accuracies. This suggests that high accuracy can be achieved
for small datasets in complex psychological tasks such as depression analysis by tuning the Deep Recurrent
model. However, it is important to note that this approach has limitations similar to those of the previous
work [7]. The justifications provided were solely based on the accuracy metric and down-sampling the
dataset may lead to a loss of model generalisation. Moreover, smaller datasets can also impact the reliability
of evaluation metrics.
Chowdhury et al. [9] aimed to automatically extract sentiment or polarity expressed by users in Bangla
Twitter posts or "tweets". Since no labelled training corpus of Bangla tweets was available, they utilised
tweets obtained from the Twitter API, which were designated for the training set, to construct the dataset
for training the classifier. For this purpose, they utilised a semi-supervised method called self-training
bootstrapping. Their practical findings were promising for a resource-scarce language such as Bangla,
achieving an accuracy of 93% for SVM, using unigrams with emoticons as features. The underlying
assumption was that users share tweets to convey opinions and subjective content, effectively narrowing
down the classification problem to identifying the overall polarity of tweets as either negative or positive.
To construct the training corpus, they employed a semi-supervised bootstrapping approach, eliminating
the need for labour-intensive manual annotation. Drawing from previous research in English, Support
Vector Machine (SVM) and Maximum Entropy (MaxEnt) were found to outperform other classifiers in this
domain. Consequently, for classification, they utilised SVM and MaxEnt, conducting a comparative analysis
of the performance of these two machine learning algorithms by experimenting with various sets of
features. However, it is worth noting that their dataset was labelled by people who are not experts in
identifying polarity, which is one of the key limitations of their work. Non-experts could introduce biases
based on their subjective understanding of the data, potentially skewing the model's performance and
creating a bias toward non-expert perspectives.
Tasnim et al. [10] employed various machine learning algorithms to detect depressive Bangla text from
social media posts. Feature extraction methods such as count vectorisation, TF-IDF and word embedding
were applied to a dataset containing 6,178 texts gathered from social media. The dataset was self-generated
and perfectly balanced. The experiment utilised Multinomial NB, Aggressive Classifier, Decision tree
classifier, Neural Network and Linear Support Vector Machine. Additionally, two deep learning models,
namely Bidirectional LSTM (BiLSTM) and Gated Recurrent Units (GRU), were employed. Each model was
subjected to 10-fold cross-validation, with each fold serving as a test dataset for the corresponding training
iteration. Their research achieved a classification accuracy of 97% using the decision tree algorithm and 94%
with the bidirectional LSTM deep learning model for predicting depressive text in the Bangla language.
However, similar to the work by Chowdhury et al. [9], this study also faced limitations related to the
labelling of the dataset by non-experts.
Akhter et al. [11] proposed the use of machine learning algorithms and user information to detect
cyberbullying in Bangla text. They gathered a dataset from social media, labelling it as bullied or not bullied,
to train various classification models. Cross-validation results showed that a support vector machine (SVM)
algorithm achieved a 97% detection accuracy. The study aimed to develop a novel method for analysing
Bangla content on social media by combining text analytics and machine learning algorithms and to
compare its performance with other techniques. They extensively explored suitable algorithms for Bangla
text categorisation, including Naive Bayes, SVM, Decision Tree and K-Nearest Neighbours, using the
WEKA software platform. The experiments involved 2,400 Bangla texts from social media posts, with 10%
labelled as bullying, and a 10-fold cross-validation model was used to evaluate the models' performance. It
is worth noting that the model was trained using an imbalanced dataset, which can lead to bias towards the
majority class. Additionally, similar to [9] and [10], the dataset was not labelled by an expert, potentially
leading to biased classification.
In the work by Hassan et al. [12], a substantial textual dataset of both Bangla and Romanised Bangla
texts was provided, marking the first of its kind. This dataset underwent pre-processing and multiple
validations and was made ready for sentiment analysis (SA) implementation and experiments.
Furthermore, the dataset was tested in a Deep Recurrent model, specifically Long Short-Term Memory
(LSTM), using two types of loss functions: binary cross-entropy and categorical cross-entropy. Experimental
pre-training was also conducted by utilising data from one validation set to pre-train the other and vice
versa. Their key contributions included providing a dataset comprising 10,000 Bangla and Romanised
Bangla text samples, each annotated by two adult Bangla speakers, pre-processing the data for easy usability
by researchers, applying deep recurrent models to the Bangla and Romanised Bangla text corpus and
experimenting with pre-training the dataset of one label for another (and vice versa) to assess potential
improvements in results. It is worth noting, however, that the accuracy achieved was modest. Moreover,
relying solely on accuracy, without reporting other evaluation metrics, weakens the justification of the
results.
These studies exhibit several common limitations, including the absence of datasets labelled by experts,
the training of models on imbalanced datasets, and the reliance solely on accuracy scores, which can be
misleading. Our research primarily addresses these limitations. We utilised a dataset labelled by domain
experts, addressed the issue of class imbalance and provided a comprehensive set of evaluation metrics to
offer a holistic view of our model's performance.
3.1. Dataset
This segment provides a description of the dataset. The dataset was created by Uddin et al. [7] and is
available on GitHub5. Initially, they collected the data from various tweets. As a secondary source, they also
collected Bangla depressive data from users by distributing an online questionnaire. The dataset was
manually labelled by a sociology student, acting as a domain expert in human behaviour. There are a total of 2,930
non-depressive posts and 984 depressive posts in the dataset. This dataset is the largest publicly available
collection of Bangla depressive and non-depressive posts. Additionally, this dataset has been meticulously
labelled by experts, ensuring high-quality and reliable annotations, which is a key factor in our selection.
While there are other datasets, they are not publicly accessible and most of them lack expert annotation.
These factors make the selected dataset5 the optimal choice for our study. While the class distribution of the
dataset is shown in Figure 3, the sample of the dataset is depicted in Table 1, showing two different posts
from two distinct classes.
Table 1. Snapshot of the Dataset
Text | Label
স্বার্ক থ জন্ম আমরা জন্মন্মছি এই দেন্মে রাস্তায় মানুষ মন্মর আর মন্ত্রী সান্মেব োন্মস োয় আফন্মসাস | Depressive
জজত্তা দেছি ! বাে্ ! এটাই বাাংলান্মেে ! সাবাস ! অছিনন্দন ! ২০০৯সান্মলর পর এছেয়ার বাইন্মর এটাই ওছিআই ছসছরজ জয় ! | Non_depressive
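With 2,930 non-depressive and 984 depressive posts, the depressive class is the minority. Random oversampling, the balancing technique named in the abstract, can be sketched as below. This is a minimal illustration rather than the exact procedure used: the function name `random_oversample`, the seed and the placeholder posts are hypothetical, and whether balancing is applied before or after the train/test split is not assumed here (oversampling only the training portion is the usual way to avoid duplicated posts leaking into the test set).

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=42):
    """Duplicate randomly chosen minority-class samples until all classes match."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # Pool of original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        # Draw with replacement until the class reaches the majority size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_samples.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_samples, out_labels


# Placeholder posts carrying the class counts reported for the dataset.
labels = ["non_depressive"] * 2930 + ["depressive"] * 984
posts = [f"post_{i}" for i in range(len(labels))]
balanced_posts, balanced_labels = random_oversample(posts, labels)
print(Counter(balanced_labels))
```

After the call, both classes contain 2,930 entries; the original posts are kept unchanged and only duplicates of depressive posts are appended.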
5 https://github.com/abdulhasibuddin/Depression-Analysis-from-Social-Media-Data-in-Bangla-Language-Applying-Deep-Recurrent-Neural-Network/tree/master/Implementations
3. Removing Punctuations: Punctuation marks can sometimes interfere with the text processing
algorithms, especially those which are not punctuation-aware. Removing them can simplify the
text, making it easier for the models to learn patterns in the data.
4. Removing Extra White Spaces: Extra white spaces can create inconsistencies in the dataset. They
can affect tokenisation and other preprocessing steps, leading to irregularities in text
representation. Removing unnecessary white spaces ensures that the text data is clean and
compact, making it more efficient for storage and faster for processing.
For instance, the sentence ‘শুি সকাল পছবত্র জুম্মার ছেন জুম্মা দমাবারক ! ☺️সুন্দর দোক সবার জীবন
#Jumma’ will be transformed to ‘শুি সকাল পছবত্র জুম্মার ছেন জুম্মা দমাবারক সুন্দর দোক সবার জীবন’ after the
preprocessing steps.
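The effect of these steps can be approximated with the sketch below. Only punctuation removal and white-space clean-up are taken from the list above; dropping hashtags, symbols and emoji is an assumption inferred from the example sentence, and the function name `clean_post` is hypothetical.

```python
import re
import unicodedata


def clean_post(text):
    """Strip hashtags, punctuation, symbols and surplus white space."""
    # Assumption: hashtags are dropped entirely, as in the example above.
    text = re.sub(r"#\S+", " ", text)

    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if category.startswith(("P", "S")):
            # Punctuation (P*) and symbols (S*, which covers most emoji).
            kept.append(" ")
        elif ch == "\ufe0f":
            # Emoji variation selector left behind by some emoji.
            continue
        else:
            # Letters, digits and Bangla vowel signs are kept untouched.
            kept.append(ch)

    # Collapse runs of white space and trim both ends.
    return " ".join("".join(kept).split())


print(clean_post("শুভ সকাল ! \u263a\ufe0f #Jumma"))
```

Filtering by Unicode category, rather than by an ASCII punctuation list, also removes Bangla-specific marks such as the danda (।) while leaving combining vowel signs in place.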
words into smaller units such as character n-grams and training word embeddings using a skip-gram
model with negative sampling. During the training process, the model learns to predict the
context (surrounding words) given a target word based on its sub-word representations, adjusting
parameters to minimise the loss. The resulting embeddings provide dense vector representations
for words, capturing semantic relationships and meanings based on their sub-word compositions.
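The sub-word decomposition can be illustrated with a short sketch. It assumes the commonly used FastText convention of wrapping each word in the boundary markers `<` and `>` and extracting character n-grams of length 3 to 6; the function name `char_ngrams` is hypothetical, and the sketch operates on Unicode code points rather than on full Bangla grapheme clusters.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in boundary markers."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        # Slide a window of width n across the wrapped word.
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


# Related word forms share many n-grams, so their vectors end up close together.
shared = set(char_ngrams("খেলা")) & set(char_ngrams("খেলার"))
print(sorted(shared))
```

Because a word vector is composed from the vectors of its n-grams, an unseen word can still be embedded as long as some of its n-grams were observed during training.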
learning rates, which combine the benefits of momentum and RMSprop optimisers as well as require less
manual hyperparameter tuning.
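Assuming that the optimiser described here is Adam, a single parameter update can be sketched as follows. The default values shown are those commonly quoted for Adam and are not necessarily the settings used in this work; the function name `adam_step` and the toy objective f(x) = x² are purely illustrative.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter; t is the 1-based step count."""
    # Momentum-style running average of the gradient.
    m = beta1 * m + (1 - beta1) * grad
    # RMSprop-style running average of the squared gradient.
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction for the zero-initialised averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter adaptive step.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Minimise f(x) = x^2, whose gradient is 2x, starting from x = 5.
x, m, v = 5.0, 0.0, 0.0
for t in range(1, 201):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.1)
print(x)
```

The first average plays the role of momentum, while the second rescales the step in the manner of RMSprop, which is why the effective step size adapts to each parameter.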
Figure 11. Accuracy Curve (BERT Embedding) Figure 12. Loss Curve (BERT Embedding)
Figure 13. Confusion Matrix (BERT Embedding) Figure 14. ROC Curve (BERT Embedding)
The model's performance using BERT and FastText embeddings is almost identical, as shown in Figure
15, where the validation set's accuracy shows minimal improvement over time despite a continuous
decrease in loss (Figure 16), indicating slight overfitting. Figures 17 and 18 display the confusion matrix and
ROC curve for the FastText approach. The area under the ROC curve (AUC) is 80%, which closely resembles that of the
BERT approach. This similarity is attributed to the dataset's characteristics; a small or limited vocabulary
dataset prevents FastText from fully leveraging its advantage of capturing sub-word information and
producing out-of-vocabulary word embeddings. Consequently, FastText's potential benefits are not fully
realised, resulting in comparable performance to BERT, which excels at handling larger, more diverse
datasets. Thus, in scenarios with constrained datasets, the choice between BERT and FastText embeddings
might not significantly impact the model's performance.
Figure 15. Accuracy Curve (FastText Embedding) Figure 16. Loss Curve (FastText Embedding)
Figure 17. Confusion Matrix (FastText Embedding) Figure 18. ROC Curve (FastText Embedding)
In Table 3, the performance of three different word representation approaches across four evaluation
metrics is presented. The BERT representation achieved the highest F1 score of 84%. In comparison, TF-IDF
and FastText embeddings achieved slightly lower scores of 82% and 83%, respectively. Moving on to
accuracy, TF-IDF, BERT and FastText achieved scores of 81%, 83%, and 82%, respectively. In terms of
precision and recall, similar patterns emerged. While the precision scores are 82%, 85% and 84%,
respectively, the recall scores match the accuracy scores (81%, 83% and 82%). These results indicate that the BERT
approach achieves the highest F1 score, while FastText and TF-IDF display comparable performance in
accuracy, precision and recall. Despite TF-IDF achieving the highest AUC (84%), its F1 score (82%), accuracy
(81%), and precision (82%) are slightly lower compared to BERT and FastText. This discrepancy implies that
while TF-IDF is overall good at distinguishing between classes, it might not be as effective as BERT in
handling the balance between precision and recall.
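The difference between the two kinds of metric can be made concrete. AUC measures how well the predicted scores rank depressive posts above non-depressive ones across all thresholds, whereas the F1 score is computed at a single decision threshold, so a model can lead on one and trail on the other. The sketch below computes AUC directly from this ranking interpretation; the function name `roc_auc` and the toy scores are illustrative and are not outputs of our models.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive is scored above a negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 1 = depressive, 0 = non-depressive.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

In the toy example, three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75.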
BERT embeddings emerge as the most effective technique for detecting depressive posts, as evidenced
by the highest F1 score. This makes it a preferable choice when the goal is to achieve a balanced model with
both high precision and recall. While TF-IDF provides a strong AUC score, suggesting good overall
discrimination ability, its lower F1 score highlights potential issues in balancing false positives and
negatives. FastText offers a middle ground with consistent performance, making it a reliable
alternative.
Table 3. Evaluation Metrics for Three Different Approaches
Type of Numerical Representation | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%)
TF-IDF | 81 | 82 | 81 | 82
BERT Embedding | 83 | 85 | 83 | 84
FastText Embedding | 82 | 84 | 82 | 83
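For reference, the four metrics in Table 3 follow from the counts of a binary confusion matrix, as sketched below. The counts used are hypothetical and do not correspond to any of the confusion matrices reported here; whether the tabulated scores are per-class or averaged over both classes is likewise not assumed.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: 40 true positives, 10 false positives,
# 5 false negatives and 45 true negatives.
for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
    print(f"{name}: {value:.3f}")
```

Because F1 is a harmonic mean, it is pulled towards the lower of precision and recall, which is why it rewards a balance between false positives and false negatives.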
Table 4 provides a comparative overview of recent studies focused on Bangla textual data. Uddin et al.
[7-8] conducted two distinct studies using the same dataset but employing different methodologies. In the
first study, they implemented a GRU model, which yielded an accuracy of 75.7%. However, this study fell
short of providing comprehensive evaluation metrics, making it difficult to thoroughly assess the model's
performance. The second study used an LSTM model and achieved a higher accuracy of 86.3%, yet it
similarly lacked additional performance metrics, and relying solely on accuracy can be
misleading. Chowdhury et al. [9] achieved an impressive accuracy and F1 score of 93% using an SVM
classifier. However, a notable limitation of their study is that the dataset was not labelled by
domain experts, which might introduce biases in the results. In a
different approach, Tasnim et al. [10] developed a balanced dataset, leading to robust performance metrics
across the board. They reported an accuracy of 97%, with precision, recall and F1 measures being
consistently high. Despite the high scores achieved, their dataset was also not annotated by experts, as was
the case in [9]. Akhter et al. [11] focused on identifying bullying texts and achieved an accuracy of 97.27%
using an SVM classifier. However, their dataset was imbalanced, which could affect the generalisability of
their findings. Hassan et al. [12], on the other hand, used an LSTM model and reported an accuracy of 78%
but did not provide any additional evaluation metrics. In contrast to these studies, our research addresses
the class imbalance issue within the dataset and employs a BERT embedding combined with a CNN-
BiLSTM architecture. This approach resulted in an accuracy of 83%. Unlike several studies that only report
accuracy, our study provides a comprehensive set of evaluation metrics, including AUC, precision, recall
and F1 score. This holistic evaluation allows for a better understanding of the model's performance beyond
mere accuracy, highlighting its ability to effectively balance false positives and false negatives. Furthermore,
our dataset was annotated by domain experts, which significantly enhances the reliability and validity of
our results. This comprehensive evaluation of both the dataset and the model's performance underscores
the robustness of our findings.
Table 4. Comparison with the other published works
Work | Methodology | Dataset | Accuracy (%) | Precision (%) | Recall (%) | F1 score (%) | Remarks
Uddin et al. [7] | GRU | 984 depressive tweets and 2,930 non-depressive tweets. | 75.7 | - | - | - | Accuracy is fairly low and does not provide other evaluation metrics.
Uddin et al. [8] | LSTM | 984 depressive tweets and 2,930 non-depressive tweets. | 86.3 | - | - | - | Only accuracy was provided, which might have been misleading.
Chowdhury et al. [9] | SVM & MaxEnt | Consisted of 1300 unlabelled tweets | 93 | - | - | 93 | Dataset was not labelled by domain experts.
Tasnim et al. [10] | Decision Tree | 7000 data were collected from Facebook, where 3500 were labelled as depressed and the rest were not depressive. | 97 | 97 | 97 | 97 | Dataset was not labelled by domain experts.
Akhter et al. [11] | SVM | 2400 texts were collected from Twitter and Facebook where 10% of the data were labelled as bullied. | 97.27 | 99 | - | 99 | The dataset is imbalanced and was not labelled by domain experts.
This section outlines the limitations encountered in this work, along with our future research
directions.
5.1. Limitations
• Small Dataset Size: One key limitation is the small size of our dataset, which restricts the model's
ability to generalise to a broader range of data. This constraint limits the scope and applicability of
our findings.
• Test Set Size: The test set size was not sufficient to conclusively determine the model's performance
across a diverse range of inputs. A larger test set would have provided a more robust evaluation of
the model's capabilities.
• Lack of External Validation: We were unable to test the robustness of our model with other
datasets, as they were not publicly available. This limits our ability to confirm the model's
performance on different data sources.
6. Conclusion
Our work aimed to contribute to the identification of individuals experiencing ongoing depression, as
reflected in their social media posts. Given the scarcity of research in the Bangla language, it is essential to
develop a reliable tool for detecting depressive posts written in Bangla. To this end, our research utilised a
dataset containing both depressive and non-depressive posts that were carefully annotated by domain
experts. To address the issue of class imbalance, random oversampling was employed for the minority class,
ensuring that our model could effectively learn from a more balanced dataset. This step was crucial in
preventing the model from being biased towards the majority class and enhancing its ability to accurately
detect depressive posts. Several experiments were conducted with various numerical representation
techniques to convert textual data into meaningful vectors for machine learning. These techniques included
Term Frequency-Inverse Document Frequency (TF-IDF), BERT embeddings and FastText embeddings.
These representations were then applied to a deep learning-based CNN-BiLSTM model, which combined
the strengths of Convolutional Neural Networks (CNN) for feature extraction and Bidirectional Long Short-
Term Memory (BiLSTM) networks for understanding contextual information from both directions in a
sequence. Our results indicated that the TF-IDF representation achieved the best class separation,
with an AUC of 84%, which is better than both BERT and FastText embeddings. This finding
highlights the effectiveness of TF-IDF in capturing the essential features of Bangla text in our dataset.
Additionally, while BERT did not outperform TF-IDF in terms of AUC, it achieved the highest F1 score (84%),
demonstrating its capability to effectively balance precision and recall. When comparing our work with
other published research, we found that our evaluation is more comprehensive, reporting accuracy,
precision, recall, F1-score and AUC where several earlier studies reported accuracy alone. Additionally,
the reliability of our dataset annotations, performed by domain experts, strengthens the credibility of our model.
Acknowledgement
This research is financially supported by Xiamen University Malaysia (Project codes:
XMUMRF/2021-C8/IECE/0025 and XMUMRF/2022-C10/IECE/0043).
References
[1] Nancy Frasure-Smith, François Lespérance and Mansour Talajic, “The impact of negative emotions on prognosis
following myocardial infarction: Is it more than depression?”, Health Psychology, vol. 14, no. 5, pp. 388–398,
September 1995, Published by American Psychological Association, Print ISSN: 0278-6133, Online ISSN: 1930-7810,
DOI: 10.1037/0278-6133.14.5.388, Available: https://psycnet.apa.org/doi/10.1037/0278-6133.14.5.388.
[2] Kevin M. Malone, Gretchen L. Haas, John A. Sweeney and J. John Mann, “Major depression and the risk of
attempted suicide”, Journal of Affective Disorders, vol. 34, pp. 173–185, 1995, ISSN: 0165-0327,
DOI: 10.1016/0165-0327(95)00015-F, Available: https://www.sciencedirect.com/science/article/abs/pii/016503279500015F.
[3] Casey T. Carr and Rebecca A. Hayes, “Social Media: Defining, Developing, and Divining”, Atlantic Journal of
Communication, vol. 23, no. 1, pp. 46–65, 2015, Print ISSN: 1545-6870, Online ISSN: 1545-6889, DOI:
10.1080/15456870.2015.972282, Available: https://www.tandfonline.com/doi/abs/10.1080/15456870.2015.972282.
[4] John A. Naslund, Arnav Bondre, John Torous and Kelly A. Aschbrenner, "Social Media and Mental Health: Benefits,
Risks, and Opportunities for Research and Practice", Journal of Technology in Behavioral Science, vol. 5, pp. 245–257,
2020, Electronic ISSN: 2366-5963, Published by Springer Nature, DOI: 10.1007/s41347-020-00134-x, Available:
https://link.springer.com/article/10.1007/s41347-020-00134-x.
[5] Prakash M Nadkarni, Lucila Ohno-Machado and Wendy W Chapman, “Natural language processing: an
introduction”, Journal of the American Medical Informatics Association, vol. 18, no. 5, pp. 544–551, September 2011,
Published by Oxford University Press, ISSN: 1067-5027, DOI: 10.1136/amiajnl-2011-000464, Available:
https://academic.oup.com/jamia/article/18/5/544/829676.
[6] Avinash Madasu and Sivasankar Elango, “Efficient feature selection techniques for sentiment analysis”, Multimedia
Tools and Applications, December 2019, Published by Springer, ISSN: 1573-7721, DOI: 10.1007/s11042-019-08409-z,
Available: https://link.springer.com/article/10.1007/s11042-019-08409-z.
[7] Abdul Hasib Uddin, Durjoy Bapery and Abu Shamim Mohammad Arif, “Depression Analysis of Bangla Social
Media Data using Gated Recurrent Neural Network”, in Proceedings of the 2019 1st International Conference on
Advances in Science, Engineering and Robotics Technology (ICASERT), 03-05 May 2019, Dhaka, Bangladesh, Published
by IEEE, Electronic ISBN:978-1-7281-3445-1, Print on Demand(PoD) ISBN:978-1-7281-3446-8, DOI:
10.1109/icasert.2019.8934455, Available: https://ieeexplore.ieee.org/document/8934455.
www.aetic.theiaer.org
AETiC 2024, Vol. 8, No. 3 49
[8] Abdul Hasib Uddin, Durjoy Bapery and Abu Shamim Mohammad Arif, “Depression Analysis from Social Media
Data in Bangla Language using Long Short Term Memory (LSTM) Recurrent Neural Network Technique”, in
Proceedings of the 2019 International Conference on Computer, Communication, Chemical, Materials and Electronic
Engineering (IC4ME2), 11-12 July 2019, Rajshahi, Bangladesh, Published by IEEE, Electronic ISBN:978-1-7281-3060-
6, Print on Demand (PoD) ISBN:978-1-7281-3061-3, DOI: 10.1109/IC4ME247184.2019.9036528, Available:
https://ieeexplore.ieee.org/abstract/document/9036528.
[9] Shaika Chowdhury and Wasifa Chowdhury, “Performing sentiment analysis in Bangla microblog posts”, in
Proceedings of the 2014 International Conference on Informatics, Electronics & Vision (ICIEV), 23-24 May 2014, Dhaka,
Bangladesh, Published by IEEE, Electronic ISBN:978-1-4799-5180-2, Print ISBN:978-1-4799-5179-6, DOI:
10.1109/ICIEV.2014.6850712, Available: https://ieeexplore.ieee.org/document/6850712.
[10] Farzana Tasnim, Sultana Umme Habiba, Nuren Nafisa and Afsana Ahmed, “Depressive Bangla Text Detection
from Social Media Post Using Different Data Mining Techniques”, Lecture Notes in Electrical Engineering, vol. 834,
pp. 237–247, 3 March 2022, Published by Springer, Print ISBN: 978-981-16-8483-8, Online ISBN: 978-981-16-8484-5,
DOI: 10.1007/978-981-16-8484-5_21, Available: https://link.springer.com/chapter/10.1007/978-981-16-8484-5_21.
[11] Abdhullah-Al-Mamun and Shahin Akhter, “Social media bullying detection using machine learning on Bangla
text”, in Proceedings of the 2018 10th International Conference on Electrical and Computer Engineering (ICECE), 20-22
December 2018, Dhaka, Bangladesh, Published by IEEE, Print on Demand (PoD) ISBN: 978-1-5386-7483-3,
Electronic ISBN: 978-1-5386-7482-6, DOI: 10.1109/icece.2018.8636797, Available:
https://ieeexplore.ieee.org/document/8636797.
[12] Asif Hassan, Mohammad Rashedul Amin, Abul Kalam Al Azad and Nabeel Mohammed, “Sentiment analysis on
bangla and romanized bangla text using deep recurrent models”, in Proceedings of the 2016 International Workshop
on Computational Intelligence (IWCI), 12-13 December 2016, Dhaka, Bangladesh, Published by IEEE, Print on Demand
(PoD) ISBN: 978-1-5090-5770-2, Electronic ISBN: 978-1-5090-5769-6, DOI: 10.1109/iwci.2016.7860338, Available:
https://ieeexplore.ieee.org/document/7860338.
[13] Zewen Li, Fan Liu, Wenjie Yang, Shouheng Peng and Jun Zhou, “A Survey of Convolutional Neural Networks:
Analysis, Applications, and Prospects”, IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 12,
pp. 1–21, 2021, Published by IEEE, Print ISSN: 2162-237X, Electronic ISSN: 2162-2388, DOI:
10.1109/tnnls.2021.3084827, Available: https://ieeexplore.ieee.org/document/9451544.
[14] Sepp Hochreiter and Jürgen Schmidhuber, “Long Short-Term Memory”, Neural Computation, vol. 9, no. 8, pp. 1735–
1780, November 1997, Published by MIT Press, Print ISSN: 0899-7667, Online ISSN: 1530-888X, DOI:
10.1162/neco.1997.9.8.1735, Available: https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-
Memory.
[15] Kazuyuki Hara, Daisuke Saito and Hayaru Shouno, “Analysis of function of rectified linear unit used in deep
learning”, in Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), 12-17 July 2015,
Killarney, Ireland, Published by IEEE, Electronic ISSN: 2161-4407, Print ISSN: 2161-4393, DOI:
10.1109/IJCNN.2015.7280578, Available: https://ieeexplore.ieee.org/abstract/document/7280578.
[16] Xiaoyong Yuan, Zheng Feng, Matthew Norton, Xiaolin Li, "Generalized Batch Normalization: Towards
Accelerating Deep Neural Networks", in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01,
pp. 1682-1689, 17th July 2019, Published by AAAI Press, Print ISSN: 2159-5399, Online ISSN: 2374-3468, DOI:
https://doi.org/10.1609/aaai.v33i01.33011682, Available: https://ojs.aaai.org/index.php/AAAI/article/view/3985.
[17] Afia Zafar, Muhammad Aamir, Nazri Mohd Nawi, Ali Arshad, Saman Riaz et al., “A Comparison of Pooling
Methods for Convolutional Neural Networks”, Applied Sciences, vol. 12, no. 17, 2022, Published by MDPI, ISSN:
2076-3417, DOI: 10.3390/app12178643, Available: https://www.mdpi.com/2076-3417/12/17/8643.
[18] Sungheon Park and Nojun Kwak, "Analysis on the Dropout Effect in Convolutional Neural Networks", Computer
Vision -- ACCV 2016, pp. 189–204, 10th March 2017, Published by Springer, Print ISBN: 978-3-319-54183-9, Online
ISBN: 978-3-319-54184-6, DOI: https://doi.org/10.1007/978-3-319-54184-6_12, Available:
https://link.springer.com/chapter/10.1007/978-3-319-54184-6_12.
[19] Imran Khan Mohd Jais, Amelia Ritahani Ismail and Syed Qamrun Nisa, "Adam Optimization Algorithm for Wide
and Deep Neural Network", Knowledge Engineering and Data Science (KEDS), vol. 2, no. 1, pp. 41–46, 2019, Published
by Universitas Negeri Malang, Print ISSN: 2597-4602, Online ISSN: 2597-4637, DOI: 10.17977/um018v2i12019p41-
46, Available: https://journal2.um.ac.id/index.php/keds/article/view/6775.
[20] Anthony J. Bowers and Xiaodong Zhou, “Receiver Operating Characteristic (ROC) Area Under the Curve (AUC):
A Diagnostic Measure for Evaluating the Accuracy of Predictors of Education Outcomes”, Journal of Education for
Students Placed at Risk (JESPAR), vol. 24, no. 1, pp. 20–46, 2019, Published by Routledge, ISSN: 1082-4669, DOI:
10.1080/10824669.2018.1523734, Available: https://www.tandfonline.com/doi/full/10.1080/10824669.2018.1523734.
www.aetic.theiaer.org