Enhancing Depression Detection Employing Autoencoders and Linguistic Feature Analysis With BERT and LSTM Model
Authorized licensed use limited to: Bangladesh University of Professionals. Downloaded on December 10,2023 at 17:00:43 UTC from IEEE Xplore. Restrictions apply.
2023 International Russian Automation Conference (RusAutoCon)
combination with the gender feature, which is a powerful combination for depression detection. Considering this gap, our proposed method involves extracting text features from a pretrained BERT model and employing autoencoders for feature fusion to encode BERT features with absolutist-word usage and gender features. The use of autoencoders helps to minimize the feature space. Later, we leverage these encoded features as embeddings at the input layer of our model, which uses Long Short-Term Memory (LSTM) to attain superior accuracy.

III. METHODOLOGY

This section describes the proposed methodology. We intended to use gender features, so we experimented with the model both with and without gender as a feature input. This work aims to address the following research questions and present novel contributions:

• Examining the comparative effectiveness of linguistic features versus text-based features in depression detection, evaluated by Mean Absolute Error (MAE) and Root Mean Square Error (RMSE).

• Investigating the value of incorporating linguistic features, such as absolutist word count, into a depression prediction pipeline.

• Analyzing the effectiveness of gender-based features in comparison to linguistic features based on absolutist words, estimated by MAE/RMSE.

• Identifying the most effective combination of features for capturing differences in depression levels.

A. Data Used

The DAIC_WOZ dataset comprises audio and video recordings of clinical interviews with individuals diagnosed with depression. It contains self-reported assessments of depression severity, distinct attributes, and verbatim interview transcripts. Its purpose is to facilitate the development and evaluation of AI-powered systems targeting the detection of depression in clinical contexts.

Figure 1 illustrates the proposed architecture of this work.

Fig. 1. Proposed Methodology.

B. Feature Augmentation

a) Pre-processing Text

Preparing the text transcript for analysis requires applying natural language processing (NLP) practices in an ordered series of steps. This pre-processing stage is essential for refining the text data, reducing noise, and enhancing its overall quality, enabling more accurate and meaningful analysis.

i) Tokenization

Lexical analysis involves segmenting a text into smaller units called tokens, which may be expressions, words, or sentences [25].

ii) Removing stop words

To minimize noise in text data, commonly used words such as "and," "the," and "is" that lack significant meaning are usually eliminated [26].

iii) Stemming and lemmatization

Two common practices, stemming and lemmatization, are employed to transform words into their base forms [27]. Stemming removes word affixes, while lemmatization relies on a wordlist to reduce each word to its root form.

iv) Morphosyntactic tagging

This process determines the syntactic organization of sentences by assigning the suitable part of speech to each word, as discussed in the book "Speech and Language Processing" [26].

The proportion of absolutist terms in forum groups discussing anxiety, depression, and suicidal ideation is notably higher than that observed in healthy groups [11]. Accordingly, we generate a list of absolutist words and create a frequency feature by iterating through the transcripts. We employ both Pearson's and Spearman's correlation coefficients to assess the relationship between the identified features and the PHQ-8 score. This correlation analysis was conducted on the training and development partitions of the AVEC 2019 dataset.

Figure 2 represents the dataset spread of absolutist-word usage, gender, and depressed subjects.

b) BERT-based feature extraction

For binary categorization of depression in the DAIC_WOZ corpus, BERT [17] was utilized to extract the 767 embeddings deemed crucial. Initially introduced by Devlin et al. in 2018 [19], BERT effectively captures important information from both preceding and succeeding context at each layer, enabling a comprehensive understanding of the input text. BERT offers a versatile framework that can be adjusted for an extensive range of natural language processing (NLP) tasks with minimal modifications to its architecture or hyperparameter tuning.
Bidirectional Encoder Representations from Transformers (BERT) is a model tailored to pre-train bidirectional representations from raw text without the need for annotations.
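The pre-processing and absolutist-word counting described above can be sketched in Python. The stop-word and absolutist word lists below are small illustrative samples, not the lists used in this work (the full absolutist lexicon is given by Al-Mosaiwi and Johnstone [11]); stemming, lemmatization, and morphosyntactic tagging would typically be delegated to an NLP library such as NLTK [25].

```python
import re

# Illustrative samples only; the actual stop-word and absolutist
# word lists used in the paper are larger.
STOP_WORDS = {"and", "the", "is", "a", "i", "to", "of"}
ABSOLUTIST_WORDS = {"always", "never", "completely", "totally",
                    "nothing", "everything"}

def tokenize(text):
    """i) Tokenization: split a transcript into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def remove_stop_words(tokens):
    """ii) Stop-word removal: drop high-frequency, low-content words."""
    return [t for t in tokens if t not in STOP_WORDS]

def absolutist_frequency(tokens):
    """Frequency feature: proportion of absolutist terms among tokens."""
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in ABSOLUTIST_WORDS)
    return hits / len(tokens)

transcript = "I always feel that nothing is ever going to change"
tokens = remove_stop_words(tokenize(transcript))
print(round(absolutist_frequency(tokens), 3))  # 2 absolutist hits among 7 tokens
```

The resulting scalar can then be concatenated with the BERT embedding and the gender feature before fusion.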
301
Authorized licensed use limited to: Bangladesh University of Professionals. Downloaded on December 10,2023 at 17:00:43 UTC from IEEE Xplore. Restrictions apply.
2023 International Russian Automation Conference (RusAutoCon)
We evaluate our model using the metrics adopted from the AVEC 2017-2019 Depression Sub-challenges, namely the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) [6], [16]. Various statistics can be employed to assess the performance of regression models, and RMSE and MAE are the most commonly used for this purpose.
We report both because the AVEC 2014 Depression Sub-challenge utilizes both measures, and because there is no agreement on the most appropriate metric for assessing model errors [27].
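The two metrics can be computed as follows; this is a minimal sketch, and the PHQ-8 scores used here are made-up illustrations, not results from this work.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Square Error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Made-up true and predicted PHQ-8 scores, for illustration only.
y_true = [4, 10, 15, 7, 20]
y_pred = [5, 9, 17, 6, 18]
print(mae(y_true, y_pred), round(rmse(y_true, y_pred), 3))  # 1.4 1.483
```

Because RMSE squares each error before averaging, RMSE ≥ MAE always holds, and the gap between the two grows when a few predictions are badly off.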
Table 1 presents the results of our model: the accuracy values during training and testing are 89.5% and 89.6%, respectively. In Table 2 we compare the proposed model against prior experiments to place our findings in the context of previous studies. The results of the other models mentioned are sourced from their respective original studies.
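The autoencoder-based feature fusion evaluated in these tables can be sketched as follows: the concatenated feature vector (BERT embedding plus absolutist-word frequency and gender) is passed through a bottleneck, and the encoder output serves as the compressed, fused embedding. This is a minimal NumPy sketch with made-up dimensions and synthetic data; the actual encoder architecture and training setup of this work are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up dimensions: a BERT-sized vector plus two scalar features
# (absolutist frequency, gender), compressed to a 64-dim code.
n_samples, in_dim, code_dim = 100, 770, 64
X = rng.normal(size=(n_samples, in_dim))  # stand-in for concatenated features

# One-hidden-layer autoencoder: encoder W1, linear decoder W2.
W1 = rng.normal(scale=0.1, size=(in_dim, code_dim))
W2 = rng.normal(scale=0.1, size=(code_dim, in_dim))
lr = 0.01

def forward(X):
    code = np.tanh(X @ W1)   # encoder: compressed representation
    recon = code @ W2        # decoder: reconstruction of the input
    return code, recon

losses = []
for _ in range(300):
    code, recon = forward(X)
    err = recon - X
    losses.append(float(np.mean(err ** 2)))
    # Gradient descent on the mean squared reconstruction error.
    dW2 = code.T @ err * (2 / err.size)
    dcode = err @ W2.T * (1 - code ** 2)   # backprop through tanh
    dW1 = X.T @ dcode * (2 / err.size)
    W1 -= lr * dW1
    W2 -= lr * dW2

fused, _ = forward(X)
print(fused.shape, losses[0] > losses[-1])
```

Because the code layer is much narrower than the input, the encoder is forced to keep only the directions of the feature space that matter for reconstruction, which is what lets it both fuse heterogeneous features and shrink the feature space.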
Fig. 2. Visualizing dataset for use of absolutist words, gender and depressive subjects.

TABLE I. MODEL EVALUATION
et al. [15] and Lin et al. [28]. The accuracy values during training and testing are 89.5% and 89.6%, respectively, which suggests the predicted depressed labels are in accord with the true values.

Comparing our LSTM model with the results from Williamson et al. [2], Jan et al. [3], Zhang et al. [4], Qureshi et al. [14], Al Hanai et al. [15], and Lin et al. [28], it is apparent that our model achieved significantly lower MAE and RMSE values. This indicates that our LSTM model exhibits better accuracy and precision in predicting depression severity. The R-squared value of 0.76 indicates that approximately 76% of the variation in the dependent variable can be explained by the independent variables incorporated in the model.

Moreover, when considering different feature combinations in the context of exploiting BERT features, our autoencoder-centered fusion approach showed consistent improvements. When fusing BERT features with absolutist word count, gender, or both, the MAE and RMSE were significantly reduced, and the R2 score improved, compared to the other fusion methods. This suggests that incorporating additional features, especially those related to the usage of absolutist words and gender, can augment the predictive effectiveness of the model.

The utilization of autoencoders [21] for feature fusion has demonstrated its effectiveness in capturing vital features while mitigating noise. According to the findings displayed in Table III, our proposed LSTM model surpassed numerous other models in terms of Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R2 score.

VI. CONCLUSION AND FUTURE WORKS

Given the growing prevalence of depression as a medical condition, this study focused on predicting depression severity through the analysis of textual data. By employing an array of advanced approaches, including the incorporation of gender-related details, evaluation of absolutist language frequency, utilization of BERT embeddings, and integration of features through autoencoder-based fusion, we successfully improved the precision of predicting the true labels of depressive subjects.

Our results emphasize the significance of feature engineering in text data and the relevance of incorporating gender disparities in the detection of depression. By integrating these factors into our predictive model, we achieved outstanding results, surpassing baseline models and reaching a prediction accuracy of 89.5%.

These findings showcase the capability of text-based methodologies in recognizing and forecasting the severity of depression. The integration of advanced techniques and meticulous evaluation of pertinent features can substantially boost the accuracy and effectiveness of predictive models in mental health. Our study contributes to the growing body of research on harnessing textual data for mental health assessment. The insights derived from this study can shape the development of more accurate and reliable tools for early detection and intervention in cases of depression.

Subsequent investigations can explore the incorporation of supplementary features and data sources, along with the application of our approach to broader and more heterogeneous datasets. By continuing to advance the field of text-based depression prediction, we can ultimately contribute to improved mental health support and care.

To evaluate the strength and applicability of this approach, it is advisable to extend its implementation to more extensive datasets. Furthermore, future research could explore the potential of applying it in medical settings, offering the possibility of enhancing the early identification and treatment of depression.

ACKNOWLEDGMENT

We would like to extend our gratitude to the DAIC_WOZ dataset license, which has greatly facilitated our research endeavours. Additionally, we would like to express our appreciation to the library of Tomsk State University for providing us with access to a diverse range of valuable resources.

REFERENCES

[1] World Health Organization. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/depression.
[2] J. R. Williamson, E. Godoy, M. Cha, A. Schwarzentruber, P. Khorrami, Y. Gwon, H. T. Kung, C. Dagli, and T. F. Quatieri, “Detecting depression using vocal, facial and semantic communication cues,” in Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, vol. 6, pp. 11–18, October 2016.
[3] M. L. Joshi and N. Kanoongo, “Depression detection using emotional artificial intelligence and machine learning: A closer review,” Materials Today: Proceedings, vol. 58, pp. 217–226, January 2022.
[4] D. G. Blazer, “Psychiatry and the oldest old,” American Journal of Psychiatry, vol. 157, no. 12, pp. 1915–1924, December 2000.
[5] S. Chattopadhyay, “A neuro-fuzzy approach for the diagnosis of depression,” Applied Computing and Informatics, vol. 13, no. 1, pp. 10–18, January 2017.
[6] F. Ringeval, B. Schuller, M. Valstar, N. Cummins, R. Cowie, and M. Pantic, “AVEC'19: Audio/visual emotion challenge and workshop,” Proceedings of the 27th ACM International Conference on Multimedia, vol. 27, pp. 2718–2719, October 2019.
[7] P. Wu, R. Wang, H. Lin, F. Zhang, J. Tu, and M. Sun, “Automatic depression recognition by intelligent speech signal processing: A systematic survey,” CAAI Transactions on Intelligence Technology, pp. 1–11, June 2022.
[8] T. Deng, X. Shu, and J. Shu, “A depression tendency detection model fusing weibo content and user behavior,” 5th International Conference on Artificial Intelligence and Big Data (ICAIBD), IEEE, vol. 5, pp. 304–309, May 2022.
[9] M. R. Morales and R. Levitan, “Speech vs. text: A comparative analysis of features for depression detection systems,” IEEE Spoken Language Technology Workshop (SLT), pp. 136–143, December 2016.
[10] E. Victor, Z. M. Aghajan, A. R. Sewart, and R. Christian, “Detecting depression using a framework combining deep multimodal neural networks with a purpose-built automated evaluation,” Psychological Assessment, vol. 31, no. 8, p. 1019, August 2019.
[11] M. Al-Mosaiwi and T. Johnstone, “In an absolute state: Elevated use of absolutist words is a marker specific to anxiety, depression, and suicidal ideation,” Clinical Psychological Science, vol. 6, no. 4, pp. 529–542, July 2018.
[12] A. Trifan, R. Antunes, S. Matos, and J. L. Oliveira, “Understanding depression from psycholinguistic patterns in social media texts,” in Advances in Information Retrieval. ECIR 2020. Lecture Notes in
Computer Science, vol. 12036, J. Jose et al., Eds. Springer, Cham, pp. 402–409, April 2020.
[13] N. Cummins, B. Vlasenko, H. Sagha, and B. Schuller, “Enhancing speech-based depression detection through gender dependent vowel-level formant features,” Artificial Intelligence in Medicine: 16th Conference on Artificial Intelligence in Medicine, AIME, Vienna, Austria, Springer International Publishing, vol. 16, pp. 209–214, June 2017.
[14] S. A. Qureshi, G. Dias, S. Saha, and M. Hasanuzzaman, “Gender-aware estimation of depression severity level in a multimodal setting,” International Joint Conference on Neural Networks (IJCNN), IEEE, pp. 1–8, July 2021.
[15] T. Al Hanai, M. M. Ghassemi, and J. R. Glass, “Detecting depression with audio/text sequence modeling of interviews,” Interspeech, pp. 1716–1720, September 2018.
[16] F. Ringeval, B. Schuller, M. Valstar, N. Cummins, R. Cowie, L. Tavabi, M. Schmitt et al., “AVEC 2019 workshop and challenge: state-of-mind, detecting depression with AI, and cross-cultural affect recognition,” Proceedings of the 9th International Audio/Visual Emotion Challenge and Workshop, vol. 9, pp. 3–12, October 2019.
[17] B. Cui, Y. Li, M. Chen, and Z. Zhang, “Fine-tune BERT with sparse self-attention mechanism,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), vol. 9, pp. 3548–3553, November 2019.
[18] M. M. Rodrigues, T. Warnita, K. Uto, and K. Shinoda, “Multimodal fusion of BERT-CNN and gated CNN representations for depression detection,” Proceedings of the 9th International Audio/Visual Emotion Challenge and Workshop, vol. 9, pp. 55–63, October 2019.
[19] J. Devlin, M. W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” arXiv preprint, arXiv:1810.04805, October 2018.
[20] Z. Chen and W. Li, “Multisensor feature fusion for bearing fault diagnosis using sparse autoencoder and deep belief network,” IEEE Transactions on Instrumentation and Measurement, vol. 66, no. 7, pp. 1693–1702, March 2017.
[21] D. Charte, F. Charte, S. García, M. J. del Jesus, and F. Herrera, “A practical tutorial on autoencoders for nonlinear feature fusion: Taxonomy, models, software and guidelines,” Information Fusion, vol. 44, pp. 78–96, November 2018.
[22] O. Irsoy and E. Alpaydın, “Unsupervised feature extraction with autoencoder trees,” Neurocomputing, vol. 258, pp. 63–73, October 2017.
[23] K. Rama, P. Kumar, and B. Bhasker, “Deep autoencoders for feature learning with embeddings for recommendations: a novel recommender system solution,” Neural Computing and Applications, vol. 33, pp. 14167–14177, November 2021.
[24] C. Zhou, C. Sun, Z. Liu, and F. Lau, “A C-LSTM neural network for text classification,” arXiv preprint, arXiv:1511.08630, November 2015.
[25] W. Wagner, S. Bird, E. Klein, and E. Loper, Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media, Beijing, 2009, pp. 421–424.
[26] V. Keselj, “Book review: Speech and Language Processing by Daniel Jurafsky and James H. Martin,” Computational Linguistics, vol. 35, no. 3, September 2009.
[27] T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature,” Geoscientific Model Development, vol. 7, no. 3, pp. 1247–1250, June 2014.
[28] L. Lin, X. Chen, Y. Shen, and L. Zhang, “Towards automatic depression detection: A BiLSTM/1D CNN-based model,” Applied Sciences, vol. 10, no. 23, pp. 1–20, December 2020.