Less Is More Stress Detection Through Condensed So
Less Is More Stress Detection Through Condensed So
Contents
Zeyad Alghamdi1, Tharindu Kumarage1, Garima Agrawal1, Huan Liu1, and Russell Bernard2
1
School of Computing and Augmented Intelligence, Arizona State University, Tempe, USA
2
Institute for Social Science Research, Arizona State University, Tempe, USA
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Abstract: In the digital age, social media has been a go-to platform for stress-related discussions, yielding valuable data to
advance the understanding and detection of stress. Swift identification of stress indicators in these online conversations is
essential in enabling immediate support and helping to avert subsequent severe mental and physical health issues, especially
during global crises such as pandemics and conflicts. Detecting stress in social media posts automatically poses a formidable
challenge. While techniques such as supervised Pretrained Language Models (PLMs) and zero-shot Large Language Models
(LLMs) based classifiers have demonstrated significant performance, they exhibit limitations, especially on platforms like
Reddit. For example, on Reddit, users tend to write lengthy, expressive posts, which causes these methods to often fail to
consider the entire context, leading to incomplete or inaccurate assessments of a user's mental health or stress status. To
overcome these limitations, we present a new approach to identifying and classifying stress-related discourse on social
media. Our approach involves analyzing condensed versions of user posts, such as user-provided summaries or the "Too
Long Didn’t Read" (TLDR) portion of the original post. We question whether these abridged texts can yield a more accurate
classification of stress. In this paper, we make the following contributions. First, we investigate the relationship between the
performance of the model's perceived textual context and the length of social media posts. Second, we present a novel
approach to use the summarized texts for stress detection. We experiment with different classifiers to evaluate their
performance on stress detection accuracy using summarized versus full-length posts. Furthermore, by examining the
emotional and linguistic features of the original posts and their summaries, we suggest improvements to current state-of-
the-art LLM-based stress classifier prompts, thereby enhancing stress detection capabilities. Finally, when user summaries
are absent, we synthetically generate meaningful user post summaries by incorporating the power of LLMs. Our results show
that the stress detection performance deteriorates for longer posts, and utilizing the TLDR and summaries improves
classification outcomes. We also provide augmented datasets containing human and AI-generated summaries for future
research in stress detection on social media.
Keywords: Mental health, Stress detection, Social media, Large language models (LLMs), Linguistic features analysis, Text
summarization
1. Introduction
Recent global crises have escalated stress levels, profoundly impacting mental health worldwide. Data from the
American Psychological Association (APA, 2023) reveals a significant increase in chronic illnesses and mental
health diagnoses since the COVID-19 pandemic, with adults aged 18 to 34 reporting the highest rates. In this
digital era, social media platforms have emerged as crucial forums for mental health discourse and support,
offering both anonymity and empathy (Sowles, S.J. et al., 2018; Sher, L., 2020). More importantly, these mental
health discourses on social media provide a wealth of textual data that can be leveraged for early stress
detection. This, in turn, could play a pivotal role in mitigating and addressing severe mental health challenges.
In the field of stress detection on social media, supervised Pretrained Language Models (PLMs) have
demonstrated state-of-the-art performance across various platforms, including Twitter and Reddit (Lin, H. et
al.,2017; Nijhawan, T. et al., 2022). With the emergence of large language models (LLMs), recent research has
presented compelling cases for LLMs serving as superior zero-shot stress classifiers capable of performing
effectively across multiple social media platforms(Xu, X., et al., 2023; Lamichhane, B., 2023), alleviating the need
for additional fine-tuning, as typically required by PLM-based stress classifiers. However, both fine-tuned PLM
and zero-shot LLM-based stress classifiers encounter challenges when confronted with lengthy content. Yang,
K., et al. (2023) have noted that models like ChatGPT face difficulties in effectively addressing long contextual
posts. Moreover, as shown by Ji, S., et al. (2023), transformer-based models like BERT exhibit inherent limitations
in processing long text content, constraining their effective range to a mere 512 tokens. This limitation is
significant in the context of social media platforms like Reddit, where posts often exceed this length.
13
Proceedings of the 11th European Conference on Social Media , ECSM 2024
Zeyad Alghamdi et al.
Our research addresses this problem by proposing an innovative approach distinct from direct long-document
analysis. We hypothesize that integrating summaries could effectively mitigate the identified limitation for two
primary reasons: 1) Conciseness: Summaries distill the essence of longer posts while preserving crucial
information; 2) Availability: On platforms like Reddit, the prevalent use of "Too-Long-Didn't-Read" (TLDR)
sections in lengthy posts presents a readily accessible summary, aiding stress detection. Consequently, our study
enhances the established Dreaddit stress dataset (Turcan, E., and Kathleen M. 2019) by incorporating summaries
derived from these TLDRs (in our paper, we use 'TLDR' and 'summary' interchangeably). Subsequently, we assess
the effectiveness of summaries as an alternative unit of analysis for stress classification in lengthy posts.
Furthermore, through a detailed analysis of the psychological, linguistic, and emotional features between these
user-generated summaries and the complete posts, we propose modifications to current LLM-based
classification prompts to boost stress detection efficacy further.
Additionally, our study investigates the potential of LLMs to produce summaries analogous to human-written
TLDRs. The primary aim is to assess whether AI-generated TLDRs could serve as a viable substitute for stress
classification in lengthy posts lacking human-authored TLDRs. To this end, we further supplement our dataset
with TLDRs generated by ChatGPT. We then benchmark the stress detection performance on AI-generated TLDRs
against those written by users. Our comparative analysis reveals that both human and AI-generated summaries
significantly improve classification accuracy and F1 scores, particularly in the context of longer posts. In
summary, our key contributions are as follows:
• We propose a novel approach for classifying stress using user-provided summary texts as an
alternative to analyzing full-length posts.
• We augment the existing LLM-based stress classification prompts by conducting a psychological,
linguistic, and emotional feature analysis on user-written posts against their user-written summaries.
• We study the effectiveness of LLM-written summaries for stress classification on longer posts that
lack human-authored summaries.
• We release an augmented dataset comprising the original extended post texts, the user-provided
summaries, summaries generated by LLMs, and associated features, making it available for other
researchers in the field. The code and data are available in our GitHub repository: (
https://github.com/Zeyad-o/TLDR-AISummarizeStress/ )
2. Related Work
2.1 Evolution of Stress Detection Techniques in Social Media
The field of stress detection on social media has evolved substantially over time. Beginning with traditional text
analysis methods, such as rule-based systems (Thelwall, M., 2017) and Latent Dirichlet Allocation (LDA) (Khan,
A., Ali, R. (2020); Nijhawan, T. et al. (2022)). The scope of stress detection expanded when a new dimension of
integrating multimodal data, including images and social network information was proposed by Lin, H., et al.
(2014, 2017). Furthermore, Turcan, E., et al. (2021) and Alghamdi Z., et al. (2023), have delved into the role of
emotions in stress detection from the poster’s perspective. More recent studies have explored emotional
disparity in social media comments as a novel approach for stress detection (Alghamdi, Z. et al., (2023)). The
advent of LLMs marked a new era, Lamichhane, B. (2023) demonstrated ChatGPT's effectiveness in mental
health classification tasks, achieving notable F1 scores. This study highlights ChatGPT's potential for mental
health classification roles that are typically reserved for domain-specific models. In parallel, Xu, X., et al. (2023)
evaluated multiple LLMs, including GPT-3.5, across various mental health tasks, showing promising results with
zero-shot and few-shot prompting despite limitations related to the different LLMs used context window sizes
compared to GPT-3.5. Furthermore, Yang, K., et al. (2023) advance this research by focusing on interpretable
mental health analysis using LLMs. They address low interpretability of traditional methods by exploring
different prompting strategies and the generation of explanations that are close to human performance. Most
importantly, they highlight that ChatGPT shows strong in-context learning abilities, but it still falls short of
advanced task-specific methods, indicating a need for careful prompt engineering.
2.2 Summarization in Mental Health
The application of summarization in mental health is increasingly being recognized as crucial. It has gained
significant recognition for its ability to aid healthcare professionals. Manas, G., et al. (2021) demonstrate the
importance of creating semantically relevant summaries from clinical diagnostic interviews. Gao, Y., et al. (2022)
investigated how summaries of medical problems can help healthcare stakeholders accurately grasp patient
conditions, easing their workload and reducing cognitive biases. Furthering the application of technology in this
14
Proceedings of the 11th European Conference on Social Media , ECSM 2024
Zeyad Alghamdi et al.
domain, Li, Hao, et al. (2023) used LLMs to generate concise lists summarizing patients' problems, showcasing
the value of LLMs in enhancing the efficiency of patient care. Kim, T., et al. (2023) applied LLMs to summarize
psychiatric patients' experiences for clinician dashboards, enhancing patient monitoring. Syed, S., et al. (2023)
used multiple LLMs, including GPT3.5 and GPT4, for summarizing extensive social media discussions, focusing
on comments. This approach aids in navigating and analyzing complex social media content, demonstrating the
practicality of LLMs in mental health and social media contexts. The growing interest in LLMs for text
summarization is highlighted by studies from Zhang, Tianyi, et al. (2023), Pu, Xiao, et al. (2023), and Laban, P., et
al. (2023). These studies focus on comparing human-generated and LLM-generated summaries, revealing the
complexities and potential for further research in this area. While these studies underscore the growing validity
of LLM-generated summaries in various contexts, our research focuses on the potential of human-generated
summaries, specifically TLDRs, in the realm of stress detection on social media.
3. Methodology
In this section, we first discuss the process used to extract and prepare the data, followed by how the different
text sizes were handled. Then, we present our approach for the analysis and classification of the features. For
brevity, only the critical aspects of the method are presented here. The overall design and methodology are
shown in Figure 1.
Figure 1: Methodology Pipeline - illustrates the flow of our approach with a sample from our dataset
3.1 Dataset
3.1.1 Dataset extraction
For our analysis, we utilized the Dreaddit dataset, a comprehensive and manually annotated collection of social
media posts, specifically from Reddit. This dataset, made publicly available by Turcan, E., and Kathleen M. (2019),
is widely recognized for its application in stress detection research. Spanning from January 1, 2017, to November
19, 2018, Dreaddit offers a diverse range of lengthy posts, encompassing various subreddits related to mental
health issues, including abuse, anxiety, financial stress, PTSD, and social challenges.
Dreaddit's distinctive quality lies in its focus on everyday stress experiences, as opposed to strictly clinical
scenarios, and its inclusion of extensive Reddit posts, which provide a rich insight into the multifaceted nature
of stress expression in social media contexts. We utilized the PRAW API (Reddit, 2023) to extract these posts
based on their unique IDs. Although the original Dreaddit dataset comprises 2,750 posts with individual
classifications, our extraction yielded 1,984 posts, with the reduction mainly due to deletions by users or
removals by moderators.
3.1.2 TLDRs extraction
TLDRs are commonly employed by users to encapsulate the essence of their posts. We employed regular
expressions to methodically extract TLDRs and user-written summaries, ensuring they were explicit and
identified by key phrases such as 'long story short,' 'basically,' 'short story,' among others, or as integral
components of the post narratives. Our extraction process prioritized original content, deliberately excluding
any edits influenced by subsequent user comments to maintain the authenticity of the posts' sentiment and
intended message. This approach was crucial as our objective was to pinpoint early indicators of stress, and
incorporating edits that might reflect positive feedback could potentially distort the original context and skew
the results. From the initial dataset comprising 1,984 posts, we successfully narrowed it down to a subset of 527
15
Proceedings of the 11th European Conference on Social Media , ECSM 2024
Zeyad Alghamdi et al.
samples that included both the original post text and user-written summaries. It is important to note, however,
that not all TLDRs strictly conform to the traditional definition of summaries. Some users tend to list key points
or pose questions, aiming to attract readers interested in the core message, who might then choose to engage
with the full post or respond based on these highlighted elements.
3.1.3 Augmented dataset
In this work, we developed an augmented dataset that contains the cleaned original social media posts along
with both user-generated and AI-generated summaries. This enriched dataset is designed not only for feature
extraction and emotional analysis but also to augment the reproducibility of our research. By providing these
diverse data elements in a consolidated form, we aim to facilitate future studies in the field of mental health
and stress detection.
3.2 Features Extraction
In our study, we used the Linguistic Inquiry and Word Count (LIWC) software (LIWC-22; Boyd, R.L., et al., 2022),
a text analysis program. LIWC, known for quantifying psychological and linguistic attributes in text, has been
validated for stress detection in previous studies, including Turcan, E. and Kathleen M,.(2019). We employed
LIWC's latest version, which extracts up to 118 features from unsegmented posts to maintain contextual
integrity. It categorizes words into emotional, cognitive, and structural components, aiding in a comprehensive
analysis of psychological constructs in the text. LIWC’s ability to assess emotional and cognitive states in social
media posts is crucial for detecting early stress indicators in our study.
3.3 Summarization
In this study, we utilized LLMs for summarization with a simple prompt:
"Summarize the following social media post: [POST]".
This approach leverages the model's broad training, enabling it to adapt to various topics efficiently. The use of
a straightforward prompt accentuates its capability in zero-shot learning, producing coherent and contextually
appropriate summaries. This demonstrates the models' potential to rapidly process diverse mental health topics,
and highlight their utility.
3.4 LLM Zero-Shot Classification
The design of our prompts was an integral part of our experiment. Drawing from previous research, the prompts
were specifically crafted to align with our task’s unique characteristics. This included incorporating expressions
of current negative stress as identified in the work of Turcan and McKeown (2019) and Turcan, E., et al. (2021).
Based on our dataset text types, we have the following: either post or (post TLDR or summary). Therefore, we
use the following zero-shot prompt:
"Is the following [Text Type] indicative of current negative stress or not? Just answer in Yes or No. Don't
provide explanations. [POST]".
4. Experimental Setup
Here, we describe the experimental settings used to validate our approach, including the bucketization of the
dataset, LLM usage, and stress detection baselines, to support reproducibility.
4.1 Bucketization
To investigate the relationship between stress classification performance and the textual data length, we divided
the dataset into three segments into buckets or Quartiles 1, 2, and 3 (Q1, Q2, and Q3) based on the ascending
order of token counts in the posts. Each quartile contains approximately one-third of the total dataset. The
tokenization of each post was done using the OpenAI tokenizer (Turbo 3.5). Our analysis revealed the following
average token counts per quartile:
• Q1 average is 178 tokens (minimum 80, maximum 268).
• Q2 average is 387 tokens (minimum 269, maximum 554).
• Q3 average is 982 tokens (minimum 555, maximum 3,833).
The overall dataset average is 515 tokens. Figure 2 shows the frequency distribution of token count in each
quartile.
16
Proceedings of the 11th European Conference on Social Media , ECSM 2024
Zeyad Alghamdi et al.
17
Proceedings of the 11th European Conference on Social Media , ECSM 2024
Zeyad Alghamdi et al.
18
Proceedings of the 11th European Conference on Social Media , ECSM 2024
Zeyad Alghamdi et al.
Figure 5: Venn Diagram of the Top 10 Features in Posts vs. User-Written Summaries (TLDRs)
Moreover, analyzing the intersection of features from posts and user-written summaries proves vital. Shared
features like 'general Tone', 'Negative Emotions and Tone’, ‘ lack of Positive Tone', and the first person singular
pronouns such as ('I') in both posts and summaries are particularly telling. These elements, indicative of negative
emotions and a self-centric narrative, are potent stress markers. Furthermore, each text type offers unique
insights. Posts often feature elements like 'We', 'Affiliation,' 'Clout,' 'Conflict,' and 'Risk,' which delve into
personal dynamics, interpersonal tensions, and perceived threats. On the other hand, the TLDRs, known for their
brevity and focus, highlight specific aspects such as 'Female,' 'Work,' and 'Swear,' 'Affect,' and 'Social
References'. These features shed light on gender reference, workplace stress, and emotional intensity more
directly and succinctly.
5.2.3 Enhanced LLM-Classification
We have leveraged the insights gained from our feature analysis to refine the prompt used in existing LLM-based
stress classifiers. Specifically, we have emphasized capturing first-person perspectives and emotional intensity,
key elements highlighted by the dominant features shown in Figure 5. Consequently, we have revised the
classification prompt, which we denote as the 'enhanced prompt':
"Given the following social media text ( can be either post or post summary or post TLDR ), looking from
the poster's perspective, only classify if it is indicative of current very severe negative stress as ‘Yes’
otherwise ‘No’. Just answer in ‘Yes’ or ‘No’. Don't provide explanations. Text:[Text Type]"
Figure 6 shows the enhanced prompt's results, demonstrating an improvement in post-classification accuracy
from 51.9% to 57.6% and an increase in performance on human summaries from 58.2% to 60.0%, compared to
the generic prompt results in Figure 4.
5.3 Analysis of AI-Generated Summaries
While the above findings are promising, they underscore a practical challenge: not all posts include user-
generated summaries. This challenge compels us to explore the feasibility of employing AI-generated summaries
as a potential alternative. Here, we try to answer the RQ2: "Can we use AI-generated summaries instead of
human-generated summaries?". Figure 6 shows the performance comparison of full posts, human summaries,
and AI-generated summaries using enhanced LLM classification.
Figure 6: Performance Comparison of Subset Text Types with Zero Shot Enhanced Prompt
These results revealed that while human-written summaries outperformed full posts by 2.4% in classification
accuracy, the AI-generated summaries further improved on this by an additional 2%.
19
Proceedings of the 11th European Conference on Social Media , ECSM 2024
Zeyad Alghamdi et al.
Regarding the F1 score, human summaries and posts had similar performance, but AI-generated summaries
showed a slightly better F1 score.
To evaluate the resource efficiency of summaries in stress classification, we analyzed token counts within the
subset, including human summaries. The average counts were 715 tokens for original posts, 44 for human
summaries (6.2% of the total), and 65 for AI-generated summaries (9.1%), as shown in Figure 7.
Figure 7: Token Percentage and Performance Comparison for Different Text Types
This analysis yielded two significant insights. First, human-generated summaries, with only 6.2% of the total
token count, improved classification accuracy over full posts by 2.4%. More importantly, AI-generated
summaries, constituting just 9.1% of the total tokens, outperformed both full posts and human summaries. The
marginal increase in token count (2.9%) correlated with a notable 2% accuracy gain compared to human
summaries. Thus, AI-generated summaries emerge not only as a valid approach for stress classification but also
demonstrate that a slight increase in length can significantly enhance performance.
5.4 Benchmarking Summarization Impact on Classification
To address RQ3: 'How does the summarization approach compare against baseline methods?', we broaden our
analysis to encompass the entire dataset. This comparison is critical to understanding the efficacy of our
approach across different data buckets or quartiles and the overall performance, as illustrated in Table 1.
We observed that the PLM-based classifier (BERT) was better for the Q2 and Q3 quartiles, though the F1 scores
are low. In contrast, the enhanced zero-shot LLM classifier gives the best classification accuracy. The
improvement is notable, with a 67.5% accuracy and a 76.4% F1 score.
Table 1: Performance of Posts vs. Generic Summaries in LLMs and PLMs
Q1 Q2 Q3 Overall
Classifier / Metric
ACC% F1% ACC% F1% ACC% F1% ACC% F1%
Generic Prompt 73.8 81.8 62.3 74.4 51.6 66.5 62.6 74.3
Post
Enhanced Prompt 78.3 84.0 69.2 77.5 54.9 67.9 67.5 76.4
AI
Generated Generic Prompt 75.3 82.5 65.0 75.6 54.0 67.9 64.8 75.2
Summary
Enhanced Prompt 78.5 83.4 71.5 78.1 62.4 70.3 70.8 77.4
20
Proceedings of the 11th European Conference on Social Media , ECSM 2024
Zeyad Alghamdi et al.
Shifting our focus to AI-generated summaries, the application of the zero-shot prompt for the AI-generated
summaries we observe the following: i) notable improvement of 2.2% in accuracy and a 0.9% rise in F1 score,
and ii) summaries are at least on par with, if not superior to full post classifications. Moreover, the PLM-based
baseline on these summaries improved classification in Q1 and Q2, and overall (an increase of 1.7% in accuracy
and 1.9% in F1 score) compared to the PLM full post-performance, albeit with a slight dip in Q3 performance.
Our analysis concludes that applying the enhanced prompt to AI-generated summaries significantly enhances
classification performance. It outperforms the generic zero-shot prompts (8.2% higher accuracy, 3.1% better F1
score) and PLM-based classifications (3.9% more accurate, 13.8% higher F1 score).
Acknowledgement
This work was supported by the Office of Naval Research under Award No. N00014-21-1-4002. We also
appreciate the valuable guidance and support provided by Wildflower Primary Care & Wellness Center.
Interpretations, conclusions, and recommendations within this article are solely those of the authors.
References
Agrawal, G., et al. (2023) "Can Knowledge Graphs Reduce Hallucinations in LLMs?: A Survey", arXiv preprint
arXiv:2311.07914.
Alghamdi, Z., et al. (2023). "Code RED: Reactive Emotion Difference for Stress Detection on Social Media". No. 10659.
EasyChair.
Alghamdi, Z., et al. (2023) "Studying the Influence of Toxicity and Emotion Features for Stress Detection on Social Media",
ECSM 2023 10th European Conference on Social Media, Academic Conferences, and Publishing Limited.
21
Proceedings of the 11th European Conference on Social Media , ECSM 2024
Zeyad Alghamdi et al.
22
Proceedings of the 11th European Conference on Social Media , ECSM 2024