Automatic Identification of Suicide Notes With A Transformer-Based Deep Learning Model
Internet Interventions
journal homepage: www.elsevier.com/locate/invent
ARTICLE INFO
Keywords:
Suicide notes
Social media
Deep learning
Natural language processing
Transformer-based model

ABSTRACT
Suicide is one of the leading causes of death worldwide. At the same time, the widespread use of social media has led to an increase in people posting their suicide notes online. Designing a learning model that can aid the detection of suicide notes online is therefore of great importance. However, current methods cannot capture both local and global semantic features. In this paper, we propose a transformer-based model named TransformerRNN, which can effectively extract contextual and long-term dependency information by using a transformer encoder and a Bi-directional Long Short-Term Memory (BiLSTM) structure. We evaluate our model against baseline approaches on a dataset collected from online sources (including 659 suicide notes, 431 last statements, and 2000 neutral posts). Our proposed TransformerRNN achieves 95.0% precision (P), 94.9% recall (R) and a 94.9% F1-score, and therefore outperforms comparable machine learning and state-of-the-art deep learning models. The proposed model is effective for classifying suicide notes, which, in turn, may help to develop suicide prevention technologies for social media.
1. Introduction

According to the World Health Organization (WHO), nearly 800,000 people die by suicide each year, and a recent study predicts that this number is continuing to rise (Dhingra et al., 2015). Suicide has become one of the leading causes of death (World Health Organization, 2014), which makes it a public health concern worldwide. Recently, social media platforms such as Twitter and Facebook have become increasingly popular, particularly among people aged 16 to 34, who are more active on them (Chaffey, 2016). There has also been a growing trend of young people with potential suicidal ideation leaving suicide notes on social media platforms (Desmet and Hoste, 2013; Ji et al., 2020; Luxton et al., 2012). The automatic identification of suicide notes can therefore play an important role in understanding people's mental health status and help to prevent suicidal behavior.

Previous work on identifying suicide notes used hand-crafted features and feature selection, including sentiment and linguistic features. For example, Jones and Bennell (2007) designed a classification model based on statistical prediction rules such as average sentence length and other structural features. Pestian et al. (2010, 2012) focused on emotion features and latent semantic features to identify suicide notes. In addition, conventional machine learning algorithms such as the Logistic Model Tree (LMT) and Naive Bayes have also been used (Schoene and Dethlefs, 2016). Although these approaches have achieved some success, they rely heavily on feature engineering and costly expert knowledge from professionals such as forensic linguists and psychiatrists.

Deep learning allows models to learn representations automatically from data (LeCun et al., 2015) and has recently brought about a number of breakthroughs in natural language processing (Young et al., 2018), computer vision (Szegedy and Toshev, 2013) and speech recognition (Nassif et al., 2019). Moreover, promising deep learning methods have been introduced in mental health applications (e.g., depression detection (Acharya et al., 2018; Lam et al., 2019)) and have achieved competitive performance. Sentiment analysis is concerned with detecting emotion and sentiment in textual data and is key for many Artificial Intelligence applications (Cambria, 2016). Early work on sentiment analysis mainly focused on selecting linguistic features for machine learning methods (Lin and Luo, 2020), e.g., Support Vector Machines (SVM) and Latent Dirichlet Allocation (LDA), to improve performance. More recently, deep learning approaches have become
* Corresponding author.
E-mail address: [email protected] (T. Zhang).
https://doi.org/10.1016/j.invent.2021.100422
Received 16 February 2021; Received in revised form 2 June 2021; Accepted 22 June 2021
Available online 24 June 2021
2214-7829/© 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
T. Zhang et al. Internet Interventions 25 (2021) 100422
increasingly popular for a variety of sentiment analysis tasks. Several classic neural network architectures have been used to extract subjective information (Zhang et al., 2018), including Convolutional Neural Networks (CNN), LSTM, and LSTM with attention. Cambria et al. (2020) built SenticNet 6, a commonsense knowledge base, using an ensemble of symbolic and sub-symbolic AI tools for sentiment analysis. Basiri et al. (2021) proposed an attention-based CNN-BiLSTM model that takes the temporal information of texts into account. Li et al. (2020) designed a lexicon-integrated two-channel CNN-BiLSTM model to improve performance. In addition, stacked ensemble learning (Akhtar et al., 2020) and multi-task learning (Majumder et al., 2019) have also been used for sentiment analysis.

As in sentiment classification (Tang et al., 2015), deep learning is also a useful technique for identifying suicide notes, e.g., the dilated LSTM with attention (DLSTMAttention) (Schoene et al., 2019). However, these methods cannot capture both local and global semantic features.

In this study, we propose a transformer-based deep learning model named TransformerRNN, which uses a transformer encoder and a BiLSTM to extract contextual information and latent features for identifying suicide notes. We compare TransformerRNN with conventional machine learning methods and deep learning-based models on the same dataset. The results show that our model outperforms these baseline approaches on the suicide note identification task.

2. Dataset

2.1. Dataset collection

Identifying suicide notes is a subtask of text classification within the mental health domain. Besides suicide notes, we added last statements written by prisoners on death row and a number of posts containing no obvious references to suicidal behavior. Our dataset therefore covers suicide notes, last statements, and neutral posts, which makes our experiments a 3-class classification task.

2.1.1. Suicide notes

Some data were collected from existing corpora (Schoene et al., 2019), in which it is known that the note writer died by suicide. Because of the limited size of this dataset, we extended it with data collected from Kaggle.¹ However, we do not know whether a user who posted suicidal thoughts online has died by suicide. We therefore used the Linguistic Inquiry and Word Count software (LIWC 2015) (Pennebaker et al., 2015) to compare the two datasets. LIWC was developed to extract linguistic and psychological information via statistical analysis based on word counts. We then computed Cohen's d effect size (Cohen, 1992) for each feature between the two datasets to quantify how strongly that feature differs between them. Only a small number of features show a medium effect size (Cohen's d greater than 0.5), such as emotion words, informal language, and second-person pronouns, whereas all other linguistic features are similar. We therefore merged the two datasets into a new dataset of 659 samples.

fitness, r/parenting, r/teaching, r/relationships, etc.)² where the posts did not contain obvious suicidal content. This corpus contains 2000 samples in total.

The data were collected from the public domain, and we did not discriminate by gender or any other distinguishing factor. To protect the authors' identities and preserve their privacy, we also removed personal information. Moreover, all data were checked manually to ensure label accuracy. Fig. 1 shows some examples from our dataset.

2.2. Dataset analysis

To better understand the linguistic cues and language usage of people who leave suicide notes behind, we analyzed our dataset in terms of words, topics and other linguistic features.

Table 1 shows a quantitative comparison of our three corpora in terms of the number of notes and posts, the average number of words per note and the average number of words per sentence. Suicide notes are longer on average than the texts in the other two corpora. Gregory (1999) has shown that this could be due to people conveying their feelings as fully as possible before they die by suicide. At the same time, last statements have the lowest average number of words per sentence, which could be because people break their communication down into shorter units in stressful situations (Osgood and Walker, 1959), such as being a prison inmate on death row.

In addition, term clouds were used to visually compare the usage of high-frequency terms across the different texts. As shown in Fig. 2(a), suicide notes frequently use terms such as "mental health" and "life" and mention people (wife, William, friend, etc.), indicating that the writers have suicidal tendencies. Fig. 2(b) shows that last statement writers express repentance with terms such as "god", "jesus christ" and "death row". For example, one writer stated, "In the name of Jesus, I am sorry for the pain I caused you all." For neutral Reddit posts, the dominant terms mainly concern everyday life, such as "student", "credit card", "story" and "guy".

To show the different linguistic and psychological features of our datasets, we used LIWC to analyze each type of note and post.
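The effect-size check used above to decide whether the two suicide-note sources could be merged, and again in the pairwise comparison of the three corpora, can be sketched in a few lines. This is a minimal illustration under stated assumptions: it uses the common pooled-standard-deviation form of Cohen's d, treats a feature as a list of per-document LIWC scores, compares absolute values (ignoring the direction of the difference), and runs on made-up numbers rather than the paper's data. The function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n1, n2 = len(a), len(b)
    # statistics.variance is the sample variance (n - 1 denominator)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def effect_size_label(d):
    """Conventional bins cited in the text: 0.2 small, 0.5 medium, 0.8 large."""
    d = abs(d)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"


def is_distinctive(scores_by_corpus, threshold=0.5, min_pairs=2):
    """True if at least `min_pairs` of the pairwise |d| values exceed `threshold`."""
    hits = sum(
        abs(cohens_d(scores_by_corpus[x], scores_by_corpus[y])) > threshold
        for x, y in combinations(sorted(scores_by_corpus), 2)
    )
    return hits >= min_pairs


# Made-up per-document scores for a single LIWC feature (not the paper's data)
scores = {
    "suicide_notes": [4.0, 5.0, 6.0],
    "last_statements": [1.0, 2.0, 3.0],
    "neutral_posts": [4.5, 5.5, 6.5],
}
d = cohens_d(scores["suicide_notes"], scores["last_statements"])
print(d, effect_size_label(d))   # 3.0 large
print(is_distinctive(scores))    # True: two of the three pairs exceed 0.5
```

Here the suicide-note and neutral-post scores differ by exactly d = 0.5, which does not pass the strict "greater than 0.5" test, so only two of the three pairs count.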
¹ https://www.kaggle.com/mohanedmashaly/suicide-notes/
² https://www.reddit.com
Table 1
Quantitative comparison of corpora.

Corpora                          Suicide notes   Last statements   Neutral posts
No. of notes                     659             431               2000
Avg. no. of words per note       143.30          110.97            130.90
Avg. no. of words per sentence   15.09           10.53             16.25

We also calculated Cohen's d effect sizes (Cohen, 1992) between each pair of corpora to find linguistic features that differ substantially between them, keeping a feature if at least two of its three pairwise Cohen's d values are greater than 0.5 (by convention, 0.2 indicates a small effect, 0.5 a medium effect, and 0.8 a large effect). As shown in Table 2, the listed items cover dimension analysis, function and content words, affect analysis, social processes, and personal concerns.

(i) Clout and tone are lowest for suicide notes and highest overall for last statements. Clout refers to a person's social confidence or status as expressed in text (Pennebaker et al., 2014). These results therefore suggest that people who wrote suicide notes have a lower socio-economic status (Cohan et al., 2018). Tone refers to emotional tone, where higher scores indicate greater emotional positivity (Cohn et al., 2004). The tone results are consistent with the affect analysis in Table 2, which shows that suicide notes express negative emotions (e.g., sadness, anxiety) and last statements often use resignation words (Schoene and Dethlefs, 2016).
(ii) The usage of function words and content words reflects how people communicate and what they say (Tausczik and Pennebaker, 2010). It has been observed that suicide notes and last statements use more personal pronouns because their authors tend to focus on themselves (Just et al., 2017). We also compared the average number of adjectives and adverbs. These two parts of speech occur more often in suicide notes, suggesting that their writers tend to use more amplifying language (Baker and Baker, 2003), whereas last statements contain fewer adjectives and adverbs, possibly because prisoners have limited time to express their feelings (Hemming et al., 2020).
(iii) Social processes reflect the social relationships of the writers. Writers of suicide notes tend to write less about social issues and family, whereas the opposite holds for last statements. The reason might be related to infrequent interpersonal relationships (Kelly and Foley, 2018).
(iv) Personal concerns highlight the common topics covered in the notes. Unsurprisingly, most neutral posts use words related to work, and death is a common topic in both suicide notes and last statements. Moreover, words related to religion are referenced most in suicide notes, which is confirmed by previous studies (Foley and Kelly, 2018; Just et al., 2017).
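For most categories, LIWC scores a text as the percentage of its words that fall into each dictionary category, which is what comparisons such as personal pronouns or religion words rest on (summary variables such as Clout and Tone are computed differently and are not shown). The official LIWC2015 dictionaries are proprietary, so the sketch below uses a tiny hypothetical dictionary written in LIWC's style, where a trailing `*` matches any word starting with that stem; the regular-expression tokenizer is also a simplifying assumption.

```python
import re

# Hypothetical mini-dictionary in LIWC's style (NOT the proprietary LIWC2015
# lexicon): a trailing "*" matches any word that starts with the stem.
CATEGORIES = {
    "i": ["i", "me", "my", "mine", "myself"],
    "sad": ["sad*", "cry*", "grief", "hopeless*"],
    "religion": ["god*", "jesus", "pray*", "heaven*"],
}

TOKEN = re.compile(r"[a-z']+")  # simplistic tokenizer (an assumption)


def matches(word, entry):
    """Stem match for entries ending in '*', exact match otherwise."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, categories=CATEGORIES):
    """Percentage of tokens in `text` that fall into each category."""
    words = TOKEN.findall(text.lower())
    if not words:
        return {name: 0.0 for name in categories}
    return {
        name: 100.0 * sum(any(matches(w, e) for e in entries) for w in words) / len(words)
        for name, entries in categories.items()
    }


note = "I am sorry for the pain I caused. God bless my family."
print(category_percentages(note))
# i: 25.0 (3 of 12 tokens), sad: 0.0, religion: ~8.33 (1 of 12 tokens)
```

Per-document percentages like these are the kind of input the effect-size comparison operates on.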
Fig. 2. Term cloud visualization of our dataset. The term clouds were generated using the Termine system (Frantzi et al., 2000).
http://www.nactem.ac.uk/software/termine/
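Termine ranks candidate multi-word terms with the C-value measure of Frantzi et al. (2000), which rewards longer, frequent terms and discounts terms that mostly occur nested inside longer candidates: C-value(a) = log2|a| · f(a) if a is not nested in any other candidate, and log2|a| · (f(a) − (1/|T_a|) Σ_{b∈T_a} f(b)) otherwise, where |a| is the term length in words, f is the frequency and T_a is the set of longer candidates containing a. The sketch below scores pre-extracted candidates only; Termine's linguistic filtering (part-of-speech patterns, stop-lists) is omitted, and the frequencies are invented for illustration.

```python
import math


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(freqs):
    """C-value for candidate terms given as {tuple_of_words: corpus_frequency}."""
    scores = {}
    for term, freq in freqs.items():
        # T_a: longer candidates in which this term is nested
        nesting = [other for other in freqs if len(other) > len(term) and contains(other, term)]
        if nesting:
            freq = freq - sum(freqs[other] for other in nesting) / len(nesting)
        # single-word terms get log2(1) = 0, so C-value targets multi-word terms
        scores[term] = math.log2(len(term)) * freq
    return scores


# Invented candidate frequencies, for illustration only
freqs = {
    ("death", "row"): 10,
    ("death", "row", "inmate"): 4,
    ("mental", "health"): 6,
}
# "death row" drops from 10 to 6.0 because 4 occurrences sit inside "death row inmate"
for term, score in sorted(c_value(freqs).items(), key=lambda kv: -kv[1]):
    print(" ".join(term), round(score, 2))
```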
Fig. 3. The overall architecture of TransformerRNN. The model contains five components: input embeddings, transformer encoder, BiLSTM, max-pooling layer and classification layer. The symbol ⊕ denotes vector concatenation. The internal architecture of the transformer encoder is shown in the light green block. More details about our model are provided in the main text.
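The caption does not spell out which vectors ⊕ joins; a common reading for a BiLSTM, assumed here, is that the forward and backward hidden states at each time step are concatenated, after which the max-pooling layer takes the element-wise maximum over time to turn a variable-length sequence into one fixed-length vector for the classification layer. The plain-Python sketch below shows only these two steps on toy numbers; a real implementation would use a deep learning framework, and padding masks are ignored.

```python
def concat_directions(forward, backward):
    """h_t = forward_t ⊕ backward_t: join the two hidden-state vectors at each step."""
    assert len(forward) == len(backward), "both directions must cover the same steps"
    return [f + b for f, b in zip(forward, backward)]  # list + list = concatenation


def max_pool_over_time(states):
    """Element-wise maximum over the time axis -> one fixed-length vector."""
    return [max(column) for column in zip(*states)]


# Toy hidden states: 3 time steps, 2 units per direction (made-up numbers)
forward = [[0.1, 0.5], [0.4, -0.2], [0.3, 0.9]]
backward = [[0.7, -0.1], [0.2, 0.6], [-0.5, 0.0]]

states = concat_directions(forward, backward)  # 3 steps x 4 features
pooled = max_pool_over_time(states)
print(pooled)  # [0.4, 0.9, 0.7, 0.6]
```

Whatever the note length, the pooled vector has twice the hidden size, which is what lets a single linear layer classify it.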
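where $Q$, $K$, $V$ and the output are all matrices when a set of queries is computed simultaneously, and $d_k$ is the dimension of the queries and keys. Meanwhile, in order to allow the model to jointly attend to information from different representation subspaces at different positions, multi-head

$$P_L = \mathrm{softmax}(W_3 H + b_3)$$

We train the model to minimize the cross-entropy error:

$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{c} y_i^{j}\log P_{L,i}^{j}$$

where $N$ is the number of training samples, $c$ is the number of classes, $y_i^{j}$ is 1 if sample $i$ belongs to class $j$ and 0 otherwise, and $P_{L,i}^{j}$ is the predicted probability of class $j$ for sample $i$.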
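The description of Q, K, V and d_k matches the scaled dot-product attention of Vaswani et al. (2017), Attention(Q, K, V) = softmax(QK^T / √d_k) V. The sketch below implements that formula for a single head (no learned projections, no masking) together with the classification layer P_L = softmax(W_3 H + b_3) and a batch-averaged cross-entropy loss. It is an illustrative, dependency-free rendering under these simplifying assumptions, not the authors' implementation; the helper names are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Product of matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def scaled_dot_product_attention(q, k, v):
    """softmax(Q K^T / sqrt(d_k)) V for a single head, without masking."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def classify(h, w3, b3):
    """P_L = softmax(W_3 h + b_3) for one pooled feature vector h."""
    logits = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(w3, b3)]
    return softmax(logits)


def cross_entropy(prob_rows, gold):
    """Mean negative log-probability of the gold class over a batch."""
    return -sum(math.log(p[y]) for p, y in zip(prob_rows, gold)) / len(gold)


q = [[1.0, 0.0]]
k = [[1.0, 1.0], [1.0, 1.0]]  # identical keys -> uniform attention weights
v = [[2.0, 0.0], [0.0, 4.0]]
out = scaled_dot_product_attention(q, k, v)
print(out)  # [[1.0, 2.0]], the mean of the value rows

# Toy 3-class head (suicide notes, last statements, neutral posts);
# all-zero weights give a uniform P_L
probs = classify([0.3, -1.2], [[0.0, 0.0]] * 3, [0.0, 0.0, 0.0])
print(cross_entropy([probs], [0]))  # about 1.0986 (= ln 3)
```

A uniform prediction over c classes costs ln c per sample, which is a handy sanity check on the loss at initialization.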
Table 3
The performance evaluation of different models on the test set.

                 Suicide notes       Last statements     Neutral posts       Avg.
Method           P(%)  R(%)  F1(%)   P(%)  R(%)  F1(%)   P(%)  R(%)  F1(%)   P(%)  R(%)  F1(%)
J48              67.5  64.4  65.9    64.5  73.1  68.5    86.1  84.7  85.4    79.3  79.1  79.2
Naive Bayes      66.7  69.0  67.8    53.3  83.6  65.1    95.4  82.3  88.4    83.7  80.0  81.8
Bayes Net        88.1  67.8  76.6    66.3  94.0  77.8    93.5  91.0  92.2    88.4  87.0  87.7
LMT              82.6  65.5  73.1    100.0 65.7  79.3    87.7  99.7  93.3    88.5  88.1  88.3
CNN              90.0  72.4  80.2    93.9  91.0  92.4    91.9  97.7  94.7    91.8  91.9  91.7
BiLSTM           42.9  83.9  56.8    40.0  3.0   5.6     93.6  87.0  90.2    75.9  74.0  74.9
BiLSTMAttention  87.2  78.2  82.4    96.9  92.5  94.7    94.2  98.0  96.1    93.3  93.4  93.3
DLSTMAttention   85.5  81.6  83.5    96.9  92.5  94.7    94.8  97.0  95.9    93.3  93.4  93.3
TransformerRNN   87.5  88.5  88.0    94.0  94.0  94.0    97.4  97.0  97.2    95.0  94.9  94.9
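The per-class P, R and F1 values in Table 3 follow from a confusion matrix such as those in Fig. 4. The averaging scheme behind the Avg. columns is not stated here; the reported values appear closer to support-weighted averages than to plain means (for TransformerRNN, the unweighted mean of the per-class precisions 87.5, 94.0 and 97.4 is about 93.0, versus 95.0 reported). The sketch below therefore computes both; the confusion matrix is a made-up example, not one of the matrices in Fig. 4.

```python
def per_class_metrics(cm):
    """cm[i][j] counts samples of gold class i predicted as class j.

    Returns (precision, recall, f1, support) for each class.
    """
    n = len(cm)
    results = []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # column sum
        support = sum(cm[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        results.append((p, r, f1, support))
    return results


def averages(results):
    """Macro (unweighted) and support-weighted means of P, R and F1."""
    total = sum(res[3] for res in results)
    macro = tuple(sum(res[i] for res in results) / len(results) for i in range(3))
    weighted = tuple(sum(res[i] * res[3] for res in results) / total for i in range(3))
    return macro, weighted


# Made-up confusion matrix; rows/columns ordered SN, LS, NP as in Fig. 4
cm = [
    [8, 2, 0],
    [1, 4, 0],
    [1, 0, 9],
]
macro, weighted = averages(per_class_metrics(cm))
print([round(x, 3) for x in macro])     # [0.822, 0.833, 0.825]
print([round(x, 3) for x in weighted])  # [0.853, 0.84, 0.844]
```

Note that support-weighted recall always equals overall accuracy (here 21 correct out of 25), and that with an imbalanced test set dominated by neutral posts, weighted averages sit closer to the neutral-post scores than macro averages do.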
Fig. 4. Confusion matrices for different models. SN stands for suicide notes, LS for last statements and NP for neutral posts.
information from both syntactic and semantic aspects. Although the hybrid structure may increase model complexity and training time, once the model is well trained it can be used to classify notes automatically.

There are also several potential limitations worth mentioning. First, the volume and sources of data are essential for training a stable and robust supervised model. In our dataset, the number of suicide notes collected (659 samples) is still insufficient. Moreover, the Kaggle data consist of texts posted by users with suicidal thoughts. Although these notes are similar to suicide notes in terms of linguistic features according to the LIWC analysis, and can also help us understand people's mental state, it is uncertain whether these users died by suicide. Thus, future studies should collect more precise data from different social media platforms and groups of people. Additionally, semi-supervised and unsupervised approaches could be applied to suicide note identification. Second, although deep learning-based models, unlike conventional machine learning-based models, capture semantic information automatically and achieve remarkable performance, they are not directly interpretable. This often makes them unsuitable for clinical decision-making and needs to be taken into account when using such models. Despite these limitations, we believe that the application of deep learning to suicide note identification has great development prospects.

6. Conclusions

We presented TransformerRNN, a transformer-based deep learning model for suicide note identification. Our experiments demonstrated that our model outperforms conventional machine learning models and deep learning approaches on different datasets. The method proposed in this paper can be used as a means of identifying suicide risk on social media.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was funded by the National Centre for Text Mining and MRC grant MR/R022461/1.

References

Acharya, U.R., Oh, S.L., Hagiwara, Y., et al., 2018. Automated EEG-based screening of depression using deep convolutional neural network. Comput. Methods Prog. Biomed. 161, 103–113.
Akhtar, M.S., Ekbal, A., Cambria, E., 2020. How intense are you? Predicting intensities of emotions and sentiments using stacked ensemble [application notes]. IEEE Comput. Intell. Mag. 15 (1), 64–75.
Ba, J.L., Kiros, J.R., Hinton, G.E., 2016. Layer Normalization. arXiv preprint arXiv:1607.06450.
Baker, M.C., Baker, M.C., 2003. Lexical Categories: Verbs, Nouns and Adjectives. Cambridge University Press.
Basiri, M.E., Nemati, S., Abdar, M., et al., 2021. ABCDM: an attention-based bidirectional CNN-RNN deep model for sentiment analysis. Futur. Gener. Comput. Syst. 115, 279–294.
Bengio, Y., Ducharme, R., Vincent, P., et al., 2003. A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155.
Cambria, E., 2016. Affective computing and sentiment analysis. IEEE Intell. Syst. 31 (2), 102–107.
Cambria, E., Li, Y., Xing, F.Z., et al., 2020. SenticNet 6: ensemble application of symbolic and subsymbolic AI for sentiment analysis. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 105–114.
Chaffey, D., 2016. Global Social Media Statistics Summary. smartinsights.com.
Chen, T., Xu, R., He, Y., et al., 2017. Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN. Expert Syst. Appl. 72, 221–230.
Cohan, A., Desmet, B., Yates, A., et al., 2018. SMHD: A Large-scale Resource for Exploring Online Language Usage for Multiple Mental Health Conditions. arXiv preprint arXiv:1806.05258.
Cohen, J., 1992. A power primer. Psychol. Bull. 112 (1), 155.
Cohn, M.A., Mehl, M.R., Pennebaker, J.W., 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychol. Sci. 15 (10), 687–693.
Desmet, B., Hoste, V.R., 2013. Emotion detection in suicide notes. Expert Syst. Appl. 40 (16), 6351–6358.
Dhingra, K., Boduszek, D., O'Connor, R.C., 2015. Differentiating suicide attempters from suicide ideators using the integrated motivational–volitional model of suicidal behaviour. J. Affect. Disord. 186, 211–218.
Foley, S.R., Kelly, B.D., 2018. Forgiveness, spirituality and love: thematic analysis of last statements from Death Row, Texas (2002–17). QJM 111 (6), 399–403.
Frantzi, K., Ananiadou, S., Mima, H., 2000. Automatic recognition of multi-word terms: the C-value/NC-value method. Int. J. Digit. Libr. 3 (2), 115–130.
Gehring, J., Auli, M., Grangier, D., et al., 2017. Convolutional sequence to sequence learning. In: International Conference on Machine Learning. PMLR, pp. 1243–1252.
Gregory, A., 1999. The decision to die: the psychology of the suicide note. Interviewing Deception, 127–156.
Hall, M., Frank, E., Holmes, G., et al., 2009. The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl. 11 (1), 10–18.
He, K., Zhang, X., Ren, S., et al., 2016. Deep Residual Learning for Image Recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Hemming, L., Pratt, D., Shaw, J., et al., 2020. Prison staff's views and understanding of the role of emotions in prisoner suicide and violence. J. Forensic Psychiatry Psychol. 31 (6), 868–888.
Ji, S., Pan, S., Li, X., et al., 2021. Suicidal ideation detection: a review of machine learning methods and applications. IEEE Trans. Comput. Soc. Syst. 8 (1), 214–226.
Jones, N.J., Bennell, C., 2007. The development and validation of statistical prediction rules for discriminating between genuine and simulated suicide notes. Arch. Suicide Res. 11 (2), 219–233.
Just, M.A., Pan, L., Cherkassky, V.L., et al., 2017. Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth. Nat. Hum. Behav. 1 (12), 911–919.
Kelly, B.D., Foley, S.R., 2018. Analysis of last statements prior to execution: methods, themes and future directions. QJM 111 (1), 3–6.
Kim, Y., 2014. Convolutional neural networks for sentence classification. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, vol. 2014, pp. 1746–1751.
Lam, G., Dongyan, H., Lin, W., 2019. Context-aware deep learning for multi-modal depression detection. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 3946–3950.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444.
Li, Y., Yuan, Y., 2017. Convergence Analysis of Two-layer Neural Networks With ReLU Activation. arXiv preprint arXiv:1705.09886.
Li, W., Zhu, L., Shi, Y., et al., 2020. User reviews: sentiment analysis using lexicon integrated two-channel CNN–LSTM family models. Appl. Soft Comput. 94, 106435.
Lin, P., Luo, X., 2020. A Survey of Sentiment Analysis Based on Machine Learning. In: CCF International Conference on Natural Language Processing and Chinese Computing. Springer, Cham, pp. 372–387.
Luxton, D.D., June, J.D., Fairall, J.M., 2012. Social media and suicide: a public health perspective. Am. J. Public Health 102 (S2), S195–S200.
Majumder, N., Poria, S., Peng, H., et al., 2019. Sentiment and sarcasm classification with multitask learning. IEEE Intell. Syst. 34 (3), 38–43.
Nassif, A.B., Shahin, I., Attili, I., et al., 2019. Speech recognition using deep neural networks: a systematic review. IEEE Access 7, 19143–19165.
Osgood, C.E., Walker, E.G., 1959. Motivation and language behavior: a content analysis of suicide notes. J. Abnorm. Soc. Psychol. 59 (1), 58.
Pennebaker, J.W., Chung, C.K., Frazee, J., et al., 2014. When small words foretell academic success: the case of college admissions essays. PLoS One 9 (12), e115844.
Pennebaker, J.W., Boyd, R.L., Jordan, K., et al., 2015. The Development and Psychometric Properties of LIWC2015.
Pennington, J., Socher, R., Manning, C.D., 2014. GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543.
Pestian, J., Nasrallah, H., Matykiewicz, P., et al., 2010. Suicide note classification using natural language processing: a content analysis. Biomedical Informatics Insights 3 (BII.S4706).
Pestian, J.P., Matykiewicz, P., Linn-Gust, M., 2012. What's in a note: construction of a suicide note corpus. Biomedical Informatics Insights 5 (BII.S10213).
Schoene, A.M., Dethlefs, N., 2016. Automatic identification of suicide notes from linguistic and sentiment features. In: Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 128–133.
Schoene, A.M., Dethlefs, N., 2018. Unsupervised Suicide Note Classification. In: Workshop on Issues of Sentiment Discovery and Opinion Mining at Knowledge Discovery and Data Mining (KDD), pp. 1–9.
Schoene, A.M., Lacey, G., Turner, A.P., et al., 2019. Dilated LSTM With Attention for Classification of Suicide Notes. In: Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), pp. 136–145.
Springenberg, J.T., Dosovitskiy, A., Brox, T., et al., 2014. Striving for Simplicity: The All Convolutional Net. arXiv preprint arXiv:1412.6806.
Szegedy, C., Toshev, A., Erhan, D., 2013. Deep neural networks for object detection. Adv. Neural Inf. Process. Syst. 26.
Tang, D., Qin, B., Liu, T., 2015. Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1422–1432.
Tausczik, Y.R., Pennebaker, J.W., 2010. The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psychol. 29 (1), 24–54.
Vaswani, A., Shazeer, N., Parmar, N., et al., 2017. Attention is All You Need. arXiv preprint arXiv:1706.03762.
World Health Organization, 2014. Preventing Suicide: A Global Imperative. World Health Organization.
Young, T., Hazarika, D., Poria, S., et al., 2018. Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 13 (3), 55–75.
Zhang, L., Wang, S., Liu, B., 2018. Deep learning for sentiment analysis: a survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 8 (4), e1253.