
Internet Interventions 25 (2021) 100422

Contents lists available at ScienceDirect

Internet Interventions
journal homepage: www.elsevier.com/locate/invent

Automatic identification of suicide notes with a transformer-based deep learning model

Tianlin Zhang a,*, Annika M. Schoene a, Sophia Ananiadou a,b

a Department of Computer Science, The University of Manchester, National Centre for Text Mining, Manchester, UK
b The Alan Turing Institute, London, UK

ARTICLE INFO

Keywords: Suicide notes; Social media; Deep learning; Natural language processing; Transformer-based model

ABSTRACT

Suicide is one of the leading causes of death worldwide. At the same time, the widespread use of social media has led to an increase in people posting their suicide notes online. Therefore, designing a learning model that can aid the detection of suicide notes online is of great importance. However, current methods cannot capture both local and global semantic features. In this paper, we propose a transformer-based model named TransformerRNN, which can effectively extract contextual and long-term dependency information by using a transformer encoder and a Bi-directional Long Short-Term Memory (BiLSTM) structure. We evaluate our model against baseline approaches on a dataset collected from online sources (659 suicide notes, 431 last statements, and 2000 neutral posts). Our proposed TransformerRNN achieves 95.0%, 94.9% and 94.9% in precision, recall and F1-score respectively, and therefore outperforms comparable machine learning and state-of-the-art deep learning models. The proposed model is effective for classifying suicide notes, which may in turn help to develop suicide prevention technologies for social media.

* Corresponding author. E-mail address: [email protected] (T. Zhang).

https://doi.org/10.1016/j.invent.2021.100422
Received 16 February 2021; Received in revised form 2 June 2021; Accepted 22 June 2021; Available online 24 June 2021.
2214-7829/© 2021 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

1. Introduction

According to the World Health Organization (WHO), nearly 800,000 people die by suicide each year, and a recent study predicts that this number is continually rising (Dhingra et al., 2015). Furthermore, suicide has become one of the leading causes of death (World Health Organization, 2014), which makes it a public health concern worldwide. Recently, social media platforms like Twitter and Facebook have become increasingly popular, particularly among people between 16 and 34 years old (Chaffey, 2016). There has also been a growing trend of young people with potential suicidal ideation leaving their suicide notes on social media platforms (Desmet and Hoste, 2013; Ji et al., 2021; Luxton et al., 2012). Therefore, the automatic identification of suicide notes can play an important role in understanding people's mental health status and help to prevent suicidal behavior.

Previous work on identifying suicide notes used hand-crafted features and feature selection, including sentiment and linguistic features. For example, Jones et al. (Jones and Bennell, 2007) designed a classification model based on statistical prediction rules such as average sentence length and other structural features. Pestian et al. (Pestian et al., 2010; Pestian et al., 2012) focused on emotion features and latent semantic features to identify suicide notes. In addition, conventional machine learning algorithms such as the Logistic Model Tree (LMT) and Naive Bayes have also been used (Schoene and Dethlefs, 2016). Although these approaches have achieved some success, they rely heavily on feature engineering and on costly expert knowledge from professionals such as forensic linguists and psychiatrists.

Deep learning allows models to learn representations from data automatically (LeCun et al., 2015) and has recently brought about a number of breakthroughs in natural language processing (Young et al., 2018), computer vision (Szegedy and Toshev, 2013) and speech recognition (Nassif et al., 2019). Moreover, promising deep learning-based methods have been introduced in mental health applications (e.g., depression detection (Acharya et al., 2018; Lam et al., 2019)) and have achieved competitive performance. Sentiment analysis is concerned with detecting emotion and sentiment in textual data and is key for many Artificial Intelligence applications (Cambria, 2016). Early work on sentiment analysis mainly focused on linguistic feature selection using machine learning methods (Lin and Luo, 2020) (e.g., Support Vector Machine (SVM), Latent Dirichlet Allocation (LDA)) to improve performance.


More recently, deep learning approaches have become increasingly popular for a variety of sentiment analysis tasks. Classic neural network architectures (Zhang et al., 2018), including Convolutional Neural Networks (CNN), LSTM and LSTM with attention, are used to extract subjective information. Cambria et al. (Cambria et al., 2020) built SenticNet 6, a commonsense knowledge base, by using an ensemble of symbolic and sub-symbolic AI tools for sentiment analysis. Basiri et al. (Basiri et al., 2021) proposed an attention-based CNN-BiLSTM learning model to consider temporal information in texts. Li et al. (Li et al., 2020) designed a lexicon-integrated two-channel CNN-BiLSTM model to improve performance. In addition, stacked ensemble learning (Akhtar et al., 2020) and multi-task learning (Majumder et al., 2019) have also been used for sentiment analysis.

Similar to sentiment classification (Tang et al., 2015), deep learning is also a useful technique for identifying suicide notes, e.g., a dilated LSTM with attention (DLSTMAttention) (Schoene et al., 2019). However, these methods cannot capture both local and global semantic features.

In this study, we propose a transformer-based deep learning model named TransformerRNN, which extracts contextual information and latent features to identify suicide notes by using a transformer encoder and a BiLSTM. We evaluate the TransformerRNN against conventional machine learning methods and deep learning-based models on the same dataset. The results show that our model outperforms baseline approaches on the suicide note identification task.

2. Dataset

2.1. Dataset collection

Identifying suicide notes is a subtask of text classification within the mental health domain. Besides suicide notes, we added last statements written by prisoners and a number of posts containing no obvious references to suicidal behavior. Therefore, in our experiments, the dataset covers suicide notes, last statements, and neutral posts, which makes this a 3-class classification task.

2.1.1. Suicide notes

Some data was collected from existing corpora (Schoene et al., 2019), where it is known that the note writer has died by suicide. Due to the limited dataset size, we further extended our dataset with data collected from Kaggle.1

However, we do not know whether a user who posted suicidal thoughts online has died by suicide. We used the Linguistic Inquiry and Word Count software (LIWC 2015) (Pennebaker et al., 2015) to compare the differences between the two datasets. LIWC (Pennebaker et al., 2015) has been developed to extract linguistic and psychological information via statistical analysis based on word counts. We then used Cohen's d effect size (Cohen, 1992) for each feature to assess the statistical significance of the differences between the two datasets. We find that only a small number of features have a medium effect size (Cohen's d greater than 0.5), such as the emotions of a person, the usage of informal language and the second person pronoun, whereas all other linguistic features are similar. Therefore, we merged the two datasets from the different sources, creating a new dataset of 659 samples.

2.1.2. Last statements

This data has been made available by the Texas Department of Criminal Justice (Schoene et al., 2019) and contains 431 records written prior to death by prisoners who received the death penalty in Texas between 1982 and 2017.

2.1.3. Neutral posts

The neutral posts dataset was collected from ten subreddits (e.g., r/fitness, r/parenting, r/teaching, r/relationships, etc.)2 where the posts did not contain obvious suicidal content. There is a total of 2000 samples in this corpus.

The data was collected from the public domain and we did not discriminate between gender or any other distinguishing factors. To protect the authors' identity and preserve their privacy, we also removed personal information. Moreover, all data were checked manually to ensure the accuracy of the labels. Fig. 1 shows some examples of our dataset.

Fig. 1. Examples of our dataset.

1 https://www.kaggle.com/mohanedmashaly/suicide-notes/
2 https://www.reddit.com
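To make the three-class setup concrete, the following is a minimal sketch of how such a corpus could be assembled; the file names, label encoding and helper function are hypothetical illustrations, not the authors' actual pipeline.

```python
# Hypothetical assembly of the 3-class corpus; file and label names are
# illustrative only, not the authors' actual pipeline.
import pandas as pd

LABELS = {"suicide_note": 0, "last_statement": 1, "neutral_post": 2}

def load_corpus(path: str, label: str) -> pd.DataFrame:
    # Assumes one note/post per line in a plain-text file.
    with open(path, encoding="utf-8") as f:
        texts = [line.strip() for line in f if line.strip()]
    return pd.DataFrame({"text": texts, "label": LABELS[label]})

dataset = pd.concat([
    load_corpus("suicide_notes.txt", "suicide_note"),      # 659 samples
    load_corpus("last_statements.txt", "last_statement"),  # 431 samples
    load_corpus("neutral_posts.txt", "neutral_post"),      # 2000 samples
], ignore_index=True)
```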
2.2. Dataset analysis

To better understand the linguistic clues and language usage of people who leave suicide notes behind, we analyzed our dataset in terms of words, topics and other linguistic features.

Table 1 shows a quantitative comparison of our three corpora in terms of the number of notes and posts, the average number of words in each note and the average number of words in each sentence. It can be seen that the average note length of suicide notes is greater than that of the others. Research by Gregory (1999) has shown that this could be due to people conveying their feelings as much as possible before they commit suicide. At the same time, the average number of words per sentence in last statements is the lowest, which could be because people break their communication down into shorter units during stressful situations (Osgood and Walker, 1959), such as being a prison inmate on death row.

Table 1
Quantitative comparison of corpora.

Corpora                         Suicide notes   Last statements   Neutral posts
No. of notes                    659             431               2000
Av. no. of words in note        143.30          110.97            130.90
Av. no. of words in sentence    15.09           10.53             16.25

In addition, term clouds were used to visually compare the usage of high-frequency terms in the different texts. The suicide notes frequently use terms such as "mental health", mentions of people (wife, William, friend, etc.) and "life", as shown in Fig. 2(a), indicating that the writers have suicidal tendencies. Fig. 2(b) shows that last statement writers express their repentance by using "god", "jesus christ" and "death row". For example, someone wrote, "In the name of Jesus, I am sorry for the pain I caused you all." For neutral Reddit posts, the dominant terms are mainly about everyday life, like "student", "credit card", "story" and "guy".

In order to show the different linguistic and psychological features in our datasets, we used LIWC to analyze each type of note and post.

We also calculated effect sizes using Cohen's d (Cohen, 1992) between pairwise corpora to find linguistic features that are statistically significant (at least two results of Cohen's d greater than 0.5; 0.2 indicates a small effect, 0.5 a medium effect, and 0.8 a large effect). As shown in Table 2, the listed items cover dimension analysis, function and content words, affect analysis, social processes, and personal concerns.

(i) The clout and tone of suicide notes are the lowest, and those of last statements are the highest overall. Clout refers to a person's social confidence or status in text (Pennebaker et al., 2014). Therefore, the results indicate that people who wrote suicide notes have a lower socio-economic status (Cohan et al., 2018). Tone stands for the emotional tone, where higher scores indicate greater emotional positivity (Cohn et al., 2004). The analysis of tone is also corroborated by the affect analysis in Table 2, demonstrating that suicide notes express negative emotions (e.g., sadness, anxiety) and last statements often use resignation words (Schoene and Dethlefs, 2016).

(ii) The usage of function words and content words reflects how people communicate and what they say (Tausczik and Pennebaker, 2010). It has been observed that suicide notes and last statements use more personal pronouns because their authors prefer to focus on themselves (Just et al., 2017). We also compared the average number of adjectives and adverbs. A higher amount of these two parts of speech is observed in suicide notes, which suggests that people tend to use more amplifying language (Baker, 2003), whereas the number of adjectives and adverbs in last statements is lower because prisoners have limited time to express their feelings (Hemming et al., 2020).

(iii) Social processes stand for the social relationships of the writers. We observe that in suicide notes writers tend to write less about social issues and family, while we observe the opposite in last statements. The reason might be related to the low frequency of interpersonal relationships (Kelly and Foley, 2018).

(iv) Personal concerns highlight the common topics covered in the notes. Unsurprisingly, most neutral posts refer to words related to work, and the topic of death is commonly referenced in suicide notes and last statements. Moreover, words related to religion are most referenced in last statements, which is confirmed by previous studies (Foley and Kelly, 2018; Just et al., 2017).

Table 2
Linguistic statistical information extracted by LIWC.

Corpora                      Suicide notes   Last statements   Neutral posts

Dimension analysis
Clout                        26.42           67.78             45.65
Tone                         33.48           75.46             42.32

Function and content words
Personal pronouns            15.29           19.75             11.33
Adjectives                   4.42            2.54              4.04
Adverbs                      6.45            3.09              5.47

Affect analysis
Positive emotion             3.74            8.61              2.63
Negative emotion             4.06            2.55              1.97

Social processes
Social                       8.61            17.56             9.96
Family                       0.75            2.11              0.85

Personal concerns
Work                         1.01            0.40              2.98
Religion                     0.29            2.64              0.46
Death                        1.28            0.68              0.15

Fig. 2. Term cloud visualization of our dataset. The term clouds were generated using the TerMine system (Frantzi et al., 2000), http://www.nactem.ac.uk/software/termine/.
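For reference, the effect-size screening used here and in Section 2.1.1 reduces to a short computation; a minimal numpy sketch, assuming the two arrays hold the per-document values of one LIWC feature in two corpora:

```python
import numpy as np

def cohens_d(x: np.ndarray, y: np.ndarray) -> float:
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / (nx + ny - 2)
    return (x.mean() - y.mean()) / np.sqrt(pooled_var)

# Conventional thresholds: |d| >= 0.2 small, >= 0.5 medium, >= 0.8 large effect.
```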

3. Method

In this section, we propose a Transformer-based Recurrent Neural Network (TransformerRNN) to identify suicide notes automatically. For this task, the input of the model is a note N, which is a sequence of words $w_1, w_2, \cdots, w_n$. The output of the model is a predicted label L (suicide note, last statement or neutral post). The general architecture of TransformerRNN is shown in Fig. 3 and consists of five components: (1) input embeddings, (2) transformer encoder, (3) BiLSTM, (4) max-pooling layer and (5) a classification layer. In the following subsections, we introduce each component of our model in detail.

3.1. Input embeddings

Word embeddings are distributed representations of words, which are more suitable for natural language processing tasks and are used as input into neural networks (Bengio et al., 2003). In this paper, we use pretrained GloVe (Pennington et al., 2014) word representations as the word embeddings of the inputs. The input sequence is therefore embedded into word vectors $W = \{w_1, w_2, \cdots, w_n\}$, $W \in \mathbb{R}^{n \times d}$, where $n$ is the length of the note and $d$ is the dimension of the word embeddings.
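As an illustration of this embedding step, here is a minimal PyTorch sketch; the 200-dimensional vectors match the hyper-parameters reported in Section 4, but the vocabulary size and the random placeholder matrix are ours (in practice the matrix would be filled from the GloVe vector file):

```python
import torch
import torch.nn as nn

# glove_matrix: a (vocab_size, 200) float tensor, one row per vocabulary word;
# the random tensor below is only a stand-in for the real pretrained vectors.
vocab_size, d = 10000, 200
glove_matrix = torch.randn(vocab_size, d)

embedding = nn.Embedding.from_pretrained(glove_matrix, freeze=False)

token_ids = torch.tensor([[5, 42, 7, 0]])   # one note, n = 4 word indices
W = embedding(token_ids)                    # shape (1, n, d)
```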

3.2. Transformer encoder

Transformer encoders are a type of sequence transduction model that can relate each word of the sequence to every other word, capturing both local semantic and long-term dependency information without any convolutional or recursive structures (Vaswani et al., 2017). In this paper, we use the transformer encoder to model the input text. The transformer encoder architecture contains the following components: a multi-head self-attention layer, a fully connected feed-forward network, layer normalization and positional encodings. The general architecture is shown as the light green block in Fig. 3.

Firstly, the positional encodings are added to the input embeddings to ensure that the model takes advantage of word-order information, both relative and absolute, since there is no convolution or recurrence. In this work, we use sine and cosine functions of different frequencies, as proposed by Gehring et al. (Gehring et al., 2017), to obtain the positional encodings.

The multi-head self-attention layer is the basic module of the transformer encoder. The self-attention mechanism can be described as mapping a Query (Q) and a set of Key-Value (K-V) pairs to an output (Vaswani et al., 2017):

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$
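The scaled dot-product attention above translates directly into code; a minimal PyTorch sketch:

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (..., n, d_k); V: (..., n, d_v)
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # (..., n, n)
    weights = torch.softmax(scores, dim=-1)             # attention distribution
    return weights @ V                                  # (..., n, d_v)
```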

Fig. 3. The overall architecture of TransformerRNN. The model contains five components: input embeddings, transformer encoder, BiLSTM, max-pooling layer and classification layer. The symbol ⊕ denotes vector concatenation. The internal architecture of the transformer encoder is shown in the light green block. More details about our model are provided in the main text.


where $Q$, $K$, $V$ and the output are all matrices when a set of queries is computed simultaneously, and $d_k$ is the dimension of the queries and keys. Meanwhile, in order to allow the model to jointly gain information from different representation sub-spaces at different positions, multi-head self-attention is used:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \cdots, \mathrm{head}_h)W^O$$

where $\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$, with $W_i^Q \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^K \in \mathbb{R}^{d_{model} \times d_k}$, $W_i^V \in \mathbb{R}^{d_{model} \times d_v}$ and $W^O \in \mathbb{R}^{hd_v \times d_{model}}$; $h$ is the number of heads and $d_k = d_v = d_{model}/h$.
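To make the head-splitting concrete, here is a compact sketch of multi-head self-attention with $d_k = d_v = d_{model}/h$, reusing the scaled_dot_product_attention function from the previous sketch; PyTorch's built-in torch.nn.MultiheadAttention packages the same computation, so this class is illustrative rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int, h: int):
        super().__init__()
        assert d_model % h == 0, "d_k = d_v = d_model / h must be an integer"
        self.h, self.d_k = h, d_model // h
        # W^Q, W^K, W^V for all heads fused into single projections, plus W^O.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)  # W^O

    def forward(self, x):                            # x: (batch, n, d_model)
        b, n, _ = x.shape
        def split(t):                                # -> (batch, h, n, d_k)
            return t.view(b, n, self.h, self.d_k).transpose(1, 2)
        q, k, v = split(self.q_proj(x)), split(self.k_proj(x)), split(self.v_proj(x))
        heads = scaled_dot_product_attention(q, k, v)   # (batch, h, n, d_k)
        concat = heads.transpose(1, 2).reshape(b, n, self.h * self.d_k)
        return self.out_proj(concat)                    # Concat(...) W^O
```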
Next, the output of the multi-head self-attention layer is fed into a fully connected feed-forward network, which consists of two linear transformations with a Rectified Linear Unit (ReLU) (Li and Yuan, 2017) activation in between:

$$\mathrm{FFN}(x) = \mathrm{ReLU}(W_1 x + b_1)W_2 + b_2$$

Additionally, the transformer encoder architecture contains residual connections (He et al., 2016) and layer normalization (Ba et al., 2016) to accelerate convergence.
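Combined with the attention layer above, these parts form one encoder block along the following lines. This is a sketch: the inner feed-forward width d_ff and the exact placement of dropout are our assumptions, not reported in the paper, and torch.nn.TransformerEncoderLayer offers an equivalent ready-made block.

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One transformer encoder block: self-attention + FFN, each with
    a residual connection followed by layer normalization (post-norm)."""
    def __init__(self, d_model: int, h: int, d_ff: int = 2048, dropout: float = 0.5):
        super().__init__()
        self.attn = MultiHeadSelfAttention(d_model, h)   # from the sketch above
        self.ffn = nn.Sequential(                        # FFN(x) = ReLU(W1 x + b1) W2 + b2
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        x = self.norm1(x + self.drop(self.attn(x)))      # residual + layer norm
        return self.norm2(x + self.drop(self.ffn(x)))    # residual + layer norm
```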
3.3. BiLSTM layer

As shown in Fig. 3, we concatenate the input embeddings and the hidden outputs of the transformer encoder so that the resulting representation contains both semantic and contextual information. Then, we encode the transformer-based sequence via a BiLSTM (Chen et al., 2017), which can not only capture long-term dependencies but also obtain context-aware information by modeling the sequence from forward and backward hidden states. The BiLSTM contains a forward LSTM $\overrightarrow{\mathrm{LSTM}}$ and a backward LSTM $\overleftarrow{\mathrm{LSTM}}$, which learn sequence information from both directions:

$$\overrightarrow{h}_i = \overrightarrow{\mathrm{LSTM}}\left(\overrightarrow{h}_{i-1}, x_i\right), \qquad \overleftarrow{h}_i = \overleftarrow{\mathrm{LSTM}}\left(\overleftarrow{h}_{i+1}, x_i\right), \qquad h_i = \overrightarrow{h}_i \oplus \overleftarrow{h}_i$$

where $\overrightarrow{h}_i \in \mathbb{R}^h$ and $\overleftarrow{h}_i \in \mathbb{R}^h$ ($h$ is the dimension of the output) are the hidden states of the forward and backward LSTM at position $i$, respectively, $x_i$ is the $i$-th input, and $\oplus$ denotes concatenation. Finally, we obtain the encoded sequence $H = [h_1, h_2, \cdots, h_n]$.
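A minimal sketch of this concatenation and BiLSTM encoding; the hidden size of 128 and the embedding dimension of 200 follow the hyper-parameters in Section 4, while the batch and sequence sizes are placeholders:

```python
import torch
import torch.nn as nn

d_model, hidden = 200, 128
bilstm = nn.LSTM(input_size=2 * d_model, hidden_size=hidden,
                 batch_first=True, bidirectional=True)

W = torch.randn(8, 50, d_model)          # input embeddings (batch, n, d)
enc = torch.randn(8, 50, d_model)        # transformer encoder outputs
x = torch.cat([W, enc], dim=-1)          # concatenated representation (⊕)
H, _ = bilstm(x)                         # H: (batch, n, 2 * hidden)
```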
notes. For example, the CNN-based model achieves relatively
good performance in F1-score. It is also observed that the
3.4. Max-pooling layer and classification layer

After obtaining the output of the BiLSTM, we use it as direct input into the max-pooling layer. With the max-pooling operation, we can capture the most important latent semantic information throughout the note (Springenberg et al., 2014). The last part of TransformerRNN is a classification layer (also called the output layer), which is similar to a traditional fully-connected layer. The predicted probability distribution is calculated using the softmax function:

$$P_L = \mathrm{softmax}(W_3 H + b_3)$$

We train the model to minimize the cross-entropy error:

$$\mathrm{Loss} = -\frac{1}{c}\sum_{i=1}^{c} t_i \log P_L$$

where $c$ is the number of note types and $t_i \in \{0, 1, 2\}$ is the ground-truth label.
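The max-pooling and classification steps then reduce H to one probability distribution per note; a minimal sketch using PyTorch's CrossEntropyLoss, which fuses the softmax with the cross-entropy objective (a standard formulation, used here in place of the paper's notation):

```python
import torch
import torch.nn as nn

num_classes = 3                                # suicide note, last statement, neutral
classifier = nn.Linear(2 * 128, num_classes)   # W3, b3 over the pooled BiLSTM output

H = torch.randn(8, 50, 2 * 128)                # BiLSTM outputs (batch, n, 2h)
pooled = H.max(dim=1).values                   # max-pooling over the n positions
logits = classifier(pooled)                    # (batch, num_classes)

targets = torch.randint(0, num_classes, (8,))  # ground-truth labels t_i in {0, 1, 2}
loss = nn.CrossEntropyLoss()(logits, targets)  # softmax + cross-entropy in one step
```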
4. Results

We use precision (P), recall (R) and F1-score (F1) as complementary evaluation metrics to evaluate the model's classification performance on each class. We also use the weighted average of each metric to show the overall performance. As shown in Table 3, the top, middle and bottom parts are the machine learning-based baselines, the deep learning-based models and our model's results, respectively. The J48 Decision Tree (J48), Naive Bayes, Bayes Net and LMT baselines were developed using the WEKA toolkit (Hall et al., 2009). Additionally, we benchmark our model against CNN (Kim, 2014), BiLSTM (Schoene et al., 2019), BiLSTMAttention (Schoene and Dethlefs, 2018) and DLSTMAttention (Schoene et al., 2019) on the same datasets.

We split the data into training, validation and testing subsets with proportions of 70%, 15% and 15%. We tune all parameters on the validation data, and the best results are reported on the test data. For the tuned hyper-parameters of the TransformerRNN, we set the word embedding size to 200, the initial learning rate to 0.0005, the dropout rate to 0.5, the dimension of the BiLSTM hidden state to 128, the number of attention heads to 4, and the mini-batch size to 64.

The experimental results are summarized in Table 3, where we can observe the following:

(i) Among the traditional machine learning models, the LMT and Bayes Net classifiers achieve relatively good performance, with average F1-scores of 88.3% and 87.7%. However, their F1-scores on suicide notes are not high (73.1% and 76.6%), which shows that conventional machine learning-based methods cannot capture the features of suicide notes effectively.

(ii) When we use deep learning methods, the results illustrate that neural network frameworks perform better at classifying suicide notes. For example, the CNN-based model achieves a relatively good F1-score. It is also observed that BiLSTMAttention and DLSTMAttention outperform the traditional methods via their attention mechanism, which lets them reach an average F1-score of 93.3%, a gain of 18.4% over a vanilla BiLSTM. This shows that neural network-based models with an attention mechanism can make a significant contribution to suicide note classification by utilizing semantic representation.

Table 3
The performance evaluation of different models on the test set.

                   Suicide notes            Last statements          Neutral posts            Avg.
Method             P (%)   R (%)   F1 (%)   P (%)   R (%)   F1 (%)   P (%)   R (%)   F1 (%)   P (%)   R (%)   F1 (%)
J48                67.5    64.4    65.9     64.5    73.1    68.5     86.1    84.7    85.4     79.3    79.1    79.2
Naive Bayes        66.7    69.0    67.8     53.3    83.6    65.1     95.4    82.3    88.4     83.7    80.0    81.8
Bayes Net          88.1    67.8    76.6     66.3    94.0    77.8     93.5    91.0    92.2     88.4    87.0    87.7
LMT                82.6    65.5    73.1     100     65.7    79.3     87.7    99.7    93.3     88.5    88.1    88.3
CNN                90.0    72.4    80.2     93.9    91.0    92.4     91.9    97.7    94.7     91.8    91.9    91.7
BiLSTM             42.9    83.9    56.8     40.0    3.0     5.6      93.6    87.0    90.2     75.9    74.0    74.9
BiLSTMAttention    87.2    78.2    82.4     96.9    92.5    94.7     94.2    98.0    96.1     93.3    93.4    93.3
DLSTMAttention     85.5    81.6    83.5     96.9    92.5    94.7     94.8    97.0    95.9     93.3    93.4    93.3
TransformerRNN     87.5    88.5    88.0     94.0    94.0    94.0     97.4    97.0    97.2     95.0    94.9    94.9

Values in bold are the maximum scores attained.
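The per-class and weighted-average scores reported in Table 3 correspond to what scikit-learn computes as follows (a sketch with placeholder labels):

```python
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 0, 1, 2, 2, 2]    # placeholder gold labels
y_pred = [0, 1, 1, 2, 2, 0]    # placeholder model predictions

# Per-class precision, recall and F1 (classes 0, 1, 2).
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, labels=[0, 1, 2])

# Weighted averages, as used for the "Avg." columns.
p_w, r_w, f1_w, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
```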


(iii) Our proposed transformer-based model achieves the highest scores on suicide notes, on neutral posts and in overall performance. Compared with DLSTMAttention, TransformerRNN drops 0.7% in F1-score and 2.9% in precision on last statements. However, our model has significant advantages in suicide note classification, which is more important for our task. Therefore, the results reveal that our model is useful for identifying suicide notes and outperforms existing state-of-the-art approaches.

(iv) In order to display the classification results intuitively, we looked at the predicted labels in more detail. Fig. 4 shows the normalized confusion matrices for the different models over the test set. We observe that the machine learning models often correctly predict neutral posts but misclassify suicide notes. For the BiLSTM without an attention mechanism, most last statement samples are misclassified as suicide notes.

(v) We also carried out ablation studies by removing components from the proposed TransformerRNN (Table 4). "No max-pooling" removes the max-pooling layer. "No BiLSTM" removes the BiLSTM part. "No concatenated embedding" removes the word embeddings from the concatenated hidden representation of the transformer encoder. These results further prove the effectiveness of each component in our model.

Table 4
The performance evaluation of TransformerRNN and corresponding ablation studies.

Method                      P (%)   R (%)   F1 (%)
TransformerRNN (ours)       95.0    94.9    94.9
No max-pooling              94.1    94.2    94.1
No BiLSTM                   84.9    85.5    85.2
No concatenated embedding   93.3    93.2    93.2

Fig. 4. Confusion matrices for different models. SN stands for suicide notes, LS for last statements, NP for neutral posts.
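Normalized confusion matrices like those in Fig. 4 can be produced along these lines (a sketch with placeholder labels; each row is normalized over the true class):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 2, 2, 2]    # placeholder gold labels (SN=0, LS=1, NP=2)
y_pred = [0, 1, 1, 2, 2, 0]    # placeholder predictions

# normalize="true" makes each row sum to 1 over the predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2], normalize="true")
print(cm)
```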

5. Discussion

The purpose of this research is to design a model for suicide note classification, which could be useful in finding messages indicating potential suicidal behavior on social media platforms. Analyses of our dataset suggest that suicide notes have their own linguistic features. However, modeling with handcrafted identification rules is labor-intensive and costly. As seen in our experiments, our model outperforms all other baseline methods without any feature engineering. By encoding sentences with the transformer encoder architecture, incorporating original word information, and capturing contextual information through the BiLSTM, the TransformerRNN can better exploit the information in the notes from both syntactic and semantic aspects. Although the hybrid structure increases model complexity and the duration of training somewhat, users can apply it to classify notes automatically once the model is well trained.

There are also several potential limitations worth mentioning. First, the volume and sources of data are essential for training a stable and robust supervised learning-based model. In our dataset, the suicide notes collected are still insufficient (659 samples). Meanwhile, the Kaggle data consists of texts posted by users with suicidal thoughts. Although these notes are similar to suicide notes in terms of linguistic features according to the LIWC analysis and can also help us understand people's mental status, it is not certain whether the users died by suicide. Thus, future studies should collect more precise data from different social media platforms and groups of people. Additionally, semi-supervised and unsupervised approaches could be applied to suicide note identification. Second, while deep learning-based models, unlike machine learning-based models, have the advantage of automatically capturing semantic information and achieve remarkable performance, their drawback is that they are not directly interpretable. This is often not suitable for clinical decision-making processes and needs to be taken into account when using such models. Despite these limitations, we believe that the application of deep learning to suicide note identification has great development prospects.

6. Conclusions

We presented TransformerRNN, a transformer-based deep learning model, applied to suicide note identification. Our experiments demonstrated that our model outperforms conventional machine learning models and deep learning approaches on different datasets. The method proposed in this paper can be used as a means of suicidal risk identification from social media.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements

This research was funded by the National Centre for Text Mining and MRC grant MR/R022461/1.
References

Acharya, U.R., Oh, S.L., Hagiwara, Y., et al., 2018. Automated EEG-based screening of depression using deep convolutional neural network. Comput. Methods Prog. Biomed. 161, 103–113.
Akhtar, M.S., Ekbal, A., Cambria, E., 2020. How intense are you? Predicting intensities of emotions and sentiments using stacked ensemble [application notes]. IEEE Comput. Intell. Mag. 15 (1), 64–75.
Ba, J.L., Kiros, J.R., Hinton, G.E., 2016. Layer normalization. arXiv preprint arXiv:1607.06450.
Baker, M.C., 2003. Lexical Categories: Verbs, Nouns and Adjectives. Cambridge University Press.
Basiri, M.E., Nemati, S., Abdar, M., et al., 2021. ABCDM: an attention-based bidirectional CNN-RNN deep model for sentiment analysis. Futur. Gener. Comput. Syst. 115, 279–294.
Bengio, Y., Ducharme, R., Vincent, P., et al., 2003. A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155.
Cambria, E., 2016. Affective computing and sentiment analysis. IEEE Intell. Syst. 31 (2), 102–107.
Cambria, E., Li, Y., Xing, F.Z., et al., 2020. SenticNet 6: ensemble application of symbolic and subsymbolic AI for sentiment analysis. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management, pp. 105–114.
Chaffey, D., 2016. Global social media statistics summary. smartinsights.com.
Chen, T., Xu, R., He, Y., et al., 2017. Improving sentiment analysis via sentence type classification using BiLSTM-CRF and CNN. Expert Syst. Appl. 72, 221–230.
Cohan, A., Desmet, B., Yates, A., et al., 2018. SMHD: a large-scale resource for exploring online language usage for multiple mental health conditions. arXiv preprint arXiv:1806.05258.
Cohen, J., 1992. A power primer. Psychol. Bull. 112 (1), 155.
Cohn, M.A., Mehl, M.R., Pennebaker, J.W., 2004. Linguistic markers of psychological change surrounding September 11, 2001. Psychol. Sci. 15 (10), 687–693.
Desmet, B., Hoste, V.R., 2013. Emotion detection in suicide notes. Expert Syst. Appl. 40 (16), 6351–6358.
Dhingra, K., Boduszek, D., O'Connor, R.C., 2015. Differentiating suicide attempters from suicide ideators using the integrated motivational–volitional model of suicidal behaviour. J. Affect. Disord. 186, 211–218.
Foley, S.R., Kelly, B.D., 2018. Forgiveness, spirituality and love: thematic analysis of last statements from Death Row, Texas (2002–17). QJM 111 (6), 399–403.
Frantzi, K., Ananiadou, S., Mima, H., 2000. Automatic recognition of multi-word terms: the c-value/nc-value method. Int. J. Digit. Libr. 3 (2), 115–130.
Gehring, J., Auli, M., Grangier, D., et al., 2017. Convolutional sequence to sequence learning. In: International Conference on Machine Learning. PMLR, pp. 1243–1252.
Gregory, A., 1999. The decision to die: the psychology of the suicide note. In: Interviewing and Deception, pp. 127–156.
Hall, M., Frank, E., Holmes, G., et al., 2009. The WEKA data mining software: an update. ACM SIGKDD Explor. Newsl. 11 (1), 10–18.
He, K., Zhang, X., Ren, S., et al., 2016. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
Hemming, L., Pratt, D., Shaw, J., et al., 2020. Prison staff's views and understanding of the role of emotions in prisoner suicide and violence. J. Forensic Psychiatry Psychol. 31 (6), 868–888.
Ji, S., Pan, S., Li, X., et al., 2021. Suicidal ideation detection: a review of machine learning methods and applications. IEEE Trans. Comput. Soc. Syst. 8 (1), 214–226.
Jones, N.J., Bennell, C., 2007. The development and validation of statistical prediction rules for discriminating between genuine and simulated suicide notes. Arch. Suicide Res. 11 (2), 219–233.
Just, M.A., Pan, L., Cherkassky, V.L., et al., 2017. Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth. Nat. Hum. Behav. 1 (12), 911–919.
Kelly, B.D., Foley, S.R., 2018. Analysis of last statements prior to execution: methods, themes and future directions. QJM 111 (1), 3–6.
Kim, Y., 2014. Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751.
Lam, G., Dongyan, H., Lin, W., 2019. Context-aware deep learning for multi-modal depression detection. In: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, pp. 3946–3950.
LeCun, Y., Bengio, Y., Hinton, G., 2015. Deep learning. Nature 521 (7553), 436–444.
Li, Y., Yuan, Y., 2017. Convergence analysis of two-layer neural networks with ReLU activation. arXiv preprint arXiv:1705.09886.
Li, W., Zhu, L., Shi, Y., et al., 2020. User reviews: sentiment analysis using lexicon integrated two-channel CNN–LSTM family models. Appl. Soft Comput. 94, 106435.
Lin, P., Luo, X., 2020. A survey of sentiment analysis based on machine learning. In: CCF International Conference on Natural Language Processing and Chinese Computing. Springer, Cham, pp. 372–387.
Luxton, D.D., June, J.D., Fairall, J.M., 2012. Social media and suicide: a public health perspective. Am. J. Public Health 102 (S2), S195–S200.
Majumder, N., Poria, S., Peng, H., et al., 2019. Sentiment and sarcasm classification with multitask learning. IEEE Intell. Syst. 34 (3), 38–43.
Nassif, A.B., Shahin, I., Attili, I., et al., 2019. Speech recognition using deep neural networks: a systematic review. IEEE Access 7, 19143–19165.
Osgood, C.E., Walker, E.G., 1959. Motivation and language behavior: a content analysis of suicide notes. J. Abnorm. Soc. Psychol. 59 (1), 58.
Pennebaker, J.W., Chung, C.K., Frazee, J., et al., 2014. When small words foretell academic success: the case of college admissions essays. PLoS One 9 (12), e115844.
Pennebaker, J.W., Boyd, R.L., Jordan, K., et al., 2015. The Development and Psychometric Properties of LIWC2015. Technical report.
Pennington, J., Socher, R., Manning, C.D., 2014. GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543.
Pestian, J., Nasrallah, H., Matykiewicz, P., et al., 2010. Suicide note classification using natural language processing: a content analysis. Biomedical Informatics Insights 3 (BII.S4706).
Pestian, J.P., Matykiewicz, P., Linn-Gust, M., 2012. What's in a note: construction of a suicide note corpus. Biomedical Informatics Insights 5 (BII.S10213).
Schoene, A.M., Dethlefs, N., 2016. Automatic identification of suicide notes from linguistic and sentiment features. In: Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, pp. 128–133.
Schoene, A.M., Dethlefs, N., 2018. Unsupervised suicide note classification. In: Workshop on Issues of Sentiment Discovery and Opinion Mining at Knowledge Discovery and Data Mining (KDD), pp. 1–9.
Schoene, A.M., Lacey, G., Turner, A.P., et al., 2019. Dilated LSTM with attention for classification of suicide notes. In: Proceedings of the Tenth International Workshop on Health Text Mining and Information Analysis (LOUHI 2019), pp. 136–145.
Springenberg, J.T., Dosovitskiy, A., Brox, T., et al., 2014. Striving for simplicity: the all convolutional net. arXiv preprint arXiv:1412.6806.
Szegedy, C., Toshev, A., Erhan, D., 2013. Deep neural networks for object detection. Adv. Neural Inf. Proces. Syst. 26.
Tang, D., Qin, B., Liu, T., 2015. Document modeling with gated recurrent neural network for sentiment classification. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1422–1432.
Tausczik, Y.R., Pennebaker, J.W., 2010. The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psychol. 29 (1), 24–54.
Vaswani, A., Shazeer, N., Parmar, N., et al., 2017. Attention is all you need. arXiv preprint arXiv:1706.03762.
World Health Organization, 2014. Preventing Suicide: A Global Imperative. World Health Organization.
Young, T., Hazarika, D., Poria, S., et al., 2018. Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 13 (3), 55–75.
Zhang, L., Wang, S., Liu, B., 2018. Deep learning for sentiment analysis: a survey. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 8 (4), e1253.
