Using Machine Learning Algorithms To Detect Suicide Risk Factors On Twitter
Abstract: The goal of this study is to identify suicide risk factors on Twitter. We propose a machine learning framework that could be potentially useful for suicide prevention interventions. We applied search terms from the suicidal ideation tracking framework proposed by Jashinsky et al. and downloaded 12,066 public tweets from 3,873 users via Twitter's application programming interface (API). We created "HighRisk" or "AtRisk" labels for users based on their usage of suicidal ideation terms and applied three topic discovery algorithms to find underlying suicide risk factors among users, which were subsequently used to classify users as "HighRisk" or "AtRisk". The algorithms applied included Latent Semantic Analysis, Latent Dirichlet Allocation, Non-negative Matrix Factorization, Decision Tree, and K-means Clustering. Our topic discovery approach detected 7 out of the 12 suicide risk factors proposed by Jashinsky et al. Using a decision tree classification model that utilized these factors, we achieved 0.844 in precision, 0.912 in sensitivity, and 0.829 in specificity in classifying users into "HighRisk" and "AtRisk" groups. The development of this framework supports the work of suicide researchers and suicide prevention efforts, with the potential to be employed at run-time.

Keywords: Suicide ideation, Topic modeling, Text analysis, Latent Semantic Analysis
1. Introduction

Suicide is the 10th leading cause of death in the United States, with an estimated cost of $51 billion annually [1], making suicide prevention not only a public health issue but also an economic one. The Centers for Disease Control and Prevention (CDC) reported that for youth between the ages of 10 and 24, suicide is the third leading cause of death [1]. Even more concerning, a recent study of 32 U.S. children's hospitals showed that rates of suicide and serious self-harm in children and adolescents increased steadily from 2008 to 2015 [2]. Social media has been identified as playing a possible role in contributing to suicide through copycat actions, mainly in vulnerable and impressionable youth [3]. Twitter is a social network platform where users share messages limited to 140 characters. It has been shown that 21% of Americans use Twitter, with 36% of them aged between 18 and 29 [4]. Twitter has been known to be used as a platform for suicidal messages and suicide notes [5]. Social media platforms have even been used as venues to live-stream suicide attempts, showing that these warning signs need to be taken seriously [6]. Twitter has recognized this serious risk and has put a service in place for people who are, or know of somebody who is, suicidal to reach out and get help [7]. However, Twitter is not proactive in identifying users at risk, and reporting is at the discretion of the user and not in real time. To make suicide prevention timely and effective, suicide-related data has to be collected, analyzed, and reported in a timely manner, so that interventions can be made before the person commits suicide.

Prior research has identified specific terms or phrases in tweets indicative of suicide risk factors [5]. While most research has focused on using machine learning in conjunction with human annotators and/or suicide research experts, our study dropped the human factor to reduce cost and relied on machine learning approaches to increase efficiency in identifying users at high risk of suicide. For example, in [5], human annotators determined the suicide risk factors and linked them to certain terms and phrases in the tweets. In our previous work [8, 9], we utilized the suicide risk factor framework of [5] to detect relational features and language patterns indicative of suicidal ideation. In this study, we applied reproducible machine learning algorithms, including Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-Negative Matrix Factorization (NMF), to identify suicide-related topics and themes of discussion in tweets, and we studied the resulting user cluster formations and user profile classifications.

2. Related Work

The study of Twitter data in suicide analysis has become more prevalent as there are known issues with recall and context bias in the psychological assessments that have been studied in the past [10]. There is a lack of textual data for review in traditional analysis of suicide, and the review of Twitter data offers an insight into the day-to-day feelings of individual users [11].
she was removed. In the filtered dataset, there were 280 "HighRisk" users and 1,614 "AtRisk" users.

3.4 Addressing Class Imbalance

To deal with the class imbalance (the small number of "HighRisk" users), two additional balanced datasets were created using random down-sampling and K-means clustering. The first balanced dataset contained all 280 "HighRisk" users and 280 randomly selected "AtRisk" users. The second balanced dataset contained the 280 "HighRisk" users and 285 "AtRisk" users selected from 15 clusters found using K-means clustering; from each cluster, we selected the 19 most representative users.
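To make the construction concrete, the sketch below shows one way the two balanced datasets could be built. It is a minimal sketch, assuming a pandas DataFrame `users` with a `label` column and a row-aligned feature matrix `features`; treating the users closest to each K-means centroid as the "most representative" ones is also our assumption rather than a detail stated above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def balance_by_downsampling(users: pd.DataFrame, seed: int = 0) -> pd.DataFrame:
    """Balanced (R): keep all "HighRisk" users, randomly down-sample "AtRisk" users."""
    high = users[users["label"] == "HighRisk"]
    at = users[users["label"] == "AtRisk"].sample(n=len(high), random_state=seed)
    return pd.concat([high, at])


def balance_by_kmeans(users: pd.DataFrame, features: np.ndarray,
                      n_clusters: int = 15, per_cluster: int = 19) -> pd.DataFrame:
    """Balanced (K): keep all "HighRisk" users; cluster the "AtRisk" users and keep
    the per_cluster users closest to each centroid (assumed notion of "representative")."""
    at_mask = (users["label"] == "AtRisk").to_numpy()
    high = users[users["label"] == "HighRisk"]
    at_users, at_feats = users[at_mask], features[at_mask]
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(at_feats)
    keep = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(at_feats[members] - km.cluster_centers_[c], axis=1)
        keep.extend(members[np.argsort(dists)[:per_cluster]])
    return pd.concat([high, at_users.iloc[sorted(keep)]])
```

With 15 clusters and 19 users kept per cluster, the second strategy yields the 285 "AtRisk" users reported above.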
4. Methodology

We first applied three topic clustering algorithms to discover topics discussed by users on Twitter: Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Non-Negative Matrix Factorization (NMF). Second, we explored cluster formations and user classifications into "HighRisk" or "AtRisk" using the identified topics. LDA is a probabilistic approach that maximizes the log likelihood of each term appearing in each topic, while LSA is a matrix factorization approach that rotates topic vectors in the term space to best capture the variation of terms. However, LSA produces topics that contain terms negatively correlated with those topics, and such results are hard to interpret. To account for this disadvantage, NMF, a non-negative rank factorization approach, was performed to produce topics that are more interpretable and better separated.
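A minimal sketch of how the three topic models could be fitted to the user-term matrix with scikit-learn is shown below; the variable `docs` (one concatenated string of tweets per user), the preprocessing, and the choice of TF-IDF features for LSA/NMF versus raw counts for LDA are our assumptions rather than details specified in this paper.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation, NMF, TruncatedSVD

N_TOPICS = 12  # one candidate topic per suicide risk factor in [5]

# docs: hypothetical list with one concatenated string of tweets per user
tfidf_vec = TfidfVectorizer(stop_words="english")
count_vec = CountVectorizer(stop_words="english")
X_tfidf = tfidf_vec.fit_transform(docs)   # user-term matrix for LSA and NMF
X_count = count_vec.fit_transform(docs)   # raw term counts for LDA

lsa = TruncatedSVD(n_components=N_TOPICS, random_state=0).fit(X_tfidf)
lda = LatentDirichletAllocation(n_components=N_TOPICS, random_state=0).fit(X_count)
nmf = NMF(n_components=N_TOPICS, random_state=0, max_iter=500).fit(X_tfidf)


def top_terms(model, vocabulary, k=3):
    """Return the k highest-weighted terms for each topic of a fitted model."""
    return [[vocabulary[i] for i in topic.argsort()[::-1][:k]]
            for topic in model.components_]

print(top_terms(nmf, tfidf_vec.get_feature_names_out()))
```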
In order to classify users as either "HighRisk" or "AtRisk", two machine learning approaches were employed: Decision Tree classification and K-means clustering. While Decision Tree is a supervised approach that learns to classify instances based on relations between the data points and the corresponding ground truth labels, K-means is an unsupervised approach that partitions users based on their similarities. Since the ground truth labels are unknown when such applications are deployed in the real world, we applied K-means clustering to examine the effectiveness of the discovered topics in separating "HighRisk" and "AtRisk" users.
5. Experiments and Results

5.1 Topic Identification

Based on the framework proposed in [5], the three topic identification algorithms were applied with the assumption that there are 12 underlying suicide-related topics, corresponding to the 12 suicide risk factors. As shown in Table 2, each topic identified from the results of LDA, LSA, and NMF is associated with one of the 12 risk factors. To determine the topic definition using one of the twelve suicide risk factors, a threshold of 0.25 was used on the terms' weights associated with the topics. For example, in the first topic discovered by LDA, sleeping and cut had loadings greater than the threshold, and since the term sleeping has the greatest loading value, the topic was assigned Depressive Feelings, according to the framework in [5]. Furthermore, topics assigned Self-harm and Bullying must have dominating terms such as Cut and Bullied, respectively. By comparing the suicide risk factor assignments across the three algorithms, 5 topics were found consistently in the results: Depressive Feelings, Drug Abuse, Psychological Disorders, Self-harm, and Bullying. Depressive Symptoms appears to be another topic found by LSA and NMF, and the top contributing terms for that topic include sleeping, alcohol, and empty. In addition, Family Violence/Discord was discovered by LSA. In total, using the three topic identification algorithms, 7 out of 12 suicide risk factors were identified.
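The assignment rule can be summarized in a few lines. This is a sketch: the `TERM_TO_FACTOR` lookup below is only an illustrative fragment of the term-to-risk-factor mapping in [5], and the weight vectors would come from the fitted topic models' components.

```python
# Illustrative fragment of the term-to-risk-factor mapping from the framework in [5];
# the real mapping covers all tracked terms and 12 risk factors.
TERM_TO_FACTOR = {
    "sleeping": "Depressive Feelings",
    "empty": "Depressive Feelings",
    "cut": "Self-harm",
    "bullied": "Bullying",
    "pills": "Drug Abuse",
}


def assign_risk_factor(topic_weights, vocabulary, threshold=0.25):
    """Label a topic with the risk factor of its highest-weighted term, considering
    only terms whose weight on the topic exceeds the threshold."""
    candidates = [(w, t) for w, t in zip(topic_weights, vocabulary) if w > threshold]
    if not candidates:
        return None  # no dominating term above the threshold
    _, best_term = max(candidates)
    return TERM_TO_FACTOR.get(best_term, "Unassigned")
```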
5.2 User Classification

After confirming the definitions of the topics, the user-term matrix A was transformed into the user-topic matrix, on which Decision Tree and K-means Clustering were applied to classify users.

5.2.1 Decision Tree

Decision Tree models were built using the Chi-square Automatic Interaction Detector (CHAID) algorithm with gain ratio as the splitting criterion, which produces a more balanced tree that is less prone to overfitting than trees built with other splitting criteria. With these parameter settings, the minimum number of members in child and parent nodes was varied across experiments, where the restriction on the number of members in child nodes is half of that in the parent nodes, and the parent-node membership restriction was varied between 10 and 180.
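The CHAID/gain-ratio trees in this study come from a different toolchain; as a rough, non-equivalent stand-in, the sketch below uses scikit-learn's DecisionTreeClassifier with entropy-based splitting and the parent/child minimum-membership constraints described above. `X_topics` (the user-topic matrix) and `y` (1 for "HighRisk", 0 for "AtRisk") are assumed names.

```python
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# X_topics: hypothetical user-topic matrix; y: 1 for "HighRisk", 0 for "AtRisk"
results = {}
for parent_min in range(10, 181, 10):              # parent-node restriction, 10..180
    tree = DecisionTreeClassifier(
        criterion="entropy",                        # stand-in for CHAID / gain ratio
        min_samples_split=parent_min,               # minimum membership of a parent node
        min_samples_leaf=max(parent_min // 2, 1),   # child restriction = half of parent
        random_state=0,
    )
    scores = cross_validate(tree, X_topics, y, cv=5,
                            scoring=("precision", "recall", "roc_auc"))
    results[parent_min] = {name: vals.mean() for name, vals in scores.items()
                           if name.startswith("test_")}
```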
5.2.2 K-means Clustering

In total, 54 K-means Clustering experiments were performed. For each of the prepared datasets, two K-means Clustering analyses were conducted for each of the similarity measures. Two assumptions were made. First, the clusters are well separated and can be partitioned into two groups labeled "AtRisk" and "HighRisk"; therefore, K=2 was chosen for the clustering analysis. Second, the clusters are disjoint, and in order to separate all groups and label each of them as "AtRisk" or "HighRisk", K=N was chosen, where N is determined by the number of terminal nodes in the corresponding Decision Tree. For example, for the LDA unbalanced dataset, there are 27 terminal nodes in the best decision tree trained; therefore, the K-means clustering was conducted with K=27.
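A sketch of the two clustering settings is shown below, assuming Euclidean distance (the specific similarity measures used in the 54 runs are not restated here) and the same hypothetical `X_topics` matrix as above.

```python
from sklearn.cluster import KMeans


def cluster_users(X_topics, n_terminal_nodes=None):
    """Run the two settings described above: K=2 (two risk groups) and,
    optionally, K=N with N taken from the matching tree's terminal-node count."""
    runs = {2: KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_topics)}
    if n_terminal_nodes:
        runs[n_terminal_nodes] = KMeans(n_clusters=n_terminal_nodes, n_init=10,
                                        random_state=0).fit_predict(X_topics)
    return runs

# e.g., the best tree for the LDA unbalanced dataset had 27 terminal nodes
assignments = cluster_users(X_topics, n_terminal_nodes=27)
```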
5.3 Classification Results

After the clusters are formed, each cluster is labeled with the majority class label within that cluster. For example, if a cluster is dominated by "AtRisk" users, that cluster is labeled "AtRisk" and all of its users are assigned "AtRisk" as their predicted label. Four metrics are used to evaluate the performance of the decision tree and K-means clustering: precision, sensitivity (recall), specificity, and AUC; to calculate each of these metrics, the "HighRisk" class is treated as the positive class.

As shown in Table 3, in general, Decision Tree models perform better than K-means Clustering in identifying "HighRisk" users, as measured by precision.
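The majority-vote labeling and the four metrics can be computed as follows. This is a sketch under the assumption that labels are encoded as 1 for "HighRisk" and 0 for "AtRisk"; the AUC here is computed from the hard cluster-induced labels rather than from continuous scores.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, roc_auc_score


def evaluate_clustering(cluster_ids, y_true):
    """Label each cluster by its majority class, then score the induced prediction
    with "HighRisk" (encoded as 1) treated as the positive class."""
    cluster_ids, y_true = np.asarray(cluster_ids), np.asarray(y_true)
    y_pred = np.empty_like(y_true)
    for c in np.unique(cluster_ids):
        members = cluster_ids == c
        y_pred[members] = np.bincount(y_true[members]).argmax()  # majority vote
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    return {
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "sensitivity": recall_score(y_true, y_pred),
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # AUC from hard labels, not probability scores
        "auc": roc_auc_score(y_true, y_pred),
    }
```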
Table 2: Topic Clustering Results

LDA Topics                               Top Contributing Terms with Probabilities
1   Depressive Feelings                  Sleeping (0.605), Cut (0.347)
2   Drug Abuse                           Zoloft (0.328), Prozac (0.298)
3   Psychological Disorders              Panic Disorder (0.411)
4   Self-harm                            Cut (0.31)
5   Bullying                             Bullying (0.796)
6   Self-harm                            Cut (0.395)
7   Self-harm                            Cut (0.354), Depressed (0.324)
8   Drug Abuse                           Alcohol (0.768)
9   Depressive Feelings                  Empty (0.964), Worthless (0.375)
10  Drug Abuse                           Pills (0.747)
11  Self-harm                            Cut (0.995)
12  Family Violence/Discord              Suicide (0.295)

LSA Topics (with Variance Explained)     Top Contributing Terms with Loadings
1   Self-harm (0.463)                    Cut (0.995)
2   Bullying (0.353)                     Bully (0.988)
3   Drug Abuse (0.031)                   Zoloft (0.56), Alcohol (0.475), Prozac (0.372)
4   Psychological Disorders (0.023)      Panic Disorder (0.825)
5   Bullying (0.021)                     Bullied (0.978)
6   Depressive Symptoms (0.017)          Sleeping (0.54), Alcohol (0.438), Empty (0.417)
7   Depressive Symptoms (0.015)          Sleeping (0.698)
8   Drug Abuse (0.014)                   Pills (0.75)
9   Drug Abuse (0.012)                   Pills (0.533), Empty (0.33)
10  Depressive Feelings (0.012)          Empty (0.687), Depressed (0.415)
11  Bullying (0.012)                     Abused (0.997)
12  Psychological Disorders (0.008)      Bipolar (0.65), Schizophrenia (0.442)

NMF Topics                               Top Contributing Terms with Loadings
1   Self-harm                            Cut (2.22)
2   Psychological Disorders              Panic Disorder (0.89)
3   Drug Abuse                           Alcohol (0.651)
4   Bullying                             Bully (2.275)
5   Depressive Symptoms                  Sleeping (0.961)
6   Drug Abuse                           Pills (2.154)
7   Depressive Feelings                  Empty (2.002)
8   Depressive Feelings                  Worthless (2.106), Bullied (0.77)
9   Depressive Feelings                  Depressed (1.143)
10  Psychological Disorders              Bipolar (0.572)
11  Depressive Feelings                  Anxious (1.513), Hopeless (0.421)
12  Depressive Feelings                  Abused (1.246)
Using the unbalanced dataset, neither Decision Tree nor K-means Clustering is able to clearly distinguish between "AtRisk" and "HighRisk" users.

Decision Tree tends to classify most users as "HighRisk"; therefore, all Decision Tree models built on the unbalanced dataset have sensitivity close to one while specificity is low. On the other hand, K-means Clustering tends to assign all users to the "AtRisk" class; therefore, the K-means Clustering results have zero sensitivity and a specificity of one. Using the balanced dataset whose "AtRisk" users were randomly sampled, Decision Tree models also tend to perform better than K-means Clustering. K-means Clustering with K=N, where N is determined as described in Section 5.2.2, can capture more "HighRisk" users than the Decision Tree model; the K-means Clustering result has a sensitivity of 0.771 while the Decision Tree model has a sensitivity of 0.736. Overall, both K-means Clustering and Decision Tree perform better using the balanced dataset generated by the K-means approach compared to the other datasets. Decision Tree outperforms K-means Clustering in precision, sensitivity, specificity, and AUC. However, using LDA's balanced dataset generated with the K-means approach, K-means Clustering is able to separate the "AtRisk" users better than the Decision Tree; the K-means Clustering result has a specificity of 0.86 compared to 0.829 for the Decision Tree. In addition, K-means Clustering achieves a precision (0.828) that is close to that of the Decision Tree model (0.844).

The K-means Clustering results show that the "AtRisk" and "HighRisk" user clusters are disjoint in the risk factor space and can be properly separated using the clustering approach; specifically, the clustering approach can almost perfectly separate the "AtRisk" users with 0.993 specificity. Furthermore, such disjoint clusters can be better separated using the supervised approach, which in this study is the Decision Tree model. Among all the Decision Tree models, the one trained on NMF's balanced dataset generated using the K-means Clustering approach has the best performance, with a precision of 0.853, a sensitivity of 0.933, a specificity of 0.836, and an AUC of 0.885.
6. Discussion

The results indicate that by utilizing supervised and unsupervised machine learning algorithms combined with topic identification techniques, users who are "AtRisk" or "HighRisk" of suicidal ideation can be identified from their Twitter data. Using the topic clustering techniques, the results also show that, with minimal human interpretation, 7 out of the 12 suicide risk factors confirmed by suicide researchers were discovered. Without looking at any additional linguistic features of tweets, decision tree models are able to distinguish the "AtRisk" and "HighRisk" users. Additionally, the classification results in combination with the topic clustering provide a qualitative interpretation of the users' psychological state, which could be potentially useful in future prevention efforts. Among the three topic discovery approaches, Latent Semantic Analysis is able to provide information on the dominating topics and the popularly used suicide-related terms. In Table 2, the variance captured by, or the popularity of, each topic is reported next to the topic definition. For example, topic 1, "Self-harm" (0.463), is a self-harm-related topic that captures 0.463 of the variance of the original dataset, and its dominating term is "Cut" with a weight of 0.995. By observing these variances and term weights, the LSA results show that "Self-harm" is the dominating topic in the dataset, followed by "Bullying" as the second most discussed topic. Within these two topics, "Cut" and "Bully" are the most commonly used terms. While LDA and NMF do not produce ordered topics with a parametric approach, the popularity of the topics discussed can be measured by how frequently a topic is discovered. Since LDA is a generative probabilistic approach, and Self-harm appears in 4 of the 12 topics discovered, the highest frequency of any topic, it can be argued that Self-harm is the most probable topic being discussed in the dataset. Among the topics discovered by NMF, Depressive Feelings is the dominating topic. Such results, however, do not provide insights into which topics are important in distinguishing between "AtRisk" and "HighRisk" users. That information can be observed from the decision tree plots.
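For instance, continuing the scikit-learn sketch from Section 4 (the fitted `lsa` and `tfidf_vec` objects, which are our illustrative stand-ins), the per-topic variance figures and dominating terms reported in Table 2 would correspond roughly to:

```python
# Assumes the `lsa` (TruncatedSVD) and `tfidf_vec` objects from the Section 4 sketch.
terms = tfidf_vec.get_feature_names_out()
top_term_idx = lsa.components_.argmax(axis=1)   # highest-loading term per LSA topic
for k, ratio in enumerate(lsa.explained_variance_ratio_):
    print(f"LSA topic {k + 1}: variance {ratio:.3f}, top term '{terms[top_term_idx[k]]}'")
```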
Among the classification experiments, the decision tree models built using the balanced datasets generated with the K-means approach performed the best. Therefore, conclusions concerning the importance of topics should be drawn from those trees. In a Decision Tree model, the topics used to first partition the users are considered the most important topics. Observing the top nodes of the three Decision Tree models, LDA's most important topics are Self-harm, Depressive Feelings, Drug Abuse, and Bullying; LSA's most important topics are Drug Abuse, Self-harm, Psychological Disorders, Bullying, and Depressive Symptoms; and NMF's most important topics are Drug Abuse, Psychological Disorders, and Self-harm. Overall, it can be concluded that Self-harm, Drug Abuse, Bullying, Depressive Feelings, Depressive Symptoms, and Psychological Disorders play important roles in determining "AtRisk" and "HighRisk" users. Furthermore, since the ground truth labels are usually unknown in real-world applications, and given the 0.993 specificity achieved by applying K-means clustering to the identified topics, we believe such a suicidal ideation detection technique can be studied further and eventually deployed as a real-time application to support timely suicide prevention.

7. Limitations

The suicide risk factor framework designed by [5] was based on previous research ranging from 1994 to 2012. With the evolving language usage on social media, an up-to-date suicide-related lexicon should be considered for this type of framework development and incorporated into suicide detection and prevention research. This suggests that the dataset used in this study might be missing information that conveys the ideation of the other 5 suicide risk topics. Furthermore, emoticons, hashtags, and other Twitter metadata that could potentially indicate suicidal ideation are not included in the dataset. In addition, there is an unsolved issue in the selection of tweets and users who are displaying sarcasm or
making statements in jest compared to users who have actual
suicidal intent. As advancements are made to natural language
processing techniques, this piece of the study can be improved.
This study is evaluated against a previously established framework for suicide-related research on Twitter data rather than against the ground truth: whether the Twitter user actually committed suicide. For this work to be applied in the suicide prevention domain, ground truth data should be collected and used to evaluate the true performance of the approach. Other possibilities include determining the number of distinct classes within suicidal users; this study operates on the two-class "AtRisk"/"HighRisk" structure and could therefore miss other insights.
8. Conclusion

Suicide is a serious social and economic problem in the United States. Many efforts have been made to study the language of suicide-related tweets, and a few have attempted to detect suicidal ideation using open social media data. In the most recent studies addressing this task [5, 18], tweets are manually tagged by human annotators before machine learning algorithms are applied to classify whether users are at risk of suicide, which is not efficient enough to detect suicidal ideation in time to support suicide prevention. In this study, we propose a suicidal ideation detection framework that requires minimal human effort in annotating data by incorporating unsupervised topic discovery algorithms. Three techniques were tested: Latent Semantic Analysis, Latent Dirichlet Allocation, and Non-Negative Matrix Factorization. Using these algorithms, we were able to discover 7 out of the 12 suicide risk factors proposed by [5], and using those topics, we were able to represent user profiles in a more compact format. Furthermore, by conducting K-means clustering analysis on the transformed datasets, we concluded that while the "AtRisk" and "HighRisk" user groups form disjoint clusters, they cannot be fully distinguished by unsupervised partitioning alone. However, as shown in the experimental results, the Decision Tree model achieved a precision of 0.844, a sensitivity of 0.912, and a specificity of 0.829, with "HighRisk" users as the positive class. This framework shows that with minimal human interpretation of social media data, it is possible to detect suicidal ideation using a combination of supervised and unsupervised machine learning algorithms.
Table 3: User Classification Results (R: randomly down-sampled; K: K-means sampled; "HighRisk" is the positive class)

                      K-means Clustering (K=2)                K-means Clustering (K=N)                Decision Tree
                      Unbalanced  Balanced (R)  Balanced (K)  Unbalanced  Balanced (R)  Balanced (K)  Unbalanced  Balanced (R)  Balanced (K)
LSA  Precision        0.000       0.511         0.546         0.000       0.631         0.647         0.865       0.747         0.845
     Sensitivity      0.000       0.886         0.739         0.000       0.689         0.746         0.993       0.750         0.839
     Specificity      1.000       0.154         0.396         1.000       0.596         0.600         0.111       0.746         0.843
     AUC              0.500       0.520         0.568         0.500       0.643         0.673         0.552       0.748         0.841
LDA  Precision        0.000       0.538         0.648         0.686       0.554         0.828         0.881       0.640         0.844
     Sensitivity      0.000       0.432         0.571         0.086       0.771         0.686         0.979       0.736         0.912
     Specificity      1.000       0.629         0.695         0.993       0.379         0.860         0.239       0.586         0.829
     AUC              0.500       0.530         0.633         0.539       0.575         0.773         0.609       0.661         0.870
NMF  Precision        0.000       0.520         0.606         0.556       0.575         0.734         0.873       0.711         0.853
     Sensitivity      0.000       0.882         0.604         0.018       0.661         0.700         0.984       0.739         0.933
     Specificity      1.000       0.186         0.614         0.998       0.511         0.751         0.171       0.700         0.836
     AUC              0.500       0.534         0.609         0.508       0.586         0.725         0.578       0.720         0.885
References

1. Centers for Disease Control and Prevention, "Suicide Data Sheet," 1 January 2015. Available: https://www.cdc.gov/violenceprevention/pdf/suicide-datasheet-a.pdf (accessed 26 December 2017).
2. G. Plemmons, M. Hall, W. Browning, et al., "Trends in Suicidality and Serious Self-Harm for Children 5-17 Years at 32 U.S. Children's Hospitals, 2008-2015," in Pediatric Academic Societies 2017, Toronto, 2017.
3. D. Luxton, J. D. June, and J. M. Fairall, "Social media and suicide: a public health perspective," American Journal of Public Health, vol. 102, no. S2, pp. S195-S200, 2012.
4. Pew Research Center, "24% of online adults (21% of all Americans) use Twitter," 10 November 2016. Available: http://www.pewinternet.org/2016/11/11/social-media-update-2016/pi_2016-11-11_social-media-update_0-04/ (accessed 26 December 2017).
5. J. Jashinsky, S. H. Burton, C. L. Hanson, et al., "Tracking suicide risk factors through Twitter in the US," Crisis, vol. 35, no. 1, pp. 51-59, 2014.
6. B. Stelter, "Web Suicide Viewed Live and Reaction Spur a Debate," 24 November 2008. Available: http://www.nytimes.com/2008/11/25/us/25suicides.html (accessed 26 December 2017).
7. Twitter, "About self-harm and suicide," 1 January 2017. Available: https://support.twitter.com/articles/20170313 (accessed 26 December 2017).
8. S. Fodeh, J. Goulet, C. Brandt, et al., "Leveraging Twitter to better identify suicide risk," in Machine Learning Research, Halifax, 2017.
9. R. Grant, D. Kucher, A. León, et al., "Automatic Extraction of Informal Topics from Online Suicidal Ideation," in 11th International Workshop on Data and Text Mining in Biomedical Informatics, Singapore, 2017.
10. S. Shiffman, A. A. Stone, and M. R. Hufford, "Ecological momentary assessment," Annual Review of Clinical Psychology, vol. 4, no. 1, pp. 1-32, 2008.
11. G. Coppersmith, K. Ngo, R. Leary, et al., "Exploratory analysis of social media prior to a suicide attempt," in Third Workshop on Computational Linguistics and Clinical Psychology, San Diego, 2016.
12. J. F. Gunn and D. Lester, "Twitter postings and suicide: An analysis of the postings of a fatal suicide in the 24 hours prior to death," Suicidologi, vol. 17, no. 3, pp. 28-30, 2012.
13. S. R. Braithwaite, C. Giraud-Carrier, J. West, et al., "Validating machine learning algorithms for Twitter data against established measures of suicidality," JMIR Mental Health, vol. 3, no. 2, p. e21, 2016.
14. B. O'Dea, M. E. Larsen, P. J. Batterham, et al., "A linguistic analysis of suicide-related Twitter posts," Crisis, vol. 38, no. 5, pp. 319-329, 2017.
15. B. O'Dea, S. Wan, P. J. Batterham, et al., "Detecting suicidality on Twitter," Internet Interventions, vol. 2, no. 2, pp. 183-188, 2015.
16. X. Huang, L. Zhang, D. Chiu, et al., "Detecting suicidal ideation in Chinese microblogs with psychological lexicons," in 2014 IEEE 11th International Conference on Ubiquitous Intelligence and Computing, IEEE 11th International Conference on Autonomic and Trusted Computing, and IEEE 14th International Conference on Scalable Computing and Communications and Its Associated Workshops (UTC-ATC-ScalCom), Bali, 2014.
17. T. Liu, Q. Cheng, C. M. Homan, et al., "Learning from various labeling strategies for suicide-related messages on social media: An experimental study," CoRR, vol. abs/1701.08796, 2017.
18. P. Burnap, W. Colombo, and J. Scourfield, "Machine classification and analysis of suicide-related communication on Twitter," in 26th ACM Conference on Hypertext & Social Media, New York, 2015.
19. K. D. Rosa, R. Shah, B. Lin, et al., "Topical clustering of tweets," in 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, 2011.
20. V. Rus, N. Niraula, and R. Banjade, "Similarity measures based on latent dirichlet allocation," in International Conference on Intelligent Text Processing and Computational Linguistics, Samos, 2013.