Topic Shift Detection in Online Discussions using Structural Context
Conference Paper · July 2019
DOI: 10.1109/COMPSAC.2019.00155
2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC)
Topic Shift Detection in Online Discussions using Structural Context
Yingcheng Sun Kenneth Loparo
Case Western Reserve University Case Western Reserve University
Cleveland, OH, USA Cleveland, OH, USA
[email protected] [email protected]

Abstract—Topic shift occurs frequently in online discussions, and automatically detecting topic shift can help to better capture the main clues and obtain relevant answers from a large number of comments. Traditional topic-shift detection methods calculate text similarity and have limited success because they ignore semantic relatedness. In this paper, we propose a new topic shift detection model that uses conversational structure to enrich the context information and word embedding to build the semantic associations for each comment-post pair. Experiments show that the proposed model leads to better performance in terms of precision, recall, and F1 score.

0 Texas serial bomber made video confession before blowing himself up
1 What are the chances we ever see the video?
2 The same as the chances of the Browns winning Super Bowl.
3 Browns run too many bad plays in the last quarter.
4 I take the browns to the super bowl every morning
5 Zero, videos like this are locked down and used for training purposes
6 Here I am thinking how bad can it be?
7 I want to know what kind of phone he has?
8 An old analog one from the 90's.
9 Nokia brick, amazingly durable.

Figure 1. An example thread of user replies on a news article about the Texas serial bomber's video¹. Blue indicates an on-topic comment and red indicates an off-topic comment. Number i is the i-th comment.
Keywords- topic shift; online discussions; structural context

I. INTRODUCTION

While a nested conversation in social media usually starts from the topics discussed in the initial post, the topic often changes through comments and replies, leading to topic shift or topic drift [1]. Automatically detecting topic shift in online discussions can help to capture the main clues of discussion threads and to filter irrelevant replies, bringing the conversation back on topic and improving members' experiences of online communities, especially on question-and-answer sites [2]. Conventional topic shift detection models are based on comparing the text similarity between comments and the initial post [3][4], but comments in online discussions are short, omit background information, and are sometimes sparse with many co-referenced expressions [5], so traditional topic shift detection models that use only literal similarity may not work well. Figure 1 illustrates a real discussion thread of user comments on a news article about the "Texas serial bomber's video". In Figure 1, comments 7, 8 and 9 discuss the phone the "bomber" used, which is related to the news article, but comments 2, 3 and 4 (colored red) talk about the "Browns in the Super Bowl", which is clearly a shift from the original topic. Neither group of comments shares any words with the news article, so comments in both groups would be identified as "topic shift" if we used the text similarity value as the metric. To address this issue, in this paper we propose a topic shift detection model that uses word embedding as the vector representation to build the semantic relationship between comments and the post, and uses the tree structure that each discussion thread inherently exhibits as context information to enrich the background knowledge for each comment. Experiments show that our model effectively improves topic shift detection performance.

II. TOPIC SHIFT DETECTION MODEL

Word embedding trained on a large corpus has been shown to capture the semantic and syntactic features of words, so that similar words are close to each other in the embedding space [5]. Compared to word occurrence measurement, word embedding can improve the accuracy of topic shift detection, but it may have issues with topic-related comments of low semantic similarity. For example, comment 6 in Figure 1, asking about the video content, is related to the topic but may be incorrectly classified into the "topic shift" group even using word embedding, because its semantic similarity with post 0 is also low. We thus need to introduce more clues from the contextual environment.

In online discussions, users can easily participate by submitting comments or writing replies to those that draw their attention. In writing a reply, a user reads the initial post or headline, browses the comments and selects one for a reply. By writing a reply, a user explicitly expresses their interest in the topic(s) of the discussion thread, thereby enlarging the discussion tree by adding leaf nodes. The main topics of a reply may not be closely related to comments located at a distance in the discussion thread, but will definitely be responsive to the comment it directly replies to. We thus design our model based on the intuition that the topic distribution of a node can be inferred from its parent, children and sibling nodes besides itself. Figure 2 shows the tree structure of the example in Figure 1 and its topic shift detection process using word embedding and structural context, following the above intuition. First, we use word embedding to obtain the matrix of vectors. In this paper, we choose the 100-dimensional GloVe² word embedding pre-trained on Wikipedia as the vector representations. We

¹ https://www.reddit.com/r/news/comments/867njq/texas_serial_bomber_made_video_confession_before/?st=juj3moys&sh=3a890c20
² https://nlp.stanford.edu/projects/glove/
978-1-7281-2607-4/19/$31.00 ©2019 IEEE 948
DOI 10.1109/COMPSAC.2019.00155
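The first step of Section II — turning each comment into a vector and scoring it against the root post by cosine similarity — can be sketched as follows. This is a minimal illustration, not the authors' code: the whitespace tokenizer and the tiny random `glove` dictionary are stand-ins for a real pre-trained 100-dimensional GloVe lookup.

```python
import numpy as np

def comment_vector(text, glove, dim=100):
    """Mean of the word vectors of the comment's tokens; zeros if none are known."""
    vecs = [glove[w] for w in text.lower().split() if w in glove]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0

# Toy stand-in for the 100-dimensional GloVe lookup table used in the paper.
rng = np.random.default_rng(0)
glove = {w: rng.normal(size=100) for w in
         "texas serial bomber made video confession what are the chances we ever see".split()}

root = comment_vector("Texas serial bomber made video confession", glove)
reply = comment_vector("What are the chances we ever see the video?", glove)
print(cosine(root, reply))  # raw similarity of comment 1 to post 0
```

In practice the lookup would be loaded from the pre-trained GloVe file (e.g. glove.6B.100d), one `word v1 ... v100` line at a time, and stacking the per-comment vectors row by row yields the vector matrix for the whole discussion tree.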
Figure 2. Tree structure and topic shift detection process of the example in Figure 1: word embedding produces the vector matrix, cosine similarity to the root is computed, structural context information is added bottom-up over levels 1-4, and nodes are classified. Blue indicates an on-topic comment and red indicates an off-topic comment; the shade of color represents topic similarity to the root, the deeper the shade, the larger the similarity.
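The bottom-up re-scoring that Figure 2 depicts — combining each node's own cosine similarity with those of its parent, children and siblings — can be sketched as below. The weights 0.56/0.24/0.15/0.05 are the values reported in the paper; the reply tree follows Figure 2, the raw similarity values are made up for illustration, and feeding already-updated child scores into their parents during the bottom-up pass is our reading of the paper, not a confirmed detail.

```python
# Weights reported in the paper: node itself, parent, children, siblings.
WI, WP, WC, WS = 0.56, 0.24, 0.15, 0.05

def depth(n, parent):
    """Depth of node n in the reply tree (the root post has depth 0)."""
    d = 0
    while parent.get(n) is not None:
        n, d = parent[n], d + 1
    return d

def smooth(sim, parent, children):
    """Bottom-up pass: S'i = wi*Si + wp*Sp + (wc/Mi)*sum(Sc) + (ws/Ni)*sum(Ss)."""
    new = {}
    # Deepest level first, as in Figure 2's "bottom up" computation.
    for i in sorted(sim, key=lambda n: depth(n, parent), reverse=True):
        s = WI * sim[i]
        p = parent.get(i)
        if p is not None:
            s += WP * sim[p]
            sibs = [c for c in children.get(p, []) if c != i]
            if sibs:  # Ni siblings, averaged
                s += WS * sum(sim[c] for c in sibs) / len(sibs)
        kids = children.get(i, [])
        if kids:  # Mi children; deeper nodes were already updated (assumption)
            s += WC * sum(new[c] for c in kids) / len(kids)
        new[i] = s
    return new

# Reply tree of Figure 2 and illustrative raw cosine similarities to the root.
parent = {1: 0, 7: 0, 5: 1, 2: 1, 8: 7, 6: 5, 3: 2, 9: 8, 4: 3}
children = {0: [1, 7], 1: [5, 2], 7: [8], 5: [6], 2: [3], 8: [9], 3: [4]}
sim = {0: 1.0, 1: 0.6, 7: 0.5, 5: 0.55, 2: 0.05, 8: 0.45,
       6: 0.2, 3: 0.05, 9: 0.25, 4: 0.05}
scores = smooth(sim, parent, children)
# With these values, the off-topic chain 2-3-4 keeps lower scores than
# comments 6 and 9, which are pulled up by their on-topic parents.
```

A final threshold on the smoothed scores would separate "topic shift" from on-topic comments; the paper does not state the threshold, so it would have to be tuned on labelled data.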
can obtain the matrix corresponding to the word embedding for the entire discussion tree, with each row being a comment's vector representation. Next, we compute the cosine similarity between each comment and the initial post, namely the root node of the tree, and rank the comments by their values; the darker the color in the figure, the larger the value. We can see that comments 2, 3 and 4 are "topic shifted" and comments 6 and 9 are "not topic shifted", but because they all have low values, it is difficult to tell them apart. We thus use structural context in the next step to provide additional background information for each node. We calculate the topic similarity S'i of node i from its original cosine similarity value Si and the similarity values of its parent Sp, children Sc and siblings Ss:

S'i = wi·Si + wp·Sp + (wc/Mi)·Σ Sc + (ws/Ni)·Σ Ss

where Mi and Ni are respectively the numbers of children and siblings of node i (if any), and w is the weight. Experiments on 800 comments show that the topic reliance of a node on the four types of nodes above is in descending order: wi > wp > wc > ws. This is straightforward: the content of the node itself is the most important indicator, and a node replies to the content of its parent but may discuss other aspects, so wp is also important but smaller than wi; the same holds for wc and ws. In the example of Figure 2, wi, wp, wc and ws are set to 0.56, 0.24, 0.15 and 0.05 respectively, according to the statistical results obtained from our experiments. We compute the topic similarity in a "bottom-up" order, from the highest level to level 1. With the newly calculated topic similarity values, we can see that the topics discussed in nodes 6 and 9 are closer to the root, so only nodes 2, 3 and 4 are classified as "topic shift" comments.

III. EXPERIMENT

To reduce domain bias [3], we collected 200 comments from discussion threads in each of four different domains from Yahoo News and Reddit, giving 800 comments in total. We systematically identified a number of main topics in each of the posts and examined whether and how many of those main topics changed as the threads evolved. Using this information, we categorized topical changes into topic shifted (254 comments) and not shifted (546 comments).

With the labelled data as a gold standard, we calculated the precision, recall, and F1 score of our proposed topic shift detection model and of the traditional text-similarity-based model. The comparative results are provided in Figure 3.

Figure 3. Comparison of F1 scores for the data sets of four domains. (Bar chart comparing "Text Similarity" with "Structural Context and Word Embedding" across Entertainment, Sports, Politics and Health; y-axis from 0 to 0.8.)

The results show that our model achieves better performance than the traditional text-similarity-based model in all four domains, with average precision, recall and F1 scores of 0.67, 0.654 and 0.661, compared to 0.61, 0.57 and 0.5985. They also show that it is harder to detect topic shift or connectedness in the entertainment domain than in the other three, because many popular movies, shows and topics are not listed in GloVe, so their relatedness fails to be detected. In the future, we plan to use more data to further develop and test our model.

ACKNOWLEDGMENT

This work was supported by the Ohio Department of Higher Education, the Ohio Federal Research Network and the Wright State Applied Research Corporation under award WSARC-16-00530 (C4ISR: Human-Centered Big Data).

REFERENCES

[1] Lifna, C.S. and Vijayalakshmi, M., 2015. Identifying concept-drift in Twitter streams. Procedia Computer Science, 45, pp. 86-94.
[2] Park, A., Hartzler, A.L., Huh, J., Hsieh, G., McDonald, D.W. and Pratt, W., 2016. "How Did We Get Here?": Topic drift in online health discussions. Journal of Medical Internet Research, 18(11), p. e284.
[3] Topal, K., Koyuturk, M. and Ozsoyoglu, G., 2016. Emotion- and area-driven topic shift analysis in social media discussions. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 510-518. IEEE.
[4] Li, Q., Sun, Y. and Xue, B., 2012. Complex query recognition based on dynamic learning mechanism. Journal of Computational Information Systems, 8, pp. 8333-8340.
[5] Li, C., Duan, Y., Wang, H., Zhang, Z., Sun, A. and Ma, Z., 2017. Enhancing topic modeling for short texts with auxiliary word embeddings. ACM Transactions on Information Systems (TOIS), 36(2), p. 11.