
Emerging Topic Detection in Twitter Stream Based On High Utility Pattern Mining

The paper presents a novel method for detecting topics in Twitter streams using High Utility Pattern Mining (HUPM), which integrates word frequency and utility based on growth rates. It introduces a dynamic minimum utility threshold and a Topic-tree (TP-Tree) for post-processing to refine candidate topic patterns. Experimental results show that this method outperforms existing techniques in terms of topic recall and efficiency across various datasets.

Uploaded by

Huyen Ngoc


EMERGING TOPIC DETECTION IN TWITTER STREAM BASED ON HIGH UTILITY PATTERN MINING
Hyeok-Jun Choi - Cheong Hee Park
Topic detection is the process of extracting trending issues and summarizing them as useful information.
The paper proposes a topic detection method for Twitter using High Utility Pattern Mining (HUPM). The proposed method considers both the frequency of words over tweets and the utility of words, which is defined based on the growth rate of their appearance frequency. It also presents a technique to dynamically determine the minimum utility threshold for each chunk of tweets.
Finally, it defines a Topic-tree (TP-Tree) for post-processing, which extracts actual topic patterns from the candidate topic patterns generated by HUPM.

1. Introduction
2. Related Work
2.1. Feature-pivot methods
• In feature-pivot based methods, a topic is expressed as a group of words, and the goal is to determine the word groups that appear simultaneously in a document set.
• Methods of this kind include (Mathioudakis & Koudas, 2010), (Weng & Lee, 2011), (Zhang et al., 2010), (Petkos et al., 2014; Huang et al., 2015; Gaglio et al., 2015), (Li et al., 2012; Aiello et al., 2013), and (Erra et al., 2015).
• Some methods treat hashtag words as ordinary terms, so that popular hashtag words can be used as the words representing emerging topics.
2.2. Document-pivot methods
• Document-pivot based methods are generally characterized by the method used to represent documents and to measure the similarity between documents.
• (Phuvipadawat & Murata, 2010; Sankaranarayanan et al., 2009; O’Connor et al., 2010), (Becker et al., 2011), (Petrovic et al., 2010), (Zhou & Chen, 2014).
2.3. Probabilistic topic model
• A topic is expressed as a probability distribution over words, and documents are considered to be probability distributions over topics.
• (Quercia et al., 2012; Blei et al., 2003; Hofmann, 1999), (Kim et al., 2012).
3. High Utility Pattern Mining
• Definition 1 [Transaction table]: If a tweet is considered to be a transaction, the words in the tweet can be treated as items, together with each word's frequency in the tweet.
• Definition 2 [External utility, Internal utility, Utility]: The external utility of item i is the value of the item, denoted eu(i). The internal utility of item i represents the frequency of the item in transaction T, denoted iu(i,T).
The utility of item i in transaction T, u(i,T), is:
$$u(i,T) = eu(i) \times iu(i,T) \quad (1)$$
• Definition 3 [Itemset utility, Transaction utility, Transaction-weighted utility]: Let X denote a subset of the items included in transaction T. The utility of itemset X, u(X,T), the transaction utility of transaction T, tu(T), and the transaction-weighted utility of X, twu(X), are:
$$u(X,T) = \sum_{i \in X} u(i,T) \quad (2)$$
$$tu(T) = \sum_{i \in T} u(i,T) \quad (3)$$
$$twu(X) = \sum_{T:\, X \subseteq T} tu(T) \quad (4)$$
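Definitions 1 to 3 can be illustrated with a small Python sketch. It assumes a tweet is stored as a dict mapping each word to its in-tweet frequency iu(i,T), and that the eu(i) values are supplied as a dict; the toy tweets and utilities are made up for illustration and do not come from the paper.

```python
from typing import Dict, FrozenSet, List

# One transaction (tweet): word -> internal utility iu(i, T), i.e. its frequency in the tweet.
Transaction = Dict[str, int]


def item_utility(word: str, tweet: Transaction, eu: Dict[str, float]) -> float:
    """Eq. (1): u(i, T) = eu(i) * iu(i, T)."""
    return eu[word] * tweet[word]


def itemset_utility(itemset: FrozenSet[str], tweet: Transaction, eu: Dict[str, float]) -> float:
    """Eq. (2): u(X, T) = sum of u(i, T) over i in X; X must be contained in T."""
    return sum(item_utility(w, tweet, eu) for w in itemset)


def transaction_utility(tweet: Transaction, eu: Dict[str, float]) -> float:
    """Eq. (3): tu(T) = sum of u(i, T) over every item i in T."""
    return sum(item_utility(w, tweet, eu) for w in tweet)


def transaction_weighted_utility(
    itemset: FrozenSet[str], table: List[Transaction], eu: Dict[str, float]
) -> float:
    """Eq. (4): twu(X) = sum of tu(T) over all transactions T that contain X."""
    return sum(transaction_utility(t, eu) for t in table if itemset.issubset(t))


# Toy transaction table (hypothetical data).
table = [{"goal": 2, "arsenal": 1}, {"goal": 1, "final": 1}, {"arsenal": 1, "final": 2}]
eu = {"goal": 3.0, "arsenal": 2.0, "final": 1.0}

print(transaction_utility(table[0], eu))                             # 8.0
print(transaction_weighted_utility(frozenset({"goal"}), table, eu))  # 12.0
```

Every transaction that contains a superset of X also contains X, so twu never increases as an itemset grows; this anti-monotone property is what makes twu useful for pruning.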
4. Topic Detection based on High Utility Pattern Mining
4.1. Computation of Utility for Words
Tweets generated in time order are denoted Ti, and the Twitter stream is TS = T1, T2, T3, …. TS is represented as a sequence of batches B1, B2, B3, ….
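The batching of TS can be sketched as below, assuming each batch holds a fixed number of tweets. The slide does not say how batch boundaries are chosen (a fixed time interval would be an equally plausible choice), and `batch_size` is a hypothetical parameter.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Split a tweet stream T1, T2, ... into consecutive batches B1, B2, ..."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


tweets = ["t1", "t2", "t3", "t4", "t5"]
print(list(batches(tweets, 2)))  # [['t1', 't2'], ['t3', 't4'], ['t5']]
```

Because it consumes the stream lazily, the generator also works on an unbounded iterator of incoming tweets.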

Denote the frequency of word i in batch Bt as f(Bt,i). The difference in the frequency of word i between the current batch BL and the previous batch BL-1 is:
$$diff(i) = f(B_L,i) - f(B_{L-1},i) \quad (5)$$

• diff(i) > 0: the frequency of word i is increasing.

• diff(i) < 0: the frequency of word i is decreasing.
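Eq. (5) can be computed per batch with a word counter. This sketch assumes f(B,i) is the raw number of occurrences of word i over all tweets in the batch (the slide does not say whether it counts occurrences or tweets) and scores only words that appear in the current batch; both are assumptions.

```python
from collections import Counter
from typing import Dict, List

Batch = List[List[str]]  # a batch is a list of tokenized tweets


def word_frequencies(batch: Batch) -> Counter:
    """f(B, i): total occurrences of each word i over all tweets in batch B."""
    counts: Counter = Counter()
    for tokens in batch:
        counts.update(tokens)
    return counts


def frequency_difference(current: Batch, previous: Batch) -> Dict[str, int]:
    """Eq. (5): diff(i) = f(B_L, i) - f(B_{L-1}, i) for each word i seen in B_L."""
    f_cur = word_frequencies(current)
    f_prev = word_frequencies(previous)  # a Counter returns 0 for unseen words
    return {word: f_cur[word] - f_prev[word] for word in f_cur}


previous = [["goal", "arsenal"], ["weather"]]
current = [["goal", "arsenal"], ["goal", "final"], ["goal"]]
print(frequency_difference(current, previous))  # {'goal': 2, 'arsenal': 0, 'final': 1}
```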

The rate of frequency increase of word i:

Rate(i) = (6)

Figure 1. The flowchart of the proposed method for emerging topic detection in a Twitter stream.

The external utility for word i included in batch BL: (7)
4.2. Determining a Minimum Utility Threshold

• Denote the set of all tweet posts containing the words in X as s(X), and the length of a tweet post T as l(T). Eq. (4) can then be written as:
$$twu(X) = \sum_{T \in s(X)} tu(T) \quad (4)$$
(8)
• α: the utility average of words
• β: the average length of tweets
• γ = |s(X)|: the number of tweets containing X
• The number of selected words = (9)
• lower-bound
• upper-bound
• min-util = avg(lower-bound, upper-bound) (10)

4.3. Generation of Candidate Topic Patterns

In the HUPM described in (Liu & Qu, 2012), most of the patterns generated in this manner include redundant patterns, in which some short patterns appear repeatedly inside longer patterns. The paper calls the patterns generated by HUPM candidate topic patterns and applies post-processing to eliminate the redundant patterns.
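Candidate topic pattern generation can be sketched with a simplified level-wise miner that prunes with twu (Eq. (4)) in the style of Two-Phase algorithms. This is not the algorithm of (Liu & Qu, 2012) used in the paper; `min_util` stands for the per-batch threshold from Section 4.2, and `max_len` is a hypothetical cap on pattern length.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Transaction = Dict[str, int]  # word -> iu(i, T)


def tu(t: Transaction, eu: Dict[str, float]) -> float:
    """Transaction utility tu(T)."""
    return sum(eu[w] * q for w, q in t.items())


def twu(x: FrozenSet[str], table: List[Transaction], eu: Dict[str, float]) -> float:
    """Transaction-weighted utility: sum of tu(T) over transactions containing X."""
    return sum(tu(t, eu) for t in table if x.issubset(t))


def utility(x: FrozenSet[str], table: List[Transaction], eu: Dict[str, float]) -> float:
    """Total utility of X: sum of u(X, T) over transactions containing X."""
    return sum(sum(eu[w] * t[w] for w in x) for t in table if x.issubset(t))


def mine_high_utility(
    table: List[Transaction], eu: Dict[str, float], min_util: float, max_len: int = 3
) -> Dict[FrozenSet[str], float]:
    """Level-wise search that prunes with twu, then keeps itemsets with utility >= min_util.

    twu(X) upper-bounds the utility of X and of every superset of X, so an
    itemset with twu < min_util can be dropped together with all its supersets.
    """
    items = sorted({w for t in table for w in t})
    level = [frozenset([w]) for w in items if twu(frozenset([w]), table, eu) >= min_util]
    result: Dict[FrozenSet[str], float] = {}
    k = 1
    while level and k <= max_len:
        for x in level:
            u = utility(x, table, eu)
            if u >= min_util:
                result[x] = u
        # Join surviving k-itemsets into (k+1)-itemsets whose k-subsets all survived.
        survivors = set(level)
        k += 1
        next_level = set()
        for a, b in combinations(level, 2):
            union = a | b
            if (
                len(union) == k
                and all(frozenset(s) in survivors for s in combinations(union, k - 1))
                and twu(union, table, eu) >= min_util
            ):
                next_level.add(union)
        level = sorted(next_level, key=sorted)
    return result


table = [{"goal": 2, "arsenal": 1}, {"goal": 1, "final": 1}, {"arsenal": 1, "final": 2}]
eu = {"goal": 3.0, "arsenal": 2.0, "final": 1.0}
for pattern, u in mine_high_utility(table, eu, min_util=7.0).items():
    print(sorted(pattern), u)
# ['goal'] 9.0
# ['arsenal', 'goal'] 8.0
```

The output already shows the redundancy described above: the short pattern ['goal'] reappears inside the longer ['arsenal', 'goal'].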
4.4. Extraction of Actual Topic Patterns
• A TP-Tree (Topic-tree) is constructed to effectively remove the redundancy from the candidate topic patterns.

• The utility of a pattern p, PU(p), is defined as follows:

• The term $\sum_{i \in p} eu(i)$ denotes the sum of the external utilities of the words included in pattern p.
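A much simpler stand-in for the TP-Tree's redundancy removal is to drop every candidate pattern that is contained in a longer candidate. The sketch below does this with plain set comparisons; it is not the TP-Tree construction from the paper and does not reproduce PU(p). It only computes the sum of external utilities of a pattern's words, the term described above; the candidate patterns and eu values are hypothetical.

```python
from typing import Dict, FrozenSet, List


def external_utility_sum(pattern: FrozenSet[str], eu: Dict[str, float]) -> float:
    """Sum of eu(i) over the words i included in pattern p."""
    return sum(eu[w] for w in pattern)


def remove_subsumed(candidates: List[FrozenSet[str]]) -> List[FrozenSet[str]]:
    """Keep only candidates that are not a proper subset of another candidate."""
    unique = set(candidates)
    maximal = [p for p in unique if not any(p < q for q in unique)]
    # Deterministic order: longer patterns first, then alphabetically.
    return sorted(maximal, key=lambda p: (-len(p), sorted(p)))


candidates = [
    frozenset({"goal"}),
    frozenset({"goal", "arsenal"}),
    frozenset({"goal", "arsenal", "wembley"}),
    frozenset({"final"}),
]
eu = {"goal": 3.0, "arsenal": 2.0, "wembley": 1.5, "final": 1.0}
for p in remove_subsumed(candidates):
    print(sorted(p), external_utility_sum(p, eu))
# ['arsenal', 'goal', 'wembley'] 6.5
# ['final'] 1.0
```

Pure subset elimination compares every pair of candidates, which is quadratic; a tree over shared word prefixes is one way to avoid that.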
5. Experimental Results

5.1. Twitter Data
• FA Cup Final (FA): ground-truth data includes 13 topics in 13 intervals.
• Super Tuesday (ST): ground-truth data contains 22 topics in 8 intervals.
• US Elections (US): ground-truth information is given for 64 topics in 26 intervals.

5.2. Data Preprocessing
1. All characters were converted to lowercase.
2. Tweets collected through the Twitter Streaming API often include HTML tags. The HTML tags were replaced with white space, and URLs included in the tweets were removed.
3. Tokenization, including all hashtags, was performed with Lucene’s StandardAnalyzer.
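The three preprocessing steps can be sketched in Python as follows. Step 3 uses a simple regular-expression tokenizer that keeps hashtags as tokens; this is a stand-in, not Lucene’s StandardAnalyzer, and it will tokenize some inputs differently. The tag and URL patterns are assumptions.

```python
import re
from typing import List

HTML_TAG = re.compile(r"<[^>]+>")      # anything between angle brackets
URL = re.compile(r"https?://\S+")      # http(s) links up to the next whitespace
TOKEN = re.compile(r"#?\w+")           # words, optionally prefixed with '#'


def preprocess(tweet: str) -> List[str]:
    text = tweet.lower()               # step 1: lowercase all characters
    text = HTML_TAG.sub(" ", text)     # step 2a: HTML tags -> white space
    text = URL.sub(" ", text)          # step 2b: remove URLs
    return TOKEN.findall(text)         # step 3: tokenize, keeping hashtags


print(preprocess("Goal! <b>Arsenal</b> wins #FACup http://t.co/abc"))
# ['goal', 'arsenal', 'wins', '#facup']
```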
5.3. Measures for Performance Evaluation

• Topic recall and topic relevance: Topic recall is the ratio of ground-truth topics that are successfully detected. Topic relevance is the ratio of the topics found by a method that match some ground-truth topic.
• Keyword precision and keyword recall: Keyword precision is the ratio of correctly detected keywords to the total number of keywords of the found topics that match some ground-truth topic.
• F-measure: computed from keyword precision and keyword recall.
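The measures reduce to simple ratios. The slide does not spell out how found topics are matched to ground-truth topics or how keyword recall is defined, so this sketch takes those counts and the recall value as inputs, and it assumes the F-measure is the standard F1 (harmonic mean). The example counts are hypothetical.

```python
def ratio(numerator: int, denominator: int) -> float:
    """Safe ratio that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def topic_recall(num_ground_truth_detected: int, num_ground_truth: int) -> float:
    """Share of ground-truth topics that were successfully detected."""
    return ratio(num_ground_truth_detected, num_ground_truth)


def topic_relevance(num_found_matched: int, num_found: int) -> float:
    """Share of found topics that match some ground-truth topic."""
    return ratio(num_found_matched, num_found)


def keyword_precision(num_correct_keywords: int, num_keywords_in_matched_topics: int) -> float:
    """Correct keywords over all keywords of the found topics that match a ground-truth topic."""
    return ratio(num_correct_keywords, num_keywords_in_matched_topics)


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of keyword precision and keyword recall (standard F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical counts: 10 of 13 ground-truth topics detected.
print(round(topic_recall(10, 13), 3))  # 0.769
```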
5.4. Performance Comparison
• Tests were run with 27 combinations of settings.

5.5. Parameter Sensitivity

• Figure 4 compares performance depending on the parameter selection.
• It shows the topic recall as three parameters were varied.
• Overall, for all three data sets, topic recall was good when the first parameter was above 0.025, the second was in the range of 200 to 400, and the third was in the range of 70 to 90.
6. Conclusions
In this paper, the authors proposed a method for detecting topics in Twitter streams using HUPM. The method consists of three stages: calculating the utilities of words in each batch of tweets with a sliding-window technique, determining the min-util for each batch, and extracting actual topic patterns from the candidate topic patterns using the TP-Tree.

The authors experimentally compared the topic detection performance of the proposed method with that of other methods. Notably, the proposed method achieved a topic recall 5% higher than the other compared methods on the ST dataset, 6% higher on the US Elections dataset, and 8% higher on the FA dataset. In terms of time spent on topic detection, the proposed method showed short running times on all three datasets.
THANK YOU!
