Emerging Topic Detection in Twitter Stream Based On High Utility Pattern Mining
1. Introduction
2. Related Work
2.1. Feature-pivot methods
• In feature-pivot based methods, a topic is expressed as a group of words, and the goal is to determine the word groups that appear simultaneously in a document set.
• Some methods are referenced from (Mathioudakis & Koudas, 2010), (Weng & Lee, 2011), (Zhang et al., 2010), (Petkos et al., 2014; Huang et al., 2015; Gaglio et al., 2015), (Li et al., 2012; Aiello et al., 2013), (Erra et al., 2015).
• Method treats hashtag words as usual terms and popular hashtag words can be used as the words representing emerging topics.
2.2. Document-pivot methods
• The document-pivot based methods are generally characterized according to the method used to represent documents and measure the similarity between documents.
• (Phuvipadawat & Murata, 2010; Sankaranarayanan et al., 2009; O’Connor et al., 2010), (Becker et al., 2011), (Petrovic et al., 2010), (Zhou & Chen, 2014).
2.3. Probabilistic topic model
• A topic is expressed as a probability distribution of words and documents are considered to be the probability distributions of topics.
• (Quercia et al., 2012; Blei et al., 2003; Hofmann, 1999), (Kim et al., 2012).
3. High Utility Pattern Mining
• Definition 1 [Transaction table]: If a tweet is considered
to be a transaction, words in a tweet can be treated as items
together with the word frequency in the tweet.
• Definition 2 [External utility, Internal utility, Utility]: The
external utility of an item i is the value of the item,
expressed as eu(i). The internal utility of item i is
the frequency of the item in transaction T, expressed as
iu(i,T).
The utility of item i for the transaction T, u(i,T), is:
$u(i,T) = eu(i) \times iu(i,T)$ (1)
• Definition 3 [Itemset utility, Transaction utility,
Transaction-weighted utility]: Let X denote a subset of the
items included in transaction T. The utility of the itemset X,
u(X,T), the transaction utility of the transaction T, tu(T), and
the transaction-weighted utility, twu(X), are:
$u(X,T) = \sum_{i \in X} u(i,T)$ (2)
$tu(T) = \sum_{i \in T} u(i,T)$ (3)
$twu(X) = \sum_{X \subseteq T} tu(T)$ (4)
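A minimal Python sketch of Definitions 1 to 3, assuming the usual HUPM forms: u(i,T) = eu(i) × iu(i,T), u(X,T) and tu(T) as sums of item utilities, and twu(X) as the sum of tu(T) over transactions containing X. The sample tweets and the external utility values in `eu` are made up for illustration; they are not the word utilities computed by the proposed method.

```python
from collections import Counter

def item_utility(item, transaction, eu):
    """u(i, T) = eu(i) * iu(i, T), where iu is the word count in the tweet."""
    return eu[item] * transaction.get(item, 0)

def itemset_utility(itemset, transaction, eu):
    """u(X, T): sum of item utilities over X; 0 if X is not contained in T."""
    if not all(i in transaction for i in itemset):
        return 0
    return sum(item_utility(i, transaction, eu) for i in itemset)

def transaction_utility(transaction, eu):
    """tu(T): sum of the utilities of all items in T."""
    return sum(item_utility(i, transaction, eu) for i in transaction)

def twu(itemset, transactions, eu):
    """twu(X): sum of tu(T) over all transactions T that contain X."""
    return sum(
        transaction_utility(t, eu)
        for t in transactions
        if all(i in t for i in itemset)
    )

# Definition 1: each tweet becomes a transaction of word -> frequency.
tweets = ["storm hits coast storm", "coast flooding", "storm warning coast"]
transactions = [Counter(t.split()) for t in tweets]

# Hypothetical external utilities, chosen only for this example.
eu = {"storm": 2.0, "hits": 1.0, "coast": 1.5, "flooding": 3.0, "warning": 1.0}

print(item_utility("storm", transactions[0], eu))               # 4.0
print(itemset_utility({"storm", "coast"}, transactions[0], eu)) # 5.5
print(transaction_utility(transactions[0], eu))                 # 6.5
print(twu({"storm", "coast"}, transactions, eu))                # 11.0
```

Since twu(X) counts the full utility of every transaction containing X, it is never smaller than the summed utility of X itself, which is why HUPM algorithms commonly use it as a pruning bound.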
4. Topic Detection based on High Utility Pattern Mining
4.1. Computation of Utility for Words
Tweets generated in time order are denoted as Ti, and
the Twitter stream as TS = T1, T2, T3, … TS is
represented as a sequence of batches B1, B2, B3, …
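The split of the stream TS into batches B1, B2, B3, … can be sketched as follows. The slides do not say whether a batch is bounded by tweet count or by time, so the fixed `batch_size` here is an assumption made only for illustration, and `batches` is a hypothetical helper name.

```python
from itertools import islice

def batches(stream, batch_size):
    """Yield consecutive batches of at most batch_size tweets, in arrival order.

    Assumption: batches are count-based; a time-based split would work similarly.
    """
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

# TS = T1, T2, T3, ... represented here by placeholder strings.
ts = ["T1", "T2", "T3", "T4", "T5"]
for k, b in enumerate(batches(ts, 2), start=1):
    print(f"B{k}:", b)
```

Because the function consumes an iterator lazily, it also works on an unbounded stream, holding only one batch in memory at a time.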
[Equation residue: labels "lower-bound" and "upper-bound"; min-util = avg( … ) (10)]
4.4. Extraction of Actual Topic Patterns
• TP-Tree (Topic-tree) was constructed to effectively remove the redundancy from the candidate topic
patterns.
5. Experimental Results
5.3. Measures for Performance Evaluation
5.4. Performance Comparison
• Tested with a setting of 27 combinations.
6. Conclusions
In this paper, the authors proposed a method for
detecting topics from Twitter streams using HUPM. The
proposed method includes a stage for calculating the
utilities of words in each batch of tweets using the sliding
window technique, a stage for determining the min-util for
each batch, and a stage for extracting actual topic patterns
from the candidate topic patterns using the TP-Tree.
THANK YOU!