
Lecture 7: Topic Modeling

Pilsung Kang
School of Industrial Management Engineering
Korea University
AGENDA
01 Topic Modeling
02 Probabilistic Latent Semantic Analysis
03 LDA: Document Generation Process
04 LDA Inference: Gibbs Sampling
05 LDA Evaluation

2/122
Topic Model: Conceptual Approach
• Topic Model
✓ From an input corpus and the number of topics K → assign words to topics

3/122
Topic Model: Conceptual Approach
• Topic Model
✓ For each document, what topics are expressed by that document?

4/122
Topic Model: Conceptual Approach
Knispelis (2015)

5/122
Topic Models: Topic Extraction
Kim et al. (2016)

• Topic Extraction
✓ 30 Topics discovered for “Deep Learning”

Fault detection with DBN: deep, belief, network, dbn, fault
Convolutional neural network: neural, convolutional, pool, convolution, convnet
Network Learning: layer, input, output, unit, hide, function
Representation learning: feature, level, extract, learn, extraction
Face Recognition: face, recognition, estimation, facial, shape
Speech Recognition: speaker, speech, noise, adaptation, source
Acoustic Modeling: speech, recognition, acoustic, hmm, neural
Extreme Learning: deep, learn, algorithm, structure, extreme
Deep learning architecture: deep, architecture, neural, standard, explore
Image Segmentation: image, scene, scale, segmentation, pixel

Long-short term memory: term, recurrent, long, lstm, network
Predictive analytics: data, prediction, technique, information, research
Signal processing: analysis, filter, signal, component, audio
Classification models: classification, classifier, class, vector, support
Large-scale computing: application, implementation, efficient, process, power
Image quality assessment: domain, state, quality, resolution, relationship
Visual recognition: pattern, process, compute, visual, field
NLP: word, text, language, representation, semantic
Detection using CNN: cnn, detection, convolutional, neural, detect
Action recognition: video, human, temporal, action, track

Learning with few labeled data: train, data, label, few, transfer
Fast learning complexity reduction: fast, reduce, parameter, weight, complexity
Applications for vehicles & robots: time, real, application, drive, vehicle
Image retrieval: image, visual, retrieval, descriptor, attribute
Medical image diagnosis: image, segmentation, disease, cell, medical
Reinforcement learning: learn, question, state, answer, reinforcement
Parameter optimization: train, algorithm, gradient, sample, optimization
Auto encoder: representation, learn, sparse, encode, stack
RBM and variations: machine, boltzmann, rbm, restrict, distribution
Character recognition: recognition, system, character, network, neural
6/122
Topic Models: Topic Extraction
• Topic Extraction
✓ 50 Topics discovered for “Ultrasound/Ultrasonography”
Vascular: plaque, ivus, coronary, intravascular, stent, patient, lesion, mm, ultrasound, area
Prostate: biopsy, prostate, cancer, patient, transrectal, trus, guide, core, ultrasound, rate
Heart: artery, carotid, patient, stenosis, plaque, ultrasound, cardiac, dus, stroke, arterial
CAD: image, ultrasound, method, base, propose, feature, algorithm, segmentation, analysis, result
MSK: joint, patient, disease, score, arthritis, ultrasound, clinical, inflammatory, activity, study
Nerve: block, nerve, ultrasound, guide, patient, pain, anesthesia, surgery, plexus, technique
Tumor: case, lesion, diagnosis, ultrasound, cyst, mass, tumor, finding, ultrasonography, present
OB: ultrasound, fetal, infant, abnormality, prenatal, case, fetus, anomaly, diagnosis, congenital
Surgery: surgery, patient, intraoperative, preoperative, surgical, ultrasound, localization, operative, resection, surgeon
Intervention: guide, patient, complication, treatment, percutaneous, ultrasound, drainage, month, rate, procedure

Osteoporosis: age, ultrasound, child, bone, year, study, fat, qus, body, measure
Cerebral: brain, dog, fus, bbb, ultrasound, blood, study, day, follicle, barrier
ER & ICU: patient, emergency, care, ultrasound, department, bedside, perform, physician, point, cardiac
Cancer: cancer, patient, tumor, stage, eus, gastric, ovarian, endoscopic, ultrasonography, invasion
Lab test: extraction, assist, ultrasound, method, liquid, sample, time, solvent, determination, extract
US general: ultrasound, imaging, technique, clinical, review, application, diagnostic, disease, article, role
Vein: vein, venous, patient, internal, ultrasound, jugular, thrombosis, central, dvt, femoral
Lymph node: node, lymph, patient, biopsy, metastasis, ultrasound, cancer, guide, negative, positive
Lung: lung, chest, ultrasound, patient, pulmonary, lus, pleural, line, radiography, diagnosis
Healthcare: patient, risk, ultrasound, year, study, follow, clinical, factor, month, age

7/122
Topic Models: Topic Extraction
• Topic Extraction
✓ 10 Topics discovered for “Insider Threat”
Insider threat in relational database: data, information, database, leakage, access, detect, transaction, confidential, document, file
Assessment of insider threat: measure, assess, security, behavior, analysis, management, privacy, policy, risk, threat
Insider attacks on communication protocol: attack, agent, scheme, protocol, monitor, mitigation, fraud, damage, psychological, financial
Modeling and framework for insider threat: insider, threat, social, analysis, framework, mitigate, monitor, factor, technical, business
Masquerade detection system: user, behavior, detect, activity, malicious, masquerade, attack, legitimate, abnormal, decoy

Access control for insider threat mitigation: insider, access, user, control, cloud, misuse, trust, risk, abuse, attacker
Network intrusion detection systems: network, detection, intrusion, malicious, traffic, log, event, packet, internet, resource
Feature selection for intrusion detection: detection, algorithm, feature, classification, accuracy, dataset, performance, pattern, learning, random
Malicious domain detection: software, security, system, device, server, malicious, protect, web, architecture, electronic
Miscellaneous: attack, malicious, domain, event, scenario, human, knowledge, ontology, represent, generate

8/122
Topic Models: Relation between Topics
Kim et al. (2016)

• Relation between Topics: Deep Learning

✓ (Topic network figure with clusters including Scalability, Applications, Object/Signal Recognition, Image Processing, Optimization & Advanced Learning, Learning Strategies, NLP/Autoencoder, Deep Learning Structures, and Independent Topics)
9/122
Topic Models: Relation between Topics
Kim and Kang (2018+)

• Relation between Topics: Internet of Things

10/122
Topic Models: Trend Analysis
Lee and Kang (2017)

• Topic trends for “technology and innovation management”

11/122
Topic Model: Document Retrieval Knispelis (2015)

12/122
Topic Model: Document Retrieval Knispelis (2015)

13/122
Topic Model: Document Retrieval Knispelis (2015)

14/122
Topic Model
• Matrix Factorization Approach

✓ If we use singular value decomposition (SVD), it is called latent semantic analysis (LSA)

15/122
Topic Model Helic (2014)

• Disadvantage of LSA
✓ Statistical foundation is missing
✓ SVD assumes normally distributed data
✓ Term occurrence is not normally distributed
✓ Still, it often works remarkably well because matrix entries are weighted (e.g. tf-idf),
and those weighted entries may be approximately normally distributed

16/122
Topic Model Helic (2014)

• Probabilistic Topic Model: Generative Approach


✓ Each document is a probability distribution over topics
✓ Distribution over topics represents the essence of a given document
✓ Each topic is a probability distribution over words
▪ Topic “Education”: school, students, education, university,…
▪ Topic “Budget”: million, finance, tax, program, …

17/122
Topic Model: Generative Approach Helic (2014)

• Model-based methods
✓ Statistical inference is based on fitting a probabilistic model of data
✓ The idea is based on a probabilistic or generative model
✓ Such models assign a probability for observing specific data examples
▪ Observing words in a text document

✓ Generative models are a powerful method to encode specific assumptions about how
unknown parameters interact to create data

• How does it work?
✓ It defines a conditional probability distribution over data given a hypothesis P(D|h)
✓ Given h, we generate data from the conditional distribution P(D|h)
✓ It has many advantages, but the main disadvantage is that fitting the model can be more
complicated than an algorithmic approach

18/122
Topic Model: Generative Approach Helic (2014)

• How does it work?
✓ It defines a conditional probability distribution over data given a hypothesis P(D|h)
✓ Given h, we generate data from the conditional distribution P(D|h)
✓ It has many advantages, but the main disadvantage is that fitting the model can be more
complicated than an algorithmic approach

19/122
Topic Model: Generative Approach Helic (2014)

• (Statistical) inference is the reverse of the generation process


✓ We are given some data D, e.g. a collection of documents
✓ We want to estimate the model, or more precisely the parameters of the hypothesis h,
that are most likely to have generated the data

20/122
Topic Model: Generative Approach
• Process of generative model

21/122
AGENDA
01 Topic Modeling
02 Probabilistic Latent Semantic Analysis
03 LDA: Document Generation Process
04 LDA Inference: Gibbs Sampling
05 LDA Evaluation

22/122
Latent Structure Hofmann (2005)

• Given a matrix that “encodes” data (e.g. a term-document matrix), we have the following potential problems
✓ Too large
✓ Too complicated
✓ Lack of structure
✓ Missing Entries
✓ Noisy Entries, …

• Questions
✓ Is there a simpler way to explain entities?
✓ There might be a latent structure underlying the data
✓ How can we reveal or discover this structure?

23/122
Matrix Decomposition Hofmann (2005)

• Common approach: approximately factorize matrix

✓ Approximation: original matrix ≈ left factor × right factor

• Factors are typically constrained to be “thin”
✓ Dimensionality reduction: the thin factors represent the latent structure

24/122
LSA Decomposition (revisited)
• Reduce the dimensions using SVD

✓ Step 1) Construct the approximated matrix Ak from the original term-document matrix A using SVD

✓ Step 2) Multiply by the transpose of Uk to obtain a k (<<m) by n term-document matrix

✓ Step 3) Apply data mining algorithms
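
A minimal sketch of these three steps, using only numpy and a toy term-document matrix (the variable names and data are illustrative, not from the original slides):

import numpy as np

# Toy term-document matrix A (m terms x n documents); values are raw counts.
A = np.array([
    [1, 2, 0, 0],
    [3, 1, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 4, 1],
], dtype=float)

k = 2  # number of latent dimensions

# Step 1) SVD and rank-k approximation A_k = U_k S_k V_k^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]
A_k = U_k @ S_k @ Vt_k

# Step 2) Project documents into the k-dimensional latent space: U_k^T A (k x n)
doc_latent = U_k.T @ A

# Step 3) Apply downstream algorithms, e.g. cosine similarity between documents
norms = np.linalg.norm(doc_latent, axis=0, keepdims=True)
sim = (doc_latent / norms).T @ (doc_latent / norms)
print(np.round(sim, 2))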


25/122
LSA Decomposition Hofmann (2005)

• Illustrative Example

26/122
Language Model: Naïve Approach Hofmann (2005)

• Maximum likelihood estimation (MLE)

✓ n(d, w): number of occurrences of term w in document d

✓ Zero-frequency problem: terms not occurring in a document get zero probability
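
The MLE formula itself was lost in the export; a standard way to write it, using n(d, w) as defined above, is:

\hat{P}(w \mid d) = \frac{n(d, w)}{\sum_{w'} n(d, w')}

so that any term with n(d, w) = 0 receives probability zero.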

27/122
Language Model: Estimation Problem Hofmann (2005)

• Crucial question
✓ In which way can the document collection be utilized to improve estimates?

✓ Idea: learn from the other documents in the collection, treating each document as an (i.i.d.) sample

28/122
Probabilistic Latent Semantic Analysis (pLSA)
Hofmann (2005)

• Concept expression probability


✓ Estimated based on all documents that are dealing with a concept
✓ “Unmixing” of superimposed concepts is achieved by a statistical learning algorithm
✓ No prior knowledge about concepts is required; context and term co-occurrences are exploited
✓ (Figure: documents – latent concepts, e.g. TRADE – terms such as economic, imports, trade)

29/122
pLSA: Latent Variable Model
Hofmann (2005)

• Structural modeling assumption (mixture model)

✓ Document language model: P(w|d)
✓ Document-specific mixture proportions: P(z|d)
✓ Concept expression probabilities: P(w|z), one distribution per latent concept (topic)
✓ Model fitting: estimate these distributions from the corpus

30/122


pLSA: Matrix Decomposition
Hofmann (2005)

• Mixture model can be written as a matrix factorization

✓ (Figure: the word-document probability matrix factorizes into pLSA document probabilities, concept probabilities, and pLSA term probabilities)

• Contrast to LSA
✓ Non-negativity: every element in U & V is non-negative
✓ Normalization: Each document vector in U and each term vector in V has sum 1
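
Written out, the mixture and its factorized form are (a reconstruction consistent with the P(z|d), P(w|z) notation used on these slides):

P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d)
\qquad
P(d, w) = \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)

The second (symmetric) form is the one that the toy example later on these slides parameterizes with P(z), P(d|z), and P(w|z).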

31/122
pLSA: Graphical Model
Hofmann (2005)

• Graphical Representation
✓ P(z|d): shared by all words in a document
✓ P(w|z): shared by all documents in the collection
✓ Plates: n(d) word positions per document, N documents in the corpus

32/122
pLSA: Parameter Inference
Helic (2014)

• Parameter inference
✓ We will infer parameters using Maximum Likelihood Estimator (MLE)
✓ First, we need to write down the likelihood function
✓ Let n(d, w) be the number of occurrences of word w in document d
✓ P(w|d) is the probability of observing a single occurrence of word w in document d
✓ Then, the probability of observing n(d, w) occurrences of word w in document d is given by P(w|d)^n(d,w)

33/122
pLSA: Parameter Inference
Helic (2014)

• Parameter Inference
✓ The probability of observing the complete document collection is then given by the
product of the probabilities of observing every single word in every document, with the
corresponding number of occurrences
✓ Then, the likelihood function becomes

✓ The log-likelihood function becomes
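
The two formulas did not survive the export; in the notation above they are commonly written as:

L = \prod_{d} \prod_{w} P(w \mid d)^{\,n(d,w)}
\qquad
\ell = \sum_{d} \sum_{w} n(d,w) \, \log \sum_{z} P(w \mid z)\, P(z \mid d)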

34/122
pLSA: Parameter Inference
Helic (2014)

• Parameter Inference
✓ We cannot maximize the likelihood analytically because of the logarithm of the sum

✓ A standard procedure is to use an algorithm called Expectation-Maximization (EM)

✓ This is an iterative method to estimate the parameters of models with latent variables

✓ Each iteration consists of two steps: expectation step (E) and maximization step (M)

35/122
pLSA: EM Algorithm
• E-Step: Posterior probability of latent variables (concepts)
✓ Probability that the occurrence of term w in document d can be “explained” by concept z

• M-Step: Parameter estimation based on “completed” statistics
✓ How often is term w associated with concept z?
✓ How often is document d associated with concept z?
✓ How prevalent is the concept z?
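
The update formulas are missing from the export; in the standard pLSA EM derivation, using the symmetric parameterization P(z), P(d|z), P(w|z) that the toy example on the following slides uses, they read:

E-step:
P(z \mid d, w) = \frac{P(z)\, P(d \mid z)\, P(w \mid z)}{\sum_{z'} P(z')\, P(d \mid z')\, P(w \mid z')}

M-step:
P(w \mid z) \propto \sum_{d} n(d,w)\, P(z \mid d, w), \quad
P(d \mid z) \propto \sum_{w} n(d,w)\, P(z \mid d, w), \quad
P(z) \propto \sum_{d}\sum_{w} n(d,w)\, P(z \mid d, w)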

36/122
pLSA: A Simple Example
• Raw Data

Doc 1 Doc 2 Doc 3 Doc 4 Doc 5 Doc 6


Baseball 1 2 0 0 0 0
Basketball 3 1 0 0 0 0
Boxing 2 0 0 0 0 0
Money 3 3 2 3 2 4
Interest 0 0 3 2 0 0
Rate 0 0 4 1 0 0
Democrat 0 0 0 0 4 3
Republican 0 0 0 0 2 1
Caucus 0 0 0 0 3 2
President 0 0 1 0 2 3
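
As a rough illustration (not the original course code), the EM iterations on this toy count matrix can be sketched as follows; the initialization here is random rather than the specific values shown on the next slides:

import numpy as np

# Toy term-document count matrix: 10 terms x 6 documents, rows/columns as in the table above.
X = np.array([
    [1, 2, 0, 0, 0, 0],   # Baseball
    [3, 1, 0, 0, 0, 0],   # Basketball
    [2, 0, 0, 0, 0, 0],   # Boxing
    [3, 3, 2, 3, 2, 4],   # Money
    [0, 0, 3, 2, 0, 0],   # Interest
    [0, 0, 4, 1, 0, 0],   # Rate
    [0, 0, 0, 0, 4, 3],   # Democrat
    [0, 0, 0, 0, 2, 1],   # Republican
    [0, 0, 0, 0, 3, 2],   # Caucus
    [0, 0, 1, 0, 2, 3],   # President
], dtype=float)

W, D, K = X.shape[0], X.shape[1], 3
rng = np.random.default_rng(0)

# Random initialization of P(z), P(d|z), P(w|z).
p_z = rng.dirichlet(np.ones(K))
p_d_z = rng.dirichlet(np.ones(D), size=K).T   # D x K
p_w_z = rng.dirichlet(np.ones(W), size=K).T   # W x K

for _ in range(50):
    # E-step: P(z|d,w) proportional to P(z) P(d|z) P(w|z), shape W x D x K.
    post = p_z[None, None, :] * p_d_z[None, :, :] * p_w_z[:, None, :]
    post /= post.sum(axis=2, keepdims=True) + 1e-12

    # M-step: re-estimate parameters from the "completed" counts n(d,w) P(z|d,w).
    nz = X[:, :, None] * post                 # W x D x K
    p_w_z = nz.sum(axis=1) / (nz.sum(axis=(0, 1)) + 1e-12)
    p_d_z = nz.sum(axis=0) / (nz.sum(axis=(0, 1)) + 1e-12)
    p_z = nz.sum(axis=(0, 1)) / X.sum()

print(np.round(p_z, 3))      # P(z)
print(np.round(p_d_z, 3))    # P(d|z)
print(np.round(p_w_z, 3))    # P(w|z)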

37/122
pLSA: A Simple Example
• Parameter Initialization

Topic 1 Topic 2 Topic 3


0.525 0.407 0.068

        Topic 1  Topic 2  Topic 3
Doc 1   0.020    0.008    0.048
Doc 2   0.294    0.255    0.329
Doc 3   0.204    0.138    0.178
Doc 4   0.200    0.146    0.007
Doc 5   0.186    0.196    0.233
Doc 6   0.096    0.257    0.205

        Topic 1  Topic 2  Topic 3
Term 1  0.022    0.016    0.010
Term 2  0.018    0.133    0.166
Term 3  0.242    0.058    0.133
Term 4  0.123    0.088    0.145
Term 5  0.016    0.030    0.044
Term 6  0.020    0.167    0.056
Term 7  0.147    0.129    0.201
Term 8  0.188    0.156    0.039
Term 9  0.146    0.114    0.008
Term 10 0.077    0.110    0.199
38/122
pLSA: A Simple Example
• After 1 EM step
Initialization After 1 EM step
Topic 1 Topic 2 Topic 3 Topic 1 Topic 2 Topic 3
0.525 0.407 0.068 0.459 0.430 0.111

Topic 1 Topic 2 Topic 3 Topic 1 Topic 2 Topic 3


Doc 1 0.020 0.008 0.048 Doc 1 0.180 0.077 0.382
Doc 2 0.294 0.255 0.329 Doc 2 0.124 0.089 0.091
Doc 3 0.204 0.138 0.178 Doc 3 0.147 0.213 0.149
Doc 4 0.200 0.146 0.007 Doc 4 0.125 0.110 0.004
Doc 5 0.186 0.196 0.233 Doc 5 0.266 0.204 0.167
Doc 6 0.096 0.257 0.205 Doc 6 0.158 0.308 0.207

39/122
pLSA: A Simple Example
• After 1 EM step
Initialization After 1 EM step
Topic 1 Topic 2 Topic 3 Topic 1 Topic 2 Topic 3
Term 1 0.022 0.016 0.010 Term 1 0.077 0.033 0.028
Term 2 0.018 0.133 0.166 Term 2 0.024 0.074 0.245
Term 3 0.242 0.058 0.133 Term 3 0.061 0.005 0.043
Term 4 0.123 0.088 0.145 Term 4 0.370 0.222 0.295
Term 5 0.016 0.030 0.044 Term 5 0.088 0.093 0.065
Term 6 0.020 0.167 0.056 Term 6 0.033 0.159 0.035
Term 7 0.147 0.129 0.201 Term 7 0.115 0.129 0.129
Term 8 0.188 0.156 0.039 Term 8 0.058 0.058 0.010
Term 9 0.146 0.114 0.008 Term 9 0.099 0.098 0.004
Term 10 0.077 0.110 0.199 Term 10 0.073 0.129 0.146

40/122
pLSA: A Simple Example
• Topic Distribution
✓ Topic distribution changes w.r.t. the EM iterations
(Chart: topic proportions for Topic 1, Topic 2, and Topic 3 over 35 EM iterations; y-axis 0–100%)

41/122
pLSA: A Simple Example
• Final result
Doc 1 Doc 2 Doc 3 Doc 4 Doc 5 Doc 6
Baseball 1 2 0 0 0 0
Basketball 3 1 0 0 0 0
Boxing 2 0 0 0 0 0
Money 3 3 2 3 2 4
Interest 0 0 3 2 0 0
Rate 0 0 4 1 0 0
Democrat 0 0 0 0 4 3
Republican 0 0 0 0 2 1
Caucus 0 0 0 0 3 2
President 0 0 1 0 2 3

Topic 1 Topic 2 Topic 3
0.456   0.281   0.263

            Topic 1  Topic 2  Topic 3
Doc 1       0.000    0.000    0.600
Doc 2       0.000    0.000    0.400
Doc 3       0.000    0.625    0.000
Doc 4       0.000    0.375    0.000
Doc 5       0.500    0.000    0.000
Doc 6       0.500    0.000    0.000

            Topic 1  Topic 2  Topic 3
Baseball    0.000    0.000    0.200
Basketball  0.000    0.000    0.267
Boxing      0.000    0.000    0.133
Money       0.231    0.313    0.400
Interest    0.000    0.312    0.000
Rate        0.000    0.312    0.000
Democrat    0.269    0.000    0.000
Republican  0.115    0.000    0.000
Caucus      0.192    0.000    0.000
President   0.192    0.063    0.000
42/122
pLSA: Example
• Concepts extracted from Science Magazine articles
✓ (Figure: top words ranked by P(w|z) for example concepts)

43/122
pLSA: Example
• Example
✓ Polysemy: a word may have multiple senses and multiple types of usage in different contexts

44/122
pLSA: Example
• Experimental Evaluation
(Charts: average precision and relative improvement in average precision for VSM, LSA, and PLSA on the Medline, CRAN, CACM, CISI, and TREC collections)

✓ Consistent improvements of retrieval accuracy


✓ Relative improvement of average precision: 15-45%

45/122
AGENDA
01 Topic Modeling
02 Probabilistic Latent Semantic Analysis
03 LDA: Document Generation Process
04 LDA Inference: Gibbs Sampling
05 LDA Evaluation

46/122
LDA: Intuition
Blei (2012)

• Documents exhibit multiple topics

47/122
LDA: Intuition
Blei (2012)

• Documents exhibit multiple topics

✓ Each topic is a distribution over words


✓ Each document is a mixture of corpus-wide topics
✓ Each word is drawn from one of those topics
48/122
LDA: Intuition
Blei (2012)

• Documents exhibit multiple topics

✓ In reality, we only observe the documents


✓ The other structures are hidden variables
49/122
LDA: Intuition
Blei (2012)

• Documents exhibit multiple topics

✓ The goal of LDA is to infer the hidden variables


✓ i.e. compute their distribution conditioned on the document
➔ p(topics, proportions, assignments | documents)
50/122
LDA Overview
• Documents exhibit multiple topics
✓ (Plate diagram components: Dirichlet parameter, per-document topic proportions, per-word topic assignment, observed word, topics, topic hyperparameter)

✓ Encode assumptions
✓ Define a factorization of the joint distribution
✓ Connect to algorithms for computing with data
51/122
LDA Overview
• LDA structure

✓ Nodes are random variables while edges indicate dependence


✓ Shaded nodes are observed
✓ Plates indicate replicated variables

52/122
LDA: Document generation process
• Document generation process

✓ Draw each topic φk ~ Dirichlet(β), for k = 1, …, K

✓ For each document d
▪ Draw topic proportions θd ~ Dirichlet(α)
▪ For each word position n
• Draw a topic assignment zd,n ~ Multinomial(θd)
• Draw a word wd,n ~ Multinomial(φ of topic zd,n)
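
A minimal sketch of this generative process in code, assuming the φ/θ/α/β notation above (toy sizes and symmetric hyperparameters, not taken from the slides):

import numpy as np

rng = np.random.default_rng(0)
K, V, D, N = 3, 8, 5, 20          # topics, vocabulary size, documents, words per document
alpha, beta = 0.5, 0.1            # symmetric Dirichlet hyperparameters

# Draw each topic: phi_k ~ Dirichlet(beta), a distribution over the V word types.
phi = rng.dirichlet(np.full(V, beta), size=K)       # K x V

corpus = []
for d in range(D):
    # Draw topic proportions for the document: theta_d ~ Dirichlet(alpha).
    theta = rng.dirichlet(np.full(K, alpha))
    doc = []
    for n in range(N):
        z = rng.choice(K, p=theta)                  # topic assignment z_{d,n}
        w = rng.choice(V, p=phi[z])                 # word w_{d,n} drawn from topic z
        doc.append(w)
    corpus.append(doc)

print(corpus[0])   # word ids of the first generated document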

53/122
LDA: Document generation process
• Document generation process

✓ Term distribution per topic


▪ Drawn from the Dirichlet distribution, given the Dirichlet parameter β, which is a V-dimensional vector with positive components

54/122
LDA: Document generation process
• Document generation process

✓ Term distribution per topic

55/122
LDA: Document generation process
• Document generation process
✓ Term distribution per topic

56/122
LDA: Document generation process
• Document generation process

✓ Topic distribution per document


▪ Drawn from the Dirichlet distribution, given the Dirichlet parameter α, which is a K-dimensional vector with positive components

57/122
LDA: Document generation process
• Document generation process

✓ Topic distribution per document

58/122
LDA: Document generation process
• Document generation process
✓ Topic distribution per document

59/122
LDA: Document generation process
• Document generation process

✓ Topic to words assignments

60/122
LDA: Document generation process
• Document generation process
✓ Topic to words assignments

61/122
LDA: Document generation process
• Document generation process

✓ Probability of a corpus

62/122
LDA: Document generation process
• Document generation process
✓ Word selection

63/122
LDA: Document generation process
• Document generation process
✓ Word selection

64/122
AGENDA
01 Topic Modeling
02 Probabilistic Latent Semantic Analysis
03 LDA: Document Generation Process
04 LDA Inference: Gibbs Sampling
05 LDA Evaluation

65/122
LDA Inference
• LDA structure

✓ From a collection of documents, we infer


▪ Per-word topic assignment
▪ Per-document topic proportions
▪ Per-corpus topic distributions

66/122
LDA Inference
• Inference
✓ The posterior of the latent variables given the document is

✓ Computing the posterior is intractable (we cannot compute the denominator, the
marginal p(w))
✓ Approximate posterior inference algorithms
▪ Mean field variational methods
▪ Expectation propagation
▪ Collapsed Gibbs sampling
▪ Collapsed variational inference
▪ Online variational inference
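
Written out in the usual LDA notation (as in Blei et al., 2003), the posterior referred to above and its intractable denominator are:

p(\theta, \mathbf{z} \mid \mathbf{w}, \alpha, \beta)
  = \frac{p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{p(\mathbf{w} \mid \alpha, \beta)},
\qquad
p(\mathbf{w} \mid \alpha, \beta)
  = \int p(\theta \mid \alpha) \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)\, d\theta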

67/122
LDA: Dirichlet Distribution
• Binomial & Multinomial
✓ Binomial distribution: the number of successes in a sequence of independent yes/no
experiments (Bernoulli trials)

✓ Multinomial distribution: suppose that each experiment results in one of k possible


outcomes with probabilities p1, …, pk; the multinomial models the distribution of the
histogram vector, which indicates how many times each outcome was observed over
N trials of the experiment.

68/122
LDA: Dirichlet Distribution
• Beta distribution

✓ Considering p as the parameter of a Binomial distribution, we can think of the Beta as a
“distribution over distributions” (over binomials)

69/122
LDA: Dirichlet Distribution
• Dirichlet distribution


✓ Two parameters
▪ the scale (or concentration) parameter
▪ the base measure

✓ A generalization of Beta
▪ Beta is a distribution over binomials (on the interval [0, 1])
▪ Dirichlet is a distribution over Multinomials (on the so-called probability simplex)

✓ Dirichlet is the conjugate prior of multinomial

70/122
LDA: Dirichlet Distribution
• Important properties of Dirichlet distribution
✓ Posterior is also Dirichlet
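
The formula on the original slide was not preserved in the export; the property is the standard Dirichlet–multinomial conjugacy:

\theta \sim \mathrm{Dirichlet}(\alpha_1, \ldots, \alpha_K), \quad
\mathbf{n} \mid \theta \sim \mathrm{Multinomial}(\theta)
\;\Rightarrow\;
\theta \mid \mathbf{n} \sim \mathrm{Dirichlet}(\alpha_1 + n_1, \ldots, \alpha_K + n_K)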

71/122
LDA: Dirichlet Distribution
• Important properties of Dirichlet distribution
✓ The parameter α controls the mean shape and sparsity of θ
✓ A Dirichlet with αi < 1 favors extreme (sparse) distributions
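
The following slides show samples for several α values; a small sketch to reproduce that kind of experiment with numpy (the α values mirror the slides, plotting is left out):

import numpy as np

rng = np.random.default_rng(0)
K = 10  # dimension of the sampled multinomials

for alpha in [100, 10, 1, 0.1, 0.01, 0.001]:
    samples = rng.dirichlet(np.full(K, alpha), size=5)
    # Large alpha: samples concentrate near the uniform distribution.
    # alpha < 1: samples become sparse, with most mass on a few components.
    print(f"alpha={alpha}: max component per sample ~ {samples.max(axis=1).round(2)}")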

72/122
LDA: Dirichlet Distribution
• Important properties of Dirichlet distribution
✓ The parameter α controls the mean shape and sparsity of θ

73/122
LDA: Dirichlet Distribution
• Sampling results from Dirichlet with different α
✓ α = 1

74/122
LDA: Dirichlet Distribution
• Sampling results from Dirichlet with different α
✓ α = 10

75/122
LDA: Dirichlet Distribution
• Sampling results from Dirichlet with different α
✓ α = 100

76/122
LDA: Dirichlet Distribution
• Sampling results from Dirichlet with different α
✓ α = 1

77/122
LDA: Dirichlet Distribution
• Sampling results from Dirichlet with different α
✓ α = 0.1

78/122
LDA: Dirichlet Distribution
• Sampling results from Dirichlet with different α
✓ α = 0.01

79/122
LDA: Dirichlet Distribution
• Sampling results from Dirichlet with different α
✓ α = 0.001

80/122
LDA Inference
• We are interested in posterior distribution

✓ Here, latent variables are topic assignments z and topics 𝜃. X is the words (divided
into documents), and Θ are hyper-parameters to Dirichlet distributions: 𝛼 for topic
proportions, 𝛽 for topics

81/122
LDA Inference
• Gibbs Sampling
✓ A form of Markov Chain Monte Carlo
✓ Chain is a sequence of random variable states
✓ Given a state (z1, …, zN) and certain technical conditions, repeatedly drawing each zk
conditioned on all the others results in a
Markov Chain whose stationary distribution is the posterior
✓ For notation, call z-k the vector z with zk removed

82/122
LDA Inference
• Monte Carlo method
✓ Computational algorithms that rely on repeated random sampling to obtain numerical
results
✓ Use randomness to solve problems that might be deterministic in principle
✓ Example: approximating the value of 𝜋
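
A small sketch of that classic example (my own illustration, not the slide's figure): sample points uniformly in the unit square and count how many fall inside the quarter circle.

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Uniform points in the unit square; a point lies in the quarter circle if x^2 + y^2 <= 1.
x, y = rng.random(n), rng.random(n)
inside = (x**2 + y**2 <= 1.0).sum()

# Area of the quarter circle is pi/4, so pi ~ 4 * (fraction of points inside).
print(4 * inside / n)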

83/122
LDA Inference
Murray (2009)

• Markov Chain Monte Carlo sampling


✓ Sampling from a probability distribution based on constructing a Markov Chain that
has the desired distribution as its equilibrium distribution
✓ Use local information rather than complete randomness

84/122
LDA Inference
Murray (2009)

• Gibbs Sampling

85/122
LDA Inference
• Gibbs Sampling

https://www.youtube.com/watch?v=ZaKwpVgmKTY 86/122
LDA Inference
Tang (2008)

• Gibbs Variants
✓ Gibbs Sampling
▪ Draw a conditioned on b, c
▪ Draw b conditioned on a, c
▪ Draw c conditioned on a, b

✓ Block Gibbs Sampling


▪ Draw a, b conditioned on c
▪ Draw c conditioned on a, b

✓ Collapsed Gibbs Sampling


▪ Draw a conditioned on c
▪ Draw c conditioned on a
▪ b is collapsed out during the sampling process

87/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Gibbs sampling procedure boils down to estimate

✓ θ and φ are integrated out

88/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Gibbs sampling procedure boils down to estimate

✓ θ and φ are integrated out

89/122
LDA Inference: Collapsed Gibbs Sampling
Tang (2008)

• Conditional posterior distribution for zi is given by

✓ The first term is the likelihood and the second term acts like a prior

90/122
LDA Inference: Collapsed Gibbs Sampling
Tang (2008)

• Conditional posterior distribution for zi is given by

✓ Here, n_{j,w}^{-i} is the number of instances of word w assigned to topic j, excluding the current one
✓ Using the property of the expectation of the Dirichlet distribution, we have
✓ where n_{j,·}^{-i} is the total number of words assigned to topic j in the corpus, excluding the current one
91/122
LDA Inference: Collapsed Gibbs Sampling
Tang (2008)

• Conditional posterior distribution for zi is given by

✓ where n_{d,j}^{-i} is the number of words assigned to topic j in document d, excluding the current one.

92/122
LDA Inference: Collapsed Gibbs Sampling
Tang (2008)

• Gibbs Sampling Equation

✓ Need to record four count variables


▪ Document-topic count

▪ Document-topic sum

▪ Topic-term count

▪ Topic-term sum
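
The sampling equation itself is missing from the export; with the four counts above, symmetric hyperparameters α and β, vocabulary size V, and K topics, it is commonly written as:

P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \;\propto\;
\frac{n_{j,w_i}^{-i} + \beta}{n_{j,\cdot}^{-i} + V\beta}
\cdot
\frac{n_{d_i,j}^{-i} + \alpha}{n_{d_i,\cdot}^{-i} + K\alpha}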
93/122
LDA Inference: Collapsed Gibbs Sampling
Tang (2008)

• Parameter Estimation
✓ To obtain θ and φ, two ways are possible (draw one sample of z or draw multiple
samples of z to calculate the average)

✓ where n_{j,w} is the frequency of word w assigned to topic j, and n_{d,j} is the number of
words assigned to topic j in document d
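
With those counts, the point estimates take the standard form (a sketch consistent with the notation above):

\hat{\varphi}_{j,w} = \frac{n_{j,w} + \beta}{\sum_{w'} n_{j,w'} + V\beta},
\qquad
\hat{\theta}_{d,j} = \frac{n_{d,j} + \alpha}{\sum_{j'} n_{d,j'} + K\alpha}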

94/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Gibbs Sampling Equation: Another Form

✓ Number of times document d uses topic k


✓ Number of times topic k uses word type wd,n
✓ Dirichlet parameter for document topic distribution
✓ Dirichlet parameter for topic to word distribution
✓ How much this document likes topic k
✓ How much this topic likes word wd,n

95/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Gibbs Sampling Equation: Another Form

✓ Number of times document d uses topic k


✓ Number of times topic k uses word type wd,n
✓ Dirichlet parameter for document topic distribution
✓ Dirichlet parameter for topic to word distribution
✓ How much this document likes topic k
✓ How much this topic likes word wd,n

96/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Gibbs Sampling Equation: Another Form

✓ Number of times document d uses topic k


✓ Number of times topic k uses word type wd,n
✓ Dirichlet parameter for document topic distribution
✓ Dirichlet parameter for topic to word distribution
✓ How much this document likes topic k
✓ How much this topic likes word wd,n

97/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Gibbs Sampling Equation: Another Form

✓ Number of times document d uses topic k


✓ Number of times topic k uses word type wd,n
✓ Dirichlet parameter for document topic distribution
✓ Dirichlet parameter for topic to word distribution
✓ How much this document likes topic k
✓ How much this topic likes word wd,n

98/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Gibbs Sampling Equation: Another Form

✓ Number of times document d uses topic k


✓ Number of times topic k uses word type wd,n
✓ Dirichlet parameter for document topic distribution
✓ Dirichlet parameter for topic to word distribution
✓ How much this document likes topic k
✓ How much this topic likes word wd,n

99/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Gibbs Sampling Equation: Another Form

✓ Number of times document d uses topic k


✓ Number of times topic k uses word type wd,n
✓ Dirichlet parameter for document topic distribution
✓ Dirichlet parameter for topic to word distribution
✓ How much this document likes topic k
✓ How much this topic likes word wd,n

100/122
LDA Inference: Collapsed Gibbs Sampling
Speh et al. (2013)

• Collapsed Gibbs Sampling


✓ Illustrative procedure

101/122
LDA Inference: Collapsed Gibbs Sampling
Speh et al. (2013)

• Collapsed Gibbs Sampling


✓ Illustrative procedure

102/122
LDA Inference: Collapsed Gibbs Sampling
Speh et al. (2013)

• Collapsed Gibbs Sampling


✓ Illustrative procedure

103/122
LDA Inference: Collapsed Gibbs Sampling
Speh et al. (2013)

• Collapsed Gibbs Sampling


✓ Illustrative procedure

104/122
LDA Inference: Collapsed Gibbs Sampling
Speh et al. (2013)

• Collapsed Gibbs Sampling


✓ Illustrative procedure

105/122
LDA Inference: Collapsed Gibbs Sampling
Speh et al. (2013)

• Collapsed Gibbs Sampling


✓ Illustrative procedure

106/122
LDA Inference: Collapsed Gibbs Sampling
Speh et al. (2013)

• Collapsed Gibbs Sampling


✓ Illustrative procedure

107/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Randomly assign topics

108/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Randomly assign topics

109/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Sampling

• What is the conditional distribution for this topic


✓ Part 1: How much does this document like each topic?
✓ Part 2: How much does each topic like the word?

110/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• What is the conditional distribution for this topic


✓ Part 1: How much does this document like each topic?

111/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• What is the conditional distribution for this topic


✓ Part 2: How much does each topic like the word?

112/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• What is the conditional distribution for this topic


✓ Geometric interpretation

Normalization

113/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Update count

114/122
LDA Inference: Collapsed Gibbs Sampling
Boyd-Graber (2014)

• Gibbs Sampling
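
Putting the pieces together, here is a compact collapsed Gibbs sampler sketch for LDA (my own illustration in the spirit of the walkthrough above, not the course's reference implementation):

import numpy as np

def lda_collapsed_gibbs(docs, K, V, alpha=0.1, beta=0.01, iters=200, seed=0):
    """docs: list of lists of word ids in [0, V). Returns (theta, phi) estimates."""
    rng = np.random.default_rng(seed)
    D = len(docs)

    # Count tables: document-topic counts, topic-term counts, and topic-term sums.
    n_dk = np.zeros((D, K))
    n_kw = np.zeros((K, V))
    n_k = np.zeros(K)
    z = []  # current topic assignment of every token

    # Randomly assign topics and initialize the counts.
    for d, doc in enumerate(docs):
        zd = rng.integers(K, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]
                # Remove the current token from the counts.
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Part 1 (how much this document likes each topic) x
                # Part 2 (how much each topic likes this word), unnormalized.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + V * beta)
                k = rng.choice(K, p=p / p.sum())
                # Update the counts with the newly sampled topic.
                z[d][n] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    theta = (n_dk + alpha) / (n_dk.sum(axis=1, keepdims=True) + K * alpha)
    phi = (n_kw + beta) / (n_k[:, None] + V * beta)
    return theta, phi

# Tiny usage example with a 6-word vocabulary.
docs = [[0, 1, 0, 2], [1, 0, 1], [3, 4, 5, 4], [4, 5, 3]]
theta, phi = lda_collapsed_gibbs(docs, K=2, V=6)
print(np.round(theta, 2))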

115/122
AGENDA
01 Topic Modeling
02 Probabilistic Latent Semantic Analysis
03 LDA: Document Generation Process
04 LDA Inference: Gibbs Sampling
05 LDA Evaluation

116/122
LDA Evaluation & Model Selection
Qiu et al. (2014)

• How many topics are optimal?


✓ Split the data into training and test data sets
▪ Log-likelihood for Gibbs sampling

▪ Perplexity (the reciprocal of the geometric mean per-word likelihood) is often used

▪ Topic weights are determined for the new (held-out) data using Gibbs sampling
▪ Term distributions for topics are kept fixed from the training corpus
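
For reference, perplexity on a held-out set with M documents w_d of length N_d is typically computed as (a standard formulation; the exact formula on the slide was lost in the export):

\mathrm{perplexity}(D_{\mathrm{test}})
  = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)

Lower perplexity indicates better generalization to the held-out data.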

117/122
LDA Evaluation & Model Selection
• Model Selection based on Perplexity

118/122
LDA Visualization
• LDAvis

https://nbviewer.jupyter.org/github/bmabey/hacker_news_topic_modelling/blob/master/HN%20Topic%20Model%20Talk.ipynb#topic=10&lambda=1&term= 119/122
120/122
References
Research Papers

• Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84.

• Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993-1022.

• Hofmann, T. (1999, August). Probabilistic latent semantic indexing. In Proceedings of the 22nd annual international ACM SIGIR
conference on Research and development in information retrieval (pp. 50-57). ACM.

• Kim, H., Park, M., & Kang, P. (2016). Analysis of deep learning research trends using topic modeling and social network analysis. Proceedings of the Korean Institute of Industrial Engineers Spring Joint Conference, Jeju, Korea.

• Kim, J. & Kang, P. (2018+). Analyzing International Collaboration and Identifying Core Topics for the “Internet of Things” based on
Network Analysis and Topic Modeling, Under review

• Lee, H. & Kang, P. (2017+). Identifying core topics in technology and innovation management studies: A topic model approach, Journal of
Technology Transfer, Accepted for Publication.

• Qiu, Z., Wu, B., Wang, B., Shi, C., & Yu, L. (2014). Collapsed Gibbs Sampling for Latent Dirichlet Allocation on Spark, JMLR: Workshop and Conference Proceedings 36: 17-28.

121/122
References
Other Materials

• Image on the first page: http://cacm.acm.org/magazines/2012/4/147361-probabilistic-topic-models/fulltext

• Blei, D.M. (2012). Probabilistic Topic Models, ICML’12 Tutorial.

• Boyd-Graber, J. (2014). Topic Models, Natural Language Processing Course, Dept. of Computer Science, University of Colorado Boulder.
(Video Lecture Link)

• Helic, D. (2014). Knowledge Discovery and Data Mining 1: Probabilistic Latent Semantic Analysis.

• Hofmann, T. (2005). Latent Semantic Variable Models, Workshop on Subspace, Latent Structure and Feature Selection Techniques:
Statistical and Optimisation Perspectives, Bohinj 2005.

• Knispelis, A. (2015). LDA Topic Models. Slides: https://issuu.com/andriusknispelis/docs/topic_models_-_video, YouTube video: https://www.youtube.com/watch?v=3mHy4OSyRf0

• Murray, I. (2009). Markov Chain Monte Carlo. Lectures on Machine Learning Summer School 2009: http://homepages.inf.ed.ac.uk/imurray2/teaching/09mlss/slides.pdf

• Speh, J., Muhic, A., and Rupnik, J. (2013). Parameter Estimation for the Latent Dirichlet Allocation, SiKDD’13. (Video Lecture Link)

• Tang, L. (2008). Gibbs Sampling for LDA

122/122
