Latent Dirichlet Allocation: An Example of A Graphical Model
• from parameters
A generative model for documents
[Figure: an example document (TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY) in which each observed word w is generated from a latent topic z.]
• Useful Facts:
– The Dirichlet distribution is defined over a (k-1)-simplex. That is, it
takes k non-negative arguments which sum to one.
Consequently it is a natural distribution to use over
multinomial distributions.
– In fact, the Dirichlet distribution is the conjugate prior to
the multinomial distribution. (This means that if our
likelihood is multinomial with a Dirichlet prior, then the
posterior is also Dirichlet!)
– The Dirichlet parameter α_i can be thought of as a prior
count of the ith class.
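The conjugacy and prior-count facts above can be illustrated with a minimal sketch. It uses only the Python standard library; the prior, the observed counts, and the helper names (`sample_dirichlet`, `posterior_alpha`, `dirichlet_mean`) are hypothetical and chosen for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a point on the simplex by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def posterior_alpha(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior plus multinomial counts
    gives a Dirichlet(alpha + counts) posterior."""
    return [a + c for a, c in zip(alpha, counts)]

def dirichlet_mean(alpha):
    """Mean of Dirichlet(alpha): each parameter divided by their sum."""
    total = sum(alpha)
    return [a / total for a in alpha]

rng = random.Random(0)
prior = [1.0, 1.0, 1.0]   # hypothetical prior: one pseudo-count per class
counts = [5, 0, 2]        # hypothetical observed class counts
post = posterior_alpha(prior, counts)
theta = sample_dirichlet(post, rng)   # one multinomial parameter vector

print(post)               # [6.0, 1.0, 3.0]
print(dirichlet_mean(post))
print(theta, sum(theta))
```

The posterior parameters are simply the prior counts plus the observed counts, which is why each α_i reads naturally as a prior count.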
The LDA Model
[Figure: three example documents, each with topics z1, z2, z3, z4 generating the words w1, w2, w3, w4.]
• Choose θ ~ Dirichlet(α)
• For each of the N words w_n:
– Choose a topic z_n ~ Multinomial(θ)
– Choose a word w_n from p(w_n | z_n, β), a multinomial
probability conditioned on the topic z_n.
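The generative steps above can be sketched as follows for a single document. This is a minimal illustration using only the standard library. The two-topic word probabilities in `beta`, the value of `alpha`, and the function names are assumptions, and the four-word vocabulary reuses the example words from the earlier slide.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a point on the simplex by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alpha, beta, vocab, n_words, rng):
    """alpha: K Dirichlet parameters.
    beta: K rows, each a probability distribution over vocab."""
    theta = sample_dirichlet(alpha, rng)              # theta ~ Dirichlet(alpha)
    topics, words = [], []
    for _ in range(n_words):
        # z_n ~ Multinomial(theta)
        z = rng.choices(range(len(alpha)), weights=theta)[0]
        # w_n ~ p(w_n | z_n, beta)
        w = rng.choices(vocab, weights=beta[z])[0]
        topics.append(z)
        words.append(w)
    return theta, topics, words

vocab = ["TEARS", "LOVE", "JOY", "SOUL"]
beta = [
    [0.1, 0.4, 0.4, 0.1],   # hypothetical topic 0: mostly LOVE and JOY
    [0.4, 0.1, 0.1, 0.4],   # hypothetical topic 1: mostly TEARS and SOUL
]
alpha = [0.5, 0.5]          # hypothetical symmetric Dirichlet parameter

rng = random.Random(1)
theta, topics, words = generate_document(alpha, beta, vocab, 10, rng)
print(theta)
print(topics)
print(" ".join(words))
```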
The LDA Model
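[Figure: plate diagram of the LDA model, with a plate over the K topics, a plate over the N words of a document, and a plate over the M documents.]
M: # documents, K: # topics, V: # terms in vocabulary, N: document length
θ_d ~ Dirichlet(α)
β_k ~ Dirichlet(η)
z_i | θ_{d_i} ~ Discrete(θ_{d_i})
w_{m,n} | z_n, β_{z_n} ~ Discrete(β_{z_n})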
Joint factorization
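For a single document d, with topic mixture θ_d and topic-word distributions β:

$$p(\theta_d, \beta, z, w \mid \alpha, \eta) = p(\theta_d \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta_d)\, p(w_n \mid z_n, \beta)\; p(\beta \mid \eta)$$

Here p(θ_d | α) is Dirichlet, p(β | η) is a product of K Dirichlets (one per topic), and p(z_n | θ_d) and p(w_n | z_n, β) are discrete.

Integrating out θ_d and β:

$$p(z, w \mid \alpha, \eta) = \iint p(\theta_d \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta_d)\, p(w_n \mid z_n, \beta)\; p(\beta \mid \eta)\, d\theta_d\, d\beta$$

Summing over the topic of each word:

$$p(w \mid \alpha, \eta) = \iint p(\theta_d \mid \alpha) \prod_{n=1}^{N} \sum_{z_n=1}^{K} p(z_n \mid \theta_d)\, p(w_n \mid z_n, \beta)\; p(\beta \mid \eta)\, d\theta_d\, d\beta$$

Over the whole corpus D of M documents, with β shared across documents and θ_m the topic mixture of document m:

$$p(D \mid \alpha, \eta) = \int p(\beta \mid \eta) \prod_{m=1}^{M} \int p(\theta_m \mid \alpha) \prod_{n=1}^{N} \sum_{z_n=1}^{K} p(z_n \mid \theta_m)\, p(w_{m,n} \mid z_n, \beta)\, d\theta_m\, d\beta$$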
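As a rough illustration of the factorization, the sketch below evaluates the log of the single-document joint term by term: one Dirichlet density for the topic mixture, one Dirichlet density per topic, and two discrete probabilities per word. All names are assumptions made for this example: `theta` is the topic mixture, `beta` holds the K topic-word distributions, `alpha` and `eta` are lists of Dirichlet parameters, and words and topics are integer indices.

```python
import math

def log_dirichlet(x, alpha):
    """Log density of Dirichlet(alpha) at a point x in the interior of the simplex."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))

def log_joint(theta, beta, z, w, alpha, eta):
    """Log of the joint density of (theta, beta, z, w) for one document.
    theta: K topic proportions, beta: K rows over V terms,
    z, w: topic index and term index of each word,
    alpha: K Dirichlet parameters, eta: V Dirichlet parameters."""
    total = log_dirichlet(theta, alpha)                     # Dirichlet term for theta
    total += sum(log_dirichlet(row, eta) for row in beta)   # one Dirichlet term per topic
    for z_n, w_n in zip(z, w):
        total += math.log(theta[z_n])                       # discrete: topic given theta
        total += math.log(beta[z_n][w_n])                   # discrete: word given topic
    return total

# Hypothetical toy values: K = 2 topics, V = 3 terms, N = 2 words.
theta = [0.5, 0.5]
beta = [[0.5, 0.25, 0.25], [0.2, 0.3, 0.5]]
value = log_joint(theta, beta, z=[0, 1], w=[0, 2], alpha=[1.0, 1.0], eta=[1.0, 1.0, 1.0])
print(value)
```

With these uniform priors the Dirichlet terms contribute 0 and 2 log 2, and the four discrete terms each contribute log 0.5, so the total is -2 log 2.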
Intractability

$$p(z \mid w, \alpha, \eta) = \frac{p(z, w \mid \alpha, \eta)}{\sum_{z} p(z, w \mid \alpha, \eta)}$$
Problems
Denominator does not factorize
Denominator represents a summation over O(K^N) terms, one for each joint assignment of the N topic variables to the K topics
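To make the cost concrete, the sketch below computes the denominator by brute force for one tiny document, enumerating every assignment of topics to words. It assumes symmetric scalar Dirichlet parameters (called `alpha` and `eta` here) and uses the standard closed form of the joint with the topic mixture and topic-word distributions integrated out. The function names are hypothetical.

```python
import itertools
import math

def log_multibeta(a):
    """Log of the multivariate Beta function: sum(lgamma(a_i)) - lgamma(sum(a_i))."""
    return sum(math.lgamma(x) for x in a) - math.lgamma(sum(a))

def log_collapsed_joint(z, w, K, V, alpha, eta):
    """Log joint probability of (z, w) for one document, with the topic mixture
    and the topic-word distributions integrated out (symmetric alpha and eta)."""
    n_k = [0] * K                          # words assigned to each topic
    n_kv = [[0] * V for _ in range(K)]     # times term v is assigned to topic k
    for z_n, w_n in zip(z, w):
        n_k[z_n] += 1
        n_kv[z_n][w_n] += 1
    lp = log_multibeta([c + alpha for c in n_k]) - log_multibeta([alpha] * K)
    for k in range(K):
        lp += log_multibeta([c + eta for c in n_kv[k]]) - log_multibeta([eta] * V)
    return lp

def brute_force_evidence(w, K, V, alpha, eta):
    """Sum the collapsed joint over all K**N topic assignments of the N words."""
    total, terms = 0.0, 0
    for z in itertools.product(range(K), repeat=len(w)):
        total += math.exp(log_collapsed_joint(z, w, K, V, alpha, eta))
        terms += 1
    return total, terms

evidence, terms = brute_force_evidence([0, 1, 1], K=2, V=2, alpha=0.5, eta=0.5)
print(terms)      # 8
print(evidence)
```

Three words and two topics already need 2^3 = 8 terms. A document of 100 words with 50 topics would need 50^100 terms, which is why exact inference is replaced by approximations such as Gibbs sampling.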
Gibbs sampling
pixel = word
image = document
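A minimal sketch of a collapsed Gibbs sampler for LDA is given below, as one common way to sample topic assignments. It is not necessarily the exact variant these slides had in mind. It assumes symmetric scalar Dirichlet parameters `alpha` and `eta` and documents given as lists of term indices. Each word's topic is resampled with probability proportional to (n_dk + alpha) * (n_kv + eta) / (n_k + V * eta), where the counts exclude the word being resampled. All names are hypothetical.

```python
import random

def gibbs_lda(docs, K, V, alpha, eta, n_iters, rng):
    """docs: list of documents, each a list of term indices in range(V).
    Returns the topic assignments and the two count tables."""
    n_dk = [[0] * K for _ in docs]        # topic counts per document
    n_kv = [[0] * V for _ in range(K)]    # term counts per topic
    n_k = [0] * K                         # total words per topic
    z = []
    for d, doc in enumerate(docs):        # random initial assignment
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kv[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = z[d][n]               # remove the current word from the counts
                n_dk[d][k] -= 1
                n_kv[k][w] -= 1
                n_k[k] -= 1
                weights = [
                    (n_dk[d][j] + alpha) * (n_kv[j][w] + eta) / (n_k[j] + V * eta)
                    for j in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][n] = k               # add the word back under its new topic
                n_dk[d][k] += 1
                n_kv[k][w] += 1
                n_k[k] += 1
    return z, n_dk, n_kv

# Hypothetical toy corpus over a vocabulary of V = 4 terms.
docs = [[0, 0, 1, 1], [2, 3, 3, 2], [0, 1, 2, 3]]
z, n_dk, n_kv = gibbs_lda(docs, K=2, V=4, alpha=0.5, eta=0.1,
                          n_iters=20, rng=random.Random(0))
print(z)
print(n_kv)
```

In the image analogy above, each image would be one entry of `docs` and each pixel one term index.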
Corpus preprocessing
• Perplexity
– Indicates ability to predict words in new,
unseen documents
– Lower is better
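Perplexity is commonly computed as the exponential of the negative average log-likelihood per word. The sketch below assumes the topic mixture of each held-out document (`thetas`) and the topic-word distributions (`topics`) are already available. In practice the mixtures of unseen documents have to be inferred first, which is not shown here. The names are hypothetical.

```python
import math

def perplexity(docs, thetas, topics):
    """docs: held-out documents as lists of term indices.
    thetas: one topic mixture (length K) per document.
    topics: K rows, each a distribution over the V terms."""
    log_lik = 0.0
    n_words = 0
    for doc, theta in zip(docs, thetas):
        for w in doc:
            # Word probability: mix the topics' word probabilities by theta.
            p_w = sum(t * row[w] for t, row in zip(theta, topics))
            log_lik += math.log(p_w)
            n_words += 1
    return math.exp(-log_lik / n_words)

# Sanity check with a hypothetical model that is uniform over V = 4 terms.
uniform_topics = [[0.25] * 4, [0.25] * 4]
held_out = [[0, 1, 2], [3, 3]]
mixtures = [[0.5, 0.5], [0.9, 0.1]]
print(perplexity(held_out, mixtures, uniform_topics))
```

A model that predicts every word uniformly over V terms has perplexity V (up to floating-point rounding), so a useful model should score well below the vocabulary size.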
Author-Topic Model
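[Figure: plate diagram of the author-topic model, with plates A, T, N_d and D.]
• a_d: the authors of document d
• Author x: chosen from a uniform distribution over the authors a_d of the document
• Topic z: chosen from the distribution over topics of author x
• Word w: chosen from the distribution over words of topic z
• N_d: # words in document d, D: # documents, A: # authors, T: # topics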
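The generative process of the author-topic model can be sketched as below for one document. The per-author topic distributions (`author_topics`), the per-topic word distributions (`topic_words`), and the toy numbers are assumptions made for illustration.

```python
import random

def generate_author_topic_doc(authors_d, author_topics, topic_words, n_words, rng):
    """authors_d: indices of the authors of this document (a_d).
    author_topics[a]: distribution over the T topics for author a.
    topic_words[t]: distribution over the vocabulary for topic t."""
    xs, zs, ws = [], [], []
    for _ in range(n_words):
        x = rng.choice(authors_d)                       # author: uniform over a_d
        z = rng.choices(range(len(topic_words)),
                        weights=author_topics[x])[0]    # topic: from the author's distribution
        w = rng.choices(range(len(topic_words[z])),
                        weights=topic_words[z])[0]      # word: from the topic's distribution
        xs.append(x)
        zs.append(z)
        ws.append(w)
    return xs, zs, ws

# Hypothetical toy setup: A = 3 authors, T = 2 topics, 4 vocabulary terms.
author_topics = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
topic_words = [[0.4, 0.4, 0.1, 0.1], [0.1, 0.1, 0.4, 0.4]]
xs, zs, ws = generate_author_topic_doc([0, 2], author_topics, topic_words,
                                       n_words=8, rng=random.Random(0))
print(xs)
print(zs)
print(ws)
```

The only change from the LDA generative process is that the topic of each word is drawn from the distribution of a randomly chosen author rather than from a per-document mixture.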