Latent Dirichlet Allocation: An Example of a Graphical Model


Latent Dirichlet Allocation:

An example of a graphical model


CS775
LDA: discovering topics in a text corpus
• Why map knowledge?
– Quickly grasp important themes in a new field
– Synthesize content of an existing field
– Discover targets for funding and research
– Understand a corpus of text
1. A generative model for documents
2. Discovering topics with Gibbs sampling
3. Results
– Topics and classes
– Topic dynamics
A generative model for documents
• Each document is a mixture of topics
• Each word is chosen from a single topic

• Topic proportions θ for each document are drawn from parameters α
• Word distributions φ for each topic are drawn from parameters β

(Blei, Ng, & Jordan, 2003)


A generative model for documents

w             P(w|z = 1) = φ(1)    P(w|z = 2) = φ(2)
HEART               0.2                 0.0
LOVE                0.2                 0.0
SOUL                0.2                 0.0
TEARS               0.2                 0.0
JOY                 0.2                 0.0
SCIENTIFIC          0.0                 0.2
KNOWLEDGE           0.0                 0.2
WORK                0.0                 0.2
RESEARCH            0.0                 0.2
MATHEMATICS         0.0                 0.2
                  topic 1             topic 2
Choose mixture weights θ for each document, then generate a "bag of words":

θ = {P(z = 1), P(z = 2)}

{0, 1}:       MATHEMATICS KNOWLEDGE RESEARCH WORK MATHEMATICS RESEARCH WORK SCIENTIFIC MATHEMATICS WORK

{0.25, 0.75}: SCIENTIFIC KNOWLEDGE MATHEMATICS SCIENTIFIC HEART LOVE TEARS KNOWLEDGE HEART

{0.5, 0.5}:   MATHEMATICS HEART RESEARCH LOVE MATHEMATICS WORK TEARS SOUL KNOWLEDGE HEART

{0.75, 0.25}: WORK JOY SOUL TEARS MATHEMATICS TEARS LOVE LOVE LOVE SOUL

{1, 0}:       TEARS LOVE JOY SOUL LOVE TEARS SOUL SOUL TEARS JOY
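A minimal sketch of this generative step (an assumed Python illustration, not the lecture's code), using the two toy topics above and a chosen θ: each token's topic is drawn first, then the word is drawn from that topic's distribution.

import numpy as np

rng = np.random.default_rng(0)

# Toy topics from the slide: topic 1 = "emotion" words, topic 2 = "science" words.
vocab = ["HEART", "LOVE", "SOUL", "TEARS", "JOY",
         "SCIENTIFIC", "KNOWLEDGE", "WORK", "RESEARCH", "MATHEMATICS"]
phi = np.array([
    [0.2, 0.2, 0.2, 0.2, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0],   # P(w | z = 1)
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.2, 0.2, 0.2, 0.2, 0.2],   # P(w | z = 2)
])

def generate_document(theta, n_words=10):
    """Generate a bag of words given mixture weights theta = [P(z=1), P(z=2)]."""
    words = []
    for _ in range(n_words):
        z = rng.choice(2, p=theta)        # pick a topic for this token
        w = rng.choice(10, p=phi[z])      # pick a word from that topic
        words.append(vocab[w])
    return words

print(generate_document([0.25, 0.75]))    # mostly "science" words, some "emotion" words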
A generative model for documents

[Graphical model: for each word, a topic z is drawn from θ, then the word w is drawn from the chosen topic's distribution φ(z).]

• Called Latent Dirichlet Allocation (LDA)
• Introduced by Blei, Ng, and Jordan (2003); a reinterpretation of PLSI (Hofmann, 2001)
Dirichlet Distributions
• In the LDA model, we would like to say that the topic mixture proportions for each document are drawn from some distribution.
• So, we want to put a distribution on multinomials, that is, on k-tuples of non-negative numbers that sum to one.
• The space of all such multinomials has a nice geometric interpretation as a (k-1)-simplex, which is just a generalization of a triangle to (k-1) dimensions.
• Criteria for selecting our prior:
  – It needs to be defined over a (k-1)-simplex.
  – Algebraically speaking, we would like it to play nicely with the multinomial distribution.
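As a quick illustration (an assumed snippet, not from the slides), drawing multinomial parameter vectors from a Dirichlet and checking that each sample lies on the simplex:

import numpy as np

rng = np.random.default_rng(0)

alpha = [1.0, 1.0, 1.0]                  # symmetric Dirichlet over k = 3 topics
theta = rng.dirichlet(alpha, size=5)     # five sampled points on the 2-simplex

print(theta)
print(theta.sum(axis=1))                 # each row sums to 1 (up to float error)
print((theta >= 0).all())                # all coordinates are non-negative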
Dirichlet Examples
Topic Model:
Geometric Representation
Dirichlet Distributions

• Useful Facts:
  – This distribution is defined over a (k-1)-simplex. That is, it takes k non-negative arguments which sum to one. Consequently it is a natural distribution to use over multinomial distributions.
  – In fact, the Dirichlet distribution is the conjugate prior to the multinomial distribution. (This means that if our likelihood is multinomial with a Dirichlet prior, then the posterior is also Dirichlet!)
  – The Dirichlet parameter αi can be thought of as a prior count of the ith class.
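For reference (the standard density, not written out on the slide): for $\theta = (\theta_1, \dots, \theta_k)$ on the $(k-1)$-simplex and parameters $\alpha = (\alpha_1, \dots, \alpha_k)$,

$p(\theta \mid \alpha) = \frac{\Gamma\!\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \prod_{i=1}^{k} \theta_i^{\alpha_i - 1}$

and if counts $n_1, \dots, n_k$ are then observed from a multinomial with parameter $\theta$, the posterior is $\mathrm{Dirichlet}(\alpha_1 + n_1, \dots, \alpha_k + n_k)$, which is why each $\alpha_i$ behaves like a prior count.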
The LDA Model

[Graphical model: for each of three example documents, a topic-proportion vector θ generates topic assignments z1 z2 z3 z4, which generate the words w1 w2 w3 w4; all documents share the Dirichlet parameter α and the topic-word parameters β.]

• For each document:
  • Choose θ ~ Dirichlet(α)
  • For each of the N words wn:
    – Choose a topic zn ~ Multinomial(θ)
    – Choose a word wn from p(wn | zn, β), a multinomial probability conditioned on the topic zn.
The LDA Model

[Plate diagram: α → θ → z → w, with w also depending on β; the inner plate over z and w repeats N times per document, the outer plate repeats over the M documents, and β sits in a plate over the K topics.]

For each document:
• Choose θ ~ Dirichlet(α)
• For each of the N words wn:
  – Choose a topic zn ~ Multinomial(θ)
  – Choose a word wn from p(wn | zn, β), a multinomial probability conditioned on the topic zn.
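A compact sketch of this generative process (an assumed illustration using numpy, not code from the lecture; the sizes are arbitrary):

import numpy as np

rng = np.random.default_rng(0)

K, V, M, N = 2, 10, 3, 20          # topics, vocabulary size, documents, words per doc
alpha = np.full(K, 0.5)            # Dirichlet prior on per-document topic proportions
beta = np.full(V, 0.1)             # Dirichlet prior on per-topic word distributions

phi = rng.dirichlet(beta, size=K)  # one word distribution per topic

corpus = []
for _ in range(M):
    theta = rng.dirichlet(alpha)                 # choose theta ~ Dirichlet(alpha)
    z = rng.choice(K, size=N, p=theta)           # choose a topic for each word
    w = [rng.choice(V, p=phi[k]) for k in z]     # choose each word from its topic
    corpus.append(w)

print(corpus[0])                                 # word ids for the first document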
Inference

[Plate diagram repeated from the previous slide.]

• The inference problem in LDA is to compute the posterior of the hidden variables given a document and the corpus parameters α and β; that is, compute p(θ, φ, z | w, α, β).
• Unfortunately, exact inference is intractable, so we turn to alternatives…
The LDA equations

[Plate diagram repeated from the previous slide: α → θ → z → w, β → φ, plates N, M, K.]

$w_{m,n} \mid z_{m,n}, \Phi \sim \mathrm{Discrete}(\varphi_{z_{m,n}})$, with $\varphi_k \sim \mathrm{Dirichlet}(\beta)$
$z_{m,n} \mid \theta_m \sim \mathrm{Discrete}(\theta_m)$, with $\theta_m \sim \mathrm{Dirichlet}(\alpha)$

M = # documents, K = # topics, V = # terms in vocabulary, N = document length
Joint factorization

[Plate diagram: θ ~ Dirichlet(α) generates z ~ Discrete(θ), which together with φ ~ Dirichlet(β) generates w ~ Discrete(φ_z); plates N, M, K.]

$p(\theta_d, \Phi, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta_d \mid \alpha)\, p(\Phi \mid \beta) \prod_{n=1}^{N} p(z_n \mid \theta_d)\, p(w_n \mid z_n, \Phi)$

$p(\mathbf{z}, \mathbf{w} \mid \alpha, \beta) = \int p(\theta_d \mid \alpha)\, p(\Phi \mid \beta) \prod_{n=1}^{N} p(z_n \mid \theta_d)\, p(w_n \mid z_n, \Phi)\; d\theta_d\, d\Phi$

$p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta_d \mid \alpha)\, p(\Phi \mid \beta) \prod_{n=1}^{N} \sum_{k=1}^{K} p(z_n = k \mid \theta_d)\, p(w_n \mid z_n = k, \Phi)\; d\theta_d\, d\Phi$

$p(\mathcal{D} \mid \alpha, \beta) = \prod_{m=1}^{M} \int p(\theta_m \mid \alpha)\, p(\Phi \mid \beta) \prod_{n=1}^{N} \sum_{k=1}^{K} p(z_n = k \mid \theta_m)\, p(w_{m,n} \mid z_n = k, \Phi)\; d\theta_m\, d\Phi$
Intractability

[Plate diagram repeated from the previous slide.]

$p(\mathbf{z} \mid \mathbf{w}, \alpha, \beta) = \dfrac{p(\mathbf{z}, \mathbf{w} \mid \alpha, \beta)}{\sum_{\mathbf{z}} p(\mathbf{z}, \mathbf{w} \mid \alpha, \beta)}$

Problems:
• The denominator does not factorize.
• The denominator is a summation over $K^{N}$ terms, one for every possible assignment of the N word tokens to the K topics.
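To make the scale concrete (an illustrative calculation, not from the slides):

$K^{N} = 2^{50} \approx 1.1 \times 10^{15}$ terms already for $K = 2$ topics and a document of $N = 50$ word tokens, so the normalizing sum cannot be enumerated directly.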
Gibbs sampling

For variables $z = z_1, z_2, \dots, z_n$:

Draw $z_i^{(t+1)}$ from $P(z_i \mid z_{-i}, \mathbf{w})$, where
$z_{-i} = z_1^{(t+1)}, z_2^{(t+1)}, \dots, z_{i-1}^{(t+1)}, z_{i+1}^{(t)}, \dots, z_n^{(t)}$
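Schematically, the scan described above looks like this (an assumed sketch; sample_full_conditional stands in for drawing from $P(z_i \mid z_{-i}, \mathbf{w})$ and is hypothetical):

def gibbs_scan(z, num_iters, sample_full_conditional):
    """Systematic-scan Gibbs sampling over z = (z_1, ..., z_n)."""
    for t in range(num_iters):
        for i in range(len(z)):
            # here z holds z_1..z_{i-1} from sweep t+1 and z_{i+1}..z_n from sweep t
            z[i] = sample_full_conditional(i, z)
    return z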
Gibbs sampling

• Need full conditional distributions for the variables
• Since we only sample z, we need

$P(z_i = j \mid \mathbf{z}_{-i}, \mathbf{w}) \propto \dfrac{n^{(w_i)}_{-i,j} + \beta}{n^{(\cdot)}_{-i,j} + W\beta} \cdot \dfrac{n^{(d_i)}_{-i,j} + \alpha}{n^{(d_i)}_{-i,\cdot} + K\alpha}$

where $n^{(w_i)}_{-i,j}$ = number of times word $w_i$ is assigned to topic $j$ (excluding token $i$), $n^{(d_i)}_{-i,j}$ = number of times topic $j$ is used in document $d_i$ (excluding token $i$), W = vocabulary size, and K = number of topics.
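A minimal sketch of one sweep of this collapsed Gibbs update (an assumed Python illustration, not the lecture's code; array names are hypothetical and correspond to the count terms n in the equation above):

import numpy as np

def gibbs_sweep(words, docs, z, nwt, ntd, nt, alpha, beta, rng):
    """One Gibbs sweep over all tokens.
    words[i], docs[i], z[i]: word id, document id, topic of token i
    nwt[w, t]: count of word w assigned to topic t
    ntd[t, d]: count of topic t in document d
    nt[t]:     total tokens assigned to topic t
    """
    W = nwt.shape[0]                      # vocabulary size
    K = nwt.shape[1]                      # number of topics
    for i in range(len(words)):
        w, d, t = words[i], docs[i], z[i]
        # remove token i from the counts (the "-i" in the equation)
        nwt[w, t] -= 1; ntd[t, d] -= 1; nt[t] -= 1
        # full conditional P(z_i = j | z_-i, w), up to a constant
        p = (nwt[w, :] + beta) / (nt + W * beta) * (ntd[:, d] + alpha)
        t = rng.choice(K, p=p / p.sum())  # sample a new topic for token i
        # add token i back under its new topic
        z[i] = t
        nwt[w, t] += 1; ntd[t, d] += 1; nt[t] += 1

The document-length denominator $n^{(d_i)}_{-i,\cdot} + K\alpha$ is the same for every topic, so it can be dropped before normalizing.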


Gibbs sampling
iteration 1
 i   wi            di   zi
1 MATHEMATICS 1 2
2 KNOWLEDGE 1 2
3 RESEARCH 1 1
4 WORK 1 2
5 MATHEMATICS 1 1
6 RESEARCH 1 2
7 WORK 1 2
8 SCIENTIFIC 1 1
9 MATHEMATICS 1 2
10 WORK 1 1
11 SCIENTIFIC 2 1
12 KNOWLEDGE 2 1
. . . .
. . . .
. . . .
50 JOY 5 2
Gibbs sampling
iterations 1, 2
 i   wi            di   zi(1)  zi(2)
1 MATHEMATICS 1 2 ?
2 KNOWLEDGE 1 2
3 RESEARCH 1 1
4 WORK 1 2
5 MATHEMATICS 1 1
6 RESEARCH 1 2
7 WORK 1 2
8 SCIENTIFIC 1 1
9 MATHEMATICS 1 2
10 WORK 1 1
11 SCIENTIFIC 2 1
12 KNOWLEDGE 2 1
. . . .
. . . .
. . . .
50 JOY 5 2
Gibbs sampling
iterations 1, 2
 i   wi            di   zi(1)  zi(2)
1 MATHEMATICS 1 2 2
2 KNOWLEDGE 1 2 ?
3 RESEARCH 1 1
4 WORK 1 2
5 MATHEMATICS 1 1
6 RESEARCH 1 2
7 WORK 1 2
8 SCIENTIFIC 1 1
9 MATHEMATICS 1 2
10 WORK 1 1
11 SCIENTIFIC 2 1
12 KNOWLEDGE 2 1
. . . .
. . . .
. . . .
50 JOY 5 2
Gibbs sampling
iterations 1, 2
 i   wi            di   zi(1)  zi(2)
1 MATHEMATICS 1 2 2
2 KNOWLEDGE 1 2 1
3 RESEARCH 1 1 ?
4 WORK 1 2
5 MATHEMATICS 1 1
6 RESEARCH 1 2
7 WORK 1 2
8 SCIENTIFIC 1 1
9 MATHEMATICS 1 2
10 WORK 1 1
11 SCIENTIFIC 2 1
12 KNOWLEDGE 2 1
. . . .
. . . .
. . . .
50 JOY 5 2
Gibbs sampling
iterations 1, 2
 i   wi            di   zi(1)  zi(2)
1 MATHEMATICS 1 2 2
2 KNOWLEDGE 1 2 1
3 RESEARCH 1 1 1
4 WORK 1 2 ?
5 MATHEMATICS 1 1
6 RESEARCH 1 2
7 WORK 1 2
8 SCIENTIFIC 1 1
9 MATHEMATICS 1 2
10 WORK 1 1
11 SCIENTIFIC 2 1
12 KNOWLEDGE 2 1
. . . .
. . . .
. . . .
50 JOY 5 2
Gibbs sampling
iterations 1, 2
 i   wi            di   zi(1)  zi(2)
1 MATHEMATICS 1 2 2
2 KNOWLEDGE 1 2 1
3 RESEARCH 1 1 1
4 WORK 1 2 2
5 MATHEMATICS 1 1 ?
6 RESEARCH 1 2
7 WORK 1 2
8 SCIENTIFIC 1 1
9 MATHEMATICS 1 2
10 WORK 1 1
11 SCIENTIFIC 2 1
12 KNOWLEDGE 2 1
. . . .
. . . .
. . . .
50 JOY 5 2
Gibbs sampling
iterations 1, 2, …, 1000
 i   wi            di   zi(1)  zi(2)  …  zi(1000)
1 MATHEMATICS 1 2 2 2
2 KNOWLEDGE 1 2 1 2
3 RESEARCH 1 1 1 2
4 WORK 1 2 2 1
5 MATHEMATICS 1 1 2 2
6 RESEARCH 1 2 2 2
7 WORK 1 2 2 2
8 SCIENTIFIC 1 1 1 … 1
9 MATHEMATICS 1 2 2 2
10 WORK 1 1 2 2
11 SCIENTIFIC 2 1 1 2
12 KNOWLEDGE 2 1 2 2
. . . . . .
. . . . . .
. . . . . .
50 JOY 5 2 1 1
Example of Gibbs Sampling
• Assign word tokens randomly to topics (● = topic 1; ● = topic 2)

[Figure: 16 documents (rows 1-16), each containing tokens of the words RIVER, STREAM, BANK, MONEY, LOAN; each token is shown with its randomly assigned topic.]
Slide Credit: Padhraic Smyth, UC Irvine


After 1 iteration
• Apply sampling equation to each word token

[Figure: topic assignments for the 16 documents after one sweep of the sampling equation.]

Slide Credit: Padhraic Smyth, UC Irvine


After 4 iterations

[Figure: topic assignments for the 16 documents after four iterations.]

Slide Credit: Padhraic Smyth, UC Irvine


After 32 iterations
● topic 1: stream .40, bank .35, river .25
● topic 2: bank .39, money .32, loan .29

[Figure: topic assignments for the 16 documents after 32 iterations.]

Slide Credit: Padhraic Smyth, UC Irvine


A visual example: Bars

• Sample each pixel from a mixture of topics
  – pixel = word
  – image = document
Corpus preprocessing

• Used all D = 28,154 abstracts from 1991-2001
• Used any word occurring in at least five abstracts and not on the "stop" list (W = 20,551)
• Segmentation by any delimiting character, for a total of n = 3,026,970 word tokens in the corpus
• Also used PNAS class designations for 2001 (thanks to Kevin Boyack)
Topics and classes
• PNAS authors provide class designations
  – major: Biological, Physical, Social Sciences
  – minor: 33 separate disciplines*
• Find topics diagnostic of classes
  – validate the "reality" of classes
  – show that topics pick out meaningful structure (classes, and the relations between them)
Example diagnostic topics (topic number and top words):

Topic 210: SYNAPTIC, NEURONS, POSTSYNAPTIC, HIPPOCAMPAL, SYNAPSES, LTP, PRESYNAPTIC, TRANSMISSION, POTENTIATION, PLASTICITY, EXCITATORY, RELEASE, DENDRITIC, PYRAMIDAL, HIPPOCAMPUS, DENDRITES, CA1, STIMULATION, TERMINALS, SYNAPSE

Topic 201: RESISTANCE, RESISTANT, DRUG, DRUGS, SENSITIVE, MDR, MULTIDRUG, SUSCEPTIBLE, SELECTED, GLYCOPROTEIN, SENSITIVITY, PGP, AGENTS, CONFERS, MDR1, CYTOTOXIC, CONFERRED, CHEMOTHERAPEUTIC, EFFLUX, INCREASED

Topic 280: SPECIES, SELECTION, EVOLUTION, GENETIC, POPULATIONS, POPULATION, VARIATION, NATURAL, EVOLUTIONARY, FITNESS, ADAPTIVE, RATES, THEORY, TRAITS, DIVERSITY, EXPECTED, NEUTRAL, EVOLVED, COMPETITION, HISTORY

Topic 222: CORTEX, BRAIN, SUBJECTS, TASK, AREAS, REGIONS, FUNCTIONAL, LEFT, MEMORY, TEMPORAL, IMAGING, PREFRONTAL, CEREBRAL, TASKS, FRONTAL, AREA, TOMOGRAPHY, EMISSION, POSITRON, CORTICAL

Topic 2: SPECIES, GLOBAL, CLIMATE, CO2, WATER, ENVIRONMENTAL, YEARS, MARINE, CARBON, DIVERSITY, OCEAN, EXTINCTION, TERRESTRIAL, COMMUNITY, ABUNDANCE, EARTH, ECOLOGICAL, CHANGE, TIME, ECOSYSTEM

Topic 39: THEORY, TIME, SPACE, GIVEN, PROBLEM, SHAPE, SIMPLE, DIMENSIONAL, PAPER, NUMBER, CASE, LOCAL, TERMS, SYMMETRY, RANDOM, EQUATION, CLASSICAL, COMPLEXITY, NUMERICAL, PROPERTIES
Mapping science
• Topics provide dimensionality reduction
• Some applications require visualization (and even lower dimensionality)
• Low-dimensional representation from methods for analysis of compositional data
Evaluating Predictive Power

• Perplexity
  – Indicates ability to predict words in new, unseen documents
  – The lower, the better
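For reference (the standard definition used by Blei, Ng, & Jordan, 2003, not spelled out on the slide), perplexity on a held-out test set of M documents is

$\mathrm{perplexity}(D_{\text{test}}) = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right)$

where $N_d$ is the length of document $d$; lower perplexity means the model assigns higher probability to the unseen words.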
Author-Topic Model

[Plate diagram: for each document d, a_d is the uniform distribution of the document over its authors; an author x is drawn from a_d; θ gives each author's distribution over topics, from which a topic z is drawn; φ gives each topic's distribution over words, from which the word w is drawn. Plates: N_d words per document, D documents, T topics.]
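A brief sketch of the word-level generative step in this model (an assumed illustration following the diagram above; array names are hypothetical):

import numpy as np

rng = np.random.default_rng(0)

def generate_word(authors_d, theta, phi):
    """Generate one word for a document with author list authors_d.
    theta[a]: author a's distribution over topics
    phi[t]:   topic t's distribution over words
    """
    x = rng.choice(authors_d)               # pick one of the document's authors uniformly
    z = rng.choice(len(phi), p=theta[x])    # pick a topic from that author's distribution
    w = rng.choice(phi.shape[1], p=phi[z])  # pick a word from that topic
    return x, z, w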
