Probabilistic Inference
J. Zico Kolter
Outline
Exact inference
Approximate inference
Probability distributions
p(x)
Bayesian networks
Bayes net example

    x1 (Burglary?)    x2 (Earthquake?)
            \            /
             x3 (Alarm?)
            /            \
    x4 (JohnCalls?)    x5 (MaryCalls?)
Bayes net example

    x1 (Burglary?)    x2 (Earthquake?)
            \            /
             x3 (Alarm?)
            /            \
    x4 (JohnCalls?)    x5 (MaryCalls?)

    p(x1 = 1) = 0.001        p(x2 = 1) = 0.002

    x1  x2 | p(x3 = 1)
     0   0 | 0.001
     0   1 | 0.29
     1   0 | 0.94
     1   1 | 0.95

    x3 | p(x4 = 1)        x3 | p(x5 = 1)
     0 | 0.05              0 | 0.01
     1 | 0.9               1 | 0.7
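A minimal sketch (illustrative, not from the slides) of how these conditional probability tables define a joint distribution: encode the CPTs as dictionaries and evaluate p(x1, ..., x5) as the product of each node's conditional given its parents.

```python
# CPTs from the burglary network above, stored as plain dictionaries.
p_x1 = 0.001                            # p(x1 = 1), Burglary
p_x2 = 0.002                            # p(x2 = 1), Earthquake
p_x3 = {(0, 0): 0.001, (0, 1): 0.29,    # p(x3 = 1 | x1, x2), Alarm
        (1, 0): 0.94,  (1, 1): 0.95}
p_x4 = {0: 0.05, 1: 0.9}                # p(x4 = 1 | x3), JohnCalls
p_x5 = {0: 0.01, 1: 0.7}                # p(x5 = 1 | x3), MaryCalls

def bern(p_one, x):
    """Probability of binary value x, given p(x = 1) = p_one."""
    return p_one if x == 1 else 1.0 - p_one

def joint(x1, x2, x3, x4, x5):
    """Joint probability = product of each node's conditional given parents."""
    return (bern(p_x1, x1) * bern(p_x2, x2) * bern(p_x3[(x1, x2)], x3)
            * bern(p_x4[x3], x4) * bern(p_x5[x3], x5))
```

Summing `joint` over all 32 assignments gives 1, as any joint distribution must.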
MRF example

    x1 --- x2
MRF example

    x1 --- x2

    x1  x2 | f(x1, x2)
     0   0 | 10
     0   1 | 1
     1   0 | 1
     1   1 | 10

    x1 | f(x1)        x2 | f(x2)
     0 | 1             0 | 5
     1 | 5             1 | 1
MRF example

    x1 --- x2

    x1  x2 | f(x1, x2)        x1 | f(x1)        x2 | f(x2)
     0   0 | 10                0 | 1             0 | 5
     0   1 | 1                 1 | 5             1 | 1
     1   0 | 1
     1   1 | 10

    x1  x2 | ∏ f | p(x)
     0   0 |  50 | 1/3
     0   1 |  25 | 1/6
     1   0 |  25 | 1/6
     1   1 |  50 | 1/3

• E.g. p(x1 = 1, x2 = 1) = (1/150) · 5 · 10 · 1 = 1/3
Factor graphs
• A generalization that captures both Bayesian networks and
Markov random fields
    [figure: variable nodes x1, x2 joined through a factor node]
MRF to factor graph

    f1 --- x1 --- f3 --- x2 --- f2
MRF to factor graph

    f1 --- x1 --- f3 --- x2 --- f2

    x1  x2 | f3(x1, x2)
     0   0 | 10
     0   1 | 1
     1   0 | 1
     1   1 | 10
Bayes net to factor graph

    x1      x2
      \    /
       x3
      /    \
    x4      x5
Bayes net to factor graph

    f1 - x1      x2 - f2
          \      /
            f3
            |
            x3
          /    \
        f4      f5
        |        |
        x4      x5

    x3 | p(x5 = 1)        x3  x5 | f5(x3, x5)
     0 | 0.01              0   0 | 0.99
     1 | 0.7               0   1 | 0.01
                           1   0 | 0.3
                           1   1 | 0.7
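A minimal sketch (not from the slides) of the conversion above: the factor f5(x3, x5) just tabulates the conditional p(x5 | x3) for both values of x5.

```python
# Conditional table from the slide: p(x5 = 1 | x3).
p_x5_given_x3 = {0: 0.01, 1: 0.7}

# The equivalent factor-graph factor, f5(x3, x5) = p(x5 | x3).
f5 = {(x3, x5): (p_x5_given_x3[x3] if x5 == 1 else 1.0 - p_x5_given_x3[x3])
      for x3 in (0, 1) for x5 in (0, 1)}
```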
Outline
Exact inference
Approximate inference
Inference in probabilistic graphical models
Inference via enumeration
• If we’re willing to enumerate all 2^n possible values, inference queries can be answered easily
  – Marginal inference:
        p(xI) = Σ_{x̄I} p(xI, x̄I) = Σ_{x̄I} ∏_{i=1}^m fi(Xi)
  – Conditional inference:
        p(xI | xE = x′E) = p(xI, xE = x′E) / p(xE = x′E)
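A minimal sketch (illustrative, not from the slides) of enumeration on the burglary network from earlier: answer the conditional query p(x1 = 1 | x4 = 1, x5 = 1) by summing the joint over all hidden assignments in both numerator and denominator.

```python
from itertools import product

# Alarm CPT from the burglary network: p(x3 = 1 | x1, x2).
p3 = {(0, 0): 0.001, (0, 1): 0.29, (1, 0): 0.94, (1, 1): 0.95}

def bern(p_one, x):
    return p_one if x == 1 else 1.0 - p_one

def joint(x1, x2, x3, x4, x5):
    return (bern(0.001, x1) * bern(0.002, x2) * bern(p3[(x1, x2)], x3)
            * bern(0.9 if x3 else 0.05, x4) * bern(0.7 if x3 else 0.01, x5))

# p(x1 = 1 | x4 = 1, x5 = 1): sum out the unobserved x2, x3.
num = sum(joint(1, b, c, 1, 1) for b in (0, 1) for c in (0, 1))
den = sum(joint(a, b, c, 1, 1) for a in (0, 1) for b in (0, 1) for c in (0, 1))
posterior = num / den      # p(burglary | both John and Mary call)
```

Even though a burglary is a priori very unlikely, both neighbors calling raises its posterior to roughly 0.28.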
Exploiting graph structure in inference
Example: chain Bayesian network

    x1 → x2 → x3 → x4

    p(x4) = Σ_{x1,x2,x3} p(x1, x2, x3, x4)
          = Σ_{x1,x2,x3} p(x1) p(x2|x1) p(x3|x2) p(x4|x3)
          = Σ_{x2,x3} p(x3|x2) p(x4|x3) Σ_{x1} p(x1) p(x2|x1)
          = Σ_{x2,x3} p(x3|x2) p(x4|x3) p(x2)
          = Σ_{x3} p(x4|x3) Σ_{x2} p(x3|x2) p(x2)
          = Σ_{x3} p(x4|x3) p(x3)
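The derivation above can be sketched in code (illustrative; the CPT numbers are made up): pushing the sums inward turns one sum over 2^3 assignments into three small sums, each over a single variable.

```python
def bern(p_one, x):
    return p_one if x == 1 else 1.0 - p_one

# Made-up chain CPTs: x1 -> x2 -> x3 -> x4.
p1 = 0.3                        # p(x1 = 1)
p2 = {0: 0.2, 1: 0.8}           # p(x2 = 1 | x1)
p3 = {0: 0.5, 1: 0.6}           # p(x3 = 1 | x2)
p4 = {0: 0.1, 1: 0.9}           # p(x4 = 1 | x3)

# Eliminate x1, then x2, then x3 -- each step is a sum over one variable.
m2 = {b: sum(bern(p1, a) * bern(p2[a], b) for a in (0, 1)) for b in (0, 1)}  # p(x2)
m3 = {c: sum(m2[b] * bern(p3[b], c) for b in (0, 1)) for c in (0, 1)}        # p(x3)
m4 = {d: sum(m3[c] * bern(p4[c], d) for c in (0, 1)) for d in (0, 1)}        # p(x4)
```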
General algorithm: variable elimination

function G′ = Sum-Product-Eliminate(G, xi)
  // eliminate variable xi from the factor graph G
  F ← {fj ∈ V : (fj, xi) ∈ E}
  X̃ ← {xk : (fj, xk) ∈ E, fj ∈ F} − {xi}
  f̃(X̃) ← Σ_{xi} ∏_{fj ∈ F} fj(Xj)
  V′ ← V − {xi, fj ∈ F} + {f̃}
  E′ ← E − {(fj, xk) ∈ E : fj ∈ F} + {(f̃, xk) : xk ∈ X̃}
  return G′ = (V′, E′)
Variable elimination example

    x1      x2
    |  \   /  |
    f1   f3   f2
          |
          x3
        /    \
      f4      f5
      |        |
      x4      x5

Eliminating x3:
    F = {f3, f4, f5}
    X̃ = {x1, x2, x4, x5}
    f̃(x1, x2, x4, x5) = Σ_{x3} f3(x1, x2, x3) f4(x3, x4) f5(x3, x5)
    V′ = {x1, x2, x4, x5, f1, f2, f̃}
    E′ = {(f1, x1), (f2, x2), (f̃, x1), (f̃, x2), (f̃, x4), (f̃, x5)}
• Full variable elimination algorithm just repeatedly eliminates variables

function G′ = Sum-Product-Variable-Elimination(G, X)
  // eliminate an ordered list of variables X
  for xi ∈ X:
    G ← Sum-Product-Eliminate(G, xi)
  return G
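A minimal sketch of the algorithm (illustrative; the list-of-factors data structure is mine, not the slides' graph notation): each factor is a (variables, table) pair over binary variables, one `eliminate` call sums a variable out of every factor that touches it, and repeating this leaves the unnormalized marginal of the kept variable.

```python
from itertools import product

def eliminate(factors, xi):
    """One elimination step: sum xi out of every factor touching it."""
    touching = [f for f in factors if xi in f[0]]
    rest = [f for f in factors if xi not in f[0]]
    scope = tuple(sorted({v for vars_, _ in touching for v in vars_} - {xi}))
    table = {}
    for assign in product((0, 1), repeat=len(scope)):
        env = dict(zip(scope, assign))
        total = 0.0
        for xv in (0, 1):
            env[xi] = xv
            w = 1.0
            for vars_, t in touching:
                w *= t[tuple(env[v] for v in vars_)]
            total += w
        table[assign] = total
    return rest + [(scope, table)]

def marginal(factors, keep, order):
    """Eliminate variables in `order`; normalize the result over `keep`."""
    for xi in order:
        factors = eliminate(factors, xi)
    unnorm = {v: 1.0 for v in (0, 1)}
    for _vars, t in factors:        # remaining factors all have scope {keep}
        for v in (0, 1):
            unnorm[v] *= t[(v,)]
    Z = unnorm[0] + unnorm[1]
    return {v: unnorm[v] / Z for v in unnorm}
```

On a three-variable chain this reproduces the brute-force marginal while only ever building small intermediate tables.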
Variable elimination example
Goal: compute p(x4)

Eliminating x1, x2, x3, x5 in turn shrinks the factor graph step by step:
    start:          x1, x2, x3, x4, x5 with f1, f2, f3, f4, f5
    eliminate x1:   x2, x3, x4, x5 with f̃1, f2, f4, f5
    eliminate x2:   x3, x4, x5 with f̃2, f4, f5
    eliminate x3:   x4, x5 with f̃3
    eliminate x5:   x4 with f̃4, whose table is p(x4)
Pitfalls
• But... elimination can create large intermediate factors, so the cost depends heavily on the elimination ordering
Extensions
MAP Inference
• A virtually identical approach can be applied to MAP inference: replace the sums in variable elimination with maxes
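A minimal sketch of the sum-to-max swap (illustrative; the chain CPT numbers are made up): replacing each sum in chain elimination with a max gives max-product (Viterbi-style) elimination, whose final value is max_x p(x).

```python
def bern(p_one, x):
    return p_one if x == 1 else 1.0 - p_one

# Made-up chain CPTs: x1 -> x2 -> x3.
p1 = 0.3                        # p(x1 = 1)
p2 = {0: 0.2, 1: 0.8}           # p(x2 = 1 | x1)
p3 = {0: 0.5, 1: 0.6}           # p(x3 = 1 | x2)

# Same elimination order as sum-product, but with max instead of sum.
m2 = {b: max(bern(p1, a) * bern(p2[a], b) for a in (0, 1)) for b in (0, 1)}
m3 = {c: max(m2[b] * bern(p3[b], c) for b in (0, 1)) for c in (0, 1)}
map_value = max(m3.values())    # probability of the MAP assignment of (x1, x2, x3)
```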
Outline
Exact inference
Approximate inference
Sampling methods
• For a Bayesian network we can draw exact samples by sampling each variable in topological order given its parents (ancestral sampling):
    xi ∼ p(xi | Parents(xi)),  i = 1, . . . , n
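A minimal sketch (illustrative, reusing the burglary-network CPTs from the earlier slides): sample each variable in topological order from its conditional given the already-sampled parents.

```python
import random

# Alarm CPT: p(x3 = 1 | x1, x2).
p_x3 = {(0, 0): 0.001, (0, 1): 0.29, (1, 0): 0.94, (1, 1): 0.95}

def sample_once(rng):
    """One ancestral sample (x1, ..., x5), parents before children."""
    x1 = int(rng.random() < 0.001)                    # Burglary
    x2 = int(rng.random() < 0.002)                    # Earthquake
    x3 = int(rng.random() < p_x3[(x1, x2)])           # Alarm
    x4 = int(rng.random() < (0.9 if x3 else 0.05))    # JohnCalls
    x5 = int(rng.random() < (0.7 if x3 else 0.01))    # MaryCalls
    return (x1, x2, x3, x4, x5)

rng = random.Random(0)
samples = [sample_once(rng) for _ in range(10000)]
```

Empirical frequencies over the samples approximate any marginal, e.g. the fraction of samples with x4 = 1 approaches p(x4 = 1) ≈ 0.052.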
Gibbs sampling
• But what about cases too big for variable elimination?

function x = Gibbs-Sampling(G, x, K)
  for k = 1, . . . , K:
    Choose a random xi
    Sample xi ∼ p(xi | x−i) ∝ ∏_{fj : (fj, xi) ∈ G} fj(Xj)

MAP inference
    maximize_x p(x)
• To put this in a form that we’re more familiar with, for each factor fi define the optimization variable µi ∈ R^(2^|Xi|); µi should be thought of as an indicator for the assignment to Xi
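The Gibbs pseudocode above can be sketched as follows (illustrative; the list-of-factors data structure is mine, not the slides'): each step resamples one binary coordinate from its conditional, which is proportional to the product of only the factors touching it.

```python
import random

def gibbs_step(factors, x, i, rng):
    """Resample x[i] from p(x_i | x_-i), proportional to touching factors."""
    weights = []
    for v in (0, 1):
        x[i] = v
        w = 1.0
        for vars_, t in factors:
            if i in vars_:
                w *= t[tuple(x[j] for j in vars_)]
        weights.append(w)
    x[i] = int(rng.random() < weights[1] / (weights[0] + weights[1]))
    return x

def gibbs(factors, n, K, rng):
    """Run K Gibbs updates over n binary variables; return all states."""
    x = [rng.randint(0, 1) for _ in range(n)]
    out = []
    for _ in range(K):
        i = rng.randrange(n)          # choose a random xi
        x = gibbs_step(factors, x, i, rng)
        out.append(tuple(x))
    return out
```

On a two-variable MRF whose pairwise factor favors agreement, the long-run fraction of samples with x1 = x2 approaches the true probability of agreement.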
• Abusing notation a bit, we can write the optimization as a binary integer program

    maximize_{µ1,...,µm}  log p(µ) = Σ_{i=1}^m µiᵀ (log fi)
    subject to  µ1, . . . , µm is a valid distribution
                (µi)j ∈ {0, 1}, ∀i, j
• This is still a hard binary integer programming task, but it turns out that the LP relaxation is sometimes tight (i.e., just removing the integer constraints still gives the optimal solution)
• One case where the relaxation is tight: tree factor graphs (these are the ones we could already solve with max-product)