
15-780 – Probabilistic Inference

J. Zico Kolter

March 30, 2014

1
Outline

Probabilistic graphical models

Exact inference

Approximate inference

2
Probability distributions

• Probabilistic graphical models (PGMs) are about representing probability distributions over random variables

p(x)

where for this lecture, x ∈ {0, 1}^n, p : {0, 1}^n → [0, 1]

• Naively, since there are 2^n possible assignments to x, can represent this distribution completely using 2^n − 1 numbers, but this quickly becomes intractable for large n

• PGMs are methods to represent these distributions more compactly, by exploiting conditional independence

4
Bayesian networks

• A Bayesian network is defined by:

1. A directed acyclic graph (DAG) G = (V = {x1 , . . . , xn }, E)

2. A set of conditional probability tables p(xi |Parents(xi ))

• Defines the joint probability distribution

p(x) = ∏_{i=1}^n p(xi | Parents(xi))

• Equivalently, each node is conditionally independent of all non-descendants given its parents

5
Bayes net example

[Figure: DAG with nodes x1 (Burglary?), x2 (Earthquake?), x3 (Alarm?), x4 (JohnCalls?), x5 (MaryCalls?); edges x1 → x3, x2 → x3, x3 → x4, x3 → x5]

p(x1 = 1) = 0.001        p(x2 = 1) = 0.002

x1  x2  p(x3 = 1)
0   0   0.001
0   1   0.29
1   0   0.94
1   1   0.95

x3  p(x4 = 1)        x3  p(x5 = 1)
0   0.05             0   0.01
1   0.9              1   0.7

• Can write distribution as

p(x) = p(x1) p(x2|x1) p(x3|x1, x2) p(x4|x1, x2, x3) p(x5|x1, x2, x3, x4)
     = p(x1) p(x2) p(x3|x1, x2) p(x4|x3) p(x5|x3)

6
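As a quick sanity check on this factorization, here is a minimal Python sketch (variable names are our own, not from the slides) that stores the five CPTs above and multiplies them to evaluate p(x) for any full assignment.

# Minimal sketch: evaluate the alarm-network joint p(x) from its CPTs.
# Each table stores p(xi = 1 | parents); p(xi = 0 | parents) = 1 - that value.

p1 = 0.001                                   # p(x1 = 1)
p2 = 0.002                                   # p(x2 = 1)
p3 = {(0, 0): 0.001, (0, 1): 0.29,
      (1, 0): 0.94,  (1, 1): 0.95}           # p(x3 = 1 | x1, x2)
p4 = {0: 0.05, 1: 0.9}                       # p(x4 = 1 | x3)
p5 = {0: 0.01, 1: 0.7}                       # p(x5 = 1 | x3)

def bern(p_one, value):
    """Probability of a binary value under P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one

def joint(x1, x2, x3, x4, x5):
    """p(x) = p(x1) p(x2) p(x3|x1,x2) p(x4|x3) p(x5|x3)."""
    return (bern(p1, x1) * bern(p2, x2) * bern(p3[(x1, x2)], x3)
            * bern(p4[x3], x4) * bern(p5[x3], x5))

# e.g. probability that both neighbors call but nothing else happened
print(joint(0, 0, 0, 1, 1))   # = 0.999 * 0.998 * 0.999 * 0.05 * 0.01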
Markov random fields
• A (pairwise) Markov random field (MRF) is defined by:
1. An undirected graph G = (V = {x1 , . . . , xn }, E)

2. A set of unary potentials f(xi) for each i = 1, . . . , n

3. A set of binary potentials f(xi, xj) for all (i, j) ∈ E

• Defines the joint probability distribution

p(x) = (1/Z) ∏_{i=1}^n f(xi) ∏_{(i,j)∈E} f(xi, xj)

where Z is a normalization constant (also called the partition function)

Z = ∑_x ∏_{i=1}^n f(xi) ∏_{(i,j)∈E} f(xi, xj)

7
• Equivalently, each node in an MRF is conditionally independent of all other nodes given its neighbors

p(xi | x−i) = p(xi | Neighbors(xi))

This is not trivial to show; it is known as the Hammersley-Clifford theorem

8
MRF example

[Figure: MRF with two nodes x1 and x2 joined by a single edge]

x1  x2  f(x1, x2)
0   0   10
0   1   1
1   0   1
1   1   10

x1  f(x1)        x2  f(x2)
0   1            0   5
1   5            1   1

x1  x2  ∏f   p(x)
0   0   50   1/3
0   1   25   1/6
1   0   25   1/6
1   1   50   1/3

• E.g., p(x1 = 1, x2 = 1) = (1/150) · 5 · 10 · 1 = 1/3

9
Factor graphs
• A generalization that captures both Bayesian networks and
Markov random fields

• An undirected graph G = (V = {x1, . . . , xn, f1, . . . , fm}, E) over variables and factors

• There exists an edge fi — xj if and only if factor fi includes variable xj

• Defines the joint probability distribution

p(x) = (1/Z) ∏_{i=1}^m fi(Xi)

where Xi = {xj : (fi, xj) ∈ E} are the variables in factor fi


10
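To make this concrete, here is one possible encoding of a factor graph in Python; the (scope, table) representation and the potential values are our own choices for illustration, not anything prescribed by the slides. p(x) is the product of factor values divided by Z, which for small n can be computed by brute-force enumeration.

from itertools import product

# A factor is (scope, table): scope is a tuple of variable names, and
# table maps each assignment of the scope (a tuple of 0/1s) to a value.
# The numbers below are arbitrary, for illustration only.
factors = [
    (("x1",),      {(0,): 1.0, (1,): 2.0}),                  # unary on x1
    (("x2",),      {(0,): 1.0, (1,): 3.0}),                  # unary on x2
    (("x1", "x2"), {(0, 0): 4.0, (0, 1): 1.0,
                    (1, 0): 1.0, (1, 1): 4.0}),              # pairwise factor
]
variables = ["x1", "x2"]

def unnormalized(assignment, factors):
    """Product of all factor values at a full assignment {var: 0/1}."""
    val = 1.0
    for scope, table in factors:
        val *= table[tuple(assignment[v] for v in scope)]
    return val

# Partition function Z by brute-force enumeration over all 2^n assignments
Z = sum(unnormalized(dict(zip(variables, bits)), factors)
        for bits in product([0, 1], repeat=len(variables)))

def prob(assignment):
    return unnormalized(assignment, factors) / Z

print(Z, prob({"x1": 1, "x2": 1}))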
MRF to factor graph

[Figure: factor graph with variable nodes x1, x2, unary factors f1 (attached to x1) and f2 (attached to x2), and a pairwise factor f3 attached to both x1 and x2]

x1  x2  f3(x1, x2)
0   0   10
0   1   1
1   0   1
1   1   10

11
Bayes net to factor graph

[Figure: factor graph with variable nodes x1, . . . , x5; factor f1 attached to x1, f2 to x2, f3 to {x1, x2, x3}, f4 to {x3, x4}, and f5 to {x3, x5}]

x3  p(x5 = 1)
0   0.01
1   0.7

x3  x5  f5(x3, x5)
0   0   0.99
0   1   0.01
1   0   0.3
1   1   0.7

12
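The conversion of a CPT into a factor table, as in the f5(x3, x5) example above, is mechanical; a small sketch (the helper name cpt_to_factor is our own):

# Turn a conditional probability table p(child = 1 | parent) into a
# factor f(parent, child), as in the f5(x3, x5) table above.
def cpt_to_factor(p_child_one):
    """p_child_one: dict parent_value -> p(child = 1 | parent_value)."""
    factor = {}
    for parent_value, p_one in p_child_one.items():
        factor[(parent_value, 0)] = 1.0 - p_one
        factor[(parent_value, 1)] = p_one
    return factor

f5 = cpt_to_factor({0: 0.01, 1: 0.7})
print(f5)   # matches the f5(x3, x5) table above (up to floating-point rounding)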
Outline

Probabilistic graphical models

Exact inference

Approximate inference

13
Inference in probabilistic graphical models

• Inference generally refers to methods that answer probability queries given a graphical model

• Several types that come up frequently:

– Marginal inference: compute p(xI) for some xI ⊆ {x1, . . . , xn} (non-trivial even for xI = {x1, . . . , xn} in a factor graph)

– Conditional inference: compute p(xI | xE = x′E) for some xI, xE ⊆ {x1, . . . , xn}, xI ∩ xE = ∅

– Maximum a posteriori (MAP) inference: compute max_{xI} p(xI), and possibly the maximizing assignment x⋆I (also the conditional analogue); also called most probable explanation (MPE)

14
Inference via enumeration
• If we’re willing to enumerate all 2^n possible values, inference queries can be answered easily

– Marginal inference:

p(xI) = ∑_{x̄I} p(xI, x̄I) = ∑_{x̄I} ∏_{i=1}^m fi(Xi)

– Conditional inference:

p(xI | xE = x′E) = p(xI, xE = x′E) / p(xE = x′E)

– MAP inference: compute p(xI = x′I) for all possible assignments x′I, choose the largest

15
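A brute-force sketch of all three query types, assuming n is small enough to enumerate every assignment; the (variables, score) interface and the toy numbers are our own illustration, not the slides’.

from itertools import product

def enumerate_joint(variables, score):
    """Normalize an unnormalized score over all 2^n assignments (n small!)."""
    table = {bits: score(dict(zip(variables, bits)))
             for bits in product([0, 1], repeat=len(variables))}
    Z = sum(table.values())
    return {bits: v / Z for bits, v in table.items()}

def marginal(joint, variables, query_vars):
    """p(x_I): sum the (possibly unnormalized) table over non-query variables."""
    idx = [variables.index(v) for v in query_vars]
    out = {}
    for bits, p in joint.items():
        key = tuple(bits[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def conditional(joint, variables, query_vars, evidence):
    """p(x_I | x_E = e) = p(x_I, x_E = e) / p(x_E = e)."""
    keep = {bits: p for bits, p in joint.items()
            if all(bits[variables.index(v)] == val for v, val in evidence.items())}
    pE = sum(keep.values())                        # p(x_E = e), assumed > 0
    return {k: v / pE for k, v in marginal(keep, variables, query_vars).items()}

def map_assignment(joint):
    """MAP over all variables (MPE): the most probable full assignment."""
    return max(joint, key=joint.get)

# Toy usage with an arbitrary made-up score function over two variables
joint = enumerate_joint(["x1", "x2"], lambda a: 1.0 + a["x1"] + 2 * a["x2"])
print(marginal(joint, ["x1", "x2"], ["x2"]))       # {(0,): 0.3, (1,): 0.7}
print(conditional(joint, ["x1", "x2"], ["x1"], {"x2": 1}))
print(map_assignment(joint))                       # (1, 1)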
Exploiting graph structure in inference

• When n gets large, inference by exact enumeration is intractable

• Can (sometimes) use the compact graph representation of the distribution to derive compact forms of inference

16
Example: chain Bayesian network

[Figure: chain Bayesian network x1 → x2 → x3 → x4]

p(x4) = ∑_{x1,x2,x3} p(x1, x2, x3, x4)
      = ∑_{x1,x2,x3} p(x1) p(x2|x1) p(x3|x2) p(x4|x3)
      = ∑_{x2,x3} p(x3|x2) p(x4|x3) ∑_{x1} p(x1) p(x2|x1)
      = ∑_{x2,x3} p(x3|x2) p(x4|x3) p(x2)
      = ∑_{x3} p(x4|x3) ∑_{x2} p(x3|x2) p(x2)
      = ∑_{x3} p(x4|x3) p(x3)
      = p(x4)

17
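The same computation in code: a sketch with made-up CPT numbers that pushes the sums in one variable at a time, so every intermediate object is a distribution over a single variable rather than a table over all 2^3 joint assignments of x1, x2, x3.

# Chain x1 -> x2 -> x3 -> x4, binary variables.
# CPTs stored as cpt[prev_value] = p(next = 1 | prev = prev_value).
# The numbers below are arbitrary, for illustration only.
p_x1 = 0.3                      # p(x1 = 1)
p_x2 = {0: 0.2, 1: 0.8}         # p(x2 = 1 | x1)
p_x3 = {0: 0.4, 1: 0.6}         # p(x3 = 1 | x2)
p_x4 = {0: 0.1, 1: 0.9}         # p(x4 = 1 | x3)

def forward(prior, cpt):
    """Given p(prev) as [p(prev=0), p(prev=1)], return p(next) by
    summing out prev:  p(next = 1) = sum_prev p(next = 1 | prev) p(prev)."""
    p_next_one = prior[0] * cpt[0] + prior[1] * cpt[1]
    return [1.0 - p_next_one, p_next_one]

p1 = [1.0 - p_x1, p_x1]
p2 = forward(p1, p_x2)          # sums out x1
p3 = forward(p2, p_x3)          # sums out x2
p4 = forward(p3, p_x4)          # sums out x3
print(p4)                       # the marginal p(x4)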
General algorithm: variable elimination

function G′ = Sum-Product-Eliminate(G, xi)
    // eliminate variable xi from the factor graph G
    F ← {fj ∈ V : (fj, xi) ∈ E}
    X̃ ← {xk : (fj, xk) ∈ E, fj ∈ F} − {xi}
    f̃(X̃) ← ∑_{xi} ∏_{fj ∈ F} fj(Xj)
    V′ ← V − ({xi} ∪ F) + {f̃}
    E′ ← E − {(fj, xk) ∈ E : fj ∈ F} + {(f̃, xk) : xk ∈ X̃}
    return G′ = (V′, E′)

18
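A minimal runnable sketch of the elimination step, reusing the informal (scope, table) factor encoding from the earlier factor-graph sketch (our own representation, not dictated by the slides): multiply the factors that touch xi, sum xi out, and return the new factor list. Eliminating every variable in turn, as the full algorithm on the following slides does, leaves a single constant factor equal to Z.

from itertools import product

def eliminate(factors, xi):
    """Sum-product elimination of variable xi from a list of (scope, table) factors."""
    F = [f for f in factors if xi in f[0]]                 # factors touching xi
    rest = [f for f in factors if xi not in f[0]]
    new_scope = tuple(sorted({v for scope, _ in F for v in scope} - {xi}))
    new_table = {}
    for bits in product([0, 1], repeat=len(new_scope)):
        assignment = dict(zip(new_scope, bits))
        total = 0.0
        for xi_val in (0, 1):                              # sum over xi
            assignment[xi] = xi_val
            val = 1.0
            for scope, table in F:                         # product over factors in F
                val *= table[tuple(assignment[v] for v in scope)]
            total += val
        new_table[bits] = total
    return rest + [(new_scope, new_table)]

# Usage on a tiny two-variable graph with arbitrary illustrative numbers
factors = [
    (("x1",),      {(0,): 1.0, (1,): 3.0}),
    (("x1", "x2"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}),
]
after_x1 = eliminate(factors, "x1")    # marginalized factor graph over x2 only
after_all = eliminate(after_x1, "x2")  # a single constant factor: the partition function Z
print(after_x1, after_all)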
Variable elimination example

[Figure: factor graph with variables x1, . . . , x5; factor f1 attached to x1, f2 to x2, f3 to {x1, x2, x3}, f4 to {x3, x4}, and f5 to {x3, x5}; x3 is being eliminated]

Eliminating x3:

F = {f3, f4, f5}
X̃ = {x1, x2, x4, x5}
f̃(x1, x2, x4, x5) = ∑_{x3} f3(x1, x2, x3) f4(x3, x4) f5(x3, x5)
V′ = {x1, x2, x4, x5, f1, f2, f̃}
E′ = {(f1, x1), (f2, x2), (f̃, x1), (f̃, x2), (f̃, x4), (f̃, x5)}

19
• The full variable elimination algorithm just repeatedly eliminates variables:

function G′ = Sum-Product-Variable-Elimination(G, X)
    // eliminate an ordered list of variables X
    for xi ∈ X:
        G ← Sum-Product-Eliminate(G, xi)
    return G

• The graph returned at the end is a marginalized factor graph over the non-eliminated variables (eliminating all variables returns a constant equal to the partition function Z)

• The ordering matters a lot; eliminating variables in the wrong order can make the algorithm no better than enumeration

20
Variable elimination example

Goal: compute p(x4)

[Figure sequence: starting from the factor graph over x1, . . . , x5 with factors f1, f2, f3, f4, f5, the variables are eliminated one at a time]

Eliminate x1: {f1, f3} are replaced by f̃1(x2, x3); the graph is now over x2, x3, x4, x5 with factors f2, f̃1, f4, f5
Eliminate x2: {f2, f̃1} are replaced by f̃2(x3); the graph is now over x3, x4, x5 with factors f̃2, f4, f5
Eliminate x3: {f̃2, f4, f5} are replaced by f̃3(x4, x5); the graph is now over x4, x5 with factor f̃3
Eliminate x5: f̃3 is replaced by f̃4(x4), which is the desired marginal p(x4)

21
Pitfalls

• The tree-width of a graphical model is the size of the largest factor formed during variable elimination (assuming the best ordering); inference is exponential in the tree-width

• But...

– Finding the best variable elimination ordering is NP-hard

– Some “simple” graphs have high tree-width (e.g., an M × N “grid” MRF has tree-width min(M, N))

22
Extensions

• The difficulty with variable elimination as stated is that we need to “rerun” the algorithm each time we want to make an inference query

• Solution: a slight extension of variable elimination that caches intermediate factors, making a forward and backward pass over all variables (the Junction Tree or Clique Tree algorithm)

• You’ll probably see these algorithms written in terms of message passing, but these “messages” are just the intermediate factors f̃

23
MAP Inference
• Virtually identical approach can be applied to MAP inference

• The only change is replacing the sum-product operation

f̃(X̃) ← ∑_{xi} ∏_{fj ∈ F} fj(Xj)

with the max-product operation

f̃(X̃) ← max_{xi} ∏_{fj ∈ F} fj(Xj)

• If we want to find the actual maximizing assignment, we also need to keep a separate record of which xi value is maximal for each f̃(X̃)
24
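A sketch of the corresponding max-product elimination step, again in the informal (scope, table) encoding used in the earlier sketches; the extra returned table records, for each assignment of the remaining variables, which value of xi achieved the maximum, which is exactly the bookkeeping needed to recover the maximizing assignment.

from itertools import product

def max_eliminate(factors, xi):
    """Max-product elimination of xi; also returns, for each assignment of
    the new factor's scope, the xi value that achieved the max."""
    F = [f for f in factors if xi in f[0]]
    rest = [f for f in factors if xi not in f[0]]
    new_scope = tuple(sorted({v for scope, _ in F for v in scope} - {xi}))
    new_table, argmax_xi = {}, {}
    for bits in product([0, 1], repeat=len(new_scope)):
        assignment = dict(zip(new_scope, bits))
        best_val, best_xi = -1.0, None
        for xi_val in (0, 1):                       # max over xi instead of sum
            assignment[xi] = xi_val
            val = 1.0
            for scope, table in F:
                val *= table[tuple(assignment[v] for v in scope)]
            if val > best_val:
                best_val, best_xi = val, xi_val
        new_table[bits] = best_val
        argmax_xi[bits] = best_xi
    return rest + [(new_scope, new_table)], argmax_xi

# Usage with arbitrary illustrative numbers; tracing the stored argmax tables
# backwards after eliminating every variable recovers the MAP assignment.
factors = [(("x1",), {(0,): 1.0, (1,): 3.0}),
           (("x1", "x2"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0})]
reduced, back = max_eliminate(factors, "x1")
print(reduced, back)   # back[(x2,)] is the best x1 value for each x2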
Outline

Probabilistic graphical models

Exact inference

Approximate inference

25
Sampling methods

• Instead of exactly computing probabilities p(x), we may want to draw random samples from this distribution, x ∼ p(x)

• For example, in Bayesian networks this is straightforward: just sample the individual variables sequentially, parents before children,

xi ∼ p(xi | Parents(xi)), i = 1, . . . , n

• For cases where we can efficiently perform variable elimination, a slightly modified procedure lets us draw random samples (perhaps conditioned on evidence)

26
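For the alarm network from earlier, this sequential (ancestral) sampling looks like the sketch below: each variable is drawn from its CPT after its parents have been drawn, and averaging many samples gives Monte Carlo estimates of marginals. The CPT values are the ones from the example slide; the function names are our own.

import random

def bernoulli(p):
    return 1 if random.random() < p else 0

def sample_alarm_network():
    """Draw one sample x = (x1, ..., x5), sampling parents before children."""
    x1 = bernoulli(0.001)                                   # Burglary
    x2 = bernoulli(0.002)                                   # Earthquake
    x3 = bernoulli({(0, 0): 0.001, (0, 1): 0.29,
                    (1, 0): 0.94,  (1, 1): 0.95}[(x1, x2)]) # Alarm
    x4 = bernoulli({0: 0.05, 1: 0.9}[x3])                   # JohnCalls
    x5 = bernoulli({0: 0.01, 1: 0.7}[x3])                   # MaryCalls
    return (x1, x2, x3, x4, x5)

samples = [sample_alarm_network() for _ in range(100000)]
print(sum(s[3] for s in samples) / len(samples))   # Monte Carlo estimate of p(x4 = 1)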
Gibbs sampling
• But what about cases too big for variable elimination?

• A common solution: Gibbs sampling

function x = Gibbs-Sampling(G, x, K)
    for k = 1, . . . , K:
        Choose a random xi
        Sample xi ∼ p(xi | x−i) ∝ ∏_{fj : (fj, xi) ∈ E} fj(Xj)

• In the limit, x will be drawn exactly according to the desired distribution (but this may take exponentially long to converge)

• One of a broad class of methods called Markov Chain Monte Carlo (MCMC)
27
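A sketch of the Gibbs sampler for the informal (scope, table) factor encoding used in the earlier sketches; as in the pseudocode, only the factors touching the chosen xi are needed to form p(xi | x−i), which is what makes each update cheap. The factor values and starting assignment here are arbitrary.

import random

def gibbs_sampling(factors, x, K):
    """K single-variable Gibbs updates on the assignment dict x (binary variables)."""
    variables = list(x)
    for _ in range(K):
        xi = random.choice(variables)
        # Unnormalized p(xi = v | x_{-i}): product of the factors touching xi
        weights = []
        for v in (0, 1):
            x[xi] = v
            w = 1.0
            for scope, table in factors:
                if xi in scope:
                    w *= table[tuple(x[u] for u in scope)]
            weights.append(w)
        # Sample xi from the normalized two-point conditional distribution
        x[xi] = 1 if random.random() < weights[1] / (weights[0] + weights[1]) else 0
    return x

factors = [(("x1",), {(0,): 1.0, (1,): 3.0}),
           (("x1", "x2"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0})]
x = gibbs_sampling(factors, {"x1": 0, "x2": 0}, K=10000)
print(x)   # one (approximate) sample from p(x)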
Inference as optimization

• Inference in graphical models can be cast as an optimization problem; this has been a huge source of ideas for improving exact and approximate inference methods

• We’re going to consider the simpler case of MAP inference, which already looks like an optimization problem

maximize_x p(x)

• To put this in a form that we’re more familiar with, for each factor fi define the optimization variable µi ∈ R^{2^{|Xi|}}; µi should be thought of as an indicator for the assignment to Xi

28
• Abusing notation a bit, we can write the optimization as a binary integer program

maximize_{µ1,...,µm}   log p(µ) = ∑_{i=1}^m µi^T (log fi)
subject to             µ1, . . . , µm is a valid distribution
                       (µi)j ∈ {0, 1}, ∀ i, j

• “Valid distribution” here means the assignments have to be consistent, i.e., if xk ∈ Xi and xk ∈ Xj, then

∑_{Xi − {xk}} µi(Xi) = ∑_{Xj − {xk}} µj(Xj)

and each µi has to have only one non-zero entry, ∑_j (µi)j = 1

29
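To make the µi notation concrete, the sketch below (with arbitrary factor values, our own choice) builds each one-hot indicator µi for a particular full assignment and checks that ∑i µi^T log fi equals the log of the unnormalized probability of that assignment.

import math
from itertools import product

factors = [(("x1",), {(0,): 1.0, (1,): 3.0}),
           (("x1", "x2"), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0})]

assignment = {"x1": 1, "x2": 0}

objective = 0.0
for scope, table in factors:
    # Enumerate the 2^{|Xi|} joint assignments of this factor's scope
    keys = list(product([0, 1], repeat=len(scope)))
    # mu_i: one-hot indicator of which assignment of X_i is selected
    mu = [1.0 if k == tuple(assignment[v] for v in scope) else 0.0 for k in keys]
    log_f = [math.log(table[k]) for k in keys]
    objective += sum(m * lf for m, lf in zip(mu, log_f))      # mu_i^T log f_i

# Same number computed directly: log of the unnormalized probability
direct = sum(math.log(table[tuple(assignment[v] for v in scope)])
             for scope, table in factors)
print(objective, direct)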
• This is still a hard binary integer programming task, but it turns out that the LP relaxation is sometimes tight (i.e., just removing the integer constraints still gives the optimal solution)

• One case where the relaxation is tight: tree factor graphs (these are the ones we could already solve with max-product)

• Extremely cool: there are other cases where the relaxation is still tight even though naive max-product doesn’t apply, like certain grid MRFs

• Can also apply this to the case of marginal inference (let the µ terms have non-integer values, but also include terms due to the partition function, plus other constraints)

• A big area of open research


30
Take home points

• Probabilistic models can compactly represent high dimensional probability distributions

• Inference algorithms provide a method for making probabilistic queries that also (try to) exploit the structure of the distribution

• There is a wide range of inference methods, from variable elimination for exact inference to sampling and optimization approaches for approximate inference

31
