Approximate Inference

The document discusses approximate inference techniques for Bayesian networks. Because exact inference is intractable for Bayesian networks, the goal is to find efficient approximate inference methods. Specifically, it discusses randomized methods like forward sampling, likelihood weighting, Gibbs sampling, and Metropolis-Hastings algorithms that provide approximations by computing probabilities from random samples. It explains how forward sampling samples from the joint distribution rather than the conditional distribution given evidence, and how likelihood weighting addresses this by weighting samples based on the likelihood of the evidence.

Bayesian networks: approximate inference

Machine Intelligence

Thomas D. Nielsen

September 2008

Approximative inference September 2008 1 / 25


Approximate Inference

Motivation

Because of the (worst-case) intractability of exact inference in Bayesian networks, we try to find more
efficient approximate inference techniques:

Instead of computing the exact posterior

P(A | E = e),

compute an approximation

P̂(A | E = e)

with

P̂(A | E = e) ≈ P(A | E = e).

Approximate Inference

Absolute/Relative Error

For p, p̂ ∈ [0, 1]: p̂ is an approximation for p with absolute error ≤ ε if

| p − p̂ | ≤ ε, i.e. p̂ ∈ [p − ε, p + ε].

p̂ is an approximation for p with relative error ≤ ε if

| 1 − p̂/p | ≤ ε, i.e. p̂ ∈ [p(1 − ε), p(1 + ε)].

This definition is not always fully satisfactory, because it is not symmetric in p and p̂ and not
invariant under the transition p → (1 − p), p̂ → (1 − p̂). Use with care!

When p̂1, p̂2 are approximations for p1, p2 with absolute error ≤ ε, no error bounds follow for
p̂1/p̂2 as an approximation for p1/p2.

When p̂1, p̂2 are approximations for p1, p2 with relative error ≤ ε, then p̂1/p̂2 approximates p1/p2
with relative error ≤ (2ε)/(1 − ε) (the worst case is attained at p̂1 = p1(1 + ε), p̂2 = p2(1 − ε)).
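A quick numeric check of the worst case for the ratio of two relative-error approximations (a sketch in Python; `ratio_rel_error` is a helper name chosen here, not from the slides):

```python
# Worst-case relative error of p1_hat/p2_hat as an approximation of p1/p2,
# when p1_hat = p1*(1+eps) and p2_hat = p2*(1-eps):
# the ratio is off by a factor (1+eps)/(1-eps), i.e. relative error 2*eps/(1-eps).
def ratio_rel_error(p1, p2, p1_hat, p2_hat):
    return abs(1 - (p1_hat / p2_hat) / (p1 / p2))

eps = 0.1
p1, p2 = 0.4, 0.5
worst = ratio_rel_error(p1, p2, p1 * (1 + eps), p2 * (1 - eps))
print(worst)  # 2*eps/(1-eps) ≈ 0.2222
```

Absolute-error guarantees give no such bound: with p2 close to 0, an absolute perturbation of p2 changes the ratio arbitrarily.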



Approximate Inference

Randomized Methods

Most methods for approximate inference are randomized algorithms that compute approximations
P̂ from random samples of instantiations.

We shall consider:
- Forward sampling
- Likelihood weighting
- Gibbs sampling
- Metropolis-Hastings algorithm



Approximate Inference

Forward Sampling

Observation: can use Bayesian network as random generator that produces full instantiations
V = v according to distribution P(V).

Example (network A → B):

    P(A):   P(A = t) = .2,  P(A = f) = .8

    P(B | A):          B = t   B = f
             A = t      .7      .3
             A = f      .4      .6

- Generate random numbers r1, r2 uniformly from [0, 1].
- Set A = t if r1 ≤ .2 and A = f otherwise.
- Depending on the value of A and r2, set B to t or f.

Generation of one random instantiation: linear in the size of the network.
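The two steps above can be sketched directly (a minimal Python sketch of the A → B example; `forward_sample` is a name chosen here, not from the slides):

```python
import random

# Forward sampling for the two-node network A -> B from the example:
# P(A = t) = .2, P(B = t | A = t) = .7, P(B = t | A = f) = .4.
def forward_sample(rng):
    a = rng.random() <= 0.2          # r1: A = t with probability .2
    p_b_true = 0.7 if a else 0.4     # CPT row selected by the value of A
    b = rng.random() <= p_b_true     # r2 decides B
    return a, b

rng = random.Random(0)
samples = [forward_sample(rng) for _ in range(100_000)]
print(sum(a for a, _ in samples) / len(samples))  # empirical P(A = t), close to .2
```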



Approximate Inference

Sampling Algorithm

Thus, we have a randomized algorithm S that produces possible outputs from sp(V) according to
the distribution P(V). With samples S1, . . . , SN, define

P̂(A = a | E = e) := |{i ∈ 1, . . . , N | E = e, A = a in Si }| / |{i ∈ 1, . . . , N | E = e in Si }|
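This estimator is plain counting over forward samples (a self-contained Python sketch for the A → B example, with evidence B = t; samples with B ≠ t are simply discarded):

```python
import random

# Estimate P(A = t | B = t) by counting: keep only samples with B = t,
# then take the fraction of those that also have A = t.
def forward_sample(rng):
    a = rng.random() <= 0.2
    b = rng.random() <= (0.7 if a else 0.4)
    return a, b

rng = random.Random(1)
N = 200_000
with_evidence = [a for a, b in (forward_sample(rng) for _ in range(N)) if b]
p_hat = sum(with_evidence) / len(with_evidence)
print(round(p_hat, 2))  # exact value is .14/.46 ≈ 0.30
```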



Approximate Inference

Forward Sampling: Illustration

[Figure: sampled instantiations fall into three regions — not E = e; E = e but A ≠ a; and E = e, A = a.]

Approximation for P(A = a | E = e): #{samples with E = e, A = a} / #{samples with E = e}.




Approximate Inference

Sampling from the conditional distribution

Problem of forward sampling: samples with E ≠ e are useless!

Idea: find a sampling algorithm Sc that produces outputs from sp(V) according to the distribution
P(V | E = e).

A tempting approach: fix the variables in E to e and sample from the nonevidence variables
only!
Problem: only evidence from the ancestors is taken into account!

Approximate Inference

Likelihood weighting

We would like to sample from (pa(X)′′ are the parents in E):

P(U, e) = ∏_{X ∈ U\E} P(X | pa(X)′, pa(X)′′ = e) × ∏_{X ∈ E} P(X = e | pa(X)′, pa(X)′′ = e),

but by applying forward sampling with fixed E we actually sample from:

Sampling distribution = ∏_{X ∈ U\E} P(X | pa(X)′, pa(X)′′ = e).

Solution: instead of letting each sample count as 1, use the weight

w(x, e) = ∏_{X ∈ E} P(X = e | pa(X)′, pa(X)′′ = e).





Approximate Inference

Likelihood weighting: example

Using the network A → B from before:

    P(A):   P(A = t) = .2,  P(A = f) = .8

    P(B | A):          B = t   B = f
             A = t      .7      .3
             A = f      .4      .6

- Assume evidence B = t.
- Generate a random number r uniformly from [0, 1].
- Set A = t if r ≤ .2 and A = f otherwise.
- If A = t then let the sample count as w(t, t) = 0.7; otherwise as w(f, t) = 0.4.

With N samples (a1, . . . , aN) we get

P̂(A = t | B = t) = ∑_{i: ai = t} w(ai, e) / ∑_{i=1}^N w(ai, e).
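The weighted estimate can be sketched as follows (Python; `weighted_sample` is a name chosen here, not from the slides):

```python
import random

# Likelihood weighting for evidence B = t in the A -> B example:
# sample A from its prior and weight the sample by P(B = t | A).
def weighted_sample(rng):
    a = rng.random() <= 0.2
    w = 0.7 if a else 0.4            # likelihood of the evidence B = t given A
    return a, w

rng = random.Random(2)
num = den = 0.0
for _ in range(200_000):
    a, w = weighted_sample(rng)
    num += w * a                     # weight counts toward A = t only when a is True
    den += w
p_hat = num / den
print(round(p_hat, 2))  # exact value is .14/.46 ≈ 0.30
```

Note that, unlike forward sampling with rejection, no sample is discarded: every sample contributes its weight.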



Approximate Inference

Gibbs Sampling

For notational convenience assume from now on that for some l: E = Vl+1, Vl+2, . . . , Vn. Write W
for V1, . . . , Vl.

Principle: obtain a new sample from the previous sample by randomly changing the value of only one
selected variable.

Procedure Gibbs sampling

v0 = (v0,1, . . . , v0,l) := arbitrary instantiation of W
i := 1
repeat forever
    choose Vk ∈ W                     # deterministic or randomized
    generate vi,k randomly according to the distribution
        P(Vk | V1 = vi−1,1, . . . , Vk−1 = vi−1,k−1,
               Vk+1 = vi−1,k+1, . . . , Vl = vi−1,l, E = e)
    set vi := (vi−1,1, . . . , vi−1,k−1, vi,k, vi−1,k+1, . . . , vi−1,l)
    i := i + 1
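The procedure can be sketched on a toy target (the explicit joint table P below over two binary variables is an illustrative stand-in for the network's conditionals, not from the slides):

```python
import random

# Gibbs sampling over W = (X, Y) for a toy joint P(X, Y) on {0,1}^2.
# Resampling one variable uses its conditional given the other, obtained
# from the table by normalization -- exactly the sampling step above.
P = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def gibbs(n_steps, rng):
    x, y = 0, 0                          # arbitrary starting instantiation v0
    samples = []
    for _ in range(n_steps):
        if rng.randrange(2) == 0:        # randomized choice of the variable Vk
            p1 = P[(1, y)] / (P[(0, y)] + P[(1, y)])   # P(X = 1 | Y = y)
            x = int(rng.random() < p1)
        else:
            p1 = P[(x, 1)] / (P[(x, 0)] + P[(x, 1)])   # P(Y = 1 | X = x)
            y = int(rng.random() < p1)
        samples.append((x, y))
    return samples

rng = random.Random(3)
samples = gibbs(200_000, rng)[50_000:]   # discard a burn-in prefix
p_x1 = sum(x for x, _ in samples) / len(samples)
print(round(p_x1, 2))  # true marginal P(X = 1) = 0.1 + 0.4 = 0.5
```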



Approximate Inference

Illustration

The process of Gibbs sampling can be understood as a random walk in the space of all
instantiations with E = e:

Reachable in one step: instantiations that differ from current one by value assignment to at most
one variable (assume randomized choice of variable Vk ).



Approximate Inference

Implementation of Sampling Step

The sampling step

generate randomly vi,k according to distribution


P(Vk | V1 = vi−1,1 , . . . , Vk −1 = vi−1,k −1 ,
Vk +1 = vi−1,k +1 , . . . , Vl = vi−1,l , E = e)

requires sampling from a conditional distribution. In this special case (all variables but one are
instantiated) this is easy: just compute for each v ∈ sp(Vk) the probability

P(V1 = vi−1,1, . . . , Vk−1 = vi−1,k−1, Vk = v, Vk+1 = vi−1,k+1, . . . , Vl = vi−1,l, E = e)

(linear in network size), and choose vi,k according to these probabilities (normalized).
This can be further simplified by computing the distribution on sp(Vk) only over the Markov blanket of
Vk, i.e. the subnetwork consisting of Vk, its parents, its children, and the parents of its children.



Approximate Inference

Convergence of Gibbs Sampling

Under certain conditions, the distribution of samples converges to the posterior distribution
P(W | E = e):

lim_{i→∞} P(vi = v) = P(W = v | E = e)   (v ∈ sp(W)).

Sufficient conditions are:

- in the repeat loop of the Gibbs sampler, variable Vk is randomly selected (with non-zero
  selection probability for all Vk ∈ W), and
- the Bayesian network has no zero entries in its CPTs.



Approximate Inference

Approximate Inference using Gibbs Sampling

1. Start Gibbs sampling with some starting configuration v0.
2. Run the sampler for N steps ("burn-in").
3. Run the sampler for M additional steps; use the relative frequency of state v in
   these M samples as an estimate for P(W = v | E = e).

Problems:
- How large must N be chosen? It is difficult to say how long it takes for the Gibbs sampler to converge!
- Even when sampling is from the stationary distribution, samples are not independent. Result:
  the error cannot be bounded as a function of M using Chebyshev's inequality (or related methods).



Approximate Inference

Effect of dependence

P(vN = v) close to P(W = v | E = e): the probability that vN is in the red region is close to
P(A = a | E = e).

This does not guarantee that the fraction of samples in vN, vN+1, . . . , vN+M that are in the red
region yields a good approximation to P(A = a | E = e)!

[Figure: two random walks v0 → vN → vN+M through the instantiation space.]



Approximate Inference

Multiple starting points

In practice, one tries to counteract these difficulties by restarting the Gibbs sampling several times
(often with different starting points):

[Figure: three random walks v0 → vN → vN+M, started from different initial instantiations.]



Approximate Inference

Metropolis-Hastings Algorithm

Another way of constructing a random walk on sp(W):

Let
{q(v, v′) | v, v′ ∈ sp(W)}
be a set of transition probabilities over sp(W), i.e. q(v, ·) is a probability distribution for each
v ∈ sp(W). The q(v, v′) are called proposal probabilities.

Define

α(v, v′) := min{ 1, [P(W = v′ | E = e) q(v′, v)] / [P(W = v | E = e) q(v, v′)] }
          = min{ 1, [P(W = v′, E = e) q(v′, v)] / [P(W = v, E = e) q(v, v′)] }

(the two forms are equal because the normalization constant P(E = e) cancels in the quotient).

α(v, v′) is called the acceptance probability for the transition from v to v′.



Approximate Inference

Procedure Metropolis-Hastings sampling

v0 = (v0,1, . . . , v0,l) := arbitrary instantiation of W
i := 1
repeat forever
    sample v′ according to the distribution q(vi−1, ·)
    set accept to true with probability α(vi−1, v′)
    if accept
        vi := v′
    else
        vi := vi−1
    i := i + 1
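A minimal sketch of this procedure (the toy target P below and the uniform proposal are illustrative choices, not from the slides; with a uniform q the proposal terms cancel in α):

```python
import random

# Metropolis-Hastings on a toy target P over {0,1}^2 with the uniform
# proposal q(v, v') = 1/4, so alpha reduces to min(1, P(v')/P(v)).
P = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
STATES = list(P)

def metropolis_hastings(n_steps, rng):
    v = (0, 0)                                   # arbitrary starting instantiation v0
    samples = []
    for _ in range(n_steps):
        v_new = STATES[rng.randrange(len(STATES))]   # sample from q(v, .)
        alpha = min(1.0, P[v_new] / P[v])            # acceptance probability
        if rng.random() < alpha:
            v = v_new                            # accept: move to the proposed state
        samples.append(v)                        # on rejection the old state repeats
    return samples

rng = random.Random(4)
samples = metropolis_hastings(200_000, rng)[50_000:]
freq_11 = sum(1 for s in samples if s == (1, 1)) / len(samples)
print(round(freq_11, 2))  # target probability P((1,1)) = 0.4
```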



Approximate Inference

Convergence of Metropolis Hastings Sampling

Under certain conditions: the distribution of samples converges to the posterior distribution
P(W | E = e).

A sufficient condition is:

q(v, v′) > 0 for all v, v′.

To obtain good performance, q should be chosen so as to obtain high acceptance probabilities,
i.e. the quotients

[P(W = v′ | E = e) q(v′, v)] / [P(W = v | E = e) q(v, v′)]

should be close to 1. Optimal (but usually not feasible): q(v, v′) = P(W = v′ | E = e). Generally:
try to approximate the target distribution P(W | E = e) with q.



Loopy belief propagation

Loopy belief propagation

- A message-passing algorithm, like junction-tree propagation
- Works directly on the Bayesian network structure (rather than on a junction tree)




Loopy belief propagation

Message passing
A node sends a message to a neighbor by
- multiplying the incoming messages from all other neighbors onto the potential it holds,
- marginalizing the result down to the separator.

[Figure: node C holds the potential P(C | A, B); incoming messages φA, φB arrive from parents A, B
and φD, φE from children D, E; outgoing messages πE(C) to E and λC(A) to A are shown.]

πE(C) = φD ∑_{A,B} P(C | A, B) φA φB        λC(A) = ∑_{B,C} P(C | A, B) φB φD φE
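The message πE(C) can be computed directly; the numeric CPT and incoming messages below are made-up values, only to illustrate the multiply-then-marginalize step (φD is treated as the message from child D, a function of C):

```python
from itertools import product

# Incoming messages at C (hypothetical numeric values), each a table
# over the two states 0 and 1 of the relevant variable.
phi_A = {0: 0.6, 1: 0.4}   # message from parent A, a function of A
phi_B = {0: 0.3, 1: 0.7}   # message from parent B, a function of B
phi_D = {0: 0.5, 1: 0.5}   # message from child D, a function of C
# Hypothetical CPT P(C | A, B), indexed as P_C[(a, b)][c]
P_C = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.6, 1: 0.4},
       (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.2, 1: 0.8}}

def pi_E(c):
    # pi_E(C) = phi_D(C) * sum_{A,B} P(C | A, B) phi_A(A) phi_B(B):
    # multiply all incoming messages except the one from E, marginalize out A, B.
    total = sum(P_C[(a, b)][c] * phi_A[a] * phi_B[b]
                for a, b in product((0, 1), repeat=2))
    return phi_D[c] * total

print([round(pi_E(c), 3) for c in (0, 1)])  # [0.253, 0.247]
```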



Loopy belief propagation

Example: error sources

A few observations:
- When calculating P(E) we treat C and D as being independent (in the junction tree, C and D
  would appear in the same separator).
- Evidence on a converging connection may cause the error to cycle.



Loopy belief propagation

In general

- There is no guarantee of convergence, nor, in case of convergence, that it will converge to the
  correct distribution. However, the method converges to the correct distribution surprisingly often!
- If the network is singly connected, convergence is guaranteed.



Approximate Inference

Literature

- R.M. Neal: Probabilistic Inference Using Markov Chain Monte Carlo Methods. Technical
  Report CRG-TR-93-1, Department of Computer Science, University of Toronto, 1993.
  http://omega.albany.edu:8008/neal.pdf
- P. Dagum, M. Luby: Approximating probabilistic inference in Bayesian belief networks is
  NP-hard. Artificial Intelligence 60, 1993.
