
LEARNING AND INFERENCE IN GRAPHICAL MODELS

Chapter 04: Exact Inference in Bayesian Networks

Dr. Martin Lauer

University of Freiburg
Machine Learning Lab

Karlsruhe Institute of Technology


Institute of Measurement and Control Systems

Learning and Inference in Graphical Models. Chapter 04 – p. 1/23


References for this chapter

◮ Christopher M. Bishop, Pattern Recognition and Machine Learning, ch. 8, Springer, 2006
◮ Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, ch. 14, Prentice Hall, 2003
◮ Judea Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1989
◮ Steffen L. Lauritzen and David J. Spiegelhalter, Local computations with probabilities on graphical structures and their applications to expert systems, In: The Journal of the Royal Statistical Society, vol. 50, no. 2, pp. 157–224, 1988
◮ Brendan J. Frey and David J. C. MacKay, A Revolution: Belief Propagation in Graphs with Cycles, In: Advances in Neural Information Processing Systems (NIPS), vol. 10, pp. 479–485, 1997, http://books.nips.cc/papers/files/nips10/0479.pdf

Learning and Inference in Graphical Models. Chapter 04 – p. 2/23


Inference

Given a graphical model with unobserved nodes U and observed nodes O, we want to draw conclusions about the distribution of the unobserved nodes:
◮ calculate the single-node marginal distribution $p(x \mid O)$ with $x \in U$
◮ calculate the maximum a posteriori estimator $\arg\max_{\vec{u}} p(U = \vec{u} \mid O = \vec{o})$
Exact inference is not always possible. We focus on the easier cases, polytrees:
◮ a polytree is a directed acyclic graph whose underlying undirected graph is acyclic (a sketch of this check follows below).
(figures: an example of a polytree and a counterexample of a polytree)
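As an added illustration (not from the original slides), a minimal sketch of how this property can be checked for a DAG given as a list of directed edges; the function name is_polytree and the use of the later running example are assumptions made here:

```python
# Hypothetical helper, not part of the lecture material: check whether a DAG,
# given as a list of directed edges, is a polytree in the sense of the slide,
# i.e. whether its underlying undirected graph is acyclic.

def is_polytree(nodes, edges):
    parent = {n: n for n in nodes}          # union-find structure over the nodes

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path compression
            n = parent[n]
        return n

    for a, b in edges:                      # ignore edge directions
        ra, rb = find(a), find(b)
        if ra == rb:                        # joining them would close an undirected cycle
            return False
        parent[ra] = rb
    return True

# The example network used in the following slides (U1..U7, X, O1, O2) is a polytree:
nodes = ["U1", "U2", "U3", "U4", "U5", "U6", "U7", "X", "O1", "O2"]
edges = [("U1", "U2"), ("U2", "X"), ("U3", "X"), ("X", "U5"), ("U4", "U5"),
         ("U4", "O1"), ("U5", "U6"), ("U5", "O2"), ("O1", "U7")]
print(is_polytree(nodes, edges))            # True
```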

Learning and Inference in Graphical Models. Chapter 04 – p. 3/23


Marginalization

Calculate $p(X = x \mid O_1 = o_1, O_2 = o_2)$ for the network on the right (figure: Bayesian network with unobserved nodes $U_1, \ldots, U_7$, $X$ and observed nodes $O_1$, $O_2$).

$$p(X = x \mid O_1 = o_1, O_2 = o_2) = \frac{p(X = x, O_1 = o_1, O_2 = o_2)}{p(O_1 = o_1, O_2 = o_2)} = \frac{p(X = x, O_1 = o_1, O_2 = o_2)}{\int p(X = x', O_1 = o_1, O_2 = o_2)\, dx'}$$

with

$$p(X = x, O_1 = o_1, O_2 = o_2) = \int \cdots \int p(X = x, O_1 = o_1, O_2 = o_2, U_1 = u_1, \ldots, U_7 = u_7)\; du_1\, du_2\, du_3\, du_4\, du_5\, du_6\, du_7$$

→ continue on blackboard

Learning and Inference in Graphical Models. Chapter 04 – p. 4/23


Factor graph

A factor graph is a bipartite graph with two kinds of nodes:
◮ variable nodes that model random variables, as in a Bayesian network
◮ factor nodes that model a probabilistic relationship between variable nodes. Each factor node is assigned a factor, i.e. a function that models the stochastic relationship.
Variable nodes and factor nodes are connected by undirected links.
For each Bayesian polytree we can create a factor graph as follows (see the sketch below):
◮ the set of variable nodes is taken from the nodes of the Bayesian polytree
◮ for each factor $p(X \mid \mathrm{Pred}(X))$ in the Bayesian network
• we create a new factor node $f$
• we connect $X$ and $\mathrm{Pred}(X)$ with $f$
• we assign $f(x, y_1, \ldots, y_n) \leftarrow p(X = x \mid \mathrm{Pred}(X) = (y_1, \ldots, y_n))$
Hence, the joint probability of the Bayesian polytree is equal to the product of all factors of the factor graph.
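As an added illustration (not part of the slides), a minimal sketch of this construction for discrete variables; the dictionary-based CPT representation, the function name build_factor_graph, and the tiny two-node example are assumptions made for the sketch:

```python
# Minimal sketch of turning a discrete Bayesian polytree into a factor graph.
# Each CPT p(X | Pred(X)) is given as a dict mapping value tuples
# (x, y1, ..., yn) to probabilities; this representation is an assumption
# made here, not the lecture's notation.

def build_factor_graph(cpts):
    """cpts: dict mapping X -> (list of predecessors of X, table dict)."""
    variable_nodes = set(cpts.keys())
    factor_nodes = []                       # one factor node per CPT
    for x, (preds, table) in cpts.items():
        factor_nodes.append({
            "name": f"f_{x}",
            "neighbors": [x] + list(preds), # undirected links to X and Pred(X)
            "factor": table,                # f(x, y1, ..., yn) = p(x | y1, ..., yn)
        })
    return variable_nodes, factor_nodes

# Tiny example: U1 -> U2 with binary variables
cpts = {
    "U1": ([], {(0,): 0.6, (1,): 0.4}),
    "U2": (["U1"], {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}),
}
variables, factors = build_factor_graph(cpts)
print(variables)                      # e.g. {'U1', 'U2'}
print(factors[1]["neighbors"])        # ['U2', 'U1']
```

The product of all stored factors over a joint assignment reproduces the joint probability of the polytree, as stated above.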
Learning and Inference in Graphical Models. Chapter 04 – p. 5/23
Factor graphs

Example: factor graph for the Bayesian network on the right (figure: the factor graph with factor nodes $f_1, \ldots, f_{10}$ attached to the variable nodes $U_1, \ldots, U_7$, $X$, $O_1$, $O_2$)

$f_1(u_1) = p(U_1 = u_1)$
$f_2(u_1, u_2) = p(U_2 = u_2 \mid U_1 = u_1)$
$f_3(u_3) = p(U_3 = u_3)$
$f_4(u_2, u_3, x) = p(X = x \mid U_2 = u_2, U_3 = u_3)$
$f_5(u_4) = p(U_4 = u_4)$
$f_6(x, u_4, u_5) = p(U_5 = u_5 \mid X = x, U_4 = u_4)$
$f_7(u_4, o_1) = p(O_1 = o_1 \mid U_4 = u_4)$
$f_8(u_5, u_6) = p(U_6 = u_6 \mid U_5 = u_5)$
$f_9(u_5, o_2) = p(O_2 = o_2 \mid U_5 = u_5)$
$f_{10}(o_1, u_7) = p(U_7 = u_7 \mid O_1 = o_1)$

Learning and Inference in Graphical Models. Chapter 04 – p. 6/23


Marginalization on factor graphs
Task: calculate
$$I(x) = \int \cdots \int f_1(u_1)\, f_2(u_1, u_2)\, f_3(u_3)\, f_4(u_2, u_3, x)\, f_5(u_4)\, f_6(x, u_4, u_5)\, f_7(u_4, o_1)\, f_8(u_5, u_6)\, f_9(u_5, o_2)\, f_{10}(o_1, u_7)\; du_1\, du_2\, du_3\, du_4\, du_5\, du_6\, du_7$$
(figure: the factor graph from the previous slide)
Observations:
◮ the factor graph is a tree with root $X$
◮ $I(x)$ can be split into two large factors
$$m_{f_4 \to X}(x) = \iiint f_1(u_1)\, f_2(u_1, u_2)\, f_3(u_3)\, f_4(u_2, u_3, x)\; du_1\, du_2\, du_3$$
$$m_{f_6 \to X}(x) = \int \cdots \int f_5(u_4)\, f_6(x, u_4, u_5)\, f_7(u_4, o_1)\, f_8(u_5, u_6)\, f_9(u_5, o_2)\, f_{10}(o_1, u_7)\; du_4\, du_5\, du_6\, du_7$$
Learning and Inference in Graphical Models. Chapter 04 – p. 7/23
Marginalization on factor graphs

In $m_{f_4 \to X}$ we can factor out $f_3$ and $f_4$:
$$m_{f_4 \to X}(x) = \iint \underbrace{\int f_1(u_1)\, f_2(u_1, u_2)\, du_1}_{=:\, m_{f_2 \to U_2}(u_2)}\; \underbrace{f_3(u_3)}_{=:\, m_{f_3 \to U_3}(u_3)}\; f_4(u_2, u_3, x)\; du_2\, du_3$$
and rewrite $m_{f_2 \to U_2}(u_2)$ as
$$m_{f_2 \to U_2}(u_2) = \int \underbrace{f_1(u_1)}_{=:\, m_{f_1 \to U_1}(u_1)} f_2(u_1, u_2)\, du_1$$
(figure: the upper part of the factor graph with $f_1$, $U_1$, $f_2$, $f_3$, $U_2$, $U_3$, $f_4$, $X$)
Observations:
◮ the calculation can be split along the branches of the tree
◮ the leaf nodes can serve as starting points for the calculation
◮ only multiplication and integration/summation occur
◮ intermediate results can be interpreted as "messages" sent from one node to its neighbors
Learning and Inference in Graphical Models. Chapter 04 – p. 8/23
Marginalization on factor graphs

$m_{f_6 \to X}(x)$ can be split in a similar manner:
$$m_{f_6 \to X}(x) = \iint f_6(x, u_4, u_5)\, m_{U_4 \to f_6}(u_4)\, m_{U_5 \to f_6}(u_5)\; du_4\, du_5$$
$$m_{U_4 \to f_6}(u_4) = m_{f_5 \to U_4}(u_4) \cdot m_{f_7 \to U_4}(u_4)$$
$$m_{U_5 \to f_6}(u_5) = m_{f_8 \to U_5}(u_5) \cdot m_{f_9 \to U_5}(u_5)$$
$$m_{f_8 \to U_5}(u_5) = \int f_8(u_5, u_6)\, m_{U_6 \to f_8}(u_6)\, du_6$$
$$m_{U_6 \to f_8}(u_6) = 1$$
$$m_{f_9 \to U_5}(u_5) = f_9(u_5, o_2) \quad \text{with observed } o_2$$
(figure: the lower part of the factor graph)
If we want to extend the procedure to observed nodes, we could also argue
$$m_{f_9 \to U_5}(u_5) = \int f_9(u_5, o_2')\, m_{O_2 \to f_9}(o_2')\, do_2'$$
$$m_{O_2 \to f_9}(o_2') = \delta(o_2' - o_2), \quad \text{where } \delta \text{ is the Dirac distribution}$$
Learning and Inference in Graphical Models. Chapter 04 – p. 9/23
Side topic: Dirac distribution

The Dirac δ is a distribution that can be used to model discrete distributions in continuous space:
$$\delta(x) = \begin{cases} 0 & \text{if } x \neq 0 \\ \infty & \text{if } x = 0 \end{cases}$$
so that
$$\int_{-\infty}^{\infty} \delta(x)\, dx = 1$$

Examples
◮ if X is distributed w.r.t. the Dirac distribution, X can only take the value 0
◮ if Y is distributed w.r.t. 0.3 · δ(y − 2) + 0.7 · δ(y + 5.1), Y takes the value 2 with probability 0.3 and −5.1 with probability 0.7
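As a tiny added illustration (not part of the slides), sampling from the mixture 0.3 · δ(y − 2) + 0.7 · δ(y + 5.1) amounts to drawing from a two-point discrete distribution:

```python
import random

# Sampling Y ~ 0.3 * delta(y - 2) + 0.7 * delta(y + 5.1): the Dirac mixture
# is simply a discrete distribution embedded in continuous space.

def sample_y():
    return 2.0 if random.random() < 0.3 else -5.1

samples = [sample_y() for _ in range(100000)]
print(sum(s == 2.0 for s in samples) / len(samples))   # approximately 0.3
```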

Learning and Inference in Graphical Models. Chapter 04 – p. 10/23


Belief propagation

The example motivates a generic algorithm known as the sum-product algorithm or belief propagation (Pearl, 1989; Lauritzen and Spiegelhalter, 1988):

◮ factor nodes $f$ generate messages and send them to variable nodes $V$: $m_{f \to V}(v)$
◮ variable nodes $V$ generate messages and send them to factor nodes $f$: $m_{V \to f}(v)$
◮ messages are like distributions (but not necessarily normalized)
◮ a message from a node $n$ to a neighboring node $n'$ can be generated as soon as $n$ has received messages from all its neighbors except $n'$
◮ hence, the method can start at the leaf nodes and follow the branches of the tree until the node of interest is reached (dynamic programming principle; a scheduling sketch follows below)
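A minimal sketch (an addition, not the lecture's code) of this scheduling rule on an arbitrary tree-structured factor graph given as an adjacency dict; the function and variable names are assumptions made here:

```python
from collections import deque

# Message scheduling on a tree: the message n -> n' can be generated as soon
# as n has received messages from all of its neighbors except n'. Starting at
# the leaves, this yields one valid order for all 2 * (#edges) messages.

def message_schedule(adjacency):
    pending = {n: set(neigh) for n, neigh in adjacency.items()}  # still-awaited senders
    scheduled = set()
    queue = deque()
    for n, neigh in adjacency.items():
        if len(neigh) == 1:                      # leaves can send right away
            edge = (n, next(iter(neigh)))
            queue.append(edge)
            scheduled.add(edge)
    order = []
    while queue:
        sender, receiver = queue.popleft()
        order.append((sender, receiver))
        pending[receiver].discard(sender)
        for target in adjacency[receiver]:
            edge = (receiver, target)
            # receiver may send to target once it awaits at most target itself
            if edge not in scheduled and pending[receiver] <= {target}:
                queue.append(edge)
                scheduled.add(edge)
    return order

# toy chain U1 - f1 - U2:
print(message_schedule({"U1": {"f1"}, "f1": {"U1", "U2"}, "U2": {"f1"}}))
```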

Learning and Inference in Graphical Models. Chapter 04 – p. 11/23


Belief propagation

How are the messages created?

◮ messages from unobserved variable nodes to factor nodes:
$$m_{X \to f}(x) = 1 \cdot \prod_i m_{f_i \to X}(x)$$
if $f, f_1, \ldots, f_n$ are the neighbors of $X$
◮ messages from observed variable nodes to factor nodes:
$$m_{X \to f}(x) = \delta(x - x') \cdot \prod_i m_{f_i \to X}(x)$$
if $f, f_1, \ldots, f_n$ are the neighbors of $X$ and $x'$ is the observed value at $X$
◮ messages from factor nodes to variable nodes:
$$m_{f \to X}(x) = \int \cdots \int f(x, y_1, \ldots, y_n) \prod_i m_{Y_i \to f}(y_i)\; dy_1 \ldots dy_n$$
if $X, Y_1, \ldots, Y_n$ are the neighbors of $f$
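A minimal sketch (an addition, not the lecture's own code) of these update rules for the discrete case, where integrals become sums and messages are plain dictionaries from values to non-negative numbers; all function names and data structures are assumptions for illustration:

```python
import itertools

# Sum-product update rules for discrete variables.
# Messages are dicts: value -> non-negative number.

def message_variable_to_factor(incoming, domain, observed=None):
    """Product of all incoming factor messages; an observed variable
    additionally contributes an indicator (the discrete analogue of the
    Dirac delta)."""
    msg = {}
    for x in domain:
        p = 1.0
        for m in incoming:                  # messages m_{f_i -> X}
            p *= m[x]
        if observed is not None and x != observed:
            p = 0.0
        msg[x] = p
    return msg

def message_factor_to_variable(factor, target, neighbors, domains, incoming):
    """Sum over all configurations of the other neighbors of
    f(x, y1, ..., yn) times the product of the incoming messages m_{Y_i -> f}."""
    others = [v for v in neighbors if v != target]
    msg = {x: 0.0 for x in domains[target]}
    for x in domains[target]:
        for combo in itertools.product(*(domains[v] for v in others)):
            assignment = dict(zip(others, combo), **{target: x})
            value = factor(assignment)
            for v, y in zip(others, combo):
                value *= incoming[v][y]
            msg[x] += value
    return msg
```

On a tree, once all factor-to-variable messages into a node X have been computed, their product is proportional to p(X = x, O = o); normalizing it yields the single-node marginal p(x | O).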

Learning and Inference in Graphical Models. Chapter 04 – p. 12/23


Belief propagation

Example ("exam problem") with three variables: the level of comprehension C, the mental mood M, and the result of the exam E (figure: Bayesian network in which C and M are the parents of E)

c            2     3     4
P(C = c)    1/3   1/3   1/3

m           −1    +1
P(M = m)    3/4   1/4

$$P(E = e \mid C = c, M = m) = \begin{cases} \frac{1}{4} & \text{if } e = c + m \\ \frac{1}{2} & \text{if } e = c \\ \frac{1}{4} & \text{if } e = c - m \\ 0 & \text{otherwise} \end{cases}$$

Apply belief propagation to calculate P(C | E) for E = 2.
→ blackboard
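A small numerical sketch (an addition to the slide, not the blackboard solution) of the same computation: since E is observed and the tree is tiny, belief propagation reduces to a single factor-to-variable message into C followed by normalization:

```python
# "Exam problem": P(C | E = 2) by message passing on the tree C -> E <- M.
# With E observed, m_{f_E -> C}(c) = sum_m P(E=2 | c, m) * P(m), and the
# posterior is proportional to P(c) * m_{f_E -> C}(c).

P_C = {2: 1/3, 3: 1/3, 4: 1/3}
P_M = {-1: 3/4, +1: 1/4}

def P_E_given_CM(e, c, m):
    if e == c + m:
        return 1/4
    if e == c:
        return 1/2
    if e == c - m:
        return 1/4
    return 0.0

e_obs = 2
message = {c: sum(P_E_given_CM(e_obs, c, m) * P_M[m] for m in P_M) for c in P_C}
unnormalized = {c: P_C[c] * message[c] for c in P_C}
Z = sum(unnormalized.values())
posterior = {c: p / Z for c, p in unnormalized.items()}
print(posterior)   # should give roughly {2: 0.667, 3: 0.333, 4: 0.0}
```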

Learning and Inference in Graphical Models. Chapter 04 – p. 13/23


Belief propagation

Belief propagation works if either
◮ all distributions are categorical (as in the example above), or
◮ all distributions are conjugate, or
◮ all distributions are Gaussian and the variables depend linearly, i.e. $X \mid Y_1, \ldots, Y_n \sim \mathcal{N}(c_1 Y_1 + \cdots + c_n Y_n, \sigma^2)$ with fixed values $c_1, \ldots, c_n, \sigma^2$.
Otherwise, the integrals might become analytically intractable.

Gauss-linear example: "exam problem" with
$$C \sim \mathcal{N}(3, 4), \quad M \sim \mathcal{N}(0, 1), \quad E \mid C, M \sim \mathcal{N}(C + M, 1)$$
→ blackboard/homework (a closed-form sketch follows below)
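A short sketch (an addition, not the official homework solution) of the Gauss-linear case, assuming the second parameter of N(·, ·) denotes the variance: the messages stay Gaussian, and the posterior of C given E = 2 follows from conditioning the joint Gaussian of (C, E):

```python
# Gauss-linear "exam problem": C ~ N(3, 4), M ~ N(0, 1), E | C, M ~ N(C + M, 1).
# Assuming N(mu, sigma^2) with the second argument as variance, E is Gaussian
# with mean 3 and variance 4 + 1 + 1, and cov(C, E) = var(C) = 4.
# Conditioning the joint Gaussian gives the posterior of C given E = e.

mu_C, var_C = 3.0, 4.0
var_M = 1.0
var_noise = 1.0

mu_E = mu_C + 0.0                      # E[E] = E[C] + E[M]
var_E = var_C + var_M + var_noise      # independent contributions add up
cov_CE = var_C                         # cov(C, C + M + noise) = var(C)

e_obs = 2.0
post_mean = mu_C + cov_CE / var_E * (e_obs - mu_E)   # 3 - 4/6 = 7/3
post_var = var_C - cov_CE**2 / var_E                 # 4 - 16/6 = 4/3
print(post_mean, post_var)
```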

Learning and Inference in Graphical Models. Chapter 04 – p. 14/23


MAP estimator

Second task: calculate the maximum a posteriori (MAP) estimator
$$\arg\max_{\vec{u}} p(U = \vec{u} \mid O = \vec{o})$$
Again, we focus on polytrees only.
Calculate the MAP estimator for the model on the right (figure: the example Bayesian network, with node $U_8$ in place of $X$).

$$\arg\max_{\vec{u}} p(U = \vec{u} \mid O = \vec{o}) = \arg\max_{\vec{u}} \frac{p(U = \vec{u}, O = \vec{o})}{p(O = \vec{o})} = \arg\max_{\vec{u}} p(U = \vec{u}, O = \vec{o}) = \arg\max_{\vec{u}} \log p(U = \vec{u}, O = \vec{o})$$

Learning and Inference in Graphical Models. Chapter 04 – p. 15/23


MAP estimator

$$\log p(U = \vec{u}, O = \vec{o}) = \log\Big( \prod_i f(u_i, \mathrm{Pred}(U_i)) \cdot \prod_j f(o_j, \mathrm{Pred}(O_j)) \Big) = \sum_i \log f(u_i, \mathrm{Pred}(U_i)) + \sum_j \log f(o_j, \mathrm{Pred}(O_j))$$
(figure: the factor graph with factor nodes $f_1, \ldots, f_{10}$ and variable nodes $U_1, \ldots, U_8$, $O_1$, $O_2$)
Choose one node as root node (e.g. $U_8$):
$$\max_{u_1, \ldots, u_8} \log p(U = \vec{u}, O = \vec{o}) = \max_{u_8} \big( m_{f_4 \to U_8}(u_8) + m_{f_6 \to U_8}(u_8) \big)$$
where $m_{f_4 \to U_8}$ contains all terms related to $f_1, \ldots, f_4$ and $m_{f_6 \to U_8}$ contains all terms related to $f_5, \ldots, f_{10}$.

Learning and Inference in Graphical Models. Chapter 04 – p. 16/23


MAP estimator

$$m_{f_6 \to U_8}(u_8) = \max_{u_4, u_5} \big( \log f_6(u_4, u_5, u_8) + m_{U_4 \to f_6}(u_4) + m_{U_5 \to f_6}(u_5) \big)$$
$$m_{U_5 \to f_6}(u_5) = m_{f_8 \to U_5}(u_5) + m_{f_9 \to U_5}(u_5)$$
$$m_{f_8 \to U_5}(u_5) = \max_{u_6} \big( \log f_8(u_5, u_6) + m_{U_6 \to f_8}(u_6) \big)$$
$$m_{U_6 \to f_8}(u_6) = 0$$
$$m_{f_9 \to U_5}(u_5) = \log f_9(u_5, o_2) = \max_{o_2'} \big( \log f_9(u_5, o_2') + m_{O_2 \to f_9}(o_2') \big)$$
$$m_{O_2 \to f_9}(o_2') = \begin{cases} 0 & \text{if } o_2' = o_2 \\ -\infty & \text{otherwise} \end{cases}$$
(figure: the lower part of the factor graph)

Learning and Inference in Graphical Models. Chapter 04 – p. 17/23


MAP estimator

$$m_{U_4 \to f_6}(u_4) = m_{f_5 \to U_4}(u_4) + m_{f_7 \to U_4}(u_4)$$
$$m_{f_5 \to U_4}(u_4) = \log f_5(u_4)$$
$$m_{f_7 \to U_4}(u_4) = \max_{o_1'} \big( \log f_7(u_4, o_1') + m_{O_1 \to f_7}(o_1') \big) = \log f_7(u_4, o_1) + m_{f_{10} \to O_1}(o_1)$$
$$m_{O_1 \to f_7}(o_1') = m_{f_{10} \to O_1}(o_1') + \begin{cases} 0 & \text{if } o_1' = o_1 \\ -\infty & \text{otherwise} \end{cases}$$
$$m_{f_{10} \to O_1}(o_1') = \max_{u_7} \big( \log f_{10}(o_1', u_7) + m_{U_7 \to f_{10}}(u_7) \big)$$
$$m_{U_7 \to f_{10}}(u_7) = 0$$
(figure: the lower part of the factor graph)

Learning and Inference in Graphical Models. Chapter 04 – p. 18/23


Max-sum algorithm

The example motivates a generic algorithm to calculate $\max_{\vec{u}} \log p(U = \vec{u}, O = \vec{o})$, known as the max-sum algorithm:

◮ factor nodes $f$ generate messages and send them to variable nodes $V$: $m_{f \to V}(v)$
◮ variable nodes $V$ generate messages and send them to factor nodes $f$: $m_{V \to f}(v)$
◮ messages are functions of a single variable
◮ a message from a node $n$ to a neighboring node $n'$ can be generated as soon as $n$ has received messages from all its neighbors except $n'$
◮ hence, the method can start at the leaf nodes and follow the branches of the tree until the node of interest is reached (dynamic programming principle)

Learning and Inference in Graphical Models. Chapter 04 – p. 19/23


Max-sum algorithm

How are the messages created?

◮ messages from unobserved variable nodes to factor nodes:
$$m_{X \to f}(x) = \sum_i m_{f_i \to X}(x) + 0$$
if $f, f_1, \ldots, f_n$ are the neighbors of $X$
◮ messages from observed variable nodes to factor nodes:
$$m_{X \to f}(x) = \sum_i m_{f_i \to X}(x) + \begin{cases} 0 & \text{if } x = x' \\ -\infty & \text{otherwise} \end{cases}$$
if $f, f_1, \ldots, f_n$ are the neighbors of $X$ and $x'$ is the observed value at $X$
◮ messages from factor nodes to variable nodes:
$$m_{f \to X}(x) = \max_{y_1, \ldots, y_n} \Big( \log f(x, y_1, \ldots, y_n) + \sum_i m_{Y_i \to f}(y_i) \Big)$$
if $X, Y_1, \ldots, Y_n$ are the neighbors of $f$

Learning and Inference in Graphical Models. Chapter 04 – p. 20/23


Max-sum algorithm

How do we calculate $\arg\max_{\vec{u}} \log p(U = \vec{u}, O = \vec{o})$ with the max-sum algorithm?

◮ basic idea: backtracking
◮ in each maximization step we record the maximizing value of each variable
◮ after having calculated the maximum over the whole tree, we backtrack through all branches following the recorded values

Example: factor graph with binary variables U1, U2, U3, where factor node f1 connects U1 and U2 and factor node f2 connects U2 and U3, with

f1(u1, u2)    u2 = 0    u2 = 1
u1 = 0         1/4       1/8
u1 = 1         1/8       1/2

f2(u2, u3)    u3 = 0    u3 = 1
u2 = 0         1/2        0
u2 = 1         1/3       1/6

→ blackboard (a code sketch with backtracking follows below)
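A code sketch (an addition, not the blackboard solution) of max-sum with backtracking on this chain, choosing U1 as the root; the variable and function names are assumptions made here:

```python
import math

# Max-sum with backtracking on the chain U1 - f1 - U2 - f2 - U3
# (binary variables, factor tables taken from the slide).

f1 = {(0, 0): 1/4, (0, 1): 1/8, (1, 0): 1/8, (1, 1): 1/2}   # f1(u1, u2)
f2 = {(0, 0): 1/2, (0, 1): 0.0, (1, 0): 1/3, (1, 1): 1/6}   # f2(u2, u3)

def safe_log(p):
    return math.log(p) if p > 0 else -math.inf

# message f2 -> U2: eliminate u3, record the maximizing u3 for each u2
m_f2_U2, best_u3 = {}, {}
for u2 in (0, 1):
    vals = {u3: safe_log(f2[(u2, u3)]) for u3 in (0, 1)}
    best_u3[u2] = max(vals, key=vals.get)
    m_f2_U2[u2] = vals[best_u3[u2]]

# message f1 -> U1: eliminate u2, record the maximizing u2 for each u1
m_f1_U1, best_u2 = {}, {}
for u1 in (0, 1):
    vals = {u2: safe_log(f1[(u1, u2)]) + m_f2_U2[u2] for u2 in (0, 1)}
    best_u2[u1] = max(vals, key=vals.get)
    m_f1_U1[u1] = vals[best_u2[u1]]

# maximize at the root U1, then backtrack through the recorded values
u1 = max(m_f1_U1, key=m_f1_U1.get)
u2 = best_u2[u1]
u3 = best_u3[u2]
print((u1, u2, u3), math.exp(m_f1_U1[u1]))   # expected: (1, 1, 0) with value 1/6
```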
Learning and Inference in Graphical Models. Chapter 04 – p. 21/23
Non-polytrees

Can we apply max-sum or sum-product to non-polytree structures as well?

◮ in general, no; instead we can resort to approximate or alternative techniques:
◮ loopy belief propagation (Frey and MacKay, 1998)
◮ EM/ECM algorithm
◮ variational methods
◮ Monte Carlo methods

Learning and Inference in Graphical Models. Chapter 04 – p. 22/23


Summary

◮ sum-product algorithm (belief propagation)


◮ max-sum algorithm

Learning and Inference in Graphical Models. Chapter 04 – p. 23/23
