2006 March 21 MRF
[Figure: left (L) and right (R) stereo images; the matching cost is the squared difference, (L[x] – R[x-d])^2, for some position x and candidate disparity d.]
[Figure: nodes x1, x2, x3 and y, z; Tinker Toys car photo: http://mark.michaelis.net/weblog/2002/12/29/Tinker%20Toys%20Car.jpg]
Steps in building and using graphical models
$$\begin{pmatrix} 1 & \alpha & \alpha \\ \alpha & 1 & \alpha \\ \alpha & \alpha & 1 \end{pmatrix}$$
A more general compatibility matrix
(values shown as grey scale)
Derivation of belief propagation

[Figure: a three-node chain MRF. Observations y1, y2, y3 connect to hidden nodes x1, x2, x3 through Φ(x1, y1), Φ(x2, y2), Φ(x3, y3); neighboring hidden nodes are linked by Ψ(x1, x2) and Ψ(x2, x3).]

$$x_1^{\text{MMSE}} = \operatorname{mean}_{x_1} \sum_{x_2} \sum_{x_3} P(x_1, x_2, x_3, y_1, y_2, y_3)$$

Over this chain the joint probability factorizes:

$$P(x_1, x_2, x_3, y_1, y_2, y_3) = \Phi(x_1, y_1)\, \Phi(x_2, y_2)\, \Phi(x_3, y_3)\, \Psi(x_1, x_2)\, \Psi(x_2, x_3)$$

Propagation rules

Each factor involves only neighboring variables, so the sums can be pushed inside the products:

$$x_1^{\text{MMSE}} = \operatorname{mean}_{x_1}\, \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\, \Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\, \Psi(x_2, x_3)$$
Propagation rules

[Figure: the same chain; y1, y2, y3 observed through Φ(x1, y1), Φ(x2, y2), Φ(x3, y3); links Ψ(x1, x2), Ψ(x2, x3).]

$$x_1^{\text{MMSE}} = \operatorname{mean}_{x_1}\, \Phi(x_1, y_1) \sum_{x_2} \Phi(x_2, y_2)\, \Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\, \Psi(x_2, x_3)$$

Each partial sum is a message; the message that node 2 passes to node 1 is

$$M_1^2(x_1) = \sum_{x_2} \Psi(x_1, x_2)\, \Phi(x_2, y_2)\, M_2^3(x_2)$$
Belief propagation: the nosey
neighbor rule
“Given everything that I know, here’s what I
think you should think”
$$M_i^j(x_i) = \sum_{x_j} \psi_{ij}(x_i, x_j) \prod_{k \in N(j)\setminus i} M_j^k(x_j)$$

[Figure: node j gathers the messages from its other neighbors and sends a single message to node i.]
Beliefs
To find a node’s beliefs: Multiply together all the
messages coming in to that node.
$$b_j(x_j) = \prod_{k \in N(j)} M_j^k(x_j)$$
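To make the two rules concrete, here is a minimal numpy sketch (the function and variable names are my own, not from the lecture): a message is a vector over the states of the receiving node, and a belief is the normalized product of the messages arriving at a node.

```python
import numpy as np

def send_message(psi_ij, msgs_into_j_except_i):
    """Message that node j sends to node i:
    M_i^j(x_i) = sum over x_j of psi(x_i, x_j) * product over k in N(j) minus i of M_j^k(x_j).

    psi_ij: (n_i, n_j) compatibility table, indexed [x_i, x_j].
    msgs_into_j_except_i: messages arriving at j from every neighbor except i
        (the local evidence Phi(x_j, y_j) is just the message from the observed
        node y_j, so it belongs in this list too).
    """
    prod = np.ones(psi_ij.shape[1])
    for m in msgs_into_j_except_i:
        prod = prod * m
    msg = psi_ij @ prod            # marginalize over x_j
    return msg / msg.sum()         # normalization is optional; it just keeps the numbers stable

def belief(msgs_into_j):
    """Belief b_j(x_j): the product of all messages arriving at node j, normalized."""
    b = np.ones_like(msgs_into_j[0])
    for m in msgs_into_j:
        b = b * m
    return b / b.sum()
```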
Simple BP example

[Figure: chain x1 – x2 – x3, with observation y1 at x1 and y3 at x3; x2 has no observation of its own.]

$$M_1^{y_1} = \begin{pmatrix} .4 \\ .6 \end{pmatrix}, \qquad M_3^{y_3} = \begin{pmatrix} .8 \\ .2 \end{pmatrix}$$

$$\Psi(x_1, x_2) = \Psi(x_2, x_3) = \begin{pmatrix} .9 & .1 \\ .1 & .9 \end{pmatrix}$$
Rather than carrying out all the sums by brute force, you can run belief propagation (BP). BP redistributes the various partial sums, leading to a very efficient calculation.
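Plugging the example's numbers into the message and belief rules gives a small numpy sketch (variable names are mine):

```python
import numpy as np

psi = np.array([[0.9, 0.1],
                [0.1, 0.9]])      # Psi(x1, x2) = Psi(x2, x3)
ev1 = np.array([0.4, 0.6])        # M_1^{y1}: evidence message into x1
ev3 = np.array([0.8, 0.2])        # M_3^{y3}: evidence message into x3

def normalize(v):
    return v / v.sum()

# Messages along the chain (x2 has no local evidence of its own).
m_1_to_2 = psi.T @ ev1            # sum over x1 of Psi(x1, x2) * ev1(x1)
m_3_to_2 = psi @ ev3              # sum over x3 of Psi(x2, x3) * ev3(x3)
m_2_to_1 = psi @ m_3_to_2         # sum over x2 of Psi(x1, x2) * m_3_to_2(x2)
m_2_to_3 = psi.T @ m_1_to_2       # sum over x2 of Psi(x2, x3) * m_1_to_2(x2)

b1 = normalize(ev1 * m_2_to_1)    # belief at x1
b2 = normalize(m_1_to_2 * m_3_to_2)
b3 = normalize(ev3 * m_2_to_3)
print(b1, b2, b3)
```

With these numbers b1 comes out near (0.60, 0.40): the strong evidence at x3 propagates down the chain and outweighs x1's own weak evidence of (0.4, 0.6).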
Belief and message updates

$$b_j(x_j) = \prod_{k \in N(j)} M_j^k(x_j)$$

$$M_i^j(x_i) = \sum_{x_j} \psi_{ij}(x_i, x_j) \prod_{k \in N(j)\setminus i} M_j^k(x_j)$$
Optimal solution in a chain or tree:
Belief Propagation
• “Do the right thing” Bayesian algorithm.
• For Gaussian random variables over time:
Kalman filter.
• For hidden Markov models:
forward/backward algorithm (and MAP
variant is Viterbi).
Making probability distributions modular, and
therefore tractable:
Probabilistic graphical models
If we want to find out what the likely state of variable x1 is (say, the
position of the hand of some person we are observing), what can we do?
Two reasonable choices are: (a) find the value of x1 (and of all the other
variables) that gives the maximum of P(x1, x2, x3, x4, x5); that’s the MAP
solution.
Or (b) marginalize over all the other variables and then take the mean or the maximum of the resulting marginal over x1. Marginalizing, then taking the mean, is equivalent to finding the MMSE solution. Marginalizing, then taking the max, is called the max-marginal solution and is sometimes a useful thing to do.
To find the marginal probability at x1, we have to take this sum:

$$\sum_{x_2, x_3, x_4, x_5} P(x_1, x_2, x_3, x_4, x_5)$$
Suppose the variables form a Markov chain: x1 causes x2 which causes x3,
etc. We might draw out this relationship as follows:
x1 → x2 → x3 → x4 → x5
P(a,b) = P(b|a) P(a)
P ( x1 , x2 , x3 , x4 , x5 ) = P ( x1 ) P( x2 , x3 , x4 , x5 | x1 )
= P ( x1 ) P( x2 | x1 ) P ( x3 , x4 , x5 | x1 , x2 )
= P ( x1 ) P ( x2 | x1 ) P ( x3 | x1 , x2 ) P ( x4 , x5 | x1 , x2 , x3 )
= P ( x1 ) P ( x2 | x1 ) P ( x3 | x1 , x2 ) P ( x4 | x1 , x2 , x3 ) P ( x5 | x1 , x2 , x3 , x4 )
By the Markov property, P(x3 | x1, x2) = P(x3 | x2), and similarly for the later terms, so each sum can be pushed in past the factors that do not involve its variable:

$$\sum_{x_2, x_3, x_4, x_5} P(x_1, x_2, x_3, x_4, x_5) = P(x_1) \sum_{x_2} P(x_2 \mid x_1) \sum_{x_3} P(x_3 \mid x_2) \sum_{x_4} P(x_4 \mid x_3) \sum_{x_5} P(x_5 \mid x_4)$$
Belief propagation

Performing the marginalization by doing the partial sums is called "belief propagation".

$$\sum_{x_2, x_3, x_4, x_5} P(x_1, x_2, x_3, x_4, x_5) = P(x_1) \sum_{x_2} P(x_2 \mid x_1) \sum_{x_3} P(x_3 \mid x_2) \sum_{x_4} P(x_4 \mid x_3) \sum_{x_5} P(x_5 \mid x_4)$$

[Figure: chain x1 – x2 – x3 – x4 – x5.]

In terms of pairwise compatibility functions, the joint is

$$P(x_1, x_2, x_3, x_4, x_5) = \Phi(x_1, x_2)\, \Phi(x_2, x_3)\, \Phi(x_3, x_4)\, \Phi(x_4, x_5)$$
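As a sanity check on the partial-sums idea, here is a small numpy sketch (entirely illustrative: the chain length, state count, and random compatibilities are made up) comparing brute-force marginalization with the pushed-in sums:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
K = 4                                          # states per variable
# Pairwise compatibilities Phi(x1,x2), Phi(x2,x3), Phi(x3,x4), Phi(x4,x5)
Phi = [rng.random((K, K)) + 0.1 for _ in range(4)]

# Brute force: sum the (unnormalized) joint over x2..x5 for each x1 -- K^5 terms.
brute = np.zeros(K)
for x1, x2, x3, x4, x5 in product(range(K), repeat=5):
    brute[x1] += Phi[0][x1, x2] * Phi[1][x2, x3] * Phi[2][x3, x4] * Phi[3][x4, x5]

# Partial sums, innermost sum first: each step is one K x K matrix-vector product.
msg = np.ones(K)
for phi in reversed(Phi):
    msg = phi @ msg                            # sum out the right-hand variable of this link
print(np.allclose(brute, msg))                 # True: same marginal, ~K^2 work per link instead of K^5
```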
[Figure: now the three hidden nodes x1, x2, x3 are pairwise connected, forming a loop, with observations y1, y2, y3.]

$$\sum_{x_2} \Phi(x_2, y_2)\, \Psi(x_1, x_2) \sum_{x_3} \Phi(x_3, y_3)\, \Psi(x_2, x_3)\, \Psi(x_1, x_3)$$

With the extra factor Ψ(x1, x3), the sum over x3 depends on both x1 and x2, so the neat chain of partial sums no longer separates.
Justification for running belief propagation
in networks with loops
• Experimental results:
– Error-correcting codes: Kschischang and Frey, 1998; McEliece et al., 1998
– Vision applications: Freeman and Pasztor, 1999; Frey, 2000
• Theoretical results:
– For Gaussian processes, means are correct.
Weiss and Freeman, 1999
– Large neighborhood local maximum for MAP.
Weiss and Freeman, 2000
– Equivalent to Bethe approx. in statistical physics.
Yedidia, Freeman, and Weiss, 2000
– Tree-weighted reparameterization
Wainwright, Willsky, Jaakkola, 2001
Region marginal probabilities
$$b_i(x_i) = k\, \Phi(x_i) \prod_{j \in N(i)} M_i^j(x_i)$$

$$b_{ij}(x_i, x_j) = k\, \Psi(x_i, x_j) \prod_{l \in N(i)\setminus j} M_i^l(x_i) \prod_{m \in N(j)\setminus i} M_j^m(x_j)$$
Belief propagation equations
Belief propagation equations come from the
marginalization constraints.
[Figure: marginalizing the pairwise belief over x_j must reproduce the single-node belief at node i.]

$$M_i^j(x_i) = \sum_{x_j} \psi_{ij}(x_i, x_j) \prod_{k \in N(j)\setminus i} M_j^k(x_j)$$
Results from Bethe free energy analysis
• Fixed points of the belief propagation equations correspond to stationary points of the Bethe approximation.
• Belief propagation always has a fixed point.
• Connection with variational methods for inference: both minimize approximations to the free energy,
– variational: usually use primal variables.
– belief propagation: fixed-point equations for dual variables.
• Kikuchi approximations lead to more accurate belief
propagation algorithms.
• Other Bethe free energy minimization algorithms—
Yuille, Welling, etc.
Kikuchi message-update rules
Groups of nodes send messages to other groups of nodes.
[Figure: messages pass between groups of nodes, e.g. from a four-node cluster {i, j, k, l} to the pair {i, j}, and from the pair {i, j} to the single node i.]
(Figure from Winkler, 1995, p. 32.)
MRF nodes as patches
[Figure: the observed image is divided into image patches (nodes y_i) and the hidden scene into scene patches (nodes x_i). Φ(x_i, y_i) ties each scene patch to its image patch; Ψ(x_i, x_j) ties neighboring scene patches.]
Network joint probability
$$P(x, y) = \frac{1}{Z} \prod_{i,j} \Psi(x_i, x_j) \prod_i \Phi(x_i, y_i)$$

Here x is the scene and y is the image; Ψ(x_i, x_j) is the scene-scene compatibility function between neighboring scene nodes, and Φ(x_i, y_i) is the image-scene compatibility function linking each scene node to its local observation.
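A minimal sketch of that joint probability for a toy model (the tables, labels, and function name are my own illustration, not the lecture's code):

```python
import numpy as np

def unnormalized_joint(x, y, psi, phi, edges):
    """Product over neighboring pairs of Psi(x_i, x_j) times product over nodes of Phi(x_i, y_i).

    x, y  : integer scene labels and observations, one per node
    psi   : (K, K) scene-scene compatibility table
    phi   : (K, M) image-scene compatibility table, indexed [x_i, y_i]
    edges : list of neighboring-node pairs (i, j)
    """
    p = 1.0
    for i, j in edges:
        p *= psi[x[i], x[j]]
    for xi, yi in zip(x, y):
        p *= phi[xi, yi]
    return p   # P(x, y) is this value divided by the partition function Z

# Tiny example: a 3-node chain with 2 scene states and 2 possible observations per node.
psi = np.array([[0.9, 0.1], [0.1, 0.9]])
phi = np.array([[0.8, 0.2], [0.3, 0.7]])
print(unnormalized_joint([0, 0, 1], [0, 0, 1], psi, phi, edges=[(0, 1), (1, 2)]))
```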
In order to use MRFs:
• Given the observations y and the parameters of the MRF, how do we infer the hidden variables x?
• How do we learn the parameters of the MRF?
Outline of MRF section
• Inference in MRF’s.
– Iterated conditional modes (ICM)
– Gibbs sampling, simulated annealing
– Variational methods
– Belief propagation
– Graph cuts
• Vision applications of inference in MRF’s.
• Learning MRF parameters.
– Iterative proportional fitting (IPF)
Iterated conditional modes
• For each node:
– Condition on all the neighbors
– Find the mode
– Repeat (a sketch of the ICM loop appears below).
• Gibbs sampling:
– A way to generate random samples from a (potentially
very complicated) probability distribution.
• Simulated annealing:
– A schedule for modifying the probability distribution so
that, at “zero temperature”, you draw samples only
from the MAP solution.
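Here is a minimal sketch of the ICM loop described in the first bullet above (names and data layout are my own; it assumes a symmetric pairwise compatibility table):

```python
import numpy as np

def icm(x, phi, psi, neighbors, n_sweeps=10):
    """Iterated conditional modes for a pairwise MRF (illustrative sketch).

    x         : initial labels, shape (N,)
    phi       : (N, K) local evidence, phi[i, k] = Phi(x_i = k, y_i)
    psi       : (K, K) pairwise compatibility, assumed symmetric
    neighbors : dict mapping node index -> list of neighboring node indices
    """
    x = x.copy()
    for _ in range(n_sweeps):
        changed = False
        for i in range(len(x)):
            # Conditional score of every candidate label for node i,
            # given the current labels of its neighbors.
            scores = phi[i].copy()
            for j in neighbors[i]:
                scores = scores * psi[:, x[j]]
            best = int(np.argmax(scores))          # the conditional mode
            if best != x[i]:
                x[i], changed = best, True
        if not changed:                            # no label changed: local optimum
            break
    return x
```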
Sampling from a discretized 1-D density f(x):
• Compute the distribution function F(k) from the density function f(k).
• Draw α ~ U(0,1); for k = 1 to n, if F(k) ≥ α, break; return x = x0 + kτ.

Gibbs Sampling

$$x_1^{(t+1)} \sim \pi\!\left(x_1 \mid x_2^{(t)}, x_3^{(t)}, \ldots, x_K^{(t)}\right)$$

[Figure: successive Gibbs updates shown as axis-aligned moves in the (x1, x2) plane.]
Slide by Ce Liu
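A small Python sketch of the sampling recipe above (illustrative only; `conditional` stands for whatever model-specific conditional density you can evaluate):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_from_density(f):
    """Draw one index from a discrete density f by the recipe above:
    build the distribution function F, draw alpha ~ U(0,1),
    and return the first k with F(k) >= alpha."""
    F = np.cumsum(f) / np.sum(f)
    alpha = rng.uniform()
    return int(np.searchsorted(F, alpha))

def gibbs_sweep(x, conditional):
    """One Gibbs sweep: resample each variable from pi(x_k | all the other variables).
    `conditional(k, x)` must return the (unnormalized) density of x_k given the rest."""
    for k in range(len(x)):
        x[k] = sample_from_density(conditional(k, x))
    return x
```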
Gibbs sampling and simulated
annealing
In simulated annealing, you gradually lower the "temperature" of the probability distribution, ultimately giving zero probability to all but the MAP estimate.
What’s good about it: finds global MAP
solution.
What’s bad about it: takes forever. Gibbs
sampling is in the inner loop…
Gibbs sampling and simulated
annealing
So you can find the mean value (MMSE
estimate) of a variable by doing Gibbs
sampling and averaging over the values that
come out of your sampler.
You can find the MAP value of a variable by
doing Gibbs sampling and gradually
lowering the temperature parameter to zero.
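As a sketch of how those two estimates might be organized in code (illustrative; `sweep` and `sweep_at_temperature` stand for a model-specific Gibbs sweep such as the one sketched earlier, and the cooling schedule is made up):

```python
import numpy as np

def mmse_by_gibbs(x0, sweep, n_burn=100, n_samples=500):
    """MMSE estimate: Gibbs-sample (one full sweep per call to `sweep`)
    and average the samples collected after a burn-in period."""
    x = np.array(x0, dtype=float)
    total = np.zeros_like(x)
    for t in range(n_burn + n_samples):
        x = sweep(x)
        if t >= n_burn:
            total += x
    return total / n_samples

def map_by_annealing(x0, sweep_at_temperature, n_sweeps=500, T0=2.0):
    """MAP estimate by simulated annealing: keep Gibbs sampling while lowering
    the temperature toward zero, so the samples concentrate on the MAP solution."""
    x = np.array(x0)
    for t in range(n_sweeps):
        T = T0 * (1.0 - t / n_sweeps) + 1e-3   # an illustrative cooling schedule
        x = sweep_at_temperature(x, T)
    return x
```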
Outline of MRF section
• Inference in MRF’s.
– Iterated conditional modes (ICM)
– Gibbs sampling, simulated annealing
– Variational methods
– Belief propagation
– Graph cuts
• Vision applications of inference in MRF’s.
• Learning MRF parameters.
– Iterative proportional fitting (IPF)
Variational methods
• Reference: Tommi Jaakkola’s tutorial on
variational methods,
http://www.ai.mit.edu/people/tommi/
• Example: mean field
– For each node
• Calculate the expected value of the node,
conditioned on the mean values of the neighbors.
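A minimal sketch of a mean-field update for a discrete pairwise MRF (my own illustration of the idea on this slide; for discrete states the "mean value" of a neighbor becomes its current marginal q):

```python
import numpy as np

def mean_field(phi, psi, neighbors, n_iters=50):
    """Mean-field inference for a discrete pairwise MRF (illustrative sketch).

    phi       : (N, K) local evidence, phi[i, k] = Phi(x_i = k, y_i)
    psi       : (K, K) pairwise compatibility
    neighbors : dict mapping node index -> list of neighboring node indices
    Returns q : (N, K) approximate marginals, one per node.
    """
    N, K = phi.shape
    q = np.full((N, K), 1.0 / K)           # start from uniform beliefs
    log_psi = np.log(psi)
    for _ in range(n_iters):
        for i in range(N):
            # Expected log-compatibility with each neighbor under its current marginal.
            s = np.log(phi[i])
            for j in neighbors[i]:
                s = s + log_psi @ q[j]
            q[i] = np.exp(s - s.max())     # subtract max for numerical stability
            q[i] /= q[i].sum()
    return q
```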
Outline of MRF section
• Inference in MRF’s.
– Iterated conditional modes (ICM)
– Gibbs sampling, simulated annealing
– Variational methods
– Belief propagation
– Graph cuts
• Vision applications of inference in MRF’s.
• Learning MRF parameters.
– Iterative proportional fitting (IPF)
Graph cuts
• Algorithm: uses node label swaps or expansions as moves to reduce the energy. Swaps many labels at once, not just one at a time as with ICM.
• Find which pixel labels to swap using min cut/max
flow algorithms from network theory.
• Can offer bounds on optimality.
• See Boykov, Veksler, Zabih, IEEE PAMI 23 (11)
Nov. 2001 (available on web).
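The label swaps and expansions above are built on a binary subproblem solved exactly by min cut / max flow. Here is a sketch of that classic binary construction (in the spirit of Greig, Porteous and Seheult), using networkx as an assumed dependency; it illustrates the reduction, not the Boykov-Veksler-Zabih code:

```python
import networkx as nx

def binary_mrf_mincut(unary, pairwise_weight, edges):
    """Exact MAP for a binary MRF with energy
        E(x) = sum_i unary[i][x_i] + pairwise_weight * sum_(i,j) [x_i != x_j]
    by reduction to an s-t minimum cut (illustrative sketch)."""
    G = nx.DiGraph()
    for i, (cost0, cost1) in enumerate(unary):
        G.add_edge('s', i, capacity=cost1)   # this edge is cut when node i takes label 1
        G.add_edge(i, 't', capacity=cost0)   # this edge is cut when node i takes label 0
    for i, j in edges:
        G.add_edge(i, j, capacity=pairwise_weight)
        G.add_edge(j, i, capacity=pairwise_weight)
    cut_value, (source_side, sink_side) = nx.minimum_cut(G, 's', 't')
    labels = [1 if i in sink_side else 0 for i in range(len(unary))]
    return labels, cut_value

# Tiny chain: node 0 prefers label 0, node 2 prefers label 1, node 1 is indifferent.
labels, energy = binary_mrf_mincut(
    unary=[(0.1, 2.0), (1.0, 1.0), (2.0, 0.1)],
    pairwise_weight=0.5,
    edges=[(0, 1), (1, 2)])
print(labels, energy)
```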
Comparison of graph cuts and belief
propagation
Comparison of Graph Cuts with Belief
Propagation for Stereo, using Identical
MRF Parameters, ICCV 2003.
Marshall F. Tappen and William T. Freeman
Ground truth, graph cuts, and belief
propagation disparity solution energies
Graph cuts versus belief propagation
• Graph cuts consistently gave slightly lower-energy solutions for that stereo-problem MRF, although BP ran faster; there is now a faster graph cuts implementation than the one we used…
• However, here’s why I still use Belief
Propagation:
– Works for any compatibility functions, not a restricted
set like graph cuts.
– I find it very intuitive.
– Extensions: sum-product algorithm computes MMSE,
and Generalized Belief Propagation gives you very
accurate solutions, at a cost of time.
MAP versus MMSE
Show program comparing some
methods on a simple MRF
testMRF.m
Outline of MRF section
• Inference in MRF’s.
– Gibbs sampling, simulated annealing
– Iterated conditional modes (ICM)
– Variational methods
– Belief propagation
– Graph cuts
• Applications of inference in MRF’s.
• Learning MRF parameters.
– Iterative proportional fitting (IPF)
Applications of MRF’s
• Stereo
• Motion estimation
• Labelling shading and reflectance
• Many others…
Applications of MRF’s
• Stereo
• Motion estimation
• Labelling shading and reflectance
• Many others…
Motion application
[Figure: the same patch-based MRF, now for motion. Image patches are the observations; the corresponding scene patches hold the local motion to be estimated.]
What behavior should we see in a
motion algorithm?
• Aperture problem
• Resolution through propagation of
information
• Figure/ground discrimination
The aperture problem
The aperture problem
Program demo
Motion analysis: related work
• Markov network
– Luettgen, Karl, Willsky and collaborators.
• Neural network or learning-based
– Nowlan & T. J. Sejnowski; Sereno.
• Optical flow analysis
– Weiss & Adelson; Darrell & Pentland; Ju,
Black & Jepson; Simoncelli; Grzywacz &
Yuille; Hildreth; Horn & Schunck; etc.
Inference: Motion estimation results
(maxima of scene probability distributions displayed)
Image data
Iterations 0 and 1
Initial guesses only
show motion at edges.
Motion estimation results
(maxima of scene probability distributions displayed)
Iterations 2 and 3
Figure/ground still
unresolved here.
Motion estimation results
(maxima of scene probability distributions displayed)
Iterations 4 and 5
[Figure: scene and image.]
Add a reflectance pattern to the surface. Points inside the squares should reflect less light.
Goal
Results without
considering gray-scale
Some Areas of the Image Are
Locally Ambiguous
[Figure: an input patch that could be explained either as shading or as reflectance.]
Propagating Information
• Can disambiguate areas by propagating
information from reliable areas of the image
into ambiguous areas of the image
Propagating Information
• Consider relationship between
neighboring derivatives
[Figure: observed marginal distributions, an initial guess at the joint probability, the IPF update equation, and the true joint probability.]
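As a concrete reading of the IPF update, here is a small numpy sketch (the target marginals and variable names are made up): a joint table is repeatedly rescaled until its clique marginals match the observed ones.

```python
import numpy as np

def ipf_fit(target_marginals, shape, n_iters=50):
    """Iterative proportional fitting (a sketch, not the lecture's demo code).

    Starts from a uniform joint table and repeatedly rescales it so that its marginal
    over each clique matches the observed one:
        P <- P * observed_marginal(clique) / current_marginal(clique)

    target_marginals: dict mapping a tuple of variable indices (i, j), with i < j,
                      to the observed marginal table over those variables
    shape           : number of states of each variable, e.g. (2, 2, 2)
    """
    P = np.ones(shape)
    P /= P.sum()
    axes = tuple(range(len(shape)))
    for _ in range(n_iters):
        for clique, tgt in target_marginals.items():
            other = tuple(a for a in axes if a not in clique)
            cur = P.sum(axis=other)                                  # current clique marginal
            ratio = np.divide(tgt, cur, out=np.zeros_like(cur), where=cur > 0)
            # Broadcast the ratio back over the full joint table.
            idx = [np.newaxis] * len(shape)
            for a in clique:
                idx[a] = slice(None)
            P = P * ratio[tuple(idx)]
    return P

# Tiny usage: three binary variables, observed pairwise marginals for (0,1) and (1,2).
m01 = np.array([[0.4, 0.1], [0.1, 0.4]])
m12 = np.array([[0.35, 0.15], [0.15, 0.35]])
P = ipf_fit({(0, 1): m01, (1, 2): m12}, shape=(2, 2, 2))
print(P.sum(axis=2))        # close to m01 after fitting
```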
http://research.microsoft.com/vision/Cambridge/papers/siggraph04.pdf
end