A Tutorial Introduction To Belief Propagation: James Coughlan
Belief Propagation
James Coughlan
August 2009
Table of Contents
Introduction 3
MRFs, graphical models, factor graphs 5
BP 11
messages 16
belief 22
sum-product vs. max-product 24
Example: MRF stereo 27
Complications and “gotchas” 35
Speed-ups 36
Extensions/variations 37
Connections 38
Advantages 39
Disadvantages 40
Perspective 41
References 43
2
Introduction
This tutorial introduces belief propagation in the context of factor
graphs and demonstrates its use in a simple model of stereo
matching used in computer vision.
It assumes knowledge of probability and some familiarity with
MRFs (Markov random fields), but no familiarity with factor
graphs.
3
What is belief propagation (BP)?
Technique invented in 1982
[Pearl] to calculate marginals in
Bayes nets.
Also works with MRFs, graphical
models, factor graphs.
Exact in some cases, but
approximate for most
problems.
Can be used to estimate
marginals, or to estimate most
likely states (e.g. MAP).
4
MRFs, graphical models, factor
graphs
Common property: the joint probability of many
variables factors into a product of small pieces (factors).
Probability domain
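In generic form, with Z a normalizing constant and each factor f_a depending only on a small subset X_a of the variables:
\[ p(x_1,\ldots,x_n) \;=\; \frac{1}{Z} \prod_a f_a(X_a) \]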
5
Factors
In general factors are not probabilities
themselves – they are functions that
determine all probabilities.
However, in special cases (Markov chain,
Bayes net) they can be interpreted as
conditional probabilities.
Factors are non-negative (except in the log domain), but
they don't need to normalize to 1.
6
Bayes net example
[Figure: Bayes net over variables w, x, y, z]
7
MRF example
[Figure: MRF over variables w, x, y, z]
8
Factor graph example
Each box denotes a factor (interaction)
among the variables it connects to:
[Figure: factor graph with factor nodes f, g, h connected to variable nodes w, x, y, z]
9
Marginals vs. maximizer
Marginals: find the marginal probability of each individual variable.
Maximizer: find the single most probable joint configuration of all the variables.
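In symbols, for a joint distribution p(x_1, ..., x_n):
\[ \text{Marginals:}\quad p(x_i) \;=\; \sum_{\{x_k :\, k \neq i\}} p(x_1,\ldots,x_n) \quad \text{for each } i \]
\[ \text{Maximizer:}\quad (x_1^*,\ldots,x_n^*) \;=\; \arg\max_{x_1,\ldots,x_n} p(x_1,\ldots,x_n) \]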
10
One solution: BP
BP provides the exact solution when there
are no loops in the graph! (E.g. chain, tree.)
Equivalent to dynamic programming/
Viterbi in these cases.
Otherwise, “loopy” BP provides
approximate (but often good) solution.
12
Overview of BP, con’t
After enough iterations, this series of
conversations (messages passed between neighboring
nodes) is likely to converge to a consensus that
determines the marginal probabilities of all the variables.
Estimated marginal probabilities are
called beliefs.
BP algorithm: update messages until
convergence, then calculate beliefs.
13
Common case: pairwise MRF
Pairwise MRF (graphical model) – has just
unary and pairwise factors:
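Written out, with unary potentials φ_i and pairwise potentials ψ_ij over the edges of the graph:
\[ p(x_1,\ldots,x_n) \;=\; \frac{1}{Z} \prod_i \phi_i(x_i) \prod_{(i,j)} \psi_{ij}(x_i, x_j) \]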
15
Messages
Message from node i to node j:
m_ij(x_j), a function of the receiving node's state x_j
17
Message update
The messiest equation in this tutorial:
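In the notation of the pairwise MRF above, the standard sum-product update (with N(i) denoting the neighbors of node i) is:
\[ m_{ij}(x_j) \;=\; \sum_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(i)\setminus j} m_{ki}(x_i) \]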
18
Message update
Note: given a pair of neighboring nodes, there is
only one pairwise interaction but messages
flow in both directions. Define pairwise
potential so that we can use the message
update equation in both directions (from i to j
and from j to i) without problems:
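The usual convention is that the single potential on an edge is simply read with its arguments swapped when the message travels the other way:
\[ \psi_{ji}(x_j, x_i) \;=\; \psi_{ij}(x_i, x_j) \]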
19
Message normalization
In practice one usually normalizes the
messages to sum to 1, so that
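for every message m_ij:
\[ \sum_{x_j} m_{ij}(x_j) \;=\; 1 \]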
20
Update schedule
Synchronous: update all messages in parallel
Asynchronous: update one message at a time
With luck, messages will converge after enough updates.
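The following is a minimal sketch (not taken from this tutorial) of synchronous sum-product BP with normalized messages on a three-node chain, using NumPy and arbitrary toy potentials; because a chain has no loops, the beliefs it computes are the exact marginals.

# Minimal sketch of synchronous sum-product BP on a 3-node chain MRF
# with toy potentials. On a chain (no loops) the beliefs computed at
# the end equal the exact marginals.
import numpy as np

n_states = 3
nodes = [0, 1, 2]                       # chain: 0 - 1 - 2
edges = [(0, 1), (1, 2)]

rng = np.random.default_rng(0)
phi = {i: rng.random(n_states) + 0.1 for i in nodes}              # unary potentials phi_i(x_i)
psi = {e: rng.random((n_states, n_states)) + 0.1 for e in edges}  # pairwise potentials psi_ij(x_i, x_j)

neighbors = {i: [j for (a, b) in edges for j in (a, b) if i in (a, b) and j != i]
             for i in nodes}

def pairwise(i, j):
    # One potential per edge, usable in both message directions.
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

# m[(i, j)] is the message from node i to node j, a function of x_j; start uniform.
m = {(i, j): np.full(n_states, 1.0 / n_states) for i in nodes for j in neighbors[i]}

for _ in range(10):                     # synchronous update schedule
    new_m = {}
    for (i, j) in m:
        prod = phi[i].copy()
        for k in neighbors[i]:
            if k != j:
                prod *= m[(k, i)]       # product of incoming messages, excluding j
        msg = pairwise(i, j).T @ prod   # sum over x_i of psi(x_i, x_j) * prod(x_i)
        new_m[(i, j)] = msg / msg.sum() # normalize each message to sum to 1
    m = new_m

# Beliefs (estimated marginals): b_i(x_i) proportional to phi_i(x_i) times incoming messages.
for i in nodes:
    b = phi[i].copy()
    for k in neighbors[i]:
        b *= m[(k, i)]
    print(i, b / b.sum())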
23
Sum-product vs. max-product
The standard BP we just described is
called sum-product (from message
update equation), and is used to
estimate marginals.
A simple variant, called max-product (or
max-sum in log domain), is used to
estimate the state configuration with
maximum probability.
24
Max-product
Message update same as before, except that
sum is replaced by max:
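In symbols:
\[ m_{ij}(x_j) \;=\; \max_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(i)\setminus j} m_{ki}(x_i) \]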
25
Max-sum
This is what max-product becomes in log
domain: products become sums, and
messages can be positive or negative.
Note: in practice, beliefs are often
“normalized” to avoid underflow/
overflow, e.g. by uniformly shifting them
so that lowest belief value is 0.
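Writing Φ_i = log φ_i and Ψ_ij = log ψ_ij, the max-sum update becomes:
\[ m_{ij}(x_j) \;=\; \max_{x_i} \Big[ \Phi_i(x_i) + \Psi_{ij}(x_i, x_j) + \sum_{k \in N(i)\setminus j} m_{ki}(x_i) \Big] \]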
26
Example: MRF stereo
Let r or s denote 2D image coordinates (x,y).
Unknown disparity field D(x,y)=D(r).
Smoothness prior: a product of pairwise terms that penalize
differences in disparity between neighboring pixels, where the
sum (in the exponent) runs over all pairs of neighboring pixels.
Often a robust penalty, which saturates at large disparity
differences, is used instead, for greater robustness at disparity
discontinuities.
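A typical smoothness prior of this kind (the exact constants and penalty used here are an assumption for illustration; β > 0 controls the strength of smoothing and τ caps the penalty) is:
\[ P(D) \;\propto\; \exp\Big( -\beta \sum_{\langle r, s \rangle} \rho\big(D(r) - D(s)\big) \Big), \qquad \rho(u) = u^2 \;\;\text{or, more robustly,}\;\; \rho(u) = \min(|u|, \tau) \]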
27
Likelihood function
Let m denote the matching error accumulated across the
entire left and right images, i.e. the sum of per-pixel matching errors:
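A typical choice (an assumed illustration; I_L and I_R denote the left and right image intensities) sums a per-pixel matching cost over the image:
\[ m(D) \;=\; \sum_r e\big(r, D(r)\big), \qquad e(r, d) \;=\; \big| I_L(x, y) - I_R(x - d,\, y) \big| \;\text{ for } r = (x, y) \]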
29
Sample results
Tsukuba images from Middlebury stereo database
(http://vision.middlebury.edu/stereo/ )
[Figure: left and right Tsukuba images]
31
Sample results
Winning disparities shown by grayscale
levels (lighter pixels have higher estimated
disparity)
Before BP (i.e. disparities estimated solely by
unary potentials):
32
Sample results
First iteration: disparities shown after
left, right, up, down sweeps
33
Sample results
Subsequent iterations:
[Figure: estimated disparity maps after iterations 2, 3, 4, 5, ..., 20]
Note:
Little change after first few iterations.
Model can be improved to give better results
-- this is just a simple example to illustrate BP.
34
Complications and “gotchas”
1.) Ties: suppose there are two state
configurations that are equally probable. BP
beliefs will show ties between the two
solutions; how can both globally consistent
solutions be recovered?
Solution: back-tracking [Bishop]
35
Speed-ups
Binary variables: use log ratios [MacKay]
Distance transform and multi-scale
[Felzenszwalb & Huttenlocher]
Sparse forward-backward [Pal et al]
Dynamic quantization of state space [Coughlan
& Shen]
Higher-order factors with linear interactions
[Potetz and Lee]
GPU [Brunton et al]
36
Extensions/variations
Factor graph BP: higher-order factors
(cliques) [Kschischang et al]
Particles for continuous variables: non-
parametric BP [Sudderth et al]
Top m solutions [Yanover & Weiss]
Tree reweighted BP [Wainwright et al]
37
Connections
38
Advantages
Extremely general: you can apply BP to any
graphical model with any form of potentials –
even higher-order than pairwise!
Useful for marginals or maximum probability
solution
Exact when there are no loops
Easy to program
Easy to parallelize
39
Disadvantages
Other methods may be more accurate,
faster and/or less memory-intensive in
some domains [Szeliski et al].
For instance, graph cuts are faster than
BP for stereo, and give slightly better
results.
40
Perspective
But [Meltzer et al] argue that it is better to improve the model than
to improve the optimization technique.
References
D. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press. 2003.
J. Pearl. “Reverend Bayes on inference engines: A distributed hierarchical approach.” AAAI-82: Pittsburgh, PA. Second
National Conference on Artificial Intelligence. Menlo Park, California: AAAI Press. pp. 133–136. 1982.
Performance studies:
T. Meltzer, C. Yanover and Y. Weiss. “Globally Optimal Solutions for Energy Minimization in Stereo Vision using
Reweighted Belief Propagation.” ICCV 2005.
K.P. Murphy, Y. Weiss and M.I. Jordan. “Loopy belief propagation for approximate inference: an empirical study.”
Uncertainty in AI. 1999.
R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. “A comparative
study of energy minimization methods for Markov random fields with smoothness-based priors.” IEEE Transactions
on Pattern Analysis and Machine Intelligence, 30(6):1068-1080. June 2008.
M. F. Tappen and W. T. Freeman. “Comparison of Graph Cuts with Belief Propagation for Stereo, using Identical MRF
Parameters.” International Conference on Computer Vision (ICCV). 2003.
Factor graphs:
F.R. Kschischang, B.J. Frey and H.-A. Loeliger. “Factor graphs and the sum-product algorithm.” IEEE Transactions
on Information Theory, 47. 2001.
43
References, con’t
Extensions:
E. Sudderth, A. Ihler, W. Freeman, and A. Willsky. “Nonparametric Belief Propagation.” Conference on Computer Vision & Pattern Recognition
(CVPR). June 2003.
M. Wainwright, T. Jaakkola, and A. Willsky, “Map Estimation via Agreement on Trees: Message-Passing and Linear Programming.” IEEE Trans.
Information Theory, vol. 51, no. 11, pp. 3697-3717. 2005.
C. Yanover and Y. Weiss. “Finding the M Most Probable Configurations using Loopy Belief Propagation.” NIPS 2003.
Speed-ups:
B. Potetz and T.S. Lee. “Efficient belief propagation for higher-order cliques using linear constraint nodes.” Comput. Vis. Image Understanding,
112(1):39-54. 2008.
A. Brunton, C. Shu, and G. Roth. “Belief propagation on the gpu for stereo vision.” 3rd Canadian Conference on Computer and Robot Vision.
2006.
J. Coughlan and H. Shen. "Dynamic Quantization for Belief Propagation in Sparse Spaces." Computer Vision and Image Understanding (CVIU)
Special issue on Generative-Model Based Vision. Volume 106, Issue 1, pp. 47-58. April 2007.
P. F. Felzenszwalb and D. P. Huttenlocher. “Efficient belief propagation for early vision.” Int. J. Comput. Vision, 70(1):41–54. 2006.
C. Pal, C. Sutton, and A. McCallum. “Sparse forward-backward using minimum divergence beams for fast training of conditional random fields.” In
Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing, volume 5, pages 581–584, 2006.
Variational formulation:
J. Yedidia. “Bethe free energy, Kikuchi approximations, and belief propagation algorithms.” TR2001-016. 2001.
Neuroscience:
T. Ott and R. Stoop. “The neurodynamics of belief propagation on binary Markov random fields.” NIPS 2006.
44