A Tutorial Introduction To Belief Propagation: James Coughlan
Belief Propagation
James Coughlan
August 2009
Table of Contents
Introduction 3
MRFs, graphical models, factor graphs 5
BP 11
messages 16
belief 22
sum-product vs. max-product 24
Example: MRF stereo 27
Complications and “gotchas” 35
Speed-ups 36
Extensions/variations 37
Connections 38
Advantages 39
Disadvantages 40
Perspective 41
References 43
2
Introduction
This tutorial introduces belief propagation in the context of factor
graphs and demonstrates its use in a simple model of stereo
matching used in computer vision.
It assumes knowledge of probability and some familiarity with
MRFs (Markov random fields), but no familiarity with factor
graphs.
3
What is belief propagation (BP)?
Technique invented in 1982
[Pearl] to calculate marginals in
Bayes nets.
Also works with MRFs, graphical
models, factor graphs.
Exact in some cases, but
approximate for most
problems.
Can be used to estimate
marginals, or to estimate most
likely states (e.g. MAP).
4
MRFs, graphical models, factor
graphs
Common property: the joint probability of many
variables factors into a product of small pieces (factors).
Probability domain
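In generic form, with Z a normalizing constant and each factor f_a depending only on a small subset X_a of the variables:
\[ p(x_1,\ldots,x_n) \;=\; \frac{1}{Z} \prod_a f_a(X_a) \]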
5
Factors
In general factors are not probabilities
themselves – they are functions that
determine all probabilities.
However, in special cases (Markov chain,
Bayes net) they can be interpreted as
conditional probabilities.
Factors are non-negative (except in the log domain), but
they don't need to normalize to 1.
6
Bayes net example
[Figure: Bayes net over variables w, x, y, z]
7
MRF example
[Figure: MRF over variables w, x, y, z]
8
Factor graph example
Each box denotes a factor (interaction)
among the variables it connects to:
[Figure: factor graph with factor nodes f, g, h connected to variable nodes w, x, y, z]
9
Marginals vs. maximizer
Marginals: find the marginal probability of each individual variable.
Maximizer: find the single most probable joint configuration of all the variables.
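In symbols, for a joint distribution p(x_1, ..., x_n):
\[ \text{Marginals:}\quad p(x_i) \;=\; \sum_{\{x_k :\, k \neq i\}} p(x_1,\ldots,x_n) \quad \text{for each } i \]
\[ \text{Maximizer:}\quad (x_1^*,\ldots,x_n^*) \;=\; \arg\max_{x_1,\ldots,x_n} p(x_1,\ldots,x_n) \]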
10
One solution: BP
BP provides the exact solution when there
are no loops in the graph! (E.g. chain, tree.)
Equivalent to dynamic programming/
Viterbi in these cases.
Otherwise, “loopy” BP provides
approximate (but often good) solution.
12
Overview of BP, con’t
After enough iterations, this series of
conversations (messages passed between neighboring
nodes) is likely to converge to a consensus that
determines the marginal probabilities of all the variables.
Estimated marginal probabilities are
called beliefs.
BP algorithm: update messages until
convergence, then calculate beliefs.
13
Common case: pairwise MRF
Pairwise MRF (graphical model) – has just
unary and pairwise factors:
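Written out, with unary potentials φ_i and pairwise potentials ψ_ij over the edges of the graph:
\[ p(x_1,\ldots,x_n) \;=\; \frac{1}{Z} \prod_i \phi_i(x_i) \prod_{(i,j)} \psi_{ij}(x_i, x_j) \]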
15
Messages
Message from node i to node j:
m_ij(x_j), a function of the receiving node's state x_j
17
Message update
The messiest equation in this tutorial:
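In the notation of the pairwise MRF above, the standard sum-product update (with N(i) denoting the neighbors of node i) is:
\[ m_{ij}(x_j) \;=\; \sum_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(i)\setminus j} m_{ki}(x_i) \]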
18
Message update
Note: given a pair of neighboring nodes, there is
only one pairwise interaction but messages
flow in both directions. Define pairwise
potential so that we can use the message
update equation in both directions (from i to j
and from j to i) without problems:
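The usual convention is that the single potential on an edge is simply read with its arguments swapped when the message travels the other way:
\[ \psi_{ji}(x_j, x_i) \;=\; \psi_{ij}(x_i, x_j) \]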
19
Message normalization
In practice one usually normalizes the
messages to sum to 1, so that
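for every message m_ij:
\[ \sum_{x_j} m_{ij}(x_j) \;=\; 1 \]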
20
Update schedule
Synchronous: update all messages in parallel
Asynchronous: update one message at a time
With luck, messages will converge after enough updates.
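The following is a minimal sketch (not taken from this tutorial) of synchronous sum-product BP with normalized messages on a three-node chain, using NumPy and arbitrary toy potentials; because a chain has no loops, the beliefs it computes are the exact marginals.

# Minimal sketch of synchronous sum-product BP on a 3-node chain MRF
# with toy potentials. On a chain (no loops) the beliefs computed at
# the end equal the exact marginals.
import numpy as np

n_states = 3
nodes = [0, 1, 2]                       # chain: 0 - 1 - 2
edges = [(0, 1), (1, 2)]

rng = np.random.default_rng(0)
phi = {i: rng.random(n_states) + 0.1 for i in nodes}              # unary potentials phi_i(x_i)
psi = {e: rng.random((n_states, n_states)) + 0.1 for e in edges}  # pairwise potentials psi_ij(x_i, x_j)

neighbors = {i: [j for (a, b) in edges for j in (a, b) if i in (a, b) and j != i]
             for i in nodes}

def pairwise(i, j):
    # One potential per edge, usable in both message directions.
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

# m[(i, j)] is the message from node i to node j, a function of x_j; start uniform.
m = {(i, j): np.full(n_states, 1.0 / n_states) for i in nodes for j in neighbors[i]}

for _ in range(10):                     # synchronous update schedule
    new_m = {}
    for (i, j) in m:
        prod = phi[i].copy()
        for k in neighbors[i]:
            if k != j:
                prod *= m[(k, i)]       # product of incoming messages, excluding j
        msg = pairwise(i, j).T @ prod   # sum over x_i of psi(x_i, x_j) * prod(x_i)
        new_m[(i, j)] = msg / msg.sum() # normalize each message to sum to 1
    m = new_m

# Beliefs (estimated marginals): b_i(x_i) proportional to phi_i(x_i) times incoming messages.
for i in nodes:
    b = phi[i].copy()
    for k in neighbors[i]:
        b *= m[(k, i)]
    print(i, b / b.sum())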
23
Sum-product vs. max-product
The standard BP we just described is
called sum-product (from message
update equation), and is used to
estimate marginals.
A simple variant, called max-product (or
max-sum in log domain), is used to
estimate the state configuration with
maximum probability.
24
Max-product
Message update same as before, except that
sum is replaced by max:
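In symbols:
\[ m_{ij}(x_j) \;=\; \max_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in N(i)\setminus j} m_{ki}(x_i) \]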
25
Max-sum
This is what max-product becomes in log
domain: products become sums, and
messages can be positive or negative.
Note: in practice, beliefs are often
“normalized” to avoid underflow/
overflow, e.g. by uniformly shifting them
so that lowest belief value is 0.
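Writing Φ_i = log φ_i and Ψ_ij = log ψ_ij, the max-sum update becomes:
\[ m_{ij}(x_j) \;=\; \max_{x_i} \Big[ \Phi_i(x_i) + \Psi_{ij}(x_i, x_j) + \sum_{k \in N(i)\setminus j} m_{ki}(x_i) \Big] \]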
26
Example: MRF stereo
Let r or s denote 2D image coordinates (x,y).
Unknown disparity field D(x,y)=D(r).
Smoothness prior: a product of pairwise terms that penalize
differences in disparity between neighboring pixels, where the
sum (in the exponent) runs over all pairs of neighboring pixels.
Often a robust penalty, which saturates at large disparity
differences, is used instead, for greater robustness at disparity
discontinuities.
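A typical smoothness prior of this kind (the exact constants and penalty used here are an assumption for illustration; β > 0 controls the strength of smoothing and τ caps the penalty) is:
\[ P(D) \;\propto\; \exp\Big( -\beta \sum_{\langle r, s \rangle} \rho\big(D(r) - D(s)\big) \Big), \qquad \rho(u) = u^2 \;\;\text{or, more robustly,}\;\; \rho(u) = \min(|u|, \tau) \]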
27
Likelihood function
Let m denote the matching error accumulated across the
entire left and right images, i.e. the sum of per-pixel matching errors:
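A typical choice (an assumed illustration; I_L and I_R denote the left and right image intensities) sums a per-pixel matching cost over the image:
\[ m(D) \;=\; \sum_r e\big(r, D(r)\big), \qquad e(r, d) \;=\; \big| I_L(x, y) - I_R(x - d,\, y) \big| \;\text{ for } r = (x, y) \]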
29
Sample results
Tsukuba images from Middlebury stereo database
(http://vision.middlebury.edu/stereo/ )
[Figure: left and right Tsukuba images]
31
Sample results
Winning disparities shown by grayscale
levels (lighter pixels have higher estimated
disparity)
Before BP (i.e. disparities estimated solely by
unary potentials):
32
Sample results
First iteration: disparities shown after
left, right, up, down sweeps
33
Sample results
Subsequent iterations:
[Figure: estimated disparity maps after iterations 2, 3, 4, 5, ..., 20]
Note:
Little change after first few iterations.
Model can be improved to give better results
-- this is just a simple example to illustrate BP.
34
Complications and “gotchas”
1.) Ties: suppose there are two state
configurations that are equally probable. BP
beliefs will show ties between the two
solutions; how can both globally consistent
solutions be recovered?
Solution: back-tracking [Bishop]
35
Speed-ups
Binary variables: use log ratios [MacKay]
Distance transform and multi-scale
[Felzenszwalb & Huttenlocher]
Sparse forward-backward [Pal et al]
Dynamic quantization of state space [Coughlan
& Shen]
Higher-order factors with linear interactions
[Potetz and Lee]
GPU [Brunton et al]
36
Extensions/variations
Factor graph BP: higher-order factors
(cliques) [Kschischang et al]
Particles for continuous variables: non-
parametric BP [Sudderth et al]
Top m solutions [Yanover & Weiss]
Tree reweighted BP [Wainwright et al]
37
Connections
38
Advantages
Extremely general: you can apply BP to any
graphical model with any form of potentials –
even higher-order than pairwise!
Useful for marginals or maximum probability
solution
Exact when there are no loops
Easy to program
Easy to parallelize
39
Disadvantages
Other methods may be more accurate,
faster and/or less memory-intensive in
some domains [Szeliski et al].
For instance, graph cuts are faster than
BP for stereo, and give slightly better
results.
40
Perspective
But [Meltzer et al] argue that it is better to improve the model than
to improve the optimization technique.
References
D. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press. 2003.
J. Pearl. “Reverend Bayes on inference engines: A distributed hierarchical approach.” AAAI-82: Pittsburgh, PA. Second
National Conference on Artificial Intelligence. Menlo Park, California: AAAI Press. pp. 133–136. 1982.
Performance studies:
T. Meltzer, C. Yanover and Y. Weiss. “Globally Optimal Solutions for Energy Minimization in Stereo Vision using
Reweighted Belief Propagation.” ICCV 2005.
K.P. Murphy, Y. Weiss and M.I. Jordan. “Loopy belief propagation for approximate inference: an empirical study.”
Uncertainty in AI. 1999.
R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. “A comparative
study of energy minimization methods for Markov random fields with smoothness-based priors.” IEEE Transactions
on Pattern Analysis and Machine Intelligence, 30(6):1068-1080. June 2008.
M. F. Tappen and W. T. Freeman. “Comparison of Graph Cuts with Belief Propagation for Stereo, using Identical MRF
Parameters.” International Conference on Computer Vision (ICCV). 2003.
Factor graphs:
F.R. Kschischang, B.J. Frey and H.-A. Loeliger. “Factor graphs and the sum-product algorithm.” IEEE Transactions
on Information Theory, 47. 2001.
43
References, con’t
Extensions:
E. Sudderth, A. Ihler, W. Freeman, and A. Willsky. “Nonparametric Belief Propagation.” Conference on Computer Vision & Pattern Recognition
(CVPR). June 2003.
M. Wainwright, T. Jaakkola, and A. Willsky, “Map Estimation via Agreement on Trees: Message-Passing and Linear Programming.” IEEE Trans.
Information Theory, vol. 51, no. 11, pp. 3697-3717. 2005.
C. Yanover and Y. Weiss. “Finding the M Most Probable Configurations using Loopy Belief Propagation.” NIPS 2003.
Speed-ups:
B. Potetz and T.S. Lee. “Efficient belief propagation for higher-order cliques using linear constraint nodes.” Comput. Vis. Image Understanding,
112(1):39-54. 2008.
A. Brunton, C. Shu, and G. Roth. “Belief propagation on the gpu for stereo vision.” 3rd Canadian Conference on Computer and Robot Vision.
2006.
J. Coughlan and H. Shen. "Dynamic Quantization for Belief Propagation in Sparse Spaces." Computer Vision and Image Understanding (CVIU)
Special issue on Generative-Model Based Vision. Volume 106, Issue 1, pp. 47-58. April 2007.
P. F. Felzenszwalb and D. P. Huttenlocher. “Efficient belief propagation for early vision.” Int. J. Comput. Vision, 70(1):41–54. 2006.
C. Pal, C. Sutton, and A. McCallum. “Sparse forward-backward using minimum divergence beams for fast training of conditional random fields.” In
Proc. Intl. Conf. on Acoustics, Speech, and Signal Processing, volume 5, pages 581–584, 2006.
Variational formulation:
J. Yedidia. “Bethe free energy, Kikuchi approximations, and belief propagation algorithms.” TR2001-016. 2001.
Neuroscience:
T. Ott and R. Stoop. “The neurodynamics of belief propagation on binary Markov random fields.” NIPS 2006.
44