On the Power of Belief Propagation:


A Constraint Propagation Perspective
R. DECHTER, B. BIDYUK, R. MATEESCU AND E. ROLLON
1 Introduction
In his seminal paper, Pearl [1986] introduced the notion of Bayesian networks and the first
processing algorithm, Belief Propagation (BP), that computes posterior marginals, called
beliefs, for each variable when the network is singly connected. The paper provided the
foundation for the whole area of Bayesian networks. It was the first in a series of influential
papers by Pearl, his students and his collaborators that culminated a few years later in his
book on probabilistic reasoning [Pearl 1988]. In his early paper Pearl showed that for singly
connected networks (e.g., polytrees) the distributed message-passing algorithm converges
to the correct marginals in a number of iterations equal to the diameter of the network. In
his book Pearl goes further, suggesting the use of BP for loopy networks as an approximation
algorithm (see page 195 and exercise 4.7 in [Pearl 1988]). During the decade that followed,
researchers focused on extending BP to general loopy networks using two principles. The
first is tree-clustering, namely, the transformation of a general network into a tree of large-
domain variables called clusters on which BP can be applied. This led to the join-tree or
junction-tree clustering and to the bucket-elimination schemes [Pearl 1988; Dechter 2003],
whose time and space complexity is exponential in the tree-width of the network. The
second principle is that of cutset-conditioning, which decomposes the original network into
a collection of independent singly-connected networks, all of which must be processed by
BP. The cutset-conditioning approach is time exponential in the network's loop-cutset size
and requires linear space [Pearl 1988; Dechter 2003].
The idea of applying belief propagation directly to multiply connected networks caught
on only a decade after the book was published, when it was observed by researchers in
coding theory that high-performing probabilistic decoding algorithms, such as turbo codes
and low-density parity-check codes, which significantly outperformed the best decoders at
the time, are equivalent to an iterative application of Pearl's belief propagation algorithm
[McEliece, MacKay, and Cheng 1998]. This success intrigued researchers and started
massive explorations of the potential of these local computation algorithms for general
applications. There is now a significant body of research seeking the understanding and
improvement of the inference power of iterative belief propagation (IBP).
The early work on IBP showed its convergence for a single loop, provided empirical evidence
of its successes and failures on various classes of networks [Rish, Kask, and Dechter
1998; Murphy, Weiss, and Jordan 2000], and explored the relationship between energy
minimization and belief propagation, shedding light on convergence and stable points [Yedidia,
Freeman, and Weiss 2000]. The current state of the art in convergence analysis consists of the works by
[Ihler, Fisher, and Willsky 2005; Mooij and Kappen 2007] that characterize convergence in
networks having no determinism. The work by [Roosta, Wainwright, and Sastry 2008] also
includes an analysis of the possible effects of strong evidence on convergence, which can act
to suppress the effects of cycles. As far as accuracy is concerned, the work of [Ihler 2007] considers how
weak potentials can make the graph sufficiently tree-like to provide error bounds, a work
which is extended and improved in [Mooij and Kappen 2009]. For additional information
see [Koller 2010].
While significant progress has been made in understanding the relationship between
belief propagation and energy minimization, and while many extensions and variations
were proposed, some with remarkable performance (e.g., survey propagation for solving
satisfiability for random SAT problems), the following questions remain even now:
- Why does belief propagation work so well on coding networks?
- Can we characterize additional classes of problems for which IBP is effective?
- Can we assess the quality of the algorithm's performance if and when it converges?
In this paper we try to shed light on the power (and limits) of belief propagation algorithms
and on the above questions by explicating its relationship with constraint propagation
algorithms such as arc-consistency. Our results are relevant primarily to networks that
have determinism and extreme probabilities. Specifically, we show that: (1) belief propagation
converges for zero beliefs; (2) all IBP-inferred zero beliefs are correct; (3) IBP's
power to infer zero beliefs is as weak and as strong as that of arc-consistency; (4) evidence
and inferred singleton beliefs act like cutsets during IBP's performance. From points (2)
and (4) it follows that if the inferred evidence breaks all the cycles, then IBP converges to
the exact beliefs for all variables.
Subsequently, we investigate empirically the behavior of IBP for inferred near-zero beliefs.
Specifically, we explore the hypothesis that: (5) if IBP infers that the belief of a
variable is close to zero, then this inference is relatively accurate. We will see that while our
empirical results support the hypothesis on benchmarks having no determinism, the results
are quite mixed for networks with determinism.
Finally, (6) we investigate whether variables that have extreme probabilities in all their domain
values (i.e., extreme support) also nearly cut off information flow. If that hypothesis is true,
whenever the set of variables with extreme support constitutes a loop-cutset, IBP is likely
to converge and, if the inferred beliefs for those variables are sound, it will converge to
accurate beliefs throughout the network.
On coding networks that possess significant determinism, we do see this desired behavior.
So, we could view this hypothesis as the first to provide a plausible explanation for the
success of belief propagation on coding networks. In coding networks the channel noise is
modeled through a normal distribution centered at the transmitted character and controlled
by a small standard deviation. The problem is modeled as a layered belief network whose
sink nodes are all evidence that transmit extreme support to their parents, which constitute
all the rest of the variables. The remaining dependencies are functional, and arc-consistency
on this type of network is strong and often complete. Alas, as we show, on some other
deterministic networks IBP's performance in inferring near-zero values is utterly inaccurate,
and therefore the strength of this explanation is questionable.
The paper is based for the most part on [Dechter and Mateescu 2003] and also on
[Bidyuk and Dechter 2001]. The empirical portion of the paper includes significant new
analysis of recent empirical evaluations carried out in UAI 2006 and UAI 2008
(see http://graphmod.ics.uci.edu/uai08/Evaluation/Report).
2 Arc-consistency
DEFINITION 1 (constraint network). A constraint network is a triple $\mathcal{R} = \langle X, D, C \rangle$, where $X = \{X_1, \ldots, X_n\}$ is a set of variables associated with a set of discrete-valued domains $D = \{D_1, \ldots, D_n\}$, and $C = \{C_1, \ldots, C_r\}$ is a set of constraints. Each constraint $C_i$ is a pair $\langle S_i, R_i \rangle$, where $R_i$ is a relation $R_i \subseteq D_{S_i}$ defined on a subset of variables $S_i \subseteq X$ and $D_{S_i}$ is the Cartesian product of the domains of the variables in $S_i$. The relation $R_i$ denotes all tuples of $D_{S_i}$ allowed by the constraint. The projection operator $\pi$ creates a new relation, $\pi_{S_j}(R_i) = \{x \mid x \in D_{S_j} \text{ and } \exists y, y \in D_{S_i \setminus S_j}, \text{ such that } x \cdot y \in R_i\}$, where $S_j \subseteq S_i$. Constraints can be combined with the join operator $\bowtie$, resulting in a new relation, $R_i \bowtie R_j = \{x \mid x \in D_{S_i \cup S_j} \text{ and } \pi_{S_i}(x) \in R_i \text{ and } \pi_{S_j}(x) \in R_j\}$.

DEFINITION 2 (constraint satisfaction problem). The constraint satisfaction problem (CSP) defined over a constraint network $\mathcal{R} = \langle X, D, C \rangle$ is the task of finding a solution, that is, an assignment of values to all the variables $x = (x_1, \ldots, x_n)$, $x_i \in D_i$, such that $\forall C_i \in C$, $\pi_{S_i}(x) \in R_i$. The set of all solutions of the constraint network $\mathcal{R}$ is $sol(\mathcal{R}) = \bowtie R_i$.
2.1 Describing Arc-Consistency Algorithms
Arc-consistency algorithms belong to the well-known class of constraint propagation algorithms [Mackworth 1977; Dechter 2003]. All constraint propagation algorithms are polynomial time algorithms that are at the center of constraint processing techniques.
DEFINITION 3 (arc-consistency). [Mackworth 1977] Given a binary constraint network $\mathcal{R} = \langle X, D, C \rangle$, $\mathcal{R}$ is arc-consistent iff for every binary constraint $R_i \in C$ s.t. $S_i = \{X_j, X_k\}$, every value $x_j \in D_j$ has a value $x_k \in D_k$ s.t. $(x_j, x_k) \in R_i$.
When a binary constraint network is not arc-consistent, arc-consistency algorithms remove
values from the domains of the variables until an arc-consistent network is generated.
A variety of such algorithms were developed over the past three decades [Dechter 2003].
We will consider here a simple, though not the most efficient, version, which we call the
relational distributed arc-consistency algorithm. Rather than defining it on binary constraint
networks, we will define it directly over the dual graph, extending the arc-consistency
condition to non-binary networks.
DEFINITION 4 (dual graph). Given a set of functions/constraints $F = \{f_1, \ldots, f_r\}$ over scopes $S_1, \ldots, S_r$, the dual graph of $F$ is a graph $\mathcal{D}_F = (V, E, L)$ that associates a node with each function, namely $V = F$, and an arc connects any two nodes whose scopes share a variable, $E = \{(f_i, f_j) \mid S_i \cap S_j \neq \emptyset\}$. $L$ is a set of labels for the arcs, where each arc is labeled by the shared variables of its nodes, $L = \{l_{ij} = S_i \cap S_j \mid (i, j) \in E\}$.

Figure 1. Part of the execution of the RDAC algorithm (constraint graph, dual graph with initial relations, and exchanged messages; plots omitted).
Algorithm Relational Distributed Arc-Consistency (RDAC) is a message-passing algorithm defined over the dual graph $\mathcal{D}_C$ of a constraint network $\mathcal{R} = \langle X, D, C \rangle$. It enforces what is known as relational arc-consistency [Dechter 2003]. Each node/constraint $C_i$ in $\mathcal{D}_C$, for a constraint $C_i \in \mathcal{R}$, maintains a current set of viable tuples $R_i$. Let $ne(i)$ be the set of neighbors of $C_i$ in $\mathcal{D}_C$. Every node $C_i$ sends a message to any node $C_j \in ne(i)$, which consists of the tuples over their label variables $l_{ij}$ that are allowed by the current relation $R_i$. Formally, let $R_i$ and $R_j$ be two constraints sharing scopes, whose arc in $\mathcal{D}_C$ is labeled by $l_{ij}$. The message that $R_i$ sends to $R_j$, denoted $h_i^j$, is defined by:

(1) $h_i^j \leftarrow \pi_{l_{ij}}(R_i \bowtie (\bowtie_{k \in ne(i)} h_k^i))$

and each node updates its current relation according to:

(2) $R_i \leftarrow R_i \bowtie (\bowtie_{k \in ne(i)} h_k^i)$
EXAMPLE 5. Figure 1 describes part of the execution of RDAC for a graph coloring problem, having the constraint graph shown on the left. All variables have the same domain, $\{1, 2, 3\}$, except for variable C, whose domain is $\{2\}$, and variable G, whose domain is $\{3\}$. The arcs correspond to not-equal constraints, and the relations are $R_A$, $R_{AB}$, $R_{AC}$, $R_{ABD}$, $R_{BCF}$, $R_{DFG}$, where the subscript corresponds to their scopes. The dual graph of this problem is given on the right side of the figure, and each table shows the initial constraints (there are unary, binary and ternary constraints). To initialize the algorithm, the first messages sent out by each node are universal relations over the labels. For this example, RDAC actually solves the problem and finds the unique solution A=1, B=3, C=2, D=2, F=1, G=3.
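To make equations (1) and (2) concrete, the following is a minimal Python sketch of RDAC over the dual graph, with relations stored as sets of tuples. The helper names (project, join, rdac), the fixed iteration cap, and the two-node usage fragment are ours, for illustration only.

```python
def project(rel, scope, target):
    """pi_target(rel): restrict each tuple over `scope` to the variables in `target`."""
    idx = [scope.index(v) for v in target]
    return {tuple(t[i] for i in idx) for t in rel}

def join(rel1, scope1, rel2, scope2):
    """Natural join of two relations; returns (relation, combined scope)."""
    shared = [v for v in scope2 if v in scope1]
    scope = list(scope1) + [v for v in scope2 if v not in scope1]
    out = set()
    for t1 in rel1:
        for t2 in rel2:
            if all(t1[scope1.index(v)] == t2[scope2.index(v)] for v in shared):
                out.add(t1 + tuple(t2[scope2.index(v)]
                                   for v in scope2 if v not in scope1))
    return out, scope

def rdac(scopes, relations, iterations=10):
    """Relational distributed arc-consistency: equations (1) and (2)."""
    nodes = list(relations)
    ne = {i: [j for j in nodes if j != i and set(scopes[i]) & set(scopes[j])]
          for i in nodes}
    h = {(i, j): None for i in nodes for j in ne[i]}   # None = universal relation
    for _ in range(iterations):
        for i in nodes:
            rel, scope = relations[i], list(scopes[i])
            for k in ne[i]:                    # join R_i with all incoming messages
                if h[(k, i)] is not None:
                    label = [v for v in scopes[k] if v in scopes[i]]
                    rel, scope = join(rel, scope, h[(k, i)], label)
            relations[i] = project(rel, scope, list(scopes[i]))    # equation (2)
            for j in ne[i]:                    # outgoing messages, equation (1)
                label = [v for v in scopes[i] if v in scopes[j]]
                h[(i, j)] = project(rel, scope, label)
    return relations

# A two-node fragment of Example 5: R_A is unary over A, R_AB encodes "A != B".
scopes = {'R_A': ['A'], 'R_AB': ['A', 'B']}
relations = {'R_A': {(1,), (2,)},
             'R_AB': {(a, b) for a in (1, 2, 3) for b in (1, 2, 3) if a != b}}
print(rdac(scopes, relations)['R_AB'])     # only tuples with A in {1, 2} survive
```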
The relational distributed arc-consistency algorithm converges after $O(r \cdot t)$ iterations to the largest relational arc-consistent network that is equivalent to the original network, where $r$ is the number of constraints and $t$ bounds the number of tuples in each constraint. Its complexity can be shown to be $O(r^2 t^2 \log t)$ [Dechter 2003].
3 Iterative Belief Propagation
DEFINITION 6 (belief network). A belief network is a quadruple $\mathcal{B} = \langle X, D, G, P \rangle$ where $X = \{X_1, \ldots, X_n\}$ is a set of random variables, $D = \{D_1, \ldots, D_n\}$ is the set of the corresponding domains, $G = (X, E)$ is a directed acyclic graph over $X$, and $P = \{p_1, \ldots, p_n\}$ is a set of conditional probability tables (CPTs) $p_i = P(X_i \mid pa(X_i))$, where $pa(X_i)$ are the parents of $X_i$ in $G$. The belief network represents a probability distribution over $X$ having the product form $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{pa(X_i)})$. An evidence set $e$ is an instantiated subset of variables. The family of $X_i$, denoted by $fa(X_i)$, includes $X_i$ and its parent variables. Namely, $fa(X_i) = \{X_i\} \cup pa(X_i)$.

DEFINITION 7 (belief updating problem). The belief updating problem defined over a belief network $\mathcal{B} = \langle X, D, G, P \rangle$ is the task of computing the posterior probability $P(Y \mid e)$ of query nodes $Y \subseteq X$ given evidence $e$. We will sometimes denote by $P_{\mathcal{B}}$ the exact probability according to the Bayesian network $\mathcal{B}$. When $Y$ consists of a single variable $X_i$, $P_{\mathcal{B}}(X_i \mid e)$ is also denoted as $Bel(X_i)$ and called belief, or posterior marginal, or just marginal.
3.1 Describing Iterative Belief Propagation
Iterative belief propagation (IBP) is an iterative application of Pearl's algorithm that was
defined for poly-trees [Pearl 1988]. Since it is a distributed algorithm, it is well defined for
any network. We will define IBP as operating over the belief network's dual join-graph.
DEFINITION 8 (dual join-graph). Given a belief network $\mathcal{B} = \langle X, D, G, P \rangle$, a dual join-graph is an arc subgraph of the dual graph $\mathcal{D}_{\mathcal{B}}$ whose arc labels are subsets of the labels of $\mathcal{D}_{\mathcal{B}}$ satisfying the running intersection property, namely, that any two nodes that share a variable in the dual join-graph be connected by a path of arcs whose labels contain the shared variable. An arc-minimal dual join-graph is one for which none of the labels can be further reduced while maintaining the running intersection property.

In IBP each node in the dual join-graph sends a message over an adjacent arc whose scope is identical to its label. Pearl's original algorithm sends messages whose scopes are singleton variables only. It is easy to show that any dual graph (which itself is a dual join-graph) has an arc-minimal singleton dual join-graph, which can be constructed directly by labeling the arc between the CPT of a variable and the CPT of its parent by its parent variable. Algorithm IBP, defined for any dual join-graph, is given in Figure 2. One iteration of IBP is time and space linear in the size of the belief network, and when IBP is applied to the singleton labeled dual graph it coincides with Pearl's belief propagation. The inferred approximation of the belief $P(X \mid e)$ output by IBP will be denoted by $P_{IBP}(X \mid e)$.
4 Belief Propagation's Inferred Zeros
We will now make connections between distributed relational arc-consistency and iterative
belief propagation. We first associate any belief network with a constraint network that
captures its zero probability tuples and define algorithm IBP-RDAC, an IBP-like algorithm
that achieves relational arc-consistency on the associated constraint network. Then, we
show that IBP-RDAC and IBP are equivalent in terms of removing inconsistent domain
values and computing zero marginal probabilities, respectively.
Algorithm IBP
Input: An arc-labeled dual join-graph $DJ = (V, E, L)$ for a belief network $\mathcal{B} = \langle X, D, G, P \rangle$; evidence $e$.
Output: An augmented graph whose nodes include the original CPTs and the messages received from neighbors; approximations of $P(X_i \mid e)$ and $P(fa(X_i) \mid e)$, $\forall X_i \in X$.
Denote by: $h_u^v$ the message from $u$ to $v$; $ne(u)$ the neighbors of $u$ in $V$; $ne_v(u) = ne(u) \setminus \{v\}$; $l_{uv}$ the label of $(u, v) \in E$; $elim(u, v) = fa(X_i) \setminus fa(X_j)$, where $u$ and $v$ are the vertices of families $fa(X_i)$ and $fa(X_j)$ in $DJ$, respectively.

One iteration of IBP:
For every node $u$ in $DJ$ in a topological order and back, do:
1. Process observed variables: assign evidence variables to each $p_i$ and remove them from the labeled arcs.
2. Compute and send to $v$ the function:
$$h_u^v = \sum_{elim(u,v)} \Big( p_u \cdot \prod_{\{h_i^u : i \in ne_v(u)\}} h_i^u \Big)$$
EndFor

Compute approximations of $P(X_i \mid e)$ and $P(fa(X_i) \mid e)$:
For every $X_i \in X$ (let $u$ be the vertex of family $fa(X_i)$ in $DJ$), do:
$$P(fa(X_i) \mid e) = \Big( \prod_{\{h_i^u : u \in ne(i)\}} h_i^u \Big) \cdot p_u, \qquad P(X_i \mid e) = \sum_{fa(X_i) \setminus \{X_i\}} P(fa(X_i) \mid e)$$
EndFor

Figure 2. Algorithm Iterative Belief Propagation
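As a sketch of step 2 of the algorithm, the fragment below computes a single message $h_u^v$ over discrete factors stored as Python dictionaries; the function name, the dictionary encoding, and the final normalization are our assumptions for illustration.

```python
from itertools import product as cartesian

def send_message(p_u, scope_u, incoming, elim_vars, domains):
    """One IBP message h_u^v: sum out elim_vars from p_u times incoming messages.

    p_u:      dict mapping full assignments over scope_u (tuples) to probabilities
    incoming: list of (message, message_scope) pairs from all neighbors except v
    """
    keep = [x for x in scope_u if x not in elim_vars]     # the label l_uv
    h = {}
    for assign in cartesian(*(domains[x] for x in scope_u)):
        val = p_u.get(assign, 0.0)
        env = dict(zip(scope_u, assign))
        for msg, msg_scope in incoming:
            val *= msg.get(tuple(env[x] for x in msg_scope), 0.0)
        key = tuple(env[x] for x in keep)
        h[key] = h.get(key, 0.0) + val
    total = sum(h.values())
    # normalizing messages is a common implementation choice for numerical stability
    return {k: v / total for k, v in h.items()} if total > 0 else h
```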
Since arc-consistency algorithms are well understood, this correspondence between IBP-RDAC and IBP
yields the main claims and provides insight into the behavior of IBP for inferred zero beliefs. In
particular, this relationship justifies the iterative application of belief propagation algorithms,
while also illuminating their distance from being complete.
More precisely, in this section we will show that: (a) if a variable-value pair is assessed
in some iteration by IBP as having a zero belief, it remains zero in subsequent iterations; (b)
any IBP-inferred zero belief is correct with respect to the corresponding belief network's
marginal; and (c) IBP converges in finite time for all its inferred zeros.
4.1 Flattening the Belief Network
Given a belief network $\mathcal{B} = \langle X, D, G, P \rangle$, we define the flattening of the belief network $\mathcal{B}$, called $flat(\mathcal{B})$, as the constraint network where all the zero entries in a probability table are removed from the corresponding relation. Formally,

DEFINITION 9 (flattening). Given a belief network $\mathcal{B} = \langle X, D, G, P \rangle$, its flattening is a constraint network $flat(\mathcal{B}) = \langle X, D, flat(P) \rangle$. Each CPT $p_i \in P$ over $fa(X_i)$ is associated with a constraint $\langle S_i, R_i \rangle$ s.t. $S_i = fa(X_i)$ and $R_i = \{(x_i, x_{pa(X_i)}) \in D_{S_i} \mid P(x_i \mid x_{pa(X_i)}) > 0\}$. The set $flat(P)$ is the set of the constraints $\langle S_i, R_i \rangle$, $\forall p_i \in P$.
EXAMPLE 10. Figure 3 shows (a) a belief network and (b) its corresponding flattening.
Figure 3. Flattening of a Bayesian network: (a) the Bayesian network with its CPTs; (b) the flat constraint network (tables omitted).
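Definition 9 is mechanical enough to sketch directly. Assuming the BeliefNetwork class from the earlier sketch, the following hypothetical helper builds $flat(\mathcal{B})$ in the representation used by the RDAC sketch:

```python
def flat_network(bn):
    """flat(B): one constraint per CPT, keeping exactly its positive entries."""
    scopes, relations = {}, {}
    for x in bn.domains:
        scopes[x] = bn.family(x)          # S_i = fa(X_i)
        relations[x] = {assign for assign, p in bn.cpts[x].items() if p > 0}
    return scopes, relations              # usable as input for the rdac() sketch
```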
THEOREM 11. Given a belief network $\mathcal{B} = \langle X, D, G, P \rangle$, where $X = \{X_1, \ldots, X_n\}$, for any tuple $x = (x_1, \ldots, x_n)$: $P_{\mathcal{B}}(x) > 0 \Leftrightarrow x \in sol(flat(\mathcal{B}))$, where $sol(flat(\mathcal{B}))$ is the set of solutions of $flat(\mathcal{B})$.

Proof. $P_{\mathcal{B}}(x) > 0 \Leftrightarrow \prod_{i=1}^{n} P(x_i \mid x_{pa(X_i)}) > 0 \Leftrightarrow \forall i \in \{1, \ldots, n\}, P(x_i \mid x_{pa(X_i)}) > 0 \Leftrightarrow \forall i \in \{1, \ldots, n\}, (x_i, x_{pa(X_i)}) \in R_{F_i} \Leftrightarrow x \in sol(flat(\mathcal{B}))$. □
Clearly this can extend to Bayesian networks with evidence:

COROLLARY 12. Given a belief network $\mathcal{B} = \langle X, D, G, P \rangle$ and evidence $e$: $P_{\mathcal{B}}(x \mid e) > 0 \Leftrightarrow x \in sol(flat(\mathcal{B}) \cup e)$.
We next define algorithm IBP-RDAC and show that it achieves relational arc-consistency
on the flat network.
DEFINITION 13 (Algorithm IBP-RDAC). Given $\mathcal{B} = \langle X, D, G, P \rangle$ and evidence $e$, let $\mathcal{D}_{\mathcal{B}}$ be a dual join-graph and $\mathcal{D}_{flat(\mathcal{B})}$ be a corresponding dual join-graph of the constraint network $flat(\mathcal{B})$. Algorithm IBP-RDAC applied to $\mathcal{D}_{flat(\mathcal{B})}$ is defined using IBP's specification in Figure 2 with the following modifications:

1. Pre-processing evidence: when processing evidence, we remove from each $R_i \in flat(P)$ those tuples that do not agree with the assignments in evidence $e$.
2. Instead of $\prod$, we use the join operator $\bowtie$.
3. Instead of $\sum$, we use the projection operator $\pi$.
4. At the termination, we update the domains of variables by: $D_i \leftarrow D_i \cap \pi_{X_i}((\bowtie_{v \in ne(u)} h_{(v,u)}) \bowtie R_i)$.
By construction, it should be easy to see that:

PROPOSITION 14. Given a belief network $\mathcal{B} = \langle X, D, G, P \rangle$, algorithm IBP-RDAC is identical to algorithm RDAC when applied to $\mathcal{D}_{flat(\mathcal{B})}$. Therefore, IBP-RDAC enforces relational arc-consistency over $flat(\mathcal{B})$.
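In terms of the earlier sketches, this correspondence can be exercised directly (bn, flat_network and rdac are the hypothetical names introduced above):

```python
scopes, relations = flat_network(bn)      # flat(B) from the belief network sketch
relations = rdac(scopes, relations)       # enforces relational arc-consistency
```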
Due to the convergence of RDAC, we get that:

PROPOSITION 15. Given a belief network $\mathcal{B}$, algorithm IBP-RDAC over $flat(\mathcal{B})$ converges in $O(n \cdot t)$ iterations, where $n$ is the number of nodes in $\mathcal{B}$ and $t$ is the maximum number of tuples over the labeling variables between two nodes that have positive probability.
4.2 The Main Claim
In the following we will establish an equivalence between IBP and IBP-RDAC in terms of zero probabilities.

PROPOSITION 16. When IBP and IBP-RDAC are applied in the same order of computation to $\mathcal{B}$ and $flat(\mathcal{B})$ respectively, the messages computed by IBP are identical to those computed by IBP-RDAC in terms of zero / non-zero probabilities. That is, for any pair of corresponding messages, $h_{(u,v)}(t) \neq 0$ in IBP iff $t \in h_{(u,v)}$ in IBP-RDAC.
Proof. The proof is by induction. The base case is trivially true, since messages $h$ in IBP are initialized to a uniform distribution and messages $h$ in IBP-RDAC are initialized to complete relations.

The induction step: suppose that $h^{IBP}_{(u,v)}$ is the message sent from $u$ to $v$ by IBP. We will show that if $h^{IBP}_{(u,v)}(x) \neq 0$, then $x \in h^{IBP\text{-}RDAC}_{(u,v)}$, where $h^{IBP\text{-}RDAC}_{(u,v)}$ is the message sent by IBP-RDAC from $u$ to $v$. Assume that the claim holds for all messages received by $u$ from its neighbors. Let $f \in u$ in IBP and $R_f$ be the corresponding relation in IBP-RDAC, and let $t$ be an assignment of values to the variables in $elim(u, v)$. We have $h^{IBP}_{(u,v)}(x) \neq 0 \Leftrightarrow \sum_{elim(u,v)} \prod_f f(x) \neq 0 \Leftrightarrow \exists t, \prod_f f(x, t) \neq 0 \Leftrightarrow \exists t, \forall f, f(x, t) \neq 0 \Leftrightarrow \exists t, \forall f, \pi_{scope(R_f)}(x, t) \in R_f \Leftrightarrow \exists t, \pi_{elim(u,v)}(\bowtie_{R_f} \pi_{scope(R_f)}(x, t)) \in h^{IBP\text{-}RDAC}_{(u,v)} \Leftrightarrow x \in h^{IBP\text{-}RDAC}_{(u,v)}$. □
Moving from tuples to domain values, we will show that whenever IBP computes a marginal probability $P_{IBP}(x_i \mid e) = 0$, IBP-RDAC removes $x_i$ from the domain of variable $X_i$, and vice-versa.

PROPOSITION 17. Given a belief network $\mathcal{B}$ and evidence $e$, IBP applied to $\mathcal{B}$ derives $P_{IBP}(x_i \mid e) = 0$ iff IBP-RDAC over $flat(\mathcal{B})$ decides that $x_i \notin D_i$.
Proof. According to Proposition 16, the messages computed by IBP and IBP-RDAC are identical in terms of zero probabilities. Let $f \in cluster(u)$ in IBP and $R_f$ be the corresponding relation in IBP-RDAC, and let $t$ be an assignment of values to the variables in $scope(u) \setminus X_i$. We will show that when IBP computes $P(X_i = x_i) = 0$ (upon convergence), then IBP-RDAC computes $x_i \notin D_i$. We have $P(X_i = x_i) = \sum_{X \setminus X_i} \prod_f f(x_i) = 0 \Leftrightarrow \forall t, \prod_f f(x_i, t) = 0 \Leftrightarrow \forall t, \exists f, f(x_i, t) = 0 \Leftrightarrow \forall t, \exists R_f, \pi_{scope(R_f)}(x_i, t) \notin R_f \Leftrightarrow \forall t, (x_i, t) \notin (\bowtie_{R_f} R_f(x_i, t)) \Leftrightarrow x_i \notin D_i \cap \pi_{X_i}(\bowtie_{R_f} R_f(x_i, t)) \Leftrightarrow x_i \notin D_i$. Since arc-consistency is sound, so is the decision of zero probabilities. □
Figure 4. a) A belief network; b) example of a finite precision problem; and c) an arc-minimal dual join-graph. The table in b) gives $Bel(X_i)$ as a function of the number of iterations:

#iter           1       2        3        100      200      300    true belief
Bel(X_i = 1)    .45     .49721   .49986                     .5     0
Bel(X_i = 2)    .45     .49721   .49986                     .5     0
Bel(X_i = 3)    .1      .00545   .00027   1e-129   1e-260   0      1
We can now conclude that:

THEOREM 18. Given evidence $e$, whenever IBP applied to $\mathcal{B}$ infers that $P_{IBP}(x_i \mid e) = 0$, the marginal $Bel(x_i) = P_{\mathcal{B}}(x_i \mid e) = 0$.

Proof. By Proposition 17, if IBP over $\mathcal{B}$ computes $P_{IBP}(x_i \mid e) = 0$, then IBP-RDAC over $flat(\mathcal{B})$ removes the value $x_i$ from the domain $D_i$. Therefore, $x_i \in D_i$ is a no-good of the constraint network $flat(\mathcal{B})$, and from Theorem 11 it follows that $Bel(x_i) = 0$. □
Next, we show that the time it takes IBP to find its inferred zeros is bounded.

PROPOSITION 19. Given a belief network $\mathcal{B}$ and evidence $e$, IBP finds all $x_i$ for which $P_{IBP}(x_i \mid e) = 0$ in finite time; that is, there exists a number $k$ such that no new $P_{IBP}(x_i \mid e) = 0$ will be generated after $k$ iterations.

Proof. This follows from the fact that the number of iterations it takes for IBP to compute $P_{IBP}(X_i = x_i \mid e) = 0$ over $\mathcal{B}$ is exactly the same number of iterations IBP-RDAC needs to remove $x_i$ from the domain $D_i$ over $flat(\mathcal{B})$ (Propositions 16 and 17), and the fact that IBP-RDAC's number of iterations is bounded (Proposition 15). □
4.3 A Finite Precision Problem
Algorithms should always be implemented with care on finite precision machines. In the
following example we show that IBP's messages converge in the limit (i.e., in an infinite
number of iterations), but they do not stabilize in any finite number of iterations.
EXAMPLE 20. Consider the belief network in Figure 4a defined over 6 variables $X_1, X_2, X_3, H_1, H_2, H_3$. The domain of the $X$ variables is $\{1, 2, 3\}$ and the domain of the $H$ variables is $\{0, 1\}$. The priors on the $X$ variables are:

$$P(X_i) = \begin{cases} 0.45, & \text{if } X_i = 1; \\ 0.45, & \text{if } X_i = 2; \\ 0.1, & \text{if } X_i = 3. \end{cases}$$

There are three CPTs, over the scopes $\{H_1, X_1, X_2\}$, $\{H_2, X_2, X_3\}$, and $\{H_3, X_1, X_3\}$. The values of the CPTs for every triplet of variables $\{H_k, X_i, X_j\}$ are:

$$P(h_k = 1 \mid x_i, x_j) = \begin{cases} 1, & \text{if } (3 \neq x_i \neq x_j \neq 3); \\ 1, & \text{if } (x_i = x_j = 3); \\ 0, & \text{otherwise}; \end{cases}$$

$$P(h_k = 0 \mid x_i, x_j) = 1 - P(h_k = 1 \mid x_i, x_j).$$
Consider the evidence set $e = \{H_1 = H_2 = H_3 = 1\}$. This Bayesian network expresses the probability distribution that is concentrated in a single tuple:

$$P(x_1, x_2, x_3 \mid e) = \begin{cases} 1, & \text{if } x_1 = x_2 = x_3 = 3; \\ 0, & \text{otherwise}. \end{cases}$$
The belief for any of the $X$ variables as a function of the number of iterations is given
in Figure 4b. After about 300 iterations, the finite precision of our computer is not able
to represent the value for $Bel(X_i = 3)$, and this appears to be zero, yielding the final
updated belief $(.5, .5, 0)$, when in fact the true updated belief should be $(0, 0, 1)$. Notice
that $(.5, .5, 0)$ cannot be regarded as a legitimate fixed point for IBP. Namely, if we were to
initialize IBP with the values $(.5, .5, 0)$, then the algorithm would maintain them, appearing
to have a fixed point. However, initializing IBP with zero values cannot be expected to be
correct. Indeed, when we initialize with zeros we forcibly introduce determinism in the
model, and IBP will always maintain it afterwards.
However, this example does not contradict our theory because, mathematically, $Bel(X_i = 3)$
never becomes a true zero, and IBP never reaches a quiescent state. The example shows,
however, that a close-to-zero inferred belief by IBP can be arbitrarily inaccurate. In this
case the inaccuracy seems to be due to the initial prior beliefs, which are so different from
the posterior ones.
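The underflow itself is easy to reproduce in isolation. The sketch below assumes a geometric decay of $Bel(X_i = 3)$ at roughly a factor of $10^{-1.3}$ per iteration, extrapolated from the values in Figure 4b; it is not the actual IBP recurrence, only an illustration of how a mathematically positive belief collapses to an exact floating-point zero.

```python
# Geometric decay consistent with Figure 4b (~1e-129 at iteration 100,
# ~1e-260 at iteration 200); the fixed ratio is an assumption for illustration.
ratio = 10 ** -1.3
bel3 = 0.1                  # prior Bel(X_i = 3)
for it in range(1, 400):
    bel3 *= ratio
    if bel3 == 0.0:         # float64 eventually underflows to an exact zero
        print(f"Bel(X_i = 3) underflows to 0.0 at iteration {it}")
        break
```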
4.4 Zeros Inferred by Generalized Belief Propagation
Belief propagation algorithms were extended, yielding the class of generalized belief propagation
(GBP) algorithms [Yedidia, Freeman, and Weiss 2000]. These algorithms fully
process subparts of the network, transforming it closer to a tree structure on which IBP
can be more effective [Dechter, Mateescu, and Kask 2002; Mateescu, Kask, Gogate, and
Dechter 2010]. The above results for IBP can now be extended to GBP, and in particular to
the variant of iterative join-graph propagation, IJGP [Dechter, Mateescu, and Kask 2002].
The algorithm applies message passing over a partition of the CPTs into clusters, called a
join-graph, rather than over the dual graph. The set of clusters in such a partition defines a
unique dual graph (i.e., each cluster is a node). This dual graph can be associated with various
dual join-graphs, each defined by the labeling on the arcs between neighboring cluster
nodes.
Algorithm IJGP has an accuracy parameter $i$, called the i-bound, which restricts the maximum
number of variables that can appear in a cluster; the algorithm is more accurate as $i$ grows. The
extension of all the previous observations regarding zeros to IJGP is straightforward and
is summarized next, where the inferred approximation of the belief $P_{\mathcal{B}}(X_i \mid e)$ computed
by IJGP is denoted by $P_{IJGP}(X_i \mid e)$.
THEOREM 21. Given a belief network $\mathcal{B}$ to which IJGP is applied:

1. IJGP generates all its $P_{IJGP}(x_i \mid e) = 0$ in finite time; that is, there exists a number $k$ such that no new $P_{IJGP}(x_i \mid e) = 0$ will be generated after $k$ iterations.
2. Whenever IJGP determines $P_{IJGP}(x_i \mid e) = 0$, it stays 0 during all subsequent iterations.
3. Whenever IJGP determines $P_{IJGP}(x_i \mid e) = 0$, then $Bel(x_i) = 0$.
5 The Impact of IBP's Inferred Zeros
This section discusses the ramifications of having sound inferred zero beliefs.
5.1 The Inference Power of IBP
We now show that the inference power of IBP for zeros is sometimes very limited and other
times strong, exactly wherever arc-consistency is weak or strong.

Cases of weak inference power. Consider the belief network described in Example 20. The flat constraint network of that belief network is defined over the scopes $S_1 = \{H_1, X_1, X_2\}$, $S_2 = \{H_2, X_2, X_3\}$, $S_3 = \{H_3, X_1, X_3\}$. The constraints are defined by: $R_{S_i} = \{(1, 1, 2), (1, 2, 1), (1, 3, 3), (0, 1, 1), (0, 1, 3), (0, 2, 2), (0, 2, 3), (0, 3, 1), (0, 3, 2)\}$. The prior probabilities for the $X_i$'s imply unary constraints equal to the full domain $\{1, 2, 3\}$. An arc-minimal dual join-graph that is identical to the constraint network is given in Figure 4c. In this case, IBP-RDAC sends as messages the full domains of the variables, and thus no tuple is removed from any constraint. Since IBP infers the same zeros as arc-consistency, IBP will also not infer any zeros. Since the true probability of most tuples is zero, we can conclude that the inference power of IBP on this example is weak or non-existent.

The weakness of arc-consistency in this example is not surprising. Arc-consistency is known to be far from complete. Since every constraint network can be expressed as a belief network (by adding a variable for each constraint, as we did in the above example) and since arc-consistency can be arbitrarily weak on some constraint networks, so can IBP be.
Cases of strong inference power. The relationship between IBP and arc-consistency ensures that IBP is zero-complete whenever arc-consistency is. In general, if for the flat constraint network of a belief network $\mathcal{B}$ arc-consistency removes all the inconsistent domain values, then IBP will also discover all the true zeros of $\mathcal{B}$. Examples of constraint networks that are complete for arc-consistency are max-closed constraints. These constraints have the property that if two tuples are in the relation, so is their intersection. Linear constraints are often max-closed, and so are Horn clauses (see [Dechter 2003]). Clearly, IBP is zero-complete for acyclic networks, which include binary trees, polytrees and networks whose dual graph is a hypertree [Dechter 2003]. This is not too illuminating, though, as we know that IBP is fully complete (not only for zeros) for such networks.

An interesting case is when the belief network has no evidence. In this case, the flat network always corresponds to the causal constraint network defined in [Dechter and Pearl 1991]. The inconsistent tuples or domain values are already explicitly described in each relation, and no new zeros can be inferred. What is more interesting is that in the absence of evidence IBP is also complete for non-zero beliefs for many variables, as we show later.
5.2 IBP and Loop-Cutset
It is well known that if evidence nodes form a loop-cutset, then we can transform any
multiply-connected belief network into an equivalent singly-connected network which can
be solved by belief propagation, leading to the loop-cutset conditioning method [Pearl
1988]. Now that we have established that inferred zeros, and in particular inferred evidence
(i.e., when only a single value in the domain of a variable has a non-zero probability), are
sound, we show that evidence plays the cutset role automatically during IBP's performance.
Indeed, we can show that during IBP's operation, an observed node $X_i$ in a Bayesian
network blocks the path between its parents and its children as defined in the d-separation
criteria. All the proofs of claims appearing in Section 5.2 and Section 5.3 can be found in
[Bidyuk and Dechter 2001].

PROPOSITION 22. Let $X_i$ be an observed node in a belief network $\mathcal{B}$. Then for any child $Y_j$ of node $X_i$, the belief of $Y_j$ computed by IBP is not dependent on the messages that $X_i$ receives from its parents $pa(X_i)$ or the messages that node $X_i$ receives from its other children $Y_k$, $k \neq j$.
From this we can conclude that:
THEOREM 23. If evidence nodes, original or inferred, constitute a loop-cutset, then IBP
converges to the correct beliefs in linear time.
5.3 IBP on Irrelevant Nodes
An orthogonal property is that unobserved nodes that have only unobserved descendants are
irrelevant to the beliefs of the remaining nodes, and therefore processing can be restricted
to the relevant subgraphs. In IBP, this property is expressed by the fact that irrelevant nodes
send messages to their parents that equally support each value in the domain of a parent
and thus do not affect the computation of marginal posteriors of its parents.

PROPOSITION 24. Let $X_i$ be an unobserved node without observed descendants in $\mathcal{B}$, and let $\mathcal{B}'$ be the subnetwork obtained by removing $X_i$ and its descendants from $\mathcal{B}$. Then, $\forall Y \in \mathcal{B}'$, the belief of $Y$ computed by IBP over $\mathcal{B}$ equals the belief of $Y$ computed by IBP over $\mathcal{B}'$.
Thus, in a loopy network without evidence, IBP always converges after one iteration, since
only propagation of top-down messages affects the computation of beliefs and those messages
do not change. Also in that case, IBP converges to the correct marginals for any
node $X_i$ such that there exists only one directed path from any ancestor of $X_i$ to $X_i$. This
is because the relevant subnetwork that contains only the node and its ancestors is singly-connected,
and by Proposition 24 the beliefs are the same as those computed by applying
IBP to the complete network. In summary:

THEOREM 25. Let $\mathcal{B}'$ be the subnetwork obtained from $\mathcal{B}$ by recursively eliminating all its unobserved leaf nodes. If observed nodes constitute a loop-cutset of $\mathcal{B}'$, then IBP applied to $\mathcal{B}$ converges to the correct beliefs for all nodes in $\mathcal{B}'$.

THEOREM 26. If a belief network does not contain any observed nodes or only has observed root nodes, then IBP always converges.
In summary, in Sections 5.2 and 5.3 we observed that IBP exploits the two properties
of observed and unobserved nodes automatically, without any outside intervention for
network transformation. As a result, the correctness and convergence of IBP on a node
$X_i$ in a multiply-connected belief network will be determined by the structure restricted to
$X_i$'s relevant subgraph. If the relevant subnetwork of $X_i$ is singly-connected relative to the
evidence (observed or inferred), IBP will converge to the correct beliefs for node $X_i$.
6 Experimental Evaluation
The goal of the experiments is two-fold. First, since zero values inferred by IBP/IJGP
are proved correct, we want to explore the behavior of IBP/IJGP for near-zero inferred
beliefs. Second, we want to explore the hypothesis that the loop-cutset impact on IBP's
performance, as discussed in Section 5.2, also extends to variables with extreme support.
The next two subsections are devoted to these two issues, respectively.
6.1 On the Accuracy of IBP in Near Zero Marginals
We test the performance of IBP and IJGP both on cases of strong and weak inference
power. In particular, we look at networks where probabilities are extreme and investigate
empirically the accuracy of IBP/IJGP across the range of belief values from 0 to 1. Since
zero values inferred by IBP/IJGP are proved correct, we focus especially on the behavior
of IBP/IJGP for near-zero inferred beliefs.
Using names inspired by the well-known measures in information retrieval, we report
Recall Absolute Error and Precision Absolute Error over small intervals spanning [0, 1].
Recall is the absolute error averaged over all the exact beliefs that fall into the interval, and
can therefore be viewed as capturing the level of completeness. For precision, the average
is taken over all the belief values computed by IBP/IJGP that fall into the interval, and can
be viewed as capturing soundness.
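A minimal sketch of these two measures, assuming parallel lists of exact and approximate marginals; the function name and the binning convention shown are ours:

```python
def interval_errors(exact, approx, width=0.05):
    """Recall / Precision absolute error per interval of the given width."""
    bins = int(round(1 / width))
    recall = {k: [] for k in range(bins + 1)}
    precision = {k: [] for k in range(bins + 1)}
    for p_exact, p_approx in zip(exact, approx):
        err = abs(p_exact - p_approx)
        recall[min(int(p_exact / width), bins)].append(err)      # bin by exact value
        precision[min(int(p_approx / width), bins)].append(err)  # bin by IBP/IJGP value
    avg = lambda xs: sum(xs) / len(xs) if xs else None
    return ({k: avg(v) for k, v in recall.items()},
            {k: avg(v) for k, v in precision.items()})
```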
Figure 5. Coding, N=200, evidence=100, w*=15, 1000 instances (panels for noise = 0.20, 0.40 and 0.60; bars show the Exact and IBP histograms, lines the Recall and Precision absolute errors; plots omitted).
The X coordinate in Figure 5 and Figure 10 denotes the interval [X, X+0.05). For the rest of the figures, the X coordinate denotes the interval (X-0.05, X], where the 0 interval is [0, 0]. The left Y axis corresponds to the histograms (the bars), while the right Y axis corresponds to the absolute error (the lines). For problems with binary variables, we only show the interval [0, 0.5] because the graphs are symmetric around 0.5. The number of variables, number of evidence variables and induced width w* are reported in each graph. Since the behavior within each benchmark is similar, we report a subset of the results (for an extended report see [Rollon and Dechter 2009]).
Coding networks. Coding networks are the famous case where IBP has impressive performance. The instances are from the class of linear block codes, with 50 nodes per layer and 3 parent nodes for each variable. We experiment with instances having three different values of channel noise: 0.2, 0.4 and 0.6. For each channel value, we generate 1000 samples.

Figure 5 shows the results. When the noise level is 0.2, all the beliefs computed by IBP are extreme. The Recall and Precision errors are very small, of the order of $10^{-11}$. So, in this case, all the beliefs are extreme and IBP is able to infer them correctly, resulting in almost perfect accuracy (IBP is indeed perfect in this case for the bit error rate). As noise increases, the Recall and Precision curves get closer to a bell shape, indicating higher error for values close to 0.5 and smaller error for extreme values. The histograms show that fewer belief values are extreme as noise increases.
Linkage Analysis networks. Genetic linkage analysis is a statistical method for mapping genes onto a chromosome. The problem can be modeled as a belief network. We experimented with four pedigree instances from the UAI08 competition. The domain size ranges between 1 and 4. For these instances exact results are available. Figure 6 shows the results. We observe that the number of exact 0 beliefs is small and IJGP correctly infers all of them. The behavior of IJGP for small beliefs varies across instances. For pedigree1, the Exact and IJGP histograms are about the same (for all intervals). Moreover, Recall and Precision errors are relatively small. For the rest of the instances, the accuracy of IJGP for extreme inferred marginals decreases. Notice that IJGP infers more small beliefs than the number of exact extremes in the corresponding intervals, leading to a relatively high Precision error with a small Recall error. The behavior for beliefs in the 0.5 interval is reversed, leading to a high Recall error with a small Precision error. As expected, the accuracy of IJGP improves as the value of the control parameter i-bound increases.
Figure 6. Results on pedigree instances. Each row is the result for one instance. Each
column is the result of running IJGP with i-bound equal to 3 and 7, respectively. The
number of variables N, number of evidence variables NE, and induced width w* of each
instance is as follows. Pedigree1: N = 334, NE = 36 and w*=21; pedigree23: N = 402,
NE = 93 and w*=30; pedigree37: N = 1032, NE = 306 and w*=30; pedigree38:
N = 724, NE = 143 and w*=18.
Figure 7. Results on grids2 instances. First row shows the results for parameter configuration (16, 50). Second row shows the results for (16, 75). Each column is the result of running IJGP with i-bound equal to 3, 5, and 7, respectively. Each plot indicates the mean value for up to 10 instances. Both parameter configurations have 256 variables, one evidence variable, and induced width w*=22.
Grid networks. Grid networks are characterized by two parameters (N, D), where N×N is the size of the network and D is the percentage of determinism (i.e., the percentage of values in all CPTs assigned to either 0 or 1). We experiment with grids2 instances from the UAI08 competition, characterized by parameters with N ranging over 16, ..., 42 and D over 50, 75, 90. For each parameter configuration, there are samples of size 10 generated by randomly assigning value 1 to one leaf node.

Figure 7 and Figure 8 report the results. IJGP correctly infers all 0 beliefs. However, its performance for small beliefs is quite poor. Only for networks with parameters (16, 50) is the Precision error relatively small (less than 0.05). If we fix the size of the network and the i-bound, both Precision and Recall errors increase as the determinism level D increases. The histograms clearly show the gap between the number of true small beliefs and the ones inferred by IJGP. As before, the accuracy of IJGP improves as the value of the control parameter i-bound increases.
Figure 8. Results on grids2 instances. First row shows the results for parameter configuration (26, 75). Second row shows the results for (26, 90). Each column is the result of running IJGP with i-bound equal to 3, 5 and 7, respectively. Each plot indicates the mean value for up to 10 instances. Both parameter configurations have 676 variables, one evidence variable, and induced width w*=40.
Two-layer noisy-OR networks. Variables are organized in two layers, where the ones in the second layer have 10 parents. Each probability table represents a noisy OR-function. Each parent variable $y_j$ has a value $P_j \in [0..P_{noise}]$. The CPT for each variable in the second layer is then defined as $P(x = 0 \mid y_1, \ldots, y_P) = \prod_{y_j = 1} P_j$ and $P(x = 1 \mid y_1, \ldots, y_P) = 1 - P(x = 0 \mid y_1, \ldots, y_P)$. We experiment on bn2o instances from the UAI08 competition.
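A small sketch of this noisy-OR CPT in Python; the function name, the random draw of the parameters $P_j$, and the default arguments are our illustrative assumptions:

```python
import random
from itertools import product

def noisy_or_cpt(num_parents=10, p_noise=0.4, seed=0):
    """Build P(x=0 | y_1..y_P) = product of P_j over the active parents."""
    rng = random.Random(seed)
    p = [rng.uniform(0, p_noise) for _ in range(num_parents)]  # P_j in [0..P_noise]
    cpt = {}
    for ys in product([0, 1], repeat=num_parents):
        p_x0 = 1.0
        for y, p_j in zip(ys, p):
            if y == 1:
                p_x0 *= p_j            # each active parent fails with probability P_j
        cpt[ys] = (p_x0, 1.0 - p_x0)   # (P(x=0 | ys), P(x=1 | ys))
    return cpt
```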
Figure 9 reports the results for 3 instances. In this case, IJGP is very accurate for all
instances. In particular, the accuracy in small beliefs is very high.
CPCS networks. These are medical diagnosis networks derived from the Computer-Based Patient Care Simulation (CPCS) expert system. We tested two networks, cpcs54 and cpcs360, with 54 and 360 variables, respectively. For the first network, we generate samples of size 100 by randomly assigning 10 variables as evidence. For the second network, we also generate samples of the same size, by randomly assigning 20 and 30 variables as evidence.

Figure 10 shows the results. The histograms show opposing trends in the distribution of beliefs. Although irregular, the absolute error tends to increase towards 0.5 for cpcs54. In general, the error is quite small throughout all intervals and, in particular, for inferred extreme marginals.
Figure 9. Results on bn2o instances. Each row is the result for one instance. Each column
in each row is the result of running IJGP with i-bound equal to 3, 5 and 7, respectively. The
number of variables N, number of evidence variables NE, and induced width w* of each
instance is as follows. bn2o-30-15-150-1a: N = 45, NE = 15, and w*=24; bn2o-30-20-
200-1a: N = 50, NE = 20, and w*=27; bn2o-30-25-250-1a: N = 55, NE = 25, and
w*=26.
6.2 On the Impact of Epsilon Loop-Cutset
In [Bidyuk and Dechter 2001] we also explored the hypothesis that the loop-cutset impact
on IBP's performance, as discussed in Section 5.2, extends to variables with extreme support.
Extreme support is expressed in the form of either an extreme prior value $P(x_i) < \epsilon$
or a strong correlation with an observed variable. We hypothesize that a variable $X_i$ with
extreme support nearly cuts the information flow from its parents to its children, similar to
an observed variable. Subsequently, we conjecture that when a subset of variables with extreme
support, called an $\epsilon$-cutset, forms a loop-cutset of the graph, IBP converges and computes
beliefs that approach the exact ones.
We will briefly recap the empirical evidence supporting the hypothesis in 2-layer noisy-OR
networks. The number of root nodes $m$ and total number of nodes $n$ was fixed in each
Figure 10. CPCS54, 100 instances, w*=15; CPCS360, 5 instances, w*=20
test set (indexed $m$-$n$). Generating the networks, each leaf node $Y_j$ was added to the list
of children of a root node $U_i$ with probability 0.5. All nodes were bi-valued. All leaf nodes
were observed. We used the average absolute error in the posterior marginals (averaged over
all unobserved variables) to measure IBP's accuracy, and the percent of variables for which
IBP converged as a measure of convergence. In each group of experiments, the results were
averaged over 100 instances.
In one set of experiments, we measured the performance of IBP while changing the
number of observed loop-cutset variables (we fixed all priors to (.5, .5) and picked the
observed value for loop-cutset variables at random). The results are shown in Figure 11,
top. As expected, the number of converged nodes increased and the absolute average error
decreased monotonically as the number of observed loop-cutset nodes increased.
Then, we repeated the experiment except that now, instead of instantiating a loop-cutset
variable, we set its priors to the extreme $(\epsilon, 1-\epsilon)$ with $\epsilon = 1E{-}10$; i.e., instead of increasing the
number of observed loop-cutset variables, we increased the number of $\epsilon$-cutset variables. If
our hypothesis is correct, increasing the size of the $\epsilon$-cutset should produce an effect similar
to increasing the number of observed loop-cutset variables, namely, improved convergence
and better accuracy in the beliefs computed by IBP. The results, in Figure 11, bottom, demonstrate
that initially, as the number of $\epsilon$-cutset variables grows, the performance of IBP improves
just as we conjectured. However, the percentage of nodes with converged beliefs never
reaches 100%, just as the average absolute error converges to some $\delta > 0$. In the case of the
10-40 network, the number of converged beliefs (average absolute error) reaches a maximum
of 95% (minimum of .001) at 3 $\epsilon$-cutset nodes and then drops to 80% (increases to
.003) as the size of the $\epsilon$-cutset increases.
To further investigate the effect of the strength of $\epsilon$-support on the performance of IBP,
we experimented on the same 2-layer networks varying the prior values of the loop-cutset
nodes from $(\epsilon, 1-\epsilon)$ to $(1-\epsilon, \epsilon)$ for $\epsilon \in [1E{-}10, .5]$. As shown in Figure 12, initially, as $\epsilon$
decreased, the convergence and accuracy of IBP worsened. This effect was previously
reported by Murphy, Weiss, and Jordan [Murphy, Weiss, and Jordan 2000]. However,
as the priors of loop-cutset nodes continue to approach 0 and 1, the average error value
approaches 0 and the number of converged nodes reaches 100%. Note that convergence is
not symmetric with respect to $\epsilon$. The average absolute error and percentage of converged
nodes approach 0 and 1 respectively for $\epsilon = 1-(1E{-}10)$ but not for $\epsilon = 1E{-}10$ (which we also
observed in Figure 11, bottom).
Figure 11. Results for 2-layer Noisy-OR networks. The average error and the number of converged nodes vs. the number of truly observed loop-cutset nodes (top) and the size of the $\epsilon$-cutset (bottom).
7 Conclusion
The paper provides insight into the power of the Iterative Belief Propagation (IBP) algorithm
by making its relationship with constraint propagation explicit. We show that the
power of belief propagation for zero beliefs is identical to the power of arc-consistency in
removing inconsistent domain values. Therefore, the strength and weakness of this scheme
can be gleaned from understanding the inference power of arc-consistency. In particular,
we show that the inference of zero beliefs (marginals) by IBP and IJGP is always sound.
These algorithms are guaranteed to converge for inferred zeros and are as efficient as the
corresponding constraint propagation algorithms.
The paper then empirically investigates whether the sound inference of zeros by IBP
extends to near zeros. We show that while the inference of near zeros is often quite accurate,
it can sometimes be extremely inaccurate for networks having significant determinism.
Specifically, for networks without determinism, IBP's near-zero inference was sound in the
sense that the average absolute error was contained within the length of the 0.05 interval
(see the two-layer noisy-OR and CPCS benchmarks). However, the behavior was different on
benchmark networks having determinism. For example, experiments on coding networks
show that IBP is almost perfect, while for pedigree and grid networks the results are quite
inaccurate near zeros.
Figure 12. Results for 2-layer Noisy-OR networks. The average error and the percent of converged nodes vs. $\epsilon$-support.
Finally, we show that evidence, observed or inferred, automatically acts as a cycle-cutting
mechanism and improves the performance of IBP. We also provide a preliminary
empirical evaluation showing that the effect of the loop-cutset on the accuracy of IBP extends
to variables that have extreme probabilities.
References

Bidyuk, B. and R. Dechter (2001). The epsilon-cutset effect in Bayesian networks (R97, R97a in http://www.ics.uci.edu/~dechter/publications). Technical report, University of California, Irvine.

Dechter, R. (2003). Constraint Processing. Morgan Kaufmann Publishers.

Dechter, R. and R. Mateescu (2003). A simple insight into iterative belief propagation's success. In Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI'03), pp. 175-183.

Dechter, R., R. Mateescu, and K. Kask (2002). Iterative join-graph propagation. In Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence (UAI'02), pp. 128-136.

Dechter, R. and J. Pearl (1991). Directed constraint networks: A relational framework for causal reasoning. In Proceedings of the Twelfth International Joint Conference on Artificial Intelligence (IJCAI'91), pp. 1164-1170.

Ihler, A. T. (2007). Accuracy bounds for belief propagation. In Proceedings of the Twenty-Third Conference on Uncertainty in Artificial Intelligence (UAI'07).

Ihler, A. T., J. W. Fisher, III, and A. S. Willsky (2005). Loopy belief propagation: Convergence and effects of message errors. J. Machine Learning Research 6, 905-936.

Koller, D. (2010). Belief propagation in loopy graphs. In Heuristics, Probabilities and Causality: A Tribute to Judea Pearl, Editors R. Dechter, H. Geffner and J. Halpern.

Mackworth, A. K. (1977). Consistency in networks of relations. Artificial Intelligence 8(1), 99-118.

Mateescu, R., K. Kask, V. Gogate, and R. Dechter (2010). Iterative join-graph propagation. Journal of Artificial Intelligence Research (JAIR) (accepted, 2009).

McEliece, R. J., D. J. C. MacKay, and J. F. Cheng (1998). Turbo decoding as an instance of Pearl's belief propagation algorithm. IEEE J. Selected Areas in Communication 16(2), 140-152.

Mooij, J. M. and H. J. Kappen (2007). Sufficient conditions for convergence of the sum-product algorithm. IEEE Trans. Information Theory 53(12), 4422-4437.

Mooij, J. M. and H. J. Kappen (2009). Bounds on marginal probability distributions. In Advances in Neural Information Processing Systems 21 (NIPS'08), pp. 1105-1112.

Murphy, K., Y. Weiss, and M. Jordan (2000). Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI'00), pp. 467-475.

Pearl, J. (1986). Fusion, propagation, and structuring in belief networks. Artificial Intelligence 29(3), 241-288.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann Publishers.

Rish, I., K. Kask, and R. Dechter (1998). Empirical evaluation of approximation algorithms for probabilistic decoding. In Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI'98), pp. 455-463.

Rollon, E. and R. Dechter (December 2009). Some new empirical analysis in iterative join-graph propagation (R170 in http://www.ics.uci.edu/~dechter/publications). Technical report, University of California, Irvine.

Roosta, T. G., M. J. Wainwright, and S. S. Sastry (2008). Convergence analysis of reweighted sum-product algorithms. IEEE Trans. Signal Processing 56(9), 4293-4305.

Yedidia, J. S., W. T. Freeman, and Y. Weiss (2000). Generalized belief propagation. In Advances in Neural Information Processing Systems 13 (NIPS'00), pp. 689-695.