Bayes Ball
• Goal 1: represent a joint distribution P(X) = P(x1, x2, . . . , xn) compactly even when there are many variables.
• Goal 2: efficiently calculate marginals and conditionals of such compactly represented joint distributions.
• Notice: for n discrete variables of arity k, the naive (table) representation is HUGE: it requires k^n entries.
• We need to make some assumptions about the distribution.

• Probabilistic graphical models represent large joint distributions compactly using a set of “local” relationships specified by a graph.
• Each random variable in our model corresponds to a graph node.
• There are directed/undirected edges between the nodes which tell us qualitatively about the factorization of the joint probability.
• There are functions stored at the nodes which tell us the quantitative details of the pieces into which the distribution factors.
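• To see how quickly the naive table blows up, a quick numeric check (a minimal sketch; the particular n and k are arbitrary):

    # Entries in the naive joint table over n variables of arity k.
    n, k = 20, 4
    print(k ** n)  # 1099511627776 -- about 10^12 entries at only n = 20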
• One simple assumption: independence == complete factorization:

      P(X) = ∏i P(xi)

• But the independence assumption is too restrictive.
  So we make conditional independence assumptions instead.
[Figure: the running example, a DAG over X1, . . . , X6]

      P(x1, x2, x3, x4, x5, x6) = P(x1)P(x2|x1)P(x3|x1)P(x4|x2)P(x5|x3)P(x6|x2, x5)

• Each node maintains a function fi(xi; xπi) such that

      fi > 0 and Σxi fi(xi; xπi) = 1 ∀πi

• Define the joint probability to be:

      P(x1, x2, . . . , xn) = ∏i fi(xi; xπi)
Even with no further restriction on the fi, it is always true that

      fi(xi; xπi) = P(xi|xπi)

so we will just write

      P(x1, x2, . . . , xn) = ∏i P(xi|xπi)

• Factorization of the joint in terms of local conditional probabilities.
  Exponential in “fan-in” of each node instead of in total variables n.

[Figure: the example DAG over X1, . . . , X6 again, with the conditional probability table stored at each node]
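• To make the factorization concrete, here is a minimal Python sketch for the six-variable example (only the graph structure comes from the slide; the CPT numbers are made up for illustration):

    # Joint over binary x1..x6 for the example DAG:
    # P(x1)P(x2|x1)P(x3|x1)P(x4|x2)P(x5|x3)P(x6|x2,x5).
    # Each table maps parent values -> P(x_i = 1 | parents).
    p1 = 0.6                              # P(x1 = 1)
    p2 = {0: 0.3, 1: 0.8}                 # P(x2 = 1 | x1)
    p3 = {0: 0.5, 1: 0.1}                 # P(x3 = 1 | x1)
    p4 = {0: 0.2, 1: 0.7}                 # P(x4 = 1 | x2)
    p5 = {0: 0.9, 1: 0.4}                 # P(x5 = 1 | x3)
    p6 = {(0, 0): 0.1, (0, 1): 0.5,       # P(x6 = 1 | x2, x5)
          (1, 0): 0.6, (1, 1): 0.95}

    def bern(p, x):
        # Probability that a Bernoulli(p) variable takes value x.
        return p if x == 1 else 1.0 - p

    def joint(x1, x2, x3, x4, x5, x6):
        # Product of local conditionals, one factor per node.
        return (bern(p1, x1) * bern(p2[x1], x2) * bern(p3[x1], x3) *
                bern(p4[x2], x4) * bern(p5[x3], x5) * bern(p6[(x2, x5)], x6))

    # Parameter count: 1 + 2 + 2 + 2 + 2 + 4 = 13 numbers,
    # instead of the 2^6 - 1 = 63 needed for a full joint table.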
• If we order the nodes in a directed graphical model so that parents always come before their children in the ordering, then the graphical model implies the following about the distribution:

      {xi ⊥ xπ̃i | xπi} ∀i

  where xπ̃i are the nodes coming before xi that are not its parents.
• In other words, the DAG is telling us that each variable is conditionally independent of its non-descendants given its parents.
• Such an ordering is called a “topological” ordering.

• Key point about directed graphical models:
  Missing edges imply conditional independence.
• Remember that by the chain rule we can always write the full joint as a product of conditionals, given an ordering:

      P(x1, x2, x3, x4, . . .) = P(x1)P(x2|x1)P(x3|x1, x2)P(x4|x1, x2, x3) . . .

• If the joint is represented by a DAGM, then some of the conditioned variables on the right-hand sides are missing. This is equivalent to enforcing conditional independence.
• Start with the “idiot’s graph”: each node has all previous nodes in the ordering as its parents.
• Now remove edges to get your DAG.
• Removing an edge into node i eliminates an argument from the conditional probability factor p(xi|x1, x2, . . . , xi−1).
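• A topological ordering is cheap to compute. A minimal sketch (Kahn’s algorithm, a standard technique; the parent-list encoding of the example DAG is an assumption):

    from collections import deque

    # Parents of each node in the running six-variable example.
    parents = {1: [], 2: [1], 3: [1], 4: [2], 5: [3], 6: [2, 5]}

    def topological_order(parents):
        # Kahn's algorithm: repeatedly emit a node all of whose
        # parents have already been emitted.
        children = {i: [] for i in parents}
        indeg = {i: len(ps) for i, ps in parents.items()}
        for i, ps in parents.items():
            for p in ps:
                children[p].append(i)
        ready = deque(i for i, d in indeg.items() if d == 0)
        order = []
        while ready:
            i = ready.popleft()
            order.append(i)
            for c in children[i]:
                indeg[c] -= 1
                if indeg[c] == 0:
                    ready.append(c)
        return order

    print(topological_order(parents))  # e.g. [1, 2, 3, 4, 5, 6]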
Even more structure

[Figure: explaining-away graph, panels (a) and (b): X and Z both parents of Y, with Y unshaded and shaded]

• x and z are independent, but if I tell you y, they become coupled!
• The Bayes-Ball Algorithm is such a d-separation test:
  we shade all nodes xC, place balls at each node in xA (or xB), let them bounce around according to some rules, and then ask if any of the balls reach any of the nodes in xB (or xA).
• So we need to know what happens when a ball arrives at a node Y on its way from X to Z.

Chain

[Figure: chain micrographs, panels (a) and (b): X → Y → Z with Y unshaded and shaded]

• Here’s a trick for the explaining-away case:
  if y or any of its descendants is shaded, the ball passes through.
• Notice balls can travel opposite to edge directions.
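• The bouncing rules are mechanical enough to write down directly. Below is a minimal sketch of Bayes ball as a two-phase reachability test (this follows the standard textbook formulation; the function name and graph encoding are my own):

    # A minimal sketch of Bayes ball as a reachability test.
    def d_separated(parents, a, b, observed):
        # True iff a is d-separated from b given `observed`
        # in the DAG encoded as {node: [parents]}.
        children = {v: [] for v in parents}
        for v, ps in parents.items():
            for p in ps:
                children[p].append(v)

        # Phase 1: observed nodes and all their ancestors. This is what
        # makes the explaining-away trick work: a ball may bounce back
        # up at v iff v is observed or has an observed descendant.
        anc = set()
        stack = list(observed)
        while stack:
            v = stack.pop()
            if v not in anc:
                anc.add(v)
                stack.extend(parents[v])

        # Phase 2: search over (node, direction). 'up' means the ball
        # arrived from a child of v, 'down' means from a parent of v.
        visited = set()
        frontier = [(a, 'up')]          # start as if arriving from below
        while frontier:
            v, d = frontier.pop()
            if (v, d) in visited:
                continue
            visited.add((v, d))
            if v == b and v not in observed:
                return False            # a ball reached b: not separated
            if d == 'up' and v not in observed:
                frontier += [(p, 'up') for p in parents[v]]
                frontier += [(c, 'down') for c in children[v]]
            elif d == 'down':
                if v not in observed:   # chain: keep rolling downhill
                    frontier += [(c, 'down') for c in children[v]]
                if v in anc:            # explaining away: bounce upward
                    frontier += [(p, 'up') for p in parents[v]]
        return True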
Canonical Micrographs

[Figure: the canonical micrographs: chain, common-parent, and explaining-away cases over X, Y, Z, each shown with Y unshaded (a) and shaded (b)]

Examples of Bayes-Ball Algorithm

      x2 ⊥ x3 | {x1, x6} ?

[Figure: the six-node example DAG with x1 and x6 shaded]
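• Running the reachability sketch from above on this query (same hypothetical d_separated helper and parent-list encoding):

    parents = {1: [], 2: [1], 3: [1], 4: [2], 5: [3], 6: [2, 5]}
    print(d_separated(parents, 2, 3, observed={1, 6}))
    # False: shading x6 opens the collider on the path
    # x2 -> x6 <- x5 <- x3, so x2 and x3 are NOT conditionally
    # independent given {x1, x6}.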