
Lecture 2: Directed Graphical Models

Sam Roweis

January 7, 2004

Conditional Independence

• Notation: xA ⊥ xB | xC
• Definition: two (sets of) variables xA and xB are conditionally independent given a third xC if:
P(xA, xB | xC) = P(xA | xC) P(xB | xC) ∀xC
which is equivalent to saying
P(xA | xB, xC) = P(xA | xC) ∀xC
• Only a subset of all distributions respect any given (nontrivial) conditional independence statement. The subset of distributions that respect all the CI assumptions we make is the family of distributions consistent with our assumptions.
• Probabilistic graphical models are a powerful, elegant and simple way to specify such a family.
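To make the definition concrete, here is a minimal numeric check (not from the lecture; the helper is_cond_indep and the table values are invented): build a joint over three binary variables as P(c)P(a|c)P(b|c), which satisfies a ⊥ b | c by construction, and confirm the factorization property.

    import numpy as np

    def is_cond_indep(joint, tol=1e-9):
        # joint is a 3-d array P(a, b, c) indexed [a, b, c].
        p_c = joint.sum(axis=(0, 1))                   # P(c)
        p_ab_c = joint / p_c                           # P(a, b | c)
        p_a_c = joint.sum(axis=1) / p_c                # P(a | c)
        p_b_c = joint.sum(axis=0) / p_c                # P(b | c)
        return np.allclose(p_ab_c, p_a_c[:, None, :] * p_b_c[None, :, :], atol=tol)

    # A joint built as P(c) P(a|c) P(b|c) satisfies a ⊥ b | c by construction.
    p_c = np.array([0.3, 0.7])
    p_a_c = np.array([[0.9, 0.2], [0.1, 0.8]])         # columns indexed by c
    p_b_c = np.array([[0.5, 0.1], [0.5, 0.9]])
    joint = p_a_c[:, None, :] * p_b_c[None, :, :] * p_c
    print(is_cond_indep(joint))                        # True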

Joint Probabilities

• Goal 1: represent a joint distribution P(X) = P(x1, x2, . . . , xn) compactly even when there are many variables.
• Goal 2: efficiently calculate marginals and conditionals of such compactly represented joint distributions.
• Notice: for n discrete variables of arity k, the naive (table) representation is HUGE: it requires k^n entries.
• We need to make some assumptions about the distribution. One simple assumption: independence == complete factorization:
P(X) = ∏i P(xi)
• But the independence assumption is too restrictive, so we make conditional independence assumptions instead.

Probabilistic Graphical Models

• Probabilistic graphical models represent large joint distributions compactly using a set of "local" relationships specified by a graph.
• Each random variable in our model corresponds to a graph node.
• There are directed/undirected edges between the nodes which tell us qualitatively about the factorization of the joint probability.
• There are functions stored at the nodes which tell us the quantitative details of the pieces into which the distribution factors.

[Figure: a six-node directed graph over X1–X6 and a three-node chain X → Y → Z.]

• Graphical models are also known as Bayes(ian) (Belief) Net(work)s.
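To put numbers on the k^n blow-up and the savings from factorization, a small sketch assuming six binary variables and the parent sets of the example DAG shown below (the counting itself is standard: one free value per setting of a node's parents):

    # Free parameters for n = 6 binary variables: full table vs. factored DAG.
    n = 6
    full_table = 2 ** n - 1          # one probability per configuration, minus normalization

    # Parent sets of the six-node example DAG (x1 -> x2, x1 -> x3,
    # x2 -> x4, x3 -> x5, {x2, x5} -> x6).
    parents = {1: [], 2: [1], 3: [1], 4: [2], 5: [3], 6: [2, 5]}
    factored = sum(2 ** len(p) for p in parents.values())   # one free value per parent setting

    print(full_table, factored)      # 63 vs 13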


Directed Graphical Models

• Consider directed acyclic graphs over n variables.
• Each node has a (possibly empty) set of parents πi.
• Each node maintains a function fi(xi; xπi) such that fi > 0 and Σxi fi(xi; xπi) = 1 ∀πi.
• Define the joint probability to be:
P(x1, x2, . . . , xn) = ∏i fi(xi; xπi)
Even with no further restriction on the fi, it is always true that
fi(xi; xπi) = P(xi|xπi)
so we will just write
P(x1, x2, . . . , xn) = ∏i P(xi|xπi)
• Factorization of the joint in terms of local conditional probabilities.
Exponential in the "fan-in" of each node instead of in the total number of variables n.

Example DAG

• Consider this six node network:

[Figure: DAG over X1–X6 with edges X1 → X2, X1 → X3, X2 → X4, X3 → X5, X2 → X6 and X5 → X6, annotated with the conditional probability table at each node.]

The joint probability is now:
P(x1, x2, x3, x4, x5, x6) = P(x1)P(x2|x1)P(x3|x1)P(x4|x2)P(x5|x3)P(x6|x2, x5)
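A minimal sketch of this factorization in code; the CPT values below are invented for illustration (the actual numbers in the slide's tables are not reproduced here):

    import numpy as np
    from itertools import product

    # Made-up CPTs for the six-node example DAG; all variables are binary.
    p1    = np.array([0.6, 0.4])                        # P(x1)
    p2_1  = np.array([[0.7, 0.3], [0.2, 0.8]])          # P(x2 | x1), rows indexed by x1
    p3_1  = np.array([[0.5, 0.5], [0.9, 0.1]])          # P(x3 | x1)
    p4_2  = np.array([[0.8, 0.2], [0.3, 0.7]])          # P(x4 | x2)
    p5_3  = np.array([[0.6, 0.4], [0.1, 0.9]])          # P(x5 | x3)
    p6_25 = np.array([[[0.9, 0.1], [0.4, 0.6]],         # P(x6 | x2, x5),
                      [[0.5, 0.5], [0.2, 0.8]]])        # indexed [x2, x5, x6]

    def joint(x1, x2, x3, x4, x5, x6):
        # One conditional per node given its parents, multiplied together.
        return (p1[x1] * p2_1[x1, x2] * p3_1[x1, x3] *
                p4_2[x2, x4] * p5_3[x3, x5] * p6_25[x2, x5, x6])

    # Sanity check: the factored joint sums to 1 over all 2^6 configurations.
    print(sum(joint(*xs) for xs in product([0, 1], repeat=6)))   # 1.0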

Conditional Independence in DAGs

• If we order the nodes in a directed graphical model so that parents always come before their children in the ordering, then the graphical model implies the following about the distribution:
{xi ⊥ xπ̃i | xπi} ∀i
where xπ̃i are the nodes coming before xi that are not its parents.
• In other words, the DAG is telling us that each variable is conditionally independent of its non-descendants given its parents.
• Such an ordering is called a "topological" ordering.

Missing Edges

• Key point about directed graphical models:
Missing edges imply conditional independence.
• Remember that by the chain rule we can always write the full joint as a product of conditionals, given an ordering:
P(x1, x2, x3, x4, . . .) = P(x1)P(x2|x1)P(x3|x1, x2)P(x4|x1, x2, x3) . . .
• If the joint is represented by a DAGM, then some of the conditioned variables on the right hand sides are missing. This is equivalent to enforcing conditional independence.
• Start with the "idiot's graph": each node has all previous nodes in the ordering as its parents.
• Now remove edges to get your DAG. Removing an edge into node i eliminates an argument from the conditional probability factor P(xi|x1, x2, . . . , xi−1), as the check below illustrates.
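Continuing the invented six-node sketch above, we can check numerically that a missing edge really does eliminate an argument: the full chain-rule factor P(x6|x1, . . . , x5) depends only on x2 and x5.

    # P(x6 | x1..x5) collapses to P(x6 | x2, x5), reusing joint() and p6_25 above.
    for x1, x2, x3, x4, x5 in product([0, 1], repeat=5):
        num = joint(x1, x2, x3, x4, x5, 1)
        den = sum(joint(x1, x2, x3, x4, x5, x6) for x6 in (0, 1))
        assert abs(num / den - p6_25[x2, x5, 1]) < 1e-12
    print("P(x6 | x1..x5) matches P(x6 | x2, x5) for every configuration")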
Even more structure

• Surprisingly, once you have specified the basic conditional independencies, there are other ones that follow from those.
• In general, it is a hard problem to say which extra CI statements follow from a basic set. However, in the case of DAGMs, we have an efficient way of generating all CI statements that must be true given the connectivity of the graph.
• This involves the idea of d-separation in a graph.
• Notice that for specific (numerical) choices of factors at the nodes there may be even more conditional independencies, but we are only concerned with statements that are always true of every member of the family of distributions, no matter what specific factors live at the nodes.
• Remember: the graph alone represents a family of joint distributions consistent with its CI assumptions, not any specific distribution.

Chain

[Figure: the chain X → Y → Z, shown with Y unshaded and with Y shaded.]

• Q: When we condition on y, are x and z independent?
P(x, y, z) = P(x)P(y|x)P(z|y)
which implies
P(z|x, y) = P(x, y, z) / P(x, y) = P(x)P(y|x)P(z|y) / (P(x)P(y|x)) = P(z|y)
and therefore x ⊥ z | y.
• Think of x as the past, y as the present and z as the future.
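The same algebra can be verified numerically; a quick sketch with invented tables for the chain x → y → z:

    import numpy as np

    # Chain x -> y -> z: verify P(z | x, y) = P(z | y) for made-up tables.
    p_x   = np.array([0.3, 0.7])
    p_y_x = np.array([[0.6, 0.4], [0.1, 0.9]])    # rows indexed by x
    p_z_y = np.array([[0.8, 0.2], [0.25, 0.75]])  # rows indexed by y

    joint = p_x[:, None, None] * p_y_x[:, :, None] * p_z_y[None, :, :]  # [x, y, z]
    p_z_xy = joint / joint.sum(axis=2, keepdims=True)                   # P(z | x, y)
    print(np.allclose(p_z_xy, p_z_y[None, :, :]))                       # True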

D-separation

• D-separation, or directed-separation, is a notion of connectedness in DAGMs in which two (sets of) variables may or may not be connected conditioned on a third (set of) variable.
• D-connection implies conditional dependence and d-separation implies conditional independence.
• In particular, we say that xA ⊥ xB | xC if every variable in A is d-separated from every variable in B conditioned on all the variables in C.
• To check if an independence is true, we can cycle through each node in A, do a depth-first search to reach every node in B, and examine the path between them. If all of the paths are d-separated, then we can assert xA ⊥ xB | xC.
• Thus, it will be sufficient to consider triples of nodes. (Why?)
• Pictorially, when we condition on a node, we shade it in.

Common Cause

[Figure: Y with arrows to X and Z, shown with Y unshaded and with Y shaded; y is the common cause of the two independent effects x and z.]

• Q: When we condition on y, are x and z independent?
P(x, y, z) = P(y)P(x|y)P(z|y)
which implies
P(x, z|y) = P(x, y, z) / P(y) = P(y)P(x|y)P(z|y) / P(y) = P(x|y)P(z|y)
and therefore x ⊥ z | y.
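As with the chain, the derivation is easy to confirm numerically with invented tables:

    import numpy as np

    # Common cause: build P(x, y, z) = P(y) P(x|y) P(z|y) and confirm
    # P(x, z | y) = P(x | y) P(z | y).
    p_y   = np.array([0.4, 0.6])
    p_x_y = np.array([[0.8, 0.2], [0.3, 0.7]])    # rows indexed by y
    p_z_y = np.array([[0.5, 0.5], [0.9, 0.1]])

    joint = p_y[:, None, None] * p_x_y[:, :, None] * p_z_y[:, None, :]  # [y, x, z]
    p_xz_y = joint / joint.sum(axis=(1, 2), keepdims=True)              # P(x, z | y)
    print(np.allclose(p_xz_y, p_x_y[:, :, None] * p_z_y[:, None, :]))   # True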
Explaining Away

[Figure: the collider X → Y ← Z, shown with Y unshaded and with Y shaded.]

• Q: When we condition on y, are x and z independent?
P(x, y, z) = P(x)P(z)P(y|x, z)
• x and z are marginally independent, but given y they are conditionally dependent.
• This important effect is called explaining away (Berkson's paradox).
• For example, flip two coins independently; let x = coin 1, z = coin 2. Let y = 1 if the coins come up the same and y = 0 if different.
• x and z are independent, but if I tell you y, they become coupled!

Bayes-Ball Rules

• The three cases we considered tell us rules:

[Figure: for each of the chain, common cause and explaining away cases, an (a) unshaded and (b) shaded version of the three-node graph, showing when the ball passes through and when it bounces back.]
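The coin-flip example from the Explaining Away slide above is easy to simulate (a quick sketch, not part of the lecture):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.integers(0, 2, size=n)     # coin 1
    z = rng.integers(0, 2, size=n)     # coin 2
    y = (x == z).astype(int)           # y = 1 iff the coins match

    # Marginally independent: knowing x tells us nothing about z.
    print(z[x == 1].mean(), z[x == 0].mean())        # both ≈ 0.5

    # Conditioned on y = 1, knowing x pins z down completely.
    same = y == 1
    print(z[same & (x == 1)].mean(), z[same & (x == 0)].mean())  # ≈ 1.0, ≈ 0.0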

Bayes Ball Algorithm

• To check if xA ⊥ xB | xC we need to check if every variable in A is d-separated from every variable in B conditioned on all vars in C.
• In other words, given that all the nodes in xC are clamped, when we wiggle nodes xA can we change any of the nodes xB?
• The Bayes-Ball Algorithm is such a d-separation test:
We shade all nodes xC, place balls at each node in xA (or xB), let them bounce around according to some rules, and then ask if any of the balls reach any of the nodes in xB (or xA). (An implementation sketch follows after the boundary rules.)

[Figure: a ball travelling from X toward Z through an intermediate node Y.]

So we need to know what happens when a ball arrives at a node Y on its way from X to Z.

Bayes-Ball Boundary Rules

• We also need the boundary conditions:

[Figure: boundary-case micrographs X → Y, each in (a) unshaded and (b) shaded versions.]

• Here's a trick for the explaining away case: If y or any of its descendants is shaded, the ball passes through.

[Figure: X → Y ← Z with a descendant W below Y; shading W lets the ball pass.]

• Notice balls can travel opposite to edge directions.
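Here is a compact implementation sketch of the test promised above. The function names and bookkeeping are my own; it follows the standard two-phase formulation of Bayes-Ball (first collect the observed nodes and their ancestors, then breadth-first search over (node, direction) pairs), with the boundary and explaining-away rules built in: a collider passes the ball only if it or one of its descendants is shaded.

    from collections import deque

    def bayes_ball_reachable(parents, start, observed):
        # Nodes reachable from `start` along active paths given `observed`.
        # `parents` maps each node to the list of its parents in the DAG.
        children = {v: [] for v in parents}
        for v, ps in parents.items():
            for p in ps:
                children[p].append(v)

        # Phase 1: observed nodes plus all their ancestors (a collider is
        # active iff an observed node sits at or below it).
        anc, stack = set(observed), list(observed)
        while stack:
            for p in parents[stack.pop()]:
                if p not in anc:
                    anc.add(p)
                    stack.append(p)

        # Phase 2: BFS over (node, direction): 'up' = arrived from a child,
        # 'down' = arrived from a parent.
        visited, reachable = set(), set()
        queue = deque([(start, 'up')])
        while queue:
            v, d = queue.popleft()
            if (v, d) in visited:
                continue
            visited.add((v, d))
            if v not in observed:
                reachable.add(v)
            if d == 'up' and v not in observed:
                # Ball passes upward through chains and diverges at forks.
                for p in parents[v]:
                    queue.append((p, 'up'))
                for c in children[v]:
                    queue.append((c, 'down'))
            elif d == 'down':
                if v not in observed:
                    # Unshaded chain node: the ball keeps moving down.
                    for c in children[v]:
                        queue.append((c, 'down'))
                if v in anc:
                    # Collider with an observed node at or below it:
                    # the ball bounces back up to the parents.
                    for p in parents[v]:
                        queue.append((p, 'up'))
        return reachable

    def d_separated(parents, A, B, C):
        # True iff every node in A is d-separated from every node in B given C.
        return all(not (bayes_ball_reachable(parents, a, set(C)) & set(B)) for a in A)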
Canonical Micrographs

[Figure: the canonical micrographs (chain X → Y → Z, common cause X ← Y → Z, explaining away X → Y ← Z, and the X → Y boundary cases), each in (a) unshaded and (b) shaded versions.]

Examples of Bayes-Ball Algorithm

x2 ⊥ x3 | {x1, x6}?

[Figure: the six-node network with x1 and x6 shaded.]

Notice: balls can travel opposite to edge directions.

Examples of Bayes-Ball Algorithm

x1 ⊥ x6 | {x2, x3}?

[Figure: the six-node network with x2 and x3 shaded.]

Families of Distributions

• Consider two families of distributions.
• One is generated by all possible settings of the conditional probability tables in the DAGM form:
P(x1, x2, . . . , xn) = ∏i P(xi|xπi)
• The other is generated by finding all the conditional independencies implied by a DAGM and eliminating any joint distributions which violate them.
• A version of the amazing Hammersley-Clifford Theorem (1971) states that these two families are exactly the same.
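Running the d_separated sketch from the Bayes Ball Algorithm slide on the six-node network reproduces the answers to the two example queries above:

    # Parent sets of the six-node network from the slides.
    parents = {1: [], 2: [1], 3: [1], 4: [2], 5: [3], 6: [2, 5]}

    # x2 ⊥ x3 | {x1, x6}?  No: shading x6 activates the path x2 -> x6 <- x5 <- x3.
    print(d_separated(parents, {2}, {3}, {1, 6}))   # False

    # x1 ⊥ x6 | {x2, x3}?  Yes: every path from x1 to x6 passes through a shaded
    # chain node (x2 or x3), so every ball is blocked.
    print(d_separated(parents, {1}, {6}, {2, 3}))   # True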
