Chapter 3
Louis Wehenkel
We will first propose a set of four rules and then verify that they
are valid inference rules for probabilistic independence relations.
Semi-graphoids
What are the desired properties of an independence relation ?
◮ Symmetry:
(X ⊥ Y|Z) ⇔ (Y ⊥ X|Z). (1)
◮ Decomposition:
(X ⊥ Y ∪ W|Z) ⇒ (X ⊥ Y|Z). (2)
◮ Contraction:
(X ⊥ Y|Z) ∧ (X ⊥ W|Z ∪ Y) ⇒ (X ⊥ Y ∪ W|Z). (3)
◮ Chaining rule: (see below)
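Presumably, the chaining rule is the weak-union property of semi-graphoids:
(X ⊥ Y ∪ W|Z) ⇒ (X ⊥ Y|Z ∪ W). (4)
Under this (assumed) reading, (1)-(4) are exactly the four standard semi-graphoid axioms.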
Exercise: Show that these two rules follow logically from (1) to
(4), and hence are valid inference rules for any probabilistic
independence relation.
Summary of UG models
◮ We have seen that we can make use of undirected models to
represent useful independencies, in a way compatible with the
definition of probabilistic conditional independence.
◮ Not all independence structures may be represented by UGs.
◮ But we can commit to the idea of building the most refined UG model of a
given independence structure, in the form of a minimal I-map.
◮ Further topics of relevance:
◮ For any G, is there a P such that G is a perfect map of P ?
(the answer is yes, under a few hypotheses).
◮ Once we have a minimal I-map (or simply an I-map) how do
we “decorate” this structure with numerical information to
represent a particular probability distribution P ? (later)
◮ How to carry out “numerical computations” with such
structures ? (later)
D-separation in DAGs
Definition (D-separation)
If X, Y, Z are three disjoint sets of vertices in a DAG D, then Z is
said to d-separate X from Y, denoted by (X; Y|Z)D , if there is no
path between a vertex in X and a vertex in Y along which the
following two conditions hold: (i) every node with converging
arrows is in Z or has a descendant in Z and (ii) every other node is
outside of Z.
◮ If a path satisfies these two conditions, it is said to be active;
otherwise it is said to be blocked.
◮ A DAG is an I-map of P if all its d-separations correspond to
conditional independencies satisfied in P.
◮ It is a minimal I-map, or a Bayesian network of P, if none of
its arrows can be deleted without destroying its I-mapness.
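To make the d-separation test above concrete, here is a minimal Python sketch based on the classical reduction to an undirected reachability problem: restrict the DAG to the ancestral graph of X ∪ Y ∪ Z, moralize it, delete Z, and check whether X and Y are still connected. The dictionary-of-parents representation, the function names and the toy example are assumptions made for illustration, not notation from the lecture.

  from collections import deque

  def ancestors_of(nodes, parents):
      # return the given nodes together with all of their ancestors
      seen, stack = set(nodes), list(nodes)
      while stack:
          v = stack.pop()
          for p in parents.get(v, ()):
              if p not in seen:
                  seen.add(p)
                  stack.append(p)
      return seen

  def d_separated(X, Y, Z, parents):
      # True iff Z d-separates X from Y in the DAG given by the parent sets
      X, Y, Z = set(X), set(Y), set(Z)
      keep = ancestors_of(X | Y | Z, parents)        # 1. ancestral graph
      undirected = {v: set() for v in keep}
      for v in keep:                                  # 2. moralize, drop directions
          ps = [p for p in parents.get(v, ()) if p in keep]
          for p in ps:
              undirected[v].add(p); undirected[p].add(v)
          for i in range(len(ps)):                    #    marry co-parents
              for j in range(i + 1, len(ps)):
                  undirected[ps[i]].add(ps[j]); undirected[ps[j]].add(ps[i])
      frontier, visited = deque(X), set(X)            # 3. delete Z; 4. search X -> Y
      while frontier:
          v = frontier.popleft()
          if v in Y:
              return False                            # an active path exists
          for w in undirected[v]:
              if w not in Z and w not in visited:
                  visited.add(w)
                  frontier.append(w)
      return True

  # Toy example: the collider X -> W <- Y; X and Y are d-separated by the
  # empty set, but not once W is observed.
  parents = {'W': {'X', 'Y'}, 'X': set(), 'Y': set()}
  assert d_separated({'X'}, {'Y'}, set(), parents)
  assert not d_separated({'X'}, {'Y'}, {'W'}, parents)

This reduction gives the same answer as checking directly that every path between X and Y is blocked in the sense of the definition.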
¹ i.e., the boundary Bdi is a minimal subset of Udi satisfying ({vi} ⊥M (Udi \ Bdi)|Bdi).
◮ Nota Bene:
◮ given any M, the existence (though not necessarily the uniqueness) of a
boundary strata is guaranteed once the ordering d is given.
◮ hence, there is at least one boundary DAG defined by each ordering d;
they may be all identical, all different, or anything in between
(cf. examples).
◮ if M is induced by a strictly positive P, then for each ordering d the
boundary strata is unique, as well as its induced boundary DAG.
◮ Still, different orderings may yield identical boundary DAGs.
◮ Main result:
◮ For any semi-graphoid M (in particular, for any P-induced independence
model) and any ordering d, any corresponding boundary DAG is a minimal
I-map of M.
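As an illustration of the boundary-DAG construction behind this result, here is a minimal Python sketch. It assumes access to a hypothetical independence oracle indep(A, B, C) answering whether (A ⊥M B|C) holds in M; the greedy single-element shrinking used to reach a minimal boundary is one simple choice for the sketch, not the lecture's prescribed procedure.

  def boundary_dag(order, indep):
      # order: the variables listed according to the ordering d
      # indep(A, B, C): True iff (A ⊥M B | C) holds in the model M
      parents = {}
      for i, v in enumerate(order):
          predecessors = set(order[:i])   # Udi: the predecessors of vi in d
          boundary = set(predecessors)    # start from the full predecessor set
          shrunk = True
          while shrunk:
              shrunk = False
              for u in list(boundary):
                  candidate = boundary - {u}
                  # drop u if ({vi} ⊥M Udi \ candidate | candidate) still holds
                  if indep({v}, predecessors - candidate, candidate):
                      boundary = candidate
                      shrunk = True
                      break
          parents[v] = boundary           # Bdi becomes the parent set of vi
      return parents

  # Hypothetical usage: for a chain X - Y - Z whose only non-trivial
  # independence is (X ⊥ Z | Y), the ordering (X, Y, Z) yields the
  # boundary DAG with parents {X: set(), Y: {'X'}, Z: {'Y'}}.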
Chordal-graph isomorph (decomposable) models
Quantitative aspects
Motivations
◮ Tree structured models offer simple interpretations
◮ Efficient inference algorithms
◮ Efficient learning algorithms
Two classes
◮ Undirected trees and their equivalent directed versions
◮ Polytrees: DAGs whose skeleton is a tree
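As a small illustration of the second class, here is a minimal Python sketch testing the polytree property, i.e. that the skeleton of the DAG is a tree (connected and without cycles); the edge-list representation and the function name is_polytree are assumptions for illustration.

  def is_polytree(nodes, arcs):
      # build the skeleton: forget arc directions
      adj = {v: set() for v in nodes}
      for (u, v) in arcs:
          adj[u].add(v)
          adj[v].add(u)
      # a connected graph on n nodes is a tree iff it has exactly n - 1 edges
      if len(arcs) != len(nodes) - 1:
          return False
      # connectivity check by depth-first search from an arbitrary node
      seen, stack = set(), [next(iter(nodes))]
      while stack:
          v = stack.pop()
          if v not in seen:
              seen.add(v)
              stack.extend(adj[v] - seen)
      return len(seen) == len(nodes)

  # Example: X -> Z <- Y is a polytree (but not a directed tree);
  # adding the arc X -> Y creates a cycle in the skeleton.
  assert is_polytree({'X', 'Y', 'Z'}, [('X', 'Z'), ('Y', 'Z')])
  assert not is_polytree({'X', 'Y', 'Z'}, [('X', 'Z'), ('Y', 'Z'), ('X', 'Y')])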
Parameterizing UGs
In order to create a P from a UG, we proceed as follows (see Pearl
for additional details):
1. Identify the set C = {C1 , . . .} of all maximal cliques of G .
2. For each i ≤ #C, assign a compatibility function gi (·) which
maps configurations of the subset of variables of the clique to
a non-negative real number.
3. Write
P(x1, . . . , xn) = ∏i gi(ci(x1, . . . , xn)) / ∑x1,...,xn ∏i gi(ci(x1, . . . , xn)),
where ci(x1, . . . , xn) extracts from a configuration of all the
variables of the UG the corresponding configuration of the
subset of variables of the clique Ci.
But: How to derive the functions gi from evidence, and how to
compute the normalization constant are practical problems...
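To make the construction above concrete, here is a minimal Python sketch of the product-form parameterization for a tiny binary model, with the normalization constant computed by brute-force enumeration; the clique structure and the numerical values of the compatibility functions are made up for illustration.

  from itertools import product

  variables = ['x1', 'x2', 'x3']              # a chain x1 - x2 - x3
  cliques = [('x1', 'x2'), ('x2', 'x3')]      # its maximal cliques

  # hypothetical non-negative compatibility functions g1, g2
  g_tables = [{(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0},
              {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}]

  def unnormalized(x):
      # product over cliques of gi applied to the clique's sub-configuration ci(x)
      value = 1.0
      for table, clique in zip(g_tables, cliques):
          value *= table[tuple(x[v] for v in clique)]
      return value

  # normalization constant: sum over all joint configurations
  Z = sum(unnormalized(dict(zip(variables, vals)))
          for vals in product((0, 1), repeat=len(variables)))

  def P(x):
      return unnormalized(x) / Z

  print(P({'x1': 1, 'x2': 1, 'x3': 1}))       # 4 * 3 / Z = 12 / 32 = 0.375

For larger models, this brute-force sum over all configurations is exactly the practical difficulty alluded to above.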
◮ Example: the chain X − Y − Z, with cliques X − Y and Y − Z
◮ Compatibility functions: g1(x, y) and g2(y, z)
◮ Suppose we know P(X, Y, Z): how to derive the gi from P ?
◮ Idea 1:
◮ let us take g1(x, y) = P(x, y) and g2(y, z) = P(y, z)
◮ we get P(x, y, z) = P(x, y)P(y, z)/K, where
K = ∑x∈X ∑y∈Y ∑z∈Z P(x, y)P(y, z)
◮ Impossible in general (Why ?)
◮ Idea 2:
◮ Let us, instead, take g1(x, y) = P(x, y)/√P(y) and
g2(y, z) = P(y, z)/√P(y): we get K = 1 ! (Why ?)
◮ Possible in general (Why ?)
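One way to check these claims (a sketch, assuming P(y) > 0 for all y): in Idea 1,
K = ∑x,y,z P(x, y)P(y, z) = ∑y,z P(y)P(y, z) = ∑y P(y)²,
and P(x, y)P(y, z)/K differs in general from P(x, y, z). In Idea 2,
K = ∑x,y,z P(x, y)P(y, z)/P(y) = ∑y,z P(y, z) = 1,
and the product g1(x, y)g2(y, z) = P(x, y)P(y, z)/P(y) = P(x, y)P(z|y) coincides with
P(x, y, z) exactly when (X ⊥ Z|Y), i.e. when the chain X − Y − Z is an I-map of P.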