On the qualitative vs quantitative notion of Independence

Graphical models of independence relations

Discussion and further topics
Tree structured graphical models

Introduction to information theory and coding -

Lecture 1 on Graphical models

Louis Wehenkel

Department of Electrical Engineering and Computer Science

University of Liège

Montefiore - Liège - October, 2011

Find slides: http://montefiore.ulg.ac.be/∼lwh/Info/

Louis Wehenkel IT... (1/54)


On the qualitative vs quantitative notion of Independence

Motivation for this lecture
Characterization of Independence relations

Graphical models of independence relations

Undirected graphical models: Markov networks
Directed graphical models: Bayesian networks

Discussion and further topics

Relations between UGs and DAGs
Quantitative aspects

Tree structured graphical models


Motivations and structure of course

◮ Objective:
Introduce and motivate graphical representation of qualitative
and quantitative probabilistic knowledge
◮ Qualitative notion of dependence
◮ Characterization of desired properties of independence relations
◮ Probability calculus as a model of Independence relations
◮ Two graphical representations of Independence relations
◮ Undirected graphs: Markov networks
◮ Directed graphs: Bayesian networks
◮ Relations between these two types of representations
◮ Quantitative aspects/questions
◮ In depth analysis of tree-structured graphical models
◮ Undirected trees and the Chow-Liu algorithm
◮ Directed trees and polytrees

Why do we need a qualitative notion of dependence?

◮ Making statements about independence (or relevance) is a
profound feature of common-sense reasoning, while probability
calculus gives a formalization and a safe procedure for testing
any (conditional) Independence statements.
◮ However, this procedure relies on the computation of the
probabilities of all combinations of statements, and is
essentially intractable in large domains.
◮ In short, the probability calculus procedure is in itself not an
operational model of reasoning about Independence relations,
especially when we don’t (yet) have the numbers.
◮ We would like to have at our disposal a kind of ’logic’ of Independence,
in which we can derive easily new Independence statements
from previously established or postulated ones, without
resorting to number crunching.

Desired properties of Independence relations

Consider a domain characterised by a finite set U of discrete
variables, and let A, B, C denote three disjoint subsets of U.

Let us denote by A ⊥ B|C the statement that “A is independent of
B, given that we know C”, i.e. when we already know the values of
the variables in C, we consider that the knowledge of the values of
the variables in B is irrelevant to our beliefs about the values in A.

We want to derive rules which are characteristic of independence
relations, and which allow us to infer in a sound way new
independence relations from established ones.

We will first propose a set of four rules and then verify that they
are valid inference rules for probabilistic independence relations.


Semi-graphoids
What are the desired properties of an independence relation ?
◮ Symmetry:
(X ⊥ Y|Z) ⇔ (Y ⊥ X|Z). (1)
◮ Decomposition:

(X ⊥ (Y ∪ W)|Z) ⇒ (X ⊥ Y|Z)&(X ⊥ W|Z). (2)

◮ Weak union: NB: “strong” union will be defined later.

(X ⊥ (Y ∪ W)|Z) ⇒ (X ⊥ Y|(Z ∪ W)). (3)

◮ Contraction:

(X ⊥ Y|Z)&(X ⊥ W|(Z ∪ Y)) ⇒ (X ⊥ (Y ∪ W)|Z). (4)


In other words, what are the desired properties of such a relation ?

◮ Symmetry:
If Y tells us nothing about X (in some context Z), then X
tells us nothing about Y.
◮ Decomposition:
If two combined items of information are judged irrelevant to
X, then each separate item is irrelevant as well.
◮ Weak union:
Learning some irrelevant information W cannot help the other
irrelevant information Y become relevant.
◮ Contraction:
If we judge W irrelevant to X after learning some irrelevant
information Y, then W must also have been irrelevant before
we learned Y.


Properties of the probabilistic independence relation

NB: if P is a probability distribution defined over the variables in U, we
write: (A ⊥P B|C) ⇔ (∀a, b, c : P(b, c) > 0 ⇒ P(a|b, c) = P(a|c)).

Theorem (Probabilistic independence)

The probabilistic independence relationship ( · ⊥P · | · ) induced by
any probabilistic model P satisfies the four properties (1)-(4)
(symmetry, decomposition, weak union and contraction).

Theorem (Intersection property)

The probabilistic independence relationship induced by any strictly
positive probabilistic model P also satisfies

(X ⊥P Y|(Z ∪ W))&(X ⊥P W|(Z ∪ Y)) ⇒ (X ⊥P (Y ∪ W)|Z). (5)
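
The probabilistic definition of (A ⊥P B|C) given above can be checked numerically on a small joint table. The following is only a sketch under stated assumptions: the joint distribution is stored as a NumPy array with one axis per variable, the helper names `marginal` and `is_cond_indep` and the tolerance are choices made here (they do not come from the lecture), and the test uses the equivalent product form P(a, b, c)P(c) = P(a, c)P(b, c).

```python
import numpy as np

def marginal(p, keep):
    """Sum out every axis not listed in `keep`, keeping dimensions for broadcasting."""
    drop = tuple(ax for ax in range(p.ndim) if ax not in keep)
    return p.sum(axis=drop, keepdims=True)

def is_cond_indep(p, A, B, C, tol=1e-12):
    """Test (A ⊥ B | C), where A, B, C are disjoint lists of axis indices of p."""
    A, B, C = list(A), list(B), list(C)
    p_abc = marginal(p, A + B + C)
    p_ac = marginal(p, A + C)
    p_bc = marginal(p, B + C)
    p_c = marginal(p, C)
    # Product form of the definition: P(a,b,c) P(c) = P(a,c) P(b,c) for all a, b, c.
    return bool(np.allclose(p_abc * p_c, p_ac * p_bc, atol=tol))

# Illustrative joint distribution (made-up numbers): Markov chain X -> Y -> Z.
px = np.array([0.3, 0.7])
py_x = np.array([[0.9, 0.1], [0.2, 0.8]])   # py_x[x, y] = P(y | x)
pz_y = np.array([[0.6, 0.4], [0.1, 0.9]])   # pz_y[y, z] = P(z | y)
p = px[:, None, None] * py_x[:, :, None] * pz_y[None, :, :]

print(is_cond_indep(p, [0], [2], [1]))  # X ⊥ Z | Y  -> True
print(is_cond_indep(p, [2], [0], [1]))  # symmetry    -> True
print(is_cond_indep(p, [0], [2], []))   # X ⊥ Z       -> False
```

Such a check needs the full joint table, which is exactly the number crunching that the axiomatic treatment tries to avoid in large domains.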


Induced inference rules

◮ Chaining rule:

(X ⊥ Z|Y)&((X ∪ Y) ⊥ W|Z) ⇒ X ⊥ W|Y.

◮ Mixing rule:

(X ⊥ (Y ∪ W)|Z)&(Y ⊥ W|Z) ⇒ (X ∪ W) ⊥ Y|Z.

Exercise: Show that these two rules follow logically from (1) to
(4), and hence are valid inference rules for any probabilistic
independence relation.


Summary

◮ We have abstracted from the quantitative notion of
conditional independence defined by probability theory.
◮ This abstraction is necessary for efficient manipulation of the
notion of independence/irrelevance.
◮ We have shown, to some extent, that one can axiomatize the
notion of independence in a way which remains logically
coherent with the same notion defined by probability calculus.
◮ We have illustrated that such an axiomatization is useful to
derive new independencies from postulated ones, and even
new inference rules from postulated ones.
◮ However, we are still lacking an intuitive and efficient way to
reason coherently in this framework ourselves.


Why graphical (independence) models?

◮ A picture is worth a thousand words...


Undirected graphs as independence models

Notion of undirected graph:

◮ A (general) graph is denoted by G = (V, E) where V is a
finite set of vertices, and E ⊂ V × V is the set of edges.
◮ A path (of length n > 0) in G is a sequence of different
vertices v_1, v_2, ..., v_{n+1} such that (v_i, v_{i+1}) ∈ E, i = 1, ..., n.
◮ An edge (v, v′) ∈ E such that v = v′ is called a loop.
◮ An edge (v, v′) ∈ E such that v ≠ v′ and (v′, v) ∈ E is called
a line.
◮ An edge which is neither a line nor a loop is called an arrow.
◮ We say that G is undirected if G has no loops and no arrows
(i.e. G has only lines).


Vertex separation in undirected graphs

From local to global vertex separation:

◮ Consider an undirected G = (U, E ).
◮ The absence of a line between two variables represents the
absence of a direct interaction between them.
◮ All other relations are induced by the notion of separation:
We say that in a graph G the sets A and B are separated by
C if all paths from A to B traverse C.
We denote this by (A; B|C)G .
◮ In particular, we say that the sets A and B are separated if
there is no path from A to B.
◮ In particular, if there is no line connecting A to B, then
(A; B|U \ (A ∪ B))G
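
The separation test (A; B|C)G described above amounts to a graph search that is not allowed to enter C. The sketch below assumes the undirected graph is given as a dictionary mapping each vertex to the set of its neighbours, and that A, B, C are disjoint; the function name `separated` is a choice made here for illustration.

```python
from collections import deque

def separated(adj, A, B, C):
    """Return True if every path from A to B in the undirected graph traverses C."""
    A, B, C = set(A), set(B), set(C)
    seen = set(A)
    queue = deque(A)
    while queue:
        v = queue.popleft()
        if v in B:
            return False          # reached B without going through C
        for w in adj.get(v, ()):
            if w not in seen and w not in C:
                seen.add(w)
                queue.append(w)
    return True

# Chain X - Y - Z (illustrative example).
chain = {"X": {"Y"}, "Y": {"X", "Z"}, "Z": {"Y"}}
print(separated(chain, {"X"}, {"Z"}, {"Y"}))   # True
print(separated(chain, {"X"}, {"Z"}, set()))   # False
```

Each vertex and each line is visited at most once, so the cost is linear in the size of the graph.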


Undirected graphs as independence models

◮ The good news:

◮ The vertex separation relation satisfies properties (1) to (5)
(please check this yourself)
◮ Vertex separation is easy to check (polynomial time).
◮ Questions:
◮ Is vertex separation compatible with probabilistic
independence?
◮ How general is vertex separation wrt probabilistic
independence?
◮ What kind of independence relations can be exactly
represented by vertex separation?


Notions of dependency, independency and perfect maps

Consider a distribution P and an undirected graph G over U.
Definition (D-map (independent subsets are indeed separated))
G is a D-map of P if for any three disjoint A, B, C ⊂ U we have

(A ⊥P B|C) ⇒ (A; B|C)G . (6)

Definition (I-map (separated subsets are indeed independent))

G is an I-map of P if for any three disjoint A, B, C ⊂ U we have

(A ⊥P B|C) ⇐ (A; B|C)G . (7)

Definition (Perfect map (equivalence between “⊥P ” and “;”))

G is a perfect map of P if it is a D-map and an I-map of P.

Representation power of UGs

◮ Preliminary comments:
◮ Any P has at least a D-map (e.g. the empty graph)
◮ Any P has at least an I-map (e.g. the complete graph)
◮ Some P have no perfect-map (e.g. two coins and a bell)
◮ There is thus a need to delineate more precisely
◮ those dependency models that have perfect maps, and
◮ those graphical models which are perfect maps of a
dependency model
◮ provide constructive algorithms to switch between P and G .
◮ We say that a dependency model M (i.e. a rule that assigns
truth values to a three-place relation (A ⊥M B|C) over
disjoint subsets of some U) is graph-isomorph if there exists
an undirected graph (U, E ) which is a perfect map of M.
◮ Goal: characterize graph-isomorph probabilistic models.

On the structure of the set of UGs over some U

◮ Lattice structure:
◮ For a fixed U, we can identify an undirected graph G = (U, E )
with its set of edges E .
◮ The set of edges can itself be identified with a subset of the
set of pairs {v, v′} ⊂ U.
◮ For any G = (U, E ) and G ′ = (U, E ′ ), let us write G ⊂ G ′ if
E ⊂ E ′.
◮ Monotonicity wrt addition or removal of edges:
◮ if G is a D-map of P, any G ′ ⊂ G is also a D-map of P,
◮ if G is an I-map of P, any G ′ ⊃ G is also an I-map of P.
◮ Extreme maps:
◮ G is a minimal I-map, if there is no G ′ ⊂ G (other than G
itself) which is also an I-map.
◮ G is a maximal D-map, if there is no G ′ ⊃ G (other than G
itself) which is also a D-map.

Characterization of graph-isomorph dependency model

Theorem (Graph isomorph dependency model M)

A necessary and sufficient condition for a dependency model M
over some U to be graph-isomorph is that it satisfies Symmetry,
Decomposition, Intersection, Strong union and Transitivity,

where Strong union means that:

(X ⊥M Y|Z) ⇒ (X ⊥M Y|(Z ∪ W)), (8)

and Transitivity means that:

(X ⊥M Y|Z) ⇒ ∀γ ∈ U : (X ⊥M {γ}|Z) or ({γ} ⊥M Y|Z). (9)

(NB: {γ} denotes a singleton subset of U)


Question 1: Given P construct minimal I-map G of P

Theorem (Existence, unicity, construction of minimal I-map)
Every dependency model M which satisfies symmetry,
decomposition and intersection, has a unique minimal I-map
G0 = (U, E0 ) produced by connecting only those pairs (v , v ′ ) for
which ({v } ⊥M {v ′ }|U \ {v , v ′ }) is FALSE.
◮ Motivation: a minimal I-map is a graph displaying a maximal
number of independencies without false-positives.
◮ Notice that the property holds for independencies displayed by
strictly positive P.
◮ NB:
◮ existence of an I-map guarantees existence of a minimal one,
but not unicity.
◮ NB: unicity of minimal I-map implies that any I-map is a
superset of the unique minimal one.
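
The construction in the theorem above can be sketched as a loop over all pairs of variables. The code below is illustrative only: it assumes access to an oracle `indep(v, w, rest)` answering whether ({v} ⊥M {w}|rest) holds, and the names `minimal_imap`, `is_imap` and `chain_oracle` are hypothetical helpers introduced here. The result is only guaranteed to be the unique minimal I-map when M satisfies symmetry, decomposition and intersection.

```python
from itertools import combinations

def minimal_imap(variables, indep):
    """Connect v and w unless ({v} ⊥ {w} | all other variables) holds."""
    edges = set()
    for v, w in combinations(variables, 2):
        rest = frozenset(variables) - {v, w}
        if not indep(v, w, rest):
            edges.add(frozenset((v, w)))
    return edges

def is_imap(edges_g, edges_g0):
    """G is an I-map iff it contains the unique minimal I-map G0."""
    return edges_g0 <= edges_g

# Hand-written oracle for the chain X - Y - Z: the only elementary
# independence given all other variables is X ⊥ Z | Y.
def chain_oracle(v, w, rest):
    return {v, w} == {"X", "Z"} and rest == {"Y"}

g0 = minimal_imap(["X", "Y", "Z"], chain_oracle)
print(sorted(sorted(e) for e in g0))   # [['X', 'Y'], ['Y', 'Z']]
```

The loop makes one oracle call per pair of variables, i.e. a number of calls quadratic in the number of variables.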

Question 2: Check whether G is an I-map of P

◮ If P is strictly positive, we can check whether G is an I-map
by first constructing a minimal I-map G0 of P and then
checking whether G0 ⊂ G.
◮ Assuming that we can establish ({v } ⊥P {v ′ }|U \ {v , v ′ }) for
any (v , v ′ ) in constant time:
◮ building a minimal I-map G0 may be done in polynomial time
(quadratic in the number of variables).
◮ checking G0 ⊂ G may also be done in polynomial time
(quadratic).
◮ Hence these problems are solved “efficiently” if we have an
oracle providing in constant time answers to “elementary”
queries of conditional independence of two variables given all
others.


Markov networks, blankets and boundaries

Definitions:
◮ Given P (resp. M), we say that G is a Markov network of P
(resp. M) if it is a minimal I-map of P (resp. M).
◮ A Markov blanket BLM (v ) of v ∈ U is any subset S ⊂ U for
which ({v } ⊥M U \ ({v } ∪ S)|S).
◮ A Markov boundary BM (v ) of v ∈ U is a minimal Markov
blanket.

Theorem (Unicity and construction of Markov boundaries)

Every element v of a dependency model M which satisfies
symmetry, decomposition, intersection, and weak union, has a
unique Markov boundary, and this corresponds with the set of
vertices adjacent to v in the minimal I-map G0 of M.


Summary of UG models
◮ We have seen that we can make use of undirected models to
represent useful independencies, in a way compatible with the
definition of probabilistic conditional independence.
◮ Not all independence structures may be represented by UGs.
◮ But we can commit to the idea of building the most refined
model of them in the form of a minimal I-map.
◮ Further topics of relevance:
◮ For any G, is there a P for which G is a perfect map?
(answer is yes, with a few hypotheses).
◮ Once we have a minimal I-map (or simply an I-map) how do
we “decorate” this structure with numerical information to
represent a particular probability distribution P ? (later)
◮ How to carry out “numerical computations” with such
structures ? (later)

Directed graphical models: motivations

◮ Undirected graphical models lack representation power:
◮ unable to represent induced and non-transitive dependencies.
◮ no representation of causality.
◮ To overcome these deficiencies, we will use the language of
directed graphs.
◮ Arrows (instead of lines) may allow us to distinguish genuine
dependencies from spurious ones (see two-coins + bell
example).
◮ Arrows also allow us to impose (asymmetric) causal relations on
top of (symmetric) dependencies.
◮ Menu:
◮ Define Bayesian networks and their semantics as independence
maps.
◮ Answer questions 1, 2 and 3 along the lines of the treatment
given for Markov models.

Directed acyclic graphs (DAGs)

◮ A directed graph D = (V, E) is a graph with only arrows:
◮ i.e. no loops and no lines,
◮ or, in other words, (v, v′) ∈ E ⇒ v ≠ v′ & (v′, v) ∉ E.
◮ A cycle of length n > 0, in a graph G = (V, E), is a sequence
v_1, ..., v_{n+1} such that (v_i, v_{i+1}) ∈ E & v_1 = v_{n+1}.
◮ The cycle is said to be simple (or proper) if all nodes except
v_1 and v_{n+1} are different.
◮ A DAG is a directed graph without any cycle.
◮ Note that this is equivalent to saying:
◮ that a DAG is a directed graph without any simple cycle.
◮ or that a DAG is a graph without any cycle.


D-separation in DAGs
Definition (D-separation)
If X, Y, Z are three disjoint sets of vertices in a DAG D, then Z is
said to d-separate X from Y, denoted by (X; Y|Z)D , if there is no
path between a vertex in X and a vertex in Y along which the
following two conditions hold: (i) every node with converging
arrows is in Z or has a descendant in Z and (ii) every other node is
outside of Z.
◮ If a path satisfies the above condition, it is said to be active;
otherwise it is said to be blocked.
◮ A DAG is an I-map of P if all its d-separations correspond to
conditional independencies satisfied in P.
◮ It is a minimal I-map, or a Bayesian network of P, if none of
its arrows can be deleted without destroying its I-mapness.
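
A d-separation query can be answered mechanically. The sketch below does not enumerate paths as in the definition above; it uses the moralized ancestral graph criterion, a standard equivalent test: restrict the DAG to the ancestors of X ∪ Y ∪ Z, link every node to its parents and link all parents of a common child, then check ordinary vertex separation by Z. The representation (a dictionary from each node to the set of its parents) and the name `d_separated` are assumptions made here for illustration.

```python
from collections import deque

def d_separated(parents, X, Y, Z):
    """Return True if Z d-separates X from Y in the DAG given by `parents`."""
    X, Y, Z = set(X), set(Y), set(Z)
    # 1. Ancestral set of X ∪ Y ∪ Z (the nodes themselves and all their ancestors).
    anc = set()
    stack = list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents.get(v, ()))
    # 2. Moral graph of the induced sub-DAG: parent-child lines plus lines between co-parents.
    adj = {v: set() for v in anc}
    for v in anc:
        ps = list(parents.get(v, ()))
        for p in ps:
            adj[v].add(p)
            adj[p].add(v)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # 3. Ordinary vertex separation by Z in the moral graph.
    seen = set(X)
    queue = deque(X)
    while queue:
        v = queue.popleft()
        if v in Y:
            return False
        for w in adj[v] - Z:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return True

# Two coins and a bell: Coin1 -> Bell <- Coin2.
bell = {"Bell": {"Coin1", "Coin2"}}
print(d_separated(bell, {"Coin1"}, {"Coin2"}, set()))     # True
print(d_separated(bell, {"Coin1"}, {"Coin2"}, {"Bell"}))  # False
```

In the second query, conditioning on the node with converging arrows activates the path, which is the induced dependency that undirected graphs cannot express.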

Construction of Bayesian networks for a distribution P

Definition (Boundary DAG of M relative to a vertex ordering)

Let M be a dependency model over U and d = (v_1, ..., v_n) any
ordering of the elements of U, and define the sequence of nested
sets U_i^d by U_1^d = ∅ and U_i^d = {v_1, ..., v_{i−1}}, for i = 2, ..., n.

The boundary strata of M relative to d is an ordered set of subsets
of U, (B_1^d, B_2^d, ..., B_i^d, ...), such that B_i^d is a Markov boundary of
{v_i} wrt. U_i^d, i.e. a minimal subset of U_i^d satisfying
({v_i} ⊥M (U_i^d \ B_i^d)|B_i^d).

The DAG created by designating each B_i^d as parent set of v_i is
called the boundary DAG of M relative to the ordering d.

DAGs as minimal I-maps of Semi-graphoids

◮ Nota Bene:
◮ given any M and any ordering d, a boundary strata always exists
(though it is not necessarily unique).
◮ hence, there is at least one boundary DAG defined by each
ordering d; they may all be different or all identical or diverse
(cf examples).
◮ if M is induced by a strictly positive P, then for each ordering
d the boundary strata is unique as well as its induced
boundary DAG.
◮ Still, different orderings may yield identical boundary DAGs.
◮ Main result:
◮ For any semi-graphoid M (in particular, for any P induced
independence model) and any ordering d, any corresponding
boundary DAG is a minimal I-map of M.


Corollaries of the main result

◮ Given P over U and any ordering d = (X_1, ..., X_n) of the
variables, the DAG created by designating as parents of X_i
any minimal subset Π_{X_i} of predecessors of X_i in d satisfying
P(X_i|X_1, ..., X_{i−1}) = P(X_i|Π_{X_i})
is a Bayesian network of P. If P is strictly positive then this is
uniquely defined.
◮ A necessary and sufficient condition for a DAG D to be a
Bayesian network of P is that each variable X be conditionally
independent of all its non-descendants, given its parents Π_X,
and that no proper subset of Π_X satisfies this condition.
◮ If any Bayesian network D is constructed by the boundary
strata method in some ordering d, then any ordering d ′
consistent with the direction of arrows in D will lead to the
same D.

Summary of DAG models

◮ We have seen that we can make use of directed acyclic
graphical models to represent useful independencies, in a way
compatible with the definition of probabilistic conditional
independence.
◮ Not all independence structures may be represented exactly by
DAGs (examples will be given).
◮ But we can commit to the idea of building the most refined
model of them in the form of a minimal I-map.
◮ As with UGs, we can infer independencies by inspection of the
graph, in polynomial time.


The global picture

[Figure: diagram relating the following classes of models: Dependency models; Probabilistic dependency models; UG-isomorph (i.e. Markov) models; Chordal-graph isomorph (decomposable) models; DAG-isomorph (i.e. Bayesian) models.]


Quantitative aspects

◮ What do we need to add to a minimal I-map graphical
structure to describe fully a given P?
◮ UGs: parameterization via potential (or compatibility)
functions over cliques.
◮ DAGs: parameterization via conditional distributions over
families.
◮ How can we compute with parameterized DAG or UG
P-models?
◮ Exact computations: reduce du CG and use (generalized)
forward-backward algorithm.
◮ Approximations: turn problem into a tractable optimization
problem (subject of current research).
◮ How can we infer UG or DAG models from data ?


Tree structured graphical models

Motivations
◮ Tree structured models offer simple interpretations
◮ Efficient inference algorithms
◮ Efficient learning algorithms
Two classes
◮ Undirected trees and their equivalent directed versions
◮ Polytrees : DAGs whose skeleton is a tree


A few additional definitions from graph theory

Skeleton of a DAG: UG obtained by replacing all arrows by lines.

Directing an UG: DAG obtained by replacing every line by an
arrow, under the constraint of producing a DAG.
Induced subgraph: (of G = (V, E)) by V′ ⊂ V is the graph
G(V′) = (V′, E ∩ (V′ × V′)). (I.e. induced subgraphs
of UGs (resp. DAGs) are UGs (resp. DAGs).)
Clique of an UG: a clique of G = (V, E) is an induced subgraph
G(V′) such that ∀v, v′ ∈ V′ : v ≠ v′ ⇒ (v, v′) ∈ E.
Maximal clique: a clique which cannot be augmented while
maintaining the property of being a clique, i.e. a
maximal subgraph whose vertices are all adjacent to
each other in G.


Parameterizing UGs
In order to create a P from an UG we proceed as follows (see Pearl
for additional details):
1. Identify the set C = {C1 , . . .} of all maximal cliques of G .
2. For each i ≤ #C, assign a compatibility function gi (·) which
maps configurations of the subset of variables of the clique to
a non-negative real number.
3. Write
\[
P(x_1, \ldots, x_n) = \frac{\prod_i g_i(c_i(x_1, \ldots, x_n))}{\sum_{x_1, \ldots, x_n} \prod_i g_i(c_i(x_1, \ldots, x_n))},
\]
where ci (x1 , . . . , xn ) extracts from a configuration of all the
variables of the UG the corresponding configuration of the
subset of variables of the clique Ci .
But: How to derive the functions gi from evidence, and how to
compute the normalization constant are practical problems...
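
The three steps above can be sketched by brute force for a small graph. This is only an illustration under stated assumptions: variables are identified by their index, each compatibility function is stored as a NumPy array indexed by the values of the variables of its clique, and the function name `gibbs_distribution` and the numerical values are made up here.

```python
import itertools
import numpy as np

def gibbs_distribution(cardinalities, cliques, potentials):
    """Joint table P(x_1, ..., x_n) proportional to the product of clique potentials.

    cardinalities: number of values of each variable
    cliques:       list of tuples of variable indices (the maximal cliques)
    potentials:    potentials[i] is a non-negative array indexed by the values of cliques[i]
    """
    p = np.zeros(tuple(cardinalities))
    for x in itertools.product(*(range(k) for k in cardinalities)):
        value = 1.0
        for clique, g in zip(cliques, potentials):
            value *= g[tuple(x[v] for v in clique)]   # g_i(c_i(x_1, ..., x_n))
        p[x] = value
    return p / p.sum()                                # divide by the normalization constant

# UG X - Y - Z with maximal cliques {X, Y} and {Y, Z} (arbitrary potentials).
g1 = np.array([[2.0, 1.0], [1.0, 3.0]])
g2 = np.array([[1.0, 4.0], [2.0, 1.0]])
p = gibbs_distribution((2, 2, 2), [(0, 1), (1, 2)], [g1, g2])
print(p.sum())
```

The loop visits every joint configuration, so its cost grows exponentially with the number of variables; this is the practical difficulty with the normalization constant mentioned above.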

Example: Markov chain X − Y − Z

◮ Cliques: X − Y and Y − Z
◮ Compatibility functions: g1 (x, y ) and g2 (y , z)
◮ Suppose we know P(X , Y , Z ): how to derive the gi from P?
◮ Idea 1:
◮ let us take g1(x, y) = P(x, y) and g2(y, z) = P(y, z)
◮ we get P(x, y, z) = (P(x, y)P(y, z))/K where
K = Σ_{x∈X} Σ_{y∈Y} Σ_{z∈Z} P(x, y)P(y, z)
◮ Impossible in general (Why ?)
◮ Idea 2:
◮ Let us, instead, take g1(x, y) = P(x, y)/√P(y) and
g2(y, z) = P(y, z)/√P(y): we get K = 1 ! (Why ?)
◮ Possible in general (Why ?)
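
A quick numerical experiment can make the contrast between the two ideas concrete before working out the "Why ?" questions. The sketch below uses a made-up distribution that satisfies the Markov chain X − Y − Z (the numbers are illustrative assumptions) and compares both choices of compatibility functions with the true joint distribution.

```python
import numpy as np

# Illustrative chain X -> Y -> Z, so that X ⊥ Z | Y holds by construction.
px = np.array([0.3, 0.7])
py_x = np.array([[0.9, 0.1], [0.2, 0.8]])   # py_x[x, y] = P(y | x)
pz_y = np.array([[0.6, 0.4], [0.1, 0.9]])   # pz_y[y, z] = P(z | y)
p = px[:, None, None] * py_x[:, :, None] * pz_y[None, :, :]

p_xy = p.sum(axis=2)        # P(x, y)
p_yz = p.sum(axis=0)        # P(y, z)
p_y = p.sum(axis=(0, 2))    # P(y)

# Idea 1: g1 = P(x, y), g2 = P(y, z), normalized by K1.
q1 = p_xy[:, :, None] * p_yz[None, :, :]
K1 = q1.sum()
idea1 = q1 / K1

# Idea 2: g1 = P(x, y)/sqrt(P(y)), g2 = P(y, z)/sqrt(P(y)).
g1 = p_xy / np.sqrt(p_y)[None, :]
g2 = p_yz / np.sqrt(p_y)[:, None]
q2 = g1[:, :, None] * g2[None, :, :]
K2 = q2.sum()

print(np.allclose(idea1, p))   # False: Idea 1 does not reproduce P here
print(np.isclose(K2, 1.0))     # True: no normalization is needed
print(np.allclose(q2, p))      # True: Idea 2 reproduces P
```

In this example K1 equals the sum over y of P(y)², which hints at why dividing by a single constant cannot repair Idea 1 in general.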


Example: Markov chain X − Y − Z (continued)

◮ We could as well take g1(x, y) = P(x, y) = P(x)P(y|x) and
g2(y, z) = P(z|y), which corresponds to the parameterization
of the DAG X → Y → Z;
◮ or g2(y, z) = P(y, z) = P(z|y)P(y) and g1 = P(x|y) which
corresponds to X ← Y → Z;
◮ or g2(y, z) = P(z)P(y|z) and g1 = P(x|y) which corresponds
to X ← Y ← Z.
◮ But, we could not take the parameterization of the DAG
X → Y ← Z.
◮ The first three parameterizations correspond to directed
versions of the UG which do not introduce a v-structure,
while the fourth one introduces a v-structure.


Markov trees

◮ We use interchangeably the terms Markov tree, tree, or
tree-structured UG, to denote UGs without any cycles.
◮ Typically, we assume in addition that these trees are singly
connected, i.e. such that there is a path from any vertex to
any other vertex, and use the term ’forest’ to denote the case
where not all nodes are connected.
◮ In a singly connected tree over n vertices, we always have
exactly n − 1 edges.
◮ In a forest over n vertices, we have n − c edges, where c is the
number of connected components.


General procedure for parameterizing Markov trees:

1. For each edge (i.e. maximal clique C_i) e = X − Y in a
Markov tree, use a function g_e(x_1, ..., x_n) = P(x(e), y(e))
derived from P(Z).
2. For each vertex X_i in a Markov tree, use a function
g_{X_i}(x_i) = (P(x_i))^{d(X_i)−1}, where d(X_i) denotes the number of
neighbors of the vertex X_i in G.
3. Write:
\[
P_G(x_1, \ldots, x_n) = \frac{\prod_{e \in E(G)} g_e(x(e), y(e))}{\prod_{X_i \in V(G)} g_{X_i}(x_i)}.
\]

4. Now, let us consider what we get if we first direct the Markov
tree, and then use the DAG parameterization procedure to
infer a probability distribution from it... (from examples)
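
As a small worked instance of the formula in step 3, consider two trees: the chain X − Y − Z, where d(X) = d(Z) = 1 and d(Y) = 2, and a star with centre Y and k leaves X_1, ..., X_k (the star is an example added here for illustration). Vertices with a single neighbor contribute a factor P(·)^0 = 1 to the denominator.

```latex
% Chain X - Y - Z: edges {X,Y} and {Y,Z}; only Y has degree 2.
P_G(x, y, z) = \frac{P(x, y)\, P(y, z)}{P(x)^{0}\, P(y)^{1}\, P(z)^{0}}
             = \frac{P(x, y)\, P(y, z)}{P(y)}

% Star with centre Y and leaves X_1, ..., X_k: d(Y) = k and d(X_i) = 1.
P_G(x_1, \ldots, x_k, y) = \frac{\prod_{i=1}^{k} P(x_i, y)}{P(y)^{k-1}}
```

For the chain, this coincides with the product g1(x, y) g2(y, z) of Idea 2 in the Markov chain example.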

I-map preserving direction of tree-structured UGs

◮ Theorem A: any directed version of a tree-structured UG
which has no v-structure produces a DAG which represents
exactly the same set of independencies as the original
undirected tree. (also true for chordal graphs)
◮ Corollary A1: tree-structured UGs may be parameterized by
first directing them without introducing any v-structure, and
then parameterizing the resulting DAG.
◮ Algorithm: to direct a tree-structured UG in such a way that
no v-structures are introduced
1. Choose first a root of the tree: any node of the UG
2. Direct its arcs ’away’ from the root
3. Proceed recursively by directing the not yet directed arcs of
the successors ’away’ from them.
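
The rooting algorithm above can be sketched as a breadth-first traversal. The representation (a dictionary mapping each vertex to the set of its neighbours) and the function name `direct_tree` are choices made here for illustration; the input is assumed to be a connected tree.

```python
from collections import deque

def direct_tree(adj, root):
    """Return a dict child -> parent obtained by directing every line away from `root`."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in parent:      # line v - w not yet directed
                parent[w] = v        # direct it as v -> w
                queue.append(w)
    return parent

chain = {"X": {"Y"}, "Y": {"X", "Z"}, "Z": {"Y"}}
print(direct_tree(chain, "Y"))
print(direct_tree(chain, "X"))
```

With root Y the result corresponds to X ← Y → Z, and with root X to X → Y → Z. Every vertex other than the root receives exactly one parent, so no two arrows ever converge on the same vertex and no v-structure can appear.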


Summary

◮ Like DAGs, tree structured UGs may be parameterized ’easily’
to represent a P which satisfies the independencies encoded
by the UG, by first directing the tree structured UG without
introducing v-structures (which maintains the encoded
independencies), and by then using the DAG parameterization
procedure to attach conditional distributions to nodes.
◮ The generalization of these ideas carries over to “chordal
graphs”, i.e. UGs such that every cycle of length 4 or more
has a chord (a line between non-consecutive vertices of the
cycle), with some modifications (see Pearl, chapter 3...)
◮ See examples to understand why chordality is essential to
yield such a decomposable representation of a probability
distribution...

Learning structure from data (chapter 8 of Pearl)

◮ Main question: how to infer the graph structure from the
information at hand
◮ We will limit ourselves to tree structures
◮ We will decompose the question in this context into three
successive questions:
◮ Given a P(x) known to factorize according to a tree structured
graph, how to efficiently recover its tree-structured perfect
map.
◮ Given a general P(x), can we recover the best approximation
of P(x) in the form of a parameterization of a tree structured
graph.
◮ Given only a sample from a generative distribution, how to
answer the two preceding questions.
◮ These questions will be addressed successively for tree
structured UGs and general tree structured DAGs (polytrees).

Intuition about structure inference:

◮ Consider the case of three variables X, Y, Z, and suppose that
we know that they form a Markov chain, but that we don’t
know in which order.
◮ In other words, we hesitate between the three following
structures: X − Y − Z , Y − X − Z , X − Z − Y .
◮ Suppose that we are able to compute I (X ; Y ), I (Y ; Z ) and
I (X ; Z ):
◮ Can we infer from these three quantities a correct structure ?
◮ The answer is YES.
◮ Sort the quantities I (X ; Y ), I (Y ; Z ) and I (X ; Z ) by decreasing
order of numerical value, take the two first and create an UG
with lines among the corresponding two pairs of variables.
◮ Explanation: data processing inequality !
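The three-variable argument can be checked numerically. The sketch below is only an illustration: the chain X → Y → Z, its binary variables and the noise levels 0.9 and 0.8 are assumptions chosen for the example, and logarithms are taken in base 2.

```python
import math
from itertools import product

def mutual_information(joint, a, b):
    """I(A;B) in bits, from a joint table {(x, y, z): prob} and two axis indices."""
    pa, pb, pab = {}, {}, {}
    for config, p in joint.items():
        pa[config[a]] = pa.get(config[a], 0.0) + p
        pb[config[b]] = pb.get(config[b], 0.0) + p
        key = (config[a], config[b])
        pab[key] = pab.get(key, 0.0) + p
    return sum(p * math.log2(p / (pa[u] * pb[v]))
               for (u, v), p in pab.items() if p > 0)

def copy_noisy(keep):
    """Conditional P(child | parent): the child copies the parent with probability `keep`."""
    return lambda child, parent: keep if child == parent else 1 - keep

# Hypothetical chain X -> Y -> Z over binary variables.
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = copy_noisy(0.9)
p_z_given_y = copy_noisy(0.8)

joint = {(x, y, z): p_x[x] * p_y_given_x(y, x) * p_z_given_y(z, y)
         for x, y, z in product((0, 1), repeat=3)}

names = "XYZ"
pairs = [(0, 1), (1, 2), (0, 2)]
scores = {names[a] + names[b]: mutual_information(joint, a, b) for a, b in pairs}

# Keep the two pairs with the largest mutual information.
kept = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores)
print("lines kept:", sorted(kept))
```

With these numbers the pair (X, Z) has the smallest mutual information, so the two lines kept are X − Y and Y − Z, which is the chain that generated the table.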


Generalization to distributions which factorize according to
a tree-structured UG

◮ We want to represent graphically the independencies of a
distribution P(X1, . . . , Xn) known to be Markov w.r.t. a
tree-structured UG (but we do not know the structure).
◮ Algorithm (Chow and Liu, 1968):
1. Compute the pairwise mutual informations I(Xi; Xj), ∀i ≠ j.
2. Assign a line between the two variables with the
largest mutual information.
3. Examine the next largest mutual information and assign the
corresponding line, unless it creates a cycle in the graph.
4. Repeat step 3 until n − 1 lines have been assigned.
◮ Select an arbitrary node as root, direct the UG away from it
(without introducing v-structures) and assign to each Xi the
distribution P(Xi | Xp(i)), where p(i) denotes the (sole) parent
of Xi; the root, which has no parent, receives its marginal.
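The steps above can be sketched as follows. This is a minimal illustration, not the original authors' implementation: step 1 is assumed to be already done, the mutual information values are made up, and the function names `chow_liu_tree` and `direct_from_root` are hypothetical. Cycle detection uses a small union-find structure.

```python
def chow_liu_tree(n, mi):
    """Steps 2-4: greedily add the heaviest line that does not close a cycle.

    n  : number of variables, indexed 0..n-1
    mi : dict {(i, j): I(Xi; Xj)} with i < j (step 1 is assumed done)
    """
    component = list(range(n))  # union-find forest used for cycle detection

    def find(i):
        while component[i] != i:
            component[i] = component[component[i]]  # path halving
            i = component[i]
        return i

    lines = []
    for (i, j), _ in sorted(mi.items(), key=lambda kv: kv[1], reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:            # different components: no cycle is created
            component[ri] = rj
            lines.append((i, j))
            if len(lines) == n - 1:
                break
    return lines

def direct_from_root(n, lines, root=0):
    """Orient the tree away from `root`; returns parent[i] (None for the root)."""
    neighbours = {i: [] for i in range(n)}
    for i, j in lines:
        neighbours[i].append(j)
        neighbours[j].append(i)
    parent, stack = {root: None}, [root]
    while stack:
        node = stack.pop()
        for other in neighbours[node]:
            if other not in parent:
                parent[other] = node
                stack.append(other)
    return parent

# Hypothetical mutual informations (in bits) for four variables.
mi = {(0, 1): 0.53, (1, 2): 0.28, (0, 2): 0.17,
      (2, 3): 0.40, (1, 3): 0.10, (0, 3): 0.05}
lines = chow_liu_tree(4, mi)
parent = direct_from_root(4, lines, root=0)
print(lines)
print(parent)
```

Because every node other than the root receives exactly one parent, the directed version contains no v-structures; each Xi would then be given P(Xi | Xp(i)) and the root its marginal.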

Comments about the Chow and Liu algorithm

◮ When we want to infer a tree-structured UG (or a directed
version of it without v-structures) for a target distribution,
and have the means to compute pairwise quantities from the
target distribution (mutual informations among variables and
conditional distributions of one variable given another), we
have an ‘efficient’ algorithm for generating a Markov network
(roughly of order n²).
◮ The Chow-Liu algorithm is an instance of the ‘maximum
weight spanning tree’ (MWST) algorithm of graph theory.
◮ NB: In the algorithm, we may in principle reach a situation
where the I(Xi; Xj) of the next line to assign is equal to zero;
in this case we can stop the procedure immediately
(leading to a ‘forest’ model, i.e. a model where some subsets
of variables are disconnected).

Approximation of probability distributions

◮ In many practical situations, we do not have precise
information about the probability distribution at hand.
◮ In particular, in such contexts, we cannot verify
independencies such as (Xi ⊥ Xj | Xk) with certainty.
◮ In other words, we can only estimate/approximate quantities
such as I(Xi; Xj) or P(Xi | Xp(i)).
◮ Then the following question arises:
How to infer precise probabilistic models from imprecise data?
◮ Approach:
◮ Define a space of target probability distributions (model).
◮ Define a measure of discrepancy between distributions.
◮ Choose the probability distribution in the target space which is
as ‘compatible as possible’ with the information at hand.


Measuring the compatibility between two distributions

◮ Kullback-Leibler divergence:
$$D(P, P') = \sum_x P(x) \log \frac{P(x)}{P'(x)}.$$
◮ tends to zero when P → P′.
◮ has the likelihood interpretation, when P is inferred from a
sample (...explained on the blackboard).
◮ Given a space 𝒫 of distributions and a target distribution P,
we thus may seek to compute
$$\hat{P}(P) = \arg\min_{P' \in \mathcal{P}} D(P, P'),$$
which we call the D-projection of P onto 𝒫.
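As an illustration of the divergence D(P, P′), here is a minimal sketch for discrete distributions stored as dictionaries. The base-2 logarithm, the function name `kl_divergence` and the two example distributions are assumptions made for the example; the slides do not fix a logarithm base.

```python
import math

def kl_divergence(p, q):
    """D(P, P') in bits; p and q map configurations to probabilities."""
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue            # convention: 0 log 0 = 0
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf     # P puts mass where P' has none
        total += px * math.log2(px / qx)
    return total

# Two hypothetical distributions over the same two configurations.
p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}

print(kl_divergence(p, p))                       # zero when both arguments coincide
print(kl_divergence(p, q), kl_divergence(q, p))  # the two directions differ
```

The second line of output shows that D is not symmetric in its arguments, which is why the order (target first, candidate second) matters in the definition of the D-projection.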


Some first results

◮ Let us consider the space 𝒫_t of probability distributions that
may be represented by an undirected tree t, and a target
probability distribution P.
◮ Then the D-projection of P onto the space 𝒫_t is obtained
simply by directing t without introducing v-structures and by
assigning to each node i in t the conditional distribution
P(Xi | Xp(i)), where p(i) denotes the parent of i in the directed
version of t.
◮ (For the proofs, see Pearl, chapter 8.)
◮ Furthermore, to minimize D over the space of all trees, we can
simply use the Chow-Liu algorithm with information quantities
derived from P.


Comments about the Chow-Liu algorithm

◮ Given any probability distribution P and the means to
compute pairwise mutual informations and pairwise
conditional distributions in P, this algorithm allows us to infer
(in quadratic time) a tree-structured approximation of P.
◮ The resulting distribution P′ is the one, among all those that
factorize along UG trees, that is closest to P according to the
discrepancy measure D(P, P′).
◮ In particular, if P is Markov w.r.t. a UG tree, then the
resulting P′ will be equal to P.


Learning from observations drawn from P

◮ Let us consider a sample of observations S = (x^1, . . . , x^N)
drawn i.i.d. from a target distribution P(x) (where each x^i is
actually an n-tuple, with one element for each variable Xj).
◮ Given any other distribution P′ defined over the same set of
variables, we define the sample log-likelihood by
$$lL(S, P') = \sum_{i=1}^{N} \log P'(x^i) = \log\left(\prod_{i=1}^{N} P'(x^i)\right)$$
◮ Given a space 𝒫 of candidate distributions, a classical
criterion used in statistics is to choose the one which
maximizes the sample likelihood.


Learning from observations drawn from P

◮ Let us consider a configuration x, and denote by N(x) the
number of observations in our sample which correspond to
that configuration and by F(x) = N(x)/N their relative
frequency among the N observations.
◮ We can rewrite the log-likelihood of the sample w.r.t. P′ as
$$lL(S, P') = N \sum_x F(x) \log P'(x).$$
◮ We then immediately see that maximizing the log-likelihood
of the sample by choosing P′ is equivalent to choosing P′ so
as to minimize the KL-divergence $\sum_x F(x) \log \frac{F(x)}{P'(x)}$. Indeed,
$$\sum_x F(x) \log \frac{F(x)}{P'(x)} = -N^{-1}\, lL(S, P') + \sum_x F(x) \log F(x),$$
and the last term does not depend on P′.
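The relation between the sample log-likelihood and the KL-divergence from the relative frequencies F can be checked on a toy case. The sample and the candidate distribution P′ below are made up for illustration, and natural logarithms are used.

```python
import math
from collections import Counter

sample = ["a", "b", "a", "a", "c", "b", "a", "a"]   # hypothetical observations
model = {"a": 0.5, "b": 0.3, "c": 0.2}               # hypothetical candidate P'

N = len(sample)
F = {x: count / N for x, count in Counter(sample).items()}  # relative frequencies

# Log-likelihood summed over observations, then regrouped by configuration.
log_lik = sum(math.log(model[x]) for x in sample)
grouped = N * sum(F[x] * math.log(model[x]) for x in F)

# KL-divergence from F to P', and the right-hand side of the identity.
kl = sum(F[x] * math.log(F[x] / model[x]) for x in F)
neg_entropy = sum(F[x] * math.log(F[x]) for x in F)

print(log_lik, grouped)
print(kl, -log_lik / N + neg_entropy)

# Choosing P' = F gives zero divergence, hence the largest log-likelihood.
log_lik_at_F = sum(math.log(F[x]) for x in sample)
print(log_lik_at_F >= log_lik)
```

Each printed pair agrees up to rounding, and the last line illustrates that, when no constraint is put on P′, the relative frequencies themselves maximize the sample likelihood.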


Learning a Markov tree approximation from a sample

◮ Goal: find a tree structure and a parameterization such that
the sample likelihood is maximal (over all possible trees and
their parameterizations).
◮ Solution: use the sample to estimate the mutual informations, by
replacing probabilities by relative frequencies derived from the
sample (see example on the blackboard); then apply Chow-Liu
to get the MWST; then choose a root; then use the sample again to
estimate the conditional probabilities needed for each vertex.
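The whole procedure can be sketched on a small sample. This is an illustrative sketch under stated assumptions: the ten observations of three binary variables are invented, the names `empirical_mi`, `learn_tree` and `conditional_table` are hypothetical, and the zero-information (forest) case is not handled.

```python
import math
from collections import Counter
from itertools import combinations

def empirical_mi(sample, i, j):
    """Plug-in estimate of I(Xi; Xj) in bits, using relative frequencies."""
    n = len(sample)
    ci = Counter(row[i] for row in sample)
    cj = Counter(row[j] for row in sample)
    cij = Counter((row[i], row[j]) for row in sample)
    return sum((c / n) * math.log2(c * n / (ci[a] * cj[b]))
               for (a, b), c in cij.items())

def learn_tree(sample, root=0):
    """Maximum weight spanning tree on estimated MIs, directed away from `root`."""
    n_vars = len(sample[0])
    weights = {(i, j): empirical_mi(sample, i, j)
               for i, j in combinations(range(n_vars), 2)}
    label = list(range(n_vars))          # component label of each variable
    lines = []
    for i, j in sorted(weights, key=weights.get, reverse=True):
        if label[i] != label[j]:         # joining two components: no cycle
            old, new = label[i], label[j]
            label = [new if l == old else l for l in label]
            lines.append((i, j))
    parent, frontier = {root: None}, [root]
    while frontier:
        node = frontier.pop()
        for i, j in lines:
            if node in (i, j):
                other = j if node == i else i
                if other not in parent:
                    parent[other] = node
                    frontier.append(other)
    return lines, parent

def conditional_table(sample, child, parent):
    """Relative-frequency estimate of P(X_child | X_parent), keyed by (parent value, child value)."""
    pair = Counter((row[parent], row[child]) for row in sample)
    marg = Counter(row[parent] for row in sample)
    return {(u, v): c / marg[u] for (u, v), c in pair.items()}

# Hypothetical sample: X1 mostly copies X0, and X2 mostly copies X1.
sample = ([(0, 0, 0)] * 3 + [(0, 0, 1)] + [(1, 1, 1)] * 3 +
          [(1, 1, 0)] + [(0, 1, 1)] + [(1, 0, 0)])

lines, parent = learn_tree(sample)
tables = {child: conditional_table(sample, child, par)
          for child, par in parent.items() if par is not None}
print(sorted(lines), parent)
print(tables[1])
```

On this sample the estimated tree is the chain X0 − X1 − X2. The root has no parent, so its marginal would be estimated separately from the relative frequencies of X0.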


Polytrees

Explain the main differences between polytrees and trees.

Explain very briefly learning of polytree structures (François takes
care of the rest...)
