
On the qualitative vs quantitative notion of Independence

Graphical models of independence relations


Discussion and further topics
Tree structured graphical models

Introduction to information theory and coding -


Lecture 1 on Graphical models

Louis Wehenkel

Department of Electrical Engineering and Computer Science


University of Liège

Montefiore - Liège - October, 2011

Find slides: http://montefiore.ulg.ac.be/~lwh/Info/

Louis Wehenkel IT... (1/54)



On the qualitative vs quantitative notion of Independence


Motivation for this lecture
Characterization of Independence relations

Graphical models of independence relations


Undirected graphical models: Markov networks
Directed graphical models: Bayesian networks

Discussion and further topics


Relations between UGs and DAGs
Quantitative aspects

Tree structured graphical models


Motivations and structure of course


◮ Objective:
Introduce and motivate graphical representation of qualitative
and quantitative probabilistic knowledge
◮ Qualitative notion of dependence
◮ Characterization of desired properties of independence relations
◮ Probability calculus as a model of Independence relations
◮ Two graphical representations of Independence relations
◮ Undirected graphs: Markov networks
◮ Directed graphs: Bayesian networks
◮ Relations between these two types of representations
◮ Quantitative aspects/questions
◮ In depth analysis of tree-structured graphical models
◮ Undirected trees and the Chow-Liu algorithm
◮ Directed trees and polytrees

Why do we need a qualitative notion of dependence?


◮ Making statements about independence (or relevance) is a
profound feature of common-sense reasoning, while probability
calculus gives a formalization and a safe procedure for testing
any (conditional) Independence statement.
◮ However, this procedure relies on the computation of the
probabilities of all combinations of statements, and is
essentially intractable in large domains.
◮ In short, the probability calculus procedure is not in itself an
operational model of reasoning about Independence relations,
especially when we don’t (yet) have the numbers.
◮ We would like to have at our disposal a kind of ’logic’ of Independence,
in which we can easily derive new Independence statements
from previously established or postulated ones, without
resorting to number crunching.

Desired properties of Independence relations

Consider a domain characterised by a finite set U of discrete
variables, and let A, B, C denote three disjoint subsets of U.

Let us denote by A ⊥ B|C the statement that “A is independent of
B, given that we know C”, i.e. when we already know the values of
the variables in C, we consider that the knowledge of the values of
the variables in B is irrelevant to our beliefs about the values in A.

We want to derive rules which are characteristic of independence
relations, and which allow us to infer in a sound way new
independence relations from established ones.

We will first propose a set of four rules and then verify that they
are valid inference rules for probabilistic independence relations.


Semi-graphoids
What are the desired properties of an independence relation ?
◮ Symmetry:
(X ⊥ Y|Z) ⇔ (Y ⊥ X|Z). (1)
◮ Decomposition:

(X ⊥ (Y ∪ W)|Z) ⇒ (X ⊥ Y|Z)&(X ⊥ W|Z). (2)

◮ Weak union: NB: “strong” union will be defined later.

(X ⊥ (Y ∪ W)|Z) ⇒ (X ⊥ Y|(Z ∪ W)). (3)

◮ Contraction:

(X ⊥ Y|Z)&(X ⊥ W|(Z ∪ Y)) ⇒ (X ⊥ (Y ∪ W)|Z). (4)


In other words, what are the desired properties of such a relation ?


◮ Symmetry:
If Y tells us nothing about X (in some context Z), then X
tells us nothing about Y.
◮ Decomposition:
If two combined items of information are judged irrelevant to
X, then each separate item is irrelevant as well.
◮ Weak union:
Learning some irrelevant information W cannot help the other
irrelevant information Y become relevant.
◮ Contraction:
If we judge W irrelevant to X after learning some irrelevant
information Y, then W must also have been irrelevant before
we learned Y.


Properties of the probabilistic independence relation


NB: if P is a probability distribution defined over the variables in U, we
write: (A ⊥P B|C) ⇔ (∀a, b, c : P(b, c) > 0 ⇒ P(a|b, c) = P(a|c)).

Theorem (Probabilistic independence)


The probabilistic independence relationship ( · ⊥P · | · ) induced by
any probabilistic model P satisfies the four properties (1)-(4)
(symmetry, decomposition, weak union and contraction).

Theorem (Intersection property)


The probabilistic independence relationship induced by any strictly
positive probabilistic model P also satisfies

(X ⊥P Y|(Z ∪ W))&(X ⊥P W|(Z ∪ Y)) ⇒ (X ⊥P (Y ∪ W)|Z). (5)
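
The definition of (A ⊥P B|C) above can be tested numerically on a small joint table. The following is a minimal sketch, assuming the joint distribution is stored as a NumPy array with one axis per variable; the function name `is_cond_indep` and the toy chain distribution are illustrative assumptions, not part of the lecture. It uses the product form P(a, b, c) P(c) = P(a, c) P(b, c), which is equivalent to the definition and avoids dividing by zero when P(b, c) = 0.

```python
import numpy as np

def _marginalise(table, axes):
    """Sum out the given axes, keeping them as length-1 axes for broadcasting."""
    axes = tuple(axes)
    return table.sum(axis=axes, keepdims=True) if axes else table

def is_cond_indep(P, A, B, C, tol=1e-12):
    """Check (A ⊥P B | C); A, B, C are disjoint tuples of axis indices of P."""
    keep = set(A) | set(B) | set(C)
    others = [ax for ax in range(P.ndim) if ax not in keep]
    Pabc = _marginalise(P, others)      # P(a, b, c)
    Pac = _marginalise(Pabc, B)         # P(a, c)
    Pbc = _marginalise(Pabc, A)         # P(b, c)
    Pc = _marginalise(Pac, A)           # P(c)
    return bool(np.allclose(Pabc * Pc, Pac * Pbc, atol=tol))

# Hypothetical chain X -> Y -> Z over binary variables (axes 0, 1, 2).
px = np.array([0.6, 0.4])
py_x = np.array([[0.9, 0.1], [0.2, 0.8]])   # rows: x, columns: y
pz_y = np.array([[0.7, 0.3], [0.1, 0.9]])   # rows: y, columns: z
joint = px[:, None, None] * py_x[:, :, None] * pz_y[None, :, :]

x_indep_z_given_y = is_cond_indep(joint, (0,), (2,), (1,))
x_indep_z = is_cond_indep(joint, (0,), (2,), ())
```

In this toy chain X and Z are independent given Y but not marginally. The cost grows with the size of the full joint table, which is the intractability the lecture refers to.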


Induced inference rules

◮ Chaining rule:

(X ⊥ Z|Y)&((X ∪ Y) ⊥ W|Z) ⇒ X ⊥ W|Y.


◮ Mixing rule:

(X ⊥ (Y ∪ W)|Z)&(Y ⊥ W|Z) ⇒ (X ∪ W) ⊥ Y|Z.

Exercise: Show that these two rules follow logically from (1) to
(4), and hence are valid inference rules for any probabilistic
independence relation.
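
As an illustration of how such derivations go, here is one possible derivation of the mixing rule from properties (1), (3) and (4). It is offered as a sketch; other orderings of the steps may work equally well.

```latex
\begin{align*}
&(X \perp (Y \cup W) \mid Z)
  && \text{first premise} \\
\Rightarrow\ &(X \perp Y \mid (Z \cup W))
  && \text{weak union (3)} \\
\Rightarrow\ &(Y \perp X \mid (Z \cup W))
  && \text{symmetry (1)} \\
&(Y \perp W \mid Z)
  && \text{second premise} \\
\Rightarrow\ &(Y \perp (W \cup X) \mid Z)
  && \text{contraction (4), applied to the two previous lines} \\
\Rightarrow\ &((X \cup W) \perp Y \mid Z)
  && \text{symmetry (1)}
\end{align*}
```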


Summary

◮ We have abstracted from the quantitative notion of
conditional independence defined by probability theory.
◮ This abstraction is necessary for efficient manipulation of the
notion of independence/irrelevance.
◮ We have shown, to some extent, that one can axiomatize the
notion of independence in a way which remains logically
coherent with the same notion defined by probability calculus.
◮ We have illustrated that such an axiomatization is useful to
derive new independencies from postulated ones, and even
new inference rules from postulated ones.
◮ However, we are still lacking an intuitive and efficient way to
reason coherently in this framework.


Why graphical (independence) models?

◮ A picture is worth a thousand words...


Undirected graphs as independence models

Notion of undirected graph:


◮ A (general) graph is denoted by G = (V, E) where V is a
finite set of vertices, and E ⊂ V × V is the set of edges.
◮ A path (of length n > 0) in G is a sequence of distinct
vertices v1, v2, . . . , vn+1 such that (vi, vi+1) ∈ E, i = 1, . . . , n.
◮ An edge (v, v′) ∈ E such that v = v′ is called a loop.
◮ An edge (v, v′) ∈ E such that v ≠ v′ and (v′, v) ∈ E is called
a line.
◮ An edge which is neither a line nor a loop is called an arrow.
◮ We say that G is undirected if G has no loops and no arrows
(i.e. G has only lines).


Vertex separation in undirected graphs

From local to global vertex separation:


◮ Consider an undirected graph G = (U, E).
◮ The absence of a line between two variables represents the
absence of a direct interaction between them.
◮ All other relations are induced by the notion of separation:
We say that in a graph G the sets A and B are separated by
C if all paths from A to B traverse C.
We denote this by (A; B|C)G.
◮ In particular, we say that the sets A and B are separated if
there is no path from A to B.
◮ In particular, if there is no line connecting A to B, then
(A; B|U \ (A ∪ B))G.
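
Vertex separation can be checked with a plain graph search. The sketch below assumes the undirected graph is given as a list of vertex pairs; the function name `is_separated` and the example graph are illustrative assumptions. It runs a breadth-first search from A that is never allowed to enter C, and reports separation when no vertex of B is reached.

```python
from collections import deque

def is_separated(edges, A, B, C):
    """Return True if every path from A to B in the undirected graph traverses C."""
    A, B, C = set(A), set(B), set(C)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = set(A)
    queue = deque(A)
    while queue:
        u = queue.popleft()
        if u in B:
            return False          # found a path that avoids C
        for w in adj.get(u, ()):
            if w not in C and w not in seen:
                seen.add(w)
                queue.append(w)
    return True

# Hypothetical chain a - b - c - d.
chain = [("a", "b"), ("b", "c"), ("c", "d")]
blocked = is_separated(chain, {"a"}, {"d"}, {"c"})
unblocked = is_separated(chain, {"a"}, {"d"}, set())
```

Each vertex and edge is visited at most once, so the check runs in time linear in the size of the graph.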


Undirected graphs as independence models

◮ The good news:


◮ The vertex separation relation satisfies properties (1) to (5)
(please check this yourself)
◮ Vertex separation is easy to check (polynomial time).
◮ Questions:
◮ Is vertex separation compatible with probabilistic
independence?
◮ How general is vertex separation wrt probabilistic
independence?
◮ What kind of independence relations can be exactly
represented by vertex separation?


Notions of dependency, independency and perfect maps


Consider a distribution P and an undirected graph G over U.
Definition (D-map (independent subsets are indeed separated))
G is a D-map of P if for any three disjoint A, B, C ⊂ U we have

(A ⊥P B|C) ⇒ (A; B|C)G . (6)

Definition (I-map (separated subsets are indeed independent))


G is an I-map of P if for any three disjoint A, B, C ⊂ U we have

(A ⊥P B|C) ⇐ (A; B|C)G . (7)

Definition (Perfect map (equivalence between “⊥P ” and “;”))


G is a perfect map of P if it is a D-map and an I-map of P.

Representation power of UGs


◮ Preliminary comments:
◮ Any P has at least a D-map (e.g. the empty graph)
◮ Any P has at least an I-map (e.g. the complete graph)
◮ Some P have no perfect-map (e.g. two coins and a bell)
◮ There is thus a need to delineate more precisely
◮ those dependency models that have perfect maps, and
◮ those graphical models which are perfect maps of a
dependency model
◮ provide constructive algorithms to switch between P and G .
◮ We say that a dependency model M (i.e. a rule that assigns
truth values to a three-place relation (A ⊥M B|C) over
disjoint subsets of some U) is graph-isomorph if there exists
an undirected graph (U, E ) which is a perfect map of M.
◮ Goal: characterize graph-isomorph probabilistic models.

On the structure of the set of UGs over some U


◮ Lattice structure:
◮ For a fixed U, we can identify an undirected graph G = (U, E )
with its set of edges E .
◮ The set of edges can itself be identified with a subset of the
set of pairs {v, v′} ⊂ U.
◮ For any G = (U, E) and G′ = (U, E′), let us write G ⊂ G′ if
E ⊂ E′.
◮ Monotonicity wrt addition or removal of edges:
◮ if G is a D-map of P, any G ′ ⊂ G is also a D-map of P,
◮ if G is an I-map of P, any G ′ ⊃ G is also an I-map of P.
◮ Extreme maps:
◮ G is a minimal I-map, if there is no G ′ ⊂ G (other than G
itself) which is also an I-map.
◮ G is a maximal D-map, if there is no G ′ ⊃ G (other than G
itself) which is also a D-map.

Characterization of graph-isomorph dependency model

Theorem (Graph isomorph dependency model M)


A necessary and sufficient condition for a dependency model M
over some U to be graph-isomorph is that it satisfies Symmetry,
Decomposition, Intersection, Strong union and Transitivity,

where Strong union means that:

(X ⊥M Y|Z) ⇒ (X ⊥M Y|(Z ∪ W)), (8)

and Transitivity means that:

(X ⊥M Y|Z) ⇒ ∀γ ∈ U : (X ⊥M {γ}|Z) or ({γ} ⊥M Y|Z). (9)

(NB: {γ} denotes a singleton subset of U)



Question 1: Given P construct minimal I-map G of P


Theorem (Existence, uniqueness, construction of minimal I-map)
Every dependency model M which satisfies symmetry,
decomposition and intersection, has a unique minimal I-map
G0 = (U, E0) produced by connecting only those pairs (v, v′) for
which ({v} ⊥M {v′}|U \ {v, v′}) is FALSE.
◮ Motivation: a minimal I-map is a graph displaying a maximal
number of independencies without false-positives.
◮ Notice that the property holds for independencies displayed by
strictly positive P.
◮ NB:
◮ existence of an I-map guarantees existence of a minimal one,
but not its uniqueness.
◮ uniqueness of the minimal I-map implies that any I-map is a
superset of the unique minimal one.

Question 2: Check whether G is an I-map of P

◮ If P is strictly positive, we can check whether G is an I-map
by first constructing a minimal I-map G0 of P and then
checking whether G0 ⊂ G.
◮ Assuming that we can establish ({v } ⊥P {v ′ }|U \ {v , v ′ }) for
any (v , v ′ ) in constant time:
◮ building a minimal I-map G0 may be done in polynomial time
(quadratic in the number of variables).
◮ checking G0 ⊂ G may also be done in polynomial time
(quadratic).
◮ Hence these problems are solved “efficiently” if we have an
oracle providing in constant time answers to “elementary”
queries of conditional independence of two variables given all
others.
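
The two steps just described (build G0 from pairwise queries, then test inclusion) can be sketched as follows. The oracle `indep_given_rest(v, w)` stands for the assumed constant-time answer to "is v independent of w given all other variables?"; it, the function names and the toy chain are illustrative assumptions.

```python
from itertools import combinations

def minimal_imap(variables, indep_given_rest):
    """Edge set E0 of G0: join v and w unless they are independent given all others."""
    return {frozenset(pair)
            for pair in combinations(variables, 2)
            if not indep_given_rest(*pair)}

def is_imap(edges, variables, indep_given_rest):
    """For strictly positive P, G is an I-map iff G0 is contained in G."""
    E = {frozenset(e) for e in edges}
    return minimal_imap(variables, indep_given_rest) <= E

# Hypothetical oracle for a chain a - b - c: only a and c are
# independent given the remaining variable b.
def toy_oracle(v, w):
    return {v, w} == {"a", "c"}

U = ["a", "b", "c"]
E0 = minimal_imap(U, toy_oracle)
```

With n variables the construction issues n(n − 1)/2 oracle queries, which is the quadratic cost mentioned above.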


Markov networks, blankets and boundaries


Definitions:
◮ Given P (resp. M), we say that G is a Markov network of P
(resp. M) if it is a minimal I-map of P (resp. M).
◮ A Markov blanket BLM(v) of v ∈ U is any subset S ⊂ U for
which ({v} ⊥M U \ ({v} ∪ S)|S).
◮ A Markov boundary BM(v) of v ∈ U is a minimal Markov
blanket.

Theorem (Uniqueness and construction of Markov boundaries)
Every element v of a dependency model M which satisfies
symmetry, decomposition, intersection, and weak union, has a
unique Markov boundary, and this coincides with the set of
vertices adjacent to v in the minimal I-map G0 of M.


Summary of UG models
◮ We have seen that we can make use of undirected models to
represent useful independencies, in a way compatible with the
definition of probabilistic conditional independence.
◮ Not all independence structures may be represented by UGs.
◮ But we can commit to the idea of building the most refined
model of them in the form of a minimal I-map.
◮ Further topics of relevance:
◮ For any G, is there a P of which G is a perfect map?
(answer is yes, with a few hypotheses).
◮ Once we have a minimal I-map (or simply an I-map) how do
we “decorate” this structure with numerical information to
represent a particular probability distribution P ? (later)
◮ How to carry out “numerical computations” with such
structures ? (later)

Directed graphical models: motivations


◮ Undirected graphical models lack representation power:
◮ unable to represent induced and non-transitive dependencies.
◮ no representation of causality.
◮ To overcome these deficiencies, we will use the language of
directed graphs.
◮ Arrows (instead of lines) may make it possible to distinguish genuine
dependencies from spurious ones (see two-coins + bell
example).
◮ Arrows also allow us to impose (asymmetric) causal relations on
top of (symmetric) dependencies.
◮ Menu:
◮ Define Bayesian networks and their semantics as independence
maps.
◮ Answer questions 1, 2 and 3 along the lines of the treatment
given for Markov networks.

Directed acyclic graphs (DAGs)

◮ A directed graph D = (V, E) is a graph with only arrows:
◮ i.e. no loops and no lines,
◮ or, in other words, (v, v′) ∈ E ⇒ v ≠ v′ & (v′, v) ∉ E.
◮ A cycle of length n > 0, in a graph G = (V, E), is a sequence
v1, . . . , vn+1 such that (vi, vi+1) ∈ E for i = 1, . . . , n, and v1 = vn+1.
◮ The cycle is said to be simple (or proper) if all nodes except
v1 and vn+1 are different.
◮ A DAG is a directed graph without any cycle.
◮ Note that this is equivalent to saying:
◮ that a DAG is a directed graph without any simple cycle,
◮ or that a DAG is a graph without any cycle.
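
Whether a general graph is a DAG can be tested by repeatedly removing vertices that have no incoming edge (Kahn's algorithm, a standard technique that is not named in the lecture). The sketch below assumes edges are given as ordered pairs; the function name `is_dag` is an illustrative assumption. Loops and lines are rejected automatically, since each of them forms a cycle of length 1 or 2.

```python
def is_dag(vertices, edges):
    """Return True if the graph (vertices, edges) contains no cycle."""
    indegree = {v: 0 for v in vertices}
    children = {v: [] for v in vertices}
    for u, v in set(edges):
        children[u].append(v)
        indegree[v] += 1
    # Start from the vertices that have no incoming edge.
    ready = [v for v in indegree if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for w in children[u]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    # All vertices can be removed if and only if there is no cycle.
    return removed == len(indegree)

chain_ok = is_dag("abc", [("a", "b"), ("b", "c")])
with_cycle = is_dag("abc", [("a", "b"), ("b", "c"), ("c", "a")])
```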


D-separation in DAGs
Definition (D-separation)
If X, Y, Z are three disjoint sets of vertices in a DAG D, then Z is
said to d-separate X from Y, denoted by (X; Y|Z)D , if there is no
path between a vertex in X and a vertex in Y along which the
following two conditions hold: (i) every node with converging
arrows is in Z or has a descendant in Z and (ii) every other node is
outside of Z.
◮ If a path satisfies the above two conditions, it is said to be active;
otherwise it is said to be blocked.
◮ A DAG is an I-map of P if all its d-separations correspond to
conditional independencies satisfied in P.
◮ It is a minimal I-map, or a Bayesian network of P, if none of
its arrows can be deleted without destroying its I-mapness.
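
D-separation can be checked without enumerating paths. The sketch below does not follow the path-based definition literally; it uses the classical equivalent criterion based on the moral ancestral graph: Z d-separates X from Y exactly when Z separates X from Y in the moralised graph of the sub-DAG restricted to X ∪ Y ∪ Z and their ancestors. The DAG encoding (a dictionary mapping each vertex to its set of parents) and the function name `d_separated` are illustrative assumptions.

```python
from collections import deque

def d_separated(parents, X, Y, Z):
    """Check (X; Y | Z)_D for a DAG given as {vertex: set of parents}."""
    X, Y, Z = set(X), set(Y), set(Z)
    # 1. Keep only X, Y, Z and their ancestors.
    keep, stack = set(), list(X | Y | Z)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents.get(v, ()))
    # 2. Moralise: link each vertex to its parents and marry co-parents.
    adj = {v: set() for v in keep}
    for v in keep:
        ps = list(parents.get(v, ()))
        for i, p in enumerate(ps):
            adj[v].add(p)
            adj[p].add(v)
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    # 3. Ordinary vertex separation by Z in the resulting undirected graph.
    seen, queue = set(X), deque(X)
    while queue:
        u = queue.popleft()
        if u in Y:
            return False
        for w in adj[u]:
            if w not in Z and w not in seen:
                seen.add(w)
                queue.append(w)
    return True

# Hypothetical v-structure x -> y <- z with a child w of y.
collider = {"y": {"x", "z"}, "w": {"y"}}
marginal = d_separated(collider, {"x"}, {"z"}, set())
given_descendant = d_separated(collider, {"x"}, {"z"}, {"w"})
```

In the v-structure, x and z are d-separated by the empty set, but conditioning on y or on its descendant w activates the path, as condition (i) of the definition requires.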

Construction of Bayesian networks for a distribution P

Definition (Boundary DAG of M relative to a vertex ordering)


Let M be a dependency model over U and d = (v1, . . . , vn) any
ordering of the elements of U, and define the sequence of nested
sets U_i^d by U_1^d = ∅ and U_i^d = {v1, . . . , vi−1}, for i = 2, . . . , n.

The boundary strata of M relative to d is an ordered set of subsets
of U, (B_1^d, B_2^d, . . . , B_i^d, . . .) such that B_i^d is a Markov boundary of
{vi} wrt. U_i^d, i.e. a minimal subset of U_i^d satisfying
({vi} ⊥M (U_i^d \ B_i^d)|B_i^d).

The DAG created by designating each B_i^d as parent set of vi is
called the boundary DAG of M relative to the ordering d.
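
The construction can be sketched as a brute-force search over subsets of predecessors. The oracle `indep(v, rest, given)`, answering whether ({v} ⊥M rest | given) holds, is an assumption, as are the function names and the toy chain. Candidate subsets are scanned by increasing size, so the first hit has minimum cardinality and therefore no proper subset of it qualifies.

```python
from itertools import combinations

def _boundary(v, preds, indep):
    """Smallest subset B of preds such that v is independent of preds minus B given B."""
    for size in range(len(preds) + 1):
        for B in combinations(preds, size):
            rest = set(preds) - set(B)
            # An empty remainder is trivially independent of v.
            if not rest or indep(v, rest, set(B)):
                return set(B)

def boundary_dag(ordering, indep):
    """Return {v_i: parent set B_i} for the given ordering of the variables."""
    ordering = list(ordering)
    return {v: _boundary(v, ordering[:i], indep)
            for i, v in enumerate(ordering)}

# Hypothetical oracle for the chain a -> b -> c: the only non-trivial
# independence among the queries is c independent of {a} given {b}.
def chain_oracle(v, rest, given):
    return (v, rest, given) == ("c", {"a"}, {"b"})

dag = boundary_dag(["a", "b", "c"], chain_oracle)
```

The search is exponential in the number of predecessors, so this is only meant to make the definition concrete on small examples.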

DAGs as minimal I-maps of Semi-graphoids

◮ Nota Bene:
◮ given any M, the existence (though not necessarily the uniqueness) of a
boundary strata is established once d is given.
◮ hence, there is at least one boundary DAG defined by each
ordering d; they may all be different or all identical or diverse
(cf examples).
◮ if M is induced by a strictly positive P, then for each ordering
d the boundary strata is unique as well as its induced
boundary DAG.
◮ Still, different orderings may yield identical boundary DAGs.
◮ Main result:
◮ For any semi-graphoid M (in particular, for any P induced
independence model) and any ordering d, any corresponding
boundary DAG is a minimal I-map of M.


Corollaries of the main result


◮ Given P over U and any ordering d = (X1, . . . , Xn) of the
variables, the DAG created by designating as parents of Xi
any minimal subset ΠXi of the predecessors of Xi in d satisfying
P(Xi|X1, . . . , Xi−1) = P(Xi|ΠXi)
is a Bayesian network of P. If P is strictly positive then this is
uniquely defined.
◮ A necessary and sufficient condition for a DAG D to be a
Bayesian network of P is that each variable X be conditionally
independent of all its non-descendants, given its parents ΠX,
and that no proper subset of ΠX satisfies this condition.
◮ If a Bayesian network D is constructed by the boundary
strata method in some ordering d, then any ordering d′
consistent with the direction of arrows in D will lead to the
same D.

Summary of DAG models

◮ We have seen that we can make use of directed acyclic
graphical models to represent useful independencies, in a way
compatible with the definition of probabilistic conditional
independence.
◮ Not all independence structures may be represented exactly by
DAGs (examples will be given).
◮ But we can commit to the idea of building the most refined
model of them in the form of a minimal I-map.
◮ As with UGs, we can infer independencies by inspection of the
graph, in polynomial time.


The global picture


[Figure: diagram of classes of dependency models, with regions labelled:
Dependency models; Probabilistic dependency models;
UG-isomorph (i.e. Markov) models;
Chordal-graph isomorph (decomposable) models;
DAG-isomorph (i.e. Bayesian) models.]


Quantitative aspects

◮ What do we need to add to a minimal I-map graphical
structure to describe fully a given P?
◮ UGs: parameterization via potential (or compatibility)
functions over cliques.
◮ DAGs: parameterization via conditional distributions over
families.
◮ How can we compute with parameterized DAG or UG
P-models?
◮ Exact computations: reduce to CG and use (generalized)
forward-backward algorithm.
◮ Approximations: turn problem into a tractable optimization
problem (subject of current research).
◮ How can we infer UG or DAG models from data ?


Tree structured graphical models

Motivations
◮ Tree structured models offer simple interpretations
◮ Efficient inference algorithms
◮ Efficient learning algorithms
Two classes
◮ Undirected trees and their equivalent directed versions
◮ Polytrees : DAGs whose skeleton is a tree


A few additional definitions from graph theory

Skeleton of a DAG: UG obtained by replacing all arrows by lines.
Directing an UG: DAG obtained by replacing every line by an
arrow, under the constraint of producing a DAG.
Induced subgraph: (of G = (V, E)) by V′ ⊂ V is the graph
G(V′) = (V′, E ∩ (V′ × V′)). (I.e. induced subgraphs
of UGs (resp. DAGs) are UGs (resp. DAGs).)
Clique of an UG: a clique of G = (V, E) is an induced subgraph
G(V′) such that ∀v, v′ ∈ V′ : v ≠ v′ ⇒ (v, v′) ∈ E.
Maximal clique: a clique which cannot be augmented while
maintaining the property of being a clique, i.e. a
maximal subgraph whose vertices are all adjacent to
each other in G.


Parameterizing UGs
In order to create a P from an UG we proceed as follows (see Pearl
for additional details):
1. Identify the set C = {C1 , . . .} of all maximal cliques of G .
2. For each i ≤ #C, assign a compatibility function gi (·) which
maps configurations of the subset of variables of the clique to
a non-negative real number.
3. Write
$$P(x_1, \ldots, x_n) = \frac{\prod_i g_i(c_i(x_1, \ldots, x_n))}{\sum_{x_1, \ldots, x_n} \prod_i g_i(c_i(x_1, \ldots, x_n))},$$
where ci (x1 , . . . , xn ) extracts from a configuration of all the
variables of the UG the corresponding configuration of the
subset of variables of the clique Ci .
But: How to derive the functions gi from evidence, and how to
compute the normalization constant are practical problems...
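
For small models the three steps can be carried out by brute force. The sketch below assumes finite domains, cliques given as tuples of variable names, and one Python function per clique as compatibility function; all names, the "agreement" potential and the example graph are illustrative assumptions. It enumerates every configuration, multiplies the clique potentials, and divides by the normalization constant K.

```python
from itertools import product

def gibbs_distribution(domains, cliques, potentials):
    """Normalised product of clique potentials.

    domains:    {variable: list of values}
    cliques:    list of tuples of variables (the maximal cliques C_i)
    potentials: list of functions g_i, one per clique, returning a value >= 0
    """
    names = list(domains)
    weights = {}
    for config in product(*(domains[n] for n in names)):
        x = dict(zip(names, config))
        w = 1.0
        for clique, g in zip(cliques, potentials):
            w *= g(*(x[n] for n in clique))   # g_i(c_i(x_1, ..., x_n))
        weights[config] = w
    K = sum(weights.values())                 # normalisation constant
    return {config: w / K for config, w in weights.items()}

# Hypothetical UG X - Y - Z whose potentials favour equal neighbours.
def agree(a, b):
    return 2.0 if a == b else 1.0

P = gibbs_distribution(
    {"X": [0, 1], "Y": [0, 1], "Z": [0, 1]},
    [("X", "Y"), ("Y", "Z")],
    [agree, agree],
)
```

The enumeration visits every joint configuration, so its cost is exponential in the number of variables; this is exactly the practical difficulty with the normalization constant noted above.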

Example: Markov chain X − Y − Z

◮ Cliques: X − Y and Y − Z
◮ Compatibility functions: g1 (x, y ) and g2 (y , z)
◮ Suppose we know P(X , Y , Z ): how to derive the gi from P?
◮ Idea 1:
◮ let us take g1(x, y) = P(x, y) and g2(y, z) = P(y, z)
◮ we get P(x, y, z) = (P(x, y)P(y, z))/K where
K = Σx∈X Σy∈Y Σz∈Z P(x, y)P(y, z)
◮ Impossible in general (Why ?)
◮ Idea 2:
◮ Let us, instead, take g1(x, y) = P(x, y)/√P(y) and
g2(y, z) = P(y, z)/√P(y): we get K = 1 ! (Why ?)
◮ Possible in general (Why ?)
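
One way to answer the three "Why ?" questions is the following short calculation, given as a sketch under the assumption that P(y) > 0 for every y.

```latex
\begin{align*}
\text{Idea 2:}\quad
g_1(x,y)\, g_2(y,z)
  &= \frac{P(x,y)}{\sqrt{P(y)}} \cdot \frac{P(y,z)}{\sqrt{P(y)}}
   = \frac{P(x,y)\, P(y,z)}{P(y)}
   = P(x,y)\, P(z \mid y), \\
K &= \sum_{x}\sum_{y}\sum_{z} P(x,y)\, P(z \mid y)
   = \sum_{x}\sum_{y} P(x,y) \sum_{z} P(z \mid y)
   = 1 .
\end{align*}
% The product P(x,y) P(z|y) equals P(x,y,z) exactly when
% P(z|x,y) = P(z|y), i.e. when X and Z are independent given Y,
% which is the independence displayed by the chain X - Y - Z.
\begin{align*}
\text{Idea 1:}\quad
P(x,y)\, P(y,z) &= P(x,y,z)\, P(y)
  \quad \text{when } X \perp Z \mid Y ,
\end{align*}
% so a single constant K can work only if P(y) does not depend on y.
```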


Example: Markov chain X − Y − Z (continued)

◮ We could as well take g1(x, y) = P(x, y) = P(x)P(y|x) and
g2(y, z) = P(z|y), which corresponds to the parameterization
of the DAG X → Y → Z;
◮ or g2(y, z) = P(y, z) = P(z|y)P(y) and g1 = P(x|y) which
corresponds to X ← Y → Z;
◮ or g2(y, z) = P(z)P(y|z) and g1 = P(x|y) which corresponds
to X ← Y ← Z.
◮ But, we could not take the parameterization of the DAG
X → Y ← Z.
◮ The first three parameterizations correspond to directed
versions of the UG which do not introduce a v-structure,
while the fourth one introduces a v-structure.


Markov trees

◮ We use interchangeably the terms Markov tree, tree, or
tree-structured UG, to denote UGs without any cycles.
◮ Typically, we assume in addition that these trees are singly
connected, i.e. such that there is a path from any vertex to
any other vertex, and use the term ’forest’ to denote the case
where not all nodes are connected.
◮ In a singly connected tree over n vertices, we always have
exactly n − 1 edges.
◮ In a forest over n vertices, we have n − c edges, where c is the
number of connected components.


General procedure for parameterizing Markov trees:

1. For each edge e = X − Y (i.e. each maximal clique Ci) in a
Markov tree, use a function $g_e(x(e), y(e)) = P(x(e), y(e))$,
i.e. the pairwise marginal derived from P.
2. For each vertex Xi in a Markov tree, use a function
$g_{X_i}(x_i) = (P(x_i))^{d(X_i)-1}$, where d(Xi) denotes the number of
neighbors of the vertex Xi in G.
3. Write:
$$P_G(x_1, \ldots, x_n) = \frac{\prod_{e \in E(G)} g_e(x(e), y(e))}{\prod_{X_i \in V(G)} g_{X_i}(x_i)}.$$
4. Now, let us consider what we get if we first direct the Markov
tree, and then use the DAG parameterization procedure to
infer a probability distribution from it... (from examples)
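
As one possible example for step 4, the sketch below compares the undirected formula of step 3 with a directed parameterization on a small tree. The star-shaped tree, the binary variables and all probability values are assumptions made for illustration. The joint distribution is built from a directed version of the tree (root 0, arcs towards the leaves), and the formula of step 3 is then evaluated from its edge and node marginals.

```python
import itertools
import math

# Hypothetical Markov tree: a star with centre 0 and leaves 1, 2, 3 (binary).
edges = [(0, 1), (0, 2), (0, 3)]
n = 4
p_root = [0.4, 0.6]                      # illustrative P(x0)
p_leaf = {1: [[0.9, 0.1], [0.3, 0.7]],   # illustrative P(x_leaf | x0), rows = x0
          2: [[0.5, 0.5], [0.2, 0.8]],
          3: [[0.6, 0.4], [0.1, 0.9]]}

# Joint distribution from the directed version (DAG parameterization).
joint = {}
for x in itertools.product((0, 1), repeat=n):
    joint[x] = p_root[x[0]] * math.prod(p_leaf[i][x[0]][x[i]] for i in (1, 2, 3))

def marginal(joint, idx):
    """Marginal of the variables whose positions are listed in idx."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

edge_marg = {e: marginal(joint, e) for e in edges}
node_marg = [marginal(joint, (i,)) for i in range(n)]
degree = [sum(i in e for e in edges) for i in range(n)]   # d(Xi)

def p_tree(x):
    """Steps 1-3: product of edge marginals over node marginals^(d - 1)."""
    num = math.prod(edge_marg[e][(x[e[0]], x[e[1]])] for e in edges)
    den = math.prod(node_marg[i][(x[i],)] ** (degree[i] - 1) for i in range(n))
    return num / den

max_gap = max(abs(p_tree(x) - joint[x]) for x in joint)
print(f"max |P_G - P| = {max_gap:.2e}")
```

On this example the two parameterizations agree up to rounding, which is what the directed construction suggests for trees.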

I-map preserving direction of tree-structured UGs

◮ Theorem A: any directed version of a tree-structured UG
which has no v-structure produces a DAG which represents
exactly the same set of independencies as the original
undirected tree. (also true for chordal graphs)
◮ Corollary A1: tree-structured UGs may be parameterized by
first directing them without introducing any v-structure, and
then parameterizing the resulting DAG.
◮ Algorithm: to direct a tree-structured UG in such a way that
no v-structures are introduced
1. First choose a root of the tree: any node of the UG will do.
2. Direct its arcs ’away’ from the root.
3. Proceed recursively by directing the not yet directed arcs of
the successors ’away’ from them.
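
The three steps of the algorithm can be sketched as a breadth-first traversal. The function name `direct_tree` and the five-variable tree are hypothetical, introduced only for this illustration. Since every node other than the root receives exactly one parent, no v-structure can appear.

```python
from collections import deque

def direct_tree(edges, root):
    """Orient the edges of an undirected tree away from `root`.

    Returns a dict mapping each node to its parent (None for the root).
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, []).append(v)
        neighbours.setdefault(v, []).append(u)
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in neighbours.get(u, []):
            if v not in parent:          # edge u - v is not directed yet
                parent[v] = u            # direct it as u -> v
                queue.append(v)
    return parent

# Hypothetical tree over five variables, rooted at C.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("D", "E")]
parent = direct_tree(edges, root="C")
arcs = sorted((p, c) for c, p in parent.items() if p is not None)
print(arcs)
```

Choosing another root gives another orientation of the same tree, again with one parent per non-root node.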


Summary

◮ Like DAGs, tree-structured UGs may be parameterized ’easily’
to represent a P which satisfies the independencies encoded
by the UG, by first directing the tree-structured UG without
introducing v-structures (which maintains the encoded
independencies), and by then using the DAG parameterization
procedure to attach conditional distributions to nodes.
◮ These ideas carry over, with some modifications, to “chordal
graphs”, i.e. UGs such that every cycle of length 4 or more
has a chord (a line between non-consecutive vertices of the
cycle) (see Pearl, chapter 3...)
◮ See examples to understand why chordality is essential to
yield such a decomposable representation of a probability
distribution...

Learning structure from data (chapter 8 of Pearl)


◮ Main question: how to infer the graph structure from the
information at hand?
◮ We will limit ourselves to tree structures.
◮ We will decompose the question in this context into three
successive questions:
◮ Given a P(x) known to factorize according to a tree-structured
graph, how can we efficiently recover its tree-structured perfect
map?
◮ Given a general P(x), can we recover the best approximation
of P(x) in the form of a parameterization of a tree-structured
graph?
◮ Given only a sample from a generative distribution, how can we
answer the two preceding questions?
◮ These questions will be addressed in turn, first for
tree-structured UGs and then for general tree-structured DAGs
(polytrees).

Intuition about structure inference:

◮ Consider the case of three variables X, Y, Z, and suppose that
we know that they form a Markov chain, but that we don’t
know in which order.
◮ In other words, we hesitate between the three following
structures: X − Y − Z, Y − X − Z, X − Z − Y.
◮ Suppose that we are able to compute I(X; Y), I(Y; Z) and
I(X; Z):
◮ Can we infer from these three quantities a correct structure?
◮ The answer is YES.
◮ Sort the quantities I(X; Y), I(Y; Z) and I(X; Z) by decreasing
order of numerical value, take the first two and create a UG
with lines between the corresponding two pairs of variables.
◮ Explanation: data processing inequality! In a chain
X − Y − Z we have I(X; Z) ≤ I(X; Y) and I(X; Z) ≤ I(Y; Z), so the
pair of non-adjacent variables carries the smallest of the three
mutual informations.
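
A small numerical illustration of this rule, under stated assumptions: the chain below has X in the middle (Y − X − Z), the variables are binary, and the conditional probabilities are made up for the example. Mutual informations are computed in bits.

```python
import itertools
import math

def mutual_information(p_ab):
    """I(A;B) in bits from a joint table {(a, b): probability}."""
    p_a, p_b = {}, {}
    for (a, b), p in p_ab.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(p * math.log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in p_ab.items() if p > 0)

# Hypothetical chain Y - X - Z (X in the middle), binary variables.
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}
p_z_given_x = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
joint = {(x, y, z): p_x[x] * p_y_given_x[x][y] * p_z_given_x[x][z]
         for x, y, z in itertools.product((0, 1), repeat=3)}

def pair_marginal(i, j):
    """Joint table of the variables at positions i and j of (x, y, z)."""
    out = {}
    for xyz, p in joint.items():
        key = (xyz[i], xyz[j])
        out[key] = out.get(key, 0.0) + p
    return out

names = "XYZ"
scores = {(names[i], names[j]): mutual_information(pair_marginal(i, j))
          for i, j in itertools.combinations(range(3), 2)}
# Keep the two pairs with the largest mutual information as lines of the UG.
kept = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores)
print(kept)
```

Here the two retained pairs are (X, Y) and (X, Z), which recovers the structure Y − X − Z used to generate the distribution.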


Generalization to distributions which factorize according to
a tree-structured UG

◮ We want to represent graphically the independencies of a
distribution P(X1, ..., Xn) known to be Markov w.r.t. a
tree-structured UG (but we do not know the structure).
◮ Algorithm (Chow and Liu, 1968):
1. Compute the pairwise mutual informations I(Xi; Xj), ∀i ≠ j.
2. Assign a line between the variables corresponding to the
largest mutual information.
3. Examine the next largest mutual information and assign a line,
unless it creates a cycle in the graph.
4. Repeat step 3 until n − 1 branches have been assigned.
◮ Select an arbitrary node as root, direct the UG from it
(without introducing v-structures) and to each Xi assign
P(Xi | Xp(i)) (where p(i) denotes the (sole) parent of Xi).
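
Steps 1 to 4 can be sketched as a greedy maximum-weight spanning tree construction. This is a minimal sketch, not a reference implementation: the function names, the representation of P as a dictionary over tuples, and the four-variable binary test chain are all assumptions made for the example. The final rooting and assignment of conditional distributions is not included.

```python
import itertools
import math

def marginal(joint, idx):
    """Marginal of the variables whose positions are listed in idx."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out

def mutual_information(joint, i, j):
    """I(Xi;Xj) in bits, computed from the joint table."""
    p_ij = marginal(joint, (i, j))
    p_i = marginal(joint, (i,))
    p_j = marginal(joint, (j,))
    return sum(p * math.log2(p / (p_i[(a,)] * p_j[(b,)]))
               for (a, b), p in p_ij.items() if p > 0)

def chow_liu_edges(joint, n):
    """Steps 1-4: add lines by decreasing mutual information, skipping cycles."""
    weights = {(i, j): mutual_information(joint, i, j)
               for i, j in itertools.combinations(range(n), 2)}
    component = list(range(n))            # component label of each variable
    edges = []
    for i, j in sorted(weights, key=weights.get, reverse=True):
        if component[i] != component[j]:  # the line i - j creates no cycle
            edges.append((i, j))
            old, new = component[j], component[i]
            component = [new if c == old else c for c in component]
        if len(edges) == n - 1:
            break
    return edges

# Hypothetical binary chain X0 - X1 - X2 - X3 with illustrative flip rates.
flip = [0.1, 0.2, 0.15]
joint = {}
for x in itertools.product((0, 1), repeat=4):
    p = 0.5
    for k in range(3):
        p *= flip[k] if x[k + 1] != x[k] else 1 - flip[k]
    joint[x] = p

print(sorted(chow_liu_edges(joint, 4)))
```

On this chain the procedure returns the three lines between consecutive variables, i.e. the structure that generated the distribution.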

Comments about the Chow and Liu algorithm


◮ When we want to infer a tree-structured UG (or a directed
version of it without v-structures) for a target distribution,
and have the means to compute pairwise quantities from the
target distribution, in the form of mutual informations among
variables and conditional distributions of one variable given
another, we have an ’efficient’ algorithm for generating a
Markov network (order n^2, roughly).
◮ The Chow-Liu algorithm is an instance of the ’maximum
weight spanning tree’ algorithm of graph theory (MWST).
◮ NB: In the algorithm, we may in principle be led to situations
where the I(Xi; Xj) of the next line to assign is equal to zero;
if this is the case we can immediately stop the procedure
(leading to a ’forest’ model, i.e. a model where some subsets
of variables are disconnected).

Approximation of probability distributions

◮ In many practical situations, we do not have precise
information about the probability distribution at hand.
◮ In particular, in such contexts, we are not able to verify in a
definite way independencies such as (Xi ⊥ Xj | Xk).
◮ In other words, we can only estimate/approximate quantities
such as I(Xi; Xj) or P(Xi | Xp(i)).
◮ Then the following question arises:
How to infer precise probabilistic models from imprecise data?
◮ Approach:
◮ Define a space of target probability distributions (model).
◮ Define a measure of discrepancy between distributions.
◮ Choose the probability distribution in the target space which is
as ’compatible as possible’ with the information at hand.


Measuring the compatibility between two distributions


◮ Kullback-Leibler divergence:
$$D(P, P') = \sum_x P(x) \log \frac{P(x)}{P'(x)}.$$
◮ is non-negative, equals zero if and only if P′ = P, and tends
to zero when P′ → P.
◮ has the likelihood interpretation, when P is inferred from a
sample (...explained on the blackboard).
◮ Given a space $\mathcal{P}$ of distributions and a target distribution P,
we thus may seek to compute
$$\hat{P}(P) = \arg\min_{P' \in \mathcal{P}} D(P, P'),$$
which we call the D-projection of P onto $\mathcal{P}$.
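
A minimal sketch of the divergence for discrete distributions stored as dictionaries. The function name, the use of base-2 logarithms and the two example distributions are assumptions made for the illustration; the slides do not fix the base of the logarithm.

```python
import math

def kl_divergence(p, q):
    """D(P, P') = sum_x P(x) log2(P(x) / P'(x)), in bits.

    Returns math.inf when P'(x) = 0 for some x with P(x) > 0.
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue                     # convention: 0 log 0 = 0
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf
        total += px * math.log2(px / qx)
    return total

# Two hypothetical distributions over the same three configurations.
p = {"a": 0.5, "b": 0.25, "c": 0.25}
q = {"a": 0.8, "b": 0.1, "c": 0.1}
print(kl_divergence(p, p))
print(kl_divergence(p, q), kl_divergence(q, p))
```

The two values on the last line differ, which illustrates that D is not symmetric in its arguments, so the order (target first, candidate second) matters in the D-projection.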



Some first results

◮ Let us consider the space $\mathcal{P}^t$ of probability distributions that
may be represented by an undirected tree t and a target
probability distribution P.
◮ Then the D-projection of P onto the space $\mathcal{P}^t$ is obtained
simply by directing t without introducing v-structures and by
assigning to each node i in t the conditional distribution
P(Xi | Xp(i)), where p(i) denotes the parent of i in the directed
version of t.
◮ (For the proofs see Pearl chapter 8).
◮ Furthermore, to minimize D over the space of trees, we can
simply use Chow-Liu based on information quantities derived
from P.


Comments about the Chow-Liu algorithm

◮ Given any probability distribution P and the means to
compute pairwise mutual informations and pairwise
conditional distributions in P, this algorithm allows us to infer
(in quadratic time) a tree-structured approximation of P.
◮ The resulting distribution P′ is the one, among all those that
factorize along UG trees, that is closest to P according to the
divergence D(P, P′).
◮ In particular, if P is Markov w.r.t. a UG tree, then the
resulting P′ will be equal to P.


Learning from observations drawn from P

◮ Let us consider a sample of observations $S = (x^1, \ldots, x^N)$
drawn i.i.d. from a target distribution P(x) (where each $x^i$ is
actually an n-tuple, having one element for each variable Xj).
◮ Given any other distribution P′ defined over the same set of
variables, we define the sample log-likelihood by
$$\mathrm{lL}(S, P') = \sum_{i=1}^{N} \log P'(x^i) = \log\left(\prod_{i=1}^{N} P'(x^i)\right).$$
◮ Given a space $\mathcal{P}$ of candidate distributions, a classical
criterion used in statistics is to choose the one which
maximizes the sample likelihood.


Learning from observations drawn from P


◮ Let us consider a configuration x, and denote by N(x) the
number of observations in our sample which correspond to
that configuration and by F(x) = N(x)/N their relative
frequency among the N observations.
◮ We can rewrite the log-likelihood of the sample w.r.t. P′ as
$$\mathrm{lL}(S, P') = N \sum_x F(x) \log P'(x).$$
◮ We then immediately see that maximizing the log-likelihood
of the sample by choosing P′ is equivalent to choosing P′ so
as to minimize the KL-divergence $\sum_x F(x) \log \frac{F(x)}{P'(x)}$. Indeed,
$$\sum_x F(x) \log \frac{F(x)}{P'(x)} = -N^{-1}\, \mathrm{lL}(S, P') + \sum_x F(x) \log F(x).$$
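
The identity can be checked numerically on a toy sample. Everything in the sketch below is hypothetical: a sample of eight observations of a single discrete variable and an arbitrary candidate P′. Natural logarithms are used on both sides.

```python
import math
from collections import Counter

# Hypothetical sample of N = 8 observations of a single discrete variable.
sample = ["a", "b", "a", "c", "a", "b", "a", "a"]
n_obs = len(sample)
freq = {x: c / n_obs for x, c in Counter(sample).items()}   # F(x) = N(x) / N

def log_likelihood(sample, model):
    """lL(S, P') = sum_i log P'(x^i)."""
    return sum(math.log(model[x]) for x in sample)

def kl_to_model(freq, model):
    """sum_x F(x) log(F(x) / P'(x))."""
    return sum(f * math.log(f / model[x]) for x, f in freq.items())

def check_identity(model):
    """Return both sides of the identity relating divergence and likelihood."""
    lhs = kl_to_model(freq, model)
    rhs = (-log_likelihood(sample, model) / n_obs
           + sum(f * math.log(f) for f in freq.values()))
    return lhs, rhs

candidate = {"a": 0.5, "b": 0.3, "c": 0.2}       # an arbitrary candidate P'
print(check_identity(candidate))
# Taking P' = F gives zero divergence, hence the largest log-likelihood.
print(log_likelihood(sample, freq) >= log_likelihood(sample, candidate))
```

Since the term in F(x) log F(x) does not depend on P′, the candidate with the smallest divergence is also the one with the largest log-likelihood.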


Learning a Markov tree approximation from a sample

◮ Goal: find a tree structure and a parameterization such that
the sample likelihood is maximal (over all possible trees and
parameterizations of them).
◮ Solution: use the sample to estimate mutual informations, by
replacing probabilities by relative frequencies derived from the
sample (see example on the blackboard), then apply Chow-Liu
to get the MWST, then choose a root, then use the sample again to
estimate the conditional probabilities needed for each vertex.
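
The whole pipeline can be sketched end to end on simulated data. This is an illustration under assumptions, not the procedure shown on the blackboard: the generating chain, the flip rate, the sample size and the fixed random seed are all hypothetical, and mutual informations are plug-in estimates in bits.

```python
import itertools
import math
import random
from collections import Counter

rng = random.Random(0)                   # fixed seed, for reproducibility only

def draw_chain(n_vars, flip):
    """One observation from a hypothetical binary chain X0 - X1 - ..."""
    x = [rng.randint(0, 1)]
    for _ in range(n_vars - 1):
        x.append(x[-1] ^ (rng.random() < flip))
    return tuple(x)

n_vars = 4
sample = [draw_chain(n_vars, 0.1) for _ in range(5000)]
n_obs = len(sample)

def empirical_mi(i, j):
    """Estimate of I(Xi;Xj): probabilities replaced by relative frequencies."""
    c_ij = Counter((x[i], x[j]) for x in sample)
    c_i = Counter(x[i] for x in sample)
    c_j = Counter(x[j] for x in sample)
    return sum((c / n_obs) * math.log2(c * n_obs / (c_i[a] * c_j[b]))
               for (a, b), c in c_ij.items())

# Maximum-weight spanning tree on the estimated mutual informations.
pairs = sorted(itertools.combinations(range(n_vars), 2),
               key=lambda e: empirical_mi(*e), reverse=True)
component = list(range(n_vars))
tree = []
for i, j in pairs:
    if component[i] != component[j]:     # the line i - j creates no cycle
        tree.append((i, j))
        old, new = component[j], component[i]
        component = [new if c == old else c for c in component]

# Root the tree at variable 0 (arbitrary choice) and direct it away from it.
parent, frontier = {0: None}, [0]
while frontier:
    u = frontier.pop()
    for a, b in tree:
        for p, c in ((a, b), (b, a)):
            if p == u and c not in parent:
                parent[c] = u
                frontier.append(c)

# Estimate P(X0) and each P(Xi | Xp(i)) by relative frequencies.
p_root = {v: k / n_obs for v, k in Counter(x[0] for x in sample).items()}
cpt = {}
for child, par in parent.items():
    if par is None:
        continue
    counts = Counter((x[par], x[child]) for x in sample)
    par_counts = Counter(x[par] for x in sample)
    cpt[child] = {(pv, cv): k / par_counts[pv] for (pv, cv), k in counts.items()}

print(sorted(tree), parent)
```

With a sample of this size the estimated tree coincides with the generating chain; with much smaller samples the estimated mutual informations are noisier and the recovered tree may differ.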


Polytrees

Explain the main differences between polytrees and trees.


Explain very briefly learning of polytree structures (François takes
care of the rest...)
