
Graphical Models

Christophe Ambroise
[email protected]
UEVE, UMR CNRS 8071

November 7, 2023

Introduction

3
Practical matters
Reference document

The lecture closely follows and largely borrows material from

"Machine Learning: A Probabilistic Perspective" (MLAPP) by Kevin P. Murphy, in particular Chapter 4 and Chapter 10: Directed graphical models (Bayes nets).


Practical matters
Evaluation

The course will be evaluated through a project in R or Python
(carried out by 2 or 3 students). Each project will be different and graded
on the basis of:

the code (1/3)
a presentation (15 minutes) (1/3)
a report (1/3)

5
What is a graphical model?
A graphical model is a probability distribution in a factorized form

There are two main types of representation of the factorization:

directed graphical model


undirected graphical model
Why the term graph?

Conditional independences between variables are well modeled via graphs

What is it useful for?


reduce the number of parameters
→ may be used for supervised or unsupervised approaches
allow exploratory data analysis by providing a simple graphical
representation
→ “approach causality”

7
What problems does it raise?
learning the parameters of a given factorized form
learning the structure of the graphical model (factorized form)

Directed Graphical Models


(Chapter 10 MLAPP)

10
Joint distribution
Observation

Suppose we observe multiple correlated variables, such as words


in a document, pixels in an image, or genes in a microarray.
Joint distribution

How can we compactly represent the joint distribution p(x|θ)?

11

Chain Rule
By the chain rule of probability, we can always represent a joint
distribution as follows, using any ordering of the variables:

$p(x_{1:V}) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1, x_2)\,p(x_4 \mid x_1, x_2, x_3) \cdots p(x_V \mid x_{1:V-1})$

The problem of the number of parameters

If each variable can take K states, the successive factors require $O(K) + O(K^2) + O(K^3) + \cdots$ values, so there are $O(K^V)$ parameters in the system.

12
Conditional independence
The key to efficiently representing large joint distributions is to
make some assumptions about conditional independence (CI).

X ⊥ Y |Z ⇔ p(X, Y |Z) = p(X|Z)p(Y |Z)

X is conditionally independent of Y given Z if, once you know Z, knowing Y does not help you guess X.

13

Conditional independence: an example


Setting: picking a card at random from a traditional deck of cards

1. if the full set of colors and values is present, then color ⊥ value

2. if all diamond face cards (⧫) are discarded from the deck, then
color ⊥̸ value but still color ⊥ value | Facecard

$P(\text{King} \mid \text{Facecard}) = 1/3 = P(\clubsuit \mid \text{Facecard})$

$P(\text{King}\clubsuit \mid \text{Facecard}) = 1/9 = P(\text{King} \mid \text{Facecard})\,P(\clubsuit \mid \text{Facecard})$

14
Simplification of chain rule
Simplification of the chain rule factorization

Let us assume that $x_{t+1} \perp x_{1:t-1} \mid x_t$ (first-order Markov assumption). Then

$p(x_{1:V}) = p(x_1) \prod_{t=2}^{V} p(x_t \mid x_{t-1})$

which requires only $K - 1 + K^2$ parameters.

15

Graphical models
A graphical model (GM) is a way to represent a joint distribution by
making Conditional Independence (CI) assumptions.

the nodes in the graph represent random variables,


and the (lack of) edges represent CI assumptions.

A better name for these models would in fact be "independence diagrams".

There are several kinds of graphical model, depending on whether


the graph is directed,
undirected,
or some combination of directed and undirected.

16

Example of directed and undirected graphical


model

17
Graph terminology
A graph G = (V, E) consists of

a set of nodes or vertices, $V = \{1, \ldots, V\}$, and

a set of edges, $E = \{(s, t) : s, t \in V\}$.


Adjacency matrix

We can represent the graph by its adjacency matrix, in which we


write G(s, t) = 1 to denote (s, t) ∈ E , that is, if s → t is an
edge in the graph. If G(s, t) = 1 iff G(t, s) = 1, we say the
graph is undirected, otherwise it is directed.

We usually assume G(s, s) = 0, which means there are no self-loops.
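As a small illustration (a hypothetical three-node DAG, not one from the lecture), the adjacency matrix makes the parent and child sets easy to read off in R:

## Hypothetical 3-node DAG: 1 -> 2, 1 -> 3, 2 -> 3
G <- matrix(0, 3, 3)
G[1, 2] <- 1; G[1, 3] <- 1; G[2, 3] <- 1

pa <- function(s) which(G[, s] == 1)   # parents: nodes feeding into s
ch <- function(s) which(G[s, ] == 1)   # children: nodes fed by s

pa(3)            # 1 2
ch(1)            # 2 3
isSymmetric(G)   # FALSE, so the graph is directed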

Graph terminology
Parent: For a directed graph, the parents of a node is the set of
all nodes that feed into it: pa(s) ≜ {t : G(t, s) = 1}.
Child: For a directed graph, the children of a node is the set of
all nodes that feed out of it: ch(s) ≜ {t : G(s, t) = 1}.
Family: For a directed graph, the family of a node is the node
and its parents, f am(s) = s ∪ pa(s).
Root: For a directed graph, a root is a node with no parents.
Leaf: For a directed graph, a leaf is a node with no children.
Ancestors: For a directed graph, the ancestors are the parents,
grand-parents, etc of a node. That is, the ancestors of t is the
set of nodes that connect to t via a trail:
anc(t) ≜ {s : s ⇝ t}.

Descendants: For a directed graph, the descendants of a node are its
children, grand-children, etc. That is, the descendants of s are the set of
nodes that can be reached from s via a trail: desc(s) ≜ {t : s ⇝ t}.

Graph terminology
Clique: For an undirected graph, a clique is a set of nodes that
are all neighbors of each other.
A maximal clique is a clique which cannot be made any larger
without losing the clique property.
Neighbors: For any graph, we define the neighbors of a node as
the set of all immediately connected nodes,
nbr(s) ≜ {t : G(s, t) = 1 ∨ G(t, s) = 1}. For an undirected
graph, we write s ∼ t to indicate that s and t are neighbors.


Degree: The degree of a node is the number of neighbors. For
directed graphs, we speak of the in-degree and out-degree,
which count the number of parents and children.
Cycle or loop: For any graph, we define a cycle or loop to be a
series of nodes such that we can get back to where we started
by following edges.
DAG: A directed acyclic graph or DAG is a directed graph with no
directed cycles.

Directed graphical models


A directed graphical model or DGM is a GM whose graph is a
DAG.
These are more commonly known as Bayesian networks
These models are also called belief networks
Finally, these models are sometimes called causal networks,
because the directed arrows are sometimes interpreted as
representing causal relations.

21
Topological ordering of DAGs
nodes can be ordered such that parents come before children
it can be constructed from any DAG
The ordered Markov property

a node only depends on its immediate parents

$x_s \perp x_{\mathrm{pred}(s)\setminus \mathrm{pa}(s)} \mid x_{\mathrm{pa}(s)}$

where pa(s) are the parents of node s, and pred(s) are the
predecessors of node s in the ordering.
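As a quick illustration (the same kind of hypothetical three-node DAG; the igraph package is assumed to be available), a topological ordering can be computed directly:

library(igraph)
## Hypothetical DAG: 1 -> 2, 1 -> 3, 2 -> 3
G <- matrix(c(0, 1, 1,
              0, 0, 1,
              0, 0, 0), 3, 3, byrow = TRUE)
g <- graph_from_adjacency_matrix(G, mode = "directed")
is_dag(g)      # TRUE: an ordering with parents before children exists
topo_sort(g)   # one such topological ordering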

22

General form of factorization


V

p(x 1:V ) = ∏ p(x t |x pa(t) )

t=1

if the Conditional Independence assumptions encoded in DAG G


are correct

23
Examples

25

Naive Bayes classifiers


$p(y, \mathbf{x}) = p(y) \prod_{j} p(x_j \mid y)$

The naive Bayes assumption is rather naive, since it assumes the


features are conditionally independent.
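As a minimal sketch (hypothetical CPTs, not taken from the lecture), the naive Bayes factorization can be evaluated directly:

## Hypothetical CPTs: binary class y and two binary features x1, x2
p_y  <- c(0.6, 0.4)                          # p(y = 0), p(y = 1)
p_x1 <- rbind(c(0.9, 0.1), c(0.3, 0.7))      # rows: y = 0, 1; cols: x1 = 0, 1
p_x2 <- rbind(c(0.8, 0.2), c(0.5, 0.5))      # rows: y = 0, 1; cols: x2 = 0, 1

## joint probability under the factorization p(y) p(x1 | y) p(x2 | y)
p_joint <- function(y, x1, x2)
  p_y[y + 1] * p_x1[y + 1, x1 + 1] * p_x2[y + 1, x2 + 1]

p_joint(1, 1, 0)   # p(y = 1) p(x1 = 1 | y = 1) p(x2 = 0 | y = 1) = 0.4 * 0.7 * 0.5 = 0.14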

26
Markov and hidden Markov models
Markov chain

$p(x_{1:T}) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_2)\cdots = p(x_1) \prod_{t=2}^{T} p(x_t \mid x_{t-1})$

Hidden Markov Model

The hidden variables often represent quantities of interest, such as


the identity of the word that someone is currently speaking. The
observed variables are what we measure, such as the acoustic
waveform.

27

Directed Gaussian graphical models


Consider a DGM where all the variables are real-valued, and all the
conditional probability distributions have the following form:

$p(x_t \mid x_{\mathrm{pa}(t)}) = \mathcal{N}\big(x_t \mid \mu_t + \mathbf{w}_t^{\mathsf T} x_{\mathrm{pa}(t)},\ \sigma_t^2\big)$

Directed GGM (Gaussian Bayes net)

p(x) = N (x|μ, Σ)

28
Directed GGM (Gaussian Bayes net)
For convenience, let us rewrite the CPDs as

$x_t = \mu_t + \sum_{s \in \mathrm{pa}(t)} w_{ts}\,(x_s - \mu_s) + \sigma_t z_t$

where $z_t \sim \mathcal{N}(0, 1)$, $\sigma_t$ is the conditional standard deviation of
$x_t$ given its parents, $w_{ts}$ is the strength of the $s \to t$ edge, and $\mu_t$
is the local mean.

Mean

The global mean is just the concatenation of the local means,
$\mu = (\mu_1, \ldots, \mu_D)^{\mathsf T}$.
29

Directed GGM (Gaussian Bayes net)


Covariance matrix

In matrix form, $(\mathbf{x} - \mu) = W(\mathbf{x} - \mu) + S\mathbf{z}$,

where $S \triangleq \mathrm{diag}(\sigma_1, \ldots, \sigma_D)$. Let $\mathbf{e} \triangleq S\mathbf{z} = (I - W)(\mathbf{x} - \mu)$.

We have

$\Sigma = \mathrm{cov}(\mathbf{x} - \mu) = \mathrm{cov}\big((I - W)^{-1}\mathbf{e}\big) = \mathrm{cov}(U S \mathbf{z}) = U S^2 U^{\mathsf T}$

where $U = (I - W)^{-1}$.

30
Examples
Two extreme cases

Isolated vertices: naive Bayes case, where $\Sigma = S^2$ is diagonal; p vertices, no edges

Fully connected graph: p vertices, p(p − 1)/2 directed edges

Click to go to the exercise on Directed GGM

31

Learning

33
Learning from complete data (with known
graph structure)
If all the variables are fully observed in each case, so there is no
missing data and there are no hidden variables, we say the data is
complete.

$p(\mathcal{D} \mid \theta) = \prod_{i=1}^{N} p(\mathbf{x}_i \mid \theta) = \prod_{i=1}^{N} \prod_{t \in V} p(x_{it} \mid \mathbf{x}_{i,\mathrm{pa}(t)}, \theta_t)$

The likelihood decomposes according to the graph structure.

Click to go to the Sprinkler exercise


Discrete distribution

$N_{tck} \triangleq \sum_{i=1}^{N} \mathbb{I}(x_{i,t} = k,\ x_{i,\mathrm{pa}(t)} = c)$

and thus $\hat{p}(x_t = k \mid x_{\mathrm{pa}(t)} = c) = \dfrac{N_{tck}}{\sum_{k'} N_{tck'}}$.

Of course, the MLE suffers from the zero-count problem.
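As a minimal sketch (simulated data for a single binary node with one binary parent; names and probabilities are hypothetical), the MLE is just a normalized contingency table:

## MLE of p(B = k | A = c) from complete data, by counting
set.seed(1)
A <- rbinom(200, 1, 0.4)                        # parent
B <- rbinom(200, 1, ifelse(A == 1, 0.8, 0.2))   # child, hypothetical CPT
N_ck  <- table(A, B)                            # counts N_tck for node t = B
theta <- prop.table(N_ck, margin = 1)           # normalize over k within each parent state c
theta   # rows close to (0.8, 0.2) for A = 0 and (0.2, 0.8) for A = 1
## rarely (or never) observed parent configurations give zero counts,
## which is the zero-count problem; adding Dirichlet pseudo-counts fixes it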

34
Conditional independence
properties of DGMs

36

Diverging edges (fork)


With the DAG

A ← C → B

we have

A ⊥̸ B

but

A ⊥ B|C

Exercise
37
Chain (Head - tail)
With the DAG

A → C → B

we have

A ⊥̸ B

but

A ⊥ B|C

Exercise
38

Converging edges (V) and collider


With the DAG

A → C ← B

we have

A ⊥ B

but

A ⊥̸ B | C

Exercise: show it
Independence map

a directed graph G is an I-map (independence map) for p, or that p


is Markov wrt G,

iff I (G) ⊆ I (p), where I (p) is the set of all CI statements that
hold for distribution p.

This allows us to use the graph as a safe proxy for p


Minimal I-map

The fully connected graph is an I-map of all distributions, so we are
usually interested in a minimal I-map: an I-map from which no edge can be
removed without it ceasing to be an I-map.


d-separation
The “d” in d-separation and d-connection stands for
dependence.
d-separation is related to the ideas of active path and active vertex
on a path
a path is active if it carries information, or dependence.
Thus, when the conditioning set is empty, only paths that
correspond to "causal connection" are active (creating
dependence).

40
d-separation: example of Pearl (1988)
two independent causes of your car refusing to start: having no
gas and having a dead battery.

dead battery –> car won’t start <– no gas

Being told that the battery is charged tells you nothing about
whether there is gas.
Being told that the battery is charged after being told that
the car won't start tells you that the gas tank must be empty.

So independent causes are made dependent by conditioning on a
common effect, which in the directed graph representing the
causal structure is the same as conditioning on a collider.

d-separation
When a vertex is in the conditioning set, its status with respect to
being active or inactive flip-flops. If we condition on C:

Are variables A and B d-separated by C (in boldface)?

1. A –> C –> B: inactive
2. A <– C <– B: inactive
3. A <– C –> B: inactive
4. A –> C <– B: C is a collider and is thus inactive when the
conditioning set is empty, so conditioning on C makes it
active (produces dependence)

42
Formal d-separation definition
an undirected path P is d-separated by a set of nodes E iff at least
one of the following conditions hold:

P contains a chain, s → m → t or s ← m ← t where


m ∈ E

P contains a fork, s ← m → t where m ∈ E

P contains a collider, s → m ← t, where m ∉ E and neither is
any descendant of m.

43

Alternative formulation of d-connection:


If G is a directed graph in which X, Y and E are disjoint sets of
vertices, then X and Y are d-connected by E in G if and only if there
exists an undirected path P between some vertex in X and some
vertex in Y such that

for every collider C on P, either C or a descendant of C is in E
(active path),
and no non-collider on P is in E (no inactive path).

X and Y are d-separated by E in G if and only if they are not d-
connected by E in G (all paths are inactive).
Independence requires all possible paths to be inactive, whereas
dependence requires only one leak (one active path).

see https://www.youtube.com/watch?v=yDs_q6jKHb0 for examples

44

d-separation versus conditional independence


a set of nodes A is d-separated from a different set of nodes B
given a third observed set E iff each undirected path from every
node a ∈ A to every node b ∈ B is d-separated by E:

x A ⊥ G x B |x E ⇔ A is d-separated from B given E
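As a minimal sketch (assuming the bnlearn package, already used in the exercises below, and its dsep() function), the collider A → C ← B can be checked directly:

library(bnlearn)
nodes <- c("A", "B", "C")
g   <- empty.graph(nodes)
adj <- matrix(0L, 3, 3, dimnames = list(nodes, nodes))
adj["A", "C"] <- 1L   # A -> C
adj["B", "C"] <- 1L   # B -> C, so C is a collider
amat(g) <- adj

dsep(g, "A", "B")        # TRUE: A and B are marginally d-separated
dsep(g, "A", "B", "C")   # FALSE: conditioning on the collider opens the path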

45
Consequences of d-separation

Directed local Markov property

From the d-separation criterion, one can conclude that


t ⊥ nd(t)∖pa(t)|pa(t) where the non-descendants of a node

nd(t) are all the nodes except for its descendants

46

Consequences of d-separation
Ordered Markov property

A special case of directed local Markov property is when we only


look at predecessors of a node according to some topological
ordering. We have t ⊥ pred(t)∖pa(t)|pa(t)

47
Markov blanket
The set of nodes that renders a node t conditionally independent of
all the other nodes in the graph is called t’s Markov blanket

mb(t) ≜ pa(t) ∪ ch(t) ∪ copa(t)

The Markov blanket of a node in a DGM is equal to the parents, the


children, and the co-parents.

48

Markov blanket
To understand the Markov blanket, one can start from the local
Markov property, which blocks the dependence on the non-descendants
by conditioning on the parents.

To further block the paths through the descendants of t, one has to

condition on the children of t.
But conditioning on the children opens the paths to the
co-parents.
Thus one also needs to condition on the co-parents to block all
paths (see the sketch below).
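As a minimal sketch (a hypothetical four-node DAG; bnlearn's mb() is assumed to return the Markov blanket of a node in a network structure):

library(bnlearn)
nodes <- c("A", "B", "C", "D")
g   <- empty.graph(nodes)
adj <- matrix(0L, 4, 4, dimnames = list(nodes, nodes))
adj["A", "C"] <- 1L   # A -> C
adj["B", "C"] <- 1L   # B -> C: B is a co-parent of A
adj["C", "D"] <- 1L   # C -> D
amat(g) <- adj

mb(g, "A")   # expected "B" "C": the child C and the co-parent B (A has no parents)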

49
Graphical Model Learning
Structure (chapter 26
MLAPP)

51

Introduction
Two main applications of structure learning:

1. knowledge discovery (requires a graph topology)


2. density estimation (requires a fully specified model).
Main obstacle

the number of possible graphs is exponential in the number of
nodes: a simple upper bound is $O(2^{V(V-1)/2})$.

52
Relevance network
A relevance network is a way of visualizing the pairwise mutual
information between multiple random variables:

we simply choose a threshold α


draw an edge from node i to node j if I(X i ; X j ) > α

Major problem

the graphs are usually very dense,


most variables are dependent on most other variables, even
after thresholding the MIs.
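As a minimal sketch (simulated Gaussian data and a hypothetical threshold; the mutual information is computed with the Gaussian formula given on the next slide):

set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- x1 + rnorm(n); x3 <- x2 + rnorm(n); x4 <- rnorm(n)
X  <- cbind(x1, x2, x3, x4)

rho <- cor(X)
MI  <- -0.5 * log(1 - rho^2)   # Gaussian mutual information
diag(MI) <- 0
alpha <- 0.1                   # hypothetical threshold
A <- (MI > alpha) * 1          # adjacency matrix of the relevance network
A

Even though x1 and x3 are only indirectly linked through x2, their mutual information also exceeds the threshold, which illustrates why relevance networks tend to be dense.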

53

Gaussian case
In the Gaussian case, $I(X_i; X_j) = -\frac{1}{2}\log(1 - \rho_{ij}^2)$, where
$\rho_{ij}$ is the correlation coefficient, so we are essentially visualizing $\Sigma$;

this is known as the covariance graph.


Exercise: Gaussian mutual information

Show the previous statement

54
Dependency networks

55

Learning tree structures


Since the problem of structure learning for general graphs is NP-
hard (Chickering 1996), we start by considering the special case of
trees. Trees are special because we can learn their structure
efficiently

56
Joint Distribution associated to a directed tree
A directed tree, with a single root node r, defines a joint distribution
as follows

$p(x \mid T) = \prod_{t \in V} p(x_t \mid x_{\mathrm{pa}(t)})$

The distribution is a product over the edges and the choice of root
does not matter
Symmetrization

To make the model more symmetric, it is preferable to use an
undirected tree:

$p(x \mid T) = \prod_{t \in V} p(x_t) \prod_{(s,t) \in E} \frac{p(x_s, x_t)}{p(x_s)\,p(x_t)}$

57
Chow-Liu algorithm for finding the ML tree
structure (1968)
Goal: the Chow–Liu algorithm constructs the tree distribution
approximation that has the minimum Kullback–Leibler divergence
to the actual distribution (in practice, the empirical distribution, which
maximizes the data likelihood).
Principle (see the sketch after this list)

1. Compute the weight I(s, t) of each (possible) edge (s, t)
2. Find a maximum weight spanning tree (MST)
3. Give directions to the edges of the MST by choosing a root node
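A minimal sketch of steps 1 and 2 (simulated Gaussian data; the igraph package is assumed, and the maximum weight spanning tree is obtained by running mst() on negated weights):

library(igraph)
set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- x1 + rnorm(n); x3 <- x2 + rnorm(n)
X  <- cbind(x1, x2, x3)

## 1. edge weights: pairwise (Gaussian) mutual information
MI <- -0.5 * log(1 - cor(X)^2)
diag(MI) <- 0

## 2. maximum weight spanning tree = minimum spanning tree on negated weights
g    <- graph_from_adjacency_matrix(MI, mode = "undirected", weighted = TRUE)
tree <- mst(g, weights = -E(g)$weight)
as_edgelist(tree)   # expected: x1 -- x2 and x2 -- x3

## 3. orient the edges away from an arbitrarily chosen root to obtain a DAG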

58

Chow-Liu algorithm for finding the ML tree


structure (1968)
log-likelihood

$\log P(\mathcal{D} \mid \theta, T) = \sum_{t}\sum_{k} N_{tk} \log p(x_t = k) + \sum_{(s,t) \in E}\sum_{j,k} N_{stjk} \log \frac{p(x_s = j, x_t = k)}{p(x_s = j)\,p(x_t = k)}$

thus $\hat{p}(x_t = k) = \dfrac{N_{tk}}{N}$ and $\hat{p}(x_s = j, x_t = k) = \dfrac{N_{stjk}}{N}$.

Mutual information of a pair of variables

$I(s, t) = \sum_{j,k} \hat{p}(x_s = j, x_t = k) \log \frac{\hat{p}(x_s = j, x_t = k)}{\hat{p}(x_s = j)\,\hat{p}(x_t = k)}$

The Kullback–Leibler divergence

Plugging the MLEs into the normalized log-likelihood gives

$\frac{\log P(\mathcal{D} \mid \hat{\theta}_{ML}, T)}{N} = \sum_{t}\sum_{k} \hat{p}(x_t = k) \log \hat{p}(x_t = k) + \sum_{(s,t) \in E} I(s, t)$

so maximizing the likelihood over tree structures (equivalently, minimizing the KL divergence to the empirical distribution) amounts to maximizing the sum of the mutual informations of the edges.

59

Chow-Liu algorithm
There are several algorithms for finding a max spanning tree
(MST). The two best known are

Prim's algorithm
Kruskal's algorithm

Both can be implemented to run in $O(E \log V)$ time, where
$E = V^2$ is the number of edges and V is the number of nodes.

60
Exercise: Gaussian Chow-Liu
1. Show that in the Gaussian case, $I(s, t) = -\frac{1}{2}\log(1 - \rho_{st}^2)$,
where $\rho_{st}$ is the correlation coefficient (see Exercise 2.13,
Murphy)
2. Given a realization of n Gaussian vectors of size p, find the ML
tree-structured covariance matrix using the Chow-Liu algorithm.

61

TAN: Tree-Augmented Naive Bayes


Naive Bayes with Chow-Liu
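As a minimal sketch (assuming bnlearn's tree.bayes() implements TAN and that its built-in discrete data set learning.test is available; the choice of "A" as the class node is arbitrary):

library(bnlearn)
data(learning.test)                                # built-in discrete data set
tan <- tree.bayes(learning.test, training = "A")   # naive Bayes on class "A" + Chow-Liu tree over the features
arcs(tan)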

62
Mixtures of trees
A single tree is rather limited in its expressive power.
Learning a mixture of trees (Meila and Jordan 2000), where
each mixture component may have a different tree topology, is
an alternative.
Another option is integrating out over all possible trees;
this can be done in $O(V^3)$ time using the matrix tree theorem.

63

Learning DAG structures

Three DAGs. G1 and G3 are Markov equivalent, G2 is not.


Graphs are Markov equivalent

if they encode the same set of CI assumptions


64
Learning DAG structures
An ill posed problem

when we learn the DAG structure from data, we will not be able to
uniquely identify all of the edge directions

we can learn DAG structure “up to Markov equivalence”.

Do not read too much into the meaning of particular edge


orientations, since we can often change them without changing
the model in any observable way.

65

Exact structural inference


Exact structural inference is based on the computation of exact
posterior over graphs, p(G|D).

It requires:

the computation of the likelihood p(D|G)


the computation of the prior p(G)

This allows us to compare different graphs in terms of their
posterior probability, and possibly to find the MAP graph if the search space is small.

66
Exact structural inference (categorical case)
Let $x_{it} \in \{1, \ldots, K_t\}$ be the value of node t in case i,
where $K_t$ is the number of states for node t.

$\theta_{tck} \triangleq p(x_t = k \mid x_{\mathrm{pa}(t)} = c)$, for $k = 1:K_t$ and $c = 1:C_t$,
where $C_t$ is the number of parent combinations (possible conditioning cases).

Let $d_t = \dim(\mathrm{pa}(t))$ be the degree or fan-in of node t, so that
$C_t = K^{d_t}$.
67

Exact structural inference (categorical case)


Prior

$p(\theta) = \prod_{t=1}^{V} p(\theta_t) = \prod_{t=1}^{V} \prod_{c=1}^{C_t} p(\theta_{tc})$

where $C_t$ is the number of parent combinations (possible conditioning cases).

Likelihood

$p(\mathcal{D} \mid G, \theta) = \prod_{t=1}^{V} \prod_{c=1}^{C_t} \prod_{k=1}^{K_t} \theta_{tck}^{N_{tck}}$

where $N_{tck}$ is the number of times node t is in state k while its parents
are in configuration c.

68

Exact structural inference (categorical case)


Choosing a Dirichlet prior $p(\theta_{tc}) = \mathrm{Dir}(\theta_{tc} \mid \alpha_{tc})$ allows us to
compute the marginal likelihood $p(\mathcal{D} \mid G)$ in closed form:

$p(\mathcal{D} \mid G) = \prod_{t=1}^{V} \prod_{c=1}^{C_t} \frac{B(N_{tc} + \alpha_{tc})}{B(\alpha_{tc})}$

where $B(\cdot)$ is the multivariate Beta function applied to the count vectors $N_{tc} = (N_{tck})_k$ and $\alpha_{tc} = (\alpha_{tck})_k$.

Local scoring

For node t and its parents,

$\mathrm{score}(N_{t,\mathrm{pa}(t)}) \triangleq \prod_{c=1}^{C_t} \frac{B(N_{tc} + \alpha_{tc})}{B(\alpha_{tc})}$

The marginal likelihood factorizes according to the graph structure.
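As a minimal sketch (hypothetical data for a single binary node with one binary parent, and a symmetric Dirichlet prior), the local log-score can be computed with lgamma():

## log of the multivariate Beta function: log B(a) = sum(lgamma(a)) - lgamma(sum(a))
log_mbeta <- function(a) sum(lgamma(a)) - lgamma(sum(a))

## local log-score of node t (values x_t) given its parent values x_pa
local_log_score <- function(x_t, x_pa, alpha = 1) {
  lev <- sort(unique(x_t))
  K   <- length(lev)
  sum(sapply(split(x_t, x_pa), function(xc) {                # loop over parent configurations c
    N_tc <- tabulate(factor(xc, levels = lev), nbins = K)    # counts N_tck for this configuration
    log_mbeta(N_tc + alpha) - log_mbeta(rep(alpha, K))
  }))
}

set.seed(1)
pa <- rbinom(100, 1, 0.5)
xt <- rbinom(100, 1, ifelse(pa == 1, 0.9, 0.2))
local_log_score(xt, pa)   # log of the local marginal-likelihood factor for this node

bnlearn also provides a score() function with a BDe-type option that computes this for a whole network, though the exact argument names should be checked in its documentation.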

69

Setting the prior


How should we set the hyper-parameters α tck ?

Jeffreys prior of the form α tck = 1/2 violates a property called


likelihood equivalence
This property says that if G1 and G2 are Markov equivalent,
they should have the same marginal likelihood
BDe prior

Geiger and Heckerman (1997) proved that, for complete


graphs, the only prior that satisfies likelihood equivalence and
parameter independence is the Dirichlet prior, where the pseudo
counts have the form
$\alpha_{tck} = \alpha\, p_0(x_t = k, x_{\mathrm{pa}(t)} = c)$

where α > 0 is called the equivalent sample size, and p 0 is some


prior joint probability distribution. This is called the BDe prior
(Bayesian Dirichlet likelihood equivalent).

70

Example of Exact structural inference


(Neapolitan 2003, p.438)

71
Scaling up to larger graphs
The main challenge in computing the posterior over DAGs is that
there are so many possible graphs.

Consequently, we must settle for finding a locally optimal MAP


DAG.
Popular solution: Greedy hill climbing

72

Learning causal DAGs


Causal models

predict the effects of interventions to, or manipulations of, a


system.
Causal claims are inherently stronger, yet more useful, than
purely associative claims
Causal interpretation of DAGs

We interpret A → B in a DAG to mean that "A directly causes B", so if we
manipulate A, then B will change.
Known as the causal Markov assumption.

73
Intervention
Perfect intervention

represents the act of setting a variable to some known value


A real world example of such a perfect intervention is a gene
knockout experiment
do calculus notation

We write $do(X_i = x_i)$ to denote the event that we set $X_i$ to $x_i$.

A causal model makes inferences of the form
$p(x \mid do(X_i = x_i))$,

which is different from making inferences of the form $p(x \mid X_i = x_i)$.

74

Observing versus doing


Consider a 2 node DGM S → Y

S = 1 if you smoke
S = 0 otherwise,
Y = 1 if you have yellow-stained fingers
Y = 0 otherwise.

If I observe you have yellow fingers, I am licensed to infer that you


are probably a smoker (since nicotine causes yellow stains):

p(S = 1|Y = 1) > p(S = 1)


If I intervene and paint your fingers yellow, I am no longer licensed
to infer this, since I have disrupted the normal causal mechanism.
Thus

p(S = 1|do(Y = 1)) = p(S = 1)

75

Graph surgery

One way to model perfect interventions is to use graph surgery:

represent the joint distribution by a DGM,
cut the arcs coming into any nodes that were set by intervention.
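As a minimal sketch on the two-node example S → Y from the previous slide (bnlearn is assumed, as in the exercises below):

library(bnlearn)
nodes <- c("S", "Y")
g   <- empty.graph(nodes)
adj <- matrix(0L, 2, 2, dimnames = list(nodes, nodes))
adj["S", "Y"] <- 1L          # smoking -> yellow fingers
amat(g) <- adj

## do(Y = 1): cut every arc coming into the intervened node Y
adj_do <- adj
adj_do[, "Y"] <- 0L
g_do <- empty.graph(nodes)
amat(g_do) <- adj_do

arcs(g)      # S -> Y
arcs(g_do)   # no arcs left: under do(Y = 1), S carries no information about Y

bnlearn also appears to provide a mutilated() function for this operation on fitted networks; that is worth checking before relying on it.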
76
Exercises on Directed
Graphical Models

78

Exercise: Gaussian Bayesian Network

Data

Let us consider the following graph $x_1 \to x_2 \to x_3$ where

$E[x_1] = b_1$, $E[x_2] = b_2$, $E[x_3] = b_3$
$x_1 = b_1 + z_1$
$x_2 = b_2 + (x_1 - b_1) + z_2$
$x_3 = b_3 + \frac{1}{2}(x_2 - b_2) + z_3$
$\sigma_1 = \sigma_2 = \sigma_3 = 1$

Problem

79
Exercise: Directed GGM

$\mu = (0, 1, 2)^{\mathsf T}$

$\mathrm{diag}(S) = (1, 1, 1)$

$W = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1/2 & 0 \end{pmatrix}$

80

Exercise: Directed GGM

We can observe that the precision matrix has the same support as W.

n=1000
mu=c(0,1,2)
sigma=c(1,1,1)
W=matrix(c(0,1,0,0,0,1/2,0,0,0),3,3)   # strictly lower-triangular weight matrix
U=solve(diag(rep(1,3))-W)              # U = (I - W)^{-1}
S=diag(sigma)
Sigma=U%*%S^2%*%t(U)                   # Sigma = U S^2 U^T
solve(Sigma)                           # precision matrix
[,1] [,2] [,3]
[1,] 2 -1.00 0.0
[2,] -1 1.25 -0.5
[3,] 0 -0.50 1.0

81
Exercise: Directed GGM
First solution (direct)

library(mvtnorm)
Xprime=rmvnorm(n,mean=c(0,1,2),sigma=Sigma)   # sample directly from N(mu, Sigma)

Second solution (constructive)

X=matrix(0,n,3)
Z=matrix(rnorm(n*3),n,3)
for (i in 1:n)
  for (j in 1:3)
    X[i,j]=mu[j]+sigma[j]*Z[i,j] + sum(W[j,]*(X[i,]-mu))   # ancestral sampling in topological order
Click to go Back to Lecture

82

Sprinkler Exercise
Let us define the structure of the network

library(bnlearn)
library(visNetwork)
variables<-c("Nuageux","Arrosage","Pluie","HerbeMouillee")
net<-empty.graph(variables)
adj = matrix(0L, ncol = 4, nrow = 4, dimnames=list(variables, variables))
adj["Nuageux","Arrosage"]<-1
adj["Nuageux","Pluie"]<-1
adj["Arrosage","HerbeMouillee"]<-1
adj["Pluie","HerbeMouillee"]<-1
amat(net)=adj

83
Sprinkler Exercise

#plot.network(net) # for a nice html plot
plot(net)

84

Sprinkler Exercise
Simulate a sample according to the model

85
Basic simulation using conditional
probability tables
Function for one event (one line of the data frame); the rain probability
given "not cloudy" was truncated in the source and is assumed here to be 0.2.

NAPHM1<-function(i){
  # draw one observation (Nuageux, Arrosage, Pluie, HerbeMouillee) from the CPTs
  N<-rbinom(1,size = 1,prob = 1/2)
  if (N==1) {A<-rbinom(1,size = 1,prob = 0.1)} else {A<-rbinom(1,size = 1,prob = 0.5)}
  if (N==1) {P<-rbinom(1,size = 1,prob = 0.8)} else {P<-rbinom(1,size = 1,prob = 0.2)}  # 0.2 assumed
  if (A+P==0) {
    HM<-rbinom(1,size = 1,prob = 0.1)
  } else if (A+P==1) {
    HM<-rbinom(1,size = 1,prob = 0.9)
  } else {
    HM<-rbinom(1,size = 1,prob = 0.99)
  }
  as.logical(c(N,A,P,HM))
}

86

Basic simulation using conditional
probability tables

n<-1000
X<-data.frame(t(sapply(1:n,NAPHM1)))
names(X)<-c("Nuageux","Arrosage","Pluie","HerbeMouillee")
head(X)
Nuageux Arrosage Pluie HerbeMouillee
1 TRUE FALSE TRUE TRUE
2 TRUE FALSE TRUE TRUE
3 TRUE FALSE TRUE TRUE
4 FALSE TRUE FALSE TRUE
5 FALSE TRUE FALSE FALSE
6 TRUE TRUE TRUE TRUE

87
Learning the parameters

mean(X$Nuageux) -> pNuageux                                 # p(Nuageux)
lapply(sousTableauxNuageux<-split(X,X$Nuageux),             # p(Arrosage | Nuageux)
       function(XsousTableau){mean(XsousTableau$Arrosage)})
lapply(sousTableauxNuageux<-split(X,X$Nuageux),             # p(Pluie | Nuageux)
       function(XsousTableau){mean(XsousTableau$Pluie)})
lapply(sousTableauxNuageux<-split(X,X$Arrosage + X$Pluie),  # p(HerbeMouillee | Arrosage + Pluie)
       function(XsousTableau){mean(XsousTableau$HerbeMouillee)})

Back to lecture

88

Exercises: directed Graphical Models


Joint distribution and graphical decomposition (Bishop 8.3)

The joint distribution over three binary variables

89
Exercises: directed Graphical Models
Bishop 8.3

Consider three binary variables a, b, c ∈ {0, 1} having the joint
distribution given in the table above. Show by direct evaluation that
this distribution has the property that a and b are marginally
dependent, so that p(a, b) ≠ p(a)p(b), but that they become
independent when conditioned on c, so that
p(a, b ∣ c) = p(a ∣ c)p(b ∣ c) for both c = 0 and c = 1.

90

Exercises: directed Graphical Models


Bishop 8.4

Show by direct evaluation that p(a, b, c) = p(a)p(c ∣ a)p(b ∣ c) .


Draw the corresponding directed graph.

91
Local Markov Property
directed local Markov property

t ⊥ nd(t)∖pa(t)|pa(t) where the non-descendants of a node


nd(t) are all the nodes except for its descendants

With the topological ordering we have

$p(x_t \mid x_1, \cdots, x_{t-1}) = p(x_t \mid x_{\mathrm{nd}(t)}) = p(x_t \mid x_{\mathrm{pa}(t)})$

Thus

$p(x_t, x_{\mathrm{nd}(t)\setminus\mathrm{pa}(t)} \mid x_{\mathrm{pa}(t)}) = p(x_{\mathrm{nd}(t)\setminus\mathrm{pa}(t)} \mid x_{\mathrm{pa}(t)})\; p(x_t \mid x_{\mathrm{pa}(t)}, x_{\mathrm{nd}(t)\setminus\mathrm{pa}(t)})$

$= p(x_{\mathrm{nd}(t)\setminus\mathrm{pa}(t)} \mid x_{\mathrm{pa}(t)})\; p(x_t \mid x_{\mathrm{pa}(t)})$

92

Gaussian mutual information

$I(s, t) = E\left[\log \frac{p(x_s, x_t)}{p(x_s)\,p(x_t)}\right]$

$= -\frac{1}{2}\log \frac{|\Sigma|}{|\mathrm{diag}(\sigma_1^2, \sigma_2^2)|} - \frac{1}{2} E\left[\mathbf{z}^{\mathsf T}\Sigma^{-1}\mathbf{z} - \mathbf{z}^{\mathsf T}\begin{pmatrix}1/\sigma_1^2 & 0\\ 0 & 1/\sigma_2^2\end{pmatrix}\mathbf{z}\right]$

$= -\frac{1}{2}\log(1 - \rho^2) - \frac{1}{2}\,\mathrm{trace}\left(E[\mathbf{z}\mathbf{z}^{\mathsf T}]\left(\Sigma^{-1} - \begin{pmatrix}1/\sigma_1^2 & 0\\ 0 & 1/\sigma_2^2\end{pmatrix}\right)\right)$

$= -\frac{1}{2}\log(1 - \rho^2) - \frac{1}{2}\,\mathrm{trace}\left(I - \begin{pmatrix}1 & \sigma_{12}/\sigma_2^2\\ \sigma_{12}/\sigma_1^2 & 1\end{pmatrix}\right)$

$= -\frac{1}{2}\log(1 - \rho^2)$

where $\mathbf{z} = \begin{pmatrix}x_s\\ x_t\end{pmatrix} - \begin{pmatrix}\mu_s\\ \mu_t\end{pmatrix}$ and $E[\mathbf{z}\mathbf{z}^{\mathsf T}] = \Sigma$; note that
$|\Sigma| = \sigma_1^2\sigma_2^2(1-\rho^2)$ and that the trace in the second-to-last line equals $2 - 2 = 0$.

93

KL-divergence
Maximizing log-likelihood is equivalent to minimizing KL-
divergence

94
Projects

96

List 2023
Explain a concept and illustrate with an example:
10 minutes OBS recording
Commented Code Notebook (not a full report).

1. Simulation of images using a Strauss model (Markov Random
Field). You may use the paper "Markov Random Field Texture
Models". Code for the simulation; bonus: estimation of the
parameters.
2. Implementation of the Graphical Lasso. You may use the paper
"Sparse inverse covariance estimation with the graphical lasso".
Original code of the algorithm, illustrated with the Sachs data.
3. Program your own Restricted Boltzmann Machine for prediction.
You may use the paper "A Practical Guide to Training Restricted
Boltzmann Machines". Original code of the algorithm, with an
illustration on the MNIST dataset.
4. Structural equation models (SEM) using the NoTears approach.
You may use the paper "DAGs with NO TEARS: Continuous
Optimization for Structure Learning". Use the code from
https://github.com/xunzheng/notears and illustrate with one
