Message Passing Algorithms For Exact Inference - Parameter Learning
Part I
TWO MESSAGE PASSING ALGORITHMS
Clique tree for the Student network (sepsets shown on the edges):

{C,D} --D-- {G,I,D} --G,I-- {G,S,I} --G,S-- {G,J,S,L} --G,J-- {G,H,J}

Initial clique potentials:
\psi_1^0[C,D] = P(C) P(D|C)
\psi_2^0[G,I,D] = P(G|I,D)
\psi_3^0[G,S,I] = P(I) P(S|I)
\psi_5^0[G,J,S,L] = P(L|G) P(J|L,S)
\psi_4^0[G,H,J] = P(H|G,J)

Goal: compute P(X).

Example message and resulting belief at C_3:
\delta_{2 \to 3}[G,I] = \sum_D \psi_2^0[G,I,D] \, \delta_{1 \to 2}[D]
\beta_3[G,S,I] = \psi_3^0[G,S,I] \, \delta_{2 \to 3}[G,I] \, \delta_{5 \to 3}[G,S]
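To make the message computation concrete, here is a minimal numpy sketch of the factor product and marginalization behind \delta_{2 \to 3}; the variable cardinalities and random table values are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_g, n_i, n_d = 3, 2, 2              # assumed cardinalities for G, I, D

psi2 = rng.random((n_g, n_i, n_d))   # stand-in for psi_2^0[G,I,D]
delta_1_2 = rng.random(n_d)          # stand-in for the incoming message delta_{1->2}[D]

# Factor product with the incoming message, then sum out D:
delta_2_3 = np.einsum('gid,d->gi', psi2, delta_1_2)

# Equivalent explicit form (broadcast multiply, then marginalize):
assert np.allclose(delta_2_3, (psi2 * delta_1_2).sum(axis=2))
```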
A clique tree is calibrated when every pair of adjacent cliques agrees on their sepset:
\sum_{C_i \setminus S_{i,j}} \beta_i[C_i] = \sum_{C_j \setminus S_{i,j}} \beta_j[C_j] = \mu_{i,j}(S_{i,j})  (the sepset belief)

Example, for the adjacent cliques {C,D} and {G,I,D} with sepset D:
\mu_{1,2}(D) = \sum_C \beta_1[C,D] = \sum_{G,I} \beta_2[G,I,D]

Clique tree invariant:
P(X) = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(C_i - C_j) \in T} \mu_{i,j}(S_{i,j})}
Proof:
\mu_{i,j}(S_{i,j}) = \sum_{C_i \setminus S_{i,j}} \beta_i[C_i] = \sum_{C_i \setminus S_{i,j}} \psi_i^0[C_i] \prod_{k \in N_i} \delta_{k \to i}
= \sum_{C_i \setminus S_{i,j}} \psi_i^0[C_i] \, \delta_{j \to i}(S_{i,j}) \prod_{k \in N_i \setminus \{j\}} \delta_{k \to i}
= \delta_{j \to i}(S_{i,j}) \sum_{C_i \setminus S_{i,j}} \psi_i^0[C_i] \prod_{k \in N_i \setminus \{j\}} \delta_{k \to i}
= \delta_{j \to i}(S_{i,j}) \, \delta_{i \to j}(S_{i,j})

by the definition of the message:
\delta_{i \to j}(S_{i,j}) = \sum_{C_i \setminus S_{i,j}} \psi_i^0[C_i] \prod_{k \in N_i \setminus \{j\}} \delta_{k \to i}

Therefore:
\frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(C_i - C_j) \in T} \mu_{i,j}(S_{i,j})}
= \frac{\prod_{C_i \in T} \psi_i^0[C_i] \prod_{k \in N_i} \delta_{k \to i}}{\prod_{(C_i - C_j) \in T} \delta_{i \to j} \, \delta_{j \to i}}
= \prod_{C_i \in T} \psi_i^0[C_i] = P(X),

since each message \delta_{k \to i} appears exactly once in the numerator and once in the denominator.
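As a concrete instance of this cancellation (an added worked case, using the two-clique tree that appears on the next slide): for T = {A,B} -- {B,C},

\beta_1[A,B] = \psi_1^0[A,B] \, \delta_{2 \to 1}(B), \quad
\beta_2[B,C] = \psi_2^0[B,C] \, \delta_{1 \to 2}(B), \quad
\mu_{1,2}(B) = \delta_{1 \to 2}(B) \, \delta_{2 \to 1}(B)

so that

\frac{\beta_1[A,B] \, \beta_2[B,C]}{\mu_{1,2}(B)} = \psi_1^0[A,B] \, \psi_2^0[B,C] = P(X).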
Clique tree: {A,B} --B-- {B,C}, with beliefs \beta_1[A,B] and \beta_2[B,C].

P(C|B) = \frac{P(B,C)}{P(B)} = \frac{\beta_2[B,C]}{\mu_{1,2}(B)} = \frac{\beta_2[B,C]}{\sum_C \beta_2[B,C]}
Clique tree invariant:
P(X) = \frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(C_i - C_j) \in T} \mu_{i,j}(S_{i,j})}
where the messages are \delta_{i \to j}(S_{i,j}) = \sum_{C_i \setminus S_{i,j}} \psi_i^0[C_i] \prod_{k \in N_i \setminus \{j\}} \delta_{k \to i} and the sepset belief is \mu_{i,j}(S_{i,j}) = \delta_{i \to j}(S_{i,j}) \, \delta_{j \to i}(S_{i,j}).
Bayesian network: X1 -> X2 -> X3 -> X4
Clique tree: {X1,X2} --X2-- {X2,X3} --X3-- {X3,X4}
Root: C2 = {X2,X3}

C1-to-C2 message: \delta_{1 \to 2}(X_2) = \sum_{X_1} \psi_1^0[X_1,X_2] = \sum_{X_1} P(X_1) P(X_2|X_1)
C2-to-C1 message: \delta_{2 \to 1}(X_2) = \sum_{X_3} \psi_2^0[X_2,X_3] \, \delta_{3 \to 2}(X_3)
Alternatively, compute \beta_2[X_2,X_3] = \delta_{1 \to 2}(X_2) \, \delta_{3 \to 2}(X_3) \, \psi_2^0[X_2,X_3]
and then divide the sepset belief by the incoming message:
\delta_{2 \to 1}(X_2) = \frac{\sum_{X_3} \beta_2[X_2,X_3]}{\delta_{1 \to 2}(X_2)} = \sum_{X_3} \psi_2^0[X_2,X_3] \, \delta_{3 \to 2}(X_3)
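The following minimal numpy sketch calibrates this chain clique tree end to end; the binary cardinalities and random CPTs are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Sum-product calibration of the chain clique tree {X1,X2}-{X2,X3}-{X3,X4}.
rng = np.random.default_rng(0)

def random_cpd(n_parent, n_child):
    """Random table t[parent, child] = P(child | parent); rows sum to 1."""
    t = rng.random((n_parent, n_child))
    return t / t.sum(axis=1, keepdims=True)

k = 2
p_x1 = np.full(k, 1.0 / k)     # P(X1)
p_x2 = random_cpd(k, k)        # P(X2|X1)
p_x3 = random_cpd(k, k)        # P(X3|X2)
p_x4 = random_cpd(k, k)        # P(X4|X3)

# Initial clique potentials.
psi1 = p_x1[:, None] * p_x2    # psi1[x1,x2] = P(x1)P(x2|x1)
psi2 = p_x3                    # psi2[x2,x3] = P(x3|x2)
psi3 = p_x4                    # psi3[x3,x4] = P(x4|x3)

# Messages into the root C2, then messages back out.
d12 = psi1.sum(axis=0)                         # delta_{1->2}(X2)
d32 = psi3.sum(axis=1)                         # delta_{3->2}(X3)
beta2 = psi2 * d12[:, None] * d32[None, :]     # beta_2[X2,X3]
d21 = beta2.sum(axis=1) / d12                  # sepset belief / incoming message
d23 = beta2.sum(axis=0) / d32
beta1 = psi1 * d21[None, :]
beta3 = psi3 * d23[:, None]

# Calibration check: adjacent cliques agree on their sepsets,
# and beta2 equals the true marginal P(X2,X3).
assert np.allclose(beta1.sum(axis=0), beta2.sum(axis=1))
assert np.allclose(beta2.sum(axis=0), beta3.sum(axis=1))
joint = np.einsum('a,ab,bc,cd->abcd', p_x1, p_x2, p_x3, p_x4)
assert np.allclose(beta2, joint.sum(axis=(0, 3)))
```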
Belief-update message passing from C_i to C_j:
Select an edge (C_i - C_j) in the tree and:
1. Marginalize the clique over the sepset: \sigma_{i \to j}(S_{i,j}) = \sum_{C_i \setminus S_{i,j}} \beta_i[C_i]
2. Update the clique belief \beta_j by multiplying with the new sepset marginal and dividing by the previous sepset belief:
\beta_j \leftarrow \beta_j \cdot \frac{\sigma_{i \to j}(S_{i,j})}{\mu_{i,j}(S_{i,j})}
3. Update the sepset belief: \mu_{i,j}(S_{i,j}) \leftarrow \sigma_{i \to j}(S_{i,j})
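A minimal numpy sketch of this update on a two-clique tree; the clique shapes and random potentials are illustrative assumptions.

```python
import numpy as np

# Belief-update message passing on the two-clique tree {A,B} -- B -- {B,C}.
rng = np.random.default_rng(1)
beta1 = rng.random((2, 3))   # beta_1[A,B], initialized to psi_1^0
beta2 = rng.random((3, 4))   # beta_2[B,C], initialized to psi_2^0
mu12 = np.ones(3)            # initial sepset belief over B
psi1, psi2 = beta1.copy(), beta2.copy()

def send_1_to_2(beta1, beta2, mu12):
    sigma = beta1.sum(axis=0)                 # sigma_{1->2}(B)
    beta2 = beta2 * (sigma / mu12)[:, None]   # multiply new, divide out old
    return beta2, sigma                       # sigma becomes the new mu12

def send_2_to_1(beta1, beta2, mu12):
    sigma = beta2.sum(axis=1)                 # sigma_{2->1}(B)
    beta1 = beta1 * (sigma / mu12)[None, :]
    return beta1, sigma

beta2, mu12 = send_1_to_2(beta1, beta2, mu12)
beta1, mu12 = send_2_to_1(beta1, beta2, mu12)

# One sweep in each direction calibrates the edge...
assert np.allclose(beta1.sum(axis=0), beta2.sum(axis=1))
# ...and the invariant beta1*beta2/mu12 = psi1*psi2 still holds.
assert np.allclose(beta1[:, :, None] * beta2[None, :, :] / mu12[None, :, None],
                   psi1[:, :, None] * psi2[None, :, :])
```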
The clique tree invariant is maintained throughout belief-update message passing:
\frac{\prod_{C_i \in T} \beta_i[C_i]}{\prod_{(C_i - C_j) \in T} \mu_{i,j}(S_{i,j})} = P(X)
Answering Queries
Since the clique tree is calibrated, multiply a clique that contains both X and Z with the indicator function 1{Z = z} and sum out the irrelevant variables:
P(X, Z = z) = \sum_{C_i \setminus \{X\}} \beta_i[C_i] \cdot 1\{Z = z\}
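A small numpy sketch of this query pattern, using a made-up calibrated belief over two binary variables:

```python
import numpy as np

# Calibrated belief beta[X, Z] = P(X, Z) (values are made up for illustration).
beta = np.array([[0.10, 0.20],
                 [0.30, 0.40]])

z = 0                                   # evidence Z = z
indicator = np.zeros(2)
indicator[z] = 1.0

unnorm = (beta * indicator[None, :]).sum(axis=1)  # P(X, Z = z)
p_x_given_z = unnorm / unnorm.sum()               # normalize: P(X | Z = z)
print(p_x_given_z)                                # [0.25 0.75]
```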
Step I: Build the induced graph H, the union of all of the graphs resulting from the different steps of the variable elimination algorithm: Xi and Xj are connected if they appeared in the same factor throughout the VE algorithm using \prec as the ordering.
The converse holds: any chordal graph can be used as the basis for inference, and any chordal graph can be associated with a clique tree (Theorem 4.12).
Step II: Find the maximal cliques in H and make each a node in the clique tree.
Use a maximum spanning tree algorithm on an undirected graph whose nodes are the cliques selected above and whose edge weights are |C_i \cap C_j|.
We can show that the resulting tree obeys the running intersection property and is therefore a valid clique tree; a sketch of the construction follows.
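A minimal sketch of the maximum spanning tree construction, using the cliques from the example on the next slide; the Kruskal/union-find implementation here is an illustrative choice, not prescribed by the slides.

```python
import itertools

cliques = [{"C", "D"}, {"G", "I", "D"}, {"G", "S", "I"},
           {"G", "S", "L"}, {"L", "S", "J"}, {"G", "H"}]

# Candidate edges weighted by sepset size |Ci & Cj|, heaviest first.
edges = sorted(
    ((len(ci & cj), i, j)
     for (i, ci), (j, cj) in itertools.combinations(enumerate(cliques), 2)
     if ci & cj),
    reverse=True)

# Kruskal's algorithm with union-find builds the maximum spanning tree.
parent = list(range(len(cliques)))

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]   # path compression
        x = parent[x]
    return x

tree = []
for w, i, j in edges:
    ri, rj = find(i), find(j)
    if ri != rj:                        # keep the edge if it joins two components
        parent[ri] = rj
        tree.append((cliques[i], cliques[j], w))

for ci, cj, w in tree:
    print(sorted(ci), "--", sorted(cj), "sepset size", w)
```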
Example
(Figure: the Student network is moralized, and one possible triangulation of the moralized graph yields the cliques {C,D}, {G,I,D}, {G,S,I}, {G,S,L}, {L,S,J}, {G,H}. Edges between cliques are weighted by sepset size, e.g. weight 1 for {C,D} - {G,I,D} and weight 2 for {G,I,D} - {G,S,I}; the maximum spanning tree over these weights is the clique tree.)
Part II
PARAMETER LEARNING
Learning Introduction
Measures of success:
- Classification accuracy
- How close is the structure of the network to the true one?
Prior knowledge:
- Prespecified structure
- Prespecified variables
- Hidden variables
Complete/incomplete data:
- Missing data
- Unobserved variables
(Figure: Data and prior information are fed into the Inducer, which outputs a Bayesian network in which X1 and X2 are parents of Y, with the CPD P(Y|X1,X2) below.)

P(Y|X1,X2):
  X1    X2    | y0    y1
  x1^0  x2^0  |
  x1^0  x2^1  | 0.2   0.8
  x1^1  x2^0  | 0.1   0.9
  x1^1  x2^1  | 0.02  0.98
(Figure: the Inducer takes the initial network structure over X1, X2, Y and fully observed input data, and outputs the CPD P(Y|X1,X2).)

Input data (complete):
  X1    X2    Y
  x1^0  x2^1  y0
  x1^1  x2^0  y0
  x1^0  x2^1  y1
  x1^0  x2^0  y0
  x1^1  x2^1  y1
  x1^0  x2^1  y1
  x1^1  x2^0  y0
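A minimal sketch of how a CPD can be estimated from complete data by counting (maximum likelihood); the records mirror the table above, and the printing layout is an illustrative choice.

```python
from collections import Counter

data = [  # (x1, x2, y) records from the data table above
    (0, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 0),
    (1, 1, 1), (0, 1, 1), (1, 0, 0),
]

counts = Counter()
for x1, x2, y in data:
    counts[(x1, x2, y)] += 1

for x1 in (0, 1):
    for x2 in (0, 1):
        total = counts[(x1, x2, 0)] + counts[(x1, x2, 1)]
        if total == 0:
            continue  # no data for this parent configuration
        for y in (0, 1):
            p = counts[(x1, x2, y)] / total
            print(f"P(y{y} | x1^{x1}, x2^{x2}) = {p:.2f}")
```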
(Figure: the same setup with incomplete data: in several records the values of X1 and X2 are missing.)
(Figure: the same setup where some incomplete records are missing the value of either X1 or X2.)
(Figure: the same setup with unobserved entries, marked '?', in the input data.)
Parameter Estimation
Input:
- Network structure
- Choice of parametric family for each CPD P(Xi|Pa(Xi))
Estimation task
The likelihood of the data as a function of the parameter \theta:
L(\langle H,T,T,H,H \rangle : \theta) = P(H|\theta) P(T|\theta) P(T|\theta) P(H|\theta) P(H|\theta) = \theta^3 (1-\theta)^2
(Figure: plot of the likelihood L(D:\theta) over \theta \in [0,1].)
General case
Log-likelihood with M_H heads and M_T tails:
\ell(M_H, M_T : \theta) = M_H \log \theta + M_T \log(1-\theta)
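The slide stops at the log-likelihood; completing the standard derivation (not shown on the slide), setting the derivative to zero yields the maximum likelihood estimate:

\frac{\partial \ell}{\partial \theta} = \frac{M_H}{\theta} - \frac{M_T}{1-\theta} = 0
\quad \Rightarrow \quad
\hat{\theta} = \frac{M_H}{M_H + M_T}

For the sequence \langle H,T,T,H,H \rangle above, \hat{\theta} = 3/5 = 0.6.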
Acknowledgement