Lecture 6 Handout
Expectation propagation and Variational inference
Niklas Wahlström
Division of Systems and Control
Department of Information Technology
Uppsala University
[email protected]
www.it.uu.se/katalog/nikwa778
[Figure: an undirected graph over the variables a, b, c, d, e, f]

p(a, b, c, d, e, f) = (1/Z) ψ1(a, b, c) ψ2(c, d, e) ψ3(e, f)
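To make the role of the normalization constant Z concrete, here is a minimal numerical sketch, assuming six binary variables and arbitrary (made-up) potential tables for ψ1, ψ2, ψ3:

```python
import itertools

import numpy as np

# Hypothetical potential tables for the factorization above, with all
# six variables binary. The values are arbitrary, not from the lecture.
rng = np.random.default_rng(0)
psi1 = rng.uniform(size=(2, 2, 2))  # psi1(a, b, c)
psi2 = rng.uniform(size=(2, 2, 2))  # psi2(c, d, e)
psi3 = rng.uniform(size=(2, 2))     # psi3(e, f)

# Z sums the product of potentials over all joint configurations.
Z = sum(
    psi1[a, b, c] * psi2[c, d, e] * psi3[e, f]
    for a, b, c, d, e, f in itertools.product(range(2), repeat=6)
)

def p(a, b, c, d, e, f):
    """Joint probability p(a, b, c, d, e, f) from the factorization."""
    return psi1[a, b, c] * psi2[c, d, e] * psi3[e, f] / Z

# Sanity check: the joint distribution sums to one.
total = sum(p(*cfg) for cfg in itertools.product(range(2), repeat=6))
print(total)  # -> 1.0 (up to floating-point error)
```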
[Figure: the same graph partitioned into node sets A and B, separated by the set C]

A ⫫ B | C, e.g. p(b, f | c) = p(b | c) p(f | c)
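Continuing the sketch above, a quick (hypothetical) numerical check that conditioning on c indeed makes b and f independent in this factorization:

```python
# Reuses itertools, np, and p() from the previous snippet.
def marg(dims, c_val):
    """Marginal over the given variable indices, conditioned on c = c_val."""
    tot = np.zeros((2,) * len(dims))
    for cfg in itertools.product(range(2), repeat=6):
        if cfg[2] != c_val:       # variable order is (a, b, c, d, e, f)
            continue
        tot[tuple(cfg[d] for d in dims)] += p(*cfg)
    return tot / tot.sum()

c_val = 0
p_bf = marg((1, 5), c_val)                    # p(b, f | c)
p_b, p_f = marg((1,), c_val), marg((5,), c_val)
print(np.allclose(p_bf, np.outer(p_b, p_f)))  # -> True
```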
[Figure: a factor graph with variable nodes a, b, c, d and factor nodes f1, ..., f5]
• Kullback-Leibler divergence
• Expectation propagation
• Variational inference
• D: observed data
• θ: parameters of some model explaining the data
• p(θ): prior belief about the parameters before we collected any data
• p(θ|D): posterior belief about the parameters after observing the data
• p(D|θ): likelihood of the data in view of the parameters
Some properties of the KL divergence KL(p(x) ‖ q(x)) = ∫ p(x) ln [p(x) / q(x)] dx:
• Non-negative: KL(p(x) ‖ q(x)) ≥ 0
• KL(p(x) ‖ q(x)) = 0 if and only if p(x) = q(x)
• Non-symmetric: KL(p(x) ‖ q(x)) ≠ KL(q(x) ‖ p(x))
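A small numerical illustration of these properties, using two made-up discrete distributions:

```python
import numpy as np

# Two arbitrary discrete distributions on three outcomes.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.4, 0.4])

def kl(p, q):
    """KL(p || q) = sum_x p(x) ln(p(x) / q(x)) for discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

print(kl(p, q) >= 0)       # non-negativity: True
print(kl(p, p))            # 0.0 if and only if the arguments are equal
print(kl(p, q), kl(q, p))  # non-symmetry: the two directions differ
```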
Suppose we have
p(θ) = 0.2 N(θ; 5, 1²) + 0.8 N(θ; −5, 2²)
Let
q(θ) = N(θ; µ, σ²)
and compare the two minimizations
q̂ = argmin_{µ,σ} KL(p ‖ q_{µ,σ})    versus    q̂ = argmin_{µ,σ} KL(q_{µ,σ} ‖ p)
[Figure: two panels showing p(θ) together with the fitted q̂(θ) for each of the two KL directions]
Minimization of KL-divergence
q̂ = argmin_{µ,σ} KL(p ‖ q_{µ,σ})    versus    q̂ = argmin_{µ,σ} KL(q_{µ,σ} ‖ p)
KL(p ‖ q_{µ,σ}) = −∫ p(θ) ln [ q_{µ,σ}(θ) / p(θ) ] dθ
Non-zero-forcing: where p > 0, q needs to be > 0.

KL(q_{µ,σ} ‖ p) = −∫ q_{µ,σ}(θ) ln [ p(θ) / q_{µ,σ}(θ) ] dθ
Zero-forcing: where p ≈ 0, q needs to be ≈ 0.
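To see the two behaviors concretely, the following sketch fits q numerically in both directions to the mixture p from the example above; the grid, optimizer, and starting point are my own assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# The mixture p(theta) = 0.2 N(5, 1^2) + 0.8 N(-5, 2^2), on a grid.
theta = np.linspace(-20, 20, 4001)
dtheta = theta[1] - theta[0]
p = 0.2 * norm.pdf(theta, 5, 1) + 0.8 * norm.pdf(theta, -5, 2)

def kl(f, g):
    """Grid approximation of KL(f || g) = integral of f ln(f/g)."""
    eps = 1e-300  # guard against log(0) in the far tails
    return np.sum(f * np.log((f + eps) / (g + eps))) * dtheta

def fit(direction):
    """Minimize the chosen KL direction over (mu, log sigma)."""
    def objective(params):
        q = norm.pdf(theta, params[0], np.exp(params[1]))
        return kl(p, q) if direction == "p||q" else kl(q, p)
    mu, log_sigma = minimize(objective, x0=[0.0, 1.0], method="Nelder-Mead").x
    return mu, np.exp(log_sigma)

print(fit("p||q"))  # non-zero-forcing: broad q covering both modes
print(fit("q||p"))  # zero-forcing: q locks onto one mode of p
```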
[Figure: a factor graph with a prior factor N(w; m, κ²) attached to node w and a factor N(t; w, σ²) connecting w and t; the messages µ1, µ2, µ3 flow from the prior towards t]

µ1(w) = N(w; m, κ²)
µ2(w) = µ1(w)
µ3(t) = ∫ N(t; w, σ²) µ1(w) dw = N(t; m, σ² + κ²)

p(t) ∝ µ3(t) = N(t; m, σ² + κ²)
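A quick numerical check of this marginalization step, using the assumed values m = 0, κ = 1, σ = 1 (the same values as in the example further below):

```python
import numpy as np
from scipy.stats import norm

m, kappa, sigma = 0.0, 1.0, 1.0
w = np.linspace(-10, 10, 2001)
dw = w[1] - w[0]

t = 1.3  # an arbitrary test point
# mu3(t) = integral of N(t; w, sigma^2) N(w; m, kappa^2) dw, on a grid.
mu3 = np.sum(norm.pdf(t, loc=w, scale=sigma) * norm.pdf(w, loc=m, scale=kappa)) * dw
closed_form = norm.pdf(t, loc=m, scale=np.sqrt(sigma**2 + kappa**2))
print(mu3, closed_form)  # the two values should agree
```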
Example: Moment matching in factor graphs
µ1(w) = N(w; m, κ²)
µ2(w) = µ1(w)
µ3(t) = ∫ N(t; w, σ²) µ1(w) dw = N(t; m, σ² + κ²)
µ4(y) = δ(y = 1)
µ5(y) = µ4(y)
µ6(t) = δ(t > 0)

The marginal p(t|y) ∝ µ3(t) µ6(t) is a truncated Gaussian, which we approximate with a moment-matched Gaussian q̂(t):

mt = E_{p(t|y)}[t]
σt² = Var_{p(t|y)}[t]
q̂(t) = N(t; mt, σt²)
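A minimal sketch of this moment-matching step, assuming the values m = 0, κ = 1, σ = 1, y = 1 used on the next slide; the truncated-Gaussian moments come from scipy:

```python
import numpy as np
from scipy.stats import truncnorm

m, kappa, sigma = 0.0, 1.0, 1.0
mu0, s0 = m, np.sqrt(sigma**2 + kappa**2)   # mu3(t) = N(t; mu0, s0^2)

# p(t | y = 1) is proportional to N(t; mu0, s0^2) * delta(t > 0), i.e. a
# Gaussian truncated to t > 0; scipy takes the bounds in standardized units.
a, b = (0.0 - mu0) / s0, np.inf
mt = truncnorm.mean(a, b, loc=mu0, scale=s0)   # E_{p(t|y)}[t]   ~ 1.128
st2 = truncnorm.var(a, b, loc=mu0, scale=s0)   # Var_{p(t|y)}[t] ~ 0.727
print(mt, st2)  # q_hat(t) = N(t; mt, st2)
```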
Example: Moment matching in factor graphs
µ7(t) = q̂(t) / µ3(t)
[Figure: the resulting approximate posteriors over w (left) and t (right) for m = 0, κ = 1, σ = 1, y = 1]
Note that we only did explicit moment matching in node t!
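The division step µ7(t) = q̂(t)/µ3(t) is easiest in natural parameters: the precision and precision-times-mean of the quotient are simply the differences. A small sketch, reusing the approximate moments from the previous snippet:

```python
def gaussian_divide(m1, v1, m2, v2):
    """Mean and variance of N(m1, v1) / N(m2, v2) (an unnormalized Gaussian)."""
    prec = 1.0 / v1 - 1.0 / v2          # precision of the quotient
    h = m1 / v1 - m2 / v2               # precision-times-mean of the quotient
    return h / prec, 1.0 / prec

mt, st2 = 1.1284, 0.7268                # moments computed in the previous snippet
m7, v7 = gaussian_divide(mt, st2, 0.0, 2.0)   # mu7 = q_hat / N(t; 0, 2)
print(m7, v7)
```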
Moment matching in graphs
If more than one node in the graph needs to be approximated, the solution will be iterative.
Example
• Prior: w ∼ N(m, κ²)
• Likelihood: yn = sign(tn), tn | w ∼ N(w, σ²), n = 1, 2
What is p(w | y1, y2)?
[Figure: factor graph for this model with variables w, t1, t2, y1, y2 and messages µ1–µ11]

• Moment matching in t1 ⇒ a new message µ4
• Pass the messages to t2
• Moment matching in t2 ⇒ a new message µ9
• Pass the messages to t1
• ... (a sketch of this loop follows below)
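A compact sketch of this EP loop, assuming the hypothetical observations y1 = y2 = +1 and parameters m = 0, κ = σ = 1, and reusing truncated-Gaussian moment matching and Gaussian division from the earlier snippets:

```python
import numpy as np
from scipy.stats import truncnorm

m, kappa, sigma = 0.0, 1.0, 1.0
y = [+1.0, +1.0]  # assumed observations

def multiply(m1, v1, m2, v2):
    """Product of two Gaussian densities, in natural parameters."""
    prec, h = 1 / v1 + 1 / v2, m1 / v1 + m2 / v2
    return h / prec, 1 / prec

def divide(m1, v1, m2, v2):
    """Quotient of two Gaussian densities (as in the mu7 step)."""
    prec, h = 1 / v1 - 1 / v2, m1 / v1 - m2 / v2
    return h / prec, 1 / prec

# Site messages from each t_n branch to w, initialized to be vague.
sites = [(0.0, 1e6), (0.0, 1e6)]

for sweep in range(20):
    for n in range(2):
        # Cavity distribution over w: prior times the *other* site.
        mc, vc = multiply(m, kappa**2, *sites[1 - n])
        # Propagate through N(t_n; w, sigma^2): message into node t_n.
        mt_in, vt_in = mc, vc + sigma**2
        # Moment matching in t_n: moments of the truncated Gaussian
        # proportional to N(t_n; mt_in, vt_in) * delta(y_n * t_n > 0).
        beta = -mt_in / np.sqrt(vt_in)
        a, b = (beta, np.inf) if y[n] > 0 else (-np.inf, beta)
        mt = truncnorm.mean(a, b, loc=mt_in, scale=np.sqrt(vt_in))
        vt = truncnorm.var(a, b, loc=mt_in, scale=np.sqrt(vt_in))
        # New message out of t_n, passed back through N(t_n; w, sigma^2).
        mq, vq = divide(mt, vt, mt_in, vt_in)
        sites[n] = (mq, vq + sigma**2)

# Approximate posterior p(w | y1, y2): prior times both site messages.
mw, vw = multiply(m, kappa**2, *sites[0])
mw, vw = multiply(mw, vw, *sites[1])
print(mw, vw)
```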
Moment matching in factor graphs
Comments
• In the factor graph example we approximated the marginals p(θi) rather than the factors fj(θ).
• It can be shown that this is equivalent (by factorizing each approximated factor f̃j(θ) further into its marginals).
The variational updates are iterated:
1. For each j, update ln q̂j(θj) = E_{∏i≠j q̂i(θi)}[ln p(y, θ)] + const. and normalize q̂j(θj).
2. Repeat 1 until convergence.
We obtain q̂(α) = Gam(α; aN, bN) with

aN = a0 + 1/2,
bN = b0 + (1/2) E_{q̂(w)}[w²]
Ex.: Variational linear regression (V/VI)
Now we proceed with q̂(w).
ln q̂(w) = E_{q̂(α)}[ln p(y, w, α)] + const.
        = E_{q̂(α)}[ln p(y|w) + ln p(w|α) + ln p(α)] + const.
        = ln p(y|w) + E_{q̂(α)}[ln p(w|α)] + const.
        = −(β/2) (wx − y)ᵀ(wx − y) − (1/2) E_{q̂(α)}[α] w² + const.
        = −(1/2) (E_{q̂(α)}[α] + βxᵀx) w² + βxᵀy w + const.
        = −(w − mN)² / (2σN²) + const.

We recognize this as a Gaussian q̂(w) = N(w; mN, σN²) where

σN² = (E_{q̂(α)}[α] + βxᵀx)⁻¹
mN = βσN² xᵀy
Ex.: Variational linear regression (VI/VI)
Since
q̂(α) = Gam(α; aN, bN),    q̂(w) = N(w; mN, σN²)
we can compute
E_{q̂(α)}[α] = aN / bN
E_{q̂(w)}[w²] = mN² + σN²
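Putting the pieces together, here is a sketch of the full iteration for this example on synthetic data; the data, the noise precision β, and the hyperparameters a0, b0 are all assumptions:

```python
import numpy as np

# Synthetic data for y = w * x + noise, with known noise precision beta.
rng = np.random.default_rng(0)
w_true, beta = 1.5, 4.0
x = rng.uniform(-2, 2, size=30)
y = w_true * x + rng.normal(0, 1 / np.sqrt(beta), size=30)

a0, b0 = 1e-2, 1e-2          # vague Gamma prior on alpha (assumed)
E_alpha = a0 / b0            # initial guess for E_q[alpha]

for it in range(50):
    # Update q_hat(w) = N(w; mN, sN2) given the current E_q[alpha].
    sN2 = 1.0 / (E_alpha + beta * x @ x)
    mN = beta * sN2 * (x @ y)
    # Update q_hat(alpha) = Gam(alpha; aN, bN) given E_q[w^2] = mN^2 + sN2.
    aN = a0 + 0.5
    bN = b0 + 0.5 * (mN**2 + sN2)
    E_alpha = aN / bN

print(mN, sN2)   # approximate posterior over w
print(aN, bN)    # approximate posterior over the prior precision alpha
```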
Kullback-Leibler (KL) divergence: a measure KL(p ‖ q) of the discrepancy between two distributions p and q (non-symmetric, so not a true distance).

Expectation propagation: a form of deterministic approximate inference where KL(p ‖ q) is minimized.

Variational inference: a form of deterministic approximate inference where KL(q ‖ p) is minimized.