Lecture 1: EM & Variational Inference
Prathosh A. P.
ECE, IISc.
Variational Inference & Expectation Maximization
Background Reading:
1. Gradient descent
2. KL divergence
3. Expectations & gradients
4. Jensen's inequality
Mixture Density Models:
Consider data $D = \{x_1, x_2, \ldots, x_n\}$ drawn i.i.d. from an unknown distribution $p_{\text{data}}$, with $x \in \mathbb{R}^d$.
Model $p_\theta$ to be a convex combination of several densities:
$$p_\theta(x) = \sum_{i=1}^{M} \lambda_i\, p_i(x), \qquad \lambda_i \ge 0, \quad \sum_{i=1}^{M} \lambda_i = 1.$$
GMM:
$$p_\theta(x) = \sum_{i=1}^{M} \lambda_i\, \mathcal{N}(x;\, \mu_i, \Sigma_i).$$
The problem: Obtain the Maximum Likelihood estimate for the GMM parameters given $D$:
$$\ell(\theta) = \sum_{i=1}^{n} \log p_\theta(x_i) = \sum_{i=1}^{n} \log \sum_{j=1}^{M} \lambda_j\, \mathcal{N}(x_i;\, \mu_j, \Sigma_j). \qquad (1)$$
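To make Eq. (1) concrete, here is a minimal NumPy/SciPy sketch (the function name, argument shapes, and the use of `logsumexp` for numerical stability are illustrative choices, not part of the lecture):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, lambdas, mus, Sigmas):
    """Eq. (1): sum_i log sum_j lambda_j * N(x_i; mu_j, Sigma_j).

    X: (n, d) data; lambdas: (M,) mixture weights summing to 1;
    mus: (M, d) means; Sigmas: (M, d, d) covariance matrices.
    """
    # log_probs[i, j] = log lambda_j + log N(x_i; mu_j, Sigma_j)
    log_probs = np.stack(
        [np.log(lambdas[j]) + multivariate_normal.logpdf(X, mus[j], Sigmas[j])
         for j in range(len(lambdas))], axis=1)
    # logsumexp over components evaluates the inner "log of sums" stably
    return logsumexp(log_probs, axis=1).sum()
```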
Eq. (1) has a log of sums (instead of the usually occurring sum of logs). Thus, maximizing it does not lead to closed-form solutions in $\theta$.
One workaround: Assume hidden/latent/missing variables $z$.
Complete data: $D' = \{x_i, z_i\}_{i=1}^{n}$, i.i.d.
Observed data: $D = \{x_i\}_{i=1}^{n}$, i.i.d.
Under this assumption, $D$ is sampled as below:
i) Sample $z_i \sim p_z$ ($p_z$ is the prior on the latent variables).
ii) Sample $x_i \sim p_\theta(x \mid z = z_i)$.
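A minimal sketch of this two-step (ancestral) sampling for the GMM case, where $p_z$ is a categorical distribution over the mixture weights (the function and its defaults are illustrative):

```python
import numpy as np

def sample_gmm(n, lambdas, mus, Sigmas, seed=0):
    """Ancestral sampling of D: z_i ~ p_z, then x_i ~ p_theta(x | z = z_i)."""
    rng = np.random.default_rng(seed)
    # i) sample the component index z_i from the prior p_z = Categorical(lambda)
    z = rng.choice(len(lambdas), size=n, p=lambdas)
    # ii) sample x_i from the conditional Gaussian selected by z_i
    x = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return x, z
```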
Note:
$$p_\theta(x) = \int p_\theta(x, z)\, dz, \quad \text{or} \quad \sum_z p_\theta(x, z) \ \text{if } z \text{ is discrete.}$$
$$\therefore \quad \ell(\theta) = \log p_\theta(x) = \log \int p_\theta(x, z)\, dz.$$
Since $z$ is unobserved, let us assume a distribution $q(z)$ over it, called the variational distribution. Now,
$$\ell(\theta) = \log \int p_\theta(x, z)\, \frac{q(z)}{q(z)}\, dz \ \ge\ \int q(z) \log \frac{p_\theta(x, z)}{q(z)}\, dz \quad \text{[Jensen's inequality]} \ \triangleq\ F_\theta(q).$$
$$F_\theta(q) = \int q(z) \log p_\theta(x, z)\, dz + H(q), \quad \text{where } H(q) = -\int q(z) \log q(z)\, dz. \qquad (2)$$
$F_\theta(q)$ is a functional which is a lower bound on the data log-likelihood (the evidence), and hence is termed the Evidence Lower BOund (ELBO).
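When $z$ is discrete, Eq. (2) can be evaluated directly; a minimal sketch (the function, and the assumption that $\log p_\theta(x, z)$ is available as a table over $z$, are illustrative):

```python
import numpy as np

def elbo(q, log_joint):
    """Eq. (2): F_theta(q) = E_q[log p_theta(x, z)] + H(q), for discrete z.

    q: (M,) variational probabilities over z (summing to 1);
    log_joint: (M,) values of log p_theta(x, z), one per value of z.
    """
    expected_log_joint = np.sum(q * log_joint)   # E_q[log p_theta(x, z)]
    entropy = -np.sum(q * np.log(q + 1e-12))     # H(q), small epsilon guards q = 0
    return expected_log_joint + entropy
```

For any valid $q$ this evaluates to at most $\log p_\theta(x)$ (here, `logsumexp(log_joint)`), with equality when $q(z) = p_\theta(z \mid x)$, as the lemma below makes precise.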
In order to maximize $\ell(\theta)$, we maximize $F_\theta(q)$ instead.
E step: $q^* = \arg\max_q F_\theta(q)$, given $\theta$.
M step: Maximize $F_\theta(q)$ w.r.t. $\theta$, given $q$:
$$\theta^{t+1} = \arg\max_\theta F_\theta(q^t) = \arg\max_\theta \int q^t(z) \log p_\theta(x, z)\, dz.$$
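For the GMM, both steps are available in closed form: the E-step posterior $p_{\theta^t}(z = j \mid x_i)$ gives per-point responsibilities, and the M-step maximizer of $\int q^t(z) \log p_\theta(x, z)\, dz$ yields weighted-average updates. A minimal NumPy/SciPy sketch (the function name, initialization, and small covariance regularizer are illustrative choices):

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def em_gmm(X, M, n_iters=50, seed=0):
    """EM for a GMM; returns parameters and per-iteration log-likelihoods."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    lambdas = np.full(M, 1.0 / M)
    mus = X[rng.choice(n, size=M, replace=False)]          # init means at data points
    Sigmas = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * M)
    log_liks = []
    for _ in range(n_iters):
        # E-step: q^t(z_i = j) = p_theta(z = j | x_i), the "responsibilities"
        log_r = np.stack([np.log(lambdas[j]) +
                          multivariate_normal.logpdf(X, mus[j], Sigmas[j])
                          for j in range(M)], axis=1)
        log_norm = logsumexp(log_r, axis=1, keepdims=True)
        r = np.exp(log_r - log_norm)
        log_liks.append(log_norm.sum())                    # Eq. (1) at current theta
        # M-step: closed-form argmax of int q^t(z) log p_theta(x, z) dz
        Nj = r.sum(axis=0)                                 # effective counts per component
        lambdas = Nj / n
        mus = (r.T @ X) / Nj[:, None]
        for j in range(M):
            diff = X - mus[j]
            Sigmas[j] = (r[:, j, None] * diff).T @ diff / Nj[j] + 1e-6 * np.eye(d)
    return lambdas, mus, Sigmas, np.array(log_liks)
```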
Lemma: The EM algorithm never decreases the log-likelihood.
Proof: Consider
$$\ell(\theta) - F_\theta(q) = \log p_\theta(x) - \int q(z) \log \frac{p_\theta(x, z)}{q(z)}\, dz = \int q(z) \log \frac{q(z)}{p_\theta(z \mid x)}\, dz = D_{KL}\left[\,q(z)\,\|\,p_\theta(z \mid x)\,\right] \ge 0,$$
using $p_\theta(x, z) = p_\theta(z \mid x)\, p_\theta(x)$.
$$\therefore \quad \ell(\theta) = F_\theta(q) \iff D_{KL}\left[\,q(z)\,\|\,p_\theta(z \mid x)\,\right] = 0, \ \text{i.e., } q(z) = p_\theta(z \mid x).$$
$\therefore$ in the E-step, $q^t = p_{\theta^t}(z \mid x)$, and
$$\ell(\theta^t) = F_{\theta^t}(q^t) \le F_{\theta^{t+1}}(q^t) \ \text{[ensured via the M step]} \le \ell(\theta^{t+1}).$$
Q.E.D.
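The lemma can be sanity-checked numerically against the `em_gmm` sketch above on synthetic data (the blob parameters and tolerance are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
# two well-separated Gaussian blobs in R^2
X = np.vstack([rng.normal(-3.0, 1.0, size=(200, 2)),
               rng.normal(3.0, 1.0, size=(200, 2))])
_, _, _, log_liks = em_gmm(X, M=2)
# monotone non-decreasing log-likelihood, up to numerical tolerance
assert np.all(np.diff(log_liks) >= -1e-8)
```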
In summary, in order to maximize the likelihood, we maximize a lower bound on it. Subsequently, the lower bound is iteratively optimized over the variational distribution & the parameters.
Variational Auto Encoders:
In EM, the optimization over the variational density is tractable, i.e., $q^* = p_\theta(z \mid x)$, which can be analytically computed. When this posterior is intractable, one resorts to sampling; Variational Auto Encoders handle such cases.
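Optimizing the ELBO with sampled $z$ requires gradients to flow through the sampling step; the standard device is the reparameterization trick of references 3 and 5: write $z = \mu + \sigma\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$, so that $z$ is a deterministic, differentiable function of $(\mu, \sigma)$. A minimal sketch on a toy objective (all names and the choice $f(z) = z^2$ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.8
eps = rng.standard_normal(100_000)
z = mu + sigma * eps          # z ~ N(mu, sigma^2) as a deterministic map of eps
# Pathwise gradient of E[z^2] w.r.t. mu: d/dmu (mu + sigma*eps)^2 = 2*(mu + sigma*eps)
grad_mu = np.mean(2.0 * z)
print(grad_mu)                # ~ 2*mu = 3.0, since E[z^2] = mu^2 + sigma^2
```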
References:
1. http://stillbreeze.github.io/Variational-Inference-and-
Expectation-Maximization/
2. https://www.cs.cmu.edu/~tom/10-702/Zoubin-702.pdf
3. https://arxiv.org/pdf/1312.6114.pdf
4. https://arxiv.org/pdf/1606.05908.pdf
5. https://gregorygundersen.com/blog/2018/04/29/reparameterization/