
Notes for E9 333 - Advanced Deep Representation Learning

Prathosh A. P.
ECE, IISc

Variational Inference & Expectation Maximization

Background Reading:

1. Gradient descent
2. KL divergence
3. Law of the unconscious statistician
4. Expectations & gradients
5. Jensen's inequality
6. Maximum likelihood estimation
7. Backpropagation
Mixture Density Models:

Consider data D = {x_1, x_2, ..., x_n}, drawn i.i.d. from an unknown density p_d, with x ∈ ℝ^d. Model p_d as a convex combination of several homogeneous densities; such models are termed mixture models. That is, let p_θ denote the model density. Then

    p_θ(x) = Σ_{i=1}^{M} π_i p_i(x),   with π_i ≥ 0 and Σ_{i=1}^{M} π_i = 1,

where p_i(x) is any density. In particular, if p_i(x) = N(x; μ_i, Σ_i), then such a model is termed a Gaussian Mixture Model (GMM):

    p_θ(x) = Σ_{i=1}^{M} π_i N(x; μ_i, Σ_i).
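As a concrete illustration (not part of the original notes), here is a minimal NumPy/SciPy sketch of evaluating a GMM density; the function name gmm_density and the component parameters are arbitrary toy choices of ours.

import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, pis, mus, covs):
    # p_theta(x) = sum_i pi_i * N(x; mu_i, Sigma_i)
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(pis, mus, covs))

# Toy 1-D GMM with M = 2 components (illustrative numbers only).
pis  = [0.3, 0.7]
mus  = [np.array([-2.0]), np.array([1.5])]
covs = [np.array([[0.5]]), np.array([[1.0]])]
print(gmm_density(np.array([0.0]), pis, mus, covs))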

The problem: Obtain the maximum likelihood estimate for the GMM parameters given D.

Let θ denote the parameters, θ = {π_i, μ_i, Σ_i}_{i=1}^{M}. The log-likelihood function is

    ℓ(θ) = Σ_{i=1}^{n} log p_θ(x_i) = Σ_{i=1}^{n} log Σ_{j=1}^{M} π_j N(x_i; μ_j, Σ_j).    (1)

Eq. (1) has a log of sums (instead of the usually occurring sum of logs). Thus, setting ∂ℓ(θ)/∂θ = 0 does not lead to closed-form solutions in θ.
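A small sketch of our own (function name hypothetical) for computing Eq. (1) stably with log-sum-exp; the log sitting outside the inner sum over components is exactly what blocks a closed-form maximizer.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pis, mus, covs):
    # X: array of shape (n, d).
    # ell(theta) = sum_i log sum_j pi_j * N(x_i; mu_j, Sigma_j), i.e. Eq. (1).
    # log_comp[i, j] = log pi_j + log N(x_i; mu_j, Sigma_j)
    log_comp = np.stack([np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=cov)
                         for pi, mu, cov in zip(pis, mus, covs)], axis=1)
    return logsumexp(log_comp, axis=1).sum()   # log of a sum over components, per data point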
One workaround: assume a missing / hidden / latent variable in the data.

Complete data D' = {(x_i, z_i)}_{i=1}^{n}, i.i.d. ~ p_θ(x, z).
Observed data D = {x_i}_{i=1}^{n}, i.i.d. ~ p_θ(x).

Under this assumption, D is sampled as below:

(i)  Sample z_i ~ p_Z (p_Z is the prior on the latent variables).
(ii) Sample x_i ~ p_{X | Z = z_i}.

z_i is unobserved but x_i is observed.
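For the GMM case this two-step (ancestral) sampling is concrete: p_Z is a categorical distribution over the mixture weights and p_{X|Z=j} = N(μ_j, Σ_j). A minimal sketch of ours (function name hypothetical):

import numpy as np

def sample_gmm(n, pis, mus, covs, seed=0):
    # (i) z_i ~ p_Z = Categorical(pi); (ii) x_i ~ p_{X|Z=z_i} = N(mu_{z_i}, Sigma_{z_i}).
    rng = np.random.default_rng(seed)
    zs = rng.choice(len(pis), size=n, p=pis)                               # latent, unobserved in D
    xs = np.stack([rng.multivariate_normal(mus[z], covs[z]) for z in zs])  # observed
    return xs, zs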


Question: How to perform maximum likelihood parameter estimation for such latent variable models?

Note that

    p_θ(x) = ∫ p_θ(x, z) dz,   or   Σ_z p_θ(x, z) if z is discrete.

    ∴ ℓ(θ) = log p_θ(x) = log ∫ p_θ(x, z) dz.

Since z is unobserved, let us assume a distribution q(z) over it, called the variational distribution. Now,

    ℓ(θ) = log ∫ p_θ(x, z) dz
         = log ∫ q(z) [ p_θ(x, z) / q(z) ] dz
         ≥ ∫ q(z) log [ p_θ(x, z) / q(z) ] dz        [Jensen's inequality]
         ≜ F_θ(q),

    where F_θ(q) = ∫ q(z) log p_θ(x, z) dz + H(q),   H(q) = −∫ q(z) log q(z) dz.    (2)

F_θ(q) is a functional which is a lower bound on the data log-likelihood (the "evidence") and is hence termed the Evidence Lower Bound (ELBO).
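A small numeric check of Eq. (2), using a toy GMM whose latent z is discrete so every quantity is exactly computable (numbers and names are ours, purely for illustration): the ELBO never exceeds log p_θ(x), and the bound is tight when q equals the true posterior.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

# Toy GMM and a single observation x; z ranges over the M = 2 components.
pis  = np.array([0.3, 0.7])
mus  = [np.array([-2.0]), np.array([1.5])]
covs = [np.array([[0.5]]), np.array([[1.0]])]
x = np.array([0.0])

log_joint = np.array([np.log(pi) + multivariate_normal.logpdf(x, mean=mu, cov=cov)
                      for pi, mu, cov in zip(pis, mus, covs)])   # log p_theta(x, z) for each z
log_evidence = logsumexp(log_joint)                              # log p_theta(x)

def elbo(q):
    # F_theta(q) = E_q[log p_theta(x, z)] + H(q), for a discrete q over z
    return np.dot(q, log_joint) - np.dot(q, np.log(q))

q_arbitrary = np.array([0.5, 0.5])
q_posterior = np.exp(log_joint - log_evidence)                   # p_theta(z | x)
print(elbo(q_arbitrary) <= log_evidence)                         # True: ELBO <= evidence
print(np.isclose(elbo(q_posterior), log_evidence))               # True: tight at q = p_theta(z|x)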

In order to maximize ℓ(θ), we maximize F_θ(q) instead, by alternately optimizing over q and θ. This is called the Expectation Maximization (EM) algorithm.
EM algorithm:

E-step: Maximize F_θ(q) w.r.t. q, given θ:

    q^t = argmax_q F_{θ^t}(q).

M-step: Maximize F_θ(q) w.r.t. θ, given q:

    θ^{t+1} = argmax_θ F_θ(q^t) = argmax_θ ∫ q^t(z) log p_θ(x, z) dz.
Lemma: The EM algorithm never decreases the log-likelihood.

Proof: Consider

    ℓ(θ) − F_θ(q) = log p_θ(x) − ∫ q(z) log [ p_θ(x, z) / q(z) ] dz
                  = log p_θ(x) − ∫ q(z) log [ p_θ(z|x) p_θ(x) / q(z) ] dz
                  = log p_θ(x) − log p_θ(x) − ∫ q(z) log [ p_θ(z|x) / q(z) ] dz
                  = D_KL[ q(z) ‖ p_θ(z|x) ].

    ∴ ℓ(θ) = F_θ(q) iff D_KL[ q(z) ‖ p_θ(z|x) ] = 0, i.e. q(z) = p_θ(z|x).

    ∴ In the E-step, q^t(z) = p_{θ^t}(z|x).

    ℓ(θ^t) = F_{θ^t}(q^t) ≤ F_{θ^{t+1}}(q^t)        [ensured via the M-step]

    We also have F_{θ^{t+1}}(q^t) ≤ ℓ(θ^{t+1})        [Jensen's inequality].

    ∴ ℓ(θ^t) ≤ ℓ(θ^{t+1}).    Q.E.D.
In summary, to maximize the likelihood in a latent variable (mixture) model, a lower bound on the likelihood is first constructed using a variational distribution on the latent variable. Subsequently, the lower bound is iteratively optimized over the variational and the model parameters. The model can then be used for several tasks such as posterior inference (estimating p_θ(z|x)) and sampling.
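For a fitted GMM both tasks are direct; a short sketch of ours (function name hypothetical): the posterior p_θ(z|x) follows from Bayes' rule over the M components, and sampling is the ancestral procedure sketched earlier.

import numpy as np
from scipy.special import logsumexp
from scipy.stats import multivariate_normal

def gmm_posterior(x, pis, mus, covs):
    # Posterior inference: p_theta(z = j | x) is proportional to pi_j * N(x; mu_j, Sigma_j).
    log_joint = np.array([np.log(pi) + multivariate_normal.logpdf(x, mean=mu, cov=cov)
                          for pi, mu, cov in zip(pis, mus, covs)])
    return np.exp(log_joint - logsumexp(log_joint))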
Variational Auto Encoders:

For mixture densities, EM can be used since the optimization over the variational density is tractable, i.e., q*(z) = p_θ(z|x), which can be computed analytically. However, this may not be the case for arbitrary densities. Thus, it is desirable to have an algorithm that works even in the cases where p_θ(z|x) and p_θ(x) are intractable. Variational Auto Encoders provide an efficient way to accomplish ML estimation, posterior inference and sampling for such cases.
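The notes end here; as a forward pointer only, here is a minimal PyTorch sketch of the VAE objective under common assumptions that are ours, not the notes' (a Gaussian encoder q_φ(z|x), a unit-variance Gaussian decoder, a standard-normal prior, and the reparameterization trick for gradient estimation). It optimizes the same ELBO F_θ(q), but with q amortized by an encoder network instead of being computed in closed form.

import torch
import torch.nn as nn

class VAE(nn.Module):
    # q_phi(z|x) = N(mu(x), diag(exp(log_var(x)))) stands in for the intractable posterior p_theta(z|x).
    def __init__(self, x_dim=784, z_dim=16, h_dim=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def elbo(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)     # reparameterization trick
        recon = self.dec(z)
        log_px_z = -0.5 * ((x - recon) ** 2).sum(dim=-1)             # E_q[log p_theta(x|z)], up to a constant
        kl = 0.5 * (torch.exp(log_var) + mu ** 2 - 1.0 - log_var).sum(dim=-1)  # KL[q_phi(z|x) || N(0, I)]
        return (log_px_z - kl).mean()

# Training maximizes the ELBO (i.e. minimizes its negative):
#   loss = -model.elbo(x_batch); loss.backward(); optimizer.step()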
