
Latent Variable Models and E-M algorithm

Statistical Foundations of AI and ML, Module 3

Adway Mitra
Center for Artificial Intelligence
Indian Institute of Technology Kharagpur

March 6, 2023



Generative Model for Clustering

▶ A generative model is a story about how the data was created


▶ We imagine that each of K clusters has a prototype
▶ Every data point is a “noisy version” of one prototype
▶ For any datapoint i,
▶ first the cluster index Zi is decided (Zi ∈ {1, 2, . . . , K })
▶ then the feature Xi is created, as a noisy version of the
selected cluster’s prototype



Generative Model for Clustering

▶ Assumption: each of the K clusters is represented by a prototype: $\{\theta_1, \theta_2, \ldots, \theta_K\}$
▶ For each datapoint i:
  ▶ Draw cluster index $Z_i \sim g$ ($g$: a distribution on the clusters)
  ▶ Draw feature vector $X_i \sim f(\theta_{Z_i})$ ($f$: a distribution on the observation space)
▶ We choose f and g according to the application (e.g. f can be Gaussian if our observations are real-valued); a sampling sketch follows below
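As a concrete illustration of this generative story, here is a minimal NumPy sketch that samples a 1-D dataset, assuming g is a categorical distribution over clusters and f is a Gaussian centred at the selected prototype. The function name sample_dataset and its arguments are illustrative, not part of the slides.

```python
import numpy as np

def sample_dataset(N, prototypes, weights, noise_std=1.0, seed=0):
    """Sample N points from the generative clustering story:
    Z_i ~ Categorical(weights), X_i ~ N(prototypes[Z_i], noise_std^2)."""
    rng = np.random.default_rng(seed)
    K = len(prototypes)
    Z = rng.choice(K, size=N, p=weights)                      # cluster indices
    X = prototypes[Z] + noise_std * rng.standard_normal(N)    # noisy prototypes
    return X, Z

# Example: 3 clusters with prototypes at -5, 0 and 4
X, Z = sample_dataset(500, np.array([-5.0, 0.0, 4.0]), [0.3, 0.5, 0.2])
```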



Inference and Estimation Problems

▶ Observed variables: X (our observed datapoints)


▶ Unknown variables: cluster assignments Z, cluster parameters θ
▶ Finding Z: the inference problem, $\text{prob}(Z \mid X, \theta)$
▶ Finding θ: the estimation problem, $\theta = \arg\max_\theta \text{prob}(Z, X, \theta)$
▶ Challenge: the two problems are linked together!
▶ We cannot estimate θ directly because of Z



Gaussian Mixture Model

▶ Each cluster $j \in \{1, \ldots, K\}$ is represented by a Gaussian distribution $\mathcal{N}(\mu_j, \sigma_j)$
▶ Each cluster has a probability $\pi_j$, with $\pi = [\pi_1, \ldots, \pi_K]$, $\pi_j \geq 0$, $\sum_{j=1}^{K} \pi_j = 1$
▶ Model parameters: $\{\mu_j, \sigma_j, \pi_j\}_{j=1}^{K}$
▶ For each datapoint i:
  ▶ Draw cluster index $Z_i \sim \text{Categorical}(\pi)$
  ▶ Draw feature vector $X_i \sim \mathcal{N}(\mu_{Z_i}, \sigma_{Z_i})$



Gaussian Mixture Model

▶ $X_i$ depends only on $Z_i$; $Z_i$ depends on nothing!
▶ Joint distribution:
  $$\text{prob}(Z_1, \ldots, Z_N, X_1, \ldots, X_N) = \prod_{i=1}^{N} \text{prob}(Z_i)\,\text{prob}(X_i \mid Z_i)$$
▶ $\text{prob}(Z_i) = \prod_{j=1}^{K} \pi_j^{I(Z_i = j)}$ ($I$: indicator function)
▶ $\text{prob}(X_i \mid Z_i) = \prod_{j=1}^{K} \left( \frac{1}{\sigma_j} \exp\left(-\frac{(x_i - \mu_j)^2}{2\sigma_j^2}\right) \right)^{I(Z_i = j)}$
▶ Likelihood function:
  $$L(\mu, \sigma, \pi) = \prod_{i=1}^{N} \prod_{j=1}^{K} \left( \frac{\pi_j}{\sigma_j} \exp\left(-\frac{(x_i - \mu_j)^2}{2\sigma_j^2}\right) \right)^{I(Z_i = j)}$$
▶ Log-likelihood:
  $$\log L = \sum_{i=1}^{N} \sum_{j=1}^{K} I(Z_i = j)\left( \log \pi_j - \log \sigma_j - \frac{(x_i - \mu_j)^2}{2\sigma_j^2} \right)$$



Gaussian Mixture Model

▶ $\{\mu^{MLE}, \sigma^{MLE}, \pi^{MLE}\} = \arg\max_{\mu, \sigma, \pi} L(\mu, \sigma, \pi)$
▶ Solve $\frac{\partial L}{\partial \mu} = 0$, $\frac{\partial L}{\partial \sigma} = 0$
▶ $\mu_j = \frac{\sum_{i=1}^{N} I(Z_i = j)\, x_i}{\sum_{i=1}^{N} I(Z_i = j)}$, i.e. the mean of the points in cluster j
▶ $\sigma_j^2 = \frac{\sum_{i=1}^{N} I(Z_i = j)\,(x_i - \mu_j)^2}{\sum_{i=1}^{N} I(Z_i = j)}$, i.e. the variance of the points in cluster j
▶ $\pi_j = \frac{\sum_{i=1}^{N} I(Z_i = j)}{\sum_{i=1}^{N} \sum_{j=1}^{K} I(Z_i = j)}$, i.e. the relative frequency of the points in cluster j
▶ Unfortunately we cannot compute these, as we do not know Z!



Expectation Maximization

▶ As we do not know $I(Z_i = j)$, we treat it as a random variable with distribution $p(Z_i \mid X)$
▶ We replace $I(Z_i = j)$ by its expected value, $\gamma_{ij} = E(I(Z_i = j))$
▶ As $I$ is binary, $E(I(Z_i = j)) = p(Z_i = j \mid X)$
▶ $p(Z_i = j \mid X) = p(Z_i = j \mid X_i) = \frac{p(X_i \mid Z_i = j)\, p(Z_i = j)}{p(X_i)} = \frac{p(X_i \mid Z_i = j)\, p(Z_i = j)}{\sum_{l=1}^{K} p(X_i \mid Z_i = l)\, p(Z_i = l)}$
▶ So, $\gamma_{ij} = \frac{\pi_j\, \mathcal{N}(x_i;\, \mu_j, \sigma_j)}{\sum_{l=1}^{K} \pi_l\, \mathcal{N}(x_i;\, \mu_l, \sigma_l)}$
▶ $\mu_j = \frac{\sum_{i=1}^{N} \gamma_{ij}\, x_i}{\sum_{i=1}^{N} \gamma_{ij}}$, $\sigma_j^2 = \frac{\sum_{i=1}^{N} \gamma_{ij}\,(x_i - \mu_j)^2}{\sum_{i=1}^{N} \gamma_{ij}}$, $\pi_j = \frac{\sum_{i=1}^{N} \gamma_{ij}}{\sum_{i=1}^{N} \sum_{j=1}^{K} \gamma_{ij}}$



Expectation Maximization

▶ We use an iterative algorithm (a code sketch follows below):

1. Initialize $\mu^0, \sigma^0, \pi^0$
2. Repeat:
   2.1 E-step: calculate $\gamma_{ij} = \frac{\pi_j^0\, \mathcal{N}(x_i;\, \mu_j^0, \sigma_j^0)}{\sum_{l=1}^{K} \pi_l^0\, \mathcal{N}(x_i;\, \mu_l^0, \sigma_l^0)}$
   2.2 M-step: re-estimate the parameters
   2.3 $\mu_j^1 = \frac{\sum_{i=1}^{N} \gamma_{ij}\, x_i}{\sum_{i=1}^{N} \gamma_{ij}}$, $\sigma_j^1 = \frac{\sum_{i=1}^{N} \gamma_{ij}\,(x_i - \mu_j)^2}{\sum_{i=1}^{N} \gamma_{ij}}$, $\pi_j^1 = \frac{\sum_{i=1}^{N} \gamma_{ij}}{N}$
3. If $(\mu^0, \sigma^0, \pi^0) \approx (\mu^1, \sigma^1, \pi^1)$, STOP
4. Else set $(\mu^0 = \mu^1, \sigma^0 = \sigma^1, \pi^0 = \pi^1)$ and GOTO 2
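The loop above translates almost directly into code. Below is a minimal sketch of E-M for a one-dimensional GMM (assuming Gaussian emissions as in the slides; the function name em_gmm_1d and variable names are illustrative), with convergence checked via the log-likelihood rather than the parameters themselves:

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, K, n_iter=100, tol=1e-6, seed=0):
    """E-M for a 1-D Gaussian mixture, following steps 1-4 above.
    x is a 1-D NumPy array. Returns (mu, sigma, pi, gamma),
    where gamma[i, j] = p(Z_i = j | x_i)."""
    rng = np.random.default_rng(seed)
    N = len(x)
    # Step 1: initialize mu^0, sigma^0, pi^0
    mu = rng.choice(x, size=K, replace=False)
    sigma = np.full(K, x.std())
    pi = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # Step 2.1 (E-step): responsibilities gamma_ij
        dens = pi * norm.pdf(x[:, None], mu, sigma)        # shape (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # Steps 2.2-2.3 (M-step): re-estimate the parameters
        Nk = gamma.sum(axis=0)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
        pi = Nk / N
        # Steps 3-4: stop when the log-likelihood no longer improves
        ll = np.log(dens.sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return mu, sigma, pi, gamma
```

With X from the earlier sampling sketch, `em_gmm_1d(X, K=3)` should recover means close to the prototypes used for sampling.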



Expectation Maximization

▶ When the E-M algorithm converges, we obtain the final parameter estimates $(\mu^{EM}, \sigma^{EM}, \pi^{EM})$
▶ Compute the posterior distribution $p(Z_i = j \mid X_i) = \frac{\pi_j^{EM}\, \mathcal{N}(x_i;\, \mu_j^{EM}, \sigma_j^{EM})}{\sum_{l=1}^{K} \pi_l^{EM}\, \mathcal{N}(x_i;\, \mu_l^{EM}, \sigma_l^{EM})}$
▶ This gives soft clustering, instead of hard clustering as in K-means
▶ The mode of the distribution may be used as a hard cluster assignment (see the snippet below)
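Continuing the illustrative sketch above (the names are reused from em_gmm_1d, which is an assumption of that sketch, not of the slides), soft and hard assignments can be read off the responsibilities:

```python
# gamma[i, j] = p(Z_i = j | x_i): the soft clustering
mu, sigma, pi, gamma = em_gmm_1d(X, K=3)

# Hard assignment: mode of the posterior, comparable to K-means labels
labels = gamma.argmax(axis=1)
```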



Model Likelihood

▶ The likelihood of a model: $L(P) = \text{prob}(X)$, the joint distribution of the data according to the model
▶ If the model contains latent variables like Z, marginalize over them:
$$L(\mu, \sigma, \pi) = \text{prob}(X) = \prod_{i=1}^{N} \text{prob}(X_i) = \prod_{i=1}^{N} \sum_{k=1}^{K} \text{prob}(X_i, Z_i = k)$$
$$= \prod_{i=1}^{N} \sum_{k=1}^{K} \text{prob}(X_i \mid Z_i = k)\,\text{prob}(Z_i = k) = \prod_{i=1}^{N} \sum_{k=1}^{K} \pi_k \frac{1}{\sqrt{2\pi}\,\sigma_k} \exp\left(-\frac{(x_i - \mu_k)^2}{2\sigma_k^2}\right)$$



Comparing models

▶ Consider two different GMMs, with parameter sets $(\mu_a, \sigma_a, \pi_a)$ and $(\mu_b, \sigma_b, \pi_b)$
▶ They can be compared by their likelihoods
▶ $L(\mu_a, \sigma_a, \pi_a) > L(\mu_b, \sigma_b, \pi_b)$ implies that the first model fits the data better than the second
▶ Choosing K may also be guided by this approach (see the sketch below)
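As a hedged sketch of this comparison (reusing the illustrative em_gmm_1d from earlier), the marginal log-likelihood of each fitted model can be computed and compared. Note that the raw training likelihood always favours larger K, so in practice held-out likelihood or a penalized criterion is often used alongside it:

```python
import numpy as np
from scipy.stats import norm

def gmm_log_likelihood(x, mu, sigma, pi):
    """log L = sum_i log sum_k pi_k N(x_i; mu_k, sigma_k)."""
    dens = pi * norm.pdf(x[:, None], mu, sigma)    # shape (N, K)
    return np.log(dens.sum(axis=1)).sum()

# Compare two candidate models (e.g. different K) by their log-likelihoods
ll_2 = gmm_log_likelihood(X, *em_gmm_1d(X, K=2)[:3])
ll_3 = gmm_log_likelihood(X, *em_gmm_1d(X, K=3)[:3])
```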



Latent Linear Gaussian Model

▶ $Z_i \sim \mathcal{N}(\mu_0, \Sigma_0)$, $X_i \sim \mathcal{N}(W Z_i + \mu, \Sigma)$
▶ Observations $X = \{X_1, \ldots, X_N\}$, parameters $\theta = \{W, \mu_0, \mu, \Sigma_0, \Sigma\}$, latents $Z = \{Z_1, \ldots, Z_N\}$
▶ Posterior on the latents: $p_\theta(Z_i \mid X_i) = \mathcal{N}(\mu_i, \Sigma_i)$, where
  $\Sigma_i = (\Sigma_0^{-1} + W^T \Sigma^{-1} W)^{-1}$ and $\mu_i = \Sigma_i \left( W^T \Sigma^{-1} (X_i - \mu) + \Sigma_0^{-1} \mu_0 \right)$
▶ Marginal: $p_\theta(X_i) = \mathcal{N}(W \mu_0 + \mu,\; \Sigma + W \Sigma_0 W^T)$
▶ Log-likelihood: $l(\theta) = \sum_{i=1}^{N} \log p_\theta(X_i)$
▶ Parameters can be estimated by maximum likelihood (a sketch of the posterior computation follows)
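A minimal NumPy sketch of the posterior and marginal computations above (names such as llg_posterior are illustrative; the formulas follow the slide, with Σ entering through its inverse):

```python
import numpy as np

def llg_posterior(x, W, mu, Sigma, mu0, Sigma0):
    """p(Z_i | X_i = x) = N(m_i, S_i) for Z_i ~ N(mu0, Sigma0), X_i ~ N(W Z_i + mu, Sigma)."""
    Sigma_inv = np.linalg.inv(Sigma)
    Sigma0_inv = np.linalg.inv(Sigma0)
    S_i = np.linalg.inv(Sigma0_inv + W.T @ Sigma_inv @ W)          # posterior covariance
    m_i = S_i @ (W.T @ Sigma_inv @ (x - mu) + Sigma0_inv @ mu0)    # posterior mean
    return m_i, S_i

def llg_marginal(W, mu, Sigma, mu0, Sigma0):
    """Marginal p(X_i) = N(W mu0 + mu, Sigma + W Sigma0 W^T)."""
    return W @ mu0 + mu, Sigma + W @ Sigma0 @ W.T
```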



Mixture of Latent Linear Gaussians

▶ $Y_i \sim \text{Cat}(\pi)$, $Z_i \sim \mathcal{N}(\mu_0, \Sigma_0)$
▶ $X_i \mid Z_i, Y_i = k \sim \mathcal{N}(W_k Z_i + \mu_k, \Sigma_k)$
▶ Observations $X = \{X_1, \ldots, X_N\}$, latents $\{Z_1, \ldots, Z_N, Y_1, \ldots, Y_N\}$
▶ Parameters $\theta = \{\pi, W_1, \ldots, W_K, \mu_0, \mu_1, \ldots, \mu_K, \Sigma_0, \Sigma_1, \ldots, \Sigma_K\}$
▶ For simplicity, assume $\mu_0 = 0$, $\Sigma_0 = I$, $\Sigma_k = \Sigma$
▶ Log-likelihood: $l(\theta) = \sum_{i=1}^{N} \log p_\theta(X_i)$
▶ Expected complete log-likelihood: $E_{r,s}\left[\log p_\theta(X, Y, Z)\right]$, where $r_i(Y_i) = p_\theta(Y_i \mid X_i)$ and $s_i(Z_i) = p_\theta(Z_i \mid X_i, Y_i)$



E-M for Mixture of Latent Linear Gaussians
▶ E-step for Y: $r_i(Y_i = c;\, \theta^t) = p_{\theta^t}(Y_i = c \mid X_i) \propto \pi_c^t\, \mathcal{N}(X_i;\; \mu_c^t,\; W_c^t (W_c^t)^T + \Sigma^t)$
▶ E-step for Z: $s_i(Z_i \mid X_i, Y_i = c;\, \theta^t) = p_{\theta^t}(Z_i \mid X_i, Y_i = c) = \mathcal{N}(m_{ic}, \Sigma_{ic})$, where
  $\Sigma_{ic} = (I + (W_c^t)^T (\Sigma^t)^{-1} W_c^t)^{-1}$ and $m_{ic} = \Sigma_{ic}\, (W_c^t)^T (\Sigma^t)^{-1} (X_i - \mu_c^t)$
▶ M-step: estimate $\theta^{t+1}$ by maximizing the expected complete log-likelihood (with $\hat{W}_c = \{W_c, \mu_c\}$ and $q_i = Y_i$); the detailed update equations are not reproduced here



Hidden Markov Model

▶ Consider sequential observations x1 , x2 , . . . , xT


▶ Key assumption in GMM: all the data-points are independent
▶ For sequential applications, this may no longer be true!
▶ e.g. a long audio stream with many speakers
▶ The observation $x_t$ is likely to belong to the same speaker as $x_{t-1}$
▶ There may be a transition pattern from one speaker to another!



Hidden Markov Model

▶ Different values of Z indicate the state of the system (e.g. which speaker is talking)
▶ The system may have K states (decided by the user)
▶ The current state $Z_t$ depends on the previous states $Z_1, \ldots, Z_{t-1}$
▶ Instead of $\text{prob}(Z_t)$, we need $\text{prob}(Z_t \mid Z_{t-1}, \ldots, Z_1)$
▶ Markov assumption: the future is independent of the past, given the present!
▶ Markov model: $\text{prob}(Z_t \mid Z_{t-1}, \ldots, Z_1) = \text{prob}(Z_t \mid Z_{t-1})$
▶ New parameter instead of $\pi$: $A_{ij} = \text{prob}(Z_t = j \mid Z_{t-1} = i)$



Hidden Markov Model

▶ Each state $j \in \{1, \ldots, K\}$ is represented by the parameters $p_j$ of an emission distribution f
▶ Transition distribution from state i to state j: $A_{ij} = \text{prob}(Z_t = j \mid Z_{t-1} = i)$ (a $K \times K$ matrix)
▶ Each row of matrix A is a categorical probability distribution
▶ An initial state distribution $\pi$ (similar to GMM)
▶ $Z_1 \sim \text{Categorical}(\pi)$; $X_1 \sim f(p_{Z_1})$
▶ For each subsequent timestep t (a sampling sketch follows below):
  ▶ Draw state $Z_t \sim \text{Categorical}(A_{Z_{t-1}})$
  ▶ Draw feature vector $X_t \sim f(p_{Z_t})$
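A minimal sketch of sampling from this generative process, assuming Gaussian emissions (the function name sample_hmm and its defaults are illustrative; pi, A and means are NumPy arrays):

```python
import numpy as np

def sample_hmm(T, pi, A, means, emit_std=1.0, seed=0):
    """Z_1 ~ Cat(pi), Z_t ~ Cat(A[Z_{t-1}]), X_t ~ N(means[Z_t], emit_std^2)."""
    rng = np.random.default_rng(seed)
    K = len(pi)
    Z = np.empty(T, dtype=int)
    Z[0] = rng.choice(K, p=pi)                 # initial state
    for t in range(1, T):
        Z[t] = rng.choice(K, p=A[Z[t - 1]])    # transition: row Z_{t-1} of A
    X = means[Z] + emit_std * rng.standard_normal(T)
    return X, Z
```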



Hidden Markov Model

▶ Common emission distributions: Categorical (discrete observations) or Gaussian (real-valued observations)
▶ $X_t$ depends on $Z_t$ only; $Z_t$ depends on $Z_{t-1}$ only
▶ Joint distribution:
  $$\text{prob}(X, Z) = \text{prob}(Z_1)\,\text{prob}(X_1 \mid Z_1) \prod_{t=2}^{T} \text{prob}(Z_t \mid Z_{t-1})\,\text{prob}(X_t \mid Z_t)$$
▶ Rearranging:
  $$\text{prob}(X, Z) = \text{prob}(Z_1) \times \prod_{t=2}^{T} \text{prob}(Z_t \mid Z_{t-1}) \times \prod_{t=1}^{T} \text{prob}(X_t \mid Z_t)$$
▶ $\text{prob}(Z_1)$: $\pi$ (initial state distribution); $\text{prob}(Z_t \mid Z_{t-1})$: A (transition distribution); $\text{prob}(X_t \mid Z_t)$: $f(p)$ (emission distribution)



Forward-Backward Algorithm

Inference problem: given $(\pi, A, p)$, find the posterior distribution $\text{prob}(Z_t \mid X_1, \ldots, X_T)$.

$$\begin{aligned}
\text{prob}(Z_t \mid X_1, \ldots, X_T) &\propto \text{prob}(Z_t, X_1, \ldots, X_T) \\
&= \text{prob}(Z_t, X_1, \ldots, X_t)\,\text{prob}(X_{t+1}, \ldots, X_T \mid Z_t, X_1, \ldots, X_t) \\
&= \text{prob}(Z_t, X_1, \ldots, X_t)\,\text{prob}(X_{t+1}, \ldots, X_T \mid Z_t) \\
&= \alpha_t(Z_t)\,\beta_t(Z_t)
\end{aligned}$$



Forward Algorithm

$$\begin{aligned}
\alpha_t(Z_t) &= \text{prob}(Z_t, X_1, \ldots, X_t) \\
&= \sum_{Z_{t-1}} \text{prob}(Z_t, Z_{t-1}, X_1, \ldots, X_t) \\
&= \sum_{Z_{t-1}} \text{prob}(Z_{t-1}, X_1, \ldots, X_{t-1})\,\text{prob}(Z_t, X_t \mid Z_{t-1}, X_1, \ldots, X_{t-1}) \\
&= \sum_{Z_{t-1}} \alpha_{t-1}(Z_{t-1})\,\text{prob}(Z_t, X_t \mid Z_{t-1}) \\
&= \sum_{Z_{t-1}} \alpha_{t-1}(Z_{t-1})\,\text{prob}(Z_t \mid Z_{t-1})\,\text{prob}(X_t \mid Z_t)
\end{aligned}$$

$$\alpha_1(Z_1) = \prod_{k=1}^{K} \big(\pi_k\, f(X_1; p_k)\big)^{I(Z_1 = k)}, \qquad \text{prob}(Z_t \mid Z_{t-1}) = \prod_{k,l=1}^{K,K} A_{kl}^{I(Z_{t-1} = k,\, Z_t = l)}$$

A code sketch of this recursion is given below.
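A sketch of the forward recursion in NumPy (illustrative names; obs_lik[t, k] stands for the emission likelihood $f(X_t; p_k)$, so the same code works for categorical or Gaussian emissions). In practice the α values shrink quickly, so scaled or log-space versions are used for long sequences:

```python
import numpy as np

def forward(obs_lik, pi, A):
    """alpha[t, k] = prob(Z_{t+1} = k, X_1, ..., X_{t+1}) with 0-based t
    (alpha[0] is the slide's alpha_1)."""
    T, K = obs_lik.shape
    alpha = np.zeros((T, K))
    alpha[0] = pi * obs_lik[0]                      # alpha_1(k) = pi_k f(X_1; p_k)
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * obs_lik[t]  # sum over the previous state
    return alpha
```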



Backward Algorithm

$$\begin{aligned}
\beta_t(Z_t) &= \text{prob}(X_{t+1}, \ldots, X_T \mid Z_t) \\
&= \sum_{Z_{t+1}} \text{prob}(Z_{t+1}, X_{t+1}, \ldots, X_T \mid Z_t) \\
&= \sum_{Z_{t+1}} \text{prob}(Z_{t+1}, X_{t+1} \mid Z_t)\,\text{prob}(X_{t+2}, \ldots, X_T \mid Z_{t+1}, X_{t+1}, Z_t) \\
&= \sum_{Z_{t+1}} \text{prob}(Z_{t+1} \mid Z_t)\,\text{prob}(X_{t+1} \mid Z_{t+1})\,\text{prob}(X_{t+2}, \ldots, X_T \mid Z_{t+1}) \\
&= \sum_{Z_{t+1}} \text{prob}(Z_{t+1} \mid Z_t)\,\text{prob}(X_{t+1} \mid Z_{t+1})\,\beta_{t+1}(Z_{t+1})
\end{aligned}$$

$$\beta_{T-1}(Z_{T-1}) = \sum_{Z_T} \text{prob}(X_T, Z_T \mid Z_{T-1}) = \sum_{Z_T} \text{prob}(Z_T \mid Z_{T-1})\,\text{prob}(X_T \mid Z_T)$$

A code sketch of the backward recursion follows.
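A matching sketch of the backward recursion (same illustrative conventions as the forward sketch; the final β is initialised to 1, since the probability of an empty future is 1):

```python
import numpy as np

def backward(obs_lik, A):
    """beta[t, k] = prob(X_{t+2}, ..., X_T | Z_{t+1} = k) with 0-based t."""
    T, K = obs_lik.shape
    beta = np.zeros((T, K))
    beta[T - 1] = 1.0                                    # base case at the final step
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (obs_lik[t + 1] * beta[t + 1])     # sum over the next state
    return beta
```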
Parameter Estimation in HMM

Estimation problem: estimate the parameters $(\pi, A, p)$, even though we do not know Z.

$$\text{prob}(X, Z) = \text{prob}(Z_1)\,\text{prob}(X_1 \mid Z_1) \prod_{t=2}^{T} \text{prob}(Z_t \mid Z_{t-1})\,\text{prob}(X_t \mid Z_t)$$

$$L(\pi, A, p) = \prod_{k=1}^{K} \pi_k^{I(Z_1 = k)} \times \prod_{t=2}^{T} \prod_{k,l=1}^{K,K} A_{kl}^{I(Z_{t-1} = k,\, Z_t = l)} \times \prod_{t=1}^{T} f(X_t; p_{Z_t})$$

Replace $I(Z_1 = k)$ by $\gamma_1(k) = E(I(Z_1 = k))$, and $I(Z_{t-1} = k, Z_t = l)$ by $\xi_t(k, l) = E(I(Z_{t-1} = k, Z_t = l))$.



Baum-Welch Algorithm (E-M)

Input: sequence $\{X_1, \ldots, X_T\}$, emission parameters p.

1. Make initial estimates of the parameters $\pi^0, A^0$
2. Repeat:
   2.1 $\pi_k^1 = \gamma_k = \frac{\pi_k^0\, f(X_1; p_k)}{\sum_{l=1}^{K} \pi_l^0\, f(X_1; p_l)}$
   2.2 $A_{kl}^1 = \frac{\sum_{t=1}^{T-1} \xi_t(k, l)}{\sum_{t=1}^{T-1} \gamma_t(k)}$
   2.3 If $(\pi^0, A^0) \approx (\pi^1, A^1)$, STOP
   2.4 Else set $(\pi^0 = \pi^1, A^0 = A^1)$ and GOTO 2

$$\gamma_t(k) = \frac{\alpha_t(k)\,\beta_t(k)}{\sum_{i=1}^{K} \alpha_t(i)\,\beta_t(i)}, \qquad \xi_t(k, l) = \frac{\alpha_{t-1}(k)\, A_{kl}^0\, f(X_t; p_l)\, \beta_t(l)}{\sum_{i,j=1}^{K,K} \alpha_{t-1}(i)\, A_{ij}^0\, f(X_t; p_j)\, \beta_t(j)}$$

A sketch for computing $\gamma$ and $\xi$ from $\alpha$ and $\beta$ is given below.
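Given α and β from the forward and backward sketches above, γ and ξ can be computed as below (illustrative code; with 0-based indexing, xi[t] is the posterior of the transition from the state at step t to the state at step t+1):

```python
import numpy as np

def posteriors(alpha, beta, A, obs_lik):
    """gamma[t, k] = p(state k at step t | X);
    xi[t, k, l] = p(state k at step t, state l at step t+1 | X)."""
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    # xi_t(k, l) is proportional to alpha_t(k) * A[k, l] * f(X_{t+1}; p_l) * beta_{t+1}(l)
    xi = alpha[:-1, :, None] * A[None, :, :] * (obs_lik[1:] * beta[1:])[:, None, :]
    xi /= xi.sum(axis=(1, 2), keepdims=True)
    return gamma, xi
```

The update in step 2.2 can then be written, for example, as `A1 = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]`.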



Why does E-M work?

▶ Latent variables: Z; observed variables: X; model parameters: θ
▶ The complete-model log-likelihood $\log p_\theta(Z, X)$ cannot be evaluated because Z is latent
▶ Data log-likelihood: $l(\theta) = \log p_\theta(X) = \log \left( \sum_Z p_\theta(X, Z) \right)$
▶ By Jensen's inequality, $\log(E_q(Y)) \geq E_q(\log(Y))$ for any random variable Y and any distribution q
▶ Hence $l(\theta) = \log \left( \sum_Z q(Z)\, \frac{p_\theta(X, Z)}{q(Z)} \right) \geq \sum_Z q(Z) \log \left( \frac{p_\theta(X, Z)}{q(Z)} \right)$
▶ i.e. $l(\theta) \geq E_q\left[ \log \frac{p_\theta(X, Z)}{q(Z)} \right] = Q(\theta, q)$
▶ So for any arbitrary distribution q, $Q(\theta, q)$ is a lower bound on $l(\theta)$



Why does E-M work?

▶ Aim: estimate the parameters θ that maximize $l(\theta)$, which is analytically difficult
▶ Key idea: as an alternative, find a tight lower bound of $l(\theta)$ and maximize it
▶ Find the distribution q(Z) for which $Q(\theta, q)$ is as close to $l(\theta)$ as possible!
▶ $Q(\theta, q) = E_q\left[ \log \frac{p_\theta(X, Z)}{q(Z)} \right] = E_q\left[ \log \frac{p_\theta(Z \mid X)\, p_\theta(X)}{q(Z)} \right] = E_q\left[ \log \frac{p_\theta(Z \mid X)}{q(Z)} \right] + \log p_\theta(X)$
▶ Rearranging terms, we have $l(\theta) = Q(\theta, q) + KL\big(q(Z)\,\|\,p_\theta(Z \mid X)\big)$
▶ If $q(Z) = p_\theta(Z \mid X)$, the KL divergence is 0, i.e. $l(\theta) = Q(\theta, q)$ (a tight lower bound)



Why does E-M work?

▶ Now that we have found a tight lower bound, we need to estimate the parameters θ that maximize it
▶ But since we cannot maximize it directly, we do so iteratively!
▶ Current estimate of the parameters: $\theta^t$
▶ $q^t(Z) = p_{\theta^t}(Z \mid X)$, i.e. the conditional distribution of the latent variables w.r.t. the current estimate of the parameters
▶ This can be numerically evaluated using the model
▶ $Q(\theta, q^t) = E_{q^t}\left[ \log \frac{p_\theta(X, Z)}{q^t(Z)} \right] = E_{q^t}\left[ \log p_\theta(X, Z) \right] + C$
▶ Calculating this is equivalent to the E-step (calculating the expected complete likelihood w.r.t. the current parameter estimate)!



Why does E-M work?

▶ $\theta^{t+1} = \arg\max_\theta Q(\theta, q^t) = \arg\max_\theta E_{q^t}\left[ \log p_\theta(X, Z) \right]$
▶ This is equivalent to maximizing the expected complete likelihood (the M-step)
▶ $l(\theta^{t+1}) \geq Q(\theta^{t+1}, q^t)$ (Jensen's inequality)
▶ $Q(\theta^{t+1}, q^t) \geq Q(\theta^t, q^t)$ (as $\theta^{t+1}$ maximizes $Q(\theta, q^t)$)
▶ But $Q(\theta^t, q^t) = l(\theta^t)$ (tight lower bound)
▶ Combining, we have $l(\theta^{t+1}) \geq l(\theta^t)$, i.e. with each iteration the data likelihood increases (or stays the same)
▶ Hence, the E-M algorithm must converge to a maximum (local or global, depending on the initial values $\theta^0$)



General Framework of E-M

▶ Find an analytical expression for the complete log-likelihood of the model, $\log p_\theta(Z, X)$
▶ Find an analytical expression for the expected complete log-likelihood $E_q\left[ \log p_\theta(Z, X) \right]$, where the expectation is taken with respect to $q(Z) = p_\theta(Z \mid X)$, i.e. the posterior of the latent variables given the observations
▶ Initialize the parameters $\theta^0$
▶ E-step: numerically calculate the posterior of the latent variables under the current parameter estimate, $q^0(Z) = p_{\theta^0}(Z \mid X)$
▶ M-step: choose $\theta^1$ to maximize the expected complete log-likelihood $E_{q^0}\left[ \log p_\theta(Z, X) \right]$
▶ Repeat the E and M steps until θ converges (a generic skeleton is sketched below)
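A generic skeleton of this framework, as a hedged sketch (e_step and m_step are hypothetical model-specific callables supplied by the user, not functions defined in the slides):

```python
def em(x, theta0, e_step, m_step, n_iter=100):
    """Generic E-M loop: e_step(x, theta) returns q = p_theta(Z | X);
    m_step(x, q) returns argmax_theta of E_q[log p_theta(Z, X)]."""
    theta = theta0
    for _ in range(n_iter):
        q = e_step(x, theta)          # E-step: posterior of the latent variables
        new_theta = m_step(x, q)      # M-step: maximize the expected complete log-likelihood
        if new_theta == theta:        # in practice, a tolerance-based convergence check
            break
        theta = new_theta
    return theta
```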

