
ISyE 6416: Computational Statistics

Spring 2023

Lecture 8: EM algorithm and Gaussian Mixture Model

Prof. Yao Xie

H. Milton Stewart School of Industrial and Systems Engineering

Georgia Institute of Technology
Expectation-Maximization (EM) Algorithm
▶ an algorithm for computing a maximum likelihood estimate in non-ideal cases: missing data,
indirect observations
▶ missing data
▶ clustering (unknown labels)
▶ hidden states in HMMs
▶ latent factors

▶ replace one difficult likelihood maximization with a sequence of easier maximizations
▶ in the limit, obtain the answer to the original problem
Applications of EM
▶ Data clustering in machine learning
▶ Natural language processing (Baum-Welch algorithm to fit hidden Markov model)
▶ Imputing missing data
General set-up

[Diagram: hidden state S in the state space, observed only indirectly through the observation O]

▶ we do not observe S directly; we only observe it indirectly through O

▶ Joint distribution of state and observation: f (S, O|θ)
Deriving EM

▶ Introduce

Q(θ; θ′ ) = E{log f (S, O|θ)|θ′ , O}

▶ The expectation uses the conditional distribution of S given O and the assumed parameter value θ′

Intuition
Given O, the “best guess” we could have for S is its conditional expectation with respect to
S|O, θ (a notion of projection); but computing this expectation involves the parameter values.
We take a guess and improve it in the next round.
Comment on the Q function
The Q-function is the conditional expectation of the complete-data log-likelihood:

Q(θ; θ′ ) = E{log f (S, O|θ)|θ′ , O}

▶ The expectation is taken with respect to the conditional distribution f (S|O)


▶ O: observed data
▶ In this sense, it has a Bayesian flavor: we have to compute the posterior
distribution of the state given the observation
▶ θ′ : assumed value of the parameter when deriving the posterior distribution
f (S|O)
▶ θ is the parameter in the complete-data log-likelihood log f (S, O|θ), the quantity we maximize over
▶ θ and θ′ are usually not the same within an iteration of the algorithm
E-M algorithm

▶ E-step: compute the expectation of the complete-data log-likelihood, given the observed data O
and with the state S unknown:

Q(θ; θ′) = E{log f (S, O|θ)|θ′, O}

▶ M-step: maximize the expected log-likelihood from the previous E-step

E-step ⇒ M-step ⇒ E-step ⇒ M-step ⇒ · · ·

▶ stop when ∥θk+1 − θk∥ < ϵ or |Q(θk+1|θk) − Q(θk|θk−1)| < ϵ (a generic sketch of this loop is given below)
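A minimal generic skeleton of this loop (an illustrative sketch, not course code; the helper names e_step and m_step are placeholders for the problem-specific computations):

```python
import numpy as np

def em(observed, theta0, e_step, m_step, tol=1e-6, max_iter=200):
    """Generic EM loop: alternate E-steps and M-steps until the parameters stop moving."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        stats = e_step(observed, theta)                   # E-step: condition on O and current theta'
        theta_new = np.asarray(m_step(observed, stats))   # M-step: maximize Q(theta; theta')
        if np.linalg.norm(theta_new - theta) < tol:       # stopping rule from the slide
            break
        theta = theta_new
    return theta
```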
Example: EM for missing data

n = 4, p = 2

x1 = (0, 2)T , x2 = (1, 0)T , x3 = (2, 2)T , x4 = (∗, 4)T


Assume they are i.i.d. samples from the Gaussian N([µ1, µ2]ᵀ, Σ) with diagonal covariance Σ = diag(σ1², σ2²).

Use EM algorithm to impute the missing data *.


Hidden state: Missing data.

Pattern classification, R. O. Duda, P. E. Hart, and D. G. Stork


(Cont.) Example: missing data
▶ Initialization: θ0 = (0, 0, 1, 1)T , i.e., mean [0, 0]T and covariance I2 .
▶ E-step

Q(θ|θ0) = E_x41[log p(x|θ) | x1, x2, x3, x42]

= Σ_{i=1}^{3} log p(xi|θ) + ∫ log p([x41, 4]ᵀ|θ) · p(x41|θ0, x42 = 4) dx41

= Σ_{i=1}^{3} log p(xi|θ) − (1 + µ1²)/(2σ1²) − (4 − µ2)²/(2σ2²) − log(2πσ1σ2)

▶ M-step: θ1 = arg max_θ Q(θ|θ0)
(Cont.) Example: missing data - iterations
 
θ1 = (0.75, 2.0, 0.938, 2.0)ᵀ  ⇒  µ1 = [0.75, 2.0]ᵀ,  Σ1 = [[0.938, 0], [0, 2.0]]

θ2 = (1.0, 2.0, 0.667, 2.0)ᵀ
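A short numerical sketch of this example (an illustrative re-implementation under the slide's assumptions: diagonal covariance and only the first coordinate of x4 missing):

```python
import numpy as np

# Data from the example; the first coordinate of x4 is missing, its second coordinate is 4.
first_obs = np.array([0., 1., 2.])   # observed first coordinates of x1, x2, x3
second = np.array([2., 0., 2., 4.])  # second coordinates of all four points

mu = np.zeros(2)    # initialization theta0: mean [0, 0]
var = np.ones(2)    # ... and covariance I (variances 1, 1)

for _ in range(200):
    # E-step: with a diagonal covariance, x41 is independent of x42, so
    # E[x41 | theta] = mu[0] and E[x41^2 | theta] = var[0] + mu[0]^2.
    e_x41 = mu[0]
    e_x41_sq = var[0] + mu[0] ** 2

    # M-step: means and variances from the expected sufficient statistics.
    mu_new = np.array([(first_obs.sum() + e_x41) / 4, second.mean()])
    var_new = np.array([
        (np.sum((first_obs - mu_new[0]) ** 2)
         + e_x41_sq - 2 * mu_new[0] * e_x41 + mu_new[0] ** 2) / 4,
        np.mean((second - mu_new[1]) ** 2),
    ])
    if np.allclose(np.r_[mu_new, var_new], np.r_[mu, var], atol=1e-10):
        break
    mu, var = mu_new, var_new

print(mu, var)
```

The first pass reproduces θ1 = (0.75, 2.0, 0.938, 2.0); further iterations converge to mean [1.0, 2.0] and variances [0.667, 2.0].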
The absent-minded biologist
197 animals
Distributed into 4 categories

125 18 20 34

A multinomial model with 5 categories and unknown parameter θ


(1/2, θ/4, (1 − θ)/4, (1 − θ)/4, θ/4)
Can we figure out the count for Monkey A based on the data?
(Cont.) The absent-minded biologist

▶ data y = (125, 18, 20, 34)


▶ now assume y1 = y11 + y12 = 125
▶ Likelihood function
f (y|θ) = n! / (y11! y12! y2! y3! y4!) · (1/2)^y11 (θ/4)^y12 ((1 − θ)/4)^y2 ((1 − θ)/4)^y3 (θ/4)^y4
▶ log-likelihood

ℓ(θ|y) ∝ (y12 + y4 ) log θ + (y2 + y3 ) log(1 − θ)

▶ y12 is unknown, so we cannot directly maximize ℓ(θ|y)


(Cont.) The absent-minded biologist: set-up EM

Q(θ|θ′ ) = Ey12 [(y12 + y4 ) log θ + (y2 + y3 ) log(1 − θ)|y1 , . . . , y4 , θ′ ]


= (Ey12 [y12 |y1 , θ′ ] + y4 ) log θ + (y2 + y3 ) log(1 − θ)

Conditional distribution of y12 given y1: Binomial(y1, (θ′/4) / (θ′/4 + 1/2))

E_y12[y12|y1, θ′] = y1 θ′ / (2 + θ′) := y12^(θ′)

E-step:

Q(θ|θ′) = (y12^(θ′) + y4) log θ + (y2 + y3) log(1 − θ)

M-step:

θk+1 = arg max_θ Q(θ|θk) = (y12^(θk) + y4) / (y12^(θk) + y2 + y3 + y4)
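A minimal numerical sketch of this update (an illustrative re-implementation, not course code):

```python
# EM for the lumped multinomial count: y1 = y11 + y12 = 125 mixes two latent categories.
y1, y2, y3, y4 = 125, 18, 20, 34

theta = 0.5  # initial guess
for _ in range(100):
    # E-step: expected y12 given y1 and the current theta
    # (y12 | y1 ~ Binomial(y1, (theta/4) / (theta/4 + 1/2))).
    y12 = y1 * theta / (2 + theta)
    # M-step: closed-form maximizer of the expected log-likelihood.
    theta_new = (y12 + y4) / (y12 + y2 + y3 + y4)
    if abs(theta_new - theta) < 1e-12:
        break
    theta = theta_new

print(theta)                      # roughly 0.627
print(y1 * theta / (2 + theta))   # expected count for the latent category y12
```

The fixed point is a stationary point of the observed-data likelihood; for these counts it is the maximum likelihood estimate θ ≈ 0.627, and the last line gives the expected split of the lumped first category.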
Fitting Gaussian mixture model (GMM)
xi ∼ Σ_{c=1}^{C} πc ϕ(xi|µc, Σc)

ϕ: density of multi-variate normal
▶ parameters {µc, Σc, πc}, c = 1, . . . , C
▶ assume C is known.
▶ observed data {x1 , . . . , xn }
▶ complete data {(x1 , y1 ), . . . , (xn , yn )}
yi: “label” for each sample, missing (latent).
[Figure: a sample (xi, yi) drawn from a mixture with component weights π1, π2, π3]
EM for GMM
▶ If we know the label information yi, the likelihood function can easily be written

πyi ϕ(xi|µyi, Σyi)

▶ now yi is unknown, so we take the expectation with respect to its conditional distribution under the current parameters

Q(θ|θ′) = Σ_{i=1}^{n} E[log πyi + log ϕ(xi|µyi, Σyi)|xi, θ′]

E-step

▶ (πc^(k), µc^(k), Σc^(k)): parameter values at the kth iteration
▶ we need yi|xi, the posterior distribution of the label given the observation xi

pi,c := p(yi = c|xi) ∝ πc^(k) ϕ(xi|µc^(k), Σc^(k)),  and Σ_{c=1}^{C} p(yi = c|xi) = 1
Q(θ|θk) = Σ_{i=1}^{n} E[log πyi + log ϕ(xi|µyi, Σyi)|xi, θk]
        = Σ_{i=1}^{n} Σ_{c=1}^{C} pi,c log πc + Σ_{i=1}^{n} Σ_{c=1}^{C} pi,c log ϕ(xi|µc, Σc)

Q: where is θ?
M-step
▶ Maximize Q(θ|θk ) with respect to πc , µc , Σc (note that they can be maximized
separately)
θk+1 = arg max_θ Q(θ|θk)

▶ note that Σ_{c=1}^{C} πc = 1

µc^(k+1) = Σ_{i=1}^{n} pi,c xi / Σ_{i=1}^{n} pi,c

Σc^(k+1) = Σ_{i=1}^{n} pi,c (xi − µc^(k+1))(xi − µc^(k+1))ᵀ / Σ_{i=1}^{n} pi,c

πc^(k+1) = (1/n) Σ_{i=1}^{n} pi,c
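A compact sketch of these updates (an illustrative implementation, not the course's code; the small ridge 1e-6·I added to each covariance is an extra assumption to keep the densities well defined):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, C, n_iter=100, seed=0):
    """Fit a C-component Gaussian mixture to X (n x d) with the E/M updates above."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(C, 1.0 / C)                      # mixing weights
    mu = X[rng.choice(n, size=C, replace=False)]  # means initialized at random data points
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(C)])

    for _ in range(n_iter):
        # E-step: responsibilities p_{i,c} proportional to pi_c * phi(x_i | mu_c, Sigma_c).
        resp = np.column_stack([
            pi[c] * multivariate_normal.pdf(X, mean=mu[c], cov=Sigma[c]) for c in range(C)
        ])
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: weighted means, covariances, and mixing proportions.
        Nc = resp.sum(axis=0)                     # expected number of samples per component
        mu = (resp.T @ X) / Nc[:, None]
        for c in range(C):
            diff = X - mu[c]
            Sigma[c] = (resp[:, c, None] * diff).T @ diff / Nc[c] + 1e-6 * np.eye(d)
        pi = Nc / n
    return pi, mu, Sigma, resp
```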
Interpretation

▶ pi,c: the probability that sample i belongs to component c
▶ πc^(k+1): the expected number of samples belonging to component c, divided by n
▶ soft assignment: xi belongs to component c with assignment probability pi,c
▶ µc^(k+1): “average” centroid using the soft assignment
▶ Σc^(k+1): “average” covariance using the soft assignment

[Figure: a sample xi with posterior probabilities P(yi = j|xi), e.g. 0.5, 0.3, 0.2 over components 1, 2, 3]
k-means
▶ K-means: “hard” assignment
▶ EM algorithm: “soft” assignment; in the end, pi,c can be viewed as a soft label for each
sample, which can be converted into a hard label (see the snippet below):

ĉi = arg max_{c=1,...,C} pi,c

[Figure: a sample xi with hard assignment, probability 1 for one component and 0 for the others]
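A one-line conversion from soft responsibilities to hard labels (the example resp matrix here is made up; in practice it would come from the em_gmm sketch above):

```python
import numpy as np
# resp: (n, C) matrix of responsibilities p_{i,c}.
resp = np.array([[0.5, 0.3, 0.2],
                 [0.1, 0.8, 0.1]])
hard_labels = np.argmax(resp, axis=1)   # most probable component per sample -> [0, 1]
```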
Demo
▶ The wine data set was introduced by Forina et al. (1986)
▶ It originally includes the results of 27 chemical measurements on 178 wines made
in the same region of Italy but derived from three different cultivars: Barolo,
Grignolino and Barbera
▶ We use the first two principal components of the data
Mixture of 3 Gaussian components
▶ First run PCA to reduce the data dimension to 2
▶ Use pi,c, c = 1, 2, 3 as the proportions of the “red”, “green”, and “blue” color components (a rough reproduction is sketched below)
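One rough way to reproduce this demo (a sketch only; it uses scikit-learn's bundled copy of the wine data, which keeps 13 of the original measurements, and scikit-learn's GMM fit rather than the course code):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

X = load_wine().data                        # 178 wines, 13 chemical measurements
X2 = PCA(n_components=2).fit_transform(X)   # first two principal components

gmm = GaussianMixture(n_components=3, random_state=0).fit(X2)
resp = gmm.predict_proba(X2)                # p_{i,c}: soft assignment to the 3 components

# Color each wine by its responsibilities: RGB = (p_{i,1}, p_{i,2}, p_{i,3}).
plt.scatter(X2[:, 0], X2[:, 1], c=resp)
plt.xlabel("PC 1"); plt.ylabel("PC 2")
plt.show()
```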
Properties of EM

▶ EM algorithm converges to a local maximum
▶ Heuristic: escape a local maximum by trying random restarts
▶ EM works on improving Q(θ|θ′ ) rather than directly improving log f (x|θ)
▶ one can show that improvement on Q(θ|θ′ ) improves log f (x|θ)
▶ EM works well with exponential families
▶ E-step: a sum of expectations of the sufficient statistics
▶ M-step: maximizing a linear function of these statistics, so a closed-form update is usually available
Convergence of EM

▶ Proof by A. Dempster, N. Laird, and D. Rubin in 1977, later generalized by C. F. J. Wu in 1983
▶ Basic idea: find a sequence of lower bounds for the likelihood function
▶ EM monotonically increases the observed-data log-likelihood

ℓ(θk+1 ) ≥ Q(θk+1 ; θk ) ≥ Q(θk ; θk ) = ℓ(θk )
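A short sketch of the standard monotonicity argument (the usual decomposition following Dempster, Laird, and Rubin; written here in LaTeX, not copied verbatim from the slides):

```latex
% Decompose the observed-data log-likelihood using f(O|\theta) = f(S,O|\theta)/f(S|O,\theta):
%   \ell(\theta) = Q(\theta;\theta_k) + H(\theta;\theta_k),
%   where H(\theta;\theta_k) := -\mathbb{E}\{\log f(S\mid O,\theta)\mid O,\theta_k\}.
% Gibbs' (Jensen's) inequality gives H(\theta;\theta_k) \ge H(\theta_k;\theta_k) for all \theta, hence
\ell(\theta_{k+1}) - \ell(\theta_k)
  = \underbrace{Q(\theta_{k+1};\theta_k) - Q(\theta_k;\theta_k)}_{\ge\, 0 \text{ (M-step)}}
  + \underbrace{H(\theta_{k+1};\theta_k) - H(\theta_k;\theta_k)}_{\ge\, 0 \text{ (Gibbs' inequality)}}
  \;\ge\; 0.
```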
