This document presents Lecture 5 of a course on Probabilistic Machine Learning, focusing on the Expectation Maximization (EM) algorithm and its application to Gaussian Mixture Models (GMMs). It outlines the concepts of latent variables, the steps involved in the EM algorithm, and the derivation of the algorithm for GMMs, along with potential issues such as local optima and label-switching. The lecture emphasizes the advantages of GMMs over k-means clustering and provides references for further reading.


Probabilistic Machine Learning

Lecture 5: Expectation maximization

Pekka Marttinen

Aalto University

February, 2025



Lecture 5 overview

Gaussian mixture models (GMMs), recap
EM algorithm
EM for Gaussian mixture models
Suggested reading: Bishop, Pattern Recognition and Machine Learning:
  p. 110-113 (2.3.9): Mixtures of Gaussians
  p. 430-443: EM for Gaussian mixtures
simple_example.pdf



GMMs, latent variable representation
Introduce latent variables z_n = (z_n1, . . . , z_nK), where z_n specifies the component k of observation x_n:

$$z_n = (0, \ldots, 0, \underbrace{1}_{k\text{th elem.}}, 0, \ldots, 0)^T$$

Define

$$p(z_n) = \prod_{k=1}^{K} \pi_k^{z_{nk}} \qquad \text{and} \qquad p(x_n \mid z_n) = \prod_{k=1}^{K} \mathcal{N}(x_n \mid \mu_k, \Sigma_k)^{z_{nk}}$$

Then the marginal distribution p(x_n) is a GMM:

$$p(x_n) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)$$
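
To make the latent-variable representation concrete, here is a minimal NumPy sketch (my own illustration, not code from the lecture; the parameter values are arbitrary): each x_n is generated by first drawing its component indicator z_n from Categorical(π) and then sampling from the corresponding Gaussian, and the marginal density is the weighted sum of the component densities.

```python
# Minimal sketch of the latent-variable view of a GMM (illustrative parameters).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

pi = np.array([0.3, 0.7])                         # mixing coefficients pi_k
mus = np.array([[0.0, 0.0], [3.0, 3.0]])          # component means mu_k
Sigmas = np.array([np.eye(2), 0.5 * np.eye(2)])   # component covariances Sigma_k

def sample_gmm(n):
    """Sample x_n by first drawing the component indicator z_n ~ Categorical(pi)."""
    z = rng.choice(len(pi), size=n, p=pi)
    x = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in z])
    return x, z

def gmm_density(x):
    """Marginal density p(x) = sum_k pi_k N(x | mu_k, Sigma_k)."""
    return sum(pi[k] * multivariate_normal.pdf(x, mus[k], Sigmas[k])
               for k in range(len(pi)))

X, Z = sample_gmm(500)
print(gmm_density(X[:3]))   # marginal density of the first three samples
```
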
GMM: responsibilities, complete data

The posterior probability (responsibility) p(z_nk = 1 | x_n) that observation x_n was generated by component k:

$$\gamma(z_{nk}) \equiv p(z_{nk} = 1 \mid x_n) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$

Complete data: the latent variables z and the data x together: (x, z)

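
As a sketch (again my own code, reusing the X, pi, mus and Sigmas arrays defined in the previous snippet), the responsibilities for a whole data set can be computed in vectorized form:

```python
# gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mus, Sigmas):
    # Numerators for every (n, k): rows index observations, columns index components.
    dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                            for k in range(len(pi))])
    return dens / dens.sum(axis=1, keepdims=True)   # normalize over components k

gamma = responsibilities(X, pi, mus, Sigmas)         # each row sums to one
```
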


Idea of the EM algorithm (1/2)

Let X denote the observed data and θ the model parameters. The goal in maximum likelihood is to find the estimate

$$\hat{\theta} = \arg\max_{\theta} \{ \log p(X \mid \theta) \}$$

If the model contains latent variables Z, the log-likelihood is given by

$$\log p(X \mid \theta) = \log \Big\{ \sum_{Z} p(X, Z \mid \theta) \Big\},$$

which may be difficult to maximize analytically.

Possible solutions: 1) numerical optimization, 2) the EM algorithm (expectation-maximization).



Idea of the EM algorithm (2/2)

X: observed data, Z: unobserved latent variables
{X, Z}: complete data, X: incomplete data

In the EM algorithm, we assume that the complete data log-likelihood

$$\log p(X, Z \mid \theta)$$

is easy to maximize.

Problem: Z is not observed.
Solution: maximize

$$Q(\theta, \theta_0) \equiv \mathbb{E}_{Z \mid X, \theta_0} [\log p(X, Z \mid \theta)] = \sum_{Z} p(Z \mid X, \theta_0) \log p(X, Z \mid \theta),$$

where p(Z | X, θ_0) is the posterior distribution of the latent variables computed using the current parameter estimate θ_0.
Illustration of the EM algorithm for GMMs

(Figure not reproduced: illustration of EM iterations for a GMM.)


EM algorithm in detail

Goal: maximize log p(X | θ) w.r.t. θ.

1. Initialize θ_0.
2. E-step: Evaluate p(Z | X, θ_0), and then compute

$$Q(\theta, \theta_0) = \mathbb{E}_{Z \mid X, \theta_0} [\log p(X, Z \mid \theta)] = \sum_{Z} p(Z \mid X, \theta_0) \log p(X, Z \mid \theta)$$

3. M-step: Evaluate θ_new using

$$\theta^{\text{new}} = \arg\max_{\theta} Q(\theta, \theta_0).$$

   Set θ_0 ← θ_new.
4. Repeat E and M steps until convergence.

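
The loop structure can be written down generically. The following is a minimal sketch (my own; e_step, m_step and log_lik are hypothetical callables supplied by the user, not an API from the course material):

```python
# Generic EM loop: alternate the E-step and M-step until the log-likelihood stabilizes.
def em(theta0, e_step, m_step, log_lik, max_iter=100, tol=1e-8):
    theta = theta0
    prev_ll = -float("inf")
    for _ in range(max_iter):
        stats = e_step(theta)      # E-step: p(Z | X, theta_0), or its sufficient statistics
        theta = m_step(stats)      # M-step: theta_new = argmax_theta Q(theta, theta_0)
        ll = log_lik(theta)        # monitor log p(X | theta); EM never decreases it
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return theta
```
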


Why EM works

Figure: 11.16 in Murphy (2012)

As a function of θ, Q(θ, θ_0) is a lower bound of the log-likelihood log p(X | θ) (plus a constant, see Bishop, Ch. 9.4).
EM iterates between 1) updating the lower bound (E-step) and 2) maximizing the lower bound (M-step).
EM algorithm, comments

In general, Z does not have to be discrete; just replace the summation in Q(θ, θ_0) by integration.
The EM algorithm can be used to compute the MAP (maximum a posteriori) estimate by maximizing Q(θ, θ_0) + log p(θ) in the M-step.
In general, the EM algorithm is applicable whenever the observed data X can be augmented into complete data {X, Z} such that log p(X, Z | θ) is easy to maximize; Z does not have to be latent variables but can represent, for example, unobserved values of missing or censored observations.



EM algorithm, simple example

Consider N independent observations x = (x_1, . . . , x_N) from a two-component mixture of univariate Gaussians:

$$p(x_n \mid \theta) = \frac{1}{2} \mathcal{N}(x_n \mid 0, 1) + \frac{1}{2} \mathcal{N}(x_n \mid \theta, 1). \tag{1}$$

One unknown parameter, θ, the mean of the second component.

Goal: estimate

$$\hat{\theta} = \arg\max_{\theta} \{ \log p(\mathbf{x} \mid \theta) \}.$$

simple_example.pdf
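
A compact NumPy sketch of EM for this toy model (my own illustration, not the contents of simple_example.pdf). The E-step computes the responsibility of the second component for each observation; the M-step sets θ to the responsibility-weighted mean of the data, which is what setting the derivative of Q to zero gives for model (1):

```python
# EM for p(x_n | theta) = 0.5 N(x_n | 0, 1) + 0.5 N(x_n | theta, 1),
# where only the second component's mean theta is unknown.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
true_theta = 4.0
# Simulated data: each point comes from component 0 or 1 with probability 1/2.
z = rng.integers(0, 2, size=200)
x = rng.normal(loc=z * true_theta, scale=1.0)

theta = 1.0                       # initial guess
for _ in range(50):
    # E-step: responsibility of the second component for each x_n.
    p0 = 0.5 * norm.pdf(x, 0.0, 1.0)
    p1 = 0.5 * norm.pdf(x, theta, 1.0)
    gamma = p1 / (p0 + p1)
    # M-step: responsibility-weighted mean of the data.
    theta = np.sum(gamma * x) / np.sum(gamma)

print(theta)   # should end up close to true_theta
```
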



EM algorithm for GMMs
The model:

$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

1. Initialize the means μ_k, covariances Σ_k and mixing coefficients π_k. Repeat until convergence:
2. E-step: Evaluate the responsibilities using the current parameter values:

$$\gamma(z_{nk}) = \frac{\pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \Sigma_j)}$$

3. M-step: Re-estimate the parameters using the current responsibilities:

$$\mu_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) \, x_n$$

$$\Sigma_k^{\text{new}} = \frac{1}{N_k} \sum_{n=1}^{N} \gamma(z_{nk}) (x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T$$

$$\pi_k^{\text{new}} = \frac{N_k}{N}, \qquad \text{where } N_k = \sum_{n=1}^{N} \gamma(z_{nk}).$$
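
The updates above translate almost line by line into NumPy. The following is one possible implementation (my own sketch, not the course code); the small ridge added to the covariances is a practical safeguard against singular covariances, anticipating the caveats slide below.

```python
# EM for a GMM with K components; X has shape (N, D).
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape

    # Initialization: uniform weights, K random data points as means, data covariance.
    pi = np.full(K, 1.0 / K)
    mus = X[rng.choice(N, size=K, replace=False)].copy()
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])

    for _ in range(n_iter):
        # E-step: gamma[n, k] = pi_k N(x_n | mu_k, Sigma_k) / sum_j pi_j N(x_n | mu_j, Sigma_j)
        dens = np.column_stack([pi[k] * multivariate_normal.pdf(X, mus[k], Sigmas[k])
                                for k in range(K)])
        gamma = dens / dens.sum(axis=1, keepdims=True)

        # M-step: N_k, then the weighted mean, covariance and mixing-coefficient updates.
        Nk = gamma.sum(axis=0)
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
        pi = Nk / N

    return pi, mus, Sigmas, gamma
```

For instance, `em_gmm(X, K=2)` applied to the data simulated in the first sketch should recover parameters close to the ones used there, up to label switching (see the caveats slide).
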
Derivation of the EM algorithm for GMMs

In the M-step, the formulas for μ_k^new and Σ_k^new are obtained by differentiating the expected complete data log-likelihood Q(θ, θ_0) with respect to the particular parameters and setting the derivatives to zero.
The formula for π_k^new can be derived by maximizing Q(θ, θ_0) under the constraint ∑_{k=1}^K π_k = 1. This can be done using Lagrange multipliers, as sketched below.
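
A brief sketch of that constrained maximization (the standard textbook argument; only the terms of Q that depend on π are kept, and λ denotes the Lagrange multiplier):

```latex
% Requires amsmath. Sketch of the pi_k update via a Lagrange multiplier.
\begin{align*}
  \mathcal{L} &= \sum_{n=1}^{N}\sum_{k=1}^{K} \gamma(z_{nk}) \log \pi_k
                 + \lambda \Big( \sum_{k=1}^{K} \pi_k - 1 \Big) \\
  \frac{\partial \mathcal{L}}{\partial \pi_k}
              &= \frac{\sum_{n=1}^{N} \gamma(z_{nk})}{\pi_k} + \lambda
               = \frac{N_k}{\pi_k} + \lambda = 0
  \quad\Longrightarrow\quad \pi_k = -\frac{N_k}{\lambda}. \\
  \intertext{Summing over $k$ and using $\sum_k \pi_k = 1$ and $\sum_k N_k = N$
             gives $\lambda = -N$, hence}
  \pi_k^{\text{new}} &= \frac{N_k}{N}.
\end{align*}
```
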



EM for GMM, caveats
EM converges to a local optimum. In fact, ML estimation for GMMs is not well-defined due to singularities: if σ_k → 0 for a component k containing a single data point, the likelihood goes to infinity (figure omitted). Remedy: a prior on σ_k; a simple practical version is sketched below.
Label-switching: non-identifiability due to the fact that the cluster labels can be switched while the likelihood remains the same.
In practice it is recommended to initialize the EM for the GMM by k-means.
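
One simple, commonly used way to act on that remedy in code (a sketch, not the lecture's specific prescription) is to add a small ridge to each covariance estimate in the M-step, which keeps Σ_k from collapsing onto a single data point:

```python
# Add a small ridge to a covariance matrix so it cannot become singular;
# apply to Sigma_k right after the M-step update.
import numpy as np

def regularize_covariance(Sigma, eps=1e-3):
    return Sigma + eps * np.eye(Sigma.shape[0])
```
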



GMM vs. k-means
"Why use GMMs and not just k-means?"

from Wikipedia

1. Clusters can be of different sizes and shapes.
2. Probabilistic assignment of data items to clusters.
3. Possibility to include prior knowledge (structure of the model / prior distributions on the parameters).
Important points

ML estimation of GMMs can be done using numerical optimization or the EM algorithm.
The main idea of the EM algorithm is to maximize the expectation of the complete data log-likelihood, where the expectation is computed with respect to the current posterior distributions (responsibilities) of the latent variables.

