Mixture Models and the EM Algorithm
Christopher M. Bishop
Microsoft Research, Cambridge
2006 Advanced Tutorial Lecture Series, CUED
[Title-slide figure: panels (a) and (b)]
Applications of Machine Learning
http://research.microsoft.com/~cmbishop/PRML
Mixture Models and EM
• K-means clustering
• Gaussian mixture model
• Maximum likelihood and EM
• Bayesian GMM and variational inference
Old Faithful Data Set
[Figure: time between eruptions (minutes) plotted against duration of eruption (minutes)]

K-means Cost Function
• Responsibilities $r_{nk} \in \{0, 1\}$ assign data points to clusters, such that $\sum_k r_{nk} = 1$ for each $n$
• Cost function to be minimized:
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2$$
where the $x_n$ are the data, the $r_{nk}$ are the responsibilities, and the $\mu_k$ are the prototypes (cluster centres)
Minimizing the Cost Function
• E-step: minimize $J$ w.r.t. the responsibilities $r_{nk}$, i.e. assign each data point to its nearest prototype
• M-step: minimize $J$ w.r.t. the prototypes $\mu_k$, i.e. set each prototype to the mean of the points assigned to it
• Each step reduces $J$, so the iteration converges, though possibly to a local minimum (a sketch of the algorithm follows)
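A minimal NumPy sketch of this two-phase iteration (an illustration, not code from the slides; the function name and defaults are my own):

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    # Initialize prototypes to K randomly chosen data points.
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # E-step: assign each point to its nearest prototype.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
        assign = d2.argmin(axis=1)
        # M-step: move each prototype to the mean of its assigned points.
        for k in range(K):
            if np.any(assign == k):
                mu[k] = X[assign == k].mean(axis=0)
    # Final assignments and cost J = sum_n ||x_n - mu_assign(n)||^2.
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)
    J = d2[np.arange(len(X)), assign].sum()
    return mu, assign, J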
The Gaussian Distribution
• Multivariate Gaussian with mean $\mu$ and covariance $\Sigma$:
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu)^{\mathrm{T}} \Sigma^{-1} (x - \mu)\right\}$$
[Figure: contours of constant density in $(x_1, x_2)$ for (a) general, (b) diagonal, and (c) isotropic covariance]
Likelihood Function
• Data set $X = \{x_1, \ldots, x_N\}$, with the points assumed drawn independently from the distribution
• Viewed as a function of the parameters, the probability of the observed data is the likelihood function:
$$p(X \mid \mu, \Sigma) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \Sigma)$$
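As a concrete reading of these two formulas (an illustrative sketch, not from the slides), the log density and the i.i.d. log likelihood can be evaluated stably via a Cholesky factorization:

import numpy as np

def gauss_logpdf(X, mu, Sigma):
    # log N(x | mu, Sigma) for each row of X; returns shape (N,).
    D = len(mu)
    L = np.linalg.cholesky(Sigma)            # Sigma = L @ L.T
    z = np.linalg.solve(L, (X - mu).T)       # whitened residuals, (D, N)
    logdet = 2.0 * np.log(np.diag(L)).sum()  # log |Sigma|
    return -0.5 * (D * np.log(2 * np.pi) + logdet + (z ** 2).sum(axis=0))

def log_likelihood(X, mu, Sigma):
    # ln p(X | mu, Sigma) = sum_n ln N(x_n | mu, Sigma) for i.i.d. data.
    return gauss_logpdf(X, mu, Sigma).sum()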
Gaussian Mixtures
• Linear superposition of Gaussians:
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
• Normalization and positivity require $0 \le \pi_k \le 1$ and $\sum_k \pi_k = 1$
[Figure (a): contours of the individual components of a mixture of three Gaussians]
Contours of Probability Distribution
[Figure (b): contours of the mixture density $p(x)$]
Surface Plot
[Figure: surface plot of the mixture density]
Sampling from the Gaussian
• To generate a data point:
– first pick one of the components with probability $\pi_k$ (the mixing coefficient)
– then draw a sample from that component
• Repeat these two steps for each new data point
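This two-step ancestral sampling procedure translates directly into code; a sketch (names are my own):

import numpy as np

def sample_gmm(pi, mus, Sigmas, N, seed=0):
    # For each data point: pick a component k with probability pi_k,
    # then draw a sample from N(x | mu_k, Sigma_k).
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(pi), size=N, p=pi)   # component labels
    X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return X, ks

Keeping the labels ks alongside X gives the complete data set shown next; discarding them gives the unlabelled set considered afterwards.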
Synthetic Data Set
[Figure (a): samples drawn from the mixture, colour coded by the component that generated them]
Fitting the Gaussian Mixture
• We wish to invert this process – given the data set, find
the corresponding parameters:
– mixing coefficients $\pi_k$
– means $\mu_k$
– covariances $\Sigma_k$
• If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster (see the sketch after this slide)
• Problem: the data set is unlabelled
• We shall refer to the labels as latent (= hidden) variables
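For concreteness, here is that complete-data maximum likelihood fit (an illustrative sketch; the function name is my own):

import numpy as np

def fit_labelled(X, ks, K):
    # Complete-data maximum likelihood: fit each Gaussian to the cluster
    # of points it generated; mixing coefficients are cluster fractions.
    pi = np.array([(ks == k).mean() for k in range(K)])
    mus = np.stack([X[ks == k].mean(axis=0) for k in range(K)])
    Sigmas = np.stack([np.cov(X[ks == k].T, bias=True) for k in range(K)])  # ML covariances
    return pi, mus, Sigmas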
Synthetic Data Set Without Labels
[Figure (b): the same samples with the colour coding removed; only this incomplete data set is actually observed]
Posterior Probabilities
• We can think of the mixing coefficients as prior
probabilities for the components
• For a given value of $x$ we can evaluate the corresponding posterior probabilities, called responsibilities
• These are given from Bayes' theorem by
$$\gamma_k(x) = p(k \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$$
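Numerically, each responsibility is a component density weighted by its prior and normalized across components; a sketch using SciPy (function name my own):

import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mus, Sigmas):
    # gamma_{nk} = pi_k N(x_n | mu_k, Sigma_k), normalized over k
    # (Bayes' theorem with the mixing coefficients as priors).
    dens = np.stack([pi[k] * multivariate_normal(mus[k], Sigmas[k]).pdf(X)
                     for k in range(len(pi))], axis=1)   # (N, K)
    return dens / dens.sum(axis=1, keepdims=True)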
Posterior Probabilities (colour coded)
[Figure (a): each data point coloured according to its responsibilities under the three components]
Latent Variables
[Figure: (a) complete data, colour coded by the latent component labels; (b) incomplete data, labels unknown; (c) the same points coloured by responsibilities, i.e. the posterior distribution over the labels]
Maximum Likelihood for the GMM
• The log likelihood function takes the form
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}$$
• The sum over components appears inside the logarithm, so there is no closed-form solution; we then consider an iterative scheme, the EM algorithm, which alternates between evaluating responsibilities and re-estimating the parameters
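As a concrete reference, the standard textbook EM iteration for the GMM, written as a sketch (function name and defaults are my own):

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, pi, mus, Sigmas, n_iters=100):
    # EM for a Gaussian mixture. X: (N, D); pi: (K,); mus: (K, D);
    # Sigmas: (K, D, D). Returns the updated parameters.
    N, K = len(X), len(pi)
    pi, mus, Sigmas = np.array(pi, float), np.array(mus, float), np.array(Sigmas, float)
    for _ in range(n_iters):
        # E-step: responsibilities gamma_{nk} via Bayes' theorem.
        dens = np.stack([pi[k] * multivariate_normal(mus[k], Sigmas[k]).pdf(X)
                         for k in range(K)], axis=1)     # (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from weighted statistics.
        Nk = gamma.sum(axis=0)                           # effective counts
        pi = Nk / N
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    return pi, mus, Sigmas

Each iteration is guaranteed not to decrease the log likelihood above.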
Variational Inference: Univariate Gaussian Example
• Likelihood function $p(X \mid \mu, \tau)$: Gaussian with unknown mean $\mu$ and precision $\tau$, with conjugate Gaussian-Gamma priors
• Factorized approximation to the posterior: $q(\mu, \tau) = q_\mu(\mu) \, q_\tau(\tau)$
[Figure, panels (a)-(d): contours of the true posterior over $(\mu, \tau)$ and of the factorized variational approximation; (b) after updating $q_\mu$, (c) after updating $q_\tau$, (d) converged solution]
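The coordinate-ascent updates behind these panels, assuming the standard conjugate model with priors $p(\mu \mid \tau) = \mathcal{N}(\mu_0, (\lambda_0 \tau)^{-1})$ and $p(\tau) = \mathrm{Gam}(a_0, b_0)$ (a sketch; names and defaults are my own):

import numpy as np

def vb_gaussian(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, n_iters=20):
    # Alternate the two factor updates until convergence:
    # q_mu(mu) = N(mu_N, 1/lam_N), q_tau(tau) = Gam(a_N, b_N).
    N, xbar = len(x), x.mean()
    E_tau = a0 / b0                                  # initial E[tau]
    for _ in range(n_iters):
        # Update q_mu holding q_tau fixed.
        mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
        lam_N = (lam0 + N) * E_tau
        # Update q_tau holding q_mu fixed.
        a_N = a0 + (N + 1) / 2
        E_sq = ((x - mu_N) ** 2).sum() + N / lam_N   # E_mu[sum_n (x_n - mu)^2]
        b_N = b0 + 0.5 * (E_sq + lam0 * ((mu_N - mu0) ** 2 + 1 / lam_N))
        E_tau = a_N / b_N
    return mu_N, lam_N, a_N, b_N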
Variational Equations for GMM
• The same factorized scheme applies to the Bayesian GMM, with approximate posterior $q(Z, \pi, \mu, \Sigma) = q(Z) \, q(\pi, \mu, \Sigma)$
Sufficient Statistics
• Posterior probabilities (responsibilities) $r_{nk} = \mathbb{E}[z_{nk}]$
• The update equations depend on the data only through the responsibility-weighted statistics
$$N_k = \sum_n r_{nk}, \qquad \bar{x}_k = \frac{1}{N_k} \sum_n r_{nk} \, x_n, \qquad S_k = \frac{1}{N_k} \sum_n r_{nk} (x_n - \bar{x}_k)(x_n - \bar{x}_k)^{\mathrm{T}}$$
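Computed directly (a sketch; these same quantities feed the EM M-step above):

import numpy as np

def weighted_stats(X, r):
    # N_k, xbar_k, S_k from responsibilities r of shape (N, K).
    Nk = r.sum(axis=0)                                    # (K,)
    xbar = (r.T @ X) / Nk[:, None]                        # (K, D)
    S = np.stack([(r[:, k, None] * (X - xbar[k])).T @ (X - xbar[k]) / Nk[k]
                  for k in range(r.shape[1])])            # (K, D, D)
    return Nk, xbar, S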