Mixture Models and the EM Algorithm
Christopher M. Bishop
Microsoft Research, Cambridge
2006 Advanced Tutorial Lecture Series, CUED
[Title-slide figure: panels (a) and (b)]
Applications of Machine Learning
http://research.microsoft.com/~cmbishop/PRML
Mixture Models and EM
• K-means clustering
• Gaussian mixture model
• Maximum likelihood and EM
• Bayesian GMM and variational inference
Old Faithful Data Set
[Figure: time between eruptions (minutes) plotted against duration of eruption (minutes)]

K-means Cost Function
• Responsibilities $r_{nk} \in \{0, 1\}$ assign data points to clusters, such that $\sum_k r_{nk} = 1$ for each $n$
• Cost function to be minimized:
$$J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \|x_n - \mu_k\|^2$$
where the $x_n$ are the data, the $r_{nk}$ are the responsibilities, and the $\mu_k$ are the prototypes (cluster centres)
Minimizing the Cost Function
• E-step: minimize $J$ w.r.t. the responsibilities $r_{nk}$, i.e. assign each data point to its nearest prototype
• M-step: minimize $J$ w.r.t. the prototypes $\mu_k$, i.e. set each prototype to the mean of the points assigned to it
• Each step reduces $J$, so the iteration converges, though possibly to a local minimum (a sketch of the algorithm follows)
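A minimal NumPy sketch of this two-phase iteration (an illustration, not code from the slides; the function name and defaults are my own):

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    # Initialize prototypes to K randomly chosen data points.
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    for _ in range(n_iters):
        # E-step: assign each point to its nearest prototype.
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
        assign = d2.argmin(axis=1)
        # M-step: move each prototype to the mean of its assigned points.
        for k in range(K):
            if np.any(assign == k):
                mu[k] = X[assign == k].mean(axis=0)
    # Final assignments and cost J = sum_n ||x_n - mu_assign(n)||^2.
    d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1)
    assign = d2.argmin(axis=1)
    J = d2[np.arange(len(X)), assign].sum()
    return mu, assign, J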
The Gaussian Distribution
• Multivariate Gaussian with mean $\mu$ and covariance $\Sigma$:
$$\mathcal{N}(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left\{-\frac{1}{2}(x - \mu)^{\mathrm{T}} \Sigma^{-1} (x - \mu)\right\}$$
[Figure: contours of constant density in $(x_1, x_2)$ for (a) general, (b) diagonal, and (c) isotropic covariance]
Likelihood Function
• Data set $X = \{x_1, \ldots, x_N\}$, with the points assumed drawn independently from the distribution
• Viewed as a function of the parameters, the probability of the observed data is the likelihood function:
$$p(X \mid \mu, \Sigma) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \Sigma)$$
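As a concrete reading of these two formulas (an illustrative sketch, not from the slides), the log density and the i.i.d. log likelihood can be evaluated stably via a Cholesky factorization:

import numpy as np

def gauss_logpdf(X, mu, Sigma):
    # log N(x | mu, Sigma) for each row of X; returns shape (N,).
    D = len(mu)
    L = np.linalg.cholesky(Sigma)            # Sigma = L @ L.T
    z = np.linalg.solve(L, (X - mu).T)       # whitened residuals, (D, N)
    logdet = 2.0 * np.log(np.diag(L)).sum()  # log |Sigma|
    return -0.5 * (D * np.log(2 * np.pi) + logdet + (z ** 2).sum(axis=0))

def log_likelihood(X, mu, Sigma):
    # ln p(X | mu, Sigma) = sum_n ln N(x_n | mu, Sigma) for i.i.d. data.
    return gauss_logpdf(X, mu, Sigma).sum()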
Gaussian Mixtures
• Linear superposition of Gaussians:
$$p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$
• Normalization and positivity require $0 \le \pi_k \le 1$ and $\sum_k \pi_k = 1$
[Figure (a): contours of the individual components of a mixture of three Gaussians]
Contours of Probability Distribution
[Figure (b): contours of the mixture density $p(x)$]
Surface Plot
[Figure: surface plot of the mixture density]
Sampling from the Gaussian
• To generate a data point:
– first pick one of the components with probability $\pi_k$ (the mixing coefficient)
– then draw a sample from that component
• Repeat these two steps for each new data point
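This two-step ancestral sampling procedure translates directly into code; a sketch (names are my own):

import numpy as np

def sample_gmm(pi, mus, Sigmas, N, seed=0):
    # For each data point: pick a component k with probability pi_k,
    # then draw a sample from N(x | mu_k, Sigma_k).
    rng = np.random.default_rng(seed)
    ks = rng.choice(len(pi), size=N, p=pi)   # component labels
    X = np.stack([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return X, ks

Keeping the labels ks alongside X gives the complete data set shown next; discarding them gives the unlabelled set considered afterwards.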
Synthetic Data Set
[Figure (a): samples drawn from the mixture, colour coded by the component that generated them]
Fitting the Gaussian Mixture
• We wish to invert this process – given the data set, find
the corresponding parameters:
– mixing coefficients $\pi_k$
– means $\mu_k$
– covariances $\Sigma_k$
• If we knew which component generated each data point, the maximum likelihood solution would involve fitting each component to the corresponding cluster (see the sketch after this slide)
• Problem: the data set is unlabelled
• We shall refer to the labels as latent (= hidden) variables
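For concreteness, here is that complete-data maximum likelihood fit (an illustrative sketch; the function name is my own):

import numpy as np

def fit_labelled(X, ks, K):
    # Complete-data maximum likelihood: fit each Gaussian to the cluster
    # of points it generated; mixing coefficients are cluster fractions.
    pi = np.array([(ks == k).mean() for k in range(K)])
    mus = np.stack([X[ks == k].mean(axis=0) for k in range(K)])
    Sigmas = np.stack([np.cov(X[ks == k].T, bias=True) for k in range(K)])  # ML covariances
    return pi, mus, Sigmas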
Synthetic Data Set Without Labels
[Figure (b): the same samples with the colour coding removed; only this incomplete data set is actually observed]
Posterior Probabilities
• We can think of the mixing coefficients as prior
probabilities for the components
• For a given value of $x$ we can evaluate the corresponding posterior probabilities, called responsibilities
• These are given from Bayes' theorem by
$$\gamma_k(x) = p(k \mid x) = \frac{\pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_j \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j)}$$
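Numerically, each responsibility is a component density weighted by its prior and normalized across components; a sketch using SciPy (function name my own):

import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pi, mus, Sigmas):
    # gamma_{nk} = pi_k N(x_n | mu_k, Sigma_k), normalized over k
    # (Bayes' theorem with the mixing coefficients as priors).
    dens = np.stack([pi[k] * multivariate_normal(mus[k], Sigmas[k]).pdf(X)
                     for k in range(len(pi))], axis=1)   # (N, K)
    return dens / dens.sum(axis=1, keepdims=True)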
Posterior Probabilities (colour coded)
[Figure (a): each data point coloured according to its responsibilities under the three components]
Latent Variables
[Figure: (a) complete data, colour coded by the latent component labels; (b) incomplete data, labels unknown; (c) the same points coloured by responsibilities, i.e. the posterior distribution over the labels]
Maximum Likelihood for the GMM
• The log likelihood function takes the form
$$\ln p(X \mid \pi, \mu, \Sigma) = \sum_{n=1}^{N} \ln \left\{ \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k) \right\}$$
• The sum over components appears inside the logarithm, so there is no closed-form solution; we then consider an iterative scheme, the EM algorithm, which alternates between evaluating responsibilities and re-estimating the parameters
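As a concrete reference, the standard textbook EM iteration for the GMM, written as a sketch (function name and defaults are my own):

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, pi, mus, Sigmas, n_iters=100):
    # EM for a Gaussian mixture. X: (N, D); pi: (K,); mus: (K, D);
    # Sigmas: (K, D, D). Returns the updated parameters.
    N, K = len(X), len(pi)
    pi, mus, Sigmas = np.array(pi, float), np.array(mus, float), np.array(Sigmas, float)
    for _ in range(n_iters):
        # E-step: responsibilities gamma_{nk} via Bayes' theorem.
        dens = np.stack([pi[k] * multivariate_normal(mus[k], Sigmas[k]).pdf(X)
                         for k in range(K)], axis=1)     # (N, K)
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from weighted statistics.
        Nk = gamma.sum(axis=0)                           # effective counts
        pi = Nk / N
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
    return pi, mus, Sigmas

Each iteration is guaranteed not to decrease the log likelihood above.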
Variational Inference: Univariate Gaussian Example
• Likelihood function $p(X \mid \mu, \tau)$: Gaussian with unknown mean $\mu$ and precision $\tau$, with conjugate Gaussian-Gamma priors
• Factorized approximation to the posterior: $q(\mu, \tau) = q_\mu(\mu) \, q_\tau(\tau)$
[Figure, panels (a)-(d): contours of the true posterior over $(\mu, \tau)$ and of the factorized variational approximation; (b) after updating $q_\mu$, (c) after updating $q_\tau$, (d) converged solution]
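The coordinate-ascent updates behind these panels, assuming the standard conjugate model with priors $p(\mu \mid \tau) = \mathcal{N}(\mu_0, (\lambda_0 \tau)^{-1})$ and $p(\tau) = \mathrm{Gam}(a_0, b_0)$ (a sketch; names and defaults are my own):

import numpy as np

def vb_gaussian(x, mu0=0.0, lam0=1.0, a0=1.0, b0=1.0, n_iters=20):
    # Alternate the two factor updates until convergence:
    # q_mu(mu) = N(mu_N, 1/lam_N), q_tau(tau) = Gam(a_N, b_N).
    N, xbar = len(x), x.mean()
    E_tau = a0 / b0                                  # initial E[tau]
    for _ in range(n_iters):
        # Update q_mu holding q_tau fixed.
        mu_N = (lam0 * mu0 + N * xbar) / (lam0 + N)
        lam_N = (lam0 + N) * E_tau
        # Update q_tau holding q_mu fixed.
        a_N = a0 + (N + 1) / 2
        E_sq = ((x - mu_N) ** 2).sum() + N / lam_N   # E_mu[sum_n (x_n - mu)^2]
        b_N = b0 + 0.5 * (E_sq + lam0 * ((mu_N - mu0) ** 2 + 1 / lam_N))
        E_tau = a_N / b_N
    return mu_N, lam_N, a_N, b_N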
Variational Equations for GMM
• The same factorized scheme applies to the Bayesian GMM, with approximate posterior $q(Z, \pi, \mu, \Sigma) = q(Z) \, q(\pi, \mu, \Sigma)$
Sufficient Statistics
• Posterior probabilities (responsibilities) $r_{nk} = \mathbb{E}[z_{nk}]$
• The update equations depend on the data only through the responsibility-weighted statistics
$$N_k = \sum_n r_{nk}, \qquad \bar{x}_k = \frac{1}{N_k} \sum_n r_{nk} \, x_n, \qquad S_k = \frac{1}{N_k} \sum_n r_{nk} (x_n - \bar{x}_k)(x_n - \bar{x}_k)^{\mathrm{T}}$$
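Computed directly (a sketch; these same quantities feed the EM M-step above):

import numpy as np

def weighted_stats(X, r):
    # N_k, xbar_k, S_k from responsibilities r of shape (N, K).
    Nk = r.sum(axis=0)                                    # (K,)
    xbar = (r.T @ X) / Nk[:, None]                        # (K, D)
    S = np.stack([(r[:, k, None] * (X - xbar[k])).T @ (X - xbar[k]) / Nk[k]
                  for k in range(r.shape[1])])            # (K, D, D)
    return Nk, xbar, S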