
Clustering and Gaussian Mixture Model


Dr. Sayak Roychowdhury
Department of Industrial & Systems Engineering,
IIT Kharagpur
Reference
• Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Application of K-means Clustering
• Image segmentation and compression
• The goal of segmentation is to partition an image into regions, each of which has a reasonably homogeneous visual appearance or corresponds to objects or parts of objects.
• Each pixel in an image is a point in a 3-dimensional space comprising the intensities of the RGB channels.
• After running K-means to convergence for a particular value of K, the segmented image is obtained by re-drawing the image, replacing each pixel vector with the {R, G, B} intensity triplet given by the centre $\mu_k$ to which that pixel has been assigned.
• Data compression: K-means can be used for lossy data compression.
• Each data point is approximated by the nearest cluster centre $\mu_k$.
• This framework is often called vector quantization, and the vectors $\mu_k$ are called code-book vectors (a short code sketch follows below).
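A minimal sketch of this vector-quantization idea, assuming scikit-learn and Pillow are available; the file names "image.png" and "segmented.png" are placeholders:

```python
# A minimal sketch (not part of the original slides): K-means colour
# quantization of an RGB image. "image.png" is a placeholder file name.
import numpy as np
from sklearn.cluster import KMeans
from PIL import Image

K = 8                                         # number of code-book vectors
img = np.asarray(Image.open("image.png").convert("RGB"), dtype=float) / 255.0
h, w, _ = img.shape

pixels = img.reshape(-1, 3)                   # each pixel is a point in RGB space
kmeans = KMeans(n_clusters=K, n_init=10, random_state=0).fit(pixels)

# Replace every pixel with the centre mu_k it was assigned to (vector quantization)
compressed = kmeans.cluster_centers_[kmeans.labels_].reshape(h, w, 3)
Image.fromarray((compressed * 255).astype(np.uint8)).save("segmented.png")
```

Only the K code-book vectors and the per-pixel labels need to be stored, which is the source of the compression.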
Image Segmentation with K-means

[Figure: image segmentation with K-means. Source: Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.]
Gaussian Distribution
• Univariate Gaussian distribution:

$$f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{1}{2\sigma^2}(x-\mu)^2\right)$$

• Multivariate Gaussian distribution ($p$-dimensional):

$$f(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
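A short sketch of evaluating these two densities numerically, assuming NumPy and SciPy are available (the numbers are arbitrary example values):

```python
# A minimal sketch (not from the slides): evaluating the univariate and
# multivariate Gaussian densities with SciPy.
import numpy as np
from scipy.stats import norm, multivariate_normal

# Univariate: f(x | mu, sigma)
x = 1.5
mu, sigma = 0.0, 2.0
print(norm.pdf(x, loc=mu, scale=sigma))

# Multivariate: f(x | mu, Sigma) for p = 2
x_vec = np.array([1.0, -0.5])
mu_vec = np.zeros(2)
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
print(multivariate_normal.pdf(x_vec, mean=mu_vec, cov=Sigma))
```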
Gaussian Mixture

[Figure omitted. Source: Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.]
Gaussian Mixture

[Figure: the three Gaussian distributions that generated the data points. Source: Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.]
Gaussian Mixture

[Figure: the three Gaussian distributions that generated the data points, and the clustering obtained from the estimated posterior probabilities of the clusters under the GMM. Source: Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.]
Maximum Likelihood for Parameter Estimation

$$\ln f_k(x \mid \mu_k, \Sigma_k) = -\frac{1}{2}\ln|\Sigma_k| - \frac{1}{2}(x-\mu_k)^T \Sigma_k^{-1}(x-\mu_k) - \frac{p}{2}\ln(2\pi)$$

Differentiating and equating to 0:

$$\hat{\mu}_k = \frac{\sum_{g_i = k} x_i}{N_k}, \qquad \hat{\Sigma}_k = \frac{\sum_{g_i = k} (x_i - \hat{\mu}_k)(x_i - \hat{\mu}_k)^T}{N_k}$$

where $N_k$ is the number of data points in the $k$th cluster and $g_i = k$ indicates that $x_i$ is assigned to cluster $k$.
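A short sketch of these per-cluster maximum-likelihood estimates, assuming X is an (N, p) NumPy array and g holds hard cluster labels (e.g. obtained from K-means); the function name is hypothetical:

```python
# A minimal sketch: per-cluster ML estimates of mean and covariance,
# given hard assignments g in {0, ..., K-1}.
import numpy as np

def per_cluster_mle(X, g, K):
    """Maximum-likelihood mean and covariance of each cluster."""
    mus, Sigmas, Ns = [], [], []
    for k in range(K):
        Xk = X[g == k]                  # points with g_i = k
        Nk = len(Xk)
        mu_k = Xk.mean(axis=0)          # mu_hat_k = sum_{g_i=k} x_i / N_k
        diff = Xk - mu_k
        Sigma_k = diff.T @ diff / Nk    # Sigma_hat_k = sum (x_i - mu)(x_i - mu)^T / N_k
        mus.append(mu_k)
        Sigmas.append(Sigma_k)
        Ns.append(Nk)
    return np.array(mus), np.array(Sigmas), np.array(Ns)
```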


Gaussian Mixture
• Linear superposition of Gaussians:
$$f(x) = \sum_{k=1}^{K} w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)$$

Normalization and positivity of the weights (mixing coefficients):

$$0 \le w_k \le 1, \qquad \sum_{k=1}^{K} w_k = 1$$

• Log-likelihood:

$$\ln f(X \mid \mu, \Sigma, W) = \sum_{i=1}^{N} \ln f(x_i) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} w_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$
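A short sketch of this log-likelihood computation, assuming the mixture parameters are given as arrays and SciPy is used for the component densities; the function name is hypothetical:

```python
# A minimal sketch (not from the slides): log-likelihood of a Gaussian mixture,
# ln f(X | mu, Sigma, W) = sum_i ln sum_k w_k N(x_i | mu_k, Sigma_k).
import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, weights, means, covs):
    # densities[i, k] = w_k * N(x_i | mu_k, Sigma_k)
    densities = np.column_stack([
        w * multivariate_normal.pdf(X, mean=m, cov=S)
        for w, m, S in zip(weights, means, covs)
    ])
    return np.sum(np.log(densities.sum(axis=1)))
```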
Responsibilities
• The mixing coefficients can be thought of as prior probabilities
• For a given value of ‘x’, the posterior probabilities can be calculated, which are
also called “responsibilities”
• Using Bayes' rule:

$$\gamma_k(x) = f(k \mid x) = \frac{f(x \mid k)\, f(k)}{f(x)} = \frac{w_k f_k(x)}{\sum_{l} w_l f_l(x)} = \frac{w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} w_l \, \mathcal{N}(x \mid \mu_l, \Sigma_l)}$$

where $w_k = N_k / N$.

$\gamma_k(x)$ can also be interpreted as the expected value of the latent cluster-indicator variable, which is what the E-step of EM computes.
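A short sketch of the responsibility calculation, again assuming SciPy for the component densities; the function name is hypothetical:

```python
# A minimal sketch: responsibilities
# gamma_k(x) = w_k N(x | mu_k, Sigma_k) / sum_l w_l N(x | mu_l, Sigma_l).
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, weights, means, covs):
    weighted = np.column_stack([
        w * multivariate_normal.pdf(X, mean=m, cov=S)
        for w, m, S in zip(weights, means, covs)
    ])                                           # shape (N, K)
    return weighted / weighted.sum(axis=1, keepdims=True)
```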


Expectation Maximization (EM) Algorithm
• The EM algorithm is an iterative optimization technique.
• Expectation step (E-step): for the given parameter values, compute the expected values of the latent variables.
• Maximization step (M-step): update the parameters of the model based on the computed expected values of the latent variables.
Expectation Maximization (EM) Algorithm
• Given a Gaussian mixture model, the goal is to maximize the likelihood function with respect to the means, covariances, and mixing coefficients.
• Initialize $\mu_j$, $\Sigma_j$ and the mixing coefficients $w_j$, and evaluate the initial log-likelihood.
• Expectation step: evaluate the responsibilities using the current parameter values:

$$\gamma_k(x) = \frac{w_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} w_l \, \mathcal{N}(x \mid \mu_l, \Sigma_l)}$$
Expectation Maximization (EM) Algorithm
• Maximization step: Reestimate the parameters using current
responsibilities:
$$\mu_k^{\text{new}} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk})\, x_n, \qquad \text{where } N_k = \sum_{n=1}^{N} \gamma(z_{nk})$$
• The mean $\mu_k$ for the $k$th Gaussian component is obtained by taking a weighted mean of all of the points in the data set, in which the weighting factor for data point $x_n$ is given by the posterior probability $\gamma(z_{nk})$ that component $k$ was responsible for generating $x_n$.
Expectation Maximization (EM) Algorithm
• Setting the derivative of $\ln f(X \mid \mu, \Sigma, W)$ with respect to $\Sigma_k$ equal to 0 gives:

$$\Sigma_k^{\text{new}} = \frac{1}{N_k}\sum_{n=1}^{N} \gamma(z_{nk})\,(x_n - \mu_k^{\text{new}})(x_n - \mu_k^{\text{new}})^T$$

• Finally, maximize $\ln f(X \mid \mu, \Sigma, W)$ with respect to $w_k$ subject to the constraint

$$\sum_{k=1}^{K} w_k = 1$$

This can be achieved using a Lagrange multiplier and maximizing

$$\ln f(X \mid \mu, \Sigma, W) + \lambda\left(\sum_{k=1}^{K} w_k - 1\right),$$

resulting in

$$w_k^{\text{new}} = \frac{N_k}{N}, \qquad \text{where } N_k = \sum_{n=1}^{N} \gamma(z_{nk})$$

• Evaluate the log-likelihood

$$\ln f(X \mid \mu, \Sigma, W) = \sum_{i=1}^{N} \ln f(x_i) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} w_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$

and check for convergence.
• Iterate through E-step and M-step.
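Putting the E-step and M-step together, a compact sketch of the full EM loop is given below. It reuses the hypothetical responsibilities() and gmm_log_likelihood() helpers sketched earlier; the initialization scheme, the 1e-6·I regularization, and the stopping tolerance are arbitrary choices, not from the slides:

```python
# A minimal EM sketch for a Gaussian mixture (illustrative, not optimized;
# assumes the responsibilities() and gmm_log_likelihood() helpers above).
import numpy as np

def em_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, p = X.shape
    # Simple initialization: random data points as means, shared covariance, equal weights
    means = X[rng.choice(N, K, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(p) for _ in range(K)])
    weights = np.full(K, 1.0 / K)
    prev_ll = -np.inf
    for _ in range(n_iter):
        gamma = responsibilities(X, weights, means, covs)   # E-step, shape (N, K)
        Nk = gamma.sum(axis=0)                              # effective counts N_k
        means = (gamma.T @ X) / Nk[:, None]                 # mu_k^new
        for k in range(K):                                  # Sigma_k^new
            diff = X - means[k]
            covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(p)
        weights = Nk / N                                    # w_k^new
        ll = gmm_log_likelihood(X, weights, means, covs)    # monitor convergence
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, covs
```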
Expectation Maximization (EM)

[Figure: illustration of the EM algorithm for a Gaussian mixture. Source: Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.]
EM Algorithm
• Since K-means is faster, it is common to run the K-means algorithm to
find a suitable initialization for a Gaussian mixture model that is
subsequently adapted using EM.
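A brief sketch of this workflow with scikit-learn, assuming X is an (N, p) NumPy array; scikit-learn's GaussianMixture uses a K-means based initialization by default (init_params="kmeans"):

```python
# A brief sketch: fit a GMM whose EM run is initialized from K-means.
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.default_rng(0).normal(size=(500, 2))   # placeholder data
gmm = GaussianMixture(n_components=3, init_params="kmeans", random_state=0).fit(X)

labels = gmm.predict(X)        # hard cluster assignments
resp = gmm.predict_proba(X)    # responsibilities gamma_k(x)
print(gmm.weights_, gmm.means_)
```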
