Module 13: Gaussian Mixture Model
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Gaussian Distribution
• Univariate Gaussian Distribution:
• $f(x \mid \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\dfrac{(x-\mu)^2}{2\sigma^2}\right)$
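As a quick illustration (the function name and example values below are ours, not from the slides), this density can be evaluated directly with NumPy:

```python
# Minimal sketch: evaluating the univariate Gaussian density f(x | mu, sigma).
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density at x with mean mu and standard deviation sigma."""
    coef = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coef * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Example: the standard normal evaluated at x = 0 gives about 0.3989.
print(gaussian_pdf(0.0, mu=0.0, sigma=1.0))
```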
[Figure: three Gaussian distributions that generated the data points]

$$f(x) = \sum_{k=1}^{K} w_k \,\mathcal{N}(x \mid \mu_k, \Sigma_k)$$
Normalization and positivity of weights (mixing coefficients):
$$0 \le w_k \le 1, \qquad \sum_{k=1}^{K} w_k = 1$$
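A minimal sketch of evaluating this mixture density, with arbitrary illustrative component parameters (our own, not from the slides):

```python
# Sketch: f(x) = sum_k w_k * N(x | mu_k, Sigma_k) for a small 2-D example.
import numpy as np
from scipy.stats import multivariate_normal

weights = np.array([0.5, 0.3, 0.2])          # satisfy 0 <= w_k <= 1 and sum to 1
means = [np.array([0.0, 0.0]),
         np.array([3.0, 3.0]),
         np.array([-3.0, 2.0])]
covs = [np.eye(2), 2.0 * np.eye(2), np.eye(2)]

def mixture_pdf(x):
    """Evaluate the GMM density at a single point x."""
    return sum(w * multivariate_normal(m, c).pdf(x)
               for w, m, c in zip(weights, means, covs))

print(mixture_pdf(np.array([0.5, 0.5])))
```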
• Log-likelihood:
$$\ln f(X \mid \mu, \Sigma, W) = \sum_{i=1}^{N} \ln f(x_i) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} w_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$
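One way to compute this log-likelihood numerically (a hedged sketch; the helper name is ours, and a log-sum-exp is used for numerical stability):

```python
# Sketch: ln f(X | mu, Sigma, W) over a dataset X of shape (N, D).
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def gmm_log_likelihood(X, weights, means, covs):
    # log_probs[i, k] = ln w_k + ln N(x_i | mu_k, Sigma_k)
    log_probs = np.column_stack([
        np.log(w) + multivariate_normal(m, c).logpdf(X)
        for w, m, c in zip(weights, means, covs)
    ])
    # ln f(x_i) = ln sum_k exp(log_probs[i, k]); then sum over all N points
    return logsumexp(log_probs, axis=1).sum()
```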
Responsibilities
• The mixing coefficients can be thought of as prior probabilities
• For a given value of x, the posterior probabilities of the components can be calculated; these are also called “responsibilities”
• Using Bayes’ rule:
$$\gamma_k(x) = f(k \mid x) = \frac{f(x \mid k)\, f(k)}{f(x)} = \frac{w_k\, f_k(x)}{\sum_{l} w_l\, f_l(x)} = \frac{w_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)}{\sum_{l=1}^{K} w_l\, \mathcal{N}(x \mid \mu_l, \Sigma_l)}$$
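A short sketch of the E-step that evaluates these responsibilities for a whole dataset (an assumed helper, not the slides' code):

```python
# Sketch of the E-step: gamma_{ik} via Bayes' rule, normalized per data point.
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, weights, means, covs):
    # Unnormalized terms: w_k * N(x_i | mu_k, Sigma_k) for every point and component
    unnorm = np.column_stack([
        w * multivariate_normal(m, c).pdf(X)
        for w, m, c in zip(weights, means, covs)
    ])
    # Divide by f(x_i) = sum_l w_l N(x_i | mu_l, Sigma_l) so each row sums to 1
    return unnorm / unnorm.sum(axis=1, keepdims=True)
```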
Maximizing the log-likelihood with respect to the weights gives $w_k = \dfrac{N_k}{N}$, subject to the constraint

$$\sum_{k=1}^{K} w_k = 1$$
This can be achieved using a Lagrange multiplier and maximizing
$$\ln f(X \mid \mu, \Sigma, W) + \lambda\left(\sum_{k=1}^{K} w_k - 1\right)$$
Resulting in $w_k^{\text{new}} = \dfrac{N_k}{N}$, where $N_k = \sum_{n=1}^{N} \gamma(z_{nk})$.
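A minimal sketch of this weight update, assuming the responsibilities from the E-step are available as an N×K matrix (names are ours):

```python
# Sketch of the M-step weight update from the Lagrange-multiplier argument:
# N_k = sum_n gamma_{nk} and w_k_new = N_k / N.
import numpy as np

def update_weights(gamma):
    """gamma: (N, K) responsibility matrix from the E-step."""
    N_k = gamma.sum(axis=0)        # effective number of points assigned to each component
    return N_k / gamma.shape[0]    # w_k_new = N_k / N
```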
• Evaluate $\ln f(X \mid \mu, \Sigma, W) = \sum_{i=1}^{N} \ln f(x_i) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} w_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)$
• Iterate through the E-step and M-step until the log-likelihood converges.
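Putting the steps together, a compact illustrative EM loop for a GMM (our sketch, not the slides' reference code; the mean and covariance updates are the standard M-step formulas from Bishop (2006), Ch. 9, and a small regularization term keeps the covariances positive definite):

```python
# Illustrative EM for a Gaussian mixture: E-step, M-step, convergence check.
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def fit_gmm_em(X, K, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, D = X.shape
    # Simple random initialization (a K-means initialization is discussed below)
    weights = np.full(K, 1.0 / K)
    means = X[rng.choice(N, size=K, replace=False)]
    covs = np.array([np.cov(X.T) + 1e-6 * np.eye(D) for _ in range(K)])

    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: log of w_k * N(x_i | mu_k, Sigma_k), then normalized responsibilities
        log_p = np.column_stack([
            np.log(weights[k]) + multivariate_normal(means[k], covs[k]).logpdf(X)
            for k in range(K)
        ])
        ll = logsumexp(log_p, axis=1).sum()
        gamma = np.exp(log_p - logsumexp(log_p, axis=1, keepdims=True))

        # M-step: re-estimate weights, means, and covariances from responsibilities
        N_k = gamma.sum(axis=0)
        weights = N_k / N
        means = (gamma.T @ X) / N_k[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (gamma[:, k, None] * diff).T @ diff / N_k[k] + 1e-6 * np.eye(D)

        # Stop when the log-likelihood no longer improves
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, covs
```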
Expectation Maximization (EM)
EM Algorithm
• Since K-means is faster, it is common to run the K-means algorithm to
find a suitable initialization for a Gaussian mixture model that is
subsequently adapted using EM.
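For example, assuming scikit-learn is available, its GaussianMixture estimator uses a K-means-based initialization by default via init_params="kmeans" (the toy data below is our own illustration):

```python
# K-means-initialized GMM fit on toy two-cluster data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)),
               rng.normal(5.0, 1.0, size=(200, 2))])

gmm = GaussianMixture(n_components=2, init_params="kmeans", max_iter=200)
gmm.fit(X)
print(gmm.weights_)   # estimated mixing coefficients
print(gmm.means_)     # estimated component means
```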