
Mixture Models and EM Algorithm

S. Sumitra

Clustering problems can be solved by applying a model-based approach, which
consists of using certain models for the clusters and attempting to optimize the fit
between the data and the model. Each cluster (component) can be mathematically
represented by a parametric distribution, e.g., Gaussian (continuous) or Poisson
(discrete). The entire data set is therefore modelled by a mixture of these distributions.
An individual distribution used to model a specific cluster is often referred to
as a component distribution.
Let there be k clusters. Let the random variable C denote the component, with
values 1, \ldots, k. Here we are considering Gaussian mixture models, so
x_j/(C = i) \sim \mathcal{N}(\mu_i, \Sigma_i), where \mu_i and \Sigma_i are the mean and covariance matrix of the ith class.
A data point is generated by first choosing a component and then generating a
sample from that component. By the total probability theorem,

p(x) = \sum_{i=1}^{k} p(C = i)\, p(x/C = i)    (1)

[p(C = i) is analogous to p(y = i) in Gaussian discriminant analysis.]
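
As a small illustration of the generative story and Eq. (1), the following Python sketch draws points from a two-component Gaussian mixture and evaluates p(x) by the total probability rule. The parameter values and function names (sample, mixture_pdf) are illustrative, not taken from the notes, and numpy and scipy are assumed to be available.

# Sketch: sampling from a 2-component Gaussian mixture and evaluating p(x)
# via the total probability rule (Eq. 1). Parameter values are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)

w = np.array([0.3, 0.7])                              # component weights p(C = i)
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]     # component means
Sigma = [np.eye(2), np.array([[1.0, 0.5], [0.5, 1.0]])]  # component covariances

# Generative process: first choose a component, then sample from it.
def sample(n):
    comps = rng.choice(len(w), size=n, p=w)
    return np.array([rng.multivariate_normal(mu[c], Sigma[c]) for c in comps])

# Mixture density p(x) = sum_i p(C = i) p(x / C = i).
def mixture_pdf(x):
    return sum(w[i] * multivariate_normal(mu[i], Sigma[i]).pdf(x)
               for i in range(len(w)))

X = sample(5)
print(mixture_pdf(X))   # p(x) evaluated at each sampled point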


To determine to which cluster each x_j belongs, p(C = i/x_j) has to be found. Now

p(C = i/x_j) = p_{ij} = \frac{p(C = i)\, p(x_j/C = i)}{p(x_j)}, \quad i = 1, 2, \ldots, k, \; j = 1, 2, \ldots, N    (2)

Hence \sum_{i=1}^{k} p_{ij} = 1. Let w_i = p(C = i), i = 1, 2, \ldots, k. Therefore the unknown
parameters of a mixture of Gaussians are w_i, \mu_i and \Sigma_i.
The EM algorithm can be applied to determine the unknown parameters. The
EM algorithm has two main steps: the E step and the M step. In the E step, it assumes
values for the model parameters (that is, w_i, \mu_i and \Sigma_i) and finds p(C = i/x_j), i = 1, 2, \ldots, k,
j = 1, 2, \ldots, N. In the M step, it updates the parameters of the model. The process
iterates until convergence.

E step
In the E step, compute the probabilities p_{ij}, i = 1, 2, \ldots, k, j = 1, 2, \ldots, N.
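
A minimal sketch of the E step, assuming the mixture parameters are stored in numpy arrays as in the earlier snippet: it computes the membership probabilities p_{ij} = p(C = i/x_j) of Eq. (2). The function name e_step is a hypothetical helper, not named in the notes.

# Sketch of the E step: compute p_ij = p(C = i / x_j) for all i, j (Eq. 2).
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, w, mu, Sigma):
    N, k = X.shape[0], len(w)
    p = np.zeros((k, N))
    for i in range(k):
        # numerator of Eq. (2): p(C = i) p(x_j / C = i)
        p[i] = w[i] * multivariate_normal(mu[i], Sigma[i]).pdf(X)
    p /= p.sum(axis=0, keepdims=True)   # divide by p(x_j); each column now sums to 1
    return p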

M step
Compute the new mean, covariance and component weights as follows:
\mu_i = \frac{\sum_{j=1}^{N} p_{ij}\, x_j}{\sum_{j=1}^{N} p_{ij}}

[For a sure event, \mu_i = \frac{\sum_j 1\{x_j \in C = i\}\, x_j}{\sum_j 1\{x_j \in C = i\}}. Here, we don't know whether x_j is in
component i; we only know p(C = i/x_j).]

\Sigma_i = \frac{\sum_{j=1}^{N} p_{ij}\,(x_j - \mu_i)(x_j - \mu_i)^T}{\sum_{j=1}^{N} p_{ij}}

w_i = \frac{\sum_{j=1}^{N} p_{ij}}{N}
[Compare these formulas with those of Gaussian discriminant analysis]
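
The corresponding M step as a sketch, with the same array conventions as above: given the membership probabilities p_{ij} from the E step, it re-estimates the means, covariances and component weights by the weighted averages just stated. The helper name m_step is hypothetical.

# Sketch of the M step: re-estimate mu_i, Sigma_i, w_i from the p_ij.
import numpy as np

def m_step(X, p):
    k, N = p.shape
    counts = p.sum(axis=1)                 # effective number of points per component
    w = counts / N                         # new component weights
    mu = (p @ X) / counts[:, None]         # new weighted means
    Sigma = []
    for i in range(k):
        d = X - mu[i]                      # deviations of every point from mu_i
        Sigma.append((p[i, :, None] * d).T @ d / counts[i])   # weighted covariance
    return w, mu, np.array(Sigma)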
The algorithm can be summarized as follows:

Algorithm 1 EM algorithm
Initialize \mu_i, \Sigma_i, w_i, i = 1, 2, \ldots, k
Iterate until convergence:
  E Step
  for i = 1 to k do
    for j = 1 to N do
      calculate p(x_j/C = i) = \frac{1}{(2\pi)^{n/2} |\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(x_j - \mu_i)^T \Sigma_i^{-1} (x_j - \mu_i)\right)
      calculate p_{ij} = \frac{p(x_j/C = i)\, w_i}{\sum_{l=1}^{k} p(x_j/C = l)\, w_l}
    end for
    p_i = \sum_{j=1}^{N} p_{ij}
  end for
  M Step
  for i = 1 to k do
    calculate \mu_i = \frac{\sum_{j=1}^{N} p_{ij}\, x_j}{p_i}
    calculate \Sigma_i = \frac{\sum_{j=1}^{N} p_{ij}\,(x_j - \mu_i)(x_j - \mu_i)^T}{p_i}
    set w_i = \frac{p_i}{N}
  end for
end
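
Putting the two steps together, a sketch of the full iteration of Algorithm 1, reusing the e_step and m_step helpers defined in the earlier snippets. The random initialization and the stopping test on the change in the means are illustrative choices, not prescribed by the notes.

# Sketch of the full EM loop (Algorithm 1), reusing e_step and m_step above.
import numpy as np

def em(X, k, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    N, n = X.shape
    # Illustrative initialization: k random data points as means, identity covariances.
    mu = X[rng.choice(N, size=k, replace=False)]
    Sigma = np.array([np.eye(n) for _ in range(k)])
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        p = e_step(X, w, mu, Sigma)               # E step: membership probabilities p_ij
        w_new, mu_new, Sigma_new = m_step(X, p)   # M step: updated parameters
        converged = np.abs(mu_new - mu).max() < tol   # illustrative convergence test
        w, mu, Sigma = w_new, mu_new, Sigma_new
        if converged:
            break
    return w, mu, Sigma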

