Expectation-Maximization Clustering V2
1. Expectation-Maximization
1.1. Definition
1.2. Intuition behind EM
1.3. Mathematical formulation
1.4. EM for Clustering (Soft Assignment)
1.4.1 Mixture Models
1.4.2 Example
1.4.3 Complexity
Conclusion
2. Mean Shift Clustering
2.1 Definition
2.1.1 Advantages
2.1.2 How Does Mean-Shift Clustering Work?
2.2 Example
2.3 Complexity
1. Expectation-Maximization:
1.1. Definition:
The Expectation-Maximization (EM) algorithm is an iterative optimization method for finding maximum likelihood or maximum a posteriori (MAP) estimates of the parameters of statistical models that involve unobserved latent variables. Each iteration alternates two steps:
- E-step: the algorithm computes the expected values of the latent variables using the current parameter estimates.
- M-step: the algorithm determines the parameters that maximize the expected log-likelihood obtained in the E-step, and the corresponding model parameters are updated.
1.2. Intuition behind EM:
Case 1: known distribution parameters / missing values:
Suppose we have a variable X with values [1, 2, x] and X follows the Gaussian distribution N(1, 1). The best estimate for the missing value x is the mean, 1.
Conversely, if I know all the values [1, 2, 3] and want to estimate µ, the best estimate is the arithmetic mean, 2.
To guess the missing value I need µ, and to estimate µ I need all the values. It is a chicken-and-egg problem, and this is where EM (Expectation-Maximization) comes into play. How?
We guess µ₀ = 0, so x₀ = 0; then µ₁ = (1 + 2 + 0)/3 = 1, so x₁ = 1, and so on. At some point this iterative process reaches the fixed point
$$\mu = \frac{1 + 2 + x}{3} = x \quad\Rightarrow\quad x = 1.5 = \mu.$$
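A minimal sketch of this iteration in Python (the observed values [1, 2] and the update rule come from the example above; the initial guess and iteration count are illustrative choices):

```python
# Chicken-and-egg iteration: alternate guessing the missing value x
# from mu (E-step) and re-estimating mu from all values (M-step).
observed = [1, 2]

mu = 0.0  # initial guess mu_0
for _ in range(20):
    x = mu                        # E-step: fill in the missing value
    mu = (sum(observed) + x) / 3  # M-step: re-estimate the mean

print(x, mu)  # both converge to 1.5
```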
1.4. EM for Clustering (Soft Assignment):
1.4.1 Mixture Models:
For each data point x (in red), we can compute the probability that it belongs to each component (cluster/distribution).
1.4.2 Example:
In this example our dataset is a set of one-dimensional points drawn from a mixture of two Gaussian distributions. We try to find out whether a specific point belongs to the red or the blue distribution.
Next, we calculate the probability that a point belongs to each distribution; these probabilities are called the responsibilities:
$$P(x_i \mid \text{blue}) = \frac{1}{\sqrt{2\pi\sigma_b^2}} \exp\!\left(-\frac{(x_i - \mu_b)^2}{2\sigma_b^2}\right)$$
$$b_i = \frac{P(x_i \mid \text{blue}) \cdot P(\text{blue})}{P(x_i \mid \text{blue}) \cdot P(\text{blue}) + P(x_i \mid \text{red}) \cdot P(\text{red})}$$
$$a_i = 1 - b_i$$
Here $b_i$ and $a_i$ are the probabilities that the point $x_i$ belongs to the blue and the red distribution, respectively.
Like K-means, EM is an iterative approach: the E-step and the M-step alternate until the parameters no longer change (convergence), at which point we obtain our clusters.
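A minimal sketch of this loop for two one-dimensional Gaussian components ("blue" and "red") is below. The data, initial values, and number of iterations are illustrative assumptions; the E-step uses the responsibility formulas above, and the M-step re-estimates the parameters from the weighted points.

```python
import numpy as np

# Two-component 1-D Gaussian mixture fitted with EM.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(3, 1, 100)])

mu = np.array([-1.0, 1.0])   # initial means
var = np.array([1.0, 1.0])   # initial variances
pi = np.array([0.5, 0.5])    # P(blue), P(red)

for _ in range(50):
    # E-step: responsibilities b_i (column 0) and a_i (column 1)
    dens = np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    resp = pi * dens
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate the parameters from the weighted points
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    pi = nk / len(x)

print(mu, var, pi)  # parameters stop changing once EM has converged
```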
1.4.3 Complexity:
Its time complexity is $O(NKD^3)$ per iteration, where N is the number of data points, K is the number of Gaussian components, and D is the problem dimension (the $D^3$ factor comes from manipulating the $D \times D$ covariance matrices).
For example, for a problem with 3 components in 2 dimensions and 200 points per cluster, the running time is around 2 minutes.
Conclusion:
The EM algorithm is very sensitive to initialization. A common recommendation is to first run K-Means (because it has a lower computational cost) and use the resulting centers as the initial means of the mixture components. Doing so substantially accelerates the convergence of the EM algorithm. I would add that it is also easier to find an appropriate number of clusters by running K-Means.
Nevertheless, the EM algorithm is considered better than K-Means because it provides additional information about the data, namely the dispersion (variance) of each cluster, not only its center.
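As an illustrative sketch of this recommendation with scikit-learn (assuming scikit-learn is available; the data below is made up, and `init_params='kmeans'`, the default, is the option that seeds the component means with K-Means):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Illustrative data: three 2-D Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.5, size=(200, 2))
               for loc in ([0, 0], [3, 3], [0, 4])])

# init_params='kmeans' initializes the component means with K-Means,
# which accelerates EM convergence as discussed above
gmm = GaussianMixture(n_components=3, init_params='kmeans', random_state=0)
gmm.fit(X)

print(gmm.means_)            # cluster centers
print(gmm.covariances_)      # per-cluster dispersion (what K-Means lacks)
resp = gmm.predict_proba(X)  # soft assignments (responsibilities)
```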
2. Mean Shift Clustering:
2.1 Definition:
Mean shift is a non-parametric, density-based clustering algorithm: it treats the data points as samples from an underlying density function and iteratively moves each point towards the nearest mode (peak) of that density; those modes become the cluster centers.
2.1.1 Advantages:
Unlike the popular K-Means clustering algorithm, mean shift does not require specifying the number of clusters in advance; the number of clusters is determined by the algorithm from the data.
It is particularly useful for datasets where the clusters have arbitrary shapes and are not well separated by linear boundaries.
2.1.2 How Does Mean-Shift Clustering Work?
Kernel Density Estimation: first, we estimate the density function of our data points using the KDE technique. We assign a kernel function to each data point; this function can be a Gaussian with zero mean and unit variance (Eq1). The assigned function (Eq2) is divided by a parameter h (the kernel bandwidth) so that it keeps a unit area.
The KDE is then the sum of these kernel functions over all points (Eq3), where n is the number of points.
$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right) \qquad \text{(Eq1)}$$
$$K_h(u) = \frac{1}{h}\, K\!\left(\frac{u}{h}\right) \qquad \text{(Eq2)}$$
$$\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i) \qquad \text{(Eq3)}$$
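A minimal sketch of Eq1 to Eq3 in Python (the data values and the bandwidth h below are illustrative assumptions):

```python
import numpy as np

points = np.array([1.0, 1.5, 2.0, 7.0, 7.5])
h = 0.8  # kernel bandwidth

def gaussian_kernel(u):
    # Eq1: Gaussian with zero mean and unit variance
    return np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)

def kde(x, data, h):
    # Eq2 + Eq3: average of bandwidth-scaled kernels centred on each point
    return np.mean(gaussian_kernel((x - data) / h) / h)

for x in np.linspace(0, 9, 5):
    print(x, round(kde(x, points, h), 4))  # density is high near the data
```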
Shifting Data Points: in the second step, the algorithm iteratively shifts the data points towards regions of higher density. The shift is determined by computing the mean shift vector for each data point; this vector is computed inside a region of interest defined by a radius R (the only parameter of the algorithm).
Convergence and Cluster Identification: the algorithm keeps shifting the data points until convergence is reached, that is, until the points stop moving significantly. At that stage, the data points have reached the modes of the density distribution.
Once convergence is achieved, the final position of each data point represents a cluster center, so points belonging to the same cluster converge to the same point (cluster center / mode).
Once the centroids are identified, the algorithm assigns each data point to the closest cluster center.
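A minimal sketch of the whole procedure with a flat (radius-R) window (the data, R, and the convergence tolerance are illustrative assumptions):

```python
import numpy as np

# Flat-kernel mean shift: each point is repeatedly replaced by the mean
# of the original data points within radius R of it.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
R = 1.0  # the only parameter of the algorithm

shifted = X.copy()
for _ in range(30):
    new = np.empty_like(shifted)
    for i, p in enumerate(shifted):
        # mean shift step: mean of the neighbours inside the radius-R window
        neighbours = X[np.linalg.norm(X - p, axis=1) <= R]
        new[i] = neighbours.mean(axis=0)
    if np.allclose(new, shifted, atol=1e-6):  # points stopped moving
        break
    shifted = new

# points that converged to (almost) the same location share a cluster
centers = np.unique(shifted.round(2), axis=0)
print(len(centers), centers)
```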
2.2 Example:
An example on the car.xls dataset (from lab 3, Tanagra), with PCA performed to visualize the results, gives 3 clusters, as shown below:
2.3 Complexity:
Scikit-learn's implementation of the algorithm has a lower runtime complexity than a naive implementation: it is usually around O(T·n·log(n)) in low dimensions, where n is the number of samples and T is the number of iterations. In higher dimensions, the complexity tends towards O(T·n²).
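A short usage sketch of that implementation (the data is made up; `estimate_bandwidth` and its `quantile` value are one way to pick the bandwidth from the data, so the number of clusters is never specified):

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

# Illustrative data: two 2-D blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(4, 0.5, (100, 2))])

# estimate the window size from the data itself
bandwidth = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bandwidth).fit(X)

print(ms.cluster_centers_)    # modes found by the algorithm
print(np.unique(ms.labels_))  # cluster labels per sample
```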