
Project Data mining Ensam Meknes

1. Expectation-maximization
   1.1 Definition
   1.2 Intuition behind
   1.3 Mathematical formulation
   1.4 EM for Clustering (Soft assignment)
       1.4.1 Mixture Models
       1.4.2 Example
       1.4.3 Complexity
   Conclusion
2. Mean shift clustering
   2.1 Definition
       2.1.1 Advantages
       2.1.2 How Does Mean-Shift Clustering Work?
   2.2 Example
   2.3 Complexity

Figure 1: Example of a KDE function for 7 data points


1.Expectation-maximization:
1.1.Definition:
The Expectation-Maximization (EM) algorithm is an iterative optimization method used to find maximum likelihood or maximum a posteriori estimates of parameters in statistical models that involve unobserved latent variables.

There are two main steps:

- E-step: using the current parameter estimates, the algorithm computes the expected values of the latent variables and the expected complete-data log-likelihood.
- M-step: the algorithm finds the parameters that maximize the expected log-likelihood obtained in the E-step, and the model parameters are updated accordingly.

1.2.Intuition behind:
Case 1: Distribution parameters are Known/Missing values:

Let's consider a variable X with values [1, 2, x], where x is missing and X follows the Gaussian distribution N(1, 1). The best estimate for x is the mean value, i.e., x = 1.

Case 2: A parameter is unknown/No Missing values:

I know all the values [1, 2, 3], and I want to estimate µ, so the best value is the arithmetic mean = 2.

Case 3: A parameter is unknown/Missing values:

To guess the missing value I need µ, and to estimate µ I need all the values. It is a chicken-and-egg problem, and this is where EM (Expectation-Maximization) comes into play. How?

We guess µ0 = 0, then x0 = 0, then µ1 = (1 + 2 + 0)/3 = 1, so x1 = 1, and so on. At some point this iterative process reaches the fixed point µ = (1 + 2 + x)/3 = x, which gives x = µ = 1.5.
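This alternating process can be written as a minimal sketch in plain Python (the variable names and the fixed number of iterations are illustrative choices, not part of the original example):

    # Chicken-and-egg iteration from Case 3: values are [1, 2, x] with x missing, X ~ N(mu, 1).
    observed = [1, 2]
    mu = 0.0                          # initial guess mu_0
    for _ in range(50):
        x = mu                        # E-step: the best guess for the missing value is the current mean
        mu = (sum(observed) + x) / 3  # M-step: re-estimate mu from the completed data [1, 2, x]
    print(round(mu, 4))               # converges to 1.5, the fixed point of mu = (1 + 2 + x) / 3 = x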

1.3 Mathematical formulation:


- P(X = x | µ): the probability that X equals x, knowing that the distribution of X is N(µ, 1).
- L = log(likelihood(x1, x2, ..., xn | µ)) = log(p(x1 | µ)) + log(p(x2 | µ)) + ... + log(p(xn | µ))

1- Guess µ0.
2- E(log(likelihood(µ))) = ∫ p(x | µ0) · L dx; this function is used to estimate the next value of µ.
3- µ1 = argmax E(log(likelihood(µ))), the value that maximizes the E function.

=> The main applications of this algorithm are clustering tasks and handling missing values.


1.4. EM for Clustering: (Soft assignment)


Like K-means we start with a random guess of distributions/clusters and then proceed to improve iteratively by
alternating two steps:
1. (Expectation) Assign each data point to a cluster probabilistically.
2. (Maximization) Update the parameters for each cluster based on the points in the cluster (weighted by
their probability assigned in the first step).
Instead of making a "hard" assignment that gives each data point a single cluster (as K-means does), the model provides the probability that a given data point belongs to each cluster. This is called "soft" assignment. For clustering, EM is implemented through Mixture Models.
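As an illustration (not part of the original project), here is a minimal sketch of soft assignment with scikit-learn's GaussianMixture, which runs EM internally; the synthetic 2-D data and the choice of two components are assumptions made for this example:

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Two synthetic, well-separated 2-D clusters (illustrative data).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)),
                   rng.normal(5, 1, (100, 2))])

    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    probs = gmm.predict_proba(X)   # soft assignment: one probability per cluster for each point
    labels = gmm.predict(X)        # hard assignment derived from those probabilities
    print(probs[:3].round(3), labels[:3])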

1.4.1 Mixture Models:


A Mixture Model is expressed by the following equation:

$$p(x \mid \theta) = \sum_{j=1}^{k} \pi_j \, p_j(x), \qquad \sum_{j=1}^{k} \pi_j = 1, \quad \pi_j \ge 0$$

where k is the number of clusters and the πⱼ are the mixture weights.

A GMM (Gaussian Mixture Model) is a Mixture Model where each pⱼ(x) is a Gaussian distribution, so p(x | θ) is a finite combination of Gaussian distributions; θ is the collection of all the parameters of the model (mixture weights, means, and covariance matrices).

For each data point x, we can compute the probability that it belongs to each component (cluster/distribution).
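A small sketch of this computation for a single point, using scipy.stats.norm; the weights and the 1-D component parameters are made-up values for illustration:

    from scipy.stats import norm

    pis = [0.6, 0.4]                       # mixture weights (sum to 1)
    mus, sigmas = [0.0, 4.0], [1.0, 1.5]   # component means and standard deviations

    x = 1.2
    weighted = [pi * norm.pdf(x, mu, s) for pi, mu, s in zip(pis, mus, sigmas)]
    p_x = sum(weighted)                          # mixture density p(x) = sum_j pi_j * p_j(x)
    posteriors = [w / p_x for w in weighted]     # probability that x belongs to each component
    print(p_x, posteriors)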

1.4.2 Example:
In this example, our dataset is a set of 1-dimensional points and we have a mixture of two Gaussian distributions; we try to find out whether a specific point belongs to the red or to the blue distribution.


Next, we calculate the probability that each point belongs to each distribution; these probabilities are called the responsibilities:

$$P(x_i \mid \text{blue}) = \frac{1}{\sqrt{2\pi\sigma_b^2}} \exp\left(-\frac{(x_i - \mu_b)^2}{2\sigma_b^2}\right)$$

$$b_i = \frac{P(x_i \mid \text{blue}) \cdot P(\text{blue})}{P(x_i \mid \text{blue}) \cdot P(\text{blue}) + P(x_i \mid \text{red}) \cdot P(\text{red})}, \qquad a_i = 1 - b_i$$

where b_i and a_i are the probabilities that a point x_i belongs to the blue and to the red distribution, respectively.

Then we update the parameters of each distribution (mean and variance):


$$\mu_b = \frac{b_1 x_1 + b_2 x_2 + \dots + b_n x_n}{b_1 + b_2 + \dots + b_n}, \qquad \mu_a = \frac{a_1 x_1 + a_2 x_2 + \dots + a_n x_n}{a_1 + a_2 + \dots + a_n}$$

$$\sigma_b^2 = \frac{b_1 (x_1 - \mu_b)^2 + b_2 (x_2 - \mu_b)^2 + \dots + b_n (x_n - \mu_b)^2}{b_1 + b_2 + \dots + b_n}$$

$$\sigma_a^2 = \frac{a_1 (x_1 - \mu_a)^2 + a_2 (x_2 - \mu_a)^2 + \dots + a_n (x_n - \mu_a)^2}{a_1 + a_2 + \dots + a_n}$$
We can observe that if the b_i and a_i were restricted to 0 or 1 (hard assignment), the mean updates would reduce to the centroid-update formulas used in K-means.


Like K-means, EM is an iterative approach. After several iterations the parameters stop changing (convergence) and we end up with our clusters.
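The whole example can be condensed into a short from-scratch sketch that applies exactly these update formulas; the six 1-D points, the initial guesses, and the equal priors P(blue) = P(red) are assumptions made for illustration:

    import numpy as np

    x = np.array([1.0, 1.5, 2.0, 7.5, 8.0, 9.0])   # illustrative 1-D data
    mu_b, mu_a, var_b, var_a = 1.0, 8.0, 1.0, 1.0  # initial guesses for the blue/red parameters

    def gauss(x, mu, var):
        # P(x_i | component) for a 1-D Gaussian
        return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    for _ in range(20):
        # E-step: responsibilities b_i and a_i = 1 - b_i (equal priors cancel out)
        b = gauss(x, mu_b, var_b) / (gauss(x, mu_b, var_b) + gauss(x, mu_a, var_a))
        a = 1 - b
        # M-step: weighted means and variances, as in the formulas above
        mu_b, mu_a = (b * x).sum() / b.sum(), (a * x).sum() / a.sum()
        var_b = (b * (x - mu_b) ** 2).sum() / b.sum()
        var_a = (a * (x - mu_a) ** 2).sum() / a.sum()

    print(mu_b, mu_a, var_b, var_a)   # parameters stop changing once EM has converged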

1.4.3 Complexity:
Its time complexity is O(NKD³), where N is the number of data points, K is the number of Gaussian components, and D is the problem dimension.
For example, for a problem with 3 components in 2 dimensions and 200 points per cluster, the running time is around 2 minutes.

Conclusion:
The EM algorithm is very sensitive to initialization. A common recommendation is to run K-Means first (because it has a lower computational cost) and use the resulting centers as the initial means of the mixture components. Doing so substantially accelerates the convergence of the EM algorithm; it also makes it easier to choose an appropriate number of clusters.
Nevertheless, the EM algorithm is considered better than K-Means because it provides additional information about the data, namely the dispersion (variance) of each cluster, not only its center.
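A minimal sketch of that initialization trick with scikit-learn (the synthetic data and the choice of two clusters are illustrative assumptions):

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 1, (150, 2)),
                   rng.normal(6, 1, (150, 2))])

    # Run the cheaper K-Means first, then start EM from its centers.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    gmm = GaussianMixture(n_components=2,
                          means_init=kmeans.cluster_centers_,
                          random_state=0).fit(X)
    print(gmm.n_iter_)   # EM typically needs only a few iterations with this initialization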




2.Mean shift clustering:


2.1 Definition:
Mean-shift clustering is an unsupervised machine learning algorithm used to identify clusters within a dataset. It is a density-based clustering method that finds regions of high density and iteratively shifts data points towards the areas where points are most densely concentrated.

2.1.1 Advantages:
Unlike the popular K-Means clustering algorithm, mean-shift does not require specifying the number of clusters in advance: the number of clusters is determined by the algorithm from the data.

It is particularly useful for datasets where the clusters have arbitrary shapes and are not well-separated by linear
boundaries.

2.1.2 How Does Mean-Shift Clustering Work?


The process can be divided into 3 main steps:

Kernel Density Estimation: first, we estimate the density function of our data points using the KDE technique. We start by assigning a kernel function to each data point; this function can be a Gaussian distribution with zero mean and unit variance (Eq1). The assigned function is rescaled by a parameter h, the kernel bandwidth, so that it keeps a unit area (Eq2).

The KDE is then the sum of these kernel functions averaged over the n data points (Eq3).

Eq1: $K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$

Eq2: $K_h(x - x_i) = \frac{1}{h} K\left(\frac{x - x_i}{h}\right)$

Eq3: $\hat{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$

Figure 1:Example of a KDE function for 7 data points
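A short sketch of such a KDE built from Eq1 to Eq3 (the seven points and the bandwidth h = 0.8 are illustrative assumptions, not the data behind Figure 1):

    import numpy as np

    points = np.array([1.0, 1.5, 2.0, 4.5, 5.0, 5.2, 8.0])   # n = 7 data points
    h = 0.8                                                   # kernel bandwidth

    def kde(grid, data, h):
        # Eq3: f(x) = (1 / (n * h)) * sum_i K((x - x_i) / h), with K the standard Gaussian kernel (Eq1)
        u = (grid[:, None] - data[None, :]) / h
        return np.exp(-u ** 2 / 2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

    grid = np.linspace(-1, 10, 500)
    density = kde(grid, points, h)
    print(grid[density.argmax()])   # location of the highest-density point (a mode)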


Shifting Data Points: In the second step, the algorithm iteratively shifts the data points towards regions of higher density. The shift is determined by calculating the mean shift vector for each data point; this vector is computed inside a region of interest determined by a radius R (the only parameter of the algorithm).

Convergence and Cluster Identification: The algorithm continues shifting the data points until convergence is
reached. Convergence in Mean Shift occurs when the data points stop moving significantly. This means that the
data points have reached the modes of the density distribution.

Once convergence is achieved, the final position of each data point represents a cluster center, so points belonging to the same cluster converge to the same point (cluster center / mode).

Once the centroids are identified, the algorithm assigns each data point to the closest cluster center.
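A minimal sketch of mean-shift clustering with scikit-learn's MeanShift (the three synthetic blobs and the quantile used to estimate the bandwidth are illustrative assumptions):

    import numpy as np
    from sklearn.cluster import MeanShift, estimate_bandwidth

    rng = np.random.default_rng(2)
    X = np.vstack([rng.normal((0, 0), 0.5, (100, 2)),
                   rng.normal((4, 4), 0.5, (100, 2)),
                   rng.normal((0, 6), 0.5, (100, 2))])

    bandwidth = estimate_bandwidth(X, quantile=0.2)   # data-driven choice of the window size
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    print(len(ms.cluster_centers_))   # number of clusters found automatically
    print(ms.labels_[:5])             # cluster assignment of the first points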

2.2 Example:
An example on the car.xls dataset (from lab 3, Tanagra), using PCA to visualize the results, gives 3 clusters, as shown below:

2.3 Complexity:
Scikit-learn's implementation of the algorithm has a lower runtime complexity: it will usually be around O(T*n*log(n)), where n is the number of samples and T is the number of iterations. In higher dimensions, the complexity will be around O(T*n²).


