
Expectation-Maximization Algorithm - ML

Last Updated : 16 May, 2025

The Expectation-Maximization (EM) algorithm is an iterative method used in unsupervised machine learning to estimate unknown parameters in statistical models. It is particularly useful when some of the data is missing or hidden. It works in two steps:

  • E-step (Expectation Step): Estimates missing or hidden values using current parameter estimates.
  • M-step (Maximization Step): Updates model parameters to maximize the likelihood based on the estimated values from the E-step.

This process repeats until the model reaches a stable solution, improving the fit with each iteration. EM is widely used in clustering, for example with Gaussian Mixture Models, and in handling missing data.

[Figure: Expectation-Maximization in the EM Algorithm]

By iteratively repeating these steps, the EM algorithm seeks to maximize the likelihood of the observed data.

Key Terms in Expectation-Maximization (EM) Algorithm

Let's go over some of the key terms commonly used in the Expectation-Maximization (EM) algorithm:

  • Latent Variables: These are hidden parts of the data that we can’t see directly but they still affect what we do see. We try to guess their values using the visible data.
  • Likelihood: This refers to the probability of seeing the data we have based on certain assumptions or parameters. The EM algorithm tries to find the best parameters that make the data most likely.
  • Log-Likelihood: This is the natural logarithm of the likelihood function. It is used to make calculations easier and to measure how well the model fits the data; the EM algorithm tries to maximize the log-likelihood to improve the fit (a concrete example for a Gaussian mixture is written out after this list).
  • Maximum Likelihood Estimation (MLE): This is a method to find the best values for a model’s settings called parameters. It looks for the values that make the data we observed most likely to happen.
  • Posterior Probability: In Bayesian methods this is the probability of the parameters given both prior knowledge and the observed data. In EM it helps estimate the "best" parameters when there's uncertainty about the data.
  • Expectation (E) Step: In this step the algorithm estimates the missing or hidden information (latent variables) based on the observed data and current parameters. It calculates probabilities for the hidden values given what we can see.
  • Maximization (M) Step: This step updates the parameters by finding the values that maximize the likelihood based on the estimates from the E-step.
  • Convergence: Convergence happens when the algorithm has reached a stable point. This is checked by seeing if the changes in the model's parameters or the log-likelihood are small enough to stop the process.
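
To make the likelihood and log-likelihood concrete, consider the two-component Gaussian mixture fitted in the implementation below, with mixing proportions \pi_1, \pi_2, means \mu_1, \mu_2 and standard deviations \sigma_1, \sigma_2. The log-likelihood of the observations x_1, \dots, x_n is

$$\log L(\theta) = \sum_{i=1}^{n} \log\Big( \pi_1\, \mathcal{N}(x_i \mid \mu_1, \sigma_1^2) + \pi_2\, \mathcal{N}(x_i \mid \mu_2, \sigma_2^2) \Big)$$

and EM searches for the parameter values \theta that maximize this quantity.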

Working of Expectation-Maximization (EM) Algorithm

So far, we've discussed the key terms in the EM algorithm. Now, let's dive into how the EM algorithm works. Here's a step-by-step breakdown of the process:

[Figure: EM Algorithm Flowchart]

1. Initialization: The algorithm starts with initial parameter values and assumes the observed data comes from a specific model.

2. E-Step (Expectation Step):

  • Estimate the missing or hidden data based on the current parameter values.
  • Calculate the posterior probability of each latent variable based on the observed data.
  • Compute the log-likelihood of the observed data using the current parameter estimates.

3. M-Step (Maximization Step):

  • Update the model parameters by maximizing the expected log-likelihood obtained in the E-step.
  • The better the model fits the data, the higher this value becomes.

4. Convergence:

  • Check if the model parameters are stable and converging.
  • If the changes in the log-likelihood or the parameters are below a set threshold, stop. If not, repeat the E-step and M-step until convergence is reached (a minimal loop sketch is given below).
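
A minimal sketch of this loop in Python is shown below. The e_step and m_step functions are hypothetical placeholders for the model-specific computations described above; a concrete Gaussian-mixture version is implemented step by step in the next section.

Python
# Generic EM loop (sketch): e_step and m_step are hypothetical placeholders
# for the model-specific computations; tol is the convergence threshold.
def run_em(data, params, e_step, m_step, tol=1e-6, max_iter=100):
    prev_log_likelihood = float('-inf')
    for _ in range(max_iter):
        responsibilities, log_likelihood = e_step(data, params)  # E-step
        params = m_step(data, responsibilities)                   # M-step
        if abs(log_likelihood - prev_log_likelihood) < tol:       # convergence check
            break
        prev_log_likelihood = log_likelihood
    return params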

Implementation of Expectation-Maximization Algorithm

Step 1: Import the necessary libraries

First, we import the necessary Python libraries: NumPy, Seaborn, Matplotlib and SciPy.

Python
import numpy as np
import seaborn as sns
from scipy.stats import norm
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt

Step 2: Generate a dataset with two Gaussian components

We generate two sets of data values from two different normal distributions:

  • One centered around 2 (with more spread).
  • Another around -1 (with less spread).

These two sets are then combined to form a single dataset. We plot this dataset to visualize how the values are distributed.

Python
mu1, sigma1 = 2, 1
mu2, sigma2 = -1, 0.8
X1 = np.random.normal(mu1, sigma1, size=200)
X2 = np.random.normal(mu2, sigma2, size=600)
X = np.concatenate([X1, X2])

sns.kdeplot(X)
plt.xlabel('X')
plt.ylabel('Density')
plt.title('Density Estimation of X')
plt.show()

Output:

[Figure: Density Plot]
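
Because the samples are drawn randomly, the plot (and the numbers later on) will differ slightly from run to run. Fixing the NumPy seed before the sampling step makes the example reproducible, for instance:

Python
# Optional: call this before generating X1 and X2 to make the results reproducible
np.random.seed(0)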

Step 3: Initialize parameters

We make initial guesses for each group’s:

  • Mean (average),
  • Standard deviation (spread),
  • Proportion (how much each group contributes to the total data).
Python
mu1_hat, sigma1_hat = np.mean(X1), np.std(X1)
mu2_hat, sigma2_hat = np.mean(X2), np.std(X2)
pi1_hat, pi2_hat = len(X1) / len(X), len(X2) / len(X)
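
Note that this initialization uses X1 and X2, i.e. the true component memberships, which would normally be unknown. A rough alternative, sketched below under the assumption that only the combined dataset X is available, is to pick random starting means and split the mixing proportions evenly:

Python
# Hypothetical alternative initialization when the true groups are unknown
rng = np.random.default_rng(0)
mu1_hat, mu2_hat = rng.choice(X, size=2, replace=False)  # two random observations as starting means
sigma1_hat = sigma2_hat = np.std(X)                      # overall spread as the starting std for both
pi1_hat = pi2_hat = 0.5                                  # equal mixing proportions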

Step 4: Perform EM algorithm

We run the loop for a fixed number of 20 rounds, called epochs, rather than checking a convergence threshold. In each round:

  • The E-step calculates the responsibilities (gamma values) by evaluating the Gaussian probability density of each component at every data point and weighting it by the corresponding mixing proportion.
  • The M-step updates the parameters by computing the weighted mean, standard deviation and mixing proportion for each component.

We also calculate the log-likelihood in each round to check if the model is getting better. This is a measure of how well the model explains the data.
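
Written out, these are the standard two-component Gaussian mixture updates that the code below implements, where \gamma_{ik} denotes the responsibility of component k for observation x_i.

E-step (responsibilities):

$$\gamma_{ik} = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \sigma_k^2)}{\pi_1\, \mathcal{N}(x_i \mid \mu_1, \sigma_1^2) + \pi_2\, \mathcal{N}(x_i \mid \mu_2, \sigma_2^2)}$$

M-step (parameter updates):

$$\mu_k = \frac{\sum_i \gamma_{ik}\, x_i}{\sum_i \gamma_{ik}}, \qquad \sigma_k^2 = \frac{\sum_i \gamma_{ik}\, (x_i - \mu_k)^2}{\sum_i \gamma_{ik}}, \qquad \pi_k = \frac{1}{n} \sum_i \gamma_{ik}$$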

Python
num_epochs = 20
log_likelihoods = []

for epoch in range(num_epochs):
    # E-step: Compute responsibilities
    gamma1 = pi1_hat * norm.pdf(X, mu1_hat, sigma1_hat)
    gamma2 = pi2_hat * norm.pdf(X, mu2_hat, sigma2_hat)
    total = gamma1 + gamma2
    gamma1 /= total
    gamma2 /= total
    
    # M-step: Update parameters
    mu1_hat = np.sum(gamma1 * X) / np.sum(gamma1)
    mu2_hat = np.sum(gamma2 * X) / np.sum(gamma2)
    sigma1_hat = np.sqrt(np.sum(gamma1 * (X - mu1_hat)**2) / np.sum(gamma1))
    sigma2_hat = np.sqrt(np.sum(gamma2 * (X - mu2_hat)**2) / np.sum(gamma2))
    pi1_hat = np.mean(gamma1)
    pi2_hat = np.mean(gamma2)
    
    # Compute log-likelihood
    log_likelihood = np.sum(np.log(pi1_hat * norm.pdf(X, mu1_hat, sigma1_hat)
                                   + pi2_hat * norm.pdf(X, mu2_hat, sigma2_hat)))
    log_likelihoods.append(log_likelihood)


plt.plot(range(1, num_epochs+1), log_likelihoods)
plt.xlabel('Epoch')
plt.ylabel('Log-Likelihood')
plt.title('Log-Likelihood vs. Epoch')
plt.show()

Output:

[Figure: Epoch vs Log-likelihood]
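
After the loop finishes, it is worth comparing the fitted values with the parameters used to generate the data (means 2 and -1, standard deviations 1 and 0.8, mixing proportions 0.25 and 0.75). A quick check (the two components may come out in either order):

Python
# Compare the estimated parameters with the values used to generate the data
print(f"Component 1: mean={mu1_hat:.2f}, std={sigma1_hat:.2f}, weight={pi1_hat:.2f}")
print(f"Component 2: mean={mu2_hat:.2f}, std={sigma2_hat:.2f}, weight={pi2_hat:.2f}")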

Step 5: Visualize the Final Result

Finally, we visualize the result by comparing the estimated mixture density (in red) with a kernel density estimate of the original data (in green).

Python
X_sorted = np.sort(X)
density_estimation = pi1_hat*norm.pdf(X_sorted,
                                        mu1_hat, 
                                        sigma1_hat) + pi2_hat * norm.pdf(X_sorted,
                                                                         mu2_hat, 
                                                                         sigma2_hat)


plt.plot(X_sorted, gaussian_kde(X_sorted)(X_sorted), color='green', linewidth=2)
plt.plot(X_sorted, density_estimation, color='red', linewidth=2)
plt.xlabel('X')
plt.ylabel('Density')
plt.title('Density Estimation of X')
plt.legend(['Kernel Density Estimation','Mixture Density'])
plt.show()

Output:

[Figure: Estimated density]

The above image compares the Kernel Density Estimation (green) and the fitted Mixture Density (red) for the variable X. Both show similar patterns, with a main peak near -1.5 and a smaller bump around 2, indicating two data clusters. The red curve is slightly smoother and has a sharper peak than the green one.
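
As a sanity check, the same mixture can be fitted with scikit-learn's GaussianMixture class, which runs EM internally. A minimal sketch, assuming scikit-learn is installed, is shown below; its estimated means, standard deviations and weights should be close to those of the manual implementation above.

Python
# Fit the same two-component mixture with scikit-learn's EM implementation
from sklearn.mixture import GaussianMixture

gm = GaussianMixture(n_components=2, random_state=0).fit(X.reshape(-1, 1))
print("Means:", gm.means_.ravel())
print("Std devs:", np.sqrt(gm.covariances_.ravel()))
print("Weights:", gm.weights_)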

Advantages of EM algorithm

  • Always improves results – With each iteration, the likelihood of the observed data never decreases, so the algorithm steadily improves its fit.
  • Simple to implement – The two steps (E-step and M-step) are often easy to code for many problems.
  • Quick math solutions – In many cases, the M-step has a direct (closed-form) mathematical solution, making it efficient.

Disadvantages of EM algorithm

  • Takes time to finish: It can converge slowly, meaning it may take many iterations to reach a good solution.
  • Gets stuck in local optima: Instead of finding the globally best solution, it may settle for a locally optimal, "good enough" one.
  • Needs extra probabilities: In some applications, such as Hidden Markov Models, EM requires both forward and backward probabilities rather than only the forward pass, making it slightly more complex.

The EM algorithm iteratively estimates missing data and updates model parameters to improve accuracy. By alternating between the E-step and M-step, it refines the model until it converges, making it a widely used tool for handling hidden or incomplete data in machine learning.

