
Probabilistic Clustering

Session No.: 30
Course Name: Machine Learning
Course Code: E1UA406C
Instructor Name:
Date of Conduction of Class: 05-06-2025

Review of the key concepts of the previous session

Hierarchical Clustering produces a tree-like structure of clusters (called a dendrogram).

Hierarchical Agglomerative Clustering (HAC) is a bottom-up clustering method used in unsupervised machine learning. It builds a hierarchy of clusters by repeatedly merging the closest pair of clusters, starting from individual points.

In single linkage, the distance between two clusters is defined as the shortest distance between any point in one cluster and any point in the other cluster.

In complete linkage, the distance between two clusters is defined as the maximum distance between any point in one cluster and any point in the other cluster.
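
As a quick refresher, a minimal sketch of both linkage rules using SciPy's hierarchical-clustering utilities (the toy 2D points are assumed here for illustration; they are not from the slides):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy 2D points: two visually separate groups
X = np.array([[1.0, 2.0], [2.0, 1.0], [1.5, 1.5],
              [8.0, 9.0], [9.0, 8.0], [8.5, 8.5]])

# Bottom-up (agglomerative) clustering with the two linkage rules
Z_single = linkage(X, method="single")      # merge by shortest inter-cluster distance
Z_complete = linkage(X, method="complete")  # merge by maximum inter-cluster distance

# Cut each hierarchy into 2 flat clusters and show the labels
print(fcluster(Z_single, t=2, criterion="maxclust"))
print(fcluster(Z_complete, t=2, criterion="maxclust"))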
At the end of this session you will be able to:

LO1: Identify and apply Gaussian Mixture Models (GMM) for data clustering using probability distributions.

LO2: Evaluate cluster membership using soft assignments and the expectation-maximization (EM) algorithm.
Session Outline

1. Introduction of the title
2. Basic Concepts
3. Review Quiz (today's topic)
4. Learning Activities and Reflections
5. Summary
6. Post-session discussion and information to next topic

Probabilistic Clustering is a type of clustering technique in machine learning where each data point is assigned to one or more clusters based on a probability distribution.

Unlike hard clustering (like k-means), where each point belongs strictly to one cluster, probabilistic clustering assigns a probability (or likelihood) of membership in each cluster to every point.

Expectation-Maximization (EM) Algorithm:

A popular algorithm used to estimate the parameters (means, covariances, and mixture weights) of probabilistic models, especially Gaussian Mixture Models (GMMs).

1. E-Step: Estimate the probability that each data point belongs to each cluster.

2. M-Step: Update the model parameters to maximize the likelihood, based on the estimates from the E-step.

The two steps are repeated until the likelihood stops improving; the better the model fits the data, the higher the likelihood.
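
For reference, the standard formulas behind these two steps (added here; they are not spelled out on the slide) are, for K clusters and N data points, with \mathcal{N}(x \mid \mu_k, \Sigma_k) denoting a Gaussian density with mean \mu_k and covariance \Sigma_k:

E-step (responsibility of cluster k for point x_i):
\gamma_{ik} = \frac{\pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}

M-step (parameter updates, with N_k = \sum_i \gamma_{ik}):
\pi_k = \frac{N_k}{N}, \qquad \mu_k = \frac{1}{N_k} \sum_i \gamma_{ik}\, x_i, \qquad \Sigma_k = \frac{1}{N_k} \sum_i \gamma_{ik}\, (x_i - \mu_k)(x_i - \mu_k)^{\top}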

Gaussian Mixture Models (GMM)

▪ The most common probabilistic clustering model.
▪ Assumes the data is generated from a mixture of several Gaussian distributions.
▪ Each cluster is modeled as a Gaussian distribution with its own mean and covariance.
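
In symbols, the assumed density is a weighted sum of Gaussian components (a standard formulation, added here for reference rather than copied from the slide):

p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k), \qquad \sum_{k=1}^{K} \pi_k = 1

where \pi_k is the weight of component k, and \mu_k, \Sigma_k are its mean and covariance.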

Example Problem:
We have a small dataset of 2D points. We want to cluster them into two groups using a
Gaussian Mixture Model (GMM).
Imagine we have exam scores of 6 students in two subjects: Math and English.

Student   Math   English
A         85     80
B         82     78
C         30     25
D         28     20
E         60     55
F         58     58

Our goal is to group these students into 2 clusters based on their scores, but with
probabilistic assignments, not hard ones.
Step 1: Plot the data
If you plot the data, you'd see:
▪ Students A and B are high scorers
▪ C and D are low scorers
▪ E and F are in the middle
So we might expect 2 clusters, but E and F could belong partly to both
clusters.

Step 2: Apply GMM

We fit a Gaussian Mixture Model with 2 components (clusters). This model assumes each cluster is a Gaussian distribution in the score space.
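
A minimal sketch of this step with scikit-learn's GaussianMixture (the exact probabilities depend on the random initialization, so they need not match the table on the next slide):

import numpy as np
from sklearn.mixture import GaussianMixture

# Exam scores (Math, English) for students A-F
X = np.array([[85, 80], [82, 78], [30, 25], [28, 20], [60, 55], [58, 58]])

# Fit a 2-component GMM; each component has its own mean and covariance
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(X)

# Soft assignments: one row per student, one probability per cluster
probs = gmm.predict_proba(X)
print(np.round(probs, 2))

# Hard labels, for comparison with a k-means-style output
print(gmm.predict(X))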
Step 3: Probabilistic Output
After running GMM, we get the following cluster membership probabilities:
Student   P(Cluster 1)   P(Cluster 2)   Final Assignment
A         0.95           0.05           Cluster 1
B         0.93           0.07           Cluster 1
C         0.06           0.94           Cluster 2
D         0.04           0.96           Cluster 2
E         0.55           0.45           Cluster 1 (soft)
F         0.52           0.48           Cluster 1 (soft)

Interpretation
• A and B are very likely in Cluster 1 (high scorers).
• C and D are very likely in Cluster 2 (low scorers).
• E and F have soft membership — they belong partially to both clusters:
  • E has a 55% chance of being in Cluster 1 and 45% in Cluster 2.
  • This reflects their intermediate score levels.
Calculating cluster membership probability in GMM

Example: Let's start with a simple dataset and go step-by-step to compute cluster membership probabilities using a GMM with 2 components.

Data points (X): [1, 2, 3, 8, 9, 10] (a 1D dataset)

Step 1: Initialization from Scratch

In a GMM (or the EM algorithm), the initial values of the mean, variance, and weight are often:

❑ Set randomly (or using k-means centroids).

❑ Or heuristically chosen (e.g., first half of the data in one cluster, second half in another).

Here we use the heuristic split:
Cluster 1 → [1, 2, 3]
Cluster 2 → [8, 9, 10]
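
As a companion sketch (not taken from the slides), a few EM iterations on this 1D dataset can be coded from scratch, starting from the heuristic initialization above:

import numpy as np

X = np.array([1.0, 2.0, 3.0, 8.0, 9.0, 10.0])

# Initialization from the split: Cluster 1 -> [1, 2, 3], Cluster 2 -> [8, 9, 10]
mu = np.array([2.0, 9.0])                                  # mean of each half
var = np.array([np.var([1, 2, 3]), np.var([8, 9, 10])])    # variance of each half
pi = np.array([0.5, 0.5])                                  # equal mixture weights

def gaussian_pdf(x, mean, variance):
    # 1D Gaussian density
    return np.exp(-(x - mean) ** 2 / (2 * variance)) / np.sqrt(2 * np.pi * variance)

for iteration in range(10):
    # E-step: responsibility of each cluster for each point (columns sum to 1)
    weighted = np.array([pi[k] * gaussian_pdf(X, mu[k], var[k]) for k in range(2)])
    resp = weighted / weighted.sum(axis=0)

    # M-step: re-estimate weights, means, and variances from the responsibilities
    Nk = resp.sum(axis=1)
    pi = Nk / len(X)
    mu = (resp @ X) / Nk
    var = (resp * (X - mu[:, None]) ** 2).sum(axis=1) / Nk

print(np.round(resp.T, 3))   # membership probabilities, one row per data point
print(mu, var, pi)           # final parameter estimates

Because the two groups are well separated, the means settle near the centers of [1, 2, 3] and [8, 9, 10], and each point's membership probabilities end up close to 0 or 1.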
Learning Activity 1:

Quiz on LMS

GSCALE full form and date


Post Session Activity
https://www.youtube.com/shorts/Nc2G2g8-Obw

Next Session

Dimensionality Reduction

Thank You
