Module 3

Clustering is an unsupervised learning process that partitions data into meaningful sub-classes called clusters, where objects in the same cluster are similar to each other and dissimilar to those in other clusters. The document discusses various clustering approaches, including partitioning, hierarchical, density-based, graph-based, and model-based methods, with a focus on techniques such as k-Means, k-Medoids, and their respective algorithms. It also highlights the importance of choosing the right number of clusters and the advantages and limitations of these clustering techniques.


Clustering

Prabhu Prasad Dev ML / Module-III 1


Clustering

• Clustering is a process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters.
• Unsupervised learning (no predefined classes)
• Cluster: a collection of data objects
• Similar to one another within the same cluster
• Dissimilar to the objects in other clusters
• Cluster analysis
• Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups

Prabhu Prasad Dev ML / Module-III 2


What is Good Clustering?

• A good clustering method will produce high quality clusters with
• high intra-class similarity
• low inter-class similarity

• The quality of a clustering result depends on both the similarity measure used by the method and its implementation.

Prabhu Prasad Dev ML / Module-III 3


Major Clustering Approaches

• Partitioning algorithms:
• These algorithms divide a dataset into a predetermined number of clusters.
• Each data point is assigned to exactly one cluster based on similarity.
• A well-known example is k-Means, which minimizes the variance within clusters.
• These methods work well when the number of clusters is known and the data is well-
separated.

• Hierarchical algorithms:
• These algorithms create a tree-like structure (dendrogram) by repeatedly merging or splitting
clusters.
• Agglomerative (Bottom-Up): Starts with individual points as clusters and merges them
iteratively.
• Divisive (Top-Down): Starts with a single cluster and splits it into smaller ones.
• Example: Hierarchical Clustering.

Prabhu Prasad Dev ML / Module-III 4


Major Clustering Approaches
• Density-based:
• Forms clusters based on dense regions of data points.
• Can discover clusters of arbitrary shape and is robust to noise.
• Works best when clusters have roughly similar densities; some density-based methods (e.g., OPTICS) extend the idea to handle varying densities.
• Example: DBSCAN (Density-Based Spatial Clustering of Applications with Noise).

• Graph-based:
• Represents data as a graph, where nodes represent data points, and edges represent
relationships (similarities) between them.
• This method is particularly useful when the relationships between data points are complex
and not easily captured by distance metrics alone.

• Model-based:
• Assumes that the data is generated by a mixture of underlying probability distributions.
• Tries to find the best fit for each cluster using statistical models.
• Can be useful when data follows a known distribution.
• Example: Gaussian Mixture Models (GMM).

Prabhu Prasad Dev ML / Module-III 5


Different Clustering Techniques
Partitioning methods:
• k-Means algorithm [1957, 1967]
• k-Medoids algorithm
• k-Modes [1998]
• PAM [1990]
• CLARA [1990]
• CLARANS [1994]
• Fuzzy c-means algorithm [1999]

Hierarchical methods:
• Divisive: DIANA [1990]
• Agglomerative: AGNES [1990], BIRCH [1996], CURE [1998], ROCK [1999], Chameleon [1999]

Density-based methods:
• STING [1997]
• DBSCAN [1996]
• CLIQUE [1998]
• DENCLUE [1998]
• OPTICS [1999]
• Wave Cluster [1998]

Graph-based methods:
• MST Clustering [1999]
• OPOSSUM [2000]
• SNN Similarity Clustering [2001, 2003]

Model-based clustering:
• EM Algorithm [1977]
• Auto class [1996]
• COBWEB [1987]
• ANN Clustering [1982, 1989]
Prabhu Prasad Dev ML / Module-III 6


Clustering Techniques

In this lecture, we shall cover the following clustering techniques only.

❑ Partitioning
❑ k-Means algorithm

❑ PAM (k-Medoids algorithm)

❑ Hierarchical
❑ Divisive algorithm

❑ Agglomerative algorithm

❑ Density based
❑ DBSCAN

Prabhu Prasad Dev ML / Module-III 7


k-Means

Prabhu Prasad Dev ML / Module-III 8


k-Means

▪ The k-Means algorithm goes back to Lloyd [1957] and MacQueen [1967]; a widely used refinement was proposed by J. A. Hartigan and M. A. Wong [1979].

▪ Given a set of n distinct objects, the k-Means clustering algorithm partitions the objects into k clusters such that intra-cluster similarity is high but inter-cluster similarity is low.

▪ In this algorithm, the user has to specify k, the number of clusters.

Prabhu Prasad Dev ML / Module-III 9


k-Means

Algorithm : k-Means clustering

Input: D, a dataset containing n objects; k, the number of clusters

Output: A set of k clusters
Steps:
1. Randomly choose k objects from D as the initial cluster centroids.

2. For each object in D do

   i. Compute the distance between the current object and the k cluster centroids.
   ii. Assign the current object to the cluster whose centroid is closest.

3. Re-compute the cluster centroids by calculating the mean of each cluster. These become the new cluster centroids.

4. Repeat steps 2-3 until the convergence criterion is satisfied.
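A minimal NumPy sketch of these steps (illustrative only; the function and variable names are my own, not part of any particular library):

import numpy as np

def k_means(D, k, max_iters=100, tol=1e-6, seed=0):
    """Minimal k-Means: D is an (n, d) array, k is the number of clusters."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly choose k objects as the initial centroids
    centroids = D[rng.choice(len(D), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 2: assign each object to the nearest centroid
        dists = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its cluster
        new_centroids = np.array([
            D[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move appreciably
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return labels, centroids

For example, k_means(np.array([[1.0, 2], [2, 3], [8, 8], [9, 9]]), k=2) separates the two pairs of nearby points.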

Prabhu Prasad Dev ML / Module-III 10


Convergence Criteria

Common convergence criteria for k-Means are: the cluster assignments no longer change between iterations, the centroids move by less than a small threshold, or a maximum number of iterations is reached.
Prabhu Prasad Dev ML / Module-III 11


K-Means Example-1

Example-2


Advantages

▪ k-Means is simple and can be used for a wide variety of object types.

▪ It is also efficient in terms of both storage requirements and execution time.

▪ By saving distance information from one iteration to the next, the actual number of distance calculations that must be made can be reduced (especially as the algorithm approaches termination).

Prabhu Prasad Dev ML / Module-III 24


Limitations
▪ Needs Predefined k:
• The number of clusters (k) must be chosen in advance, which may not be straightforward.
▪ Sensitive to Initial seeds:
• The final clusters depend on the initial centroids, leading to different results for different runs.

Prabhu Prasad Dev ML / Module-III 25


Limitations
▪ Assumes Spherical Clusters:
• K-Means assumes clusters are spherical and equally sized, making it ineffective for complex, non-convex
cluster shapes.

• Sensitive to Outliers:
• Outliers can significantly distort the cluster centroids, leading to incorrect clustering.

Prabhu Prasad Dev ML / Module-III 26


Techniques to find optimal value of k
1. Elbow Method:
▪ The Elbow Method is a common technique to determine
the optimal number of clusters (K) in K-Means clustering.
▪ It is based on the Inertia or Within-Cluster Sum of
Squares (WCSS), which measures the compactness of
clusters.
▪ The goal is to select a value of K where adding more
clusters does not significantly reduce WCSS.
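A short sketch of the Elbow Method using scikit-learn (illustrative only; it assumes a 2-D NumPy array X standing in for your dataset):

import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 2)          # toy data; replace with your dataset

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)        # inertia_ is the within-cluster sum of squares

# Plot k against wcss and look for the "elbow" where the curve flattens.
for k, w in enumerate(wcss, start=1):
    print(k, round(w, 2))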

Prabhu Prasad Dev ML / Module-III 27


Techniques to find optimal value of k
2. Silhouette Method:
▪ The Silhouette Method is used to evaluate the quality of clustering and determine the optimal number of clusters (K) in K-
Means.
▪ It measures how well each point fits within its assigned cluster compared to other clusters.

Prabhu Prasad Dev ML / Module-III 28


Techniques to find optimal value of k
2. Silhouette Method (Contd..)

In the above graph, the silhouette score is highest at k = 3. Hence the number of clusters, i.e., the value of k, should be 3.
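A corresponding sketch using scikit-learn's silhouette_score (illustrative; it reuses the same kind of toy array X as in the Elbow sketch above):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.random.rand(200, 2)          # toy data; replace with your dataset

for k in range(2, 11):              # the silhouette needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))

# Choose the k with the highest average silhouette score.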

Prabhu Prasad Dev ML / Module-III 29


k-Medians

Prabhu Prasad Dev ML / Module-III 30


k-Medians

▪ k-Medians is a variant of k-Means in which each cluster center is the component-wise median of the points in the cluster rather than their mean, and points are typically assigned by minimizing the sum of Manhattan (L1) distances to the centers.

▪ Because the median is less affected by extreme values than the mean, k-Medians is somewhat more robust to outliers than k-Means.


Disadvantages of k-Medians:

• Computational Complexity:
• Determining medians can be computationally more expensive than computing means,
especially for large datasets, as sorting or median-finding operations are required at
each iteration.
• Sensitivity to Initial Centers:
• Similar to k-means, k-medians can converge to local minima. The choice of initial cluster
centers strongly influences results, and poor initialization can yield suboptimal
clustering.
• Less Efficient for High-Dimensional Data:
• When dealing with high-dimensional spaces, the performance of k-medians often
deteriorates due to the complexity of median calculations in each dimension.
• Not Completely Robust to Outliers:
• While k-medians are more robust to outliers than k-means (due to using median rather
than mean), significant outliers can still distort clustering, especially in smaller clusters.
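A minimal sketch of one k-Medians iteration (my own illustrative code; it assumes Manhattan distance for the assignment step and a component-wise median for the update step):

import numpy as np

def k_medians_step(X, centers):
    """One assignment + update step of k-Medians on an (n, d) array X."""
    # Assign each point to the nearest center under Manhattan (L1) distance
    dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
    labels = dists.argmin(axis=1)
    # Update each center to the component-wise median of its cluster
    new_centers = np.array([
        np.median(X[labels == j], axis=0) if np.any(labels == j) else centers[j]
        for j in range(len(centers))
    ])
    return labels, new_centers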

Prabhu Prasad Dev ML / Module-III 38


k-Medoids

Prabhu Prasad Dev ML / Module-III 39


k-Medoids Algorithm
➢ The k-Medoids algorithm is a clustering technique similar to k-Means, but it is more robust to noise and outliers. Instead of using the mean of the data points to define cluster centers, k-Medoids selects actual data points (medoids) as cluster centers.
➢ The k-Medoids algorithm aims to diminish the effect of outliers.
➢ The algorithm was proposed in 1987 by Kaufman and Rousseeuw.
➢ k-Medoids clustering is a partitioning clustering algorithm. The most popular implementation is Partitioning Around Medoids (PAM). Here we discuss the PAM algorithm for k-Medoids clustering with a numerical example.

➢ The sum-of-absolute error (SAE) function is used as the objective function:

\[ SAE = \sum_{i=1}^{k} \;\; \sum_{x \in C_i,\; x \notin M} \lVert x - c_m \rVert \]

where c_m ∈ M denotes the medoid of cluster C_i,
M is the set of all medoids at any instant, and
x is an object belonging to the set of non-medoid objects, i.e., x belongs to some cluster C_i and is not a medoid (x ∈ C_i, x ∉ M).

Prabhu Prasad Dev ML / Module-III 40


Steps of k-Medoids Algorithm (PAM Algorithm)
Algorithm
Input: Database of objects D.
k, the number of desired clusters.
Output: Set of k clusters
Steps:
1. Arbitrarily select k medoids from D; call this set M.
2. For each object o_i that is not a medoid do
3.     For each medoid o_j do
4.         Let M' = (M − {o_j}) ∪ {o_i}   // the current medoid set M with medoid o_j swapped for the non-medoid o_i
5.         Calculate cost(o_i, o_j) = SAE_M' − SAE_M
6.     End for; End for
7. Find the pair (o_i, o_j) for which cost(o_i, o_j) is the smallest.
8. If this smallest cost is negative, replace o_j with o_i and update the set M accordingly.
9. Repeat steps 2-8 until no swap reduces the cost, i.e., until the smallest cost(o_i, o_j) ≥ 0.
10. Return the clusters with M as the set of cluster centers.
11. Stop
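A compact sketch of this swap-based search (illustrative code of my own; it uses Manhattan distance so that it matches the worked example that follows):

import numpy as np

def sae(X, medoids):
    """Sum of absolute (Manhattan) distances of each point to its nearest medoid."""
    d = np.abs(X[:, None, :] - X[np.array(medoids)][None, :, :]).sum(axis=2)
    return d.min(axis=1).sum()

def pam(X, k, seed=0):
    rng = np.random.default_rng(seed)
    medoids = list(rng.choice(len(X), size=k, replace=False))
    while True:
        current = sae(X, medoids)
        best_cost, best_swap = 0, None
        for i in range(len(X)):                  # candidate non-medoid o_i
            if i in medoids:
                continue
            for j in range(k):                   # medoid position o_j to swap out
                trial = medoids.copy()
                trial[j] = i
                cost = sae(X, trial) - current   # SAE_M' - SAE_M
                if cost < best_cost:             # keep the most negative cost
                    best_cost, best_swap = cost, (j, i)
        if best_swap is None:                    # no swap reduces SAE, so stop
            break
        medoids[best_swap[0]] = best_swap[1]
    d = np.abs(X[:, None, :] - X[np.array(medoids)][None, :, :]).sum(axis=2)
    return medoids, d.argmin(axis=1)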

Prabhu Prasad Dev ML / Module-III 41


Example

Point   Coordinates
A1      (2, 6)
A2      (3, 8)
A3      (4, 7)
A4      (6, 2)
A5      (6, 4)
A6      (7, 3)
A7      (7, 4)
A8      (8, 5)
A9      (7, 6)
A10     (3, 4)

• Suppose that we want to group the above dataset into two clusters.
• The following two points from the dataset have been selected as medoids:
  • M1 = (3, 4)
  • M2 = (7, 3)
• Use the Manhattan distance measure.
• Apply k-Medoids clustering to form 2 clusters.

Prabhu Prasad Dev ML / Module-III 42


Iteration-1

Point   Coordinates   Distance from M1 (3,4)   Distance from M2 (7,3)   Assigned Cluster
A1      (2, 6)        3                        8                        Cluster 1
A2      (3, 8)        4                        9                        Cluster 1
A3      (4, 7)        4                        7                        Cluster 1
A4      (6, 2)        5                        2                        Cluster 2
A5      (6, 4)        3                        2                        Cluster 2
A6      (7, 3)        5                        0                        Cluster 2
A7      (7, 4)        4                        1                        Cluster 2
A8      (8, 5)        6                        3                        Cluster 2
A9      (7, 6)        6                        3                        Cluster 2
A10     (3, 4)        0                        5                        Cluster 1

➢ The clusters made with medoids (3, 4) and (7, 3) are as follows.
➢ Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
➢ Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}
➢ After assigning clusters, we will calculate the cost for each cluster and find their sum. The cost is nothing but the sum of distances of all the data points from the medoid of the cluster they belong to.
➢ Hence, the cost for the current clustering will be 3 + 4 + 4 + 2 + 2 + 0 + 1 + 3 + 3 + 0 = 22.

Prabhu Prasad Dev ML / Module-III 43


Iteration-2

Now, we will select another non-medoid point (7, 4) and make it a temporary medoid for the second cluster. Hence,
• M1 = (3, 4)
• M2 = (7, 4)
Now, let us calculate the distance between all the data points and the current medoids.

Point   Coordinates   Distance from M1 (3,4)   Distance from M2 (7,4)   Assigned Cluster
A1      (2, 6)        3                        7                        Cluster 1
A2      (3, 8)        4                        8                        Cluster 1
A3      (4, 7)        4                        6                        Cluster 1
A4      (6, 2)        5                        3                        Cluster 2
A5      (6, 4)        3                        1                        Cluster 2
A6      (7, 3)        5                        1                        Cluster 2
A7      (7, 4)        4                        0                        Cluster 2
A8      (8, 5)        6                        2                        Cluster 2
A9      (7, 6)        6                        2                        Cluster 2
A10     (3, 4)        0                        4                        Cluster 1

➢ The data points haven't changed clusters after changing the medoids. Hence, the clusters are:
➢ Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
➢ Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}
➢ Now, let us again calculate the cost for each cluster and find their sum. The total cost this time will be 3 + 4 + 4 + 3 + 1 + 1 + 0 + 2 + 2 + 0 = 20.
➢ Here, the current cost is less than the cost calculated in the previous iteration. Hence, we will make the swap permanent and make (7, 4) the medoid for cluster 2.
➢ If the cost this time had been greater than the previous cost, i.e., 22, we would have had to revert the change. The new medoids after this iteration are (3, 4) and (7, 4), with no change in the clusters.

Prabhu Prasad Dev ML / Module-III 44


Iteration-3

➢ Now, let us change the medoid of cluster 2 to (6, 4).
➢ Hence, the new medoids for the clusters are M1 = (3, 4) and M2 = (6, 4).

Point   Coordinates   Distance from M1 (3,4)   Distance from M2 (6,4)   Assigned Cluster
A1      (2, 6)        3                        6                        Cluster 1
A2      (3, 8)        4                        7                        Cluster 1
A3      (4, 7)        4                        5                        Cluster 1
A4      (6, 2)        5                        2                        Cluster 2
A5      (6, 4)        3                        0                        Cluster 2
A6      (7, 3)        5                        2                        Cluster 2
A7      (7, 4)        4                        1                        Cluster 2
A8      (8, 5)        6                        3                        Cluster 2
A9      (7, 6)        6                        3                        Cluster 2
A10     (3, 4)        0                        3                        Cluster 1

➢ Again, the clusters haven't changed. Hence, the clusters are:
➢ Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
➢ Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}
➢ Now, let us again calculate the cost for each cluster and find their sum. The total cost this time will be 3 + 4 + 4 + 2 + 0 + 2 + 1 + 3 + 3 + 0 = 22.
➢ The current cost is 22, which is greater than the cost in the previous iteration, i.e., 20. Hence, we will revert the change, and the point (7, 4) will again be made the medoid for cluster 2.
➢ So, the clusters after this iteration will be:
  • cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)} and
  • cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}.
  • The medoids are (3, 4) and (7, 4).
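A few lines of Python can verify the costs computed above (illustrative only):

points = [(2, 6), (3, 8), (4, 7), (6, 2), (6, 4), (7, 3), (7, 4), (8, 5), (7, 6), (3, 4)]

def cost(medoids):
    # Sum of Manhattan distances of each point to its nearest medoid
    return sum(min(abs(px - mx) + abs(py - my) for (mx, my) in medoids)
               for (px, py) in points)

print(cost([(3, 4), (7, 3)]))   # 22  (iteration 1)
print(cost([(3, 4), (7, 4)]))   # 20  (iteration 2, swap accepted)
print(cost([(3, 4), (6, 4)]))   # 22  (iteration 3, swap reverted)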

Prabhu Prasad Dev ML / Module-III 45


Advantages of K-Medoids Algorithm

• It is simple to understand and easy to implement.

• The k-Medoids algorithm converges in a finite number of steps and runs quickly on small to medium-sized datasets.
• k-Medoids is less sensitive to outliers than other partitioning algorithms such as k-Means.

Prabhu Prasad Dev ML / Module-III 46


Disadvantages of k-Medoids Algorithm

• The main disadvantage of the k-Medoids algorithm is that it is not suitable for clustering non-spherical (arbitrarily shaped) groups of objects.
• It may produce different results on different runs over the same dataset, because the first k medoids are chosen randomly.
• Not as scalable: it works well for small to medium-sized datasets but can be slow for large datasets.

Prabhu Prasad Dev ML / Module-III 47


k-Means vs k-Medoids

• k-Means uses the mean of the points in a cluster as the cluster center (which need not be an actual data point), whereas k-Medoids always uses an actual data point (a medoid) as the center.
• k-Means minimizes the sum of squared distances to the centroids, whereas k-Medoids (PAM) minimizes the sum of absolute errors (SAE).
• k-Medoids is more robust to noise and outliers than k-Means, but each iteration is more expensive, so it scales less well to large datasets.
Prabhu Prasad Dev ML / Module-III 48


Hierarchical Clustering

Prabhu Prasad Dev ML / Module-III 49


Hierarchical Clustering

▪ Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters.

▪ The assumption is that data points close to each other are more similar or related than data points farther apart.

▪ It does not require the number of clusters to be pre-specified.

▪ It uses a distance matrix or proximity matrix as the clustering criterion.

Prabhu Prasad Dev ML / Module-III 50


Types of Hierarchical Clustering
1. Agglomerative Hierarchical Clustering

• Bottom-up strategy
• Each cluster starts with only one object.
• Clusters are merged into larger and larger clusters until:
  • All the objects are in a single cluster
  • Certain termination conditions are satisfied

2. Divisive Hierarchical Clustering

• Top-down strategy
• Start with all objects in one cluster
• Clusters are subdivided into smaller and smaller clusters until:
  • Each object forms a cluster on its own
  • Certain termination conditions are satisfied

Prabhu Prasad Dev ML / Module-III 51


Agglomerative Clustering

Prabhu Prasad Dev ML / Module-III 52


Agglomerative Hierarchical Clustering

▪ Agglomerative hierarchical clustering follows the bottom-up approach.

▪ Initially, each data point is considered as a singleton cluster, and then successively we merge the data points which are close to each other.

▪ The process is repeated until all clusters have been merged into a single cluster that contains all data.

▪ This clustering algorithm does not require us to prespecify the number of clusters.

Prabhu Prasad Dev ML / Module-III 53


Linkage Criteria
The linkage method determines how we compute the
distance between clusters when merging them.
1. Single Linkage (Minimum Linkage)
The distance between two clusters is the minimum
distance between any two points from each cluster.

2. Complete Linkage (Maximum Linkage)


The distance between two clusters is the maximum
distance between any two points from each cluster.

Prabhu Prasad Dev ML / Module-III 54


Linkage Criteria
3. Average Linkage
The distance between two clusters is the average distance
between all pairs of points (one from each cluster).

4. Centroid Linkage
The distance between two clusters is the distance between their
centroids (mean points).

Prabhu Prasad Dev ML / Module-III 55


Linkage Criteria

5. Ward’s Method (Minimum Variance)


Minimizes the total variance within clusters by merging clusters that
cause the least increase in variance.

Prabhu Prasad Dev ML / Module-III 56


When to choose a Particular Linkage Method?

• If data has chains or irregular shapes: Use Single Linkage.

• If compact clusters are needed: Use Complete Linkage.

• If a balance is required: Use Average Linkage.

• If the centroid represents the cluster well: Use Centroid Linkage.

• If minimizing cluster variance is important: Use Ward’s Method.
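As a concrete illustration, SciPy lets you try different linkage criteria on the same data and compare the resulting clusterings (a sketch only; X is a toy NumPy array standing in for your dataset):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.rand(20, 2)                    # toy data; replace with your dataset

for method in ["single", "complete", "average", "centroid", "ward"]:
    Z = linkage(X, method=method)            # hierarchical merge history
    labels = fcluster(Z, t=3, criterion="maxclust")   # cut the hierarchy into 3 clusters
    print(method, labels)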

Prabhu Prasad Dev ML / Module-III 57


How to choose the number of clusters?

▪ To choose the number of clusters in hierarchical clustering, we make use of a concept called dendrogram.

▪ A dendrogram is a tree-like diagram that shows the hierarchical relationship between the observations. It contains
the memory of hierarchical clustering algorithms.

Prabhu Prasad Dev ML / Module-III 58


How to choose the number of clusters?

▪ Just by looking at the dendrogram, you can tell how the clusters are formed.

▪ Let's see how to form the dendrogram for the data points.

• The observations E and F are closer to each other than to any other points. So, they are combined into one cluster, and the height of the link that joins them together is the smallest. The next observations that are closest to each other are A and B, which are combined together.

• This can also be observed in the dendrogram, as the height of the block between A and B is slightly bigger than that between E and F. Similarly, D can be merged into the E-F cluster, and then C can be combined with that. Finally, the A-B cluster is combined with C, D, E and F to form a single cluster.

Prabhu Prasad Dev ML / Module-III 59


How to choose the number of clusters?
The important points to note while reading the dendrogram are:

1. The height of the blocks represents the distance between clusters, and

2. The distance between observations represents dissimilarities.

▪ But the question still remains: how do we find the number of clusters using a dendrogram, or where should we stop merging the clusters? Observations are allocated to clusters by drawing a horizontal cutting line through the dendrogram.

▪ Generally, we cut the dendrogram in such a way that the cut crosses the tallest vertical line. In the above example, we obtain two clusters: one cluster has observations A and B, and the second cluster has C, D, E, and F.
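A sketch of drawing and cutting a dendrogram with SciPy (illustrative; the six-point array X and the cutting height 0.8 are arbitrary choices of mine):

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.rand(6, 2)                     # e.g. six observations A-F
Z = linkage(X, method="complete")

dendrogram(Z, labels=list("ABCDEF"))         # inspect where the tallest vertical lines are
plt.axhline(y=0.8, linestyle="--")           # a horizontal cutting line, chosen by eye
plt.show()

labels = fcluster(Z, t=0.8, criterion="distance")   # clusters formed below the cutting height
print(labels)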

Prabhu Prasad Dev ML / Module-III 60


Complete Linkage Example

Strength of Complete Linkage

Limitations of Complete Linkage


Single Linkage Example

Strength of Single Linkage

Limitations of Single Linkage


Average Linkage Example


Divisive Clustering

Prabhu Prasad Dev ML / Module-III 136


Divisive Clustering

• In this clustering, objects are grouped in a top-down manner.
• Initially, all objects are in one cluster.
• Then the cluster is subdivided into smaller and smaller pieces, until each object forms a cluster on its own or until certain termination conditions are satisfied (for example, the desired number of clusters is obtained).

Prabhu Prasad Dev ML / Module-III 137


Divisive Clustering Algorithm

1. Start with all data points in one cluster:


• Initially, all data points are grouped into a single cluster.
2. Compute a distance measure:
• Calculate the pairwise distance between all points using a suitable distance metric (e.g., Euclidean distance).
3. Construct a Minimum Spanning Tree (MST):
• Use algorithms like Kruskal's or Prim's to compute the Minimum Spanning Tree (MST) for the given set of points
based on the distance matrix.
4. Find the longest edge in the MST:
• Identify the edge with the largest weight (i.e., the longest distance) in the MST.
5. Split the cluster:
• Remove the longest edge from the MST, splitting the cluster into two parts at the location of the edge.
• These two parts now form two distinct clusters.
6. Repeat the process for each new cluster:
• Apply the same steps recursively to each of the resulting clusters. For each cluster, compute its MST, find the
longest edge, and split again.
7. Terminate when a stopping criterion is met:
• The process continues until the desired number of clusters is reached, or a termination condition (like minimum
intra-cluster distance or a predefined number of clusters) is satisfied.
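A compact sketch of this MST-based splitting using SciPy (illustrative only; the routines minimum_spanning_tree and connected_components come from scipy.sparse.csgraph, and the toy array X is my own):

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_divisive(X, n_clusters):
    """Split the data into n_clusters by repeatedly removing the longest MST edge."""
    dist = squareform(pdist(X))                     # step 2: pairwise Euclidean distances
    mst = minimum_spanning_tree(dist).toarray()     # step 3: build the MST
    for _ in range(n_clusters - 1):                 # each removed edge adds one more cluster
        i, j = np.unravel_index(np.argmax(mst), mst.shape)
        mst[i, j] = 0                               # steps 4-5: drop the longest remaining edge
    _, labels = connected_components(mst, directed=False)
    return labels

X = np.random.rand(10, 2)                           # toy data; replace with your dataset
print(mst_divisive(X, n_clusters=3))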

Prabhu Prasad Dev ML / Module-III 138


Example-1

Example-2


DBSCAN Clustering

Prabhu Prasad Dev ML / Module-III 147


Density-Based Spatial Clustering of Applications with Noise
(DBSCAN)

Prabhu Prasad Dev ML / Module-III 148


Parameters

DBSCAN uses two main parameters:
• ε (epsilon)
• MinPts

By adjusting these parameters, you can control how the algorithm defines clusters, allowing it to adapt to different types of datasets and clustering requirements.

Prabhu Prasad Dev ML / Module-III 149


1) Epsilon (𝜺)
• Definition: Epsilon is the radius of the circle (or hypersphere in
higher dimensions) around a data point.
• Explanation: Points that fall within this radius are considered as
neighbors of that point.
• Effect of ε:
• If the ε value is extremely small, then most of the points may not
lie in the neighborhood and will be treated as outliers.
• This leads to poor clustering as most of the data points fail to
satisfy the minimum number of points desired to create a dense
region.
• In contrast, if ε is an extremely high value, then most of the data
points will remain in the same cluster.
• This leads to poor clustering where multiple clusters may end up
merging due to the high value of epsilon.
• Choosing ε: It can be determined using a k-distance graph, where
you look for a "knee" in the plot to identify a good value for ε.

Prabhu Prasad Dev ML / Module-III 150


2) MinPoints

• Definition: MinPoints is the minimum number of data points


required in a neighborhood to form a dense region (i.e., a cluster).
• Explanation: For a point to be considered as a core point, it must
have at least minPts within its ε-neighborhood (including itself).
• Effect of minPts:
• A larger minPts makes it harder to form clusters (as more
points are required).
• A smaller minPts might lead to the formation of smaller,
potentially less meaningful clusters.
• Typical choice: A common heuristic is to choose minPts = D
+ 1, where D is the number of dimensions in the dataset.

Prabhu Prasad Dev ML / Module-III 151


Types of Data Points

DBSCAN revolves around three key concepts:

1. Core Points: points that have at least a minimum number of other points (MinPts) within a specified distance (ε or epsilon).

2. Border Points: points that are within the ε distance of a core point but do not themselves have MinPts neighbors.

3. Noise Points: points that are neither core points nor border points. They are not close enough to any cluster to be included.
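These definitions map directly onto scikit-learn's DBSCAN implementation (an illustrative sketch; core points are exposed through core_sample_indices_ and noise points receive the label -1):

import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(300, 2)                     # toy data; replace with your dataset

db = DBSCAN(eps=0.1, min_samples=4).fit(X)     # eps = ε, min_samples = MinPts
labels = db.labels_                            # cluster id per point, -1 means noise

core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True      # core points
noise_mask = labels == -1                      # noise points
border_mask = ~core_mask & ~noise_mask         # assigned to a cluster but not core

print("clusters:", len(set(labels)) - (1 if -1 in labels else 0))
print("core:", core_mask.sum(), "border:", border_mask.sum(), "noise:", noise_mask.sum())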

Prabhu Prasad Dev ML / Module-III 152




Directly Density Reachable (DDR)

A point p is directly density-reachable from a point q if p lies within the ε-neighborhood of q and q is a core point.
Prabhu Prasad Dev ML / Module-III 156


Density Reachable (DR)

A point p is density-reachable from a point q if there is a chain of points p1 = q, p2, ..., pn = p such that each p(i+1) is directly density-reachable from p(i).
Prabhu Prasad Dev ML / Module-III 157


Density Connectivity (DC)

Two points p and q are density-connected if there exists a point o such that both p and q are density-reachable from o.
Prabhu Prasad Dev ML / Module-III 158


DBSCAN Pseudocode
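A plain-Python sketch of the DBSCAN procedure built from the concepts above (illustrative code of my own, not necessarily the exact pseudocode from this module):

import numpy as np

def dbscan(X, eps, min_pts):
    """Returns labels: -1 = noise, 0..k-1 = cluster ids."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.where(dist[i] <= eps)[0] for i in range(n)]   # ε-neighborhoods (include the point itself)
    labels = np.full(n, -1)                 # -1 = noise / not yet assigned
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        if len(neighbors[i]) < min_pts:     # not a core point; may later become a border point
            continue
        labels[i] = cluster                 # start a new cluster from core point i
        queue = list(neighbors[i])
        while queue:                        # expand the cluster through density-reachable points
            j = queue.pop()
            if labels[j] == -1:             # unassigned or noise -> border/core point of this cluster
                labels[j] = cluster
            if not visited[j]:
                visited[j] = True
                if len(neighbors[j]) >= min_pts:   # j is also a core point, keep expanding
                    queue.extend(neighbors[j])
        cluster += 1
    return labels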

Prabhu Prasad Dev ML / Module-III 159


Example


Example-2
Apply the DBSCAN algorithm to the following dataset with Eps = 2 and MinPts = 3. The distance metric is Euclidean distance.

Prabhu Prasad Dev ML / Module-III 165




Example-3


When DBSCAN works well

• It can find clusters of arbitrary shape and of different sizes.
• It is robust to noise and outliers, which are simply labeled as noise points.
• The number of clusters does not need to be specified in advance.
Prabhu Prasad Dev ML / Module-III 173


When DBSCAN does not work well

• It struggles when clusters have widely varying densities, since a single (ε, MinPts) pair cannot suit all of them.
• It is sensitive to the choice of ε and MinPts.
• Performance degrades for high-dimensional data, where distance-based neighborhoods become less meaningful.
Prabhu Prasad Dev ML / Module-III 174


Mean Shift Clustering

Prabhu Prasad Dev ML / Module-III 175


Mean Shift Clustering

▪ Mean shift clustering is a non-parametric, density-based algorithm that does not assume any specific shape for
the clusters. It is particularly useful for discovering clusters of arbitrary shapes and sizes in datasets.

▪ The core idea behind mean shift clustering is that it attempts to identify high-density regions of the data by
iteratively shifting data points toward regions of higher density. It essentially "shifts" data points to the mode
(peak) of the data distribution.

Prabhu Prasad Dev ML / Module-III 176


Key Concepts
1. Kernel Density Estimation (KDE): Mean shift uses a kernel density estimate of the data to identify areas of higher data density in each iteration, where h is a bandwidth parameter and the kernel is commonly a Gaussian. The kernel function smooths the contribution of each data point, ensuring that points closer to x have a higher influence on the density estimate. The formula for KDE is given below.
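A standard form of the kernel density estimate (assuming n data points x_i in d dimensions, a kernel K, and bandwidth h) is:

\[ \hat{f}(x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \]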

2. Kernel Function: Mean shift uses a kernel function to define the neighborhood of each data point. The kernel is
often a Gaussian kernel, but other kernel types can also be used.

Prabhu Prasad Dev ML / Module-III 177


Algorithm
As a mode-seeking algorithm, it labels the clusters by finding the modes, or peaks, in the data distribution.
Essentially, it highlights the most dense areas. It does this by iteratively shifting the cluster centers toward regions of
a higher data density.

1. Initialization:

• Begin by considering each data point in the dataset as an initial cluster center. These are the points that the
algorithm will try to "shift" toward denser regions.

• The first step is to define a bandwidth (hyper-parameter) which determines the size of the neighborhood around
each data point (often referred to as the window size).

2. Kernel Function and Window:

• A kernel function is applied to each data point. The kernel typically assigns weights to points depending on their
distance from the current point of interest. The most commonly used kernel is the Gaussian kernel.

• The idea behind the kernel is that nearby points (in terms of the bandwidth) will have a higher influence than
distant points.
Prabhu Prasad Dev ML / Module-III 178
Contd…

Prabhu Prasad Dev ML / Module-III 179


Contd…

3. Shift Step:
• For each data point, a local mean is calculated using all the points within its bandwidth (window). The mean
is a weighted average of the points in the neighborhood.
• The mean shift vector is computed as the difference between the current point and the mean of its
neighborhood.
• This vector is then used to "shift" the current point toward the denser region (higher density area).
▪ Mathematically:
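One common way to write the mean-shift update (a sketch, assuming a kernel K and the neighborhood N(x) of points lying within the bandwidth of x) is:

\[ m(x) = \frac{\sum_{x_i \in N(x)} K(x_i - x)\, x_i}{\sum_{x_i \in N(x)} K(x_i - x)} - x \]

where m(x) is the mean shift vector; the point x is then moved to x + m(x).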

Prabhu Prasad Dev ML / Module-III 180


Contd…

4. Iterative Shifting:
• The point is shifted by the computed vector and the process is repeated.
• This shifting continues until the movement of the point becomes very small, i.e., when convergence is
achieved.
5. Convergence:
• The algorithm stops when the data points stop shifting significantly, meaning that each point has converged
to a location of higher density (the mode of the data distribution).
• Once convergence is achieved, all the points that have converged to the same position (mode) are grouped
together as a single cluster.
6. Cluster Formation:
• After convergence, all points that are "close" to each other (in terms of the distance between their final mean
shift positions) are assigned to the same cluster.
• Typically, a distance threshold is used to merge points that are close enough to each other.
• The clusters correspond to local maxima (modes) of the data density, where a large number of points are
gathered in the same area.
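A minimal sketch of these iterations using a flat (uniform) kernel within the bandwidth (illustrative code of my own, not the exact procedure from this module; the bandwidth value is arbitrary):

import numpy as np

def mean_shift(X, bandwidth, max_iters=100, tol=1e-4):
    """Shift every point to the mean of its neighbors until convergence."""
    shifted = X.astype(float).copy()
    for _ in range(max_iters):
        moved = 0.0
        for i, x in enumerate(shifted):
            # Neighbors of x within the bandwidth (flat kernel)
            near = shifted[np.linalg.norm(shifted - x, axis=1) <= bandwidth]
            new_x = near.mean(axis=0)              # local mean = new (shifted) position
            moved = max(moved, np.linalg.norm(new_x - x))
            shifted[i] = new_x
        if moved < tol:                            # convergence: points have stopped moving
            break
    # Points that converged to (almost) the same mode form one cluster
    labels, modes = [], []
    for x in shifted:
        for c, m in enumerate(modes):
            if np.linalg.norm(x - m) < bandwidth / 2:
                labels.append(c)
                break
        else:
            modes.append(x)
            labels.append(len(modes) - 1)
    return np.array(labels), np.array(modes)

X = np.array([(1, 2), (2, 3), (3, 4), (7, 7), (8, 8), (12, 2)], dtype=float)
print(mean_shift(X, bandwidth=3.0))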

Prabhu Prasad Dev ML / Module-III 181


Illustration

Prabhu Prasad Dev ML / Module-III 182


Example
▪ Data = [(1, 2), (2, 3), (3, 4), (7, 7), (8, 8), (12, 2)]

Prabhu Prasad Dev ML / Module-III 183




Example

The same procedure is repeated for the remaining points in the dataset to compute their new positions.

Prabhu Prasad Dev ML / Module-III 186


Example
Let us assume the given matrix is the Gaussian similarity matrix for each pair of points, with bandwidth = 2.
Apply Mean Shift clustering.

Prabhu Prasad Dev ML / Module-III 187


How to calculate Gaussian Similarity
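The usual Gaussian (RBF) similarity between two points, with σ playing the role of the bandwidth (a standard formula, stated here as background), is:

\[ s(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right) \]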

Prabhu Prasad Dev ML / Module-III 188


Contd…
Gaussian similarity matrix

Prabhu Prasad Dev ML / Module-III 189


Example

Within bandwidth = 2, find out the neighbor points of each point.

Prabhu Prasad Dev ML / Module-III 190


Contd..
Initial Data Iteration-1

Prabhu Prasad Dev ML / Module-III 191


Strengths

▪ No Need to Predefine the Number of Clusters: Unlike k-means, mean shift clustering does not

require you to specify the number of clusters beforehand. It finds the number of clusters automatically

based on the data’s distribution.

▪ Arbitrary Shape Clusters: Mean shift can discover clusters of any shape, unlike k-means, which

assumes spherical or circular clusters. It is especially useful for irregularly shaped clusters.

▪ Robust to Outliers: Outliers have less impact on the mean shift algorithm because they do not

significantly affect the density in the areas where they reside.

Prabhu Prasad Dev ML / Module-III 192


Weaknesses

• Computational Complexity: Mean shift is computationally expensive, especially for large datasets. It

involves multiple iterations to shift each data point, which can make it slow for high-dimensional data.

• Sensitivity to Bandwidth Parameter: The bandwidth parameter significantly impacts the results. A small

bandwidth might lead to many small clusters, while a large bandwidth might merge distinct clusters.

Selecting the appropriate bandwidth can be challenging.

Prabhu Prasad Dev ML / Module-III 193
