Module 3
• Clustering is the process of partitioning a set of data (or objects) into a set of meaningful sub-classes, called clusters.
• Unsupervised learning (No predefined classes)
• Cluster: a collection of data objects
• Similar to one another within the same cluster
• Dissimilar to the objects in other clusters
• Cluster analysis
• Finding groups of objects such that the objects in a group will be similar (or related) to one
another and different from (or unrelated to) the objects in other groups
• Partitioning algorithms:
• These algorithms divide a dataset into a predetermined number of clusters.
• Each data point is assigned to exactly one cluster based on similarity.
• A well-known example is k-Means, which minimizes the variance within clusters.
• These methods work well when the number of clusters is known and the data is well-
separated.
• Hierarchical algorithms:
• These algorithms create a tree-like structure (dendrogram) by repeatedly merging or splitting
clusters.
• Agglomerative (Bottom-Up): Starts with individual points as clusters and merges them
iteratively.
• Divisive (Top-Down): Starts with a single cluster and splits it into smaller ones.
• Example: Hierarchical Clustering.
• Graph-based:
• Represents data as a graph, where nodes represent data points, and edges represent
relationships (similarities) between them.
• This method is particularly useful when the relationships between data points are complex
and not easily captured by distance metrics alone.
• Model-based:
• Assumes that the data is generated by a mixture of underlying probability distributions.
• Tries to find the best fit for each cluster using statistical models.
• Can be useful when data follows a known distribution.
• Example: Gaussian Mixture Models (GMM).
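• As an illustration of the model-based approach, here is a minimal sketch of fitting a GMM with scikit-learn; the dataset and parameter values are made up for illustration and are not part of the original material.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical 2-D dataset; replace with the data you actually want to cluster.
X = np.random.default_rng(0).normal(size=(300, 2))

# Fit a mixture of 3 Gaussians; each component plays the role of one cluster.
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0)
labels = gmm.fit_predict(X)      # hard assignment: most probable component per point
probs = gmm.predict_proba(X)     # soft assignment: membership probability per component
```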
• Clustering techniques (with representative algorithms):
  • Hierarchical methods
    • Divisive methods: DIANA [1990]
    • Agglomerative methods: AGNES [1990], BIRCH [1996], CURE [1998], ROCK [1999], Chameleon [1999]
  • Density-based methods: DBSCAN [1996], STING [1997], DENCLUE [1998], CLIQUE [1998], OPTICS [1999], Wave Cluster [1998]
  • Model-based clustering: EM Algorithm [1977], COBWEB [1987], AutoClass [1996], ANN Clustering [1982, 1989]
❑ Partitioning
❑ k-Means algorithm
❑ Hierarchical
❑ Divisive algorithm
❑ Agglomerative algorithm
❑ Density based
❑ DBSCAN
▪ Given a set of n distinct objects, the k-Means clustering algorithm partitions the objects into k clusters such that intra-cluster similarity is high and inter-cluster similarity is low.
▪ The algorithm proceeds as follows:
1. Choose k objects (usually at random) as the initial cluster centroids.
2. Assign each object to the cluster whose centroid is nearest to it.
3. Re-compute the cluster centers by calculating the mean of each cluster. These become the new cluster centroids.
4. Repeat steps 2 and 3 until the centroids (and hence the cluster assignments) no longer change.
▪ k-Means is simple and can be used for a wide variety of object types.
▪ It is also efficient in terms of both storage requirements and execution time.
▪ By saving distance information from one iteration to the next, the actual number of distance calculations can be reduced (especially as the algorithm approaches termination).
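▪ A minimal NumPy sketch of the steps above (the function name, random initialization, and stopping rule are illustrative assumptions, not a prescribed implementation):

```python
import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k data points at random as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: re-compute each centroid as the mean of the points assigned to it.
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # Step 4: stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```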
• Sensitive to Outliers:
• Outliers can significantly distort the cluster centroids, leading to incorrect clustering.
• Computational Complexity:
• Determining medians can be computationally more expensive than computing means,
especially for large datasets, as sorting or median-finding operations are required at
each iteration.
• Sensitivity to Initial Centers:
• Similar to k-means, k-medians can converge to local minima. The choice of initial cluster
centers strongly influences results, and poor initialization can yield suboptimal
clustering.
• Less Efficient for High-Dimensional Data:
• When dealing with high-dimensional spaces, the performance of k-medians often
deteriorates due to the complexity of median calculations in each dimension.
• Not Completely Robust to Outliers:
• While k-medians are more robust to outliers than k-means (due to using median rather
than mean), significant outliers can still distort clustering, especially in smaller clusters.
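• For contrast with k-Means, a sketch of the k-Medians center-update step (assuming cluster labels have already been computed; the per-dimension median minimizes the within-cluster sum of Manhattan distances):

```python
import numpy as np

def k_medians_update(X, labels, k):
    # New center of each cluster = per-dimension median of its members.
    # This is the update that requires median-finding (sorting) in every dimension,
    # which is the source of the extra cost mentioned above.
    return np.array([np.median(X[labels == j], axis=0) for j in range(k)])
```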
▪ In k-Medoids clustering, the quality of the clustering is measured by the sum of absolute errors (SAE):

$$SAE = \sum_{i=1}^{k} \; \sum_{x \in C_i,\; x \notin M} \lvert x - c_m \rvert, \qquad c_m \in M$$

where M is the set of medoids and c_m is the medoid of the cluster C_i to which x belongs.
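▪ A small helper (a sketch; the function and argument names are illustrative) that evaluates the SAE of a clustering given the medoid of each cluster:

```python
import numpy as np

def sae(X, labels, medoids):
    """Sum of absolute (Manhattan) errors of all points to their cluster medoid."""
    total = 0.0
    for j, m in enumerate(medoids):
        members = X[labels == j]
        total += np.abs(members - m).sum()   # the medoid itself contributes 0
    return total
```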
Point   Coordinates
A1      (2, 6)
A2      (3, 8)
A3      (4, 7)
A4      (6, 2)
A5      (6, 4)
A6      (7, 3)
A7      (7, 4)
A8      (8, 5)
A9      (7, 6)
A10     (3, 4)

• Suppose that we want to group the above dataset into two clusters.
• The following two points from the dataset have been selected as the initial medoids:
  • M1 = (3, 4)
  • M2 = (7, 3)
• Use the Manhattan distance measure.
• Apply k-Medoids clustering to form 2 clusters.
➢ First, calculate the Manhattan distance of each point from the medoids M1 = (3, 4) and M2 = (7, 3) and assign it to the nearer medoid.

Point   Coordinates   Distance from M1 (3,4)   Distance from M2 (7,3)   Assigned Cluster
A1      (2, 6)        3                        8                        Cluster 1
A2      (3, 8)        4                        9                        Cluster 1
A3      (4, 7)        4                        7                        Cluster 1
A4      (6, 2)        5                        2                        Cluster 2
A5      (6, 4)        3                        2                        Cluster 2
A6      (7, 3)        5                        0                        Cluster 2
A7      (7, 4)        4                        1                        Cluster 2
A8      (8, 5)        6                        3                        Cluster 2
A9      (7, 6)        6                        3                        Cluster 2
A10     (3, 4)        0                        5                        Cluster 1

➢ The clusters made with medoids (3, 4) and (7, 3) are as follows:
➢ Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
➢ Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}
➢ After assigning clusters, we calculate the cost for each cluster and find their sum. The cost is simply the sum of distances of all the data points from the medoid of the cluster they belong to.
➢ Hence, the cost for the current clustering is 3 + 4 + 4 + 2 + 2 + 0 + 1 + 3 + 3 + 0 = 22.
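➢ The table above can be reproduced with a few lines of NumPy (a sketch; the variable names are illustrative):

```python
import numpy as np

points = np.array([[2, 6], [3, 8], [4, 7], [6, 2], [6, 4],
                   [7, 3], [7, 4], [8, 5], [7, 6], [3, 4]])
medoids = np.array([[3, 4], [7, 3]])                               # M1, M2

d = np.abs(points[:, None, :] - medoids[None, :, :]).sum(axis=2)   # Manhattan distances
assigned = d.argmin(axis=1)                                        # 0 -> cluster 1, 1 -> cluster 2
cost = d[np.arange(len(points)), assigned].sum()                   # 22 for these medoids
print(assigned, cost)
```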
• Now, we select another non-medoid point (7, 4) and make it a temporary medoid for the second cluster. Hence,
  • M1 = (3, 4)
  • M2 = (7, 4)
• Now, let us calculate the distance between all the data points and the current medoids.

Point   Coordinates   Distance from M1 (3,4)   Distance from M2 (7,4)   Assigned Cluster
A1      (2, 6)        3                        7                        Cluster 1
A2      (3, 8)        4                        8                        Cluster 1
A3      (4, 7)        4                        6                        Cluster 1
A4      (6, 2)        5                        3                        Cluster 2
A5      (6, 4)        3                        1                        Cluster 2
A6      (7, 3)        5                        1                        Cluster 2
A7      (7, 4)        4                        0                        Cluster 2
A8      (8, 5)        6                        2                        Cluster 2
A9      (7, 6)        6                        2                        Cluster 2
A10     (3, 4)        0                        4                        Cluster 1

➢ The data points haven't changed clusters after changing the medoids. Hence, the clusters are:
➢ Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
➢ Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}
➢ Now, let us again calculate the cost for each cluster and find their sum. The total cost this time is 3 + 4 + 4 + 3 + 1 + 1 + 0 + 2 + 2 + 0 = 20.
➢ Here, the current cost is less than the cost calculated in the previous iteration. Hence, we make the swap permanent and make (7, 4) the medoid for cluster 2.
➢ If the cost this time had been greater than the previous cost, i.e. 22, we would have had to revert the change. The new medoids after this iteration are (3, 4) and (7, 4), with no change in the clusters.
➢ Now, let us again change the medoid of cluster 2, this time to (6, 4).
➢ Hence, the new medoids for the clusters are M1 = (3, 4) and M2 = (6, 4).

Point   Coordinates   Distance from M1 (3,4)   Distance from M2 (6,4)   Assigned Cluster
A1      (2, 6)        3                        6                        Cluster 1
A2      (3, 8)        4                        7                        Cluster 1
A3      (4, 7)        4                        5                        Cluster 1
A4      (6, 2)        5                        2                        Cluster 2
A5      (6, 4)        3                        0                        Cluster 2
A6      (7, 3)        5                        2                        Cluster 2
A7      (7, 4)        4                        1                        Cluster 2
A8      (8, 5)        6                        3                        Cluster 2
A9      (7, 6)        6                        3                        Cluster 2
A10     (3, 4)        0                        3                        Cluster 1

➢ Again, the clusters haven't changed. Hence, the clusters are:
➢ Points in cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
➢ Points in cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}
➢ Now, let us again calculate the cost for each cluster and find their sum. The total cost this time is 3 + 4 + 4 + 2 + 0 + 2 + 1 + 3 + 3 + 0 = 22.
➢ The current cost is 22, which is greater than the cost in the previous iteration, i.e. 20. Hence, we revert the change and the point (7, 4) is again made the medoid for cluster 2.
➢ So, the clusters after this iteration are:
  • cluster 1 = {(2, 6), (3, 8), (4, 7), (3, 4)}
  • cluster 2 = {(7, 4), (6, 2), (6, 4), (7, 3), (8, 5), (7, 6)}
  • The medoids are (3, 4) and (7, 4).
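➢ The swap test carried out above can be written as a short loop (a sketch of the swap step only, not the full PAM algorithm; it reproduces the costs 22, 20, and 22 and keeps the cheapest configuration):

```python
import numpy as np

points = np.array([[2, 6], [3, 8], [4, 7], [6, 2], [6, 4],
                   [7, 3], [7, 4], [8, 5], [7, 6], [3, 4]])

def total_cost(medoids):
    d = np.abs(points[:, None, :] - np.asarray(medoids)[None, :, :]).sum(axis=2)
    return d.min(axis=1).sum()          # each point pays the distance to its nearest medoid

best_m2, best_cost = (7, 3), total_cost([(3, 4), (7, 3)])   # initial cost: 22
for candidate in [(7, 4), (6, 4)]:      # candidate replacements for the cluster-2 medoid
    c = total_cost([(3, 4), candidate])
    print(candidate, c)                 # (7, 4) -> 20, (6, 4) -> 22
    if c < best_cost:                   # keep the swap only if it lowers the cost
        best_m2, best_cost = candidate, c
print("final medoids:", (3, 4), best_m2, "cost:", best_cost)
```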
• The main disadvantage of k-Medoid algorithms is that they are not suitable for clustering non-spherical (arbitrarily shaped) groups of objects.
• They may produce different results on different runs over the same dataset, because the initial k medoids are chosen randomly.
• Not as scalable: they work well for small to medium-sized datasets but can be slow for large datasets.
▪ Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters.
▪ The assumption is that data points close to each other are more similar or related than data points
farther apart.
• Bottom-up strategy
• Each cluster starts with only one object.
• Clusters are merged into larger and larger clusters until:
All the objects are in a single cluster
• Top-down strategy
• Start with all objects in one cluster
• Clusters are subdivided into smaller and smaller clusters until:
Each object forms a cluster on its own
1. Single Linkage
The distance between two clusters is the minimum distance between any point in one cluster and any point in the other.
2. Complete Linkage
The distance between two clusters is the maximum distance between any point in one cluster and any point in the other.
3. Average Linkage
The distance between two clusters is the average of all pairwise distances between points in the two clusters.
4. Centroid Linkage
The distance between two clusters is the distance between their centroids (mean points).
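▪ A tiny NumPy illustration of these linkage definitions on two made-up clusters (the data is hypothetical):

```python
import numpy as np

# Two small hypothetical clusters of 2-D points.
A = np.array([[1.0, 1.0], [2.0, 1.0]])
B = np.array([[5.0, 4.0], [6.0, 5.0]])

# All pairwise Euclidean distances between points of A and points of B.
pair = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

single   = pair.min()                                        # single linkage: closest pair
complete = pair.max()                                        # complete linkage: farthest pair
average  = pair.mean()                                       # average linkage: mean of all pairs
centroid = np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))   # centroid linkage

print(single, complete, average, centroid)
```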
▪ To choose the number of clusters in hierarchical clustering, we make use of a concept called a dendrogram.
▪ A dendrogram is a tree-like diagram that shows the hierarchical relationship between the observations. It records the sequence of merges (or splits) and the distances at which they occur, i.e. the "memory" of the hierarchical clustering algorithm.
▪ Just by looking at the dendrogram, you can tell how the clusters are formed.
▪ Let's see how to form the dendrogram for the data points.
▪ The height of the vertical lines (blocks) in the dendrogram represents the distance between the clusters being merged.
▪ But the question still remains: how do we find the number of clusters using a dendrogram, or where should we stop merging the clusters? Observations are allocated to clusters by drawing a horizontal line (a cutting line) through the dendrogram.
▪ Generally, we cut the dendrogram in such a way that it cuts the tallest vertical line. In the above example, we have two clusters: one cluster has observations A and B, and the second cluster has C, D, E, and F.
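▪ A sketch using SciPy to build the merge tree, draw the dendrogram, and cut it into a chosen number of flat clusters (the data and the choice of linkage method are assumptions for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

X = np.random.default_rng(0).normal(size=(20, 2))    # hypothetical 2-D data

Z = linkage(X, method="average")   # agglomerative merge history; try "single", "complete", "centroid", "ward"
dendrogram(Z)                      # plot the tree of merges; bar heights = merge distances
plt.show()

labels = fcluster(Z, t=2, criterion="maxclust")   # "cut" the dendrogram into 2 flat clusters
```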
▪ Mean shift clustering is a non-parametric, density-based algorithm that does not assume any specific shape for
the clusters. It is particularly useful for discovering clusters of arbitrary shapes and sizes in datasets.
▪ The core idea behind mean shift clustering is that it attempts to identify high-density regions of the data by
iteratively shifting data points toward regions of higher density. It essentially "shifts" data points to the mode
(peak) of the data distribution.
1. Kernel Density Estimation: Mean shift estimates the density of the data at a point x using a kernel density estimate (one standard form, stated here for completeness):

$$\hat{f}(x) = \frac{1}{n h^{d}} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$

where h is a bandwidth parameter and the kernel K is commonly a Gaussian. The kernel function smooths the contribution of each data point, ensuring that points closer to x have a higher influence on the density estimate.
2. Kernel Function: Mean shift uses a kernel function to define the neighborhood of each data point. The kernel is
often a Gaussian kernel, but other kernel types can also be used.
1. Initialization:
• Begin by considering each data point in the dataset as an initial cluster center. These are the points that the
algorithm will try to "shift" toward denser regions.
• The first step is to define a bandwidth (hyper-parameter) which determines the size of the neighborhood around
each data point (often referred to as the window size).
• A kernel function is applied to each data point. The kernel typically assigns weights to points depending on their
distance from the current point of interest. The most commonly used kernel is the Gaussian kernel.
• The idea behind the kernel is that nearby points (in terms of the bandwidth) will have a higher influence than
distant points.
3. Shift Step:
• For each data point, a local mean is calculated using all the points within its bandwidth (window). The mean
is a weighted average of the points in the neighborhood.
• The mean shift vector is computed as the difference between the current point and the mean of its
neighborhood.
• This vector is then used to "shift" the current point toward the denser region (higher density area).
▪ Mathematically (one standard formulation, with kernel K, bandwidth h, and neighbourhood N(x)), the point x is moved to the kernel-weighted mean of its neighbours,

$$m(x) = \frac{\sum_{x_i \in N(x)} K\!\left(\frac{x - x_i}{h}\right) x_i}{\sum_{x_i \in N(x)} K\!\left(\frac{x - x_i}{h}\right)},$$

and the mean shift vector is $m(x) - x$.
4. Iterative Shifting:
• The point is shifted by the computed vector and the process is repeated.
• This shifting continues until the movement of the point becomes very small, i.e., when convergence is
achieved.
5. Convergence:
• The algorithm stops when the data points stop shifting significantly, meaning that each point has converged
to a location of higher density (the mode of the data distribution).
• Once convergence is achieved, all the points that have converged to the same position (mode) are grouped
together as a single cluster.
6. Cluster Formation:
• After convergence, all points that are "close" to each other (in terms of the distance between their final mean
shift positions) are assigned to the same cluster.
• Typically, a distance threshold is used to merge points that are close enough to each other.
• The clusters correspond to local maxima (modes) of the data density, where a large number of points are
gathered in the same area.
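▪ A compact NumPy sketch of steps 1–6 with a Gaussian kernel (the bandwidth, tolerance, and merging threshold are illustrative assumptions):

```python
import numpy as np

def mean_shift(X, bandwidth=1.0, max_iter=300, tol=1e-4, merge_tol=0.5):
    shifted = X.astype(float).copy()      # step 1: every point starts as its own "center"
    for _ in range(max_iter):
        new = np.empty_like(shifted)
        for i, x in enumerate(shifted):
            d2 = ((X - x) ** 2).sum(axis=1)
            w = np.exp(-d2 / (2 * bandwidth ** 2))           # step 2: Gaussian kernel weights
            new[i] = (w[:, None] * X).sum(axis=0) / w.sum()  # step 3: weighted local mean
        moved = np.linalg.norm(new - shifted, axis=1).max()
        shifted = new
        if moved < tol:                    # steps 4-5: iterate until points stop moving
            break
    # step 6: points whose final positions are close share a mode, i.e. a cluster
    modes, labels = [], np.full(len(X), -1)
    for i, p in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(p - m) < merge_tol:
                labels[i] = k
                break
        else:
            modes.append(p)
            labels[i] = len(modes) - 1
    return np.array(modes), labels
```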
▪ No Need to Predefine the Number of Clusters: Unlike k-means, mean shift clustering does not
require you to specify the number of clusters beforehand. It finds the number of clusters automatically
▪ Arbitrary Shape Clusters: Mean shift can discover clusters of any shape, unlike k-means, which
assumes spherical or circular clusters. It is especially useful for irregularly shaped clusters.
▪ Robust to Outliers: Outliers have less impact on the mean shift algorithm because they do not lie in high-density regions and therefore contribute little to the kernel-weighted means.
• Computational Complexity: Mean shift is computationally expensive, especially for large datasets. It
involves multiple iterations to shift each data point, which can make it slow for high-dimensional data.
• Sensitivity to Bandwidth Parameter: The bandwidth parameter significantly impacts the results. A small
bandwidth might lead to many small clusters, while a large bandwidth might merge distinct clusters.
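• In practice the bandwidth is often estimated from the data. A sketch using scikit-learn (the quantile value and toy data are illustrative assumptions) shows how this choice drives the number of clusters found:

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

X = np.random.default_rng(0).normal(size=(200, 2))   # hypothetical data

# estimate_bandwidth picks a bandwidth from pairwise distances;
# a smaller quantile gives a smaller bandwidth and hence more clusters.
bw = estimate_bandwidth(X, quantile=0.2)

ms = MeanShift(bandwidth=bw)        # try a much smaller or larger bandwidth to see over-/under-segmentation
labels = ms.fit_predict(X)
print("number of clusters found:", len(ms.cluster_centers_))
```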