Clustering: Partitioning, Hierarchical, Density-Based
Clustering
• Process of grouping a set of examples (samples)
• Clustering generates a partition consisting of cohesive groups or clusters from a given collection of examples (samples)
[Figure: a collection of examples is fed to a clustering algorithm, which produces a partition (clusters)]
• For example:
– Grouping students in a class based on gender
– Grouping students in a class based on the month of birth
– Grouping students based on their seating position
Categorization of Clustering Methods
• Partitioning methods
• Hierarchical methods
• Density-based methods
Categorization of Clustering Methods
• Partitioning methods:
– These methods construct K partitions of the data, where
each partition represents a cluster
– Idea: Cluster the collection of examples based on the
distance between examples
– Results in spherical-shaped clusters
1. K-means algorithm
2. K-medoids algorithm
3. Gaussian mixture model
• Hierarchical methods:
– These methods create a hierarchical decomposition of
the collection of examples
– Results in spherical-shaped clusters
1. Agglomerative approach (bottom-up approach)
2. Divisive approach (top-down approach)
Categorization of Clustering Methods
• Density-based methods:
– These methods cluster the collection of examples based on the notion of density
– General idea: To continue growing the given cluster as
long as density (number of examples) in the
neighbourhood exceeds some threshold
– Example:
• DBSCAN (Density-Based Spatial Clustering of Applications
with Noise)
Partitioning-Method-Based Clustering
Classical Partitioning Methods
• Centroid-based technique:
– Partition the collection of examples into K clusters based
on the distance between examples
– Cluster similarity is measured with respect to the sample mean of the examples within a cluster
– Cluster centroid or center of gravity: Sample mean value
of the examples within a cluster
– Cluster center is used to represent the cluster
– Example: K-means algorithm
• Representative object-based technique:
– An actual example is chosen to represent the cluster
– One representative example per cluster
– Example: K-medoids algorithm
K-Means Clustering Algorithm
• Dividing the data into K groups or partitions
• Given: Training data D, and the number of clusters K
K-Means Clustering Algorithm: Training Phase
• Given: Training data D, and the number of clusters K
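The step-by-step equations of this slide are not reproduced in the extracted text, so the following is a minimal sketch of the standard K-means training loop (assign each example to its closest center, recompute each center as the sample mean, repeat). The function name and parameters are illustrative, not the slide's notation.

import numpy as np

def kmeans(X, K, max_iters=100, tol=1e-4, seed=0):
    # Initialise the K cluster centers with randomly chosen training examples
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    prev_J = np.inf
    for _ in range(max_iters):
        # Assignment step: each example goes to its closest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the sample mean of its assigned examples
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
        # Distortion J: sum of squared distances of the examples to their centers
        J = ((X - centers[labels]) ** 2).sum()
        if prev_J - J < tol:        # convergence: J stops improving
            break
        prev_J = J
    return labels, centers, J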
K-Means Clustering Algorithm: Training Phase
• Convergence criteria:
– No change in the cluster assignment, OR
– The difference between the distortion measure (J) in successive iterations falls below a threshold
• Distortion measure (J): sum of the squared distances of each example to its assigned cluster center
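Written out (with μk denoting the center of cluster k and rnk ∈ {0, 1} indicating whether example xn is assigned to cluster k, notation assumed here rather than taken from the slide):

J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2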
Illustration of K-Means Clustering
[Figure: K-means clustering of 2-D data with K = 3]
Elbow Method to Choose K
• Determine the distortion measure for different values
of K
• Plot K versus the distortion
• Optimal number of clusters: select the value of K at the “elbow”, i.e. the point after which the distortion starts decreasing in a linear fashion
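A minimal sketch of the elbow method, reusing the kmeans() sketch from above and assuming matplotlib is available for the plot:

import matplotlib.pyplot as plt

def elbow_plot(X, k_values=range(1, 11)):
    # Run K-means for each candidate K and record the final distortion J
    distortions = [kmeans(X, K)[2] for K in k_values]
    plt.plot(list(k_values), distortions, marker="o")
    plt.xlabel("Number of clusters K")
    plt.ylabel("Distortion J")
    plt.title("Elbow method: pick K at the bend of the curve")
    plt.show()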
K-Medoids Clustering Algorithm
• Given: Training data D, and the number of clusters K
1. Initialize the medoids, k = 1, 2, …, K, using K randomly selected data points in D
2. Assign each data point xn to the closest medoid, using the squared Euclidean distance (Nk: number of examples in cluster k)
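The medoid-update step is not reproduced in the extracted text; the sketch below assumes a common variant in which each medoid is replaced by the cluster member that minimises the total squared Euclidean distance to the other members of its cluster. Names and parameters are illustrative.

import numpy as np

def kmedoids(X, K, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialise the medoids with K randomly selected data points from D
    medoid_idx = rng.choice(len(X), size=K, replace=False)
    for _ in range(max_iters):
        # 2. Assign each data point to the closest medoid (squared Euclidean distance)
        d = ((X[:, None, :] - X[medoid_idx][None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update: per cluster, pick the member with minimum total distance to the rest
        new_idx = medoid_idx.copy()
        for k in range(K):
            members = np.where(labels == k)[0]
            if len(members) == 0:
                continue
            within = ((X[members][:, None, :] - X[members][None, :, :]) ** 2).sum(axis=2)
            new_idx[k] = members[within.sum(axis=1).argmin()]
        if np.array_equal(new_idx, medoid_idx):   # converged: medoids unchanged
            break
        medoid_idx = new_idx
    return labels, X[medoid_idx]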
Evaluation of Clustering: Purity Score
• Let us assume that the class index of each example is given
• Purity score: Purity is a measure of the extent to
which clusters contain a single class
– For each cluster, count the number of data points from
the most common class
– Take the sum over all clusters and divide by the total
number of data points
• Let M be the number of classes, C1, C2,…,Cm, …, CM
• Let K be the number of clusters, k = 1,2,…, K
• Let N be the number of data points
Evaluation of Clustering: Purity Score
• For each cluster k,
– Count the number of data points from each class
– Consider the number of data points of the most common class
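Putting the two steps together (with nkm denoting the number of data points of class Cm that fall in cluster k, notation assumed here):

\text{purity} = \frac{1}{N} \sum_{k=1}^{K} \max_{m} \, n_{km}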
Illustration of Computing Purity Score
• Number of data points, N = 25
• Number of classes, M = 3
• Number of clusters, K = 3
• Cluster 1: the most common class is Blue, with 5 examples
• Cluster 2: the most common class is Red, with 5 examples
• Cluster 3: the most common class is Green, with 5 examples
[Figure: the 25 data points plotted in the (x1, x2) plane, grouped into the three clusters]
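With these counts, the most common class in each of the three clusters contributes 5 of the N = 25 points, so the purity score is:

\text{purity} = \frac{5 + 5 + 5}{25} = 0.6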
Hierarchical Clustering
Hierarchical Clustering Algorithms
• These methods create a hierarchical decomposition of
the collection of examples
• Produce a nested sequence of data partitions
• This sequence can be depicted using a tree structure
• Hierarchical clustering method works by grouping data
points into a tree of clusters
• Hierarchical algorithms are either agglomerative or
divisive
– This classification depends on whether the hierarchical decomposition is formed in a
• Bottom-up (merging) OR
• Top-down (splitting) fashion
• The number of clusters need not be specified in advance
Agglomerative Hierarchical Clustering
• Bottom-up approach
• This strategy starts by placing each example in its
own cluster (atomic clusters or singleton clusters) and
then merges these atomic clusters into larger and
larger clusters
[Figure: agglomerative merging of atomic clusters into larger clusters, with a threshold level]
Agglomerative Hierarchical Clustering
• Different intercluster similarity measures can be used to find the similarity between clusters having more than one example (common choices are sketched below):
3. Average distance of all the points in one cluster (Ci) to all the points in another cluster (Cj)
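Only the average-distance measure (item 3) survives in the extracted text; other standard choices are the minimum distance (single linkage), the maximum distance (complete linkage) and the distance between cluster centres. A minimal sketch of these measures, with illustrative names:

import numpy as np

def pairwise_dists(Ci, Cj):
    # All Euclidean distances between points of cluster Ci and points of cluster Cj
    return np.linalg.norm(Ci[:, None, :] - Cj[None, :, :], axis=2)

def single_link(Ci, Cj):    return pairwise_dists(Ci, Cj).min()    # minimum distance
def complete_link(Ci, Cj):  return pairwise_dists(Ci, Cj).max()    # maximum distance
def average_link(Ci, Cj):   return pairwise_dists(Ci, Cj).mean()   # item 3: average distance
def centroid_link(Ci, Cj):  return float(np.linalg.norm(Ci.mean(axis=0) - Cj.mean(axis=0)))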
Agglomerative Hierarchical Clustering
• Given: Training data D
• Target: Partition the data
• Step 1: Start with N clusters, where each example is its own cluster
• Step 2: Compute the intercluster similarity between each pair of clusters
• Step 3: Choose the pair of clusters that is most similar (minimum intercluster distance) and merge them
• Step 4: Repeat Step 2 and Step 3 until all the examples are in a single cluster or until certain termination conditions are satisfied (see the sketch below)
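A minimal sketch of Steps 1–4, using the centroid distance as the intercluster similarity (as in the illustration that follows) and a target number of clusters as the termination condition; names are illustrative.

import numpy as np

def agglomerative(X, target_clusters=1):
    # Step 1: N clusters, where each example is its own cluster (indices into X)
    clusters = [[n] for n in range(len(X))]
    while len(clusters) > target_clusters:            # Step 4: repeat until termination
        # Step 2: intercluster distance (between cluster centres) for every pair
        centres = [X[idx].mean(axis=0) for idx in clusters]
        best_pair, best_d = None, np.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(centres[i] - centres[j])
                if d < best_d:
                    best_pair, best_d = (i, j), d
        # Step 3: merge the most similar pair (minimum intercluster distance)
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        clusters.pop(j)
    return clusters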
Illustration: Agglomerative Hierarchical Clustering
• Intercluster similarity:
– Single example in the clusters: Euclidean distance
– More than one example in a cluster: distance between the centres of the two clusters
[Figure: agglomerative clustering of 2-D data in the (x1, x2) plane, with a threshold on the merging distance]
Density-Based Clustering
• These methods cluster the collection of examples based on the notion of density
• These methods regard clusters as dense regions of
examples in the data space that are separated by
regions of low density (i.e. noise)
• They discover clusters with arbitrary shape
• They automatically identify the number of clusters
• General idea: To continue growing the given cluster as
long as density (number of examples) in the
neighbourhood exceeds some threshold
• Example: Density-based Spatial Clustering of
Applications with Noise (DBSCAN)
– It grows the clusters according to a density-based
connectivity analysis
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
• DBSCAN is a density-based clustering method that accounts for noise
• It grows regions with sufficiently high density (of neighbors) into clusters of arbitrary shape
• It defines a cluster as a maximal set of density-connected points
• DBSCAN has 5 important components (see the sketch after this list):
1. Epsilon (ε): the radius of the boundary (neighborhood) around every example
2. MinPoints: the minimum number of examples that must lie inside the boundary of radius ε around an example x
• The examples inside this boundary are the neighbors of x and are called the ε-neighborhood of x
3. Core point: if there are at least MinPoints examples within the ε-radius of x, then x is called a core point
4. Border point: if the number of examples within the ε-radius of x is less than MinPoints AND at least one example in its neighborhood is a core point, then x is called a border point
5. Noise point: if the number of examples within the ε-radius of x is less than MinPoints AND no example in its neighborhood is a core point, then x is called a noise point
• A noise point is similar to an outlier
[Figure: 2-D examples in the (x1, x2) plane showing the ε-radius around a point x and the resulting core, border and noise points, for MinPoints = 4]
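A minimal sketch of how the five components above translate into labelling every example as a core, border or noise point (names are illustrative; here a point is counted as part of its own ε-neighborhood):

import numpy as np

def label_points(X, eps, min_points):
    # Pairwise Euclidean distances and ε-neighborhood membership
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbours = d <= eps
    # Core points: at least MinPoints examples within the ε-radius
    is_core = neighbours.sum(axis=1) >= min_points
    labels = []
    for n in range(len(X)):
        if is_core[n]:
            labels.append("core")
        elif is_core[neighbours[n]].any():   # a core point lies in the neighborhood
            labels.append("border")
        else:
            labels.append("noise")
    return labels, is_core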
Clustering using DBSCAN
• Given: Training data D (MinPoints = 4 in the illustrations)
[Figure: 2-D training data in the (x1, x2) plane]
• Identify core points, border points and noise points
[Figure: the same data with every point marked as a core point, border point or noise point]
• Obtain the connected components of core points
– Directly density-reachable: a point is directly density-reachable from a core point if it lies within the ε-radius of that core point
– Density-reachable: a point is density-reachable from a core point if it can be reached through a chain of directly density-reachable core points
• All the core points in a connected component form a cluster
• Assign each border point to a nearby cluster that is within the ε-radius of that border point
[Figure: the clusters grow step by step from the connected core points and their border points]
Clustering using DBSCAN
• Training process:
• Given: Training data D
• Identify the core points, border points and noise
points
• Find the connected components of core points
• Each connected component forms a cluster
• Assign each border point to a nearby cluster that is within the ε-radius of that border point
• Noise points are not assigned to any cluster
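The same training process is available off the shelf, for example in scikit-learn, where min_samples plays the role of MinPoints and noise points are given the label -1 (a usage sketch, assuming scikit-learn is installed and eps is tuned for the data at hand):

import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(200, 2)                     # placeholder 2-D training data
labels = DBSCAN(eps=0.1, min_samples=4).fit_predict(X)
# labels[n] is the cluster index of example n; noise points are labelled -1
print("clusters found:", len(set(labels) - {-1}))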