Clustering: Partitioning, Hierarchical, Density-Based
Clustering
• Process of grouping a set of examples (samples)
• Clustering generates a partition consisting of cohesive groups or clusters from a given collection of examples (samples)
[Figure: a collection of examples is fed to a clustering algorithm, which produces a partition (clusters)]
• For example:
– Grouping students in a class based on gender
– Grouping students in a class based on the month of birth
– Grouping students based on their seating position
Categorization of Clustering Methods
• Partitioning methods
• Hierarchical methods
• Density-based methods
Categorization of Clustering Methods
• Partitioning methods:
– These methods construct K partitions of the data, where
each partition represents a cluster
– Idea: Cluster the collection of examples based on the
distance between examples
– Results in spherical-shaped clusters
1. K-means algorithm
2. K-medoids algorithm
3. Gaussian mixture model
• Hierarchical methods:
– These methods create a hierarchical decomposition of
the collection of examples
– Results in spherical-shaped clusters
1. Agglomerative approach (bottom-up approach)
2. Divisive approach (top-down approach)
Categorization of Clustering Methods
• Density-based methods:
– These methods cluster the collection of examples based on the notion of density
– General idea: To continue growing the given cluster as
long as density (number of examples) in the
neighbourhood exceeds some threshold
– Example:
• DBSCAN (Density-Based Spatial Clustering of Applications
with Noise)
Partitioning-Method-Based Clustering
Classical Partitioning Methods
• Centroid-based technique:
– Partition the collection of examples into K clusters based
on the distance between examples
– Cluster similarity is measured with respect to the sample mean of the examples within a cluster
– Cluster centroid or center of gravity: Sample mean value
of the examples within a cluster
– Cluster center is used to represent the cluster
– Example: K-means algorithm
• Representative object-based technique:
– An actual example is chosen to represent the cluster
– One representative example per cluster
– Example: K-medoids algorithm
K-Means Clustering Algorithm
• Dividing the data into K groups or partitions
• Given: Training data D, and the number of clusters K
K-Means Clustering Algorithm: Training Phase
• Given: Training data D, and the number of clusters K
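The step-by-step equations of this slide are not reproduced in the extracted text, so the following is a minimal sketch of the standard K-means training loop (assign each example to its closest center, recompute each center as the sample mean, repeat). The function name and parameters are illustrative, not the slide's notation.

import numpy as np

def kmeans(X, K, max_iters=100, tol=1e-4, seed=0):
    # Initialise the K cluster centers with randomly chosen training examples
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)].astype(float)
    prev_J = np.inf
    for _ in range(max_iters):
        # Assignment step: each example goes to its closest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the sample mean of its assigned examples
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
        # Distortion J: sum of squared distances of the examples to their centers
        J = ((X - centers[labels]) ** 2).sum()
        if prev_J - J < tol:        # convergence: J stops improving
            break
        prev_J = J
    return labels, centers, J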
K-Means Clustering Algorithm: Training Phase
• Convergence criteria:
– No change in the cluster assignment, OR
– The difference between the distortion measure (J) in successive iterations falls below a threshold
• Distortion measure (J): sum of the squared distances of each example to its assigned cluster center
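Written out (with μk denoting the center of cluster k and rnk ∈ {0, 1} indicating whether example xn is assigned to cluster k, notation assumed here rather than taken from the slide):

J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2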
Illustration of K-Means Clustering
[Figure: K-means clustering of 2-D data with K = 3]
Elbow Method to Choose K
• Determine the distortion measure for different values
of K
• Plot K versus the distortion
• Optimal number of clusters: select the value of K at the “elbow”, i.e. the point after which the distortion starts decreasing in a linear fashion
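A minimal sketch of the elbow method, reusing the kmeans() sketch from above and assuming matplotlib is available for the plot:

import matplotlib.pyplot as plt

def elbow_plot(X, k_values=range(1, 11)):
    # Run K-means for each candidate K and record the final distortion J
    distortions = [kmeans(X, K)[2] for K in k_values]
    plt.plot(list(k_values), distortions, marker="o")
    plt.xlabel("Number of clusters K")
    plt.ylabel("Distortion J")
    plt.title("Elbow method: pick K at the bend of the curve")
    plt.show()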
K-Medoids Clustering Algorithm
• Given: Training data D, and the number of clusters K
1. Initialize the medoids, k = 1, 2, …, K, using K randomly selected data points in D
2. Assign each data point xn to the closest medoid, using the squared Euclidean distance (Nk: number of examples in cluster k)
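The medoid-update step is not reproduced in the extracted text; the sketch below assumes a common variant in which each medoid is replaced by the cluster member that minimises the total squared Euclidean distance to the other members of its cluster. Names and parameters are illustrative.

import numpy as np

def kmedoids(X, K, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialise the medoids with K randomly selected data points from D
    medoid_idx = rng.choice(len(X), size=K, replace=False)
    for _ in range(max_iters):
        # 2. Assign each data point to the closest medoid (squared Euclidean distance)
        d = ((X[:, None, :] - X[medoid_idx][None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Update: per cluster, pick the member with minimum total distance to the rest
        new_idx = medoid_idx.copy()
        for k in range(K):
            members = np.where(labels == k)[0]
            if len(members) == 0:
                continue
            within = ((X[members][:, None, :] - X[members][None, :, :]) ** 2).sum(axis=2)
            new_idx[k] = members[within.sum(axis=1).argmin()]
        if np.array_equal(new_idx, medoid_idx):   # converged: medoids unchanged
            break
        medoid_idx = new_idx
    return labels, X[medoid_idx]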
Evaluation of Clustering: Purity Score
• Let us assume that the class index of each example is given
• Purity score: Purity is a measure of the extent to
which clusters contain a single class
– For each cluster, count the number of data points from
the most common class
– Take the sum over all clusters and divide by the total
number of data points
• Let M be the number of classes, C1, C2,…,Cm, …, CM
• Let K be the number of clusters, k = 1,2,…, K
• Let N be the number of data points
Evaluation of Clustering: Purity Score
• For each cluster k,
– Count the number of data points from each class
– Consider the number of data points of the most common class
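Putting the two steps together (with nkm denoting the number of data points of class Cm that fall in cluster k, notation assumed here):

\text{purity} = \frac{1}{N} \sum_{k=1}^{K} \max_{m} \, n_{km}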
Illustration of Computing Purity Score
• Number of data points, N = 25
• Number of classes, M = 3
• Number of clusters, K = 3
• Cluster 1: the most common class is Blue, with 5 examples
• Cluster 2: the most common class is Red, with 5 examples
• Cluster 3: the most common class is Green, with 5 examples
[Figure: the 25 data points plotted in the (x1, x2) plane, grouped into the three clusters]
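With these counts, the most common class in each of the three clusters contributes 5 of the N = 25 points, so the purity score is:

\text{purity} = \frac{5 + 5 + 5}{25} = 0.6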
Hierarchical Clustering
Hierarchical Clustering Algorithms
• These methods create a hierarchical decomposition of
the collection of examples
• Produce a nested sequence of data partitions
• This sequence can be depicted using a tree structure
• Hierarchical clustering method works by grouping data
points into a tree of clusters
• Hierarchical algorithms are either agglomerative or
divisive
– This classification depends on whether the hierarchical decomposition is formed in a
• Bottom-up (merging) OR
• Top-down (splitting) fashion
• The number of clusters need not be specified in advance
Agglomerative Hierarchical Clustering
• Bottom-up approach
• This strategy starts by placing each example in its
own cluster (atomic clusters or singleton clusters) and
then merges these atomic clusters into larger and
larger clusters
[Figure: agglomerative merging of atomic clusters into larger clusters, with a threshold level]
Agglomerative Hierarchical Clustering
• Different intercluster similarity measures can be used to find the similarity between clusters having more than one example (common choices are sketched below):
3. Average distance of all the points in one cluster (Ci) to all the points in another cluster (Cj)
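Only the average-distance measure (item 3) survives in the extracted text; other standard choices are the minimum distance (single linkage), the maximum distance (complete linkage) and the distance between cluster centres. A minimal sketch of these measures, with illustrative names:

import numpy as np

def pairwise_dists(Ci, Cj):
    # All Euclidean distances between points of cluster Ci and points of cluster Cj
    return np.linalg.norm(Ci[:, None, :] - Cj[None, :, :], axis=2)

def single_link(Ci, Cj):    return pairwise_dists(Ci, Cj).min()    # minimum distance
def complete_link(Ci, Cj):  return pairwise_dists(Ci, Cj).max()    # maximum distance
def average_link(Ci, Cj):   return pairwise_dists(Ci, Cj).mean()   # item 3: average distance
def centroid_link(Ci, Cj):  return float(np.linalg.norm(Ci.mean(axis=0) - Cj.mean(axis=0)))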
Agglomerative Hierarchical Clustering
• Given: Training data D
• Target: Partition the data
• Step 1: Start with N clusters, where each example is its own cluster
• Step 2: Compute the intercluster similarity between each pair of clusters
• Step 3: Choose the pair of clusters that is most similar (minimum intercluster distance) and merge them
• Step 4: Repeat Step 2 and Step 3 until all the examples are in a single cluster or until certain termination conditions are satisfied (see the sketch below)
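A minimal sketch of Steps 1–4, using the centroid distance as the intercluster similarity (as in the illustration that follows) and a target number of clusters as the termination condition; names are illustrative.

import numpy as np

def agglomerative(X, target_clusters=1):
    # Step 1: N clusters, where each example is its own cluster (indices into X)
    clusters = [[n] for n in range(len(X))]
    while len(clusters) > target_clusters:            # Step 4: repeat until termination
        # Step 2: intercluster distance (between cluster centres) for every pair
        centres = [X[idx].mean(axis=0) for idx in clusters]
        best_pair, best_d = None, np.inf
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = np.linalg.norm(centres[i] - centres[j])
                if d < best_d:
                    best_pair, best_d = (i, j), d
        # Step 3: merge the most similar pair (minimum intercluster distance)
        i, j = best_pair
        clusters[i] = clusters[i] + clusters[j]
        clusters.pop(j)
    return clusters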
Illustration: Agglomerative Hierarchical Clustering
• Intercluster similarity:
– Single example in the clusters: Euclidean distance
– More than one example in a cluster: distance between the centres of the two clusters
[Figure: agglomerative clustering of 2-D data in the (x1, x2) plane, with a threshold on the merging distance]
Density-Based Clustering
• These methods cluster the collection of examples based on the notion of density
• These methods regard clusters as dense regions of
examples in the data space that are separated by
regions of low density (i.e. noise)
• They discover clusters with arbitrary shape
• They automatically identify the number of clusters
• General idea: To continue growing the given cluster as
long as density (number of examples) in the
neighbourhood exceeds some threshold
• Example: Density-based Spatial Clustering of
Applications with Noise (DBSCAN)
– It grows the clusters according to a density-based
connectivity analysis
Density-Based Spatial Clustering of Applications with Noise (DBSCAN)
• DBSCAN is a density-based clustering method that accounts for noise
• It grows regions with sufficiently high density (of neighbors) into clusters of arbitrary shape
• It defines a cluster as a maximal set of density-connected points
• DBSCAN has 5 important components (see the sketch after this list):
1. Epsilon (ε): the radius of the boundary (neighborhood) around every example
2. MinPoints: the minimum number of examples that must lie inside the boundary of radius ε around an example x
• The examples inside this boundary are the neighbors of x and are called the ε-neighborhood of x
3. Core point: if there are at least MinPoints examples within the ε-radius of x, then x is called a core point
4. Border point: if the number of examples within the ε-radius of x is less than MinPoints AND at least one example in its neighborhood is a core point, then x is called a border point
5. Noise point: if the number of examples within the ε-radius of x is less than MinPoints AND no example in its neighborhood is a core point, then x is called a noise point
• A noise point is similar to an outlier
[Figure: 2-D examples in the (x1, x2) plane showing the ε-radius around a point x and the resulting core, border and noise points, for MinPoints = 4]
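A minimal sketch of how the five components above translate into labelling every example as a core, border or noise point (names are illustrative; here a point is counted as part of its own ε-neighborhood):

import numpy as np

def label_points(X, eps, min_points):
    # Pairwise Euclidean distances and ε-neighborhood membership
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbours = d <= eps
    # Core points: at least MinPoints examples within the ε-radius
    is_core = neighbours.sum(axis=1) >= min_points
    labels = []
    for n in range(len(X)):
        if is_core[n]:
            labels.append("core")
        elif is_core[neighbours[n]].any():   # a core point lies in the neighborhood
            labels.append("border")
        else:
            labels.append("noise")
    return labels, is_core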
Clustering using DBSCAN
• Given: Training data D (MinPoints = 4 in the illustrations)
[Figure: 2-D training data in the (x1, x2) plane]
• Identify core points, border points and noise points
[Figure: the same data with every point marked as a core point, border point or noise point]
• Obtain the connected components of core points
– Directly density-reachable: a point is directly density-reachable from a core point if it lies within the ε-radius of that core point
– Density-reachable: a point is density-reachable from a core point if it can be reached through a chain of directly density-reachable core points
• All the core points in a connected component form a cluster
• Assign each border point to a nearby cluster that is within the ε-radius of that border point
[Figure: the clusters grow step by step from the connected core points and their border points]
Clustering using DBSCAN
• Training process:
• Given: Training data D
• Identify the core points, border points and noise
points
• Find the connected components of core points
• Each connected component forms a cluster
• Assign each border point to a nearby cluster that is within the ε-radius of that border point
• Noise points are not assigned to any cluster
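The same training process is available off the shelf, for example in scikit-learn, where min_samples plays the role of MinPoints and noise points are given the label -1 (a usage sketch, assuming scikit-learn is installed and eps is tuned for the data at hand):

import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(200, 2)                     # placeholder 2-D training data
labels = DBSCAN(eps=0.1, min_samples=4).fit_predict(X)
# labels[n] is the cluster index of example n; noise points are labelled -1
print("clusters found:", len(set(labels) - {-1}))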