Lecture - 11 Hierarchical Clustering
Hierarchical Clustering
Hierarchical Clustering - Dendrogram
• Hierarchical clustering constructs a binary tree over the data by successively combining related groups of points. The
graphical representation of the resulting hierarchy is a tree-structured graph called a dendrogram.
Hierarchical Clustering - Strategies
• Agglomerative (bottom-up):
• Begins with singletons (sets with one element)
• Merges them until the whole dataset S is reached at the root
• In each step, the two closest clusters are aggregated into a new combined cluster
• In this way, the number of clusters in the dataset is reduced at each step
• Eventually, all records/elements are combined into a single large cluster
• This is the most common approach
• Divisive (top-down):
• All records start combined in one big cluster
• Then the most dissimilar records are split off, recursively partitioning S until
singleton sets are reached
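The agglomerative strategy above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: it assumes 1-D points and single linkage (distance between the closest members of two clusters), and stops when k clusters remain.

```python
# Minimal sketch of agglomerative (bottom-up) clustering for 1-D points,
# using single linkage; an illustration, not the lecture's exact algorithm.

def agglomerative(points, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]          # begin with singletons
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the closest pair
        del clusters[j]                           # one fewer cluster per step
    return clusters

print(agglomerative([10, 7, 28, 20, 35], 2))   # [[10, 7], [28, 35, 20]]
```

Run on the marks used in the worked example later, it first merges 10 and 7 (the closest pair), mirroring the slides' Step 2.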
Hierarchical Clustering - Strategies
Hierarchical Agglomerative Clustering - Steps
Example - Dataset
Reg# Marks
1 10
2 7
3 28
4 20
5 35
Example – Proximity Matrix
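The proximity matrix for this dataset can be reconstructed with a small sketch. The assumption here (not stated explicitly on the slide) is that the distance between two students is the absolute difference of their marks.

```python
# Hypothetical reconstruction of the proximity matrix for the example
# dataset, assuming distance = absolute difference of marks.

def proximity(marks):
    """Pairwise |difference| distances, keyed by (Reg#, Reg#)."""
    return {(i, j): abs(mi - mj)
            for i, mi in marks.items()
            for j, mj in marks.items()}

marks = {1: 10, 2: 7, 3: 28, 4: 20, 5: 35}
P = proximity(marks)

# The smallest off-diagonal entry is d(1, 2) = 3, so Reg# 1 and 2 merge first.
print(min((v, pair) for pair, v in P.items() if pair[0] != pair[1]))
```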
Example – Step 1
Example – Step 2
Reg# Marks
(1,2) 10
3 28
4 20
5 35
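After Reg# 1 and 2 are merged, the distances from the new cluster (1,2) to the remaining points must be recomputed. A sketch of this update under a single-linkage assumption (the slide does not state which linkage it uses):

```python
# Sketch of the proximity update after merging Reg# 1 and 2,
# assuming single linkage (closest members of the two clusters).

clusters = {"(1,2)": [10, 7], "3": [28], "4": [20], "5": [35]}

def linkage(a, b):
    """Single-linkage distance between two clusters of marks."""
    return min(abs(x - y) for x in a for y in b)

names = list(clusters)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, linkage(clusters[a], clusters[b]))
```

Under this assumption the smallest remaining distance is 7, between Reg# 3 and 5, which would be the next pair to merge.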
Example – Step 3
Example – How many clusters?
Distance Measures
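Three standard cluster-to-cluster distance measures (linkages) can be sketched as follows. The names are the conventional ones; the slide's own list of measures is not preserved in this text.

```python
# Standard linkage (cluster-to-cluster distance) measures, sketched
# for 1-D clusters of marks.

def single(a, b):
    """Distance between the two closest members (nearest neighbour)."""
    return min(abs(x - y) for x in a for y in b)

def complete(a, b):
    """Distance between the two farthest members."""
    return max(abs(x - y) for x in a for y in b)

def average(a, b):
    """Mean of all pairwise member-to-member distances."""
    return sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))

a, b = [10, 7], [28, 35]
print(single(a, b), complete(a, b), average(a, b))   # 18 28 23.0
```

Single linkage tends to produce elongated "chained" clusters, while complete linkage favours compact ones; average linkage sits between the two.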
Summary
Divisive Hierarchical Clustering
Hierarchical Clustering - Steps
ID X1 X2 X3
1   1  6 -1
2   3  7  0
3   3  5 -2
4   4  8 -1
5   5  8  0

ID  1  2  3  4  5
1   0  4  4  5  7
2   4  0  4  3  3
3   4  4  0  5  7
4   5  3  5  0  2
5   7  3  7  2  0
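The distance matrix on this slide can be reproduced from the (X1, X2, X3) rows using the Manhattan (city-block) distance, which every entry of the matrix is consistent with:

```python
# Reproducing the slide's distance matrix from the data rows with the
# Manhattan (city-block) distance.

data = {1: (1, 6, -1), 2: (3, 7, 0), 3: (3, 5, -2),
        4: (4, 8, -1), 5: (5, 8, 0)}

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

for i in sorted(data):
    print([manhattan(data[i], data[j]) for j in sorted(data)])
```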
Example
For the next step, we choose the cluster with the largest diameter, that is, the cluster with the greatest distance between two points in the cluster.

ID  1  2  3  4  5
1   0  4  4  5  7
2   4  0  4  3  3
3   4  4  0  5  7
4   5  3  5  0  2
5   7  3  7  2  0

So cluster {1,3} has the largest diameter. Trivially, this will be split into {1} and {3}. So now we have clusters {2,4,5}, {1} and {3}.
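The diameter check above can be sketched directly from the distance matrix: the diameter of a cluster is its largest internal pairwise distance, and the cluster with the largest diameter is split next.

```python
# Sketch of the diameter check used by divisive clustering: for each
# current cluster, find the largest internal pairwise distance.

dist = {(1, 2): 4, (1, 3): 4, (1, 4): 5, (1, 5): 7,
        (2, 3): 4, (2, 4): 3, (2, 5): 3,
        (3, 4): 5, (3, 5): 7, (4, 5): 2}

def d(i, j):
    """Symmetric lookup into the upper-triangular distance table."""
    return 0 if i == j else dist[(min(i, j), max(i, j))]

def diameter(cluster):
    """Largest pairwise distance within a cluster."""
    return max(d(i, j) for i in cluster for j in cluster)

print(diameter({2, 4, 5}), diameter({1, 3}))   # 3 4 -> split {1, 3} next
```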
Example - Dendrogram