Hierarchical Clustering

The document discusses hierarchical clustering, detailing two main types: agglomerative (bottom-up) and divisive (top-down), along with their respective algorithms and methodologies. It explains the process of merging or splitting clusters based on distance metrics and illustrates the concept using examples and distance matrices. Additionally, it introduces the dendrogram as a visual representation of the clustering hierarchy.


Hierarchical Clustering

• Clusters are created in levels, producing a set of clusters at each level.
• Types:
– Agglomerative
• Initially each item is in its own cluster
• Iteratively clusters are merged together
• Bottom Up approach is followed
– Divisive
• Initially all items are in one cluster
• Large clusters are successively divided
• Top Down approach is followed
Types of hierarchical clustering
• Divisive (top down) clustering: starts with all data points in one cluster (the root), then
  – splits the root into a set of child clusters; each child cluster is recursively divided further
  – stops when only singleton clusters of individual data points remain, i.e., each cluster contains only a single point

• Agglomerative (bottom up) clustering: the dendrogram is built from the bottom level by
  – merging the most similar (or nearest) pair of clusters
  – stopping when all the data points are merged into a single cluster (i.e., the root cluster)
Hierarchical Clustering
• A distance matrix is used as the clustering criterion. This method does not require the number of clusters k as an input, but it needs a termination condition.

[Figure: Agglomerative Nesting (AGNES) proceeds from Step 0 to Step 4, merging a and b into ab, d and e into de, c and de into cde, and finally ab and cde into abcde; Divisive Analysis (DIANA) performs the same splits in reverse, starting from the single cluster abcde and ending with the individual points.]
Hierarchical Algorithms
Algorithms are based on how the distance between two clusters is defined:
• Single Link – the distance between two clusters is the minimum distance between the two groups (the similarity of two clusters is the similarity of their most similar members)
• Complete Link – the distance between two clusters is the maximum distance between the two groups (the similarity of two clusters is the similarity of their most dissimilar members)
• Average Link – the distance between two clusters is the average of all pairwise distances between members of the two groups
Dendrogram
• Dendrogram: a tree data
structure which illustrates
hierarchical clustering
techniques.
• Each level shows clusters for
that level.
– Leaf – individual (singleton) clusters
– Root – a single cluster containing all items
• A cluster at level i is the union of its child clusters at level i+1.
Levels of Clustering
Single Link
• View all items with links (distances) between them.
• Finds maximal connected components in this graph.
• Two clusters are merged if there is at least one edge which
connects them.
• Uses threshold distances at each level.
• Could be agglomerative or divisive.
AGNES (Agglomerative Nesting)
• Introduced in Kaufman and Rousseeuw (1990)
• Uses the Single-Link method and the dissimilarity
matrix.
– Merge nodes that have the least dissimilarity
– Continue the merging of nodes in a non-descending fashion
– Eventually all nodes belong to the same cluster

[Figure: three scatter plots (axes 0 to 10) showing the data points being progressively merged into larger clusters by AGNES.]
DIANA (Divisive Analysis)
• Introduced in Kaufman and Rousseeuw (1990)
• Inverse order of AGNES
• Eventually each node forms a cluster on its own
• A simple divisive algorithm is based on the MST version of the single-link algorithm:
– All items are initially placed in one cluster
– Clusters are split into two until all items are in their own cluster

[Figure: three scatter plots (axes 0 to 10) showing DIANA progressively splitting the single cluster into smaller clusters.]
AGNES - Example 1
• Consider the data points A(1,1), B(1.5, 1.5), C (5,5), D(3,4),
E(4,4) and F(3,3.5).
• Example reference: people.revoledu.com/kardi/tutorial/Clustering/Numerical%20Example.htm (listed in the References slide)
• The distance (Euclidean) matrix is
Dist A B C D E F
A 0.00 0.71 5.66 3.61 4.24 3.20
B 0.71 0.00 4.95 2.92 3.54 2.50
C 5.66 4.95 0.00 2.24 1.41 2.50
D 3.61 2.92 2.24 0.00 1.00 0.50
E 4.24 3.54 1.41 1.00 0.00 1.12
F 3.20 2.50 2.50 0.50 1.12 0.00
Example 1 (cont’d)
• Initially, each of the 6 objects is in a cluster of its own.
• In the beginning we therefore have 6 clusters. We iterate until a single cluster contains all six original objects.
• In each step of the iteration, we find the closest pair of clusters.
• The closest pair is clusters D and F, with the shortest distance of 0.5.
• Thus, we group clusters D and F into cluster (D, F).
• Distances between the ungrouped clusters do not change from the original distance matrix.
• Then we update the distance matrix as given below.
Dist A B C D,F E
A 0.00 0.71 5.66 ? 4.24
B 0.71 0.00 4.95 ? 3.54
C 5.66 4.95 0.00 ? 1.41
D,F ? ? ? 0.00 ?
E 4.24 3.54 1.41 ? 0.00
Example 1 (cont’d)
• Calculate the distance between the newly grouped cluster (D, F) and the other clusters.
  – Use the linkage rule (single link): with single linkage, we take the minimum distance between the original objects of the two clusters.
Dist A B C D,F E
A 0.00 0.71 5.66 ? 4.24
B 0.71 0.00 4.95 ? 3.54
C 5.66 4.95 0.00 ? 1.41
D,F ? ? ? 0.00 ?
E 4.24 3.54 1.41 ? 0.00
Example 1 (cont’d) – Distance table update:
Using the input distance matrix, the distance between cluster (D, F) and cluster A is computed as
d((D,F), A) = min(d(D,A), d(F,A)) = min(3.61, 3.20) = 3.20

The distance between cluster (D, F) and cluster B is
d((D,F), B) = min(d(D,B), d(F,B)) = min(2.92, 2.50) = 2.50

Similarly, the distance between cluster (D, F) and cluster C is
d((D,F), C) = min(d(D,C), d(F,C)) = min(2.24, 2.50) = 2.24

Finally, the distance between cluster E and cluster (D, F) is calculated as
d((D,F), E) = min(d(D,E), d(F,E)) = min(1.00, 1.12) = 1.00
Then, the updated distance matrix becomes:
Dist A B C D,F E
A 0.00 0.71 5.66 3.20 4.24
B 0.71 0.00 4.95 2.50 3.54
C 5.66 4.95 0.00 2.24 1.41
D,F 3.20 2.50 2.24 0.00 1.00
E 4.24 3.54 1.41 1.00 0.00
Example 1 (cont’d)
• Looking at the lower-triangular part of the updated distance matrix (previous slide), the closest distance is between cluster B and cluster A (0.71). Thus, we group cluster A and cluster B into a single cluster named (A, B).
  – Now we update the distance matrix. Except for the first row and first column, all the other elements of the new distance matrix are unchanged.
New matrix (distances to the merged cluster (A, B) still to be computed):
Dist A,B C D,F E
A,B 0.00 ? ? ?
C ? 0.00 2.24 1.41
D,F ? 2.24 0.00 1.00
E ? 1.41 1.00 0.00

Previous matrix, for reference:
Dist A B C D,F E
A 0.00 0.71 5.66 3.20 4.24
B 0.71 0.00 4.95 2.50 3.54
C 5.66 4.95 0.00 2.24 1.41
D,F 3.20 2.50 2.24 0.00 1.00
E 4.24 3.54 1.41 1.00 0.00
Example 1 (cont’d)
• Using the input distance matrix (size 6 by 6), the distance between cluster C and cluster (D, F) is computed as
  d(C, (D,F)) = min(d(C,D), d(C,F)) = min(2.24, 2.50) = 2.24
• The distance between cluster (D, F) and cluster (A, B) is the minimum distance between all objects involved in the two clusters:
  d((D,F), (A,B)) = min(d(D,A), d(D,B), d(F,A), d(F,B)) = min(3.61, 2.92, 3.20, 2.50) = 2.50
• Similarly, the distance between cluster E and (A, B) is
  d(E, (A,B)) = min(d(E,A), d(E,B)) = min(4.24, 3.54) = 3.54
• Then the updated distance matrix is

Dist A,B C D,F E
A,B 0.00 4.95 2.50 3.54
C 4.95 0.00 2.24 1.41
D,F 2.50 2.24 0.00 1.00
E 3.54 1.41 1.00 0.00
Example 1 (cont’d)
• From the updated distance matrix, the closest distance is between cluster E and cluster (D, F), at distance 1.00.
  – Thus, we merge them into cluster ((D, F), E).
• The updated distance matrix is given below.
Dist A,B C (D,F),E
A,B 0.00 4.95 2.50
C 4.95 0.00 1.41
(D,F),E 2.50 1.41 0.00
Example 1 (cont’d)
• The distance between cluster ((D, F), E) and cluster (A, B) is calculated as
  d(((D,F),E), (A,B)) = min(d(D,A), d(D,B), d(F,A), d(F,B), d(E,A), d(E,B)) = min(3.61, 2.92, 3.20, 2.50, 4.24, 3.54) = 2.50
• The distance between cluster ((D, F), E) and cluster C yields the minimum distance of 1.41:
  d(((D,F),E), C) = min(d(D,C), d(F,C), d(E,C)) = min(2.24, 2.50, 1.41) = 1.41
  – Hence, we merge cluster ((D, F), E) and cluster C into a new cluster named (((D, F), E), C).
• The updated distance matrix, before and after merging C, is:

Dist A,B C (D,F),E
A,B 0.00 4.95 2.50
C 4.95 0.00 1.41
(D,F),E 2.50 1.41 0.00

Dist A,B ((D,F),E),C
A,B 0.00 2.50
((D,F),E),C 2.50 0.00
Example 1 (cont’d)

• The minimum distance of 2.50 is the result of the following computation:
  d((((D,F),E),C), (A,B)) = min(d(D,A), d(D,B), d(F,A), d(F,B), d(E,A), d(E,B), d(C,A), d(C,B))
                          = min(3.61, 2.92, 3.20, 2.50, 4.24, 3.54, 5.66, 4.95) = 2.50

Dist A,B ((D,F),E),C
A,B 0.00 2.50
((D,F),E),C 2.50 0.00
Example 1 (cont’d)
Summary
• In the beginning we have 6 clusters: A, B, C, D, E and F
• We merge cluster D and F into cluster (D, F) at distance 0.50
• We merge cluster A and cluster B into (A, B) at distance 0.71
• We merge cluster E and (D, F) into ((D, F), E) at distance 1.00
• We merge cluster ((D, F), E) and C into (((D, F), E), C) at distance 1.41
• We merge cluster (((D, F), E), C) and (A, B) into ((((D, F), E), C), (A, B)) at
distance 2.50
• The last cluster contains all the objects.
• Using this information, we can now draw the final dendrogram. The dendrogram is drawn based on the distances at which the clusters above were merged.
Example 1 (cont’d)

• The hierarchy is given as ((((D, F), E), C), (A, B)). We can also plot the clustering hierarchy in XY space using the original coordinates:
x1 x2
A 1 1
B 1.5 1.5
C 5 5
D 3 4
E 4 4
F 3 3.5
Example 2
• Consider the distance matrix:
  – The minimum distance is between clusters E and F (1.31); hence they are clustered together first.
Dist. A B C D E F
A 0 7.56 12.15 34.34 45.11 46.42
B 7.56 0 19.71 41.9 52.67 53.98
C 12.15 19.71 0 22.19 32.96 34.27
D 34.34 41.9 22.19 0 10.77 20.08
E 45.11 52.67 32.96 10.77 0 1.31
F 46.42 53.98 34.27 12.08 1.31 0


Example 2 (cont’d)
• A “?” in a row or column means that two elements have been merged at this point of the calculation, but the distances from the new cluster to the other points have not yet been computed.
Dist. A B C D E, F
A 0 7.56 12.15 34.34 ?
B 7.56 0 19.71 41.9 ?
C 12.15 19.71 0 22.19 ?
D 34.34 41.9 22.19 0 ?
E, F ? ? ? ? 0

• The minimum distance is between clusters A and B, at 7.56.
• Clusters A and B are grouped into a single cluster named (A, B).
Dist. A, B C D E, F
A, B 0 ? ? ?
C ? 0 22.19 32.96
D ? 22.19 0 10.77
E, F ? 32.96 10.77 0
Example 2 (cont’d)
• The missing distances to the new clusters are computed with the single-link rule:
  – Distance between cluster (E, F) and A is min(45.11, 46.42) = 45.11
  – Distance between cluster (E, F) and B is min(52.67, 53.98) = 52.67, so d((E, F), (A, B)) = 45.11
• From the resulting distance matrix (below), the closest distance between clusters is between cluster D and (E, F), at distance 10.77.
Dist. A, B C D E, F
A, B 0 12.15 34.34 45.11
C 12.15 0 22.19 32.96
D 34.34 22.19 0 10.77
E, F 45.11 32.96 10.77 0
Example 2 (cont’d)
• The closest distance between clusters is between cluster D and (E, F), at distance 10.77. Thus, these are clustered together into cluster ((E, F), D).
Dist. A, B C D E, F
A, B 0 12.15 34.34 45.11
C 12.15 0 22.19 32.96
D 34.34 22.19 0 10.77
E, F 45.11 32.96 10.77 0

• In the updated matrix (below), the minimum distance appears between cluster (A, B) and C, at distance 12.15. Thus, they are clustered together into ((A, B), C).
Dist. A, B C (E, F), D
A, B 0 12.15 34.34
C 12.15 0 22.19
(E, F), D 34.34 22.19 0
Example 2 (cont’d)
• The minimum distance appears between cluster (A, B) and C, at distance 12.15. Thus, they are clustered together into ((A, B), C).

Dist. ((A, B), C) ((E, F), D)


((A, B), C) 0 22.19
((E, F), D) 22.19 0

• From the distance matrix, ((E, F), D) and ((A, B), C) are finally merged, at distance 22.19, into the cluster {((E, F), D), ((A, B), C)}.
  – This cluster contains all the objects, and thus the agglomerative hierarchical clustering terminates.
Example 3
Example of Complete Linkage Clustering
• Clustering starts by computing the distance between every pair of units to be clustered; these distances form the distance matrix (shown as a table in the original slides).
• The smallest distance is between items 3 and 5, so they get merged first into the cluster "35".
• Using complete linkage clustering, the distance between "35" and every other item is the maximum of the distances between that item and 3 and between that item and 5.
• For example, d(1,3) = 3 and d(1,5) = 11. So D(1,"35") = max(3, 11) = 11. This gives us the new distance matrix.
  – If it had been average linkage clustering, the distance would be (3+11)/2 = 7 (see the small check below).
• The items with the smallest distance get clustered next. This will be 2 and 4.
Example 3 (cont’d)

• Now, clusters 1 and (2,4) will get • Updated Distance


clustered at a height of 9.
• Finally we have a cluster of all 5 Matrices:
objects. 1 2,4 3,5
• On this plot given below, the y-axis 1 0
shows the distance between the 2,4 9 0
objects at the time they were 3,5 11 10 0
clustered. This is called the cluster
height.
1, 2,4 3,5
1,2,4 0
3,5 11 0
Example 3 (cont’d)
• The slides also show the single linkage dendrogram for the same distance matrix. It again starts with the cluster "35", but the distance between "35" and each item is now the minimum of d(x,3) and d(x,5). So D(1,"35") = min(3, 11) = 3.
Divisive Clustering – Example 1
• Consider the graph and its adjacency matrix
[Figure: a weighted graph on the five nodes A, B, C, D, E whose edge weights are given by the adjacency matrix below.]

  A B C D E
A 0 1 2 2 3
B 1 0 2 4 3
C 2 2 0 1 5
D 2 4 1 0 3
E 3 3 5 3 0

• Reference: Margaret H. Dunham, “Data Mining: Introductory and Advanced Topics”, Pearson, 2012.
Example 1 ( cont’d)
The Minimum Spanning Tree (MST) of this graph keeps the edges A-B (weight 1), B-C (weight 2), C-D (weight 1) and D-E (weight 3).

[Figure: the graph from the previous slide with only the MST edges highlighted, shown alongside the weight matrix.]
Example 1 ( cont’d)
• Cut edges from the MST repeatedly, starting from the largest weight down to the smallest (see the sketch after this list).
• Step 1: All items are in one cluster: {A, B, C, D, E}
• Step 2: The largest MST edge is between D and E; cutting it results in 2 clusters: {E}, {A, B, C, D}
• Step 3: Removing the edge between B and C results in {E}, {A, B}, {C, D}
• Step 4: Removing the edges between A and B (and between C and D) results in {E}, {A}, {B}, {C}, {D}
References
• https://newonlinecourses.science.psu.edu/stat555/node/86/
• https://people.revoledu.com/kardi/tutorial/Clustering/Numerical%20Example.htm
• Margaret H. Dunham, “Data Mining: Introductory and Advanced Topics”, Pearson, 2012.
