Hierarchical Clustering
Ke Chen
Outline
Introduction
Cluster Distance Measures
Agglomerative Algorithm
Example and Demo
Relevant Issues
Summary
COMP24111 Machine Learning
Introduction
Hierarchical Clustering Approach
Illustrative Example
Agglomerative and divisive clustering on the data set {a, b, c, d, e}

Agglomerative (bottom-up): start with every object as a singleton cluster and merge step by step:
Step 0: {a}, {b}, {c}, {d}, {e}
Step 1: {a, b}, {c}, {d}, {e}
Step 2: {a, b}, {c}, {d, e}
Step 3: {a, b}, {c, d, e}
Step 4: {a, b, c, d, e}

Divisive (top-down): the same sequence in reverse, splitting the single cluster {a, b, c, d, e} at Step 0 back into five singletons by Step 4.

Two choices must be made: a cluster distance measure and a termination condition.
Cluster Distance Measures
Single link (min): dist(C1, C2) = min{ d(x, y) : x in C1, y in C2 }
Complete link (max): dist(C1, C2) = max{ d(x, y) : x in C1, y in C2 }
Average: dist(C1, C2) = the mean of all pairwise distances d(x, y), x in C1, y in C2
In every case d(C, C) = 0.

Example: for C1 = {a, b} and C2 = {c, d, e} with d(a, c) = 3, d(a, d) = 4, d(a, e) = 5, d(b, c) = 2, d(b, d) = 3, d(b, e) = 4:
Single link: dist(C1, C2) = min{3, 4, 5, 2, 3, 4} = 2
Complete link: dist(C1, C2) = max{3, 4, 5, 2, 3, 4} = 5
Average: dist(C1, C2) = (3 + 4 + 5 + 2 + 3 + 4) / 6 = 21 / 6 = 3.5
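The three measures can be sketched in a few lines of Python; the pairwise distances below are the ones from the worked example, stored in an illustrative dictionary:

```python
# Cluster distance measures: single link (min), complete link (max), average.
def single_link(d, C1, C2):
    """Smallest pairwise distance between the two clusters."""
    return min(d[x, y] for x in C1 for y in C2)

def complete_link(d, C1, C2):
    """Largest pairwise distance between the two clusters."""
    return max(d[x, y] for x in C1 for y in C2)

def average_link(d, C1, C2):
    """Mean of all pairwise distances between the two clusters."""
    return sum(d[x, y] for x in C1 for y in C2) / (len(C1) * len(C2))

# Pairwise distances from the worked example: C1 = {a, b}, C2 = {c, d, e}.
d = {('a', 'c'): 3, ('a', 'd'): 4, ('a', 'e'): 5,
     ('b', 'c'): 2, ('b', 'd'): 3, ('b', 'e'): 4}
C1, C2 = ['a', 'b'], ['c', 'd', 'e']

print(single_link(d, C1, C2))    # 2
print(complete_link(d, C1, C2))  # 5
print(average_link(d, C1, C2))   # 3.5
```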
Agglomerative Algorithm
The agglomerative algorithm is carried out in three steps:
1) Convert all object features into a distance matrix.
2) Set each object as a cluster (so with N objects we start with N clusters).
3) Repeat until only one cluster remains (or a known number of clusters is reached):
   - merge the two closest clusters;
   - update the distance matrix.
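The three steps above can be sketched as a naive loop, assuming the distance matrix from step 1 is already computed. Single link is used here for the closest-pair test; the function name and inputs are illustrative:

```python
def agglomerative(dist, k=1):
    """Naive single-link agglomerative clustering.
    dist: symmetric N x N distance matrix (list of lists);
    k: stop when this many clusters remain."""
    # Step 2: start with each object as its own cluster.
    clusters = [[i] for i in range(len(dist))]
    # Step 3: repeat until only k clusters remain.
    while len(clusters) > k:
        # Find the two closest clusters under single link.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge them; the distance "update" is implicit, since the
        # single-link minimum is recomputed from raw distances each pass.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

dist = [[0, 1, 5],
        [1, 0, 4],
        [5, 4, 0]]
print(agglomerative(dist, k=2))  # [[0, 1], [2]]
```

Recomputing the minimum each pass keeps the sketch short but is inefficient; a practical implementation updates a stored distance matrix after every merge, as step 3 describes.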
Example
Problem: clustering analysis with the agglomerative algorithm.
First convert the data matrix into a distance matrix using the Euclidean distance.
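Step 1, turning a data matrix into a Euclidean distance matrix, might look like the sketch below; the sample points are made up for illustration, not the ones from the slides:

```python
import math

def distance_matrix(points):
    """Pairwise Euclidean distances for a list of feature vectors."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]

points = [(0, 0), (3, 4), (6, 8)]   # illustrative data matrix
D = distance_matrix(points)
# D is symmetric with a zero diagonal; e.g. D[0][1] == 5.0, D[0][2] == 10.0
```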
Example
Merge two closest clusters (iteration 1)
Example
Update distance matrix (iteration 1)
Example
Merge two closest clusters (iteration 2)
Example
Update distance matrix (iteration 2)
Example
Merge two closest clusters / update distance matrix (iteration 3)
Example
Merge two closest clusters / update distance matrix (iteration 4)
Example
Final result (meeting termination condition)
Example
Dendrogram tree representation
[Dendrogram: objects on the horizontal axis, merge distance on the vertical axis. The lifetime of a cluster is the span of distances over which it exists before being merged again.]
Example
Dendrogram tree representation: clustering USA states
Exercise
Given a data set of five objects {a, b, c, d, e} characterised by a single continuous feature (values given in the accompanying table), perform agglomerative clustering.
Demo
Agglomerative Demo
Relevant Issues
How to determine the number of clusters: when it is unknown in advance, a common rule is to cut the dendrogram where the lifetime (the distance gap a cluster survives before being merged again) is largest.
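The longest-lifetime rule can be sketched as follows, assuming the sequence of merge distances produced by the agglomerative algorithm is available; the function name and the sample distances are illustrative:

```python
def clusters_by_lifetime(merge_dists, n):
    """Pick a cluster count by cutting at the largest gap ("lifetime")
    between successive merge distances. n is the number of objects."""
    ds = sorted(merge_dists)
    # Gap between each pair of successive merges.
    gaps = [ds[i + 1] - ds[i] for i in range(len(ds) - 1)]
    cut = max(range(len(gaps)), key=lambda i: gaps[i])
    # After cut + 1 merges have been accepted, n - (cut + 1) clusters remain.
    return n - (cut + 1)

# Five objects whose merges happened at distances 1, 1.5, 2, and 8:
# the big jump to 8 suggests stopping after three merges, i.e. 2 clusters.
print(clusters_by_lifetime([1, 1.5, 2, 8], n=5))  # 2
```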
Summary
A hierarchical algorithm is a sequential clustering algorithm: it builds a nested hierarchy of clusters step by step, merging in the agglomerative case and splitting in the divisive case.