Hierarchical Clustering
The need:
Recalculate the distances or similarities between the newly formed cluster and
the remaining clusters.
Each merge reduces the number of clusters by one.
Continue merging the closest clusters and updating the distance matrix
until only one cluster remains. This process forms a hierarchy of clusters.
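The merge loop described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes single linkage and Euclidean distance, the sample points are hypothetical, and for brevity it recomputes cluster distances on every pass instead of updating a stored distance matrix.

```python
from math import dist

def agglomerate(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    clusters = [(name,) for name in points]  # start: every point is its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # each merge reduces the number of clusters by one
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical sample data, chosen only for illustration
sample = {"a": (0, 0), "b": (0, 1), "c": (4, 0), "d": (4, 3)}
for left, right, d in agglomerate(sample):
    print(left, right, round(d, 2))
```

The returned list records which clusters were joined and at what distance, which is the information a dendrogram displays.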
6. Creating a Dendrogram:
As clusters are merged, you can represent the hierarchical structure using
a dendrogram, a tree-like diagram that shows the order in which clusters
merge and how they relate to one another.
8. Cluster Assignment:
Once you’ve determined the desired number of clusters, you can assign
each data point to its corresponding cluster based on the hierarchical
structure you’ve created.
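One way to turn a merge history into cluster labels is to replay the merges in order and stop once the desired number of clusters remains. The sketch below assumes the history is a list of (left cluster, right cluster, distance) tuples; that format, the function name assign_clusters, and the sample history are all hypothetical.

```python
def assign_clusters(names, merges, k):
    """Replay merges in order until k clusters remain; return {point: label}."""
    clusters = {name: {name} for name in names}  # keyed by a representative point
    owner = {name: name for name in names}       # point -> its representative
    for left, right, _ in merges:
        if len(clusters) <= k:
            break
        a, b = owner[left[0]], owner[right[0]]
        clusters[a] |= clusters.pop(b)           # fold cluster b into cluster a
        for p in clusters[a]:
            owner[p] = a
    labels = {}
    for label, members in enumerate(clusters.values(), start=1):
        for p in members:
            labels[p] = label
    return labels

# Hypothetical merge history: (left cluster, right cluster, merge distance)
history = [(("a",), ("b",), 1.0),
           (("c",), ("d",), 3.0),
           (("a", "b"), ("c", "d"), 4.0)]
print(assign_clusters("abcd", history, k=2))
```

With k=2 the last merge is never replayed, so a and b share one label and c and d share the other.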
Practice Example:
Step 1: Initialization
Pair   Distance
A-B    √((2−1)² + (2−2)²) = √1 = 1
A-C    √((5−1)² + (5−2)²) = √(16+9) = √25 = 5
A-D    √((6−1)² + (5−2)²) = √(25+9) = √34 ≈ 5.83
B-C    √((5−2)² + (5−2)²) = √(9+9) = √18 ≈ 4.24
B-D    √((6−2)² + (5−2)²) = √(16+9) = √25 = 5
C-D    √((6−5)² + (5−5)²) = √1 = 1
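The table can be reproduced with a short script. The coordinates below are an assumption: they are inferred from the arithmetic in the table (A = (1, 2), B = (2, 2), C = (5, 5), D = (6, 5)) and are not stated explicitly in this example.

```python
from itertools import combinations
from math import dist

# Coordinates inferred from the distance calculations (an assumption)
points = {"A": (1, 2), "B": (2, 2), "C": (5, 5), "D": (6, 5)}

# Euclidean distance for every unordered pair of points
distances = {
    (p, q): dist(points[p], points[q])
    for p, q in combinations(points, 2)
}

for (p, q), d in distances.items():
    print(f"{p}-{q}: {d:.2f}")
```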
Step 3: First Merge
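A-B and C-D are tied for the smallest distance (1). Merge A and B first to form
the cluster {AB}.
Now, calculate the distances between the new cluster {AB} and the other points
using single linkage (the smallest distance between any pair of members):
{AB}-C = min(5, 4.24) = 4.24
{AB}-D = min(5.83, 5) = 5
C-D = 1
The smallest distance is C-D = 1, so merge C and D.
New cluster: {C, D}
Clusters now: {AB}, {CD}
The distances between the members of the two remaining clusters are:
A-C = 5
A-D = 5.83
B-C = 4.24
B-D = 5
With single linkage, the distance between {AB} and {CD} is the smallest of
these, 4.24, so the final merge happens at a distance of 4.24.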
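Under single linkage, the distance between two clusters is the smallest distance between any pair of their members. The following minimal sketch checks this for {A, B} and {C, D}; the coordinates are inferred from the distance calculations, and the helper name single_linkage is illustrative.

```python
from math import dist

# Coordinates inferred from the distance calculations (an assumption)
points = {"A": (1, 2), "B": (2, 2), "C": (5, 5), "D": (6, 5)}

def single_linkage(cluster1, cluster2):
    """Smallest distance between any member of cluster1 and any member of cluster2."""
    return min(dist(points[p], points[q]) for p in cluster1 for q in cluster2)

print(round(single_linkage(("A", "B"), ("C", "D")), 2))  # 4.24, the B-C distance
```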
If you cut the dendrogram at a distance of about 2, you get two clusters:
Cluster 1: {A, B}
Cluster 2: {C, D}
If you cut at a higher distance, such as 5, all points fall into one cluster,
because {A, B} and {C, D} merge at a distance of 4.24.
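For single linkage, cutting the dendrogram at a given height yields the same groups as linking every pair of points whose distance is at most that height and taking the connected components. The sketch below uses that equivalence; the coordinates are inferred from the distance table and the function name cut is illustrative.

```python
from math import dist

# Coordinates inferred from the distance calculations (an assumption)
points = {"A": (1, 2), "B": (2, 2), "C": (5, 5), "D": (6, 5)}

def cut(points, height):
    """Clusters from cutting a single-linkage dendrogram at `height`."""
    clusters = []
    unvisited = set(points)
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            p = stack.pop()
            # points reachable from p with a single hop of length <= height
            near = {q for q in unvisited if dist(points[p], points[q]) <= height}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(sorted(component))
    return sorted(clusters)

print(cut(points, 2))  # [['A', 'B'], ['C', 'D']]
print(cut(points, 5))  # [['A', 'B', 'C', 'D']]
```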
Practice Question:
Given five points:
Point Coordinates (x, y)
P (1, 2)
Q (2, 2)
R (8, 9)
S (9, 8)
T (8, 8)
Use Divisive Hierarchical Clustering with these conditions:
Use Euclidean distance.
At each split, separate the two farthest points into different clusters.
Continue splitting until every pairwise distance within each cluster is less
than or equal to 4.
Solution:
Step 1: Start with all points together {P, Q, R, S, T}.
Step 2: Find the farthest points:
Calculate Euclidean distances:
P–S = 10 (largest distance)
P–R ≈ 9.90
Other distances are smaller.
Thus, P and S are the farthest apart.
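The search for the farthest pair can be checked with a short script. This is a minimal sketch using the coordinates given in the question; the helper name farthest_pair is illustrative.

```python
from itertools import combinations
from math import dist

points = {"P": (1, 2), "Q": (2, 2), "R": (8, 9), "S": (9, 8), "T": (8, 8)}

def farthest_pair(names):
    """Return the pair of point names with the largest Euclidean distance."""
    return max(combinations(names, 2),
               key=lambda pair: dist(points[pair[0]], points[pair[1]]))

p, q = farthest_pair(points)
print(p, q, round(dist(points[p], points[q]), 2))  # P S 10.0
```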
Step 3: First Split:
Separate {P} from {Q, R, S, T}.
Clusters:
Cluster 1: {P}
Cluster 2: {Q, R, S, T}
Step 4: Split {Q, R, S, T}:
Among {Q, R, S, T}:
Q–R ≈ 9.22
Q–S ≈ 9.22
Q–T ≈ 8.49
R–S ≈ 1.41
R–T = 1
S–T = 1
The farthest pairs (Q–R and Q–S) both involve Q.
Thus, split {Q} from {R, S, T}.
New Clusters:
Cluster 1: {P}
Cluster 2: {Q}
Cluster 3: {R, S, T}
Step 5: Examine {R, S, T}:
Distances:
R–S ≈ 1.41
R–T = 1
S–T = 1
All distances are less than 4, so no further splitting is needed.
Final Clusters:
Cluster Number Points
Cluster 1 {P}
Cluster 2 {Q}
Cluster 3 {R, S, T}
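The whole procedure can be sketched as a short recursive function. The question does not say which of the two farthest points is split off or where the remaining points go, so this sketch makes an assumption that mirrors the worked solution: the point of the farthest pair with the larger total distance to the rest of the cluster is split off on its own, and the remaining points stay together. The names divisive and MAX_DISTANCE are illustrative.

```python
from itertools import combinations
from math import dist

points = {"P": (1, 2), "Q": (2, 2), "R": (8, 9), "S": (9, 8), "T": (8, 8)}
MAX_DISTANCE = 4  # stopping threshold from the question

def d(p, q):
    return dist(points[p], points[q])

def divisive(cluster):
    """Recursively split `cluster` (a list of point names); return final clusters."""
    if len(cluster) < 2:
        return [cluster]
    a, b = max(combinations(cluster, 2), key=lambda pair: d(*pair))
    if d(a, b) <= MAX_DISTANCE:
        return [cluster]  # every pairwise distance is within the threshold

    def total(p):
        return sum(d(p, q) for q in cluster)

    # Assumption: split off whichever of the two farthest points is,
    # in total, farther from the rest of the cluster.
    outlier = a if total(a) >= total(b) else b
    rest = [p for p in cluster if p != outlier]
    return [[outlier]] + divisive(rest)

print(divisive(list(points)))  # [['P'], ['Q'], ['R', 'S', 'T']]
```

Other assignment rules, such as placing each remaining point with the nearer of the two farthest points, would produce different clusters, so the result depends on this assumption.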
Dendrogram Structure:
{P, Q, R, S, T}
|-- {P}
`-- {Q, R, S, T}
    |-- {Q}
    `-- {R, S, T}
Practice Question:
You are given the following five points:
Point Coordinates (x, y)
A (2, 3)
B (3, 4)
C (10, 10)
D (11, 10)
E (10, 9)
Tasks:
1. Perform the clustering step-by-step.
2. Show the final clusters.
3. Draw a rough dendrogram structure showing the splits.