Objective: For One Dimensional Data Set (7,10,20,28,35), Perform Hierarchical Clustering
1. The first two points (7 and 10) are close to each other and should be in the same cluster.
2. Likewise, the last two points (28 and 35) are close to each other and should be in the same cluster.
3. The cluster of the center point (20) is not obvious.
Let’s solve the problem using both types of agglomerative hierarchical clustering:
1. Single Linkage: In single-link hierarchical clustering, each step merges the two clusters
whose closest members have the smallest distance.
Using single linkage, two clusters are formed: {7, 10} and {20, 28, 35}.
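As a rough check of the single-linkage rule on the one-dimensional data, the following Python sketch repeatedly merges the two clusters whose closest members are nearest, until two clusters remain. The function name `single_linkage_1d` and the stopping parameter `k` are illustrative choices, not part of the original exercise.

```python
def single_linkage_1d(points, k):
    """Agglomerative single-linkage clustering of 1-D points down to k clusters."""
    # Start with every point in its own cluster.
    clusters = [[p] for p in sorted(points)]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Merge cluster j into cluster i and drop cluster j.
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


result = single_linkage_1d([7, 10, 20, 28, 35], k=2)
print(result)  # [[7, 10], [20, 28, 35]]
```

Under this rule the center point 20 joins 28 and 35, because its distance to 28 (8) is smaller than its distance to 10 (10).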
First Example (Single Link):
Point   X      Y
P1      0.40   0.53
P2      0.22   0.38
P3      0.35   0.32
P4      0.26   0.19
P5      0.08   0.41
P6      0.45   0.30
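The iterations below start from the pairwise distances between the six points. As a minimal sketch, assuming the distances are Euclidean (the text does not state the metric), they can be recomputed from the coordinates in the table above. The names `points`, `distances`, and `closest_pair` are illustrative. Recomputed values can differ in the last digit from the rounded figures quoted in the text; for example, P3 to P6 comes out near 0.10 rather than 0.11.

```python
import math
from itertools import combinations

# Coordinates of the six points from the table above.
points = {
    "P1": (0.40, 0.53),
    "P2": (0.22, 0.38),
    "P3": (0.35, 0.32),
    "P4": (0.26, 0.19),
    "P5": (0.08, 0.41),
    "P6": (0.45, 0.30),
}

# Euclidean distance for every unordered pair of points (assumed metric).
distances = {
    (a, b): math.dist(points[a], points[b])
    for a, b in combinations(points, 2)
}

# The first merge joins the pair with the smallest distance.
closest_pair = min(distances, key=distances.get)
print(closest_pair, round(distances[closest_pair], 2))
```

Whichever rounding is used, P3 and P6 remain the closest pair, so the first merge is the same.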
Iteration 1:
The minimum distance, 0.11, is between P3 and P6, so the first cluster is formed from these two
points: (P3, P6). The distance matrix is then updated.
Iteration 2:
Iteration 3:
Iteration 4:
Cluster1 = {P1}
---------------------------------------------------------------------------------------------------------------------
Second Example (Complete Link):
Point   X      Y
P1      0.40   0.53
P2      0.22   0.38
P3      0.35   0.32
P4      0.26   0.19
P5      0.08   0.41
P6      0.45   0.30
Solution:
Iteration 1:
The minimum distance, 0.11, is between P3 and P6, so the first cluster is formed from these two
points: (P3, P6). The distance matrix is then updated.
Iteration 2:
Iteration 3:
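          P1     P2, P5   P3, P6   P4
P1        0
P2, P5    0.34   0
P3, P6    0.23   0.39     0
P4        0.37   0.29     0.22     0
Distance (P1, (P2, P5)) = MAX (dist (P1, P2), dist (P1, P5)) = MAX (0.23, 0.34) = 0.34
Distance (P4, (P2, P5)) = MAX (dist (P4, P2), dist (P4, P5)) = MAX (0.20, 0.29) = 0.29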
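An entry such as 0.39 between (P2, P5) and (P3, P6) in the matrix above follows from the complete-link rule, which takes the largest distance over all cross-cluster pairs of points. The sketch below illustrates this with the four rounded pairwise distances used in these worked examples; `pairwise`, `complete_link`, and `single_link` are hypothetical helper names, not part of the original solution.

```python
# Pairwise distances between the members of (P2, P5) and (P3, P6),
# using the rounded values from the worked examples.
pairwise = {
    ("P2", "P3"): 0.15,
    ("P5", "P3"): 0.28,
    ("P2", "P6"): 0.25,
    ("P5", "P6"): 0.39,
}


def complete_link(cluster_a, cluster_b, dist):
    """Complete linkage: the largest distance over all cross-cluster pairs."""
    return max(dist[(a, b)] for a in cluster_a for b in cluster_b)


def single_link(cluster_a, cluster_b, dist):
    """Single linkage, for comparison: the smallest cross-cluster distance."""
    return min(dist[(a, b)] for a in cluster_a for b in cluster_b)


print(complete_link(["P2", "P5"], ["P3", "P6"], pairwise))  # 0.39
print(single_link(["P2", "P5"], ["P3", "P6"], pairwise))    # 0.15
```

The same pair of clusters is therefore much farther apart under complete link than under single link, which is why the two methods can merge clusters in a different order.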
Third Example (Average Link):
Point   X      Y
P1      0.40   0.53
P2      0.22   0.38
P3      0.35   0.32
P4      0.26   0.19
P5      0.08   0.41
P6      0.45   0.30
Solution:
Iteration/Step 1:
The minimum distance, 0.11, is between P3 and P6, so the first cluster is formed from these two
points: (P3, P6). The distance matrix is then updated.
Iteration/Step 2:
          P1      P2     P3, P6   P4     P5
P1        0
P2        0.23    0
P3, P6    0.225   0.20   0
P4        0.37    0.20   0.185    0
P5        0.34    0.14   0.335    0.29   0
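Choosing the next merge amounts to finding the smallest off-diagonal entry of the matrix above. A minimal sketch, with the matrix stored as a dictionary keyed by (row, column) labels; the variable names are illustrative:

```python
# Lower-triangular distance matrix after merging P3 and P6 (average link),
# keyed by (row, column); values copied from the table above.
matrix = {
    ("P2", "P1"): 0.23,
    ("P3,P6", "P1"): 0.225, ("P3,P6", "P2"): 0.20,
    ("P4", "P1"): 0.37, ("P4", "P2"): 0.20, ("P4", "P3,P6"): 0.185,
    ("P5", "P1"): 0.34, ("P5", "P2"): 0.14, ("P5", "P3,P6"): 0.335,
    ("P5", "P4"): 0.29,
}

# The next merge joins the pair of clusters with the smallest distance.
pair, value = min(matrix.items(), key=lambda item: item[1])
print(pair, value)  # ('P5', 'P2') 0.14
```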
The minimum distance, 0.14, is between P2 and P5, so the second cluster is formed from these two
points: (P2, P5). The distance matrix is then updated.
Iteration/Step 3:
Distance ((P3, P6), (P2, P5))
= MEAN (dist (P3, P2), dist (P3, P5), dist (P6, P2), dist (P6, P5))
= MEAN (0.15, 0.28, 0.25, 0.39)
= (0.15 + 0.28 + 0.25 + 0.39) / 4 = 1.07 / 4 ≈ 0.267
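          P1      P2, P5   P3, P6   P4
P1        0
P2, P5    0.285   0
P3, P6    0.225   0.267    0
P4        0.37    0.245    0.185    0
Distance (P1, (P2, P5)) = MEAN (dist (P1, P2), dist (P1, P5)) = MEAN (0.23, 0.34) = 0.285
Distance (P4, (P2, P5)) = MEAN (dist (P4, P2), dist (P4, P5)) = MEAN (0.20, 0.29) = 0.245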
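The distances from the merged cluster (P2, P5) to every other cluster can also be obtained without going back to individual points, by taking a size-weighted average of the previous cluster distances (the Lance-Williams update for group-average linkage). The sketch below assumes the Step 2 matrix values quoted earlier; `sizes`, `dist`, and `merge_average` are illustrative names.

```python
# Cluster sizes and pairwise cluster distances after Step 2.
sizes = {"P1": 1, "P2": 1, "P3,P6": 2, "P4": 1, "P5": 1}
dist = {
    frozenset(["P1", "P2"]): 0.23,
    frozenset(["P1", "P3,P6"]): 0.225,
    frozenset(["P2", "P3,P6"]): 0.20,
    frozenset(["P1", "P4"]): 0.37,
    frozenset(["P2", "P4"]): 0.20,
    frozenset(["P3,P6", "P4"]): 0.185,
    frozenset(["P1", "P5"]): 0.34,
    frozenset(["P2", "P5"]): 0.14,
    frozenset(["P3,P6", "P5"]): 0.335,
    frozenset(["P4", "P5"]): 0.29,
}


def merge_average(i, j, sizes, dist):
    """Average-link distances from the merged cluster i+j to every other cluster."""
    total = sizes[i] + sizes[j]
    updated = {}
    for k in sizes:
        if k in (i, j):
            continue
        # The size-weighted mean of the two old distances equals the mean
        # over all cross-cluster point pairs.
        updated[k] = (
            sizes[i] * dist[frozenset([k, i])] + sizes[j] * dist[frozenset([k, j])]
        ) / total
    return updated


new_row = merge_average("P2", "P5", sizes, dist)
for other, value in new_row.items():
    print(other, round(value, 4))
```

For (P3, P6) this gives (0.20 + 0.335) / 2 = 0.2675, the same value as the mean over the four point pairs computed in Step 3.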