Objective: For One Dimensional Data Set (7,10,20,28,35), Perform Hierarchical Clustering


Objective: For the one-dimensional data set {7, 10, 20, 28, 35}, perform hierarchical clustering (AGNES) and plot the dendrogram to visualize it.

Solution: First, visualize the data.

By observing the plot above, we can intuitively conclude that:

1. The first two points (7 and 10) are close to each other and should be in the same cluster.
2. Similarly, the last two points (28 and 35) are close to each other and should be in the same cluster.
3. The cluster assignment of the center point (20) is not obvious from the plot alone.

Let’s solve the problem using both types of agglomerative hierarchical clustering:

1. Single Linkage: In single-link hierarchical clustering, at each step we merge the two clusters whose closest members have the smallest distance.
Using single linkage, two clusters are formed:

Cluster 1 : (7,10)

Cluster 2 : (20,28,35)
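The single-link merges above can be reproduced with a small from-scratch sketch in plain Python. The function name `single_link` and the stop at k = 2 clusters are illustrative choices, not part of the original exercise:

```python
# A from-scratch sketch of single-link AGNES on the 1-D data set.

def single_link(points, k):
    """Repeatedly merge the two clusters whose closest members are
    nearest, until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: cluster distance = closest cross-cluster pair.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

print(single_link([7, 10, 20, 28, 35], 2))  # → [[7, 10], [20, 28, 35]]
```

For real workloads, a library routine such as scipy.cluster.hierarchy.linkage(data, method='single') computes the same merge sequence and also supplies the dendrogram.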


2. Complete Linkage: In complete-link hierarchical clustering, at each step we merge the two clusters whose farthest members have the smallest distance, i.e., the smallest maximum pairwise distance.

Using complete linkage, two clusters are formed:

Cluster 1 : (7,10,20)

Cluster 2 : (28,35)
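The complete-link result can be checked the same way; only the pairwise reduction changes from min to max. Again, `complete_link` and the stop at k = 2 clusters are illustrative choices:

```python
# A from-scratch sketch of complete-link AGNES on the same 1-D data set.

def complete_link(points, k):
    """Repeatedly merge the two clusters with the smallest maximum
    pairwise (farthest-member) distance, until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete link: cluster distance = farthest cross-cluster pair.
                d = max(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return clusters

print(complete_link([7, 10, 20, 28, 35], 2))  # → [[7, 10, 20], [28, 35]]
```

Note how the center point 20 lands in different clusters under the two linkages, which is exactly the ambiguity observed in the plot.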


Conclusion: Hierarchical clustering is mostly used when the application requires a hierarchy, e.g., the creation of a taxonomy. However, it is expensive in terms of its computational and storage requirements.

Second Example (Single Link):

Consider the following points:

Point   X      Y
P1      0.40   0.53
P2      0.22   0.38
P3      0.35   0.32
P4      0.26   0.19
P5      0.08   0.41
P6      0.45   0.30

Solution: Let’s visualize the data.

Iteration 1:
The minimum distance is between P3 and P6 (0.11). Therefore, we form the first cluster, containing two points: (P3, P6).

Iteration 2:

How distances are re-calculated: after each merge, the distance from the new cluster to every other cluster is the minimum over the cross-cluster pairs. For example:

Distance((P3, P6), P1) = MIN(dist(P3, P1), dist(P6, P1)) = MIN(0.22, 0.23) = 0.22

The minimum distance is now between P2 and P5 (0.14), so we form the second cluster, containing two points: (P2, P5).

Iteration 3:

Distance((P2, P5), (P3, P6))
= MIN(dist(P2, P3), dist(P2, P6), dist(P5, P3), dist(P5, P6))
= MIN(0.15, 0.25, 0.28, 0.39) = 0.15

The minimum distance is between (P2, P5) and (P3, P6) (0.15), so we form the third cluster, containing four points: (P2, P3, P5, P6).

Iteration 4:

Two singleton clusters remain, P1 and P4. Compare their single-link distances to the merged cluster:

Distance((P2, P5, P3, P6), P1)
= MIN(dist(P2, P1), dist(P5, P1), dist(P3, P1), dist(P6, P1))
= MIN(0.23, 0.34, 0.22, 0.23) = 0.22

Distance((P2, P5, P3, P6), P4)
= MIN(dist(P2, P4), dist(P5, P4), dist(P3, P4), dist(P6, P4))
= MIN(0.20, 0.29, 0.15, 0.22) = 0.15

Since 0.15 < 0.22, the minimum distance is between P4 and (P2, P5, P3, P6), so we form the fourth cluster, containing five points: (P2, P3, P4, P5, P6). Finally, we have two clusters in total:

Cluster1 = {P1}

Cluster2 = {P2, P3, P4, P5, P6}
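The whole single-link run on these six 2-D points can be verified with a short from-scratch sketch. The function name `agglomerate`, the `link` parameter, and the stop at k = 2 clusters are my own choices for illustration; distances are exact Euclidean values rather than the rounded table entries:

```python
# From-scratch single-link AGNES on the six 2-D points.
from math import dist  # Euclidean distance between two points

points = {"P1": (0.40, 0.53), "P2": (0.22, 0.38), "P3": (0.35, 0.32),
          "P4": (0.26, 0.19), "P5": (0.08, 0.41), "P6": (0.45, 0.30)}

def agglomerate(pts, link, k):
    """Merge clusters until k remain; `link` (min here, for single link)
    reduces the cross-cluster pairwise distances to a cluster distance."""
    clusters = [[name] for name in pts]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link(dist(pts[a], pts[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

print(agglomerate(points, min, 2))  # → [['P1'], ['P2', 'P3', 'P4', 'P5', 'P6']]
```

The merge order matches the iterations above: (P3, P6), then (P2, P5), then the two pairs, then P4, leaving P1 as a singleton.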

---------------------------------------------------------------------------------------------------------------------
Second Example (Complete Link):

Consider the following points:

Point   X      Y
P1      0.40   0.53
P2      0.22   0.38
P3      0.35   0.32
P4      0.26   0.19
P5      0.08   0.41
P6      0.45   0.30

Solution:
Iteration 1:

The minimum distance is between P3 and P6 (0.11). Therefore, we form the first cluster, containing two points: (P3, P6).

Iteration 2:

How distances are re-calculated: after each merge, the distance from the new cluster to every other cluster is the maximum over the cross-cluster pairs. For example:

Distance((P3, P6), P1)
= MAX(dist(P3, P1), dist(P6, P1))
= MAX(0.22, 0.23) = 0.23

The minimum distance is between P2 and P5 (0.14). Therefore, we form the second cluster, containing two points: (P2, P5).

Iteration 3:

To compute the distance between the two merged clusters:

Distance((P2, P5), (P3, P6))
= MAX(dist(P2, P3), dist(P2, P6), dist(P5, P3), dist(P5, P6))
= MAX(0.15, 0.25, 0.28, 0.39) = 0.39

The updated distance matrix (missing entries filled in from the pairwise distances: MAX(0.23, 0.34) = 0.34 and MAX(0.20, 0.29) = 0.29) is:

        P1     P2,P5   P3,P6   P4
P1      0
P2,P5   0.34   0
P3,P6   0.23   0.39    0
P4      0.37   0.29    0.22    0

The smallest entry is now 0.22, between (P3, P6) and P4, so the next merge forms (P3, P4, P6). One further merge joins P1 with (P2, P5), leaving two clusters: (P1, P2, P5) and (P3, P4, P6).
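Since the worked complete-link example stops at the distance matrix, here is a short from-scratch sketch that carries the merges through to two clusters. The function name `agglomerate` and the stop at k = 2 are my own choices; exact Euclidean distances are used, so individual values may differ slightly from the rounded table:

```python
# From-scratch complete-link AGNES on the six 2-D points.
from math import dist  # Euclidean distance between two points

points = {"P1": (0.40, 0.53), "P2": (0.22, 0.38), "P3": (0.35, 0.32),
          "P4": (0.26, 0.19), "P5": (0.08, 0.41), "P6": (0.45, 0.30)}

def agglomerate(pts, link, k):
    """Merge clusters until k remain; `link` (max here, for complete link)
    reduces the cross-cluster pairwise distances to a cluster distance."""
    clusters = [[name] for name in pts]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link(dist(pts[a], pts[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

print(agglomerate(points, max, 2))  # → [['P1', 'P2', 'P5'], ['P3', 'P4', 'P6']]
```

Under complete link, P4 joins (P3, P6) before P1 joins (P2, P5), giving a different two-cluster partition than single link did.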
Third Example (Average Link):

Consider the following points:

Point   X      Y
P1      0.40   0.53
P2      0.22   0.38
P3      0.35   0.32
P4      0.26   0.19
P5      0.08   0.41
P6      0.45   0.30

Solution:
Iteration/Step 1:

The minimum distance is between P3 and P6 (0.11). Therefore, we form the first cluster, containing two points: (P3, P6).

Iteration/Step 2:

How distances are computed: after each merge, the distance from the new cluster to every other cluster is the mean over the cross-cluster pairs (average link):

Distance((P3, P6), P1) = MEAN(dist(P3, P1), dist(P6, P1))
                       = (0.22 + 0.23) / 2 = 0.225

Distance((P3, P6), P2) = MEAN(dist(P3, P2), dist(P6, P2))
                       = (0.15 + 0.25) / 2 = 0.20

Distance((P3, P6), P4) = MEAN(dist(P3, P4), dist(P6, P4))
                       = (0.15 + 0.22) / 2 = 0.185

Distance((P3, P6), P5) = MEAN(dist(P3, P5), dist(P6, P5))
                       = (0.28 + 0.39) / 2 = 0.335

        P1      P2     P3,P6   P4     P5
P1      0
P2      0.23    0
P3,P6   0.225   0.20   0
P4      0.37    0.20   0.185   0
P5      0.34    0.14   0.335   0.29   0

The minimum distance is between P2 and P5 (0.14). Therefore, we form the second cluster, containing two points: (P2, P5).

Iteration/Step 3:

Distance((P3, P6), (P2, P5))
= MEAN(dist(P3, P2), dist(P3, P5), dist(P6, P2), dist(P6, P5))
= (0.15 + 0.28 + 0.25 + 0.39) / 4 ≈ 0.267

The updated distance matrix (missing entries filled in from the pairwise distances: MEAN(0.23, 0.34) = 0.285 and MEAN(0.20, 0.29) = 0.245) is:

        P1      P2,P5   P3,P6   P4
P1      0
P2,P5   0.285   0
P3,P6   0.225   0.267   0
P4      0.37    0.245   0.185   0

The smallest entry is now 0.185, between (P3, P6) and P4, so the next merge forms (P3, P4, P6).
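The average-link run can also be carried through to two clusters with a short from-scratch sketch. As before, `agglomerate` and the stop at k = 2 are my own choices; using exact Euclidean distances (rather than the rounded table values), the unweighted mean over all cross-cluster pairs drives the merges:

```python
# From-scratch average-link AGNES on the six 2-D points.
from math import dist        # Euclidean distance between two points
from statistics import mean  # average over all cross-cluster pairs

points = {"P1": (0.40, 0.53), "P2": (0.22, 0.38), "P3": (0.35, 0.32),
          "P4": (0.26, 0.19), "P5": (0.08, 0.41), "P6": (0.45, 0.30)}

def agglomerate(pts, link, k):
    """Merge clusters until k remain; `link` (mean here, for average link)
    reduces the cross-cluster pairwise distances to a cluster distance."""
    clusters = [[name] for name in pts]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = link(dist(pts[a], pts[b])
                         for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

print(agglomerate(points, mean, 2))  # → [['P1'], ['P2', 'P3', 'P4', 'P5', 'P6']]
```

For average link, (P3, P6) absorbs P4 first, then merges with (P2, P5), so the final two-cluster partition agrees with the single-link result: P1 stays a singleton.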
