Lecture 8 Clustering
Single Linkage
➢ The inputs to a single linkage algorithm can be distances or similarities
between pairs of objects.
➢ Groups are formed from the individual entities by merging nearest
neighbors, where the term nearest neighbor connotes the smallest distance
or largest similarity.
➢ Initially, we must find the smallest distance in 𝐷 = {𝑑𝑖𝑘 } and merge the
corresponding objects, say, U and V, to get the cluster (UV). For Step 3 of the
general algorithm of (2), the distances between (UV) and any other cluster
W are computed by
𝑑(𝑈𝑉)𝑊 = 𝑚𝑖𝑛{𝑑𝑈𝑊, 𝑑𝑉𝑊 }
➢ Here the quantities 𝑑𝑈𝑊 and 𝑑𝑉𝑊 are the distances between the nearest
neighbors of clusters U and W and clusters V and W, respectively.
➢ At this point we have two distinct clusters (135) and (24). Their nearest
neighbor distance is
𝑑(135)(24) = 𝑚𝑖𝑛{𝑑(135)2 , 𝑑(135)4 } = 𝑚𝑖𝑛{7, 6} = 6
➢ The final distance matrix becomes
          (135)   (24)
(135)       0
(24)        6       0
➢ Consequently, clusters (135) and (24) are merged to form a single cluster of
all five objects (12345), when the nearest neighbor distance reaches 6.
Figure 12.3 Single linkage dendrogram for distances between five objects.
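To summarize the rule above in code: the single linkage distance between two clusters is the smallest object-to-object distance between them. A minimal Python sketch of this rule (the function name and the sample distances are ours, purely for illustration):

# Single linkage: the distance between two clusters is the smallest
# distance between any member of one and any member of the other.
def single_linkage_distance(cluster_a, cluster_b, d):
    """d[(i, k)] is the distance between individual objects i and k."""
    return min(d[(i, k)] for i in cluster_a for k in cluster_b)

# Hypothetical object-level distances, for illustration only:
d = {(1, 2): 7, (1, 4): 9, (3, 2): 10, (3, 4): 6, (5, 2): 8, (5, 4): 11}
print(single_linkage_distance((1, 3, 5), (2, 4), d))   # -> 6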
Complete Linkage
➢ Complete linkage clustering proceeds in much the same manner as single
linkage clustering, with one important exception: At each stage, the
distance (similarity) between clusters is determined by the distance
(similarity) between the two elements, one from each cluster, that are most
distant.
➢ Thus, complete linkage ensures that all items in a cluster are within some
maximum distance (or minimum similarity) of each other.
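The corresponding merge-step update is d_(UV)W = max{d_UW, d_VW}: the distance from the merged cluster (UV) to another cluster W is the larger of the two previous distances. A minimal sketch mirroring the single linkage one above (again, the name and data layout are ours):

# Complete linkage: the distance between two clusters is the LARGEST
# distance between any member of one and any member of the other, so
# every pair of items in a merged cluster lies within that distance.
def complete_linkage_distance(cluster_a, cluster_b, d):
    """d[(i, k)] is the distance between individual objects i and k."""
    return max(d[(i, k)] for i in cluster_a for k in cluster_b)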
Average Linkage
➢ Average linkage treats the distance between two clusters as the average
distance between all pairs of items where one member of a pair belongs to
each cluster.
➢ Again, the input to the average linkage algorithm may be distances or
similarities, and the method can be used to group objects or variables.
➢ The average linkage algorithm proceeds in the manner of the general
algorithm.
➢ We begin by searching the distance matrix
𝐷 = {𝑑𝑖𝑘 }
to find the nearest (most similar) objects—for example, U and V.
➢ These objects are merged to form the cluster (UV).
➢ For Step 3 of the general agglomerative algorithm, the distances between
(UV) and the other cluster W are determined by
d_{(UV)W} = \frac{\sum_{i}\sum_{k} d_{ik}}{N_{(UV)}\, N_{W}}
where 𝑑𝑖𝑘 is the distance between object i in the cluster (UV) and object k in
cluster W, and 𝑁(𝑈𝑉) and 𝑁𝑊 are the numbers of items in clusters (UV) and W,
respectively.
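A minimal sketch of this average-over-all-pairs rule, in the same style as the earlier snippets (names are ours):

# Average linkage: the distance between clusters (UV) and W is the sum of
# all object-to-object distances d_ik, with i in (UV) and k in W, divided
# by N_(UV) * N_W, i.e. the average over every such pair.
def average_linkage_distance(cluster_a, cluster_b, d):
    """d[(i, k)] is the distance between individual objects i and k."""
    total = sum(d[(i, k)] for i in cluster_a for k in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))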
Problem: For the given set of data points, find the clusters using the Average Linkage
technique. Use the Euclidean distance and draw the dendrogram.
Point     X     Y
𝑃1       .40   .53
𝑃2       .22   .38
𝑃3       .35   .32
𝑃4       .26   .19
𝑃5       .08   .41
𝑃6       .45   .30
Solution:
The formula of the Euclidean distance between 𝑃1 (𝑥1 , 𝑦1 ) and 𝑃2 (𝑥2 , 𝑦2 ) is
√((𝑥1 − 𝑥2 )² + (𝑦1 − 𝑦2 )²)
The Euclidean distance between 𝑃1 and 𝑃2 is
√((.40 − .22)² + (.53 − .38)²) ≈ .23
Computing the distance for every pair of points in the same way gives the initial
distance matrix. Its smallest entry, about .10, is the distance between 𝑃3 and 𝑃6 ,
so 𝑃3 and 𝑃6 are merged to form the first cluster (𝑃3 𝑃6 ).
[Dendrogram: 𝑃3 and 𝑃6 joined.]
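As a quick check, the following short Python sketch (ours, not part of the lecture) recomputes all pairwise Euclidean distances for the six points; printed values are rounded to two decimals, the rounding used throughout the solution:

from math import dist   # Euclidean distance between two points (Python 3.8+)

points = {"P1": (0.40, 0.53), "P2": (0.22, 0.38), "P3": (0.35, 0.32),
          "P4": (0.26, 0.19), "P5": (0.08, 0.41), "P6": (0.45, 0.30)}

labels = list(points)
for i, a in enumerate(labels):
    for b in labels[i + 1:]:
        print(f"d({a}, {b}) = {dist(points[a], points[b]):.2f}")
# The smallest value is d(P3, P6) = 0.10, so P3 and P6 are merged first.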
Now we need to create a new distance matrix.
Calculate distance between (𝑃3 , 𝑃6 ) and 𝑃1 using Average Linkage
= AVG[dist(𝑃3 , 𝑃1 ), dist(𝑃6 , 𝑃1 )]
= (.22 + .24)/2 = .23
Calculate distance between (𝑃3 , 𝑃6 ) and 𝑃2 using Average Linkage
= AVG[dist(𝑃3 , 𝑃2 ), dist(𝑃6 , 𝑃2 )]
= (.14 + .24)/2 = .19
Calculate the distances between (𝑃3 , 𝑃6 ) and 𝑃4 and between (𝑃3 , 𝑃6 ) and 𝑃5 using
Average Linkage in the same way.
Now let us create the new distance matrix as
           𝑃1     𝑃2     𝑃4     𝑃5    (𝑃3 𝑃6)
𝑃1          0
𝑃2         .23     0
𝑃4         .37    .19     0
𝑃5         .34    .14    .28     0
(𝑃3 𝑃6)    .23    .19    .19    .34      0
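The row for (𝑃3 𝑃6 ) can be checked with a few lines of Python (a sketch of ours; it uses the unrounded object-level distances, but the rounded results agree with the values above):

from math import dist

pts = {"P1": (0.40, 0.53), "P2": (0.22, 0.38), "P3": (0.35, 0.32),
       "P4": (0.26, 0.19), "P5": (0.08, 0.41), "P6": (0.45, 0.30)}

# Average linkage between the pair (P3, P6) and each remaining point:
for other in ("P1", "P2", "P4", "P5"):
    avg = (dist(pts["P3"], pts[other]) + dist(pts["P6"], pts[other])) / 2
    print(f"d((P3 P6), {other}) = {avg:.2f}")
# prints 0.23, 0.19, 0.19, 0.34 -- matching the (P3 P6) row above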
The smallest distance is now .14, the distance between 𝑃2 and 𝑃5 . Merge 𝑃2 and 𝑃5
to form the second cluster (𝑃2 𝑃5 ).
Let us update the dendrogram to show the new cluster (𝑃2 𝑃5 ).
[Dendrogram: clusters (𝑃3 𝑃6 ) and (𝑃2 𝑃5 ).]
Let us calculate the distances between (𝑃2 𝑃5 ) and the other points and clusters.
Distance between (𝑃2 𝑃5 ) and 𝑃1 using Average Linkage
= AVG[dist(𝑃2 , 𝑃1 ), dist(𝑃5 , 𝑃1 )]
= (.23 + .34)/2 ≈ .29
The distances from (𝑃2 𝑃5 ) to 𝑃4 and to (𝑃3 𝑃6 ) are obtained in the same way,
giving .24 and .27, respectively.
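The same kind of check for the new distances from (𝑃2 𝑃5 ) to 𝑃1 and 𝑃4 (again a small sketch of ours, using unrounded object distances):

from math import dist

pts = {"P1": (0.40, 0.53), "P2": (0.22, 0.38), "P4": (0.26, 0.19),
       "P5": (0.08, 0.41)}

for other in ("P1", "P4"):
    avg = (dist(pts["P2"], pts[other]) + dist(pts["P5"], pts[other])) / 2
    print(f"d((P2 P5), {other}) = {avg:.2f}")
# prints 0.29 and 0.24, in line with the values used in the next steps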
Find the smallest distance. It is .19, the distance between 𝑃4 and (𝑃3 𝑃6 ). So 𝑃3 ,
𝑃6 and 𝑃4 form the next cluster (𝑃3 𝑃6 𝑃4 ).
Update the dendrogram.
[Dendrogram: 𝑃4 joined to (𝑃3 𝑃6 ); (𝑃2 𝑃5 ) unchanged.]
Recalculate the distance matrix. Replace 𝑃3 , 𝑃6 and 𝑃4 with a single entry for (𝑃3 𝑃6 𝑃4 ).
Calculate the distances from the two remaining entries, 𝑃1 and (𝑃2 𝑃5 ), to the new cluster.
Distance between (𝑃3 𝑃6 𝑃4 ) and 𝑃1 using Average Linkage
= AVG[dist((𝑃3 𝑃6 ), 𝑃1 ), dist(𝑃4 , 𝑃1 )]
= (.23 + .37)/2 = .30
Distance between (𝑃3 𝑃6 𝑃4 ) and (𝑃2 , 𝑃5 ) using Average Linkage
= AVG[dist((𝑃3 𝑃6 ), (𝑃2 𝑃5 )), dist(𝑃4 , (𝑃2 𝑃5 ))]
= (.27 + .24)/2 ≈ .26
The smallest distance is .26. It is the distance between the clusters 𝑃2 𝑃5 and 𝑃3 𝑃6 𝑃4 .
Merge them to form the new cluster (𝑃3 𝑃6 𝑃4 𝑃2 𝑃5 ).
Update the dendrogram as
[Dendrogram: (𝑃3 𝑃6 𝑃4 ) and (𝑃2 𝑃5 ) joined.]
Recalculate the distance matrix.
Replace the entries for the clusters (𝑃2 𝑃5 ) and (𝑃3 𝑃6 𝑃4 ) with a single entry for
the new cluster. Only the distance from the new cluster to 𝑃1 remains to be calculated.
Distance between (𝑃2 𝑃5 𝑃3 𝑃6 𝑃4 ) and 𝑃1 using Average Linkage
= AVG[dist((𝑃2 𝑃5 ), 𝑃1 ), dist((𝑃3 𝑃6 𝑃4 ), 𝑃1 )]
= (.29 + .30)/2 ≈ .30
Join the two clusters to form the final cluster containing all six points. Update the
dendrogram. This is the result of hierarchical average linkage clustering.
[Dendrogram: final hierarchy with leaves 𝑃3 , 𝑃6 , 𝑃4 , 𝑃2 , 𝑃5 , 𝑃1 .]
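For reference, here is a short SciPy sketch (ours, not part of the lecture) that reproduces the hierarchy. SciPy's "average" method (UPGMA) averages over all object pairs between two clusters, whereas the steps above average the two previous cluster-to-cluster distances, so the height of the final merge comes out near .28 rather than .30; the merge order and the resulting clusters are identical.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

labels = ["P1", "P2", "P3", "P4", "P5", "P6"]
X = np.array([[0.40, 0.53], [0.22, 0.38], [0.35, 0.32],
              [0.26, 0.19], [0.08, 0.41], [0.45, 0.30]])

# Average (UPGMA) linkage on Euclidean distances
Z = linkage(X, method="average", metric="euclidean")
print(np.round(Z, 2))          # each row: the two clusters merged and the merge height

dendrogram(Z, labels=labels)
plt.title("Average linkage dendrogram for P1-P6")
plt.show()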