Topic 6e - Hierarchical Clustering (MIN)

The document discusses hierarchical clustering using the single link or minimum distance approach. It explains that with this approach, the similarity between two clusters is based on the closest/most similar pair of points between the clusters. The algorithm works by successively merging the closest clusters into larger clusters. This is done by finding the minimum distance between points in different clusters and linking those clusters. A dendrogram is used to illustrate the cluster mergers.

TOPIC 6 – PART E

CLUSTERING: HIERARCHICAL APPROACH (MIN)

OBJECTIVES

• To introduce the basic concepts of clustering
• To discuss how to compute the dissimilarity between objects of different attribute types
• To examine several clustering techniques
  • Partitioning approach
  • Hierarchical approach ✅

https://discuss.cryosparc.com/t/using-particles-from-cluster-mode-in-3d-va-for-refinement-fails/3665/2
CLUSTER SIMILARITY: MIN OR SINGLE LINK

• Similarity of two clusters is based on the two most similar (closest) points in the different clusters
• Determined by one pair of points, i.e., by one link in the dissimilarity graph.
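
The definition above can be sketched in a few lines of Python. This is only an illustration: the names `D`, `dist` and `single_link` are hypothetical, and the distances are the ones used in this deck's worked example (with dist(p3, p4) = 0.16, as in the step-by-step matrices).

```python
# Pairwise distances between points p1..p6, keyed by (smaller index, larger index).
# Values follow the worked example in this deck (assumption: dist(3, 4) = 0.16).
D = {
    (1, 2): 0.24,
    (1, 3): 0.22, (2, 3): 0.15,
    (1, 4): 0.37, (2, 4): 0.20, (3, 4): 0.16,
    (1, 5): 0.34, (2, 5): 0.14, (3, 5): 0.28, (4, 5): 0.29,
    (1, 6): 0.23, (2, 6): 0.25, (3, 6): 0.11, (4, 6): 0.22, (5, 6): 0.39,
}


def dist(i, j):
    """Distance between points i and j, looked up in the symmetric table D."""
    return 0.0 if i == j else D[(min(i, j), max(i, j))]


def single_link(A, B):
    """MIN (single link): distance of the closest pair, one point from each cluster."""
    return min(dist(a, b) for a in A for b in B)


print(single_link({3, 6}, {2, 5}))  # 0.15, decided by the single pair (p3, p2)
```

Only one pair of points determines the result, which is why the approach is called single link.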

CLUSTER SIMILARITY: MIN OR SINGLE LINK
Point   x coordinate   y coordinate
p1      0.40           0.53
p2      0.22           0.38
p3      0.35           0.32
p4      0.26           0.19
p5      0.08           0.41
p6      0.45           0.30
(Numeric attributes)

Dissimilarity matrix (Euclidean distance):

      p1     p2     p3     p4     p5     p6
p1    0.00
p2    0.24   0.00
p3    0.22   0.15   0.00
p4    0.37   0.20   0.16   0.00
p5    0.34   0.14   0.28   0.29   0.00
p6    0.23   0.25   0.11   0.22   0.39   0.00

$$d(p_1, p_2) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$

$$d(p_1, p_2) = \sqrt{(0.40 - 0.22)^2 + (0.53 - 0.38)^2}$$
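
The Euclidean distance formula can be checked with a short Python sketch. The names `points` and `euclidean` are hypothetical. With the two-decimal coordinates shown in the table the result is about 0.234, slightly below the 0.24 listed in the matrix; the matrix entries were presumably computed from less-rounded coordinates, which is an assumption and not stated on the slide.

```python
import math

# Coordinates of p1 and p2 as printed in the points table (two decimals).
points = {"p1": (0.40, 0.53), "p2": (0.22, 0.38)}


def euclidean(a, b):
    """Euclidean distance between two 2-D points given as (x, y) tuples."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


d12 = euclidean(points["p1"], points["p2"])
print(round(d12, 3))  # about 0.234 with these rounded coordinates
```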
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 1 – find the 2 shortest distances in the dissimilarity matrix; these pairs become the first 2 clusters:
  cluster (p3, p6) at 0.11 and cluster (p2, p5) at 0.14

      p1     p2     p3     p4     p5     p6
p1    0.00
p2    0.24   0.00
p3    0.22   0.15   0.00
p4    0.37   0.20   0.16   0.00
p5    0.34   0.14   0.28   0.29   0.00
p6    0.23   0.25   0.11   0.22   0.39   0.00

Draw the dendrogram
[Figure: dendrogram; leaf order 3, 6, 2, 5, 4, 1; vertical axis from 0 to 0.2]
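
Step 1 can be sketched in Python as a search for the two smallest entries of the dissimilarity matrix. The names `D` and `closest` are hypothetical. Taking two pairs at once works here because (p3, p6) and (p2, p5) share no point; in general the matrix would be updated after each single merge.

```python
import heapq

# Lower triangle of the dissimilarity matrix, keyed by (smaller index, larger index).
# Values follow the worked example (assumption: dist(3, 4) = 0.16).
D = {
    (1, 2): 0.24,
    (1, 3): 0.22, (2, 3): 0.15,
    (1, 4): 0.37, (2, 4): 0.20, (3, 4): 0.16,
    (1, 5): 0.34, (2, 5): 0.14, (3, 5): 0.28, (4, 5): 0.29,
    (1, 6): 0.23, (2, 6): 0.25, (3, 6): 0.11, (4, 6): 0.22, (5, 6): 0.39,
}

# The two smallest off-diagonal entries, in ascending order of distance.
closest = heapq.nsmallest(2, D.items(), key=lambda item: item[1])

for (i, j), d in closest:
    print(f"cluster (p{i}, p{j}) at distance {d:.2f}")
```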
CLUSTER SIMILARITY: MIN OR SINGLE LINK

• Step 2 – form the next cluster by finding the minimum distance between the existing clusters and the remaining points; first update the matrix with the two newly formed clusters

[Figure: points 1 to 6 with nested clusters]

Update dissimilarity matrix

          p1     P(2,5)   P(3,6)   p4
p1        0.00
P(2,5)    ???    0.00
P(3,6)    ???    ???      0.00
p4        0.37   ???      ???      0.00
CLUSTER SIMILARITY: MIN OR SINGLE LINK
Distance: single link

dist({3,6},{2,5}) = min(dist(3,2), dist(6,2), dist(3,5), dist(6,5))
                  = min(0.15, 0.25, 0.28, 0.39)
                  = 0.15
dist({3,6},{1})   = min(dist(3,1), dist(6,1))
                  = min(0.22, 0.23)
                  = 0.22
dist({3,6},{4})   = min(dist(3,4), dist(6,4))
                  = min(0.16, 0.22)
                  = 0.16
dist({2,5},{1})   = min(dist(2,1), dist(5,1))
                  = min(0.24, 0.34)
                  = 0.24
dist({2,5},{4})   = min(dist(2,4), dist(5,4))
                  = min(0.20, 0.29)
                  = 0.20

Updated matrix

          p1     P(2,5)   P(3,6)   p4
p1        0.00
P(2,5)    0.24   0.00
P(3,6)    0.22   0.15     0.00
p4        0.37   0.20     0.16     0.00

The closest clusters are cluster {3,6} and cluster {2,5} with 0.15, so the clusters are merged.
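
The Step 2 calculations can be reproduced with a small Python sketch that rebuilds the cluster-level matrix from the point-level distances. The names `D`, `clusters`, `single_link` and `updated` are hypothetical.

```python
from itertools import combinations

# Point-level distances from the worked example (assumption: dist(3, 4) = 0.16).
D = {
    (1, 2): 0.24,
    (1, 3): 0.22, (2, 3): 0.15,
    (1, 4): 0.37, (2, 4): 0.20, (3, 4): 0.16,
    (1, 5): 0.34, (2, 5): 0.14, (3, 5): 0.28, (4, 5): 0.29,
    (1, 6): 0.23, (2, 6): 0.25, (3, 6): 0.11, (4, 6): 0.22, (5, 6): 0.39,
}

# Clusters after Step 1: two singletons and the two merged pairs.
clusters = [(1,), (2, 5), (3, 6), (4,)]


def single_link(A, B):
    """Smallest point-to-point distance between clusters A and B."""
    return min(D[(min(a, b), max(a, b))] for a in A for b in B)


# Cluster-level dissimilarity matrix, one entry per pair of clusters.
updated = {(A, B): single_link(A, B) for A, B in combinations(clusters, 2)}
for (A, B), d in updated.items():
    print(A, B, d)

# The pair with the smallest entry is merged next.
closest = min(updated, key=updated.get)
print("merge:", closest, updated[closest])  # merge: ((2, 5), (3, 6)) 0.15
```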
CLUSTER SIMILARITY: MIN OR SINGLE LINK

Draw the dendrogram
[Figure: dendrogram; leaf order 3, 6, 2, 5, 4, 1; vertical axis from 0 to 0.2]
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 3 – form the next cluster by finding the point with the minimum distance to the other points in the existing clusters

Distance: single link

dist({{3,6},{2,5}}, 4) = min(dist({3,6}, 4), dist({2,5}, 4))
                       = min(0.16, 0.20)
                       = 0.16
dist({{3,6},{2,5}}, 1) = min(dist({3,6}, 1), dist({2,5}, 1))
                       = min(0.22, 0.24)
                       = 0.22

Updated matrix

                p1     P(2,5),(3,6)   p4
p1              0.00
P(2,5),(3,6)    0.22   0.00
p4              0.37   0.16           0.00

The smallest entry is 0.16, so p4 is merged with cluster {{3,6},{2,5}}.

Draw the dendrogram
[Figure: dendrogram; leaf order 3, 6, 2, 5, 4, 1; vertical axis from 0 to 0.2]
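
Step 3 does not go back to the individual points: the distance from a merged cluster to any other cluster is the minimum of the two old entries. A minimal Python sketch of that update, assuming the cluster-level matrix from Step 2 (the names `M`, `merge` and `M3` are hypothetical):

```python
# Cluster-level matrix after Step 1, keyed by unordered pairs of cluster labels.
M = {
    frozenset({"p1", "P(2,5)"}): 0.24,
    frozenset({"p1", "P(3,6)"}): 0.22,
    frozenset({"P(2,5)", "P(3,6)"}): 0.15,
    frozenset({"p1", "p4"}): 0.37,
    frozenset({"P(2,5)", "p4"}): 0.20,
    frozenset({"P(3,6)", "p4"}): 0.16,
}


def merge(M, a, b, new):
    """Return a new matrix in which clusters a and b are replaced by `new`."""
    others = {label for pair in M for label in pair} - {a, b}
    # Keep the entries that involve neither a nor b.
    out = {pair: d for pair, d in M.items() if a not in pair and b not in pair}
    # Single-link update: distance to the merged cluster is the smaller old entry.
    for c in others:
        out[frozenset({new, c})] = min(M[frozenset({a, c})], M[frozenset({b, c})])
    return out


M3 = merge(M, "P(2,5)", "P(3,6)", "P(2,5),(3,6)")
print(M3[frozenset({"P(2,5),(3,6)", "p4"})])  # 0.16
print(M3[frozenset({"P(2,5),(3,6)", "p1"})])  # 0.22
```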
CLUSTER SIMILARITY: MIN OR SINGLE LINK
• Step 4 – form the next cluster by finding the distance from the remaining point to the existing cluster

Distance: single link

dist({{{3,6},{2,5}}, 4}, 1) = min(dist({{3,6},{2,5}}, 1), dist(4, 1))
                            = min(0.22, 0.37)
                            = 0.22

Updated matrix

                    p1     P((2,5),(3,6)),4
p1                  0.00
P((2,5),(3,6)),4    0.22   0.00

Draw the dendrogram
[Figure: dendrogram; leaf order 3, 6, 2, 5, 4, 1; vertical axis from 0 to 0.2]

Final matrix

                        P(((2,5),(3,6)),4),1
P(((2,5),(3,6)),4),1    0

All points are in one cluster: terminate.
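
Steps 1 to 4 can be put together as one loop. The sketch below is a plain agglomerative single-link procedure, not necessarily the exact procedure the authors had in mind: it merges one pair of clusters per iteration (the slides form two clusters at once in Step 1), which gives the same merges on this data. The names `D`, `dist` and `single_link_clustering` are hypothetical.

```python
# Point-level distances from the worked example (assumption: dist(3, 4) = 0.16).
D = {
    (1, 2): 0.24,
    (1, 3): 0.22, (2, 3): 0.15,
    (1, 4): 0.37, (2, 4): 0.20, (3, 4): 0.16,
    (1, 5): 0.34, (2, 5): 0.14, (3, 5): 0.28, (4, 5): 0.29,
    (1, 6): 0.23, (2, 6): 0.25, (3, 6): 0.11, (4, 6): 0.22, (5, 6): 0.39,
}


def dist(a, b):
    """Distance between points a and b from the symmetric table D."""
    return D[(min(a, b), max(a, b))]


def single_link_clustering(points):
    """Merge the two closest clusters until one remains; return the merge list."""
    clusters = [frozenset([p]) for p in points]
    merges = []
    while len(clusters) > 1:
        best = None  # (distance, index of first cluster, index of second cluster)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


merges = single_link_clustering([1, 2, 3, 4, 5, 6])
for left, right, height in merges:
    print(left, "+", right, "at", height)
```

The merge heights (0.11, 0.14, 0.15, 0.16, 0.22) are the heights at which the branches join in the dendrogram.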
CLUSTER SIMILARITY: MIN OR SINGLE LINK

[Figure: two panels. Left, "Nested Clusters": points 1 to 6 with nested cluster outlines. Right, "Dendrogram": leaf order 3, 6, 2, 5, 4, 1; vertical axis from 0 to 0.2]
THANK YOU
Shuzlina Abdul Rahman | Sofianita Mutalib | Siti Nur Kamaliah Kamarudin
