Using Hierarchical Clustering as an Unsupervised Algorithm for ML
(Lecture 10)
Machine Learning for Real World Applications
Date 16-Aug-2021
Copyright © 2021 Tata Consultancy Services Limited
Hierarchical Clustering Algorithms
• Hierarchical clustering algorithms can overcome some of the disadvantages of
partitional clustering methods.
Partitional Clustering        Hierarchical Clustering
Requires value of K           Flexible
Non-deterministic             Deterministic
Hierarchical Clustering Algorithms
Hierarchical clustering approaches: Agglomerative and Divisive
• Agglomerative approaches start with singleton clusters at the bottom level and
continue merging two clusters at a time
— builds a bottom-up hierarchy
• Divisive approaches start with all the data in a single cluster and repeatedly
split it into smaller groups
— builds a top-down hierarchy
Hierarchical Clustering Algorithms
• A cluster hierarchy is also called a dendrogram.
[Figure: a dendrogram, with Level 0 (the apex level) at the top and the singletons at the bottom]
Hierarchical Clustering Algorithms
• The hierarchy can be cut at any level to obtain a desired number of clusters
[Figure: the same dendrogram cut at different levels, yielding k = 1, k = 4, or k = 16 clusters, down to the singletons]
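As a concrete sketch of this idea (assuming Python with SciPy; the toy data and variable names below are illustrative, not from the lecture), the hierarchy can be built once with `linkage` and then cut at several levels with `fcluster`:

```python
# Sketch: build the hierarchy once, then cut it at different levels.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 2))              # 16 toy points in 2-D (illustrative)

Z = linkage(X, method="single")           # full dendrogram as a linkage matrix

for k in (1, 4, 16):                      # cut the same tree at different levels
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, "clusters ->", labels)
```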
Agglomerative Clustering
Basic steps for agglomerative clustering
• A dissimilarity matrix is constructed using a particular proximity measure.
— All data points are represented at the bottom of the dendrogram
• Repeat until the final maximal cluster is obtained:
1. The closest sets of clusters are merged at each level
2. The dissimilarity matrix is updated
Hierarchical Clustering Algorithms
• Dissimilarity Matrix
      P1    P2    P3    P4
P1   0.00  0.20  0.15  0.30
P2   0.20  0.00  0.40  0.50
P3   0.15  0.40  0.00  0.10
P4   0.30  0.50  0.10  0.00
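A minimal sketch of how such a dissimilarity matrix can be computed with SciPy (the four 2-D points are made up for illustration; any metric supported by `pdist` could be used):

```python
# Sketch: build a dissimilarity (distance) matrix for a set of points.
import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[0.0, 0.0],    # P1  (illustrative coordinates)
                   [0.2, 0.0],    # P2
                   [0.1, 0.1],    # P3
                   [0.3, 0.2]])   # P4

condensed = pdist(points, metric="euclidean")   # pairwise distances, condensed form
D = squareform(condensed)                       # full symmetric matrix, zero diagonal
print(np.round(D, 2))
```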
Agglomerative Clustering
Algorithm for Agglomerative Hierarchical Clustering
• Compute the dissimilarity matrix between all the data points.
• Repeat until the final maximal cluster is obtained:
1. Merge the two closest clusters Ca and Cb as Ca∪b = Ca ∪ Cb
Set the new cluster’s cardinality as Na∪b = Na + Nb
2. Insert a new row and column containing the distances between the new
cluster Ca∪b and the remaining clusters
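A from-scratch sketch of this loop for the single-link case (written for clarity, not efficiency; rather than inserting new rows and columns, it simply recomputes the cluster-to-cluster distance at each step, which is equivalent). The function name and the 0-based indices for P1..P4 are my own choices:

```python
# Sketch: naive single-link agglomerative clustering on a dissimilarity matrix.
import numpy as np

def agglomerate_single_link(D):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [{i} for i in range(len(D))]     # start from singletons
    merges = []
    while len(clusters) > 1:
        # find the closest pair of clusters (single link: min over cross pairs)
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# dissimilarity matrix from the lecture example (rows/cols = P1..P4, 0-indexed)
D = np.array([[0.00, 0.20, 0.15, 0.30],
              [0.20, 0.00, 0.40, 0.50],
              [0.15, 0.40, 0.00, 0.10],
              [0.30, 0.50, 0.10, 0.00]])
for left, right, dist in agglomerate_single_link(D):
    print(left, "+", right, "merged at", dist)
```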
Proximity Measures in Agglomerative Clustering
• Single Link Agglomerative Clustering
• Complete Link Agglomerative Clustering
• Group Averaged Agglomerative Clustering
• Centroid Agglomerative Clustering
• Ward’s Agglomerative Clustering
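For reference, these five proximity measures map onto `method` options of SciPy's `linkage` routine ("single", "complete", "average", "centroid", "ward"); note that SciPy's "average" averages only over cross-cluster pairs, a slightly different convention from the GAAC formula shown later. A quick sketch on made-up data:

```python
# Sketch: the five proximity measures as linkage "method" options in SciPy.
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 2))                     # illustrative data

for method in ("single", "complete", "average", "centroid", "ward"):
    Z = linkage(X, method=method)                # (n-1) x 4 merge table
    print(method, "-> final merge height:", round(Z[-1, 2], 3))
```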
Single Link Agglomerative Clustering
• Here, the similarity of two clusters is the similarity between their most similar
(nearest neighbour) members.
• This method gives more importance to the regions where clusters are closest.
• Sensitive to noise and outliers in the data
Complete Link Agglomerative Clustering
• Here, the similarity of two clusters is the similarity of their most dissimilar
members.
• The cluster pair whose merger would result in the smallest diameter is the one
chosen for merger.
• Obtains compact clusters but, like single link, is sensitive to outliers
Single Link Agglomerative Clustering
• Example dissimilarity matrix:
      P1    P2    P3    P4
P1   0.00  0.20  0.15  0.30
P2   0.20  0.00  0.40  0.50
P3   0.15  0.40  0.00  0.10
P4   0.30  0.50  0.10  0.00
• The smallest entry is d(3,4) = 0.10, so P3 and P4 are merged first. The subsequent single-link distances are:
dmin((3,4), 1) = min(d(3,1), d(4,1)) = 0.15
dmin((3,4), 2) = min(d(3,2), d(4,2)) = 0.40
dmin((3,4,1), 2) = min(d(3,2), d(4,2), d(1,2)) = 0.20
[Figure: single-link dendrogram over leaves 3, 4, 1, 2 with merge heights 0.10, 0.15, 0.20]
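The same merges can be reproduced with SciPy by feeding it the precomputed matrix (a sketch; `squareform` converts the square matrix into the condensed form that `linkage` expects):

```python
# Sketch: single-link clustering on the lecture's dissimilarity matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

D = np.array([[0.00, 0.20, 0.15, 0.30],   # rows/cols correspond to P1..P4
              [0.20, 0.00, 0.40, 0.50],
              [0.15, 0.40, 0.00, 0.10],
              [0.30, 0.50, 0.10, 0.00]])

Z = linkage(squareform(D), method="single")
print(Z[:, 2])                            # merge heights: 0.1, 0.15, 0.2
```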
Complete Link Agglomerative Clustering
• Example (same dissimilarity matrix):
      P1    P2    P3    P4
P1   0.00  0.20  0.15  0.30
P2   0.20  0.00  0.40  0.50
P3   0.15  0.40  0.00  0.10
P4   0.30  0.50  0.10  0.00
• P3 and P4 are again merged first at d(3,4) = 0.10. The complete-link distances are then:
dmax((3,4), 1) = max(d(3,1), d(4,1)) = 0.30
dmax((3,4), 2) = max(d(3,2), d(4,2)) = 0.50
Since d(1,2) = 0.20 is smaller than both, P1 and P2 are merged next, and finally
dmax((3,4), (1,2)) = max(d(3,1), d(3,2), d(4,1), d(4,2)) = 0.50
[Figure: complete-link dendrogram over leaves 3, 4, 1, 2 with merge heights 0.10, 0.20, 0.50]
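Again as a sketch, switching the linkage method reproduces the complete-link merge heights on the same matrix:

```python
# Sketch: complete-link clustering on the same dissimilarity matrix.
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

D = np.array([[0.00, 0.20, 0.15, 0.30],   # rows/cols correspond to P1..P4
              [0.20, 0.00, 0.40, 0.50],
              [0.15, 0.40, 0.00, 0.10],
              [0.30, 0.50, 0.10, 0.00]])

Z = linkage(squareform(D), method="complete")
print(Z[:, 2])                            # merge heights: 0.1, 0.2, 0.5
```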
Group Averaged Agglomerative Clustering (GAAC)
• This measure considers all pairs of points drawn from the two clusters
• The distance between two clusters is the average of all the pairwise distances
between the data points in the two clusters:
S_GAAC(Ca, Cb) = [1 / ((Na + Nb)(Na + Nb − 1))] · Σ_{i ∈ Ca∪Cb} Σ_{j ∈ Ca∪Cb, j ≠ i} d(i, j)
• This measure is expensive to compute
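A direct (and deliberately naive) sketch of this definition, assuming Euclidean distances; the function name and toy clusters are illustrative:

```python
# Sketch: group-averaged dissimilarity, averaged over all ordered pairs of
# distinct points in the union of the two clusters (as in the formula above).
import numpy as np

def gaac_distance(A, B):
    U = np.vstack([A, B])                     # union of the two clusters
    n = len(U)
    diff = U[:, None, :] - U[None, :, :]      # all pairwise differences
    d = np.sqrt((diff ** 2).sum(axis=-1))     # Euclidean distance matrix
    return d.sum() / (n * (n - 1))            # diagonal is zero, so this averages i != j

A = np.array([[0.0, 0.0], [0.1, 0.0]])        # illustrative cluster Ca
B = np.array([[1.0, 1.0], [1.1, 1.0]])        # illustrative cluster Cb
print(round(gaac_distance(A, B), 3))
```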
Centroid-based Agglomerative Clustering
• This measure calculates the similarity between two clusters by measuring the
similarity between their centroids.
Ward’s Agglomerative Clustering
• Ward’s criterion for agglomeration
It uses the K-means squared error (SSE) criterion to define the distance between clusters.
— For any two clusters Ca and Cb, Ward’s criterion is the increase in the SSE
(sum of squared error) when they are merged into Ca ∪ Cb:
W(Ca, Cb) = [Na Nb / (Na + Nb)] · d(ca, cb)
where ca and cb are the centroids of the two clusters Ca and Cb, and d(ca, cb) is the
squared Euclidean distance between them.
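A small sketch that checks this identity numerically on made-up clusters (the helper names are mine); the formula should match the SSE increase computed directly:

```python
# Sketch: Ward's merging cost equals the increase in within-cluster SSE.
import numpy as np

def ward_cost(A, B):
    ca, cb = A.mean(axis=0), B.mean(axis=0)       # cluster centroids
    na, nb = len(A), len(B)
    return (na * nb) / (na + nb) * np.sum((ca - cb) ** 2)

def sse(C):
    return np.sum((C - C.mean(axis=0)) ** 2)      # within-cluster squared error

A = np.array([[0.0, 0.0], [0.2, 0.0]])            # illustrative clusters
B = np.array([[1.0, 1.0], [1.2, 1.0]])
merged = np.vstack([A, B])
print(round(ward_cost(A, B), 4))                  # via the formula
print(round(sse(merged) - sse(A) - sse(B), 4))    # direct SSE increase (same value)
```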
Divisive Hierarchical clustering
Divisive Hierarchical Clustering
• Divisive hierarchical clustering is a top-down approach
— the procedure starts at the root with all the data points
— the dendrogram is built through a recursive split of clusters
• The divisive approach is more efficient than agglomerative clustering when there is
no need to generate a complete hierarchy
• To make a split decision, all the points have to be examined.
Therefore, divisive clustering is considered a global approach.
Issues in Divisive Clustering
• Splitting Criterion:
Ward’s K-means squared error criterion (SSE) is used here.
— The greater the reduction in SSE, the better the split
However, the SSE criterion can be applied only to numerical data
Issues in Divisive Clustering
• Evaluating Ward’s criterion takes time
• Alternatively, we can use the K-means approach with K = 2
— This is the bisecting K-means
— Obtain a few good splits using K-means and choose the best one
Issues in Divisive Clustering
• Choosing the cluster to split
Check the mean squared errors of the clusters
— Choose the one with the largest mean squared error
This will ensure compact clusters in the dendrogram
Divisive Hierarchical Clustering Algorithm
• Start with the root node consisting of all the data points
• Repeat
— Split the parent node into two parts, C1 and C2, using bisecting K-means
so as to maximise Ward’s distance W(C1, C2)
— Grow the dendrogram. Among the current leaf clusters, choose the one with the
highest squared error as the next node to split
• Until singleton leaves are obtained
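A compact sketch of this procedure, assuming scikit-learn's KMeans is available for the bisecting step; it stops at a chosen number of leaves rather than going all the way down to singletons, and it ranks clusters by their SSE (as on the previous slide) rather than evaluating W(C1, C2) explicitly. All names and data are illustrative:

```python
# Sketch: divisive clustering via bisecting K-means.
import numpy as np
from sklearn.cluster import KMeans

def sse(C):
    return np.sum((C - C.mean(axis=0)) ** 2)       # within-cluster squared error

def bisecting_kmeans(X, n_leaves=4, seed=0):
    clusters = [X]                                  # root node: all the data points
    while len(clusters) < n_leaves:
        worst = max(range(len(clusters)), key=lambda i: sse(clusters[i]))
        target = clusters.pop(worst)                # split the cluster with largest SSE
        labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(target)
        clusters += [target[labels == 0], target[labels == 1]]
    return clusters

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 2))
               for c in ((0, 0), (0, 2), (2, 0), (2, 2))])   # four toy blobs
for c in bisecting_kmeans(X):
    print(len(c), "points, SSE =", round(sse(c), 3))
```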
Minimum Spanning Tree-Based clustering
Minimum Spanning Tree-based clustering
• Given a weighted graph, a minimum spanning tree (MST) is an acyclic subgraph
— that covers all the vertices
— and has the minimum total edge weight
• The minimum spanning tree of a weighted graph can be found using
— Prim’s algorithm
— Kruskal’s algorithm
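SciPy also ships an MST routine; a minimal sketch on a small made-up graph (not the graph drawn on the next slide):

```python
# Sketch: minimum spanning tree of a small weighted graph with SciPy.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

# symmetric weight matrix of a graph on 4 vertices; 0 means "no edge"
W = np.array([[0, 2, 6, 0],
              [2, 0, 3, 5],
              [6, 3, 0, 4],
              [0, 5, 4, 0]])

mst = minimum_spanning_tree(W)      # sparse matrix holding only the MST edges
print(mst.toarray())                # kept edges: (0,1)=2, (1,2)=3, (2,3)=4
```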
Weighted Graph
[Figure: a weighted graph on vertices P1–P8 with edge weights between pairs of points]
Minimum Spanning Tree
[Figure: the same graph with the edges of its minimum spanning tree highlighted]
Minimum Spanning Tree-based clustering
• For clustering purposes, the edge weights can be taken as the Euclidean distances
between pairs of data points.
• Given an MST, a divisive clustering algorithm has the following steps:
— Remove the edge with the largest weight to get two clusters
— Remove the next largest to get three clusters
— and so on
Minimum Spanning Tree-based clustering
• If we are looking for K clusters, remove the K − 1 largest-weight edges one by one
• This will give K connected components.
• Each edge removal gives a finer split.
• This can detect clusters with non-spherical shapes
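Putting the last few slides together, a sketch of MST-based clustering with SciPy (Euclidean edge weights, remove the K − 1 heaviest MST edges, read clusters off the connected components); the data and function name are illustrative:

```python
# Sketch: MST-based divisive clustering.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_clusters(X, k):
    D = squareform(pdist(X))                    # complete graph of Euclidean distances
    mst = minimum_spanning_tree(D).toarray()    # the n-1 MST edges as a dense matrix
    if k > 1:
        for w in np.sort(mst[mst > 0])[-(k - 1):]:
            mst[mst == w] = 0                   # drop the k-1 largest-weight edges
    _, labels = connected_components(mst, directed=False)
    return labels                               # one label per connected component

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(15, 2))
               for c in ((0, 0), (3, 0), (0, 3))])   # three well-separated toy blobs
print(mst_clusters(X, k=3))                     # three connected components
```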
Minimum Spanning Tree-based clustering
• Instead of removing the largest weighted edge, we can also remove the edge
with the highest inconsistency measure.
— An inconsistent edge is one whose weight is much higher than the average weight
of the edges in its neighbourhood
CURE (Clustering Using Representatives)
CURE (Clustering Using Representatives)
• In this method, a cluster is represented using a set of well-scattered
representative points.
• The distance between two clusters is computed as the average distance between
the representative points.
• Choosing scattered points helps in capturing arbitrary shapes of clusters.
CHAMELEON
CHAMELEON
• The algorithm begins with an initial partitioning obtained by
— constructing a K-nearest-neighbour graph
— applying graph partitioning
• Agglomerative clustering
— The algorithm uses two measures to decide which clusters to merge
♦ relative interconnectivity of the two clusters
♦ relative closeness of the two clusters
— both measures capture local information about the clusters
Thank You