03 Hierarchical Clustering
Introduction - Hierarchical Clustering
How it works
Given a set of N items to be clustered, and an N*N distance (or similarity) matrix, the
basic process of hierarchical clustering (defined by S.C. Johnson in 1967) is this:
1. Start by assigning each item to a cluster, so that if you have N items, you now
have N clusters, each containing just one item. Let the distances (similarities)
between the clusters be the same as the distances (similarities) between the
items they contain.
2. Find the closest (most similar) pair of clusters and merge them into a single
cluster, so that you now have one cluster fewer.
3. Compute distances (similarities) between the new cluster and each of the old
clusters.
4. Repeat steps 2 and 3 until all items are clustered into a single cluster of size N
(a minimal code sketch of these steps follows).
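The loop below is a minimal Python sketch of these four steps (an illustration added here, not from the original slides). It assumes a precomputed N x N distance matrix D and a pluggable function cluster_dist that measures the distance between two clusters; that function is exactly where the linkage variants on the next slide differ.

# Minimal sketch of the four steps above.
# D is an N x N distance matrix (list of lists or similar);
# cluster_dist(D, a, b) returns the distance between clusters a and b,
# each given as a list of item indices.
def agglomerative(D, cluster_dist):
    # Step 1: every item starts in its own cluster.
    clusters = [[i] for i in range(len(D))]
    merges = []  # (merge level, members of the new cluster)
    # Step 4: keep going until a single cluster of size N remains.
    while len(clusters) > 1:
        # Step 2: find the closest (most similar) pair of clusters...
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_dist(D, clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        # ...and merge them into a single cluster.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
        merges.append((d, merged))
        # Step 3 happens implicitly: cluster_dist is re-evaluated against
        # the new cluster on the next pass through the loop.
    return merges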
Contd..
• Step 3 can be done in different ways, which is what distinguishes single-
linkage from complete-linkage and average-linkage clustering.
• In single-linkage clustering (also called the connectedness or minimum method), we consider
the distance between one cluster and another cluster to be equal to the shortest distance from
any member of one cluster to any member of the other cluster. If the data consist of
similarities, we consider the similarity between one cluster and another cluster to be equal to
the greatest similarity from any member of one cluster to any member of the other cluster.
• In complete-linkage clustering (also called the diameter or maximum method), we consider
the distance between one cluster and another cluster to be equal to the greatest distance
from any member of one cluster to any member of the other cluster.
• In average-linkage clustering, we consider the distance between one cluster and another
cluster to be equal to the average distance from any member of one cluster to any member of
the other cluster.
A variation on average-linkage clustering is the UCLUS method of R. D'Andrade (1978),
which uses the median distance; the median is far less sensitive to outliers than the average.
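As a hedged illustration (the function names are mine, not from the slides), the three linkage rules, plus a UCLUS-style median variant, can be written as small cluster-distance functions that plug into the earlier sketch:

# Inter-cluster distance for each linkage rule described above.
# D is an N x N distance matrix; a and b are lists of item indices.
def single_linkage(D, a, b):
    # shortest distance between any member of a and any member of b
    return min(D[i][j] for i in a for j in b)

def complete_linkage(D, a, b):
    # greatest distance between any member of a and any member of b
    return max(D[i][j] for i in a for j in b)

def average_linkage(D, a, b):
    # mean distance over all cross-cluster pairs
    return sum(D[i][j] for i in a for j in b) / (len(a) * len(b))

def median_linkage(D, a, b):
    # UCLUS-style variant: median distance over all cross-cluster pairs
    import statistics
    return statistics.median(D[i][j] for i in a for j in b)

For example, agglomerative(D, single_linkage) runs the earlier sketch with the single-linkage rule.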
Single-Linkage Clustering: The Algorithm
• Begin with the disjoint clustering having level L(0) = 0 and sequence number m = 0.
• Find the least dissimilar pair of clusters in the current clustering, say pair (r), (s), according to
d[(r),(s)] = min d[(i),(j)], where the minimum is taken over all pairs of clusters in the current clustering.
• Increment the sequence number: m = m + 1. Merge clusters (r) and (s) into a single cluster to form the next clustering
m. Set the level of this clustering to
L(m) = d[(r),(s)]
• Update the proximity matrix, D, by deleting the rows and columns corresponding to clusters (r) and (s) and adding a row
and column corresponding to the newly formed cluster. The proximity between the new cluster, denoted (r,s), and an old
cluster (k) is defined as d[(k),(r,s)] = min(d[(k),(r)], d[(k),(s)]).
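A possible NumPy sketch of this matrix-update formulation is shown below (function and variable names are illustrative assumptions, not part of the algorithm's original statement); the key line implements d[(k),(r,s)] = min(d[(k),(r)], d[(k),(s)]).

import numpy as np

def single_linkage_levels(D, labels):
    """Single linkage by repeatedly updating the proximity matrix D.
    D: symmetric NumPy array of pairwise distances; labels: cluster names.
    Returns the merges as (level L(m), name of the new cluster)."""
    D = np.asarray(D, dtype=float).copy()
    labels = list(labels)
    merges = []
    while len(labels) > 1:
        # Least dissimilar pair (r), (s): d[(r),(s)] = min over all pairs
        masked = D + np.diag(np.full(len(labels), np.inf))
        r, s = np.unravel_index(np.argmin(masked), masked.shape)
        level = D[r, s]
        # Proximity of the new cluster (r,s) to each old cluster (k):
        # d[(k),(r,s)] = min(d[(k),(r)], d[(k),(s)])
        new_row = np.minimum(D[r], D[s])
        new_label = labels[r] + "/" + labels[s]
        keep = [k for k in range(len(labels)) if k not in (r, s)]
        new_row = new_row[keep]
        # Delete the rows/columns of (r) and (s), add the new cluster.
        D = D[np.ix_(keep, keep)]
        D = np.vstack([np.hstack([D, new_row[:, None]]),
                       np.hstack([new_row, [0.0]])])
        labels = [labels[k] for k in keep] + [new_label]
        merges.append((level, new_label))
    return merges

Run on the city distances of the following example, this should return the merge levels 138, 219, 255, 268 and 295.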
Input distance matrix between the cities:
BA FI MI NA RM TO
BA 0 662 877 255 412 996
FI 662 0 295 468 268 400
MI 877 295 0 754 564 138
NA 255 468 754 0 219 869
RM 412 268 564 219 0 669
TO 996 400 138 869 669 0
• The nearest pair of cities is MI and TO, at distance 138. These are merged into a single
cluster called "MI/TO". The level of the new cluster is L(MI/TO) = 138 and the new sequence
number is m = 1.
• Then we compute the distance from this new compound object to all other objects. Under single
linkage, the distance from "MI/TO" to RM is 564, the distance from MI to RM (the shorter of the
two original distances), and so on.
BA FI MI/TO NA RM
BA 0 662 877 255 412
FI 662 0 295 468 268
MI/TO 877 295 0 754 564
NA 255 468 754 0 219
RM 412 268 564 219 0
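As a quick check of the update rule in this concrete case (using the MI and TO rows of the input matrix shown earlier), the MI/TO row is simply the element-wise minimum of the MI and TO rows:

import numpy as np

# MI and TO rows of the input matrix (distances to BA, FI, NA, RM)
mi = np.array([877, 295, 754, 564])
to = np.array([996, 400, 869, 669])

# Single-linkage update: d[(k),(MI,TO)] = min(d[(k),MI], d[(k),TO])
print(np.minimum(mi, to))  # [877 295 754 564] -> off-diagonal entries of the MI/TO row above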
Contd..
• min d(i,j) = d(NA,RM) = 219 => merge NA and RM into a new cluster called NA/RM
L(NA/RM) = 219
m=2
BA FI MI/TO NA/RM
BA 0 662 877 255
FI 662 0 295 268
MI/TO 877 295 0 564
NA/RM 255 268 564 0
Contd..
• min d(i,j) = d(BA,NA/RM) = 255 => merge BA and NA/RM into a new cluster called
BA/NA/RM
L(BA/NA/RM) = 255
m=3
BA/NA/RM FI MI/TO
BA/NA/RM 0 268 564
FI 268 0 295
MI/TO 564 295 0
Contd..
• min d(i,j) = d(BA/NA/RM,FI) = 268 => merge BA/NA/RM and FI into a new cluster
called BA/FI/NA/RM
L(BA/FI/NA/RM) = 268
m=4
BA/FI/NA/RM MI/TO
BA/FI/NA/RM 0 295
MI/TO 295 0
• Finally, we merge the last two clusters at level 295.
• The process is summarized by the following hierarchical tree:
[Dendrogram: MI and TO merge at level 138, NA and RM at 219, BA joins NA/RM at 255, FI joins at 268, and the final merge with MI/TO occurs at level 295.]
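For reference, the same merge levels can be reproduced with SciPy's hierarchical clustering routines, assuming the input distance matrix shown earlier; this verification is a sketch added here, not part of the original example.

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import squareform

cities = ["BA", "FI", "MI", "NA", "RM", "TO"]
D = np.array([[  0, 662, 877, 255, 412, 996],
              [662,   0, 295, 468, 268, 400],
              [877, 295,   0, 754, 564, 138],
              [255, 468, 754,   0, 219, 869],
              [412, 268, 564, 219,   0, 669],
              [996, 400, 138, 869, 669,   0]], dtype=float)

# linkage() expects the condensed (upper-triangular) form of the matrix
Z = linkage(squareform(D), method="single")
print(Z[:, 2])                 # merge levels: [138. 219. 255. 268. 295.]
# dendrogram(Z, labels=cities)   # draws the tree summarized above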
Difference between K Means and hierarchical clustering
• Hierarchical clustering does not handle big data as well as K Means does: the time
complexity of K Means is linear in the number of points, i.e. O(n), while that of standard
agglomerative hierarchical clustering is at least quadratic, i.e. O(n²).
• K Means starts from a random choice of initial clusters, so running the algorithm multiple
times may produce different results, whereas the results of hierarchical clustering are
reproducible.
• K Means is found to work well when the clusters are roughly hyper-spherical
(like a circle in 2D or a sphere in 3D).
• K Means requires prior knowledge of K, i.e. the number of clusters you want to
divide your data into. With hierarchical clustering, you can instead stop at whatever number
of clusters you find appropriate by interpreting the dendrogram (see the sketch below).
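To illustrate the last point, a hierarchical clustering only needs to be computed once and can then be cut at any number of clusters; the sketch below uses SciPy on hypothetical random data (the data and parameter choices are assumptions for illustration only).

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical data: the hierarchical clustering is computed only once...
rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))
Z = linkage(points, method="single")

# ...then the dendrogram can be cut into any number of clusters,
# without rerunning the algorithm (unlike K Means, which needs K up front).
for k in (2, 3, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(k, labels)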
Applications of Clustering
• Clustering has a large number of applications spread across various domains. Some of
the most popular applications of clustering are:
• Recommendation engines
• Market segmentation
• Social network analysis
• Search result grouping
• Medical imaging
• Image segmentation
• Anomaly detection
Practice question
[Distance matrix over the cities BOS, NY, DC, MIA, CHI, SEA, SF, LA and DEN]
THANK YOU