
Hierarchical Clustering

Ke Chen

COMP24111 Machine Learning

Outline
Introduction
Cluster Distance Measures
Agglomerative Algorithm
Example and Demo
Relevant Issues
Summary

Introduction
Hierarchical Clustering Approach

A typical cluster analysis approach that partitions the data set sequentially:

- Construct nested partitions layer by layer by grouping objects into a
  tree of clusters (without the need to know the number of clusters in
  advance).
- Use a (generalised) distance matrix as the clustering criterion.

Agglomerative vs. Divisive

Two sequential clustering strategies for constructing a tree of clusters:

- Agglomerative: a bottom-up strategy. Initially each data object is in
  its own (atomic) cluster; these atomic clusters are then merged into
  larger and larger clusters.
- Divisive: a top-down strategy. Initially all objects are in one single
  cluster; the cluster is then subdivided into smaller and smaller
  clusters.

Introduction
Illustrative Example

Agglomerative and divisive clustering on the data set {a, b, c, d, e}:

[Figure: agglomerative clustering runs from Step 0 to Step 4, merging
a and b into ab, d and e into de, then c and de into cde, and finally
ab and cde into abcde. Divisive clustering runs the same tree in
reverse, from Step 4 back to Step 0, splitting abcde until each object
is its own cluster. The cluster distance measure governs which clusters
are merged, and a termination condition decides when to stop.]

Cluster Distance Measures

Single link (min): the smallest distance between an element in one
cluster and an element in the other, i.e.,

  d(Ci, Cj) = min{d(xip, xjq)}

Complete link (max): the largest distance between an element in one
cluster and an element in the other, i.e.,

  d(Ci, Cj) = max{d(xip, xjq)}

Average: the average distance between elements in one cluster and
elements in the other, i.e.,

  d(Ci, Cj) = avg{d(xip, xjq)}

In all three cases the distance of a cluster to itself is zero:
d(C, C) = 0.
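As a concrete illustration, here is a minimal Python sketch of the three
measures for clusters of objects described by a single continuous
feature (the function names are illustrative, not from the slides):

```python
from itertools import product

def pairwise_distances(c1, c2):
    """All |x - y| distances between elements of two 1-D clusters."""
    return [abs(x - y) for x, y in product(c1, c2)]

def single_link(c1, c2):
    """Smallest element-to-element distance between the clusters."""
    return min(pairwise_distances(c1, c2))

def complete_link(c1, c2):
    """Largest element-to-element distance between the clusters."""
    return max(pairwise_distances(c1, c2))

def average_link(c1, c2):
    """Average element-to-element distance between the clusters."""
    d = pairwise_distances(c1, c2)
    return sum(d) / len(d)
```

The worked example on the next slide can be checked against this sketch.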

Cluster Distance Measures

Example: Given a data set of five objects characterised by a single
continuous feature, assume that there are two clusters: C1: {a, b} and
C2: {c, d, e}.

[Figure: the five objects a, b, c, d, e placed along the feature axis.]

1. Calculate the distance matrix.
2. Calculate the three cluster distances between C1 and C2.

Distance matrix:

      a   b   c   d   e
  a   0   1   3   4   5
  b   1   0   2   3   4
  c   3   2   0   1   2
  d   4   3   1   0   1
  e   5   4   2   1   0

Single link:
  dist(C1, C2) = min{d(a,c), d(a,d), d(a,e), d(b,c), d(b,d), d(b,e)}
               = min{3, 4, 5, 2, 3, 4} = 2

Complete link:
  dist(C1, C2) = max{d(a,c), d(a,d), d(a,e), d(b,c), d(b,d), d(b,e)}
               = max{3, 4, 5, 2, 3, 4} = 5

Average:
  dist(C1, C2) = [d(a,c) + d(a,d) + d(a,e) + d(b,c) + d(b,d) + d(b,e)] / 6
               = (3 + 4 + 5 + 2 + 3 + 4) / 6 = 21/6 = 3.5
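The sketch above reproduces these results, assuming feature values
a = 1, b = 2, c = 4, d = 5, e = 6 (hypothetical positions, chosen to be
consistent with the distance matrix):

```python
C1, C2 = [1, 2], [4, 5, 6]    # assumed values for {a, b} and {c, d, e}
print(single_link(C1, C2))    # 2
print(complete_link(C1, C2))  # 5
print(average_link(C1, C2))   # 3.5
```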

Agglomerative Algorithm

The agglomerative algorithm is carried out in three steps:

1) Convert all object features into a distance matrix.
2) Set each object as a cluster (thus if we have N objects, we will
   have N clusters at the beginning).
3) Repeat until the number of clusters is one (or a known number of
   clusters): merge the two closest clusters, then update the distance
   matrix.
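As a rough illustration of these steps, here is a minimal, naive Python
sketch (single link, one-dimensional data; names and structure are
illustrative, not the course's reference implementation):

```python
def agglomerative(points, k=1):
    """Naive single-link agglomerative clustering of 1-D points.

    Repeatedly merges the two closest clusters until k clusters remain,
    returning the final clusters and the merge history.
    """
    clusters = [[p] for p in points]      # step 2: one cluster per object
    history = []
    while len(clusters) > k:              # step 3: repeat until k clusters
        # Find the pair of clusters with the smallest single-link distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(x - y) for x in clusters[i] for y in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]          # merge the closest pair
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)]
        clusters.append(merged)   # distances are recomputed on the fly here,
                                  # standing in for the distance-matrix update
    return clusters, history
```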

Example

Problem: clustering analysis with the agglomerative algorithm.

[Figure: a data matrix of six objects (A-F) and the corresponding
distance matrix computed with the Euclidean distance.]
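A sketch of step 1 in Python, assuming hypothetical 2-D feature vectors
for A-F (not the slide's actual data):

```python
import numpy as np

# Hypothetical feature matrix for objects A-F (two features each).
X = np.array([[0.40, 0.53], [0.22, 0.38], [0.35, 0.32],
              [0.26, 0.19], [0.08, 0.41], [0.45, 0.30]])

# Pairwise Euclidean distance matrix via broadcasting.
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
print(np.round(D, 2))
```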

Example
Merge two closest clusters (iteration 1)


Example
Update distance matrix (iteration 1)


Example
Merge two closest clusters (iteration 2)


Example
Update distance matrix (iteration 2)


Example
Merge two closest clusters/update distance matrix
(iteration 3)


Example
Merge two closest clusters/update distance matrix
(iteration 4)


Example
Final result (meeting termination condition)


Example

Dendrogram tree representation

[Figure: dendrogram of the six objects, with the objects along the
horizontal axis and the lifetime (merge distance), from 0 up to about 5,
on the vertical axis.]

1. In the beginning we have 6 clusters: A, B, C, D, E and F.
2. We merge clusters D and F into cluster (D, F) at distance 0.50.
3. We merge clusters A and B into cluster (A, B) at distance 0.71.
4. We merge clusters E and (D, F) into ((D, F), E) at distance 1.00.
5. We merge clusters ((D, F), E) and C into (((D, F), E), C) at
   distance 1.41.
6. We merge clusters (((D, F), E), C) and (A, B) into
   ((((D, F), E), C), (A, B)) at distance 2.50.
7. The last cluster contains all the objects, which concludes the
   computation.
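For comparison, SciPy's hierarchical clustering utilities produce this
kind of dendrogram directly; a minimal sketch, reusing the hypothetical
A-F coordinates from the earlier sketch:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# The same hypothetical coordinates for A-F as in the earlier sketch.
X = np.array([[0.40, 0.53], [0.22, 0.38], [0.35, 0.32],
              [0.26, 0.19], [0.08, 0.41], [0.45, 0.30]])

Z = linkage(X, method='single')        # single-link agglomerative merges
dendrogram(Z, labels=list('ABCDEF'))   # tree of the five merge steps
plt.ylabel('lifetime (merge distance)')
plt.show()
```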

Example
Dendrogram tree representation: clustering USA states

[Figure: a dendrogram clustering the USA states.]

Exercise

Given a data set of five objects characterised by a single continuous
feature:

[Figure: the five objects a, b, c, d, e placed along the feature axis.]

Apply the agglomerative algorithm with single-link, complete-link and
average cluster distance measures to produce three dendrogram trees,
respectively.

Demo

Agglomerative Demo


Relevant Issues

How to determine the number of clusters:

- If the number of clusters is known, the termination condition is
  given.
- The K-cluster lifetime is the range of threshold values on the
  dendrogram tree that leads to the identification of exactly K
  clusters.
- Heuristic rule: cut the dendrogram tree at the K with the maximum
  K-cluster lifetime (see the sketch below).
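As a sketch of this heuristic applied to the earlier dendrogram (merge
distances 0.50, 0.71, 1.00, 1.41 and 2.50), assuming the K-cluster
lifetime can be read off as the gap between consecutive merge distances:

```python
# Merge distances from the six-object dendrogram example (five merges).
merges = [0.50, 0.71, 1.00, 1.41, 2.50]

# Cutting the tree between merge i-1 and merge i leaves (len(merges)+1-i)
# clusters, so the K-cluster lifetime is the gap between those merges.
lifetimes = {}
for i in range(1, len(merges)):
    k = len(merges) + 1 - i                   # clusters in that range
    lifetimes[k] = round(merges[i] - merges[i - 1], 2)

print(lifetimes)                          # {5: 0.21, 4: 0.29, 3: 0.41, 2: 1.09}
print(max(lifetimes, key=lifetimes.get))  # 2: widest gap, from 1.41 to 2.50
```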

Summary

The hierarchical algorithm is a sequential clustering algorithm:

- It uses a distance matrix to construct a tree of clusters
  (dendrogram).
- It gives a hierarchical representation without the need to know the
  number of clusters in advance (a termination condition can be set
  when the number of clusters is known).

Major weaknesses of agglomerative clustering methods:

- They can never undo what was done previously.
- They are sensitive to the cluster distance measure and to
  noise/outliers.
- They are less efficient: O(n^2 log n), where n is the total number
  of objects.

There are several variants that overcome these weaknesses:

- BIRCH: scalable to large data sets
- ROCK: clustering categorical data
- CHAMELEON: hierarchical clustering using dynamic modelling

Online tutorial: the hierarchical clustering functions in Matlab
https://www.youtube.com/watch?v=aYzjenNNOcc
