Agglomerative Clustering

The document provides an overview of hierarchical clustering, a method for organizing data into clusters with high intra-cluster similarity and low inter-cluster similarity. It discusses the principles of clustering, distance measures, and the advantages and disadvantages of hierarchical versus partitional clustering methods. Additionally, it highlights the importance of determining the appropriate number of clusters and introduces techniques like elbow finding and cross-validation for this purpose.


10601 Machine Learning
Hierarchical clustering
Reading: Bishop 9-9.2


Second half: Overview
• Clustering
- Hierarchical, semi-supervised learning
• Graphical models
- Bayesian networks, HMMs, Reasoning under uncertainty
• Putting it together
- Model / feature selection, Boosting, dimensionality reduction
• Advanced classification
- SVM
What is Clustering?
• Organizing data into clusters such that there is
  - high intra-cluster similarity
  - low inter-cluster similarity
• Informally: finding natural groupings among objects
• Why do we want to do that?
• Any REAL application?
Example: clusty (a web search engine that clustered its search results)
Example: clustering genes
• Microarrays measure the activities of all genes in different conditions
• Clustering genes can help determine new functions for unknown genes
• An early "killer application" in this area
  - The most cited (12,309) paper in PNAS!
Unsupervised learning
• Clustering methods are unsupervised learning techniques
  - We do not have a teacher that provides examples with their labels
• We will also discuss dimensionality reduction, another unsupervised learning method, later in the course
Outline
• Distance functions
• Hierarchical clustering
• Number of clusters
What is Similarity?
"The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary)

Similarity is hard to define, but "we know it when we see it."

The real meaning of similarity is a philosophical question. We will take a more pragmatic approach.
Defining Distance Measures
Definition: Let O1 and O2 be two objects from the universe of possible objects. The distance (dissimilarity) between O1 and O2 is a real number denoted by D(O1, O2).

gene1, gene2  ->  [black box]  ->  D(gene1, gene2)      (example outputs: 0.23, 3, 342.7)

Inside these black boxes: some function on two variables (might be simple or very complex). For example, edit distance:

d('', '') = 0
d(s, '') = d('', s) = |s|    -- i.e. the length of s
d(s1+ch1, s2+ch2) = min( d(s1, s2) + (if ch1 = ch2 then 0 else 1),
                         d(s1+ch1, s2) + 1,
                         d(s1, s2+ch2) + 1 )

A few examples:
• Euclidean distance: $d(x, y) = \sqrt{\sum_i (x_i - y_i)^2}$
• Correlation coefficient: $s(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y}$
  - A similarity rather than a distance
  - Can determine similar trends
Outline
• Distance measure
• Hierarchical clustering
• Number of clusters
Desirable Properties of a Clustering Algorithm
• Scalability (in terms of both time and space)
• Ability to deal with different data types
• Minimal requirements for domain knowledge to determine input parameters
• Interpretability and usability
Optional:
- Incorporation of user-specified constraints
Two Types of Clustering
• Partitional algorithms: Construct various partitions and then evaluate them by some criterion
• Hierarchical algorithms: Create a hierarchical decomposition of the set of objects using some criterion (focus of this class)
[Figure: hierarchical clustering (bottom up or top down) vs. partitional clustering (top down)]
(How-to) Hierarchical Clustering
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

The number of dendrograms with n leaves = (2n - 3)! / [2^(n-2) (n - 2)!]

Number of Leaves    Number of Possible Dendrograms
2                   1
3                   3
4                   15
5                   105
...                 ...
10                  34,459,425
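As a quick check of the table above, this small snippet (with a hypothetical helper name) evaluates the formula (2n - 3)! / [2^(n-2) (n - 2)!] directly:

```python
from math import factorial

def num_dendrograms(n):
    # Number of possible dendrograms with n labeled leaves.
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (2, 3, 4, 5, 10):
    print(n, num_dendrograms(n))
# 2 1
# 3 3
# 4 15
# 5 105
# 10 34459425
```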
We begin with a distance matrix which contains the distances between every pair of objects in our database.

    0   8   8   7   7
        0   2   4   4
            0   3   3
                0   1
                    0

(For the two pairs of objects pictured on the slide, D( , ) = 8 and D( , ) = 1.)
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

At each step: consider all possible merges and choose the best one. (The slides repeat this consider-all-merges / choose-the-best step on the example data until all clusters are fused.)

But how do we compute distances between clusters rather than between individual objects?
Computing distance between clusters: Single Link
• cluster distance = distance of the two closest members, one from each cluster
  - potentially long and skinny clusters
Computing distance between clusters: Complete Link
• cluster distance = distance of the two farthest members
  + tight clusters
Computing distance between clusters: Average Link
• cluster distance = average distance over all cross-cluster pairs
  + the most widely used measure
  + robust against noise
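A minimal sketch of these three linkage criteria, assuming NumPy and a precomputed pairwise distance matrix D; the function name and signature are illustrative, not from the slides.

```python
import numpy as np

def cluster_distance(D, cluster_a, cluster_b, linkage="single"):
    """Distance between two clusters, given the full pairwise distance
    matrix D and the clusters as lists of point indices."""
    pair_dists = D[np.ix_(cluster_a, cluster_b)]   # all cross-cluster pairs
    if linkage == "single":      # closest pair -> long, skinny clusters
        return pair_dists.min()
    if linkage == "complete":    # farthest pair -> tight clusters
        return pair_dists.max()
    if linkage == "average":     # mean over all pairs -> robust to noise
        return pair_dists.mean()
    raise ValueError(linkage)

# Using the 5-object distance matrix from the single-link example that follows
# (objects 1..5 are indices 0..4 here):
D = np.array([[ 0,  2,  6, 10,  9],
              [ 2,  0,  3,  9,  8],
              [ 6,  3,  0,  7,  5],
              [10,  9,  7,  0,  4],
              [ 9,  8,  5,  4,  0]], dtype=float)
print(cluster_distance(D, [0, 1], [2], "single"))   # 3.0, i.e. d((1,2),3) = 3
```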
Example: single link

     1   2   3   4   5
1    0
2    2   0
3    6   3   0
4   10   9   7   0
5    9   8   5   4   0

[Dendrogram under construction: leaves 1-5, no merges yet]
Example: single link

After merging 1 and 2:

         (1,2)   3   4   5
(1,2)      0
3          3     0
4          9     7   0
5          8     5   4   0

d((1,2),3) = min{d(1,3), d(2,3)} = min{6, 3} = 3
d((1,2),4) = min{d(1,4), d(2,4)} = min{10, 9} = 9
d((1,2),5) = min{d(1,5), d(2,5)} = min{9, 8} = 8

[Dendrogram: 1 and 2 joined at height 2]
Example: single link

After merging (1,2) with 3:

           (1,2,3)   4   5
(1,2,3)       0
4             7      0
5             5      4   0

d((1,2,3),4) = min{d((1,2),4), d(3,4)} = min{9, 7} = 7
d((1,2,3),5) = min{d((1,2),5), d(3,5)} = min{8, 5} = 5

[Dendrogram: 1 and 2 joined at height 2, 3 joined at height 3]
Example: single link

Next, 4 and 5 are merged (their distance, 4, is now the smallest), and finally the two remaining clusters are fused:

d((1,2,3),(4,5)) = min{d((1,2,3),4), d((1,2,3),5)} = 5

[Dendrogram: 1 and 2 joined at height 2, 3 at height 3, 4 and 5 at height 4, and the final merge at height 5]
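Assuming SciPy is available, the same worked example can be reproduced with scipy.cluster.hierarchy.linkage; this sketch is not part of the original slides.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Distance matrix from the worked example (objects 1..5).
D = np.array([[ 0,  2,  6, 10,  9],
              [ 2,  0,  3,  9,  8],
              [ 6,  3,  0,  7,  5],
              [10,  9,  7,  0,  4],
              [ 9,  8,  5,  4,  0]], dtype=float)

# linkage() expects a condensed (upper-triangular) distance matrix.
Z = linkage(squareform(D), method="single")
print(Z)
# Each row of Z records one merge: [cluster_i, cluster_j, distance, size].
# The merges happen at heights 2 ({1,2}), 3 ({1,2,3}), 4 ({4,5}) and 5 (all),
# matching the hand computation above.
```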
Single linkage vs. average linkage
[Figure: two dendrograms over the same 30 objects, one built with single linkage and one with average linkage; height represents the distance between objects / clusters]
Summary of Hierarchical Clustering Methods
• No need to specify the number of clusters in advance.
• The hierarchical structure maps nicely onto human intuition for some domains.
• They do not scale well: time complexity of at least O(n²), where n is the total number of objects.
• Like any heuristic search algorithm, local optima are a problem.
• Interpretation of results is (very) subjective.
But what are the clusters?
In some cases we can determine the "correct" number of clusters; however, things are rarely this clear cut, unfortunately.

One potential use of a dendrogram is to detect outliers: a single isolated branch is suggestive of a data point that is very different from all the others.
[Figure: dendrogram in which one isolated branch is labeled "Outlier"]
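To make the outlier idea concrete, here is a SciPy sketch on a hypothetical data set with one far-away point; cutting the dendrogram just below its final merge and looking for singleton clusters is one simple heuristic, not the slides' prescribed method.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),   # one tight cluster
               [[8.0, 8.0]]])                         # one isolated point

Z = linkage(pdist(X), method="single")

# Cut the dendrogram just below the very last merge; any cluster of size 1
# that remains corresponds to an isolated branch joining only at the top.
labels = fcluster(Z, t=Z[-1, 2] - 1e-9, criterion="distance")
sizes = np.bincount(labels)
outliers = np.where(sizes[labels] == 1)[0]
print(outliers)   # -> [20], the isolated point
```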
Example: clustering genes
• Microarrays measure the activities of all genes in different conditions
• Clustering genes can help determine new functions for unknown genes
Partitional Clustering
• Nonhierarchical: each instance is placed in exactly one of K non-overlapping clusters.
• Since the output is only a single set of clusters, the user has to specify the desired number of clusters K.
K-means Clustering: Finished!
Re-assign points and move centers until no objects change membership.
[Figure: scatter plot of expression in condition 1 (x-axis) vs. expression in condition 2 (y-axis) with the converged cluster centers k1, k2, k3]
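For contrast with the hierarchical approach, here is a brief scikit-learn sketch of partitional (K-means) clustering on hypothetical 2-D "expression" data; the blob locations and the choice K = 3 are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical "expression in condition 1 / condition 2" data: three blobs.
X = np.vstack([rng.normal(loc, 0.3, size=(30, 2))
               for loc in ([1.0, 1.0], [4.0, 1.0], [2.5, 4.0])])

# The user must choose K up front; here we ask for exactly K = 3 clusters.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the three converged centers (k1, k2, k3)
print(km.labels_[:10])       # each point belongs to exactly one cluster
```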
Gaussian mixture clustering
Clustering methods: Comparison

                    Hierarchical               K-means                   GMM
Running time        naively, O(N³)             fastest (each             fast (each
                                               iteration is linear)      iteration is linear)
Assumptions         requires a similarity /    strong assumptions        strongest assumptions
                    distance measure
Input parameters    none                       K (number of clusters)    K (number of clusters)
Clusters            subjective (only a         exactly K clusters        exactly K clusters
                    tree is returned)
Outline
• Distance measure
• Hierarchical clustering
• Number of clusters
How can we tell the right number of clusters?

In general, this is an unsolved problem. However, there are many approximate methods. In the next few slides we will see an example.
[Figure: scatter plot of the example data on a 1-10 by 1-10 grid]
When k = 1, the objective function is 873.0.
When k = 2, the objective function is 173.1.
When k = 3, the objective function is 133.6.
[Figure: the example data clustered with k = 1, 2, and 3]
We can plot the objective function values for k = 1 to 6. The abrupt change at k = 2 is highly suggestive of two clusters in the data. This technique for determining the number of clusters is known as "knee finding" or "elbow finding".
[Figure: objective function value (roughly 1.0E+03 down to 1.0E+02) plotted against k = 1 to 6, with a sharp drop (the elbow) at k = 2]
Note that the results are not always as clear cut as in this toy example
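A sketch of elbow finding with scikit-learn's KMeans, on hypothetical data containing two well-separated blobs; inertia_ plays the role of the objective function plotted above.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical data: two well-separated blobs.
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(50, 2)),
               rng.normal([6.0, 6.0], 0.5, size=(50, 2))])

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the K-means objective: sum of squared distances to centers.
    print(k, round(km.inertia_, 1))
# The objective drops sharply from k = 1 to k = 2 and only slowly afterwards;
# the elbow at k = 2 suggests two clusters.
```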
Cross validation
• We can also use cross validation to determine the correct number of clusters.
• Recall that a GMM is a generative model: we can compute the likelihood of the held-out data to determine which model (number of clusters) is more accurate:
$p(x_1 \ldots x_n \mid \Theta) = \prod_{j=1}^{n} \left[ \sum_{i=1}^{k} p(x_j \mid C = i)\, w_i \right]$

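A sketch of this idea using scikit-learn's GaussianMixture on hypothetical data drawn from two Gaussians; the train / held-out split and the range of k are illustrative choices, not from the slides.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
# Hypothetical data generated from two Gaussian components.
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(200, 2)),
               rng.normal([5.0, 5.0], 1.0, size=(200, 2))])

X_train, X_held_out = train_test_split(X, test_size=0.3, random_state=0)

for k in range(1, 6):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X_train)
    # score() is the average held-out log-likelihood log p(x | theta)
    # under the fitted mixture of k components.
    print(k, round(gmm.score(X_held_out), 3))
# The held-out likelihood typically peaks near the true number of components
# (here k = 2) and flattens or degrades as k grows further.
```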
What you should know
• Why is clustering useful
• What are the different types of clustering
algorithms
• What are the assumptions we are making
for each, and what can we get from them
• Unsolved issues: number of clusters,
initialization, etc.
