Agglomerative Clustering
Machine Learning
Hierarchical clustering
Similarity is hard to define, but… "We know it when we see it"
[Figure: numeric expression measurements for gene1 and gene2 (values such as 0.23, 3, 342.7), illustrating the objects whose similarity we want to assess]
Two main families of clustering algorithms: Hierarchical and Partitional
(How-to) Hierarchical Clustering
Bottom-Up (agglomerative): Starting with each item in its own cluster, find the best pair to merge into a new cluster. Repeat until all clusters are fused together.

The number of dendrograms with n leaves = (2n − 3)! / [2^(n−2) (n − 2)!]

Number of Leaves    Number of Possible Dendrograms
2                   1
3                   3
4                   15
5                   105
…                   …
10                  34,459,425
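As a sanity check, the count above can be computed directly; a minimal Python sketch (the function name is my own) that reproduces the table:

```python
from math import factorial

def num_dendrograms(n):
    """Number of rooted binary dendrograms over n labeled leaves:
    (2n - 3)! / (2^(n-2) * (n - 2)!)."""
    if n < 2:
        return 1
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))

for n in (2, 3, 4, 5, 10):
    print(n, num_dendrograms(n))   # 1, 3, 15, 105, 34459425
```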
We begin with a distance matrix which contains the distances between every pair of objects in our database.

    0   8   8   7   7
        0   2   4   4
            0   3   3
                0   1
                    0

For example, two of its entries: D( , ) = 8 and D( , ) = 1.
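As an illustration, one common way to build such a matrix is from feature vectors with a standard metric; a minimal sketch using SciPy (the vectors below are made up for illustration):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical feature vectors, one row per object
X = np.array([[0.0, 0.0],
              [1.0, 0.5],
              [8.0, 8.0],
              [7.5, 7.0],
              [7.0, 7.5]])

# Condensed pairwise Euclidean distances, expanded to a full n x n matrix
D = squareform(pdist(X, metric="euclidean"))
print(np.round(D, 1))
```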
Bottom-Up (agglomerative):
Starting with each item in its own
cluster, find the best pair to merge into
a new cluster. Repeat until all clusters
are fused together.
Consider all possible merges… choose the best. Repeat at each step until all clusters are fused together.
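In code, that loop looks roughly like the sketch below (helper names are my own), using single link as one possible cluster-to-cluster distance; the linkage choices are discussed next.

```python
def agglomerate(D):
    """Greedy bottom-up merging given a full distance matrix D (list of lists).
    Returns the sequence of merges as (cluster_a, cluster_b, distance)."""
    clusters = [[i] for i in range(len(D))]     # each item starts in its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        # consider all possible merges ...
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(D[i][j] for i in clusters[a] for j in clusters[b])  # single link
                if best is None or d < best[0]:
                    best = (d, a, b)
        # ... and choose the best (closest) pair to fuse
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges
```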
But how do we compute distances between clusters rather than objects?
Computing distance between clusters: Single Link
• cluster distance = distance of the two closest members
− potentially long and skinny clusters
Computing distance between clusters: Complete Link
• cluster distance = distance of the two farthest members
+ tight clusters
Computing distance between
clusters: Average Link
• cluster distance = average distance of all
pairs
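The three rules differ only in how they aggregate the pairwise distances between two clusters; a small sketch (function names are my own), assuming D is a full distance matrix and A, B are lists of member indices:

```python
import numpy as np

def single_link(D, A, B):    # distance of the two closest members
    return np.min(D[np.ix_(A, B)])

def complete_link(D, A, B):  # distance of the two farthest members
    return np.max(D[np.ix_(A, B)])

def average_link(D, A, B):   # average distance over all cross-cluster pairs
    return np.mean(D[np.ix_(A, B)])
```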
Example: single link

Initial distance matrix:

        1   2   3   4   5
    1   0
    2   2   0
    3   6   3   0
    4  10   9   7   0
    5   9   8   5   4   0

After merging 1 and 2 into cluster (1,2):

          (1,2)   3   4   5
    (1,2)   0
    3       3     0
    4       9     7   0
    5       8     5   4   0

The next merge joins (1,2) with 3 (distance 3). The new distances to 4 and 5 are:

d((1,2,3), 4) = min{ d((1,2), 4), d(3, 4) } = min{9, 7} = 7
d((1,2,3), 5) = min{ d((1,2), 5), d(3, 5) } = min{8, 5} = 5
Example: single link

        1   2   3   4   5
    1   0
    2   2   0
    3   6   3   0
    4  10   9   7   0
    5   9   8   5   4   0

          (1,2)   3   4   5
    (1,2)   0
    3       3     0
    4       9     7   0
    5       8     5   4   0

            (1,2,3)   4   5
    (1,2,3)   0
    4         7       0
    5         5       4   0

d((1,2,3), (4,5)) = min{ d((1,2,3), 4), d((1,2,3), 5) } = min{7, 5} = 5
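The same computation can be checked with SciPy's hierarchical clustering; a minimal sketch on the toy matrix above, where the merge heights 2, 3, 4, 5 match the hand calculation:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Toy distance matrix from the example (symmetric, zero diagonal)
D = np.array([[ 0,  2,  6, 10,  9],
              [ 2,  0,  3,  9,  8],
              [ 6,  3,  0,  7,  5],
              [10,  9,  7,  0,  4],
              [ 9,  8,  5,  4,  0]], dtype=float)

# linkage() expects the condensed (upper-triangular) form
Z = linkage(squareform(D), method="single")
print(Z)   # one row per merge: [cluster_i, cluster_j, height, size]
```

Passing Z to scipy.cluster.hierarchy.dendrogram would draw the corresponding tree.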
Single linkage dendrogram over 30 points (leaf labels omitted). Height represents the distance at which two clusters are merged; a point that joins the rest only at a large height is marked as an outlier.
Example: clustering genes
• Microarrays measure the activities of all genes under different conditions
[Figure: genes plotted by expression in condition 1 (x-axis) against a second condition, partitioned into three clusters k1, k2, k3 by Gaussian mixture clustering]
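For a concrete (toy) version of this picture, a Gaussian mixture can be fit to a gene-by-condition expression matrix; a hedged sketch with scikit-learn, using random stand-in data rather than a real microarray:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = rng.random((100, 2)) * 5          # stand-in for 100 genes measured in 2 conditions

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
labels = gmm.predict(X)               # cluster label (e.g. k1, k2, k3) for each gene
```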
Clustering methods: Comparison
Hierarchical vs. K-means vs. GMM
When k = 1, the objective function is 873.0
When k = 2, the objective function is 173.1
When k = 3, the objective function is 133.6
We can plot the objective function values for k equals 1 to 6…
[Plot: Objective Function (y-axis, 0 to 900) vs. k (x-axis, 1 to 6)]
Note that the results are not always as clear-cut as in this toy example.
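One way to produce such a plot is to record the k-means objective (within-cluster sum of squares) for each k; a short sketch with scikit-learn on stand-in data (the numbers will differ from the toy example above):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.random((100, 2)) * 10         # stand-in 2-D data

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))   # inertia_ is the objective function value
```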
Cross validation
• We can also use cross validation to determine the correct number of clusters
• Recall that a GMM is a generative model. We can compute the likelihood of the held-out data to determine which model (number of clusters) is more accurate:
p(x_1 … x_n | Θ) = ∏_{j=1}^{n} Σ_{i=1}^{k} w_i · p(x_j | C_i)
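A hedged sketch of this idea with scikit-learn: hold out part of the data, fit a GMM for each candidate number of components, and compare average held-out log-likelihoods (the data here is a random stand-in):

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((300, 2))              # stand-in data

X_train, X_test = train_test_split(X, random_state=0)
for k in range(1, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X_train)
    print(k, round(gmm.score(X_test), 3))   # mean held-out log-likelihood per sample
```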
What you should know
• Why clustering is useful
• The different types of clustering algorithms
• The assumptions each type makes, and what we can get from them
• Unsolved issues: number of clusters, initialization, etc.