Minimum Spanning Trees: Application To Clustering
Algorithms: Design and Analysis, Part II
Clustering
[aka “unsupervised learning”]
Informal goal: Given n “points” [Web pages, images, genome fragments, etc.], classify them into “coherent groups”.
Assumptions: (1) As input, we are given a (dis)similarity measure: a distance d(p, q) between each pair of points.
(2) The measure is symmetric [i.e., d(p, q) = d(q, p)].
Examples: Euclidean distance, genome similarity, etc.
Goal: Same cluster ⇐⇒ “nearby”
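For concreteness, here is a minimal sketch of one such symmetric measure, Euclidean distance, in Python (the function name and the tuple-of-coordinates point representation are illustrative assumptions, not from the lecture):

    import math

    def euclidean_distance(p, q):
        # Symmetric dissimilarity measure: d(p, q) = d(q, p).
        # Points are tuples of coordinates (an illustrative choice;
        # any representation with a symmetric distance works).
        return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))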
Max-Spacing k-Clusterings
Assume: We know k := # of clusters desired. [In practice, one can experiment with a range of values.]
Call points p & q separated if they’re assigned to different clusters.
The spacing of a clustering is the minimum distance between any pair of separated points; the goal is to compute the k-clustering with maximum spacing.
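To make the objective concrete, here is a minimal sketch that computes the spacing of a given clustering (the cluster_of mapping, the function names, and the assumption of at least two clusters are illustrative, not from the lecture):

    from itertools import combinations

    def spacing(points, cluster_of, d):
        # Spacing = minimum d(p, q) over all separated pairs,
        # i.e., pairs of points assigned to different clusters.
        # cluster_of maps each point to a cluster label; d is a
        # symmetric distance function (illustrative assumptions).
        return min(d(p, q)
                   for p, q in combinations(points, 2)
                   if cluster_of[p] != cluster_of[q])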
A Greedy Algorithm
- Initially, each point in a separate cluster.
- Repeat until only k clusters:
  - Let p, q = closest pair of separated points (determines the current spacing).
  - Merge the clusters containing p & q into a single cluster.
Note: Just like Kruskal’s MST algorithm, but stopped early.
- Points ↔ vertices, distances ↔ edge costs, point pairs ↔ edges.
⇒ Called single-link clustering
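Since the procedure is exactly Kruskal’s algorithm stopped early, a union-find sketch suffices. Here is a minimal Python version (the helper names are illustrative assumptions; points are assumed hashable):

    from itertools import combinations

    def single_link_clustering(points, d, k):
        # Greedy max-spacing k-clustering: run Kruskal's MST
        # algorithm on the complete graph of point pairs, but
        # stop as soon as only k clusters remain.
        parent = {p: p for p in points}  # union-find forest

        def find(p):
            while parent[p] != p:
                parent[p] = parent[parent[p]]  # path compression
                p = parent[p]
            return p

        # Point pairs <-> edges; distances <-> edge costs.
        pairs = sorted(combinations(points, 2), key=lambda pq: d(*pq))

        clusters = len(points)  # initially, one cluster per point
        for p, q in pairs:
            if clusters == k:
                break
            rp, rq = find(p), find(q)
            if rp != rq:  # p & q separated: merge their clusters
                parent[rp] = rq
                clusters -= 1

        # Group points by their union-find representative.
        groups = {}
        for p in points:
            groups.setdefault(find(p), []).append(p)
        return list(groups.values())

Sorting the Θ(n²) point pairs dominates the running time, just as sorting the edges dominates Kruskal’s algorithm.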