Clustering
Computer Vision
CSE M164
Today’s class
• Fitting and alignment
– One more algorithm: ICP
– Review of all the algorithms
• Clustering algorithms
– K-means
– Hierarchical clustering
– Spectral clustering
What if you want to align but have no prior
matched pairs?
• Important applications
[Figure: two point sets, A1–A3 and B1–B3, to be aligned]
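This is the setting for ICP (iterative closest point) from today's agenda: alternate between matching each point to its current nearest neighbor and re-fitting the transformation. Below is a translation-only sketch, assuming 2-D point arrays; this is a simplification, since full ICP also estimates rotation.

```python
import numpy as np

def icp_translation(A, B, n_iters=20):
    """Translation-only ICP sketch: align point set A to point set B.

    A: (n, 2) array, B: (m, 2) array of 2D points.
    Returns the estimated translation t such that A + t ~ B.
    """
    t = np.zeros(2)
    for _ in range(n_iters):
        moved = A + t
        # Match each moved point of A to its nearest neighbor in B.
        d = np.linalg.norm(moved[:, None, :] - B[None, :, :], axis=2)
        nearest = B[d.argmin(axis=1)]
        # Least-squares translation update: mean of the residuals.
        t += (nearest - moved).mean(axis=0)
    return t
```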
Given matched points in {A} and {B}, estimate the translation of the object
$$x_i^B = x_i^A + t_x, \qquad y_i^B = y_i^A + t_y$$
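With known correspondences, the least-squares estimate of the translation is simply the mean displacement between matched points; a small NumPy check (the example points are made up):

```python
import numpy as np

# A and B are (n, 2) arrays of corresponding points (A[i] matches B[i]).
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
B = A + np.array([2.0, 3.0])  # B is A translated by (tx, ty) = (2, 3)

t = (B - A).mean(axis=0)  # least-squares estimate of (tx, ty)
print(t)                  # -> [2. 3.]
```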
Example: solving for translation
[Figure: matched points A1–A3 translated by (tx, ty) onto B1–B3]
[Figure: the same example with additional points A4, A5, B4, B5]
Problem: outliers
[Figure: translation estimation with outlier matches; points A4–A6 have no correct counterpart]
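A standard remedy from the fitting-and-alignment toolbox is RANSAC (named here as the usual fix, not quoted from these slides): fit the translation to a randomly sampled match, count how many matches agree, repeat, and re-fit on the best inlier set.

```python
import numpy as np

def ransac_translation(A, B, n_trials=100, inlier_thresh=1.0, seed=None):
    """RANSAC sketch for translation: A[i] is putatively matched to B[i].

    Returns the translation re-fit on the largest inlier set found.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(A), dtype=bool)
    for _ in range(n_trials):
        i = rng.integers(len(A))   # one match fully determines a translation
        t = B[i] - A[i]
        residuals = np.linalg.norm(B - (A + t), axis=1)
        inliers = residuals < inlier_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refine with least squares on all inliers of the best hypothesis.
    return (B[best_inliers] - A[best_inliers]).mean(axis=0)
```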
Key Challenges:
1) What makes two points/images/patches similar?
2) How do we compute an overall grouping from
pairwise similarities?
Why do we cluster?
• Summarizing data
– Look at large amounts of data
– Patch-based compression or denoising
– Represent a large continuous vector with the cluster number
• Counting
– Histograms of texture, color, SIFT vectors
• Segmentation
– Separate the image into different regions
• Prediction
– Images in the same cluster may have the same labels
How do we cluster?
• K-means
– Iteratively re-assign points to the nearest cluster center
• Agglomerative clustering
– Start with each point as its own cluster and iteratively
merge the closest clusters
• Spectral clustering
– Split the nodes of a graph whose links are weighted by pairwise similarity
Clustering for Summarization
Goal: cluster to minimize variance in data given clusters
– Preserve information

$$c^*, \delta^* = \operatorname*{argmin}_{c,\,\delta}\; \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{K} \delta_{ij}\,\big(c_i - x_j\big)^2$$

where $\delta_{ij}$ indicates whether data point $x_j$ is assigned to cluster center $c_i$.
K-means
1. Initialize cluster centers $c^0$ and set $t = 0$.
2. Assign each point to the closest center:
$$\delta^t = \operatorname*{argmin}_{\delta}\; \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{K} \delta_{ij}\,\big(c_i^{t-1} - x_j\big)^2$$
3. Update the cluster centers as the mean of their assigned points:
$$c^t = \operatorname*{argmin}_{c}\; \frac{1}{N} \sum_{j=1}^{N} \sum_{i=1}^{K} \delta^t_{ij}\,\big(c_i - x_j\big)^2$$
4. Repeat steps 2–3 until no points are re-assigned.
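A minimal NumPy sketch of steps 1–4 (function and parameter names are illustrative, not the lecture's reference code):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=None):
    """Basic K-means on (N, d) data; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # step 1
    for _ in range(n_iters):
        # Step 2: assign each point to the closest center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        # Step 3: move each center to the mean of its assigned points.
        new_centers = np.array([
            X[assign == i].mean(axis=0) if (assign == i).any() else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):  # step 4: stop at convergence
            break
        centers = new_centers
    return centers, assign
```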
• Distance measures
– Traditionally Euclidean, could be others
• Optimization
– Will converge to a local minimum
– May want to perform multiple restarts
How to evaluate clusters?
• Generative
– How well are points reconstructed from the clusters?
– Example: Predict the next word in a sequence
• Discriminative
– How well do the clusters correspond to labels?
• Purity
– Example: Spectral clustering
– Note: unsupervised clustering does not aim to be
discriminative
How to choose the number of clusters?
• Validation set
– Try different numbers of clusters and look at
performance
• When building dictionaries (discussed later), more
clusters typically work better
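One concrete version of the validation-set recipe: fit clusterings for several values of K on training data and score each on held-out data, looking for diminishing returns. The reconstruction-error metric below is an assumption; any task-specific performance measure works the same way.

```python
import numpy as np

def held_out_error(X_val, centers):
    """Mean squared distance from held-out points to their nearest center."""
    d = np.linalg.norm(X_val[:, None, :] - centers[None, :, :], axis=2)
    return (d.min(axis=1) ** 2).mean()

# e.g., using the kmeans sketch above:
# scores = {k: held_out_error(X_val, kmeans(X_train, k)[0]) for k in (8, 16, 32)}
```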
Conclusions: K-means
Good
• Finds cluster centers that minimize conditional variance (good
representation of data)
• Simple to implement, widespread application
Bad
• Prone to local minima
• Need to choose K
• All clusters have the same parameters (e.g., distance measure
is non-adaptive)
• Can be slow: each iteration is O(KNd) for N d-dimensional
points
Building Visual Dictionaries
1. Sample patches from a database
– E.g., 128-dimensional SIFT vectors
2. Cluster the patches
– The cluster centers form the dictionary of codewords
3. Assign a codeword (number) to each new patch, according to the nearest cluster
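These three steps map directly onto the kmeans function sketched earlier; a sketch in which random data stands in for real SIFT descriptors, with the dictionary size of 256 chosen arbitrarily:

```python
import numpy as np

# 1. Sample patch descriptors from a database (e.g., 128-D SIFT vectors).
descriptors = np.random.default_rng(0).normal(size=(10000, 128))

# 2. Cluster them; the cluster centers are the dictionary of codewords.
dictionary, _ = kmeans(descriptors, k=256)

# 3. Assign each new patch descriptor the index of its nearest codeword.
def codeword(patch_descriptor, dictionary):
    return np.argmin(np.linalg.norm(dictionary - patch_descriptor, axis=1))
```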
Examples of learned codewords
Common similarity/distance measures
• Mahalanobis
– Scaled Euclidean
• Cosine distance
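For concreteness, both measures in NumPy (a sketch; `cov` stands for an assumed feature covariance matrix):

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Scaled Euclidean distance: accounts for feature covariance."""
    d = x - y
    return np.sqrt(d @ np.linalg.inv(cov) @ d)

def cosine_distance(x, y):
    """1 minus the cosine similarity of the two vectors."""
    return 1.0 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
```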
K-medoids
• Just like K-means except
– Represent the cluster with one of its members,
rather than the mean of its members
– Choose the member (data point) that minimizes cluster dissimilarity (e.g., total distance to the other members)
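Only the center-update step differs from K-means; a minimal sketch of choosing a medoid (assignment to the nearest medoid works as in K-means):

```python
import numpy as np

def medoid(points):
    """Return the member of `points` (n, d) that minimizes total
    distance to the other members, i.e., the least-dissimilar point."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return points[np.argmin(d.sum(axis=1))]
```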
Conclusions: Agglomerative Clustering
Good
• Simple to implement, widespread application
• Clusters have adaptive shapes
• Provides a hierarchy of clusters
Bad
• May have imbalanced clusters
• Still have to choose number of clusters or threshold
• Need to use an “ultrametric” to get a meaningful
hierarchy
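For reference, a short example with SciPy's hierarchical-clustering utilities (the "average" linkage choice is an assumption, not prescribed by the slides):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.random.default_rng(0).normal(size=(50, 2))  # toy 2-D points

# Start with each point as its own cluster and iteratively merge the
# closest pair; "average" linkage here ("single"/"complete" also common).
Z = linkage(X, method="average")

# Cut the resulting hierarchy into a fixed number of clusters.
labels = fcluster(Z, t=4, criterion="maxclust")
```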
Spectral clustering
Group points based on links in a graph
[Figure: similarity graph with two groups, A and B]
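A minimal sketch of a standard spectral-clustering pipeline (similarity graph, graph Laplacian, eigenvector embedding, then K-means); the Gaussian affinity and its bandwidth `sigma` are assumptions, not values from the lecture:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_cluster(X, k, sigma=1.0):
    """Cluster (n, d) points via the spectrum of the graph Laplacian."""
    # Links weighted by pairwise similarity (Gaussian affinity).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-d2 / (2 * sigma ** 2))
    D = np.diag(W.sum(axis=1))
    L = D - W                          # unnormalized graph Laplacian
    # Embed each point using the k eigenvectors of smallest eigenvalue.
    _, vecs = np.linalg.eigh(L)
    embedding = vecs[:, :k]
    # Group points in the embedded space with ordinary K-means.
    _, labels = kmeans2(embedding, k, minit="points")
    return labels
```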
Cuts in a graph
[Figure: a cut through the graph separating group A from group B]
Normalized Cut
• A cut penalizes large segments
• Fix by normalizing for the size of the segments
Source: Seitz
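Concretely, the normalized-cut criterion of Shi and Malik divides the cut cost by each side's total association with the whole graph, so cutting off a small isolated segment is no longer cheap:

$$\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$$

where cut(A, B) is the total weight of edges crossing the cut and assoc(A, V) is the total weight of edges touching A.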
Normalized cuts for segmentation
Visual PageRank
• Determining importance by random walk
– What’s the probability that you will randomly walk
to a given node?
• Create adjacency matrix based on visual similarity
• Edge weights determine probability of transition
http://www.cs.berkeley.edu/~arbelaez/UCM.html
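The stationary distribution of that random walk can be computed by power iteration on the row-normalized similarity matrix; a sketch, where the damping factor follows the usual PageRank convention and is an assumption here:

```python
import numpy as np

def visual_pagerank(W, damping=0.85, n_iters=100):
    """Importance scores from a random walk on similarity matrix W (n x n)."""
    P = W / W.sum(axis=1, keepdims=True)   # edge weights -> transition probs
    n = len(W)
    r = np.full(n, 1.0 / n)
    for _ in range(n_iters):
        # With prob. `damping` follow a similarity edge, else jump uniformly.
        r = damping * (r @ P) + (1 - damping) / n
    return r
```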
Which algorithm to use?
• Image segmentation: spectral clustering
– Can provide more regular regions
– Spectral methods also used to propagate global
cues
Things to remember
• K-means useful for summarization,
building dictionaries of patches,
general clustering