Spectral Clustering: Eyal David Image Processing Seminar May 2008
Eyal David
Image Processing seminar
May 2008
Lecture Outline
Motivation
Graph overview and construction
Demo
Spectral Clustering
Demo
Cool implementations
[Figure: 2-D scatter plot of example data, axis ticks ranging from -2 to 2. Surviving annotation: "... poorly in this space due to bias toward dense spherical clusters."]
Motivation
Given a set of points $S = \{s_1, \ldots, s_n\} \subset \mathbb{R}^l$,
we would like to cluster them into k subsets
Slides from Spectral Clustering by Rebecca Nugent, Larissa Stanberry
based on Ng et al On Spectral clustering: analysis and algorithm
Algorithm
Form the affinity matrix $W \in \mathbb{R}^{n \times n}$
Define $W_{ij} = \exp\left(-\|s_i - s_j\|^2 / (2\sigma^2)\right)$ if $i \neq j$, and $W_{ii} = 0$
Scaling parameter $\sigma^2$ chosen by user
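The affinity definition above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the slides' own code: the function name `gaussian_affinity` and the list-of-tuples input format are assumptions made for the example.

```python
import math

def gaussian_affinity(points, sigma2):
    """Return W with W[i][j] = exp(-||s_i - s_j||^2 / (2 * sigma2)) and W[i][i] = 0.

    `points` is a sequence of equal-length coordinate tuples (assumed format);
    `sigma2` is the user-chosen scaling parameter sigma^2.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]  # the diagonal stays at zero
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w = math.exp(-sq_dist / (2.0 * sigma2))
            W[i][j] = w  # W is symmetric, so fill both halves
            W[j][i] = w
    return W
```

A small sigma2 makes the affinities fall off quickly with distance, so only near neighbours stay strongly connected; a large sigma2 makes all points look similar.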
Algorithm
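Form the matrix Y
Renormalize each of X’s rows to have unit length, where X is the n × k matrix whose columns are the eigenvectors for the k largest eigenvalues of the normalized affinity matrix (Ng et al.)
$Y_{ij} = X_{ij} / \left(\sum_j X_{ij}^2\right)^{1/2}$, $Y \in \mathbb{R}^{n \times k}$
Treat each row of Y as a point in $\mathbb{R}^k$
Cluster into k clusters via K-means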
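The row renormalization step can be sketched as follows. The function name `normalize_rows` and the nested-list representation of X are assumptions for the example, and the handling of an all-zero row is a choice made here, not something the slides specify.

```python
import math

def normalize_rows(X):
    """Scale each row of X to unit Euclidean length: Y[i][j] = X[i][j] / ||X[i]||."""
    Y = []
    for row in X:
        norm = math.sqrt(sum(x * x for x in row))
        if norm > 0:
            Y.append([x / norm for x in row])
        else:
            # Assumption: leave an all-zero row unchanged to avoid dividing by zero.
            Y.append(list(row))
    return Y
```

After this step every nonzero row of Y lies on the unit sphere in k dimensions, and those rows are the points handed to K-means.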
Algorithm
Final Cluster Assignment
Assign point $s_i$ to cluster j iff row i of Y was
assigned to cluster j
Why?
If we eventually use K-means, why not just
apply K-means to the original data?
Some Examples
User’s Prerogative
Affinity matrix construction
Choice of scaling factor
Realistically, search over $\sigma^2$ and pick the value that
gives the tightest clusters
Choice of k, the number of clusters
Choice of clustering method
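One way to read "search over the scaling factor and pick the tightest clusters" is sketched below. It assumes tightness is measured as within-cluster squared distortion of the rows of Y. The names `distortion` and `pick_sigma2`, and the caller-supplied `embed_and_cluster` callable (which would run the whole pipeline for one value of sigma^2 and return the rows of Y with their labels), are hypothetical.

```python
def distortion(Y, labels):
    """Sum of squared distances from each row of Y to its cluster centroid."""
    groups = {}
    for row, label in zip(Y, labels):
        groups.setdefault(label, []).append(row)
    total = 0.0
    for rows in groups.values():
        dim = len(rows[0])
        centroid = [sum(r[d] for r in rows) / len(rows) for d in range(dim)]
        total += sum((r[d] - centroid[d]) ** 2 for r in rows for d in range(dim))
    return total

def pick_sigma2(candidates, embed_and_cluster):
    """Return the candidate sigma^2 whose clustering has the smallest distortion.

    `embed_and_cluster(sigma2)` is a hypothetical callable returning (Y, labels).
    """
    if not candidates:
        raise ValueError("need at least one candidate value")
    best_sigma2, best_score = None, None
    for sigma2 in candidates:
        Y, labels = embed_and_cluster(sigma2)
        score = distortion(Y, labels)
        if best_score is None or score < best_score:
            best_sigma2, best_score = sigma2, score
    return best_sigma2
```

Because the rows of Y are renormalized to unit length, distortions computed for different values of the scaling factor are on a comparable scale, which is what makes this comparison meaningful.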
How to select k?
Eigengap: the difference between two consecutive eigenvalues.
Most stable clustering is generally given by the value k that
maximises the expression
$\Delta_k = |\lambda_k - \lambda_{k-1}|$
[Figure: largest eigenvalues of the Cisi/Medline data; x-axis K (1 to 20), y-axis Eigenvalue (0 to 50), with λ1 and λ2 marked.]
$\max_k \Delta_k = |\lambda_2 - \lambda_1|$
Choose k=2
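The eigengap rule as stated on this slide can be sketched as below. The function name `choose_k_by_eigengap` is an assumption, and the code follows the slide's indexing, in which the gap between the k-th and (k-1)-th largest eigenvalues is attributed to k.

```python
def choose_k_by_eigengap(eigenvalues):
    """Return the k >= 2 that maximises |lambda_k - lambda_{k-1}|.

    Eigenvalues are sorted in decreasing order first, so lambda_1 is the largest.
    Ties are resolved in favour of the smallest k.
    """
    lam = sorted(eigenvalues, reverse=True)
    if len(lam) < 2:
        raise ValueError("need at least two eigenvalues")
    best_k, best_gap = None, -1.0
    for k in range(2, len(lam) + 1):
        gap = abs(lam[k - 1] - lam[k - 2])  # 0-based access to lambda_k and lambda_{k-1}
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k
```

For a made-up spectrum such as `[45.0, 20.0, 18.0, 17.5]` (illustrative numbers, not the Cisi/Medline values), the largest gap lies between the first two eigenvalues, so the function returns 2.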
The End