
APPLICATIONS OF GRAPH LAPLACEANS: Clustering: Background

• Details on clustering
• K-means
• Similarity graphs, kNN graphs
• Edge cuts, ratio cuts, etc.
• Application: segmentation

CLUSTERING

➤ Problem: we are given n data items x1, x2, · · · , xn. We would like to ‘cluster’ them, i.e., group them so that each group or cluster contains items that are similar in some sense.

➤ Example: materials [figure: materials grouped into classes such as Photovoltaic, Superhard, Superconductors, Ferromagnetic, Catalytic, Multi-ferroics, Thermo-electric]

➤ Example: digits [figure: PCA projection of handwritten digits 5, 6, 7]

➤ Refer to each group as a ‘cluster’ or a ‘class’

➤ A basic method: K-means


A basic method: K-means

➤ A basic algorithm that uses the Euclidean distance

1. Select p initial centers c1, c2, . . . , cp for classes 1, 2, · · · , p
2. For each xi: determine the class of xi as argmin_k ‖xi − ck‖
3. Redefine each ck to be the centroid of class k
4. Repeat until convergence

[figure: 2-D data points grouped around three centers c1, c2, c3]

➤ Simple algorithm

➤ Works well (gives good results) but can be slow

➤ Performance depends on initialization

Methods based on similarity graphs

➤ Class of methods that perform clustering by exploiting a graph that describes the similarities between any two items in the data.

➤ Need to:

1. decide which nodes are in the neighborhood of a given node
2. quantify their similarities, by assigning a weight to any pair of nodes.

Example: For text data one can decide that any columns i and j with a cosine greater than 0.95 are ‘similar’ and assign that cosine value to wij.
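A minimal NumPy sketch of the K-means iteration described above (steps 1–4); the function name and the random initialization are illustrative choices, not from the slides:

```python
import numpy as np

def kmeans(X, p, n_iter=100, seed=0):
    """Basic K-means on the rows of X (n points, d features)."""
    rng = np.random.default_rng(seed)
    # 1. select p initial centers (here: random data points)
    centers = X[rng.choice(len(X), size=p, replace=False)]
    for _ in range(n_iter):
        # 2. assign each x_i to the class argmin_k ||x_i - c_k||
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. redefine each c_k as the centroid of class k
        new_centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                                else centers[k] for k in range(p)])
        # 4. repeat until convergence (centers stop moving)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```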


First task: build a ‘similarity’ graph

➤ Goal: to build a similarity graph, i.e., a graph that captures the similarity between any two items

[figure: two nodes i and j joined by an edge of weight w(i,j) = ?]

➤ Two methods: K-nearest neighbor graphs, or use a Gaussian (‘heat’) kernel

K-nearest neighbor graphs

➤ Given: a set of n data points X = {x1, . . . , xn} → vertices

➤ Given: a proximity measure between two data points xi and xj, as measured by a quantity dist(xi, xj)

➤ Want: for each point xi, a list of the ‘nearest neighbors’ of xi (edges between xi and these nodes).

➤ Note: the graph will usually be directed → need to symmetrize


Nearest neighbor graphs

➤ For each node, get a few of the nearest neighbors → graph

[figure: a cloud of data points turned into a nearest-neighbor graph]

➤ Problem: how to build a nearest-neighbor graph from given data

➤ We will revisit this later.

Two types of nearest neighbor graph often used:

ε-graph: edges consist of pairs (xi, xj) such that ρ(xi, xj) ≤ ε

kNN graph: nodes adjacent to xi are those nodes xℓ with the k smallest distances ρ(xi, xℓ).

➤ The ε-graph is undirected and is geometrically motivated. Issues: 1) it may result in disconnected components; 2) what ε?

➤ kNN graphs are directed in general (this can be trivially fixed).

➤ kNN graphs are especially useful in practice.

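A small dense NumPy sketch of both constructions (brute-force pairwise distances; the function names are illustrative and this is only practical for modest n):

```python
import numpy as np

def knn_graph(X, k):
    """kNN graph on the rows of X: connect x_i to its k nearest neighbors, then symmetrize."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # pairwise distances
    np.fill_diagonal(D, np.inf)                                  # no self-loops
    A = np.zeros_like(D)
    nbrs = np.argsort(D, axis=1)[:, :k]                          # k smallest distances per row
    A[np.repeat(np.arange(len(X)), k), nbrs.ravel()] = 1.0       # directed edges i -> neighbor
    return np.maximum(A, A.T)                                    # symmetrize the directed graph

def eps_graph(X, eps):
    """Epsilon-graph: connect x_i and x_j whenever dist(x_i, x_j) <= eps (undirected)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    A = (D <= eps).astype(float)
    np.fill_diagonal(A, 0.0)
    return A
```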


Similarity graphs: Using ‘heat kernels’

Define the weight between i and j as:

    wij = fij × exp(−‖xi − xj‖² / σ_X²)   if ‖xi − xj‖ < r
    wij = 0                               otherwise

➤ Note: ‖xi − xj‖ could be any measure of distance...

➤ fij = optional = some measure of similarity, other than distance

➤ Only nearby points are kept.

➤ Sparsity depends on the parameters

Edge cuts, ratio cuts, normalized cuts, ...

➤ Assume now that we have built a ‘similarity graph’

➤ The setting is identical with that of graph partitioning.

➤ Need a graph Laplacean: L = D − W with wii = 0, wij ≥ 0, and D = diag(W ∗ ones(n, 1)) [in matlab notation]

➤ Partition the vertex set V into two sets A and B with

    A ∪ B = V,   A ∩ B = ∅

➤ Define

    cut(A, B) = Σ_{u∈A, v∈B} w(u, v)

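A dense NumPy sketch of these three ingredients (illustrative names; a real implementation would use sparse matrices):

```python
import numpy as np

def heat_kernel_weights(X, sigma, r, F=None):
    """w_ij = f_ij * exp(-||x_i - x_j||^2 / sigma^2) for nearby points only (else 0)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.exp(-D**2 / sigma**2)
    if F is not None:            # optional extra similarity factor f_ij
        W = W * F
    W[D >= r] = 0.0              # only nearby points are kept -> sparsity
    np.fill_diagonal(W, 0.0)     # w_ii = 0
    return W

def laplacian(W):
    """Graph Laplacean L = D - W with D = diag(W @ ones(n))."""
    return np.diag(W.sum(axis=1)) - W

def cut_value(W, A):
    """cut(A, B): total weight of edges between A (boolean mask) and its complement B."""
    A = np.asarray(A, dtype=bool)
    return W[np.ix_(A, ~A)].sum()
```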

➤ First (naive) approach: use this measure to partition the graph, i.e., find A and B that minimize cut(A, B).

➤ Issue: small sets, isolated nodes, big imbalances

[figure: a graph in which the minimum cuts (‘Min-cut 1’, ‘Min-cut 2’) isolate one or two nodes, while a more balanced ‘Better cut’ is preferable]

Ratio-cuts

➤ Standard graph partitioning approach: find A, B by solving

    Minimize cut(A, B)   subject to   |A| = |B|

➤ The condition |A| = |B| is not too meaningful in some applications, and too restrictive in others.

➤ Minimum Ratio Cut approach. Find A, B by solving:

    Minimize cut(A, B) / (|A| · |B|)

➤ Difficult to find a solution (the original paper [Wei-Cheng ’91] proposes several heuristics)

➤ Approximate solution: spectral.



Theorem [Hagen-Kahng ’91]: If λ2 is the 2nd smallest eigenvalue of L, then a lower bound for the cost c of the optimal ratio cut partition is:

    c ≥ λ2 / n.

Proof: Consider an optimal partition A, B and let p = |A|/n, q = |B|/n. Note that p + q = 1. Let x be the vector with coordinates

    xi = q    if i ∈ A
    xi = −p   if i ∈ B

Note that x ⊥ 1. Also, if (i, j) is a cut edge then |xi − xj| = |q − (−p)| = |q + p| = 1; otherwise xi − xj = 0. Therefore,

    xᵀ L x = Σ_{(i,j)∈E} wij (xi − xj)² = w(A, B).

In addition:

    ‖x‖² = p q² n + q p² n = pq (p + q) n = pq n = |A| · |B| / n.

Therefore, by the Courant-Fischer theorem:

    λ2 ≤ (Lx, x) / (x, x) = n × w(A, B) / (|A| · |B|) = n × c.

Hence the result.

➤ Idea: use the eigenvector associated with λ2 to determine the partition, e.g., based on the sign of its entries. Use the ratio-cut measure to actually determine where to split.

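A NumPy sketch of this spectral heuristic; one common way to realize "use the ratio-cut measure to determine where to split" is to sweep along the sorted second eigenvector (the function name and the dense eigensolver are illustrative):

```python
import numpy as np

def ratio_cut_partition(W):
    """Split on the eigenvector of the 2nd smallest eigenvalue of L = D - W,
    choosing the split point along the sorted vector that minimizes cut(A,B)/(|A||B|)."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)              # eigenvalues in ascending order
    fiedler = vecs[:, 1]                     # eigenvector associated with lambda_2
    order = np.argsort(fiedler)
    best_score, best_A = np.inf, None
    for m in range(1, n):                    # sweep over candidate split points
        A = np.zeros(n, dtype=bool)
        A[order[:m]] = True
        score = W[np.ix_(A, ~A)].sum() / (m * (n - m))   # ratio-cut objective
        if score < best_score:
            best_score, best_A = score, A
    return best_A, best_score
```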

Normalized cuts [Shi-Malik, 2000]

➤ Recall the notation w(X, Y) = Σ_{x∈X, y∈Y} w(x, y) - then define:

    ncut(A, B) = cut(A, B) / w(A, V) + cut(A, B) / w(B, V)

➤ Goal is to avoid small sets A, B

Exercise 1: What is w(A, V) in the case when wij ≡ 1?

➤ Let x be an indicator vector:

    xi = 1   if i ∈ A
    xi = 0   if i ∈ B

➤ Recall that: xᵀ L x = Σ_{(i,j)∈E} wij |xi − xj|²   (note: each edge counted once)

➤ Therefore:

    cut(A, B) = Σ_{xi=1, xj=0} wij = xᵀ L x

    w(A, V) = Σ_{xi=1} di = xᵀ W 1 = xᵀ D 1

    w(B, V) = Σ_{xj=0} dj = (1 − x)ᵀ W 1 = (1 − x)ᵀ D 1

➤ Goal now: to minimize ncut

    min_{A,B} ncut(A, B) = min_{xi∈{0,1}}  (xᵀ L x)/(xᵀ D x) + (xᵀ L x)/((1 − x)ᵀ D (1 − x))



➤ Let

    β = w(A, V) / w(B, V) = (xᵀ D 1) / ((1 − x)ᵀ D 1)    and    y = x − β (1 − x)

➤ Then we need to solve:

    min_{yi ∈ {1, −β}}  (yᵀ L y)/(yᵀ D y)    subject to   yᵀ D 1 = 0

➤ Relax → need to solve the generalized eigenvalue problem

    L y = λ D y

➤ y1 = 1 is the eigenvector associated with the eigenvalue λ1 = 0

➤ y2, associated with the second smallest eigenvalue, solves the problem.

A few properties

Exercise 2: Show that

    ncut(A, B) = σ × cut(A, B) / (w(A, V) × w(B, V))

where σ is a constant.

Exercise 3: How do ratio-cuts and normalized cuts compare when the graph is d-regular (same degree for each node)?

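A sketch of the relaxed normalized cut using SciPy's symmetric-definite generalized eigensolver; it assumes every node has positive degree (so D is positive definite) and uses the simplest possible split, the sign of y2:

```python
import numpy as np
from scipy.linalg import eigh

def normalized_cut_partition(W):
    """Relaxation: solve L y = lambda * D y and split on the sign of y2."""
    d = W.sum(axis=1)
    D = np.diag(d)                  # assumes d > 0 everywhere (D positive definite)
    L = D - W
    _, vecs = eigh(L, D)            # generalized eigenproblem, ascending eigenvalues
    y2 = vecs[:, 1]                 # eigenvector for the second smallest eigenvalue
    return y2 >= 0                  # boolean mask for the set A
```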

Extension to more than 2 clusters

➤ Just like graph partitioning we can:

1. Apply the method recursively [repeat the clustering on the resulting parts], or
2. Compute a few eigenvectors and run K-means clustering on these eigenvectors to get the clustering.

Application: Image segmentation

➤ First task: obtain a graph from the pixels.

➤ Common idea: use “heat kernels”

➤ Let Fj = feature value (e.g., brightness), and let Xj = spatial position. Then define

    wij = exp(−‖Fi − Fj‖² / σ_I²) × exp(−‖Xi − Xj‖² / σ_X²)   if ‖Xi − Xj‖ < r
    wij = 0                                                     otherwise

➤ Sparsity depends on the parameters

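A NumPy sketch of these pixel weights, taking F as an n × q array of per-pixel feature vectors and X as an n × 2 array of pixel coordinates (dense, so only for small images; the names are illustrative):

```python
import numpy as np

def image_weights(F, X, sigma_I, sigma_X, r):
    """w_ij = exp(-||F_i-F_j||^2/sigma_I^2) * exp(-||X_i-X_j||^2/sigma_X^2) for nearby pixels."""
    dF = np.linalg.norm(F[:, None, :] - F[None, :, :], axis=2)   # feature distances
    dX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # spatial distances
    W = np.exp(-dF**2 / sigma_I**2) * np.exp(-dX**2 / sigma_X**2)
    W[dX >= r] = 0.0                                             # keep only nearby pixels
    np.fill_diagonal(W, 0.0)
    return W
```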


Spectral clustering: General approach

1. Given: a collection of data samples {x1, x2, · · · , xn}
2. Build a similarity graph between the items [figure: nodes i and j joined by an edge of weight w(i,j) = ?]
3. Compute the (smallest) eigenvector(s) of the resulting graph Laplacean
4. Use k-means on the eigenvector(s) of the Laplacean

➤ For normalized cuts, solve the generalized eigenproblem.

[figure: a graph with several connected components]

➤ Algebraic multiplicity of the eigenvalue zero = # of connected components.
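A compact sketch of steps 1–4, starting from a precomputed weight matrix W; it uses SciPy's kmeans2 for the final step, and the function name is illustrative:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(W, k):
    """Laplacean -> eigenvectors of the k smallest eigenvalues -> k-means on their rows."""
    L = np.diag(W.sum(axis=1)) - W
    _, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    U = vecs[:, :k]                      # one row of k spectral coordinates per data item
    _, labels = kmeans2(U, k, minit='++')
    return labels
```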

Building a nearest neighbor graph

➤ Question: How to build a nearest-neighbor graph from given data?

[figure: a cloud of data points turned into a nearest-neighbor graph]

➤ Will demonstrate the power of a divide-and-conquer approach combined with the Lanczos algorithm.

➤ Note: The Lanczos algorithm will be covered in detail later.

Recall: Two common types of nearest neighbor graphs

ε-graph: edges consist of pairs (xi, xj) such that ρ(xi, xj) ≤ ε

kNN graph: nodes adjacent to xi are those nodes xℓ with the k smallest distances ρ(xi, xℓ).

➤ The ε-graph is undirected and is geometrically motivated. Issues: 1) it may result in disconnected components; 2) what ε?

➤ kNN graphs are directed in general (this can be trivially fixed).

➤ kNN graphs are especially useful in practice.



Divide and conquer KNN: key ingredient

➤ Key ingredient is spectral bisection

➤ Let the data matrix X = [x1, . . . , xn] ∈ R^{d×n}

➤ Each column == a data point.

➤ Center the data: X̂ = [x̂1, . . . , x̂n] = X − c eᵀ, where c == the centroid and e = ones(n, 1) (matlab)

Goal: Split X̂ into halves using a hyperplane.

Method: Principal Direction Divisive Partitioning [D. Boley ’98].

Idea: Use (σ, u, v) = the largest singular triplet of X̂, with:

    uᵀ X̂ = σ vᵀ.

➤ The hyperplane is defined as ⟨u, x⟩ = 0, i.e., it splits the set of data points into two subsets:

    X+ = {xi | uᵀ x̂i ≥ 0}   and   X− = {xi | uᵀ x̂i < 0}.

[figure: data points separated by the hyperplane into a ‘+ side’ and a ‘− side’]

➤ Note that uᵀ x̂i = uᵀ X̂ ei = σ vᵀ ei →


    X+ = {xi | vi ≥ 0}   and   X− = {xi | vi < 0},

where vi is the i-th entry of v.

➤ In practice: replace the above criterion by

    X+ = {xi | vi ≥ med(v)}   and   X− = {xi | vi < med(v)},

where med(v) == the median of the entries of v.

➤ For the largest singular triplet (σ, u, v) of X̂: use the Golub-Kahan-Lanczos algorithm, or Lanczos applied to X̂ X̂ᵀ or X̂ᵀ X̂

➤ Cost (assuming s Lanczos steps): O(n × d × s); usually d is very small

Two divide and conquer algorithms

Overlap method: divide the current set into two overlapping subsets X1, X2

Glue method: divide the current set into two disjoint subsets X1, X2, plus a third set X3 called the gluing set.

[figure: the hyperplane splits the data into overlapping sets X1, X2 (overlap method), or into disjoint sets X1, X2 plus a gluing set X3 straddling the hyperplane (glue method)]

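A sketch of this spectral bisection step with SciPy's Lanczos-based svds, using the median-split variant (the function name is illustrative):

```python
import numpy as np
from scipy.sparse.linalg import svds     # Lanczos-type solver for a few singular triplets

def principal_direction_split(X):
    """Split the columns of X (d x n): center, take the largest singular triplet of X_hat,
    and separate points by the median of the right singular vector v."""
    c = X.mean(axis=1, keepdims=True)    # centroid c
    X_hat = X - c                        # X_hat = X - c e^T
    _, _, vt = svds(X_hat, k=1)          # largest singular triplet (sigma, u, v)
    v = vt.ravel()
    plus = v >= np.median(v)             # X_+ = {x_i | v_i >= med(v)}
    return np.where(plus)[0], np.where(~plus)[0]
```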


The Overlap Method

➤ Divide the current set X into two overlapping subsets:

    X1 = {xi | vi ≥ −hα(Sv)}   and   X2 = {xi | vi < hα(Sv)},

• where Sv = {|vi| | i = 1, 2, . . . , n},
• and hα(·) is a function that returns an element larger than (100α)% of those in Sv.

➤ Rationale: to ensure that the two subsets overlap in (100α)% of the data, i.e.,

    |X1 ∩ X2| = ⌈α |X|⌉.

The Glue Method

Divide the set X into two disjoint subsets X1 and X2 with a gluing subset X3:

    X1 ∪ X2 = X,   X1 ∩ X2 = ∅,   X1 ∩ X3 ≠ ∅,   X2 ∩ X3 ≠ ∅.

Criterion used for splitting:

    X1 = {xi | vi ≥ 0},   X2 = {xi | vi < 0},
    X3 = {xi | −hα(Sv) ≤ vi < hα(Sv)}.

Note: the gluing subset X3 here is just the intersection of the sets X1, X2 of the overlap method.

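Given the right singular vector v, both divisions take only a few lines; in this sketch hα(Sv) is approximated by the α-quantile of the |vi| (the names are illustrative):

```python
import numpy as np

def overlap_split(v, alpha):
    """Overlap method: X1, X2 share roughly (100*alpha)% of the points around the hyperplane."""
    h = np.quantile(np.abs(v), alpha)     # stand-in for h_alpha(S_v)
    return np.where(v >= -h)[0], np.where(v < h)[0]

def glue_split(v, alpha):
    """Glue method: disjoint halves X1, X2 plus the gluing set X3 straddling the hyperplane."""
    h = np.quantile(np.abs(v), alpha)
    X1 = np.where(v >= 0)[0]
    X2 = np.where(v < 0)[0]
    X3 = np.where((v >= -h) & (v < h))[0]
    return X1, X2, X3
```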

Approximate kNN Graph Construction: The Overlap Method

function G = kNN-Overlap[X, k, α]
   if |X| < nk
      G ← Call kNN-BruteForce[X, k]
   else
      (X1, X2) ← Call Divide-Overlap[X, α]
      G1 ← Call kNN-Overlap[X1, k, α]
      G2 ← Call kNN-Overlap[X2, k, α]
      G ← Call Conquer[G1, G2]
      Call Refine[G]
   EndIf
End

Approximate kNN Graph Construction: The Glue Method

function G = kNN-Glue[X, k, α]
   if |X| < nk
      G ← Call kNN-BruteForce[X, k]
   else
      (X1, X2, X3) ← Call Divide-Glue[X, α]
      G1 ← Call kNN-Glue[X1, k, α]
      G2 ← Call kNN-Glue[X2, k, α]
      G3 ← Call kNN-Glue[X3, k, α]
      G ← Call Conquer[G1, G2, G3]
      Call Refine[G]
   EndIf
End

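For concreteness, a hypothetical Python sketch of the overlap method that combines the spectral bisection and overlap division above; it merges the neighbor lists of the two sub-calls and omits the Refine step (the names, the n_min threshold, and the candidate-pruning strategy are illustrative, not the paper's implementation):

```python
import numpy as np
from scipy.sparse.linalg import svds

def brute_force_knn(X, idx, k):
    """Exact k nearest neighbors among the columns X[:, idx]."""
    D = np.linalg.norm(X[:, idx][:, :, None] - X[:, idx][:, None, :], axis=0)
    np.fill_diagonal(D, np.inf)
    return {idx[i]: set(idx[np.argsort(D[i])[:k]]) for i in range(len(idx))}

def knn_overlap(X, k, alpha=0.1, idx=None, n_min=100):
    """Recursive overlap method (sketch): bisect, recurse, merge candidate neighbor lists."""
    if idx is None:
        idx = np.arange(X.shape[1])
    if len(idx) <= n_min:
        return brute_force_knn(X, idx, k)               # kNN-BruteForce base case
    Xc = X[:, idx] - X[:, idx].mean(axis=1, keepdims=True)
    _, _, vt = svds(Xc, k=1)                            # largest singular triplet (Lanczos)
    v = vt.ravel()
    h = np.quantile(np.abs(v), alpha)                   # stand-in for h_alpha(S_v)
    X1, X2 = idx[v >= -h], idx[v < h]                   # Divide-Overlap
    if len(X1) == len(idx) or len(X2) == len(idx):      # degenerate split: fall back
        return brute_force_knn(X, idx, k)
    G = knn_overlap(X, k, alpha, X1, n_min)
    for node, nbrs in knn_overlap(X, k, alpha, X2, n_min).items():
        G.setdefault(node, set()).update(nbrs)          # Conquer: union of candidates
    for node in G:                                      # keep only the k closest candidates
        cand = np.fromiter(G[node], dtype=int)
        d = np.linalg.norm(X[:, cand] - X[:, [node]], axis=0)
        G[node] = set(cand[np.argsort(d)[:k]])
    return G
```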


Theorem: The time complexity of the overlap method is

    To(n) = Θ(d n^{to}),   where   to = log_{2/(1+α)} 2 = 1 / (1 − log2(1 + α)).

Theorem: The time complexity of the glue method is

    Tg(n) = Θ(d n^{tg} / α),   where tg is the solution of the equation   2/2^t + α^t = 1.

Example: When α = 0.1, then to ≈ 1.16 while tg ≈ 1.12.
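A quick numerical check of these exponents (SciPy's brentq just root-finds the glue equation):

```python
import numpy as np
from scipy.optimize import brentq

alpha = 0.1
t_o = 1.0 / (1.0 - np.log2(1.0 + alpha))                           # overlap exponent
t_g = brentq(lambda t: 2.0 / 2.0**t + alpha**t - 1.0, 1.0, 2.0)    # glue exponent
print(round(t_o, 2), round(t_g, 2))                                # about 1.16 and 1.12
```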


Reference:
Jie Chen, Haw-Ren Fang and Yousef Saad, “Fast Approximate kNN Graph Construction for High Dimensional Data via Recursive Lanczos Bisection”, JMLR, vol. 10, pp. 1989-2012 (2009).
