
An Introduction to Locally Linear Embedding

L. K. Saul S. T. Roweis
Outline

• Introduction
• Algorithm
• Part 1: Constrained Least Squares Problem
• Part 2: Eigenvalue Problem
• Examples
• Discussion
• LLE from pairwise distances
• Conclusion



Introduction and problem definition (1/2)

• Dimensionality reduction: map the high dimensional data onto a lower
  dimensional space in order to reveal meaningful information about its
  structure.
• Two popular techniques for dimensionality reduction:
  • Principal Component Analysis (PCA): compute the linear projections of
    largest variance, using the eigenvectors of the data covariance matrix.
  • Multidimensional Scaling (MDS): compute the low dimensional embedding
    that best preserves pairwise distances.
• Both methods employ eigenvalue calculations (no local minima appear).
Introduction and problem definition (2/2)
• Locally Linear Embedding: performs nonlinear dimensionality
reduction by preserving the neighborhood structure of the data
points.



Algorithm Part A (1/2)

• Dataset representation:  $X = [X_1, X_2, \ldots, X_N] \in \mathbb{R}^{D \times N}$
• Basic assumption of LLE: each data point and its neighbors lie on or close
  to a locally linear patch of the manifold.
• IDEA: represent the local geometry of the patches by linear coefficients
  that reconstruct each data point from its neighbors.
• Preprocessing step: identify the K neighbors of each data point (via
  Euclidean distances or epsilon-balls).
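As an illustrative aside (not part of the original slides), a minimal NumPy
sketch of this preprocessing step might look as follows; it assumes the data
are stored column-wise as in the D x N notation above, and uses a brute-force
distance computation that a K-D tree would accelerate:

    import numpy as np

    def knn_indices(X, K):
        """Indices of the K nearest neighbors of each column of the D x N data
        matrix X, by Euclidean distance; a point is never its own neighbor."""
        D2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)  # N x N squared distances
        np.fill_diagonal(D2, np.inf)                               # exclude the point itself
        return np.argsort(D2, axis=1)[:, :K]                       # row i: neighbors of X_i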



Algorithm Part A (2/2)

• Reconstruction errors:  $E(W) = \sum_i \big| X_i - \sum_j W_{ij} X_j \big|^2$

• The weight Wij represents the contribution of the j-th data point to the
  reconstruction of the i-th data point.
• The weights are determined under the constraints:
  • $W_{ij} = 0$  if Xj is not one of the K neighbors of Xi.
  • $\sum_j W_{ij} = 1$,  i.e., the rows of W sum to 1.
• The optimal weights are obtained in closed form as the solution of a
  constrained least-squares problem.



Constrained Least Squares (1/4)

For a single data point x with neighbors $\eta_j$ and weights $w_j$ (using
$\sum_j w_j = 1$):

$\Big| x - \sum_j w_j \eta_j \Big|^2 = \Big| \sum_j w_j (x - \eta_j) \Big|^2 = \sum_{jk} w_j w_k C_{jk}$,
where  $C_{jk} = (x - \eta_j)^T (x - \eta_k)$  is the local covariance matrix.

Constrained optimization problem:

$\min_w \; \sum_{jk} w_j w_k C_{jk} \quad \text{s.t.} \quad \sum_j w_j = 1$

Method: Lagrange multipliers

$L = \sum_{jk} w_j w_k C_{jk} - \lambda \Big( \sum_j w_j - 1 \Big)$



Constrained Least Squares (2/4)

The Lagrangian function is

$L = \sum_{jk} w_j w_k C_{jk} - \lambda \Big( \sum_j w_j - 1 \Big)$

Setting  $\partial L / \partial w_j = 0$  (1)  and  $\partial L / \partial \lambda = 0$  (2, the constraint):

(1)  $\Rightarrow\; 2 \sum_k C_{jk} w_k - \lambda = 0 \;\Rightarrow\; C w = \tfrac{\lambda}{2}\, e \;\Rightarrow\; w = \tfrac{\lambda}{2}\, C^{-1} e$   (3)

(2)  $\Rightarrow\; e^T w = 1 \;\Rightarrow\; \tfrac{\lambda}{2}\, e^T C^{-1} e = 1 \;\Rightarrow\; \tfrac{\lambda}{2} = \dfrac{1}{e^T C^{-1} e}$   (4)

where  $e = [1, 1, \ldots, 1]^T$.



Constrained Least Squares (3/4)

The optimal weights are given in closed form:

(3), (4)  $\Rightarrow\quad w = \dfrac{C^{-1} e}{e^T C^{-1} e}$,

or in componentwise form:  $w_j = \dfrac{\sum_k C^{-1}_{jk}}{\sum_{lm} C^{-1}_{lm}}$.

• For any particular data point, the optimal weights are invariant to
  rotations, rescalings and translations of that data point and its neighbors.
• Invariance to rotations and rescalings follows from the definition of the
  cost function:  $E(W) = \sum_i \big| X_i - \sum_j W_{ij} X_j \big|^2$.



Constrained Least Squares (4/4)

What if the local covariance matrix C is singular?

Solutions:
1. Use the pseudoinverse of C:  $w = \dfrac{C^{\dagger} e}{e^T C^{\dagger} e}$.
2. Use regularization:  $C_{jk} \leftarrow C_{jk} + \dfrac{\Delta^2}{K}\, \delta_{jk}$,
   where $\Delta^2$ is small compared to the trace of C.

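As an illustrative sketch only (the function name lle_weights and the choice
of regularization strength are assumptions, not from the slides), the weight
computation for a single data point, combining the closed-form solution with
the regularization above, could look like this in NumPy:

    import numpy as np

    def lle_weights(x, eta, delta2_frac=1e-3):
        """Reconstruction weights of data point x (length D) from its K
        neighbors, given as the columns of eta (D x K)."""
        G = eta - x[:, None]                    # differences eta_j - x (sign is irrelevant)
        C = G.T @ G                             # local covariance C_jk = (x - eta_j)^T (x - eta_k)
        K = C.shape[0]
        C = C + (delta2_frac * np.trace(C) / K) * np.eye(K)   # regularization, Delta^2 << tr(C)
        w = np.linalg.solve(C, np.ones(K))      # solve C w = e ...
        return w / w.sum()                      # ... and rescale so that sum_j w_j = 1

Solving C w = e and then rescaling is algebraically the same as
w = C^{-1} e / (e^T C^{-1} e), but avoids forming the inverse explicitly.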


Properties of the optimal weights

Invariance to translations is due to:

$\sum_i \Big| (X_i + t) - \sum_j W_{ij} (X_j + t) \Big|^2 = \sum_i \Big| X_i - \sum_j W_{ij} X_j \Big|^2$,

because  $t - \sum_j W_{ij}\, t = 0$,  since  $\sum_j W_{ij} = 1$.

• The reconstruction weights characterize intrinsic geometric properties of
  each neighborhood (a quick numerical check of the invariances is sketched
  below).
• Thus, the local geometry captured in the original data space remains valid
  for local patches on the embedded manifold.
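A quick numerical sanity check of these invariances (an illustrative sketch
only, with randomly generated data; the helper below recomputes the
closed-form weights without regularization):

    import numpy as np

    def weights(x, eta):
        # closed-form LLE weights of x w.r.t. the neighbor columns of eta
        G = eta - x[:, None]
        w = np.linalg.solve(G.T @ G, np.ones(G.shape[1]))
        return w / w.sum()

    rng = np.random.default_rng(0)
    x, eta = rng.normal(size=5), rng.normal(size=(5, 4))   # D = 5, K = 4
    t = rng.normal(size=5)                                 # arbitrary translation
    Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))           # arbitrary orthogonal map

    w0 = weights(x, eta)
    print(np.allclose(w0, weights(x + t, eta + t[:, None])))   # True: translation invariance
    print(np.allclose(w0, weights(Q @ x, Q @ eta)))            # True: rotation invariance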



Intuition…

• The same weights Wij that reconstruct the i-th data point in D dimensions
  should also reconstruct its embedded manifold coordinates in d dimensions
  (d < D).
• Imagine taking a pair of scissors, cutting out the locally linear patches,
  and placing them in the low dimensional embedding space. Each patch needs
  no more than a rotation, translation and rescaling, so when it arrives at
  its low dimensional destination we expect the same weights to reconstruct
  each data point from its neighbors.
• LLE constructs a neighborhood preserving mapping based on this idea!



Algorithm Part B (1/6)

• Each high dimensional data point Xi is mapped to a low dimensional vector
  Yi representing global internal coordinates on the manifold.

• Cost function:  $\Phi(Y) = \sum_i \big| Y_i - \sum_j W_{ij} Y_j \big|^2$

• We fix the weights W and optimize with respect to Y.

• The optimization problem reduces to an eigenvalue problem in which one
  computes the bottom d eigenvectors of a sparse N x N matrix.



Algorithm Part B (2/6)
Optimization problem:  $\min_Y \; \Phi(Y) = \sum_i \big| Y_i - \sum_j W_{ij} Y_j \big|^2$

$\Phi(Y) = \sum_{ij} M_{ij}\, (Y_i^T Y_j)$,  where

$M_{ij} = \delta_{ij} - W_{ij} - W_{ji} + \sum_k W_{ki} W_{kj}$

Constraints:
1. Coordinates centered at the origin:  $\sum_i Y_i = 0$
2. The embedding vectors have unit covariance:  $\frac{1}{N} \sum_i Y_i Y_i^T = I$
   (so that reconstruction errors for different coordinates are measured on
   the same scale).



Algorithm Part B (3/6)

With Y the N x d matrix whose i-th row is Yi^T:

$\Phi(Y) = \| Y - W Y \|_F^2 = \| (I - W)\, Y \|_F^2 = \mathrm{tr}\big( Y^T (I - W)^T (I - W)\, Y \big)$

$\min_Y \; \mathrm{tr}\big( Y^T (I - W)^T (I - W)\, Y \big) \;\Rightarrow\;$  the columns of Y are the
bottom eigenvectors of  $M = (I - W)^T (I - W)$.

The LLE vectors are the bottom eigenvectors of M.


Algorithm Part B (4/6)
Solution of the problem:
Y is given by the bottom d+1 eigenvectors of  $M = (I - W)^T (I - W)$.

The bottom eigenvector is  $e = [1, 1, \ldots, 1]^T$  with eigenvalue 0, since
$W e = e \;\Rightarrow\; (I - W)\, e = 0 \;\Rightarrow\; M e = 0$.

• This eigenvector is discarded since it carries no meaningful information.

• The remaining d eigenvectors form the d embedding coordinates found by LLE.



Algorithm Part B (5/6) – Implementation aspects
Sparse eigenvalue problem for  $M = (I - W)^T (I - W) = A^T A$,  where  $A = I - W$.

• We need only the d+1 bottom eigenvectors of M.
• W is an N x N sparse matrix.
• M is never explicitly formed.
• We can use a matrix-free sparse eigensolver (e.g. Lanczos), where M is
  accessed only through matrix-vector products:
  $M v = v - W v - W^T v + W^T W v$
• The eigenvectors of M are the right singular vectors of A:
  if  $A = U \Sigma V^T$,  then  $M = A^T A = V \Sigma^T U^T U \Sigma V^T = V \Sigma^T \Sigma\, V^T$.
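A sketch of the matrix-free variant using SciPy's Lanczos-type sparse
eigensolver; asking ARPACK directly for the smallest algebraic eigenvalues
(which='SA') keeps the example simple, although in practice a shift-invert
strategy usually converges faster. The function name and overall structure
are illustrative assumptions, not taken from the slides:

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import LinearOperator, eigsh

    def lle_embedding_sparse(W, d):
        """Bottom d+1 eigenvectors of M = (I - W)^T (I - W), without forming M."""
        W = sp.csr_matrix(W)
        N = W.shape[0]

        def matvec(v):
            # M v = v - W v - W^T v + W^T (W v), using only sparse products with W
            Wv = W @ v
            return v - Wv - W.T @ v + W.T @ Wv

        M = LinearOperator((N, N), matvec=matvec, dtype=float)
        vals, vecs = eigsh(M, k=d + 1, which='SA')   # smallest algebraic eigenvalues
        order = np.argsort(vals)
        return vecs[:, order[1:]]                    # discard the constant bottom eigenvector
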
Algorithm Part B (6/6)
• The weights Wij are computed from local neighborhoods.
• The embedding coordinates Yi are computed by solving the eigenvalue
  problem, which is a global operation.

LLE (Final Algorithm)


1. Compute the neighbors of each data point Xi.
2. Compute the weights Wij that best reconstruct each data point from its
   neighbors, minimizing the reconstruction error E(W) by constrained least
   squares.
3. Compute the vectors Yi best reconstructed by the weights Wij, minimizing
   the embedding cost Φ(Y) via its bottom nonzero eigenvectors.
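Putting the three steps together, a compact end-to-end sketch (dense and
brute-force for clarity; the function name lle and the regularization
constant are illustrative assumptions, not from the slides):

    import numpy as np

    def lle(X, K, d, reg=1e-3):
        """LLE of a D x N data matrix X; returns an N x d embedding."""
        D_dim, N = X.shape
        # Step 1: K nearest neighbors by Euclidean distance
        D2 = np.sum((X[:, :, None] - X[:, None, :]) ** 2, axis=0)
        np.fill_diagonal(D2, np.inf)
        nbrs = np.argsort(D2, axis=1)[:, :K]
        # Step 2: constrained least-squares reconstruction weights
        W = np.zeros((N, N))
        for i in range(N):
            G = X[:, nbrs[i]] - X[:, [i]]
            C = G.T @ G
            C += reg * np.trace(C) / K * np.eye(K)
            w = np.linalg.solve(C, np.ones(K))
            W[i, nbrs[i]] = w / w.sum()
        # Step 3: bottom nonzero eigenvectors of M = (I - W)^T (I - W)
        A = np.eye(N) - W
        vals, vecs = np.linalg.eigh(A.T @ A)
        return vecs[:, 1:d + 1]

For instance, lle(X, K=12, d=2) would correspond to the setting of Example 1
below.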



The LLE algorithm in one picture…

• Step 2 is solved by the constrained least squares problem.
• Step 3 is solved by the sparse eigenvalue problem.



LLE from Pairwise Distances (1/2)

• LLE can be applied to user input given in the form of pairwise distances.
• How to compute the weights? We need the local covariance matrix C:

  $C_{jk} = \tfrac{1}{2} \big( D_j + D_k - D_{jk} - D_0 \big)$,

  where Djk is the squared distance between the j-th and k-th neighbors, and
  $D_j = \frac{1}{K} \sum_z D_{jz}$,  $D_0 = \frac{1}{K^2} \sum_{jk} D_{jk}$  (sums over the K neighbors).

• The weights are then computed via:  $w_j = \dfrac{\sum_k C^{-1}_{jk}}{\sum_{lm} C^{-1}_{lm}}$

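An illustrative sketch of this distance-only weight computation, assuming D2
is the K x K matrix of squared distances among the K neighbors of one data
point (the helper name is not from the slides):

    import numpy as np

    def weights_from_distances(D2, reg=1e-3):
        """LLE reconstruction weights from the K x K matrix of squared
        pairwise distances between the neighbors of one data point."""
        K = D2.shape[0]
        D_row = D2.mean(axis=1)                 # D_j = (1/K) sum_z D_jz
        D0 = D2.mean()                          # D_0 = (1/K^2) sum_jk D_jk
        C = 0.5 * (D_row[:, None] + D_row[None, :] - D2 - D0)
        C += reg * np.trace(C) / K * np.eye(K)  # same regularization as before
        w = np.linalg.solve(C, np.ones(K))
        return w / w.sum()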


LLE from Pairwise Distances (2/2)

• For each data point, the user only needs to supply its nearest neighbors
  and the submatrix of pairwise distances between those neighbors.
• Is it possible to recover the manifold structure with less information?
• The answer is no: the resulting embedding fails to preserve the underlying
  structure of the original manifold.



Example 1

• N = 600 data points sampled from the S-shaped manifold.
• K = 12 neighbors per data point.
• LLE successfully unravels the underlying two dimensional structure.



Example 2

• N = 961 grayscale images, each image contains a 28 x 20 face superimposed
  on a 59 x 51 background of noise.
• K = 4 neighbors.
• (Figure: PCA embedding vs. LLE embedding.)



Example 3

• N = 8588 color images of lips at 108 x 84 resolution.
• K = 16 neighbors.
• If the lip images belong to a linear manifold, LLE and PCA should yield
  similar results.
• (Figure: PCA embedding vs. LLE embedding.)



Example 4

• Color coding illustrates the neighborhood preserving mapping discovered by LLE.
• N = 2000 data points.
• K = 20 nearest neighbors.
• LLE discovers the global internal coordinates of the manifold.



Example 5

• 20 x 28 grayscale images.
• Two-dimensional embeddings of faces.
• K = 12 nearest neighbors.
• The coordinates of the embeddings are related to meaningful attributes,
  such as the pose and expression of a human face.



Example 6

• N = 5000 words from D = 31000 articles in Grolier's Encyclopedia.
• K = 20 nearest neighbors.
• The coordinates of the embeddings are related to meaningful attributes,
  such as the semantic associations of words.





Discussion (1/2)
• LLE has no local minima problems and only one free parameter (the number
  of neighbors K).
• Computational complexity:
  • Step 1 (computing nearest neighbors): $O(D N^2)$; with K-D trees this
    reduces to $O(N \log N)$.
  • Step 2 (constrained least squares): $O(D N K^3)$, solving a K x K set of
    linear equations for each data point.
  • Step 3 (eigenvalue problem): $O(d N^2)$.
• Storage requirement: only the N x K weight matrix W.
• More dimensions of the embedding subspace can easily be added, since the
  existing ones do not change.



Discussion (2/2)

• Overlapping local neighborhoods can provide information about global
  geometry.
• Similarities with Isomap
• Isomap is an extension of MDS where embeddings are
optimized to preserve “geodesic” distances between pairs
of data points.
• LLE avoids the need to solve large dynamic programming
problems.
• LLE relies on very sparse matrices.
• LLE may be more useful in combination with other methods
in data analysis.



Extensions of LLE

• For certain applications, one might impose the constraint that the weight
  matrix W is nonnegative (the reconstruction of each data point then lies
  within the convex hull of its neighbors).
• LLE uses only information concerning the relative location of each data
  point with respect to its neighbors.
• LLE can be applied for data lying on several disjoint
manifolds. Different connected components are decoupled in
the eigenvalue problem for LLE.
• If neighbors correspond to nearby observations in time, then
the reconstruction weights can be computed on-line.



Conclusions
• LLE is a very useful technique for nonlinear dimension
reduction. All in all, LLE:
• preserves the local neighborhood structure.
• has no local minima problems.
• can handle highly nonlinear manifolds.
• can be applied in cases where only pairwise distances are
available.

For more information, visit the LLE homepage:

http://www.cs.toronto.edu/~roweis/lle/



Eigenvalue problem (equivalent column-vector formulation)

Here Y is the d x N matrix whose columns are the embedding vectors Yi, i.e.,
the transpose of the convention used earlier:

$\Phi(Y) = \| Y - Y W^T \|_F^2 = \| Y (I - W)^T \|_F^2 = \mathrm{tr}\big( Y (I - W)^T (I - W)\, Y^T \big)$

$\min_{Y \in \mathbb{R}^{d \times N},\; Y Y^T = I} \; \mathrm{tr}\big( Y (I - W)^T (I - W)\, Y^T \big)$

$Y^T$: bottom eigenvectors of  $M = (I - W)^T (I - W)$

The LLE vectors are the bottom eigenvectors of M.



LLE from Pairwise Distances: derivation (1/2)

• Let $\eta_1, \ldots, \eta_K$ be the neighbors of a data point and $\bar{x}$ their mean, and
  define the centered neighbors  $x_i = \eta_i - \bar{x}$.  The local covariance matrix
  can then be written  $C_{jk} = (\eta_j - \bar{x})^T (\eta_k - \bar{x}) = x_j^T x_k$.

$D_{ij} = \| x_i - x_j \|^2 = x_i^T x_i + x_j^T x_j - 2\, x_i^T x_j = C_{ii} + C_{jj} - 2\, C_{ij}$   (1)

• Note that  $\sum_i C_{ij} = 0$,  since the $x_i$ are centered.
• Summing (1) over i, over j, and over both i and j yields:

$\sum_i D_{ij} = \sum_i C_{ii} + K\, C_{jj} \;\Rightarrow\; D_j = \mathrm{tr}(C)/K + C_{jj}$

$\sum_j D_{ij} = K\, C_{ii} + \sum_j C_{jj} \;\Rightarrow\; D_i = C_{ii} + \mathrm{tr}(C)/K$



LLE from Pairwise Distances: derivation (2/2)

$\sum_{ij} D_{ij} = 2 K\, \mathrm{tr}(C) \;\Rightarrow\; \frac{1}{K^2} \sum_{ij} D_{ij} = 2\, \mathrm{tr}(C)/K \;\Rightarrow\; D_0 = 2\, \mathrm{tr}(C)/K$

The above equations give:

$2\, C_{ij} = C_{ii} + C_{jj} - D_{ij}$
$\qquad\;\; = \big( D_i - \mathrm{tr}(C)/K \big) + \big( D_j - \mathrm{tr}(C)/K \big) - D_{ij}$
$\qquad\;\; = D_i + D_j - D_{ij} - 2\, \mathrm{tr}(C)/K$
$\qquad\;\; = D_i + D_j - D_{ij} - D_0$

$\Rightarrow\quad C_{ij} = \tfrac{1}{2} \big( D_i + D_j - D_{ij} - D_0 \big)$

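A quick numerical confirmation of this identity (an illustrative check with
random data, not part of the slides):

    import numpy as np

    rng = np.random.default_rng(1)
    eta = rng.normal(size=(3, 6))                    # K = 6 neighbors in 3 dimensions
    xc = eta - eta.mean(axis=1, keepdims=True)       # centered neighbors x_i
    C = xc.T @ xc                                    # Gram matrix C_ij = x_i^T x_j
    D2 = np.sum((xc[:, :, None] - xc[:, None, :]) ** 2, axis=0)   # squared distances D_ij
    D_row, D0 = D2.mean(axis=1), D2.mean()
    C_rec = 0.5 * (D_row[:, None] + D_row[None, :] - D2 - D0)
    print(np.allclose(C, C_rec))                     # True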
