Nonlinear Dimensionality Reduction by Locally Linear Embedding
How do we judge similarity? Our mental representations of the world are formed by processing large numbers of sensory inputs—including, for example, the pixel intensities of images, the power spectra of sounds, and the joint angles of articulated bodies. While complex stimuli of this form can be represented by points in a high-dimensional vector space, they typically have a much more compact description. Coherent structure in the world leads to strong correlations between inputs (such as between neighboring pixels in images), generating observations that lie on or close to a smooth low-dimensional manifold. To compare and classify such observations—in effect, to reason about the world—depends crucially on modeling the nonlinear geometry of these low-dimensional manifolds.

Scientists interested in exploratory analysis or visualization of multivariate data (1) face a similar problem in dimensionality reduction. The problem, as illustrated in Fig. 1, involves mapping high-dimensional inputs into a low-dimensional “description” space with as many coordinates as observed modes of variability. Previous approaches to this problem, based on multidimensional scaling (MDS) (2), have computed embeddings that attempt to preserve pairwise distances [or generalized disparities (3)] between data points; these distances are measured along straight lines or, in more sophisticated usages of MDS such as Isomap (4), along shortest paths confined to the manifold of observed inputs.
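Classical MDS has a compact linear-algebra form: square and double-center the pairwise distance matrix, then embed the points using the top eigenvectors of the resulting Gram matrix. The sketch below is a minimal NumPy illustration of that procedure, not code from this paper; the function name and defaults are my own. Supplying it with shortest-path (geodesic) distances instead of straight-line distances is, in essence, the Isomap variant (4) mentioned above.

```python
import numpy as np

def classical_mds(D, d=2):
    """Embed n points in d dimensions so that Euclidean distances in the
    embedding approximate the given n-by-n pairwise distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    idx = np.argsort(evals)[::-1][:d]        # keep the top-d eigenpairs
    scale = np.sqrt(np.maximum(evals[idx], 0.0))
    return evecs[:, idx] * scale             # n x d embedding
```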
Here we take a different approach, called locally linear embedding (LLE), that eliminates the need to estimate pairwise distances between widely separated data points. LLE expects each data point and its neighbors to lie on or close to a locally linear patch of the manifold, and it characterizes the local geometry of these patches by linear coefficients that reconstruct each data point from its neighbors. Reconstruction errors are measured by the cost function

    \varepsilon(W) = \sum_i \Bigl| \vec{X}_i - \sum_j W_{ij} \vec{X}_j \Bigr|^2    (1)

which adds up the squared distances between all the data points and their reconstructions. The weights Wij summarize the contribution of the jth data point to the ith reconstruction. To compute the weights Wij, we minimize the cost function subject to two constraints: each data point is reconstructed only from its neighbors, so that Wij = 0 whenever Xj does not belong to the neighborhood of Xi, and the rows of the weight matrix sum to one, ΣjWij = 1.
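With the neighbors of each point fixed, this constrained least-squares problem can be solved in closed form: the optimal weights for point i follow from a small linear system built on the local Gram matrix of its neighborhood. The following is a minimal NumPy sketch of that computation, not the authors' implementation; the number of neighbors `n_neighbors` and the regularization term `reg` (used when the Gram matrix is singular, e.g. when there are more neighbors than input dimensions) are illustrative choices not specified in this excerpt.

```python
import numpy as np

def reconstruction_weights(X, n_neighbors=10, reg=1e-3):
    """Minimize the cost (1): reconstruct each point from its nearest
    neighbors, with the weights in each row constrained to sum to one."""
    N = X.shape[0]
    W = np.zeros((N, N))
    # pairwise squared Euclidean distances, used only to pick neighbors
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    for i in range(N):
        # the n_neighbors closest points, skipping the point itself
        nbrs = np.argsort(d2[i])[1:n_neighbors + 1]
        # local Gram matrix of the neighbors, centered on point i
        Z = X[nbrs] - X[i]
        C = Z @ Z.T
        # small regularizer in case C is singular; the amount is illustrative
        C += reg * np.trace(C) * np.eye(n_neighbors)
        # sum-to-one constrained least squares: solve C w = 1, then rescale
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()
    return W
```

Each point contributes only a small n_neighbors-by-n_neighbors system; in this naive sketch the brute-force neighbor search dominates the cost and would be the first thing to replace with a spatial data structure for large data sets.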
Fig. 1. The problem of nonlinear dimensionality reduction, as illustrated (10) for three-dimensional data (B) sampled from a two-dimensional manifold (A). An unsupervised learning algorithm must discover the global internal coordinates of the manifold without signals that explicitly indicate how the data should be embedded in two dimensions. The color coding illustrates the neighborhood-preserving mapping discovered by LLE; black outlines in (B) and (C) show the neighborhood of a single point. Unlike LLE, projections of the data by principal component analysis (PCA) (28) or classical MDS (2) map faraway data points to nearby points in the plane, failing to identify the underlying structure of the manifold. Note that mixture models for local dimensionality reduction (29), which cluster the data and perform PCA within each cluster, do not address the problem considered here: namely, how to map high-dimensional data into a single global coordinate system of lower dimensionality.
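The failure mode described in the caption is easy to reproduce numerically: sample three-dimensional points from a curled-up two-dimensional surface and project them linearly with PCA. The sketch below is an illustrative NumPy example, not the data or code behind Fig. 1; the "Swiss roll" surface, the sample size, and the pair of probe points are assumptions made purely for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Three-dimensional points sampled from a rolled-up two-dimensional surface
# (a "Swiss roll"), standing in for the kind of manifold sketched in Fig. 1.
t = rng.uniform(1.5 * np.pi, 4.5 * np.pi, n)   # intrinsic coordinate 1 (angle)
h = rng.uniform(0.0, 10.0, n)                  # intrinsic coordinate 2 (height)
X = np.column_stack([t * np.cos(t), h, t * np.sin(t)])

# Linear dimensionality reduction: project onto the top two principal components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Y = Xc @ Vt[:2].T

# Pick one point on each of two adjacent coils (t near 2*pi and 4*pi, same height).
a = np.argmin(np.abs(t - 2 * np.pi) + np.abs(h - 5.0))
b = np.argmin(np.abs(t - 4 * np.pi) + np.abs(h - 5.0))

# Arc length along the spiral between the two angles (closed form of the
# integral of sqrt(1 + t^2) dt): how far apart the points are along the roll.
arc = lambda u: 0.5 * (u * np.hypot(1.0, u) + np.arcsinh(u))

print("straight-line distance :", np.linalg.norm(X[a] - X[b]))
print("distance in PCA plane  :", np.linalg.norm(Y[a] - Y[b]))
print("distance along the roll:", abs(arc(t[b]) - arc(t[a])))
```

Points on adjacent coils are close in the ambient space, so any linear projection keeps them close, even though they are far apart when measured along the surface itself.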
1Gatsby Computational Neuroscience Unit, University College London, 17 Queen Square, London WC1N 3AR, UK. 2AT&T Labs—Research, 180 Park Avenue, Florham Park, NJ 07932, USA. E-mail: [email protected] (S.T.R.); lsaul@research.att.com (L.K.S.)