Dimensionality Reduction
Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains meaningful properties of the original data. Methods are commonly divided into linear and nonlinear approaches.[1] Approaches can also be divided into feature selection and feature extraction.[2] Dimensionality reduction can be used for noise reduction, data visualization, cluster analysis, or as an intermediate step to facilitate other analyses.
Feature selection
Feature selection approaches try to find a subset of the input variables (also called features or attributes).
The three strategies are: the filter strategy (e.g. information gain), the wrapper strategy (e.g. search guided
by accuracy), and the embedded strategy (selected features are added or removed while building the model
based on prediction errors).
In some cases, data analysis such as regression or classification can be performed more accurately in the reduced space than in the original space.[3]
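As an illustrative sketch of the filter strategy, assuming scikit-learn and a mutual-information score (closely related to information gain) as the ranking criterion; the dataset and the number of retained features are arbitrary choices:

```python
# Filter-strategy feature selection: score each feature independently of any model,
# then keep the k highest-scoring ones. scikit-learn is an assumed library choice.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_breast_cancer(return_X_y=True)          # 30 input features
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)             # keep the 10 most informative features
print(X.shape, "->", X_reduced.shape)                # (569, 30) -> (569, 10)
```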
Feature projection
Feature projection (also called feature extraction) transforms the data from the high-dimensional space to a
space of fewer dimensions. The data transformation may be linear, as in principal component analysis
(PCA), but many nonlinear dimensionality reduction techniques also exist.[4][5] For multidimensional data,
tensor representation can be used in dimensionality reduction through multilinear subspace learning.[6]
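A linear projection with PCA might look like the following sketch, assuming scikit-learn (the dataset and the target dimension are illustrative only):

```python
# Linear feature projection with PCA: project 64-dimensional digit images onto
# the two directions of maximum variance. scikit-learn is an assumed choice.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)       # shape (1797, 64)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)               # shape (1797, 2)
print(pca.explained_variance_ratio_)      # fraction of variance captured by each component
```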
Non-negative matrix factorization (NMF)
NMF decomposes a non-negative matrix into the product of two non-negative matrices, which has made it a promising tool in fields where only non-negative signals exist,[7][8] such as astronomy.[9][10] NMF has been widely used since the introduction of the multiplicative update rule by Lee & Seung,[7] and it has been continuously developed since: with the inclusion of uncertainties,[9] the consideration of missing data and parallel computation,[11] and sequential construction,[11] which leads to the stability and linearity of NMF,[10] as well as other updates, including the handling of missing data in digital image processing.[12]
With a stable component basis during construction and a linear modeling process, sequential NMF[11] is able to preserve the flux in direct imaging of circumstellar structures in astronomy,[10] one of the methods of detecting exoplanets, especially for the direct imaging of circumstellar discs. In comparison with PCA, NMF does not remove the mean of the matrices, so the recovered fluxes remain non-negative and physically meaningful; NMF is therefore able to preserve more information than PCA, as demonstrated by Ren et al.[10]
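An illustrative sketch using scikit-learn's generic NMF implementation (not the sequential, astronomy-oriented variant discussed above); the synthetic data and the chosen rank are arbitrary:

```python
# Non-negative matrix factorization: approximate a non-negative data matrix X
# as W @ H with W, H >= 0. scikit-learn's generic NMF is an assumed stand-in.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 50))                       # synthetic non-negative data
model = NMF(n_components=5, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)                      # (100, 5) non-negative coefficients
H = model.components_                           # (5, 50) non-negative basis vectors
print(np.linalg.norm(X - W @ H))                # reconstruction error
```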
Kernel PCA
Principal component analysis can be employed in a nonlinear way by means of the kernel trick. The resulting technique, known as kernel PCA, is capable of constructing nonlinear mappings that maximize the variance in the data.
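A minimal sketch, assuming scikit-learn's KernelPCA with an RBF kernel; the kernel choice and the toy dataset are illustrative assumptions:

```python
# Kernel PCA: apply PCA implicitly in a feature space induced by an RBF kernel,
# which can "unroll" data that is not linearly separable. scikit-learn assumed.
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)    # concentric circles become linearly separable
```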
Other prominent nonlinear techniques include manifold learning techniques such as Isomap, locally linear
embedding (LLE),[13] Hessian LLE, Laplacian eigenmaps, and methods based on tangent space
analysis.[14] These techniques construct a low-dimensional data representation using a cost function that
retains local properties of the data, and can be viewed as defining a graph-based kernel for Kernel PCA.
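For illustration, a sketch of Isomap and LLE on a synthetic "swiss roll", assuming scikit-learn (the dataset and the neighborhood size are arbitrary choices):

```python
# Manifold learning: Isomap and locally linear embedding (LLE) both build a
# nearest-neighbor graph and recover a low-dimensional embedding that preserves
# local structure. scikit-learn is an assumed choice.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, random_state=0)   # 3-D "swiss roll" data
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
```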
More recently, techniques have been proposed that, instead of defining a fixed kernel, try to learn the kernel
using semidefinite programming. The most prominent example of such a technique is maximum variance
unfolding (MVU). The central idea of MVU is to exactly preserve all pairwise distances between nearest
neighbors (in the inner product space), while maximizing the distances between points that are not nearest
neighbors.
An alternative approach to neighborhood preservation is through the minimization of a cost function that
measures differences between distances in the input and output spaces. Important examples of such
techniques include: classical multidimensional scaling, which is identical to PCA; Isomap, which uses
geodesic distances in the data space; diffusion maps, which use diffusion distances in the data space; t-
distributed stochastic neighbor embedding (t-SNE), which minimizes the divergence between distributions
over pairs of points; and curvilinear component analysis.
A different approach to nonlinear dimensionality reduction is through the use of autoencoders, a special kind of feedforward neural network with a bottleneck hidden layer.[15] The training of deep encoders is typically performed using a greedy layer-wise pre-training (e.g., using a stack of restricted Boltzmann machines) followed by a fine-tuning stage based on backpropagation.
Autoencoder
Autoencoders can be used to learn nonlinear dimension reduction functions and codings together with an
inverse function from the coding to the original representation.
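A minimal sketch of such an encoder/decoder pair, assuming PyTorch; the layer sizes, code dimension, and synthetic data are illustrative only:

```python
# Minimal autoencoder sketch (PyTorch assumed): an encoder compresses the input to a
# low-dimensional code, and a decoder maps the code back to the original space.
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, n_inputs=784, n_code=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_inputs, 128), nn.ReLU(),
            nn.Linear(128, n_code),            # bottleneck layer: the low-dimensional code
        )
        self.decoder = nn.Sequential(
            nn.Linear(n_code, 128), nn.ReLU(),
            nn.Linear(128, n_inputs),
        )

    def forward(self, x):
        code = self.encoder(x)
        return self.decoder(code), code

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(64, 784)                        # a synthetic mini-batch
for _ in range(100):                           # minimize reconstruction error
    reconstruction, code = model(x)
    loss = nn.functional.mse_loss(reconstruction, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```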
t-SNE
t-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction technique useful for the visualization of high-dimensional datasets. It is not recommended for analyses such as clustering or outlier detection, since it does not necessarily preserve densities or distances well.[18]
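A brief sketch, assuming scikit-learn's TSNE (the dataset and perplexity are illustrative choices):

```python
# t-SNE sketch (scikit-learn assumed): embed 64-dimensional digit images into
# 2 dimensions for visualization by matching pairwise neighbor distributions.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_embedded.shape)    # (1797, 2)
```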
UMAP
Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction technique. Its output is visually similar to that of t-SNE, but it assumes that the data are uniformly distributed on a locally connected Riemannian manifold and that the Riemannian metric is locally constant or approximately locally constant.
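A brief sketch, assuming the third-party umap-learn package (not part of scikit-learn); the parameters shown are common defaults rather than recommendations:

```python
# UMAP sketch, assuming the umap-learn package (pip install umap-learn).
import umap
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, n_components=2, random_state=42)
X_umap = reducer.fit_transform(X)    # (1797, 2) embedding for visualization
```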
Dimension reduction
For high-dimensional datasets (i.e., with more than 10 dimensions), dimension reduction is usually performed prior to applying a k-nearest neighbors (k-NN) algorithm in order to avoid the effects of the curse of dimensionality.[19]
Feature extraction and dimension reduction can be combined in one step using principal component analysis (PCA), linear discriminant analysis (LDA), canonical correlation analysis (CCA), or non-negative matrix factorization (NMF) techniques as a pre-processing step, followed by clustering by k-NN on feature vectors in reduced-dimension space. In machine learning this process is also called low-dimensional embedding.[20]
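A sketch of this pre-processing pattern, assuming scikit-learn and PCA as the reduction step (the dataset, number of components, and value of k are illustrative):

```python
# Dimension reduction as a pre-processing step for k-NN (scikit-learn assumed):
# project to a low-dimensional space with PCA, then apply k-nearest neighbors there.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = make_pipeline(PCA(n_components=15), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))    # accuracy of k-NN in the reduced space
```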
For very-high-dimensional datasets (e.g. when performing similarity search on live video streams, DNA data, or high-dimensional time series), running a fast approximate k-NN search using locality-sensitive hashing, random projection,[21] "sketches",[22] or other high-dimensional similarity search techniques from the VLDB conference toolbox might be the only feasible option.
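As an illustration of the random-projection option, a sketch assuming scikit-learn (the synthetic data and the distortion tolerance eps are arbitrary choices):

```python
# Random projection sketch (scikit-learn assumed): by the Johnson-Lindenstrauss
# lemma, a random linear map approximately preserves pairwise distances, making it
# a cheap pre-processing step before similarity search.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10_000))              # very high-dimensional data
transformer = GaussianRandomProjection(eps=0.25, random_state=0)
X_proj = transformer.fit_transform(X)            # target dimension derived from eps
print(X.shape, "->", X_proj.shape)
```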
Applications
A dimensionality reduction technique that is sometimes used in neuroscience is maximally informative
dimensions, which finds a lower-dimensional representation of a dataset such that as much information as
possible about the original data is preserved.
See also
CUR matrix approximation
Data transformation (statistics)
Hyperparameter optimization
Information gain in decision trees
Johnson–Lindenstrauss lemma
Latent semantic analysis
Local tangent space alignment
Locality-sensitive hashing
MinHash
Multifactor dimensionality reduction
Nearest neighbor search
Nonlinear dimensionality reduction
Random projection
Sammon mapping
Semantic mapping (statistics)
Semidefinite embedding
Singular value decomposition
Sufficient dimension reduction
Topological data analysis
Weighted correlation network analysis
Notes
1. van der Maaten, Laurens; Postma, Eric; van den Herik, Jaap (October 26, 2009).
"Dimensionality Reduction: A Comparative Review" (https://members.loria.fr/moberger/Ensei
gnement/AVR/Exposes/TR_Dimensiereductie.pdf) (PDF). J Mach Learn Res. 10: 66–71.
2. Pudil, P.; Novovičová, J. (1998). "Novel Methods for Feature Subset Selection with Respect
to Problem Knowledge". In Liu, Huan; Motoda, Hiroshi (eds.). Feature Extraction,
Construction and Selection. p. 101. doi:10.1007/978-1-4615-5725-8_7 (https://doi.org/10.10
07%2F978-1-4615-5725-8_7). ISBN 978-1-4613-7622-4.
3. Rico-Sulayes, Antonio (2017). "Reducing Vector Space Dimensionality in Automatic
Classification for Authorship Attribution" (https://rielac.cujae.edu.cu/index.php/rieac/article/vi
ew/478). Revista Ingeniería Electrónica, Automática y Comunicaciones. 38 (3): 26–35.
ISSN 1815-5928 (https://www.worldcat.org/issn/1815-5928).
4. Samet, H. (2006) Foundations of Multidimensional and Metric Data Structures. Morgan
Kaufmann. ISBN 0-12-369446-9
5. C. Ding, X. He, H. Zha, H.D. Simon, Adaptive Dimension Reduction for Clustering High
Dimensional Data (https://escholarship.org/uc/item/8pv153t1), Proceedings of International
Conference on Data Mining, 2002
6. Lu, Haiping; Plataniotis, K.N.; Venetsanopoulos, A.N. (2011). "A Survey of Multilinear
Subspace Learning for Tensor Data" (https://www.dsp.utoronto.ca/~haiping/Publication/Surv
eyMSL_PR2011.pdf) (PDF). Pattern Recognition. 44 (7): 1540–1551.
Bibcode:2011PatRe..44.1540L (https://ui.adsabs.harvard.edu/abs/2011PatRe..44.1540L).
doi:10.1016/j.patcog.2011.01.004 (https://doi.org/10.1016%2Fj.patcog.2011.01.004).
7. Daniel D. Lee & H. Sebastian Seung (1999). "Learning the parts of objects by non-negative
matrix factorization". Nature. 401 (6755): 788–791. Bibcode:1999Natur.401..788L (https://ui.a
dsabs.harvard.edu/abs/1999Natur.401..788L). doi:10.1038/44565 (https://doi.org/10.1038%2
F44565). PMID 10548103 (https://pubmed.ncbi.nlm.nih.gov/10548103). S2CID 4428232 (htt
ps://api.semanticscholar.org/CorpusID:4428232).
8. Daniel D. Lee & H. Sebastian Seung (2001). Algorithms for Non-negative Matrix
Factorization (https://proceedings.neurips.cc/paper/2000/file/f9d1152547c0bde01830b7e8b
d60024c-Paper.pdf) (PDF). Advances in Neural Information Processing Systems 13:
Proceedings of the 2000 Conference. MIT Press. pp. 556–562.
9. Blanton, Michael R.; Roweis, Sam (2007). "K-corrections and filter transformations in the
ultraviolet, optical, and near infrared". The Astronomical Journal. 133 (2): 734–754.
arXiv:astro-ph/0606170 (https://arxiv.org/abs/astro-ph/0606170).
Bibcode:2007AJ....133..734B (https://ui.adsabs.harvard.edu/abs/2007AJ....133..734B).
doi:10.1086/510127 (https://doi.org/10.1086%2F510127). S2CID 18561804 (https://api.sem
anticscholar.org/CorpusID:18561804).
10. Ren, Bin; Pueyo, Laurent; Zhu, Guangtun B.; Duchêne, Gaspard (2018). "Non-negative
Matrix Factorization: Robust Extraction of Extended Structures". The Astrophysical Journal.
852 (2): 104. arXiv:1712.10317 (https://arxiv.org/abs/1712.10317).
Bibcode:2018ApJ...852..104R (https://ui.adsabs.harvard.edu/abs/2018ApJ...852..104R).
doi:10.3847/1538-4357/aaa1f2 (https://doi.org/10.3847%2F1538-4357%2Faaa1f2).
S2CID 3966513 (https://api.semanticscholar.org/CorpusID:3966513).
11. Zhu, Guangtun B. (2016-12-19). "Nonnegative Matrix Factorization (NMF) with
Heteroscedastic Uncertainties and Missing data". arXiv:1612.06037 (https://arxiv.org/abs/16
12.06037) [astro-ph.IM (https://arxiv.org/archive/astro-ph.IM)].
12. Ren, Bin; Pueyo, Laurent; Chen, Christine; Choquet, Elodie; Debes, John H.; Duechene,
Gaspard; Menard, Francois; Perrin, Marshall D. (2020). "Using Data Imputation for Signal
Separation in High Contrast Imaging". The Astrophysical Journal. 892 (2): 74.
arXiv:2001.00563 (https://arxiv.org/abs/2001.00563). Bibcode:2020ApJ...892...74R (https://u
i.adsabs.harvard.edu/abs/2020ApJ...892...74R). doi:10.3847/1538-4357/ab7024 (https://doi.
org/10.3847%2F1538-4357%2Fab7024). S2CID 209531731 (https://api.semanticscholar.or
g/CorpusID:209531731).
13. Roweis, S. T.; Saul, L. K. (2000). "Nonlinear Dimensionality Reduction by Locally Linear
Embedding". Science. 290 (5500): 2323–2326. Bibcode:2000Sci...290.2323R (https://ui.ads
abs.harvard.edu/abs/2000Sci...290.2323R). CiteSeerX 10.1.1.111.3313 (https://citeseerx.ist.
psu.edu/viewdoc/summary?doi=10.1.1.111.3313). doi:10.1126/science.290.5500.2323 (http
s://doi.org/10.1126%2Fscience.290.5500.2323). PMID 11125150 (https://pubmed.ncbi.nlm.n
ih.gov/11125150). S2CID 5987139 (https://api.semanticscholar.org/CorpusID:5987139).
14. Zhang, Zhenyue; Zha, Hongyuan (2004). "Principal Manifolds and Nonlinear Dimensionality
Reduction via Tangent Space Alignment". SIAM Journal on Scientific Computing. 26 (1):
313–338. Bibcode:2004SJSC...26..313Z (https://ui.adsabs.harvard.edu/abs/2004SJSC...26..
313Z). doi:10.1137/s1064827502419154 (https://doi.org/10.1137%2Fs1064827502419154).
15. Hongbing Hu, Stephen A. Zahorian, (2010) "Dimensionality Reduction Methods for HMM
Phonetic Recognition" (http://ws2.binghamton.edu/zahorian/pdf/Hu2010Dimensionality.pdf),
ICASSP 2010, Dallas, TX
16. Baudat, G.; Anouar, F. (2000). "Generalized Discriminant Analysis Using a Kernel
Approach". Neural Computation. 12 (10): 2385–2404. CiteSeerX 10.1.1.412.760 (https://cite
seerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.412.760).
doi:10.1162/089976600300014980 (https://doi.org/10.1162%2F089976600300014980).
PMID 11032039 (https://pubmed.ncbi.nlm.nih.gov/11032039). S2CID 7036341 (https://api.se
manticscholar.org/CorpusID:7036341).
17. Haghighat, Mohammad; Zonouz, Saman; Abdel-Mottaleb, Mohamed (2015). "CloudID:
Trustworthy cloud-based and cross-enterprise biometric identification". Expert Systems with
Applications. 42 (21): 7905–7916. doi:10.1016/j.eswa.2015.06.025 (https://doi.org/10.1016%
2Fj.eswa.2015.06.025).
18. Schubert, Erich; Gertz, Michael (2017). Beecks, Christian; Borutta, Felix; Kröger, Peer; Seidl,
Thomas (eds.). "Intrinsic t-Stochastic Neighbor Embedding for Visualization and Outlier
Detection" (https://link.springer.com/chapter/10.1007/978-3-319-68474-1_13). Similarity
Search and Applications. Lecture Notes in Computer Science. Cham: Springer International
Publishing. 10609: 188–203. doi:10.1007/978-3-319-68474-1_13 (https://doi.org/10.1007%2
F978-3-319-68474-1_13). ISBN 978-3-319-68474-1.
19. Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft (1999) "When is "nearest
neighbor" meaningful?" (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1422).
Database Theory—ICDT99, 217–235
20. Shaw, B.; Jebara, T. (2009). "Structure preserving embedding" (http://www.cs.columbia.edu/
~jebara/papers/spe-icml09.pdf) (PDF). Proceedings of the 26th Annual International
Conference on Machine Learning – ICML '09. p. 1. CiteSeerX 10.1.1.161.451 (https://citesee
rx.ist.psu.edu/viewdoc/summary?doi=10.1.1.161.451). doi:10.1145/1553374.1553494 (http
s://doi.org/10.1145%2F1553374.1553494). ISBN 9781605585161. S2CID 8522279 (https://
api.semanticscholar.org/CorpusID:8522279).
21. Bingham, E.; Mannila, H. (2001). "Random projection in dimensionality reduction".
Proceedings of the seventh ACM SIGKDD international conference on Knowledge
discovery and data mining – KDD '01. p. 245. doi:10.1145/502512.502546 (https://doi.org/1
0.1145%2F502512.502546). ISBN 978-1581133912. S2CID 1854295 (https://api.semantics
cholar.org/CorpusID:1854295).
22. Shasha, D High (2004) Performance Discovery in Time Series Berlin: Springer. ISBN 0-387-
00857-8
External links
JMLR Special Issue on Variable and Feature Selection (https://jmlr.csail.mit.edu/papers/spe
cial/feature03.html)
ELastic MAPs (http://bioinfo-out.curie.fr/projects/elmap/)
Locally Linear Embedding (https://cs.nyu.edu/~roweis/lle/)
Visual Comparison of various dimensionality reduction methods (https://intelligencereborn.c
om/MachineLearningDimensionalityReduction.html)
A Global Geometric Framework for Nonlinear Dimensionality Reduction (https://web.archive.
org/web/20040411051530/http://isomap.stanford.edu/)