Dimensionality Reduction
Feature selection
Feature selection approaches try to find a subset of the input variables (also called features or attributes).
The three strategies are: the filter strategy (e.g., information gain), the wrapper strategy (e.g., search
guided by accuracy), and the embedded strategy (in which features are added or removed while building
the model, based on prediction errors).
Data analysis such as regression or classification can be done in the reduced space more accurately than
in the original space.[3]
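As a concrete illustration of the filter strategy, the sketch below scores each feature by mutual information (an information-gain-style criterion) and keeps only the top-scoring ones. The use of scikit-learn, the Iris dataset, and the choice of k=2 retained features are illustrative assumptions, not part of the original text.

```python
# Filter-strategy feature selection: rank features by mutual information
# with the class label and keep the k best (illustrative sketch).
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)          # 4 original features
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)   # keep the 2 most informative features

print(X.shape, "->", X_reduced.shape)      # (150, 4) -> (150, 2)
```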
Feature projection
Feature projection (also called feature extraction) transforms the data from the high-dimensional space to a
space of fewer dimensions. The data transformation may be linear, as in principal component analysis
(PCA), but many nonlinear dimensionality reduction techniques also exist.[4][5] For multidimensional
data, tensor representation can be used in dimensionality reduction through multilinear subspace
learning.[6]
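The following is a minimal sketch of linear feature projection with PCA; the random Gaussian data and the choice of two components are illustrative assumptions.

```python
# Linear feature projection with PCA: project 10-dimensional data onto
# the two directions of largest variance (illustrative sketch).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))             # 100 samples in a 10-dimensional space

pca = PCA(n_components=2)
X_projected = pca.fit_transform(X)         # coordinates in the 2-D projected space

print(X_projected.shape)                   # (100, 2)
print(pca.explained_variance_ratio_)       # fraction of variance captured per component
```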
Non-negative matrix factorization (NMF)
With a stable component basis during construction and a linear modeling process, sequential NMF[11] is
able to preserve the flux in direct imaging of circumstellar structures in astronomy,[10] one of the
methods of detecting exoplanets, especially for the direct imaging of circumstellar disks. In comparison
with PCA, NMF does not remove the mean of the matrices, which avoids the unphysical negative fluxes
that mean subtraction would introduce; NMF is therefore able to preserve more information than PCA, as
demonstrated by Ren et al.[10]
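A minimal sketch of NMF-based dimensionality reduction on a generic non-negative matrix follows; the scikit-learn call, the random data, and the choice of five components are illustrative assumptions and do not reproduce the sequential NMF of Ren et al.

```python
# NMF factors a non-negative matrix X into W (coefficients) and H (components),
# both non-negative, giving a reduced representation W (illustrative sketch).
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((200, 50))                  # non-negative data matrix

nmf = NMF(n_components=5, init="nndsvd", max_iter=500)
W = nmf.fit_transform(X)                   # 5-dimensional coefficients per sample
H = nmf.components_                        # non-negative basis components

print(W.shape, H.shape)                    # (200, 5) (5, 50)
```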
Kernel PCA
Principal component analysis can be employed in a nonlinear way by means of the kernel trick. The
resulting technique, known as kernel PCA, is capable of constructing nonlinear mappings that maximize
the variance in the data.
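A minimal sketch of kernel PCA with an RBF kernel is shown below; the concentric-circles toy dataset and the gamma value are illustrative assumptions.

```python
# Kernel PCA with an RBF kernel: a nonlinear mapping that can separate
# data not linearly separable in the input space (illustrative sketch).
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10)
X_kpca = kpca.fit_transform(X)             # nonlinear projection via the kernel trick

print(X_kpca.shape)                        # (400, 2)
```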
An alternative approach to neighborhood preservation is through the minimization of a cost function that
measures differences between distances in the input and output spaces. Important examples of such
techniques include: classical multidimensional scaling, which is identical to PCA; Isomap, which uses
geodesic distances in the data space; diffusion maps, which use diffusion distances in the data space; t-
distributed stochastic neighbor embedding (t-SNE), which minimizes the divergence between
distributions over pairs of points; and curvilinear component analysis.
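As one concrete instance of this family, the sketch below runs Isomap, which approximates geodesic distances through a nearest-neighbor graph; the scikit-learn API, the S-curve toy dataset, and the value n_neighbors=10 are illustrative assumptions.

```python
# Isomap: embed data so that graph-approximated geodesic distances in the
# input space are preserved in the low-dimensional output (illustrative sketch).
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

X, _ = make_s_curve(n_samples=1000, random_state=0)   # points on a 3-D "S" manifold

iso = Isomap(n_neighbors=10, n_components=2)
X_iso = iso.fit_transform(X)               # 2-D embedding preserving geodesic structure

print(X_iso.shape)                         # (1000, 2)
```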
A different approach to nonlinear dimensionality reduction is through the use of autoencoders, a special
kind of feed-forward neural network with a bottleneck hidden layer.[14] The training of deep encoders is
typically performed using greedy layer-wise pre-training (e.g., using a stack of restricted Boltzmann
machines), followed by a fine-tuning stage based on backpropagation.
Autoencoder
Autoencoders can be used to learn non-linear dimension reduction functions and codings together with
an inverse function from the coding to the original representation.
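A minimal sketch of such an autoencoder follows, assuming PyTorch, a small arbitrary architecture with a 2-unit bottleneck, toy Gaussian data, and a plain mean-squared reconstruction loss; the encoder provides the dimension reduction and the decoder the inverse mapping back to the original representation.

```python
# Undercomplete autoencoder: encoder compresses to a 2-D code, decoder
# reconstructs the input; training minimizes reconstruction error (illustrative sketch).
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, n_features=20, n_code=2):
        super().__init__()
        # encoder maps the input to the low-dimensional code (the bottleneck)
        self.encoder = nn.Sequential(nn.Linear(n_features, 8), nn.ReLU(),
                                     nn.Linear(8, n_code))
        # decoder is the inverse mapping from the code back to the input space
        self.decoder = nn.Sequential(nn.Linear(n_code, 8), nn.ReLU(),
                                     nn.Linear(8, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

X = torch.randn(256, 20)                   # toy data: 256 samples, 20 features
model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for _ in range(200):                       # minimize reconstruction error
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(X), X)
    loss.backward()
    optimizer.step()

codes = model.encoder(X).detach()          # the learned 2-D reduced representation
```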
t-SNE
T-distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique
useful for visualization of high-dimensional datasets.
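A minimal sketch of t-SNE for 2-D visualization is shown below; the digits dataset and the perplexity value are illustrative assumptions.

```python
# t-SNE: embed 64-dimensional digit images into 2-D for visualization
# by matching pairwise similarity distributions (illustrative sketch).
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)        # 1797 samples, 64 features

tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_embedded = tsne.fit_transform(X)         # 2-D coordinates for plotting

print(X_embedded.shape)                    # (1797, 2)
```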
UMAP
Uniform manifold approximation and projection (UMAP) is a nonlinear dimensionality reduction
technique. Visually, it is similar to t-SNE, but it assumes that the data is uniformly distributed on a
locally connected Riemannian manifold and that the Riemannian metric is locally constant or
approximately locally constant.
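A minimal sketch using the third-party umap-learn package follows; the dataset and the n_neighbors and min_dist values are illustrative assumptions.

```python
# UMAP: nonlinear embedding based on a fuzzy neighborhood graph of the data
# (illustrative sketch using the umap-learn package).
import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)

reducer = umap.UMAP(n_components=2, n_neighbors=15, min_dist=0.1, random_state=0)
X_umap = reducer.fit_transform(X)          # 2-D embedding of the 64-D digits

print(X_umap.shape)                        # (1797, 2)
```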
Dimension reduction
For high-dimensional datasets (i.e., with more than about 10 dimensions), dimension reduction is usually
performed prior to applying a k-nearest neighbors (k-NN) algorithm in order to avoid the effects of the
curse of dimensionality.[17]
Feature extraction and dimension reduction can be combined in one step by using principal component
analysis (PCA), linear discriminant analysis (LDA), canonical correlation analysis (CCA), or non-negative
matrix factorization (NMF) as a pre-processing step, followed by clustering via k-NN on feature vectors
in the reduced-dimension space. In machine learning this process is also called low-dimensional
embedding.[18]
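The sketch below combines the two steps in a single scikit-learn pipeline, with PCA as the pre-processing stage and a k-NN classifier as the downstream learner; the digits dataset and the parameter choices are illustrative assumptions.

```python
# PCA as a pre-processing step followed by k-NN in the reduced space
# (illustrative sketch of a low-dimensional-embedding pipeline).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(PCA(n_components=20), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)                # k-NN operates in the 20-dimensional PCA space

print(model.score(X_test, y_test))         # held-out accuracy
```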
For very-high-dimensional datasets (e.g., when performing similarity search on live video streams, DNA
data, or high-dimensional time series), running a fast approximate k-NN search using locality-sensitive
hashing, random projection,[19] "sketches",[20] or other high-dimensional similarity search techniques
from the VLDB toolbox might be the only feasible option.
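As one such technique, the sketch below applies a Gaussian random projection, which approximately preserves pairwise distances (in the sense of the Johnson–Lindenstrauss lemma); the synthetic data and the target dimension of 500 are illustrative assumptions.

```python
# Random projection: multiply the data by a random Gaussian matrix to map
# 10,000 dimensions down to 500 while roughly preserving distances (illustrative sketch).
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
X = rng.random((100, 10000))               # 100 samples in a 10,000-dimensional space

proj = GaussianRandomProjection(n_components=500, random_state=0)
X_low = proj.fit_transform(X)              # reduced representation

print(X_low.shape)                         # (100, 500)
```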
Applications
A dimensionality reduction technique that is sometimes used in neuroscience is maximally informative
dimensions, which finds a lower-dimensional representation of a dataset such that as much information
as possible about the original data is preserved.
See also
Nearest neighbor search
MinHash
Information gain in decision trees
Semidefinite embedding
Multifactor dimensionality reduction
Multilinear subspace learning
Multilinear PCA
Random projection
Singular value decomposition
Latent semantic analysis
Semantic mapping
Topological data analysis
Locality sensitive hashing
Sufficient dimension reduction
Data transformation (statistics)
Weighted correlation network analysis
Hyperparameter optimization
CUR matrix approximation
Envelope model
Nonlinear dimensionality reduction
Sammon mapping
Johnson–Lindenstrauss lemma
Local tangent space alignment
Notes
1. Roweis, S. T.; Saul, L. K. (2000). "Nonlinear Dimensionality Reduction by Locally Linear Embedding". Science. 290 (5500): 2323–2326. Bibcode:2000Sci...290.2323R. CiteSeerX 10.1.1.111.3313. doi:10.1126/science.290.5500.2323. PMID 11125150.
2. Pudil, P.; Novovičová, J. (1998). "Novel Methods for Feature Subset Selection with Respect to Problem Knowledge". In Liu, Huan; Motoda, Hiroshi (eds.). Feature Extraction, Construction and Selection. p. 101. doi:10.1007/978-1-4615-5725-8_7. ISBN 978-1-4613-7622-4.
3. Rico-Sulayes, Antonio (2017). "Reducing Vector Space Dimensionality in Automatic Classification for Authorship Attribution" (http://rielac.cujae.edu.cu/index.php/rieac/article/download/478/278). Revista Ingeniería Electrónica, Automática y Comunicaciones. 38 (3): 26–35.
4. Samet, H. (2006). Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann. ISBN 0-12-369446-9.
5. Ding, C.; He, X.; Zha, H.; Simon, H. D. (2002). "Adaptive Dimension Reduction for Clustering High Dimensional Data" (https://cloudfront.escholarship.org/dist/prd/content/qt8pv153t1/qt8pv153t1.pdf). Proceedings of the International Conference on Data Mining.
6. Lu, Haiping; Plataniotis, K. N.; Venetsanopoulos, A. N. (2011). "A Survey of Multilinear Subspace Learning for Tensor Data" (http://www.dsp.utoronto.ca/~haiping/Publication/SurveyMSL_PR2011.pdf). Pattern Recognition. 44 (7): 1540–1551. doi:10.1016/j.patcog.2011.01.004.
7. Lee, Daniel D.; Seung, H. Sebastian (1999). "Learning the parts of objects by non-negative matrix factorization". Nature. 401 (6755): 788–791. Bibcode:1999Natur.401..788L. doi:10.1038/44565. PMID 10548103.
8. Lee, Daniel D.; Seung, H. Sebastian (2001). "Algorithms for Non-negative Matrix Factorization" (http://papers.nips.cc/paper/1861-algorithms-for-non-negative-matrix-factorization.pdf). Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference. MIT Press. pp. 556–562.
9. Blanton, Michael R.; Roweis, Sam (2007). "K-corrections and filter transformations in the ultraviolet, optical, and near infrared". The Astronomical Journal. 133 (2): 734–754. arXiv:astro-ph/0606170. Bibcode:2007AJ....133..734B. doi:10.1086/510127.
10. Ren, Bin; Pueyo, Laurent; Zhu, Guangtun B.; Duchêne, Gaspard (2018). "Non-negative Matrix Factorization: Robust Extraction of Extended Structures". The Astrophysical Journal. 852 (2): 104. arXiv:1712.10317. Bibcode:2018ApJ...852..104R. doi:10.3847/1538-4357/aaa1f2.
11. Zhu, Guangtun B. (2016). "Nonnegative Matrix Factorization (NMF) with Heteroscedastic Uncertainties and Missing data". arXiv:1612.06037 [astro-ph.IM].
12. Zhang, Zhenyue; Zha, Hongyuan (2004). "Principal Manifolds and Nonlinear Dimensionality Reduction via Tangent Space Alignment". SIAM Journal on Scientific Computing. 26 (1): 313–338. doi:10.1137/s1064827502419154.
13. Bengio, Yoshua; Monperrus, Martin; Larochelle, Hugo (2006). "Nonlocal Estimation of Manifold Structure" (https://hal.archives-ouvertes.fr/hal-01575345/document). Neural Computation. 18 (10): 2509–2528. CiteSeerX 10.1.1.116.4230. doi:10.1162/neco.2006.18.10.2509. PMID 16907635.
14. Hu, Hongbing; Zahorian, Stephen A. (2010). "Dimensionality Reduction Methods for HMM Phonetic Recognition" (http://bingweb.binghamton.edu/~hhu1/paper/Hu2010Dimensionality.pdf). ICASSP 2010, Dallas, TX.
15. Baudat, G.; Anouar, F. (2000). "Generalized Discriminant Analysis Using a Kernel Approach". Neural Computation. 12 (10): 2385–2404. CiteSeerX 10.1.1.412.760. doi:10.1162/089976600300014980. PMID 11032039.
16. Haghighat, Mohammad; Zonouz, Saman; Abdel-Mottaleb, Mohamed (2015). "CloudID: Trustworthy cloud-based and cross-enterprise biometric identification". Expert Systems with Applications. 42 (21): 7905–7916. doi:10.1016/j.eswa.2015.06.025.
17. Beyer, Kevin; Goldstein, Jonathan; Ramakrishnan, Raghu; Shaft, Uri (1999). "When is 'nearest neighbor' meaningful?" (http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.1422). Database Theory—ICDT99: 217–235.
18. Shaw, B.; Jebara, T. (2009). "Structure preserving embedding" (https://www.cs.columbia.edu/~jebara/papers/spe-icml09.pdf). Proceedings of the 26th Annual International Conference on Machine Learning – ICML '09. p. 1. CiteSeerX 10.1.1.161.451. doi:10.1145/1553374.1553494. ISBN 9781605585161.
19. Bingham, E.; Mannila, H. (2001). "Random projection in dimensionality reduction". Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining – KDD '01. p. 245. doi:10.1145/502512.502546. ISBN 978-1581133912.
20. Shasha, D. (2004). High Performance Discovery in Time Series. Berlin: Springer. ISBN 0-387-00857-8.
External links
JMLR Special Issue on Variable and Feature Selection (http://jmlr.csail.mit.edu/papers/special/feature03.html)
ELastic MAPs (http://bioinfo-out.curie.fr/projects/elmap/)
Locally Linear Embedding (http://www.cs.toronto.edu/~roweis/lle)
A Global Geometric Framework for Nonlinear Dimensionality Reduction (https://web.archive.org/web/20040411051530/http://isomap.stanford.edu/)