Curse of Dimensionality
The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in
high-dimensional spaces that do not occur in low-dimensional settings such as the three-dimensional
physical space of everyday experience. The expression was coined by Richard E. Bellman when
considering problems in dynamic programming.[1][2]
Dimensionally cursed phenomena occur in domains such as numerical analysis, sampling, combinatorics,
machine learning, data mining and databases. The common theme of these problems is that when the
dimensionality increases, the volume of the space increases so fast that the available data become sparse. In
order to obtain a reliable result, the amount of data needed often grows exponentially with the
dimensionality. Also, organizing and searching data often relies on detecting areas where objects form
groups with similar properties; in high dimensional data, however, all objects appear to be sparse and
dissimilar in many ways, which prevents common data organization strategies from being efficient.
Domains
Combinatorics
In some problems, each variable can take one of several discrete values, or the range of possible values is
divided to give a finite number of possibilities. Taking the variables together, a huge number of
combinations of values must be considered. This effect is also known as the combinatorial explosion. Even
in the simplest case of binary variables, the number of possible combinations is already $2^d$, exponential in the dimensionality $d$. Naively, each additional dimension doubles the effort needed to try all combinations.
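As a minimal illustration (a sketch added here, not drawn from the cited sources), the following Python snippet counts the combinations of $d$ binary variables and confirms the doubling per added dimension:

```python
from itertools import product

# Number of combinations of d binary variables grows as 2**d:
# each added variable doubles the number of tuples to enumerate.
for d in range(1, 21):
    print(f"d = {d:2d}: {2 ** d:,} combinations")

# Exhaustive enumeration quickly becomes infeasible; even listing the
# combinations for d = 20 already yields about a million tuples.
assert sum(1 for _ in product([0, 1], repeat=20)) == 2 ** 20
```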
Sampling
There is an exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, $10^2 = 100$ evenly spaced sample points suffice to sample a unit interval (try to visualize a "1-dimensional" cube) with no more than $10^{-2} = 0.01$ distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice that has a spacing of $10^{-2} = 0.01$ between adjacent points would require $10^{20} = (10^2)^{10}$ sample points. In general, with a spacing distance of $10^{-n}$ the 10-dimensional hypercube appears to be a factor of $10^{n(10-1)} = (10^n)^{10}/10^n$ "larger" than the 1-dimensional hypercube, which is the unit interval. In the above example $n = 2$: when using a sampling distance of $0.01$ the 10-dimensional hypercube appears to be $10^{18}$ "larger" than the unit interval. This effect is a combination of the combinatorics problems above and the distance function problems explained below.
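The lattice-point counts above can be reproduced with a few lines of Python; this is an illustrative sketch in which `grid_points` is a hypothetical helper, not a standard function:

```python
# Number of lattice points needed to sample the unit hypercube in d
# dimensions with a grid spacing of 10**(-n) between adjacent points.
def grid_points(d: int, n: int) -> int:
    points_per_axis = 10 ** n          # e.g. 100 points for spacing 0.01
    return points_per_axis ** d        # grid size grows exponentially in d

print(grid_points(d=1, n=2))    # 100 points for the unit interval
print(grid_points(d=10, n=2))   # 10**20 points for the 10-dimensional cube

# Ratio between the 10-dimensional and 1-dimensional requirements:
print(grid_points(10, 2) // grid_points(1, 2))   # 10**18, as in the text
```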
Optimization
When solving dynamic optimization problems by numerical backward induction, the objective function
must be computed for each combination of values. This is a significant obstacle when the dimension of the
"state variable" is large.[3]
Machine learning
In machine learning problems that involve learning a "state-of-nature" from a finite number of data samples
in a high-dimensional feature space with each feature having a range of possible values, typically an
enormous amount of training data is required to ensure that there are several samples with each combination
of values. In an abstract sense, as the number of features or dimensions grows, the amount of data we need
to generalize accurately grows exponentially.[4]
A typical rule of thumb is that there should be at least 5 training examples for each dimension in the
representation.[5] In machine learning, insofar as predictive performance is concerned, the term curse of dimensionality is used interchangeably with the peaking phenomenon,[5] which is also known as the Hughes phenomenon.[6] This phenomenon states that with a fixed number of training samples, the average
(expected) predictive power of a classifier or regressor first increases as the number of dimensions or
features used is increased but beyond a certain dimensionality it starts deteriorating instead of improving
steadily.[7][8][9]
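The peaking behaviour can be simulated; the sketch below is illustrative only (in this toy setup only the first five features carry signal, an assumption made here for clarity, and the exact shape of the curve depends on the random seed and sample sizes):

```python
# Illustrative sketch of the peaking (Hughes) phenomenon, not taken from the
# cited papers: with a fixed training set size, test accuracy of a simple
# classifier first improves and then degrades as more features are used.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_train, n_test, n_informative, max_dim = 40, 5000, 5, 60

def make_data(n, d):
    """Two Gaussian classes; only the first n_informative features carry signal."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, :n_informative] += np.outer(2 * y - 1, np.full(n_informative, 0.5))
    return X, y

X_tr, y_tr = make_data(n_train, max_dim)
X_te, y_te = make_data(n_test, max_dim)

for d in (1, 2, 5, 10, 20, 40, 60):
    clf = LinearDiscriminantAnalysis().fit(X_tr[:, :d], y_tr)
    print(f"{d:3d} features: test accuracy = {clf.score(X_te[:, :d], y_te):.3f}")
# Accuracy rises while informative features are added, then falls as the
# remaining (noise) features overwhelm the fixed-size training set.
```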
Nevertheless, in the context of a simple classifier (linear discriminant analysis in the multivariate Gaussian
model under the assumption of a common known covariance matrix) Zollanvari et al. [10] showed both
analytically and empirically that as long as the relative cumulative efficacy of an additional feature set (with
respect to features that are already part of the classifier) is greater (or less) than the size of this additional
feature set, the expected error of the classifier constructed using these additional features will be less (or
greater) than the expected error of the classifier constructed without them. In other words, both the size of
additional features and their (relative) cumulative discriminatory effect are important in observing a
decrease or increase in the average predictive power.
In metric learning, higher dimensions can sometimes allow a model to achieve better performance. After
normalizing embeddings to the surface of a hypersphere, FaceNet achieves the best performance using 128
dimensions as opposed to 64, 256 and 512 dimensions in the authors' ablation study.[11] A loss function for
unitary-invariant dissimilarity between word embeddings was found to be minimized in high
dimensions.[12]
Data mining
In data mining, the curse of dimensionality refers to a data set with too many features. Consider the first table, which depicts 200 individuals and 2000 genes (features), with a 1 or 0 indicating whether or not an individual carries a mutation in that gene.

Genetic mutations in individuals data set

    Individual name    Gene 1    Gene 2    ...    Gene 2000
    Individual 1       1         0         ...    1
    ...                ...       ...       ...    ...
A common practice of data mining in this domain would be to create association rules between genetic mutations that lead to the development of cancers. To do this, one would have to loop through each genetic mutation of each individual and find other genetic mutations that occur over a desired threshold and create pairs. One would start with pairs of two, then three, then four, until the result is an empty set of pairs. The complexity of this algorithm can lead to calculating all permutations of gene pairs for each individual or row. Given that the formula for the number of permutations of $n$ items taken $r$ at a time is $\frac{n!}{(n-r)!}$, the number of three-gene permutations for any given individual is $2000!/(2000-3)! = 7{,}988{,}004{,}000$ ordered triples of genes to evaluate for that individual. The number of permutations grows factorially as the group size increases. The growth is shown in the table below.

Growth of association pair permutations as pair size grows

    Number of pairs    Calculation for permutations    Permutations calculated for each row
    2                  2000!/(2000 − 2)!               3,998,000
    3                  2000!/(2000 − 3)!               7,988,004,000
    4                  2000!/(2000 − 4)!               15,952,043,988,000
    5                  2000!/(2000 − 5)!               31,840,279,800,048,000
As we can see from the permutation table above, one of the major problems data miners face regarding the
curse of dimensionality is that the space of possible parameter values grows exponentially or factorially as
the number of features in the data set grows. This problem critically affects both computational time and
space when searching for associations or optimal features to consider.
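The entries of the permutation table follow directly from the formula $n!/(n-r)!$; a small Python check (assuming Python 3.8+ for `math.perm`) reproduces them:

```python
import math

n_genes = 2000  # number of features (genes) per individual

# Number of ordered selections (permutations) of r genes out of n:
# nPr = n! / (n - r)!
for r in range(2, 6):
    print(f"r = {r}: {math.perm(n_genes, r):,} permutations per individual")

# r = 2: 3,998,000
# r = 3: 7,988,004,000
# r = 4: 15,952,043,988,000
# r = 5: 31,840,279,800,048,000
```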
Another problem data miners may face when dealing with too many features is that the number of false predictions or classifications tends to increase as the number of features grows in the data set. In terms
of the classification problem discussed above, keeping every data point could lead to a higher number of
false positives and false negatives in the model.
This may seem counter intuitive but consider the genetic mutation table from above, depicting all genetic
mutations for each individual. Each genetic mutation, whether or not it correlates with cancer, will have
some input or weight in the model that guides the decision-making process of the algorithm. There may be
mutations that are outliers or ones that dominate the overall distribution of genetic mutations when in fact
they do not correlate with cancer. These features may be working against one's model, making it more
difficult to obtain optimal results.
This problem is up to the data miner to solve, and there is no universal solution. The first step any data
miner should take is to explore the data, in an attempt to gain an understanding of how it can be used to
solve the problem. One must first understand what the data means, and what they are trying to discover
before they can decide if anything must be removed from the data set. Then they can create or use a feature
selection or dimensionality reduction algorithm to remove samples or features from the data set if they deem
it necessary. One example of such methods is the interquartile range method, which removes outliers from a data set by computing the spread between the first and third quartiles of a feature and discarding values that fall far outside that range.
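As a minimal sketch of such a preprocessing step (illustrative; the 1.5-IQR multiplier is a conventional choice, not prescribed by the text), an interquartile-range filter discards values lying far outside the middle 50% of a feature:

```python
import numpy as np

def iqr_filter(values: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Keep only values within k interquartile ranges of the middle 50%."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values >= lower) & (values <= upper)]

feature = np.array([1.0, 1.2, 0.9, 1.1, 1.0, 8.5])  # 8.5 is an obvious outlier
print(iqr_filter(feature))  # the outlier is removed
```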
Distance function
When a measure such as a Euclidean distance is defined using many coordinates, there is little difference in
the distances between different pairs of points.
One way to illustrate the "vastness" of high-dimensional Euclidean space is to compare the proportion of an inscribed hypersphere with radius $r$ and dimension $d$ to that of a hypercube with edges of length $2r$. The volume of such a sphere is $\frac{2 r^d \pi^{d/2}}{d\,\Gamma(d/2)}$, where $\Gamma$ is the gamma function, while the volume of the cube is $(2r)^d$. As the dimension of the space increases, the hypersphere becomes an insignificant volume relative to that of the hypercube. This can clearly be seen by comparing the proportions as the dimension $d$ goes to infinity:

$$\frac{V_{\mathrm{sphere}}}{V_{\mathrm{cube}}} = \frac{\pi^{d/2}}{d\, 2^{d-1}\, \Gamma(d/2)} \to 0 \quad \text{as } d \to \infty.$$
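A quick numerical check of this ratio (an illustrative sketch using the closed-form expression above):

```python
import math

def sphere_to_cube_ratio(d: int) -> float:
    """Volume of the inscribed d-ball divided by the volume of the d-cube."""
    # V_sphere / V_cube = pi**(d/2) / (d * 2**(d-1) * Gamma(d/2))
    return math.pi ** (d / 2) / (d * 2 ** (d - 1) * math.gamma(d / 2))

for d in (2, 3, 5, 10, 20, 50):
    print(f"d = {d:2d}: ratio = {sphere_to_cube_ratio(d):.3e}")
# The ratio collapses toward zero: the inscribed ball occupies a
# vanishing fraction of the cube as the dimension grows.
```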
Furthermore, the distance between the center and the corners is $r\sqrt{d}$, which increases without bound for fixed $r$. In this sense, when points are uniformly generated in a high-dimensional hypercube, almost all points are much farther than $r$ units away from the centre. In high dimensions, the volume of the $d$-dimensional unit hypercube (with coordinates of the vertices $\pm 1$) is concentrated near a sphere with the radius $\sqrt{d/3}$ for large dimension $d$. Indeed, for each coordinate $x_i$ the average value of $x_i^2$ in the cube is[13]

$$\langle x_i^2 \rangle = \frac{1}{2}\int_{-1}^{1} x^2 \, dx = \frac{1}{3}.$$

Therefore, the squared distance from the origin, $\|x\|^2 = \sum_i x_i^2$, has the average value $d/3$ and variance $4d/45$. For large $d$, the distribution of $\|x\|^2/d$ is close to the normal distribution with mean $1/3$ and standard deviation $\frac{2}{3\sqrt{5d}}$, according to the central limit theorem. Thus, when uniformly generating points in high dimensions, both the "middle" of the hypercube and the corners are empty, and all the volume is concentrated near the surface of a sphere of "intermediate" radius $\sqrt{d/3}$.
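A Monte Carlo sketch of this concentration effect (illustrative; the sample sizes and dimensions are arbitrary choices) draws uniform points from $[-1,1]^d$ and compares the mean and spread of the squared norms with $d/3$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 10_000

for d in (3, 30, 300, 1000):
    X = rng.uniform(-1.0, 1.0, size=(n_points, d))
    sq_norms = np.sum(X ** 2, axis=1)
    # Mean squared norm is close to d/3; its relative spread shrinks as d grows,
    # so the points pile up near the sphere of radius sqrt(d/3).
    print(f"d = {d:4d}: mean = {sq_norms.mean():8.2f} (d/3 = {d / 3:8.2f}), "
          f"relative spread = {sq_norms.std() / sq_norms.mean():.4f}")
```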
This also helps to understand the chi-squared distribution. Indeed, the (non-central) chi-squared distribution associated to a random point in the interval $[-1, 1]$ is the same as the distribution of the length-squared of a random point in the $d$-cube. By the law of large numbers, this distribution concentrates itself in a narrow band around $d$ times the standard deviation squared ($\sigma^2$) of the original distribution. This illuminates the chi-squared distribution and also illustrates that most of the volume of the $d$-cube concentrates near the boundary of a sphere of radius $\sigma\sqrt{d}$.
A further development of this phenomenon is as follows. Any fixed distribution on the real numbers induces a product distribution on points in $\mathbb{R}^d$. For any fixed $n$, it turns out that the difference between the minimum and the maximum distance between a random reference point $Q$ and a list of $n$ random data points $P_1, \dots, P_n$ becomes indiscernible compared to the minimum distance:[14]

$$\lim_{d \to \infty} E\left[\frac{\operatorname{dist}_{\max}(d) - \operatorname{dist}_{\min}(d)}{\operatorname{dist}_{\min}(d)}\right] = 0.$$
This is often cited as distance functions losing their usefulness (for the nearest-neighbor criterion in feature-
comparison algorithms, for example) in high dimensions. However, recent research has shown this to only
hold in the artificial scenario when the one-dimensional distributions are independent and identically
distributed.[15] When attributes are correlated, data can become easier to work with and can provide higher distance contrast; the signal-to-noise ratio was also found to play an important role, so feature selection should be used.[15]
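The contrast loss for i.i.d. data can be observed directly in simulation; the sketch below (illustrative, with arbitrary sample sizes, echoing the setting of [14]) estimates the relative contrast $(\mathrm{dist}_{\max}-\mathrm{dist}_{\min})/\mathrm{dist}_{\min}$ for uniform random data:

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_trials = 1000, 20

def relative_contrast(d: int) -> float:
    """Average of (dist_max - dist_min) / dist_min over random query points."""
    contrasts = []
    for _ in range(n_trials):
        data = rng.uniform(size=(n_points, d))
        query = rng.uniform(size=d)
        dists = np.linalg.norm(data - query, axis=1)
        contrasts.append((dists.max() - dists.min()) / dists.min())
    return float(np.mean(contrasts))

for d in (2, 10, 100, 1000):
    print(f"d = {d:4d}: relative contrast = {relative_contrast(d):.3f}")
# The contrast shrinks toward zero as d grows: nearest and farthest
# neighbors become almost equidistant from the query under i.i.d. data.
```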
More recently, it has been suggested that there may be a conceptual flaw in the argument that contrast-loss
creates a curse in high dimensions. Machine learning can be understood as the problem of assigning
instances to their respective generative process of origin, with class labels acting as symbolic
representations of individual generative processes. The curse's derivation assumes all instances are
independent, identical outcomes of a single high dimensional generative process. If there is only one
generative process, there would exist only one (naturally occurring) class and machine learning would be
conceptually ill-defined in both high and low dimensions. Thus, the traditional argument that contrast-loss
creates a curse, may be fundamentally inappropriate. In addition, it has been shown that when the
generative model is modified to accommodate multiple generative processes, contrast-loss can morph from
a curse to a blessing, as it ensures that the nearest-neighbor of an instance is almost-surely its most closely
related instance. From this perspective, contrast-loss makes high dimensional distances especially
meaningful and not especially non-meaningful as is often argued.[16]
The effect complicates nearest neighbor search in high dimensional space. It is not possible to quickly reject
candidates by using the difference in one coordinate as a lower bound for a distance based on all the
dimensions.[17][18]
However, it has recently been observed that the mere number of dimensions does not necessarily result in
difficulties,[19] since relevant additional dimensions can also increase the contrast. In addition, for the
resulting ranking it remains useful to discern close and far neighbors. Irrelevant ("noise") dimensions,
however, reduce the contrast in the manner described above. In time series analysis, where the data are
inherently high-dimensional, distance functions also work reliably as long as the signal-to-noise ratio is high
enough.[20]
Another effect of high dimensionality on distance functions concerns k-nearest neighbor (k-NN) graphs
constructed from a data set using a distance function. As the dimension increases, the indegree distribution
of the k-NN digraph becomes skewed with a peak on the right because of the emergence of a
disproportionate number of hubs, that is, data-points that appear in many more k-NN lists of other data-
points than the average. This phenomenon can have a considerable impact on various techniques for
classification (including the k-NN classifier), semi-supervised learning, and clustering,[21] and it also affects
information retrieval.[22]
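Hub formation can likewise be observed in a small simulation; the following sketch (illustrative; Gaussian data with arbitrary choices of n and k) measures the skewness of the k-NN in-degree distribution as the dimension grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 2000, 10

def knn_indegree_skewness(d: int) -> float:
    """Skewness of the k-NN in-degree distribution for i.i.d. Gaussian data."""
    X = rng.normal(size=(n, d))
    # Pairwise squared Euclidean distances.
    sq = np.sum(X ** 2, axis=1)
    dist2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    np.fill_diagonal(dist2, np.inf)            # exclude self-neighbors
    knn = np.argsort(dist2, axis=1)[:, :k]     # indices of the k nearest neighbors
    indegree = np.bincount(knn.ravel(), minlength=n)
    z = (indegree - indegree.mean()) / indegree.std()
    return float(np.mean(z ** 3))              # sample skewness

for d in (3, 20, 100):
    print(f"d = {d:3d}: in-degree skewness = {knn_indegree_skewness(d):.2f}")
# The skewness grows with d: a few "hub" points appear in many k-NN lists.
```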
Anomaly detection
In a 2012 survey, Zimek et al. identified a number of problems that arise when searching for anomalies in high-dimensional data, such as the concentration of distances and scores and the diluting effect of irrelevant attributes.[15]
Many of the analyzed specialized methods tackle one or another of these problems, but there remain many
open research questions.
Blessing of dimensionality
Surprisingly and despite the expected "curse of dimensionality" difficulties, common-sense heuristics based
on the most straightforward methods "can yield results which are almost surely optimal" for high-
dimensional problems.[23] The term "blessing of dimensionality" was introduced in the late 1990s.[23]
Donoho in his "Millennium manifesto" clearly explained why the "blessing of dimensionality" will form a
basis of future data mining.[24] The effects of the blessing of dimensionality were discovered in many
applications and found their foundation in the concentration of measure phenomena.[25] One example of
the blessing of dimensionality phenomenon is linear separability of a random point from a large finite
random set with high probability even if this set is exponentially large: the number of elements in this
random set can grow exponentially with dimension. Moreover, this linear functional can be selected in the
form of the simplest linear Fisher discriminant. This separability theorem was proven for a wide class of
probability distributions: general uniformly log-concave distributions, product distributions in a cube and
many other families (reviewed recently in [25]).
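A Monte Carlo sketch of this separability effect (illustrative; it uses a Fisher-type test of the form ⟨x, y⟩ < ⟨x, x⟩ for every other point y, a simplification of the criteria analysed in [25]):

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 3000   # size of the random set (can be much larger in theory)

def fraction_fisher_separable(d: int) -> float:
    """Fraction of points x separable from all others by the functional z -> <x, z>."""
    # Points sampled uniformly from the unit ball in R^d.
    g = rng.normal(size=(n_points, d))
    radii = rng.uniform(size=n_points) ** (1.0 / d)
    X = g / np.linalg.norm(g, axis=1, keepdims=True) * radii[:, None]

    gram = X @ X.T                       # pairwise inner products <x, y>
    self_dot = np.diag(gram).copy()
    np.fill_diagonal(gram, -np.inf)      # ignore the comparison of x with itself
    separable = gram.max(axis=1) < self_dot
    return float(separable.mean())

for d in (2, 10, 50, 200):
    print(f"d = {d:3d}: separable fraction = {fraction_fisher_separable(d):.3f}")
# In low dimensions only a few points pass the test; as d grows,
# almost every point is linearly separated from all the others.
```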
"The blessing of dimensionality and the curse of dimensionality are two sides of the same coin."[26] For
example, the typical property of essentially high-dimensional probability distributions in a high-dimensional
space is: the squared distance of random points to a selected point is, with high probability, close to the
average (or median) squared distance. This property significantly simplifies the expected geometry of data
and indexing of high-dimensional data (blessing),[27] but, at the same time, it makes the similarity search in
high dimensions difficult and even useless (curse).[28]
Zimek et al.[15] noted that while the typical formalizations of the curse of dimensionality affect i.i.d. data, data that is separated in each attribute becomes easier to handle even in high dimensions, and argued that the
signal-to-noise ratio matters: data becomes easier with each attribute that adds signal, and harder with
attributes that only add noise (irrelevant error) to the data. In particular for unsupervised data analysis this
effect is known as swamping.
See also
Bellman equation
Clustering high-dimensional data
Concentration of measure
Dimension reduction
Dynamic programming
Fourier-related transforms
Grand Tour
Linear least squares
Model order reduction
Multilinear PCA
Multilinear subspace learning
Principal component analysis
Singular value decomposition
References
1. Bellman, Richard Ernest; Rand Corporation (1957). Dynamic programming (https://books.go
ogle.com/books?id=wdtoPwAACAAJ). Princeton University Press. p. ix. ISBN 978-0-691-
07951-6.,
Republished: Bellman, Richard Ernest (2003). Dynamic Programming (https://books.google.
com/books?id=fyVtp3EMxasC). Courier Dover Publications. ISBN 978-0-486-42809-3.
2. Bellman, Richard Ernest (1961). Adaptive control processes: a guided tour (https://books.go
ogle.com/books?id=POAmAAAAMAAJ). Princeton University Press. ISBN 9780691079011.
3. Taylor, C. Robert (1993). "Dynamic Programming and the Curses of Dimensionality" (https://
books.google.com/books?id=71SsDwAAQBAJ&pg=PA1). Applications Of Dynamic
Programming To Agricultural Decision Problems. Westview Press. pp. 1–10. ISBN 0-8133-
8641-1.
4. Curse of Dimensionality - Georgia Tech - Machine Learning (https://www.youtube.com/watc
h?v=QZ0DtNFdDko), retrieved 2022-06-29
5. Koutroumbas, Konstantinos; Theodoridis, Sergios (2008). Pattern Recognition (https://www.
elsevier.com/books/pattern-recognition/theodoridis/978-1-59749-272-0) (4th ed.). Burlington.
ISBN 978-1-59749-272-0. Retrieved 8 January 2018.
6. Hughes, G.F. (January 1968). "On the mean accuracy of statistical pattern recognizers".
IEEE Transactions on Information Theory. 14 (1): 55–63. doi:10.1109/TIT.1968.1054102 (htt
ps://doi.org/10.1109%2FTIT.1968.1054102). S2CID 206729491 (https://api.semanticscholar.
org/CorpusID:206729491).
7. Trunk, G. V. (July 1979). "A Problem of Dimensionality: A Simple Example". IEEE
Transactions on Pattern Analysis and Machine Intelligence. PAMI-1 (3): 306–307.
doi:10.1109/TPAMI.1979.4766926 (https://doi.org/10.1109%2FTPAMI.1979.4766926).
PMID 21868861 (https://pubmed.ncbi.nlm.nih.gov/21868861). S2CID 13086902 (https://api.s
emanticscholar.org/CorpusID:13086902).
8. B. Chandrasekaran; A. K. Jain (1974). "Quantization Complexity and Independent
Measurements". IEEE Transactions on Computers. 23 (8): 102–106. doi:10.1109/T-
C.1974.223789 (https://doi.org/10.1109%2FT-C.1974.223789). S2CID 35360973 (https://api.
semanticscholar.org/CorpusID:35360973).
9. McLachlan, G. J. (2004). Discriminant Analysis and Statistical Pattern Recognition. Wiley
Interscience. ISBN 978-0-471-69115-0. MR 1190469 (https://mathscinet.ams.org/mathscinet
-getitem?mr=1190469).
10. A. Zollanvari; A. P. James; R. Sameni (2020). "A Theoretical Analysis of the Peaking
Phenomenon in Classification". Journal of Classification. 37 (2): 421–434.
doi:10.1007/s00357-019-09327-3 (https://doi.org/10.1007%2Fs00357-019-09327-3).
S2CID 253851666 (https://api.semanticscholar.org/CorpusID:253851666).
11. Schroff, Florian; Kalenichenko, Dmitry; Philbin, James (June 2015). "FaceNet: A unified
embedding for face recognition and clustering" (https://www.cv-foundation.org/openaccess/c
ontent_cvpr_2015/papers/Schroff_FaceNet_A_Unified_2015_CVPR_paper.pdf) (PDF).
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 815–823.
doi:10.1109/CVPR.2015.7298682 (https://doi.org/10.1109%2FCVPR.2015.7298682).
12. Yin, Zi; Shen, Yuanyuan (2018). "On the Dimensionality of Word Embedding" (https://procee
dings.neurips.cc/paper_files/paper/2018/file/b534ba68236ba543ae44b22bd110a1d6-Pape
r.pdf) (PDF). Advances in Neural Information Processing Systems. Curran Associates, Inc.
31.
13. Bailey, D.H.; Borwein, J.M.; Crandall, R.E. (2006), "Box integrals", Journal of Computational
and Applied Mathematics, 206: 196–208, doi:10.1016/j.cam.2006.06.010 (https://doi.org/10.1
016%2Fj.cam.2006.06.010), S2CID 2763194 (https://api.semanticscholar.org/CorpusID:276
3194)
14. Beyer, K.; Goldstein, J.; Ramakrishnan, R.; Shaft, U. (1999). When is "Nearest Neighbor"
Meaningful? (http://digital.library.wisc.edu/1793/60174). Proc. 7th International Conference
on Database Theory - ICDT'99. LNCS. Vol. 1540. pp. 217–235. doi:10.1007/3-540-49257-
7_15 (https://doi.org/10.1007%2F3-540-49257-7_15). ISBN 978-3-540-65452-0.
15. Zimek, A.; Schubert, E.; Kriegel, H.-P. (2012). "A survey on unsupervised outlier detection in
high-dimensional numerical data". Statistical Analysis and Data Mining. 5 (5): 363–387.
doi:10.1002/sam.11161 (https://doi.org/10.1002%2Fsam.11161). S2CID 6724536 (https://ap
i.semanticscholar.org/CorpusID:6724536).
16. Lin, Wen-Yan; Liu, Siying; Ren, Changhao; Cheung, Ngai-Man; Li, Hongdong; Matsushita,
Yasuyuki (2021). "Shell Theory: A Statistical Model of Reality" (https://ieeexplore.ieee.org/do
cument/9444188). IEEE Transactions on Pattern Analysis and Machine Intelligence. 44 (10):
6438–6453. doi:10.1109/TPAMI.2021.3084598 (https://doi.org/10.1109%2FTPAMI.2021.308
4598). ISSN 1939-3539 (https://www.worldcat.org/issn/1939-3539). PMID 34048335 (https://
pubmed.ncbi.nlm.nih.gov/34048335). S2CID 235242104 (https://api.semanticscholar.org/Co
rpusID:235242104).
17. Marimont, R.B.; Shapiro, M.B. (1979). "Nearest Neighbour Searches and the Curse of
Dimensionality". IMA J Appl Math. 24 (1): 59–70. doi:10.1093/imamat/24.1.59 (https://doi.org/
10.1093%2Fimamat%2F24.1.59).
18. Chávez, Edgar; Navarro, Gonzalo; Baeza-Yates, Ricardo; Marroquín, José Luis (2001).
"Searching in Metric Spaces". ACM Computing Surveys. 33 (3): 273–321.
CiteSeerX 10.1.1.100.7845 (https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.
7845). doi:10.1145/502807.502808 (https://doi.org/10.1145%2F502807.502808).
S2CID 3201604 (https://api.semanticscholar.org/CorpusID:3201604).
19. Houle, M. E.; Kriegel, H. P.; Kröger, P.; Schubert, E.; Zimek, A. (2010). Can Shared-Neighbor
Distances Defeat the Curse of Dimensionality? (http://www.dbs.ifi.lmu.de/~zimek/publication
s/SSDBM2010/SNN-SSDBM2010-preprint.pdf) (PDF). Scientific and Statistical Database
Management. Lecture Notes in Computer Science. Vol. 6187. p. 482. doi:10.1007/978-3-
642-13818-8_34 (https://doi.org/10.1007%2F978-3-642-13818-8_34). ISBN 978-3-642-
13817-1.
20. Bernecker, T.; Houle, M. E.; Kriegel, H. P.; Kröger, P.; Renz, M.; Schubert, E.; Zimek, A.
(2011). Quality of Similarity Rankings in Time Series. Symposium on Spatial and Temporal
Databases. Lecture Notes in Computer Science. Vol. 6849. p. 422. doi:10.1007/978-3-642-
22922-0_25 (https://doi.org/10.1007%2F978-3-642-22922-0_25). ISBN 978-3-642-22921-3.
21. Radovanović, Miloš; Nanopoulos, Alexandros; Ivanović, Mirjana (2010). "Hubs in space:
Popular nearest neighbors in high-dimensional data" (http://www.jmlr.org/papers/volume11/r
adovanovic10a/radovanovic10a.pdf) (PDF). Journal of Machine Learning Research. 11:
2487–2531.
22. Radovanović, M.; Nanopoulos, A.; Ivanović, M. (2010). On the existence of obstinate results
in vector space models. 33rd international ACM SIGIR conference on Research and
development in information retrieval - SIGIR '10. p. 186. doi:10.1145/1835449.1835482 (http
s://doi.org/10.1145%2F1835449.1835482). ISBN 9781450301534.
23. Kainen, Paul C. (1997), "Utilizing Geometric Anomalies of High Dimension: When
Complexity Makes Computation Easier", in Kárný, M.; Warwick, K. (eds.), Computer
Intensive Methods in Control and Signal Processing, pp. 283–294, doi:10.1007/978-1-4612-
1996-5_18 (https://doi.org/10.1007%2F978-1-4612-1996-5_18)
24. Donoho, David L. (2000), "High-Dimensional Data Analysis: The Curses and Blessings of
Dimensionality", Invited lecture at Mathematical Challenges of the 21st Century, AMS
National Meeting, Los Angeles, CA, USA, August 6-12, 2000, CiteSeerX 10.1.1.329.3392 (ht
tps://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.329.3392)
25. Gorban, Alexander N.; Makarov, Valery A.; Tyukin, Ivan Y. (2020). "High-Dimensional Brain
in a High-Dimensional World: Blessing of Dimensionality" (https://www.ncbi.nlm.nih.gov/pm
c/articles/PMC7516518). Entropy. 22 (1): 82. arXiv:2001.04959 (https://arxiv.org/abs/2001.04
959). Bibcode:2020Entrp..22...82G (https://ui.adsabs.harvard.edu/abs/2020Entrp..22...82G).
doi:10.3390/e22010082 (https://doi.org/10.3390%2Fe22010082). PMC 7516518 (https://ww
w.ncbi.nlm.nih.gov/pmc/articles/PMC7516518). PMID 33285855 (https://pubmed.ncbi.nlm.ni
h.gov/33285855).
26. Gorban, Alexander N.; Tyukin, Ivan Y. (2018). "Blessing of dimensionality: mathematical
foundations of the statistical physics of data" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC
5869543). Phil. Trans. R. Soc. A. 376 (2118): 20170237. arXiv:1801.03421 (https://arxiv.org/
abs/1801.03421). Bibcode:2018RSPTA.37670237G (https://ui.adsabs.harvard.edu/abs/201
8RSPTA.37670237G). doi:10.1098/rsta.2017.0237 (https://doi.org/10.1098%2Frsta.2017.02
37). PMC 5869543 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5869543).
PMID 29555807 (https://pubmed.ncbi.nlm.nih.gov/29555807).
27. Hecht-Nielsen, Robert (1994), "Context vectors: general-purpose approximate meaning
representations self-organized from raw data", in Zurada, J.M.; Marks, R.J.; Robinson, C.J.
(eds.), Computational intelligence: imitating life; Proceedings of World Congress on
Computational Intelligence, Neural Networks; 1994; Orlando; FL, Piscataway, NJ: IEEE
Press, pp. 43–56, ISBN 0780311043
28. Pestov, Vladimir (2013). "Is the k-NN classifier in high dimensions affected by the curse of
dimensionality?" (https://doi.org/10.1016%2Fj.camwa.2012.09.011). Comput. Math. Appl. 65
(10): 43–56. doi:10.1016/j.camwa.2012.09.011 (https://doi.org/10.1016%2Fj.camwa.2012.0
9.011).