Data Mining and Machine Learning:

Fundamental Concepts and Algorithms

dataminingbook.info

Mohammed J. Zaki¹, Wagner Meira Jr.²

¹ Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
² Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 6: High-dimensional Data

Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 6: High-dimensional Data 1 / 21
High-dimensional Space
Let D be an n × d data matrix. In data mining, the data is typically very high-dimensional.
Understanding the nature of high-dimensional space, or hyperspace, is very important,
especially because it does not behave like the more familiar geometry in two or three
dimensions.
Hyper-rectangle: The data space is a d-dimensional hyper-rectangle

$$R_d = \prod_{j=1}^{d} \Big[ \min(X_j),\ \max(X_j) \Big]$$

where $\min(X_j)$ and $\max(X_j)$ specify the range of $X_j$.

Hypercube: Assume the data is centered, and let m denote the maximum absolute attribute value

$$m = \max_{j=1}^{d} \max_{i=1}^{n} \Big\{ |x_{ij}| \Big\}$$

The data hyperspace can be represented as a hypercube, centered at 0, with all sides of length l = 2m, given as

$$H_d(l) = \Big\{ \mathbf{x} = (x_1, x_2, \ldots, x_d)^T \;\Big|\; \forall i,\ x_i \in [-l/2,\ l/2] \Big\}$$

The unit hypercube has all sides of length l = 1, and is denoted as $H_d(1)$.
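The hyper-rectangle and hypercube definitions can be sketched in a few lines of Python. This is only an illustration: the function names and the toy data matrix are assumptions made for the example and do not come from the slides.

```python
def center(data):
    """Subtract the per-attribute mean from every row of an n x d list of lists."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def hyper_rectangle(data):
    """Return the range (min(X_j), max(X_j)) of every attribute j."""
    d = len(data[0])
    return [(min(row[j] for row in data), max(row[j] for row in data))
            for j in range(d)]


def hypercube_side(centered):
    """Return l = 2m, where m is the largest absolute attribute value."""
    m = max(abs(x) for row in centered for x in row)
    return 2 * m


# Hypothetical toy data (3 points, 2 attributes), not taken from the slides.
toy = [[1.0, 4.0], [3.0, 0.0], [5.0, 2.0]]
centered = center(toy)
bounds = hyper_rectangle(centered)
side = hypercube_side(centered)
```

For the toy data every attribute ranges over [−2, 2] after centering, so the enclosing hypercube has side length l = 4.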
Hypersphere
Assume that the data has been centered, so that µ = 0. Let r denote the largest magnitude among all points:

$$r = \max_{i} \Big\{ \|\mathbf{x}_i\| \Big\}$$

The data hyperspace can be represented as a d-dimensional hyperball centered at 0 with radius r, defined as

$$B_d(r) = \big\{ \mathbf{x} \;\big|\; \|\mathbf{x}\| \le r \big\} \quad \text{or} \quad B_d(r) = \Big\{ \mathbf{x} = (x_1, x_2, \ldots, x_d) \;\Big|\; \sum_{j=1}^{d} x_j^2 \le r^2 \Big\}$$

The surface of the hyperball is called a hypersphere, and it consists of all the points exactly at distance r from the center of the hyperball

$$S_d(r) = \big\{ \mathbf{x} \;\big|\; \|\mathbf{x}\| = r \big\} \quad \text{or} \quad S_d(r) = \Big\{ \mathbf{x} = (x_1, x_2, \ldots, x_d) \;\Big|\; \sum_{j=1}^{d} (x_j)^2 = r^2 \Big\}$$

Iris Data Hyperspace: Hypercube and Hypersphere
l = 4.12 and r = 2.19

[Figure: scatter plot of the centered Iris data, X1: sepal length (horizontal axis) versus X2: sepal width (vertical axis), both from −2 to 2, with the radius r marked.]
High-dimensional Volumes
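Hypercube: The volume of a hypercube with edge length l is given as

$$\mathrm{vol}(H_d(l)) = l^d$$

Hypersphere: The volume of a hyperball and its corresponding hypersphere is identical.

The volume of a hypersphere is given as

In 1D: $\mathrm{vol}(S_1(r)) = 2r$
In 2D: $\mathrm{vol}(S_2(r)) = \pi r^2$
In 3D: $\mathrm{vol}(S_3(r)) = \frac{4}{3}\pi r^3$

In d dimensions:

$$\mathrm{vol}(S_d(r)) = K_d\, r^d = \left( \frac{\pi^{d/2}}{\Gamma\left(\frac{d}{2} + 1\right)} \right) r^d$$

where

$$\Gamma\left(\frac{d}{2} + 1\right) = \begin{cases} \left(\frac{d}{2}\right)! & \text{if } d \text{ is even} \\[4pt] \sqrt{\pi}\, \dfrac{d!!}{2^{(d+1)/2}} & \text{if } d \text{ is odd} \end{cases}$$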
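As a quick check of the general formula, the following Python sketch evaluates it with the standard library's `math.gamma` and compares the even/odd case split for Γ(d/2 + 1) against the library value. The function names are illustrative assumptions, not part of the slides.

```python
import math


def hypersphere_volume(d, r=1.0):
    """vol(S_d(r)) = pi^(d/2) / Gamma(d/2 + 1) * r^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d


def double_factorial(n):
    """n!! = n * (n - 2) * (n - 4) * ... down to 1 or 2."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def gamma_half_d_plus_one(d):
    """Gamma(d/2 + 1) computed through the even/odd case split."""
    if d % 2 == 0:
        return float(math.factorial(d // 2))
    return math.sqrt(math.pi) * double_factorial(d) / 2 ** ((d + 1) // 2)
```

With r = 2, the general formula gives 4, 4π, and (32/3)π for d = 1, 2, 3, matching the familiar length, area, and volume formulas.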

Volume of Unit Hypersphere

With increasing dimensionality the hypersphere volume first increases up to a point, and
then starts to decrease, and ultimately vanishes. In particular, for the unit hypersphere
with r = 1,

$$\lim_{d \to \infty} \mathrm{vol}(S_d(1)) = \lim_{d \to \infty} \frac{\pi^{d/2}}{\Gamma\left(\frac{d}{2} + 1\right)} \to 0$$

[Figure: vol(S_d(1)) plotted against the dimensionality d, for d from 0 to 50.]
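The rise and fall of the unit hypersphere volume can be reproduced numerically. The sketch below, with illustrative names that are not from the slides, tabulates the volume for d = 0 to 50 and locates the dimension at which it peaks.

```python
import math


def unit_hypersphere_volume(d):
    """vol(S_d(1)) = pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


# Tabulate the volume over the same range of d as the plot.
volumes = {d: unit_hypersphere_volume(d) for d in range(0, 51)}

# Dimension with the largest unit-hypersphere volume.
peak_d = max(volumes, key=volumes.get)
```

The maximum occurs at d = 5, where the volume equals 8π²/15 (about 5.26); by d = 50 the volume is already below 10⁻¹².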
Hypersphere Inscribed within Hypercube

Consider the space enclosed within the largest hypersphere that can be
accommodated within a hypercube (which represents the dataspace).
The ratio of the volume of the hypersphere of radius r to the hypercube with side
length l = 2r is given as

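In 2 dimensions:

$$\frac{\mathrm{vol}(S_2(r))}{\mathrm{vol}(H_2(2r))} = \frac{\pi r^2}{4 r^2} = \frac{\pi}{4} = 78.5\%$$

In 3 dimensions:

$$\frac{\mathrm{vol}(S_3(r))}{\mathrm{vol}(H_3(2r))} = \frac{\frac{4}{3}\pi r^3}{8 r^3} = \frac{\pi}{6} = 52.4\%$$

In d dimensions:

$$\lim_{d \to \infty} \frac{\mathrm{vol}(S_d(r))}{\mathrm{vol}(H_d(2r))} = \lim_{d \to \infty} \frac{\pi^{d/2}}{2^d\, \Gamma\left(\frac{d}{2} + 1\right)} \to 0$$

As the dimensionality increases, most of the volume of the hypercube is in the “corners,” whereas the center is essentially empty.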
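The ratio above is easy to evaluate for any d. A minimal sketch, assuming only the formula on this slide (the function name is illustrative):

```python
import math


def inscribed_ratio(d):
    """Volume of the inscribed hypersphere divided by that of the hypercube.

    The radius r cancels, leaving pi^(d/2) / (2^d * Gamma(d/2 + 1)).
    """
    return math.pi ** (d / 2) / (2 ** d * math.gamma(d / 2 + 1))


# Ratios for a few dimensions, as fractions of the hypercube volume.
ratios = {d: inscribed_ratio(d) for d in (2, 3, 5, 10, 20)}
```

The values for d = 2 and d = 3 are π/4 and π/6, and the ratio falls below 10⁻⁷ by d = 20.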

Hypersphere Inscribed inside a Hypercube

[Figure: a hypersphere of radius r inscribed inside a hypercube whose sides span −r to r along each axis.]

Conceptual View of High-dimensional Space
Two, three, four, and higher dimensions

All the volume of the hyperspace is in the corners, with the center being
essentially empty.

High-dimensional space looks like a rolled-up porcupine!

[Figure: conceptual views of the space in (a) 2D, (b) 3D, (c) 4D, and (d) d dimensions.]

Volume of a Thin Shell

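The volume of a thin hypershell of width ε is given as

$$\mathrm{vol}(S_d(r, \epsilon)) = \mathrm{vol}(S_d(r)) - \mathrm{vol}(S_d(r - \epsilon)) = K_d\, r^d - K_d\, (r - \epsilon)^d$$

[Figure: a hypershell of width ε between the inner radius r − ε and the outer radius r.]

The ratio of the volume of the thin shell to the volume of the outer sphere is

$$\frac{\mathrm{vol}(S_d(r, \epsilon))}{\mathrm{vol}(S_d(r))} = \frac{K_d\, r^d - K_d\, (r - \epsilon)^d}{K_d\, r^d} = 1 - \left(1 - \frac{\epsilon}{r}\right)^d$$

As d increases, we have

$$\lim_{d \to \infty} \frac{\mathrm{vol}(S_d(r, \epsilon))}{\mathrm{vol}(S_d(r))} = \lim_{d \to \infty} 1 - \left(1 - \frac{\epsilon}{r}\right)^d \to 1$$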
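To get a feel for how quickly the shell dominates, the ratio can be evaluated directly. This is a small sketch; the function name and the choice ε/r = 0.01 are assumptions made for illustration.

```python
def shell_fraction(d, r=1.0, eps=0.01):
    """Fraction of a d-dimensional hyperball's volume within eps of its surface."""
    if not 0 <= eps <= r:
        raise ValueError("eps must lie in the interval [0, r]")
    return 1.0 - (1.0 - eps / r) ** d


# Shell fraction for a shell whose width is 1% of the radius.
fractions = {d: shell_fraction(d) for d in (1, 2, 10, 100, 1000)}
```

With a shell only 1% of the radius wide, the shell holds 1% of the volume in one dimension but more than 99.99% of it in 1000 dimensions.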

Diagonals in Hyperspace

Consider a d-dimensional hypercube, with origin $\mathbf{0}_d = (0_1, 0_2, \ldots, 0_d)$, and bounded in each dimension in the range [−1, 1]. Each “corner” of the hyperspace is a d-dimensional vector of the form $(\pm 1_1, \pm 1_2, \ldots, \pm 1_d)^T$.
Let $\mathbf{e}_i = (0_1, \ldots, 1_i, \ldots, 0_d)^T$ denote the d-dimensional canonical unit vector in dimension i, and let $\mathbf{1}$ denote the d-dimensional diagonal vector $(1_1, 1_2, \ldots, 1_d)^T$.
Consider the angle $\theta_d$ between the diagonal vector $\mathbf{1}$ and the first axis $\mathbf{e}_1$, in d dimensions:

$$\cos\theta_d = \frac{\mathbf{e}_1^T \mathbf{1}}{\|\mathbf{e}_1\|\, \|\mathbf{1}\|} = \frac{\mathbf{e}_1^T \mathbf{1}}{\sqrt{\mathbf{e}_1^T \mathbf{e}_1}\, \sqrt{\mathbf{1}^T \mathbf{1}}} = \frac{1}{\sqrt{1}\, \sqrt{d}} = \frac{1}{\sqrt{d}}$$

As d increases, we have

$$\lim_{d \to \infty} \cos\theta_d = \lim_{d \to \infty} \frac{1}{\sqrt{d}} \to 0$$

which implies that

$$\lim_{d \to \infty} \theta_d \to \frac{\pi}{2} = 90^\circ$$
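The angle can also be computed from the vectors themselves rather than from the closed form. The sketch below uses an illustrative function name and plain Python lists; it builds e_1 and the diagonal vector 1 explicitly and applies the cosine formula.

```python
import math


def diagonal_angle_deg(d):
    """Angle in degrees between the diagonal vector 1 and the axis e_1 in d dimensions."""
    e1 = [1.0] + [0.0] * (d - 1)
    ones = [1.0] * d
    dot = sum(a * b for a, b in zip(e1, ones))
    norm_e1 = math.sqrt(sum(a * a for a in e1))
    norm_ones = math.sqrt(sum(a * a for a in ones))
    return math.degrees(math.acos(dot / (norm_e1 * norm_ones)))


# Angles for increasing dimensionality.
angles = {d: diagonal_angle_deg(d) for d in (2, 3, 10, 100, 10_000)}
```

The angle is 45° in 2D, about 54.7° in 3D, and exceeds 89° once d reaches 10,000.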

Angle between Diagonal Vector 1 and e 1

[Figure: the angle θ between the diagonal vector 1 and the axis e_1, (a) in 2D and (b) in 3D, with each axis ranging from −1 to 1.]
In high dimensions all of the diagonal vectors are nearly perpendicular (or orthogonal) to all the coordinate axes! Each of the $2^{d-1}$ new axes connecting pairs of the $2^d$ corners is essentially orthogonal to all of the d principal coordinate axes! Thus, in effect, high-dimensional space has an exponential number of orthogonal “axes.”

Density of the Multivariate Normal
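Consider the standard multivariate normal distribution with µ = 0, and Σ = I

$$f(\mathbf{x}) = \frac{1}{(\sqrt{2\pi})^d} \exp\left\{ -\frac{\mathbf{x}^T \mathbf{x}}{2} \right\}$$

The peak of the density is at the mean. Consider the set of points x with density at least α fraction of the density at the mean

$$\frac{f(\mathbf{x})}{f(\mathbf{0})} \ge \alpha$$

$$\exp\left\{ -\frac{\mathbf{x}^T \mathbf{x}}{2} \right\} \ge \alpha$$

$$\mathbf{x}^T \mathbf{x} \le -2 \ln(\alpha)$$

$$\sum_{i=1}^{d} (x_i)^2 \le -2 \ln(\alpha)$$

The sum of d squared IID standard normal random variables follows a chi-squared distribution with d degrees of freedom, $\chi^2_d$. Thus,

$$P\left( \frac{f(\mathbf{x})}{f(\mathbf{0})} \ge \alpha \right) = F_{\chi^2_d}\big( -2 \ln(\alpha) \big)$$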
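This probability can be evaluated without a statistics library. The sketch below is one possible implementation, not the authors' method: it computes the chi-squared CDF from the closed forms for d = 1 (error function) and d = 2 (exponential), then steps up two degrees of freedom at a time with the standard incomplete-gamma recurrence. The function names are illustrative, and the recurrence loses accuracy to cancellation for large d.

```python
import math


def chi2_cdf(x, d):
    """CDF of the chi-squared distribution with d degrees of freedom (d >= 1)."""
    if x <= 0:
        return 0.0
    half = x / 2
    # Base cases: d = 1 uses the error function, d = 2 is an exponential CDF.
    if d % 2 == 1:
        k, cdf = 1, math.erf(math.sqrt(half))
    else:
        k, cdf = 2, 1.0 - math.exp(-half)
    # Recurrence: F(x; k + 2) = F(x; k) - (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < d:
        cdf -= half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return cdf


def prob_density_at_least(alpha, d):
    """P(f(x)/f(0) >= alpha) for the standard d-dimensional normal."""
    return chi2_cdf(-2 * math.log(alpha), d)


# Probability of landing in the region with at least half the peak density.
probs = {d: prob_density_at_least(0.5, d) for d in (1, 2, 3, 10)}
```

For α = 0.5 this gives roughly 0.76, 0.50, 0.29, and 0.00075 for d = 1, 2, 3, and 10.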

Density Contour for α Fraction of the Density at the Mean:
One Dimension

Let α = 0.5, then −2 ln(0.5) = 1.386 and $F_{\chi^2_1}(1.386) = 0.76$. Thus, 24% of the density is in the tail regions.

[Figure: one-dimensional standard normal density for x from −4 to 4, with the level α = 0.5 of the peak density marked.]

Density Contour for α Fraction of the Density at the Mean:
Two Dimensions
Let α = 0.5, then −2 ln(0.5) = 1.386 and $F_{\chi^2_2}(1.386) = 0.50$. Thus, 50% of the density is in the tail regions.

[Figure: two-dimensional standard normal density f(x) over X1 and X2, each from −4 to 4, with the contour at α = 0.5 of the peak density marked.]

Chi-Squared Distribution: P(f(x)/f(0) ≥ α)

For α = 0.5, this probability decreases rapidly with dimensionality. For 2D, it is 0.5. For 3D it is 0.29, i.e., 71% of the density is in the tails. By d = 10, it decreases to 0.075%, that is, 99.925% of the points lie in the extreme or tail regions.
[Figure: two chi-squared density plots f(x) for x from 0 to 15, one annotated F = 0.5 and the other F = 0.29.]

Hypersphere Volume: Polar Coordinates in 2D
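The point $\mathbf{x} = (x_1, x_2)$ in polar coordinates:

$$x_1 = r \cos\theta_1 = r c_1 \qquad x_2 = r \sin\theta_1 = r s_1$$

where $r = \|\mathbf{x}\|$, and $\cos\theta_1 = c_1$ and $\sin\theta_1 = s_1$.

[Figure: the point (x_1, x_2) at distance r from the origin and at angle θ_1 from the X1 axis.]

The Jacobian matrix for this transformation is given as

$$J(\theta_1) = \begin{pmatrix} \dfrac{\partial x_1}{\partial r} & \dfrac{\partial x_1}{\partial \theta_1} \\[6pt] \dfrac{\partial x_2}{\partial r} & \dfrac{\partial x_2}{\partial \theta_1} \end{pmatrix} = \begin{pmatrix} c_1 & -r s_1 \\ s_1 & r c_1 \end{pmatrix}$$

so that $\det(J(\theta_1)) = r c_1^2 + r s_1^2 = r$.

Hypersphere volume is obtained by integration over r and $\theta_1$ (with r > 0, and $0 \le \theta_1 \le 2\pi$):

$$\mathrm{vol}(S_2(r)) = \int_{r} \int_{\theta_1} \big|\det(J(\theta_1))\big| \, dr \, d\theta_1 = \int_0^r \int_0^{2\pi} r \, dr \, d\theta_1 = \int_0^r r \, dr \int_0^{2\pi} d\theta_1 = \frac{r^2}{2} \Big|_0^r \cdot \theta_1 \Big|_0^{2\pi} = \pi r^2$$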
Hypersphere Volume: Polar Coordinates in 3D
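The point $\mathbf{x} = (x_1, x_2, x_3)$ in polar coordinates:

$$x_1 = r \cos\theta_1 \cos\theta_2 = r c_1 c_2$$
$$x_2 = r \cos\theta_1 \sin\theta_2 = r c_1 s_2$$
$$x_3 = r \sin\theta_1 = r s_1$$

[Figure: the point (x_1, x_2, x_3) at distance r from the origin, with the angles θ_1 and θ_2 shown relative to the axes X1, X2, and X3.]

The Jacobian matrix is given as

$$J(\theta_1, \theta_2) = \begin{pmatrix} c_1 c_2 & -r s_1 c_2 & -r c_1 s_2 \\ c_1 s_2 & -r s_1 s_2 & r c_1 c_2 \\ s_1 & r c_1 & 0 \end{pmatrix}$$

with determinant $\det(J(\theta_1, \theta_2)) = -r^2 c_1$.

The volume of the hypersphere for d = 3 is obtained via a triple integral with r > 0, $-\pi/2 \le \theta_1 \le \pi/2$, and $0 \le \theta_2 \le 2\pi$:

$$\mathrm{vol}(S_3(r)) = \int_{r} \int_{\theta_1} \int_{\theta_2} \big|\det(J(\theta_1, \theta_2))\big| \, dr \, d\theta_1 \, d\theta_2 = \int_0^r r^2 \, dr \int_{-\pi/2}^{\pi/2} c_1 \, d\theta_1 \int_0^{2\pi} d\theta_2 = \frac{4}{3} \pi r^3$$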
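The triple integral can be checked numerically. The following sketch is an illustration rather than part of the original derivation: it builds the 3 × 3 Jacobian from the slide, expands its determinant by cofactors, and integrates the absolute value with a simple midpoint rule. The function names and the grid size are assumptions.

```python
import math


def jacobian_det_3d(r, theta1, theta2):
    """Determinant of the 3D polar-coordinate Jacobian, by cofactor expansion."""
    c1, s1 = math.cos(theta1), math.sin(theta1)
    c2, s2 = math.cos(theta2), math.sin(theta2)
    J = [[c1 * c2, -r * s1 * c2, -r * c1 * s2],
         [c1 * s2, -r * s1 * s2, r * c1 * c2],
         [s1, r * c1, 0.0]]
    return (J[0][0] * (J[1][1] * J[2][2] - J[1][2] * J[2][1])
            - J[0][1] * (J[1][0] * J[2][2] - J[1][2] * J[2][0])
            + J[0][2] * (J[1][0] * J[2][1] - J[1][1] * J[2][0]))


def ball_volume_3d(radius, steps=40):
    """Midpoint-rule triple integral of |det J| over r, theta1, and theta2."""
    dr = radius / steps
    dt1 = math.pi / steps          # theta1 ranges over [-pi/2, pi/2]
    dt2 = 2 * math.pi / steps      # theta2 ranges over [0, 2*pi]
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        for j in range(steps):
            t1 = -math.pi / 2 + (j + 0.5) * dt1
            for k in range(steps):
                t2 = (k + 0.5) * dt2
                total += abs(jacobian_det_3d(r, t1, t2))
    return total * dr * dt1 * dt2
```

On a 40 × 40 × 40 grid the numerical result agrees with (4/3)πr³ to within about 0.1%, and the determinant evaluates to −r² cos θ_1 at any sample point.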

Hypersphere Volume in d Dimensions
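The determinant of the d-dimensional Jacobian matrix is

$$\det(J(\theta_1, \theta_2, \ldots, \theta_{d-1})) = (-1)^d \, r^{d-1} \, c_1^{d-2} \, c_2^{d-3} \cdots c_{d-2}$$

The volume of the hypersphere is given by the d-dimensional integral with r > 0, $-\pi/2 \le \theta_i \le \pi/2$ for all i = 1, ..., d − 2, and $0 \le \theta_{d-1} \le 2\pi$:

$$\mathrm{vol}(S_d(r)) = \int_{r} \int_{\theta_1} \int_{\theta_2} \cdots \int_{\theta_{d-1}} \big|\det(J(\theta_1, \theta_2, \ldots, \theta_{d-1}))\big| \, dr \, d\theta_1 \, d\theta_2 \cdots d\theta_{d-1}$$

$$= \int_0^r r^{d-1} \, dr \int_{-\pi/2}^{\pi/2} c_1^{d-2} \, d\theta_1 \cdots \int_{-\pi/2}^{\pi/2} c_{d-2} \, d\theta_{d-2} \int_0^{2\pi} d\theta_{d-1}$$

$$= \frac{r^d}{d} \cdot \frac{\Gamma\left(\frac{d-1}{2}\right) \Gamma\left(\frac{1}{2}\right)}{\Gamma\left(\frac{d}{2}\right)} \cdot \frac{\Gamma\left(\frac{d-2}{2}\right) \Gamma\left(\frac{1}{2}\right)}{\Gamma\left(\frac{d-1}{2}\right)} \cdots \frac{\Gamma(1)\, \Gamma\left(\frac{1}{2}\right)}{\Gamma\left(\frac{3}{2}\right)} \cdot 2\pi$$

$$= \frac{\pi \, \Gamma\left(\frac{1}{2}\right)^{d-2} r^d}{\frac{d}{2} \, \Gamma\left(\frac{d}{2}\right)}$$

$$= \left( \frac{\pi^{d/2}}{\Gamma\left(\frac{d}{2} + 1\right)} \right) r^d$$

The last step uses $\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}$, so that $\Gamma\left(\frac{1}{2}\right)^{d-2} = \pi^{d/2 - 1}$, together with $\frac{d}{2}\, \Gamma\left(\frac{d}{2}\right) = \Gamma\left(\frac{d}{2} + 1\right)$.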
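The telescoping product in this derivation can be verified numerically. The sketch below uses illustrative function names: it multiplies the radial integral, the d − 2 cosine-power integrals (each written with the Gamma-function identity used above), and the final factor 2π, then compares the result with the closed form.

```python
import math


def cos_power_integral(k):
    """Integral of cos(theta)^k over [-pi/2, pi/2], via the Gamma-function identity."""
    return math.gamma((k + 1) / 2) * math.gamma(0.5) / math.gamma(k / 2 + 1)


def volume_by_integration(d, r=1.0):
    """Multiply the one-dimensional integrals from the derivation (requires d >= 2)."""
    radial = r ** d / d                 # integral of r^(d-1) from 0 to r
    angular = 1.0
    for k in range(1, d - 1):           # cosine exponents 1, 2, ..., d - 2
        angular *= cos_power_integral(k)
    return radial * angular * 2 * math.pi


def volume_closed_form(d, r=1.0):
    """pi^(d/2) / Gamma(d/2 + 1) * r^d."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d
```

For d = 2 the loop is empty and the product reduces to (r²/2) · 2π = πr², matching the two-dimensional case.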
