Manifold Fitting
3 Yau Mathematical Sciences Center, Jingzhai, Tsinghua University, Haidian District, Beijing, 100084 China
While classical data analysis has addressed observations that are real numbers,
* Research is supported by MOE Tier 2 grant A-0008520-00-00 and Tier 1 grant A8000987-00-00 at the
National University of Singapore.
† ZY thanks the support from the Center of Mathematical Sciences and Applications (CMSA) at Harvard
University during his visit since 2022. ZY thanks Professor Charles Fefferman for his helpful discussions. Part of
the work has been done during the Harvard Conference on Geometry and Statistics, supported by CMSA during
Feb 27-March 1, 2023.
MSC2020 subject classifications: Primary 62R99; secondary 62A99.
Keywords and phrases: Manifold fitting, Convergence, Hausdorff distance, Reach.
To address these problems, various mathematical approaches have been proposed (see [13,
14, 15, 17, 16]). However, many of these methods rely on restrictive assumptions, making it
challenging to implement them as efficient algorithms. As the manifold hypothesis continues to
be a foundational element in statistical research, the Geometric Whitney Problems, particularly
Problem I, merit further exploration and discussion within the statistical community.
The manifold hypothesis posits that high-dimensional data typically lie close to a low-
dimensional manifold. The genesis of the manifold hypothesis stems from the observation that
numerous physical systems possess a limited number of underlying variables that determine
their behavior, even when they display intricate and diverse phenomena in high-dimensional
spaces. For instance, while the motion of a body can be expressed as high-dimensional signals,
the actual motion signals lie on a low-dimensional manifold, as they are generated by a
small number of joint angles and muscle activations. Analogous phenomena arise in diverse
areas, such as speech signals, face images, climate models, and fluid turbulence. The manifold
hypothesis is thus essential for efficient and accurate high-dimensional data analysis in fields
such as computer vision, speech analysis, and medical diagnosis.
In early statistics, one common approach for approximating high-dimensional data was to
use a lower-dimensional linear subspace. One widely used technique for identifying the linear
subspace of high-dimensional data is Principal Component Analysis (PCA). Specifically, PCA
involves computing the eigenvectors of the sample covariance matrix and then employing
these eigenvectors to map the data points onto a lower-dimensional space. One of the principal
advantages of methods like this is that they can yield a simplified representation of the data,
facilitating visualization and analysis. Nevertheless, linear subspaces can only capture linear
relationships in the data and may fail to represent non-linear patterns accurately. To address
these limitations, it is often necessary to employ more advanced manifold-learning techniques
that can better capture non-linear relationships and preserve key information in the data. These
algorithms can be grouped into three categories based on their purpose: manifold embedding,
manifold denoising, and manifold fitting. The key distinction between them is depicted in
Figure 1.
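As a concrete illustration of the linear approach just described, the following is a minimal Python sketch of PCA via the eigendecomposition of the sample covariance matrix; the toy data and function name are illustrative and not part of the original text.

```python
import numpy as np

def pca_embed(X, d):
    """Project the rows of X (n x D) onto the top-d principal subspace."""
    X_centered = X - X.mean(axis=0)
    # Sample covariance matrix (D x D).
    cov = np.cov(X_centered, rowvar=False)
    # eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    top = eigvecs[:, ::-1][:, :d]   # eigenvectors of the d largest eigenvalues
    return X_centered @ top         # n x d coordinates in the linear subspace

# Toy usage: noisy points near a 1-dimensional subspace of R^3.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, -1.0]]) + 0.05 * rng.normal(size=(200, 3))
Z = pca_embed(X, d=1)
```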
FIG 1. Illustrations for (a) manifold embedding, (b) manifold denoising, and (c) manifold fitting.
Manifold embedding seeks a low-dimensional representation of high-dimensional data that preserves local structure, often yielding a clearer demarcation between classes. Many scholars have performed various types of research
on manifold-embedding algorithms, such as Isometric Mapping ([38]), Locally Linear Em-
bedding ([36, 8]), Laplacian Eigenmaps ([1]), Local Tangent Space Alignment ([44]), and
Uniform Manifold Approximation Map ([31]). Although these algorithms achieve useful
representations of real-world data, few of them provide theoretical guarantees. Furthermore,
these algorithms typically do not consider the geometry of the original manifold or provide
any guarantee on the smoothness of the embedded points.
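For readers who want to experiment with such embedding algorithms, here is a minimal sketch using scikit-learn (assumed to be available); the Swiss-roll data and parameter values are illustrative choices rather than settings from the cited works.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap, LocallyLinearEmbedding

# A 2-dimensional manifold (the Swiss roll) embedded in R^3, with noise.
X, _ = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

# Isometric Mapping and Locally Linear Embedding into R^2.
emb_isomap = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
emb_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2,
                                 random_state=0).fit_transform(X)
```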
Manifold denoising aims to address outliers in data sets distributed along a low-dimensional
manifold. Because of disturbances during collection, storage, and transportation, real-world
manifold-distributed data often contain noise. Manifold denoising methods are designed to
reduce the effect of noise and produce a new set of points closer to the underlying manifold.
There are two main approaches to achieving this: feature-based and expectation-based methods.
Feature-based methods extract features using techniques such as wavelet transformation
([7, 41]) or neural networks ([30]) and then drop non-informative features to recover denoised
points via inverse transformations. However, such methods are typically validated only through
simulation studies, lacking theoretical analysis. On the other hand, expectation-based methods
can achieve manifold denoising by shifting the local sample mean ([39]) or by fitting a local
mean function ([37]). However, these methods lack a solid theoretical basis or require overly
restrictive assumptions.
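As a rough illustration of the expectation-based idea, the sketch below shifts each point toward its local sample mean; it is a simplified stand-in for, not a reproduction of, the procedures in [39] or [37], and the neighborhood radius is an arbitrary choice.

```python
import numpy as np

def local_mean_shift(Y, r, n_iter=1):
    """Move each point toward the mean of its r-neighborhood (one or more passes)."""
    Z = Y.copy()
    for _ in range(n_iter):
        shifted = np.empty_like(Z)
        for j, z in enumerate(Z):
            mask = np.linalg.norm(Y - z, axis=1) <= r   # neighbors in the original sample
            shifted[j] = Y[mask].mean(axis=0)
        Z = shifted
    return Z

# Toy usage: noisy samples around the unit circle in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 1000)
Y = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(1000, 2))
Y_denoised = local_mean_shift(Y, r=0.2)
```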
Manifold fitting is a crucial and challenging problem in manifold learning. It aims to
reconstruct a smooth manifold that closely approximates the geometry and topology of
a hidden low-dimensional manifold, using only a data set that lies on or near it. Unlike
manifold embedding or denoising, manifold fitting strongly emphasizes the local and global
properties of the approximation. It seeks to ensure that the generated manifold’s geometry,
particularly its curvature and smoothness, is precise. The application of manifold fitting can
significantly enhance data analysis by providing a deeper understanding of data geometry. A
key benefit of manifold fitting is its ability to uncover the shape of the hidden manifold by
projecting the samples onto the learned manifold. For example, when reconstructing the three-dimensional structure of a given protein molecule, the molecule must be imaged from different angles several times via cryo-electron microscopy (cryo-EM). Although the space of molecular orientations is equivalent to the Lie group SO(3), the cryo-EM images are often buried in high-dimensional noise because of the scale of the pixels. Manifold fitting helps recover the
underlying low-dimensional Lie group of protein-molecule images and infer the structure of
the protein from it. In a similar manner, manifold fitting can also be used for light detection
and ranging ([25]), as well as wind-direction detection ([6]). In addition, manifold fitting can
generate manifold-valued data with a specific distribution. This capability is potentially useful
in generative machine-learning models, such as Generative Adversarial Network (GAN, [22]).
1.1. Main Contribution. The main objective of this paper is to address the problem
of manifold fitting by developing a smooth manifold estimator based on a set of noisy
observations in the ambient space. Our goal is to achieve a state-of-the-art geometric error
bound while preserving the geometric properties of the manifold. To this end, we employ the
Hausdorff distance to measure the estimation error and reach to quantify the smoothness of
manifolds. Further details and definitions of these concepts are provided in Section 2.1.
Specifically, we consider a random vector Y ∈ RD that can be expressed as
(1.1) Y = X + ξ,
where $X \in \mathbb{R}^D$ is an unobserved random vector following a distribution $\omega$ supported on the latent manifold $\mathcal{M}$, and $\xi \sim \phi_\sigma$ represents the ambient-space observation noise, independent of $X$.
Assume $\mathcal{Y} = \{y_i\}_{i=1}^N \subset \mathbb{R}^D$ is the collection of observed data points, also in the form of
(1.3) $y_i = x_i + \xi_i, \quad \text{for } i = 1, \cdots, N,$
with $(y_i, x_i, \xi_i)$ being $N$ independent and identically distributed realizations of $(Y, X, \xi)$. Based on $\mathcal{Y}$, we construct an estimator $\widehat{\mathcal{M}}$ for $\mathcal{M}$ and provide theoretical justification for it under the following main assumptions:
• The latent manifold M is a compact and twice-differentiable d-dimensional sub-manifold,
embedded in the ambient space RD . Its volume with respect to the d-dimensional Hausdorff
measure is upper bounded by V , and its reach is lower bounded by a fixed constant τ .
• The distribution ω is a uniform distribution, with respect to the d-dimensional Hausdorff
measure, on M.
• The noise distribution $\phi_\sigma$ is a Gaussian distribution supported on $\mathbb{R}^D$ with density function
$$\phi_\sigma(\xi) = \Big(\frac{1}{2\pi\sigma^2}\Big)^{D/2}\exp\Big(-\frac{\|\xi\|_2^2}{2\sigma^2}\Big).$$
• The intrinsic dimension $d$ and the noise standard deviation $\sigma$ are known. (A simulation sketch of this data-generating model follows this list.)
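The following is a minimal simulation sketch of the data-generating model (1.1) and (1.3) under the assumptions above, with $d = 1$ (a unit circle occupying the first two coordinates of $\mathbb{R}^D$); the dimension, sample size, and noise level are illustrative.

```python
import numpy as np

def sample_noisy_circle(N, D=3, sigma=0.06, seed=0):
    """Draw x_i uniformly from a unit circle embedded in R^D and add Gaussian noise."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, N)
    X = np.zeros((N, D))
    X[:, 0], X[:, 1] = np.cos(theta), np.sin(theta)   # latent points on M (d = 1)
    xi = sigma * rng.standard_normal((N, D))          # xi ~ N(0, sigma^2 I_D)
    return X + xi, X                                  # observations y_i = x_i + xi_i

Y, X = sample_noisy_circle(N=50000)
```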
In general, $\widehat{\mathcal{M}}$ is constructed by estimating the projection of points. For a point $y$ in the
domain Γ = {y : d(y, M) ≤ Cσ}, we estimate its projection on M in a two-step manner:
determining the direction and moving y in that direction. The estimation has both theoretical
and algorithmic contributions. From the theoretical perspective:
• On the population level, given the observation distribution $\nu$ and the domain $\Gamma$, we are able to obtain a smoothly bordered set $\mathcal{S} \subset \mathbb{R}^D$ such that the Hausdorff distance satisfies $d_H(\mathcal{S}, \mathcal{M}) < c\sigma^2\log(1/\sigma)$.
• On the sample level, given a sample set $\mathcal{Y}$, with sample size $N = O(\sigma^{-(d+3)})$ and $\sigma$ being sufficiently small, we are able to obtain an estimator $\widehat{\mathcal{M}}$ as a smooth $d$-dimensional manifold such that
  – for any point $y \in \widehat{\mathcal{M}}$, $d(y, \mathcal{M})$ is less than $C\sigma^2\log(1/\sigma)$;
  – for any point $x \in \mathcal{M}$, $d(x, \widehat{\mathcal{M}})$ is less than $C\sigma^2\log(1/\sigma)$;
  – for any two points $y_1, y_2 \in \widehat{\mathcal{M}}$, we have $\|y_1 - y_2\|_2^2\,/\,d(y_2, T_{y_1}\widehat{\mathcal{M}}) \ge c\sigma\tau$,
with probability $1 - C_1\exp(-C_2\sigma^{-c})$, for some positive constants $c$, $c_1$, $C$, $C_1$, and $C_2$.
From the algorithmic perspective:
• Our method requires only noisy samples and does not need any information about the latent
manifold, such as its dimension, thereby broadening the applicability of our framework.
• Our framework computes the approximate projection of an observed point onto the hidden
manifold, providing a clear relationship between input and output. In comparison, pre-
vious algorithms used multiple iterative operations, making it difficult to understand the
relationship between input samples and the corresponding outputs.
1.2. Related Works. One main source of manifold fitting would be the Delaunay trian-
gulation [26] from the 1980s. Given a sample set, a Delaunay triangulation is a meshing in
which no samples are inside the circumcircle of any triangle in the triangulation. Based on
this technique, the early manifold-fitting approaches [5, 2] consider dense samples without
noise. In other words, the given data set constitutes an $(\epsilon, \delta)$-net of the hidden manifold. Both [5] and [2] generate a piecewise linear manifold by triangulation that is geometrically and topologically similar to the hidden manifold. However, the generated manifold is not smooth, and the assumption that the given data are noise-free and densely distributed prevents these algorithms from being widely applicable.
In recent years, manifold fitting has been studied and developed more intensively, with research addressing multiple types of noise and sample distributions, as well as the smoothness of the resulting manifolds. Genovese et al. have obtained a sequence
of results from the perspective of minimax risk under Hausdorff distance ([19, 20]) with Le
Cam’s method. Their work starts from [19], where noisy sample points are also modeled as
the summation of latent random variables from the hidden manifold and additive noise, but the
noise term is assumed to be bounded and perpendicular to the manifold. The optimal minimax
estimation rate is lower bounded by $O(N^{-2/(2+d)})$ with properly constructed extreme cases, and upper bounded by $O((\log N/N)^{2/(2+d)})$ with a sieve maximum likelihood estimator (MLE). Hence, they conclude the rate is tight up to logarithmic factors, and the optimal rate of convergence is $O(N^{-2/(2+d)})$. This result is impressive since the rate only depends on the
intrinsic dimension d instead of the ambient dimension D . However, the noise assumption
is not realistic, and the sieve MLE is not computationally tractable. Their subsequent work
[20] considers the noiseless model, clutter noise model, and additive noise model. In the
additive model, the noise assumption is relaxed to general Gaussian distributions. They view
the distribution of samples as a convolution of a manifold-valued distribution and a distribution
of noise in ambient space, and the fitting problem is treated as a deconvolution problem. They
find a lower bound for the optimal estimation rate, $O(1/\log N)$, with the same methodology as in [19], and an upper bound that is a polynomial of $1/\log N$ with a standard deconvolution density
estimator. Nevertheless, their output is not necessarily a manifold, and they claim that this
method requires a known noise distribution, which is also unrealistic. Meanwhile, to guarantee
a small minimax risk, the required sample size should be in exponential form, which is
unsatisfactory.
Since a consistent estimation of the manifold requires a very large sample size, Genovese
et al. avoid this difficulty by studying the ridge of the sample distribution as a proxy [21].
They begin by showing that the Hausdorff distance between the ridge of the kernel density estimator (KDE) and the ridge of the sample density is $O_P((\log N/N)^{2/(D+8)})$, and then prove that the ridge of the sample density is within $O(\sigma^2\log(1/\sigma))$ of $\mathcal{M}$ in Hausdorff distance under their model. Consequently, the ridge of the KDE is shown to be an estimator with rate $O_P((\log N/N)^{2/(D+8)}) + O(\sigma^2\log(1/\sigma))$, and they adopt the mean-shift algorithm [34] to estimate it. In two similar works, [4, 32], ridge estimation is implemented by two other approaches
with convergence guarantee. While these methods yield favorable results in terms of mini-
max risk, evaluating the smoothness of their estimators presents a challenge. Despite claims
that some methods require only a small sample size, their complex algorithms may prove
impractical even for toy examples. Furthermore, the feasibility of the KDE-based algorithm
in high-dimensional cases remains unverified. As noted by [9], kernel-based methodologies
which fail to consider the intrinsic geometry of the domain may lead to sub-optimal outcomes,
such as convergence rates that are dependent on the ambient dimensionality, D , rather than
the intrinsic dimensionality, d. Although [10] introduce a local-covariance-based approach
that transforms the global manifold reconstruction problem into a local Gaussian process
regression problem, thereby facilitating interpolation of the estimated manifold between fitted
data points, their resulting output estimator is still in the form of discrete point sets.
The manifold generated with the above methods may have a very small reach, resulting
in small twists and turns that do not align with the local geometry of the hidden manifold.
To address this, some new research has aimed to ensure a lower-bounded reach of the output
manifold, such as [13], [42] and [16]. Together with [32], all four papers design smooth
mappings to capture some spatial properties and depict the output manifold as its root set or
ridge. Despite the different techniques used, all these papers provide estimators, which are
close to M and have a lower-bounded reach, with high probability. Their required sample
size depends only on σ and d, which is noteworthy and instructive. The main difference is
that [32], [13], and [42] estimate the latent manifold with accuracy O(σ), measured in terms
of Hausdorff distance, while [16] achieves a higher approximation rate O(σ 2 ). However, the
method in [16] requires more knowledge of the manifold, which conflicts with the noisy
observation assumption, and the restriction of sample size and the immature algorithms for
estimating the projection direction hinder the implementation of the idea. On the other hand,
obtaining a manifold defined as the ridge or root set of a function requires additional numerical
algorithms. These algorithms can be computationally expensive and affect the accuracy of the
estimate. A detailed technical comparison of these approaches is provided in Section 1.3 for
completeness.
[Figure 2, panels (a)–(d): illustrations of the local constructions used by the fitting algorithms reviewed in Section 1.3, including the neighborhoods around a point $y$, the tangent spaces $T_{y^*}\mathcal{M}$, the discs through refined points $p_i$, and the cylindrical region $\mathcal{V}_y$ with radii $r_1$ and $r_2$.]
1.3. Detailed review of existing fitting algorithms. This subsection presents a review of
the technical details of the previously mentioned work [32, 13, 42, 16]. These papers relax the
requirement for sample size by exploiting the geometric properties of the data points. For ease
of understanding, we introduce some common geometric notations here, while more detailed
notations can be found in Section 2.1. For a point x ∈ M, Tx M denotes the tangent space of
$\mathcal{M}$ at $x$, and $\Pi^\perp_x$ is the orthogonal projection matrix onto the normal space of $\mathcal{M}$ at $x$. For a point $y$ off $\mathcal{M}$, $y^* = \arg\min_{x\in\mathcal{M}}\|y - x\|_2$ denotes the projection of $y$ on $\mathcal{M}$, and $\widehat{\Pi}^\perp_y$ is the estimator of $\Pi^\perp_{y^*}$. For an arbitrary matrix $A$, $\Pi_{\mathrm{hi}}(A)$ represents its projection on the span of
the eigenvectors corresponding to the largest D − d eigenvalues. We use the notation BD (z, r)
to denote a D -dimensional ball with center z and radius r . To be consistent with the papers
subsequently referred to, we frequently use upper- and lower-case letters (such as c, c1 , c2 , C ,
C1 , and C2 ) to represent absolute constants. The upper and lower cases represent constants
greater or less than one, respectively, and their values may vary from line to line.
An early work without noise. One early work on manifold fitting is [32], which only
focuses on the case of a noiseless sample $\mathcal{X} = \{x_i \in \mathcal{M}\}_{i=1}^N$. To reconstruct an $\widehat{\mathcal{M}}$ with $\mathcal{X}$, the
authors construct a function f (y) to approximate the squared distance from an arbitrary point
y to M, and the ridge set of f (y) is a proper estimator of M.
As stated in [32], f (y) can be estimated by performing local Principal Components Analysis
(PCA). The procedure is shown in Fig. 2(a). For an arbitrary point y close to M, its r
neighborhood index set is defined as
$$I_y = \{i : \|x_i - y\|_2 \le r\}.$$
For each $i \in I_y$, $\widehat{\Pi}^\perp_{x_i}$ can be obtained via local PCA, and the squared distance between $y$ and $T_{x_i}\mathcal{M}$ is approximated by
$$f_i(y) = \big\|\widehat{\Pi}^\perp_{x_i}(y - x_i)\big\|_2^2.$$
and θ(t) is an indicator function such that θ(t) = 1 for t ≤ 1/4 and θ(t) = 0 for t ≥ 1.
The estimator $\widehat{\mathcal{M}}$ is given as the ridge set of $f(y)$; that is,
$$\widehat{\mathcal{M}} = \{y \in \mathbb{R}^D : d(y, \mathcal{M}) \le cr,\ \Pi_{\mathrm{hi}}(Hf(y))\,\partial f(y) = 0\},$$
where $Hf(y)$ is the Hessian matrix of $f$ at point $y$. Such an $\widehat{\mathcal{M}}$ is claimed to have a reach bounded below by $cr$ and to be $O(r^2)$-close to $\mathcal{M}$ in terms of Hausdorff distance.
Although this paper does not consider ambient-space noise and relies heavily on a well-estimated projection direction $\widehat{\Pi}^\perp_{x_i}$, the idea of approximating the distance function with projection matrices is desirable and provides a good direction for subsequent work.
An attempt with noise. In the follow-up work [13], noise from the ambient space is
considered. Similar to [32], the main aim of [13] is to estimate the bias from an arbitrary
point to the hidden manifold with local PCA. The collection of all zero-bias points can be
interpreted as an estimator for M.
To construct the bias function $f(y)$, the authors assume there is a sample set $\mathcal{Y}_0 = \{y_i\}_{i=1}^N$, with the sample size satisfying
$$N/\ln(N) > \frac{CV}{\omega_{\min}\,\beta_d\,(r^2/\tau)^d}, \qquad N \le e^D,$$
where $V$ is the volume of $\mathcal{M}$, $\beta_d$ is the volume of a Euclidean unit ball in $\mathbb{R}^d$, and $\omega_{\min}$ is the lower bound of $\omega$ on $\mathcal{M}$. Under such conditions, $\mathcal{Y}_0$ is $Cr^2/\tau$-close to $\mathcal{M}$ in Hausdorff distance with probability $1 - N^{-C}$. Then, a subset $\mathcal{Y}_1 = \{p_i\} \subset \mathcal{Y}_0$ is selected greedily as a minimal $cr/d$-net of $\mathcal{Y}_0$. For each $p_i \in \mathcal{Y}_1$, there exists a $D$-dimensional ball $\mathcal{U}_i = \mathcal{B}_D(p_i, r)$ and a $d$-dimensional ball $\mathcal{D}_i = \mathcal{B}_d(p_i, r)$, where $\mathcal{D}_i$ can be viewed as a disc cut from $\mathcal{U}_i$. In the ideal case, $\mathcal{D}_i$ should be parallel to $T_{p_i^*}\mathcal{M}$. Thus, the authors provide a new algorithm to estimate the basis of $\mathcal{D}_i$ with the sample points falling in $\mathcal{U}_i$. The basis of $\mathcal{D}_i$ leads to an estimator of $\Pi^\perp_{p_i}$, which is denoted by $\widehat{\Pi}^\perp_{p_i}$.
For $y$ near $\mathcal{M}$, let $I_y = \{i : \|p_i - y\|_2 \le r\}$, and
$$f_i(y) = \widehat{\Pi}^\perp_{p_i}(y - p_i), \quad \text{for } i \in I_y.$$
Then, $f(y)$ can be constructed as
(1.4) $f(y) = \sum_{i\in I_y} \alpha_i(y)\,\big(\widehat{\Pi}^\perp_y \widehat{\Pi}^\perp_{p_i}\big)(y - p_i),$
with $\widehat{\Pi}^\perp_y = \Pi_{\mathrm{hi}}\big(\sum_{i\in I_y}\alpha_i(y)\widehat{\Pi}^\perp_{p_i}\big)$, and the weights defined as
A better estimation for noisy data. To address the issues in [13], the authors of [42]
propose an improved method that avoids the continuous projections and estimates $\Pi^\perp_{y^*}$ better.
The authors claim that fitting the manifold is enough to estimate the projection direction and
the local mean well, because the manifold can be viewed as a linear subspace locally, and the
local sample mean is a good reference point for the hidden manifold. They assume there is a
sample set $\mathcal{Y} = \{y_i\}_{i=1}^N$. For each $y_i$, $\widehat{\Pi}^\perp_{y_i}$ is obtained by local PCA with $r = O(\sqrt{\sigma})$. Then, for an arbitrary point $y$ with $I_y = \{i : \|y_i - y\|_2 \le r\}$, the bias function can be constructed as
(1.5) $f(y) = \widehat{\Pi}^\perp_y\Big(y - \sum_{i\in I_y}\alpha_i(y)\,y_i\Big),$
where $\widehat{\Pi}^\perp_y = \Pi_{\mathrm{hi}}\big(\sum_{i\in I_y}\alpha_i(y)\widehat{\Pi}^\perp_{y_i}\big)$. The weights are defined as
The necessity of noise reduction and an attempt. Based on the result mentioned above, the
error in the manifold fitting can be attributed to two components: sampling bias and learning
error, namely
$$d_H(\mathcal{M}, \widehat{\mathcal{M}}) \le d_H(\mathcal{M}, \mathcal{Y}) + d_H(\mathcal{Y}, \widehat{\mathcal{M}}),$$
where $\mathcal{Y}$ is the generic sample set. Usually, the first term can be regarded as $O(\sigma)$, as the Gaussian noise will die out within several $\sigma$, and the second term is bounded by $Cr^2$ with the PCA-based algorithms listed above. The optimal radius of local PCA, which balances the overall estimation error and the computational complexity, should be $r = O(\sqrt{\sigma})$, and leads to a fitting error such that
$$d_H(\mathcal{M}, \widehat{\mathcal{M}}) \le C\sigma.$$
Since the sampling bias $d_H(\mathcal{Y}, \mathcal{M}) = O(\sigma)$ prevents us from moving closer to $\mathcal{M}$, denoising is necessary for a better $\widehat{\mathcal{M}}$.
On the basis of [13], the same group of authors provides better results in [16] with refined
points and a net. They refine the points by constructing a mesh grid on each disc Di . As
illustrated in Figure 2(d), each hyper-cylinder of the mesh is much longer in the direction perpendicular to the manifold than in the directions parallel to it. Subsequently, in each hyper-cylinder, a subset of $\mathcal{Y}_0$ is selected with a complicated design, and its average is denoted by $e_y$. The collection of such $e_y$ over all hyper-cylinders is denoted by $\mathcal{Y}_1$, which is shown to be $Cd\sigma^2/\tau$-close to $\mathcal{M}$.
The authors take Y1 as the input data set of [13] to perform subsampling and construct
a new group of discs $\{D_i'\}$. With the refined points in $\mathcal{Y}_1$ and refined discs $\{D_i'\}$, the same function $f(y)$ will lead to an $\widehat{\mathcal{M}}$ which is $O(\sigma^2)$-close to $\mathcal{M}$ and has a reach no less than $c\tau$ with probability $1 - N^{-C}$.
To the best of our knowledge, the result presented in [16] constitutes a state-of-the-art
error bound for manifold fitting. However, some challenges exist in implementing the method
described in that paper:
• The refinement step for ey involves sampling directly from the latent manifold, which
contradicts the initial assumption of noisy data.
• The algorithms for refining points and determining the orientation of discs are only briefly
described and may not be directly applied to real-world data.
• The sample-size requirement is similar to that described in [13], further limiting the practical
implementation of the algorithm.
1.4. Organization. This paper is organized as follows. Section 2 presents the model set-
tings, assumptions, preliminary results, and mathematical preliminaries. Section 3 introduces
a novel contraction direction-estimation method. The workflow and theoretical results of our
local contraction methods are included in Section 4, and the output manifold is analyzed in
Section 5. Numerical studies are presented in Section 6, to demonstrate the effectiveness of
our approach. Finally, Section 7 provides a summary of the key findings and conclusions of
our study, as well as several directions for future research.
2. Proposed method. In this section, we present some necessary notations and funda-
mental concepts, then formally state our primary result regarding the fitting of a manifold.
Finally, we introduce several lemmas and propositions crucial for further elaboration.
2.1. Notations and important concepts. Throughout this paper, we use both upper- and
lower-case C to represent absolute constants. The distinction between upper and lower-case
letters represents the magnitude of the constants, with the former being greater than one and
the latter being less than one. The values of these constants may vary from line to line. In our
notation, x represents a point on the latent manifold M, y represents a point related to the
distribution ν , and z represents an arbitrary point in the ambient space. The symbol r is used
to denote the radius in some instances. Capitalized math calligraphy letters, such as M, Y ,
and BD (z, r), represent concepts related to sets. This last symbol denotes a D -dimensional
Euclidean ball with center z and radius r .
The distance between a point $a$ and a set $\mathcal{A}$ is represented as $d(a, \mathcal{A}) = \min_{a'\in\mathcal{A}}\|a - a'\|_2$,
where ∥ · ∥2 is the Euclidean distance. To measure the distance between two sets, we adopt the
Hausdorff distance, a commonly used metric in evaluating the accuracy of estimators. This
distance will be used to measure the distance between the latent manifold M and its estimate
$\widehat{\mathcal{M}}$ throughout this paper. Formally, for two sets $\mathcal{A}_1, \mathcal{A}_2 \subset \mathbb{R}^D$, the Hausdorff distance is defined as
$$d_H(\mathcal{A}_1, \mathcal{A}_2) = \max\Big\{\sup_{a\in\mathcal{A}_1} d(a, \mathcal{A}_2),\ \sup_{b\in\mathcal{A}_2} d(b, \mathcal{A}_1)\Big\}.$$
REMARK. For any $\mathcal{A}_1, \mathcal{A}_2 \subset \mathbb{R}^D$, $d_H(\mathcal{A}_1, \mathcal{A}_2) < \epsilon$ is equivalent to the fact that, for all $a \in \mathcal{A}_1$ and all $b \in \mathcal{A}_2$,
$$d(a, \mathcal{A}_2) < \epsilon \quad \text{and} \quad d(b, \mathcal{A}_1) < \epsilon.$$
In the context of geometry, the Hausdorff distance provides a measure of the proximity
between two manifolds. It is commonly acknowledged that a small Hausdorff distance implies
a high level of alignment between the two manifolds, with controlled discrepancies.
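For finite point sets, the Hausdorff distance can be computed directly from pairwise distances; the brute-force sketch below is illustrative (for large sets a KD-tree based nearest-neighbor query would be preferable).

```python
import numpy as np

def hausdorff_distance(A, B):
    """Hausdorff distance between two finite point sets A (m x D) and B (n x D)."""
    # Pairwise Euclidean distances, shape (m, n).
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)
    d_AB = dists.min(axis=1).max()   # sup_{a in A} d(a, B)
    d_BA = dists.min(axis=0).max()   # sup_{b in B} d(b, A)
    return max(d_AB, d_BA)

# Example: two noisy samples of the unit circle in R^2.
rng = np.random.default_rng(0)
t1, t2 = rng.uniform(0, 2 * np.pi, (2, 500))
A = np.c_[np.cos(t1), np.sin(t1)]
B = np.c_[np.cos(t2), np.sin(t2)] + 0.01 * rng.normal(size=(500, 2))
print(hausdorff_distance(A, B))
```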
We also require some basic geometrical concepts related to manifolds, more of which can
be found in the supplementary material. Given a point x in the manifold M, the tangent space
at x, denoted by Tx M, is a d-dimensional affine space containing all the vectors tangent to
M at x. To facilitate our analysis, we introduce the projection matrices Π− ⊥
x and Πx , which
D
project any vector v ∈ R onto the tangent space Tx M and its normal space, respectively.
These two projection matrices are closely related as Π⊥ −
x = ID − Πx , where ID is the identity
D
mapping of R . Furthermore, given an arbitrary point z not in M, its projection onto the
manifold is defined as z ∗ = arg minx∈M ∥x − z∥2 , and we use Π b⊥ b−
z and Πz as estimators for
−
Π⊥z ∗ and Πz ∗ , respectively.
The concept of Reach, first introduced by Federer [11], plays a crucial role in measuring
the regularity of manifolds embedded in Euclidean space. Reach has proven to be a valuable
tool in various applications, including signal processing and machine learning, making it an
indispensable concept in the study of manifold models. It can be defined as follows: the reach of $\mathcal{M}$, denoted by $\tau$, is the largest number such that every point at distance less than $\tau$ from $\mathcal{M}$ has a unique nearest point on $\mathcal{M}$. For example, the reach of a circle is its radius, and the reach of a linear subspace is infinite.
Intuitively, a large reach implies that the manifold is locally close to the tangent space. This
phenomenon can be explained by the following lemma in [11]:
2.2. Overview of the main results. As stated in the introduction, the fundamental objective
of this paper is to develop an estimator $\widehat{\mathcal{M}}$ for the latent manifold $\mathcal{M}$, using the available sample set $\mathcal{Y}$. To this end, we employ a two-step procedure for each $y \in \Gamma = \{y : d(y, \mathcal{M}) \le C\sigma\}$,
involving (i) identification of the contraction direction and (ii) estimation of the contracted
point. It should be noted that contraction is distinct from projection, as the former entails
movement in a singular direction in normal space.
(Theorem 3.1). To make the estimation continuous with respect to $\mathcal{Y}$ and $y$, we let
$$F(y) = \sum_i \alpha_i(y)\,y_i,$$
with the weights $\alpha_i$'s given in Section 3.2. When the total sample size $N = C_1 r_0^{-d}\sigma^{-3}$, $F(y) - y$ estimates the direction of $y^* - y$ with an error upper bounded by $C\sigma\sqrt{\log(1/\sigma)}$ with probability no less than $1 - C_1\exp(-C_2\sigma^{-c})$ (Theorem 3.5).
Estimating the contracted point. The estimation of the projection points is discussed in
three distinct scenarios in Section 4, the most notable of which is using F (y) to estimate the
contraction direction.
e be the projection matrix onto the direction of µBy − y . Consider a cylinder region
Let U
Vy = BD−1 (y, r1 ) × B1 (y, r2 ),
where the second ball is an open interval in the direction of µBy − y , and the first ball is in the
p
complement of it in RD , with r1 = cσ and r2 = Cσ log(1/σ). On the population level, let
the contracted version of y be denoted by
µV
y = y + U EY ∼ν (Y − y|Y ∈ Vy ) ;
e
∗ 2
p
then, ∥µV
y − y ∥2 ≤ Cσ log(1/σ) (Theorem 4.4). For the sake of continuity, we let
T
b = (F (y) − y)(F (y) − y) ,
U
∥(F (y) − y)∥22
12
2.3. Lemmas and propositions. In this subsection, we present some propositions and
lemmas for reference. Their proofs are omitted from the main content and can be found in the
supplementary material.
A notable phenomenon when analyzing the distribution in the vicinity of the manifold is
the prevalence of quantities contingent upon d rather than D . This phenomenon is particularly
evident in the subsequent lemma and its corollary.
COROLLARY 2.4.1. Let $n$ be the number of observed points that fall in $\mathcal{B}_D(z, r)$. Assume the total sample size is $N = CD\sigma^{-3}r^{-d}$. Then,
$$\mathbb{P}\big(C_1 D\sigma^{-3} \le n \le C_2 D\sigma^{-3}\big) \ge 1 - 2\exp\big(-C_3\sigma^{-3}\big),$$
Since the Gaussian distribution can be approximated to vanish within a few standard
deviations (σ ), adopting a radius that is marginally larger than σ can result in polynomial
benefits for local estimation. For instance, when computing the conditional expectation within
a ball near the origin, we have the following proposition:
PROPOSITION 2.5. Let $\xi$ be a $D$-dimensional normal random vector with mean $0$ and covariance matrix $\sigma^2 I_D$. Assume there is a $D$-dimensional ball $\mathcal{B}_D(z, r)$ centered at point $z$ with radius $r = C_1\sigma\sqrt{\log(1/\sigma)}$, and $\|z\|_2 = C_2\sigma$. Then, the truncated version of $\xi$ satisfies
$$\|\mathbb{E}(\xi \mid \xi \in \mathcal{B}_D(z, r))\|_2 \le C_3\sigma^2,$$
for some constants $C_1$, $C_2$, and $C_3$.
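A quick Monte Carlo check of this proposition can be run as follows; the specific constants, dimension, and sample size are illustrative assumptions rather than values from the paper.

```python
import numpy as np

def truncated_gaussian_mean(D=5, sigma=0.05, C1=2.0, C2=1.0, n=1_000_000, seed=0):
    """Estimate ||E(xi | xi in B_D(z, r))||_2 for xi ~ N(0, sigma^2 I_D)."""
    rng = np.random.default_rng(seed)
    r = C1 * sigma * np.sqrt(np.log(1 / sigma))
    z = np.zeros(D)
    z[0] = C2 * sigma                      # ||z||_2 = C2 * sigma
    xi = sigma * rng.standard_normal((n, D))
    inside = np.linalg.norm(xi - z, axis=1) <= r
    return np.linalg.norm(xi[inside].mean(axis=0))

# The estimate should be bounded by a constant times sigma^2 (here sigma^2 = 2.5e-3).
print(truncated_gaussian_mean())
```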
If we normalize them within BD (z, r), the two densities ν̃ and ν̃R should be close, and it is
sufficient to work with ν̃R (y) directly. These can be summarized as the following lemma:
LEMMA 2.6. Let $\tilde\nu(y)$ be the conditional density function within $\mathcal{B}_D(z, r)$, and $\tilde\nu_R(y)$ be its estimator based on $\mathcal{M}_R$. By setting
$$R = r + C_1\sigma\sqrt{(d+\eta)\log(1/\sigma)},$$
we have
(2.1) $|\tilde\nu(y) - \tilde\nu_R(y)| \le C_2\sigma^\eta\,\tilde\nu_R(y)$
for some constants $C_1$ and $C_2$.
3. Estimation of contraction direction. This section presents a novel method for esti-
mating the contraction direction and provides an error bound. Our approach is underpinned by
the fact that, in the denoising step, the goal is to “push” a point z , which is within a distance of
∆ = Cσ to M, toward its projection on M, i.e., z ∗ . Therefore, it is sufficient to estimate the
direction of z ∗ − z instead of estimating the entire basis of Tz ∗ M. To determine this direction,
we focus on a ball BD (z, r0 ) centered at z with radius r0 = Cσ and provide population-level
and sample-level estimators.
3.1. Population level. Let the conditional expectation of $\nu$ within the ball be $\mu^B_z$, namely
(3.1) $\mu^B_z = \mathbb{E}_{Y\sim\nu}(Y \mid Y \in \mathcal{B}_D(z, r_0)).$
The accuracy of the vector µBz − z in estimating the direction of z ∗ − z is reported in Theorem
3.1. The proof of this theorem is presented in the remainder of this subsection. This result
demonstrates that the vector µBz − z performs well in estimating the contraction direction, pro-
viding further support for its use in the denoising step. The proof of lemmas and propositions
is omitted here and can be found in the supplementary material.
[Figure: the ball $\mathcal{B}_D(z, r_0)$ around $z$, the enlarged radius $R$, the local conditional mean $\mu^B_z$, the projection $z^*$, and the tangent space $T_{z^*}\mathcal{M}$.]
THEOREM 3.1. For a point $z$ such that $d(z, \mathcal{M}) = O(\sigma)$, we can estimate the direction of $z^* - z$ with
$$\mu^B_z - z = \mathbb{E}_{Y\sim\nu}(Y - z \mid Y \in \mathcal{B}_D(z, r_0)).$$
The estimation error can be bounded as
(3.2) $\sin\{\Theta(\mu^B_z - z,\ z^* - z)\} \le C\sigma\sqrt{\log(1/\sigma)}.$
Without loss of generality, we assume that z ∗ is the origin, Tz ∗ M is the span of the first d
Cartesian-coordinate directions, z ∗ − z is the (d + 1)-th direction, and the remaining directions
constitute the complement in RD . To prove Theorem 3.1, we first provide a sufficient statement
for the error bound in (3.2):
PROPOSITION 3.2. Let $\mu^B_z = (\mu^{(1)}, \cdots, \mu^{(D)})$; to show (3.2) it is sufficient to show
$$|\Delta - \mu^{(i)}| \ge c_1\sigma, \quad \text{for } i = d+1;$$
$$|\mu^{(i)}| \le c_2\sigma^2\sqrt{\log(1/\sigma)}, \quad \text{for } i \ne d+1.$$
LEMMA 3.3. Let $\tilde\nu_R(y)$ be the conditional density function within $\mathcal{B}_D(z, r_0)$ induced by $\mathcal{M}_R$, and $\tilde\nu_{\mathcal{D}}(y)$ be its estimator with $\mathcal{D}$. By setting $R = r_0 + C_1\sigma\sqrt{\log(1/\sigma)}$, we have
$$|\tilde\nu_R(y) - \tilde\nu_{\mathcal{D}}(y)| \le C_2\sigma\sqrt{\log(1/\sigma)}\,\tilde\nu_{\mathcal{D}}(y)$$
for some constants $C_1$ and $C_2$.
3.2. Estimation with finite sample. In practice, we typically have access to only the data
point collection Y , which is sampled from the distribution ν(y). To construct an estimator for
$\mu^B_z$ as defined in (3.1), a natural approach is to use the local average, defined as
$$\tilde\mu^B_z = \frac{1}{|I_z|}\sum_{i\in I_z} y_i,$$
where $I_z$ is the index set of the $y_i$'s that lie in $\mathcal{Y} \cap \mathcal{B}_D(z, r_0)$. Although $\tilde\mu^B_z$ converges to $\mu^B_z$ as the
size of Iz goes to infinity, it is not a continuous mapping of y because of the discontinuity
introduced by the change in the neighborhood. The discontinuity can adversely affect the
smoothness of $\widehat{\mathcal{M}}$. To address this issue, we need a smooth version of $\tilde\mu^B_z$.
Let the local weighted average at point $z$ be
(3.3) $F(z) = \sum_i \alpha_i(z)\,y_i,$
THEOREM 3.5. If the sample size $N = C_1\sigma^{-(d+3)}$, for a point $z$ such that $d(z, \mathcal{M}) = O(\sigma)$, $F(z)$ as defined in (3.3) provides an estimation of the contraction direction, whose error can be bounded by
$$\sin\{\Theta(F(z) - z,\ z^* - z)\} \le C_2\sigma\sqrt{\log(1/\sigma)},$$
with probability no less than $1 - C_3\exp(-C_4\sigma^{-c})$, for some positive constants $c$, $C_1$, $C_2$, $C_3$, and $C_4$.
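Below is a minimal numpy sketch of this direction estimate. Since the weight formula (3.4) is not reproduced in this excerpt, a smooth truncated-polynomial weight supported on $\mathcal{B}_D(z, r_0)$ is assumed as a stand-in, and the radius and toy data are illustrative.

```python
import numpy as np

def direction_estimate(z, Y, r0, k=2):
    """Smooth local weighted average F(z); F(z) - z estimates the contraction direction."""
    dists = np.linalg.norm(Y - z, axis=1)
    # Assumed smooth bump weights supported on B_D(z, r0) (stand-in for (3.4)).
    w = np.where(dists <= r0, (1 - (dists / r0) ** 2) ** k, 0.0)
    if w.sum() == 0:
        return z.copy()
    alpha = w / w.sum()
    return alpha @ Y

# Toy check on a noisy unit circle in R^2.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 50000)
sigma = 0.06
Y = np.c_[np.cos(theta), np.sin(theta)] + sigma * rng.standard_normal((50000, 2))
z = np.array([1.1, 0.0])                 # a point near M; its projection z* is (1, 0)
F = direction_estimate(z, Y, r0=5 * sigma)
print(F - z)                              # should point roughly toward (1, 0) - z
```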
4. Local contraction. This section presents the theoretical results of the local contraction
process. Let z be a point within a distance of Cσ to M, and let Vz be a neighborhood of z .
The conditional expectation of ν within Vz can be viewed as a denoised version of z , namely
EY ∼ν (Y |Y ∈ Vz ) .
To minimize noise and avoid distortion by the manifold, Vz should be narrow in the directions
tangent to the manifold and broad in the direction perpendicular to it, like inserting a straw
into a ball. Thus, determining the orientation of Vz and the scale in two directions is crucial. In
the following sub-sections, we analyze the population-level denoising result for three different
orientation settings and provide a smooth estimator for the last case.
4.1. Contraction with known projection direction. In the simplest scenario, we assume the direction of $T_{z^*}\mathcal{M}$, i.e., $\Pi^\perp_{z^*}$, is known. Then, $\mathcal{V}_z$ can be constructed as the Cartesian product of two balls. Specifically,
(4.1) $\mathcal{V}_z = \mathcal{B}_d(z, r_1) \times \mathcal{B}_{D-d}(z, r_2) = \Pi^-_{z^*}\mathcal{B}_D(z, r_1) \times \Pi^\perp_{z^*}\mathcal{B}_D(z, r_2),$
where the first ball is $d$-dimensional, lying in $\mathbb{R}^d = T_{z^*}\mathcal{M}$, while the second one is in the orthogonal complement of $\mathbb{R}^d$ in $\mathbb{R}^D$ with a radius $r_2 \gg r_1$. Let $\mu^V_z$ be the denoised point, calculated with the conditional expectation within $\mathcal{V}_z$; precisely,
(4.2) $\mu^V_z = z + \Pi^\perp_{z^*}\,\mathbb{E}_{Y\sim\nu}(Y - z \mid Y \in \mathcal{V}_z),$
where $Y$ is a random vector with density function $\nu(y)$. The refined point $\mu^V_z$ is much closer to $\mathcal{M}$. This result can be summarized as the following theorem:
THEOREM 4.1. Consider a point $z$ such that $d(z, \mathcal{M}) < C\sigma$. Let its neighborhood $\mathcal{V}_z$ be defined as (4.1) with radii
$$r_1 = c\sigma \quad \text{and} \quad r_2 = C\sigma\sqrt{\log(1/\sigma)}.$$
The refined point $\mu^V_z$ given by (4.2) satisfies
$$d(\mu^V_z, \mathcal{M}) \le C\sigma^2\log(1/\sigma),$$
for some constant $C$.
(4.3)
$$\begin{aligned}
\mu^V_z &= z^* + \delta_z + \Pi^\perp_{z^*}\,\mathbb{E}_\nu\big((X + \xi) - (z^* + \delta_z) \mid Y \in \mathcal{V}_z\big)\\
&= z^* + \Pi^-_{z^*}\delta_z + \mathbb{E}_\nu\big(\Pi^\perp_{z^*}(X - z^*) \mid Y \in \mathcal{V}_z\big) + \mathbb{E}_\nu\big(\Pi^\perp_{z^*}\xi \mid Y \in \mathcal{V}_z\big).
\end{aligned}$$
With such an expression, $\mu^V_z - z^*$ can be decomposed into three terms. The next step is to show that the norms of these terms are upper bounded by $O(\sigma^2\log(1/\sigma))$. According to Lemma 2.6, to get a bound of the order $O(\sigma^2\log(1/\sigma))$, we only need to consider a local part of $\mathcal{M}$, i.e., $\mathcal{M}_R$ with $R = C\sigma\sqrt{\log(1/\sigma)}$, and thus it is safe to assume $\|X - z\|_2 \le C\sigma\sqrt{\log(1/\sigma)}$ for some constant $C$.
(a) $\Pi^-_{z^*}\delta_z$: As $\delta_z \perp T_{z^*}\mathcal{M}$, we have
(4.4) $\Pi^-_{z^*}\delta_z = 0.$
(b) $\mathbb{E}_\nu\big(\Pi^\perp_{z^*}(X - z^*) \mid Y \in \mathcal{V}_z\big)$: Since $z^*$ and $X$ are exactly on $\mathcal{M}$, from Jensen's inequality and Lemma 2.3 we have
$$\big\|\mathbb{E}_\nu\big(\Pi^\perp_{z^*}(X - z^*) \mid Y \in \mathcal{V}_z\big)\big\|_2 \le \mathbb{E}_\nu\big(\big\|\Pi^\perp_{z^*}(X - z^*)\big\|_2 \mid Y \in \mathcal{V}_z\big) \le \frac{1}{2\tau}\,\mathbb{E}_\nu\big(\|X - z^*\|_2^2 \mid Y \in \mathcal{V}_z\big),$$
FIG 4. Illustration of the three parts of the error bound in (4.3): (a) $\delta_z$, perpendicular to $T_{z^*}\mathcal{M}$; (b) the projection of $X - z^*$, of higher order than the length of $X - z^*$; (c, d) the projection of the noise term, in two Cartesian-coordinate systems, where a large area cancels out because of symmetry.
where
$$\|X - z^*\|_2^2 = \|X - z + z - z^*\|_2^2 \le \|X - z\|_2^2 + \|z - z^*\|_2^2 \le C\sigma^2\log(1/\sigma).$$
Hence,
(4.5) $\big\|\mathbb{E}_\nu\big(\Pi^\perp_{z^*}(X - z^*) \mid Y \in \mathcal{V}_z\big)\big\|_2 \le \frac{C}{2\tau}\,\sigma^2\log(1/\sigma).$
(c) $\mathbb{E}_\nu\big(\Pi^\perp_{z^*}\xi \mid Y \in \mathcal{V}_z\big)$: Because
$$\mathbb{E}_\nu\big(\Pi^\perp_{z^*}\xi \mid Y \in \mathcal{V}_z\big) = \mathbb{E}_\omega\Big(\mathbb{E}_\phi\big(\Pi^\perp_{z^*}\xi \mid X,\ X + \xi \in \mathcal{V}_z\big)\Big),$$
$$\Delta \le \big\|\Pi^\perp_{z^*}(z - z^*)\big\|_2 + \big\|\Pi^\perp_{z^*}(z^* - X)\big\|_2 \le C\sigma.$$
Let $\xi' = \Pi^\perp_{z^*}\xi$; then, according to Proposition 2.5, we have
(4.6) $\big\|\mathbb{E}_\phi\big(\Pi^\perp_{z^*}\xi \mid X,\ X + \xi \in \mathcal{V}_z\big)\big\|_2 = \big\|\mathbb{E}\big(\xi' \mid \xi' \in \mathcal{B}_{D-d}(a_\Delta, r_2)\big)\big\|_2 \le C\sigma^2,$
4.2. Contraction with estimated projection direction. Usually, the projection matrix is
unknown, but it can be estimated via many statistical methods. Assume $\widehat{\Pi}^\perp_{z^*}$ is an estimator for $\Pi^\perp_{z^*}$, whose bias is
(4.7) $\big\|\widehat{\Pi}^\perp_{z^*} - \Pi^\perp_{z^*}\big\|_F \le c\sigma^\kappa.$
which is still closer to M. The error bound can be summarized as the following theorem:
THEOREM 4.2. Consider a point $z$ such that $d(z, \mathcal{M}) < C\sigma$. Let its neighborhood $\widehat{\mathcal{V}}_z$ be defined as in (4.8), and the estimation error of $\widehat{\Pi}^\perp_{z^*}$ be bounded as in (4.7). The refined point $\widehat{\mu}^V_z$ given by (4.9) satisfies
$$d(\widehat{\mu}^V_z, \mathcal{M}) \le C\sigma^{1+\kappa}\sqrt{\log(1/\sigma)},$$
for some constant $C$.
Such an estimator $\widehat{\Pi}^\perp_{z^*}$ can be obtained via classical dimension-reduction methods such as local PCA. Here we cite an error bound for local PCA estimators and implement the result of Theorem 4.2 in the following remark.
LEMMA 4.3 (Theorem 2.1 in [42]). For a point $z$ such that $d(z, \mathcal{M}) < C\sigma$, let $\widehat{\Pi}^\perp_z$ be the estimator of $\Pi^\perp_{z^*}$, obtained via local PCA with $r = C\sqrt{\sigma}$. The difference between $\widehat{\Pi}^\perp_z$ and $\Pi^\perp_{z^*}$ is bounded by
$$\big\|\widehat{\Pi}^\perp_z - \Pi^\perp_{z^*}\big\|_F \le C\,\frac{r}{\tau}$$
with high probability.
REMARK. With the PCA estimator $\widehat{\Pi}^\perp_{z^*}$ mentioned above, the distance between $\widehat{\mu}^V_z$ and $\mathcal{M}$ is bounded by
$$d(\widehat{\mu}^V_z, \mathcal{M}) \le C\sigma^{3/2}\sqrt{\log(1/\sigma)}$$
with high probability.
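A minimal numpy sketch of such a local PCA estimator is given below: the normal projector is spanned by the eigenvectors of the local sample covariance associated with the $D - d$ smallest eigenvalues, with radius $r = C\sqrt{\sigma}$ as in the lemma. The constant and toy data are illustrative.

```python
import numpy as np

def local_pca_normal_projector(z, Y, d, r):
    """Estimate the projection onto the normal space of M near z via local PCA."""
    nbrs = Y[np.linalg.norm(Y - z, axis=1) <= r]
    centered = nbrs - nbrs.mean(axis=0)
    cov = centered.T @ centered / len(nbrs)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    V_normal = eigvecs[:, : cov.shape[0] - d]   # D - d smallest directions
    return V_normal @ V_normal.T                # estimated Pi^perp

# Toy check on a noisy unit circle (d = 1) in R^2: near z = (1, 0.05) the
# normal space is approximately the x-axis.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 50000)
sigma = 0.01
Y = np.c_[np.cos(theta), np.sin(theta)] + sigma * rng.standard_normal((50000, 2))
P = local_pca_normal_projector(np.array([1.0, 0.05]), Y, d=1, r=2 * np.sqrt(sigma))
print(np.round(P, 2))    # close to [[1, 0], [0, 0]]
```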
4.3. Contraction with estimated contraction direction. In the previous two cases, we
attempted to move $z$ closer to $z^*$ in the directions given by $\Pi^\perp_{z^*}$. However, instead of estimating the entire projection matrix, finding an estimator of the main direction is sufficient and can be more accurate. Specifically, let the projection matrix onto $z^* - z$ be
$$U = (z^* - z)(z^* - z)^T/\|z^* - z\|_2^2,$$
and, according to the discussion in Section 3, there is one estimator
$$\widetilde{U} = (\mu^B_z - z)(\mu^B_z - z)^T/\|\mu^B_z - z\|_2^2,$$
THEOREM 4.4. Consider a point $z$ such that $d(z, \mathcal{M}) < C\sigma$. Let its neighborhood $\widehat{\mathcal{V}}_z$ be defined as in (4.11), and the estimation error of $\widetilde{U}$ be bounded as in (4.10). The refined point $\widehat{\mu}^V_z$ given by (4.12) satisfies
$$\|\widehat{\mu}^V_z - z^*\|_2 \le C\sigma^2\log(1/\sigma)$$
[Figure 5: the cylindrical neighborhood $\widehat{\mathcal{V}}_z$ with radii $r_1$ and $r_2$, and the decomposition of $y_i - z$ into the components $u_i$ and $v_i$ relative to $T_{z^*}\mathcal{M}$.]
For reasons similar to those discussed in Section 3.2, a smooth estimator constructed with
finite samples is needed. Recall that the continuous estimator for U is
$$\widehat{U} = \frac{(F(z) - z)(F(z) - z)^T}{\|F(z) - z\|_2^2},$$
whose asymptotic property is given in Theorem 3.5. For a data point $y_i$, we define
(4.13) $u_i = \widehat{U}(y_i - z), \qquad v_i = y_i - z - u_i,$
which can be interpreted as in the illustration in Fig. 5. Let the contracted point of $z$ be
(4.14) $G(z) = \sum_i \beta_i(z)\,y_i,$
with
(4.15) $w_v(v_i) = \begin{cases} \big(1 - \|v_i\|_2^2/r_1^2\big)^{k}, & \|v_i\|_2 \le r_1,\\ 0, & \text{otherwise}, \end{cases}$
$$\tilde\beta_i(z) = w_u(u_i)\,w_v(v_i), \qquad \tilde\beta(z) = \sum_i \tilde\beta_i(z), \qquad \beta_i(z) = \frac{\tilde\beta_i(z)}{\tilde\beta(z)},$$
with $k \ge 2$ being a fixed integer. It is clear that $G$ is a $C^2$-continuous map from $\mathbb{R}^D$ to $\mathbb{R}^D$.
The estimation accuracy of G(z) is summarized in the following theorem:
THEOREM 4.5. If the sample size $N = C_1\sigma^{-(d+3)}$, for a point $z$ such that $d(z, \mathcal{M}) = O(\sigma)$, $G(z)$, as defined in (4.14), provides an estimation of $z^*$, whose error can be bounded by
$$\|G(z) - z^*\|_2 \le C_2\sigma^2\log(1/\sigma)$$
with probability at least $1 - C_3\exp(-C_4\sigma^{-c})$, for some constants $c$, $C_1$, $C_2$, $C_3$, and $C_4$.
5. Fit a smooth manifold. Up to this point, we have explicated the techniques for
estimating the contraction direction and executing the contraction process for points proximal
to M. In this section, we synthesize these two procedures to yield the ultimate smooth
manifold estimator. The estimator is predicated upon a tubular neighborhood of M, denoted
by Γ = {y : d(y, M) ≤ Cσ}, and manifests in two distinct incarnations, corresponding to the
population and sample levels.
On the population level, we assume the distribution ν(y) is known, so that we can calculate
all the expectations. As mentioned in the introduction, estimating ω or M with a known density
function in the form of ν = ω ∗ ϕσ is closely related to the singular deconvolution problem
discussed in [20]. In contrast to their approach, our method uses geometrical structures to
generate an estimate in the form of an image set, yielding a similar error bound. Formally, we
have:
THEOREM 5.1. Assume the density function $\nu(y)$ and region of interest $\Gamma$ are given. With the $\widehat{\mu}^V_y$ defined in (4.12), we let
$$\mathcal{S} = \{\widehat{\mu}^V_y : y \in \Gamma\}.$$
Then, we have
$$d_H(\mathcal{S}, \mathcal{M}) \le C\sigma^2\log(1/\sigma)$$
for some constant $C$.
When only the sample set $\mathcal{Y}$ is available, the function $G(y)$, as defined in (4.14), can be used as an estimator of $\widehat{\mu}^V_y$. First, $G(y)$ provides a good estimate of $y^*$ with high probability.
Additionally, by definition, G(·) is a C 2 -continuous mapping in RD . Hence, similar to the
population case, the image set of Γ under the mapping G also has a good approximation
property. Moreover, because of the smoothness of both G and Γ, the output we obtain is also
a smooth manifold. Specifically, we have the following theorem:
THEOREM 5.2. Assume the region $\Gamma$ is given. With the $G(y)$ defined in (4.14), we let
(5.1) $\widehat{\mathcal{S}} = G(\Gamma) = \{G(y) : y \in \Gamma\}.$
Then, $\widehat{\mathcal{S}}$ is a smooth sub-manifold in $\mathbb{R}^D$, and the following claims simultaneously hold for some constant $C$ with high probability:
• For any $x \in \mathcal{M}$, $d(x, \widehat{\mathcal{S}}) \le C\sigma^2\log(1/\sigma)$;
• For any $s \in \widehat{\mathcal{S}}$, $d(s, \mathcal{M}) \le C\sigma^2\log(1/\sigma)$.
THEOREM 5.3. For $x \in \mathcal{M}$, let $\widehat{\Pi}^\perp_x$ be the estimator of $\Pi^\perp_x$ as the one defined in (1.5). Then there exists a constant $c > 0$ such that
$$\widehat{\mathcal{M}}_x = \{y \in \Gamma \cap \mathcal{B}_D(x, c\tau) : \widehat{\Pi}^\perp_x(G(y) - y) = 0\}$$
In summary, we present two manifold estimators in the form of image sets and one in
the form of a level set, all satisfying the Hausdorff-distance condition under certain statistical
conditions. Among them, the estimator proposed in Theorem 5.2 is computationally simpler
and more suitable for scenarios involving sample points, while the other estimators offer
stronger theoretical guarantees for the geometric properties. As discussed in the introduction,
prior works often employed level sets as manifold estimators, despite their inherent limitations:
the existence of solutions to f (x) = 0, where f (x) maps from RD to RD , is not always evident.
Thus the nonemptiness of the level sets is uncertain, requiring additional scrutiny. Furthermore,
this approach lacks an explicit solution, making it difficult to obtain the projection of a given
point onto Mc. Iterative solvers are necessary to approximate the projections, although their
convergence remains unproven.
operating system is Windows 10 Professional 64 Bit. The simulations are implemented with
Matlab R2023a, which is chosen for its ability to perform parallel running conveniently
and reliably. The detailed algorithm used in this paper can be found in the supplementary
material, and the latest version of Python and Matlab implementation are available at
https://github.com/zhigang-yao/manifold-fitting.
6.1. Numerical illustrations of ysl23. Three different manifolds, including two constant-curvature manifolds (a circle embedded in $\mathbb{R}^2$ and a sphere embedded in $\mathbb{R}^3$) and a manifold with non-constant curvature, namely a torus embedded in $\mathbb{R}^3$, will be tested in this and the next subsection. A visualization of these simulated manifolds is presented in Figure 6.
FIG 6. Manifolds employed in the numerical study. Left: a unit circle in $\mathbb{R}^2$; Middle: a unit sphere in $\mathbb{R}^3$; Right: a torus in $\mathbb{R}^3$.
• For each $w \in \mathcal{W}$:
1. Find the spherical neighborhood of $w$ with radius $r_0$, and denote the index set of the samples in it as $I_w$.
2. Calculate the weight functions $\tilde\alpha_i(w)$ and $\alpha_i(w)$ for each $i \in I_w$ as in (3.4), then calculate $F(w)$ by (3.3).
3. Find the cylindrical neighborhood as in (4.11) with radii $r_1$ and $r_2$, and denote the index set of the samples in it as $\widehat{I}_w$.
4. Calculate the weight functions $\tilde\beta_i(w)$ and $\beta_i(w)$ for each $i \in \widehat{I}_w$ as in (4.15), then calculate $G(w)$ by (4.14).
5. Obtain the output point as $\widehat{w} = G(w)$ (a Python sketch of these steps follows).
FIG 7. Visualization of ysl23's steps: (a) locating the neighborhood of a noisy observation $w$; (b) computing $F(w)$ defined in (3.3); (c) identifying the cylindrical neighborhood (points in the black rectangle) of $w$ based on $F(w)$; (d) obtaining the output point $G(w)$ using (4.14).
FIG 8. Assessing the performance of ysl23 in fitting the circle ($N = 5\times10^4$, $N_0 = 100$, $\sigma = 0.06$): the left panel displays points in $\mathcal{W}$ surrounding the underlying manifold, while the right panel illustrates the corresponding points in $\widehat{\mathcal{W}}$.
The visualization of ysl23's performance for the circle case is shown in Figure 8, and the results for the sphere and torus cases can be found in the supplementary material. In these tests, we set $N = 5\times10^4$ and $N_0 = 100$ for each case. The closer the points of $\widehat{\mathcal{W}}$ are to the underlying manifold, the better the method works. As can be observed from Figure 8, the output points are
significantly closer to the hidden manifold, clearly demonstrating the efficacy of ysl23. Similar
phenomena, as shown in the supplementary material, can be observed for both sphere and
torus cases.
FIG 9. The asymptotic performance of ysl23 when fitting the circle. The top two panels show how the Hausdorff and average distances change with $\sigma$ (for $\sigma \in \{0.12, 0.1, 0.08, 0.06, 0.04, 0.02\}$), while the bottom two panels ($\sigma = 0.06$) show how the two distances change with $N$ (for $N \in \{3\times10^2, 3\times10^3, 3\times10^4, 3\times10^5\}$).
As $N$ increases, the Hausdorff and average distances both decrease significantly. This improvement can be attributed to two
aspects. Firstly, with the increase of N , we can more accurately estimate the local geometry
of the manifold. Secondly, the radius of the neighborhood in ysl23 is set to decrease with the
increase of the sample size. Hence, the neighborhood in ysl23 becomes closer to its center
point while maintaining a sufficient number of points in the neighborhood. Similar results and
phenomena, as shown in the supplementary material, can be observed for both sphere and
torus cases.
6.2. Comparison with other manifold fitting methods. We performed ysl23, yx19, cf18, and km17 on the three aforementioned manifolds. The circle and sphere cases are presented together since both have constant curvature, while the torus case is presented separately due to its non-constant curvature.
6.2.1. The fitting of the circle and sphere. We set $N = N_0 = 300$ for the circle, and $N = N_0 = 1000$ for the sphere. The radius of the neighborhood was set as $r = 2\sqrt{\sigma}$ for yx19, cf18, and km17. Figure 10 displays the fitting results. The black and red dots correspond to $\widehat{\mathcal{M}}$ and $\mathcal{M}$, respectively. A higher degree of overlap between these two sets indicates a better
fit. The first row presents the complete space for the circle embedded in R2 , while the second
row shows the view from the positive z -axis of the sphere embedded in R3 . Notably, km17
demonstrates inferior performance compared with the other methods. Moreover, the estimated
circles by cf18 exhibit two significant gaps, suggesting inaccuracies in the estimator for some
local regions. The ysl23, as well as yx19, demonstrates the best performance.
An observation of interest is that, although ysl23 successfully mapped the noisy samples to the proximity of the hidden manifold, the sample distribution on the output manifold was slightly changed. This phenomenon occurred because the number of samples was not sufficient to represent the perturbation of the uniform distribution on the manifold. Because
FIG 10. From left to right: the performance of ysl23, yx19, cf18, and km17 when fitting a circle (top, $N = 300$, $\sigma = 0.06$) and a sphere (bottom, $N = 1000$, $\sigma = 0.06$).
of this, our contraction strategy clustered the output points towards the denser regions of the input points. Fortunately, when the sample size is sufficiently large, ysl23 is able to ensure that the output points are approximately uniformly distributed on $\widehat{\mathcal{M}}$ (see Figure 22 in the supplementary material).
FIG 11. The Hausdorff distance, average distance, and CPU time of fitting a circle (top, $N = 300$, $\sigma = 0.06$) and a sphere (bottom, $N = 1000$, $\sigma = 0.06$), using ysl23, yx19, cf18, and km17.
We repeated each method 10 times and evaluated their effectiveness in Figure 11. We
find that ysl23 and yx19 achieve slightly better results than cf18 in terms of the Hausdorff
distance, while all three outperform km17 significantly. When evaluating the average distance,
ysl23 and cf18 slightly outperform yx19, while all three show significant improvement over
km17. Overall, ysl23 consistently ranks among the top across different metrics. In terms of
computing time, ysl23 also stands out, with remarkably lower running times than those of
the other three methods. Among them, yx19 is the most efficient, while km17 lags behind
significantly.
FIG 12. The Hausdorff distance, average distance, and CPU time of fitting a circle (top, $\sigma = 0.06$) and a sphere (bottom, $\sigma = 0.06$) with increasing $N$, using ysl23 and yx19.
6.2.2. The fitting of the torus. We set $N = 10^3$ for the torus case. The results, displayed
in Figure 13, show that ysl23 outperformed the other three methods in terms of the Hausdorff
distance, average distance, and computing time. To evaluate the performance of ysl23 and
yx19 on the torus, we set an increasing sample size of N ∈ {1000, 2000, 3000} and compared
their results. Figure 14 illustrates the results of both algorithms for each N . As N increased,
we observed a reduction in the distance for both algorithms. However, ysl23 consistently
achieved a much lower distance than yx19, no matter which metric is used. Furthermore, ysl23
demonstrated a remarkable advantage in computational efficiency, completing the task with a
FIG 13. The Hausdorff distance, average distance, and CPU time of fitting a torus ($N = 1000$, $\sigma = 0.06$), using ysl23, yx19, cf18, and km17.
significantly shorter running time than yx19. Specifically, in the presented examples, yx19
took over 10 seconds to terminate when N reached 3000, while ysl23 finished in under 0.5
seconds.
FIG 14. The Hausdorff distance, average distance, and CPU time of fitting a torus ($\sigma = 0.06$) with increasing $N$, using ysl23 and yx19.
6.3. Fitting of a Calabi–Yau manifold. Calabi–Yau manifolds [3] are a class of compact,
complex Kähler manifolds that possess a vanishing first Chern class. They are highly significant
because they are Ricci-flat manifolds, which means that their Ricci curvature is zero at all
points, aligning with the universe model of physicists. A simple example of a Calabi–Yau
manifold is the Fermat quartic:
(6.1) $x^4 + y^4 + z^4 + w^4 = 0, \quad (x, y, z, w) \in \mathbb{P}^3,$
where $\mathbb{P}^3$ refers to the complex projective 3-space. To visualize it, we generate low-dimensional projections of the manifold by eliminating variables as in [23], dividing by $w^4$, and setting $z^4/w^4$ to be constant. We then normalize the resulting inhomogeneous equation as
(6.2) $x^4 + y^4 = 1, \quad x, y \in \mathbb{C}.$
The resulting surface is embedded in 4D and can be projected to ordinary 3D space for display.
The parametric representation of (6.2) is
(6.3) $x(\theta, \zeta, k_1) = e^{2\pi i k_1/4}\cosh(\theta + \zeta i)^{2/4},$
(6.4) $y(\theta, \zeta, k_2) = e^{2\pi i k_2/4}\Big(\frac{\sinh(\theta + \zeta i)}{i}\Big)^{2/4},$
where the integer pair $(k_1, k_2)$ is selected with $0 \le k_1, k_2 \le 3$. Such $\{(x, y)\}$ can be seen as points in $\mathbb{R}^4$, denoted by $(\mathrm{Re}(x), \mathrm{Re}(y), \mathrm{Im}(x), \mathrm{Im}(y))$. A natural 3D projection is
$$(\mathrm{Re}(x),\ \mathrm{Re}(y),\ \cos(\psi)\mathrm{Im}(x) + \sin(\psi)\mathrm{Im}(y)),$$
where ψ is a parameter. The left panel of Figure 15 shows the surface plot of the 3D projection.
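A minimal numpy sketch of this construction, sampling the parametrization (6.3) and (6.4) on a $(\theta, \zeta)$ grid and projecting to 3D, is given below; the grid resolution and viewing angle $\psi$ are illustrative, and complex powers are taken with numpy's principal branch, which is a simplification of the branch handling needed for a seamless surface plot.

```python
import numpy as np

def fermat_quartic_points(n_theta=60, n_zeta=60, psi=np.pi / 4):
    """Sample the 3D projection of the Fermat quartic x^4 + y^4 = 1."""
    theta = np.linspace(-1.5, 1.5, n_theta)
    zeta = np.linspace(0, np.pi / 2, n_zeta)
    T, Z = np.meshgrid(theta, zeta)
    w = T + 1j * Z
    pts = []
    for k1 in range(4):
        for k2 in range(4):
            x = np.exp(2j * np.pi * k1 / 4) * np.cosh(w) ** 0.5       # (6.3), 2/4 = 1/2
            y = np.exp(2j * np.pi * k2 / 4) * (np.sinh(w) / 1j) ** 0.5  # (6.4)
            p3 = np.cos(psi) * x.imag + np.sin(psi) * y.imag
            pts.append(np.c_[x.real.ravel(), y.real.ravel(), p3.ravel()])
    return np.vstack(pts)

P = fermat_quartic_points()   # points in R^3; Gaussian noise can be added in R^4 as in the text
```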
FIG 15. Performance of ysl23 when fitting the real projection of the Calabi–Yau manifold (6.1). The left panel illustrates the shape of the 3D projection. The middle panel shows some noisy points around the manifold, and the right panel shows the points on the output manifold.
We generated a set of points in (6.3) and (6.4) on a uniform grid (θ, ζ), where θ is a
sequence of numbers ranging from −1.5 to 1.5 with a step size of 0.05 between consecutive
values, and ζ a sequence of numbers ranging from 0 to π/2 with a step size of 1/640 between
consecutive values. In total, the dataset contains N = 313296 samples with Gaussian noise
added in R4 . As shown in the middle panel of Figure 15, the initial point distribution is not
close to the manifold. However, after running ysl23, the output is significantly closer to it, as
shown in the right panel of Figure 15. This phenomenon indicates that ysl23 performs well
in estimating complicated manifolds. It should be noted that we only applied ysl23 to this
example without running other algorithms because the sample size would cause very long
running times for other algorithms and would not yield usable results.
FIG 16. The asymptotic performance of ysl23 fitting the real projection of the Calabi–Yau manifold (6.1). The two panels show how the Hausdorff and average distances change with $\sigma$ (for $\sigma \in \{0.03, 0.025, 0.02, 0.015, 0.01, 0.005\}$).
We also executed ysl23 with different σ . Specifically, we tested ysl23 with decreasing
σ ∈ {0.03, 0.025, 0.02, 0.015, 0.01, 0.005}. As we decrease σ , both the Hausdorff distance
and average distance decrease at a quadratic rate, which matches Theorem 5.4. These results
further support the effectiveness and reliability of ysl23.
A.1. Topology.
EXAMPLE A.1 (Metric Spaces). A metric space is a set M endowed with a distance
function (also called a metric) d : M × M → R (where R denotes the set of real numbers)
satisfying the following properties for all x, y, z ∈ M :
(a) Positivity: d(x, y) ≥ 0, with equality if and only if x = y .
(b) Symmetry: d(x, y) = d(y, x).
(c) Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).
If M is a metric space, x ∈ M , and r > 0, the open ball of radius r around x is the set
B(x, r) = {y ∈ M : d(x, y) < r}.
The metric topology on M is defined by declaring a subset S ⊆ M to be open if, for every
point x ∈ S , there is some r > 0 such that B(x, r) ⊆ S .
EXAMPLE A.2 (Euclidean Spaces). For integer $n \ge 1$, the set $\mathbb{R}^n$ of ordered n-tuples of
real numbers is called n-dimensional Euclidean space. We let a point in Rn be denoted by
$(x^{(1)}, \cdots, x^{(n)})$ or $x$. The numbers $x^{(i)}$ are called the $i$-th components or coordinates of $x$. For $x \in \mathbb{R}^n$, the Euclidean norm of $x$ is the nonnegative real number
$$\|x\|_2 = \sqrt{\big(x^{(1)}\big)^2 + \cdots + \big(x^{(n)}\big)^2},$$
and, for x, y ∈ Rn , the Euclidean distance function is defined by
d(x, y) = ∥x − y∥2 .
This distance function turns Rn into a complete metric space. The resulting metric topology
on Rn is called the Euclidean topology.
For the purposes of manifold theory, arbitrary topological spaces are too general. To avoid pathological situations that arise when there are not enough open subsets, we often restrict our attention to Hausdorff spaces.
There are numerous essential concepts in topology concerning maps, and these will be
introduced next. Let X and Y be two topological spaces, and F : X → Y be a map between
them.
• F is continuous if, for every open subset U ⊆ Y , the preimage F −1 (U ) is open in X .
• If F is a continuous bijective map with continuous inverse, it is called a homeomorphism.
If there exists a homeomorphism from X to Y , we say that X and Y are homeomorphic.
• A continuous map F is said to be a local homeomorphism if every point p ∈ X has a
neighborhood U ⊆ X such that F (U ) is open in Y and F restricts to a homeomorphism
from U to F (U ).
• F is said to be a closed map if, for each closed subset K ⊆ X , the image set F (K) is closed
in Y , and an open map if, for each open subset U ⊆ X , the image set F (U ) is open in Y . It
is a quotient map if it is surjective and V ⊆ Y is open if and only if F −1 (V ) is open.
Furthermore, for a continuous map F , which is either open or closed, the following rules
apply:
(a) If F is surjective, it is a quotient map.
(b) If F is injective, it is a topological embedding.
(c) If F is bijective, it is a homeomorphism.
For maps between metric spaces, there are several useful variants of continuity, especially in
the case of compact spaces. Assume (M1 , d1 ) and (M2 , d2 ) are metric spaces, and F : M1 →
M2 is a map. Then, F is said to be uniformly continuous if, for every ϵ > 0, there exists δ > 0
such that, for all x, y ∈ M1 , d1 (x, y) < δ implies d2 (F (x), F (y)) < ϵ. It is said to be Lipschitz
continuous if there is a constant C such that d2 (F (x), F (y)) ≤ Cd1 (x, y) for all x, y ∈ M1 .
Any such C is called a (globally) Lipschitz constant for F . We say that F is locally Lipschitz
continuous if every point x ∈ M1 has a neighborhood on which F is Lipschitz continuous.
A.1.2. Bases and countability. Suppose X is merely a set, and B is a collection of subsets
of X satisfying the following conditions:
(a) X = ⋃_{B∈B} B.
(b) If B1 , B2 ∈ B and x ∈ B1 ∩ B2 , then there exists B3 ∈ B such that x ∈ B3 ⊆ B1 ∩ B2 .
Then, the collection of all unions of elements of B is a topology on X , called the topology
generated by B , and B is a basis for this topology.
A set is said to be countably infinite if it admits a bijection with the set of positive integers,
and countable if it is finite or countably infinite. A topological space X is said to be first-
countable if there is a countable neighborhood basis at each point, and second-countable if
there is a countable basis for its topology. Since a countable basis for X contains a countable
neighborhood basis at each point, second-countability implies first-countability.
A.2.2. Smooth Manifolds. Briefly speaking, smooth manifolds are topological manifolds
endowed with an extra structure that allows us to differentiate functions and maps. To introduce
the smooth structure, we first recall the smoothness of a map F : U → Rk . When U is an
open subset of R^d, F is said to be smooth (or C^∞) if all of its component functions have continuous partial derivatives of all orders. More generally, when the domain U is an arbitrary subset of R^d, not necessarily open, F is said to be smooth if, for each x ∈ U, F has a smooth extension to a neighborhood of x in R^d. A diffeomorphism is a bijective smooth map whose
inverse is also smooth.
If M is a topological d-manifold, then two coordinate charts (U, φ), (V, ψ) for M are said
to be smoothly compatible if both of the transition maps ψ ◦ φ−1 and φ ◦ ψ −1 are smooth
where they are defined (on φ(U ∩ V ) and ψ(U ∩ V ), respectively). Since these maps are
inverses of each other, it follows that both transition maps are in fact diffeomorphisms. An
atlas for M is a collection of coordinate charts whose domains cover M. It is called a smooth
atlas if any two charts in the atlas are smoothly compatible. A smooth structure on M is a
smooth atlas that is maximal, which means it is not properly contained in any larger smooth
atlas. A smooth manifold is a topological manifold endowed with a specific smooth structure.
If M is a set, a smooth manifold structure on M is a second-countable, Hausdorff, locally
Euclidean topology together with a smooth structure, making it a smooth manifold. If M is
a smooth d-manifold and W ⊆ M is an open subset, then W has a natural smooth structure
consisting of all smooth charts (U, φ) for M such that U ⊆ W , and so every open subset of a
smooth d-manifold is itself a smooth d-manifold in a natural way.
Suppose M and N are smooth manifolds. A map F : M → N is said to be smooth if,
for every p ∈ M, there exist smooth charts (U, φ) for M containing p and (V, ψ) for N
containing F (p) such that F (U ) ⊆ V and the composite map ψ ◦ F ◦ φ−1 is smooth from
φ(U ) to ψ(V ). In particular, if N is an open subset of Rk or Rk+ with its standard smooth
structure, we can take ψ to be the identity map of N , and then smoothness of F simply
means that each point of M is contained in the domain of a chart (U, φ) such that F ◦ φ−1
is smooth. It is a clear and direct consequence of the definition that identity maps, constant
maps, and compositions of smooth maps are all smooth. A map F : M → N is said to be a
diffeomorphism if it is smooth and bijective and F −1 : N → M is also smooth.
We let C ∞ (M, N ) denote the set of all smooth maps from M to N , and C ∞ (M) the
vector space of all smooth functions from M to R. For every function f : M → R or Rk , we
define the support of f , denoted by supp f , as the closure of the set {x ∈ M : f (x) ̸= 0}. If
A ⊆ M is a closed subset and U ⊆ M is an open subset containing A, then a smooth bump
function for A supported in U is a smooth function f : M → R satisfying 0 ≤ f (x) ≤ 1 for
all x ∈ M, f |A ≡ 1, and supp f ⊂ U . Such smooth bump functions always exist.
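As a concrete illustration (added here, not taken from the main text), the classical one-dimensional construction below yields a smooth bump function on R that is identically 1 on [−1, 1] and vanishes outside (−2, 2); composing such functions with coordinate charts produces bump functions on a manifold.

import numpy as np

def f(t):
    """Smooth on R: exp(-1/t) for t > 0 and 0 for t <= 0."""
    out = np.zeros_like(t, dtype=float)
    pos = t > 0
    out[pos] = np.exp(-1.0 / t[pos])
    return out

def bump(x, a=1.0, b=2.0):
    """Smooth bump: 1 on |x| <= a, 0 for |x| >= b, values in [0, 1]."""
    t = (b - np.abs(x)) / (b - a)   # positive exactly when |x| < b
    s = (np.abs(x) - a) / (b - a)   # positive exactly when |x| > a
    return f(t) / (f(t) + f(s))     # denominator never vanishes since t + s = 1

x = np.linspace(-3, 3, 7)
print(bump(x))                      # 0 outside [-2, 2], 1 on [-1, 1], smooth in between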
There are various equivalent approaches to define tangent vectors on M. The most con-
venient one is via the following definition: for every point p ∈ M, a tangent vector at p is a
linear map v : C ∞ (M) → R that is a derivation at p, which means that, for all f, g ∈ C ∞ (M),
v satisfies the product rule
v(f g) = f (p)vg + g(p)vf.
The set of all tangent vectors at p is denoted by Tp M and called the tangent space at p.
Suppose M is d-dimensional and φ : U → Ũ ⊆ R^d is a smooth coordinate chart on some open subset U ⊆ M. Writing the coordinate functions of φ as x^(1), · · · , x^(d), we define the coordinate vectors ∂/∂x^(i)|_p ∈ T_p M by ∂/∂x^(i)|_p f = ∂(f ∘ φ^{-1})/∂x^(i) (φ(p)) for f ∈ C^∞(M). These vectors form a basis for T_p M, which therefore has dimension d. Thus, once a smooth coordinate chart has been chosen, every tangent vector v ∈ T_p M can be written uniquely in the form
v = v^(1) ∂/∂x^(1)|_p + · · · + v^(d) ∂/∂x^(d)|_p.
T HEOREM A.4 (Inverse Function Theorem for Manifolds, Thm. 4.5 of [28]). Suppose
M and N are smooth manifolds and F : M → N is a smooth map. If the linear map dFp is
invertible at some point p ∈ M, then there exist connected neighborhoods U0 of p and V0 of
F (p) such that F |U0 : U0 → V0 is a diffeomorphism.
The most useful consequence of the inverse function theorem is the rank theorem below. A smooth map F : M → N is said to have constant rank if the linear map dF_p has the same rank at every point p ∈ M.
T HEOREM A.5 (Rank Theorem, Thm. 4.12 of [28]). Suppose M and N are smooth
manifolds of dimensions m and n, respectively, and F : M → N is a smooth map with
constant rank r . For each p ∈ M there exist smooth charts (U, φ) for M centered at p
and (V, ψ) for N centered at F (p) such that F (U ) ⊆ V , in which F has a coordinate
representation of the form
F̃(x^(1), · · · , x^(r), x^(r+1), · · · , x^(m)) = (x^(1), · · · , x^(r), 0, · · · , 0).
The most important types of constant-rank maps are listed below. In all of these definitions,
M and N are smooth manifolds, and F : M → N is a smooth map.
• F is a submersion if its differential is surjective at each point, or equivalently if it has
constant rank equal to dim N .
• F is an immersion if its differential is injective at each point, or equivalently if it has
constant rank equal to dim M.
• F is a local diffeomorphism if every point p ∈ M has a neighborhood U such that F |U is a
diffeomorphism onto an open subset of N , or equivalently if F is both a submersion and an
immersion.
• F is a smooth embedding if it is an injective immersion that is also a topological embedding
(a homeomorphism onto its image, endowed with the subspace topology).
T HEOREM A.6 (Constant-Rank Level Set Theorem, Thm. 5.12 of [28]). Suppose M and
N are smooth manifolds, and Φ : M → N is a smooth map with constant rank r . Every level
set of Φ is a properly embedded submanifold of codimension r in M.
C OROLLARY A.6.1 (Submersion Level Set Theorem, Cor. 5.13 of [28]). Suppose M and
N are smooth manifolds, and Φ : M → N is a smooth submersion. Every level set of Φ is a
properly embedded submanifold of M, whose codimension is equal to dim N .
In fact, a map does not have to be a submersion, or even to have constant rank, for its level
sets to be embedded submanifolds. If Φ : M → N is a smooth map, a point p ∈ M is called
a regular point of Φ if the linear map dΦp : Tp M → TΦ(p) N is surjective, and p is called a
critical point of Φ if it is not. A point c ∈ N is called a regular value of Φ if every point of
Φ−1 (c) is a regular point of Φ, and a critical value otherwise. A level set Φ−1 (c) is called a
regular level set of Φ if c is a regular value of Φ.
C OROLLARY A.6.2 (Regular Level Set Theorem, Cor. 5.14 of [28]). Let M and N be
smooth manifolds, and let Φ : M → N be a smooth map. Every regular level set of Φ is a
properly embedded submanifold of M whose codimension is equal to dim N .
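As a standard illustration, take Φ : R^D → R with Φ(x) = ∥x∥²_2. For every c > 0, the differential dΦ_x = 2x^T is surjective at each x ∈ Φ^{-1}(c), so c is a regular value, and the sphere Φ^{-1}(c) = {x ∈ R^D : ∥x∥²_2 = c} is a properly embedded submanifold of R^D of codimension 1.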
A.3. Riemannian manifold. There are many important geometric concepts in Euclidean
space, such as length and angle, which are derived from inner product. To extend these
geometric ideas to abstract smooth manifolds, we need a structure that amounts to a smoothly
varying choice of inner product on each tangent space.
Let M be a smooth manifold. A Riemannian metric on M is a collection of inner products whose element at p ∈ M is an inner product g_p : T_p M × T_p M → R that varies smoothly with
respect to p. A Riemannian manifold is a pair (M, g), where M is a smooth manifold and
g is a specific choice of Riemannian metric on M. If M is understood to be endowed with
a specific Riemannian metric, a conventional statement often used is “M is a Riemannian
manifold.” In the following sections, we assume (M, g) is an oriented Riemannian d-manifold.
Another important construction provided by a metric on an oriented manifold is a canonical
volume form. For (M, g), there is a unique d-form dVg on M, called the Riemannian volume
form, characterized by
dV_g = √(det(g_ij)) dx^(1) ∧ · · · ∧ dx^(d),
where the dx^(i) are 1-forms from any oriented local coordinates. Here, det(g_ij) is the absolute value of the determinant of the matrix representation of the metric tensor on the manifold.
The Riemannian volume form allows us to integrate functions on an oriented Riemannian
manifold. Let f be a continuous, compactly supported real-valued function on (M, g). Then, f dV_g is a compactly supported d-form. Therefore, the integral ∫_M f dV_g makes sense, and we define it as the integral of f over M. Similarly, we can define probability measures on M, and if M is compact, the volume of M can be evaluated as
Vol(M) = ∫_M dV_g = ∫_M 1 dV_g.
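As a simple worked check (added for illustration), consider the unit sphere S² with spherical coordinates (θ, φ): the metric matrix is diag(1, sin²θ), so dV_g = sin θ dθ ∧ dφ, and a direct numerical integration recovers Vol(S²) = 4π.

import numpy as np

# Unit sphere S^2 in spherical coordinates (theta, phi): g = diag(1, sin(theta)^2),
# hence sqrt(det g) = sin(theta) and Vol(S^2) = int_0^pi int_0^{2 pi} sin(theta) dphi dtheta.
n = 2000
theta = (np.arange(n) + 0.5) * np.pi / n       # midpoint rule in theta
vol = np.sum(np.sin(theta)) * (np.pi / n) * (2 * np.pi)
print(vol, 4 * np.pi)                          # both approximately 12.566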
A curve in M usually means a parametrized curve, namely a continuous map γ : I → M,
where I ⊆ R is some interval. To say that γ is a smooth curve is to say that it is smooth as a
map from I to M . A smooth curve γ : I → M has a well-defined velocity γ ′ (t) ∈ Tγ(t) M for
each t ∈ I . We say that γ is a regular curve if γ ′ (t) ̸= 0 for t ∈ I . This implies that the image
of γ has no “corners” or “kinks.” For brevity, we refer to a piecewise regular curve segment
γ : [a, b] → M as an admissible curve, and any partition (a0 , · · · , ak ) such that γ|[ai−1 ,ai ] is
smooth for each i as an admissible partition for γ . If γ is an admissible curve, we define the
length of γ as
L_g(γ) = ∫_a^b |γ′(t)|_g dt.
The speed of γ at any time t ∈ I is defined as the scalar |γ ′ (t)|. We say that γ is a unit-speed
curve if |γ ′ (t)| = 1 for all t, and a constant-speed curve if |γ ′ (t)| is constant. If γ : [a, b] → M
is a unit-speed admissible curve, then its arc-length function has the simple form s(t) = t − a.
For this reason, a unit-speed admissible curve whose parameter interval is of the form [0, b] is
said to be parametrized by arc-length.
For each pair of points p, q ∈ M, we define the Riemannian distance from p to q , denoted
by dM (p, q), as the infimum of the lengths of all admissible curves from p to q . When M is
connected, we say an admissible curve γ is a minimizing curve if and only if Lg (γ) is equal to
the distance between its endpoints. A unit-speed minimizing curve is also called a geodesic.
Thus, we use geodesic distance and Riemannian distance interchangeably.
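For intuition (an added illustration), on the unit sphere the Riemannian distance between two points is the great-circle angle between them, which exceeds the chordal Euclidean distance measured in the ambient space.

import numpy as np

def sphere_geodesic_distance(p, q):
    """Great-circle (geodesic) distance between unit vectors p and q."""
    return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 1.0, 0.0])
print(sphere_geodesic_distance(p, q))   # pi/2, about 1.571 (geodesic distance)
print(np.linalg.norm(p - q))            # sqrt(2), about 1.414 (ambient Euclidean distance)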
T HEOREM A.7 (Existence and Uniqueness of Geodesics, Thm 4.27 of [29]). For every
p ∈ M, w ∈ Tp M, and t0 ∈ R, there exist an open interval I ⊆ R containing t0 and a
geodesic γ : I → M satisfying γ (t0 ) = p and γ ′ (t0 ) = w . Any two such geodesics agree on
their common domain.
D EFINITION A.8 (Normal matrices). A square matrix A is normal when AA∗ =
A∗ A, where A∗ is its conjugate-transpose. This is equivalent to saying that there exists a
unitary matrix U such that U AU ∗ is diagonal (and the diagonal elements are precisely the
eigenvalues of A). Every Hermitian and every unitary matrix is normal.
D EFINITION A.9 (Trace norm). The trace norm is defined for every A by
∥A∥_F² := Tr(AA∗) = Tr(A∗A) = Σ_{1≤i,j≤n} |A_{i,j}|².
This is also known as the Frobenius, Schur, or Hilbert–Schmidt norm.
D EFINITION A.10 (Principal angles). Suppose A and B are two matrices whose columns form orthonormal bases of two subspaces; we call each
θ_i(A, B) = arccos(λ_i(A, B))
the i-th principal angle between A and B, where λ_i(A, B) is the i-th largest singular value of A^T B. Let Θ(A, B) denote the diagonal matrix whose i-th diagonal entry is θ_i(A, B), and let sin Θ(A, B) be taken entrywise, i.e.,
sin Θ(A, B) := diag(sin θ_i(A, B)).
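A small numerical sketch of Definition A.10 (added here for illustration; it assumes the subspaces are handed over as matrices whose columns are orthonormalized before the angles are computed).

import numpy as np

def principal_angles(A, B):
    """Principal angles between the column spans of A and B."""
    Qa, _ = np.linalg.qr(A)                  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)                  # orthonormal basis for span(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))  # theta_i = arccos(lambda_i)

# Example: the xy-plane versus a plane tilted by 30 degrees about the x-axis in R^3.
t = np.pi / 6
A = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
B = np.array([[1.0, 0.0], [0.0, np.cos(t)], [0.0, np.sin(t)]])
print(principal_angles(A, B))            # approximately [0, pi/6]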
L EMMA B.1 (Chernoff bound). The generic Chernoff bound for a random variable X is obtained by applying Markov's inequality to e^{tX}. For every t > 0,
P(X ≥ a) = P(e^{tX} ≥ e^{ta}) ≤ E(e^{tX}) / e^{ta}.
Since the inequality holds for every t > 0, we have
P(X ≥ a) ≤ inf_{t>0} E(e^{tX}) / e^{ta}.
C OROLLARY B.1.2. Let n ∼ Bino(N, p) be a binomial random variable with size N and
probability p. According to the Chernoff bound,
P(n/N ≥ p + ϵ) ≤ exp{−N D_KL(p + ϵ ∥ p)},
P(n/N ≤ p − ϵ) ≤ exp{−N D_KL(p − ϵ ∥ p)},
for ϵ > 0, where
D_KL(a ∥ b) = a log(a/b) + (1 − a) log((1 − a)/(1 − b))
denotes the Kullback–Leibler divergence between Bernoulli distributions Be(a) and Be(b).
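As a quick numerical illustration (the parameter values below are chosen arbitrarily), the KL-based bound of Corollary B.1.2 can be compared against the exact binomial tail probability.

import numpy as np
from scipy.stats import binom

def dkl_bernoulli(a, b):
    """Kullback-Leibler divergence between Bernoulli(a) and Bernoulli(b)."""
    return a * np.log(a / b) + (1 - a) * np.log((1 - a) / (1 - b))

N, p, eps = 1000, 0.1, 0.03
k = int(round(N * (p + eps)))                      # threshold count for n/N >= p + eps
exact_tail = binom.sf(k - 1, N, p)                 # P(n >= k)
chernoff_bound = np.exp(-N * dkl_bernoulli(p + eps, p))
print(exact_tail, chernoff_bound)                  # the exact tail lies below the bound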
L EMMA B.2. Assume there is a sequence of observed points {y_i}_{i=1}^n, with a series of weights W(y_1), · · · , W(y_n). Let the local moving weighted average be
μ̂_n = (Σ_{i=1}^n W(y_i) y_i) / (Σ_{i=1}^n W(y_i)).
Then, if {y : W(y) > 0} ⊂ B_D(z, r),
√n (μ̂_n − μ̂_w) →_d N(0, Σ / E(W)²),
with Σ ≤ r² I_D and μ̂_w = E(W Y)/E(W).
P ROOF. According to the central limit theorem and the law of large numbers,
(1/n) Σ_{i=1}^n w_i →_{a.s.} E(W),
√n ( (1/n) Σ_{i=1}^n w_i y_i − E(W Y) ) →_d N(0, Σ),
where Σ ≤ r² I_D. Thus,
√n ( μ̂_n − E(W Y)/E(W) ) →_d N(0, Σ / E(W)²).
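A minimal sketch of the local moving weighted average in Lemma B.2 is given below; the compactly supported weight (r² − ∥y − z∥²_2)₊^k is an illustrative choice that mirrors the weights appearing later in the proofs, not necessarily the exact one used by ysl23.

import numpy as np

def local_weighted_average(Y, z, r, k=2):
    """Weighted local mean of the rows of Y, with weights supported in B_D(z, r)."""
    sq_dist = np.sum((Y - z) ** 2, axis=1)
    W = np.clip(r ** 2 - sq_dist, 0.0, None) ** k   # vanishes outside B_D(z, r)
    if W.sum() == 0:
        raise ValueError("no observations fall inside B_D(z, r)")
    return (W[:, None] * Y).sum(axis=0) / W.sum()

# Toy usage: noisy samples around the unit circle in R^2, averaged near z = (1.1, 0).
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 5000)
Y = np.column_stack([np.cos(t), np.sin(t)]) + 0.05 * rng.normal(size=(5000, 2))
print(local_weighted_average(Y, np.array([1.1, 0.0]), r=0.3))   # pulled back toward the circle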
≤ c_2 r^d,
where the first inequality comes from the Chernoff bound, while the last one holds because σ is sufficiently small.
Then, for P(Y ∈ B_D(z, r)), on the one hand,
P(Y ∈ B_D(z, r)) ≥ P(∥ξ∥_2 ≤ c_1 r) P(X ∈ M ∩ B_D(z, (1 − c_1)r))
≥ (1 − c_2 r^d) vol(M ∩ B_D(z, (1 − c_1)r)) / vol(M)
≥ c_3 r^d.
On the other hand,
P(Y ∈ B_D(z, r)) = P(X ∈ M ∩ B_D(z, C_2 r), ∥Y − z∥_2 ≤ r) + P(X ∉ M ∩ B_D(z, C_2 r), ∥Y − z∥_2 ≤ r)
≤ P(X ∈ M ∩ B_D(z, C_2 r)) + P(∥ξ∥_2 ≥ (C_2 − 1)r)
≤ vol(M ∩ B_D(z, C_2 r)) / vol(M) + c_4 r^d
≤ c_5 r^d.
P ROOF. The number of points n can be viewed as a binomial random variable with size N and probability parameter p = cr^d. For any c_1 ∈ (0, 1), according to Corollary B.1.2,
P(n/N ≤ (1 − c_1)p) ≤ exp{−N (1 − c_1)p log(1 − c_1)} × exp{−N (1 − (1 − c_1)p) log((1 − (1 − c_1)p)/(1 − p))} ≤ exp{−C_1 σ^{−3}},
P(n/N ≥ (1 + c_1)p) ≤ exp{−N (1 + c_1)p log(1 + c_1)} × exp{−N (1 − (1 + c_1)p) log((1 − (1 + c_1)p)/(1 − p))} ≤ exp{−C_2 σ^{−3}}.
Therefore,
P(C_3 Dσ^{−3} ≤ n ≤ C_4 Dσ^{−3}) ≥ 1 − 2 exp{−C_5 σ^{−3}}.
F IG 17. Illustration of the integral region in the proof of Proposition 2.5: (a) The region of calculating the
conditional expectation E(ξ|ξ ∈ BD (∆U, r)), where the two shaded parts cancel each other out; (b) Three
multidimensional cubes designed for bounding the expectation.
P ROOF. Without loss of generality, we adjust the Cartesian coordinate system such that z = (∆, 0, · · · , 0) and ξ = (ξ^(1), · · · , ξ^(D)), with ∆ = ∥z∥_2 ≤ C_1 σ for some constant C_1. As illustrated in Figure 17, a large part of the integration region cancels, and the expectations can be bounded through three integrations over multidimensional cubes. That is,
V_1 = [∆ − r, ∆ + r] × [−r, r]^{D−1},
V_2 = [(∆ − r)/√D, (r − ∆)/√D]^D,
V_3 = [∆ − r/√D, ∆ + r/√D] × [−r/√D, r/√D]^{D−1}.
To bound the distance between E(ξ | ξ ∈ B_D(z, r)) and the origin, let
I_1 = ∫_{V_1} |ξ^(1)| (2πσ²)^{−D/2} exp{−∥ξ∥²_2/(2σ²)} dξ,
I_2 = ∫_{V_2} |ξ^(1)| (2πσ²)^{−D/2} exp{−∥ξ∥²_2/(2σ²)} dξ,
I_3 = ∫_{V_3} (2πσ²)^{−D/2} exp{−∥ξ∥²_2/(2σ²)} dξ.
Because of the symmetry,
∥E(ξ | ξ ∈ B_D(z, r))∥_2 = | ∫_{B_D(z,r)} ξ^(1) (2πσ²)^{−D/2} exp{−∥ξ∥²_2/(2σ²)} dξ | / ∫_{B_D(z,r)} (2πσ²)^{−D/2} exp{−∥ξ∥²_2/(2σ²)} dξ ≤ (I_1 − I_2)/I_3.
For simplicity, denote
2^{1−D} (2πσ²)^{D/2} I_1 = ∫_{∆−r}^{∆+r} |t| exp{−t²/(2σ²)} dt ( ∫_0^r exp{−s²/(2σ²)} ds )^{D−1} := I_1^+ + I_1^−,
2^{1−D} (2πσ²)^{D/2} I_2 = ∫_{(∆−r)/√D}^{(r−∆)/√D} |t| exp{−t²/(2σ²)} dt ( ∫_0^{(r−∆)/√D} exp{−s²/(2σ²)} ds )^{D−1} := I_2^+ + I_2^−.
Meanwhile, let
a = ∫_0^{(r−∆)/√D} t exp{−t²/(2σ²)} dt,   δ_a = ∫_{(r−∆)/√D}^{r+∆} t exp{−t²/(2σ²)} dt,
b = ∫_0^{(r−∆)/√D} exp{−s²/(2σ²)} ds,   δ_b = ∫_{(r−∆)/√D}^{r} exp{−s²/(2σ²)} ds.
Then,
a = σ²( 1 − exp{−Cr²/(2σ²)} ) < σ²,   b = σ(Φ(C) − Φ(0)) < Cσ,
δ_a = σ²( exp{−((r−∆)/√D)²/(2σ²)} − exp{−(r+∆)²/(2σ²)} ) < Cσ³,
δ_b < ( r − (r−∆)/√D ) exp{−((r−∆)/√D)²/(2σ²)} = Cσ² log(1/σ) < Cσ.
Furthermore, we can obtain
I_1^+ − I_2^+ = (a + δ_a)(b + δ_b)^{D−1} − a b^{D−1}
= a(b + δ_b)^{D−1} − a b^{D−1} + δ_a (b + δ_b)^{D−1}
= δ_a (b + δ_b)^{D−1} + a( (b + δ_b)^{D−1} − b^{D−1} )
< Cσ^{D+2} + a δ_b ( (b + δ_b)^{D−2} + (b + δ_b)^{D−3} b + · · · + (b + δ_b) b^{D−3} + b^{D−2} )
< Cσ^{D+2}.
Thus, I_1 − I_2 < Cσ^{−D}(I_1^+ − I_2^+) = Cσ². Additionally, it is clear that I_3 > C, and hence,
∥E(ξ | ξ ∈ B_D(z, r))∥_2 ≤ Cσ².
ν̃(y) = (ν_R(y) + ν_c(y)) / ( ∫_{B_D(z,r)} ν_R(y) dy + ∫_{B_D(z,r)} ν_c(y) dy )
for y ∈ B_D(z, r). Then, the relative difference between ν̃(y) and ν̃_R(y) can be evaluated as
|ν̃(y) − ν̃_R(y)| = | (ν_R(y) + ν_c(y)) / ( ∫_{B_D(z,r)} ν_R(y) dy + ∫_{B_D(z,r)} ν_c(y) dy ) − ν_R(y) / ∫_{B_D(z,r)} ν_R(y) dy |
≤ | ν_c(y)/ν_R(y) − ∫_{B_D(z,r)} ν_c(y) dy / ∫_{B_D(z,r)} ν_R(y) dy | ν̃_R(y).
Therefore,
|ν̃(y) − ν̃_R(y)| ≤ Cσ^η ν̃_R(y).
P ROOF. Recall that z∗ is the origin, and z − z∗ is the (d + 1)-th direction in the Cartesian coordinate system. Then,
μ_z^B = (μ^(1), · · · , μ^(d), μ^(d+1), μ^(d+2), · · · , μ^(D)),   z = (0, · · · , 0, ∆, 0, · · · , 0),
where ∆ = ∥z − z∗∥ ≤ cσ. The angle between μ_z^B − z and z∗ − z can be represented by its sine as follows:
sin²(Θ(μ_z^B − z, z∗ − z)) = 1 − cos²(Θ(μ_z^B − z, z∗ − z))
= 1 − ( (μ_z^B − z) · (z∗ − z) )² / ( ∥μ_z^B − z∥²_2 ∥z∗ − z∥²_2 )
= Σ_{i≠d+1} (μ^(i))² / ( Σ_{i≠d+1} (μ^(i))² + (μ^(d+1) − ∆)² ).
If
(B.1)   |μ^(i) − ∆| ≥ c_1 σ for i = d + 1,   and   |μ^(i)| ≤ c_2 σ² √(log(1/σ)) for i ≠ d + 1,
then
sin²(Θ(μ_z^B − z, z∗ − z)) ≤ (D − 1)C_2² σ⁴ log(1/σ) / ( (D − 1)C_2² σ⁴ log(1/σ) + C_1² σ² ) ≤ Cσ² log(1/σ)
for some constant C.
In other words, equation (B.1) is sufficient for sin(Θ(μ_z^B − z, z∗ − z)) ≤ Cσ √(log(1/σ)).
P ROPOSITION B.3. Assume there is a mapping Ψ : D → MR that satisfies, for any point
z = (z1 , · · · , zd , 0, · · · , 0) ∈ D,
Ψ(z) = (z1 , · · · , zd , ψ(z1 , · · · , zd )).
P ROPOSITION B.4. Since the approximation error of the tangent space for the manifold localization is of quadratic order,
d(Ψ(z)) = (1 + g(z)) dz
with |g(z)| < C∥z∥_2.
and h_i(y) ∈ B_D(z, r). That is, for all y ∈ B_D(z, r), h_i(y) is its mirror image with respect to the i-th direction, and
y ∈ B_D(z, r) ⇔ h_i(y) ∈ B_D(z, r), for i ≠ d + 1,
x ∈ D ⇒ h_i(x) ∈ D, for i = 1, · · · , d,
x ∈ D ⇒ h_i(x) = x, for i = d + 1, · · · , D.
Let B_i^+ and B_i^− be two hemispheres such that
B_i^+ = {y ∈ B_D(z, r) : y^(i) > 0},   B_i^− = {y ∈ B_D(z, r) : y^(i) < 0}.
Then,
μ^(i) = ∫_{B_i^+} y^(i) ν̃_D(y) dy + ∫_{B_i^−} y^(i) ν̃_D(y) dy
= ∫_{B_i^+} y^(i) ν̃_D(y) dy + ∫_{B_i^+} (h_i(y))^(i) ν̃_D(h_i(y)) d(h_i(y))
= ∫_{B_i^+} y^(i) ( ν̃_D(y) − ν̃_D(h_i(y)) ) dy.
To show μ^(i) = 0, it is sufficient to show ν̃_D(y) = ν̃_D(h_i(y)) or ν_D(y) = ν_D(h_i(y)). Recall that
ν_D(y) = ∫_D ϕ_σ(y − x) ω(x) dx,
and
∥y − x∥_2 = ∥h_i(y) − h_i(x)∥_2,   ∥h_i(y) − x∥_2 = ∥y − h_i(x)∥_2.
Therefore, for i = 1, · · · , d,
ν_D(h_i(y)) = ∫_D ϕ_σ(h_i(y) − x) ω(x) dx = ∫_D ϕ_σ(y − h_i(x)) ω(h_i(x)) d(h_i(x)) = ν_D(y),
and, for i = d + 2, · · · , D,
ν_D(y) = ∫_D ϕ_σ(y − x) ω(x) dx = ∫_D ϕ_σ(h_i(y) − h_i(x)) ω(h_i(x)) d(h_i(x)) = ∫_D ϕ_σ(h_i(y) − x) ω(x) dx = ν_D(h_i(y)).
Thus,
μ^(i) = 0, for i ≠ d + 1.
= ∫_{B_D(z,r)} (∆ − y^(d+1)) ϕ_σ(y^(d+1)) ( ∫_D ∏_{j=1}^d ϕ_σ(y^(j) − x^(j)) ω(x) dx ) ∏_{j=d+2}^D ϕ_σ(y^(j)) dy
≥ C ∫_{B_D(z,r)} (∆ − y^(d+1)) ϕ_σ(y^(d+1)) ∏_{j=d+2}^D ϕ_σ(y^(j)) dy
≥ C ∫_0^∆ t ( ϕ_σ(t − ∆) − ϕ_σ(t + ∆) ) dt ∫_{Σ_{j≠d+1}(y^(j))² ≤ r²−∆²} ∏_{j=d+2}^D ϕ_σ(y^(j)) dy
:= C I_1 I_2,
where the last inequality is the result of cropping the integration region (similar to that in Figure 17), while the first inequality stems from the fact that
(B.2)   ∫_D ∏_{j=1}^d ϕ_σ(y^(j) − x^(j)) ω(x) dx ≥ 1 − P( ∥ξ′∥_2 ≥ (R − r) | ξ′ ∼ N(0, σ² I_d) ) ≥ 1 − cσ^C ≈ 1.
If we let p = (y^(1), · · · , y^(d)) and q = (y^(d+2), · · · , y^(D)), with ∆ = C_0 σ and r ≥ ∆ + c_0 σ, we have
I_1 = ∫_{−∆}^{∆} t ϕ_σ(t − ∆) dt = ( C_0 √π Erf(C_0) − 2(e^{−C_0²} − 1) ) σ / √(2π),
I_2 = ∫_{∥p∥²_2+∥q∥²_2 ≤ r²−∆²} ϕ_σ(q) dp dq
≥ ∫_{∥p∥²_2+∥q∥²_2 ≤ c_0²σ²} ϕ_σ(q) dp dq
= Cσ^{−(D−d−1)} ∫_0^{c_0σ} (c_0²σ² − s²)^{d/2} s^{D−d−2} exp{−s²/(2σ²)} ds
≥ cσ^d.
In other words, when C_0 > 0 and c_0 > 0, we have I_1 ≥ cσ, I_2 ≥ cσ^d, and thus
|∆ − μ^(d+1)| ≥ C r^{−d} I_1 I_2 ≥ cσ.
P ROOF. The proof is based on the framework of Lemma B.2 and Corollary 2.4.1. We first provide an estimate of the local sample size, and then show the equivalence between μ_z^B and E(W(Y)Y)/E(W(Y)).
For simplicity, we let the collection of observations that fall in B_D(z, r_0) be {y_i}_{i=1}^n, with size n. According to Corollary 2.4.1, if N = Cr_0^{−d}σ^{−3},
P(C_3 Dσ^{−3} ≤ n ≤ C_4 Dσ^{−3}) ≥ 1 − 2 exp{−C_5 σ^{−3}}.
If we define p = (y^(1), · · · , y^(d)), t = y^(d+1), and q = (y^(d+2), · · · , y^(D)), and let η ∈ R^{2k} be an auxiliary vector, then
E(W(Y)) = ∫_{B_D(z,r_0)} ∫_D W(y) ϕ_σ(y − x) ω(x) dx dy / ∫_{B_D(z,r_0)} ∫_D ϕ_σ(y − x) ω(x) dx dy
≈ c r_0^{−d} ∫_{∥p∥²_2+(t−∆)²+∥q∥²_2 ≤ r_0²} W(y) ϕ_σ(t − ∆) ϕ_σ(q) dp dt dq
= c r_0^{−(d+2k)} ∫_{∥p∥²_2+(t−∆)²+∥q∥²_2 ≤ r_0²} (r_0² − ∥p∥²_2 − (t − ∆)² − ∥q∥²_2)^k ϕ_σ(t − ∆) ϕ_σ(q) dp dt dq
= c r_0^{−(d+2k)} ∫_{∥p∥²_2+(t−∆)²+∥q∥²_2+∥η∥²_2 ≤ r_0²} ϕ_σ(t − ∆) ϕ_σ(q) dη dp dt dq
= c.
Meanwhile, the i-th element of E(W(Y)Y) can be expressed as
(E(W(Y)Y))^(i) = ∫_{B_D(z,r_0)} ∫_D W(y) y^(i) ϕ_σ(y − x) ω(x) dx dy / ∫_{B_D(z,r_0)} ∫_D ϕ_σ(y − x) ω(x) dx dy
≈ c r_0^{−d} ∫_{∥p∥²_2+(t−∆)²+∥q∥²_2 ≤ r_0²} W(y) y^(i) ϕ_σ(t − ∆) ϕ_σ(q) dp dt dq
= c r_0^{−(d+2k)} ∫_{∥p∥²_2+(t−∆)²+∥q∥²_2+∥η∥²_2 ≤ r_0²} y^(i) ϕ_σ(t − ∆) ϕ_σ(q) dη dp dt dq,
where the two approximation signs are a consequence of (B.2). By introducing the auxiliary vector η, these two expectations can be viewed as analogues of our manifold-fitting model in a higher-dimensional case, where the dimensionalities of the ambient space and the latent manifold are D + 2k and d + 2k, respectively.
Hence, let μ̂_w = E(W(Y)Y)/E(W(Y)); then, according to Theorem 3.1,
|μ̂_w^(d+1) − ∆| ≥ c_1 σ,   and   |μ̂_w^(i)| ≤ c_2 σ², for i ≠ d + 1.
Combining the result with Corollary 2.4.1 and Corollary B.2.1, if the total sample size N = Cr_0^{−d}σ^{−3},
P(∥F(z) − μ̂_w∥_2 ≤ cσ²) ≥ 1 − C_1 σ^{c_1−1} exp{−C_2 σ^{c_1−1}},
for some constants C_1, C_2, and any c_3 ∈ (0, 1), and thus
sin{Θ(F(z) − z, z∗ − z)} ≤ C_1 σ √(log(1/σ))
with probability at least 1 − C_2 exp(−C_3 σ^{−c}), for some constants c, C_1, C_2, and C_3.
P ROOF. Assume Π̂⊥_{z∗} satisfies
∥Π̂⊥_{z∗} − Π⊥_{z∗}∥_F ≤ cσ^κ,
which can also be divided into three parts. According to Lemma 2.6, we can assume ∥X − z∗∥_2 ≤ Cσ√(log(1/σ)) for some constant C. Let δ_z = z − z∗, and the three parts can be evaluated as follows:
(a) Π̂−_{z∗} δ_z:
The norm of Π̂−_{z∗} δ_z is upper bounded as
∥Π̂−_{z∗} δ_z∥_2 = ∥Π−_{z∗} δ_z + (Π̂−_{z∗} − Π−_{z∗}) δ_z∥_2
≤ ∥Π−_{z∗} δ_z∥_2 + ∥Π̂−_{z∗} − Π−_{z∗}∥_F ∥δ_z∥_2
≤ 0 + cσ^κ · σ
≤ Cσ^{1+κ},
for some constant C.
(b) E_ν[Π̂⊥_{z∗}(X − z∗) | Y ∈ V̂_z]:
≤ ∥Π̂⊥_{z∗}(z − z∗)∥_2 + ∥Π̂⊥_{z∗}(z − X)∥_2
≤ ∥z − z∗∥_2 + Cσ^{1+κ} √(log(1/σ))
≤ Cσ.
Thus, if we let ξ′ = Π̂⊥_{z∗} ξ, according to Proposition 2.5,
∥E_ν[Π̂⊥_{z∗} ξ | Y ∈ V̂_z]∥_2 ≤ E_ω[ ∥E_ϕ[Π̂⊥_{z∗} ξ | X, X + ξ ∈ V̂_z]∥_2 ]
≤ E_ω[ ∥E(ξ′ | ξ′ ∈ B_{D−d}(a_{∆′}, r_2))∥_2 ]
≤ Cσ²
for some constant C.
Therefore,
∥μ̂_z^V − z∗∥_2 ≤ Cσ^{1+κ} √(log(1/σ)),
for some constant C, and μ̂_z^V lies within O(σ^{1+κ} √(log(1/σ))) of M.
which can also be divided into three parts. According to Lemma 2.6, we can assume ∥X − z∗∥_2 ≤ Cσ√(log(1/σ)) for some constant C. Let δ_z = z − z∗. The three parts can be evaluated as follows:
(a) Ũ− δ_z:
As δ_z is orthogonal to the basis of U−,
∥Ũ− δ_z∥_2 ≤ ∥U− δ_z∥_2 + ∥U− − Ũ−∥_F ∥δ_z∥_2 ≤ Cσ² √(log(1/σ)).
(b) E_ν[Ũ(X − z∗) | Y ∈ V̂_z]:
Since ∥U(X − z∗)∥_2 ≤ ∥Π⊥_{z∗}(X − z∗)∥_2,
∥Ũ(X − z∗)∥_2 ≤ ∥Π⊥_{z∗}(X − z∗)∥_2 + ∥Ũ − U∥_F ∥X − z∗∥_2
≤ (1/(2τ)) ∥X − z∗∥²_2 + σ ∥X − z∗∥_2
≤ Cσ² log(1/σ).
(c) E_ν[Ũ ξ | Y ∈ V̂_z]:
≤ ∥z − z∗∥_2 + Cσ² √(log(1/σ))
≤ Cσ.
Thus, if we let ξ′ = Ũ ξ, according to Proposition 2.5,
∥E_ν[Ũ ξ | Y ∈ V̂_z]∥_2 ≤ E_ω[ ∥E_ϕ[Ũ ξ | X, X + ξ ∈ V̂_z]∥_2 ]
≤ E_ω[ ∥E(ξ′ | ξ′ ∈ B_{D−d}(a_{∆′}, r_2))∥_2 ]
≤ Cσ²
for some constant C.
Therefore,
∥μ̂_z^V − z∗∥_2 ≤ Cσ² log(1/σ).
Bound of ∥μ̂_w − μ_w∥_2:
According to Theorem 3.5, ∥Û − U∥_F ≤ C_1 σ √(log(1/σ)) with probability at least 1 − C_2 exp(−C_3 σ^{−c}), and the first derivatives of w_u and w_v are both upper bounded by a constant C. We have
|Ŵ_u − W_u| = |w_u(Û(y − z)) − w_u(U(y − z))| ≤ C ∥Û − U∥_F ∥y − z∥_2 ≤ C_4 σ² log(1/σ),
|Ŵ_v − W_v| = |w_v((I_D − Û)(y − z)) − w_v((I_D − U)(y − z))| ≤ C ∥Û − U∥_F ∥y − z∥_2 ≤ C_5 σ² log(1/σ),
and thus,
|β∗(y) − β(y)| = |W_u W_v − Ŵ_u Ŵ_v|
= |W_u W_v − W_u Ŵ_v + W_u Ŵ_v − Ŵ_u Ŵ_v|
≤ W_u |Ŵ_v − W_v| + Ŵ_v |Ŵ_u − W_u|
≤ C_6 σ² log(1/σ).
Property of μ_{w,D}:
As in the proof in Section 3, we let z∗ be the origin and z − z∗ be the (d + 1)-th direction in the Cartesian coordinate system. We also let p = (y^(1), · · · , y^(d)), t = y^(d+1), and q = (y^(d+2), · · · , y^(D)). With U the same as before, let ∥u∥ = ∥U(y − z)∥ = |t − ∆| and ∥v∥ = ∥(p, q)∥_2. Assume μ_{w,D} = (μ^(1), · · · , μ^(D)); then, the i-th element of μ_{w,D}, i.e., μ^(i), can be expressed as
μ^(i) = ∫_{∥p∥²_2+∥q∥²_2 ≤ r_1²} ∫_{(t−∆)² ≤ r_2²} y^(i) w_u(|t − ∆|)(r_1² − ∥p∥²_2 − ∥q∥²_2)^k ϕ_σ(t) ϕ_σ(q) dt dp dq / ∫_{∥p∥²_2+∥q∥²_2 ≤ r_1²} ∫_{(t−∆)² ≤ r_2²} w_u(|t − ∆|)(r_1² − ∥p∥²_2 − ∥q∥²_2)^k ϕ_σ(t) ϕ_σ(q) dt dp dq.
For i ≠ d + 1:
μ^(i) ≈ ∫_{∥p∥²_2+∥q∥²_2 ≤ r_1²} y^(i) (r_1² − ∥p∥²_2 − ∥q∥²_2)^k ϕ_σ(q) dp dq / ∫_{∥p∥²_2+∥q∥²_2 ≤ r_1²} (r_1² − ∥p∥²_2 − ∥q∥²_2)^k ϕ_σ(q) dp dq
= ∫_{∥p∥²_2+∥q∥²_2+∥η∥²_2 ≤ r_1²} y^(i) ϕ_σ(q) dη dp dq / ∫_{∥p∥²_2+∥q∥²_2+∥η∥²_2 ≤ r_1²} ϕ_σ(q) dη dp dq
= 0,
where η ∈ R^{2k} is an auxiliary vector making the above conditional expectation an analogue of Lemma 3.4 in (D + 2k − 1)-dimensional space.
For i = d + 1, we assume r_2 = Cσ √(log(1/σ)) > 2∆. We have
μ^(d+1) ≈ ∫_{∆−r_2}^{∆+r_2} t w_u(|t − ∆|) ϕ_σ(t) dt / ∫_{∆−r_2}^{∆+r_2} w_u(|t − ∆|) ϕ_σ(t) dt
≤ C ∫_{∆−r_2}^{∆+r_2} t w_u(|t − ∆|) ϕ_σ(t) dt
= C ∫_0^{∆+r_2} t [w_u(|t − ∆|) − w_u(|t + ∆|)] ϕ_σ(t) dt
= C ∫_{r_2/2−∆}^{∆+r_2} t [w_u(|t − ∆|) − w_u(|t + ∆|)] ϕ_σ(t) dt
≤ C ∫_{r_2/2−∆}^{∆+r_2} t ϕ_σ(t) dt
≤ Cσ².
Therefore, ∥μ_{w,D} − z∗∥_2 ≤ Cσ². Combining the three bounds above,
∥μ̂_w − z∗∥_2 ≤ ∥μ̂_w − μ_w∥_2 + ∥μ_w − μ_{w,D}∥_2 + ∥μ_{w,D} − z∗∥_2 ≤ Cσ² log(1/σ),
with probability at least 1 − C_2 exp(−C_3 σ^{−c}).
According to Corollary 2.4.1 and Corollary B.2.1, if the sample size N = C_1 r_1^{−d} σ^{−3},
∥G(z) − z∗∥_2 ≤ C_2 σ² log(1/σ)
with probability at least 1 − C_2 exp(−C_3 σ^{−c}), for some constants c, C_1, C_2, and C_3.
For the second inequality, let x be an arbitrary point on M. Then, there exists a point
yx ∈ Γ such that x is its projection on M. Hence, from Theorem 4.4 again,
(B.4) d(x, S) ≤ ∥x − μ̂_{y_x}^V∥_2 ≤ Cσ² log(1/σ).
Because (B.3) and (B.4) hold for any s ∈ S and x ∈ M, we complete the proof.
P ROOF. From the smoothness of Γ and G, it is evident that Ŝ is a smooth manifold. For any s ∈ Ŝ, there exists y_s ∈ Γ such that s = G(y_s). Then, according to Theorem 4.5,
(B.5) d(s, M) ≤ ∥G(y_s) − y_s∗∥_2 ≤ Cσ² log(1/σ),
with high probability. For the second inequality, let x be an arbitrary point on M. Then, there exists a point y_x ∈ Γ such that x is its projection on M. Hence, from Theorem 4.5 again,
(B.6) d(x, Ŝ) ≤ ∥x − G(y_x)∥_2 ≤ Cσ² log(1/σ)
with high probability. Thus the proof is completed.
P ROOF. By fixing the projection matrix Π̂⊥_x within a neighborhood, the function defining M̂_x is a smooth map with constant rank D − d, and thus, according to the Constant-Rank Level-Set Theorem, M̂_x is a properly embedded submanifold of dimension d in R^D.
To show the distance, let y be an arbitrary point on M̂_x. Then,
Π̂⊥_x(G(y) − y) = Π̂⊥_x(G(y) − y∗ − (y − y∗)) = 0,
where y∗ is the projection of y onto M. Thus,
∥Π̂⊥_x(y − y∗)∥_2 = ∥Π̂⊥_x(G(y) − y∗)∥_2 ≤ ∥G(y) − y∗∥_2 ≤ Cσ² log(1/σ),
with high probability. Since y ∈ B_D(x, cτ), there exists c_1 ∈ (0, 1) such that ∥Π̂⊥_x − Π⊥_{y∗}∥ ≤ c_1 with high probability. Hence,
∥Π̂⊥_x(y − y∗)∥_2 ≥ ∥Π⊥_{y∗}(y − y∗)∥_2 − ∥(Π⊥_{y∗} − Π̂⊥_x)(y − y∗)∥_2 ≥ (1 − c_1) ∥y − y∗∥_2 ≥ c ∥y − y∗∥_2.
Therefore, for any y ∈ M̂_x,
d(y, M) = ∥y − y∗∥_2 ≤ Cσ² log(1/σ)
with high probability.
P ROOF. The proof of (I) and (II) is exactly the same as the proof of Theorem 5.2. To prove (III), let a, b ∈ M̂ with a ≠ b. When ∥a − b∥_2 ≥ cστ_0, the bound ∥a − b∥²_2 / d(b, T_a M̂) ≥ cστ_0 clearly holds, since ∥a − b∥_2 ≥ d(b, T_a M̂). Hence, we assume that ∥a − b∥_2 < cστ_0. We further denote a_0 = G^{−1}(a) ∈ M̃ and b_0 = G^{−1}(b) ∈ M̃.
Let J_G denote the Jacobian matrix of G; then J_G(a_0) is a linear mapping from T_{a_0} M̃ to T_a M̂. Consider a local chart of Γ at T_{a_0} Γ; then the natural projection from M̃ ∩ B(a_0, ∥b_0 − a_0∥_2) to T_{a_0} M̃ ∩ B(a_0, ∥b_0 − a_0∥_2) is an invertible mapping. Denote the inverse of the natural projection by ϕ; then there exists η_{b_0} ∈ T_{a_0} M̃ such that ϕ(0) = a_0 and ϕ(η_{b_0}) = b_0. Since ∥a − b∥_2 < cστ_0, there exist 0 < c < C such that
c ∥a_0 − b_0∥_2 ≤ ∥η_{b_0} − η_{a_0}∥_2 = ∥η_{b_0}∥_2 ≤ C ∥a_0 − b_0∥_2.
Using the Taylor expansion of G at a_0,
d(b, T_a M̂) ≤ ∥b − J_G(a_0) η_{b_0} − G(a_0)∥_2.
Here, H_G is the Hessian matrix of G, and M_G and L_G are the upper bounds of ∥H_G∥_2 and ∥J_G∥_2.
Moreover,
∥a_0 − b_0∥_2 ≤ (1/ℓ_G) ∥G(a_0) − G(b_0)∥_2 = (1/ℓ_G) ∥a − b∥_2,
where ℓ_G is the lower bound of J_G. Hence, we have
d(b, T_a M̂) ≤ C (M_G + L_G)/ℓ_G · ∥a − b∥²_2.
Finally, the reach of M̂ can be bounded below as
reach(M̂) ≥ min{ cστ_0, c ℓ_G/(M_G + L_G) }.
According to Lemma 17 and Theorem 18 in [42], for any unit-norm direction vector v ∈ R^D,
∥∂_v H(y) − v∥_2 ≤ Cr_0,
with high probability. When σ is sufficiently small, the Jacobian matrix of H, denoted by J_H, is therefore of full rank. For any fixed rank-(D − d) projection matrix Π∗,
Π∗ H : R^D → R^D,   J_{Π∗ H} = Π∗ J_H.
In other words, Π∗ H is a smooth map with constant rank D − d, and thus, according to the Constant-Rank Level-Set Theorem, M̃ = {y ∈ Γ : Π∗ H(y) = 0} is a properly embedded submanifold of codimension D − d in Γ. Therefore, dim M̃ = d.
F IG 18. Assessing the performance of ysl23 in fitting the sphere (N = 5 × 10^4, N_0 = 100, σ = 0.06): the left panel displays points in W surrounding the underlying manifold, while the right panel illustrates the corresponding points in Ŵ.
F IG 19. Assessing the performance of ysl23 in fitting the torus (N = 5 × 10^4, N_0 = 100, σ = 0.06): the left panel displays points in W surrounding the underlying manifold, while the right panel illustrates the corresponding points in Ŵ.
[Figure 20 panel titles: Hausdorff Distance (Sphere, σ = 0.06); Average Distance (Sphere, σ = 0.06). Axis tick values omitted.]
F IG 20. The asymptotic performance of ysl23 when fitting a sphere. The top two panels show how the two distances change with σ, while the bottom two panels show how the distances change with N.
[Figure 21 panel titles: Hausdorff Distance (Torus, N = 2.5 × 10^4); Average Distance (Torus, N = 2.5 × 10^4); Hausdorff Distance (Torus, σ = 0.06); Average Distance (Torus, σ = 0.06). Axis tick values omitted.]
F IG 21. The asymptotic performance of ysl23 when fitting a torus. The top two panels show how the two distances change with σ, while the bottom two panels show how the distances change with N.
F IG 22. The performance of ysl23 with increasing N. Top row, from left to right: N = 3 × 10^2, 3 × 10^3, 3 × 10^4, 3 × 10^5. Middle row, from left to right: N = 1 × 10^3, 5 × 10^3, 2.5 × 10^4, 1.25 × 10^5. Bottom row, from left to right: N = 1 × 10^3, 5 × 10^3, 2.5 × 10^4, 1.25 × 10^5. It can be observed that, for each example, as the number of samples increases, the distribution of Ŵ output by ysl23 becomes more uniform.
SUPPLEMENTARY MATERIAL
Supplementary material for “Manifold Fitting: an Invitation to Statistics”
(doi: COMPLETED BY THE TYPESETTER; .pdf). We include all materials omitted from
the main text.
REFERENCES
[1] B ELKIN , M. and N IYOGI , P. (2003). Laplacian eigenmaps for dimensionality reduction and data representa-
tion. Neural computation 15 1373–1396.
[2] B OISSONNAT, J.-D., G UIBAS , L. J. and O UDOT, S. Y. (2009). Manifold reconstruction in arbitrary dimen-
sions using witness complexes. Discrete & Computational Geometry 42 37–70.
[3] C ALABI , E. (2015). On Kähler manifolds with vanishing canonical class. In Algebraic geometry and topology.
A symposium in honor of S. Lefschetz 12 78–89.
[4] C HEN , Y.-C., G ENOVESE , C. R. and WASSERMAN , L. (2015). Asymptotic theory for density ridges. The
Annals of Statistics 43 1896–1928.
[5] C HENG , S.-W., D EY, T. K. and R AMOS , E. A. (2005). Manifold reconstruction from point samples. In
SODA 5 1018–1027.
[6] DANG , C., S AFAIE , A., P HANIKUMAR , M. and R ADHA , H. (2015). Wind speed and direction estimation
using manifold approximation. In Proceedings of the 14th International Conference on Information
Processing in Sensor Networks 328–329.
[7] D EUTSCH , S., O RTEGA , A. and M EDIONI , G. (2016). Manifold denoising based on spectral graph wavelets.
In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 4673–
4677. IEEE.
[8] D ONOHO , D. L. and G RIMES , C. (2003). Hessian eigenmaps: Locally linear embedding techniques for
high-dimensional data. Proceedings of the National Academy of Sciences 100 5591–5596.
[9] D UNSON , D. B., W U , H.-T. and W U , N. (2022). Graph based Gaussian processes on restricted domains.
Journal of the Royal Statistical Society Series B: Statistical Methodology 84 414–439.
[10] D UNSON , D. B. and W U , N. (2022). Inferring manifolds from noisy data using Gaussian processes. arXiv:
2110.07478.
[11] F EDERER , H. (1959). Curvature measures. Transactions of the American Mathematical Society 93 418–491.
[12] F EFFERMAN , C. (2006). Whitney’s Extension Problem for C m . Annals of mathematics 313–359.
[13] F EFFERMAN , C., I VANOV, S., K URYLEV, Y., L ASSAS , M. and NARAYANAN , H. (2018). Fitting a putative
manifold to noisy data. In Conference On Learning Theory 688–720. PMLR.
[14] F EFFERMAN , C., I VANOV, S., K URYLEV, Y., L ASSAS , M. and NARAYANAN , H. (2020). Reconstruction
and interpolation of manifolds. I: The geometric Whitney problem. Foundations of Computational
Mathematics 20 1035–1133.
[15] F EFFERMAN , C., I VANOV, S., L ASSAS , M., L U , J. and NARAYANAN , H. (2021). Reconstruction and
interpolation of manifolds II: Inverse problems for Riemannian manifolds with partial distance data.
arXiv:2111.14528.
[16] F EFFERMAN , C., I VANOV, S., L ASSAS , M. and NARAYANAN , H. (2021). Fitting a manifold of large reach
to noisy data. arXiv:1910.05084.
[17] F EFFERMAN , C., M ITTER , S. and NARAYANAN , H. (2016). Testing the manifold hypothesis. Journal of the
American Mathematical Society 29 983–1049.
[18] F EFFERMAN , C. L. (2005). A sharp form of Whitney’s extension theorem. Annals of mathematics 509–577.
[19] G ENOVESE , C., P ERONE -PACIFICO , M., V ERDINELLI , I. and WASSERMAN , L. (2012). Minimax Manifold
Estimation. Journal of Machine Learning Research 13 1263–1291.
[20] G ENOVESE , C. R., P ERONE -PACIFICO , M., V ERDINELLI , I. and WASSERMAN , L. (2012). Manifold
estimation and singular deconvolution under Hausdorff loss. The Annals of Statistics 40 941–963.
[21] G ENOVESE , C. R., P ERONE -PACIFICO , M., V ERDINELLI , I. and WASSERMAN , L. (2014). Nonparametric
ridge estimation. The Annals of Statistics 42 1511–1545.
[22] G OODFELLOW, I., P OUGET-A BADIE , J., M IRZA , M., X U , B., WARDE -FARLEY, D., O ZAIR , S.,
C OURVILLE , A. and B ENGIO , Y. (2014). Generative adversarial nets. Advances in neural information
processing systems 27.
[23] H ANSON , A. J. (1994). A construction for computer visualization of certain complex curves. Notices of the
Amer. Math. Soc 41 1156–1163.
[24] J UNG , S., D RYDEN , I. L. and M ARRON , J. S. (2012). Analysis of principal nested spheres. Biometrika 99
551-568.
[25] K IM , I., M ARTINS , R. J., JANG , J., BADLOE , T., K HADIR , S., J UNG , H.-Y., K IM , H., K IM , J., G ENEVET, P.
and R HO , J. (2021). Nanophotonics for light detection and ranging technology. Nature nanotechnology
16 508–524.
[26] L EE , D.-T. and S CHACHTER , B. J. (1980). Two algorithms for constructing a Delaunay triangulation.
International Journal of Computer & Information Sciences 9 219–242.
[27] L EE , J. M. (2010). Introduction to topological manifolds 202. Springer Science & Business Media.
[28] L EE , J. M. (2013). Smooth manifolds. In Introduction to smooth manifolds 1–31. Springer.
[29] L EE , J. M. (2018). Introduction to Riemannian manifolds 176. Springer.
[30] L UO , S. and H U , W. (2020). Differentiable manifold reconstruction for point cloud denoising. In Proceedings
of the 28th ACM international conference on multimedia 1330–1338.
[31] M C I NNES , L., H EALY, J. and M ELVILLE , J. (2018). Umap: Uniform manifold approximation and projection
for dimension reduction. arXiv preprint arXiv:1802.03426.
[32] M OHAMMED , K. and NARAYANAN , H. (2017). Manifold learning using kernel density estimation and local
principal components analysis. arXiv:1709.03615.
[33] N IYOGI , P., S MALE , S. and W EINBERGER , S. (2008). Finding the homology of submanifolds with high
confidence from random samples. Discrete & Computational Geometry 39 419–441.
[34] O ZERTEM , U. and E RDOGMUS , D. (2011). Locally defined principal curves and surfaces. The Journal of
Machine Learning Research 12 1249–1286.
[35] PANARETOS , V. M., P HAM , T. and YAO , Z. (2014). Principal Flows. Journal of the American Statistical
Association 109 424-436.
[36] ROWEIS , S. T. and S AUL , L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding.
science 290 2323–2326.
[37] S OBER , B. and L EVIN , D. (2020). Manifold approximation by moving least-squares projection (MMLS).
Constructive Approximation 52 433–478.
[38] T ENENBAUM , J. B., S ILVA , V. D . and L ANGFORD , J. C. (2000). A global geometric framework for nonlinear
dimensionality reduction. science 290 2319–2323.
[39] WANG , W. and C ARREIRA -P ERPINÁN , M. A. (2010). Manifold blurring mean shift algorithms for manifold
denoising. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition
1759–1766. IEEE.
[40] W HITNEY, H. (1992). Analytic extensions of differentiable functions defined in closed sets. In Hassler
Whitney Collected Papers 228–254. Springer.
[41] YANG , T. and M ENG , J. (2021). Manifold fitting algorithm of noisy manifold data based on variable-scale
spectral graph. Soft Computing 1–12.
[42] YAO , Z. and X IA , Y. (2019). Manifold fitting under unbounded noise. arXiv:1909.10228.
[43] YAU , S.-T. (1978). On the ricci curvature of a compact kähler manifold and the complex monge-ampére
equation, I. Communications on pure and applied mathematics 31 339–411.
[44] Z HANG , Z. and Z HA , H. (2004). Principal manifolds and nonlinear dimensionality reduction via tangent
space alignment. SIAM journal on scientific computing 26 313–338.