
MANIFOLD FITTING

BY ZHIGANG YAO1,*,†, JIAJI SU1, BINGJIE LI1 AND SHING-TUNG YAU2,3

1 Department of Statistics and Data Science, National University of Singapore, 117546 Singapore

2 Department of Mathematics, Harvard University, 02138 Cambridge USA

3 Yau Mathematical Sciences Center, Jingzhai, Tsinghua University, Haidian District, Beijing, 100084 China

arXiv:2304.07680v2 [math.ST] 12 Aug 2023

While classical data analysis has addressed observations that are real numbers or elements of a real vector space, at present many statistical problems of high interest in the sciences address the analysis of data that consist of more complex objects, taking values in spaces that are naturally not (Euclidean) vector spaces but which still feature some geometric structure. Manifold fitting is a long-standing problem, and has finally been addressed in recent years by Fefferman et al. ([14, 15]). We develop a method with a theoretical guarantee that fits a d-dimensional underlying manifold from noisy observations sampled in the ambient space R^D. The new approach uses geometric structures to obtain the manifold estimator in the form of image sets via a two-step mapping approach. We prove that, under certain mild assumptions and with a sample size N = O(σ^{-(d+3)}), these estimators are true d-dimensional smooth manifolds whose estimation error, as measured by the Hausdorff distance, is bounded by O(σ^2 log(1/σ)) with high probability. Compared with the existing approaches proposed in [13, 16, 21, 42], our method exhibits superior efficiency while attaining very low error rates with a significantly reduced sample size, which scales polynomially in σ^{-1} and exponentially in d. Extensive simulations are performed to validate our theoretical results. Our findings are relevant to various fields involving high-dimensional data in machine learning. Furthermore, our method opens up new avenues for existing non-Euclidean statistical methods in the sense that it has the potential to unify them to analyze data on manifolds in the ambient space.

1. Introduction. The Whitney extension theorem, named after Hassler Whitney, is a


partial converse to Taylor’s theorem. Broadly speaking, it states that any smooth function
defined on a closed subset of a smooth manifold can be extended to a smooth function defined
on the entire manifold. This question can be traced back to H. Whitney’s work in the early
1930s ([40]), and has finally been answered in recent years by Charles Fefferman [12, 18].
The solution to the Whitney extension problem led to new insights into data interpolation and
inspired the formulation of the Geometric Whitney Problems ([14, 15]):
Problem I. Assume that we are given a set A ⊂ R^D. When can we construct a smooth d-dimensional submanifold M̂ ⊂ R^D to approximate A, and how well can M̂ estimate A in terms of distance and smoothness?
Problem II. If (A, d_A) is a metric space, when does there exist a Riemannian manifold (M̂, g_M̂) that approximates (A, d_A) well?

* Research is supported by MOE Tier 2 grant A-0008520-00-00 and Tier 1 grant A8000987-00-00 at the National University of Singapore.

ZY thanks the support from the Center of Mathematical Sciences and Applications (CMSA) at Harvard
University during his visit since 2022. ZY thanks Professor Charles Fefferman for his helpful discussions. Part of
the work has been done during the Harvard Conference on Geometry and Statistics, supported by CMSA during
Feb 27-March 1, 2023.
MSC2020 subject classifications: Primary 62R99; secondary 62A99.
Keywords and phrases: Manifold fitting, Convergence, Hausdorff distance, Reach.

To address these problems, various mathematical approaches have been proposed (see [13,
14, 15, 17, 16]). However, many of these methods rely on restrictive assumptions, making it
challenging to implement them as efficient algorithms. As the manifold hypothesis continues to
be a foundational element in statistical research, the Geometric Whitney Problems, particularly
Problem I, merit further exploration and discussion within the statistical community.
The manifold hypothesis posits that high-dimensional data typically lie close to a low-
dimensional manifold. The genesis of the manifold hypothesis stems from the observation that
numerous physical systems possess a limited number of underlying variables that determine
their behavior, even when they display intricate and diverse phenomena in high-dimensional
spaces. For instance, while the motion of a body can be expressed as high-dimensional signals,
the actual motion signals comprise a low-dimensional manifold, as they are generated by a
small number of joint angles and muscle activations. Analogous phenomena arise in diverse
areas, such as speech signals, face images, climate models, and fluid turbulence. The manifold
hypothesis is thus essential for efficient and accurate high-dimensional data analysis in fields
such as computer vision, speech analysis, and medical diagnosis.
In early statistics, one common approach for approximating high-dimensional data was to
use a lower-dimensional linear subspace. One widely used technique for identifying the linear
subspace of high-dimensional data is Principal Component Analysis (PCA). Specifically, PCA
involves computing the eigenvectors of the sample covariance matrix and then employing
these eigenvectors to map the data points onto a lower-dimensional space. One of the principal
advantages of methods like this is that they can yield a simplified representation of the data,
facilitating visualization and analysis. Nevertheless, linear subspaces can only capture linear
relationships in the data and may fail to represent non-linear patterns accurately. To address
these limitations, it is often necessary to employ more advanced manifold-learning techniques
that can better capture non-linear relationships and preserve key information in the data. These
algorithms can be grouped into three categories based on their purpose: manifold embedding,
manifold denoising, and manifold fitting. The key distinction between them is depicted in
Figure 1.

FIG 1. Illustrations for (a) manifold embedding, (b) manifold denoising, and (c) manifold fitting.

Manifold embedding, a technique that aims to find a low-dimensional representation of


high-dimensional data sets sampled near an unknown low-dimensional manifold, has gained
significant attention and contributed to the development of dimensionality reduction, visu-
alization, and clustering techniques since the beginning of the 21st century. This technique
seeks to preserve the distances between points on the manifold. Thus the Euclidean distance
between each pair of low-dimensional points is similar to the manifold distance between the
corresponding high-dimensional points. Manifold embedding tries to learn a set of points
in a low-dimensional space with a similar local or global geometric structure to the mani-
fold data. The resulting low-dimensional representation usually has better aggregation and

clearer demarcation between classes. Many scholars have performed various types of research
on manifold-embedding algorithms, such as Isometric Mapping ([38]), Locally Linear Em-
bedding ([36, 8]), Laplacian Eigenmaps ([1]), Local Tangent Space Alignment ([44]), and
Uniform Manifold Approximation Map ([31]). Although these algorithms achieve useful
representations of real-world data, few of them provide theoretical guarantees. Furthermore,
these algorithms typically do not consider the geometry of the original manifold or provide
any illustration of the smoothness of the embedding points.
Manifold denoising aims to address outliers in data sets distributed along a low-dimensional
manifold. Because of disturbances during collection, storage, and transportation, real-world
manifold-distributed data often contain noise. Manifold denoising methods are designed to
reduce the effect of noise and produce a new set of points closer to the underlying manifold.
There are two main approaches to achieving this: feature-based and expectation-based methods.
Feature-based methods extract features using techniques such as wavelet transformation
([7, 41]) or neural networks ([30]) and then drop non-informative features to recover denoised
points via inverse transformations. However, such methods are typically validated only through
simulation studies, lacking theoretical analysis. On the other hand, expectation-based methods
can achieve manifold denoising by shifting the local sample mean ([39]) or by fitting a local
mean function ([37]). However, these methods lack a solid theoretical basis or require overly
restrictive assumptions.
Manifold fitting is a crucial and challenging problem in manifold learning. It aims to
reconstruct a smooth manifold that closely approximates the geometry and topology of
a hidden low-dimensional manifold, using only a data set that lies on or near it. Unlike
manifold embedding or denoising, manifold fitting strongly emphasizes the local and global
properties of the approximation. It seeks to ensure that the generated manifold’s geometry,
particularly its curvature and smoothness, is precise. The application of manifold fitting can
significantly enhance data analysis by providing a deeper understanding of data geometry. A
key benefit of manifold fitting is its ability to uncover the shape of the hidden manifold by
projecting the samples onto the learned manifold. For example, when reproducing the three-
dimensional structure of a given protein molecule, the molecule must be photographed several times from different angles via cryo-electron microscopy (cryo-EM). Although the orientation of the molecule is equivalent to the Lie group SO(3), the cryo-EM images are often buried in high-dimensional noise because of the scale of the pixels. Manifold fitting helps recover the
underlying low-dimensional Lie group of protein-molecule images and infer the structure of
the protein from it. In a similar manner, manifold fitting can also be used for light detection
and ranging ([25]), as well as wind-direction detection ([6]). In addition, manifold fitting can
generate manifold-valued data with a specific distribution. This capability is potentially useful
in generative machine-learning models, such as Generative Adversarial Network (GAN, [22]).

1.1. Main Contribution. The main objective of this paper is to address the problem
of manifold fitting by developing a smooth manifold estimator based on a set of noisy
observations in the ambient space. Our goal is to achieve a state-of-the-art geometric error
bound while preserving the geometric properties of the manifold. To this end, we employ the
Hausdorff distance to measure the estimation error and reach to quantify the smoothness of
manifolds. Further details and definitions of these concepts are provided in Section 2.1.
Specifically, we consider a random vector Y ∈ R^D that can be expressed as

(1.1)    Y = X + ξ,

where X ∈ R^D is an unobserved random vector following a distribution ω supported on the latent manifold M, and ξ ∼ ϕ_σ represents the ambient-space observation noise, independent of X, with standard deviation σ. The distribution of Y can be viewed as the convolution of ω and ϕ_σ, whose density at a point y can be expressed as

(1.2)    ν(y) = ∫_M ϕ_σ(y − x) ω(x) dx.

Assume Y = {y_i}_{i=1}^N ⊂ R^D is the collection of observed data points, also in the form of

(1.3)    y_i = x_i + ξ_i,  for i = 1, ..., N,

with (y_i, x_i, ξ_i) being N independent and identically distributed realizations of (Y, X, ξ). Based on Y, we construct an estimator M̂ for M and provide theoretical justification for it under the following main assumptions:
• The latent manifold M is a compact and twice-differentiable d-dimensional sub-manifold,
embedded in the ambient space RD . Its volume with respect to the d-dimensional Hausdorff
measure is upper bounded by V , and its reach is lower bounded by a fixed constant τ .
• The distribution ω is a uniform distribution, with respect to the d-dimensional Hausdorff
measure, on M.
• The noise distribution ϕ_σ is a Gaussian distribution supported on R^D with density function

    ϕ_σ(ξ) = (1/(2πσ^2))^{D/2} exp(−∥ξ∥_2^2 / (2σ^2)).
• The intrinsic dimension d and noise standard deviation σ are known.
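For concreteness, the following sketch (Python/NumPy; all names and constants are ours, not part of the paper) simulates the generative model (1.1)-(1.3) under the assumptions above, with a unit circle embedded in R^D playing the role of the latent manifold M (so d = 1):

```python
import numpy as np

def sample_observations(N, D=10, sigma=0.06, seed=0):
    """Draw y_i = x_i + xi_i with x_i uniform on a unit circle embedded in R^D."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, 2.0 * np.pi, size=N)     # uniform w.r.t. arc length, d = 1
    X = np.zeros((N, D))
    X[:, 0], X[:, 1] = np.cos(theta), np.sin(theta)   # latent points on M
    xi = sigma * rng.standard_normal((N, D))          # ambient Gaussian noise, xi ~ phi_sigma
    return X + xi, X                                  # observed Y and latent X

Y, X_latent = sample_observations(N=20000)
```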
In general, M̂ is constructed by estimating the projection of points onto M. For a point y in the domain Γ = {y : d(y, M) ≤ Cσ}, we estimate its projection on M in a two-step manner: determining the direction and moving y in that direction. The estimation has both theoretical and algorithmic contributions. From the theoretical perspective:
• On the population level, given the observation distribution ν and the domain Γ, we are able to obtain a smoothly bordered set S ⊂ R^D such that the Hausdorff distance satisfies

    d_H(S, M) < cσ^2 log(1/σ).

• On the sample level, given a sample set Y, with sample size N = O(σ^{-(d+3)}) and σ being sufficiently small, we are able to obtain an estimator M̂ as a smooth d-dimensional manifold such that
  – For any point y ∈ M̂, d(y, M) is less than Cσ^2 log(1/σ);
  – For any point x ∈ M, d(x, M̂) is less than Cσ^2 log(1/σ);
  – For any two points y_1, y_2 ∈ M̂, we have ∥y_1 − y_2∥_2^2 / d(y_2, T_{y_1}M̂) ≥ cστ,
  with probability 1 − C_1 exp(−C_2 σ^{−c}), for some positive constants c, c_1, C, C_1, and C_2.

In summary, given a set of observed samples, we can provide a smooth d-dimensional manifold M̂ which is closer to M than Y by a higher order in σ. Meanwhile, the approximate reach of M̂ is no less than cστ.
In addition to its theoretical contributions, our method has practical benefits for some
applications. This paper diverges from previous literature in its motivation, as other works
often define output manifolds through the roots or ridge set of a complicated mapping f . In
contrast, we estimate the orthogonal projection onto M for each point near M. Compared
with previous manifold-fitting methods, our framework offers three notable advantages:
• Our framework yields a definitive solution to the output manifold, which can be calculated in
two simple steps without iteration. This results in greater efficiency than existing algorithms.

• Our method requires only noisy samples and does not need any information about the latent
manifold, such as its dimension, thereby broadening the applicability of our framework.
• Our framework computes the approximate projection of an observed point onto the hidden
manifold, providing a clear relationship between input and output. In comparison, pre-
vious algorithms used multiple iterative operations, making it difficult to understand the
relationship between input samples and the corresponding outputs.

1.2. Related Works. One main source of manifold fitting is the Delaunay triangulation [26] from the 1980s. Given a sample set, a Delaunay triangulation is a meshing in which no samples lie inside the circumcircle of any triangle in the triangulation. Based on this technique, the early manifold-fitting approaches [5, 2] consider dense samples without noise. In other words, the given data set constitutes an (ϵ, δ)-net of the hidden manifold. Both [5] and [2] generate a piecewise linear manifold by triangulation that is geometrically and topologically similar to the hidden manifold. However, the generated manifold is not smooth, and the assumption of noise-free, densely distributed data prevents the algorithms from being widely adopted.
In recent years, manifold fitting has been more intensively studied and developed, the
research including the accommodation of multiple types of noise and sample distributions, as
well as the smoothness of the resulting manifolds. Genovese et al. have obtained a sequence
of results from the perspective of minimax risk under Hausdorff distance ([19, 20]) with Le
Cam’s method. Their work starts from [19], where noisy sample points are also modeled as
the summation of latent random variables from the hidden manifold and additive noise, but the
noise term is assumed to be bounded and perpendicular to the manifold. The optimal minimax
estimation rate is lower bounded by O(N^{−2/(2+d)}) with properly constructed extreme cases, and upper bounded by O((log N / N)^{2/(2+d)}) with a sieve maximum likelihood estimator (MLE). Hence, they conclude that the rate is tight up to logarithmic factors, and the optimal rate of convergence is O(N^{−2/(2+d)}). This result is impressive since the rate depends only on the
intrinsic dimension d instead of the ambient dimension D . However, the noise assumption
is not realistic, and the sieve MLE is not computationally tractable. Their subsequent work
[20] considers the noiseless model, clutter noise model, and additive noise model. In the
additive model, the noise assumption is relaxed to general Gaussian distributions. They view
the distribution of samples as a convolution of a manifold-valued distribution and a distribution
of noise in ambient space, and the fitting problem is treated as a deconvolution problem. They
find a lower bound for the optimal estimation rate, O(1/log N), with the same methodology as in [19], and an upper bound that is a polynomial in 1/log N with a standard deconvolution density
estimator. Nevertheless, their output is not necessarily a manifold, and they claim that this
method requires a known noise distribution, which is also unrealistic. Meanwhile, to guarantee
a small minimax risk, the required sample size should be in exponential form, which is
unsatisfactory.
Since a consistent estimation of the manifold requires a very large sample size, Genovese
et al. avoid this difficulty by studying the ridge of the sample distribution as a proxy [21].
They begin by showing that the Hausdorff distance between the ridge of the kernel density
estimator (KDE) and the ridge of the sample density is O_P((log N / N)^{2/(D+8)}), and then prove that the ridge of the sample density is O(σ^2 log(1/σ))-close to M in the Hausdorff distance under their model. Consequently, the ridge of the KDE is shown to be an estimator with rate O_P((log N / N)^{2/(D+8)}) + O(σ^2 log(1/σ)), and they adopt the mean-shift algorithm [34] to esti-
mate it. In two similar works, [4, 32], ridge estimation is implemented by two other approaches
with convergence guarantee. While these methods yield favorable results in terms of mini-
max risk, evaluating the smoothness of their estimators presents a challenge. Despite claims
that some methods require only a small sample size, their complex algorithms may prove

impractical even for toy examples. Furthermore, the feasibility of the KDE-based algorithm
in high-dimensional cases remains unverified. As noted by [9], kernel-based methodologies
which fail to consider the intrinsic geometry of the domain may lead to sub-optimal outcomes,
such as convergence rates that are dependent on the ambient dimensionality, D , rather than
the intrinsic dimensionality, d. Although [10] introduce a local-covariance-based approach
that transforms the global manifold reconstruction problem into a local Gaussian process
regression problem, thereby facilitating interpolation of the estimated manifold between fitted
data points, their resulting output estimator is still in the form of discrete point sets.
The manifold generated with the above methods may have a very small reach, resulting
in small twists and turns that do not align with the local geometry of the hidden manifold.
To address this, some new research has aimed to ensure a lower-bounded reach of the output
manifold, such as [13], [42] and [16]. Together with [32], all four papers design smooth
mappings to capture some spatial properties and depict the output manifold as its root set or
ridge. Despite the different techniques used, all these papers provide estimators, which are
close to M and have a lower-bounded reach, with high probability. Their required sample
size depends only on σ and d, which is noteworthy and instructive. The main difference is
that [32], [13], and [42] estimate the latent manifold with accuracy O(σ), measured in terms
of Hausdorff distance, while [16] achieves a higher approximation rate O(σ^2). However, the
method in [16] requires more knowledge of the manifold, which conflicts with the noisy
observation assumption, and the restriction of sample size and the immature algorithms for
estimating the projection direction hinder the implementation of the idea. On the other hand,
obtaining a manifold defined as the ridge or root set of a function requires additional numerical
algorithms. These algorithms can be computationally expensive and affect the accuracy of the
estimate. A detailed technical comparison of these approaches is provided in Section 1.3 for
completeness.

FIG 2. A toy example to illustrate the methodologies in [32, 13, 42, 16].

1.3. Detailed review of existing fitting algorithms. This subsection presents a review of
the technical details of the previously mentioned work [32, 13, 42, 16]. These papers relax the

requirement for sample size by exploiting the geometric properties of the data points. For ease
of understanding, we introduce some common geometric notations here, while more detailed
notations can be found in Section 2.1. For a point x ∈ M, T_x M denotes the tangent space of M at x, and Π^⊥_x is the orthogonal projection matrix onto the normal space of M at x. For a point y off M, y* = arg min_{x∈M} ∥y − x∥_2 denotes the projection of y on M, and Π̂^⊥_y is the estimator of Π^⊥_{y*}. For an arbitrary matrix A, Π_hi(A) represents its projection onto the span of
the eigenvectors corresponding to the largest D − d eigenvalues. We use the notation BD (z, r)
to denote a D -dimensional ball with center z and radius r . To be consistent with the papers
subsequently referred to, we frequently use upper- and lower-case letters (such as c, c1 , c2 , C ,
C1 , and C2 ) to represent absolute constants. The upper and lower cases represent constants
greater or less than one, respectively, and their values may vary from line to line.

An early work without noise. One early work on manifold fitting is [32], which focuses only on the case of noiseless samples X = {x_i ∈ M}_{i=1}^N. To reconstruct an M̂ from X, the authors construct a function f(y) to approximate the squared distance from an arbitrary point y to M, and the ridge set of f(y) is a proper estimator of M.
As stated in [32], f(y) can be estimated by performing local Principal Component Analysis (PCA). The procedure is shown in Fig. 2(a). For an arbitrary point y close to M, its r-neighborhood index set is defined as

    I_y = {i : ∥x_i − y∥_2 ≤ r}.

For each i ∈ I_y, Π̂^⊥_{x_i} can be obtained via local PCA, and the squared distance between y and T_{x_i}M is approximated by

    f_i(y) = ∥Π̂^⊥_{x_i}(y − x_i)∥_2^2.

Then, f(y) is designed as the weighted average of the f_i(y)'s; that is,

    f(y) = Σ_{i∈I_y} α_i(y) f_i(y),

with the weights defined as

    α̃_i(y) = θ(√(f_i(y)) / (2r)),   α̃(y) = Σ_{i∈I_y} α̃_i(y),   α_i(y) = α̃_i(y) / α̃(y),

where θ(t) is an indicator-type function such that θ(t) = 1 for t ≤ 1/4 and θ(t) = 0 for t ≥ 1.
The estimator M̂ is given as the ridge set of f(y); that is,

    M̂ = {y ∈ R^D : d(y, M) ≤ cr, Π_hi(Hf(y)) ∂f(y) = 0},

where Hf(y) is the Hessian matrix of f at point y. Such an M̂ is claimed to have a reach bounded below by cr and to be O(r^2)-close to M in terms of Hausdorff distance.
Although this paper does not consider ambient-space noise and relies heavily on a well-estimated projection direction Π̂^⊥_{x_i}, the idea of approximating the distance function with projection matrices is appealing and provides a good direction for subsequent work.
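For intuition only, the following sketch (ours, not the implementation of [32]) evaluates the approximate squared-distance function f(y) described above, assuming the normal projectors Π̂^⊥_{x_i} have already been estimated, e.g., by local PCA; θ is taken here as a hard indicator for simplicity.

```python
import numpy as np

def f_squared_distance(y, X, normal_projectors, r):
    """Weighted local approximation of the squared distance from y to M, in the spirit of [32]."""
    vals, weights = [], []
    for x_i, P_i in zip(X, normal_projectors):       # P_i estimates Pi^perp at x_i
        if np.linalg.norm(y - x_i) > r:              # keep only the r-neighborhood I_y
            continue
        f_i = np.linalg.norm(P_i @ (y - x_i)) ** 2   # squared distance from y to T_{x_i} M
        vals.append(f_i)
        weights.append(1.0 if np.sqrt(f_i) / (2.0 * r) <= 0.25 else 0.0)
    weights = np.asarray(weights)
    if weights.sum() == 0.0:
        raise ValueError("no neighbor receives positive weight")
    return float(np.dot(weights / weights.sum(), np.asarray(vals)))
```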

An attempt with noise. In the follow-up work [13], noise from the ambient space is considered. Similar to [32], the main aim of [13] is to estimate the bias from an arbitrary point to the hidden manifold with local PCA. The collection of all zero-bias points can be interpreted as an estimator for M.
To construct the bias function f(y), the authors assume there is a sample set Y_0 = {y_i}_{i=1}^N, with the sample size satisfying

    N / ln(N) > CV / (ω_min β_d (r^2/τ)^d),   N ≤ e^D,

where V is the volume of M, β_d is the volume of a Euclidean unit ball in R^d, and ω_min is the lower bound of ω on M. Under such conditions, Y_0 is Cr^2/τ-close to M in Hausdorff distance with probability 1 − N^{−C}. Then, a subset Y_1 = {p_i} ⊂ Y_0 is selected greedily as a minimal cr/d-net of Y_0. For each p_i ∈ Y_1, there exists a D-dimensional ball U_i = B_D(p_i, r) and a d-dimensional ball D_i = B_d(p_i, r), where D_i can be viewed as a disc cut from U_i. In the ideal case, D_i should be parallel to T_{p_i^*}M. Thus, the authors provide a new algorithm to estimate the basis of D_i with the sample points falling in U_i. The basis of D_i leads to an estimator of Π^⊥_{p_i}, which is denoted by Π̂^⊥_{p_i}.
For y near M, let I_y = {i : ∥p_i − y∥_2 ≤ r}, and

    f_i(y) = Π̂^⊥_{p_i}(y − p_i),  for i ∈ I_y.

Then, f(y) can be constructed as

(1.4)    f(y) = Σ_{i∈I_y} α_i(y) (Π̂^⊥_y Π̂^⊥_{p_i}) (y − p_i),

with Π̂^⊥_y = Π_hi(Σ_{i∈I_y} α_i(y) Π̂^⊥_{p_i}), and the weights defined as

    α̃_i(y) = (1 − ∥y − p_i∥_2^2 / r^2)^{d+2},   α̃(y) = Σ_{i∈I_y} α̃_i(y),   α_i(y) = α̃_i(y) / α̃(y),

for y satisfying ∥y − p_i∥_2 ≤ r, and α̃_i(y) = 0 otherwise. Subsequently, there is

    M̂ = {y ∈ R^D : d(y, M) ≤ cr,  f(y) = 0}.

By setting r = O(√σ), the authors prove that M̂ is O(r^2) = O(σ)-close to M and that its reach is bounded below by cτ with probability 1 − N^{−C}. However, it is notable that the algorithm for disc-orientation estimation is not proved theoretically in the paper, and the accuracy of f(y) is limited by the successive projections Π̂^⊥_y Π̂^⊥_{p_i} and the lack of accuracy in estimating Π̂^⊥_y. Moreover, because of the limitation on the sample size N, the estimation error of the manifold has a non-zero lower bound and the practical applicability is very limited.

A better estimation for noisy data. To address the issues in [13], the authors of [42] propose an improved method that avoids the successive projections and estimates Π^⊥_{y*} better. The authors claim that fitting the manifold only requires estimating the projection direction and the local mean well, because the manifold can be viewed locally as a linear subspace, and the local sample mean is a good reference point for the hidden manifold. They assume there is a sample set Y = {y_i}_{i=1}^N. For each y_i, Π̂^⊥_{y_i} is obtained by local PCA with r = O(√σ). Then, for an arbitrary point y with I_y = {i : ∥y_i − y∥_2 ≤ r}, the bias function can be constructed as

(1.5)    f(y) = Π̂^⊥_y (y − Σ_{i∈I_y} α_i(y) y_i),

where Π̂^⊥_y = Π_hi(Σ_{i∈I_y} α_i(y) Π̂^⊥_{y_i}). The weights are defined as

    α̃_i(y) = (1 − ∥y − y_i∥_2^2 / r^2)^β,   α̃(y) = Σ_{i∈I_y} α̃_i(y),   α_i(y) = α̃_i(y) / α̃(y),

for y satisfying ∥y − y_i∥_2 ≤ r, and α̃_i(y) = 0 otherwise, with β ≥ 2 being a fixed integer which guarantees that f(y) is twice differentiable. With such a bias function, the output manifold can be given as

    M̂ = {y ∈ R^D : d(y, M) ≤ cr,  f(y) = 0},

which is shown to be O(σ)-close to M in Hausdorff distance and to have a reach no less than cτ with probability 1 − c exp(−Cr^{d+2}N). Although the theoretical error bound in [42] remains the same as that in [13], the method in [42] vastly simplifies the computational process and outperforms the previous works numerically in many cases.

The necessity of noise reduction and an attempt. Based on the results mentioned above, the error in manifold fitting can be attributed to two components: sampling bias and learning error, namely

    d_H(M, M̂) ≤ d_H(M, Y) + d_H(Y, M̂),

where Y is the generic sample set. Usually, the first term can be regarded as O(σ), as the Gaussian noise dies out within several σ, and the second term is bounded by Cr^2 for the PCA-based algorithms listed above. The optimal radius of local PCA, which balances the overall estimation error and the computational complexity, is r = O(√σ), and leads to a fitting error such that

    d_H(M, M̂) ≤ Cσ.

Since the sampling bias d_H(Y, M) = O(σ) prevents us from moving closer to M, denoising is necessary for a better M̂.
On the basis of [13], the same group of authors provides better results in [16] with refined points and a refined net. They refine the points by constructing a mesh grid on each disc D_i. As illustrated in Figure 2(d), each hyper-cylinder of the mesh is much longer in the direction perpendicular to the manifold than in the directions parallel to it. Subsequently, in each hyper-cylinder, a subset of Y_0 is selected with a complicated design, and its average is denoted by e_y. The collection of such e_y over all hyper-cylinders is denoted by Y_1, which is shown to be Cdσ^2/τ-close to M.
The authors take Y_1 as the input data set of [13] to perform subsampling and construct a new group of discs {D_i'}. With the refined points in Y_1 and the refined discs {D_i'}, the same function f(y) leads to an M̂ which is O(σ^2)-close to M and has a reach no less than cτ with probability 1 − N^{−C}.

To the best of our knowledge, the result presented in [16] constitutes a state-of-the-art
error bound for manifold fitting. However, some challenges exist in implementing the method
described in that paper:
• The refinement step for ey involves sampling directly from the latent manifold, which
contradicts the initial assumption of noisy data.
• The algorithms for refining points and determining the orientation of discs are only briefly
described and may not be directly applied to real-world data.
• The sample-size requirement is similar to that described in [13], further limiting the practical
implementation of the algorithm.

1.4. Organization. This paper is organized as follows. Section 2 presents the model set-
tings, assumptions, preliminary results, and mathematical preliminaries. Section 3 introduces
a novel contraction direction-estimation method. The workflow and theoretical results of our
local contraction methods are included in Section 4, and the output manifold is analyzed in
Section 5. Numerical studies are presented in Section 6, to demonstrate the effectiveness of
our approach. Finally, Section 7 provides a summary of the key findings and conclusions of
our study, as well as several directions for future research.

2. Proposed method. In this section, we present some necessary notations and funda-
mental concepts, then formally state our primary result regarding the fitting of a manifold.
Finally, we introduce several lemmas and propositions crucial for further elaboration.

2.1. Notations and important concepts. Throughout this paper, we use both upper- and
lower-case C to represent absolute constants. The distinction between upper and lower-case
letters represents the magnitude of the constants, with the former being greater than one and
the latter being less than one. The values of these constants may vary from line to line. In our
notation, x represents a point on the latent manifold M, y represents a point related to the
distribution ν , and z represents an arbitrary point in the ambient space. The symbol r is used
to denote the radius in some instances. Capitalized math calligraphy letters, such as M, Y ,
and BD (z, r), represent concepts related to sets. This last symbol denotes a D -dimensional
Euclidean ball with center z and radius r .
The distance between a point a and a set A is represented as d(a, A) = min_{a′∈A} ∥a − a′∥_2, where ∥·∥_2 is the Euclidean distance. To measure the distance between two sets, we adopt the Hausdorff distance, a commonly used metric for evaluating the accuracy of estimators. This distance will be used to measure the distance between the latent manifold M and its estimate M̂ throughout this paper. Formally, this metric can be defined as follows:

DEFINITION 2.1 (Hausdorff distance). Let A_1 and A_2 be two non-empty subsets of R^D. We define their Hausdorff distance d_H(A_1, A_2), induced by the Euclidean distance, as

    d_H(A_1, A_2) = max{ sup_{a∈A_1} inf_{b∈A_2} ∥a − b∥_2,  sup_{b∈A_2} inf_{a∈A_1} ∥a − b∥_2 }.

REMARK. For any A_1, A_2 ⊂ R^D, d_H(A_1, A_2) < ϵ is equivalent to the fact that, for all a ∈ A_1 and all b ∈ A_2,

    d(a, A_2) < ϵ  and  d(b, A_1) < ϵ.
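The Hausdorff distance of Definition 2.1 is straightforward to evaluate on finite point clouds; a small sketch (illustrative only, names ours) that can be used in the same way to measure estimation error:

```python
import numpy as np

def hausdorff(A, B):
    """d_H(A, B) for finite point sets given as (n, D) and (m, D) arrays."""
    dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # pairwise distances
    d_A_to_B = dists.min(axis=1).max()   # sup_{a in A} inf_{b in B} ||a - b||_2
    d_B_to_A = dists.min(axis=0).max()   # sup_{b in B} inf_{a in A} ||a - b||_2
    return max(d_A_to_B, d_B_to_A)
```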

In the context of geometry, the Hausdorff distance provides a measure of the proximity
between two manifolds. It is commonly acknowledged that a small Hausdorff distance implies
a high level of alignment between the two manifolds, with controlled discrepancies.
We also require some basic geometrical concepts related to manifolds, more of which can be found in the supplementary material. Given a point x in the manifold M, the tangent space at x, denoted by T_x M, is a d-dimensional affine space containing all the vectors tangent to M at x. To facilitate our analysis, we introduce the projection matrices Π^−_x and Π^⊥_x, which project any vector v ∈ R^D onto the tangent space T_x M and its normal space, respectively. These two projection matrices are closely related, as Π^⊥_x = I_D − Π^−_x, where I_D is the identity mapping of R^D. Furthermore, given an arbitrary point z not on M, its projection onto the manifold is defined as z* = arg min_{x∈M} ∥x − z∥_2, and we use Π̂^⊥_z and Π̂^−_z as estimators for Π^⊥_{z*} and Π^−_{z*}, respectively.
The concept of Reach, first introduced by Federer [11], plays a crucial role in measuring
the regularity of manifolds embedded in Euclidean space. Reach has proven to be a valuable
tool in various applications, including signal processing and machine learning, making it an
indispensable concept in the study of manifold models. It can be defined as follows:

DEFINITION 2.2 (Reach). Let A be a closed subset of R^D. The reach of A, denoted by reach(A), is the largest number τ with the following property: any point at distance less than τ from A has a unique nearest point in A.

REMARK. The value of reach(M) can be interpreted as a second-order differential quantity if M is treated as a function. Namely, let γ be an arc-length parameterized geodesic of M; then, according to [33], ∥γ″(t)∥_2 ≤ reach(M)^{−1} for all t.

For example, the reach of a circle is its radius, and the reach of a linear subspace is infinite.
Intuitively, a large reach implies that the manifold is locally close to the tangent space. This
phenomenon can be explained by the following lemma in [11]:

LEMMA 2.3 (Federer's reach condition). Let M be an embedded sub-manifold of R^D with reach τ. Then,

    τ^{−1} = sup{ 2 d(b, T_a M) / ∥a − b∥_2^2 : a, b ∈ M, a ≠ b }.
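As a quick numerical sanity check of Lemma 2.3 (our own illustration, not from the paper), the supremum in Federer's formula can be evaluated on a circle of radius τ in R^2, where the tangent space at a point a is the line through a orthogonal to a; every ratio equals 1/τ, so the reach is recovered exactly.

```python
import numpy as np

tau = 2.0
theta = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
pts = tau * np.stack([np.cos(theta), np.sin(theta)], axis=1)   # circle of radius tau

ratios = []
for i, a in enumerate(pts):
    normal = a / tau                              # unit normal to the circle at a
    diff = np.delete(pts, i, axis=0) - a          # b - a for all b != a
    dist_to_tangent = np.abs(diff @ normal)       # d(b, T_a M)
    ratios.append(2.0 * dist_to_tangent / (diff ** 2).sum(axis=1))

print(np.max(np.concatenate(ratios)), 1.0 / tau)  # both approximately 1/tau = reach^{-1}
```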

2.2. Overview of the main results. As stated in the introduction, the fundamental objective of this paper is to develop an estimator M̂ for the latent manifold M, using the available sample set Y. To this end, we employ a two-step procedure for each y ∈ Γ = {y : d(y, M) ≤ Cσ}, involving (i) identification of the contraction direction and (ii) estimation of the contracted point. It should be noted that contraction is distinct from projection, as the former entails movement in a single direction in the normal space.

Determining the contraction direction. To enhance the accuracy of our algorithm, we introduce a novel approach for estimating the direction of y* − y for each y, instead of estimating the basis of T_{y*}M.
On the population level, consider a D-dimensional ball B_D(y, r_0) with r_0 = Cσ. Let

    µ^B_y = E_{Y∼ν}(Y | Y ∈ B_D(y, r_0)).

Then µ^B_y − y estimates the direction of y* − y with an error upper bounded by Cσ√(log(1/σ)) (Theorem 3.1). To make the estimation continuous with respect to Y and y, we let

    F(y) = Σ_i α_i(y) y_i,

with the weights α_i given in Section 3.2. When the total sample size is N = C_1 r_0^{−d} σ^{−3}, F(y) − y estimates the direction of y* − y with an error upper bounded by Cσ√(log(1/σ)), with probability no less than 1 − C_1 exp(−C_2 σ^{−c}) (Theorem 3.5).

Estimating the contracted point. The estimation of the projection points is discussed in three distinct scenarios in Section 4, the most notable of which uses F(y) to estimate the contraction direction.
Let Ũ be the projection matrix onto the direction of µ^B_y − y. Consider a cylindrical region

    V_y = B_{D−1}(y, r_1) × B_1(y, r_2),

where the second ball is an open interval in the direction of µ^B_y − y, and the first ball lies in its orthogonal complement in R^D, with r_1 = cσ and r_2 = Cσ√(log(1/σ)). On the population level, let the contracted version of y be denoted by

    µ^V_y = y + Ũ E_{Y∼ν}(Y − y | Y ∈ V_y);

then ∥µ^V_y − y*∥_2 ≤ Cσ^2 log(1/σ) (Theorem 4.4). For the sake of continuity, we let

    Û = (F(y) − y)(F(y) − y)^T / ∥F(y) − y∥_2^2,

and construct another smooth map

    G(y) = Σ_i β_i(y) y_i,

where the weights β_i are related to Û; their definition can be found in Section 4.3. Then, the distance between G(y) and y* is upper bounded by Cσ^2 log(1/σ) with probability at least 1 − C_1 exp(−C_2 σ^{−c}) (Theorem 4.5).

Constructing the manifold estimator. In Section 5, we propose a variety of methods to construct the manifold estimator for various scenarios. We begin by considering the case where the distribution ν is known, and demonstrate that the set

    S = {µ^V_y : y ∈ Γ}

has a Hausdorff distance of O(σ^2 log(1/σ)) to M (Theorem 5.1). Next, we use the sample set Y to obtain an estimated version,

    Ŝ = {G(y) : y ∈ Γ},

which has an approximate Hausdorff distance to M of order O(σ^2 log(1/σ)) with high probability (Theorem 5.2).
Finally, we consider the scenario in which there exists a d-dimensional preliminary estimate M̃ that is O(σ)-close to M. In this case, we show that, with high probability, G(M̃) is a d-dimensional manifold having an approximate Hausdorff error of O(σ^2 log(1/σ)) and a reach no less than cσ reach(M̃) (Theorem 5.4).

2.3. Lemmas and propositions. In this subsection, we present some propositions and
lemmas for reference. Their proofs are omitted from the main content and can be found in the
supplementary material.
A notable phenomenon when analyzing the distribution in the vicinity of the manifold is
the prevalence of quantities contingent upon d rather than D . This phenomenon is particularly
evident in the subsequent lemma and its corollary.

LEMMA 2.4. For any arbitrary point z such that its neighborhood satisfies B_D(z, r) ∩ M ≠ ∅ with r = Cσ√(2d log(1/σ)), the probability that Y ∼ ν falls in B_D(z, r) is

    P(Y ∈ B_D(z, r)) = cr^d

for some small constant c.

COROLLARY 2.4.1. Let n be the number of observed points that fall in B_D(z, r). Assume the total sample size is N = CDσ^{−3} r^{−d}. Then,

    P(C_1 Dσ^{−3} ≤ n ≤ C_2 Dσ^{−3}) ≥ 1 − 2 exp(−C_3 σ^{−3}),

for some constants C_1, C_2, and C_3.

Since the Gaussian distribution effectively vanishes within a few standard deviations (σ), adopting a radius that is marginally larger than σ can result in polynomial
benefits for local estimation. For instance, when computing the conditional expectation within
a ball near the origin, we have the following proposition:

PROPOSITION 2.5. Let ξ be a D-dimensional normal random vector with mean 0 and covariance matrix σ^2 I_D. Assume there is a D-dimensional ball B_D(z, r) centered at a point z with radius r = C_1 σ√(log(1/σ)), and ∥z∥_2 = C_2 σ. Then, the truncated version of ξ satisfies

    ∥E(ξ | ξ ∈ B_D(z, r))∥_2 ≤ C_3 σ^2,

for some constants C_1, C_2, and C_3.

Analogously, it is sufficient to focus on a subset of M when studying certain local structures. For instance, in analyzing the conditional moments of ν within a D-dimensional ball B_D(z, r), the submanifold M_R = M ∩ B_D(z, R) with R ≫ r exerts a significant influence. By incorporating M_R, ν can be approximated with

    ν_R(y) = ∫_{M_R} ϕ_σ(y − x) ω(x) dx.

If we normalize them within B_D(z, r), the two densities ν̃ and ν̃_R should be close, and it is sufficient to work with ν̃_R(y) directly. These observations can be summarized as the following lemma:

LEMMA 2.6. Let ν̃(y) be the conditional density function within B_D(z, r), and ν̃_R(y) be its estimator based on M_R. By setting

    R = r + C_1 σ√((d + η) log(1/σ)),

we have

(2.1)    |ν̃(y) − ν̃_R(y)| ≤ C_2 σ^η ν̃_R(y)

for some constants C_1 and C_2.

COROLLARY 2.6.1. If (2.1) holds for all y ∈ B_D(z, r), we have

    ∥E_{ν̃} Y − E_{ν̃_R} Y∥_2 ≤ Cσ^η ∫_{B_D(z,r)} ∥y − z∥_2 ν̃_R(y) dy ≤ Crσ^η.

3. Estimation of contraction direction. This section presents a novel method for esti-
mating the contraction direction and provides an error bound. Our approach is underpinned by
the fact that, in the denoising step, the goal is to “push” a point z , which is within a distance of
∆ = Cσ to M, toward its projection on M, i.e., z ∗ . Therefore, it is sufficient to estimate the
direction of z ∗ − z instead of estimating the entire basis of Tz ∗ M. To determine this direction,
we focus on a ball BD (z, r0 ) centered at z with radius r0 = Cσ and provide population-level
and sample-level estimators.

3.1. Population level. Let the conditional expectation of ν within the ball be µ^B_z, namely

(3.1)    µ^B_z = E_{Y∼ν}(Y | Y ∈ B_D(z, r_0)).

The accuracy of the vector µ^B_z − z in estimating the direction of z* − z is reported in Theorem 3.1. The proof of this theorem is presented in the remainder of this subsection. This result demonstrates that the vector µ^B_z − z performs well in estimating the contraction direction, providing further support for its use in the denoising step. The proofs of the lemmas and propositions are omitted here and can be found in the supplementary material.

FIG 3. An illustration for estimating the contraction direction.

THEOREM 3.1. For a point z such that d(z, M) = O(σ), we can estimate the direction of z* − z with

    µ^B_z − z = E_{Y∼ν}(Y − z | Y ∈ B_D(z, r_0)).

The estimation error can be bounded as

(3.2)    sin{Θ(µ^B_z − z, z* − z)} ≤ Cσ√(log(1/σ)).

Without loss of generality, we assume that z* is the origin, T_{z*}M is the span of the first d Cartesian-coordinate directions, z* − z is the (d+1)-th direction, and the remaining directions constitute the complement in R^D. To prove Theorem 3.1, we first provide a sufficient statement for the error bound in (3.2):

PROPOSITION 3.2. Let µ^B_z = (µ^{(1)}, ..., µ^{(D)}). To show (3.2), it is sufficient to show

    |∆ − µ^{(i)}| ≥ c_1 σ,               for i = d + 1;
    |µ^{(i)}| ≤ c_2 σ^2 √(log(1/σ)),     for i ≠ d + 1.

To prove Proposition 3.2, we employ a strategy of approximating the whole manifold locally and using discs to approximate the local neighborhood of the manifold. Specifically, we use a disc D = T_{z*}M ∩ B(z, R) to approximate M_R and generalize the result to the entire manifold. The final error bound is achieved by combining the following lemmas:

LEMMA 3.3. Let ν̃_R(y) be the conditional density function within B_D(z, r_0) induced by M_R, and ν̃_D(y) be its estimator based on D. By setting R = r_0 + C_1 σ√(log(1/σ)), we have

    |ν̃_R(y) − ν̃_D(y)| ≤ C_2 σ√(log(1/σ)) ν̃_D(y)

for some constants C_1 and C_2.

LEMMA 3.4. Let the conditional expectation of Y ∼ ν̃_D within B_D(z, r) be

    µ^B_{z,D} = (µ_D^{(1)}, ..., µ_D^{(D)}).

Then,

    |∆ − µ_D^{(i)}| ≥ cσ,   for i = d + 1;
    |µ_D^{(i)}| = 0,        for i ≠ d + 1.
According to Lemma 2.6 and Lemma 3.3, by setting R = r_0 + C_1 σ√(log(1/σ)), we have

    |ν̃_R(z) − ν̃_D(z)| ≤ C_2 σ√(log(1/σ)) ν̃_D(z),

and thus the conditional expectations within B_D(z, r_0) should also be close, namely

    ∥µ^B_{z,D} − µ^B_z∥_2 ≤ Cσ^2 √(log(1/σ)),

for some constant C. Therefore, together with Lemma 3.4, the statement in Proposition 3.2 is fulfilled, and hence the proof of Theorem 3.1 is complete.

3.2. Estimation with finite samples. In practice, we typically have access only to the data point collection Y, which is sampled from the distribution ν(y). To construct an estimator for µ^B_z as defined in (3.1), a natural approach is to use the local average, defined as

    µ̃^B_z = (1/|I_z|) Σ_{i∈I_z} y_i,

where I_z is the index set of the y_i's that lie in Y ∩ B_D(z, r_0). Although µ̃^B_z converges to µ^B_z as the size of I_z goes to infinity, it is not a continuous mapping of y, because of the discontinuity introduced by changes in the neighborhood. This discontinuity can adversely affect the smoothness of M̂. To address this issue, we need a smooth version of µ̃^B_z.
Let the local weighted average at point z be

(3.3)    F(z) = Σ_i α_i(z) y_i,

with the weights defined as

(3.4)    α̃_i(z) = (1 − ∥z − y_i∥_2^2 / r_0^2)^k if ∥z − y_i∥_2 ≤ r_0, and α̃_i(z) = 0 otherwise;
         α̃(z) = Σ_{i∈I_z} α̃_i(z),   α_i(z) = α̃_i(z) / α̃(z),

with k > 2 being a fixed integer guaranteeing twice-differentiable smoothness. Similar to µ^B_z − z, the direction of F(z) − z approximates the direction from z to z* well:

THEOREM 3.5. If the sample size is N = C_1 σ^{−(d+3)}, then for a point z such that d(z, M) = O(σ), F(z) as defined in (3.3) provides an estimate of the contraction direction, whose error can be bounded by

    sin{Θ(F(z) − z, z* − z)} ≤ C_2 σ√(log(1/σ)),

with probability at least 1 − C_3 exp(−C_4 σ^{−c}), for some constants c, C_1, C_2, C_3, and C_4.
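A minimal sketch of (3.3)-(3.4) (names and constant choices are ours): given the observed array Y and a radius r_0 = Cσ, the smooth local weighted average F(z) and the resulting contraction direction can be computed as follows.

```python
import numpy as np

def F(z, Y, r0, k=3):
    """Smooth local weighted average (3.3) with weights (3.4); requires k > 2."""
    dist2 = ((Y - z) ** 2).sum(axis=1)                      # ||z - y_i||_2^2
    w = np.where(dist2 <= r0 ** 2, (1.0 - dist2 / r0 ** 2) ** k, 0.0)
    if w.sum() == 0.0:
        raise ValueError("no sample points inside B_D(z, r0)")
    return (w / w.sum()) @ Y                                # sum_i alpha_i(z) y_i

def contraction_direction(z, Y, r0, k=3):
    """Unit vector estimating the direction of z* - z (Theorem 3.5)."""
    v = F(z, Y, r0, k) - z
    return v / np.linalg.norm(v)
```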

4. Local contraction. This section presents the theoretical results of the local contraction process. Let z be a point within a distance of Cσ to M, and let V_z be a neighborhood of z. The conditional expectation of ν within V_z can be viewed as a denoised version of z, namely

    E_{Y∼ν}(Y | Y ∈ V_z).

To minimize noise and avoid distortion by the manifold, V_z should be narrow in the directions tangent to the manifold and broad in the directions perpendicular to it, like a straw inserted into a ball. Thus, determining the orientation of V_z and its scale in the two directions is crucial. In the following subsections, we analyze the population-level denoising result for three different orientation settings and provide a smooth estimator for the last case.

4.1. Contraction with known projection direction. In the simplest scenario, we assume the direction of T_{z*}M, i.e., Π^⊥_{z*}, is known. Then, V_z can be constructed as the Cartesian product of two balls. Specifically,

(4.1)    V_z = B_d(z, r_1) × B_{D−d}(z, r_2) = Π^−_{z*} B_D(z, r_1) × Π^⊥_{z*} B_D(z, r_2),

where the first ball is d-dimensional, lying in R^d = T_{z*}M, while the second one lies in the orthogonal complement of R^d in R^D with a radius r_2 ≫ r_1. Let µ^V_z be the denoised point, calculated as the conditional expectation within V_z; precisely,

(4.2)    µ^V_z = z + Π^⊥_{z*} E_{Y∼ν}(Y − z | Y ∈ V_z),

where Y is a random vector with density function ν(y). The refined point µ^V_z is much closer to M. This result can be summarized as the following theorem:

THEOREM 4.1. Consider a point z such that d(z, M) < Cσ. Let its neighborhood V_z be defined as in (4.1) with radii

    r_1 = cσ  and  r_2 = Cσ√(log(1/σ)).

The refined point µ^V_z given by (4.2) satisfies

    d(µ^V_z, M) ≤ Cσ^2 log(1/σ),

for some constant C.

PROOF. Recall that Y = X + ξ in our model setting, and Π^−_{z*} is the orthogonal projection onto T_{z*}M. If we analogously write z as

    z = z* + (z − z*) := z* + δ_z,

then µ^V_z in (4.2) can be decomposed as

(4.3)    µ^V_z = z + Π^⊥_{z*} E_{Y∼ν}(Y − z | Y ∈ V_z)
              = z* + δ_z + Π^⊥_{z*} E_ν((X + ξ) − (z* + δ_z) | Y ∈ V_z)
              = z* + Π^−_{z*} δ_z + E_ν(Π^⊥_{z*}(X − z*) | Y ∈ V_z) + E_ν(Π^⊥_{z*} ξ | Y ∈ V_z).

FIG 4. Illustration for the three parts of the error bound in (4.3). (a) δ_z, perpendicular to T_{z*}M; (b) projection of X − z*, of higher order than the length of X − z*; (c, d) projection of the noise term, in two Cartesian-coordinate systems. A large area is canceled out because of symmetry.

With such an expression, µ^V_z − z* can be decomposed into three terms. The next step is to show that the norms of these terms are upper bounded by O(σ^2 log(1/σ)). According to Lemma 2.6, to get a bound of order O(σ^2 log(1/σ)), we only need to consider a local part of M, i.e., M_R with R = Cσ√(log(1/σ)), and thus it is safe to assume ∥X − z∥_2 ≤ Cσ√(log(1/σ)) for some constant C.

(a) Π^−_{z*} δ_z: As δ_z ⊥ T_{z*}M, we have

(4.4)    Π^−_{z*} δ_z = 0.

(b) E_ν(Π^⊥_{z*}(X − z*) | Y ∈ V_z): Since z* and X are exactly on M, from Jensen's inequality and Lemma 2.3 we have

    ∥E_ν(Π^⊥_{z*}(X − z*) | Y ∈ V_z)∥_2 ≤ E_ν(∥Π^⊥_{z*}(X − z*)∥_2 | Y ∈ V_z)
                                        ≤ (1/(2τ)) E_ν(∥X − z*∥_2^2 | Y ∈ V_z),

where

    ∥X − z*∥_2^2 = ∥X − z + z − z*∥_2^2 ≤ 2∥X − z∥_2^2 + 2∥z − z*∥_2^2 ≤ Cσ^2 log(1/σ).

Hence,

(4.5)    ∥E_ν(Π^⊥_{z*}(X − z*) | Y ∈ V_z)∥_2 ≤ (C/τ) σ^2 log(1/σ).

(c) E_ν(Π^⊥_{z*} ξ | Y ∈ V_z): Because

    E_ν(Π^⊥_{z*} ξ | Y ∈ V_z) = E_ω[E_ϕ(Π^⊥_{z*} ξ | X, X + ξ ∈ V_z)],

we evaluate the inner part E_ϕ(Π^⊥_{z*} ξ | X, X + ξ ∈ V_z) first. Assume the origin is transferred to X, as illustrated in Fig. 4(d). Now,

    V_z = B_d(Π^−_{z*}(z − X), r_1) × B_{D−d}(Π^⊥_{z*}(z − X), r_2),

and there is a dislocation ∆ = ∥Π^⊥_{z*}(z − X)∥_2 in R^{D−d}, which is bounded by

    ∆ ≤ ∥Π^⊥_{z*}(z − z*)∥_2 + ∥Π^⊥_{z*}(z* − X)∥_2 ≤ Cσ.

Let ξ′ = Π^⊥_{z*} ξ; then, according to Proposition 2.5, we have

(4.6)    ∥E_ϕ(Π^⊥_{z*} ξ | X, X + ξ ∈ V_z)∥_2 = ∥E(ξ′ | ξ′ ∈ B_{D−d}(a_∆, r_2))∥_2 ≤ Cσ^2,

where a_∆ is the projection of z − X onto R^{D−d}.

Combining the results in (4.4), (4.5), and (4.6), for any z such that d(z, M) < Cσ, the corresponding µ^V_z satisfies

    ∥µ^V_z − z*∥_2 ≤ Cσ^2 log(1/σ),

for some constant C. Thus, the refined point µ^V_z is O(σ^2 log(1/σ))-close to M.

4.2. Contraction with estimated projection direction. Usually, the projection matrix is unknown, but it can be estimated via many statistical methods. Assume Π̂^⊥_{z*} is an estimator for Π^⊥_{z*}, whose bias satisfies

(4.7)    ∥Π̂^⊥_{z*} − Π^⊥_{z*}∥_F ≤ cσ^κ.

Based on this estimate, a similar region V̂_z can be defined as

(4.8)    V̂_z = B_d(z, r_1) × B_{D−d}(z, r_2),

where the first ball B_d(z, r_1) is in the span of Π̂^−_{z*} with radius r_1 = cσ, and the second one is in the span of Π̂^⊥_{z*} with radius r_2 = Cσ√(log(1/σ)). Then, an estimated version of µ^V_z can be obtained:

(4.9)    µ̂^V_z = z + Π̂^⊥_{z*} E_{Y∼ν}(Y − z | Y ∈ V̂_z),

which is still closer to M. The error bound can be summarized as the following theorem:

THEOREM 4.2. Consider a point z such that d(z, M) < Cσ. Let its neighborhood V̂_z be defined as in (4.8), and let the estimation error of Π̂^⊥_{z*} be bounded as in (4.7). The refined point µ̂^V_z given by (4.9) satisfies

    d(µ̂^V_z, M) ≤ Cσ^{1+κ} √(log(1/σ)),

for some constant C.

Such an estimator Π̂^⊥_{z*} can be obtained via classical dimension-reduction methods such as local PCA. Here we cite an error bound for local PCA estimators and instantiate the result of Theorem 4.2 in the following remark.

LEMMA 4.3 (Theorem 2.1 in [42]). For a point z such that d(z, M) < Cσ, let Π̂^⊥_z be the estimator of Π^⊥_{z*} obtained via local PCA with r = C√σ. The difference between Π̂^⊥_z and Π^⊥_{z*} is bounded by

    ∥Π̂^⊥_z − Π^⊥_{z*}∥_F ≤ C r/τ

with high probability.

REMARK. With the PCA estimator Π̂^⊥_{z*} mentioned above, the distance between µ̂^V_z and M is bounded by

    d(µ̂^V_z, M) ≤ Cσ^{3/2} √(log(1/σ))

with high probability.
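For reference, a sketch (ours) of the classical local PCA estimator invoked in Lemma 4.3: the estimated normal projector at z is spanned by the trailing D − d eigenvectors of the local sample covariance within B_D(z, r).

```python
import numpy as np

def local_pca_normal_projector(z, Y, r, d):
    """Estimate Pi^perp at z* via local PCA on the samples in B_D(z, r)."""
    nbrs = Y[np.linalg.norm(Y - z, axis=1) <= r]
    centered = nbrs - nbrs.mean(axis=0)
    cov = centered.T @ centered / len(nbrs)          # local covariance matrix
    _, eigvecs = np.linalg.eigh(cov)                 # eigenvalues in ascending order
    normal_basis = eigvecs[:, : cov.shape[0] - d]    # D - d smallest-variance directions
    return normal_basis @ normal_basis.T             # estimated projection matrix Pi_hat^perp
```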

4.3. Contraction with estimated contraction direction. In the previous two cases, we attempted to move z closer to z* in the direction of Π^⊥_{z*}. However, instead of estimating the entire projection matrix, finding an estimator of the main direction is sufficient and can be more accurate. Specifically, let the projection matrix onto z* − z be

    U = (z* − z)(z* − z)^T / ∥z* − z∥_2^2,

and, according to the discussion in Section 3, there is an estimator

    Ũ = (µ^B_z − z)(µ^B_z − z)^T / ∥µ^B_z − z∥_2^2,

whose error satisfies

(4.10)    ∥Ũ − U∥_F ≤ Cσ√(log(1/σ)).

A narrow region can be analogously constructed based on Ũ, namely

(4.11)    V̂_z = B_{D−1}(z, r_1) × B_1(z, r_2),

where the second ball is actually an interval in the direction of Ũ with r_2 = Cσ√(log(1/σ)), and the first ball is in the span of the complement of Ũ in R^D with r_1 = cσ. Similarly, z can be refined by

(4.12)    µ̂^V_z = z + Ũ E_ν(Y − z | Y ∈ V̂_z),

whose distance to M can be bounded by the following theorem.

THEOREM 4.4. Consider a point z such that d(z, M) < Cσ. Let its neighborhood V̂_z be defined as in (4.11), and let the estimation error of Ũ be bounded as in (4.10). The refined point µ̂^V_z given by (4.12) satisfies

    ∥µ̂^V_z − z*∥_2 ≤ Cσ^2 log(1/σ)

for some constant C.

FIG 5. Geometrical interpretation of u_i and v_i defined in (4.13): decomposing y_i − z into its components; u_i denotes the projection along F(z) − z, and v_i represents the orthogonal component.

For reasons similar to those discussed in Section 3.2, a smooth estimator constructed with finite samples is needed. Recall that the continuous estimator for U is

    Û = (F(z) − z)(F(z) − z)^T / ∥F(z) − z∥_2^2,

whose asymptotic property is given in Theorem 3.5. For a data point y_i, we define

(4.13)    u_i = Û(y_i − z),   v_i = y_i − z − u_i,

which can be interpreted as illustrated in Fig. 5. Let the contracted point of z be

(4.14)    G(z) = Σ_i β_i(z) y_i,

with the weights given by

(4.15)    w_u(u_i) = 1                                   if ∥u_i∥_2 ≤ r_2/2,
          w_u(u_i) = (1 − ((2∥u_i∥_2 − r_2)/r_2)^2)^k    if ∥u_i∥_2 ∈ (r_2/2, r_2),
          w_u(u_i) = 0                                   otherwise;
          w_v(v_i) = (1 − ∥v_i∥_2^2 / r_1^2)^k           if ∥v_i∥_2 ≤ r_1, and 0 otherwise;
          β̃_i(z) = w_u(u_i) w_v(v_i),   β̃(z) = Σ_i β̃_i(z),   β_i(z) = β̃_i(z) / β̃(z),

with k ≥ 2 being a fixed integer. It is clear that G is a C^2-continuous map from R^D to R^D. The estimation accuracy of G(z) is summarized in the following theorem:
The estimation accuracy of G(z) is summarized in the following theorem:

T HEOREM 4.5. If the sample size N = C1 σ −(d+3) , for a point z such that d(z, M) =
O(σ), G(z), as defined in (4.14), provides an estimation of z ∗ , whose error can be bounded
by
∥G(z) − z ∗ ∥2 ≤ C2 σ 2 log(1/σ)
with probability at least 1 − C3 exp(−C4 σ −c ), for some constant c, C1 , C2 , C3 , and C4 .
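A minimal sketch of the contraction map (4.13)-(4.15), reusing F and contraction_direction from the sketch after Theorem 3.5; the radii r_1 = cσ and r_2 = Cσ√(log(1/σ)) and all names are our own choices.

```python
import numpy as np

def G(z, Y, r0, r1, r2, k=2):
    """Contracted point (4.14) built from the cylinder weights (4.15)."""
    direction = contraction_direction(z, Y, r0)      # estimated direction of z* - z
    diff = Y - z
    u_norm = np.abs(diff @ direction)                 # ||u_i||_2, component along F(z) - z
    v_norm2 = (diff ** 2).sum(axis=1) - u_norm ** 2   # ||v_i||_2^2, orthogonal component

    w_u = np.zeros(len(Y))
    w_u[u_norm <= r2 / 2.0] = 1.0
    mid = (u_norm > r2 / 2.0) & (u_norm < r2)
    w_u[mid] = (1.0 - ((2.0 * u_norm[mid] - r2) / r2) ** 2) ** k
    w_v = np.where(v_norm2 <= r1 ** 2, (1.0 - v_norm2 / r1 ** 2) ** k, 0.0)

    beta = w_u * w_v                                  # unnormalized beta_i(z)
    if beta.sum() == 0.0:
        raise ValueError("no sample points inside the cylinder V_z")
    return (beta / beta.sum()) @ Y                    # estimate of z*
```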

5. Fit a smooth manifold. Up to this point, we have explicated the techniques for
estimating the contraction direction and executing the contraction process for points proximal
to M. In this section, we synthesize these two procedures to yield the ultimate smooth
manifold estimator. The estimator is predicated upon a tubular neighborhood of M, denoted
by Γ = {y : d(y, M) ≤ Cσ}, and manifests in two distinct incarnations, corresponding to the
population and sample levels.
On the population level, we assume the distribution ν(y) is known, so that we can calculate
all the expectations. As mentioned in the introduction, estimating ω or M with a known density
function in the form of ν = ω ∗ ϕσ is closely related to the singular deconvolution problem
discussed in [20]. In contrast to their approach, our method uses geometrical structures to
generate an estimate in the form of an image set, yielding a similar error bound. Formally, we
have:

THEOREM 5.1. Assume the density function ν(y) and the region of interest Γ are given. With the µ̂^V_y defined in (4.12), we let

    S = {µ̂^V_y : y ∈ Γ}.

Then, we have

    d_H(S, M) ≤ Cσ^2 log(1/σ)

for some constant C.

When only the sample set Y is available, the function G(y), as defined in (4.14), can be used as an estimator of µ̂^V_y. First, G(y) provides a good estimate of y* with high probability. Additionally, by definition, G(·) is a C^2-continuous mapping on R^D. Hence, similar to the population case, the image set of Γ under the mapping G also has a good approximation property. Moreover, because of the smoothness of both G and Γ, the output we obtain is also a smooth manifold. Specifically, we have the following theorem:

THEOREM 5.2. Assume the region Γ is given. With the G(y) defined in (4.14), we let

(5.1)    Ŝ = G(Γ) = {G(y) : y ∈ Γ}.

Then, Ŝ is a smooth sub-manifold in R^D, and the following claims simultaneously hold for some constant C with high probability:
• For any x ∈ M, d(x, Ŝ) ≤ Cσ^2 log(1/σ);
• For any s ∈ Ŝ, d(s, M) ≤ Cσ^2 log(1/σ).

The output manifold $\widehat{S}$ furnishes a narrow tubular neighborhood of M. By disregarding any anomalous points situated in a low-probability regime, we establish that the Hausdorff distance between $\widehat{S}$ and M scales as O(σ² log(1/σ)). To further refine the intrinsic dimension of the manifold estimator to d, we introduce a local solution in Theorem 5.3 and a global solution in Theorem 5.4.

THEOREM 5.3. For x ∈ M, let $\widehat{\Pi}_x$ be the estimate of $\Pi_x$ as defined in (1.5). Then there exists a constant c > 0 such that
$$\widehat{M}_x = \{y \in \Gamma \cap B_D(x, c\tau) : \widehat{\Pi}_x^{\perp}(G(y) - y) = 0\}$$
is a d-dimensional manifold embedded in $\mathbb{R}^D$. Meanwhile, for any point $y \in \widehat{M}_x$,
$$d(y, M) \le C\sigma^2\log(1/\sigma)$$
for some constant C with high probability.
Theorem 5.3 provides a local solution by guaranteeing that the function $\widehat{\Pi}_x^{\perp}(G(y) - y)$ has constant rank D − d over predetermined regions of interest with a fixed projection matrix. The resulting estimator is a piecewise d-dimensional manifold, which is more natural and smooth, but requires further manipulations to integrate the pieces into an entirely smooth manifold. To avoid these manipulations, we assume there is a smooth initial manifold $\widetilde{M}$ contained in Γ. Additionally, since G is a $C^2$-continuous mapping on Γ, we can assume that the Jacobian matrix of G is bounded by $L_G$ and $\ell_G$, and the Hessian matrix of G is bounded by $M_G$. Then a global estimator $\widehat{M}$ can be obtained via the following theorem:

THEOREM 5.4. Let $\widetilde{M} \subset \Gamma$ be a d-dimensional manifold with a positive reach $\tau_0$. Suppose that for each point x ∈ M, there exists a point y such that y* = x. Then, the estimator defined by $\widehat{M} = G(\widetilde{M})$ is also a d-dimensional manifold, with the following conditions holding for some constants c and C with high probability:
(I). For any point $y \in \widehat{M}$, d(y, M) is less than $C\sigma^2\log(1/\sigma)$;
(II). For any point x ∈ M, $d(x, \widehat{M})$ is less than $C\sigma^2\log(1/\sigma)$;
(III). The reach of $\widehat{M}$ is larger than a constant $\widehat{\tau} = \min\left\{c\sigma\tau_0,\; \frac{c\ell_G}{M_G + L_G}\right\}$.

Notably, the estimator defined in Theorem 5.4 requires an initial estimate $\widetilde{M}$, which can be obtained using the methods proposed in [32, 13, 42, 16]. In this paper, we also provide a well-defined strategy for reference.

PROPOSITION 5.5. Let $\widetilde{M}$ be a level set such that
$$\widetilde{M} = \{y \in \Gamma : \Pi^*(F(y) - y) = 0\},$$
where $\Pi^*$ is an arbitrary fixed projection matrix with rank D − d. Then, with high probability, $\widetilde{M}$ is a d-dimensional submanifold embedded in Γ, and $d_H(\widetilde{M}, M) \le C\sigma$.

In summary, we present two manifold estimators in the form of image sets and one in the form of a level set, all satisfying the Hausdorff-distance bound under certain statistical conditions.
conditions. Among them, the estimator proposed in Theorem 5.2 is computationally simpler
and more suitable for scenarios involving sample points, while the other estimators offer
stronger theoretical guarantees for the geometric properties. As discussed in the introduction,
prior works often employed level sets as manifold estimators, despite their inherent limitations:
the existence of solutions to f (x) = 0, where f (x) maps from RD to RD , is not always evident.
Thus the nonemptiness of the level sets is uncertain, requiring additional scrutiny. Furthermore,
this approach lacks an explicit solution, making it difficult to obtain the projection of a given
point onto Mc. Iterative solvers are necessary to approximate the projections, although their
convergence remains unproven.

6. Numerical study. This section presents a comprehensive numerical investigation of


the superior performance of our method (ysl23) in manifold fitting. The experiments are
divided into three parts, each showcasing the advantages of ysl23 from different perspectives.
• We comprehensively demonstrate ysl23’s effectiveness through various numerical visual-
izations, performance evaluations on diverse manifolds, and exploration of its asymptotic
properties. The experiments confirm that the asymptotic behavior of ysl23 aligns with the
main theorems presented in this paper as we increase the number of samples and reduce
noise. Through this, we establish the reliability and validity of ysl23.
• We compare ysl23 with three major manifold-fitting methods: yx19 [42], cf18 [13], and km17 [32], on two constant-curvature manifolds and one manifold with non-constant curvature.
Their performance is evaluated using metrics such as the Hausdorff distance, average
distance, and running time. The comparisons demonstrate that ysl23 outperforms the other
methods in terms of both accuracy and efficiency.
• We apply ysl23 to a particularly challenging class of manifolds, the Calabi–Yau manifolds [3,
43], which have a complex structure and diverse shapes. We demonstrate the effectiveness
of ysl23 by fitting Calabi–Yau manifolds and evaluating their performance by comparing
the output with the underlying Calabi–Yau manifold. Through these experiments, we show
that ysl23 can accurately fit the most complex manifolds, demonstrating its versatility and
applicability in challenging scenarios.
To ensure reproducibility, we followed a standardized setup similar to [32]. For each manifold M, the generation and evaluation of the output manifold are based on the following steps; a minimal code sketch of this pipeline is given after the list.
1. Independently generate the sample set Y with size N from the distribution ν defined in
(1.2), where σ is predefined.
2. Generate another set of initial points W = {w1 , ..., wN0 } near the underlying manifold,
satisfying σ/2 ≤ d(wi , M) ≤ 2σ .
3. Project every point in W onto the output manifold of each tested method, and denote the resulting projections of W by $\widehat{W}$.
4. Evaluate the performance of all tested methods via three measures:
– The supremum of the approximation error, $\max_j d(\widehat{w}_j, M)$, calculated as an estimate of the Hausdorff distance between M and $\widehat{M}$.
– The average approximation error, $\frac{1}{N_0}\sum_j d(\widehat{w}_j, M)$, calculated as an estimate of the average distance between $\widehat{M}$ and M.
– The CPU time of the tested method.
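The following minimal Python sketch illustrates this pipeline for the unit circle. It is only a schematic of the setup above: sampling from ν follows the "uniform on M plus ambient Gaussian noise" model, and `fit_method` is a placeholder standing in for ysl23 or any of the compared methods.

```python
import numpy as np
import time

def sample_circle(n, sigma, rng):
    """Draw n points from nu = omega * phi_sigma with M the unit circle in R^2."""
    theta = rng.uniform(0, 2 * np.pi, n)
    x = np.column_stack([np.cos(theta), np.sin(theta)])   # uniform on M
    return x + sigma * rng.standard_normal((n, 2))        # ambient Gaussian noise

def dist_to_circle(points):
    """Exact distance from points in R^2 to the unit circle."""
    return np.abs(np.linalg.norm(points, axis=1) - 1.0)

def fit_method(W, Y, sigma):
    """Placeholder: exact radial projection onto the circle;
    replace with ysl23 / yx19 / cf18 / km17 in an actual experiment."""
    return W / np.linalg.norm(W, axis=1, keepdims=True)

rng = np.random.default_rng(0)
N, N0, sigma = 50_000, 100, 0.06
Y = sample_circle(N, sigma, rng)

# Initial points W with sigma/2 <= d(w_i, M) <= 2*sigma (radial offsets).
phi = rng.uniform(0, 2 * np.pi, N0)
offset = rng.uniform(sigma / 2, 2 * sigma, N0) * rng.choice([-1.0, 1.0], N0)
W = (1.0 + offset)[:, None] * np.column_stack([np.cos(phi), np.sin(phi)])

start = time.time()
W_hat = fit_method(W, Y, sigma)
cpu_time = time.time() - start

err = dist_to_circle(W_hat)
print("Hausdorff estimate:", err.max())   # sup of the approximation error
print("Average distance  :", err.mean())  # mean approximation error
print("CPU time (s)      :", cpu_time)
```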
Implementation and code. The numerical study is conducted on a standard tower workstation with an AMD ThreadRipper 3970X @4.5 GHz and 128 GB of DDR4-3200 MHz RAM. The operating system is Windows 10 Professional 64-bit. The simulations are implemented in Matlab R2023a, which is chosen for its ability to perform parallel runs conveniently and reliably. The detailed algorithm used in this paper can be found in the supplementary material, and the latest Python and Matlab implementations are available at https://github.com/zhigang-yao/manifold-fitting.

6.1. Numerical illustrations of ysl23. Three different manifolds, including two constant-curvature manifolds (a circle embedded in $\mathbb{R}^2$ and a sphere embedded in $\mathbb{R}^3$) and a manifold with non-constant curvature, namely a torus embedded in $\mathbb{R}^3$, will be tested in this and the next subsection. A visualization of these simulated manifolds is presented in Figure 6.

FIG 6. Manifolds employed in the numerical study. Left: a unit circle in $\mathbb{R}^2$; Middle: a unit sphere in $\mathbb{R}^3$; Right: a torus in $\mathbb{R}^3$.

Algorithm 1 ysl23: Project W onto $\widetilde{M}$.
Input: Initial points W, noisy data Y, three radius parameters $r_0$, $r_1$, and $r_2$.
Output: Projection $\widehat{W}$ of W onto $\widetilde{M}$.
• For each w ∈ W:
1. Find the spherical neighborhood of w with radius $r_0$, and denote the index set of the samples in it by $I_w$.
2. Calculate the weight functions $\tilde{\alpha}_i(w)$ and $\alpha_i(w)$ for each $i \in I_w$ as in (3.4), then calculate F(w) by (3.3).
3. Find the cylindrical neighborhood as in (4.11) with radii $r_1$ and $r_2$, and denote the index set of the samples in it by $\widehat{I}_w$.
4. Calculate the weight functions $\tilde{\beta}_i(w)$ and $\beta_i(w)$ for each $i \in \widehat{I}_w$ as in (4.15), then calculate G(w) by (4.14).
5. Obtain the output point as $\widehat{w} = G(w)$.
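A minimal Python sketch of this loop is given below. The first-step weights $\alpha_i$ of (3.4) are not restated in this section, so the sketch substitutes a generic smooth bump weight on the ball $B_D(w, r_0)$ as a stand-in; the second step reuses the `contract` function sketched after (4.15).

```python
import numpy as np

def local_mean_F(w, Y, r0):
    """First-step estimate F(w): weighted mean over the spherical neighborhood B(w, r0).
    The bump weight below is only a stand-in for the weights alpha_i of (3.4)."""
    dist = np.linalg.norm(Y - w, axis=1)
    alpha = np.where(dist <= r0, (1.0 - (dist / r0) ** 2) ** 2, 0.0)
    return alpha @ Y / alpha.sum()

def ysl23(W, Y, r0, r1, r2, k=2):
    """Project each initial point w onto the estimated manifold (Algorithm 1)."""
    W_hat = np.empty_like(W)
    for j, w in enumerate(W):
        F_w = local_mean_F(w, Y, r0)                 # steps 1-2: spherical neighborhood
        W_hat[j] = contract(w, F_w, Y, r1, r2, k)    # steps 3-5: cylindrical contraction
    return W_hat
```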

6.1.1. The fundamental procedure of ysl23. Figure 7 depicts a visualization of ysl23’s


steps using the circle as the underlying manifold. There are two simple steps in obtaining the final output for a given noisy point w. First, the weighted mean over a spherical neighborhood of w is computed using (3.3), which yields F(w). This first step captures the crucial information about w, namely an approximation of its projection direction onto the underlying manifold. In the second step, the weighted mean over a cylindrical neighborhood determined by w and F(w) is calculated to obtain the final output G(w); the long axis of the cylinder is the line connecting w and F(w). Notably, ysl23 requires no iteration and no knowledge of the underlying manifold's dimension. Furthermore, ysl23 maps a noisy sample point not only approximately onto the underlying manifold but also close to its projection on the manifold, as demonstrated in panel (d) of Figure 7. In summary, the details of ysl23 can be found in Algorithm 1. We always set the radius parameters as $r_0 = r_1 = 5\sigma/\lg(N)$ and $r_2 = 10\sigma\sqrt{\log(1/\sigma)}/\lg(N)$ in our experiments.
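For reference, these parameter choices correspond to the small helper below (a sketch only; it assumes lg denotes the base-10 logarithm, which is our reading of the notation):

```python
import numpy as np

def default_radii(N, sigma):
    """Radius parameters used in the experiments of Section 6.1."""
    r0 = r1 = 5 * sigma / np.log10(N)
    r2 = 10 * sigma * np.sqrt(np.log(1.0 / sigma)) / np.log10(N)
    return r0, r1, r2
```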
FIG 7. Visualization of ysl23's steps: (a) Locating the neighborhood of a noisy observation w. (b) Computing F(w) defined in (3.3). (c) Identifying the cylindrical neighborhood (points in the black rectangle) of w based on F(w). (d) Obtaining the output point G(w) using (4.14).

FIG 8. Assessing the performance of ysl23 in fitting the circle (N = 5 × 10⁴, N0 = 100, σ = 0.06): the left panel displays points in W surrounding the underlying manifold, while the right panel illustrates the corresponding points in $\widehat{W}$.

The visualization of ysl23's performance for the circle case is shown in Figure 8, and the results for the sphere and torus cases can be found in the supplementary material. In these tests, we set N = 5 × 10⁴ and N0 = 100 for each case. The closer the points in $\widehat{W}$ are to the underlying manifold, the better the method works. As can be observed from Figure 8, the output points are
significantly closer to the hidden manifold, clearly demonstrating the efficacy of ysl23. Similar
phenomena, as shown in the supplementary material, can be observed for both sphere and
torus cases.

6.1.2. Asymptotic analysis. To investigate the asymptotic properties of ysl23, we increased


N to simulate the case where it tends to infinity and decreased σ to simulate the case where
it tends to zero. Specifically, for the circle case, we considered N ∈ {3 × 102 , 3 × 103 , 3 ×
104 , 3 × 105 }, and σ ∈ {0.12, 0.1, 0.08, 0.06, 0.04, 0.02}. We started by fixing N = 3 × 104 ,
N0 = 100, and testing the performance of ysl23 as σ varies. For each σ, we randomly selected 50 different W and executed ysl23 on each of them. The Hausdorff distances and average distances between the output manifold and the underlying manifold are shown at the top of Figure 9; both distances decrease at a quadratic
rate as σ decreases, which matches the upper bound of the error given in Section 5. We also
observe that the average distance decreases more rapidly, demonstrating the global stability
of ysl23. Similarly, we fixed σ = 0.06 to test the performance of ysl23 with the change of
N . The Hausdorff distances and average distances between the output and hidden manifolds
are shown at the bottom of Figure 9. It shows that, as N increases, the Hausdorff distances

FIG 9. The asymptotic performance of ysl23 when fitting the circle. The top two panels show how the two distances change with σ, while the bottom two panels show how the two distances change with N.

and average distances both decrease significantly. This improvement can be attributed to two
aspects. Firstly, with the increase of N , we can more accurately estimate the local geometry
of the manifold. Secondly, the radius of the neighborhood in ysl23 is set to decrease with the
increase of the sample size. Hence, the neighborhood in ysl23 becomes closer to its center
point while maintaining a sufficient number of points in the neighborhood. Similar results and
phenomena, as shown in the supplementary material, can be observed for both sphere and
torus cases.

6.2. Comparison with other manifold fitting methods. We ran ysl23, yx19, cf18, and km17 on the three aforementioned manifolds. The circle and sphere cases are presented together since both have constant curvature; the torus case is presented separately because of its non-constant curvature.

6.2.1. The fitting of the circle and sphere. We set N = N0 = 300 for the circle, and N = N0 = 1000 for the sphere. The radius of the neighborhood was set as $r = 2\sqrt{\sigma}$ for yx19, cf18, and km17. Figure 10 displays the fitting results. The black and red dots correspond to $\widehat{M}$ and M, respectively. A higher degree of overlap between these two sets indicates a better fit. The first row presents the complete space for the circle embedded in $\mathbb{R}^2$, while the second row shows the view from the positive z-axis of the sphere embedded in $\mathbb{R}^3$. Notably, km17 demonstrates inferior performance compared with the other methods. Moreover, the circles estimated by cf18 exhibit two significant gaps, suggesting inaccuracies in the estimator for some local regions. Both ysl23 and yx19 demonstrate the best performance.
We made an interesting observation: ysl23 successfully mapped the noisy samples to the proximity of the hidden manifold, but the sample distribution on the output manifold was
was slightly changed. This phenomenon occurred because the number of samples was not
sufficient to represent the perturbation of the uniform distribution on the manifold. Because

FIG 10. From left to right: the performance of ysl23, yx19, cf18, and km17 when fitting a circle (top, N = 300, σ = 0.06) and a sphere (bottom, N = 1000, σ = 0.06).

of this, our contraction strategy clustered the output points towards the denser regions on the
input points. Fortunately, when the sample size is sufficiently large, ysl23 is able to ensure
that the output points are approximately uniformly distributed on $\widehat{M}$ (see Figure 22 in the supplementary material).

FIG 11. The Hausdorff distance, average distance, and CPU time of fitting a circle (top, N = 300, σ = 0.06) and a sphere (bottom, N = 1000, σ = 0.06), using ysl23, yx19, cf18, and km17.

We repeated each method 10 times and evaluated their effectiveness in Figure 11. We
find that ysl23 and yx19 achieve slightly better results than cf18 in terms of the Hausdorff
distance, while all three outperform km17 significantly. When evaluating the average distance,
ysl23 and cf18 slightly outperform yx19, while all three show significant improvement over
km17. Overall, ysl23 consistently ranks among the top across different metrics. In terms of
computing time, ysl23 also stands out, with remarkably lower running times than those of
the other three methods. Among them, yx19 is the most efficient, while km17 lags behind
significantly.

FIG 12. The Hausdorff distance, average distance, and CPU time of fitting a circle (top, σ = 0.06) and a sphere (bottom, σ = 0.06) with increasing N, using ysl23 and yx19.

We compared ysl23 and the well-performing yx19 by incrementally varying N to explore how their performance depends on it. For the circle case, we selected N ∈ {3 × 10², 1 × 10³, 3 × 10³}, while for the sphere case, we selected N ∈ {1 × 10³, 2 × 10³, 3 × 10³}. Results
in terms of Hausdorff and average distance and running time are shown in Figure 12. The
Hausdorff distance showed a significant decrease for both algorithms as N increased. However,
yx19 remained relatively constant with increasing N when using the average distance, while
ysl23 achieved a significant reduction. Additionally, ysl23 demonstrated a clear advantage
in computational efficiency, with significantly shorter running times than yx19. For example,
yx19 took over 10 seconds to terminate when N reached 3000 in the presented examples,
while ysl23 was completed in under 0.5 seconds.

6.2.2. The fitting of the torus. We set N = 10³ for the torus case. The results, displayed
in Figure 13, show that ysl23 outperformed the other three methods in terms of the Hausdorff
distance, average distance, and computing time. To evaluate the performance of ysl23 and
yx19 on the torus, we set an increasing sample size of N ∈ {1000, 2000, 3000} and compared
their results. Figure 14 illustrates the results of both algorithms for each N . As N increased,
we observed a reduction in the distance for both algorithms. However, ysl23 consistently
achieved a much lower distance than yx19, no matter which metric is used. Furthermore, ysl23
demonstrated a remarkable advantage in computational efficiency, completing the task with a

FIG 13. The Hausdorff distance, average distance, and CPU time of fitting a torus (N = 1000, σ = 0.06), using ysl23, yx19, cf18, and km17.

significantly shorter running time than yx19. Specifically, in the presented examples, yx19
took over 10 seconds to terminate when N reached 3000, while ysl23 finished in under 0.5
seconds.
FIG 14. The Hausdorff distance, average distance, and CPU time of fitting a torus (σ = 0.06) with increasing N, using ysl23 and yx19.

6.3. Fitting of a Calabi–Yau manifold. Calabi–Yau manifolds [3] are a class of compact,
complex Kähler manifolds that possess a vanishing first Chern class. They are highly significant
because they are Ricci-flat manifolds, which means that their Ricci curvature is zero at all
points, aligning with the universe model of physicists. A simple example of a Calabi–Yau
manifold is the Fermat quartic:
$$(6.1)\qquad x^4 + y^4 + z^4 + w^4 = 0, \qquad (x, y, z, w) \in \mathbb{P}^3,$$
where $\mathbb{P}^3$ refers to the complex projective 3-space. To visualize it, we generate low-dimensional projections of the manifold by eliminating variables as in [23], dividing by $w^4$, and setting $z^4/w^4$ to be constant. We then normalize the resulting inhomogeneous equation as
$$(6.2)\qquad x^4 + y^4 = 1, \qquad x, y \in \mathbb{C}.$$
The resulting surface is embedded in 4D and can be projected to ordinary 3D space for display.
The parametric representation of (6.2) is
$$(6.3)\qquad x(\theta, \zeta, k_1) = e^{2\pi i k_1/4}\,\cosh(\theta + \zeta i)^{2/4},$$
$$(6.4)\qquad y(\theta, \zeta, k_2) = e^{2\pi i k_2/4}\,\sinh\!\left(\frac{\theta + \zeta i}{i}\right)^{2/4},$$
where the integer pair (k1 , k2 ) is selected by 0 ≤ k1 , k2 ≤ 3. Such {(x, y)} can be seen as
points in R4 , denoted by {Re(x), Re(y), Im(x), Im(y)}. A natural 3D projection is
(Re(x), Re(y), cos(ψ)Im(x) + sin(ψ)Im(y)),
where ψ is a parameter. The left panel of Figure 15 shows the surface plot of the 3D projection.

FIG 15. Performance of ysl23 when fitting the real projection of the Calabi–Yau manifold (6.1). The left panel illustrates the shape of the 3D projection. The middle panel shows some noisy points around the manifold, and the right panel shows the points on the output manifold.

We generated a set of points in (6.3) and (6.4) on a uniform grid (θ, ζ), where θ is a
sequence of numbers ranging from −1.5 to 1.5 with a step size of 0.05 between consecutive
values, and ζ a sequence of numbers ranging from 0 to π/2 with a step size of 1/640 between
consecutive values. In total, the dataset contains N = 313296 samples with Gaussian noise
added in R4 . As shown in the middle panel of Figure 15, the initial point distribution is not
close to the manifold. However, after running ysl23, the output is significantly closer to it, as
shown in the right panel of Figure 15. This phenomenon indicates that ysl23 performs well
in estimating complicated manifolds. It should be noted that we only applied ysl23 to this
example without running other algorithms because the sample size would cause very long
running times for other algorithms and would not yield usable results.
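A minimal Python sketch of this data-generation step is given below. The grid step sizes follow the description above, while the noise level and projection angle ψ are free parameters; the exact grid convention that yields the reported N = 313296 samples follows the reference implementation, so the count produced here may differ.

```python
import numpy as np

def fermat_quartic_points(psi=np.pi / 4, sigma=0.01, rng=None):
    """Sample the real 4-D embedding (Re x, Re y, Im x, Im y) of x^4 + y^4 = 1
    via the parametrization (6.3)-(6.4), add Gaussian noise, and return a 3-D projection."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.arange(-1.5, 1.5 + 1e-9, 0.05)
    zeta = np.arange(0.0, np.pi / 2, 1.0 / 640)
    T, Z = np.meshgrid(theta, zeta, indexing="ij")
    w = T + 1j * Z                                                     # theta + zeta*i

    pts = []
    for k1 in range(4):
        for k2 in range(4):
            x = np.exp(2j * np.pi * k1 / 4) * np.cosh(w) ** 0.5        # (6.3): power 2/4
            y = np.exp(2j * np.pi * k2 / 4) * np.sinh(w / 1j) ** 0.5   # (6.4)
            pts.append(np.stack([x.real, y.real, x.imag, y.imag], -1).reshape(-1, 4))
    clean = np.concatenate(pts)
    noisy = clean + sigma * rng.standard_normal(clean.shape)           # Gaussian noise in R^4

    # 3-D projection for display: (Re x, Re y, cos(psi) Im x + sin(psi) Im y).
    proj3d = np.column_stack([noisy[:, 0], noisy[:, 1],
                              np.cos(psi) * noisy[:, 2] + np.sin(psi) * noisy[:, 3]])
    return noisy, proj3d
```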

FIG 16. The asymptotic performance of ysl23 when fitting the real projection of the Calabi–Yau manifold (6.1). The two panels show how the two distances change with σ.

We also executed ysl23 with different σ . Specifically, we tested ysl23 with decreasing
σ ∈ {0.03, 0.025, 0.02, 0.015, 0.01, 0.005}. As we decrease σ , both the Hausdorff distance
and average distance decrease at a quadratic rate, which matches Theorem 5.4. These results
further support the effectiveness and reliability of ysl23.

7. Conclusion. In this paper, the manifold-fitting problem is investigated by proposing


a novel approach to construct a manifold estimator for the latent manifold in the presence
of ambient-space noise. Our estimator achieves the best error rate, to our knowledge, with a
sample size bounded by a polynomial of the standard deviation of the noise term, and preserves
the smoothness of the latent manifold. The performance of the estimator is demonstrated
through rigorous theoretical analysis and numerical experiments. Our method provides a
reliable and efficient solution to the problem of manifold fitting from noisy observations, with
potential applications in various fields, such as computer vision and machine learning.
Our approach uses a two-step local contraction strategy to obtain an output manifold with a
significantly smaller error. First, we estimate the direction of contraction for a point around M
using a local average. Compared with previous methods that estimate the basis of the tangent
space, our approach provides a significant advantage in terms of the error rate and facilitates
the obtaining of better-contracted points. Next, we construct a hyper-cylinder, and the local
average within it is regarded as the contracted point. This point is O(σ 2 log(1/σ))-close to
M. Our hyper-cylinder has a length in a higher order of σ than the width, which differs from
the approach proposed in [16]. This difference in order allows us to eschew their requirement
of directly sampling from M.
We provide several methods to obtain the estimators of M. All of these estimators can
roughly achieve a Hausdorff distance in the order of O(σ 2 log(1/σ)), with or without the
high probability statement. Unlike in previous work, we achieve the state-of-the-art error
bound by reducing the required sample size to N = O(σ −(d+3) ). Using image sets to generate
estimators, our method is faster and more applicable to larger data sets. We also conduct
comprehensive numerical experiments to validate our theoretical results and demonstrate that
our algorithm not only achieves higher approximation accuracy but also consumes significantly
less time and computational resources than other methods. These simulation results indicate
the significant superiority of our approach in fitting the latent manifold, and suggest its
potential in various applications.
Overall, our approach has demonstrated promising results in fitting smooth manifolds
from ambient space, but nevertheless has some limitations that warrant further investigation.
First, our current assumption that the observations are from the convolution of a uniform
distribution on the manifold with a homogeneous Gaussian distribution may not capture the
full complexity of real-world data. Therefore, future research could explore the effects of
relaxing these assumptions. Second, while our theoretical results are promising, there is still
scope for optimization because of the application of inequalities in the proof and the choice of
weights in the two-step mapping. This limitation arises from the lack of an explicit expression
for some integrations with respect to Gaussian distributions. We believe that further research
addressing these limitations can lead to significant advancements in manifold-fitting methods,
at both the theoretical and applied levels.
To conclude, we discuss potential avenues for further research. In the real world, data often
exist on complicated manifolds, such as spheres, tori, and shape spaces, requiring specialized
analysis methods. Our manifold-fitting algorithm projects data onto a low-dimensional mani-
fold, allowing the use of other algorithms. Firstly, our approach has wide-ranging implications
for research involving the manifold hypothesis. For example, in GAN-based image-to-image
translation, images are assumed to lie around a low-dimensional manifold. Incorporating
our manifold-fitting method can significantly enhance the performance of the discriminator
and improve the overall GAN model. Secondly, numerous statistical studies concentrate on
non-Euclidean data originating from manifolds, including the principal nested spheres [24]
and the principal flows [35]. As our method can fit smooth d-dimensional manifolds from
ambient space, it provides a natural framework for generalizing statistical work on manifolds
to ambient space. Additionally, our method can also aid in the analysis of Euclidean data
by facilitating data clustering and simplifying subsequent objectives. We believe that our
approach will inspire further research in these areas.

APPENDIX A: MATHEMATICAL PRELIMINARY


We briefly review the basic concepts of topology and smooth manifolds essential for the
study of manifold fitting; for further details, see, for example, [27, 28, 29].

A.1. Topology.

A.1.1. Topological Space. Let X be a set. A topology on X is a collection T of subsets


of X, called open subsets, satisfying the following:
(a) X and ∅ are open.
(b) The union of any family of open sets is open.
(c) The intersection of any finite family of open subsets is open.
A pair (X, T ) consisting of a set X and a topology T on X is called a topological space.
Usually, when the topology is understood, these details will be omitted, with only the statement
that "X is a topological space".
The most common examples of topological spaces, from which most of our examples of
manifolds are built, are presented below.

E XAMPLE A.1 (Metric Spaces). A metric space is a set M endowed with a distance
function (also called a metric) d : M × M → R (where R denotes the set of real numbers)
satisfying the following properties for all x, y, z ∈ M :
(a) Positivity: d(x, y) ≥ 0, with equality if and only if x = y .
(b) Symmetry: d(x, y) = d(y, x).
(c) Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).
If M is a metric space, x ∈ M , and r > 0, the open ball of radius r around x is the set
B(x, r) = {y ∈ M : d(x, y) < r}.
The metric topology on M is defined by declaring a subset S ⊆ M to be open if, for every
point x ∈ S , there is some r > 0 such that B(x, r) ⊆ S .

E XAMPLE A.2 (Euclidean Spaces). For integer n ≥ 1, the set Rn of ordered n-tuples of
real numbers is called n-dimensional Euclidean space. We let a point in Rn be denoted by
x(1) , · · · , x(n) or x. The numbers x(i) are called the i-th components or coordinates of x.
For x ∈ Rn , the Euclidean norm of x is the nonnegative real number
$$\|x\|_2 = \sqrt{\left(x^{(1)}\right)^2 + \cdots + \left(x^{(n)}\right)^2},$$
and, for x, y ∈ Rn , the Euclidean distance function is defined by
d(x, y) = ∥x − y∥2 .
This distance function turns Rn into a complete metric space. The resulting metric topology
on Rn is called the Euclidean topology.

For the purposes of manifold theory, arbitrary topological spaces are too general. To avoid
pathological situations arising when there are not enough open subsets of X , we often restrict
our attention to Hausdorff space.

D EFINITION A.3 (Hausdorff space). A topological space X is said to be a Hausdorff


space if, for every pair of distinct points p, q ∈ X , there exist disjoint open subsets U, V ⊆ X
such that p ∈ U and q ∈ V .

There are numerous essential concepts in topology concerning maps, and these will be
introduced next. Let X and Y be two topological spaces, and F : X → Y be a map between
them.
• F is continuous if, for every open subset U ⊆ Y , the preimage F −1 (U ) is open in X .
• If F is a continuous bijective map with continuous inverse, it is called a homeomorphism.
If there exists a homeomorphism from X to Y , we say that X and Y are homeomorphic.
• A continuous map F is said to be a local homeomorphism if every point p ∈ X has a
neighborhood U ⊆ X such that F (U ) is open in Y and F restricts to a homeomorphism
from U to F (U ).
• F is said to be a closed map if, for each closed subset K ⊆ X , the image set F (K) is closed
in Y , and an open map if, for each open subset U ⊆ X , the image set F (U ) is open in Y . It
is a quotient map if it is surjective and V ⊆ Y is open if and only if F −1 (V ) is open.
Furthermore, for a continuous map F , which is either open or closed, the following rules
apply:
(a) If F is surjective, it is a quotient map.
(b) If F is injective, it is a topological embedding.
(c) If F is bijective, it is a homeomorphism.
For maps between metric spaces, there are several useful variants of continuity, especially in
the case of compact spaces. Assume (M1 , d1 ) and (M2 , d2 ) are metric spaces, and F : M1 →
M2 is a map. Then, F is said to be uniformly continuous if, for every ϵ > 0, there exists δ > 0
such that, for all x, y ∈ M1 , d1 (x, y) < δ implies d2 (F (x), F (y)) < ϵ. It is said to be Lipschitz
continuous if there is a constant C such that d2 (F (x), F (y)) ≤ Cd1 (x, y) for all x, y ∈ M1 .
Any such C is called a (globally) Lipschitz constant for F . We say that F is locally Lipschitz
continuous if every point x ∈ M1 has a neighborhood on which F is Lipschitz continuous.

A.1.2. Bases and countability. Suppose X is merely a set, and B is a collection of subsets
of X satisfying the following conditions:
(a) $X = \bigcup_{B \in \mathcal{B}} B$.
(b) If B1 , B2 ∈ B and x ∈ B1 ∩ B2 , then there exists B3 ∈ B such that x ∈ B3 ⊆ B1 ∩ B2 .
Then, the collection of all unions of elements of B is a topology on X , called the topology
generated by B , and B is a basis for this topology.
A set is said to be countably infinite if it admits a bijection with the set of positive integers,
and countable if it is finite or countably infinite. A topological space X is said to be first-
countable if there is a countable neighborhood basis at each point, and second-countable if
there is a countable basis for its topology. Since a countable basis for X contains a countable
neighborhood basis at each point, second-countability implies first-countability.

A.1.3. Subspaces and Products. If X is a topological space and S ⊆ X is an arbitrary


subset, we define the subspace topology (or relative topology) on S by declaring a subset
U ⊆ S to be open in S if and only if there exists an open subset V ⊆ X such that U = V ∩ S .
A subset of S that is open or closed in the subspace topology is sometimes said to be relatively
open or relatively closed in S , to make it clear that we do not mean open or closed as a subset
of X . Any subset of X endowed with the subspace topology is said to be a subspace of X .
If X and Y are topological spaces, a continuous injective map F : X → Y is called a
topological embedding if it is a homeomorphism onto its image F (X) ⊆ Y in the subspace
topology.
If X1 , · · · , Xk are (finitely many) sets, their Cartesian product is the set X1 × · · · × Xk
consisting of all ordered k -tuples of the form (x1 , · · · , xk ) with xi ∈ Xi for each i.

Suppose X1 , · · · , Xk are topological spaces. The collection of all subsets of X1 × · · · × Xk


of the form U1 × · · · × Uk , where each Ui is open in Xi , forms a basis for a topology on
X1 × · · · × Xk , called the product topology. Endowed with this topology, a finite product of
topological spaces is called a product space. Any open subset of the form U1 × · · · × Uk ⊆
X1 × · · · × Xk , where each Ui is open in Xi , is called a product open subset.

A.1.4. Connectedness and Compactness. A topological space X is said to be disconnected


if it has two disjoint nonempty open subsets whose union is X , and it is connected otherwise.
Equivalently, X is connected if and only if the only subsets of X that are both open and closed
are ∅ and X itself. If X is any topological space, a connected subset of X is a subset that is a
connected space when endowed with the subspace topology.
Closely related to connectedness is path connectedness. If X is a topological space and
p, q ∈ X , a path in X from p to q is a continuous map f : I → X (where I = [0, 1] ) such that
f (0) = p and f (1) = q . If for every pair of points p, q ∈ X there exists a path in X from p to
q , then X is said to be path-connected.
A topological space X is said to be compact if every open cover of X has a finite subcover.
A compact subset of a topological space is one that is a compact space in the subspace topology.
For example, it is a consequence of the Heine–Borel theorem that a subset of Rn is compact if
and only if it is closed and bounded. We list some of the properties of compactness as follows.
• If F : X → Y is continuous and X is compact, then F (X) is compact.
• If X is compact and f : X → R is continuous, then f is bounded and attains its maximum
and minimum values on X .
• Any union of finitely many compact subspaces of X is compact.
• If X is Hausdorff and K and L are disjoint compact subsets of X , then there exist disjoint
open subsets U, V ⊆ X such that K ⊆ U and L ⊆ V .
• Every closed subset of a compact space is compact.
• Every compact subset of a Hausdorff space is closed.
• Every compact subset of a metric space is bounded.
• Every finite product of compact spaces is compact.
• Every quotient of a compact space is compact.

A.2. Smooth Manifold.

A.2.1. Topological Manifolds. A d-dimensional topological manifold (or simply a d-


manifold) is a second-countable Hausdorff topological space that is locally Euclidean of
dimension d, which means every point has a neighborhood homeomorphic to an open subset
of Rd . Given a d-manifold M, a coordinate chart for M is a pair (U, φ), where U ⊆ M is an
open set and $\varphi : U \to \widetilde{U}$ is a homeomorphism from U to an open subset $\widetilde{U} \subseteq \mathbb{R}^d$. If p ∈ M
and (U, φ) is a chart such that p ∈ U , we say that (U, φ) is a chart containing p.
On occasion, we may need to consider manifolds with boundaries. A d-dimensional topological manifold with boundary is a second-countable Hausdorff topological space in which every point has a neighborhood homeomorphic either to an open subset of $\mathbb{R}^d$ or to an open subset of the half space of $\mathbb{R}^d$. The corresponding concepts differ slightly from those for manifolds without boundary. For consistency, in the following sections a manifold without further qualification is always assumed to be a manifold without boundary.

A.2.2. Smooth Manifolds. Briefly speaking, smooth manifolds are topological manifolds
endowed with an extra structure that allows us to differentiate functions and maps. To introduce
the smooth structure, we first recall the smoothness of a map F : U → Rk . When U is an
open subset of $\mathbb{R}^d$, F is said to be smooth (or $C^{\infty}$) if all of its component functions have
34

continuous partial derivatives of all orders. More generally, when the domain U is an arbitrary
subset of Rd , not necessarily open, F is said to be smooth if, for each x ∈ U, F has a smooth
extension to a neighborhood of x in Rn . A diffeomorphism is a bijective smooth map whose
inverse is also smooth.
If M is a topological d-manifold, then two coordinate charts (U, φ), (V, ψ) for M are said
to be smoothly compatible if both of the transition maps ψ ◦ φ−1 and φ ◦ ψ −1 are smooth
where they are defined (on φ(U ∩ V ) and ψ(U ∩ V ), respectively). Since these maps are
inverses of each other, it follows that both transition maps are in fact diffeomorphisms. An
atlas for M is a collection of coordinate charts whose domains cover M. It is called a smooth
atlas if any two charts in the atlas are smoothly compatible. A smooth structure on M is a
smooth atlas that is maximal, which means it is not properly contained in any larger smooth
atlas. A smooth manifold is a topological manifold endowed with a specific smooth structure.
If M is a set, a smooth manifold structure on M is a second-countable, Hausdorff, locally
Euclidean topology together with a smooth structure, making it a smooth manifold. If M is
a smooth d-manifold and W ⊆ M is an open subset, then W has a natural smooth structure
consisting of all smooth charts (U, φ) for M such that U ⊆ W , and so every open subset of a
smooth d-manifold is a smooth d manifold in a natural way.
Suppose M and N are smooth manifolds. A map F : M → N is said to be smooth if,
for every p ∈ M, there exist smooth charts (U, φ) for M containing p and (V, ψ) for N
containing F (p) such that F (U ) ⊆ V and the composite map ψ ◦ F ◦ φ−1 is smooth from
φ(U ) to ψ(V ). In particular, if N is an open subset of Rk or Rk+ with its standard smooth
structure, we can take ψ to be the identity map of N , and then smoothness of F simply
means that each point of M is contained in the domain of a chart (U, φ) such that F ◦ φ−1
is smooth. It is a clear and direct consequence of the definition that identity maps, constant
maps, and compositions of smooth maps are all smooth. A map F : M → N is said to be a
diffeomorphism if it is smooth and bijective and F −1 : N → M is also smooth.
We let C ∞ (M, N ) denote the set of all smooth maps from M to N , and C ∞ (M) the
vector space of all smooth functions from M to R. For every function f : M → R or Rk , we
define the support of f , denoted by supp f , as the closure of the set {x ∈ M : f (x) ̸= 0}. If
A ⊆ M is a closed subset and U ⊆ M is an open subset containing A, then a smooth bump
function for A supported in U is a smooth function f : M → R satisfying 0 ≤ f (x) ≤ 1 for
all x ∈ M, f |A ≡ 1, and supp f ⊂ U . Such smooth bump functions always exist.
There are various equivalent approaches to define tangent vectors on M. The most con-
venient one is via the following definition: for every point p ∈ M, a tangent vector at p is a
linear map v : C ∞ (M) → R that is a derivation at p, which means that, for all f, g ∈ C ∞ (M),
v satisfies the product rule
v(f g) = f (p)vg + g(p)vf.
The set of all tangent vectors at p is denoted by Tp M and called the tangent space at p.
Suppose M is d-dimensional and $\varphi : U \to \widetilde{U} \subseteq \mathbb{R}^d$ is a smooth coordinate chart on some open subset U ⊆ M. Writing the coordinate functions of φ as $x^{(1)}, \cdots, x^{(d)}$, we define the coordinate vectors $\partial/\partial x^{(1)}\big|_p, \cdots, \partial/\partial x^{(d)}\big|_p$ by
$$\frac{\partial}{\partial x^{(i)}}\bigg|_p f = \frac{\partial}{\partial x^{(i)}}\bigg|_{\varphi(p)}\left(f \circ \varphi^{-1}\right).$$
These vectors form a basis for $T_p\mathcal{M}$, which therefore has dimension d. Thus, once a smooth coordinate chart has been chosen, every tangent vector $v \in T_p\mathcal{M}$ can be written uniquely in the form
$$v = v^{(1)}\,\frac{\partial}{\partial x^{(1)}}\bigg|_p + \cdots + v^{(d)}\,\frac{\partial}{\partial x^{(d)}}\bigg|_p.$$

If F : M → N is a smooth map and p is any point in M, we define a linear map dFp :


Tp M → TF (p) N , called the differential of F at p, with
dFp (v)f = v(f ◦ F ), v ∈ Tp M.
Once we have chosen local coordinates x(i) for M and y (j) for N , we find, by unwinding
 
the definitions, that the coordinate representation of the differential map is given by the
Jacobian matrix of the coordinate representation of F , which is its matrix of first-order partial
derivatives:
$$dF_p\left(v^{(i)}\,\frac{\partial}{\partial x^{(i)}}\bigg|_p\right) = \frac{\partial \widetilde{F}^{(j)}}{\partial x^{(i)}}(p)\; v^{(i)}\,\frac{\partial}{\partial y^{(j)}}\bigg|_{F(p)}.$$

A.2.3. Submanifolds. The theory system of submanifolds is established on the inverse


function theorem and its corollaries.

T HEOREM A.4 (Inverse Function Theorem for Manifolds, Thm. 4.5 of [28]). Suppose
M and N are smooth manifolds and F : M → N is a smooth map. If the linear map dFp is
invertible at some point p ∈ M, then there exist connected neighborhoods U0 of p and V0 of
F (p) such that F |U0 : U0 → V0 is a diffeomorphism.

The most useful consequences of the inverse function theorem concern maps of constant rank. A smooth map F : M → N is said to have constant rank if the linear map $dF_p$ has the same rank at every point p ∈ M.

T HEOREM A.5 (Rank Theorem, Thm. 4.12 of [28]). Suppose M and N are smooth
manifolds of dimensions m and n, respectively, and F : M → N is a smooth map with
constant rank r . For each p ∈ M there exist smooth charts (U, φ) for M centered at p
and (V, ψ) for N centered at F (p) such that F (U ) ⊆ V , in which F has a coordinate
representation of the form
   
$$\widetilde{F}\left(x^{(1)}, \cdots, x^{(r)}, x^{(r+1)}, \cdots, x^{(m)}\right) = \left(x^{(1)}, \cdots, x^{(r)}, 0, \cdots, 0\right).$$

The most important types of constant-rank maps are listed below. In all of these definitions,
M and N are smooth manifolds, and F : M → N is a smooth map.
• F is a submersion if its differential is surjective at each point, or equivalently if it has
constant rank equal to dim N .
• F is an immersion if its differential is injective at each point, or equivalently if it has
constant rank equal to dim M.
• F is a local diffeomorphism if every point p ∈ M has a neighborhood U such that F |U is a
diffeomorphism onto an open subset of N , or equivalently if F is both a submersion and an
immersion.
• F is a smooth embedding if it is an injective immersion that is also a topological embedding
(a homeomorphism onto its image, endowed with the subspace topology).

R EMARK (Prop. 5.5 of [28]). If M is a smooth manifold, then an embedded submanifold


N ⊆ M is properly embedded if and only if it is a closed subset of M.

Most submanifolds are presented in the following manner. Suppose Φ : M → N is any


map. Every subset of the form Φ−1 ({y}) ⊆ M for some y ∈ N is called a level set of Φ, or
the fiber of Φ over y . The simpler notation Φ−1 (y) is also used for a level set when there is no
likelihood of ambiguity. The codimension of a submanifold S ⊆ M is the difference dim M − dim S.

T HEOREM A.6 (Constant-Rank Level Set Theorem, Thm. 5.12 of [28]). Suppose M and
N are smooth manifolds, and Φ : M → N is a smooth map with constant rank r . Every level
set of Φ is a properly embedded submanifold of codimension r in M

C OROLLARY A.6.1 (Submersion Level Set Theorem, Cor. 5.13 of [28]). Suppose M and
N are smooth manifolds, and Φ : M → N is a smooth submersion. Every level set of Φ is a
properly embedded submanifold of M, whose codimension is equal to dim N .

In fact, a map does not have to be a submersion, or even to have constant rank, for its level
sets to be embedded submanifolds. If Φ : M → N is a smooth map, a point p ∈ M is called
a regular point of Φ if the linear map dΦp : Tp M → TΦ(p) N is surjective, and p is called a
critical point of Φ if it is not. A point c ∈ N is called a regular value of Φ if every point of
Φ−1 (c) is a regular point of Φ, and a critical value otherwise. A level set Φ−1 (c) is called a
regular level set of Φ if c is a regular value of Φ.

C OROLLARY A.6.2 (Regular Level Set Theorem, Cor. 5.14 of [28]). Let M and N be
smooth manifolds, and let Φ : M → N be a smooth map. Every regular level set of Φ is a
properly embedded submanifold of M whose codimension is equal to dim N .

A.3. Riemannian manifold. There are many important geometric concepts in Euclidean
space, such as length and angle, which are derived from inner product. To extend these
geometric ideas to abstract smooth manifolds, we need a structure that amounts to a smoothly
varying choice of inner product on each tangent space.
Let M be a smooth manifold. A Riemannian metric on M is a collection of inner products,
whose element at p ∈ M is an inner product $g_p : T_p\mathcal{M} \times T_p\mathcal{M} \to \mathbb{R}$ that varies smoothly with
respect to p. A Riemannian manifold is a pair (M, g), where M is a smooth manifold and
g is a specific choice of Riemannian metric on M. If M is understood to be endowed with
a specific Riemannian metric, a conventional statement often used is “M is a Riemannian
manifold.” In the following sections, we assume (M, g) is an oriented Riemannian d-manifold.
Another important construction provided by a metric on an oriented manifold is a canonical
volume form. For (M, g), there is a unique d-form dVg on M, called the Riemannian volume
form, characterized by
$$dV_g = \sqrt{\det(g_{ij})}\; dx^{(1)} \wedge \cdots \wedge dx^{(d)},$$

where the dx(i) are 1-forms from any oriented local coordinates. Here, det (gij ) is the absolute
value of the determinant of the matrix representation of the metric tensor on the manifold.
The Riemannian volume form allows us to integrate functions on an oriented Riemannian
manifold. Let f be a continuous, compactly supported real-valued function on (M, g). Then, $f\, dV_g$ is a compactly supported d-form. Therefore, the integral $\int_{\mathcal{M}} f\, dV_g$ makes sense, and we
define it as the integral of f over M. Similarly, we can define probability measures on M,
and if M is compact, the volume of M can be evaluated as
$$\mathrm{Vol}(\mathcal{M}) = \int_{\mathcal{M}} dV_g = \int_{\mathcal{M}} 1\, dV_g.$$
A curve in M usually means a parametrized curve, namely a continuous map γ : I → M,
where I ⊆ R is some interval. To say that γ is a smooth curve is to say that it is smooth as a
map from I to M . A smooth curve γ : I → M has a well-defined velocity γ ′ (t) ∈ Tγ(t) M for
each t ∈ I . We say that γ is a regular curve if γ ′ (t) ̸= 0 for t ∈ I . This implies that the image
of γ has no “corners” or “kinks.” For brevity, we refer to a piecewise regular curve segment
γ : [a, b] → M as an admissible curve, and any partition (a0 , · · · , ak ) such that γ|[ai−1 ,ai ] is
smooth for each i as an admissible partition for γ . If γ is an admissible curve, we define the
length of γ as
$$L_g(\gamma) = \int_a^b \left|\gamma'(t)\right|_g\, dt.$$
The speed of γ at any time t ∈ I is defined as the scalar |γ ′ (t)|. We say that γ is a unit-speed
curve if |γ ′ (t)| = 1 for all t, and a constant-speed curve if |γ ′ (t)| is constant. If γ : [a, b] → M
is a unit-speed admissible curve, then its arc-length function has the simple form s(t) = t − a.
For this reason, a unit-speed admissible curve whose parameter interval is of the form [0, b] is
said to be parametrized by arc-length.
For each pair of points p, q ∈ M, we define the Riemannian distance from p to q , denoted
by dM (p, q), as the infimum of the lengths of all admissible curves from p to q . When M is
connected, we say an admissible curve γ is a minimizing curve if and only if Lg (γ) is equal to
the distance between its endpoints. A unit-speed minimizing curve is also called a geodesic.
Thus, we use geodesic distance and Riemannian distance interchangeably.

T HEOREM A.7 (Existence and Uniqueness of Geodesics, Thm 4.27 of [29]). For every
p ∈ M, w ∈ Tp M, and t0 ∈ R, there exist an open interval I ⊆ R containing t0 and a
geodesic γ : I → M satisfying γ (t0 ) = p and γ ′ (t0 ) = w . Any two such geodesics agree on
their common domain.

A geodesic γ : I → M is said to be maximal if it cannot be extended to a geodesic on a


larger interval. A geodesic segment is a geodesic whose domain is a compact interval. For
each p ∈ M , the (restricted) exponential map at p, denoted by expp , is defined by
expp (v) = γv (1),
where $v \in T_p\mathcal{M}$ and $\gamma_v$ is the unique geodesic with initial location $\gamma_v(0) = p$ and initial velocity $\gamma_v'(0) = v$.
The exponential map is a diffeomorphism in a neighborhood of the tangent space. Similarly,
we define the logarithm map logp as the inverse of expp . The injectivity radius of M at p,
denoted by inj(p), is the supremum of all r > 0 such that expp is a diffeomorphism from
B(0, r) ⊆ Tp M onto its image.

A.4. Other concepts.

DEFINITION A.8 (Normal matrices). A square matrix A is normal when AA∗ =
A∗ A, where A∗ is its conjugate-transpose. This is equivalent to saying that there exists a
unitary matrix U such that U AU ∗ is diagonal (and the diagonal elements are precisely the
eigenvalues of A). Every Hermitian and every unitary matrix is normal.

DEFINITION A.9 (Trace norm). The trace norm is defined for every A by
$$\|A\|_F^2 := \mathrm{Tr}(AA^*) = \mathrm{Tr}(A^*A) = \sum_{1 \le i, j \le n} |A_{i,j}|^2.$$
This is also known as the Frobenius, Schur, or Hilbert–Schmidt norm.

DEFINITION A.10 (Principal angles). Suppose A and B are two vector spaces; we call each
$$\theta_i(A, B) = \arccos(\lambda_i(A, B))$$
the i-th principal angle between A and B, where $\lambda_i(A, B)$ is the i-th largest eigenvalue of $A^T B$. Let Θ(A, B) denote the diagonal matrix whose i-th diagonal entry is $\theta_i(A, B)$, and let sin Θ(A, B) be applied entrywise, i.e.,
$$\sin\Theta(A, B) := \mathrm{diag}\left(\sin\theta_i(A, B)\right).$$
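As a concrete illustration, the small Python sketch below computes principal angles between two subspaces of $\mathbb{R}^D$. It assumes the subspaces are represented by matrices whose columns are first orthonormalized (via QR), in which case the quantities $\lambda_i$ above coincide with the singular values of $A^T B$.

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles between span(A) and span(B); columns of A, B span each subspace."""
    Qa, _ = np.linalg.qr(A)                        # orthonormal basis of span(A)
    Qb, _ = np.linalg.qr(B)                        # orthonormal basis of span(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    s = np.clip(s, -1.0, 1.0)                      # guard against round-off outside [-1, 1]
    return np.arccos(s)                            # theta_i, in increasing order

# Example: angle between the x-axis and the diagonal line in R^2 (expect pi/4).
A = np.array([[1.0], [0.0]])
B = np.array([[1.0], [1.0]])
print(principal_angles(A, B))
```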

APPENDIX B: PROOF OMITTED FROM THE MAIN TEXT


B.1. Some useful lemmas and corollaries.

LEMMA B.1 (Chernoff bound). The generic Chernoff bound for a random variable X is obtained by applying Markov's inequality to $e^{tX}$. For every t > 0,
$$P(X \ge a) = P\left(e^{tX} \ge e^{ta}\right) \le \frac{E\left(e^{tX}\right)}{e^{ta}}.$$
Since the inequality holds for every t > 0, we have
$$P(X \ge a) \le \inf_{t > 0} \frac{E\left(e^{tX}\right)}{e^{ta}}.$$

COROLLARY B.1.1. Let $\xi \sim N(0, \sigma^2 I_D)$ be a D-dimensional normal random vector with mean 0 and covariance matrix $\sigma^2 I_D$. According to the Chernoff bound,
$$P(\|\xi\|_2 \ge t) \le \left(\frac{t^2}{D\sigma^2}\exp\left\{1 - \frac{t^2}{D\sigma^2}\right\}\right)^{D/2}$$
for $t \ge \sqrt{D}\,\sigma$.

COROLLARY B.1.2. Let n ∼ Bino(N, p) be a binomial random variable with size N and probability p. According to the Chernoff bound,
$$P\left(\frac{n}{N} \ge p + \epsilon\right) \le \exp\left\{-N D_{KL}(p + \epsilon \,\|\, p)\right\}, \qquad P\left(\frac{n}{N} \le p - \epsilon\right) \le \exp\left\{-N D_{KL}(p - \epsilon \,\|\, p)\right\},$$
for ϵ > 0, where
$$D_{KL}(a \,\|\, b) = a\log\left(\frac{a}{b}\right) + (1 - a)\log\left(\frac{1 - a}{1 - b}\right)$$
denotes the Kullback–Leibler divergence between Bernoulli distributions Be(a) and Be(b).

LEMMA B.2. Assume there is a sequence of observed points $\{y_i\}_{i=1}^n$, with a series of weights $W(y_1), \cdots, W(y_n)$. Let the local moving weighted average be
$$\widehat{\mu}_n = \frac{\sum_{i=1}^n W(y_i)\, y_i}{\sum_{i=1}^n W(y_i)}.$$
Then, if $\{y : W(y) > 0\} \subset B_D(z, r)$,
$$\sqrt{n}\,(\widehat{\mu}_n - \widehat{\mu}_w) \xrightarrow{d} N\left(0, \frac{\Sigma}{E(W)^2}\right),$$
with $\Sigma \le r^2 I_D$ and $\widehat{\mu}_w = E(WY)/E(W)$.

PROOF. According to the central limit theorem and the law of large numbers,
$$\frac{\sum_{i=1}^n w_i}{n} \xrightarrow{a.s.} E(W), \qquad \sqrt{n}\left(\frac{\sum_{i=1}^n w_i y_i}{n} - E(WY)\right) \xrightarrow{d} N(0, \Sigma),$$
where $\Sigma \le r^2 I_D$. Thus,
$$\sqrt{n}\left(\widehat{\mu}_n - \frac{E(WY)}{E(W)}\right) \xrightarrow{d} N\left(0, \frac{\Sigma}{E(W)^2}\right).$$

COROLLARY B.2.1. In the case of $n = CD\sigma^{-3}$, with σ sufficiently small,
$$P\left(\|\widehat{\mu}_n - \widehat{\mu}_w\|_2 \le c\sigma^2\right) \ge 1 - C_1\sigma^{c_1 - 1}\exp\left\{-C_2\sigma^{c_1 - 1}\right\},$$
for some constants $C_1$, $C_2$, and any $c_1 \in (0, 1)$.

PROOF. According to Corollary B.1.1, when σ is sufficiently small,
$$P\left(\|\widehat{\mu}_n - \widehat{\mu}_w\|_2 \le c\sigma^2\right) \ge P\left(\frac{r}{\sqrt{n}}\sqrt{\chi} \le c\sigma^2\right) \ge 1 - \left(\frac{c\, n\sigma^2}{D\log(1/\sigma)}\exp\left\{1 - \frac{c\, n\sigma^2}{D\log(1/\sigma)}\right\}\right)^{D/2},$$
for $n \ge cD\sigma^{-2}\log(1/\sigma)$. Thus, in the case of $n = CD\sigma^{-3}$, the probability is close to 1.

B.2. Proof of content in Section 2.

B.2.1. Proof of Lemma 2.4.

P ROOF. Recall that, in our model, Y = X + ξ , with X ∼ ω(M) and ξ ∼ N (0, σ 2 ID ). We


first check the Chernoff bound for the noise term, which is
$$\begin{aligned} P(\|\xi\|_2 \ge c_1 r) &\le \left(\frac{c_1^2 r^2}{D\sigma^2}\exp\left\{1 - \frac{c_1^2 r^2}{D\sigma^2}\right\}\right)^{D/2} \\ &= \left(\frac{c_1^2 C^2 2d}{D}\log(1/\sigma)\exp\left\{1 - \frac{c_1^2 C^2 2d}{D}\log(1/\sigma)\right\}\right)^{D/2} \\ &= c_2\left(\log(1/\sigma)\right)^{D/2}\sigma^{c_1^2 C^2 d} \le c_2 r^d, \end{aligned}$$
where the first inequality comes from the Chernoff bound, while the last one occurs because σ
is sufficiently small.
Then, for P(Y ∈ BD (z, r)), on one hand,
$$P(Y \in B_D(z, r)) \ge P(\|\xi\|_2 \le c_1 r)\, P\left(X \in \mathcal{M} \cap B_D(z, (1 - c_1)r)\right) \ge (1 - c_2 r^d)\,\frac{\mathrm{vol}\left(\mathcal{M} \cap B_D(z, (1 - c_1)r)\right)}{\mathrm{vol}(\mathcal{M})} \ge c_3 r^d.$$
On the other hand,
$$\begin{aligned} P(Y \in B_D(z, r)) &= P\left(X \in \mathcal{M} \cap B_D(z, C_2 r), \|Y - z\|_2 \le r\right) + P\left(X \notin \mathcal{M} \cap B_D(z, C_2 r), \|Y - z\|_2 \le r\right) \\ &\le P\left(X \in \mathcal{M} \cap B_D(z, C_2 r)\right) + P\left(\|\xi\|_2 \ge (C_2 - 1)r\right) \\ &\le \frac{\mathrm{vol}\left(\mathcal{M} \cap B_D(z, C_2 r)\right)}{\mathrm{vol}(\mathcal{M})} + c_4 r^d \le c_5 r^d. \end{aligned}$$

Therefore, P(Y ∈ BD (z, r)) = crd for some constant c.

B.2.2. Proof of Corollary 2.4.1.

P ROOF. The number of points n can be viewed as a binomial random variable with size N
and probability parameter p = crd . For any c1 ∈ (0, 1), according to Corollary B.1.2,
$$P\left(\frac{n}{N} \le (1 - c_1)p\right) \le \exp\left\{-N(1 - c_1)p\log(1 - c_1)\right\}\exp\left\{-N\left(1 - (1 - c_1)p\right)\log\left(\frac{1 - (1 - c_1)p}{1 - p}\right)\right\} \le \exp\left\{-C_1\sigma^{-3}\right\},$$
$$P\left(\frac{n}{N} \ge (1 + c_1)p\right) \le \exp\left\{-N(1 + c_1)p\log(1 + c_1)\right\}\exp\left\{-N\left(1 - (1 + c_1)p\right)\log\left(\frac{1 - (1 + c_1)p}{1 - p}\right)\right\} \le \exp\left\{-C_2\sigma^{-3}\right\}.$$


Therefore,
P(C3 Dσ −3 ≤ n ≤ C4 Dσ −3 ) ≥ 1 − 2 exp −C5 σ −3 .


When σ is sufficiently small, the probability will be close to 1.

FIG 17. Illustration of the integral region in the proof of Proposition 2.5: (a) The region of calculating the conditional expectation $E(\xi \mid \xi \in B_D(\Delta U, r))$, where the two shaded parts cancel each other out; (b) Three multidimensional cubes designed for bounding the expectation.

B.2.3. Proof of Proposition 2.5.

P ROOF. Without loss of generality, we adjust the Cartesian-coordinate system such that
z = (∆, 0, · · · , 0) and ξ = (ξ (1) , · · · , ξ (D) ), with ∆ = ∥z∥2 ≤ C1 σ for some constant C1 . As
illustrated in Fig.17, a large part of the calculating region is canceled, and the expectations
can be bounded through three integrations on multidimensional cubes. That is,
$$V_1 = [\Delta - r, \Delta + r] \times [-r, r]^{D-1}, \qquad V_2 = \left[\frac{\Delta - r}{\sqrt{D}}, \frac{r - \Delta}{\sqrt{D}}\right]^D, \qquad V_3 = \left[\Delta - \frac{r}{\sqrt{D}}, \Delta + \frac{r}{\sqrt{D}}\right] \times \left[-\frac{r}{\sqrt{D}}, \frac{r}{\sqrt{D}}\right]^{D-1}.$$
To bound the distance between E(ξ|ξ ∈ BD (z, r)) and the origin, let
$$I_1 = \int_{V_1} |\xi^{(1)}|\,(2\pi\sigma^2)^{-\frac{D}{2}}\exp\left\{-\frac{\|\xi\|_2^2}{2\sigma^2}\right\}d\xi, \quad I_2 = \int_{V_2} |\xi^{(1)}|\,(2\pi\sigma^2)^{-\frac{D}{2}}\exp\left\{-\frac{\|\xi\|_2^2}{2\sigma^2}\right\}d\xi, \quad I_3 = \int_{V_3} (2\pi\sigma^2)^{-\frac{D}{2}}\exp\left\{-\frac{\|\xi\|_2^2}{2\sigma^2}\right\}d\xi.$$
Because of the symmetry,
$$\|E(\xi \mid \xi \in B_D(z, r))\|_2 = \frac{\left|\int_{B_D(z,r)} \xi^{(1)}\,(2\pi\sigma^2)^{-\frac{D}{2}}\exp\left\{-\frac{\|\xi\|_2^2}{2\sigma^2}\right\}d\xi\right|}{\int_{B_D(z,r)} (2\pi\sigma^2)^{-\frac{D}{2}}\exp\left\{-\frac{\|\xi\|_2^2}{2\sigma^2}\right\}d\xi} \le \frac{I_1 - I_2}{I_3}.$$
For simplicity, denote
$$2^{D-1}(2\pi\sigma^2)^{\frac{D}{2}} I_1 = \int_{\Delta - r}^{\Delta + r} |t|\exp\left\{-\frac{t^2}{2\sigma^2}\right\}dt\left(\int_0^{r} \exp\left\{-\frac{s^2}{2\sigma^2}\right\}ds\right)^{D-1} := I_1^+ + I_1^-,$$
$$2^{D-1}(2\pi\sigma^2)^{\frac{D}{2}} I_2 = \int_{\frac{\Delta - r}{\sqrt{D}}}^{\frac{r - \Delta}{\sqrt{D}}} |t|\exp\left\{-\frac{t^2}{2\sigma^2}\right\}dt\left(\int_0^{\frac{r - \Delta}{\sqrt{D}}} \exp\left\{-\frac{s^2}{2\sigma^2}\right\}ds\right)^{D-1} := I_2^+ + I_2^-.$$
Meanwhile, let
$$a = \int_0^{\frac{r - \Delta}{\sqrt{D}}} t\exp\left\{-\frac{t^2}{2\sigma^2}\right\}dt, \qquad \delta_a = \int_{\frac{r - \Delta}{\sqrt{D}}}^{r + \Delta} t\exp\left\{-\frac{t^2}{2\sigma^2}\right\}dt,$$
$$b = \int_0^{\frac{r - \Delta}{\sqrt{D}}} \exp\left\{-\frac{s^2}{2\sigma^2}\right\}ds, \qquad \delta_b = \int_{\frac{r - \Delta}{\sqrt{D}}}^{r} \exp\left\{-\frac{s^2}{2\sigma^2}\right\}ds.$$
Then, there is
$$a = \sigma^2\left(1 - \exp\left\{-\frac{Cr^2}{2\sigma^2}\right\}\right) < \sigma^2, \qquad b = \sigma\left(\Phi(C) - \Phi(0)\right) < C\sigma,$$
$$\delta_a = \sigma^2\left(\exp\left\{-\frac{\left(\frac{r - \Delta}{\sqrt{D}}\right)^2}{2\sigma^2}\right\} - \exp\left\{-\frac{(r + \Delta)^2}{2\sigma^2}\right\}\right) < C\sigma^3,$$
$$\delta_b < \left(r - \frac{r - \Delta}{\sqrt{D}}\right)\exp\left\{-\frac{\left(\frac{r - \Delta}{\sqrt{D}}\right)^2}{2\sigma^2}\right\} = C\sigma^2\log(1/\sigma) < C\sigma.$$
Furthermore, we can obtain
$$\begin{aligned} I_1^+ - I_2^+ &= (a + \delta_a)(b + \delta_b)^{D-1} - ab^{D-1} \\ &= a(b + \delta_b)^{D-1} - ab^{D-1} + \delta_a(b + \delta_b)^{D-1} \\ &= \delta_a(b + \delta_b)^{D-1} + a\left((b + \delta_b)^{D-1} - b^{D-1}\right) \\ &< C\sigma^{D+2} + a\,\delta_b\left((b + \delta_b)^{D-2} + (b + \delta_b)^{D-3}b + \cdots + (b + \delta_b)b^{D-3} + b^{D-2}\right) \\ &< C\sigma^{D+2}. \end{aligned}$$
Thus, $I_1 - I_2 < C\sigma^{-D}(I_1^+ - I_2^+) = C\sigma^2$. Additionally, it is clear that $I_3 > C$, and hence,
$$\|E(\xi \mid \xi \in B_D(z, r))\|_2 \le C\sigma^2.$$

B.2.4. Proof of Lemma 2.6.


PROOF. Let $\nu_c(y) = \int_{\mathcal{M}\setminus\mathcal{M}_R} \phi_\sigma(y - x)\,\omega(x)\,dx$; then, according to the model setting, $\nu_c(y) = \nu(y) - \nu_R(y) \ge 0$.
The probability measure within BD (z, r) is proportional to ν(y), which can be expressed as
$$\tilde{\nu}(y) = \frac{\nu(y)}{\int_{B_D(z,r)} \nu(y)\,dy} = \frac{\nu_R(y) + \nu_c(y)}{\int_{B_D(z,r)} \nu_R(y)\,dy + \int_{B_D(z,r)} \nu_c(y)\,dy}$$
for y ∈ BD (z, r). Then, the relative difference between ν̃(y) and ν̃R (y) can be evaluated as
$$|\tilde{\nu}(y) - \tilde{\nu}_R(y)| = \left|\frac{\nu_R(y) + \nu_c(y)}{\int_{B_D(z,r)} \nu_R(y)\,dy + \int_{B_D(z,r)} \nu_c(y)\,dy} - \frac{\nu_R(y)}{\int_{B_D(z,r)} \nu_R(y)\,dy}\right| \le \left|\frac{\nu_c(y)}{\nu_R(y)} - \frac{\int_{B_D(z,r)} \nu_c(y)\,dy}{\int_{B_D(z,r)} \nu_R(y)\,dy}\right|\,\tilde{\nu}_R(y).$$

Let $R = r + C_1\sigma\sqrt{(d + \eta)\log(1/\sigma)}$ and $R' = r + C_1\sigma < R$; then,
$$\frac{\nu_c(y)}{\nu_R(y)} = \frac{\int_{\mathcal{M}\setminus\mathcal{M}_R} \phi_\sigma(y - x)\,\omega(x)\,dx}{\int_{\mathcal{M}_R} \phi_\sigma(y - x)\,\omega(x)\,dx} \le \frac{\phi_\sigma(R - r)\,\omega(\mathcal{M})}{\phi_\sigma(R' - r)\,\omega(\mathcal{M} \cap B_D(y, R'))} \le C\,\frac{\sigma^{d+\eta}\, V}{\mathrm{vol}(\mathcal{M} \cap B_D(y, R'))} \le C\,\frac{\sigma^{d+\eta}}{(R')^d}.$$

Therefore,
$$|\tilde{\nu}(y) - \tilde{\nu}_R(y)| \le C\sigma^{\eta}\,\tilde{\nu}_R(y).$$

B.3. Proof of content in Section 3.

B.3.1. Proof of Proposition 3.2.

PROOF. Recall that z* is the origin and z − z* is the (d+1)-th direction of the Cartesian coordinate system. Then,
$$\mu_z^B = (\mu^{(1)}, \cdots, \mu^{(d)}, \mu^{(d+1)}, \mu^{(d+2)}, \cdots, \mu^{(D)}),\qquad z = (0, \cdots, 0, \Delta, 0, \cdots, 0),$$
where ∆ = ∥z − z*∥ ≤ cσ. The angle between $\mu_z^B - z$ and z* − z can be represented by its sine as follows:
$$\begin{aligned} \sin^2\big(\Theta(\mu_z^B - z,\, z^* - z)\big) &= 1 - \cos^2\big(\Theta(\mu_z^B - z,\, z^* - z)\big) = 1 - \left(\frac{(\mu_z^B - z)\cdot(z^* - z)}{\|\mu_z^B - z\|_2\,\|z^* - z\|_2}\right)^2\\ &= \frac{\sum_{i\neq d+1}(\mu^{(i)})^2}{\sum_{i\neq d+1}(\mu^{(i)})^2 + (\mu^{(d+1)} - \Delta)^2}. \end{aligned}$$

If
$$(\mathrm{B.1})\qquad |\mu^{(d+1)} - \Delta| \geq c_1\sigma, \qquad\text{and}\qquad |\mu^{(i)}| \leq c_2\sigma^2\sqrt{\log(1/\sigma)} \ \text{ for } i \neq d+1,$$
then
$$\sin^2\big(\Theta(\mu_z^B - z,\, z^* - z)\big) \leq \frac{(D-1)c_2^2\sigma^4\log(1/\sigma)}{(D-1)c_2^2\sigma^4\log(1/\sigma) + c_1^2\sigma^2} \leq C\sigma^2\log(1/\sigma)$$
for some constant C. In other words, condition (B.1) is sufficient for $\sin(\Theta(\mu_z^B - z, z^* - z)) \leq C\sigma\sqrt{\log(1/\sigma)}$.

B.3.2. Proof of Lemma 3.3.

P ROOF. To prove Lemma 3.3, the following propositions are needed.

PROPOSITION B.3. Assume there is a mapping Ψ : D → M_R that satisfies, for any point z = (z₁, · · · , z_d, 0, · · · , 0) ∈ D,
$$\Psi(z) = (z_1, \cdots, z_d, \psi(z_1, \cdots, z_d)).$$

PROPOSITION B.4. Since the tangent space approximates the manifold locally up to a quadratic error,
$$d\Psi(z) = (1 + g(z))\,dz \quad\text{with}\quad |g(z)| < C\|z\|_2.$$

Let δ_z = Ψ(z) − z for z ∈ D. Then,
$$\begin{aligned} \nu_R(y) &= \int_{M_R} \phi_\sigma(y - x)\,\omega(x)\,dx = \int_{M_R} \phi_\sigma(y - \Psi(z))\,\omega(\Psi(z))\,d\Psi(z)\\ &= \frac{1}{V}\int_{D} \phi_\sigma(y - z - \delta_z)(1 + g(z))\,dz\\ &= \frac{1}{V}\int_{D} \phi_\sigma(y - z - \delta_z)\,dz + \frac{1}{V}\int_{D} \phi_\sigma(y - z - \delta_z)\,g(z)\,dz. \end{aligned}$$
The difference between ν_D(y) and the first term of ν_R(y) can be expressed as follows:
$$\begin{aligned} \nu_D(y) - \frac{1}{V}\int_{D} \phi_\sigma(y-z-\delta_z)\,dz &= \frac{1}{V}\int_{D} \big[\phi_\sigma(y-z) - \phi_\sigma(y-z-\delta_z)\big]\,dz\\ &\leq \frac{1}{V}\int_{D} \phi_\sigma(y-z)\left(1 - \exp\Big\{-\frac{\|y-z-\delta_z\|_2^2 - \|y-z\|_2^2}{2\sigma^2}\Big\}\right)dz\\ &\leq \frac{1}{V}\int_{D} \phi_\sigma(y-z)\,\frac{\|y-z-\delta_z\|_2^2 - \|y-z\|_2^2}{2\sigma^2}\,dz\\ &\leq \frac{1}{V}\int_{D} \phi_\sigma(y-z)\,\frac{\|\delta_z\|_2^2}{2\sigma^2}\,dz \leq C\sigma\,\nu_D(y), \end{aligned}$$
for some constant C. Moreover, the second term of ν_R(y) is of higher order:
$$\frac{1}{V}\int_{D} \phi_\sigma(y-z-\delta_z)\,g(z)\,dz \leq \frac{C}{V}\int_{D} \phi_\sigma(y-z-\delta_z)\,\|z\|_2\,dz \leq C\sigma\sqrt{\log(1/\sigma)}\,\nu_D(y).$$

B.3.3. Proof of Lemma 3.4.

PROOF. Assume the manifold can be regarded locally as D = T_{z*}M ∩ B_D(y, R) with R = r + Cσ√(log(1/σ)) ≫ r. We still investigate the conditional expectation within B_D(z, r), where we use ν_D(y) to denote the density function of y and ν̃_D(y) to denote its normalized version within B_D(z, r). Similarly, we let z* be the origin, z − z* be the (d+1)-th direction, and $\mu_z^B = (\mu^{(1)}, \cdots, \mu^{(D)})$. Then, for the i-th direction,
$$\mu^{(i)} = \int_{B_D(z,r)} y^{(i)}\,\tilde{\nu}_D(y)\,dy = \frac{\int_{B_D(z,r)} y^{(i)}\,\nu_D(y)\,dy}{\int_{B_D(z,r)} \nu_D(y)\,dy}.$$

We first prove that µ^{(i)} = 0 for i ≠ d + 1. Since the ball B_D(z, r) is symmetric with respect to the i-th coordinate for each i ≠ d + 1, there exists a mapping h_i for each i ≠ d + 1 such that, for any y = (y^{(1)}, · · · , y^{(i)}, · · · , y^{(D)}) ∈ B_D(z, r),
$$h_i : (y^{(1)}, \cdots, y^{(i)}, \cdots, y^{(D)}) \mapsto (y^{(1)}, \cdots, -y^{(i)}, \cdots, y^{(D)}),$$
and h_i(y) ∈ B_D(z, r). That is, for all y ∈ B_D(z, r), h_i(y) is its mirror image with respect to the i-th direction, and
$$y \in B_D(z, r) \Leftrightarrow h_i(y) \in B_D(z, r), \ \text{ for } i \neq d+1,$$
$$x \in D \Rightarrow h_i(x) \in D, \ \text{ for } i = 1, \cdots, d,$$
$$x \in D \Rightarrow h_i(x) = x, \ \text{ for } i = d+1, \cdots, D.$$
Let $B_i^{+}$ and $B_i^{-}$ be the two hemispheres
$$B_i^{+} = \big\{y \in B_D(z,r) : y^{(i)} > 0\big\}, \qquad B_i^{-} = \big\{y \in B_D(z,r) : y^{(i)} < 0\big\}.$$

Then,
$$\begin{aligned} \mu^{(i)} &= \int_{B_i^{+}} y^{(i)}\,\tilde{\nu}_D(y)\,dy + \int_{B_i^{-}} y^{(i)}\,\tilde{\nu}_D(y)\,dy\\ &= \int_{B_i^{+}} y^{(i)}\,\tilde{\nu}_D(y)\,dy + \int_{B_i^{+}} (h_i(y))^{(i)}\,\tilde{\nu}_D(h_i(y))\,d(h_i(y))\\ &= \int_{B_i^{+}} y^{(i)}\big(\tilde{\nu}_D(y) - \tilde{\nu}_D(h_i(y))\big)\,dy. \end{aligned}$$

To show µ^{(i)} = 0, it is sufficient to show ν̃_D(y) = ν̃_D(h_i(y)), or equivalently ν_D(y) = ν_D(h_i(y)). Recall that
$$\nu_D(y) = \int_{D} \phi_\sigma(y - x)\,\omega(x)\,dx,$$
and
$$\|y - x\|_2 = \|h_i(y) - h_i(x)\|_2, \qquad \|h_i(y) - x\|_2 = \|y - h_i(x)\|_2.$$
Therefore, for i = 1, · · · , d,
$$\nu_D(h_i(y)) = \int_{D} \phi_\sigma(h_i(y) - x)\,\omega(x)\,dx = \int_{D} \phi_\sigma(y - h_i(x))\,\omega(h_i(x))\,dh_i(x) = \nu_D(y),$$
and, for i = d + 2, · · · , D,
$$\nu_D(y) = \int_{D} \phi_\sigma(y - x)\,\omega(x)\,dx = \int_{D} \phi_\sigma(h_i(y) - h_i(x))\,\omega(h_i(x))\,dh_i(x) = \int_{D} \phi_\sigma(h_i(y) - x)\,\omega(x)\,dx = \nu_D(h_i(y)).$$
Thus, µ^{(i)} = 0 for i ≠ d + 1.

For i = d + 1, we need to bound |∆ − µ^{(d+1)}| from below. According to Lemma 2.4 and our model setting,
$$|\Delta - \mu^{(d+1)}| = \frac{\big|\int_{B_D(z,r)}\int_{D} (\Delta - y^{(d+1)})\,\phi_\sigma(y-x)\,\omega(x)\,dx\,dy\big|}{\int_{B_D(z,r)} \nu_D(y)\,dy} = Cr^{-d}\left|\int_{B_D(z,r)}\int_{D} (\Delta - y^{(d+1)})\,\phi_\sigma(y-x)\,\omega(x)\,dx\,dy\right|.$$

If we express the numerator in terms of the Cartesian coordinates,
$$\begin{aligned} &\int_{B_D(z,r)}\int_{D} (\Delta - y^{(d+1)})\,\phi_\sigma(y-x)\,\omega(x)\,dx\,dy\\ &\quad= \int_{B_D(z,r)} (\Delta - y^{(d+1)})\,\phi_\sigma(y^{(d+1)})\left(\int_{D} \prod_{j=1}^{d}\phi_\sigma(y^{(j)} - x^{(j)})\,\omega(x)\,dx\right)\prod_{j=d+2}^{D}\phi_\sigma(y^{(j)})\,dy\\ &\quad\geq C\int_{B_D(z,r)} (\Delta - y^{(d+1)})\,\phi_\sigma(y^{(d+1)})\prod_{j=d+2}^{D}\phi_\sigma(y^{(j)})\,dy\\ &\quad\geq C\int_{0}^{\Delta} t\big(\phi_\sigma(t-\Delta) - \phi_\sigma(t+\Delta)\big)\,dt \int_{\sum_{j\neq d+1}(y^{(j)})^2 \leq r^2 - \Delta^2}\ \prod_{j=d+2}^{D}\phi_\sigma(y^{(j)})\prod_{j\neq d+1} dy^{(j)}\\ &\quad:= CI_1I_2, \end{aligned}$$
where the last inequality results from cropping the integration region (similarly to Fig. 17), while the first inequality stems from the fact that
$$(\mathrm{B.2})\qquad \int_{D} \prod_{j=1}^{d}\phi_\sigma(y^{(j)} - x^{(j)})\,\omega(x)\,dx \geq P\big(\|\xi'\|_2 \leq R - r \,\big|\, \xi' \sim N(0, \sigma^2 I_d)\big) \geq 1 - c\sigma^{C} \approx 1.$$
If we let p = (y^{(1)}, · · · , y^{(d)}) and q = (y^{(d+2)}, · · · , y^{(D)}), with ∆ = C₀σ and r ≥ ∆ + c₀σ, we have
$$I_1 = \int_{-\Delta}^{\Delta} t\,\phi_\sigma(t - \Delta)\,dt = \frac{C_0\sqrt{\pi}\,\mathrm{Erf}(C_0) - 2(e^{-C_0^2} - 1)}{\sqrt{2\pi}}\,\sigma,$$
$$\begin{aligned} I_2 &= \int_{\|p\|_2^2 + \|q\|_2^2 \leq r^2 - \Delta^2} \phi_\sigma(q)\,dp\,dq \geq \int_{\|p\|_2^2 + \|q\|_2^2 \leq c_0^2\sigma^2} \phi_\sigma(q)\,dp\,dq\\ &= C\sigma^{-(D-d-1)}\int_{0}^{c_0\sigma} (c_0^2\sigma^2 - s^2)^{\frac{d}{2}}\,s^{D-d-2}\exp\Big\{-\frac{s^2}{2\sigma^2}\Big\}\,ds \geq c\sigma^{d}. \end{aligned}$$
In other words, when C₀ > 0 and c₀ > 0, we have I₁ ≥ cσ and I₂ ≥ cσ^d, and thus
$$|\Delta - \mu^{(d+1)}| \geq Cr^{-d}I_1I_2 \geq c\sigma.$$

Combining all the results above, we have
$$|\Delta - \mu^{(d+1)}| \geq c\sigma \qquad\text{and}\qquad |\mu^{(i)}| = 0 \ \text{ for } i \neq d+1.$$

B.3.4. Proof of Theorem 3.5.

PROOF. The proof is based on the framework of Lemma B.2 and Corollary 2.4.1. We first estimate the local sample size, and then show the equivalence between $\mu_z^B$ and E(W(Y)Y)/E(W(Y)).
For simplicity, let the collection of observations falling in B_D(z, r₀) be $\{y_i\}_{i=1}^{n}$, with size n. According to Corollary 2.4.1, if N = Cr₀^{-d}σ^{-3},
$$P\big(C_3 D\sigma^{-3} \leq n \leq C_4 D\sigma^{-3}\big) \geq 1 - 2\exp\{-C_5\sigma^{-3}\}.$$


In the definition of F(z), the weight function is constructed as
$$W(y) = \left(1 - \frac{\|z - y\|_2^2}{r_0^2}\right)^{k}.$$
To obtain the asymptotic distribution, we need to evaluate E(W(Y)Y) and E(W(Y)). As in the proof of Theorem 3.1, we only need to work on ν_D(y) under the same Cartesian-coordinate setting; that is, z* is the origin, z − z* is the (d+1)-th direction, and
$$z = (\underbrace{0, \cdots, 0}_{d}, \Delta, \underbrace{0, \cdots, 0}_{D-d-1}).$$

If we define p = (y^{(1)}, · · · , y^{(d)}), t = y^{(d+1)}, and q = (y^{(d+2)}, · · · , y^{(D)}), and let η ∈ R^{2k} be an auxiliary vector, then
$$\begin{aligned} E(W(Y)) &= \frac{\int_{B_D(z,r_0)}\int_{D} W(y)\,\phi_\sigma(y-x)\,\omega(x)\,dx\,dy}{\int_{B_D(z,r_0)}\int_{D} \phi_\sigma(y-x)\,\omega(x)\,dx\,dy}\\ &\approx c\,r_0^{-d}\int_{\|p\|_2^2 + (t-\Delta)^2 + \|q\|_2^2 \leq r_0^2} W(y)\,\phi_\sigma(t-\Delta)\,\phi_\sigma(q)\,dp\,dt\,dq\\ &= c\,r_0^{-(d+2k)}\int_{\|p\|_2^2 + (t-\Delta)^2 + \|q\|_2^2 \leq r_0^2} \big(r_0^2 - \|p\|_2^2 - (t-\Delta)^2 - \|q\|_2^2\big)^{k}\,\phi_\sigma(t-\Delta)\,\phi_\sigma(q)\,dp\,dt\,dq\\ &= c\,r_0^{-(d+2k)}\int_{\|p\|_2^2 + (t-\Delta)^2 + \|q\|_2^2 + \|\eta\|_2^2 \leq r_0^2} \phi_\sigma(t-\Delta)\,\phi_\sigma(q)\,d\eta\,dp\,dt\,dq\\ &= c. \end{aligned}$$
Meanwhile, the i-th element of E(W(Y)Y) can be expressed as
$$\begin{aligned} \big(E(W(Y)Y)\big)^{(i)} &= \frac{\int_{B_D(z,r_0)}\int_{D} W(y)\,y^{(i)}\,\phi_\sigma(y-x)\,\omega(x)\,dx\,dy}{\int_{B_D(z,r_0)}\int_{D} \phi_\sigma(y-x)\,\omega(x)\,dx\,dy}\\ &\approx c\,r_0^{-d}\int_{\|p\|_2^2 + (t-\Delta)^2 + \|q\|_2^2 \leq r_0^2} W(y)\,y^{(i)}\,\phi_\sigma(t-\Delta)\,\phi_\sigma(q)\,dp\,dt\,dq\\ &= c\,r_0^{-(d+2k)}\int_{\|p\|_2^2 + (t-\Delta)^2 + \|q\|_2^2 + \|\eta\|_2^2 \leq r_0^2} y^{(i)}\,\phi_\sigma(t-\Delta)\,\phi_\sigma(q)\,d\eta\,dp\,dt\,dq, \end{aligned}$$

where the two approximation signs are consequences of (B.2). By introducing the auxiliary vector η, these two expectations can be viewed as analogues of our manifold-fitting model in a higher-dimensional case, where the dimensionalities of the ambient space and the latent manifold are D + 2k and d + 2k, respectively.
Hence, let $\hat{\mu}_w = E(W(Y)Y)/E(W(Y))$; then, according to Theorem 3.1,
$$|\hat{\mu}_w^{(d+1)} - \Delta| \geq c_1\sigma, \qquad |\hat{\mu}_w^{(i)}| \leq c_2\sigma^2 \ \text{ for } i \neq d+1.$$
Combining these results with Corollary 2.4.1 and Corollary B.2.1, if the total sample size is N = Cr₀^{-d}σ^{-3},
$$P\big(\|F(z) - \hat{\mu}_w\|_2 \leq c\sigma^2\big) \geq 1 - C_1\sigma^{c_3-1}\exp\{-C_2\sigma^{c_3-1}\},$$
for some constants C₁, C₂ and any c₃ ∈ (0, 1), and thus
$$\sin\{\Theta(F(z) - z,\, z^* - z)\} \leq C_1\sigma\sqrt{\log(1/\sigma)}$$
with probability at least 1 − C₂exp(−C₃σ^{-c}), for some constants c, C₁, C₂, and C₃.
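As an illustrative aside (not part of the proof), the directional behaviour of the weighted local mean F(z) can be visualised numerically. The Python sketch below uses a noisy unit circle in R^D; the radius r₀, the exponent k, and all other constants are illustrative assumptions rather than the exact settings of the paper.

```python
import numpy as np

# Hypothetical illustration of Theorem 3.5: for a point z at distance ~0.5*sigma from a
# unit circle M in R^D, the weighted local mean
#   F(z) = sum_i W(y_i) y_i / sum_i W(y_i),  W(y) = (1 - ||z - y||^2 / r0^2)^k,
# points from z roughly towards z*, the projection of z onto M.
rng = np.random.default_rng(0)
D, N, sigma, k = 5, 1_000_000, 0.05, 2
r0 = 2 * sigma * np.sqrt(np.log(1 / sigma))      # local radius, an illustrative choice

theta = rng.uniform(0.0, 2 * np.pi, size=N)
X = np.zeros((N, D))
X[:, 0], X[:, 1] = np.cos(theta), np.sin(theta)  # latent points on the circle
Y = X + sigma * rng.standard_normal((N, D))      # noisy observations

z = np.zeros(D); z[0] = 1.0 + 0.5 * sigma        # z lies ~0.5*sigma outside the circle
z_star = np.zeros(D); z_star[0] = 1.0            # projection z* of z onto M

w = np.clip(1.0 - np.sum((Y - z) ** 2, axis=1) / r0 ** 2, 0.0, None) ** k
F_z = (w[:, None] * Y).sum(axis=0) / w.sum()

u, v = F_z - z, z_star - z
cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
sin_t = np.sqrt(max(0.0, 1.0 - cos_t ** 2))
print(sin_t, sigma * np.sqrt(np.log(1 / sigma)))
# sin(Theta(F(z)-z, z*-z)) comes out small, on the order of the
# C*sigma*sqrt(log(1/sigma)) bound of Theorem 3.5.
```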

B.4. Proof of content in Section 4.

B.4.1. Proof of Theorem 4.2.

PROOF. Assume $\hat{\Pi}_{z^*}^{\perp}$ satisfies
$$\|\hat{\Pi}_{z^*}^{\perp} - \Pi_{z^*}^{\perp}\|_F \leq c\sigma^{\kappa},$$
and that the region $\hat{V}_z$ is constructed correspondingly. As in the proof of Theorem 4.1, $\hat{\mu}_z^{V}$ can be written as
$$\begin{aligned} \hat{\mu}_z^{V} &= z + \hat{\Pi}_{z^*}^{\perp}\,E_{Y\sim\nu}\big(Y - z \,\big|\, Y \in \hat{V}_z\big)\\ &= z^* + \hat{\Pi}_{z^*}^{-}\delta_z + E_\nu\big(\hat{\Pi}_{z^*}^{\perp}(X - z^*) \,\big|\, Y \in \hat{V}_z\big) + E_\nu\big(\hat{\Pi}_{z^*}^{\perp}\xi \,\big|\, Y \in \hat{V}_z\big), \end{aligned}$$
which can be divided into three parts. According to Lemma 2.6, we can assume ∥X − z*∥₂ ≤ Cσ√(log(1/σ)) for some constant C. Let δ_z = z − z*; the three parts can be evaluated as follows:
(a) $\hat{\Pi}_{z^*}^{-}\delta_z$: The norm of $\hat{\Pi}_{z^*}^{-}\delta_z$ is upper bounded as
$$\|\hat{\Pi}_{z^*}^{-}\delta_z\|_2 = \|\Pi_{z^*}^{-}\delta_z + (\hat{\Pi}_{z^*}^{-} - \Pi_{z^*}^{-})\delta_z\|_2 \leq \|\Pi_{z^*}^{-}\delta_z\|_2 + \|\hat{\Pi}_{z^*}^{-} - \Pi_{z^*}^{-}\|_F\,\|\delta_z\|_2 \leq 0 + c\sigma^{\kappa}\cdot\sigma \leq C\sigma^{1+\kappa},$$
for some constant C.
(b) $E_\nu\big(\hat{\Pi}_{z^*}^{\perp}(X - z^*) \,\big|\, Y \in \hat{V}_z\big)$: From Jensen's inequality,
$$\Big\|E_\nu\big(\hat{\Pi}_{z^*}^{\perp}(X - z^*) \,\big|\, Y \in \hat{V}_z\big)\Big\|_2 \leq E_\nu\Big(\big\|\hat{\Pi}_{z^*}^{\perp}(X - z^*)\big\|_2 \,\Big|\, Y \in \hat{V}_z\Big).$$
Since z* and X lie exactly on M, according to Lemma 2.3,
$$\big\|\hat{\Pi}_{z^*}^{\perp}(X - z^*)\big\|_2 = \big\|\Pi_{z^*}^{\perp}(X - z^*) + (\hat{\Pi}_{z^*}^{\perp} - \Pi_{z^*}^{\perp})(X - z^*)\big\|_2 \leq \frac{1}{2\tau}\|X - z^*\|_2^2 + \sigma^{\kappa}\|X - z^*\|_2,$$
where
$$\|X - z^*\|_2^2 = \|X - z + z - z^*\|_2^2 \leq 2\|X - z\|_2^2 + 2\|z - z^*\|_2^2 \leq C\sigma^2\log(1/\sigma),$$
and thus
$$\Big\|E_\nu\big(\hat{\Pi}_{z^*}^{\perp}(X - z^*) \,\big|\, Y \in \hat{V}_z\big)\Big\|_2 \leq C\sigma^{1+\kappa}\sqrt{\log(1/\sigma)}.$$
 
(c) $E_\nu\big(\hat{\Pi}_{z^*}^{\perp}\xi \,\big|\, Y \in \hat{V}_z\big)$: The dislocation ∆ can be evaluated as follows:
$$\Delta = \|\hat{\Pi}_{z^*}^{\perp}(z - X)\|_2 \leq \|\hat{\Pi}_{z^*}^{\perp}(z - z^*)\|_2 + \|\hat{\Pi}_{z^*}^{\perp}(z^* - X)\|_2 \leq \|z - z^*\|_2 + C\sigma^{1+\kappa}\sqrt{\log(1/\sigma)} \leq C\sigma.$$
Thus, if we let $\xi' = \hat{\Pi}_{z^*}^{\perp}\xi$, according to Proposition 2.5,
$$\Big\|E_\nu\big(\hat{\Pi}_{z^*}^{\perp}\xi \,\big|\, Y \in \hat{V}_z\big)\Big\|_2 \leq \Big\|E_\omega\Big[E_\phi\big(\hat{\Pi}_{z^*}^{\perp}\xi \,\big|\, X,\, X + \xi \in \hat{V}_z\big)\Big]\Big\|_2 \leq E_\omega\Big[\big\|E\big(\xi' \,\big|\, \xi' \in B_{D-d}(a_\Delta, r_2)\big)\big\|_2\Big] \leq C\sigma^2$$
for some constant C.
Therefore,
$$\|\hat{\mu}_z^{V} - z^*\|_2 \leq C\sigma^{1+\kappa}\sqrt{\log(1/\sigma)},$$
for some constant C, and $\hat{\mu}_z^{V}$ is $O(\sigma^{1+\kappa}\sqrt{\log(1/\sigma)})$-close to M.

B.4.2. Proof of Theorem 4.4.

PROOF. Assume U is the projection matrix onto z − z*, and Ũ is its estimate obtained via $\mu_z^B - z$, such that ∥U − Ũ∥_F ≤ Cσ√(log(1/σ)) for some constant C. Let U⁻ be the complement of U in R^D; then, $\hat{\mu}_z^{V}$ can be rewritten as follows:
$$\begin{aligned} \hat{\mu}_z^{V} &= z + \tilde{U}\,E_\nu\big(Y - z \,\big|\, Y \in \hat{V}_z\big)\\ &= z^* + \tilde{U}^{-}\delta_z + E_\nu\big(\tilde{U}(X - z^*) \,\big|\, Y \in \hat{V}_z\big) + E_\nu\big(\tilde{U}\xi \,\big|\, Y \in \hat{V}_z\big), \end{aligned}$$
which can also be divided into three parts. According to Lemma 2.6, we can assume ∥X − z*∥₂ ≤ Cσ√(log(1/σ)) for some constant C. Let δ_z = z − z*. The three parts can be evaluated as follows:
(a) $\tilde{U}^{-}\delta_z$: As δ_z is orthogonal to the range of U⁻,
$$\|\tilde{U}^{-}\delta_z\|_2 \leq \|U^{-}\delta_z\|_2 + \|U^{-} - \tilde{U}^{-}\|_F\,\|\delta_z\|_2 \leq C\sigma^2\sqrt{\log(1/\sigma)}.$$
 
(b) $E_\nu\big(\tilde{U}(X - z^*) \,\big|\, Y \in \hat{V}_z\big)$: Using Jensen's inequality,
$$\Big\|E_\nu\big(\tilde{U}(X - z^*) \,\big|\, Y \in \hat{V}_z\big)\Big\|_2 \leq E_\nu\Big(\big\|\tilde{U}(X - z^*)\big\|_2 \,\Big|\, Y \in \hat{V}_z\Big).$$
Since U projects onto one direction of $\Pi_{z^*}^{\perp}$,
$$\|U(X - z^*)\|_2 \leq \|\Pi_{z^*}^{\perp}(X - z^*)\|_2.$$
As z* and X lie exactly on M, according to Lemma 2.3,
$$\begin{aligned} \big\|\tilde{U}(X - z^*)\big\|_2 &= \big\|U(X - z^*) + (\tilde{U} - U)(X - z^*)\big\|_2\\ &\leq \big\|\Pi_{z^*}^{\perp}(X - z^*)\big\|_2 + \|\tilde{U} - U\|_F\,\|X - z^*\|_2\\ &\leq \frac{1}{2\tau}\|X - z^*\|_2^2 + \sigma\|X - z^*\|_2 \leq C\sigma^2\log(1/\sigma). \end{aligned}$$
 
(c) $E_\nu\big(\tilde{U}\xi \,\big|\, Y \in \hat{V}_z\big)$: The dislocation ∆ can be evaluated as
$$\Delta = \|\tilde{U}(z - X)\|_2 \leq \|\tilde{U}(z - z^*)\|_2 + \|\tilde{U}(z^* - X)\|_2 \leq \|z - z^*\|_2 + C\sigma^2\log(1/\sigma) \leq C\sigma.$$
Thus, if we let $\xi' = \tilde{U}\xi$, according to Proposition 2.5,
$$\Big\|E_\nu\big(\tilde{U}\xi \,\big|\, Y \in \hat{V}_z\big)\Big\|_2 \leq \Big\|E_\omega\Big[E_\phi\big(\tilde{U}\xi \,\big|\, X,\, X + \xi \in \hat{V}_z\big)\Big]\Big\|_2 \leq E_\omega\Big[\big\|E\big(\xi' \,\big|\, \xi' \in B_{D-d}(a_\Delta, r_2)\big)\big\|_2\Big] \leq C\sigma^2$$
for some constant C.
Therefore,
$$\|\hat{\mu}_z^{V} - z^*\|_2 \leq C\sigma^2\log(1/\sigma),$$
for some constant C.



B.4.3. Proof of Theorem 4.5.

PROOF. Let n be the number of samples falling in $\hat{V}_z$. According to Lemma B.2,
$$\sqrt{n}\,\big(G(z) - \hat{\mu}_w\big) \xrightarrow{d} N(0, \Sigma),$$
where $\Sigma \preceq r^2 I_D$ and
$$\hat{\mu}_w = \frac{E(\beta(Y)Y)}{E(\beta(Y))} = \frac{\int y\,\beta(y)\,\nu(y)\,dy}{\int \beta(y)\,\nu(y)\,dy} = \frac{\int y\,w_u(\hat{U}(y-z))\,w_v((I_D - \hat{U})(y-z))\,\nu(y)\,dy}{\int w_u(\hat{U}(y-z))\,w_v((I_D - \hat{U})(y-z))\,\nu(y)\,dy}.$$

To obtain the asymptotic property of G(z), we first need to investigate $\hat{\mu}_w$. For simplicity, we define two more expectations:
$$\mu_w = \frac{\int y\,w_u(U(y-z))\,w_v((I_D - U)(y-z))\,\nu(y)\,dy}{\int w_u(U(y-z))\,w_v((I_D - U)(y-z))\,\nu(y)\,dy} =: \frac{\int y\,\beta^*(y)\,\nu(y)\,dy}{\int \beta^*(y)\,\nu(y)\,dy}, \qquad \mu_{w,D} = \frac{\int y\,\beta^*(y)\,\nu_D(y)\,dy}{\int \beta^*(y)\,\nu_D(y)\,dy}.$$
In what follows, we show that $\hat{\mu}_w$ is $O(\sigma^2\log(1/\sigma))$-close to M with high probability. Since $\|\mu_{w,D} - \mu_w\|_2 \leq C\sigma^2\log(1/\sigma)$ by Lemma 3.3, we only need to show that $\|\hat{\mu}_w - \mu_w\|_2 \leq C\sigma^2\log(1/\sigma)$ with high probability and that $\mu_{w,D}$ is $O(\sigma^2)$-close to M.

Bound of $\|\hat{\mu}_w - \mu_w\|_2$:
According to Theorem 3.5, $\|\hat{U} - U\|_F \leq C_1\sigma\sqrt{\log(1/\sigma)}$ with probability at least $1 - C_2\exp(-C_3\sigma^{-c})$, and the first derivatives of $w_u$ and $w_v$ are both upper bounded by a constant C. We have
$$\begin{aligned} |\hat{W}_u - W_u| &:= |w_u(\hat{U}(y-z)) - w_u(U(y-z))| \leq C\|\hat{U} - U\|_F\,\|y - z\|_2 \leq C_4\sigma^2\log(1/\sigma),\\ |\hat{W}_v - W_v| &:= |w_v((I_D - \hat{U})(y-z)) - w_v((I_D - U)(y-z))| \leq C\|\hat{U} - U\|_F\,\|y - z\|_2 \leq C_5\sigma^2\log(1/\sigma), \end{aligned}$$
and thus
$$\begin{aligned} |\beta^*(y) - \beta(y)| &= |W_uW_v - \hat{W}_u\hat{W}_v| = |W_uW_v - W_u\hat{W}_v + W_u\hat{W}_v - \hat{W}_u\hat{W}_v|\\ &\leq W_u|\hat{W}_v - W_v| + \hat{W}_v|\hat{W}_u - W_u| \leq C_6\sigma^2\log(1/\sigma), \end{aligned}$$

where the last inequality holds because both $W_u$ and $\hat{W}_v$ lie in the interval [0, 1]. Then,
$$\begin{aligned} \|\hat{\mu}_w - \mu_w\|_2 &= \left\|\frac{\int y\,\beta(y)\,\nu(y)\,dy}{\int \beta(y)\,\nu(y)\,dy} - \frac{\int y\,\beta^*(y)\,\nu(y)\,dy}{\int \beta^*(y)\,\nu(y)\,dy}\right\|_2\\ &\leq \left\|\frac{\int y\,(\beta(y) - \beta^*(y))\,\nu(y)\,dy}{\int \beta(y)\,\nu(y)\,dy}\right\|_2 + \left\|\frac{\int y\,\beta^*(y)\,\nu(y)\,dy}{\int \beta^*(y)\,\nu(y)\,dy}\right\|_2 \frac{\big|\int (\beta(y) - \beta^*(y))\,\nu(y)\,dy\big|}{\int \beta(y)\,\nu(y)\,dy}\\ &\leq C_6\sigma^2\log(1/\sigma)\,\frac{1 + \|\mu_w - z^*\|_2}{E(\beta(Y))} \leq C_7\sigma^2\log(1/\sigma), \end{aligned}$$
with probability at least $1 - C_2\exp(-C_3\sigma^{-c})$.

Property of $\mu_{w,D}$:
As in the proofs of Section 3, we let z* be the origin and z − z* be the (d+1)-th direction of the Cartesian coordinate system. We also let p = (y^{(1)}, · · · , y^{(d)}), t = y^{(d+1)}, and q = (y^{(d+2)}, · · · , y^{(D)}). With U the same as before, we have ∥u∥ = ∥U(y − z)∥ = |t − ∆| and ∥v∥ = ∥(I_D − U)(y − z)∥ = ∥(p, q)∥₂. Writing $\mu_{w,D} = (\mu^{(1)}, \cdots, \mu^{(D)})$, the i-th element of $\mu_{w,D}$ can be expressed as
$$\mu^{(i)} = \frac{\int_{\|p\|_2^2 + \|q\|_2^2 \leq r_1^2}\int_{(t-\Delta)^2 \leq r_2^2} y^{(i)}\,w_u(|t-\Delta|)\,(r_1^2 - \|p\|_2^2 - \|q\|_2^2)^{k}\,\phi_\sigma(t)\,\phi_\sigma(q)\,dt\,dp\,dq}{\int_{\|p\|_2^2 + \|q\|_2^2 \leq r_1^2}\int_{(t-\Delta)^2 \leq r_2^2} w_u(|t-\Delta|)\,(r_1^2 - \|p\|_2^2 - \|q\|_2^2)^{k}\,\phi_\sigma(t)\,\phi_\sigma(q)\,dt\,dp\,dq}.$$

For i ≠ d + 1,
$$\mu^{(i)} \approx \frac{\int_{\|p\|_2^2 + \|q\|_2^2 \leq r_1^2} y^{(i)}\,(r_1^2 - \|p\|_2^2 - \|q\|_2^2)^{k}\,\phi_\sigma(q)\,dp\,dq}{\int_{\|p\|_2^2 + \|q\|_2^2 \leq r_1^2} (r_1^2 - \|p\|_2^2 - \|q\|_2^2)^{k}\,\phi_\sigma(q)\,dp\,dq} = \frac{\int_{\|p\|_2^2 + \|q\|_2^2 + \|\eta\|_2^2 \leq r_1^2} y^{(i)}\,\phi_\sigma(q)\,d\eta\,dp\,dq}{\int_{\|p\|_2^2 + \|q\|_2^2 + \|\eta\|_2^2 \leq r_1^2} \phi_\sigma(q)\,d\eta\,dp\,dq} = 0,$$
where η ∈ R^{2k} is an auxiliary vector making the above conditional expectation an analogue of Lemma 3.4 in a (D + 2k − 1)-dimensional space.
For i = d + 1, we assume $r_2 = C\sigma\sqrt{\log(1/\sigma)} > 2\Delta$. We have
$$\begin{aligned} \mu^{(d+1)} &\approx \frac{\int_{\Delta - r_2}^{\Delta + r_2} t\,w_u(|t-\Delta|)\,\phi_\sigma(t)\,dt}{\int_{\Delta - r_2}^{\Delta + r_2} w_u(|t-\Delta|)\,\phi_\sigma(t)\,dt} \leq C\int_{\Delta - r_2}^{\Delta + r_2} t\,w_u(|t-\Delta|)\,\phi_\sigma(t)\,dt\\ &= C\int_{0}^{\Delta + r_2} t\,\big[w_u(|t-\Delta|) - w_u(|t+\Delta|)\big]\,\phi_\sigma(t)\,dt\\ &= C\int_{r_2/2 - \Delta}^{\Delta + r_2} t\,\big[w_u(|t-\Delta|) - w_u(|t+\Delta|)\big]\,\phi_\sigma(t)\,dt\\ &\leq C\int_{r_2/2 - \Delta}^{\Delta + r_2} t\,\phi_\sigma(t)\,dt \leq C\sigma^2. \end{aligned}$$
Therefore, $\|\mu_{w,D} - z^*\|_2 \leq C\sigma^2$.

Combining all the results above, we have
$$\|\hat{\mu}_w - z^*\|_2 \leq \|\hat{\mu}_w - \mu_w\|_2 + \|\mu_w - \mu_{w,D}\|_2 + \|\mu_{w,D} - z^*\|_2 \leq C\sigma^2\log(1/\sigma),$$
with probability at least $1 - C_2\exp(-C_3\sigma^{-c})$. According to Corollary 2.4.1 and Corollary B.2.1, if the sample size is $N = C_1r_1^{-d}\sigma^{-3}$, then
$$\|G(z) - z^*\|_2 \leq C_2\sigma^2\log(1/\sigma)$$
with probability at least $1 - C_2\exp(-C_3\sigma^{-c})$, for some constants c, C₁, C₂, and C₃.

B.5. Proof of content in Section 5.

B.5.1. Proof of Theorem 5.1.

PROOF. Recall that $d_H(S, M) = \max\{\sup_{s\in S} d(s, M),\ \sup_{x\in M} d(x, S)\}$, so showing $d_H(S, M) \leq C\sigma^2\log(1/\sigma)$ is equivalent to showing that
$$d(s, M) \leq C\sigma^2\log(1/\sigma) \ \text{ for all } s \in S, \qquad\text{and}\qquad d(x, S) \leq C\sigma^2\log(1/\sigma) \ \text{ for all } x \in M.$$


The first condition is clear: for any s ∈ S, there exists a $y_s \in \Gamma$ such that $s = \hat{\mu}_{y_s}^{V}$. Then, according to Theorem 4.4,
$$(\mathrm{B.3})\qquad d(s, M) \leq \|\hat{\mu}_{y_s}^{V} - y_s^*\|_2 \leq C\sigma^2\log(1/\sigma),$$
where $y_s^*$ denotes the projection of $y_s$ onto M. For the second inequality, let x be an arbitrary point on M. Then, there exists a point $y_x \in \Gamma$ such that x is its projection onto M. Hence, from Theorem 4.4 again,
$$(\mathrm{B.4})\qquad d(x, S) \leq \|x - \hat{\mu}_{y_x}^{V}\|_2 \leq C\sigma^2\log(1/\sigma).$$
Because (B.3) and (B.4) hold for any s ∈ S and x ∈ M, the proof is complete.

B.5.2. Proof of Theorem 5.2.

PROOF. From the smoothness of Γ and G, it is evident that $\hat{S}$ is a smooth manifold. For any $s \in \hat{S}$, there exists a $y_s \in \Gamma$ such that $s = G(y_s)$. Then, according to Theorem 4.5,
$$(\mathrm{B.5})\qquad d(s, M) \leq \|G(y_s) - y_s^*\|_2 \leq C\sigma^2\log(1/\sigma)$$
with high probability, where $y_s^*$ is the projection of $y_s$ onto M. For the second inequality, let x be an arbitrary point on M. Then, there exists a point $y_x \in \Gamma$ such that x is its projection onto M. Hence, from Theorem 4.5 again,
$$(\mathrm{B.6})\qquad d(x, \hat{S}) \leq \|x - G(y_x)\|_2 \leq C\sigma^2\log(1/\sigma)$$
with high probability. Thus the proof is completed.

B.5.3. Proof of Theorem 5.3.

PROOF. By fixing the projection matrix $\hat{\Pi}_x^{\perp}$ within a neighborhood, the function defining $\hat{M}_x$ is a smooth map with constant rank D − d; thus, according to the Constant-Rank Level-Set Theorem, $\hat{M}_x$ is a properly embedded submanifold of dimension d in $R^D$.
To bound the distance, let y be an arbitrary point on $\hat{M}_x$. Then
$$\hat{\Pi}_x^{\perp}(G(y) - y) = \hat{\Pi}_x^{\perp}\big((G(y) - y^*) - (y - y^*)\big) = 0,$$
where y* is the projection of y onto M. Thus,
$$\|\hat{\Pi}_x^{\perp}(y - y^*)\|_2 = \|\hat{\Pi}_x^{\perp}(G(y) - y^*)\|_2 \leq \|G(y) - y^*\|_2 \leq C\sigma^2\log(1/\sigma)$$
with high probability. Since $y \in B_D(x, c\tau)$, there exists $c_1 \in (0,1)$ such that $\|\hat{\Pi}_x^{\perp} - \Pi_{y^*}^{\perp}\| \leq c_1$ with high probability. Hence,
$$\|\hat{\Pi}_x^{\perp}(y - y^*)\|_2 \geq \|\Pi_{y^*}^{\perp}(y - y^*)\|_2 - \|(\Pi_{y^*}^{\perp} - \hat{\Pi}_x^{\perp})(y - y^*)\|_2 \geq (1 - c_1)\|y - y^*\|_2 \geq c\|y - y^*\|_2.$$
Therefore, for any $y \in \hat{M}_x$,
$$d(y, M) = \|y - y^*\|_2 \leq C\sigma^2\log(1/\sigma)$$
with high probability.

B.5.4. Proof of Theorem 5.4.

PROOF. The proof of (I) and (II) is exactly the same as that of Theorem 5.2. To establish (III), let $a, b \in \hat{M}$ with a ≠ b. When $\|a - b\|_2 \geq c\sigma\tau_0$, the inequality $\|a - b\|_2^2/d(b, T_a\hat{M}) \geq c\sigma\tau_0$ clearly holds, since $\|a - b\|_2 \geq d(b, T_a\hat{M})$. Hence, we assume that $\|a - b\|_2 < c\sigma\tau_0$. We further denote $a_0 = G^{-1}(a) \in \tilde{M}$ and $b_0 = G^{-1}(b) \in \tilde{M}$.
Let $J_G$ denote the Jacobian matrix of G; then $J_G(a_0)$ is a linear mapping from $T_{a_0}\tilde{M}$ to $T_a\hat{M}$. Consider a local chart of Γ at $T_{a_0}\Gamma$; the natural projection from $\tilde{M} \cap B(a_0, \|b_0 - a_0\|_2)$ to $T_{a_0}\tilde{M} \cap B(a_0, \|b_0 - a_0\|_2)$ is then an invertible mapping. Denote the inverse of this natural projection by ϕ; then there exists $\eta_{b_0} \in T_{a_0}\tilde{M}$ such that $\phi(0) = a_0$ and $\phi(\eta_{b_0}) = b_0$. Since $\|a - b\|_2 < c\sigma\tau_0$, there exist $0 < c < C$ such that
$$c\|a_0 - b_0\|_2 \leq \|\eta_{b_0} - \eta_{a_0}\|_2 = \|\eta_{b_0}\|_2 \leq C\|a_0 - b_0\|_2.$$
Using the Taylor expansion of G at $a_0$, we obtain
$$\begin{aligned} d(b, T_a\hat{M}) &\leq \|b - J_G(a_0)\eta_{b_0} - G(a_0)\|_2 = \|G(\phi(\eta_{b_0})) - J_G(a_0)\eta_{b_0} - G(a_0)\|_2\\ &\leq \|G(\eta_{b_0}) - J_G(a_0)\eta_{b_0} - G(a_0)\|_2 + \|G(\eta_{b_0}) - G(\phi(\eta_{b_0}))\|_2\\ &\leq \|H_G(z_1)\|_2\,\|\eta_{b_0}\|_2^2 + \|J_G(z_2)\|_2\,\|\eta_{b_0} - \phi(\eta_{b_0})\|_2\\ &\leq C(M_G + L_G)\|\eta_{b_0}\|_2^2 \leq C(M_G + L_G)\|a_0 - b_0\|_2^2. \end{aligned}$$

Here, $H_G$ is the Hessian matrix of G, and $M_G$ and $L_G$ are upper bounds of $\|H_G\|_2$ and $\|J_G\|_2$, respectively. Moreover,
$$\|a_0 - b_0\|_2 \leq \frac{1}{\ell_G}\|G(a_0) - G(b_0)\|_2 = \frac{1}{\ell_G}\|a - b\|_2,$$
where $\ell_G$ is a lower bound of $J_G$. Hence, we have
$$d(b, T_a\hat{M}) \leq C\,\frac{M_G + L_G}{\ell_G}\,\|a - b\|_2^2.$$
Finally, the reach of $\hat{M}$ can be bounded below as
$$\mathrm{reach}(\hat{M}) \geq \min\left\{c\sigma\tau_0,\ c\,\frac{\ell_G}{M_G + L_G}\right\}.$$

B.5.5. Proof of Proposition 5.5.

PROOF. Since $\tilde{M} \subset \Gamma$, it is clear that $d_H(\tilde{M}, M) \leq C\sigma$. In what follows, we show that $\dim \tilde{M} = d$.
Recall that $F(y) = \sum_i \alpha_i(y)y_i$ with $\sum_i \alpha_i(y) = 1$. Let $H(y) = F(y) - y$; then we have
$$H(y) = \sum_i \alpha_i(y)y_i - y = \sum_i \alpha_i(y)(y_i - y).$$
According to Lemma 17 and Theorem 18 in [42], for any unit-norm direction vector $v \in R^D$,
$$\|\partial_v H(y) - v\|_2 \leq Cr_0$$
with high probability. When σ is sufficiently small, the Jacobian matrix of H, denoted by $J_H$, is of full rank. For any fixed rank-(D − d) projection matrix $\Pi^*$,
$$\Pi^*H : R^D \to R^D, \qquad J_{\Pi^*H} = \Pi^*J_H.$$
In other words, $\Pi^*H$ is a smooth map with constant rank D − d; thus, according to the Constant-Rank Level-Set Theorem, $\tilde{M} = \{y \in \Gamma : \Pi^*H(y) = 0\}$ is a properly embedded submanifold of codimension D − d in Γ. Therefore, $\dim \tilde{M} = d$.

APPENDIX C: SUPPLEMENT TO THE SIMULATION


FIG 18. Assessing the performance of ysl23 in fitting the sphere (N = 5 × 10⁴, N₀ = 100, σ = 0.06): the left panel displays points in W surrounding the underlying manifold, while the right panel illustrates the corresponding points in Ŵ.

FIG 19. Assessing the performance of ysl23 in fitting the torus (N = 5 × 10⁴, N₀ = 100, σ = 0.06): the left panel displays points in W surrounding the underlying manifold, while the right panel illustrates the corresponding points in Ŵ.
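For readers who wish to reproduce such inputs, the sketch below generates a noisy sample W around a unit sphere in R³ in the spirit of the setting above (N = 5 × 10⁴, σ = 0.06). The sampling scheme and the function name are illustrative assumptions, not the paper's actual simulation code.

```python
import numpy as np

# Hypothetical sketch: generate noisy observations W around the unit sphere S^2 in R^3,
# mimicking the setting of Figures 18-19 (N = 5e4 points, noise level sigma = 0.06).
def sample_noisy_sphere(N: int = 50_000, sigma: float = 0.06, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    # Uniform points on the sphere: normalize standard Gaussian vectors.
    X = rng.standard_normal((N, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    # Add isotropic ambient Gaussian noise.
    return X + sigma * rng.standard_normal((N, 3))

W = sample_noisy_sphere()
print(W.shape)   # (50000, 3)
# A torus sample can be built analogously from the parameterization
# ((R + r*cos(u))*cos(v), (R + r*cos(u))*sin(v), r*sin(u)) with u, v in [0, 2*pi).
```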

FIG 20. The asymptotic performance of ysl23 when fitting a sphere. The panels report the Hausdorff distance and the average distance: the top two show how the two distances change with σ (for N = 2.5 × 10⁴), while the bottom two show how the distances change with N (for σ = 0.06).
FIG 21. The asymptotic performance of ysl23 when fitting a torus. The panels report the Hausdorff distance and the average distance: the top two show how the two distances change with σ (for N = 2.5 × 10⁴), while the bottom two show how the distances change with N (for σ = 0.06).
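The two error metrics reported in Figures 20 and 21 can be computed from point samples of the fitted set and of the true manifold. The sketch below is one illustrative way to do so, assuming dense point samples of both sets; the symmetric definition of the average distance used here is an assumption and the paper's exact evaluation code may differ.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical sketch of the error metrics of Figures 20-21, evaluated between a point
# sample `fitted` of the manifold estimate and a dense sample `truth` of the true manifold.
def hausdorff_and_average_distance(fitted: np.ndarray, truth: np.ndarray):
    d_fit_to_truth, _ = cKDTree(truth).query(fitted)   # d(s, M) for each fitted point s
    d_truth_to_fit, _ = cKDTree(fitted).query(truth)   # d(x, S) for each true point x
    hausdorff = max(d_fit_to_truth.max(), d_truth_to_fit.max())
    average = 0.5 * (d_fit_to_truth.mean() + d_truth_to_fit.mean())
    return hausdorff, average

# Example with a unit sphere: `fitted` plays the role of the output points in W-hat.
rng = np.random.default_rng(0)
truth = rng.standard_normal((100_000, 3))
truth /= np.linalg.norm(truth, axis=1, keepdims=True)
fitted = truth[:5_000] + 0.002 * rng.standard_normal((5_000, 3))
print(hausdorff_and_average_distance(fitted, truth))
```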

FIG 22. The performance of ysl23 with increasing N. Top row, from left to right: N = 3 × 10², 3 × 10³, 3 × 10⁴, 3 × 10⁵. Middle row, from left to right: N = 1 × 10³, 5 × 10³, 2.5 × 10⁴, 1.25 × 10⁵. Bottom row, from left to right: N = 1 × 10³, 5 × 10³, 2.5 × 10⁴, 1.25 × 10⁵. It can be observed that, for each example, as the number of samples increases, the distribution of Ŵ output by ysl23 becomes more uniform.

SUPPLEMENTARY MATERIAL
Supplementary material for "Manifold Fitting: an Invitation to Statistics" (doi: COMPLETED BY THE TYPESETTER; .pdf). We include all materials omitted from the main text.

REFERENCES
[1] Belkin, M. and Niyogi, P. (2003). Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation 15 1373–1396.
[2] Boissonnat, J.-D., Guibas, L. J. and Oudot, S. Y. (2009). Manifold reconstruction in arbitrary dimensions using witness complexes. Discrete & Computational Geometry 42 37–70.
[3] Calabi, E. (2015). On Kähler manifolds with vanishing canonical class. In Algebraic Geometry and Topology. A Symposium in Honor of S. Lefschetz 12 78–89.
[4] Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2015). Asymptotic theory for density ridges. The Annals of Statistics 43 1896–1928.
[5] Cheng, S.-W., Dey, T. K. and Ramos, E. A. (2005). Manifold reconstruction from point samples. In SODA 5 1018–1027.
[6] Dang, C., Safaie, A., Phanikumar, M. and Radha, H. (2015). Wind speed and direction estimation using manifold approximation. In Proceedings of the 14th International Conference on Information Processing in Sensor Networks 328–329.
[7] Deutsch, S., Ortega, A. and Medioni, G. (2016). Manifold denoising based on spectral graph wavelets. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 4673–4677. IEEE.
[8] Donoho, D. L. and Grimes, C. (2003). Hessian eigenmaps: Locally linear embedding techniques for high-dimensional data. Proceedings of the National Academy of Sciences 100 5591–5596.
[9] Dunson, D. B., Wu, H.-T. and Wu, N. (2022). Graph based Gaussian processes on restricted domains. Journal of the Royal Statistical Society Series B: Statistical Methodology 84 414–439.
[10] Dunson, D. B. and Wu, N. (2022). Inferring manifolds from noisy data using Gaussian processes. arXiv:2110.07478.
[11] Federer, H. (1959). Curvature measures. Transactions of the American Mathematical Society 93 418–491.
[12] Fefferman, C. (2006). Whitney's extension problem for C^m. Annals of Mathematics 313–359.
[13] Fefferman, C., Ivanov, S., Kurylev, Y., Lassas, M. and Narayanan, H. (2018). Fitting a putative manifold to noisy data. In Conference On Learning Theory 688–720. PMLR.
[14] Fefferman, C., Ivanov, S., Kurylev, Y., Lassas, M. and Narayanan, H. (2020). Reconstruction and interpolation of manifolds. I: The geometric Whitney problem. Foundations of Computational Mathematics 20 1035–1133.
[15] Fefferman, C., Ivanov, S., Lassas, M., Lu, J. and Narayanan, H. (2021). Reconstruction and interpolation of manifolds II: Inverse problems for Riemannian manifolds with partial distance data. arXiv:2111.14528.
[16] Fefferman, C., Ivanov, S., Lassas, M. and Narayanan, H. (2021). Fitting a manifold of large reach to noisy data. arXiv:1910.05084.
[17] Fefferman, C., Mitter, S. and Narayanan, H. (2016). Testing the manifold hypothesis. Journal of the American Mathematical Society 29 983–1049.
[18] Fefferman, C. L. (2005). A sharp form of Whitney's extension theorem. Annals of Mathematics 509–577.
[19] Genovese, C., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2012). Minimax manifold estimation. Journal of Machine Learning Research 13 1263–1291.
[20] Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2012). Manifold estimation and singular deconvolution under Hausdorff loss. The Annals of Statistics 40 941–963.
[21] Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2014). Nonparametric ridge estimation. The Annals of Statistics 42 1511–1545.
[22] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A. and Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems 27.
[23] Hanson, A. J. (1994). A construction for computer visualization of certain complex curves. Notices of the Amer. Math. Soc. 41 1156–1163.
[24] Jung, S., Dryden, I. L. and Marron, J. S. (2012). Analysis of principal nested spheres. Biometrika 99 551–568.
[25] Kim, I., Martins, R. J., Jang, J., Badloe, T., Khadir, S., Jung, H.-Y., Kim, H., Kim, J., Genevet, P. and Rho, J. (2021). Nanophotonics for light detection and ranging technology. Nature Nanotechnology 16 508–524.
[26] Lee, D.-T. and Schachter, B. J. (1980). Two algorithms for constructing a Delaunay triangulation. International Journal of Computer & Information Sciences 9 219–242.
[27] Lee, J. M. (2010). Introduction to Topological Manifolds 202. Springer Science & Business Media.
[28] Lee, J. M. (2013). Smooth manifolds. In Introduction to Smooth Manifolds 1–31. Springer.
[29] Lee, J. M. (2018). Introduction to Riemannian Manifolds 176. Springer.
[30] Luo, S. and Hu, W. (2020). Differentiable manifold reconstruction for point cloud denoising. In Proceedings of the 28th ACM International Conference on Multimedia 1330–1338.
[31] McInnes, L., Healy, J. and Melville, J. (2018). UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426.
[32] Mohammed, K. and Narayanan, H. (2017). Manifold learning using kernel density estimation and local principal components analysis. arXiv:1709.03615.
[33] Niyogi, P., Smale, S. and Weinberger, S. (2008). Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry 39 419–441.
[34] Ozertem, U. and Erdogmus, D. (2011). Locally defined principal curves and surfaces. The Journal of Machine Learning Research 12 1249–1286.
[35] Panaretos, V. M., Pham, T. and Yao, Z. (2014). Principal flows. Journal of the American Statistical Association 109 424–436.
[36] Roweis, S. T. and Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science 290 2323–2326.
[37] Sober, B. and Levin, D. (2020). Manifold approximation by moving least-squares projection (MMLS). Constructive Approximation 52 433–478.
[38] Tenenbaum, J. B., de Silva, V. and Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science 290 2319–2323.
[39] Wang, W. and Carreira-Perpinán, M. A. (2010). Manifold blurring mean shift algorithms for manifold denoising. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1759–1766. IEEE.
[40] Whitney, H. (1992). Analytic extensions of differentiable functions defined in closed sets. In Hassler Whitney Collected Papers 228–254. Springer.
[41] Yang, T. and Meng, J. (2021). Manifold fitting algorithm of noisy manifold data based on variable-scale spectral graph. Soft Computing 1–12.
[42] Yao, Z. and Xia, Y. (2019). Manifold fitting under unbounded noise. arXiv:1909.10228.
[43] Yau, S.-T. (1978). On the Ricci curvature of a compact Kähler manifold and the complex Monge-Ampère equation, I. Communications on Pure and Applied Mathematics 31 339–411.
[44] Zhang, Z. and Zha, H. (2004). Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM Journal on Scientific Computing 26 313–338.
