
Journal of Machine Learning Research 18 (2018) 1-40 Submitted 9/15; Revised 2/17; Published 4/18

Robust Topological Inference: Distance To a Measure and Kernel Distance

Frédéric Chazal [email protected]


Inria Saclay - Ile-de-France
Alan Turing Bldg, Office 2043
1 rue Honoré d’Estienne d’Orves
91120 Palaiseau, FRANCE
Brittany Fasy [email protected]
Computer Science Department
Montana State University
357 EPS Building
Montana State University
Bozeman, MT 59717
Fabrizio Lecci [email protected]
New York, NY
Bertrand Michel [email protected]
Ecole Centrale de Nantes
Laboratoire de mathématiques Jean Leray
1 Rue de La Noe
44300 Nantes FRANCE
Alessandro Rinaldo [email protected]
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213
Larry Wasserman [email protected]
Department of Statistics
Carnegie Mellon University
Pittsburgh, PA 15213

Editor: Mikhail Belkin

Abstract
Let P be a distribution with support S. The salient features of S can be quantified with persistent homology, which summarizes topological features of the sublevel sets of the distance function (the distance of any point x to S). Given a sample from P we can infer the persistent homology using an empirical version of the distance function. However, the empirical distance function is highly non-robust to noise and outliers. Even one outlier is deadly. The distance-to-a-measure (DTM), introduced by Chazal et al. (2011), and the kernel distance, introduced by Phillips et al. (2014), are smooth functions that provide useful topological information but are robust to noise and outliers. Chazal et al. (2015) derived concentration bounds for the DTM. Building on these results, we derive limiting distributions and confidence sets, and we propose a method for choosing tuning parameters.
Keywords: Topological data analysis, persistent homology, RKHS.

© 2018 Frédéric Chazal, Brittany Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo and Larry Wasserman.
License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided
at http://jmlr.org/papers/v18/15-484.html.

1. Introduction

Figure 1 shows three complex point clouds, based on a model used for simulating cosmology
data. Visually, the three samples look very similar. Below the data plots are the persistence
diagrams, which are summaries of topological features defined in Section 2. The persistence
diagrams make it clearer that the third data set is from a different data generating process
than the first two.

Figure 1: The first two datasets come from the same data generating mechanism. In the
third one, the particles are more concentrated around the walls of the Voronoi
cells. Although the difference is not clear from the scatterplots, it is evident from
the persistence diagrams of the sublevel sets of the distance-to-measure functions.
See Example 3 for more details on the Voronoi Models.

This is an example of how topological features can summarize structure in point clouds.
The field of topological data analysis (TDA) is concerned with defining such topological
features; see Carlsson (2009). When performing TDA, it is important to use topological
measures that are robust to noise. This paper explores some of these robust topological
measures.


Let P be a distribution with compact support S ⊂ R^d. One way to describe the shape of S is by using homology. Roughly speaking, the homology of S measures the topological features of S, such as the connected components, the holes, and the voids. A more nuanced way to describe the shape of S is using persistent homology, which is a multiscale version of homology. To describe persistent homology, we begin with the distance function ∆_S : R^d → R for S, which is defined by

$$\Delta_S(x) = \inf_{y \in S} \|x - y\|. \tag{1}$$

The sublevel sets L_t = {x : ∆_S(x) ≤ t} provide multiscale topological information about S. As t varies from zero to ∞, topological features — connected components, loops, voids — are born and die. Persistent homology quantifies the evolution of these topological features as a function of t. See Figure 2. Each point on the persistence diagram represents the birth and death time of a topological feature.
Given a sample X₁, . . . , Xₙ ∼ P, the empirical distance function is defined by

$$\widehat{\Delta}(x) = \min_{X_i} \|x - X_i\|. \tag{2}$$

If P is supported on S and has a density bounded away from zero and infinity, then $\widehat{\Delta}$ is a consistent estimator of ∆_S, i.e., $\sup_x |\widehat{\Delta}(x) - \Delta_S(x)| \xrightarrow{P} 0$. However, if there are outliers or noise, then $\widehat{\Delta}(x)$ is no longer consistent. Figure 3 (bottom) shows that a few outliers completely change the distance function. In the language of robust statistics, the empirical distance function has breakdown point zero.
A more robust approach is to estimate the persistent homology of the super-level sets of the density p of P. As long as P is concentrated near S, we expect the level sets of p to provide useful topological information about S. Specifically, some level sets of p are homotopic to S under weak conditions, and this implies that we can estimate the homology of S. Note that, in this case, we are using the persistent homology of the super-level sets of p to estimate the homology of S. This is the approach suggested by Bubenik (2015), Fasy et al. (2014b) and Bobrowski et al. (2014). A related idea is to use persistent homology based on a kernel distance (Phillips et al., 2014). In fact, the sublevel sets of the kernel distance are a rescaling of the super-level sets of p, so these two ideas are essentially equivalent. We discuss this approach in Section 5.
A different approach, more closely related to the distance function but robust to noise, is to use the distance-to-a-measure (DTM), δ ≡ δ_{P,m}, from Chazal et al. (2011); see Section 2. An estimate δ̂ of δ is obtained by replacing the true probability measure with the empirical probability measure P_n, or with a deconvolved version of the observed measure (Caillerie et al., 2011). One then constructs a persistence diagram based on the sublevel sets of the DTM. See Figure 1. This approach is aimed at estimating the persistent homology of S. (The DTM also suggests new approaches to density estimation; see Biau et al. (2011).)
The density estimation approach and the DTM are both trying to probe the topology of S. But the former uses persistent homology to estimate the homology of S, while the DTM directly estimates the persistent homology of the distance function of S. We discuss this point in detail in Section 9.1.
In this paper, we explore some statistical properties of these methods. In particular:



1. We show that √n(δ̂²(x) − δ²(x)) converges to a Gaussian process. (Theorem 5).
2. We show that the bootstrap provides asymptotically valid confidence bands for δ. This allows us to identify significant topological features. (Theorem 19).
3. We find the limiting distribution of a key topological quantity called the bottleneck distance. (Section 4.1).
4. We also show that, under additional assumptions, there is another version of the bootstrap—which we call the bottleneck bootstrap—that provides more precise inferences. (Section 6).
5. We show similar results for the kernel distance. (Section 5).
6. We propose a method for choosing the tuning parameter m for the DTM and the bandwidth h for the kernel distance. (Section 7.1).
7. We show that the DTM and the kernel density estimator (KDE) both suffer from boundary bias and we suggest a method for reducing the bias. (Section 7.2).

Notation. B(x, ε) is a Euclidean ball of radius ε, centered at x. We define A ⊕ ε = ∪_{x∈A} B(x, ε), the union of ε-balls centered at points in A. If x is a vector then ‖x‖_∞ = max_j |x_j|. Similarly, if f is a real-valued function then ‖f‖_∞ = sup_x |f(x)|. We write Xₙ ⇝ X to mean that Xₙ converges in distribution to X, and we use symbols like c, C, . . . as generic positive constants.
Remark: The computing for the examples in this paper was done using the R package TDA. See Fasy et al. (2014a). The package can be downloaded from http://cran.r-project.org/web/packages/TDA/index.html.
Remark: In this paper, we discuss the DTM which uses a smoothing parameter m and
the kernel density estimator which uses a smoothing bandwidth h. Unlike in traditional
function estimation, we do not send these parameters to zero as n increases. In TDA, the
topological features created with a fixed smoothing parameter are of interest. Thus, all
the theory in this paper treats the smoothing parameters as being bounded away from 0.
See also Section 4.4 in Fasy et al. (2014b). In Section 7.1, we discuss the choice of these
smoothing parameters.

2. Background
In this section, we define several distance functions and distance-like functions, and we
introduce the relevant concepts from computational topology. For more detail, we refer the
reader to Edelsbrunner and Harer (2010).

2.1 Distance Functions and Persistent Homology


Let S ⊂ Rd be a compact set. The homology of S characterizes certain topological features
of S, such as its connected components, holes, and voids. Persistent homology is a multiscale
version of homology. Recall that the distance function ∆S for S is

$$\Delta_S(x) = \inf_{y \in S} \|x - y\|. \tag{3}$$


[Figure 2 appears here: four panels titled "Circle", "Distance Function", "Sublevel Set, t = 0.25", and "Persistence Diagram"; the diagram has axes Birth (horizontal, 0.0 to 1.2) and Death (vertical), with legend entries "dim 0" and "dim 1".]
Figure 2: The left plot shows a one-dimensional curve. The second plot is the distance
function. The third plot shows a typical sublevel set of the distance function.
The fourth plot is the persistence diagram which shows the birth and death times
of loops (triangles) and connected components (points) of the sublevel sets.

Let L_t = {x : ∆_S(x) ≤ t}. We will refer to the parameter t as "time."

Given the nested family of the sublevel sets of ∆_S, the topology of L_t changes as t increases: new connected components can appear, existing connected components can merge, cycles and cavities can appear or be filled, etc. Persistent homology tracks these changes, identifies features, and associates an interval or lifetime (from t_birth to t_death) to them. For instance, a connected component is a feature that is born at the smallest t such that the component is present in L_t, and dies when it merges with an older connected component. Intuitively, the longer a feature persists, the more relevant it is.
A feature, or more precisely its lifetime, can be represented as a segment whose extremities have abscissae t_birth and t_death; the set of these segments is called the barcode of ∆_S. An interval can also be represented as a point in the plane with coordinates (u, v) = (t_birth, t_death). The set of points (with multiplicity) representing the intervals is called the persistence diagram of ∆_S. Note that the diagram is entirely contained in the half-plane above the diagonal defined by u = v, since death always occurs after birth. This diagram is well-defined for any compact set S (Chazal et al. (2012), Theorem 2.22). The most persistent features (supposedly the most important) are those represented by the points furthest from the diagonal in the diagram, whereas points close to the diagonal can be interpreted as (topological) noise.
Figure 2 shows a simple example. Here, the points on the circle are regarded as a subset
of R2 . At time zero, there is one connected component and one loop. As t increases, the
loop dies.
Let S₁ and S₂ be compact sets with distance functions ∆₁ and ∆₂ and diagrams D₁ and D₂. The bottleneck distance between D₁ and D₂ is defined by

$$W_\infty(D_1, D_2) = \min_{g: D_1 \to D_2}\; \sup_{z \in D_1} \|z - g(z)\|_\infty, \tag{4}$$

where the minimum is over all bijections between D₁ and D₂. In words, the bottleneck distance is the maximum distance between the points of the two diagrams, after minimizing over all possible pairings of the points (including the points on the diagonals).


A fundamental property of persistence diagrams is their stability. According to the Persistence Stability Theorem (Cohen-Steiner et al. (2005); Chazal et al. (2012)),

$$W_\infty(D_1, D_2) \le \|\Delta_1 - \Delta_2\|_\infty = H(S_1, S_2). \tag{5}$$

Here, H is the Hausdorff distance, namely,

$$H(A, B) = \inf\bigl\{\varepsilon : A \subset B \oplus \varepsilon \ \text{and} \ B \subset A \oplus \varepsilon\bigr\},$$

where we recall that A ⊕ ε = ∪_{x∈A} B(x, ε). More generally, the definition of persistence diagrams and the above stability theorem are not restricted to distance functions; they extend to families of sublevel sets (resp. upper-level sets) of functions defined on R^d under very weak assumptions. We refer the reader to Edelsbrunner and Harer (2010); Chazal et al. (2009, 2012) for a detailed exposition of the theory.
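For finite point sets, the Hausdorff distance appearing in (5) is simple to compute. A minimal sketch (ours; names illustrative) using SciPy, where the symmetric distance is the larger of the two directed distances:

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

rng = np.random.default_rng(0)
S1 = rng.normal(size=(100, 2))                 # a finite sample of one set
S2 = S1 + 0.05 * rng.normal(size=S1.shape)     # a small perturbation of it

# H(S1, S2) is the max of the two directed Hausdorff distances.
H = max(directed_hausdorff(S1, S2)[0], directed_hausdorff(S2, S1)[0])
print(H)  # small, so by (5) the two persistence diagrams are also close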
Given a sample X₁, . . . , Xₙ ∼ P, the empirical distance function is defined by

$$\widehat{\Delta}(x) = \min_{X_i} \|x - X_i\|. \tag{6}$$

Lemma 1 (Lemma 4 in Fasy et al., 2014b) Suppose that P is supported on S, and has a density bounded away from zero and infinity. Then

$$\sup_x |\widehat{\Delta}(x) - \Delta_S(x)| \xrightarrow{P} 0.$$

See also Cuevas and Rodríguez-Casal (2004). The previous lemma justifies using $\widehat{\Delta}$ to estimate the persistent homology of sublevel sets of ∆_S. In fact, the sublevel sets of $\widehat{\Delta}$ are just unions of balls around the observed data. That is,

$$L_t = \bigl\{x : \widehat{\Delta}(x) \le t\bigr\} = \bigcup_{i=1}^n B(X_i, t).$$

The persistent homology of the union of the balls as t increases may be computed by creating a combinatorial representation (called a Čech complex) of the union of balls, and then applying basic operations from linear algebra (Edelsbrunner and Harer, 2010, Sections VI.2 and VII.1).
However, as soon as there is noise or outliers, the empirical distance function becomes useless, as illustrated in Figure 3. More specifically, suppose that

$$P = \pi R + (1 - \pi)(Q \star \Phi_\sigma), \tag{7}$$

where π ∈ [0, 1], R is an outlier distribution (such as a uniform on a large set), Q is supported on S, ⋆ denotes convolution, and Φ_σ is a compactly supported noise distribution with scale parameter σ.
Recovering the persistent homology of ∆_S exactly (or even the homology of S) is not possible in general since the problem is under-identified. But we would still like to find a function that is similar to the distance function for S. The empirical distance function fails miserably even when π and σ are small. Instead, we now turn to the DTM.


Figure 3: Top: data on the Cassini curve, the distance function $\widehat{\Delta}$, a typical sublevel set $\{x : \widehat{\Delta}(x) \le t\}$ and the resulting persistence diagram. Bottom: the effect of adding a few outliers. Note that the distance function and persistence diagram are dramatically different.

2.2 Distance to a Measure

Given a probability measure P, for 0 < m < 1, the distance-to-a-measure (DTM) at resolution m (Chazal et al., 2011) is defined by

$$\delta(x) \equiv \delta_{P,m}(x) = \sqrt{\frac{1}{m} \int_0^m \bigl(G_x^{-1}(u)\bigr)^2 \, du}, \tag{8}$$

where G_x(t) = P(‖X − x‖ ≤ t). Alternatively, the DTM can be defined using the cdf of the squared distances, as in the following lemma:

Lemma 2 (Chazal et al., 2015) Let F_x(t) = P(‖X − x‖² ≤ t). Then

$$\delta_{P,m}^2(x) = \frac{1}{m} \int_0^m F_x^{-1}(u) \, du.$$

Proof For any 0 < u < 1,

$$\bigl(G_x^{-1}(u)\bigr)^2 = \inf\bigl\{t^2 : G_x(t) \ge u\bigr\} = \inf\bigl\{t^2 : P(\|X - x\| \le t) \ge u\bigr\}
= \inf\bigl\{t : P(\|X - x\|^2 \le t) \ge u\bigr\} = \inf\{t : F_x(t) \ge u\} = F_x^{-1}(u).$$

Therefore

$$\delta_{P,m}^2(x) = \frac{1}{m} \int_0^m \bigl(G_x^{-1}(u)\bigr)^2 \, du = \frac{1}{m} \int_0^m F_x^{-1}(u) \, du.$$
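Lemma 2 is easy to check numerically for an empirical measure. In the sketch below (ours, assuming mn is an integer), the empirical quantile function of the squared distances is a step function, so the integral (1/m)∫₀^m F_x^{-1}(u) du reduces to the average of the k = mn smallest squared distances, which is exactly formula (9) below:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
x = np.zeros(2)
m = 0.1                                    # chosen so that k = m * n is an integer
k = int(m * len(X))

d2 = np.sort(((X - x) ** 2).sum(axis=1))   # sorted squared distances to x

lhs = d2[:k].mean()                        # k-nearest-neighbor form, as in (9)
u = np.linspace(0, m, 10_000, endpoint=False) + m / 20_000  # midpoint grid on (0, m)
rhs = d2[np.floor(u * len(X)).astype(int)].mean()  # (1/m) int_0^m F_x^{-1}(u) du
print(lhs, rhs)                            # the two agree
```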

Given a sample X₁, . . . , Xₙ ∼ P, let Pₙ be the probability measure that puts mass 1/n on each Xᵢ. It is easy to see that the distance to the measure Pₙ at resolution m is

$$\widehat{\delta}^2(x) \equiv \delta_{P_n,m}^2(x) = \frac{1}{k} \sum_{X_i \in N_k(x)} \|X_i - x\|^2, \tag{9}$$

where k = ⌈mn⌉ and N_k(x) is the set containing the k nearest neighbors of x among X₁, . . . , Xₙ. We will use $\widehat{\delta}$ to estimate δ.
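A direct implementation sketch of (9) (ours; names illustrative) using a k-d tree. Unlike the empirical distance function, the value at a point barely moves when a few outliers are added, since each outlier perturbs at most one of the k terms in the average:

```python
import numpy as np
from scipy.spatial import cKDTree

def dtm(X, query, m):
    """Empirical distance to measure (9): root of the average squared
    distance to the k = ceil(m * n) nearest sample points."""
    k = int(np.ceil(m * len(X)))
    d, _ = cKDTree(X).query(query, k=k)     # distances to k nearest neighbors
    d = np.asarray(d).reshape(len(query), k)
    return np.sqrt((d ** 2).mean(axis=1))

rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, 500)
X = np.column_stack([np.cos(theta), np.sin(theta)])   # the unit circle ...
X = np.vstack([X, rng.uniform(-3, 3, size=(10, 2))])  # ... plus 10 outliers
print(dtm(X, np.array([[0.0, 0.0]]), m=0.1))  # stays close to 1 at the center
```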
Now we summarize some important properties of the DTM, all of which are proved in
Chazal et al. (2011) and Buchet et al. (2013). First, recall that the Wasserstein distance of
order p between two probability measures P and Q is given by
$$W_p(P, Q) = \left( \inf_J \int \|x - y\|^p \, dJ(x, y) \right)^{1/p}, \tag{10}$$

where the infimum is over all joint distributions J for (X, Y) such that X ∼ P and Y ∼ Q.
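Between two uniform empirical measures on samples of equal size, the infimum in (10) is attained by a permutation, so W₂ can be computed exactly as an assignment problem. A sketch (ours; names illustrative):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def wasserstein2(X, Y):
    """Exact W_2 (10) between the uniform empirical measures on two
    equal-size samples, via optimal assignment on squared costs."""
    C = cdist(X, Y, metric="sqeuclidean")
    rows, cols = linear_sum_assignment(C)   # minimizes the total coupling cost
    return np.sqrt(C[rows, cols].mean())

rng = np.random.default_rng(0)
print(wasserstein2(rng.normal(size=(50, 2)), 1 + rng.normal(size=(50, 2))))
```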
We say that P satisfies the (a, b)-condition if there exist a, b > 0 such that, for every x in the support of P and every ε > 0,

$$P\bigl(B(x, \varepsilon)\bigr) \ge a \varepsilon^b. \tag{11}$$

This means that the support does not have long, thin components.
Theorem 3 (Properties of DTM) The following properties hold:

1. The distance to measure is 1-Lipschitz: for any probability measure P on R^d and any x, x′ ∈ R^d,
$$|\delta_{P,m}(x) - \delta_{P,m}(x')| \le \|x - x'\|.$$

2. If Q satisfies (11) and is supported on a compact set S, then
$$\sup_x |\delta_{Q,m}(x) - \Delta_S(x)| \le a^{-1/b} m^{1/b}. \tag{12}$$
In particular, sup_x |δ_{Q,m}(x) − ∆_S(x)| → 0 as m → 0.

3. If P and Q are two distributions, then
$$\sup_x |\delta_{P,m}(x) - \delta_{Q,m}(x)| \le \frac{1}{\sqrt{m}} W_2(P, Q). \tag{13}$$

4. If Q satisfies (11) and is supported on a compact set S and P is another distribution (not necessarily supported on S), then
$$\sup_x |\delta_{P,m}(x) - \Delta_S(x)| \le a^{-1/b} m^{1/b} + \frac{1}{\sqrt{m}} W_2(P, Q). \tag{14}$$
Hence, if m ≍ W₂(P, Q)^{2b/(2+b)}, then sup_x |δ_{P,m}(x) − ∆_S(x)| = O(W₂(P, Q)^{2/(2+b)}).


5. Let D_P be the diagram from δ_{P,m} and let D_Q be the diagram from δ_{Q,m}. Then
$$W_\infty(D_P, D_Q) \le \|\delta_{P,m} - \delta_{Q,m}\|_\infty. \tag{15}$$

For any compact set A ⊂ R^d, let r(A) denote the radius of the smallest enclosing ball of A centered at zero:
$$r(A) = \inf\{r > 0 : A \subset B(0, r)\}.$$

We conclude this section by bounding the distance between the diagrams D_{δ_{P,m}} and D_{∆_S}.

Lemma 4 (Comparison of Diagrams) Let P = πR + (1 − π)(Q ⋆ Φ_σ), where Q is supported on S and satisfies (11), R is uniform on a compact set A ⊂ R^d and Φ_σ = N(0, σ²I). Then,

$$W_\infty\bigl(D_{\delta_{P,m}}, D_{\Delta_S}\bigr) \le a^{-1/b} m^{1/b} + \frac{\pi \sqrt{r(A)^2 + 2 r(S)^2 + 2\sigma^2} + \sigma}{\sqrt{m}}.$$

Proof We first apply the stability theorem and parts 4 and 5 of the previous result:

$$W_\infty\bigl(D_{\delta_{P,m}}, D_{\Delta_S}\bigr) \le a^{-1/b} m^{1/b} + \frac{1}{\sqrt{m}} W_2(P, Q).$$

The term W₂(P, Q) can be upper bounded as follows:

$$W_2(P, Q) \le W_2(P, Q \star \Phi_\sigma) + W_2(Q \star \Phi_\sigma, Q).$$

These two terms can be bounded with simple transport plans. Let Z be a Bernoulli random variable with parameter π. Let X and Y be random variables with distributions R and Q ⋆ Φ_σ. We take these three random variables to be independent. Then, the random variable V defined by V = ZX + (1 − Z)Y has for its distribution the mixture distribution P. By the definition of W₂, one has

$$W_2^2(P, Q \star \Phi_\sigma) \le \mathbb{E}\|V - Y\|^2 \le \mathbb{E}|Z|^2\, \mathbb{E}\|X - Y\|^2,$$

by the definition of V and by the independence of Z and X − Y. Next, we have E‖X‖² ≤ r(A)² and E‖Y‖² ≤ 2[r(S)² + σ²]. Thus

$$W_2^2(P, Q \star \Phi_\sigma) \le \pi^2 \bigl(r(A)^2 + 2 r(S)^2 + 2\sigma^2\bigr).$$

It can be checked in a similar way that W₂(Q ⋆ Φ_σ, Q) ≤ σ (see for instance the proof of Proposition 1 in Caillerie et al. (2011)), and the Lemma is proved.

Remark: Note that when π and σ are small (and m tends to 0), the diagrams D_{δ_{P,m}} and D_{∆_S} are close.


3. Limiting Distribution of the Empirical DTM

In this section, we find the limiting distribution of $\widehat{\delta}$ and we use this to find confidence bands for δ(x). We start with the pointwise limit. Let δ(x) ≡ δ_{P,m}(x) and $\widehat{\delta}(x) \equiv \delta_{P_n,m}(x)$, as defined in the previous section.

Theorem 5 (Convergence to Normal Distribution) Let P be some distribution in R^d. For some fixed x, assume that F_x is differentiable at F_x^{-1}(m), for m ∈ (0, 1), with positive derivative F_x′(F_x^{-1}(m)). Then we have

$$\sqrt{n}\bigl(\widehat{\delta}^2(x) - \delta^2(x)\bigr) \rightsquigarrow N(0, \sigma_x^2), \tag{16}$$

where

$$\sigma_x^2 = \frac{1}{m^2} \int_0^{F_x^{-1}(m)} \int_0^{F_x^{-1}(m)} \bigl[F_x(s \wedge t) - F_x(s) F_x(t)\bigr] \, ds \, dt.$$

Remark 6 Note that assuming that F_x is differentiable is not a strong assumption. According to the Lebesgue differentiation theorem on R, it is satisfied as soon as the pushforward measure of P by the function ‖x − ·‖² is absolutely continuous with respect to the Lebesgue measure on R.

Proof From Lemma 2,

$$\delta^2(x) = \frac{1}{m} \int_0^m \bigl(G_x^{-1}(t)\bigr)^2 \, dt = \frac{1}{m} \int_0^m F_x^{-1}(t) \, dt,$$

where G_x(t) = P(‖X − x‖ ≤ t) and F_x(t) = P(‖X − x‖² ≤ t). So

$$\sqrt{n}\bigl(\widehat{\delta}^2(x) - \delta^2(x)\bigr) = \frac{1}{m} \int_0^m \sqrt{n}\bigl[\widehat{F}_x^{-1}(t) - F_x^{-1}(t)\bigr] \, dt. \tag{17}$$

First suppose that $\widehat{F}_x^{-1}(m) > F_x^{-1}(m)$. Then, by integrating "horizontally" rather than "vertically", we can split the integral into two parts, as illustrated in Figure 4:

$$\frac{1}{m} \int_0^m \sqrt{n}\bigl[\widehat{F}_x^{-1}(t) - F_x^{-1}(t)\bigr] dt
= \frac{1}{m} \int_0^{F_x^{-1}(m)} \sqrt{n}\bigl[F_x(t) - \widehat{F}_x(t)\bigr] dt
+ \frac{1}{m} \int_{F_x^{-1}(m)}^{\widehat{F}_x^{-1}(m)} \sqrt{n}\bigl[m - \widehat{F}_x(t)\bigr] dt
\equiv A_n(x) + R_n(x). \tag{18}$$

[Figure 4: The integral of (17) can be decomposed into two parts, A_n and R_n.]

Next, it can be easily checked that (18) is also true when $\widehat{F}_x^{-1}(m) < F_x^{-1}(m)$ if we take $\int_a^b f(u)\,du := -\int_b^a f(u)\,du$ when a > b. Now, since F_x is differentiable at F_x^{-1}(m), we have that $\widehat{F}_x^{-1}(m) - F_x^{-1}(m) = O_P(1/\sqrt{n})$; see for instance Corollary 21.5 in van der Vaart (2000). According to the DKW (Dvoretzky-Kiefer-Wolfowitz) inequality we have that $\sup_t |F_x(t) - \widehat{F}_x(t)| = O_P(\sqrt{1/n})$ and thus

$$|R_n| \le \frac{\sqrt{n}}{m} \bigl|F_x^{-1}(m) - \widehat{F}_x^{-1}(m)\bigr| \sup_t \bigl|F_x(t) - \widehat{F}_x(t)\bigr| = o_P(1).$$

Next, note that $\sqrt{n}[F_x(t) - \widehat{F}_x(t)] \rightsquigarrow \mathbb{B}(t)$, where $\mathbb{B}(t)$ is a Gaussian process with covariance function $[F_x(s \wedge t) - F_x(s)F_x(t)]$ (see, for example, van der Vaart and Wellner (1996)). By taking the integral, which is a bounded operator, we have that

$$A_n \rightsquigarrow \frac{1}{m} \int_0^{F_x^{-1}(m)} \mathbb{B}(t) \, dt \stackrel{d}{=} N(0, \sigma_x^2),$$

where

$$\sigma_x^2 = \frac{1}{m^2} \int_0^{F_x^{-1}(m)} \int_0^{F_x^{-1}(m)} \bigl[F_x(s \wedge t) - F_x(s) F_x(t)\bigr] \, ds \, dt.$$
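Theorem 5 can be illustrated numerically in a case where everything is explicit. The following sanity check (ours, not from the paper) takes P uniform on [0, 1] and x = 0, so that F_x(t) = √t on [0, 1], δ²(0) = m²/3, and the double integral above evaluates in closed form to σ_x² = (8/15)m³ − (4/9)m⁴:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, B = 0.3, 2000, 2000
k = int(np.ceil(m * n))

delta2 = m ** 2 / 3                        # delta^2(0) = (1/m) int_0^m u^2 du
sigma2 = 8 / 15 * m ** 3 - 4 / 9 * m ** 4  # closed form of the double integral

stats = []
for _ in range(B):
    X = rng.uniform(0, 1, n)
    d2 = np.sort(X ** 2)[:k]               # k smallest squared distances to 0
    stats.append(np.sqrt(n) * (d2.mean() - delta2))
print(np.var(stats), sigma2)               # the two variances should be close
```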

Now, we consider the functional limit of the distance to measure, on a compact domain X ⊂ R^d. The functional convergence of the DTM requires assumptions on the regularity of the quantile functions F_x^{-1}. We say that ω_x : (0, 1) → R₊ is a modulus of (uniform) continuity of F_x^{-1} if, for any u ∈ (0, 1),

$$\sup_{(m, m') \in (0,1)^2,\ |m' - m| < u} \bigl|F_x^{-1}(m') - F_x^{-1}(m)\bigr| \le \omega_x(u),$$

with lim_{u→0} ω_x(u) = ω_x(0) = 0. We say that ω_X : (0, 1) → R₊ is a uniform modulus of continuity for the family of quantile functions (F_x^{-1})_X if, for any u ∈ (0, 1) and any x ∈ X,

$$\sup_{(m, m') \in (0,1)^2,\ |m' - m| < u} \bigl|F_x^{-1}(m') - F_x^{-1}(m)\bigr| \le \omega_X(u),$$

with lim_{u→0} ω_X(u) = ω_X(0) = 0. When such a modulus of continuity ω exists, note that it can always be chosen nondecreasing, and this allows us to consider its generalized inverse ω^{-1}.
One may ask whether the existence of the uniform modulus of continuity over a compact domain X is a strong assumption. To answer this question, let us introduce the following assumption:

(H_{ω,X}): for any x ∈ X, the pushforward measure P_x of P by ‖x − ·‖² is supported on a finite closed interval.

Note that Assumption (H_{ω,X}) is not very strong. For instance, it is satisfied for a measure P supported on a compact and connected manifold, with P_x absolutely continuous with respect to the Hausdorff measure on P. The following Lemma derives from general results on quantile functions given in Bobkov and Ledoux (2014) (see their Appendix A); the lemma shows that a uniform modulus of continuity for the quantiles exists under Assumption (H_{ω,X}).


Lemma 7 (Existence of Uniform Modulus of Continuity) Let X be a compact domain and let P be a measure with compact support in R^d. Assume that Assumption (H_{ω,X}) is satisfied. Then there exists a uniform modulus of continuity for the family of quantile functions F_x^{-1} over X.

Proof Let x ∈ X. According to Propositions A.7 and A.12 in Bobkov and Ledoux (2014), Assumption (H_{ω,X}) is equivalent to assuming the existence of a modulus of continuity of F_x^{-1} (it tends to zero at zero). We can then define ω_x on (0, 1) by

$$u \in (0, 1) \mapsto \omega_x(u) := \sup_{(m, m') \in (0,1)^2,\ |m' - m| < u} \bigl|F_x^{-1}(m') - F_x^{-1}(m)\bigr|.$$

According to Lemma 8, we have that for any (x, x′) ∈ X²,

$$\bigl|F_{x'}^{-1}(m) - F_x^{-1}(m)\bigr| \le C \|x' - x\|, \tag{19}$$

where C only depends on P and X. According to (19), for any (m, m′) ∈ (0, 1)², and for any (x, x′) ∈ X²,

$$\bigl|F_x^{-1}(m') - F_x^{-1}(m)\bigr| \le \bigl|F_{x'}^{-1}(m') - F_{x'}^{-1}(m)\bigr| + 2C\|x' - x\|.$$

By taking the supremum over the m and the m′ such that |m′ − m| < u, it yields

$$\omega_x(u) \le \omega_{x'}(u) + 2C\|x' - x\|,$$

and x ↦ ω_x(u) is thus Lipschitz for any u. For any u ∈ (0, 1), let

$$\omega_X(u) := \sup_{x \in X} \omega_x(u),$$

which is finite because the function x ↦ ω_x(u) is continuous on the compact X for any u ∈ (0, 1). We only need to prove that ω_X is continuous at 0. Let (u_n) ∈ (0, 1)^N be a sequence decreasing to zero. Since ω_X is a nondecreasing function, ω_X(u_n) has a limit. For any n ∈ N, there exists a point x_n ∈ X such that ω_X(u_n) = ω_{x_n}(u_n). Let x_{φ(n)} be a subsequence which converges to x̄ ∈ X. According to (19),

$$\omega_X(u_{\varphi(n)}) \le \bigl|\omega_{x_{\varphi(n)}}(u_{\varphi(n)}) - \omega_{\bar{x}}(u_{\varphi(n)})\bigr| + \omega_{\bar{x}}(u_{\varphi(n)}) \le C\bigl\|x_{\varphi(n)} - \bar{x}\bigr\| + \omega_{\bar{x}}(u_{\varphi(n)}),$$

which gives that ω_X(u_{φ(n)}) and ω_X(u_n) both tend to zero because ω_{x̄} is continuous at zero. Thus ω_X is continuous at zero and the Lemma is proved.

We will also need the following result, which shows that on any compact domain X, the function x ↦ F_x^{-1}(m) is Lipschitz. For a domain X ⊂ R^d, a probability P and a level m, we introduce the quantity q_{P,X}(m) ∈ R̄, defined by

$$q_{P,X}(m) := \sup_{x \in X} F_x^{-1}(m).$$


Lemma 8 (Lipschitz Lemma) Let P be a measure on R^d and let m ∈ (0, 1). Then, for any (x, x′) ∈ R^d × R^d,

$$\Bigl|\sqrt{F_{x'}^{-1}(m)} - \sqrt{F_x^{-1}(m)}\Bigr| \le \|x' - x\|.$$

Moreover, if X is a compact domain in R^d, then q_{P,X}(m) < ∞ and for any (x, x′) ∈ X²,

$$\bigl|F_{x'}^{-1}(m) - F_x^{-1}(m)\bigr| \le 2\sqrt{q_{P,X}(m)}\, \|x' - x\|.$$

Proof Let (x, a) ∈ R^d × R^d, and note that

$$B\Bigl(x, \sqrt{F_x^{-1}(m)}\Bigr) \subseteq B\Bigl(x + a, \sqrt{F_x^{-1}(m)} + \|a\|\Bigr),$$

which implies

$$m = P\Bigl(B\Bigl(x, \sqrt{F_x^{-1}(m)}\Bigr)\Bigr) \le P\Bigl(B\Bigl(x + a, \sqrt{F_x^{-1}(m)} + \|a\|\Bigr)\Bigr).$$

Therefore $\sqrt{F_{x+a}^{-1}(m)} \le \sqrt{F_x^{-1}(m)} + \|a\|$. Similarly,

$$m = P\Bigl(B\Bigl(x + a, \sqrt{F_{x+a}^{-1}(m)}\Bigr)\Bigr) \le P\Bigl(B\Bigl(x, \sqrt{F_{x+a}^{-1}(m)} + \|a\|\Bigr)\Bigr),$$

which implies $\sqrt{F_x^{-1}(m)} \le \sqrt{F_{x+a}^{-1}(m)} + \|a\|$.
Let X be a compact domain of R^d; then, according to the previous result, for some fixed x ∈ X and for any x′ ∈ X, $\sqrt{F_{x'}^{-1}(m)} \le \|x' - x\| + \sqrt{F_x^{-1}(m)}$, which is bounded on X. The last statement follows from the fact that |x − y| = |√x − √y| |√x + √y|.

We are now in a position to state the functional limit of the DTM of the empirical measure.

Theorem 9 (Functional Limit) Let P be a measure on R^d with compact support. Let X be a compact domain in R^d and m ∈ (0, 1). Assume that there exists a uniform modulus of continuity ω_X for the family (F_x^{-1})_X. Then $\sqrt{n}(\widehat{\delta}^2(x) - \delta^2(x)) \rightsquigarrow \mathbb{B}(x)$ for a centered Gaussian process $\mathbb{B}(x)$ with covariance kernel

$$\kappa(x, y) = \frac{1}{m^2} \int_0^{F_x^{-1}(m)} \int_0^{F_y^{-1}(m)} \Bigl[ P\bigl(B(x, \sqrt{t}) \cap B(y, \sqrt{s})\bigr) - F_x(t) F_y(s) \Bigr] \, ds \, dt.$$

Remark 10 Note that the functional limit is valid for any value of m ∈ (0, 1). A local version of this result could also be proposed by considering the (local) moduli of continuity of the quantile functions at m. For the sake of clarity, we prefer to give a global version.

Remark 11 By the delta method (as described in Section 4), a similar result holds for $\sqrt{n}(\widehat{\delta}(x) - \delta(x))$ as long as inf_x δ(x) > 0.
Proof In the proof of Theorem 5 we showed that $\sqrt{n}(\widehat{\delta}^2(x) - \delta^2(x)) = A_n(x) + R_n(x)$, where

$$A_n(x) = \frac{1}{m} \int_0^{F_x^{-1}(m)} \sqrt{n}\bigl[F_x(t) - \widehat{F}_x(t)\bigr] dt, \qquad R_n(x) = \frac{1}{m} \int_{F_x^{-1}(m)}^{\widehat{F}_x^{-1}(m)} \sqrt{n}\bigl[m - \widehat{F}_x(t)\bigr] dt.$$

First, we show that $\sup_{x\in X} |R_n(x)| = o_P(1)$. Then we prove that A_n(x) converges to a Gaussian process.
Note that $|R_n(x)| \le \frac{\sqrt{n}}{m} |S_n(x)|\,|T_n(x)|$, where

$$S_n(x) = F_x^{-1}(m) - \widehat{F}_x^{-1}(m), \qquad T_n(x) = \sup_t \bigl|F_x(t) - \widehat{F}_x(t)\bigr|.$$

Let ξ_i ∼ Uniform(0, 1), for i = 1, . . . , n, and let H_n be their empirical distribution function. Define k = mn. Then $\widehat{F}_x^{-1}(m) \stackrel{d}{=} F_x^{-1}(\xi_{(k)}) = F_x^{-1}\bigl(H_n^{-1}(m)\bigr)$, where ξ_{(k)} is the kth order statistic. Thus, for any m > 0 and any x ∈ X:

$$P(|S_n(x)| > \epsilon) = P\bigl(|F_x^{-1}(H_n^{-1}(m)) - F_x^{-1}(m)| > \epsilon\bigr)
\le P\bigl(\omega_X(|m - H_n^{-1}(m)|) > \epsilon\bigr)
\le P\bigl(|m - H_n^{-1}(m)| > \omega_X^{-1}(\epsilon)\bigr)
\le 2 \exp\left\{ -\frac{n\, \omega_X^{-1}(\epsilon)^2}{m} \cdot \frac{1}{1 + \frac{2\omega_X^{-1}(\epsilon)}{3m}} \right\}. \tag{20}$$

In the last line we used inequality 1 on page 453 and Point (12) of Proposition 1 on page 455 of Shorack and Wellner (2009). Note that $\omega_X^{-1}(\epsilon) > 0$ for any ε > 0 because ω_X is assumed to be continuous at zero by definition.
Fix ε > 0. There exists an absolute constant C_X such that there exist an integer N ≤ C_X ε^{−d} and N points (x₁, . . . , x_N) lying in X such that $\bigcup_{j=1,\ldots,N} B_j \supseteq X$, where B_j = B(x_j, ε). Now, we apply Lemma 8 with P, and with P_n, and we find that for any x ∈ B_j:

$$\bigl|F_x^{-1}(m) - F_{x_j}^{-1}(m)\bigr| \le 2\sqrt{q_{P,X}(m)}\, \varepsilon \quad \text{and} \quad \bigl|\widehat{F}_x^{-1}(m) - \widehat{F}_{x_j}^{-1}(m)\bigr| \le 2\sqrt{q_{P_n,X}(m)}\, \varepsilon.$$

Thus, for any x ∈ B_j,

$$\bigl|F_x^{-1}(m) - \widehat{F}_x^{-1}(m)\bigr| \le \bigl|F_x^{-1}(m) - F_{x_j}^{-1}(m)\bigr| + \bigl|F_{x_j}^{-1}(m) - \widehat{F}_{x_j}^{-1}(m)\bigr| + \bigl|\widehat{F}_{x_j}^{-1}(m) - \widehat{F}_x^{-1}(m)\bigr|
\le 2\Bigl(\sqrt{q_{P,X}(m)} + \sqrt{q_{P_n,X}(m)}\Bigr)\varepsilon + \bigl|F_{x_j}^{-1}(m) - \widehat{F}_{x_j}^{-1}(m)\bigr|
\le C\varepsilon + \bigl|F_{x_j}^{-1}(m) - \widehat{F}_{x_j}^{-1}(m)\bigr|, \tag{21}$$

where C is a positive constant which only depends on X and P. Using a union bound together with (20), we find that

$$P\Bigl(\sup_{x \in X} |S_n(x)| > 2C\varepsilon\Bigr) \le P\Bigl(\sup_{j=1,\ldots,N} |S_n(x_j)| > C\varepsilon\Bigr) \le 2 C_X \varepsilon^{-d} \exp\left\{ -\frac{n\, \omega_X^{-1}(C\varepsilon)^2}{m} \cdot \frac{1}{1 + \frac{2\omega_X^{-1}(C\varepsilon)}{3m}} \right\}.$$

Thus, $\sup_{x\in X} |S_n(x)| = o_P(1)$. Then

$$\sup_{x \in X} |T_n(x)| = \sup_{x \in X} \sup_t \bigl|\widehat{F}_x(t) - F_x(t)\bigr| = \sup_{x \in X} \sup_t \bigl| P_n\bigl(B(x, \sqrt{t})\bigr) - P\bigl(B(x, \sqrt{t})\bigr) \bigr| \le \sup_{B \in \mathcal{B}_d} |P_n(B) - P(B)| = O_P\!\left(\sqrt{\frac{d}{n}}\right), \tag{22}$$

where $\mathcal{B}_d$ is the set of balls in R^d and we used the Vapnik-Chervonenkis theorem. Finally, we obtain that

$$\sup_{x \in X} |R_n(x)| \le \frac{\sqrt{n}}{m} \sup_{x \in X} |S_n(x)| \sup_{x \in X} |T_n(x)| = o_P(1). \tag{23}$$

Since $\sup_{x\in X} |R_n(x)| = o_P(1)$, it only remains to prove that the process A_n converges to a Gaussian process.
Now, we consider the process A_n on X. Let us denote by $\nu_n := \sqrt{n}(P_n - P)$ the empirical process. Note that

$$A_n(x) = \frac{1}{m}\, \nu_n\!\left( \int_0^{F_x^{-1}(m)} \mathbb{1}_{\{\|x - X\|^2 \le t\}} \, dt \right) = \frac{1}{m}\, \nu_n(f_x),$$

where $f_x(y) := \bigl(F_x^{-1}(m) - \|x - y\|^2\bigr) \vee 0$ is the integral of the indicator from 0 to $F_x^{-1}(m)$. For any (x, x′) ∈ X² and any y ∈ R^d, we have

$$|f_x(y) - f_{x'}(y)| \le \bigl|F_x^{-1}(m) - F_{x'}^{-1}(m)\bigr| + \|x - x'\|\bigl(\|x\| + \|x'\| + 2\|y\|\bigr) \le 2\Bigl( r(X) + \|y\| + \sqrt{q_{P,X}(m)} \Bigr) \|x - x'\|.$$

Since P is compactly supported, the collection of functions $(f_x)_{x\in X}$ is P-Donsker (see for instance 19.7 in van der Vaart (2000)) and $A_n(x) \rightsquigarrow \mathbb{B}(x)$ for a centered Gaussian process $\mathbb{B}(x)$ with covariance kernel

$$\kappa(x, y) = \mathrm{Cov}(A_n(x), A_n(y)) = E[A_n(x) A_n(y)]
= \frac{1}{m^2} \int_0^{F_x^{-1}(m)} \int_0^{F_y^{-1}(m)} n\, E\Bigl[ \bigl(\widehat{F}_x(t) - F_x(t)\bigr)\bigl(\widehat{F}_y(s) - F_y(s)\bigr) \Bigr] \, ds \, dt
= \frac{1}{m^2} \int_0^{F_x^{-1}(m)} \int_0^{F_y^{-1}(m)} \Bigl[ P\bigl(B(x, \sqrt{t}) \cap B(y, \sqrt{s})\bigr) - F_x(t) F_y(s) \Bigr] \, ds \, dt.$$


4. Hadamard Differentiability and the Bootstrap

In this section, we use the bootstrap to get a confidence band for δ. Define c_α by

$$P\bigl(\sqrt{n}\,\|\widehat{\delta} - \delta\|_\infty > c_\alpha\bigr) = \alpha.$$

Let X₁*, . . . , Xₙ* be a sample from the empirical measure Pₙ and let $\widehat{\delta}^*$ be the corresponding empirical DTM. The bootstrap estimate $\widehat{c}_\alpha$ is defined by

$$P\bigl(\sqrt{n}\,\|\widehat{\delta}^* - \widehat{\delta}\|_\infty > \widehat{c}_\alpha \,\big|\, X_1, \ldots, X_n\bigr) = \alpha.$$

As usual, $\widehat{c}_\alpha$ can be approximated by Monte Carlo. Below we show that this bootstrap is valid. It then follows that

$$P\left( \|\widehat{\delta} - \delta\|_\infty < \frac{\widehat{c}_\alpha}{\sqrt{n}} \right) \to 1 - \alpha.$$

A different approach to the bootstrap is considered in Section 6.
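A Monte Carlo sketch of this bootstrap band (ours; names illustrative, and the supremum is approximated by a maximum over a finite grid):

```python
import numpy as np
from scipy.spatial import cKDTree

def dtm(X, grid, m):
    # Empirical DTM (9), as in the sketch of Section 2.2.
    k = int(np.ceil(m * len(X)))
    d, _ = cKDTree(X).query(grid, k=k)
    return np.sqrt((np.asarray(d).reshape(len(grid), k) ** 2).mean(axis=1))

def bootstrap_band(X, grid, m, alpha=0.05, B=1000, rng=None):
    """Approximate c_alpha_hat by resampling from P_n, then return the
    band delta_hat +/- c_alpha_hat / sqrt(n)."""
    if rng is None:
        rng = np.random.default_rng()
    n, base = len(X), dtm(X, grid, m)
    sups = [np.sqrt(n) * np.max(np.abs(dtm(X[rng.integers(0, n, n)], grid, m)
                                       - base)) for _ in range(B)]
    half = np.quantile(sups, 1 - alpha) / np.sqrt(n)
    return base - half, base + half

rng = np.random.default_rng(4)
theta = rng.uniform(0, 2 * np.pi, 400)
X = np.column_stack([np.cos(theta), np.sin(theta)])
xs = np.linspace(-1.5, 1.5, 25)
grid = np.column_stack([g.ravel() for g in np.meshgrid(xs, xs)])
lo, hi = bootstrap_band(X, grid, m=0.1, B=200, rng=rng)
```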


To prepare for our next result, let B denote the class of all closed Euclidean balls in R^d and let $\mathbb{B}$ denote the P-Brownian bridge on B, i.e., the centered Gaussian process on B with covariance function κ(B, C) = P(B ∩ C) − P(B)P(C), with B, C in B. We will denote by $\mathbb{B}_x(r)$ the value of $\mathbb{B}$ at B(x, r), the closed ball centered at x ∈ R^d with radius r > 0.

Theorem 12 (Bootstrap Validity) Let P be a measure on R^d with compact support S, let m ∈ (0, 1) be fixed and let X be a compact domain in R^d. Assume that F_{P,x} = F_x is differentiable at F_x^{-1}(m) for all x ∈ X, and that there exists a constant C > 0 such that, for all small η ∈ R and all ε > 0,

$$\sup_{x \in X} \bigl| F_x\bigl(F_x^{-1}(m)\bigr) - F_x\bigl(F_x^{-1}(m) + \eta\bigr) \bigr| < \varepsilon \quad \text{implies} \quad |\eta| < C\varepsilon. \tag{24}$$

Then $\sup_{x \in X} \sqrt{n}\,\bigl| \widehat{\delta}^{*2}(x) - \widehat{\delta}^2(x) \bigr|$ converges in distribution to

$$\sup_{x \in X} \left| \frac{1}{m} \int_0^{F_x^{-1}(m)} \mathbb{B}_x(u) \, du \right|$$

conditionally given X₁, X₂, . . ., in probability.

We will establish the above result using the functional delta method, which entails showing that the distance-to-a-measure function is Hadamard differentiable at P. In fact, the proof further shows that the process

$$x \in X \mapsto \sqrt{n}\bigl( \delta^2(x) - \widehat{\delta}^2(x) \bigr)$$

converges weakly to the Gaussian process

$$x \in X \mapsto -\frac{1}{m} \int_0^{F_x^{-1}(m)} \mathbb{B}_x(u) \, du.$$


Remark 13 This result is consistent with the result established in Theorem 9, but in order
to establish Hadamard differentiability, we use a slightly different assumption. Theorem 9 is
proved by assuming a uniform modulus of continuity on the quantile functions Fx−1 whereas
in Theorem 12 a uniform lower bound on the derivatives is required. These two assumptions
are consistent: they both say that Fx−1 is well-behaved in a neighborhood of m for all x.
However, the condition used in Theorem 12 is stronger than the condition used in Theorem 9.

Proof [Proof of Theorem 12] Let us first give the definition of Hadamard differentiability, for which we refer the reader to, e.g., Section 3.9 of van der Vaart and Wellner (1996). A map φ from a normed space (D, ‖·‖_D) to a normed space (E, ‖·‖_E) is Hadamard differentiable at the point x ∈ D if there exists a continuous linear map φ′_x : D → E such that

$$\left\| \frac{\phi(x + t h_t) - \phi(x)}{t} - \phi'_x(h) \right\|_E \to 0, \tag{25}$$

whenever ‖h_t − h‖_D → 0 as t → 0.
We also recall the functional delta method (see, e.g., van der Vaart and Wellner, 1996, Theorem 3.9.4): suppose that T_n takes values in D, r_n → ∞, r_n(T_n − θ) ⇝ T, and suppose that φ is Hadamard differentiable at θ. Then r_n(φ(T_n) − φ(θ)) ⇝ φ′_θ(T). Moreover, by Theorem 3.9.11 of van der Vaart and Wellner (1996) the bootstrap has the same limit. More precisely, given X₁, X₂, . . ., we have that r_n(φ(T_n*) − φ(T_n)) converges conditionally in distribution to φ′_θ(T), in probability. This implies the validity of the bootstrap confidence sets.
We define M to be the space of finite, σ-finite signed measures on (R^d, B^d) supported on the compact set S and the mapping ‖·‖_B : M → R given by

$$\|\mu\|_{\mathcal{B}} = \sup_{B \in \mathcal{B}} |\mu(B)|, \quad \mu \in \mathcal{M}.$$

In Lemma 14 we show that this is a normed space.
For our purposes, instead of using M it will be convenient to work with the equivalent space of the evaluations of all μ ∈ M over the balls B. Formally, let ℓ^∞(B) denote the normed space of bounded functions on B equipped with the supremum norm. Then, by Lemma 14, the mapping from M into ℓ^∞(B) given by

$$\mu \mapsto (\mu : \mathcal{B} \to [0, 1]) \tag{26}$$

is a bijection onto its image, which we will denote by D. By definition, the supremum norm on D is exactly the norm ‖·‖_B, so that D ⊂ ℓ^∞(B) equipped with the supremum norm is a normed space. With a slight abuse of notation, we will identify measures in M with the corresponding points in D and write μ ∈ D to denote the signed measure μ corresponding to the point {μ(B), B ∈ B} in D.
The advantage of using the space D instead of M is that the convergence of the empirical process (√n(P_n(B) − P(B)) : B ∈ B) to the Brownian bridge takes place in D, as required by the delta method for the bootstrap (see van der Vaart and Wellner, 1996, Theorem 3.9.11).

For a signed measure μ in M, x ∈ R^d and r > 0, we set F_{μ,x}(r) = μ(B(x, √r)). Notice that if P is a probability measure, then F_{P,x} is the c.d.f. of the univariate random variable ‖X − x‖², with X ∼ P. For a general μ ∈ M, F_{μ,x} is a cadlag function, though not monotone. For any m ∈ R, μ in M and x ∈ R^d, set

$$F_{\mu,x}^{-1}(m) = \inf\bigl\{ r > 0 : \mu\bigl(B(x, \sqrt{r})\bigr) \ge m \bigr\},$$

where the infimum over the empty set is defined to be ∞. If P is a probability measure and m ∈ (0, 1) then F_{P,x}^{-1}(m) is just the m-th quantile of the random variable ‖X − x‖², X ∼ P.
Fix m ∈ (0, 1) and let M_m = M_m(X) denote the subset of M consisting of all finite signed measures μ such that there exists a value of r > 0 for which inf_{x∈X} μ(B(x, r)) ≥ m. Thus, for any μ ∈ M_m and x ∈ X, F_{μ,x}^{-1}(m) < ∞. Let D_m be the image of M_m under the mapping (26).
Let E be the set of bounded, real-valued functions on X, a normed space with respect to the sup norm. Finally, we define φ : D_m → E to be the mapping

$$\mu \in D_m \mapsto \phi(\mu)(x) = F_{\mu,x}^{-1}(m) - \frac{1}{m} \int_0^{F_{\mu,x}^{-1}(m)} F_{\mu,x}(u) \, du, \quad x \in X. \tag{27}$$

Notice that if P is a probability measure, simple algebra shows that φ(P)(x) is the squared value of the distance to measure of P at the point x, i.e. δ_P²(x); see Figure 5.
Below we will show that, for any probability measure P, the mapping (27) is Hadamard differentiable at P.
For an arbitrary Q ∈ D, let {Q_t}_{t>0} ⊂ D be a sequence of signed measures such that lim_{t→0} ‖Q_t − Q‖_B = 0 and such that P + tQ_t ∈ D_m for all t. Sequences of this form exist: since ‖tQ_t‖_B → 0 as t → 0, for any arbitrary 0 < ε < 1 − m and all t small enough,

$$\inf_{x \in X} (P + tQ_t)\Bigl( B\Bigl(x, \sqrt{F_{P,x}^{-1}(m + \varepsilon)}\Bigr) \Bigr) \ge m + \varepsilon/2.$$

By the boundedness of X and compactness of S, this implies that there exists a number r > 0 such that

$$\inf_x (P + tQ_t)\bigl(B(x, r)\bigr) \ge m,$$

so the image of P + tQ_t under (27) is an element of E (i.e. it is a bounded function).

[Figure 5: The integral ∫₀^m F_{P,x}^{-1}(u) du equals m F_{P,x}^{-1}(m) − ∫₀^{F_{P,x}^{-1}(m)} F_{P,x}(u) du.]

For the sake of readability, below we will write F_{x,t} and F_{x,t}^{-1}(m) for F_{P+tQ_t,x} and F_{P+tQ_t,x}^{-1}(m), respectively, and F_x for F_{P,x}. Also, for each x ∈ X and z ∈ R₊ we define the set A_{x,z} = {y : ‖y − x‖² ≤ z}, so that F_{x,t}(z) = (P + tQ_t)(A_{x,z}).


Thus,

$$\phi(P)(x) = \delta_P^2(x) = F_x^{-1}(m) - \frac{1}{m} \int_0^{F_x^{-1}(m)} F_x(u) \, du$$

and

$$\phi(P + tQ_t)(x) = F_{x,t}^{-1}(m) - \frac{1}{m} \int_0^{F_{x,t}^{-1}(m)} F_{x,t}(u) \, du.$$

Some algebra shows that, for any x,

$$\frac{\phi(P)(x) - \phi(P + tQ_t)(x)}{t} = \frac{F_x^{-1}(m) - F_{x,t}^{-1}(m)}{t} - \frac{A(x, t)}{mt}, \tag{28}$$

where

$$A(x, t) = \begin{cases}
\displaystyle \int_0^{F_x^{-1}(m)} [F_x(u) - F_{x,t}(u)] \, du - \int_{F_x^{-1}(m)}^{F_{x,t}^{-1}(m)} F_{x,t}(u) \, du & \text{if } F_x^{-1}(m) \le F_{x,t}^{-1}(m), \\[2ex]
\displaystyle \int_0^{F_x^{-1}(m)} [F_x(u) - F_{x,t}(u)] \, du + \int_{F_{x,t}^{-1}(m)}^{F_x^{-1}(m)} F_{x,t}(u) \, du & \text{if } F_x^{-1}(m) > F_{x,t}^{-1}(m).
\end{cases}$$

To demonstrate Hadamard differentiability (see (25)), we will prove that, as t → 0, the expression in (28), as a bounded function of x ∈ X, converges in E to the bounded function

$$x \in X \mapsto -\frac{1}{m} \int_0^{F_x^{-1}(m)} Q(A_{x,u}) \, du.$$

Towards that end, we have, for all t and any x ∈ X,

$$\frac{A(x, t)}{t} = \frac{1}{t} \left[ \int_0^{F_x^{-1}(m)} t Q_t(A_{x,u}) \, du - \int_{F_x^{-1}(m)}^{F_{x,t}^{-1}(m)} (P + tQ_t)(A_{x,u}) \, du \right]
= \int_0^{F_x^{-1}(m)} Q_t(A_{x,u}) \, du - \frac{1}{t} \int_{F_x^{-1}(m)}^{F_{x,t}^{-1}(m)} P(A_{x,u}) \, du - \int_{F_x^{-1}(m)}^{F_{x,t}^{-1}(m)} Q_t(A_{x,u}) \, du
\equiv A_1(x, t) - A_2(x, t) - A_3(x, t),$$

where, for a < b, we write $\int_b^a = -\int_a^b$.
To handle the three terms appearing in the last display, we use Lemma 15 below, which shows that $\sup_{x\in X} \bigl|m - F_x\bigl(F_{x,t}^{-1}(m)\bigr)\bigr| = O(t)$ and that $\sup_{x\in X} |F_x^{-1}(m) - F_{x,t}^{-1}(m)| = O(t)$ as t → 0.
We now analyze the terms A₁(x, t), A₂(x, t) and A₃(x, t) separately.

• Term A₁(x, t). As t → 0, Q_t → Q and, uniformly in x ∈ X and z > 0, |Q_t(A_{x,z})| ≤ |Q_t(S)| = |Q(S)| + o(1) < ∞. Furthermore, sup_{x∈X} F_x^{-1}(m) < ∞ by compactness of X and S. Therefore, using the dominated convergence theorem,

$$\lim_{t \to 0} \sup_{x \in X} \left| \frac{A_1(x, t)}{m} - \frac{1}{m} \int_0^{F_x^{-1}(m)} Q(A_{x,u}) \, du \right| = 0. \tag{29}$$


• Term A₂(x, t). Since P(A_{x,u}) is non-decreasing in u for all x, we have

$$\frac{F_{x,t}^{-1}(m) - F_x^{-1}(m)}{t} \times \min\bigl\{m, F_x\bigl(F_{x,t}^{-1}(m)\bigr)\bigr\} \le A_2(x, t) \le \frac{F_{x,t}^{-1}(m) - F_x^{-1}(m)}{t} \times \max\bigl\{m, F_x\bigl(F_{x,t}^{-1}(m)\bigr)\bigr\}.$$

Using (32), we conclude that

$$\lim_{t \to 0} \sup_{x \in X} \left| \frac{F_{x,t}^{-1}(m) - F_x^{-1}(m)}{t} - \frac{A_2(x, t)}{m} \right| = 0. \tag{30}$$

• Term A₃(x, t). Finally, since |Q_t(S)| ≤ |Q(S)| + o(1) as t → 0 and using (35), we obtain

$$\sup_x |A_3(x, t)| = O\Bigl( \sup_x \bigl| F_{x,t}^{-1}(m) - F_x^{-1}(m) \bigr| \Bigr) = o(1) \tag{31}$$

as t → 0.
Therefore, from (28), (29), (30), and (31),

$$\lim_{t \to 0} \sup_x \left| \frac{\phi(P)(x) - \phi(P + tQ_t)(x)}{t} + \frac{1}{m} \int_0^{F_x^{-1}(m)} Q(A_{x,u}) \, du \right| = 0,$$

which shows that

$$x \in X \mapsto -\frac{1}{m} \int_0^{F_x^{-1}(m)} Q(A_{x,u}) \, du$$

is the Hadamard derivative of δ² at P.
The statement of the theorem now follows from an application of Theorem 3.9.11 in van der Vaart and Wellner (1996) and the fact that, since B is a Donsker class, the empirical process (√n(P_n(B) − P(B)) : B ∈ B) converges to the Brownian bridge $\mathbb{B}$ on B with covariance kernel κ(B, C) = P(B ∩ C) − P(B)P(C).

Lemma 14 (Normed Space) The pair (M, k · kB ) is a normed space.

Proof It is clear that $\mathcal{M}$ is closed under addition and scalar multiplication, and so it is a linear space. We then need to show that the mapping $\|\cdot\|_{\mathcal{B}}$ is a norm. It is immediate to see that it is absolutely homogeneous and satisfies the triangle inequality: for any $\mu$ and $\nu$ in $\mathcal{M}$ and $c \in \mathbb{R}$, $\|c\mu\|_{\mathcal{B}} = |c| \|\mu\|_{\mathcal{B}}$ and $\|\mu + \nu\|_{\mathcal{B}} \le \|\mu\|_{\mathcal{B}} + \|\nu\|_{\mathcal{B}}$. It remains to prove that $\|\mu\|_{\mathcal{B}} = 0$ if and only if $\mu$ is identically zero, i.e., $\mu(A) = 0$ for all Borel sets $A$. One direction is immediate: if $\|\mu\|_{\mathcal{B}} > 0$, then there exists a ball $B$ such that $\mu(B) \neq 0$, so that $\mu \neq 0$. For the other direction, assume that $\mu \in \mathcal{M}$ is such that $\|\mu\|_{\mathcal{B}} = 0$. By the Jordan decomposition, $\mu$ can be represented as the difference of two singular, non-negative finite measures: $\mu = \mu_+ - \mu_-$. The condition $\mu(B) = 0$ for all $B \in \mathcal{B}$ is equivalent to $\mu_+(B) = \mu_-(B)$ for all $B \in \mathcal{B}$. We will show that this further implies that the supports of $\mu_+$ and $\mu_-$, denoted by $S_+$ and $S_-$ respectively, are both empty, and therefore that $\mu$ is identically zero. Indeed, recall that the support of a Borel measure $\lambda$ over a topological space $X$ is the set of points $x \in X$ all of whose open neighborhoods have positive $\lambda$-measure.


In our setting this is equivalent to the set of points in $\mathbb{R}^d$ such that all open balls centered at those points have positive measure, which in turn is equivalent to the set of points such that all closed balls centered at those points have positive measure. Therefore, using the fact that $\mu_+(B) = \mu_-(B)$ for all $B \in \mathcal{B}$,
\[
S_+ = \left\{ x \in \mathbb{R}^d : \mu_+(B(x,r)) > 0, \ \forall r > 0 \right\} = \left\{ x \in \mathbb{R}^d : \mu_-(B(x,r)) > 0, \ \forall r > 0 \right\} = S_-,
\]
where $B(x, r) = \{ y \in \mathbb{R}^d : \|y - x\| \le r \}$. It then follows that $S_+$ and $S_-$ must be empty, for otherwise $\mu_+$ and $\mu_-$ would be mutually singular, non-zero measures with the same support, a contradiction.

Lemma 15 Under the assumptions of the theorem and as $t \to 0$,
\[
\sup_{x \in \mathcal{X}} \left| m - F_x\left( F_{x,t}^{-1}(m) \right) \right| = O(t), \tag{32}
\]
and
\[
\sup_{x \in \mathcal{X}} \left| F_x^{-1}(m) - F_{x,t}^{-1}(m) \right| = O(t). \tag{33}
\]

Proof Set $A_{x,t} = \{ y : \|y - x\| \le F_{x,t}^{-1}(m) \}$ and let $\gamma_t$ be any positive, decreasing function of $t$ such that $\gamma_t = o(t)$ as $t \to 0$. Then, for all small enough $t$ and all $x \in \mathcal{X}$, the set $A_{x,t,\gamma_t} = \{ y : \|y - x\| \le F_{x,t}^{-1}(m) - \gamma_t \}$ is non-empty and
\[
F_x\left( F_{x,t}^{-1}(m) - \gamma_t \right) + t Q_t(A_{x,t,\gamma_t}) \le m \le F_x\left( F_{x,t}^{-1}(m) \right) + t Q_t(A_{x,t}), \tag{34}
\]
because $P(A_{x,t}) = F_x(F_{x,t}^{-1}(m))$ and $P(A_{x,t,\gamma_t}) = F_x(F_{x,t}^{-1}(m) - \gamma_t)$. Rearranging and using the bound $\sup_{x \in \mathcal{X}} F_x(F_{x,t}^{-1}(m) - \gamma_t) = \sup_{x \in \mathcal{X}} F_x(F_{x,t}^{-1}(m)) + O(\gamma_t)$, which holds for all small enough $t$, we obtain that, for all such values of $t$,
\[
\sup_{x \in \mathcal{X}} \left| m - F_x\left( F_{x,t}^{-1}(m) \right) \right| \le t \sup_{x \in \mathcal{X}} \max\left\{ |Q_t(A_{x,t})|, |Q_t(A_{x,t,\gamma_t})| \right\} + o(t) = t\,O(|Q(S)|),
\]
since $|Q_t(S)| = |Q(S)| + o(1)$ as $t \to 0$. This establishes (32). Next, by the monotonicity of $F_x$ for each $x \in \mathcal{X}$ and the facts (both implied by (24)) that $m = F_x(F_x^{-1}(m))$ and that $\inf_{x \in \mathcal{X}} F_x'(F_x^{-1}(m))$ is bounded away from 0, (32) yields that
\[
\lim_{t \to 0} \sup_{x \in \mathcal{X}} \left| F_x^{-1}(m) - F_{x,t}^{-1}(m) \right| = 0. \tag{35}
\]

Combining the last display with the bound
\[
\sup_{x \in \mathcal{X}} \left| F_x\left( F_x^{-1}(m) \right) - F_x\left( F_{x,t}^{-1}(m) \right) \right| = t\,O(|Q(S)|)
\]
and assumption (24) again, we obtain that
\[
\sup_{x \in \mathcal{X}} \left| F_x^{-1}(m) - F_{x,t}^{-1}(m) \right| \le C t\,O(|Q(S)|) = O(t)
\]
for all $t$ small enough, where $C$ is the constant in (24). This completes the proof of (33).


4.1 Significance of Topological Features

Fasy et al. (2014b) showed how to use the bootstrap to test the significance of a topological feature. They did this for distance functions and density estimators, but the same idea works for the DTM, as we now explain. We assume in this section that the support of the distribution is contained in a compact set. The supremum norm refers to the supremum over this set. Given a feature with birth and death time $(u, v)$, we will say that the feature is significant if $|v - u| > 2c_\alpha/\sqrt{n}$, where $c_\alpha$ is defined by
\[
\mathbb{P}\left( \sqrt{n} \, \|\widehat{\delta} - \delta\|_\infty > c_\alpha \right) = \alpha.
\]
In particular, $c_\alpha$ can be estimated from the bootstrap as we showed in the previous section. Specifically, define $\widehat{c}_\alpha$ by
\[
\mathbb{P}\left( \sqrt{n} \, \|\widehat{\delta}^* - \widehat{\delta}\|_\infty > \widehat{c}_\alpha \,\Big|\, X_1, \dots, X_n \right) = \alpha.
\]
Then $\widehat{c}_\alpha$ is a consistent estimate of $c_\alpha$.
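To make this concrete, here is a minimal sketch (ours, not part of the paper) of the bootstrap estimate of $\widehat{c}_\alpha$ in Python, assuming only numpy and scipy. It uses the standard empirical form of the DTM, the root of the average squared distance from a query point to its $k = \lceil mn \rceil$ nearest sample points, evaluated on a fixed grid.

import numpy as np
from scipy.spatial import cKDTree

def dtm(sample, query, m=0.1):
    # Empirical DTM: root of the mean squared distance from each query point
    # to its k = ceil(m * n) nearest neighbors in the sample.
    n = len(sample)
    k = max(1, int(np.ceil(m * n)))
    dists, _ = cKDTree(sample).query(query, k=k)
    dists = dists.reshape(len(query), -1)
    return np.sqrt((dists ** 2).mean(axis=1))

def bootstrap_cutoff(X, grid, m=0.1, B=1000, alpha=0.05, seed=0):
    # Monte Carlo estimate of c_alpha: the (1 - alpha) quantile of
    # sqrt(n) * sup |dtm*(x) - dtm(x)| over bootstrap resamples from P_n.
    rng = np.random.default_rng(seed)
    n = len(X)
    base = dtm(X, grid, m)
    stats = np.empty(B)
    for b in range(B):
        Xb = X[rng.integers(0, n, size=n)]
        stats[b] = np.sqrt(n) * np.abs(dtm(Xb, grid, m) - base).max()
    return np.quantile(stats, 1 - alpha)

# A feature with birth u and death v is declared significant when
# |v - u| > 2 * bootstrap_cutoff(X, grid, m) / sqrt(n).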


To see why this makes sense, let D be the set of all persistence diagrams. Let D ≡ Dδ
be the true diagram and let D b ≡ Db be the estimated diagram. Let
δ
 

b E) ≤ √
Cn = E ∈ D : W∞ (D,
b
.
n

Then

 

b ≤√
P(D ∈ Cn ) = P W∞ (D, D) ≥ P( n||δ(x)
b − δ(x)||∞ ≤ b
cα ) → 1 − α
b
n

as n → ∞. Now |v − u| > 2b cα / n if and only if the feature cannot be matched to the
diagonal for any diagram in C. (Recall that the diagonal corresponds to features with zero
lifetime.)

We can visualize the significant features by putting a band of size 2cα / n around the
diagonal of D.
b See Figure 6.

5. Theory for Kernels

In this section, we consider an alternative to the DTM, namely, kernel-based methods. This includes the kernel distance and the kernel density estimator.

Phillips et al. (2014) suggest using the kernel distance for topological inference. Given a kernel $K(x,y)$, the kernel distance between two probability measures $P$ and $Q$ is
\[
D_K(P, Q) = \sqrt{ \iint K(x,y)\,dP(x)\,dP(y) + \iint K(x,y)\,dQ(x)\,dQ(y) - 2 \iint K(x,y)\,dP(x)\,dQ(y) }.
\]
It can be shown that $D_K(P,Q) = \|\mu_P - \mu_Q\|$ for vectors $\mu_P$ and $\mu_Q$ in an appropriate reproducing kernel Hilbert space (RKHS). Such distances are popular in machine learning; see Sriperumbudur et al. (2009), for example.


[Figure 6: three panels, "Cassini with Noise", "DTM Sublevel Set, t=0.5", and "Persistence Diagram" (birth/death axes, dimension-0 and dimension-1 features); the plotted points are not recoverable from the extracted text.]
Figure 6: The left plot shows a sample from the Cassini curve together with a few outliers.
The second plot is the empirical DTM. The third plot is one sub-level set of
the DTM. The last plot is the persistence diagram. Points not in the shaded
band are significant features. Thus, this method detects one significant connected
component and two significant loops in the sublevel set filtration of the empirical
DTM function.

Given a sample $X_1, \dots, X_n \sim P$, let $P_n$ be the probability measure that puts mass $1/n$ on each $X_i$. Let $\vartheta_x$ be the Dirac measure that puts mass one on $x$. Phillips et al. (2014) suggest using the discrete kernel distance
\[
\widehat{D}_K(x) \equiv D_K(P_n, \vartheta_x) = \sqrt{ \frac{1}{n^2} \sum_{i=1}^n \sum_{j=1}^n K(X_i, X_j) + K(x,x) - \frac{2}{n} \sum_{i=1}^n K(x, X_i) } \tag{36}
\]

for topological inference. This is an estimate of the population quantity
\[
D_K(x) \equiv D_K(P, \vartheta_x) = \sqrt{ \iint K(z,y)\,dP(z)\,dP(y) + K(x,x) - 2 \int K(x,y)\,dP(y) }.
\]

The most common choice of kernel is the Gaussian kernel $K(x,y) \equiv K_h(x,y) = \exp\left( -\frac{\|x-y\|^2}{2h^2} \right)$, which has one tuning parameter $h$. We recall that, in topological inference, we generally do not let $h$ tend to zero. The reason is that topological features can be detected with $h > 0$, and keeping $h$ bounded away from 0 reduces the variance of the estimator. See the related discussion in Section 4.4 of Fasy et al. (2014b).

Recall that the kernel density estimator is defined by
\[
\widehat{p}_h(x) = \frac{1}{n (\sqrt{2\pi}\,h)^d} \sum_i K(x, X_i).
\]


Let $p_h(x) = \mathbb{E}[\widehat{p}_h(x)]$. We see that
\begin{align*}
\widehat{D}_K^2(x) &= h^d \left( \frac{(\sqrt{2\pi})^d}{n} \sum_i \widehat{p}_h(X_i) + h^{-d} K(0,0) - 2 (\sqrt{2\pi})^d\, \widehat{p}_h(x) \right) \\
&= h^d \left( \frac{(\sqrt{2\pi})^d}{n} \sum_i \left[ \widehat{p}_h(X_i) - p_h(X_i) \right] + \frac{(\sqrt{2\pi})^d}{n} \sum_i p_h(X_i) + h^{-d} K(0,0) - 2 (\sqrt{2\pi})^d\, \widehat{p}_h(x) \right) \\
&= (\sqrt{2\pi})^d h^d \left( c + o_P(1) + O_P\!\left( \sqrt{\frac{\log n}{n}} \right) \right) + K(0,0) - 2 (\sqrt{2\pi}\,h)^d\, \widehat{p}_h(x).
\end{align*}
Here, we used the fact that $n^{-1} \sum_{i=1}^n p_h(X_i) = c + o_P(1)$ and $\|\widehat{p}_h - p_h\|_\infty = O_P(\sqrt{\log n / n})$, where $c = \int p_h \, p$.
We see that, up to small order terms, the sublevel sets of $\widehat{D}_K(x)$ are a rescaled version of the super-level sets of the kernel density estimator. Hence, the kernel distance approach and the density estimator approach are essentially the same, up to a rescaling. However, $D_K^2$ has some nice properties; see Phillips et al. (2014).

The limiting properties of $\widehat{D}_K^2(x)$ follow immediately from well-known properties of kernel density estimators. In fact, the conditions needed for $\widehat{D}_K^2$ are weaker than for the DTM.
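As an illustration of this equivalence, the following sketch (ours; plain numpy with the Gaussian kernel) computes the empirical kernel distance of (36) and the KDE, so the rescaling relation derived above can be checked numerically.

import numpy as np

def gaussian_gram(A, B, h):
    # Pairwise Gaussian kernel values K_h(a, b) = exp(-||a - b||^2 / (2 h^2)).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * h ** 2))

def kernel_distance(X, x, h):
    # Empirical kernel distance of equation (36): D_K(P_n, Dirac at x).
    t1 = gaussian_gram(X, X, h).mean()           # (1/n^2) sum_ij K(X_i, X_j)
    t2 = 1.0                                     # K(x, x) = 1 for the Gaussian kernel
    t3 = gaussian_gram(x[None, :], X, h).mean()  # (1/n) sum_i K(x, X_i)
    return np.sqrt(t1 + t2 - 2.0 * t3)

def kde(X, x, h):
    # Gaussian kernel density estimator evaluated at x.
    n, d = X.shape
    return gaussian_gram(x[None, :], X, h).mean() / (np.sqrt(2.0 * np.pi) * h) ** d

# Since (1/n) sum_i K(x, X_i) = (sqrt(2 pi) h)^d * kde(X, x, h), the squared
# kernel distance is a data-dependent constant minus 2 (sqrt(2 pi) h)^d times
# the KDE, so sublevel sets of the former match superlevel sets of the latter.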

Theorem 16 (Limiting Behavior of Kernel Distance) We have that
\[
\sqrt{n} \left( \widehat{D}_K^2 - D_K^2 \right) \rightsquigarrow \mathbb{B},
\]
where $\mathbb{B}$ is a Brownian bridge. The bootstrap version converges to the same limit, conditionally almost surely.

The proof of the above theorem is based on the aforementioned equivalence of $D_K$ to the rescaled density function and the well-known fact that $\sqrt{n}(\widehat{p}_h(x) - p_h(x))$ converges to a Brownian bridge. This theorem justifies using the bootstrap to construct $L_\infty$ bands for $p_h = \mathbb{E}(\widehat{p}_h)$ or $D_K$.

As we mentioned before, for topological inference, we keep the bandwidth $h$ fixed. Thus, it is important to keep in mind that we view $\widehat{p}_h$ as an estimate of $p_h(x) = \mathbb{E}[\widehat{p}_h(x)] = \int K_h(x,u)\,dP(u)$.

6. The Bottleneck Bootstrap

More precise inferences can be obtained by directly bootstrapping the persistence diagram. Let $X_1^*, \dots, X_n^*$ be, as before, a sample from the empirical measure $P_n$ and let $\widehat{D}^*$ be the (random) persistence diagram defined on this point cloud. Define $\widehat{t}_\alpha$ by
\[
\mathbb{P}\left( \sqrt{n}\, W_\infty(\widehat{D}^*, \widehat{D}) > \widehat{t}_\alpha \,\Big|\, X_1, \dots, X_n \right) = \alpha. \tag{37}
\]
The quantile $\widehat{t}_\alpha$ can be estimated by Monte Carlo. We then use a band of size $2\widehat{t}_\alpha$ on the diagram $\widehat{D}$.


In the following, we show that $\widehat{t}_\alpha$ consistently estimates the population value $t_\alpha$ defined by
\[
\mathbb{P}\left( \sqrt{n}\, W_\infty(\widehat{D}, D) > t_\alpha \right) = \alpha. \tag{38}
\]
The reason why the bottleneck bootstrap can lead to more precise inferences than the functional bootstrap from the previous section is that the functional bootstrap uses the fact that $W_\infty(\widehat{D}, D) \le \|\widehat{\delta} - \delta\|_\infty$ and finds an upper bound for $\|\widehat{\delta} - \delta\|_\infty$. But in many cases the inequality is not sharp, so the confidence set can be very conservative. Moreover, we can obtain different critical values for different dimensions (connected components, loops, voids, ...), and so the inferences are tuned to the specific features we are estimating. See Figure 7.
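In outline, the bottleneck bootstrap proceeds as in the following sketch (ours; persistence_diagram and bottleneck_distance are hypothetical stand-ins for any persistent-homology backend, such as the TDA package of Fasy et al. (2014a), and are not functions defined in this paper).

import numpy as np

def bottleneck_bootstrap(X, persistence_diagram, bottleneck_distance,
                         B=500, alpha=0.05, seed=0):
    # Monte Carlo estimate of the quantile t_alpha-hat defined in (37).
    rng = np.random.default_rng(seed)
    n = len(X)
    D_hat = persistence_diagram(X)     # diagram computed from the data
    stats = np.empty(B)
    for b in range(B):
        Xb = X[rng.integers(0, n, size=n)]        # draw X* from P_n
        stats[b] = np.sqrt(n) * bottleneck_distance(persistence_diagram(Xb), D_hat)
    t_hat = np.quantile(stats, 1 - alpha)
    return t_hat   # band of size 2 * t_hat around the diagonal of D_hat

To obtain the per-dimension bands shown in Figure 7, one restricts both diagrams to a single homology dimension before taking the bottleneck distance.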

[Figure 7: four panels, "Cassini with Noise", "DTM Bootstrap", "Bottleneck Bootstrap", and "Together" (birth/death axes, dimension-0 and dimension-1 features); the plotted points are not recoverable from the extracted text.]

Figure 7: The left plot shows a sample from the Cassini curve together with a few outliers. The second plot shows the DTM persistence diagram with a 95% confidence band constructed using the method of Section 4. The third plot shows the same persistence diagram with two 95% confidence bands constructed using the bottleneck bootstrap with zero-dimensional features and one-dimensional features. The fourth plot shows the three confidence bands at the same time. In Section 8, we use this compact form to show multiple confidence bands.

Although the bottleneck bootstrap can be used with either the DTM or the KDE, we shall only prove its validity for the KDE. First, we need the following result. For any function $p$, let $g = \nabla p$ denote its gradient and let $H = \nabla^2 p$ denote its Hessian. We say that $x$ is a critical point if $g(x) = (0, \dots, 0)^T$. We then call $p(x)$ a critical value. A function is Morse if the Hessian is non-degenerate at each critical point. The Morse index of a critical point $x$ is the number of negative eigenvalues of $H(x)$.

Lemma 17 (Stability of Critical Points) Let $p$ be a density with compact support $S$. Assume that $S$ is a $d$-dimensional compact submanifold of $\mathbb{R}^d$ with boundary. Assume $p$ is a Morse function with finitely many, distinct, critical values with corresponding critical points $c_1, \dots, c_k$. Also assume that $p$ is of class $C^2$ on the interior of $S$, and continuous and differentiable with non-vanishing gradient on the boundary of $S$. Then, there exist $\epsilon_0 > 0$ and $c > 0$ such that for all $0 < \epsilon < \epsilon_0$, there exists $\eta \ge c\epsilon$ such that, for any density $q$ with

support $S$ satisfying
\[
\sup_x |p(x) - q(x)| < \eta, \quad \sup_x \|\nabla p(x) - \nabla q(x)\| < \eta, \quad \sup_x \|\nabla^2 p(x) - \nabla^2 q(x)\| < \eta,
\]
$q$ is a Morse function with exactly $k$ critical points $c_1', \dots, c_k'$, say, and, after a suitable re-labeling of indices,
\[
\max_j \|c_j - c_j'\| \le \epsilon.
\]
Moreover, $c_j$ and $c_j'$ have the same Morse index.

Proof This lemma is a consequence of classical stability properties of Morse functions. First, from Theorem 5.31, p. 140 in Banyaga and Hurtubise (2004) and Proposition II.2.2, p. 79 in Golubitsky and Guillemin (1986), there exists $\epsilon_1 > 0$ such that if $q$ is at distance less than $\epsilon_1$ in the $C^2$ topology (i.e., such that the sup-norms of $p - q$ and of its first and second derivatives are bounded by $\epsilon_1$) then $q$ is a Morse function. Moreover, there exist two diffeomorphisms $h : \mathbb{R} \to \mathbb{R}$ and $\phi : S \to S$ such that $q = h \circ p \circ \phi$. As the notions of critical point and of index are invariant under diffeomorphism, $p$ and $q$ have the same number of critical points with the same indices. More precisely, the critical points of $q$ are the points $c_i' = \phi^{-1}(c_i)$.

Now let $\epsilon > 0$ be small enough that $2\epsilon < \min_{i \neq j} \|c_i - c_j\|$ and, for any $i \neq j$, $p(B(c_i, \epsilon)) \cap p(B(c_j, \epsilon)) = \emptyset$. Then $\eta_1 = \eta_1(\epsilon) = \min_{i \neq j} d\left( p(B(c_i, \epsilon)), p(B(c_j, \epsilon)) \right)$, where $d(A,B) = \min_{a \in A, b \in B} |a - b|$, and $\eta_2 = \eta_2(\epsilon) = \inf\{ \|\nabla p(x)\| : x \in S \setminus \cup_{i=1}^k B(c_i, \epsilon) \}$ are both positive. If $q$ satisfies the assumptions of the lemma for any $0 < \eta \le \min(\eta_1, \eta_2)$, then the critical values of $q$ have to be in $\cup_i p(B(c_i, \epsilon))$ and the critical points $c_i'$ have to be in $\cup_i B(c_i, \epsilon)$.

More precisely, notice that since $p$ is a Morse function, for $\epsilon$ small enough, $\eta_2 = O(\epsilon)$, and, for any $i \in \{1, \dots, k\}$, the Taylor expansion of $\nabla p$ about $c_i$ yields
\[
\nabla p(x) = H_i (x - c_i) + \|x - c_i\| \, r(x - c_i),
\]
where $r(z) \to 0$ as $\|z\| \to 0$ and $H_i$ is the Hessian of $p$ at $c_i$. Let $\lambda_{\min}$ be the smallest absolute eigenvalue of the Hessians at all the critical points. Since $p$ is a Morse function, each matrix $H_i$ is full rank and $\lambda_{\min}$ is positive. As a consequence, for all $x \in S \setminus \cup_{i=1}^k B(c_i, \epsilon)$ and $\epsilon$ small enough, $\|\nabla p(x)\| \ge \lambda_{\min} \epsilon / 2$. Since $\eta_1$ is a non-increasing function of $\epsilon$, we have that, for $\epsilon$ small enough, $\eta = \eta_2 \ge \lambda_{\min} \epsilon / 2$.

To conclude the proof of the lemma, we need to prove that each ball $B(c_i, \epsilon)$ contains exactly one critical point of $q$. Indeed, for $t \in [0,1]$, the functions $q_t(x) = p(x) + t(q(x) - p(x))$ are Morse functions satisfying the same properties as $q$. Now, since each $c_i$ is a non-degenerate critical point of $p$, it follows from the continuity of the critical points (see, e.g., Prop. 4.6.1 in Demazure (2013)) that, restricting $\epsilon$ if necessary, there exist smooth functions $c_i : [0,1] \to S$, $c_i(0) = c_i$, $c_i(1) = c_i'$, such that $c_i(t)$ is the unique critical point of $q_t$ in $B(c_i, \epsilon)$. Moreover, since all the $q_t$ are Morse functions and since the Hessian of $q_t$ at $c_i(t)$ is a continuous function of $t$, for any $t \in [0,1]$, $c_i(t)$ is a non-degenerate critical point of $q_t$ with the same index as $c_i$.


Figure 8: This figure illustrates the assumptions of Lemma 18. The functions p and q are
shown in solid blue and dashed pink, respectively. The grey regions on the y-axis
represent the sets p(c) ± b for critical points c of p.

Consider now two smooth functions such that the critical points are close, as illustrated
in Figure 8. Next we show that, in this circumstance, the bottleneck distance takes a simple
form.

Lemma 18 (Critical Distances) Let $p$ and $q$ be two Morse functions as in Lemma 17, with finitely many critical points $C = \{c_1, \dots, c_k\}$ and $C' = \{c_1', \dots, c_k'\}$, respectively. Let $D_p$ and $D_q$ be the persistence diagrams of the upper level set (i.e., super-level set) filtrations of $p$ and $q$, respectively, and let $a = \min_{i \neq j} |p(c_i) - p(c_j)|$ and $b = \max_j |p(c_j) - q(c_j')|$. If $b \le a/2 - \|p - q\|_\infty$ and $a/2 > 2\|p - q\|_\infty$, then $W_\infty(D_p, D_q) = b$.

Proof The topology of the upper level sets of the Morse functions $p$ and $q$ only changes at critical values (Theorem 3.1 in Milnor (1963)). As a consequence, the non-diagonal points of $D_p$ (resp. $D_q$) have their coordinates among the set $\{p(c_1), \dots, p(c_k)\}$ (resp. $\{q(c_1'), \dots, q(c_k')\}$), and each $p(c_i)$ is the coordinate of exactly one point in $D_p$. Moreover, the pairwise distances between the points of $D_p$ are lower bounded by $a$, and all non-diagonal points of $D_p$ are at distance at least $a$ from the diagonal. From the persistence stability theorem (Cohen-Steiner et al., 2005; Chazal et al., 2012), $W_\infty(D_p, D_q) \le \|p - q\|_\infty$. Since $a > 4\|p - q\|_\infty$ and $a \ge 2b + 2\|p - q\|_\infty$, the (unique) optimal matching realizing the bottleneck distance $W_\infty(D_p, D_q)$ is such that if $(p(c_i), p(c_j)) \in D_p$ then it is matched to the point $(q(c_i'), q(c_j'))$, which thus has to be in $D_q$. It follows that $W_\infty(D_p, D_q) = b$.


Now we establish the limiting distribution of $\sqrt{n}\, W_\infty(\widehat{D}, D)$.

Theorem 19 (Limiting Distribution) Let $p_h(x) = \mathbb{E}[\widehat{p}_h(x)]$, where $\widehat{p}_h(x)$ is the kernel density estimator evaluated at $x$. Assume that $p_h$ is a Morse function with two uniformly bounded continuous derivatives and finitely many critical points $c = \{c_1, \dots, c_k\}$. Let $D$ be the persistence diagram of the upper level sets of $p_h$ and let $\widehat{D}$ be the diagram of the upper level sets of $\widehat{p}_h$. Then
\[
\sqrt{n}\, W_\infty(\widehat{D}, D) \rightsquigarrow \|Z\|_\infty,
\]


where $Z = (Z_1, \dots, Z_k) \sim N(0, \Sigma)$ and
\[
\Sigma_{jk} = \int K_h(c_j, u) K_h(c_k, u)\,dP(u) - \int K_h(c_j, u)\,dP(u) \int K_h(c_k, u)\,dP(u).
\]

Proof Let $\widehat{c} = \{\widehat{c}_1, \widehat{c}_2, \dots\}$ be the set of critical points of $\widehat{p}_h$. Let $g$ and $H$ be the gradient and Hessian of $p_h$, and let $\widehat{g}$ and $\widehat{H}$ be the gradient and Hessian of $\widehat{p}_h$. By a standard concentration of measure argument (and recalling that the support is compact), for any $\eta > 0$ there is an event $A_{n,\eta}$ such that, on $A_{n,\eta}$,
\[
\sup_x \left| \widehat{p}_h^{(i)}(x) - p_h^{(i)}(x) \right| < \eta \tag{39}
\]
for $i = 0, 1, 2$, and $\mathbb{P}(A_{n,\eta}^c) \le e^{-nc\eta^2}$. This is proved for $i = 0$ in Rao (1983), Giné and Guillou (2002), and Yukich (1985), and the same proof gives the results for $i = 1, 2$. It follows that $\sup_x \|g(x) - \widehat{g}(x)\| = O_P(1/\sqrt{n})$ and $\sup_x \|H(x) - \widehat{H}(x)\|_{\max} = O_P(1/\sqrt{n})$.

For $\eta$ smaller than a fixed value $\eta_0$, we can apply Lemma 17; we get that, on $A_{n,\eta}$, $\widehat{c}$ and $c$ have the same number of elements and can be indexed so that
\[
\max_{j=1,\dots,k} \|\widehat{c}_j - c_j\| \le \frac{\eta}{C},
\]
where $C$ is the same constant as in Lemma 17. We then take $\eta_n := \sqrt{\frac{\log n}{n}}$ and we consider the events $A_n := A_{n,\eta_n}$. Then, for $n$ large enough, on $A_n$ we get
\[
\max_{j=1,\dots,k} \|\widehat{c}_j - c_j\| = O\left( \sqrt{\frac{\log n}{n}} \right),
\]
whereas $\mathbb{P}(A_n^c) = o(1)$. In the following, we thus can restrict to $A_n$.


The critical values of $p_h$ are $v = (v_1 \equiv p_h(c_1), \dots, v_k \equiv p_h(c_k))$ and the critical values of $\widehat{p}_h$ are $\widehat{v} = (\widehat{v}_1 \equiv \widehat{p}_h(\widehat{c}_1), \dots, \widehat{v}_k \equiv \widehat{p}_h(\widehat{c}_k))$. Now we use Lemma 18 to conclude that $W_\infty(\widehat{D}, D) = \max_j |\widehat{v}_j - v_j|$ for $n$ large enough. Hence,
\[
W_\infty(\widehat{D}, D) = \max_{j=1,\dots,k} |\widehat{p}_h(\widehat{c}_j) - p_h(c_j)|.
\]

Then, using a Taylor expansion, for each $j$,
\[
\widehat{p}_h(\widehat{c}_j) = \widehat{p}_h(c_j) + (\widehat{c}_j - c_j)^T \widehat{g}(c_j) + O(\|\widehat{c}_j - c_j\|^2).
\]
Since $g(c_j) = (0, \dots, 0)$ we can write the last equation as
\[
\widehat{p}_h(\widehat{c}_j) = \widehat{p}_h(c_j) + (\widehat{c}_j - c_j)^T (\widehat{g}(c_j) - g(c_j)) + O(\|\widehat{c}_j - c_j\|^2).
\]
So,
\begin{align*}
\sqrt{n}(\widehat{v}_j - v_j) &= \sqrt{n}\left( \widehat{p}_h(\widehat{c}_j) - p_h(c_j) \right) \\
&= \sqrt{n}\left( \widehat{p}_h(c_j) - p_h(c_j) \right) + \sqrt{n}(\widehat{c}_j - c_j)^T (\widehat{g}(c_j) - g(c_j)) + \sqrt{n}\, O(\|\widehat{c}_j - c_j\|^2) \\
&= \sqrt{n}\left( \widehat{p}_h(c_j) - p_h(c_j) \right) + \sqrt{n}(\widehat{c}_j - c_j)^T (\widehat{g}(c_j) - g(c_j)) + o_P(1).
\end{align*}


For the second term, note that $\sqrt{n}(\widehat{c}_j - c_j) = O(\log n)$ and $\widehat{g}(c_j) - g(c_j) = O_P(1/\sqrt{n})$. So
\[
\sqrt{n}(\widehat{v}_j - v_j) = \sqrt{n}\left( \widehat{p}_h(c_j) - p_h(c_j) \right) + o_P(1).
\]
Therefore,
\[
\sqrt{n}\, W_\infty(\widehat{D}, D) = \sqrt{n} \max_j |\widehat{v}_j - v_j| = \max_j \left| \sqrt{n}\left( \widehat{p}_h(c_j) - p_h(c_j) \right) \right| + o_P(1).
\]
By the multivariate Berry-Esseen theorem (Bentkus, 2003),
\[
\sup_A \left| \mathbb{P}\left( \sqrt{n}(\widehat{p}_h(c) - p_h(c)) \in A \right) - \mathbb{P}(Z \in A) \right| \le \frac{C_1}{\sqrt{n}},
\]
where the supremum is over all convex sets $A \subseteq \mathbb{R}^k$, $C_1 > 0$ depends on $k$ and the third moment of $h^{-d} K((x - X)/h)$ (which is finite since $h$ is fixed and the support is compact), $Z = (Z_1, \dots, Z_k) \sim N(0, \Sigma)$, and
\[
\Sigma_{jk} = \int K_h(c_j, u) K_h(c_k, u)\,dP(u) - \int K_h(c_j, u)\,dP(u) \int K_h(c_k, u)\,dP(u).
\]
Hence,
\[
\sup_t \left| \mathbb{P}\left( \max_j \left| \sqrt{n}(\widehat{p}_h(c_j) - p_h(c_j)) \right| \le t \right) - \mathbb{P}(\|Z\|_\infty \le t) \right| \le \frac{C_1}{\sqrt{n}}.
\]

By Lemma 18, $W_\infty(\widehat{D}, D) = \max_j |\widehat{v}_j - v_j|$. The result follows.

Let
\[
F_n(t) = \mathbb{P}\left( \sqrt{n}\, W_\infty(\widehat{D}, D) \le t \right).
\]
Let $X_1^*, \dots, X_n^* \sim P_n$, where $P_n$ is the empirical distribution. Let $\widehat{D}^*$ be the diagram from $\widehat{p}_h^*$ and let
\[
\widehat{F}_n(t) = \mathbb{P}\left( \sqrt{n}\, W_\infty(\widehat{D}^*, \widehat{D}) \le t \,\Big|\, X_1, \dots, X_n \right)
\]
be the bootstrap approximation to $F_n$. Next we show that the bootstrap quantity $\widehat{F}_n(t)$ converges to the same limit as $F_n(t)$.

Corollary 20 Assume the same conditions as the last theorem. Then,
\[
\sup_t \left| \widehat{F}_n(t) - F_n(t) \right| \stackrel{P}{\to} 0.
\]

Proof The proof is essentially the same as the proof of Theorem 19, except that $\widehat{p}_h$ replaces $p_h$ and $\widehat{p}_h^*$ replaces $\widehat{p}_h$. Using the same notation as in the proof of Theorem 19, we note that, on the set $A_n$, for $n$ larger than a fixed value $n_0$, the function $\widehat{p}_h$ is a Morse function with two uniformly bounded continuous derivatives and finitely many critical points $\widehat{c} = \{\widehat{c}_1, \dots, \widehat{c}_k\}$. We can restrict the analysis to the sequence of events $A_n$ since $\mathbb{P}(A_n^c)$ tends


to zero. Assuming that $A_n$ holds, and using the same argument as in Theorem 19, we get that
\[
\sup_t \left| \mathbb{P}\left( \max_j \left| \sqrt{n}\left( \widehat{p}_h^*(\widehat{c}_j) - \widehat{p}_h(\widehat{c}_j) \right) \right| \le t \,\Big|\, X_1, \dots, X_n \right) - \mathbb{P}(\|\widetilde{Z}\|_\infty \le t) \right| \le \frac{C_2^*}{\sqrt{n}},
\]
where $\widetilde{Z} \sim N(0, \widehat{\Sigma})$ with
\[
\widehat{\Sigma}_{jk} = \frac{1}{n} \sum_i K_h(\widehat{c}_j, X_i) K_h(\widehat{c}_k, X_i) - \frac{1}{n} \sum_i K_h(\widehat{c}_j, X_i) \cdot \frac{1}{n} \sum_i K_h(\widehat{c}_k, X_i),
\]
and $C_2^*$ depends on the empirical third moments of $h^{-d} K((x - X^*)/h)$. There exists an upper bound $C_2$ on $C_2^*$ that only depends on $K$ and $P$. Since $\max_{j,k} |\widehat{\Sigma}_{jk} - \Sigma_{jk}| = O_P(\log n / \sqrt{n})$ and $\max_j \|\widehat{c}_j - c_j\| = O_P(\log n / \sqrt{n})$, we conclude that
\[
\sup_t \left| \mathbb{P}(\|\widetilde{Z}\|_\infty \le t) - \mathbb{P}(\|Z\|_\infty \le t) \right| = O_P\left( \frac{\log n}{\sqrt{n}} \right).
\]
Then
\[
\sup_t \left| \widehat{F}_n(t) - F_n(t) \right| \le \sup_t \left| \widehat{F}_n(t) - \mathbb{P}(\|\widetilde{Z}\|_\infty \le t) \right| + \sup_t \left| \mathbb{P}(\|\widetilde{Z}\|_\infty \le t) - \mathbb{P}(\|Z\|_\infty \le t) \right| + \sup_t \left| F_n(t) - \mathbb{P}(\|Z\|_\infty \le t) \right| = O_P\left( \frac{\log n}{\sqrt{n}} \right).
\]
The result follows.

7. Extensions

In this section, we discuss how to deal with three issues that can arise: choosing the parameters, correcting for boundary bias, and dealing with noisy data.

7.1 A Method for Choosing the Smoothing Parameter

An unsolved problem in topological inference is how to choose the smoothing parameter $m$ (or $h$). Guibas et al. (2013) suggested tracking the evolution of the persistence of the homological features as the tuning parameter varies. Here we make this method more formal, by selecting the parameter that maximizes the total amount of significant persistence.

Let $\ell_1(m), \ell_2(m), \dots$ be the lifetimes of the features at scale $m$. Let $c_\alpha(m)/\sqrt{n}$ be the significance cutoff at scale $m$. We define two quantities that measure the amount of significant information using parameter $m$:
\[
N(m) = \#\left\{ i : \ell_i(m) > \frac{c_\alpha(m)}{\sqrt{n}} \right\}, \qquad S(m) = \sum_i \left( \ell_i(m) - \frac{c_\alpha(m)}{\sqrt{n}} \right)_+.
\]
These measures are small when $m$ is small, since $c_\alpha(m)$ is large. On the other hand, they are small when $m$ is large, since then all the features are smoothed out. Thus we have a kind of topological bias-variance trade-off. We choose $m$ to maximize $N(m)$ or $S(m)$. The same idea can be applied to the kernel distance and the kernel density estimator; a sketch of this selection rule is given below. See the example in Figure 9.
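A sketch of this selection rule (ours; lifetimes and cutoff are assumed callables returning the lifetimes $\ell_i(m)$ and the bootstrap cutoff $c_\alpha(m)$ of Section 4.1 for a candidate parameter):

import numpy as np

def choose_parameter(candidates, lifetimes, cutoff, n):
    # Evaluate N(m) and S(m) on a grid of candidate parameters and
    # return the maximizer of each criterion.
    N, S = [], []
    for m in candidates:
        ell = np.asarray(lifetimes(m))      # lifetimes l_i(m)
        thr = cutoff(m) / np.sqrt(n)        # significance cutoff c_alpha(m)/sqrt(n)
        N.append(int((ell > thr).sum()))    # number of significant features
        S.append(np.clip(ell - thr, 0.0, None).sum())  # total significant persistence
    return candidates[int(np.argmax(N))], candidates[int(np.argmax(S))]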


Figure 9: Max persistence method with bottleneck bootstrap bands for 1-dimensional features. DTM: $\mathrm{argmax}_m N(m) = \{0.05, 0.10, 0.15, 0.20\}$, $\mathrm{argmax}_m S(m) = 0.05$; kernel distance: $\mathrm{argmax}_h N(h) = \{0.25, 0.30, 0.35, 0.40, 0.45, 0.50\}$, $\mathrm{argmax}_h S(h) = 0.35$; KDE: $\mathrm{argmax}_h N(h) = \{0.25, 0.30, 0.35, 0.40, 0.45, 0.50\}$, $\mathrm{argmax}_h S(h) = 0.3$. The plots show how to choose the smoothing parameters to maximize the number of significant features. The red triangles are the lifetimes of the features versus the tuning parameter. The red line is the significance cutoff.

7.2 Boundary Bias

It is well known that kernel density estimators suffer from boundary bias. For topological
inference, this bias manifests itself in a particular form, and the same problem affects the DTM. Consider Figure 10. Because of the bounding box, many of the loops are incomplete. The result is that, using either the DTM or the KDE, we will miss many of the loops. There is a large literature on reducing boundary bias in kernel density estimation. Perhaps the simplest approach is to reflect the data around the boundaries (see, for example, Schuster (1958)). But there is a simpler fix for topological inference: we merely need to close the loops at the boundary. This can be done by adding points uniformly around the boundary.

7.3 Two Methods for Improving Performance

We can improve the performance of all the methods if we can mitigate the outliers and noise. Here we suggest two methods to do this. We focus on the kernel density estimator.

First, a simple method to reduce the number of outliers is to truncate the density; that is, we eliminate $\{X_i : \widehat{p}(X_i) < t\}$ for some threshold $t$. Then we re-estimate the density.

Secondly, we sharpen the data as described in Choi and Hall (1999) and Hall and Minnotte (2002). The idea of sharpening is to move each data point $X_i$ slightly in the direction of the gradient $\nabla \widehat{p}(X_i)$ and then re-estimate the density. The authors show that this reduces the bias at peaks in the density, which should make it easier to find topological features. It can be seen that the sharpening method amounts to running one or more steps of the mean-shift algorithm. This is a gradient ascent which is intended to find modes of the density estimator. Given a point $x$, we move $x$ to


[Figure 10: four panels, "2D Voronoi Model", "DTM m=0.005", "2D Voronoi Model with Boundary Correction", and "DTM m=0.005" (birth/death axes, dimension-0 and dimension-1 features); the plotted points are not recoverable from the extracted text.]

Figure 10: First: 10,000 points sampled from a 2D Voronoi model with 20 nuclei. Second:
the corresponding persistence diagram of sublevel sets of the distance to measure
function. Note that only 9 loops are detected as significant. Third: 2,000 points
have been added on the boundary of the square delimiting the Voronoi model.
Fourth: now the corresponding persistence diagram shows 16 significant loops.

\[
\frac{\sum_i X_i K_h(x, X_i)}{\sum_i K_h(x, X_i)},
\]

which is simply the local average centered at $x$. For data sharpening, we do one (or a few) iterations of this map to each data point $X_i$. Then the density is re-estimated. In fact, we could also use the subspace constrained mean shift algorithm (SCMS), which moves points towards ridges of the density; see Ozertem and Erdogmus (2011). Figure 11 shows these methods applied to a simple example; a sketch of the sharpening step follows.
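A minimal sketch of the sharpening step (ours): one or more Gaussian mean-shift iterations applied to every data point, with the kernel weights always computed against the original sample.

import numpy as np

def sharpen(X, h, steps=1):
    # Move each point to the Gaussian-weighted local average of the sample,
    # i.e., one mean-shift step per iteration; the density is then
    # re-estimated from the sharpened points.
    Y = X.copy()
    for _ in range(steps):
        d2 = ((Y[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        W = np.exp(-d2 / (2.0 * h ** 2))    # weights K_h(y, X_i)
        Y = (W @ X) / W.sum(axis=1, keepdims=True)
    return Y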

8. Examples

Example 1 (Noisy Grid) The data in Figure 12 are 10,000 data points on a 2D grid. We add Gaussian noise plus 1,000 outliers and compute the persistence diagrams of the Kernel Density Estimator, Kernel Distance, and Distance to Measure. The pink bands show 95% confidence sets obtained by bootstrapping the corresponding functions. The black lines show 95% confidence bands obtained with the bottleneck bootstrap for dimension 0, while the red lines show 95% confidence bands obtained with the bottleneck bootstrap for dimension 1. The Distance to Measure, which is less sensitive to the density of the points, correctly captures the topology of the data. The Kernel Distance and KDE find some extra significant connected components, corresponding to high-density regions at the intersections of the grid.

Example 2 (Soccer) Figure 13 shows the field position of two soccer players. The data
come from body-sensor traces collected during a professional soccer game in late 2013 at the
Alfheim Stadium in Tromso, Norway. The data are sampled at 20 Hz. See Pettersen et al.
(2014). Although the data is a function observed over time, we treat it as a point cloud.
Points on the boundary of the field have been added to avoid boundary bias. The DTM


[Figure 11: three rows of panels, "Original Data", "High Density Data", and "Sharpened Data", each with a "Distance Fct Diagram" and a "DTM Diagram" (birth/death axes, dimension-0 and dimension-1 features); the plotted points are not recoverable from the extracted text.]

Figure 11: Top: 1,300 points sampled along a 2 × 2 grid with Gaussian noise; the diagram
of the distance function shows many loops due to noise. Middle: the red points
are the high density data (density > 0.15); the corresponding diagram of the
distance function correctly captures the 4 loops, plus a few features with short
lifetime. Bottom: the red points represent the sharpened high density data;
now most of the noise in the corresponding diagram is eliminated. Note that
the diagram of the distance to measure function does a good job with the original
data. The bottom left plot shows a slight improvement, in the sense that the
persistence of the 4 loops has increased.

captures the difference between the two players: the defender leaves one big portion of the field uncovered (1 significant loop in the persistence diagram), while the midfielder does not cover the 4 corners (4 significant loops). Nonetheless, the kernel distance, which is more sensitive to the density of these points, fails to detect significant topological features.


[Figure 12: four panels, "Noisy Grid", "KDE h=0.05", "KDIST h=0.05", and "DTM m=0.01" (birth/death axes, dimension-0 and dimension-1 features); the plotted points are not recoverable from the extracted text.]

Figure 12: 10,000 data points on a 2D grid and the corresponding persistence diagrams of
Kernel Density Estimator, Kernel distance, and Distance to Measure. For more
details see Example 1.

Figure 13: Top: data for a defender. We show the DTM, the diagram for the DTM and the diagram for the kernel distance. Bottom: same but for a midfielder. The midfielder data has more loops.

Example 3 (Voronoi Models) Given $k$ points (nuclei) $\{z_1, \dots, z_k\} \subset \mathbb{R}^3$, let the Voronoi region $R_k$ be $R_k = \left\{ x \in \mathbb{R}^3 : \|x - z_k\| \le \|x - z_j\| \text{ for all } j \neq k \right\}$. The Voronoi regions $R_1, \dots, R_k$ partition the space, forming what is known as the Voronoi diagram. A face is formed by the intersection of two adjacent Voronoi regions; a line is formed at the intersection of two faces; and a node is formed at the intersection of two or more lines.


We will sample points around the nodes, lines, and faces that are formed at the intersections of the Voronoi regions. A Voronoi wall model is a sampling scheme that returns points within or around the Voronoi faces. Similarly, by sampling points exclusively around the lines or exclusively around the nodes, we can construct Voronoi filament models and Voronoi cluster models.

These models were introduced by Icke and van de Weygaert (1991) to mimic key features of cosmological data; see also van de Weygaert et al. (2011).

In this example we generate data from filament models and wall models using the basic definition of the Voronoi diagram, computed on a fine grid in $[0, 50]^3$; a sketch of this sampling scheme is given below. We also add random Gaussian noise. See Figure 14: the first two rows show 100K particles concentrated around the filaments of 8 and 64 Voronoi cells, respectively. The last two rows show 100K particles concentrated around the walls of 8 and 64 Voronoi cells. 60K points on the boundary of the boxes have been added to mitigate boundary bias. For each model we present the persistence diagrams of the distance function, the distance to measure, and the kernel density estimator. We chose the smoothing parameters by maximizing the quantity $S(\cdot)$, defined in Section 7.1.
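The following sketch (ours; the thickness parameter tol is an assumption for illustration, not a quantity from the paper) generates such data by classifying fine-grid points according to how many nuclei are nearly equidistant: a near-tie between the two closest nuclei gives a wall point, and a near-tie among the three closest gives a filament point.

import numpy as np

def voronoi_sample(nuclei, kind="wall", n_grid=60, box=50.0, tol=0.5,
                   noise=0.2, seed=0):
    # Grid-based Voronoi wall/filament sampling: keep grid points whose two
    # (wall) or three (filament) nearest nuclei are within tol of each other,
    # then perturb the kept points with Gaussian noise.
    rng = np.random.default_rng(seed)
    g = np.linspace(0.0, box, n_grid)
    pts = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
    d = np.sqrt(((pts[:, None, :] - nuclei[None, :, :]) ** 2).sum(axis=-1))
    d.sort(axis=1)                       # sorted distances to the nuclei
    order = 1 if kind == "wall" else 2
    keep = (d[:, order] - d[:, 0]) < tol
    out = pts[keep]
    return out + rng.normal(scale=noise, size=out.shape)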
The diagrams illustrate the evolution of the filtrations for the three different functions: at first, the connected components appear (black points in the diagrams); then they merge, forming loops (red triangles), which eventually evolve into 3D voids (blue squares).

The persistence diagrams of the three functions allow us to distinguish the different models (see Figure 1 for a less trivial example), and the confidence bands, generated using the bootstrap method of Section 4.1, allow us to separate the topological signal from the topological noise. In general, the DTM performs better than the KDE, which is more affected by the high density of points around the nodes and filaments. For instance, this is very clear in the third row of Figure 14. The DTM diagram correctly captures the topology of the Voronoi wall model with 8 nuclei: one connected component and 8 voids are significant, while the remaining homological features fall into the band and are classified as noise.

9. Discussion
In this paper, we showed how the DTM and KDE can be used for robust topological
inference. Further, we showed how to use the bootstrap to identify topological features that
are distinguishable from noise. We conclude by discussing two issues: comparing DTM and
KDE, and using persistent homology versus selecting a single level set.

9.1 Comparison of DTM and Kernel Distance


The DTM and the KDE have the same broad aim: to provide a means for extracting topological features from data. However, these two methods are really focused on different goals. Consider again the model $P = \pi R + (1 - \pi)(Q \star \Phi_\sigma)$ and let $S$ be the support of $Q$. As before, we assume that $S$ is a "small set," meaning that either it has dimension $k < d$ or it is full dimensional but has small Lebesgue measure. When $\pi$ and $\sigma$ are small, the persistent homology of the upper level sets of the density $p$ will be dominated by features corresponding to the homology of $S$. In other words, we are using the persistent homology of $\{p > t\}$ to learn about the homology of $S$. In contrast, the DTM is aimed at estimating the persistent homology of $S$. Both are useful, but they have slightly different goals.


Figure 14: Data from four Voronoi foam models. In each case we show the diagrams of the
distance function, the DTM and the KDE. A boundary correction was included.

This also raises the intriguing idea of extracting more information from both the KDE
and DTM by varying more parameters. For example, if we look at the sets {ph > t} for
fixed t but varying h, we get information very similar to that of the DTM. Conversely, for


the DTM, we can vary the tuning parameter m. There are many possibilities here which
we will investigate in future work.

9.2 Persistent Homology Versus Choosing One Level Set


We have used the persistent homology of the upper level sets $\{\widehat{p}_h > t\}$ to probe the homology of $S$. This is the approach used in Bubenik (2015) and Phillips et al. (2014).

Bobrowski et al. (2014) suggest a different approach. They select a particular level set $\{p > t\}$ and form a robust estimate of the homology of this one level set. They have a data-driven method for selecting $t$. (This approach is only one part of the paper; they also consider persistent homology.)

They make two key assumptions. The first is that there exist $A < B$ such that $\{p > t\}$ is homotopic to $S$ for all $A < t < B$. (If two sets are homotopic, then they have the same homology.) This is a very reasonable assumption. In the mixture model $P = \pi R + (1 - \pi)(Q \star \Phi_\sigma)$, this assumption will be satisfied when $S$ is a small set and when $\pi$ and $\sigma$ are small. In this case, persistent homology will also work well: the dominant features in the persistence diagram will correspond to the homology of $S$.

Bobrowski et al. (2014) make an additional assumption. They assume that the dimension $k$ of $S$ is known and that the rank of the $k$-th homology group is 0 for all $t > B$. This assumption is critical for their approach to choosing a single level set. Currently, it is not clear how strong this assumption is. In future work, we plan to compare the robustness of the single-level approach versus persistent homology.

9.3 Future Work


Lastly, we would like to mention that several issues deserve future attention. In particular,
the methods we discussed for choosing the tuning parameters, for mitigating boundary bias
and for sharpening the data, all deserve further investigation.
In a companion paper we will show how the ideas presented in this work can be used to
develop hypothesis tests for comparing point clouds.

Acknowledgments

The authors are grateful to Jérome Dedecker for pointing out the key decomposition (18)
of the DTM. The authors also would like to thank Jessi Cisewski and Jisu Kim for their
comments and two referees for helpful suggestions. We would like to acknowledge support
for this project from ANR-13-BS01-0008, NSF CAREER Grant DMS 1149677, Air Force
Grant FA95500910373 and NSF Grant DMS-0806009.

References
A. Banyaga and D. Hurtubise. Lectures on Morse Homology. Kluwer Academic Publishers,
2004.

Vidmantas Bentkus. On the dependence of the Berry-Esseen bound on dimension. Journal of Statistical Planning and Inference, 113(2):385-402, 2003.


Gérard Biau, Frédéric Chazal, David Cohen-Steiner, Luc Devroye, Carlos Rodriguez, et al.
A weighted k-nearest neighbor density estimate for geometric inference. Electronic Jour-
nal of Statistics, 5:204–237, 2011.

S. Bobkov and M. Ledoux. One-dimensional empirical measures, order statistics and Kan-
torovich transport distances. Preprint, 2014.

Omer Bobrowski, Sayan Mukherjee, and Jonathan Taylor. Topological consistency via
kernel estimation. arXiv preprint arXiv:1407.5272, 2014.

Peter Bubenik. Statistical topological data analysis using persistence landscapes. Journal
of Machine Learning Research, 16:77–102, 2015.

Mickaël Buchet, Frédéric Chazal, Steve Y Oudot, and Donald R Sheehy. Efficient and
robust topological data analysis on metric spaces. arXiv preprint arXiv:1306.0039, 2013.

Claire Caillerie, Frédéric Chazal, Jérôme Dedecker, and Bertrand Michel. Deconvolution
for the wasserstein metric and geometric inference. Electronic Journal of Statistics, 5:
1394–1423, 2011.

Gunnar Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46
(2):255–308, 2009.

F. Chazal, D. Cohen-Steiner, M. Glisse, L.J. Guibas, and S.Y. Oudot. Proximity of persistence modules and their diagrams. In SCG, pages 237-246, 2009. ISBN 978-1-60558-501-7. doi: 10.1145/1542362.1542407.

F. Chazal, V. de Silva, M. Glisse, and S. Oudot. The structure and stability of persistence
modules. arXiv preprint arXiv:1207.3674, 2012.

Frédéric Chazal, David Cohen-Steiner, and Quentin Mérigot. Geometric inference for prob-
ability measures. Foundations of Computational Mathematics, 11(6):733–751, 2011.

Frédéric Chazal, Pascal Massart, and Bertrand Michel. Rates of convergence for robust
geometric inference. Technical report, ArXiv preprint 1505.07602, 2015.

D. Cohen-Steiner, H. Edelsbrunner, and J. Harer. Stability of persistence diagrams. In


SCG, pages 263–271, 2005.

Antonio Cuevas and Alberto Rodrı́guez-Casal. On boundary estimation. Advances in Ap-


plied Probability, 36(2):340–354, 2004.

M. Demazure. Bifurcations and Catastrophes: Geometry of Solutions to Nonlinear Prob-


lems. Springer-Verlag, 2013.

Herbert Edelsbrunner and John Harer. Computational Topology: An Introduction. Ameri-


can Mathematical Society, 2010.

Brittany Terese Fasy, Jisu Kim, Fabrizio Lecci, and Clement Maria. Introduction to the R
package TDA. arXiv preprint arXiv: 1411.1830, 2014a.


Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman
Balakrishnan, and Aarti Singh. Confidence sets for persistence diagrams. The Annals of
Statistics, 42(6):2301–2339, 2014b.

Evarist Giné and Armelle Guillou. Rates of strong uniform consistency for multivariate
kernel density estimators. In Annales de l’Institut Henri Poincare (B) Probability and
Statistics, volume 38, pages 907–921. Elsevier, 2002.

M. Golubitsky and V. Guillemin. Stable Mappings and Their Singularities. Springer-Verlag,


1986.

Leonidas Guibas, Dmitriy Morozov, and Quentin Mérigot. Witnessed k-distance. Discrete
& Computational Geometry, 49(1):22–45, 2013.

Vincent Icke and Rien van de Weygaert. The galaxy distribution as a voronoi foam. Quar-
terly Journal of the Royal Astronomical Society, 32:85–112, 1991.

J. Milnor. Morse Theory. Number 51. Princeton University Press, 1963.

Umut Ozertem and Deniz Erdogmus. Locally defined principal curves and surfaces. The
Journal of Machine Learning Research, 12:1249–1286, 2011.

Svein Arne Pettersen, Dag Johansen, Håvard Johansen, Vegard Berg-Johansen, Vamsid-
har Reddy Gaddam, Asgeir Mortensen, Ragnar Langseth, Carsten Griwodz, Håkon Kvale
Stensland, and Pål Halvorsen. Soccer video and player position dataset. In Proceedings
of the 5th ACM Multimedia Systems Conference, pages 18–23. ACM, 2014.

Jeff M. Phillips, Bei Wang, and Yan Zheng. Geometric inference on kernel density estimates.
arXiv preprint arXiv:1307.7760, 2014.

B. L. S. Prakasa Rao. Nonparametric Functional Estimation. Probability and Mathematical


Statistics. Academic Press, Orlando, FL, 1983.

E. Schuster. Incorporating support constraints into nonparametric estimators of densities.


Communications in Statistics, A(14):1123–1136, 1958.

Galen R Shorack and Jon A Wellner. Empirical processes with applications to statistics,
volume 59. SIAM, 2009.

Bharath K Sriperumbudur, Kenji Fukumizu, Arthur Gretton, Gert RG Lanckriet, and


Bernhard Schölkopf. Kernel choice and classifiability for rkhs embeddings of probability
distributions. In NIPS, pages 1750–1758, 2009.

Rien van de Weygaert, Gert Vegter, Herbert Edelsbrunner, Bernard JT Jones, Pratyush
Pranav, Changbom Park, Wojciech A Hellwing, Bob Eldering, Nico Kruithof, EGP Bos,
et al. Alpha, Betti and the megaparsec universe: On the topology of the cosmic web. In
Transactions on Computational Science XIV, pages 60–101. Springer-Verlag, 2011.

Aad W. van der Vaart. Asymptotic Statistics, volume 3. Cambridge UP, 2000.

Aad W. van der Vaart and Jon A. Wellner. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, 1996.


JE Yukich. Laws of large numbers for classes of functions. Journal of multivariate analysis,
17(3):245–260, 1985.
