
Computational Topology for Data Analysis

Tamal Krishna Dey


Department of Computer Science
Purdue University
West Lafayette, Indiana, USA 47907

Yusu Wang
Halıcıoğlu Data Science Institute
University of California, San Diego
La Jolla, California, USA 92093

© Tamal Dey and Yusu Wang 2016-2021

This material has been / will be published by Cambridge University Press as Computational
Topology for Data Analysis by Tamal Dey and Yusu Wang. This pre-publication version is free
to view and download for personal use only. Not for re-distribution, re-sale, or use in derivative
works.
Contents

1 Basics 3
1.1 Topological space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Metric space topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Maps, homeomorphisms, and homotopies . . . . . . . . . . . . . . . . . . . . . 9
1.4 Manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.4.1 Smooth manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.5 Functions on smooth manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.1 Gradients and critical points . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.2 Morse functions and Morse Lemma . . . . . . . . . . . . . . . . . . . . 18
1.5.3 Connection to topology . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.6 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

2 Complexes and Homology Groups 23


2.1 Simplicial complex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Nerves, Čech and Rips complex . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Sparse complexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.1 Delaunay complex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.3.2 Witness complex . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.3.3 Graph induced complex . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Chains, cycles, boundaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4.1 Algebraic structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.4.2 Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.3 Boundaries and cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5 Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.5.1 Induced homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
2.5.2 Relative homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.5.3 Singular Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
2.5.4 Cohomology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.6 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

3 Topological Persistence 51
3.1 Filtrations and persistence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.1.1 Space filtration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.1.2 Simplicial filtrations and persistence . . . . . . . . . . . . . . . . . . . . 54


3.2 Persistence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.2.1 Persistence diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
3.3 Persistence algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.3.1 Matrix reduction algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 68
3.3.2 Efficient implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.4 Persistence modules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.5 Persistence for PL-functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
3.5.1 PL-functions and critical points . . . . . . . . . . . . . . . . . . . . . . 80
3.5.2 Lower star filtration and its persistent homology . . . . . . . . . . . . . 84
3.5.3 Persistence algorithm for 0-th persistent homology . . . . . . . . . . . . 86
3.6 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

4 General Persistence 93
4.1 Stability of towers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.2 Computing persistence of simplicial towers . . . . . . . . . . . . . . . . . . . . 97
4.2.1 Annotations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.2.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.2.3 Elementary inclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
4.2.4 Elementary collapse . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.3 Persistence for zigzag filtration . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.3.1 Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
4.3.2 Zigzag persistence algorithm . . . . . . . . . . . . . . . . . . . . . . . . 108
4.4 Persistence for zigzag towers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
4.5 Levelset zigzag persistence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.5.1 Simplicial levelset zigzag filtration . . . . . . . . . . . . . . . . . . . . . 115
4.5.2 Barcode for levelset zigzag filtration . . . . . . . . . . . . . . . . . . . . 116
4.5.3 Correspondence to sublevel set persistence . . . . . . . . . . . . . . . . 117
4.5.4 Correspondence to extended persistence . . . . . . . . . . . . . . . . . . 118
4.6 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

5 Generators and Optimality 123


5.1 Optimal generators/basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
5.1.1 Greedy algorithm for optimal H p (K)-basis . . . . . . . . . . . . . . . . . 125
5.1.2 Optimal H1 (K)-basis and independence check . . . . . . . . . . . . . . . 128
5.2 Localization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.2.1 Linear program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
5.2.2 Total unimodularity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
5.2.3 Relative torsion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
5.3 Persistent cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
5.3.1 Finite intervals for weak (p + 1)-pseudomanifolds . . . . . . . . . . . . 138
5.3.2 Algorithm correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
5.3.3 Infinite intervals for weak (p + 1)-pseudomanifolds embedded in R p+1 . . 143
5.4 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

6 Topological Analysis of Point Clouds 147


6.1 Persistence for Rips and Čech filtrations . . . . . . . . . . . . . . . . . . . . . . 148
6.2 Approximation via data sparsification . . . . . . . . . . . . . . . . . . . . . . . 150
6.2.1 Data sparsification for Rips filtration via reweighting . . . . . . . . . . . 151
6.2.2 Approximation via simplicial tower . . . . . . . . . . . . . . . . . . . . 156
6.3 Homology inference from PCDs . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.3.1 Distance field and feature sizes . . . . . . . . . . . . . . . . . . . . . . . 158
6.3.2 Data on manifold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
6.3.3 Data on a compact set . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
6.4 Homology inference for scalar fields . . . . . . . . . . . . . . . . . . . . . . . . 162
6.4.1 Problem setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
6.4.2 Inference guarantees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.5 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

7 Reeb Graphs 169


7.1 Reeb graph: Definitions and properties . . . . . . . . . . . . . . . . . . . . . . . 170
7.2 Algorithms in the PL-setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
7.2.1 An O(m log m) time algorithm via dynamic graph connectivity . . . . . . 173
7.2.2 A randomized algorithm with O(m log m) expected time . . . . . . . . . 176
7.2.3 Homology groups of Reeb graphs . . . . . . . . . . . . . . . . . . . . . 179
7.3 Distances for Reeb graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
7.3.1 Interleaving distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
7.3.2 Functional distortion distance . . . . . . . . . . . . . . . . . . . . . . . 184
7.4 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186

8 Topological Analysis of Graphs 191


8.1 Topological summaries for graphs . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.1.1 Combinatorial graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
8.1.2 Graphs viewed as metric spaces . . . . . . . . . . . . . . . . . . . . . . 193
8.2 Graph comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
8.3 Topological invariants for directed graphs . . . . . . . . . . . . . . . . . . . . . 197
8.3.1 Simplicial complexes for directed graphs . . . . . . . . . . . . . . . . . 197
8.3.2 Path homology for directed graphs . . . . . . . . . . . . . . . . . . . . . 198
8.3.3 Computation of (persistent) path homology . . . . . . . . . . . . . . . . 201
8.4 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

9 Cover, Nerve, and Mapper 209


9.1 Covers and nerves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210
9.1.1 Special case of H1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
9.2 Analysis of persistent H1 -classes . . . . . . . . . . . . . . . . . . . . . . . . . . 217
9.3 Mapper and multiscale mapper . . . . . . . . . . . . . . . . . . . . . . . . . . . 219
9.3.1 Multiscale Mapper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
9.3.2 Persistence of H1 -classes in mapper and multiscale mapper . . . . . . . . 223
9.4 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
9.4.1 Interleaving of cover towers and multiscale mappers . . . . . . . . . . . 225

9.4.2 (c, s)-good covers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 226


9.4.3 Relation to intrinsic Čech filtration . . . . . . . . . . . . . . . . . . . . . 228
9.5 Exact Computation for PL-functions on simplicial domains . . . . . . . . . . . . 229
9.6 Approximating multiscale mapper for general maps . . . . . . . . . . . . . . . . 231
9.6.1 Combinatorial mapper and multiscale mapper . . . . . . . . . . . . . . . 232
9.6.2 Advantage of combinatorial multiscale mapper . . . . . . . . . . . . . . 233
9.7 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234

10 Discrete Morse Theory and Applications 237


10.1 Discrete Morse function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
10.1.1 Discrete Morse vector field . . . . . . . . . . . . . . . . . . . . . . . . . 239
10.2 Persistence based DMVF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
10.2.1 Persistence-guided cancellation . . . . . . . . . . . . . . . . . . . . . . 242
10.2.2 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244
10.3 Stable and unstable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
10.3.1 Morse theory revisited . . . . . . . . . . . . . . . . . . . . . . . . . . . 248
10.3.2 (Un)Stable manifolds in DMVF . . . . . . . . . . . . . . . . . . . . . . 249
10.4 Graph reconstruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
10.4.1 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
10.4.2 Noise model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252
10.4.3 Theoretical guarantees . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
10.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
10.5.1 Road network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255
10.5.2 Neuron network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257
10.6 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258

11 Multiparameter Persistence and Decomposition 261


11.1 Multiparameter persistence modules . . . . . . . . . . . . . . . . . . . . . . . . 264
11.1.1 Persistence modules as graded modules . . . . . . . . . . . . . . . . . . 264
11.2 Presentations of persistence modules . . . . . . . . . . . . . . . . . . . . . . . . 267
11.2.1 Presentation and its decomposition . . . . . . . . . . . . . . . . . . . . . 268
11.3 Presentation matrix: diagonalization and simplification . . . . . . . . . . . . . . 270
11.3.1 Simplification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
11.4 Total diagonalization algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 274
11.4.1 Running TotDiagonalize on the working example in Figure 11.5 . . . . . 282
11.5 Computing presentations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 285
11.5.1 Graded chain, cycle, and boundary modules . . . . . . . . . . . . . . . . 285
11.5.2 Multiparameter filtration, zero-dimensional homology . . . . . . . . . . 287
11.5.3 2-parameter filtration, multi-dimensional homology . . . . . . . . . . . . 287
11.5.4 d > 2-parameter filtration, multi-dimensional homology . . . . . . . . . 288
11.5.5 Time complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 289
11.6 Invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
11.6.1 Rank invariants . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 290
11.6.2 Graded Betti numbers and blockcodes . . . . . . . . . . . . . . . . . . . 291
11.7 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 295

12 Multiparameter Persistence and Distances 299


12.1 Persistence modules from categorical viewpoint . . . . . . . . . . . . . . . . . . 301
12.2 Interleaving distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302
12.3 Matching distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
12.3.1 Computing matching distance . . . . . . . . . . . . . . . . . . . . . . . 304
12.4 Bottleneck distance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
12.4.1 Interval decomposable modules . . . . . . . . . . . . . . . . . . . . . . 308
12.4.2 Bottleneck distance for 2-parameter interval decomposable modules . . . 309
12.4.3 Algorithm to compute dI for intervals . . . . . . . . . . . . . . . . . . . 314
12.5 Notes and Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 315

13 Topological Persistence and Machine Learning 319


13.1 Feature vectorization of persistence diagrams . . . . . . . . . . . . . . . . . . . 320
13.1.1 Persistence landscape . . . . . . . . . . . . . . . . . . . . . . . . . . . . 320
13.1.2 Persistence scale space (PSS) kernel . . . . . . . . . . . . . . . . . . . . 322
13.1.3 Persistence images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
13.1.4 Persistence weighted Gaussian kernel (PWGK) . . . . . . . . . . . . . . 324
13.1.5 Sliced Wasserstein kernel . . . . . . . . . . . . . . . . . . . . . . . . . . 326
13.1.6 Persistence Fisher kernel . . . . . . . . . . . . . . . . . . . . . . . . . . 327
13.2 Optimizing topological loss functions . . . . . . . . . . . . . . . . . . . . . . . 328
13.2.1 Topological regularizer . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
13.2.2 Gradients of a persistence-based topological function . . . . . . . . . . . 330
13.3 Statistical treatment of topological summaries . . . . . . . . . . . . . . . . . . . 332
13.4 Bibliographical notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
Preface

In recent years, the area of topological data analysis (TDA) has emerged as a viable tool for analyzing data in applied areas of science and engineering. The area started in the 1990s when computational geometers took an interest in studying the algorithmic aspects of the classical subject of algebraic topology in mathematics. The area of computational geometry flourished in the 1980s and 1990s by addressing various practical problems, enriching the area of discrete geometry in the process. A handful of computational geometers felt that, analogously to this development, computational topology had the potential of addressing the area of shape and data analysis while drawing upon, and perhaps developing further, the area of topology in the discrete context; see e.g. [26, 117, 120, 188, 292]. The area gained momentum with the introduction of persistent homology in the early 2000s, followed by a series of mathematical and algorithmic developments on the topic. The book by Edelsbrunner and Harer [149] presents these fundamental developments quite nicely. Since then, the area has grown both in its methodology and applicability. One consequence of this growth has been the development of various algorithms which intertwine with the discoveries of various mathematical structures in the context of processing data. The purpose of this book is to capture these algorithmic developments with the associated mathematical guarantees. It is appropriate to mention that there is an emerging sub-area of TDA which centers more around statistical aspects. This book does not deal with those developments, though we mention some of them in the last chapter, where we describe recent results connecting TDA and machine learning.
We have 13 chapters in the book, listed in the table of contents. After developing the basics of topological spaces, simplicial complexes, homology groups, and persistent homology in the first three chapters, the book is then devoted to presenting algorithms and associated mathematical structures in various contexts of topological data analysis. These chapters present material mostly not covered in any book on the market. To elaborate on this claim, we briefly give an overview of the topics covered by the present book. The fourth chapter presents generalizations of the persistence algorithm to extended settings, such as simplicial maps (instead of inclusions) and zigzag sequences with both inclusions and simplicial maps. Chapter 5 covers algorithms for computing optimal generators for both persistent and non-persistent homology. Chapter 6 focuses on algorithms that infer homological information from point cloud data. Chapter 7 presents algorithms and structural results for Reeb graphs. Chapter 8 considers general graphs, including directed ones. Chapter 9 focuses on various recent results on characterizing nerves of covers, including the well-known Mapper and its multiscale version. Chapter 10 is devoted to the important concept of discrete Morse theory, its connection to persistent homology, and its applications to graph reconstruction. Chapters 11 and 12 introduce multiparameter persistence. Standard persistence is defined over a 1-parameter index set such as Z or R. Extending this index set to a poset such as Zd or Rd, we get d-parameter or multiparameter persistence. Chapter 11 focuses on computing indecomposables for multiparameter persistence, which generalize the bars of the 1-parameter case. Chapter 12 focuses on various definitions of distances among multiparameter persistence modules and their computations. Finally, we conclude with Chapter 13, which presents some recent developments in incorporating persistence into the machine learning (ML) framework.
This book is intended for an audience comprising researchers and teachers in computer science and mathematics. Graduate students in both fields will benefit from learning the new material in topological data analysis. Because of its topics, the book plays the role of a bridge between mathematics and computer science. Students in computer science will learn the mathematics in topology that they are usually not familiar with. Similarly, students in mathematics will learn about designing algorithms based on mathematical structures. The book can be used for a graduate course in topological data analysis. In particular, it can be part of a curriculum in data science, which has been or is being adopted in universities. We include exercises in each chapter to facilitate teaching and learning.
There are currently a few books on computational topology and topological data analysis on the market, to which our book is complementary. The material covered in this book is predominantly new and has not been covered in any of the previous books. The book by Edelsbrunner and Harer [149] mainly focuses on early developments in persistent homology and does not cover the material in Chapters 4 to 13 of this book. The recent book of Boissonnat et al. [39] focuses mainly on reconstruction, inference, and Delaunay meshes. Other than Chapter 6, which focuses on point cloud data and inference of topological properties, and Chapters 1-3, which focus on preliminaries about topological persistence, there is hardly any overlap. The book by Oudot [249] mainly focuses on algebraic structures of persistence modules and inference results. Again, other than the preliminary Chapters 1-3 and Chapter 6, there is hardly any overlap. Finally, unlike ours, the books by Tierny [286] and by Rabadán and Blumberg [260] mainly focus on applying TDA to the specific domains of scientific visualization and genomics, respectively.
This book, as any other, was not created in isolation. Help coming from various corners contributed to its creation. It was seeded by the class notes that we developed for our introductory course on Computational Topology and Data Analysis, which we taught at the Ohio State University. During this teaching, the class feedback from students gave us the hint that a book covering the increasingly diversified repertoire of topological data analysis is necessary at this point. We thank all those students who had to bear with the initial disarray that was part of freshly gathering coherent material on a new subject. This book would not have been possible without our own involvement with TDA, which was mostly supported by grants from the National Science Foundation (NSF). Many of our PhD students worked through these projects, which helped us consolidate our focus on TDA. In particular, Tao Hou, Ryan Slechta, Cheng Xin, and Soham Mukherjee gave their comments on drafts of some of the chapters. We thank all of them. We thank everyone from the TGDA@OSU group for creating one of the best environments for carrying out research in applied and computational topology. Our special thanks go to Facundo Mémoli, who has been a great colleague (collaborating with us on several topics) as well as a wonderful friend at OSU. We also acknowledge the support of the department of CSE at the Ohio State University, where a large amount of the contents of this book were planned and written. The finishing came to fruition after we moved to our current institutions.

Finally, it is our pleasure to acknowledge the support of our families, who kept us motivated and engaged throughout the marathon of writing this book, especially during the last stretch overlapping the 2020-2021 Coronavirus pandemic. Tamal recalls his daughter Soumi and son Sounak asking him continuously about the progress of the book. His wife Kajari extended all the help necessary to make space for the extra time needed for the book. Despite suffering from the reduced attention to family matters, all of them offered their unwavering support and understanding graciously. Tamal dedicates this book to his family and his late parents Gopal Dey and Hasi Dey, without whose encouragement and love he would not have been in a position to take up this project. Yusu thanks her husband Mikhail Belkin for his never-ending support and encouragement throughout writing this book and beyond. Their two children, Alexander and Julia, contributed in their typical ways by making every day delightful and unpredictable for her. Without their support and love, she would not have been able to finish this book. Finally, Yusu dedicates this book to her parents Qingfen Wang and Jinlong Huang, who always gave her space to grow and encouraged her to do her best in life, as well as to her great aunt Zhige Zhao and great uncle Humin Wang, who kindly took her under their care when she was 13. She can never repay their kindness.
Prelude

We make sense of the world around us primarily by understanding and studying the "shape" of the objects that we encounter in real life or in a digital environment. Geometry offers a common language that we usually use to model and describe shapes. For example, familiar descriptors from this language, such as distances, coordinates, and angles, assist us in providing detailed information about a shape of interest. Not surprisingly, mankind has used geometry for thousands of years to describe objects in its surroundings.
However, there are many situations where the detailed geometric information is not needed and may even obscure the real useful structure that is not so explicit. A notable example is the Seven Bridges of Königsberg problem: in the city of Königsberg, the river Pregel separated the city into four regions, connected by seven bridges as shown in Figure 1 (taken from the Wikipedia page for "Seven Bridges of Königsberg").

Figure 1: "Map of Königsberg in Euler's time showing the actual layout of the seven bridges, highlighting the river Pregel and the bridges" by Bogdan Giuşcă is licensed under CC BY-SA 3.0.

The question is to find a walk through the city that crosses each bridge exactly once. The story goes that the mathematician Leonhard Euler observed that factors such as the precise shape of these regions and the exact path taken are not important. What is important is the connectivity among the different regions of the city as connected by the bridges. In particular, the problem can be modeled abstractly using a graph with four nodes, representing the four regions in the city of Königsberg, and seven edges representing the bridges connecting them. The problem then reduces to what is later known as finding an Euler tour (or Eulerian cycle) in this graph, which is easily solved; for Königsberg the answer is negative, since such a walk requires at most two regions touching an odd number of bridges, whereas all four do.
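Euler's degree criterion makes this concrete: a connected multigraph admits a walk crossing every edge exactly once if and only if at most two of its nodes have odd degree. The following minimal Python sketch (our illustration; the region labels A-D are hypothetical names for the four land regions) applies the test to the Königsberg multigraph:

    from collections import Counter

    # The seven bridges, each recorded as a pair of land regions.
    bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
               ("A", "D"), ("B", "D"), ("C", "D")]

    degree = Counter()
    for u, v in bridges:
        degree[u] += 1
        degree[v] += 1

    odd = [region for region, deg in degree.items() if deg % 2 == 1]
    # All four regions have odd degree (5, 3, 3, 3), so no such walk exists.
    print("walk exists" if len(odd) in (0, 2) else "no walk exists")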
For another example, consider animation in computer graphics, where one wants to develop software that can continuously deform one object to another (in the sense that one can stretch and change the shape, but cannot break or add to the shape). Can we continuously deform a frog to a prince this way?¹ Is it possible to continuously deform a tea cup to a bunny? It turns out the latter is not possible.

¹ Yes, according to Disney movies.
In these examples, the core structure of interest behind the input object or space is characterized by the way the space is connected, and the detailed geometric information may not matter. In general, topology intuitively models and studies properties that are invariant as long as the connectivity of space does not change. As a result, topological language and concepts can provide
powerful tools to characterize, identify, and process essential features of both spaces and functions defined on them. However, to bring topological methods to the realm of practical applications, not only do we need new ideas to make topological concepts and the resulting structures more suitable for modern data analysis tasks, but also algorithms to compute these structures efficiently. In the past two decades, the field of applied and computational topology has developed rapidly, producing many fundamental results and algorithms that have advanced both fronts. This progress has further fueled the significant growth of topological data analysis (TDA), which has already found applications in various domains such as computer graphics, visualization, materials science, computational biology, neuroscience, and so on.

Examples. In Figure 2, we present some examples of the use of topological methodologies in applications. The topological structures involved will be described later in the book.
An important development in applied and computational topology in the past two decades centers around the concept of persistent homology, which generalizes the classic algebraic structure of homology groups to the multi-scale setting, aided by the concepts of so-called filtrations and persistence modules (discussed in Chapters 2 and 3). This helps significantly to broaden the applications of homological features to characterizing shapes/spaces of interest. Figure 2(a) gives an example where persistent homology of a density field is used to develop a clustering strategy for the points [87]. In particular, at the beginning, each point is in its own cluster. Then, these clusters are grown using persistent homology, which identifies their importance and merges them according to this importance. The final output captures key clusters which may look like 'blobs' or 'curvy strips'; intuitively, they comprise dense regions separated by sparse regions.
Figure 2(b) gives an example where the topological summaries resulting from persistent homology have been used for clustering a collection of neurons, each of which is represented by a rooted tree (as neuron cells have tree morphology). As we will see in Chapter 13, persistent homology can serve as a general way to vectorize features of such complex input objects.
In Figure 2(c), diseased parts of retinas exhibiting degeneracy are localized from image data. Algorithms for computing optimal cycles for bars in the persistence barcode, as described in Chapter 5, are used for this purpose.
In Figure 2(d), we present an example where the topological object called the contour tree (the special loop-free case of the so-called Reeb graph, discussed in Chapter 7) has been used to give a low-dimensional terrain metaphor for a potentially high-dimensional scalar field. To illustrate further, suppose that we are given a scalar field f : X → R where X is a space of potentially high dimension. To visualize and explore X and f in R2 and R3, just mapping X to R2 can cause significant geometric distortion, which in turn leads to artifacts in the visualization of f over the projection. Instead, we can create a 2D terrain metaphor f′ : R2 → R for f which preserves the contour tree information, as proposed in [299]; intuitively, this preserves the valleys/mountain peaks and how they merge and split. In this example, the original scalar field is in R3. However, in general, the idea is applicable to higher-dimensional scalar fields (e.g., the protein energy landscape considered in [184]).
In Figure 2(e), we give an example of an alternative approach to exploring a high-dimensional space X, or functions defined on it, via the Mapper methodology (introduced in Chapter 9). In particular, the Mapper methodology constructs a representation of the essential structure behind X via a pull-back of a covering of Z through a map f : X → Z.

Figure 2: Examples of the use of topological ideas in data analysis. (a) A persistence-based clustering strategy: the persistence diagram of a density field estimated from an input noisy point cloud (shown in the top row) is used to help group points into clusters (bottom row). Reprinted by permission from Springer Nature: Springer Nature, Discrete & Computational Geometry, "Analysis of scalar fields over point cloud data", Frédéric Chazal et al. [87], © 2011. (b) Using persistence diagram summaries to represent and cluster neuron cells based on their tree morphology; image taken from [206], licensed by Kanari et al. (2018) under CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/). (c) Using an optimal persistent 1-cycle corresponding to a bar (red) in the persistence barcode, defects in diseased eyes are localized; image taken from [128]. (d) Topological landscape (left) of a 3D volumetric Silicium data set; a volume rendering of the Silicium data set is on the right. Note that it is hard to see all the structures forming the lattice of the crystal in the volume rendering, while the topological landscape view shows clearly that most of them have high function values and are of similar sizes; image taken from [299], reprinted by permission from IEEE: Gunther Weber et al. (2007). (e) The Mapper structure behind a high-dimensional cell gene expression data set can not only show the clusters of different tumor or normal cells, but also their connections; image taken from [244], reprinted by permission from Monica Nicolau et al. (2011, fig. 3). (f) Using a discrete-Morse-based graph skeleton reconstruction algorithm to help reconstruct road networks from satellite images even with few labelled training data; image taken from [139].

This intuitively captures the continuous structure of X at a coarser level via the discretization of Z. See Figure 2(e), where the 1-dimensional skeleton of the Mapper structure behind a breast cancer microarray gene expression data set is shown [244]. This continuous-space representation not only shows "clusters" of different groups of tumor and of normal cells, but also how they connect in the space of cells, connections which are typically missing in standard cluster analysis.
Finally, Figure 2(f) shows an example of combining topological structures from discrete Morse theory (Chapter 10) with convolutional neural networks to infer road networks from satellite images [139]. In particular, the so-called 1-unstable manifolds from discrete Morse theory can be used to extract hidden graph skeletons from noisy data.
We conclude this prelude by summarizing the aim of this book: to introduce the recent progress in applied and computational topology for data analysis with an emphasis on the algorithmic aspects.
Chapter 1

Basics

Topology, mainly algebraic topology, is the fundamental mathematical subject that topological data analysis is based on. In this chapter, we introduce some of the very basics of this subject that are used in this book. First, in Section 1.1, we give the definition of a topological space and other notions derived from it, such as open and closed sets, covers, and the subspace topology. These notions are quite abstract in the sense that they do not require any geometry. However, the intuition of topology becomes more concrete to non-mathematicians when we bring geometry into the mix. Section 1.2 is devoted to making the connection between topology and geometry through what are called metric spaces.

Maps such as homeomorphisms and homotopy equivalences play a significant role in relating topological spaces. They are introduced in Section 1.3. At the heart of these definitions sits the important notion of continuous functions, which generalizes the concept mainly known for Euclidean domains to topological spaces. Certain categories of topological spaces become important for their wide presence in applications. Manifolds are one such category, which we introduce in Section 1.4. Functions on them satisfying certain conditions, well known as Morse functions, are presented in Section 1.5. The critical points of such functions relate to the topology of the manifold they are defined on. We introduce these concepts in the smooth setting in this chapter, and later adapt them to the piecewise-linear domains that are amenable to finite computations.

1.1 Topological space


The basic object in a topological space is a ground set whose elements are called points. A topology on these points specifies how they are connected by listing out what points constitute a neighborhood – a so-called open set. The expression "rubber-sheet topology" commonly associated with the term 'topology' exemplifies this idea of connectivity of neighborhoods. If we bend and stretch a sheet of rubber, it changes shape but always preserves the neighborhoods in terms of the points and how they are connected.
We first introduce basic notions from point set topology. These notions are prerequisites for
more sophisticated topological ideas—manifolds, homeomorphism, isotopy, and other maps—
used later to study algorithms for topological data analysis. Homeomorphisms, for example, offer
a rigorous way to state that an operation preserves the topology of a domain, and isotopy offers a rigorous way to state that the domain can be deformed into a shape without ever colliding with itself.
Perhaps it is more intuitive to understand the concept of topology in the presence of a metric, because then we can use metric balls, such as Euclidean balls in a Euclidean space, to define neighborhoods – the open sets. Topological spaces provide a way to abstract out this idea without a metric or point coordinates, so they are more general than metric spaces. In place of a metric, we encode the connectivity of a point set by supplying a list of all of the open sets. This list is called a system of subsets of the point set. The point set and its system together describe a topological space.
Definition 1.1 (Topological space). A topological space is a point set T endowed with a system
of subsets T , which is a set of subsets of T that satisfies the following conditions.
• ∅, T ∈ T .

• For every U ⊆ T , the union of the subsets in U is in T .

• For every finite U ⊆ T , the common intersection of the subsets in U is in T .


The system T is called a topology on T. The sets in T are called the open sets in T. A
neighborhood of a point p ∈ T is an open set containing p.
First, we give examples of topological spaces to illustrate the definition above. These exam-
ples have the set T to be finite.
Example 1.1. Let T = {0, 1, 3, 5, 7}. Then, T = {∅, {0}, {1}, {5}, {0, 1}, {0, 5}, {1, 5}, {0, 1, 5}, {0, 1, 3, 5, 7}} is a topology: ∅ and T are in T as required by the first axiom, the union of any collection of sets in T is in T as required by the second axiom, and the intersection of any two sets in T is in T as required by the third axiom. However, T = {∅, {0}, {1}, {1, 5}, {0, 1, 5}, {0, 1, 3, 5, 7}} is not a topology because the set {0, 1} = {0} ∪ {1} is missing.
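For finite ground sets, the axioms of Definition 1.1 can be checked mechanically. The following small Python sketch (our illustration, not part of the text) tests both systems of Example 1.1; for a finite system, closure under pairwise unions and intersections already implies closure under arbitrary finite ones:

    def is_topology(ground, system):
        opens = {frozenset(s) for s in system}
        if frozenset() not in opens or frozenset(ground) not in opens:
            return False
        # Pairwise closure suffices when the system is finite.
        return all(a | b in opens and a & b in opens
                   for a in opens for b in opens)

    T = {0, 1, 3, 5, 7}
    good = [set(), {0}, {1}, {5}, {0, 1}, {0, 5}, {1, 5}, {0, 1, 5}, T]
    bad = [set(), {0}, {1}, {1, 5}, {0, 1, 5}, T]  # {0, 1} = {0} U {1} missing
    print(is_topology(T, good), is_topology(T, bad))  # True False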
Example 1.2. Let T = {u, v, w}. The power set 2T = {∅, {u}, {v}, {w}, {u, v}, {u, w}, {v, w}, {u, v, w}} is a topology. For any ground set T, the power set is always a topology on it, which is called the discrete topology.
One may take a subset of the power set as a ground set and define a topology as the next
example shows. We will recognize later that the ground set here corresponds to simplices in a
simplicial complex and the ’stars’ of simplices generate all open sets of a topology.
Example 1.3. Let T = {u, v, w, z, (u, z), (v, z), (w, z)}; this can be viewed as a graph with four
vertices and three edges as shown in Figure 1.1. Let
• T 1 = {{(u, z)}, {(v, z)}, {(w, z)}} and

• T 2 = {{(u, z), u}, {(v, z), v}, {(w, z), w}, {(u, z), (v, z), (w, z), z}}.
Then, the collection T consisting of all unions of sets in T1 ∪ T2 (with the empty union contributing ∅) is a topology: it satisfies all three axioms. All open sets of T are generated by taking unions of elements in B = T1 ∪ T2, and there is no smaller collection with this property. Such a collection B is called a basis of T. We will see later in the next chapter (Section 2.1) that these are the open stars of all vertices and edges.
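Generating the topology from a basis is itself a small computation: take all unions of subcollections of B. A Python sketch of this (our illustration; edges are encoded as hypothetical string labels such as "uz" for the edge (u, z)):

    from itertools import combinations

    T1 = [{"uz"}, {"vz"}, {"wz"}]
    T2 = [{"uz", "u"}, {"vz", "v"}, {"wz", "w"}, {"uz", "vz", "wz", "z"}]
    basis = [frozenset(b) for b in T1 + T2]

    # Close the basis under unions; the empty union yields the empty set.
    opens = set()
    for k in range(len(basis) + 1):
        for combo in combinations(basis, k):
            opens.add(frozenset().union(*combo))
    print(len(opens))  # the number of distinct open sets generated by B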
Figure 1.1: Example 1.3: (a) a graph as a topological space, with stars of the vertices and edges as open sets, (b) a closed cover with three elements, (c) an open cover with four elements.

We now present some more definitions that will be useful later.

Definition 1.2 (Closure; Closed sets). A set Q is closed if its complement T \ Q is open. The
closure Cl Q of a set Q ⊆ T is the smallest closed set containing Q.

In Example 1.1, the set {3, 5, 7} is closed because its complement {0, 1} in T is open. The
closure of the open set {0} is {0, 3, 7} because it is the smallest closed set (complement of open set
{1, 5}) containing 0. In Example 1.2, all sets are both open and closed. In Example 1.3, the set
{u, z, (u, z)} is closed, but the set {z, (u, z)} is neither open nor closed. Interestingly, observe that
{z} is closed. The closure of the open set {u, (u, z)} is {u, z, (u, z)}. In all examples, the sets ∅ and
T are both open and closed.
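In a finite topological space, Definition 1.2 is directly computable: the closure of Q is the intersection of all closed sets, that is, complements of open sets, that contain Q. A brief Python sketch (our illustration) reproduces the closure of {0} in Example 1.1:

    def closure(ground, opens, Q):
        ground, Q = frozenset(ground), frozenset(Q)
        closed = [ground - frozenset(s) for s in opens]  # complements of opens
        result = ground
        for c in closed:
            if Q <= c:          # keep only closed sets containing Q
                result &= c
        return result

    T = {0, 1, 3, 5, 7}
    opens = [set(), {0}, {1}, {5}, {0, 1}, {0, 5}, {1, 5}, {0, 1, 5}, T]
    print(closure(T, opens, {0}))  # frozenset({0, 3, 7}), as in the text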

Definition 1.3. Given a topological space (T, T ), the interior Int A of a subset A ⊆ T is the union
of all open subsets of A. The boundary of A is Bd A = Cl A \ Int A.

The interior of the set {3, 5, 7} in Example 1.1 is {5} and its boundary is {3, 7}.

Definition 1.4 (Subspace topology). For every point set U ⊆ T, the topology T induces a subspace
topology on U, namely the system of open subsets U = {P ∩ U : P ∈ T }. The point set U endowed
with the system U is said to be a topological subspace of T.

In Example 1.1, consider the subset U = {1, 5, 7}. It has the subspace topology

U = {∅, {1}, {5}, {1, 5}, {1, 5, 7}}.

In Example 1.3, the subset U = {u, (u, z), (v, z)} has the subspace topology

{∅, {u, (u, z)}, {(u, z)}, {(v, z)}, {(u, z), (v, z)}, {u, (u, z), (v, z)}}.
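The subspace topology is likewise computable for finite examples; the sketch below (our illustration) reproduces the system on U = {1, 5, 7} computed above:

    T_opens = [set(), {0}, {1}, {5}, {0, 1}, {0, 5}, {1, 5}, {0, 1, 5},
               {0, 1, 3, 5, 7}]
    U = frozenset({1, 5, 7})
    # Intersect every open set of T with U (Definition 1.4).
    U_opens = {frozenset(P) & U for P in T_opens}
    print(sorted(map(sorted, U_opens)))  # [[], [1], [1, 5], [1, 5, 7], [5]]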

Definition 1.5 (Connected). A topological space (T, T ) is disconnected if there are two disjoint non-empty open sets U, V ∈ T so that T = U ∪ V. A topological space is connected if it is not disconnected.

The topological space in Example 1.1 is connected. However, the topological subspace (Def-
inition 1.4) induced by the subset {0, 1, 5} is disconnected because it can be obtained as the union
of two disjoint open sets {0, 1} and {5}. The topological space in Example 1.3 is also connected,
but the subspace induced by the subset {(u, z), (v, z), (w, z)} is disconnected.
Definition 1.6 (Cover; Compact). An open (closed) cover of a topological space (T, T ) is a collection C of open (closed) sets whose union is T. The topological space (T, T ) is called compact if every open cover C of it has a finite subcover, that is, there exists a finite subcollection C′ ⊆ C whose union is still T.
In Figure 1.1(b), the cover consisting of {{u, z, (u, z)}, {v, z, (v, z)}, {w, z, (w, z)}} is a closed cover, whereas the cover consisting of {{u, (u, z)}, {v, (v, z)}, {w, (w, z)}, {z, (u, z), (v, z), (w, z)}} in Figure 1.1(c) is an open cover. Any topological space with a finite point set T is compact because all of its covers are finite. Thus, all topological spaces in the discussed examples are compact. We will see examples of non-compact topological spaces where the ground set is infinite; for instance, the open interval (0, 1) ⊂ R is not compact, since the open cover {(1/n, 1) : n ≥ 2} has no finite subcover.
In the above examples, the ground set T is finite. It can be infinite in general, and the topology may have uncountably many open sets containing uncountably many points.
Next, we introduce the concept of quotient topology. Given a space (T, T ) and an equivalence
relation ∼ on elements in T, one can define a topology induced by the original topology T on the
quotient set T/ ∼ whose elements are equivalence classes [x] for every point x ∈ T.
Definition 1.7 (Quotient topology). Given a topological space (T, T ) and an equivalence relation ∼ defined on the set T, a quotient space (S, S ) induced by ∼ is defined by the set S = T/∼ and the quotient topology

S := {U ⊆ S : {x ∈ T : [x] ∈ U} ∈ T }.

We will see the use of quotient topology in Chapter 7 when we study Reeb graphs.
Infinite topological spaces may seem baffling from a computational point of view because they may have uncountably many open sets containing uncountably many points. The easiest way to define such a topological space is to inherit the open sets from a metric space. A topology on a metric space excludes information that is not topologically essential. For instance, the act of stretching a rubber sheet changes the distances between points and thereby changes the metric, but it does not change the open sets or the topology of the rubber sheet. In the next section, we construct such a topology on a metric space and examine it via the concept of limit points.

1.2 Metric space topology


Metric spaces are a special type of topological space commonly encountered in practice. Such a space admits a metric that specifies the scalar distance between every pair of points and satisfies certain axioms.
Definition 1.8 (Metric space). A metric space is a pair (T, d) where T is a set and d is a distance
function d : T × T → R satisfying the following properties:

• d(p, q) = 0 if and only if p = q ∀p, q ∈ T;

• d(p, q) = d(q, p) ∀p, q ∈ T;

• d(p, q) ≤ d(p, r) + d(r, q) ∀p, q, r ∈ T.

It can be shown that the three axioms above imply d(p, q) ≥ 0 for every pair p, q ∈ T: indeed, 0 = d(p, p) ≤ d(p, q) + d(q, p) = 2 d(p, q). In a metric space T, an open metric ball with center c and radius r is defined to be the point set Bo(c, r) = {p ∈ T : d(p, c) < r}. Metric balls define a topology on a metric space.

Definition 1.9 (Metric space topology). Given a metric space T, the unions of metric balls {Bo(c, r) | c ∈ T and 0 < r ≤ ∞}, taken as the open sets, define a topology on T.

All definitions for general topological spaces apply to metric spaces with the above-defined topology. However, we give alternative definitions using the concept of limit points, which may be more intuitive.

As we mentioned already, at the heart of topology is the question of what it means for a set of points to be connected. After all, two distinct points cannot be adjacent to each other; they can only be connected to one another by passing through uncountably many intermediate points. The idea of limit points helps express this concept more concretely, specifically in the case of metric spaces.
We use the notation d(·, ·) to express minimum distances between point sets P, Q ⊆ T,

d(p, Q) = inf{d(p, q) : q ∈ Q} and


d(P, Q) = inf{d(p, q) : p ∈ P, q ∈ Q}.

Definition 1.10 (Limit point). Let Q ⊆ T be a point set. A point p ∈ T is a limit point of Q, also known as an accumulation point of Q, if for every real number ε > 0, however tiny, Q contains a point q ≠ p such that d(p, q) < ε.

In other words, there is an infinite sequence of points in Q that gets successively closer and
closer to p—without actually being p—and gets arbitrarily close. Stated succinctly, d(p, Q\{p}) =
0. Observe that it doesn’t matter whether p ∈ Q or not.
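For instance, p = 0 is a limit point of Q = {1/n : n ≥ 1} ⊂ R: every ε-ball around 0 contains some 1/n ≠ 0, even though 0 itself is not in Q. A one-line numeric check (our illustration, truncating Q at n = 10000):

    Q = [1.0 / n for n in range(1, 10001)]
    p = 0.0
    # d(p, Q \ {p}) for the truncated Q; it tends to 0 as more points are kept.
    print(min(abs(p - q) for q in Q if q != p))  # 0.0001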
To see the parallel between the definitions given in this subsection and the definitions given before, it is instructive to define limit points also for general topological spaces. In particular, a point p ∈ T is a limit point of a set Q ⊆ T if every open set containing p intersects Q \ {p}.

Definition 1.11 (Connected). A point set Q ⊆ T is called disconnected if Q can be partitioned


into two disjoint non-empty sets U and V so that there is no point in U that is a limit point of V,
and no point in V that is a limit point of U. (See the left in Figure 1.2 for an example.) If no such
partition exists, Q is connected, like the point set at right in Figure 1.2.

We can also distinguish between closed and open point sets using the concept of limit points.
Informally, a triangle in the plane is closed if it contains all the points on its edges, and open if it
excludes all the points on its edges, as illustrated in Figure 1.3. The idea can be formally extended
to any point set.

Figure 1.2: The point set at left is disconnected; it can be partitioned into two connected subsets
shaded differently. The point set at right is connected. The black point at the center is a limit point
of the points shaded lightly.


Figure 1.3: Closed, open, and relatively open point sets in the plane. Dashed edges and open
circles indicate points missing from the point set.

Definition 1.12 (Closure; Closed; Open). The closure of a point set Q ⊆ T, denoted Cl Q, is the
set containing every point in Q and every limit point of Q. A point set Q is closed if Q = Cl Q,
i.e. Q contains all its limit points. The complement of a point set Q is T \ Q. A point set Q is open
if its complement is closed, i.e. T \ Q = Cl (T \ Q).

For example, consider the open interval (0, 1) ⊂ R, which contains every r ∈ R so that
0 < r < 1. Let [0, 1] denote the closed interval (0, 1) ∪ {0} ∪ {1}. The numbers 0 and 1 are both limit
points of the open interval, so Cl (0, 1) = [0, 1] = Cl [0, 1]. Therefore, [0, 1] is closed and (0, 1) is
not. The numbers 0 and 1 are also limit points of the complement of the closed interval, R \ [0, 1],
so (0, 1) is open, but [0, 1] is not.
The definition of an open set of course depends on the space being considered. A triangle τ that is missing the points on its edges is open in the two-dimensional Euclidean space aff τ, its affine hull. However, it is not open in the Euclidean space R3. Indeed, every point in τ is a limit point of R3 \ τ, because we can find sequences of points that approach τ from the side. In recognition of this caveat, a simplex σ ⊂ Rd is said to be relatively open if it is open relative to its affine hull. Figure 1.3 illustrates this fact; in that example, the metric space is R2.
We can define the interior and boundary of a set using the notion of limit points also. Infor-
mally, the boundary of a point set Q is the set of points where Q meets its complement T \ Q. The
interior of Q contains all the other points of Q.

Definition 1.13 (Boundary; Interior). The boundary of a point set Q in a metric space T, denoted
Bd Q, is the intersection of the closures of Q and its complement; i.e. Bd Q = Cl Q ∩ Cl (T \ Q).
The interior of Q, denoted Int Q, is Q \ Bd Q = Q \ Cl (T \ Q).
For example, Bd [0, 1] = {0, 1} = Bd (0, 1) and Int [0, 1] = (0, 1) = Int (0, 1). The boundary
of a triangle (closed or open) in the Euclidean plane is the union of the triangle’s three edges, and
its interior is an open triangle, illustrated in Figure 1.3. The terms boundary and interior have a subtlety similar to that of open sets: the boundary of a triangle embedded in R3 is the whole triangle, and its interior is the empty set. However, relative to its affine hull, its interior and boundary are defined exactly as in the case of triangles embedded in the Euclidean plane. Interested readers can draw the analogy between this observation and the definition of the interior and boundary of a manifold that appears later in Definition 1.23.
We have seen a definition of compactness of a point set in a topological space (Definition 1.6). We define it differently here for metric spaces; for subspaces of a Euclidean space, the two definitions can be shown to be equivalent (the Heine-Borel theorem).

Definition 1.14 (Bounded; Compact). The diameter of a point set Q is sup{d(p, q) : p, q ∈ Q}. The set Q is bounded if its diameter is finite, and is unbounded otherwise. A point set Q in a metric space is compact if it is closed and bounded.
In the Euclidean space Rd, we can use the standard Euclidean distance as the choice of metric. On the surface of a coffee mug, we could choose the Euclidean distance too; alternatively, we could choose the geodesic distance, namely the length of the shortest path from p to q on the mug's surface.
Example 1.4 (Euclidean ball). In Rd , the Euclidean d-ball with center c and radius r, denoted
B(c, r), is the point set B(c, r) = {p ∈ Rd : d(p, c) ≤ r}. A 1-ball is an edge, and a 2-ball is called
a disk. A unit ball is a ball with radius 1. The boundary of the d-ball is called the Euclidean
(d − 1)-sphere and denoted S (c, r) = {p ∈ Rd : d(p, c) = r}. The name expresses the fact that we
consider it a (d − 1)-dimensional point set—to be precise, a (d − 1)-dimensional manifold—even
though it is embedded in d-dimensional space. For example, a circle is a 1-sphere, and a layman’s
“sphere” in R3 is a 2-sphere. If we remove the boundary from a ball, we have the open Euclidean
d-ball Bo (c, r) = {p ∈ Rd : d(p, c) < r}.
The topological spaces that are subspaces of a metric space such as Rd inherit their topology as a subspace topology. Examples of topological subspaces are the Euclidean d-ball Bd, the Euclidean d-sphere Sd, the open Euclidean d-ball Bdo, and the Euclidean halfball Hd, where

Bd = {x ∈ Rd : ‖x‖ ≤ 1},
Sd = {x ∈ Rd+1 : ‖x‖ = 1},
Bdo = {x ∈ Rd : ‖x‖ < 1},
Hd = {x ∈ Rd : ‖x‖ < 1 and xd ≥ 0}.

1.3 Maps, homeomorphisms, and homotopies


Equivalence of two topological spaces is determined by how the points that comprise them are
connected. For example, the surface of a cube can be deformed into a sphere without cutting or
gluing it because they are connected the same way. They have the same topology. This notion
of topological equivalence can be formalized via functions that send the points of one space to
points of the other while preserving the connectivity.
This preservation of connectivity is achieved by preserving the open sets. A function from one space to another that preserves the open sets is called a continuous function or a map. Continuity is a vehicle to define topological equivalence, but by itself it is not enough: a continuous function can send many points to a single point in the target space, or send no points to a given point in the target space. If the former does not happen, that is, when the function is injective, we call it an embedding of the domain into the target space. True equivalence is given by a homeomorphism, a bijective function from one space to another which is continuous and has a continuous inverse. This ensures that open sets are preserved in both directions.

Definition 1.15 (Continuous function; Map). A function f : T → U from the topological space
T to another topological space U is continuous if for every open set Q ⊆ U, f −1 (Q) is open.
Continuous functions are also called maps.

Definition 1.16 (Embedding). A map g : T → U is an embedding of T into U if g is injective.

A topological space can be embedded into a Euclidean space by assigning coordinates to its points so that the assignment is continuous and injective. For example, drawing a triangle on paper is an embedding of S1 into R2. There are topological spaces that cannot be embedded into a Euclidean space, or even into a metric space—these spaces cannot be represented by any metric.
Next, we define homeomorphisms, which connect two spaces that have essentially the same topology.

Definition 1.17 (Homeomorphism). Let T and U be topological spaces. A homeomorphism is a


bijective map h : T → U whose inverse is continuous too.
Two topological spaces are homeomorphic if there exists a homeomorphism between them.

Homeomorphism induces an equivalence relation among topological spaces, which is why two homeomorphic topological spaces are called topologically equivalent. Figure 1.4 shows pairs of homeomorphic topological spaces. A less obvious example is that the open d-ball Bdo is homeomorphic to the Euclidean space Rd , given by the homeomorphism h(x) = x/(1 − ‖x‖). The same map also exhibits that the halfball Hd is homeomorphic to the Euclidean halfspace {x ∈ Rd : xd ≥ 0}.

For maps between compact spaces, there is a weaker condition to be verified for homeomor-
phism because of the following property.

Proposition 1.1. If T and U are compact metric spaces, every bijective map from T to U has a
continuous inverse.

One can take advantage of this fact to prove that certain functions are homeomorphisms by
showing continuity only in the forward direction. When two topological spaces are subspaces of
the same larger space, a notion of similarity called isotopy exists which is stronger than homeo-
morphism. If two subspaces are isotopic, one can be continuously deformed to the other while
keeping the deforming subspace homeomorphic to its original form all the time. For example, a
solid cube can be continuously deformed into a ball in this manner.

Figure 1.4: Each point set in this figure is homeomorphic to the point set above or below it, but
not to any of the others. Open circles indicate points missing from the point set, as do the dashed
edges in the point sets second from the right.


Figure 1.5: Two tori knotted differently, one triangulated and the other not. Both are homeomor-
phic to the standard unknotted torus on the left, but not isotopic to it.

Homeomorphic subspaces are not necessarily isotopic. Consider a torus embedded in R3 , illustrated in Figure 1.5(a). One can embed the torus in R3 so that it is knotted, as shown in Figure 1.5(b) and (c). The knotted torus is homeomorphic to the standard, unknotted one. However,
it is not possible to continuously deform one to the other while keeping it embedded in R3 and
homeomorphic to the original. Any attempt to do so forces the torus to be “self-intersecting” and thus no longer a manifold. One way to look at this obstruction is by considering the topology
of the space around the tori. Although the knotted and unknotted tori are homeomorphic, their
complements in R3 are not. This motivates us to consider both the notion of an isotopy, in which
a torus deforms continuously, and the notion of an ambient isotopy, in which not only the torus
deforms; the entire R3 deforms with it.

Definition 1.18 (Isotopy). An isotopy connecting two spaces T ⊆ Rd and U ⊆ Rd is a continuous map ξ : T × [0, 1] → Rd where ξ(T, 0) = T, ξ(T, 1) = U, and for every t ∈ [0, 1], ξ(·, t) is a homeomorphism between T and its image {ξ(x, t) : x ∈ T}. An ambient isotopy connecting T and U is a map ξ : Rd × [0, 1] → Rd such that ξ(·, 0) is the identity function on Rd , ξ(T, 1) = U, and for each t ∈ [0, 1], ξ(·, t) is a homeomorphism.

For an example, consider the map

$$\xi(x, t) = \frac{1 - (1 - t)\|x\|}{1 - \|x\|}\, x$$

that sends the open d-ball Bdo to itself if t = 0, and to the Euclidean space Rd if t = 1. The
parameter t plays the role of time, that is, ξ(Bdo , t) deforms continuously from a ball at time zero
to Rd at time one. Thus, there is an isotopy between the open d-ball and Rd .
Every ambient isotopy becomes an isotopy if its domain is restricted from Rd × [0, 1] to
T × [0, 1]. It is known that if there is an isotopy between two subspaces, then there exists an
ambient isotopy between them. Hence, the two notions are equivalent.
There is another notion of similarity among topological spaces that is weaker than homeo-
morphism, called homotopy equivalence. It relates spaces that can be continuously deformed to
one another but the transformation may not preserve homeomorphism. For example, a ball can
shrink to a point, which is not homeomorphic to it because a bijective function from an infinite
point set to a single point cannot exist. However, homotopy preserves some form of connectivity,
such as the number of connected components, holes, and/or voids. This is why a coffee cup is
homotopy equivalent to a circle, but not to a ball or a point.
To get to homotopy equivalence, we first need the concept of homotopies, which are isotopies
sans the homeomorphism.

Definition 1.19 (Homotopy). Let g : X → U and h : X → U be maps. A homotopy is a map H : X × [0, 1] → U such that H(·, 0) = g and H(·, 1) = h. Two maps are homotopic if there is a homotopy connecting them.

For example, let g : B3 → R3 be the identity map on a unit ball and h : B3 → R3 be the map
sending every point in the ball to the origin. The fact that g and h are homotopic is demonstrated
by the homotopy H(x, t) = (1 − t) · g(x). Observe that H(B3 , t) continuously deforms a ball at time
zero to a point at time one. A key property of a homotopy is that, as H is continuous, at every
time t the map H(·, t) remains continuous.
For developing more intuition, consider two maps that are not homotopic. Let g : S1 → S1
be the identity map from the circle to itself, and let h : S1 → S1 map every point on the circle to
a single point p ∈ S1 . Although it may seem that we can contract a circle to a point, that
view is misleading because the map H is required to map every point on the circle at every time
to a point on the circle. The contraction of the circle to a point is possible only if we break the
continuity, say by cutting or gluing the circle somewhere.
Observe that a homeomorphism relates two topological spaces T and U whereas a homotopy
or an isotopy (which is a special kind of homotopy) relates two maps, thereby indirectly estab-
lishing a relationship between two subspaces g(X) ⊆ U and h(X) ⊆ U. That relationship is not
necessarily an equivalence, but the following one is.

Definition 1.20 (Homotopy equivalent). Two topological spaces T and U are homotopy equivalent
if there exist maps g : T → U and h : U → T such that h ◦ g is homotopic to the identity map
ιT : T → T and g ◦ h is homotopic to the identity map ιU : U → U.

Homotopy equivalence is indeed an equivalence relation, that is, if A, B and B, C are homotopy equivalent pairs of spaces, then so is the pair A, C. Homeomorphic spaces necessarily have the same dimension, though homotopy equivalent spaces may have different dimensions.

Figure 1.6: All three of the topological spaces are homotopy equivalent, because they are all deformation retracts of the leftmost space.

To gain more
intuition about homotopy equivalent spaces, we show why a 2-ball is homotopy equivalent to a
single point p. Consider a map h : B2 → {p} and a map g : {p} → B2 where g(p) is any point q
in B2 . Observe that h ◦ g is the identity map on {p}, which is trivially homotopic to itself. In the
other direction, g ◦ h : B2 → B2 sends every point in B2 to q. A homotopy between g ◦ h and the
identity map idB2 is given by the map H(x, t) = (1 − t)q + tx.
A useful intuition for understanding the definition of homotopy equivalent spaces can be
derived from the fact that two spaces T and U are homotopy equivalent if and only if there exists
a third space X so that both T and U are deformation retracts of X; see Figure 1.6.
Definition 1.21 (Deformation retract). Let T be a topological space, and let U ⊂ T be a subspace.
A retraction r of T to U is a map from T to U such that r(x) = x for every x ∈ U. The space U is
a deformation retract of T if the identity map on T can be continuously deformed to a retraction
with no motion of the points already in U: specifically, there is a homotopy called deformation
retraction R : T × [0, 1] → T such that R(·, 0) is the identity map on T, R(·, 1) is a retraction of T
to U, and R(x, t) = x for every x ∈ U and every t ∈ [0, 1].
Fact 1.1. If U is a deformation retract of T, then T and U are homotopy equivalent.
For example, any point on a line segment (open or closed) is a deformation retract of the
line segment and is homotopy equivalent to it. The letter M is a deformation retract of the letter
W, and also of a 1-ball. Moreover, as we said before, two spaces are homotopy equivalent if
they are deformation retracts of a common space. The symbols ∅, ∞, and  (viewed as
one-dimensional point sets) are deformation retracts of a double doughnut—a doughnut with
two holes. Therefore, they are homotopy equivalent to each other, though none of them is a
deformation retract of any of the others because one is not a subspace of the other. They are not
homotopy equivalent to A, X, O, ⊕, , }, a ball, nor a coffee cup.

1.4 Manifolds
A manifold is a topological space that is locally connected in a particular way. A 1-manifold
has this local connectivity looking like a segment. A 2-manifold (with boundary) has the local
connectivity looking like a complete or partial disc. In layman’s terms, a 2-manifold has the
structure of a piece of paper or rubber sheet, possibly with the boundaries glued together forming
a closed surface—a category that includes disks, spheres, tori, and Möbius bands.
Definition 1.22 (Manifold). A topological space M is an m-manifold, or simply manifold, if every point x ∈ M has a neighborhood homeomorphic to Bmo or Hm . The dimension of M is m.

Every manifold can be partitioned into boundary and interior points. Observe that these words
mean very different things for a manifold than they do for a metric space or topological space.

Definition 1.23 (Boundary; Interior). The interior Int M of an m-manifold M is the set of points in
M that have a neighborhood homeomorphic to Bm o . The boundary Bd M of M is the set of points
M \ Int M. The boundary Bd M, if not empty, consists of the points that have a neighborhood
homeomorphic to Hm . If Bd M is the empty set, we say that M is without boundary.

A single point, a 0-ball, is a 0-manifold without boundary according to this definition. The
closed disk B2 is a 2-manifold whose interior is the open disk B2o and whose boundary is the circle
S1 . The open disk B2o is a 2-manifold whose interior is B2o and whose boundary is the empty set.
This highlights an important difference between Definitions 1.13 and 1.23 of “boundary”: when
B2o is viewed as a point set in the space R2 , its boundary is S1 according to Definition 1.13; but
viewed as a manifold, its boundary is empty according to Definition 1.23. The boundary of a
manifold is always included in the manifold.
The open disk B2o , the Euclidean space R2 , the sphere S2 , and the torus are all connected 2-
manifolds without boundary. The first two are homeomorphic to each other, but the last two are
not. The sphere and the torus in R3 are compact (bounded and closed with respect to R3 ) whereas
B2o and R2 are not.
A d-manifold, d ≥ 2, can have orientations whose formal definition we skip here. Informally,
we say that a 2-manifold M is non-orientable if, starting from a point p, one can walk on one
side of M and end up on the opposite side of M upon returning to p. Otherwise, M is orientable.
Spheres and balls are orientable, whereas the Möbius band in Figure 1.7 (a) is a non-orientable
2-manifold with boundary.


Figure 1.7: (a) A Möbius band. (b) Removal of the red and green loops opens up the torus into a
topological disk. (c) A double torus: every surface without boundary in R3 resembles a sphere or
a conjunction of one or more tori. (d) Double torus knotted.

A surface is a 2-manifold that is a subspace of Rd . Any compact surface without boundary in R3 is an orientable 2-manifold. To be non-orientable, a compact surface must have a nonempty boundary (like the Möbius band) or be embedded in a 4- or higher-dimensional Euclidean space.
A surface can sometimes be disconnected by removing one or more loops (connected 1-
manifolds without boundary) from it. The genus of an orientable and compact surface without
boundary is g if 2g is the maximum number of loops that can be removed from the surface without
disconnecting it; here the loops are permitted to intersect each other. For example, the sphere has
genus zero as every loop cuts it into two discs. The torus has genus one: a circular cut around
its neck and a second circular cut around its circumference, illustrated in Figure 1.7(b), allow it
to unfold into a topological disk. A third loop would cut it into two pieces. Figure 1.7(c) and (d)
each shows a 2-manifold without boundary of genus 2. Although a high-genus surface can have
a very complex shape, all compact 2-manifolds in R3 that have the same genus and no boundary
are homeomorphic to each other.

1.4.1 Smooth manifolds


A purely topological manifold has no geometry. But if we embed it in a Euclidean space, it could
appear smooth or wrinkled. We now introduce a “geometric” manifold by imposing a differential
structure on it. For the rest of this chapter, we focus only on manifolds without boundary.
Consider a map φ : U → W where U and W are open sets in Rk and Rd , respectively. The map
φ has d components, namely φ(x) = (φ1 (x), φ2 (x), . . . , φd (x)), where x = (x1 , x2 , . . . , xk ) denotes
a point in Rk . The Jacobian of φ at x is the d × k matrix of the first-order partial derivatives
$$\begin{pmatrix} \dfrac{\partial\phi_1(x)}{\partial x_1} & \cdots & \dfrac{\partial\phi_1(x)}{\partial x_k}\\[2mm] \vdots & \ddots & \vdots\\[2mm] \dfrac{\partial\phi_d(x)}{\partial x_1} & \cdots & \dfrac{\partial\phi_d(x)}{\partial x_k} \end{pmatrix}$$
The map φ is regular if its Jacobian has rank k at every point in U. The map φ is C i -continuous if
the ith-order partial derivatives of φ are continuous.
The reader may be familiar with parametric surfaces, for which U is a 2-dimensional param-
eter space and its image φ(U) in d-dimensional space is a parametric surface. Unfortunately, a
single parametric surface cannot easily represent a manifold with a complicated topology. How-
ever, for a manifold to be smooth, it suffices that each point on the manifold has a neighborhood
that looks like a smooth parametric surface.

Definition 1.24 (Smooth embedded manifold). For any i > 0, an m-manifold M without boundary
embedded in Rd is C i -smooth if for every point p ∈ M, there exists an open set U p ⊂ Rm , a
neighborhood W p ⊂ Rd of p, and a map φ p : U p → W p ∩ M such that (i) φ p is C i -continuous, (ii)
φ p is a homeomorphism, and (iii) φ p is regular. If m = 2, we call M a C i -smooth surface.

The first condition says that each map is continuously differentiable at least i times. The
second condition requires each map to be bijective, ruling out “wrinkles” where multiple points
in U map to a single point in W. The third condition prohibits any map from having a directional
derivative of zero at any point in any direction. The first and third conditions together enforce
smoothness, and imply that there is a well-defined tangent m-flat at each point in M. The three
conditions together imply that the maps φ p defined in the neighborhood of each point p ∈ M
overlap smoothly. There are two extremes of smoothness. We say that M is C ∞ -smooth if for
every point p ∈ M, the partial derivatives of φ p of all orders are continuous. On the other hand,
M is nonsmooth if M is an m-manifold (therefore C 0 -smooth) but not C 1 -smooth.

1.5 Functions on smooth manifolds


Figure 1.8: (a) The graph of a function f : R2 → R. (b) The graph of a function f : R → R with
critical points marked.

In previous sections, we introduced topological spaces, including the special case of (smooth)
manifolds. Very often, a space can be equipped with continuous functions defined on it. In this
section, we focus on real-valued functions of the form f : X → R defined on a topological space
X, also called scalar functions; see Figure 1.8 (a) for the graph of a function f : R2 → R. Scalar
functions commonly appear in practice to describe space/data of interest (e.g., the elevation function defined on the surface of the earth). We are interested in the topological structures behind
scalar functions. In this section, we limit our discussion to nicely behaved scalar functions (called
Morse functions) defined on smooth manifolds. Their topological structures are characterized
by the so-called critical points which we will introduce below. Later in the book we will also
discuss scalar functions on simplicial complex domains, as well as more complex maps defined
on a space X, e.g., a multivariate function f : X → Rd .

1.5.1 Gradients and critical points


In what follows, for simplicity of presentation, we assume that we consider smooth (C ∞ -continuous)
functions and smooth manifolds embedded in Rd , even though often we only require the functions
(resp. manifolds) to be C 2 -continuous (resp. C 2 -smooth).
To provide intuition, let us start with a smooth scalar function defined on the real line: f :
R → R; the graph of such a function is shown in Figure 1.8 (b) on the right. Recall that the
derivative of a function at a point x ∈ R is defined as:
$$D f(x) = \frac{d}{dx} f(x) = \lim_{t\to 0} \frac{f(x + t) - f(x)}{t}. \qquad (1.1)$$
The value D f (x) gives the rate of change of the value of f at x. This can be visualized as the slope
of the tangent line of the graph of f at (x, f (x)). The critical points of f are the set of points x
such that D f (x) = 0. For a function defined on the real line, there are two types of critical points
in the generic case: maxima and minima, as marked in the figure.

Now suppose we have a smooth function f : Rd → R defined on Rd . Fix an arbitrary point x ∈ Rd . As we move a little around x within its local neighborhood, the rate of change of f differs
depending on which direction we move. This gives rise to the directional derivative Dv f (x) at x
in direction (i.e., a unit vector) v ∈ Sd−1 , where Sd−1 is the unit (d − 1)-sphere, defined as:
$$D_v f(x) = \lim_{t\to 0} \frac{f(x + t\cdot v) - f(x)}{t} \qquad (1.2)$$

The gradient vector of f at x ∈ Rd intuitively captures the direction of steepest increase of function
f . More precisely, we have:

Definition 1.25 (Gradient for functions on Rd ). Given a smooth function f : Rd → R, the gradient
vector field ∇ f : Rd → Rd is defined as follows: for any x ∈ Rd ,
$$\nabla f(x) = \Big(\frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \cdots, \frac{\partial f}{\partial x_d}(x)\Big)^T, \qquad (1.3)$$

where (x1 , x2 , . . . , xd ) represents an orthonormal coordinate system for Rd . The vector ∇ f (x) ∈ Rd
is called the gradient vector of f at x. A point x ∈ Rd is a critical point if ∇ f (x) = [0 0 · · · 0]T ;
otherwise, x is regular.

Observe that for any v ∈ Rd , the directional derivative satisfies Dv f (x) = ⟨∇ f (x), v⟩. It then follows that ∇ f (x) ∈ Rd is along the unit vector v where Dv f (x) is maximized among the directional derivatives in all unit directions around x; and its magnitude ‖∇ f (x)‖ equals the
value of this maximum directional derivative. The critical points of f are those points where
the directional derivative vanishes in all directions – locally, the rate of change for f is zero no
matter which direction one deviates from x. See Figure 1.9 for the three types of critical points,
minimum, saddle point, and maximum, for a generic smooth function f : R2 → R.
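For a concrete feel of Definition 1.25, the following sketch (ours, not from the book; the helper name grad and the step size h are arbitrary choices) estimates ∇ f by central differences and evaluates it at a critical point and at a regular point of f (x, y) = x² + y².

```python
import numpy as np

def grad(f, x, h=1e-6):
    """Central-difference estimate of the gradient of Definition 1.25."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

f = lambda x: x[0]**2 + x[1]**2     # only critical point: the origin

print(grad(f, [0.0, 0.0]))          # ~[0. 0.]: a critical point
print(grad(f, [1.0, 2.0]))          # ~[2. 4.]: direction of steepest increase
```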
Finally, we can extend the above definitions of gradients and critical points to a smooth func-
tion f : M → R defined on a smooth Riemannian m-manifold M. Here, a Riemannian manifold
is a manifold equipped with a Riemannian metric, which is a smoothly varying inner product de-
fined on the tangent spaces. This allows the measurement of lengths, which is needed to define the gradient. At a point x ∈ M, denote the tangent space of M at x by TM x , which is the m-dimensional vector space consisting of all tangent vectors of M at x. For example, TM x is an m-dimensional linear space Rm for an m-dimensional manifold M embedded in the Euclidean space Rd , with the Riemannian metric (inner product in the tangent space) induced from Rd .
The gradient ∇ f is a vector field on M, that is, ∇ f : M → TM maps every point x ∈ M to
a vector ∇ f (x) ∈ TM x in the tangent space of M at x. Similar to the case for a function defined
on Rd , the gradient vector field ∇ f satisfies that for any x ∈ M and v ∈ TM x , ⟨∇ f (x), v⟩ gives
rise to the directional derivative Dv f (x) of f in direction v, and ∇ f (x) still specifies the direction
of steepest increase of f along all directions in TM x with its magnitude being the maximum rate
of change. More formally, we have the following definition, analogous to Definition 1.25 for the
case of a smooth function on Rd .

Definition 1.26 (Gradient vector field; Critical points). Given a smooth function f : M → R
defined on a smooth m-dimensional Riemannian manifold M, the gradient vector field ∇ f : M →

Figure 1.9: Top row, from left to right: the graph of the function around the non-degenerate critical points of a smooth function on R2 (minimum of index 0, saddle of index 1, maximum of index 2), and a degenerate critical point, called a “monkey saddle”. For example, for an index-0 critical point p, its local neighborhood can be written as $f(x) = f(p) + x_1^2 + x_2^2$, making p a local minimum. Bottom row: the local (closed) neighborhood of the corresponding critical point in the domain R2 , where the dark blue colored regions are the portion of the neighborhood of p whose function value is at most f (p).

TM is defined as follows: for any x ∈ M, let (x1 , x2 , . . . , xm ) be a local coordinate system in a neighborhood of x with orthonormal unit vectors xi ; the gradient at x is

$$\nabla f(x) = \Big(\frac{\partial f}{\partial x_1}(x), \frac{\partial f}{\partial x_2}(x), \cdots, \frac{\partial f}{\partial x_m}(x)\Big)^T.$$
A point x ∈ M is critical if ∇ f (x) vanishes, in which case f (x) is called a critical value for f .
Otherwise, x is regular.
It follows from the chain rule that the criticality of a point x is independent of the local
coordinate system being used.

1.5.2 Morse functions and Morse Lemma


From the first-order derivatives of a function we can determine critical points. We can learn more about the “type” of the critical points by inspecting the second-order derivatives of f .
Definition 1.27 (Hessian matrix; Non-degenerate critical points). Given a smooth m-manifold M,
the Hessian matrix of a twice differentiable function f : M → R at x is the matrix of second-order
partial derivatives,
$$\mathrm{Hessian}(x) = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1 \partial x_1}(x) & \dfrac{\partial^2 f}{\partial x_1 \partial x_2}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_m}(x)\\[2mm] \dfrac{\partial^2 f}{\partial x_2 \partial x_1}(x) & \dfrac{\partial^2 f}{\partial x_2 \partial x_2}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_m}(x)\\[2mm] \vdots & \vdots & \ddots & \vdots\\[2mm] \dfrac{\partial^2 f}{\partial x_m \partial x_1}(x) & \dfrac{\partial^2 f}{\partial x_m \partial x_2}(x) & \cdots & \dfrac{\partial^2 f}{\partial x_m \partial x_m}(x) \end{pmatrix},$$
where (x1 , x2 , . . . , xm ) is a local coordinate system in a neighborhood of x.



A critical point x of f is non-degenerate if its Hessian matrix Hessian(x) is non-singular (has non-zero determinant); otherwise, it is a degenerate critical point.
For example, consider f : R2 → R defined by f (x, y) = x3 − 3xy2 . The origin (0, 0) is a
degenerate critical point often referred to as a “monkey saddle”: see the last picture in Figure 1.9,
where the graph of the function around (0, 0) goes up and down three times (instead of twice as for
a non-degenerate saddle shown in the second picture). It turns out that, as a consequence of the
Morse Lemma below, non-degenerate critical points are always isolated whereas the degenerate
ones may not be so. A simple example is f : R2 → R defined by f (x, y) = x2 , where all points on
the y-axis are degenerate critical points. The local neighborhood of non-degenerate critical points
can be completely characterized by the following Morse Lemma:
Proposition 1.2 (Morse Lemma). Given a smooth function f : M → R defined on a smooth
m-manifold M, let p be a non-degenerate critical point of f . Then there is a local coordinate
system in a neighborhood U(p) of p so that (i) the coordinate of p is (0, 0, . . . , 0), and (ii) locally
for every point x = (x1 , x2 , ..., xm ) in neighborhood U(p),
$$f(x) = f(p) - x_1^2 - \cdots - x_s^2 + x_{s+1}^2 + \cdots + x_m^2, \quad \text{for some } s \in [0, m].$$
The number s of minus signs in the above quadratic representation of f (x) is called the index of
the critical point p.
For a smooth function f : M → R defined on a 2-manifold M, an index-0, index-1, or index-2
(non-degenerate) critical point corresponds to a minimum, a saddle, or a maximum, respectively.
For a function defined on a m-manifold, non-degenerate critical points include minima (index-0),
maxima (index-m), and m − 1 types of saddle points.
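For a non-degenerate critical point, the index can be computed in practice as the number of negative eigenvalues of the Hessian, which matches the number s of minus signs in the Morse Lemma. A small sketch (ours, using the sympy library; the helper name classify_origin is our own) that reproduces the examples discussed here:

```python
import sympy as sp

x, y = sp.symbols('x y')

def classify_origin(f):
    """Gradient, Hessian determinant, and number of negative Hessian
    eigenvalues (the Morse index when non-degenerate) of f at (0, 0)."""
    g = [sp.diff(f, v).subs({x: 0, y: 0}) for v in (x, y)]
    H = sp.hessian(f, (x, y)).subs({x: 0, y: 0})
    neg = sum(m for ev, m in H.eigenvals().items() if ev < 0)
    return g, H.det(), neg

print(classify_origin(x**2 + y**2))      # ([0, 0], 4, 0): index-0 minimum
print(classify_origin(x**2 - y**2))      # ([0, 0], -4, 1): index-1 saddle
print(classify_origin(x**3 - 3*x*y**2))  # ([0, 0], 0, 0): degenerate monkey saddle
```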
The behavior of degenerate critical points is more complicated to characterize. Instead, we
now introduce a family of “nice” functions, called Morse functions whose critical points cannot
be degenerate.
Definition 1.28 (Morse function). A smooth function f : M → R defined on a smooth manifold
M is a Morse function if and only if: (i) none of f ’s critical points are degenerate; and (ii) the
critical points have distinct function values.
Limiting our study only to well-behaved Morse functions is not too restrictive as the Morse
functions form an open and dense subset of the space of all smooth functions C ∞ (M) on M. So
in this sense, a generic function is a Morse function. Moreover, it is much cleaner to
characterize the topology induced by such a function, which we do now.

1.5.3 Connection to topology


We now characterize how critical points influence the topology of M induced by the scalar func-
tion f : M → R.
Definition 1.29 (Interval, sub, and superlevel sets). Given f : M → R and I ⊆ R, the interval
levelset of f w.r.t. I is defined as:
MI = f −1 (I) = {x ∈ M | f (x) ∈ I}.
The case for I = (−∞, a] is also referred to as sublevel set M≤a := f −1 ((−∞, a]) of f , while
M≥a := f −1 ([a, ∞)) is called the superlevel set; and f −1 (a) is called the levelset of f at a ∈ R.


Figure 1.10: (a) The height function defined on a torus with critical points u, v, w, and z. (b) – (f):
Passing through an index-k critical point is the same as attaching a k-cell from the homotopy point
of view. For example, M≤a+ε for a = f (v) (as shown in (d)) is homotopy equivalent to attaching a
1-cell (shown in (c)) to M≤a−ε (shown in (b)) for an infinitesimal positive ε.

Given f : M → R, imagine sweeping M with increasing function values of f . It turns out that the topology of the sublevel sets can only change when we sweep through critical values of f . More precisely, we have the following classical result, where a diffeomorphism is a homeomorphism that is smooth in both directions.
Theorem 1.3 (Homotopy type of sublevel sets). Let f : M → R be a smooth function defined
on a manifold M. Given a < b, suppose the interval levelset M[a,b] = f −1 ([a, b]) is compact and
contains no critical points of f . Then M≤a is diffeomorphic to M≤b .
Furthermore, M≤a is a deformation retract of M≤b , and the inclusion map i : M≤a ↪ M≤b is
a homotopy equivalence.
As an illustration, consider the example of the height function f : M → R defined on a vertical torus as shown in Figure 1.10 (a). There are four critical points of the height function f : u (minimum), v, w (saddles), and z (maximum). We have that M≤a is (1) empty for a < f (u); (2) homeomorphic to a 2-disk for f (u) < a < f (v); (3) homeomorphic to a cylinder for f (v) < a < f (w); (4) homeomorphic to a compact genus-one surface with a circle as boundary for f (w) < a < f (z); and (5) the whole torus for a > f (z).
Theorem 1.3 states that the homotopy type of the sublevel set remains the same until it passes
a critical point. For Morse functions, we can also characterize the homotopy type of sublevel sets
around critical points, captured by attaching k-cells.
Specifically, recall that Bk is the k-dimensional unit Euclidean ball, and its boundary is Sk−1 ,
the (k − 1)-dimensional sphere. Let X be a topological space, and g : Sk−1 → X a continuous
map. For k > 0, attaching a k-cell to X (w.r.t. g) is obtained by attaching the k-cell Bk to X along
its boundary as follows: first, take the disjoint union of X and Bk , and next, identify all points
x ∈ Sk−1 with g(x) ∈ X. For the special case of k = 0, attaching a 0-cell to X is obtained by simply
taking the disjoint union of X and a single point.
The following theorem states that, from the homotopy point of view, sweeping past an index-k
critical point is equivalent to attaching a k-cell to the sublevel set. See Figure 1.10 for illustrations.
Theorem 1.4. Given a Morse function f : M → R defined on a smooth manifold M, let p be an index-k critical point of f with α = f (p). Assume f −1 ([α − ε, α + ε]) is compact for a sufficiently small ε > 0 such that there is no critical point of f other than p contained in this interval levelset. Then the sublevel set M≤α+ε has the same homotopy type as M≤α−ε with a k-cell attached to its boundary Bd M≤α−ε .
Finally, we state the well-known Morse inequalities, connecting critical points with the so-
called Betti numbers of the domain which we will define formally in Section 2.5. In particular,
fixing a field coefficient, the i-th Betti number is the rank of the so-called i-th (singular) homology
group of a topological space X.
Theorem 1.5 (Morse inequalities). Let f be a Morse function on a smooth compact d-manifold
M. For 0 ≤ i ≤ d, let ci denote the number of critical points of f with index i, and βi be the i-th
Betti number of M. We then have:
• ci ≥ βi for all i ≥ 0; and $\sum_{i=0}^{d} (-1)^i c_i = \sum_{i=0}^{d} (-1)^i \beta_i$. (weak Morse inequality)

• ci − ci−1 + ci−2 − · · · ± c0 ≥ βi − βi−1 + βi−2 · · · ± β0 for all i ≥ 0. (strong Morse inequality)
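As a quick check, the height function on the torus of Figure 1.10 has c0 = 1, c1 = 2, and c2 = 1, while the torus has Betti numbers β0 = 1, β1 = 2, and β2 = 1; the weak inequalities hold with equality, and both alternating sums equal 1 − 2 + 1 = 0, the Euler characteristic of the torus.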

1.6 Notes and Exercises


A good source on point set topology is Munkres [242]. The concepts of various maps and man-
ifolds are well described in Hatcher [186]. Books by Guillemin and Pollack [179] and Mil-
nor [232, 233] are good sources for Morse theory on smooth manifolds and differential topology
in general.

Exercises
1. A space is called Hausdorff if every two distinct points have disjoint open sets containing them.
(a) Give an example of a space that is not Hausdorff.
(b) Give an example of a space that is Hausdorff.
(c) Show the above examples on the same ground set T.
2. In every space T, the point sets ∅ and T are both closed and open.
(a) Give an example of a space that has more than two sets that are both closed and open,
and list all of those sets.
(b) Explain the relationship between the idea of connectedness and the number of sets
that are both closed and open.
3. A topological space T is called path connected if any two points x, y ∈ T can be joined by
a path, i.e., there exists a continuous map f : [0, 1] → T of the segment [0, 1] ⊂ R into T
so that f (0) = x and f (1) = y. Prove that a path connected space is also connected but the
converse may not be true; however, if T is finite, then the two notions are equivalent.
4. Prove that for every subset X of a metric space, Cl Cl X = Cl X. In other words, augmenting
a set with its limit points does not give it more limit points.

5. Show that any metric on a finite set induces a discrete topology.

6. Prove that the metric is a continuous function on the Cartesian space T × T of a metric space
T.

7. Give an example of a bijective function that is continuous, but its inverse is not. In light of
Proposition 1.1, the spaces need to be non-compact.

8. A space is called normal if it is Hausdorff and for any two disjoint closed sets X and Y,
there are disjoint open sets U X ⊃ X and UY ⊃ Y. Show that any metric space is normal.
Show the same for any compact space.

9. Let f : T → U be a continuous function of a compact space T into another space U. Prove that the image f (T) is compact.

10. (a) Construct an explicit deformation retraction of Rk \ {o} onto Sk−1 where o denotes the
origin. Also, show Rk ∪ {∞} is homeomorphic to Sk .
(b) Show that any d-dimensional finite convex polytope is homeomorphic to the d-dimensional
unit ball Bd .

11. Deduce that homeomorphism is an equivalence relation. Show that the relation of homo-
topy among maps is an equivalence relation.

12. Consider the function f : R3 → R defined as $f(x_1, x_2, x_3) = 3x_1^2 + 3x_2^2 - 9x_3^2$. Show that the origin (0, 0, 0) is a critical point of f . Give the index of this critical point. Let S denote the unit sphere centered at the origin. Show that $f^{-1}((-\infty, 0]) \cap S$ is homotopy equivalent to two points, whereas $f^{-1}([0, \infty)) \cap S$ is homotopy equivalent to S1 , the unit 1-sphere (i.e., circle).
Chapter 2

Complexes and Homology Groups

This chapter introduces two very basic tools on which topological data analysis (TDA) is built.
One is simplicial complexes and the other is homology groups. Data supplied as a discrete set
of points do not have an interesting topology. Usually, we construct a scaffold on top of it which
is commonly taken as a simplicial complex. It consists of vertices at the data points, edges con-
necting them, triangles, tetrahedra and their higher dimensional analogues that establish higher
order connectivity. Section 2.1 formalizes this construction. There are different kinds of simplicial complexes. Some are easier to compute but take more space; others are sparser but take more time to compute. Section 2.2 presents an important construction called the nerve
and a complex called the Čech complex which is defined on this construction. This section also
presents a commonly used complex in topological data analysis called the Vietoris-Rips complex
that interleaves with the Čech complexes in terms of containment. In Section 2.3, we introduce
some of the complexes which are sparser in size than the Vietoris-Rips or Čech complexes.
The second topic of this chapter concerns the homology groups of a simplicial complex, the essential algebraic structures with which TDA analyzes data. Homology groups of a topological space capture the space of cycles up to the ones called boundaries that bound “higher dimensional” subsets. For simplicity, we introduce the concept in the context of simplicial complexes instead of
topological spaces. This is called simplicial homology. The essential entities for defining the ho-
mology groups are chains, cycles, and boundaries which we cover in Section 2.4. For simplicity
and also for the relevance in TDA, we define these structures under Z2 -additions.
Section 2.5 defines the simplicial homology group of a simplicial complex as the quotient
space of the cycles with respect to the boundaries. Some of the related concepts to homology
groups such as induced homology under a map, singular homology groups for general topological
spaces, relative homology groups of a complex with respect to a subcomplex, and the dual concept
of homology groups called cohomology groups are also introduced in this section.

2.1 Simplicial complex


A complex is a collection of some basic elements that satisfy certain properties. In a simplicial
complex, these basic elements are simplices.


Definition 2.1 (Simplex). For k ≥ 0, a k-simplex σ in a Euclidean space Rm is the convex hull¹ of a set P of k + 1 affinely independent points in Rm . In particular, a 0-simplex is a vertex, a 1-simplex is an edge, a 2-simplex is a triangle, and a 3-simplex is a tetrahedron. A k-simplex is said to have dimension k. For 0 ≤ k′ ≤ k, a k′-face (or, simply a face) of σ is a k′-simplex that is the convex hull of a nonempty subset of P. Faces of σ come in all dimensions from zero (σ’s vertices) to k; and σ is a face of σ. A proper face of σ is a simplex that is the convex hull of a proper subset of P, i.e., any face except σ itself. The (k − 1)-faces of σ are called facets of σ; σ has k + 1 facets.

In Figure 2.1(left), triangle abd is a 2-simplex which has three vertices as 0-faces and three
edges as 1-faces. These are proper faces out of which edges are its facets. Similarly, a tetra-
hedron has four 0-faces (vertices), six 1-faces (edges), four 2-faces (triangles), and one 3-face
(tetrahedron itself) out of which vertices, edges, triangles are proper. The triangles are facets.

Definition 2.2 (Geometric simplicial complex). A geometric simplicial complex K, also known
as a triangulation, is a set containing finitely² many simplices that satisfies the following two
restrictions.

• K contains every face of each simplex in K.

• For any two simplices σ, τ ∈ K, their intersection σ ∩ τ is either empty or a face of both σ
and τ.

The dimension k of K is the maximum dimension of any simplex in K, which is why we also refer to it as a simplicial k-complex.

The above definition of simplicial complexes is very geometric, which is why they are referred to as geometric simplicial complexes. Figure 2.1 shows such a geometric simplicial 2-complex in
R2 (left) and another in R3 (right). There is a parallel notion of simplicial complexes that is devoid
of geometry.

Definition 2.3 (Abstract simplex and simplicial complex). A collection K of non-empty subsets of a given set V(K) is an abstract simplicial complex if every element σ ∈ K has all of its non-empty subsets σ′ ⊆ σ also in K. Each such element σ with |σ| = k + 1 is called a k-simplex (or simply a simplex). Each subset σ′ ⊆ σ with |σ′| = k′ + 1 is called a k′-face (or, simply a face) of σ, and σ with |σ| = k + 1 is called a k-coface (or, simply a coface) of σ′. Sometimes, σ′ is also called a face of σ with co-dimension k − k′. Also, a (k − 1)-face ((k + 1)-coface resp.) of a k-simplex is called its facet (cofacet resp.). The elements of V(K) are the vertices of K. Each k-simplex in K is said to have dimension k. We also say K is a simplicial k-complex if the top dimension of any simplex in K is k.

Remark 2.1. The collection K can possibly be empty in which case V(K) is empty though a
non-empty K cannot have the empty set as one of its elements by definition.
¹Convex hull of a set of given points p0 , . . . , pk in Rm is the set of all points x ∈ Rm that are convex combinations of the given points, i.e., $x = \sum_{i=0}^{k} \alpha_i p_i$ for αi ≥ 0 and Σαi = 1.
²Topologists usually define complexes so they have countable cardinality. We restrict complexes to finite cardinality here.

Figure 2.1: (left) A simplicial complex with six vertices, eight edges, and one triangle, (right) A
simplicial 2-complex triangulating a 2-manifold in R3 .

A geometric simplicial complex K in Rm is called a geometric realization of an abstract simplicial complex K′ if and only if there is an embedding e : V(K′) → Rm that takes every k-simplex {v0 , v1 , . . . , vk } in K′ to a k-simplex in K that is the convex hull of e(v0 ), e(v1 ), . . . , e(vk ). For example, the complex drawn in R2 in Figure 2.1(left) is a geometric realization of the abstract complex with vertices a, b, c, d, e, f , eight 1-simplices {a, b}, {a, d}, {a, f }, {b, c}, {b, d}, {c, d}, {d, e}, {d, f }, and one 2-simplex {a, b, d}.
Any simplicial k-complex can be geometrically realized in R2k+1 by mapping the vertices generically to the moment curve C(t) in R2k+1 given by the parameterization $C(t) = (t, t^2, \cdots, t^{2k+1})$. Also, an abstract simplicial complex K with m vertices can always be geometrically realized in Rm−1 as a subcomplex of a geometric (m − 1)-simplex. To make the realization canonical, we choose the (m − 1)-simplex to be in Rm with a vertex vi having the ith coordinate to be 1 and all other coordinates 0. We define K’s underlying space as the underlying space of this canonical geometric realization.
Definition 2.4 (Underlying space). The underlying space of an abstract simplicial complex K, denoted |K|, is the pointwise union of its simplices in its canonical geometric realization; that is, $|K| = \bigcup_{\sigma \in K} |\sigma|$, where |σ| is the restriction of this realization to σ. In case K is geometric, its geometric realization can be taken as itself.
Because of the equivalence between geometric and abstract simplicial complexes, we drop the qualifiers “geometric” and “abstract” and call them simply simplicial complexes when it is clear from the context which one we actually mean. Also, sometimes, we denote a simplex σ = {v0 , v1 , · · · , vk } simply as v0 v1 · · · vk .
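Abstract simplicial complexes have an immediate computer representation as collections of vertex sets. The sketch below (our illustration; the helper names closure and is_complex are not from the book) encodes the complex of Figure 2.1(left) by its maximal simplices and verifies the closure condition of Definition 2.3.

```python
from itertools import combinations

def closure(simplices):
    """Generate the abstract simplicial complex whose maximal simplices
    are given, by closing under taking non-empty faces."""
    K = set()
    for s in simplices:
        for k in range(1, len(s) + 1):
            K.update(frozenset(f) for f in combinations(s, k))
    return K

def is_complex(K):
    """Check the condition of Definition 2.3."""
    return all(frozenset(f) in K
               for s in K for k in range(1, len(s) + 1)
               for f in combinations(s, k))

# the complex of Figure 2.1(left), generated by its maximal simplices
K = closure([{'a','b','d'}, {'a','f'}, {'b','c'},
             {'c','d'}, {'d','e'}, {'d','f'}])
print(len(K), is_complex(K))   # 15 True: 6 vertices, 8 edges, 1 triangle
```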
Definition 2.5 (k-skeleton). For any k ≥ 0, the k-skeleton of a simplicial complex K, denoted by
K k , is the subcomplex formed by all simplices of dimension at most k.
In Figure 2.1, the 1-skeleton of the simplicial complex on left consists of six vertices a, b, c,
d, e, f and eight edges adjoining them.

Stars and links. Given a simplex τ ∈ K, its star in K is the set of simplices that have τ as a face, denoted by St(τ) = {σ ∈ K | τ ⊆ σ} (recall that τ ⊆ σ means that τ is a face of σ). Generally, the star is not closed under the face relation and hence is not a simplicial complex. We can make it so by adding all missing faces. The result is the closed star, denoted by $\overline{St}(\tau) = \bigcup_{\sigma \in St(\tau)} \big( \{\sigma\} \cup \{\sigma' \in K \mid \sigma' \subset \sigma\} \big)$, which is also the smallest subcomplex that contains the star. The link of τ consists of the set of simplices in the closed star that are disjoint from τ, that is, $Lk(\tau) = \{\sigma \in \overline{St}(\tau) \mid \sigma \cap \tau = \emptyset\}$. Intuitively, we can think of the star (resp. the closed star) of a vertex as an open (resp. closed) neighborhood around it, and the link as the boundary of that neighborhood.
In Figure 2.1(left), we have

• St(a) = {{a}, {a, b}, {a, d}, {a, f }, {a, b, d}}, $\overline{St}(a)$ = St(a) ∪ {{b}, {d}, { f }, {b, d}}

• St( f ) = {{ f }, {a, f }, {d, f }}, $\overline{St}(f)$ = St( f ) ∪ {{a}, {d}}

• St({a, b}) = {{a, b}, {a, b, d}}, $\overline{St}(\{a, b\})$ = St({a, b}) ∪ {{a}, {b}, {d}, {a, d}, {b, d}}

• Lk(a) = {{b}, {d}, { f }, {b, d}}, Lk( f ) = {{a}, {d}}, Lk({a, b}) = {{d}}.
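These operations translate directly into set manipulations on the representation sketched earlier. A minimal sketch (ours; the helper names faces, star, closed_star, and link are our own) that recomputes Lk(a) for the complex of Figure 2.1(left):

```python
from itertools import combinations

def faces(s):
    """All non-empty faces of a simplex given as a frozenset."""
    return {frozenset(f) for k in range(1, len(s) + 1)
            for f in combinations(s, k)}

def star(K, tau):                      # simplices having tau as a face
    return {s for s in K if tau <= s}

def closed_star(K, tau):
    return set().union(*(faces(s) for s in star(K, tau)))

def link(K, tau):                      # closed-star simplices disjoint from tau
    return {s for s in closed_star(K, tau) if not (s & tau)}

# the complex of Figure 2.1(left), built from its maximal simplices
K = set().union(*(faces(frozenset(s)) for s in
                  [{'a','b','d'}, {'a','f'}, {'b','c'},
                   {'c','d'}, {'d','e'}, {'d','f'}]))
print(sorted(map(sorted, link(K, frozenset('a')))))
# [['b'], ['b', 'd'], ['d'], ['f']] agrees with Lk(a) above
```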

Triangulation of a manifold. Given a simplicial complex K and a manifold M, we say that K


is a triangulation of M if the underlying space |K| is homeomorphic to M. Note that if M is a
k-manifold, the dimension of K is also k. Furthermore, for any vertex v ∈ K, the underlying space
|St(v)| of the star St(v) is homeomorphic to the open k-ball Bko if v maps to an interior point in M
and to the k-dimensional halfspace Hk if v maps to a point on the boundary of M. The underlying
space |Lk(v)| of the link Lk(v) is homeomorphic to the (k − 1)-sphere Sk−1 if v maps to the interior, and to a closed (k − 1)-ball Bk−1 otherwise.

Simplicial map. Corresponding to the continuous functions (maps) between topological spaces,
we have a notion called simplicial map between simplicial complexes.

Definition 2.6 (Simplicial map). A map f : K1 → K2 is called simplicial if for every simplex
{v0 , . . . , vk } ∈ K1 , we have the simplex { f (v0 ), . . . , f (vk )} in K2 .

A simplicial map is called a vertex map if the domain and codomain of f are only vertex sets
V(K1 ) and V(K2 ) respectively. Every simplicial map is associated with a vertex map. However, a
vertex map f : V(K1 ) → V(K2 ) does not necessarily extend to a simplicial map from K1 to K2 .

Fact 2.1. Every continuous function f : |K1 | → |K2 | can be approximated closely by a simplicial map g on appropriate subdivisions of K1 and K2 . The approximation being ‘close’ means that, for a point x ∈ |K1 |, there is a simplex in K2 which contains both f (x) and g(x) in the geometric realization.

There is also a counterpart of homotopic maps in the simplicial setting.

Definition 2.7 (Contiguous maps). Two simplicial maps f1 : K1 → K2 , f2 : K1 → K2 are contiguous if for every simplex σ ∈ K1 , f1 (σ) ∪ f2 (σ) is a simplex in K2 .

Contiguous maps play an important role in topological analysis. We use a result involving
contiguous maps and homology groups. We defer stating it till Section 2.5 where we introduce
homology groups.

2.2 Nerves, Čech and Rips complex


Recall Definition 1.6 of covers from Chapter 1. A cover of a topological space defines a special
simplicial complex called its nerve. The nerve plays an important role in bridging topological
spaces to complexes which we will see below and also later in Chapter 9. We first define the
nerve in general terms which can be specialised to covers easily.
Definition 2.8 (Nerve). Given a finite collection of sets U = {Uα }α∈A , we define the nerve of the
set U to be the simplicial complex N(U) whose vertex set is the index set A, and where a subset
{α0 , α1 , . . . , αk } ⊆ A spans a k-simplex in N(U) if and only if Uα0 ∩ Uα1 ∩ . . . ∩ Uαk , ∅.


Figure 2.2: Examples of two spaces (left), open covers of them (middle), and their nerves (right).
(Top) the intersections of covers are contractible, (bottom) the intersections of covers are not
necessarily contractible.

Taking U to be a cover of a topological space in the above definition, one gets a nerve of a
cover. Figure 2.2 shows two topological spaces, their covers, and corresponding nerves.
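For finite sets, Definition 2.8 is directly algorithmic: a set of indices spans a simplex exactly when the corresponding sets share an element. A brute-force sketch (ours; the helper name nerve is our own, and real implementations enumerate far more cleverly):

```python
from itertools import combinations

def nerve(sets, max_dim=2):
    """Nerve of a finite collection of finite sets, following
    Definition 2.8; simplices are tuples of indices into `sets`."""
    N = []
    for k in range(1, max_dim + 2):   # k vertices span a (k-1)-simplex
        N += [idx for idx in combinations(range(len(sets)), k)
              if set.intersection(*(sets[i] for i in idx))]
    return N

# four sets covering a "circle" of four elements; the nerve is a 4-cycle
U = [{1, 2}, {2, 3}, {3, 4}, {4, 1}]
print(nerve(U))
# [(0,), (1,), (2,), (3,), (0, 1), (0, 3), (1, 2), (2, 3)]
```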
One important result involving nerves is the so-called Nerve Theorem, which has different forms depending on the type of topological spaces and covers. Adapting to our need, we state
it for metric spaces (Definition 1.8) which are a special type of topological spaces as we have
observed in Chapter 1.
Theorem 2.1 (Nerve Theorem [45, 300]). Given a finite cover U (open or closed) of a metric
space M, the underlying space |N(U)| is homotopy equivalent to M if every non-empty intersection
∩ki=0 Uαi of cover elements is homotopy equivalent to a point, that is, contractible.
The cover in the top row of Figure 2.2 satisfies the property of the above theorem and its nerve
is homotopy equivalent to M whereas the same is not true for the cover shown in the bottom row.
Given a finite subset P of a metric space (M, d), we can build an abstract simplicial complex
called Čech complex with vertices in P using the concept of nerve.
Definition 2.9 (Čech complex). Let (M, d) be a metric space and P be a finite subset of M. Given a real r > 0, the Čech complex Cr (P) is defined to be the nerve of the set {B(pi , r)}, where

B(pi , r) = {x ∈ M | d(pi , x) ≤ r}

is the geodesic closed ball of radius r centered at pi .

Figure 2.3: (left) Čech complex Cr (P), (right) Rips complex VRr (P).

Observe that if M is Euclidean, the balls considered for the Čech complex are necessarily convex and hence their non-empty intersections are contractible. By Theorem 2.1, the Čech complex in this case is homotopy equivalent to the union of the balls. The Čech complex is related to another complex called the Vietoris-Rips complex which is often used in topological data analysis.

Definition 2.10 (Vietoris-Rips complex). Let (P, d) be a finite metric space. Given a real r > 0, the
Vietoris-Rips (Rips in short) complex is the abstract simplicial complex VRr (P) where a simplex
σ ∈ VRr (P) if and only if d(p, q) ≤ 2r for every pair of vertices of σ.

Notice that the 1-skeleton of VRr (P) determines all of its simplices. It is the completion (in
terms of simplices) of its 1-skeleton; see Figure 2.3. Also, we observe the following fact.

Fact 2.2. Let P be a finite subset of a metric space (M, d) where M satisfies the property that, for
any real r > 0 and two points p, q ∈ M with d(p, q) ≤ 2r, the metric balls B(p, r) and B(q, r) have
non-empty intersection. Then, the 1-skeletons of VRr (P) and Cr (P) coincide.

Notice that if M is Euclidean, it satisfies the condition stated in the above fact and hence for
finite point sets in any Euclidean space, Čech and Rips complexes defined with Euclidean balls
share the same 1-skeleton. However, for a general finite metric space (P, d), it may happen that
for some p, q ∈ P, one has d(p, q) ≤ 2r and B(p, r) ∩ B(q, r) = ∅.
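Because the Rips complex is determined by its 1-skeleton, it can be computed by thresholding pairwise distances and filling in cliques. A brute-force sketch (ours; the helper name rips is our own, and practical implementations avoid enumerating all vertex subsets):

```python
from itertools import combinations
import numpy as np

def rips(points, r, max_dim=2):
    """Vietoris-Rips complex VR_r(P) up to dimension max_dim: a simplex
    enters iff all pairwise vertex distances are at most 2r."""
    P = np.asarray(points, float)
    n = len(P)
    close = lambda i, j: np.linalg.norm(P[i] - P[j]) <= 2 * r
    K = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):
        K += [s for s in combinations(range(n), k)
              if all(close(i, j) for i, j in combinations(s, 2))]
    return K

pts = [(0, 0), (1, 0), (0.5, 0.9)]
print(rips(pts, 0.6))   # all edges and the triangle appear since 2r = 1.2
```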
An easy but important observation is that the Rips and Čech complexes interleave.

Proposition 2.2. Let P be a finite subset of a metric space (M, d). Then,

Cr (P) ⊆ VRr (P) ⊆ C2r (P).

Proof. The first inclusion is obvious: if there is a point x in the intersection $\cap_{i=1}^{k} B(p_i, r)$, then the distance d(pi , p j ) for every pair (i, j), 1 ≤ i, j ≤ k, is at most 2r. It follows that every simplex {p1 , . . . , pk } ∈ Cr (P) is also in VRr (P).
To prove the second inclusion, consider a simplex {p1 , . . . , pk } ∈ VRr (P). Since by definition of the Rips complex d(pi , p1 ) ≤ 2r for every pi , i = 1, . . . , k, we have $p_1 \in \cap_{i=1}^{k} B(p_i, 2r)$, so this intersection is non-empty. Then, by definition, {p1 , . . . , pk } is also a simplex in C2r (P). □

Figure 2.4: Every triangle in a Delaunay complex has an empty open circumdisk.

2.3 Sparse complexes


The Rips and Čech complexes are often too large to handle in practice. For example, the Rips complex with n points in Rd can have $\Omega(n^d)$ simplices. In practice, they can become large even in dimensions as low as three. Just to give a sense of the scale of the problem, we note that the Rips or Čech complex built out of a few thousand points often has triangles in the range of millions. There are other complexes that are much sparser in size, because of which they may sometimes be preferred for computations.

2.3.1 Delaunay complex


This is a special complex that can be constructed out of a point set P ⊂ Rd . This complex embeds in Rd (in the generic setting). Because of its various optimal properties, this complex is used in many applications involving mesh generation, in particular in R2 and R3 ; see [98]. However, computation of Delaunay complexes in dimensions beyond R3 can be time intensive, so they are not yet the preferred choice for applications in those higher dimensions.

Definition 2.11 (Delaunay simplex; Complex). In the context of a finite point set P ⊂ Rd , a k-
simplex σ is Delaunay if its vertices are in P and there is an open d-ball whose boundary contains
its vertices and is empty—contains no point in P. Note that any number of points in P can lie on
the boundary of this ball. But, for simplicity, we will assume that only the vertices of σ are on the
boundary of its empty ball. A Delaunay complex of P, denoted Del P, is a (geometric) simplicial
complex with vertices in P in which every simplex is Delaunay and |Del P| coincides with the
convex hull of P, as illustrated in Figure 2.4.

In R2 , a Delaunay complex of a set of points in general position is made out of Delaunay


triangles and all of their lower dimensional faces. Similarly, in R3 , a Delaunay complex is made
out of Delaunay tetrahedra and all of their lower dimensional faces.

Fact 2.3. Every non-degenerate point set (no d + 2 points are co-spherical) admits a unique
Delaunay complex.

Delaunay complexes are dual to the famous Voronoi diagrams defined below.

Definition 2.12 (Voronoi diagram). Given a finite point set P ⊂ Rd in generic position, the
Voronoi diagram Vor (P) of P is the tessellation of the embedding space Rd into convex cells
V p for every p ∈ P where

V p = {x ∈ Rd | d(x, p) ≤ d(x, q) ∀q ∈ P}.

A k-face of Vor (P) is the intersection of (d − k + 1) Voronoi cells.

Fact 2.4. For P ⊂ Rd , Del (P) is the nerve of the set of Voronoi cells {V p } p∈P which is a closed
cover of Rd .

The above fact actually provides a duality between the Delaunay complex and the Voronoi diagram. It is expressed by the duality among their faces. Specifically, a Delaunay k-simplex in Del (P) is
Figure 2.4 is shown in Figure 2.5.
The following optimality properties make Delaunay complexes useful for applications.

Fact 2.5. A triangulation of a point set P ⊂ Rd is a geometric simplicial complex whose vertex
set is P and whose simplices tessellate the convex hull of P. Among all triangulations of a point
set P ⊂ Rd , Del P achieves the following optimized criteria:

1. In R2 , Del P maximizes the minimum angle of triangles in the complex.

2. In R2 , Del P minimizes the largest circumcircle for triangles in the complex.

3. For a simplex in Del P, let its min-ball be the smallest ball that contains the simplex in it.
In all dimensions, Del P minimizes the largest min-ball.

1-skeletons of Delaunay complexes in R2 are plane graphs and hence Delaunay complexes in R2 have size Θ(n) for n points. They can be computed in Θ(n log n) time. In R3 , their size grows to Θ(n²) and they can be computed in Θ(n²) time. In Rd , d ≥ 3, Delaunay complexes have size $\Theta(n^{\lceil d/2 \rceil})$ and can be computed in optimal time $\Theta(n^{\lceil d/2 \rceil})$ [92].
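In low dimensions, Delaunay complexes are readily available in standard libraries; for instance, scipy.spatial.Delaunay wraps the Qhull code and returns the top-dimensional Delaunay simplices, from which the lower faces can be read off. A brief sketch (ours):

```python
from itertools import combinations
import numpy as np
from scipy.spatial import Delaunay

rng = np.random.default_rng(0)
P = rng.random((20, 2))                 # 20 random points in the plane

tri = Delaunay(P)                       # Delaunay complex via Qhull
print(tri.simplices[:5])                # a few triangles, as point indices

# the lower-dimensional Delaunay simplices are the faces of these
# triangles; e.g., the set of Delaunay edges:
edges = {frozenset(e) for t in tri.simplices for e in combinations(t, 2)}
print(len(edges))                       # Theta(n) edges in the plane
```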

Alpha complex. Alpha complexes are subcomplexes of the Delaunay complexes which are
parameterized by a real α ≥ 0. For a given point set P and α ≥ 0, an alpha complex consists
of all simplices in Del (P) that have a circumscribing ball of radius at most α. It can also be
described alternatively as a nerve. For each point p ∈ P, let B(p, α) denote a closed ball of radius
α centering p. Consider the closed set Dαp defined as follows:

Dαp = {x ∈ B(p, α) | d(x, p) ≤ d(x, q) ∀q ∈ P}

The alpha complex Del α (P) is the nerve of the closed sets {Dαp } p∈P . Another interpretation of the alpha complex stems from its relation to the Voronoi diagram of the point set P. The alpha complex contains a k-simplex σ = {p0 , . . . , pk } if and only if ∪ p∈P B(p, α) meets the intersection of Voronoi cells V p0 ∩ V p1 · · · ∩ V pk . Figure 2.5 shows an alpha complex for the point set in Figure 2.4 for a fixed α. The Voronoi diagram is shown with dotted segments.

Figure 2.5: Alpha complex of the point set in Figure 2.4 for an α indicated in the figure. The
Voronoi diagram of the point set is shown with dotted edges. The triangles and edges in the complex are shown with solid edges, which form a subset of the Delaunay complex.

2.3.2 Witness complex


The witness complex defined by de Silva and Carlsson [114] sidesteps the size problem by a
subsampling strategy. First, we define the witness complex with two point sets, P called the
witnesses and Q called the landmarks. The complex is built with vertices in the landmarks where
the simplices are defined with a notion of witness from the witness set. Given a point set P
equipped with pairwise distances d : P × P → R, we can build the witness complex on a finite
subsample Q ⊆ P.

Definition 2.13 (Weak witness). Let P be a point set with a real-valued function on pairs d : P × P → R and let Q ⊆ P be a finite subset. A simplex σ = {q1 , . . . , qk } with qi ∈ Q is weakly witnessed by x ∈ P \ Q if d(q, x) ≤ d(p, x) for every q ∈ {q1 , . . . , qk } and p ∈ Q \ {q1 , . . . , qk }.

We now define the witness complex using the notion of weak witnesses.

Definition 2.14 (Witness complex). Let P, Q be point sets as in Definition 2.13. The witness complex W(Q, P) is defined as the collection of all simplices all of whose faces are weakly witnessed by a point in P \ Q.

Observe that a simplex which is weakly witnessed may not have all its faces weakly witnessed
(Exercise 7). This is why the definition above forces the condition to have a simplicial complex.
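Since a simplex is weakly witnessed by x precisely when its vertices are the landmarks nearest to x, the weakly witnessed simplices are prefixes of each witness's distance-sorted landmark list (assuming tie-free distances). The sketch below (ours; the helper name witness_complex is our own, and it is restricted to small dimensions for brevity) assembles W(Q, P) per Definitions 2.13 and 2.14.

```python
from itertools import combinations
import numpy as np

def witness_complex(witnesses, landmarks, max_dim=2):
    """W(Q, P) up to dimension max_dim, as index tuples into landmarks.
    witnesses: the points of P \\ Q; landmarks: Q. Assumes generic
    (tie-free) distances."""
    W, Q = np.asarray(witnesses, float), np.asarray(landmarks, float)
    witnessed = set()
    for x in W:
        # a simplex is weakly witnessed by x iff it consists of the
        # nearest landmarks to x, i.e., it is a prefix of this order
        order = np.argsort([np.linalg.norm(x - q) for q in Q])
        for k in range(1, max_dim + 2):
            witnessed.add(frozenset(order[:k].tolist()))
    # keep a simplex only if every face of it is weakly witnessed
    return sorted(tuple(sorted(s)) for s in witnessed
                  if all(frozenset(f) in witnessed
                         for k in range(1, len(s))
                         for f in combinations(s, k)))

rng = np.random.default_rng(1)
pts = rng.random((200, 2))
idx = rng.choice(200, size=8, replace=False)
print(witness_complex(np.delete(pts, idx, axis=0), pts[idx]))
```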
When P = Rd equipped with Euclidean distance and Q is a finite subset of it, we have the
notion of strong witness.

Definition 2.15 (Strong witness). Let Q ⊂ Rd be a finite set. A simplex σ = {q1 , . . . , qd } with
qi ∈ Q is strongly witnessed by x ∈ Rd if d(q, x) ≤ d(p, x) for every q ∈ {q1 , . . . , qd } and
p ∈ Q \ {q1 , . . . , qd } and additionally, d(q1 , x) = · · · = d(qd , x).

When Q ⊂ Rd as in the above definition, the following fact holds [113].




Figure 2.6: A witness complex constructed out of the points in Figure 2.4 where landmarks are
the black dots and the witness points are the hollow dots. The witnesses for five edges and the
triangle are the centers of the six circles; e.g., the triangle q1 q2 q3 and the edge q1 q3 are weakly
witnessed by the points p1 and p2 respectively.

Proposition 2.3. A simplex σ is strongly witnessed if and only if every face τ ≤ σ is weakly
witnessed.

Furthermore, when Q ⊂ Rd , we have some connections of the witness complex to the Delau-
nay complex. By definition, we know the following:

Fact 2.6. Let Q be a finite subset of Rd . Then a simplex σ is in the Delaunay triangulation Del Q if and only if σ is strongly witnessed by a point in Rd .

By combining the above fact and the observation that every simplex in a witness complex is
strongly witnessed, we have the following result which was observed by de Silva [113].

Proposition 2.4. If P is a finite subset of Rd and Q ⊆ P, then W(Q, P) ⊆ Del Q.

One important implication of the above observation is that the witness complexes for point samples in a Euclidean space are embedded in that space.
The concept of the witness complex has a parallel in the concept of the restricted Delaunay complex. When the set P in Proposition 2.4 is not necessarily finite, but is a subset X of Rd , and Q is finite, we can relate W(Q, X) to the restricted Delaunay complex Del|X Q defined as the collection of Delaunay simplices in Del Q whose Voronoi duals have non-empty intersection with X.

Proposition 2.5.

1. W(Q, Rd ) = Del|Rd Q := Del Q [113].

2. W(Q, M) = Del| M Q if M ⊆ Rd is a smooth 1- or 2-manifold [11].

3. W(Q, P) = Del| M Q where P and Q are sufficiently dense samples of a 1-manifold M in R2 ; the result does not extend to other cases of submanifolds embedded in Euclidean spaces [178].

2.3.3 Graph induced complex


Except for smooth curves in the plane, the witness complex may fail to capture the topology of a manifold even if the input sample is dense. One can modify it with extra structures, such as putting weights on the points and changing the metric to weighted distances, to tackle this problem as shown in [40]. But this becomes clumsy in terms of the 'practicality' of a solution. We study another complex called the graph induced complex (GIC) [124] which also uses subsampling, but is more powerful in capturing topology and, in some cases, geometry. The advantage of the GIC over the witness complex is that the GIC is not necessarily a subcomplex of the Delaunay complex and hence contains a few more simplices which aid topology inference. But, for the same reason, it may not embed in the Euclidean space where its input vertices lie.
In the following definition, the minimization argmin d(p, Q) may yield a set instead of a single point, in which case the nearest point map ν is set to choose any point in the set.

Definition 2.16. Let (P, d) be a metric space where P is a finite set and G(P) be a graph with vertices in P. Let Q ⊆ P and let ν : P → Q be a map that sets ν(p) to be any point in argmin d(p, Q). The graph induced complex (GIC) G(G(P), Q, d) is the simplicial complex containing a k-simplex σ = {q1 , . . . , qk+1 }, qi ∈ Q, if and only if there exists a (k + 1)-clique in G(P) spanned by vertices {p1 , . . . , pk+1 } ⊆ P so that qi = ν(pi ) for each i ∈ {1, 2, . . . , k + 1}. To see that it is indeed a simplicial complex, observe that a subset of a clique is also a clique.

Figure 2.7: A graph induced complex shown with bold vertices, edges, and a shaded triangle on the left. The input graph within the shaded triangle is shown on the right. The 3-clique with three different colors (shown inside the shaded triangle on the right) causes the shaded triangle on the left to be in the graph induced complex.

Input graph G(P). The input point set P can be a finite sample of a subset X of a Euclidean space, such as a manifold or a compact subset. In this case, we may consider the input graph G(P) to be the neighborhood graph Gα (P) := (P, E) where there is an edge {p, q} ∈ E if and only if d(p, q) ≤ α. The intuition is that if P is a sufficiently dense sample of X, then Gα (P) captures the local neighborhoods of the points in X. Figure 2.7 shows a graph induced complex for point data in the plane with a neighborhood graph where d is the Euclidean metric. To emphasize the dependence on α we use the notation Gα (P, Q, d) := G(Gα (P), Q, d).

Subsample Q. Of course, the ability to capture the topology of the sampled space after subsampling with Q depends on the quality of Q. We quantify this quality with a parameter δ > 0.

Definition 2.17. A subset Q ⊆ P is called a δ-sample of a metric space (P, d), if the following
condition holds:

• ∀p ∈ P, there exists a q ∈ Q, so that d(p, q) ≤ δ.

Q is called δ-sparse if the following condition holds:

• ∀(q, r) ∈ Q × Q with q ≠ r, d(q, r) ≥ δ.

The first condition ensures that Q is a good sample of P with respect to the parameter δ, and the second condition enforces that the points in Q cannot be too close relative to the distance δ.

Metric d. The metric d assumed in the metric space (P, d) will be of two types in our discussion below: (i) the Euclidean metric, denoted dE ; (ii) the graph metric dG derived from the input graph G(P), where dG (p, q) is the shortest path distance between p and q in the graph G(P) assuming its edges have non-negative weights such as their Euclidean lengths.
We state two inference results involving the GIC below. The first result is about reconstructing a surface from its sample. The other result is about inferring the one-dimensional homology group from a sample. We introduce the homology groups in the next section. The reader can skip this result and come back to it after consulting the relevant definitions later. Also, for details, we refer to [124]. In the following theorem, ρ denotes the 'reach' of the manifold, an intrinsic feature w.r.t. which the sampling needs to be dense. We define it more precisely in Definition 6.8 of Chapter 6.

Theorem 2.6. Let M ⊂ R3 be a smooth, compact, and connected surface. If 8ε ≤ δ ≤ (2/27)ρ, α ≥ 8ε, P is an ε-sample of (M, dE ), and Q ⊆ P is a δ-sparse δ-sample of (P, dE ), then a triangulation T of M exists as a subcomplex of Gα (P, Q, dE ) which can be computed efficiently.

In the next theorem, dG is the graph metric where the input graph is Gα (P) for some α ≥ 0, constructed using the Euclidean metric with which the input P is equipped.

Theorem 2.7. Let P be an ε-sample of an embedded smooth, compact manifold M in a Euclidean space with reach ρ, and Q a δ-sample of (P, dG ). For 4ε ≤ α, δ ≤ (1/3)√(3/5) ρ, the map h∗ : H1 (VRα (P)) → H1 (Gα (P, Q, dG )) is an isomorphism where h : VRα (P) → Gα (P, Q, dG ) is the simplicial map induced by the nearest point map νdG : P → Q.

Instead of stating other homology inference results precisely, we give some empirical results
involving homology groups just to emphasize the advantage of GICs over other complexes in this
respect. Again, the readers unfamiliar with the homology groups can consult the next section.

An empirical example. When equipped with an appropriate metric, the GIC can decipher the topology from data. It retains the simplicity of the Rips complex as well as the sparsity of the witness complex. It does not build a Rips complex on the subsample and thus is sparser than the Rips complex with the same set of vertices. This fact makes a real difference in practice, as experiments show.

Figure 2.8 shows experimental results on two data sets: 40,000 sample points from a Klein bottle in R4 and 15,000 sample points from the so-called primary circle of natural image data considered in R25 . The graphs connecting any two points within α = 0.05 unit distance for the Klein bottle and α = 0.6 unit distance for the primary circle were taken as input for the graph induced complexes. The 2-skeletons of the Rips complexes for these α parameters have 608,200 and 1,329,672,867 simplices respectively. These sizes are too large to carry out fast computations.
[Plots (a)-(d): estimated dimension of H1 (left column) and complex size in log scale (right column), as functions of δ, for the witness complex, Rips complex, and graph induced complex.]

Figure 2.8: Comparison results for the Klein bottle in R4 (top row) and the primary circle in R25 (bottom row). The estimated β1 computed from the three complexes is shown on the left, and their sizes are shown on a log scale on the right; images taken from [124].

For comparisons, we constructed the graph induced complex, a sparsified version of the Rips complex (Section 6.2), and the witness complex on the same subsample determined by a parameter δ. The parameter δ is also used in the graph induced complex and the witness complex. The edges in the Rips complex built on the same subsample were of lengths at most α + 2δ. One of the main uses of the sparse complexes in TDA is to infer homology groups (covered in the next section) from samples. To compare the GIC with the sparse Rips and witness complexes, we varied δ and observed the rank of the one-dimensional homology group (β1 ). As evident from the plots, the graph induced complex captured β1 correctly for a significantly wider range of δ (left plots) while its size remained comparable to that of the witness complex (right plots). In some cases, the graph induced complex could capture the correct β1 with a remarkably small number of simplices. For example, it had β1 = 2 for the Klein bottle with 278 simplices for δ = 0.7 and 154

simplices for δ = 1.0. In both cases the Rips and witness complexes had the wrong β1 , while the Rips complex had a much larger size (log scale plot) and the witness complex had a comparable size. This illustrates why the graph induced complex can be a better choice than the Rips and witness complexes.

Constructing a GIC. One may wonder how to construct graph induced complexes efficiently. Experiments show that the following procedure runs quite fast in practice. It takes advantage of computing nearest neighbors within a range and, more importantly, computing cliques only in a sparsified graph.
Let the ball B(q, δ) in metric d be called the δ-cover of the point q. A graph induced complex Gα (P, Q, d) where Q is a δ-sparse δ-sample can be built easily by identifying δ-covers with a rather standard greedy (farthest point) iterative algorithm. Let Qi = {q1 , . . . , qi } be the point set sampled so far from P. We maintain the invariants (i) Qi is δ-sparse, and (ii) every point p ∈ P that lies in the union of δ-covers ∪_{q∈Qi} B(q, δ) has its closest point ν(p) ∈ argmin_{q∈Qi} d(p, q) in Qi identified. To augment Qi to Qi+1 = Qi ∪ {qi+1 }, we choose a point qi+1 ∈ P that is outside the union ∪_{q∈Qi} B(q, δ). Certainly, qi+1 is at least δ units away from all points in Qi , thus satisfying the first invariant. For the second invariant, we check every point p in the δ-cover of qi+1 and update ν(p) to qi+1 if its distance to qi+1 is smaller than the distance d(p, ν(p)). At the end, we obtain a sample Q ⊆ P whose δ-covers cover the entire point set P; Q is thus a δ-sample of (P, d), and it is also δ-sparse due to the invariants maintained. Next, we construct the simplices of Gα (P, Q, d). This requires identifying cliques in Gα (P) that have vertices with different closest points in Q. We delete every edge pp′ from Gα (P) where ν(p) = ν(p′ ). Then, we determine every clique {p1 , . . . , pk } in the remaining sparsified graph and include the simplex {ν(p1 ), . . . , ν(pk )} in Gα (P, Q, d). The main saving here is that many cliques of the original graph are removed before it is processed for clique computation. A minimal code sketch of this procedure follows.
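The following Python fragment is a minimal sketch of the procedure above, not an implementation from [124]; it assumes P is a list of hashable, orderable point labels, d a distance function, and G_edges the edge set of Gα (P) (all hypothetical names). It computes a δ-sparse δ-sample Q together with the nearest-point map ν, then sparsifies the graph and reports the triangles of the GIC.

    from itertools import combinations

    def greedy_delta_sample(P, d, delta):
        # Invariants of the text: Q stays delta-sparse, and nu[p] tracks the
        # nearest point of Q for every p seen so far.
        Q, nu, uncovered = [], {p: None for p in P}, set(P)
        while uncovered:
            q = uncovered.pop()        # q lies outside all current delta-covers
            Q.append(q)
            for p in P:
                if d(p, q) <= delta:
                    uncovered.discard(p)
                if nu[p] is None or d(p, q) < d(p, nu[p]):
                    nu[p] = q
        return Q, nu

    def gic_triangles(G_edges, P, Q, nu):
        # Drop edges whose endpoints share a nearest landmark, then map the
        # 3-cliques of the sparsified graph through nu.
        adj = {p: set() for p in P}
        for p, q in G_edges:
            if nu[p] != nu[q]:
                adj[p].add(q); adj[q].add(p)
        triangles = set()
        for p in P:
            for q, r in combinations(sorted(adj[p]), 2):
                if r in adj[q]:
                    triangles.add(frozenset({nu[p], nu[q], nu[r]}))
        return triangles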

Next, we focus on the second topic of this chapter, namely homology groups. They are algebraic structures that quantify topological features of a space. Homology does not capture all topological aspects of a space in the sense that two spaces with the same homology groups may not be topologically equivalent. However, two spaces that are topologically equivalent must have isomorphic homology groups. It turns out that homology groups are computationally tractable in many cases, thus making them attractive for topological data analysis. Before we introduce their definition and variants in Section 2.5, we need the important notions of chains, cycles, and boundaries given in the following section.

2.4 Chains, cycles, boundaries


2.4.1 Algebraic structures
First, we recall briefly the definitions of some standard algebraic structures that are used in the
book. For details we refer the reader to any standard book on algebra, e.g. [14].

Definition 2.18 (Group; Homomorphism; Isomorphism). A set G together with a binary operation
‘+’ is a group if it satisfies the following properties: (i) for every a, b ∈ G, a + b ∈ G, (ii) for
every a, b, c ∈ G, (a + b) + c = a + (b + c), (iii) there is an identity element denoted 0 in G so that

a + 0 = 0 + a = a for every a ∈ G, and (iv) there is an inverse −a ∈ G for every a ∈ G so that


a + (−a) = 0. If the operation + commutes, that is, a + b = b + a for every a, b ∈ G, then G is
called abelian. A subset H ⊆ G is a subgroup of (G, +) if (H, +) is also a group.
Definition 2.19 (Free abelian group; Basis; Rank; Generator). An abelian group G is called free
if there is a subset B ⊆ G so that every element of G can be written uniquely as a finite sum of
elements in B and their inverses disregarding trivial cancellations a + b = a + c − c + b. Such a
set B is called a basis of G and its cardinality is called its rank. If the condition of uniqueness is
dropped, then B is called a generator of G and we also say B generates G.
Definition 2.20 (Coset; Quotient). For a subgroup H ⊆ G and an element a ∈ G, the left coset
is aH = {a + b | b ∈ H} and the right coset is Ha = {b + a | b ∈ H}. For abelian groups, the
left and right cosets are identical and hence are simply called cosets. If G is abelian, the quotient
group of G with a subgroup H ⊆ G is given by G/H = {aH | a ∈ G} where the group operation
is inherited from G as aH + bH = (a + b)H for every a, b ∈ G.
Definition 2.21 (Homomorphism; Isomorphism; Kernel; Image; Cokernel). A map h : G → H
between two groups (G, +) and (H, ∗) is called a homomorphism if h(a + b) = h(a) ∗ h(b) for
every a, b ∈ G. If, in addition, h is bijective, it is called an isomorphism. Two groups G and
H with an isomorphism between them are called isomorphic, denoted G ≅ H. The kernel, image, and cokernel of a homomorphism h : G → H are defined as ker h = {a ∈ G | h(a) = 0}, Im h = {b ∈ H | ∃ a ∈ G with h(a) = b}, and the quotient group coker h = H/Im h, respectively.
Definition 2.22 (Ring). A set R equipped with two binary operations, addition ‘+’ and multiplica-
tion ‘·’ is called a ring if (i) R is an abelian group with the addition, (ii) the multiplication is asso-
ciative, that is, (a·b)·c = a·(b·c) and is distributive with the addition, that is, a·(b+c) = a·b+a·c,
∀a, b, c ∈ R, and (iii) there is an identity for the multiplication.
The additive identity of a ring R is usually denoted as 0 whereas the multiplicative identity is
denoted as 1. Observe that, by the definition of abelian group, the addition is commutative. How-
ever, the multiplication need not be so. When the multiplication is also commutative, R is called
a commutative ring. A commutative ring in which every nonzero element has a multiplicative
inverse is called a field.
Definition 2.23 (Module). Given a commutative ring R with multiplicative identity 1, an R-module M is an abelian group with an operation R × M → M which satisfies the following properties ∀r, r′ ∈ R and x, y ∈ M:

• r · (x + y) = r · x + r · y

• (r + r′) · x = r · x + r′ · x

• 1 · x = x

• (r · r′) · x = r · (r′ · x)
Essentially, in an R-module, elements can be added and multiplied with coefficients in R.
However, if R is taken as a field k, each non-zero element acquires a multiplicative inverse and
we get a vector space.

Definition 2.24 (Vector space). An R-module V is called a vector space if R is a field. A set of
elements {g1 , . . . , gk } is said to generate the vector space V if every element a ∈ V can be written
as a = α1 g1 + . . . + αk gk for some α1 , . . . , αk ∈ R. The set {g1 , . . . , gk } is called a basis of V if
every a ∈ V can be written in the above way uniquely. All bases of V have the same cardinality
which is called the dimension of V. We say a set {g1 , . . . , gm } ⊆ V is independent if the equation
α1 g1 + . . . + αm gm = 0 can only be satisfied by setting αi = 0 for i = 1, . . . , m.
Fact 2.7. A basis of a vector space is a generating set of minimal cardinality and an independent
set of maximal cardinality.

2.4.2 Chains
Let K be a simplicial k-complex with m_p number of p-simplices, 0 ≤ p ≤ k. A p-chain c in K is a formal sum of p-simplices with coefficients, that is, c = ∑_{i=1}^{m_p} αi σi where the σi are the p-simplices and the αi are the coefficients. Two p-chains c = ∑ αi σi and c′ = ∑ α′i σi can be added to obtain another p-chain

c + c′ = ∑_{i=1}^{m_p} (αi + α′i) σi .
In general, coefficients can come from a ring R with its associated addition, making the chains constitute an R-module. For example, these additions can be integer additions where the coefficients are integers; e.g., from two 1-chains (edges) we get

(2e1 + 3e2 + 5e3 ) + (e1 + 7e2 + 6e4 ) = 3e1 + 10e2 + 5e3 + 6e4 .

Notice that while writing a chain, we only write the simplices that have non-zero coefficient in
the chain. We follow this convention all along. In our case, we will focus on the cases where the
coefficients come from a field k. In particular, we will mostly be interested in k = Z2 . This means
that the coefficients come from the field Z2 whose elements can only be 0 or 1 with the modulo-2
additions 0 + 0 = 0, 0 + 1 = 1, and 1 + 1 = 0. This gives us Z2 -additions of chains, for example,
we have
(e1 + e3 + e4 ) + (e1 + e2 + e3 ) = e2 + e4 .
Observe that p-chains with Z2 -coefficients can be treated as sets: the chain e1 + e3 + e4 is the
set {e1 , e3 , e4 }, and Z2 -addition between two chains is simply the symmetric difference between
the corresponding sets.
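This set view is handy in code. A two-line Python illustration (toy labels, nothing library-specific):

    c1 = {"e1", "e3", "e4"}    # the chain e1 + e3 + e4
    c2 = {"e1", "e2", "e3"}    # the chain e1 + e2 + e3
    print(c1 ^ c2)             # {'e2', 'e4'}: Z2-addition is symmetric difference
    print(c1 ^ c1)             # set(): every Z2-chain is its own inverse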
From now on, unless specified otherwise, we will consider all chain additions to be Z2 -additions. One should keep in mind that one can have parallel concepts for coefficients and additions coming from integers, reals, rationals, fields, and other rings. Under Z2 -additions, we have

c + c = ∑_{i=1}^{m_p} 0 σi = 0.
Below, we show addition of chains shown in Figure 2.9:
0-chain: ({b} + {d}) + ({d} + {e}) = {b} + {e} (left)
1-chain: ({a, b} + {b, d}) + ({b, c} + {b, d}) = {a, b} + {b, c} (left)
2-chain: ({a, b, c} + {b, c, e}) + ({b, c, e}) = {a, b, c} (right)


Figure 2.9: Chains, boundaries, cycles.

The p-chains with the Z2 -additions form a group where the identity is the chain 0 = ∑_{i=1}^{m_p} 0 σi , and the inverse of a chain c is c itself since c + c = 0. This group, called the p-th chain group, is denoted C p (K). We also drop the complex K and use the notation C p when K is clear from the context. We do the same for other structures that we define afterward.

2.4.3 Boundaries and cycles


The chain groups at different dimensions are related by a boundary operator. Given a p-simplex σ = {v0 , . . . , v p } (also denoted as v0 v1 · · · v p ), let

∂ p σ = ∑_{i=0}^{p} {v0 , . . . , v̂i , . . . , v p }

where v̂i indicates that the vertex vi is omitted. Informally, we can view ∂ p as a map that sends
a p-simplex σ to the (p − 1)-chain that has non-zero coefficients only on σ’s (p − 1)-faces also
referred as σ’s boundary. At this point, it is instructive to note that the boundary of a vertex is
empty, that is, ∂0 σ = ∅. Extending ∂ p to a p-chain, we obtain a homomorphism ∂ p : C p → C p−1
called the boundary operator that produces a (p − 1)-chain when applied to a p-chain:
∂ p c = ∑_{i=1}^{m_p} αi (∂ p σi )  for a p-chain c = ∑_{i=1}^{m_p} αi σi ∈ C p .

Again, we note the special case of p = 0 when we get ∂0 c = ∅. The chain group C−1 has only a single element, which is its identity 0. On the other end, we also assume that if K is a k-complex, then C p is 0 for p > k.
Consider the complex in Figure 2.9(right). For the 2-chain abc + bcd we get

∂2 (abc + bcd) = (ab + bc + ca) + (bc + cd + db) = ab + ca + cd + db.

It means that from the two triangles sharing the edge bc, the boundary operator returns the four boundary edges that are not shared. Similarly, one can check that the boundary of the 2-chain consisting of all three triangles in Figure 2.9(right) contains all 7 edges. In particular, the edge bc does not get cancelled because an odd number of triangles (all three) adjoin it:

∂2 (abc + bcd + bce) = ab + bc + ca + be + ce + bd + dc.
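A few lines of Python make the boundary operator and Proposition 2.8 below concrete (a toy sketch continuing the set representation of Z2 -chains; simplices are frozensets of vertex labels):

    def boundary(chain):
        # Z2 boundary: toggle each (p-1)-face of each simplex in the chain;
        # faces shared by an even number of simplices cancel.
        out = set()
        for sigma in chain:
            for v in sigma:
                out ^= {sigma - {v}}
        return out

    abc, bcd = frozenset("abc"), frozenset("bcd")
    print(boundary({abc, bcd}))            # {ab, ac, bd, cd}: the shared bc cancels
    print(boundary(boundary({abc, bcd})))  # set(): the boundary of a boundary is 0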
One important property of the boundary operator is that applying it twice produces an empty chain.

Proposition 2.8. For p > 0 and any p-chain c, ∂ p−1 ◦ ∂ p (c) = 0.

Proof. Observe that ∂0 is a zero map by definition. Also, for a k-complex, ∂ p operates on a zero element for p > k by definition. Then, it is sufficient to show that, for 1 ≤ p ≤ k, ∂ p−1 ◦ ∂ p (σ) = 0 for a p-simplex σ. Observe that ∂ p σ is the set of all (p − 1)-faces of σ and every (p − 2)-face of σ is contained in exactly two (p − 1)-faces. Thus, ∂ p−1 (∂ p σ) = 0. 

Extending the boundary operator to the chain groups, we obtain the following sequence of homomorphisms satisfying Proposition 2.8 for a simplicial k-complex; such a sequence is also called a chain complex:

0 = Ck+1 −−∂k+1−→ Ck −−∂k−→ Ck−1 −−∂k−1−→ Ck−2 → · · · → C1 −−∂1−→ C0 −−∂0−→ C−1 = 0.    (2.1)
Fact 2.8.

1. For p ≥ −1, C p is a vector space because the coefficients are drawn from the field Z2 : it has a basis so that every element can be expressed uniquely as a sum of the elements in the basis.

2. There is a basis for C p where every p-simplex forms a basis element because any p-chain is a unique subset of the p-simplices. The dimension of C p is therefore m_p , the number of p-simplices. When p = −1 or p ≥ k + 1, C p is trivial with dimension 0. In Figure 2.9(right), {abc, bcd, bce} is a basis for C2 and so is {abc, (abc + bcd), bce}.

Cycle and boundary groups.


Definition 2.25 (Cycle and cycle group). A p-chain c is a p-cycle if ∂c = 0. In words, a chain
that has empty boundary is a cycle. All p-cycles together form the p-th cycle group Z p under
the addition that is used to define the chain groups. In terms of the boundary operator, Z p is the
subgroup of C p which is sent to the zero of C p−1 , that is, ker ∂ p = Z p .
For example, in Figure 2.9(right), the 1-chain ab + bc + ca is a 1-cycle since

∂1 (ab + bc + ca) = (a + b) + (b + c) + (c + a) = 0.

Also, observe that the above 1-chain is the boundary of the triangle abc. It is no accident that the boundary of a simplex is a cycle: thanks to Proposition 2.8, the boundary of a p-chain is a (p − 1)-cycle. This is a fundamental fact in homology theory.

The set of (p − 1)-chains that can be obtained by applying the boundary operator ∂ p on p-chains forms a subgroup of the (p − 1)-chains, called the (p − 1)-th boundary group B p−1 = ∂ p (C p ); in other words, the image of the boundary homomorphism is the boundary group, B p−1 = Im ∂ p . We have ∂ p−1 B p−1 = 0 for p > 0 due to Proposition 2.8 and hence B p−1 ⊆ Z p−1 . Figure 2.10 illustrates cycles and boundaries.

Figure 2.10: Each individual red, blue, green cycle is not a boundary because it does not bound any 2-chain. However, the sum of the two red cycles, and the sum of the two blue cycles, each form a boundary cycle because they bound 2-chains consisting of reddish and bluish triangles respectively.

Fact 2.9. For a simplicial k-complex,

1. C0 = Z0 and Bk = 0.

2. For p ≥ 0, B p ⊆ Z p ⊆ C p .

3. Like C p , both B p and Z p are vector spaces.

2.5 Homology
The homology groups classify the cycles in a cycle group by putting together in the same class those cycles that differ by a boundary. From a group theoretic point of view, this is done by taking the quotient of the cycle groups with the boundary groups, which is allowed since the boundary group is a subgroup of the cycle group.

Definition 2.26 (Homology group). For p ≥ 0, the p-th homology group is the quotient group
H p = Z p /B p . Since we use a field, namely Z2 , for coefficients, H p is a vector space and its
dimension is called the p-th Betti number, denoted by β p :

β p := dim H p .

Every element of H p is obtained by adding a p-cycle c ∈ Z p to the entire boundary group,


c + B p , which is a coset of B p in Z p . All cycles constructed by adding an element of B p to c form the class [c], referred to as the homology class of c. Two cycles c and c′ in the same homology class are called homologous, which also means [c] = [c′]. By definition, [c] = [c′] if and only if c ∈ c′ + B p , and under Z2 coefficients, this also means that c + c′ ∈ B p . For example, in Figure 2.10, the outer cycle c5 is homologous to the sum c2 + c4 because they together bound the 2-chain consisting of all triangles. Also, observe that the group operation for H p is defined by [c] + [c′] = [c + c′].


Figure 2.11: Complex K of a tetrahedron: (a) Vertices, (b) spanning tree of the 1-skeleton, (c)
1-skeleton, (d) 2-skeleton of K.

Example. Consider the boundary complex K of a tetrahedron which consists of four triangles,
six edges, and four vertices. Consider the 0-skeleton K 0 of K which consists of four vertices only.
All four vertices, whose classes coincide with them, are necessary to generate H0 (K 0 ). Therefore, these four vertices form a basis of H0 (K 0 ). However, one can verify that H0 (K 1 ) for the 1-skeleton K 1 is generated by any one of the four vertices because all four vertices belong to the same class when we consider K 1 . This exemplifies the fact that the rank of H0 (K) captures the number of connected components in a complex K.
The 1-skeleton K 1 of the tetrahedron is a graph with four vertices and six edges. Consider a spanning tree with any vertex and the three edges adjoining it as in Figure 2.11(b). There is no 1-cycle in this configuration. However, each of the other three edges creates a new 1-cycle, which is not a boundary because there is no triangle in K 1 . These three cycles c1 , c2 , c3 as indicated in Figure 2.11(c) form their own classes in H1 (K 1 ). Observe that the 1-cycle at the base can be written as a combination of the other three, and thus all classes in H1 (K 1 ) can be generated by only three classes [c1 ], [c2 ], [c3 ] and no fewer. Hence, these three classes form a basis of H1 (K 1 ). To develop more intuition, consider a simplicial surface M without boundary embedded in R3 . If the surface has genus g, that is, g tunnels and handles in the complement space, then H1 (M) has dimension 2g (Exercise 4).
The 2-chain given by the sum of the four triangles in K makes a 2-cycle c because its boundary is 0. Since K does not have any 3-simplex (the tetrahedron is not part of the complex), this 2-cycle cannot be added to any 2-boundary other than 0 to form its class. Therefore, the homology class of c is c itself, [c] = {c}. There is no other 2-cycle in K. Therefore, H2 (K) is generated by [c] alone. Its dimension is only one. If the tetrahedron is included in the complex, c becomes a boundary element, and hence [c] = [0]. In that case, H2 (K) = 0. Intuitively, one may think of H2 (K) as capturing the voids in a complex K embedded in R3 . (Convince yourself that H1 (K) = 0 no matter whether the tetrahedron belongs to K or not.)
Fact 2.10. For p ≥ 0,
1. H p is a vector space (when defined over Z2 ),
2. H p may not be a vector space when defined over Z, the integer coefficients. In this case,
there could be torsion subgroups,
3. the Betti number, β p = dim H p , is given by β p = dim Z p − dim B p ,
4. there are exactly 2^{β_p} homology classes in H p when defined with Z2 coefficients.
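Fact 2.10(3) is directly computable: with Z2 coefficients, dim Z p equals the number of p-simplices minus the rank of ∂ p , and dim B p equals the rank of ∂ p+1 , so β p follows from two matrix ranks. A minimal Python sketch (toy Gaussian elimination over Z2 ; boundary matrices are given as lists of 0/1 rows, an assumption of this illustration):

    def rank_z2(rows):
        # Gaussian elimination over Z2; rows are lists of 0/1 entries.
        rows = [r[:] for r in rows]
        ncols = len(rows[0]) if rows else 0
        rank = 0
        for col in range(ncols):
            pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
            if pivot is None:
                continue
            rows[rank], rows[pivot] = rows[pivot], rows[rank]
            for i in range(len(rows)):
                if i != rank and rows[i][col]:
                    rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
            rank += 1
        return rank

    def betti(num_p_simplices, bdry_p, bdry_p_plus_1):
        # Fact 2.10(3): beta_p = dim Z_p - dim B_p
        #             = m_p - rank(boundary_p) - rank(boundary_{p+1}).
        return num_p_simplices - rank_z2(bdry_p) - rank_z2(bdry_p_plus_1)

    d1 = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # rows: edges ab, bc, ca; cols: a, b, c
    print(betti(3, d1, []))                  # 1: the hollow triangle has one 1-cycle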

2.5.1 Induced homology


Continuous functions from one topological space to another take cycles to cycles and boundaries to boundaries. Therefore, they induce maps in the homology groups as well. Here we restrict ourselves to simplicial complexes and simplicial maps, the counterparts of continuous maps between topological spaces. Simplicial maps between simplicial complexes take cycles to cycles and boundaries to boundaries under the following definition.


Figure 2.12: Induced homology by simplicial map: Simplicial map f obtained by the vertex map a → e, b → e, c → g, d → g induces a map at the homology level f∗ : H1 (K1 ) → H1 (K2 ) which takes the only non-trivial class, created by the empty triangle abc, to zero even though H1 (K1 ) ≅ H1 (K2 ). Another simplicial map K2 → K3 destroys the single homology class born by the empty triangle egh in K2 .

Definition 2.27 (Chain map). Let f : K1 → K2 be a simplicial map. The chain map f# : C p (K1 ) → C p (K2 ) corresponding to f is defined as follows. If c = ∑ αi σi is a p-chain, then f# (c) = ∑ αi τi where

τi = f (σi ) if f (σi ) is a p-simplex in K2 , and τi = 0 otherwise.
For example, in Figure 2.12, the 1-cycle bc+cd+db in K1 is mapped to the 1-chain eg+eg = 0
by the chain map f# .
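The chain map is straightforward to mimic in code: apply the vertex map and discard collapsed simplices. A toy Python sketch of Definition 2.27 reproducing the example above (simplices as frozensets, Z2 chains as sets):

    def chain_map(f, chain):
        # f_# over Z2 (Definition 2.27): a simplex survives only if its vertex
        # images are all distinct; collapsed simplices contribute 0.
        out = set()
        for sigma in chain:
            image = frozenset(f[v] for v in sigma)
            if len(image) == len(sigma):
                out ^= {image}      # Z2 addition: toggle membership
        return out

    f = {"a": "e", "b": "e", "c": "g", "d": "g"}
    cycle = {frozenset("bc"), frozenset("cd"), frozenset("db")}
    print(chain_map(f, cycle))  # set(): bc and db both map to eg and cancel; cd collapses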
Proposition 2.9. Let f : K1 → K2 be a simplicial map. Let ∂_p^{K1} and ∂_p^{K2} denote the boundary homomorphisms in dimension p ≥ 0. Then, the induced chain maps commute with the boundary homomorphisms, that is, f# ◦ ∂_p^{K1} = ∂_p^{K2} ◦ f# .
The statement in the above proposition can also be represented with the following diagram, which we say commutes since, starting from the top left corner, one reaches the same chain at the lower right corner along both paths: first going right and then down, or first going down and then right (see Definition 3.15 in the next chapter).

                 f#
  C p (K1 ) −−−−−−→ C p (K2 )                  (2.2)
      |                  |
  ∂_p^{K1}           ∂_p^{K2}
      ↓                  ↓
  C p−1 (K1 ) −−−−−→ C p−1 (K2 )
                 f#

For example, in Figure 2.12, we have f# (c = ab + bd + da) = 0 and ∂_p^{K1} (c) = 0. Therefore, ∂_p^{K2} ( f# (c)) = ∂_p^{K2} (0) = 0 = f# (0) = f# (∂_p^{K1} (c)).

Since B p (K1 ) ⊆ Z p (K1 ), we have that f# (B p (K1 )) ⊆ f# (Z p (K1 )). Thus, the induced map in the
quotient space, namely,

f∗ (Z p (K1 )/B p (K1 )) := f# (Z p (K1 ))/ f# (B p (K1 ))

is well defined. Furthermore, by the commutativity of the Diagram (2.2), f# (Z p (K1 )) ⊆ Z p (K2 )
and f# (B p (K1 )) ⊆ B p (K2 ), which gives an induced homomorphism in the homology groups:

f∗ : Z p (K1 )/B p (K1 ) → Z p (K2 )/B p (K2 ), or equivalently, f∗ : H p (K1 ) → H p (K2 ).

A homology class [c] = c + B p in K1 is mapped to the homology class f# (c) + f# (B p ) in K2


by f∗ . In Figure 2.12, we have B1 = {0, ab + bd + da}. Then, for c = bd + dc + cb, we have
f∗ ([c]) = { f# (c), f# (c) + f# (ab + bd + da)} = {0, 0} = [0].
Now we can state a result relating contiguous maps (Definition 2.7) and homology groups
that we promised in Section 2.1.

Fact 2.11. For two contiguous maps f1 : K1 → K2 and f2 : K1 → K2 , the induced maps f1∗ : H p (K1 ) → H p (K2 ) and f2∗ : H p (K1 ) → H p (K2 ) are equal.

2.5.2 Relative homology


As the name suggests, we can define a homology group of a complex relative to a subcomplex.
Let K0 be a subcomplex of K. By definition, the chain group C p (K0 ) is a subgroup of C p (K).
Therefore, the quotient group C p (K)/C p (K0 ) is well defined which is called a relative chain group
and is denoted C p (K, K0 ). It is an abelian group whose elements are the cosets [c p ] = c p + C p (K0 )
for every chain c p ∈ C p (K).
The boundary operator ∂ p : C p (K) → C p−1 (K) extends to the relative chain groups in a natural way:

∂_p^{K,K0} : C p (K, K0 ) → C p−1 (K, K0 ),   [c p ] ↦ [∂ p c p ].

One may verify that ∂_{p−1}^{K,K0} ◦ ∂_p^{K,K0} = 0 as before. Therefore, we can define

Z p (K, K0 ) = ker ∂_p^{K,K0} , the p-th relative cycle group,
B p (K, K0 ) = Im ∂_{p+1}^{K,K0} , the p-th relative boundary group,
H p (K, K0 ) = Z p (K, K0 )/B p (K, K0 ), the p-th relative homology group.

The relative homology H p (K, K0 ) is related to a coned complex K ∗ . A coned complex K ∗ of a simplicial complex K w.r.t. the pair (K, K0 ) is a simplicial complex which has all simplices from K and every coned simplex σ ∪ {x} from an additional vertex x to every simplex σ ∈ K0 . Figure 2.13 shows the coned complexes on the right in each case. The following fact is useful for building intuition about relative homology groups.
an intuition about relative homology groups.

Fact 2.12. H p (K, K0 ) ≅ H p (K ∗ ) for all p > 0 and β0 (K, K0 ) = β0 (K ∗ ) − 1.

For example, consider K to be an edge {a, b, ab} with K0 = {a, b} as in Figure 2.13(left). The 1-chain ab is a relative 1-cycle because ∂1 (ab) = a + b ∈ C0 (K0 ) and hence ∂_1^{K,K0} ([ab]) is 0 in C0 (K, K0 ). This is indicated by the presence of the loop in the coned space.


Figure 2.13: Illustration for relative homology: the subcomplex K0 consists of (left) vertices a and b, (right) vertices a, b, c, and the edge ab; the coned complex K ∗ is indicated with a coning from a dummy vertex x.

Now, consider K to be a triangle {a, b, c, ab, ac, bc, abc} with K0 = {a, b, c, ab} as in Figure 2.13(right). The 1-chains bc and ac are both relative 1-cycles because ∂1 (bc) = b + c ∈ C0 (K0 ) and hence ∂_1^{K,K0} ([bc]) is 0 in C0 (K, K0 ); similarly, ∂_1^{K,K0} ([ac]) = 0. The 1-chain ab is of course a relative 1-cycle because it is already 0 as a relative chain. Therefore, the relative 1-cycle group Z1 (K, K0 ) has a basis {[bc], [ac]}. The relative 1-boundary group B1 (K, K0 ) is given by ∂_2^{K,K0} (abc) = [ab] + [bc] + [ac] = [bc] + [ac]. The relative homology group H1 (K, K0 ) has one non-trivial class, namely the class of either [bc] or [ac] but not both, because [bc] + [ac] is a relative boundary.

2.5.3 Singular Homology


So far we have considered only simplicial homology which is defined on a simplicial complex
without any assumption of a particular topology. Now, we extend this definition to topological
spaces. Let X be a topological space. We bring the notion of simplices into the context of X by considering maps from the standard p-simplices to X. A standard p-simplex ∆ p is defined as the convex hull of the p + 1 points {(x1 , . . . , xi , . . . , x p+1 ) | xi = 1 and x j = 0 for j ≠ i}_{i=1,...,p+1} in R^{p+1} .


Definition 2.28 (Singular simplex). A singular p-simplex for a topological space X is defined as
a map σ : ∆ p → X.
Notice that the map σ need not be injective and thus ∆ p may be ‘squashed’ arbitrarily in its
image. Nevertheless, we can still have a notion of the chains, boundaries, and cycles which are
the main ingredients for defining a homology group called the singular homology of X.
The boundary of a singular p-simplex σ is given by ∂σ = τ0 + τ1 + · · · + τ p where τi : (∂∆ p )i → X is the restriction of the map σ to the ith facet (∂∆ p )i of ∆ p .
A p-chain is a sum of singular p-simplices with coefficients from the integers, reals, or some appropriate ring. As before, under our assumption of Z2 coefficients, a singular p-chain is given by ∑_i αi σi where αi = 0 or 1. The boundary of a singular p-chain is defined the same way as for simplicial chains, the only difference being that we have to accommodate infinite chains:

∂(c p = σ1 + σ2 + . . .) = ∂σ1 + ∂σ2 + . . .

We get the usual chain complex with ∂ p−1 ◦ ∂ p = 0 for all p > 0,

· · · −−∂_{p+1}−→ C p −−∂_p−→ C p−1 −−∂_{p−1}−→ · · ·

and can define the cycle and boundary groups as Z p = ker ∂ p and B p = im ∂ p+1 . We have the
singular homology defined as the quotient group H p = Z p /B p .
A useful fact is that singular and simplicial homology coincide when both are well defined.

Theorem 2.10. Let X be a topological space with a triangulation K, that is, the underlying space |K| is homeomorphic to X. Then H p (K) ≅ H p (X) for any p ≥ 0.

Note that the above theorem also implies that different triangulations of the same topological
space give rise to isomorphic simplicial homology.

2.5.4 Cohomology

There is a dual concept to homology called cohomology. Although cohomology can be defined with coefficients in rings as in the case of homology groups, we will mainly focus on defining it over a field, in which case it becomes a vector space.

A vector space V defined over a field k admits a dual vector space V ∗ whose elements are linear functions φ : V → k. These linear functions themselves can be added and multiplied over k, forming the dual vector space V ∗ . The homology group H p (K) as we defined in Definition 2.26 over the field Z2 is a vector space and hence admits a dual vector space which is usually denoted Hom(H p (K), Z2 ). The p-th cohomology group, denoted H^p (K), is not equal to this dual space, though over the coefficient field Z2 one has that H^p (K) is isomorphic to Hom(H p (K), Z2 ); H^p (K) is also defined with spaces of linear maps.

Cochains, coboundaries, and cocycles. A p-cochain is a homomorphism φ : C p → Z2 from the chain group to the coefficient ring over which C p is defined, which is Z2 here. In this case, a p-cochain φ is given by its evaluation φ(σ) (0 or 1) on every p-simplex σ in K; more precisely, a p-chain c = ∑_{i=1}^{m_p} αi σi gets the value

φ(c) = α1 φ(σ1 ) + α2 φ(σ2 ) + · · · + α_{m_p} φ(σ_{m_p} ).

Also, verify that φ(c + c′) = φ(c) + φ(c′), satisfying the property of a group homomorphism. For a chain c, the particular cochain that assigns 1 to a simplex if and only if it has a non-zero coefficient in c is called its dual cochain c∗ . The p-cochains form a cochain group C^p dual to C p where the addition is defined by (φ + φ′)(c) = φ(c) + φ′(c), taking Z2 -addition on the right. We can also define a scalar multiplication (αφ)(c) = αφ(c) by using the Z2 -multiplication. This makes C^p a vector space.
Similar to boundaries of chains, we have the notion of coboundaries of cochains, δ p : C^p → C^{p+1} . Specifically, for a p-cochain φ, its (p + 1)-coboundary is given by the homomorphism δφ : C p+1 → Z2 defined as δφ(c) = φ(∂c) for any (p + 1)-chain c. Therefore, the coboundary operator δ takes a p-cochain and produces a (p + 1)-cochain, giving the following sequence for a simplicial k-complex:

0 = C^{−1} −−δ−1−→ C^0 −−δ0−→ C^1 −−δ1−→ · · · −−δk−1−→ C^k −−δk−→ C^{k+1} = 0
0 = C−1 −−→ C0 −→ C1 −→ · · · −−−→ Ck −→ Ck+1 = 0

The set of p-coboundaries forms the coboundary group (vector space) B^p where the group addition and scalar multiplication are given by the same operations as in C^p .


Figure 2.14: Illustration for cohomology: (i) and (iii) 1-cochain with support on the solid thick
edges is a 1-cocycle which is not a 1-coboundary, so it constitutes a non-trivial class in H1 . The
1-cochain with support on dashed edges constitutes a cohomologous class, (ii) 1-cochain with
support on the solid thick edges is a 1-cocycle which is also a 1-coboundary and hence belongs
to a trivial class.

Now we come to cocycles, the dual notion to cycles. A p-cochain φ is called a p-cocycle if its coboundary δφ is the zero homomorphism. The set of p-cocycles forms a group Z^p (a vector space) where again the addition and scalar multiplication are induced by the same operations as in C^p .
Similar to the boundary operator ∂, the coboundary operator δ satisfies the following property:

Fact 2.13. For p > 0, δ p ◦ δ p−1 = 0, which implies B^p ⊆ Z^p .

Definition 2.29 (Cohomology group). Since B^p is a subgroup of Z^p , the quotient group H^p = Z^p /B^p is well defined, which is called the p-th cohomology group.

Example. Consider the three complexes in Figure 2.14. In the following discussion, for convenience, we refer to the p-simplices on which a cochain c p evaluates to 1 as the support of c p . The 1-cochain φ with support on the edge ac is a cocycle because δ1 φ = 0, as there is no triangle and hence no non-zero 2-cochain. It is also not a coboundary because there is no 0-cochain φ′ (assignment of 0 and 1 on vertices) so that

δ0 φ′(ac) = φ′(a + c) = 1 = φ(ac)
δ0 φ′(ab) = φ′(a + b) = 0 = φ(ab)
δ0 φ′(bc) = φ′(b + c) = 0 = φ(bc).

The 1-cochain φ with support on edges ab and ac in Figure 2.14(ii) is a 1-cocycle because δ1 φ(abc) = φ(ab + ac + bc) = 0. Notice that now a cochain with support only on the edge ac cannot be a cocycle because of the presence of the triangle abc. The 1-cochain φ is also a 1-coboundary because the 0-cochain with assignment 1 on the vertex a produces φ as a coboundary.

Similarly, verify that the 1-cochain φ with support on edges cd and ce in Figure 2.14(iii) is a cocycle but not a coboundary. Thus, the class [φ] is non-trivial in the 1-dimensional cohomology H^1 . Any other non-trivial class is cohomologous to it. For example, the class [φ′] where φ′ has support on edges b f and bg is cohomologous to [φ]. This follows from the fact that [φ] + [φ′] = [φ + φ′] = [0] because φ + φ′ is a 1-coboundary, obtained by assigning 1 to vertices a, b, and c.
Similar to the homology groups, a simplicial map f : K1 → K2 also induces a homomorphism f ∗ between the two cohomology groups, but in the opposite direction. To see this, consider the chain map f# induced by f (Definition 2.27). Then, a cochain map f^# : C^p (K2 ) → C^p (K1 ) is defined as f^# (φ)(c) = φ( f# (c)). The cochain map f^# in turn defines the induced homomorphism between the respective cohomology groups. We will use the following fact in Section 4.2.1.

Fact 2.14. A simplicial map f : K1 → K2 induces a homomorphism f ∗ : H^p (K2 ) → H^p (K1 ) for every p ≥ 0.

2.6 Notes and Exercises


The simplicial complex is a fundamental structure in algebraic topology. A good source for the subject is Munkres [241].
The concept of nerve is credited to Aleksandroff [7]. The nerve theorem has different versions. It holds for open covers of topological spaces with some mild conditions [300]. Borsuk proved it for closed covers, again with some conditions on the space and covers [45]. The assumptions of both are satisfied by metric spaces and finite covers, with which we state the theorem in Section 2.2. A version of the theorem is also credited to Leray [219].
Čech and Vietoris-Rips complexes have turned out to be very effective data structures in topological data analysis. Čech complexes were introduced to define Čech homology. Leonid Vietoris [293] introduced the Vietoris complex for extending the homology theory from simplicial complexes to metric spaces. Later, Eliyahu Rips used it in hyperbolic group theory [176]. Jean-Claude Hausmann named it the Vietoris-Rips complex and showed that it is homotopy equivalent to a compact Riemannian manifold when the vertex set spans all points of the manifold and the parameter used to build it is sufficiently small [187]. This result was further improved by Latschev [217] who showed that the homotopy equivalence holds even when the vertex set is finite.
The Delaunay complex is a very well known and useful data structure for various geometric applications in two and three dimensions. It enjoys various optimality properties. For example, for a given point set P ⊂ R2 , among all simplicial complexes linearly embedded in R2 with vertex set P, the Delaunay complex maximizes the minimum angle over all triangles as stated in Fact 2.5. Many such properties and algorithms for computing Delaunay complexes are described in the books by Edelsbrunner [148] and Cheng et al. [97]. The alpha complex was proposed in [151] and further developed in [153]. The first author of this book can attest to the historic fact that the development of the persistence algorithm was motivated by the study of alpha complexes and their Betti numbers. The book by Edelsbrunner and Harer [149] confirms this. Witness complexes were proposed by de Silva and Carlsson [114] in an attempt to build a sparser complex out of a dense point sample. The graph induced complex is another such construction, proposed in [124].
Homology groups and their associated concepts are the main algebraic tools used in topological data analysis. Many associated structures and results about them exist in algebraic topology. We only cover the main concepts that are used in this book and leave out the others. Interested readers can familiarize themselves with these omitted topics by reading Munkres [241], Hatcher [186], or Ghrist [170], among many other excellent sources.

Exercises
1. Suppose we have a collection of sets U = {Uα }α∈A where there exists an element U ∈ U
that contains all other elements in U. Show that the nerve complex N(U) is contractible to
a point.

2. Given a parameter α and a set of points P ⊂ Rd , show that the alpha complex Del α (P) is contained in the intersection of the Delaunay complex and the Čech complex at scale α; that is, Del α (P) ⊆ Del (P) ∩ Cα (P).

3. Let K be the simplicial complex of a tetrahedron. Write a basis for the chain groups C1 ,
C2 , boundary groups B1 , B2 , and cycle groups Z1 , Z2 . Write the boundary matrix repre-
senting the boundary operator ∂2 with rows and columns representing bases of C1 and C2
respectively.

4. Let K be a triangulation of an orientable surface without boundary that has genus g. Prove
that β1 (K) = 2g.

5. Let K be a triangulation of a 2-dimensional sphere S2 . Now remove h number of vertex-


disjoint triangles from K, and let the resulting simplicial complex be K 0 . Describe the Betti
numbers of K 0 , and justify your answer.

6. We state the nerve theorem (Theorem 2.1) for covers where either all cover elements are
closed or all cover elements are open. Show that the theorem does not hold if we mix open
and closed elements in the cover.

7. Give an example where a simplex which is weakly witnessed may not have all its faces weakly witnessed. Show that (i) W(Q, P′) ⊆ W(Q, P) for P′ ⊆ P, (ii) W(Q′, P) may not be a subcomplex of W(Q, P) where Q′ ⊆ Q.

8. Consider Definition 2.16 for the graph induced complex. Let VR(G) be the clique complex given by the input graph G(P). Assume that the map ν : P → 2^Q sends every point to a singleton under the input metric d. Then, ν : P → ν(P) is a well defined vertex map. Prove that the vertex map ν : P → Q extends to a simplicial map ν̄ : VR(G) → G(G(P), Q, d). Also, show that every simplicial complex K(Q) with the vertex set Q for which ν̄ : VR(G) → K(Q) becomes simplicial must contain G(G(P), Q, d).

9. Prove Proposition 2.9.

10. Consider a complex K = {a, b, c, ab, bc, ca, abc}. Enumerate all elements in the 1-chain,
1-cycle, 1-boundary groups defined on K under Z2 coefficient. Do the same for cochains,
cocycles, and coboundaries.

11. Show an example for the following:

• a chain that is a cycle but its dual cochain is not a cocycle.


• a chain that is a cycle and its dual cochain is a cocycle.
• a chain that is a boundary and its dual cochain is not a coboundary.

• a chain that is a boundary and its dual cochain is a coboundary.

12. Prove that ∂ p−1 ◦ ∂ p = 0 for relative chain groups and also δ p ◦ δ p−1 = 0 for cochain groups.
Chapter 3

Topological Persistence

Suppose we have point cloud data P sampled from a 3D model. A quantified summary of the topological features of the model that can be computed from this sampled representation helps in further processing such as shape analysis in geometric modeling. Persistent homology offers this avenue, as Figure 3.1 illustrates. For further explanation, consider P sampled from a curve in R2 as in Figure 3.3. Our goal is to recover the information that the sampled space had two loops, one bigger and more prominent than the other. The notion of persistence captures this information. Consider the distance function r : R2 → R where r(x) equals d(x, P), the minimum distance of x to the points in P. Now let us look at the sublevel sets of r, that is, r−1 (−∞, a] for a ∈ R+ ∪ {0}. These sublevel sets are unions of closed balls of radius a centered at the points. We can observe from Figure 3.3 that if we increase a starting from zero, we come across different holes surrounded by the union of these balls which ultimately get filled up at different times. However, the two holes corresponding to the original two loops persist longer than the others. We can abstract out this observation by looking at how long a feature (homological class) survives as we scan over the increasing sublevel sets. This weeds out the 'false' features (noise) from the true ones. The notion of persistent homology formalizes and discretizes this idea: it takes a function defined on a topological space (simplicial complex) and quantifies the changes in homology classes as the sublevel sets (subcomplexes) grow with increasing value of the function.

Figure 3.1: Persistence barcodes computed from point cloud data. The barcode on the right shows a single long bar for H0 signifying one connected component, eight long bars for H1 signifying eight fundamental classes, two for each of the four 'through holes', and a single long bar for H2 signifying the connected closed surface; picture taken from [135].
There are two predominant scenarios where persistence appears, though in slightly different contexts. One is when the function is defined on a topological space, which requires considering singular homology groups of the sublevel sets. The other is when the function is defined on a simplicial complex and the sequence of sublevel sets is implicitly given by a nested sequence of subcomplexes called a filtration. This involves simplicial homology. Section 3.1 introduces persistence in both of these contexts though we focus mainly on the simplicial setting which is most commonly used for computational purposes.
The birth and death of homological classes give rise to intervals during which a class remains alive. These intervals, together called a barcode, summarize the topological persistence of a filtration; see e.g. Figure 3.1. An equivalent notion called the persistence diagram plots the intervals as points in the extended plane R̄2 := (R ∪ {±∞})2 ; specifically, the birth and death constitute the x- and y-coordinates of a point. The stability of persistence diagrams against perturbations of the functions that generate the filtrations is an important result. It makes topological persistence robust against noise. When filtrations are given without any explicit mention of a function, we can still talk about the stability of the persistence diagrams with respect to the so-called interleaving distance between the induced persistence modules. Sections 3.2 and 3.4 are devoted to these concepts.
The algorithms that compute the persistence diagram from a given filtration are presented
in Section 3.3. First, we introduce it assuming that the input is presented combinatorially with
simplices added one at a time in a filtration. The algorithm pairs simplices, one creating and the
other destroying an interval. Then, this pairing is translated into matrix operations assuming that
the input is a boundary matrix representing the filtration. A more efficient version of the algorithm
is obtained by some simple but effective modification.
Finally, we consider the case of a piecewise linear (PL) function on a simplicial complex
and derive a filtration out of it from which the actual persistence of the input PL function can be
computed. This is presented in Section 3.5.

3.1 Filtrations and persistence


At the core of topological persistence is the notion of filtrations which can arise in the context of
topological spaces or simplicial complexes.

3.1.1 Space filtration


Consider a real-valued function f : T → R defined on a topological space T. Let Ta = f −1 (−∞, a]
denote the sublevel set for the function value a. Certainly, we have inclusions:

Ta ⊆ Tb for a ≤ b.

Now consider a sequence of reals a1 ≤ a2 ≤ · · · ≤ an which are often chosen to be critical values where the homology groups of the sublevel sets change, as illustrated in Figure 3.2. Considering the sublevel sets at these values and a dummy value a0 = −∞ with Ta0 = ∅, we obtain a nested

sequence of subspaces of T connected by inclusions which gives a filtration F f :

F f : ∅ = Ta0 ,→ Ta1 ,→ Ta2 ,→ · · · ,→ Tan . (3.1)

Figure 3.2 shows an example of the inclusions of the sublevel sets. The inclusions in a filtration induce linear maps in the singular homology groups of the subspaces involved. So, if ι : Tai ,→ Ta j , i ≤ j, denotes the inclusion map x ↦ x, we have an induced homomorphism

h_p^{i,j} = ι∗ : H p (Tai ) → H p (Ta j )    (3.2)

for all p ≥ 0 and 0 ≤ i ≤ j ≤ n. Therefore, we have a sequence of homomorphisms induced by


inclusions forming what we call a homology module:

0 = H p (Ta0 ) → H p (Ta1 ) → H p (Ta2 ) → · · · → H p (Tan ).


Figure 3.2: Persistence of a function on a topological space that has five critical values: (a) Ta1 :
only a new class in H0 is created, (b) Ta2 : two new independent classes in H1 are created, (c) Ta3 :
one of the two classes in H1 dies, (d) Ta4 : the single remaining class in H1 dies, (e) Ta5 : a new
class in H2 is created.

It is worthwhile to mention that writing a group as 0 means that it is a trivial group containing only the identity element 0. The homomorphism h_p^{i,j} sends the homology classes of the sublevel set Tai to those of the sublevel set Ta j . Some of these classes may die (become trivial) while the others survive. The image Im h_p^{i,j} contains this information.
The inclusions of sublevel sets give rise to persistence also in the context of point clouds, a
common input form in data analysis.

Point cloud. For a point set P in a metric space (M, d), define the distance function f : M → R, x ↦ d(x, p) where p ∈ argminq∈P d(x, q). Observe that the sublevel sets f −1 (−∞, a] are the unions of closed metric balls of radius a centered at points in P. Now we have exactly the same setting as described for general topological spaces above, where T is replaced by M and the sublevel sets Ta by the union of metric balls that grows with increasing a. Figure 3.3 illustrates an example where M is the Euclidean plane R2 .
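As a quick numerical illustration of this setup (a toy sketch with made-up sample points, using NumPy), membership of x in the sublevel set f −1 (−∞, a] just asks whether x lies within distance a of its nearest sample point:

    import numpy as np

    def in_sublevel(x, P, a):
        # f(x) = min_p d(x, p); the sublevel set at value a is the union of
        # closed balls of radius a centered at the sample points.
        return np.min(np.linalg.norm(P - x, axis=1)) <= a

    P = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.9]])   # toy sample
    print(in_sublevel(np.array([0.5, 0.3]), P, 0.6))      # True
    print(in_sublevel(np.array([3.0, 3.0]), P, 0.6))      # False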

Figure 3.3: Noisy sample of a curve with two loops and the growing sublevel sets of the distance
function to the sample points: The larger loop appearing as the bigger hole in the complement of
the union of balls persists longer than the same for the smaller loop while other spurious holes
persist even shorter.

3.1.2 Simplicial filtrations and persistence


Persistence on topological spaces involves computing singular homology groups for sublevel sets.
Computationally, this is cumbersome. So, we take refuge in the discrete analogue of the topolog-
ical persistence. This involves two important adaptations: first, the topological space is replaced
with a simplicial complex; second, singular homology groups are replaced with simplicial homol-
ogy groups. This means that the topological space T considered before is replaced with one of its
triangulations as Figure 3.4 illustrates. For point cloud data, the union of balls can be replaced by
their nerve, the Čech complex or its cousin Vietoris-Rips complex introduced in Section 2.2. Fig-
ure 3.5 illustrates this conversion for example in Figure 3.3. Of course, these replacements need
to preserve the original persistence in some sense, which is addressed in general by the notion of
stability introduced in Section 3.4.
The nested sequence of topological spaces that arise with growing sublevel sets translates into
a nested sequence of simplicial complexes in the discrete analogue. This brings in the concept of
filtration of simplicial complexes that allows defining the persistence using simplicial homology
groups.
Definition 3.1 (Simplicial filtration). A filtration F = F(K) of a simplicial complex K is a nested
sequence of its subcomplexes

F : ∅ = K0 ⊆ K1 ⊆ · · · ⊆ Kn = K


Figure 3.4: Persistence of the piecewise linear version of the function on a triangulation of the
topological space considered in Figure 3.2.

which is also written with inclusion maps as

F : ∅ = K0 ,→ K1 ,→ · · · ,→ Kn = K.

F is called simplex-wise if either Ki \ Ki−1 is empty or a single simplex for every i ∈ [1, n]. Notice
that the possibility of difference being empty allows two consecutive complexes to be the same.

Simplicial filtrations can appear in various contexts.


Figure 3.5: Čech complex of the union of balls considered in Figure 3.3. Homology classes in
H1 are being born and die as the union grows. The two most prominent holes appear as two most
persistent homology classes in H1 . Other classes appear and disappear quickly with relatively
much shorter persistence.

Simplex-wise monotone function. Consider a simplicial complex K and a (simplex-wise) function f : K → R on it. We call the function f simplex-wise monotone if for every σ′ ⊆ σ, we have f (σ′) ≤ f (σ). This property ensures that the sublevel sets f −1 (−∞, a] are subcomplexes of K for every a ∈ R. Denoting Ki = f −1 (−∞, ai ] and taking a dummy value a0 = −∞, we get a filtration:

∅ = K0 ,→ K1 ,→ · · · ,→ Kn = K.

Vertex function. A vertex function f : V(K) → R is defined on the vertex set V(K) of the
complex K. We can construct a filtration F from such a function.

Lower/upper stars. Recall that in Section 2.1 we defined the star and link of a vertex
v ∈ K, which intuitively capture the concept of a local neighborhood of v in K. We infuse the
information about a vertex function f into these structures. First, we fix a total order on the vertices
V = {v1 , . . . , vn } of K so that their f -values are in non-decreasing order, that is, f (v1 ) ≤ f (v2 ) ≤
· · · ≤ f (vn ). The lower star of a vertex v ∈ V, denoted Lst(v), is the set of simplices in St(v)
whose vertices other than v appear before v in this order. The closed lower star $\overline{\mathrm{Lst}}(v)$ is the closure
of Lst(v), i.e., it consists of the simplices in Lst(v) and their faces. The lower link Llk(v) is the set of
simplices in $\overline{\mathrm{Lst}}(v)$ disjoint from v. Symmetrically, we can define the upper star Ust(v), the closed
upper star $\overline{\mathrm{Ust}}(v)$, and the upper link Ulk(v), spanned by the vertices in the star of v that appear after v
in the chosen order.
One gets a filtration using the lower stars of the vertices: K f (vi ) in the following filtration
denotes the set of all simplices in K spanned by vertices in {v1 , . . . , vi }. Let v0 denote a dummy vertex with
f (v0 ) = −∞.
∅ = K f (v0 ) ⊆ K f (v1 ) ⊆ K f (v2 ) ⊆ · · · ⊆ K f (vn ) = K
Observe that K f (vi ) \ K f (vi−1 ) = Lst(vi ) for i ∈ [1, n] in the above filtration, that is, each time we
add the lower star of the next vertex. This filtration, called the lower star filtration
for f , is studied in more detail in Section 3.5. Figure 3.6 shows a lower star filtration. A lower
star filtration can be made simplex-wise by adding the simplices in a lower star in any order that
puts a simplex after all of its faces.
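
The lower stars themselves are easy to extract once the vertices are indexed in f-order. Here is a
minimal sketch (ours; the frozenset representation is an implementation choice):

# A minimal sketch, assuming vertices are numbered 0..n-1 in
# non-decreasing f-order and simplices are frozensets of vertex indices.
def lower_star(v, simplices):
    # Simplices of St(v) whose vertices other than v precede v in the
    # order, i.e., whose maximum vertex index is v itself.
    return [s for s in simplices if v in s and max(s) == v]

def lower_star_filtration(n_vertices, simplices):
    # Concatenate lower stars in vertex order; within each lower star,
    # sort by dimension so faces precede cofaces (simplex-wise filtration).
    order = []
    for v in range(n_vertices):
        order.extend(sorted(lower_star(v, simplices), key=len))
    return order

# Toy example: a single triangle {0, 1, 2} with all of its faces.
K = [frozenset(s) for s in ([0], [1], [2], [0, 1], [0, 2], [1, 2], [0, 1, 2])]
print(lower_star_filtration(3, K))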
Alternatively, we may consider the vertices in non-increasing order of f -values and obtain
an upper star filtration. For this we take K f (vi ) to be the set of all simplices spanned by vertices in
{vi , vi+1 , . . . , vn }. Assuming a dummy vertex vn+1 with f (vn+1 ) = ∞, one gets a filtration

∅ = K f (vn+1 ) ⊆ K f (vn ) ⊆ K f (vn−1 ) ⊆ · · · ⊆ K f (v1 ) = K

Observe that K f (vi ) \ K f (vi+1 ) = Ust(vi ) for i ∈ [1, n] in the above filtration, that is, each time we
add the upper star of the next vertex. This filtration, called the upper star filtration
for f , is in some sense a symmetric version of the lower star filtration, though the two may produce
different persistence pairs. An upper star filtration can also be made simplex-wise by adding the
simplices in an upper star in any order that puts a simplex after all of its faces. In this book, by
default, we will assume that the function values along a filtration are non-decreasing. This means
that we consider only lower star filtrations by default.
Vertex functions are closely related to the so called piecewise linear functions (PL-functions).
A vertex function f : K → R defines a piecewise linear function (PL-function) on the underlying

v2 v3 v4 v3
v2 v2
v1 v1
v0 v1
v0 v0 v1
v0 v0
v6 v8 v9
v5 v6 v7 v8
v5 v6 v7 v6 v7
v4 v3 v4 v3 v5 v4 v5 v5
v3 v4 v3 v4
v2 v2 v3
v1 v2 v2
v1 v1 v2
v0 v0 v1 v1
v0 v0
v0

Figure 3.6: The sequence shows a lower-star filtration of K induced by a vertex function which is
a ‘height function’ that records the vertical height of a vertex increasing from bottom to top here.

space |K| of K which is obtained by linearly interpolating f over all simplices. On the other hand,
the restriction of a PL-function to vertices trivially provides a vertex function.

Definition 3.2 (PL-functions). Given a simplicial complex K, a piecewise-linear (PL) function
f : |K| → R is defined to be the linear extension of a vertex function fV : V(K) → R defined on
the vertices V(K) of K so that for every point x ∈ |K|,
$$f(x) = \sum_{i=1}^{k+1} \alpha_i f_V(v_i)$$
where σ = {v1 , . . . , vk+1 } is the unique lowest dimensional simplex of dimension k ≥ 0 containing x
and α1 , . . . , αk+1 are the barycentric coordinates of x in σ.¹

¹The unique numbers α1 , . . . , αk+1 for which $x = \sum_{i=1}^{k+1} \alpha_i v_i$ with $\sum_i \alpha_i = 1$ and αi ≥ 0 for all i are called the barycentric coordinates of x in σ.
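
As a small illustration (our sketch, not the book's), one can evaluate a PL-function on a single
2-simplex by solving the linear system defining the barycentric coordinates; the use of NumPy and
the particular triangle are assumptions:

import numpy as np

def pl_value(vertices, f_vals, x):
    # vertices: three corners of a triangle; f_vals: their fV-values;
    # x: a query point in the triangle. Solve for the barycentric
    # coordinates alpha with sum(alpha) = 1 and alpha combining the corners to x.
    A = np.vstack([np.asarray(vertices, dtype=float).T, np.ones(3)])
    b = np.append(np.asarray(x, dtype=float), 1.0)
    alpha = np.linalg.solve(A, b)
    return alpha @ np.asarray(f_vals, dtype=float)

tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(pl_value(tri, [0.0, 1.0, 2.0], (0.25, 0.25)))  # 0.5*0 + 0.25*1 + 0.25*2 = 0.75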

Fact 3.1.

• A PL-function f : |K| → R naturally provides a vertex function fV : V(K) → R.

• A simplex-wise lower star filtration for f is also a filtration for the simplex-wise monotone
function $\bar{f} : K \to \mathbb{R}$ where $\bar{f}(\sigma) = \max_{v \in \sigma} f(v)$.

• Similarly, a simplex-wise upper star filtration for f is also a filtration for the simplex-wise
monotone function $\bar{f}(\sigma) = \max_{v \in \sigma} (-f(v))$.

Observe that a given vertex function fV : V(K) → R induces a PL-function f : |K| → R whose
persistence on the topological space |K| can be defined by taking sublevel sets at critical values
(see Definition 3.23 for critical points in the PL case) and then applying Definition 3.4. The relation
of this persistence to the persistence of the lower star filtration of K induced by fV is studied in
Section 3.5.2. Indeed, the persistence of f can be read off from the persistence of the lower star filtration
of fV .
Finally, we note that any simplicial filtration F can naturally be induced by a function. We
introduce this association for unifying the definition of persistence pairing later in Definition 3.7.

Definition 3.3 (Filtration function). If a simplicial filtration F is obtained from a simplex-wise
monotone function or a vertex function f , then F is induced by f . Conversely, if F is given
without any explicit input function, we say F is induced by the simplex-wise monotone function
f where every simplex σ ∈ (Ki \ Ki−1 ) with Ki ≠ Ki+1 is given the value f (σ) = i.
Naturally, every simplicial filtration gives rise to a sequence of homomorphisms $h_p^{i,j}$ as in
Equation (3.2) induced by inclusions, again forming a homology module
$$0 = H_p(K_0) \to H_p(K_1) \to \cdots \to H_p(K_i) \xrightarrow{\;h_p^{i,j}\;} \cdots \to H_p(K_j) \to \cdots \to H_p(K_n) = H_p(K).$$

3.2 Persistence

In both cases of space and simplicial filtrations F, we arrive at a homology module:
$$H_p F : \; 0 = H_p(X_0) \to H_p(X_1) \to \cdots \to H_p(X_i) \xrightarrow{\;h_p^{i,j}\;} \cdots \to H_p(X_j) \to \cdots \to H_p(X_n) = H_p(X) \qquad (3.3)$$
where Xi = Tai if F is a space filtration of a topological space X = T, or Xi = Ki if F is a simplicial
filtration of a simplicial complex X = K. Persistent homology groups for a homology module
are algebraic structures capturing the survival of homology classes through this sequence. In
general, we will refer to homology modules as persistence modules in Section 3.4, recognizing that
we can replace homology groups with vector spaces.

Definition 3.4 (Persistent Betti number). The p-th persistent homology groups are the images of
the homomorphisms: $H_p^{i,j} = \operatorname{im} h_p^{i,j}$, for 0 ≤ i ≤ j ≤ n. The p-th persistent Betti numbers are the
dimensions $\beta_p^{i,j} = \dim H_p^{i,j}$ of the vector spaces $H_p^{i,j}$.

The p-th persistent homology groups contain the important information of when a homology
class is born and when it dies. The issue of birth and death of a class is subtle because
when a new class is born, many other classes that are sums of this new class and existing
classes are also born. Similarly, when a class ceases to exist, many other classes may cease to
exist along with it. Therefore, we need a mechanism to pair births and deaths canonically. Figure 3.7
illustrates the birth and death of a class, though the pairing of birth and death events is more
complicated, as stated in Fact 3.3.
Observe that the non-trivial elements of the p-th persistent homology group $H_p^{i,j}$ consist of classes
that survive from Xi to X j , that is, the classes which do not get ‘quotiented out’ by the boundaries in
X j . So, one can observe:

Fact 3.2. $H_p^{i,j} = Z_p(X_i)/(B_p(X_j) \cap Z_p(X_i))$ and $\beta_p^{i,j} = \dim H_p^{i,j}$.

Notice that Z p (Xi ) is a subgroup of Z p (X j ) because Xi ⊆ X j and hence the above quotient is
well defined. We now formally state when a class is born or dies.

Definition 3.5 (Birth and death). A non-trivial p-th homology class ξ ∈ H p (Xa ) is born at Xi ,
i ≤ a, if $\xi \in H_p^{i,a}$ but $\xi \notin H_p^{i-1,a}$. Similarly, a non-trivial p-th homology class ξ ∈ H p (Xa ) dies
entering X j , a < j, if $h_p^{a,j-1}(\xi)$ is not zero (non-trivial) but $h_p^{a,j}(\xi) = 0$.

Observe that not all classes that are born at Xi necessarily die entering some X j though more
than one such may do so.

Fact 3.3. Let [c] ∈ H p (X j−1 ) be a p-th homology class that dies entering X j . Then, it is born
at Xi if and only if there exists a sequence $i_1 \le i_2 \le \cdots \le i_k = i$ for some k ≥ 1 so that (i)
$0 \neq [c_{i_\ell}] \in H_p(X_{j-1})$ is born at $X_{i_\ell}$ for every ℓ ∈ {1, . . . , k} and (ii) $[c] = [c_{i_1}] + \cdots + [c_{i_k}]$.
One may interpret the above fact as follows. When a class dies, it may be thought of as a
merge of several classes, among which the youngest one $[c_{i_k}]$ determines the birth point. This
viewpoint is particularly helpful while pairing simplices in the persistence algorithm PairPersistence
presented later.

Figure 3.7: A simplistic view of the birth and death of classes: a class [c] is born at Xi since it is not
in the image of H p (Xi−1 ). It dies entering X j since this is the first time its image becomes trivial.

Notice that each Xi , i = 0, . . . , n, is associated with a value of the function f that induces
F. For a space filtration, we say f (Xi ) = ai where Xi = Tai . For a simplicial filtration, we say
f (Xi ) = ai where ai = f (σ) for any σ ∈ Xi when the filtration function (Definition 3.3) is simplex-
wise monotone. When it is a vertex function f , then we extend f to a simplex-wise monotone
function as stated in Fact 3.1.

3.2.1 Persistence diagram


Fact 3.3 provides a qualitative characterization of the pairing of births and deaths of classes. Now
we give a quantitative characterization which helps draw a visual representation of this pairing
called the persistence diagram; see Figure 3.8(left). Consider the extended plane R̄2 := (R ∪ {±∞})2
on which we represent a birth at ai paired with the death at a j as a point (ai , a j ). This pairing
uses a persistence pairing function $\mu_p^{i,j}$ defined below. Strictly positive values of this function
correspond to multiplicities of points in the persistence diagram (Definition 3.8). In what follows,
to account for classes that never die, we extend the induced module in Eqn. (3.3) on the right end
by assuming that H p (Xn+1 ) = 0.
Definition 3.6. For 0 < i < j ≤ n + 1, define
$$\mu_p^{i,j} = (\beta_p^{i,j-1} - \beta_p^{i,j}) - (\beta_p^{i-1,j-1} - \beta_p^{i-1,j}). \qquad (3.4)$$

The first difference on the RHS counts the number of independent classes that are born at or
before Xi and die entering X j . The second difference counts the number of independent classes
that are born at or before Xi−1 and die entering X j . The difference between the two differences
thus counts the number of independent classes that are born at Xi and die entering X j . When
j = n + 1, $\mu_p^{i,n+1}$ counts the number of independent classes that are born at Xi and die entering
Xn+1 . They remain alive till the end in the original filtration without extension, or we say that they
never die. To emphasize that classes which exist in Xn actually never die, we equate n + 1 with ∞
and take an+1 = a∞ = ∞. Observe that, with this assumption, we have $\beta^{i,n+1} = \beta^{i,\infty} = 0$ for every
i ≤ n.
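
The counting in Eqn. (3.4) is mechanical; the following minimal sketch (ours) recovers the
multiplicities from a table of persistent Betti numbers, assuming a hypothetical function
betti(i, j) that returns $\beta_p^{i,j}$ with the convention betti(i, n+1) = 0:

def pairing_multiplicities(betti, n):
    # Evaluate Eqn. (3.4) for all 0 < i < j <= n+1 and keep the
    # strictly positive values.
    mu = {}
    for i in range(1, n + 1):
        for j in range(i + 1, n + 2):
            m = (betti(i, j - 1) - betti(i, j)) - \
                (betti(i - 1, j - 1) - betti(i - 1, j))
            if m > 0:
                mu[(i, j)] = m
    return mu

# Tiny example: a single class born at X_1 that dies entering X_3 (n = 3).
def betti(i, j):
    return 1 if 1 <= i <= j <= 2 else 0

print(pairing_multiplicities(betti, 3))  # {(1, 3): 1}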
Remark 3.1. The p-th homology classes in H p (X j−1 ) that are born at Xi and die entering X j
may not form a vector space. Hence, we cannot talk about its dimension. In fact, the definition of
$\mu_p^{i,j}$, in some sense, compensates for this limitation. This definition involves alternating sums of
dimensions ($\beta^{i,j}$'s) of vector spaces. The dimensions appearing with negative signs lead to this
anomaly. However, one can express $\mu_p^{i,j}$ as the dimension of a vector space which is a quotient of
a subspace; see [18] for details.
Definition 3.7 (Class persistence). For $\mu_p^{i,j} \neq 0$, the persistence Pers ([c]) of a class [c] that is born
at Xi and dies entering X j is defined as Pers ([c]) = a j − ai . When j = n + 1 = ∞, Pers ([c]) equals
an+1 − ai = ∞.
Notice that the values ai can be taken to be the index i when no explicit function is given (Definition 3.3).
In that case, the persistence of a class, sometimes referred to as its index persistence, is j − i.
Definition 3.8 (Persistence diagram). The persistence diagram Dgm p (F f ) (also written Dgm p f )
of a filtration F f induced by a function f is obtained by drawing a point (ai , a j ) with non-zero
multiplicity $\mu_p^{i,j}$, i < j, on the extended plane R̄2 := (R ∪ {±∞})2 where the points on the diagonal
∆ : {(x, x)} are added with infinite multiplicity.
The addition of the diagonal is a technical necessity for results that we will see afterward.
A class born at ai and never dying is represented as a point (ai , an+1 ) = (ai , ∞) (point v in
Figure 3.8) – we call such points in the persistence diagram essential persistent points, and
their corresponding homology classes essential homology classes. Points may share the same
coordinates because classes may be born and die at the same times. This happens only when we allow
multiple homology classes to be created or destroyed at the same function value or filtration point.
In general, this also opens up the possibility of creating infinitely many birth-death pairs even if
the filtration is finite. To avoid such pathological cases, we always assume that the linear maps in
the homology modules have finite rank, a condition known as q-tameness in the literature [80].
There is also an alternative representation of persistence called the barcode, where each birth-death
pair (ai , a j ) is represented by a line segment [ai , a j ), called a bar, which is open on the right.
The open end signifies that the class dying entering X j does not exist in X j . Points at infinity
such as (ai , ∞) are represented with a ray [ai , ∞), giving an infinite bar. See Figure 3.8(right).
Figure 3.9 shows typical persistence diagrams and barcodes (ignoring the types of end points) for
p = 0, 1.
Fact 3.4.

1. If a class has persistence s, then the point representing it lies at Euclidean distance
$\frac{s}{\sqrt{2}}$ from the diagonal ∆ (the distance between t, t̄ and r, r̄ in Figure 3.8).

2. For sublevel set filtrations, all points (ai , a j ) representing a class have ai ≤ a j , so they lie
on or above the diagonal.


Figure 3.8: (left) A persistence diagram with non-diagonal points only in the positive quadrant;
(right) the corresponding barcode.

Figure 3.9: Typical persistence diagrams and the corresponding barcodes for image data; red
and blue correspond to the 0-th and 1-st persistence diagrams respectively. The bars are sorted in
increasing order of their birth time from bottom to top.

3. If mi denotes the multiplicity of an essential point (ai , ∞) in Dgm p (F), where F is a filtration
of X = Xn , one has $\sum_i m_i = \dim H_p(X)$, the p-th Betti number of X.

Here is one important fact relating persistent Betti numbers and persistence diagrams.

Theorem 3.1. For every pair of indices 0 ≤ k ≤ ℓ ≤ n and every p, the p-th persistent Betti
number satisfies
$$\beta_p^{k,\ell} = \sum_{i \le k} \sum_{j > \ell} \mu_p^{i,j}.$$

Observe that $\beta_p^{k,\ell}$ is the number of points in the upper left quadrant of the corner (ak , aℓ ). A
class that is born at Xi and dies entering X j is counted for $\beta_p^{k,\ell}$ iff i ≤ k and j > ℓ. The quadrant is
therefore closed on the right and open at the bottom.

Stability of persistence diagrams. A persistence diagram Dgm p (F f ), as a set of points in the
extended plane R̄2 , summarizes certain topological information of a simplicial complex (or space)
in relation to the function f that induces the filtration F f . However, this is not useful in practice
unless we can be certain that a slight change in f does not change the diagram dramatically.
In practice, f is seldom measured accurately; if its persistence diagram can be approximated
from a slightly perturbed version, it becomes useful. Fortunately, persistence diagrams are stable.
To formulate this stability, we need a notion of distance between persistence diagrams.

Figure 3.10: Two persistence diagrams and their bottleneck distance, which is half of the side
lengths of the squares representing bijections.

Let Dgm p (F f ) and Dgm p (Fg ) be two persistence diagrams for two functions f and g. We
want to consider bijections between points from Dgm p (F f ) and Dgm p (Fg ). However, they may
have different cardinality for off-diagonal points. Recall that persistence diagrams include the
points on the diagonal ∆ each with infinite multiplicity. This addition allows us to borrow points
from the diagonal when necessary to define the bijections. Note that we are considering only
filtrations of finite complexes which also make each homology group finite.

Definition 3.9 (Bottleneck distance). Let Π = {π : Dgm p (F f ) → Dgm p (Fg )} denote the set of
all bijections. Consider the distance between two points x = (x1 , x2 ) and y = (y1 , y2 ) in the L∞ -norm,
$\|x - y\|_\infty = \max\{|x_1 - y_1|, |x_2 - y_2|\}$, with the assumption that ∞ − ∞ = 0. The bottleneck distance
between the two diagrams is:
$$d_b(\mathrm{Dgm}_p(F_f), \mathrm{Dgm}_p(F_g)) = \inf_{\pi \in \Pi} \; \sup_{x \in \mathrm{Dgm}_p(F_f)} \|x - \pi(x)\|_\infty.$$

Fact 3.5. db is a metric on the space of persistence diagrams. Clearly, db (X, Y) = 0 if and only if
X = Y. Moreover, db (X, Y) = db (Y, X) and db (X, Y) ≤ db (X, Z) + db (Z, Y).

There is a caveat for the above fact. If db is taken as a distance on the space of homology
modules H p F instead of the persistence diagrams Dgm p (F) they generate, that is, if we define
db (H p F f , H p Fg ) := db (Dgm p (F f ), Dgm p (Fg )), then it may not be a metric. The first axiom for a
metric becomes false if the homology modules are allowed to have classes created and destroyed
at the same function values. These classes of zero persistence generate points on the diagonal ∆
in the diagram. Since points on the diagonal have infinite multiplicity, two modules differing in
the number of such classes of zero persistence may have diagrams with zero bottleneck distance.
If we allow such cases, db becomes a pseudometric on the space of homology modules, meaning
that it satisfies all axioms of a metric except the first one.
The following theorems, originally proved in [102] and further detailed in [149], quantify the
notion of the stability of the persistence diagram. There are two versions: one involves simplicial
filtrations and the other involves space filtrations. For two functions f, g : X → R, the infinity
norm is defined as $\|f - g\|_\infty := \sup_{x \in X} |f(x) - g(x)|$.

Theorem 3.2 (Stability for simplicial filtrations). Let f, g : K → R be two simplex-wise monotone
functions giving rise to two simplicial filtrations F f and Fg . Then, for every p ≥ 0,
$$d_b(\mathrm{Dgm}_p(F_f), \mathrm{Dgm}_p(F_g)) \le \|f - g\|_\infty.$$

For the second version of the stability theorem, we require that the functions referred to in the
theorem are ‘nice’ in the sense that they are tame. A function f : X → R is tame if the homology
groups of its sublevel sets have finite rank and these ranks change only at finitely many values
called critical.

Theorem 3.3 (Stability for space filtrations). Let X be a triangulable space and f, g : X → R be
two tame functions giving rise to two space filtrations F f and Fg where the values for sublevel
sets include the critical values. Then, for every p ≥ 0,
$$d_b(\mathrm{Dgm}_p(F_f), \mathrm{Dgm}_p(F_g)) \le \|f - g\|_\infty.$$

There is another distance called the q-Wasserstein distance with which persistence diagrams are
also compared.

Definition 3.10 (Wasserstein distance). Let Π be the set of bijections as defined in Definition 3.9.
For any p ≥ 0, q ≥ 1, the q-Wasserstein distance is defined as
$$d_{W,q}(\mathrm{Dgm}_p(F_f), \mathrm{Dgm}_p(F_g)) = \inf_{\pi \in \Pi} \Big[ \sum_{x \in \mathrm{Dgm}_p(F_f)} \|x - \pi(x)\|_\infty^q \Big]^{1/q}.$$

The distance dW,q is also a metric on the space of persistence diagrams just like the bottleneck
distance. It also enjoys a stability property, though not as strong as the one in Theorem 3.3.

Fact 3.6. Let f, g : X → R be two Lipschitz functions defined on a triangulable compact metric
space X. Then, there exist constants C and k depending on X and the Lipschitz constants of f and
g so that for every p ≥ 0 and q ≥ k,
$$d_{W,q}(\mathrm{Dgm}_p(F_f), \mathrm{Dgm}_p(F_g)) \le C \cdot \|f - g\|_\infty^{1 - k/q}.$$

The above result was recently improved [278] by considering the Lq -distance between functions
defined on a common domain X:
$$\|f - g\|_q = \Big( \sum_{x \in X} |f(x) - g(x)|^q \Big)^{1/q}.$$

Theorem 3.4 (Stability for Wasserstein distance). Let f, g : K → R be two simplex-wise monotone
functions on a simplicial complex K. Then, one has
$$d_{W,q}(\mathrm{Dgm}_p(F_f), \mathrm{Dgm}_p(F_g)) \le \|f - g\|_q.$$

Bottleneck distances can be computed using perfect matchings in bipartite graphs. Computing
Wasserstein distances is more difficult; it can be done using an algorithm for minimum weight
perfect matching in weighted bipartite graphs. We leave it as an exercise (Exercise 5).
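
One way to approach the exercise (a sketch of ours, not the book's solution) is to augment each
diagram with the diagonal projections of the other diagram's points and solve the resulting
assignment problem, e.g., with SciPy's Hungarian-method solver:

import numpy as np
from scipy.optimize import linear_sum_assignment

def wasserstein(A, B, q=2):
    # A, B: lists of finite (birth, death) points of two diagrams.
    diag = lambda p: ((p[0] + p[1]) / 2.0,) * 2   # nearest diagonal point
    X = list(A) + [diag(b) for b in B]            # A-tilde
    Y = list(B) + [diag(a) for a in A]            # B-tilde
    n = len(X)
    C = np.zeros((n, n))
    for i, x in enumerate(X):
        for j, y in enumerate(Y):
            if i >= len(A) and j >= len(B):
                C[i, j] = 0.0                     # diagonal-to-diagonal is free
            else:
                C[i, j] = max(abs(x[0] - y[0]), abs(x[1] - y[1])) ** q
    rows, cols = linear_sum_assignment(C)         # min-weight perfect matching
    return C[rows, cols].sum() ** (1.0 / q)

print(wasserstein([(1.0, 5.0), (2.0, 3.0)], [(1.2, 4.8)], q=2))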

Computing bottleneck distances.

Let A and B be the sets of non-diagonal points in two persistence diagrams Dgm p (F f ) and Dgm p (Fg )
respectively. For a point a ∈ A, let ā denote the point of the diagonal nearest to a. Define b̄ for
every point b ∈ B similarly. Let Ā = {ā} and B̄ = {b̄}. Let Ã = A ∪ B̄ and B̃ = B ∪ Ā. We want
to bijectively match points in Ã and B̃. Let Π = {π} denote the set of such matchings. It follows from the
definition that
$$d_b(\mathrm{Dgm}_p(F_f), \mathrm{Dgm}_p(F_g)) = \min_{\pi \in \Pi} \; \sup_{a \in \tilde{A}} \|a - \pi(a)\|_\infty.$$

Then, the bottleneck distance we want to compute must be the L∞ distance max{|xa − xb |, |ya − yb |}
between two points a ∈ Ã and b ∈ B̃. We do a binary search on all such possible O(n2 ) distances where
|Ã| = | B̃| = n. Let δ0 , δ1 , · · · , δn′ be the sorted sequence of these distances in non-decreasing
order.
Given a δ = δi ≥ 0, where i is the median index in the binary search interval [ℓ, u],
we construct a bipartite graph G = (Ã ∪ B̃, E) where an edge e = (a, b), a ∈ Ã, b ∈ B̃, is in E if and
only if either both a ∈ B̄ and b ∈ Ā (with weight(e) = 0) or ka − bk∞ ≤ δ (with weight(e) = ka − bk∞ ).
A complete matching in G is a set of n edges so that every vertex in Ã and B̃ is incident to
exactly one edge in the set. To determine if G has a complete matching, one can use the O(n2.5 )
algorithm of Hopcroft and Karp [198] for complete matching in a bipartite graph. However,
exploiting the geometric embedding of the points in the persistence diagrams, we can apply an
O(n1.5 ) time algorithm of Efrat et al. [154] for the purpose. If such an algorithm affirms that a
complete matching exists, we do the following: if ℓ = u we output δ, otherwise we set u = i
and repeat. If no matching exists, we set ℓ = i and repeat. Observe that a matching has to exist
for some value of δ, in particular for δn′ , and thus the binary search always succeeds. Algorithm
1: Bottleneck lays out the pseudocode for this matching. The algorithm runs in O(n1.5 log n)
time, accounting for the O(log n) probes of the binary search, each applying the O(n1.5 ) time matching
algorithm. However, to achieve this complexity, we have to avoid sorting the n′ = O(n2 ) values, which would take
O(n2 log n) time. Again, using the geometric embedding of the points, one can perform the binary
probes without incurring the cost of sorting. For details and an efficient implementation of this
algorithm see [208].

Algorithm 1 Bottleneck(Dgm p (F f ), Dgm p (Fg ))

Input:
  Two persistence diagrams Dgm p (F f ), Dgm p (Fg )
Output:
  Bottleneck distance db (Dgm p (F f ), Dgm p (Fg ))
 1: Compute sorted distances δ0 ≤ δ1 ≤ · · · ≤ δn′ from Dgm p (F f ) and Dgm p (Fg )
 2: ℓ := 0; u := n′
 3: while ℓ < u do
 4:   i := ⌊(u + ℓ)/2⌋; δ := δi
 5:   Compute graph G = (Ã ∪ B̃, E) where ∀e ∈ E, weight(e) ≤ δ
 6:   if ∃ complete matching in G then
 7:     u := i
 8:   else
 9:     ℓ := i
10:   end if
11: end while
12: Output δ
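
For experimentation, the binary search of Algorithm 1 can be prototyped with an off-the-shelf
Hopcroft–Karp implementation. The following sketch (ours) uses NetworkX instead of the geometric
matching of Efrat et al., so it does not attain the stated running time:

import networkx as nx
from networkx.algorithms import bipartite

def linf(a, b):
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

def bottleneck(A, B):
    # Augment each diagram with diagonal projections of the other's points.
    diag = lambda p: ((p[0] + p[1]) / 2.0,) * 2
    X = [('x', i, p) for i, p in enumerate(list(A) + [diag(b) for b in B])]
    Y = [('y', j, p) for j, p in enumerate(list(B) + [diag(a) for a in A])]
    cands = sorted({0.0} | {linf(x[2], y[2]) for x in X for y in Y})

    def feasible(delta):
        G = nx.Graph()
        G.add_nodes_from(X)
        G.add_nodes_from(Y)
        for x in X:
            for y in Y:
                on_diag = x[1] >= len(A) and y[1] >= len(B)
                if on_diag or linf(x[2], y[2]) <= delta:
                    G.add_edge(x, y)
        m = bipartite.hopcroft_karp_matching(G, top_nodes=set(X))
        return sum(1 for x in X if x in m) == len(X)   # complete matching?

    lo, hi = 0, len(cands) - 1
    while lo < hi:                                     # smallest feasible delta
        mid = (lo + hi) // 2
        if feasible(cands[mid]):
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]

print(bottleneck([(1.0, 5.0), (2.0, 2.5)], [(1.1, 4.9)]))  # 0.25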

3.3 Persistence algorithm


For computational purposes, we focus on simplicial filtrations because it is not always easy to
compute singular homology of topological spaces. We present algorithms that, given a simpli-
cial filtration, compute its persistence diagram. For this, it is sufficient to compute every pair of
simplices that ensue birth and death of a homology class. First, we describe a combinatorial algo-
rithm originally proposed in [152] and later present a version of it in terms of matrix reductions.
We assume that the input is a simplex-wise filtration that begins with an empty complex

∅ = K0 ,→ K1 ,→ K2 ,→ · · · ,→ Kn = K

where K j \ K j−1 = σ j is a single simplex for each j ∈ [1, n].


Remark 3.2. The assumption of simplex-wise filtration does not pose any limitation because any
filtration can be expanded into a simplex-wise filtration. For this, put all simplices in the difference
of two consecutive complexes in the given filtration in any order only ensuring that all faces of
a simplex appear before it in the expanded filtration. The persistence diagram of the original
filtration can be read from the diagram of this expanded simplex-wise filtration by considering
the original filtration function values associated with the simplices.
Observe that a simplex-wise filtration necessarily renders the persistence pairing function $\mu_p^{i,j}$
to assume a value of at most 1 due to the following fact.
Fact 3.7. When a p-simplex σ j = K j \ K j−1 is added, exactly one of the following two possibilities
occurs:
1. A non-boundary p-cycle c along with its classes [c] + h for any class h ∈ H p (K j−1 ) are born
(created). In this case we call σ j a positive simplex (also called a creator).

2. An existing (p − 1)-cycle c along with its class [c] dies (destroyed). In this case we call σ j
a negative simplex (also called a destructor).

Figure 3.11: Red simplices are positive and blue ones are negative. The simplices are indexed
to coincide with their order in the filtration. The pair (·, ·) shown with each subcomplex Ki indicates
the pairing between a positive and a negative simplex; a missing second component in the parenthesis
indicates the introduction of a positive simplex that is not yet paired.

To elaborate on these two cases, consider the example depicted in Figure 3.11. When one
moves from K7 to K8 , a non-boundary loop, which is a 1-cycle (e5 + e6 + e7 + e8 ), is created after
adding edge e8 . Strictly speaking, a positive p-simplex σ j may create more than one p-cycle.
Only one of them can be taken as independent and the others become its linear combinations with
the existing ones in K j−1 . From K8 to K9 , the introduction of edge e9 creates two non-boundary
loops (e5 + e6 + e9 ) and (e7 + e8 + e9 ). But any one of them is the linear combination of the other
one with the existing loop (e5 + e6 + e7 + e8 ). Notice that there is no canonical way to choose an
independent one. However, the creation of a loop is reflected in the increase of the rank of H1 . In
other words, in general, the Betti number β p increases by 1 for a positive simplex. For a negative
simplex, we get the opposite effect. In this case β p−1 decreases by 1, signifying the death of a cycle.
However, unlike positive simplices, the destroyed cycle is determined uniquely up to homology:
it is the equivalence class carried by the boundary of σ j . For example, in Figure 3.11, the loop
(e7 + e8 + e9 ) gets destroyed by triangle t10 when we go from K9 to K10 .

Pairing. We already saw that destruction of a class is uniquely paired with the creation of a
class through the ‘youngest first’ rule; see the discussion after Fact 3.3. By Fact 3.7, this means
that each negative simplex is paired uniquely with a positive simplex. The goal of the persistence
algorithm is to find out these pairs.
Consider the birth and death of the classes by addition of simplices into a filtration. When
a p-simplex σ j is added, we explore if it destroys the class [c] of its boundary c = ∂σ j if it is
not a boundary already. The cycle c was created when the youngest (p − 1)-simplex in it, say
σi , was added. Note that a simplex is younger if it comes later in the filtration. If σi , a positive

(p − 1)-simplex, has already been paired with a p-simplex σ0j , then a class also created by σi got
destroyed when σ0j appeared. We can get the (p − 1)-cycle representing this destroyed class and
add it to ∂σ j . The addition provides a cycle that existed before σi . We update c to be this new
cycle and look for the youngest (p − 1)-simplex σi in c and continue the process till we find one
that is unpaired, or the cycle c becomes empty. In the latter case, we discover that c = ∂σ j was a
boundary cycle already and thus σ j creates a new class in H p (K j ). In the other case, we discover
that σ j is a negative p-simplex which destroys a class created by σi . We pair σ j with σi . Indeed,
one can show that the above algorithm produces the persistence pairs according to Definition 3.11
below, that is, their function values lead to the persistence diagram (Definition 3.8). We give a
proof for a matrix version of the algorithm later (Theorem 3.6).
Definition 3.11 (Persistence pairs). Given a simplex-wise filtration F : K0 ,→ K1 ,→ · · · ,→ Kn ,
for 0 < i < j ≤ n, we say a p-simplex σi = Ki \ Ki−1 and a (p + 1)-simplex σ j = K j \ K j−1 form a
persistence pair (σi , σ j ) if and only if $\mu_p^{i,j} > 0$.
The full algorithm is presented in Algorithm 2:PairPersistence, which takes as input a se-
quence of simplices σ1 , σ2 , · · · , σn ordered according to the filtration of a complex whose persis-
tence diagram is to be computed. It assumes that the complex is represented combinatorially with
adjacency structures among its simplices.

Algorithm 2 PairPersistence(σ1 , σ2 , · · · , σn )

Input:
  An ordered sequence of simplices forming a filtration of a complex
Output:
  Determine if each simplex is ‘positive’ or ‘negative’ and generate persistence pairs
 1: for j = 1 to n do
 2:   c := ∂ p σ j
 3:   σi := the youngest positive (p − 1)-simplex in c
 4:   while σi is paired and c is not empty do
 5:     Let c′ be the cycle destroyed by the simplex paired with σi \∗ computed previously in step 10 ∗\
 6:     c := c′ + c \∗ this addition may cancel simplices ∗\
 7:     Update σi to be the youngest positive (p − 1)-simplex in c
 8:   end while
 9:   if c is not empty then
10:     σ j is a negative p-simplex; generate pair (σi , σ j ); associate c with σ j as destroyed
11:   else
12:     σ j is a positive p-simplex \∗ σ j may get paired later ∗\
13:   end if
14: end for
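
A compact rendering of this algorithm over Z2 follows (our sketch; the set-based chain
representation is an implementation choice, not the book's). Chain addition over Z2 is the
symmetric difference of sets of simplices:

def pair_persistence(simplices):
    # simplices: frozensets of vertex ids, listed in filtration order.
    index = {s: j for j, s in enumerate(simplices)}
    paired, destroyed, pairs = {}, {}, []
    for j, sigma in enumerate(simplices):
        # Boundary of sigma as the set of its codimension-1 faces.
        c = {sigma - {v} for v in sigma if len(sigma) > 1}
        while c:
            tau = max(c, key=lambda s: index[s])   # youngest simplex in c
            if tau not in paired:
                break
            c = c ^ destroyed[paired[tau]]         # add destroyed cycle (Z2)
        if c:
            tau = max(c, key=lambda s: index[s])
            paired[tau], destroyed[j] = j, c       # sigma_j is negative
            pairs.append((tau, sigma))
    return pairs

K = [frozenset(s) for s in ('a', 'b', 'c', 'ab', 'bc', 'ac', 'abc')]
print(pair_persistence(K))

Taking the maximum index directly implements step 3 because the youngest simplex of any cycle
is necessarily positive: a cycle whose youngest simplex is τ cannot have existed before τ was added.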

Let us again consider the example in Figure 3.11 and see how the algorithm PairPersistence works.
From K7 to K8 , e8 is added. Its boundary is c = (v2 + v4 ). The vertex v4 is the youngest positive vertex
in c but it is paired with e7 in K7 . Thus, c is updated to (v3 + v4 + v4 + v2 ) = (v3 + v2 ). The vertex
v3 becomes the youngest positive one but it is paired with e5 . So, c is updated to (v1 + v2 ). The
vertex v2 becomes the youngest positive one but it is paired with e6 . So, c is updated to be empty.
Hence e8 is a positive edge. Now we examine the addition of the triangle t11 from K10 to K11 .
The boundary of t11 is c = (e5 + e6 + e9 ). The youngest positive edge e9 is paired with t10 . Thus,
c is updated by adding the cycle destroyed by t10 , yielding (e5 + e6 + e7 + e8 ). Since e8 is the youngest
positive edge that is not yet paired, t11 finds e8 as its paired positive edge. Observe that we finally
obtain the loop that is destroyed by adding the negative triangle: here, the loop (e5 + e6 + e7 + e8 )
destroyed by t11 .

3.3.1 Matrix reduction algorithm


There is a version of the algorithm PairPersistence that uses only matrix operations. First notice
the following:

• The boundary operator ∂ p : C p → C p−1 can be represented by a boundary matrix D p where
the columns correspond to the p-simplices and the rows correspond to the (p − 1)-simplices.

• It represents the transformation of a basis of C p given by the set of p-simplices to a basis
of C p−1 given by the set of (p − 1)-simplices:
$$D_p[i, j] = \begin{cases} 1 & \text{if } \sigma_i \in \partial_p \sigma_j \\ 0 & \text{otherwise.} \end{cases}$$

• One can combine all boundary matrices into a single matrix D that represents all linear
maps $\bigoplus_p \partial_p = \bigoplus_p (C_p \to C_{p-1})$, that is, the transformation of a basis of all chain groups
together to a basis of itself, but with a shift to one lower dimension:
$$D[i, j] = \begin{cases} 1 & \text{if } \sigma_i \in \partial_* \sigma_j \\ 0 & \text{otherwise.} \end{cases}$$

Definition 3.12 (Filtered boundary matrix). Let F : ∅ = K0 ,→ K1 ,→ . . . ,→ Kn = K be a


filtration induced by an ordering of simplices (σ1 , σ2 , . . . , σn ) in K. Let D denote the boundary
matrix for simplices in K that respects the ordering of the simplices in the filtration, that is, the
simplex σi in the filtration occupies column and row i in D. We call D the filtered boundary
matrix for F.
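
Constructing D is straightforward; here is a minimal sketch (ours), with simplices as frozensets in
filtration order and a dense NumPy array over Z2:

import numpy as np

def filtered_boundary_matrix(simplices):
    # D[i, j] = 1 iff the i-th simplex is a codimension-1 face of the j-th.
    index = {s: i for i, s in enumerate(simplices)}
    n = len(simplices)
    D = np.zeros((n, n), dtype=np.uint8)
    for j, sigma in enumerate(simplices):
        if len(sigma) > 1:               # vertices have empty boundary
            for v in sigma:
                D[index[sigma - {v}], j] = 1
    return D

K = [frozenset(s) for s in ('a', 'b', 'c', 'ab', 'bc', 'ac', 'abc')]
print(filtered_boundary_matrix(K))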

Given any matrix A, let rowA [i] and colA [ j] denote the ith row and jth column of A, respec-
tively. We abuse the notation slightly to let colA [ j] denote also the chain {σi | A[i, j] = 1}, which
is the collection of simplices corresponding to 1’s in the column colA [ j].

Definition 3.13 (Reduced matrix). Let lowA [ j] denote the row index of the last 1 in the jth column
of A, which we call the low-row index of the column j. It is undefined for empty columns (marked
with −1 in Algorithm 3). The matrix A is reduced (or is in reduced form) if lowA [ j] ≠ lowA [ j′ ]
for any j ≠ j′ ; that is, no two columns share the same low-row index.

Fact 3.8. Given a matrix A in reduced form, the non-zero columns of A are linearly independent
over Z2 .

We define a matrix A over Z2 to be upper-triangular if all of its diagonal elements are 1,


and there is no entry A[i, j] = 1 with i > j. We will compute a reduced matrix from a given
boundary matrix by left-to-right column additions. A series of such column additions is equivalent
to multiplying the boundary matrix on right with an upper triangular matrix.
Now, we state a result saying that if a reduced form is obtained via only left-to-right column
additions, then for each column, the low-row index is unique in the sense that it does not depend on
how the reduced form is obtained. Using this result we show that the persistence pairing of simplices
can be obtained from these low-row indices. Given an n1 × n2 matrix A, let $A_{[a,b]}^{[c,d]}$, a ≤ b and c ≤ d,
denote the sub-matrix formed by rows a to b and columns c to d. In cases when b = n2 and
c = 1, we also write it as $A_a^d := A_{[a,n_2]}^{[1,d]}$ for simplicity. For any 1 ≤ i < j ≤ n, define the quantity
rA (i, j) as follows:
$$r_A(i, j) = \operatorname{rank}(A_i^j) - \operatorname{rank}(A_{i+1}^j) + \operatorname{rank}(A_{i+1}^{j-1}) - \operatorname{rank}(A_i^{j-1}). \qquad (3.5)$$
Proposition 3.5 (Pairing uniqueness [106]). Let R = DV, where R is in reduced form and V is
upper triangular. Then for any 1 ≤ j ≤ n, lowR [ j] = i if and only if rD (i, j) = 1.

Next, we show that a pairing based on low-row indices indeed provides the persistence pairs
according to Definition 3.11.
Theorem 3.6. Let D be the m × m filtered boundary matrix for a filtration F (Definition 3.12).
Let R = DV, where R is in reduced form and V is upper triangular. Then, the simplices σi and σ j
in F form a persistent pair if and only if lowR [ j] = i.
Proof. First, it is easy to verify that rR (i, j) = rD (i, j) for any 1 ≤ i < j ≤ n (in particular,
$\operatorname{rank}(R_a^d) = \operatorname{rank}(D_a^d)$, as the effect of V is only to add columns of D to columns on their right).
Combining this and Proposition 3.5, we only need to show that there is a persistent pair (σi , σ j )
(i.e., $\mu_p^{i,j} = 1$) if and only if rR (i, j) = 1.
Next, we observe that due to the uniqueness of the entry lowR [ j] (Proposition 3.5), if we
prove the theorem for a specific reduced matrix R′ = DV ′ , then it holds for any reduced form
R = DV. In what follows, we assume that the reduced form R = DV is obtained by Algorithm
3:MatPersistence(D). For this specific reduction algorithm, it is easy to see that if a simplex
σ j is of dimension p, then all columns ever added to the j-th column correspond to simplices
of dimension p. In particular, let D p denote the matrix obtained by setting all columns in D
corresponding to simplices of dimension ≠ p to be all-zero; hence all non-zero columns in D p
represent the p-th boundary operator ∂ p : C p (K) → C p−1 (K). Define R p similarly. Then observe
that algorithm MatPersistence simply reduces each matrix D p independently, for all dimensions
p, and the reduced form for D p is R p .
In what follows, we assume that the dimension of the simplex σ j (corresponding to the j-th
column of D) is p; and for simplicity, set R̃ := R p . We leave the proof of the following claim as
an exercise (Exercise 5).
Proposition 3.7. Let the dimension of σ j be p and construct R̃ = R p as described above. For
any 1 ≤ i < j, we have that rR (i, j) = rR̃ (i, j).

To this end, let $Z_{p-1}^k$ and $B_{p-1}^k$ denote the (p − 1)-th cycle group and the (p − 1)-th boundary
group of Kk , respectively. Consider the persistence pairing function for 1 ≤ i < j ≤ n:
$$\mu_{p-1}^{i,j} = (\beta_{p-1}^{i,j-1} - \beta_{p-1}^{i,j}) - (\beta_{p-1}^{i-1,j-1} - \beta_{p-1}^{i-1,j}). \qquad (3.6)$$

On the other hand, note that for any 1 ≤ a < b ≤ n,
$$\beta_{p-1}^{a,b} := \operatorname{rank}(H_{p-1}^{a,b}) = \operatorname{rank}\Big( \frac{Z_{p-1}^a}{Z_{p-1}^a \cap B_{p-1}^b} \Big) = \operatorname{rank}(Z_{p-1}^a) - \operatorname{rank}(Z_{p-1}^a \cap B_{p-1}^b). \qquad (3.7)$$

Let $\Gamma_a^b := \{\mathrm{col}_{\tilde{R}}[k] \mid k \in [1, b] \text{ and } 1 \le \mathrm{low}_{\tilde{R}}[k] \le a\}$. Using the facts that all non-zero columns
in R̃ with index at most b form a basis for $B_{p-1}^b$, and that the low-row index of every non-zero
column is unique, one can show that $\operatorname{rank}(Z_{p-1}^a \cap B_{p-1}^b) = |\Gamma_a^b|$. Now consider the set of
all non-zero columns in R̃ with index at most b that are not in $\Gamma_a^b$, denoted by $\widehat{\Gamma}_a^b$. Note that
$|\widehat{\Gamma}_a^b| = \operatorname{rank}(\tilde{R}_{a+1}^b) = \operatorname{rank}(B_{p-1}^b) - |\Gamma_a^b|$; hence
$$\operatorname{rank}(Z_{p-1}^a \cap B_{p-1}^b) = |\Gamma_a^b| = \operatorname{rank}(B_{p-1}^b) - |\widehat{\Gamma}_a^b| = \operatorname{rank}(\tilde{R}_1^b) - \operatorname{rank}(\tilde{R}_{a+1}^b).$$

Combining the above with Proposition 3.7, Eqn. (3.6) and Eqn. (3.7), we thus have that:
$$\begin{aligned}
\mu_{p-1}^{i,j} &= \big( \operatorname{rank}(Z_{p-1}^i) - \operatorname{rank}(Z_{p-1}^i \cap B_{p-1}^{j-1}) \big) - \big( \operatorname{rank}(Z_{p-1}^i) - \operatorname{rank}(Z_{p-1}^i \cap B_{p-1}^j) \big) \\
&\quad - \big( \operatorname{rank}(Z_{p-1}^{i-1}) - \operatorname{rank}(Z_{p-1}^{i-1} \cap B_{p-1}^{j-1}) \big) + \big( \operatorname{rank}(Z_{p-1}^{i-1}) - \operatorname{rank}(Z_{p-1}^{i-1} \cap B_{p-1}^j) \big) \\
&= \operatorname{rank}(Z_{p-1}^i \cap B_{p-1}^j) - \operatorname{rank}(Z_{p-1}^i \cap B_{p-1}^{j-1}) + \operatorname{rank}(Z_{p-1}^{i-1} \cap B_{p-1}^{j-1}) - \operatorname{rank}(Z_{p-1}^{i-1} \cap B_{p-1}^j) \\
&= -\operatorname{rank}(\tilde{R}_{i+1}^j) + \operatorname{rank}(\tilde{R}_{i+1}^{j-1}) - \operatorname{rank}(\tilde{R}_i^{j-1}) + \operatorname{rank}(\tilde{R}_i^j) = r_{\tilde{R}}(i, j) = r_R(i, j) = r_D(i, j).
\end{aligned}$$

By Proposition 3.5, the theorem then follows. □

Algorithm 3 MatPersistence(D)

Input:
  Boundary matrix D of a complex with columns and rows ordered by a given filtration
Output:
  Reduced matrix with each column j either being empty or having a unique lowD [ j] entry
1: for j = 1 → |colD | do
2:   while ∃ j′ < j s.t. lowD [ j′ ] == lowD [ j] and lowD [ j] ≠ −1 do
3:     colD [ j] := colD [ j] + colD [ j′ ]
4:   end while
5:   if lowD [ j] ≠ −1 then
6:     i := lowD [ j] \∗ generate pair (σi , σ j ) ∗\
7:   end if
8: end for
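
The reduction itself is a few lines of code; here is a minimal sketch of MatPersistence (ours)
on a dense 0/1 NumPy matrix, returning the persistence pairs as index pairs:

import numpy as np

def low(D, j):
    rows = np.flatnonzero(D[:, j])
    return rows[-1] if len(rows) else -1   # -1 encodes an empty column

def mat_persistence(D):
    D = D.copy() % 2
    pivot = {}                             # low-row index -> owning column
    pairs = []
    for j in range(D.shape[1]):
        l = low(D, j)
        while l != -1 and l in pivot:      # left-to-right column additions
            D[:, j] = (D[:, j] + D[:, pivot[l]]) % 2
            l = low(D, j)
        if l != -1:
            pivot[l] = j
            pairs.append((l, j))           # sigma_l pairs with sigma_j
    return pairs

# Filtered boundary matrix of a triangle a, b, c, ab, bc, ac, abc (indices 0-6).
D = np.zeros((7, 7), dtype=np.uint8)
D[0, 3] = D[1, 3] = 1                      # boundary of ab
D[1, 4] = D[2, 4] = 1                      # boundary of bc
D[0, 5] = D[2, 5] = 1                      # boundary of ac
D[3, 6] = D[4, 6] = D[5, 6] = 1            # boundary of abc
print(mat_persistence(D))                  # [(1, 3), (2, 4), (5, 6)]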

Matrix reduction algorithm. Notice that there are possibly many R and V for a fixed D forming
a reduced-form decomposition. Theorem 3.6 implies that the persistent pairing is independent of
the particular contents of R and V as long as R is reduced and V is upper triangular. If we reduce
a given filtered boundary matrix D to a reduced form R only with left-to-right column additions,

Figure 3.12: Matrix reduction for a 6 × 4 matrix D; the low entries of columns are shaded to point out the
conflicts. (a) lowD [1] conflicts with lowD [2] and colD [1] is added to colD [2]; (b) lowD [2] conflicts
with lowD [3]; (c) lowD [3] conflicts with lowD [4]; (d) the addition of colD [3] to colD [4] zeroes out
the entire column colD [4].

indeed then we obtain R = DV as required. With this principle, Algorithm 3:MatPersistence is
designed to compute the persistence pairs of simplices. We process the columns of D from left to
right, which corresponds to the order in which the simplices appear in the filtration. The row indices also
follow the same order top down (thus “lower” refers to a larger index, which also means that a
simplex is “younger” in the filtration). We assume that |colD | denotes the number of columns in
D. Suppose we have processed all columns up to j − 1 and are now going to process column
j. We check if the row lowD [ j] contains the lowest 1 of any other column j′ to the left of j, that
is, j′ < j. If so, we add colD [ j′ ] to colD [ j]. This decreases lowD [ j]. We continue this process until
either we turn all entries of colD [ j] to 0, or settle on a lowD [ j] that does not conflict with any
other lowD [ j′ ] to its left. In the latter case, σ j is a negative p-simplex that pairs with the positive
(p − 1)-simplex σlowD [ j] . In the algorithm MatPersistence, we assume that when a column j is
zeroed out completely, lowD [ j] returns −1.
To compute the persistence diagram Dgm(F f ) for a filtration F f , we first run MatPersistence
on the filtered boundary matrix D representing F f . Every computed persistence pair (σi , σ j ) gives
a finite bar [ f (σi ), f (σ j )) or a point with finite coordinates ( f (σi ), f (σ j )) in Dgm(F f ). Every simplex
σi that remains unpaired provides an infinite bar [ f (σi ), ∞) or a point ( f (σi ), ∞) at infinity
in Dgm(F f ). Observe that not every positive p-simplex σi (column i is zeroed out) gives a point
at infinity in Dgm p (F f ); the only ones that do are those that are not paired with a (p + 1)-simplex
whose column is processed afterward. A simple fact about unpaired simplices follows
from Fact 3.4.

Fact 3.9. The number of unpaired p-simplices in a simplex-wise filtration of a simplicial complex
K equals its p-th Betti number β p (K).

We already mentioned that the input boundary matrix D should respect the filtration order,
that is, the row and column indices of D correspond to the indices of the simplices in the input
filtration. Observe that we can consider a slightly different filtration without changing the persistence
pairs. We can arrange all p-simplices for any p ≥ 0 together in the filtration without changing
their relative orders as follows, where $\sigma_j^i$ denotes the jth i-simplex among all i-simplices in the
original filtration:
$$(\sigma_1^0, \sigma_2^0, \ldots, \sigma_{n_0}^0), \; \ldots, \; (\sigma_1^p, \sigma_2^p, \ldots, \sigma_{n_p}^p), \; \ldots, \; (\sigma_1^d, \sigma_2^d, \ldots, \sigma_{n_d}^d) \qquad (3.8)$$
This means columns and rows of p-simplices in D become adjacent while retaining their relative
ordering from the original matrix. Observe that, by this rearrangement, all columns that are added
to a column j in the original D still remain to the left of j in their newly assigned indices. In other
words, processing the rearranged matrix D can be thought of as processing each individual p-boundary
matrix D p = [∂ p ] separately, where the column and row indices respect the relative
orders of the p- and (p − 1)-simplices in the original filtration.

Complexity of MatPersistence. Let the filtration F based on which the boundary matrix D is
constructed insert n simplices. This means that D has at most n rows and columns. Then, the outer
for loop is executed at most O(n) times. Within this for loop, steps 5–7 take only O(1) time. The
complexity is indeed determined by the while loop (steps 2–4). We argue that this loop iterates
at most O(n) times. This follows from the fact that each column addition in step 3 decreases
lowD [ j] by at least one, and over the entire algorithm it cannot decrease by more than the length
of the column, which is O(n). Each column addition in step 3 takes at most O(n) time, giving a
total time of O(n2 ) for the while loop. Accounting for the outer for loop, we get a complexity of
O(n3 ) for MatPersistence.
One can implement the above matrix reduction algorithm with a more efficient data structure,
noting that most of the entries in the input matrix D are zero. A linked list representing the
non-zero entries in the columns of D is space-wise more efficient. Edelsbrunner and Harer [149]
present a clever implementation of MatPersistence using such a sparse matrix representation. For
every column j, the algorithm executes O( j − i) column additions of O( j − i) length each, incurring
a cost of O(( j − i)2 ), where i = 1 if σ j is positive and i is the index of the simplex σi with which σ j pairs
in case it is negative. Therefore, the total time complexity becomes $O\big(\sum_{j \in [1,n]} (j - i)^2\big)$. Here, we
assume that the dimension of the complex K is a constant.
It is worth noting that the matrix reduction algorithm is essentially a version of the classical
Gaussian elimination method with a given column order and a specific choice of row pivots. In
this respect, the persistence of a given filtration can be computed by the PLU factorization of a matrix,
for which Bunch and Hopcroft [57] give an O(M(n)) time algorithm where M(n) is the time to
multiply two n × n matrices. It is known that M(n) = O(nω ) where ω ∈ [2, 2.373) is called the
exponent for matrix multiplication.

3.3.2 Efficient implementation


The matrix reduction algorithm considers a column from left to right and reduces it by left-to-right
additions. As we have observed, every addition to a column with index j pushes lowD [ j]
upward. In the case that σ j is a positive simplex, the entire column is zeroed out. In general,
positive simplices incur more cost than negative ones because lowD [·] needs to be pushed all
the way up for zeroing out the entire column. However, they do not participate in any future
left-to-right column additions. Therefore, if it is known beforehand that the simplex σ j will be a
positive simplex, then the costly step of zeroing out column j can be avoided.

Chen and Kerber [95] observed the following simple fact. If we process the input filtration
backward in dimension, that is, process the boundary matrices D p , p = 1, . . . , d in decreasing
order of dimensions, then a persistence pair (σ p−1 , σ p ) is detected from D p before processing the
column for σ p−1 in D p−1 . Fortunately, we already know that σ p−1 has to be a positive simplex
because it cannot pair with a negative simplex σ p otherwise. So, we can simply ignore the column
of σ p−1 while processing D p−1 . We call it clearing out column p − 1. In practice, this saves a
considerable amount of computation in cases where a lot of positive simplices occur such as in
Rips filtrations. Algorithm 4:ClearPersistence implements this idea.
We cannot take advantage of the clearing for the last dimension in the filtration. If d is the
highest dimension of the simplices in the input filtration, the matrix Dd has to be processed for all
columns because the pairings for the positive d-simplices are not available.


Figure 3.13: Matrix reduction with the twisted matrix D∗ of the matrix D in Figure 3.12, which is
first transposed and then has its rows and columns reversed in order; the conflicts in lowD [·] are
resolved to obtain the intermediate matrices shown in (a) through (d); the last transformation from
(c) to (d) assumes that all conflict resolutions from columns 3 through 1 are completed. Observe that
every column-row pair corresponds to a row-column pair in the original matrix. Also, all columns
that are zeroed out here correspond to the rows in the original matrix that did not get paired with any
column, meaning that they represent either negative simplices or positive simplices that never get paired.

If the number of d-simplices is large compared to simplices of lower dimensions, the incurred
cost of processing their columns can still be high. For example, in a Rips filtration restricted up to
a certain dimension d, the number of d-simplices becomes usually much larger than the number
of, say, 1-simplices. In those cases, the clearing can be more cost-effective if it can be applied
forward.
In this respect, the following observation becomes helpful.

Algorithm 4 ClearPersistence(D1 , D2 , . . . , Dd )

Input:
  Boundary matrices ordered by dimension of the boundary operators, with columns ordered by filtration
Output:
  Reduced matrices with each column for a negative simplex having a unique low entry
 1: MatPersistence(Dd )
 2: for i = (d − 1) → 1 do
 3:   for j = 1 → |colDi | do
 4:     if σ j is not paired while processing Di+1 then
 5:       \∗ column j is not processed if σ j is already paired ∗\
 6:       while ∃ j′ < j s.t. lowDi [ j] ≠ −1 and lowDi [ j′ ] == lowDi [ j] do
 7:         colDi [ j] := colDi [ j] + colDi [ j′ ]
 8:       end while
 9:       if lowDi [ j] ≠ −1 then
10:         k := lowDi [ j] \∗ generate pair (σk , σ j ) ∗\
11:       end if
12:     end if
13:   end for
14: end for
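
In code, clearing is a one-line change to the reduction loop; the sketch below (ours) processes
dimensions in decreasing order and skips any column already identified as positive one dimension
higher:

import numpy as np

def low(D, j):
    rows = np.flatnonzero(D[:, j])
    return rows[-1] if len(rows) else -1

def clear_persistence(D_by_dim, dims):
    # D_by_dim[p]: full-size matrix whose non-zero columns are the boundaries
    # of the p-simplices; dims[j]: dimension of the j-th simplex.
    cleared, pairs = set(), []
    for p in sorted(set(dims), reverse=True):
        D = D_by_dim[p].copy() % 2
        pivot = {}
        for j in (k for k, d in enumerate(dims) if d == p):
            if j in cleared:
                continue                   # clearing: known positive and paired
            l = low(D, j)
            while l != -1 and l in pivot:
                D[:, j] = (D[:, j] + D[:, pivot[l]]) % 2
                l = low(D, j)
            if l != -1:
                pivot[l] = j
                pairs.append((l, j))
                cleared.add(l)             # sigma_l is positive: skip it later
    return pairs

On the triangle example used earlier, dimension 2 is reduced first, which pairs the triangle with
the edge ac and clears that edge's column before dimension 1 is processed.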

Let D∗p denote the anti-transpose of the matrix D p , defined as the transpose of D p with the columns
and rows ordered in reverse. This means that if D p has row and column indices 1, . . . , m and 1, . . . , n
respectively, then D∗p (i, j) = D p (n + 1 − j, m + 1 − i). Call it the twisted matrix of D p . Figure 3.13 shows the
twisted matrix D∗ of the matrix D in Figure 3.12, where the rows and columns are marked with
the indices of the original matrix. The following proposition guarantees that we can compute the
persistence pairs of D p from the matrix D∗p .

Proposition 3.8. (σ p−1 , σ p ) is a persistence pair computed from D p if and only if (σ p , σ p−1 ) is
computed as a pair from D∗p by MatPersistence(D∗p ).

Proof. Let the indices of σ p−1 and σ p in D p be i and j respectively. Then, by Theorem 3.6, one
has lowR [ j] = i where R is the reduced matrix obtained from D p by left-to-right column additions.
Consider bottom-to-top row additions in D p , each of which takes a row and adds it to a row above
it. Similar to lowA [ j] for a matrix A, let lftA [i] denote the column index of the leftmost 1 in
row i of A. Call A left-reduced if every non-zero row i has a unique lftA [i]. In the rest of the proof,
for simplicity, we use the row and column indices of D p also for D∗p , that is, by an index pair ( j, i)
in D∗p , we actually mean the pair (n + 1 − j, m + 1 − i).
First, observe that each bottom-to-top row addition in D p is equivalent to a left-to-right column
addition in D∗p . Also, a reduced matrix obtained by left-to-right column additions in D∗p corresponds
to a left-reduced matrix obtained by the corresponding bottom-to-top row additions in D p . So, if S
denotes the reduced matrix obtained from D∗p by left-to-right column additions and L denotes the
left-reduced matrix obtained from D p by bottom-to-top row additions, then lowS [i] = j if and
only if lftL [i] = j. Furthermore, MatPersistence(D∗p ) computes the pair ( j, i) (hence (σ p , σ p−1 ))
if and only if lowS [i] = j.
Therefore, to prove the proposition, it is sufficient to argue that lowR [ j] = i if and only if
lftL [i] = j. By Proposition 3.5, lowR [ j] = i if and only if rD p (i, j) as defined in Eqn. (3.5) equals
1. Therefore, it is sufficient to show that lftL [i] = j if and only if rD p (i, j) = 1.
The above claim can be proved exactly the same way as Proposition 3.5 is proved in [106],
while replacing the role of lowR [ j] with lftL [i]. Observe that bottom-to-top row additions do not
change the rank of the lower-left minors. Hence, rD p = rL . Therefore, it is sufficient to show
that lftL [i] = j if and only if rL (i, j) = 1. Assume first lftL [i] = j. The rows of $L_i^j$ (see the
definitions above Eqn. (3.5)) are linearly independent and hence $\operatorname{rank}(L_i^j) - \operatorname{rank}(L_{i+1}^j) = 1$.
Now delete the last column in $L_i^j$, which leaves the top row with only zeroes. This implies
that $\operatorname{rank}(L_i^{j-1}) - \operatorname{rank}(L_{i+1}^{j-1}) = 0$. This gives rL (i, j) = 1 as needed. Next, assume that
lftL [i] ≠ j. Consider $L_i^j$ and $L_i^{j-1}$. If lftL [i] > j, the top row in both matrices is zero. Therefore,
$\operatorname{rank}(L_i^j) - \operatorname{rank}(L_{i+1}^j) = 0$ and also $\operatorname{rank}(L_i^{j-1}) = \operatorname{rank}(L_{i+1}^{j-1})$, giving rL (i, j) = 0 as required.
If lftL [i] < j, the top row in both matrices is non-zero, giving $\operatorname{rank}(L_i^j) - \operatorname{rank}(L_{i+1}^j) = 1$ and
$\operatorname{rank}(L_i^{j-1}) - \operatorname{rank}(L_{i+1}^{j-1}) = 1$, giving again rL (i, j) = 0 as required. □

To apply clearing, we process D∗p+1 after D∗p by calling ClearPersistence(D∗d , · · · , D∗2 , D∗1 ),
because if we get a pair (σ p+1 , σ p ) while processing D∗p , we already know that σ p+1 is a negative
simplex and its column in D∗p+1 cannot contain a defined low entry. This means that the column
of σ p+1 in D∗p+1 can be zeroed out and hence can be ignored. Now, the only boundary matrix that
needs to be processed without any clearing is D∗1 . So, depending on whether Dd or D1 is larger,
one can choose to process the filtration in increasing or decreasing dimension respectively.

3.4 Persistence modules


We have seen in Section 3.2.1 that persistence diagrams are stable with respect to the perturbation
of the function that defines the filtration on a given simplicial complex or a space. This requires the
domain of the function to be fixed. The result depends on the observation that perturbations in the
filtrations are bounded by the perturbations in the function which in turn also results into bounded
perturbations at the homology level. A natural follow up is to derive a bound of the perturbations
of the persistence diagrams directly in terms of the perturbations at the homology level. Toward
this goal, we now define a generalized notion of homology modules called persistence modules
and a distance among them called the interleaving distance.
Recall that a filtration gives rise to a homology module which is a sequence of homology
groups connected by homomorphisms that are induced by inclusions defining the filtration. These
homology groups when defined over a field (e.g. Z2 ) can be thought of as vector spaces connected
by linear maps. Persistence modules extend homology modules by taking vector spaces in place
of homology groups and linear maps in place of inclusion induced homomorphisms.
We make one more extension. So far, the sequences in a filtration and homology modules
have been indexed over a finite subset of natural numbers. It turns out that we can enlarge the
index set to be any subposet A of R. In the following definition persistence modules and their
interleaving distance are defined over the poset (A, ≤).

Definition 3.14 (Persistence module). A persistence module over a poset A ⊆ R is any collection
$V = \{V_a\}_{a \in A}$ of vector spaces Va together with linear maps $v_{a,a'} : V_a \to V_{a'}$ so that $v_{a,a} = \mathrm{id}$ and
$v_{a',a''} \circ v_{a,a'} = v_{a,a''}$ for all a, a′ , a′′ ∈ A where a ≤ a′ ≤ a′′ . Sometimes we write
$V = \{V_a \xrightarrow{v_{a,a'}} V_{a'}\}_{a \le a'}$ to denote this collection with the maps.
Remark 3.3. A persistence module defined over a subposet A of R can be ‘extended’ into a
module over R. For this, for any a < a′ ∈ A where the open interval (a, a′ ) is not in A and for any
a ≤ b < b′ < a′ , assume that $v_{b,b'}$ is an isomorphism; also set $\lim_{a \to -\infty} V_a = 0$ if it is not given.
Our goal is to define a distance between two persistence modules with respect to which we
would bound the distance between their persistence diagrams. Given two persistence modules
defined over R, we define a distance between them by identifying maps between constituent vector
spaces of the modules.
We will come across a structural property involving maps called commutative diagrams quite
often in this and following chapters.
Definition 3.15 (Commutative diagram). A commutative diagram is a collection of maps $A_i \xrightarrow{f_i} B_i$
where any two compositions of maps beginning and ending in the same sets result in equal maps.
Formally, whenever we have two sequences in the collection of the form
$$A = U_1 \xrightarrow{f_1} U_2 \to \cdots \xrightarrow{f_m} U_{m+1} = B, \qquad A = V_1 \xrightarrow{g_1} V_2 \to \cdots \xrightarrow{g_n} V_{n+1} = B,$$
we have $f_m \circ \cdots \circ f_1 = g_n \circ \cdots \circ g_1$. Commutative diagrams are usually formed by commutative
triangles and squares.
Definition 3.16 (ε-interleaving). Let U and V be two persistence modules over the index set
R. We say U and V are ε-interleaved if there exist two families of maps $\varphi_a : U_a \to V_{a+\varepsilon}$ and
$\psi_a : V_a \to U_{a+\varepsilon}$ satisfying the following two conditions:

1. $v_{a+\varepsilon, a'+\varepsilon} \circ \varphi_a = \varphi_{a'} \circ u_{a,a'}$ and $u_{a+\varepsilon, a'+\varepsilon} \circ \psi_a = \psi_{a'} \circ v_{a,a'}$ [rectangular commutativity]

2. $\psi_{a+\varepsilon} \circ \varphi_a = u_{a, a+2\varepsilon}$ and $\varphi_{a+\varepsilon} \circ \psi_a = v_{a, a+2\varepsilon}$ [triangular commutativity]

$$\mathsf{U}:\ \cdots \longrightarrow U_a \longrightarrow U_{a+\varepsilon} \longrightarrow U_{a+2\varepsilon} \longrightarrow \cdots$$
$$\mathsf{V}:\ \cdots \longrightarrow V_a \longrightarrow V_{a+\varepsilon} \longrightarrow V_{a+2\varepsilon} \longrightarrow \cdots$$
Some of the relevant maps for interleaving between two modules are shown above (the maps $\varphi_a$ and $\psi_a$ run diagonally between the two rows), whereas the two parallelograms and the two triangles below depict the rectangular and the triangular commutativities respectively.

The parallelograms assert that
$$\varphi_{a'} \circ u_{a,a'} = v_{a+\varepsilon,a'+\varepsilon} \circ \varphi_a \qquad \text{and} \qquad \psi_{a'} \circ v_{a,a'} = u_{a+\varepsilon,a'+\varepsilon} \circ \psi_a,$$
while the triangles assert that
$$\psi_{a+\varepsilon} \circ \varphi_a = u_{a,a+2\varepsilon} \qquad \text{and} \qquad \varphi_{a+\varepsilon} \circ \psi_a = v_{a,a+2\varepsilon}.$$
Definition 3.17 (Interleaving distance). Given two persistence modules $\mathsf{U}$ and $\mathsf{V}$, their interleaving distance is defined as
$$d_I(\mathsf{U}, \mathsf{V}) = \inf\{\varepsilon \mid \mathsf{U} \text{ and } \mathsf{V} \text{ are } \varepsilon\text{-interleaved}\}.$$

Observe that, when $\varepsilon = 0$, Definition 3.16 implies that the maps $\varphi_a\colon U_a \to V_a$ and $\psi_a\colon V_a \to U_a$ are isomorphisms. In that case, we get the following diagram where each vertical map is an isomorphism and each square commutes; that is, we get two isomorphic persistence modules:
$$\begin{array}{ccccccc}
\mathsf{U}: & \cdots \longrightarrow & U_a & \longrightarrow \cdots \longrightarrow & U_{a'} & \longrightarrow & \cdots \\
 & & \big\downarrow \cong & & \big\downarrow \cong & & \\
\mathsf{V}: & \cdots \longrightarrow & V_a & \longrightarrow \cdots \longrightarrow & V_{a'} & \longrightarrow & \cdots
\end{array}$$

Definition 3.18 (Isomorphic persistence modules). We say two persistence modules $\mathsf{U}$ and $\mathsf{V}$ indexed over an index set $A \subseteq \mathbb{R}$ are isomorphic if the following two conditions hold (illustrated by the diagram above):
1. $U_a \cong V_a$ for every $a \in A$, and
2. for every $x \in U_a$, if $x$ is mapped to $y \in V_a$ by the isomorphism, then $u_{a,a'}(x) \in U_{a'}$ is mapped to $v_{a,a'}(y) \in V_{a'}$ also by the isomorphism.

Fact 3.10. If two persistence modules arising from two filtrations $F_f$ and $F_g$ are isomorphic, the persistence diagrams $\mathrm{Dgm}_p(F_f)$ and $\mathrm{Dgm}_p(F_g)$ are identical.
We have seen earlier that filtrations give rise to homology modules and hence persistence modules. Just like persistence modules, we can define an interleaving distance between two filtrations too. In the following definition, $\iota_{a,a'}$ denotes the inclusion map from $X_a$ to $X_{a'}$ and also from $Y_a$ to $Y_{a'}$ for $a' \ge a$. For simplicial filtrations, we need contiguity of simplicial maps to assert equality of maps at the homology level, whereas for space filtrations, we need homotopy of continuous maps to assert equality at the homology level. These maps are between filtrations and not internal maps within the filtrations, which are still inclusions. In the next chapter, we go from inclusions to simplicial maps as internal maps (see Definition 4.2).
Definition 3.19 ($\varepsilon$-interleaving of filtrations). We say two simplicial (space resp.) filtrations $X$ and $Y$ defined over $\mathbb{R}$ are $\varepsilon$-interleaved if there exist two families of simplicial (continuous resp.) maps $\varphi_a\colon X_a \to Y_{a+\varepsilon}$ and $\psi_a\colon Y_a \to X_{a+\varepsilon}$ satisfying the following two conditions:
1. $\iota_{a+\varepsilon,a'+\varepsilon} \circ \varphi_a$ is contiguous (homotopic) to $\varphi_{a'} \circ \iota_{a,a'}$ and $\iota_{a+\varepsilon,a'+\varepsilon} \circ \psi_a$ is contiguous (homotopic) to $\psi_{a'} \circ \iota_{a,a'}$ [rectangular commutativity]
2. $\psi_{a+\varepsilon} \circ \varphi_a$ is contiguous (homotopic) to $\iota_{a,a+2\varepsilon}$ and $\varphi_{a+\varepsilon} \circ \psi_a$ is contiguous (homotopic) to $\iota_{a,a+2\varepsilon}$ [triangular commutativity]

Similar to persistence modules, we can define the interleaving distance between two filtrations:
$$d_I(X, Y) = \inf\{\varepsilon \mid X \text{ and } Y \text{ are } \varepsilon\text{-interleaved}\}.$$
Two $\varepsilon$-interleaved filtrations give rise to $\varepsilon$-interleaved persistence modules at the homology level. Since contiguous simplicial (homotopic continuous resp.) maps become equal at the homology level, we obtain the following inequality.

Proposition 3.9. $d_I(\mathsf{H}_pX, \mathsf{H}_pY) \le d_I(X, Y)$ for any $p \ge 0$, where $\mathsf{H}_pX$ and $\mathsf{H}_pY$ denote the persistence modules of $X$ and $Y$ respectively at the homology level.

Now we relate the interleaving distance between two persistence modules and the persistence diagrams they define. For this, we consider a special type of persistence module called an interval module. Below, we use the standard convention that an open end of an interval is denoted with a parenthesis '(' or ')' and a closed end with a square bracket '[' or ']'.

Definition 3.20 (Interval module). Given an index set $A \subseteq \mathbb{R}$ and a pair of indices $b, d \in A$, $b \le d$, four types of interval modules, denoted $I[b,d)$, $I(b,d]$, $I[b,d]$, $I(b,d)$ respectively, are special persistence modules $\{V_a \xrightarrow{v_{a,a'}} V_{a'}\}_{a,a' \in A}$ defined as follows:

• (closed-open) $I[b,d)$: (i) $V_a = \mathbb{Z}_2$ for all $a \in [b,d)$ and $V_a = 0$ otherwise; (ii) $v_{a,a'}$ is the identity map for $b \le a \le a' < d$ and the zero map otherwise.

• (open-closed) $I(b,d]$: (i) $V_a = \mathbb{Z}_2$ for all $a \in (b,d]$ and $V_a = 0$ otherwise; (ii) $v_{a,a'}$ is the identity map for $b < a \le a' \le d$ and the zero map otherwise.

• (closed-closed) $I[b,d]$: (i) $V_a = \mathbb{Z}_2$ for all $a \in [b,d]$ and $V_a = 0$ otherwise; (ii) $v_{a,a'}$ is the identity map for $b \le a \le a' \le d$ and the zero map otherwise.

• (open-open) $I(b,d)$: (i) $V_a = \mathbb{Z}_2$ for all $a \in (b,d)$ and $V_a = 0$ otherwise; (ii) $v_{a,a'}$ is the identity map for $b < a \le a' < d$ and the zero map otherwise.
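To make the four types concrete, the following minimal sketch in code (our own illustration, not part of the text's development; the class and method names are hypothetical) encodes an interval module $I\langle b, d\rangle$ with $\mathbb{Z}_2$ coefficients.

```python
# A minimal sketch of an interval module I<b, d>; the endpoint types are
# encoded by two booleans (closed_left/closed_right are our own names).

class IntervalModule:
    def __init__(self, b, d, closed_left=True, closed_right=False):
        self.b, self.d = b, d
        self.closed_left, self.closed_right = closed_left, closed_right

    def contains(self, a):
        left = (self.b <= a) if self.closed_left else (self.b < a)
        right = (a <= self.d) if self.closed_right else (a < self.d)
        return left and right

    def space_dim(self, a):
        # dim V_a: 1 (the vector space Z_2) inside the interval, 0 outside
        return 1 if self.contains(a) else 0

    def map_rank(self, a, a2):
        # rank of v_{a,a2}: identity (rank 1) iff both indices lie in the
        # interval; the zero map (rank 0) otherwise
        assert a <= a2
        return 1 if (self.contains(a) and self.contains(a2)) else 0

# The closed-open module I[0, 3):
m = IntervalModule(0, 3)
assert m.space_dim(0) == 1 and m.space_dim(3) == 0
assert m.map_rank(0, 2) == 1 and m.map_rank(2, 3) == 0
```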

In general, we denote the four types of interval modules as $I\langle b, d\rangle$ when we are oblivious to the particular type. The two endpoints $b, d$ signify the birth and the death points of the interval, in analogy to the bars we have seen for persistence diagrams. This is why sometimes we also write $I\langle b, d\rangle = \langle b, d\rangle$. Gabriel [163] showed that a persistence module decomposes uniquely into interval modules when the index set is finite. This condition can be relaxed further as stated in the proposition below. A persistence module $\mathsf{U}$ for which each of the vector spaces $U_a$, $a \in A \subseteq \mathbb{R}$, has finite dimension is called a pointwise finite dimensional (p.f.d. in short) persistence module. A persistence module for which the connecting linear maps have finite rank is called q-tame. The results below are part of a more general concept called quiver theory.

Proposition 3.10.

• Any persistence module over a finite index set decomposes uniquely up to isomorphism into closed-closed interval modules, that is, $\mathsf{U} \cong \bigoplus_{j \in J} I[b_j, d_j]$ [163].

• Any p.f.d. persistence module decomposes uniquely into interval modules, that is, $\mathsf{U} \cong \bigoplus_{j \in J} I\langle b_j, d_j\rangle$ [111, 298].

• Any q-tame persistence module decomposes uniquely into interval modules [80].

The birth and death points of the interval modules that a given persistence module $\mathsf{U}$ decomposes into (Proposition 3.10) can be plotted as points in $\mathbb{R}^2$. This defines a persistence diagram $\mathrm{Dgm}\,\mathsf{U}$ for a persistence module $\mathsf{U}$. We aim to relate the interleaving distance between persistence modules and the bottleneck distance between their persistence diagrams thus defined.

Definition 3.21 (PD for persistence module). Let $\mathsf{U} \cong \bigoplus_j I\langle b_j, d_j\rangle$ be the interval decomposition of a given persistence module $\mathsf{U}$ (Proposition 3.10). The collection of points $\{(b_j, d_j)\}$ with proper multiplicity, together with the points on the diagonal $\Delta: \{(x, x)\}$ with infinite multiplicity, constitutes the persistence diagram $\mathrm{Dgm}\,\mathsf{U}$ of the persistence module $\mathsf{U}$.

For the index set A = R, Chazal et al. [77] showed that the bottleneck distance between two
persistence diagrams of p.f.d. modules is bounded from above by their interleaving distance. The
result also holds for q-tame modules. It is proved in [23, 220] that the two distances are indeed
equal.

Theorem 3.11. Given two q-tame persistence modules $\mathsf{U}$ and $\mathsf{V}$ defined over the totally ordered index set $\mathbb{R}$, $d_I(\mathsf{U}, \mathsf{V}) = d_b(\mathrm{Dgm}\,\mathsf{U}, \mathrm{Dgm}\,\mathsf{V})$.
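As a sanity check of Theorem 3.11, consider two modules each decomposing into a single interval, $I\langle b_1, d_1\rangle$ and $I\langle b_2, d_2\rangle$. For two one-point diagrams, the bottleneck distance (and hence, by the theorem, the interleaving distance) either matches the two points directly or matches both to the diagonal, whichever is cheaper; the helper below is our own hypothetical illustration.

```python
# Bottleneck distance between two single-point persistence diagrams
# {(b1, d1)} and {(b2, d2)} under the L-infinity ground metric.

def bottleneck_single(b1, d1, b2, d2):
    direct = max(abs(b1 - b2), abs(d1 - d2))      # match the two points
    to_diag = max((d1 - b1) / 2, (d2 - b2) / 2)   # match both to the diagonal
    return min(direct, to_diag)

print(bottleneck_single(0, 10, 1, 9))   # 1: direct matching wins
print(bottleneck_single(0, 1, 5, 6))    # 0.5: both points go to the diagonal
```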

Remark 3.4. The isometry theorem stated for the index set $\mathbb{R}$ does not apply directly to persistence modules that are not defined over the index set $\mathbb{R}$. In this case, to define the interleaving distance, we can extend the module to be indexed over $\mathbb{R}$ as described in Remark 3.3. For example, consider a persistence module $\mathsf{H}_pF$ obtained from a filtration $F$ defined on a finite index set $A$ or when $A = \mathbb{Z}$. Observe that all interval modules for $\mathsf{H}_pF$ (without extension) are of closed-closed type $[b, d]$ for some $b, d \in A$. This brings out a subtlety. The intervals of the form $[b, d]$ where $b = d$ are mapped to the diagonal $\Delta$ in the persistence diagram. These points get ignored while computing the bottleneck distance as both diagrams have the diagonal points with infinite multiplicity. In fact, the isometry theorem (Theorem 3.11) does not hold if this is not taken care of. To address the issue, for persistence modules $\mathsf{H}_pF$ generated by a finite filtration $F$, we map each interval $[b, d]$ in the decomposition of $\mathsf{H}_pF$ to a point $(b, d+1)$ in $\mathrm{Dgm}_p(F)$ (Definition 3.8). This aligns with the observation that, after the extension over the index set $\mathbb{R}$, the interval $[b, d]$ indeed stretches to $[b, d+1)$.
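In code, the convention of Remark 3.4 amounts to a one-line shift; the helper below is a hypothetical illustration.

```python
# Convert closed-closed intervals [b, d] over a finite index set into the
# half-open persistence points (b, d + 1), so that intervals with b = d
# do not collapse onto the diagonal.

def intervals_to_diagram(intervals):
    return [(b, d + 1) for (b, d) in intervals]

print(intervals_to_diagram([(2, 2), (1, 4)]))   # [(2, 3), (1, 5)]
```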

3.5 Persistence for PL-functions


Given a PL-function $f: |K| \to \mathbb{R}$ on a simplicial complex $K$ (Definition 3.2), we can produce a simplicial filtration of $K$ as well as a space filtration of the topological space $|K|$. In this section, we study the relation between the persistent homology of the two, noting that the former involves simplicial homology and the latter singular homology. Observe that the PL framework allows us to inspect the topological space $|K|$ through the lens of the function $f$, and is useful in practice as one can describe different properties of $K$ by designing appropriate (descriptor) functions $f$.

In Section 3.5.1, we describe critical points of such functions. The restriction of $f$ to the vertex set of $K$ is a vertex function $f_V: V(K) \to \mathbb{R}$, which naturally induces a simplicial filtration (the lower-star filtration). In Section 3.5.2, we relate the space filtration of the PL-function $f: |K| \to \mathbb{R}$ with the simplicial filtration induced by $f_V$, which in turn allows us to apply the output of the persistence algorithm run on $F_{f_V}$ to the space filtration. Finally, in Section 3.5.3, we present a simple algorithm to compute the 0-th persistence diagram induced by a PL-function (and thus also by a vertex function).

3.5.1 PL-functions and critical points


In Section 1.5, we discussed smooth functions defined on smooth manifolds. However, often the domain is a piecewise-linear domain such as a simplicial complex, and a natural family of functions defined on a simplicial complex is that of piecewise-linear (PL) functions as introduced in Definition 3.2; in particular, recall that a PL-function $f: |K| \to \mathbb{R}$ is determined by its restriction to the vertices $f|_{V(K)}: V(K) \to \mathbb{R}$, extended linearly within each simplex $\sigma \in K$. From now on, we will simplify notation and use $f$ to denote the vertex function $f|_{V(K)}$ as well; that is, we write the vertex function also as $f: V(K) \to \mathbb{R}$.

PL-critical points. For a Morse function $f$ defined on a smooth $d$-manifold $M$, the Morse Lemma (see Proposition 1.2) suggests that the index of a critical point $p$ is completely determined by its local neighborhood within the sublevel set $M_{\le f(p)}$. For PL-functions, this is captured by lower-stars and lower-links. We define the PL-critical points of PL-functions using homology groups. However, as the neighborhood of a point is not necessarily a topological ball, we now need to consider both the lower and upper links. In this context, it is more convenient to use the $p$-th reduced Betti number $\tilde{\beta}_p(X)$ of a space/complex $X$.

Definition 3.22 (Reduced Betti number). $\tilde{\beta}_p(X) = \beta_p(X)$ for $p > 0$. For $p = 0$: if $X$ is not empty, $\tilde{\beta}_0(X) = \beta_0(X) - 1$ and $\tilde{\beta}_{-1}(X) = 0$; otherwise, $\tilde{\beta}_0(X) = 0$ and $\tilde{\beta}_{-1}(X) = 1$.

Definition 3.23 (PL-critical points). Given a PL-function $f: |K| \to \mathbb{R}$, we say that a vertex $v \in K$ is a regular vertex or point if $\tilde{\beta}_p(\mathrm{Llk}(v)) = 0$ and $\tilde{\beta}_p(\mathrm{Ulk}(v)) = 0$ for every $p \ge -1$. Otherwise, it is a PL-critical (or simply critical) vertex or point. Furthermore, we say that $v$ has lower-link-index $p$ if $\tilde{\beta}_{p-1}(\mathrm{Llk}(v)) > 0$. Similarly, $v$ has upper-link-index $p$ if $\tilde{\beta}_{p-1}(\mathrm{Ulk}(v)) > 0$.
The function value of a critical point is a critical value for $f$.
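For the special case of a 1-complex (graph), the link of a vertex is a discrete set of vertices, so only $\tilde{\beta}_{-1}$ (empty link) and $\tilde{\beta}_0$ (number of points minus one) can be non-zero, and Definition 3.23 can be checked directly. The following sketch is our own illustration with hypothetical names; ties in function values are broken by vertex identifier.

```python
# Classify the vertices of a graph (1-complex) as regular or PL-critical
# by inspecting their lower and upper links, which here are just the sets
# of lower and upper neighbors.

def pl_critical_vertices(adj, f):
    critical = {}
    for v, nbrs in adj.items():
        lower = [u for u in nbrs if (f[u], u) < (f[v], v)]
        upper = [u for u in nbrs if (f[u], u) > (f[v], v)]
        indices = []
        if not lower:
            indices.append(('lower-link-index', 0))  # beta~_{-1}(Llk) = 1: a minimum
        elif len(lower) > 1:
            indices.append(('lower-link-index', 1))  # beta~_0(Llk) > 0
        if not upper:
            indices.append(('upper-link-index', 0))  # beta~_{-1}(Ulk) = 1: a maximum
        elif len(upper) > 1:
            indices.append(('upper-link-index', 1))  # beta~_0(Ulk) > 0
        if indices:
            critical[v] = indices
    return critical

# Path a - b - c with f(a) = 0, f(b) = 2, f(c) = 1: a and c are minima,
# while b is a maximum that also has lower-link-index 1.
adj = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
f = {'a': 0, 'b': 2, 'c': 1}
print(pl_critical_vertices(adj, f))
```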

Discussions of PL-critical points. Some examples of PL-critical points are given in Figure 3.14. As mentioned above, in the smooth case for a Morse function defined on an $m$-manifold $M$, the type of a non-degenerate critical point $v$ is completely determined by its local neighborhood lower than $f(v)$ (as the portion higher than $f(v)$ is its complement w.r.t. an $m$-ball). This is no longer the case in the PL setting, as we see in Figure 3.14. We also note that a PL-critical point can have multiple lower-link-indices and upper-link-indices. Nevertheless, as we will see later (e.g., Theorem 3.13), these PL-critical points are related to changes of the homology groups of the sublevel sets or superlevel sets, somewhat analogous to the smooth setting.
We note that other concepts of “critical values” exist in the literature. In particular, the concept of homological critical values is introduced in [102] for a function $f: \mathbb{M} \to \mathbb{R}$ defined

Figure 3.14: Four panels (a)-(d) around a point $p$. The point $p$ is a regular point in (a) and is PL-critical in (b), (c), and (d). Light-blue shaded triangles are in the lower-star, light-pink ones are in the upper-star, while light-yellow shaded ones are in neither. In (b), note that edge $e$ is not in $\mathrm{Llk}(p)$; here $p$ has lower-link-index 1 as $\tilde{\beta}_0(\mathrm{Llk}(p)) = 1$. In (c), the point $p$ has upper-link-index 2. In (d), the point $p$ has lower-link-index 1 and upper-link-index 2.

on a topological space $\mathbb{M}$. In particular, $\alpha \in \mathbb{R}$ is a homological critical value if there exists some $p \ge 0$ such that for all sufficiently small $\varepsilon > 0$, the homomorphism $\mathsf{H}_p(f^{-1}(-\infty, \alpha - \varepsilon]) \to \mathsf{H}_p(f^{-1}(-\infty, \alpha + \varepsilon])$ induced by inclusion is not an isomorphism. It can be shown that, for a PL-function, any PL-critical point with a non-zero lower-link-index is homological critical. In other words, we can think of our definition of PL-critical points as providing an explicit characterization of the local neighborhood of points giving rise to “critical values” in the PL-setting; this now allows us to identify critical points using only the star/link of a point.
The homological critical value of [102] is not symmetric w.r.t. the role of sub- versus super-
level sets for general spaces. Indeed, one could also define a symmetric version using superlevel
sets. The point in Figure 3.14 (c) does not give rise to a homological critical value w.r.t. the
sublevel sets, but does w.r.t. the superlevel sets. A more general (and symmetric) concept of
critical values is introduced in [63], which we formally define later (Definition 4.14) in Chapter 4
when we describe the more general zigzag persistence modules.

Two choices of “sublevel sets”. Consider a PL-function $f: |K| \to \mathbb{R}$. Its sublevel set at $a$ is given by
$$|K|_a := \{x \in |K| \mid f(x) \le a\},$$
which gives rise to a space filtration of $|K|$ as $a$ increases. Let us call it a space sublevel set.
On the other hand, given $a \in \mathbb{R}$, we can also consider the subcomplex $K_a$ spanned by all vertices of $K$ whose function value is at most $a$; that is,
$$K_a := \{\{u_0, \ldots, u_d\} \in K \mid f(u_i) \le a \text{ for all } i\}.$$

We refer to $K_a$ as the simplicial sublevel set w.r.t. $f: |K| \to \mathbb{R}$ (or w.r.t. the vertex function $f|_{V(K)}: V(K) \to \mathbb{R}$). Assume the vertices $v_1, \ldots, v_n \in V(K)$ are ordered so that $f(v_1) \le f(v_2) \le \cdots \le f(v_n)$. It is easy to see that $K_a = K_{f(v_i)}$ if $a \in [f(v_i), f(v_{i+1}))$. Note that this is also the sublevel set for the simplex-wise monotonic function $\bar{f}$ introduced in Fact 3.1. These two “types” of sublevel sets relate to each other via Theorem 3.12 below.
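As an aside, extracting the simplicial sublevel set $K_a$ in code is immediate; the following is a minimal sketch with hypothetical names.

```python
# K is a list of simplices, each a frozenset of vertices; f maps vertices
# to reals.  K_a keeps exactly the simplices all of whose vertices have
# f-value at most a.

def simplicial_sublevel_set(K, f, a):
    return [sigma for sigma in K if all(f[v] <= a for v in sigma)]

# A filled triangle on vertices u, v, w with f(u)=0, f(v)=1, f(w)=2:
K = [frozenset(s) for s in
     [{'u'}, {'v'}, {'w'}, {'u', 'v'}, {'v', 'w'}, {'u', 'w'}, {'u', 'v', 'w'}]]
f = {'u': 0, 'v': 1, 'w': 2}
print(simplicial_sublevel_set(K, f, 1.5))   # vertices u, v and the edge {u, v}
```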

Figure 3.15: Consider the simplex $\sigma = \{p_0, p_1, p_2, p_3\}$, where $\tau_I = \{p_0, p_1\}$ and $\tau_O = \{p_2, p_3\}$. The shaded region equals $|\sigma| \cap |K|_a$. This shaded region is the union of a set of segments $pz$ which are disjoint in their interiors. The map $\mu$ deformation retracts the segment $pz$ to the point $p \in |\tau_I| \subseteq |K_a|$.

Theorem 3.12. Given a PL-function $f: |K| \to \mathbb{R}$, for any $a \in \mathbb{R}$, the space and simplicial sublevel sets have isomorphic homology groups; that is, $\mathsf{H}_*(K_a) \cong \mathsf{H}_*(|K|_a)$.
Furthermore, the following diagram commutes, where the horizontal homomorphisms are induced by natural inclusions and the vertical maps are isomorphisms:
$$\begin{array}{ccc}
\mathsf{H}_*(K_a) & \longrightarrow & \mathsf{H}_*(K_b) \\
\big\downarrow \cong & & \big\downarrow \cong \\
\mathsf{H}_*(|K|_a) & \longrightarrow & \mathsf{H}_*(|K|_b)
\end{array}$$

Proof. If $a < f(v_1)$, then $K_a = \emptyset$ and $|K|_a = \emptyset$. If $a \ge f(v_n)$, then $K_a = K$ and $|K|_a = |K|$. Thus the theorem holds in both cases. Now assume $a \in [f(v_i), f(v_{i+1}))$ for some $i \in [1, n)$. In this case, $K_a = K_{f(v_i)} = \bigcup_{j \le i} \mathrm{Lst}(v_j)$. It follows that $|K_a| \subseteq |K|_a$. We now show that there is a continuous map $\mu: [0,1] \times |K|_a \to |K|_a$ that continuously deforms the identity map on $|K|_a$ to a retraction from $|K|_a$ to $|K_a|$. In other words, $\mu$ is a deformation retraction from $|K|_a$ onto $|K_a|$; thus $|K_a| \hookrightarrow |K|_a$ induces an isomorphism at the homology level. This will then establish the first part of the theorem.

For any point $x \in |K_a|$, we set $\mu(t, x) = x$ for any $t \in [0,1]$. Now the set of points in $A := |K|_a \setminus |K_a|$ forms a set of “partial simplices”: in particular, since $f$ is a PL-function, there is a set $C$ of simplices in $K$ such that $A = \bigcup_{\sigma \in C} \mathrm{interior}(\sigma) \cap |K|_a$, where $\mathrm{interior}(\sigma)$ denotes the set of points in $|\sigma|$ not contained in any proper face of $\sigma$. We construct the map $\mu$ on $A$ by constructing its restriction to each simplex $\sigma \in C$.

Specifically, consider $\sigma = \{p_0, \ldots, p_d\} \in C$. Let $\tau_I = \{p_0, \ldots, p_s\}$ be the maximal face of $\sigma$ contained in $|K|_a$, and let $\tau_O = \{p_{s+1}, \ldots, p_d\}$ be the face outside $|K|_a$ spanned by the vertices of $\sigma$ not in $|K|_a$. See Figure 3.15. We can write the underlying space $|\sigma|$ as $|\sigma| = \bigcup_{p \in |\tau_I|} \bigcup_{q \in |\tau_O|} pq$, where $pq$ denotes the convex combination of $p$ and $q$ (the line segment from $p$ to $q$). Furthermore, $pq \cap |K|_a = pz$ with $f(z) = a$ as $f$ is a PL-function. For any point $x \in pz$, we simply set $\mu(t, x) = (1-t)x + tp$. This map is well defined as all segments $pq$, $p \in |\tau_I|$ and $q \in |\tau_O|$, are disjoint in their interiors. Since $f$ is piecewise linear on $\sigma$, the map $\mu$ as constructed is continuous. Also, $\mu(0, \cdot)$ is the identity on $|K|_a$, and $\mu(1, \cdot): |K|_a \to |K_a|$ is a retraction map. Thus $\mu$ is a deformation retraction and, by Fact 1.1, $|K|_a$ and $|K_a|$ are homotopy equivalent, implying that $\mathsf{H}_*(|K|_a) \cong \mathsf{H}_*(|K_a|)$. The first part of the theorem then follows.

Furthermore, given that $\mu$ is a deformation retraction, the natural inclusion $|K_a| \subseteq |K|_a$ induces an isomorphism at the homology level. The second part of the theorem follows from this, combined with the naturality of the isomorphism $\mathsf{H}_*(K_a) \cong \mathsf{H}_*(|K_a|)$. $\square$

We note that we can also inspect the superlevel sets for the underlying space |K| and for the
simplicial setting in a symmetric manner. A result analogous to the above theorem also holds for
the superlevel sets.

Relation to PL-critical points. Similar to critical points of smooth functions, the homology groups of the sublevel sets can change only at the PL-critical points. For simplicity, in what follows we set $K_i := K_{f(v_i)}$ for any $i \in [1, n]$. Observe that for any $a \in \mathbb{R}$, if the complex $K_a$ is non-empty, then it equals $K_i$ for some $i$; in particular, $K_a = K_i$ where $a \in [f(v_i), f(v_{i+1}))$.

Theorem 3.13 (PL-critical points). Let $f: |K| \to \mathbb{R}$ be a PL-function defined on a simplicial complex $K$. For any index $r \in [2, n]$ and dimension $p \ge 0$, the inclusion $K_{r-1} \hookrightarrow K_r$ induces an isomorphism $\mathsf{H}_p(K_{r-1}) \cong \mathsf{H}_p(K_r)$ unless $v_r$ is a PL-critical point of lower-link-index $p$ or $p+1$.
A symmetric statement for the superlevel sets and PL-critical points of non-zero upper-link-index also holds.

Proof. Let $A = \overline{\mathrm{Lst}}(v_r)$ be the closed lower-star of $v_r$, and let $B = K_{r-1}$. Set $U = A \cup B$ and $V = A \cap B$; it is easy to see that $U = K_r$, while $V = \mathrm{Llk}(v_r)$. Furthermore, by the definition of lower-stars and lower-links over a simplicial complex, $A = \overline{\mathrm{Lst}}(v_r)$ equals the cone of $v_r$ over $\mathrm{Llk}(v_r)$. It follows that $A$ has trivial reduced homology in all dimensions. Now consider the following (Mayer-Vietoris) exact sequence:
$$\cdots \longrightarrow \tilde{\mathsf{H}}_p(V) \longrightarrow \tilde{\mathsf{H}}_p(A) \oplus \tilde{\mathsf{H}}_p(B) \stackrel{\phi}{\longrightarrow} \tilde{\mathsf{H}}_p(U) \longrightarrow \tilde{\mathsf{H}}_{p-1}(V) \longrightarrow \cdots \qquad (3.9)$$
Assume $v_r$ is neither a lower-link-index-$(p+1)$ PL-critical point nor a lower-link-index-$p$ one. Since $v_r$ is not lower-link-index-$(p+1)$ PL-critical, $\tilde{\mathsf{H}}_p(V)$ is trivial. Thus, by exactness of the sequence, the homomorphism $\phi$ must be injective. Similarly, as $v_r$ is not lower-link-index-$p$ PL-critical, $\tilde{\mathsf{H}}_{p-1}(V)$ is trivial; thus $\phi$ must be surjective. Hence $\phi$ is an isomorphism. Furthermore, note that $\tilde{\mathsf{H}}_p(A) \oplus \tilde{\mathsf{H}}_p(B) = 0 \oplus \tilde{\mathsf{H}}_p(B)$ as $A$ has trivial reduced homology. It then follows that the isomorphism $\mathsf{H}_p(K_{r-1}) \cong \mathsf{H}_p(K_r)$ is induced by the inclusion map $K_{r-1} \hookrightarrow K_r$. The claim then follows. $\square$

Corollary 3.14. Given a PL-function $f: |K| \to \mathbb{R}$ defined on a finite simplicial complex $K$, let $[a, b] \subset \mathbb{R}$ be an interval that does not contain any PL-critical value of $f$.
(1) The inclusion map $K_a \hookrightarrow K_b$ induces an isomorphism between the simplicial homology groups, that is, $\mathsf{H}_p(K_a) \cong \mathsf{H}_p(K_b)$ for any dimension $p \ge 0$.
(2) This also implies that $|K|_a \hookrightarrow |K|_b$ induces an isomorphism between the singular homology groups, that is, $\mathsf{H}_p(|K|_a) \cong \mathsf{H}_p(|K|_b)$ for any dimension $p \ge 0$.

Again, a version of the above Corollary also holds for superlevel sets.

3.5.2 Lower star filtration and its persistent homology


Let $f: |K| \to \mathbb{R}$ be a PL-function. Recall that $K_i := K_{f(v_i)}$ where $v_1, \ldots, v_n$ are ordered by non-decreasing values of $f$. Setting $a_i = f(v_i)$, we write $|K|_{a_i} = |K|_{f(v_i)}$. The two different types of sublevel sets give rise to two sequences of growing spaces:
$$\text{Lower star simplicial filtration } F_f:\ \emptyset \hookrightarrow K_1 \hookrightarrow K_2 \hookrightarrow \cdots \hookrightarrow K_{n-1} \hookrightarrow K_n = K; \qquad (3.10)$$
$$\text{Sublevel set space filtration } \widehat{F}_f:\ \emptyset \hookrightarrow |K|_{a_1} \hookrightarrow |K|_{a_2} \hookrightarrow \cdots \hookrightarrow |K|_{a_{n-1}} \hookrightarrow |K|_{a_n} = |K|. \qquad (3.11)$$
As $K_i := \bigcup_{j \le i} \mathrm{Lst}(v_j)$ is the union of the lower-stars of $v_1, \ldots, v_i$, we call the filtration in Eqn. (3.10) the lower star filtration for $f$; see also Section 3.1.2 and Figure 3.6. The two homology modules $\mathsf{H}_pF_f$ and $\mathsf{H}_p\widehat{F}_f$ can be shown to be isomorphic due to Theorem 3.12, and thus they produce identical persistence diagrams (Fact 3.10).

Corollary 3.15. The homology module $\mathsf{H}_pF_f$ is isomorphic to the homology module $\mathsf{H}_p\widehat{F}_f$ for every $p \ge 0$. This implies that these two persistence modules have the same persistence diagrams.

Intuitively, the lower-star filtration of the simplicial complex $K$ can be thought of as the discrete version of the sublevel set filtration of the space $|K|$ w.r.t. the PL-function $f$. By Corollary 3.15, the lower star simplicial filtration $F_f$ and the sublevel set space filtration $\widehat{F}_f$ have identical persistence diagrams. We refer to this common persistence diagram as the persistence diagram of the PL-function $f$, denoted by $\mathrm{Dgm}\,f$.
For a space filtration induced by a Morse function defined on a Riemannian manifold, the birth- and death-coordinates of the points in the persistence diagram correspond to critical values of this Morse function. A similar result holds in the PL-case. In particular, one can prove, using Corollary 3.14, that for a PL-function $f$, the persistence pairings for $F_f$ occur only between PL-critical points. That is:
Fact 3.11. Given a PL-function $f: |K| \to \mathbb{R}$ and its associated filtration $F_f$, let $\mu_{f,p}^{i,j}$ denote the corresponding $p$-th persistence pairing function w.r.t. $F_f$. If $\mu_{f,p}^{i,j} \neq 0$, then the vertices $v_i$ and $v_j$ must be PL-critical.

However, not all PL-critical points necessarily appear in persistence pairings w.r.t. the lower star filtration $F_f$.

Computing the persistence diagram induced by $F_f$ and $\widehat{F}_f$. By Corollary 3.15, we only need to describe how to compute the persistence diagram for the lower star filtration $F_f$. We will do so via Algorithm 3:MatPersistence from Section 3.3. However, recall that algorithm MatPersistence works on simplex-wise filtrations (Definition 3.1). The algorithm either pairs each simplex with another simplex producing a persistence pairing, or leaves it unpaired producing an essential persistent point $(b, \infty)$ in the persistence diagram. To compute $\mathrm{Dgm}\,f$, we first expand $F_f$ into a simplex-wise filtration $F_s$ induced by a total ordering of all $m$ simplices in $K$:
$$\sigma_1, \ldots, \sigma_{I_1}, \sigma_{I_1+1}, \ldots, \sigma_{I_2}, \sigma_{I_2+1}, \ldots, \sigma_{I_j}, \sigma_{I_j+1}, \ldots, \sigma_{I_{j+1}}, \ldots, \sigma_{I_{n-1}}, \sigma_{I_{n-1}+1}, \ldots, \sigma_{I_n = m} \qquad (3.12)$$
so that the following two conditions hold:



• $\mathrm{Lst}(v_j) = \{\sigma_{I_{j-1}+1}, \ldots, \sigma_{I_j}\}$ for any $j \in [1, n]$ (here $I_0 = 0$);

• for any simplex $\sigma$, its faces appear before it in the total ordering of simplices.

With this total ordering of simplices, the induced simplex-wise filtration becomes:
$$F_s: L_1 \hookrightarrow L_2 \hookrightarrow \cdots \hookrightarrow L_m, \quad \text{where } L_i := \{\sigma_j \mid j \le i\} \text{ and thus } \sigma_i = L_i \setminus L_{i-1}. \qquad (3.13)$$
Note that $K_i = L_{I_i}$; thus $F_f$ is a subsequence of the simplex-wise filtration $F_s$. The construction of $F_s$ from $F_f$ is not necessarily unique. We can simply choose $\sigma_{I_{j-1}+1}, \ldots, \sigma_{I_j}$ to be the set of simplices in $\mathrm{Lst}(v_j)$ sorted by their dimension. We now construct the map $\pi: [0, m] \to [0, n]$ as $\pi(j) = k$ if $j \in [I_{k-1}+1, I_k]$; that is, $\pi(j) = k$ means that simplex $\sigma_j$ is in the lower-star of vertex $v_k$.
We run the persistence algorithm MatPersistence (Algorithm 3) on the simplex-wise filtration $F_s$. Let $\mu_{s,p}^{i,j}$ denote the persistence pairing function w.r.t. $F_s$. Many of the pairings are between two simplices within the same lower-star of a vertex and are not interesting. Instead, we aim to compute the persistence diagram $\mathrm{Dgm}\,f$ for the filtration $F_f$, which captures only the non-local pairings where the birth and death are from different $K_i$'s. The following theorem specifies how we can compute the persistence diagram $\mathrm{Dgm}\,f$ for the filtration $F_f$ from the output of the persistence algorithm with the simplex-wise filtration $F_s$ as input.

Theorem 3.16 (Computation of $\mathrm{Dgm}\,f$ in the PL-case). Given a PL-function $f: |K| \to \mathbb{R}$, let $\mu_{s,p}^{i,j}$ denote the $p$-dimensional persistence pairing function w.r.t. the simplex-wise filtration $F_s$ as described above. We can compute the persistence pairing function $\mu_{f,p}^{i,j}$ w.r.t. $F_f$ as follows:
$$\mu_{f,p}^{i,j} := \sum_{b \in (I_{i-1}, I_i],\ d \in (I_{j-1}, I_j]} \mu_{s,p}^{b,d} \quad \text{for any } i < j \le n; \qquad \mu_{f,p}^{i,\infty} := \sum_{b \in (I_{i-1}, I_i]} \mu_{s,p}^{b,\infty} \quad \text{for any } i \le n.$$
If $\mu_{f,p}^{i,j} \neq 0$, we refer to $(v_i, v_j)$ as a persistence pair w.r.t. $f$ and we add the corresponding persistent point $(f(v_i), f(v_j))$, with multiplicity $\mu_{f,p}^{i,j} \neq 0$, to the persistence diagram $\mathrm{Dgm}\,f$. The persistence of this pair $(v_i, v_j)$ is $|f(v_i) - f(v_j)|$.

Figure 3.16: A reduced boundary matrix for $F_s$ (only columns of $p$-simplices shown) in which two lowest '1' entries pair rows from $\mathrm{Lst}(v_i)$ with columns from $\mathrm{Lst}(v_j)$, so that $\mu_{f,p}^{i,j} = 2$.

Remark 3.5. As an example, see Figure 3.16, which shows the reduced matrix after running Algorithm 3:MatPersistence on the filtered boundary matrix $D$ for $F_s$, where '1' indicates the lowest '1' in the shaded columns. Only columns corresponding to $p$-simplices are shown. We have $\mu_{f,p}^{i,j} = 2$. One can take an alternate view of the persistence pairs given by $\mu_{f,p}^{i,j}$ as follows: for each persistence index pair $(i, j) \in \mathrm{Dgm}(F_s)$ (i.e., $\mu_{s,p}^{i,j} > 0$ w.r.t. $F_s$), one has a persistence pair $(v_{\pi(i)}, v_{\pi(j)})$ for $F_f$ if and only if $\pi(i) \neq \pi(j)$. In other words, all local pairs $(i, j) \in \mathrm{Dgm}(F_s)$ with $\pi(i) = \pi(j)$, signifying that $\sigma_i$ and $\sigma_j$ are from the lower-star of the same vertex, are ignored for the persistence diagram $\mathrm{Dgm}(F_f)$.
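The alternate view in Remark 3.5 translates directly into a post-processing step: run MatPersistence on $F_s$, then keep only the non-local pairs. A sketch under assumed input conventions (the names are our own) follows.

```python
# pairs: list of simplex-index pairs (i, j) output for F_s, with j = None
# for an unpaired (essential) simplex; pi[i] = k means sigma_i lies in the
# lower star of vertex v_k; fvals[k] = f(v_k).

def lower_star_diagram(pairs, pi, fvals):
    dgm = []
    for i, j in pairs:
        if j is None:                 # essential class: point (f(v_k), infinity)
            dgm.append((fvals[pi[i]], float('inf')))
        elif pi[i] != pi[j]:          # discard local pairs with pi(i) == pi(j)
            dgm.append((fvals[pi[i]], fvals[pi[j]]))
    return dgm
```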

Proof. [Proof of Theorem 3.16.] Recall that $\mu_{f,p}^{i,j}$ and $\mu_{s,p}^{i,j}$ are the persistence pairing functions induced by the filtrations $F_f$ and $F_s$, respectively. Similarly, we use $\beta_{f,p}^{i,j}$ and $\beta_{s,p}^{i,j}$ to denote the persistent Betti numbers induced by the filtrations $F_f$ and $F_s$, respectively. In what follows, we prove that, for any dimension $p \ge 0$ and $i, j \in [1, n]$, $\mu_{f,p}^{i,j}$ can be computed as stated in the theorem. The case $j = \infty$ can be handled in a similar manner and is left as an exercise.

For any $i \in [1, n]$, let $I_i$ be as defined in Eqn. (3.12). Given the relation of $F_f$ and $F_s$, it follows that for any $i', j' \in [1, n]$, we have $K_{i'} = L_{I_{i'}}$, $K_{j'} = L_{I_{j'}}$, and thus $\beta_{f,p}^{i',j'} = \beta_{s,p}^{I_{i'},I_{j'}}$ as $F_f$ is a subsequence of $F_s$.

Now fix the dimension $p \ge 0$ and, for simplicity, omit $p$ from all subscripts. Given any $i, j \in [1, n]$, we have:
$$\mu_f^{i,j} = (\beta_f^{i,j-1} - \beta_f^{i,j}) - (\beta_f^{i-1,j-1} - \beta_f^{i-1,j}) = (\beta_s^{I_i,I_{j-1}} - \beta_s^{I_i,I_j}) - (\beta_s^{I_{i-1},I_{j-1}} - \beta_s^{I_{i-1},I_j}) =: \mu_s^{I_i,I_j}.$$
Hence we aim to show that $\mu_s^{I_i,I_j} = \sum_{b \in (I_{i-1},I_i],\, d \in (I_{j-1},I_j]} \mu_s^{b,d}$, which then proves the theorem. To this end, note that by Theorem 3.1, we have the following:
$$\beta_s^{I_i,I_{j-1}} - \beta_s^{I_i,I_j} = \sum_{b \le I_i,\, d > I_{j-1}} \mu_s^{b,d} - \sum_{b \le I_i,\, d > I_j} \mu_s^{b,d} = \sum_{b \le I_i,\, d \in (I_{j-1},I_j]} \mu_s^{b,d};$$
$$\beta_s^{I_{i-1},I_{j-1}} - \beta_s^{I_{i-1},I_j} = \sum_{b \le I_{i-1},\, d > I_{j-1}} \mu_s^{b,d} - \sum_{b \le I_{i-1},\, d > I_j} \mu_s^{b,d} = \sum_{b \le I_{i-1},\, d \in (I_{j-1},I_j]} \mu_s^{b,d};$$
$$\Rightarrow\ \mu_s^{I_i,I_j} = \sum_{b \le I_i,\, d \in (I_{j-1},I_j]} \mu_s^{b,d} - \sum_{b \le I_{i-1},\, d \in (I_{j-1},I_j]} \mu_s^{b,d} = \sum_{b \in (I_{i-1},I_i],\, d \in (I_{j-1},I_j]} \mu_s^{b,d}.$$
The theorem then follows. $\square$

An implication of the above result is that any simplex-wise filtration $F_s$ obtained from the lower star filtration $F_f$ produces the same pairing between critical points and the same persistence diagram.

3.5.3 Persistence algorithm for 0-th persistent homology


The best known running time of the general persistence algorithm is $O(n^\omega)$, where $n$ is the total number of simplices in the filtration and $\omega$ is the matrix-multiplication exponent. However, the 0-th persistent homology (the 0-th persistence diagram $\mathrm{Dgm}_0 f$) for a PL-function $f: |K| \to \mathbb{R}$ can be computed efficiently in $O(n \log n + m\alpha(n))$ time, where $n$ and $m$ are the number of vertices and edges in $K$, respectively.
Indeed, first observe that we only need the 1-skeleton of $K$ to compute $\mathrm{Dgm}_0 f$. So, in what follows, assume that $K$ contains only vertices $V$ and edges $E$. Assume that all vertices in $V$ are sorted in non-decreasing order of their $f$-values. As before, let $K_i$ be the union of the lower-stars of all vertices $v_j$ where $j \le i$. Since we are only interested in the 0-th homology, we only need to track the 0-th homology group of $K_i$, which essentially embodies the information about connected components.
Assume we are at vertex $v_j$. Consider $\mathrm{Lst}(v_j)$. There are three cases.

Case-1: $\mathrm{Lst}(v_j) = \{v_j\}$. Then $v_j$ starts a new connected component in $K_j$. Hence $v_j$ is a creator.

Case-2: All edges in $\mathrm{Lst}(v_j)$ connect to vertices from the same connected component $C$ in $K_{j-1}$. In this case, the component $C$ grows in the sense that it now also includes the vertex $v_j$ and its incident edges in the lower-star. However, $\mathsf{H}_0(K_{j-1})$ and $\mathsf{H}_0(K_j)$ are isomorphic, where $K_{j-1} \subseteq K_j$ induces the isomorphism.

Case-3: Edges in $\mathrm{Lst}(v_j)$ link to two or more components, say $C_1, \ldots, C_r$, in $K_{j-1}$. In this case, after the addition of $\mathrm{Lst}(v_j)$, all of $C_1, \ldots, C_r$ are merged into a single component
$$C' = C_1 \cup C_2 \cup \cdots \cup C_r \cup \mathrm{Lst}(v_j).$$
Hence the inclusion $K_{j-1} \hookrightarrow K_j$ induces a surjective homomorphism $\xi: \mathsf{H}_0(K_{j-1}) \to \mathsf{H}_0(K_j)$ and $\beta_0(K_j) = \beta_0(K_{j-1}) - (r-1)$. That is, we can consider that $r-1$ components are destroyed; only one survives as $C'$.

Proposition 3.17. Suppose Case-3 happens where edges in $\mathrm{Lst}(v_j)$ merge components $C_1, \ldots, C_r$ in $K_{j-1}$. Let $v_{k_i}$ be the global minimum of component $C_i$ for $i \in [1, r]$. Assume w.l.o.g. that $f(v_{k_1}) \le f(v_{k_2}) \le \cdots \le f(v_{k_r})$. Then the vertex $v_j$ participates in exactly $r-1$ persistence pairings $(v_{k_2}, v_j), \ldots, (v_{k_r}, v_j)$ for the 0-dimensional persistence diagram $\mathrm{Dgm}_0 f$, corresponding to the points $(f(v_{k_2}), f(v_j)), \ldots, (f(v_{k_r}), f(v_j))$ in $\mathrm{Dgm}_0 f$.

Intuitively, when Case-3 happens, consider the set of 0-cycles $c_2 = v_{k_2} + v_{k_1}$, $c_3 = v_{k_3} + v_{k_1}, \ldots, c_r = v_{k_r} + v_{k_1}$. On one hand, it is easy to see that their corresponding homology classes $[c_i]$ are independent within $\mathsf{H}_0(K_{j-1})$. Furthermore, each $c_i$ is created upon entering $K_{k_i}$, for $i \in [2, r]$. On the other hand, the homology classes $[c_2], \ldots, [c_r]$ become trivial in $\mathsf{H}_0(K_j)$ (thus they are destroyed upon entering $K_j$). Hence $\mu_0^{k_i,j} > 0$ for $i \in [2, r]$, corresponding to the persistence pairings $(v_{k_2}, v_j), \ldots, (v_{k_r}, v_j)$. Furthermore, consider any 0-cycle $c_1 = v_{k_1} + c$ where $c$ is a 0-chain from $K_{k_1 - 1}$. The class $[c_1]$ is created at $K_{k_1}$ yet remains non-trivial at $K_j$. Hence there is no persistence pairing $(v_{k_1}, v_j)$.
Based on Proposition 3.17, we can compute the persistence pairings for the 0-dimensional persistent homology without the matrix reduction algorithm. We only need to maintain connected component information for each $K_i$, and potentially merge multiple components. We also need to be able to query the membership of a given vertex $u$ in the components of the current sublevel set. Such operations can be implemented by a standard union-find data structure.
Specifically, a union-find data structure is a standard data structure that maintains dynamic disjoint sets [109]. Given a set of elements $U$ called the universe, this data structure typically supports the following three operations to maintain a set $\mathcal{S}$ of disjoint subsets of $U$, where each subset also maintains a representative element: (1) MakeSet($x$), which creates a new set $\{x\}$ and adds it to $\mathcal{S}$; (2) FindSet($x$), which returns the representative of the set from $\mathcal{S}$ containing $x$; and (3) Union($x$, $y$), which merges the sets from $\mathcal{S}$ containing $x$ and $y$ respectively into a single one if they are different.
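A compact sketch of such a union-find structure follows (our own illustration; we let the root double as the representative by always keeping the vertex of smallest $f$-value at the root, so RepSet and FindSet coincide).

```python
# Union-find over the vertex set, with path compression in find() and the
# minimum-f vertex of each set kept as its root/representative.

class UnionFind:
    def __init__(self, f):
        self.parent = {v: v for v in f}   # MakeSet for every vertex
        self.f = f

    def find(self, x):                    # FindSet (= RepSet here)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path compression
            x = self.parent[x]
        return x

    def union(self, x, y):                # Union, keeping the minimum as root
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.f[ry] < self.f[rx]:
            rx, ry = ry, rx
        self.parent[ry] = rx
```

For the stated $\alpha(n)$ bound one would additionally use union by rank and store the set minimum separately; the sketch above trades that for brevity.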
We now present Algorithm 5:ZeroPerDg. Here the universe $U$ is the set of all vertices $V$ of $K$. Note that each vertex $v$ is also associated with its function value $f(v)$. In this algorithm, we assume that the representative of a set $C$ is its minimum, i.e., the vertex with the smallest $f$-value, and the query RepSet($v$) returns the representative of the set containing vertex $v$. We assume that this query takes the same time as FindSet($v$). Given a disjoint set $C$, we also use RepSet($C$) to denote the representative (minimum) of this set. One can view a disjoint set $C$ in the collection $\mathcal{S}$ as the maximal set of elements sharing the same representative.

Algorithm 5 ZeroPerDg(K = (V, E), f)

Input:
  K: a 1-complex with a vertex function f on it
Output:
  Vertex pairs generating Dgm_0(f) for the PL-function given by f

 1: Sort vertices in V so that f(v_1) ≤ f(v_2) ≤ ... ≤ f(v_n)
 2: for j = 1 → n do
 3:   MakeSet(v_j)
 4:   flag := 0
 5:   for each edge (v_k, v_j) ∈ Lst(v_j) do
 6:     if (flag == 0) then
 7:       Union(v_k, v_j)
 8:       flag := 1
 9:     else
10:       if FindSet(v_k) ≠ FindSet(v_j) then
11:         Set ℓ_1 = RepSet(v_k) and ℓ_2 = RepSet(v_j)
12:         Union(v_k, v_j)
13:         Output pairing (argmax{f(ℓ_1), f(ℓ_2)}, v_j)
14:       end if
15:     end if
16:   end for
17: end for
18: for each disjoint set C do
19:   Output pairing (RepSet(C), ∞)
20: end for

Let $n$ and $m$ denote the number of vertices and edges in $K$, respectively. Sorting all vertices in $V$ takes $O(n \log n)$ time. There are $O(n + m)$ MakeSet, FindSet, Union, and RepSet operations. Using the standard union-find data structure, the total time for all these operations is $O((n + m)\alpha(n))$, where $\alpha(n)$ is the inverse Ackermann function that grows extremely slowly with $n$ [109]. Hence the total time complexity of Algorithm ZeroPerDg is $O(n \log n + m\alpha(n))$.

Note that lines 18-20 of algorithm ZeroPerDg inspect all disjoint sets after processing all vertices and their lower-stars; each such disjoint set corresponds to a connected component in $K$. Hence each of them generates an essential pair in the 0-th persistence diagram.

Theorem 3.18. Given a PL-function $f: |K| \to \mathbb{R}$, the 0-dimensional persistence diagram $\mathrm{Dgm}_0 f$ for the lower-star filtration of $f$ can be computed by the algorithm ZeroPerDg in $O(n \log n + m\alpha(n))$ time, where $n$ and $m$ are the number of vertices and edges in $K$, respectively.

Connection to minimum spanning tree. If we view the 1-skeleton of $K$ as a graph $G = (V, E)$, then ZeroPerDg($K$, $f$) essentially computes the minimum spanning forest of $G$ with the following edge weights: for every edge $e = (u, v)$, we set its weight $w(e) = \max\{f(u), f(v)\}$. Then, we can obtain the persistence pairs output by ZeroPerDg by running the well-known Kruskal's algorithm on the weighted graph $G$. When we come across an edge $e = (u, v)$ that joins two disjoint components in this algorithm, we determine the two minimum vertices $\ell_1, \ell_2$ in these two components and pair $e$ with the one among $\ell_1, \ell_2$ that has the larger $f$-value. After generating all such vertex-edge pairs $(u, e)$, we convert them to vertex-vertex pairs $(u, v)$ where $e \in \mathrm{Lst}(v)$. We throw away any pair of the form $(u, u)$ because such pairs are local.
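A sketch of this Kruskal-based view (our own illustration, reusing the UnionFind class sketched earlier; for clarity it records the persistence points directly and drops the zero-persistence local pairs):

```python
# 0-dimensional persistence points of the lower-star filtration via
# Kruskal's algorithm on G = (V, E) with w(u, v) = max(f(u), f(v)).

def zero_dim_pairs_kruskal(V, E, f):
    uf = UnionFind({v: f[v] for v in V})
    points = []
    for u, v in sorted(E, key=lambda e: max(f[e[0]], f[e[1]])):
        ru, rv = uf.find(u), uf.find(v)
        if ru == rv:
            continue                      # edge closes a loop: no 0-dim pairing
        death = max(f[u], f[v])           # the edge weight
        birth = max(f[ru], f[rv])         # the younger of the two minima dies
        uf.union(u, v)
        if birth < death:                 # discard diagonal (local) pairs
            points.append((birth, death))
    # every surviving component contributes an essential point (f(min), inf)
    points += [(f[r], float('inf')) for r in {uf.find(v) for v in V}]
    return points
```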

Graph filtration. The algorithm ZeroPerDg can be easily adapted to compute persistence for a given filtration of a graph. In this case, we process the vertices and edges in their order in the filtration and maintain connected components using the union-find data structure as in ZeroPerDg. For each edge $e = (u, v)$, we check if it connects two disconnected components represented by vertices $\ell_1$ and $\ell_2$ (line 11) and, if so, $e$ is paired with the younger vertex between $\ell_1$ and $\ell_2$ (line 13). We output all vertex-edge pairs thus computed. The vertices and edges that remain unpaired provide the infinite bars in the 0-th and 1-st persistence diagrams. The algorithm runs in $O(n\alpha(n))$ time if the graph has $n$ vertices and edges in total. The $O(n \log n)$ sorting term is eliminated because the ordering of the vertices is implicitly given by the input filtration.

3.6 Notes and Exercises


The concept of topological persistence came to the fore in the early 2000s with the paper by Edelsbrunner, Letscher, and Zomorodian [152], though the concept was proposed in a rudimentary form (e.g., for 0-dimensional homology) in earlier papers by Frosini [162] and Robins [266]. The persistence algorithm as described in this chapter was presented in [152], which has become the cornerstone of topological data analysis. The original algorithm was described without any matrix reduction, which first appeared in [106]. Since then, various versions of the algorithm have been presented. We already saw that persistence for filtrations of simplicial 1-complexes (graphs) with $n$ simplices can be computed in $O(n\alpha(n))$ time. Persistence for filtrations of simplicial 2-manifolds can also be computed in $O(n\alpha(n))$ time by essentially reducing the problem to computing persistence on a dual graph. In general, for any constant $d \ge 1$, the persistence pairs between $d$- and $(d-1)$-simplices of a simplicial $d$-manifold can be computed in $O(n\alpha(n))$ time by considering the dual graph. If the manifold has boundary, then one has to consider a 'dummy' vertex that connects to every dual vertex of a $d$-simplex adjoining a boundary $(d-1)$-simplex.

For efficient implementation, clearing and compression strategies as described in Section 3.3.2 were presented by Chen and Kerber [95]. We have given a proof based on matrix reduction that the same persistence pairs can be computed by considering the anti-transpose of the boundary matrix. This is termed the cohomology algorithm, first introduced in [115]. The name is justified by the fact that, considering cohomology groups and the resulting persistence module that reverses the arrows (Fact 2.14), we obtain the same barcode. The anti-transpose of the boundary matrix indeed represents the coboundary matrix filtered in reverse. These tricks are further used by Bauer for processing Rips filtrations efficiently in the Ripser software [19]; see also [304]. Boissonnat et al. [41, 42] have suggested a technique to reduce the size of a given filtration using the strong collapse of Barmak and Minian [17]. The collapse on the complex can be achieved efficiently through simple manipulations of the boundary matrix alone.
The concept of bottleneck distance for persistence diagrams was first proposed by Cohen-Steiner et al. [102], who also showed the stability of such diagrams in terms of bottleneck distances with respect to the infinity norm of the difference between the functions generating them. This result was extended to the Wasserstein distance, though in a weaker form, in [104], and was improved recently in [278]. The more general concept of interleaving distance between persistence modules and the stability of persistence diagrams with respect to it was presented by Chazal et al. [77]. The fact that the bottleneck distance between persistence diagrams is not only bounded from above by the interleaving distance but is indeed equal to it was shown by Lesnick [220] and was further studied by Bauer and Lesnick [23]. Also, see [54] for more generalizations at the algebraic level.
The use of the reduced Betti numbers of the lower-link of a vertex to quantify its criticality was originally introduced in [150] for a PL-function defined on a triangulation of a $d$-manifold. Our PL-criticality considers both the lower-link and the upper-link for more general simplicial complexes. As far as we know, the relations between such PL-critical points and the homology groups of sublevel sets in the PL-setting have not been stated explicitly elsewhere in the literature. The concept of homological critical values was first introduced in [102], and the more general concept of “levelset critical values” (and levelset tame functions) was originally introduced in [63].
The idea of using a union-find data structure to compute the 0-th persistent homology group was already introduced in the original persistence algorithm paper [152]. In this chapter, we present a modification for the PL-function setting.

Exercises
1. Let $K$ be a $p$-complex with every $(p-1)$-simplex incident to exactly two $p$-simplices. Let $M$ be a boundary matrix of the boundary operator $\partial_p$ for $K$. We run a different version of the persistence algorithm on $M$. We scan its columns from left to right as before, but we add the current column to columns on its right to resolve conflicts; i.e., for each $i = 1, \ldots, n$ in this order, if there exists $j > i$ so that $\mathrm{low}_M[i] = \mathrm{low}_M[j]$, then add $\mathrm{col}_M[i]$ to $\mathrm{col}_M[j]$. Show that:

(a) there can be at most one such $j$;

(b) at termination, every column of $M$ is either empty or has a unique low entry;

(c) the algorithm outputs in $O(n^2)$ time the same $\mathrm{low}_M[i]$ as the original persistence algorithm returns on $M$.

2. For a given matrix with binary entries, a valid column operation is one that adds a column to a column on its right ($\mathbb{Z}_2$-addition). Similarly, a valid row operation is one that adds a row to another row above it. Show that there exists a set of valid column and row operations that leaves every row and column either empty or with a single non-zero entry.

3. Let $D$ be a boundary matrix of a simplicial complex $K$. Modify the algorithm MatPersistence to compute a set of $p$-cycles, $p \ge 0$, whose classes form a basis of $\mathsf{H}_p(K)$. (Hint: consider interpreting the role of the matrix $V$ in the decomposition $R = DV$ of the reduced matrix $R$.)

4. Prove Theorem 3.1.

5. Give a polynomial-time algorithm for computing $d_{W,q}$.

6. Prove Proposition 3.7.

7. Let $m_q$ and $\beta_q$ be the number of $q$-simplices and the $q$-th Betti number, respectively, of a simplicial complex of dimension $p$. Using the pairing in persistence, show that
$$m_p - m_{p-1} + \cdots \pm m_0 = \beta_p - \beta_{p-1} + \cdots \pm \beta_0.$$

8. Let $F$ be a filtration where every $p$-simplex appears only after all $(p-1)$-simplices, as in Eqn. (3.8). Let $F'$ be a modified filtration of $F$ defined as follows: for every $p \ge 0$, the $p$-simplices of $F$ are ordered in $F'$ in non-decreasing order of their persistence values in $F$, where unpaired $p$-simplices are assumed to have persistence value $\infty$. Show that the persistence pairing remains the same for $F$ and $F'$.

9. Let $F$ be a simplex-wise filtration of a complex $K$ induced by the sequence of simplices $\sigma_1, \ldots, \sigma_N$. Let $F'$ be a modification of $F$ where only two consecutive simplices $\sigma_k$ and $\sigma_{k+1}$ swap their order; that is, $F'$ is induced by the sequence
$$\sigma_1, \ldots, \sigma_{k-1}, \sigma_{k+1}, \sigma_k, \sigma_{k+2}, \ldots, \sigma_N.$$
Describe the relation between the corresponding persistence diagrams $\mathrm{Dgm}(F)$ and $\mathrm{Dgm}(F')$.

10. Give an example of a piecewise-linear function $f: |K| \to \mathbb{R}$ where a vertex $v_i$ is a PL-critical point, but the inclusion $K_{i-1} \hookrightarrow K_i$ induces an isomorphism $\mathsf{H}_*(K_{i-1}) \cong \mathsf{H}_*(K_i)$.

11. Let $f: V(K) \to \mathbb{R}$ be a vertex function defined on the vertex set $V(K)$ of a complex $K$. Consider $g = h \circ f + a$, where $h: \mathbb{R} \to \mathbb{R}$ is a monotone function and $a \in \mathbb{R}$ is a real value. Consider the lower-star filtrations $F_f$ and $F_g$ induced by the PL-functions $f, g: |K| \to \mathbb{R}$ as in Eqn. (3.10). Describe the relation between the corresponding persistence diagrams $\mathrm{Dgm}(F_f)$ and $\mathrm{Dgm}(F_g)$.

12. Consider two PL-functions $f, g: |K| \to \mathbb{R}$ on $K$ induced by vertex functions $f, g: V(K) \to \mathbb{R}$, respectively. Suppose $\|f - g\|_\infty = \delta$, where $\|f - g\|_\infty = \max_{v \in V} |f(v) - g(v)|$. Consider the persistence modules $\mathsf{P}_f$ and $\mathsf{P}_g$ induced by the lower-star filtrations for $f$ and $g$ respectively.

• Show that $d_I(\mathsf{P}_f, \mathsf{P}_g) \le \delta$.

• Give an example of $K$, $f$, and $g$ so that $d_I(\mathsf{P}_f, \mathsf{P}_g) < \delta$.

13. For a PL-function $f: |K| \to \mathbb{R}$, we know how to produce a simplex-wise filtration $F$ so that the barcode for $f$ can be read from the barcode of $F$. Design an algorithm to do the reverse; that is, given a filtration $F$ on a complex $K$, produce a filtration $G$ of a simplicial complex $K'$ so that $G$ is indeed a simplex-wise filtration of a PL-function $g: |K'| \to \mathbb{R}$ where the bars for $F$ can be obtained from those for $G$. (Hint: use the barycentric subdivision of $K$.)

14. Prove Proposition 3.9.

15. Consider two persistence modules $\mathsf{U}$ and $\mathsf{V}$ as shown below and a sequence of linear maps $f_i: U_i \to V_i$ so that all squares commute:
$$\begin{array}{cccccccc}
\mathsf{U}: & U_1 & \longrightarrow & U_2 & \longrightarrow & U_3 & \longrightarrow \cdots \longrightarrow & U_m \\
 & \big\downarrow f_1 & & \big\downarrow f_2 & & \big\downarrow f_3 & & \big\downarrow f_m \\
\mathsf{V}: & V_1 & \longrightarrow & V_2 & \longrightarrow & V_3 & \longrightarrow \cdots \longrightarrow & V_m
\end{array}$$
Consider the sequence
$$\ker F: \{\ker f_i \subseteq U_i \to \ker f_{i+1} \subseteq U_{i+1}\},$$
where the maps are induced from the module $\mathsf{U}$. Prove that $\ker F$ is a persistence module. Show the same for the sequences
$$\mathrm{im}\,F: \{\mathrm{im}\, f_i \subseteq V_i \to \mathrm{im}\, f_{i+1} \subseteq V_{i+1}\} \quad \text{and} \quad \mathrm{coker}\,F: \{\mathrm{coker}\, f_i = V_i/\mathrm{im}\, f_i \to \mathrm{coker}\, f_{i+1} = V_{i+1}/\mathrm{im}\, f_{i+1}\}.$$


Chapter 4

General Persistence

We have considered filtrations so far for defining persistence and its stability. In a filtration, the connecting maps between consecutive spaces or complexes are inclusions. Assuming a discrete subset of reals, $A: a_0 \le a_1 \le \cdots \le a_n$, as an index set, we write a filtration as:
$$F: X_{a_0} \hookrightarrow X_{a_1} \hookrightarrow \cdots \hookrightarrow X_{a_n}.$$
A more generalized scenario occurs when the inclusions are replaced with continuous maps for space filtrations and simplicial maps for simplicial filtrations: $x_{ij}: X_{a_i} \to X_{a_j}$. In that case, we call the sequence a space tower and a simplicial tower respectively:
$$X: X_{a_0} \xrightarrow{x_{01}} X_{a_1} \xrightarrow{x_{12}} \cdots \xrightarrow{x_{(n-1)n}} X_{a_n}. \qquad (4.1)$$
Considering the homology group of each space (complex resp.) in the sequence, we obtain a sequence of vector spaces connected by linear maps, which we have seen before. Specifically, we obtain the following tower of vector spaces:
$$\mathsf{H}_pX: \mathsf{H}_p(X_{a_0}) \xrightarrow{x_{01*}} \mathsf{H}_p(X_{a_1}) \xrightarrow{x_{12*}} \cdots \xrightarrow{x_{(n-1)n*}} \mathsf{H}_p(X_{a_n}).$$

In the above sequence, each linear map $x_{ij*}$ is the homomorphism induced by the map $x_{ij}$. We have already seen that the persistent homology of such a sequence of vector spaces and linear maps is well defined. However, since the linear maps here are not induced by inclusions, the original persistence algorithm as described in the previous chapter does not work. In Section 4.2, we describe a new algorithm to compute the persistence diagram of simplicial towers. Next, we generalize a filtration by allowing the inclusion maps to be directed either way, giving rise to what is called a zigzag filtration:
$$F: X_{a_0} \leftrightarrow X_{a_1} \leftrightarrow \cdots \leftrightarrow X_{a_n}, \qquad (4.2)$$
where each bidirectional arrow '$\leftrightarrow$' is either a forward or a backward inclusion map. In Section 4.3, we present an algorithm to compute the persistence of a zigzag filtration. A juxtaposition of a zigzag filtration with a tower provides a further generalization referred to as a zigzag tower. Section 4.4 presents an approach for computing the persistence of such a tower.
Before presenting the algorithms, we generalize the notion of stability for towers. We have
seen such a notion in Section 3.4 for persistence modules arising out of filtrations. Here, we adapt
it to a tower.


4.1 Stability of towers


Just like the previous chapter, we define the stability with respect to the perturbation of the towers
themselves forgetting the functions who generate them. This requires a definition of a distance
between towers at simplicial (space) levels and homology levels.
It turns out that it is convenient and sometimes appropriate if the objects (spaces, simplicial
complexes, or vector spaces) in a tower are indexed over the positive real axis instead of a discrete
subset of it. This, in turn, requires to spell out the connecting map between every pair of objects.

Definition 4.1 (Tower). A tower indexed in an ordered set $A \subseteq \mathbb{R}$ is any collection $\mathsf{T} = \{T_a\}_{a \in A}$ of objects $T_a$, $a \in A$, together with maps $t_{a,a'}: T_a \to T_{a'}$ so that $t_{a,a} = \mathrm{id}$ and $t_{a',a''} \circ t_{a,a'} = t_{a,a''}$ for all $a \le a' \le a''$. Sometimes we write $\mathsf{T} = \{T_a \xrightarrow{t_{a,a'}} T_{a'}\}_{a \le a'}$ to denote the collection with the maps. We say that the tower $\mathsf{T}$ has resolution $r$ if $a \ge r$ for every $a \in A$.
When $\mathsf{T}$ is a collection of topological spaces connected with continuous maps, we call it a space tower. When it is a collection of simplicial complexes connected with simplicial maps, we call it a simplicial tower, and when it is a collection of vector spaces connected with linear maps, we call it a vector space tower.

Remark 4.1. As we have already seen, in practice it may happen that a tower needs to be defined over a discrete set or, more generally, an index set $A$ that is only a subposet of $\mathbb{R}$. In such a case, one can 'embed' $A$ into $\mathbb{R}$ and convert the input to a tower according to Definition 4.1 by assuming that, for any $a < a' \in A$ where the open interval $(a, a')$ is not in $A$ and for any $a \le b < b' < a'$, the map $t_{b,b'}$ is an isomorphism.
Definition 4.2 (Interleaving of simplicial (space) towers). Let $X = \{X_a \xrightarrow{x_{a,a'}} X_{a'}\}_{a \le a'}$ and $Y = \{Y_a \xrightarrow{y_{a,a'}} Y_{a'}\}_{a \le a'}$ be two towers of simplicial complexes (spaces resp.) indexed in $\mathbb{R}$. For any real $\varepsilon \ge 0$, we say that they are $\varepsilon$-interleaved if for every $a$ one can find simplicial maps (continuous maps resp.) $\varphi_a: X_a \to Y_{a+\varepsilon}$ and $\psi_a: Y_a \to X_{a+\varepsilon}$ so that:

(i) for all $a \in \mathbb{R}$, $\psi_{a+\varepsilon} \circ \varphi_a$ and $x_{a,a+2\varepsilon}$ are contiguous (homotopic resp.),

(ii) for all $a \in \mathbb{R}$, $\varphi_{a+\varepsilon} \circ \psi_a$ and $y_{a,a+2\varepsilon}$ are contiguous (homotopic resp.),

(iii) for all $a' \ge a$, $\varphi_{a'} \circ x_{a,a'}$ and $y_{a+\varepsilon,a'+\varepsilon} \circ \varphi_a$ are contiguous (homotopic resp.),

(iv) for all $a' \ge a$, $x_{a+\varepsilon,a'+\varepsilon} \circ \psi_a$ and $\psi_{a'} \circ y_{a,a'}$ are contiguous (homotopic resp.).

If no such finite $\varepsilon$ exists, we say the two towers are $\infty$-interleaved.

These four conditions are summarized by requiring that the four diagrams below commute up to contiguity (homotopy resp.); the two triangles encode conditions (i)-(ii) and the two parallelograms encode conditions (iii)-(iv):
$$X_a \xrightarrow{x_{a,a+2\varepsilon}} X_{a+2\varepsilon} \ \text{vs.}\ X_a \xrightarrow{\varphi_a} Y_{a+\varepsilon} \xrightarrow{\psi_{a+\varepsilon}} X_{a+2\varepsilon}; \qquad Y_a \xrightarrow{y_{a,a+2\varepsilon}} Y_{a+2\varepsilon} \ \text{vs.}\ Y_a \xrightarrow{\psi_a} X_{a+\varepsilon} \xrightarrow{\varphi_{a+\varepsilon}} Y_{a+2\varepsilon}; \qquad (4.3)$$
$$X_a \xrightarrow{x_{a,a'}} X_{a'} \xrightarrow{\varphi_{a'}} Y_{a'+\varepsilon} \ \text{vs.}\ X_a \xrightarrow{\varphi_a} Y_{a+\varepsilon} \xrightarrow{y_{a+\varepsilon,a'+\varepsilon}} Y_{a'+\varepsilon}; \qquad Y_a \xrightarrow{y_{a,a'}} Y_{a'} \xrightarrow{\psi_{a'}} X_{a'+\varepsilon} \ \text{vs.}\ Y_a \xrightarrow{\psi_a} X_{a+\varepsilon} \xrightarrow{x_{a+\varepsilon,a'+\varepsilon}} X_{a'+\varepsilon}.$$

If we replace the operator ‘+’ by the multiplication ‘·’ with respect to the indices in the above
definition, then we say that X and Y are multiplicatively ε-interleaved. By interleaving we will
mean additive interleaving by default and use the term multiplicative interleaving where necessary
to signify that the shift is multiplicative rather than additive.
Definition 4.3 (Interleaving distance between simplicial (space) towers). The interleaving distance between two simplicial (space) towers $X$ and $Y$ is:
$$d_I(X, Y) = \inf_\varepsilon \{\varepsilon \mid X \text{ and } Y \text{ are } \varepsilon\text{-interleaved}\}.$$

Similar to the simplicial (space) towers, we can define interleaving of vector space towers. But in that case, we replace contiguity (homotopy) with equality in conditions (i) through (iv).

Definition 4.4 (Interleaving of vector space towers). Let $\mathsf{U} = \{U_a \xrightarrow{u_{a,a'}} U_{a'}\}_{a \le a'}$ and $\mathsf{V} = \{V_a \xrightarrow{v_{a,a'}} V_{a'}\}_{a \le a'}$ be two vector space towers indexed in $\mathbb{R}$. For any real $\varepsilon \ge 0$, we say that they are $\varepsilon$-interleaved if for each $a \in \mathbb{R}$ one can find linear maps $\varphi_a: U_a \to V_{a+\varepsilon}$ and $\psi_a: V_a \to U_{a+\varepsilon}$ so that:

(i) for all $a \in \mathbb{R}$, $\psi_{a+\varepsilon} \circ \varphi_a = u_{a,a+2\varepsilon}$,

(ii) for all $a \in \mathbb{R}$, $\varphi_{a+\varepsilon} \circ \psi_a = v_{a,a+2\varepsilon}$,

(iii) for all $a' \ge a$, $\varphi_{a'} \circ u_{a,a'} = v_{a+\varepsilon,a'+\varepsilon} \circ \varphi_a$,

(iv) for all $a' \ge a$, $u_{a+\varepsilon,a'+\varepsilon} \circ \psi_a = \psi_{a'} \circ v_{a,a'}$.

If no such finite $\varepsilon$ exists, we say the two towers are $\infty$-interleaved.
Analogous to the simplicial (space) towers, if we replace the operator '+' by the multiplication '·' in the above definition, then we say that $\mathsf{U}$ and $\mathsf{V}$ are multiplicatively $\varepsilon$-interleaved.
Definition 4.5 (Interleaving distance between vector space towers). The interleaving distance between two towers of vector spaces $\mathsf{U}$ and $\mathsf{V}$ is:
$$d_I(\mathsf{U}, \mathsf{V}) = \inf_\varepsilon \{\varepsilon \mid \mathsf{U} \text{ and } \mathsf{V} \text{ are } \varepsilon\text{-interleaved}\}.$$

Suppose that we have two simplicial (space) towers $X = \{X_a \xrightarrow{x_{a,a'}} X_{a'}\}$ and $Y = \{Y_a \xrightarrow{y_{a,a'}} Y_{a'}\}$. Consider the two vector space towers, also called homology towers, obtained by taking the homology groups of the complexes (spaces), that is,
$$\mathsf{V}_X = \{\mathsf{H}_p(X_a) \xrightarrow{x_{(a,a')*}} \mathsf{H}_p(X_{a'})\} \quad \text{and} \quad \mathsf{V}_Y = \{\mathsf{H}_p(Y_a) \xrightarrow{y_{(a,a')*}} \mathsf{H}_p(Y_{a'})\}.$$
The following should be obvious because simplicial (continuous resp.) maps become linear maps and contiguous (homotopic resp.) maps become equal at the homology level.

Proposition 4.1. $d_I(\mathsf{V}_X, \mathsf{V}_Y) \le d_I(X, Y)$.


One can recognize that the vector space tower is a persistence module defined in Section 3.4.
Therefore, we can use Definition 3.21 to define the persistence diagram DgmV of the tower V.
Recall that db denotes the bottleneck distance between persistence diagrams. Isometry theorem
as stated in Theorem 3.11 also holds for towers that are q-tame (or simply tame), that is, towers
with all linear maps having finite rank.
Theorem 4.2. For any two tame vector space towers U and V, we have db (Dgm(U), Dgm(V)) =
dI (U, V).
Combining Proposition 4.1 and Theorem 4.2, we obtain the following result.

Theorem 4.3. Let $X$ and $Y$ be two simplicial (space) towers and let $\mathsf{V}_X$ and $\mathsf{V}_Y$ be their respective homology towers, assumed tame. Then $d_b(\mathrm{Dgm}(\mathsf{V}_X), \mathrm{Dgm}(\mathsf{V}_Y)) \le d_I(X, Y)$.
We want to apply the above result to translate multiplicative interleaving distances into a bottleneck distance of the persistence diagrams. For that we need to work in log scale. Given a persistence diagram $\mathrm{Dgm}$ for a tower with a positive resolution, we define its log-scaled version $\mathrm{Dgm}^{\log}$ to be the diagram consisting of the set of non-diagonal points $\{(\log x, \log y) \mid (x, y) \in \mathrm{Dgm}\}$ along with the usual diagonal points. In log scale, a multiplicative interleaving turns into an additive interleaving, by which the following corollary is deduced immediately from Theorem 4.3.

Corollary 4.4. Let $X$ and $Y$ be two simplicial (space) towers with a positive resolution that are multiplicatively $c$-interleaved, and let $\mathsf{V}_X$ and $\mathsf{V}_Y$ be their respective homology towers, assumed tame. Then
$$d_b(\mathrm{Dgm}^{\log}(\mathsf{V}_X), \mathrm{Dgm}^{\log}(\mathsf{V}_Y)) \le \log c.$$

Interleaving between Čech and Rips filtrations: We show an example where we can use the stability result in Corollary 4.4. Let $P \subseteq M$ be a finite subset of a metric space $(M, \mathrm{d})$. Consider the Rips and Čech filtrations:
$$\mathcal{R}: \{\mathrm{VR}^\varepsilon(P) \hookrightarrow \mathrm{VR}^{\varepsilon'}(P)\}_{0 < \varepsilon \le \varepsilon'} \quad \text{and} \quad \mathcal{C}: \{\mathrm{C}^\varepsilon(P) \hookrightarrow \mathrm{C}^{\varepsilon'}(P)\}_{0 < \varepsilon \le \varepsilon'}.$$
From Proposition 2.2, we know that the following inclusions hold:
$$\cdots \subseteq \mathrm{C}^\varepsilon(P) \subseteq \mathrm{VR}^\varepsilon(P) \subseteq \mathrm{C}^{2\varepsilon}(P) \subseteq \mathrm{VR}^{2\varepsilon}(P) \subseteq \mathrm{C}^{4\varepsilon}(P) \subseteq \mathrm{VR}^{4\varepsilon}(P) \subseteq \cdots.$$

Figure 4.1: Čech and Rips complexes interleave multiplicatively.

Figure 4.1 illustrates that Čech and Rips complexes are multiplicatively 2-interleaved. Then, according to Corollary 4.4, the persistence diagrams $\mathrm{Dgm}^{\log}\mathcal{C}$ and $\mathrm{Dgm}^{\log}\mathcal{R}$ have bottleneck distance at most $\log 2$, which equals 1 with logarithms taken in base 2.

4.2 Computing persistence of simplicial towers


In this section, we present an algorithm for computing the persistence of a simplicial tower. Consider a simplicial tower $K: K_0 \xrightarrow{f_0} K_1 \xrightarrow{f_1} K_2 \cdots \xrightarrow{f_{n-1}} K_n$ and the maps $f_{ij}: K_i \to K_j$ where $f_{ij} = f_{j-1} \circ \cdots \circ f_{i+1} \circ f_i$. To compute the persistent homology of a simplicial filtration, the persistence algorithm in the previous chapter essentially maintains a consistent basis by computing the image $f_{ij*}(B^i)$ of a basis $B^i$ of $\mathsf{H}_*(K_i)$. As the algorithm moves through an inclusion in the filtration, homology basis elements get created (birth) or destroyed (death). Here, for towers, instead of a consistent homology basis, we maintain a consistent cohomology basis. We need to be aware that, for cohomology, the induced maps from $f_{ij}: K_i \to K_j$ are reversed, that is, $f_{ij}^*: \mathsf{H}^p(K_i) \leftarrow \mathsf{H}^p(K_j)$; refer to Section 2.5.4. So, if $B^i$ is a cohomology basis of $\mathsf{H}^p(K_i)$ maintained by the algorithm, it computes implicitly the preimage $f_{ij}^{*-1}(B^i)$. Dually, this implicitly maintains a consistent homology basis and thus captures all information about persistent homology as well.

4.2.1 Annotations
We maintain a consistent cohomology basis using a notion called annotations [60] which are
binary vectors assigned to simplices. These annotations are updated as we go forward through the
sequence in the given tower. This implicitly maintains a cohomology basis in the reverse direction
where birth and death of cohomology classes coincide with the death and birth respectively of
homology classes.
Definition 4.6 (Annotation). Given a simplicial complex K, let K(p) denote the set of p-simplices in K. An annotation for K(p) is an assignment a : K(p) → Z2^g of a binary vector aσ = a(σ) of length g to each p-simplex σ ∈ K. The binary vector aσ is called the annotation for σ. Each entry '0' or '1' of aσ is called its element. Annotations for simplices provide an annotation for every p-chain c p : ac p = Σσ∈c p aσ .

An annotation a : K(p) → Z2^g is valid if the following two conditions are satisfied:

1. g = rank Hp (K), and

2. two p-cycles z1 and z2 have az1 = az2 if and only if their homology classes are identical, i.e. [z1 ] = [z2 ].
Proposition 4.5. The following two statements are equivalent:

1. An annotation a : K(p) → Z2^g is valid.

2. The cochains {φi }i=1,··· ,g given by φi (σ) = aσ [i] for every σ ∈ K(p) are cocycles whose cohomology classes {[φi ]}, i = 1, . . . , g, constitute a basis of H^p (K).
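For computations, an annotation over Z2 can be stored sparsely as the set of positions holding a 1; adding annotations then becomes symmetric difference. A small Python sketch under this representation (ours, not fixed by the text): two cycles are homologous exactly when the function below returns the same set for both, by condition 2 of validity.

def chain_annotation(chain, ann):
    # Annotation of a p-chain: the Z2-sum of its simplices' annotations.
    # chain: iterable of simplices (collections of vertices);
    # ann: dict mapping each p-simplex to its sparse annotation (a set).
    a = set()
    for sigma in chain:
        a ^= ann[frozenset(sigma)]   # Z2 addition = symmetric difference
    return a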

In light of the above result, an annotation is simply one way to represent a cohomology ba-
sis. However, by representing the corresponding basis as an explicit vector associated with each
simplex, it localizes the basis to each simplex. As a result, we can update the cohomology basis
locally by changing the annotations locally (see Proposition 4.8). This point of view also helps to
reveal how we can process elementary collapses, which are neither inclusions nor deletions, by
transferring annotations (see Proposition 4.9).

4.2.2 Algorithm
Consider the persistence module Hp K induced by a simplicial tower K : {Ki →fi Ki+1 } where every fi is a so-called elementary simplicial map which we will introduce shortly:

Hp K : Hp (K0 ) →f0∗ Hp (K1 ) →f1∗ Hp (K2 ) · · · →fn−1∗ Hp (Kn ).

Instead of tracking a consistent homology basis for the module Hp K, we track a cohomology basis in the module H^p K where the homomorphisms are in the reverse direction:

H^p K : H^p (K0 ) ←f0∗ H^p (K1 ) ←f1∗ H^p (K2 ) · · · ←fn−1∗ H^p (Kn ).

As we move from left to right in the above sequence, the annotations implicitly maintain a coho-
mology basis whose elements are also time stamped to signify when a basis element is born or
dies. We keep in mind that the birth and death of a cohomology basis element coincides with the
death and birth of a homology basis element because the two modules run in opposite directions.
To jump start the algorithm, we need annotations for simplices in K0 at the beginning whose
non-zero elements are timestamped with 0. This can be achieved by considering an arbitrary filtra-
tion of K0 and then applying the generic algorithm as we describe for inclusions in Section 4.2.3.
The first vertex in this filtration gets the annotation of [1].
Before describing the algorithm, we observe a simple fact: simplicial maps can be decomposed into elementary maps, which lets us design simpler atomic steps for the algorithm.

Definition 4.7 (Elementary simplicial maps). A simplicial map f : K → K′ is called elementary if it is of one of the following two types:

• f is injective, and K′ has at most one more simplex than K. In this case, f is called an elementary inclusion.

• f is not injective but is surjective, and the vertex map fV is injective everywhere except on a pair {u, v} ⊆ V(K). In this case, f is called an elementary collapse. An elementary collapse maps a pair of vertices into a single vertex, and is injective on every other vertex.

We observe that any simplicial map is a composition of elementary simplicial maps.

Proposition 4.6. If f : K → K′ is a simplicial map, then there are elementary simplicial maps fi with K = K0 →f0 K1 →f1 K2 · · · →fn−1 Kn = K′ so that f = fn−1 ◦ fn−2 ◦ · · · ◦ f0 .

In view of Proposition 4.6, it is sufficient to show how one can design the persistence algorithm for an elementary simplicial map. At this point, we make a change in Definition 4.7 of elementary simplicial maps that eases further discussions. We let fV be the identity (which is an injective map) everywhere except possibly on a pair of vertices {u, v} ⊆ V(K), which fV maps to one of these two vertices, say u, in K′. This change can be implemented by renaming the vertices in K′ that are mapped onto injectively.
Figure 4.2: Case (i) of inclusion: the boundary ∂uv = u + v of the edge uv has annotation 1 + 1 = 0. After its addition, every edge gains an element in its annotation which is 0 for all except the edge uv. Case (ii) of inclusion: the boundary of the top triangle has annotation 01. It is added to the annotation of uv, which is the only edge having the second element 1. Consequently, the second element is zeroed out for every edge, and is then deleted.

4.2.3 Elementary inclusion


Consider an elementary inclusion Ki ,→ Ki+1 . Assume that Ki has a valid annotation. We de-
scribe how we obtain a valid annotation for Ki+1 from that of Ki after inserting the p-simplex
σ = Ki+1 \ Ki . We compute the annotation a∂σ for the boundary ∂σ in Ki and take actions as
follows which ultimately lead to computing the persistence diagram.

Case (i): If a∂σ is a zero vector, the class [∂σ] is trivial in H p−1 (Ki ). This means that σ creates a
p-cycle in Ki+1 and by duality a p-cocycle is killed while going left from Ki+1 to Ki . In this case
we augment the annotations for all p-simplices by one element with a time stamp i + 1, that is, the annotation [b1 , b2 , · · · , bg ] for every p-simplex τ is updated to [b1 , b2 , · · · , bg , bg+1 ] with bg+1 being time stamped i + 1. We set bg+1 = 0 for τ ≠ σ and bg+1 = 1 for τ = σ. The elements b1 , . . . , bg of aσ are set to zero. The annotations for all other simplices remain unchanged. See
Figure 4.2(a).

Case (ii): If a∂σ is not a zero vector, the class of the (p − 1)-cycle ∂σ is nontrivial in H p−1 (Ki ).
Therefore, σ kills the class of this (p − 1)-cycle and a corresponding class of (p − 1)-cocycles
is born in the reverse direction. We simulate it by forcing a∂σ to be zero which affects other
annotations as well. Let i1 < i2 < · · · < ik be the indices such that bi1 , bi2 , · · · , bik are all of the nonzero elements in a∂σ = [b1 , b2 , · · · , bik , · · · , bg ]. Recall that φ j
denotes the (p − 1)-cocycle given by its evaluation φ j (σ′) = aσ′ [ j] for every (p − 1)-simplex σ′ ∈ Ki (Proposition 4.5). With this notation, the cocycle φ = φi1 + φi2 + · · · + φik is born
after deleting σ in the reverse direction. This cocycle does not exist after time ik in the reverse
direction. In other words, the cohomology class [φ] which is born leaving the time i + 1 is killed at
time ik . This pairing matches that of the standard persistence algorithm where the youngest basis
element is chosen to be paired among all those ones whose combination is killed. We add the
vector a∂σ to the annotation of every (p − 1)-simplex whose ik -th element is nonzero. This zeroes
out the ik -th element of the annotation for every (p − 1)-simplex and at the same time updates
other elements so that a valid annotation according to Proposition 4.5 is maintained. We simply
delete ik -th element from the annotation for every (p − 1)-simplex. See Figure 4.2(b). We further
set the annotation aσ for σ to be a zero-vector of length s, where s is the length of the annotation vector of every p-simplex at this point.


Notice that determining if we have case (i) or (ii) can be done easily in O(pg) time by check-
ing the annotation of ∂σ. Indeed, this is achieved because the annotation already localizes the
cohomology basis to each individual simplex.
Before going to the next case of elementary collapse, we present Algorithm 6 (Annot) for computing the annotations for all simplices in a given simplicial complex using the steps of
elementary inclusions. The algorithm proceeds in the order of increasing dimension because it
needs to have the annotations of (p − 1)-simplices before dealing with p-simplices. It starts with
vertices whose annotations are readily computable. In the following algorithm K p denotes the
p-skeleton of the input simplicial d-complex K.

Algorithm 6 Annot(K)
Input:
K: input complex
Output:
Annotation for every simplex in K
1: Let m := |K 0 |
2: For every vertex vi ∈ K 0 , assign an m-vector a(vi ) where a(vi )[ j] = 1 iff j = i
3: for p = 1 → d do
4: for all p-simplices σ ∈ K p do
5: Let annotation of every p-simplex be a vector of length g so far
6: if a(∂σ) ≠ 0 then
7: assign a(σ) to be a 0 vector of size g
8: pick any non-zero entry bu in a(∂σ)
9: add a(∂σ) to every (p − 1)-simplex σ0 s.t. a(σ0 )[u] = 1
10: delete u-th entry from annotation of every (p − 1)-simplex
11: else
12: extend a(τ) for every p-simplex τ so far added by appending a 0 bit
13: create vector a(σ) of length g + 1 with only the last bit being 1
14: end if
15: end for
16: end for
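For concreteness, the following Python sketch implements Annot with sparse annotations: each annotation is a set of element ids, adding over Z2 is symmetric difference, appending a fresh element costs nothing for simplices whose new bit is 0, and deleting a zeroed-out entry is implicit. The representation and names are ours, not fixed by the text.

from itertools import count

def annotate(simplices):
    # Valid annotations for all simplices of a complex closed under faces;
    # simplices are given as collections of vertices. Returns a dict
    # frozenset(simplex) -> sparse Z2 annotation (set of element ids).
    simplices = sorted((frozenset(s) for s in simplices), key=len)
    ann, fresh = {}, count()
    for sigma in simplices:
        if len(sigma) == 1:                  # vertices get unit vectors (line 2)
            ann[sigma] = {next(fresh)}
            continue
        a_bd = set()                         # annotation of the boundary of sigma
        for v in sigma:
            a_bd ^= ann[sigma - {v}]
        if a_bd:                             # a(bd sigma) != 0: a class dies
            u = max(a_bd)                    # any nonzero entry works (line 8)
            for tau in [t for t in ann if len(t) == len(sigma) - 1]:
                if u in ann[tau]:
                    ann[tau] ^= a_bd         # zeroes entry u everywhere (lines 9-10)
            ann[sigma] = set()
        else:                                # a(bd sigma) == 0: a cycle is born
            ann[sigma] = {next(fresh)}       # fresh element, 1 only for sigma
    return ann

Running annotate on the hollow triangle (three vertices, three edges), for instance, leaves all three vertices with the same annotation (one connected component) and gives exactly one edge a fresh one-element annotation, so the boundary cycle has a nonzero annotation, matching rank H1 = 1.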

4.2.4 Elementary collapse


The case for handling collapse is more interesting. It has three distinct steps: (i) elementary inclusions to satisfy the so-called link condition, (ii) local annotation transfer to prepare for the
collapse, and (iii) collapse of the simplices with updated annotations. We explain each of these
steps now.
The elementary inclusions that may precede the final collapse are motivated by a result that connects collapses with the change in cohomology. Consider an elementary collapse fi : Ki → Ki+1 where the vertex pair (u, v) collapses to u. The following link condition, introduced in [121] and later used to preserve homotopy [12], becomes relevant.

Definition 4.8 (Link condition). A vertex pair (u, v) in a simplicial complex Ki satisfies the link
condition if the edge uv ∈ Ki and Lk u ∩ Lk v = Lk uv. An elementary collapse fi : Ki → Ki+1
satisfies the link condition if the vertex pair on which fi is not injective satisfies the link condition.
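Checking the link condition is a direct set computation. A small Python sketch, assuming the complex is stored as a set of frozensets closed under taking faces (the function names are ours):

def link(K, tau):
    # Link of tau: all s - tau where tau is a proper face of s in K.
    tau = frozenset(tau)
    return {s - tau for s in K if tau < s}

def satisfies_link_condition(K, u, v):
    # Definition 4.8: the edge uv is in K and Lk u ∩ Lk v = Lk uv.
    uv = frozenset({u, v})
    return uv in K and link(K, {u}) & link(K, {v}) == link(K, uv)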


Figure 4.3: Annotation updates for elementary collapse: inclusion of a triangle so as to satisfy the
link condition (upper row), annotation transfer and actual collapse (lower row); annotation 11 of
the vanishing edge uv is added to all edges (cofacets) adjoining u.

Proposition 4.7 ([12]). If an elementary collapse fi : Ki → Ki+1 satisfies the link condition, then the underlying spaces |Ki | and |Ki+1 | remain homotopy equivalent. Hence, the induced homomorphisms fi∗ : Hp (Ki ) → Hp (Ki+1 ) and fi∗ : H^p (Ki ) ← H^p (Ki+1 ) are isomorphisms.

If an elementary collapse satisfies the link condition, we can perform the collapse knowing
that the cohomology does not change. Otherwise, we know that the cohomology is affected by the
collapse and it should be reflected in our updates for annotations.
A small commutative diagram provides a precise means to carry out the change in cohomology: the collapse fi : Ki → Ki+1 factors through an intermediate complex K̂i via an inclusion j : Ki ,→ K̂i and a map fi′ : K̂i → Ki+1 . Let S be the minimal set of simplices, ordered in non-decreasing order of their dimensions, whose addition to Ki makes (u, v) satisfy the link condition. One can describe a construction of S recursively as follows. In dimension 1, if the edge (u, v) is missing, it is added to S . Recursively assume that S has all of the necessary p-simplices. Then, all missing (p + 1)-simplices adjoining the edge (u, v) whose boundaries are already present are added to S . For each simplex σ ∈ S , we modify the annotations of every simplex as we would have done if σ were to be inserted. Thereafter, we carry out the rest of the elementary collapse. In essence, implicitly, we obtain an intermediate complex K̂i = Ki ∪ S for which the diagram commutes. Here, fi′ is induced by the same vertex map that induces fi , and j is the inclusion. This means that the persistence of fi is identical to that of fi′ ◦ j, which justifies our action of elementary inclusions followed by the actual collapses.

We remark that this is the only place where we may insert implicitly a simplex σ in the current
approach. The number of such σ is usually much smaller than the number of simplices that one
may need for a coning strategy detailed in Section 4.4 to process simplicial towers.
After constructing K̂i with annotations, we transfer annotations to prepare for the collapse.
This step locally changes the annotations for simplices containing the vertices u and/or v. The
following definition facilitates the description.

Definition 4.9 (Vanishing and mirror simplices). For the elementary collapse fi′ : K̂i → Ki+1 , a simplex σ ∈ K̂i is called vanishing if the cardinality of fi′ (σ) is one less than that of σ. Two simplices σ and σ′ are called mirror partners if one contains u, the other contains v, and they share the rest of their vertices. In Figure 4.3 (lower row), the vanishing simplices are {uv, uvw} and the mirror partners are {u, v} and {uw, vw}.

In an elementary collapse that sends (u, v) to u, all vanishing simplices need to be deleted, and
all simplices containing v need to be pulled to corresponding ones containing the vertex u (which
are their mirror partners). We update the annotations in such a way that the annotations of all
vanishing simplices become zero, and those of both mirror partners become the same. Once this
is achieved, the collapse is implemented by simply deleting the vanishing simplices and replacing
v with u in all simplices containing v (effectively this identifies mirror partners) without changing
their annotations. The following proposition provides the justification behind the specific update
operations that we perform.
Proposition 4.8. Let K be a simplicial complex and a : K(p) → Z2^g be a valid annotation. Let
σ ∈ K(p) be any p-simplex and τ any of its (p − 1)-faces. Then, adding aσ to the annotation for
all cofacets of τ including σ produces a valid annotation for K(p). Furthermore, the cohomology
basis corresponding to the annotations (Proposition 4.5) remains unchanged by this modification.

Consider now the elementary collapse fi0 : K̂i → Ki+1 that sends (u, v) to u. We update the
annotations for simplices in K̂i as follows. First, note that the vanishing simplices are exactly
those simplices containing the edge {u, v}. For every p-simplex containing {u, v}, i.e., a vanishing
simplex, exactly two of its (p − 1)-faces are mirror simplices, and all other remaining (p − 1)-faces
are vanishing simplices. Let σ be a vanishing p-simplex and τ be its (p − 1)-face that is a mirror
simplex containing u. We add aσ to the annotations for all cofacets (cofaces of codimension 1)
of τ including σ. This implements the annotation transfer for σ. By Proposition 4.8, the new
annotation generated by this process corresponds to the old cohomology basis for K̂i . This new
annotation has aσ as zero since aσ + aσ = 0. See the lower row of Figure 4.3. We perform the
above operation for each vanishing simplex. It turns out that by using the relations of vanishing
simplices and mirror simplices, each mirror simplex eventually acquires an identical annotation
to that of its partner. Specifically, we have the following observation.

Proposition 4.9. After all possible annotation transfers involved in a collapse, (i) each vanishing
simplex has a zero annotation; and (ii) each mirror simplex τ has the same annotation as its
mirror partner simplex τ0 .
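The transfers themselves are mechanical. A Python sketch under the sparse annotation representation used earlier (names ours; processing vanishing simplices in order of increasing dimension is one valid choice):

def transfer_annotations(K, ann, u, v):
    # Prepare the collapse (u, v) -> u on a complex K (set of frozensets,
    # assumed to satisfy the link condition). For each vanishing simplex
    # sigma (those containing the edge uv), add a_sigma to all cofacets of
    # its mirror facet tau = sigma - {v}, as licensed by Proposition 4.8.
    uv = frozenset({u, v})
    for sigma in sorted((s for s in K if uv <= s), key=len):
        a_sigma = set(ann[sigma])            # copy; ann[sigma] changes below
        tau = sigma - {v}                    # the mirror facet containing u
        for s in K:
            if tau < s and len(s) == len(tau) + 1:
                ann[s] ^= a_sigma            # sigma itself is a cofacet, so
                                             # its annotation cancels to zero
    # by Proposition 4.9, vanishing simplices now carry zero annotations and
    # mirror partners carry identical ones; the actual collapse then deletes
    # the former and identifies the latter without changing annotations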

Subsequent to the annotation transfer, the annotation of K̂i is ready for the actual collapse since each pair of mirror simplices that are collapsed to a single simplex gets an identical annotation and each vanishing simplex acquires the zero annotation. Furthermore, Proposition 4.8 tells us that
the cohomology basis does not change by annotation transfer which aligns with the fact that
fi′∗ : H^p (Ki+1 ) → H^p (K̂i ) is indeed an isomorphism. Accordingly, no time stamp changes after
the annotation transfer and the actual collapse. Propositions 5.2 and 5.3 in [122] provide formal
statements justifying the algorithm for annotation updates.
The persistence diagram of a given simplicial tower K can be retrieved easily from the anno-
tation algorithm. Each time during an elementary operation either we add a new element into the
annotation of all p-simplices for some p ≥ 0 or delete an element from the annotations of all of
them. During the deletion, we add the point (bar) (a, b) into Dgm p K where b is the current time
of deletion (death) and a is the time stamp of the element when it was added (birth).

4.3 Persistence for zigzag filtration


Now we consider another generalization of filtration where all inclusions are not necessarily in
the forward direction. The possibility of backward inclusions allows simplices to be deleted as we
move forward. So, essentially, we allow both insertions and deletions making it possible for the
complex to grow and shrink as we move forward with the filtration. It is not obvious a priori that
the resulting persistence module admits bar codes as in the original filtration where all inclusions
are in the forward direction. Existence of such bar codes is essential for defining persistence pairs
and designing an algorithm to compute them. We are assured by quiver theory [163] that such
bar codes also exist for zigzag filtration with both forward and backward insertions. We aim to
compute them.

Figure 4.4: The zigzag filtration K0 ,→ K1 ←- K2 ,→ K3 ←- K4 has four intervals (bars) for one
dimensional homology H1 , namely [0, 4], [1, 1], [3, 4], and [4, 4].

Specifically, a zigzag filtration F of a complex K (space T) is a zigzag diagram of the form:


F : X0 ↔ X1 ↔ · · · ↔ Xn−1 ↔ Xn (4.4)
where for each i, Xi = Ki ⊆ K for a simplicial filtration and Xi = Ti ⊆ T for a space filtration,
and Xi ↔ Xi+1 is either a forward inclusion Xi ,→ Xi+1 or a backward inclusion Xi ←- Xi+1 .
Figure 4.4 illustrates a simplicial zigzag filtration and its barcode. Observe that reverse arrows
can be interpreted as simplex deletions. For any j ∈ [0, n], we let F j denote the prefix of F
consisting of the complexes (spaces) X0 , . . . , X j .

For p ≥ 0, considering the p-th homology groups with coefficients in a field k (which is Z2 here), we obtain a sequence of vector spaces connected by forward or backward linear maps, called a zigzag persistence module:

Hp F : Hp (X0 ) ↔ϕ0 Hp (X1 ) ↔ϕ1 · · · ↔ϕn−2 Hp (Xn−1 ) ↔ϕn−1 Hp (Xn )   (4.5)

where the map ϕi : Hp (Xi ) ↔ Hp (Xi+1 ) can either be forward or backward and is induced by the inclusion.
In the non-zigzag case, when the index set for Hp F is finite, Proposition 3.10 says that Hp F is a direct sum of interval modules. In the zigzag case, a similar statement holds due to quiver theory [163].

Definition 4.10 (Quiver). A quiver Q = (N, E) is a directed graph which can be finite or infinite.
A representation V(Q) of Q is an assignment of a vector space Vi to every node Ni ∈ N and a linear
map vi j : Vi → V j for every directed edge (Ni , N j ) ∈ E. Figure 4.5 illustrates representations of
two quivers.

Figure 4.5: A representation of a quiver (top); a representation of an An -type quiver (bottom).

A zigzag persistence module is a special type of quiver representation where the graph is finite
and linear shaped, also known as An -type (see Figure 4.5(bottom)), where every node has at most
two directed edges incident to it. Such a quiver representation has an interval decomposition, though we need to define the intervals afresh to take into account the fact that the arrows may point in either direction.

Definition 4.11 (Interval module). An interval module I[b,d] , also called an interval or a bar, over an index set {0, 1, . . . , n} with field k is a sequence of vector spaces

I[b,d] : I0 ↔ I1 · · · ↔ In

where Ik = k for b ≤ k ≤ d and Ik = 0 otherwise, with the maps k ← k and k → k being identities.

Remark 4.2. Notice that unlike the bars that we defined in Chapter 3 for non-zigzag filtration,
here the bars are closed on both ends. However, we will see that we can designate them to be of
four types similar to what we have seen for the persistence modules for non-zigzag persistence.

Theorem 4.10 ([13, 265, 163]). Every quiver representation V(Q) for an An -type quiver Q has an interval decomposition, that is, V(Q) ≅ ⊕i I[bi ,di ] . Furthermore, this decomposition is unique up to isomorphism and permutation of the intervals.

The underlying graph of a zigzag persistence module as shown in Eqn. (4.5) is of An -type. Hence, we have the decomposition Hp F ≅ ⊕i I[bi ,di ] that provides the barcode for zigzag persistence. Notice that Theorem 4.10 does not require the vector spaces to be finite dimensional. Hence, we still have a valid decomposition even if the vector spaces in the zigzag persistence module are not finite dimensional. However, for finite computation, we will assume that our zigzag persistence module is finite both in terms of the index set and also in terms of the dimensions of the vector spaces.
Recall from Section 3.2.1 that each bar (interval) in a barcode (interval decomposition) corre-
sponds to a point in the persistence diagram Dgm p (F) and thus we also say that the bar belongs to
the diagram. Sometimes, we also abuse the notation [b, d] to denote both an interval in the index
set and an interval module in a p-th zigzag persistence module.

Types of bars. A bar [b, d] for a zigzag persistence module H p F can be of four types depending
on the direction of the arrow between Xb−1 and Xb and the arrow between Xd and Xd+1 in F. They
are:

closed-closed [b, d]: Xb−1 ,→ Xb · · · Xd ←- Xd+1 : either b = 0 or the inclusion Xb−1 ,→ Xb is a forward arrow; and d < n with the inclusion Xd ←- Xd+1 being a backward arrow;

closed-open [b, d]: Xb−1 ,→ Xb · · · Xd ,→ Xd+1 : either b = 0 or the inclusion Xb−1 ,→ Xb is a forward arrow; and either d = n or the inclusion Xd ,→ Xd+1 is a forward arrow;

open-closed [b, d]: Xb−1 ←- Xb · · · Xd ←- Xd+1 : b > 0 and the inclusion Xb−1 ←- Xb is a backward arrow; and d < n with the inclusion Xd ←- Xd+1 being a backward arrow;

open-open [b, d]: Xb−1 ←- Xb · · · Xd ,→ Xd+1 : b > 0 and the inclusion Xb−1 ←- Xb is a backward arrow; and either d = n or the inclusion Xd ,→ Xd+1 is a forward arrow.
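In code, the type of a bar is read off from the two bounding arrows; a tiny Python sketch (ours), where dirs[i] is '+' if Xi ,→ Xi+1 and '-' if Xi ←- Xi+1 :

def bar_type(b, d, n, dirs):
    # Classify a zigzag bar [b, d] over indices 0..n by its two ends.
    left = 'closed' if b == 0 or dirs[b - 1] == '+' else 'open'
    right = 'closed' if d < n and dirs[d] == '-' else 'open'
    return left + '-' + right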

With the four types of bars, when we compute the bottleneck distance between persistence
diagrams for two zigzag persistence modules, we consider matching between bars of similar
types. That is, db (Dgm p (F1 ), Dgm p (F2 )) is computed with the understanding that only similar
types of bars are compared while matching the bars and the points on the diagonal are assumed to
have any type. We face a difficulty in defining an interleaving distance between zigzag modules
because of the zigzag nature of the arrows. However, one can define such an interleaving
distance by mapping the module to a 2-parameter persistence module. See the notes in Chapter 12
for more details.

4.3.1 Approach
We briefly describe an overview of our approach for computing zigzag persistent intervals for a
simplicial zigzag filtration:

F : ∅ = K0 ↔ K1 ↔ · · · ↔ Kn−1 ↔ Kn . (4.6)

We assume that the filtration is simplex-wise, which means that Ki and Ki+1 differ by only one simplex σi , and that it begins with the empty complex. We have seen similar conditions before for the non-zigzag case in Section 3.1.2. This is not a serious restriction: we can expand an inclusion of a set of simplices into a series of single-simplex inclusions using any order that puts a simplex after all its faces, and we can always pad an empty complex at the beginning with the first inclusion being forward.
The method we describe is derived from maintaining a consistent basis with a set of represen-
tative cycles over the intervals as we define now. These cycles generate an interval module in a
straightforward way by associating a cycle to a homology class at each position.

Definition 4.12 (Representative cycles). Let p ≥ 0, F : K0 ↔ · · · ↔ Kn be a zigzag filtration and Hp F = {Hp (Ki ) ↔ϕi Hp (Ki+1 )}i=0,...,n−1 be the corresponding zigzag persistence module. Let [b, d] be an interval in Dgm p (F). A set of representative p-cycles for [b, d] is an indexed set of p-cycles {ci ⊆ Ki | i ∈ [b, d]} so that

1. For b > 0, [cb ] is not in the image of ϕb−1 if Kb−1 ↔ Kb is a forward inclusion, or [cb ] is the non-zero class mapped to 0 by ϕb−1 otherwise.

2. For d < n, [cd ] is not in the image of ϕd if Kd ↔ Kd+1 is a backward inclusion, or [cd ] is the non-zero class mapped to 0 by ϕd otherwise.

3. For each i ∈ [b, d − 1], [ci ] ↔ [ci+1 ] by ϕi , that is, either [ci ] ↦ [ci+1 ] or [ci+1 ] ↦ [ci ] by ϕi .

The interval module induced by the representative p-cycles is a zigzag persistence module I : I0 ↔ I1 · · · ↔ In such that Ii equals the 1-dimensional vector space generated by [ci ] ∈ Hp (Ki ) for i ∈ [b, d] and equals 0 otherwise.

The following theorem justifies the definition of representative cycles, which says that repre-
sentative cycles always produce an interval decomposition of a zigzag module and vice versa:

Theorem 4.11. Let p ≥ 0, F : K0 ↔ · · · ↔ Kn be a zigzag filtration with Hp (K0 ) = 0 and A be an index set. One has that Hp F is equal to (not merely isomorphic to) a direct sum of interval submodules ⊕α∈A I[bα ,dα ] if and only if for each α ∈ A, I[bα ,dα ] is an interval module induced by a set of representative p-cycles for [bα , dα ], where Dgm p (F) = {[bα , dα ] | α ∈ A}.

We now present an abstract algorithm based on an approach in [228] which helps us design
a concrete algorithm later. Given a filtration F : ∅ = K0 ↔ · · · ↔ Kn starting with an empty
complex, first let Dgm p (F0 ) = ∅. The algorithm then iterates for i ← 0, . . . , n − 1. At the
beginning of the i-th iteration, inductively assume that the intervals and their representative cycles
for H p Fi have already been computed. The aim of the i-th iteration is to compute these for H p Fi+1 .
Let Dgm p (Fi ) = {[bα , dα ] | α ∈ Ai } be indexed by a set Ai , and let {cαk ⊆ Kk | k ∈ [bα , dα ]} be a set of representative p-cycles for each [bα , dα ]. For ease of presentation, we also let cαk = 0 for each α ∈ Ai and each k ∈ [0, i] not in [bα , dα ]. We call intervals of Dgm p (Fi ) ending with i surviving intervals at index i. Each non-surviving interval of Dgm p (Fi ) is directly included in Dgm p (Fi+1 ) and its representative cycles stay the same. For surviving intervals of Dgm p (Fi ), the i-th iteration proceeds with the following cases determined by the types of the linear maps ϕi : Hp (Ki ) ↔ Hp (Ki+1 ).
ϕi is an isomorphism: In this case, no intervals are created or cease to persist. Each surviving interval [bα , dα ] in Dgm p (Fi ) now corresponds to an interval [bα , i + 1] in Dgm p (Fi+1 ). The representative cycles for [bα , i + 1] are set by the following rule:

Trivial setting rule of representative cycles: For each j with bα ≤ j ≤ i, the representative cycle for [bα , i + 1] at index j stays the same. The representative cycle for [bα , i + 1] at index i + 1 is set to a cycle cαi+1 ⊆ Ki+1 such that [cαi ] ↔ [cαi+1 ] by ϕi .
ϕi points forward and is injective: A new interval [i + 1, i + 1] is added to Dgm p (Fi+1 ) and its
representative cycle at i + 1 is set to a p-cycle in Ki+1 containing σi . All surviving intervals
of Dgm p (Fi ) persist to index i+1 and their representative cycles are set by the trivial setting
rule.
ϕi points backward and is surjective: A new interval [i + 1, i + 1] is added to Dgm p (Fi+1 ) and
its representative cycle at i + 1 is set to a p-cycle homologous to ∂(σi ) in Ki+1 . All surviving
intervals of Dgm p (Fi ) persist to index i + 1 and their representative cycles are set by the
trivial setting rule.
ϕi points forward and is surjective: A surviving interval of Dgm p (Fi ) does not persist to i + 1. Let Bi ⊆ Ai consist of indices of all surviving intervals. We have that {[c_i^α] | α ∈ Bi } forms a basis of Hp (Ki ). Suppose that ϕi ([c_i^{α1}] + · · · + [c_i^{αℓ}]) = 0, where α1 , . . . , αℓ ∈ Bi . We can rearrange the indices such that bα1 < bα2 < · · · < bαℓ and α1 < α2 < · · · < αℓ . Let λ be α1 if the arrow K_{bα−1} ↔ K_{bα} points backward for every α ∈ {α1 , . . . , αℓ } and otherwise be the largest α ∈ {α1 , . . . , αℓ } such that K_{bα−1} ↔ K_{bα} points forward. Then, [bλ , i] forms an interval of Dgm p (Fi+1 ). For each k ∈ [bλ , i], let zk = c_k^{α1} + · · · + c_k^{αℓ}; then, {zk | k ∈ [bλ , i]} is a set of representative cycles for [bλ , i]. All the other surviving intervals of Dgm p (Fi ) persist to i + 1 and their representative cycles are set by the trivial setting rule.
ϕi points backward and is injective: A surviving interval of Dgm p (Fi ) does not persist to i + 1. Let Bi ⊆ Ai consist of indices of all surviving intervals, and let c_i^{α1}, . . . , c_i^{αℓ} be the cycles in {c_i^α | α ∈ Bi } containing σi . We can rearrange the indices such that bα1 < bα2 < · · · < bαℓ and α1 < α2 < · · · < αℓ . Let λ be α1 if the arrow K_{bα−1} ↔ K_{bα} points forward for every α ∈ {α1 , . . . , αℓ } and otherwise be the largest α ∈ {α1 , . . . , αℓ } such that K_{bα−1} ↔ K_{bα} points backward. Then, [bλ , i] forms an interval of Dgm p (Fi+1 ) and the representative cycles for [bλ , i] stay the same. For each α ∈ {α1 , . . . , αℓ } not equal to λ, let zk = c_k^α + c_k^λ for each k such that bα ≤ k ≤ i, and let zi+1 = zi ; then, {zk | k ∈ [bα , i + 1]} is a set of representative cycles for [bα , i + 1]. For the other surviving intervals, the setting of representative cycles follows the trivial setting rule.
Remark 4.3. Note that in the above algorithm, there is no canonical choice for the representative
classes. However, all choices produce the same intervals.

4.3.2 Zigzag persistence algorithm


We now present a concrete version of our approach which runs in cubic time. In this algorithm,
given a zigzag filtration F : ∅ = K0 ↔ K1 ↔ · · · ↔ Kn , the main loop iterates for i ← 0, . . . , n−1
so that the i-th iteration takes care of the changes from Ki to Ki+1 . A unique integral id less than n
is assigned to each simplex in Ki and id[σ] is used to record the id of a simplex σ. Note that the
id of a simplex is subject to change during the execution. For each dimension p, a cycle matrix
Z p and a chain matrix C p+1 with entries in Z2 are maintained. The number of columns of Z p and
C p+1 equals rank Zp (Ki ) and the number of rows of Z p and C p+1 equals n. We will see that certain
columns j of C p+1 maintain a (p + 1)-chain whose boundary is in column j of Z p . Each column
of Z p and C p represents a p-chain in Ki such that for each simplex σ ∈ Ki , σ belongs to the
p-chain if and only if the bit with index id[σ] in the column equals 1. For convenience, we make
no distinction between a column of the matrix Z p or C p and the chain it represents. We use Z p [ j]
to denote the j-th column of Z p (columns of C p are denoted similarly). For each column Z p [ j],
a birth timestamp b p [ j] is maintained. This timestamp is usually non-negative, but can possibly
be negative one (−1). We will see that this special negative value is assigned only to indicate that
the column represents a boundary cycle. Moreover, we let the pivot of Z p [ j] be the largest index
whose corresponding bit equals to 1 in Z p [ j] and denote it as pivot(Z p [ j]). At the start of the i-th
iteration, for each p, the following properties for the matrices are preserved:

1. The columns of Z p form a basis of Z p (Ki ) and have distinct pivots.

2. The columns of Z p with negative birth timestamps form a basis of B p (Ki ). Moreover, for each column Z p [ j] of Z p with a negative birth timestamp, one has that Z p [ j] = ∂(C p+1 [ j]).

3. For columns of Z p with non-negative birth timestamps, their birth timestamps bijectively map to the starting indices of the intervals of Dgm p (Fi ) ending with i. Moreover, for each column Z p [ j] of Z p such that b p [ j] is non-negative, one has that Z p [ j] is a representative cycle at index i for the interval [b p [ j], i].
 

The above properties indicate that a column Z p [ j] of Z p is a boundary if b p [ j] < 0 and is not a boundary otherwise. Furthermore, we have that columns of Z p with non-negative birth timestamps represent a homology basis for Hp (Ki ) at the start of the i-th iteration.

Zigzag algorithm.
For each i ← 0, . . . , n − 1, the algorithm does the following:

• Case ϕi is forward: From Ki to Ki+1 , a p-simplex σi is added and the id of σi is set as id[σi ] = i. Since the columns of Z p−1 form a basis of Z p−1 (Ki ) and have distinct pivots, ∂(σi ) can be represented as a sum of the columns of Z p−1 by a reduction algorithm. Suppose that ∂(σi ) = Σα∈I Z p−1 [α] where I is a set of column indices of Z p−1 . The algorithm then checks the timestamp of Z p−1 [α] for each α ∈ I to see whether all of them are boundaries. After this, it is known whether or not ∂(σi ) is a boundary in Ki . An interval in dimension p gets born if ∂(σi ) is a boundary in Ki and an interval in dimension p − 1 dies otherwise.

– Birth: Append a new column σi + Σα∈I C p [α] with birth timestamp i + 1 to Z p .

– Death: Let J consist of indices in I whose corresponding columns in Z p−1 have non-negative birth timestamps. If ϕ_{b p−1 [α]−1} points backward for all α ∈ J, let λ be the smallest index in J; otherwise, let λ be the largest α in J such that ϕ_{b p−1 [α]−1} points forward. Then, do the following:
1. Output the (p − 1)-th interval [b p−1 [λ], i].
2. Set Z p−1 [λ] = ∂(σi ), C p [λ] = σi , and b p−1 [λ] = −1.


Since the pivot of the column ∂(σi ) may conflict with that of another column in Z p−1 ,
we perform steps 1-13 described next to keep the pivots distinct. The total order ⊑ used in step 10 and later is defined as follows.

Definition 4.13. Let I ⊆ {1, . . . , n − 1} be a set of indices. For i, j ∈ I, i ⊑ j in the total order if and only if one of the following holds:
∗ i = j.
∗ i < j and the function ϕ j−1 points forward.
∗ j < i and the function ϕi−1 points backward.
∗ j < i and the function ϕi−1 points backward.
1. while there are two columns Z p−1 [α], Z p−1 [β] with the same pivot do
2. if b p−1 [α] < 0 and b p−1 [β] < 0 then
3. Z p−1 [α] ← Z p−1 [α] + Z p−1 [β]
4. C p [α] ← C p [α] + C p [β]
5. if b p−1 [α] < 0 and b p−1 [β] ≥ 0 then
6. Z p−1 [β] ← Z p−1 [α] + Z p−1 [β]
7. if b p−1 [α] ≥ 0 and b p−1 [β] < 0 then
8. Z p−1 [α] ← Z p−1 [α] + Z p−1 [β]
9. if b p−1 [α] ≥ 0 and b p−1 [β] ≥ 0 then
10. if b p−1 [α] ⊑ b p−1 [β] then
11. Z p−1 [β] ← Z p−1 [α] + Z p−1 [β]
12. else
13. Z p−1 [α] ← Z p−1 [α] + Z p−1 [β]
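In code, the comparison i ⊑ j is immediate once the arrow directions are recorded; a Python sketch (ours), where forward[k] is True iff ϕk points forward:

def leq(i, j, forward):
    # i ⊑ j per Definition 4.13.
    if i == j:
        return True
    if i < j:
        return forward[j - 1]    # i < j: need phi_{j-1} forward
    return not forward[i - 1]    # j < i: need phi_{i-1} backward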

• Case ϕi is backward: From Ki to Ki+1 , a p-simplex σi is deleted. If there is a column in Z p


containing σi , then there are some p-cycles missing going from Ki to Ki+1 and an interval
in dimension p dies. Otherwise, an interval in dimension p − 1 gets born.

– Birth: First, the boundaries in Z p−1 need to be updated so that they form a basis of
B p−1 (Ki+1 ):
1. while there are two columns Z p−1 [α], Z p−1 [β] with negative birth timestamps s.t.
C p [α], C p [β] contain σi do
2. if pivot(Z p−1 [α]) > pivot(Z p−1 [β]) then
3. Z p−1 [α] ← Z p−1 [α] + Z p−1 [β]
4. C p [α] ← C p [α] + C p [β]
5. else
6. Z p−1 [β] ← Z p−1 [α] + Z p−1 [β]
7. C p [β] ← C p [α] + C p [β]


Then, let Z p−1 [α] be the only column with negative birth timestamp in Z p−1 such that
C p [α] contains σi ; set b p−1 [α] = i + 1. Note that Z p−1 [α] is homologous to ∂(σi ) in
Ki+1 , and the pivots are automatically distinct.
– Death: First, update C p so that no columns of C p contain σi :
1. Let Z p [α] be a column of Z p containing σi .
2. For each column C p [β] of C p containing σi (it suffices to iterate over columns C p [β] for which Z p−1 [β] is a boundary), set C p [β] = C p [β] + Z p [α].
Then, remove σi from Z p :
1. α1 , . . . , αk ← indices of all columns of Z p containing σi
2. sort α1 , . . . , αk s.t. b p [α1 ] ⊑ . . . ⊑ b p [αk ].
3. z ← Z p [α1 ]
4. for α ← α2 , . . . , αk do
5. if pivot(Z p [α]) > pivot(z) then
6. Z p [α] ← Z p [α] + z
7. else
8. temp ← Z p [α]
9. Z p [α] ← Z p [α] + z
10. z ← temp
11. output the p-th interval [b p [α1 ], i]
12. delete the column Z p [α1 ] from Z p and delete b p [α1 ] from b p

At the end of the algorithm, for each p and each column Z p [α] of Z p with non-negative birth timestamp, output the p-th interval [b p [α], n]. Notice that while outputting the bars, the algorithm can easily report the types of the bars by looking at the relevant arrows as described before.
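The linear-algebraic core above, expressing ∂(σi ) in the columns of Z p−1 , is the usual pivot-driven reduction. With Z2 columns stored as integer bit vectors it takes a few lines; a Python sketch with our own names, assuming the columns have distinct pivots and span a space containing the input (as properties 1 and 2 guarantee):

def pivot(col):
    # Index of the highest 1-bit of a Z2 column stored as an int (-1 if zero).
    return col.bit_length() - 1

def express(bd, cols):
    # Return the index set I with bd equal to the XOR of cols[a] for a in I.
    # cols: columns with pairwise distinct pivots; each step strictly
    # lowers the pivot of the remainder, so the loop terminates.
    by_pivot = {pivot(c): a for a, c in enumerate(cols)}
    I = set()
    while bd:
        a = by_pivot[pivot(bd)]   # the unique column sharing the current pivot
        bd ^= cols[a]
        I.add(a)
    return I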

4.4 Persistence for zigzag towers


So far, we have considered computing persistence for towers where the maps are all in the forward direction though they may not be inclusions, and for zigzag filtrations where the maps may point in both forward and backward directions but must be inclusions. In this section, we consider zigzag towers that combine both, that is, the maps are simplicial (not necessarily inclusions) and may point in both the forward and backward directions:

K : K0 ↔f0 K1 ↔f1 K2 ↔f2 · · · ↔fn−1 Kn .   (4.7)

Recall that by Proposition 4.6 each map fi : Ki → Ki+1 can be decomposed into elementary
inclusions and elementary collapses. So, without loss of generality, we assume that every fi is
either an elementary inclusion or an elementary collapse.

Figure 4.6: Elementary collapse (u, v) → u: the cone u ∗ St v adds edges uw, uv, ux, triangles
uwx, uvx, uvw, and the tetrahedron uvwx.

First, we propose a simulation of an elementary collapse with a coning strategy that only
requires additions of simplices.
Let f : K → K′ be an elementary collapse. Assume that the induced vertex map collapses vertices u, v ∈ K to u ∈ K′, and is the identity on other vertices. For a subcomplex X ⊆ K, define the cone u ∗ X to be the complex ∪σ∈X {σ ∪ {u}}. Consider the augmented complex

K̂ := K ∪ (u ∗ St v).

In other words, for every simplex {u0 , . . . , ud } ∈ St v of K, we add the simplex {u0 , . . . , ud } ∪ {u} to K̂ if it is not already in K. See Figure 4.6. Notice that K′ is a subcomplex of K̂ in this example, which we observe is true in general.

Claim 4.1. K′ ⊆ K̂.
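In code, K̂ is built by coning the closed star of v from u (closedness keeps K̂ a simplicial complex, consistent with the added simplices uw, uvw, . . . in Figure 4.6). A Python sketch with our own naming:

from itertools import combinations

def cone_augment(K, u, v):
    # K_hat = K ∪ (u * St v) for a complex K given as a set of frozensets
    # closed under taking faces; St v is taken closed here.
    closed_star = {frozenset(f)
                   for s in K if v in s
                   for r in range(1, len(s) + 1)
                   for f in combinations(s, r)}
    return K | {s | {u} for s in closed_star}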
Now consider the inclusions ι : K ,→ K̂ and ι′ : K′ ,→ K̂. These inclusions along with the elementary collapse constitute a diagram in Figure 4.6 which does not necessarily commute. Nevertheless, it commutes at the homology level, which is precisely stated below.

Proposition 4.12. In the zigzag module Hp (K) →ι∗ Hp (K̂) ←ι′∗ Hp (K′) induced by the inclusions ι and ι′, the linear map ι′∗ is an isomorphism and f∗ : Hp (K) → Hp (K′) equals (ι′∗ )−1 ◦ ι∗ .
Proof. We use the notion of contiguous maps, which induce equal maps at the homology level. Recall that two maps f1 : K1 → K2 , f2 : K1 → K2 are contiguous if for every simplex σ ∈ K1 , f1 (σ) ∪ f2 (σ) is a simplex in K2 . We observe that the simplicial maps ι′ ◦ f and ι are contiguous and that ι′ induces an isomorphism at the homology level, that is, ι′∗ : Hp (K′) → Hp (K̂) is an isomorphism. Since ι is contiguous to ι′ ◦ f , we have ι∗ = (ι′ ◦ f )∗ = ι′∗ ◦ f∗ . Since ι′∗ is an isomorphism, (ι′∗ )−1 exists and is an isomorphism. It then follows that f∗ = (ι′∗ )−1 ◦ ι∗ . □


Proposition 4.12 allows us to simulate the persistence of a simplicial tower with only inclusion-
induced homomorphisms which, in turn, allows us to consider a simplicial zigzag filtration. More
specifically, the simplicial tower in Eqn. (4.7) generates the zigzag persistence module by induced
homomorphisms fi∗ :

Hp (K0 ) ↔f0∗ Hp (K1 ) ↔f1∗ Hp (K2 ) ↔f2∗ · · · ↔fn−1∗ Hp (Kn ).   (4.8)

With our observation that every map fi ∗ can be simulated with an inclusion induced map, our goal
is to replace the original simplicial tower in Eqn. (4.7) with a zigzag filtration so that we can take
advantage of the algorithm in Section 4.3. In view of Proposition 4.12, the two diagrams shown
in Figure 4.7 commute, the one on left corresponds to a forward collapse fi : Ki → Ki+1 and the
other on right corresponds to a backward collapse fi : Ki ← Ki+1 .


Figure 4.7: Top modules induced from an elementary collapse are isomorphic to the modules
induced by inclusions at the bottom.

Observe that, if fi is an inclusion instead of a collapse, we can still construct similar com-
muting diagrams. In that case, we simply take K̂i = Ki+1 when fi is a forward inclusion and take
K̂i+1 = Ki when fi is a backward inclusion.
Now, we can expand each fi ∗ of the persistence module in Eqn. (4.8) by juxtaposing it with
an equality as in the top modules shown in Figure 4.7. Then, this expanded module becomes
isomorphic to the modules induced by inclusions at the bottom of the commuting diagrams.
In general, we first consider the expansion of the module in Eqn. (4.8) to the following module in Eqn. (4.9) where S i = Ki+1 , gi = fi , and hi is the equality when fi is forward, and S i = Ki , gi is the equality and hi = fi when fi is backward:

Hp (K0 ) →g0 Hp (S 0 ) ←h0 Hp (K1 ) →g1 Hp (S 1 ) ←h1 Hp (K2 ) →g2 · · · ←hn−1 Hp (Kn )   (4.9)

Using Figure 4.7, a module isomorphic to the module in Eqn. (4.9) can be constructed as given in
Eqn. (4.10) where T i = K̂i when fi is forward and T i = K̂i+1 when fi is backward. All maps are
induced by inclusions.

H p (K0 ) −→ H p (T 0 ) ←− H p (K1 ) −→ H p (T 1 ) ←− H p (K2 ) −→ · · · ←− H p (Kn ) (4.10)

The two persistence modules in Eqn. (4.9) and in Eqn. (4.10) are isomorphic because all vertical
maps in the diagram below are isomorphisms and all squares commute (Figure 4.7).
In view of the module in Eqn. (4.10), we convert the tower K in Eqn. (4.7) to the zigzag
filtration below where T i = K̂i when fi is forward and T i = K̂i+1 when fi is backward:

F : K0 ,→ T 0 ←- K1 ,→ T 1 ←- K2 ,→ · · · ←- Kn (4.11)

Figure 4.8: Modules in Eqn. 4.9 and 4.10 are isomorphic.

The zigzag filtration above is simplex-wise but does not begin with an empty complex. We can
expand K0 simplex-wise to convert the filtration to a simplex-wise filtration that begins with an
empty complex. Then, we can apply the zigzag algorithm in Section 4.3.2 to compute the barcode.

Theorem 4.13. The persistence diagram of K can be derived from that of the filtration F.

Example 4.1. Consider the tower in Eqn. (4.12) where each map is an elementary collapse, and the persistence module induced by it in Eqn. (4.13). This module can be expanded and its isomorphic module is shown at the bottom of the commuting diagram in Figure 4.9.

K0 →f0 K1 ←f1 K2 →f2 · · · →fn−1 Kn   (4.12)

Hp (K0 ) →f0∗ Hp (K1 ) ←f1∗ Hp (K2 ) →f2∗ · · · →fn−1∗ Hp (Kn )   (4.13)

We obtain the following zigzag filtration that corresponds to the module at the bottom of the
diagram in Figure 4.9. Hence, we can compute the barcode for the input tower in Eqn. (4.12)
from this zigzag filtration.

K0 ,→ K̂0 ←- K1 ,→ K̂2 ←- K2 ,→ · · · ←- Kn (4.14)

Figure 4.9: Commuting diagram for the module in Eqn. (4.13) and its isomorphic module.

Remark 4.4. Notice that, when fi is an inclusion, we can eliminate introducing the middle column
in Figure 4.8 which will translate into eliminating some of the inclusions in the sequence in
Eqn. (4.11). We introduced these extraneous inclusions just to make the expanded module generic
in the sense that its inclusions reverse the directions alternately.

4.5 Levelset zigzag persistence


Now, we consider a special type of zigzag persistence stemming from a function over a topological
space. In standard persistence, growing sublevel sets of the function constitute the filtration over
which the persistence is defined. In levelset zigzag persistence, we replace the sublevel sets with
level sets and interval sets and the maps going from the level sets to the adjacent interval sets give
rise to a zigzag filtration. To produce a zigzag filtration corresponding to a level set persistence,
we consider a PL-function on the underlying space of a simplicial complex and then convert a
zigzag sequence of subspaces (level and interval sets) into subcomplexes. This is similar to what
we did while considering the standard persistence for a PL-function in Sections 3.1 and 3.5.
Before we focus on a PL-function, let us consider a more general real-valued continuous
function f : X → R on a topological space X. We need a restriction on f that keeps all homology
groups being considered to be finite. For a real value s ∈ R and an interval I ⊆ R, we denote the
level set f −1 (s) by X=s and the interval set f −1 (I) by XI .

Definition 4.14 (Critical, regular value). An open interval I ⊆ R is called a regular interval if there exist a topological space Y and a homeomorphism Φ : Y × I → XI so that f ◦ Φ is the projection onto I and Φ extends to a continuous function Φ̄ : Y × Ī → XĪ where Ī is the closure of I. We assume that f is of Morse type [63], meaning that each levelset X=s has finitely generated homology groups and there are finitely many values a0 = −∞ < a1 < · · · < an < an+1 = +∞, called critical, so that each interval (ai , ai+1 ) is a maximal interval that is regular. A value s ∈ (ai , ai+1 ) is then called a regular value.

The original construction [63] of level set (henceforth written levelset) zigzag persistence
picks regular values s0 , s1 , . . . , sn so that each si ∈ (ai , ai+1 ). Then, the levelset zigzag filtration of
f is defined as follows:

X[s0 ,s1 ] ←- · · · ,→ X[si−1 ,si ] ←- X=si ,→ X[si ,si+1 ] ←- · · · ,→ X[sn−1 ,sn ] .

This construction relies on a choice of regular values and there is no canonical choice. As we
work on simplicial complexes, different regular values can result in different complexes in the
filtration. Therefore, we adopt the following alternative definition of a levelset zigzag filtration X,
which does not rely on a choice of regular values:

X : X(a0 ,a2 ) ←- · · · ,→ X(ai−1 ,ai+1 ) ←- X(ai ,ai+1 ) ,→ X(ai ,ai+2 ) ←- · · · ,→ X(an−1 ,an+1 ) . (4.15)

A space of the type X(ai−1 ,ai+1 ) contains a critical value ai and hence is called a critical space. For a similar reason, a space of the type X(ai ,ai+1 ) , which does not contain any critical value, is called a regular space. Considering the homology groups of the spaces, we get the zigzag persistence
module:

H p X : H p (X(a0 ,a2 ) ) ← · · · → H p (X(ai−1 ,ai+1 ) ) ← H p (X(ai ,ai+1 ) ) → H p (X(ai ,ai+2 ) ) ← · · · → H p (X(an−1 ,an+1 ) ).

Note that X(ai ,ai+1 ) deformation retracts to X=si and X(ai−1 ,ai+1 ) deformation retracts to X[si−1 ,si ] ,
so the zigzag modules induced by the two diagrams are isomorphic, i.e., equivalent at the persis-
tent homology level. See Figure 4.10 for an example of a levelset zigzag filtration.

Figure 4.10: A torus with four critical values. The real-valued function is the height function over
the horizontal line. The first several subspaces in the levelset zigzag diagram are given and the
remaining ones are symmetric. Empty dot indicates that the point is not included.

Generation of barcode for levelset zigzag. The interval decomposition of the module H p X
gives the barcode for the zigzag persistence. However, the endpoints of the bars may belong to
either the index of a critical or regular space. If it belongs to a critical space X(ai−1 ,ai+1 ) , we map
it to the critical value ai . Otherwise, if it belongs to a regular space X(ai ,ai+1 ) , we map it to the
regular value si . After this conversion, still the bars do not end solely in critical values. We
modify the endpoints further. In keeping with the understanding that even the levelset homology classes do not change over the regular spaces, we convert an endpoint si to an adjacent critical value and make the bar (interval module) open at that critical value. Precisely, we modify the bars as (i) [ai , a j ] ⇔ [ai , a j ], (ii) [ai , s j ] ⇔ [ai , a j+1 ), (iii) [si , a j ] ⇔ (ai , a j ], and (iv) [si , s j ] ⇔ (ai , a j+1 ). As in the case of a standard zigzag filtration, the intervals in (i)-(iv) are referred to as closed-closed, closed-open, open-closed, and open-open bars respectively. Our goal is to compute these four types of bars for a PL-function where the space X is the underlying space of a simplicial complex K.
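The conversion rules (i)-(iv) amount to a small lookup. A Python sketch (ours), encoding an endpoint as ('a', i) for the critical value ai or ('s', i) for the regular value si ∈ (ai , ai+1 ):

def to_critical_bar(b, d):
    # Rules (i)-(iv): a regular left end s_i opens at a_i; a regular right
    # end s_j becomes an open end at a_{j+1}.
    (kb, i), (kd, j) = b, d
    lo = ('[', i) if kb == 'a' else ('(', i)
    hi = (j, ']') if kd == 'a' else (j + 1, ')')
    return '%sa%d, a%d%s' % (lo[0], lo[1], hi[0], hi[1])

For example, to_critical_bar(('s', 2), ('s', 5)) returns '(a2, a6)', an open-open bar, matching rule (iv).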

4.5.1 Simplicial levelset zigzag filtration


We now turn to a simplicial version of the construction we just described. For a given complex
K, let X = |K| and f : X → R be a PL-function defined by interpolating values on the vertices of
K (Definition 3.2). We also assume f to be generic, that is, no two vertices of K have the same
function value.
We know that f can have critical values only at K’s vertices (Section 3.5.1). We call these
vertices critical and call other vertices regular. Let v1 , . . . , vn be all the critical vertices of f
with values a1 < · · · < an , and let a0 = −∞, an+1 = +∞ be two additional critical values. For two critical values ai < a j , let X(i, j) := X(ai ,a j ) and let K(i, j) be the complex {σ ∈ K | ∀ v ∈ σ, f (v) ∈ (ai , a j )}. Then, the space and simplicial levelset zigzag filtrations X and K of f are defined respectively as:

X : X(0,2) ←- · · · ,→ X(i−1,i+1) ←- X(i,i+1) ,→ X(i,i+2) ←- · · · ,→ X(n−1,n+1) (4.16)

K : K(0,2) ←- · · · ,→ K(i−1,i+1) ←- K(i,i+1) ,→ K(i,i+2) ←- · · · ,→ K(n−1,n+1) (4.17)

A complex of the form K(i,i+1) in the filtration is called a regular complex and a complex of the form K(i,i+2) is called a critical complex. Note that while we can expect the space and simplicial levelset zigzag filtrations for a finely tessellated complex to be equivalent, this is not always the case. For example, in Figure 4.11, let K′ be the complex on the left; K′(i,i+1) (thick edges) is not homotopy equivalent to |K′|(i,i+1) , and hence the simplicial levelset zigzag filtration is not equivalent to the space one. We observe that the non-equivalence is caused by the two central triangles which contain more than one critical value. A subdivision of the two central triangles in the complex K′′ on the right, where no triangle contains more than one critical value, renders |K′′|(i,i+1) deformation retracting to K′′(i,i+1) . Based on this observation, we formulate the following property, which guarantees that the module of the simplicial levelset zigzag filtration remains isomorphic to that of the space one.

Figure 4.11: Simplicial zigzag filtration is made equivalent to the space filtration by subdivision.

Definition 4.15. A complex K is called compatible with the levelsets of a PL-function f : |K| → R if for every simplex σ of K, the function values attained over its convex hull |σ| include at most one critical value of f .

Given a PL-function f on a complex K, one can make K compatible with the levelsets of f
by subdividing K with barycentric subdivisions; see e.g. [103].

Proposition 4.14. Let K be compatible with the levelsets of f , and let X = |K|; one has that
X(ai ,a j ) deformation retracts to K(i, j) for any two critical values ai < a j . Therefore, the zigzag
modules induced by the space and the simplicial levelset zigzag filtrations are isomorphic.

Our goal is to compute the four types of bars for the zigzag filtration X from its simplicial
version K. For this, we make K simplex-wise and call it F. First, F starts and ends with the same
original complexes in K. Second, whenever an inclusion in K is expanded so that one simplex is
added at a time, the addition follows the order of the simplices’ function values. Formally, for the
inclusion K(i,i+1) ,→ K(i,i+2) in K, let u1 = vi+1 , u2 , . . . , uk be all the vertices with function values
in [ai+1 , ai+2 ) such that f (u1 ) < f (u2 ) < · · · < f (uk ); then, the lower stars of u1 , . . . , uk are added
in sequence by F. Note that for each u j ∈ {u1 , . . . , uk }, we do not restrict how simplices in the lower star of u j are added. For the inclusion K(i−1,i+1) ←- K(i,i+1) in K, everything is reversed, i.e., vertices are ordered in decreasing function values and upper stars are added. With this expansion, the zigzag filtration K in Eqn. (4.17) is converted to the filtration F shown below, where a dashed arrow (⇢ or ⇠) indicates insertions of one or more simplices and a solid arrow indicates a single simplex insertion. In particular, we indicate that the backward inclusion K(i−1,i+1) ←- K(i,i+1) is expanded into a simplex-wise filtration.

F : · · · ⇢ K(i−1,i+1) ←- · · · ←- Kℓ−1 ←- Kℓ ←- · · · ←- K(i,i+1) ⇢ K(i,i+2) ⇠ · · ·   (4.18)

After expanding all forward and backward inclusions to make them simplex-wise, we obtain a
zigzag filtration whose complexes can be indexed by 0, 1, . . . , n as we assume next.

4.5.2 Barcode for levelset zigzag filtration


One can compute the barcode for the zigzag filtration F in Eqn. (4.18) that is derived from the original zigzag filtration K in Eqn. (4.17). There is one technicality that we need to take care of. To apply the algorithm in Section 4.3.2, we need the input zigzag filtration to begin with an
empty complex. The filtration F as constructed from expanding K has the first complex K(0,2) that
is non-empty. So, as before, we expand K(0,2) simplex-wise and begin F with an empty complex.
We assume below this is the case for F.
The bars in the barcode for F do not necessarily coincide with the four types of bars for K
with endpoints only in critical values. However, we can read the bars for K from the bars of F.
First, assume that F is indexed as

F : ∅ = K0 ↔ K1 ↔ · · · ↔ Kn−1 ↔ Kn .

This means that a complex K j , j > 0, is of one of four categories: (i) it is a complex in the expansion of the backward inclusion K(i−1,i+1) ←- K(i,i+1) ; (ii) it is a complex in the expansion of the forward inclusion K(i,i+1) ,→ K(i,i+2) ; (iii) it is a regular complex K(i,i+1) for some i > 0; (iv) it is a critical complex K(i−1,i+1) for some i > 0. The types of complexes where the endpoints of a bar [b, d] for F are located determine the bars for K and hence X, which can be of four types: closed-closed [ai , a j ], closed-open [ai , a j ), open-closed (ai , a j ], and open-open (ai , a j ).

Let [b, d] be a bar for F. If both Kb and Kd appear in the expansion of a forward inclusion K(i,i+1) ,→ K(i,i+2) , we ignore the bar because it is an artificial bar created due to expanding the filtration K into the filtration F. Similarly, we ignore the bar if both Kb and Kd appear in the expansion of a backward inclusion K(i−1,i+1) ←- K(i,i+1) . We explain the other cases below.

(Case 1.) Kb is either a regular complex K(i,i+1) or in the expansion of K(i−1,i+1) c- K(i,i+1) : the
complex Kb is a subcomplex of the critical complex K(i−1,i+1) which stands for the critical value
ai . So, the end b is mapped to ai and made open because the class for the bar [b, d] does not exist
in K(i−1,i+1) .

(Case 2.) Kb is either the critical complex K(i,i+2) or in the expansion of K(i,i+1) ,d K(i,i+2) : the
complex is a subcomplex of the critical complex K(i,i+2) which stands for the critical value ai+1 .
So, the end b is mapped to ai+1 and is closed because the class for [b, d] is alive in K(i,i+2)

(Case 3.) Kd is the critical complex K(i−1,i+1) or is in the expansion of the backward inclusion
K(i−1,i+1) c- K(i,i+1) : the complex is a subcomplex of the critical complex K(i−1,i+1) which stands
for the critical value ai . So, the end d is mapped to ai and made closed because the class for the
bar [b, d] exists in K(i−1,i+1) .

(Case 4.) Kd is either the regular complex K(i,i+1) or in the expansion of K(i,i+1) ,→ K(i,i+2) : the
complex is a subcomplex of the critical complex K(i,i+2) which stands for the critical value ai+1 .
So, the end d is mapped to ai+1 and is open because the class for [b, d] is not alive in K(i,i+2) .
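For concreteness, the four cases can be summarized in a small routine. The following Python sketch is ours, not part of the original construction: it assumes that while expanding K into F we have recorded, for each index j, its kind[j] in {'reg', 'crit', 'fwd', 'bwd'} (regular complex, critical complex, forward expansion, backward expansion) and the indices cv_left[j] and cv_right[j] of the critical values of the nearest critical complexes at or before, respectively at or after, position j in F.

def bar_of_K(b, d, kind, cv_left, cv_right):
    """Translate a bar [b, d] of F into a bar for K, or None if artificial."""
    # Artificial bars: both endpoints lie in the expansion of one inclusion.
    if kind[b] == kind[d] == 'fwd' and cv_right[b] == cv_right[d]:
        return None
    if kind[b] == kind[d] == 'bwd' and cv_left[b] == cv_left[d]:
        return None
    # Cases 1 and 2: the birth end.
    if kind[b] in ('reg', 'bwd'):
        left = (cv_left[b], 'open')      # mapped to a_i, made open
    else:                                # 'crit' or 'fwd'
        left = (cv_right[b], 'closed')   # mapped to a_{i+1}, made closed
    # Cases 3 and 4: the death end.
    if kind[d] in ('crit', 'bwd'):
        right = (cv_left[d], 'closed')   # mapped to a_i, made closed
    else:                                # 'reg' or 'fwd'
        right = (cv_right[d], 'open')    # mapped to a_{i+1}, made open
    return left, right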

4.5.3 Correspondence to sublevel set persistence


Standard persistence as we have seen already is defined by considering the sublevel sets of f , that
is, X[0,i] = f −1 [s0 , si ] = f −1 (−∞, si ] where si ∈ (ai , ai+1 ) is a regular value. We get the following
sublevel set diagram:
X : X[0,0] → X[0,1] → · · · → X[0,n] .

Then, considering f to be a PL-function on X = |K|, we have already seen in Section 3.5 that X
can be converted to a simplicial filtration K shown below where K[0,i] = {σ ∈ K | f (σ) ≤ ai }. This
filtration can further be converted into a simplex-wise filtration which can be used for computing
Dgm p (K) for p ≥ 0.

K : K[0,0] → K[0,1] → K[0,2] → · · · → K[0,n]

The bars for this case have the form [ai , a j ) where a j can be an+1 = ∞. Each such bar is closed at
the left endpoint because the homology class being born exists at K[0,i] . However, it is open at the
right endpoint because it does not exist at K[0, j] .
One can see that there are two types of bars in the sublevel set persistence, one of the type
[ai , a j ), j ≤ n, which is bounded on the right, and the other of the type [ai , ∞) = [ai , an+1 )
which is unbounded on the right. The unbounded bars are the infinite bars we introduced in
Section 3.2.1. They correspond to the essential homology classes since H p (K) ≅ ⊕i [ai , ∞). The
works [59, 63] imply that both types of bars of the standard persistence can be recovered
from those of the levelset zigzag persistence as the theorem below states:
Theorem 4.15. Let K and K′ denote the filtrations for the sublevel sets and the level sets respectively
induced by a continuous function f on a topological space with critical values a0 , a1 , · · · , an+1
where a0 = −∞ and an+1 = ∞. For every p ≥ 0,
1. [ai , a j ), j ≠ n + 1, is a bar for Dgm p (K) iff it is so for Dgm p (K′),
2. [ai , an+1 ) is a bar for Dgm p (K) iff either [ai , a j ] is a closed-closed bar for Dgm p (K′) for
some a j > ai , or (a j , ai ) is an open-open bar for Dgm p−1 (K′) for some a j < ai .

4.5.4 Correspondence to extended persistence


There is another persistence considered in the literature under the name extended persistence [103],
and it turns out that there is a correspondence between extended persistence and level set zigzag
persistence. For a real-valued function f : X → R, let X[0,i] denote the sublevel set f −1 [s0 , si ] as
before and X[i,n] denote the superlevel set f −1 [si , sn ]. Then, a persistence module that considers
the sublevel set filtration first and then juxtaposes it with a filtration of quotient spaces of X as
shown below gives the notion of extended persistence:

X : X[0,0] ,→ · · · ,→ X[0,n] ,→ (X[0,n] , X[n,n] ) ,→ · · · ,→ (X[0,n] , X[0,n] ).


Observe that each inclusion map between two quotient spaces induces a linear map between their relative
homology groups. One can see that the above sequence arises by first growing the space to the
full space X[0,n] with sublevel sets and then shrinking it by quotienting with the superlevel sets.
Again, taking f : X → R as a PL-function on X = |K|, we get the simplicial extended filtration
where K[0,i] = {σ ∈ K | f (σ) ≤ ai } and K[i,n] = {σ ∈ K | f (σ) ≥ ai }.

E : K[0,0] ,→ · · · ,→ K[0,n] ,→ (K[0,n] , K[n,n] ) ,→ · · · ,→ (K[0,n] , K[0,n] ).

The decomposition of the persistence module H p E arising out of E provides the bars in Dgm p (E).
For the first part of the sequence, the endpoints of the bars are designated with respective function
values ai as before. For the second part, the birth or death point of a bar is designated as an+i if its
class either is born in (K[0,n] , K[i,n] ) or dies entering into (K[0,n] , K[i,n] ) respectively for 0 ≤ i ≤ n.
We leave the proof of the following theorem as an exercise; see also [63].

Theorem 4.16. Let K and E denote the simplicial levelset zigzag filtration and the extended
filtration of a PL-function f : |K| → R. Then, for every p ≥ 0,

1. [ai , a j ) is a bar for Dgm p (K) iff it is a bar for Dgm p (E),

2. (ai , a j ] is a bar for Dgm p (K) iff [an+ j , an+i ) is a bar for Dgm p+1 (E),

3. [ai , a j ] is a bar for Dgm p (K) iff [ai , an+ j ) is a bar for Dgm p (E),

4. (ai , a j ) is a bar for Dgm p (K) iff [a j , an+i ) is a bar for Dgm p+1 (E).
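Theorem 4.16 can be read as a mechanical translation rule. As an illustration, the following Python sketch (the encoding of bars as index pairs is ours) maps a bar of the levelset zigzag filtration K to the corresponding bar of the extended filtration E, writing a_{n+k} simply as the index n + k:

def to_extended(p, i, j, left_closed, right_closed, n):
    """Map a bar for Dgm_p(K) to (dimension, bar) for Dgm(E)."""
    if left_closed and not right_closed:    # [a_i, a_j), item 1
        return p, (i, j)
    if right_closed and not left_closed:    # (a_i, a_j], item 2
        return p + 1, (n + j, n + i)        # becomes [a_{n+j}, a_{n+i})
    if left_closed and right_closed:        # [a_i, a_j], item 3
        return p, (i, n + j)                # becomes [a_i, a_{n+j})
    return p + 1, (j, n + i)                # (a_i, a_j), item 4: [a_j, a_{n+i})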

Clearly, for two persistence modules H p E and H p E′ arising out of two extended filtrations E
and E′, the stability of persistence diagrams holds, that is, db (Dgm p E, Dgm p E′) = dI (H p E, H p E′)
(Theorem 3.11).

4.6 Notes and Exercises


Computation of persistent homology induced by simplicial towers generalizing filtrations was
considered in the context of TDA by Dey, Fan, and Wang [122]. They gave two approaches to compute
persistence diagrams for such towers: one converts a tower to a zigzag filtration as we
described in Section 4.4, and the other uses annotations in combination with the
link conditions that allow edge collapses without altering homotopy types, as described in
Section 4.2.1. The first approach apparently increases the size of the filtration, which motivated
the second approach. Kerber and Schreiber showed that the first approach can indeed be leveraged
to produce filtrations instead of zigzag filtrations without blowing up sizes [210].
The concept of zigzag modules, obtained from a zigzag filtration by taking the homology
groups and the linear maps induced by inclusions, is closely related to the quiver theory of Gabriel [163],
which was brought to the attention of the TDA community by Carlsson and de Silva [62]. They were
the first to propose the concept of zigzag persistence and its computation [62]. They observed
that any zigzag module can be decomposed into a set of other zigzag modules where the forward
non-zero maps are only injective and the backward non-zero maps are only surjective. Although
they did not compute this decomposition, they used its existence to design an algorithm for com-
puting the interval decomposition of a given zigzag module. Later, with Morozov, they used these
concepts to present an O(n3 ) algorithm for computing the persistence of a simplex-wise zigzag
filtration with n arrows [63]. Milosavljević et al. [234] improved the algorithm for any zigzag
filtration with n arrows to have a time complexity of O(nω + n2 log2 n), where ω ∈ [2, 2.373) is the
exponent for matrix multiplication. Maria and Oudot [228] presented a different algorithm where
they showed how a filtration of the last complex in the prefix of a zigzag filtration can help
compute the persistence incrementally. The algorithm in this chapter draws upon these approaches
though it is presented quite differently. Indeed, an adaptation of the presented approach to graphs led
to recent near-linear time algorithms for zigzag persistence on graphs [127].

Given a real valued function f : X → R on a topological space X, the level sets at the critical
and intermediate values give rise to a levelset zigzag filtration as shown in Section 4.5. Carlsson,
de Silva, and Morozov [63] introduced this setup and observed the decomposition of the zigzag
module into interval modules with open or closed ends. The four types of bars arising out of
this zigzag module give more information than the standard sublevel set persistence which only
outputs closed-open and infinite bars. It was observed in [59] that the open-open and closed-
closed bars indeed capture the infinite bars of the sublevel set persistence with an appropriate
dimension shift. Theorem 4.15 summarizes this connection. The extended persistence, originally
proposed for surfaces [5] and later extended to filtrations [103], also computes all four types of
bars, but they are described differently using the persistence diagrams rather than open and closed
ends.

Exercises
1. Show that the inequality in Proposition 4.1 cannot be improved to equality by giving a
counterexample.

2. Prove Proposition 4.5.

3. Prove Proposition 4.6.

4. Prove Proposition 4.7.

5. Prove Proposition 4.8.

6. For computing the persistence of a simplicial tower, we checked the link condition in all
dimensions. Argue that it is sufficient to check the condition only for three relevant dimen-
sions.

7. Let K be a triangulated 2-manifold of genus g without boundary. Consider the following


tasks:

• Compute the genus g by the formula 2 − 2g = #vertices − #edges + #triangles.


• Compute a spanning tree T of the 1-skeleton of K, and a spanning tree T ∗ of the dual
graph none of whose edges are dual to any edge in T .
• Annotate the edges in T with the zero vector of length 2g, and index the edges not in T
whose duals are not in T ∗ as e1 , . . . , e2g . Annotate ei with the vector that has the ith
entry 1 and all other entries 0.
• Propagate systematically the annotation to the rest of the edges.

Complete the above approach with a proof of correctness into an algorithm that computes
the annotation for edges in O(gn) time if K has n simplices.

8. Do we get the same barcode if we run the zigzag persistence algorithm given in Sec-
tion 4.3.1 and the standard persistence algorithm on a non-zigzag filtration? If so, prove it.
If not, show the difference and suggest a modification to the zigzag persistence algorithm
so that both outputs become the same.

9. Suppose that a persistence module {Vi → Vi+1 } is presented with the linear maps fi : Vi → Vi+1
given as matrices whose columns and rows correspond to fixed bases of Vi and Vi+1 respectively. Design an
algorithm to compute the barcode for the input module. Do the same when the input module
is a zigzag tower.

10. ([127]) We have seen that for graphs a near-linear time algorithm exists for computing non-
zigzag persistence. Design a near-linear time algorithm for computing zigzag persistence
for graphs.

11. Consider a PL-function f : |K| → R.

(a) Design an algorithm to compute the barcode of − f from a level set zigzag filtration
of f .
(b) Show that f and − f produce the same closed-closed and open-open bars for the lev-
elset zigzag filtration.
(c) In general, given a zigzag filtration F, consider the filtration F′ obtained by considering
F in the opposite direction, from right to left. What is the relation between the barcodes
of these two filtrations?

12. We computed persistence of zigzag towers by first converting them into a zigzag filtration and
then using the algorithm in Section 4.3 to compute the bars. Design an algorithm that skips
the intermediate conversion to a filtration.

13. Design an algorithm for computing the extended persistence from a given PL-function on
an input simplicial complex.

14. ([63]) Prove Theorem 4.16.


Chapter 5

Generators and Optimality

So far we have focused mainly on the rank of the homology groups. However, the homology
generators, that is, the cycles whose classes constitute the elements of the homology groups, carry
information about the space. Computing just some generating cycles (cycle basis) typically can
be done by the standard algorithms for computing homology groups such as the persistence
algorithms. In practice, however, we may sometimes be interested in generating cycles that have
some optimal property; see Figure 5.1.

Figure 5.1: Double torus has 1-st homology group of rank four meaning that classes of four
representative cycles generate H1 ; (left) a non-optimal cycle basis, (right) an optimal cycle basis.
In particular, if the space has a metric associated with it, one may associate a measure with
the cycles that can differentiate them in terms of their ‘size’. For example, if K is a simplicial
complex embedded in Rd , the measure of a 1-cycle can be its length. Then, we can ask to compute
a set of 1-cycles whose classes generate H1 (K) and whose total length is minimum among all such sets
of cycles. Typically, the locality of these cycles captures interesting geometric features of the
space |K|. Some applications may benefit from computing such cycles respecting geometry. For
example, in computer graphics often a surface is cut along a set of cycles to make it flat for
parameterization. The classes of these cycles constitute a basis of the 1-st homology group. In
general, a shortest (optimal) cycle basis is desired because it produces good parameterization for
graphics rendering. Figure 5.2 shows examples of such cycles for three kinds of input where
a shortest (optimal) cycle basis has been computed with an algorithm that we describe in this
chapter. The algorithm works for simplicial complexes though we can apply it on point cloud
data as well after computing an appropriate complex such as Čech or Rips complex on top of the
input points.


It turns out that, for p > 1, the problem of computing an optimal homology basis for the p-
th homology group H p is NP-hard [94]. However, the problem is polynomial time solvable for
p = 1 [136]. A greedy algorithm, which was originally devised for computing an optimal H1 -basis
for surfaces [156], extends to general simplicial complexes as described in Section 5.1.
There is another case of optimality, namely the localization of homology classes. In this
problem, given a p-cycle c, we want to compute an optimal p-cycle c∗ in the same homology
class as c, that is, [c] = [c∗ ]. This problem is NP-hard even for p = 1 [73]. Interestingly, there are
some special cases for which an integer program formulated for the problem can be solved with
a linear program [126]. This is the topic of Section 5.2.
The two versions mentioned above do not consider the persistence framework. We may ask what
the optimal cycles are for persistent homology classes. Toward formulating the problem precisely,
we define a persistent cycle for a given bar in the barcode of a filtration. This is a cycle whose
class is created at the birth point and becomes a boundary at the death point of the bar. Among all
persistent cycles for a given bar, we want to compute an optimal one. The problem in general is
NP-hard, but one can devise polynomial time algorithms for some special cases such as filtrations
of what we call weak pseudomanifolds [129]. Section 5.3 describes these algorithms.

Figure 5.2: Computed shortest basis cycles (left) on a triangular mesh of Botijo, a well known
surface model in computer graphics, (middle) on point cloud data sampling the surface of Bud-
dha, another well known surface model in computer graphics, (right) on an isosurface generated
from volume data in visualization.

5.1 Optimal generators/basis


We now formalize the definition of optimal cycles whose classes generate the homology group.
Strictly speaking, these cycles should not be called generators because it is their classes that
generate the group. We take the liberty of calling the cycles themselves the generators.
Definition 5.1 (Weight of cycles). Let w : K(p) → R≥0 be a non-negative weight function defined
on the set of p-simplices in a simplicial complex K. We extend w to the cycle space Z p by defining
w(c) = Σi αi w(σi ) where c = Σi αi σi for αi ∈ Z2 . For a set of cycles C = {c1 , . . . , cg | ci ∈ Z p (K)},
define its weight w(C) = w(c1 ) + · · · + w(cg ).
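Over Z2 , chains and the weight function above admit a particularly simple computational representation. The following Python sketch (ours, purely illustrative) encodes a p-chain as a frozenset of the simplices having coefficient 1, so that chain addition becomes symmetric difference:

def add_chains(c1, c2):
    """Sum of two Z2 chains: simplices appearing in exactly one of them."""
    return c1 ^ c2

def weight(chain, w):
    """w(c): sum of w(sigma) over the simplices sigma with coefficient 1."""
    return sum(w[sigma] for sigma in chain)

def weight_of_set(cycles, w):
    """w(C) = w(c_1) + ... + w(c_g)."""
    return sum(weight(c, w) for c in cycles)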

Definition 5.2 (Optimal generator). A set of cycles C = {c1 , c2 , . . . , cg | ci ∈ Z p (K)} is an H p (K)-
generator if the classes {[ci ] | i = 1, . . . , g} generate H p (K). An H p (K)-generator is optimal if
there is no other generator C′ with w(C′) < w(C).

Observe that an optimal generator may not have the minimal number of cycles whose classes
generate the homology group: because we allow zero weights, an optimal generator may
contain extra cycles with zero weight. This prompts us to define the following.

Definition 5.3 (Optimal basis). An H p (K)-generator C = {c1 , c2 , . . . , cg | ci ∈ Z p (K)} is an H p (K)-
cycle basis, or H p (K)-basis in short, if g = dim H p (K). The classes of cycles in such a cycle basis
constitute a basis for H p (K). An optimal H p (K)-generator that is also an H p (K)-basis is called an
optimal H p (K)-basis.

We observe that optimal H p (K)-generators with positively weighted cycles are necessarily
cycle bases. Notice that to generate H p (K), the number of cycles in any H p (K)-generator has to
be at least β p (K) = dim H p (K). On the other hand, an optimal H p (K)-generator with positively
weighted cycles cannot have more than β p cycles because otherwise it must contain a cycle
whose class is a linear combination of the classes of the other cycles in the generator; omission
of this cycle still generates H p (K) while decreasing the weight of the generator. For dimension one,
similar reasoning can also be applied to conclude that each cycle in an H1 (K)-cycle basis nec-
essarily contains a simple cycle, and these simple cycles together form a cycle basis (Exercise 1). A 1-cycle is
simple if it has a single connected component (viewed as a graph) and every vertex has exactly
two incident edges.

Fact 5.1.

(i) An optimal H p (K)-generator with positively weighted cycles is an optimal H p (K)-basis.

(ii) Every cycle ci in an H1 (K)-basis has a simple cycle c′i ⊆ ci so that {c′i }i form an H1 (K)-basis.

We now focus on computing an optimal H p (K)-basis, a problem also known as the optimal homology
basis problem, or OHBP in short. One may observe that Definition 5.3 formulates OHBP as a
weighted ℓ1 -optimization over representatives of bases. This allows for different types of optimality
to be achieved by choosing different weights. For example, assume that the simplicial complex K
of dimension p or greater is embedded in Rd , where d ≥ p + 1. Let the Euclidean p-dimensional
volume of p-simplices be their weights. This specializes OHBP to the Euclidean ℓ1 -optimization
problem. The resulting optimal H p (K)-basis has the smallest p-dimensional volume amongst all
such bases. If the weights are taken to be unit, the resulting optimal solution has the smallest
number of p-simplices amongst all H p (K)-bases.

5.1.1 Greedy algorithm for optimal H p (K)-basis


Consider the following greedy algorithm in which we first sort the input cycles in non-decreasing
order of their weights, and then choose a cycle following this order if its class is independent of
the classes of the cycles chosen before.
The greedy algorithm Algorithm 7:GreedyBasis is motivated by Proposition 5.1. The specific
implementation of line 4 (on independence test) will be given in Section 5.1.2.

Algorithm 7 GreedyBasis(C)
Input:
A set of p-cycles C in a complex
Output:
A maximal set of cycles from C whose classes are independent and total weight is minimum
1: Sort the cycles from C in non-decreasing order of their weights; that is, C = {c1 , . . . , cn }
implies w(ci ) ≤ w(c j ) for i ≤ j
2: Let B := {c1 }
3: for i = 2 to n do
4: if [ci ] is independent w.r.t. B then
5: B := B ∪ {ci }
6: end if
7: end for
8: if [c1 ] is trivial (boundary), output B \ {c1 } else output B
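A minimal executable sketch of GreedyBasis follows, under the assumption that each candidate cycle is given together with a GF(2) coordinate vector, for instance its annotation computed as in Section 4.2.1, encoded as a Python integer bitmask; the independence test of line 4 then becomes an incremental Gaussian elimination. This is our illustration of the idea; the concrete independence test used by the algorithm is specified in Section 5.1.2.

def greedy_basis(cycles):
    """cycles: list of (weight, vector) pairs; returns the chosen pairs."""
    cycles = sorted(cycles, key=lambda cw: cw[0])  # non-decreasing weight
    pivots = {}                                    # leading bit -> reduced vector
    chosen = []
    for wt, vec in cycles:
        v = vec
        while v:
            top = v.bit_length() - 1               # position of highest set bit
            if top not in pivots:
                pivots[top] = v
                chosen.append((wt, vec))           # class independent of B
                break
            v ^= pivots[top]                       # reduce and retry
        # if v reduced to 0, the class is trivial or dependent; skip it
        # (this also drops a trivial c_1, matching line 8 of GreedyBasis)
    return chosen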

Proposition 5.1. Suppose that C, the input to the algorithm GreedyBasis, contains an optimal
H p (K)-basis. Then, the output of GreedyBasis is an optimal H p (K)-basis.

Proof. Let C contain an optimal H p (K)-basis C∗ = {c∗1 , . . . , c∗g } sorted according to their ap-
pearance in the ordered sequence C = {c1 , . . . , cn }. Let C′ = {c′1 , . . . , c′g′ } be the output of
GreedyBasis, again sorted according to the appearance of the cycles in C. By Definition 5.3, g,
the cardinality of C∗ , is the dimension of H p (K) and hence g′ ≤ g because g + 1 or more classes
cannot be independent in H p (K).
Among all optimal H p (K)-bases that C contains, take C∗ to be lexicographically smallest, that
is, there is no other sorted C̃∗ = {c̃∗1 , . . . , c̃∗g } so that there exists a j ≥ 1 where c̃∗1 = c∗1 , . . . , c̃∗j−1 =
c∗j−1 and c̃∗j = ck and c∗j = cℓ with k < ℓ.
First, we show that C′ is a prefix of C∗ . If not, there is a least index j ≥ 1 so that c∗j ≠ c′j .
Since the classes of the cycles in C∗ form a basis for H p (K), and C′ cannot contain any trivial cycle
(ensured by step 8), the class [c′j ] can be written as a linear combination of the classes of the cycles
in C∗ . Consider the class [c∗k ] in this linear combination with the largest index k. It is not possible
that c∗k appears before c′j in the order. This is because then [c′j ] would be a linear combination of the
classes of the cycles appearing before c′j in C′ , which is impossible by the construction of C′ . So,
assume that c∗k appears after c′j . Then, consider the sorted sequence of cycles C̃∗ constructed by
replacing c∗k in C∗ with c′j . Notice that C̃∗ is lexicographically smaller than C∗ and it is also
an H p (K)-basis; moreover, since c′j appears before c∗k in the weight-sorted order, w(c′j ) ≤ w(c∗k )
and hence C̃∗ is also optimal, contradicting the fact that C∗ is the lexicographically smallest optimal cycle basis.
The fact that C̃∗ is an H p (K)-cycle basis follows from the observation that [c′j ] is independent
of the classes of the cycles in C∗ \ {c∗k } because [c′j ] is a linear combination of the classes that
necessarily include [c∗k ].
Now, to complete the proof, we note that g′ = g. If not, then g′ < g and C′ is a prefix of C∗ .
But then one can add c∗g′+1 from C∗ to C′ where [c∗g′+1 ] is independent of all classes of the cycles
already in C′ . This means that the algorithm GreedyBasis could not have stopped without enlarging C′ .

The above proposition suggests that GreedyBasis can compute an optimal cycle basis if its

input set C contains one. We show next that such an input (i.e., a set of 1-cycles containing an
optimal H1 (K)-basis) can be computed for H1 (K) in O(n2 log n) time where the 2-skeleton of K
has n simplices.
Specifically, given a simplicial complex K, notice that H1 (K) is completely determined by
the 2-skeleton of K and hence without loss of generality we can assume K to be a 2-complex.
Algorithm 8:Generator computes a set C of 1-cycles from such a complex which includes an
optimal basis.

Algorithm 8 Generator(K)
Input:
A 2-complex K
Output:
A set of 1-cycles containing an optimal H1 (K)-basis
1: Let K 1 be the 1-skeleton of K with vertex set V and edge set E
2: C := {∅}
3: for all v ∈ V do
4: compute a shortest path tree T v rooted at v in K 1 = (V, E)
5: for all e = (u, w) ∈ E \ T v s.t. u, w ∈ T v do
6: Compute cycle ce = πu,w ∪ {e} where πu,w is the unique path connecting u and w in T v
7: C := C ∪ {ce }
8: end for
9: end for
10: Output C
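The following Python sketch mirrors Generator for illustration. It assumes the 1-skeleton is given as an adjacency dictionary {u: {v: weight}} with orderable vertex labels, and it represents each candidate cycle implicitly by its root and non-tree edge, as suggested in the proof of Proposition 5.2 below.

import heapq

def shortest_path_tree(adj, root):
    """Dijkstra from root; returns (dist, parent) over root's component."""
    dist, parent, heap = {root: 0.0}, {root: None}, [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v], parent[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, parent

def generator(adj):
    """Candidate 1-cycles as (weight, root, (u, v)) triples."""
    candidates = []
    for root in adj:
        dist, parent = shortest_path_tree(adj, root)
        for u in dist:
            for v, w in adj[u].items():
                if u < v and parent[v] != u and parent[u] != v:
                    # cycle c_e = pi(root,u) + pi(root,v) + edge e = (u, v);
                    # its weight comes free from the Dijkstra distances
                    candidates.append((dist[u] + dist[v] + w, root, (u, v)))
    return candidates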

Proposition 5.2. Generator(K) computes an H1 (K)-generator C together with the weights of its cycles in O(n2 log n)
time for a 2-complex K with n vertices and edges. Furthermore, the set C contains an optimal basis
where |C| = O(n2 ).

Proof. We prove that any cycle c in an optimal H1 -basis C∗ that is not computed by Generator can
be replaced by a cycle computed by Generator while keeping C∗ optimal. This proves the claim
that the output of Generator contains an optimal basis (and thus C is necessarily an H1 -generator).
First, assume that C∗ consists of simple cycles because otherwise we can choose such cycles
from the cycles of C∗ due to Fact 5.1(ii). So, assume that c ∈ C∗ is simple. Let v be any vertex in
c. There exists at least one edge e in c which is not in the shortest path tree T v . Let e = {u, w}.
Consider the shortest paths πv,u and πv,w in T v from the root v to the vertices u and w respectively.
Notice that even though K 1 may be disconnected, vertices u, w are necessarily in T v . Also, let π′v,u
and π′v,w be the paths from v to u and w respectively in the cycle c. If πv,u = π′v,u and πv,w = π′v,w ,
we have c = ce computed by Generator. So, assume that at least one path does not satisfy this
condition, say πv,u ≠ π′v,u . See Figure 5.3.
Consider the two cycles c1 and c2 where c1 consists of the paths π′v,w , πv,u and e; c2 consists
of the paths πv,u and π′v,u . Observe that c = c1 + c2 . Also, w(c1 ) ≤ w(c) and w(c2 ) ≤ w(c). If both
[c1 ] and [c2 ] are dependent on the classes of the cycles in C∗ \ {c}, we will have [c] dependent on
them as well. This contradicts that C∗ is an H1 (K)-basis.

Figure 5.3: Tree T v and the paths πv,u , πv,w , π′v,u , π′v,w .

If [c1 ] is independent of the classes of cycles in C∗ \ {c}, obtain a new H1 (K)-basis by replacing
c with c1 . Then, apply the same argument on c1 once more by taking the new vertex v to be the
common ancestor of πv,u and π′v,w and the new edge e to be the old one. We will have a new
H1 -basis whose weight is no more than that of C∗ while replacing one of its cycles that is not computed
by Generator with a cycle necessarily computed by Generator.
If [c2 ] is independent of the classes of cycles in C∗ \ {c}, obtain a new H1 (K)-basis by replacing
c with c2 and then apply the same argument on c2 once more by taking the new vertex v to be the
common ancestor of πv,u and π′v,u and the new edge e to be an edge incident to u in c2 . Again, we
will have a new H1 -basis whose weight is no more than that of C∗ while replacing one of its cycles that
is not computed by Generator with a cycle necessarily computed by Generator. This completes
the claim that the output of Generator contains an optimal basis.
To see that Generator takes time as claimed, observe that each shortest path tree computation
takes O(n log n) time by Dijkstra's algorithm implemented with a Fibonacci heap [109]. Summing
over O(n) vertices, this gives O(n2 log n) time. For every vertex v, each of the O(n) edges in E \ T v
gives a cycle in the output, accounting for O(n2 ) cycles in total and giving |C| = O(n2 ). One
can save space by representing each such cycle with its edge in E \ T v while keeping T v for all of
them without duplicates. Also, observe that the weight of each cycle w(ce ) can be computed as a
by-product of Dijkstra's algorithm because it computes the weights of the shortest paths from
the root to all of the vertices. Therefore, in O(n2 log n) time, Generator can output an H1 (K)-
generator together with the weights of its cycles.

5.1.2 Optimal H1 (K)-basis and independence check


To compute an optimal H1 (K)-basis, we first run Generator on K and then feed the output to
GreedyBasis as presented in Algorithm 9:OptGen, which outputs an optimal H1 -basis due to
Propositions 5.1 and 5.2.
However, we need to specify how to check the independence of the cycle classes in step 4
and the triviality of cycle c1 in step 8 of GreedyBasis. We do this by using annotations described in
Section 4.2.1. Recall that a(·) denotes the annotation of its argument, which is a binary vector.
Algorithm 10:AnnotEdge is a version of Algorithm 6:Annot adapted to edges only.

Algorithm 9 OptGen(K)
Input:
A 2-complex K
Output:
An optimal H1 (K)-basis
1: C:= Generator(K)
2: Output C∗ :=GreedyBasis(C)

Algorithm 10 AnnotEdge(K)
Input:
A simplicial 2-complex K
Output:
Annotations for edges in K
1: Let K 1 be the 1-skeleton of K with edge set E
2: Compute a spanning forest T of K 1 ; m = |E| − |T |
3: For every edge e ∈ E ∩ T , assign an m-vector a(e) where a(e) = 0
4: Index remaining edges in E \ T as e1 , . . . , em
5: For every edge ei , assign a(ei )[ j] = 1 iff j = i
6: for all triangle t ∈ K do
7: if a(∂t) ≠ 0 then
8: pick any non-zero entry bu in a(∂t)
9: add a(∂t) to every edge e s.t. a(e)[u] = 1
10: delete u-th entry from annotation of every edge
11: end if
12: end for

Assume that each cycle ce ∈ C output by Generator is represented by e and implicitly by
the path πu,w in T v . Assume that an annotation of edges has already been computed. This can be
done by the algorithm AnnotEdge. A straightforward analysis shows that AnnotEdge takes O(n3 )
time where n is the total number of vertices, edges, and triangles in K. However, for achieving
a better time complexity, we can use the earliest basis algorithm described in [60] which runs in
time O(nω ).
Once the annotations for edges are computed, we need to compute the annotations for the set
C of cycles computed by Generator to check independence among them in GreedyBasis. We
first describe how we compute the annotation for a cycle in C. We compute an auxiliary
annotation of the vertices in T v from the annotations of its edges to facilitate computing a(ce ) for
cycles ce ∈ C. We traverse the tree T v top-down and compute the auxiliary annotation a(x) of a
vertex x in T v as a(x) = a(y) + a(e xy ) where y is the parent of x and e xy is the edge connecting
x and y. The process is initiated by assigning a(v) for the root v to be the zero vector. It should
be immediately clear that all auxiliary annotations of the vertices can be computed in O(gn) time
where g, the length of the annotation vectors, equals β1 (K). The annotation of each cycle ce ∈ C
is computed as a(ce ) = a(u) + a(w) + a(e) where e = (u, w). Again, this takes O(g) time per edge
e and hence per cycle ce ∈ C, giving a time complexity of O(gn2 ) in total for the entire set C.

Figure 5.4: (left) A non-trivial cycle in a double torus, (right) optimal cycle in the class of the
cycle on the left.
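Returning to the annotation propagation just described, here is a small Python sketch (ours; the names are hypothetical). It assumes T v is given by a children map, that every edge already carries an annotation bitmask a_edge keyed by its endpoints in canonical order, and that annotations are added as GF(2) vectors via XOR.

def edge_key(x, y):
    return (x, y) if x <= y else (y, x)

def vertex_annotations(children, root, a_edge):
    """Top-down traversal: a(x) = a(y) + a(e_xy) for the parent y of x."""
    a_vertex = {root: 0}                 # the root gets the zero vector
    stack = [root]
    while stack:
        y = stack.pop()
        for x in children.get(y, []):
            a_vertex[x] = a_vertex[y] ^ a_edge[edge_key(x, y)]
            stack.append(x)
    return a_vertex

def cycle_annotation(a_vertex, a_edge, u, w):
    """a(c_e) = a(u) + a(w) + a(e) for a non-tree edge e = (u, w)."""
    return a_vertex[u] ^ a_vertex[w] ^ a_edge[edge_key(u, w)]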
Next, we describe an efficient way of determining the independence of cycles as needed in
step 4 of GreedyBasis. Independence of the class [ce ] with respect to all classes already chosen
by GreedyBasis is done in a batch mode. One can do it edge by edge incurring more cost. We use
a divide-and-conquer strategy instead.
Let ce1 , ce2 , . . . , cek be the sorted order of cycles in C computed by Generator. We construct a
matrix A whose ith column is the vector a(cei ), and compute the first g columns that are independent,
called the earliest basis of A. Since there are k cycles in C, the matrix A is g × k. We use the
following iterative method, based on making blocks, to compute the set J of indices of columns
that define the earliest basis. We partition A from left to right into submatrices A = [A1 |A2 | · · · ],
where each submatrix Ai contains g columns, with the possible exception of the last submatrix,
which contains at most g columns. Initially, we set J to be the empty set. We then iterate over the
submatrices Ai by increasing index, that is, as they are ordered from left to right. At each iteration
we compute the earliest basis for the matrix [A J |Ai ], where A J is the submatrix whose column
indices are in J. We then set J to be the indices from the resulting earliest basis, increase i, and go
to the next iteration. At each iteration we need to compute the earliest basis in a matrix with
g rows and at most |J| + g ≤ 2g columns. Thus, each iteration takes O(gω ) time, and there are at
most O(k/g) = O(n2 /g) iterations. Summing over all iterations, this gives a time complexity of
O(n2 gω−1 ).
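The block-wise strategy just described can be sketched as follows in Python, again with annotation vectors encoded as integer bitmasks (our illustrative encoding, not fixed by the text). Here earliest_basis keeps the leftmost independent columns of a GF(2) matrix, and the blocked driver carries the surviving columns from one block to the next.

def earliest_basis(columns):
    """columns: list of (index, vector); returns the surviving columns."""
    pivots, kept = {}, []
    for idx, vec in columns:
        v = vec
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                kept.append((idx, vec))
                break
            v ^= pivots[top]
    return kept

def earliest_basis_blocked(A, g):
    """A: list of (index, vector) columns sorted by cycle weight."""
    J = []                                   # columns of the current basis
    for i in range(0, len(A), g):
        J = earliest_basis(J + A[i:i + g])   # earliest basis of [A_J | A_i]
    return [idx for idx, _ in J]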
Theorem 5.3. Given a simplicial 2-complex K with n simplices, an optimal H1 (K)-basis can be
computed in O(nω + n2 gω−1 ) time.
Proof. An H1 -generator C containing an optimal (cycle) basis can be computed in O(n2 log n)
time due to Proposition 5.2. One can compute an optimal H1 -basis from C by GreedyBasis
due to Proposition 5.1. However, instead of using GreedyBasis directly, we apply the divide-and-
conquer technique outlined above for computing the cycles output by GreedyBasis, which takes
O(nω + n2 gω−1 ) time. Retaining only the dominating terms, we obtain the claimed complexity for
the entire algorithm.

5.2 Localization
In this section we consider a different optimization problem. Here we are given a p-cycle c in
an input complex with non-negative weights on p-simplices, and our goal is to compute a cycle
c∗ that is of optimal (minimal) weight in the homology class [c]; see Figure 5.4. We extend this
localization problem from cycles to chains. For this, we first extend the concept of homologous
cycles in Section 2.5 to chains straightforwardly. Two p-chains c, c′ ∈ C p are called homologous
if and only if they differ by a boundary, that is, c ∈ c′ + B p . We ask to compute a chain of
minimal weight which is homologous to a given chain.

Definition 5.4. Let w : K(p) → R≥0 be a non-negative weight function defined on the set of
p-simplices in a simplicial complex K. We extend w to the chain group C p by defining
w(c) = Σi ci w(σi ) where c = Σi ci σi .

Definition 5.5 (OHCP). Given a non-negative weight function w : K(p) → R≥0 defined on the
set of p-simplices in a simplicial complex K and a p-chain c in C p (K), the optimal homologous
chain problem (OHCP) is to find a chain c∗ which has the minimal weight w(c∗ ) among all chains
homologous to c.

If we use Z2 as the coefficient ring for defining homology classes, the OHCP becomes NP-
hard. We are going to show that it becomes polynomial time solvable if (i) the coefficient ring is
chosen to be the integers Z, and (ii) the complex K is such that H p (K) does not have torsion, which
may be introduced because of using Z as the coefficient ring.
We will formulate OHCP as an integer program which requires the chains to be represented
as integer vectors. Given a p-chain x = x0 σ0 + · · · + xm−1 σm−1 with integer coefficients xi , we use
x ∈ Zm to denote the vector formed by the coefficients xi . Thus, x is the representation of the chain x in
the elementary p-chain basis, and we will use x and x interchangeably.
Recall that for a vector x ∈ Rm , the 1-norm (or ℓ1 -norm) ‖x‖1 is Σi |xi |. Let W be any real
m × m diagonal matrix with diagonal entries wi . Then, the 1-norm of W x, that is, ‖W x‖1 , is
Σi |wi ||xi |. (If W is a general m × m nonsingular matrix then ‖W x‖1 is called the weighted 1-norm
of x.) We now state in words our approach to the optimal homologous chains and later formalize
it in Eqn. (5.1). The main idea is to cast OHCP as an integer program. Unfortunately, integer
programs are in general NP-hard and thus cannot be solved in polynomial time unless P=NP. We
solve it by a linear program: we identify a class of integer programs, called totally unimodular,
for which linear programs give exact solutions. Then, we interpret total unimodularity in terms of
topology. Our approach to solve OHCP can be succinctly stated by the following steps:

• write OHCP as an integer program involving 1-norm minimization, subject to linear con-
straints;

• convert the integer program into an integer linear program by converting the 1-norm cost
function to a linear one using the standard technique of introducing some extra variables
and constraints;

• find the conditions under which the constraint matrix of the integer linear program is totally
unimodular; and

• for this class of problems, relax the integer linear program to a linear program by dropping
the constraint that the variables be integral. The resulting optimal chain obtained by solving
the linear program will be an integer valued chain homologous to the given chain.

5.2.1 Linear program


Now we formally pose OHCP as an optimization problem. After showing existence of solutions
we reformulate the optimization problem as an integer linear program and eventually as a linear
program.
Assume that the number of p- and (p + 1)-simplices in K is m and n respectively, and let W
be a diagonal m × m matrix. Using the notation from Section 3.3.1, let D p represent the boundary
matrix for the boundary operator ∂ p : C p → C p−1 in the elementary chain bases. With these
notations, given a p-chain c represented with an integral vector, the optimal homologous chain
problem in dimension p is to solve:

min ‖W x‖1 over x ∈ Zm , y ∈ Zn such that x = c + D p+1 y.        (5.1)

We assume that W is a diagonal matrix obtained from non-negative weights on simplices. Let
w be a non-negative real-valued weight function on the oriented p-simplices of K and let W be
the corresponding diagonal matrix (the i-th diagonal entry of W is w(σi ) = wi ).
The resulting objective function ‖W x‖1 = Σi wi |xi | in (5.1) is not linear in xi because it uses
the absolute value of xi . However, it is piecewise-linear in these variables. As a result, Eqn. (5.1)
can be reformulated as an integer linear program by splitting every variable xi into two parts xi+
and xi− [27, page 18]:

min Σi wi (xi+ + xi− )
subject to x+ − x− = c + D p+1 y        (5.2)
x+ , x− ≥ 0
x+ , x− ∈ Zm , y ∈ Zn .

Comparing the above formulation to the standard form integer linear program in Eqn. (5.4), we
notice that the vector x in Eqn. (5.4) corresponds to [x+ , x− , y]T in Eqn. (5.2) above. Thus, the
minimization is over x+ , x− and y, and the coefficients of xi+ and xi− in the objective function are
wi , but the coefficients corresponding to y j are zero. The linear programming relaxation of this
formulation just removes the constraints about the variables being integral. The resulting linear
program is:

min Σi wi (xi+ + xi− )
subject to x+ − x− = c + D p+1 y
x+ , x− ≥ 0 .

To cast the program in standard form [27], we can eliminate the free (unrestricted in sign)
variables y by replacing these by y+ − y− and imposing the non-negativity constraints on the
new variables. The resulting linear program has the same objective function, and the equality
constraints:

min Σi wi (xi+ + xi− )
subject to x+ − x− = c + D p+1 (y+ − y− )
x+ , x− , y+ , y− ≥ 0 .

We can write the above program as

min f T z subject to Az = c, z ≥ 0        (5.3)

where f = [w, 0]T , z = [x+ , x− , y+ , y− ]T , and the equality constraint matrix is A = [ I −I −B B ],
where B = D p+1 . This is exactly in the form we want the linear program to be in view of
Eqn. (5.4). We now prove a result about the total unimodularity of this matrix that allows us to
solve the optimization by a linear program.

5.2.2 Total unimodularity


A matrix is called totally unimodular if the determinant of each square submatrix is 0, 1, or −1.
The significance of total unimodularity in our setting is due to the following Theorem 5.4 which
follows immediately from known results in optimization [201].
Consider an integral vector b ∈ Zm and a real vector f ∈ Rn . Consider the integer linear
program
min f T x subject to Ax = b, x ≥ 0 and x ∈ Zn . (5.4)

Theorem 5.4. Let A be an m × n totally unimodular matrix. Then the integer linear program (5.4)
can be solved in time polynomial in the dimensions of A.

Proposition 5.5. If B = D p+1 is totally unimodular then so is the matrix [ I −I −B B ].

Proof. The proof uses operations that preserve the total unimodularity of a matrix. These are
listed in [272, page 280]. If B is totally unimodular then so is the matrix [ −B B ] since scalar
multiples of columns of B are being appended on the left to get this matrix. The full matrix in
question can be obtained from this one by appending columns with a single ±1 on the left, which
proves the result.

As a result of Theorem 5.4 and Proposition 5.5, we have the following algorithmic result.

Theorem 5.6. If the boundary matrix D p+1 of a finite simplicial complex of dimension greater
than p is totally unimodular, the optimal homologous chain problem (5.1) for p-chains can be
solved in polynomial time.

Proof. We have seen above that a reformulation of OHCP without the integrality constraints
leads to the linear program (5.3). By Proposition 5.5, the equality constraint matrix of this linear
program is totally unimodular. Then, by Theorem 5.4, the linear program (5.3) can be solved in
polynomial time, while achieving an integral solution. 
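As an illustration only (the text does not fix an implementation), the relaxed program (5.3) can be handed to an off-the-shelf LP solver; the sketch below uses numpy and scipy, both assumptions on our part. By Theorem 5.6, when D p+1 is totally unimodular the optimal vertex returned is integral, so the final rounding only removes floating point noise.

import numpy as np
from scipy.optimize import linprog

def optimal_homologous_chain(D, c, w):
    """D: boundary matrix D_{p+1} (m x n), c: input p-chain, w: weights."""
    m, n = D.shape
    # Variables z = [x+, x-, y+, y-]; objective f = [w, w, 0, 0].
    f = np.concatenate([w, w, np.zeros(2 * n)])
    # Equality constraint x+ - x- - D y+ + D y- = c, i.e. A = [I -I -D D].
    A_eq = np.hstack([np.eye(m), -np.eye(m), -D, D])
    res = linprog(f, A_eq=A_eq, b_eq=c, bounds=(0, None), method="highs")
    x_plus, x_minus = res.x[:m], res.x[m:2 * m]
    return np.rint(x_plus - x_minus).astype(int)   # the optimal chain x*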

Manifolds. Our results in the next section (Section 5.2.3) are valid for any finite simplicial
complex. But first we consider a simpler case: simplicial complexes that are triangulations of
manifolds. We show that for finite triangulations of compact p-dimensional orientable manifolds,
the top non-trivial boundary matrix D p is totally unimodular irrespective of the orientations of its
simplices. There are examples of non-orientable manifolds where total unimodularity does not
hold (Exercise 7). Further examination of why total unimodularity does not hold in these cases
leads to the results in Theorem 5.9.
Let K be a finite simplicial complex that triangulates a (p + 1)-dimensional compact orientable
manifold M.

Theorem 5.7. For a finite simplicial complex triangulating a (p + 1)-dimensional compact ori-
entable manifold, D p+1 is totally unimodular irrespective of the orientations of the simplices.

As a result of the above theorem and Theorem 5.6 we have the following result.

Corollary 5.8. For a finite simplicial complex triangulating a (p + 1)-dimensional compact ori-
entable manifold, the optimal homologous chain problem can be solved for p-dimensional chains
in polynomial time.

5.2.3 Relative torsion


Now we consider the more general case of simplicial complexes. We characterize the total uni-
modularity of boundary matrices for arbitrary simplicial complexes. This characterization leads
to a torsion-related condition for the complexes; see [241] for the definition of torsion. Since we
do not use any conditions about the geometric realization or embedding in R p for the complex,
the result is also valid for abstract simplicial complexes. As a corollary of the characterization
we show that the OHCP can be solved in polynomial time as long as the input complex satisfies a
torsion-related condition.

TU and relative torsion


Definition 5.6 (Pure simplicial complex). A pure simplicial complex of dimension p is a simpli-
cial complex formed by a collection of p-simplices and their faces. Similarly, a pure subcomplex
is a subcomplex that is a pure simplicial complex.

An example of a pure simplicial complex of dimension p is one that triangulates a p-dimensional
manifold. Another example, relevant to our discussion, is a subcomplex formed by a collection
of some p-simplices of a simplicial complex and their faces.
Let K be a finite simplicial complex of dimension greater than p. Let L ⊆ K be a pure
subcomplex of dimension p + 1 and L′ ⊂ L be a pure subcomplex of dimension p. Recall
the definition of the relative boundary operator in Section 2.5.2 used for defining relative homology.
Then, the matrix D^{L,L′}_{p+1} representing the relative boundary operator

∂^{L,L′}_{p+1} : C p+1 (L, L′ ) → C p (L, L′ ),

is obtained by first including the columns of D p+1 corresponding to (p + 1)-simplices in L and
then, from the submatrix so obtained, excluding the rows corresponding to the p-simplices in L′
and any zero rows. The zero rows correspond to p-simplices that are not faces of any of the
(p + 1)-simplices of L. Then the following holds.
Theorem 5.9. D p+1 is totally unimodular if and only if H p (L, L′ ) is torsion-free, for all pure
subcomplexes L′ , L of K of dimensions p and p + 1 respectively, where L′ ⊂ L.
Proof. (only if): We show that if H p (L, L′ ) has torsion for some L, L′ then D p+1 is not totally
unimodular. Let D^{L,L′}_{p+1} be the corresponding relative boundary matrix. Bring D^{L,L′}_{p+1} to the so-called
Smith normal form, which is a block matrix

[ ∆ 0 ]
[ 0 0 ]

where ∆ = diag(d1 , . . . , dl ) is a diagonal matrix with di ≥ 1 being integers. The row or column
of zero matrices in the block form shown above may be empty, depending on the dimensions of the
matrix. This can be done, for example, by using the reduction algorithm [241, pages 55–57]. The
construction of the Smith normal form implies that dk > 1 for some 1 ≤ k ≤ l because H p (L, L′ )
has torsion. Thus, the product d1 · · · dk is greater than 1. By a result of Smith [281] mentioned
in [272, page 50], this product is the greatest common divisor of the determinants of all k × k
square submatrices of D^{L,L′}_{p+1} . It follows that some square submatrix of D^{L,L′}_{p+1} , and hence of D p+1 ,
has determinant of magnitude greater than 1. Then, D p+1 is not totally unimodular.

(if): Assume that D p+1 is not totally unimodular. We show that, in that case, there exist sub-
complexes L′ and L of dimensions p and (p + 1) respectively, with L′ ⊂ L, so that H p (L, L′ ) has
torsion. Let S be a square submatrix of D p+1 so that |det(S )| > 1. Let L correspond to the columns
of D p+1 that are included in S and let BL be the submatrix of D p+1 formed by these columns. This
submatrix BL may contain zero rows. Those zero rows (if any) correspond to p-simplices that are
not a facet of any of the (p + 1)-simplices in L. To form S from BL , we first discard the zero rows
to form a submatrix B′L . This is safe because det(S ) ≠ 0 and so these zero rows cannot occur in
S .
The rows in B′L correspond to p-simplices that adjoin some (p + 1)-simplex in L. Let L′
correspond to the rows of B′L which are excluded to form S . Observe that S is the relative boundary
matrix D^{L,L′}_{p+1} . Consider the Smith normal form of S . This normal form is a square diagonal matrix
obtained by reducing S . Since the elementary row and column operations used for this reduction
preserve the determinant magnitude, the determinant of the resulting diagonal matrix has magnitude
greater than 1. It follows that at least one of the diagonal entries in the normal form is greater
than 1. Then, by [241, page 61], H p (L, L′ ) has torsion.
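The torsion test in the proof is directly computable. As a small illustration (using sympy, an assumption on our part; any integer Smith normal form routine would do), H p (L, L′ ) has torsion exactly when some diagonal entry of the Smith normal form of the relative boundary matrix has magnitude greater than 1:

from sympy import Matrix, ZZ
from sympy.matrices.normalforms import smith_normal_form

def has_relative_torsion(rel_boundary):
    """rel_boundary: the integer matrix D^{L,L'}_{p+1} as a list of rows."""
    snf = smith_normal_form(Matrix(rel_boundary), domain=ZZ)
    diagonal = [snf[i, i] for i in range(min(snf.shape))]
    return any(abs(d) > 1 for d in diagonal)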

Corollary 5.10. For a simplicial complex K of dimension greater than p, there is a polynomial
time algorithm for answering the following question: Is H p (L, L′ ) torsion-free for all subcom-
plexes L′ and L of dimensions p and (p + 1) such that L′ ⊂ L?
Proof. Seymour’s decomposition theorem for totally unimodular matrices [273],[272, Theorem
19.6] yields a polynomial time algorithm for deciding if a matrix is totally unimodular or not
[272, Theorem 20.3]. That algorithm applied on the boundary matrix D p+1 proves the above as-
sertion. 

A special case. In Section 5.2.2, we have seen the special case of compact orientable manifolds.
We saw that the top dimensional boundary matrix of a finite triangulation of such a manifold is
totally unimodular. Now we show another special case for which the boundary matrix is totally
unimodular and hence OHCP is polynomial time solvable. This case occurs when we ask for
optimal p-chains in a simplicial complex K which is embedded in R p+1 . In particular, OHCP can
be solved by linear programming for 2-chains in 3-complexes embedded in R3 . This follows from
the following result:

Theorem 5.11. Let K be a finite simplicial complex embedded in R p+1 . Then, H p (L, L′ ) is torsion-
free for all pure subcomplexes L′ and L of dimensions p and p + 1 respectively, such that L′ ⊂ L.

Corollary 5.12. Given a p-chain c in a weighted finite simplicial complex embedded in R p+1 , an
optimal chain homologous to c can be computed by a linear program.

5.3 Persistent cycles


So far, we have considered optimal cycles in a given complex. Now, we consider optimal cycles
in the context of a filtration. We know that a filtration of a complex gives rise to persistence
of homology classes. An interval module which appears as a bar in the barcode is created by
homology classes that are born and die at its endpoints. However, a bar is not associated with
the class of one particular cycle because more than one cycle may be born and die at the endpoints.
Among all these cycles, we want to identify the cycle that is optimal with respect to a weight
assignment as defined earlier. Note that, by Remark 3.4 in Section 3.4, an interval [b, d − 1] in
the interval decomposition of a persistence module H p (F) arising from a simplicial filtration F
corresponds to a closed-open interval [b, d) contributing a point (b, d) in the persistence diagram
Dgm p (F) as defined in Definition 3.8. We also say that the interval [b, d) belongs to Dgm p (F).
Let the cycles be weighted with a weight function w : K(p) → R≥0 defined on the set of
p-simplices in a simplicial complex K as before.

Definition 5.7 (Persistent cycle). Given a filtration F : ∅ = K0 ,→ K1 ,→ . . . ,→ Kn = K, and a
finite interval [b, d) ∈ Dgm p (F), we say a cycle c is a persistent cycle for [b, d) if c is born at Kb
and becomes a boundary in Kd . For an infinite interval [b, ∞) ∈ Dgm p (F), we say a cycle c is a
persistent cycle for [b, ∞) if c is born at Kb . In both cases, a persistent cycle is called optimal if it
has the least weight among all such cycles for the bar.

Depending on whether the interval is finite or not, we have two cases captured in the following
definitions.

Problem 1 (PCYC-FIN p ). Given a finite filtration F and a finite interval [b, d) ∈ Dgm p (F), this
problem asks for computing an optimal persistent p-cycle for the bar [b, d).

Problem 2 (PCYC-INF p ). Given a finite filtration F and an infinite interval [b, ∞) ∈ Dgm p (F),
this problem asks for computing an optimal persistent p-cycle for the bar [b, ∞).

When p ≥ 2, computing optimal persistent p-cycles for both finite and infinite intervals is
NP-hard in general. We identify a special but important class of simplicial complexes, which we
term weak (p + 1)-pseudomanifolds, whose optimal persistent p-cycles can be computed in
polynomial time. A weak (p + 1)-pseudomanifold is a generalization of a (p + 1)-manifold and is
defined as follows:

Definition 5.8 (Weak pseudomanifold). A simplicial complex K is a weak (p + 1)-pseudomanifold
if each p-simplex is a face of no more than two (p + 1)-simplices in K.

Specifically, it turns out that if the given complex is a weak (p + 1)-pseudomanifold, the
problem of computing optimal persistent p-cycles for finite intervals can be cast into a mini-
mal cut problem (see Section 5.3.1) due to the fact that persistent cycles of such kind are null-
homologous in the complex. However, when p ≥ 2 and intervals are infinite, the computation of
the same becomes NP-hard. Nonetheless, for infinite intervals, if we assume that the weak (p+1)-
pseudomanifold is embedded in R p+1 , then the optimal persistent p-cycle problem reduces to a
minimal cut problem (see Section 5.3.3) and hence belongs to P. Note that a simplicial complex
that can be embedded in R p+1 is necessarily a weak (p + 1)-pseudomanifold. We also note that
while there is an algorithm [94] in the non-persistence setting which computes an optimal p-cycle
by minimal cuts (Exercise 8; the non-persistence algorithm assumes the (p + 1)-complex to be
embedded in R p+1 ), the algorithm for finite intervals presented here, on the contrary, does not need
the embedding assumption.
Before we present the algorithms for the cases where they run in polynomial time, we summarize
the complexity results for the different cases. In order to make our statements about the hardness
results precise, we let WPCYC-FIN p denote a subproblem of PCYC-FIN p and let WPCYC-
INF p , WEPCYC-INF p denote two subproblems of PCYC-INF p , with the subproblems requiring
additional constraints on the given simplicial complex. (For two problems P1 and P2 , P2 is a
subproblem of P1 if any instance of P2 is an instance of P1 and P2 asks for computing the same
solutions as P1 .) Table 5.1 lists the hardness results for all problems of interest, where the column
“Restriction on K” specifies the additional constraints subproblems require on the given simplicial
complex K. Note that WPCYC-INF p being NP-hard trivially implies that PCYC-INF p is NP-hard.

Table 5.1: Hardness results for optimal persistent cycle problems.


Problem Restriction on K p Hardness
PCYC-FIN p − ≥1 NP-hard
WPCYC-FIN p K a weak (p + 1)-pseudomanifold ≥1 Polynomial
PCYC-INF p − =1 Polynomial
WPCYC-INF p K a weak (p + 1)-pseudomanifold ≥2 NP-hard
WEPCYC-INF p K a weak (p + 1)-pseudomanifold in R p+1 ≥2 Polynomial

The polynomial time algorithms for the cases listed in Table 5.1 map the problem of comput-
ing optimal persistent cycles into the classic problem of computing minimal cuts in a flow net-
work. The only exception is PCYC-INF1 , which can be solved by computing Dijkstra's shortest
paths in graphs. We do not consider this special case here; its details can be found in [128].

Undirected flow network. An undirected flow network (G, s1 , s2 ) consists of an undirected
graph G with vertex set V(G) and edge set E(G), a capacity function C : E(G) → [0, +∞], and
two non-empty disjoint subsets s1 and s2 of V(G). Vertices in s1 are referred to as sources and
vertices in s2 are referred to as sinks.
Figure 5.5: An example of the constructions in our algorithm showing the duality between persis-
tent cycles and cuts having finite capacity for p = 1. (a) The input weak 2-pseudomanifold K with
its dual flow network drawn in blue, where the central hollow vertex denotes the dummy vertex,
the red vertex denotes the source and the orange vertices denote the sinks. All graph edges dual
to the outer boundary 1-simplices actually connect to the dummy vertex. (b) The partial complex
Kb in the input filtration F, where the bold green 1-simplex denotes σFb which creates the green
1-cycle. (c) The partial complex Kd in F, where the 2-simplex σFd creates the pink 2-chain killing
the green 1-cycle. (d) The green persistent 1-cycle of the interval [b, d) is dual to a cut (S , T )
having finite capacity, where S contains all the vertices inside the pink 2-chain and T contains all
the other vertices. The red graph edges denote those edges across (S , T ) and their dual 1-chain is
the green persistent 1-cycle.

A cut (S , T ) of (G, s1 , s2 ) consists of two disjoint subsets S and T of V(G) such that
S ∪ T = V(G), s1 ⊆ S , and s2 ⊆ T . We define the set of edges across the cut (S , T ) as

E(S , T ) = {e ∈ E(G) | e connects a vertex in S and a vertex in T }.

The capacity of a cut (S , T ) is defined as C(S , T ) = Σe∈E(S ,T ) C(e). A minimal cut of (G, s1 , s2 ) is
a cut with the minimal capacity. Note that we allow parallel edges in G (see Figure 5.6) to ease
the presentation. These parallel edges can be merged into one edge during computation.
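For illustration, a minimal cut with source and sink sets can be computed by the standard reduction to a single-source, single-sink maximum flow: merge s1 into a super-source and s2 into a super-sink through infinite-capacity edges. The sketch below uses networkx (an assumption on our part) and merges parallel edges by summing their capacities, as noted above.

import networkx as nx

def min_cut(edges, s1, s2):
    """edges: iterable of (u, v, capacity) triples; s1, s2: vertex sets."""
    H = nx.DiGraph()
    for u, v, c in edges:                  # each undirected edge -> two arcs
        for a, b in ((u, v), (v, u)):
            if H.has_edge(a, b):
                H[a][b]['capacity'] += c   # merge parallel edges
            else:
                H.add_edge(a, b, capacity=c)
    for v in s1:
        H.add_edge('_src', v, capacity=float('inf'))
    for v in s2:
        H.add_edge(v, '_snk', capacity=float('inf'))
    cut_value, (S, T) = nx.minimum_cut(H, '_src', '_snk')
    return cut_value, (S - {'_src'}, T - {'_snk'})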

5.3.1 Finite intervals for weak (p + 1)-pseudomanifolds


In this subsection, we present an algorithm which computes optimal persistent p-cycles for finite
intervals given a filtration of a weak (p + 1)-pseudomanifold when p ≥ 1. The general approach
proceeds as follows: Suppose that the input weak (p + 1)-pseudomanifold is K which is associated
with a simplex-wise filtration F : ∅ = K0 ,→ K1 ,→ . . . ,→ Kn and the task is to compute an
optimal persistent cycle of a finite interval [b, d) ∈ Dgm p (F). Let σFb and σFd be the creator and
destructor pair for the interval [b, d). We first construct an undirected dual graph G for K where
vertices of G are dual to (p + 1)-simplices of K and edges of G are dual to p-simplices of K. One
dummy vertex v∞ termed as infinite vertex which does not correspond to any (p + 1)-simplices is
added to G for graph edges dual to those boundary p-simplices, i.e., the p-simplices that are faces
of at most one (p + 1)-simplex. We then build an undirected flow network on top of G where the
source is the vertex dual to σFd and the sink is the infinite vertex along with the set of vertices
dual to those (p + 1)-simplices which are added to F after σFd . If a p-simplex is σFb or added to F
before σFb , we let the capacity of its dual graph edge be its weight; otherwise, we let the capacity
of its dual graph edge be +∞. Finally, we compute a minimal cut of this flow network and return
the p-chain dual to the edges across the minimal cut as an optimal persistent cycle of the interval.
The intuition of the above algorithm is best explained by an example illustrated in Figure 5.5, where p = 1. The key to the algorithm is the duality between persistent cycles of the input interval and cuts of the dual flow network having finite capacity. To see this duality, first consider a persistent p-cycle c of the input interval [b, d). There exists a (p+1)-chain A in K_d created by σ_d^F whose boundary equals c, which is how c gets killed. We can let S be the set of graph vertices dual to the simplices in A and let T be the set of the remaining graph vertices; then (S, T) is a cut. Furthermore, (S, T) must have finite capacity as the edges across it are exactly dual to the p-simplices in c, and the p-simplices in c have indices in F less than or equal to b. On the other hand, let (S, T) be a cut with finite capacity; then the (p+1)-chain whose simplices are dual to the vertices in S is created by σ_d^F. Taking the boundary of this (p+1)-chain, we get a p-cycle c. Because the p-simplices of c are exactly dual to the edges across (S, T) and each edge across (S, T) has finite capacity, c must reside in K_b. We only need to ensure that c contains σ_b^F in order to show that c is a persistent cycle of [b, d). In Section 5.3.2, we argue that c indeed contains σ_b^F (proof of Proposition 5.14), so c is a persistent cycle.
In the dual graph, an edge is created for each p-simplex. If a p-simplex has two (p + 1)-
cofaces, we simply let its dual graph edge connect the two vertices dual to its two (p + 1)-cofaces;
otherwise, its dual graph edge has to connect to the infinite vertex on one end. A problem with this construction is that some weak (p+1)-pseudomanifolds may have p-simplices that are faces of no (p+1)-simplex, and these p-simplices may create self-loops around the infinite vertex. To avoid self-loops, we simply ignore these p-simplices. The reason we can ignore them is that they cannot be on the boundary of a (p+1)-chain and hence cannot be on a persistent cycle of minimal weight. Algorithmically, we ignore these p-simplices by constructing the dual graph only from what we call the (p+1)-connected component of K containing σ_d^F.

Definition 5.9 (q-connected). Let K be a simplicial complex. For q ≥ 1, two q-simplices σ and σ′ of K are q-connected in K if there is a sequence of q-simplices of K, (σ_0, ..., σ_l), such that σ_0 = σ, σ_l = σ′, and for all 0 ≤ i < l, σ_i and σ_{i+1} share a (q−1)-face. The property of q-connectedness defines an equivalence relation on the q-simplices of K. Each set in the partition induced by the equivalence relation constitutes a q-connected component of K. We say K is q-connected if any two q-simplices of K are q-connected in K. See Figure 5.6 for an example of 1-connected components and 2-connected components.

We present the pseudo-code in Algorithm 11: MinPersCycFin, which works as follows: Lines 1 and 2 set up a complex K̃ that the algorithm mainly works on, where K̃ is taken as the closure of the (p+1)-connected component of σ_d^F. Line 3 constructs the dual graph G from K̃ and lines 4−15 build the flow network on top of G. Note that we denote the infinite vertex by v∞. Line 16 computes a minimal cut for the flow network and line 17 returns the p-chain dual to the edges across the minimal cut. In the pseudo-code, to make the presentation of algorithms and some proofs easier, we treat a mathematical function as a programming object. For example, the function θ returned by DualGraphFin in MinPersCycFin denotes the correspondence between the simplices of K̃ and their dual vertices or edges (see Section 5.3.1 for details). In practice, these constructs can be easily implemented in any programming language.

Algorithm 11 MinPersCycFin(K, p, F, [b, d))


Input:
K: finite p-weighted weak (p + 1)-pseudomanifold
p: integer ≥ 1
F: filtration K0 ⊆ K1 ⊆ . . . ⊆ Kn of K
[b, d): finite interval of Dgm p (F)
Output:
An optimal persistent p-cycle for [b, d)
1: L_{p+1} ← (p+1)-connected component of K containing σ_d^F \∗ set up K̃ ∗\
2: K̃ ← closure of the simplicial set L_{p+1}

3: (G, θ) ← DualGraphFin(K̃, p) \∗ construct dual graph ∗\


4: for all e ∈ E(G) do
5: if index(θ−1 (e)) ≤ b then
6: C(e) ← w(θ−1 (e)) \∗assign finite capacity∗\
7: else
8: C(e) ← +∞ \∗assign infinite capacity∗\
9: end if
10: end for
11: s1 ← {θ(σ_d^F)} \∗set the source∗\
12: s2 ← {v ∈ V(G) | v ≠ v∞, index(θ^{-1}(v)) > d} \∗set the sink∗\
13: if v∞ ∈ V(G) then
14: s2 ← s2 ∪ {v∞ }
15: end if
16: (S ∗ , T ∗ ) ← min-cut of (G, s1 , s2 )
17: Output θ−1 (E(S ∗ , T ∗ ))

Complexity. The time complexity of MinPersCycFin depends on the encoding scheme of the
input and the data structure used for representing a simplicial complex. For encoding the input,
we assume K and F are represented by a sequence of all the simplices of K ordered by their
indices in F, where each simplex is denoted by its set of vertices. We also assume a simple yet
reasonable simplicial complex data structure as follows: In each dimension, simplices are mapped
to integral identifiers ranging from 0 to the number of simplices in that dimension minus 1; each
q-simplex has an array (or linked list) storing all the id’s of its (q + 1)-cofaces; a hash map for
each dimension is maintained for the query of the integral id of each simplex in that dimension
based on the spanning vertices of the simplex. We further assume p to be constant. By the above
assumptions, let n be the size (number of bits) of the encoded input, then there are no more than
n elementary O(1) operations in lines 1 and 2, so the time complexity of lines 1 and 2 is O(n). It is
not hard to verify that the flow network construction also takes O(n) time so the time complexity
of MinPersCycFin is determined by the minimal cut algorithm. Using the max-flow algorithm by
Orlin [248], the time complexity of MinPersCycFin becomes O(n²).

In the rest of this section, we first describe the subroutine DualGraphFin, then close the section
by proving the correctness of the algorithm.

Dual graph construction. We describe the DualGraphFin subroutine used in Algorithm Min-
PersCycFin, which returns a dual graph G and a θ denoting two bijections which we use to prove
the correctness. Given the input (K̃, p), DualGraphFin constructs an undirected connected graph G as follows:
G as follows:

• Let each vertex v of V(G) correspond to each (p+1)-simplex σ^{p+1} of K̃. If there is any p-simplex of K̃ which has less than two (p+1)-cofaces in K̃, we add an infinite vertex v∞ to V(G). Simultaneously, we define a bijection

θ : {(p+1)-simplices of K̃} → V(G) \ {v∞}

by letting θ(σ^{p+1}) = v. Note that in the above range notation of θ, {v∞} may not be a subset of V(G).

• Let each edge e of E(G) correspond to each p-simplex σ^p of K̃. Note that σ^p has at least one (p+1)-coface in K̃. If σ^p has two (p+1)-cofaces σ_0^{p+1} and σ_1^{p+1} in K̃, then let e connect θ(σ_0^{p+1}) and θ(σ_1^{p+1}); if σ^p has one (p+1)-coface σ_0^{p+1} in K̃, then let e connect θ(σ_0^{p+1}) and v∞. We define another bijection

θ : {p-simplices of K̃} → E(G)

using the same notation as the bijection for V(G), by letting θ(σ^p) = e.

Note that we can take the image of a subset of the domain under a function. Therefore, if (S, T) is a cut for a flow network built on G, then θ^{-1}(E(S, T)) denotes the set of p-simplices dual to the edges across the cut. Also note that since simplicial chains with Z_2 coefficients can be interpreted as sets, θ^{-1}(E(S, T)) is also a p-chain.
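A minimal sketch of this construction is given below (the helper name dual_graph_fin and the data layout are ours, not the book's code). It enumerates the p-faces of the (p+1)-simplices of K̃, so p-simplices with no (p+1)-coface never enter the dual graph, exactly as required.

from itertools import combinations

V_INF = "v_inf"   # the infinite vertex

def dual_graph_fin(top_simplices):
    """top_simplices: the (p+1)-simplices of K~, each a tuple of vertices.
    Returns (edges, theta): edges is a list of dual graph edges
    (u, v, dual p-simplex), and theta records the dual vertex or edge of
    each simplex, mimicking the bijections above."""
    tops = [tuple(sorted(s)) for s in top_simplices]
    cofaces = {}                       # p-simplex -> its (p+1)-cofaces
    for s in tops:
        for face in combinations(s, len(s) - 1):
            cofaces.setdefault(face, []).append(s)
    theta = {s: s for s in tops}       # each (p+1)-simplex is a dual vertex
    edges = []
    for p_simplex, cf in cofaces.items():
        if len(cf) == 2:               # interior: connect the two cofaces
            e = (cf[0], cf[1], p_simplex)
        else:                          # boundary: connect to v_inf
            e = (cf[0], V_INF, p_simplex)
        edges.append(e)
        theta[p_simplex] = e           # dual edge of this p-simplex
    return edges, theta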

5.3.2 Algorithm correctness


In this subsection, we prove the correctness of the algorithm MinPersCycFin. Some of the sym-
bols we use refer to the pseudocode of the algorithm.

Proposition 5.13. In the algorithm MinPersCycFin, s2 is not an empty set.

Proof. For contradiction, suppose that s2 is an empty set. Then v∞ ∉ V(G) and σ_d^F is the (p+1)-simplex of K̃ with the greatest index in F. Since v∞ ∉ V(G), any p-simplex of K̃ must be a face of two (p+1)-simplices of K̃, so the set of (p+1)-simplices of K̃ forms a (p+1)-cycle created by σ_d^F. Then σ_d^F must be a positive simplex in F, which is a contradiction. □

The following two propositions specify the duality mentioned at the beginning of Section 5.3.1:

Proposition 5.14. For any cut (S, T) of (G, s1, s2) with finite capacity, the p-chain c = θ^{-1}(E(S, T)) is a persistent p-cycle of [b, d) and w(c) = C(S, T).

Proof. Let A = θ^{-1}(S); it is easy to check that c = ∂(A). The key is to show that c is created by σ_b^F, which we show now. Suppose that c is created by a p-simplex σ^p ≠ σ_b^F. Since C(S, T) is finite, we have that index(σ^p) < b. We can let c′ be a persistent cycle of [b, d) with c′ = ∂(A′), where A′ is a (p+1)-chain of K_d. Then we have c + c′ = ∂(A + A′). Since A and A′ are both created by σ_d^F, A + A′ is created by a (p+1)-simplex with an index less than d in F. So c + c′ is a p-cycle created by σ_b^F which becomes a boundary before σ_d^F is added. This means that σ_b^F is already paired when σ_d^F is added, contradicting the fact that σ_b^F is paired with σ_d^F. Similarly, we can prove that c is not a boundary until σ_d^F is added, so c is a persistent cycle of [b, d). Since (S, T) has finite capacity, we must have

C(S, T) = Σ_{e∈θ(c)} C(e) = Σ_{e∈θ(c)} w(θ^{-1}(e)) = w(c).  □

Proposition 5.15. For any persistent p-cycle c of [b, d), there exists a cut (S, T) of (G, s1, s2) such that C(S, T) ≤ w(c).
Proof. Let A be a (p+1)-chain in K_d such that c = ∂(A). Note that A is created by σ_d^F and c is the set of p-simplices which are faces of exactly one (p+1)-simplex of A. Let c′ = c ∩ K̃ and A′ = A ∩ K̃; we claim that c′ = ∂(A′). To prove this, first let σ^p be any p-simplex of c′; then σ^p is a face of exactly one (p+1)-simplex σ^{p+1} of A. Since σ^p ∈ K̃, it is also true that σ^{p+1} ∈ K̃, and so σ^{p+1} ∈ A′. Then σ^p is a face of exactly one (p+1)-simplex of A′, so σ^p ∈ ∂(A′). On the other hand, let σ^p be any p-simplex of ∂(A′); then σ^p is a face of exactly one (p+1)-simplex σ_0^{p+1} of A′. Note that σ_0^{p+1} ∈ A, and we want to prove that σ^p is a face of exactly one (p+1)-simplex of A, namely σ_0^{p+1}. Suppose that σ^p is a face of another (p+1)-simplex σ_1^{p+1} of A; then σ_1^{p+1} ∈ K̃ because σ_0^{p+1} ∈ K̃. So we have σ_1^{p+1} ∈ A ∩ K̃ = A′, contradicting the fact that σ^p is a face of exactly one (p+1)-simplex of A′. Then we have σ^p ∈ ∂(A). Since σ_0^{p+1} ∈ K̃, we have σ^p ∈ K̃, which means that σ^p ∈ c′.

Let S = θ(A′) and T = V(G) \ S; then (S, T) is a cut of (G, s1, s2) because A′ is created by σ_d^F. We claim that θ^{-1}(E(S, T)) = ∂(A′). The proof of the equality is similar to the one in the proof of Proposition 5.14. It follows that E(S, T) = θ(c′). We then have that

C(S, T) = Σ_{e∈θ(c′)} C(e) = Σ_{e∈θ(c′)} w(θ^{-1}(e)) = w(c′)

because each p-simplex of c′ has an index less than or equal to b in F. Finally, since c′ is a subchain of c, we must have C(S, T) = w(c′) ≤ w(c). □

Combining the above results, we conclude:


Theorem 5.16. Algorithm MinPersCycFin computes an optimal persistent p-cycle for the given
interval [b, d).
Figure 5.6: A weak 2-pseudomanifold K̃ embedded in R² with three voids. Its dual graph is drawn. The complex has one 1-connected component and four 2-connected components, with the 2-simplices in 2-connected components shaded.

Proof. First, the flow network (G, s1, s2) constructed by the algorithm MinPersCycFin must be valid by Proposition 5.13. Since the interval [b, d) must have a persistent cycle, the flow network (G, s1, s2) has a cut with finite capacity by Proposition 5.15. This means that C(S∗, T∗) is finite. By Proposition 5.14, the chain c∗ = θ^{-1}(E(S∗, T∗)) is a persistent cycle of [b, d). Suppose that c∗ is not an optimal persistent cycle of [b, d) and instead let c′ be a minimal persistent cycle of [b, d). Then there exists a cut (S′, T′) such that C(S′, T′) ≤ w(c′) < w(c∗) = C(S∗, T∗) by Propositions 5.14 and 5.15, contradicting the fact that (S∗, T∗) is a minimal cut. □

5.3.3 Infinite intervals for weak (p+1)-pseudomanifolds embedded in R^{p+1}


We already mentioned that computing optimal persistent p-cycles (p ≥ 2) for infinite intervals is NP-hard even if we restrict to weak (p+1)-pseudomanifolds [129]. However, when the complex is embedded in R^{p+1}, the problem becomes polynomial-time tractable. In this subsection, we present an algorithm for this problem given a weak (p+1)-pseudomanifold embedded in R^{p+1}, when p ≥ 1. For p = 1, the problem is polynomial-time tractable for arbitrary complexes; see Exercise 9. The algorithm uses a duality similar to the one described in Section 5.3.1. However, a direct use of the approach in Section 5.3.1 does not work. In particular, the dual graph construction is different – previously there was only one dummy vertex corresponding to infinity, now there is one per void. For example, in Figure 5.6, 1-simplices that do not have any 2-cofaces cannot reside in any 2-connected component of the 2-complex. Hence, no cut in the flow network may correspond to a persistent cycle of the infinite interval created by such a 1-simplex. Furthermore, unlike the finite interval case, we do not have a negative simplex whose dual can act as a source in the flow network.
Let (K, F, [b, +∞)) be an input to the problem where K is a weak (p+1)-pseudomanifold embedded in R^{p+1}, F : ∅ = K_0 ↪ K_1 ↪ ... ↪ K_n is a simplex-wise filtration of K, and [b, +∞) is an infinite interval of Dgm_p(F). By the definition of the problem, the task boils down to computing an optimal p-cycle containing σ_b^F in K_b. Note that K_b is also a weak (p+1)-pseudomanifold embedded in R^{p+1}.
Generically, assume that K̃ is an arbitrary weak (p+1)-pseudomanifold embedded in R^{p+1} and we want to compute an optimal p-cycle containing a given p-simplex σ̃ of K̃. By the embedding assumption, the connected components of R^{p+1} \ |K̃| are well defined and we call them the voids of K̃. The complex K̃ has a natural (undirected) dual graph structure as illustrated by Figure 5.6 for p = 1, where the graph vertices are dual to the (p+1)-simplices as well as the voids, and the graph edges are dual to the p-simplices. The duality between cycles and cuts is as follows: Since the ambient space R^{p+1} is contractible (homotopy equivalent to a point), every p-cycle in K̃ is the boundary of a (p+1)-dimensional region obtained by the point-wise union of certain (p+1)-simplices and/or voids. We can derive a cut of the dual graph (here the graph has no sources and sinks, so a cut is simply a partition of the graph's vertex set into two sets) by putting all vertices contained in the (p+1)-dimensional region into one vertex set and putting the rest into the other vertex set. On the other hand, for every cut of the graph, we can take the point-wise union of all the (p+1)-simplices and voids dual to the graph vertices in one set of the cut and derive a (p+1)-dimensional region. The boundary of the derived (p+1)-dimensional region is then a p-cycle in K̃. We observe that by making the source and sink dual to the two (p+1)-simplices or voids that σ̃ adjoins, we can build a flow network where a minimal cut produces an optimal p-cycle in K̃ containing σ̃.
The efficiency of the above algorithm is in part determined by the efficiency of the dual graph
construction. This step requires identifying the voids that the boundary p-simplices are incident
on; see Figure 5.6 for an illustration. A straightforward approach would be to first group the
boundary p-simplices into p-cycles by local geometry, and then build the nesting structure of these
p-cycles to correctly reconstruct the boundaries of the voids. This approach has a quadratic worst-
case complexity. To make the void boundary reconstruction faster, we assume that the simplicial
complex being worked on is p-connected so that building the nesting structure is not needed.
This reconstruction then runs in almost linear time. To satisfy the p-connectedness assumption, we begin the algorithm by taking K̃ as a p-connected subcomplex of K_b containing σ_b^F and continue only with this K̃. The computed output is still correct because the minimal cycle in K̃ is again a minimal cycle in K_b. We skip the details of constructing void boundaries, which can be done
in O(n log n) time. Also, we skip the proof of correctness of the following theorem. Interested
readers can consult [129] for details.

Theorem 5.17. Given an infinite interval [b, ∞) ∈ Dgm_p(F) for a filtration F of a weak (p+1)-pseudomanifold K embedded in R^{p+1}, an optimal persistent cycle for [b, ∞) can be computed in O(n²) time where n is the number of p- and (p+1)-simplices in K.

5.4 Notes and Exercises


The algorithm to compute an optimal homology basis based on a greedy strategy was first presented by Erickson and Whittlesey [156], who applied it to simplicial 2-manifolds (surfaces). Chen and Freedman [94] showed that the problem is NP-hard for all homology groups of dimension more than 1. It was shown in [136] that an optimal H_1-cycle basis can be computed in O(n^4) time for a simplicial complex with n simplices. The time complexity was improved to O(n^{ω+1}) by Busaryev et al. [60]. Finally, it was settled to O(n^3) in [130]. Borradaile et al. [43] proposed an algorithm for computing an optimal H_1-basis for graphs embedded on surfaces. For a graph with a total of n vertices and edges, the algorithm runs in O(g^3 n log n) time where g is the genus plus the number of boundaries in the surface.
The problem of computing a minimal homologous cycle in a given class is NP-hard even in dimension one as shown by Chambers et al. [73]. They proposed an algorithm for 1-cycles on surfaces utilizing the duality between minimal cuts of a surface-embedded graph and optimal homologous cycles of a dual complex. A better algorithm is proposed in [74]. Both algorithms are fixed-parameter tractable, running in time exponential in the genus of the surface. For general dimension, Borradaile et al. [44] showed that the OHCP problem in dimension p can be O(√(log n))-approximated and is fixed-parameter tractable for weak (p+1)-pseudomanifolds. The
only polynomial-time exact algorithm [94] in general dimension for OHCP works for p-cycles in
complexes embedded in R p+1 , which uses a reduction to minimal (s, t)-cuts. Interestingly, when
the coefficient is chosen to be Z instead of Z2 for the homology groups, the problem becomes
polynomial time solvable if there is no relative torsion as shown in [126]. The material presented
in Section 5.2 is taken from this paper.
Persistence adds an extra layer of complexity to the problem of computing minimal representative cycles. Escolar and Hiraoka [157] and Obayashi [247] formulated the problem as an integer program by adapting a similar formulation for the non-persistent case. Wu et al. [302] adapted the algorithm of Busaryev et al. [60] to present an exponential-time algorithm, as well as an A∗ heuristic in practice. The problem of computing optimal persistent cycles is NP-hard even for H_1 [128]. The problem becomes polynomial-time solvable for some special cases, such as computing optimal persistent 2-cycles in a 3-complex embedded in 3-dimensional space [129]. The materials in Section 5.3 are taken from this source.

Exercises
1. Show that every cycle in an H_1(K)-basis contains a simple cycle, and that these simple cycles together form an H_1(K)-basis themselves.

2. Design an O(n² log n + n²g) algorithm to compute the shortest non-trivial 1-cycle in a simplicial 2-complex K with n simplices and g = β_1(K). Do the same in O(n² log n) time when K is a 1-complex (a graph).
3. ([130]) We have given an O(n^ω + n²g^{ω−1}) algorithm for computing an optimal H_1-basis for a complex with n simplices. Taking g = Ω(n), this runs in O(n^{ω+1}) worst-case time. Give an O(n^3) algorithm for the problem.
4. How can one make the algorithm in [130] more efficient for a weighted graph G with n vertices and edges? For this, show that (i) an annotation for G can be computed in O(n²) time, (ii) this annotation can be utilized to compute the annotations for O(n²) candidate cycles in O(n³) time, and (iii) finally, an optimal basis can be computed in O(n³) time by the divide-and-conquer greedy algorithm in [130], though more efficiently.
5. Define a minmax basis of H_p(K) as a set of cycles which generate H_p(K) such that the maximum weight of the cycles is minimized among all such generators. Prove that an optimal H_p-cycle basis as defined in Definition 5.3 is also a minmax basis.

6. Prove that a simplicial p-complex embedded in R^p cannot have torsion in H_{p−1} and hence OHCP for (p−1)-cycles can be solved in polynomial time in this case.
7. Take an example of a triangulation of Möbius strip and show that the integer program
formulation of OHCP for it is not totally unimodular.

8. Professor Optimist claims that an optimal H_p-generator for K embedded in R^{p+1} can be obtained by computing optimal persistent p-cycles for infinite bars in any filtration of K. Show that he is wrong. Give a polynomial time algorithm for computing a non-trivial p-cycle that has the least weight in K.

9. Consider computing a persistent 1-cycle for a bar [b, d) given a filtration of an edge-
weighted complex K. Let c be a cycle created by the edge e = (u, v) at the birth time b
where c is formed by the edge e and the shortest path between u and v in the 1-skeleton
of the complex Kb . If [c] = 0 at Kd , prove that c is an optimal persistent cycle for the bar
[b, d).

10. Give an example where the above computed cycle using shortest path at the birth time is
not a persistent cycle.

11. For a finite interval [b, d) ∈ Dgm_p(F) of a filtration F of a weak (p+1)-pseudomanifold, one can take the two vertices of the dual edge of the creator p-simplex σ_b in the algorithm MinPersCycFin (Section 5.3.1) as source and sink respectively. Give an example to show that this does not work for computing a minimal persistent cycle for [b, d). What about taking the dual vertex of the destroyer simplex σ_d and the infinite vertex as the source and the sink respectively?

12. ([93]) For a vertex v in a complex with non-negative weights on edges, let the discrete geodesic ball B_v^r of radius r be the maximal subcomplex L ⊆ K so that the shortest path from v to every vertex in L is at most r. For a cycle c, let w(c) = min{r | c ⊆ B_v^r}. Give a polynomial time algorithm to compute an optimal H_p-cycle basis for any p ≥ 1 with these weights.
Chapter 6

Topological Analysis of Point Clouds

In this chapter, we focus on topological analysis of point cloud data (PCD), a common type of
input data across a broad range of applications. Often, there is a hidden space of interest, and
the PCD we obtain contains only observations / samples from that hidden space. If the sample
is sufficiently dense, it should carry information about the hidden space. We are interested in
topological information in particular. However, discrete points themselves do not have interesting
topology. To impose a connectivity that mimics that of the hidden space, we construct a simplicial
complex such as the Rips or Čech complex using the points as vertices. Then, an appropriate
filtration is constructed as a proxy for the same on the topological space that the PCD presumably
samples. This provides topological summaries such as the persistence diagrams induced by the
filtrations. Figure 6.1 [192] shows an example application of this approach. The PCD in this
case represents atomic configurations of silica in three different states: liquid, glass, and crystal
states. Each atomic configuration can be viewed as a set of weighted points, where each point
represents the center of an atom and its weight is the radius of the atom. The persistence diagrams
for the three states show distinctive features which can be used for further analysis of the phase
transitions. The persistence diagrams can also be viewed as a signature of the input PCD and can
be used to compare shapes (e.g. [78]) or provide other analysis.

Figure 6.1: Persistence diagrams of silica in liquid (left), glass (middle), and crystal (right) states.
Image taken from [192], reprinted by permission from Yasuaki Hiraoka et al. (2016, fig. 2).

We mainly focus on PCD consisting of a set of points P ⊆ (Z, dZ ) embedded in some metric
space Z equipped with a metric dZ . One of the most common choices for (Z, dZ ) in practice is


the d-dimensional Euclidean space Rd equipped with the standard L p -distance. We review the
relevant concepts of constructing Rips and Čech complexes, their filtrations, and describe the
properties of the resulting persistence diagrams in Section 6.1. In practice, the size of a filtration
can be prohibitively large. In Section 6.2, we discuss data sparsification strategies to approximate
topological summaries much more efficiently and with theoretical guarantees.
As we have mentioned, a PCD can be viewed as a window through which we can peek at topo-
logical properties of the hidden space. In particular, we can infer about the hidden homological
information using the PCD at hand if it samples the hidden space sufficiently densely. In Section
6.3, we provide such inference results for the cases when the hidden space is a manifold or is a
compact set embedded in the Euclidean space. To obtain theoretical guarantees, we also need to
introduce the language of sampling conditions to describe the quality of point samples. Finally,
in Section 6.4, we focus on the inference of scalar field topology from a set of point samples P,
as well as function values available at these samples. More precisely, we wish to estimate the
persistent homology of a real-valued function f : X → R from a set of discrete points P ⊂ X as
well as the values of f over P.

6.1 Persistence for Rips and Čech filtrations


Suppose we are given a finite set of points P in a metric space (Z, d_Z). Consider a closed ball B_Z(p, r) with radius r centered at each point p ∈ P and consider the space P^r := ∪_{p∈P} B_Z(p, r). The Čech complex w.r.t. P and a parameter r ≥ 0 is defined as (Definition 2.9)

C_r^Z(P) = {σ = {p_0, ..., p_k} | ∩_{i∈[0,k]} B_Z(p_i, r) ≠ ∅}.   (6.1)

We often omit Z from the subscript when its choice is clear. As mentioned in Chapter 2.2, the Čech complex C_r(P) is the nerve of the union of balls P^r. If the metric balls centered at points of P in the metric space (Z, d_Z) are convex, then the Nerve Theorem (Theorem 2.1) gives the following corollary.

Corollary 6.1. For a fixed r ≥ 0, if the metric ball B_Z(x, r) is convex for every x ∈ P, then C_r(P) is homotopy equivalent to P^r, and thus H_k(C_r(P)) ≅ H_k(P^r) for any dimension k ≥ 0.

The above result justifies the utility of Čech complexes. For example, if P ⊆ R^d and d_Z is the standard L_p-distance for p > 0, then the Čech complex C_r(P) is homotopy equivalent to the union of r-radius balls centered at points in P. Later in this chapter, we will also see an example where the points P are taken from a Riemannian manifold X equipped with the Riemannian metric d_X. When the radius r is small enough, the intrinsic metric balls also become convex. In both cases, the resulting Čech complex captures information of the union of r-balls P^r.
In general, it is not clear at which scale (radius r) one should inspect the input PCD. Varying the scale parameter r, we obtain a filtration of spaces P := {P^α ↪ P^{α′}}_{α≤α′} as well as a filtered sequence of simplicial complexes C(P) := {C^α(P) ↪ C^{α′}(P)}_{α≤α′}. The homotopy equivalence between P^r and C_r, if it holds, further induces an isomorphism between the persistence modules obtained from these two filtrations.

Proposition 6.2 ([91]). If the metric ball B(x, r) is convex for every x ∈ P and all r ≥ 0, then the persistence module H_k P is isomorphic to the persistence module H_k C(P). This also implies that their corresponding persistence diagrams are identical; that is, Dgm_k P = Dgm_k C(P), for any dimension k ≥ 0.
A related persistence-based topological invariant is given by the Vietoris-Rips filtration R(P) = {VR^α(P) ↪ VR^{α′}(P)}_{α≤α′}, where the Vietoris-Rips complex VR^r(P) for a finite subset P ⊆ (Z, d_Z) at scale r is defined as (Definition 2.10):

VR^r(P) = {σ = {p_0, ..., p_k} | d_Z(p_i, p_j) ≤ 2r for any i, j ∈ [0, k]}.   (6.2)
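As a concrete illustration of Eqn (6.2), the following minimal sketch builds VR^r(P) up to dimension 2 by brute force from the pairwise distances. It is illustrative only; practical implementations instead expand cliques of the 2r-neighborhood graph.

import numpy as np
from itertools import combinations

def rips_complex(points, r, max_dim=2):
    """Return all simplices of VR^r(points) up to dimension max_dim."""
    P = np.asarray(points, dtype=float)
    n = len(P)
    dist = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    simplices = [(i,) for i in range(n)]        # all vertices are present
    for k in range(1, max_dim + 1):             # a k-simplex has k+1 vertices
        for sigma in combinations(range(n), k + 1):
            # Eqn (6.2): every pairwise distance must be at most 2r.
            if all(dist[i, j] <= 2 * r for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices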

Recall from Chapter 4.1 that the Čech filtration and Vietoris-Rips filtration are multiplicatively 2-interleaved, meaning that their persistence modules are log 2-interleaved at the log-scale, and

d_b(Dgm^{log} C(P), Dgm^{log} R(P)) ≤ log 2   (Corollary 4.4).   (6.3)

Finite metric spaces. The above definitions of Čech or Rips complexes assume that P is embedded in an ambient metric space (Z, d_Z). It is possible that Z = P and we simply have a discrete metric space spanned by points in P, which we denote by (P, d_P). Obviously, the construction of Čech and Rips complexes can be extended to this case. In particular, the Čech complex C_r^P(P) is now defined as

C_r^P(P) = {σ = {p_0, ..., p_k} | ∩_{i∈[0,k]} B_P(p_i, r) ≠ ∅},   (6.4)

where B_P(p, r) := {q ∈ P | d_P(p, q) ≤ r}. However, note that when P ⊂ Z and d_P is the restriction of the metric d_Z to points in P, the Čech complex C_r^P(P) defined above can be different from the Čech complex C_r^Z(P), as the metric balls (B_P vs. B_Z) are different. In particular, in this case, we have the following relation between the two types of Čech complexes:

C_r^P(P) ⊆ C_r^Z(P) ⊆ C_{2r}^P(P).   (6.5)

On the other hand, in this setting, the two Rips complexes are the same because the definition of
Rips complex involves only pairwise distance between input points, not metric balls.
The persistence diagrams induced by the Čech and the Rips filtrations can be used as topological summaries for the input PCD P. We can then, for example, compare input PCDs by comparing these persistence diagram summaries.
Definition 6.1 (Čech, Rips distance). Given two finite point sets P and Q, equipped with appropriate metrics, the Čech distance between them is a pseudo-distance defined as:

d_Cech(P, Q) = max_k d_b(Dgm_k C(P), Dgm_k C(Q)).

Similarly, the Rips distance between P and Q is a pseudo-distance defined as:

d_Rips(P, Q) = max_k d_b(Dgm_k R(P), Dgm_k R(Q)).

These distances are stable with respect to the Hausdorff or the Gromov-Hausdorff distance between P and Q, depending on whether they are embedded in a common metric space or are viewed as two discrete metric spaces (P, d_P) and (Q, d_Q). We introduce the Hausdorff and Gromov-Hausdorff distances now. Given a point x and a set A from a metric space (X, d), let d(x, A) := inf_{a∈A} d(x, a) denote the closest distance from x to any point in A.

Definition 6.2 (Hausdorff distance). Given two compact sets A, B ⊆ (Z, d_Z), the Hausdorff distance between them is defined as:

d_H(A, B) = max{ max_{a∈A} d_Z(a, B), max_{b∈B} d_Z(b, A) }.

Note that the Hausdorff distance requires the input objects are embedded in a common am-
bient space. In case they are not embedded in any common ambient space, we use Gromov-
Hausdorff distance, which intuitively measures how much two input metric spaces differ from
being isometric.

Definition 6.3 (Gromov-Hausdorff distance). Given two metric spaces (X, d_X) and (Y, d_Y), a correspondence C is a subset C ⊆ X × Y so that (i) for every x ∈ X, there exists some (x, y) ∈ C; and (ii) for every y′ ∈ Y, there exists some (x′, y′) ∈ C. The distortion induced by C is

distort_C(X, Y) := (1/2) sup_{(x,y),(x′,y′)∈C} |d_X(x, x′) − d_Y(y, y′)|.

The Gromov-Hausdorff distance between (X, d_X) and (Y, d_Y) is the smallest distortion possible by any correspondence; that is,

d_GH(X, Y) := inf_{C⊆X×Y} distort_C(X, Y).

Theorem 6.3. Čech and Rips distances satisfy the following stability statements:

1. Given two finite sets P, Q ⊆ (Z, d_Z), we have

d_Cech(P, Q) ≤ d_H(P, Q) and d_Rips(P, Q) ≤ d_H(P, Q).

2. Given two finite metric spaces (P, d_P) and (Q, d_Q), we have

d_Cech(P, Q) ≤ 2 d_GH((P, d_P), (Q, d_Q)) and d_Rips(P, Q) ≤ d_GH((P, d_P), (Q, d_Q)).

Note that the bound on dCech (P, Q) in statement (2) of the above theorem has an extra factor
of 2, which comes due to the difference in metric balls – see the discussions after Eqn (6.4). We
also remark that (2) in the above theorem can be extended to the so-called totally bounded metric
spaces (which are not necessarily finite) (P, dP ) and (Q, dQ ) defined as follows. First, recall that
an ε-sample (Definition 2.17) of a metric space (Z, dZ ) is a finite set S ⊆ Z so that for every
z ∈ Z, dZ (z, S ) ≤ ε. A metric space (Z, dZ ) is totally bounded if there exists a finite ε-sample for
every ε > 0. Intuitively, such a metric space can be approximated by a finite metric space for any
resolution.

6.2 Approximation via data sparsification


One issue with using the Vietoris-Rips or Čech filtrations in practice is that their sizes can become huge, even for a moderate number of points. For example, when the scale r is larger than the diameter of a point set P, the Čech and the Vietoris-Rips complexes of P contain every simplex

spanned by points in P, in which case the size of the d-skeleton of C^r(P) or VR^r(P) is Θ(n^{d+1}) for n = |P|.

Figure 6.2: Vietoris-Rips complex: (b) at small scale, the Rips complex of the points shown in (a) requires the two white points; (c) the two white points become redundant at a larger scale.
On the other hand, as shown in Figure 6.2, as the scale r increases, certain points can become "redundant", i.e., have no or little contribution to the underlying space of the union of all r-radius balls. Based on this observation, one can approximate these filtrations with sparsified filtrations of much smaller size. In particular, as the scale r increases, the point set P with which one constructs a complex is gradually sparsified, keeping the total number of simplices in the complex linear in the input size of P, where the dimension of the embedding space is assumed to be fixed.
We describe two data sparsification schemes in Sections 6.2.1 and 6.2.2, respectively. We
focus on the Vietoris-Rips filtration for points in a Euclidean space Rd equipped with the standard
Euclidean distance d.

6.2.1 Data sparsification for Rips filtration via reweighting


Most of the concepts presented in this section apply to general finite metric spaces, though we describe them for finite point sets equipped with a Euclidean metric. The reason for this choice
is that the complexity analysis draws upon the specific property of Euclidean space. The reader is
encouraged to think about generalizing the definitions and the technique to other metric spaces.
First we restate the definition of δ-sample and δ-sparse sample in Definition 2.17 slightly
differently.

Definition 6.4 (Nets and net-tower). Given a finite set of points P ⊂ (R^d, d) and γ, γ′ ≥ 0, a subset Q ⊆ P is a (γ, γ′)-net of P if the following two conditions hold:

Covering condition: Q is a γ-sample for (P, d), i.e., for every p ∈ P, d(p, Q) ≤ γ.
Packing condition: Q is also γ′-sparse, i.e., for every q ≠ q′ ∈ Q, d(q, q′) ≥ γ′.

If γ = γ′, we also refer to Q as a γ-net of P.

A single-parameter family of nets {N_γ}_γ is called a net-tower of P if (i) there is a constant c > 0 so that for all γ ∈ R, N_γ is a (γ, γ/c)-net for P, and (ii) N_γ ⊇ N_{γ′} for any γ ≤ γ′.

Intuitively, a γ-net approximates a PCD P at resolution γ (Covering condition), while also being sparse (Packing condition). A net-tower provides a sequence of increasingly sparsified approximations of P.

Net-tower via farthest point sampling. We now introduce a specific net-tower constructed via the classical strategy of farthest point sampling, also called greedy permutation e.g. in [56, 70]. Given a point set P ⊂ (R^d, d), choose an arbitrary point p_1 from P and set P_1 = {p_1}. Pick p_i recursively as p_i ∈ argmax_{p∈P\P_{i−1}} d(p, P_{i−1})¹, and set P_i = P_{i−1} ∪ {p_i}. Now set t_{p_i} = d(p_i, P_{i−1}), which we refer to as the exit-time of p_i. Based on these exit-times, we construct the following two families of sets:

Open net-tower N = {N_γ}_{γ∈R} where N_γ := {p ∈ P | t_p > γ}.   (6.6)
Closed net-tower N̄ = {N̄_γ}_{γ∈R} where N̄_γ := {p ∈ P | t_p ≥ γ}.   (6.7)

It is easy to verify that both N_γ and N̄_γ are γ-nets, and the families N and N̄ are indeed two net-towers as γ increases. As γ increases, N_γ and N̄_γ can only change when γ = t_p for some p ∈ P. Hence the sequence of subsets P = P_n ⊃ P_{n−1} ⊇ ··· ⊇ P_2 ⊇ P_1 contains all the distinct sets in the open and closed net-towers {N_γ} and {N̄_γ}.

¹ Note that there may be multiple points that maximize d(p, P_{i−1}), making argmax_{p∈P\P_{i−1}} d(p, P_{i−1}) a set. We can choose p_i to be any point in this set.
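The following minimal sketch implements farthest point sampling and records the exit-times t_p; the function name is ours, and we adopt the convention t_{p_1} = ∞ so that the first point belongs to every net.

import numpy as np

def greedy_permutation(P):
    """P: (n, d) array of points. Returns (order, exit_times): the greedy
    ordering p_1, p_2, ... as indices into P, and the exit-time of each
    chosen point (t_{p_1} = infinity by convention)."""
    P = np.asarray(P, dtype=float)
    order, exit_times = [0], [np.inf]            # p_1: an arbitrary point
    dist = np.linalg.norm(P - P[0], axis=1)      # d(p, P_1) for all p
    for _ in range(1, len(P)):
        i = int(np.argmax(dist))                 # farthest from current P_i
        order.append(i)
        exit_times.append(dist[i])               # t_{p_i} = d(p_i, P_{i-1})
        dist = np.minimum(dist, np.linalg.norm(P - P[i], axis=1))
    return order, exit_times

The nets of Eqns (6.6)-(6.7) are then read off by thresholding the exit-times; for instance, N_γ consists of those points whose recorded exit-time exceeds γ.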
In what follows, we discuss a sparsification strategy for the Rips filtration of P using the above
net-towers. The approach can be extended to other net-towers, such as the net-tower constructed
using the net-tree data structure of [182].

Weights, weighted distance, and sparse Rips filtration. Given the exit-times t_p of all points p ∈ P, we now associate a weight w_p(α) to each point p at scale α as follows: for some constant 0 < ε < 1,

           | 0               if α ≤ t_p/ε,
w_p(α) =   | α − t_p/ε       if t_p/ε < α < t_p/(ε(1−ε)),
           | εα              if α ≥ t_p/(ε(1−ε)).

(The graph of this weight function stays at 0 up to α = t_p/ε, then rises with slope 1, and finally grows as εα.)
Claim 6.1. The weight function w_p is a continuous, 1-Lipschitz, and non-decreasing function.

The parameter ε controls the resolution of the sparsification. The net-induced distance at scale α between input points is defined as:

d̂_α(p, q) := d(p, q) + w_p(α) + w_q(α).   (6.8)

Definition 6.5 (Sparse (Vietoris-)Rips). Given a set of points P ⊂ R^d, a constant 0 < ε < 1, and the open net-tower {N_γ} as well as the closed net-tower {N̄_γ} for P as introduced above, the open sparse-Rips complex at scale α is defined as

Q^α := {σ ⊆ N_{ε(1−ε)α} | ∀p, q ∈ σ, d̂_α(p, q) ≤ 2α};   (6.9)

while the closed sparse-Rips at scale α is defined as


α
Q := {σ ⊆ N ε(1−ε)α | ∀p, q ∈ σ, b
dα (p, q) ≤ 2α}. (6.10)
α
Set Sα := ∪β≤α Q , which we call the cumulative complex at scale α. The (ε-)sparse Rips filtration
then refers to the R-indexed filtration S = {Sα ,→ Sβ }α≤β .
α α
Obviously, Qα ⊆ Q . Note that for α < β, Qα is not necessarily included in Qβ (neither is Q
β
in Q ); while the inclusion Sα ⊆ Sβ always holds.
In what follows, we show that the sparse Rips filtration approximates the standard Vietoris-Rips filtration {VR^r(P)} defined over P, and that the size of the sparse Rips filtration is only linear in n for any fixed dimension d which is assumed to be constant. The main results are summarized in the following theorem.

Theorem 6.4. Let P ⊂ R^d be a set of n points where d is a constant, and R(P) = {VR^r(P)} be the Vietoris-Rips filtration over P. Given net-towers {N_γ} and {N̄_γ} induced by exit-times {t_p}_{p∈P}, let S(P) = {S^α} be its corresponding ε-sparse Rips filtration as defined in Definition 6.5. Then, for a fixed 0 < ε < 1/3,

(i) S(P) and R(P) are multiplicatively 1/(1−ε)-interleaved at the homology level. Thus, for any k ≥ 0, the persistence diagram Dgm_k S(P) is a log(1/(1−ε))-approximation of Dgm_k R(P) at the log-scale.

(ii) For any fixed dimension k ≥ 0, the total number of k-simplices that ever appear in S(P) is Θ((1/ε)^{kd} n).
In the remainder of this section, we sketch the proof of the above theorem.

Proof of part (i) of Theorem 6.4. To relate S(P) to R(P), we need to go through a sequence of intermediate steps. First, we define the relaxed Rips complex at scale α as

V̂R^α(P) := {σ ⊆ P | ∀p, q ∈ σ, d̂_α(p, q) ≤ 2α}.

The following claim ensures that the relaxed Rips complexes form a valid filtration connected by inclusions R̂(P) = {V̂R^α(P) ↪ V̂R^β(P)}_{α≤β}, which we call the relaxed Rips filtration.

Claim 6.2. If d̂_α(p, q) ≤ 2α ≤ 2β, then d̂_β(p, q) ≤ 2β.

Proof. The weight function w_p is 1-Lipschitz for any p ∈ P (Claim 6.1). Thus we have that

d̂_β(p, q) = d(p, q) + w_p(β) + w_q(β)
          ≤ d(p, q) + w_p(α) + (β − α) + w_q(α) + (β − α)
          = d(p, q) + w_p(α) + w_q(α) − 2α + 2β ≤ 2β.

The last inequality follows from d(p, q) + w_p(α) + w_q(α) = d̂_α(p, q) ≤ 2α. □

In what follows, we drop the argument P from notations such as in complexes VRα (P) or in
sparse Rips filtration S(P) when the point set in question is understood.

Proposition 6.5. Let C = 1/(1−ε). Then for any α ≥ 0 we have that VR^{α/C} ⊆ V̂R^α ⊆ VR^α.

Next, we relate the filtrations S and R via the relaxed Rips filtration R̂ by connecting the sparse Rips complexes Q^α and Q̄^α. Consider the following projection of P to points in the net N_{ε(1−ε)α}, which are also the vertices of Q^α:

π_α(p) = p if p ∈ N_{ε(1−ε)α};  otherwise, π_α(p) = argmin_{q∈N_{εα}} d(p, q).

Again, if argmin_{q∈N_{εα}} d(p, q) contains more than one point, we set π_α(p) to be an arbitrary one. This projection is well-defined as N_{εα} ⊆ N_{ε(1−ε)α} given that 0 < ε < 1/3 < 1. We need several technical results on this projection map, which we rely on later to construct maps between appropriate versions of Rips complexes. First, the following two results are easy to show.

Fact 6.1. For every p ∈ P, d(p, π_α(p)) ≤ w_p(α) − w_{π_α(p)}(α) ≤ εα.

Fact 6.2. For every pair p, q ∈ P, we have that d̂_α(p, π_α(q)) ≤ d̂_α(p, q).

We are now ready to show that inclusion induces an isomorphism between the homology
groups of the sparse Rips complex and the relaxed Rips complex.

Proposition 6.6. For any α ≥ 0, the inclusion i : Q^α ↪ V̂R^α induces an isomorphism at the homology level; that is, H∗(Q^α) ≅ H∗(V̂R^α) under the homomorphism i∗ induced by i.

Proof. First, we consider the projection map π_α and argue that it induces a simplicial map π_α : V̂R^α → Q^α which is in fact a simplicial retraction². Next, we show that the map i ∘ π_α : V̂R^α → V̂R^α is contiguous to the identity map id : V̂R^α → V̂R^α. As π_α is a simplicial retraction, it follows that i∗ is an isomorphism (Lemma 2 of [275]).

To see that π_α is a simplicial map, apply Fact 6.2 twice to obtain

d̂_α(π_α(p), π_α(q)) ≤ d̂_α(p, π_α(q)) ≤ d̂_α(p, q).   (6.11)

Since both Q^α and V̂R^α are clique complexes, this then implies that π_α is a simplicial map. Furthermore, it is easy to see that it is a retraction as π_α(q) = q for any q in the vertex set of Q^α (which is N_{ε(1−ε)α}).

Now to show that i ∘ π_α is contiguous to id, we observe that for any p, q ∈ P with d̂_α(p, q) ≤ 2α, all edges among {p, q, π_α(p), π_α(q)} exist and thus all simplices spanned by them exist in V̂R^α. Indeed, that d̂_α(π_α(p), π_α(q)) ≤ 2α is already shown above in Eqn (6.11). Combining Fact 6.1 with the fact that w_p(α) ≤ εα, we have that

d̂_α(p, π_α(p)) = d(p, π_α(p)) + w_p(α) + w_{π_α(p)}(α) ≤ 2w_p(α) ≤ 2εα < 2α.

Furthermore, by Fact 6.2, d̂_α(p, π_α(q)) ≤ d̂_α(p, q) ≤ 2α. Symmetric arguments show that d̂_α(q, π_α(q)), d̂_α(q, π_α(p)) ≤ 2α. This establishes that i ∘ π_α is contiguous to id, and proves the proposition. □

² A simplicial retraction f : K → L is a simplicial map from K ⊆ L to L so that f(σ) = σ for any σ ∈ K.

The closed sparse-Rips complex Q̄^α is the relaxed Rips complex over the vertex set N̄_{ε(1−ε)α}, which is a superset of the vertex set of Q^α. Hence the above proposition also holds for the inclusion Q^α ↪ Q̄^α. It then follows that H∗(Q^α) ≅ H∗(Q̄^α). Finally, we show that the inclusion also induces an isomorphism between H∗(Q̄^α) and H∗(S^α), which when combined with the above results connects S^α and V̂R^α.

Proposition 6.7. For any α ≥ 0, the inclusion h : Q̄^α ↪ S^α induces an isomorphism at the homology level, that is, H∗(Q̄^α) ≅ H∗(S^α) under h∗.

Proof. Consider the sequence {S^α}_{α∈R}. First, we discretize α to obtain distinct values α_0 < α_1 < ··· < α_m so that S^{α_0} = ∅, and the α_i's are exactly the times when the combinatorial structure of S^α changes. As S^α = ∪_{β≤α} Q̄^β, these are also exactly the moments when the combinatorial structure of Q̄^α changes. Hence we only need to prove the statement for such α_i's, and it will then work for all α's. Set λ_i := ε(1−ε)α_i. Note that the vertex set for Q̄^{α_i} is N̄_{λ_i} by the definition of Q̄^α in Eqn (6.10).

Now fix a k ≥ 0. We will show that h : Q̄^{α_k} ↪ S^{α_k} induces an isomorphism at the homology level. We use some intermediate complexes

T_{i,k} := ∪_{j=i}^{k} Q̄^{α_j},  for i ∈ [1, k].

Obviously, T_{1,k} = S^{α_k} while T_{k,k} = Q̄^{α_k}. Set h_i : T_{i+1,k} ↪ T_{i,k}. The inclusion h : Q̄^{α_k} ↪ S^{α_k} can then be written as h = h_1 ∘ h_2 ∘ ··· ∘ h_{k−1}. In what follows, we prove that h_i : T_{i+1,k} ↪ T_{i,k} induces an isomorphism at the homology level for each i ∈ [1, k−1], which then proves the proposition.

First, note that while T_{i,k} is not necessarily the same as Q̄^{α_i}, they share the same vertex set. Now, because of our choices of the α_i's and λ_i's, the vertex set of T_{i+1,k}, which is the vertex set of Q̄^{α_{i+1}}, namely N̄_{λ_{i+1}}, equals N_{λ_i}. Hence we can consider the projection π_{α_i} : T_{i,k} → T_{i+1,k} given by the projection of the vertex set N_{λ_{i−1}} = N̄_{λ_i} of T_{i,k} to the vertex set N_{λ_i} = N̄_{λ_{i+1}} of T_{i+1,k}. To prove that h_i induces an isomorphism at the homology level, by Lemma 2 of [275], it suffices to show that (i) π_{α_i} is a simplicial retraction, and (ii) h_i ∘ π_{α_i} is contiguous to the identity map id : T_{i,k} → T_{i,k}.

To prove (i), it is easy to verify that π_{α_i} is a retraction. To see that π_{α_i} induces a simplicial map, we need to show that for every σ ∈ T_{i,k}, π_{α_i}(σ) ∈ T_{i+1,k}. As π_{α_i} is a retraction, we only need to prove this for every σ ∈ T_{i,k} \ T_{i+1,k}. On the other hand, note that by definition, T_{i,k} \ T_{i+1,k} ⊆ Q̄^{α_i}. To this end, the argument in Proposition 6.6 also shows that π_{α_i} : Q̄^{α_i} → Q^{α_i} is a simplicial map, and furthermore, h′ ∘ π_{α_i} is contiguous to id′ : Q̄^{α_i} → Q̄^{α_i}, where h′ : Q^{α_i} ↪ Q̄^{α_i}. Because of our choice of the α_i's, Q^{α_i} and Q̄^{α_{i+1}} have the same vertex set, which is N_{λ_i}. Furthermore, for every edge (p, q) ∈ Q^{α_i}, we have that d̂_{α_i}(p, q) ≤ 2α_i. As α_i < α_{i+1}, it follows from Claim 6.2 that d̂_{α_{i+1}}(p, q) ≤ 2α_{i+1}. Hence, the edge (p, q) is in Q̄^{α_{i+1}}. This implies that Q^{α_i} ⊆ Q̄^{α_{i+1}}. Putting everything together, it follows that, for every σ ∈ T_{i,k} \ T_{i+1,k} ⊆ Q̄^{α_i}, we have

π_{α_i}(σ) ∈ Q^{α_i} ⊆ Q̄^{α_{i+1}} ⊆ T_{i+1,k}.

Therefore, π_{α_i} is a simplicial map. This finishes the proof of (i).

Now we prove (ii), that is, h_i ∘ π_{α_i} is contiguous to the identity map id : T_{i,k} → T_{i,k}. This means that we need to show for every σ ∈ T_{i,k}, σ ∪ π_{α_i}(σ) ∈ T_{i,k}. Again, as π_{α_i} is a simplicial retraction, we only need to show this for σ ∈ T_{i,k} \ T_{i+1,k} ⊆ Q̄^{α_i}. As mentioned above, using the same argument as in Proposition 6.6, we know that h′ ∘ π_{α_i} is contiguous to the identity id′ : Q̄^{α_i} → Q̄^{α_i}. Hence we have that for every σ ∈ Q̄^{α_i}, σ ∪ π_{α_i}(σ) ∈ Q̄^{α_i}. It follows that σ ∪ π_{α_i}(σ) ∈ T_{i,k} as Q̄^{α_i} ⊆ T_{i,k}. This proves (ii), completing the proof of the proposition. □

Combining Proposition 6.6 (as well as the discussion after it) and Proposition 6.7, we have that {S^α} and {V̂R^α} induce isomorphic persistence modules. This, together with Proposition 6.5, implies part (i) of Theorem 6.4.

Proof of part (ii) of Theorem 6.4. Let S^(k) denote the set of k-simplices that ever appear in S(P), which is also the set of k-simplices in the last complex S^∞ of S(P). To bound the size of S^(k), we charge each simplex in S^(k) to its vertex with the smallest exit-time. Observe that a point p ∈ P does not contribute to any new edge in the sparse Rips complex Q̄^β for β > t_p/(ε(1−ε)). This means that to bound the number of simplices charged to p, we only need to bound such simplices in Q̄^{α_p} with α_p = t_p/(ε(1−ε)).

Set E(p) = {q ∈ P | (p, q) ∈ Q̄^{α_p} and t_p ≤ t_q}. We add p to E(p) too. We claim that |E(p)| = O((1/ε)^d). In particular, consider the closed net-tower {N̄_γ}; recall that N̄_γ is a γ-net. As E(p) ⊆ N̄_{t_p}, the packing condition of the net implies that the closest pair in E(p) has distance at least t_p between them. On the other hand, for each (p, q) ∈ Q̄^{α_p}, we have d̂_{α_p}(p, q) ≤ 2α_p, implying that E(p) ⊆ B(p, 2α_p). A simple packing argument then implies that the number of points in E(p) is

O((2α_p/t_p)^d) = O((2/(ε(1−ε)))^d) = O((1/ε)^d).

The last equality follows because ε < 1/3 and thus 1 − ε ≥ 2/3. The total number of k-simplices charged to p is bounded by O((1/ε)^{kd}), and the total number of k-simplices in S(P) is O((1/ε)^{kd} n), proving part (ii) of Theorem 6.4.

6.2.2 Approximation via simplicial tower


We now describe a different sparsification strategy by directly building a simplicial tower of Rips
complexes connected by simplicial maps (Definition 4.1 and the discussion below it) whose per-
sistent homology also approximates that of the standard Rips-filtration. This sparsification is con-
ceptually simpler, but its approximation quality is worse than the one introduced in the previous
section.
Given a set of points P ⊂ R^d, α > 0, and some 0 < ε < 1, we are interested in the following filtration (which is a subsequence of the standard Rips filtration):

VR^α(P) ↪ VR^{α(1+ε)}(P) ↪ VR^{α(1+ε)²}(P) ↪ ··· ↪ VR^{α(1+ε)^m}(P).   (6.12)

We now construct a sparsified sequence by setting P_0 := P, building a sequence of point sets P_k, k = 0, 1, ..., m, where P_{k+1} is an (αε/2)(1+ε)^{k−1}-net of P_k, and terminating the process when P_m is of constant size.

Consider the following vertex map π_k : P_k → P_{k+1}, for any k ∈ [0, m−1], where π_k(v) is the nearest neighbor of v ∈ P_k in P_{k+1}. Define π̂_k : P_0 → P_{k+1} as π̂_k := π_k ∘ ··· ∘ π_0. Based on the fact that P_{k+1} is an (αε/2)(1+ε)^{k−1}-net of P_k, it can be verified that π_k induces a simplicial map

π_k : VR^{α(1+ε)^k}(P_k) → VR^{α(1+ε)^{k+1}}(P_{k+1}),

which further gives rise to a simplicial map π̂_k : VR^α(P_0) → VR^{α(1+ε)^{k+1}}(P_{k+1}). We thus have the following tower of simplicial complexes:

Ŝ : VR^α(P_0) --π_0--> VR^{α(1+ε)}(P_1) --π_1--> ··· --π_{m−1}--> VR^{α(1+ε)^m}(P_m).   (6.13)
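A minimal sketch of this construction is given below (assuming scipy for nearest-neighbor queries; all helper names are ours). It greedily thins P_k into the net P_{k+1} and records the vertex map π_k as an index array.

import numpy as np
from scipy.spatial import cKDTree

def thin_to_net(P, delta):
    """Greedily extract (indices of) a delta-net of the rows of P."""
    kept = []
    for i, x in enumerate(P):
        if not kept or np.linalg.norm(P[kept] - x, axis=1).min() >= delta:
            kept.append(i)
    return np.asarray(kept)

def build_point_tower(P, alpha, eps, m):
    """Return the point sets P_0, ..., P_m and the vertex maps pi_k, each
    stored as an array mapping indices of P_k to indices of P_{k+1}."""
    P = np.asarray(P, dtype=float)
    levels, maps = [P], []
    for k in range(m):
        delta = 0.5 * alpha * eps * (1 + eps) ** (k - 1)  # net parameter
        nxt = levels[-1][thin_to_net(levels[-1], delta)]
        _, nn = cKDTree(nxt).query(levels[-1])  # nearest neighbor = pi_k
        levels.append(nxt)
        maps.append(nn)
    return levels, maps

The greedy thinning keeps a point only if it is at least delta away from every point kept so far, which yields both the packing and the covering conditions of a delta-net.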
Claim 6.3. For any fixed α ≥ 0, ε ≥ 0, and any integer k ≥ 0, each triangle in the following diagram commutes at the homology level:

VR^{α(1+ε)^k}(P_0) ------ i_k ------> VR^{α(1+ε)^{k+1}}(P_0)
        ↑ j_k           ↘ π̂_k                ↑ j_{k+1}
VR^{α(1+ε)^k}(P_k) ------ π_k ------> VR^{α(1+ε)^{k+1}}(P_{k+1})

Here, the maps i_k's and j_k's are canonical inclusions, and the diagonal map π̂_k goes from VR^{α(1+ε)^k}(P_0) to VR^{α(1+ε)^{k+1}}(P_{k+1}).


The above result implies that at the homology level, the sequence in Eqn (6.13) and the sequence in Eqn (6.12) are weakly (1+ε)-interleaved in a multiplicative manner. In particular, different from the interleaving introduced by Definition 4.4 in Chapter 4.1, here the interleaving relations only hold at discrete index values of the filtrations.
Definition 6.6 (Weak interleaving of vector space towers). Let U = {U_a --u_{a,b}--> U_b}_{a_0≤a≤b} and V = {V_a --v_{a,b}--> V_b}_{a_0≤a≤b} be two vector space towers over an index set A = {a ∈ R | a ≥ a_0} with resolution a_0 ≥ 0. For some real number ε ≥ 0, we say that they are weakly ε-interleaved if there are two families of linear maps φ_i : U_{a_0+iε} → V_{a_0+(i+1)ε} and ψ_i : V_{a_0+iε} → U_{a_0+(i+1)ε}, for any integer i ≥ 0, such that any subdiagram of the following diagram commutes:

U:  U_{a_0} → U_{a_0+ε} → U_{a_0+2ε} → ······ → U_{a_0+mε} → ···
                                                                        (6.14)
V:  V_{a_0} → V_{a_0+ε} → V_{a_0+2ε} → ······ → V_{a_0+mε} → ···

(Here the diagonal maps φ_i go down-right from U to V and the maps ψ_i go up-right from V to U, alternating between the two towers.)

It turns out that to verify the commutativity of the diagram in Eqn. (6.14), it is sufficient to verify it for all subdiagrams of the form as in Eqn. (4.3). Furthermore, weakly ε-interleaved persistence modules also have bounded bottleneck distances between their persistence diagrams [77], though the distance bound is relaxed to 3ε; that is, if U and V are weakly ε-interleaved, then d_b(Dgm U, Dgm V) ≤ 3ε. Analogous results hold for the multiplicative setting. Finally, using a similar packing argument as before, one can also show that the total number of k-simplices that ever appear in the simplicial-map based sparsification Ŝ is linear in n (assuming that k and the dimension d are both constant). To summarize:

Theorem 6.8. Given a set of n points P ⊂ R^d, we can 3 log(1+ε)-approximate the persistence diagram of the discrete Rips filtration in Eqn. (6.12) by that of the filtration in Eqn. (6.13) at the log-scale. The number of k-simplices that ever appear in the filtration in Eqn. (6.13) is O((1/ε)^{O(kd)} n).

6.3 Homology inference from PCDs


So far, we considered the problem of approximating the persistence diagram of a filtration created out of a given PCD. Now we consider the problem of inferring certain homological structure of a (hidden) domain from which the input PCD is presumably sampled. More specifically, the problem we consider is: Given a finite set of points P ⊂ R^d, residing on or around a hidden domain X ⊆ R^d of interest, compute or approximate the rank of H∗(X) using the input PCD P. Later in this chapter, X is assumed to be either a smooth Riemannian manifold embedded in R^d, or simply a compact set of R^d.

Main ingredients. Since points themselves do not have interesting topology, we first construct
a certain simplicial complex K, typically a Čech or a Vietoris-Rips complex from P. Next, we
compute the homological information of K as a proxy for the same of X. Of course, the approx-
imation becomes faithful only when the given sample P is sufficiently dense and the parameters
used for building the complexes are appropriate. The high level approach works as follows.

Input: A finite point set P ⊂ Rd “approximating” a hidden space X ⊂ Rd .


Step 1. Compute the Čech complex C^α(P), or a pair of Rips complexes VR^α(P) and VR^{α′}(P) for some appropriate 0 < α < α′.


Step 2. In the case of Čech complex, return dim(H∗ (Cα (P))) as an approximation of dim(H∗ (X)).
In the case of Rips complex, return rank (im (i∗ )), where the homomorphism
i∗ : H∗ (VRα (P)) → H∗ (VRα (P)) is induced by the inclusion VRα (P) ⊆ VRα (P).
0 0

To provide quantitative statements on the approximation quality of the outcome of the above
approach, we need to describe first what the quality of the input PCD P is, often referred to as
the sampling conditions. Intuitively, a better approximation in homology is achieved if the input
points P “approximates” or “samples” X better. The quality of input points is often measured by
the Hausdorff distance w.r.t. the Euclidean distances between PCD P and the hidden domain X of
interest (Definition 6.2), such as requiring that dH (P, X) ≤ ε for some ε > 0. Note that points in
P do not necessarily lie in X. The approximation guarantee for dim(H∗ (X)) relies on relating the
distance fields induced by X and by the sample P. We describe the distance field and feature sizes
of X in Section 6.3.1. We present how to infer homology for smooth manifolds and compact sets
from data in Section 6.3.2 and Section 6.3.3 respectively. In Section 6.4, we discuss inferring the
persistent homology induced by a scalar function f : X → R on X.

6.3.1 Distance field and feature sizes


To describe how well P samples X, we introduce two notions of the so-called "feature size" of X: the local feature size and the weak feature size, both related to the distance field d_X w.r.t. X.
Definition 6.7 (Distance field). Given a compact set X ⊂ R^d, the distance field (w.r.t. X) is

d_X : R^d → R,  x ↦ d(x, X),

where d is the Euclidean distance associated to R^d. The α-offset of X is defined as X^α := {x ∈ R^d | d_X(x) ≤ α}, which is simply the sub-level set d_X^{-1}((−∞, α]) of d_X.

Given x ∈ R^d, let Π(x) ⊆ X denote the set of closest points of x in X; that is,
Π(x) = {y ∈ X | d(x, y) = d_X(x)}.
The medial axis of X, denoted by MX , is the closure of the set of points with more than one closest
point in X; that is,
MX = closure{x ∈ Rd | |Π(x)| ≥ 2}.
Intuitively, |Π(x)| ≥ 2 implies that the maximal Euclidean ball centered at x whose interior is free
of points in X meets X in more than one point on its boundary. Hence, MX is the closure of the
centers of such maximal empty balls.
Definition 6.8 (Local feature size and reach). For a point x ∈ X, the local feature size at x,
denoted by lfs(x), is defined as the minimum distance to the medial axis MX ; that is,
lfs(x) := d(x, MX ).
The reach of X, denoted by ρ(X), is the minimum local feature size of any point in X.
The concept has been primarily developed for the case when X is a smooth manifold embed-
ded in Rd . Indeed, the local feature size can be zero at a non-smooth point: consider a planar
polygon; its medial axis intersects its vertices, and the local feature size at a vertex is thus zero.
The reach of a smoothly embedded manifold could also be zero; see Section 1.2 of [119] for an
example. Next, we describe a “weaker" notion of feature size [89, 90], which is more suitable for
compact subsets of Rd .

Critical points of distance field. The distance function dX introduced above is not everywhere
differentiable. Its gradient is defined on Rd \{X ∪MX }. However, one can still define the following
vector which extends the notion of gradient of dX to include the medial axis MX : Given any point
x ∈ Rd \ X, there exists a unique closed ball with minimal radius that encloses Π(x) [225]. Let
c(x) denote the center of this minimal enclosing ball, and r(x) its radius. It is easy to see that for any x ∈ R^d \ M_X, this ball degenerates to the unique point in Π(x), which is then c(x).
Definition 6.9 (Generalized vector field). Define the following vector field ∇d : Rd \ X → Rd, where the (generalized) gradient vector at x ∈ Rd \ X is

∇d(x) = (x − c(x)) / dX(x).

The critical points of ∇d are the points x for which ∇d(x) = 0. We also call the critical points of ∇d the critical points of the distance function dX.
This generalized gradient field ∇d coincides with the gradient of the distance function dX for points in Rd \ {X ∪ MX}. The distance field (distance function) and its critical points were previously studied in, e.g., [177], and have played an important role in sampling theory and homology inference. In general, a point x ∈ Rd \ X is a critical point if and only if x is contained in the convex hull of Π(x). (The convex hull of a compact set A ⊂ Rd is the smallest convex set that contains A.) Necessarily, all critical points of ∇d belong to the medial axis MX of X. For the case where X is a finite set of points in Rd, the critical points of dX are the non-empty intersections of the Delaunay simplices with their dual Voronoi cells (if they exist) [119].
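For a finite point set P, the convex-hull characterization above yields a simple numerical criticality test. A hedged sketch (our own illustrative names; the nearest points are gathered up to a tolerance, and hull membership is decided by a small linear feasibility problem):

import numpy as np
from scipy.optimize import linprog

def is_critical(x, P, tol=1e-9):
    """x (not in P) is critical for d_P iff x lies in the convex hull of Pi(x)."""
    x, P = np.asarray(x, float), np.asarray(P, float)
    dists = np.linalg.norm(P - x, axis=1)
    Pi = P[dists <= dists.min() + tol]     # the set of nearest points Pi(x)
    k = len(Pi)
    # Feasibility: lambda >= 0, sum(lambda) = 1, sum(lambda_i * p_i) = x.
    A_eq = np.vstack([Pi.T, np.ones((1, k))])
    b_eq = np.append(x, 1.0)
    res = linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * k)
    return res.success

P = np.array([[0.0, 0.0], [2.0, 0.0]])
print(is_critical([1.0, 0.0], P))   # True: the midpoint of the two samples
print(is_critical([1.0, 0.5], P))   # False: x is not in the hull of Pi(x)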

Definition 6.10 (Weak feature size). Let C denote the set of critical points of ∇d. The weak feature size of X, denoted by wfs(X), is the distance between X and C; that is,

wfs(X) = min_{x ∈ X} inf_{c ∈ C} d(x, c).

Proposition 6.9. If 0 < α < α′ are such that there is no critical value of dX in the closed interval [α, α′], then Xα′ deformation retracts onto Xα. In particular, this implies that H∗(Xα) ≅ H∗(Xα′).

In the homology inference frameworks, the reach is usually used for the case when X is a
smoothly embedded manifold, while the weak feature size is used for general compact spaces.

6.3.2 Data on manifold


We now consider the problem of homology inference from a point sample of a manifold. We first
state a standard result from linear algebra (see also the Sandwich Lemma from [91]), which we
use several times in homology inference.
Fact 6.3. Given a sequence A → B → C → D → E → F of homomorphisms (linear maps)
between finite-dimensional vector spaces over some field, if rank (A → F) = rank (C → D), then
this quantity also equals rank (B → E).
Specifically, if A → B → C → E → F is a sequence of homomorphisms such that rank (A →
F) = dim C, then rank (B → E) = dim C.
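As a quick sanity check of the first part of Fact 6.3, one can compose explicit matrices (a toy example of our own, not from the text) and compare ranks:

import numpy as np

# Linear maps A -> B -> C -> D -> E -> F with rank(A->F) = rank(C->D);
# Fact 6.3 then forces rank(B->E) to be the same.
I = np.eye(2)
f1, f2, f4, f5 = I, I, I, I                 # A->B, B->C, D->E, E->F
f3 = np.array([[1.0, 0.0], [0.0, 0.0]])     # C->D: a rank-1 projection

rank = np.linalg.matrix_rank
A_to_F = f5 @ f4 @ f3 @ f2 @ f1
B_to_E = f4 @ f3 @ f2
print(rank(A_to_F), rank(f3), rank(B_to_E))  # 1 1 1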
Let P be a point set sampled from a manifold X ⊂ Rd . We construct either the Čech complex
Cα (P), or a pair of Rips complexes VRα (P) ,→ VR2α (P) for some parameter α > 0. The homology
groups of these spaces are related as follows.
H∗(X) ←→ H∗(Pα) ←→ H∗(Cα(P)) ←→ image(H∗(VRα) → H∗(VR2α))    (6.15)

Here the first identification is given by Proposition 6.10, the second by the Nerve Theorem, and the third by Fact 6.3.

Specifically, recall that Ar is the r-offset of A which also equals the union of balls ∪a∈A B(a, r).
The connection between the discrete samples P and the manifold X is made through the union of
balls Pα. The following result is a variant of a result by Niyogi, Smale, and Weinberger [245].³
Proposition 6.10. Let P ⊂ Rd be a finite point set such that dH(X, P) ≤ ε, where X ⊂ Rd is a smooth manifold with reach ρ(X). If 3ε ≤ α ≤ (3/4)√(3/5) ρ(X), then H∗(Pα) is isomorphic to H∗(X).

The Čech complex Cα(P) is the nerve complex for the set of balls {B(p, α) | p ∈ P}. As Euclidean balls are convex, the Nerve Theorem implies that Cα(P) is homotopy equivalent to Pα. It follows that we can use the Čech complex Cα(P), for an appropriate α, to infer the homology of X via the isomorphisms H∗(X) ≅ H∗(Pα) ≅ H∗(Cα(P)). The first isomorphism follows from Proposition 6.10 and the second one from the homotopy equivalence between the nerve and the space.
A stronger statement in fact holds: for any α ≤ β, the following diagram commutes:

    H∗(Pα) ──i∗──→ H∗(Pβ)                    (6.16)
      │h∗              │h∗
      ↓                ↓
    H∗(Cα(P)) ──i∗──→ H∗(Cβ(P))

³The result of [245] assumes that P ⊆ X, in which case it shows that Pα deformation retracts to X. In our statement P is not necessarily from X, and the isomorphism follows from results of [245] and Fact 6.3.

Here, i∗ stands for the homomorphism induced by inclusions, and h∗ is the homomorphism in-
duced by the homotopy equivalence h : Pα → Cα (P) given by the Nerve Theorem. This leads to
the following theorem on estimating H∗ (X) from a pair of Rips complexes.

Theorem 6.11. Given a smooth manifold X embedded in Rd, let ρ(X) be its reach. Let P ⊂ Rd be a finite sample such that dH(P, X) ≤ ε. For any 3ε ≤ α ≤ (3/16)√(3/5) ρ(X), let i∗ : H∗(VRα) → H∗(VR2α) be the homomorphism induced by the inclusion i : VRα ,→ VR2α. We have that

rank (im (i∗)) = dim(H∗(Cα(P))) = dim(H∗(X)).


Proof. By Eqn. (6.16) and Proposition 6.10, we have that for 3ε ≤ α ≤ β ≤ (3/4)√(3/5) ρ(X),

H∗(X) ≅ H∗(Pα) ≅ H∗(Cα(P)) ≅ H∗(Cβ(P)),    (6.17)

where the last isomorphism is induced by inclusion. On the other hand, recall the interleaving
relation between the Čech and the Rips complexes:

· · · Cα (P) ⊆ VRα (P) ⊆ C2α (P) ⊆ VR2α (P) ⊆ C4α (P) · · · .

We thus have the following sequence of homomorphisms induced by inclusion:

H∗ (Cα (P)) → H∗ (VRα (P)) → H∗ (C2α (P)) → H∗ (VR2α (P)) → H∗ (C4α (P)).

We have H∗(Cα(P)) ≅ H∗(C2α(P)) ≅ H∗(C4α(P)) by Eqn. (6.17); note that the condition α ≤ (3/16)√(3/5) ρ(X) guarantees 4α ≤ (3/4)√(3/5) ρ(X), so Eqn. (6.17) indeed applies up to C4α(P). Thus we have

rank(H∗(Cα(P)) → H∗(C4α(P))) = dim(H∗(Cα(P))).

The theorem then follows from the second part of Fact 6.3. 
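In degree 0, the quantity in Theorem 6.11 is particularly easy to compute: every class of H0(VRα) is born at parameter 0, so rank (im (i∗)) equals the number of connected components of VR2α, i.e., of the graph on P with edges at Euclidean distance at most 4α (recall that VRr has an edge {p, q} whenever d(p, q) ≤ 2r). A hedged sketch (illustrative names; for degrees p ≥ 1 one would instead count Rips persistence intervals spanning [α, 2α]):

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def estimated_beta0(P, alpha):
    """# components of VR^{2 alpha}(P), i.e., of the 4*alpha-threshold graph."""
    D = squareform(pdist(P))
    adj = csr_matrix(D <= 4 * alpha)
    n_comp, _ = connected_components(adj, directed=False)
    return n_comp

# Two well-separated noisy clusters: the estimate is beta_0 = 2.
rng = np.random.default_rng(0)
P = np.vstack([rng.normal((0, 0), 0.05, (50, 2)),
               rng.normal((3, 0), 0.05, (50, 2))])
print(estimated_beta0(P, alpha=0.2))   # 2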

6.3.3 Data on a compact set


We now consider the case when we are given a finite set of points P sampling a compact subset X ⊂ Rd. For a general compact set X, the offset Xα may fail to be homotopy equivalent to X for every α > 0. In fact, there exist compact sets X so that H∗(Xλ) is not isomorphic to H∗(X) no matter how small λ > 0 is (see Figure 4 of [90]). So, in this case we aim to recover the homology groups of an offset Xλ of X for a sufficiently small λ > 0.
The high-level framework is in Eqn. (6.18). Here we have 0 < λ < wfs(X), while Cα and VRα stand for the Čech and Rips complexes Cα(P) and VRα(P) over the point set P. For any 0 < λ < wfs(X):

H∗(Xλ) ←→ image(H∗(Cα) → H∗(C2α)) ←→ image(H∗(VRα) → H∗(VR4α)).    (6.18)

Here the first identification is given by Proposition 6.12, and the second by Eqn. (6.21) and Fact 6.3.

It is similar to Eqn (6.15) for the manifold case. However, we no longer have the isomorphism
between H∗ (Pα ) and H∗ (X). To overcome this difficulty, we leverage Proposition 6.9. This in turn

requires us to consider a pair of Čech complexes to infer homology of X λ , instead of a single Čech
complex as in the case of manifolds.
More specifically, suppose that the point set P satisfies dH(P, X) ≤ ε; then we have the following nested sequence for α > ε and α′ ≥ α + 2ε:

Xα−ε ⊆ Pα ⊆ Xα+ε ⊆ Pα′ ⊆ Xα′+ε.    (6.19)

By Proposition 6.9, we know that if it also holds that α′ + ε < wfs(X), then the inclusions Xα−ε ⊆ Xα+ε ⊆ Xα′+ε induce isomorphisms between their homology groups, which are also isomorphic to H∗(Xλ) for λ ∈ (0, wfs(X)). It then follows from the second part of Fact 6.3 that, for α, α′ ∈ (ε, wfs(X) − ε) with α′ − α ≥ 2ε, we have

H∗(Xλ) ≅ im (i∗), where i∗ : H∗(Pα) → H∗(Pα′) is induced by the inclusion i : Pα ⊆ Pα′.    (6.20)

Combining the above with the commutative diagram in Eqn. (6.16), we obtain the following
result on inferring homology of X λ using a pair of Čech complexes.

Proposition 6.12. Let X be a compact set in Rd and P ⊂ Rd a finite set of points with dH(X, P) < ε for some ε < (1/4) wfs(X). Then, for all α, α′ ∈ (ε, wfs(X) − ε) such that α′ − α ≥ 2ε, and any λ ∈ (0, wfs(X)), we have H∗(Xλ) ≅ im (i∗), where i∗ : H∗(Cα(P)) → H∗(Cα′(P)) is the homomorphism between homology groups induced by the inclusion i : Cα(P) ,→ Cα′(P).

Finally, to perform homology inference with the Rips complexes, we again resort to the in-
terleaving relation between Čech and Rips complexes, and apply the first part of Fact 6.3 to the
following sequence

H∗(Cα/2(P)) → H∗(VRα/2(P)) → H∗(Cα(P)) → H∗(C2α(P)) → H∗(VR2α(P)) → H∗(C4α(P)).    (6.21)

If 2ε ≤ α ≤ (1/4)(wfs(X) − ε), both H∗(Cα/2(P)) → H∗(C4α(P)) and H∗(Cα(P)) → H∗(C2α(P)) have ranks equal to dim(H∗(Xλ)) by Proposition 6.12. Applying Fact 6.3, we then obtain the following result.

Theorem 6.13. Let X be a compact set in Rd and P a finite point set with dH(X, P) < ε for some ε < (1/9) wfs(X). Then, for all α ∈ [2ε, (1/4)(wfs(X) − ε)] and all λ ∈ (0, wfs(X)), we have H∗(Xλ) ≅ im (j∗), where j∗ is the homomorphism between homology groups induced by the inclusion j : VRα/2(P) ,→ VR2α(P).
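The rank of im (j∗) can be read off a Rips persistence diagram: it is the number of intervals born by the first complex of the pair and still alive at the second. Below is a hedged sketch using the GUDHI library (assuming it is available); note that GUDHI parameterizes a Rips filtration by edge length, whereas VRr here uses balls of radius r (an edge appears at distance ≤ 2r), so the pair VRα/2 ,→ VR2α corresponds to the window [α, 4α] on GUDHI's scale:

import numpy as np
import gudhi

def persistent_betti(P, alpha, p):
    """Rank of H_p(VR^{alpha/2}(P)) -> H_p(VR^{2 alpha}(P))."""
    rips = gudhi.RipsComplex(points=P, max_edge_length=4 * alpha)
    st = rips.create_simplex_tree(max_dimension=p + 1)
    st.persistence()
    intervals = st.persistence_intervals_in_dimension(p)
    # Intervals [b, d) that span the whole window survive the induced map.
    return sum(1 for b, d in intervals if b <= alpha and d > 4 * alpha)

# A dense sample of a circle (wfs = reach = 1) has one 1-dimensional class.
theta = np.linspace(0, 2 * np.pi, 100, endpoint=False)
P = np.c_[np.cos(theta), np.sin(theta)]
print(persistent_betti(P, alpha=0.15, p=1))   # 1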

6.4 Homology inference for scalar fields


Suppose we are only given a finite sample P ⊂ X from a smooth manifold X ⊂ Rd together with
a potentially noisy version fˆ of a smooth function f : X → R presented as a vertex function
fˆ : P → R. We are interested in recovering the persistent homology of the sub-level filtration
of f from fˆ. That is, the goal is to approximate the persistent homology induced by f from the
discrete sample P and function values fˆ on points in P.

6.4.1 Problem setup


Set Fα = f⁻¹((−∞, α]) = {x ∈ X | f(x) ≤ α}, the sublevel set of f w.r.t. α. The sublevel set filtration of X induced by f, denoted by Ff = {Fα; iα,β}α≤β, is a family of sets Fα totally ordered by the inclusion maps iα,β : Fα ,→ Fβ for α ≤ β (Section 3.1). This filtration induces the following persistence module:

Hp Ff = {Hp(Fα) → Hp(Fβ)}α≤β, where each map i∗α,β : Hp(Fα) → Hp(Fβ) is induced by the inclusion iα,β.    (6.22)

For simplicity, we often write the filtration and the corresponding persistence module as Ff = {Fα}α∈R and Hp Ff = {Hp(Fα)}α∈R, when the choices of maps connecting their elements are clear.
Our goal is to approximate the persistence diagram Dgmp(Ff) from the point sample P and f̂ : P → R. Intuitively, we construct a specific Čech (or Rips) complex Cr(P), use f̂ to induce a filtration of Cr(P), and then use its persistent homology to approximate Dgmp(Ff). More specifically, we need to consider a nested pair filtration for either Cr(P) or VRr(P).

Nested pair filtration. Let Pα = {p ∈ P | f̂(p) ≤ α} be the set of sample points with f̂-value at most α, which presumably samples the sublevel set Fα of X w.r.t. f. To estimate the topology of Fα from this discrete sample Pα, we consider either the Čech complex Cr(Pα) or the Rips complex VRr(Pα). For the time being, consider VRr(Pα). As we already saw in previous sections, the topological information of Fα can be inferred from a pair of nested complexes VRr(Pα) ,→ VRr′(Pα) for some appropriate r < r′. To study Ff, we need to inspect Fα ,→ Fβ for α ≤ β. To this end, fixing r and r′, for any α ≤ β, consider the following commutative diagram induced by inclusions:

    H∗(VRr(Pα)) ────→ H∗(VRr(Pβ))                 (6.23)
       │iα∗                │iβ∗
       ↓       jαβ∗        ↓
    H∗(VRr′(Pα)) ───→ H∗(VRr′(Pβ))

Set φαβ : im (iα∗) → im (iβ∗) to be φαβ = jαβ∗ |im (iα∗), that is, the restriction of jαβ∗ to im (iα∗). This map is well-defined as the diagram above commutes. This gives rise to a persistence module {im (iα∗); φαβ}α≤β, that is, a family of totally ordered vector spaces im (iα∗) with commutative homomorphisms φαβ between any two elements. We formalize and generalize the above construction below.

Definition 6.11 (Nested pair filtration). A nested pair filtration is a sequence of pairs of complexes {ABα = (Aα, Bα)}α∈R where (i) Aα ,→ Bα is an inclusion for every α, and (ii) ABα ,→ ABβ for α ≤ β is given by Aα ,→ Aβ and Bα ,→ Bβ. The p-th persistence module of the filtration {ABα}α∈R is given by the homology module {im (Hp(Aα) → Hp(Bα)); φαβ}α≤β, where φαβ is the restriction of jαβ∗ to im (iα∗). For simplicity, we say the module is induced by the nested pair filtration {Aα ,→ Bα}.

The high level approach of inferring persistent homology of a scalar field f : X → R from a
set of points P equipped with fˆ : P → R involves the following steps:

Step 1. Sort all points of P in non-decreasing order of f̂-values, P = {p1, . . . , pn}. Set αi = f̂(pi) for i ∈ [1, n].

Step 2. Compute the persistence diagram induced by the filtration of nested pairs {VRr(Pαi) ,→ VRr′(Pαi)}i∈[1,n] (or {Cr(Pαi) ,→ Cr′(Pαi)}i∈[1,n]) for appropriate parameters 0 < r < r′.
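For intuition, here is a hedged, degree-0-only simplification of this pipeline (our own sketch: it uses a single neighborhood graph at scale r instead of a nested pair, and pairs components by the elder rule; all names are illustrative):

import numpy as np

def scalar_field_dgm0(P, f_hat, r):
    """Degree-0 persistence of the sublevel-set filtration of f_hat on the
    Rips graph VR^r(P) (edge {p, q} when |p - q| <= 2r)."""
    order = np.argsort(f_hat)                     # Step 1: sort by f_hat
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=2)
    parent, birth, dgm = {}, {}, []
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]         # path halving
            i = parent[i]
        return i
    for i in order:                               # Step 2: sweep
        parent[i], birth[i] = i, f_hat[i]
        for j in parent:                          # previously inserted vertices
            if j != i and D[i, j] <= 2 * r:
                ri, rj = find(i), find(j)
                if ri != rj:                      # elder rule: younger class dies
                    young, old = (ri, rj) if birth[ri] >= birth[rj] else (rj, ri)
                    dgm.append((birth[young], f_hat[i]))
                    parent[young] = old
    dgm += [(birth[i], np.inf) for i in parent if find(i) == i]
    return dgm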

The persistent homology (as well as persistence diagram) induced by the filtration of nested
pairs is computed via the algorithm in [105]. To obtain an approximation guarantee for the above
approach, we consider an intermediate object defined by the intrinsic Riemannian metric on the
manifold X. Indeed, note that the filtration of X w.r.t. f is intrinsic in the sense that it is indepen-
dent of how X is embedded in Rd . Hence it is more natural to approximate its persistent homology
with an object defined intrinsically for X.
Given a compact Riemannian manifold X embedded in Rd, let dX be the Riemannian metric of X inherited from the Euclidean metric dE of Rd. Let BX(x, r) := {y ∈ X | dX(x, y) ≤ r} be the geodesic ball on X centered at x with radius r, and B°X(x, r) be the open geodesic ball. In contrast, BE(x, r) (or simply B(x, r)) denotes the Euclidean ball in Rd. A ball B°X(x, r) is strongly convex if for every pair y, y′ ∈ BX(x, r), there exists a unique minimizing geodesic between y and y′ whose interior is contained within B°X(x, r). For details on these concepts, see [76, 164].

Definition 6.12 (Strong convexity). For x ∈ X, let ρc(x; X) denote the supremum of the radii r such that the geodesic ball B°X(x, r) is strongly convex. The strong convexity radius of (X, dX) is defined as ρc(X) := inf_{x∈X} ρc(x; X).

Let dX (x, P) := inf p∈P dX (x, p) denote the closest geodesic distance between x and the set
P ⊆ X.

Definition 6.13 (ε-geodesic sample). A point set P ⊂ X is an ε-geodesic sample of (X, dX ) if for
all x ∈ X, dX (x, P) ≤ ε.

Recall that Pα is the set of points in P with f̂-value at most α. The union of geodesic balls Pαδ;X = ∪p∈Pα BX(p, δ) is intuitively the “δ-thickening” of Pα within the manifold X. We use two kinds of Čech and Rips complexes. One is defined with the metric dE of the ambient Euclidean space, which we call the (extrinsic) Čech complex Cδ(Pα) and the (extrinsic) Rips complex VRδ(Pα). The other is the intrinsic Čech complex CδX(Pα) and the intrinsic Rips complex VRδX(Pα), defined with the intrinsic metric dX. Note that CδX(Pα) is the nerve complex of the union of geodesic balls forming Pαδ;X. Also, the interleaving relation between the Čech and Rips complexes remains the same as for general geodesic spaces; that is, CδX(Pα) ⊆ VRδX(Pα) ⊆ C2δX(Pα) for any α and δ.

6.4.2 Inference guarantees


Recall from Chapter 4.1 that two ε-interleaved filtrations lead to ε-interleaved persistence modules, which further implies that the bottleneck distance between their persistence diagrams is bounded by ε. Here we first relate the space filtration with the intrinsic Čech filtrations, and then relate these intrinsic ones with the extrinsic Čech or Rips filtrations of nested pairs, as illustrated in Eqn. (6.24) below.

{Fα} ←→ {Pαr;X} ←→ {CrX(Pα)} ←→ {Cr(Pα) ,→ Cr′(Pα)} or {VRr(Pα) ,→ VRr′(Pα)}    (6.24)

Proposition 6.14. Let X ⊂ Rd be a compact Riemannian manifold with intrinsic metric dX, and let f : X → R be a C-Lipschitz function. Suppose P ⊂ X is an ε-geodesic sample of X, equipped with f̂ : P → R so that f̂ = f|P. Then, for any fixed δ ≥ ε, the filtrations {Fα}α and {Pαδ;X}α are (Cδ)-interleaved w.r.t. inclusions.

The intrinsic Čech complex CδX(Pα) is the nerve complex for {BX(p, δ)}p∈Pα. Furthermore, for δ < ρc(X), the family of geodesic balls {BX(p, δ)}p∈Pα forms a cover of the union Pαδ;X that satisfies the condition of the Nerve Theorem (Theorem 2.1). Hence, there is a homotopy equivalence between the nerve complex CδX(Pα) and Pαδ;X. Furthermore, using the same argument for showing that the diagram in Eqn. (6.16) commutes (Lemma 3.4 of [91]), one can show that the following diagram commutes for any α ≤ β ∈ R and δ ≤ ξ < ρc(X):

    H∗(Pαδ;X) ──i∗──→ H∗(Pβξ;X)                  (6.25)
       │h∗               │h∗
       ↓                 ↓
    H∗(CδX(Pα)) ──i∗──→ H∗(CξX(Pβ))

Here the horizontal homomorphisms are induced by inclusions, and the vertical ones are isomor-
phisms induced by the homotopy equivalence between a union of geodesic balls and its nerve
complex. The above diagram leads to the following result (see Lemma 2 of [87] for details):
Corollary 6.15. Let X, f , and P be as in Proposition 6.14 (although f does not need to be C-
Lipschitz). For any δ < ρc (X), {Pαδ;X }α∈R and {CδX (Pα )}α∈R are 0-interleaved. Hence they induce
isomorphic persistence modules which have identical persistence diagrams.
Combining with Proposition 6.14, this implies that the filtration {CδX (Pα )}α and the filtration
{Fα }α are Cδ-interleaved for ε ≤ δ < ρc (X).
However, we cannot access the intrinsic metric dX of the manifold X and thus cannot directly
construct intrinsic Čech complexes. It turns out that for points that are sufficiently close, their
Euclidean distance forms a constant factor approximation of the geodesic distance between them
on X.
Proposition 6.16. Let X ⊂ Rd be an embedded Riemannian manifold with reach ρX. For any two points x, y ∈ X with dE(x, y) ≤ ρX/2, we have that:

dE(x, y) ≤ dX(x, y) ≤ (1 + 4dE(x, y)²/(3ρX²)) · dE(x, y) ≤ (4/3) dE(x, y).

This implies the following nested relation between the extrinsic and intrinsic Čech complexes: for any δ < (3/8)ρX,

C^δ_X(Pα) ⊆ C^δ(Pα) ⊆ C^{4δ/3}_X(Pα) ⊆ C^{4δ/3}(Pα) ⊆ C^{16δ/9}_X(Pα).    (6.26)
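As a quick numerical illustration (our own toy check, not from the text): on the unit circle, which has reach ρX = 1, the geodesic and Euclidean distances between points at angle θ apart are dX = θ and dE = 2 sin(θ/2), and the bounds above are immediate to verify:

import numpy as np

# Verify d_E <= d_X <= (4/3) d_E on the unit circle in the regime d_E <= rho_X/2.
theta = np.linspace(1e-3, np.pi, 1000)
d_X, d_E = theta, 2 * np.sin(theta / 2)
mask = d_E <= 0.5                       # d_E <= rho_X / 2 with rho_X = 1
ratio = d_X[mask] / d_E[mask]
print(ratio.min() >= 1.0, ratio.max() <= 4 / 3)   # True True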
Note that a similar relation also holds between the intrinsic Čech filtration and the extrinsic Rips complexes, due to the nested relation between extrinsic Čech and Rips complexes. To infer persistent homology from nested pair filtrations for complexes constructed under the Euclidean metric, we use the following key lemma from [87], which can be thought of as a persistent version, as well as a generalization, of Fact 6.3.

Proposition 6.17. Let X, f, and P be as in Proposition 6.14. Suppose that there exist ε′ ≤ ε″ ∈ [ε, ρc(X)) and two filtrations {Gα}α and {G′α}α, so that

for all α ∈ R,   C^ε_X(Pα) ⊆ Gα ⊆ C^{ε′}_X(Pα) ⊆ G′α ⊆ C^{ε″}_X(Pα).

Then the persistence module induced by the filtration {Fα}α for f and that induced by the nested pairs of filtrations {Gα ,→ G′α}α are Cε″-interleaved, where f is C-Lipschitz.

Combining this proposition with the sequences in Eqn. (6.26), we obtain the following results
on inferring the persistent homology induced by a function f : X → R.

Theorem 6.18. Let X ⊂ Rd be a compact Riemannian manifold with intrinsic metric dX, and f : X → R a C-Lipschitz function on X. Let ρX and ρc(X) be the reach and the strong convexity radius of (X, dX), respectively. Suppose P ⊂ X is an ε-geodesic sample of X, equipped with f̂ : P → R such that f̂ = f|P. Then:

(i) for any fixed r such that ε ≤ r ≤ min{(9/16) ρc(X), (9/32) ρX}, the persistent homology module induced by the sublevel-set filtration of f : X → R and that induced by the filtration of nested pairs {C^r(Pα) ,→ C^{4r/3}(Pα)}α are (16/9)Cr-interleaved; and

(ii) for any fixed r such that 2ε ≤ r ≤ min{(9/32) ρc(X), (9/64) ρX}, the persistent homology module induced by the sublevel-set filtration of f and that induced by the filtration of nested pairs {VR^r(Pα) ,→ VR^{8r/3}(Pα)}α are (32/9)Cr-interleaved.

In particular, in each case above, the bottleneck distance between their respective persistence
diagrams is bounded by the stated interleaving distance between persistence modules.

6.5 Notes and Exercises


Part of Theorem 6.3 is proved in [77, 78]. A complete proof as well as a thorough treatment for
geometric complexes such as Rips and Čech complexes can be found in [81]. The first approach
on data sparsification for Rips filtrations is proposed by Sheehy [274]. The presentation of Chap-
ter 6.2.1 is based on a combination of the treatments of sparsification in [56] and [275] (in [275],
a net-tower created via net-tree data structure (e.g., [182]) is used for constructing sparse Rips
filtration). Extension of such sparsification to Čech complexes and a geometric interpretation are
provided in [70]. The Rips sparsification is extended to handle weighted Rips complexes derived
from distance to measures in [56]. Sparsification via simplicial towers is introduced in [125].
This is an application of the algorithm we presented in Section 4.2 for computing persistent ho-
mology for a simplicial tower. Simplicial maps allow batch-collapse of vertices and lead to more aggressive sparsification. However, in practice it is observed that this approach also suffers from an over-connection issue as one collapses the vertices. This issue is addressed in [135]. In particular, the SimBa
algorithm of [135] exploits the simplicial maps for sparsification, but connects vertices at sparser
levels based on a certain distance between two sets (each of which intuitively is the set of original
points mapped to a vertex at the present sparsified level). While SimBa has similar approximation
guarantees in sparsification, in practice, the sparsified sequence of complexes has much smaller
size compared to prior approaches.

Much of the materials in Section 6.3 are taken from [81, 87, 91, 245]. We remark that there
have been different variations of the medial axis in the literature. We follow the notation from
[119]. We also note that there exists a robust version of the medial axis, called the λ-medial axis,
proposed in [89]. The concept of the local feature size was originally proposed in [270] in the
context of mesh generation and a different version that we describe in this chapter was introduced
in [8] in the context of curve/surface reconstruction. The local feature size has been widely used
in the field of surface reconstruction and mesh generation; see the books [98, 119]. Critical points
of the distance field were originally studied in [177]. See [89, 90, 225] for further studies as well
as the development on weak feature sizes.
In homology inference for manifolds, we note that Niyogi, Smale and Weinberger in [245] provide two deformation retraction results from the union of balls over P to a manifold X: Proposition 3.1 holds for the case when P ⊂ X, while Proposition 7.1 holds when P is within a tubular neighborhood of X. The latter has a much stronger requirement on the radius α. In our presentation,
Proposition 6.10 uses a corollary of Proposition 3.1 of [245] to obtain an isomorphism between
the homology groups of union of balls and of X. This allows a better range of the parameter
α – however, we lose the deformation retraction here; see the footnote above Proposition 6.10.
Results in Chapter 6.4 are mostly based on the work in [87].
This chapter focuses on presenting the main framework behind homology (or persistent ho-
mology) inference from point cloud data. The current theoretical guarantees hold when input
points sample the hidden domain well within Hausdorff distance. For more general noise models
that include outliers and statistical noise, we need a more robust notion of distance field than what
we used in Section 6.3.1. To this end, an elegant concept called distance to measures (DTM)
has been proposed in [79], which has many nice properties and can lead to more robust homo-
logical inferences; see, e.g., [82]. An alternative approach using kernel-distance is proposed in
[256]. See also [56, 79, 246] for data sparsification or homology inference for points corrupted
with more general noise, and [55] for persistent homology inference under more general noise for
input scalar fields.

Exercise
1. Prove Part (i) of Theorem 6.3.

2. Prove the bound on the Rips pseudo-distance dRips (P, Q) in Part (ii) of Theorem 6.3.

3. Given two finite sets of points P, Q ⊂ Rd , let dP and dQ denote the restriction of the Eu-
clidean metric over P and Q respectively. Consider the Hausdorff distance δH = dH (P, Q)
between P and Q, as well as the Gromov-Hausdorff distance δGH = dGH ((P, dP ), (Q, dQ )).

(i) Prove that δGH ≤ δH .


(ii) Assume P, Q ⊂ R2. Let T stand for the set of rigid transformations over R2 (rotations, reflections, translations and their combinations). Let δ∗H := inf_{t∈T} δH(P, t(Q)) denote the smallest Hausdorff distance possible between P and a copy of Q under rigid transformation. Give an example of P, Q ⊂ R2 such that δ∗H is much larger than δGH, say δ∗H ≥ 10δGH (in fact, this can hold for any fixed constant).

4. Prove Proposition 6.5.

5. Consider the greedy permutation approach introduced in Chapter 6.2, and the assignment
of exit-times for points p ∈ P. Construct the open tower {Nγ } and closed tower {N γ } as
described in the chapter. Prove that both Nγ and N γ are γ-nets for P.

6. Suppose we are given P0 ⊃ P1 sampled from a metric space (Z, d) where P1 is a γ-net of P0. Define π : P0 → P1 by π(p) = argmin_{q∈P1} d(p, q) (if argmin_{q∈P1} d(p, q) contains more than one point, then set π(p) to be any point q that minimizes d(p, q)).

(a) Prove that the vertex map π induces a simplicial map π : VRα (P0 ) → VRα+γ (P1 ).
(b) Consider the following diagram. Prove that the map j◦π is contiguous to the inclusion
map i.

    VRα(P0) ──i──→ VRα+γ(P0)
         π ↘        ↑ j                      (6.27)
           VRα+γ(P1)

7. Let P be a set of points in Rd . Let d2 and d1 denote the distance metric under L2 norm
and under L1 norm respectively. Let C2 (P) and C1 (P) be the Čech filtration over P induced
by d2 and d1 respectively. Show the relation between the log-scaled version of persistence
diagrams Dgmlog C2 (P) and Dgmlog C1 (P), that is, bound db (Dgmlog C2 (P), Dgmlog C1 (P))
(see the discussion above Corollary 4.4 in Chapter 4).

8. Prove Proposition 6.14. Using the fact that Diagram 6.25 commutes, prove Corollary 6.15.
Chapter 7

Reeb Graphs

Topological persistence provides an avenue to study a function f : X → R on a space X. Reeb


graphs provide another avenue to do the same, although the summarizations produced by the two differ in a fundamental way. Topological persistence produces barcodes as a simplified signature of the function. A Reeb graph instead provides a 1-dimensional (skeleton) structure which represents a simplification of the input domain X while taking the function into account for this simplification. Of course, one loses higher-dimensional homological information in the Reeb graph, but at the same time, it offers a much lighter and computationally inexpensive transformation of the original space which can be used as a signature for tasks such as shape matching
and functional similarity. An example from [190] is given in Figure 7.1, where a multiresolutional
representation of the Reeb graph is used to match surface models.

Figure 7.1: (Left). A description function based on averaging geodesic distances is shown on
different models, together with some isocontours of this function. This function is robust w.r.t.
near-isometric deformation of shapes. (Right) The Reeb graph of the descriptor function (from the
left) is used to compare different shapes. Here, given a query shape (called “key”), the most sim-
ilar shapes retrieved from a database are shown on the right. Images taken from [190], reprinted
by permission from ACM: Masaki Hilaga et al. (2001).


We define the Reeb graph and introduce some properties of it in Section 7.1. We also describe
efficient algorithms to compute it for the piecewise-linear setting in Section 7.2. For comparing
Reeb graphs, we need to define distances among them. In Section 7.3, we present two equivalent
distance measures for Reeb graphs and present a stability result for these distances w.r.t. changes in the input function that defines the Reeb graph. In particular, we note that a Reeb graph can also
be viewed as a graph equipped with a “height” function on it which is induced by the original
function f : X → R on the input domain. This height function provides a natural metric on the
Reeb graph, rendering a view of the Reeb graph as a specific metric graph. This further leads
to a distance measure for Reeb graphs based on the Gromov-Hausdorff distance idea, which we
present in Section 7.3. An alternative way to define a distance for Reeb graphs is based on the
interleaving idea, which we also introduce in Section 7.3. It turns out that these two versions of
distances for Reeb graphs are strongly equivalent, meaning that they are within a constant factor
of each other.

7.1 Reeb graph: Definitions and properties


Before we give a formal definition of the Reeb graph, let us recall some relevant definitions from
Section 1.1. A topological space X is disconnected if there are two disjoint non-empty open sets U and V so that X = U ∪ V. It is called connected otherwise. A connected component of X is a maximal subset
(subspace) that is connected. Given a continuous function f : X → R on a finitely triangulable
topological space X, for each a ∈ R, consider the level set f −1 (a) = {x ∈ X : f (x) = a} of f . It is
a subspace of X and we can talk about its connected components in this subspace topology.
Definition 7.1 (Reeb graph). Define an equivalence relation ∼ on X by asserting x ∼ y iff (i) f(x) = f(y) = α and (ii) x and y belong to the same connected component of the level set f⁻¹(α). Let [x] denote the equivalence class to which a point x ∈ X belongs. The Reeb graph Rf of f : X → R is the quotient space X/∼, i.e., the set of equivalence classes equipped with the quotient topology. Let Φ : X → Rf, x 7→ [x], denote the quotient map.

Figure 7.2: Reeb graph R f of the function f : X → R.

If the input is “nice”, for example, if f is a Morse function on a compact manifold, or a


PL-function on a compact polyhedron, then R f indeed has the structure of a finite 1-dimensional

regular CW complex which is a graph, and this is why it is commonly called a Reeb graph. In
particular, from now on, we tacitly assume that the input function f : X → R is levelset tame,
meaning that (i) each level set f −1 (a) has a finite number of components, and each component is
path connected¹, and (ii) f is of Morse type (Definition 4.14). It is known that Morse functions
on a compact smooth manifold and PL-functions on finite simplicial complexes are both levelset
tame.
A level set may consist of several connected components, each of which is called a contour.
Intuitively, the Reeb graph R f is obtained by collapsing contours (connected components) in each
level set f⁻¹(a) continuously. In particular, as we vary a, Rf tracks the changes (e.g., creation,
deletion, splitting and merging) of connected components in the levelsets f −1 (a), and thus is a
meaningful topological summary of f : X → R.
As the function f is constant on each contour in a levelset, f : X → R also induces a contin-
uous function f˜ : R f → R defined as f˜(z) = f (x) for any preimage x ∈ Φ−1 (z) of z. To simplify
notation, we often write f (z) instead of f˜(z) for z ∈ R f when there is no ambiguity, and use f˜
mostly to emphasize the different domains of the functions. In all illustrations of this chapter, we plot the Reeb graph with the vertical coordinate of a point z equal to the function value f(z).

Critical points. As we describe above, the Reeb graph can be viewed as the underlying space
of a 1-dimensional cell complex, where there is also a function f˜ : R f → R defined on R f . We
can further assume that the function f˜ is monotone along each 1-cell of R f – if not, we simply
insert a new node where this condition fails, and the tameness of f : X → R guarantees that we
only need to add finite number of nodes. Hence we can view the Reeb graph as the underlying
space of a 1-dimensional simplicial complex (graph) (V, E) associated with a function f˜ that is
monotone along each edge e ∈ E. Note that we can further insert more nodes into an edge in E,
breaking it into multiple edges; see, e.g., the augmented Reeb graph in Figure 7.4 (c). We now
continue with this general view of the Reeb graph, whose underlying space is a graph equipped
with a function f˜ that is monotone along each edge. We can then talk about the induced critical
points as in Definition 3.23. An alternative (and simpler) way to describe such critical points is as follows: given a node x ∈ V in the vertex set V := V(Rf) of the Reeb graph Rf, let the up-degree (resp. down-degree) of x denote the number of edges incident to x that have higher (resp. lower) values of f˜ than x. A node is regular if both its up-degree and down-degree equal 1, and critical otherwise. A critical point is a minimum (maximum) if it has down-degree 0 (up-degree
0), and a down-fork (up-fork) if it has down-degree (up-degree) larger than 1. A critical point can
be degenerate, having more than one types of criticality: e.g., a point with down-degree 0 and
up-degree 2 is both a minimum and an up-fork.
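This classification is immediate to compute from a combinatorial representation of the Reeb graph. A small sketch (the node/edge representation is an illustrative choice of ours; parallel edges are allowed):

def classify_nodes(f_vals, edges):
    """Classify Reeb graph nodes by up-/down-degree.
    f_vals: dict node -> f-value; edges: (u, v) pairs with f(u) != f(v)."""
    up = {v: 0 for v in f_vals}
    down = {v: 0 for v in f_vals}
    for u, v in edges:
        lo, hi = (u, v) if f_vals[u] < f_vals[v] else (v, u)
        up[lo] += 1        # the edge goes up from its lower endpoint
        down[hi] += 1      # and down from its higher endpoint
    labels = {}
    for v in f_vals:
        kinds = []
        if down[v] == 0: kinds.append("minimum")
        if up[v] == 0: kinds.append("maximum")
        if down[v] > 1: kinds.append("down-fork")
        if up[v] > 1: kinds.append("up-fork")
        labels[v] = kinds or ["regular"]   # empty kinds means up = down = 1
    return labels

# The Reeb graph of the height function on an upright torus.
f = {"a": 0, "b": 1, "c": 2, "d": 3}
E = [("a", "b"), ("b", "c"), ("b", "c"), ("c", "d")]
print(classify_nodes(f, E))
# {'a': ['minimum'], 'b': ['up-fork'], 'c': ['down-fork'], 'd': ['maximum']}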
Note that because of the monotonicity of f˜ at regular points, the Reeb graph together with its
associated function is completely described, up to homeomorphisms preserving the function, by
the function values at the critical points.
Now imagine that one sweeps the domain X in increasing order of f -values, and tracks the
changes in the connected components during this process. New components appear (at down-
degree 0 nodes), existing components vanish (at up-degree 0 nodes), or components merge or
¹As introduced in Exercise 3 of Chapter 1, a topological space T is path connected if any two points x, y ∈ T can be joined by a path, i.e., there exists a continuous map f : [0, 1] → T of the segment [0, 1] ⊂ R to T so that f(0) = x and f(1) = y.

split (at down/up-forks). The Reeb graph R f encodes such changes thereby making it a simple
but meaningful topological summary of the function f : X → R. However, it only tracks the
connected components in the levelset, thus cannot capture complete information about f . Never-
theless, it reflects certain aspects about both the domain X itself and the function f defined on it,
which we describe in Section 7.2.3.

(a) Input scalar field (b) Reeb graph (c) Merge tree Split tree

Figure 7.3: Examples of the Reeb graph, the merge tree and the split tree of an input scalar field.

Variants of Reeb graphs. Treating a Reeb graph as a simplicial 1-complex, we can talk about
1-cycles (loops) in it. A loop-free Reeb graph is also called a contour tree, which itself has found
many applications in computer graphics and visualization. Instead of tracking the connected
components within a levelset, one can also track them within the sublevel set while sweeping
X along increasing f -values, or track them within the superlevel set while sweeping X along
decreasing f -values. The resulting topological summaries are called the merge tree and the split
tree, respectively. See the precise definition below and examples in Figure 7.3.

Definition 7.2. Define x ∼M y if and only if f(x) = f(y) = a and x is connected to y within the sublevel set f⁻¹((−∞, a]). Then the quotient space TM = X/∼M is the merge tree w.r.t. f.
Alternatively, if we define x ∼S y if and only if f(x) = f(y) = a and x is connected to y within the superlevel set f⁻¹([a, +∞)), then the quotient space TS = X/∼S is the split tree w.r.t. f.

Indeed, for the levelset tame functions we consider, TM and TS are both finite trees. If Rf is loop-free (thus a tree), then this contour tree is uniquely decided by, and can be computed from, the merge and split trees of f.
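Computationally, the merge tree of a PL-function on a graph is produced by the same sweep-and-union-find paradigm seen earlier, and the split tree is then just the merge tree of −f. A hedged sketch (illustrative output format of our own: each node stores the subtrees that merge into it, so regular vertices appear as chain nodes with a single child):

def merge_tree(f_vals, edges):
    """Merge tree of a vertex function on a graph via a sweep in increasing f."""
    order = sorted(f_vals, key=f_vals.get)
    adj = {v: set() for v in f_vals}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    parent, rep = {}, {}                 # union-find forest; rep: root -> tree node
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    children = {v: [] for v in f_vals}
    for v in order:
        parent[v], rep[v] = v, v
        for u in adj[v]:
            if u in parent:              # u already swept, i.e. f(u) <= f(v)
                ru, rv = find(u), find(v)
                if ru != rv:             # the component of u merges at v
                    children[v].append(rep[ru])
                    parent[ru] = rv
    return children                      # minima are leaves (no children)

def split_tree(f_vals, edges):
    return merge_tree({v: -f for v, f in f_vals.items()}, edges)

f = {"a": 0, "b": 2, "c": 1, "d": 3, "e": 0.5}
E = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(merge_tree(f, E))   # b joins {a, c}; d joins b's subtree and e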
Finally, instead of real-valued functions, one can define a similar quotient space X/ ∼ for a
continuous map f : X → Z to a general metric space (e.g., Z = Rd), where ∼ is the equivalence
relation x ∼ y if and only if f (x) = f (y) = a and x is connected to y within the levelset f −1 (a). The
resulting structure is called the Reeb space. See Section 9.3 where we consider this generalization
in the context of another structure called mapper.

7.2 Algorithms in the PL-setting


Piecewise-linear setting. Consider a simplicial complex K and a PL-function f : |K| → R on
it. Since R f depends only on the connectivity of each levelset, for a generic function f (where no
two vertices have the same function value), the Reeb graph of f depends only on the 2-skeleton

of K. From now on, we assume that f is generic and K = (V, E, T ) is a simplicial 2-complex
with vertex set V, edge set E and triangle set T . Let nv , ne and nt denote the size of V, E, and T ,
respectively, and set m = nv + ne + nt . We sketch algorithms to compute the Reeb graph for the
PL-function f . Sometimes, they output the so-called augmented Reeb graph, which is essentially
a refinement of the Reeb graph R f with certain additional degree-2 vertices inserted in arcs of R f .
Definition 7.3 (Augmented Reeb). Given a PL-function f : |K| → R defined on a simplicial complex K = (V, E, T), let Rf be its Reeb graph and Φf : |K| → Rf be the associated quotient map. The augmented Reeb graph of f : |K| → R, denoted by R̂f, is obtained by inserting each point in Φf(V) := {Φf(v) | v ∈ V} as a graph node into Rf (if it is not already a node).


Figure 7.4: (a) A simplicial complex K. The set of 2-simplices of K includes △rpq, △rpw, △rqw, as well as the two dark-colored triangles incident to p and to w, respectively. (b) Reeb graph of the height function on |K|. (c) Its augmented Reeb graph.

For a PL-function, each critical point of the Reeb graph R f (w.r.t. f˜ : R f → R induced by f )
is necessarily the image of some vertex in K, and thus the critical points form a subset of points in
Φf(V). The augmented Reeb graph R̂f then includes all remaining points in Φf(V) as (degree-2)
graph nodes. See Figure 7.4 for an example, where as a convention, we plot a node Φ f (v) at the
same height (function value) as v.
We now sketch the main ideas behind two algorithms that compute the Reeb graph for a
PL-function with the best time complexity, one deterministic and the other randomized.

7.2.1 An O(m log m) time algorithm via dynamic graph connectivity


Here we describe an O(m log m)-time algorithm [252] for computing the Reeb graph of a PL-
function f : |K| → R, whose time complexity is the best among all existing algorithms for Reeb
graph computation. We assume for simplicity that no two vertices in V share the same f -value.
As K = (V, E, T ) is a simplicial 2-complex, the level set f −1 (a) for any function value a con-
sists of nodes (intersection of the level set f −1 (a) with edges in E) and edges (intersection of the
levelset f −1 (a) with some triangles in T ). This can be viewed as yet another graph, which we
denote by Ga = (Wa , Fa ) and refer to as the pre-image graph: Each vertex in Wa corresponds to
some edge in E. Each edge in Fa connects two vertices in Wa and thus can be associated to a pair
of edges in E adjoining a certain triangle in T . See Figure 7.5 for an example. Obviously, con-
nected components in Ga correspond to connected components in f −1 (a), and under the quotient
map Φ, each component is mapped to a single point in the Reeb graph R f .

Figure 7.5: As one sweeps past v, the combinatorial structure of the pre-image graph changes.
Ga has 3 connected components (one of which contains a single point only), while Gb has only 2
components.

A natural idea to construct the Reeb graph R f of f : |K| → R is to sweep the domain K
with increasing value of a, track the connected components in Ga during the course, and record
the changes (merging or splitting of components, or creation and removal of components) in the
resulting Reeb graph.
Furthermore, as f is a PL-function, the combinatorial structure of Ga can only change when
we sweep past a vertex v ∈ V. When that happens, only edges / triangles from K incident to v
can incur changes in Ga. See Figure 7.5. Let sv denote the total number of simplices incident on v. It is easy to see that as one sweeps through the vertex v, only O(sv) insertions
R f , we simply need to maintain the connectivity of Ga as we sweep. Assuming we have a data
structure to achieve this, the high level framework of the sweep algorithm is then summarized in
Algorithm 12:Reeb-SweepAlg.

Algorithm 12 Reeb-SweepAlg(K, f )
Input:
A simplicial 2-complex K and a vertex function f : V(K) → R
Output:
The Reeb graph of the PL-function induced by f
1: Sort vertices in V = {v1 , . . . , vnv } in increasing order of f -values
2: Initialize the Reeb graph R and the pre-image graph Ga to be empty
3: for i = 1 to nv do
4: LC = LowerComps(vi )
5: UpdatePreimage(vi ) \∗Update the pre-image graph Ga ∗\
6: UC = UpperComps(vi )
7: UpdateReebgraph(R, LC, UC, vi )
8: end for
9: Output R as the Reeb graph

In particular, suppose we have a data structure, denoted by DynSF, that maintains a spanning
forest of the pre-image graph at any moment. Each connected component in the pre-image graph
is associated with a certain vertex v from V, called representative vertex of this component, which
Computational Topology for Data Analysis 175

indicates that this component is created when passing through v. We assume that the data structure
DynSF allows the following operations: First, assume that a graph node ea ∈ Wa in the pre-image
graph Ga is generated by edge e ∈ K, that is, ea is the intersection of e with the levelset f −1 (a).

• Find(e): given an edge e ∈ E, returns the representative vertex of the component in the
current pre-image graph Ga containing the node ea ∈ Wa generated by e.
• Insert(e, e0 ), Delete(e, e0 ): inserts an edge (ea , e0a ) into Ga and deletes (ea , e0a ) from Ga
respectively while still maintaining a spanning forest for Ga under these operations.

Using these operations, the pseudo-codes for the subroutines called in algorithm Reeb-SweepAlg
are given in Algorithms 13:LowerComps, 14:UpdatePreImage, and 15:UpdateReebGraph. (The
routine UpperComps is symmetric to LowerComps and thus omitted.) These codes assume that
edges of K not intersecting the levelsets are still in the pre-image graphs as isolated nodes; hence
there is no need to add or remove isolated nodes.

Algorithm 13 LowerComps(v)
Input:
a vertex v ∈ K
Output:
A list LC of connected components in the pre-image graph generated by the lower-star of v
1: LC = empty list
2: for all edges e in the lower-star of v do
3: c = DynSF.Find(e)
4: if c is not marked ‘listed’ then
5: LC.add(c); and mark c as ’listed’
6: end if
7: end for

Time complexity analysis. Suppose the input simplicial 2-complex K = (V, E, T ) has n vertices
and m simplices in total. Sorting the vertices takes O(n log n) time. Then steps 4 to 7 of the
algorithm Reeb-SweepAlg perform O(m) Find, Insert, and Delete operations using
the data structure DynSF.
One could use a state-of-the-art data structure for dynamic graph connectivity as DynSF; indeed, this is the approach taken in [146]. However, note that ours is an offline version of the dynamic graph connectivity problem, as all insertions / deletions are known in advance and thus can be pre-computed. To this end, we assign each edge in the pre-image graph a weight, which is the time (f-value) at which it will be deleted from the pre-image graph Ga. We then maintain a maximum
spanning forest of Ga during the sweeping to maintain connectivity. In general, a deletion of a
maximum-spanning tree edge (u, v) can incur expensive search in the pre-image graph for a re-
placement edge (as u and v may still be connected). However, because of the specific assignment
of edge weights, this expensive search is avoided in this case. If a maximum spanning tree edge
is to be deleted, it will simply break the tree in the maximum spanning forest containing this
edge, and no replacement edge needs to be identified. One can use a standard dynamic tree data

Algorithm 14 UpdatePreImage(v)
Input:
A vertex v ∈ K
Output:
Update the pre-image graph after sweeping past v
1: for all triangles uvw incident on v do
2: \∗ w.l.o.g. assume f (u) < f (w) ∗\
3: if f (v) < f (u) then
4: DynSF.Insert(vu, vw)
5: else
6: if f (v) > f (w) then
7: DynSF.Delete(vu, vw)
8: else
9: DynSF.Delete(uv, uw)
10: DynSF.Insert(vw, uw)
11: end if
12: end if
13: end for

Algorithm 15 UpdateReebGraph(R, LC, UC, v)


Input:
Current Reeb graph R for f⁻¹((−∞, f(v))), a vertex v, the list LC (resp. UC) of components in
the lower-star (resp. upper-star) of v
Output:
Update Reeb graph R to be that for the sublevel set f⁻¹((−∞, f(v) + ε]) for an infinitesimally small
ε>0
1: Create a new node v̂ in R corresponding to v
2: Assign node v̂ to each component in UC
3: Create an arc in R between v̂ and the Reeb graph node corresponding to the representative
vertex of each c in LC
4: Return updated Reeb graph R

structure, such as the Link-Cut trees [280], to maintain the maximum spanning forest efficiently in
O(log m) amortized time for each find / insertion / deletion operation. Putting everything together,
it takes O(m log m) time to compute the Reeb graph by the sweep.
Theorem 7.1. Given a PL-function f : |K| → R, let m denote the total number of simplices in the
2-skeleton of K. One can compute the (augmented) Reeb graph R f of f in O(m log m) time.

7.2.2 A randomized algorithm with O(m log m) expected time


In this section we describe a randomized algorithm [185] whose expected time complexity matches
the previous algorithm. However, it uses a strategy different from sweeping: Intuitively, it directly

models the effect of the quotient map Φ, but does so in a randomized manner so as to obtain a
good (expected) running time.

Figure 7.6: The vertices are randomly ordered. Starting from the initial simplicial complex in (a),
the algorithm performs vertex-collapse for vertices in this random order, as shown in (b) – (f).

In general, given f : X → R and associated quotient map Φ : X → R f , each connected


component (contour) C within a level set f −1 (a) is mapped (collapsed) to a single point Φ(C)
in R f . For the case where X = |K| and f is piecewise-linear over simplices in K, the image of
the collection of contours passing through every vertex in V decides the nodes in the augmented
Reeb graph b R, and intuitively contains sufficient information for constructing b
R. The high level
algorithm to compute the augmented Reeb graph R is given in Algorithm 16:Reeb-RandomAlg.
b
See Figure 7.6 for an illustration of the algorithm.

Algorithm 16 Reeb-RandomAlg(K, f )
Input:
A simplicial 2-complex K and a vertex function f : V(K) → R
Output:
The augmented Reeb graph of the PL-function induced by f
1: Let V = {v1 , . . . , vnv } be a random permutation of vertices in V
2: Set K0 = K and f0 = f
3: for i = 1 to nv do
4: Collapse the contour of fi−1 : |Ki−1 | → R passing through (incident to) vi and obtain
complex Ki
5: fi : |Ki | → R is the PL-function on Ki induced from fi−1
6: end for
7: Output the final complex Knv as the augmented Reeb graph

In particular, algorithm Reeb-RandomAlg starts with function f0 = f defined on the original


simplicial complex K0 = K. Take a random permutation of all vertices in V = V(K). At the
beginning of the i-th iteration, it maintains a PL-function fi−1 : |Ki−1 | → R over a partially


Figure 7.7: The function f is the height function. The contour incident to the point q for the complex in (a) is collapsed, resulting in the new complex in (b); (c) shows the collapse of the contour within a single triangle incident to q. (d) An example where this triangle borders another triangle. (e) There are two triangles incident to q that have q as the mid-vertex; they both need to be processed. The triangle qp1p4 does not have q as mid-vertex, and it is not touched while processing q.

collapsed simplicial complex Ki−1 whose augmented Reeb graph is the same as that of f . It
then “collapses" the contour of fi−1 passing through the vertex vi and obtains a new PL-function
fi : |Ki | → R over a further collapsed simplicial complex Ki that maintains the augmented Reeb
graph.
The key is to implement this “collapse" step (lines 4-5). To see the effect of collapsing the
contour incident to a vertex, see Figure 7.7 (a) and (b). To see how the collapse is implemented, first consider the triangle qp1p2 incident to vertex q as in Figure 7.7 (c), and assume that q is
the mid-vertex of this triangle, that is, its height value ranks second among the three vertices of
the triangle. Intuitively, we need to map each horizontal segment (part of a contour at different
height) to the corresponding point along the edges qp1 and qp2 . If this triangle incident to q that
we are collapsing has one or more triangles sharing the edge p1 p2 as shown in Figure 7.7 (d), then
for each such incident triangle, we need to process it appropriately. In particular, see one such
triangle (p1, p2, r) in Figure 7.7 (d); then, as q′ is sent to q, the dotted edge rq′ becomes the edge rq as
shown. Thus, the triangle rp1 p2 is now split into two new triangles qrp1 and qrp2 . In this case, it
is easy to see that at most one of the new triangles will have q as the mid-vertex. We collapse this
triangle and continue the process until no more triangle with q as the mid-vertex is left (Figure 7.7
(b)). Triangle(s) incident to q but not having q as the mid-vertex are not processed, e.g., triangle
qp1 p4 in Figure 7.7 (e). At this point, the entire contour passing through q is collapsed into a
single point, and lines 4-5 of the algorithm are executed.
After processing each vertex as described above, the algorithm Reeb-RandomAlg in the end

computes the final complex Knv in line 7. It is necessarily a simplicial 1-complex because no
vertex can be the mid-vertex of any triangle, implying that there is no triangle left. It is easy to
see that, by construction, Knv is the augmented Reeb graph w.r.t. f : |K| → R.

Time complexity. For each vertex v, the time complexity of the collapse is proportional to the number Tv of triangles intersected by the contour Cv passing through v. In the worst case, Tv = nt, giving rise to an O(nv·nt) worst-case running time for algorithm Reeb-RandomAlg. This
worst case time complexity turns out to be tight. However, if one processes the vertices in a
random order, then the worst case behavior is unlikely to happen, and the expected running time
can be proven to be O(m log nv ) = O(m log m). Essentially, one argues that an original triangle
from the input simplicial complex is split only O(log nv ) = O(log m) expected number of times
thus creating O(log m) expected number of intermediate triangles which takes O(log m) expected
time to collapse. The argument is in spirit similar to the analysis of the path length in a randomly
built binary search tree [109].

Theorem 7.2. Given a PL-function f : |K| → R defined on a simplicial 2-complex K with m simplices, one can compute the (augmented) Reeb graph in O(m log m) expected time.

7.2.3 Homology groups of Reeb graphs


Homology groups of a graph can have non-trivial ranks only in dimensions zero and one. Therefore, for a Reeb graph Rf, we only need to consider H0(Rf) and H1(Rf). In particular, their ranks β0(Rf) and β1(Rf) are simply the number of connected components and the number of independent loops in Rf, respectively.

Fact 7.1. For a tame function f : X → R, β0 (X) = β0 (R f ) and β1 (X) ≥ β1 (R f ).

The equality β0(X) = β0(Rf) in the above statement follows from the fact that Rf is the quotient space X/∼ and each equivalence class itself is connected (it is a connected component in some levelset). The relation on β1 can be proven directly, and it is also a by-product of Theorem 7.4
below (combined with Fact 7.2). The above statement also implies that if X is simply connected,
then R f is loop-free.
For the case where X is a 2-manifold, more information about X can be recovered from the
Reeb graph of a Morse function defined on it.

Theorem 7.3 ([107]). Let f : X → R be a Morse function defined on a connected and compact
2-manifold.

(i) if X is orientable, β1 (R f ) = β1 (X)/2; and


(ii) if X is non-orientable, β1 (R f ) ≤ β1 (X)/2.

We now present a result that characterizes H1 (R f ) w.r.t. H1 (X) in a more precise manner,
which also generalizes Theorem 7.3.

Horizontal and vertical homology. Given a continuous function f : X → R, let X=a := f⁻¹(a) and XI := f⁻¹(I) denote its level set and interval set as before, for a ∈ R and for an open or closed interval I ⊆ R respectively. We first define the so-called horizontal and vertical homology groups with respect to f.

A p-th homology class h ∈ Hp(X) is horizontal if there exists a finite set of values {ai ∈ R}i∈A, where A is a finite index set, such that h has a pre-image under the map Hp(∪i∈A X=ai) → Hp(X) induced by inclusion. The set of horizontal homology classes forms a subgroup H̄p(X) of Hp(X), since the trivial homology class is horizontal and the addition of any two horizontal homology classes is still horizontal. We call this subgroup H̄p(X) the horizontal homology group of X with respect to f. The vertical homology group of X with respect to f is then defined as:

Ȟp(X) := Hp(X)/H̄p(X), the quotient of Hp(X) by H̄p(X).

The coset ω + H̄p(X) for every class ω ∈ Hp(X) provides an equivalence class in Ȟp(X). We call h a vertical homology class if h + H̄p(X) is not 0 in Ȟp(X); in other words, h ∉ H̄p(X). Two homology classes h1 and h2 are vertically homologous if h1 ∈ h2 + H̄p(X).

Fact 7.2. By definition, rank (Hp(X)) = rank (H̄p(X)) + rank (Ȟp(X)).

Let I be a closed interval of R. We define the height of I = [a, b] to be height(I) = |b − a|; note
that the height could be 0. Given a homology class h ∈ H p (X) and an interval I, we say that h is
supported by I if h ∈ im (i∗ ) where i∗ : H p (XI ) → H p (X) is the homomorphism induced by the
canonical inclusion XI ,→ X. In other words, XI contains a p-cycle γ from the homology class h.
We define the height of a homology class h ∈ Hp(X) to be

height(h) = inf_{I supports h} height(I).

Isomorphism between Ȟ1(X) and H1(Rf). The surjection Φ : X → Rf(X) induces a chain map Φ# from the 1-dimensional singular chain group of X to the 1-dimensional singular chain group of Rf(X), which eventually induces a homomorphism Φ∗ : H1(X) → H1(Rf(X)). For the horizontal subgroup H̄1(X), we have that Φ∗(H̄1(X)) = 0 ∈ H1(Rf(X)). Hence Φ∗ induces a well-defined homomorphism between the quotient groups

Φ̌ : Ȟ1(X) = H1(X)/H̄1(X) → H1(Rf(X))/H̄1(Rf(X)) = H1(Rf(X)).

The right equality above follows from the fact that H̄1(Rf(X)) = 0, which holds because every level set of Rf(X) consists only of a finite set of disjoint points due to the levelset-tameness of the function f : X → R. It turns out that Φ̌ is an isomorphism. Intuitively, this is not surprising, as Φ maps each contour in the level set to a single point, which in turn collapses every horizontal cycle.

Theorem 7.4. Given a levelset tame function f : X → R, let Φ̌ : Ȟ1 (X) → H1 (R f (X)) be
the homomorphism induced by the surjection Φ : X → R f (X) as defined above. Then the map
Φ̌ is an isomorphism. Furthermore, for any vertical homology class h ∈ Ȟ1 (X), we have that
height(h) = height(Φ̌(h)).

Persistent homology for f˜ : Rf → R. We have discussed earlier that the Reeb graph of a levelset tame function f : X → R can be represented by a graph whose edges have monotone function values. Then, the induced function f˜ : Rf → R can be treated as a PL-function on the simplicial 1-complex Rf. This gives rise to the standard setting where a PL-function is defined on a simplicial 1-complex Rf whose persistence is to be computed. We can apply algorithm ZeroPerDg from Section 3.5.3 to compute the 0-th persistence diagram Dgm0(f˜). For computing the one-dimensional persistence diagram Dgm1(f˜), one can modify this algorithm slightly by registering the function values of the edges that create cycles; these are the edges that connect vertices in the same component. The function values of these edges are the birth points of the 1-cycles, which never die. This algorithm takes O(n log n + mα(n)) time, where n and m are the numbers of vertices and edges respectively in Rf.
We can also compute the levelset zigzag persistence of f (Section 4.5) using the zigzag per-
sistence algorithm in Section 4.3.2. However, taking advantage of the graph structures, one can
compute the levelset zigzag persistence for a Reeb graph with n vertices and edges in O(n log n)
time using an algorithm of [5] that takes advantage of mergeable tree data structure [169]. Only
the 0-th persistence diagram Dgm0 ( f ) is nontrivial in this case. We can read the zeroth persistence
diagram for the standard persistence using Theorem 4.15 from this level set persistence diagram.
Furthermore, for every infinite bar [ai , ∞) in the standard one dimensional persistence diagram,
we get a pairing (a j , ai ) (open-open bar) in the zeroth levelset diagram Dgm0 ( f ).
Reeb graphs can be a useful tool to compute the zeroth levelset zigzag persistence diagram
of a function on a topological space. Let f : X → R be a continuous function whose zeroth
persistence diagram we want to compute. We already observed that the function f induces a
continuous function on the Reeb graph R f . To distinguish the two domains more explicitly, we
denote the former function f X and the latter as f R . The following observation helps computing the
zeroth levelset zigzag persistence diagram Dgm0 ( f X ) because computationally it is much harder
to process a space, say the underlying space of a simplicial complex than only a graph (simplicial
1-complex).

Proposition 7.5. Dgm0 ( f X ) = Dgm0 ( f R ) where the diagrams are for the zeroth levelset zigzag
persistence.

The result follows from the following observation. Consider the levelset zigzag filtrations F X
and FR for the two functions as in sequence (4.15).

F X : X(a0 ,a2 ) ←- · · · ,→ X(ai−1 ,ai+1 ) ←- X(ai ,ai+1 ) ,→ X(ai ,ai+2 ) ←- · · · ,→ X(an−1 ,an+1 )

FR : R f (a0 ,a2 ) ←- · · · ,→ R f (ai−1 ,ai+1 ) ←- R f (ai ,ai+1 ) ,→ R f (ai ,ai+2 ) ←- · · · ,→ R f (an−1 ,an+1 )
j j
Using notation for interval sets Xi = X(ai ,a j ) and Ri = R f (ai ,a j ) , we have the following com-
mutative diagram between the 0-th levelset zigzag persistence modules.

H0 F X : H0 (X00 ) / H0 (X 1 ) o H0 (X11 ) · · · / H0 (X n ) o H0 (Xnn )


0 n−1

H0 F R : H0 (R00 ) / H0 (R1 ) o H0 (R11 ) · · · / H0 (Rn ) o H0 (Rnn )


0 n−1
182 Computational Topology for Data Analysis

All vertical maps are isomorphism because the number of components in X ij is exactly equal to
the number of components in the quotient space Rij = X ij / ∼ which is used to define the Reeb
graph. All horizontal maps are induced by inclusions. It follows that every square in the above
diagram commutes. Therefore the above two modules are isomorphic.

7.3 Distances for Reeb graphs


Several distance measures have been proposed for Reeb graphs. In this section, we introduce two
distances, one based on a natural interleaving idea, and the other based on the Gromov-Hausdorff
distance idea. It has been shown that these two distance measures are strongly equivalent, that is,
they are within a constant factor of each other for general Reeb graphs. For the special case of
merge trees, the two distance measures are exactly the same.
So far, we have used R f to denote the Reeb graph of a function f . For notational convenience,
in the following we use a different notation F for R f . Suppose we are given two Reeb graphs F
and G with the functions f : F → R and g : G → R associated to them. To emphasize the
associated functions we write (F, f ) and (G, g) in place of F and G when convenient. Again, we
assume that each Reeb graph is a finite simplicial 1-complex and the function is strictly monotone
on each edges. Our goal is to develop a concept of distance d(F, G) between them. Intuitively, if
two Reeb graphs are “the same”, then they are isomorphic and the function value of each point is
also preserved under the isomorphism. If two Reeb graphs are not the same, we aim to measure
how far it deviates from being “isomorphic". The two distances we introduce below both follow
this intuition, but measures the “deviation” differently.

7.3.1 Interleaving distance


We borrow the idea of interleaving between persistence modules (Section 3.4) to define a dis-
tance between Reeb graphs. Roughly speaking, instead of requiring that there is an isomorphism
between the two Reeb graphs, which would give rise to a pair of maps between them, φ : F → G
and φ−1 : G → F that is function preserving, we look for the existence of a pair of “compatible”
maps between appropriately “thickened" versions of F and G and the distance is measured by the
minimum amount of the “thickening" needed. We make this more precise below. First, given any
space X, set Xε := X × [−ε, ε].

Definition 7.4. Given a Reeb graph (F, f ), its ε-smoothing, denoted by Sε (F, f ), is the Reeb
graph of the function fε : Fε → R where fε (x, t) = f (x) + t for x ∈ F and t ∈ [−ε, ε]. In other
words, Sε (F, f ) = Fε / ∼ fε , where ∼ fε denotes the equivalence relation where x ∼ fε y if and only
if x, y ∈ Fε are from the same contour of fε .

See Figure 7.8 for an example. As Sε (F, f ) is the quotient space Fε / ∼ fε , we use [x, t],
x ∈ F, t ∈ [−ε, ε], to denote a point in Sε (F, f ), which is the equivalent class of (x, t) ∈ Fε
under the equivalence relation ∼ fε . Also, note that there is a natural “quotiented-inclusion” map
ι : (F, f ) → Sε (F, f ) defined as ι(x) = [x, 0], for any x ∈ F.
Suppose we have two Reeb graphs (A, fa ) and (B, fb ). A map µ : (A, fa ) → (B, fb ) between
them is function-preserving if fa (x) = fb (µ(x)) for each x ∈ A. A function-preserving map µ be-
tween (A, fa ) and Sε (B, fb ) induces a function-preserving map µε between Sε (A, fa ) and S2ε (B, fb )
Computational Topology for Data Analysis 183

Figure 7.8: From left to right, we have the Reeb graph (F, f ), its ε-thickening (Fε , fε ), and the
Reeb graph Sε (F, f ) of fε : Fε → R.

as follows:
µε : Sε (A, fa ) → S2ε (B, fb ) such that [x, t] 7→ [µ(x), t].
Now consider the “quotiented-inclusion” map ι introduced earlier, and suppose we also have a
pair of function-preserving maps φ : (F, f ) → Sε (G, g) and ψ : (G, g) → Sε (F, f ). Using the
above construction, we then obtain the following maps:

ιε : Sε (F, f ) → S2ε (F, f ), [x, t] 7→ [x, t],


φε : Sε (F, f ) → S2ε (G, g), [x, t] 7→ [φ(x), t]
ψε : Sε (G, g) → S2ε (F, f ), [y, t] 7→ [ψ(y), t]

Definition 7.5 (Reeb graph interleaving). A pair of continuous maps φ : (F, f ) → Sε (G, g) and
ψ : (G, g) → Sε (F, f ) are ε-interleaved if (i) both of them are function preserving, and (ii) the
following diagram commutes:

ι ιε
(F, f ) / Sε (F, f ) / S2ε (F, f )
: 8
φ φε
ψ ψε
$ &
(G, g) / Sε (G, g) / S2ε (G, g).
ι ιε

One can recognize that the above requirements of commutativity mirror the rectangular and
triangular commutativity in case of persistence modules (Definition 3.16). It is easy to verify the
rectangular commutativity, that is, to verify that the following diagram (and its symmetric version
involving maps ψ and ψε ) commutes.
ι / Sε (F, f )
(F, f )
φε
φ
$ ιε &
Sε (G, g) / S2ε (G, g)

Rectangular commutativity however does not embody the interaction between maps φ and ψ. The
key technicality lies in verifying the triangular commutativity, that is, φ and ψ make the diagram
184 Computational Topology for Data Analysis

below (and its symmetric version) commute.

S: ε (F, f )
φε
ψ &
(G, g) / Sε (G, g) / S2ε (G, g)
ι ιε

For sufficiently large ε, Sε (A, fa ) for any Reeb graph becomes a single segment with monotone
function values on it. Hence one can always find maps φ and ψ that are ε-interleaved for suf-
ficiently large ε. On the other hand, if ε = 0, then this implies ψ = φ−1 . Hence the smallest
ε accommodating ε-interleaved maps indicates how far the input Reeb graphs are from being
identical. This forms the intuition behind defining the following distance between Reeb graphs.

Definition 7.6 (Interleaving distance). Given two Reeb graphs (F, f ) and (G, g), the interleaving
distance between them is defined as:

dI (F, G) = inf{ε | there exists a pair of ε-interleaved maps between (F, f ) and (G, g) }. (7.1)

7.3.2 Functional distortion distance


We now define another distance between Reeb graphs called the functional distortion distance
which takes a metric space perspective. It views a Reeb graph as an appropriate metric space,
and measures the distance between two Reeb graphs via a construction similar to what is used for
defining Gromov-Hausdorff distances.

Definition 7.7 (Function-induced metric). Given a path π from u to v in a Reeb graph (A, fa ), the
height of π is defined as
height(π) = max fa (x) − min fa (x).
x∈π x∈π
Let Π(u, v) denote the set of all paths between two points u, v ∈ A. The function-induced metric
d fa : A × A → R on A induced by fa is defined as

d fa (u, v) = min height(π).


π∈Π(u,v)

In other words, d fa (u, v) is the minimum length of any closed interval I ⊂ R such that u and v
are in the same path component of fa−1 (I). It is easy to verify for a finite Reeb graph, the function-
induced distance d fa is indeed a proper metric on it, and hence we can view the Reeb graph
(A, fa ) as a metric space (A, d fa ). Refer to Chapter 9, Definition 9.6 for a generalized version of
this metric.

Definition 7.8 (Functional distortion distance). Given two Reeb graphs (F, f ) and (G, g), and a
pair of continuous maps Φ : F → G and Ψ : G → F, set

C(Φ, Ψ) = {(x, y) ∈ F × G | Φ(x) = y, or x = Ψ(y)}

and
1
D(Φ, Ψ) = sup d f (x, x0 ) − dg (y, y0 ) .
(x,y),(x0 ,y0 )∈C(Φ,Ψ) 2
Computational Topology for Data Analysis 185

The functional distortion distance between (F, f ) and (G, g) is defined as:

dFD (F, G) = inf max{ D(Φ, Ψ), k f − g ◦ Φk∞ , kg − f ◦ Ψk∞ }. (7.2)


Φ,Ψ

Note that the maps Φ and Ψ are not required to preserve function values; however the terms
k f − g ◦ Φk∞ and kg − f ◦ Ψk∞ bound the difference in function values under the maps Φ and Ψ. If
we ignore these two terms k f − g ◦ Φk∞ and kg − f ◦ Ψk∞ , and if we do not assume that Φ and Ψ
have to be continuous, then dFD is the simply the Gromov-Hausdorff distance between the metric
spaces (F, d f ) and (G, dg ) [175]. The above definition is thus a function-adapted version of the
continuous Gromov-Hausdorff distance 2 .

Properties of the distances. The two distances we introduced turn out to be strongly equivalent.

Theorem 7.6 (Bi-Lipschitz equivalence). dFD ≤ 3dI ≤ 3dFD .

Furthermore, it is known that for Reeb graphs F, G derived from two “nice” functions f, g :
X → R defined on the same domain X, both distances are stable [20, 116].

Definition 7.9 (Stable distance). Given f, g : X → R, let (F, f˜) and (G, g̃) be the Reeb graph of f
and g, respectively.
We say that a Reeb graph distance dR is stable if

dR (F, f˜), (G, g̃) ≤ k f − gk∞ .




Finally, it is also known that these distances are bounded from below (up to a constant factor)
by the bottleneck distance between the persistence diagrams associated to the two input Reeb
graphs. In particular, given (F, f ) (and similarly for (G, g)), consider the 0-th persistence diagram
Dgm0 ( f ) induced by the levelset zigzag-filtration of f as in previous section. We consider only
the 0-th persistence homology as each levelset f −1 (a) consisting of only a finite set of points. We
have the following result (see Theorem 3.2 of [32]).

Theorem 7.7. db (Dgm0 ( f ), Dgm0 (g)) ≤ 2dI (F, G) ≤ 2dFD (F, G).

Universal Reeb graph distance. We introduced two Reeb graph distances above. There are
other possible distances for Reeb graphs, such as the edit distance originally developed for Reeb
graphs induced by functions on curves and surfaces. All these distances are stable, which is an im-
portant property to have. The following concept allows one to identify the most “discriminative"
Reeb graph distance among all stable distances.

Definition 7.10. A Reeb graph distance dU is universal if and only if (i) dU is stable; and (ii) for
any other stable Reeb graph distance dS , we have dS ≤ dU .
2
It turns out that if one removes the requirement of continuity on Φ and Ψ, the resulting functional distortion
distance takes values within a constant factor of dFD we defined for the case of Reeb graphs.
186 Computational Topology for Data Analysis

It has been shown that neither the interleaving distance nor the functional distortion distance
is universal. On the other hand, for Reeb graphs of piecewise-linear functions defined on com-
pact triangulable spaces, such universal Reeb graph distance indeed exists. In particular, one
can construct a universal Reeb graph distance via a pullback idea to a common space; see [21].
The authors of [21] propose two further edit-like distances for Reeb graphs, both of which are
universal.

Computation. Unfortunately, except for the bottleneck distance db , the computation of any of
the distances mentioned above is at least as hard as graph isomorphism. In fact, even for merge
trees (which are simpler variant of the Reeb graph, described in Definition 7.2 at the end of
Section 7.1), it is NP-hard to compute the interleaving distance between them [6]. But for this
special case, a fixed-parameter tractable algorithm exists [289].

7.4 Notes and Exercises


The Reeb graph was originally introduced for Morse functions [261]. It was naturally extended
to more general spaces as it does not rely on smooth / differential structures. This graph, as
a summary of a scalar field, has found many applications in graphics, visualization and more
recently in data analysis; see e.g. [30, 31, 88, 123, 167, 189, 190, 276, 285, 290, 301]. Its loop
free version, the contour tree, has many applications of its own. Properties of the Reeb graph has
been studied in [107, 140]. The concept of Reeb space was introduced in [150]. The relations of
Merge, Split and Contour trees are studied in [66, 296].
An O(m log m) algorithm to compute the Reeb graph of a function on a triangulation of a
2-manifold is given in [107], where m is the size of the triangulation: In particular, it follows a
similar high level framework as in Algorithm 12:Reeb-SweepAlg. For the case where K represents
the triangulation of a 2-manifold, the pre-image graph Ga has a simpler structure (a collection of
disjoint loops for a generic value a). Hence the connectivity of Ga s can be maintained efficiently
in O(log nv ) time, rendering an O(m log nv ) = O(m log m) time algorithm to compute the Reeb
graph [107]. Several subsequent algorithms are proposed to handle more general cases; e.g,
[145, 146, 253, 287]. The best existing algorithm for computing the Reeb graph of a PL-function
defined on a simplicial complex, as described in Section 7.2.1, was proposed by Parsa in [252].
The randomized algorithm with the same time complex (in expectation) described in Section
7.2.2 was given in [185]. The loop-free version of the Reeb graph, namely, the contour tree, can
be computed much more efficiently in O(n log n) time, where n is the total number of vertices and
edges in the input simplicial complex domain [66]. As a by-product, this algorithm also computes
both the merge tree and split tree of the input PL-function within the same time complexity.
The concepts of horizontal and vertical homology groups were originally introduced in [103]
for any dimensions. The specific connection of the 1-dimensional case to the Reeb graphs (e.g.,
Theorem 7.4) was described in [140]. The 0-th levelset zigzag persistence (or equivalently, the
0-th and 1-st extended persistence) for the Reeb graph can be computed in O(n log n) time using
an algorithm of Agarwal et al. [5] originally proposed for computing persistence of functions
on surfaces based on mergeable tree data structures [169]. For the correctness proof of this
algorithm, see [127].
Computational Topology for Data Analysis 187

The interleaving distance of merge trees was originally introduced by Morozov et al. in
[237]. The interleaving distance for the Reeb graphs is more complicated, and was introduced
by de Silva et al. [116]. There is also an equivalent cosheave-theoretical way of defining the
interleaving distance. Its description involves the sheaf theory [112]. The functional distortion
distance for Reeb graphs was originally introduced in [20], and its relation to interleaving distance
was studied in [24]. The lower-bound in Theorem 7.7 was proven in [32]; while some weaker
bounds were earlier given in [47, 24]. An interesting distance between Reeb graphs can be defined
by mapping its levelset zigzag persistence module to a 2-parameter persistence module. See the
Notes in Chapter 12 for more details. The edit distance for Reeb graphs induced by functions
on curves or surfaces has been proposed in [158, 159]. Finally, the universality of Reeb graph
distance and universal (edit-like) distance for Reeb graphs was proposed and studied in [21].
It remains an interesting open question whether the interleaving distance (and thus functional
distortion distance) is within a constant factor of the universal Reeb graph distance.

Exercise
1. Suppose we are given a triangulation K of a 2-dimensional square. Let f : |K| → R be a
PL-function on K induced by a vertex function f : V(K) → R. Assume that all vertices
have distinct function values.

(1.a) Given a value a ∈ R, describe the topology of the contour f −1 (a).


(1.b) As we vary a continuously from −∞ to +∞, show that the connectivity of f −1 (a) can
only change when a equals f (v) for some v ∈ V(K).
(1.c) Enumerate all cases of topological changes of contours when a passes through f (v)
for some v ∈ V.

2. Given a finite simplicial complex K and a PL-function f induced by f : V(K) → R, let


R f (K) be the Reeb graph w.r.t. f . Suppose we add a new simplex σ of dimension 1 or 2 to
K, and let K 0 be the new simplicial complex. Describe how to obtain the new Reeb graph
R f (K 0 ) from R f (K).

3. Recall the vertical homology group introduced in Section 7.2.3. Suppose we are given
compact spaces X ⊂ Y and a function f : Y → R; without loss of generality, denote the
restriction of f over X also by f : X → R. Prove that the inclusion induces a well-defined
homomorphism ι∗p : Ȟ p (X) → Ȟ p (Y) between the vertical homology groups Ȟ p (X) and
Ȟ p (Y) w.r.t. f .

4. Recall the concept of merge tree introduced in Definition 7.2 and Figure 7.3 (c). An alter-
native way to define interleaving distance for merge trees is as follows [237]:
First, a merge tree (T, h) can be treated as a rooted tree where the function h serves as the
height function, and the function value from the root to any leaf is monotonically decreas-
ing. We also extend the root upward to +∞. See Figure 7.9 (a). Given any point x ∈ |T |,
we can then refer to any point along the path from x to +∞ as its ancestor; in particular, we
define xa , called the a-shift of x, as the ancestor of x with function value h(x) + a.
188 Computational Topology for Data Analysis

Consider two merge trees T f = (T 1 , f ) and Tg = (T 2 , g). A pair of continuous maps


α : |T 1 | → |T 2 | and β : |T 2 | → |T 1 | are ε-compatible if the following conditions are satisfied:
(i) g(α(x)) = f (x) + ε for any x ∈ |T 1 |; (ii) f (β(y)) = g(y) + ε for any y ∈ |T 2 |
(iii) β ◦ α(x) = x2ε for any x ∈ |T 1 |; (iv) α ◦ β(y) = y2ε for any y ∈ |T 2 |.
The interleaving distance between merge tree can then also be defined as:

dI (T f , Tg ) := inf { there exists a pair of ε-compatible maps between T f and Tg }.


ε

(4.a) Show that for merge trees, dI (T f , Tg ) = dFD (T f , Tg ).


(4.b) Suppose T 1 and T 2 have complexity n and m, respectively. Given a threshold δ, design
an algorithm to check whether there exists an δ-compatible maps between T1 and T2 .
(Note that the time complexity of your algorithm may depend exponentially on n and
m.) (Hint: Due to properties (3) and (4) for δ-compatible maps, knowing the image
α(x) for a point x ∈ |T 1 | will determine the image of all ancestors of x under α. )

5. Given a finite simplicial complex K, let nd denote the number of d-dimensional simplices in
K. Let f be a PL-function on K induced by f : V(K) → R, and assume that all n0 vertices
in V(K) are already sorted in non-decreasing order of f . Describe an algorithm to compute
the merge tree for K w.r.t. f , and give the time complexity of your algorithm. (Make your
algorithm as efficient as possible.)
h +∞

2r

z = xa = y a
f (x) + a

f (x)
x y

(a) (b) (c)

Figure 7.9: (a) The point z is the a-shift of both x and y. (b) An example of input points sampling a
hidden graph (Q-shaped curve). (c) The r-Rips complex spanned by these points “approximates"
a thickened version of the hidden graph G ⊂ R2 . The Reeb graph for distance to a basepoint will
then aim to recover this hidden graph.

6. [Programming exercise]: Let P be a set of points in Rd . Imagine that points in P are sam-
pled around a hidden graph G ⊂ Rd ; in particular, P is an ε-sample of G. See Figure 7.9 (b)
and (c). Implement the following algorithm to compute a graph from P as an approximation
of the hidden graph G.

Step 1 : Compute the Rips complex K := VRr (P) for a parameter r. Assume K is con-
nected. (If not, perform the following for each connected component of K). Assign
the weight of each edge in the 1-skeleton K 1 of K to be its length.
Computational Topology for Data Analysis 189

Step 2 : Choose a point q ∈ P as the base point. Let f : P → R be the shortest path
distance function from any point p ∈ P to the base point q in the weighted graph K 1 .
Step 3 : Compute the Reeb graph G b of the PL-function induced by f , and return G.
b

The returned Reeb graph G b can serve as an approximation of the hidden graph G. See
[167, 88] for analysis of variants of the above procedure.
190 Computational Topology for Data Analysis
Chapter 8

Topological Analysis of Graphs

In this chapter, we present some examples of topological tools that help analyze or summarize
graphs. In the previous chapter, we discussed one specific type of graph, the Reeb graph, obtained
by quotienting a space with the connected components of levelsets of a given function. Abstractly,
a Reeb graph can also be considered as a graph equipped with a height function. In this chapter,
we focus on general graphs. Structures such as cliques in a graph correspond to simplices as we
have seen in Vietoris-Rips complexes. They can help summarizing or characterizing graph data.
See Figure 8.1 for an example [262], where a directed graph is used to model the synaptic network
of neurons built by taking neurons as the vertices and the synaptic connections directed from pre-
to postsynaptic neurons as the directed edges. It is observed that there are unusually high number

Figure 8.1: (A) shows examples of two directed cliques (simplices) formed in the synap-
tic network. (B) shows the number of p-simplices for different types of graphs, where
“Bio-M" is the synaptic network from reconstructed neurons. Note that this neuronal net-
work has far more directed cliques than other biological or random graphs. (C) shows
that the count of directed cliques further differ depending on which layers neurons reside.
Image taken from [262], licensed by Michael W. Reimann et al.(2017) under CC BY 4.0
(https://creativecommons.org/licenses/by/4.0/).

191
192 Computational Topology for Data Analysis

of directed cliques (viewed as a simplex as we show in Section 8.3.1) in such networks, compared
to other biological networks or random graphs. Topological analysis such as the one described in
Section 8.3 can facilitate such applications.
Before considering directed graphs, we focus on topological analysis of undirected graphs in
Sections 8.1 and 8.2. We present topological approaches to summarize and compare undirected
graphs. In Section 8.3, we discuss how to obtain topological invariants for directed graphs. In
particular, we describe two ways of defining homology for directed graphs. The first approach
constructs an appropriate simplicial complex over an input directed graph and then takes the cor-
responding simplicial homology of this simplicial complex (Section 8.3.1). The second approach
considers the so-called path homology for directed graphs, which differs from the simplicial ho-
mology. It is based on constructing a specific chain complex directly from directed paths in the
input graph, and defining a homology group using the boundary operators associated with the
resulting chain complex (Section 8.3.2). It turns out that both path homology and the persistent
version of it can be computed via a matrix reduction algorithm similar to the one used in the
standard persistence algorithm for simplicial filtrations though with some key differences. We de-
scribe this algorithm in Section 8.3.3, and mention an improved algorithm for the 1-st homology.

8.1 Topological summaries for graphs


We have seen graphs in various contexts so far. Here we consolidate some of the persistence
results that specifically involve graphs. Sometimes graphs appear as abstract objects devoid of
any geometry where they are described only combinatorially. At other times, graphs are equipped
with a function or a metric. Reeb graphs studied in the previous chapter fall into this latter
category. Combinatorial graphs (weighted or unweighted) can also be viewed as metric graphs,
by associating them with an appropriate shortest path metric (Section 8.1.2).

8.1.1 Combinatorial graphs


A graph G is combinatorially presented as a pair G = (V, E), where V is a set called nodes/vertices
of G and E ⊆ V × V is a set of pairs of vertices called edges of G. We now introduce two common
ways to obtain a persistence-based topological summary for G.

Graphs viewed as a simplicial 1-complex. We can view G as a simplicial 1-complex with V


and E being the set of 0-simplices and 1-simplices respectively. Using tools such as the per-
sistence algorithm for graph filtration mentioned in Chapter 3, we can summarize G w.r.t. a
given PL-function f : |G| → R by the persistence diagram Dgm f . This is what was done in
Section 7.2.3 in the previous chapter while describing persistent homology for Reeb graphs. In
practice, the chosen PL-functions are sometimes called descriptor functions. For example, we
can choose f : |G| → R to be given by a vertex function called the degree-function, where f (v)
equals the degree of the graph node v in G. Some other choices for the descriptor function include
the heat-kernel signature function [284] used in [67] and the Ollivier Ricci curvature of graphs
[226] used in [305]. Note that, under this view, given that the domain is a simplicial 1-complex,
there is only zeroth and 1-st persistent homology to consider.
Computational Topology for Data Analysis 193

Clique complex view. Given a graph G = (V, E), its induced clique complex, also called the
flag complex, is defined as follows.

Definition 8.1 (Clique complex). Given a graph G = (V, E), a clique simplex σ of dimension k is

σ = {vi0 , . . . , vik } where either k = 0 or for any j , j0 ∈ [0, k], (vi j , vi j0 ) ∈ E.

By definition, every face of a clique simplex is also a clique simplex. Therefore, the collection of
all clique simplices form a simplicial complex CG called the clique complex of G. In other words,
the vertices of any (k + 1)-clique in G spans a k-simplex in CG .

Given a weighted graph G = (V, E, ω) with ω : E → R, let Ga denote the subgraph of G


spanned by all edges with weight at most a; that is, Ga = (V a , E a ) where E a = {(u, v) | ω(u, v) ≤ a}
and V a is the vertex set adjoining E a . Let CGa be the clique complex induced by Ga . It is easy
to see that CGa ⊆ CGb for any a ≤ b. Assuming all edges E = {e1 , . . . , em } are sorted in non-
decreasing order of their weights and setting ai = ω(ei ), we thus obtain the following clique-
complex filtration:
CGa1 ,→ CGa2 ,→ · · · ,→ CGam .
The persistent homology induced by the clique-complex filtration can be used to summarize the
weighted graph G = (V, E, ω). Here one can consider the k-th homology groups for k up to |V| − 1.

8.1.2 Graphs viewed as metric spaces


A finite metric graph is a metric space (|G|, dG ) where the space is the underlying space of a
finite graph G, equipped with a length metric dG [58]. We have already seen metric graphs
in the previous chapter where Reeb graphs are equipped with a metric induced by a function
(Definition 7.7). We can also obtain a metric graph from a (positively) weighted combinatorial
graph.
Given a graph G = (V, E, ω) where the weight of each edge is positive1 , we can view it as a
metric graph (|G|, dG ) obtained by gluing a set of length segments (edges), where intuitively dG is
the shortest path metric on |G| induced by edge lengths ω(e)s.

Fact 8.1. A positively weighted graph G = (V, E, ω) induces a metric graph (|G|, dG ).

Indeed, viewing G as a simplicial 1-complex, let |G| be the underlying space of G. For every
edge e ∈ E, consider the arclength parameterization e : [0, ω(e)] → |e|, and define dG (x, y) =
|e−1 (y) − e−1 (x)| for every pair x, y ∈ |e|. The length of any path π(u, v) between two points
u, v ∈ |G| is the sum of the lengths of the restrictions of π to edges in G. The distance dG (u, v)
between any two points u, v ∈ |G| is the minimum length of any path connecting u to v in |G|
which is a metric. The metric space (|G|, dG ) is the metric graph of G.

Intrinsic Čech and Vietoris-Rips filtrations. Given a metric graph (|G|, dG ), let Bo|G| (x; r) :=
{y ∈ |G| | dG (x, y) < r} denote the radius-r open metric ball centered at x ∈ |G|. Following
1
If G is unweighted, then ω : E → R is the constant function ω(e) = 1 for any e ∈ E.
194 Computational Topology for Data Analysis

Definitions 2.9 and 2.102 , the intrinsic Čech complex Cr (|G|) and intrinsic Vietoris-Rips complex
VRr (|G|) are defined as:
\
Cr (|G|) := {x0 , . . . , x p } | Bo|G| (xi ; r) , ∅ ;

i∈[0,p]

VR (|G|) := {x0 , . . . , x p } | dG (xi , x j ) < 2r for any i , j ∈ [0, p] .


r 

Remark 8.1. Observe that intrinsic Čech and Vietoris-Rips complexes as defined above are in-
finite complexes because we consider all points in the underlying space. Alternatively, G =
(V, E, ω) can also be viewed as a discrete metric space (V, d̂) where d̂ : V × V → R+ ∪ {0} is the
restriction of dG to graph nodes V of G. We can thus build discrete intrinsic Čech or Vietoris-Rips
complexes spanned by only vertices in G. If G is a complete graph, then the discrete Vietoris-Rips
complex at scale r is equivalent to the clique complex for Gr as introduced in Section 8.1.1. Most
of our discussions below apply to analogous results for the discrete case.
We now consider the intrinsic Čech filtration C := {Cr }r∈R and intrinsic Vietoris-Rips filtra-
tion R := {VRr }r∈R , and their induced persistence modules H p C := {H p (Cr )}r∈R and H p R :=
{H p (VRr )}r∈R . We have (see [81]):
Fact 8.2. Given a finite metric graph (|G|, dG ) induced by G = (V, E, ω), the persistence modules
H p C and H p R are both q-tame (recall the definition of q-tame in Section 3.4).

Hence both the intrinsic Čech and intrinsic Vietoris-Rips filtrations induce well-defined per-
sistence diagrams, which can be used as summaries (signatures) for the input graph G = (V, E, ω).
In what follows, we present some results on the homotopy types of these simplicial complexes,
as well as their induced persistent homology.

Topology of Čech and Vietoris-Rips complexes. The intrinsic Čech and Vietoris-Rips com-
plexes induced by a metric graph may have non-trivial high-dimensional homology groups. The
following results from [2] provide a precise characterization of the homotopy groups of these
complexes for a metric graph whose underlying space is a circle. Specifically, let S1 denote the
circle of unit circumference which is assumed for simplicity; the results below can be extended
to a circle of any length by appropriate scaling. Let Sd denote the d-dimensional sphere.
Theorem 8.1. Let 0 < r < 12 . There are homotopy equivalences: for ` = 0, 1, . . . ,

` `+1
Cr (S1 ) ' S2`+1 if <r≤ ; and
2(` + 1) 2(` + 2)
` r `+1
VRr/2 (S1 ) ' S2`+1 if < ≤ .
2` + 1 2 2` + 3
We remark that if one uses the closed ball to define these complexes, then the statements are
similar and but involve some additional technicalities; see [2].
Much less is known for more general metric graphs. Below we present two sets of results:
Theorem 8.2 characterizes the intrinsic Vietoris-Rips complexes for a certain family of metric
2
Note that here we use open metric balls instead of closed metric balls to define the Čech and Rips complexes, so
that the theoretical result in Theorem 8.1 is cleaner to state.
Computational Topology for Data Analysis 195

graphs [3]; while Theorem 8.3 characterizes only the 1-st persistent homology induced by the
intrinsic Čech complexes, but for any finite metric graph [166]. Recall that H̃ p denotes the p-th
reduced homology group.
Theorem 8.2. Let G be a finite metric graph, with each edge of length one, that can be obtained
from a vertex by iteratively attaching (i) an edge along a vertex or (ii) a k-cycle graph along
a vertex or a single edge for k > 2 (see, e.g., Figure 8.2). Then we have that H̃ p (VR(G; r)) ≈
⊕ni=1 H̃ p (VR(Cki ; r)) where ⊕ stands for the direct sum, n is the number of times operation (ii) is
performed, and Cki is a loop of ki edges (and thus Cki is of length ki ) which was attached in the
i-th time that operation (ii) is performed.

C6

u w C4

Figure 8.2: A 4-cycle C4 is attached to the base graph along vertex v; while a 6-cycle C6 is
attached to the base graph along edge (u, w).

The above theorem can be relaxed to allow for different edge lengths though one needs to
define the “gluing” more carefully in that case. See [3] for details. Graphs described in Theorem
8.2 are intuitively generated by iteratively gluing a simple loop along a “short” simple path in
the existing graph. Note that the above theorem implies that the Vietoris-Rips complex for a
connected metric tree has isomorphic reduced homology groups as a point.

Persistent homology induced by Čech complexes. Instead of a fixed scale, Theorem 8.3 below
provides a complete characterization for the 1-st persistent homology of intrinsic Čech complex
filtration of a general finite metric graph. To present the result, we recall the concept of the
shortest cycle basis (optimal basis) for H1 (G) while treating G = (V, E, ω) as a simplicial 1-
complex (Definition 5.3). Specifically, in our setting, given any 1-cycle γ = ei1 + ei2 + · + eis ,
define the length of γ to be length(γ) = sj=1 ω(ei j ). A cycle basis of G refers to a set of g 1-
P
cycles Γ = {γ1 , . . . , γg } that form a basis for the 1-dimensional cycle group Z1 (G). Notice that we
can replace H1 (G) with the cycle group Z1 (G) because the two are isomorphic in case of graphs.
Given a cycle basis Γ, its length-sequence is the sequence of lengths of elements in the basis
in non-decreasing order. A cycle basis of G is a shortest cycle basis if its length-sequence is
lexicographically minimal among all cycle basis of G.
Theorem 8.3. Let G = (V, E, ω) be a finite graph with positive weight function ω : E → R. Let
{γ1 , . . . , γg } be a shortest cycle basis of G where g = rank (Z1 (G)), and for each i = 1, . . . , g, let
`i = length(γi ). Then, the 1-st persistence diagram Dgm1 C induced by the intrinsic Čech filtration
C := {Cr (|G|)}r∈R on the metric graph (|G|, dG ) consists of the following set of points on the y-axis:
`i
Dgm1 C = {(0, ) | 1 ≤ i ≤ g}.
4
196 Computational Topology for Data Analysis

Unfortunately, no such characterization is available for high-dimensional cases. Some partial


results on the higher-dimensional persistent homology induced by intrinsic Čech filtration are
given in [133].

8.2 Graph comparison


The topological invariants described in the previous section can be used as signatures to compare
graphs. For example, given two graphs G1 = (V1 , E1 , ω1 ) and G2 = (V2 , E2 , ω2 ) with positive
weight functions, let C(G1 ) and C(G2 ) denote the intrinsic Čech filtrations for (|G1 |, dG1 ) and
(|G2 |, dG2 ), respectively. We can then define dIC (G1 , G2 ) = db (Dgm1 C(G1 ), Dgm1 C(G2 )) and
dIC gives rise to a pseudo-distance (a metric for which the first axiom may hold only with ‘if’
condition) for the family of finite graphs with positive weights. Furthermore, this pseudo-distance
is stable w.r.t. the Gromov-Hausdorff distance by a generalization of Theorem 6.3 (ii) to totally
bounded metric spaces (see the discussions after Theorem 6.3).

Persistence-distortion distance. In what follows, we introduce another pseudo-distance for


metric graphs, called the persistence distortion distance, which, instead of mapping the entire
graph into a single persistence diagram, maps each point in the graph to such a summary. This
distance can thus compare (metric) graphs at a more refined level.
First, given a finite metric graph (|G|, dG ), for any point s ∈ |G|, consider the shortest path
distance function fs : |G| → R defined as: x 7→ dG (x, s). Let

Ps := Dgm0 fs , the 0-th persistence diagram induced by the function fs . (8.1)

Let G and D denote the space of finite metric graphs and the space of finite persistence diagrams,
respectively; and let 2D denote the space of all subsets of D. We define:

φ : G → 2D where for any |G| ∈ G, φ(|G|) 7→ { Ps | s ∈ |G| }. (8.2)

In other words, φ maps a metric graph |G| to a set of (infinitely many) points φ(|G|) in the space of
persistence diagrams D. The image φ(|G|) is another graph in the space of persistence diagrams
though this map φ is not necessarily injective.
Now let (|G1 |, dG1 ) and (|G2 |, dG2 ) denote the metric graphs induced by finite graphs G1 =
(V1 , E1 , ω1 ) and G2 = (V2 , E2 , ω2 ) with positive edge weights.
Definition 8.2 (Persistence distortion distance). Given finite metric graphs (|G1 |, dG1 ) and (|G2 |, dG2 ),
the persistence-distortion distance between them, denoted by dPD (G1 , G2 ), is the Hausdorff dis-
tance dH (φ(|G1 |), φ(|G2 |) between the two image sets φ(|G1 |) and φ(|G2 |) in the space of persistence
diagrams (D, db ) equipped with the bottleneck distance db . In other words, setting A := φ(|G1 |)
and B := φ(|G2 |), we have

dPD (G1 , G2 ) := dH φ(|G1 |), φ(|G2 |) = max max min db (P, Q); max min db (P, Q) .
 
P∈A Q∈B Q∈B P∈A

The persistence distortion dPD is a pseudo-metric. It can be computed in polynomial time for
finite input graphs. It is stable w.r.t. the Gromov-Hausdorff distance between the two input metric
graphs.
Computational Topology for Data Analysis 197

Theorem 8.4. dPD (G1 , G2 ) ≤ 6dGH (|G1 |, |G2 |).


dPD = dH (b
One can also define a discrete persistence distortion distance b φ(G1 ), b
φ(G2 )), where
φ(G) := {Ps | s ∈ V} for a graph G = (V, E, ω). Both the persistence distortion distance and its
b
discrete variant can be computed in time polynomial in the size (number of vertices and edges) of
the combinatorial graphs G1 and G2 generating the metric graphs |G1 | and |G2 | respectively.

8.3 Topological invariants for directed graphs


In this section, we assume that we are given a directed graph G = (V, E,~ ω) where E~ ⊆ V × V is the
~
directed edge set, and ω : E → R is the edge weight function (if the input graph is unweighted,
we assume that all weights equal to 1). Each directed edge (u, v) is an ordered pair, and thus edge
~ and also there is at
(u, v) , (v, u). For simplicity, we assume that there is no self-loop (v, v) in E,
most one directed edge between an ordered pair of nodes. Given a node v ∈ V, its in-degree is
~ and its out-degree is outdeg(v) = |{u | (v, u) ∈ E}|.
indeg(v) = |{u | (u, v) ∈ E}|, ~

8.3.1 Simplicial complexes for directed graphs


Treating a directed graph as an asymmetric network (as it may be that ω(u, v) , ω(v, u)), one can
extend ideas in the previous section to this asymmetric setting. We give two examples below:
both cases lead to simplicial complexes from an input directed graph (weighted or unweighted),
and one can then compute (persistent) homological information of (filtrations of) this simplicial
complex as summaries of input directed graphs.
c c
b b
b d b d
a c a a
a c e e
d g f g f
(a) (b)

Figure 8.3: (a) a 3-clique and a 4-clique with source a and sink c. (b) A directed graph (left) and
its directed clique complex (right). The set of triangles in this complex are: {bce, ced, ed f }. There
is no higher dimensional simplices. Note that if the edge (b, d) is also in the directed graph in (b),
then the tetrahedron bcde will be in its corresponding directed clique complex.

Directed clique complex. A node in a directed graph is a source node if it has in-degree 0;
and it is a sink node if it has out-degree 0. A directed cycle is a sequence of directed edges
(v0 , v1 ), (v1 , v2 ), . . . , (vk , v0 ). A graph is a directed acyclic graph (DAG) if it does not contain any
directed cycle. A graph ({v1 , . . . , vk }, E 0 ) is a directed k-clique if (i) there is exactly one edge
between any pair of (unordered) vertices (thus there are 2k edges in E 0 ), and (ii) it is a DAG. See
Figure 8.3 (a) for examples. A set of vertices {vi1 , . . . , vik } spans a directed clique in G = (V, E) ~ if
0 ~
there is a subset of edges of E ⊆ E such that ({vi1 , . . . , vik }, E ) is a directed k-clique. It is easy to
0

see that given a directed clique, any subset of its vertices also form a directed clique (Exercise 5).
198 Computational Topology for Data Analysis

~ the directed clique


Definition 8.3 (Directed clique complex). Given a directed graph G = (V, E),
complex induced by G is a simplicial complex K defined as

C(G) := {σ = {vi1 , . . . , iik } | {vi1 , . . . , iik } spans a directed k-clique in G}.


b

Hence a k-clique spans a (k-1)-simplex in the directed clique complex. See Figure 8.3 (b) for
a simple example. Now given a weighted directed graph G = (V, E, ~ ω), for any a ≥ 0, let Ga be
the subgraph of G spanned by all directed edges whose weight is at most a. Assuming all edges
~ are sorted by their weights in a non-decreasing order, set ai = ω(ei ). Similar
e1 , . . . , em , m = |E|,
to the clique complex filtration for undirected graphs introduced in Section 8.1.1, this gives rise
to the following filtration of simplicial complexes induced by the directed clique complexes:

C(Ga1 ) ,→ b
b C(Ga2 ) ,→ · · · ,→ b
C(Gam ).

One can then use the persistence diagram induced by the above filtration as a topological invariant
for the input directed graph G.
~ ω) and a threshold
Definition 8.4 (Dowker complex). Given a weighted directed graph G = (V, E,
δ, the Dowker δ-sink complex is the following simplicial complex:

Dδsi (G) := {σ = {vi0 , . . . , vid } | there exists v ∈ V so that ω(vi j , v) ≤ δ for any j ∈ [0, d]}. (8.3)

In the above definition, v is called a δ-sink for the simplex σ. In the example on the right of
Figure 8.3 (a), assume all edges have weight 1. If we now remove edge (b, d), then abd is not a
3-clique any more in Gδ=1 . However, abd still forms a 2-simplex in the Dowker sink complex D1si
with sink c.
In general, as δ increases, we obtain a sequence of Dowker complexes connected by inclu-
sions, called the Dowker sink filtration D si (G) = {Dδsi ,→ Dδsi0 }δ≤δ0 .
Alternatively, one can define the Dowker δ-source complex in a symmetric manner:

Dδso (G) := {σ = {vi0 , . . . , vid } | there exists v ∈ V so that ω(v, vi j ) ≤ δ for any j ∈ [0, d]} (8.4)

resulting in a Dowker source filtration D so (G) = {Dδso ,→ Dδso0 }δ≤δ0 . It turns out that by the duality
theorem of Dowker [147], the two Dowker complexes have isomorphic homology groups. It can
be further shown that the choice of Dowker complexes does not matter when persistent homology
is considered [99].

Theorem 8.5 (Dowker Duality). Given a directed graph G = (V, E, ~ ω), for any threshold δ ∈ R
si so
and dimension p ≥ 0, we have H p (Dδ )  H p (Dδ ). Furthermore, the persistence modules induced
by the Dowker sink and the Dowker source filtrations are isomorphic as well, that is,

Dgm p D si = Dgm p D so , for any p ≥ 0.

8.3.2 Path homology for directed graphs


In this subsection, we introduce the so-called path homology, which is different from the simpli-
cial homology that we defined for clique complex and Dowker complex. Instead of constructing a
simplicial complex from an input directed graph and considering its simplicial homology group,
Computational Topology for Data Analysis 199

here, we use the directed graph to define a chain complex directly. The resulting path homol-
ogy group has interesting mathematical structures behind, e.g., there is a concept of homotopy in
directed graphs under which the path homology is preserved, and it accommodates the Künneth
formula [186].
Note that in this chapter, we have assumed that a given directed graph G = (V, E) ~ does not
contain self-loops (where a self-loop is an edge (u, u) from u to itself). For notational simplicity,
below we sometimes use index i to refer to vertex vi ∈ V = {v1 , . . . , vn }.
Let k be a field with 0 and 1 being the additive and multiplicative identities respectively. We
use −a to denote the additive inverse of a in k. An elementary p-path on V is an ordered sequence
vi0 , vi1 , · · · , vi p of p + 1 of vertices of V, which we denote by evi0 ,vi1 ,...,vid , or just ei0 ,i1 ,··· ,i p for
simplicity. Let Λ p = Λ p (G, k) denote the k-linear space of all linear combinations of elementary
p-paths with coefficients from k. The set {ei0 ,··· ,i p | i0 , · · · , i p ∈ V} forms a basis for Λ p . Each
element c of Λd is called a p-path or p-chain, and it can be written as
X
c= ai0 ···i p ei0 ···i p , where ai0 ···i p ∈ k.
i0 ,··· ,i p ∈V

Similar to the case of simplicial complexes, we can define boundary map ∂ p : Λ p → Λ p−1 as:
X
∂ p ei0 ···i p = (−1)k ei0 ···î j ···i p , for any elementary p -path ei0 ···i p ,
i0 ,··· ,i p ∈V

where îk means the removal of index ik . The boundary of a p-path c = ai0 ···i p · ei0 ···i p , is thus
P
∂ p c = ai0 ···i p · ∂ p ei0 ···i p . For convenience, we set Λ−1 = 0 and note that Λ0 is the set of k-linear
P
combinations of vertices in V. It is easy to show that ∂ p−1 · ∂ p = 0, for any p > 0. In what follows,
we often omit the dimension p from ∂ p when it is clear from the context.
Next, we restrict the consideration to real paths in directed graphs formed by consecutive
directed edges. Specifically, given a directed graph G = (V, E), ~ call an elementary p-path ei0 ,··· ,i p
allowed if there is an edge from ik to ik+1 for all k ∈ [0, p − 1]. Define A p as the space spanned
by all allowed elementary p-paths, that is, A p := span{ei0 ···i p : ei0 ···i p is allowed}. An elementary
p-path i0 · · · i p is called regular if ik , ik+1 for all k, and is irregular otherwise. Clearly, every
allowed path is regular since there is no self-loop. However, applying the boundary map ∂ to Λ p
may create irregular paths. For example, ∂euvu = evu − euu + euv is irregular because of the term
euu . To deal with this case, the term containing consecutive repeated vertices is taken as 0. Thus,
for the previous example, we have ∂euvu = evu − 0 + euv = evu + euv . The boundary map ∂ on
A p is now taken to be the boundary map for Λ p restricted on A p with this modification, where
all terms with consecutive repeated vertices created by the boundary map ∂ are replaced with 0’s.
For simplicity, we still use the same symbol ∂ to represent this modified boundary map on the
space of allowed paths.
After restricting the boundary operator to the space of allowed paths A p s, the inclusion that
∂A p ⊂ A p−1 may not hold; that is, the boundary of an allowed p-path is not necessarily an allowed
(p − 1)-path. To this end, we adopt a stronger notion of allowed paths: a path c is ∂-invariant
if both c and ∂c are allowed. Let Ω p := {c ∈ A p | ∂c ∈ A p−1 } be the space generated by all
∂-invariant p-paths. Note that ∂Ω p ⊂ Ω p−1 (as ∂2 = 0). This gives rise to the following chain
complex of ∂-invariant allowed paths:
∂ ∂ ∂ ∂
· · · Ωp →
− Ω p−1 →
− · · · Ω1 →
− Ω0 →
− 0.
200 Computational Topology for Data Analysis

We can now define the homology groups of this chain complex.

Definition 8.5 (Path homology). The p-th cycle group is defined as Z p = ker ∂|Ω p , and elements
in Z p are called p-cycles. The p-th boundary group is defined as B p = Im ∂|Ω p+1 , with elements of
B p called p-boundary cycles (or simply p-boundaries). The p-th path homology group is defined
as H p (G, k) = Z p /B p .

v2 v4

v1 v6

v3 v5

Figure 8.4: A directed graph G.

Examples. Consider the directed graph in Figure 8.4, and assume that the coefficient field k ,
Z2 : Examples of elementary 1-path include: e12 , e24 , e13 , e14 , and so on. However, e13 and e14 are
not an allowed 1-path. More examples of allowed 1-path include: e12 +e46 , e12 +e31 , e46 +e65 +e45
and e46 +e65 −e45 . Note that any allowed 1-path is also ∂-invariant; that is, Ω1 = A1 , as all 0-paths
are allowed. Observe that ∂(e46 + e65 + e45 ) = e6 − e4 + e5 − e6 + e5 − e4 = 2e5 − 2e4 , which
is not 0 (unless the coefficient field k = Z2 ). However, ∂(e46 + e65 − e45 ) = 0, meaning that
e46 + e65 − e45 ∈ Z1 . Other 1-cycle examples include

e12 + e23 + e31 , e24 + e45 − e23 − e35 , and e12 + e24 + e45 − e53 + e31 ∈ Z1

Examples of elementary 2-paths include: e123 , e245 , e256 and e465 . However, e256 is not al-
lowed. Consider the allowed 2-path e245 , its boundary ∂e245 = e45 − e25 + e24 is not allowed as e25
is not allowed. Hence the allowed 2-path e245 is not ∂-invariant; similarly, we can see that neither
e235 nor e123 is in Ω2 . It is easy to check that e465 ∈ Ω2 as ∂e465 = e65 − e45 + e46 . Also note that
while neither e235 nor e245 is in Ω2 , the allowed 2-path e245 − e235 is ∂-invariant as

∂(e245 − e235 ) = e45 − e25 + e24 − e35 + e25 − e23 = e45 + e24 − e35 − e23 ∈ A1 .

This example suggests that elementary ∂-invariant p-paths do not necessarily form a basis for Ω p
– this is rather different from the case of simplicial complex, where the set of p-simplices form a
basis for the p-th chain group.
The above discussion also suggests that e46 + e65 − e45 , e24 + e45 − e23 − e35 ∈ B1 .
For the example in Figure 8.4,

{e12 + e23 + e31 , e46 + e65 − e45 , e24 + e45 − e23 − e35 } is a basis for the 1-cycle group Z1 ;
{e46 + e65 − e45 , e24 + e45 − e23 − e35 } is a basis for the 1-boundary group B1 ; while
{e245 − e235 , e465 } is a basis for the space of ∂-invariant 2-paths Ω2 .
Computational Topology for Data Analysis 201

Persistent path homology for directed graphs. Given a weighted directed graph G = (V, E, ~ ω),
let Ga denote the subgraph of G containing all directed edges with weight at most a. This gives
rise to a filtration of graphs G : {Ga ,→ Gb }a≤b . Let H p (Ga ) denote the p-th path homology
induced by graph Ga . It can be shown [100] that the inclusion Ga ,→ Gb induces a well-defined
homormorphism ξa,b p : H p (Ga ) → H p (Gb ), and the sequence G : {Ga ,→ Gb }a≤b leads to a
persistence module H p G : {H p (Ga ) → H p (Gb )}a≤b .

8.3.3 Computation of (persistent) path homology


The example in the previous section illustrates the challenge for computing path homology in-
duced by a directed graph G in comparison to simplicial homology. In particular, the set of
elementary allowed d-paths may no longer form a basis for the space of the ∂-invariant d-paths
Ωd : Indeed, recall that for the graph in Figure 8.4, {e465 , e245 − e235 } form a basis for Ω2 , yet,
neither e245 nor e235 belongs to Ω2 .
We now present an algorithm to compute the persistent path homology of a given weighted
directed graph G = (V, E,~ ω). Note that as a byproduct, this algorithm can also compute the path
homology of a directed graph.

Algorithm setup. Given a p-path τ, its allowed-time is set to be the smallest value (weight)
a when it belongs to A p (Ga ); and we denote it by at(τ) = a. Let A p = {τ1 , . . . , τt } denote
the set of elementary allowed p-paths, sorted by their allowed-times in a non-decreasing order.
Similarly, set A p−1 = {σ1 , . . . , σ s } to be the sequence of elementary allowed (p − 1)-paths sorted
by their allowed-times in a non-decreasing order. Let a1 < a2 < · · · < atˆ be the sequence of
distinct allowed-times of elementary p-paths in A p in increasing orders. Obviously, tˆ ≤ t = |A p |.
Similarly, let b1 < b2 < · · · < b ŝ be the sequence of distinct allowed-times for (p − 1)-paths in
A p−1 sorted in increasing order.
Note that A p (resp. A p−1 ) forms a basis for A p (G) (resp. A p−1 (G)). In fact, for any i, set
Aapi := {τ j | at(τ j ) ≤ ai }. It is easy to see that Aapi equals {τ1 , . . . , τρi }, where

ρi ∈ [1, t] is the largest index of any elementary p-path whose allowed-time is at most ai ; (8.5)

and Aapi forms a basis for A p (Gai ). Note that the cardinality of Aapi \ Aapi−1 could be larger than 1 and
b
that is why ρi is not necessarily equal to i. A symmetric statement holds for A p−1
j
and A p−1 (Gb j ).
From now on, we fix a dimension p. At high level, the algorithm for computing the p-th
persistent path homology has the following three steps, which looks similar to the algorithm
that computes standard persistent homology for simplicial complexes. However, there are key
differences in the implementation of these steps.

Step 1. Set up a “boundary matrix” M = M p .

Step 2. Perform left-to-right matrix reduction to transform M to a reduced form M.


b

Step 3. Construct the persistence diagram from the reduced matrix M.


b

The details of these steps are given as follows.


202 Computational Topology for Data Analysis

Description of Step 1. The columns of M correspond to A p , ordered by their allowed-times.


We would like col M [i] = ∂τi . However, the boundary of an allowed path may not be allowed.
Hence the rows of the matrix need to correspond to not only the elementary allowed (p − 1)-paths
in A p−1 (ordered by their allowed-times), but also any elementary (non-allowed) (p − 1)-path
that appears in the boundary of any τ j ∈ A p : we assign the allowed-times for such paths to
be +∞. The rows of M are ordered in non-decreasing allowed-times from top to bottom. Let
bp−1 = {σ1 , . . . , σ s , σ s+1 , . . . , σ` } be the final set of elementary (p − 1)-paths corresponding to
A
rows of M. Note that the first s elements are from A p−1 , while those in {σ s+1 , . . . , σ` } are not-
allowed, and have allowed-time +∞. See an example in Figure 8.5 (a) and (b).
The matrix M represents the boundary operator ∂ p restricted to A p . In other words, the i-
th column of M, denoted by col M [i], contains the boundary of τi , represented using the basis
elements in A bp−1 ; that is, ∂ p τi = P` col M [i][ j]σ j . From a vector representation point of view,
j=1
we will also simply say that ∂ p τi = col M [i]. The allowed time for the (p−1)-path represented by a
column vector C is simply the allowed time at(σ j ) associated to the low-row index j = lowId(C)
in this vector: It is important to note that the rows of M are ordered in increasing indices from top
down. Hence lowId of a column means the largest index in A bp−1 for which this column contains
a non-zero entry. We further associate a p-path γi with the i-th column of M for each i ∈ [1, t],
with the property ∂ p γi = col M [i]. At the beginning of the algorithm, γi is initialized to be τi and
later will be updated through the reduction process in (Step 2) below.

eab 0 1 0 eab 0 1 −1
ecb 0 0 1 ecb 0 0 1
G b c ead 0 ead 0
1 2 −1 0 −1 1
eed 1 0 0 eed 1 0 −1
a 10 5 ece 1 0 0 ece 1 0 −1
ebd 0 1 1 ebd 0 1 0
3 4 ecd −1 0 −1 ecd −1 0 0
d e eced eabd ecbd eced eabd ecbd
(a) graph G (b) original boundary matrix M (c) reduced matrix M
b

Figure 8.5: The input is the weighted directed graph in (a). Its 1-dimensional boundary matrix
M as constructed in (Step 1) is shown in (b). Note that at(ecd ) = +∞ (so ecd < A1 (G)). For
each edge (i.e, elementary allowed 1-path) in G, its allowed-time is simply its weight. There are
only three elementary allowed 2-paths, and their allowed-times are: at(eced ) = 5, at(eabd ) = 10
and at(ecbd ) = 10. (c) shows the reduced matrix. From this matrix, we can deduce that the 1-
th persistence diagram (for path homology) includes two points: (10, 10) and (5, 10) (generated
by the second and third columns). Note that for the first column (corresponding to eced ), as
at(col Mb [1]) = ∞; hence the corresponding γ1 is not ∂-invariant.

Description of Step 2. We now perform the standard left-to-right matrix reduction on M, where the only allowed operation is to add a column to some column on its right. We convert M to its reduced form M̂ (Definition 3.13); through this process, we also update the γ_i's accordingly so that at any moment ∂_p γ_i = col_{M′}[i], where M′ is the updated boundary matrix at that point. In particular, if we add column j to column i > j, then we update γ_i = γ_i + γ_j. We note that, other than the additional maintenance of the γ_i's, this reduction of M is the same as the reduction in Algorithm 3:MatPersistence given in Section 3.3. The following claim follows easily from the facts that there are only left-to-right column additions and that the allowed-times of the γ_i's are initially sorted in non-decreasing order.
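As an illustration, here is a minimal sketch of this reduction (not the book's code). Since path-homology boundary matrices carry ±1 entries, the sketch works over the rationals and adds a scalar multiple of an earlier column when cancelling; Python's Fraction type and the bookkeeping format for the γ_i's are choices of this sketch.

```python
from fractions import Fraction

def low_id(col):
    # Largest row index carrying a non-zero entry, or -1 for a zero column.
    for j in range(len(col) - 1, -1, -1):
        if col[j] != 0:
            return j
    return -1

def reduce_matrix(M):
    # Left-to-right reduction of a column-major matrix over Q.  gamma[i]
    # records gamma_i as coefficients over the original tau_1, ..., tau_t.
    cols = [[Fraction(x) for x in c] for c in M]
    t = len(cols)
    gamma = [[Fraction(int(i == j)) for j in range(t)] for i in range(t)]
    owner = {}                                # low-row index -> column
    for i in range(t):
        l = low_id(cols[i])
        while l != -1 and l in owner:
            j = owner[l]
            c = cols[i][l] / cols[j][l]       # cancel the lowest entry
            cols[i] = [a - c * b for a, b in zip(cols[i], cols[j])]
            gamma[i] = [a - c * b for a, b in zip(gamma[i], gamma[j])]
            l = low_id(cols[i])
        if l != -1:
            owner[l] = i
    return cols, gamma
```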

Claim 8.1. For any i ∈ [1, t], the allowed-time of γi remains the same through any sequence of
left-to-right column additions.

Let Ω_p^i denote the space of ∂-invariant p-paths w.r.t. G^{a_i}; that is, Ω_p^i = Ω_p(G^{a_i}). Given a p-path τ, let ent(τ) be its entry-time, which is the smallest value a such that τ ∈ Ω_p(G^a). It is easy to see that for any p-path τ, we have

    ent(τ) = max{at(τ), at(∂_p τ)}.    (8.6)
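For instance, using the values from Figure 8.5, the path γ_1 associated with the first reduced column has

    ent(γ_1) = max{at(γ_1), at(∂_2 γ_1)} = max{5, +∞} = +∞,

so γ_1 never becomes ∂-invariant, as noted in the caption of Figure 8.5.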

Recall that each column vector col_{M̂}[i] is in fact the vector representation of a (p − 1)-path (with respect to the basis elements in Â_{p−1} = {σ_1, . . . , σ_ℓ}). Also, the allowed-time of a column col_{M̂}[i] is given by at(col_{M̂}[i]) = at(σ_h) where h = lowId(col_{M̂}[i]).

Claim 8.2. Given a reduced matrix M̂, let C = Σ_{i=1}^t c_i col_{M̂}[i] be a (p − 1)-path. Let col_{M̂}[j] be the column with the lowest (i.e., largest) lowId among all columns col_{M̂}[i] with c_i ≠ 0, and set h = lowId(col_{M̂}[j]). It then follows that at(C) = at(σ_h).

Now, for the reduced matrix M̂ and any i ∈ [1, t̂], we set ρ_i to be the largest index j ∈ [1, t] such that at(γ_j) ≤ a_i. By Claim 8.1, each p-path γ_j has a fixed allowed-time associated to it, which stays invariant through the reduction process. So this quantity ρ_i is well defined, consistent with what we defined earlier in Eqn. (8.5), and remains invariant through the reduction process. Now set:

    Γ_i := {γ_1, . . . , γ_{ρ_i}},
    I_i := {j ≤ ρ_i | at(col_{M̂}[j]) ≤ a_i}, and
    Σ_i := {γ_j | ent(γ_j) ≤ a_i} = {γ_j | j ∈ I_i}.

Theorem 8.6. For any k ∈ [1, t̂], Γ_k forms a basis for A_p^k := A_p(G^{a_k}), while Σ_k forms a basis for Ω_p^k = Ω_p(G^{a_k}).

Proof. That Γ_k forms a basis for A_p^k follows easily from the facts that originally {τ_1, . . . , τ_{ρ_k}} forms a basis for A_p^k, and the left-to-right column additions maintain this. In what follows, we prove that Σ_k forms a basis for Ω_p^k. First, note that all elements in Σ_k represent paths in Ω_p^k, and they are linearly independent by construction (as their low-row indices are distinct). So we only need to show that any element in Ω_p^k can be represented by a linear combination of vectors in Σ_k.

Let ξ_k denote the largest index j ∈ [1, s] such that at(σ_j) ≤ a_k. In other words, an equivalent formulation of I_k is I_k = {j ≤ ρ_k | lowId(col_{M̂}[j]) ≤ ξ_k}.

Now consider any γ ∈ Ω_p^k ⊆ A_p^k. As Γ_k forms a basis for A_p^k, we have that

    γ = Σ_{i=1}^{ρ_k} c_i γ_i  and  ∂γ = Σ_{i=1}^{ρ_k} c_i ∂γ_i = Σ_{i=1}^{ρ_k} c_i col_{M̂}[i].

As γ ∈ Ω_p^k and ent(γ) = max{at(γ), at(∂γ)} (see Eqn. (8.6)), we have at(γ) ≤ a_k and at(∂γ) ≤ a_k. By Claim 8.2, it follows that for any j ∈ [1, ρ_k] with c_j ≠ 0, its lowId satisfies lowId(col_{M̂}[j]) ≤ ξ_k. Hence each such index j with c_j ≠ 0 must belong to I_k, and as a result, γ can be written as a linear combination of p-paths in Σ_k. Combined with the fact that all vectors in Σ_k are in Ω_p^k and are linearly independent, it follows that Σ_k forms a basis for Ω_p^k. □

Corollary 8.7. Set J_k := {j ∈ I_k | col_{M̂}[j] is all zeros}. Further, set Z^k := {γ_j | j ∈ J_k} and B^k := {col_{M̂}[j] | j ∈ I_k \ J_k}. Then (i) Z^k forms a basis for the p-dimensional cycle group Z_p(G^{a_k}); and (ii) B^k forms a basis for the (p − 1)-dimensional boundary group B_{p−1}(G^{a_k}).

Proof. Let ∂̂_p denote the restriction of ∂_p over Ω_p. Recall that Z_p = Ker ∂̂_p, while B_{p−1} = Im ∂̂_p. It is easy to see that, by construction of Z^k, we have Z^k ⊆ Span(Z^k) ⊆ Z_p(G^{a_k}). Since all the γ_i's are linearly independent, the vectors in Z^k are linearly independent. It then follows that |Z^k| ≤ rank(Z_p(G^{a_k})), where |Z^k| stands for the cardinality of Z^k.

Similarly, as the matrix M̂ is reduced, all non-zero columns of M̂ are linearly independent, and thus the vectors in B^k are linearly independent. Furthermore, by Theorem 8.6, each vector in B^k is in B_{p−1}(G^{a_k}) (as it is the boundary of a p-path from Ω_p^k). Hence we have Span(B^k) ⊆ B_{p−1}(G^{a_k}) and |B^k| ≤ rank(B_{p−1}(G^{a_k})).

On the other hand, let ∂̂_p|_{Ω_p^k} denote the restriction of ∂̂_p to Ω_p^k ⊆ Ω_p. Note that by the Rank-Nullity Theorem,

    |Σ_p^k| = rank(Ω_p^k) = rank(ker(∂̂_p|_{Ω_p^k})) + rank(im(∂̂_p|_{Ω_p^k})) = rank(Z_p(G^{a_k})) + rank(B_{p−1}(G^{a_k})).

As |Σ_p^k| = |Z^k| + |B^k|, combining the above equation with the inequalities obtained in the previous paragraphs, it must be that |Z^k| = rank(Z_p(G^{a_k})) and |B^k| = rank(B_{p−1}(G^{a_k})). The claim then follows. □

Description of Step 3: constructing the persistence diagram from the reduced matrix M̂. Given a weighted directed graph G = (V, E⃗, ω), for each dimension p ≥ 0, construct the boundary matrix M_{p+1} as described in (Step 1). Perform the left-to-right column reduction on M_{p+1} to obtain a reduced form M̂ = M̂_{p+1} as in (Step 2). The p-th persistence diagram Dgm_p(G), where G : {G^a ↪ G^b}_{a≤b}, can be computed as follows.

Let µ_p^{a,b} denote the persistence pairing function: that is, the persistence point (a, b) is in Dgm_p(G) with multiplicity µ_p^{a,b} if and only if µ_p^{a,b} > 0. At the beginning, µ_p^{a,b} is initialized to be 0 for all a, b ∈ R. We then inspect every non-zero column col_{M̂}[i] and take the following actions.
• If at(col_{M̂}[i]) ≠ ∞, then we increase the pairing function µ_p^{at(col_{M̂}[i]), ent(γ_i)} by 1, where γ_i is the (p + 1)-path associated with this column. Observe that at(col_{M̂}[i]) ≤ ent(γ_i) because

    ent(γ_i) = max{at(γ_i), at(∂γ_i)} = max{at(τ_i), at(col_{M̂}[i])}.

• Otherwise, the path γ_i corresponding to this column is not ∂-invariant (i.e., not in Ω_{p+1}), and we do nothing.

• Finally, consider the reduced matrix M̂_p for the p-th boundary matrix M_p as constructed in (Step 1). Recall the construction of J_k from Corollary 8.7. For any j ∈ J_k such that j does not appear as the low-row index of any column in M̂_{p+1}, we increase the pairing function µ_p^{at(τ),∞} by 1, where τ is the elementary p-path corresponding to this column. (A small sketch of this bookkeeping follows.)
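Here is a small, self-contained sketch of the pairing bookkeeping for the non-zero columns (again not the book's code); row_at and col_at stand for the allowed-times at(σ_j) of the rows and at(τ_i) of the columns, and the names are choices of this sketch.

```python
from collections import Counter

def pairs_from_reduced(red_cols, row_at, col_at):
    # red_cols: reduced columns of M_{p+1} (column-major, field entries);
    # row_at[j] = at(sigma_j) for row j; col_at[i] = at(tau_i) for column i.
    # Returns the finite pairs with multiplicity plus the set of low-row
    # indices, which the caller needs to detect essential classes of dim p.
    dgm, lows, inf = Counter(), set(), float('inf')
    for i, col in enumerate(red_cols):
        l = max((j for j, x in enumerate(col) if x != 0), default=-1)
        if l == -1:
            continue                    # zero column: no pairing action here
        lows.add(l)
        birth = row_at[l]               # at(col_Mhat[i])
        if birth == inf:
            continue                    # gamma_i never becomes d-invariant
        dgm[(birth, max(col_at[i], birth))] += 1   # death = ent(gamma_i)
    return dgm, lows
```

The essential points (at(τ), +∞) in dimension p would then come from the zero columns of the reduced M̂_p whose indices avoid the returned lows of M̂_{p+1}, as in the last bullet above.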

See Figure 8.5 for an example. Let N_p denote the number of allowed elementary p-paths in G; obviously, N_p = O(n^{p+1}). However, as we saw earlier, the number of rows of M_{p+1} is not necessarily bounded by N_p; we can only bound it by the number of elementary p-paths in G, which we denote by N̂_p. If we use standard Gaussian elimination for the column reduction as in Algorithm 3:MatPersistence, then the time complexity to compute the reduced matrix M̂_{p+1} is O(N̂_p^2 N_{p+1}). One can further improve this using fast matrix multiplication.

We note that, due to Theorem 8.6 and Corollary 8.7, the above algorithm is rather similar to the matrix reduction for standard persistent homology induced by simplicial complexes. However, the example in Figure 8.5 shows the difference.

Improved computation for the 1-st persistent path homology. The time complexity can be improved for computing the 0-th and 1-st persistent path homology. In particular, the 0-th persistent path homology coincides with the 0-th persistent homology induced by the persistence of clique complexes, and thus can be computed in O(m α(n) + n log n) time using the union-find data structure, where n = |V| and m = |E⃗|.
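For concreteness, the following is a minimal union-find sketch of this 0-th persistence computation (not the book's code); it assumes every vertex is present from time 0, so each finite point has the form (0, w).

```python
def zeroth_persistence(vertices, weighted_edges):
    # weighted_edges: iterable of (u, v, w); edge directions play no role
    # in dimension 0.  Finite points are (0, w); each surviving component
    # contributes one essential class (0, +inf).
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]     # path halving
            v = parent[v]
        return v

    pairs = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:                          # the edge merges two components
            parent[ru] = rv
            pairs.append((0, w))              # one component dies at time w
    return pairs, len({find(v) for v in vertices})
```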

Figure 8.6: Boundary bigon, triangle and quadrangle. Such boundary cycles generate all 1-
dimensional boundary cycles.

For the 1-dimensional case, it turns out that the boundary group has further structure. In particular, the 1-dimensional boundary group is generated by only the specific forms of bigons, triangles and quadrangles shown in Figure 8.6. The 1-st persistent path homology can thus be computed more efficiently by a different algorithm (from the above matrix reduction algorithm) that enumerates a certain family of boundary cycles of small cardinality which generates the boundary group. In particular, the cardinality of this family depends on the so-called arboricity a(G) of G: ignoring the direction of edges in the graph G (i.e., viewing it as an undirected graph), its arboricity a(G) is the minimum number of edge-disjoint spanning forests into which G can be decomposed [183]. An alternative characterization of the arboricity is:

    a(G) = max_{H a subgraph of G} |E(H)| / (|V(H)| − 1).    (8.7)
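A standard counting argument confirms the planarity claim in Theorem 8.8 below: every subgraph H of a simple planar graph is planar, so |E(H)| ≤ 3|V(H)| − 6 whenever |V(H)| ≥ 3, and plugging this into Eqn. (8.7) gives

    a(G) ≤ max_{k≥3} (3k − 6)/(k − 1) = max_{k≥3} (3 − 3/(k − 1)) < 3,

so the arboricity of a planar graph is indeed O(1).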

Without describing the algorithm developed in [131], we present its computational complexity
for the 1-st persistent path homology in the following theorem.

Theorem 8.8. Given a directed weighted graph G = (V, E⃗, ω) with n = |V|, m = |E⃗|, and N_p = O(n^{p+1}) the number of allowed elementary p-paths, assume that the time to compute the rank of an r × r matrix is r^ω. Let d_in(v) and d_out(v) denote the in-degree and out-degree of a node v ∈ V, and let a(G) be the arboricity of G. Set K = min{a(G) · m, Σ_{(u,v)∈E⃗} (d_in(u) + d_out(u))}. Then we can compute the p-th persistent path homology:

• in O(m α(n) + n log n) time when p = 0; and

• in O(K m^{ω−1} + a(G) m) time when p = 1.

In particular, the arboricity a(G) = O(1) for planar graphs; thus it takes O(n^ω) time to compute the 1-st persistent path homology of a planar directed graph G.

8.4 Notes and Exercises


The clique complex (also called the flag complex) is one of the most common ways to construct
a simplicial complex from a graph. Recent years have seen much work on using the topological
profiles associated with the clique complex for network analysis; e.g., one of the early applications
in [255]. Most of the material covered in Section 8.1.2 comes from [2, 3, 81]; note that [81] provides
a detailed exposition for the intrinsic Čech and Vietoris-Rips filtrations of general metric spaces
(beyond merely metric graphs). Theorem 8.2 comes as a corollary of Proposition 4 in [3], which is
a stronger result than this theorem: In particular, Proposition 4 of [3] characterizes the homotopy
type of the family of graphs described in Theorem 8.2.
The comparison of graphs via persistence distortion was proposed in [134].
Topological analysis for directed graphs and asymmetric networks is more recent. Neverthe-
less, the clique complex for directed graphs has already found applications in practical domains;
e.g., [144, 229, 262]. The path homology was originally proposed in [171], and further studied
and developed in [172, 173, 174]. Its persistent version was proposed and studied in [100]. Note
that as mentioned earlier, the path homology is not a simplicial homology. Nevertheless, we have
shown in this chapter that there is still a matrix reduction algorithm to compute it for any dimen-
sion, with the same time complexity required for computing the homology groups for simplicial
complexes. The path homology also has a rich mathematical structure: There is a concept of
homotopy theory for digraphs under which the path homology is preserved [172], and it is also
dual to the cohomology theory of diagrams introduced in [173]. Note that in this chapter, we
have assumed that the input directed graph does not have self-loops. Additional care is needed to
handle such loops.
The matrix reduction algorithm for computing the persistent path homology that we described
in Section 8.3.3 is based on the work in [100]. The algorithm of [100] assumes that the input graph
is a complete and weighted directed graph; or equivalently, is a finite set V with a weight function
w : V × V → R that may be asymmetric. We modify it so that the algorithm works with an
arbitrary weighted directed graph. Finally, a hypergraph G = (V, E) consists of a finite set of
nodes V, and a collection E ⊆ 2V of subsets of V, each such subset called a hyper-edge. (In other
words, a graph is a hypergraph where every hyper-edge has cardinality 2.) We remark that the
idea behind path homology has also been extended to defining the so-called embedded homology
for hypergraphs [48].

Exercises

[Figure 8.7 appears here: two weighted directed graphs on vertices a, b, c, d, e; edge weights are marked.]

Figure 8.7: (a) graph for Exercise 6. (b) graph for Exercise 7. Edge weights are marked.

1. Consider a metric tree (|T|, d_T) induced by a positively weighted finite tree T = (V, E, w). Suppose the largest edge weight is w_0. Consider the discrete intrinsic Čech complex C^r(V) spanned by vertices in V. That is, let B_T(x; r) := {y ∈ |T| | d_T(x, y) < r} denote the open radius-r ball around a point x. Then, we have

    C^r(V) := {⟨v_0, . . . , v_p⟩ | v_i ∈ V for i ∈ [0, p], and ⋂_{i∈[0,p]} B_T(v_i; r) ≠ ∅}.

Prove that for any r > w_0, C^r(V) is homotopy equivalent to a point.

2. Consider a finite graph G = (V, E) with unit edge lengths and the induced metric d_G on it. For a base point v ∈ V, let f_v : |G| → R be the shortest-path distance function to v; that is, for any x ∈ |G|, f_v(x) = d_G(x, v).

(2.a) Characterize the maxima of the function f_v.
(2.b) Show that the total number of critical values of f_v is O(n + m).
(2.c) Show that this shortest-path distance function can be described by O(n + m) functions whose total descriptive complexity is O(n + m).

3. Consider a finite metric graph G = (V, E, ω) induced by positive edge weights ω : E → R^+. Recall that each basepoint s ∈ |G| is mapped to the persistence diagram P_s as in Eqn. (8.1) (which is a point in the space of persistence diagrams). Show that this map is 1-Lipschitz w.r.t. the bottleneck metric on the space of persistence diagrams; that is, d_b(P_s, P_t) ≤ d_G(s, t) for any two s, t ∈ |G|.

4. Given two finite metric graphs G1 = (|G1 |, dG1 ) and G2 = (|G2 |, dG2 ), pick an arbitrary point
v ∈ |G1 | and consider its associated shortest-path distance function fv : |G1 | → R to this
point; that is, fv (x) = dG1 (x, v) for any x ∈ |G1 |. For any point w ∈ |G2 |, let gw : |G2 | → R
denote the shortest-path distance function to w in |G2 | via dG2 . Let Dgm0 fv (resp. Dgm0 gw )
denote the 0-th persistence diagram induced by the superlevel set filtration of fv (resp.
of gw ). Argue that there exists some point w∗ ∈ |G2 | such that db (Dgm0 fv , Dgm0 gw∗ ) ≤
C · dGH (G1 , G2 ) for some constant C > 0, where dGH is the Gromov-Hausdorff distance.

5. Show that given a directed clique, any subset of its vertices spans a directed subgraph with a unique source and a unique sink.

6. Consider the graph in Figure 8.7 (a). Compute the 0-th and 1-st persistence diagrams for
the filtrations induced by (i) the directed clique complexes; (ii) the Dowker-sink complexes;
and (iii) the Dowker-source complexes.

7. Consider the graph in Figure 8.7 (b). Compute the 1-st persistence diagram for the filtra-
tions (i) induced by directed clique complexes; and (ii) induced by path homology.

8. Consider a pair of directed graphs G = (V, E) and G′ = (V, E′) spanned by the same set of vertices V, with E′ = E ∪ {(u, v)}; that is, G′ equals G with one additional directed edge e = (u, v). Consider path homology, in particular the 1-st cycle and boundary groups for G and for G′.

(i) Show that rank(Z_1(G′)) ≤ rank(Z_1(G)) + 1.
(ii) Give an example of G and G′ where rank(B_1(G′)) − rank(B_1(G)) ∈ Θ(n), where n = |V|.
Chapter 9

Cover, Nerve, and Mapper

Data can be complex both in terms of the domain where they come from and in terms of the properties/observations associated with them, which are often modeled as functions/maps. For example, we can have a set of patients, where each patient is associated with multiple biological markers, giving rise to a multivariate function from the space of patients to an image domain that may or may not be a Euclidean space. To this end, we need to analyze not only real-valued scalar fields, as we have done so far in the book, but also more complex maps defined on a given domain, such as multivariate, circle-valued, or sphere-valued maps.

[Figure 9.1 appears here; the five cover intervals are labeled U1, . . . , U5.]

Figure 9.1: The function values on a hand model are binned into intervals as indicated by different
colors. The mapper [277] corresponding to these intervals (cover) is shown with the graph below;
image courtesy of Facundo Mémoli and Gurjeet Singh.

One way to analyze complex maps is to use the Mapper methodology introduced by Singh
et al. in [277]. In particular, given a map f : X → Z, the mapper M( f, U) creates a topological
metaphor for the structure behind f by pulling back a cover U of the space Z to a cover on X
through f . This mapper methodology can work with any (reasonably tame) continuous maps
between two topological spaces. It converts complex maps and covers of the target space into
simplicial complexes, which are much easier to process computationally. One can view the map


f and a finite cover of the space Z as the lens through which the input data X is examined. It is in some sense related to the Reeb graph, which also summarizes f but without any particular attention to a cover of the codomain. Figure 9.1 shows a mapper construction where the reader can see its similarity to the Reeb graph. The choice of different maps and covers allows the user to capture different aspects of the input data. The mapper methodology has been successfully applied to analyzing various types of data; we have shown an example in Figure 2(e) in the Prelude, and for others see e.g. [227, 244].
To understand the mapper and its multiscale version, the multiscale mapper, we first study some properties of nerves, as they are at the core of these constructions. We already know the Nerve Theorem (Theorem 2.1), which states that if every intersection of cover elements in a cover U is contractible, then the nerve N(U) is homotopy equivalent to the space X = ⋃U. However, we cannot hope for such a good cover all the time and need to investigate what happens if the cover is not good. Sections 9.1 and 9.2 are devoted to this study. Specifically, we show that if every cover element satisfies the weaker property of being path connected, then the nerve may not preserve homotopy, but it satisfies a surjectivity property in one-dimensional homology.
One limitation of the mapper is that it is defined with respect to a fixed cover of the target space. Naturally, the behavior of the mapper under a change of cover is of interest because it has the potential to reveal properties of the map at different scales. Keeping this in mind, we study a multiscale version of the mapper, which we refer to as the multiscale mapper. It is capable of producing a multiscale summary in the form of a persistence diagram using covers of the codomain at different scales. In Section 9.4, we discuss the stability of the multiscale mapper under changes in the input map and/or in the tower U of covers. An efficient algorithm for computing the mapper and multiscale mapper for a real-valued PL-function is presented in Section 9.5. In Section 9.6, we consider the more general case of a map f : X → Z where X is a simplicial complex but Z is not necessarily Euclidean. We show that we can use an even simpler combinatorial version of the multiscale mapper, which acts only on vertex sets of X with connectivity given by the 1-skeleton graph of X. The cost we pay is that the resulting persistence diagram approximates (instead of computing exactly) the persistence diagram of the standard multiscale mapper if the tower of covers of Z is “good” in a certain sense.

9.1 Covers and nerves


In this section we present several facts about covers of a topological space and their nerves.
Specifically, we focus on maps between covers and the maps they induce between nerves and
their homology groups.
Let X denote a path connected topological space. Recall that by this we mean that for every pair of points (x, x′) ∈ X × X there exists a continuous function called a path, γ : [0, 1] → X, with γ(0) = x and γ(1) = x′. Also recall that for a topological space X, a collection U = {U_α}_{α∈A} of open sets such that ⋃_{α∈A} U_α = X is called an open cover of X (Definition 1.6). Although it is not required in general, we will always assume that each cover element U_α is path connected.

Maps between covers. If we have two covers U = {Uα }α∈A and V = {Vβ }β∈B of a space X,
a map of covers from U to V is a set map ξ : A → B so that Uα ⊆ Vξ(α) for every α ∈ A. We

abuse the notation ξ to also indicate the map U → V. The following proposition connects a map
between covers to a simplicial map between their nerves.

[Figure 9.2 appears here: two cover maps ξ, ζ : U → V and the induced vertex maps between the nerves N(U) and N(V).]

Figure 9.2: Cover maps ξ and ζ indicated by solid arrows induce simplicial maps N(ξ) and N(ζ)
whose corresponding vertex maps are indicated by dashed arrows.

Proposition 9.1. Given a map of covers ξ : U → V, there is an induced simplicial map N(ξ) :
N(U) → N(V) given on vertices by the map ξ.

Proof. Write U = {U_α}_{α∈A} and V = {V_β}_{β∈B}. Then, for all α ∈ A we have U_α ⊆ V_ξ(α). Now take any σ ∈ N(U). We need to prove that ξ(σ) ∈ N(V). For this, observe that

    ⋂_{β∈ξ(σ)} V_β = ⋂_{α∈σ} V_ξ(α) ⊇ ⋂_{α∈σ} U_α ≠ ∅,

where the last step follows because σ ∈ N(U). □
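To make the nerve and the induced simplicial map concrete, here is a minimal sketch (not from the book) that computes N(U) for a cover given as finite point sets, a stand-in for open sets, and applies the vertex map of Proposition 9.1; the function names, the sortable-index assumption, and the dimension cutoff are choices of this sketch.

```python
from itertools import combinations

def nerve(cover, max_dim=2):
    # cover: dict mapping a (sortable) index to a finite set of sample
    # points.  A tuple of indices spans a simplex iff the corresponding
    # cover elements have a common point.
    simplices = []
    for k in range(1, max_dim + 2):
        for idxs in combinations(sorted(cover), k):
            if set.intersection(*(cover[i] for i in idxs)):
                simplices.append(idxs)
    return simplices

def induced_simplicial_map(simplices, xi):
    # xi: dict realizing a map of covers (U_a contained in V_{xi[a]}).
    # Images of nerve simplices under the vertex map; vertices may collapse.
    return [tuple(sorted({xi[a] for a in s})) for s in simplices]
```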


An example is given in Figure 9.2, where both maps N(ξ) and N(ζ) are simplicial. Furthermore,
ξ ζ
if U → V → W are three different covers of a topological space with the intervening maps of
covers between them, then N(ζ ◦ ξ) = N(ζ) ◦ N(ξ) as well.
The following fact will be very useful later for defining multiscale mappers.

Proposition 9.2 (Induced maps are contiguous). Let ζ, ξ : U → V be any two maps of covers. Then, the simplicial maps N(ζ) and N(ξ) are contiguous.

Proof. Write U = {U_α}_{α∈A} and V = {V_β}_{β∈B}. Then, for all α ∈ A we have both

    U_α ⊆ V_ζ(α) and U_α ⊆ V_ξ(α), and hence U_α ⊆ V_ζ(α) ∩ V_ξ(α).

Now take any σ ∈ N(U). We need to prove that ζ(σ) ∪ ξ(σ) ∈ N(V). For this, write

    ⋂_{β∈ζ(σ)∪ξ(σ)} V_β = (⋂_{α∈σ} V_ζ(α)) ∩ (⋂_{α∈σ} V_ξ(α)) = ⋂_{α∈σ} (V_ζ(α) ∩ V_ξ(α)) ⊇ ⋂_{α∈σ} U_α ≠ ∅,

where the last step follows from assuming that σ ∈ N(U). It implies that the vertices in ζ(σ)∪ξ(σ)
span a simplex in N(V). 

In Figure 9.2, the two maps N(ξ) and N(ζ) can be verified to be contiguous (Definition 2.7).
Furthermore, contiguous maps induce identical maps at the homology level (Fact 2.11). Proposi-
tion 9.2 implies that the map H∗ (N(U)) → H∗ (N(V)) thus induced can be deemed canonical.

Maps at homology level. Now we focus on establishing various maps at the homology levels
for covers and their nerves. We first establish a map φU between X and the geometric realization
|N(U)| of a nerve complex N(U). This helps us to define a map φU ∗ from the singular homology
groups of X to the simplicial homology groups of N(U) (through the singular homology of |N(U)|).
The nerve theorem (Theorem 2.1) says that if the elements of U intersect only in contractible
spaces, then φU is a homotopy equivalence and hence φU ∗ is an isomorphism between H∗ (X) and
H∗ (N(U)). The contractibility condition can be weakened to a homology ball condition to retain
the isomorphism between the two homology groups [219]. In absence of such conditions of the
cover, simple examples exist to show that φU ∗ could be neither a monophorphism (injection) nor
an epimorphism (surjection). Figure 9.3 gives an example where φU ∗ is not surjective in H2 .
However, for one dimensional homology groups, the map φU ∗ is necessarily a surjection when
each element in the cover U is path connected. We call such a cover U path connected. The
simplicial maps arising out of cover maps between path connected covers induce a surjection
between the 1-st homology groups of two nerve complexes.
[Figure 9.3 appears here: a map f : S² ⊂ R³ → R² and the nerve N(f⁻¹U), with the pullback f⁻¹U_α forming a band around the equator.]

Figure 9.3: The map f : S2 ⊂ R3 → R2 takes the sphere to R2 . The pullback of the cover element
Uα makes a band surrounding the equator which causes the nerve N( f −1 U) to pinch in the middle
creating two 2-cycles. This shows that the map φU : X → N(U) may not induce a surjection in
H2 .

Blow up space. The proof of the nerve theorem given by Hatcher in [186] uses a construction that connects the two spaces X and |N(U)| via a blow-up space X_U built from products of cover elements of U with simplices of the geometric realization |N(U)|. In our case, U may not satisfy the contractibility condition assumed in that proof. Nevertheless, we use a similar construction to define three maps, ζ : X → X_U, π : X_U → |N(U)|, and φ_U : X → |N(U)|, where φ_U = π ◦ ζ is referred to as the nerve map; see Figure 9.4 (left). Details about the construction of these maps follow.
Denote the elements of the cover U as Uα for α taken from some indexing set A. The vertices
of N(U) are denoted by {uα , α ∈ A}, where each uα corresponds to the cover element Uα . For each

[Figure 9.4 appears here: (left) the maps ζ : X → X_U, π : X_U → |N(U)| and φ_U = π ◦ ζ; (right) the blow-up space X_U for a segment X covered by U_0 and U_1, with pieces U_0 × {0}, U_1 × {1} and U_{0,1} × [0, 1].]

Figure 9.4: (left) Various maps used for blow up space; (right) example of a blow up space.

finite non-empty intersection U_{α_0,...,α_n} := ⋂_{i=0}^n U_{α_i}, consider the product U_{α_0,...,α_n} × ∆^n_{α_0,...,α_n}, where ∆^n_{α_0,...,α_n} denotes the n-dimensional simplex with vertices u_{α_0}, . . . , u_{α_n}. Consider now the disjoint union

    M := ⊔_{α_0,...,α_n ∈ A : U_{α_0,...,α_n} ≠ ∅} U_{α_0,...,α_n} × ∆^n_{α_0,...,α_n}

together with the following identification: each point (x, y) ∈ M, with x ∈ U_{α_0,...,α_n} and y ∈ [α_0, . . . , α̂_i, . . . , α_n] ⊂ ∆^n_{α_0,...,α_n}, is identified with the corresponding point in the product U_{α_0,...,α̂_i,...,α_n} × ∆_{α_0,...,α̂_i,...,α_n} via the inclusion U_{α_0,...,α_n} ⊂ U_{α_0,...,α̂_i,...,α_n}. Here [α_0, . . . , α̂_i, . . . , α_n] denotes the i-th face of the simplex ∆^n_{α_0,...,α_n}. Denote by ∼ this identification and define the space X_U := M / ∼. An example for the case when X is a line segment and U consists of only two open sets is shown in Figure 9.4 (right).
In what follows we assume that the space X is compact. The main motivation behind restricting X to such spaces is that they admit partitions of unity, which we use to establish further results.

Definition 9.1 (Locally finite). An open cover {Uα , α ∈ A} of X is called a refinement of another
open cover {Vβ , β ∈ B} of X if every element Uα ∈ U is contained in an element Vβ ∈ V.
Furthermore, U is called locally finite if every point x ∈ X has a neighborhood contained in
finitely many elements of U.

Definition 9.2 (Partition of unity). A collection of real-valued continuous functions {ϕ_α : X → [0, 1], α ∈ A} is called a partition of unity if (i) Σ_{α∈A} ϕ_α(x) = 1 for all x ∈ X, and (ii) for every x ∈ X, there are only finitely many α ∈ A such that ϕ_α(x) > 0.

If U = {U_α, α ∈ A} is any open cover of X, then a partition of unity {ϕ_α, α ∈ A} is subordinate to U if the support supp(ϕ_α) of ϕ_α (the subset of the domain whose image is non-zero) is contained in U_α for each α ∈ A.

Fact 9.1 ([258]). For any open cover U = {Uα , α ∈ A} of a compact space X, there exists a
partition of unity {ϕα , α ∈ A} subordinate to U.
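As a concrete illustration, the following sketch (not from the book) builds a partition of unity subordinate to an interval cover of a segment of R using normalized tent functions; the names and the specific cover are choices of this sketch, and we gloss over the usual care about supports versus open intervals at the endpoints.

```python
def tent(x, lo, hi):
    # Piecewise-linear bump: positive on (lo, hi), zero elsewhere.
    if not (lo < x < hi):
        return 0.0
    mid = (lo + hi) / 2.0
    return (x - lo) / (mid - lo) if x <= mid else (hi - x) / (hi - mid)

def partition_of_unity(intervals):
    # Normalize the tents so that they sum to 1 wherever the intervals
    # cover; assumes every point of interest lies in some interval.
    def phi(i):
        return lambda x: tent(x, *intervals[i]) / sum(tent(x, *I)
                                                      for I in intervals)
    return [phi(i) for i in range(len(intervals))]

# A cover of [0, 1] by three overlapping intervals:
phis = partition_of_unity([(-0.1, 0.45), (0.3, 0.75), (0.6, 1.1)])
assert abs(sum(phi(0.5) for phi in phis) - 1.0) < 1e-12
```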

We assume that X is compact and hence for an open cover U = {Uα }α of X, we can choose
any partition of unity {ϕα , α ∈ A} subordinate to U according to Fact 9.1. For each x ∈ X such that

x ∈ U_α, denote by x_α the corresponding copy of x residing in X_U. For our choice of {ϕ_α, α ∈ A}, define the map ζ : X → X_U as:

    ζ(x) := Σ_{α∈A} ϕ_α(x) x_α for any x ∈ X.

The map π : X_U → |N(U)| is induced by the individual projection maps

    U_{α_0,...,α_n} × ∆^n_{α_0,...,α_n} → ∆^n_{α_0,...,α_n}.

Then, it follows that φ_U = π ◦ ζ : X → |N(U)| satisfies, for x ∈ X,

    φ_U(x) = Σ_{α∈A} ϕ_α(x) u_α.    (9.1)

We have the following fact [258, pp. 108]:

Fact 9.2. ζ is a homotopy equivalence.

9.1.1 Special case of H1


Now, we show that the nerve maps at the homology level are surjective for one-dimensional homology groups; namely, all homology classes in N(U) arise from those in X = ⋃U. Furthermore, if we assume that X is equipped with a pseudometric, we can define a size for cycles with this pseudometric and show that all homology classes with representative cycles of large enough size survive in the nerve N(U). Note that the result is not true beyond one-dimensional homology (recall Figure 9.3).
To prove this result for H1 , first, we make a simple observation that connects the classes in
singular homology of |N(U)| to those in the simplicial homology of N(U). The result follows
immediately from the isomorphism between singular and simplicial homology induced by the
geometric realization; see [241]. Recall that [c] denotes the class of a cycle c. If c is simplicial,
|c| denotes its underlying space.

Proposition 9.3. Every 1-cycle γ in |N(U)| has a 1-cycle γ′ in N(U) so that [γ] = [|γ′|].

Proposition 9.4. If U is path connected, then φ_{U∗} : H_1(X) → H_1(|N(U)|) is a surjection, where φ_{U∗} is the homomorphism induced by the nerve map defined in Eqn. (9.1).

Proof. Let [γ] be any class in H_1(|N(U)|). Because of Proposition 9.3, we can assume that γ = |γ′|, where γ′ is a 1-cycle in the 1-skeleton of N(U). We will construct a 1-cycle γ_U in X_U so that π(γ_U) = γ. Assume first that such a γ_U can be constructed. Then, consider the map ζ : X → X_U in the construction of the nerve map φ_U, where φ_U = π ◦ ζ. There exists a class [γ_X] in H_1(X) so that ζ_∗([γ_X]) = [γ_U] because ζ_∗ is an isomorphism by Fact 9.2. Then, φ_{U∗}([γ_X]) = π_∗(ζ_∗([γ_X])) because φ_{U∗} = π_∗ ◦ ζ_∗. It follows that φ_{U∗}([γ_X]) = π_∗([γ_U]) = [γ], showing that φ_{U∗} is surjective.

Therefore, it remains only to show that a 1-cycle γ_U can be constructed from γ′ in N(U) so that π(γ_U) = γ = |γ′|. Let e_0, e_1, . . . , e_{r−1}, e_r = e_0 be an ordered sequence of edges on γ′. Recall the construction of the space X_U. In that terminology, let e_i = ∆_{α_i α_{(i+1) mod r}}. Let v_i = e_{(i−1) mod r} ∩ e_i for i ∈ [0, r − 1]. The vertex v_i = v_{α_i} corresponds to the cover element U_{α_i}, where

U_{α_i} ∩ U_{α_{(i+1) mod r}} ≠ ∅ for every i ∈ [0, r − 1]. Choose a point x_i in the common intersection U_{α_i} ∩ U_{α_{(i+1) mod r}} for every i ∈ [0, r − 1]. Then, the edge path ẽ_i = e_i × x_i is in X_U by construction. Also, letting x_{α_i} be the lift of x_i in the lifted U_{α_i}, we can choose a vertex path x_{α_i} ⇝ x_{α_{(i+1) mod r}} residing in the lifted U_{α_i}, and hence in X_U, because U_{α_i} is path connected. Consider the following cycle obtained by concatenating the edge and vertex paths:

    γ_U = ẽ_0 · (x_{α_0} ⇝ x_{α_1}) · ẽ_1 · · · ẽ_{r−1} · (x_{α_{r−1}} ⇝ x_{α_0}).

By projection, we have π(ẽ_i) = e_i for every i ∈ [0, r − 1] and π(x_{α_i} ⇝ x_{α_{(i+1) mod r}}) = v_{α_i}; thus π(γ_U) = γ as required. □

Since we are eventually interested in the simplicial homology groups of the nerves rather than the singular homology groups of their geometric realizations, we make one more transition using the known isomorphism between the two homology groups (Theorem 2.10). Specifically, if ι_U : H_p(|N(U)|) → H_p(N(U)) denotes this isomorphism, we let

    φ̄_{U∗} : H_1(X) → H_1(N(U)) denote the composition ι_U ◦ φ_{U∗}.    (9.2)

As a corollary to Proposition 9.4, we obtain:
Theorem 9.5. If U is path connected, φ̄U∗ : H1 (X) → H1 (N(U)) is a surjection.

[Figure 9.5 appears here: a sequence of covers U_1 → U_2 → U_3 of X with cover maps ξ_1, ξ_2, and the induced simplicial maps N(ξ_1), N(ξ_2) between the nerves N(U_1), N(U_2), N(U_3).]

Figure 9.5: Sequence of cover maps induce a simplicial tower and hence a persistence module:
classes in H1 can only die.

From nerves to nerves. We now extend the result in Theorem 9.5 to simplicial maps between two nerves induced by cover maps. Figure 9.5 illustrates this fact. The following proposition is key to establishing the result.

Proposition 9.6 (Coherent partitions of unity). Suppose {U_α}_{α∈A} = U →^θ V = {V_β}_{β∈B} are open covers of a compact topological space X and θ : A → B is a map of covers. Then there exists a partition of unity {ϕ_α}_{α∈A} subordinate to the cover U such that if for each β ∈ B we define

    ψ_β := Σ_{α∈θ⁻¹(β)} ϕ_α if β ∈ im(θ), and ψ_β := 0 otherwise,

then the set of functions {ψβ }β∈B is a partition of unity subordinate to the cover V.

Proof. The proof closely follows that of [258, Corollary pp. 97]. Since X is compact, there exists a partition of unity {ϕ_α}_{α∈A} subordinate to U. The fact that the sum in the expression of ψ_β is well defined and continuous follows from the fact that the family {supp(ϕ_α)}_α is locally finite. Let C_β := ⋃_{α∈θ⁻¹(β)} supp(ϕ_α). The set C_β is closed, C_β ⊂ V_β, and ψ_β(x) = 0 for x ∉ C_β, so that supp(ψ_β) ⊂ C_β ⊂ V_β. Now, to check that the family {C_β}_{β∈B} is locally finite, pick any point x ∈ X. Since {supp(ϕ_α)}_α is locally finite, there is an open set O containing x such that O intersects only finitely many elements in U. Denote these cover elements by U_{α_1}, . . . , U_{α_ℓ}. Now, notice that if β ∈ B and β ∉ {θ(α_i), i = 1, . . . , ℓ}, then O does not intersect C_β. Hence the family {supp(ψ_β)}_{β∈B} is locally finite. It then follows that for x ∈ X one has

    Σ_{β∈B} ψ_β(x) = Σ_{β∈B} Σ_{α∈θ⁻¹(β)} ϕ_α(x) = Σ_{α∈A} ϕ_α(x) = 1.

We have obtained that {ψ_β}_{β∈B} is a partition of unity subordinate to V, as needed by the proposition. □
Let {U_α}_{α∈A} = U →^θ V = {V_β}_{β∈B} be two open covers of X connected by a map of covers θ : A → B. Apply Proposition 9.6 to obtain coherent partitions of unity {ϕ_α}_{α∈A} and {ψ_β}_{β∈B} subordinate to U and V, respectively. Let the nerve maps φ_U : X → |N(U)| and φ_V : X → |N(V)| be defined as in Eqn. (9.1) using these coherent partitions of unity. Let τ : N(U) → N(V) be the simplicial map induced by the cover map θ. The map τ can be extended to a (linear) continuous map τ̂ : |N(U)| → |N(V)| by assigning y ∈ |N(U)| to τ̂(y) ∈ |N(V)| where

    y = Σ_α t_α u_α  ⟹  τ̂(y) = Σ_α t_α τ(u_α), with Σ_α t_α = 1.

Claim 9.1. The map τ̂ satisfies the property that, for x ∈ X, τ̂(φ_U(x)) = φ_V(x).

Proof. For any point x ∈ X, one has φ_U(x) = Σ_{α∈A} ϕ_α(x) u_α, where u_α is the vertex corresponding to U_α ∈ U in |N(U)|. Then,

    τ̂ ◦ φ_U(x) = τ̂(Σ_{α∈A} ϕ_α(x) u_α) = Σ_{α∈A} ϕ_α(x) τ(u_α) = Σ_{α∈A} ϕ_α(x) v_{θ(α)}
               = Σ_{β∈B} Σ_{α∈θ⁻¹(β)} ϕ_α(x) v_β = Σ_{β∈B} ψ_β(x) v_β = φ_V(x).  □

An immediate corollary of the above claim is:

Corollary 9.7. The induced maps of φU∗ : H p (X) → H p (|N(U)|), φV∗ : H p (X) → H p (|N(V)|), and
τ̂∗ : H p (|N(U)|) → H p (|N(V)|) commute, that is, φV∗ = τ̂∗ ◦ φU∗ .

Using the fact that the isomorphism between singular and simplicial homology commutes with simplicial maps and their linear continuous extensions, Corollary 9.7 implies that:

[Figure 9.6 appears here: a commutative diagram with H_p(X) at the top, maps φ_{U∗} and φ_{V∗} down to H_p(|N(U)|) and H_p(|N(V)|) connected by τ̂_∗, and isomorphisms ι_U, ι_V down to H_p(N(U)) and H_p(N(V)) connected by τ_∗.]

Figure 9.6: Maps relevant for Proposition 9.8; φ̄_{V∗} = ι_V ◦ φ_{V∗} and φ̄_{U∗} = ι_U ◦ φ_{U∗}. The triangular ‘roof’ and the square ‘room’ commute, and so does the entire ‘house’.

Proposition 9.8. φ̄_{V∗} = τ_∗ ◦ φ̄_{U∗}, where φ̄_{V∗} : H_p(X) → H_p(N(V)), φ̄_{U∗} : H_p(X) → H_p(N(U)), and τ : N(U) → N(V) is the simplicial map induced by a cover map U → V.

Proof. Consider the diagram in Figure 9.6. The upper triangle commutes by Corollary 9.7. The bottom square commutes by the property of simplicial maps; see Theorem 34.4 in [241]. The claim in the proposition follows by combining these two commuting subdiagrams. □

Proposition 9.8 extends Theorem 9.5 to the simplicial maps between two nerves.

Theorem 9.9. Let τ : N(U) → N(V) be a simplicial map induced by a cover map U → V where
both U and V are path connected. Then, τ∗ : H1 (N(U)) → H1 (N(V)) is a surjection.

Proof. Consider the maps

    H_1(X) →^{φ̄_{U∗}} H_1(N(U)) →^{τ_∗} H_1(N(V)), and H_1(X) →^{φ̄_{V∗}} H_1(N(V)).

By Proposition 9.8, τ_∗ ◦ φ̄_{U∗} = φ̄_{V∗}. By Theorem 9.5, the map φ̄_{V∗} is a surjection. It follows that τ_∗ is a surjection. □

9.2 Analysis of persistent H1 -classes


Using the language of persistent homology, the results in the previous section imply that one
dimensional homology classes can die in the nerves, but they cannot be born. In this section,
we further characterize the classes that survive. The distinction among the classes is made via
a notion of ‘size’. Intuitively, we show that the classes with ‘size’ much larger than the ‘size’
of the cover survive. The ‘size’ is defined using a pseudometric that the space X is assumed to
be equipped with. Precise statements are made in the subsections below. Let (X, d) be a pseudometric space, meaning that d satisfies the axioms of a metric (Definition 1.8) except the first axiom; that is, d(x, x′) = 0 need not imply x = x′. Assume X is compact. We define a ‘size’ for a homology class that reflects how big the smallest cycle in the class is w.r.t. the metric d.

Definition 9.3. The size s(X′) of a subset X′ of the pseudometric space (X, d) is defined to be its diameter; that is, s(X′) = sup_{(x,x′)∈X′×X′} d(x, x′). The size of a class c ∈ H_p(X) is defined as s(c) = inf_{z∈c} s(z). According to Definition 5.3, a set of p-cycles z_1, z_2, . . . , z_n of H_p(X) is called a cycle basis if the classes [z_1], [z_2], . . . , [z_n] together form a basis of H_p(X). It is called an optimal cycle basis if Σ_{i=1}^n s(z_i) is minimal among all cycle bases.

Lebesgue number of a cover. Our goal is to characterize the classes in the nerve of U with respect to the sizes of their preimages in X via the map φ_U. The Lebesgue number of a cover U becomes useful in this characterization. It is the largest real number λ(U) so that any subset of X with size at most λ(U) is contained in at least one element of U. Formally, the Lebesgue number λ(U) of U is defined as:

    λ(U) = sup{δ | ∀X′ ⊆ X with s(X′) ≤ δ, ∃U_α ∈ U where X′ ⊆ U_α}.
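As a toy illustration, take X = [0, 1] with the two-element cover U_1 = [0, 0.6) and U_2 = (0.4, 1]. Every subset of diameter δ < 0.2 lies in some [a, a + δ]: if a > 0.4 it is contained in U_2, and otherwise a + δ < 0.6 puts it inside U_1. On the other hand, the set {0.4, 0.6} has diameter 0.2 and lies in neither element. Hence λ(U) = 0.2, the length of the overlap U_1 ∩ U_2.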

As we will see below, a homology class of size no more than λ(U) cannot survive in the nerve (Proposition 9.12). Further, the homology classes whose sizes are significantly larger than the maximum size of a cover element do necessarily survive, where we define the maximum size of a cover as

    s_max(U) := max_{U∈U} s(U).

Theorem 9.10 summarizes these observations.


Let z1 , z2 , . . . , zg be a non-decreasing sequence of the cycles with respect to their sizes in an
optimal cycle basis of H1 (X). Consider the map φU : X → |N(U)| as introduced in Eqn. (9.1), and
the map φ̄U∗ as defined by Eqn. (9.2). We have the following result.

Theorem 9.10. Let U be a path connected cover of X and let z_1, z_2, . . . , z_g be a sequence from an optimal cycle basis of H_1(X) as stated above.

i. Let ℓ = g + 1 if λ(U) > s(z_g). Otherwise, let ℓ ∈ [1, g] be the smallest integer so that s(z_ℓ) > λ(U). If ℓ ≠ 1, then we have that the class φ̄_{U∗}[z_j] = 0 for j = 1, . . . , ℓ − 1. Moreover, if ℓ ≠ g + 1, then the classes {φ̄_{U∗}[z_j]}_{j=ℓ,...,g} generate H_1(N(U)).

ii. The classes {φ̄_{U∗}[z_j]}_{j=ℓ′,...,g} are linearly independent, where s(z_{ℓ′}) > 4 s_max(U).

The result above says that only the classes of H_1(X) generated by cycles of large enough size survive in the nerve. To prove this result, we use a map ρ that sends each 1-cycle in N(U) to a 1-cycle in X. We define a chain map ρ : C_1(N(U)) → C_1(X) between one-dimensional chain groups as follows. It is sufficient to exhibit the map on an elementary chain of an edge, say e = {u_α, u_{α′}} ∈ C_1(N(U)). Since e is an edge in N(U), the two cover elements U_α and U_{α′} in X have a common intersection. Let a ∈ U_α and b ∈ U_{α′} be two points that are arbitrary but fixed for U_α and U_{α′}, respectively. Pick a path ξ(a, b) (viewed as a singular chain) in the union of U_α and U_{α′}, which is path connected as both U_α and U_{α′} are. Then, define ρ(e) = ξ(a, b). A cycle γ, when pushed back by ρ and then pushed forward by φ_U, remains in the same class. The following proposition states this fact; its proof appears in [133].

Proposition 9.11. Let γ be any 1-cycle in N(U). Then, [φU (ρ(γ))] = [|γ|].

The following proposition provides a sufficient characterization of the cycles whose classes become trivial after the push forward.

Proposition 9.12. Let z be a 1-cycle in C_1(X). Then, [φ_U(z)] = 0 if λ(U) > s(z).

Proof. It follows from the definition of the Lebesgue number that there exists a cover element U_α ∈ U such that z ⊆ U_α because s(z) < λ(U). We claim that there is a homotopy equivalence that sends φ_U(z) to a vertex in N(U), and hence [φ_U(z)] is trivial.

Let x be any point in z. Recall that φ_U(x) = Σ_i ϕ_{α_i}(x) u_{α_i}. Since U_α has a common intersection with each U_{α_i} for which ϕ_{α_i}(x) ≠ 0, we conclude that φ_U(x) is contained in a simplex with the vertex u_α. Continuing this argument with all points of z, we observe that φ_U(z) is contained in simplices that share the vertex u_α. It follows that there is a homotopy that sends φ_U(z) to u_α, a vertex of N(U). □

Proof (of Theorem 9.10).

Proof of (i): By Proposition 9.12, we have φ_{U∗}[z] = [φ_U(z)] = 0 if λ(U) > s(z). This establishes the first part of the assertion because φ̄_{U∗} = ι ◦ φ_{U∗}, where ι is an isomorphism between the singular homology of |N(U)| and the simplicial homology of N(U). To see the second part, notice that φ̄_{U∗} is a surjection by Theorem 9.5. Therefore, the classes φ̄_{U∗}[z] where s(z) ≥ λ(U) contain a basis for H_1(N(U)); hence they generate it.

Proof of (ii): For a contradiction, assume that there is a subsequence {ℓ_1, . . . , ℓ_t} ⊆ {ℓ′, . . . , g} so that Σ_{j=1}^t [φ_U(z_{ℓ_j})] = 0. Let z = Σ_{j=1}^t φ_U(z_{ℓ_j}). Let γ be a 1-cycle in N(U) so that [z] = [|γ|], whose existence is guaranteed by Proposition 9.3. As Σ_{j=1}^t [φ_U(z_{ℓ_j})] = 0, there must be a 2-chain D in N(U) so that ∂D = γ. Consider a triangle t = {u_{α_1}, u_{α_2}, u_{α_3}} contributing to D. Let a′_i ∈ φ_U^{−1}(u_{α_i}). Since t appears in N(U), the cover elements U_{α_1}, U_{α_2}, U_{α_3} containing a′_1, a′_2, and a′_3, respectively, have a common intersection in X. This also means that each of the paths a′_1 ⇝ a′_2, a′_2 ⇝ a′_3, a′_3 ⇝ a′_1 has size at most 2 s_max(U). Then, ρ(∂t) is mapped to a 1-cycle in X of size at most 4 s_max(U). It follows that ρ(∂D) can be written as a linear combination of cycles of size at most 4 s_max(U). Since z_1, . . . , z_g form an optimal cycle basis of H_1(X), each of the 1-cycles of size at most 4 s_max(U) is generated by basis elements z_1, . . . , z_k where s(z_k) ≤ 4 s_max(U). Therefore, the class of z′ = φ_U(ρ(γ)) is generated by a linear combination of the basis elements whose preimages have size at most 4 s_max(U). The class [z′] is the same as the class [|γ|] by Proposition 9.11. But, by assumption, [|γ|] = [z] is generated by a linear combination of basis elements whose sizes are larger than 4 s_max(U), reaching a contradiction. Hence the assumption cannot hold and (ii) is true. □

9.3 Mapper and multiscale mapper


In this section we extend the previous results to the structures called mapper and multiscale map-
per. Recall that X is assumed to be compact. Consider a cover of X obtained indirectly as a
pullback of a cover of another space Z. This gives rise to the so-called Mapper. More precisely,
let f : X → Z be a continuous map where Z is equipped with an open cover U = {Uα }α∈A for
some index set A. Since f is continuous, the sets { f −1 (Uα ), α ∈ A} form an open cover of X. For

each α, we can now consider the decomposition of f⁻¹(U_α) into its path connected components, and we write f⁻¹(U_α) = ⋃_{i=1}^{j_α} V_{α,i}, where j_α is the number of path connected components V_{α,i} in f⁻¹(U_α). We write f*(U) for the cover of X obtained this way from the cover U of Z and refer to it as the pullback cover of X induced by U via f. By construction, every element in this pullback cover f*(U) is path connected.
Notice that there are pathological examples of f where f −1 (Uα ) may shatter into infinitely
many path components. This motivates us to consider well-behaved functions f : we require
that for every path connected open set U ⊆ Z, the preimage f −1 (U) has finitely many open path
connected components. Consequently, all nerves of pullbacks of finite covers become finite.


Figure 9.7: Mapper construction: (left) a map f : X → Z from a circle to a subset Z ⊂ R, (middle)
the inverse map f −1 induces a cover of circle from a cover U of Z, (right) the nerves of the two
covers of X and Z: the nerve on the left (quadrangle shaped) is the mapper induced by f and U.

Definition 9.4 (Mapper). Let X and Z be topological spaces and let f : X → Z be a well-behaved
and continuous map. Let U = {Uα }α∈A be a finite open cover of Z. The mapper arising from these
data is defined to be the nerve of the pullback cover f ∗ (U) of X; that is, M(U, f ) := N( f ∗ (U)). See
an illustration in Figure 9.7.
Notice that we define the mapper using finite covers, which allows us to extend the definitions of persistence modules and persistence diagrams from previous chapters to the case of mappers. However, in the next remark and later we allow infinite covers for simplicity. The definition of mapper remains valid with infinite covers.
Remark 9.1. The construction of the mapper is quite general if we allow the cover U to be infinite. For example, it can encompass both the Reeb graph and the merge tree: consider a topological space X and f : X → R. Then, consider the following two options for U = {U_α}_{α∈A}, the other ingredient of the construction:
• Uα = (−∞, α) for α ∈ A = R. This corresponds to sublevel sets which in turn lead to merge
trees. See, for example, the construction in Figure 9.8(b).
• Uα = (α − ε, α + ε) for α ∈ A = R, for some fixed ε > 0. This corresponds to (ε-thick)
level sets, which induce a relaxed notion of Reeb graphs. See the description in “Mapper
for PCD” below and Figure 9.8(a).
In these two examples, for simplicity of presentation, the set A is allowed to have infinite cardi-
nality. Also, note one can take any open cover of R in this definition. This may give rise to other
constructions beyond merge trees or Reeb graphs. For instance, using the infinite setting for sim-
plicity again, one may choose any point r ∈ R and let Uα = (r − α, r + α) for each α ∈ A = R or
other constructions.

Mapper for PCD. Consider a finite metric space (P, d_P), that is, a point set P with distances between every pair of points. For a real r ≥ 0, one can construct a graph G_r(P) with every point in P as a vertex, where an edge (p, p′) is in G_r(P) if and only if d_P(p, p′) ≤ r. Let f : P → R be a real-valued function on the point set P. For a set of intervals U covering R, we can construct the mapper as follows. For every interval (a, b) ∈ U, let P_{(a,b)} = f⁻¹((a, b)) be the set of points with function values in the range (a, b). Each such set admits a partition P_{(a,b)} = ⊔_i P^i_{(a,b)} determined by the graph connectivity of G_r(P): each set P^i_{(a,b)} consists of the vertices of a connected component of the subgraph of G_r(P) spanned by the vertices in P_{(a,b)}. The vertex sets ⋃_{(a,b)∈U} {P^i_{(a,b)}} thus obtained over all intervals constitute a cover f⁻¹(U) of P. The nerve of this cover is the mapper M(P, f). Here the intersection between cover elements is determined by the intersection of discrete sets. A small code sketch of this construction is given below.
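The following is a minimal sketch of the construction (not the book's code); the function names are choices of this sketch, points are assumed to live in Euclidean space, components are found by a plain depth-first search, and only the 1-skeleton of the nerve is returned.

```python
from math import dist
from itertools import combinations

def mapper_pcd(points, f, intervals, r):
    # points: dict name -> coordinate tuple; f: dict name -> function value;
    # intervals: list of (a, b) covering the range of f; r: scale of G_r(P).
    # Returns the pullback cover elements and the edges of the nerve.
    clusters = []
    for (a, b) in intervals:
        P_ab = [p for p in points if a < f[p] < b]
        seen = set()
        for p in P_ab:                       # DFS per connected component
            if p in seen:
                continue
            comp, stack = set(), [p]
            while stack:
                q = stack.pop()
                if q in comp:
                    continue
                comp.add(q)
                stack.extend(s for s in P_ab if s not in comp
                             and dist(points[q], points[s]) <= r)
            seen |= comp
            clusters.append(comp)
    edges = [(i, j) for i, j in combinations(range(len(clusters)), 2)
             if clusters[i] & clusters[j]]
    return clusters, edges
```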
Observe that, in the above construction, if one takes the intervals of U = {Ui }i∈Z where
Ui = (i − ε, i + ε) for some ε ∈ (0, 1) causing only two consecutive intervals overlap partially,
then we get a discretized approximation of the Reeb graphs of the function that f approximates
on the discretized sample P. Figure 9.8 illustrates this observation. In the limit that each interval
degenerates to a point, the discretized Reeb converges to the original Reeb graph as shown in [133,
240].


Figure 9.8: Mapper construction for point cloud, a map f : P → Z from a PCD P to a subset
Z ⊂ R; the graph Gr is not shown: (a) covers are intervals; points are colored with the interval
colors, gray points have values in two overlapping intervals, the mapper is a discretized Reeb
graph; (b) the covers are sublevel sets, points are colored with the smallest levelset they belong
to, discretized Reeb graph does not have the central loop any more.

9.3.1 Multiscale Mapper


A mapper M(U, f) is a simplicial complex encoding the structure of f through the lens of Z. However, the simplicial complex M(U, f) provides only one snapshot of X at a fixed scale determined by the scale of the cover U. Using the idea of persistent homology, we study the evolution of the mapper M(U_a, f) for a tower of covers U = {U_a}_{a∈A}. The tower by definition coarsens the cover with increasing indices and hence provides mappers at multiple scales.

As an intuitive example, consider a real-valued function f : X → R and a cover U_ε of R consisting of all possible intervals of length ε. Intuitively, as ε tends to 0, the corresponding mapper M(U_ε, f) approaches the Reeb graph of f. As ε increases, we look at the Reeb graph at coarser and coarser resolutions. The multiscale mapper in this case roughly encodes this simplification process.
The idea of multiscale mapper requires a sequence of covers of the target space connected by
cover maps. Through pullbacks, it generates a sequence of covers on the domain. In particular,
first we have:

Proposition 9.13. Let f : X → Z, and U and V be two covers of Z with a map of covers
ξ : U → V. Then, there is a corresponding map of covers between the respective pullback covers
of X: f ∗ (ξ) : f ∗ (U) −→ f ∗ (V).

Proof. Indeed, we only need to note that if U ⊆ V, then f⁻¹(U) ⊆ f⁻¹(V), and therefore each path connected component of f⁻¹(U) is included in exactly one path connected component of f⁻¹(V). More precisely, let U = {U_α}_{α∈A} and V = {V_β}_{β∈B}, with U_α ⊆ V_{ξ(α)} for α ∈ A. Let Û_{α,i}, i ∈ {1, . . . , n_α}, denote the connected components of f⁻¹(U_α) and V̂_{β,j}, j ∈ {1, . . . , m_β}, denote the connected components of f⁻¹(V_β). Then, the map of covers f*(ξ) from f*(U) to f*(V) is given by requiring that each set Û_{α,i} is sent to the unique set of the form V̂_{ξ(α),j} so that Û_{α,i} ⊆ V̂_{ξ(α),j}. □
Furthermore, observe that if U →^ξ V →^ζ W are three different covers of a topological space with the intervening maps of covers between them, then f*(ζ ◦ ξ) = f*(ζ) ◦ f*(ξ).
The above result for three covers easily extends to multiple covers and their pullbacks. The sequence of pullbacks connected by cover maps and the corresponding sequence of nerves connected by simplicial maps define multiscale mappers. Recall the definition of towers (Definition 4.1) designating a sequence of objects connected with maps. Let U = {U_a →^{u_{a,a′}} U_{a′}}_{r≤a≤a′} denote a tower, where r = res(U) refers to its resolution. The objects here can be covers, simplicial complexes, or vector spaces. The notion of resolution and the variable a intuitively specify the granularity of the covers and the simplicial complexes induced by them.
The pullback property given by Proposition 9.13 makes it possible to take the pullback of a
given tower of covers of a space via a given continuous function into another space as stated in
the proposition below.

Proposition 9.14. Let U be a cover tower of Z and f : X → Z be a continuous function. Then,


f ∗ (U) is a cover tower of X.

In general, given a cover tower W of a space X, the nerve of each cover in W together with
simplicial maps induced by each map of W provides a simplicial tower which we denote by N(W).

Definition 9.5 (Multiscale mapper). Let X and Z be topological spaces and f : X → Z be a continuous map. Let U be a cover tower of Z. Then, the multiscale mapper is defined to be the simplicial tower obtained by taking the nerve of the pullback:

    MM(U, f) := N(f*(U)),

where the simplicial maps are induced by the respective cover maps. See Figure 9.9 for an illustration.

[Figure 9.9 appears here: the cover tower CT(Z) = U, its pullback CT(X) = f*(U) via f⁻¹, and the induced simplicial tower ST(X) = N(f*(U)), with the maps u_{a,a′}, f*(u_{a,a′}) and N(f*(u_{a,a′})) connecting consecutive levels.]

Figure 9.9: Illustrating construction of multiscale mapper from a cover tower; CT and ST denote
cover and simplicial towers respectively, that is, CT(Z) = U, CT(X) = f ∗ (U), and ST(X) =
N( f ∗ (U)).

Consider, for example, a sequence res(U) ≤ a_1 < a_2 < . . . < a_n of n distinct real numbers. Then, the definition of the multiscale mapper MM(U, f) gives rise to the following simplicial tower:

    N(f*(U_{a_1})) → N(f*(U_{a_2})) → · · · → N(f*(U_{a_n})),    (9.3)

which is a sequence of simplicial complexes connected by simplicial maps. Applying to them the homology functor H_p(·), p = 0, 1, 2, . . ., with coefficients in a field, one obtains a persistence module, a tower of vector spaces connected by linear maps:

    H_p(N(f*(U_{a_1}))) → · · · → H_p(N(f*(U_{a_n}))).    (9.4)
Given our assumptions that the covers are finite and that the function f is well-behaved, the homology groups of all nerves have finite dimensions. Thus, we get a persistence module which is p.f.d. (see Section 3.4). Now one can summarize the persistence module induced by MM(U, f) with its persistence diagram Dgm_p MM(U, f) for each dimension p ∈ N. The diagram Dgm_p MM(U, f) can be viewed as a topological summary of f through the lens of U.
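As a small illustration of the ingredients feeding into MM(U, f), the sketch below (not from the book) builds interval covers of a segment at several scales together with the cover maps between consecutive scales; the cover construction and the containment rule are assumptions of this sketch. Pulling these covers back via f and taking nerves (Propositions 9.13 and 9.1) then yields the simplicial tower of (9.3).

```python
def interval_cover(lo, hi, length, overlap):
    # Intervals of the given length covering [lo, hi]; consecutive ones
    # overlap by `overlap`.  Dyadic parameters keep float arithmetic exact.
    out, a = [], lo - overlap
    while a < hi:
        out.append((a, a + length))
        a += length - overlap
    return out

def cover_map(fine, coarse):
    # A map of covers: each fine interval is sent to some coarse interval
    # containing it (assumes `coarse` genuinely coarsens `fine`).
    return [next(j for j, (c, d) in enumerate(coarse) if c <= a and b <= d)
            for (a, b) in fine]

# Three scales of covers of [0, 1]; doubling both length and overlap makes
# every interval of one scale sit inside an interval of the next.
tower = [interval_cover(0.0, 1.0, 0.25 * 2 ** k, 0.0625 * 2 ** k)
         for k in range(3)]
maps = [cover_map(tower[k], tower[k + 1]) for k in range(2)]
```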

9.3.2 Persistence of H1 -classes in mapper and multiscale mapper


To apply the results for nerves in Section 9.2 to mappers and multiscale mappers, we need a ‘size’ measure on X. For this, we assume that Z is a metric space and we pull back the metric to X via f : X → Z. Assuming that X is path connected, let Γ_X(x, x′) denote the set of all continuous paths γ : [0, 1] → X between any two given points x, x′ ∈ X so that γ(0) = x and γ(1) = x′.

Definition 9.6 (Pullback metric). Given a metric space (Z, d_Z), we define its pullback metric as the following pseudometric d_f on X: for x, x′ ∈ X,

    d_f(x, x′) := inf_{γ∈Γ_X(x,x′)} diam_Z(f ◦ γ).

Consider the Lebesgue number of the pullback covers of X. The following observation in this
respect is useful.

Proposition 9.15. Let U be a cover for the codomain Z and U′ its restriction to f(X). Then, the pullback cover f*(U) has the same Lebesgue number as that of U′; that is, λ(f*(U)) = λ(U′).

Proof. First, observe that, for any path connected cover of X, a subset of X that realizes the Lebesgue number can be taken to be path connected because, if not, this subset can be connected by paths lying entirely within the cover element containing it. Let X′ ⊆ X be any subset where s(X′) ≤ λ(U′). Then, f(X′) ⊆ Z has diameter at most λ(U′) by the definitions of size (Definition 9.3) and the pullback metric. Therefore, by the definition of the Lebesgue number, f(X′) is contained in a cover element U′ ∈ U′. Since X′ is path connected, a path connected component of f⁻¹(U′) contains X′. It follows that there is a cover element in f*(U) that contains X′. Since X′ was chosen as an arbitrary path connected subset of size at most λ(U′), we have λ(f*(U)) ≥ λ(U′). At the same time, it is straightforward from the definition of size that each cover element in f⁻¹(U′) has size at most that of U′ for any U′ ∈ U′. Combining this with the fact that U′ is the restriction of U to f(X), we have λ(f*(U)) ≤ λ(U′), establishing the equality as claimed. □

Given a cover U of Z, consider the mapper N(f*(U)). Let z_1, . . . , z_g be an optimal cycle basis for H_1(X), where the metric used to define optimality is the pullback metric d_f. Then, as a consequence of Theorem 9.10, we have:

Theorem 9.16. Let f : X → Z be a map from a path connected space X to a metric space Z equipped with a cover U (in (i) and (ii) below) or a tower of covers {U_a} (in (iii) below). Let U′ be the restriction of U to f(X).

i. Let ℓ = g + 1 if λ(U′) > s(z_g). Otherwise, let ℓ ∈ [1, g] be the smallest integer so that s(z_ℓ) > λ(U′). If ℓ ≠ 1, the class φ_{U∗}[z_j] = 0 for j = 1, . . . , ℓ − 1. Moreover, if ℓ ≠ g + 1, the classes {φ_{U∗}[z_j]}_{j=ℓ,...,g} generate H_1(N(f*(U))).

ii. The classes {φ_{U∗}[z_j]}_{j=ℓ′,...,g} are linearly independent, where s(z_{ℓ′}) > 4 s_max(U).

iii. Consider an H_1-persistence module of a multiscale mapper induced by a tower of path connected covers:

    H_1(N(f*(U_{a_0}))) →^{s_{1∗}} H_1(N(f*(U_{a_1}))) →^{s_{2∗}} · · · →^{s_{n∗}} H_1(N(f*(U_{a_n}))).    (9.5)

Let ŝ_{i∗} = s_{i∗} ◦ s_{(i−1)∗} ◦ · · · ◦ φ̄_{U_{a_0}∗}. Then, the assertions in (i) and (ii) hold for H_1(N(f*(U_{a_i}))) with the map ŝ_{i∗} : H_1(X) → H_1(N(f*(U_{a_i}))).

9.4 Stability
To be useful in practice, the multiscale mapper should be stable against perturbations of the
maps and the covers. We show that such stability is enjoyed by the multiscale mapper under
some natural conditions on the tower of covers. Recall that the previous stability results for towers,
as described in Section 4.1, were drawn on the notion of interleaving. We identify compatible notions
of interleaving for cover towers as a way to measure the “closeness” between two cover towers.

9.4.1 Interleaving of cover towers and multiscale mappers


In this section we consider cover and simplicial towers indexed over R. In practice, we often have
a cover tower U = {Ua −→ Ua′ }a≤a′ with maps ua,a′ , indexed by a discrete set A ⊂ R. Any such
tower can be extended to a cover tower indexed over R by taking Uε = Ua for each index ε ∈ (a, a′ )
where a, a′ are any two consecutive indices in the ordered set A.

Definition 9.7 (Interleaving of cover towers). Let U = {Ua } and V = {Va } be two cover towers
of a topological space X so that res(U) = res(V) = r. Given η ≥ 0, we say that U and V are
η-interleaved if one can find cover maps ζa : Ua → Va+η and ξa′ : Va′ → Ua′ +η for all a, a′ ≥ r;
see the diagram below.

· · · −→ Ua −−−→ Ua+η −−−→ Ua+2η −→ · · ·
          ↘ ζa  ↗ ξa  ↘ ζa+η  ↗ ξa+η
· · · −→ Va −−−→ Va+η −−−→ Va+2η −→ · · ·

Analogously, if we replace the operator ‘+’ by the multiplication ‘·’ in the above definition, then
we say that U and V are multiplicatively η-interleaved.

Proposition 9.17. (i) If U and V are (multiplicatively) η1 -interleaved and V and W are (multiplicatively)
η2 -interleaved, then U and W are (multiplicatively (η1 η2 )-) (η1 + η2 )-interleaved. (ii) Let
f : X → Z be a continuous function and U and V be two (multiplicatively) η-interleaved towers of
covers of Z. Then, f ∗ (U) and f ∗ (V) are also (multiplicatively) η-interleaved.

Note that in the definition of interleaving of cover towers, unlike in the interleaving between
simplicial towers (Definition 4.2), we do not explicitly require that the maps make the sub-diagrams
commute. However, it follows from Proposition 9.2 that interleaving cover towers
leads to an interleaving between the simplicial towers N(U) and N(V), as shown in the proposition
below.

Proposition 9.18. Let U and V be two (multiplicatively) η-interleaved cover towers of X with
res(U) = res(V). Then, N(U) and N(V) are also (multiplicatively) η-interleaved.

Proof. We prove the proposition for additive interleaving. Replacing the ‘+’ operator with ‘·’
gives the proof for multiplicative interleaving. Let r denote the common resolution of U and V.
Write U = {Ua −→ Ua′ }r≤a≤a′ with maps ua,a′ and V = {Va −→ Va′ }r≤a≤a′ with maps va,a′ , and
for each a ≥ r let ζa : Ua → Va+η and ξa : Va → Ua+η be given as in Definition 9.7. To define
the interleaving between the towers of nerves arising out of the covers, we consider diagrams
similar to (4.3) at the level of covers involving covers of the form Ua and Va , and apply the nerve
construction. This operation yields diagrams identical to those in (4.3) where for every a, a′ with
a′ ≥ a ≥ r:

• Ka := N(Ua ), La := N(Va ),

• xa,a′ := N(ua,a′ ) and ya,a′ := N(va,a′ ) for r ≤ a ≤ a′ ; ϕa := N(ζa ), and ψa := N(ξa ).

To satisfy Definition 4.2, it remains to verify conditions (i) to (iv). We only verify (i), since the
proof of the others follows the same arguments. For this, notice that both the composite map
ξa+η ◦ ζa and ua,a+2η are maps of covers from Ua to Ua+2η . By Proposition 9.2 we then have that
N(ξa+η ◦ ζa ) and N(ua,a+2η ) = xa,a+2η are contiguous. But, by the properties of the nerve construction,
N(ξa+η ◦ ζa ) = N(ξa+η ) ◦ N(ζa ) = ψa+η ◦ ϕa , which completes the claim. □

Combining Propositions 9.17 and 9.18, we get that the multiscale mapper stays stable under
cover perturbations, which is the first part of Corollary 9.19. Recall from Chapter 4
that, for a finite simplicial tower S and p ∈ N, we denote by Dgm p (S) the p-th persistence dia-
gram of the tower S with coefficients in a fixed field. Using Proposition 9.18 and Theorem 4.3, we
have a stability result for Dgm p MM(U, f ) when f is kept fixed but the cover tower U is perturbed,
which is the second part of the corollary below.

Corollary 9.19. For η ≥ 0, let U and V be two finite cover towers of Z with res(U) = res(V) >
0. Let f : X → Z be well-behaved and U and V be η-interleaved. Then, MM(U, f ) and
MM(V, f ) are η-interleaved. In particular, the bottleneck distance between the persistence di-
agrams Dgm p MM(U, f ) and Dgm p MM(V, f ) is at most η for all p ∈ N.

9.4.2 (c, s)-good covers


Although Dgm p MM(U, f ) is stable under perturbations of the covers U as we showed, it is not
necessarily stable under perturbations of the map f . To address this issue, we introduce a special
family of covers called (c,s)-good covers. To define these covers, we use the index value of the
covers to denote their scales. The notation ε for indexing is chosen to emphasize this meaning.

Definition 9.8 ((c, s)-good cover tower). Given a cover tower U = {Uε }ε≥s>0 , we say that it is
(c,s)-good if for any ε ≥ s > 0, we have that (i) smax (Uε ) ≤ ε and (ii) λ(Ucε ) ≥ ε.

As an example, consider the cover tower U = {Uε }ε≥s with Uε := {Bε/2 (z) | z ∈ Z}. It is a
(2, s)-good cover tower of the metric space (Z, dZ ).
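The example above can be realized directly on a finite sample standing in for Z; the sketch below (our own illustration, with hypothetical names ball_cover and ball_cover_tower) builds the tower of ball covers together with its cover maps.

import numpy as np

def ball_cover(points, eps):
    """Cover of a finite sample by the balls B(z; eps/2), one per point;
    returns {center index: set of covered point indices}."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    return {i: set(np.flatnonzero(d[i] <= eps / 2)) for i in range(len(points))}

def ball_cover_tower(points, s, levels, factor=2.0):
    """Tower {U_eps} for eps = s, s*factor, ...; each ball B(z; eps/2)
    is contained in B(z; factor*eps/2), so mapping a ball to the ball
    with the same center at the next scale gives the cover maps."""
    scales = [s * factor ** k for k in range(levels)]
    covers = [ball_cover(points, eps) for eps in scales]
    maps = [{i: i for i in covers[k]} for k in range(levels - 1)]
    return scales, covers, maps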
We now characterize the persistent homology of multiscale mappers induced by (c, s)-good
cover towers. Theorem 9.20 states that the multiscale mappers induced by any two (c, s)-good
cover towers interleave with each other, implying that their respective persistence diagrams are
also close under the bottleneck distance. From this point of view, the persistence diagrams induced
by any two (c, s)-good cover towers contain roughly the same information.

Theorem 9.20. Given a map f : X → Z, let U = {Uε −→ Uε′ }ε≤ε′ and V = {Vε −→ Vε′ }ε≤ε′ ,
with maps uε,ε′ and vε,ε′ respectively, be two (c, s)-good cover towers of Z. Then the corresponding
multiscale mappers MM(U, f ) and MM(V, f ) are multiplicatively c-interleaved.

Proof. First, we make the following observation.

Claim 9.2. Any two (c, s)-good cover towers U and V are multiplicatively c-interleaved.

Proof. This follows easily from the definition of a (c, s)-good cover tower. Specifically, first we
construct ζε : Uε → Vcε . For any U ∈ Uε , we have that diam(U) ≤ ε. Furthermore, since V is
(c, s)-good, there exists V ∈ Vcε such that U ⊆ V. Set ζε (U) = V; if there are multiple choices of
V, we choose an arbitrary one. We construct ξε′ : Vε′ → Ucε′ in a symmetric manner, and
the claim then follows. □

This claim, combined with Propositions 9.17 and 9.18, proves the theorem. □

We also need the following definition in order to state the stability results precisely.

Definition 9.9. Given a tower of covers U = {Uε } and ε0 ≥ res(U), we define the ε0 -truncation of
U as the tower Trε0 (U) := {Uε }ε0 ≤ε . Observe that, by definition, res(Trε0 (U)) = ε0 .


Proposition 9.21. Let X be a compact topological space, (Z, dZ ) be a compact path connected
metric space, and f, g : X → Z be two continuous functions such that δ = max x∈X dZ ( f (x), g(x))
for some δ ≥ 0. Let W be any (c, s)-good cover tower of Z and let ε0 = max(1, s). Then,
the ε0 -truncations of f ∗ (W) and g∗ (W) are multiplicatively (2c max(δ, s) + c)-interleaved.

Proof. For notational convenience write η := 2c max(δ, s) + c, {Uε } = U := f ∗ (W), and {Vε } =
V := g∗ (W). With regards to satisfying Definition 4.2 for U and V, for each ε ≥ ε0 we need only
exhibit maps of covers ζε : Uε → Vηε and ξε : Vε → Uηε . We first establish the following, where
we recall that the offset Or of a set O is defined as Or := {z ∈ Z | dZ (z, O) ≤ r}.

Claim 9.3. For all O ⊂ Z and all δ′ ≥ δ, f −1 (O) ⊆ g−1 (Oδ′ ).

Proof. Let x ∈ f −1 (O); then dZ ( f (x), O) = 0. Thus,

dZ (g(x), O) ≤ dZ ( f (x), O) + dZ (g(x), f (x)) ≤ δ ≤ δ′ ,

which implies the claim. □

Now, pick any ε ≥ ε0 , any U ∈ Uε , and fix δ′ := max(δ, s). Then, there exists W ∈ Wε such
that U ∈ cc( f −1 (W)), where cc(Y) stands for the set of path connected components of Y. Claim
9.3 implies that f −1 (W) ⊆ g−1 (W δ′ ). Since W is a (c, s)-good cover of the connected space Z
and s ≤ max(δ, s) ≤ 2δ′ + ε, there exists at least one set W′ ∈ Wc(2δ′ +ε) such that W δ′ ⊆ W′ .
This means that U is contained in some element of cc(g−1 (W′ )) where W′ ∈ Wc(2δ′ +ε) . But, also,
since c(2δ′ + ε) ≤ c(2δ′ + 1)ε for ε ≥ ε0 ≥ 1, there exists W″ ∈ Wc(2δ′ +1)ε such that W′ ⊆ W″ .
This implies that U is contained in some element of cc(g−1 (W″ )) where W″ ∈ Wc(2δ′ +1)ε . This
process, when applied to all U ∈ Uε and all ε ≥ ε0 , defines a map of covers ζε : Uε → V(2cδ′ +c)ε . A
similar argument produces for each ε ≥ ε0 a map of covers ξε from Vε to U(2cδ′ +c)ε .
So we have in fact proved that the ε0 -truncations of U and V are multiplicatively η-interleaved. □

Applying Proposition 9.21, Proposition 9.18, and Corollary 4.4, we get the following result,
where Dgmlog stands for the persistence diagram at the log-scale (of coordinates).

Corollary 9.22. Let W be a (c, s)-good cover tower of the compact connected metric space Z and
let f, g : X → Z be any two well-behaved continuous functions such that max x∈X dZ ( f (x), g(x)) =
δ. Then, the bottleneck distance between the persistence diagrams satisfies

db (Dgmlog MM(W, f ), Dgmlog MM(W, g)) ≤ log(2c max(s, δ) + c) + max(0, log(1/s)).
Proof. We use the notation of Proposition 9.21. Let U = f ∗ (W) and V = g∗ (W). If max(1, s) = s,
then U and V are multiplicatively (2c max(s, δ) + c)-interleaved by Proposition 9.21, which gives a
bound of log(2c max(s, δ) + c) on the bottleneck distance between the corresponding persistence
diagrams at the log-scale by Corollary 4.4. In the case when s < 1, the bottleneck distance
remains the same only for the 1-truncations of U and V. Shifting the starting point of the two
families to the left by at most s can introduce barcodes of length at most log(1/s) or can stretch
the existing barcodes to the left by at most log(1/s) for the respective persistence modules at the
log-scale. To see this, consider the persistence module below where ε1 = s:

Hk (N(Uε1 )) → Hk (N(Uε2 )) → · · · → Hk (N(U1 )) → · · · → Hk (N(Uεn ))

A homology class born at any index in the range [s, 1) either dies at or before the index 1 or
is mapped to a homology class of Hk (N(U1 )). In the first case we have a barcode of length at
most | log s| = log(1/s) at the log-scale. In the second case, a barcode of the persistence module

Hk (N(Uε1 )) → · · · → Hk (N(Uεn ))

starting at index 1 gets stretched to the left by at most | log s| = log(1/s). The same conclusion can
be drawn for the persistence module induced by V. Therefore the bottleneck distance between
the respective persistence diagrams at log-scale changes by at most log(1/s). □

9.4.3 Relation to intrinsic Čech filtration


In Section 9.3.2, we have seen that given a tower of covers U and a map f : X → Z there exists a
natural pull-back pseudo-metric d f defined on the input domain X (Definition 9.6). With such a
pseudo-metric on X, we can now construct the standard (intrinsic) Čech filtration C(X) = {Cε (X)}ε
(or Rips filtration) in X directly, instead of computing the nerve complex of the pullback covers
as required by mapper. The resulting filtration C(X) is connected by inclusion maps instead of
simplicial maps. This is easier for computational purposes even though one has a method to
compute the persistence diagram of a tower involving arbitrary simplicial maps (Sections 4.2 and
4.4). Furthermore, it turns out that the resulting sequence of Čech complexes C interleaves with
the sequence of complexes MM(U, f ), implying that their corresponding persistence diagrams
approximate each other. Specifically, in Theorem 9.23, we show that when the codomain of
the function f : X → Z is a metric space (Z, dZ ), the multiscale mapper induced by any (c, s)-good
cover tower interleaves (at the homology level) with an intrinsic Čech filtration of X defined
below. We have already considered Čech filtrations before (Section 6.1). However, there we
considered only a finite subset of a metric space to define the Čech complex (Definition 2.9).
Here we redefine it to account for the fact that every point of the (pseudo)metric space is considered,
and call it the intrinsic Čech complex (see an earlier example of an intrinsic Čech complex in the
analysis of graphs in Section 8.1.2).

Definition 9.10. Given a (pseudo)metric space (Y, dY ), its intrinsic Čech complex Cr (Y) at scale r
is defined as the nerve complex of the set of intrinsic r-balls {B(y; r)}y∈Y defined using (pseudo)metric
dY .

The above definition gives way to defining a Čech filtration.

Definition 9.11 (Intrinsic Čech filtration). The intrinsic Čech filtration of the (pseudo)metric
space (Y, dY ) is

C(Y) = {Cr (Y) ,→ Cr′ (Y)}0<r<r′ .

The intrinsic Čech filtration at resolution s is defined as C s (Y) = {Cr (Y) ,→ Cr′ (Y)} s≤r<r′ .

Recall the definition of the pseudometric d f on X (Definition 9.6) induced from a metric on Z.
Applying Definition 9.10 on the pseudometric space (X, d f ), we obtain its intrinsic Čech complex
Cr (X) at scale r and then its Čech filtration C s (X).
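On a finite (pseudo)metric space, for instance a finite sample of (X, d f ), the nerve in Definition 9.10 can be computed by brute force. Below is a sketch under the assumption that witnesses for common intersections range over the finite space itself; intrinsic_cech is a hypothetical name.

from itertools import combinations

def intrinsic_cech(dist, r, max_dim=2):
    """Intrinsic Cech complex at scale r of a finite (pseudo)metric
    space given by a distance matrix dist: a simplex enters iff the
    r-balls around its vertices share a common witness point z."""
    n = len(dist)
    simplices = [(v,) for v in range(n)]
    for k in range(2, max_dim + 2):          # simplices on k vertices
        for cand in combinations(range(n), k):
            if any(all(dist[z][y] <= r for y in cand) for z in range(n)):
                simplices.append(cand)
    return simplices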

Theorem 9.23. Let C s (X) be the intrinsic Čech filtration of (X, d f ) starting with resolution s. Let
U = {Uε −→ Uε′ } s≤ε≤ε′ with maps uε,ε′ be a (c, s)-good cover tower of the compact connected
metric space Z. Then the multiscale mapper MM(U, f ) and C s (X) are multiplicatively 2c-interleaved.

By Corollary 4.4 on multiplicative interleaving, the following result is deduced immediately
from Theorem 9.23.

Corollary 9.24. Given a continuous map f : X → Z and a (c, s)-good cover tower U of Z,
let Dgmlog MM(U, f ) and Dgmlog C s denote the log-scaled persistence diagrams of the persistence
modules induced by MM(U, f ) and by the intrinsic Čech filtration C s of (X, d f ) respectively. We
have that
db (Dgmlog MM(U, f ), Dgmlog C s ) ≤ 2c.

9.5 Exact Computation for PL-functions on simplicial domains


The stability result in Theorem 9.23 further motivates us to design efficient algorithms for constructing
the multiscale mapper or its approximation in practice. A priori, the construction of the
mapper and multiscale mapper may seem clumsy. Even for PL-functions defined on a simplicial
complex, the standard algorithm needs to determine for each simplex the subset (partial simplex)
on which the function value falls within a certain range. We observe that for such an input, it is
sufficient to consider the restriction of the function to the 1-skeleton of the complex for computing
the mapper and the multiscale mapper. Since the 1-skeleton (a graph) is typically much smaller
in size than the full complex, this helps improve the time efficiency of computing the mapper
and multiscale mapper.
Consider one of the most common types of input in practice, a real-valued PL-function f :
|K| → R defined on the underlying space |K| of a simplicial complex K given as a vertex function.
In what follows, we consider this PL setting and show that, interestingly, if the input function
satisfies a mild “minimum diameter” condition, then we can compute both the mapper and multiscale
mapper from simply the 1-skeleton (graph structure) of K. This makes the computation of the
multiscale mapper from a PL-function significantly faster and simpler as its time complexity

depends on the size of the 1-skeleton of K, which is typically orders of magnitude smaller than
the total number of simplices (such as triangles, tetrahedra, etc) in K.
Recall that K 1 denotes the 1-skeleton of a simplicial complex K; that is, K 1 contains the set of
vertices and edges of K. Define f˜ : |K 1 | → R to be the restriction of f to |K 1 |; that is, f˜ is the PL
function on |K 1 | induced by the function values at the vertices.

Condition 9.1 (Minimum diameter condition). For a cover tower W of a compact connected
metric space (Z, dZ ), let

κ(W) := inf{diam(W) | W ∈ W ∈ W}

denote the minimum diameter of any element of any cover in the tower W. Given a simplicial
complex K with a function f : |K| → Z and a tower of covers W of the metric space Z, we say
that (K, f, W) satisfies the minimum diameter condition if diam( f (σ)) ≤ κ(W) for every simplex
σ ∈ K.

In our case, f is a PL-function, and thus satisfying the minimum diameter condition means
that for every edge e = (u, v) ∈ K 1 , | f (u) − f (v)| ≤ κ(W). In what follows we assume that K is
connected. We do not lose any generality by this assumption because the arguments below can be
applied to each connected component of K.
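In this PL setting, verifying the condition amounts to a single pass over the edges of the 1-skeleton; a minimal sketch, with hypothetical names:

def satisfies_min_diameter(edges, f, kappa):
    """Minimum diameter condition for a PL function given by its vertex
    values: |f(u) - f(v)| <= kappa(W) for every edge (u, v) of K."""
    return all(abs(f[u] - f[v]) <= kappa for (u, v) in edges)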
Definition 9.12 (Isomorphic simplicial towers). Two simplicial towers S = {S ε −→ S ε′ } with
maps sε,ε′ and T = {T ε −→ T ε′ } with maps tε,ε′ are isomorphic, denoted S ≅ T, if res(S) = res(T)
and there exist simplicial isomorphisms ηε and ηε′ such that the diagram below commutes for all
res(S) ≤ ε ≤ ε′ .

        sε,ε′
  S ε −−−−→ S ε′
  ηε ↓        ↓ ηε′
  T ε −−−−→ T ε′
        tε,ε′

Our main result in this section is the following theorem, which enables us to compute the mapper,
the multiscale mapper, as well as the persistence diagram of the multiscale mapper of a PL function
f from its restriction f˜ to the 1-skeleton of the respective simplicial complex.

Theorem 9.25. Given a PL-function f : |K| → R and a tower of covers W of the image of f
with (K, f, W) satisfying the minimum diameter condition, we have that MM(W, f ) ≅ MM(W, f˜),
where f˜ is the restriction of f to |K 1 |.

We show in Proposition 9.26 that the two mapper outputs M(W, f ) and M(W, f˜) are identical
up to a relabeling of their vertices (hence simplicially isomorphic) for every W ∈ W. Also, since
the simplicial maps in the filtrations MM(W, f ) and MM(W, f˜) are induced by the pullback of the
same tower of covers W, they are identical again up to the same relabeling of the vertices. This
then establishes the theorem.
In what follows, for clarity of exposition, we use X and X 1 to denote the underlying spaces |K|
and |K 1 | of K and K 1 , respectively. Also, we do not distinguish between a simplex σ ∈ K and its
image |σ| ⊆ X, and thus freely write σ ⊆ X when we actually mean |σ| ⊆ X for a simplex σ ∈ K.

Proposition 9.26. If (K, f, W) satisfies the minimum diameter condition, then for every W ∈ W,
M(W, f ) is identical to M(W, f˜) up to relabeling of the vertices.

Proof. Let U = f ∗ W and Ũ = f˜∗ W. By the definition of f˜, each Ũ ∈ Ũ is a connected component of
U ∩ X 1 for some U ∈ U. In Proposition 9.27, we show that U ∩ X 1 is connected for every
U ∈ U. Therefore, for every element U ∈ U, there is a unique element Ũ = U ∩ X 1 in Ũ and vice
versa. It is not hard to show that ∩ki=1 Ui ≠ ∅ if and only if ∩ki=1 Ũi ≠ ∅. This finishes the proof. □

Proposition 9.27. If (X, f, W) satisfies the minimum diameter condition, then for every W ∈ W
and every U ∈ f ∗ (W), the set U ∩ X 1 is connected.

Proof. Fix U ∈ f ∗ (W). If U ∩ X 1 is not connected, let C1 , . . . , Ck denote its k ≥ 2 connected
components. First, we show that each Ci contains at least one vertex of X 1 . Let e = (u, v)
be any edge of X 1 that intersects U. If both ends u and v lie outside U, then | f (u) − f (v)| >
| maxU f − minU f | ≥ κ(W). But this violates the minimum diameter condition. Thus, at least one
vertex of e is contained in U. It immediately follows that each Ci contains at least one vertex of X 1 .

Let ∆ be the set of all simplices σ ⊆ X so that σ ∩ U ≠ ∅. Fix σ ∈ ∆ and let x be any point
in σ ∩ U. We defer the proof of the following claim as an exercise.

Claim 9.4. There exists a point y in an edge of σ so that f (x) = f (y).

Since σ contains an edge e that is intersected by U, it contains a vertex of e that is contained
in U. This means every simplex σ ∈ ∆ has a vertex contained in U. For each i = 1, . . . , k let
∆i := {σ ⊆ X | V(σ) ∩ Ci ≠ ∅}. Since every simplex σ ∈ ∆ has a vertex contained in U, we have
∆ = ∪i ∆i . We argue that the sets ∆1 , . . . , ∆k are disjoint from each other. Otherwise, there exist
i ≠ j and a simplex σ with a vertex u in Ci and another vertex v in C j . Then, the edge (u, v) must
be in U because f is PL. But this contradicts that Ci and C j are disjoint. This establishes that
the ∆i are pairwise disjoint and hence ∆ is not connected, contradicting that U is connected.
Therefore, our initial assumption that U ∩ X 1 is disconnected is wrong. □

9.6 Approximating multiscale mapper for general maps


While results in the previous section concern real-valued PL-functions, we now provide a sig-
nificant generalization for the case where f maps the underlying space of K into an arbitrary
compact metric space Z. We present a “combinatorial” version of the (multiscale) mapper where
each connected component of a pullback f −1 (W), for any element W of a cover of Z, consists of
only vertices of K. Hence, the construction of the nerve complex for this modified (multiscale)
mapper is purely combinatorial, simpler, and more efficient to implement. But we lose the “exactness”;
that is, in contrast with the guarantees provided by Theorem 9.25, the combinatorial
mapper only approximates the actual multiscale mapper at the homology level. Also, it requires
a (c, s)-good tower of covers of Z. One more caveat is that the towers of simplicial complexes
arising in this case do not interleave in the (strong) sense of Definition 4.4 but in a weaker sense
(Definition 6.6). This limitation worsens the approximation result by a factor of 3.

Figure 9.10: Partial thickened edges belong to the two connected components in f −1 (W). Note
that each set in ccG ( f −1 (W)) contains only the set of vertices of a component in cc( f −1 (W)).

In what follows, as before, cc(O) for a set O denotes the set of all path connected components
of O.
Given a map f : |K| → Z defined on the underlying space |K| of a simplicial complex K, to
construct the mapper and multiscale mapper, one needs to compute the pullback cover f ∗ (W) for
a cover W of the compact metric space Z. Specifically, for any W ∈ W one needs to compute
the preimage f −1 (W) ⊂ |K| and shatter it into connected components. Even in the setting adopted
in 9.5, where we have a PL function f˜ : |K 1 | → R defined on the 1-skeleton K 1 of K, the
connected components in cc( f˜−1 (W)) may contain vertices, edges, and also partial edges: say for
an edge e ∈ K 1 , its intersection eW = e ∩ f −1 (W) ⊆ e, that is, f (eW ) = f (e) ∩ W, is a partial
edge. See Figure 9.10 for an example. In general for more complex maps, σ ∩ f −1 (W) for any
k-simplex σ may be partial triangles, tetrahedra, etc., which can be nuisance for computations.
The combinatorial version of mapper and multiscale mapper sidesteps this problem by ensuring
that each connected component in the pullback f −1 (W) consists of only vertices of K. It is thus
simpler and faster to compute.

9.6.1 Combinatorial mapper and multiscale mapper


Let G be a graph with vertex set V(G) and edge set E(G). Suppose we are given a map f :
V(G) → Z and a finite open cover W = {Wα }α∈A of the metric space (Z, dZ ). For any Wα ∈ W, the
preimage f −1 (Wα ) consists of a set of vertices which is shattered into subsets by the connectivity
of the graph G. These subsets are taken as connected components. We now formalize this:

Definition 9.13 (G-induced connected component). Given a set of vertices O ⊆ V(G), the set
of connected components of O induced by G, denoted by ccG (O), is the partition of O into
maximal subsets of vertices connected in GO ⊆ G, the subgraph spanned by the vertices in O. We
refer to each such maximal subset of vertices as a G-induced connected component of O. We
define fG∗ (W), the G-induced pullback via the function f , as the collection of all G-induced
connected components ccG ( f −1 (Wα )) for all α ∈ A.
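Definition 9.13 translates directly into code; the sketch below (our own names, with cover elements assumed to support membership tests) computes ccG via breadth-first search and assembles the G-induced pullback.

from collections import deque

def cc_G(adj, O):
    """G-induced connected components of a vertex set O: the partition
    of O into maximal subsets connected in the subgraph spanned by O."""
    O, seen, comps = set(O), set(), []
    for v in O:
        if v in seen:
            continue
        comp, queue = set(), deque([v])
        seen.add(v)
        while queue:
            u = queue.popleft()
            comp.add(u)
            for w in adj[u]:
                if w in O and w not in seen:
                    seen.add(w)
                    queue.append(w)
        comps.append(comp)
    return comps

def pullback_G(adj, f, cover):
    """G-induced pullback f*_G(W): the G-induced components of the
    vertex preimages f^{-1}(W_alpha), over all cover elements."""
    out = []
    for W in cover:
        preimage = [v for v in adj if f[v] in W]
        out.extend(cc_G(adj, preimage))
    return out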

Definition 9.14 (G-induced multiscale mapper). Similar to the mapper construction, we define
the G-induced mapper MG (W, f ) as the nerve complex N( fG∗ (W)). Given a tower of covers
W = {Wε } of Z, we define the G-induced multiscale mapper MMG (W, f ) as the tower of
G-induced nerve complexes {N( fG∗ (Wε )) | Wε ∈ W}.

Algorithm 17 MMapper( f, K, W)
Input:
 f : |K| → Z given by fV : V(K) → Z, and a cover tower W = {W1 , . . . , Wt }
Output:
 Persistence diagram Dgm∗ (MMK 1 (W, fV )) induced by the combinatorial multiscale mapper of f w.r.t. W
1: for i = 1, . . . , t do
2:  compute VW ⊆ V(K) where f (VW ) = f (V(K)) ∩ W and {V_W^j }_j = ccK 1 (VW ), ∀W ∈ Wi
3:  compute the nerve complex Ni = N({V_W^j }_{ j,W })
4: end for
5: compute the filtration F : {Ni → Ni+1 , i ∈ [1, t − 1]}
6: compute Dgm∗ (F)


Given a map f : |K| → Z defined on the underlying space |K| of a simplicial complex K, let
fV : V(K) → Z denote the restriction of f to the vertices of K. Consider the 1-skeleton graph
K 1 that provides the connectivity information for the vertices in V(K). Given any cover tower W of
the metric space Z, the K 1 -induced multiscale mapper MMK 1 (W, fV ) is called the combinatorial
multiscale mapper of f w.r.t. W.

9.6.2 Advantage of combinatorial multiscale mapper


A simple description of the computation of the combinatorial mapper is given in Algorithm 17. For the
simple PL example in Figure 9.10, f −1 (W) contains two connected components, one consisting of
the set of white dots and the other consisting of the set of black dots. More generally, the construction
of the pullback cover needs to inspect only the 1-skeleton K 1 of K, which is typically of
significantly smaller size. Furthermore, the construction of the nerve complex Ni in Algorithm
17 is also much simpler: we simply remember, for each vertex v ∈ V(K), the set Iv of ids of the
connected components {V_W^j }_{ j,W∈Wi } which contain it. Any subset of Iv gives rise to a simplex in
the nerve complex Ni , as sketched below.
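A sketch of this id-based nerve construction (the names are ours; max_dim caps the dimension of the output simplices):

from itertools import combinations

def nerve_from_ids(vertex_ids, max_dim=2):
    """Nerve N_i from the map v -> I_v of component ids containing v:
    every subset of some I_v spans a simplex, since those components
    all intersect at the vertex v."""
    simplices = set()
    for ids in vertex_ids.values():
        ids = sorted(ids)
        for k in range(1, min(len(ids), max_dim + 1) + 1):
            simplices.update(combinations(ids, k))
    return sorted(simplices, key=len)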
Let MM(W, f ) denote the standard multiscale mapper as introduced in Section 9.3.1. Our main result
in this section is that if W is a (c, s)-good cover tower of Z, then the two resulting simplicial
towers MM(W, f ) and MMK 1 (W, fV ) weakly interleave (Definition 6.6) and admit a bounded
distance between their respective persistence diagrams as a consequence of the weak-interleaving
result of [77]. This weaker setting of interleaving worsens the approximation by a factor of 3.

Theorem 9.28. Assume that (Z, dZ ) is a compact and connected metric space. Given a map
f : |K| → Z, let fV : V(K) → Z be the restriction of f to the vertex set V(K) of K.
Given a (c, s)-good cover tower W of Z such that (K, f, W) satisfies the minimum diameter
condition (cf. Condition 9.1), the bottleneck distance between the persistence diagrams
Dgmlog MM(W, f ) and Dgmlog MMK 1 (W, fV ) is at most 3 log(3c) + 3 max(0, log(1/s)) for each
dimension k ∈ N.

9.7 Notes and Exercises


A corollary of the nerve theorem is that the space and the nerve have isomorphic homology groups
if all intersections of cover elements are homotopically trivial. This chapter studies a case where
the covers do not necessarily satisfy this property. The result that, for path connected covers, no
new 1-dimensional homology class is created in the nerve is proved in [133]. The materials in
Sections 9.1 and 9.2 are taken from there. This result can be generalized to other dimensions; see
Exercise 5.
The concept of mapper was introduced by Singh, Mémoli, and Carlsson [277], and has since
been used in diverse applications, e.g., [224, 227, 244, 288]. The authors of [277] showed for the
first time that a cover of the codomain, in addition to the domain, can be useful for data analysis. The
mapper in some sense is connected to Reeb graphs (spaces), where the cover elements degenerate
to points in the codomain; see [240] for example. The structure and stability of the 1-dimensional
mapper is studied in great detail by Carrière and Oudot in [69]. They showed that, given a real-valued
function f : X → R and an appropriate cover U, the extended persistence diagram of a
mapper M(U, f ) is a subset of that of the Reeb graph R f . Furthermore, they characterized the
features of the Reeb graph that may disappear from the mapper. The mapper (for a real-valued
function f ) can also be viewed as a Reeb graph R f ′ of a perturbed function f ′ : X′ → R. It
is shown in [69] how one can track the changes between R f and the mapper by computing the
functional distortion distance (Definition 7.8) between R f and R f ′ . In [15], the author established
a convergence result between the mapper for a real-valued f and the Reeb graph R f . Specifically,
the mapper is characterized with a zigzag persistence module that is a coarsening of the zigzag
persistence module for R f . It is shown that the mapper converges to R f in the bottleneck distance
of the corresponding zigzag persistence diagrams as the lengths of the intervals in the cover
approach zero. Munch and Wang [240] showed a similar convergence in the interleaving distance
(Definition 7.6) using sheaf theory [112].
The multiscale mapper, which works on the notion of a filtration of covers, was developed
in [132]. Most of the materials in this chapter are taken from that paper. The results on the class
of 1-cycles that persist through the multiscale mapper are taken from [133].

Exercises
1. For a simplicial complex K, simplices with no cofacet are called maximal simplices. Con-
sider a closed cover of |K| with the closures of the maximal simplices as the cover elements.
Let N(K) denote the nerve of this cover. Prove that N(N(K)) is isomorphic to a subcomplex
of K.
2. ([17]) A vertex v in K is called dominated by a vertex v0 if every maximal simplex con-
taining v also contains v0 . We say K collapses strongly to a complex L if L is obtained by
a series of deletions of dominated vertices with all their incident simplices. Show that K
strongly collapses to N(N(K)).
3. We say a cover U of a metric space (Y, d) is an (α, β)-cover if α ≤ λ(U) and β ≥ smax (U).
• Consider a δ-sample P of Y, that is, every metric ball B(y; δ), y ∈ Y, contains a point
in P. Prove that the cover U = {B(p; 2δ)} p∈P is a (δ, 4δ)-cover of Y.

• Prove that the infinite cover U = {B(y; δ)}y∈Y is a (δ, 2δ)-cover of Y.

4. Theorem 9.5 requires the cover to be path connected. Show that this condition is
necessary by presenting a counterexample otherwise.

5. One may generalize Theorem 9.5 as follows: If for any k ≥ 0, t-wise intersections of cover
elements for all t > 0 have trivial reduced homology for Hk−t , then the nerve map induces a
surjection in Hk . Prove or disprove it.

6. Consider a function f : X → Z from a path connected space X to a metric space Z. Define
the equivalence relation ∼ f such that x ∼ f x′ holds if and only if f (x) = f (x′ ) and there
exists a continuous path γ ∈ ΓX (x, x′ ) such that f ◦ γ is constant. The Reeb space R f is the
quotient of X under this equivalence relation.

• Prove that the quotient map q : X → R f is surjective and also induces a surjection
q∗ : H1 (X) → H1 (R f ).
• Call a class [c] ∈ H1 (X) vertical if and only if there is no c′ ∈ C1 (X) so that [c] = [c′ ]
and f ◦ σ is constant for every σ ∈ c′ . Show that q∗ ([c]) ≠ 0 if and only if c is vertical.
• Let z1 , . . . , zg be an optimal cycle basis (Definition 9.3) of H1 (X) defined with respect
to the pseudometric d f (Definition 9.6). Let ℓ ∈ [1, g] be the smallest integer so that
s(zℓ ) ≠ 0. Prove that if no such ℓ exists, then H1 (R f ) is trivial; otherwise, {[q(zi )]}i=ℓ,...,g is
a basis for H1 (R f ).

7. Let us endow R f with a distance d̃ f that descends via the map q: for any equivalence classes
r, r′ ∈ R f , pick x, x′ ∈ X with r = q(x) and r′ = q(x′ ), and then define

d̃ f (r, r′ ) := d f (x, x′ ).

Prove that d̃ f is a pseudometric.

8. Prove Proposition 9.17.

9. Prove Theorem 9.23.

10. Prove Claim 9.4.


Chapter 10

Discrete Morse Theory and Applications

Discrete Morse theory is a combinatorial version of the classical Morse theory. Invented by Forman
[161], the theory combines topology with the combinatorial structure of a cell complex.
Specifically, much like the critical points of a smooth Morse function on a manifold determine
its topological entities such as homology groups and Euler characteristic, an analogous
concept called critical simplices of a discrete Morse function determines similar structures
for the complex it is defined on. Gradient vectors associated with smooth Morse functions give
rise to integral lines and eventually the notion of stable and unstable manifolds [232]. Similarly,
a discrete Morse function defines discrete gradient vectors leading to V-paths analogous to the
integral lines. Using these V-paths, one can define the analogues of stable and unstable manifolds
of the critical simplices.
It turns out that an acyclic pairing between simplices and their faces, in which every simplex
participates in at most one pair, provides a discrete Morse function; conversely, a discrete Morse
function defines such a pairing. This pairing, termed a Morse matching, is a main building
block of the discrete Morse theory. In this chapter, we connect this matching with the pairing
obtained through the persistence algorithm. Specifically, we present an algorithm for computing a
Morse matching, and hence a discrete Morse vector field, by connecting persistent pairs through
V-paths. This requires an operation called critical pair cancellation which may not always
succeed. However, for 1-complexes and simplicial 2-manifolds (pseudomanifolds), it always
succeeds. Sections 10.1 and 10.2 are devoted to these results.
In Section 10.4, we apply our persistence-based discrete Morse vector field to reconstruct
geometric graphs from their noisy samples. Here we show that unstable manifolds of critical
edges can recover a graph with guarantees from density data that captures the hidden graph reasonably
well. We provide two applications of this graph reconstruction algorithm, one for
road network reconstruction from GPS trajectories and satellite images, and another for neuron
reconstruction from their images. Section 10.5 describes these applications.


10.1 Discrete Morse function


Following Forman [161], we define a discrete Morse function (henceforth called a Morse function
in this chapter) as a function f : K → R on a simplicial complex K where for every p-simplex
σ p ∈ K the following two conditions hold (Forman formulated discrete Morse functions for more
general cell complexes); recall that every (p − 1)-face of σ p is called its facet
and every (p + 1)-simplex adjoining σ p is called its cofacet.

• #{σ p−1 | σ p−1 is a facet of σ p and f (σ p−1 ) ≥ f (σ p )} ≤ 1

• #{σ p+1 | σ p+1 is a cofacet of σ p and f (σ p+1 ) ≤ f (σ p )} ≤ 1

The first condition says that at most one facet of a simplex σ has function value greater than or
equal to f (σ), and the second condition says that at most one cofacet of σ has function value less
than or equal to f (σ). By a result of Chari [75], the two conditions imply that the two sets above
cannot both be nonempty, that is, if there is a pair (σ p−1 , σ p ) with f (σ p−1 ) ≥ f (σ p ), then there is
no pair (σ p , σ p+1 ) with f (σ p+1 ) ≤ f (σ p ), and vice versa. This means that a Morse function
f induces a matching:

Definition 10.1 (Matching). A set of ordered pairs M = {(σ, τ)} is a matching in K if the following
conditions hold:

1. For any (σ, τ) ∈ M, σ is a facet of τ.

2. Any simplex in K can appear in at most one pair in M.

Such a matching M defines two disjoint subsets L ⊆ K and U ⊆ K where there is a bijection
µ : L → U such that M = {(σ, µ(σ)) | σ ∈ L}.
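For concreteness, here is a small Python sketch (our own illustration with hypothetical names) that checks Forman's two conditions and reads off the induced matching, representing each simplex as a frozenset of its vertices.

from itertools import combinations

def facets(s, K):
    """Codimension-1 faces of the simplex s that are present in K."""
    return [frozenset(t) for t in combinations(s, len(s) - 1)
            if frozenset(t) in K]

def is_discrete_morse(f):
    """Check the two conditions of a discrete Morse function for f
    given as a dict {frozenset of vertices: value}."""
    K = set(f)
    for s in K:
        if sum(1 for t in facets(s, K) if f[t] >= f[s]) > 1:
            return False
        if sum(1 for t in K if len(t) == len(s) + 1 and s < t
               and f[t] <= f[s]) > 1:
            return False
    return True

def induced_matching(f):
    """Matching induced by a discrete Morse function: pair a simplex s
    with its unique facet t having f(t) >= f(s), whenever one exists."""
    K = set(f)
    return [(t, s) for s in K for t in facets(s, K) if f[t] >= f[s]]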

In Figure 10.1, we indicate a matching by putting an arrow from the lower dimensional sim-
plex to the higher dimensional simplex. Observe that the source of each arrow is a facet of the
target of the arrow.
Note, however, that the matching in K defined by a Morse function has an additional property of
acyclicity, which we show next. First, let us define a relation σi ≺ σi+1 if σi+1 = µ(σi ) or if σi+1 is
a facet of σi but σi ≠ µ(σi+1 ).

Definition 10.2 (V-path and Morse matching). Given a matching M in K, for k > 0, a V-path π
is a sequence

π : σ0 ≺ σ1 ≺ · · · σi−1 ≺ σi ≺ σi+1 · · · ≺ σk (10.1)

where for 0 < i < k, σi ≠ µ(σi−1 ) implies σi+1 = µ(σi ). In other words, a V-path is an alternating
sequence of facets and cofacets, thus alternating in dimensions, where every consecutive pair also
alternates between matched and unmatched pairs. A V-path is cyclic if the first simplex σ0 is a
facet of the last simplex σk or σ0 = µ(σk ), and the matching M is called cyclic if there is such a
path in it. Otherwise, M is called acyclic. An acyclic matching in K is called a Morse matching.

In Figure 10.1(left), the matching indicated by the arrows is not a Morse matching whereas
the matching in Figure 10.1(right) is a Morse matching. Observe that in a sequence like (10.1),
the function values on facets of the matched pairs strictly decreases. This observation leads to the
following fact.

Fact 10.1. The matching induced by a Morse function on K is acyclic, thus is a Morse matching.

We also have the following relation in the opposite direction.

Fact 10.2. A Morse matching M in K defines a Morse function on K.

Proof. First order those simplices which are in some pair of M. A simplex σ p−1 is ordered
before σ p if (σ p−1 , σ p ) ∈ M, and it is ordered after σ p if it is a facet of σ p but (σ p−1 , σ p ) ∉ M.
Such an ordering is possible because M is acyclic. Then, simply order the rest of the simplices
not in any pair of M according to their increasing dimensions. Assign the order numbers as the
function values of the simplices; one can easily verify that this satisfies conditions (1) and (2)
of a (discrete) Morse function on K. □

Since a given Morse matching M in K can be associated with a Morse function f on K,


we call the simplices not covered by M the critical simplices of f . Let ci = ci (M) denote the
number of i-dimensional critical simplices. Recall that βi = βi (K) denotes the i-th Betti number,
the dimension of the homology group Hi (K). Assume that ci , βi = 0 for i > p where K is p-
dimensional. The following result is due to Forman [161]. It is analogous to Theorem 1.5 for
smooth Morse function in the smooth setting.

Proposition 10.1. Given a Morse function f on K with its induced Morse matching M, let the ci
and βi be defined as above. We have:
• (weak Morse inequality)

(i) ci ≥ βi for all i ≥ 0.


(ii) c p − c p−1 + · · · ± c0 = β p − β p−1 + · · · ± β0 where K is p-dimensional.

• (strong Morse inequality)

ci − ci−1 + ci−2 − · · · ± c0 ≥ βi − βi−1 + βi−2 · · · ± β0 for all i ≥ 0.

The weak Morse inequality can be derived from the strong Morse inequality (Exercise 7).

10.1.1 Discrete Morse vector field


Morse matchings can be interpreted naturally as a discrete counterpart of a vector field.

Definition 10.3 (DMVF). A discrete Morse vector field (DMVF) V in a simplicial complex K is
a partition V = C ⊔ L ⊔ U of K where L is the set of facets each paired with a unique cofacet in U
by a Morse matching M with µ(L) = U, and C is the set of unpaired simplices, called critical
simplices. We also say that V is induced by the matching M in this case.

Figure 10.1: Two DMVFs: (left) the matching is not Morse because the sequence a ≺ ab ≺ b ≺
bc ≺ c ≺ cd ≺ d ≺ da is cyclic; (right) the matching is Morse, and there is no cyclic sequence.

We interpret each pair (σ, τ = µ(σ)) as a vector originating at σ and terminating at τ, and we
draw the vector as an arrow with its tail in σ and its head in τ; see Figures 10.1 and 10.2. The critical
simplices are treated as critical points of the vector field, justifying their name. The vertex e and
the edge ce in both the left and right pictures of Figure 10.1 are critical, whereas the vertex c is critical
only in the right picture and the edge b f is critical only in the left picture.
In analogy to the integral lines of smooth vector fields, we define the so-called critical V-paths
for discrete Morse vector fields.

Definition 10.4 (Critical V-path). Given a DMVF V = C ⊔ L ⊔ U induced by a matching M, a
V-path π : σ0 ≺ σ1 ≺ · · · σi−1 ≺ σi ≺ σi+1 · · · ≺ σk is critical in M if both σ0 and σk are critical.

Observe that σ0 and σk in the above definition are necessarily a p-simplex and a (p − 1)-simplex
respectively if the V-path alternates between p- and (p − 1)-simplices. The V-path corresponding
to a critical V-path cannot be cyclic due to this observation. The critical triangle cda with any of
its edges in Figure 10.1(left) forms a non-critical V-path, whereas the pair ce ≺ e forms a critical
V-path in Figure 10.1(right).
In a critical V-path π, the pairs (σ1 , σ2 ), · · · , (σ2i−1 , σ2i ), · · · , (σk−2 , σk−1 ) are matched. We
can cancel the pair of critical simplices (σ0 , σk ) by reversing the matched pairs.

Definition 10.5 (Cancellation). Let (σ0 , σk ) be a pair of critical simplices with a critical V-path
π : σ0 ≺ σ1 ≺ · · · σi−1 ≺ σi ≺ σi+1 · · · ≺ σk . The pair (σ0 , σk ) is cancelled if one modifies
the matching by shifting the matched pairs by one position, that is, by asserting that the pairs
(σk , σk−1 ), · · · , (σ2i+1 , σ2i ), · · · , (σ1 , σ0 ) are matched instead – we refer to this as the (Morse)
cancellation of (σ0 , σk ). Observe that a cancellation essentially reverses the vectors in the V-path
π and additionally converts the critical simplices σ0 and σk to non-critical ones; see Figure 10.2. We
say that the pair (σ0 , σk ) is (Morse) cancellable if there exists a unique V-path between them.
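A minimal sketch of this reversal, assuming the matching is stored as a set of ordered (facet, cofacet) pairs and the critical V-path as the list [σ0 , . . . , σk ]:

def cancel(matching, path):
    """Morse cancellation along a critical V-path (Definition 10.5):
    drop the matched pairs (s1, s2), (s3, s4), ..., (s_{k-2}, s_{k-1})
    and insert the shifted pairs (s1, s0), (s3, s2), ..., (sk, s_{k-1}),
    reversing every vector on the path; s0 and sk become non-critical."""
    for i in range(1, len(path) - 1, 2):
        matching.discard((path[i], path[i + 1]))
    for i in range(0, len(path) - 1, 2):
        matching.add((path[i + 1], path[i]))
    return matching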

Observe that a cancellation preserves the property of being a matching, that is, the new pairs together
with the undisturbed pairs indeed form a matching. Uniqueness of the critical V-path connecting
a pair of critical simplices ensures that the resulting new matching remains Morse. If there is
more than one such critical V-path, the new matching may become cyclic – for example, in Figure
10.2(c), the cancellation of one critical V-path between the triangle-edge pair creates a cyclic
V-path. The uniqueness of the critical V-path is sufficient to ensure that such a cyclic matching cannot
be produced. In particular, we have:

Figure 10.2: Critical vertices and edges are marked red; (a) before cancellation of the edge-vertex
pair (v2 , e2 ); (b) after cancellation, the path from e2 to v2 is inverted, giving rise to a critical V-path
from e1 to v1 , making (v1 , e1 ) now potentially cancellable; (c) the edge-triangle pair (e, t), if
cancelled, creates a cycle as there are two V-paths between them.

Proposition 10.2. Given a Morse matching M, suppose we cancel a pair of critical simplices σ
and σ′ in a DMVF V via a critical V-path to obtain a new matching M′ . Then M′ remains a
Morse matching if and only if this V-path is the only critical V-path connecting σ and σ′ in V
(i.e., the pair (σ, σ′ ) is cancellable as per Definition 10.5).

Proof. First, assume that there are two V-paths π and π′ originating at σ and ending at σ′ . Since
π and π′ are distinct and have common simplices σ at the beginning and σ′ at the end, there are
simplices τ and τ′ where the two paths differ for the first time after τ and join again for the first
time at τ′ . Reversing one V-path, say π, creates a V-path from τ′ to τ. This sub-path along with
the V-path from τ to τ′ on π′ creates a cyclic V-path, thus proving the ‘only if’ part.

Next, suppose that there is only a single V-path from σ to σ′ . After reversing this path, we
claim that no cyclic V-path is created. For contradiction, assume that a cyclic V-path is created
as the result of the reversal of π. Let the maximal sub-path of the reversed π on this cyclic path start
at τ and end at τ′ . We have τ ≠ τ′ because otherwise the original matching would have been cyclic
in the first place. But then the cyclic V-path has a sub-path from τ′ to τ that is not in π. Since
the reversed V-path π has a sub-path from τ to τ′ , the original path has a sub-path from τ′ to
τ. It means that the DMVF V originally had two V-paths from σ to σ′ , with one of them being
π while the other one contains a sub-path not in π. This contradicts that there is a
single V-path from σ to σ′ . Hence the assumption that a cyclic V-path is created is wrong, which
completes the proof of the ‘if’ part. □

10.2 Persistence based DMVF


Given a simplicial complex K, one can set up a trivial DMVF where every simplex is critical, that
is, V = K ⊔ ∅ ⊔ ∅. Then, one may use cancellations to build the vector field further by constructing
more matchings. The key to the success of this approach is to identify pairs of critical simplices
that can be cancelled without creating cyclic paths. One way to do this is by taking advantage of
persistence pairs among simplices.

10.2.1 Persistence-guided cancellation


First, we consider the case of simplicial 1-complexes which consist of only vertices and edges.
Such a complex admits a DMVF obtained by cancelling the persistence pairs successively. Here
we consider pairs with finite persistence only. Recall that some of the creator simplices are never
paired with a destructor because the class created by them never dies. They are paired with ∞.
Such essential pairs are not considered in the following proposition.

Proposition 10.3. Let (v1 , e1 ), (v2 , e2 ), · · · , (vn , en ) be the sequence of all non-essential persis-
tence pairs of vertices and edges sorted in increasing order of the appearance of the edges ei ’s in
a filtration of a 1-complex K. Let V0 be the DMVF in K with all simplices being critical. Sup-
pose DMVF Vi−1 can be obtained by cancelling successively (v1 , e1 ), (v2 , e2 ), · · · (vi−1 , ei−1 ). Then,
(vi , ei ) can be cancelled in Vi−1 providing a DMVF Vi for all i ≥ 1.

Proof. Inductively assume that (i) Vi−1 is a DMVF obtained as claimed in the proposition and
(ii) any matched edge in Vi−1 is a paired edge in a persistence pair. We argue that these two
hypotheses hold for Vi , proving the claim by hypothesis (i).

The base case for i = 1 is true trivially because V0 is a DMVF and there is no matched edge.
Inductively assume that Vi−1 satisfies the inductive hypothesis for i > 1. Consider the persistence
pair (vi , ei ). First, we observe that a V-path ei = ei1 ≺ vi1 ≺ . . . ≺ ein ≺ vin = vi exists in
Vi−1 . If not, starting from the two endpoints of ei , we attempt to follow the two V-paths, and let
v, v′ ≠ vi be the first two critical vertices encountered during this construction. Without loss of
generality, assume that v′ appears before v in the filtration. Then, the 0-dimensional class [v + v′ ]
is born when v is introduced. It is destroyed by ei . It follows that (v, ei ) is a persistence pair
(Fact 3.3), contradicting that actually (vi , ei ) is a persistence pair. For the induction, consider the V-path
ei = ei1 ≺ vi1 ≺ . . . ≺ ein ≺ vin = vi in Vi−1 which is cancelled to create Vi . For Vi not to
be a DMVF, due to Proposition 10.2, we must have another distinct V-path from ei to vi in Vi−1 ,
say ei = e j1 ≺ v j1 ≺ . . . ≺ e jn′ ≺ v jn′ = vi . These two non-identical paths form a 1-cycle. Every edge
in this cycle except possibly ei is a matched edge in Vi−1 and hence participates in a persistence
pair by the inductive hypothesis. Then, all edges in the 1-cycle participate in some persistence
pair because ei is also such an edge by assumption. But this is impossible because in any 1-cycle
at least one edge has to remain unpaired in persistence. It follows that by cancelling (vi , ei ), we
obtain a DMVF Vi satisfying the inductive hypothesis (i). Also, inductive hypothesis (ii) follows
because the new matched pairs in Vi involve edges that were already matched in Vi−1 and the edge
ei , which participates in a persistence pair by assumption. □

The result above holds for vertex-edge pairing in any simplicial complex. Furthermore, using
dual graphs, it can be used for edge-triangle pairing in triangulations of 2-manifolds. Given a
simplicial 2-complex K whose underlying space is a 2-manifold without boundary, consider the
dual graph (1-complex) K ∗ where each triangle t ∈ K becomes a vertex t∗ ∈ K ∗ and two vertices
t1∗ and t2∗ are joined with an edge e∗ if triangles t1 and t2 share an edge e in K.
The following result connects the persistence of a filtration of K and its dual graph K ∗ .

Proposition 10.4. Let σ1 , σ2 , · · · , σn be a subsequence of a simplex-wise filtration F of K
consisting of all edges and triangles. An edge-triangle pair (σi , σ j ) is a persistence pair for F if and
only if (σ∗j , σ∗i ) is a persistence pair for the filtration σ∗n , σ∗n−1 , · · · , σ∗1 of the dual graph K ∗ .

Proof. Recall Proposition 3.8. The edge-triangle persistence pairs produced by the filtered boundary
matrix D2 for a filtration of K are exactly the same as the triangle-edge persistence pairs obtained
from the twisted (transposed and reversed) matrix D∗2 by left-to-right column additions. The
matrix D∗2 is exactly the filtered boundary matrix of a filtration F(K ∗ ) of K ∗ that reverses the
subsequence of triangles and edges. Dualizing a triangle t to a vertex t∗ and an edge e to an edge e∗ ,
we can view F(K ∗ ) as a filtration of a 1-complex (graph). Then, applying Theorem 3.6, we get
that (t∗ , e∗ ) is indeed a persistence pair for the filtration F(K ∗ ). □

We can compute a DMVF V ∗ for K ∗ by cancelling all persistence pairs as stated in Propo-
sition 10.3. By duality, this also produces a DMVF V for the 2-manifold K. The action of
cancelling a vertex-edge pair in K ∗ can be translated into a cancellation of an edge-triangle pair
in K. Combining Propositions 10.3 and 10.4, we obtain the following result.

Theorem 10.5. Let K be a finite simplicial 2-complex whose underlying space is a 2-manifold
without boundary and F be a simplex-wise filtration of K (Definition 3.1). Starting from the trivial
DMVF where each simplex is critical, one can obtain a DMVF in K by cancelling the vertex-edge
and edge-triangle persistence pairs given by F.

In general, by duality one can apply the above theorem to cancel all persistence pairs be-
tween (d − 1)-simplices and d-simplices in a filtration of a simplicial d-complex where each
(d − 1)-simplex has at most two d-simplices as cofacets. This includes simplicial d-manifolds
with boundary. We call a (d − 1)-simplex boundary if it adjoins exactly one d-simplex. For
this extension, one has to introduce a ‘dummy’ vertex in the dual graph that connects to all dual
vertices of d-simplices incident to a boundary (d − 1)-simplex. We leave it as an exercise (Exer-
cise 11).
Unfortunately, it does not extend any further. In particular, the result in Theorem 10.5 does
not extend to arbitrary simplicial 2-complexes and hence not to arbitrary simplicial complexes. The
main difficulty arises because such a complex does not admit a dual graph in general. Indeed,
there are counterexamples which exhibit that not every persistence pair for a filtration of a simplicial
2-complex can be cancelled to lead to a DMVF. The following Dunce hat example exhibits
this obstruction.

Dunce hat. Consider a 2-manifold with boundary which is a cone with apex v and the boundary
circle c. Let u be a point on c. Modify the cone by identifying the line segment uv with the
circle c. Because of the similarity, the space obtained by this identification is called the Dunce
hat. Consider a triangulation K of the Dunce hat. Notice that the Dunce hat, and hence |K|, is not a
2-manifold. The edges discretizing uv in K have three triangles incident to them. We show that
there is no DMVF for K without any critical edge or triangle. The complex K is known to have
βi (K) = 0 for all i > 0, and it has two or more triangles adjoining every edge. For any filtration
of K, there cannot be any edge or triangle that remains unpaired because otherwise that would
contradict that β1 (K) = 0 and β2 (K) = 0 (Fact 3.9 in Chapter 3). If a DMVF V could be created
by cancelling persistence pairs, there would be a finite maximal V-path that cannot
be extended any further. Consider such a path π starting at a simplex σ. If σ is a triangle, the
edge µ−1 (σ) matched with it can be added before it to extend π. If σ is an edge, there is a triangle
adjoining σ that is not in the V-path because at least two triangles adjoin σ and the V-path starting at
σ cannot be cyclic. We can add that triangle to extend π. In both cases, we contradict that π is
maximal.

10.2.2 Algorithms
The above results naturally suggest an algorithm for computing a persistence based DMVF for
a simplicial 2-manifold K. We compute the persistence pairs on a chosen filtration F of K and
then cancel them successively as Theorem 10.5 suggests. Both of these tasks can be combined by
modifying the well known Kruskal’s algorithm for computing minimum spanning tree of a graph.
Consider a graph G = (U, E) which can be either the 1-skeleton of a complex K or the dual
graph K ∗ if K is a simplicial 2-manifold. Let u1 , u2 , . . . , uk and e1 , e2 , . . . , eℓ be an
ordered sequence of vertices and edges in G. For a minimum spanning tree, the sequence of edges
is taken in non-decreasing order of their weights. Here we describe the algorithm assuming
an arbitrary order. Kruskal’s algorithm maintains a spanning forest of the vertex set. It brings in one edge
e at a time in the given order either to join two trees in the current forest or to discover that the
edge makes a cycle and hence does not belong to the spanning forest. If the two endpoints of
e belong to two different trees in the forest, then it joins those two trees. Otherwise, e connects
two vertices in the same tree, creating a cycle. The main computation involves determining if the two
vertices of an edge belong to the same tree or not. Algorithm 18: PersDMVF does it with a union-find
data structure which maintains the set of vertices of a tree in a single set; two sets are united
when an edge joins the two respective trees. This is similar to the FindSet and Union operations in the
algorithm ZeroPerDg described in Section 3.5.3. All such find and union operations can be done
in O(k + ℓα(ℓ)) time assuming that there are k vertices and ℓ edges in the graph, which dominates
the overall complexity.
We can incorporate the persistence computation and Morse cancellations simultaneously in
the above algorithm with some simple modifications. We process the vertices and edges in their
order of the input filtration. Usually, the filtration F = F f is given by a simplex-wise monotone
function f as described in Section 3.1.2. We compute the persistence Pers (e) of an edge e as
Pers (e) = | f (e) − f (r)| if e pairs with the vertex r and ∞ otherwise.
For a vertex u in the filtration F f , we do not do anything other than creating a new set
containing only u. When an edge e = (u, u′ ) comes in, we check if u and u′ belong to the same
tree by using the union-find data structure. If they do, the edge e is designated as a creator for
persistence and as a critical edge in the DMVF that is being built on G. Otherwise, we compute
Pers (e) after finding the persistence pair for e and at the same time cancel e with its pair in
the DMVF as follows. Assume inductively that the current DMVF matches every vertex other
than the roots of the trees to one of its adjacent edges as follows. For a leaf vertex v, consider
the path v = v1 , e1 , . . . , ek−1 , vk = r from v to the root r, which consists of the matched pairs
(v1 , e1 ), . . . , (vk−1 , ek−1 ) and the critical vertex r. For the edge e = (u, u′ ), let the roots of the two
trees T u and T u′ containing u and u′ be r and r′ respectively. Assume without loss of generality
that r succeeds r′ in the input filtration. Then, e pairs with r in persistence because e joins the
two components created by r and r′ , of which r comes later in the filtration. We cancel the
persistence pair (r, e) by shifting the matched pairs on the path from u to r as stated in Definition
10.5. We join the two trees T u and T u′ into one tree by calling the routine Join. The root of
the joined tree becomes r′ . Cancelling (r, e) maintains the invariant that every path from a leaf
to the root of the new tree remains a V-path. See Figure 10.3 for an illustration.

Algorithm 18 PersDMVF(G, F f )
Input:
A graph G and a filtration F f on its n vertices and edges
Output:
A DMVF V and persistence pairs of F f which are cancelled for creating V
1: Let G = (U, E) and F f be the input filtration of its n vertices and edges.
2: T := {∅}; V := ∅ ⊔ ∅ ⊔ (U ∪ E), the trivial DMVF with every simplex critical; Initialize U := U
3: for all i = 1, . . . , n do
4: if σi ∈ F f is a vertex u then
5: Create a tree T rooted at u; T := T ∪ {T }
6: else if σi ∈ F f is an edge e = (u, u′) then
7: if t := FindSet(u) = t′ := FindSet(u′) then
8: designate e as creator and critical in V; Pers (e) := ∞
9: else
10: Union(t, t′) updating U
11: Let Tu and Tu′ be the trees containing u and u′
12: Find V-paths πu from u to root r and πu′ from u′ to r′ in Tu and Tu′ respectively
13: Let r succeed r′ in F f ; Cancel (e, r) considering the V-path πu and update DMVF V;
Pers (e) := | f (e) − f (r)|
14: Join(Tu , Tu′ ) in T
15: end if
16: end if
17: end for
18: Output V and persistence pairs with persistence values
Figure 10.3: Illustration for Algorithm PersDMVF: destroyer edge e = (u, u′) is joining two trees Tu and Tu′ with roots r and r′ respectively. The pair (r, e) is cancelled reversing the arrows on three edges on the path from r to u′; edge e′ in the right picture is a creator and does not make any change in the forest.
The costly step in algorithm PersDMVF is the cancellation step, which takes O(n) time per cancellation and thus incurs a running time of O(n^2) in total. However, we observe that all matchings in the final DMVF are made between a node v and the edge e that connects v to its parent parent(v) in the respective rooted tree, and the root remains critical. All non-tree edges remain critical. Thus, we can eliminate the cancellation step in PersDMVF and after computing the final forest we can
Algorithm 19 SimplePersDMVF(G, F f )
Input:
A graph G and a filtration F f on its n vertices and edges
Output:
A DMVF V and persistence pairs of F f which are cancelled for creating V
1: Let G = (U, E) and F f be the input filtration of its n vertices and edges.
2: T := {∅}; V := ∅; Initialize U := U
3: for all i = 1, . . . , n do
4: if σi ∈ F f is a vertex u then
5: Create a tree T rooted at u; T := T ∪ {T }
6: else if σi ∈ F f is an edge e = (u, u′) then
7: if t := FindSet(u) = t′ := FindSet(u′) then
8: designate e as creator and critical in V; Pers (e) := ∞
9: else
10: Union(t, t′) updating U
11: Let Tu and Tu′ be the trees containing u and u′ with roots r and r′
12: Let r succeed r′ in F f ; Pers (e) := | f (e) − f (r)|
13: Join(Tu , Tu′ ) in T with edge e
14: end if
15: end if
16: end for
17: for each tree T ∈ T do
18: for each node v in T do
19: e := (v, parent(v)), V := V ⊔ {(v, e)}
20: end for
21: Put the root of T as a critical vertex in V
22: end for
23: Output V and persistence pairs with persistence values

determine all matched pairs by traversing the trees upward from the leaves to the roots while
matching a vertex with the edge visited next in this upward traversal. This matching takes O(n)
time. Accounting for the union-find operations, all other steps in PersDMVF take O(nα(n)) time
in total. The simplified version Algorithm 19:SimplePersDMVF incorporates these changes.
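For concreteness, here is a compact Python sketch of this simplified pass, reusing the UnionFind sketch given earlier; the filtration is assumed to be a list whose entries are vertices or edge tuples (u, u′), f gives the simplex-wise function values, and the bookkeeping names (order, root, tree) are ours, not the book's.

```python
from collections import defaultdict

def simple_pers_dmvf(filtration, f):
    """Sketch of SimplePersDMVF: a union-find pass over the filtration
    followed by a leaf-to-root matching of the resulting forest."""
    vertices = [s for s in filtration if not isinstance(s, tuple)]
    uf = UnionFind(vertices)
    order = {}                          # filtration position of each vertex
    root = {v: v for v in vertices}     # earliest vertex of each component
    tree = defaultdict(list)            # undirected forest built by Join
    pers, critical_edges = {}, []

    for i, s in enumerate(filtration):
        if not isinstance(s, tuple):            # a vertex: new singleton tree
            order[s] = i
        else:                                   # an edge e = (u, u')
            u, v = s
            ru, rv = uf.find_set(u), uf.find_set(v)
            if ru == rv:                        # creator edge: stays critical
                critical_edges.append(s)
                pers[s] = float('inf')
            else:                               # destroyer: pairs with later root
                r_late = max(root[ru], root[rv], key=lambda x: order[x])
                r_early = min(root[ru], root[rv], key=lambda x: order[x])
                pers[s] = abs(f[s] - f[r_late])
                tree[u].append(v)               # Join the two trees via edge s
                tree[v].append(u)
                uf.union(u, v)
                root[uf.find_set(u)] = r_early  # the earlier root survives

    # Match every non-root vertex with the tree edge towards its root.
    match, parent = {}, {}
    roots = {root[uf.find_set(v)] for v in vertices}
    for r in roots:                             # r stays critical in the DMVF
        stack, seen = [r], {r}
        while stack:
            x = stack.pop()
            for y in tree[x]:
                if y not in seen:
                    seen.add(y)
                    parent[y] = (x, (x, y))     # y pairs with the edge to x
                    match[y] = (x, y)
                    stack.append(y)
    return critical_edges, pers, match, parent
```

We have the following result.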

Theorem 10.6. Given a simplicial 1-complex or a simplicial 2-manifold K with n simplices, one
can compute

1. a DMVF by cancelling all persistence pairs resulting from a given filtration of K in O(nα(n))
time;

2. a DMVF as above when the filtration is induced by a given PL-function on K in O(n log n)
time.
Proof. We argue for all statements in the theorem when K is a 1-complex. By considering the
dual graph K∗ and combining Propositions 10.3 and 10.4, the arguments also hold
for K when it is a simplicial 2-manifold. The algorithm SimplePersDMVF outputs the same as
the algorithm PersDMVF whose correctness follows from Theorem 10.5 because it cancels the
persistence pairs exactly as the theorem dictates. The complexity analysis of the algorithm Sim-
plePersDMVF establishes the first statement. For the second statement, given the function values
at the vertices of K, we can compute a simplex-wise lower star filtration (Section 3.5) in O(n log n)
time after sorting these function values. A subsequent application of SimplePersDMVF on this
lower star filtration provides us the desired DMVF. 

We can modify SimplePersDMVF slightly to take into account a threshold δ for persistence,
that is, we can cancel pairs only with persistence up to δ. To do this, we need a slightly different
version of Proposition 10.3. The cancellation also succeeds if we cancel persistent pairs in the
order of their persistence values. The proof of Proposition 10.3 can be adapted for the following
proposition.

Proposition 10.7. Let (v1 , e1 ), (v2 , e2 ), · · · , (vn , en ) be the sequence of all non-essential persis-
tence pairs of vertices and edges sorted in non-decreasing order of their persistence for a given
filtration of K. Let V0 be the DMVF in K with all simplices being critical. Suppose DMVF Vi−1
can be obtained by cancelling successively (v1 , e1 ), (v2 , e2 ), · · · (vi−1 , ei−1 ). Then, (vi , ei ) can be
cancelled in Vi−1 providing a DMVF Vi for all i ≥ 1.

The modified algorithm proceeds as in SimplePersDMVF, but designates those edges critical
whose persistence is more than δ. Then, before traversing the edges of the trees in the forest T
to output the vertex-edge pairs, we delete all these critical edges from T. This splits the trees in
T and creates more trees. We need to determine the roots of these trees. Observe that, had we
done the cancellations as in PersDMVF, the roots of the trees would have been the vertices that
appear the earliest in the filtration among all vertices in the respective trees. So, all trees in T
obtained after deleting all critical edges are rooted at the vertices that appear the earliest in the
filtration. Then, the steps 17 to 22 in SimplePersDMVF compute the vertex-edge matchings into
the DMVF from these rooted trees. The new algorithm called PartialPersDMVF modifies step
13 of the algorithm SimplePersDMVF as:

• if Pers (e) > δ then designate e critical in V endif; Join(Tu , Tu′ )

Also, PartialPersDMVF introduces a step before step 17 of SimplePersDMVF as:

• delete all critical edges from T and create new rooted trees in T as described
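In code, these two modifications amount to a guard in the join step and a pruning pass before the matching. A minimal sketch on top of the simple_pers_dmvf sketch above (delta and the extra bookkeeping are our own names, not the book's):

```python
# Modified step 13, inside the destroyer branch of simple_pers_dmvf:
#     if pers[s] > delta:
#         critical_edges.append(s)           # e is designated critical in V
#     tree[u].append(v); tree[v].append(u)   # Join is performed in either case

def prune_critical_edges(tree, critical_edges):
    """New step before step 17: delete all critical edges from the forest T.
    Each resulting tree is then re-rooted at its vertex appearing earliest
    in the filtration before the leaf-to-root matching, as described above."""
    for u, v in critical_edges:
        if v in tree[u]:
            tree[u].remove(v)
            tree[v].remove(u)
```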

We claim that PartialPersDMVF indeed computes a DMVF guaranteed by Proposition 10.7 (Exercise 9).

Claim 10.1. PartialPersDMVF(F, δ) computes a DMVF obtained by cancelling persistence pairs in non-decreasing order of persistence values which do not exceed the input threshold δ.

Let Vδ denote the resulting DMVF after canceling all vertex-edge persistence pairs with per-
sistence at most δ.
Proposition 10.8. The following statements hold for the output T of the algorithm PartialPers-
DMVF w.r.t any δ ≥ 0:
(i) For each tree T i , its root ri is the only critical simplex in Vδ ∩ T i . The collection of these
roots corresponds exactly to those vertices whose persistence is bigger than δ.
(ii) Any edge with Pers (e) > δ remains critical in Vδ and cannot be contained in T.

10.3 Stable and unstable manifolds
In Section 1.5.2, we introduced the concept of Morse functions (Definition 1.28). These are
smooth functions f : Rd → R satisfying certain conditions. We defined critical points of these
functions and analyzed topological structures using the neighborhoods of these critical points.
Here, we introduce another well known structure associated with Morse functions and then draw a parallel between these smooth continuous structures and their discrete counterparts given by discrete Morse functions.

10.3.1 Morse theory revisited
For a point p ∈ R^d, recall that the gradient vector of f at a point p is ∇f(p) = [∂f/∂x_1 · · · ∂f/∂x_d]^T, which
represents the steepest ascending direction of f at p, with its magnitude being the rate of change.
An integral path of f is a maximal path π : (0, 1) → Rd where the tangent vector at each point
p of this path equals ∇ f (p), which is intuitively a flow path following the steepest ascending
direction at any point. Recall that a point p ∈ R^d is critical if its gradient vector vanishes, i.e., ∇f(p) = [0 · · · 0]^T. An integral path necessarily "starts" and "ends" at critical points of f; that is, lim_{t→0} π(t) = p with ∇f(p) = [0 · · · 0]^T, and lim_{t→1} π(t) = q with ∇f(q) = [0 · · · 0]^T. See Figure
10.4 where we show the graph of a function f : R2 → R, and there is an integral path from a
minimum v to a maximum t2 and also to a saddle point e2 .
For a critical point p, the union of p and all the points from integral lines flowing into p is
referred to as the stable manifold of p. Similarly, for a critical point q, the union of q and all
the points on integral lines starting from q is called the unstable manifold of q. The unstable
manifold of a minimum p intuitively corresponds to the basin/valley around p in the terrain of f.
The 1-unstable manifold of an index (d − 1) saddle consists of flow paths connecting this sad-
dle to maxima. These curves intuitively capture “mountain ridges” of the terrain (graph of the
function f ); see Figure 10.4 for an example. Symmetrically, the stable manifold of a maximum
q corresponds to the mountain around q. The 1-stable manifolds consist of a collection of curves
connecting minima to 1-saddles, corresponding intuitively to the “valley ridges”.
Now, we focus on a graph-reconstruction approach using Morse theory. Suppose that a den-
sity field ρ : Ω → R on a domain Ω ⊆ Rd is given where ρ concentrates around a hidden
geometric graph G embedded in Rd . We want to reconstruct G from ρ. Intuitively, we wish to use
the 1-unstable manifolds of saddles (mountain ridges) of the density field ρ to capture the hidden
graph.
However, to implement this idea, we will use discrete Morse theory, which provides ro-
bustness and simplicity due to its combinatorial nature. The cancellations guided by the per-
sistence pairings can help us remove noise introduced both by discretization and measurement errors. Below, we introduce some concepts necessary for transitioning to the discrete versions of (un)stable manifolds.
Figure 10.4: (Un)stable manifolds for a smooth Morse function on left and its discrete version
(shown partially) on right: (a) t1 and t2 are maxima (critical triangles in discrete Morse), v is a
minimum, e1 and e2 are saddles (critical edges in discrete Morse). The unstable manifold of e1
flows out of it to t1 and t2 . On the other hand, its stable manifolds flow out of minima such as v
and come to it. These flows work in the opposite direction of ‘gravity’ because if we put a drop
of water at x it will flow to v. If we put it on the other side of the mountain ridge it will flow to
the other minimum; (b) the flow direction reverses from the smooth case to the discrete case.

10.3.2 (Un)Stable manifolds in DMVF
The V-paths in a DMVF are analogues of the integral paths in the smooth setting. A V-path π : σ0 ≺ σ1 ≺ · · · ≺ σi−1 ≺ σi ≺ σi+1 ≺ · · · ≺ σk is a vertex-edge gradient path if the σi alternate
between edges and vertices. Similarly, it is an edge-triangle gradient path if they alternate between
triangles and edges. Also, we refer to vertex-edge or edge-triangle pairs as the gradient vertex-
edge and gradient edge-triangle vectors respectively.
Different from the smooth setting, a maximal V-path may not start or end at critical simplices.
However, those that do (i.e., when σ0 and σk are critical simplices) are exactly the critical V-paths.
These paths are discrete analogues of maximal integral paths in the smooth setting which “start”
and “end” at critical points. One can think of critical k-simplices in the discrete Morse setting
as index-k critical points in the smooth setting as defined in Section 1.5.2. For example, for a
function on R2 , critical 0-, 1- and 2-simplices in the discrete Morse setting correspond to minima,
saddles and maxima in the smooth setting, respectively.
There is one more caveat that one should be aware of. The integral paths and the V-paths run in opposite directions by definition: in the smooth setting, function values increase along an integral path, while in the discrete setting, they decrease along a V-path. This means that the stable and unstable manifolds reverse their roles in the two settings; refer to Figure 10.4.
For a critical edge e, we define its stable manifold to be the union of edge-triangle gradient paths that end at e. Its unstable manifold is defined to be the union of vertex-edge gradient paths that begin with e. In the graph reconstruction approach presented below, we use "mountain ridges"
for the reconstruction. We have seen that these are 1-unstable manifolds of saddles in the smooth
setting and hence correspond to 1-stable manifolds in the discrete gradient fields consisting of
triangle-edge paths. Notice that these mountain ridges on a triangulation of a d-manifold correspond to V-paths alternating between d- and (d − 1)-dimensional simplices. Computationally,
however, vertex-edge gradient paths are simpler to handle especially for the Morse cancellations
below. Hence in our algorithm below, we negate the density function ρ and consider the function
−ρ. The algorithm outputs a subset of the 1-unstable manifolds that are vertex-edge gradient paths
in the discrete setting as the recovered hidden graph.
With the above set up, we have an input function f : V(K) → R defined at the vertices V(K)
of a complex K whose linear extension leads to a PL function still denoted by f : |K| → R. For
computing persistence, we use the lower-star filtration F f of f and its simplex-wise version as
described in Section 3.1.2.

10.4 Graph reconstruction
Suppose we have a domain Ω (which will be a cube in Rd ) and a density function ρ : Ω → R
that “concentrates” around a hidden geometric graph G ⊂ Ω. In the discrete setting, our input
will be a triangulation K of Ω and a density function given as a PL-function ρ : |K| → R. The
algorithm can be easily modified to take a cell complex as input. Our goal is to compute a graph
Ĝ approximating the hidden graph G.

10.4.1 Algorithm
Intuitively, we wish to use “mountain ridges” of the density field to approximate the hidden graph
as Figure 10.6 shows. We compute these ridges as the 1-stable manifolds (“valley ridges") of
f = −ρ, the negation of the density function. In the discrete setting, these become 1-unstable
manifolds consisting of vertex-edge gradient paths in an appropriate DMVF. We compute this
DMVF by cancelling vertex-edge persistence pairs whose persistence is at most a threshold δ.
The rationale behind this choice is that the small undulations in a 1-unstable manifold caused by
noise and discretization need to be ignored by cancellation. The procedure PartialPersDMVF
described earlier in Section 10.2.2 achieves this goal. Finally, the union of the 1-unstable mani-
folds of all remaining high-persistence critical edges is taken as the output graph Ĝ, as outlined
in Algorithm 21:CollectG. Algorithm 20:MorseRecon presents these steps.

Algorithm 20 MorseRecon(K, ρ, δ)
Input:
A 2-complex K, a vertex function ρ on K, a threshold δ
Output:
A graph
1: Let F be a simplex-wise lower star filtration of K w.r.t. f = −ρ.
2: Compute persistence Pers (e) for every edge e for the filtration F.
3: Let K 1 be the 1-skeleton of K and F 1 be F restricted to vertices and edges only
4: Let T be the forest computed by PartialPersDMVF(K 1 ,F 1 ,δ)
5: CollectG(K 1 ,T, Pers (·), δ)
Algorithm 21 CollectG(K 1 ,T, Pers (·), δ)
Input:
A 1-skeleton K 1 , a forest T ⊆ K 1 , persistence values for edges in K 1 , a threshold δ
Output:
A graph
1: Ĝ := ∅
2: for every edge e = (u, v) ∈ K 1 \ T do
3: if Pers (e) > δ then
4: Let π(u) and π(v) be the two paths from u and v to the roots respectively;
5: Set Ĝ := Ĝ ∪ π(u) ∪ π(v) ∪ {e}
6: end if
7: end for
8: Return Ĝ

Since we only need 1-unstable manifolds, K is assumed to be a 2-complex. Notice that one only needs to cancel vertex-edge pairs: this is because only vertex-edge gradient vectors contribute to the 1-unstable manifolds, and also new vertex-edge vectors can only be generated while canceling other vertex-edge pairs.
Let T1 , T2 , . . . , Tk be the set of trees returned by PartialPersDMVF. The routine CollectG outputs the 1-unstable manifold of every edge e = (u, v) with Pers (e) > δ, which is simply the union of e and the unique paths from u and v to the roots of the trees containing them, respectively.
Notice that we still need to compute the persistence for all edges. If it were only for those
edges that pair with vertices, we could have eliminated step 2 in MorseRecon and computed the
persistence of these edges in PartialPersDMVF in almost linear time (Theorem 10.6). However,
to compute persistence for edges that pair with triangles, we have to use the standard persistence
algorithm whose complexity again depends on the complex K. For example, if K is a simplicial
2-manifold, this can run in O(nα(n)) time (Section 3.6, Notes and Exercises in Chapter 3); but this time complexity does not hold for a general 2-complex K. To take into account this dependence of the time complexity on the type of K, we simply denote the time for computing persistence by Pert(K) in the following theorem.

Theorem 10.9. The time complexity of the algorithm MorseRecon is O(Pert(K)), where Pert(K)
is the time to compute persistence pairings for K.

We remark that, for K with n vertices and edges, collecting all 1-unstable manifolds takes
O(n) time if one avoids revisiting edges while tracing paths. This O(n) term is subsumed by
Pert(K) because there are at least n/2 such pairs.
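One way to realize this O(n) bound is to stop each trace as soon as it reaches a simplex that has already been collected, so that every tree edge is traversed at most once overall. A sketch under our own encoding (parent maps each non-root vertex to its tree parent and the connecting edge, as produced by the matching sketch earlier):

```python
def collect_g(edges, pers, parent, delta):
    """CollectG: union of the 1-unstable manifolds of all edges with
    Pers > delta.  Tracing stops at simplices already collected, so every
    tree edge is visited at most once, giving the O(n) collection time."""
    g_hat = set()
    for e in edges:
        if pers[e] > delta:
            g_hat.add(e)
            for v in e:                     # trace upward from both endpoints
                while v not in g_hat:       # stop where paths merge
                    g_hat.add(v)
                    if v not in parent:     # reached a root
                        break
                    p, tree_edge = parent[v]
                    g_hat.add(tree_edge)
                    v = p
    return g_hat
```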
Consider the DMVF Vδ computed by PartialPersDMVF. Notice that Proposition 10.8(i) implies that for each Ti , any V-path of Vδ starting at a vertex or an edge in Ti terminates at its root ri . See Figure 10.3 for an example. Hence for any vertex v ∈ Ti , the path π(v) computed by CollectG is the unique V-path starting at v. This immediately leads to the following result:

Corollary 10.10. For each critical edge e = (u, v) with Pers (e) > δ, π(u) ∪ π(v) ∪ {e} computed
by the algorithm CollectG is the 1-unstable manifold of e in Vδ .
Figure 10.5: Noise model for graph reconstruction.

10.4.2 Noise model
To establish theoretical guarantees for the graph reconstructed by the algorithm MorseRecon, we
assume a noise model for the input. We first describe the noise model in the continuous setting
where the domain is the k-dimensional unit cube Ω = [0, 1]^k. We then explain the setup in the discrete
setting when the input is a triangulation K of Ω.
Given a connected "true graph" G ⊂ Ω, consider an ω-neighborhood Gω ⊆ Ω, meaning that (i) G ⊆ Gω , and (ii) for any x ∈ Gω , d(x, G) ≤ ω (i.e., Gω is sandwiched between G and its ω-offset). Given Gω , we use cl(Ḡω) = cl(Ω \ Gω) to denote the closure of its complement Ḡω = Ω \ Gω .
Figure 10.5 illustrates the noise model in the discrete setting, showing G (red graph) with its
ω-neighborhood Gω (yellow).
Definition 10.6 ((β, ν, ω)-approximation). A density function ρ : Ω → R is a (β, ν, ω)-approximation
of a connected graph G if the following holds:
C-1 There is an ω-neighborhood Gω of G such that Gω deformation retracts to G.
C-2 ρ(x) ∈ [β, β + ν] for x ∈ Gω ; and ρ(x) ∈ [0, ν] otherwise. Furthermore, β > 2ν.
Intuitively, this noise model requires that the density ρ concentrates around the true graph G in
the sense that the density is significantly higher inside Gω than outside; and the density fluctuation
inside or outside Gω is small compared to the density value in Gω (condition C-2). Condition C-1 says that the neighborhood has the same topology as the hidden graph. Such a density field could
for example be generated as follows: Imagine that there is an ideal density field fG : Ω → R
where fG (x) = β for x ∈ Gω and 0 otherwise. There is a noisy perturbation g : Ω → R whose size
is always bounded by g(x) ∈ [0, ν] for any x ∈ Ω. The observed density field ρ = fG + g is a (β, ν, ω)-approximation of G.
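For intuition, such a field is easy to synthesize on a grid over Ω = [0, 1]^2 following exactly this recipe; the band-shaped graph mask and the uniform noise below are our own illustrative choices, not part of the model.

```python
import numpy as np

def synthetic_density(in_G_omega, beta, nu, rng=None):
    """Build rho = f_G + g on a grid: f_G equals beta on the omega-neighborhood
    (given as a boolean mask) and 0 outside; g is noise with values in [0, nu].
    The condition beta > 2 * nu should hold for a (beta, nu, omega)-approximation.
    """
    rng = rng or np.random.default_rng(0)
    f_G = np.where(in_G_omega, beta, 0.0)
    g = rng.uniform(0.0, nu, size=in_G_omega.shape)
    return f_G + g

# Example: a horizontal line graph thickened to an omega = 0.05 band in [0,1]^2.
n = 128
ys, xs = np.mgrid[0:n, 0:n] / (n - 1)
mask = np.abs(ys - 0.5) <= 0.05
rho = synthetic_density(mask, beta=1.0, nu=0.3)   # beta > 2*nu holds
```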
In the discrete setting when we have a triangulation K of Ω, we define an ω-neighborhood Gω to be a subcomplex of K, i.e., Gω ⊆ K, such that (i) G is contained in the underlying space of Gω and (ii) for any vertex v ∈ V(Gω ), d(v, G) ≤ ω. The complex cl(Ḡω) ⊆ K is simply the smallest subcomplex of K that contains all simplices from K \ Gω (i.e., all simplices not in Gω and their faces). A (β, ν, ω)-approximation of G is extended to this setting by a PL-function ρ : |K| → R while requiring that the underlying space of Gω deformation retracts to G as in (C-1), and the density conditions in (C-2) are satisfied at vertices of K.
We remark that the noise model is rather limited: in particular, it does not allow significant non-uniform density distribution. However, this is the only case for which theoretical guarantees are known at the moment for a discrete Morse based reconstruction framework. In practice, the algorithm has often been applied to non-uniform density distributions.

10.4.3 Theoretical guarantees
In this subsection, we prove results that are applicable to hypercube domains of any dimension.
Recall that Vδ is the discrete gradient field after the cancellation process with threshold δ, where
we perform cancellation for vertex-edge persistence pairs generated by a simplex-wise filtration
induced by the PL-function f = −ρ that negates the density PL-function. At this point, all
positive edges, i.e., those not paired with vertices, remain critical in Vδ. Some negative edges, i.e., those paired with vertices, also remain critical in Vδ; these are exactly the negative edges with
persistence bigger than δ. CollectG only takes the 1-unstable manifolds of those critical edges
(positive or negative) with persistence bigger than δ; so those edges whose persistence is at most
δ are ignored.

Input assumption. Let ρ be an input density field which is a (β, ν, ω)-approximation of a con-
nected graph G, and δ ∈ [ν, β − ν).

Under the above input assumption, let Ĝ be the output of algorithm MorseRecon(K, ρ, δ). The
proof of the following result can be found in [138].

Proposition 10.11. Under the input assumption, we have:

(i) There is a single critical vertex left after MorseRecon returns, which is in Gω .

(ii) Every critical edge considered by CollectG forms a persistence pair with a triangle.

(iii) Every critical edge considered by CollectG is in Gω .

Theorem 10.12. Under the input assumption, the output graph satisfies Ĝ ⊆ Gω .

Proof. Recall that the output graph Ĝ consists of the union of 1-unstable manifolds of all the edges e∗1 , . . . , e∗g with persistence larger than δ. By Proposition 10.11(ii) and (iii), they are all positive (paired with triangles) and contained inside Gω . Below we show that the other simplices in their 1-unstable manifolds are also contained in Gω .
Take any i ∈ [1, g] and consider e∗i = (u, v). Without loss of generality, consider the critical V-path π : e∗i ≺ (u = u1 ) ≺ e1 ≺ u2 ≺ . . . ≺ e s ≺ u s+1 . By definition u s+1 is a critical vertex and is necessarily the global minimum v0 for the density field ρ, which is also contained inside Gω . We now argue that all simplices in the path π lie inside Gω . In fact, we argue a stronger statement: first, we say that a gradient vector (v, e) is crossing if v ∈ Gω and e ∉ Gω (i.e., e ∈ cl(Ḡω)). Since v is an endpoint of e, this means that the other endpoint of e must lie in K \ Gω .

Claim 10.2. During the cancellation with threshold δ in the algorithm MorseRecon, no crossing
gradient vector is ever produced.
Proof. Suppose the claim is not true. Then, let (v, e) be the first crossing gradient vector ever produced during the cancellation process. Since we start with a trivial discrete gradient vector field, the creation of (v, e) can only be caused by reversing some gradient path π′ connecting two critical simplices v′ and e′ while we are performing cancellation for the persistence pair (v′, e′). Obviously, Pers (e′) ≤ δ because otherwise cancellation would not have been performed. On the other hand, due to our (β, ν, ω)-noise model and the choice of δ, it must be that either both v′, e′ ∈ Gω or both v′, e′ ∈ K \ Gω; otherwise, the persistence of this pair would be larger than β − ν > δ.
Now consider the V-path π′ connecting e′ and v′ in the current discrete gradient vector field V′. The path π′ begins and ends with simplices that are either both in Gω or both outside Gω , and it also has simplices both inside and outside Gω . It follows that the path π′ contains a gradient vector (v″, e″) crossing from inside to outside, that is, v″ ∈ Gω and e″ ∉ Gω . In other words, it must contain a crossing gradient vector. This, however, contradicts our assumption that (v, e) is the first crossing gradient vector. Hence, the assumption is wrong and no crossing gradient vector can ever be created. □

As there is no crossing gradient vector during and after cancellation, it follows that π, which is
one piece of the 1-unstable manifold of the critical edge e∗i , has to be contained inside Gω . The
same argument works for the other piece of the 1-unstable manifold of e∗i which starts from the other
endpoint of e∗i . Since this holds for any i ∈ [1, g], the theorem follows. 

The previous theorem shows that Ĝ is geometrically close to G. Next we show that they are
also close in topology.

Proposition 10.13. Under the input assumption, Ĝ is homotopy equivalent to G.

Proof. First we show that Ĝ is connected. Then, we show that Ĝ has the same first Betti number
as that of G, which implies the claim since any two connected graphs in R^k with the same first Betti number are homotopy equivalent. Suppose that Ĝ has at least two components. These two components should come from two trees in the forest computed by PartialPersDMVF. The roots, say r and r′, of these two trees must reside in Gω due to Claim 10.2 and Proposition 10.11(iii). Furthermore, the supporting complex of Gω is connected because it contains the connected graph G. It follows that there is a path connecting r and r′ within Gω . All vertices and edges in Gω appear earlier than other vertices and edges in the filtration that PartialPersDMVF works on. These two facts mean that the first edge which connects the two trees rooted at r and r′ resides in Gω . This edge has persistence less than δ and should be included in the reconstruction by MorseRecon. It follows that CollectG returns 1-unstable manifolds of edges ending at a common root of the tree containing both r and r′. In other words, Ĝ cannot have two components as assumed.
The underlying space of ω-neighborhood Gω of G deformation retracts to G by definition.
Observe that, by our noise model, Gω is a sublevel set in the filtration that determines the per-
sistence pairs. This sublevel set being homotopy equivalent to G must contain exactly g positive
edges where g is the first Betti number of G. Each of these positive edges pairs with a triangle in Ḡω . Therefore, Pers (e) > δ for each of the g positive edges in Gω . By our earlier results, these
are exactly the edges that will be considered by procedure CollectG. Our algorithm constructs Ĝ
by adding these g positive edges to the spanning tree, each of which adds a new cycle. Thus, Ĝ has first Betti number g as well, proving the proposition. □

We have already proved that Ĝ is contained in Gω . This fact along with Proposition 10.13 can be used to argue that any deformation retraction taking (the underlying space of) Gω to G also takes Ĝ to a subset G′ ⊆ G where G′ and G have the same first Betti number. In what follows, we use Gω to denote also its underlying space.

Theorem 10.14. Let H : Gω × [0, 1] → Gω be any deformation retraction so that H(Gω , 1) = G. Then, the restriction H|Ĝ : Ĝ × [0, 1] → Gω is a homotopy from the embedding Ĝ to G′ ⊆ G where G and G′ have the same first Betti number.

Proof. The fact that H|Ĝ (·, ℓ) is continuous for any ℓ ∈ [0, 1] is obvious from the continuity of H. The only thing that needs to be shown is that G′ := H|Ĝ (Ĝ, 1) has the same first Betti number as that of G. We observe that a cycle in Ĝ created by a positive edge e along with the paths to the root of the spanning tree is also non-trivial in Gω because this is a cycle created by adding the edge e during the persistence filtration and the cycle created by the edge e is not destroyed in Gω . Therefore, a cycle basis for H1 (Ĝ) is also a homology basis for H1 (Gω ). Since the map H(·, 1) : Gω → G is a homotopy equivalence, it induces an isomorphism on the respective homology groups; in particular, a basis of H1 (Gω ) is mapped bijectively to a basis of H1 (G). Therefore, the image G′ = H|Ĝ (Ĝ, 1) must have a basis of cardinality g = β1 (Ĝ) = β1 (Gω ) = β1 (G), proving that β1 (G′) = β1 (G). □

10.5 Applications
10.5.1 Road network
Robust and efficient automatic road network reconstruction from GPS traces and satellite images
is an important task in GIS data analysis and applications. The Morse-based approach can help reconstruct the road network in both cases in a conceptually simple and clean manner. The framework provides a meaningful and robust way to remove noise because it is based on the concept of persistent homology. Intuitively, reconstruction of a road network from noisy data is tantamount to reconstructing a graph from a noisy function on a 2D domain. One needs to
eliminate noise and at the same time preserve the signal. Persistent homology and discrete Morse
theory help address both of these aspects. We can simply use the graph reconstruction algorithm
detailed in the previous section for this road network recovery.

GPS trajectories. Here the input is a set of GPS traces, and the goal is to reconstruct the underlying road network automatically from these traces. The input set of GPS traces can be converted
into a density map ρ : Ω → R defined on the planar domain Ω = [0, 1] × [0, 1]. We then use
our graph reconstruction algorithm MorseRecon to recover the “mountain ridges" of the density
field; see Figure 10.6.
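The conversion from traces to a density map can be as simple as binning the GPS sample points and smoothing; below is a hypothetical minimal version of such a step (real pipelines typically rasterize whole trajectory segments and tune the kernel width, which we omit here).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def traces_to_density(points, grid_size=512, sigma=2.0):
    """Bin GPS sample points (normalized to [0,1]^2) into a 2D histogram
    and smooth it into a density map rho on a grid_size x grid_size grid."""
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                bins=grid_size, range=[[0, 1], [0, 1]])
    return gaussian_filter(hist, sigma=sigma)
```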
In Figure 10.7, we show the reconstructed road network after improving the discrete-Morse based
output graphs with an editing strategy [137]. After the automatic reconstruction, the user can
observe the missing branches and can recover them by artificially making a vertex near the tip of
each such branch a minimum. This forces a 1-unstable manifold from a saddle edge to each of
these minima. Similarly, if a distinct loop in the network is missing, the user can artificially make
a triangle in the center of the loop a maximum which forces the loop to be detected.

Figure 10.6: Road network reconstruction [295]: (left) input GPS traces; (right) terrain corre-
sponding to the graph of the density function computed from input GPS traces; black lines are
the output of algorithm MorseRecon, which captures the 'mountain ridges' of the terrain, corre-
sponding to the reconstructed road-network. The upper right is a top view of the terrain.

Figure 10.7: Road network reconstruction with editing [137]: (left) red points (minima) are added,
red branches are newly reconstructed for the Athens map (black curves are original reconstruction,
blue curves are input GPS traces); (middle) we also add blue triangles as maxima to capture many
missing loops; (right) upper: an example to show that adding extra triangles as maxima will
capture more loops, bottom: Berlin with adding both branches and loops.

Satellite images. In this case, we combine the Morse based graph reconstruction with a neural
network framework to recover the road network from input satellite images. First, we feed the
gray scale values of the input satellite image as a density function to MorseRecon. The output
graphs from a set of images are used to train a convolutional neural network (CNN), which outputs an image aiming to capture only the foreground (roads) in the satellite images. After training this
CNN, we feed the original satellite images to it to obtain a set of hopefully “cleaner" images.
These cleaned images are again fed to MorseRecon to output a graph which can again be used
to further train the CNN. Repeated use of this reconstruct-and-train step cleans the noise considerably. In Figure 2 (f) from Chapter Prelude, we show an example of the output of this strategy.
Notice that this strategy eliminates the need for curating the satellite images manually for creating
training samples.
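Schematically, this alternation can be written as a short loop; the function names below (train_cnn, apply_cnn, morse_recon) are placeholders for components whose details appear in [139], not an actual API.

```python
def iterate_morse_cnn(images, train_cnn, apply_cnn, morse_recon, rounds=3):
    """Alternate discrete-Morse reconstruction and CNN training/cleaning."""
    graphs = [morse_recon(img) for img in images]       # initial pseudo-labels
    for _ in range(rounds):
        cnn = train_cnn(images, graphs)                 # train on current graphs
        cleaned = [apply_cnn(cnn, img) for img in images]
        graphs = [morse_recon(img) for img in cleaned]  # re-extract graphs
    return graphs
```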

10.5.2 Neuron network
To understand neuronal circuitry in the brain, a first step is often to reconstruct the neuronal cell
morphology and cell connectivity from microscopic neuroanatomical image data. Earlier work
often focuses on single neuron reconstruction from high resolution images of specific regions in the brain. With the advancement of imaging techniques, whole brain imaging data are becoming
more and more common. Robust and efficient methods that can segment and reconstruct neurons
and/or connectivities from such images are highly desirable.
Figure 10.8: Discrete Morse based neuron morphology reconstruction from [294] (panels: input image and reconstructed neurons); image courtesy of Suyi Wang et al. (2018, fig. 13).

The discrete Morse based graph reconstruction algorithms have been applied to both fronts.
Neuron cells have tree morphology and can commonly be modeled as a rooted tree, where the root
of the tree is located in the soma (cell body) of the neuron. In Figure 10.8, we show the reconstructed
neuron morphology by applying the discrete Morse algorithm directly to an Olfactory Projection
Fibers data set (specifically, OP-2 data set) from the DIADEM challenge [259]. Specifically, the
input is an image stack acquired by a 2-channel confocal microscopy method. In the approach
proposed in [294], after some preprocessing, the discrete Morse based algorithm is applied to
the 3D volumetric data to construct a graph skeleton. A tree-extraction based algorithm is then
applied to extract a tree structure from the graph output.
The discrete Morse based graph reconstruction algorithm can also be used in a more sophis-
ticated manner to handle more challenging data. Indeed, a new neural network framework is
proposed in [16] to combine the reconstructed Morse graph as a topological prior with a UNet-like neural network architecture [269] for cell process segmentation from various neuroanatom-
ical image data. Intuitively, while UNet has been quite successful in image segmentation, such
approaches lack a global view (e.g., connectivity) of the structure behind the segmented signal.
Consequently, the output can contain broken pieces for noisy images, and features such as
junction nodes in input signal can be particularly challenging to recover. On the other hand, while
the DM-based graph reconstruction algorithm is particularly effective in capturing global graph struc-
tures, it may produce many false positives. The framework proposed in [16], called DM++, uses output from discrete Morse as a separate channel of input, and co-trains it together with the output of a specific UNet-like architecture called ALBU [61] so as to use these two inputs to complement each other. See Figure 10.9.
Figure 10.9: The DM++ framework proposed by [16], which combines both the DM output with
standard neural-network based output together via a Siamese neural network stack so as to use
these two inputs to augment each other and obtain better connected final segmentation; image
courtesy of Samik Banerjee et al. (2020, fig. 2b).
In particular, the UNet output helps to remove false positives from
discrete Morse output, while the Morse graph output helps to obtain better connectivity.

10.6 Notes and Exercises
Forman [161] developed the discrete analogue of the classical Morse theory in mathematics. This
analogy is exemplified by the following fact. Let C p denote the p-th chain group formed by the
p-dimensional critical cells in a discrete Morse vector field. It means that C p is a free Abelian
group with critical p-cells forming a basis assuming Z2 -additions. For a critical cell c p , define
the boundary operator ∂_p c^p = Σ_i (m_i mod 2) c_i^{p−1} where c_i^{p−1} is a critical (p − 1)-cell reachable by m_i V-paths from c^p. Extending the boundary operator to the chains we get the
boundary homomorphism ∂ p : C p → C p−1 . One can verify that ∂ p−1 ◦ ∂ p = 0 (Exercise 12) thus
leading to a valid discrete Morse chain complex. Naturally, we get a homology group H p from
this construction. It turns out that this homology group is isomorphic to the homology group of
the complex on which the DMVF is defined.
Several researchers brought the concept to the area of topological data analysis [22, 213, 223,
235]. King et al. [213] presented an algorithm to produce a discrete Morse function on a complex
from a given real-valued function on its vertices. Bauer et al. [22] showed that persistent pairs can
be cancelled in order of their persistence values for any simplicial 2-manifolds. They also gave
an O(n log n)-time algorithm for cancelling pairs that have persistence below a given threshold.
The cancellation algorithm and its analysis in this chapter follow this result though with a slightly
different presentation. This cancellation does not generalize to simplicial 2-complexes and beyond
as we have illustrated. Mischaikow and Nanda [235] proposed Morse cancellation as a tool to
simplify an input complex before computing persistence pairs. The combinatorial view of the
vector field given by the discrete Morse theory has recently been extended to dynamical systems,
see, e.g., [38, 238].
Starting with Lewiner et al. [223], several researchers proposed discrete Morse theory for ap-
plications in visualization and image processing. Gyulassy et al. [181], Delgado-Friedricks et
al. [118] and Robins et al. [267] used discrete Morse theory in conjunction with persistence based
cancellations for processing images and analyzing features for e.g., porous solids. Sousbie [282]
proposed using the theory for detecting filamentary structures in data for cosmic webs. These
works proposed using cancellations as long as they are permitted, acknowledging the fact that all
cancellations in a 2- or 3-complex may not be possible. Wang et al. proposed to use discrete
Morse complexes to compute unstable 1-manifolds as an output for a road network from GPS
data [295]. Using unstable 1-manifolds in a discrete Morse complex defined on a triangulation
in R2 to capture the hidden road network was proposed in this paper. Ultimately, this proposed
approach was implemented with a simplified algorithm and a proof of guarantee in [138]. The
material in Section 10.4 is taken from this paper. The application to road network reconstruction
from GPS trajectories and satellite images in Section 10.5 appeared in [137] and [139] respec-
tively. The application to neuron imaging data is taken from [16, 294].

Exercises
1. A Hasse diagram of a simplicial complex K is a directed graph that has a vertex vσ for every simplex σ in K and a directed edge from vσ to vσ′ if and only if σ′ is a cofacet of σ. Let M be a matching in K. Modify the Hasse diagram by reversing every edge that is directed from vσ to vσ′ where (σ′, σ) is in the matching M. Show that M induces a DMVF if and only if the modified Hasse diagram does not have any directed cycle.

2. Let f be a Morse function defined on a simplicial complex K. We say K collapses to K′ if there is a simplex σ with a single cofacet σ′ and K′ = K \ {σ, σ′}. Let Ka ⊆ K be the subcomplex where Ka = {σ | f (σ) ≤ a}. Show that there is a series of collapses (possibly empty) that brings Ka to Kb for any b ≤ a if there is no critical simplex with function value c where b < c < a.

3. Call a V-path extendible if it can be extended by a simplex at any of the two ends.
(a) Show an example of a non-extendible V-path that is not critical.

(b) Show that every non-extendible V-path in a simplicial 2-manifold without boundary must have at least one critical simplex.

4. Show that a discrete Morse function defines a Morse matching.

5. Let K be a simplicial Möbius strip with all its vertices on the boundary. Design a DMVF on
K so that there is only one critical edge and only one critical vertex and no critical triangle.

6. Prove that two V-paths that meet must have a common suffix.

7. Show the following:

(a) The strong Morse inequality implies the weak Morse inequality in Proposition 10.1.
(b) A matching which is not Morse may not satisfy Morse inequalities as in Proposi-
tion 10.1 but always satisfies the equality c p − c p−1 + · · · ± c0 = β p − β p−1 + · · · ± β0
for a p-dimensional complex K.
8. Consider a filtration of a simplicial complex K embedded in R^3. We want to create a DMVF where all persistent triangle-tetrahedron pairs with persistence less than a threshold can be cancelled. Show that this is always possible. Write an algorithm to compute the stable manifolds for each of the critical tetrahedra in the resulting DMVF.

9. Prove Claim 10.1.

10. We propose a different version of PartialPersDMVF by changing only step 13 of SimplePersDMVF as:

• if Pers (e) > δ then designate e critical in V else Join(Tu , Tu′ ) endif

Prove that this simple modification produces the same DMVF as the PartialPersDMVF described in the text.

11. Let K be a simplicial d-complex that has every (d − 1)-simplex incident to at most two d-
simplices. Extend Theorem 10.5 to prove that all persistent pairs between (d − 1)-simplices
and d-simplices arising from a filtration of K can be cancelled.

12. Prove ∂ p−1 ◦ ∂ p = 0 for the boundary operator defined for chain groups of critical cells as
described for discrete Morse chain complex in the notes above.
Chapter 11

Multiparameter Persistence and Decomposition
In previous chapters, we have considered filtrations that are parameterized by a single parameter
such as Z or R. Naturally, they give rise to a 1-parameter persistence module. In this chapter, we
generalize the concept and consider persistence modules that are parameterized by one or more
parameters such as Z^d or R^d. They are called multiparameter persistence modules in general. Mul-
tiparameter persistence modules naturally arise from filtrations that are parameterized by multiple
values such as the one shown in Figure 11.1 over two parameters.

Figure 11.1: A bi-filtration parameterized over curvature and radius, reprinted by permission
from Springer Nature: Springer Nature, Discrete & Computational Geometry, "The Theory of
Multidimensional Persistence", Gunnar Carlsson et al. [65], © 2009.

The classical algorithm of Edelsbrunner et al. [152] presented in Chapter 3 provides a unique
decomposition of the 1-parameter persistence module over Z implicitly generated by an input
simplicial filtration. Similarly, a multiparameter persistence module M over the grid Zd can be
implicitly given by an input multiparameter finite simplicial filtration, and we look for computing a decomposition (Definition 11.10) M ≅ ⊕_i M^i. The modules M^i are the counterparts of bars in the

1-parameter case and are called indecomposables. These indecomposables are more complicated
and cannot be completely characterized as in the one-parameter case. Nonetheless, for finitely
generated persistence modules defined over Zd , their existence is guaranteed by the Krull-Schmidt
theorem [10]. Figure 11.2 illustrates indecomposables of some modules.

Figure 11.2: Decomposition of a finitely generated 2-parameter persistence module: (left) rect-
angle decomposable module: each indecomposable is supported by either a bounded (A) or an unbounded rectangle (B and C); D is a free module; (right) interval decomposable module: each
indecomposable is supported over a 2D interval (defined in next chapter).

An algorithm for decomposing a multiparameter persistence module can be derived from the
so-called Meataxe algorithm which applies to much more general modules than we consider in
TDA at the expense of high computational cost. Sacrificing this generality and still encompassing
a large class of modules that appear in TDA, we can design a much more efficient algorithm.
Specifically, we present an algorithm that can decompose a finitely presented module with a time
complexity that is much better than the Meataxe algorithm though we lose the generality as the
module needs to be distinctly graded as explained later.
For measuring algorithmic efficiency, it is imperative to specify how the input module is pre-
sented. Assuming an index set of size s and vector spaces of dimension O(m), a 1-parameter
persistence module can be presented by a set of matrices of dimensions O(m) × O(m) each rep-
resenting a linear map Mi → Mi+1 between two consecutive vector spaces Mi and Mi+1 . This
input format is costly as it takes O(sm^2) space (an O(m^2)-size matrix for each index) and also does
not appear to offer any benefit in time complexity for computing the bars. An alternative pre-
sentation is obtained by considering the persistence module as a graded module over a polyno-
mial ring k[t] and presenting it with the so-called generators {gi } of the module and relations
{Σ_i α_i g_i = 0 | α_i ∈ k[t]} among them. A presentation matrix encoding the relations in terms of the generators characterizes the module completely. Then, a matrix reduction algorithm akin
to the persistence algorithm MatPersistence from Chapter 3 provides the desired decomposition (the persistence algorithm takes a filtration as input, whereas here a module is presented with input matrices). Figure 11.3 illustrates the advantage of this presentation over the other costly presentation. In practice, when the 1-parameter persistence module is given by an implicit simplicial filtration, one can apply the matrix reduction algorithm directly on a boundary matrix rather than first
computing a presentation matrix from it and then decomposing it. If there are O(n) simplices
constituting the filtration, the algorithm runs in O(n^3) time with simple matrix reductions and in O(n^ω) time with more sophisticated matrix multiplication techniques where ω < 2.373 is the
exponent for matrix multiplication.
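For reference, the column reduction at the heart of such a matrix algorithm can be sketched in a few lines over Z_2; this mirrors the spirit of MatPersistence from Chapter 3, with columns encoded as sets of nonzero row indices (that encoding is our own choice here).

```python
def reduce_matrix(columns):
    """Standard left-to-right column reduction over Z_2.

    `columns[j]` is the set of nonzero row indices of column j.  Returns the
    pairing low(j) -> j; columns that reduce to zero are the unpaired ones
    (for a presentation matrix these indicate free summands of the cokernel).
    """
    low_inv = {}                                  # lowest row index -> column
    for j in range(len(columns)):
        col = set(columns[j])
        while col and max(col) in low_inv:
            col ^= columns[low_inv[max(col)]]     # add conflicting column mod 2
        columns[j] = col
        if col:
            low_inv[max(col)] = j
    return low_inv
```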
Figure 11.3: Costly presentation (top) vs. graded presentation (bottom, right). The top chain can
be summarized by three generators g1 , g2 , g3 at grades (0), (1), (2) respectively, and two relations
0 = t^2 g1 + t g2, 0 = t^2 g2 + t g3 at grades (2), (3) respectively (Definition 11.5). The grades of
the generators and relations are given by the first times they appear in the chain. Finally, this information can be summarized succinctly by the presentation matrix on the right.

The Meataxe algorithm for multiparameter persistence modules follows the costly approach
analogous to the one in the 1-parameter case that expects the presentation of each individual linear
map explicitly. In particular, it expects the input d-parameter module M over a finite subset of
Zd to be given as a large matrix in kD×D with entries in a fixed field k = Zq , where D is the sum
of dimensions of vector spaces over all points in Zd supporting M. The time complexity of the
Meataxe algorithm is O(D6 log q) [196]. In general, D might be quite large. It is not clear what is
the most efficient way to transform an input that specifies generators and relations ( or a simplicial
filtration) to a representation matrix required by the Meataxe algorithm. A naive approach is to
consider the minimal sub-grid in Zd that supports the non-trivial maps. In the  worst-case,
 with
N being the total number of generators and relations, one has to consider O( Nd ) = O(N d ) grid
points in Zd each with a vector space of dimension O(N). Therefore, D = O(N d+1 ) giving a
worst-case time complexity of O(N 6(d+1) log q). Even allowing approximation, the algorithm runs
in O(N 3(d+1) log q) time [197].
In this chapter, we take the alternate approach where the module is treated as a finitely pre-
sented graded module over multivariate polynomial ring R = k[t1 , · · · , td ] [108] and presented
with a set of generators and relations graded appropriately. Given a presentation matrix encod-
ing relations with generators, our algorithm computes a diagonalization of the matrix giving a
presentation of each indecomposable which the input module decomposes into. Compared to the
1-parameter case, we have to cross two main barriers for computing the indecomposables. First,
we need to allow row operations along with column operations for reducing the input matrix. In
1-parameter case, row operations become redundant because column operations already produce
the bars. Second, unlike in 1-parameter case, we cannot allow all left-to-right column or bottom-
to-top row operations for the matrix reduction because the parameter space Zd , d > 1, unlike Z
only induces a partial order on these operations. These two difficulties are overcome by an in-
cremental approach combined with a linearization trick. Given a presentation matrix with a total
of O(N) generators and relations that are graded distinctly, the algorithm runs in O(N 2ω+1 ) time.
Surprisingly, the complexity does not depend on the parameter d.
Computing a presentation matrix from a multiparameter simplicial filtration is not easy. For d-parameter filtrations with n simplices, a presentation matrix of size O(n^{d−1}) × O(n^{d−1}) can be computed in O(n^{d+1}) time by adapting an algorithm of Skryzalin [279] as detailed in [142].
We will not present this construction here. Instead, we focus on the two cases, 2-parameter
persistence modules where the homology groups could be multi-dimensional and d-parameter
persistence modules where the homology group is only zero dimensional. For these two cases,
we can compute the presentation matrices more efficiently. For the 2-parameter case, Lesnick
and Wright [222] give an efficient O(n^3) algorithm for computing a presentation matrix from an
input filtration. In this case, N, the total number of generators and relations, is O(n). For the 0-th
homology groups, presentation matrices are given by the boundary matrices straightforwardly as
detailed in Section 11.5.2 giving N = O(n).

11.1 Multiparameter persistence modules
We define persistence modules in this chapter differently using the definition of graded modules
in algebra. Graded module structures provide an appropriate framework for defining the mul-
tiparameter persistence, in particular, for the decomposition algorithm that we present. Also,
navigating between the simplicial filtration and the module induced by it becomes natural with
the graded module structure.

11.1.1 Persistence modules as graded modules
First, we recall the definition of modules from Section 2.4.1, which requires a ring. We consider a module where the ring R is taken to be a polynomial ring.

Definition 11.1 (Polynomial ring). Given a variable t and a field k, the set of polynomials given
by
k[t] = {a_0 + a_1 t + a_2 t^2 + · · · + a_n t^n | n ≥ 0, a_i ∈ k}
forms a ring with usual polynomial addition and multiplication operations. The definition can be
extended to multivariate polynomials:
k[t] = k[t_1, · · · , t_k] = {Σ_{i_1,··· ,i_k} a_{i_1···i_k} t_1^{i_1} · · · t_k^{i_k} | i_1, · · · , i_k ≥ 0, a_{i_1···i_k} ∈ k}.

We use polynomial rings to define multiparameter persistence modules. Specifically, let R = k[t_1, · · · , t_d] be the d-variate polynomial ring for some d ∈ Z+ with k being a field. Throughout this chapter, we assume coefficients are in k. Hence homology groups are vector spaces.

Definition 11.2 (Graded module). A Z^d-graded R-module (graded module in brief) is an R-module M that is a direct sum of k-vector spaces M_u indexed by u = (u_1, u_2, . . . , u_d) ∈ Z^d, i.e. M = ⊕_u M_u, such that the ring action satisfies that ∀i, ∀u ∈ Z^d, t_i · M_u ⊆ M_{u+e_i}, where {e_i}_{i=1}^d is the standard basis in Z^d. The indices u ∈ Z^d are called grades.

Another interpretation of a graded module is that, for each u ∈ Z^d, the action of t_i on M_u determines a linear map t_i• : M_u → M_{u+e_i} by (t_i•)(m) = t_i · m. So, we can also describe a graded
Figure 11.4: A graded 2-parameter module. All sub-diagrams of maps and compositions of maps
are commutative.

module equivalently as a collection of vector spaces {M_u}_{u∈Z^d} with a collection of linear maps {t_i• : M_u → M_{u+e_i}, ∀i, ∀u} where the commutative property (t_j•) ◦ (t_i•) = (t_i•) ◦ (t_j•) holds. The commutative diagram in Figure 11.4 shows a graded module for d = 2, also called a bigraded module.
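To make this second description concrete, a finitely supported bigraded module can be stored as matrices over a small grid and the commutativity condition checked directly; the dictionary encoding below is our own illustration, not notation used in the text.

```python
import numpy as np

def is_commutative(maps, grades):
    """maps[(u, i)] holds the matrix of t_i : M_u -> M_{u+e_i} for u in
    `grades` (2-parameter case, i in {0, 1}).  Verify that the two
    compositions M_u -> M_{u+e_1+e_2} agree at every grade u."""
    for u in grades:
        u1 = (u[0] + 1, u[1])                    # u + e_1
        u2 = (u[0], u[1] + 1)                    # u + e_2
        keys = [(u, 0), (u1, 1), (u, 1), (u2, 0)]
        if all(k in maps for k in keys):
            via_e1 = maps[(u1, 1)] @ maps[(u, 0)]   # t_2 after t_1
            via_e2 = maps[(u2, 0)] @ maps[(u, 1)]   # t_1 after t_2
            if not np.array_equal(via_e1, via_e2):
                return False
    return True
```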

Definition 11.3 (Graded module R). There is a special graded module M where M_u is the k-vector space generated by t^u = t_1^{u_1} t_2^{u_2} · · · t_d^{u_d} and the ring action is given by the ring R. We denote it with R, not to be confused with the ring R which is used to define it.

Before we introduce persistence modules as instances of graded modules, we extend the no-
tion of simplicial filtration to the multiparameter framework.

Definition 11.4 (d-parameter filtration). A (d-parameter) simplicial filtration is a family of sim-
plicial complexes {Xu }u∈Zd such that for each grade u ∈ Zd and each i = 1, · · · , d, Xu ⊆ Xu+ei .

Figure 11.5 shows an example of a 2-parameter filtration and various graded modules associ-
ated with it. The module resulting with the homology group at the bottom right is a persistence
module. The figure also shows other graded modules of chain groups (left) and boundary groups
(middle).

Definition 11.5 (d-parameter persistence module). We call a Zd -graded R-module M a d-parameter
persistence module when Mu for each u ∈ Zd is a homology group defined over a field and
the linear maps corresponding to ring actions among them are induced by inclusions in a d-
parameter simplicial filtration. We call M finitely generated if there exists a finite set of elements
{g1 , · · · , gn } ⊆ M such that each element m ∈ M can be written as an R-linear combination of
these elements, i.e. m = Σ_{i=1}^n α_i g_i with α_i ∈ R. We call this set {g_i} a generating set or generators of M. A generating set is called minimal if its cardinality is minimal among all generating sets. The R-linear combinations Σ_{i=1}^n α_i g_i that are 0 are called relations. We will see later that a module can be represented by a set of generators and relations.
Figure 11.5: (top) An example of a 2-parameter simplicial filtration. Each square box indicates what is the current (filtered) simplicial complex at the bottom left grid point of the box. (bottom) We show different modules considering different abelian groups arising out of the complexes with the ring actions on the arrows (see Section 11.5 for details): (a) the module of 0-th chain groups C_0, with A = diag(t_1, t_1, t_1) and B = diag(t_2, t_2, t_2); (b) the module of 0-th boundary groups B_0, with C = diag(t_1, t_1) and D = diag(t_2, t_2); (c) the module of the 0-th homology groups H_0: it has one connected component in the 0-th homology groups at all grades except (0, 0) and (1, 1), and has two connected components at grade (1, 1).

In this exposition, we assume that all modules are finitely generated. Such modules always
admit a minimal generating set. In our example in Figure 11.5, the vertex set {vb , vr , vg } is a
minimal generating set for the module of zero-dimensional homology groups.

Definition 11.6 (Morphism). A graded module morphism, called morphism in short, between two graded modules M and N is defined as an R-linear map $f : M \to N$ preserving grades: $f(M_u) \subseteq N_u, \forall u \in \mathbb{Z}^d$. Equivalently, it can also be described as a collection of linear maps $\{f_u : M_u \to N_u\}$ which gives the following commutative diagram for each $u$ and $i$:
$$\begin{array}{ccc}
M_u & \xrightarrow{\;t_i\;} & M_{u+e_i} \\
{\scriptstyle f_u}\downarrow & & \downarrow{\scriptstyle f_{u+e_i}} \\
N_u & \xrightarrow{\;t_i\;} & N_{u+e_i}
\end{array}$$

Two graded modules M, N are isomorphic if there exist two morphisms $f : M \to N$ and $g : N \to M$ such that $g \circ f$ and $f \circ g$ are identity maps. We denote it as $M \cong N$.
Definition 11.7 (Shifted module). For a graded module M and some u ∈ Zd , define a shifted
graded module M→u by setting (M→u )v = Mv−u for each v.
Definition 11.8 (Free module). We say a graded module is free if it is isomorphic to the direct sum of a collection of $R^j$'s, denoted as $\bigoplus_j R^j$, where each $R^j = R_{\to u_j}$ for some $u_j \in \mathbb{Z}^d$. Here R is the special graded module in Definition 11.3.


Definition 11.9 (Homogeneous element). We say an element $m \in M$ is homogeneous if $m \in M_u$ for some $u \in \mathbb{Z}^d$. We denote $gr(m) = u$ as the grade of such a homogeneous element. To emphasize the grade of a homogeneous element, we also write $m^{gr(m)} := m$.

A minimal generating set of a free module is called a basis. We usually further require that all the elements (generators) in a basis are homogeneous. For a free module $F \cong \bigoplus_j R^j$ such a basis exists. Specifically, $\{e_j : j = 1, 2, \cdots\}$ is a homogeneous basis of F, where $e_j$ indicates the multiplicative identity in $R^j$. The generating set $\{e_j : j = 1, 2, \cdots\}$ is often referred to as the standard basis of $\bigoplus_j R^j = \langle e_j : j = 1, 2, \cdots \rangle$.

11.2 Presentations of persistence modules


Definition 11.10 (Decomposition). For a finitely generated graded module M, we call $M \cong \bigoplus_i M^i$ a decomposition of M for some collection of modules $\{M^i\}$. We say M is indecomposable if $M \cong M^1 \oplus M^2 \implies M^1 = 0$ or $M^2 = 0$, where 0 denotes a trivial module. By the Krull-Schmidt theorem [10], there exists an essentially unique (up to permutation and isomorphism) decomposition $M \cong \bigoplus_i M^i$ with every $M^i$ being indecomposable. We call it the total decomposition of M.

For example, the free module R in Definition 11.3 is generated by $\langle e_1^{(0,0)} \rangle$ and the free module $R_{\to(0,1)} \oplus R_{\to(1,0)}$ is generated by $\langle e_1^{(0,1)}, e_2^{(1,0)} \rangle$. A free module M generated by $\langle e_j^{u_j} : j = 1, 2, \cdots \rangle$ has a (total) decomposition $M \cong \bigoplus_j R_{\to u_j}$.

Definition 11.11 (Isomorphic morphisms). Two morphisms $f : M \to N$ and $f' : M' \to N'$ are isomorphic, denoted as $f \cong f'$, if there exist isomorphisms $g : M \to M'$ and $h : N \to N'$ such that the following diagram commutes:
$$\begin{array}{ccc}
M & \xrightarrow{\;f\;} & N \\
{\scriptstyle g}\downarrow & & \downarrow{\scriptstyle h} \\
M' & \xrightarrow{\;f'\;} & N'
\end{array}$$

Essentially, like isomorphic modules, two isomorphic morphisms can be considered the same. For two morphisms $f_1 : M^1 \to N^1$ and $f_2 : M^2 \to N^2$, there exists a canonical morphism $g : M^1 \oplus M^2 \to N^1 \oplus N^2$, $g(m_1, m_2) = (f_1(m_1), f_2(m_2))$, which is essentially uniquely determined by $f_1$ and $f_2$ and is denoted as $f_1 \oplus f_2$. A module is trivial if it has only the element 0 at every grade. We denote a trivial morphism by $0 : 0 \to 0$. Analogous to the decomposition of a module, we can also define a decomposition of a morphism.

Definition 11.12 (Morphism decomposition). A morphism $f$ is indecomposable if $f \cong f_1 \oplus f_2 \implies f_1$ or $f_2$ is the trivial morphism $0 : 0 \to 0$. We call $f \cong \bigoplus_i f_i$ a decomposition of $f$. If each $f_i$ is indecomposable, we call it a total decomposition of $f$.

Like decompositions of modules, the total decomposition of a morphism is also essentially unique.

11.2.1 Presentation and its decomposition


To study total decompositions of persistence modules that are treated as graded modules, we draw upon the idea of presentations of graded modules and build a bridge between decompositions of persistence modules and corresponding presentations. Decompositions of presentations can be transformed into a matrix reduction problem with possibly nontrivial constraints, which we introduce in Section 11.3. We first state a result saying that there are one-to-one correspondences between persistence modules, presentations, and presentation matrices. Recall that, by assumption, all modules are finitely generated. A graded module, hence a persistence module, admits a description called its presentation that aids finding its decomposition. We remind the reader that a sequence of maps is exact if the image of one map equals the kernel of the next map.
Definition 11.13 (Presentation). A presentation of a graded module H is an exact sequence $F^1 \xrightarrow{\;f\;} F^0 \overset{g}{\twoheadrightarrow} H \to 0$ where $F^1$, $F^0$ are free. We call $f$ a presentation map. We say a graded module H is finitely presented if there exists a presentation of H with both $F^1$ and $F^0$ being finitely generated.

The exactness of the sequence implies that $\mathrm{im}\, f = \ker g$ and $\mathrm{im}\, g = H$. The double-headed arrow on the second map in the sequence signifies the surjectivity of $g$. It follows that $\mathrm{coker}\, f \cong H$ and the presentation is determined by the presentation map $f$.
Remark 11.1. Presentations of a given graded module are not unique. However, there exists an essentially unique (up to isomorphism) presentation $f$ of a graded module in the sense that any presentation $f'$ of that module can be written as $f' \cong f \oplus f''$ with $\mathrm{coker}\, f'' = 0$. We call this unique presentation the minimal presentation. See more details of the construction and properties of minimal presentations in [142].
Definition 11.14 (Presentation matrix). Given a presentation $F^1 \xrightarrow{\;f\;} F^0 \twoheadrightarrow H$, fixed bases of $F^1$ (relations) and $F^0$ (generators) provide a matrix form $[f]$ of the presentation map $f$, which we call a presentation matrix of H. It has entries in R. In the special case that H is a free module with $F^1$ being a zero module, we define the presentation matrix $[f]$ of H to be a null column matrix with matrix size $\ell \times 0$ for some $\ell \in \mathbb{N}$.

In Figure 11.6, we illustrate the presentation matrix of the persistence module $H_0$ consisting of zero-dimensional homology groups induced by the filtration shown in Figure 11.5. We will see later that, in this case, $f$ equals the boundary morphism $\partial_1 : C_1 \to C_0$, whose columns correspond to edges and whose rows correspond to vertices. For example, the red edge $e_r$, whose grade is (1, 1), has two boundary vertices: $v_b$, the blue vertex with grade (0, 1), and $v_r$, the red vertex with grade (1, 0). To bring $v_b$ to grade (1, 1), we need to multiply by the polynomial $t_1$. Similarly, to bring $v_r$ to grade (1, 1), we need to multiply by $t_2$. The corresponding entries in the column of $e_r$ are $t_1$ and $t_2$ respectively, indicated by shaded boxes. Actual matrices are shown later in Example 11.1.
An important property of a graded module H is that a decomposition of its presentation $f$ corresponds to a decomposition of H itself. The decomposition of $f$ can be computed by diagonalizing its presentation matrix $[f]$. Informally, a diagonalization of a matrix A is an equivalent matrix $A'$ in the following form (see the formal Definition 11.15 later):
$$A' = \begin{pmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{pmatrix}$$
All nonzero entries are in the $A_i$'s and we write $A \cong \bigoplus A_i$. It is not hard to see that for a map $f \cong \bigoplus f_i$, there is a corresponding diagonalization $[f] \cong \bigoplus [f_i]$. With these definitions, we have the following theorem that motivates the decomposition algorithm (see the proof in [142]).

[Figure 11.6 here: a grid of grades next to the presentation matrix with columns $e_r^{(1,1)}, e_b^{(1,2)}, e_g^{(2,1)}$ and rows $v_b^{(0,1)}, v_r^{(1,0)}, v_g^{(1,1)}$.]

Figure 11.6: The presentation matrix of the module $H_0$ consisting of zero-dimensional homology groups for the example in Figure 11.5. The boxes in the matrix containing non-zero entries are shaded.

Theorem 11.1. There are 1-1 correspondences between the following three structures arising from a minimal presentation map $f : F^1 \to F^0$ of a graded module H, and its presentation matrix $[f]$:

1. a decomposition of the graded module $H \cong \bigoplus H^i$;

2. a decomposition of the presentation map $f \cong \bigoplus f_i$;

3. a diagonalization of the presentation matrix $[f] \cong \bigoplus [f]_i$.

Remark 11.2. From Theorem 11.1, we can see that there exists an essentially unique total decomposition of a presentation map and an essentially unique total diagonalization of the presentation matrix of H, which correspond to an essentially unique total decomposition of H (up to permutation, isomorphism, and trivial summands). In practice, we might be given a presentation which is not necessarily minimal. One way to handle this case is to compute the minimal presentation of the given presentation first. For 2-parameter modules, this can be done by an algorithm presented in [222]. The other choice is to compute the decomposition of the given presentation directly, which is sufficient to get the decomposition of the module thanks to the following proposition.

Proposition 11.2. Let $f$ be any presentation (not necessarily minimal) of a graded module H. The following statements hold:

1. for any decomposition $H \cong \bigoplus H^i$, there exists a decomposition $f \cong \bigoplus f_i$ such that $\mathrm{coker}\, f_i = H^i, \forall i$;

2. the total decomposition of H follows from the total decomposition of $f$.

Remark 11.3. By Remark 11.1, any presentation $f$ can be written as $f \cong f^* \oplus f'$ with $f^*$ being the minimal presentation and $\mathrm{coker}\, f' = 0$. Furthermore, $f'$ can be written as $f' \cong g \oplus h$ where $g$ is an identity map and $h$ is a zero map. The corresponding matrix form is $[f] \cong [f^*] \oplus [g] \oplus [h]$ with $[g]$ being an identity submatrix and $[h]$ being an empty matrix representing a collection of zero columns. Therefore, one can easily read these trivial parts from the result of matrix diagonalization. See the following diagram for an illustration:
$$[f] \;\cong\; \begin{pmatrix} [f^*] & 0 & 0 \\ 0 & I & 0 \\ 0 & 0 & \mathbf{0} \end{pmatrix}$$
where the three diagonal blocks correspond to $f^*$, $g$ (an identity submatrix), and $h$ (a block of zero columns) respectively.

11.3 Presentation matrix: diagonalization and simplification


Our aim is to compute a total diagonalization of a presentation matrix over $\mathbb{Z}_2$. All (graded) modules are assumed to be finitely presented, and we take $k = \mathbb{Z}_2$ for simplicity, though the method can be generalized to any finite field (Exercise 8). We have observed that a total decomposition of a module can be achieved by computing a total decomposition of its presentation $f$. This in turn requires a total diagonalization of the presentation matrix $[f]$. Here we formally define some notations about the diagonalization.
Given an $\ell \times m$ matrix $A = [A_{i,j}]$, with row indices $Row(A) = [\ell] := \{1, 2, \cdots, \ell\}$ and column indices $Col(A) = [m] := \{1, 2, \cdots, m\}$, we define an index block B of A as a pair $[Row(B), Col(B)]$ with $Row(B) \subseteq Row(A)$, $Col(B) \subseteq Col(A)$. We say an index pair $(i, j)$ is in B if $i \in Row(B)$ and $j \in Col(B)$, denoted as $(i, j) \in B$. We denote a block of A on B as the matrix restricted to the index block B, i.e. $[A_{i,j}]_{(i,j) \in B}$, denoted as $A|_B$. We call B the index of the block $A|_B$. We abuse the notations $Row(A|_B) := Row(B)$ and $Col(A|_B) := Col(B)$. For example, the $i$th row $r_i = A_{i,*} = A|_{[\{i\}, Col(A)]}$ and the $j$th column $c_j = A_{*,j} = A|_{[Row(A), \{j\}]}$ are blocks with indices $[\{i\}, Col(A)]$ and $[Row(A), \{j\}]$ respectively. Specifically, $[\emptyset, \{j\}]$ represents an index block of a single column $j$ and $[\{i\}, \emptyset]$ represents an index block of a single row $i$. We call $[\emptyset, \emptyset]$ the empty index block.
A matrix can have multiple equivalent forms representing the same morphism. We use $A' \sim A$ to denote the equivalence of matrices. One fact about equivalent matrices is that they can be obtained from one another by the row and column operations introduced later (Chapter 5 in [110]).

Definition 11.15 (Diagonalization). A matrix $A' \sim A$ is called a diagonalization of A with a set of nonempty disjoint index blocks $\mathcal{B} = \{B_1, B_2, \cdots, B_k\}$ if the rows and columns of A are partitioned into these blocks, i.e., $Row(A) = \bigsqcup_i Row(B_i)$ and $Col(A) = \bigsqcup_i Col(B_i)$, and all the nonzero entries of $A'$ have indices in some $B_i$ ($\bigsqcup$ denotes disjoint union). We write $A' = \bigoplus_{B_i \in \mathcal{B}} A'|_{B_i}$. We say $A' = \bigoplus_{B_i \in \mathcal{B}} A'|_{B_i}$ is total if no block in this diagonalization can be diagonalized further into smaller nonempty blocks. That means, for each block $A'|_{B_i}$, there is no nontrivial diagonalization. Specifically, when A is a null column matrix (the presentation matrix of a free module), we say A is itself a total diagonalization with index blocks $\{[\{i\}, \emptyset] \mid i \in Row(A)\}$.

Note that each nonempty matrix A has a trivial diagonalization with the set of index blocks being the singleton $\{[Row(A), Col(A)]\}$. Guaranteed by the Krull-Schmidt theorem [10], all total diagonalizations are unique up to permutations of their rows and columns, and equivalent transformations within each block. The total diagonalization of A is denoted generically as $A^*$. All total diagonalizations of A have the same set of index blocks, unique up to permutations of rows and columns. See Figure 11.7 for an illustration of a diagonalized matrix.
[Figure 11.7 here: a 6 × 5 matrix with patterned nonzero entries, shown before and after permuting rows and columns to make the blocks contiguous.]

Figure 11.7: (left) A nontrivial diagonalization where the locations of non-zero entries are patterned and the pattern for all such entries in the same block is the same. (right) The same matrix with a permutation of columns and rows bringing the entries of each block into adjacent locations; the three index blocks are $[(1, 4, 6), (1, 3)]$, $[(2, 3), (2, 4)]$, and $[(5), (5)]$.

11.3.1 Simplification
First we want to transform the diagonalization problem to an equivalent problem that involves
matrices with a simpler form. The idea is to simplify the presentation matrix to have entries only
in k which is taken as Z2 . There is a correspondence between diagonalizations of the original
presentation matrix and certain constrained diagonalizations of the corresponding transformed
matrix.
We first make some observations about the homogeneous property of presentation maps and presentation matrices. Equivalent matrices actually represent isomorphic presentations $f' \cong f$ that admit the commutative diagram
$$\begin{array}{ccc}
F^1 & \xrightarrow{\;f\;} & F^0 \\
{\scriptstyle h_1}\downarrow & & \downarrow{\scriptstyle h_0} \\
F^1 & \xrightarrow{\;f'\;} & F^0
\end{array}$$
where $h_1$ and $h_0$ are endomorphisms on $F^1$ and $F^0$ respectively. The endomorphisms are realized by basis changes between corresponding presentation matrices $[f] \cong [f']$. Since all morphisms between graded modules are required to be homogeneous (preserve grades) by definition, we can use homogeneous bases (all the basis elements chosen are homogeneous elements²) for $F^0$ and $F^1$ to represent matrices. Let $F^0 = \langle g_1, \cdots, g_\ell \rangle$ and $F^1 = \langle s_1, \cdots, s_m \rangle$ where $g_i$ and $s_i$ are homogeneous elements for every $i$. With this choice, we can consider only equivalent presentation matrices under homogeneous basis changes. Each entry $[f]_{i,j}$ is also homogeneous. That means, $[f]_{i,j} = t_1^{u_1} t_2^{u_2} \cdots t_d^{u_d}$ where $(u_1, u_2, \cdots, u_d) = gr(s_j) - gr(g_i)$. Writing $u = (u_1, u_2, \cdots, u_d)$ and $t^u = t_1^{u_1} t_2^{u_2} \cdots t_d^{u_d}$, we get $[f]_{i,j} = t^u$ where $u = gr(s_j) - gr(g_i)$ is called the grade of $[f]_{i,j}$. We call such a presentation matrix a homogeneous presentation matrix.
For example, given $F^0 = \langle g_1^{(1,1)}, g_2^{(2,2)} \rangle$, the basis change $g_2^{(2,2)} \leftarrow g_2^{(2,2)} + g_1^{(1,1)}$ is not homogeneous since $g_2^{(2,2)} + g_1^{(1,1)}$ is not a homogeneous element. However, $g_2^{(2,2)} \leftarrow g_2^{(2,2)} + t^{(1,1)} g_1^{(1,1)}$ is a homogeneous change with $gr(g_2^{(2,2)} + t^{(1,1)} g_1^{(1,1)}) = gr(g_2^{(2,2)}) = (2, 2)$, which results in a new homogeneous basis $\{g_1^{(1,1)}, g_2^{(2,2)} + t^{(1,1)} g_1^{(1,1)}\}$. Homogeneous basis changes always result in homogeneous bases.
Let $[f]$ be a homogeneous presentation matrix of $f : F^1 \to F^0$ with bases $F^0 = \langle g_1, \cdots, g_\ell \rangle$ and $F^1 = \langle s_1, \cdots, s_m \rangle$. We extend the notion of grading to every row $r_i$ and every column $c_j$ from the basis elements $g_i$ and $s_j$ they represent respectively, that is, $gr(r_i) := gr(g_i)$ and $gr(c_j) := gr(s_j)$. We define a strict partial order $<_{gr}$ on the rows $\{r_i\}$ by asserting $r_i <_{gr} r_j$ if and only if $gr(r_i) < gr(r_j)$. Similarly, we define a strict partial order on the columns $\{c_j\}$.
For such a homogeneous presentation matrix $[f]$, we aim to diagonalize it totally by homogeneous changes of basis, trying to zero out entries by column and row operations that include additions and scalar multiplications of columns and rows, as done in the well-known Gaussian elimination. We have the following observations:

1. $gr([f]_{i,j}) = gr(c_j) - gr(r_i)$;

2. a nonzero entry $[f]_{i,j}$ can only be zeroed out by column operations from columns $c_k <_{gr} c_j$ or by row operations from rows $r_\ell >_{gr} r_i$.
² Recall that an element $m \in M$ is homogeneous with grade $gr(m) = u$ for some $u \in \mathbb{Z}^d$ if $m \in M_u$.

Observation (2) indicates which subset of column and row operations is sufficient to zero out the entry $[f]_{i,j}$. We restate the diagonalization problem as follows:

Given an $\ell \times m$ homogeneous presentation matrix $A = [f]$ consisting of entries in $k[t_1, \cdots, t_d]$ with grading on rows and columns, find a total diagonalization of A under the following admissible row and column operations:

• multiply a row or column by a nonzero $\alpha \in k$ (for $k = \mathbb{Z}_2$, we can ignore these operations);

• for two rows $r_i, r_j$ with $j \neq i$ and $r_j <_{gr} r_i$, set $r_j \leftarrow r_j + t^u \cdot r_i$ where $u = gr(r_i) - gr(r_j)$;

• for two columns $c_i, c_j$ with $j \neq i$ and $c_i <_{gr} c_j$, set $c_j \leftarrow c_j + t^v \cdot c_i$ where $v = gr(c_j) - gr(c_i)$.

The above operations realize all possible homogeneous basis changes. That means, any ho-
mogeneous presentation matrix can be realized by a combination of the above operations.
In fact, the values of the nonzero entries in the matrix are redundant under the homogeneous property $gr(A_{i,j}) = gr(c_j) - gr(r_i)$ given by observation (1). So, we can further simplify the matrix by replacing all the nonzero entries with their k-coefficients. For example, we can replace $2 \cdot t^u$ with 2. What really matters are the partial orders defined by the grading of rows and columns. With our assumption of $k = \mathbb{Z}_2$, all nonzero entries are replaced with 1. Based on the above observations, we further simplify the diagonalization problem as follows.
Given a k-valued matrix A with a partial order on rows and columns, find a total diagonalization $A^* \sim A$ with the following admissible operations:

• multiply a row or column by a nonzero $\alpha \in k$ (for $k = \mathbb{Z}_2$, we can ignore these operations);

• add $c_i$ to $c_j$ only if $j \neq i$ and $gr(c_i) < gr(c_j)$; denoted as $c_i \to c_j$;

• add $r_k$ to $r_l$ only if $l \neq k$ and $gr(r_l) < gr(r_k)$; denoted as $r_k \to r_l$.

The assumption of $k = \mathbb{Z}_2$ allows us to ignore the first set of multiplication operations on the binary matrix obtained after the transformation. We denote the sets of all admissible column and row operations as
$$Colop = \{(i, j) \mid c_i \to c_j \text{ is an admissible column operation}\}$$
$$Rowop = \{(k, l) \mid r_k \to r_l \text{ is an admissible row operation}\}$$

Under the assumption that no two columns or rows have the same grade, Colop and Rowop are closed under the transitive relation.

Proposition 11.3. If $(i, j), (j, k) \in Colop$ (resp. Rowop), then $(i, k) \in Colop$ (resp. Rowop).

Given a solution of the diagonalization problem in the simplified form, one can reconstruct a
solution of the original problem on the presentation matrix by reversing the above process of sim-
plification. We will illustrate it by running the algorithm on the working example in Figure 11.5
at the end of this section. The matrix reduction we employ for diagonalization may be viewed as
a generalized matrix reduction because the matrix is reduced under constrained operations Colop
and Rowop which might be a nontrivial subset of all basic operations.

[Figure 11.8 here: the grid of grades with generators $v_b(0,1)$, $v_r(1,0)$, $v_g(1,1)$ and relations $v_b + v_r = 0$ at (1, 1), $v_b + v_g = 0$ at (1, 2), $v_r + v_g = 0$ at (2, 1).]

Figure 11.8: The persistence module corresponding to the presentation matrix $[\partial_1]$ shown in Example 11.1. The generators are given by the three vertices with grades (0, 1), (1, 0), (1, 1) and the relations are given by the edges with grades (1, 1), (1, 2), (2, 1).

Remark 11.4. There are two extreme but trivial cases: (i) there are no $<_{gr}$-comparable pairs of rows and columns. In this case, $Colop = Rowop = \emptyset$ and the original matrix is a trivial solution. (ii) All pairs of rows and all pairs of columns are $<_{gr}$-comparable, or equivalently, both Colop and Rowop are totally ordered. In this case, one can apply the traditional matrix reduction algorithm to reduce the matrix to a diagonal matrix with all nonzero blocks being 1 × 1 minors. This is also the case for 1-parameter persistence modules if one further applies row reduction after column reduction. Note that row reductions are not necessary for reading out persistence information because they essentially do not change the persistence information. However, in multiparameter cases, both column and row reductions are necessary to obtain a diagonalization from which the persistence information can be read. From this viewpoint, the algorithm we present can be thought of as a generalization of the traditional persistence algorithm.

Example 11.1. Consider our working example in Figure 11.5. One can see later in Section 11.5 (Case 1) that the presentation matrix of this example can be chosen to be the same as the matrix of the boundary morphism $\partial_1 : C_1 \to C_0$. With fixed bases $C_0 = \langle v_b^{(0,1)}, v_r^{(1,0)}, v_g^{(1,1)} \rangle$ and $C_1 = \langle e_r^{(1,1)}, e_b^{(1,2)}, e_g^{(2,1)} \rangle$, this presentation matrix $[\partial_1]$ and the corresponding binary matrix A can be written as follows (recall that superscripts indicate the grades):
$$[\partial_1] = \begin{array}{c|ccc} & e_r^{(1,1)} & e_b^{(1,2)} & e_g^{(2,1)} \\ \hline v_b^{(0,1)} & t^{(1,0)} & t^{(1,1)} & 0 \\ v_r^{(1,0)} & t^{(0,1)} & 0 & t^{(1,1)} \\ v_g^{(1,1)} & 0 & t^{(0,1)} & t^{(1,0)} \end{array} \quad\longrightarrow\quad A = \begin{array}{c|ccc} & c_1^{(1,1)} & c_2^{(1,2)} & c_3^{(2,1)} \\ \hline r_1^{(0,1)} & 1 & 1 & 0 \\ r_2^{(1,0)} & 1 & 0 & 1 \\ r_3^{(1,1)} & 0 & 1 & 1 \end{array}$$
The four admissible operations are: $r_3 \to r_1$, $r_3 \to r_2$, $c_1 \to c_2$, $c_1 \to c_3$. Figure 11.8 shows the persistence module $H_0$ whose presentation matrix is $[\partial_1]$.
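The admissible operations can be enumerated mechanically from the grades. Below is a small sketch (our own Python code with 1-based labels; the helper strictly_less implements the strict partial order on $\mathbb{Z}^2$) that recovers exactly these four operations from the row and column grades above:

```python
# Sketch: enumerating Colop and Rowop from the grades in Example 11.1.
row_grades = [(0, 1), (1, 0), (1, 1)]   # grades of r1, r2, r3
col_grades = [(1, 1), (1, 2), (2, 1)]   # grades of c1, c2, c3

def strictly_less(u, v):
    """u < v in the partial order: componentwise <= and u != v."""
    return all(a <= b for a, b in zip(u, v)) and u != v

# c_i -> c_j is admissible iff gr(c_i) < gr(c_j)
colop = [(i + 1, j + 1) for i, gi in enumerate(col_grades)
         for j, gj in enumerate(col_grades) if strictly_less(gi, gj)]
# r_k -> r_l is admissible iff gr(r_l) < gr(r_k)
rowop = [(k + 1, l + 1) for k, gk in enumerate(row_grades)
         for l, gl in enumerate(row_grades) if strictly_less(gl, gk)]

print(colop)  # [(1, 2), (1, 3)]  i.e. c1 -> c2, c1 -> c3
print(rowop)  # [(3, 1), (3, 2)]  i.e. r3 -> r1, r3 -> r2
```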

11.4 Total diagonalization algorithm


Assume that no two columns or rows have the same grade. Without this assumption, the problem of total diagonalization becomes more complicated. At this point, we do not know how to extend the algorithm to overcome this limitation. However, the algorithm introduced below can still compute a correct diagonalization (not necessarily total) by applying the trick of adding small enough perturbations to tied grades (considering $\mathbb{Z}^d \subseteq \mathbb{R}^d$) to reduce the case to the one satisfying our assumption. Furthermore, this diagonalization in fact coincides with a total diagonalization of some persistence module which is arbitrarily close to the original persistence module under a well-known metric called the interleaving distance, which we discuss in the next chapter. In practice, the persistence module usually arises from a simplicial filtration as shown in our working example. The assumption of distinct grades for the columns and rows is automatically satisfied if at most one simplex is introduced at each grade in the filtration.
Let A be the presentation matrix whose total diagonalization we seek. We order the rows and columns of the matrix A according to any order that extends the partial order on the grades to a total order, e.g., the dictionary order. We fix the indices $Row(A) = \{1, 2, \cdots, \ell\}$ and $Col(A) = \{1, 2, \cdots, m\}$ according to this order. With this ordering, observe that for each admissible column operation $c_i \to c_j$, we have $i < j$, and for each admissible row operation $r_l \to r_k$, we have $l > k$.

For any column $c_t$, let $A_{\leq t} := A|_C$ denote the left submatrix on $C = [Row(A), \{j \in Col(A) \mid j \leq t\}]$, and let $A_{<t}$ denote its stricter version obtained by excluding the column $c_t$ from $A_{\leq t}$. Our algorithm starts with the finest decomposition that puts the free module given by each generator (row) into a separate block and then combines blocks incrementally as we process the relations (columns). The main idea of our algorithm is presented in Algorithm 22:TotDiagonalize, which runs as follows (see Figure 11.9 for an illustration):
[Figure 11.9 here: (1) the main iteration at the current column $c_t$; (2) the sub-column update for a block $B_i^{(t-1)}$.]

Figure 11.9: (left) A at the beginning of iteration $t$ with $A_{<t}$ being totally diagonalized with three index blocks $\mathcal{B}^{(t-1)} = \{B_1^{(t-1)}, B_2^{(t-1)}, B_3^{(t-1)}\}$. (right) A sub-column update step: $c_t|_{Row B_1^{(t-1)}}$ has already been reduced to zero, so $B_1^{(t)} = B_1^{(t-1)}$ is added into $\mathcal{B}^{(t)}$. The white regions, including $c_t|_{Row B_1^{(t-1)}}$, must be preserved afterward. Now, for $i = 2$, we attempt to reduce the sub-column $c_t|_{Row B_2^{(t-1)}}$. We extend it to the block on $T := [Row(B_2^{(t-1)}), (Col(A_{\leq t}) \setminus Col(B_2^{(t-1)}))]$ and try to reduce it in BlockReduce.

1. Initialization: Initialize the collection of index blocks $\mathcal{B}^{(0)} := \{B_i^{(0)} := [\{i\}, \emptyset] \mid i \in Row(A)\}$ for the total diagonalization of the null column matrix $A_{\leq 0}$.

2. Main iteration: Process A from left to right incrementally by introducing a column $c_t$ and considering the left submatrices $A_{\leq t}$ for $t = 1, 2, \cdots, m$. We update and maintain the collection of index blocks $\mathcal{B}^{(t)} \leftarrow \{B_i^{(t)}\}$ for the current submatrix $A_{\leq t}$ in each iteration by using the column and block updates stated below. Here we use the upper index $(\cdot)^{(t)}$ to emphasize the iteration $t$.

2(a). Sub-column update: Partition the column $c_t$ into sub-columns
$$c_t|_{Row B_i^{(t-1)}} := A[Row(B_i^{(t-1)}), \{t\}],$$
one for the set of rows $Row(B_i^{(t-1)})$ of each block from the previous iteration. We process each such sub-column $c_t|_{Row B_i^{(t-1)}}$ one by one, checking whether there exists a sequence of admissible operations that is able to reduce the sub-column to zero while preserving the prior as defined below.

Definition 11.16. The prior with respect to a sub-column $c_t|_{Row B_i^{(t-1)}}$ is the left submatrix $A_{<t}$ together with the sub-columns $c_t|_{Row B_j^{(t-1)}}$ for all $j < i$.

Prior preservation means that the operations together change neither $A_{<t}$ nor any other sub-column $c_t|_{Row B_j^{(t-1)}}$ with $j < i$. If such operations exist, we apply them on the current A to get an equivalent matrix with the sub-column $c_t|_{Row B_i^{(t-1)}}$ zeroed out, and we set $B_i^{(t)} \leftarrow B_i^{(t-1)}$. Otherwise, we leave the matrix A unchanged and add the column index $t$ to those of $B_i^{(t-1)}$, i.e., we set $B_i^{(t)} \leftarrow [Row(B_i^{(t-1)}), Col(B_i^{(t-1)}) \cup \{t\}]$. After processing every sub-column $c_t|_{Row B_i^{(t-1)}}$ one by one, all index blocks $B_i^{(t)}$ containing the column index $t$ are merged into one single index block. At the end of iteration $t$, we get an equivalent matrix A with $A_{\leq t}$ being totally diagonalized with index blocks $\mathcal{B}^{(t)}$.

2(b). Block reduce: To update the entries of each sub-column of $c_t$ as described in 2(a), we propose a block reduction routine, Algorithm 24:BlockReduce, to compute the correct entries. Given $T := [Row(B_i^{(t-1)}), (Col(A_{\leq t}) \setminus Col(B_i^{(t-1)}))]$, this routine checks whether the block T can be zeroed out by some collection of admissible operations. If so, $c_t$ does not join the block $B_i^{(t)}$ and A is updated with these operations.

For two index blocks $B_1, B_2$, we denote the merge $B_1 \oplus B_2$ of these two index blocks as the index block $[Row(B_1) \cup Row(B_2), Col(B_1) \cup Col(B_2)]$. In the following algorithm, we treat the given matrix A as a global variable which can be visited and modified anywhere by every subroutine called. Consequently, every time we update values of A by some operations, these operations are applied to the latest A.

The outer loop is the incremental step of the main iteration introducing a new column $c_t$, which updates the diagonalization of $A_{\leq t}$ from the last iteration. The inner loop corresponds to the block updates, which check the intersection of the current column and the rows of each previous block one by one.


Algorithm 22 TotDiagonalize(A)

Input:
  A = input matrix treated as a global variable whose columns and rows are totally ordered respecting some fixed partial order given by the grading.
Output:
  A total diagonalization $A^*$ with index blocks $\mathcal{B}^*$

 1: $\mathcal{B}^{(0)} \leftarrow \{B_i^{(0)} := [\{i\}, \emptyset] \mid i \in Row(A)\}$
 2: for $t \leftarrow 1$ to $m := |Col(A)|$ do
 3:   $B_0^{(t)} \leftarrow [\emptyset, \{t\}]$
 4:   for each $B_i^{(t-1)} \in \mathcal{B}^{(t-1)}$ do
 5:     $T := [Row(B_i^{(t-1)}), Col(A_{\leq t}) \setminus Col(B_i^{(t-1)})]$
 6:     if BlockReduce(T) == false then
 7:       $B_i^{(t)} \leftarrow B_i^{(t-1)} \oplus B_0^{(t)}$   \∗ update $B_i$ by appending t ∗\
 8:     else
 9:       $B_i^{(t)} \leftarrow B_i^{(t-1)}$   \∗ $B_i$ remains unchanged ∗\
10:       \∗ A and $c_t$ are updated in BlockReduce when it returns true ∗\
11:     end if
12:   end for
13:   $\mathcal{B}^{(t)} \leftarrow \{B_i^{(t)}\}$ with all $B_i^{(t)}$ containing t merged as one block
14: end for
15: return $(A, \mathcal{B}^{(m)})$

Remark 11.5. The algorithm TotDiagonalize does not require the input presentation matrix to be minimal. As indicated in Remark 11.3, the trivial parts result in either identity blocks or single column blocks like $[\emptyset, \{j\}]$. Such a single column block corresponds to a zero morphism and is not merged with any other block; therefore, $c_j$ is a zero column. For a single row block $[\{i\}, \emptyset]$ which is not merged with any other block, $r_i$ is a zero row vector. It represents a free indecomposable submodule in the total decomposition of the input persistence module.
We first prove the correctness of TotDiagonalize assuming that the BlockReduce routine works as claimed, namely, that it checks whether a sub-column of the current column $c_t$ can be zeroed out while preserving the prior, that is, without changing the left submatrix from the previous iteration and the other sub-columns of $c_t$ that have already been zeroed out.
Proposition 11.4. At the end of each iteration t, A≤t is a total diagonalization.
Proof. We prove it by induction on $t$. The base case $t = 0$ follows trivially by definition. Now assume $A^{(t-1)}$ is the matrix we get at the end of iteration $(t-1)$ with $A^{(t-1)}_{\leq t-1}$ being totally diagonalized. That means $A^{(t-1)}_{\leq t-1} = A^*_{\leq t-1}$, where $A = A^{(0)}$ is the original given matrix. For contradiction, assume that at the end of iteration $t$, the matrix we get, $A^{(t)}$, has a left submatrix $A^{(t)}_{\leq t}$ which is not a total diagonalization. That means some index block $B \in \mathcal{B}^{(t)}$ can be decomposed further. Observe that such a B must contain $t$, because all other index blocks (not containing $t$) in $\mathcal{B}^{(t)}$ are also in $\mathcal{B}^{(t-1)}$, which cannot be decomposed further by our inductive assumption. We denote this index block containing $t$ as $B_t$. Let $A'$ be the equivalent matrix of $A^{(t)}$ such that $A'_{\leq t}$ is a total diagonalization with index blocks $\mathcal{B}'$. Let F be an equivalent transformation from $A^{(t)}$ to $A'$, which decomposes $B_t$ into at least two distinct index blocks of $\mathcal{B}'$, say $B_0, B_1, \cdots$. Only one of them contains $t$, say $B_0$. Then $B_1$ consists of only indices that are from $A_{\leq t-1}$, which means $B_1$ equals some index block $B_i \in \mathcal{B}^{(t-1)}$. Therefore, the transformation F gives a sequence of admissible operations which can reduce the sub-column $c_t|_{Row(B_i)}$ to zero in $A^{(t)}$. Starting with this sequence of admissible operations, we construct another sequence of admissible operations which further keeps $A^{(t)}_{\leq t-1}$ unchanged, to reach the contradiction. Note that $A^{(t)}_{\leq t-1} = A^{(t-1)}_{\leq t-1}$.

Observe that all index blocks of $\mathcal{B}'$ other than $B_0$ are also index blocks in $\mathcal{B}^{(t-1)}$, i.e. $\mathcal{B}' \setminus \{B_0\} \subseteq \mathcal{B}^{(t-1)}$. As for $B_0$, it can be written as $B_0 = \bigoplus_{B_j \in \mathcal{B}^{(t-1)} \setminus \mathcal{B}'} B_j \oplus [\emptyset, \{t\}]$. Let $B_a$ be the merge of the index blocks that are in $A^{(t-1)}$ and also in $A'$, and let $B_b$ be the merge of the rest of the index blocks of $A^{(t-1)}$, i.e., $B_a = \bigoplus_{B_j \in \mathcal{B}' \cap \mathcal{B}^{(t-1)}} B_j$ and $B_b = \bigoplus_{B_j \in \mathcal{B}^{(t-1)} \setminus \mathcal{B}'} B_j$. Then $B_a$ and $B_b$ can be viewed as a coarser decomposition of $A^{(t-1)}_{\leq t-1}$ and also of $A'_{\leq t-1}$. By taking restrictions, we have $A'|_{B_a} \sim A^{(t-1)}|_{B_a}$ with equivalent transformation $F_a$ and $A'|_{B_b} \sim A^{(t-1)}|_{B_b}$ with equivalent transformation $F_b$. Then $F_a$ gives a sequence of admissible operations with indices in $B_a$ and $F_b$ gives a sequence of admissible operations with indices in $B_b$. By applying these operations on $A'$, we can transform $A'_{\leq t-1}$ to $A^{(t-1)}_{\leq t-1}$ with the sub-column $[Row(A) \setminus Row(B_0), \{t\}]$ unchanged, which consists of the sub-columns that have already been reduced to zero. Combining all the admissible operations from the three transformations $F$, $F_a$ and $F_b$ together, we get a sequence of admissible operations that reduces the sub-column $[Row(B_i), \{t\}]$ to zero without changing $A^{(t)}_{<t}$ and also without changing those sub-columns which have already been reduced. But then BlockReduce would have returned 'true', signaling that $B_i$ should not be merged with any other block required to form the block $B_t$, reaching a contradiction. □

Now we design the BlockReduce subroutine as required. With the requirement of prior preservation, observe that reducing the sub-column $c_t|_{Row B}$ for some $B \in \mathcal{B}^{(t-1)}$ is the same as reducing $T = [Row(B), (Col(A_{\leq t}) \setminus Col(B))]$, called the target block (see Figure 11.9 on the right). The main idea of BlockReduce is to consider a specific subset of admissible operations called independent operations. Within $A_{\leq t}$, these operations only change entries in T, and this change is independent of the order of their application. The BlockReduce subroutine is designed to search for a sequence of admissible operations within this subset and reduce T with it, if one exists. Clearly, the prior is preserved with these operations. The only thing we need to ensure is that searching within the set of independent operations is sufficient. That means, if there exists a sequence of admissible operations that reduces T to 0 and meanwhile preserves the prior, then we can always find one such sequence consisting of only independent operations. This is what we show next.
Consider the following matrices for each admissible operation. For each admissible column operation $c_i \to c_j$, let
$$Y_{i,j} := A \cdot [\delta_{i,j}]$$
where $[\delta_{i,j}]$ is the $m \times m$ square matrix with only one non-zero entry at $(i, j)$. Observe that $A \cdot [\delta_{i,j}]$ is a matrix whose only nonzero column is the $j$th one, with entries copied from $c_i$ in A. Similarly, for each admissible row operation $r_l \to r_k$, let $[\delta_{k,l}]$ be the $\ell \times \ell$ matrix with the only non-zero entry at $(k, l)$, and let
$$X_{k,l} := [\delta_{k,l}] \cdot A.$$
Application of a column operation $c_i \to c_j$ can be viewed as updating A to $A \cdot (I + [\delta_{i,j}]) = A + Y_{i,j}$. A similar observation holds for row operations as well. For a target block $T = [Row(B), Col(A_{\leq t}) \setminus Col(B)]$ defined on some $B \in \mathcal{B}^{(t-1)}$, we say an admissible column (row) operation $c_i \to c_j$ ($r_l \to r_k$ resp.) is independent on T if $i \notin Col(T), j \in Col(T)$ ($l \notin Row(T), k \in Row(T)$ resp.). Briefly, we just call such operations independent operations if T is clear from the context.

[Figure 11.10 here: the product $[\delta_{k,l}] A [\delta_{i,j}]$ with the rows $r_k, r_l$ and columns $c_i, c_j$ marked.]

Figure 11.10: $[\delta_{k,l}] A [\delta_{i,j}]$ is a matrix with the only nonzero entry at $(k, j)$ being a copy of $A_{l,i}$.
We have two observations about independent operations that are important. The first one
follows from the definition that T = [Row(B), Col(A≤t ) \ Col(B)].

Observation 11.1. Within A≤t , an independent column or row operation only changes entries on
T.

Observation 11.2. For any independent column operation $c_i \to c_j$ and row operation $r_l \to r_k$, we have $[\delta_{k,l}] \cdot A \cdot [\delta_{i,j}] = 0$. Or, equivalently,
$$(I + [\delta_{k,l}]) \cdot A \cdot (I + [\delta_{i,j}]) = A + [\delta_{k,l}]A + A[\delta_{i,j}] = A + X_{k,l} + Y_{i,j} \qquad (11.1)$$

Proof. $[\delta_{k,l}] \cdot A \cdot [\delta_{i,j}] = A_{l,i} [\delta_{k,j}]$ (see Figure 11.10 for an illustration). By the definitions of independence and T, we have $l \notin Row(B)$ and $i \in Col(B)$. That means they are a row index and a column index from different blocks. Therefore, $A_{l,i} = 0$. □
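As a sanity check, the identity (11.1) can be verified numerically. The following sketch (our own code, using numpy) does so for the binary matrix of Example 11.1 with the independent operations $r_3 \to r_1$ and $c_1 \to c_2$ relative to the block $[\{1, 2\}, \{1\}]$:

```python
# Sketch: numeric check of Observation 11.2 / Eqn. (11.1) over Z2.
import numpy as np

A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])     # binary matrix of Example 11.1
I = np.eye(3, dtype=int)
d_kl = np.zeros((3, 3), dtype=int); d_kl[0, 2] = 1  # [delta_{1,3}]: row op r3 -> r1
d_ij = np.zeros((3, 3), dtype=int); d_ij[0, 1] = 1  # [delta_{1,2}]: col op c1 -> c2

assert (d_kl @ A @ d_ij == 0).all()                 # the cross term vanishes
lhs = ((I + d_kl) @ A @ (I + d_ij)) % 2
rhs = (A + d_kl @ A + A @ d_ij) % 2                 # A + X_{1,3} + Y_{1,2}
assert (lhs == rhs).all()
print("Eqn. (11.1) verified")
```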

The following proposition reveals why we are after the independent operations.

Proposition 11.5. The target block $A|_T$ can be reduced to 0 while preserving the prior if and only if $A|_T$ can be written as a linear combination of independent operations. That is,
$$A|_T = \sum_{\substack{l \notin Row(T) \\ k \in Row(T)}} \alpha_{k,l}\, X_{k,l}|_T \;+\; \sum_{\substack{i \notin Col(T) \\ j \in Col(T)}} \beta_{i,j}\, Y_{i,j}|_T$$
where the $\alpha_{k,l}$'s and $\beta_{i,j}$'s are coefficients in $k = \mathbb{Z}_2$.

The full proof can be seen in [142]. Here, we give some intuitive explanation. Reducing the target block $A|_T$ to 0 is equivalent to finding matrices P and Q, encoding sequences of admissible row operations and admissible column operations respectively, so that $PAQ|_T = 0$. For the 'if' direction, we can build $P = I + \sum \alpha_{k,l} [\delta_{k,l}]$ and $Q = I + \sum \beta_{i,j} [\delta_{i,j}]$ with the binary coefficients $\alpha_{k,l}$'s and $\beta_{i,j}$'s given in Proposition 11.5. Then, using Observations 11.1 and 11.2, one can show that PAQ indeed reduces $A|_T$ to 0 with the prior being preserved. This proves the 'if' direction.

For the 'only if' direction, it suffices to show that the existence of a transformation reducing $A|_T$ to 0 implies the existence of a transformation reducing $A|_T$ to 0 by independent operations. This is formally proved in [142].

We can view $A|_T$, $Y_{i,j}|_T$, $X_{k,l}|_T$ as binary vectors in the same $|T|$-dimensional space. Proposition 11.5 tells us that it is sufficient to check whether $A|_T$ is a linear combination of the vectors corresponding to a set of independent operations. So, we first linearize each of the matrices $Y_{i,j}|_T$'s, $X_{k,l}|_T$'s, and $A|_T$ to a column vector as described later (see Figure 11.11). Then, we check if $A|_T$ is in the span of the $Y_{i,j}|_T$'s and $X_{k,l}|_T$'s. This is done by collecting all the vectors $X_{k,l}|_T$'s and $Y_{i,j}|_T$'s into a matrix S called the source matrix (Figure 11.11, right) and then reducing the vector $c := A|_T$ with S by a standard matrix reduction algorithm with left-to-right column additions, which we have seen before in Section 3.3.1 for computing persistence. This routine is presented in Algorithm 23:ColReduce(S, c), which reduces the column c w.r.t. the input matrix S by reducing the matrix $[S|c]$ altogether by MatPersistence from Section 3.3.1.

If $c = A|_T$ can be reduced to 0, we apply the corresponding independent operations to update A. Observe that all the column operations used in reducing $A|_T$ together only change the sub-column $c_t|_{Row(B)}$, while row operations may change A to the right of the column $t$. We say this procedure reduces c with S.

Algorithm 23 ColReduce(S, c)

Input:
  Source matrix S and target column c to reduce
Output:
  Reduced column c with S

1: $S' \leftarrow [S|c]$
2: Call MatPersistence($S'$)
3: return c along with the indices of the columns in S used for the reduction of c

Fact 11.1. There exists a set of column operations, each adding a column only to a column on its right, such that the matrix $[S|c]$ is reduced to $[S'|0]$ if and only if ColReduce(S, c) returns a zero vector.
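The following is a minimal sketch of such a left-to-right column reduction over $\mathbb{Z}_2$, with columns stored as Python sets of row indices. The bookkeeping of which source columns were effectively absorbed into c is our own design and only mimics what ColReduce is required to report; it is not the book's MatPersistence implementation.

```python
# Sketch: Z2 column reduction with left-to-right additions, in the spirit of
# MatPersistence (Section 3.3.1). A column is a set of row indices.

def col_reduce(S, c):
    """Reduce target column c against source columns S; return the reduced c
    and the indices of the source columns effectively added into c (mod 2)."""
    cols = [set(s) for s in S] + [set(c)]
    used = [set() for _ in cols]      # source-column indices absorbed, mod 2
    pivots = {}                       # low (max row index) -> owning column
    for idx, col in enumerate(cols):
        while col and max(col) in pivots:
            j = pivots[max(col)]
            col ^= cols[j]            # Z2 column addition, in place
            used[idx] ^= used[j] ^ {j}
        if col:
            pivots[max(col)] = idx
    return cols[-1], sorted(used[-1])

# Toy run (this data reappears in iteration t = 2 of Section 11.4.1):
# source columns (1,0), (0,1), (1,1) as sets, target (1,0).
print(col_reduce([{0}, {1}, {0, 1}], {0}))   # (set(), [0]): reduced via column 0
```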

Now we describe the linearization used in Algorithm 24:BlockReduce. We fix a linear order $\leq_{Lin}$ on the set of matrix indices $Row(A) \times Col(A)$ as follows: $(i, j) \leq_{Lin} (i', j')$ if $j > j'$, or $j = j'$ and $i < i'$. Explicitly, we linearly order the indices as:
$$((1, m), (2, m), \ldots, (\ell, m), (1, m-1), (2, m-1), \ldots).$$
For any index block B, let $Lin(A|_B)$ be the vector of dimension $|Col(B)| \cdot |Row(B)|$ obtained by linearizing $A|_B$ to a vector in the above linear order on the indices.
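A sketch of this linearization (our own code; 1-based indices as in the text; the dense list-of-rows layout is an assumption of the sketch):

```python
def lin(A, B):
    """Linearize A|_B in the order <=Lin: columns right to left, and rows
    top to bottom within each column."""
    rows, cols = B
    return [A[i - 1][j - 1] for j in sorted(cols, reverse=True)
                            for i in sorted(rows)]

A = [[1, 0, 1],
     [1, 0, 1],
     [0, 1, 1]]                      # a matrix that reappears in Section 11.4.1
print(lin(A, ({1, 2}, {2, 3})))      # [1, 1, 0, 0]
```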

Proposition 11.6. The target block on T can be reduced to zero in A while preserving the prior
if and only if BlockReduce(T ) returns true.


[Figure 11.11 here: the matrices A and $Y_{i,j}$ with their linearizations $Lin(A)$ and $Lin(Y_{i,j})$, and the source matrix S assembled from the linearized operations.]

Figure 11.11: (top) Matrix A is linearized to the vector Lin(A) shown in the middle; (bottom) the column operation $c_i \to c_j$ is captured by $Y_{i,j}$, whose linearization is illustrated in the middle; (right) the source matrix S combining all operations (row operations not shown). In the picture, $(\cdot)^T$ denotes transposed matrices.

Algorithm 24 BlockReduce(T)

Input:
  Index of the target block T to be reduced; the given matrix A is assumed to be a global variable
Output:
  A boolean indicating whether $A|_T$ can be reduced, and the reduced block $A|_T$ if possible

1: Compute $c := Lin(A|_T)$ and initialize an empty matrix S
2: for each admissible column operation $c_i \to c_j$ with $i \notin Col(T), j \in Col(T)$ do
3:   compute $Y_{i,j}|_T := (A \cdot [\delta_{i,j}])|_T$ and $y_{i,j} = Lin(Y_{i,j}|_T)$; update $S \leftarrow [S|y_{i,j}]$
4: end for
5: for each admissible row operation $r_l \to r_k$ with $l \notin Row(T), k \in Row(T)$ do
6:   compute $X_{k,l}|_T := ([\delta_{k,l}] \cdot A)|_T$ and $x_{k,l} := Lin(X_{k,l}|_T)$; update $S \leftarrow [S|x_{k,l}]$
7: end for
8: ColReduce(S, c) returns the indices of the $y_{i,j}$'s and $x_{k,l}$'s used to reduce c (if possible)
9: For every returned index of $y_{i,j}$ or $x_{k,l}$, apply $c_i \to c_j$ or $r_l \to r_k$ to transform A
10: return $A|_T == 0$
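The heart of BlockReduce is the span test of Proposition 11.5. The sketch below (our own code; 0-based index sets; numpy; operations given as (source, target) pairs) assembles the source matrix from the independent operations and tests whether $Lin(A|_T)$ lies in its $\mathbb{Z}_2$-span. The actual update of A on success (line 9 above) is omitted for brevity.

```python
import numpy as np

def in_span(S, c):
    """True iff binary vector c is a Z2-linear combination of columns of S."""
    if S.shape[1] == 0:
        return not c.any()
    M = np.concatenate([S, c[:, None]], axis=1) % 2
    r = 0
    for j in range(M.shape[1] - 1):              # eliminate on S-columns only
        piv = next((i for i in range(r, M.shape[0]) if M[i, j]), None)
        if piv is None:
            continue
        M[[r, piv]] = M[[piv, r]]
        for i in range(M.shape[0]):
            if i != r and M[i, j]:
                M[i] ^= M[r]
        r += 1
    return not M[r:, -1].any()                   # consistency of c's column

def block_reduce_test(A, T, colop, rowop):
    """Span test of BlockReduce for T = (rows, cols), all indices 0-based."""
    rows, cols = T
    lin = lambda M: np.array([M[i, j] for j in sorted(cols, reverse=True)
                                      for i in sorted(rows)])
    S_cols = []
    for src, tgt in colop:                       # keep independent column ops
        if src not in cols and tgt in cols:
            Y = np.zeros_like(A); Y[:, tgt] = A[:, src]   # Y = A . [delta]
            S_cols.append(lin(Y))
    for src, tgt in rowop:                       # keep independent row ops
        if src not in rows and tgt in rows:
            X = np.zeros_like(A); X[tgt, :] = A[src, :]   # X = [delta] . A
            S_cols.append(lin(X))
    S = np.array(S_cols).T if S_cols else np.zeros((len(rows) * len(cols), 0), int)
    return in_span(S % 2, lin(A) % 2)

A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])  # Example 11.1, 0-based
print(block_reduce_test(A, ({0, 1}, {1}),        # block B1's rows, column c2
                        colop=[(0, 1), (0, 2)], rowop=[(2, 0), (2, 1)]))  # True
```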

Time complexity. First we analyze the time complexity of TotDiagonalize assuming that the input matrix has size $\ell \times m$. Clearly, $\max\{\ell, m\} = O(N)$ where N is the total number of generators and relations. For each of the $O(N)$ columns, we attempt to zero out every sub-column with row indices coinciding with each block B of the previously determined $O(N)$ blocks. Let B have $N_B$ rows. Then, the block T has $N_B$ rows and $O(N)$ columns.

To zero out a sub-column, we create a source matrix out of T which has size $O(N N_B) \times O(N^2)$, because each of the $O(\binom{N}{2})$ possible operations is converted to a column of size $O(N N_B)$ in the source matrix. The source matrix S with the target vector c can be reduced with an efficient algorithm [57, 200] in $O(a + N^2 (N N_B)^{\omega-1})$ time, where a is the total number of nonzero elements in $[S|c]$ and $\omega \in [2, 2.373)$ is the exponent for matrix multiplication. We have $a = O(N N_B \cdot N^2) = O(N^3 N_B)$. Therefore, for each block B we spend $O(N^3 N_B + N^2 (N N_B)^{\omega-1})$ time. Then, observing $\sum_{B \in \mathcal{B}} N_B = O(N)$, for each column we spend a total time of
$$\sum_{B \in \mathcal{B}} O\big(N^3 N_B + N^2 (N N_B)^{\omega-1}\big) = O\Big(N^4 + N^{\omega+1} \sum_{B \in \mathcal{B}} N_B^{\omega-1}\Big) = O(N^4 + N^{2\omega}) = O(N^{2\omega}).$$
Therefore, accounting for all of the $O(N)$ columns, the total time for the decomposition is $O(N^{2\omega+1})$.
We finish this analysis by commenting that one can build the presentation matrix from a given simplicial filtration consisting of n simplices, leading to the following cases: (i) for 0-th homology, the boundary matrix $\partial_1$ can be taken as the presentation matrix, giving $N = O(n)$ and a total time complexity of $O(n^{2\omega+1})$; (ii) for the 2-parameter case, $N = O(n)$ and presentations can be computed in $O(n^3)$ time, giving a total time complexity of $O(n^{2\omega+1})$; (iii) for the d-parameter case, $N = O(n^{d-1})$ and a presentation matrix can be computed in $O(n^{d+1})$ time, giving a total time complexity of $O(n^{(2\omega+1)(d-1)})$. We discuss the details in Section 11.5.

11.4.1 Running TotDiagonalize on the working example in Figure 11.5


Example 11.2. Consider the binary matrix after simplification as illustrated in Example 11.1:
$$A = \begin{array}{c|ccc} & c_1^{(1,1)} & c_2^{(1,2)} & c_3^{(2,1)} \\ \hline r_1^{(0,1)} & 1 & 1 & 0 \\ r_2^{(1,0)} & 1 & 0 & 1 \\ r_3^{(1,1)} & 0 & 1 & 1 \end{array}$$
with the 4 admissible operations $r_3 \to r_1$, $r_3 \to r_2$, $c_1 \to c_2$, $c_1 \to c_3$. The matrix remains the same after the first column $c_1$ is processed in TotDiagonalize.

[Figure 11.12 here: the factorization $\partial^* = U \partial V$ with row labels $v_b^{(0,1)}, v_r^{(1,0)}, v_g^{(1,1)}$ and column labels $e_r^{(1,1)}, e_b^{(1,2)}, e_g^{(2,1)}$.]

Figure 11.12: Diagonalizing the binary matrix given in Example 11.1. It can be viewed as multiplying the original matrix $\partial$ with a left matrix U that represents the row operations and a right matrix V that represents the column operations.

Before the first iteration, $\mathcal{B}$ is initialized to be $\mathcal{B} = \{B_1 = [\{1\}, \emptyset], B_2 = [\{2\}, \emptyset], B_3 = [\{3\}, \emptyset]\}$. In the first iteration, when $t = 1$, we have the block $B_0 = [\emptyset, \{1\}]$ for the column $c_1$. For $B_1 = [\{1\}, \emptyset]$, the target block we hope to zero out is $T = [\{1\}, \{1\}]$. So we call BlockReduce(T) to check if $A|_T$ can be zeroed out and update the entries on T according to the result. There is only one admissible operation from outside of T into it, namely $r_3 \to r_1$. The target vector $c = Lin(A|_T)$ and the source matrix $S = \{Lin(([\delta_{1,3}]A)|_T)\}$ are:
$$S = \begin{pmatrix} 0 \end{pmatrix}, \qquad c = \begin{pmatrix} 1 \end{pmatrix}.$$
The result of ColReduce(S, c) stays the same as its input. That means we cannot reduce c at all. Therefore, BlockReduce(T) returns false and nothing is updated in the original matrix.

It is not surprising that the matrix remains the same, because the only admissible operation that can affect T does not change any entries in T at all. So there is nothing one can do to reduce it, which results in merging $B_1 \oplus B_0 = [\{1\}, \{1\}]$. Similarly, for $B_2$ with $T = [\{2\}, \{1\}]$, the only admissible operation $r_3 \to r_2$ does not change anything in T. Therefore, the matrix does not change and $B_2$ is merged with $B_1 \oplus B_0$, which results in the block $[\{1, 2\}, \{1\}]$. For $B_3$ with $T = [\{3\}, \{1\}]$, there is no admissible operation. So the matrix does not change. But $A|_T = A|_{[\{3\},\{1\}]} = 0$. That means BlockReduce returns true. Therefore, we do not merge $B_3$. In summary, $B_0, B_1, B_2$ are merged into one block $[\{1, 2\}, \{1\}]$ in the first iteration. So after the first iteration, there are two index blocks in $\mathcal{B}^{(1)}$: $[\{1, 2\}, \{1\}]$ and $[\{3\}, \emptyset]$.
In the second iteration, $t = 2$, we process the second column $c_2$. Now $B_1 = [\{1, 2\}, \{1\}]$, $B_2 = [\{3\}, \emptyset]$ and $B_0 = [\emptyset, \{2\}]$. For the block $B_1 = [\{1, 2\}, \{1\}]$, the target block we hope to zero out is $T = [\{1, 2\}, \{2\}]$. There are three admissible operations from outside of T into T: $r_3 \to r_1$, $r_3 \to r_2$, $c_1 \to c_2$. BlockReduce(T) constructs the target vector $c = Lin(A|_T)$ and the source matrix $S = \{Lin(([\delta_{1,3}]A)|_T), Lin(([\delta_{2,3}]A)|_T), Lin((A[\delta_{1,2}])|_T)\}$, illustrated as follows:
$$S = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}, \qquad c = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
The result of ColReduce(S, c) is
$$S = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}, \qquad c = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
So BlockReduce updates $A|_T$ to get the following updated matrix:
$$A' = \begin{array}{c|ccc} & c_1^{(1,1)} & c_2^{(1,2)} & c_3^{(2,1)} \\ \hline r_1^{(0,1)} + r_3^{(1,1)} & 1 & 0 & 1 \\ r_2^{(1,0)} & 1 & 0 & 1 \\ r_3^{(1,1)} & 0 & 1 & 1 \end{array}$$
and returns true since $A'|_T == 0$. Therefore, we do not merge $B_1$. We continue to check, for the block $B_2 = [\{3\}, \emptyset]$ and $T = [\{3\}, \{1, 2\}]$, whether $A'|_T$ can be reduced to zero. There is no admissible operation for this block at all. Therefore, the matrix stays the same and BlockReduce returns false. We merge $B_2 \oplus B_0 = [\{3\}, \{2\}]$.
Continuing the process for the last column $c_3$ in the third iteration, $t = 3$, we see that $B_1 = [\{1, 2\}, \{1\}]$, $B_2 = [\{3\}, \{2\}]$ and $B_0 = [\emptyset, \{3\}]$. For the block $B_1 = [\{1, 2\}, \{1\}]$, the target block we hope to zero out is $T = [\{1, 2\}, \{2, 3\}]$. There are four admissible operations from outside of T into T: $r_3 \to r_1$, $r_3 \to r_2$, $c_1 \to c_2$, $c_1 \to c_3$. BlockReduce(T) constructs the target vector $c = Lin(A|_T)$ and the source matrix $S = \{Lin(([\delta_{1,3}]A)|_T), Lin(([\delta_{2,3}]A)|_T), Lin((A[\delta_{1,2}])|_T), Lin((A[\delta_{1,3}])|_T)\}$, illustrated as follows:
$$S = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \end{pmatrix}, \qquad c = \begin{pmatrix} 1 \\ 1 \\ 0 \\ 0 \end{pmatrix}.$$
The result of ColReduce(S, c) is
$$S = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \qquad c = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}.$$
So BlockReduce updates $A|_T$ to get the following updated matrix:
$$A' = \begin{array}{c|ccc} & c_1^{(1,1)} & c_2^{(1,2)} + c_1^{(1,1)} & c_3^{(2,1)} \\ \hline r_1^{(0,1)} & 1 & 0 & 0 \\ r_2^{(1,0)} + r_3^{(1,1)} & 1 & 0 & 0 \\ r_3^{(1,1)} & 0 & 1 & 1 \end{array}$$
and returns true since $A'|_T == 0$. Therefore, we do not merge $B_1$ with any other block. We continue to check, for the block $B_2 = [\{3\}, \{2\}]$ and $T = [\{3\}, \{1, 3\}]$, whether $A'|_T$ can be reduced to zero. There is no admissible operation for this block at all. Therefore, the matrix stays the same and BlockReduce returns false. We merge $B_2 \oplus B_0 = [\{3\}, \{2, 3\}]$.

Finally, the algorithm returns the matrix $A'$ shown above as the final result. It is the correct total diagonalization with two index blocks in $\mathcal{B}_A$: $B_1 = [\{1, 2\}, \{1\}]$ and $B_2 = [\{3\}, \{2, 3\}]$. An examination of ColReduce(S, c) in all three iterations over the columns reveals that the entire matrix A is updated by the net operations $r_3 \to r_2$ and $c_1 \to c_2$. We can further transform it back to the original form of the presentation matrix $[\partial_1]$. Observe that a row addition $r_i \leftarrow r_i + r_j$ reverts to a basis change in the opposite direction:

$$[\partial_1] = \begin{array}{c|ccc} & e_r^{(1,1)} & e_b^{(1,2)} & e_g^{(2,1)} \\ \hline v_b^{(0,1)} & t^{(1,0)} & t^{(1,1)} & 0 \\ v_r^{(1,0)} & t^{(0,1)} & 0 & t^{(1,1)} \\ v_g^{(1,1)} & 0 & t^{(0,1)} & t^{(1,0)} \end{array}$$
$$\Longrightarrow\quad [\partial_1]^* = \begin{array}{c|ccc} & e_r^{(1,1)} & e_b^{(1,2)} + t^{(0,1)} e_r^{(1,1)} & e_g^{(2,1)} \\ \hline v_b^{(0,1)} & t^{(1,0)} & 0 & 0 \\ v_r^{(1,0)} & t^{(0,1)} & 0 & 0 \\ v_g^{(1,1)} + t^{(0,1)} v_r^{(1,0)} & 0 & t^{(0,1)} & t^{(1,0)} \end{array}$$
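The net transformation can be checked numerically. A small sketch (our own code, using numpy) multiplies the matrices of Figure 11.12:

```python
# Sketch: verifying the diagonalization of Example 11.2 over Z2.
import numpy as np

A = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1]])   # binary matrix of Example 11.1
U = np.eye(3, dtype=int); U[1, 2] = 1             # row operation r2 <- r2 + r3
V = np.eye(3, dtype=int); V[0, 1] = 1             # column operation c2 <- c2 + c1

print((U @ A @ V) % 2)
# [[1 0 0]
#  [1 0 0]
#  [0 1 1]]  -- blocks [{1,2},{1}] and [{3},{2,3}], as computed above
```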

11.5 Computing presentations


Now that we know how to decompose a presentation by diagonalizing its matrix form, we describe in this section how to construct and compute these matrices. For a persistence module $H_p$ consisting of p-th homology groups, we consider a presentation $C_{p+1} \to Z_p \twoheadrightarrow H_p \to 0$, where $C_{p+1}$ is a graded module of (p+1)-chains and $Z_p$ is a graded module of p-cycles, which we describe now. Recall that a (d-parameter) simplicial filtration is a family of simplicial complexes $\{X_u\}_{u \in \mathbb{Z}^d}$ such that for each grade $u \in \mathbb{Z}^d$ and each $i = 1, \cdots, d$, $X_u \subseteq X_{u+e_i}$.

11.5.1 Graded chain, cycle, and boundary modules


We obtain a simplicial chain complex $(C_\cdot(X_u), \partial_\cdot)$ for each $X_u$ in the given simplicial filtration. For each comparable pair in the grading $u \leq v \in \mathbb{Z}^d$, a family of inclusion maps $C_\cdot(X_u) \hookrightarrow C_\cdot(X_v)$ is induced by the canonical inclusion $X_u \hookrightarrow X_v$, giving rise to the following diagram:
$$\begin{array}{ccccccc}
C_\cdot(X_u): \cdots & \xrightarrow{\partial_{p+2}} & C_{p+1}(X_u) & \xrightarrow{\partial_{p+1}} & C_p(X_u) & \xrightarrow{\partial_p} & C_{p-1}(X_u) \xrightarrow{\partial_{p-1}} \cdots \\
 & & \downarrow & & \downarrow & & \downarrow \\
C_\cdot(X_v): \cdots & \xrightarrow{\partial_{p+2}} & C_{p+1}(X_v) & \xrightarrow{\partial_{p+1}} & C_p(X_v) & \xrightarrow{\partial_p} & C_{p-1}(X_v) \xrightarrow{\partial_{p-1}} \cdots
\end{array}$$

For each chain complex $C_\cdot(X_u)$, we have the cycle spaces $Z_p(X_u)$'s and boundary spaces $B_p(X_u)$'s as kernels and images of the boundary maps $\partial_p$'s respectively, and the homology group $H_p(X_u) = Z_p(X_u)/B_p(X_u)$ as the cokernel of the inclusion map $B_p(X_u) \hookrightarrow Z_p(X_u)$. In line with category theory, we use the notations im, ker, coker for indicating both the modules of kernel, image, cokernel and the corresponding morphisms uniquely determined by their constructions³. We obtain the following commutative diagram:
$$\begin{array}{ccccc}
B_p(X_u) & \hookrightarrow & Z_p(X_u) & \xrightarrow{\;\mathrm{coker}\;} & H_p(X_u) \\
{\scriptstyle \mathrm{im}\,\partial_{p+1}}\uparrow & & \uparrow{\scriptstyle \ker \partial_p} & & \\
\cdots \; C_{p+1}(X_u) & \xrightarrow{\;\partial_{p+1}\;} & C_p(X_u) \; \cdots & &
\end{array}$$
In the language of graded modules, for each p, the family of vector spaces and linear maps (inclusions) $(\{C_p(X_u)\}_{u \in \mathbb{Z}^d}, \{C_p(X_u) \hookrightarrow C_p(X_v)\}_{u \leq v})$ can be summarized as a $\mathbb{Z}^d$-graded R-module:
$$C_p(X) := \bigoplus_{u \in \mathbb{Z}^d} C_p(X_u), \quad \text{with the ring action } t_i \cdot C_p(X_u) : C_p(X_u) \hookrightarrow C_p(X_{u+e_i}) \;\; \forall i, \forall u.$$
That is, the ring R acts as the linear maps (inclusions) between pairs of vector spaces in $C_p(X)$ with comparable grades. It is not too hard to check that this $C_p(X)$ is indeed a graded module. Each p-chain in a chain space $C_p(X_u)$ is a homogeneous element with grade $u$.
Then we have a chain complex of graded modules $(C_*(X), \partial_*)$ where $\partial_* : C_*(X) \to C_{*-1}(X)$ is the boundary morphism given by $\partial_* := \bigoplus_{u \in \mathbb{Z}^d} \partial_{*,u}$ with $\partial_{*,u} : C_*(X_u) \to C_{*-1}(X_u)$ being the boundary map on $C_*(X_u)$.
³ E.g., $\ker \partial_p$ denotes the inclusion of $Z_p$ into $C_p$.
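For a concrete feel of these grade-wise vector spaces, the following sketch (our own code) computes $\dim C_0(X_u)$ for the working example, where a vertex contributes to every grade $u$ above its birth grade; the vertex grades are those of Example 11.1:

```python
# Sketch: grade-wise dimensions of the graded chain module C_0(X).
from itertools import product

vertex_grades = [(0, 1), (1, 0), (1, 1)]   # births of vb, vr, vg

def dim_C0(u):
    """dim C_0(X_u): vertices born at a grade <= u componentwise."""
    return sum(all(g <= v for g, v in zip(gv, u)) for gv in vertex_grades)

for u in product(range(3), repeat=2):
    print(u, dim_C0(u))    # e.g. (0, 0) -> 0, (0, 1) -> 1, (1, 1) -> 3
```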

The kernel and image of a graded module morphism are also graded modules, as submodules of the domain and codomain respectively, whereas the cokernel is a quotient module of the codomain. They can also be defined grade-wise in the expected way:

For $f : M \to N$: $(\ker f)_u = \ker f_u$, $(\mathrm{im}\, f)_u = \mathrm{im}\, f_u$, $(\mathrm{coker}\, f)_u = \mathrm{coker}\, f_u$.

All the linear maps are naturally induced from the original linear maps in M and N. In our chain complex case, the kernel and image of the boundary morphism $\partial_p : C_p(X) \to C_{p-1}(X)$ are the family of cycle spaces $Z_p(X)$ and the family of boundary spaces $B_{p-1}(X)$ respectively, with linear maps induced by inclusions. Also, from the inclusion-induced morphism $B_p(X) \hookrightarrow Z_p(X)$, we have the cokernel module $H_p(X)$, consisting of the homology groups $\bigoplus_{u \in \mathbb{Z}^d} H_p(X_u)$ and linear maps induced from the inclusion maps $X_u \hookrightarrow X_v$ for each comparable pair $u \leq v$. This $H_p(X)$ is the persistence module M which we decompose. Classical persistence modules arising from a filtration of a simplicial complex over $\mathbb{Z}$ are an example of 1-parameter persistence modules, where the action $t_1 \cdot M_u \subseteq M_{u+e_1}$ signifies the linear map $M_u \to M_v$ between homology groups induced by the inclusion of the complex at $u$ into the complex at $v = u + e_1$.
In our case, we have a chain complex of graded modules and induced homology groups, which can be succinctly described by the following diagram:
$$\begin{array}{ccccccc}
B_p(X) \hookrightarrow Z_p(X) \twoheadrightarrow H_p(X) & & & & B_{p-1}(X) \hookrightarrow Z_{p-1}(X) \twoheadrightarrow H_{p-1}(X) & & \\
{\scriptstyle \mathrm{im}\,\partial_{p+1}}\uparrow \quad \uparrow{\scriptstyle \ker \partial_p} & & & & {\scriptstyle \mathrm{im}\,\partial_p}\uparrow \quad \uparrow{\scriptstyle \ker \partial_{p-1}} & & \\
\cdots \; C_{p+1}(X) & \xrightarrow{\;\partial_{p+1}\;} & C_p(X) & \xrightarrow{\;\partial_p\;} & C_{p-1}(X) & \cdots &
\end{array}$$

An assumption. We always assume that the simplicial filtration is 1-critical, which means that
each simplex has a unique earliest birth time. For the case which is not 1-critical, called multi-
critical, one may utilize the mapping telescope, a standard algebraic construction [186], which
transforms a multi-critical filtration to a 1-critical one. However, notice that this transformation
increases the input size depending on the multiplicity of the incomparable birth times of the
simplices. For 1-critical filtrations, each module C p is free. With a fixed basis for each free
module C p , a concrete matrix [∂ p ] for each boundary morphism ∂ p based on the chosen bases can
be constructed.
With this input, we discuss our strategies for different cases that depend on two parameters: $d$, the number of parameters of the filtration function, and $p$, the dimension of the homology groups in the persistence modules.
Note that a presentation gives an exact sequence $F^1 \to F^0 \twoheadrightarrow H \to 0$. To reveal further details of a presentation of H, we recognize that it respects the following commutative diagram, in which $F^1$ surjects onto $\mathrm{im}\, f^1 = \ker f^0 = Y^1$, which in turn includes into $F^0$:
$$\begin{array}{ccccc}
 & & Y^1 & & \\
 & {\scriptstyle \mathrm{im}\, f^1}\nearrow & & \searrow{\scriptstyle \ker f^0} & \\
F^1 & & \xrightarrow{\;f^1\;} & & F^0 \;\xrightarrow{\;f^0 = \mathrm{coker}\, f^1\;}\; H
\end{array}$$

where $Y^1 \hookrightarrow F^0$ is the kernel of $f^0$. With this diagram being commutative, all maps in this diagram are essentially determined by the presentation map $f^1$. We call the surjective map $f^0 : F^0 \to H$ the generating map, and $Y^1 = \ker f^0$ the 1st syzygy module of H.

11.5.2 Multiparameter filtration, zero-dimensional homology


In this case, $p = 0$ and $d > 0$. We obtain a presentation matrix straightforwardly from the observation that the module $Z_0$ of cycle spaces coincides with the module $C_0$ of chain spaces.

• Presentation: $C_1 \xrightarrow{\;\partial_1\;} C_0 \xrightarrow{\;\mathrm{coker}\,\partial_1\;} H_0$.

• Presentation matrix = $[\partial_1]$, given as part of the input.

Justification. For $p = 0$, the cycle module $Z_0 = C_0$ is a free module. So we have the presentation of $H_0$ as claimed. It is easy to check that $\partial_1 : C_1 \to C_0$ is a presentation of $H_0$ since both $C_1$ and $C_0$ are free modules. With the standard bases of the chain modules $C_p$'s, we have a presentation matrix $[\partial_1]$ as a valid input to our decomposition algorithm.

The 0-th homology in our working example (Figure 11.5) corresponds to this case. The presentation matrix is the same as the matrix of the boundary morphism $\partial_1$.
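Assembling this input from a 1-critical bifiltration is mechanical. The sketch below (our own data layout, populated with the working example) produces the binary matrix together with the row and column grades, which in turn determine the admissible operations for TotDiagonalize:

```python
# Sketch: building the p = 0 presentation matrix [del_1] from a bifiltration
# given as vertices and edges with their birth grades (data of Example 11.1).

vertices = {"vb": (0, 1), "vr": (1, 0), "vg": (1, 1)}
edges = {"er": (("vb", "vr"), (1, 1)),
         "eb": (("vb", "vg"), (1, 2)),
         "eg": (("vr", "vg"), (2, 1))}

row_names = list(vertices)            # generators of C0
col_names = list(edges)               # relations coming from C1
matrix = [[1 if v in edges[e][0] else 0 for e in col_names] for v in row_names]
row_grades = [vertices[v] for v in row_names]
col_grades = [edges[e][1] for e in col_names]

print(matrix)       # [[1, 1, 0], [1, 0, 1], [0, 1, 1]]
print(row_grades)   # [(0, 1), (1, 0), (1, 1)]
print(col_grades)   # [(1, 1), (1, 2), (2, 1)]
```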

11.5.3 2-parameter filtration, multi-dimensional homology


In this case, $d = 2$ and $p \geq 0$. Lesnick and Wright [222] present an algorithm to compute a presentation, in fact a minimal presentation, for this case. When $d = 2$, by the Hilbert Syzygy Theorem [191], the kernel of a morphism between two free graded modules is always free. This implies that the canonical surjective map $Z_p \twoheadrightarrow H_p$ from the free module $Z_p$ can be naturally chosen as a generating map in the presentation of $H_p$. In this case we have:

• Presentation: $C_{p+1} \xrightarrow{\;\bar\partial_{p+1}\;} Z_p \xrightarrow{\;\mathrm{coker}\,\bar\partial_{p+1}\;} H_p$, where $\bar\partial_{p+1}$ is the map induced from the diagram:
$$\begin{array}{ccccc}
B_p & \hookrightarrow & Z_p & \twoheadrightarrow & H_p \\
{\scriptstyle \mathrm{im}\,\partial_{p+1}}\uparrow & {\scriptstyle \bar\partial_{p+1}}\nearrow & \uparrow{\scriptstyle \ker \partial_p} & & \\
C_{p+1} & \xrightarrow{\;\partial_{p+1}\;} & C_p & &
\end{array}$$

• Presentation matrix = $[\bar\partial_{p+1}]$, constructed as follows:

1. Compute a basis $G(Z_p)$ for the free module $Z_p$, where $G(Z_p)$ is presented as a set of generators in the basis of $C_p$. This can be done by an algorithm in [222]. Take $G(Z_p)$ as the row basis of the presentation matrix $[\bar\partial_{p+1}]$.

2. Present $\mathrm{im}\,\partial_{p+1}$ in the basis $G(Z_p)$ to get the presentation matrix $[\bar\partial_{p+1}]$ of the induced map, as follows (see the sketch after this list). Originally, $\mathrm{im}\,\partial_{p+1}$ is presented in the basis of $C_p$ through the given matrix $[\partial_{p+1}]$. One needs to rewrite each column of $[\partial_{p+1}]$ in the basis $G(Z_p)$ computed in the previous step. This can be done as follows. Let $[G(Z_p)]$ denote the matrix presenting the basis elements of $G(Z_p)$ in the basis of $C_p$. Let c be any column vector in $[\partial_{p+1}]$. We reduce c to the zero vector by the matrix $[G(Z_p)]$ and note the columns that are added to c. These columns provide the necessary presentation of c in the basis $G(Z_p)$. This reduction can be done by the persistence algorithm described in Chapter 3.
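The rewriting in step 2 is an instance of the reduction just described. Below is a minimal sketch of it over $\mathbb{Z}_2$ (our own code), assuming the columns of $[G(Z_p)]$ have distinct pivots, as can be arranged by a standard reduction, and that c lies in their span (it does, since c is a cycle and $G(Z_p)$ is a basis of $Z_p$):

```python
# Sketch: expressing a column c of [del_{p+1}] in the basis G(Z_p) over Z2.
# Columns are sets of row indices (their indices in the basis of C_p).

def express_in_basis(G_cols, c):
    """Return indices of basis columns whose Z2-sum equals c, assuming the
    basis columns have distinct pivots (max row index)."""
    pivot_of = {max(g): i for i, g in enumerate(G_cols)}
    c, coords = set(c), set()
    while c:
        j = pivot_of[max(c)]          # the unique basis column with this pivot
        c ^= G_cols[j]                # Z2 subtraction = symmetric difference
        coords ^= {j}
    return sorted(coords)

# Toy usage: G has columns {0} and {0,1}; the chain {1} is their Z2-sum.
print(express_in_basis([{0}, {0, 1}], {1}))   # [0, 1]
```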

Justification. Unlike the p = 0 case, for p > 0 we only know that Z_p is a (proper) submodule of C_p, which means that Z_p is not necessarily equal to the free module C_p. Fortunately, for d = 2 the module Z_p is free, and we have an efficient algorithm to compute a basis of Z_p as the kernel of the boundary map ∂_p : C_p → C_{p−1}. Then, we can construct the following presentation of H_p:

    C_{p+1} --∂̄_{p+1}--> Z_p --coker ∂̄_{p+1}--> H_p → 0,   with B_p = im ∂_{p+1} ⊆ Z_p.

Here ∂̄_{p+1} is the map induced by ∂_{p+1}. With a fixed basis of Z_p and the standard basis of C_{p+1}, we rewrite the matrix [∂_{p+1}] to get [∂̄_{p+1}], which constitutes a valid input to our decomposition algorithm.

11.5.4 d > 2-parameter filtration, multi-dimensional homology


The above construction of a presentation matrix cannot be extended straightforwardly to d-parameter persistence modules with d > 2. Unlike the case d ≤ 2, the cycle module Z is not necessarily free when d > 2. The issue caused by a non-free Z is that, if we use the same presentation matrix as in the previous case with a free Z, we may lose some relations coming from the inner relations of a generating set of Z. One can fix this problem by adding these inner relations to the presentation matrix as detailed in [142]. The construction is more involved and we skip it here.
Figure 11.13 shows a simple example of a filtration of a simplicial complex whose persistence module H_p for p = 1 is a quotient module of a non-free module Z. The module H_1 is generated by three 1-cycles presented as g_1^{(0,1,1)}, g_2^{(1,0,1)}, g_3^{(1,1,0)}. But when they appear together at grade (1, 1, 1), there is a relation among the three: t^{(1,0,0)} g_1^{(0,1,1)} + t^{(0,1,0)} g_2^{(1,0,1)} + t^{(0,0,1)} g_3^{(1,1,0)} = 0. Although im ∂_2 = 0, we still have a nontrivial relation from Z. So, we have H_1 = ⟨ g_1^{(0,1,1)}, g_2^{(1,0,1)}, g_3^{(1,1,0)} : s^{(1,1,1)} = t^{(1,0,0)} g_1^{(0,1,1)} + t^{(0,1,0)} g_2^{(1,0,1)} + t^{(0,0,1)} g_3^{(1,1,0)} ⟩. The presentation matrix turns out to be the following:

                      s^{(1,1,1)}
    g_1^{(0,1,1)}  [  t^{(1,0,0)}  ]
    g_2^{(1,0,1)}  [  t^{(0,1,0)}  ]
    g_3^{(1,1,0)}  [  t^{(0,0,1)}  ]

Figure 11.13: An example of a filtration of a simplicial complex for d = 3 with non-free Z for p = 1. The three cycles at grades (0, 1, 1), (1, 0, 1), (1, 1, 0) are three generators of Z_1. However, at grade (1, 1, 1), the earliest grade at which these three cycles exist simultaneously, there is a relation among the three generators.

11.5.5 Time complexity


Now we consider the time complexity for computing the presentation and decomposition together. Let n be the size of the input filtration, that is, the total number of simplices, where at most one new simplex enters at a grid point of Z^d. We consider three different cases as before:

Multi-parameter, 0-th homology: In this case, the presentation matrix [∂_1], where ∂_1 : C_1 → C_0, has size O(n) × O(n), that is, N = O(n). Therefore, the total time complexity for this case is O(n^{2ω+1}).
2-parameter, multi-dimensional homology: In this case, as described in Section 11.5.3, first we compute a basis G(Z_p) presented in the basis of C_p. This is done by the algorithm of Lesnick and Wright [222], which runs in O(n^3) time. Using [G(Z_p)], we compute the presentation matrix [∂̄_{p+1}] as described in Section 11.5.3. This can be done in O(n^3) time assuming that G(Z_p) has at most O(n) elements. The presentation matrix is decomposed with TotDiagonalize as in the previous case. However, to claim that it runs in O(n^{2ω+1}) time, one needs to ensure that the basis G(Z_p) has O(n) elements. This follows from the fact that Z_p, being a free submodule of C_p, cannot have rank larger than that of C_p. In summary, the total time complexity in this case becomes O(n^3) + O(n^{2ω+1}) = O(n^{2ω+1}).
d-parameter, d ≥ 2, multi-dimensional homology: For d-parameter persistence modules with d ≥ 2 (this subsumes the previous case), an algorithm using a result of Skryzalin [279] can be designed that runs in O(n^{d+1}) time and produces a presentation matrix of dimensions O(n^{d−1}) × O(n^{d−1}); see [142] for details. Plugging in N = O(n^{d−1}) and taking the computation of the presentation matrix into account, we get a time complexity bound of O(n^{d+1}) + O(n^{(2ω+1)(d−1)}) = O(n^{(2ω+1)(d−1)}).

11.6 Invariants
For a given persistence module, it is useful to compute invariants that in some sense summarize the information contained in it. Ideally, these invariants should characterize the input module completely, meaning that the invariants of two modules should be equal if and only if the modules are isomorphic. Persistence diagrams for 1-parameter tame persistence modules are such invariants. For multiparameter persistence modules, no such complete invariants exist that are finite and hence computable. However, we can still aim for invariants that are computable and characterize the modules in some limited sense, meaning that these invariants remain equal for isomorphic modules though they may not differentiate non-isomorphic modules. Of course, their effectiveness in practice is determined by their discriminative power. We present two such invariants below: the first one, the rank invariant, was suggested in [65], whereas the second one, the graded Betti numbers, was brought to TDA by [214] and studied further in [221].

11.6.1 Rank invariants


Assume that the input graded module M is finitely generated as before and additionally finitely supported. For this we need to define the support of M.

Definition 11.17 (Support). Let M be a Z^d-graded module. Its support is defined as the graph supp(M) = (V, E ⊆ V × V) where a node v ∈ V if and only if M_v ≠ 0, and an edge (u, v) ∈ E if and only if (i) u < v and there is no s ∈ Z^d satisfying u < s < v, and (ii) rank(M_u → M_v) ≠ 0. We say M is finitely supported if supp(M) is finite.
Fact 11.2. supp(M) is disconnected if there exist two grades u < v in supp(M) so that rank(Mu →
Mv ) = 0.
For a finitely generated and finitely supported module M, we can compute a finite number of ranks of linear maps which collectively form the rank invariant of M. For two grades u ≰ v, the linear maps between M_u and M_v are not defined; in the following definitions, we take them as zero maps.
Definition 11.18 (Rank invariant). Let ruv (M) = rank(Mu → Mv ) for any pair u, v ∈ supp(M).
The collection {ruv (M)}u,v∈supp(M) is called the rank invariant of M.
Fact 11.3. The rank invariant of a 1-parameter module is a complete invariant. For a 1-parameter persistence module H_p, it is given by the persistent Betti numbers β_p^{i,j} as defined in Definition 3.4.
Although in the 1-parameter case the rank invariant provides complete information about the module, it does not do so for multiparameter persistence modules. For example, it cannot provide information about the 'birth' and 'death' of the generators. This information can be deduced from a wider collection of rank invariant data called the multirank invariant, where we compute ranks of the linear maps between direct sums of the vector spaces at multiple grades. The multirank invariant is still not a complete invariant.

Definition 11.19 (Multirank invariant). The collection {r_{UV}(M)} for every pair U ⊆ supp(M) and V ⊆ supp(M), where r_{UV}(M) = rank(⊕_{u∈U} M_u → ⊕_{v∈V} M_v), is called the multirank invariant of M.

We can retrieve the information about birth and death of generators from the multirank invariant. For a grade u, define its immediate predecessors P_u and immediate successors S_u as:

    P_u = {u′ ∈ supp(M) | u′ < u and ∄ u″ with u′ < u″ < u}
    S_u = {u′ ∈ supp(M) | u′ > u and ∄ u″ with u < u″ < u′}.

Fact 11.4.

1. We have that m generators are born at grade u if and only if coker(⊕_{u′∈P_u} M_{u′} → M_u) has dimension m.

2. We have that m generators die leaving grade u if and only if ker(M_u → ⊕_{u′∈S_u} M_{u′}) has dimension m.
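For instance, if M is an interval module supported on the single grade u, then P_u and S_u are both empty, so the cokernel and the kernel above both equal M_u ≅ k: a single generator is born at u and dies leaving u.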

Although multirank invariants cannot characterize multiparameter persistence modules completely in general, they do so for the special case of interval decomposable modules. We will describe these modules in detail in the next chapter; here we introduce them briefly.

We call I ⊆ supp(M) an interval if I is connected and for every u, v ∈ I, if u < w < v, then w ∈ I. We call a persistence module supported on an interval an interval module if M_u is one-dimensional for every vertex u ∈ supp(M). A persistence module M is called interval decomposable if there is a decomposition M = ⊕ M^i where each M^i is an interval module.

Fact 11.5. Two interval decomposable modules are isomorphic if and only if they have the same
multirank invariants.

11.6.2 Graded Betti numbers and blockcodes


For 1-parameter persistence modules, the barcodes provide a complete invariant. For multiparameter persistence, we first introduce an invariant called the graded Betti numbers, which we refine further to define persistent graded Betti numbers as a generalization of persistence diagrams. The decomposition of a module also allows us to define blockcodes as a generalization of barcodes. Both of them depend on the ideas of free resolutions and graded Betti numbers, which are well studied in commutative algebra and were first introduced to topological data analysis by Knudson [214].

Definition 11.20 (Free resolution). For a graded module M, a free resolution F → M is an exact sequence

    ⋯ --f^3--> F^2 --f^2--> F^1 --f^1--> F^0 --f^0--> M → 0

where each F^i is a free graded R-module.

Now we observe that a free resolution can be obtained as an extension of a free presentation.
Consider a free presentation of M as depicted below.
    F^1 --f^1--> F^0 --f^0 = coker f^1--> M,   where f^1 factors as F^1 ↠ Y^1 ↪ F^0 with Y^1 = im f^1 = ker f^0.

If the presentation map f^1 has a nontrivial kernel, we can find a nontrivial map f^2 : F^2 → F^1 with im f^2 = ker f^1, which implies coker f^2 ≅ im f^1 = ker f^0 = Y^1. Therefore, f^2 is in fact a presentation map of the module Y^1, the so-called first syzygy module of M (named after Hilbert's famous syzygy theorem [191]). We can keep going to get f^3, f^4, … by constructing presentation maps on the higher order syzygy modules Y^2, Y^3, … of M, which results in the diagram depicted below, giving a free resolution of M:

    ⋯ → F^3 --f^3--> F^2 --f^2--> F^1 --f^1--> F^0 --f^0 = coker f^1--> M,

where Y^3 = im f^3 = ker f^2, Y^2 = im f^2 = ker f^1, and Y^1 = im f^1 = ker f^0.
A free resolution is not unique. However, there exists an essentially unique minimal free resolution in the sense that any free resolution can be obtained by summing the minimal free resolution with a free resolution of a trivial module. Below we give a construction that builds a minimal free resolution from a minimal free presentation. The proof that it indeed creates a minimal free resolution can be found in [50, 268].

Construction of a minimal free resolution. Choose a minimal set of homogeneous generators g_1, ⋯, g_n of M. Let F^0 = ⊕_{i=1}^n R^{→gr(g_i)} with standard basis e_1^{gr(g_1)}, ⋯, e_n^{gr(g_n)} of F^0. The homogeneous R-map f^0 : F^0 → M is determined by f^0(e_i) = g_i. Now the 1st syzygy module of M, Y^1 = ker f^0 ↪ F^0, is again a finitely generated graded R-module. We choose a minimal set of homogeneous generators y_1, ⋯, y_m of Y^1 and let F^1 = ⊕_{j=1}^m R^{→gr(y_j)} with standard basis e′_1^{gr(y_1)}, ⋯, e′_m^{gr(y_m)} of F^1. The homogeneous R-map f^1 : F^1 → F^0 is determined by f^1(e′_j) = y_j. By repeating this procedure for Y^2 = ker f^1 and moving backward further, one gets a graded free resolution of M.

Definition 11.21 (Graded Betti numbers). Let F^j be a free module in the minimal free resolution of a graded module M. Let β^M_{j,u} be the multiplicity of the grade u ∈ Z^d in the multiset consisting of the grades of the homogeneous basis elements of F^j. Then, the mapping β^M_{(−,−)} : Z_{≥0} × Z^d → Z_{≥0} is an invariant called the graded Betti numbers of M.

For example, the graded Betti numbers of the persistence module for our working example in Figure 11.5 are listed in Table 11.1.
L i
Definition 11.22 (Persistent graded Betti numbers). Let M  M be a total decomposition
i
of a graded module M. We have for each indecomposable M , the refined graded Betti numbers
i i Mi
β M = {β Mj,u | j ∈ N, u ∈ Z }. We call the set PB(M) := {β } the persistent graded Betti numbers
d

of M.

For the working example in Figure 11.5, the persistent graded Betti numbers are given in two
tables listed in Table 11.2.

    β^M      (1,0)  (0,1)  (1,1)  (2,1)  (1,2)  (2,2)  ⋯
    β_0        1      1      1
    β_1                      1      1      1
    β_2                                           1
    β_{≥3}

Table 11.1: All the nonzero graded Betti numbers β_{i,u} are listed in the table. Empty entries are all zeros.

One way to summarize the information of graded Betti numbers is to use the Hilbert function, also called the dimension function [141], defined as:

    dm M : Z^d → Z_{≥0},   dm M(u) = dim(M_u).

Fact 11.6. The graded Betti numbers and the dimension function of a persistence module are related as follows:

    ∀u ∈ Z^d :  dm M(u) = Σ_{v≤u} Σ_j (−1)^j β_{j,v}.
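As a quick illustration, the following minimal Python sketch (a hypothetical helper, not from the book) evaluates the right-hand side of this relation from a dictionary of graded Betti numbers, using the Betti numbers of the summand M^2 of our working example (see Table 11.2 below):

```python
# Evaluate dmM(u) = sum over v <= u and j of (-1)^j * beta_{j,v} (Fact 11.6).
# betti maps (j, grade) -> multiplicity, with grades as tuples in Z^d.

def dim_function(betti, u):
    return sum((-1) ** j * b
               for (j, v), b in betti.items()
               if all(vi <= ui for vi, ui in zip(v, u)))

# Graded Betti numbers of the summand M^2 from Table 11.2:
betti_M2 = {(0, (1, 1)): 1, (1, (2, 1)): 1, (1, (1, 2)): 1, (2, (2, 2)): 1}
print(dim_function(betti_M2, (1, 1)))  # 1, matching dmM^2 in Eqn. (11.2)
print(dim_function(betti_M2, (2, 2)))  # 1 - 1 - 1 + 1 = 0
```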

Then for each indecomposable M^i, we have the dimension function dm M^i determined by the persistent graded Betti numbers restricted to M^i.

Definition 11.23 (Blockcode). The set of dimension functions Bdm (M) := {dmM i } is called the
blockcode of M.

For our working example, the dimension functions of the indecomposable summands M^1 and M^2 are (see Figure 11.14 for the visualization):

    dm M^1(u) = 1 if u ≥ (1, 0) or u ≥ (0, 1), and 0 otherwise;
    dm M^2(u) = 1 if u = (1, 1), and 0 otherwise.            (11.2)

    β^{M^1}  (1,0)  (0,1)  (1,1)  (2,1)  (1,2)  (2,2)  ⋯
    β_0        1      1
    β_1                      1
    β_{≥2}

    β^{M^2}  (1,0)  (0,1)  (1,1)  (2,1)  (1,2)  (2,2)  ⋯
    β_0                      1
    β_1                             1      1
    β_2                                           1
    β_{≥3}

Table 11.2: Persistent graded Betti numbers PB(M) = {β^{M^1}, β^{M^2}}. All nonzero entries are listed in this table; blank entries indicate 0.


Figure 11.14: (top) The 2-parameter simplicial filtration for our working example in Figure 11.5; dm M^1 and dm M^2: each colored square represents a 1-dimensional vector space k and each white square represents a 0-dimensional vector space. In the middle picture, M^1 is generated by v_b^{(0,1)}, v_r^{(1,0)}, drawn as a blue dot and a red dot respectively; they are merged at (1, 1) by the red edge e_r. In the right picture, M^2 is generated by v_g^{(1,1)} + t^{(0,1)} v_r^{(1,0)}, represented by the green circle and the red circle together at (1, 1). After this point (1, 1), the generator is modded out to zero by the relation of e_g starting at (2, 1), represented by the green dashed line segment, and by the relation of e_b + t^{(0,1)} e_r starting at (1, 2), represented by the blue dashed line segment connected with the red dashed line segment.

We can read some useful information from the dimension function of each indecomposable. Take the dimension functions of our working example. For dm M^1, two connected components are born at the two left-bottom corners of the purple region. They are merged immediately when they meet at grade (1, 1); after that, they persist forever as one connected component. For dm M^2, one connected component is born at the left-bottom corner of the square green region. Later, at the grades of the left-top and right-bottom corners of the green region, it is merged with other connected components with smaller grades of birth. Therefore, it persists only within this green region.
In general, neither persistent graded Betti numbers nor blockcodes suffice to classify multiparameter persistence modules, which means they are not complete invariants. As indicated in [64], there is no complete discrete invariant for multiparameter persistence modules. However, interestingly, these two invariants are indeed complete invariants for interval decomposable modules like this example, which we will study in the next chapter.

11.7 Notes and Exercises


In one of the first extensions of the 1-parameter persistence algorithm, the authors in [9] presented a matrix reduction based algorithm which applies to a very special case of commutative ladders C_n for n ≤ 4 defined on a subgrid of Z^2. The matrix construction and the algorithm are very different from the ones presented here. This algorithm may not terminate if the input does not satisfy the stated assumption.
We have already mentioned that the Meataxe algorithm [251], known in the computational algebra community, can be used for more general modules and hence for persistence modules. The main advantage of this algorithm is that it applies to general persistence modules, but a major disadvantage is that it runs very slowly. Even allowing approximation, the algorithm [197] runs in O(N^{3(d+1)} log q) time (or O(N^{4(d+1)} log q) as conjectured in [196] because of some special cases mentioned in [197]), where N is the number of generators and relations in the input module defined over the polynomial ring Z_q[t_1, t_2, …, t_d].
Under suitable finiteness conditions, the fact that persistence modules are indeed finitely presented graded modules over multivariate polynomial rings was first recognized by Carlsson et al. [64, 65] and Knudson [214], and further studied by Lesnick et al. [220, 222]. The graded module structure studied in algebraic geometry and commutative algebra [155, 231] encodes a lot of information and thus can be leveraged for designing efficient algorithms. Lesnick and Wright [222] leveraged this fact to design an efficient algorithm for computing minimal presentations of 2-parameter persistence modules from an input 2-parameter simplicial filtration. Recognizing the power of expressing graded modules in terms of presentations, Dey and Xin [142] proposed the decomposition algorithm using matrix equivalents of presentations and their direct sums. The materials in this chapter are mostly taken from this paper. This decomposition algorithm can be viewed as a generalization of the classical 1-parameter persistence algorithm, though the matrix reduction technique is more involved because it has to accommodate constraints on grades. The algorithm in [142] handles these constraints using the technique of matrix linearization as described in Section 11.4.
As a generalization of the 1-parameter persistence algorithm, one would like to interpret the algorithm in [142] as computing invariants such as persistence diagrams or barcodes. A roadblock to this goal is that d-parameter persistence modules do not have complete discrete invariants for d ≥ 2 [65, 220]. Consequently, one needs to invent other invariants suitable for multiparameter persistence modules. The rank invariants and multirank invariants described in Section 11.6.1 serve this purpose. There is a related notion of generalized persistence diagram introduced by Patel [254] and further studied in [212].
One natural approach taking advantage of a decomposition algorithm is to consider the decomposition and take discrete invariants of each indecomposable component. This gives invariants which may not be complete but still contain rich information. We mentioned two interpretations of the output of the algorithm presented in this chapter as two different invariants: persistent graded Betti numbers as a generalization of persistence diagrams, and blockcodes as a generalization of barcodes. The persistent graded Betti numbers are linked to the graded Betti numbers studied in commutative algebra, brought to TDA by [214]. The bigraded Betti numbers are further studied in [222]. By constructing a free resolution of a persistence module, we can compute its graded Betti numbers and then decompose them according to each indecomposable module, which results in the persistent graded Betti numbers. For each indecomposable, we apply the dimension function [141], also known as the Hilbert function in commutative algebra, to summarize the graded Betti numbers of each indecomposable module. This constitutes a blockcode for the indecomposable module of the persistence module. The blockcode is a good vehicle for visualizing lower dimensional persistence modules such as 2- or 3-parameter persistence modules. For details on these invariants, see [142].

Exercises
1. Using the matrix diagonalization algorithm as described in this chapter, devise an algo-
rithm to compute a minimal presentation of a 2-parameter persistence module given by a
simplicial filtration over Z2 .

2. Give an example of a 2-parameter simplicial filtration over Z2 at least one of whose decom-
posables is not free.

3. Give an example of a 2-parameter simplicial filtration over Z2 at least one of whose decom-
posables does not have all of its non-trivial vector spaces over the grades being isomorphic.

4. Give examples of 2-parameter persistence modules M with three generators and relations that have the following properties: (i) M is indecomposable, (ii) M has two indecomposables, (iii) M has three indecomposables.

5. Prove that the cycle module Z p arising from a 2-parameter simplicial filtration is always
free.

6. Design a polynomial time algorithm for computing decomposition of the persistence mod-
ule induced by a given simplicial filtration over Z2 when a simplex can be a generator at
different grades.

7. Let A be a presentation matrix with n generators and relations whose grades are distinct and totally ordered. Design an O(n^3) time algorithm to decompose A. Interpret the types of the indecomposables in such a case.

8. The algorithm TotDiagonalize has been written assuming that the field of the polynomial
ring is Z2 . Write it for a general finite field.

9. Give an example of two non-isomorphic 2-parameter persistence modules which have the same rank invariant.

10. Design an efficient algorithm to compute the rank invariant of a module from the simplicial
filtration inducing it.

11. Prove that a 2-parameter persistence module M is an interval (see Section 11.6.1) if and
only if supp(M) is connected and each Mu for u ∈ supp(M) has dimension 1.

12. Suppose that a 2-parameter persistence module is given by a presentation matrix. Design an
algorithm to determine if M is interval or not without decomposing the input matrix (hint:
consider computing graded Betti numbers from the grades of the rows and columns of the
matrix).

13. Show that for a finitely presented (finite number of generators and relations) graded module
M, there exist two interval decomposable graded modules M 1 and M 2 so that the rank
invariants (Definition 11.18) satisfy ruv (M) = ruv (M 1 ) − ruv (M 2 ) for every u, v ∈ supp(M).
Given a presentation matrix for M, compute such M 1 and M 2 efficiently.

14. Write a pseudocode for the construction of a minimal free resolution given in Section 11.6.2.
Analyze its complexity.
Chapter 12

Multiparameter Persistence and


Distances

We have seen that persistence modules are important objects of study in topological data analysis
in that they serve as an intermediate between the raw input data and the output summarization
with persistence diagrams. For the 1-parameter case, the distances between modules can be computed from bottleneck distances between the corresponding persistence diagrams. For multiparameter persistence modules, we already saw in Chapter 11 that the indecomposables, which are the analogues of bars in the 1-parameter case, are more complicated. So, defining distances between persistence modules in terms of indecomposables also becomes more complicated. However, we need a distance or a distance-like notion between persistence modules to compare the input data inducing them.

Figure 12.1: A 2-parameter module is sliced by lines that provide the matching distance between two modules, as we explain in Section 12.3. The figure is an output of the RIVET software due to [221], courtesy of Michael Lesnick and Matthew Wright (2015, fig. 3).


Figure 12.1 shows an output of the RIVET software [221] that implements the so-called matching distance between 2-parameter persistence modules. In this chapter, we describe some of the distances proposed in the literature and algorithms for computing them efficiently (in polynomial time).

The interleaving distance d_I between 1-parameter persistence modules as defined in Chapter 3 provides a useful means to compare them. Fortunately, for 1-parameter persistence modules, it can be computed exactly by computing the bottleneck distance d_b between their persistence diagrams, thanks to the isometry theorem [220] (see also [23, 80]). Chapter 3 gives an O(n^{1.5} log n) time algorithm for computing the bottleneck distance. The status, however, is not so well settled for multiparameter persistence modules.
One of the difficulties facing the definition and computation of distances among multiparameter persistence modules is the fact that their indecomposables do not have a finite characterization, as indicated in Chapter 11. This is true even for finitely generated modules, though a unique decomposition is guaranteed by the Krull-Schmidt theorem [10]. Despite this difficulty, one can define an interleaving distance d_I for multiparameter persistence modules which can be viewed as an extension of the interleaving distance defined for 1-parameter persistence modules. As shown by Lesnick [220], this distance is the most fundamental one because it is the most discriminative distance among persistence modules that is also stable with respect to the functions or simplicial filtrations that give rise to the modules. Unfortunately, it turns out that computing d_I for n-parameter persistence modules, and even approximating it within a factor less than 3, is NP-hard for n ≥ 2. For a special case of modules called interval modules, d_I can be computed in polynomial time. In Section 12.2, we introduce the interleaving distance for multiparameter persistence modules. We follow it with a polynomial time algorithm [141] in Section 12.4.3 which computes d_I for 2-parameter interval modules.
To circumvent the problem of computing interleaving distances, several other distances have been proposed in the literature that are computable in polynomial time and bound the interleaving distance either from above or from below, but not both in the general case. Given the NP-hardness of approximating the interleaving distance, there cannot exist any polynomial time computable distance that bounds d_I both from above and below within a factor of 3 unless P = NP. The matching distance d_m as defined in Section 12.3 bounds d_I from below, that is, d_m ≤ d_I, and it can be computed in polynomial time.
Finally, in Section 12.4, we extend the definition of the bottleneck distance to multiparameter persistence modules. Extending the concept from the 1-parameter case, one can define d_b as the supremum of the pairwise interleaving distances between indecomposables under an optimal matching. Then, straightforwardly, d_I ≤ d_b, but the converse is not necessarily true. It is known that no lower bound on d_I in terms of d_b may exist even for a special class of 2-parameter persistence modules called interval decomposable modules [47]. However, d_b can be useful as a reasonable upper bound on d_I. Unfortunately, a polynomial time algorithm for computing d_b is not known for general persistence modules. For some persistence modules whose indecomposables have constant-size descriptions, such as block decomposable modules, one can compute d_b in polynomial time simply because computing the interleaving distance between any two modules with constant-size descriptions cannot take more than O(1) time.
In Section 12.4, we consider a special class of persistence modules whose indecomposables are intervals and present a polynomial time algorithm for computing d_b for them. These are modules whose indecomposables are supported on "stair-case" polyhedra. Our algorithm assumes that all indecomposables are given and computes d_b exactly for 2-parameter interval decomposable modules. Although the algorithm can be extended to persistence modules with a larger number of parameters, we choose to present it only for the 2-parameter case for simplicity, while not losing the essential ingredients of the general case. The indecomposables required as input can be computed by the decomposition algorithm presented in the previous chapter (Chapter 11).

12.1 Persistence modules from categorical viewpoint


In this chapter we define persistence modules as categorical structures, which differ from the graded structures used in the previous chapter. Other than introducing a different viewpoint of persistence modules, we do so because this definition is more amenable to defining distances. Thanks to representation theory [65, 108, 214], the two notions coincide when the modules are finitely generated in the graded module definition (Definition 11.5) and are of finite type (Definition 12.5) in the categorical definition. Let us recall the definition in the 1-parameter case. A persistence module M parameterized over A = Z or R is defined by a sequence of vector spaces M_x, x ∈ A, with linear maps ρ_{x,y} : M_x → M_y so that ρ_{x,x} is the identity for every x ∈ A and for all x, y, z ∈ A with x ≤ y ≤ z, one has ρ_{x,z} = ρ_{y,z} ∘ ρ_{x,y}. These conditions can be formulated using category theory.

Definition 12.1 (Category). A category C is a set of objects Obj C with a set of morphisms
hom(x, y) for every pair of elements x, y ∈ Obj C where

1. for every x ∈ Obj C, there is a special identity morphism 1 x ∈ hom(x, x);

2. if f ∈ hom(x, y) and g ∈ hom(y, z), then g ◦ f ∈ hom(x, z);

3. for homomorphisms f, g, h, the compositions wherever defined are associative, that is, ( f ◦
g) ◦ h = f ◦ (g ◦ h);

4. 1 x ◦ f x,y = f x,y and f x,y ◦ 1y = f x,y for every pair x, y ∈ Obj C.

All sets form a category Set with functions between them playing the role of morphisms. Topological spaces form a category Top with continuous maps between them being the morphisms. Vector spaces form the category Vec with linear maps between them being the morphisms. A poset P forms a category with every pair x, y ∈ P admitting at most one morphism; hom(x, y) has one element if x ≤ y and is empty otherwise. Such a category is called a thin category in the literature, for which the composition rules take a trivial form.

Definition 12.2 (Functor). A functor between two categories C and D is an assignment F : C → D satisfying the following conditions:

1. for every x ∈ Obj C, F(x) ∈ Obj D;

2. for every morphism f ∈ hom(x, y), F( f ) ∈ hom(F(x), F(y));

3. F respects composition, that is, F( f ◦ g) = F( f ) ◦ F(g);



4. F preserves identity morphisms, that is, F(1 x ) = 1F(x) for every x ∈ Obj C.

One can observe that a 1-parameter persistence module is a functor from the poset category Z (or R) with its total order to the category Vec. Homology groups with field coefficients provide a functor from the category Top to the category of vector spaces Vec. We can also define maps between functors themselves.

Definition 12.3 (Natural transformation). Given two functors F, G : C → D, a natural transformation η from F to G, denoted η : F ⟹ G, is a family of morphisms {η_x : F(x) → G(x)} for every x ∈ Obj C so that for every morphism ρ ∈ hom(x, y) the following diagram commutes:

    F(x) --F(ρ)--> F(y)
      | η_x          | η_y
    G(x) --G(ρ)--> G(y)

Let k be a field, Vec be the category of vector spaces over k, and vec be the subcategory of
finite dimensional vector spaces. As usual, for simplicity, we assume k = Z2 .

Definition 12.4 (Persistence module). Let P be a poset category. A P-indexed persistence module
is a functor M : P → Vec. If M takes values in vec, we say M is pointwise finite dimensional
(p.f.d.). The P-indexed persistence modules themselves form another category where the natural
transformations between functors constitute the morphisms.

Definition 12.5 (Finite type). A P-indexed persistence module M is said to have finite type if M
is p.f.d. and all morphisms M(x ≤ y) are isomorphisms outside a finite subset of P.

Here we consider the poset category to be R^d with the standard partial order and all modules to be of finite type. We call R^d-indexed persistence modules d-parameter modules for short. The reader may notice that this is a shift from our assumption in the last chapter, where we considered Z^d-indexed modules. The category of d-parameter modules in this chapter is denoted R^d-mod. For a d-parameter module M ∈ R^d-mod, we use the notations M_x := M(x) and ρ^M_{x→y} := M(x ≤ y).

Definition 12.6 (Shift). For any δ ∈ R, we denote ~δ = (δ, ⋯, δ) = δ · ~e, where ~e = e_1 + e_2 + ⋯ + e_d with {e_i}_{i=1}^d being the standard basis of R^d. We define a shift functor (·)_{→δ} : R^d-mod → R^d-mod where M_{→δ} := (·)_{→δ}(M) is given by M_{→δ}(x) = M(x + ~δ) and M_{→δ}(x ≤ y) = M(x + ~δ ≤ y + ~δ). In other words, M_{→δ} is the module M shifted diagonally by ~δ.
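As a quick illustration of the shift, suppose (our own representation, used again in Section 12.4) that the interval of a 2-parameter interval module is stored as a finite union of rectangles. Since M_{→δ}(x) = M(x + ~δ), the interval of M_{→δ} is I_M translated diagonally by −~δ:

```python
# A minimal sketch: the interval of a 2-parameter interval module stored as
# a list of rectangles ((a1, b1), (a2, b2)). Since M_{->delta}(x) = M(x + delta),
# the interval of M_{->delta} is I_M translated diagonally by -delta.

def shift_interval(rects, delta):
    return [((a1 - delta, b1 - delta), (a2 - delta, b2 - delta))
            for (a1, b1), (a2, b2) in rects]

# Example: shifting the unit square [0,1]^2 by delta = 0.5.
print(shift_interval([((0, 1), (0, 1))], 0.5))
# [((-0.5, 0.5), (-0.5, 0.5))]
```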

12.2 Interleaving distance


The following definition of interleaving adapts the original definition designed for 1-parameter
modules in [77, 80] to d-parameter modules.

Definition 12.7 (Interleaving). For two d-parameter persistence modules M and N, and δ ≥ 0, a δ-interleaving between M and N consists of two families of linear maps {φ_x : M_x → N_{x+~δ}}_{x∈R^d} and {ψ_x : N_x → M_{x+~δ}}_{x∈R^d} satisfying the following two conditions; see Figure 12.2:

• ∀x ∈ R^d : ρ^M_{x→x+2~δ} = ψ_{x+~δ} ∘ φ_x and ρ^N_{x→x+2~δ} = φ_{x+~δ} ∘ ψ_x;

• ∀x ≤ y ∈ R^d : φ_y ∘ ρ^M_{x→y} = ρ^N_{x+~δ→y+~δ} ∘ φ_x and ψ_y ∘ ρ^N_{x→y} = ρ^M_{x+~δ→y+~δ} ∘ ψ_x.

Figure 12.2: (a) Triangular commutativity, (b) Rectangular commutativity.


If such a δ-interleaving exists, we say M and N are δ-interleaved. We call the first condition
triangular commutativity and the second condition rectangular commutativity.
Definition 12.8 (Interleaving distance). The interleaving distance between modules M and N is defined as d_I(M, N) = inf{δ | M and N are δ-interleaved}. We say M and N are ∞-interleaved if they are not δ-interleaved for any δ ∈ R^+, and assign d_I(M, N) = ∞.
The following computational hardness result from [33] is stated assuming that the input mod-
ules are represented with the graded matrices as in Chapter 11. As we mentioned before, these
modules coincide with the category of modules of finite type.
Theorem 12.1. Given two modules M and N represented by graded matrices, the problem of computing a real r so that d_I(M, N) ≤ r < 3 d_I(M, N) is NP-hard.

12.3 Matching distance


The matching distance between two persistence modules M and N draws upon the idea of taking
the restrictions of M and N over lines with positive slopes and then determining the supremum of
weighted interleaving distances on these restrictions. It can be defined for d-parameter modules.
We are going to describe a polynomial time algorithm for computing it for 2-parameter modules,
so for simplicity we define the matching distance for 2-parameter modules. Let ℓ : y = sx + t denote any line in R^2 with s > 0 and let Λ denote the space of all such lines. Define a parameterization λ : R → ℓ of ℓ by taking λ(x) = (1/√(1 + s^2)) (x, sx) + (0, t), a unit-speed parameterization of ℓ. For a line ℓ ∈ Λ, let M|_ℓ denote the restriction of M to ℓ, where M|_ℓ(x) = M(λ(x)) with linear maps induced from M. This is a 1-parameter persistence
module. We define a weight w(ℓ) that accounts for the projection on one of the two axes depending on the slope:

    w(ℓ) = 1/√(1 + s^2)      for s ≥ 1
    w(ℓ) = 1/√(1 + 1/s^2)    for 0 < s < 1

Definition 12.9. The matching distance d_m(M, N) between two persistence modules is defined as

    d_m(M, N) = sup_{ℓ∈Λ} { w(ℓ) · d_I(M|_ℓ, N|_ℓ) }.

The weight w(`) is introduced to make the matching distance stable with respect to the inter-
leaving distance.
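As a small illustration, the sketch below (hypothetical helpers under our own naming, not RIVET's API) computes w(ℓ) and one candidate term of the supremum in Definition 12.9 for the simple situation where each restriction M|_ℓ, N|_ℓ is a single bar [a, b]; for two such bars, the interleaving distance is the cheaper of matching their endpoints or trivializing both bars.

```python
import math

def weight(s):
    """w(l) for a line l: y = s*x + t with slope s > 0."""
    return 1 / math.sqrt(1 + s * s) if s >= 1 else 1 / math.sqrt(1 + 1 / (s * s))

def dI_bars(bar1, bar2):
    """Interleaving distance of two 1-parameter interval modules [a,b], [c,d]:
    either match the endpoints or match both bars to the zero module."""
    (a, b), (c, d) = bar1, bar2
    return min(max(abs(a - c), abs(b - d)), max((b - a) / 2, (d - c) / 2))

# One candidate term of the supremum in Definition 12.9 for a line of slope s:
s = 2.0
print(weight(s) * dI_bars((0.0, 3.0), (1.0, 3.5)))
```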

12.3.1 Computing matching distance


We define a point-line duality in R^2: a line ℓ ⊂ R^2 given by ℓ : y = sx − t is dual to the point ℓ* = (s, t), and a point p = (s, t) is dual to the line p* : y = sx − t. The following facts can be deduced easily from the definition (Exercise 4).
Fact 12.1.
1. For a point p and a line `, one has (p∗ )∗ = p and (`∗ )∗ = `.

2. If a point p is in a line `, then point `∗ is in line p∗ .

3. If a point p is above (below) a line `, then point `∗ is above (below) the line p∗ .
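A tiny sketch of this duality (helper name ours); note that Fact 12.1(3) boils down to the same inequality read two ways:

```python
# Point-line duality: the point (s, t) and the line y = s*x - t share the
# same coordinate pair, so duality is just a reinterpretation of (s, t).

def above(p, line):
    """True iff point p lies strictly above the line y = s*x - t."""
    s, t = line
    return p[1] > s * p[0] - t

# Fact 12.1(3): p lies above l exactly when l* lies above p*.
p, l = (1.0, 2.0), (1.0, 0.0)
print(above(p, l), above(l, p))  # True True
```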

Consider the open half-plane Ω ⊂ R^2 where Ω = {(x, y) | x > 0}. Let α denote the bijective map between Ω and the space Λ of lines with positive slopes where α(p) = p*.
Representation theory [65, 108, 214] tells us that the finitely generated graded modules defined in Chapter 11 are essentially equivalent to the persistence modules defined in this chapter as long as they are of finite type (Definition 12.5). Then, if a persistence module M is a functor on the poset P = R^2 or Z^2, we can talk about the grades (elements of P) of a generating set of M and of the relations, which are combinations of generators that become zero. A mindful reader can recognize that these are exactly the grades of the rows and columns of the presentation matrix of M (Definition 11.14).
Given two 2-parameter persistence modules M and N, let gr(M) and gr(N) denote the grades of all generators and relations of M and N respectively. Consider the set of lines L dual to the points in gr(M) ∪ gr(N). These lines together create a line arrangement in Ω, which is a partition of Ω into vertices, edges, and faces. The vertices are points where two lines meet, the edges are maximal connected subsets of the lines excluding the vertices, and the faces are maximal connected subsets of Ω excluding the vertices and edges. Let A_0 denote this initial arrangement. We refine this arrangement further later. First, we observe an invariant property of the arrangement, for which we need the following definition.
Definition 12.10 (Point pair type). Given two points p, q and a line `, we say (p, q) has the
following types with respect to ` : (i) Type-1 if both p and q lie above `, (ii) Type-2 if both p and
q lie below `, (iii) Type-3 if p lies above and q lies below `, and (iv) Type-4 if p lies below and q
lies above `.
The following proposition follows from Fact 12.1.
Proposition 12.2. For two points p, q ∈ gr(M) ∪ gr(N) and a face τ ∈ A0 , the type of (p, q) with
respect to the line z∗ is the same for all z ∈ τ.

Our goal is to refine A_0 further into another arrangement A so that for every face τ ∈ A, the grade points p, q that realize d_I(M|_ℓ, N|_ℓ) for ℓ = z* remain the same for all z ∈ τ. Toward that goal, we define the push of a grade point.
Definition 12.11 (Push). For a point p = (p_x, p_y) and a line ℓ : y = sx − t, the push push(p, ℓ) is defined as

    push(p, ℓ) = (p_x, s p_x − t)      for p below ℓ
    push(p, ℓ) = ((p_y + t)/s, p_y)    for p above ℓ

Geometrically, push(p, ℓ) is the intersection of ℓ with the upward vertical ray originating from p in the first case, and with the horizontal ray originating from p in the second case. Figure 12.3 illustrates the two cases.

Figure 12.3: Pushes of two points p and q to three lines. Thick segments indicate δ_{p,q} for the corresponding lines.

For p, q ∈ R^2, let

    δ_{p,q}(ℓ) = ‖push(p, ℓ) − push(q, ℓ)‖_2.

Consider the equations

    δ_{p,q}(ℓ) = 0                        for p, q ∈ gr(M) or p, q ∈ gr(N)
    c_{p,q} δ_{p,q}(ℓ) = c_{p′,q′} δ_{p′,q′}(ℓ)    for p, q, p′, q′ ∈ gr(M) ⊔ gr(N)

where

    c_{p,q} = 1/2 if p, q ∈ gr(M) or p, q ∈ gr(N), and c_{p,q} = 1 otherwise.
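The push and the function δ_{p,q} are simple to implement; the following is a minimal sketch with our own helper names:

```python
import math

def push(p, s, t):
    """push(p, l) for l: y = s*x - t (Definition 12.11)."""
    px, py = p
    if py <= s * px - t:            # p lies on or below l: push upward
        return (px, s * px - t)
    return ((py + t) / s, py)       # p lies above l: push to the right

def delta(p, q, s, t):
    """delta_{p,q}(l) = Euclidean distance between the two pushes."""
    return math.dist(push(p, s, t), push(q, s, t))

# For l: y = x (s = 1, t = 0):
print(push((0.0, 1.0), 1.0, 0.0))   # (1.0, 1.0), horizontal push
print(push((1.0, 0.0), 1.0, 0.0))   # (1.0, 1.0), vertical push
print(delta((0.0, 1.0), (1.0, 0.0), 1.0, 0.0))  # 0.0
```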
The following proposition is proved in [207].
Proposition 12.3. The solution set z ∈ τ for a face τ ∈ A0 so that δ p,q (z∗ ) satisfies the above
equations is either empty, the entire face τ, intersection of a line with τ, or the intersection of two
lines with τ.
Let A be the arrangement of Ω with the lines used to form A0 , the lines stated in the above
proposition, and the vertical line x = 1.

Proposition 12.4. A is formed with O(n^4) lines where n = |gr(M)| + |gr(N)|.


The next theorem states the main property of A which allows us to consider only finitely many (polynomially bounded) lines ℓ for computing the supremum of {d_I(M|_ℓ, N|_ℓ)}.

Theorem 12.5. For any face τ ∈ A, there exists a pair p, q ∈ gr(M) ∪ gr(N) so that c_{p,q} δ_{p,q}(z*) = d_I(M|_{z*}, N|_{z*}) for every z ∈ τ.

The above theorem implies that, after determining the pair (p, q) for a face τ ∈ A, we need to compute sup_{z∈τ} F(z) where F(z) = d_I(M|_{z*}, N|_{z*}), because then taking F over all faces in A gives the global supremum. So, now we focus on how to compute the supremum of F on a face τ.

Figure 12.4: Outer regions are shaded gray and their outer edges are drawn as thickened segments; the hatched region is inner.

A region is the closure of a face τ ∈ A in Ω. A region R is called inner if it is bounded and its closure in R^2 does not meet the vertical line s = 0; see Figure 12.4. All other regions are called outer. An outer region has exactly two edges that are either unbounded or reach the vertical line s = 0 in the limit; they are called outer edges. It turns out that sup F(z) is achieved either at a vertex or at a limit point of the outer edges, which can be computed easily.
Theorem 12.6. The supremum sup_{z∈R} F(z) for a region R is realized either at a boundary vertex of R or at the limit point of an outer edge. In the latter case, let p, q be the pair given by Theorem 12.5 for τ ⊆ R. If e is an outer edge and p lies above z* for any (and, by Proposition 12.2, all) z ∈ τ, then sup F restricted to e is given by:

    sup F|_e = |p_x − t|   if the line of e intersects the line x = 0 at t
    sup F|_e = |q_x + r|   if the line of e is infinite and has slope r

The roles of p and q reverse if p lies below z* for any z ∈ τ.
We present the entire algorithm as Algorithm 25: MatchDist. It is known that this algorithm runs in O(n^{11}) time, where n is the total number of generators and relations of the two input modules. A more efficient algorithm approximating the matching distance is also known [209].

Algorithm 25 MatchDist(M, N)
Input:
Two modules M and N with grades of their generators and relations
Output:
Matching distance between M and N
1: Compute arrangement A as described from gr(M) ∪ gr(N);
2: Let V be the vertex set of A;
3: Compute the maximum m = max_{z∈V} F(z) over all vertices z ∈ V;
4: for every outer region R do
5: Pick a point z ∈ R;
6: Compute the pair p, q ∈ gr(M) ∪ gr(N) that realizes dI (M|z∗ , N|z∗ );
7: if p is above z∗ then
8: if e as defined in Theorem 12.6 is infinite then
9: m := max(m, q x + r) where r is the slope of e
10: else
11: m := max(m, p x − t) where e meets line x = 0 at t
12: end if
13: else
14: reverse roles of p and q
15: end if
16: end for
17: return m

12.4 Bottleneck distance


Definition 12.12 (Matching). A matching µ : A ↛ B between two multisets A and B is a partial bijection, that is, a bijection µ : A′ → B′ for some A′ ⊆ A and B′ ⊆ B. We write im µ = B′ and coim µ = A′.

For the next definition, we call a d-parameter module M δ-trivial if ρ^M_{x→x+~δ} = 0 for all x ∈ R^d.

Definition 12.13 (Bottleneck distance). Let M ≅ ⊕_{i=1}^m M_i and N ≅ ⊕_{j=1}^n N_j be two persistence modules, where the M_i and N_j are indecomposable submodules of M and N respectively. Let I = {1, ⋯, m} and J = {1, ⋯, n}. We say M and N are δ-matched for δ ≥ 0 if there exists a matching µ : I ↛ J so that (i) i ∈ I \ coim µ ⟹ M_i is 2δ-trivial, (ii) j ∈ J \ im µ ⟹ N_j is 2δ-trivial, and (iii) i ∈ coim µ ⟹ M_i and N_{µ(i)} are δ-interleaved.

The bottleneck distance is defined as

    d_b(M, N) = inf{δ | M and N are δ-matched}.

The following fact observed in [47] is straightforward from the definition.

Fact 12.2. dI ≤ db .

12.4.1 Interval decomposable modules


We present a polynomial time algorithm for computing the bottleneck distance for a class of persistence modules called interval decomposable modules, which we have seen in the previous chapter (Section 11.6.1). For ease of description, we describe the algorithm for the 2-parameter case, though an extension to the multiparameter case exists.

Persistence modules whose indecomposables are interval modules (Definition 12.15) are called interval decomposable modules. To account for the boundaries of free modules, we enrich the poset R^d by adding points at ±∞ and consider the poset R̄^d = R̄ × ⋯ × R̄ where R̄ = R ∪ {±∞} with the usual addition rule a ± ∞ = ±∞.

Definition 12.14 (Interval). An interval is a subset ∅ ≠ I ⊂ R̄^d that satisfies the following:

1. If p, q ∈ I and p ≤ r ≤ q, then r ∈ I (convexity condition);

2. If p, q ∈ I, then there exists a sequence (p = p_0, …, p_m = q) ∈ I for some m ∈ N so that for every i ∈ [0, m − 1] either p_i ≤ p_{i+1} or p_i ≥ p_{i+1} (connectivity condition). We call the sequence (p = p_0, …, p_m = q) a path from p to q (in I).

Let Ī denote the closure of an interval I in the standard topology of R̄^d. The lower and upper boundaries of I are defined as

    L(I) = {x = (x_1, ⋯, x_d) ∈ Ī | ∀y = (y_1, ⋯, y_d) with y_i < x_i ∀i ⟹ y ∉ I}
    U(I) = {x = (x_1, ⋯, x_d) ∈ Ī | ∀y = (y_1, ⋯, y_d) with y_i > x_i ∀i ⟹ y ∉ I}.

Let B(I) = L(I) ∪ U(I). According to this definition, R̄^d is an interval with boundary B(R̄^d) consisting of all the points with at least one coordinate equal to ±∞. The vertex set V(R̄^d) consists of the 2^d corner points with coordinates (±∞, ⋯, ±∞).

Definition 12.15 (d-parameter interval module). A d-parameter interval persistence module, or interval module in short, is a persistence module M that satisfies the following condition: for an interval I_M ⊆ R̄^d, called the interval of M,

    M_x = k if x ∈ I_M and 0 otherwise;   ρ^M_{x→y} = 1 if x, y ∈ I_M and 0 otherwise,

where 1 and 0 denote the identity and zero maps respectively.

It is known that an interval module is indecomposable [47].

Definition 12.16 (Interval decomposable module). A d-parameter interval decomposable module


is a persistence module that can be decomposed into interval modules.

Definition 12.17 (Rectangle). A k-dimensional rectangle, 0 ≤ k ≤ d, or k-rectangle, in R^d, is a set I = [a_1, b_1] × ⋯ × [a_d, b_d], a_i, b_i ∈ R̄, such that there exists an index set Λ ⊆ [d] of size k where ∀i ∈ Λ, a_i ≠ b_i, and ∀j ∈ [d] \ Λ, a_j = b_j.


Figure 12.5: (a) An interval in R^3; (b) intervals I_{M_1}, I_{M_2}, I_{M_3} in R^2 of a module M = M_1 ⊕ M_2 ⊕ M_3, with the lower boundary L(I_{M_1}) and the upper boundary U(I_{M_3}) marked.

A 0-rectangle is a vertex and a 1-rectangle is an edge. Note that a rectangle is an example of an interval.

We say an interval I ⊆ R̄^d is discretely presented if it is a finite union of d-rectangles; we also require that the boundary of the interval is a (d − 1)-manifold. A facet of I is a (d − 1)-dimensional subset f = f̂ ∩ L ⊆ R̄^d where f̂ = {x_i = c} is a hyperplane orthogonal to some standard direction e_i in R^d and L is either L(I) or U(I). We denote the set of facets by F(I) and the union of all of their vertices by V(I). So the boundary of I is the union of its facets, and the vertices of each facet form a subset of V(I). Figure 12.5(a) and (b) show intervals in R^3 and R^2 respectively.
For the 2-parameter case, a discretely presented interval I ⊆ R̄^2 has a boundary consisting of a finite set of horizontal and vertical line segments called edges, with endpoints called vertices, which satisfy the following conditions: (i) every vertex is incident to either a single horizontal edge or a vertical edge, (ii) no vertex appears in the interior of an edge. We denote the sets of edges and vertices by E(I) and V(I) respectively.

We say a d-parameter interval decomposable module is finitely presented if it can be decomposed into finitely many interval modules whose intervals are discretely presented. These belong to the finitely presented persistence modules as defined in Chapter 11. In the following, we focus on finitely presented interval decomposable modules.

For an interval module M, let M̄ denote the interval module defined on the closure Ī_M. To avoid complications in this exposition, we assume that every interval module has a closed interval, which is justified by the following proposition (Exercise 8).

Proposition 12.7. d_I(M, N) = d_I(M̄, N̄).

12.4.2 Bottleneck distance for 2-parameter interval decomposable modules


We present an algorithm for 2-parameter interval decomposable persistence modules though most
of our definitions and claims in this section apply to general d-parameter persistence modules.
They are stated and proved in the general setting wherever applicable.
Given the intervals of the indecomposables (interval modules) as input, an approach based
on bipartite-graph matching is presented in Section 3.2.1 for computing the bottleneck distance

d_b(M, N) between two 1-parameter persistence modules M and N. This approach constructs a bipartite graph G out of the intervals of M and N and their pairwise interleaving distances, including the distances to zero modules. If these distance computations take O(C) time in total, then the algorithm for computing d_b takes O(m^{5/2} log m + C) time, where M and N together have m indecomposables. Observe that the term m^{5/2} in the complexity comes from the bipartite matching. Although this could be avoided in the 1-parameter case by taking advantage of the two-dimensional geometry of persistence diagrams, we cannot do so here for determining a matching among indecomposables according to Definition 12.13. Given the indecomposables (say, computed by the algorithm in Chapter 11 or by Meataxe [251]), this approach readily extends to d-parameter modules if one can compute the interleaving distance between any pair of indecomposables including the zero modules. To this end, we present an algorithm to compute the interleaving distance between two 2-parameter interval modules M_i and N_j with t_i and t_j vertices respectively on their intervals in O((t_i + t_j) log(t_i + t_j)) time. This gives a total time of O(m^{5/2} log m + Σ_{i,j} (t_i + t_j) log(t_i + t_j)) = O(m^{5/2} log m + t^2 log t), where t is the total number of vertices over all input intervals.
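To make the matching step concrete, here is a minimal sketch (our own naming and representation, not the book's code) of the δ-matching test of Definition 12.13, assuming the pairwise interleaving distances dI[i][j] and the distances dI0_M[i], dI0_N[j] to the zero module have been computed; note that M_i is 2δ-trivial precisely when its interleaving distance to the zero module is at most δ (up to boundary cases at the infimum). A binary search over candidate values of δ with this test then yields d_b.

```python
# Decide whether two decomposed modules are delta-matched (Definition 12.13)
# via bipartite matching in the style of Section 3.2.1. Dummy nodes encode
# "left unmatched", which is allowed only for 2*delta-trivial summands.

def delta_matched(dI, dI0_M, dI0_N, delta):
    m, n = len(dI0_M), len(dI0_N)
    # Left nodes 0..m-1: the M_i; left nodes m..m+n-1: dummies for the N_j.
    # Right nodes 0..n-1: the N_j; right nodes n..n+m-1: dummies for the M_i.
    def neighbors(u):
        if u < m:                           # a real summand M_u
            for j in range(n):
                if dI[u][j] <= delta:       # M_u and N_j are delta-interleaved
                    yield j
            if dI0_M[u] <= delta:           # M_u may be left unmatched
                yield n + u
        else:                               # the dummy paired with N_{u-m}
            if dI0_N[u - m] <= delta:       # N_{u-m} may be left unmatched
                yield u - m
            yield from range(n, n + m)      # dummy-dummy edges always exist
    match = {}                              # right node -> left node

    def augment(u, seen):
        for v in neighbors(u):
            if v not in seen:
                seen.add(v)
                if v not in match or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    # delta-matched iff the graph admits a perfect matching of size m + n.
    return all(augment(u, set()) for u in range(m + n))
```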
Now we focus on computing the interleaving distance between two given intervals. Given intervals I_M and I_N with t vertices in total, the algorithm searches for a value δ so that there exist two families of linear maps, from M to N_{→δ} and from N to M_{→δ} respectively, which satisfy both triangular and rectangular commutativity. The search is done by binary probing: for a chosen δ from a candidate set of O(t) values, the algorithm determines the direction of the search by checking two conditions, called trivializability and validity, on the intersections of the modules M and N.

Definition 12.18 (Intersection module). For two interval modules M and N with intervals I_M and I_N respectively, let I_Q = I_M ∩ I_N, which is a disjoint union of intervals I_{Q_i}. The intersection module Q of M and N is Q = ⊕ Q_i, where Q_i is the interval module with interval I_{Q_i}. That is,

    Q_x = k if x ∈ I_M ∩ I_N and 0 otherwise;   for x ≤ y, ρ^Q_{x→y} = 1 if x, y ∈ I_M ∩ I_N and 0 otherwise.

From the definition we can see that the support of Q is supp(Q) = I_M ∩ I_N. We call each Q_i an intersection component of M and N. Write I := I_{Q_i} and let φ : M → N be any morphism. The following proposition says that φ is constant on I.

Proposition 12.8. φ|I ≡ a · 1 for some a ∈ k.

Proof. For any x, y ∈ I, consider a path (x = p_0, p_1, p_2, …, p_{2m}, p_{2m+1} = y) in I from x to y. For each consecutive pair, the square

    M_{p_i} --1--> M_{p_{i+1}}          M_{p_i} <--1-- M_{p_{i+1}}
      | φ_{p_i}      | φ_{p_{i+1}}        | φ_{p_i}      | φ_{p_{i+1}}
    N_{p_i} --1--> N_{p_{i+1}}          N_{p_i} <--1-- N_{p_{i+1}}

commutes for p_i ≤ p_{i+1} (left) and p_i ≥ p_{i+1} (right) respectively. Observe that φ_{p_i} = φ_{p_{i+1}} in both cases due to the commutativity. Inducting on i, we get φ_x = φ_y. □

Figure 12.6: Examples of a valid and an invalid intersection: the component Q_1 with interval I_{Q_1} is (M, N)-valid, whereas Q_2 with interval I_{Q_2} is not.

Figure 12.7: d = dl(x, I), y = π_I(x), and d′ = dl(x′, L(I)) (left); d = dl(x, I) and d′ = dl(x′, U(I)) defined on the left edge of B(R̄^2) (middle); Q is d^{(M,N)}_{triv}- and d^{(N,M)}_{triv}-trivializable (right).

Definition 12.19 (Valid intersection). An intersection component Qi is (M, N)-valid if for each
x ∈ IQi the following two conditions hold (see Figure 12.6):

(i) y ≤ x and y ∈ I M =⇒ y ∈ IN , and (ii) z ≥ x and z ∈ IN =⇒ z ∈ I M

Proposition 12.9. Let {Q_i} be a set of intersection components of M and N with intervals {I_{Q_i}}. Let {φ_x} : M → N be the family of linear maps defined by φ_x = 1 for all x ∈ ∪_i I_{Q_i} and φ_x = 0 otherwise. Then φ is a morphism if and only if every Q_i is (M, N)-valid.

Definition 12.20 (Diagonal projection and distance). Let I be an interval and x ∈ R̄^d. Let Δ_x = {x + ~α | α ∈ R} denote the line with slope 1 that passes through x, called the diagonal. We define (see Figure 12.7)

    dl(x, I) = min_{y∈Δ_x∩I} { d_∞(x, y) := |x − y|_∞ }   if Δ_x ∩ I ≠ ∅,
    dl(x, I) = +∞                                         otherwise.

In case Δ_x ∩ I ≠ ∅, define π_I(x), called the projection point of x on I, to be the point y ∈ Δ_x ∩ I where dl(x, I) = d_∞(x, y).

Note that ∀α ∈ R we have ±∞ + α = ±∞. Therefore, for x ∈ V(R̄^d), the line Δ_x collapses to a single point. In that case, dl(x, I) ≠ +∞ if and only if x ∈ I, which means π_I(x) = x.
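For the discretely presented intervals used below, dl is easy to evaluate; here is a minimal sketch in the 2-parameter case, assuming (our own representation) that the interval is given as a finite union of closed rectangles:

```python
# dl(x, I) of Definition 12.20 for a 2-parameter interval I stored as a
# finite union of closed rectangles ((a1, b1), (a2, b2)). For a diagonal
# point y = x + (alpha, alpha), we have d_inf(x, y) = |alpha|.

def dl(x, rects):
    best = float('inf')
    for (a1, b1), (a2, b2) in rects:
        lo = max(a1 - x[0], a2 - x[1])  # alpha-range meeting this rectangle
        hi = min(b1 - x[0], b2 - x[1])
        if lo <= hi:                    # the diagonal hits the rectangle
            best = min(best, 0.0 if lo <= 0.0 <= hi else min(abs(lo), abs(hi)))
    return best

# Example: the staircase interval [0,2]x[0,1] union [1,3]x[0,2].
I = [((0, 2), (0, 1)), ((1, 3), (0, 2))]
print(dl((0, 0), I))    # 0.0: x lies in I
print(dl((-1, 0), I))   # 1.0: projection point pi_I(x) = (0, 1)
```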
Notice that the upper and lower boundaries of an interval are also intervals by definition. With this understanding, the following properties of dl are immediate from the definition.

Fact 12.3.

(i) For any x ∈ I_M,

    dl(x, U(I_M)) = sup_{δ∈R̄} {δ | x + ~δ ∈ I_M}   and   dl(x, L(I_M)) = sup_{δ∈R̄} {δ | x − ~δ ∈ I_M}.

(ii) Let L = L(I_M) or U(I_M), and let x, x′ be two points such that π_L(x), π_L(x′) both exist. If x and x′ are on the same facet or on the same diagonal line, then |dl(x, L) − dl(x′, L)| ≤ d_∞(x, x′).

Set V L(I) := V(I)∩L(I), EL(I) := E(I)∩L(I), VU(I) := V(I)∩U(I), and EU(I) := E(I)∩U(I).

Proposition 12.10. For an intersection component Q of M and N with interval I, the following
conditions are equivalent:

1. Q is (M, N)-valid.

2. L(I) ⊆ L(I M ) and U(I) ⊆ U(IN ).

3. V L(I) ⊆ L(I M ) and VU(I) ⊆ U(IN ).

Definition 12.21 (Trivializable intersection). Let Q be a connected component of the intersection of two modules M and N. For each point x ∈ I_Q, define

    d^{(M,N)}_{triv}(x) = max{ dl(x, U(I_M))/2, dl(x, L(I_N))/2 }.

For δ ≥ 0, we say a point x is δ^{(M,N)}-trivializable if d^{(M,N)}_{triv}(x) < δ. We say an intersection component Q is δ^{(M,N)}-trivializable if each point in I_Q is δ^{(M,N)}-trivializable (Figure 12.7). We also denote d^{(M,N)}_{triv}(I_Q) := sup_{x∈I_Q} { d^{(M,N)}_{triv}(x) }.

The following proposition discretizes the search for trivializability.

Proposition 12.11. An intersection component Q is δ^{(M,N)}-trivializable if and only if every vertex of Q is δ^{(M,N)}-trivializable.

Recall that for two modules to be δ-interleaved, we need two families of linear maps satisfying both triangular and rectangular commutativity. For a given δ, Theorem 12.14 below provides criteria which ensure that such linear maps exist. In the algorithm, we will then make sure that these criteria are verified.
Given an interval module M and the diagonal line Δ_x for any x ∈ R̄^d, there is a 1-parameter persistence module M|_{Δ_x}, which is the functor restricted to the poset Δ_x viewed as a subcategory of R̄^d. We call it a 1-dimensional slice of M along Δ_x. Define

    δ* = inf_{δ∈R̄} {δ : ∀x ∈ R̄^d, M|_{Δ_x} and N|_{Δ_x} are δ-interleaved}.

Equivalently, δ* = sup_{x∈R̄^d} { d_I(M|_{Δ_x}, N|_{Δ_x}) }. We have the following proposition and corollary from these equivalent definitions of δ*.

Proposition 12.12. For two interval modules M, N and δ > δ* ∈ R^+, there exist two families of linear maps φ = {φ_x : M_x → N_{x+~δ}} and ψ = {ψ_x : N_x → M_{x+~δ}} such that for each x ∈ R̄^d, the 1-dimensional slices M|_{Δ_x} and N|_{Δ_x} are δ-interleaved by the linear maps φ|_{Δ_x} and ψ|_{Δ_x}.

Corollary 12.13. dI (M, N) ≥ δ∗ .

Theorem 12.14. For two interval modules M and N, d_I(M, N) ≤ δ if and only if the following two conditions are satisfied:
(i) δ ≥ δ*;
(ii) ∀δ′ > δ, each intersection component of M and N_{→δ′} is either (M, N_{→δ′})-valid or δ′^{(M,N_{→δ′})}-trivializable, and each intersection component of M_{→δ′} and N is either (N, M_{→δ′})-valid or δ′^{(N,M_{→δ′})}-trivializable.

Proof. Note that d_I(M, N) ≤ δ if and only if ∀δ′ > δ, M and N are δ′-interleaved.

'only if' direction: Suppose d_I(M, N) ≤ δ. Part (i) follows directly from Corollary 12.13. For part (ii), by the definition of interleaving, ∀δ′ > δ we have two families of linear maps {φ_x} and {ψ_x} which satisfy both triangular and rectangular commutativity. Let the morphisms between the two persistence modules constituted by these two families of linear maps be φ = {φ_x} and ψ = {ψ_x} respectively. For each intersection component Q of M and N_{→δ′} with interval I := I_Q, consider the restriction φ|_I. By Proposition 12.8, φ|_I is constant, that is, φ|_I ≡ 0 or 1. If φ|_I ≡ 1, then by Proposition 12.9, Q is (M, N_{→δ′})-valid. If φ|_I ≡ 0, then by the triangular commutativity of φ, we have ρ^M_{x→x+2~δ′} = ψ_{x+~δ′} ∘ φ_x = 0 for each point x ∈ I. That means x + 2~δ′ ∉ I_M, so by Fact 12.3(i), dl(x, U(I_M))/2 < δ′. Similarly, ρ^N_{x−~δ′→x+~δ′} = φ_x ∘ ψ_{x−~δ′} = 0 ⟹ x − ~δ′ ∉ I_N, which is the same as saying x − 2~δ′ ∉ I_{N_{→δ′}}; by Fact 12.3(i), dl(x, L(I_{N_{→δ′}}))/2 < δ′. So ∀x ∈ I we have d^{(M,N_{→δ′})}_{triv}(x) < δ′. This means Q is δ′^{(M,N_{→δ′})}-trivializable. A similar statement holds for the intersection components of M_{→δ′} and N.
‘if’ direction: We construct two families of linear maps {φ_x}, {ψ_x} as follows: on the interval I := I_{Q_i} of each intersection component Q_i of M and N_{→δ′}, set φ|_I ≡ 1 if Q_i is (M, N_{→δ′})-valid and φ|_I ≡ 0 otherwise. Set φ_x ≡ 0 for all x not in the interval of any intersection component. Similarly, construct {ψ_x}. Note that, by Proposition 12.9, φ := {φ_x} is a morphism between M and N_{→δ′}, and ψ := {ψ_x} is a morphism between N and M_{→δ′}. Hence they satisfy the square commutativity. We show that they also satisfy the triangular commutativity.

We claim that for every x ∈ I_M, ρ^M_{x→x+2~δ′} = 1 implies x + ~δ′ ∈ I_N, and that a similar statement holds for I_N. From the condition δ′ > δ ≥ δ* and by Proposition 12.12, we know that there exist two families of linear maps satisfying triangular commutativity everywhere, in particular on the pair of 1-parameter persistence modules M|∆_x and N|∆_x. From this triangular commutativity we get that, for every x ∈ I_M with ρ^M_{x→x+2~δ′} = 1, indeed x + ~δ′ ∈ I_N, since otherwise one cannot construct a δ′-interleaving between M|∆_x and N|∆_x. This proves the claim.

Now for each x ∈ I_M with ρ^M_{x→x+2~δ′} = 1, we have dl(x, U(I_M))/2 ≥ δ′ by Fact 12.3, and x + ~δ′ ∈ I_N by our claim. This implies that x ∈ I_M ∩ I_{N→δ′} is a point in an interval of an intersection component Q_x of M, N_{→δ′} which is not δ′^{(M,N_{→δ′})}-trivializable. Hence it is (M, N_{→δ′})-valid by the assumption, and so, by our construction of φ on valid intersection components, φ_x = 1. Symmetrically, x + ~δ′ ∈ I_N ∩ I_{M→δ′} is a point in an interval of an intersection component of N and M_{→δ′} which is not δ′^{(N,M_{→δ′})}-trivializable since dl(x + ~δ′, L(I_{M→δ′}))/2 ≥ δ′. So, by our construction of ψ on valid intersection components, ψ_{x+~δ′} = 1. Then we have ρ^M_{x→x+2~δ′} = ψ_{x+~δ′} ∘ φ_x for every nonzero linear map ρ^M_{x→x+2~δ′}. The statement also holds for any nonzero linear map ρ^N_{x→x+2~δ′}. Therefore, the triangular commutativity holds. □
Note that the above proof provides a construction of the interleaving maps for any specific δ′ if they exist. Furthermore, the interleaving distance dI(M, N) is the infimum of all δ′ satisfying the two conditions in the theorem, which means dI(M, N) is the infimum of all δ′ ≥ δ* satisfying condition (ii) in Theorem 12.14.
12.4.3 Algorithm to compute dI for intervals
In practice, we cannot verify all of those infinitely many values δ′ > δ*. Instead, we propose a finite candidate set of possible interleaving distance values and prove later that our final target, the interleaving distance, is always contained in this finite set. Surprisingly, the size of the candidate set is only O(n) in the number n of vertices of the 2-parameter interval modules. Based on these results, we propose a search algorithm for computing the interleaving distance dI(M, N) for interval modules M and N.
Definition 12.22 (Candidate set). For two interval modules M and N, and for each point x in I_M ∪ I_N, let

D(x) = {dl(x, L(I_M)), dl(x, L(I_N)), dl(x, U(I_M)), dl(x, U(I_N))},
S = {d | d ∈ D(x) or 2d ∈ D(x) for some vertex x ∈ V(I_M) ∪ V(I_N)}, and
S_{≥δ} := {d | d ≥ δ, d ∈ S}.
Algorithm 26 Interleaving(I_M, I_N)

Input: I_M and I_N with t vertices in total
Output: dI(M, N)

1: Compute the candidate set S and let ε be half of the smallest difference between any two numbers in S. /* O(t) time */
2: Compute δ*; let δ := δ*. /* O(t) time */
3: Let δ* = δ_0, δ_1, · · · , δ_k be the numbers in S_{≥δ*} in non-decreasing order. /* O(t log t) time */
4: ℓ := 0; u := k;
5: while ℓ < u do /* O(log t) probes */
6:    i := ⌊(u + ℓ)/2⌋; δ := δ_i; δ′ := δ + ε;
7:    Compute intersections Q := {I_M ∩ I_{N→δ′}} ∪ {I_N ∩ I_{M→δ′}}. /* O(t) time */
8:    if every Q ∈ Q is valid or trivializable according to Theorem 12.14 then /* O(t) time */
9:        u := i
10:   else
11:       ℓ := i + 1
12:   end if
13: end while
14: Output δ_u
In Algorithm 26 (Interleaving), the following generic task of computing a diagonal span is performed at several steps. Let L and U be any two chains of vertical and horizontal edges that are both x- and y-monotone. Assume that L and U have at most t vertices. Then, for a set X of O(t) points in L, one can compute the intersection of ∆_x with U for every x ∈ X in O(t) total time. The idea is to first compute, by binary search, a point x ∈ X for which ∆_x intersects U, if such a point exists. Then, for the other points in X, traverse from x in both directions while searching for the intersections of the diagonal lines with U in lockstep.
Now we analyze the complexity of the algorithm Interleaving. The candidate set, by definition, has O(t) values which can be computed in O(t) time by the diagonal span procedure. By Proposition 12.15, δ* is in S and can be determined by computing the interleaving distances dI(M|∆_x, N|∆_x) for modules indexed by diagonal lines passing through the O(t) vertices of I_M and I_N. This can be done in O(t) time by the diagonal span procedure. Once we determine δ*, we perform a binary search (the while loop) with O(log t) probes for δ = dI(M, N) in the truncated set S_{≥δ*}, so as to satisfy the first condition of Theorem 12.14. Intersections between two polygons I_M and I_N bounded by x- and y-monotone chains can be computed in O(t) time by a simple traversal of the boundaries. The validity and trivializability of each intersection component can be determined in time linear in the number of its vertices due to Proposition 12.10 and Proposition 12.11 respectively. Since the total number of intersection points is O(t), the validity check takes O(t) time in total. The check for trivializability also takes O(t) time if one uses the diagonal span procedure. So the total time complexity of the algorithm is O(t log t).
Proposition 12.15 below says that δ* is determined by a vertex in I_M or I_N and that δ* ∈ S.
Proposition 12.15. (i) δ* = max_{x∈V(I_M)∪V(I_N)} {dI(M|∆_x, N|∆_x)}; (ii) δ* ∈ S.
The correctness of the algorithm Interleaving already follows from Theorem 12.14 as long
as the candidate set contains the distance dI (M, N). This is indeed true as shown in [141].
Theorem 12.16. dI (M, N) ∈ S .
Remark 12.1. Our main theorem and algorithm consider persistence modules defined on R². For a persistence module defined on a finite or discrete poset like Z², one can extend it to a persistence module M on R² in order to apply our theorem and algorithm. This extension is achieved by assuming that all morphisms outside the given persistence module are isomorphisms and that M_x = 0 as x → −∞ if it is not given otherwise. The reader can draw the analogy between this extension and the one we had for 1-parameter persistence modules (Remark 3.3).
12.5 Notes and Exercises
We already mentioned in Chapter 3 that for 1-parameter persistence modules, Chazal et al. [77]
showed that the bottleneck distance is bounded from above by the interleaving distance dI ; see
also [46, 53, 116] for further generalizations. Lesnick [220] established the isometry theorem
which showed that indeed dI = db . Consequently, dI for 1-parameter persistence modules can be
computed exactly by efficient algorithms known for computing db . In Section 3.2.1, we present
an algorithm for computing db from two given persistence diagrams.
Lesnick defined the interleaving distance for multiparameter persistence modules, and proved its stability and universality [220]. Specifically, he established that the interleaving distance between persistence modules is the best discriminating distance between modules having the property of stability. It is straightforward to observe that dI ≤ db. For some special cases, results in the reverse direction exist. Botnan and Lesnick [47] proved that, for the special class of 2-parameter persistence modules called block decomposable modules, db ≤ (5/2) dI. The support of each indecomposable in such modules consists of the intersection of a bounded or unbounded axis-parallel rectangle with the upper half-plane bounded by the diagonal line x1 = x2. Bjerkevik [32] improved this result to db ≤ dI, thereby extending the isometry theorem dI = db to 2-parameter block decomposable persistence modules.
Interestingly, a zigzag persistence module (Chapter 4) can be mapped to a block decomposable module [47]. Therefore, one can define an interleaving and a bottleneck distance between two zigzag persistence modules via the same distances on their respective block decomposable modules. Suppose that M1 and M2 denote the block decomposable modules corresponding to two zigzag filtrations F1 and F2 respectively. Bjerkevik’s result implies that db(Dgm_p(F1), Dgm_p(F2)) ≤ 2db(M1, M2) = 2dI(M1, M2). The factor of 2 comes from the difference between how distances to a null module are computed in the 1-parameter and 2-parameter cases. It is important to note that the bottleneck distance db for persistence diagrams here takes into account the types of the bars as described in Section 4.3. This means that, while matching the bars for computing this distance, only bars of similar types are matched.
A similar conclusion can also be derived for the bottleneck distance between the levelset persistence diagrams of Reeb graphs. Mapping the 0-th levelset zigzag modules Z_f, Z_g of two Reeb graphs (F, f) and (G, g) to block decomposable modules M_f and M_g respectively, one gets that db(Dgm_0(Z_f), Dgm_0(Z_g)) ≤ 2db(M_f, M_g) = 2dI(M_f, M_g). The interleaving distance dI(M_f, M_g) between block decomposable modules is bounded from above (though not necessarily equal) by the interleaving distance between Reeb graphs given by Definition 7.6, that is, dI(M_f, M_g) ≤ dI(F, G).
Bjerkevik also extended his result to rectangle decomposable d-parameter modules (whose indecomposables are supported on bounded or unbounded rectangles). Specifically, he showed that db ≤ (2d − 1)dI for rectangle decomposable d-parameter modules and db ≤ (d − 1)dI for free d-parameter modules. He gave an example showing that this bound is tight for d = 2.
The multiparameter matching distance dm introduced in [71] provides a lower bound for the interleaving distance [216]. This matching distance can be approximated within any error threshold by the algorithms proposed in [29, 72]; but it cannot provide an upper bound like db. The algorithm for computing dm exactly as presented in Section 12.3 is taken from [207]. The complexity of this algorithm is rather high. To address this issue, an approximation algorithm with better time complexity has been proposed in [209], which builds on the result in [29].
For free, block, rectangle, and triangular decomposable modules, one can compute db by computing pairwise interleaving distances between indecomposables in constant time because they have descriptions of constant complexity. Due to the results mentioned earlier, dI can be estimated within a constant or dimension-dependent factor by computing db for these modules. On the other hand, Botnan and Lesnick [47] observed that even for interval decomposable modules, db cannot approximate dI within any constant factor.
Bjerkevik et al. [33] showed that computing the interleaving distance for 2-parameter interval decomposable persistence modules as considered in this chapter is NP-hard. Worse, it cannot be approximated within a factor of 3 in polynomial time. In this context, the fact that db does not approximate dI within any factor for 2-parameter interval decomposable modules [47] turns out to be a boon in disguise, because otherwise a polynomial time algorithm for computing it by the algorithm presented in Section 12.4 would not have existed. This algorithm is taken from [141], whose extension to multiparameter persistence modules is available on arXiv.
Exercises

1. Show that dI and db are pseudo-metrics on the space of finitely generated multiparameter persistence modules. Show that if the grades of the generators and relations of the modules do not coincide, both become metrics.
2. Give an example of two persistence modules M and N for which dm(M, N) = 0 but dI(M, N) ≠ 0.
3. Prove dI ≤ db and dm ≤ dI .
4. Prove Fact 12.1 for point-line duality.
5. The algorithm MatchDist computes dm in O(n^11) time where n is the total number of generators and relations with which the input modules are described. Design an algorithm for computing dm that runs in o(n^11) time.
6. Consider the matching distance dm between two interval modules. Compute dm in this case in O(n^4) time.
7. Given an interval decomposable persistence module M ∈ R^d-mod and the subcategory B ⊆ R^d-mod of rectangle decomposable modules, let M* denote an optimal approximation of M with a module in B w.r.t. the bottleneck distance db, that is, M* = argmin_{M′∈B} db(M, M′). Show that if M = ⊕_i M^i, then M* = ⊕_i M^{i*}.
8. Prove Proposition 12.7.
9. For two points x, y ∈ R², the ℓ∞ distance between x and y is given by ℓ∞(x, y) = max{|x1 − y1|, |x2 − y2|}. Given a real δ ≥ 0, we can define an ℓ∞ δ-ball centered at a point x ∈ R² as B_δ(x) = {x′ ∈ R² : ℓ∞(x, x′) ≤ δ}. We can further extend this idea to a set I ⊆ R² as I^{+δ} = ∪_{x∈I} B_δ(x), the union of the ℓ∞ δ-balls centered at all points of I. For two intervals I, J ⊂ R², the ℓ∞ Hausdorff distance is defined as dH(I, J) = inf{δ : I ⊆ J^{+δ} and J ⊆ I^{+δ}}. Show that:

(a) For two interval modules M and N, we have dI(M, N) ≤ dH(I_M, I_N).

(b) The inequality in (a) can be strict, that is, dI(M, N) < dH(I_M, I_N) for some M and N.

(Hint: show that dH(I_M, I_N) ≥ δ* and that for every δ ≥ dH(I_M, I_N), each intersection component between M and N_{→δ}, and between N and M_{→δ}, is valid.)
Chapter 13

Topological Persistence and Machine Learning
Machine learning (ML) has become a prevailing technique for data analysis. Naturally, researchers in the past few years have explored ways to combine machine learning techniques with TDA techniques. In previous chapters we introduced various topological structures and algorithms for computing them. In this chapter, we give two examples of combining topological ideas with machine learning approaches. Note that this chapter is not intended to be a survey of such TDA+ML approaches, given that this is a very active and rapidly evolving field.
We have seen that persistent homology, in some sense, encodes the “shape” of data. Thus, it is natural to use persistent homology to map potentially complex input data (e.g., a point set or a graph) to a feature representation (a persistence diagram). In particular, a simple persistence-based feature vectorization and data analysis framework can be as follows: given a collection C of objects (e.g., a set of images, a collection of graphs, etc.), apply persistent homology to map each object to a persistence-diagram representation. Objects in the input collection are thus mapped to a set of points in the space of persistence diagrams; different types of input data can all now be mapped to a common feature space: the space of persistence diagrams. Equipping this space with appropriate metric structures, one can then carry out downstream data analysis tasks on C in the space of persistence diagrams. In Section 13.1, we elaborate on this framework by describing several methods to assign a suitable metric or kernel on the space of persistence diagrams.
One way to further incorporate topological information into a machine learning framework is by using a “topological loss function”. In particular, as topology provides a language to describe global properties of a space, it can help the machine learning task at hand by allowing one to inject topological constraints or priors. This usually leads to optimizing a “topological function” over certain persistence diagrams. An example is given in Figure 13.1, taken from [199], where a term representing the topological quality of the output segmented images is added as part of the loss function to help improve the topology of the segmented foregrounds. In Section 13.2, we give another example of the use of a “topological function” and describe how to address the key challenge of differentiating such a topological loss function when it involves persistent homology based information.
In this book, we have focused mainly on the mathematical structures and algorithmic / computational aspects involving TDA. However, we note that there has been important development in the statistical treatment of topological summaries, which is crucial for quantification of uncertainty, noise, and convergence of topological summaries computed from sampled data. In concluding this book, we provide a very brief description of some of these developments in Section 13.3 at the end of this chapter. Interested readers can follow the references given within this section for further details.

Figure 13.1: The high level neural network framework, where topological information of the segmented image (captured via persistent homology) is used to help train the neural network for better segmentation; reprinted by permission from Xiaoling Hu et al. (2019, fig. 2) [199].
13.1 Feature vectorization of persistence diagrams
The space of persistence diagrams equipped with the bottleneck distance (or the p-th Wasserstein distance) introduced in previous chapters lacks structure (e.g., an inner product), which can pose challenges when used within a machine learning framework. To address this issue, in the past few years, starting with the persistence landscapes [51], a series of methods have been developed to map persistence diagrams to a (finite or infinite dimensional) vector space or a Hilbert space. (A Hilbert space is a vector space equipped with an inner product that is also a complete metric space w.r.t. the distance induced by the inner product.) This can be done explicitly, or by defining a positive (semi-)definite kernel for persistence diagrams. Below we briefly introduce some of these methods. In what follows, for simplicity, let D denote the space of bounded and finite persistence diagrams. Some of the results require only finite persistence diagrams, where the total number of off-diagonal points in a diagram (counted with multiplicity) is finite. However, for simplicity of presentation, we assume that the diagrams are also bounded within a finite box.
13.1.1 Persistence landscape
The persistence landscape was introduced in [51], aiming to make persistence-based summaries amenable to statistical analysis by mapping persistence diagrams to a function space.
Figure 13.2: (a) A persistence diagram D and its corresponding landscape functions are shown in (b), where λ_k := λ_D(k, ·) for k = 1, 2, and 3.
Definition 13.1 (Persistence landscape). Given a finite persistence diagram D = {(b_i, d_i)}_{i∈[1,n]} from D, the persistence landscape w.r.t. D is a function λ_D : N × R → R where

λ_D(k, t) := k-th largest value of [min{t − b_i, d_i − t}]_+ for i ∈ [1, n].

Here, [c]_+ = max(c, 0).
For a fixed k, λ_D(k, ·) : R → R is a function on R. In particular, one can think of each persistence point (b_i, d_i) as giving rise to a triangle whose upper boundary is traced out by the points

{ (t, [min{t − b_i, d_i − t}]_+) | t ∈ R };

see Figure 13.2. There are n such triangles, and the function λ_D(k, ·) is the k-th upper envelope of the arrangement formed by the union of these triangles; intuitively, its graph consists of points on the boundary of the k-th layer of these triangles.
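As a sanity check of this description, the landscape functions can be evaluated directly from the tent functions; the short Python sketch below is a naive per-query evaluation (not an optimized implementation) of λ_D(k, t):

    import numpy as np

    def landscape(diagram, k, ts):
        """Evaluate the k-th landscape function lambda_D(k, t) of a finite
        diagram (a list of (b, d) pairs) at the query values ts; k is 1-based."""
        vals = []
        for t in np.atleast_1d(ts):
            tents = sorted((max(min(t - b, d - t), 0.0) for (b, d) in diagram),
                           reverse=True)
            vals.append(tents[k - 1] if k <= len(tents) else 0.0)
        return np.array(vals)

    # The diagram of Figure 13.2: lambda_1 peaks at t = 4 with value 2.
    D = [(1, 4), (2, 6), (3, 5)]
    print(landscape(D, 1, [4.0]))   # -> [2.]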
The persistence landscape maps persistence diagrams to a linear function space. The p-norm of a persistence landscape is defined as

‖λ_D‖_p^p = Σ_{k=1}^∞ ‖λ_D(k, ·)‖_p^p.

Given two persistence diagrams D1 and D2, their p-landscape distance is defined by

Λ_p(D1, D2) = ‖λ_{D1} − λ_{D2}‖_p.     (13.1)
Note that for any k > n, λ_D(k, ·) ≡ 0. One can recognize that persistence landscapes for finite persistence diagrams lie in the so-called L^p-space L^p(N × R)¹. If p = 2, then this is a Hilbert space. Given a set of persistence diagrams, one can compute their mean or carry out other statistical analysis in L^p(N × R). For example, given a set of ℓ finite diagrams D_1, . . . , D_ℓ ∈ D, one can define the mean landscape λ̄ of their corresponding landscapes λ_{D_1}, . . . , λ_{D_ℓ} to be

λ̄(k, t) = (1/ℓ) Σ_{i=1}^ℓ λ_{D_i}(k, t).

¹For 1 ≤ p < ∞, the space 𝓛^p(X) is defined as 𝓛^p(X) := {f : X → R | ‖f‖_p < +∞}. For example, 𝓛²(R^d) is the space of standard square-integrable functions on R^d. Then L^p(X) is defined as L^p(X) = 𝓛^p(X)/∼, where f ∼ g if ‖f − g‖_p = 0.
The following claim states that the map from the space of finite persistence diagrams D to the
space of persistence landscapes is injective, and this map is lossless in terms of the information
encoded in the persistence diagram.
Claim 13.1. Given a persistence diagram D, let λD be its persistence landscape. Then from λD
one can uniquely recover the persistence diagram D.
However, a function λ : N × R → R may not be the image of any valid persistence diagram.
For example, the mean landscape introduced above may not be the image of any persistence
diagram.
Finally, in addition to being injective, under appropriate norms, the map from persistence diagrams to persistence landscapes is also stable (1-Lipschitz w.r.t. the bottleneck distance between persistence diagrams):
Theorem 13.1. For persistence diagrams D and D′, Λ∞(D, D′) ≤ dB(D, D′).
Additional stability results for Λ_p are given in [51], relating it to the p-th Wasserstein distance for persistence diagrams, or to the case where the persistence diagrams are induced by tame Lipschitz functions.
13.1.2 Persistence scale space (PSS) kernel
In the previous subsection, we introduced a way to map persistence diagrams into the function space L^p(N × R) (which is a Hilbert space when p = 2). One can also map persistence diagrams to a so-called Reproducing Kernel Hilbert Space via the use of kernels. The work of [263] is the first of a line of work defining a (positive semi-definite) kernel on persistence diagrams.
Definition 13.2 (Positive, negative semi-definite kernel). Given a topological space X, a function k : X × X → R is a positive semi-definite kernel if it is symmetric and for any integer n > 0, any x1, . . . , xn ∈ X, and any a1, . . . , an ∈ R, it holds that Σ_{i,j} a_i a_j k(x_i, x_j) ≥ 0. Analogously, k is a negative semi-definite kernel if it is symmetric and for any integer n > 0, any x1, . . . , xn ∈ X, and any a1, . . . , an ∈ R with Σ_i a_i = 0, it holds that Σ_{i,j} a_i a_j k(x_i, x_j) ≤ 0.
Now, given a set X and a Hilbert space H of real-valued functions on X, the evaluation functional over H is a linear functional that evaluates each function f ∈ H at a point x: that is, given x, L_x : H → R is defined as L_x(f) = f(x) for any f ∈ H. The Hilbert space H is called a Reproducing Kernel Hilbert Space (RKHS) if L_x is continuous for all x ∈ X. It is known that given a positive semi-definite kernel k, there is a unique Reproducing Kernel Hilbert Space (RKHS) H_k such that k(x, y) = ⟨k(·, x), k(·, y)⟩_{H_k}. We call k the reproducing kernel for H_k. From now on, we simply use “kernel” to refer to a positive semi-definite kernel. See [271] for more detailed discussions of kernels, RKHS, and related concepts.
Equivalently, a kernel can be thought of as the inner product k(x, y) = ⟨Φ(x), Φ(y)⟩_H after mapping X to some Hilbert space H via a feature map Φ : X → H. With this inner product, one can further induce a pseudo-metric² by:

d_k²(x, y) := k(x, x) + k(y, y) − 2k(x, y), or equivalently, d_k(x, y) = ‖Φ(x) − Φ(y)‖_H.

²Recall that, different from a metric, a pseudo-metric may have d(x, y) = 0 without x = y; all other conditions for a metric hold for a pseudo-metric.
Many machine learning pipelines directly use kernels and their associated inner-product structure. The work of [263] constructs the following persistence scale space kernel (PSSK) by defining an explicit feature map. Let Ω = {x = (x1, x2) ∈ R² | x2 ≥ x1} denote the subspace of R² on or above the diagonal³. Recall the L²-space L²(Ω), which is a Hilbert space.
Definition 13.3 (Persistence scale space kernel (PSSK)). Define the feature map Φ_σ : D → L²(Ω) at scale σ > 0 as follows: for a persistence diagram D ∈ D and x ∈ Ω, set

Φ_σ(D)(x) = (1/(4πσ)) Σ_{y∈D} [ e^{−‖x−y‖²/(4σ)} − e^{−‖x−ȳ‖²/(4σ)} ],

where ȳ = (y2, y1) if y = (y1, y2) (i.e., ȳ is the reflection of y across the diagonal). This feature map induces the following persistence scale space kernel (PSSK) k_σ : D × D → R using the inner product structure on L²(Ω): given two diagrams D, E ∈ D,

k_σ(D, E) = ⟨Φ_σ(D), Φ_σ(E)⟩_{L²(Ω)} = (1/(8πσ)) Σ_{y∈D; z∈E} [ e^{−‖y−z‖²/(8σ)} − e^{−‖y−z̄‖²/(8σ)} ].     (13.2)
In other words, a persistence diagram is now mapped to a function Φ_σ(D) : Ω → R under the feature map Φ_σ. By construction, the PSS kernel is positive definite. Now consider the distance induced by the PSS kernel:

‖Φ_σ(D) − Φ_σ(E)‖_{L²(Ω)} = √( k_σ(D, D) + k_σ(E, E) − 2k_σ(D, E) ).

This distance is stable in the sense that the feature map Φ_σ is Lipschitz w.r.t. the 1-Wasserstein distance:

Theorem 13.2. Given two persistence diagrams D, E ∈ D, we have

‖Φ_σ(D) − Φ_σ(E)‖_{L²(Ω)} ≤ (1/(2πσ)) d_{W,1}(D, E).
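Since Eqn. (13.2) is a finite double sum, the PSS kernel and its induced RKHS distance are straightforward to evaluate; the Python sketch below is a direct O(|D||E|) evaluation (the clamp guards against floating-point round-off):

    import numpy as np

    def pss_kernel(D, E, sigma):
        """PSS kernel k_sigma(D, E) of Eqn. (13.2); D, E are (n, 2) arrays
        of (birth, death) points."""
        D, E = np.asarray(D, float), np.asarray(E, float)
        total = 0.0
        for y in D:
            for z in E:
                zbar = z[::-1]   # reflection of z across the diagonal
                total += np.exp(-((y - z) ** 2).sum() / (8 * sigma)) \
                       - np.exp(-((y - zbar) ** 2).sum() / (8 * sigma))
        return total / (8 * np.pi * sigma)

    def pss_distance(D, E, sigma):
        # the induced distance ||Phi_sigma(D) - Phi_sigma(E)||_{L^2(Omega)}
        sq = pss_kernel(D, D, sigma) + pss_kernel(E, E, sigma) \
             - 2 * pss_kernel(D, E, sigma)
        return np.sqrt(max(sq, 0.0))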
13.1.3 Persistence images
Let D ∈ D be a finite persistence diagram. We set T : R² → R² to be the linear transformation where for each (x, y) ∈ R², T(x, y) = (x, y − x). Let T(D) denote the transformed diagram of D. Let φ_u : R² → R be a differentiable probability distribution with mean u ∈ R² (e.g., the normalized Gaussian, where for any z ∈ R², φ_u(z) = (1/(2πτ²)) e^{−‖z−u‖²/(2τ²)}). We now define the persistence images introduced in [4].

Definition 13.4 (Persistence image). Let ω : R² → R be a non-negative weight function on R². Given a persistence diagram D, its persistence surface µ_D : R² → R (w.r.t. ω) is defined as:

µ_D(z) := Σ_{u∈T(D)} ω(u) φ_u(z), for any z ∈ R².     (13.3)
³Often in the literature, one assumes that the standard persistent homology is considered, where the birth time is smaller than or equal to the death time in the filtration. Several of the kernels introduced here, including PSSK and PWGK, assume that persistence diagrams lie in Ω.
The persistence image is a discretization of the persistence surface. Specifically, fix a grid on a rectangular region in the plane with a collection P of N rectangles (pixels). The persistence image of a persistence diagram D is I_D = {I_D[p]}_{p∈P}, which consists of N numbers (i.e., a vector in R^N), one for each pixel p in the grid P, with I_D[p] := ∫_p µ_D dx dy.
We remark that the weight function ω in the construction of the persistence surface allows points in the persistence diagram to have different contributions to the final representation. A natural choice for ω(u) is the persistence |d − b| of the point u = (b, d).
The persistence image can be viewed as a vector in R^N. One can then compute the distance between two persistence diagrams D and E as the L²-distance ‖I_D − I_E‖₂ between their persistence images (vectors) I_D and I_E. Other L^p-norms can also be used.
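A minimal Python sketch of this pipeline follows; the grid extent and resolution, the Gaussian width τ, the choice ω(u) = persistence, and the midpoint approximation of the per-pixel integral are all assumptions of this sketch rather than prescriptions of [4]:

    import numpy as np

    def persistence_image(diagram, grid=(20, 20), extent=(0, 1, 0, 1), tau=0.1):
        """Sketch of a persistence image: transform points by T(x, y) = (x, y - x),
        smooth with Gaussians of width tau weighted by persistence, and
        approximate each pixel integral by (value at pixel center) * (pixel area)."""
        pts = np.asarray(diagram, float)
        T = np.column_stack([pts[:, 0], pts[:, 1] - pts[:, 0]])
        weights = T[:, 1]                     # omega(u) = persistence |d - b|
        nx, ny = grid
        x0, x1, y0, y1 = extent
        xs = np.linspace(x0, x1, nx, endpoint=False) + (x1 - x0) / (2 * nx)
        ys = np.linspace(y0, y1, ny, endpoint=False) + (y1 - y0) / (2 * ny)
        img = np.zeros((nx, ny))
        for u, w in zip(T, weights):
            gx = np.exp(-(xs - u[0]) ** 2 / (2 * tau ** 2))
            gy = np.exp(-(ys - u[1]) ** 2 / (2 * tau ** 2))
            img += w * np.outer(gx, gy) / (2 * np.pi * tau ** 2)
        return img * ((x1 - x0) / nx) * ((y1 - y0) / ny)   # pixel area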
Persistence images are shown to be stable w.r.t. the 1-Wasserstein distance between persistence diagrams [4]. As an example, below we state the stability result for the special case where the persistence surfaces are generated using the normalized Gaussian distribution φ_u : R² → R defined via φ_u(z) = (1/(2πσ²)) e^{−‖z−u‖₂²/(2σ²)} for any z ∈ R². See [4] for stability results for the general cases.

Theorem 13.3. Suppose persistence images are computed with the normalized Gaussian distribution with variance σ² and weight function ω : R² → R. Then the persistence images are stable w.r.t. the 1-Wasserstein distance between persistence diagrams. More precisely, given two finite and bounded persistence diagrams D and E, we have:

‖I_D − I_E‖₁ ≤ ( √5 |∇ω| + √(10/π) · (‖ω‖∞/σ) ) · d_{W,1}(D, E).

Here, ∇ω stands for the gradient of ω, and |∇ω| = sup_{z∈R²} ‖∇ω(z)‖₂ is the maximum norm of the gradient vector of ω at any point in R². The same upper bound holds for ‖I_D − I_E‖₂ and ‖I_D − I_E‖∞ as well.
13.1.4 Persistence weighted Gaussian kernel (PWGK)
The work of [215] proposes to first embed each persistence diagram into a Reproducing Kernel Hilbert Space (RKHS). Using the representations of persistence diagrams in this RKHS, one can further put another kernel on top of them to obtain a final kernel for persistence diagrams.
In particular, the first step of embedding persistence diagrams into a RKHS is achieved by kernel embedding for (signed) measures. Recall that given a kernel k, there is a unique RKHS H_k associated to k where k is its reproducing kernel. Now given a locally compact Hausdorff space X, let C0(X) denote the space of continuous functions vanishing at infinity. A kernel k on X is called a C0-kernel if k(·, x) is in C0(X) for any x ∈ X. It turns out that if k is a C0-kernel, then its associated RKHS H_k is a subspace of C0(X); we further call k C0-universal if it is a C0-kernel and H_k is dense in C0(X). For example, the d-dimensional Gaussian kernel k_G(x, y) = (1/(2πτ²)) e^{−‖x−y‖²/(2τ²)} is C0-universal on R^d [283]. Recall that Ω = {x = (x1, x2) ∈ R² | x2 ≥ x1} denotes the subspace of R² on or above the diagonal.
Definition 13.5 (Persistence weighted kernel). Let k : Ω × Ω → R be a C0-universal kernel on Ω (e.g., a Gaussian), and ω : Ω → R+ a strictly positive (weight) function on Ω. The following feature map Ψ_{k,ω} : D → H_k maps each persistence diagram D ∈ D to the RKHS H_k associated to k:

Ψ_{k,ω}(D) = Σ_{x∈D} ω(x) k(·, x).

This feature map induces the following persistence weighted kernel (PWK) K_{k,ω} : D × D → R:

K_{k,ω}(D, E) = ⟨Ψ_{k,ω}(D), Ψ_{k,ω}(E)⟩_{H_k} = Σ_{x∈D; y∈E} ω(x)ω(y) k(x, y).     (13.4)
The intuition behind the above feature map is as follows: a persistence diagram D can be viewed as a discrete measure µ_D^ω := Σ_{x∈D} ω(x) δ_x, where ω : R² → R is a weight function and δ_x is the Dirac measure at x. (Similar to persistence images, the use of the weight function ω allows different points in the birth-death plane to have different influence.) The map Ψ_{k,ω}(D) is essentially the kernel mean embedding of distributions (with persistence diagrams viewed as discrete measures) into the RKHS. It is known that if the kernel k is C0-universal, then this embedding is in fact injective [283], and hence the resulting induced distance ‖Ψ_{k,ω}(D) − Ψ_{k,ω}(E)‖_{H_k} is a proper metric (instead of a pseudo-metric).
An alternate construction. An equivalent construction for Eqn. (13.4) is as follows: treat a persistence diagram D as an unweighted discrete measure µ_D = Σ_{x∈D} δ_x. Given a kernel k, consider the ω-weighted version of it

k_ω(x, y) := ω(x) · ω(y) · k(x, y).

This weighted kernel k_ω is still positive semi-definite for a strictly positive weight function ω : Ω → R+. Let H_{k_ω} denote its associated RKHS. Then the map

Ψ_{k_ω}(D) := Σ_{x∈D} ω(x) ω(·) k(·, x)

defines a valid feature map Ψ_{k_ω} : D → H_{k_ω} into the RKHS H_{k_ω}. It is shown in [215] that the inner product K_{k_ω} : D × D → R induced by this feature map equals the inner product in Eqn. (13.4):

K_{k_ω}(D, E) = ⟨Ψ_{k_ω}(D), Ψ_{k_ω}(E)⟩_{H_{k_ω}} = Σ_{x∈D; y∈E} ω(x)ω(y) k(x, y) = K_{k,ω}(D, E).     (13.5)
Persistence weighted Gaussian kernel (PWGK). There are different choices for the weight function ω and the kernel k. For example, given a persistence point x = (b, d), let pers(x) = |d − b|. Then we can set the weight function to be

ω_arc(x) = arctan(C · pers(x)^p), where C is a constant and p ∈ Z_{>0}.

We can also choose the kernel k to be the 2D (unnormalized) Gaussian kernel k_G(x, y) = e^{−‖x−y‖²/(2τ²)}. Then the weighted kernel k_G^{ω_arc}(x, y) = ω_arc(x) ω_arc(y) k_G(x, y) is referred to as the persistence weighted Gaussian kernel (PWGK). Stability results for the PWGK-induced distance w.r.t. the bottleneck distance dB and the 1-Wasserstein distance d_{W,1} on persistence diagrams are shown in [215], with bounds depending on the weight function ω and the kernel k_G. The precise statements are somewhat involved, so we omit the details here. We remark that stability w.r.t. the bottleneck distance is provided, which is usually harder to obtain than stability w.r.t. the Wasserstein distance for such vectorizations of persistence diagrams.
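Since the PWGK is again a finite double sum (Eqn. (13.4) with k = k_G and ω = ω_arc), it is simple to evaluate; a small vectorized Python sketch follows, with C, p, and τ as user-chosen hyperparameters:

    import numpy as np

    def pwgk(D, E, C=1.0, p=1, tau=1.0):
        """Persistence weighted Gaussian kernel: Eqn. (13.4) with the
        Gaussian k_G and weight w_arc(x) = arctan(C * pers(x)^p)."""
        D, E = np.asarray(D, float), np.asarray(E, float)
        w = lambda P: np.arctan(C * (P[:, 1] - P[:, 0]) ** p)
        sq = ((D[:, None, :] - E[None, :, :]) ** 2).sum(-1)  # pairwise ||x - y||^2
        return np.einsum('i,j,ij->', w(D), w(E), np.exp(-sq / (2 * tau ** 2)))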
Finally, now that persistence diagrams are embedded in a RKHS, one can directly use the associated inner product and kernel in machine learning pipelines. One can also further put another kernel on top of the RKHS representation of persistence diagrams. Indeed, the persistence weighted kernel in Eqn. (13.4) is the equivalent of putting a linear kernel on the RKHS H_k. We can also consider using a non-linear kernel, say the Gaussian kernel, on the RKHS H_k, and obtain yet another kernel on persistence diagrams, called the (k, ω)-Gaussian kernel⁴:

K^G_{k,ω}(D, E) = exp( −(1/2) ‖Ψ_{k,ω}(D) − Ψ_{k,ω}(E)‖²_{H_k} ).
13.1.5 Sliced Wasserstein kernel
Instead of using feature maps, one can also construct a kernel for persistence diagrams directly; we now describe such an approach taken in [68]. This requires a positive semi-definite kernel (recall Definition 13.2). One way to construct a positive semi-definite kernel is to exponentiate a negative (semi-)definite kernel; see the following result [25]⁵.

Theorem 13.4. Given X and φ : X × X → R, the kernel φ is negative semi-definite if and only if e^{−tφ} is positive semi-definite for all t > 0.
In what follows, we construct the so-called Sliced Wasserstein distance d_SW for persistence diagrams, which is shown to be negative semi-definite. We then use it to construct the Sliced Wasserstein kernel following the above theorem.
Specifically, let µ, ν be two (unnormalized) non-negative bounded measures on the real line such that the total mass µ(R) equals ν(R). Let Π(µ, ν) denote the set of measures on R² with marginals µ and ν. Consider

W(µ, ν) = inf_{P∈Π(µ,ν)} ∫∫_{R×R} |x − y| dP(x, y),     (13.6)

which is simply the 1-Wasserstein distance between the measures µ and ν. In the following definition, S¹ denotes the unit circle in the plane.
Definition 13.6 (Sliced Wasserstein distance). Given a unit vector θ ∈ S¹ ⊆ R², let L(θ) denote the line {λθ | λ ∈ R}. Let π_θ : R² → L(θ) be the orthogonal projection of the plane onto L(θ), and let π_∆ : R² → ∆ be the orthogonal projection onto the diagonal ∆ = {(x, x) | x ∈ R}. Given two persistence diagrams D and E, set µ_D^θ := Σ_{p∈D} δ_{π_θ(p)} and µ_{D∆}^θ := Σ_{p∈D} δ_{π_θ∘π_∆(p)}; set µ_E^θ and µ_{E∆}^θ in a symmetric manner. Then the Sliced Wasserstein distance between D and E is defined as:

d_SW(D, E) := (1/(2π)) ∫_{S¹} W(µ_D^θ + µ_{E∆}^θ, µ_E^θ + µ_{D∆}^θ) dθ.
⁴In the work of [215], K^G_{k_G,ω_arc} is also sometimes referred to as the persistence weighted Gaussian kernel.
⁵In [25], the use of positive (negative) definite kernels agrees with our positive (negative) semi-definite kernels.
In the above definition, the sums µ_D^θ + µ_{E∆}^θ and µ_E^θ + µ_{D∆}^θ ensure that the resulting two measures have the same total mass.
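Because each projected measure lives on the real line, the cost W in Eqn. (13.6) reduces to matching sorted atoms, and the integral over S¹ can be approximated by averaging over sampled directions (θ and −θ give the same cost). The following Python sketch illustrates this approximation; the number of directions is a user-chosen accuracy parameter, and this is not an exact evaluation of the integral:

    import numpy as np

    def sliced_wasserstein(D, E, n_dirs=100):
        """Approximate d_SW(D, E) of Definition 13.6 by averaging the 1-D
        1-Wasserstein cost over n_dirs directions spanning a half-circle."""
        D, E = np.asarray(D, float), np.asarray(E, float)
        diag = lambda P: ((P[:, 0] + P[:, 1]) / 2)[:, None] * np.ones(2)  # pi_Delta
        total = 0.0
        for a in np.linspace(-np.pi / 2, np.pi / 2, n_dirs, endpoint=False):
            theta = np.array([np.cos(a), np.sin(a)])
            mu = np.concatenate([D @ theta, diag(E) @ theta])  # mu_D + mu_{E_Delta}
            nu = np.concatenate([E @ theta, diag(D) @ theta])  # mu_E + mu_{D_Delta}
            total += np.abs(np.sort(mu) - np.sort(nu)).sum()   # 1-D optimal matching
        return total / n_dirs

    def sw_kernel(D, E, sigma=1.0):
        return np.exp(-sliced_wasserstein(D, E) / (2 * sigma ** 2))  # Eqn. (13.7)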
Proposition 13.5. d_SW is negative semi-definite on D, where D is the space of bounded and finite persistence diagrams.
Combining the above proposition with Theorem 13.4, we can now define the positive semi-definite Sliced Wasserstein kernel k_SW on D as:

k_SW(D, E) := e^{−d_SW(D,E)/(2σ²)}, for σ > 0.     (13.7)
The Sliced Wasserstein distance is not only stable, but also strongly equivalent to the 1-Wasserstein distance d_{W,1} on bounded persistence diagrams in the following sense:

Theorem 13.6. Let D_N be the set of bounded persistence diagrams with cardinalities at most N. For any D, E ∈ D_N, one has:

d_{W,1}(D, E) / (4N(4N − 1) + 2) ≤ d_SW(D, E) ≤ 2√2 · d_{W,1}(D, E).
13.1.6 Persistence Fisher kernel
The construction of the persistence Fisher (PF) kernel, proposed by [218], uses a similar idea as the Sliced Wasserstein (SW) distance in the sense that it also leverages Theorem 13.4 to construct a positive definite kernel on persistence diagrams. However, it uses the Fisher information metric from information geometry (usually used for probability measures) to derive the kernel. First, given a persistence diagram D, we map it to a function µ_D : R² → R+ ∪ {0} as follows:

µ_D(x) := (1/Z) Σ_{u∈D} φ_{G,σ}(x, u), where φ_{G,σ}(x, u) = e^{−‖x−u‖²/(2σ²)} and Z = ∫_{R²} Σ_{u∈D} φ_{G,σ}(x, u) dx.
This function is similar to the persistence surface used in [4]. Recall that ∆ denotes the diagonal in the plane. Given a diagram D, let D_∆ := {π_∆(u) | u ∈ D}, where π_∆ denotes the orthogonal projection onto the diagonal ∆.

Definition 13.7 (Persistence Fisher (PF) kernel). Given two persistence diagrams D, E, the Fisher information metric between their corresponding persistence surfaces µ_D and µ_E is defined as:

d_FIM(D, E) := d_FIM(µ_{D∪E_∆}, µ_{E∪D_∆}) = arccos( ∫_{R²} √( µ_{D∪E_∆}(x) · µ_{E∪D_∆}(x) ) dx ).

The Persistence Fisher (PF) kernel for persistence diagrams is then defined as:

k_PF(D, E) := e^{−t·d_FIM(D,E)}, for some t > 0.
Note that, similar to the Sliced Wasserstein distance, the use of D ∪ E_∆ (resp. E ∪ D_∆) is to address the issue that D and E may have different cardinalities.
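A discretized Python sketch of d_FIM and k_PF follows; the grid (its extent and resolution) over which the smoothed densities are normalized, and the width σ, are assumptions of this sketch, since a faithful evaluation would integrate over all of R²:

    import numpy as np

    def fisher_information_metric(D, E, sigma=0.1, grid=64, extent=(0, 1, 0, 1)):
        """Discretized d_FIM (Definition 13.7): smooth the augmented diagrams
        into grid densities, normalize, and take arccos of the Bhattacharyya
        coefficient."""
        def density(pts, X, Y):
            rho = np.zeros_like(X)
            for (u1, u2) in pts:
                rho += np.exp(-((X - u1) ** 2 + (Y - u2) ** 2) / (2 * sigma ** 2))
            return rho / rho.sum()   # grid sum plays the role of Z
        proj = lambda P: [((b + d) / 2, (b + d) / 2) for (b, d) in P]  # pi_Delta
        xs = np.linspace(extent[0], extent[1], grid)
        ys = np.linspace(extent[2], extent[3], grid)
        X, Y = np.meshgrid(xs, ys, indexing='ij')
        mu = density(list(D) + proj(E), X, Y)   # mu_{D u E_Delta}
        nu = density(list(E) + proj(D), X, Y)   # mu_{E u D_Delta}
        bc = np.sqrt(mu * nu).sum()             # Bhattacharyya coefficient
        return np.arccos(np.clip(bc, 0.0, 1.0))

    def persistence_fisher_kernel(D, E, t=1.0, **kw):
        return np.exp(-t * fisher_information_metric(D, E, **kw))  # k_PF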
Proposition 13.7. The function (d_FIM − τ) is negative definite on the set of bounded and finite persistence diagrams D for any τ ≥ π/2.

By the above result and Theorem 13.4, we have that e^{−t(d_FIM−τ)} is positive definite for t > 0 and τ ≥ π/2. Furthermore, by definition, we can rewrite the Persistence Fisher kernel as:

k_PF(D, E) = e^{−t·d_FIM(D,E)} = α · e^{−t·(d_FIM(D,E)−τ)}, where τ ≥ π/2 and α = e^{−tτ} > 0.

As α > 0 is a fixed constant, it then follows that:

Corollary 13.8. The Persistence Fisher kernel k_PF is positive definite on D.
The work of [218] provides interesting analysis of the eigensystem of the integral operator induced by k_PF. Furthermore, both the persistence Fisher kernel and the Sliced Wasserstein kernel are infinitely divisible. This can bring computational advantages when using them in kernel machines; the PSS kernel and PWGK do not have this property.
We remark that other vectorization approaches for persistence diagrams have also been developed. Very recently, there have also been several pieces of work on learning the representation of persistence diagrams in an end-to-end manner using labelled data. We will mention some of these works in the bibliographical notes later.
13.2 Optimizing topological loss functions

Topology provides a language to describe global properties of a space. One can thus envision adding topological constraints or priors to the machine learning task at hand. This usually leads to optimizing a “topological function” over certain persistence diagrams. To motivate this, in Section 13.2.1 we give an example where one aims to regularize the topological complexity of a classifier, which leads to a topological loss function. We then describe how a resulting topological function can be optimized in Section 13.2.2. We briefly discuss some other recent work on injecting topological constraints / losses into machine learning pipelines at the end of this section.
13.2.1 Topological regularizer

We describe the work of [96], which uses persistent homology to regularize classifiers, as an example to illustrate the occurrence of topological functions. For simplicity, we consider the binary classification problem, and assume that the domain X (from which the input data is sampled) is a d-dimensional hypercube. A classifier is a smooth scalar function f : X → R, which provides the prediction for a training/testing data point x ∈ X by evaluating sign(f(x)). In other words, the classification boundary (separating the positive and negative classification regions), denoted by S_f, is simply the 0-level set (i.e., the level set at function value 0):

S_f = f^{−1}(0) = {x ∈ X | f(x) = 0}.

See Figure 13.3 (a) for an example, where the classification boundary S_f consists of the U-shaped curve and two closed loops.
Figure 13.3: (a) The red curve is the classification boundary S_f. (b) The graph of the classifier function f, with S_f (the level set at value 0) marked in red. (c) Pushing the saddle q1 down removes the left component in S_f, as shown in (d). (Image taken from [96].)
The classifier may have unnecessary details that over-fit the input data, and one way to address this is by regularizing (constraining) properties of f (e.g., requiring that it be smooth). The work of [96] proposed to regularize the “topological simplicity” of a classifier. In the example of Figure 13.3 (a), there are three components (0-th homological features) in S_f. To develop a notion of “topological complexity” of the classification boundary, it is desirable to quantify the “robustness” of these topological features. To do so, we need to use information about the entire classifier function f beyond just the 0-level set; see Figure 13.3 (b). Notice that, while the two small components in S_f are of similar size, intuitively it takes less perturbation of the classifier function f to remove the left component. In particular, one could push down the saddle point q1 so that this component is merged with the large component in the level set S_f, thereby reducing the 0-th Betti number of S_f. See Figure 13.3 (c) and (d). The perturbation required to do so, in terms of the maximum change in function values, is less than what is required for pushing q2 or p2 to remove the right component.
Hence the “robustness” of features within the level set S_f depends on information about f beyond just S_f. To this end, one can do the following: let Dgm_f be the levelset zigzag persistence diagram of f. Set

Π_{S_f} := {(b, d) ∈ Dgm_f | b ≤ 0; d ≥ 0}.
Figure 13.4: (a) A function f : R → R. Its persistence pairings (of critical points) are marked by the dotted curves: {(x1, x6), (x2, x5), (x3, x4), . . .}. The corresponding persistence diagram is shown in (b). The set Π_{S_f} consists of all points within the red rectangle; that is, Π_{S_f} = {(f1, f6), (f2, f5), . . .}, where f_i = f(x_i) for i ∈ [1, 6]. Note that (f3, f4) is not in Π_{S_f} as the interval [f3, f4] does not contain 0. (Image taken from [96].)
See Figure 13.4 for an illustration. Intuitively, points in Π_{S_f} are those persistent features whose life-time passes through the 0-level set S_f. There is a one-to-one correspondence between the “topological features” in S_f and the points in Π_{S_f} (this can be made more precise via the persistent cycles concept introduced in Definition 5.7 of Chapter 5), and one can view a point (b, d) ∈ Π_{S_f} as the life-time of its corresponding feature in the 0-level set S_f. The robustness of the feature corresponding to a point c = (b, d) is then defined as ρ(c) = min{|b|, |d|}. Intuitively, this is the least amount of function perturbation, in terms of the L∞ norm, needed to remove this feature from S_f (i.e., to push the persistent point c out of the set Π_{S_f}). One can then define the topological complexity (topological penalty) of the classifier f as

L_topo(f) := Σ_{c∈Π_{S_f}} ρ²(c).
In practice, suppose for example that we have the supervised setting where we are given a set of points X_n = {x1, . . . , xn} with class labels {y1, . . . , yn}. Assume the classifier f_ω is parameterized by ω ∈ R^m. We can combine the topological penalty L_topo with any standard loss function to define a final loss function, for example,

L(f_ω, X_n) = Σ_{x_i∈X_n} ℓ(f_ω(x_i), y_i) + λ L_topo(f_ω),     (13.8)

where the first term represents a standard loss and ℓ(·, ·) could be the cross-entropy loss, hinge loss, and so on.
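As a concrete illustration, the topological penalty itself is simple to evaluate once the relevant diagram is available; the following Python sketch assumes the diagram is given as a list of (b, d) pairs produced by some levelset persistence computation:

    def topo_penalty(diagram):
        """L_topo(f): sum of squared robustness rho(c) = min(|b|, |d|)
        over the points c = (b, d) of Dgm_f lying in Pi_{S_f}, i.e.,
        those with b <= 0 <= d."""
        return sum(min(abs(b), abs(d)) ** 2
                   for (b, d) in diagram if b <= 0.0 <= d)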
Finally, to optimize L(f_ω, X_n) w.r.t. ω (so as to learn the best classifier f_ω), we can perform (stochastic) gradient descent, and thus need to compute the gradient of L_topo(f_ω). To this end, we approximate the domain X by a certain simplicial complex K spanned by the samples X_n. In [96], only the 0-th topological information of the classification boundary S_f is used; hence one only needs the 1-skeleton of K, which in the implementation of [96] is simply taken to be the k-nearest neighbor graph spanned by the input samples X_n. One then uses the approach described in Section 13.2.2 below to compute gradients of this loss function, which is a persistence-based topological function.
13.2.2 Gradients of a persistence-based topological function

For simplicity, we describe the setting where we are given an input topological function which incorporates the persistence induced by a PL-function on a simplicial complex.
Specifically, given a simplicial complex domain K and a PL-function f : |K| → R, let Dgm_f denote the (sublevel set, superlevel set, union of them, or levelset zigzag) persistence diagram induced by f. Now suppose that the function f is parameterized by some m-dimensional parameter ω ∈ R^m, and denote the corresponding function by f_ω : |K| → R. Its resulting persistence diagram is denoted by Dgm_{f_ω}. In the exposition below, we sometimes omit the subscript ω from f_ω for simplicity.
Recall that Dgm_f consists of a multiset of points {(b_i, d_i)}_{i∈I_f}, where I_f is an index set. Suppose we have a persistence-based topological function:

T(ω) := T(Dgm_{f_ω}) = T( {(b_i, d_i)}_{i∈I_{f_ω}} );

for example, T(ω) could be L_topo(f_ω) introduced in the previous section.
To optimize the topological function T(ω), one may need to compute the gradient of T w.r.t. the parameter ω. Applying the chain rule, this means that one needs to be able to compute ∂b_i/∂ω and ∂d_i/∂ω for certain points (b_i, d_i) in the persistence diagram. (Terms such as ∂T/∂b_i can be computed easily if the analytic form of T w.r.t. the b_i's and d_i's is given; again, consider L_topo(f) from the previous section as an example.) Intuitively, this requires the “inverse” of the map which maps f_ω to its persistence diagram Dgm_{f_ω}. This inverse in general does not exist. However, assuming that f_ω is a PL function defined on K, it turns out that one can map the b_i's and d_i's back to vertices of K, and this map is locally constant if all vertices of K have distinct function values.
More specifically, suppose Dgm_f is generated by the persistent homology of the sublevel set filtration induced by f. Recall, as described in Section 3.5.2, that from the algorithmic point of view the sublevel set filtration is simulated by the so-called lower-star filtration of K. Using notations from Section 3.5.2, let V_i be the first i vertices of V, sorted in non-decreasing order of their function values, and K_i = ∪_{j≤i} Lst(v_j) the set of all simplices spanned by vertices in V_i (i.e., by vertices whose function value is at most f(v_i)). The sublevel set filtration F is constructed by adding v_i and all simplices in its lower star in increasing order of i; recall Eqn. (3.10). Furthermore, recall (Theorem 3.16) that each persistent point in the diagram Dgm_f is in fact of the form (b_i, d_i) = (f(v_{ℓ_i}), f(v_{r_i})) such that the pairing function µ_f^{ℓ_i,r_i} > 0, and the vertices v_{ℓ_i} and v_{r_i} are both homological critical points of the PL-function f. We use the map ρ : Dgm_f → V × V to denote this correspondence⁶, with ρ(b_i, d_i) = (v_{ℓ_i}, v_{r_i}). We will also abuse notation slightly and write ρ(b_i) = v_{ℓ_i} and ρ(d_i) = v_{r_i}. In other words, birth and death values in the persistence diagram Dgm_f can be mapped back to unique vertices in the vertex set of K.
This gives us a map ξ : R^m → 2^{V×V} that maps any parameter ω ∈ R^m to a collection of pairs ξ(ω) := ρ(Dgm_{f_ω}) ⊆ V × V. Assume that as the parameter ω ∈ R^m changes, the function f_ω changes continuously (w.r.t. the L∞ norm on the function space). It then follows that its persistence diagram Dgm_{f_ω} also changes continuously due to the stability of persistence [102]. The image of Dgm_{f_ω} under ρ_ω also changes, although not necessarily continuously. Nevertheless, for a PL function f_ω, this image set stays fixed (constant) within a small neighborhood of ω if f_ω is “nice”. More specifically:

Proposition 13.9. Suppose f_ω : |K| → R is a PL-function with distinct values on all vertices V of K, and K is a finite simplicial complex. Then there exists a neighborhood of ω in the parameter space such that ξ remains constant within this neighborhood; that is, the image set ξ(ω) = ρ_ω(Dgm_{f_ω}) remains the same for all parameters within this neighborhood.

Recall that bi = fω (ρω (bi )) = fω (v`i ). It follows that, if conditions on fω in Proposition 13.9
holds, then within a sufficiently small neighborhood of ω, even though bi moves continuously, the
identify of v`i remains the same and bi = fω (v`i ) as ω varies within this neighborhood. Hence we
have
∂bi ∂ fω (ρω (bi )) ∂ fω (v`i ) ∂ fω
= = = (v`i ).
ω ω ω ω
∂ fω
The derivative ∂di
∂ω can be computed in an analogous manner by ∂ω (vri ). This in turn leads to the
∂T
computation of the derivative ∂ω for the persistence-based topological function T(ω).
⁶Note that, while formulated differently, this map is the same as the one used in [257].
13.3 Statistical treatment of topological summaries
This book has focused on the mathematical structures and computational aspects of various topological objects useful for topological data analysis. Topological methods help us map an input object to its topological summaries, and thus it is natural to compute statistics or perform statistical analysis over the topological summaries of a collection of objects. In this last section of the book, we briefly mention some developments regarding stochastic and statistical aspects of topological summaries. We note that, while the main content of this book does not focus on them, these are important topics for the development of topological data analysis, e.g., leading to more rigorous quantification of uncertainty, noise, consistency, and so on.
Performing statistics on the space of persistence diagrams. One key objective in data analysis is to model and quantify variations in data, such as computing the mean or variance of a collection of data. Given the power of persistent homology in mapping a complex input object to its persistence diagram summary, it is natural to ask whether we can compute means / variances in the space of persistence diagrams. This question was first studied in [230], and to answer it, one needs to study the properties of the space of persistence diagrams equipped with certain metrics.
To state the results, we first need to refine the definition of the Wasserstein distance of persistence diagrams (Definition 3.10) to allow different norms for measuring the distance between two points in the persistence diagram. The definition below assumes that we take the general view where a persistence diagram includes infinitely many copies of the diagonal.
Definition 13.8. Let P and Q be two persistence diagrams. The (p, q)-Wasserstein distance between these two diagrams is:

d_{W,q}^p(P, Q) := inf_{Π:P→Q} ( Σ_{x∈P} ‖x − Π(x)‖_p^q )^{1/q},     (13.9)

where Π ranges over all bijections from P to Q.
Note that the q-Wasserstein distance introduced in Definition 3.10 is simply d_{W,q}^∞ under this definition.

Now, let D_∅ denote the trivial persistence diagram, which contains only infinitely many copies of the diagonal.
Definition 13.9. Given p, q, the space of persistence diagrams D_q^p consists of all persistence diagrams within finite distance to the trivial persistence diagram D_∅; that is,

D_q^p := { P | d_{W,q}^p(P, D_∅) < ∞ }.

In what follows, for simplicity, we abuse the notation slightly and let D_q^p denote the metric space (D_q^p, d_{W,q}^p) equipped with d_{W,q}^p. It is shown in [230] that D_q^∞ is a so-called Polish (i.e., complete and separable) space, on which probability measures can be defined. It is later shown that more can be said about the space D_2^2 (Theorem 2.5, [291]), which is a non-negatively curved Alexandrov space (i.e., a geodesic space with curvature bounded from below by zero).
Furthermore, in both cases, the concepts of “mean” and “variance” can be introduced using the notion of the Fréchet function. Specifically, in what follows, we use D to denote either D_q^∞ or D_2^2, with metric d_D being the corresponding metric d_{W,q}^∞ or d_{W,2}^2. We will consider probability measures defined on (D, B(D)), where B(D) is the Borel σ-algebra on D.
Definition 13.10. Given a probability distribution ρ on (D, B(D)), its Fréchet function F_ρ : D → R is defined as, for any X ∈ D,

F_ρ(X) := ∫_D d_D²(X, Y) dρ(Y).     (13.10)

The Fréchet variance of ρ is defined as the quantity

Var_ρ := inf_{X∈D} F_ρ(X),

while the set at which this variance is attained, i.e.,

E(ρ) = {X | F_ρ(X) = Var_ρ},

is called the Fréchet expectation, or alternatively, the Fréchet mean set of ρ.
Often in the literature, one uses the Fréchet mean to refer to an element of the Fréchet mean set. Intuitively, the Fréchet function is a generalization of the arithmetic mean in the sense that its minimizer minimizes the sum of squared distances to all points in the distribution. If the input is a collection of persistence diagrams Ω = {D_1, D_2, . . . , D_m}, then we can talk about the mean of this collection as the mean of the discrete measure ρ_Ω = (1/m) Σ_{i=1}^m δ_{D_i} induced by them, where δ_X is the Dirac measure centered at X ∈ D.
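For a finite collection of diagrams, the Fréchet function of the induced discrete measure is a plain average of squared distances; the sketch below assumes some metric `dist` on diagrams (e.g., an implementation of d_{W,2}^2) is supplied:

    def frechet_function(X, diagrams, dist):
        """Frechet function F_rho(X) of Definition 13.10 for the discrete
        measure rho_Omega = (1/m) sum_i delta_{D_i}; dist is any metric
        on persistence diagrams (an assumption of this sketch)."""
        return sum(dist(X, D) ** 2 for D in diagrams) / len(diagrams)

    # An element of the Frechet mean set minimizes F_rho; e.g., restricted
    # to candidates from the collection itself:
    # mean = min(diagrams, key=lambda X: frechet_function(X, diagrams, dist))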
In general, it is not clear whether a Fréchet mean even exists. However, for the space D as defined above, it is shown in [230, 291] that the Fréchet mean set is not empty under mild conditions on the distribution.

Theorem 13.10. Let ρ be a probability measure on (D, B(D)) with a finite second moment, that is, F_ρ(X) < ∞ for any X ∈ D. If ρ has compact support, then E(ρ) ≠ ∅.
In the case where D = D_2^2, leveraging the properties of D_2^2, Turner et al. developed an iterative algorithm to compute a local minimum of the Fréchet function [291]. The computational question for the Fréchet mean, however, remains open. We also note that in general the Fréchet mean is not unique. This becomes undesirable, for example, when tracking the mean of a set of varying persistence diagrams. To address this issue, a modified concept of probabilistic Fréchet mean is proposed in [239], which intuitively is a probability measure on D_2^2, and the authors show how to use this to build useful statistics on the vineyard (the time-varying persistence diagrams).
Other statistical analysis of topological summaries. Another line of statistical treatment of topological objects concerns their estimation when the input data is assumed to be sampled from a target space / distribution. For example, a common setting is that we observe a sample x1, . . . , xn ∼ P, drawn i.i.d. from a distribution P supported on a compact set X ⊂ R^d. One can then ask how to relate the topological summaries estimated from these samples to those of X (when appropriate), whether such estimates converge, or how to compute confidence intervals (sets), and so on.
We will not review these results here, as that would require careful description of the models used; we refer the readers to the nice survey by Wasserman [297], which discusses statistical estimation for various topological objects, including (hierarchical) clustering (related to merge trees), persistence diagrams, and ridge estimation. We will just mention that, in the context of persistence diagrams and their variants (e.g., persistence landscapes), there has been work analyzing their concentration and convergence behavior as the number of samples n tends to infinity in different settings [85, 84], and work obtaining confidence sets for them via bootstrapping or subsampling [34, 83, 160].
The inference and estimation of topological information were discussed earlier in Chapter 6; there, however, we assumed that the samples are deterministic. Also note that as the distribution P (from which the input points are sampled) deviates further from the true distribution we are interested in, the standard construction based on the Rips or Čech complexes to approximate the sublevel sets of the distance field (recall Definition 6.7 in Section 6.3.1) is no longer appropriate. Instead, one now needs a more robust notion of "distance field". To this end, an elegant concept called distance to measure (DTM) has been proposed [79], which has many nice properties and leads to more robust topology inference; see e.g. [82]. An alternative is to use the kernel distance as proposed in [256].
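For intuition, the empirical DTM has a particularly simple form: given n sample points and a mass parameter $m_0 \in (0, 1]$, the DTM at a query point x is the root-mean-squared distance from x to its $k = \lceil m_0 n \rceil$ nearest sample points. The short Python sketch below computes this quantity; the function name `empirical_dtm`, the default mass parameter, and the kd-tree choice are our assumptions for illustration, not a reference implementation from [79].

```python
# Sketch of the empirical distance-to-measure (DTM): root-mean-squared
# distance to the k nearest sample points, with k set by the mass
# parameter m0. This is only the empirical formula, not a full pipeline.
import numpy as np
from scipy.spatial import cKDTree


def empirical_dtm(samples, queries, m0=0.1):
    """DTM of the uniform measure on `samples`, evaluated at `queries`."""
    samples = np.asarray(samples, dtype=float)
    queries = np.asarray(queries, dtype=float)
    n = len(samples)
    k = max(1, int(np.ceil(m0 * n)))
    tree = cKDTree(samples)
    dists, _ = tree.query(queries, k=k)        # k nearest-neighbor distances
    dists = dists.reshape(len(queries), k)     # handles the k = 1 case too
    return np.sqrt((dists ** 2).mean(axis=1))  # one DTM value per query
```

Sublevel sets of this function can then play the role of the sublevel sets of the distance field when building the complexes of Section 6.3.1, making the resulting filtration far less sensitive to a few outlier samples.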
Finally, we note that there has also been a line of work studying topological properties (e.g., Betti numbers, or the largest persistence in the persistence diagram) of random simplicial complexes [35, 36, 37, 202, 203, 204]. We will not describe this interesting line of work in this book.

13.4 Bibliographical notes


The persistence landscape proposed by Bubenik [51] is perhaps the first construction to map persistence diagrams into a function space, often taken to be a Banach or Hilbert space, so as to facilitate statistical analysis (e.g., computing means) of persistence-based summaries. The persistence scale space (PSS) kernel, the persistence images, the persistence weighted Gaussian kernel, the sliced Wasserstein kernel, and the persistence Fisher kernel described in this chapter are introduced in [263], [4], [215], [68], and [218], respectively. Note that this chapter does not aim to provide a complete survey of vectorization frameworks for persistence diagrams; in addition to those presented in Section 13.1, there are other similar approaches such as [1, 143, 205, 250]. As mentioned in Section 13.3, there has also been work exploring how to perform statistical analysis in the space of persistence diagrams equipped with the standard bottleneck or Wasserstein distance, e.g. [34, 52, 86, 160, 230, 239, 291]; see Section 13.3 for more details.
Recently, there has been a line of work on learning the representation of persistence diagrams in an end-to-end manner using labelled data. The work by Hofer et al. [194] was one of the first along this direction; the authors developed a neural network layer for this purpose (later refined in [195]). Subsequently, an alternative layer (based on the Deep Sets architecture [303]) was proposed and developed in [67], which provides a rather general and versatile framework for learning vector representations of persistence diagrams. For example, the representation learning can be based on several existing vectorizations/kernelizations of persistence diagrams, including persistence landscapes, persistence surfaces, and the sliced Wasserstein kernel. In a related
work [305], learning the best representation based on persistence images is formulated as an optimization problem and solved directly via (stochastic) gradient descent. The resulting learned vector representations can simply be combined with a kernel SVM for different tasks such as graph classification.
Differentiating a function involving persistence has been independently proposed and studied in several works from different communities: first in [165] for continuation of point clouds, then in [257] for continuous shape matching, and in [96] for topological regularization of classifiers. The gradient computation for a persistence-based topological function presented in Section 13.2.2 mostly follows the discussion in [257]. The topological optimization framework is rather general and powerful, and several recent works apply such ideas to different stages of machine learning applications. For example, [101, 199] used topological loss terms to help enforce a topological prior on individual input objects for deep-learning based image segmentation. The work of [101] assumed certain prior knowledge of the topology of the segmented images. Instead of assuming this prior knowledge, [199] proposed to learn to segment with correct topology by using a topological loss function that helps ensure the topology of segmented images matches that of the ground truth for labelled images. The potential applications of these ideas have been further broadened in [49], where the authors introduced and developed a topological layer both for function-induced persistence and for persistence induced by distance-based filtrations. Such a persistence-layer idea is further developed in [211] using the persistence landscape representation of general filtrations. Instead of placing a topological constraint on individual input data points, one can also consider imposing it on the latent space behind the data; for example, [193, 236] applied such ideas to auto-encoders.
There has also been some recent work on using (persistent) homology to help characterize the complexity of a neural network (or of its training process). For example, [264] proposed the so-called neural persistence to characterize the structural complexity of neural networks. [28, 180] proposed to measure the capacity of an architecture by the topological complexity of the classifiers it can produce. [168] proposed to study the topology of activation networks (neural networks with node activations for specific inputs), and used such patterns to help understand adversarial examples. [243] studied how the topology of the transformed data space changes across the layers of a deep neural network. While exploration in this direction is overall still at an initial stage, these are exciting ideas, and there is much potential in using topological tools to understand neural networks.
Bibliography

[1] Aaron Adcock, Erik Carlsson, and Gunnar Carlsson. The ring of algebraic functions on
persistence barcodes. Homology, Homotopy and Applications, 18:381–402, 2016.

[2] Michal Adamaszek and Henry Adams. The Vietoris-Rips complexes of a circle. Pacific J.
Math., 290:1–40, 2017.

[3] Michal Adamaszek, Henry Adams, Ellen Gasparovic, Maria Gommel, Emilie Purvine,
Radmila Sazdanovic, Bei Wang, Yusu Wang, and Lori Ziegelmeier. On homotopy
types of Vietoris-Rips complexes of metric gluings. J. Appl. Comput. Topology, 2020.
https://doi.org/10.1007/s41468-020-00054-y.

[4] Henry Adams, Tegan Emerson, Michael Kirby, Rachel Neville, Chris Peterson, Patrick
Shipman, Sofya Chepushtanova, Eric Hanson, Francis Motta, and Lori Ziegelmeier. Persis-
tence images: a stable vector representation of persistent homology. J. Machine Learning
Research, 18:218–252, 2017.

[5] Pankaj K. Agarwal, Herbert Edelsbrunner, John Harer, and Yusu Wang. Extreme elevation
on a 2-manifold. Discrete Comput. Geom., 36(4):553–572, 2006.

[6] Pankaj K. Agarwal, Kyle Fox, Abhinandan Nath, Anastasios Sidiropoulos, and Yusu Wang.
Computing the Gromov-Hausdorff distance for metric trees. ACM Trans. Algorithms,
14(2):24:1–24:20, 2018.

[7] Paul Aleksandroff. Über den allgemeinen dimensionsbegriff und seine beziehungen zur
elementaren geometrischen anschauung. Mathematische Annalen, 98:617–635, 1928.

[8] Nina Amenta, Marshall W. Bern, and David Eppstein. The crust and the beta-skeleton:
Combinatorial curve reconstruction. Graphical Models Image Processing, 60(2):125–135,
1998.

[9] Hideto Asashiba, Emerson G. Escolar, Yasuaki Hiraoka, and Hiroshi Takeuchi. Matrix
method for persistence modules on commutative ladders of finite type. J. Industrial Applied
Math., 36(1):97–130, 2019.

[10] Michael Atiyah. On the Krull-Schmidt theorem with application to sheaves. Bulletin de la
Société Mathématique de France, 84:307–317, 1956.

[11] Dominique Attali, Herbert Edelsbrunner, and Yuriy Mileyko. Weak witnesses for Delaunay
triangulations of submanifolds. In Proc. ACM Sympos. Solid Physical Model., pages 143–
150, 2007.

[12] Dominique Attali, André Lieutier, and David Salinas. Efficient data structure for represent-
ing and simplifying simplicial complexes in high dimensions. In Proc. 27th Annu. Sympos.
Comput. Geom. (SoCG), pages 501–509, 2011.

[13] Maurice Auslander. Representation theory of Artin Algebras II. Communications in Alge-
bra, 1(4):269–310, 1974.

[14] Maurice Auslander and David Buchsbaum. Groups, rings, modules. Dover Publications,
2014.

[15] Aravindakshan Babu. Zigzag coarsenings, mapper stability and gene-network analyses,
2013. PhD thesis, Stanford University.

[16] Samik Banerjee, Lucas Magee, Dingkang Wang, Xu Li, Bingxing Huo, Jaikishan Jayakumar, Katherine Matho, Meng-Kuan Lin, Keerthi Ram, Mohanasankar Sivaprakasam,
Josh Huang, Yusu Wang, and Partha Mitra. Semantic segmentation of microscopic
neuroanatomical data by combining topological priors with encoder-decoder deep net-
works. Nature Machine Intelligence, 2:585–594, 2020. Also available on biorxiv at
2020.02.18.955237.

[17] Jonathan Ariel Barmak and Elias Gabriel Minian. Strong homotopy types, nerves and
collapses. Discret. Comput. Geom., 47(2):301–328, 2012.

[18] Saugata Basu and Negin Karisani. Efficient simplicial replacement of semi-algebraic sets
and applications. CoRR, arXiv:2009.13365, 2020.

[19] Ulrich Bauer. Ripser: efficient computation of Vietoris-Rips persistence barcodes. CoRR,
arXiv:1908.02518, 2019.

[20] Ulrich Bauer, Xiaoyin Ge, and Yusu Wang. Measuring distance bewteen Reeb graphs. In
Proc. 30th Annu. Sympos. Comput. Geom. (SoCG), pages 464–473, 2014.

[21] Ulrich Bauer, Claudia Landi, and Facundo Mémoli. The Reeb graph edit distance is uni-
versal. In Proc. 36th Internat. Sympos. Comput. Geom. (SoCG), pages 15:1–15:16, 2020.

[22] Ulrich Bauer, Carsten Lange, and Max Wardetzky. Optimal topological simplification of
discrete functions on surfaces. Discrete Comput. Geom., 47(2):347–377, 2012.

[23] Ulrich Bauer and Michael Lesnick. Induced matchings of barcodes and the algebraic sta-
bility of persistence. In Proc. 13th Annu. Sympos. Comput. Geom. (SoCG), pages 355–364,
2014.

[24] Ulrich Bauer, Elizabeth Munch, and Yusu Wang. Strong equivalence of the interleaving
and functional distortion metrics for Reeb graphs. In Proc. 31st Annu. Sympos. Comput.
Geom. (SoCG), pages 461–475, 2015.

[25] Christian Berg, Jens P. R. Christensen, and Paul Ressel. Harmonic analysis on semigroups:
Theory of positive definite and related functions. Springer, 1984.

[26] Marshall W. Bern, David Eppstein, Pankaj K. Agarwal, Nina Amenta, L. Paul Chew,
Tamal K. Dey, David P. Dobkin, Herbert Edelsbrunner, Cindy Grimm, Leonidas J. Guibas,
John Harer, Joel Hass, Andrew Hicks, Carroll K. Johnson, Gilad Lerman, David Letscher,
Paul E. Plassmann, Eric Sedgwick, Jack Snoeyink, Jeff Weeks, Chee-Keng Yap, and Denis
Zorin. Emerging challenges in computational topology. CoRR, arXiv:cs/9909001, 1999.

[27] Dimitris Bertsimas and John N. Tsitsiklis. Introduction to Linear Optimization. Athena
Scientific, Belmont, MA, 1997.

[28] Monica Bianchini and Franco Scarselli. On the complexity of neural network classifiers:
A comparison between shallow and deep architectures. IEEE Trans. Neural Networks
Learning Sys., 25(8):1553–1565, 2014.

[29] Silvia Biasotti, Andrea Cerri, Patrizio Frosini, and Daniela Giorgi. A new algorithm for
computing the 2-dimensional matching distance between size functions. Pattern Recogni-
tion Letters, 32(14):1735–1746, 2011.

[30] Silvia Biasotti, Bianca Falcidieno, and Michela Spagnuolo. Extended Reeb graphs for sur-
face understanding and description. In Proc. 9th Internat. Conf. Discrete Geom. Computer
Imagery, pages 185–197, 2000.

[31] Silvia Biasotti, Daniela Giorgi, Michela Spagnuolo, and Bianca Falcidieno. Reeb graphs
for shape analysis and applications. Theor. Comput. Sci., 392(1-3):5–22, 2008.

[32] Håvard Bjerkevik. Stability of higher-dimensional interval decomposable persistence modules. CoRR, arXiv:1609.02086, 2020.

[33] Håvard Bjerkevik, Magnus Botnan, and Michael Kerber. Computing the interleaving dis-
tance is NP-hard. Found. Comput. Math., 2019.

[34] Ander J. Blumberg, Itamar Gal, Michael A. Mandell, and Matthew Pancia. Robust statis-
tics, hypothesis testing, and confidence intervals for persistent homology on metric mea-
sure spaces. Found. Comput. Math., 14:745–789, 2014.

[35] Omer Bobrowski and Matthew Kahle. Topology of random geometric complexes: a survey.
J. Applied Comput. Topology, 1(3):331–364, 2018.

[36] Omer Bobrowski, Matthew Kahle, and Primoz Skraba. Maximally persistent cycles in
random geometric complexes. Ann. Appl. Probab., 27(4):2032–2060, 2017.

[37] Omer Bobrowski and Primoz Skraba. Homological percolation and the Euler characteris-
tic. Phys. Rev. E, 101:032304, 2020.

[38] Erik Boczko, William D. Kalies, and Konstantin Mischaikow. Polygonal approximation of
flows. Topology and its Applications, 154:2501–2520, 2007.

[39] Jean-Daniel Boissonnat, Frédéric Chazal, and Mariette Yvinec. Geometric and Topological
Inference. Cambride Texts in Applied Mathematics. Cambridge U. Press, 2018.

[40] Jean-Daniel Boissonnat, Leonidas J. Guibas, and Steve Y. Oudot. Manifold reconstruction
in arbitrary dimensions using witness complexes. In Proc. 23rd Annu. Sympos. Comput.
Geom. (SoCG), pages 194–203, 2007.

[41] Jean-Daniel Boissonnat and Siddharth Pritam. Edge collapse and persistence of flag com-
plexes. In 36th Internat. Sympos. Comput. Geom., (SoCG), volume 164 of LIPIcs, pages
19:1–19:15, 2020.

[42] Jean-Daniel Boissonnat, Siddharth Pritam, and Divyansh Pareek. Strong collapse for per-
sistence. In 26th Annu. European Sympos. Algorithms, ESA, volume 112 of LIPIcs, pages
67:1–67:13, 2018.

[43] Glencora Borradaile, Erin Wolf Chambers, Kyle Fox, and Amir Nayyeri. Minimum cycle
and homology bases of surface-embedded graphs. J. Comput. Geom. (JoCG), 8(2):58–79,
2017.

[44] Glencora Borradaile, William Maxwell, and Amir Nayyeri. Minimum bounded chains
and minimum homologous chains in embedded simplicial complexes. In 36th Internat.
Sympos. Comput. Geom., (SoCG), volume 164 of LIPIcs, pages 21:1–21:15, 2020.

[45] Karol Borsuk. On the imbedding of systems of compacta in simplicial complexes. Funda-
menta Mathematicae, 35:217–234, 1948.

[46] Magnus Botnan, Justin Curry, and Elizabeth Munch. The poset interleaving distance, 2016.

[47] Magnus Botnan and Michael Lesnick. Algebraic stability of zigzag persistence modules.
Algebraic & Geometric Topology, 18:3133–3204, 2018.

[48] Stephane Bressan, Jingyan Li, Shiquan Ren, and Jie Wu. The embedded homology of
hypergraphs and applications. Asian J. Math., 23(3):479–500, 2019.

[49] Rickard Brüel-Gabrielsson, Bradley J. Nelson, Anjan Dwaraknath, Primoz Skraba, Leonidas J. Guibas, and Gunnar Carlsson. A topology layer for machine learning. CoRR, arXiv:1905.12200, 2019. Code available at https://github.com/bruel-gabrielsson/TopologyLayer.

[50] Winfried Bruns and H Jürgen Herzog. Cohen-Macaulay Rings. Cambridge University
Press, 1998.

[51] Peter Bubenik. Statistical topological data analysis using persistence landscapes. J. Ma-
chine Learning Research, 16(1):77–102, 2015.

[52] Peter Bubenik and Peter T. Kim. A statistical approach to persistent homology. Homology,
Homotopy, and Applications, 9(2):337–362, 2007.

[53] Peter Bubenik and Jonathan Scott. Categorification of persistent homology. Discrete Com-
put. Geom., 51(3):600–627, 2014.

[54] Peter Bubenik, Jonathan A. Scott, and Donald Stanley. Wasserstein distance for generalized
persistence modules and abelian categories. arXiv: Rings and Algebras, arxiv:1809.09654,
2018.
[55] Mickaël Buchet, Frédéric Chazal, Tamal K. Dey, Fengtao Fan, Steve Y. Oudot, and Yusu
Wang. Topological analysis of scalar fields with outliers. In Proc. 31st Annu. Sympos.
Comput. Geom. (SoCG), pages 827–841, 2015.
[56] Mickaël Buchet, Frédéric Chazal, Steve Y. Oudot, and Donald Sheehy. Efficient and ro-
bust persistent homology for measures. In Proc. 26th Annu. ACM-SIAM Sympos. Discrete
Algorithms (SODA), pages 168–180, 2015.
[57] James R. Bunch and John E. Hopcroft. Triangular factorization and inversion by fast matrix
multiplication. Mathematics of Computation, 28(125):231–236, 1974.
[58] Dmitri Burago, Yuri Burago, and Sergei Ivanov. A Course in Metric Geometry, volume 33 of AMS Graduate Studies in Math. American Mathematical Society, 2001.
[59] Dan Burghelea and Tamal K. Dey. Topological persistence for circle-valued maps. Discrete
Comput. Geom., 50(1):69–98, 2013.
[60] Oleksiy Busaryev, Sergio Cabello, Chao Chen, Tamal K. Dey, and Yusu Wang. Annotating
simplices with a homology basis and its applications. In Algorithm Theory - SWAT 2012 -
13th Scandinavian Sympos. Workshops, pages 189–200, 2012.
[61] Alexander Buslaev, Selim S Seferbekov, Vladimir Iglovikov, and Alexey Shvets. Fully
convolutional network for automatic road extraction from satellite imagery. In CVPR Work-
shops, pages 207–210, 2018.
[62] Gunnar Carlsson and Vin de Silva. Zigzag persistence. Found. Comput. Math., 10(4):367–
405, Aug 2010.
[63] Gunnar Carlsson, Vin de Silva, and Dmitriy Morozov. Zigzag persistent homology and
real-valued functions. In Proc. 26th Annu. Sympos. Comput. Geom. (SoCG), pages 247–
256, 2009.
[64] Gunnar Carlsson, Gurjeet Singh, and Afra Zomorodian. Computing multidimensional per-
sistence. In Proc. Internat. Sympos. Algorithms Computation (ISAAC), pages 730–739.
Springer, 2009.
[65] Gunnar Carlsson and Afra Zomorodian. The theory of multidimensional persistence. Dis-
crete Comput. Geom., 42(1):71–93, 2009.
[66] Hamish Carr, Jack Snoeyink, and Ulrike Axen. Computing contour trees in all dimensions.
Comput. Geom.: Theory and Applications, 24(2):75–94, 2003.
[67] Mathieu Carrière, Frédéric Chazal, Yuichi Ike, Théo Lacombe, Martin Royer, and Yuhei
Umeda. Perslay: a neural network layer for persistence diagrams and new graph topologi-
cal signatures. In Proc. 23rd Internat. Conf. Artificial Intelligence Stat. (AISTATS), volume
108, pages 2786–2796, 2020.

[68] Mathieu Carrière, Marco Cuturi, and Steve Y. Oudot. Sliced Wasserstein kernel for persis-
tence diagrams. In Proc. Internat. Conf. Machine Learning, pages 664–673, 2017.

[69] Mathieu Carrière and Steve Oudot. Structure and stability of the one-dimensional mapper.
Found. Comput. Math., 18(6):1333–1396, 2018.

[70] Nicholas J. Cavanna, Mahmoodreza Jahanseir, and Donald R. Sheehy. A geometric per-
spective on sparse filtrations. In Proc. Canadian Conf. Comput. Geom. (CCCG), 2015.

[71] Andrea Cerri, Barbara Di Fabio, Massimo Ferri, Patrizio Frosini, and Claudia Landi. Betti
numbers in multidimensional persistent homology are stable functions. Mathematical
Methods in the Applied Sciences, 36(12):1543–1557, 2013.

[72] Andrea Cerri and Patrizio Frosini. A new approximation algorithm for the matching dis-
tance in multidimensional persistence. J. Comput. Math., pages 291–309, 2020.

[73] Erin W. Chambers, Jeff Erickson, and Amir Nayyeri. Minimum cuts and shortest homolo-
gous cycles. In Proc. 25th Annu. Sympos. Comput. Geom. (SoCG), pages 377–385, 2009.

[74] Erin W. Chambers, Jeff Erickson, and Amir Nayyeri. Homology flows, cohomology cuts.
SIAM J. Comput., 41(6):1605–1634, 2012.

[75] Manoj K. Chari. On discrete Morse functions and combinatorial decompositions. Discrete
Math., 217(1-3):101–113, 2000.

[76] Isaac Chavel. Riemannian Geometry: A Modern Introduction, 2nd Ed. Cambridge univer-
sity press, 2006.

[77] Frédéric Chazal, David Cohen-Steiner, Marc Glisse, Leonidas J. Guibas, and Steve Oudot.
Proximity of persistence modules and their diagrams. In Proc. 25th Annu. Sympos. Comput.
Geom. (SoCG), pages 237–246, 2009.

[78] Frédéric Chazal, David Cohen-Steiner, Leonidas J. Guibas, Facundo Mémoli, and Steve Y.
Oudot. Gromov-Hausdorff stable signatures for shapes using persistence. Comput. Graph-
ics Forum, 28(5):1393–1403, 2009.

[79] Frédéric Chazal, David Cohen-Steiner, and Quentin Mérigot. Geometric inference for
probability distributions. Found. Comput. Math., 11(6):733–751, 2011.

[80] Frédéric Chazal, Vin de Silva, Marc Glisse, and Steve Oudot. The structure and stability
of persistence modules. CoRR, arXiv:1207.3674, 2012.

[81] Frédéric Chazal, Vin de Silva, and Steve Oudot. Persistence stability for geometric com-
plexes. Geometriae Dedicata, 173(1):193–214, Dec 2014.

[82] Frédéric Chazal, Brittany Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, and Larry
Wasserman. Robust topological inference: Distance to a measure and kernel distance. J.
Machine Learning Research, 18(159):1–40, 2018.

[83] Frédéric Chazal, Brittany Fasy, Fabrizio Lecci, Alessandro Rinaldo, Aarti Singh, and Larry
Wasserman. On the bootstrap for persistence diagrams and landscapes. Modeling Analysis
Info. Sys., 20(6):96–105, 2013. Also available at arXiv:1311.0376.

[84] Frédéric Chazal, Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, and Larry A.
Wasserman. Stochastic convergence of persistence landscapes and silhouettes. J. Comput.
Geom. (JoCG), 6(2):140–161, 2015.

[85] Frédéric Chazal, Marc Glisse, Catherine Labruère, and Bertrand Michel. Convergence
rates for persistence diagram estimation in topological data analysis. J. Machine Learning
Research, 16(110):3603–3635, 2015.

[86] Frédéric Chazal, Marc Glisse, Catherine Labruère, and Bertrand Michel. Convergence
rates for persistence diagram estimation in Topological Data Analysis. J. Machine Learn-
ing Research, 16:3603–3635, 2015.

[87] Frédéric Chazal, Leonidas J. Guibas, Steve Oudot, and Primoz Skraba. Analysis of scalar
fields over point cloud data. Discrete Comput. Geom., 46(4):743–775, 2011.

[88] Frédéric Chazal, Ruqi Huang, and Jian Sun. Gromov-Hausdorff approximation of filamen-
tary structures using Reeb-type graphs. Discrete Comput. Geom., 53:621–649, 2015.

[89] Frédéric Chazal and André Lieutier. Weak feature size and persistent homology: com-
puting homology of solids in Rn from noisy data samples. In Proc. 21st Annu. Sympos.
Comput. Geom. (SoCG), pages 255–262, 2005.

[90] Frédéric Chazal and André Lieutier. Stability and computation of topological invariants of
solids in Rn . Discrete Comput. Geom., 37(4):601–617, 2007.

[91] Frédéric Chazal and Steve Y. Oudot. Towards persistence-based reconstruction in Eu-
clidean spaces. In Proc. 24th Annu. Sympos. Comput. Geom. (SoCG), pages 232–241,
2008.

[92] Bernard Chazelle. An optimal convex hull algorithm in any fixed dimension. Discrete
Comput. Geom., 10:377–409, 1993.

[93] Chao Chen and Daniel Freedman. Measuring and computing natural generators for homol-
ogy groups. Comput. Geom.: Theory & Applications, 43 (2):169–181, 2010.

[94] Chao Chen and Daniel Freedman. Hardness results for homology localization. Discrete
Comput. Geometry, 45 (3):425–448, 2011.

[95] Chao Chen and Michael Kerber. An output-sensitive algorithm for persistent homology.
Comput. Geom.: Theory and Applications, 46(4):435–447, 2013.

[96] Chao Chen, Xiuyan Ni, Qinxun Bai, and Yusu Wang. A topological regularizer for clas-
sifiers via persistent homology. In Proc. 22nd Internat. Conf. Artificial Intelligence Stat.
(AISTATS), pages 2573–2582, 2019.

[97] Siu-Wing Cheng, Tamal K. Dey, and Edgar A. Ramos. Manifold reconstruction from point
samples. In Proc. 16th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages
1018–1027, 2005.

[98] Siu-Wing Cheng, Tamal K. Dey, and Jonathan R. Shewchuk. Delaunay Mesh Generation.
CRC Press, 2012.

[99] Samir Chowdhury and Facundo Mémoli. Persistent homology of asymmetric networks:
An approach based on Dowker filtrations. CoRR, arXiv:1608.05432, 2018.

[100] Samir Chowdhury and Facundo Mémoli. Persistent path homology of directed networks.
In Proc. 29th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 1152–1169.
SIAM, 2018.

[101] James R. Clough, Ilkay Oksuz, Nicholas Byrne, Veronika A. Zimmer, Julia A. Schnabel,
and Andrew P. King. A topological loss function for deep-learning based image segmen-
tation using persistent homology. CoRR, 2019.

[102] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Stability of persistence dia-
grams. Discrete Comput. Geom., 37(1):103–120, Jan 2007.

[103] David Cohen-Steiner, Herbert Edelsbrunner, and John Harer. Extending persistence using
Poincaré and Lefschetz duality. Found. Comput. Math., 9(1):79–103, 2009.

[104] David Cohen-Steiner, Herbert Edelsbrunner, John Harer, and Yuriy Mileyko. Lipschitz
functions have $L_p$-stable persistence. Found. Comput. Math., 10(2):127–139, 2010.

[105] David Cohen-Steiner, Herbert Edelsbrunner, John Harer, and Dmitriy Morozov. Persistent
homology for kernels, images, and cokernels. In Proc. 20th Annu. ACM-SIAM Sympos.
Discrete Algorithms (SODA), pages 1011–1020, 2009.

[106] David Cohen-Steiner, Herbert Edelsbrunner, and Dmitriy Morozov. Vines and vineyards by
updating persistence in linear time. In Proc. 22nd Annu. Sympos. Comput. Geom. (SoCG),
pages 119–126, 2006.

[107] Kree Cole-McLaughlin, Herbert Edelsbrunner, John Harer, Vijay Natarajan, and Valerio
Pascucci. Loops in Reeb graphs of 2-manifolds. Discrete Comput. Geom., 32(2):231–244,
2004.

[108] René Corbet and Michael Kerber. The representation theorem of persistence revisited and
generalized. J. Appl. Comput. Topology, 2(1):1–31, Oct 2018.

[109] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduc-
tion to Algorithms, Third Edition. The MIT Press, 3rd edition, 2009.

[110] David A Cox, John Little, and Donal O’shea. Using Algebraic Geometry, volume 185.
Springer Science & Business Media, 2006.

[111] William Crawley-Boevey. Decomposition of pointwise finite-dimensional persistence modules. J. Algebra and Its Applications, 14(05):1550066, 2015.

[112] Justin Curry. Sheaves, cosheaves and applications. CoRR, arXiv:1303.3255, 2013.

[113] Vin de Silva. A weak definition of Delaunay triangulation. CoRR, arXiv:cs/0310031, 2003.

[114] Vin de Silva and Gunnar Carlsson. Topological estimation using witness complexes. In
Proc. Sympos. Point-Based Graphics., 2004.

[115] Vin de Silva, Dmitriy Morozov, and Mikael Vejdemo-Johansson. Dualities in persistent
(co)homology. Inverse Problems, 27:124003, 2011.

[116] Vin de Silva, Elizabeth Munch, and Amit Patel. Categorified Reeb graphs. Discrete Com-
put. Geom., 55(4):854–906, 2016.

[117] Cecil Jose A. Delfinado and Herbert Edelsbrunner. An incremental algorithm for Betti
numbers of simplicial complexes on the 3-sphere. Comput. Aided Geom. Design,
12(7):771–784, 1995.

[118] Olaf Delgado-Friedrichs, Vanessa Robins, and Adrian P. Sheppard. Skeletonization and
partitioning of digital images using discrete Morse theory. IEEE Trans. Pattern Anal. Ma-
chine Intelligence, 37(3):654–666, 2015.

[119] Tamal K. Dey. Curve and Surface Reconstruction: Algorithms with Mathematical Analy-
sis. Cambridge Monographs Applied Comput. Math. Cambridge University Press, 2006.

[120] Tamal K. Dey, Herbert Edelsbrunner, and Sumanta Guha. Computational topology. Ad-
vances in Discrete Comput. Geom., 1999.

[121] Tamal K. Dey, Herbert Edelsbrunner, Sumanta Guha, and Dmitry V. Nekhayev. Topology
preserving edge contraction. Publications de l’ Institut Mathematique (Beograd), 60:23–
45, 1999.

[122] Tamal K. Dey, Fengtao Fan, and Yusu Wang. Computing topological persistence for sim-
plicial maps. CoRR, arXiv:1208.5018, 2012.

[123] Tamal K. Dey, Fengtao Fan, and Yusu Wang. An efficient computation of handle and tunnel
loops via reeb graphs. ACM Trans. Graph., 32(4):32, 2013.

[124] Tamal K. Dey, Fengtao Fan, and Yusu Wang. Graph induced complex for point data. In
Proc. 29th. Annu. Sympos. Comput. Geom. (SoCG), pages 107–116, 2013.

[125] Tamal K. Dey, Fengtao Fan, and Yusu Wang. Computing topological persistence for simpli-
cial maps. In Proc. 13th Annu. Sympos. Comput. Geom. (SoCG), pages 345:345–345:354,
2014.

[126] Tamal K. Dey, Anil N. Hirani, and Bala Krishnamoorthy. Optimal homologous cycles,
total unimodularity, and linear programming. SIAM J. Comput., 40(4):1026–1044, 2011.

[127] Tamal K. Dey and Tao Hou. Computing zigzag persistence on graphs in near-linear time.
In Proc. 37th Internat. Sympos. Comput. Geom. (SoCG), 2021.

[128] Tamal K. Dey, Tao Hou, and Sayan Mandal. Persistent 1-cycles: Definition, computation,
and its application. In Comput. Topology Image Context - 7th Internat. Workshop, pages
123–136, 2019.

[129] Tamal K. Dey, Tao Hou, and Sayan Mandal. Computing minimal persistent cycles: Poly-
nomial and hard cases. In Proc. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages
2587–2606. SIAM, 2020.

[130] Tamal K. Dey, Tianqi Li, and Yusu Wang. Efficient algorithms for computing a minimal
homology basis. In LATIN 2018: Theoretical Informatics - 13th Latin American Sympo-
sium, pages 376–398, 2018.

[131] Tamal K. Dey, Tianqi Li, and Yusu Wang. An efficient algorithm for 1-dimensional (per-
sistent) path homology. In Proc. 36th. Internat. Sympos. Comput. Geom. (SoCG), pages
36:1–36:15, 2020.

[132] Tamal K. Dey, Facundo Mémoli, and Yusu Wang. Multiscale mapper: Topological sum-
marization via codomain covers. In Proc. 27th Annu. ACM-SIAM Symposium on Discrete
Algorithms (SODA), pages 997–1013, 2016.

[133] Tamal K. Dey, Facundo Mémoli, and Yusu Wang. Topological analysis of nerves, Reeb
spaces, mappers, and multiscale mappers. In Proc. 33rd Internat. Sympos. Comput. Geom.
(SOCG), pages 36:1–36:16, 2017.

[134] Tamal K. Dey, Dayu Shi, and Yusu Wang. Comparing graphs via persistence distortion. In
Proc. 31st Annu. Sympos. Comput. Geom. (SoCG), pages 491–506, 2015.

[135] Tamal K. Dey, Dayu Shi, and Yusu Wang. SimBa: An efficient tool for approximating
Rips-filtration persistence via simplicial batch-collapse. In Proc. 24th Annu. European
Sympos. Algorithms (ESA 2016), volume 57 of LIPIcs, pages 35:1–35:16, 2016.

[136] Tamal K. Dey, Jian Sun, and Yusu Wang. Approximating loops in a shortest homology
basis from point data. In Proc. 26th Annu. Sympos. Comput. Geom. (SoCG), pages 166–
175, 2010.

[137] Tamal K. Dey, Jiayuan Wang, and Yusu Wang. Improved road network reconstruction
using discrete Morse theory. In Proc. 25th ACM SIGSPATIAL Internat. Conf. Advances in
GIS, pages 58:1–58:4, 2017.

[138] Tamal K. Dey, Jiayuan Wang, and Yusu Wang. Graph reconstruction by discrete Morse
theory. In Proc. 34th Internat. Sympos. Comput. Geom. (SoCG), pages 31:1–31:15, 2018.

[139] Tamal K. Dey, Jiayuan Wang, and Yusu Wang. Road network reconstruction from satel-
lite images with machine learning supported by topological methods. In Proc. 27th ACM
SIGSPATIAL Internat. Conf. Advances in GIS, pages 520–523, 2019.

[140] Tamal K. Dey and Yusu Wang. Reeb graphs: Approximation and persistence. Discrete
Comput. Geom., 49(1):46–73, 2013.

[141] Tamal K. Dey and Cheng Xin. Computing bottleneck distance for 2-D interval decompos-
able modules. In Proc. 34th Internat. Sympos. Comput. Geom. (SoCG), pages 32:1–32:15,
2018.
[142] Tamal K. Dey and Cheng Xin. Generalized persistence algorithm for decomposing multi-
parameter persistence modules. CoRR, arXiv:1904.03766, 2019.
[143] Barbara Di Fabio and Massimo Ferri. Comparing persistence diagrams through complex
vectors. In Vittorio Murino and Enrico Puppo, editors, Image Analysis and Processing —
ICIAP 2015, pages 294–305, 2015.
[144] Pawel Dlotko, Kathryn Hess, Ran Levi, Max Nolte, Michael Reimann, Martina Sco-
lamiero, Katharine Turner, Eilif Muller, and Henry Markram. Topological analysis of the
connectome of digital reconstructions of neural microcircuits. CoRR, arXiv:1601.01580,
2016.
[145] Harish Doraiswamy and Vijay Natarajan. Efficient output-sensitive construction of Reeb
graphs. In Proc. 19th Internat. Sympos. Algorithms Computation, pages 556–567, 2008.
[146] Harish Doraiswamy and Vijay Natarajan. Efficient algorithms for computing Reeb graphs.
Comput. Geom.: Theory and Applications, 42:606–616, 2009.
[147] Clifford H. Dowker. Homology groups of relations. Annals of Maths, 56:84–95, 1952.
[148] Herbert Edelsbrunner. Geometry and Topology for Mesh Generation, volume 7 of Cam-
bridge Monographs Applied Comput. Math. Cambridge University Press, 2001.
[149] Herbert Edelsbrunner and John Harer. Computational Topology: An Introduction. Applied
Mathematics. American Mathematical Society, 2010.
[150] Herbert Edelsbrunner, John Harer, and Amit K. Patel. Reeb spaces of piecewise linear
mappings. In Proc. 24th Annu. Sympos. Comput. Geom. (SoCG), pages 242–250, 2008.
[151] Herbert Edelsbrunner, David G. Kirkpatrick, and Raimund Seidel. On the shape of a set of
points in the plane. IEEE Trans. Info. Theory, 29(4):551–558, 1983.
[152] Herbert Edelsbrunner, David Letscher, and Afra Zomorodian. Topological persistence and
simplification. Discrete Comput. Geom., 28:511–533, 2002.
[153] Herbert Edelsbrunner and Ernst P. Mücke. Three-dimensional alpha shapes. ACM Trans.
Graph., 13(1):43–72, 1994.
[154] Alon Efrat, Alon Itai, and Matthew J. Katz. Geometry helps in bottleneck matching and
related problems. Algorithmica, 31(1):1–28, 2001.
[155] David Eisenbud. The Geometry of Syzygies: A Second Course in Algebraic Geometry and
Commutative Algebra, volume 229. Springer Science & Business Media, 2005.
[156] Jeff Erickson and Kim Whittlesey. Greedy optimal homotopy and homology generators.
In Proc. 16th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 1038–1046,
2005.

[157] Emerson G. Escolar and Yasuaki Hiraoka. Optimal cycles for persistent homology via
linear programming. Optimization in the Real World, 13:79–96, 2016.

[158] Barbara Di Fabio and Claudia Landi. Reeb graphs of curves are stable under function
perturbations. Mathematical Methods in Applied Sciences, 35:1456–1471, 2012.

[159] Barbara Di Fabio and Claudia Landi. The edit distance for Reeb graphs of surfaces. Dis-
crete Comput. Geom., 55:423–461, 2016.

[160] Brittany Terese Fasy, Fabrizio Lecci, Alessandro Rinaldo, Larry Wasserman, Sivaraman
Balakrishnan, and Aarti Singh. Confidence sets for persistence diagrams. The Annal. Stat.,
42(6):2301–2339, 2014.

[161] Robin Forman. Morse theory for cell complexes. Adv. Math., 134:90–145, 1998.

[162] Patrizio Frosini. A distance for similarity classes of submanifolds of a Euclidean space.
Bulletin of the Australian Mathematical Society, 42(3):407–415, 1990.

[163] Peter Gabriel. Unzerlegbare Darstellungen I. Manuscripta Mathematica, 6(1):71–103, 1972.

[164] Sylvestre Gallot, Dominique Hulin, and Jacques Lafontaine. Riemannian Geometry.
Springer-Verlag, 2nd edition, 1993.

[165] Marcio Gameiro, Yasuaki Hiraoka, and Ippei Obayashi. Continuation of point clouds via
persistence diagrams. Physica D: Nonlinear Phenomena, 334:118 – 132, 2016. Topology
in Dynamics, Differential Equations, and Data.

[166] Ellen Gasparovic, Maria Gommel, Emilie Purvine, Radmila Sazdanovic, Bei Wang, Yusu
Wang, and Lori Ziegelmeier. The relationship between the intrinsic Čech and persistence
distortion distances for metric graphs. J. Comput. Geom. (JoCG), 10(1), 2019. DOI:
https://doi.org/10.20382/jocg.v10i1a16.

[167] Xiaoyin Ge, Issam Safa, Mikhail Belkin, and Yusu Wang. Data skeletonization via Reeb
graphs. In Proc. 25th Annu. Conf. Neural Info. Processing Sys. (NIPS), pages 837–845,
2011.

[168] Thomas Gebhart, Paul Schrater, and Alan Hylton. Characterizing the shape of activation
space in deep neural networks. CoRR, arXiv:1901.09496, 2019.

[169] Loukas Georgiadis, Robert Endre Tarjan, and Renato Fonseca F. Werneck. Design of
data structures for mergeable trees. In Proc. 17th Annu. ACM-SIAM Sympos. Discrete
Algorithms (SODA), pages 394–403, 2006.

[170] Robert Ghrist. Elementary Applied Topology. CreateSpace Independent Publishing Plat-
form, 2014.

[171] Alexander Grigor’yan, Yong Lin, Yuri Muranov, and Shing-Tung Yau. Homologies of path
complexes and digraphs. CoRR, arXiv:1207.2834, 2012.

[172] Alexander Grigor’yan, Yong Lin, Yuri Muranov, and Shing-Tung Yau. Homotopy theory
for digraphs. CoRR, arXiv:1407.0234, 2014.

[173] Alexander Grigor’yan, Yong Lin, Yuri Muranov, and Shing-Tung Yau. Cohomology of
digraphs and (undirected) graphs. Asian J. Math, 19(5):887–931, 2015.

[174] Alexander Grigor’yan, Yuri Muranov, and Shing-Tung Yau. Homologies of digraphs and
Künneth formulas. Communications in Analysis and Geometry, 25(5):969–1018, 2017.

[175] Mikhail Gromov. Groups of polynomial growth and expanding maps (with an appendix by
Jacques Tits). Publications Mathématiques de I’Institut des Hautes Études Scientifiques,
53(1):53–78, 1981.

[176] Mikhail Gromov. Hyperbolic groups. In S.M. Gersten, editor, Essays in Group Theory,
volume 8, pages 75–263. Mathematical Sciences Research Institute Publications, Springer,
1987.

[177] Karsten Grove. Critical point theory for distance functions. Proc. Sympos. Pure. Math.,
54(3):357–385, 1993.

[178] Leonidas J. Guibas and Steve Y. Oudot. Reconstructing using witness complexes. Discrete.
Comput. Geom., 30:325–356, 2008.

[179] Victor Guillemin and Alan Pollack. Differential Topology. Prentice Hall, 1974.

[180] William H. Guss and Ruslan Salakhutdinov. On characterizing the capacity of neural net-
works using algebraic topology. CoRR, arXiv:1802.04443, 2018.

[181] Attila Gyulassy, Natallia Kotava, Mark Kim, Charles Hansen, Hans Hagen, and Valerio
Pascucci. Direct feature visualization using Morse-Smale complexes. IEEE Trans. Visual-
ization Comput. Graphics (TVCG), 18(9):1549–1562, 2012.

[182] Sariel Har-Peled and Manor Mendel. Fast construction of nets in low-dimensional metrics
and their applications. SIAM J. Comput., 35(5):1148–1184, 2006.

[183] Frank Harary. Graph Theory. Addison Wesley series in mathematics. Addison-Wesley,
1971.

[184] William Harvey, In-Hee Park, Oliver Rübel, Valerio Pascucci, Peer-Timo Bremer, Cheng-
long Li, and Yusu Wang. A collaborative visual analytics suite for protein folding research.
J. Mol. Graph. Modeling (JMGM), 53:59–71, 2014.

[185] William Harvey, Raphael Wenger, and Yusu Wang. A randomized O(m log m) time algo-
rithm for computing Reeb graph of arbitrary simplicial complexes. In Proc. 25th Annu.
ACM Sympos. Compu. Geom. (SoCG), pages 267–276, 2010.

[186] Allen Hatcher. Algebraic Topology. Cambridge University Press, Cambridge, 2002.

[187] Jean-Claude Hausmann. On the Vietoris-Rips complexes and a cohomology theory for
metric spaces. Annals Math. Studies, 138:175–188, 1995.

[188] John Hershberger and Jack Snoeyink. Computing minimum length paths of a given homo-
topy class. Comput. Geom.: Theory and Applications, 4:63–97, 1994.

[189] Franck Hétroy and Dominique Attali. Topological quadrangulations of closed triangulated
surfaces using the Reeb graph. Graph. Models, 65(1-3):131–148, 2003.

[190] Masaki Hilaga, Yoshihisa Shinagawa, Taku Kohmura, and Tosiyasu L Kunii. Topology
matching for fully automatic similarity estimation of 3D shapes. In Proc. 28th Annu. Conf.
Comput. Graphics Interactive Techniques, pages 203–212, 2001.

[191] David Hilbert. Über die theorie der algebraischen formen. Mathematische Annalen,
36:473–530, 1890.

[192] Yasuaki Hiraoka, Takenobu Nakamura, Akihiko Hirata, Emerson G. Escolar, Kaname Mat-
sue, and Yasumasa Nishiura. Hierarchical structures of amorphous solids characterized by
persistent homology. Proc. National Academy Sci., 113(26):7035–7040, 2016.

[193] Christoph Hofer, Roland Kwitt, Marc Niethammer, and Mandar Dixit. Connectivity-
optimized representation learning via persistent homology. In Proc. 36th Internat. Conf.
Machine Learning, volume 97 of Proceedings of Machine Learning Research, pages 2751–
2760. PMLR, 2019.

[194] Christoph Hofer, Roland Kwitt, Marc Niethammer, and Andreas Uhl. Deep learning with
topological signatures. In Proc. Advances Neural Information Processing Sys., pages
1634–1644, 2017.

[195] Christoph D. Hofer, Roland Kwitt, and Marc Niethammer. Learning representations of
persistence barcodes. J. Machine Learning Research, 20(126):1–45, 2019.

[196] Derek F. Holt. The Meataxe as a tool in computational group theory. London Mathematical
Society Lecture Note Series, pages 74–81, 1998.

[197] Derek F. Holt and Sarah Rees. Testing modules for irreducibility. J. Australian Math.
Society, 57(1):1–16, 1994.

[198] John E. Hopcroft and Richard M. Karp. An $n^{5/2}$ algorithm for maximum matchings in
bipartite graphs. SIAM J. Comput., 2(4):225–231, 1973.

[199] Xiaoling Hu, Fuxin Li, Dimitris Samaras, and Chao Chen. Topology-preserving deep
image segmentation. In Proc. 33rd Annu. Conf. Neural Info. Processing Sys. (NeuRIPS),
pages 5658–5669, 2019.

[200] Oscar H. Ibarra, Shlomo Moran, and Roger Hui. A generalization of the fast LUP matrix
decomposition algorithm and applications. J. Algorithms, 3(1):45 – 56, 1982.

[201] Arthur F. Veinott Jr. and George B. Dantzig. Integral extreme points. SIAM Review, 10
(3):371–372, 1968.

[202] Matthew Kahle. Topology of random clique complexes. Discrete Mathematics, 309(6):1658–1671, 2009.

[203] Matthew Kahle. Sharp vanishing thresholds for cohomology of random flag complexes.
Annal. Math., pages 1085–1107, 2014.

[204] Matthew Kahle and Elizabeth Meckes. Limit theorems for Betti numbers of random sim-
plicial complexes. Homology, Homotopy and Applications, 15(1):343–374, 2013.

[205] Sara Kališnik. Tropical coordinates on the space of persistence barcodes. Found. Comput.
Math., 19:101–129, 2019.

[206] Lida Kanari, Paweł Dłotko, Martina Scolamiero, Ran Levi, Julian Shillcock, Kathryn Hess,
and Henry Markram. A topological representation of branching neuronal morphologies.
Neuroinformatics, 16(1):3–13, 2018.

[207] Michael Kerber, Michael Lesnick, and Steve Oudot. Exact computation of the matching
distance on 2-parameter persistence modules. In Proc. 35th Internat. Sympos. Comput.
Geom. (SoCG), volume 129 of LIPIcs, pages 46:1–46:15, 2019.

[208] Michael Kerber, Dmitriy Morozov, and Arnur Nigmetov. Geometry helps to compare
persistence diagrams. J. Experimental Algo. (JEA), 22(1):1–4, 2017.

[209] Michael Kerber and Arnur Nigmetov. Efficient approximation of the matching distance for
2-parameter persistence. CoRR, arXiv:1912.05826, 2019.

[210] Michael Kerber and Hannah Schreiber. Barcodes of towers and a streaming algorithm for
persistent homology. Discrete Comput. Geom., 61(4):852–879, 2019.

[211] Kwangho Kim, Jisu Kim, Manzil Zaheer, Joon Kim, Frédéric Chazal, and Larry Wasserman. PLLay: Efficient topological layer based on persistent landscapes. In
Proc. 33rd Annu. Conf. Advances Neural Info. Processing Sys. (NeurIPS), 2020.

[212] Woojin Kim and Facundo Mémoli. Generalized persistence diagrams for persistence mod-
ules over posets. CoRR, arXiv:1810.11517, 2018.

[213] Henry King, Kevin P. Knudson, and Neza Mramor. Generating discrete Morse functions
from point data. Exp. Math., 14(4):435–444, 2005.

[214] Kevin P. Knudson. A refinement of multi-dimensional persistence. CoRR, arXiv:0706.2608, 2007.

[215] Genki Kusano, Kenji Fukumizu, and Yasuaki Hiraoka. Kernel method for persistence
diagrams via kernel embedding and weight factor. Journal of Machine Learning Research,
18(189):1–41, 2018.

[216] Claudia Landi. The rank invariant stability via interleavings. CoRR, arXiv:1412.3374,
2014.

[217] Janko Latschev. Vietoris-Rips complexes of metric spaces near a closed Riemannian man-
ifold. Archiv der Mathematik, 77(6):522–528, 2001.

[218] Tam Le and Makoto Yamada. Persistence Fisher kernel: A Riemannian manifold kernel
for persistence diagrams. In Proc. Advances Neural Info. Processing Sys. (NIPS), pages
10028–10039, 2018.

[219] Jean Leray. Sur la forme des espaces topologiques et sur les points fixes des représenta-
tions. J. Math. Pure Appl., 24:95–167, 1945.

[220] Michael Lesnick. The theory of the interleaving distance on multidimensional persistence
modules. Found. Comput. Math., 15(3):613–650, 2015.

[221] Michael Lesnick and Matthew Wright. Interactive visualization of 2-d persistence modules.
CoRR, arXiv:1512.00180, 2015.

[222] Michael Lesnick and Matthew Wright. Computing minimal presentations and betti num-
bers of 2-parameter persistent homology. CoRR, arXiv:1902.05708, 2019.

[223] Thomas Lewiner, Hélio Lopes, and Geovan Tavares. Applications of Forman’s discrete
Morse theory to topology visualization and mesh compression. IEEE Trans. Vis. Comput.
Graph., 10(5):499–508, 2004.

[224] Li Li, Wei-Yi Cheng, Benjamin S. Glicksberg, Omri Gottesman, Ronald Tamler, Rong
Chen, Erwin P. Bottinger, and Joel T. Dudley. Identification of type 2 diabetes sub-
groups through topological analysis of patient similarity. Science Translational Medicine,
7(311):311ra174, 2015.

[225] André Lieutier. Any open bounded subset of Rn has the same homotopy type as its medial
axis. Computer-Aided Design, 36(11):1029–1046, 2004.

[226] Yong Lin, Linyuan Lu, and Shing-Tung Yau. Ricci curvature of graphs. Tohoku Mathe-
matical Journal, Second Series, 63(4):605–627, 2011.

[227] P.Y. Lum, G. Singh, A. Lehman, T. Ishkhanikov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, and G. Carlsson. Extracting insights from the shape of complex data using topology. Scientific Reports, 3, 2013.

[228] Clément Maria and Steve Y. Oudot. Zigzag persistence via reflections and transpositions.
In Proc. 26th Annu. ACM-SIAM Sympos. Discrete Algorithms (SODA), pages 181–199,
2015.

[229] Paolo Masulli and Alessandro EP Villa. The topology of the directed clique complex as a
network invariant. SpringerPlus, 5(1):388, 2016.

[230] Yuriy Mileyko, Sayan Mukherjee, and John Harer. Probability measures on the space of
persistence diagrams. Inverse Problems, 27(12):124007, 2011.

[231] Ezra Miller and Bernd Sturmfels. Combinatorial Commutative Algebra. Springer-Verlag
New York, 2004.

[232] John W. Milnor. Topology from a differentiable viewpoint. Virginia Univ. Press, 1965.

[233] John W. Milnor. Morse Theory. Annals of Mathematics Studies. Princeton University
Press, 5th edition, 1973.

[234] Nikola Milosavljević, Dmitriy Morozov, and Primoz Skraba. Zigzag persistent homology
in matrix multiplication time. In Proc. 27th Annu. Sympos. Comput. Geom. (SoCG), pages
216–225, 2011.

[235] Konstantin Mischaikow and Vidit Nanda. Morse theory for filtrations and efficient compu-
tation of persistent homology. Discrete Comput. Geom., 50(2):330–353, 2013.

[236] Michael Moor, Max Horn, Bastian Rieck, and Karsten Borgwardt. Topological autoen-
coders. CoRR, arXiv:1906.00722, 2019.

[237] Dmitriy Morozov, Kenes Beketayev, and Gunther H. Weber. Interleaving distance between
merge trees. In Workshop on Topological Methods in Data Analysis and Visualization:
Theory, Algorithms and Applications, 2013.

[238] Marian Mrozek. Conley-Morse-Forman theory for combinatorial multivector fields on Lefschetz complexes. Found. Comput. Math., 17(6):1585–1633, 2017.

[239] Elizabeth Munch, Katharine Turner, Paul Bendich, Sayan Mukherjee, Jonathan Mattingly,
and John Harer. Probabilistic Fréchet means for time varying persistence diagrams. Elec-
tron. J. Statist., 9(1):1173–1204, 2015.

[240] Elizabeth Munch and Bei Wang. Convergence between categorical representations of
Reeb space and mapper. In 32nd Internat. Sympos. Comput. Geom. (SoCG), volume 51
of LIPIcs, pages 53:1–53:16, 2016.

[241] James R. Munkres. Elements of Algebraic Topology. Addison–Wesley Publishing Company, Menlo Park, 1984.

[242] James R. Munkres. Topology, 2nd Edition. Prentice Hall, Inc., 2000.

[243] Gregory Naitzat, Andrey Zhitnikov, and Lek-Heng Lim. Topology of deep neural networks.
J. Mach. Learn. Res., 21:184:1–184:40, 2020.

[244] Monica Nicolau, Arnold J. Levine, and Gunnar Carlsson. Topology based data analysis
identifies a subgroup of breast cancers with a unique mutational profile and excellent sur-
vival. Proc. National Acad. Sci., 108.17:7265–7270, 2011.

[245] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. Finding the homology of subman-
ifolds with high confidence from random samples. Discrete Comput. Geom., 39(1-3):419–
441, 2008.

[246] Partha Niyogi, Stephen Smale, and Shmuel Weinberger. A topological view of unsuper-
vised learning from noisy data. SIAM J. Comput., 40(3):646–663, 2011.

[247] Ippei Obayashi. Volume-optimal cycle: Tightest representative cycle of a generator in persistent homology. SIAM J. Appl. Algebra Geom., 2(4):508–534, 2018.

[248] James B. Orlin. Max flows in O(nm) time, or better. In Proc. 45th Annu. ACM Sympos.
Theory Comput. (STOC), pages 765–774, 2013.

[249] Steve Oudot. Persistence Theory: From Quiver Representations to Data Analysis, volume
209. AMS Mathematical Surveys and Monographs, 2015.

[250] Deepti Pachauri, Chris Hinrichs, Moo K. Chung, Sterling C. Johnson, and Vikas Singh.
Topology-based kernels with application to inference problems in Alzheimer’s disease.
IEEE Trans. Med. Imaging, 30(10):1760–1770, 2011.

[251] Richard A. Parker. The computer calculation of modular characters (the Meataxe). Comput.
Group Theory, pages 267–274, 1984.

[252] Salman Parsa. A deterministic O(m log m) time algorithm for the Reeb graph. Discrete
Comput. Geom., 49(4):864–878, Jun 2013.

[253] Valerio Pascucci, Giorgio Scorzelli, Peer-Timo Bremer, and Ajith Mascarenhas. Robust
on-line computation of Reeb graphs: simplicity and speed. ACM Trans. Graph., 26(3):58,
2007.

[254] Amit Patel. Generalized persistence diagrams. J. Appl. Comput. Topology, 1:397–419,
2018.

[255] Giovanni Petri, Martina Scolamiero, Irene Donato, and Francesco Vaccarino. Topological
strata of weighted complex networks. PLOS ONE, 8:1–8, 06 2013.

[256] Jeff M. Phillips, Bei Wang, and Yan Zheng. Geometric inference on kernel density esti-
mates. In Lars Arge and János Pach, editors, Proc. 31st Internat. Sympos. Comput. Geom.
(SoCG), volume 34 of LIPIcs, pages 857–871, 2015.

[257] Adrien Poulenard, Primoz Skraba, and Maks Ovsjanikov. Topological function optimiza-
tion for continuous shape matching. Comput. Graphics Forum, 37(5):13–25, 2018.

[258] Victor V. Prasolov. Elements of combinatorial and differential topology, volume 74. Amer.
Math. Soc., 2006.

[259] Diadem challenge. http://diademchallenge.org.

[260] Raúl Rabadán and Andrew J. Blumberg. Topological Data Analysis for Genomics and
Evolution: Topology in Biology. Cambridge University Press, 2019.

[261] Geoge Reeb. Sur les points singuliers d’une forme de Pfaff complètement intégrable ou
d’une fonction numérique. Comptes Rendus Hebdomadaires des Séances de l’Académie
des Sciences, 222:847–849, 1946.

[262] Michael W Reimann, Max Nolte, Martina Scolamiero, Katharine Turner, Rodrigo Perin,
Giuseppe Chindemi, Paweł Dłotko, Ran Levi, Kathryn Hess, and Henry Markram. Cliques
of neurons bound into cavities provide a missing link between structure and function. Fron-
tiers Comput. Neuroscience, 11:48, 2017.

[263] Jan Reininghaus, Stefan Huber, Ulrich Bauer, and Roland Kwitt. A stable multi-scale
kernel for topological machine learning. In Proc. Comput. Vision Pattern Recognition,
pages 4741–4748, 2015.

[264] Bastian Rieck, Matteo Togninalli, Christian Bock, Michael Moor, Max Horn, Thomas
Gumbsch, and Karsten Borgwardt. Neural persistence: A complexity measure for deep
neural networks using algebraic topology. In Proc. Internat. Conf. Learning Representa-
tions (ICLR), 2019.

[265] Claus M. Ringel and Hiroyuki Tachikawa. Q-F3 rings. J. für die Reine und Angewandte
Mathematik, 272:49–72, 1975.

[266] Vanessa Robins. Towards computing homology from finite approximations. Topology
Proceedings, 24(1):503–532, 1999.

[267] Vanessa Robins, Peter J. Wood, and Adrian P. Sheppard. Theory and algorithms for con-
structing discrete Morse complexes from grayscale digital images. IEEE Trans. Pattern
Anal. Machine Intelligence, 33(8):1646–1658, 2011.

[268] Tim Römer. On minimal graded free resolutions. Illinois J. Math, 45(2):1361–1376, 2001.

[269] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks
for biomedical image segmentation. In Proc. Internat. Conf. Medical Image Comput.
Computer-Assisted Intervention, pages 234–241. Springer, 2015.

[270] Jim Ruppert. A Delaunay refinement algorithm for quality 2-dimensional mesh generation.
J. Algorithms, 18:548–585, 1995.

[271] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Ma-
chines, Regularization, Optimization, and Beyond. The MIT Press, 1998.

[272] Alexander Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons Ltd.,
Chichester, 1986.

[273] Paul D. Seymour. Decomposition of regular matroids. J. Combin. Theory Ser. B, 28(3):305–359, 1980.

[274] Donald R. Sheehy. Linear-size approximations to the Vietoris-Rips filtration. In Proc. 28th Annu. Sympos. Comput. Geom. (SoCG), pages 239–248, 2012.

[275] Donald R. Sheehy. Linear-size approximations to the Vietoris-Rips filtration. Discrete Comput. Geom., 49:778–796, 2013.

[276] Yoshihisa Shinagawa, Tosiyasu L. Kunii, and Yannick L. Kergosien. Surface coding based
on Morse theory. IEEE Comput. Graph. Appl., 11(5):66–78, 1991.

[277] Gurjeet Singh, Facundo Mémoli, and Gunnar Carlsson. Topological methods for the analy-
sis of high dimensional data sets and 3D object recognition. In Proc. Eurographics Sympos.
Point-Based Graphics (2007), pages 91–100, 2007.

[278] Primoz Skraba and Katharine Turner. Wasserstein stability for persistence diagrams. CoRR,
arXiv:2006.16824, 2021.

[279] Jacek Skryzalin. Numeric invariants from multidimensional persistence, 2016. PhD thesis,
Stanford University.

[280] Daniel D. Sleator and Robert Endre Tarjan. A data structure for dynamic trees. J. Comput.
Syst. Sci., 26(3):362–391, June 1983.

[281] Henry J. S. Smith. On systems of linear indeterminate equations and congruences. Philo-
sophical Transactions of the Royal Society of London, 151:293–326, 1861.

[282] Thierry Sousbie. The persistent cosmic web and its filamentary structure - I. theory and
implementation. Monthly Notices Royal Astronomical Soc., 414(1):350–383, 2011.

[283] Bharath K. Sriperumbudur, Kenji Fukumizu, and Gert R.G. Lanckriet. Universality, char-
acteristic kernels and RKHS embedding of measures. J. Machine Learning Research,
12(70):2389–2410, 2011.

[284] Jian Sun, Maks Ovsjanikov, and Leonidas Guibas. A concise and provably informative
multi-scale signature based on heat diffusion. In Proceedings of the Symposium on Geom-
etry Processing (SGP), pages 1383–1392, Goslar, DEU, 2009. Eurographics Association.

[285] Julien Tierny. Reeb graph based 3D shape modeling and applications. PhD thesis, Uni-
versite des Sciences et Technologies de Lille, 2008.

[286] Julien Tierny. Topological Data Analysis for Scientific Visualization. Springer Internat.
Publishing, 2017.

[287] Julien Tierny, Attila Gyulassy, Eddie Simon, and Valerio Pascucci. Loop surgery for vol-
umetric meshes: Reeb graphs reduced to contour trees. IEEE Trans. Vis. Comput. Graph.,
15(6):1177–1184, 2009.

[288] Brenda Y. Torres, Jose H. M. Oliveira, Ann Thomas Tate, Poonam Rath, Katherine Cum-
nock, and David S. Schneider. Tracking resilience to infections by mapping disease space.
PLOS Biology, 14(4):1–19, 2016.

[289] Elena Farahbakhsh Touli and Yusu Wang. FPT-algorithms for computing Gromov-Hausdorff and interleaving distances between trees. CoRR, arXiv:1811.02425, 2018. In Proc. European Sympos. Algorithms (ESA), 2019.

[290] Tony Tung and Francis Schmitt. The augmented multiresolution Reeb graph approach for
content-based retrieval of 3d shapes. Internat. J. Shape Modeling, 11(1):91–120, 2005.

[291] Katharine Turner, Yuriy Mileyko, Sayan Mukherjee, and John Harer. Fréchet means for
distributions of persistence diagrams. Discrete Comput. Geom., 52(1):44–70, 2014.

[292] Gert Vegter and Chee K. Yap. Computational complexity of combinatorial surfaces. In
Proc. 6th Annu. Sympos. Comput. Geom. (SoCG), pages 102–111, 1990.

[293] Leopold Vietoris. Über den höheren zusammenhang kompakter räume und eine klasse von
zusammenhangstreuen abbildungen. Mathematische Annalen, 97:454–472, 1927.

[294] Suyi Wang, Xu Li, Partha Mitra, and Yusu Wang. Topological skeletonization and tree-
summarization of neurons using discrete Morse theory. CoRR, arXiv:1805.04997, 2018.

[295] Suyi Wang, Yusu Wang, and Yanjie Li. Efficient map reconstruction and augmentation
via topological methods. In Jie Bao, Christian Sengstock, Mohammed Eunus Ali, Yan
Huang, Michael Gertz, Matthias Renz, and Jagan Sankaranarayanan, editors, Proc. 23rd
SIGSPATIAL Internat. Conf. Advances in GIS, pages 25:1–25:10, 2015.

[296] Suyi Wang, Yusu Wang, and Rephael Wenger. The JS-graph of join and split trees. In
Proc. 30th Annu. Sympos. Comput. Geom. (SoCG), pages 539–548, 2014.

[297] Larry Wasserman. Topological data analysis. Annual Review of Statistics and Its Applica-
tion, 5(1):501–532, 2018. Available at SSRN: https://ssrn.com/abstract=3156968.

[298] Carry Webb. Decomposition of graded modules. Proc. American Math. Soc., 94(4):565–
571, 1985.

[299] Gunther Weber, Peer-Timo Bremer, and Valerio Pascucci. Topological landscapes: A ter-
rain metaphor for scientific data. IEEE Trans. Vis. Comput. Graphics, 13(6):1416–1423,
2007.

[300] André Weil. Sur les théoréms de de Rham. Commentarii Mathematici Helvetici, 26:119–
145, 1952.

[301] Zoë Wood, Hugues Hoppe, Mathieu Desbrun, and Peter Schröder. Removing excess topol-
ogy from isosurfaces. ACM Trans. Graph., 23(2):190–208, 2004.

[302] Pengxiang Wu, Chao Chen, Yusu Wang, Shaoting Zhang, Changhe Yuan, Zhen Qian, Dim-
itris N. Metaxas, and Leon Axel. Optimal topological cycles and their application in cardiac
trabeculae restoration. In Info. Processing Medical Imaging - 25th Internat. Conf., IPMI,
pages 80–92, 2017.

[303] Manzil Zaheer, Satwik Kottur, Siamak Ravanbakhsh, Barnabas Poczos, Ruslan Salakhut-
dinov, and Alexander Smola. Deep sets. In Proc. Advances Neural Info. Processing Sys.,
pages 3391–3401, 2017.

[304] Simon Zhang, Mengbai Xiao, and Hao Wang. GPU-accelerated computation of Vietoris-
Rips persistence barcodes. In 36th Internat. Sympos. Comput. Geom. (SoCG), volume 164,
pages 70:1–70:17, 2020.

[305] Qi Zhao and Yusu Wang. Learning metrics for persistence-based summaries and appli-
cations for graph classification. In Proc. 33rd Annu. Conf. Neural Info. Processing Sys.
(NeuRIPS), pages 9855–9866, 2019.
Index

(c, s)-good cover, 226
B(c, r), 9
B^o(c, r), 9
C^i-smooth manifold, 15
C^i-smooth surface, 15
S(c, r), 9
V-path, 238
Int ·, 8, 14
Bd ·, 8, 14
Cl ·, 8
ε-interleaving, 76
m-manifold, 13
B^d, 9
B^d_o, 9
H^d, 9
S^d, 9
d-parameter persistence module, 265
k-simplex, 23, 24
p-coboundary, 46
p-cochain, 46
p-cocycle, 46
p-cycle, 40
q-connected component, 139
PL-critical points, 80
Čech complex, 27
Čech distance, 149

accumulation point, 7
alpha complex, 30
ambient isotopy, 11
annotation, 97
arboricity, 205
attaching cells, 19

bar, 104
barcode, 60
    levelset zigzag, 115
barycentric coordinates, 57
basis, 37, 267
birth, 58
blockcode, 293
bottleneck distance, 62, 307
boundary, 243
    of a manifold, 14
    of a point set, 8
boundary matrix, 68
    filtered, 68
bounded, 9

cancellation, 240
category, 301
    thin, 301
chain complex, 40
chain map, 43
class persistence, 60
clearing, 73
clique complex, 193
    directed, 198
closed
    interval, 8
    set, 8
    star, 25
closure, 8
coface
    of a simplex, 24
cofacet
    of a simplex, 24
cohomology, 46
cohomology group, 47
cokernel, 37
commutative diagram, 43, 76
compact, 9
complement, 8
complex
    simplicial, 24
connected, 7
connected space, 5
contiguous maps, 26
continuous function, 10
contour, 171
contour tree, 172
convex hull, 24
coset, 37
cover
    maps, 210
    path connected, 212
critical
    V-path, 240
    point, 16, 17
    point index, 19
    simplices, 239
    value, 17, 114
cut, 137
cycle group, 40

death, 58
deformation retract, 13
deformation retraction, 13
Delaunay
    complex, 29
    simplex, 29
derivative, 16
diameter of a point set, 9
dimension, 24
    of a manifold, 13
    of a simplex, 23, 24
disconnected, 7
discrete Morse
    field, 239
    function, 238
distance field, 158
DMVF, 239
Dowker complex, 198

edge, 23
elementary simplicial map, 98
embedding, 10
Euclidean
    ball, 9
    sphere, 9
exact sequence, 268
extended persistence, 118
extended plane, 52, 59

face
    of a simplex, 23, 24
facet
    of a simplex, 23, 24
field, 37
filtration, 54
    function, 57
    nested pair, 163
    simplex-wise, 54
finite type, 302
finitely generated, 265
flag complex, 193
free group, 37
free module, 267
free resolution, 291
function-induced metric, 184
functional distortion distance, 184
functor, 301

generalized vector field, 159
generating set, 265
    minimal, 266
generator, 37, 265
genus, 14
geodesic
    distance, 9
geometric realization, 25
grade, 264
graded Betti number, 292
graded module, 264
gradient, 331
    path, 249
    vector, 17, 237, 248, 249
    vector field, 17
gradient vector, 17
gradient vector field, 17
Gromov-Hausdorff distance, 150
group, 36

Hausdorff, 21
Hausdorff distance, 150
Hessian matrix, 18
Hessian, degenerate critical points, 18
Hilbert space, 320
homeomorphic, 10
homeomorphism, 10
homogeneous
    presentation matrix, 272
homology
    group, 41
    module, 53
    towers, 95
homomorphism, 37
homotopic, 12
homotopy, 12
    equivalent, 12
horizontal homology, 180

image, 37
inclusion, 52
indecomposable, 267
infinite bar, 60, 118
integral path, 248
interior
    of a manifold, 14
    of a point set, 8
interleaving
    distance, 77, 303
    multiparameter persistence, 302
    Reeb graph, 183
    simplicial towers, 94
    space towers, 94
    vector space towers, 95
interleaving distance
    Reeb graph, 184
    simplicial towers, 95
    vector space towers, 95
intersection module, 310
interval, 104, 291, 308
interval decomposable module, 291
interval levelset, 19
interval module, 78, 104, 291, 308
intrinsic
    Čech complex, 164
    Rips complex, 164
isomorphic modules, 267
isomorphism, 37
isotopic, 11
isotopy, 11

Jacobian matrix, 15

kernel, 37, 322

Lebesgue number, 218
levelset, 19, 114
    persistence, 114
    tame, 171
limit point, 7
link, 26
link condition, 101
local feature size, 159
locally finite, 213
loop, 14
lower star, 56
    filtration, 56
lower-link-index, 80

Möbius band, 14
manifold, 13
    without boundary, 14
map, 10
    regular, 15
mapper, 220
matching, 238, 307
merge tree, 172
metric, 6
    ball, 7
    graph, 193
    space, 6
minimal cut, 137
minimal generating set, 265
module, 37
    decomposition, 267
    interval decomposable, 308
morphism, 266
    decomposition, 268
Morse
    function, 18, 19, 238
    inequality, 239
    Lemma, 18, 19
    matching, 238
multiparameter
    filtration, 265
    interval module, 308
    persistence module, 265
multiplicatively interleaved, 95
multirank invariant, 291
multiscale mapper, 223

natural transformation, 302
neighborhood, 4
nerve, 27
    map, 212
Nerve Theorem, 27
net-tower, 151
nets, 151
non-orientable manifold, 14
nonsmooth manifold, 15

open
    ball, 9
    interval, 8
    set, 4, 8
    triangle, 9
optimal
    basis, 125
    cycle basis, 125, 218
    generator, 125
    persistent cycle, 136
orientable manifold, 14

p.f.d. persistence module, 78
parametric surface, 15
partition of unity, 213
path connected, 21
path homology, 200
persistence, 58
    diagram, 60, 79
    distortion distance, 196
    Fisher kernel, 327
    image, 323
    landscape, 320
    module, 75, 79, 302
    pair, 67
    pairing function, 59
    scale space kernel, 323
    weighted kernel, 324
persistent
    Betti number, 58
    cycle, 136
    graded Betti number, 292
    homology groups, 58
piecewise-linear function, 80
PL-function, 57
point cloud, 53
polynomial ring, 264
presentation, 268
    matrix, 268
proper face
    of a simplex, 23
pullback
    cover, 220
    metric, 224
pure simplicial complex, 134
push operation, 305

quiver, 104
quotient, 37
    topology, 6

rank, 37
    invariant, 290
reach, 159
rectangular commutativity, 76, 303
reduced Betti number, 80
reduced matrix, 68
Reeb graph, 170
    augmented, 173
    interleaving, 183
Reeb space, 172
regular map, 15
regular value, 114
relations, 265
relative homology, 44
resolution, 94, 222
restricted Delaunay complex, 32
retraction, 13
Riemannian manifold, 17
ring, 37
Rips distance, 149

sampling conditions, 34
shifted module M→u, 267
simplex, 23, 24
simplex-wise, 54
    monotone function, 56
simplicial
    map, 26
    retraction, 154
simplicial complex, 25
    abstract, 24
    geometric, 24
singular
    homology, 45
    simplex, 45
skeleton, 25
Sliced Wasserstein
    distance, 326
    kernel, 327
Smith normal form, 135
smooth
    manifold, 15
    surface, 15
sparse Rips, 152
    filtration, 152
split tree, 172
stability
    persistence diagram, 62
star, 25
strong convexity, 164
strong witness, 31
sublevel set, 19, 52
subordinate, 213
subspace topology, 5
superlevel set, 19
support, 290
surface, 14
    without boundary, 14
system of subsets, 4

tame, 63
    persistence module, 78
tetrahedron, 23
topological
    space, 4
    subspace, 5
topologically equivalent, 10
topology, 4, 6
total decomposition
    module, 267
    morphism, 268
totally unimodular, 133
tower, 94
triangle, 23
triangular commutativity, 76, 303
triangulation, 24, 26
trivial module, 268

unbounded, 9
underlying space, 25
union-find, 87
unit ball, 9
upper star, 56
    filtration, 56
upper-link-index, 80

valid annotation, 97
vector space, 38, 46
vertex, 23
    function, 56
    map, 26
vertical homology, 180
Vietoris-Rips complex, 28
Voronoi diagram, 30

Wasserstein distance, 63
weak
    feature size, 160
    interleaving
        vector space towers, 157
    pseudomanifold, 137
    witness, 31
weight of cycle, 124
witness complex, 31

zigzag
    filtration, 103
    module, 104