Lecture Notes Introduction to Topological Data Analysis
Patrick Schnider
Department of Computer Science, ETH Zürich
Andreasstrasse 5, CH-8050 Zürich, Switzerland
E-mail address: [email protected]
Contents
1 Mathematical Foundations 6
1.1 Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Maps between topological spaces . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2 Homology 18
2.1 Simplicial Complexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.1 An intuitive view at holes . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.2 Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.3 Boundary Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.4 Cycle and boundary groups . . . . . . . . . . . . . . . . . . . . . . 28
2.2.5 Homology Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.6 Singular Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2.7 The 0-th homology group . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.8 Homology of Spheres . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.9 Induced Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2.10 Application: Brouwer fixed point theorem . . . . . . . . . . . . . . 36
3 Persistence 38
3.1 Filtrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Persistent Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Algorithms for persistent homology . . . . . . . . . . . . . . . . . . . . . . 41
3.3.1 Persistence pairing algorithm . . . . . . . . . . . . . . . . . . . . . 41
3.3.2 Matrix reduction algorithm . . . . . . . . . . . . . . . . . . . . . . 43
3.4 Simplicial Complexes on Point Sets . . . . . . . . . . . . . . . . . . . . . . 44
3.4.1 Čech and Vietoris-Rips complexes . . . . . . . . . . . . . . . . . . 44
3.4.2 Delaunay and Alpha complexes . . . . . . . . . . . . . . . . . . . . 45
3.4.3 Subsample Complexes . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Distance Metrics on Persistence Diagrams . . . . . . . . . . . . . . . . . . 48
3.5.1 Bottleneck Distance . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5.2 Wasserstein Distance . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5 Optimal Generators 76
5.1 Optimal basis of a fixed complex . . . . . . . . . . . . . . . . . . . . . . . 76
5.2 Persistent cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Chapter 1
Mathematical Foundations
Definition 1.1. A topological space is a pair (X, T ), where X is a set and T is a collection of subsets of X (called the open sets), such that
1. ∅ ∈ T , X ∈ T .
2. For every S ⊆ T , ⋃ S ∈ T .
3. For every finite S ⊆ T , ⋂ S ∈ T .
For example, setting X = R2 and T to be the collection of open subsets (in the
geometric/calculus sense) of R2 , we can check that (X, T ) is a topological space. A
further example of a topological space is (X, 2X ), where 2X denotes the family of all
subsets of X. This is called a discrete topology.
You might wonder why we consider infinite unions of open sets to be open, but restrict to finite intersections in Condition 3. This is so that the open sets of Euclidean space can actually be called open in the language of topology. If we allowed infinite intersections in Condition 3, a set {p} consisting of a single point p ∈ R2 would
have to be considered open: it is the intersection of the infinite sequence of open balls of radius 1/n centered at p, for n ∈ N.
In most applications in these lecture notes, we work with subspaces of the Euclidean
space Rd , so apart from open sets, we also know from calculus notions such as closed
sets, closure, interior and boundary. These terms can be defined also for abstract
topological spaces:
Definition 1.2. A set Q ⊆ X is called closed, if its complement X \ Q is open. The
closure cl Q is the smallest closed set containing Q. The interior int Q is the union
of all open subsets of Q. The boundary bnd Q is the closure minus the interior: bnd Q = cl Q \ int Q.
Note that sets can be open and closed simultaneously: in every topological space
(X, T ), ∅ and X are such examples. In a discrete topology, every subset S ⊆ X is open
and closed.
Exercise 1.3. Show that a finite union of closed sets is closed.
So far we have only seen two topological spaces: Euclidean space, or any set with
the (rather boring) discrete topology. In order to see the value in the abstractions we
are doing, we would like to have more examples of topological spaces. In particular, it
would be great if we had a way to get new topological spaces from known ones. In the
following we discuss some ways to do this, starting with taking intersections.
Lemma 1.4. Let (X, T ) be some topological space, and Y ⊆ X. Then, U := {A∩Y | A ∈ T }
is a topology on Y. We call this a subspace topology.
Proof. We check the three conditions of a topology:
1. ∅ = ∅ ∩ Y, therefore ∅ ∈ U. Similarly, Y = X ∩ Y, and thus Y ∈ U.
2. For any index set I, ⋃_{i∈I} (Ai ∩ Y) = (⋃_{i∈I} Ai ) ∩ Y, and thus ⋃_{i∈I} (Ai ∩ Y) ∈ U.
3. ⋂_{i=1}^{n} (Ai ∩ Y) = (⋂_{i=1}^{n} Ai ) ∩ Y, and thus ⋂_{i=1}^{n} (Ai ∩ Y) ∈ U.
Since we have seen that Rd is a topological space, this already tells us that all subsets
of Rd are topological spaces.
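For finite spaces, the construction of Lemma 1.4 can be checked mechanically. The following Python sketch is our own illustration (not part of the notes): for a finite collection of open sets, closure under pairwise unions and intersections already implies the axioms, so that is all we test.

```python
def is_topology(space, opens):
    """Check the topology axioms for a finite collection of open sets.

    For finitely many open sets, closure under arbitrary unions and finite
    intersections reduces to closure under pairwise unions/intersections.
    """
    opens = set(map(frozenset, opens))
    if frozenset() not in opens or frozenset(space) not in opens:
        return False  # Condition 1 fails
    for a in opens:
        for b in opens:
            if a | b not in opens or a & b not in opens:
                return False  # Conditions 2 or 3 fail
    return True

# a small topology on X = {1, 2, 3}
X = {1, 2, 3}
T = [set(), {1}, {1, 2}, {1, 2, 3}]
assert is_topology(X, T)

# the subspace topology on Y, exactly as in Lemma 1.4: U = {A ∩ Y | A ∈ T}
Y = {2, 3}
U = {frozenset(a) & frozenset(Y) for a in T}
assert is_topology(Y, U)
```

Running the checks confirms that U is again a topology, as the lemma asserts.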
Another way to get topological spaces is as a product of spaces. We will not discuss
the details of this here, and refer the interested reader to any textbook on topology, such
as the excellent book by Munkres [1].
Fact 1.5. Let X, Y be two topological spaces. Then, X × Y is a topological space, with
the so-called product topology.
The definition of topological spaces allows us to formally define concepts from geom-
etry in a more abstract setting:
Definition 1.6. A topological space (X, T ) is disconnected, if there are two disjoint non-
empty open sets U, V ∈ T , such that X = U ∪ V. A topological space is connected, if
it is not disconnected.
Exercise 1.7. In this exercise, we will use topology to prove that the set of primes is
infinite.
We define the sets S(a, b) as follows:
S(a, b) := {an + b | n ∈ Z}, ∀a ∈ Z \ {0}, b ∈ Z
We then say that a set U ⊆ Z is open, if and only if for all x ∈ U, there exists a ∈ Z \ {0} such that S(a, x) ⊆ U. This is equivalent to saying that every open set U is a union of zero or more (including infinitely many) sets S(a, b).
(a) Show that this defines a topology on Z.
(b) Let A ⊂ Z be finite and non-empty. Show that Z \ A cannot be closed.
(c) Show that S(a, b) is both open and closed.
(d) Show that ⋃_{p prime} S(p, 0) = Z \ {−1, 1}.
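Part (d) can be sanity-checked numerically on a finite window of integers. The following Python sketch is our own illustration (of course not a proof): every integer of absolute value at least 2 has a prime factor, so it lies in some S(p, 0), while ±1 do not.

```python
# check part (d) on the window of integers [-N, N]
N = 100
window = set(range(-N, N + 1))

def S(a, b):
    """S(a, b) = {a*n + b : n in Z}, restricted to the finite window."""
    return {a * n + b for n in range(-N, N + 1)} & window

# all primes up to N, by trial division
primes = [p for p in range(2, N + 1) if all(p % q for q in range(2, p))]

union = set().union(*(S(p, 0) for p in primes))
assert union == window - {1, -1}  # matches the claim of part (d)
```

Note that 0 is contained in every S(p, 0), and ±1 in none, since they have no prime factors.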
• All knots (embeddings of the circle into R3 ) are homeomorphic. Thus, we cannot
distinguish between knots using only homeomorphism.
Exercise 1.13. Give an example of a map f : X → Y that is bijective but not a homeo-
morphism.
Exercise 1.14. Consider a grid of 2 vertical line segments and k + 2 horizontal seg-
ments, for some k ⩾ 0. For k = 1, this looks as follows:
Now, we consider the problem of placing a point on each of the k + 2 horizontal
line segments, such that each of the k + 4 total line segments contains at least one
point.
(a) How could one define a topology on the set of all such point placements?
The example of the knots shows that in certain cases, maps and homeomorphism are
not a good language to capture the relevant properties. In some cases, we want to look
at an entire process of continuously deforming one object into another.
Some examples:
• Let X ⊂ R be the union of {0} and [1, 2], and let Y ⊂ R be the union of [0, 1] and {2}. These spaces are homeomorphic (X ≃ Y), but not isotopic.
• The two knots from Figure 1.1 above are also not isotopic.
• Consider the two spaces in Figure 1.2. Do you think they are isotopic? Most
people would probably argue that they are not, as in one of them the “handcuff”
wraps around the “pole” once and in the other one twice. However, it turns out
that the spaces are in fact isotopic. An isotopy is illustrated by the following video:
https://www.youtube.com/watch?v=wDZx9B4TAXo
Figure 1.2: Left: Both handcuffs are connected to an infinite pole. Right: Only one
loop of the handcuffs is connected to the infinite pole. These spaces are
isotopic.
Some examples:
• The inclusion map g : B3 ,→ R3 (where B3 is the unit ball in R3 ), and h : B3 → R3
which sends every point to the origin, are homotopic, as shown by the homotopy
H(x, t) = (1 − t)g(x).
• g ◦ h is homotopic to idY .
For example, the circle S1 and R2 \ {0} are homotopy equivalent. We pick g as the inclusion map S1 ,→ R2 \ {0}, and h(x) := x/|x|. We see that h ◦ g(x) = x, i.e., h ◦ g = idS1 .
Furthermore, g ◦ h(x) = h(x). Finally, g ◦ h and idR2 \{0} are homotopic as certified by
the homotopy H(x, t) := tx + (1 − t)h(x).
An important example of a homotopy equivalence are deformation retracts:
• R(·, 0) = idX
• R(x, 1) ∈ A, ∀x ∈ X
• R(a, t) = a, ∀a ∈ A, t ∈ [0, 1]
Some examples:
• A punctured torus can be deformation retracted onto the symbol 8 where one of
the two circles is rotated by 90◦ , as seen by the following video:
https://www.youtube.com/watch?v=tz3QWrfPQj4
Lemma 1.20. If X and Y are homeomorphic, they are also homotopy equivalent.
Proof. Let g : X → Y be the homeomorphism, and h := g−1 its inverse. Then g ◦ h = idY
and h ◦ g = idX , and id is homotopic to itself.
The following is a nice way to show that two spaces are homotopy equivalent:
Fact 1.21. X, Y are homotopy equivalent if and only if there exists a space Z such
that X and Y are deformation retracts of Z.
Exercise 1.22. Sort the letters of the alphabet into equivalence classes under homotopy
equivalence.
Exercise 1.23. Show that both a cylinder and a Möbius strip are homotopy equivalent
to a circle.
Figure 1.3: The top space deformation retracts to both spaces below, showing that
they are homotopy equivalent.
Exercise 1.24. Let X be S2 where the north pole and the south pole have been glued
together, see Figure 1.4a. Let Y be S2 with an S1 attached at the north pole, see
Figure 1.4b.
Give an informal argument that X and Y are homotopy equivalent. Bonus ques-
tion: Are they also homeomorphic?
We note that in general showing existence of a map with certain properties (e.g., a
homeomorphism, isotopy, homotopy) is easy: just give a map and show that it satisfies
the required properties. On the other hand, showing that such a map cannot exist is
hard, as there are usually infinitely many candidate maps. The idea of algebraic topology
is to construct invariants preserved by these maps. Then, we know that no map can exist
between spaces on which these invariants differ. An example of such an invariant is the
number of “holes” a space has, which we will formalize when we introduce the notion of
homology.
1.4 Algebra
In this section we introduce the necessary background in algebra for the basics of homology theory. Just as for topology, we first introduce the objects of study, followed by the maps between them.
Definition 1.25. A group (G, +) is a set G together with a binary operation “+” such
that
1. ∀a, b ∈ G: a + b ∈ G
2. ∀a, b, c ∈ G: (a + b) + c = a + (b + c) (Associativity)
3. ∃0 ∈ G: a + 0 = 0 + a = a ∀a ∈ G
4. ∀a ∈ G∃ − a ∈ G: a + (−a) = 0
(G, +) is abelian if we also have
5. ∀a, b ∈ G: a + b = b + a (Commutativity)
Examples:
• The moves of a Rubik’s cube also form a group (with the operation being concate-
nation), but not an abelian one: let L denote moving the left face clockwise, and
let U denote moving the upper face clockwise. Replacing “clockwise” by counter-
clockwise we get −L and −U, respectively. Now, if the group was abelian, then
L + U − L − U should give the same configuration again, but if you do these moves
on a Rubik’s cube, you will see that the configuration has changed.
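The same phenomenon can be observed in a much smaller group. The following Python sketch is our own illustration, using the symmetric group S3 (permutations of three elements under composition) rather than the Rubik's cube group: it exhibits two elements that do not commute.

```python
def compose(p, q):
    """Composition of permutations given as tuples: (p ∘ q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))

# two elements of the symmetric group S3, a group under composition
a = (1, 0, 2)  # swap positions 0 and 1
b = (0, 2, 1)  # swap positions 1 and 2

assert compose(a, b) != compose(b, a)  # the group is not abelian

identity = (0, 1, 2)
assert compose(a, a) == identity  # a transposition is its own inverse
```

The Rubik's cube group exhibits exactly this behavior, just on a much larger set of configurations.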
As groups can be very large, even infinite, it can be useful to have a concise way of
writing them:
Examples:
• The six standard moves of the Rubik’s cube (rotating the top, bottom, front, back,
left, or right layer clockwise by 90◦ ) are a generator for the Rubik’s cube moves.
Exercise 1.27. A cyclic group is a group G that contains an element g ∈ G such that
{g} is a generator of G. Show that every cyclic group is abelian (commutative).
Exercise 1.28. Consider a Rubik’s cube. Prove that no move (sequence of elementary
moves) X exists such that every Rubik’s cube can be solved by repeatedly applying
X.
Definition 1.29. For some group (G, +), H ⊆ G is a subgroup, if (H, +) is also a group.
For example, the even integers (including 0) are a subgroup of (Z, +). Subgroups are
important in group theory, as they can be used to partition a group into several parts:
Examples:
• R/Z is the circle group (the multiplicative group of all complex numbers of absolute value 1). (You should try to convince yourself why.)
In order to compare groups with each other, we again want a notion of maps between
groups, that behave well with the group structures:
cokernel coker h := H/ im h
Note that we are assuming something in our definition of the cokernel: for the defi-
nition of a quotient group to apply, we need the divisor group to be a subgroup of the
dividend group. Luckily, the following lemma says that im h is always a subgroup of H.
Lemma 1.32. ker h and im h are subgroups of (G, +) and (H, ⋆), respectively.
3. ∀a ∈ G : h(0) ⋆ h(a) = h(0 + a) = h(a), and thus h(0) = 0, from which 0 ∈ ker h
follows.
Exercise 1.34. For two Abelian groups (G, ⋆) and (H, +), let the set of all homomor-
phisms f : G → H be denoted by Hom(G, H).
(a) Show that for any groups G, H, (Hom(G, H), ⊕), where the operation ⊕ is defined as
(f ⊕ g)(x) = f(x) + g(x), ∀x ∈ G,
is also a group.
(b) Show Hom(Z2^2 , Z2 ) ≅ Z2^2 , i.e., the groups are isomorphic.
As the example of the integers shows, a big motivation for the study of groups comes
from number theory. However, in number theory we do not only have addition but also
multiplication. This motivates the following definition:
2. ∀a, b, c ∈ R:
(a · b) · c = a · (b · c), (Associativity of ·)
a · (b + c) = a · b + a · c and (b + c) · a = b · a + c · a. (Distributivity)
Definition 1.36. A commutative ring in which every non-zero element has a multi-
plicative inverse (∀a ∈ R \ {0}, ∃b ∈ R : a · b = 1) is called a field.
Another important area of algebra, which you already know, is linear algebra. Here, vectors can be added and subtracted. Further, the elements of the field of real numbers are called scalars, and they can be multiplied with vectors. So, we have very similar operations at hand. This motivates the following generalization of the concept of vector spaces.
1. r ⊗ (x + y) = (r ⊗ x) ⊕ (r ⊗ y)
2. (r + r ′ ) ⊗ x = (r ⊗ x) ⊕ (r ′ ⊗ x)
3. 1 ⊗ x = x
4. (r · r ′ ) ⊗ x = r ⊗ (r ′ ⊗ x)
In the literature, often the same symbol (·) is used for both operations · and ⊗, and
+ for both + in R and ⊕ in M. For a vector space, this should feel quite normal, since
for the vector space Rn (which is an R-module), we also write · for multiplying scalars
to both scalars and vectors, and + for addition of both scalars and vectors.
Modules appear all over the place in homology theory. In some cases, in particu-
lar in all the ones we discuss in these lecture notes, the modules happen to be vector
spaces. Thus, most of what we discuss in the following chapters could be phrased using
only language from linear algebra. However, to be consistent with most of the existing literature, we will phrase most results in slightly more generality.
Questions
1. What is a topological space? Give the formal definition and some examples.
2. What is a continuous map between topological spaces? What is a homeomor-
phism? State the definitions and give examples.
3. What is a homotopy? What is a homotopy equivalence? Give the formal
definitions. Further, define deformation retracts and use them to give an alternative
definition of homotopy equivalence.
4. What are groups and the maps between them? State the definitions and prove
that the image and kernel are subgroups.
References
[1] J.R. Munkres, Topology, Prentice Hall, Incorporated, 2000.
[2] A. Hatcher, Algebraic Topology, Cambridge University Press, 2002.
Chapter 2
Homology
A face of a simplex is the convex hull of a subset of its vertices. In particular, every
face of a simplex is also a simplex. The empty set ∅ is also a face. The (k − 1)-faces are
called facets.
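As a small illustration of our own (not part of the notes), the faces of a simplex given by its vertex set are exactly the subsets of that set, and the facets are the subsets with one vertex removed:

```python
from itertools import combinations

def faces(simplex):
    """All faces of a simplex: the subsets of its vertex set
    (including the empty face and the simplex itself)."""
    vs = sorted(simplex)
    return [frozenset(c) for k in range(len(vs) + 1)
            for c in combinations(vs, k)]

triangle = {0, 1, 2}                       # a 2-simplex
fs = faces(triangle)
assert len(fs) == 8                        # 2^3 subsets
facets = [f for f in fs if len(f) == 2]    # the (k-1)-faces
assert len(facets) == 3
```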
Figure 2.2: The left is a simplicial complex. The right is not, as the intersection of
the two triangles is not a face of both of them.
in topological data analysis, we may assume that all simplicial complexes are finite, that
is, consisting of finitely many simplices.
The way we defined them, simplicial complexes are geometric objects. However, we
can also study them in a purely combinatorial setting.
Does every abstract simplicial complex have a geometric realization? For 1-dimensional
complexes (graphs), we know that not all graphs admit a straight-line embedding in the
plane, as only planar graphs admit any embedding, i.e., crossing-free drawing, in the
plane. However, by placing the vertices in R3 in such a way that no four vertices lie on
a common plane, we see that we can always find a geometric realization of a graph in
R3. This generalizes to the following realization theorem:
Theorem 2.5. Every k-dimensional simplicial complex has a geometric realization in
R2k+1.
Proof. Place the vertices as distinct points on the moment curve in R2k+1 , which is
the curve given by f(t) = (t, t2 , . . . , t2k+1 ). This way, any 2k + 2 of the placed points are
affinely independent. Thus, any two faces with disjoint vertex sets will not intersect in
the realization, showing that the realization is indeed an embedding.
Since we now know that abstract and geometric simplicial complexes can be translated
into one another, we will not make the distinction between them again and just use the
word simplicial complex for both objects in the following. As a subset of Euclidean
space, a simplicial complex thus also inherits the subspace topology from Rd , which
allows us to view simplicial complexes as topological spaces.
On the other hand, most topological spaces are not simplicial complexes by definition.
For example, the 2-sphere S2 is not a simplicial complex, as it is not defined by a vertex
set and faces. However, the boundary of a tetrahedron is a simplicial complex, and it is
homeomorphic to S2 , so if we want to work with S2 , from a topologist’s point of view,
we might as well work with the boundary of a tetrahedron instead. This motivates the
following definition.
• For a poset (P, ⩽), the set of all chains of P forms a simplicial complex, called the order complex of P.
Another example of high relevance for topological data analysis is the nerve:
Definition 2.7. For a finite collection U of sets, its nerve N(U) is the simplicial complex on the vertex set U that contains {U0 , . . . , Uk } as a k-simplex if and only if U0 ∩ · · · ∩ Uk ̸= ∅.
In many applications, the considered sets are subsets of some topological space. In
this case, we often want the intersections to be “well-behaved”.
Definition 2.8. Let X be a metric space, and U a finite family of closed subsets of X.
We call U a good cover, if every non-empty intersection of sets in U is contractible
(i.e., homotopy equivalent to a point).
Under these conditions on the sets, we get the following, very powerful theorem, which
allows us to relate complicated spaces (unions of sets) with a much simpler simplicial
complex, namely the nerve. For a proof of this we refer to any textbook on algebraic
topology, for example the one by Hatcher [2].
Theorem 2.9 (Nerve theorem). If U is a good cover, then |N(U)| is homotopy equivalent to ⋃ U.
The nerve theorem also holds if all the sets in U are open with contractible intersec-
tions, but it may fail if some sets in U are closed, and some open: We can have an open
and a closed set which do not intersect, but whose union is connected.
Now that we have defined simplicial complexes, once again we want to study maps
between them. The study of simplicial complexes and the maps between them, as we
will define them, is called combinatorial topology.
Recall that simplicial complexes are topological spaces, so there is also the notion of
continuous maps between them. It can be shown that every simplicial map is continuous.
On the other hand, continuous maps are in general not even vertex maps and thus not
simplicial. Thus, simplicial maps are more restrictive than continuous maps. However,
the difference of the two concepts is smaller than one might think at first glance.
Fact 2.12. Every continuous map f : |K1 | → |K2 | can be approximated arbitrarily
closely by simplicial maps on appropriate subdivisions of K1 and K2 .
This shows that we can consider simplicial maps to be the analogue of continu-
ous maps in the world of simplicial complexes. This begs the question whether other
definitions from topology, such as homotopies or deformation retracts, have simplicial
analogues. As we will see in the next few definitions, they do.
Note that every face that is a superset of a free face is either a maximal face or also
free.
Definition 2.15. A collapse is the operation of removing all faces γ that contain some
fixed free face τ. A simplicial complex is collapsible if there is a sequence of collapses
leading to a point.
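Free faces can be detected mechanically. The following Python sketch is our own illustration, assuming the standard definition that a free face is a face with exactly one proper coface (the notes' Definition 2.14 is not reproduced in this excerpt):

```python
def free_faces(K):
    """Faces of K with exactly one proper coface (the standard
    definition of a free face, assumed here)."""
    K = [frozenset(s) for s in K]
    out = []
    for tau in K:
        cofaces = [s for s in K if tau < s]  # proper supersets in K
        if len(cofaces) == 1:
            out.append(tau)
    return out

# a full triangle together with all of its faces
K = [{0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
ff = free_faces(K)
assert frozenset({0, 1}) in ff     # each edge lies only in the triangle
assert frozenset({0}) not in ff    # each vertex has three proper cofaces
```

Collapsing along any of the three free edges removes that edge together with the triangle, which begins a collapse of the full triangle down to a point.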
Figure 2.3: Bing’s house with two rooms. Image taken from [2].
from Rd . On the other hand, some topological spaces (the triangulable ones) can be ex-
pressed by simplicial complexes. As for maps, every simplicial map is continuous. On
the other hand, continuous maps between simplicial complexes can be approximated by
simplicial maps between subdivisions of the simplicial complexes. A similar property
holds between homotopic maps and contiguous maps, as well as between deformation
retracts and collapses. In general, we can say that the terms in combinatorial topology are special cases of their “continuous” counterparts, and if we consider triangulable spaces, the continuous terms can be approximated in some way by their combinatorial counterparts.
2.2 Homology
In the following we will make this intuition precise by formally defining the types of subcomplexes we consider, as well as the notions of boundaries and cycles, and how we can mathematically describe the cycles that are not boundaries.
2.2.2 Chains
Let K be a simplicial complex with mp p-simplices.
Definition 2.16. A p-chain c (in K) is a formal sum1 of p-simplices with coefficients from some ring R:

c = ∑_{i=1}^{mp} αi σi .

Two p-chains are added coefficient-wise:

c + c ′ := ∑_{i=1}^{mp} (αi + αi′ )σi .
We write Cp (K) for the set of all p-chains in K, called the p-th chain group. The
following observation shows that this name makes sense:
Observation 2.17. (Cp (K), +) is an abelian group, it is free, and the p-simplices form
a basis.
2. ∀c1 , c2 , c3 ∈ Cp (K): (c1 + c2 ) + c3 = ∑(αi(1) + αi(2) )σi + ∑ αi(3) σi = ∑(αi(1) + αi(2) + αi(3) )σi = ∑ αi(1) σi + ∑(αi(2) + αi(3) )σi = c1 + (c2 + c3 ).
3. 0 = ∑ 0σi ∈ Cp (K).
4. ∀c ∈ Cp (K) we have −c = ∑(−αi )σi ∈ Cp (K) and c + (−c) = ∑(αi − αi )σi = 0.
Commutativity follows from + being commutative, thus the group is abelian. The p-
simplices clearly form a basis, since the set of chains is defined as the set of formal sums
of these p-simplices.
1 A formal sum just means that we formally write a sum, but that there is no meaning behind the operation of adding the simplices.
Observation 2.18. Equipped with the appropriate function · : R×Cp (K) → Cp (K), Cp (K)
is an R-module.
The proof is similar and left as an exercise, but the statement should feel natural
since every chain is simply described by a vector of mp elements of R, with addition
being element-wise addition in R.
From now on we will always work with the ring R = Z2 , so in particular we have
that c + c = 0. With this, we will define homology over Z2 . Using some slightly more
abstract definitions, all of the following can be extended to define homology over any
ring R. For more on this, we refer to any textbook on algebraic topology, e.g. the one by
Hatcher [2].
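Concretely, a p-chain over Z2 is just a vector of mp bits, and chain addition is componentwise addition mod 2, i.e., XOR. A minimal Python sketch of our own:

```python
# chains over Z2 as vectors of 0/1 coefficients; addition is componentwise XOR
def add_chains(c1, c2):
    return [a ^ b for a, b in zip(c1, c2)]

# a complex with m_p = 4 p-simplices; a chain selects a subset of them
c = [1, 0, 1, 1]
zero = [0, 0, 0, 0]
assert add_chains(c, c) == zero    # c + c = 0 over Z2
assert add_chains(c, zero) == c    # 0 is the neutral element
```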
For a p-simplex σ = {v0 , . . . , vp }, its boundary is

δp (σ) = {v1 , . . . , vp } + {v0 , v2 , . . . , vp } + . . . + {v0 , . . . , vp−1 } = ∑_{i=0}^{p} {v0 , . . . , v̂i , . . . , vp }.

In the above notation, v̂i denotes that the element vi is omitted from the set. Note that δp (σ) is a (p − 1)-chain. For some examples, see Figure 2.5.
Figure 2.5: The boundary chains of two different simplices.
Let us apply this definition to the following example. In a slight abuse of notation,
we denote a face {a, b, c} by abc.
(figure: a simplicial complex on the vertices a, b, c, d, e)
Proof. It is enough to show this for simplices, as δp−1 ◦ δp (c) = δp−1 (∑ αi δp (σi )) = ∑ αi (δp−1 ◦ δp (σi )).
For a p-simplex σ, every (p − 2)-face is contained in exactly two (p − 1)-faces, and thus does not appear in δp−1 ◦ δp (σ).
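Over Z2, both the boundary map and the fact that δ ◦ δ = 0 are easy to check by computer. The following Python sketch is our own illustration: boundaries are sets of facets, and chain addition mod 2 is realized as symmetric difference.

```python
from itertools import combinations

def boundary(simplex):
    """δ_p over Z2: the formal sum (here: set) of the facets of a p-simplex."""
    vs = sorted(simplex)
    if len(vs) == 1:
        return set()  # δ_0 = 0: vertices have empty boundary
    return {frozenset(f) for f in combinations(vs, len(vs) - 1)}

def boundary_chain(chain):
    """Extend δ to Z2-chains; addition mod 2 is symmetric difference."""
    out = set()
    for s in chain:
        out ^= boundary(s)
    return out

tri = frozenset({0, 1, 2})
assert boundary(tri) == {frozenset({0, 1}), frozenset({0, 2}), frozenset({1, 2})}

# δ ∘ δ = 0: every vertex lies in exactly two of the three boundary edges,
# so all vertices cancel mod 2
assert boundary_chain(boundary(tri)) == set()
```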
0 = Ck+1 (K) −δk+1→ Ck (K) −δk→ Ck−1 (K) → · · · → C2 (K) −δ2→ C1 (K) −δ1→ C0 (K) −δ0→ C−1 = 0
Recall that Bp = im δp+1 . We will not prove this statement here, but to see that Bp ⊆ Zp , recall that by Lemma 2.20 the boundary of a boundary is empty.
Definition 2.26. The p-th homology group Hp (K; Z2 ) is the quotient group Zp (K)/Bp (K).
Exercise 2.27. Visualize the following simplicial complex K: 0-faces {a, b, c, d, e}, 1-
faces {ab, ac, ad, bc, bd, cd, ce, de} and 2-faces {abc, abd, acd, bcd}. For the dimen-
sions 1 & 2, what are the cycle, boundary, and homology groups of K? Note: You
can express the groups by their generators. You do not need to write out all the
elements.
Exercise 2.28. Give an informal derivation for the homology groups of a torus (see
Figure 2.8). Can you find a space with isomorphic homology that is not homeo-
morphic to the torus?
Exercise 2.29. For a simplicial complex K, its cone CK is the complex with the same
set of vertices plus one additional vertex z, and such that for all simplices in K we
have
{a, b, c, . . .} ∈ K =⇒ {a, b, c, . . . , z} ∈ CK
(b) Show that the homology of the cone CK is 0 in all dimensions d > 0, for any
K.
(c) Bonus: What would happen (intuitively and to the homology) if we extended K
in the same way as before, but with two points? (this is called the suspension
of K)
Here are some nice properties of homology groups, that will be beneficial for us, but
that we will not prove here.
Fact 2.30.
• Hp is a Z2 -vector space.
Remark 2.31. If we consider homology defined over other rings, e.g. over Z instead
of Z2 , the homology groups might not be free.
(figure: a simplicial complex K on the vertices 1, 2, 3, 4 with H1 (K) = {0, 123, 234, 1234} ≅ Z2^2 )
Recall that our original motivation was to count the number of holes. With homology
as we defined it, we have the algebraic structure of a vector space where we can add holes.
The number of distinct holes is now just the dimension of this vector space.
In the definition above, dim denotes the dimension of a vector space as you know it
from Linear Algebra, i.e., dim Hp is the number of elements in a basis of Hp .
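Over Z2, these dimensions can be computed by Gaussian elimination on the boundary matrices, using βp = dim Cp − rank δp − rank δp+1. The following Python sketch is our own illustration (not an algorithm from the notes), encoding each row of a boundary matrix as an integer bitmask.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over Z2 via Gaussian elimination; rows are int bitmasks."""
    pivots = {}  # pivot bit position -> reduced row
    for row in rows:
        while row:
            p = row.bit_length() - 1
            if p in pivots:
                row ^= pivots[p]
            else:
                pivots[p] = row
                break
    return len(pivots)

def betti_numbers(maximal):
    """Betti numbers over Z2 of the complex generated by `maximal`:
    beta_p = dim C_p - rank(delta_p) - rank(delta_{p+1})."""
    faces = set()
    for m in maximal:  # generate all faces of the maximal simplices
        vs = tuple(sorted(m))
        for k in range(1, len(vs) + 1):
            faces.update(frozenset(f) for f in combinations(vs, k))
    top = max(len(f) for f in faces) - 1
    index = {p: {s: i for i, s in enumerate(sorted(
                 (f for f in faces if len(f) == p + 1),
                 key=lambda s: tuple(sorted(s))))}
             for p in range(top + 1)}
    ranks = {}
    for p in range(1, top + 1):
        rows = []
        for s in index[p]:  # one row of delta_p per p-simplex
            mask = 0
            for facet in combinations(tuple(sorted(s)), p):
                mask |= 1 << index[p - 1][frozenset(facet)]
            rows.append(mask)
        ranks[p] = rank_gf2(rows)
    return [len(index[p]) - ranks.get(p, 0) - ranks.get(p + 1, 0)
            for p in range(top + 1)]

# the boundary of a tetrahedron triangulates the sphere S^2
sphere = [{0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}]
assert betti_numbers(sphere) == [1, 0, 1]
```

The output [1, 0, 1] matches the homology of S^2 computed later in this chapter.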
χ = k0 − k1 + k2 − . . .
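The alternating sum can be checked directly on a small example. A Python sketch of our own, computing χ by counting the faces of a complex:

```python
from itertools import combinations

def euler_characteristic(maximal):
    """chi = k_0 - k_1 + k_2 - ..., counting all faces of the complex
    generated by the given maximal simplices."""
    faces = set()
    for m in maximal:
        vs = tuple(sorted(m))
        for k in range(1, len(vs) + 1):
            faces.update(frozenset(f) for f in combinations(vs, k))
    counts = {}
    for f in faces:
        counts[len(f) - 1] = counts.get(len(f) - 1, 0) + 1
    return sum((-1) ** p * kp for p, kp in counts.items())

# boundary of a tetrahedron: k0 = 4, k1 = 6, k2 = 4, so chi = 4 - 6 + 4 = 2,
# which equals the alternating sum of Betti numbers 1 - 0 + 1 of S^2
sphere = [{0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}]
assert euler_characteristic(sphere) == 2
```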
Exercise 2.34. Take any vector v = (a0 , . . . , ad ) ∈ Nd+1 with a0 > 0. Show that there
exists a simplicial complex Kv with that vector as its Betti numbers.
Note that in this definition we do not require σ to be injective, thus it would even
be possible to map the simplex to a single point.
We now define Cp the same way as before, but now on the family of all singular
p-simplices, which in general makes the group uncountably infinite. We also define δp
as before, leading to Zp and Bp now also being uncountably infinite. Similarly, Hp (X) =
Zp (X)/Bp (X). The following relates singular homology and simplicial homology.
As isomorphisms for vector spaces are an equivalence relation, we also get the desired
independence of the triangulation.
Corollary 2.37. Let K1 , K2 be two distinct triangulations of X. Then, Hp (K1 ) ≅ Hp (K2 ) for all p ⩾ 0, that is, homology is independent of the chosen triangulation.
For the remainder of these notes, we will only work with simplicial homology, but we
often talk about the homology of a triangulable space without specifying a triangulation.
The above corollary gives us the right to do this.
Further, the 0-homology classes are the formal sums of connected components.
H0 (Sd ): Let us first investigate H0 (Sd ). Since all vertices are connected, all vertices are homologous, and H0 (Sd ) = ⟨[v]⟩ ≅ Z2 .
Hd (Sd ): Now, let us check Hd (Sd ). We first compute Zd : Obviously, the zero element is
part of Zd . Furthermore, the d-simplices are exactly the sets σi = {v0 , . . . , v̂i , . . . , vd+1 }. The sum c of all these d-simplices must be a cycle, since every (d − 1)-simplex occurs
in exactly two d-simplices, thus the boundary of c must be empty. Thus, c ∈ Zd . We
cannot have any other cycle, since for any other chain there must be some d-simplex for
which we include one neighbor but not the other, thus this d-simplex would be part of
the boundary. We conclude that Zd (Sd ) = ⟨c⟩.
Since δ(∆d+1 ) is a d-dimensional simplicial complex, and thus does not contain any
(d +1)-simplices, c cannot be a boundary. Since Bd is a subgroup of Zd , we thus get that
Bd (Sd ) is the group containing only 0. Alternatively, we can also get this by noticing
that Cd+1 = 0, and Bd = im δd+1 = 0.
We finally get Hd (Sd ) = Zd /Bd = Zd ≅ Z2 .
Hp (Sd ): Finally, let us go to Hp (Sd ), for 0 < p < d: Let c = ∑ αi σi be a p-cycle. We
aim to show that c is homologous to the 0-chain, i.e., that [c] = 0. Equivalently, we show
that c must be a boundary.
Let σ = (vm0 , . . . , vmp ) be any p-simplex in c which does not include v0 . We will
keep replacing such simplices by simplices which do contain v0 , until we have no more
simplices not containing v0 .
Let b be the (p + 1)-simplex (v0 , vm0 , . . . , vmp ). Note that b ∈ δ(∆d+1 ) and thus δ(b)
is a p-boundary. Also note that σ is in δ(b). Furthermore, σ is the only p-simplex in
δ(b) which does not contain v0 . We now add δ(b) to c, to get c ′ := c + δ(b). Since we
added a boundary, [c] = [c ′ ] (i.e., c and c ′ are homologous). Furthermore, c ′ contains
one fewer p-simplex not containing v0 , when compared to c.
We repeat this process until we reach a cycle c∗ in which every p-simplex contains
v0 . We now claim that c∗ must be the trivial cycle: Assume c∗ contains some p-simplex
a = (v0 , va1 , . . . , vap ). Then, the (p − 1)-simplex a ′ = (va1 , . . . , vap ) is part of δ(a). But,
a ′ cannot be part of the boundary of any other p-simplex in c∗ , since the only p-simplex
containing a ′ as a face while also containing v0 is a. Thus, to have an empty boundary,
c∗ must be 0. We thus have [c∗ ] = 0, and by construction, [c] = [c∗ ], therefore [c] = 0 as
we aimed to prove.
We have proven that every cycle is homologous to 0, and we can conclude that for
all 0 < p < d, Hp (Sd ) = 0.
By these arguments we conclude the following theorem:
Theorem 2.39. For any d > 0, we have

Hp(Sd) = Z2 for p ∈ {0, d}, and Hp(Sd) = 0 else. Consequently,

βp(Sd) = 1 for p ∈ {0, d}, and βp(Sd) = 0 else.
f# : Cp(K1) → Cp(K2)

c = Σ αiσi ↦ f#(c) := Σ αiτi, where τi = f(σi) if f(σi) is a p-simplex in K2, and τi = 0 otherwise.

Note that f(σi) is always a simplex in K2 since f is a simplicial map, but it could be a
simplex of smaller dimension. This is why we have the condition in the above definition
of τi.
The following can be shown with a bit of work:
• f# ◦ δ = δ ◦ f#
f∗ : Hp(K1) → Hp(K2)

[c] = c + Bp(K1) ↦ f#(c) + Bp(K2) = [f#(c)]
Fact 2.40. If Hp (K1 ) and Hp (K2 ) are vector spaces (as they are in e.g. Z2 -homology,
which is what we are using), then f∗ is a linear map.
We also get the following functorial property, which we will not prove: if f : X → Y,
g : Y → Z, then (g ◦ f)∗ = g∗ ◦ f∗ .
Let us look at a small example:
[Figure: the simplicial complexes K1 and K2 on vertices a, b, c, d.]

We consider f : K1 ↪ K2 the inclusion map.

H1(K1) = {0, [abc], [bcd], [abdc]} ≅ Z2^2
and
a 7→ y, b 7→ x, c 7→ y, d 7→ z, e 7→ z.
You can verify easily that f is simplicial. Compute f∗ : Hp (K1 ) → Hp (K2 ) for
0 ⩽ p ⩽ 2.
Exercise 2.42. Which of the following four statements is true for every simplicial map
f?
“If f is {injective, surjective}, then f∗ is {injective, surjective}.”
The following fact has some very powerful consequences, as we will see.
Fact 2.43. If f, g : K1 → K2 are contiguous, f∗ = g∗ .
Note that the definition of induced homology extends from simplicial maps to maps
between any topological spaces. We will not state the exact definitions, but the following
fact is the continuous analogue (remember that two simplicial maps being contiguous is
analogous to two maps being homotopic) of the previous fact.
Fact 2.44. If f, g : X → Y are homotopic, f∗ = g∗ .
The following corollary is very useful to compute the homology of a space, as it gives
us the option to relate it to the homology of a potentially simpler space.
Corollary 2.45. If f : X → Y is a homotopy equivalence (i.e., there exists g : Y → X
such that f ◦ g is homotopic to idY and g ◦ f is homotopic to idX ), then f∗ is an
isomorphism.
In particular, if Y is a deformation retract of X, then Hp (Y) and Hp (X) are isomorphic.
As a special case of the above, we have that a contractible space has the same homology
groups as a point.
Corollary 2.46. If X is contractible, Hp(X) = Z2 for p = 0, and Hp(X) = 0 otherwise.
Exercise 2.47.
Consider the space you get when you glue together two points of a torus. What is
the homology of this space?
Consider the space you get when you simultaneously pierce a balloon at n distinct
locations. What is the homology of this space?
Exercise 2.48. Let f, g : S1 → S1 be continuous maps such that f(−x) = f(x) and
g(−x) = −g(x) for all x ∈ S1 .
a) Convince yourself that f∗ : H1 (S1 ) → H1 (S1 ) is trivial (maps everything to 0)
and that g∗ is an isomorphism.
[Figure: a point x, its image f(x), and the retraction r(x) obtained by following the ray from f(x) through x to the boundary sphere.]
Questions
5. What is a simplicial complex? Define geometric and abstract simplicial com-
plexes and state and prove the realization theorem (Theorem 2.5).
6. What are simplicial and contiguous maps? State the definitions and discuss the
connection to their counterparts in continuous topology.
7. Is every contractible simplicial complex collapsible? Define the notion of col-
lapsibility and describe Bing’s house with two rooms.
8. What is simplicial homology? Explain the intuition and give the formal defini-
tions of chains, boundaries and cycles.
9. Why is the homology of a triangulable space independent of the chosen trian-
gulation? Explain the idea of singular homology.
10. What are the homology groups of a sphere? State and prove the corresponding
theorem (Theorem 2.39).
11. How does a simplicial map between two simplicial complexes induce maps
between their homology groups? Define induced homomorphisms.
12. What is the Brouwer fixed point theorem? State, illustrate and prove the
Brouwer fixed point theorem (Theorem 2.49).
References
[1] Sketches of topology - Bing's house. https://sketchesoftopology.wordpress.com/2010/03/25/bings-house/, accessed: 2023-04-27.
[2] Allen Hatcher, Algebraic topology, Cambridge Univ. Press, Cambridge, 2000.
Chapter 3
Persistence
In the previous chapter, we have studied the homology of fixed simplicial complexes.
In this chapter, we will look at simplicial complexes that vary over time. Let us start
with a small example. Consider the following process of building up a triangle abc. At
time t1 , we add the vertices a and b together with the edge ab. This gives birth to a
single connected component. At time t2 we add the vertex c, giving birth to a second
connected component. At time t3 we add the edge ac, connecting the two components.
We can interpret this as the younger of the components dying. At time t4 we add the
final edge bc, which gives birth to a hole, that is, an element of the homology group H1 .
Finally, at time t5 we add the interior of the triangle, killing the hole born at t4 . We
can summarize this process as follows: we have a connected component that was born
at t1 and survived the entire process, and a connected component that was born at t2
that died again at t3 . Finally, we have a hole born at t4 dying at t5 . Capturing this
information of holes with their birth and death is the motivation of persistent homology.
Persistent homology can be applied to data analysis by defining (in a way that we will
see soon) a process to build up a simplicial complex from point cloud data and computing
the birth and death times of holes. Subtracting the birth time from the death time, we
get the lifespan of a hole, and the underlying idea is that holes with a short lifetime are
a byproduct of the process, whereas holes with a long lifespan convey information about
the shape of the underlying data.
3.1 Filtrations
We start with a mathematical formulation of the process of building up a complex or, more
generally, a topological space. A filtration is a nested sequence of subspaces
F : X0 ⊆ X1 ⊆ X2 ⊆ . . . ⊆ Xn = X.
For each i ⩽ j, we have the inclusion map ιi,j : Xi ↪ Xj. Given these functions ι,
we get induced maps in homology: hp^{i,j} = ι∗ : Hp(Xi) → Hp(Xj). Filtrations are a very
general object that appear naturally in many settings. Let us look at some important
examples of filtrations.
F : K0 ⊆ K1 ⊆ . . . ⊆ Kn = K.
The following simplicial filtration captures the process that is relevant for analyzing
point cloud data.
Definition 3.1. Let (M, d) be a metric space. Let P be a finite subset of M, and
r > 0 a real number. The Čech complex Cr (P) is the nerve of the family of balls
B(p, r) = {x ∈ M|d(p, x) ⩽ r} for all p ∈ P.
Since the balls B(p, r) form a good cover, the nerve theorem tells us that the Čech
complex is homotopy equivalent to the union of the balls.
By looking at the sequence of Čech complexes for increasing r, we get a simplicial
filtration.
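To make Definition 3.1 concrete, here is a small sketch in Python (all helper names are ours, not from the notes) that builds the Čech complex of a planar point set up to dimension 2. It uses the fact that balls of radius r around points in the plane have a common point if and only if the minimum enclosing ball of those points has radius at most r:

```python
from itertools import combinations

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def meb_radius(pts):
    """Radius of the minimum enclosing ball of 1-3 points in the plane."""
    if len(pts) == 1:
        return 0.0
    if len(pts) == 2:
        return dist(pts[0], pts[1]) / 2
    # Three points: if the circle with some side as diameter covers the third
    # point, it is the minimum enclosing ball; otherwise the circumcircle is.
    for i, j, k in [(0, 1, 2), (0, 2, 1), (1, 2, 0)]:
        center = ((pts[i][0] + pts[j][0]) / 2, (pts[i][1] + pts[j][1]) / 2)
        if dist(center, pts[k]) <= dist(pts[i], pts[j]) / 2 + 1e-12:
            return dist(pts[i], pts[j]) / 2
    (ax, ay), (bx, by), (cx, cy) = pts
    area2 = abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay))  # twice the area
    a, b, c = dist(pts[1], pts[2]), dist(pts[0], pts[2]), dist(pts[0], pts[1])
    return a * b * c / (2 * area2)            # circumradius abc / (4 * area)

def cech(points, r):
    """Čech complex C_r(P) up to dimension 2, as a set of index tuples."""
    simplices = {(i,) for i in range(len(points))}
    for k in (2, 3):
        for combo in combinations(range(len(points)), k):
            if meb_radius([points[i] for i in combo]) <= r:
                simplices.add(combo)
    return simplices

# Equilateral triangle with side 1: all pairs of balls of radius 0.55
# intersect, but the three balls have no common point, since the
# circumradius is 1/sqrt(3) ~ 0.577.
P = [(0.0, 0.0), (1.0, 0.0), (0.5, 3 ** 0.5 / 2)]
print((0, 1, 2) in cech(P, 0.55), (0, 1, 2) in cech(P, 0.6))  # False True
```

Increasing r only ever adds simplices, so evaluating cech(P, r) over a growing sequence of radii yields exactly the Čech filtration described above.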
This definition characterizes the cycles that are present already in Ki and that
are not boundaries even in Kj.

• Hp^{i,j} = 0 for all i < j.

We say that a p-homology class [c] (a p-hole) is born at Ki if [c] ∈ Hp(Ki) but
[c] ∉ Hp^{i−1,i}. Similarly, [c] dies entering Kj, if [c] ≠ 0 in Hp(Kj−1) but
hp^{j−1,j}([c]) = 0.
It is not always obvious which homology class dies. Consider the following filtration:
X1 consists of two points a and b, and in X2 the two points are connected by an edge.
Let us look at H0, that is, the connected components. We have that H0(X1) ≅ Z2^2, with
the natural basis {[a], [b]}. On the other hand, in X2 there is only a single connected
component, and [a] = [b]. So a homology class is dying, but both our basis elements [a]
and [b] survive. What is happening?
It turns out that we were not careful with our choice of basis: H0 (X1 ) can also be
viewed as being generated by [a] and [a + b], and the class [a + b] indeed dies going into
X2. In general, if two homology classes merge, neither of them dies, but their sum does.
There is a consistent choice of basis which allows us to only look at persistent homology
in terms of basis elements, but we do not go into this at this point.
If we have a simplex-wise filtration, we can circumvent the above issue by sorting
homology classes by the time where they were born, and when they merge, we just say
the “younger one” dies. This can be seen as adapting the considered basis along the way.
Persistence pairings are another way around this issue. We add some final complex
Kn+1 which has trivial homology (e.g., obtained by adding all simplices that are not yet present).
Then, we aim to figure out how many holes get born at Ki and die entering Kj . For this,
we define

µp^{i,j} = (βp^{i,j−1} − βp^{i,j}) − (βp^{i−1,j−1} − βp^{i−1,j}), for i < j ⩽ n + 1.
Here, the content of the left parenthesis denotes the number of holes born at or before
Ki which die entering Kj. Conversely, the right parenthesis denotes the number of holes
born strictly before Ki which die entering Kj. Thus, subtracting the two gives the number
of holes born exactly at Ki which die entering Kj. Note that this conveys the information
that we are interested in, but does not require choosing any basis.
The persistence diagram Dgmp(F) is a birth-death diagram which contains a point
for every pair i, j for which µp^{i,j} > 0. If we give each Ki a timestamp ai, the point is
drawn at the coordinates (ai, aj). We give each point multiplicity µp^{i,j}. On the diagram
we add points on the diagonal with infinite multiplicity, for some technical reasons that
will become apparent later. We can also represent the same information by barcodes:
For every i, j, we draw µp^{i,j} many intervals [ai, aj]. This is then called the p-th persistence
barcode.
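As a toy illustration of the formula above, the multiplicities can be computed directly from a table of persistent Betti numbers. The values below are hand-computed for H0 of the triangle filtration from the beginning of the chapter (variable names are ours):

```python
def multiplicity(beta, i, j):
    """mu_p^{i,j} = (beta^{i,j-1} - beta^{i,j}) - (beta^{i-1,j-1} - beta^{i-1,j})."""
    return ((beta[(i, j - 1)] - beta[(i, j)])
            - (beta[(i - 1, j - 1)] - beta[(i - 1, j)]))

# Persistent Betti numbers beta[(i, j)] = rank of h_0^{i,j} for the triangle
# filtration K_1 ⊆ ... ⊆ K_5 (hand-computed; beta[(0, j)] = 0 by convention):
# one component is born at t_1 and survives, a second is born at t_2 and
# dies entering K_3.
beta = {(i, j): 0 for i in range(6) for j in range(6)}
for i in range(1, 6):
    for j in range(i, 6):
        beta[(i, j)] = 1
beta[(2, 2)] = 2   # two connected components at K_2

mu = {(i, j): multiplicity(beta, i, j)
      for i in range(1, 6) for j in range(i + 1, 6)}
print({pair: m for pair, m in mu.items() if m > 0})  # {(2, 3): 1}
```

The single positive multiplicity recovers exactly the component that was born at t2 and died entering K3, without ever choosing a basis.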
induce a simplex-wise filtration? When it does, describe the relation between the
corresponding persistence diagrams.
[Figure content omitted: a filtration F over times t1–t5, the H0 and H1 barcodes, and the diagrams Dgm0(F) and Dgm1(F) with points at ∞.]
Figure 3.2: An example of a filtration with the corresponding barcodes and persistence
diagrams.
the boundary of σ. We try pairing σ to the youngest element ρ of its boundary. If this
element is already paired with some element τ, we replace it by the sum of ρ and the
boundary of τ. We now have a new set of candidate creators. We repeat this process
until we find an unpaired creator we can pair to, or until we cannot continue (there
are no more candidates). If we cannot pair σ to anything, it must be a new creator.
Whatever unpaired creators remain at the end of the algorithm are paired with ∞.
What is the runtime of this algorithm? Let N be the total number of simplices in the
final complex of our filtration. Whenever we add a simplex, each time we replace a
simplex by the boundary of its paired destructor we add at most O(N) simplices, and we
have to do this at most O(N) times. Since we do this for each simplex, we get a runtime of O(N^3).
Surprisingly, this runtime is tight.
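The pairing process above is closely related to the standard boundary-matrix reduction over Z2 (the matrix reduction algorithm of Section 3.3.2); here is a minimal sketch (function and variable names are ours), run on the triangle filtration a, b, ab, c, ac, bc, abc from the chapter introduction:

```python
def reduce_boundary_matrix(columns):
    """Standard persistence reduction over Z_2.

    columns[j] is the set of (filtration indices of) simplices in the
    boundary of simplex j.  Returns the persistence pairs (creator,
    destructor) and the unpaired (essential) creators."""
    reduced = []       # reduced columns, as sets of row indices
    low_inv = {}       # lowest row index -> index of the column having it
    pairs = []
    for j, col in enumerate(columns):
        col = set(col)
        while col and max(col) in low_inv:
            col ^= reduced[low_inv[max(col)]]   # add earlier column mod 2
        reduced.append(col)
        if col:
            low_inv[max(col)] = j
            pairs.append((max(col), j))         # max(col) created, j destroyed
    paired = {i for pair in pairs for i in pair}
    essential = sorted(set(range(len(columns))) - paired)
    return pairs, essential

# Triangle filtration: a, b, ab, c, ac, bc, abc (indices 0..6).
boundaries = [[], [], [0, 1], [], [0, 3], [1, 3], [2, 4, 5]]
print(reduce_boundary_matrix(boundaries))
# ([(1, 2), (3, 4), (5, 6)], [0])
```

The pairs say that b dies when ab is added, c dies when ac is added, and the hole created by bc dies when abc is added; the component created by a (index 0) never dies. The cubic worst case of this reduction matches the O(N^3) bound discussed above.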
Exercise 3.7. Let G be a weighted connected graph, where all edge weights are pairwise
distinct. Consider a filtration that first inserts all vertices (in some arbitrary order)
and then inserts the edges one by one, ordered by increasing weight. What is the
set of destructors?
Represent the results you obtained by a persistence diagram, and also by the
persistence barcodes.
[Figure content omitted: a simplicial complex on vertices a–e whose simplices are numbered 6–14 in insertion order.]
Figure 3.3: The filtration for Exercise 3.8.
Exercise 3.9. A Union-Find data structure is a data structure that maintains disjoint
sets dynamically. Given a ground set X, such a data structure maintains a family
S of disjoint subsets of X, where each subset is represented by the smallest element
contained in it. It supports three operations: MakeSet(x) creates a new set {x}.
FindSet(x) returns the representative (minimum) of the set in S which contains x
(or “no” if x is not contained in any set). Union(x, y) merges the sets containing x
and y into a single one. All of these operations can be implemented in amortized
Θ(α(n)) time, where α is the extremely slowly growing inverse Ackermann function
and can be considered a constant for any real-world application.
Consider a simplicial complex K with its vertices ordered v0 , . . . , vn , and consider
its lower star filtration. Find an algorithm to compute the 0-dimensional persistence
diagram (i.e., the persistence pairings) of K which makes use of a Union-Find data
structure. How many Union-Find operations do you need to perform?
The Čech complex has the nice property that by the nerve theorem, it is homotopy
equivalent to the union of the balls B(p, r). In particular, for nice radii, it will capture
the underlying shape. Sadly, checking whether a large number of balls have a common
intersection can be computationally expensive. Further, the definition requires that
the data points are embedded in a metric space. These two issues motivate the next
definition.
Definition 3.11. Given a finite metric space (P, d) and a real number radius r > 0,
the Vietoris-Rips complex VRr (P) is defined as the simplicial complex containing a
simplex σ if and only if d(p, q) ⩽ 2r for every pair p, q ∈ σ.
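Since VRr(P) only depends on pairwise distances, it can be built as the clique complex of the graph that connects points at distance at most 2r; a minimal sketch (helper names are ours):

```python
from itertools import combinations

def vietoris_rips(points, r, max_dim=2, dist=None):
    """Vietoris-Rips complex VR_r(P) up to dimension max_dim, as index tuples.

    A simplex is included iff all its vertices are pairwise within distance 2r."""
    if dist is None:   # default: Euclidean distance in the plane
        dist = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    n = len(points)
    close = {(i, j) for i in range(n) for j in range(i + 1, n)
             if dist(points[i], points[j]) <= 2 * r}
    simplices = {(i,) for i in range(n)}
    for k in range(2, max_dim + 2):   # simplices on k vertices
        for combo in combinations(range(n), k):
            if all(pair in close for pair in combinations(combo, 2)):
                simplices.add(combo)
    return simplices

# Equilateral triangle with side 1: for r = 0.55 all pairwise distances are at
# most 2r, so the full triangle is present -- even though the three balls of
# radius 0.55 have no common point, so the Čech complex would omit it.
P = [(0.0, 0.0), (1.0, 0.0), (0.5, 3 ** 0.5 / 2)]
print((0, 1, 2) in vietoris_rips(P, 0.55))  # True
```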
By definition, the Čech complex and the Vietoris-Rips complex for the same radius
and the same point set have the same set of 1-simplices (the same 1-skeleton).
While the Čech complex then contains additional
information about the common intersections of balls, the Vietoris-Rips complex is simply
the clique complex of this 1-skeleton. This makes the Vietoris-Rips complex easier to
compute. Furthermore, we make the following simple observation, showing that the
Vietoris-Rips complex still captures shapes in the data:
Exercise 3.14. Find a point set P ⊂ R2 and a radius r such that its Vietoris-Rips
complex has non-trivial 2-homology, i.e., such that H2(VRr(P)) ≠ 0.
Furthermore, is there a dimension k such that Hk′(VRr(Q)) = 0 for all k′ ⩾ k, all
It is a well-known fact that for a point set in general position (no d + 2 points lie on
a common sphere), there is a unique Delaunay triangulation. Furthermore, in this case
the extended Delaunay complex and the unique Delaunay triangulation coincide.
Definition 3.16. Given a finite point set P ⊂ Rd, the Voronoi diagram is the tessellation
of Rd into the Voronoi cells

Vp = {x ∈ Rd | d(x, p) ⩽ d(x, q) for all q ∈ P}, for all p ∈ P.
Fact 3.17. The nerve of the Voronoi cells of P is the extended Delaunay complex of
P.
Exercise 3.18. Convince yourself that for a point set in R2 , the nerve of the Voronoi
diagram is the extended Delaunay complex. Furthermore, convince yourself that if
the points are in general position (there are no three points that are collinear, and
no four points that are cocircular), then there is a unique Delaunay triangulation.
Based on the Delaunay triangulation, we define the Alpha complex by parameterizing
using a radius as follows:
Definition 3.19. Given a finite point set P ⊂ Rd in general position as well as a real
number radius r > 0, the Alpha complex Delr (P) consists of all simplices σ ∈ Del(P)
for which the circumscribing ball of σ has radius at most r.
The following fact provides us with an alternative definition of the Alpha complex:
Fact 3.20. The Alpha complex Delr (P) is the nerve of the sets B(p, r) ∩ Vp for all
p ∈ P.
Since the Alpha complex is a subcomplex of the Delaunay triangulation (and for large
enough radius is equal to the Delaunay triangulation), it also has complexity O(n⌈d/2⌉).
Further, the above fact together with the Nerve theorem implies that the Alpha complex
Delr (P) is homotopy equivalent to the Čech complex Cr (P).
Exercise 3.21. Is the following true or false? Consider a point set P ⊂ R2 in gen-
eral position and a radius r > 0. Then the Alpha complex (with radius r) is the
intersection of the Čech complex (with radius r) with the Delaunay triangulation.
Definition 3.22. Given a finite point set Q and a point set P ⊃ Q in some metric space,
we say that a simplex σ ⊆ Q is weakly witnessed by x ∈ P \ Q, if d(q, x) ⩽ d(p, x)
for every q ∈ σ and p ∈ Q \ σ.
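The condition of Definition 3.22 translates directly into code; a small sketch for finite sets (names are ours; `dist` is an arbitrary metric):

```python
def weakly_witnessed(sigma, Q, witnesses, dist):
    """Check whether the simplex sigma (a subset of Q) is weakly witnessed
    by some x among the candidate witnesses (points of P \\ Q):
    d(q, x) <= d(p, x) must hold for all q in sigma and all p in Q \\ sigma."""
    outside = [p for p in Q if p not in sigma]
    for x in witnesses:
        if all(dist(q, x) <= dist(p, x) for q in sigma for p in outside):
            return True
    return False

# Points on the real line: Q = {0, 1, 10}, one witness at 0.4.
dist = lambda a, b: abs(a - b)
print(weakly_witnessed({0, 1}, [0, 1, 10], [0.4], dist))   # True
print(weakly_witnessed({0, 10}, [0, 1, 10], [0.4], dist))  # False
```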
Note that the set of weakly witnessed simplices is not downwards closed. We thus
define a simplicial complex by requiring that all faces are weakly witnessed:
Definition 3.23. The Witness complex W(Q, P) is the collection of simplices on Q for
which all faces are weakly witnessed by some point p ∈ P \ Q.
Note that if we take the metric space Rd and let P be the whole Rd, then W(Q, P) =
Del(Q). Since removing witnesses can only remove simplices, we thus get in general
that W(Q, P) ⊆ Del(Q) for any P ⊂ Rd.
To arrive at a filtration, we again have to introduce a parameter r > 0:
Definition 3.24. Given a finite point set Q and a point set P ⊃ Q in some metric space
as well as a real number radius r > 0, the parameterized Witness complex Wr (Q, P)
is defined as follows:
An edge pq is in Wr (Q, P) if it is weakly witnessed by x ∈ P \ Q and d(p, x) ⩽ r and
d(q, x) ⩽ r. A simplex σ is in Wr (Q, P) if all its edges are.
The idea of this complex is that it should approximate the Vietoris-Rips complex on
P. There are theoretical guarantees about this approximation for manifolds of dimension
at most 2, but the parameterized witness complex may fail to capture the topology of
manifolds in dimension 3 and above.
Note that from the definition it is not guaranteed that the parameterized Witness
complex is a subcomplex of the Witness complex.
Definition 3.25. Given two finite point sets Q, P in Rd , as well as a graph G(P) with
vertices in P, we define v : P → Q by sending each point in P to its closest point in
Q. The graph induced complex G(Q, G(P)) contains a simplex σ = {q0 , . . . , qk } ⊂ Q
if and only if there is a clique {p0 , . . . , pk } in G(P) for which v(pi ) = qi .
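A brute-force sketch of Definition 3.25 (names are ours; the map v is passed as a list `nu` of indices into Q):

```python
from itertools import combinations

def graph_induced_complex(num_q, num_p, edges, nu, max_dim=2):
    """G(Q, G(P)): include sigma = {q_0, ..., q_k} iff some clique
    {p_0, ..., p_k} of G(P) satisfies nu[p_i] = q_i (with distinct q_i).

    Vertices of P are 0..num_p-1, vertices of Q are 0..num_q-1;
    edges is the edge set of G(P), nu maps P-indices to Q-indices."""
    adj = {frozenset(e) for e in edges}
    simplices = set()
    for k in range(1, max_dim + 2):   # cliques on k vertices of G(P)
        for clique in combinations(range(num_p), k):
            if all(frozenset(pair) in adj for pair in combinations(clique, 2)):
                sigma = tuple(sorted({nu[i] for i in clique}))
                if len(sigma) == k:   # nu must be injective on the clique
                    simplices.add(sigma)
    return simplices

# Four points of P along a path, mapped to two points of Q.
print(sorted(graph_induced_complex(2, 4, [(0, 1), (1, 2), (2, 3)], [0, 0, 1, 1])))
# [(0,), (0, 1), (1,)]
```

Only the middle edge (1, 2) of G(P) connects the two preimage classes of ν, and it alone induces the edge of the output complex.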
We again parameterize this:
Definition 3.26. Let Gr (P) be the graph on P where pq is an edge if and only if d(p, q) ⩽
2r. The parameterized graph induced complex Gr (Q, P) is defined as G(Q, Gr (P)).
This complex again has theoretical guarantees of approximating the Vietoris-Rips
complex on P ∪ Q.
Exercise 3.27. Let P, Q be point sets and G(P) a graph with P as its vertex set. Let
v : P → Q be the map sending each point of P to its closest point of Q (assume
that this closest point is always unique). Let C be the clique complex of G(P) (the
complex which includes a simplex iff its corresponding vertices in G(P) form a
clique).
Show that v extends to a simplicial map v̄ : C → G(Q, G(P)). Also show that any
simplicial complex K with V(K) = Q for which v has a simplicial extension must
contain G(Q, G(P)).
where we say that ∞ − ∞ = 0 for points with coordinates that are ∞ (i.e., points
in persistence diagrams that correspond to holes that did not die).
Definition 3.29. Let Π = {π : Dgmp(F) → Dgmp(G) | π is bijective} be the set of all
bijections between Dgmp(F) and Dgmp(G). Then, the Bottleneck distance is defined
as

db(Dgmp(F), Dgmp(G)) := inf_{π∈Π} sup_{x∈Dgmp(F)} ||x − π(x)||∞.

The Bottleneck distance thus minimizes the maximum L∞-norm of any pairing, over
all pairings of points.
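For very small diagrams, the definition can be evaluated verbatim by brute force over all bijections; this is exponential and purely illustrative (names are ours), but it shows how the diagonal points of infinite multiplicity enter: each side is padded with the diagonal projections of the other side's points, and matching two diagonal points costs nothing:

```python
from itertools import permutations

def bottleneck_naive(X, Y):
    """Brute-force bottleneck distance between finite lists of off-diagonal
    (birth, death) points; diagonal points of infinite multiplicity are
    modeled by padding each side with the other side's diagonal projections."""
    def d(p, q):
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    def proj(p):   # closest diagonal point in the L_inf norm
        m = (p[0] + p[1]) / 2
        return (m, m)
    Xp = [('pt', p) for p in X] + [('diag', proj(q)) for q in Y]
    Yp = [('pt', q) for q in Y] + [('diag', proj(p)) for p in X]
    best = float('inf')
    for perm in permutations(range(len(Yp))):
        cost = 0.0
        for i, j in enumerate(perm):
            if Xp[i][0] == 'diag' and Yp[j][0] == 'diag':
                continue   # matching diagonal to diagonal is free
            cost = max(cost, d(Xp[i][1], Yp[j][1]))
        best = min(best, cost)
    return best

print(bottleneck_naive([(0, 2)], [(0.5, 2.5)]))  # 0.5
print(bottleneck_naive([(0, 2)], []))            # 1.0
```

In the second call the only off-diagonal point has to be matched to the diagonal, which costs half its lifespan.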
Exercise 3.31. Give an algorithm to compute the Bottleneck distance between two
persistence diagrams. Your algorithm should be polynomial in n, where n is the
total number of off-diagonal points in the two persistence diagrams.
The following theorem tells us that this infinity norm and the Bottleneck distance
are closely related:
Proof. Let ft := (1 − t)f + tg for t ∈ [0, 1] be the linear interpolation between f and g.
Note that f0 = f, f1 = g.
We first show that each ft is a simplex-wise monotone function. It is clearly simplex-
wise, and we prove that it is also monotone: Let σ ⊆ τ. Since f and g are monotone, we
have f(σ) ⩽ f(τ) and g(σ) ⩽ g(τ). Thus,
ft (σ) = (1 − t)f(σ) + tg(σ) ⩽ (1 − t)f(τ) + tg(τ) = ft (τ).
Let p ⩾ 0 be fixed. We now draw the family of persistence diagrams Dgmp (Fft )
as a multiset in R2 × [0, 1]. Each off-diagonal point of Xt := Dgmp (Fft ) is of the form
x(t) = (ft (σ), ft (τ), t) for σ being the creator and τ being the destructor. Note that the
persistence pairings (σ, τ) may only change when the order of simplex insertion changes,
which only happens finitely many times when going from t = 0 to t = 1. Let us call
these values 0 = t0 < t1 < t2 < . . . < tn < tn+1 = 1. Without loss of generality, we
assume that at each of these values ti exactly two simplices have the same value fti .
Within each open interval (ti , ti+1 ) the pairings stay constant. Furthermore, every
off-diagonal point x(t) is a linear function of t in all three coordinates, meaning that it
defines a line segment.
At ti+1 , if x(ti+1 ) is an off-diagonal point whose creator and destructor are still paired
after ti+1 , x(t) continues in the same direction after ti+1 .
If on the other hand x(ti+1 ) is an off-diagonal point whose creator and destructor get
paired differently, recall from Exercise Sheet 5, Question 3, that there are exactly two
pairs that swap their creators or destructors, and these creators or destructors that are swapped
must have the same value in fti+1 . In the persistence diagram, this means that two points
vertically or horizontally of each other swap creators/destructors, and there is a unique
continuing line segment for both of them.
Lastly, if x(ti+1 ) is on the diagonal, this means that its previous creator and destructor
now have the same value in fti+1 . There is no continuation for this point.
Every point thus moves along a polygonal path monotone in t. Every such path
is called a vine, and the multiset of all vines is called a vineyard, see Figure 3.5 for
an illustration. Based on this vineyard, we now wish to find a good matching giving
an upper bound on the Bottleneck distance. We simply take the matching where we
match the start point of every vine with its endpoint. To get a bound on the Bottleneck
distance, we simply need to get a bound for the distance of each matched pair.
Between ti and ti+1 we get for δx(t)/δt:

δ/δt ((1 − t)(f(σ), f(τ), t) + t(g(σ), g(τ), t)) = (g(σ) − f(σ), g(τ) − f(τ), 1)

Projecting x(ti+1) and x(ti) to R2 we get two points yi+1, yi such that

||yi+1 − yi||∞ = (ti+1 − ti) · max(|g(σ) − f(σ)|, |g(τ) − f(τ)|) ⩽ (ti+1 − ti) · ||f − g||∞
Thus, since || · ||∞ is a norm and fulfills the triangle inequality, we also have that from
t = 0 to t = 1, the point can move at most ||f − g||∞ . We thus have the desired bound
on the Bottleneck distance.
[Figure 3.5: a vineyard — the vines traced by the diagram points in (birth, death, time)-space, with a direction change at ti+1.]
Exercise 3.34. Show that Theorem 3.33 (Stability for simplicial filtrations) can be
tight for all p ⩾ 0 and all values of ||f − g||∞ .
We do not prove this theorem at this point, but with additional tools that we will
develop in Section 3.6, the proof of this (and of Theorem 3.33) will follow quite easily.
[Figure: three persistence diagrams X, Y1, and Y2.]
only one reasonable matching between X and Y1 , and also only one between X and Y2 : We
simply match each off-diagonal point with its closest point on the diagonal. Since we only
look at the longest edge in this matching, we get db(X, Y1) = db(X, Y2).
We can get rid of this counter-intuitive behavior of the Bottleneck distance by using
the Wasserstein distance.
dW,q(Dgmp(F), Dgmp(G)) := inf_{π∈Π} [ Σ_{x∈Dgmp(F)} (||x − π(x)||∞)^q ]^{1/q}
Intuitively, we now consider the length of all edges in the matching induced by the
bijection, as opposed to just the longest one, but the longer ones get more weight. Note
that for q = ∞, we retrieve the bottleneck distance, that is, dW,∞ = db .
We can see that the stability theorem we proved for Bottleneck distance does not
hold for Wasserstein distance: consider two simplex-wise monotone functions f and g
on a path, as illustrated in Figure 3.6. In both f and g the first vertex on the path is
mapped to 1 and the edges along the path are mapped to increasing odd numbers. In
f the remaining vertices along the path get mapped to increasing even numbers, and in
g to increasing odd numbers. In particular, ||f − g||∞ = 1. In the filtration defined by
f, at every even step we add a vertex, creating a new connected component, which gets
connected to the rest of the path at the next step. Thus, each vertex of the path will
give an off-diagonal point in the 0-persistence diagram, where all of them except the first
one have a lifespan of 1. On the other hand, in the filtration defined by g, we always
add the new vertices and their connecting edge in the same step, thus the 0-persistence
diagram only has a single off-diagonal point with infinite lifespan. In particular, we have
that for arbitrarily long paths we get arbitrarily large Wasserstein distances between the
diagrams for all q < ∞.
A similar counterexample can also be found for topological spaces. Consider the
topological space [0, 1] and the two functions depicted by the curves in Figure 3.7. Here
[Figure content omitted: the two labeled paths for f and g.]
Figure 3.6: Two simplex-wise monotone functions with bounded infinity norm whose
persistence diagrams have unbounded Wasserstein distance.
we again have that ||f − g||∞ ⩽ ϵ, but the Wasserstein distance between the two diagrams
can be made arbitrarily big.
To avoid these types of counterexamples, we only want to consider even nicer func-
tions:
Definition 3.37 (Lipschitz). Let (X, d) be a metric space. A function f : X → R is
Lipschitz if there exists a constant C such that |f(x) − f(y)| ⩽ C · d(x, y) for all
x, y ∈ X.
For these functions we again get stability theorems, that we will not prove here.
Theorem 3.38. Let X be a triangulable, compact metric space. Let f, g : X → R be
Lipschitz functions. Then there exist constants C and k (that may only depend on
X and on the Lipschitz constants of f, g) such that for every p ⩾ 0 and every q ⩾ k,

dW,q(Dgmp(Ff), Dgmp(Fg)) ⩽ C · ||f − g||∞^{1−k/q}.
Theorem 3.39. Let f, g : K → R be simplex-wise monotone functions. Then for all
p ⩾ 0 and all q ⩾ 1,

dW,q(Dgmp(Ff), Dgmp(Fg)) ⩽ ||f − g||q = ( Σ_{σ∈K} |f(σ) − g(σ)|^q )^{1/q}.
[Figure content omitted: the graphs of two functions f, g : [0, 1] → R.]
Figure 3.7: Two functions [0, 1] → R with bounded infinity norm whose persistence
diagrams have unbounded Wasserstein distance.
commutes both ways, i.e., fa′ ◦ ua,a′ = va,a′ ◦ fa, and ua,a′ ◦ fa^{−1} = fa′^{−1} ◦ va,a′.
The basic idea of interleaving distance is to measure how close two persistence mod-
ules are to being isomorphic. For this, we allow ourselves some slack, in the sense that
Ua does not need to map to Va , but it can map to Va+ϵ , as long as all the relevant
maps still behave like they would for an isomorphism. We make this formal in the next
definition.
Definition 3.42 (ϵ-interleaving persistence modules). Let U and V be persistence modules
over R. We say that U and V are ϵ-interleaved if there exist two families of maps,
φa : Ua → Va+ϵ and ψa : Va → Ua+ϵ such that the following four diagrams commute:

φa′ ◦ ua,a′ = va+ϵ,a′+ϵ ◦ φa and ψa′ ◦ va,a′ = ua+ϵ,a′+ϵ ◦ ψa (the square diagrams), and

ψa+ϵ ◦ φa = ua,a+2ϵ and φa+ϵ ◦ ψa = va,a+2ϵ (the triangular diagrams).
Note that if U and V are isomorphic, then they are 0-interleaved: the first type
of diagrams (the square diagrams) are the commutative diagrams in the definition of
isomorphic persistence modules and the second type of diagrams (the triangular
diagrams) collapse to two arrows that say that the maps φa are isomorphisms with
inverses ψa .
Theorem 3.43. Assume U and V are ϵ-interleaved. Let δ > ϵ. Then U and V are
also δ-interleaved.
Proof. Given φ′a : Ua → Va+ϵ we define φa : Ua → Va+δ simply as φa := va+ϵ,a+δ ◦ φ′a.
Symmetrically, we define ψa := ua+ϵ,a+δ ◦ ψ′a. To check that the correct diagrams
commute, we only check the right of every pair of symmetric ones above. We have to
distinguish two cases for the first diagram, a + δ < a ′ + ϵ and a + δ > a ′ + ϵ.
For the first case, we get the following diagram:
[diagrams omitted: Ua → Ua′ on top; below, in the first case the chain Va+ϵ → Va+δ → Va′+ϵ → Va′+δ, and in the second case Va+ϵ → Va′+ϵ → Va+δ → Va′+δ.]
One can now verify that in all of these diagrams the correct paths commute.
Thus, the following definition makes sense:
Exercise 3.46. Let W1 and W2 be two arbitrary vector spaces. Let U be the persistence
module such that Ua = W1 for a ∈ [w, x), and Ua = 0, otherwise. For a, a ′ ∈ [w, x)
we have ua,a ′ being the identity map. For a < w or a ′ ⩾ x (or both), we have ua,a ′
being the zero map. Similarly, we define the persistence module V which is W2 in
a ∈ [y, z) and 0 otherwise.
Show that dI(U, V) ⩽ max((x − w)/2, (z − y)/2).
The underlying ideas that allowed us to define the interleaving distance of persistence
modules can also be applied to filtrations.
define Cr^log = C_{2^r} and similarly VRr^log = VR_{2^r}. Since 2 · 2^r = 2^{r+1}, we have
Cr^log(P) ⊆ VRr^log(P) ⊆ C(r+1)^log(P). We thus get the chain of inclusions

Cr^log ⊆ VRr^log ⊆ C(r+1)^log ⊆ VR(r+1)^log ⊆ C(r+2)^log ⊆ . . .

Since these are all inclusions, all relevant diagrams must commute, and thus we get that
dI(C^log, VR^log) ⩽ 1.
Definition 3.49. A persistence module V is q-tame if all its linear maps va,a′ with a < a′ have finite rank.
Note that in this definition, the q is not a parameter, just a name. All persistence
modules that show up in the context of persistent homology on point clouds are q-tame,
so this condition is not restrictive.
Thus, for every interleaving one can find between two persistence modules or between
filtrations, one immediately gets a bound on the Bottleneck distance. This is a very
powerful result, and the proof of this is out of scope for these lecture notes. One direction
of the proof however follows from a decomposition result of persistence modules, that
we will discuss in Section 3.7. But first, we will look at some examples of how we can use
Theorem 3.50 to prove stability theorems.
Consider two point clouds P, Q in the same metric space X. Let us first consider
the really simple case, where P = {p}, and Q = {q} with d(p, q) = d. Then, B(p, r) ⊆
B(q, r + d). Now, how does this generalize to larger point sets? To get the same kind of
behavior, we need that for every point in P, there exists some point in Q with distance
at most d. This motivates the following distance measure:
Definition 3.52 (Hausdorff distance). Let A, B ⊆ X be compact sets. Then the Hausdorff
distance between A and B is defined as

dH(A, B) := max( sup_{a∈A} inf_{b∈B} d(a, b), sup_{b∈B} inf_{a∈A} d(a, b) ).
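For finite point sets the suprema and infima in this definition are maxima and minima, so it can be computed directly; a small sketch (names are ours):

```python
def hausdorff(A, B, dist):
    """Hausdorff distance between finite non-empty point sets A and B."""
    d_ab = max(min(dist(a, b) for b in B) for a in A)   # directed A -> B
    d_ba = max(min(dist(a, b) for a in A) for b in B)   # directed B -> A
    return max(d_ab, d_ba)

# On the real line: every point of {1} is within 0 of {0, 1},
# but 0 is at distance 1 from {1}, so d_H = 1.
dist = lambda x, y: abs(x - y)
print(hausdorff([0, 1], [1], dist))  # 1
```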
Let dH(P, Q) = d. Then, ∪_{p∈P} B(p, r) ⊆ ∪_{q∈Q} B(q, r + d). From this, we get the
following lemma:
Lemma 3.54. The filtrations given by the Čech complexes of P and Q are d-interleaved.
Proof. Since dH(P, Q) = d, we have the chain of inclusions

∪_{p∈P} B(p, r) ⊆ ∪_{q∈Q} B(q, r + d) ⊆ ∪_{p∈P} B(p, r + 2d) ⊆ . . .

and each union of balls is homotopy equivalent to the corresponding Čech complex by
the nerve theorem. The relevant diagrams commute up to homotopy, since we only chain
together homotopies and inclusion maps.
Proof. By Theorem 3.50, Observation 3.48, and finally Lemma 3.54, we have
U ≅ ⊕_{i∈I} I⟨bi, di⟩.
The intervals ⟨bi, di⟩ are exactly the bars of the barcode if U is a persistent homology module.
Note that unless we have some additional tameness condition on U, I is not guaranteed
to be finite.
Recall that when we talked about persistent homology, we said that there is some
consistent global choice of basis for persistent homology groups. That is a consequence
of the structure theorem. The structure theorem also allows us to prove one direction of
Theorem 3.50, which we will do in the following.
Proof. To prove that dI (I1 , I2 ) ⩾ db (DgmI1 , DgmI2 ), we show that every upper bound
on dI is also an upper bound on db : assume that we have maps φ, ψ showing that the
two modules are ϵ-interleaved. Then, ψa+ϵ ◦ φa = v1^{a,a+2ϵ} , equality holding
because φ, ψ certify the ϵ-interleaving. Consider a ∈ ⟨b1 , d1 ⟩.
Case 1: v1^{a,a+2ϵ} = 0 for all a ∈ ⟨b1 , d1 ⟩. Then, d1 − b1 < 2ϵ, and the (infinity-norm)
distance of (b1 , d1 ) to the diagonal is less than ϵ.
Case 1: The two off-diagonal points are matched to the diagonal. Then, we get that
di − bi ⩽ 2ϵ for both of them, and thus for all ϵ′ > ϵ, I1 and I2 are ϵ′-interleaved with
φ, ψ = 0. Thus, dI ⩽ ϵ.
Case 2: The points are matched with each other. Then, |b2 − b1 | ⩽ ϵ and |d2 − d1 | ⩽ ϵ.
Taking φ, ψ = id we can see that I1 and I2 are ϵ-interleaved. Thus, dI ⩽ ϵ.
Corollary 3.62. Let U, V be p.f.d. persistence modules. Then, dI (U, V) ⩽ db (DgmU, DgmV).
Questions
13. What is a filtration? State the definition and describe different ways in which
filtrations appear in topology and data analysis.
14. What is persistent homology? State the formal definitions and give examples.
15. How can persistent homology be computed? Discuss the two algorithms described
in Section 3.3.
16. What are the Čech and Vietoris-Rips complexes? Give the definitions, discuss
their size and theoretical guarantees, and how they are related.
17. What are the Delaunay and Alpha complexes? Give the definitions, discuss
their size and theoretical guarantees, and how they are related.
18. What is the Witness complex? State the Definition and describe how it relates
to the non-sparse complexes.
19. What is the Graph induced complex? State the Definition and describe how it
relates to the non-sparse complexes.
20. How can we measure distances between persistence diagrams? Discuss Bottleneck
and Wasserstein distance.
21. How stable are filtrations derived from simplex-wise monotone functions with
respect to Bottleneck distance? State, illustrate and prove the stability theorem
(Theorem 3.33).
22. How can we measure distances between persistence diagrams? Define interleaving
distance and discuss its relation to Bottleneck distance.
23. How stable are Čech complexes to perturbations of the underlying point set?
Define Hausdorff distance, state and prove the stability theorem for Čech complexes
(Theorem 3.55).
Chapter 4
Reeb graphs and Mapper
In this chapter we look at another tool in topological data analysis, called Mapper. The
underlying idea of Mapper has its roots in Morse theory, where Georges Reeb defined
a graph to summarize a Morse function on a manifold. We first discuss these graphs,
called Reeb graphs, and then how to mimic the ideas for the case where instead of a
manifold we have point cloud data.
Before we dive into the mathematical details, a short remark about the pronunciation
of the name “Reeb graph”. Georges Reeb, after whom these graphs are named, was a
French mathematician born in the German-speaking region Alsace. Thus, he likely
pronounced his name the German way, that is, with the “ee” pronounced similarly to
the “ea” in “bear” (as opposed to “beer”).
To make sure that nothing weird happens due to some things being infinite, we
assume all of our functions to be levelset tame:
• each levelset f−1 (α) has finitely many connected components, all of which are
path-connected, and
Introduction to TDA 4.1. Reeb Graphs
Figure 4.1: A space X with a function f : X → R, and its Reeb graph Rf .
• the homology groups of the levelsets only change at finitely many critical values.
The Reeb graph itself is just a (continuous) topological space. We call it a graph
since it is 1-dimensional. To arrive at a graph as we know it in combinatorics, we need
to discretize it, that is, to define vertices and edges. There are many ways of doing so,
but we want a minimal one.
Let us look at the neighborhood of some point p in the Reeb graph (as a topological
space). We look at how many ways there exist to go from p towards the direction of
higher f-value (we call this number the up-degree u), and how many ways to go towards
the direction of lower f-value (we call this the down-degree l). Depending on u and l,
we classify p as in Table 4.1.
u     l     Classification
1     1     regular
0     >0    maximum
>0    0     minimum
⩾2    any   up-fork
any   ⩾2    down-fork
Note that a point can fall into multiple of these classes, for example it can be a maxi-
mum and a down-fork simultaneously, or an up-fork and a down-fork simultaneously. We
call the minima, maxima, up-forks, and down-forks critical points. Our discretization
places vertices at the critical points. Note that the graph we get through this process is
not necessarily simple; it may have multi-edges.
Exercise 4.3. Consider a double torus embedded in R3 . You can imagine it as the
result of taking the figure depicted in Figure 4.2 embedded in the plane x3 = 0,
replacing every point by a 3-dimensional ball with radius r < min{d/2, R/2}, and
taking the boundary of the union of these balls.
Figure 4.2: The space blown up to a double torus in Exercise 4.3 (axes x1 , x2 ;
the two circles have radius R and are at distance d).
Draw the Reeb graph for the three functions f1 (x) = x1 , f2 (x) = x2 , and f3 (x) = x3 .
We next consider merge trees and split trees, which are variants of the Reeb graph,
where instead of levelsets, we look at sub-level sets or super-level sets.
Note that in the merge tree, since we only increase the space under consideration,
we never have a connected component that splits. We can only have new connected
components appearing, and connected components merging. This also tells us that the
merge tree (or its discretization) is always a tree.
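Since components of the sub-level sets only appear and merge, a simple union-find sweep already computes the merge events. The following is a small Python sketch (the input conventions, a dict of vertex heights and an edge list, are our own):

```python
def merge_tree(heights, edges):
    """Sweep the vertices by increasing f-value; union-find tracks the
    connected components of the sub-level sets, which only appear and merge.
    Returns a list of merge events (height, surviving root, absorbed root)."""
    adj = {v: [] for v in heights}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {v: v for v in heights}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    events, seen = [], set()
    for v in sorted(heights, key=heights.get):
        seen.add(v)
        for u in adj[v]:
            if u in seen:
                ru, rv = find(u), find(v)
                if ru != rv:               # two components merge at height f(v)
                    events.append((heights[v], rv, ru))
                    parent[ru] = rv
    return events
```

Note that this sweep only works for merge (and, mirrored, split) trees; the full Reeb graph also needs deletions, which is why the algorithm below requires a dynamic connectivity data structure.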
In topological data analysis, we use computers, which cannot handle arbitrary topo-
logical spaces. We thus now look more at Reeb graphs in the context of simplicial
complexes. We consider a simplicial complex K and a function f : |K| → R, which is
piece-wise linear (linear on each simplex). We observe that the Reeb graph then only
depends on the 2-skeleton of K. This is the case since looking at a levelset is the same as
cutting through the simplicial complex. When we cut through a simplex, we generally
get a simplex of one dimension lower. In a simplicial complex, connectivity is completely
determined by the 1-skeleton. Thus, before cutting, the 2-skeleton suffices. Furthermore,
we can see that the critical points are images of the vertices of K. This happens since a
connected component can only appear, disappear, split, or merge at some local maximum
or minimum of the connected component. Since the function is linear, the maximum or
minimum of every simplex is also attained at some vertex. We define the augmented
Reeb graph of a simplicial complex with a PL-function, by just taking all the images of
the vertices as our graph vertices.
How can we compute this augmented Reeb graph? We can do a discrete sweep (or
scan) through the simplicial complex in the order given by f, only stopping at values
a such that f(v) = a for some vertex v. In this sweep, we want to keep track of the
connected components. The levelset f−1 (α) of the 2-skeleton of K is just a graph Gα :
vertices and edges of K induce vertices of Gα , triangles induce edges. We can now go
through our vertices in order, look at these graphs, and update the connected components.
The runtime of this algorithm is determined by the data structure used to manage the
connected components. We want a data structure that can update the connected
components under insertions and deletions of edges and vertices. There are such data structures
that can do each update in amortized time O(log m), where m is the size of the graph.
The size of each such graph is bounded by the total number m of vertices, edges, and triangles in K.
Each such feature appears at one point, and disappears at one point, and we thus have
at most 2m insertions and deletions in total, giving an O(m log m) algorithm. We thus
have the following theorem.
Theorem 4.6. Given a 2-dimensional simplicial complex K with m faces and a piece-
wise linear function f : |K| → R on it, we can compute the augmented Reeb graph
Rf of K with respect to f in time O(m log m).
Exercise 4.7. Consider a simplicial complex K and a PL (piece-wise linear) function
f : |K| → R. What happens to the Reeb graph when you add one additional face to
K and extend f accordingly?
In other words, the Reeb graph captures the 0-homology of the input space X per-
fectly, no matter which levelset tame function f we use.
Sadly, the same does not hold for the 1-homology. Let us consider a torus, as in
Figure 4.3. In general, the choice of the function f can determine whether we capture
a hole or not; consider e.g. a cylinder. Note that for the torus, it is actually the
case that no matter which function f we choose, we cannot capture its 1-homology (this
is non-trivial to show).
On the other hand, we can see that every cycle in the Reeb graph is indeed also a
cycle in the topological space X, and it cannot be filled in, so it is indeed a hole. Thus
we also get the following observation:
Can we somehow formalize which holes we lose? To do this, we split homology into
a “horizontal” and a “vertical” part, where horizontal and vertical are of course relative to
f.
This definition means that we need to be able to find a finite set of levelsets, such
that we can find cycles contained in these levelsets, which are in the homology class h
in Hp (X).
One now wonders whether the set of horizontal homology classes forms a group. Let
this set be H̄p (X). It turns out that it is indeed a group.
Since the horizontal homology is a sub-group, we can now easily define vertical ho-
mology by taking quotient groups.
Definition 4.12. The vertical homology group of X with respect to f is the group
H∨p (X) := Hp (X)/H̄p (X).
Observation 4.13. rank(Hp (X)) = rank(H̄p (X)) + rank(H∨p (X)).
Fact 4.14. The surjection ϕ : X → Rf induces an isomorphism Φ : H∨1 (X) → H∨1 (Rf ).
In other words, when we go from a space X to its Reeb graph, we keep the vertical
homology classes, and lose the horizontal ones.
Here, a 2-manifold is a space that locally at every point looks like R2 . Orientable
means that it has an inside and an outside. A Morse function is a “nice enough”
function defined in terms of some derivatives, which we do not need to specify here.
Definition 4.18. For a Reeb graph Rf , consider the function fϵ : (Rf )ϵ → R defined by
(x, t) ↦ f(x) + t, where (Rf )ϵ := Rf × [−ϵ, ϵ] is the ϵ-thickening of Rf .
The ϵ-smoothing of Rf , denoted by Sϵ (Rf ), is the Reeb graph of (Rf )ϵ with regards to
fϵ .
An example of these definitions can be seen in Figure 4.4. Note that when we say
(Rf )ϵ , we mean an ϵ-thickening of Rf , not a Reeb graph with regards to some function
fϵ . The ϵ-smoothing Sϵ (Rf ) is then a Reeb graph with regards to the function fϵ , but
of (Rf )ϵ , and not of the original space that Rf is the Reeb graph of. Furthermore, when we
write f(x) for some x ∈ Rf , we mean that we extend f to a function f∗ : Rf → R by
setting f∗ (x) to the value that f takes on the levelset component of X corresponding to x
(this is well-defined, since f is constant on this component). We will just call this
function f as well for simplicity.
Definition 4.19. The function ι : Rf → Sϵ (Rf ) with x ↦ [(x, 0)] is the quotiented inclusion
map. Here, [(x, 0)] denotes the equivalence class, or the connected component, that
contains (x, 0) in fϵ−1 (fϵ (x, 0)).
Definition 4.20 (Reeb graph interleaving). Two Reeb graphs Rf and Rg are ϵ-interleaved
if there is a pair of function-preserving maps φ : Rf → Sϵ (Rg ), ψ : Rg → Sϵ (Rf ) such
that the following diagram commutes:
ι ιϵ
Rf Sϵ (Rf ) S2ϵ (Rf )
ψ φ ψϵ φϵ
ι ιϵ
Rg Sϵ (Rg ) S2ϵ (Rg )
Figure 4.4: A Reeb graph Rf , its ϵ-thickening (Rf )ϵ , and the ϵ-smoothing Sϵ (Rf ).
Here, to understand why ιϵ makes sense, we need the following fact, the proof of
which is left as an exercise.
We once again have a stability theorem, which we will not prove here.
Definition 4.24. Let Rf be a Reeb graph of a space X, and u, v ∈ Rf (in the same
connected component), and let π be a path from u to v. We define the height of π
as height(π) = maxx∈π f(x) − minx∈π f(x). To turn this into a distance metric, we
consider Π(u, v), the set of all paths between u and v. Then, the function induced
metric on Rf is defined as
df (u, v) = minπ∈Π(u,v) height(π).
In a sense, df (u, v) is the “thickness” of the thinnest “slice” of the space X in which u
and v are connected.
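For a finite graph with values on the vertices, this metric can be computed by brute force over all candidate "slabs" [a, b] with a, b vertex values. A small sketch (the representation of the input as a vertex-value dict and an edge list is our own convention):

```python
import math
from collections import deque

def induced_metric(f, edges, u, v):
    """Brute-force df(u, v): the smallest height b - a of a slab f^{-1}([a, b])
    (a, b vertex values) in which u and v lie in the same connected component."""
    vals = sorted(set(f.values()))
    best = math.inf
    for a in vals:
        for b in vals:
            if b - a >= best or not (a <= f[u] <= b and a <= f[v] <= b):
                continue
            adj = {}              # subgraph induced by the slab [a, b]
            for x, y in edges:
                if a <= f[x] <= b and a <= f[y] <= b:
                    adj.setdefault(x, []).append(y)
                    adj.setdefault(y, []).append(x)
            queue, seen = deque([u]), {u}
            while queue:          # BFS from u inside the slab
                x = queue.popleft()
                for y in adj.get(x, []):
                    if y not in seen:
                        seen.add(y)
                        queue.append(y)
            if v in seen:
                best = b - a
    return best
```

For example, if u and v are joined by one path through a vertex of value 5 and another through a vertex of value 2, with f(u) = 0 and f(v) = 1, the thinnest connecting slab is [0, 2], so df (u, v) = 2.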
Definition 4.25 (Functional distortion distance). Let Rf and Rg be two Reeb graphs. Let
Φ : Rf → Rg , Ψ : Rg → Rf be continuous functions, but not necessarily function-
preserving. Then, we define correspondence and distortion:
4.3 Mapper
Figure 4.5: A space X, an open cover U of R, the family f∗ (U), and its nerve.
If we take sufficiently nice functions, and sufficiently fine covers, then N(f∗ (U)) is
isomorphic to Rf .
As an example, we look at X being the boundary of the 3-cube [0, 1]3 . We then also
look at Z1 = R2 spanned by the x- and y-axis, with f1 : X → Z1 being the projection onto
this plane. Furthermore, we look at Z2 = R, spanned by just the x-axis, and f2 : X → Z2
being again the projection.
We consider the open cover U2 of Z2 : {(−∞, 1/3), (0, 1), (2/3, +∞)}. For Z1 , we consider
the cover U1 := U2 × U2 .
Figure 4.6: The cover U1 , and the two Mappers M(U2 , f2 ) and M(U1 , f1 ). The
Mapper M(U1 , f1 ) consists of an empty octahedron, with additional filled
tetrahedra attached at the purple vertices. The whole space thus collapses
to an octahedron.
Introduction to TDA 4.4. Multiscale Mapper
Input: In the most general setting, data comes as a finite metric space (P, dP ), for ex-
ample as points in Rd or as vertices of a graph. We also require a cover U of a space Z,
usually Z = R, as input. Finally, we also need a filter function f : P → Z and a clustering
algorithm (which might also require some input parameters).
Algorithm: Since at the moment we only have a discrete metric space, we do not really
have the notion of connected components yet. For every U ∈ U, we thus cluster the pre-
image f−1 (U) using some clustering algorithm, which we can also consider as an input.
Now, we can just consider each cluster Ci as a vertex of some simplicial complex K, and
add a face {C1 , . . . , Ck } to K if these clusters (which are just point sets) have a common
point.
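The pipeline above can be sketched in a few lines of Python. Everything here is illustrative: the names, the representation of U as a list of intervals, building only the 1-skeleton of the nerve, and a simple ϵ-neighborhood-graph clustering standing in for an arbitrary clustering algorithm.

```python
import math

def eps_clusters(idx, points, eps=0.3):
    """Stand-in clustering: connected components of the eps-neighborhood
    graph on the points with indices idx, via union-find."""
    parent = {i: i for i in idx}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in idx:
        for j in idx:
            if i < j and math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)
    comps = {}
    for i in idx:
        comps.setdefault(find(i), []).append(i)
    return list(comps.values())

def mapper(points, f, intervals, cluster):
    """Mapper on point cloud data: cluster each preimage f^{-1}(U), take one
    vertex per cluster, and add an edge whenever two clusters share a point."""
    clusters = []
    for a, b in intervals:
        pre = [i for i, p in enumerate(points) if a <= f(p) <= b]
        clusters += [frozenset(c) for c in cluster(pre, points)]
    edges = [(i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))
             if clusters[i] & clusters[j]]
    return clusters, edges
```

On 40 points sampled from the unit circle, with filter f(p) = p[0] and three overlapping intervals, this produces four clusters joined in a 4-cycle: the Mapper recovers the loop.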
Definition 4.32. Let U = {Uα }α∈A and V = {Vβ }β∈B be two covers of the same space X.
A map of covers is a map φ : A → B such that for every α ∈ A, we have Uα ⊆ Vφ(α) .
Proof. Let σ ∈ N(U). We need to show that the intersection ⋂β∈φ(σ) Vβ is non-empty.
Indeed,
⋂β∈φ(σ) Vβ = ⋂α∈σ Vφ(α) ⊇ ⋂α∈σ Uα ̸= ∅
Recall that f∗ (U) is the cover of X consisting of the connected components of the
pre-images of the sets of U under f.
Proof. For every α, we have Uα ⊆ Vφ(α) =⇒ f−1 (Uα ) ⊆ f−1 (Vφ(α) ). We now need to go
from these pre-images to their connected components. Since every connected component
of f−1 (Uα ) must lie in a unique connected component of f−1 (Vφ(α) ), our desired map of
covers is given by exactly mapping to this connected component.
If we have multiple maps of covers φ : U → V and ψ : V → W, we can concatenate
them, and f∗ respects this composition: f∗ (ψ ◦ φ) = f∗ (ψ) ◦ f∗ (φ).
Let U = U1 → U2 → · · · → Un , with maps of covers φ1 , . . . , φn−1 , be a sequence of
covers of Z, which we call a cover tower. By applying f∗ we get a cover tower f∗ (U) of X.
Applying homology, we get a sequence of homology groups with induced homomorphisms
between them, i.e., a persistence module:
Hp (N(f∗ (U1 ))) → · · · → Hp (N(f∗ (Un ))),
where the maps are induced by the simplicial maps N(f∗ (φi )).
We can now view Dgmp MM(U, f) as a topological summary of f through the lens
of U.
As opposed to the normal Mapper, at first glance the Multiscale Mapper adds even
more parameters. But a cover tower can be seen as a way of looking at a whole interval
of covers. For example, we can get a cover tower by increasing the size of all intervals
in an interval cover. The features of the data should then show up as robust features that
persist for a long time over this process, while spurious features obtained from choosing
“wrong” Mapper parameters should disappear quickly.
Questions
24. What is a Reeb graph? State the definition and describe how we get the graph
structure.
25. How can we compute the augmented Reeb graph of a piece-wise linear function?
Define the augmented Reeb graph and explain the algorithm to compute it.
26. How much of the homology of the underlying topological space is captured by
the Reeb graph? Explain vertical and horizontal homology.
27. What is the interleaving distance for Reeb graphs? Give the definitions and
state the relevant stability theorems.
28. What is the functional distortion distance for Reeb graphs? Give the definitions
and state the relevant stability theorems.
29. What is the topological Mapper? State the Definition and give an example.
30. How can we use Mapper on point cloud data? Explain the Mapper algorithm
and describe the input parameters.
31. How can we use Mapper on several covers at once? Explain the Multiscale
Mapper.
Chapter 5
Optimal Generators
In some applications, we are not only interested in the number of holes in our data, but
we also want to look at specific holes, that is, we would like to have a representation of
this hole in the data, or even a basis of the homology group. However, in a homology
class, there are many homologous cycles. Furthermore, there are many different choices
of homology classes which form a basis of the homology group. Thus, there are many
different choices for cycles as bases of the homology group. How do we find good bases?
We define a weight function w : Kp → R⩾0 on the p-simplices, and the weight of a
chain is simply the sum of the weights of its simplices, i.e., w(c) = Σi αi w(σi ) for
c = Σi αi σi . The weight of a set of cycles C is then the sum of the weights of its cycles.
We are now interested in cycles that have minimal weight in their homology class, or in
bases with minimum total weight.
We look at this problem in two settings: first we look at the case where we are given
a fixed simplicial complex and we want to find an optimal basis for the homology of this
complex. This can be applied for example if the persistence diagram of a filtration gives
us a range of values in which we expect the complex to nicely capture the shape of the
data. We can then compute an optimal basis for the fixed complex for some value in this
range.
In some applications, we might also want to take a closer look at single intervals in
the persistence barcode, that is, understand a hole that is born at time b and dies at
time d (for example, to decide whether it corresponds to a feature in the data or is
just a consequence of the process). This brings us to the second setting we look at in
this chapter, where we want to find an optimal representative of a persistent homology
class.
Definition 5.1. A set C of cycles is an optimal basis for Hp (K) if it is a basis and there
is no other basis C ′ with w(C ′ ) < w(C).
Introduction to TDA 5.1. Optimal basis of a fixed complex
In a first step, we are going to compute a set of cycles C which contains an optimal
basis. Then, we sort the cycles by increasing weight, and pick the first cycle to be part
of our basis B. Then, we simply iterate through our cycles and add a cycle ci to our
basis if it cannot be written as a linear combination of our current basis. Finally, if c1 is
a boundary, we return B \ {c1 }, and otherwise we return B.
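This greedy procedure can be sketched as follows, assuming (hypothetically) that every cycle already comes with an annotation vector in the sense of Definition 5.4 below; independence is then checked by Gaussian elimination over Z2, with vectors stored as sets of non-zero coordinates:

```python
def greedy_basis(cycles, weight, annotation):
    """Greedy matroid algorithm: process cycles by increasing weight and keep a
    cycle iff its annotation is linearly independent (over Z2) of those kept.
    annotation(c) is the set of coordinates where the binary vector equals 1."""
    basis, pivots = [], {}   # pivots: smallest coordinate -> reduced vector
    for c in sorted(cycles, key=weight):
        v = set(annotation(c))
        while v:
            p = min(v)
            if p not in pivots:      # new pivot: c is independent, keep it
                pivots[p] = v
                basis.append(c)
                break
            v = v ^ pivots[p]        # eliminate coordinate p
        # if v reduced to the empty set, c is dependent (or a boundary): skip
    return basis
```

A cycle whose annotation is the zero vector is a boundary and is skipped automatically, which also takes care of the special treatment of c1 above.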
Assuming that we can do all these steps, it follows from a more general framework
in matroid theory that the computed basis is indeed optimal.
The sets in I are also called the independent sets of the matroid. The inclusion-
maximal sets in I are called bases.
(a) Show that for U being any finite set of vectors in some vector space, the
family I of subsets of U corresponding to linearly independent vectors forms
the family of independent sets of a matroid.
(b) Show that for any graph G = (V, E), the family I of subsets of E corresponding
to forests in G forms the family of independent sets of a matroid.
For the first step of the above algorithm, we need to be able to compute our beginning
set C. Furthermore, we need to be able to check linear independence.
From now on, we will focus on computing a basis for H1 (K). Without loss of generality,
we say that K is 2-dimensional, with n triangles and O(n) edges and vertices. To
compute C, we begin with C = ∅. For all vertices v, we compute the shortest path tree
Tv rooted at v. We can do this for example with Dijkstra’s algorithm. For every edge e
that is not in Tv , we add the unique cycle in Tv ∪ {e} to C. This can be implemented in
O(n2 log n), and yields a set of cycles with |C| ∈ O(n2 ). But we still need to prove that
this set indeed contains an optimal basis.
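The construction of C can be sketched directly; this is essentially the candidate set from Horton's minimum-cycle-basis algorithm. All conventions here (vertices 0..n−1, a dict of weighted undirected edges, cycles returned as Z2 edge sets) are our own, and we assume the graph is connected:

```python
import heapq

def candidate_cycles(n, wedges):
    """For every vertex v: a shortest-path tree Tv (Dijkstra); every non-tree
    edge e contributes the unique cycle in Tv + e, taken as the symmetric
    difference of the two tree paths plus e.  wedges: {(u, v): weight}, u < v."""
    adj = {i: [] for i in range(n)}
    for (u, v), w in wedges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    cycles = []
    for root in range(n):
        dist, par, pq = {root: 0.0}, {root: None}, [(0.0, root)]
        while pq:
            d, x = heapq.heappop(pq)
            if d > dist[x]:
                continue                      # stale queue entry
            for y, w in adj[x]:
                if d + w < dist.get(y, float("inf")):
                    dist[y], par[y] = d + w, x
                    heapq.heappush(pq, (d + w, y))
        def tree_path_edges(x):
            out = set()
            while par[x] is not None:
                out.add(tuple(sorted((x, par[x]))))
                x = par[x]
            return out
        for u, v in wedges:
            if u in par and v in par and par[u] != v and par[v] != u:
                # e = {u, v} is not a tree edge: form the cycle in Tv + e
                chain = tree_path_edges(u) ^ tree_path_edges(v)
                chain ^= {tuple(sorted((u, v)))}
                cycles.append(frozenset(chain))
    return cycles
```

On the 4-cycle graph 0–1–2–3–0 with unit weights, every root contributes one non-tree edge, and each resulting candidate is the full square.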
Proof. Let C∗ be an optimal basis, and towards a contradiction, let c be a cycle contained
in C∗ \ C. As the weights are non-negative, we can assume that c is simple, i.e., no edge
is used multiple times.
Let v be a vertex in c, and let Tv be the corresponding shortest path tree. There must
be an edge e = {u, w} in c, which is not in Tv , since Tv is a tree. Let Πv,u and Πv,w be
the shortest paths from v to u and w, respectively. These paths must be contained in Tv .
Let us similarly consider Π′v,u and Π′v,w , the (shortest) paths from v to u and w in c. We
know that not both Π′v,u = Πv,u and Π′v,w = Πv,w , so w.l.o.g. assume that Π′v,u ̸= Πv,u .
We now define the cycles c1 = {Πv,w , e, Π′v,u } and c2 = {Πv,u , Π′v,u }. We can now see
So, we have finished the first step of our algorithm. It remains to figure out how to
check independence. For this, we introduce annotations.
Definition 5.4. An annotation of p-simplices is a function a : Kp → Z2^g , giving each
p-simplex a binary vector of length g. This extends to chains by sums. An annotation
must fulfill:
• g = βp (K), and
• a(z1 ) = a(z2 ) iff [z1 ] = [z2 ], for all p-cycles z1 , z2 .
Given an annotation, we can now clearly check linear independence of cycles by
simply checking linear independence of a set of vectors, for which we have existing tools
such as Gaussian elimination.
Proposition 5.5. In every simplicial complex K and for every p ⩾ 0, there exists an
annotation of the p-simplices, and it can also be computed.
Proof. (Sketch for p = 1) We can compute a spanning forest T , and let m be the number
of remaining edges. We initialize annotations of length m, and set a(e) = 0 for every
edge in the spanning forest T . For every remaining edge ei , we set aj (ei ) = 1 if and only
if j = i, and 0 otherwise.
For every triangle t, if the annotation of its boundary δt is not 0, we find a non-zero
entry of a(δt), say at index u, add a(δt) to the annotation of every edge e with au (e) = 1,
and then delete the u-th entry from all annotations. One can show that this yields a valid
annotation, and that it can be implemented in O(n3 ); more clever implementations run
in O(nω ).
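For p = 1, the sketch above can be implemented quite directly. In the following sketch (all conventions ours: edges and triangles given as sorted vertex tuples), annotation vectors are stored as sets of non-zero coordinates, so that "deleting the u-th entry" happens implicitly when a(δt) is added:

```python
def annotate_edges(vertices, edges, triangles):
    """Compute an annotation a : K1 -> Z2^g for the edges of a 2-complex.
    Annotation vectors are represented as sets of coordinates equal to 1."""
    parent = {v: v for v in vertices}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    ann, coord = {}, 0
    for u, v in edges:                  # build a spanning forest on the fly
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            ann[(u, v)] = set()         # tree edge: zero vector
        else:
            ann[(u, v)] = {coord}       # non-tree edge: a fresh unit vector
            coord += 1
    for x, y, z in (tuple(sorted(t)) for t in triangles):
        bd = ann[(x, y)] ^ ann[(y, z)] ^ ann[(x, z)]
        if bd:                          # kill the class of this boundary
            u = min(bd)
            for e in edges:
                if u in ann[e]:
                    ann[e] ^= bd        # also removes coordinate u from ann[e]
    return ann
```

On two triangles sharing an edge with only one of them filled, the boundary of the filled triangle gets annotation zero, while the empty triangle keeps a non-zero annotation, as expected.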
Introduction to TDA 5.2. Persistent cycles
Theorem 5.6. Given a 2-dimensional simplicial complex K with n faces and a weight
function w on its edges, we can compute an optimal basis of H1 (K) in time O(nω +
n2 gω−1 ).
We can build a dual graph G by placing a vertex in every (p + 1)-simplex and adding
an edge between two vertices whenever the corresponding (p + 1)-simplices share a
p-simplex. We furthermore add a dummy vertex, which gets connected to all vertices
that only have one neighbor. We make the vertex belonging to the (p + 1)-simplex
which is the destructor of our desired cycle the source. Furthermore, we make the
dummy vertex as well as all vertices belonging to (p + 1)-simplices added after the
destructor into sinks. Edges corresponding to p-simplices added at or before the birth
get capacity equal to their weight, while all other edges get capacity ∞. Then, it turns
out that the p-simplices corresponding to the edges in a minimum cut separating the
source from the sinks form an optimal persistent cycle.
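The minimum cut itself can be computed with any max-flow algorithm. A self-contained Edmonds–Karp sketch (our own minimal implementation; for the undirected dual graph, insert each edge in both directions, and use a large finite constant for the infinite capacities to avoid inf − inf arithmetic):

```python
from collections import deque

def min_cut_side(nodes, cap, s, t):
    """Edmonds-Karp max flow; returns the source side of a minimum s-t cut.
    cap: {(u, v): capacity} for a directed graph on the given nodes."""
    res = dict(cap)                           # residual capacities
    adj = {u: set() for u in nodes}
    for u, v in cap:
        res.setdefault((v, u), 0)
        adj[u].add(v)
        adj[v].add(u)
    while True:
        prev, queue = {s: None}, deque([s])   # BFS for an augmenting path
        while queue and t not in prev:
            x = queue.popleft()
            for y in adj[x]:
                if y not in prev and res[(x, y)] > 0:
                    prev[y] = x
                    queue.append(y)
        if t not in prev:
            break
        path, y = [], t
        while prev[y] is not None:
            path.append((prev[y], y))
            y = prev[y]
        aug = min(res[e] for e in path)       # bottleneck capacity
        for u, v in path:
            res[(u, v)] -= aug
            res[(v, u)] += aug
    side, queue = {s}, deque([s])             # vertices still reachable from s
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in side and res[(x, y)] > 0:
                side.add(y)
                queue.append(y)
    return side
```

The p-simplices dual to the edges crossing from the returned side to its complement then form the candidate persistent cycle.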
This exercise proves one direction of the correctness of the algorithm described above.
The other direction is similar. We get the following result.
For details, we refer to Chapter 5 in the book of Dey and Wang [1].
Questions
32. How can we compute an optimal basis given a set of cycles that contains one?
Explain the algorithm described in Section 5.1. Further, explain annotations and
how they can be used to check linear independence.
33. How can we compute a set of 1-cycles that contains an optimal basis of H1 ?
Describe the algorithm to do this and prove its correctness.
34. How can we compute an optimal persistent cycle? Explain the algorithm described
in Section 5.2.
References
[1] Tamal Krishna Dey and Yusu Wang, Computational topology for data analysis,
Cambridge University Press, 2022.