Lecture Notes Introduction to Topological Data Analysis
Patrick Schnider
Department of Computer Science, ETH Zürich
Andreasstrasse 5, CH-8050 Zürich, Switzerland
E-mail address: [email protected]
Contents
1 Mathematical Foundations 6
1.1 Topological Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2 Metric Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Maps between topological spaces . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2 Homology 18
2.1 Simplicial Complexes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2 Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.1 An intuitive view at holes . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.2 Chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.3 Boundary Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.4 Cycle and boundary groups . . . . . . . . . . . . . . . . . . . . . . 28
2.2.5 Homology Groups . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.2.6 Singular Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2.7 The 0-th homology group . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.8 Homology of Spheres . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.2.9 Induced Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.2.10 Application: Brouwer fixed point theorem . . . . . . . . . . . . . . 36
3 Persistence 38
3.1 Filtrations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Persistent Homology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 Algorithms for persistent homology . . . . . . . . . . . . . . . . . . . . . . 41
3.3.1 Persistence pairing algorithm . . . . . . . . . . . . . . . . . . . . . 41
3.3.2 Matrix reduction algorithm . . . . . . . . . . . . . . . . . . . . . . 43
3.4 Simplicial Complexes on Point Sets . . . . . . . . . . . . . . . . . . . . . . 44
3.4.1 Čech and Vietoris-Rips complexes . . . . . . . . . . . . . . . . . . 44
3.4.2 Delaunay and Alpha complexes . . . . . . . . . . . . . . . . . . . . 45
3.4.3 Subsample Complexes . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.5 Distance Metrics on Persistence Diagrams . . . . . . . . . . . . . . . . . . 48
3.5.1 Bottleneck Distance . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.5.2 Wasserstein Distance . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5 Optimal Generators 76
5.1 Optimal basis of a fixed complex . . . . . . . . . . . . . . . . . . . . . . . 76
5.2 Persistent cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Chapter 1
Mathematical Foundations
Definition 1.1. A topological space is a pair (X, T ), where X is a set and T is a collection of subsets of X (called the open sets), such that
1. ∅ ∈ T , X ∈ T .
2. For every S ⊆ T , ⋃ S ∈ T .
3. For every finite S ⊆ T , ⋂ S ∈ T .
For example, setting X = R2 and T to be the collection of open subsets (in the
geometric/calculus sense) of R2 , we can check that (X, T ) is a topological space. A
further example of a topological space is (X, 2X ), where 2X denotes the family of all
subsets of X. This is called a discrete topology.
You might wonder why we consider infinite unions of open sets to be open, but restrict to finite intersections in Condition 3. This is so that the open sets of Euclidean space can actually be called open in the language of topology. If we allowed infinite intersections in Condition 3, a set {p} consisting of a single point p ∈ R2 would
have to be considered open: it is the intersection of the infinite sequence of open balls of radius 1/n centered at p, for n ∈ N.
In most applications in these lecture notes, we work with subspaces of the Euclidean
space Rd , so apart from open sets, we also know from calculus notions such as closed
sets, closure, interior and boundary. These terms can be defined also for abstract
topological spaces:
Definition 1.2. A set Q ⊆ X is called closed, if its complement X \ Q is open. The
closure cl Q is the smallest closed set containing Q. The interior int Q is the union
of all open subsets of Q. The boundary bnd Q is the closure minus the interior: bnd Q = cl Q \ int Q.
Note that sets can be open and closed simultaneously: in every topological space
(X, T ), ∅ and X are such examples. In a discrete topology, every subset S ⊆ X is open
and closed.
Exercise 1.3. Show that a finite union of closed sets is closed.
So far we have only seen two topological spaces: Euclidean space, or any set with
the (rather boring) discrete topology. In order to see the value in the abstractions we
are doing, we would like to have more examples of topological spaces. In particular, it
would be great if we had a way to get new topological spaces from known ones. In the
following we discuss some ways to do this, starting with taking intersections.
Lemma 1.4. Let (X, T ) be some topological space, and Y ⊆ X. Then, U := {A∩Y | A ∈ T }
is a topology on Y. We call this a subspace topology.
Proof. We check the three conditions of a topology:
1. ∅ = ∅ ∩ Y, therefore ∅ ∈ U. Similarly, Y = X ∩ Y, and thus Y ∈ U.
2. For any index set I, ⋃_{i∈I} (Ai ∩ Y) = (⋃_{i∈I} Ai ) ∩ Y, and thus ⋃_{i∈I} (Ai ∩ Y) ∈ U.
3. ⋂_{i=1}^{n} (Ai ∩ Y) = (⋂_{i=1}^{n} Ai ) ∩ Y, and thus ⋂_{i=1}^{n} (Ai ∩ Y) ∈ U.
Since we have seen that Rd is a topological space, this already tells us that all subsets
of Rd are topological spaces.
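For finite spaces, the construction of Lemma 1.4 can be checked mechanically. The following Python sketch is our own illustration (not part of the notes): for a finite collection of open sets, closure under pairwise unions and intersections already implies the axioms, so that is all we test.

```python
def is_topology(space, opens):
    """Check the topology axioms for a finite collection of open sets.

    For finitely many open sets, closure under arbitrary unions and finite
    intersections reduces to closure under pairwise unions/intersections.
    """
    opens = set(map(frozenset, opens))
    if frozenset() not in opens or frozenset(space) not in opens:
        return False  # Condition 1 fails
    for a in opens:
        for b in opens:
            if a | b not in opens or a & b not in opens:
                return False  # Conditions 2 or 3 fail
    return True

# a small topology on X = {1, 2, 3}
X = {1, 2, 3}
T = [set(), {1}, {1, 2}, {1, 2, 3}]
assert is_topology(X, T)

# the subspace topology on Y, exactly as in Lemma 1.4: U = {A ∩ Y | A ∈ T}
Y = {2, 3}
U = {frozenset(a) & frozenset(Y) for a in T}
assert is_topology(Y, U)
```

Running the checks confirms that U is again a topology, as the lemma asserts.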
Another way to get topological spaces is as a product of spaces. We will not discuss
the details of this here, and refer the interested reader to any textbook on topology, such
as the excellent book by Munkres [1].
Fact 1.5. Let X, Y be two topological spaces. Then, X × Y is a topological space, with
the so-called product topology.
The definition of topological spaces allows us to formally define concepts from geom-
etry in a more abstract setting:
Definition 1.6. A topological space (X, T ) is disconnected, if there are two disjoint non-
empty open sets U, V ∈ T , such that X = U ∪ V. A topological space is connected, if
it is not disconnected.
Exercise 1.7. In this exercise, we will use topology to prove that the set of primes is
infinite.
We define the sets S(a, b) as follows:
S(a, b) := {an + b | n ∈ Z}, ∀a ∈ Z \ {0}, b ∈ Z
We then say that a set U ⊆ Z is open, if and only if for all x ∈ U, there exists a ∈ Z \ {0} such that S(a, x) ⊆ U. This is equivalent to saying that every open set U is a union of zero or more (including infinitely many) sets S(a, b).
(a) Show that this defines a topology on Z.
(b) Let A ⊂ Z be finite and non-empty. Show that Z \ A cannot be closed.
(c) Show that S(a, b) is both open and closed.
(d) Show that ⋃_{p prime} S(p, 0) = Z \ {−1, 1}.
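Part (d) can be sanity-checked numerically on a finite window of integers. The following Python sketch is our own illustration (of course not a proof): every integer of absolute value at least 2 has a prime factor, so it lies in some S(p, 0), while ±1 do not.

```python
# check part (d) on the window of integers [-N, N]
N = 100
window = set(range(-N, N + 1))

def S(a, b):
    """S(a, b) = {a*n + b : n in Z}, restricted to the finite window."""
    return {a * n + b for n in range(-N, N + 1)} & window

# all primes up to N, by trial division
primes = [p for p in range(2, N + 1) if all(p % q for q in range(2, p))]

union = set().union(*(S(p, 0) for p in primes))
assert union == window - {1, -1}  # matches the claim of part (d)
```

Note that 0 is contained in every S(p, 0), and ±1 in none, since they have no prime factors.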
• All knots (embeddings of the circle into R3 ) are homeomorphic. Thus, we cannot
distinguish between knots using only homeomorphism.
Exercise 1.13. Give an example of a map f : X → Y that is bijective but not a homeo-
morphism.
Exercise 1.14. Consider a grid of 2 vertical line segments and k + 2 horizontal seg-
ments, for some k ⩾ 0. For k = 1, this looks as follows:
Now, we consider the problem of placing a point on each of the k + 2 horizontal
line segments, such that each of the k + 4 total line segments contains at least one
point.
(a) How could one define a topology on the set of all such point placements?
The example of the knots shows that in certain cases, maps and homeomorphism are
not a good language to capture the relevant properties. In some cases, we want to look
at an entire process of continuously deforming one object into another.
Some examples:
• Let X ⊂ R be the union of {0} and [1, 2], and let Y ⊂ R be the union of [0, 1] and {2}. These spaces are homeomorphic (X ≃ Y), but not isotopic.
• The two knots from Figure 1.1 above are also not isotopic.
• Consider the two spaces in Figure 1.2. Do you think they are isotopic? Most
people would probably argue that they are not, as in one of them the “handcuff”
wraps around the “pole” once and in the other one twice. However, it turns out
that the spaces are in fact isotopic. An isotopy is illustrated by the following video:
https://www.youtube.com/watch?v=wDZx9B4TAXo
Figure 1.2: Left: Both handcuffs are connected to an infinite pole. Right: Only one
loop of the handcuffs is connected to the infinite pole. These spaces are
isotopic.
Some examples:
• The inclusion map g : B3 ,→ R3 (where B3 is the unit ball in R3 ), and h : B3 → R3
which sends every point to the origin, are homotopic, as shown by the homotopy
H(x, t) = (1 − t)g(x).
• g ◦ h is homotopic to idY .
For example, the circle S1 and R2 \ {0} are homotopy equivalent. We pick g as the inclusion map S1 ,→ R2 \ {0}, and h(x) := x/|x|. We see that h ◦ g(x) = x, i.e., h ◦ g = idS1 .
Furthermore, g ◦ h(x) = h(x). Finally, g ◦ h and idR2 \{0} are homotopic as certified by
the homotopy H(x, t) := tx + (1 − t)h(x).
An important example of a homotopy equivalence are deformation retracts:
• R(·, 0) = idX
• R(x, 1) ∈ A, ∀x ∈ X
• R(a, t) = a, ∀a ∈ A, t ∈ [0, 1]
Some examples:
• A punctured torus can be deformation retracted onto the symbol 8 where one of
the two circles is rotated by 90◦ , as seen by the following video:
https://www.youtube.com/watch?v=tz3QWrfPQj4
Lemma 1.20. If X and Y are homeomorphic, they are also homotopy equivalent.
Proof. Let g : X → Y be the homeomorphism, and h := g−1 its inverse. Then g ◦ h = idY
and h ◦ g = idX , and id is homotopic to itself.
The following is a nice way to show that two spaces are homotopy equivalent:
Fact 1.21. X, Y are homotopy equivalent if and only if there exists a space Z such
that X and Y are deformation retracts of Z.
Exercise 1.22. Sort the letters of the alphabet into equivalence classes under homotopy
equivalence.
Exercise 1.23. Show that both a cylinder and a Möbius strip are homotopy equivalent
to a circle.
Figure 1.3: The top space deformation retracts to both spaces below, showing that
they are homotopy equivalent.
Exercise 1.24. Let X be S2 where the north pole and the south pole have been glued
together, see Figure 1.4a. Let Y be S2 with an S1 attached at the north pole, see
Figure 1.4b.
Give an informal argument that X and Y are homotopy equivalent. Bonus ques-
tion: Are they also homeomorphic?
We note that in general showing existence of a map with certain properties (e.g., a
homeomorphism, isotopy, homotopy) is easy: just give a map and show that it satisfies
the required properties. On the other hand, showing that such a map cannot exist is
hard, as there are usually infinitely many candidate maps. The idea of algebraic topology
is to construct invariants preserved by these maps. Then, we know that no map can exist
between spaces on which these invariants differ. An example of such an invariant is the
number of “holes” a space has, which we will formalize when we introduce the notion of
homology.
1.4 Algebra
In this section we introduce the necessary background in algebra for the basics of homology theory. Just as for topology, we first introduce the objects of study, followed by the maps between them.
Definition 1.25. A group (G, +) is a set G together with a binary operation “+” such
that
1. ∀a, b ∈ G: a + b ∈ G
2. ∀a, b, c ∈ G: (a + b) + c = a + (b + c) (Associativity)
3. ∃0 ∈ G: a + 0 = 0 + a = a ∀a ∈ G
4. ∀a ∈ G∃ − a ∈ G: a + (−a) = 0
(G, +) is abelian if we also have
5. ∀a, b ∈ G: a + b = b + a (Commutativity)
Examples:
• The moves of a Rubik’s cube also form a group (with the operation being concate-
nation), but not an abelian one: let L denote moving the left face clockwise, and
let U denote moving the upper face clockwise. Replacing “clockwise” by counter-
clockwise we get −L and −U, respectively. Now, if the group was abelian, then
L + U − L − U should give the same configuration again, but if you do these moves
on a Rubik’s cube, you will see that the configuration has changed.
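The same phenomenon can be observed in a much smaller group. The following Python sketch is our own illustration, using the symmetric group S3 (permutations of three elements under composition) rather than the Rubik's cube group: it exhibits two elements that do not commute.

```python
def compose(p, q):
    """Composition of permutations given as tuples: (p ∘ q)(i) = p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(p)))

# two elements of the symmetric group S3, a group under composition
a = (1, 0, 2)  # swap positions 0 and 1
b = (0, 2, 1)  # swap positions 1 and 2

assert compose(a, b) != compose(b, a)  # the group is not abelian

identity = (0, 1, 2)
assert compose(a, a) == identity  # a transposition is its own inverse
```

The Rubik's cube group exhibits exactly this behavior, just on a much larger set of configurations.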
As groups can be very large, even infinite, it can be useful to have a concise way of
writing them:
Examples:
• The six standard moves of the Rubik’s cube (rotating the top, bottom, front, back,
left, or right layer clockwise by 90◦ ) are a generator for the Rubik’s cube moves.
Exercise 1.27. A cyclic group is a group G that contains an element g ∈ G such that
{g} is a generator of G. Show that every cyclic group is abelian (commutative).
Exercise 1.28. Consider a Rubik’s cube. Prove that no move (sequence of elementary
moves) X exists such that every Rubik’s cube can be solved by repeatedly applying
X.
Definition 1.29. For some group (G, +), H ⊆ G is a subgroup, if (H, +) is also a group.
For example, the even integers (including 0) are a subgroup of (Z, +). Subgroups are
important in group theory, as they can be used to partition a group into several parts:
Examples:
• R/Z is the circle group (the multiplicative group of all complex numbers of absolute value 1). (You should try to convince yourself why.)
In order to compare groups with each other, we again want a notion of maps between
groups, that behave well with the group structures:
cokernel coker h := H/ im h
Note that we are assuming something in our definition of the cokernel: for the defi-
nition of a quotient group to apply, we need the divisor group to be a subgroup of the
dividend group. Luckily, the following lemma says that im h is always a subgroup of H.
Lemma 1.32. ker h and im h are subgroups of (G, +) and (H, ⋆), respectively.
3. ∀a ∈ G : h(0) ⋆ h(a) = h(0 + a) = h(a), and thus h(0) = 0, from which 0 ∈ ker h
follows.
Exercise 1.34. For two Abelian groups (G, ⋆) and (H, +), let the set of all homomor-
phisms f : G → H be denoted by Hom(G, H).
(a) Show that for any groups G, H, (Hom(G, H), ⊕), where the operation ⊕ is defined as
(f ⊕ g)(x) = f(x) + g(x), ∀x ∈ G,
is also a group.
(b) Show Hom(Z2^2 , Z2 ) ≅ Z2^2 , i.e., the groups are isomorphic.
As the example of the integers shows, a big motivation for the study of groups comes
from number theory. However, in number theory we do not only have addition but also
multiplication. This motivates the following definition:
2. ∀a, b, c ∈ R:
(a · b) · c = a · (b · c), (Associativity of ·)
a · (b + c) = a · b + a · c and (b + c) · a = b · a + c · a. (Distributivity)
Definition 1.36. A commutative ring in which every non-zero element has a multi-
plicative inverse (∀a ∈ R \ {0}, ∃b ∈ R : a · b = 1) is called a field.
Another important area of algebra, which you already know, is linear algebra. Here, vectors can be added and subtracted. Further, the elements of the field of real numbers are called scalars, and they can be multiplied with vectors. So, we have very similar operations at hand. This motivates the following generalization of the concept of vector spaces.
1. r ⊗ (x + y) = (r ⊗ x) ⊕ (r ⊗ y)
2. (r + r ′ ) ⊗ x = (r ⊗ x) ⊕ (r ′ ⊗ x)
3. 1 ⊗ x = x
4. (r · r ′ ) ⊗ x = r ⊗ (r ′ ⊗ x)
In the literature, often the same symbol (·) is used for both operations · and ⊗, and
+ for both + in R and ⊕ in M. For a vector space, this should feel quite normal, since
for the vector space Rn (which is an R-module), we also write · for multiplying scalars
to both scalars and vectors, and + for addition of both scalars and vectors.
Modules appear all over the place in homology theory. In some cases, in particu-
lar in all the ones we discuss in these lecture notes, the modules happen to be vector
spaces. Thus, most of what we discuss in the following chapters could be phrased using
only language from linear algebra. However, to be consistent with most of the existing literature, we will phrase most results in slightly more generality.
Questions
1. What is a topological space? Give the formal definition and some examples.
2. What is a continuous map between topological spaces? What is a homeomor-
phism? State the definitions and give examples.
3. What is a homotopy? What is a homotopy equivalence? Give the formal
definitions. Further, define deformation retracts and use them to give an alternative
definition of homotopy equivalence.
4. What are groups and the maps between them? State the definitions and prove
that the image and kernel are subgroups.
References
[1] J.R. Munkres, Topology, Prentice Hall, Incorporated, 2000.
[2] A. Hatcher, Algebraic Topology, Cambridge University Press, 2002.
Chapter 2
Homology
A face of a simplex is the convex hull of a subset of its vertices. In particular, every
face of a simplex is also a simplex. The empty set ∅ is also a face. The (k − 1)-faces are
called facets.
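As a small illustration of our own (not part of the notes), the faces of a simplex given by its vertex set are exactly the subsets of that set, and the facets are the subsets with one vertex removed:

```python
from itertools import combinations

def faces(simplex):
    """All faces of a simplex: the subsets of its vertex set
    (including the empty face and the simplex itself)."""
    vs = sorted(simplex)
    return [frozenset(c) for k in range(len(vs) + 1)
            for c in combinations(vs, k)]

triangle = {0, 1, 2}                       # a 2-simplex
fs = faces(triangle)
assert len(fs) == 8                        # 2^3 subsets
facets = [f for f in fs if len(f) == 2]    # the (k-1)-faces
assert len(facets) == 3
```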
Figure 2.2: The left is a simplicial complex. The right is not, as the intersection of
the two triangles is not a face of both of them.
in topological data analysis, we may assume that all simplicial complexes are finite, that
is, consisting of finitely many simplices.
The way we defined them, simplicial complexes are geometric objects. However, we
can also study them in a purely combinatorial setting.
Does every abstract simplicial complex have a geometric realization? For 1-dimensional
complexes (graphs), we know that not all graphs admit a straight-line embedding in the
plane, as only planar graphs admit any embedding, i.e., crossing-free drawing, in the
plane. However, by placing the vertices in R3 in such a way that no four vertices lie on
a common plane, we see that we can always find a geometric realization of a graph in
R3. This generalizes to the following realization theorem:
Theorem 2.5. Every k-dimensional simplicial complex has a geometric realization in
R2k+1.
Proof. Place the vertices as distinct points on the moment curve in R2k+1 , which is
the curve given by f(t) = (t, t2 , . . . , t2k+1 ). This way, any 2k + 2 of the placed points are
affinely independent. Thus, any two faces with disjoint vertex sets will not intersect in
the realization, showing that the realization is indeed an embedding.
Since we now know that abstract and geometric simplicial complexes can be translated
into one another, we will not make the distinction between them again and just use the
word simplicial complex for both objects in the following. As a subset of Euclidean
space, a simplicial complex thus also inherits the subspace topology from Rd , which
allows us to view simplicial complexes as topological spaces.
On the other hand, most topological spaces are not simplicial complexes by definition.
For example, the 2-sphere S2 is not a simplicial complex, as it is not defined by a vertex
set and faces. However, the boundary of a tetrahedron is a simplicial complex, and it is
homeomorphic to S2 , so if we want to work with S2 , from a topologist’s point of view,
we might as well work with the boundary of a tetrahedron instead. This motivates the
following definition.
• For a poset (P, ⩽), the set of all chains of P forms a simplicial complex, called the order complex of P.
Another example of high relevance for topological data analysis is the nerve:
Definition 2.7. For a finite collection U of sets, its nerve N(U) is the simplicial complex on the vertex set U that contains {U0 , . . . , Uk } as a k-simplex if and only if U0 ∩ · · · ∩ Uk ̸= ∅.
In many applications, the considered sets are subsets of some topological space. In
this case, we often want the intersections to be “well-behaved”.
Definition 2.8. Let X be a metric space, and U a finite family of closed subsets of X.
We call U a good cover, if every non-empty intersection of sets in U is contractible
(i.e., homotopy equivalent to a point).
Under these conditions on the sets, we get the following, very powerful theorem, which
allows us to relate complicated spaces (unions of sets) with a much simpler simplicial
complex, namely the nerve. For a proof of this we refer to any textbook on algebraic
topology, for example the one by Hatcher [2].
Theorem 2.9 (Nerve theorem). If U is a good cover, then |N(U)| is homotopy equivalent to ⋃ U.
The nerve theorem also holds if all the sets in U are open with contractible intersec-
tions, but it may fail if some sets in U are closed, and some open: We can have an open
and a closed set which do not intersect, but whose union is connected.
Now that we have defined simplicial complexes, once again we want to study maps
between them. The study of simplicial complexes and the maps between them, as we
will define them, is called combinatorial topology.
Recall that simplicial complexes are topological spaces, so there is also the notion of
continuous maps between them. It can be shown that every simplicial map is continuous.
On the other hand, continuous maps are in general not even vertex maps and thus not
simplicial. Thus, simplicial maps are more restrictive than continuous maps. However,
the difference of the two concepts is smaller than one might think at first glance.
Fact 2.12. Every continuous map f : |K1 | → |K2 | can be approximated arbitrarily
closely by simplicial maps on appropriate subdivisions of K1 and K2 .
This shows that we can consider simplicial maps to be the analogue of continu-
ous maps in the world of simplicial complexes. This begs the question whether other
definitions from topology, such as homotopies or deformation retracts, have simplicial
analogues. As we will see in the next few definitions, they do.
Note that every face that is a superset of a free face is either a maximal face or also
free.
Definition 2.15. A collapse is the operation of removing all faces γ that contain some
fixed free face τ. A simplicial complex is collapsible if there is a sequence of collapses
leading to a point.
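Free faces can be detected mechanically. The following Python sketch is our own illustration, assuming the standard definition that a free face is a face with exactly one proper coface (the notes' Definition 2.14 is not reproduced in this excerpt):

```python
def free_faces(K):
    """Faces of K with exactly one proper coface (the standard
    definition of a free face, assumed here)."""
    K = [frozenset(s) for s in K]
    out = []
    for tau in K:
        cofaces = [s for s in K if tau < s]  # proper supersets in K
        if len(cofaces) == 1:
            out.append(tau)
    return out

# a full triangle together with all of its faces
K = [{0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]
ff = free_faces(K)
assert frozenset({0, 1}) in ff     # each edge lies only in the triangle
assert frozenset({0}) not in ff    # each vertex has three proper cofaces
```

Collapsing along any of the three free edges removes that edge together with the triangle, which begins a collapse of the full triangle down to a point.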
Figure 2.3: Bing’s house with two rooms. Image taken from [2].
from Rd . On the other hand, some topological spaces (the triangulable ones) can be ex-
pressed by simplicial complexes. As for maps, every simplicial map is continuous. On
the other hand, continuous maps between simplicial complexes can be approximated by
simplicial maps between subdivisions of the simplicial complexes. A similar property
holds between homotopic maps and contiguous maps, as well as between deformation
retracts and collapses. In general, we can say that the terms in combinatorial topology are special cases of their “continuous” counterparts, and if we consider triangulable spaces, the continuous terms can be approximated in some way by their combinatorial counterparts.
2.2 Homology
In the following we will make this intuition precise by formally defining the types of subcomplexes we consider, as well as the notions of boundaries and cycles, and how we can mathematically describe the cycles that are not boundaries.
2.2.2 Chains
Let K be a simplicial complex with mp p-simplices.
Definition 2.16. A p-chain c (in K) is a formal sum1 of p-simplices with coefficients from some ring R:

c = ∑_{i=1}^{mp} αi σi .

Two p-chains are added coefficient-wise:

c + c ′ := ∑_{i=1}^{mp} (αi + αi′ )σi .
We write Cp (K) for the set of all p-chains in K, called the p-th chain group. The
following observation shows that this name makes sense:
Observation 2.17. (Cp (K), +) is an abelian group, it is free, and the p-simplices form
a basis.
2. ∀c1 , c2 , c3 ∈ Cp (K): (c1 + c2 ) + c3 = ∑(αi(1) + αi(2) )σi + ∑ αi(3) σi = ∑(αi(1) + αi(2) + αi(3) )σi = ∑ αi(1) σi + ∑(αi(2) + αi(3) )σi = c1 + (c2 + c3 ).
3. 0 = ∑ 0σi ∈ Cp (K).
4. ∀c ∈ Cp (K) we have −c = ∑(−αi )σi ∈ Cp (K) and c + (−c) = ∑(αi − αi )σi = 0.
Commutativity follows from + being commutative, thus the group is abelian. The p-
simplices clearly form a basis, since the set of chains is defined as the set of formal sums
of these p-simplices.
1 A formal sum just means that we formally write a sum, but that there is no meaning behind the operation of adding the simplices.
Observation 2.18. Equipped with the appropriate function · : R×Cp (K) → Cp (K), Cp (K)
is an R-module.
The proof is similar and left as an exercise, but the statement should feel natural
since every chain is simply described by a vector of mp elements of R, with addition
being element-wise addition in R.
From now on we will always work with the ring R = Z2 , so in particular we have
that c + c = 0. With this, we will define homology over Z2 . Using some slightly more
abstract definitions, all of the following can be extended to define homology over any
ring R. For more on this, we refer to any textbook on algebraic topology, e.g. the one by
Hatcher [2].
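Concretely, a p-chain over Z2 is just a vector of mp bits, and chain addition is componentwise addition mod 2, i.e., XOR. A minimal Python sketch of our own:

```python
# chains over Z2 as vectors of 0/1 coefficients; addition is componentwise XOR
def add_chains(c1, c2):
    return [a ^ b for a, b in zip(c1, c2)]

# a complex with m_p = 4 p-simplices; a chain selects a subset of them
c = [1, 0, 1, 1]
zero = [0, 0, 0, 0]
assert add_chains(c, c) == zero    # c + c = 0 over Z2
assert add_chains(c, zero) == c    # 0 is the neutral element
```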
For a p-simplex σ = {v0 , . . . , vp }, its boundary is

δp (σ) = {v1 , . . . , vp } + {v0 , v2 , . . . , vp } + . . . + {v0 , . . . , vp−1 } = ∑_{i=0}^{p} {v0 , . . . , v̂i , . . . , vp }.

In the above notation, v̂i denotes that the element vi is omitted from the set. Note that δp (σ) is a (p − 1)-chain. For some examples, see Figure 2.5.
Figure 2.5: The boundary chains of two different simplices.
Let us apply this definition to the following example. In a slight abuse of notation,
we denote a face {a, b, c} by abc.
(figure: a simplicial complex on the vertices a, b, c, d, e)
Proof. It is enough to show this for simplices, as δp−1 ◦ δp (c) = δp−1 (∑ αi δp (σi )) = ∑ αi (δp−1 ◦ δp (σi )).
For a p-simplex σ, every (p − 2)-face is contained in exactly two (p − 1)-faces, and thus does not appear in δp−1 ◦ δp (σ).
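Over Z2, both the boundary map and the fact that δ ◦ δ = 0 are easy to check by computer. The following Python sketch is our own illustration: boundaries are sets of facets, and chain addition mod 2 is realized as symmetric difference.

```python
from itertools import combinations

def boundary(simplex):
    """δ_p over Z2: the formal sum (here: set) of the facets of a p-simplex."""
    vs = sorted(simplex)
    if len(vs) == 1:
        return set()  # δ_0 = 0: vertices have empty boundary
    return {frozenset(f) for f in combinations(vs, len(vs) - 1)}

def boundary_chain(chain):
    """Extend δ to Z2-chains; addition mod 2 is symmetric difference."""
    out = set()
    for s in chain:
        out ^= boundary(s)
    return out

tri = frozenset({0, 1, 2})
assert boundary(tri) == {frozenset({0, 1}), frozenset({0, 2}), frozenset({1, 2})}

# δ ∘ δ = 0: every vertex lies in exactly two of the three boundary edges,
# so all vertices cancel mod 2
assert boundary_chain(boundary(tri)) == set()
```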
0 = Ck+1 (K) −δk+1→ Ck (K) −δk→ Ck−1 (K) → · · · → C2 (K) −δ2→ C1 (K) −δ1→ C0 (K) −δ0→ C−1 = 0
Recall that Bp = im δp+1 . We will not prove this statement here, but to see that Bp ⊆ Zp , recall that by Lemma 2.20 the boundary of a boundary is empty.
Definition 2.26. The p-th homology group Hp (K; Z2 ) is the quotient group Zp (K)/Bp (K).
Exercise 2.27. Visualize the following simplicial complex K: 0-faces {a, b, c, d, e}, 1-
faces {ab, ac, ad, bc, bd, cd, ce, de} and 2-faces {abc, abd, acd, bcd}. For the dimen-
sions 1 & 2, what are the cycle, boundary, and homology groups of K? Note: You
can express the groups by their generators. You do not need to write out all the
elements.
Exercise 2.28. Give an informal derivation for the homology groups of a torus (see
Figure 2.8). Can you find a space with isomorphic homology that is not homeo-
morphic to the torus?
Exercise 2.29. For a simplicial complex K, its cone CK is the complex with the same
set of vertices plus one additional vertex z, and such that for all simplices in K we
have
{a, b, c, . . .} ∈ K =⇒ {a, b, c, . . . , z} ∈ CK
(b) Show that the homology of the cone CK is 0 in all dimensions d > 0, for any
K.
(c) Bonus: What would happen (intuitively and to the homology) if we extended K
in the same way as before, but with two points? (this is called the suspension
of K)
Here are some nice properties of homology groups, that will be beneficial for us, but
that we will not prove here.
Fact 2.30.
• Hp is a Z2 -vector space.
Remark 2.31. If we consider homology defined over other rings, e.g. over Z instead
of Z2 , the homology groups might not be free.
(figure: a simplicial complex K on the vertices 1, 2, 3, 4 with H1 (K) = {0, 123, 234, 1234} ≅ Z2^2 )
Recall that our original motivation was to count the number of holes. With homology
as we defined it, we have the algebraic structure of a vector space where we can add holes.
The number of distinct holes is now just the dimension of this vector space.
In the definition above, dim denotes the dimension of a vector space as you know it
from Linear Algebra, i.e., dim Hp is the number of elements in a basis of Hp .
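Over Z2, these dimensions can be computed by Gaussian elimination on the boundary matrices, using βp = dim Cp − rank δp − rank δp+1. The following Python sketch is our own illustration (not an algorithm from the notes), encoding each row of a boundary matrix as an integer bitmask.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over Z2 via Gaussian elimination; rows are int bitmasks."""
    pivots = {}  # pivot bit position -> reduced row
    for row in rows:
        while row:
            p = row.bit_length() - 1
            if p in pivots:
                row ^= pivots[p]
            else:
                pivots[p] = row
                break
    return len(pivots)

def betti_numbers(maximal):
    """Betti numbers over Z2 of the complex generated by `maximal`:
    beta_p = dim C_p - rank(delta_p) - rank(delta_{p+1})."""
    faces = set()
    for m in maximal:  # generate all faces of the maximal simplices
        vs = tuple(sorted(m))
        for k in range(1, len(vs) + 1):
            faces.update(frozenset(f) for f in combinations(vs, k))
    top = max(len(f) for f in faces) - 1
    index = {p: {s: i for i, s in enumerate(sorted(
                 (f for f in faces if len(f) == p + 1),
                 key=lambda s: tuple(sorted(s))))}
             for p in range(top + 1)}
    ranks = {}
    for p in range(1, top + 1):
        rows = []
        for s in index[p]:  # one row of delta_p per p-simplex
            mask = 0
            for facet in combinations(tuple(sorted(s)), p):
                mask |= 1 << index[p - 1][frozenset(facet)]
            rows.append(mask)
        ranks[p] = rank_gf2(rows)
    return [len(index[p]) - ranks.get(p, 0) - ranks.get(p + 1, 0)
            for p in range(top + 1)]

# the boundary of a tetrahedron triangulates the sphere S^2
sphere = [{0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}]
assert betti_numbers(sphere) == [1, 0, 1]
```

The output [1, 0, 1] matches the homology of S^2 computed later in this chapter.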
χ = k0 − k1 + k2 − . . .
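The alternating sum can be checked directly on a small example. A Python sketch of our own, computing χ by counting the faces of a complex:

```python
from itertools import combinations

def euler_characteristic(maximal):
    """chi = k_0 - k_1 + k_2 - ..., counting all faces of the complex
    generated by the given maximal simplices."""
    faces = set()
    for m in maximal:
        vs = tuple(sorted(m))
        for k in range(1, len(vs) + 1):
            faces.update(frozenset(f) for f in combinations(vs, k))
    counts = {}
    for f in faces:
        counts[len(f) - 1] = counts.get(len(f) - 1, 0) + 1
    return sum((-1) ** p * kp for p, kp in counts.items())

# boundary of a tetrahedron: k0 = 4, k1 = 6, k2 = 4, so chi = 4 - 6 + 4 = 2,
# which equals the alternating sum of Betti numbers 1 - 0 + 1 of S^2
sphere = [{0, 1, 2}, {0, 1, 3}, {0, 2, 3}, {1, 2, 3}]
assert euler_characteristic(sphere) == 2
```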
Exercise 2.34. Take any vector v = (a0 , . . . , ad ) ∈ Nd+1 with a0 > 0. Show that there
exists a simplicial complex Kv with that vector as its Betti numbers.
Note that in this definition we do not require σ to be injective, thus it would even
be possible to map the simplex to a single point.
We now define Cp the same way as before, but now on the family of all singular
p-simplices, which in general makes the group uncountably infinite. We also define δp
as before, leading to Zp and Bp now also being uncountably infinite. Similarly, Hp (X) =
Zp (X)/Bp (X). The following relates singular homology and simplicial homology.
As isomorphisms for vector spaces are an equivalence relation, we also get the desired
independence of the triangulation.
Corollary 2.37. Let K1 , K2 be two distinct triangulations of X. Then, Hp (K1 ) ≅ Hp (K2 ) for all p ⩾ 0, that is, homology is independent of the chosen triangulation.
For the remainder of these notes, we will only work with simplicial homology, but we
often talk about the homology of a triangulable space without specifying a triangulation.
The above corollary gives us the right to do this.
Further, the 0-homology classes are the formal sums of connected components.
H0 (Sd ): Let us first investigate H0 (Sd ). Since all vertices are connected, all vertices are homologous, and H0 (Sd ) = ⟨[v]⟩ ≅ Z2 .
Hd (Sd ): Now, let us check Hd (Sd ). We first compute Zd : Obviously, the zero element is
part of Zd . Furthermore, the d-simplices are exactly the sets σi = {v0 , . . . , v̂i , . . . , vd+1 }. The sum c of all these d-simplices must be a cycle, since every (d − 1)-simplex occurs
in exactly two d-simplices, thus the boundary of c must be empty. Thus, c ∈ Zd . We
cannot have any other cycle, since for any other chain there must be some d-simplex for
which we include one neighbor but not the other, thus this d-simplex would be part of
the boundary. We conclude that Zd (Sd ) = ⟨c⟩.
Since δ(∆d+1 ) is a d-dimensional simplicial complex, and thus does not contain any
(d +1)-simplices, c cannot be a boundary. Since Bd is a subgroup of Zd , we thus get that
Bd (Sd ) is the group containing only 0. Alternatively, we can also get this by noticing
that Cd+1 = 0, and Bd = im δd+1 = 0.
We finally get Hd (Sd ) = Zd /Bd = Zd ≅ Z2 .
Hp (Sd ): Finally, let us go to Hp (Sd ), for 0 < p < d: Let c = ∑ αi σi be a p-cycle. We
aim to show that c is homologous to the 0-chain, i.e., that [c] = 0. Equivalently, we show
that c must be a boundary.
Let σ = (vm0 , . . . , vmp ) be any p-simplex in c which does not include v0 . We will
keep replacing such simplices by simplices which do contain v0 , until we have no more
simplices not containing v0 .
Let b be the (p + 1)-simplex (v0 , vm0 , . . . , vmp ). Note that b ∈ δ(∆d+1 ) and thus δ(b)
is a p-boundary. Also note that σ is in δ(b). Furthermore, σ is the only p-simplex in
δ(b) which does not contain v0 . We now add δ(b) to c, to get c ′ := c + δ(b). Since we
added a boundary, [c] = [c ′ ] (i.e., c and c ′ are homologous). Furthermore, c ′ contains
one fewer p-simplex not containing v0 , when compared to c.
We repeat this process until we reach a cycle c∗ in which every p-simplex contains
v0 . We now claim that c∗ must be the trivial cycle: Assume c∗ contains some p-simplex
a = (v0 , va1 , . . . , vap ). Then, the (p − 1)-simplex a ′ = (va1 , . . . , vap ) is part of δ(a). But,
a ′ cannot be part of the boundary of any other p-simplex in c∗ , since the only p-simplex
containing a ′ as a face while also containing v0 is a. Thus, to have an empty boundary,
c∗ must be 0. We thus have [c∗ ] = 0, and by construction, [c] = [c∗ ], therefore [c] = 0 as
we aimed to prove.
We have proven that every cycle is homologous to 0, and we can conclude that for
all 0 < p < d, Hp (Sd ) = 0.
By these arguments we conclude the following theorem:
Theorem 2.39. For any d > 0, we have

Hp(Sd) = Z2 for p ∈ {0, d}, and Hp(Sd) = 0 else. Consequently,

βp(Sd) = 1 for p ∈ {0, d}, and βp(Sd) = 0 else.
f# : Cp(K1) → Cp(K2)

c = Σ αiσi ↦ f#(c) := Σ αiτi, where τi = f(σi) if f(σi) is a p-simplex in K2, and τi = 0 otherwise.

Note that f(σi) is always a simplex in K2 since f is a simplicial map, but it could be a
simplex of smaller dimension. This is why we have the condition in the above definition
of τi.
The following can be shown with a bit of work:
• f# ◦ δ = δ ◦ f#
f∗ : Hp(K1) → Hp(K2)

[c] = c + Bp(K1) ↦ f#(c) + Bp(K2) = [f#(c)]
Fact 2.40. If Hp (K1 ) and Hp (K2 ) are vector spaces (as they are in e.g. Z2 -homology,
which is what we are using), then f∗ is a linear map.
We also get the following functorial property, which we will not prove: if f : X → Y,
g : Y → Z, then (g ◦ f)∗ = g∗ ◦ f∗ .
Let us look at a small example:
[Figure: the simplicial complexes K1 and K2 on vertices a, b, c, d.]

We consider f : K1 ↪ K2 the inclusion map.

H1(K1) = {0, [abc], [bcd], [abdc]} ≅ Z2^2
and
a 7→ y, b 7→ x, c 7→ y, d 7→ z, e 7→ z.
You can verify easily that f is simplicial. Compute f∗ : Hp (K1 ) → Hp (K2 ) for
0 ⩽ p ⩽ 2.
Exercise 2.42. Which of the following four statements is true for every simplicial map
f?
“If f is {injective, surjective}, then f∗ is {injective, surjective}.”
The following fact has some very powerful consequences, as we will see.
Fact 2.43. If f, g : K1 → K2 are contiguous, f∗ = g∗ .
Note that the definition of induced homology extends from simplicial maps to maps
between any topological spaces. We will not state the exact definitions, but the following
fact is the continuous analogue (remember that two simplicial maps being contiguous is
analogous to two maps being homotopic) of the previous fact.
Fact 2.44. If f, g : X → Y are homotopic, f∗ = g∗ .
The following corollary is very useful to compute the homology of a space, as it gives
us the option to relate it to the homology of a potentially simpler space.
Corollary 2.45. If f : X → Y is a homotopy equivalence (i.e., there exists g : Y → X
such that f ◦ g is homotopic to idY and g ◦ f is homotopic to idX ), then f∗ is an
isomorphism.
In particular, if Y is a deformation retract of X, then Hp (Y) and Hp (X) are isomorphic.
As a special case of the above, we have that a contractible space has the same homology
groups as a point.
Corollary 2.46. If X is contractible, Hp(X) = Z2 for p = 0, and Hp(X) = 0 otherwise.
Exercise 2.47.
Consider the space you get when you glue together two points of a torus. What is
the homology of this space?
Consider the space you get when you simultaneously pierce a balloon at n distinct
locations. What is the homology of this space?
Exercise 2.48. Let f, g : S1 → S1 be continuous maps such that f(−x) = f(x) and
g(−x) = −g(x) for all x ∈ S1 .
a) Convince yourself that f∗ : H1 (S1 ) → H1 (S1 ) is trivial (maps everything to 0)
and that g∗ is an isomorphism.
[Figure: a point x, its image f(x), and the retraction r(x) obtained by following the ray from f(x) through x to the boundary sphere.]
Questions
5. What is a simplicial complex? Define geometric and abstract simplicial com-
plexes and state and prove the realization theorem (Theorem 2.5).
6. What are simplicial and contiguous maps? State the definitions and discuss the
connection to their counterparts in continuous topology.
7. Is every contractible simplicial complex collapsible? Define the notion of col-
lapsibility and describe Bing’s house with two rooms.
8. What is simplicial homology? Explain the intuition and give the formal defini-
tions of chains, boundaries and cycles.
9. Why is the homology of a triangulable space independent of the chosen trian-
gulation? Explain the idea of singular homology.
10. What are the homology groups of a sphere? State and prove the corresponding
theorem (Theorem 2.39).
11. How does a simplicial map between two simplicial complexes induce maps
between their homology groups? Define induced homomorphisms.
12. What is the Brouwer fixed point theorem? State, illustrate and prove the
Brouwer fixed point theorem (Theorem 2.49).
References
[1] Sketches of topology - Bing's house. https://sketchesoftopology.wordpress.com/2010/03/25/bings-house/, accessed: 2023-04-27.
[2] Allen Hatcher, Algebraic topology, Cambridge Univ. Press, Cambridge, 2000.
Chapter 3
Persistence
In the previous chapter, we have studied the homology of fixed simplicial complexes.
In this chapter, we will look at simplicial complexes that vary over time. Let us start
with a small example. Consider the following process of building up a triangle abc. At
time t1 , we add the vertices a and b together with the edge ab. This gives birth to a
single connected component. At time t2 we add the vertex c, giving birth to a second
connected component. At time t3 we add the edge ac, connecting the two components.
We can interpret this as the younger of the components dying. At time t4 we add the
final edge bc, which gives birth to a hole, that is, an element of the homology group H1 .
Finally, at time t5 we add the interior of the triangle, killing the hole born at t4 . We
can summarize this process as follows: we have a connected component that was born
at t1 and survived the entire process, and a connected component that was born at t2
that died again at t3 . Finally, we have a hole born at t4 dying at t5 . Capturing this
information of holes with their birth and death is the motivation of persistent homology.
Persistent homology can be applied to data analysis by defining (in a way that we will
see soon) a process to build up a simplicial complex from point cloud data and computing
the birth and death times of holes. Subtracting the birth time from the death time, we
get the lifespan of a hole, and the underlying idea is that holes with a short lifetime are
a byproduct of the process, whereas holes with a long lifespan convey information about
the shape of the underlying data.
3.1 Filtrations
We start with a mathematical formulation of the process of building up a complex or, more
generally, a topological space. A filtration is a nested sequence of subspaces
F : X0 ⊆ X1 ⊆ X2 ⊆ . . . ⊆ Xn = X.
For each i ⩽ j, we have the inclusion map ιi,j : Xi ↪ Xj. Given these functions ι,
we get induced maps in homology: hp^{i,j} = ι∗ : Hp(Xi) → Hp(Xj). Filtrations are a very
general object that appear naturally in many settings. Let us look at some important
examples of filtrations.
F : K0 ⊆ K1 ⊆ . . . ⊆ Kn = K.
The following simplicial filtration captures the process that is relevant for analyzing
point cloud data.
Definition 3.1. Let (M, d) be a metric space. Let P be a finite subset of M, and
r > 0 a real number. The Čech complex Cr (P) is the nerve of the family of balls
B(p, r) = {x ∈ M|d(p, x) ⩽ r} for all p ∈ P.
Since the balls B(p, r) form a good cover, the nerve theorem tells us that the Čech
complex is homotopy equivalent to the union of the balls.
By looking at the sequence of Čech complexes for increasing r, we get a simplicial
filtration.
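To make Definition 3.1 concrete, here is a small sketch in Python (all helper names are ours, not from the notes) that builds the Čech complex of a planar point set up to dimension 2. It uses the fact that balls of radius r around points in the plane have a common point if and only if the minimum enclosing ball of those points has radius at most r:

```python
from itertools import combinations

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

def meb_radius(pts):
    """Radius of the minimum enclosing ball of 1-3 points in the plane."""
    if len(pts) == 1:
        return 0.0
    if len(pts) == 2:
        return dist(pts[0], pts[1]) / 2
    # Three points: if the circle with some side as diameter covers the third
    # point, it is the minimum enclosing ball; otherwise the circumcircle is.
    for i, j, k in [(0, 1, 2), (0, 2, 1), (1, 2, 0)]:
        center = ((pts[i][0] + pts[j][0]) / 2, (pts[i][1] + pts[j][1]) / 2)
        if dist(center, pts[k]) <= dist(pts[i], pts[j]) / 2 + 1e-12:
            return dist(pts[i], pts[j]) / 2
    (ax, ay), (bx, by), (cx, cy) = pts
    area2 = abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay))  # twice the area
    a, b, c = dist(pts[1], pts[2]), dist(pts[0], pts[2]), dist(pts[0], pts[1])
    return a * b * c / (2 * area2)            # circumradius abc / (4 * area)

def cech(points, r):
    """Čech complex C_r(P) up to dimension 2, as a set of index tuples."""
    simplices = {(i,) for i in range(len(points))}
    for k in (2, 3):
        for combo in combinations(range(len(points)), k):
            if meb_radius([points[i] for i in combo]) <= r:
                simplices.add(combo)
    return simplices

# Equilateral triangle with side 1: all pairs of balls of radius 0.55
# intersect, but the three balls have no common point, since the
# circumradius is 1/sqrt(3) ~ 0.577.
P = [(0.0, 0.0), (1.0, 0.0), (0.5, 3 ** 0.5 / 2)]
print((0, 1, 2) in cech(P, 0.55), (0, 1, 2) in cech(P, 0.6))  # False True
```

Increasing r only ever adds simplices, so evaluating cech(P, r) over a growing sequence of radii yields exactly the Čech filtration described above.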
This definition characterizes the cycles that are present already in Ki and that
are not boundaries even in Kj.

• Hp^{i,j} = 0 for all i < j.

We say that a p-homology class [c] (a p-hole) is born at Ki if [c] ∈ Hp(Ki) but
[c] ∉ Hp^{i−1,i}. Similarly, [c] dies entering Kj, if [c] ≠ 0 in Hp(Kj−1) but
hp^{j−1,j}([c]) = 0.
It is not always obvious which homology class dies. Consider the following filtration:
X1 consists of two points a and b, and in X2 the two points are connected by an edge.
Let us look at H0, that is, the connected components. We have that H0(X1) ≅ Z2^2, with
the natural basis {[a], [b]}. On the other hand, in X2 there is only a single connected
component, and [a] = [b]. So a homology class is dying, but both our basis elements [a]
and [b] survive. What is happening?
It turns out that we were not careful with our choice of basis: H0 (X1 ) can also be
viewed as being generated by [a] and [a + b], and the class [a + b] indeed dies going into
X2. In general, if two homology classes merge, neither of them dies, but their sum does.
There is a consistent choice of basis which allows us to only look at persistent homology
in terms of basis elements, but we do not go into this at this point.
If we have a simplex-wise filtration, we can circumvent the above issue by sorting
homology classes by the time where they were born, and when they merge, we just say
the “younger one” dies. This can be seen as adapting the considered basis along the way.
Persistence pairings are another way around this issue. We add some final complex
Kn+1 which has trivial homology (e.g., obtained by adding all simplices that are not yet present).
Then, we aim to figure out how many holes get born at Ki and die entering Kj . For this,
we define

µp^{i,j} = (βp^{i,j−1} − βp^{i,j}) − (βp^{i−1,j−1} − βp^{i−1,j}), for i < j ⩽ n + 1.
Here, the content of the left parenthesis denotes the number of holes born at or before
Ki which die entering Kj. Conversely, the right parenthesis denotes the number of holes
born strictly before Ki which die entering Kj. Thus, subtracting the two gives the number
of holes born exactly at Ki which die entering Kj. Note that this conveys the information
that we are interested in, but does not require choosing any basis.
The persistence diagram Dgmp(F) is a birth-death diagram which contains a point
for every pair i, j for which µp^{i,j} > 0. If we give each Ki a timestamp ai, the point is
drawn at the coordinates (ai, aj). We give each point multiplicity µp^{i,j}. On the diagram
we add points on the diagonal with infinite multiplicity, for some technical reasons that
will become apparent later. We can also represent the same information by barcodes:
For every i, j, we draw µp^{i,j} many intervals [ai, aj]. This is then called the p-th persistence
barcode.
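As a toy illustration of the formula above, the multiplicities can be computed directly from a table of persistent Betti numbers. The values below are hand-computed for H0 of the triangle filtration from the beginning of the chapter (variable names are ours):

```python
def multiplicity(beta, i, j):
    """mu_p^{i,j} = (beta^{i,j-1} - beta^{i,j}) - (beta^{i-1,j-1} - beta^{i-1,j})."""
    return ((beta[(i, j - 1)] - beta[(i, j)])
            - (beta[(i - 1, j - 1)] - beta[(i - 1, j)]))

# Persistent Betti numbers beta[(i, j)] = rank of h_0^{i,j} for the triangle
# filtration K_1 ⊆ ... ⊆ K_5 (hand-computed; beta[(0, j)] = 0 by convention):
# one component is born at t_1 and survives, a second is born at t_2 and
# dies entering K_3.
beta = {(i, j): 0 for i in range(6) for j in range(6)}
for i in range(1, 6):
    for j in range(i, 6):
        beta[(i, j)] = 1
beta[(2, 2)] = 2   # two connected components at K_2

mu = {(i, j): multiplicity(beta, i, j)
      for i in range(1, 6) for j in range(i + 1, 6)}
print({pair: m for pair, m in mu.items() if m > 0})  # {(2, 3): 1}
```

The single positive multiplicity recovers exactly the component that was born at t2 and died entering K3, without ever choosing a basis.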
induce a simplex-wise filtration? When it does, describe the relation between the
corresponding persistence diagrams.
[Figure content omitted: a filtration F over times t1–t5, the H0 and H1 barcodes, and the diagrams Dgm0(F) and Dgm1(F) with points at ∞.]
Figure 3.2: An example of a filtration with the corresponding barcodes and persistence
diagrams.
the boundary of σ. We try pairing σ to the youngest element ρ of its boundary. If this
element is already paired with some element τ, we replace it by the sum of ρ and the
boundary of τ. We now have a new set of candidate creators. We repeat this process
until we find an unpaired creator we can pair to, or until we cannot continue (there
are no more candidates). If we cannot pair σ to anything, it must be a new creator.
Whatever unpaired creators remain at the end of the algorithm are paired with ∞.
What is the runtime of this algorithm? Let N be the total number of simplices in the
final complex of our filtration. Whenever we add a simplex, each time we replace a
simplex by the boundary of its paired destructor we add at most O(N) simplices, and we
have to do this at most O(N) times. Since we do this for each simplex, we get a runtime of O(N^3).
Surprisingly, this runtime is tight.
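The pairing process above is closely related to the standard boundary-matrix reduction over Z2 (the matrix reduction algorithm of Section 3.3.2); here is a minimal sketch (function and variable names are ours), run on the triangle filtration a, b, ab, c, ac, bc, abc from the chapter introduction:

```python
def reduce_boundary_matrix(columns):
    """Standard persistence reduction over Z_2.

    columns[j] is the set of (filtration indices of) simplices in the
    boundary of simplex j.  Returns the persistence pairs (creator,
    destructor) and the unpaired (essential) creators."""
    reduced = []       # reduced columns, as sets of row indices
    low_inv = {}       # lowest row index -> index of the column having it
    pairs = []
    for j, col in enumerate(columns):
        col = set(col)
        while col and max(col) in low_inv:
            col ^= reduced[low_inv[max(col)]]   # add earlier column mod 2
        reduced.append(col)
        if col:
            low_inv[max(col)] = j
            pairs.append((max(col), j))         # max(col) created, j destroyed
    paired = {i for pair in pairs for i in pair}
    essential = sorted(set(range(len(columns))) - paired)
    return pairs, essential

# Triangle filtration: a, b, ab, c, ac, bc, abc (indices 0..6).
boundaries = [[], [], [0, 1], [], [0, 3], [1, 3], [2, 4, 5]]
print(reduce_boundary_matrix(boundaries))
# ([(1, 2), (3, 4), (5, 6)], [0])
```

The pairs say that b dies when ab is added, c dies when ac is added, and the hole created by bc dies when abc is added; the component created by a (index 0) never dies. The cubic worst case of this reduction matches the O(N^3) bound discussed above.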
Exercise 3.7. Let G be a weighted connected graph, where all edge weights are pairwise
distinct. Consider a filtration that first inserts all vertices (in some arbitrary order)
and then inserts the edges one by one, ordered by increasing weight. What is the
set of destructors?
Represent the results you obtained by a persistence diagram, and also by the
persistence barcodes.
[Figure content omitted: a simplicial complex on vertices a–e whose simplices are numbered 6–14 in insertion order.]
Figure 3.3: The filtration for Exercise 3.8.
Exercise 3.9. A Union-Find data structure is a data structure that maintains disjoint
sets dynamically. Given a ground set X, such a data structure maintains a family
S of disjoint subsets of X, where each subset is represented by the smallest element
contained in it. It supports three operations: MakeSet(x) creates a new set {x}.
FindSet(x) returns the representative (minimum) of the set in S which contains x
(or “no” if x is not contained in any set). Union(x, y) merges the sets containing x
and y into a single one. All of these operations can be implemented in amortized
Θ(α(n)) time, where α is the extremely slowly growing inverse Ackermann function
and can be considered a constant for any real-world application.
Consider a simplicial complex K with its vertices ordered v0 , . . . , vn , and consider
its lower star filtration. Find an algorithm to compute the 0-dimensional persistence
diagram (i.e., the persistence pairings) of K which makes use of a Union-Find data
structure. How many Union-Find operations do you need to perform?
The Čech complex has the nice property that by the nerve theorem, it is homotopy
equivalent to the union of the balls B(p, r). In particular, for nice radii, it will capture
the underlying shape. Sadly, checking whether a large number of balls have a common
intersection can be computationally expensive. Further, the definition requires that
the data points are embedded in a metric space. These two issues motivate the next
definition.
Definition 3.11. Given a finite metric space (P, d) and a real number radius r > 0,
the Vietoris-Rips complex VRr (P) is defined as the simplicial complex containing a
simplex σ if and only if d(p, q) ⩽ 2r for every pair p, q ∈ σ.
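Since VRr(P) only depends on pairwise distances, it can be built as the clique complex of the graph that connects points at distance at most 2r; a minimal sketch (helper names are ours):

```python
from itertools import combinations

def vietoris_rips(points, r, max_dim=2, dist=None):
    """Vietoris-Rips complex VR_r(P) up to dimension max_dim, as index tuples.

    A simplex is included iff all its vertices are pairwise within distance 2r."""
    if dist is None:   # default: Euclidean distance in the plane
        dist = lambda p, q: ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5
    n = len(points)
    close = {(i, j) for i in range(n) for j in range(i + 1, n)
             if dist(points[i], points[j]) <= 2 * r}
    simplices = {(i,) for i in range(n)}
    for k in range(2, max_dim + 2):   # simplices on k vertices
        for combo in combinations(range(n), k):
            if all(pair in close for pair in combinations(combo, 2)):
                simplices.add(combo)
    return simplices

# Equilateral triangle with side 1: for r = 0.55 all pairwise distances are at
# most 2r, so the full triangle is present -- even though the three balls of
# radius 0.55 have no common point, so the Čech complex would omit it.
P = [(0.0, 0.0), (1.0, 0.0), (0.5, 3 ** 0.5 / 2)]
print((0, 1, 2) in vietoris_rips(P, 0.55))  # True
```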
By definition, the Čech complex and the Vietoris-Rips complex for the same radius
and the same point set have the same set of 1-simplices (the same 1-skeleton).
While the Čech complex then contains additional
information about the common intersections of balls, the Vietoris-Rips complex is simply
the clique complex of this 1-skeleton. This makes the Vietoris-Rips complex easier to
compute. Furthermore, we make the following simple observation, showing that the
Vietoris-Rips complex still captures shapes in the data:
Exercise 3.14. Find a point set P ⊂ R2 and a radius r such that its Vietoris-Rips
complex has non-trivial 2-homology, i.e., such that H2(VRr(P)) ≠ 0.
Furthermore, is there a dimension k such that Hk′(VRr(Q)) = 0 for all k′ ⩾ k, all
It is a well-known fact that for a point set in general position (no d + 2 points lie on
a common sphere), there is a unique Delaunay triangulation. Furthermore, in this case
the extended Delaunay complex and the unique Delaunay triangulation coincide.
Definition 3.16. Given a finite point set P ⊂ Rd, the Voronoi diagram is the tessellation
of Rd into the Voronoi cells

Vp = {x ∈ Rd | d(x, p) ⩽ d(x, q) for all q ∈ P}, for all p ∈ P.
Fact 3.17. The nerve of the Voronoi cells of P is the extended Delaunay complex of
P.
Exercise 3.18. Convince yourself that for a point set in R2 , the nerve of the Voronoi
diagram is the extended Delaunay complex. Furthermore, convince yourself that if
the points are in general position (there are no three points that are collinear, and
no four points that are cocircular), then there is a unique Delaunay triangulation.
Based on the Delaunay triangulation, we define the Alpha complex by parameterizing
using a radius as follows:
Definition 3.19. Given a finite point set P ⊂ Rd in general position as well as a real
number radius r > 0, the Alpha complex Delr (P) consists of all simplices σ ∈ Del(P)
for which the circumscribing ball of σ has radius at most r.
The following fact provides us with an alternative definition of the Alpha complex:
Fact 3.20. The Alpha complex Delr (P) is the nerve of the sets B(p, r) ∩ Vp for all
p ∈ P.
Since the Alpha complex is a subcomplex of the Delaunay triangulation (and for large
enough radius is equal to the Delaunay triangulation), it also has complexity O(n⌈d/2⌉).
Further, the above fact together with the Nerve theorem implies that the Alpha complex
Delr (P) is homotopy equivalent to the Čech complex Cr (P).
Exercise 3.21. Is the following true or false? Consider a point set P ⊂ R2 in gen-
eral position and a radius r > 0. Then the Alpha complex (with radius r) is the
intersection of the Čech complex (with radius r) with the Delaunay triangulation.
Definition 3.22. Given a finite point set Q and a point set P ⊃ Q in some metric space,
we say that a simplex σ ⊆ Q is weakly witnessed by x ∈ P \ Q, if d(q, x) ⩽ d(p, x)
for every q ∈ σ and p ∈ Q \ σ.
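The condition of Definition 3.22 translates directly into code; a small sketch for finite sets (names are ours; `dist` is an arbitrary metric):

```python
def weakly_witnessed(sigma, Q, witnesses, dist):
    """Check whether the simplex sigma (a subset of Q) is weakly witnessed
    by some x among the candidate witnesses (points of P \\ Q):
    d(q, x) <= d(p, x) must hold for all q in sigma and all p in Q \\ sigma."""
    outside = [p for p in Q if p not in sigma]
    for x in witnesses:
        if all(dist(q, x) <= dist(p, x) for q in sigma for p in outside):
            return True
    return False

# Points on the real line: Q = {0, 1, 10}, one witness at 0.4.
dist = lambda a, b: abs(a - b)
print(weakly_witnessed({0, 1}, [0, 1, 10], [0.4], dist))   # True
print(weakly_witnessed({0, 10}, [0, 1, 10], [0.4], dist))  # False
```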
Note that the set of weakly witnessed simplices is not downwards closed. We thus
define a simplicial complex by requiring that all faces are weakly witnessed:
Definition 3.23. The Witness complex W(Q, P) is the collection of simplices on Q for
which all faces are weakly witnessed by some point p ∈ P \ Q.
Note that if we take the metric space Rd and let P be the whole Rd, then W(Q, P) =
Del(Q). Since removing witnesses can only remove simplices, we thus get in general
that W(Q, P) ⊆ Del(Q) for any P ⊂ Rd.
To arrive at a filtration, we again have to introduce a parameter r > 0:
Definition 3.24. Given a finite point set Q and a point set P ⊃ Q in some metric space
as well as a real number radius r > 0, the parameterized Witness complex Wr (Q, P)
is defined as follows:
An edge pq is in Wr (Q, P) if it is weakly witnessed by x ∈ P \ Q and d(p, x) ⩽ r and
d(q, x) ⩽ r. A simplex σ is in Wr (Q, P) if all its edges are.
The idea of this complex is that it should approximate the Vietoris-Rips complex on
P. There are theoretical guarantees about this approximation for manifolds of dimension
at most 2, but the parameterized witness complex may fail to capture the topology of
manifolds in dimension 3 and above.
Note that from the definition it is not guaranteed that the parameterized Witness
complex is a subcomplex of the Witness complex.
Definition 3.25. Given two finite point sets Q, P in Rd , as well as a graph G(P) with
vertices in P, we define v : P → Q by sending each point in P to its closest point in
Q. The graph induced complex G(Q, G(P)) contains a simplex σ = {q0 , . . . , qk } ⊂ Q
if and only if there is a clique {p0 , . . . , pk } in G(P) for which v(pi ) = qi .
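A brute-force sketch of Definition 3.25 (names are ours; the map v is passed as a list `nu` of indices into Q):

```python
from itertools import combinations

def graph_induced_complex(num_q, num_p, edges, nu, max_dim=2):
    """G(Q, G(P)): include sigma = {q_0, ..., q_k} iff some clique
    {p_0, ..., p_k} of G(P) satisfies nu[p_i] = q_i (with distinct q_i).

    Vertices of P are 0..num_p-1, vertices of Q are 0..num_q-1;
    edges is the edge set of G(P), nu maps P-indices to Q-indices."""
    adj = {frozenset(e) for e in edges}
    simplices = set()
    for k in range(1, max_dim + 2):   # cliques on k vertices of G(P)
        for clique in combinations(range(num_p), k):
            if all(frozenset(pair) in adj for pair in combinations(clique, 2)):
                sigma = tuple(sorted({nu[i] for i in clique}))
                if len(sigma) == k:   # nu must be injective on the clique
                    simplices.add(sigma)
    return simplices

# Four points of P along a path, mapped to two points of Q.
print(sorted(graph_induced_complex(2, 4, [(0, 1), (1, 2), (2, 3)], [0, 0, 1, 1])))
# [(0,), (0, 1), (1,)]
```

Only the middle edge (1, 2) of G(P) connects the two preimage classes of ν, and it alone induces the edge of the output complex.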
We again parameterize this:
Definition 3.26. Let Gr (P) be the graph on P where pq is an edge if and only if d(p, q) ⩽
2r. The parameterized graph induced complex Gr (Q, P) is defined as G(Q, Gr (P)).
This complex again has theoretical guarantees of approximating the Vietoris-Rips
complex on P ∪ Q.
Exercise 3.27. Let P, Q be point sets and G(P) a graph with P as its vertex set. Let
v : P → Q be the map sending each point of P to its closest point of Q (assume
that this closest point is always unique). Let C be the clique complex of G(P) (the
complex which includes a simplex iff its corresponding vertices in G(P) form a
clique).
Show that v extends to a simplicial map v̄ : C → G(Q, G(P)). Also show that any
simplicial complex K with V(K) = Q for which v has a simplicial extension must
contain G(Q, G(P)).
where we say that ∞ − ∞ = 0 for points with coordinates that are ∞ (i.e., points
in persistence diagrams that correspond to holes that did not die).
Definition 3.29. Let Π = {π : Dgmp(F) → Dgmp(G) | π is bijective} be the set of all
bijections between Dgmp(F) and Dgmp(G). Then, the Bottleneck distance is defined
as

db(Dgmp(F), Dgmp(G)) := inf_{π∈Π} sup_{x∈Dgmp(F)} ||x − π(x)||∞.

The Bottleneck distance thus minimizes the maximum L∞-norm of any pairing, over
all pairings of points.
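For very small diagrams, the definition can be evaluated verbatim by brute force over all bijections; this is exponential and purely illustrative (names are ours), but it shows how the diagonal points of infinite multiplicity enter: each side is padded with the diagonal projections of the other side's points, and matching two diagonal points costs nothing:

```python
from itertools import permutations

def bottleneck_naive(X, Y):
    """Brute-force bottleneck distance between finite lists of off-diagonal
    (birth, death) points; diagonal points of infinite multiplicity are
    modeled by padding each side with the other side's diagonal projections."""
    def d(p, q):
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))
    def proj(p):   # closest diagonal point in the L_inf norm
        m = (p[0] + p[1]) / 2
        return (m, m)
    Xp = [('pt', p) for p in X] + [('diag', proj(q)) for q in Y]
    Yp = [('pt', q) for q in Y] + [('diag', proj(p)) for p in X]
    best = float('inf')
    for perm in permutations(range(len(Yp))):
        cost = 0.0
        for i, j in enumerate(perm):
            if Xp[i][0] == 'diag' and Yp[j][0] == 'diag':
                continue   # matching diagonal to diagonal is free
            cost = max(cost, d(Xp[i][1], Yp[j][1]))
        best = min(best, cost)
    return best

print(bottleneck_naive([(0, 2)], [(0.5, 2.5)]))  # 0.5
print(bottleneck_naive([(0, 2)], []))            # 1.0
```

In the second call the only off-diagonal point has to be matched to the diagonal, which costs half its lifespan.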
Exercise 3.31. Give an algorithm to compute the Bottleneck distance between two
persistence diagrams. Your algorithm should be polynomial in n, where n is the
total number of off-diagonal points in the two persistence diagrams.
The following theorem tells us that this infinity norm and the Bottleneck distance
are closely related:
Proof. Let ft := (1 − t)f + tg for t ∈ [0, 1] be the linear interpolation between f and g.
Note that f0 = f, f1 = g.
We first show that each ft is a simplex-wise monotone function. It is clearly simplex-
wise, and we prove that it is also monotone: Let σ ⊆ τ. Since f and g are monotone, we
have f(σ) ⩽ f(τ) and g(σ) ⩽ g(τ). Thus,
ft (σ) = (1 − t)f(σ) + tg(σ) ⩽ (1 − t)f(τ) + tg(τ) = ft (τ).
Let p ⩾ 0 be fixed. We now draw the family of persistence diagrams Dgmp (Fft )
as a multiset in R2 × [0, 1]. Each off-diagonal point of Xt := Dgmp (Fft ) is of the form
x(t) = (ft (σ), ft (τ), t) for σ being the creator and τ being the destructor. Note that the
persistence pairings (σ, τ) may only change when the order of simplex insertion changes,
which only happens finitely many times when going from t = 0 to t = 1. Let us call
these values 0 = t0 < t1 < t2 < . . . < tn < tn+1 = 1. Without loss of generality, we
assume that at each of these values ti exactly two simplices have the same value fti .
Within each open interval (ti , ti+1 ) the pairings stay constant. Furthermore, every
off-diagonal point x(t) is a linear function of t in all three coordinates, meaning that it
defines a line segment.
At ti+1 , if x(ti+1 ) is an off-diagonal point whose creator and destructor are still paired
after ti+1 , x(t) continues in the same direction after ti+1 .
If on the other hand x(ti+1 ) is an off-diagonal point whose creator and destructor get
paired differently, recall from Exercise Sheet 5, Question 3, that there are exactly two
pairs that swap their creators or destructors, and these creators or destructors that are swapped
must have the same value in fti+1 . In the persistence diagram, this means that two points
vertically or horizontally of each other swap creators/destructors, and there is a unique
continuing line segment for both of them.
Lastly, if x(ti+1 ) is on the diagonal, this means that its previous creator and destructor
now have the same value in fti+1 . There is no continuation for this point.
Every point thus moves along a polygonal path monotone in t. Every such path
is called a vine, and the multiset of all vines is called a vineyard, see Figure 3.5 for
an illustration. Based on this vineyard, we now wish to find a good matching giving
an upper bound on the Bottleneck distance. We simply take the matching where we
match the start point of every vine with its endpoint. To get a bound on the Bottleneck
distance, we simply need to get a bound for the distance of each matched pair.
Between ti and ti+1 we get for δx(t)/δt:

δ/δt ((1 − t)(f(σ), f(τ), t) + t(g(σ), g(τ), t)) = (g(σ) − f(σ), g(τ) − f(τ), 1)

Projecting x(ti+1) and x(ti) to R2 we get two points yi+1, yi such that

||yi+1 − yi||∞ = (ti+1 − ti) · max(|g(σ) − f(σ)|, |g(τ) − f(τ)|) ⩽ (ti+1 − ti) · ||f − g||∞
Thus, since || · ||∞ is a norm and fulfills the triangle inequality, we also have that from
t = 0 to t = 1, the point can move at most ||f − g||∞ . We thus have the desired bound
on the Bottleneck distance.
[Figure 3.5: a vineyard — the vines traced by the diagram points in (birth, death, time)-space, with a direction change at ti+1.]
Exercise 3.34. Show that Theorem 3.33 (Stability for simplicial filtrations) can be
tight for all p ⩾ 0 and all values of ||f − g||∞ .
We do not prove this theorem at this point, but with additional tools that we will
develop in Section 3.6, the proof of this (and of Theorem 3.33) will follow quite easily.
[Figure: three persistence diagrams X, Y1, and Y2.]
only one reasonable matching between X and Y1 , and also only one between X and Y2 : We
simply match each off-diagonal point with its closest point on the diagonal. Since we only
look at the longest edge in this matching, we get db(X, Y1) = db(X, Y2).
We can get rid of this counter-intuitive behavior of the Bottleneck distance by using
the Wasserstein distance.
dW,q(Dgmp(F), Dgmp(G)) := inf_{π∈Π} [ Σ_{x∈Dgmp(F)} (||x − π(x)||∞)^q ]^{1/q}
Intuitively, we now consider the length of all edges in the matching induced by the
bijection, as opposed to just the longest one, but the longer ones get more weight. Note
that for q = ∞, we retrieve the bottleneck distance, that is, dW,∞ = db .
We can see that the stability theorem we proved for Bottleneck distance does not
hold for Wasserstein distance: consider two simplex-wise monotone functions f and g
on a path, as illustrated in Figure 3.6. In both f and g the first vertex on the path is
mapped to 1 and the edges along the path are mapped to increasing odd numbers. In
f the remaining vertices along the path get mapped to increasing even numbers, and in
g to increasing odd numbers. In particular, ||f − g||∞ = 1. In the filtration defined by
f, at every even step we add a vertex, creating a new connected component, which gets
connected to the rest of the path at the next step. Thus, each vertex of the path will
give an off-diagonal point in the 0-persistence diagram, where all of them except the first
one have a lifespan of 1. On the other hand, in the filtration defined by g, we always
add the new vertices and their connecting edge in the same step, thus the 0-persistence
diagram only has a single off-diagonal point with infinite lifespan. In particular, we have
that for arbitrarily long paths we get arbitrarily large Wasserstein distances between the
diagrams for all q < ∞.
A similar counterexample can also be found for topological spaces. Consider the
topological space [0, 1] and the two functions depicted by the curves in Figure 3.7. Here
[Figure content omitted: the two labeled paths for f and g.]
Figure 3.6: Two simplex-wise monotone functions with bounded infinity norm whose
persistence diagrams have unbounded Wasserstein distance.
we again have that ||f − g||∞ ⩽ ϵ, but the Wasserstein distance between the two diagrams
can be made arbitrarily big.
To avoid these types of counterexamples, we only want to consider even nicer func-
tions:
Definition 3.37 (Lipschitz). Let (X, d) be a metric space. A function f : X → R is
Lipschitz if there exists a constant C such that |f(x) − f(y)| ⩽ C · d(x, y) for all
x, y ∈ X.
For these functions we again get stability theorems, that we will not prove here.
Theorem 3.38. Let X be a triangulable, compact metric space. Let f, g : X → R be
Lipschitz functions. Then there exist constants C and k (that may only depend on
X and on the Lipschitz constants of f, g) such that for every p ⩾ 0 and every q ⩾ k,

dW,q(Dgmp(Ff), Dgmp(Fg)) ⩽ C · ||f − g||∞^{1−k/q}.
Theorem 3.39. Let f, g : K → R be simplex-wise monotone functions. Then for all
p ⩾ 0 and all q ⩾ 1,

dW,q(Dgmp(Ff), Dgmp(Fg)) ⩽ ||f − g||q = ( Σ_{σ∈K} |f(σ) − g(σ)|^q )^{1/q}.
[Figure content omitted: the graphs of two functions f, g : [0, 1] → R.]
Figure 3.7: Two functions [0, 1] → R with bounded infinity norm whose persistence
diagrams have unbounded Wasserstein distance.
commutes both ways, i.e., fa′ ◦ ua,a′ = va,a′ ◦ fa, and ua,a′ ◦ fa^{−1} = fa′^{−1} ◦ va,a′.
The basic idea of interleaving distance is to measure how close two persistence mod-
ules are to being isomorphic. For this, we allow ourselves some slack, in the sense that
Ua does not need to map to Va , but it can map to Va+ϵ , as long as all the relevant
maps still behave like they would for an isomorphism. We make this formal in the next
definition.
Definition 3.42 (ϵ-interleaving persistence modules). Let U and V be persistence modules
over R. We say that U and V are ϵ-interleaved if there exist two families of maps,
φa : Ua → Va+ϵ and ψa : Va → Ua+ϵ such that the following four diagrams commute:

φa′ ◦ ua,a′ = va+ϵ,a′+ϵ ◦ φa and ψa′ ◦ va,a′ = ua+ϵ,a′+ϵ ◦ ψa (the square diagrams), and

ψa+ϵ ◦ φa = ua,a+2ϵ and φa+ϵ ◦ ψa = va,a+2ϵ (the triangular diagrams).
Note that if U and V are isomorphic, then they are 0-interleaved: the first type
of diagrams (the square diagrams) are the commutative diagrams in the definition of
isomorphic persistence modules and the second type of diagrams (the triangular
diagrams) collapse to two arrows that say that the maps φa are isomorphisms with
inverses ψa .
Theorem 3.43. Assume U and V are ϵ-interleaved. Let δ > ϵ. Then U and V are
also δ-interleaved.
Proof. Given φ′a : Ua → Va+ϵ we define φa : Ua → Va+δ simply as φa := va+ϵ,a+δ ◦ φ′a.
Symmetrically, we define ψa := ua+ϵ,a+δ ◦ ψ′a. To check that the correct diagrams
commute, we only check the right of every pair of symmetric ones above. We have to
distinguish two cases for the first diagram, a + δ < a ′ + ϵ and a + δ > a ′ + ϵ.
For the first case, we get the following diagram:
[diagrams omitted: Ua → Ua′ on top; below, in the first case the chain Va+ϵ → Va+δ → Va′+ϵ → Va′+δ, and in the second case Va+ϵ → Va′+ϵ → Va+δ → Va′+δ.]
One can now verify that in all of these diagrams the correct paths commute.
Thus, the following definition makes sense:
Exercise 3.46. Let W1 and W2 be two arbitrary vector spaces. Let U be the persistence
module such that Ua = W1 for a ∈ [w, x), and Ua = 0, otherwise. For a, a ′ ∈ [w, x)
we have ua,a ′ being the identity map. For a < w or a ′ ⩾ x (or both), we have ua,a ′
being the zero map. Similarly, we define the persistence module V which is W2 in
a ∈ [y, z) and 0 otherwise.
Show that dI(U, V) ⩽ max((x − w)/2, (z − y)/2).
The underlying ideas that allowed us to define the interleaving distance of persistence
modules can also be applied to filtrations.
define Cr^log = C_{2^r} and similarly VRr^log = VR_{2^r}. Since 2 · 2^r = 2^{r+1}, we have
Cr^log(P) ⊆ VRr^log(P) ⊆ C(r+1)^log(P). We thus get the chain of inclusions

Cr^log ⊆ VRr^log ⊆ C(r+1)^log ⊆ VR(r+1)^log ⊆ C(r+2)^log ⊆ . . .

Since these are all inclusions, all relevant diagrams must commute, and thus we get that
dI(C^log, VR^log) ⩽ 1.
Definition 3.49. A persistence module V is q-tame if all its linear maps va,a′ with a < a′ have finite rank.
Note that in this definition, the q is not a parameter, just a name. All persistence
modules that show up in the context of persistent homology on point clouds are q-tame,
so this condition is not restrictive.
Thus, for every interleaving one can find between two persistence modules or between
filtrations, one immediately gets a bound on the Bottleneck distance. This is a very
powerful result, and the proof of this is out of scope for these lecture notes. One direction
of the proof however follows from a decomposition result of persistence modules, that
we will discuss in Section 3.7. But first, we will look at some examples of how we can use
Theorem 3.50 to prove stability theorems.
Consider two point clouds P, Q in the same metric space X. Let us first consider
the really simple case, where P = {p}, and Q = {q} with d(p, q) = d. Then, B(p, r) ⊆
B(q, r + d). Now, how does this generalize to larger point sets? To get the same kind of
behavior, we need that for every point in P, there exists some point in Q with distance
at most d. This motivates the following distance measure:
Definition 3.52 (Hausdorff distance). Let A, B ⊆ X be compact sets. Then the Hausdorff
distance between A and B is defined as

dH(A, B) := max( sup_{a∈A} inf_{b∈B} d(a, b), sup_{b∈B} inf_{a∈A} d(a, b) ).
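For finite point sets the suprema and infima in this definition are maxima and minima, so it can be computed directly; a small sketch (names are ours):

```python
def hausdorff(A, B, dist):
    """Hausdorff distance between finite non-empty point sets A and B."""
    d_ab = max(min(dist(a, b) for b in B) for a in A)   # directed A -> B
    d_ba = max(min(dist(a, b) for a in A) for b in B)   # directed B -> A
    return max(d_ab, d_ba)

# On the real line: every point of {1} is within 0 of {0, 1},
# but 0 is at distance 1 from {1}, so d_H = 1.
dist = lambda x, y: abs(x - y)
print(hausdorff([0, 1], [1], dist))  # 1
```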
Let dH(P, Q) = d. Then, ∪_{p∈P} B(p, r) ⊆ ∪_{q∈Q} B(q, r + d). From this, we get the
following lemma:
Lemma 3.54. The filtrations given by the Čech complexes of P and Q are d-interleaved.
Proof. Since dH(P, Q) = d, we have the chain of inclusions

∪_{p∈P} B(p, r) ⊆ ∪_{q∈Q} B(q, r + d) ⊆ ∪_{p∈P} B(p, r + 2d) ⊆ . . .

and each union of balls is homotopy equivalent to the corresponding Čech complex by
the nerve theorem. The relevant diagrams commute up to homotopy, since we only chain
together homotopies and inclusion maps.
Proof. By Theorem 3.50, Observation 3.48, and finally Lemma 3.54, we have
U ≅ ⊕_{i∈I} I⟨bi, di⟩.
The intervals ⟨bi, di⟩ are exactly the bars of the barcode if U is a persistent homology module.
Note that unless we have some additional tameness condition on U, I is not guaranteed
to be finite.
Recall that when we talked about persistent homology, we said that there is some
consistent global choice of basis for persistent homology groups. That is a consequence
of the structure theorem. The structure theorem also allows us to prove one direction of
Theorem 3.50, which we will do in the following.
Proof. To prove that dI (I1 , I2 ) ⩾ db (DgmI1 , DgmI2 ), we show that every upper bound
on dI is also an upper bound on db : assume that we have maps φ, ψ showing that the
two modules are ϵ-interleaved. Then, ψa+ϵ ◦ φa = v1^{a,a+2ϵ} , equality holding
because φ, ψ certify the ϵ-interleaving. Consider a ∈ ⟨b1 , d1 ⟩.
Case 1: v1^{a,a+2ϵ} = 0 for all a ∈ ⟨b1 , d1 ⟩. Then, d1 − b1 < 2ϵ, and the (infinity-norm)
distance of (b1 , d1 ) to the diagonal is less than ϵ.
Case 1: The two off-diagonal points are matched to the diagonal. Then, we get that
di − bi ⩽ 2ϵ for both of them, and thus for all ϵ′ > ϵ, I1 and I2 are ϵ′-interleaved with
φ, ψ = 0. Thus, dI ⩽ ϵ.
Case 2: The points are matched with each other. Then, |b2 − b1 | ⩽ ϵ and |d2 − d1 | ⩽ ϵ.
Taking φ, ψ = id we can see that I1 and I2 are ϵ-interleaved. Thus, dI ⩽ ϵ.
Corollary 3.62. Let U, V be p.f.d. persistence modules. Then, dI (U, V) ⩽ db (DgmU, DgmV).
Questions
13. What is a filtration? State the definition and describe different ways in which
filtrations appear in topology and data analysis.
14. What is persistent homology? State the formal definitions and give examples.
15. How can persistent homology be computed? Discuss the two algorithms described
in Section 3.3.
16. What are the Čech and Vietoris-Rips complexes? Give the definitions, discuss
their size and theoretical guarantees, and how they are related.
17. What are the Delaunay and Alpha complexes? Give the definitions, discuss
their size and theoretical guarantees, and how they are related.
18. What is the Witness complex? State the Definition and describe how it relates
to the non-sparse complexes.
19. What is the Graph induced complex? State the Definition and describe how it
relates to the non-sparse complexes.
20. How can we measure distances between persistence diagrams? Discuss Bottleneck
and Wasserstein distance.
21. How stable are filtrations derived from simplex-wise monotone functions with
respect to Bottleneck distance? State, illustrate and prove the stability theorem
(Theorem 3.33).
22. How can we measure distances between persistence diagrams? Define interleaving
distance and discuss its relation to Bottleneck distance.
23. How stable are Čech complexes to perturbations of the underlying point set?
Define Hausdorff distance, state and prove the stability theorem for Čech complexes
(Theorem 3.55).
Chapter 4
Reeb graphs and Mapper
In this chapter we look at another tool in topological data analysis, called Mapper. The
underlying idea of Mapper has its roots in Morse theory, where Georges Reeb defined
a graph to summarize a Morse function on a manifold. We first discuss these graphs,
called Reeb graphs, and then how to mimic the ideas for the case where instead of a
manifold we have point cloud data.
Before we dive into the mathematical details, a short remark about the pronunciation
of the name “Reeb graph”. Georges Reeb, after whom these graphs are named, was a
French mathematician born in the German-speaking region Alsace. Thus, he likely
pronounced his name the German way, that is, with the “ee” pronounced similarly to
the “ea” in “bear” (as opposed to “beer”).
To make sure that nothing weird happens due to some things being infinite, we
assume all of our functions to be levelset tame:
• each levelset f−1 (α) has finitely many connected components, all of which are
path-connected, and
Introduction to TDA 4.1. Reeb Graphs
Figure 4.1: A space X with a function f : X → R, and its Reeb graph Rf .
• the homology groups of the levelsets only change at finitely many critical values.
The Reeb graph itself is just a (continuous) topological space. We call it a graph
since it is 1-dimensional. To arrive at a graph as we know it in combinatorics, we need
to discretize it, that is, to define vertices and edges. There are many ways of doing so,
but we want a minimal one.
Let us look at the neighborhood of some point p in the Reeb graph (as a topological
space). We look at how many ways there exist to go from p towards the direction of
higher f-value (we call this number the up-degree u), and how many ways to go towards
the direction of lower f-value (we call this the down-degree l). Depending on u and l,
we classify p as in Table 4.1.
u     l     Classification
1     1     regular
0     >0    maximum
>0    0     minimum
⩾2    any   up-fork
any   ⩾2    down-fork
Note that a point can fall into multiple of these classes, for example it can be a maxi-
mum and a down-fork simultaneously, or an up-fork and a down-fork simultaneously. We
call the minima, maxima, up-forks, and down-forks critical points. Our discretization
places vertices at the critical points. Note that the graph we get through this process is
not necessarily simple; it may have multi-edges.
Exercise 4.3. Consider a double torus embedded in R3 . You can imagine it as the
result of taking the figure depicted in Figure 4.2 embedded in the plane x3 = 0,
replacing every point by a 3-dimensional ball with radius r < min{d/2, R/2}, and
taking the boundary of the union of these balls.
Figure 4.2: The space blown up to a double torus in Exercise 4.3 (axes x1 , x2 ;
the two circles have radius R and are at distance d).
Draw the Reeb graph for the three functions f1 (x) = x1 , f2 (x) = x2 , and f3 (x) = x3 .
We next consider merge trees and split trees, which are variants of the Reeb graph,
where instead of levelsets, we look at sub-level sets or super-level sets.
Note that in the merge tree, since we only increase the space under consideration,
we never have a connected component that splits. We can only have new connected
components appearing, and connected components merging. This also tells us that the
merge tree (or its discretization) is always a tree.
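Since components of the sub-level sets only appear and merge, a simple union-find sweep already computes the merge events. The following is a small Python sketch (the input conventions, a dict of vertex heights and an edge list, are our own):

```python
def merge_tree(heights, edges):
    """Sweep the vertices by increasing f-value; union-find tracks the
    connected components of the sub-level sets, which only appear and merge.
    Returns a list of merge events (height, surviving root, absorbed root)."""
    adj = {v: [] for v in heights}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {v: v for v in heights}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    events, seen = [], set()
    for v in sorted(heights, key=heights.get):
        seen.add(v)
        for u in adj[v]:
            if u in seen:
                ru, rv = find(u), find(v)
                if ru != rv:               # two components merge at height f(v)
                    events.append((heights[v], rv, ru))
                    parent[ru] = rv
    return events
```

Note that this sweep only works for merge (and, mirrored, split) trees; the full Reeb graph also needs deletions, which is why the algorithm below requires a dynamic connectivity data structure.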
In topological data analysis, we use computers, which cannot handle arbitrary topo-
logical spaces. We thus now look more at Reeb graphs in the context of simplicial
complexes. We consider a simplicial complex K and a function f : |K| → R, which is
piece-wise linear (linear on each simplex). We observe that the Reeb graph then only
depends on the 2-skeleton of K. This is the case since looking at a levelset is the same as
cutting through the simplicial complex. When we cut through a simplex, we generally
get a simplex of one dimension lower. In a simplicial complex, connectivity is completely
determined by the 1-skeleton. Thus, before cutting, the 2-skeleton suffices. Furthermore,
we can see that the critical points are images of the vertices of K. This happens since a
connected component can only appear, disappear, split, or merge at some local maximum
or minimum of the connected component. Since the function is linear, the maximum or
minimum of every simplex is also attained at some vertex. We define the augmented
Reeb graph of a simplicial complex with a PL-function, by just taking all the images of
the vertices as our graph vertices.
How can we compute this augmented Reeb graph? We can do a discrete sweep (or
scan) through the simplicial complex in the order given by f, only stopping at values
a such that f(v) = a for some vertex v. In this sweep, we want to keep track of the
connected components. The levelset f−1 (α) of the 2-skeleton of K is just a graph Gα :
vertices and edges of K induce vertices of Gα , triangles induce edges. We can now go
through our vertices in order, look at these graphs, and update the connected components.
The runtime of this algorithm is determined by the data structure used to manage the
connected components. We want a data structure that can update the connected
components under insertions and deletions of edges and vertices. There are such data structures
that can do each update in amortized time O(log m), where m is the size of the graph.
The size of each such graph is bounded by the total number m of vertices, edges, and triangles in K.
Each such feature appears at one point, and disappears at one point, and we thus have
at most 2m insertions and deletions in total, giving an O(m log m) algorithm. We thus
have the following theorem.
Theorem 4.6. Given a 2-dimensional simplicial complex K with m faces and a piece-
wise linear function f : |K| → R on it, we can compute the augmented Reeb graph
Rf of K with respect to f in time O(m log m).
Exercise 4.7. Consider a simplicial complex K and a PL (piece-wise linear) function
f : |K| → R. What happens to the Reeb graph when you add one additional face to
K and extend f accordingly?
In other words, the Reeb graph captures the 0-homology of the input space X per-
fectly, no matter which levelset tame function f we use.
Sadly, the same does not hold for the 1-homology. Let us consider a torus, as in
Figure 4.3. In general, the choice of the function f can determine whether we capture
a hole or not; consider e.g. a cylinder. Note that for the torus, it is actually the
case that no matter which function f we choose, we cannot capture its 1-homology (this
is non-trivial to show).
On the other hand, we can see that every cycle in the Reeb graph is indeed also a
cycle in the topological space X, and it cannot be filled in, so it is indeed a hole. Thus
we also get the following observation:
Can we somehow formalize which holes we lose? To do this, we split homology into
a “horizontal” and a “vertical” part, where horizontal and vertical are of course relative to
f.
This definition means that we need to be able to find a finite set of levelsets, such
that we can find cycles contained in these levelsets, which are in the homology class h
in Hp (X).
One now wonders whether the set of horizontal homology classes forms a group. Let
this set be H̄p (X). It turns out that it is indeed a group.
Since the horizontal homology is a sub-group, we can now easily define vertical ho-
mology by taking quotient groups.
Definition 4.12. The vertical homology group of X with respect to f is the group
H∨p (X) := Hp (X)/H̄p (X).
Observation 4.13. rank(Hp (X)) = rank(H̄p (X)) + rank(H∨p (X)).
Fact 4.14. The surjection ϕ : X → Rf induces an isomorphism Φ : H∨1 (X) → H∨1 (Rf ).
In other words, when we go from a space X to its Reeb graph, we keep the vertical
homology classes, and lose the horizontal ones.
Here, a 2-manifold is a space that locally at every point looks like R2 . Orientable
means that it has an inside and an outside. A Morse function is a “nice enough”
function defined in terms of some derivatives, which we do not need to specify here.
Definition 4.18. For a Reeb graph Rf , consider the function fϵ : (Rf )ϵ → R defined by
(x, t) ↦ f(x) + t, where (Rf )ϵ := Rf × [−ϵ, ϵ] is the ϵ-thickening of Rf .
The ϵ-smoothing of Rf , denoted by Sϵ (Rf ), is the Reeb graph of (Rf )ϵ with regards to
fϵ .
An example of these definitions can be seen in Figure 4.4. Note that when we say
(Rf )ϵ , we mean an ϵ-thickening of Rf , not a Reeb graph with regards to some function
fϵ . The ϵ-smoothing Sϵ (Rf ) is then a Reeb graph with regards to the function fϵ , but
of (Rf )ϵ , and not of the original space that Rf is the Reeb graph of. Furthermore, when we
write f(x) for some x ∈ Rf , we mean that we extend f to a function f∗ : Rf → R by
setting f∗ (x) to the value that f takes on the levelset component of X corresponding to x
(this is well-defined, since f is constant on this component). We will just call this
function f as well for simplicity.
Definition 4.19. The function ι : Rf → Sϵ (Rf ) with x ↦ [(x, 0)] is the quotiented inclusion
map. Here, [(x, 0)] denotes the equivalence class, or the connected component, that
contains (x, 0) in fϵ−1 (fϵ (x, 0)).
Definition 4.20 (Reeb graph interleaving). Two Reeb graphs Rf and Rg are ϵ-interleaved
if there is a pair of function-preserving maps φ : Rf → Sϵ (Rg ), ψ : Rg → Sϵ (Rf ) such
that the following diagram commutes:
ι ιϵ
Rf Sϵ (Rf ) S2ϵ (Rf )
ψ φ ψϵ φϵ
ι ιϵ
Rg Sϵ (Rg ) S2ϵ (Rg )
Figure 4.4: A Reeb graph Rf , its ϵ-thickening (Rf )ϵ , and the ϵ-smoothing Sϵ (Rf ).
Here, to understand why ιϵ makes sense, we need the following fact, the proof of
which is left as an exercise.
We once again have a stability theorem, which we will not prove here.
Definition 4.24. Let Rf be a Reeb graph of a space X, and u, v ∈ Rf (in the same
connected component), and let π be a path from u to v. We define the height of π
as height(π) = maxx∈π f(x) − minx∈π f(x). To turn this into a distance metric, we
consider Π(u, v), the set of all paths between u and v. Then, the function induced
metric on Rf is defined as
df (u, v) = minπ∈Π(u,v) height(π).
In a sense, df (u, v) is the “thickness” of the thinnest “slice” of the space X in which u
and v are connected.
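For a finite graph with values on the vertices, this metric can be computed by brute force over all candidate "slabs" [a, b] with a, b vertex values. A small sketch (the representation of the input as a vertex-value dict and an edge list is our own convention):

```python
import math
from collections import deque

def induced_metric(f, edges, u, v):
    """Brute-force df(u, v): the smallest height b - a of a slab f^{-1}([a, b])
    (a, b vertex values) in which u and v lie in the same connected component."""
    vals = sorted(set(f.values()))
    best = math.inf
    for a in vals:
        for b in vals:
            if b - a >= best or not (a <= f[u] <= b and a <= f[v] <= b):
                continue
            adj = {}              # subgraph induced by the slab [a, b]
            for x, y in edges:
                if a <= f[x] <= b and a <= f[y] <= b:
                    adj.setdefault(x, []).append(y)
                    adj.setdefault(y, []).append(x)
            queue, seen = deque([u]), {u}
            while queue:          # BFS from u inside the slab
                x = queue.popleft()
                for y in adj.get(x, []):
                    if y not in seen:
                        seen.add(y)
                        queue.append(y)
            if v in seen:
                best = b - a
    return best
```

For example, if u and v are joined by one path through a vertex of value 5 and another through a vertex of value 2, with f(u) = 0 and f(v) = 1, the thinnest connecting slab is [0, 2], so df (u, v) = 2.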
Definition 4.25 (Functional distortion distance). Let Rf and Rg be two Reeb graphs. Let
Φ : Rf → Rg , Ψ : Rg → Rf be continuous functions, but not necessarily function-
preserving. Then, we define correspondence and distortion:
4.3 Mapper
Figure 4.5: A space X, an open cover U of R, the family f∗ (U), and its nerve.
If we take sufficiently nice functions, and sufficiently fine covers, then N(f∗ (U)) is
isomorphic to Rf .
As an example, we look at X being the boundary of the 3-cube [0, 1]3 . We then also
look at Z1 = R2 spanned by the x- and y-axis, with f1 : X → Z1 being the projection onto
this plane. Furthermore, we look at Z2 = R, spanned by just the x-axis, and f2 : X → Z2
being again the projection.
We consider the open cover U2 of Z2 : {(−∞, 1/3), (0, 1), (2/3, +∞)}. For Z1 , we consider
the cover U1 := U2 × U2 .
Figure 4.6: The cover U1 , and the two Mappers M(U2 , f2 ) and M(U1 , f1 ). The
Mapper M(U1 , f1 ) consists of an empty octahedron, with additional filled
tetrahedra attached at the purple vertices. The whole space thus collapses
to an octahedron.
Introduction to TDA 4.4. Multiscale Mapper
Input: In the most general setting, data comes as a finite metric space (P, dP ), for ex-
ample as points in Rd or as vertices of a graph. We also require a cover U of a space Z,
usually Z = R, as input. Finally, we also need a filter function f : P → Z and a clustering
algorithm (which might also require some input parameters).
Algorithm: Since at the moment we only have a discrete metric space, we do not really
have the notion of connected components yet. For every U ∈ U, we thus cluster the pre-
image f−1 (U) using some clustering algorithm, which we can also consider as an input.
Now, we can just consider each cluster Ci as a vertex of some simplicial complex K, and
add a face {C1 , . . . , Ck } to K if these clusters (which are just point sets) have a common
point.
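The pipeline above can be sketched in a few lines of Python. Everything here is illustrative: the names, the representation of U as a list of intervals, building only the 1-skeleton of the nerve, and a simple ϵ-neighborhood-graph clustering standing in for an arbitrary clustering algorithm.

```python
import math

def eps_clusters(idx, points, eps=0.3):
    """Stand-in clustering: connected components of the eps-neighborhood
    graph on the points with indices idx, via union-find."""
    parent = {i: i for i in idx}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i in idx:
        for j in idx:
            if i < j and math.dist(points[i], points[j]) < eps:
                parent[find(i)] = find(j)
    comps = {}
    for i in idx:
        comps.setdefault(find(i), []).append(i)
    return list(comps.values())

def mapper(points, f, intervals, cluster):
    """Mapper on point cloud data: cluster each preimage f^{-1}(U), take one
    vertex per cluster, and add an edge whenever two clusters share a point."""
    clusters = []
    for a, b in intervals:
        pre = [i for i, p in enumerate(points) if a <= f(p) <= b]
        clusters += [frozenset(c) for c in cluster(pre, points)]
    edges = [(i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))
             if clusters[i] & clusters[j]]
    return clusters, edges
```

On 40 points sampled from the unit circle, with filter f(p) = p[0] and three overlapping intervals, this produces four clusters joined in a 4-cycle: the Mapper recovers the loop.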
Definition 4.32. Let U = {Uα }α∈A and V = {Vβ }β∈B be two covers of the same space X.
A map of covers is a map φ : A → B such that for every α ∈ A, we have Uα ⊆ Vφ(α) .
Proof. Let σ ∈ N(U). We need to show that the intersection ⋂β∈φ(σ) Vβ is non-empty.
Indeed,
⋂β∈φ(σ) Vβ = ⋂α∈σ Vφ(α) ⊇ ⋂α∈σ Uα ̸= ∅
Recall that f∗ (U) is the cover of X consisting of the connected components of the
pre-images of the sets of U under f.
Proof. For every α, we have Uα ⊆ Vφ(α) =⇒ f−1 (Uα ) ⊆ f−1 (Vφ(α) ). We now need to go
from these pre-images to their connected components. Since every connected component
of f−1 (Uα ) must lie in a unique connected component of f−1 (Vφ(α) ), our desired map of
covers is given by exactly mapping to this connected component.
If we have multiple maps of covers φ : U → V and ψ : V → W, we can concatenate
them, and f∗ respects this composition: f∗ (ψ ◦ φ) = f∗ (ψ) ◦ f∗ (φ).
Let U = U1 → U2 → · · · → Un , with maps of covers φ1 , . . . , φn−1 , be a sequence of
covers of Z, which we call a cover tower. By applying f∗ we get a cover tower f∗ (U) of X.
Applying homology, we get a sequence of homology groups with induced homomorphisms
between them, i.e., a persistence module:
Hp (N(f∗ (U1 ))) → · · · → Hp (N(f∗ (Un ))),
where the maps are induced by the simplicial maps N(f∗ (φi )).
We can now view Dgmp MM(U, f) as a topological summary of f through the lens
of U.
As opposed to the normal Mapper, at first glance the Multiscale Mapper adds even
more parameters. But a cover tower can be seen as a way of looking at a whole interval
of covers. For example, we can get a cover tower by increasing the size of all intervals
in an interval cover. The features of the data should then show up as robust features that
persist for a long time over this process, while spurious features obtained from choosing
“wrong” Mapper parameters should disappear quickly.
Questions
24. What is a Reeb graph? State the definition and describe how we get the graph
structure.
25. How can we compute the augmented Reeb graph of a piece-wise linear function?
Define the augmented Reeb graph and explain the algorithm to compute it.
26. How much of the homology of the underlying topological space is captured by
the Reeb graph? Explain vertical and horizontal homology.
27. What is the interleaving distance for Reeb graphs? Give the definitions and
state the relevant stability theorems.
28. What is the functional distortion distance for Reeb graphs? Give the definitions
and state the relevant stability theorems.
29. What is the topological Mapper? State the Definition and give an example.
30. How can we use Mapper on point cloud data? Explain the Mapper algorithm
and describe the input parameters.
31. How can we use Mapper on several covers at once? Explain the Multiscale
Mapper.
Chapter 5
Optimal Generators
In some applications, we are not only interested in the number of holes in our data, but
we also want to look at specific holes, that is, we would like to have a representation of
this hole in the data, or even a basis of the homology group. However, in a homology
class, there are many homologous cycles. Furthermore, there are many different choices
of homology classes which form a basis of the homology group. Thus, there are many
different choices for cycles as bases of the homology group. How do we find good bases?
We define a weight function w : Kp → R⩾0 on the p-simplices, and the weight of a
chain is simply the sum of the weights of its simplices, i.e., w(c) = Σi αi w(σi ) for
c = Σi αi σi . The weight of a set of cycles C is then the sum of the weights of its cycles.
We are now interested in cycles that have minimal weight in their homology class, or in
bases with minimum total weight.
We look at this problem in two settings: first we look at the case where we are given
a fixed simplicial complex and we want to find an optimal basis for the homology of this
complex. This can be applied for example if the persistence diagram of a filtration gives
us a range of values in which we expect the complex to nicely capture the shape of the
data. We can then compute an optimal basis for the fixed complex for some value in this
range.
In some applications, we might also want to take a closer look at single intervals in
the persistence barcode, that is, understand a hole that is born at time b and dies at
time d (for example, to decide whether it corresponds to a feature in the data or is
just a consequence of the process). This brings us to the second setting we look at in
this chapter, where we want to find an optimal representative of a persistent homology
class.
Definition 5.1. A set C of cycles is an optimal basis for Hp (K) if it is a basis and there
is no other basis C ′ with w(C ′ ) < w(C).
Introduction to TDA 5.1. Optimal basis of a fixed complex
In a first step, we are going to compute a set of cycles C which contains an optimal
basis. Then, we sort the cycles by increasing weight, and pick the first cycle to be part
of our basis B. Then, we simply iterate through our cycles and add a cycle ci to our
basis if it cannot be written as a linear combination of our current basis. Finally, if c1 is
a boundary, we return B \ {c1 }, and otherwise we return B.
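This greedy procedure can be sketched as follows, assuming (hypothetically) that every cycle already comes with an annotation vector in the sense of Definition 5.4 below; independence is then checked by Gaussian elimination over Z2, with vectors stored as sets of non-zero coordinates:

```python
def greedy_basis(cycles, weight, annotation):
    """Greedy matroid algorithm: process cycles by increasing weight and keep a
    cycle iff its annotation is linearly independent (over Z2) of those kept.
    annotation(c) is the set of coordinates where the binary vector equals 1."""
    basis, pivots = [], {}   # pivots: smallest coordinate -> reduced vector
    for c in sorted(cycles, key=weight):
        v = set(annotation(c))
        while v:
            p = min(v)
            if p not in pivots:      # new pivot: c is independent, keep it
                pivots[p] = v
                basis.append(c)
                break
            v = v ^ pivots[p]        # eliminate coordinate p
        # if v reduced to the empty set, c is dependent (or a boundary): skip
    return basis
```

A cycle whose annotation is the zero vector is a boundary and is skipped automatically, which also takes care of the special treatment of c1 above.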
Assuming that we can do all these steps, it follows from a more general framework
in matroid theory that the computed basis is indeed optimal.
The sets in I are also called the independent sets of the matroid. The inclusion-
maximal sets in I are called bases.
(a) Show that for U being any finite set of vectors in some vector space, the
family I of subsets of U corresponding to linearly independent vectors forms
the family of independent sets of a matroid.
(b) Show that for any graph G = (V, E), the family I of subsets of E corresponding
to forests in G forms the family of independent sets of a matroid.
For the first step of the above algorithm, we need to be able to compute our beginning
set C. Furthermore, we need to be able to check linear independence.
From now on, we will focus on computing a basis for H1 (K). Without loss of generality,
we say that K is 2-dimensional, with n triangles and O(n) edges and vertices. To
compute C, we begin with C = ∅. For all vertices v, we compute the shortest path tree
Tv rooted at v. We can do this for example with Dijkstra’s algorithm. For every edge e
that is not in Tv , we add the unique cycle in Tv ∪ {e} to C. This can be implemented in
O(n2 log n), and yields a set of cycles with |C| ∈ O(n2 ). But we still need to prove that
this set indeed contains an optimal basis.
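The construction of C can be sketched directly; this is essentially the candidate set from Horton's minimum-cycle-basis algorithm. All conventions here (vertices 0..n−1, a dict of weighted undirected edges, cycles returned as Z2 edge sets) are our own, and we assume the graph is connected:

```python
import heapq

def candidate_cycles(n, wedges):
    """For every vertex v: a shortest-path tree Tv (Dijkstra); every non-tree
    edge e contributes the unique cycle in Tv + e, taken as the symmetric
    difference of the two tree paths plus e.  wedges: {(u, v): weight}, u < v."""
    adj = {i: [] for i in range(n)}
    for (u, v), w in wedges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    cycles = []
    for root in range(n):
        dist, par, pq = {root: 0.0}, {root: None}, [(0.0, root)]
        while pq:
            d, x = heapq.heappop(pq)
            if d > dist[x]:
                continue                      # stale queue entry
            for y, w in adj[x]:
                if d + w < dist.get(y, float("inf")):
                    dist[y], par[y] = d + w, x
                    heapq.heappush(pq, (d + w, y))
        def tree_path_edges(x):
            out = set()
            while par[x] is not None:
                out.add(tuple(sorted((x, par[x]))))
                x = par[x]
            return out
        for u, v in wedges:
            if u in par and v in par and par[u] != v and par[v] != u:
                # e = {u, v} is not a tree edge: form the cycle in Tv + e
                chain = tree_path_edges(u) ^ tree_path_edges(v)
                chain ^= {tuple(sorted((u, v)))}
                cycles.append(frozenset(chain))
    return cycles
```

On the 4-cycle graph 0–1–2–3–0 with unit weights, every root contributes one non-tree edge, and each resulting candidate is the full square.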
Proof. Let C∗ be an optimal basis, and towards a contradiction, let c be a cycle contained
in C∗ \ C. As the weights are non-negative, we can assume that c is simple, i.e., no edge
is used multiple times.
Let v be a vertex in c, and let Tv be the corresponding shortest path tree. There must
be an edge e = {u, w} in c, which is not in Tv , since Tv is a tree. Let Πv,u and Πv,w be
the shortest paths from v to u and w, respectively. These paths must be contained in Tv .
Let us similarly consider Π′v,u and Π′v,w , the (shortest) paths from v to u and w in c. We
know that not both Π′v,u = Πv,u and Π′v,w = Πv,w , so w.l.o.g. assume that Π′v,u ̸= Πv,u .
We now define the cycles c1 = {Πv,w , e, Π′v,u } and c2 = {Πv,u , Π′v,u }. We can now see
So, we have finished the first step of our algorithm. It remains to figure out how to
check independence. For this, we introduce annotations.
Definition 5.4. An annotation of p-simplices is a function a : Kp → Z2^g , giving each
p-simplex a binary vector of length g. This extends to chains by sums. An annotation
must fulfill:
• g = βp (K), and
• a(z1 ) = a(z2 ) iff [z1 ] = [z2 ], for all p-cycles z1 , z2 .
Given an annotation, we can now clearly check linear independence of cycles by
simply checking linear independence of a set of vectors, for which we have existing tools
such as Gaussian elimination.
Proposition 5.5. In every simplicial complex K and for every p ⩾ 0, there exists an
annotation of the p-simplices, and it can also be computed.
Proof. (Sketch for p = 1) We can compute a spanning forest T , and let m be the number
of remaining edges. We initialize annotations of length m, and set a(e) = 0 for every
edge in the spanning forest T . For every remaining edge ei , we set aj (ei ) = 1 if and only
if j = i, and 0 otherwise.
For every triangle t, if the annotation of its boundary δt is not 0, we find a non-zero
entry of a(δt), say at index u, add a(δt) to the annotation of every edge e with au (e) = 1,
and then delete the u-th entry from all annotations. One can show that this yields a valid
annotation, and that it can be implemented in O(n3 ); more clever implementations run
in O(nω ).
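For p = 1, the sketch above can be implemented quite directly. In the following sketch (all conventions ours: edges and triangles given as sorted vertex tuples), annotation vectors are stored as sets of non-zero coordinates, so that "deleting the u-th entry" happens implicitly when a(δt) is added:

```python
def annotate_edges(vertices, edges, triangles):
    """Compute an annotation a : K1 -> Z2^g for the edges of a 2-complex.
    Annotation vectors are represented as sets of coordinates equal to 1."""
    parent = {v: v for v in vertices}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    ann, coord = {}, 0
    for u, v in edges:                  # build a spanning forest on the fly
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            ann[(u, v)] = set()         # tree edge: zero vector
        else:
            ann[(u, v)] = {coord}       # non-tree edge: a fresh unit vector
            coord += 1
    for x, y, z in (tuple(sorted(t)) for t in triangles):
        bd = ann[(x, y)] ^ ann[(y, z)] ^ ann[(x, z)]
        if bd:                          # kill the class of this boundary
            u = min(bd)
            for e in edges:
                if u in ann[e]:
                    ann[e] ^= bd        # also removes coordinate u from ann[e]
    return ann
```

On two triangles sharing an edge with only one of them filled, the boundary of the filled triangle gets annotation zero, while the empty triangle keeps a non-zero annotation, as expected.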
Introduction to TDA 5.2. Persistent cycles
Theorem 5.6. Given a 2-dimensional simplicial complex K with n faces and a weight
function w on its edges, we can compute an optimal basis of H1 (K) in time O(nω +
n2 gω−1 ).
We can build a dual graph G by placing a vertex in every (p + 1)-simplex and adding
an edge between two vertices whenever the corresponding (p + 1)-simplices share a
p-simplex. We furthermore add a dummy vertex, which gets connected to all vertices
that only have one neighbor. We make the vertex belonging to the (p + 1)-simplex
which is the destructor of our desired cycle the source. Furthermore, we make the
dummy vertex as well as all vertices belonging to (p + 1)-simplices added after the
destructor into sinks. Edges corresponding to p-simplices added at or before the birth
get capacity equal to their weight, while all other edges get capacity ∞. Then, it turns
out that the p-simplices corresponding to the edges in a minimum cut separating the
source from the sinks form an optimal persistent cycle.
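The minimum cut itself can be computed with any max-flow algorithm. A self-contained Edmonds–Karp sketch (our own minimal implementation; for the undirected dual graph, insert each edge in both directions, and use a large finite constant for the infinite capacities to avoid inf − inf arithmetic):

```python
from collections import deque

def min_cut_side(nodes, cap, s, t):
    """Edmonds-Karp max flow; returns the source side of a minimum s-t cut.
    cap: {(u, v): capacity} for a directed graph on the given nodes."""
    res = dict(cap)                           # residual capacities
    adj = {u: set() for u in nodes}
    for u, v in cap:
        res.setdefault((v, u), 0)
        adj[u].add(v)
        adj[v].add(u)
    while True:
        prev, queue = {s: None}, deque([s])   # BFS for an augmenting path
        while queue and t not in prev:
            x = queue.popleft()
            for y in adj[x]:
                if y not in prev and res[(x, y)] > 0:
                    prev[y] = x
                    queue.append(y)
        if t not in prev:
            break
        path, y = [], t
        while prev[y] is not None:
            path.append((prev[y], y))
            y = prev[y]
        aug = min(res[e] for e in path)       # bottleneck capacity
        for u, v in path:
            res[(u, v)] -= aug
            res[(v, u)] += aug
    side, queue = {s}, deque([s])             # vertices still reachable from s
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in side and res[(x, y)] > 0:
                side.add(y)
                queue.append(y)
    return side
```

The p-simplices dual to the edges crossing from the returned side to its complement then form the candidate persistent cycle.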
This exercise proves one direction of the correctness of the algorithm described above.
The other direction is similar. We get the following result.
For details, we refer to Chapter 5 in the book of Dey and Wang [1].
Questions
32. How can we compute an optimal basis given a set of cycles that contains one?
Explain the algorithm described in Section 5.1. Further, explain annotations and
how they can be used to check linear independence.
33. How can we compute a set of 1-cycles that contains an optimal basis of H1 ?
Describe the algorithm to do this and prove its correctness.
34. How can we compute an optimal persistent cycle? Explain the algorithm described
in Section 5.2.
References
[1] Tamal Krishna Dey and Yusu Wang, Computational topology for data analysis,
Cambridge University Press, 2022.