
SIAM J. MATH. DATA SCI.
Vol. 2, No. 1, pp. 48--74
© 2020 Society for Industrial and Applied Mathematics

A Bayesian Framework for Persistent Homology*


Vasileios Maroulas†, Farzana Nasrin†, and Christopher Oballe†

Abstract. Persistence diagrams offer a way to summarize topological and geometric properties latent in datasets. While several methods have been developed that use persistence diagrams in statistical inference, a full Bayesian treatment remains absent. Relying on the theory of point processes, this paper presents a generalized Bayesian framework for inference with persistence diagrams via a substitution likelihood argument. In essence, we model persistence diagrams as Poisson point processes with prior intensities and compute posterior intensities by adopting techniques from the theory of marked point processes. We then propose a family of conjugate prior intensities via Gaussian mixtures to obtain a closed form of the posterior intensity. Finally, we demonstrate the utility of this generalized Bayesian framework with a classification problem in materials science using Bayes factors.

Key words. Bayesian inference and classification, intensity, marked Poisson point processes, topological data
analysis, high entropy alloys, atom probe tomography

AMS subject classifications. 62F15, 60G55, 62-07

DOI. 10.1137/19M1268719

1. Introduction. A crucial first step in understanding patterns and properties of a crystalline material is determining its crystal structure. For highly disordered metallic alloys, such as high entropy alloys (HEAs), atom probe tomography (APT) gives a snapshot of the local atomic environment; see Figure 1. However, APT has two main drawbacks: experimental noise and an abundance of missing data. Approximately 65% of the atoms in a sample are not
registered in a typical experiment [50], and the spatial coordinates of those identified atoms
registered in a typical experiment [50], and the spatial coordinates of those identified atoms
are corrupted by experimental noise [42]. Understanding the atomic pattern within HEAs
using an APT image requires observation of atomic cubic unit neighborhood cells under a
microscope. This is problematic, as APT may have a spatial resolution approximately the
length of the unit cell under consideration [28, 42]. Hence, the process is unable to see the finer
details of a material, rendering the determination of a lattice structure a challenging problem
[55, 39]. Existing algorithms for detecting the crystal structure [14, 24, 25, 32, 43, 56] are not

* Received by the editors June 17, 2019; accepted for publication (in revised form) November 5, 2019; published electronically February 6, 2020.
https://doi.org/10.1137/19M1268719
Funding: The first author's research was partially supported by ARO W911NF-17-1-0313, NSF MCB-1715794,
and DMS-1821241, Thor Industries/ARL W911NF-17-2-0141, and ARL cooperative agreement W911NF-19-2-0328.
The second author's research was supported by ARO W911NF-17-1-0313. The third author's research was partially
supported by Thor Industries/ARL W911NF-17-2-0141 and W911NF-19-2-0302. The views and conclusions con-
tained in this document are those of the authors and should not be interpreted as representing the official policies,
either expressed or implied, of the Army Research Laboratory or the U.S. Government. The U.S. Government is
authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation
herein.
† Department of Mathematics, University of Tennessee, Knoxville, TN 37996 ([email protected], [email protected], [email protected]).

able to establish the crystal lattice of an APT dataset, as they rely on symmetry arguments
based on identifying repeating parts of molecules. Consequently, the field of atom probe crystallography, i.e., determining the crystal structure from APT data, has emerged in recent years [22, 43]. Algorithms in this field rely on knowing the global lattice structure a priori and
aim to determine local small-scale structures within a larger sample. For some materials, this
information is readily known, while for others, such as HEAs, the global structure is unknown
and must be inferred.

[Figure 1, panels (a)-(c); atom legend: Aluminium, Cobalt, Chromium, Copper, Nickel, Iron]

Figure 1. (a) Image of APT data with atomic neighborhoods shown in detail on the left. Each pixel
represents a different atom, the neighborhood of which is considered. Certain patterns with distinct crystal
structures exist, e.g., the orange region is copper-rich (left), but overall no pattern is identified. Putting a single
atomic cubic unit cell under a microscope, the true crystal structure of the material, which could be either
body-centered cubic (b) or face-centered cubic (c), is not revealed. This distinction is obscured due to further
experimental noise. Notice there is an essential topological difference between the two structures in (b) and (c):
The BCC structure has one atom at its center, whereas the FCC is hollow in its center, but has one atom in
the center of each of its faces.

In this work, we specifically classify unit cells that are either body-centered cubic (BCC)
or face-centered cubic (FCC). These lattice structures are the essential building blocks of
HEAs [61] and have fundamental differences that set them apart in the case of noise-free,
complete materials data. The BCC structure has a single atom in the center of the cube,
while the FCC has a void in its center but has atoms on the center of the cubes' faces; see
Figure 1(b)--(c). These two crystal structures are distinct when viewed through the lens of
topology. Differentiating between the empty space and connectedness of these two lattice
structures allows us to create an accurate classification rule. This fundamental distinction
between BCC and FCC point clouds is captured well by topological methods and explains
the high degree of accuracy in the classification scheme presented herein. Indeed, we offer a
Bayesian classification framework for topological features, and although we focus on materials
science data in this paper, the generalized Bayesian framework can handle arbitrary datasets.
Overall, topological data analysis (TDA) encompasses a broad set of techniques that
explore topological structure in datasets [18, 23, 13, 59]. One of these techniques, persistent homology, associates shapes to data and summarizes salient features with persistence diagrams: multisets of points that represent homological features along with their appearance and disappearance scales [18]. Features of a persistence diagram that exhibit long persistence


describe global topological properties in the underlying dataset, while those with shorter persistence encode information about local geometry and/or noise. Hence, persistence diagrams can be considered multiscale summaries of data's shape. While there are several methods present in the literature to compute persistence diagrams, we adopt geometric complexes that
are typically used for applications of persistent homology to data analysis in various settings,
such as handwriting analysis [2], the study of brain arteries [5, 6], image analysis [8, 12, 11],
neuroscience [15, 54, 4], sensor networks [17, 52], protein structure [21, 31], biology [51, 40, 45],
dynamical systems [29], action recognition [58], signal analysis [35, 34, 47, 36], chemistry [60],
genetics [26], object data [46], etc.
Researchers desire to use persistence diagrams in inference and classification problems.
Several achieve this directly with persistence diagrams [37, 35, 7, 20, 41, 49, 10], while others
elect to first map them into a Hilbert space [9, 48, 1, 57, 19]. The latter approach enables
one to adopt traditional machine learning and statistical tools such as principal component
analysis, random forests, support vector machines, and more general kernel-based learning
schemes. Despite progress toward statistical inference, to the best of our knowledge, a full
Bayesian treatment predicated upon creating posterior distributions of persistence diagrams
is still absent in the literature. The first Bayesian considerations in a TDA context take place
in [41], where the authors discuss a conditional probability setting on persistence diagrams
where the likelihood for the observed point cloud has been substituted by the likelihood for
its associated topological summary.
The homological features in persistence diagrams have no intrinsic order, implying that they are random sets as opposed to random vectors. To that end, we model random persistence
diagrams as Poisson point processes. The defining feature of these point processes is that they
are solely characterized by a single parameter known as the intensity. Utilizing the theory
of marked point processes, we obtain a method for computing posterior intensities that does
not require us to consider explicit maps between input diagrams and underlying parameters,
alleviating the computational burden associated with deriving the posterior intensity from
Bayes' rule alone.
In particular, for a given collection of observed persistence diagrams, we consider the underlying stochastic phenomena generating persistence diagrams to be Poisson point processes
with prior uncertainty captured in presupposed intensities. In applications, one may select
an informative prior by choosing an intensity based on expert opinion, or alternatively choose
an uninformative prior intensity when information is not available. The likelihood functions
in our model represent the level of belief that observed diagrams are representative of the
entire population. We build this analogue using the theory of marked Poisson point processes
[16]. A central idea of this paper is to use the topological summaries of point clouds in place
of the actual point clouds. This provides a powerful tool with applications in wide ranging
fields. The application considered in this paper is the classification of the crystal structure
of materials, which allows scientists to predict the properties of a crystalline material. Our
goal is to view point clouds through their topological descriptors, as this can reveal essential
shape peculiarities latent in the point clouds. Our generalized Bayesian method adopts a
substitution likelihood technique by Jeffreys in [27] instead of considering the full likelihood
for the point cloud. A similar sort of discussion was considered in [41] for defining conditional
probability on persistence diagrams.


Another key contribution of this paper is the derivation of a closed form of the posterior
intensity, which relies on conjugate families of Gaussian mixtures. An advantage of this Gaussian mixture representation is that it allows us to perform Bayesian inference in an efficient and reliable manner. Indeed, this model can be viewed as an analogue of the ubiquitous example in standard Bayesian inference where a Gaussian prior and likelihood yield a Gaussian
posterior. We present a detailed example of our closed form implementation to demonstrate
computational tractability and showcase its applicability by using it to build a Bayes factor
classification algorithm; we test the latter in a classification problem for materials science
data.
The contributions of this work are
1. Theorem 3.1, which provides the generalized Bayesian framework for computing the
posterior distribution of persistence diagrams;
2. Proposition 3.2, which yields a conjugate family of priors based on a Gaussian mixture
for the proposed Bayesian framework; and
3. a classification scheme using Bayes factors considering the posteriors of persistence
diagrams and its application to a materials science problem.
This paper is organized as follows. Section 2 provides a brief overview of persistence
diagrams and general point processes. Our methods are presented in section 3. In particular,
subsection 3.1 establishes the Bayesian framework for persistence diagrams, while subsection
3.2 contains the derivation of a closed form for a posterior distribution based on a Gaussian
mixture model. A classification algorithm with Bayes factors is discussed in section 4. To
assess the capability of our algorithm, we investigate its performance on materials data in
subsection 4.1. Finally, we end with discussions and conclusions in section 5.

2. Background. We begin by discussing preliminary definitions essential for building our model. In subsection 2.1, we briefly review simplicial complexes and provide a formal definition
for persistence diagrams (PDs). Pertinent definitions and theorems from point processes (PPs)
are discussed in subsection 2.2.

2.1. Persistence diagrams. We start by discussing simplices and simplicial complexes, intermediary structures for constructing PDs.
Definition 2.1. A collection of data points $\{v_0, \ldots, v_n\} \subset \mathbb{R}^d \setminus \{0\}$ is said to be geometrically independent if for any set of scalars $t_i \in \mathbb{R}$ with $\sum_{i=0}^{n} t_i = 0$, the equation $\sum_{i=0}^{n} t_i v_i = 0$ implies that $t_i = 0$ for all $i \in \{0, \ldots, n\}$.

Definition 2.2. A $k$-simplex is a collection of $k+1$ geometrically independent elements along with their convex hull: $[v_0, \ldots, v_k] = \left\{\sum_{i=0}^{k} \alpha_i v_i : \alpha_i \geq 0, \ \sum_{i=0}^{k} \alpha_i = 1\right\}$. We say that the vertices $v_0, \ldots, v_k$ span the $k$-dimensional simplex $[v_0, \ldots, v_k]$. The faces of a $k$-simplex $[v_0, \ldots, v_k]$ are the $(k-1)$-simplices spanned by subsets of $\{v_0, \ldots, v_k\}$.

Definition 2.3. A simplicial complex $S$ is a collection of simplices satisfying two conditions: (i) if $\xi \in S$, then all faces of $\xi$ are also in $S$, and (ii) the intersection of two simplices in $S$ is either empty or contained in $S$.
Given a point cloud $X$, our goal is to construct a sequence of simplicial complexes that reasonably approximates the underlying shape of the data. We accomplish this by using the Vietoris--Rips filtration.

Definition 2.4. Let $X = \{x_i\}_{i=0}^{L}$ be a point cloud in $\mathbb{R}^d$, and let $r > 0$. The Vietoris--Rips complex of $X$ is defined to be the simplicial complex $\mathcal{V}_r(X)$ satisfying $[x_{i_1}, \ldots, x_{i_l}] \in \mathcal{V}_r(X)$ if and only if $\mathrm{diam}(x_{i_1}, \ldots, x_{i_l}) < r$. Given a nondecreasing sequence $\{r_n\} \subset \mathbb{R}_+ \cup \{0\}$ with $r_0 = 0$, we denote its Vietoris--Rips filtration by $\{\mathcal{V}_{r_n}(X)\}_{n \in \mathbb{N}}$.

Figure 2. Vietoris--Rips complexes.

A PD $\mathcal{D}$ is a multiset of points in $\mathcal{W} := \mathbb{W} \times \{0, 1, \ldots, d-1\}$, where $\mathbb{W} := \{(b, d) \in \mathbb{R}^2 : d \geq b \geq 0\}$ and each element $(b, d, k)$ represents a homological feature of dimension $k$ that appears at scale $b$ during a Vietoris--Rips filtration and disappears at scale $d$. Intuitively speaking, the feature $(b, d, k)$ is a $k$-dimensional hole lasting for duration $d - b$. Namely, features with $k = 0$ correspond to connected components, $k = 1$ to loops, and $k = 2$ to voids. An illustration of Vietoris--Rips complexes is shown in Figure 2, and an example of a PD is shown in Figure 3.
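For readers who wish to experiment, the following is a minimal sketch (not the authors' implementation, which is in R) of computing a PD from a point cloud via a Vietoris--Rips filtration, assuming the third-party Python packages numpy and ripser are available.

```python
# A minimal sketch: PD of a noisy circle via a Vietoris--Rips filtration,
# assuming the `ripser` Python package is installed.
import numpy as np
from ripser import ripser

# Sample 100 points uniformly from the unit circle.
theta = np.random.uniform(0, 2 * np.pi, size=100)
X = np.column_stack((np.cos(theta), np.sin(theta)))

# Homological features up to dimension 1: k = 0 (components), k = 1 (loops).
dgms = ripser(X, maxdim=1)['dgms']
for k, dgm in enumerate(dgms):
    print(f"k = {k}: {len(dgm)} features (birth, death)")
```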

2.2. Poisson point processes. This section contains basic definitions and fundamental theorems from PPs, primarily Poisson PPs. Detailed treatments of Poisson PPs can be found in [16] and references therein. For the remainder of this section, we take $\mathbb{X}$ and $\mathcal{X}$ to be a Polish space and its Borel $\sigma$-algebra, respectively.

Definition 2.5. A finite PP $\mathcal{P}$ is a pair $(\{p_n\}, \{\mathbb{P}_n\})$ where $\sum_{n=0}^{\infty} p_n = 1$ and $\mathbb{P}_n$ is a symmetric probability measure on $\mathcal{X}^n$, where $\mathcal{X}^0$ is understood to be the trivial $\sigma$-algebra.
The sequence $\{p_n\}$ defines a cardinality distribution, and the measures $\{\mathbb{P}_n\}$ give spatial distributions of vectors $(x_1, \ldots, x_n)$ for fixed $n$. Definition 2.5 naturally prescribes a method for sampling a finite PP: (i) determine the number of points $n$ by drawing from $\{p_n\}$, and then (ii) spatially distribute $(x_1, \ldots, x_n)$ according to a draw from $\mathbb{P}_n$. As PPs model random collections of elements $\{x_1, \ldots, x_n\} \subset \mathbb{X}$ whose order is irrelevant, any sensible construction relying on random vectors should assign equal weight to all permutations of $(x_1, \ldots, x_n)$. This is ensured by the symmetry requirement in Definition 2.5. We abuse notation and write $\mathcal{P}$ for samples from $\mathcal{P}$ as well as their set representations. It proves useful to describe finite PPs by a set of measures that synthesize $p_n$ and $\mathbb{P}_n$ to simultaneously package cardinality and spatial distributions.

Definition 2.6. Let $(\{p_n\}, \{\mathbb{P}_n\})$ be a finite PP. The Janossy measures $\{\mathbb{J}_n\}$ are defined as the set of measures satisfying $\mathbb{J}_n(A) = n!\, p_n \mathbb{P}_n(A)$ for all $n \in \mathbb{N}$ and $A \in \mathcal{X}^n$.


Given a collection of disjoint rectangles $A_1, \ldots, A_n \subset \mathbb{X}$, the value $\mathbb{J}_n(A_1 \times \cdots \times A_n)$ is the probability of observing exactly one element in each of $A_1, \ldots, A_n$ and none in the complement of their union. For applications, we are primarily interested in Janossy measures $\mathbb{J}_n$ that admit densities $j_n$ with respect to a reference measure on $\mathbb{X}$. We are now ready to describe the class of finite PPs that model PDs.

Definition 2.7. Let $\Lambda$ be a finite measure on $\mathbb{X}$, and let $\mu := \Lambda(\mathbb{X})$. The finite PP $\Pi$ is Poisson if, for all $n \in \mathbb{N}$ and disjoint measurable rectangles $A_1 \times \cdots \times A_n \in \mathcal{X}^n$, $p_n = e^{-\mu}\frac{\mu^n}{n!}$ and $\mathbb{P}_n(A_1 \times \cdots \times A_n) = \prod_{i=1}^{n} \frac{\Lambda(A_i)}{\mu}$. We call $\Lambda$ an intensity measure.

Equivalently, a Poisson PP is a finite PP with Janossy measures $\mathbb{J}_n(A_1 \times \cdots \times A_n) = e^{-\mu} \prod_{i=1}^{n} \Lambda(A_i)$. The intensity measure in Definition 2.7 admits a density, $\lambda$, with respect to some reference measure on $\mathbb{X}$. Notice that for all $A \in \mathcal{X}$,

$$\mathbb{E}(|\Pi \cap A|) = \sum_{n=0}^{\infty} p_n \sum_{k=0}^{n} k \binom{n}{k} \mathbb{E}_{\mathbb{P}_n}\left(\mathbb{1}_{A^k \times (A^c)^{n-k}}\right).$$

Elementary calculations then show that $\mathbb{E}(|\Pi \cap A|) = \Lambda(A)$. Thus, we interpret the intensity measure of a region $A$, $\Lambda(A)$, as the expected number of elements of $\Pi$ that land in $A$. The intensity measure serves as an analogue to the first-order moment for a random variable (RV).
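To make Definition 2.7 concrete, here is a minimal sketch of the two-step sampling recipe below Definition 2.5 specialized to a Poisson PP; the choice $\Lambda = \mu \cdot \mathrm{Uniform}([0,1]^2)$ is purely illustrative.

```python
# A minimal sketch of sampling a finite Poisson PP (Definition 2.7):
# draw the cardinality n ~ Poisson(mu), then scatter n points i.i.d.
# from the normalized intensity Lambda / mu.
import numpy as np

rng = np.random.default_rng(0)
mu = 5.0                                  # total mass Lambda(X) = E(|Pi|)

def sample_poisson_pp():
    n = rng.poisson(mu)                   # step (i): draw the cardinality
    return rng.uniform(0.0, 1.0, (n, 2))  # step (ii): scatter n i.i.d. points

# Empirical check of E(|Pi ∩ A|) = Lambda(A) for A = [0, 0.5]^2:
counts = [np.sum(np.all(sample_poisson_pp() < 0.5, axis=1)) for _ in range(10000)]
print(np.mean(counts), "should be close to", mu * 0.25)
```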
The next two definitions involve a joint PP wherein points from one space parameterize distributions for the points living in another. Consequently, we introduce another Polish space $\mathbb{M}$ along with its Borel $\sigma$-algebra $\mathcal{M}$ to serve as the mark space in a marked Poisson PP. These model scenarios in which points drawn from a Poisson PP provide a data likelihood model for Bayesian inference with PPs.

Definition 2.8. Suppose $\ell : \mathbb{X} \times \mathbb{M} \rightarrow \mathbb{R}_+ \cup \{0\}$ is a function satisfying the following: (1) for all $x \in \mathbb{X}$, $\ell(x, \bullet)$ is a probability measure on $\mathbb{M}$, and (2) for all $B \in \mathcal{M}$, $\ell(\bullet, B)$ is a measurable function on $\mathbb{X}$. Then $\ell$ is a stochastic kernel from $\mathbb{X}$ to $\mathbb{M}$.
Definition 2.9. A marked Poisson PP $\Pi_M$ is a finite PP on $\mathbb{X} \times \mathbb{M}$ such that the following hold: (i) $(\{p_n\}, \{\mathbb{P}_n(\bullet \times \mathbb{M})\})$ is a Poisson PP on $\mathbb{X}$, and (ii) for all $(x_1, \ldots, x_n) \in \mathbb{X}^n$ and measurable rectangles $B_1 \times \cdots \times B_n \in \mathcal{M}^n$, $\mathbb{P}_n((x_1, \ldots, x_n) \times B_1 \times \cdots \times B_n) = \frac{1}{n!} \sum_{\pi \in \mathcal{S}_n} \prod_{i=1}^{n} \ell(x_{\pi(i)}, B_i)$, where $\mathcal{S}_n$ is the set of all permutations of $(1, \ldots, n)$ and $\ell$ is a stochastic kernel.

Given a set of observed marks $M = \{y_1, \ldots, y_m\}$, after adopting Definition 2.6, it can be shown [53] that the Janossy densities for the PP induced by $\Pi_M$ on $\mathbb{X}$ given $M$ are

(2.1) $$j_{n|M}(x_1, \ldots, x_n) = \begin{cases} 1_{\emptyset}, & n = m = 0, \\ \sum_{\pi \in \mathcal{S}_n} \prod_{i=1}^{n} p(x_i \mid y_{\pi(i)}), & n = m > 0, \\ 0 & \text{otherwise}, \end{cases}$$

where $p$ is the stochastic kernel for $\Pi_M$ evaluated in $\mathbb{X}$ for a fixed value of $y \in \mathbb{M}$.
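The following is a minimal sketch of evaluating the conditional Janossy density (2.1); the sum over permutations is a matrix permanent, so brute force is feasible only for small $m$, and the Gaussian kernel $p(x \mid y)$ below is an illustrative choice of ours, not one prescribed by (2.1).

```python
# A minimal sketch of evaluating (2.1) by brute-force permutation sums.
import itertools
import numpy as np
from scipy.stats import multivariate_normal

def kernel(x, y, sigma=0.1):
    # Illustrative isotropic Gaussian stochastic kernel p(x | y).
    return multivariate_normal.pdf(x, mean=y, cov=sigma * np.eye(2))

def janossy_density(xs, ys):
    n, m = len(xs), len(ys)
    if n == 0 and m == 0:
        return 1.0
    if n != m:
        return 0.0
    # Sum over all permutations pi of the products of kernel evaluations.
    return sum(np.prod([kernel(xs[i], ys[pi[i]]) for i in range(n)])
               for pi in itertools.permutations(range(m)))

xs = [np.array([0.1, 0.2]), np.array([0.4, 0.6])]
ys = [np.array([0.1, 0.25]), np.array([0.45, 0.55])]
print(janossy_density(xs, ys))
```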
The following theorems allow us to construct new Poisson PPs from existing ones. Their
proofs can be found in [30].


Theorem 2.10 (the superposition theorem). Let $\{\Pi_n\}_{n \in \mathbb{N}}$ be a collection of independent Poisson PPs, each having intensity measure $\Lambda_n$. Then their superposition $\Pi$, given by $\Pi := \bigcup_{n \in \mathbb{N}} \Pi_n$, is a Poisson PP with intensity measure $\Lambda = \sum_{n \in \mathbb{N}} \Lambda_n$.

Theorem 2.11 (the mapping theorem). Let $\Pi$ be a Poisson PP on $\mathbb{X}$ with $\sigma$-finite intensity measure $\Lambda$, and let $(\mathbb{T}, \mathcal{T})$ be a measurable space. Suppose $f : \mathbb{X} \rightarrow \mathbb{T}$ is a measurable function. Write $\Lambda^*$ for the induced measure on $\mathbb{T}$ given by $\Lambda^*(B) := \Lambda(f^{-1}(B))$ for all $B \in \mathcal{T}$. If $\Lambda^*$ has no atoms, then $f \circ \Pi$ is a Poisson PP on $\mathbb{T}$ with intensity measure $\Lambda^*$.

Theorem 2.12 (the marking theorem). The marked Poisson PP in Definition 2.9 has the intensity measure given by $\Lambda_M(C) = \iint_C \Lambda(dx)\,\ell(x, dm)$ for all $C \in \mathcal{X} \times \mathcal{M}$, where $\Lambda$ is the intensity measure for the Poisson PP that $\Pi_M$ induces on $\mathbb{X}$, and $\ell$ is a stochastic kernel.
The final tool we need is the probability generating functional, as it enables us to recover
intensity measures using a notion of differentiation. The probability generating functional can
be interpreted as the PP analogue of the probability generating function.
Definition 2.13. Let $\mathcal{P}$ be a finite PP on a Polish space $\mathbb{X}$. Denote by $\mathcal{B}(\mathbb{C})$ the set of all functions $h : \mathbb{X} \rightarrow \mathbb{C}$ with $\|h\|_{\infty} < 1$. The probability generating functional of $\mathcal{P}$, denoted by $G : \mathcal{B}(\mathbb{C}) \rightarrow \mathbb{R}$, is given by

(2.2) $$G(h) = J_0 + \sum_{n=1}^{\infty} \frac{1}{n!} \int_{\mathbb{X}^n} \prod_{j=1}^{n} h(x_j)\, \mathbb{J}_n(dx_1 \ldots dx_n).$$

Definition 2.14. Let $G$ be the probability generating functional given in (2.2). The functional derivative of $G$ in the direction of $\eta \in \mathcal{B}(\mathbb{C})$ evaluated at $h$, when it exists, is given by $G'(h; \eta) = \lim_{\epsilon \rightarrow 0} \frac{G(h + \epsilon\eta) - G(h)}{\epsilon}$.

It can be shown that the functional derivative satisfies the familiar product rule [33]; namely, for $G = \prod_{i=1}^{m} G_i$,

(2.3) $$G'(h; \gamma) = \sum_{i=1}^{m} G_i'(h; \gamma) \prod_{j \neq i} G_j(h).$$

As proved in [44], the intensity measure $\Lambda$ of the Poisson PP in Definition 2.7 can be obtained by differentiating $G$, i.e.,

(2.4) $$\Lambda(A) = G'(1; \mathbb{1}_A),$$

where $\mathbb{1}_A$ is the indicator function for any $A \in \mathcal{X}$. Generally speaking, one obtains the intensity measure for a general PP through $\Lambda(A) = \lim_{h \rightarrow 1} G'(h; \mathbb{1}_A)$, but the preceding identity suffices for our purposes since we only consider PPs for which (2.2) is defined for all bounded $h$.
Corollary 2.15. The intensity function for the PP whose Janossy densities are listed in (2.1) is $\sum_{i=1}^{m} p(x \mid y_i)$ if $m > 0$ and $0$ otherwise.

Figure 3. (a) An example of a dataset; (b) its PD; (c) its tilted representation.

Proof. We substitute the Janossy measures $\mathbb{J}_n$ for the PP (see (2.1)) into (2.2). This yields $G(h) = \int_{\mathbb{X}^m} \frac{1}{m!} \sum_{\pi \in \mathcal{S}_m} \prod_{i=1}^{m} h(x_i)\, p(x_i \mid y_{\pi(i)})\, dx_i$ if $m > 0$, and $G(h) = 1_{\emptyset}$ otherwise. It is easy to see that the functional derivative (Definition 2.14) of a constant is zero, so our claim follows immediately in the latter case by appealing to (2.4). In the former case, we have

$$\begin{aligned} G(h) &= \int_{\mathbb{X}^m} \frac{1}{m!} \sum_{\pi \in \mathcal{S}_m} \prod_{i=1}^{m} h(x_i)\, p(x_i \mid y_{\pi(i)})\, dx_i \\ &= \frac{1}{m!} \sum_{\pi \in \mathcal{S}_m} \int_{\mathbb{X}^m} \prod_{i=1}^{m} h(x_i)\, p(x_i \mid y_{\pi(i)})\, dx_i \quad \text{(by linearity of the integral)} \\ &= \frac{1}{m!} \sum_{\pi \in \mathcal{S}_m} \left( \prod_{i=1}^{m} \int_{\mathbb{X}} h(x)\, p(x \mid y_{\pi(i)})\, dx \right) \quad \text{(by Fubini's theorem)} \\ &= \prod_{i=1}^{m} \int_{\mathbb{X}} h(x)\, p(x \mid y_i)\, dx \quad \text{(by symmetry)}. \end{aligned}$$

Write $G_i(h)$ for the probability generating functional of the PP with $j_1 = p(x \mid y_i)$ and $j_n = 0$ for $n \neq 1$. We can summarize the preceding string of equalities by $G(h) = \prod_{i=1}^{m} G_i(h)$. We apply the product rule for functional derivatives (see (2.3)) and then substitute $h = 1$ and $\gamma = \mathbb{1}_{x'}$ (the indicator of a singleton) to obtain

$$G'(1; \mathbb{1}_{x'}) = \sum_{i=1}^{m} p(x' \mid y_i) \prod_{j \neq i} \int_{\mathbb{X}} p(x \mid y_j)\, dx = \sum_{i=1}^{m} p(x' \mid y_i).$$

Appealing to (2.4) establishes the claim.


3. Bayesian inference. In this section, we construct a framework for Bayesian inference
with PDs by modeling them as Poisson PPs. First, we derive a closed form for the posterior
intensity given a PD drawn from a finite PP, and then we present a family of conjugate priors
followed by an example.
3.1. Model. Given a PD $\mathcal{D}$, the map $T : \mathbb{W} \rightarrow T(\mathbb{W})$ given by $T(b, d) = (b, d - b)$ defines the tilted representation of $\mathcal{D}$ as $T(\mathcal{D}) = \bigcup_{(b,d,k) \in \mathcal{D}} (T(b, d), k)$; see Figure 3. In what follows, we assume all PDs are given in their tilted representations and, unless otherwise noted, abuse notation by writing $\mathbb{W}$ and $\mathcal{D}$ for $T(\mathbb{W})$ and $T(\mathcal{D})$, respectively. We also fix the homological dimension of features in a PD by defining $\mathcal{D}^k := \{(b, d) \in \mathbb{W} : (b, d, k) \in \mathcal{D}\}$.
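As a minimal sketch, the tilting map applied to a diagram stored as an $(n, 2)$ array of (birth, death) pairs, e.g., one of the arrays returned by ripser above; dropping features with infinite death is our practical convention, not part of the definition.

```python
# A minimal sketch of the tilting map T(b, d) = (b, d - b).
import numpy as np

def tilt(dgm):
    # Birth/persistence coordinates on the tilted wedge.
    dgm = dgm[np.isfinite(dgm[:, 1])]  # drop essential (infinite) features
    return np.column_stack((dgm[:, 0], dgm[:, 1] - dgm[:, 0]))
```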
According to Bayes' theorem, the posterior density is proportional to the product of a likelihood function and a prior. To adapt the Bayesian framework to PDs, we need to define two models. In particular, our Bayesian framework views a random PD as a Poisson PP equipped with a prior intensity, while observed PDs $\mathcal{D}_Y$ are considered to be marks from a marked Poisson PP. This enables modification of the prior intensity by incorporating observed PDs, yielding a posterior intensity based on data. Some parallels between our Bayesian framework and that for RVs are illustrated in Table 1.
Table 1
The parallels between the Bayesian framework for RVs and its counterpart for random PDs.

             Bayesian framework for RVs          Bayesian framework for random PDs
Prior        modeled by a prior density $f$      modeled by a Poisson PP with prior intensity $\lambda$
Likelihood   depends on observed data            stochastic kernel that depends on observed PDs
Posterior    compute the posterior density       a Poisson PP with posterior intensity

Let $(\mathcal{D}_X^k, \mathcal{D}_Y^k) \in \mathbb{W} \times \mathbb{W}$ be a finite PP, and consider the following:
(M1) For $k_1 \neq k_2$, $(\mathcal{D}_X^{k_1}, \mathcal{D}_Y^{k_1})$ and $(\mathcal{D}_X^{k_2}, \mathcal{D}_Y^{k_2})$ are independent.
(M2) For $k$ fixed, $\mathcal{D}_X^k = \mathcal{D}_{X_O}^k \cup \mathcal{D}_{X_V}^k$, and for some $\alpha : \mathbb{W} \rightarrow [0, 1]$, $\mathcal{D}_{X_O}^k$ and $\mathcal{D}_{X_V}^k$ are independent Poisson PPs having intensity functions $\alpha(x)\lambda_{\mathcal{D}_X^k}(x)$ and $(1 - \alpha(x))\lambda_{\mathcal{D}_X^k}(x)$, respectively.
(M3) For $k$ fixed, $\mathcal{D}_Y^k = \mathcal{D}_{Y_O}^k \cup \mathcal{D}_{Y_S}^k$, where the following hold:
  (i) $(\mathcal{D}_{X_O}^k, \mathcal{D}_{Y_O}^k)$ is a marked Poisson PP with a stochastic kernel density $\ell(y \mid x)$.
  (ii) $\mathcal{D}_{Y_O}^k$ and $\mathcal{D}_{Y_S}^k$ are independent finite Poisson PPs, where $\mathcal{D}_{Y_S}^k$ has intensity function $\lambda_{\mathcal{D}_{Y_S}^k}$.

Hereafter we abuse notation by writing $\mathcal{D}_X$ for $\mathcal{D}_X^k$. The modeling assumption (M1) allows us to develop results independently for each homological dimension $k$ and then combine them using independence. In (M2), the random PD $\mathcal{D}_X$ is modeled as a Poisson PP with prior intensity $\lambda_{\mathcal{D}_X}$. There are two cases we may encounter for any point $x$ from the prior intensity due to the nature of PDs. We assign a probability function $\alpha(x)$ to accommodate these two possibilities. Depending upon the noise level in data, any feature $x$ in $\mathcal{D}_X$ may not be represented in observations; this scenario happens with probability $1 - \alpha(x)$, and we denote this case as $\mathcal{D}_{X_V}$ in (M2). Otherwise, a point $x$ is observed with probability $\alpha(x)$, and this scenario is represented as $\mathcal{D}_{X_O}$ in (M2). Consequently, the intensities of $\mathcal{D}_{X_O}$ and $\mathcal{D}_{X_V}$ are proportional to the intensity $\lambda_{\mathcal{D}_X}$ weighted by $\alpha(x)$ and $1 - \alpha(x)$, respectively, and the total prior intensity for $\mathcal{D}_X$ is given by their sum. (M3) considers an observed PD $\mathcal{D}_Y$ and decomposes it into two independent PDs, $\mathcal{D}_{Y_O}$ and $\mathcal{D}_{Y_S}$. $\mathcal{D}_{Y_O}$ is linked to $\mathcal{D}_{X_O}$ via a marked PP with likelihood $\ell(y \mid x)$ defined in (2.1), whereas the component $\mathcal{D}_{Y_S}$ includes any point $y$ that arises from noise or unanticipated geometry. See Figure 4 for a graphical representation of these ideas.
Figure 4. (a) A sample from the prior PP $\mathcal{D}_X$ and an observed PD $\mathcal{D}_Y$. (b) Decomposition of $\mathcal{D}_X$ into $\mathcal{D}_{X_O}$, $\mathcal{D}_{X_V}$ and of $\mathcal{D}_Y$ into $\mathcal{D}_{Y_O}$, $\mathcal{D}_{Y_S}$. The points in $\mathcal{D}_{X_V}$ have no relationship to those in $\mathcal{D}_Y$, while those in $\mathcal{D}_{X_O}$ only generate observed points in $\mathcal{D}_{Y_O}$. The remaining observed points in $\mathcal{D}_{Y_S}$ model unanticipated features that one may obtain due to uncertainty/noise.

Theorem 3.1 (Bayesian theorem for PDs). Let $\mathcal{D}_X$ be a PD modeled by a Poisson PP as in (M2). Suppose $\mathcal{D}_{X_O}$ and $\mathcal{D}_{X_V}$ have prior intensities $\alpha(x)\lambda_{\mathcal{D}_X}$ and $(1 - \alpha(x))\lambda_{\mathcal{D}_X}$, respectively. Consider $D_{Y^1}, \ldots, D_{Y^m}$, independent samples from the PP that characterizes the PD $\mathcal{D}_Y$ of (M3), and denote $D_{Y^{1:m}} := \bigcup_{i=1}^{m} D_{Y^i}$, where $D_{Y^i} = D_{Y_O^i} \cup D_{Y_S^i}$ for all $i = 1, \ldots, m$. Moreover, $\ell(y \mid x)$ is the likelihood associated with the stochastic kernel between $\mathcal{D}_{X_O}$ and $\mathcal{D}_{Y_O}$, and $\lambda_{\mathcal{D}_{Y_S}}$ is the intensity of $\mathcal{D}_{Y_S}$ as defined in (M3). Then, the posterior intensity of $\mathcal{D}_X$ given $D_{Y^{1:m}}$ is

(3.1) $$\lambda_{\mathcal{D}_X \mid D_{Y^{1:m}}}(x) = (1 - \alpha(x))\lambda_{\mathcal{D}_X}(x) + \alpha(x)\, \frac{1}{m} \sum_{i=1}^{m} \sum_{y \in D_{Y^i}} \frac{\ell(y \mid x)\lambda_{\mathcal{D}_X}(x)}{\lambda_{\mathcal{D}_{Y_S}}(y) + \int_{\mathbb{W}} \ell(y \mid u)\alpha(u)\lambda_{\mathcal{D}_X}(u)\,du} \quad \text{a.s.}$$

The proof of Theorem 3.1 can be found in Appendix A. One important point about the
above theorem is that, instead of relying on a likelihood function for the point cloud data,
our Bayesian model considers the likelihood for the PD generated by the observed point cloud
data at hand. This is analogous to the idea of substitution likelihood by Jeffreys in [27].
3.2. A conjugate family of prior intensities: Gaussian mixtures. This section focuses on constructing a family of conjugate prior intensities, i.e., a collection of priors that yield posterior intensities of the same form when used in (3.1). Exploiting Theorem 3.1 with Gaussian mixture prior intensities, we obtain Gaussian mixture posterior intensities. As PDs are stochastic PPs on the space $\mathbb{W}$, not $\mathbb{R}^2$, we consider a Gaussian density restricted to $\mathbb{W}$. Namely, for a Gaussian density on $\mathbb{R}^2$, $\mathcal{N}(z; \upsilon, \sigma I)$, with mean $\upsilon$ and covariance matrix $\sigma I$, we define its restriction to $\mathbb{W}$ as

(3.2) $$\mathcal{N}^*(z; \upsilon, \sigma I) := \mathcal{N}(z; \upsilon, \sigma I)\,\mathbb{1}_{\mathbb{W}}(z),$$

where $\mathbb{1}_{\mathbb{W}}$ is the indicator function of the wedge $\mathbb{W}$.
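A minimal sketch of (3.2), assuming the tilted representation so that $\mathbb{W}$ is the first quadrant; under that assumption the wedge mass of an isotropic Gaussian factorizes into one-dimensional normal CDFs, which is how the integrals $Q_j^y$ in Proposition 3.2 below can be evaluated.

```python
# A minimal sketch of the restricted Gaussian (3.2) on the tilted wedge.
import numpy as np
from scipy.stats import multivariate_normal, norm

def n_star(z, mean, sigma):
    """Unnormalized density N*(z; mean, sigma*I) = N(z; mean, sigma*I) 1_W(z)."""
    if np.any(np.asarray(z) < 0):
        return 0.0
    return multivariate_normal.pdf(z, mean=mean, cov=sigma * np.eye(2))

def wedge_mass(mean, sigma):
    """Q = integral over W of N(u; mean, sigma*I) du, W = first quadrant."""
    sd = np.sqrt(sigma)
    return norm.cdf(mean[0] / sd) * norm.cdf(mean[1] / sd)
```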


Consider a random PD $\mathcal{D}_X$ as in (M2) and a collection of observed PDs $\{D_{Y^1}, \ldots, D_{Y^m}\}$ that are independent samples from the Poisson PP characterizing the PD $\mathcal{D}_Y$ in (M3). We denote $D_{Y^{1:m}} := \bigcup_{i=1}^{m} D_{Y^i}$. Below we specialize (M2) and (M3) so that applying Theorem 3.1 to a mixed Gaussian prior intensity yields a mixed Gaussian posterior:
(M2$'$) $\mathcal{D}_X = \mathcal{D}_{X_O} \cup \mathcal{D}_{X_V}$, where $\mathcal{D}_{X_O}$ and $\mathcal{D}_{X_V}$ are independent Poisson PPs with intensities $\alpha\lambda_{\mathcal{D}_X}(x)$ and $(1 - \alpha)\lambda_{\mathcal{D}_X}(x)$, respectively, with

(3.3) $$\lambda_{\mathcal{D}_X}(x) = \sum_{j=1}^{N} c_j^{\mathcal{D}_X} \mathcal{N}^*(x; \mu_j^{\mathcal{D}_X}, \sigma_j^{\mathcal{D}_X} I),$$

where $N$ is the number of mixture components.

(M3$'$) $\mathcal{D}_Y = \mathcal{D}_{Y_O} \cup \mathcal{D}_{Y_S}$, where
  (i) the marked Poisson PP $(\mathcal{D}_{X_O}, \mathcal{D}_{Y_O})$ has density $\ell(y \mid x)$ given by

(3.4) $$\ell(y \mid x) = \mathcal{N}^*(y; x, \sigma^{\mathcal{D}_{Y_O}} I).$$

  (ii) $\mathcal{D}_{Y_O}$ and $\mathcal{D}_{Y_S}$ are independent finite Poisson PPs, and $\mathcal{D}_{Y_S}$ has the intensity function given below:

(3.5) $$\lambda_{\mathcal{D}_{Y_S}}(y) = \sum_{k=1}^{M} c_k^{\mathcal{D}_{Y_S}} \mathcal{N}^*(y; \mu_k^{\mathcal{D}_{Y_S}}, \sigma_k^{\mathcal{D}_{Y_S}} I),$$

where $M$ is the number of mixture components.


Proposition 3.2. Suppose that assumptions (M1), (M2$'$), and (M3$'$) hold; then the posterior intensity of (3.1) in Theorem 3.1 is a Gaussian mixture of the form

(3.6) $$\lambda_{\mathcal{D}_X \mid D_{Y^{1:m}}}(x) = (1 - \alpha)\lambda_{\mathcal{D}_X}(x) + \frac{\alpha}{m} \sum_{i=1}^{m} \sum_{y \in \mathcal{D}_{Y^i}} \sum_{j=1}^{N} C_j^y \mathcal{N}^*(x; \mu_j^y, \sigma_j^y I),$$

where

$$C_j^y = \frac{w_j^y}{\lambda_{\mathcal{D}_{Y_S}}(y) + \alpha \sum_{j=1}^{N} w_j^y Q_j^y}; \qquad Q_j^y = \int_{\mathbb{W}} \mathcal{N}(u; \mu_j^y, \sigma_j^y I)\,du;$$

$$w_j^y = c_j^{\mathcal{D}_X} \mathcal{N}(y; \mu_j^{\mathcal{D}_X}, (\sigma^{\mathcal{D}_{Y_O}} + \sigma_j^{\mathcal{D}_X}) I);$$

$$\mu_j^y = \frac{\sigma_j^{\mathcal{D}_X} y + \sigma^{\mathcal{D}_{Y_O}} \mu_j^{\mathcal{D}_X}}{\sigma_j^{\mathcal{D}_X} + \sigma^{\mathcal{D}_{Y_O}}}; \qquad \sigma_j^y = \frac{\sigma^{\mathcal{D}_{Y_O}} \sigma_j^{\mathcal{D}_X}}{\sigma_j^{\mathcal{D}_X} + \sigma^{\mathcal{D}_{Y_O}}}.$$

The proof of Proposition 3.2 follows from well-known results about products of Gaussian densities given below; for more details, the reader may refer to [38] and references therein.

Lemma 3.3. For $p \times p$ matrices $H$, $R$, $P$, with $R$ and $P$ positive definite, and a $p \times 1$ vector $s$, $\mathcal{N}(y; Hx, R)\,\mathcal{N}(x; s, P) = q(y)\,\mathcal{N}(x; \hat{s}, \hat{P})$, where $q(y) = \mathcal{N}(y; Hs, R + HPH^T)$, $\hat{s} = s + K(y - Hs)$, $\hat{P} = (I - KH)P$, and $K = PH^T(HPH^T + R)^{-1}$.
Proof of Proposition 3.2. Using Lemma 3.3, we first derive $\ell(y \mid x)\lambda_{\mathcal{D}_X}(x)$ by seeing that, in our model, $H = I$, $R = \sigma^{\mathcal{D}_{Y_O}} I$, $s = \mu_j^{\mathcal{D}_X}$, and $P = \sigma_j^{\mathcal{D}_X} I$. By typical matrix operations we obtain $K = \frac{\sigma_j^{\mathcal{D}_X}}{\sigma_j^{\mathcal{D}_X} + \sigma^{\mathcal{D}_{Y_O}}} I$, $\hat{s} = \frac{\sigma_j^{\mathcal{D}_X} y + \sigma^{\mathcal{D}_{Y_O}} \mu_j^{\mathcal{D}_X}}{\sigma_j^{\mathcal{D}_X} + \sigma^{\mathcal{D}_{Y_O}}}$, and $\hat{P} = \frac{\sigma^{\mathcal{D}_{Y_O}} \sigma_j^{\mathcal{D}_X}}{\sigma_j^{\mathcal{D}_X} + \sigma^{\mathcal{D}_{Y_O}}} I$. The numerator and denominator of the second term in (3.1) become $\sum_{j=1}^{N} c_j^{\mathcal{D}_X} \mathcal{N}(y; \mu_j^{\mathcal{D}_X}, (\sigma^{\mathcal{D}_{Y_O}} + \sigma_j^{\mathcal{D}_X}) I)\, \mathcal{N}^*(x; \mu_j^y, \sigma_j^y I)$ and $\lambda_{\mathcal{D}_{Y_S}}(y) + \alpha \sum_{j=1}^{N} c_j^{\mathcal{D}_X} \mathcal{N}(y; \mu_j^{\mathcal{D}_X}, (\sigma^{\mathcal{D}_{Y_O}} + \sigma_j^{\mathcal{D}_X}) I) \int_{\mathbb{W}} \mathcal{N}(u; \mu_j^y, \sigma_j^y I)\,du$, respectively, which yield

$$\sum_{j=1}^{N} \left[ \frac{c_j^{\mathcal{D}_X} \mathcal{N}(y; \mu_j^{\mathcal{D}_X}, (\sigma^{\mathcal{D}_{Y_O}} + \sigma_j^{\mathcal{D}_X}) I)}{\lambda_{\mathcal{D}_{Y_S}}(y) + \alpha \sum_{j=1}^{N} c_j^{\mathcal{D}_X} \mathcal{N}(y; \mu_j^{\mathcal{D}_X}, (\sigma^{\mathcal{D}_{Y_O}} + \sigma_j^{\mathcal{D}_X}) I) \int_{\mathbb{W}} \mathcal{N}(u; \mu_j^y, \sigma_j^y I)\,du} \right] \mathcal{N}^*(x; \mu_j^y, \sigma_j^y I),$$

where the bracketed expression is the definition of $C_j^y$.
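Below is a minimal sketch of the closed form (3.6) for a single observed diagram ($m = 1$); it reuses n_star and wedge_mass from the sketch following (3.2), and all parameter names are ours. For $m$ observed diagrams, one averages this data term over the diagrams as in (3.6).

```python
# A minimal sketch of evaluating the posterior intensity (3.6) at a
# point x for one observed (tilted) diagram DY.
import numpy as np
from scipy.stats import multivariate_normal

def posterior_intensity(x, DY, c, mu, s, sigma_yo, lam_ys, alpha):
    """c, mu, s: prior mixture weights, means, scalar covariance factors
    from (3.3); sigma_yo: likelihood scale in (3.4); lam_ys: clutter
    intensity function (3.5); alpha: observation probability from (M2')."""
    N = len(c)
    total = (1.0 - alpha) * sum(c[j] * n_star(x, mu[j], s[j]) for j in range(N))
    for y in DY:
        y = np.asarray(y)
        # Lemma 3.3 with H = I gives component weights, means, and scales.
        w = np.array([c[j] * multivariate_normal.pdf(
            y, mean=mu[j], cov=(sigma_yo + s[j]) * np.eye(2)) for j in range(N)])
        mu_y = [(s[j] * y + sigma_yo * np.asarray(mu[j])) / (s[j] + sigma_yo)
                for j in range(N)]
        s_y = [sigma_yo * s[j] / (s[j] + sigma_yo) for j in range(N)]
        Q = np.array([wedge_mass(mu_y[j], s_y[j]) for j in range(N)])
        denom = lam_ys(y) + alpha * float(np.dot(w, Q))
        total += alpha * sum(w[j] / denom * n_star(x, mu_y[j], s_y[j])
                             for j in range(N))
    return total
```

For instance, the informative prior of Table 2 would correspond to c = [1.0], mu = [np.array([0.5, 1.2])], and s = [0.01], with sigma_yo and a single-component clutter intensity lam_ys set per Table 3.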


3.2.1. Example. Here, we present a detailed example of computing the posterior intensity according to (3.6) for a range of parametric choices. To reproduce these results, the interested reader may download our R package BayesTDA. We consider circular point clouds, often associated with periodicity in signals [35], and focus on estimating homological features with $k = 1$, as they correspond to 1-dimensional holes, which describe the prominent topological feature of a circle. Precisely, our goals are to (i) illustrate posterior intensities and draw analogies to standard Bayesian inference; (ii) determine the relative contributions of the prior and observed data to the posterior; and (iii) perform sensitivity analysis. Case-I is ideal, while Case-II, Case-III, and Case-IV are purposefully disadvantageous.
Table 2
List of Gaussian mixture parameters of the prior intensities in (3.3). The means $\mu_i^{\mathcal{D}_X}$ are $2 \times 1$ vectors and the rest are scalars.

                                  $\mu_i^{\mathcal{D}_X}$   $\sigma_i^{\mathcal{D}_X}$   $c_i^{\mathcal{D}_X}$
informative prior                 (0.5, 1.2)                0.01                         1
weakly informative prior          (0.5, 1.2)                0.2                          1
unimodal uninformative prior      (1, 1)                    1                            1
bimodal uninformative prior       (0.5, 0.5)                0.2                          1
                                  (1.5, 1.5)                0.2                          2

Figure 5. The observed datasets generated for Case-I, Case-II, and Case-III (left to right) by sampling the unit circle and perturbing with Gaussian noise having variances $0.001I_2$, $0.01I_2$, and $0.1I_2$, respectively.

We start by considering a Poisson PP with prior intensity $\lambda_{\mathcal{D}_X}$ that has the Gaussian mixture form given in (M2$'$). We take into account four types of prior intensities: (i) informative, (ii) weakly informative, (iii) unimodal uninformative, and (iv) bimodal uninformative; see Figures 6--8(a), (d), (g), (j), respectively. We use one Gaussian component in each of the first three priors, as the underlying shape has a single 1-dimensional feature, and two for the last one to include a case where we have no information about the cardinality of the underlying true diagram. The parameters of the Gaussian mixture density in (3.3) used to compute these prior intensities are listed in Table 2. To present the intensity maps uniformly throughout this example while preserving their shapes, we divide the intensities by their corresponding maxima. This ensures all intensities are on a scale from 0 to 1, and we call the result the scaled intensity. The observed PDs are generated from point clouds sampled uniformly from the unit circle and then perturbed by varying levels of Gaussian noise; see Figure 5, wherein we present three point clouds sampled with Gaussian noise having variances $0.001I_2$, $0.01I_2$, and $0.1I_2$, respectively. Consequently, these point clouds provide PDs $D_{Y^i}$ for $i = 1, 2, 3$, which are considered as independent samples from the Poisson PP $\mathcal{D}_Y$, exhibiting distinctive characteristics: only one prominent feature with high persistence and no spurious features (Case-I), one prominent feature with high persistence and very few spurious features (Case-II), and one prominent feature with medium persistence and more spurious features (Case-III).
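As a minimal sketch of this data generation (the sample size of 100 points is our assumption; the noise variances are those stated above):

```python
# A minimal sketch of generating the Figure 5 point clouds: uniform
# samples from the unit circle plus isotropic Gaussian noise.
import numpy as np

rng = np.random.default_rng(1)

def noisy_circle(n, var):
    theta = rng.uniform(0, 2 * np.pi, n)
    pts = np.column_stack((np.cos(theta), np.sin(theta)))
    return pts + rng.normal(0.0, np.sqrt(var), (n, 2))

clouds = {case: noisy_circle(100, var)
          for case, var in zip(["Case-I", "Case-II", "Case-III"],
                               [0.001, 0.01, 0.1])}
```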
Table 3
Parameters for (M3$'$) in (3.4) and (3.5). We set the weight and mean of the Gaussian component, $c^{\mathcal{D}_{Y_S}} = 1$ and $\mu^{\mathcal{D}_{Y_S}} = (0.5, 0)$, respectively, for all of the cases. The first row corresponds to parameters in the functions characterizing $\mathcal{D}_Y$ that are used in computing the posterior depicted in the first column of Figure 9. The second row corresponds to analogous parameters that are used in computing the posterior depicted in the second columns of Figures 6--9. Similarly, the third row corresponds to parameters in the functions characterizing $\mathcal{D}_Y$ used for computing the posterior presented in the third columns of Figures 6--9. Each entry lists $(\sigma^{\mathcal{D}_{Y_O}}, \sigma^{\mathcal{D}_{Y_S}})$.

         Case-I        Case-II       Case-III      Case-IV
Row 1    --            --            --            (0.1, 0.1)
Row 2    (0.01, 0.1)   (0.1, 0.1)    (0.01, 0.1)   (0.1, 1)
Row 3    (0.1, 0.1)    (0.1, 1)      (0.1, 0.1)    (0.01, 0.1)

For each observed PD, 1-dimensional persistence features are presented as green dots overlaid on their corresponding posterior intensity plots. For Case-I, Case-II, and Case-III, we set the probability $\alpha$ of the event that a feature in $\mathcal{D}_X$ appears in $\mathcal{D}_Y$ to 1, i.e., any feature in $\mathcal{D}_X$ is certainly observed through a mark in $\mathcal{D}_Y$; later, in Case-IV, we decrease $\alpha$ to 0.5 while keeping all other parameters the same for the sake of comparison. The choice of $\alpha = 0.5$ anticipates that any feature has equal probability to appear or disappear in the observation and in turn provides further intuition about the contribution of prior intensities to the estimated posteriors. We observe that in all cases, the posterior estimates the 1-dimensional hole but with different uncertainty each time. For example, in the cases where the data are trustworthy, expressed by a likelihood with tight variance, or in the case of an informative prior, the posterior accurately estimates the 1-dimensional hole. In contrast, when the data suffer from high uncertainty and the prior is uninformative, the posterior offers a general idea that the true underlying shape is a circle, but the exact estimation of the 1-dimensional hole is not accurate. We examine the cases below.
Case-I. We consider informative, weakly informative, unimodal uninformative, and bimodal uninformative prior intensities, as presented in Figure 6(a), (d), (g), and (j), respectively, to compute corresponding posterior intensities. The prior intensity parameters are listed in Table 2. The observed PD is obtained from the point cloud in Figure 5 (left). The parameters associated to the observed PD are listed in Table 3. For the observed PD arising from data with very low noise, we observe that the posterior computed from any of the priors predicts the existence of a 1-dimensional hole accurately. First, with low variability in the observed PD ($\sigma^{\mathcal{D}_{Y_O}} = 0.01$ and $\sigma^{\mathcal{D}_{Y_S}} = 0.1$), the posterior intensities estimate the hole with high certainty (Figure 6(b), (e), (h), and (k), respectively). Next, to determine the effect of observed data on the posterior, we increase the variance of the observed PD component $\mathcal{D}_{Y_O}$, which consists of features in observed PDs that are associated to the underlying prior. Here, we observe that the posterior intensities still estimate the hole accurately due to the trustworthy data; this is evident in Figure 6(c), (f), (i), and (l). In Figure 6, the posteriors in (b), (e), (h), and (k) have lower variance around the 1-dimensional feature in comparison to (c), (f), (i), and (l), respectively.
Case-II. Here, we consider all four priors as in Case-I (see Figure 7(a), (d), (g), and (j)). The point cloud in Figure 5 (center) is more perturbed around the unit circle than that of Case-I (Gaussian noise with variance $0.01I_2$). Due to this, the associated PD exhibits spurious features. The parameters used for this case are listed in Table 3. We compute the posterior intensities for each type of prior. First, to illustrate the posterior intensity and check the capability of detecting the 1-dimensional feature, we use moderate noise for the observed PD ($\sigma^{\mathcal{D}_{Y_O}} = 0.1$ and $\sigma^{\mathcal{D}_{Y_S}} = 0.1$). The results are presented in Figure 7(b), (e), (h), and (k); overall, the posteriors estimate the prominent feature with different variances in their respective posteriors. Next, to illustrate the effect of observed data on the posterior, we increase the variance $\sigma^{\mathcal{D}_{Y_S}}$ of $\mathcal{D}_{Y_S}$. According to our Bayesian model, the PD component $\mathcal{D}_{Y_S}$ contains features that are not associated with $\mathcal{D}_X$, so increasing $\sigma^{\mathcal{D}_{Y_S}}$ implies that every observed point is linked to $\mathcal{D}_X$, and therefore one may expect to observe increased intensity skewed towards the spurious points that arise from noise. Indeed, posterior intensities with weakly informative, unimodal uninformative, and bimodal uninformative priors exhibit this skewness toward the spurious point in Figure 7(f), (i), and (l), respectively, but this is not the case when an informative prior is used. In (f), we observe increased intensity skewing towards the spurious points, and in (i) and (l) the intensity appears to be bimodal, with one mode at the prominent point and the other at the spurious point. For the bimodal uninformative prior, since one mode is located close to the spurious point in the observed PD, we observe higher intensity for that mode in the posterior (Figure 7(l)), with another mode estimating the prominent feature.
Case-III. We again consider the four types of priors here. The observed PD is constructed from the point cloud in Figure 5 (right). The point cloud has Gaussian noise with variance $0.1I_2$, and due to the high noise level in sampling relative to the unit circle, the associated PD exhibits one prominent feature and several spurious features. We repeat the parameter choices as in Case-I for the variances of the observed PD. For the choice of $\sigma^{\mathcal{D}_{Y_O}} = 0.01$ and $\sigma^{\mathcal{D}_{Y_S}} = 0.1$, the posteriors computed from all four priors are able to detect the difference between the one prominent point and the other spurious points. We increase the variance of $\mathcal{D}_{Y_O}$ to determine the effect of the observed PD on the posterior, and we observe that only the posterior intensity from the informative prior has evidence of the hole (Figure 8(c)). For the weakly informative and uninformative priors, while the posteriors in (f), (i), and (l) may not detect the hole clearly, in (f) we observe a mode with higher variance, and in (i) and (l) a tail towards the high persistence point, implying the presence of a hole. It should be noted that with the informative prior the posterior intensity identifies the hole closer to the mode of the prior as we increase the variance $\sigma^{\mathcal{D}_{Y_O}}$.


Figure 6. Case-I: Posterior intensities obtained by using Proposition 3.2. We consider informative (a), weakly informative (d), unimodal uninformative (g), and bimodal uninformative (j) prior intensities. The color maps represent scaled intensities. The list of associated parameters of the observed PD used for this case is in Table 3. Posteriors computed from all of these priors estimate the 1-dimensional hole accurately for a choice of variances in the observed PD of $\sigma^{\mathcal{D}_{Y_O}} = 0.01$ and $\sigma^{\mathcal{D}_{Y_S}} = 0.1$, as presented in (b), (e), (h), and (k). After increasing the variance to $\sigma^{\mathcal{D}_{Y_O}} = 0.1$, we observe that the posteriors can still estimate the hole, with higher variance, as presented in (c), (f), (i), and (l).


Figure 7. Case-II: We consider informative (a), weakly informative (d), unimodal uninformative (g), and bimodal uninformative (j) prior intensities to estimate posterior intensities using Proposition 3.2. The color maps represent scaled intensities. The parameters we use for estimating the posterior intensity are listed in Table 3. The posterior intensities estimated from the informative prior in (b) and (c) estimate the 1-dimensional hole with high certainty. Also, the posterior intensities estimated from the weakly informative and uninformative priors in (e), (h), and (k) imply the existence of a hole with lower certainty. (c), (f), (i), and (l) represent the posterior with higher variance in the observed PD component $\mathcal{D}_{Y_S}$. As this encodes the assumption that every observed point is associated to $\mathcal{D}_X$, we observe increased intensity skewed towards the spurious point in (f). Furthermore, in (i) and (l), we observe bimodality in the posterior intensity.


Figure 8. Case-III: Posterior intensities obtained by using Proposition 3.2. We consider informative (a), weakly informative (d), unimodal uninformative (g), and bimodal uninformative (j) prior intensities. The color maps represent scaled intensities. Parameters of the observed PD used to estimate the posterior intensity are listed in Table 3. With a choice of $\sigma^{\mathcal{D}_{Y_O}} = 0.01$ and $\sigma^{\mathcal{D}_{Y_S}} = 0.1$, we observe that the posteriors can deduce the existence of the prominent feature, as presented in (b), (e), (h), and (k), as we have more confidence in the component of observed data associated to the prior. Otherwise, with an increased variance $\sigma^{\mathcal{D}_{Y_O}} = 0.1$, only the posterior intensity from the informative prior is able to detect the hole with high certainty, as observed in (c). For the weakly informative and uninformative priors, the posteriors in (f), (i), and (l) may not detect the hole directly, but the mode of (f) with higher variance and the tail towards the prominent point in (i) and (l) imply the existence of a hole in the underlying PD.


Figure 9. Case-IV (rows: informative, weakly informative, unimodal uninformative, and bimodal uninformative priors; columns: Case-I, Case-II, Case-III): The first, second, and third columns match the parameters of the observed PD $\mathcal{D}_Y$ used in computing the posteriors depicted in the third column of Figures 6 and 7 and the second column of Figure 8, respectively, with $\alpha = 0.5$. The parameters are presented in Table 3. The color maps represent scaled intensities. A variation in the level of intensity is observed for all of them compared to their respective cases due to the added term in the posterior intensity. The posterior intensities in the first and second columns exhibit the estimation of the hole with higher variability as compared to the respective figures in Case-I and Case-II. The posteriors in the third column demonstrate dominance of the prior relative to their corresponding figures in Case-III, especially when one examines those for the informative, weakly informative, and bimodal uninformative priors.


Case-IV. Last, in this case we concentrate on the effect of $\alpha$. The rest of the parameters used for this case remain the same and are listed in Table 3. We decrease $\alpha$ to 0.5 to model the scenario that a feature in $\mathcal{D}_X$ has equal probability to appear or vanish in the observed $\mathcal{D}_Y$. The columns of Figure 9 correspond to the parameters of the observed PD $\mathcal{D}_Y$ used in computing the posteriors depicted in the third column of Figure 6, the third column of Figure 7, and the second column of Figure 8, respectively. Comparing them with their respective cases, we notice a change in the intensity level in all of these due to the first term of the posterior intensity on the right-hand side of (3.6). Comparing with the respective figures in Case-I, we observe that the posterior intensities estimate the hole with higher variability for the weakly informative and unimodal uninformative priors. For the bimodal prior, we observe bimodality in the posterior. Next, for Case-II, the existence of a hole is evident for the informative and weakly informative priors, with higher uncertainty when compared to their previous cases. The unimodal and bimodal uninformative priors lead to bimodal and trimodal posteriors, respectively. We observe that the posterior resembles the prior intensity more closely when we compare them to the respective figures in Case-III. One can especially see this with the informative, weakly informative, and bimodal uninformative priors, which have significantly increased intensities at the locations of the modes of the prior.
4. Classification. The Bayesian framework introduced in this paper allows us to explicitly
compute the posterior intensity of a PD given data and prior knowledge. This lays the
foundation for supervised statistical learning methods in classification. In this section, we
build a Bayes factor classification algorithm based on the notions discussed in section 3 and then apply it to materials data, in particular to measurements of spatial configurations of atoms.
We commence our classification scheme with a PD D belonging to an unknown class. We
assume that D is sampled from a Poisson PP \scrD in \BbbW with the prior intensity \lambda \scrD having the
form in (M2\prime ). Consequently, its probability density has the form
N
e - \lambda \prod e - \lambda \prod \sum \scrD \ast
(4.1) p\scrD (D) = \lambda \scrD (d) = ci \scrN (d; \mu \scrD \scrD
i , \sigma i I),
| D| ! | D| !
d\in D d\in D i=1

where \lambda = \BbbW \lambda \scrD = \BbbE (| \scrD | ), with probability \alpha as in (M2\prime ). Next, suppose we have two
\int

training sets T_Y := D_{Y_{1:n}} and T_{Y'} := D_{Y'_{1:m}} from two classes of random diagrams \scrD_Y and
\scrD_{Y'}, respectively. The likelihood densities of the respective classes take the form of (3.4). We
then follow (3.6) to obtain the posterior intensities of \scrD given the training sets T_Y and T_{Y'}
from the prior intensities and likelihood densities. In particular, the corresponding posterior


probability density of \scrD given the training set T_Y is

(4.2)\quad p_{\scrD|\scrD_Y}(D|T_Y) = \frac{e^{-\lambda}}{|D|!}\prod_{d\in D}\lambda_{D|T_Y}(d) = \frac{e^{-\lambda}}{|D|!}\prod_{d\in D}\Biggl[(1-\alpha)\lambda_{\scrD}(d) + \frac{\alpha}{n}\sum_{y_j\in T_Y}\sum_{i=1}^{N} C_i^{d|y_j}\,\scrN(d;\mu_i^{d|y_j},\sigma_i^{d|y_j}I)\Biggr],

and the posterior probability density given T_{Y'} is given by an analogous expression. The Bayes
factor, defined by

(4.3)\quad BF(D) = \frac{p_{\scrD|\scrD_Y}(D|T_Y)}{p_{\scrD|\scrD_{Y'}}(D|T_{Y'})},

provides the decision criterion for assigning D to either \scrD_Y or \scrD_{Y'}. More specifically, for a
threshold c, BF(D) > c implies that D belongs to \scrD_Y, and BF(D) < c implies otherwise. We
summarize this scheme in Algorithm 4.1; a code sketch follows the algorithm.

Algorithm 4.1. Bayes factor classification of PDs.

1: Input 1: Prior intensities \lambda_{\scrD_Y} and \lambda_{\scrD_{Y'}} for two classes of diagrams \scrD_Y and \scrD_{Y'}, respectively; a threshold c > 0.
2: Input 2: Two training sets T_Y and T_{Y'} sampled from \scrD_Y and \scrD_{Y'}, respectively.
3: for \scrD_Y and \scrD_{Y'} do
4:    Compute p_{\scrD|\scrD_Y}(D|T_Y) and p_{\scrD|\scrD_{Y'}}(D|T_{Y'}).
5: end for
6: Compute BF(D) as in (4.3).
7: if BF(D) > c then
8:    assign D to \scrD_Y.
9: else
10:   assign D to \scrD_{Y'}.
11: end if
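
For concreteness, the following is a minimal numerical sketch of Algorithm 4.1 in Python, under simplifying assumptions that the paper does not make: \alpha is constant, the likelihood kernel \ell(y|x) is an isotropic Gaussian with variance s_noise, the restricted Gaussians \scrN^{\ast} are replaced by untruncated ones, and the posterior mixture parameters of (3.6) are taken to be the standard Gaussian--Gaussian conjugate updates. All function and variable names are ours, not the authors'.

```python
import numpy as np
from math import lgamma
from scipy.stats import multivariate_normal as mvn

def posterior_intensity(d, train, alpha, means, varis, ws, s_noise, lam_spur):
    """Evaluate lambda_{D|T_Y}(d) of (4.2) at a single PD point d."""
    d, I2 = np.asarray(d), np.eye(2)
    # vanished-feature term (1 - alpha) * lambda_D(d) of the prior mixture
    out = (1 - alpha) * sum(w * mvn.pdf(d, m, v * I2)
                            for w, m, v in zip(ws, means, varis))
    for DY in train:                  # the n training diagrams
        for y in DY:                  # points of one observed diagram
            y = np.asarray(y)
            # evidence of y under each prior component, kernel convolved in
            evid = np.array([w * mvn.pdf(y, m, (v + s_noise) * I2)
                             for w, m, v in zip(ws, means, varis)])
            denom = lam_spur(y) + alpha * evid.sum()
            for e, m, v in zip(evid, means, varis):
                v_post = 1.0 / (1.0 / v + 1.0 / s_noise)  # conjugate update
                m_post = v_post * (np.asarray(m) / v + y / s_noise)
                out += (alpha / len(train)) * (e / denom) \
                       * mvn.pdf(d, m_post, v_post * I2)
    return out

def log_density(D, lam_fn, lam_total):
    """log p(D) of a Poisson PP with intensity lam_fn; see (4.1)-(4.2)."""
    return -lam_total - lgamma(len(D) + 1) + sum(np.log(lam_fn(d)) for d in D)

def classify(D, post_Y, post_Yp, lam_Y, lam_Yp, c=1.0):
    """Algorithm 4.1: threshold the Bayes factor (4.3) on the log scale."""
    log_bf = log_density(D, post_Y, lam_Y) - log_density(D, post_Yp, lam_Yp)
    return "Y" if log_bf > np.log(c) else "Y'"
```

Here post_Y and post_Yp would be closures over posterior_intensity built from the two training sets, and lam_Y, lam_Yp are the expected cardinalities \lambda of the two prior intensities.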

4.1. Atom probe tomography data. Our goal in this section is to use Algorithm 4.1 to
classify the crystal lattice of a noisy and sparse materials dataset, where the unit cells are either
body-centered cubic (BCC) or face-centered cubic (FCC); recall Figure 1. The BCC structure
has a single atom in the center of the cube, while the FCC has a void in its center but has
atoms on the centers of the cubes' faces (Figure 1(b)--(c)). Despite notable differences in the
physical configurations of each class, sparsity and noise do not allow the crystal structure to
be revealed. For high-entropy alloys, our object of interest, atom probe tomography (APT),
provides the best atomic level characterization possible. Due to the sparsity and noise in
the resulting data, there are only a few algorithms for successfully determining the crystal
structure; see [22, 43]. These algorithms, designed for APT data, rely on knowing the global
structure a priori (which is not the case for high entropy alloys (HEAs)) and seek to discover
small-scale structure within a sample.
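
To fix intuition for the two lattice classes, here is a hypothetical generator of noisy, sparse cubic-cell point clouds in the spirit of the APT setting. The retention rate mirrors the roughly 65% missingness noted above; the tiling depth and noise scale are our own arbitrary choices, and this is not the authors' dataset.

```python
import numpy as np
from itertools import product

def unit_cell(structure):
    """Atom sites of one cubic unit cell, in units of the lattice parameter."""
    corners = np.array(list(product([0.0, 1.0], repeat=3)))       # 8 corners
    if structure == "BCC":
        extra = np.array([[0.5, 0.5, 0.5]])                       # body center
    elif structure == "FCC":
        extra = np.array([[0.5, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.5],
                          [0.5, 0.5, 1.0], [0.5, 1.0, 0.5], [1.0, 0.5, 0.5]])
    else:
        raise ValueError(structure)
    return np.vstack([corners, extra])

def lattice(structure, n=3):
    """Tile n x n x n unit cells and merge shared corner/face sites."""
    basis = unit_cell(structure)
    pts = np.vstack([basis + t for t in product(range(n), repeat=3)])
    return np.unique(np.round(pts, 6), axis=0)

def apt_like_sample(structure, keep=0.35, noise=0.1, seed=None):
    """Retain ~35% of atoms (about 65% missing) and jitter the survivors."""
    rng = np.random.default_rng(seed)
    atoms = lattice(structure)
    atoms = atoms[rng.random(len(atoms)) < keep]
    return atoms + rng.normal(0.0, noise, atoms.shape)
```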


To bypass this restriction, the neural network architecture of [62] provides a way to clas-
sify the crystal structure of a noisy or sparse dataset by looking at a diffraction image. In
particular, the authors therein employ a convolutional neural network for classifying the crys-
tal structure by examining a computer-generated diffraction pattern. The authors suggest
their method could be used to determine the crystal structure of APT data. However, the
synthetic data considered in [62] is not a realistic representation of experimental APT data,
where about 65\% of the data is missing and furthermore corrupted by observational noise.
Most importantly, their synthetic data is either sparse or noisy, not a combination of both.
The algorithm is also not publicly available, so a side-by-side comparison of our method with
theirs using HEAs is not feasible.
It is natural to consider PDs in this setting because they distill salient information about
the materials patterns with respect to connectedness and empty space (holes) within cubic
unit cells; i.e., we can differentiate between atomic unit cells by examining their homological
features. In particular, after storing both types of spatial configurations as point clouds, we
compute their Rips filtrations (see section 2), collecting the resultant 1-dimensional homological
features into PDs; see Figure 10 (a code sketch of this pipeline follows this paragraph). The
dataset had 200 diagrams from each class. To perform classification with Algorithm 4.1, we
started by specifying priors for each class, \lambda_{\scrD_{BCC}} and \lambda_{\scrD_{FCC}}. Two
scenarios were considered, namely using separate priors (Prior-1 in Table 4) and
the same prior (Prior-2 in Table 4) for both the BCC and the FCC classes. In particular, for
Prior-1 we superimpose 50 PDs from each class and find the highly clustered areas by using
K-means clustering. The centers of the clusters from K-means are then used as the means in
Gaussian mixture priors; see (3.6). In this manner, we produce different priors for BCC and
FCC classes. On the other hand, for Prior-2 we choose a flat prior with a higher variance level
than that of Prior-1 for both of the classes. The parameters for these two prior intensities
are in Table 4. For all cases, we set \sigma_{\scrD_{Y_O}} = 0.1 and \lambda_{\scrD_{Y_S}}(x) = 5\,\scrN^{\ast}(x; (0,0), 0.2I). We
chose a relatively high weight for \lambda_{\scrD_{Y_S}} because the nature of the data implied that extremely
low persistence holes were rare events arising from noise. To perform 10-fold cross-validation,
we partitioned PDs from both classes into training and test sets. During each fold, we took
the training sets from each class, T_{BCC} and T_{FCC}, and input them into Algorithm 4.1 as T_Y
and T_{Y'}, respectively. Next, we computed the Bayes factor BF(D) = p_{\scrD|\scrD_{BCC}}(D|T_{BCC}) / p_{\scrD|\scrD_{FCC}}(D|T_{FCC}) for each
diagram D in the test sets. After this, we used the Bayes factors to construct receiver operating
characteristic (ROC) curves and computed the resulting areas under the ROC curves (AUCs).
Finally, we used the AUCs from 10-fold cross-validation to build a bootstrapped distribution
by resampling 2000 times. Information about these bootstrapped distributions is summarized
in Table 4, which shows that our scoring method almost perfectly distinguishes between the
BCC and FCC classes using the Bayesian framework of section 3. Also, it exemplifies the
robustness of our algorithm, as two different types of priors produce near perfect accuracy.
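
As noted above, the following is a minimal sketch of this subsection's pipeline. The paper does not specify its software; we assume the third-party packages ripser (for Rips persistence) and scikit-learn (for K-means and ROC analysis), and we treat diagrams as arrays of PD points.

```python
import numpy as np
from ripser import ripser
from sklearn.cluster import KMeans
from sklearn.metrics import roc_auc_score

def h1_diagram(point_cloud):
    """1-dimensional PD of the Rips filtration of an atomic neighborhood."""
    return ripser(np.asarray(point_cloud), maxdim=1)["dgms"][1]

def prior1_means(diagrams, n_clusters):
    """Prior-1: superimpose PDs of one class; the K-means centers become
    the means of the Gaussian mixture prior intensity (cf. Table 4)."""
    pts = np.vstack(diagrams)
    return KMeans(n_clusters=n_clusters, n_init=10).fit(pts).cluster_centers_

def fold_auc(bayes_factors, labels):
    """ROC AUC of one fold's Bayes factor scores (labels: 1 = BCC, 0 = FCC)."""
    return roc_auc_score(labels, np.log(bayes_factors))

def bootstrap_auc_summary(fold_aucs, n_boot=2000, seed=0):
    """Resample the fold-level AUCs 2000 times, as in the text."""
    rng = np.random.default_rng(seed)
    a = np.asarray(fold_aucs, dtype=float)
    boot = rng.choice(a, size=(n_boot, a.size)).mean(axis=1)
    return boot.mean(), np.percentile(boot, [5, 95])
```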

Figure 10. PDs for members of the BCC and FCC classes (panels (a) and (b)).

Table 4
Parameters for the prior intensities used in cross-validation of materials science data. Each prior \lambda is indexed by its corresponding class for Prior-1 or by U in the case of Prior-2. The summary of AUCs across the 10 folds after scoring with Algorithm 4.1 is presented in the last three columns.

Priors                      \mu_i^{\scrD}    \sigma_i^{\scrD}    c_i^{\scrD}    5th percentile    Mean    95th percentile
Prior-1    \lambda_{BCC}    (0.5, 0.24)      2                   1
                            (3.6, 3.6)       2                   1
                            (3.7, 0.65)      2                   1              0.931             0.941   0.958
           \lambda_{FCC}    (0.4, 0.27)      2                   1
                            (2.8, 1.2)       2                   1
                            (2.9, 3)         2                   1
Prior-2    \lambda_{U}      (1, 1)           20                  1              0.928             0.94    0.951

5. Discussion. This work is the first approach to introduce a generalized Bayesian framework
for persistent homology. This toolbox gives experts the opportunity to incorporate their prior
beliefs about data in conjunction with topological data analysis notions when faced with
research questions. Our framework is entirely predicated upon modeling random PDs with
Poisson PPs, whose properties mesh naturally with Bayesian formulations. Specifically, since
they are characterized entirely by their intensity measures, they allow us to quantify prior
uncertainty with presupposed intensity functions, and they allow for efficient computation of
posterior intensities if we regard observed PDs as noisy observations described by marks in a
marked Poisson PP. Interestingly, recent works [3, 37] could also be used to devise
an alternative, parallel PP-based Bayesian framework for persistent homology. In particular,
one could directly use Bayes rule with prior distributions constructed from [37] and then ob-
tain posteriors with the methodology from [3], which outlines a procedure for Monte Carlo
estimation of Choquet integrals. This is a worthwhile future direction for research of this
nature. It should be noted that our Bayesian model considers PDs, which are summaries of
the data at hand, for defining a substitution likelihood rather than using the underlying point
cloud data. This does not adhere to a strict Bayesian viewpoint, as we model the behavior
of the PDs without considering the underlying data (materials data in our example) used to
create it; however, our paradigm incorporates prior knowledge and observed data summaries
to create posterior probabilities, analogous to the notion of substitution likelihood detailed
in [27]. The general relationship between the likelihood models related to point cloud data
and those of their corresponding PDs remains an important open problem. Our paper also


introduces a conjugate-like family of prior intensities and stochastic kernels (our likelihood
analogue), which can be used to obtain a closed form for posterior intensities. A detailed
example is presented to illustrate the qualities of posterior intensities arising under several in-
teresting parameter choices in our model. This example establishes evidence that our Bayesian
framework updates prior uncertainty with new observations in a manner similar to that for
standard RVs. Thus, the Bayesian inference developed herein can be reliably used for machine
learning and data analysis techniques directly on the space of PDs. Indeed, a classification
algorithm is derived and successfully applied on materials science data to assess the capability
of our Bayesian framework.
Appendix A. Proof of Theorem 3.1.
Proof. By Theorem 2.10, we decompose \lambda_{\scrD_X|D_{Y_{1:m}}} to write

(A.1)\quad \lambda_{\scrD_X|D_{Y_{1:m}}} = \lambda_{\scrD_{X_V}|D_{Y_{1:m}}} + \lambda_{\scrD_{X_O}|D_{Y_{1:m}}} = (1-\alpha(x))\lambda_{\scrD_X} + \lambda_{\scrD_{X_O}|D_{Y_{1:m}}},

where the second equality follows because \scrD_{X_V} is independent of \scrD_Y. Theorem 2.10 allows us
to express \lambda_{\scrD_{X_O}} as the average of intensity functions \lambda_{\scrD_{X_O^i}} for i = 1, \ldots, m, where the \scrD_{X_O^i}
are independent and equal in distribution to \scrD_{X_O}. That is, \lambda_{\scrD_{X_O}} = \frac{1}{m}\sum_{i=1}^{m}\lambda_{\scrD_{X_O^i}}, and by
conditioning we have

(A.2)\quad \lambda_{\scrD_{X_O}|D_{Y_{1:m}}} = \frac{1}{m}\sum_{i=1}^{m}\lambda_{\scrD_{X_O^i}|D_{Y^i}}.

So to expand (A.1) it suffices to compute \lambda_{\scrD_{X_O^i}|D_{Y^i}} for fixed i. First, we express the finite
PP (\scrD_X, \scrD_Y) as a marked Poisson PP. To this end, we adopt a construction from [53], the
augmented space \BbbW' := \BbbW \cup \{\Delta\}, where \Delta is a dummy set used for labeling points in \scrD_{Y_S}.
Next, we define the random set \scrH = \scrH_{\BbbW} \cup \scrH_{\Delta} such that

(A.3)\quad \scrH := \bigl\{(x, y) \in (\scrD_{X_O}, \scrD_{Y_O})\bigr\} \cup \bigl\{(\Delta, y) : y \in \scrD_{Y_S}\bigr\}.

One can observe that \scrH is the superposition of two marked Poisson PPs \scrH_{\BbbW} and \scrH_{\Delta},
taking values in \BbbW \times \BbbW and \Delta \times \BbbW, respectively. Moreover, it directly follows from (M2)
and (M3)(i) that \scrH_{\BbbW} has marginal intensity function \alpha(x)\lambda_{\scrD_X}(x) on \BbbW and stochastic kernel
density \ell(y|x), while (M3)(ii) shows that \scrH_{\Delta} has marginal intensity function \lambda_{\scrD_{Y_S}}(\BbbW) on \{\Delta\}
with stochastic kernel density \lambda_{\scrD_{Y_S}}(y)/\lambda_{\scrD_{Y_S}}(\BbbW). By Theorem 2.12, the intensity functions for \scrH_{\BbbW} and
\scrH_{\Delta} are \alpha(x)\lambda_{\scrD_X}(x)\ell(y|x) and \lambda_{\scrD_{Y_S}}(y), respectively. Hence, applying Theorem 2.10 to (A.3)
reveals that the intensity function for \scrH, \lambda_{\scrH}, is given by

(A.4)\quad \lambda_{\scrH}(x, y) = \alpha(x)\lambda_{\scrD_X}(x)\ell(y|x)\,1_{x\in\BbbW} + \lambda_{\scrD_{Y_S}}(y)\,1_{x\in\Delta}.

Let \scrH_Y := \{y : (x, y) \in \scrH\} and \scrH_X := \{x : (x, y) \in \scrH\} be the projections of \scrH onto its second
and first coordinates, respectively. It immediately follows from Theorem 2.11 that \scrH_Y is
a Poisson PP on \BbbW since it is the image of \scrH under a projection. Therefore, by treating the


first coordinates of \scrH as marks, we may express \scrH as a marked Poisson PP having intensity
function \lambda_{\scrH_Y} on \BbbW and stochastic kernel density p(x|y) from \BbbW to \BbbW'. Another application
of Theorem 2.12 then implies

(A.5)\quad \lambda_{\scrH}(x, y) = \lambda_{\scrH_Y}(y)\,p(x|y).



From (A.4) and (A.5), we obtain the identity

(A.6)\quad p(x|y) = \frac{\alpha(x)\lambda_{\scrD_X}(x)\ell(y|x)\,1_{x\in\BbbW} + \lambda_{\scrD_{Y_S}}(y)\,1_{x\in\Delta}}{\lambda_{\scrH_Y}(y)}, \qquad \lambda_{\scrH_Y}(y) \neq 0.

Equation (A.6) describes the probability density of \scrH at x \in \BbbW' for fixed y \in \BbbW. Substituting
(A.6) for the Janossy density in (2.1) and applying Corollary 2.15 gives the intensity function
for the PP \scrH_X|D_{Y^i} whenever \lambda_{\scrH_Y}(y) \neq 0 for any y \in D_{Y^i}:

(A.7)\quad \lambda_{\scrH_X|D_{Y^i}}(x) = \sum_{y\in D_{Y^i}} \frac{\alpha(x)\lambda_{\scrD_X}(x)\ell(y|x)\,1_{x\in\BbbW} + \lambda_{\scrD_{Y_S}}(y)\,1_{x\in\Delta}}{\lambda_{\scrH_Y}(y)}, \qquad \lambda_{\scrH_Y}(y) \neq 0.

Restricting (A.4) and (A.5) to \BbbW \times \BbbW, we obtain p(x|y)\lambda_{\scrH_Y}(y) = \ell(y|x)\alpha(x)\lambda_{\scrD_X}(x) = 0
whenever \lambda_{\scrH_Y}(y) = 0, from which we conclude that \lambda_{\scrH_Y}(y) \neq 0 a.s. Hence restricting (A.7)
to \BbbW \times \BbbW yields

(A.8)\quad \lambda_{\scrD_{X_O}|D_{Y^i}}(x) = \sum_{y\in D_{Y^i}} \frac{\alpha(x)\lambda_{\scrD_X}(x)\ell(y|x)}{\lambda_{\scrH_Y}(y)} \quad \text{a.s.}

Notice that \scrH_Y is the same PP as \scrD_{Y_O} \cup \scrD_{Y_S}. Theorem 2.11 implies that \scrD_{Y_O} is a Poisson
PP, and \scrD_{Y_S} is a Poisson PP by (M3), so by Theorem 2.10, \lambda_{\scrH_Y} = \lambda_{\scrD_{Y_O}} + \lambda_{\scrD_{Y_S}}, where
\lambda_{\scrD_{Y_O}}(y) = \lambda_{(\scrD_{X_O},\scrD_{Y_O})}(\BbbW \times y) = \int_{\BbbW}\alpha(u)\lambda_{\scrD_X}(u)\ell(y|u)\,du by Theorem 2.12. Employing
(A.8), one gets that

(A.9)\quad \lambda_{\scrD_{X_O}|D_{Y^i}}(x) = \alpha(x)\sum_{y\in D_{Y^i}} \frac{\ell(y|x)\lambda_{\scrD_X}(x)}{\lambda_{\scrD_{Y_S}}(y) + \int_{\BbbW}\ell(y|u)\alpha(u)\lambda_{\scrD_X}(u)\,du},

which proves Theorem 3.1 after substituting into (A.1).
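
Although not part of the paper, the identity \lambda_{\scrH_Y}(y) = \lambda_{\scrD_{Y_S}}(y) + \int_{\BbbW}\ell(y|u)\alpha(u)\lambda_{\scrD_X}(u)\,du used in the last step can be checked numerically. The sketch below simulates the marked PP of the proof for a 1-D toy model with constant \alpha and Gaussian intensities, ignoring the wedge restriction; all parameter values are arbitrary.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
lam0, alpha, sig, s, tau = 5.0, 0.8, 0.3, 2.0, 1.5
reps, edges = 20000, np.linspace(-4.0, 4.0, 81)
counts = np.zeros(edges.size - 1)
for _ in range(reps):
    x = rng.normal(0.0, 1.0, rng.poisson(lam0))    # prior PP, intensity lam0*N(0,1)
    x = x[rng.random(x.size) < alpha]              # alpha-thinning -> observed features
    y_obs = x + rng.normal(0.0, sig, x.size)       # Gaussian marks l(y|x)
    y_spur = rng.normal(0.0, tau, rng.poisson(s))  # spurious features D_{Y_S}
    counts += np.histogram(np.concatenate([y_obs, y_spur]), edges)[0]
mid = 0.5 * (edges[:-1] + edges[1:])
empirical = counts / (reps * np.diff(edges))
theory = (alpha * lam0 * norm.pdf(mid, 0.0, np.sqrt(1.0 + sig**2))
          + s * norm.pdf(mid, 0.0, tau))
print(np.max(np.abs(empirical - theory)))          # small if the identity holds
```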


Acknowledgments. We thank the associate editor and the two anonymous referees for
their constructive comments, which helped us to improve our manuscript.

REFERENCES

[1] H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Chepushtanova,
E. Hanson, F. Motta, and L. Ziegelmeier, Persistence images: A stable vector representation of
persistent homology, J. Mach. Learn. Res., 18 (2017), pp. 218--252.
[2] A. Adcock, E. Carlsson, and G. Carlsson, The ring of algebraic functions on persistence bar codes,
Homology Homotopy Appl., 18 (2016), pp. 381--402, https://doi.org/10.4310/HHA.2016.v18.n1.a21.
[3] H. Agahi, H. Mehri-Dehnavi, and R. Mesiar, Monte Carlo integration for Choquet integral, Int. J.
Intell. Syst., 34 (2019), pp. 1348--1358, https://doi.org/10.1002/int.22112.


[4] A. Babichev and Y. Dabaghian, Persistent memories in transient networks, in Emergent Complexity
from Nonlinearity, in Physics, Engineering and the Life Sciences, Springer Proc. Phys. 191, Springer,
Cham, 2017, pp. 179--188.
[5] P. Bendich, J. S. Marron, E. Miller, A. Pieloch, and S. Skwerer, Persistent homology analysis
of brain artery trees, Ann. Appl. Stat., 10 (2016), pp. 198--218, https://doi.org/10.1214/15-AOAS886.
[6] C. Biscio and J. Møller, The accumulated persistence function, a new useful functional summary statis-
tic for topological data analysis, with a view to brain artery trees and spatial point process applications,
J. Comput. Graph. Statist., 28 (2019), pp. 671--681, https://doi.org/10.1080/10618600.2019.1573686.
[7] O. Bobrowski, S. Mukherjee, and J. E. Taylor, Topological consistency via kernel estimation,
Bernoulli, 23 (2017), pp. 288--328, https://doi.org/10.3150/15-BEJ744.
[8] T. Bonis, M. Ovsjanikov, S. Oudot, and F. Chazal, Persistence-based pooling for shape pose recog-
nition, in Computational Topology in Image Context, A. Bac and J. L. Mari, eds., Springer, New
York, 2016, pp. 19--29.
[9] P. Bubenik, Statistical topological data analysis using persistence landscapes, J. Mach. Learn. Res., 16
(2015), pp. 77--102.
[10] P. Bubenik, The Persistence Landscape and Some of Its Properties, preprint, https://arxiv.org/abs/1810.04963, 2018.
[11] G. Carlsson, T. Ishkhanov, V. D. Silva, and A. Zomorodian, On the local behavior of spaces of
natural images, Int. J. Comput. Vis., 76 (2008), pp. 1--12.
[12] M. Carrière, S. Y. Oudot, and M. Ovsjanikov, Stable topological signatures for points on 3D shapes,
Comput. Graph. Forum, 34 (2015), pp. 1--12.
[13] F. Chazal and B. Michel, An Introduction to Topological Data Analysis: Fundamental and Practical
Aspects for Data Scientists, preprint, https://arxiv.org/abs/1710.04019, 2017.
[14] J. A. Chisholm and S. Motherwell, A new algorithm for performing three-dimensional searches of
the Cambridge Structural Database, J. Appl. Crystallogr., 37 (2004), pp. 331--334.
[15] M. K. Chung, J. L. Hanson, J. Ye, R. J. Davidson, and S. D. Pollak, Persistent homology in
sparse regression and its application to brain morphometry, IEEE Trans. Med. Imaging, 34 (2015),
pp. 1928--1939.
[16] D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes, Volume I: Ele-
mentary Theory and Methods, 2nd ed., Probab. Appl. (N. Y.), Springer-Verlag, New York, 2003.
[17] P. Dłotko, R. Ghrist, M. Juda, and M. Mrozek, Distributed computation of coverage in sensor
networks by homological methods, Appl. Algebra Eng. Commun. Comput., 23 (2012), pp. 29--58.
[18] H. Edelsbrunner, Computational Topology: An Introduction, American Mathematical Society, Provi-
dence, RI, 2010.
[19] B. Di Fabio and M. Ferri, Comparing persistence diagrams through complex vectors, in Proceedings of
the International Conference on Image Analysis and Processing, 2015, pp. 294--305, https://doi.org/10.1007/978-3-319-23231-7_27.
[20] B. T. Fasy, F. Lecci, A. Rinaldo, L. Wasserman, S. Balakrishnan, and A. Singh, Confi-
dence sets for persistence diagrams, Ann. Statist., 42 (2014), pp. 2301--2339, https://doi.org/10.1214/
14-AOS1252.
[21] M. Gameiro, Y. Hiraoka, S. Izumi, M. Kramar, K. Mischaikow, and V. Nanda, A topological
measurement of protein compressibility, Jpn. J. Ind. Appl. Math., 32 (2015), pp. 1--17.
[22] B. Gault, M. P. Moody, J. M. Cairney, and S. P. Ringer, Atom probe crystallography, Mater.
Today, 15 (2012), pp. 378--386.
[23] R. Ghrist, Barcodes: The persistent topology of data, Bull. Amer. Math. Soc. (N.S.), 45 (2008), pp. 61--75,
https://doi.org/10.1090/S0273-0979-07-01191-3.
[24] D. Hicks, C. Oses, E. Gossett, G. Gomez, R. H. Taylor, C. Toher, M. J. Mehl, O. Levy, and
S. Curtarolo, AFLOW-SYM: Platform for the complete, automatic and self-consistent symmetry
analysis of crystals, Acta Crystallogr. A, 74 (2018), pp. 184--203.
[25] J. D. Honeycutt and H. C. Andersen, Molecular dynamics study of melting and freezing of small
Lennard-Jones clusters, J. Phys. Chem., 91 (1987), pp. 4950--4963.
[26] D. P. Humphreys, M. R. McGuirl, M. Miyagi, and A. J. Blumberg, Fast estimation of re-
combination rates using topological data analysis, GENETICS, 211 (2019), pp. 1194--1201, https:
//doi.org/10.1534/genetics.118.301565.


[27] H. Jeffreys, Theory of Probability, Clarendon Press, Oxford, UK, 1961.


[28] T. F. Kelly, M. K. Miller, K. Rajan, and S. P. Ringer, Atomic-scale tomography: A 2020 vision,
Microsc. Microanal., 19 (2013), pp. 652--664.
[29] F. A. Khasawneh and E. Munch, Chatter detection in turning using persistent homology, Mech. Syst.
Signal Process., 70--71 (2016), pp. 527--541.

[30] J. F. C. Kingman, Poisson Processes, Clarendon Press, Oxford, UK, 1993.


[31] G. Kusano, K. Fukumizu, and Y. Hiraoka, Persistence weighted Gaussian kernel for topological data
analysis, in Proceedings of the 33rd International Conference on Machine Learning, Vol. 48, 2016,
pp. 2004--2013.
[32] P. M. Larsen, S. Schmidt, and J. Schiøtz, Robust structural identification via polyhedral template
matching, Model. Simul. Mater. Sci. Eng., 24 (2016), 055007.
[33] R. Mahler, Statistical Multisource-Multitarget Information Fusion, Artech House, Boston, MA, 2007.
[34] A. Marchese and V. Maroulas, Topological learning for acoustic signal identification, in Proceedings
of the 19th International Conference on Information Fusion (FUSION), 2016, pp. 1377--1381.
[35] A. Marchese and V. Maroulas, Signal classification with a point process distance on the space of
persistence diagrams, Adv. Data Anal. Classif., 12 (2018), pp. 657--682, https://doi.org/10.1007/
s11634-017-0294-x.
[36] A. Marchese, V. Maroulas, and J. Mike, K-means clustering on the space of persistence diagrams,
in Wavelets and Sparsity XVII, Proc. SPIE 10394, International Society for Optics and Photonics,
Bellingham, WA, 2017.
[37] V. Maroulas, J. L. Mike, and C. Oballe, Nonparametric estimation of probability density functions
of random persistence diagrams, J. Mach. Learn. Res., 20 (2019), pp. 1--49.
[38] V. Maroulas and A. Nebenführ, Tracking rapid intracellular movements: A Bayesian random set
approach, Ann. Appl. Stat., 9 (2015), pp. 926--949, https://doi.org/10.1214/15-AOAS819.
[39] N. W. McNutt, O. Rios, V. Maroulas, and D. J. Keffer, Interfacial Li-ion localization in hierar-
chical carbon anodes, Carbon, 111 (2017), pp. 828--834, https://doi.org/10.1016/j.carbon.2016.10.061.
[40] J. Mike, C. D. Sumrall, V. Maroulas, and F. Schwartz, Nonlandmark classification in paleobiology:
Computational geometry as a tool for species discrimination, Paleobiology, 42 (2016), pp. 696--706.
[41] Y. Mileyko, S. Mukherjee, and J. Harer, Probability measures on the space of persistence diagrams,
Inverse Problems, 27 (2011), 124007, https://doi.org/10.1088/0266-5611/27/12/124007.
[42] M. K. Miller, T. Kelly, K. Rajan, and S. Ringer, The future of atom probe tomography, Mater.
Today, 15 (2012), pp. 158--165, https://doi.org/10.1016/S1369-7021(12)70069-X.
[43] M. P. Moody, B. Gault, L. T. Stephenson, R. K. Marceau, R. C. Powles, A. V. Ceguerra,
A. J. Breen, and S. P. Ringer, Lattice rectification in atom probe tomography: Toward true three-
dimensional atomic microscopy, Microsc. Microanal., 17 (2011), pp. 226--239.
[44] J. Moyal, The general theory of stochastic population processes, Acta Math., 108 (1962), pp. 1--31,
https://doi.org/10.1007/BF02545761.
[45] M. Nicolau, A. J. Levine, and G. Carlsson, Topology based data analysis identifies a
subgroup of breast cancers with a unique mutational profile and excellent survival, Proc. Natl. Acad.
Sci. USA, 108 (2011), pp. 7265--7270.
[46] V. Patrangenaru, P. Bubenik, R. L. Paige, and D. Osborne, Topological Data Analysis for Object
Data, preprint, https://arxiv.org/abs/1804.10255, 2018.
[47] C. M. M. Pereira and R. F. Mello, Persistent homology for time series and spatial data clustering,
Expert Syst. Appl., 42 (2015), pp. 6026--6038.
[48] J. Reininghaus, S. Huber, U. Bauer, and R. Kwitt, A stable multi-scale kernel for topological ma-
chine learning, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
(CVPR), 2015, pp. 4741--4748, https://doi.org/10.1109/CVPR.2015.7299106.
[49] A. Robinson and K. Turner, Hypothesis testing for topological data analysis, J. Appl. Comput. Topol.,
1 (2017), pp. 241--261, https://doi.org/10.1007/s41468-017-0008-7.
[50] L. Santodonato, Y. Zhang, M. Feygenson, C. M. Parish, M. C. Gao, R. J. K. Weber, J. C.
Neuefeind, Z. Tang, and P. K. Liaw, Deviation from high-entropy configurations in the atomic
distributions of a multi-principal-element alloy, Nat. Commun., 6 (2015), 5964, https://doi.org/10.
1038/ncomms6964.
[51] I. Sgouralis, A. Nebenführ, and V. Maroulas, A Bayesian topological framework for the identification
and reconstruction of subcellular motion, SIAM J. Imaging Sci., 10 (2017), pp. 871--899,
https://doi.org/10.1137/16M1095755.
[52] V. D. Silva and R. Ghrist, Coverage in sensor networks via persistent homology, Algebr. Geom. Topol.,
7 (2007), pp. 339--358, https://doi.org/10.2140/agt.2007.7.339.
[53] S. S. Singh, B.-N. Vo, A. Baddeley, and S. Zuyev, Filters for spatial point processes, SIAM J.
Control Optim., 48 (2009), pp. 2275--2295, https://doi.org/10.1137/070710457.


[54] A. E. Sizemore, J. E. Phillips-Cremins, R. Ghrist, and D. S. Bassett, The importance of the
whole: Topological data analysis for the network neuroscientist, Netw. Neurosci., 3 (2019), pp. 656--
673, https://doi.org/10.1162/netn_a_00073.
[55] A. Spannaus, V. Maroulas, D. Keffer, and K. J. Law, Bayesian point set registration, in 2017
Matrix Annals, Springer, Cham, 2019, pp. 99--120.
[56] A. Togo and I. Tanaka, Spglib: A Software Library for Crystal Symmetry Search, preprint, https://arxiv.org/abs/1808.01590, 2018.
[57] K. Turner, S. Mukherjee, and D. M. Boyer, Persistent homology transform for modeling shapes and
surfaces, Inf. Inference, 3 (2014), pp. 310--344, https://doi.org/10.1093/imaiai/iau011.
[58] V. Venkataraman, K. N. Ramamurthy, and P. Turaga, Persistent homology of attractors for action
recognition, in Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP),
2016, pp. 4150--4154, https://doi.org/10.1109/ICIP.2016.7533141.
[59] L. Wasserman, Topological data analysis, Annu. Rev. Stat. Appl., 5 (2018), pp. 501--535.
[60] K. Xia, X. Feng, Y. Tong, and G. W. Wei, Persistent homology for the quantitative prediction of
fullerene stability, J. Comput. Chem., 36 (2014), pp. 408--422, https://doi.org/10.1002/jcc.23816.
[61] Y. Zhang, T. T. Zuo, Z. Tang, M. C. Gao, K. A. Dahmen, P. K. Liaw, and Z. P. Lu, Microstruc-
tures and properties of high-entropy alloys, Prog. Mater. Sci., 61 (2014), pp. 1--93.
[62] A. Ziletti, D. Kumar, M. Scheffler, and L. M. Ghiringhelli, Insightful classification of crys-
tal structures using deep learning, Nat. Commun., 9 (2018), 2775, https://doi.org/10.1038/s41467-018-05169-6.
