Topological Data Analysis and Machine Learning
To cite this article: Daniel Leykam & Dimitris G. Angelakis (2023) Topological data analysis and
machine learning, Advances in Physics: X, 8:1, 2202331, DOI: 10.1080/23746149.2023.2202331
CONTACT Daniel Leykam [email protected] Centre for Quantum Technologies, 3 Science Drive
2, National University of Singapore, 117543, Singapore
© 2023 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (
org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited. The terms on which this article has been published allow the posting of the Accepted Manuscript in a repository by
the author(s) or with their consent.
I. Introduction
Topological quantities are invariant under continuous deformations; an
often-cited example is that a doughnut can be continuously transformed
into a coffee mug – both are topologically equivalent to a torus. The
robustness of topological quantities to perturbations is inspiring physicists
in many fields, including condensed matter, photonics, acoustics, and
mechanical systems [1–4]. In all these areas, topology has enabled the
prediction and explanation of surprisingly robust physical effects.
Most famously, the extremely precise quantisation of the Hall conductivity
observed in two-dimensional electronic systems since the 1980s was
explained as a novel topological phase of matter, the quantum Hall phase
[5]. In this and many other examples from physics, we deal with smooth
deformations in some parameter space, such as the energy bands of solid
state electronic systems.
Physics is, however, an outlier among fields of science in that idealised
continuous models and functions can explain a wide variety of observed
phenomena. Other fields do not have the luxury of continuity and have to
make do with sparse data and limited observations in high-dimensional
parameter spaces. Despite this very different setting, topological approaches
remain powerful.
A suite of computational topological techniques known as topological
data analysis (TDA) has been developed over the past 20 years to systematically
define and study the ‘shape’ of complex discrete data in high-
dimensional spaces. TDA is attracting growing interest among physicists,
particularly those working on topological materials or the application of
machine learning techniques to physics [6–10].
At this time, we are aware of two existing reviews of TDA aimed at the
physics audience. The first by Carlsson, one of the founders of the field, gave
a broad survey of different techniques of TDA and their applications in
various areas of science [11]. The second review, by Murugan and
Robertson, provided a detailed pedagogical and physicist-friendly introduction
to two important techniques, persistent homology and the Mapper
algorithm, applying them to the example of an astronomical dataset [12].
Since publication of these two reviews there has been growing interest in
applying TDA methods to physics, including the incorporation of TDA into
physics-targeted machine learning, with applications including the unsupervised
detection of phase transitions. Moreover, the field of TDA has
continued to evolve with new generalisations and techniques being actively
developed.
The aim of this article is to review cutting-edge applications of TDA to
physics. We will provide a gentle introduction to the basic techniques,
survey how TDA shows promise for the detection of novel phases of matter,
and speculate on what we believe to be important directions for future
research, including opportunities offered by newer TDA methods such as
zigzag persistence.
The structure of this article is as follows: Section II provides a brief
introduction to TDA guided by the simple example of two-dimensional
point clouds. Section III discusses how TDA has been applied to identify
order parameters and phase transitions in various physical systems. Section
Figure 1. Examples of noisy point clouds. Point clouds sampled from objects with differing
shapes and even differing dimensionality may be difficult to distinguish using standard
summary statistics such as the centre of mass and variance. In “Circle” and “Figure 8” the
noise randomly perturbs the points in the ambient two-dimensional space. In “Swiss Roll”
points are sampled from a one-dimensional interval before being embedded into the
two-dimensional space (x, y).
Figure 2. Simplicial complexes constructed from a point cloud using different cutoff distances
ε_i, where blue lines and orange shaded areas denote edges and faces, respectively. For small
cutoff distances all points are disconnected, forming a trivial simplicial complex with no edges
(ε_1). As the cutoff is increased nearby vertices start to become connected by edges (ε_2).
Increasing the cutoff further, triplets of points become connected, forming faces. In ε_3 and ε_4
the simplicial complex has a single connected component hosting a non-trivial cycle. For
sufficiently large cutoff distances the cycle is destroyed by the addition of faces covering the
entire interior of the point cloud (ε_5).
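The construction described in this caption can be illustrated with a minimal Python sketch. It assumes the Vietoris-Rips rule (the complex named in Figure 3): two points are joined by an edge when their separation does not exceed the cutoff, and a face is added whenever all three of its edges are present. The function name `rips_complex`, the `<=` convention for the cutoff, and the toy square point cloud are illustrative assumptions rather than anything taken from the article.

```python
from itertools import combinations
from math import dist


def rips_complex(points, cutoff):
    """Return (edges, faces) of a Vietoris-Rips complex at one cutoff.

    Assumed convention: an edge joins two points whose separation is at
    most `cutoff`; a face (triangle) is added when all three of its
    edges are present.
    """
    n = len(points)
    edges = {(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= cutoff}
    faces = [(i, j, k) for i, j, k in combinations(range(n), 3)
             if (i, j) in edges and (i, k) in edges and (j, k) in edges]
    return sorted(edges), faces


# Toy point cloud (hypothetical): the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

for cutoff in (0.5, 1.0, 1.5):
    edges, faces = rips_complex(square, cutoff)
    print(f"cutoff={cutoff}: {len(edges)} edges, {len(faces)} faces")
```

For this toy cloud the three cutoffs reproduce the qualitative stages of the caption: isolated points, a closed loop of edges enclosing a hole, and finally faces that fill the hole.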
Figure 3. Persistence diagrams of the two-dimensional point clouds shown in Figure 1 computed
using the Vietoris-Rips complex [16]. Each point represents a distinct topological feature.
Horizontal and vertical axes denote the length scales at which each feature is created (b; birth)
and destroyed (d; death) respectively. Points that are further from the diagonal dashed line
therefore persist over a larger range of scales and are said to have a longer “lifetime” l = d − b.
Since features must be created before they are destroyed, no points lie below the diagonal. At
sufficiently large spatial scales all points become connected to form a single connected graph,
corresponding to a single cluster with an infinite lifetime. Typically the infinite lifetime cluster is
either discarded or plotted at a finite d and distinguished using a horizontal dashed line.
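As a rough illustration of how birth and death scales arise, the sketch below computes the persistence pairs of clusters (zero-dimensional features) only. It assumes every point is born at scale 0 and that a cluster dies at the length of the edge that merges it into another cluster, tracked with a union-find structure. Cycles require a full boundary-matrix reduction, which is normally left to dedicated software and is not attempted here. The function name `h0_persistence` and the toy point cloud are hypothetical.

```python
from itertools import combinations
from math import dist, inf


def h0_persistence(points):
    """Birth-death pairs of clusters as the cutoff distance grows."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Candidate edges, processed from shortest to longest.
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    pairs = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            pairs.append((0.0, length))  # one cluster dies at this scale
    pairs.append((0.0, inf))  # the last cluster has infinite lifetime
    return pairs


# Toy cloud (hypothetical): two well-separated pairs of points on a line.
cloud = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
diagram = h0_persistence(cloud)
lifetimes = [death - birth for birth, death in diagram]
print(diagram)
print(lifetimes)
```

The two short lifetimes correspond to neighbouring points merging, the longer one to the two pairs merging, and the infinite entry to the single cluster that survives at all larger scales.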
example, the birth scales of the long-lived cycles in the ‘Circle’ and
‘Figure 8’ clouds are related to a maximum separation between neighbouring
points comprising the cycle, while the death scale will be related to the
cycle’s diameter.
The attentive reader will notice that the persistence diagrams for
the ‘Circle’ and ‘Swiss Roll’ clouds share the same long-lived features,
despite their obviously differing shapes. Closer inspection will, however,
reveal noticeable differences in their short-lived features. For
example, the cycles appearing in the ‘Swiss Roll’ dataset all have
similar birth scales, corresponding to the distance between the inner
and outer parts of the spiral and hinting at a one-dimensional embedding.
This suggests that the differing shapes of these two point clouds
may indeed be captured by inspecting their short-lived features; thus,
persistent homology can also capture the local features (geometry) of
the data.
Figure 4. Matching (green lines) of the one-dimensional cycles of the Circle, Figure 8, and Swiss
Roll point clouds used to compute the Wasserstein distance, which corresponds to the total
length of the green lines.
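The matching idea in this caption can be sketched for very small diagrams by brute force. The sketch assumes an order-1 Wasserstein distance with a Euclidean ground metric, and it allows any point to be matched to its projection on the diagonal at a cost of (d − b)/√2; these conventions, the function names, and the example diagrams are assumptions, since the caption does not specify them. Trying every permutation is only feasible for a handful of points; practical implementations solve an assignment problem instead.

```python
from itertools import permutations
from math import dist, sqrt


def diagonal_cost(point):
    """Euclidean distance from a (birth, death) point to the diagonal."""
    birth, death = point
    return (death - birth) / sqrt(2)


def wasserstein_distance(diagram_a, diagram_b):
    """Order-1 Wasserstein distance between two small persistence diagrams."""
    n, m = len(diagram_a), len(diagram_b)
    # Pad each side with None entries that stand for "the diagonal".
    left = list(diagram_a) + [None] * m
    right = list(diagram_b) + [None] * n

    def cost(p, q):
        if p is None and q is None:
            return 0.0
        if p is None:
            return diagonal_cost(q)
        if q is None:
            return diagonal_cost(p)
        return dist(p, q)

    # Total matched length, minimised over all possible matchings.
    return min(sum(cost(p, right[k]) for p, k in zip(left, perm))
               for perm in permutations(range(n + m)))


# Hypothetical one-dimensional (cycle) diagrams, not taken from the article.
one_cycle = [(0.3, 1.5)]
two_cycles = [(0.3, 0.8), (0.3, 0.9)]
print(wasserstein_distance(one_cycle, two_cycles))
```

In the optimal matching each feature is either paired with a feature of the other diagram or sent to the diagonal, whichever gives the smaller total length.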
Figure 5. (a) Noisy sampling of the chaotic trajectory of the Lorenz attractor and (b) topology-
based filtered data, adapted from Ref. [36]. (c) Snapshots of a chaotic two-dimensional fluid and
(d) estimates of finite-size effects using image eigendecompositions (SVD) and a TDA-based
disorder estimator (TDA) showing the two methods give similar results, adapted from Ref. [40].
Figure 6. Persistence diagrams obtained from molecular dynamics simulations of liquid (a),
amorphous (b), and crystalline (c) phases of silica. Point colours indicate the multiplicity (on
a logarithmic scale) of one-dimensional features. Insets in (b) illustrate representative cycles
corresponding to short- and medium-range order in the amorphous phase. Adapted from Ref.
V. Future directions
A. New techniques for topological data analysis
An area of active research among physicists is the application of TDA tools
to analyse the structure of more complex systems including flow networks
involving directed links [96] and time-evolving networks [92]. One
approach used in recent studies that is compatible with standard persistent
homology tools is to convert the directed network into a regular point cloud
using a diffusion map, which constructs edges between a pair of vertices
(i, j) by computing the probability of diffusion between i and j. It will be
interesting to explore alternate approaches that can work directly with
unidirectional or time-evolving systems without requiring diffusion maps,
such as zigzag persistence [97].
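To make the notion of a diffusion probability between vertices concrete, the following sketch computes the probability that a random walker starting at vertex i on a directed network reaches vertex j after a fixed number of steps. It covers only this ingredient: a full diffusion map additionally builds an embedding from a spectral decomposition, which is omitted here. The function names, the row-normalisation convention, and the three-vertex ring are illustrative assumptions, not the procedure of the cited studies.

```python
def transition_matrix(adjacency):
    """Row-normalise a weighted adjacency matrix into hop probabilities."""
    matrix = []
    for row in adjacency:
        total = sum(row)
        # A vertex without outgoing links keeps a row of zeros.
        matrix.append([w / total if total else 0.0 for w in row])
    return matrix


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def diffusion_probabilities(adjacency, steps):
    """Probability of going from i to j in exactly `steps` hops (steps >= 1)."""
    one_step = transition_matrix(adjacency)
    result = one_step
    for _ in range(steps - 1):
        result = matmul(result, one_step)
    return result


# Hypothetical directed ring 0 -> 1 -> 2 -> 0.
ring = [[0, 1, 0],
        [0, 0, 1],
        [1, 0, 0]]
for steps in (1, 2, 3):
    print(steps, diffusion_probabilities(ring, steps))
```

Because the links are one-way, the probability of reaching j from i generally differs from that of reaching i from j, which is why some symmetrisation or embedding step is needed before standard persistent homology tools can be applied.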
The metrics used for quantifying differences between persistence diagrams
have applications beyond persistent homology. For example, Skinner
et al. [98] used the Wasserstein distance to compare different local neighbourhood
structures of disordered media, based on the intuition that it
encodes the energy cost required to transform one configuration into
another. The advantage of such a topological metric compared to more
conventional measures including the Kullback–Leibler divergence is that
the former is better at distinguishing weakly-overlapping distributions. Are
there other examples where such metrics can be linked to physical
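The contrast with the Kullback–Leibler divergence can be seen in a small one-dimensional sketch. It assumes histograms on identical, unit-spaced bins, for which the order-1 Wasserstein distance reduces to the summed absolute difference of the cumulative distributions. The function names and the three example histograms are hypothetical.

```python
from math import inf, log


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two histograms."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                return inf  # p has weight where q has none
            total += p_i * log(p_i / q_i)
    return total


def wasserstein_1d(p, q):
    """Order-1 Wasserstein distance on shared bins with unit spacing."""
    distance = 0.0
    cumulative = 0.0
    for p_i, q_i in zip(p, q):
        cumulative += p_i - q_i      # running difference of the CDFs
        distance += abs(cumulative)  # mass that must cross this bin edge
    return distance


# Hypothetical non-overlapping histograms: all weight in a single bin.
reference = [1.0, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0, 0.0]
far = [0.0, 0.0, 0.0, 0.0, 1.0]

print(kl_divergence(reference, near), kl_divergence(reference, far))
print(wasserstein_1d(reference, near), wasserstein_1d(reference, far))
```

For these non-overlapping examples the Kullback–Leibler divergence is infinite in both cases and so cannot tell a small shift from a large one, whereas the Wasserstein distance grows with the size of the shift.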
VI. Conclusion
In summary, we have attempted to give an overview of emerging physics
applications of topological data analysis methods, focusing on persistent
homology. The take-home message is that TDA can compress complex
datasets into their essential (topological) features, enabling the training of
simpler machine learning models compared to widely used and computationally
expensive general-purpose artificial neural networks. Nevertheless,
as topological data analysis is relatively new it is still largely employed on an
ad-hoc basis and further work is needed to establish a standard set of
methods that non-specialists can trust [15].
Topological data analysis has already been fruitfully applied to other areas
of research including image analysis and medical science, enabling the
extraction of useful insights from complicated hard-to-visualise datasets.
We hope that the techniques discussed here and in other recent reviews
aimed at the physics audience [11,12] will not merely provide a transient
fashionable alternative to more standard methods of data analysis used by
physicists but will form a new set of long-lasting tools enabling a better
understanding of complex physical systems from classical to quantum.
Disclosure statement
No potential conflict of interest was reported by the author(s).
This research is supported by the National Research Foundation, Singapore, and A*STAR
under its CQT Bridging Grant and Quantum Engineering Programme NRF2021-QEP2-02-
P02, A*STAR (#21709) and by EU HORIZON—Project 101080085 — QCFD.
[1] Hasan MZ, Kane CL. Colloquium: topological insulators. Rev Mod Phys.
2010;82:3045. DOI:10.1103/RevModPhys.82.3045
[2] Ozawa T, Price HM, Amo A, et al. Topological photonics. Rev Mod Phys.
2019;91:015006. DOI:10.1103/RevModPhys.91.015006
[3] Ma G, Xiao M, Chan TC. Topological phases in acoustic and mechanical systems. Nat
Rev Phys. 2019;1:281. DOI:10.1038/s42254-019-0030-x
[4] Kim M, Jacob Z, Rho J. Recent advances in 2D, 3D and higher-order topological
photonics. Light: Sci & Appl. 2020;9:130. DOI:10.1038/s41377-020-0331-y
[5] von Klitzing K, Chakraborty T, Kim P, et al. 40 years of the quantum Hall effect. Nat
Rev Phys. 2020;2:397. DOI:10.1038/s42254-020-0209-1
[6] Carleo G, Cirac I, Cranmer K, et al. Machine learning and the physical sciences. Rev
Mod Phys. 2019;91:045002. DOI:10.1103/RevModPhys.91.045002
[48] Olsthoorn B, Hellsvik J, Balatsky AV. Finding hidden order in spin models with
persistent homology. Phys Rev Res. 2020;2:043308. DOI:10.1103/PhysRevResearch.2.
[49] Sehayek D, Melko RG. Persistent homology of Z2 gauge theories. Phys Rev B.
2022;106:085111. DOI:10.1103/PhysRevB.106.085111
[50] Ormrod Morley D, Salmon PS, Wilson M. Persistent homology in two-dimensional
atomic networks. J Chem Phys. 2021;154:124109. DOI:10.1063/5.0040393
[51] Muldoon M, MacKay R, Huke J, et al. Topology from time series. Phys D. 1993;65:1–
16. DOI:10.1016/0167-2789(92)00026-U
[52] Maletić S, Zhao Y, Rajković M. Persistent topological features of dynamical systems.
Chaos Inter J Nonlinear Sci. 2016;26:053105. DOI:10.1063/1.4949472
[53] Mittal K, Gupta S. Topological characterization and early detection of bifurcations
and chaos in complex systems using persistent homology. Chaos Inter J Nonlinear
Sci. 2017;27:051102. DOI:10.1063/1.4983840
[54] Tran QH, Hasegawa Y. Topological time-series analysis with delay-variant
embedding. Phys Rev E. 2019;99:032209. DOI:10.1103/PhysRevE.99.032209
[55] Tempelman JR, Khasawneh FA. A look into chaos detection through topological data
analysis. Phys D. 2020;406:132446. DOI:10.1016/j.physd.2020.132446
[56] Makarenko N, Karimova L, Novak M. Investigation of global solar magnetic field by
computational topology methods. Phys A. 2007;380:98. DOI:10.1016/j.physa.2007.02.
[57] Kondic L, Goullet A, O’Hern CS, et al. Topology of force networks in compressed
granular media. EPL (Europhysics Letters). 2012;97:54001. DOI:10.1209/0295-5075/
[58] Pugnaloni LA, Carlevaro CM, Kramár M, et al. Structure of force networks in tapped
particulate systems of disks and pentagons. I. clusters and loops. Phys Rev E.
2016;93:062902. DOI:10.1103/PhysRevE.93.062902
[59] Cole A, Shiu G. Persistent homology and non-Gaussianity. J Cosmol Astropart Phys.
2018;025:025. DOI:10.1088/1475-7516/2018/03/025
[60] Leykam D, Angelakis DG. Photonic band structure design using persistent homology.
APL Photonics. 2021;6:030802. DOI:10.1063/5.0041084
[61] Spitz D, Berges J, Oberthaler M, et al. Finding self-similar behavior in quantum
many-body dynamics via persistent homology. SciPost Phys. 2021;11:60. DOI:10.
[62] Leykam D, Rondón I, Angelakis DG. Dark soliton detection using persistent homology.
Chaos Inter J Nonlinear Sci. 2022;32:073133. DOI:10.1063/5.0097053
[63] Cole A, Loges GJ, Shiu G. Quantitative and interpretable order parameters for phase
transitions from persistent homology. Phys Rev B. 2021;104:104426. DOI:10.1103/
[64] Sale N, Giansiracusa J, Lucini B. Quantitative analysis of phase transitions in
two-dimensional XY models using persistent homology. Phys Rev E.
2022;105:024121. DOI:10.1103/PhysRevE.105.024121
[65] Membrillo Solis I, Orlova T, Bednarska K, et al. Tracking the time evolution of soft
matter systems via topological structural heterogeneity. Commun Mater. 2022;3:1.
[66] He Y, Xia S, Angelakis DG, et al. Persistent homology analysis of a generalized Aubry-
André-Harper model. Phys Rev B. 2022;106:054210. DOI:10.1103/PhysRevB.106.
[67] Suzuki A, Miyazawa M, Minto JM, et al. Flow estimation solely from image data
through persistent homology analysis. Sci Rep. 2021;11:17948. DOI:10.1038/s41598-
[68] Cirafici M. Persistent homology and string vacua. J High Energy Phys. 2016;2016:45.
[69] Cole A, Shiu G. Topological data analysis for the string landscape. J High Energy
Phys. 2019;2019:54. DOI:10.1007/JHEP03(2019)054
[70] Hirakida T, Kashiwa K, Sugano J, et al. Persistent homology analysis of deconfinement
transition in effective Polyakov-line model. Int J Mod Physics A.
2020;35:2050049. DOI:10.1142/S0217751X20500499
[71] di Pierro A, Mancini S, Memarzadeh L, et al. Homological analysis of multi-qubit
entanglement. EPL (Europhysics Letters). 2018;123:30006. DOI:10.1209/0295-5075/
[72] Mengoni R, Di Pierro A, Memarzadeh L, et al. Persistent homology analysis of
multiqubit entanglement. Quantum Inf Computation. 2020;20:375. DOI:10.26421/
[73] Olsthoorn B. Persistent homology of quantum entanglement. Phys Rev B.
2021;107:115174. DOI:10.1103/PhysRevB.107.115174
[74] Tran QH, Chen M, Hasegawa Y. Topological persistence machine of phase
transitions. Phys Rev E. 2021;103:052127. DOI:10.1103/PhysRevE.103.052127
[75] Donato I, Gori M, Pettini M, et al. Persistent homology analysis of phase transitions.
Phys Rev E. 2016;93:052138. DOI:10.1103/PhysRevE.93.052138
[76] Park S, Hwang Y, Yang B-J. Unsupervised learning of topological phase diagram
using topological data analysis. Phys Rev B. 2022;105:195115. DOI:10.1103/
[77] Tirelli A, Costa NC. Learning quantum phase transitions through topological data
analysis. Phys Rev B. 2021;104:235146. DOI:10.1103/PhysRevB.104.235146
[78] Tirelli A, Carvalho DO, Oliveira LA, et al. Unsupervised machine learning approaches
to the q-state Potts model. Eur Phys J B. 2022;95:189. DOI:10.1140/epjb/s10051-022-
[79] Rodriguez-Nieva JF, Scheurer MS. Identifying topological order through unsupervised
machine learning. Nat Phys. 2019;15:790. DOI:10.1038/s41567-019-0512-x
[80] Long Y, Ren J, Chen H. Unsupervised manifold clustering of topological phononics.
Phys Rev Lett. 2020;124:185501. DOI:10.1103/PhysRevLett.124.185501
[81] Scheurer MS, Slager R-J. Unsupervised machine learning and band topology. Phys
Rev Lett. 2020;124:226401. DOI:10.1103/PhysRevLett.124.226401
[82] Che Y, Gneiting C, Liu T, et al. Topological quantum phase transitions retrieved
through unsupervised machine learning. Phys Rev B. 2020;102:134213. DOI:10.1103/
[83] Lustig E, Yair O, Talmon R, et al. Identifying topological phase transitions in experiments
using manifold learning. Phys Rev Lett. 2020;125:127401. DOI:10.1103/
[84] Lidiak A, Gong Z. Unsupervised machine learning of quantum phase transitions using
diffusion maps. Phys Rev Lett. 2020;125:225701. DOI:10.1103/PhysRevLett.125.225701
[85] Long Y, Zhang B. Unsupervised data-driven classification of topological gapped
systems with symmetries. Phys Rev Lett. 2023;130:036601. DOI:10.1103/
[86] Albertsson K, Altoe P, Anderson D, et al. Machine learning in high energy physics
community white paper. J Phys Conf Series. 2018;1085:022008. DOI:10.1088/1742-
[107] Berwald JJ, Gottlieb JM, Munch E. Computing Wasserstein distance for persistence
diagrams on a quantum computer. arXiv:1809.06433. 2018. DOI:10.48550/ARXIV.
[108] Ubaru S, Akhalwaya IY, Squillante MS, et al. Quantum topological data analysis with
linear depth and exponential speedup. arXiv:2108.02811. 2021. DOI:10.48550/
[109] Akhalwaya IY, He Y-H, Horesh L, et al. Representation of the fermionic boundary
operator. Phys Rev A. 2022;106:022407. DOI:10.1103/PhysRevA.106.022407
[110] Kerenidis I, Prakash A. Quantum machine learning with subspace states.
arXiv:2202.00054. 2022. DOI:10.48550/ARXIV.2202.00054
[111] Akhalwaya IY, Ubaru S, Clarkson KL, et al. Towards quantum advantage on noisy
quantum computers. arXiv:2209.09371. 2022. DOI:10.48550/ARXIV.2209.09371
[112] Gyurik C, Cade C, Dunjko V. Towards quantum advantage via topological data
analysis. Quantum. 2022;6:855. DOI:10.22331/q-2022-11-10-855
[113] Schmidhuber A, Lloyd S. Complexity-theoretic limitations on quantum algorithms for
topological data analysis. arXiv:2209.14286. 2022. DOI:10.48550/ARXIV.2209.14286
[114] Apers S, Sen S, Szabó D. A (simple) classical algorithm for estimating Betti numbers.
arXiv:2211.09618. 2022. DOI:10.48550/ARXIV.2211.09618