
Deep Learning of Conjugate Mappings

Jason J. Bramburger∗† Steven L. Brunton‡ J. Nathan Kutz∗


arXiv:2104.01874v2 [math.DS] 13 Jul 2021

Abstract
Despite many of the most common chaotic dynamical systems being continuous in time, it
is through discrete time mappings that much of the understanding of chaos is formed. Henri
Poincaré first made this connection by tracking consecutive intersections of the continuous flow with a lower-dimensional, transverse subspace. The mapping that iterates the dynamics through consecutive intersections of the flow with the subspace is now referred to as a Poincaré map, and
it is the primary method available for interpreting and classifying chaotic dynamics. Unfortu-
nately, in all but the simplest systems, an explicit form for such a mapping remains outstanding.
This work proposes a method for obtaining explicit Poincaré mappings by using deep learning
to construct an invertible coordinate transformation into a conjugate representation where the
dynamics are governed by a relatively simple chaotic mapping. The invertible change of variable
is based on an autoencoder, which allows for dimensionality reduction, and has the advantage
of classifying chaotic systems using the equivalence relation of topological conjugacies. Indeed,
the enforcement of topological conjugacies is the critical neural network regularization for learn-
ing the coordinate and dynamics pairing. We provide expository applications of the method
to low-dimensional systems such as the Rössler and Lorenz systems, while also demonstrating
the utility of the method on infinite-dimensional systems, such as the Kuramoto–Sivashinsky
equation.

1 Introduction

Much of our modern understanding of chaotic dynamical systems comes from tracking the inter-
section of trajectories with a lower-dimensional subspace that is transverse to the flow. Such a
subspace is typically called a Poincaré section in honour of Henri Poincaré who pioneered this
method in the late nineteenth century. Chaotic systems can be characterized by considering the
iterates between successive intersections of a single trajectory with the Poincaré section. With
such a mapping, now termed a Poincaré or first-return mapping, it is possible to forecast chaotic
systems and also to better understand the structure of a chaotic attractor. Unfortunately, there is
almost no system of interest for which we have an explicit representation of such a mapping. Thus,
studies of chaotic systems often involve analyzing simplified heuristic mappings that, in most cases,
cannot be directly related back to the systems that motivated their study. Prominent examples of
this include the relationship between the Rössler system and the logistic map [1], and the Lorenz
system and the Hénon map [2]. The goal of this work is to overcome this disconnect by using deep
learning to find an invertible coordinate transformation that allows for the explicit representation

∗ Department of Applied Mathematics, University of Washington, Seattle, WA, 98105
† Email: [email protected]
‡ Department of Mechanical Engineering, University of Washington, Seattle, WA, 98105

of the chaotic dynamics on the Poincaré section while retaining a direct connection back to the
original system.
The power of modern computing and the increasing availability of time-series data have resulted in a
number of data-driven discovery methods for identifying nonlinear dynamical systems models [3–12].
The sparse identification of nonlinear dynamics (SINDy) method [4] employs sparse regression to
best fit the time derivative of time-series data to a sparse linear combination of candidate functions.
SINDy has been adapted to discover partial differential equations [9, 10], multiscale data [13, 14],
boundary value problems [15], conservation laws [16], implicit dynamics [17], conservative fluid
dynamics models [18], stochastic systems [19, 20], and, most important for our work here, Poincaré
maps [21]. In the case of Poincaré maps, it was shown that SINDy can be used to discover an
accurate mapping when the attractor is a fixed point, periodic, or even quasiperiodic, but the
method fails in many cases to provide models that govern chaotic section data. These discovered
Poincaré maps may be inaccurate if the library of candidate functions contains insufficient terms to
describe their dynamics, or when the section data is high-dimensional. The former can potentially
be remedied by continuing to add terms to the library, but at present this amounts to a heuristic
search that may result in overfitting and an ill-conditioned numerical implementation.
To remedy the issue of discovering mappings from chaotic data, deep learning will be leveraged for
the joint discovery of a simplifying coordinate system and the simplified dynamical systems model
in these coordinates [6, 22, 23]. Deep learning provides a flexible architecture for the representation
of data, which has led to its significant integration into the physical and engineering sciences [24–
47]. Specifically, a feedforward, autoencoder neural network structure is used to implement the
concept of topological conjugacy, wherein a nonlinear mapping

$$x_{n+1} = f(x_n) \tag{1.1}$$

is shown to generate equivalent dynamics to another mapping

$$y_{n+1} = g(y_n) \tag{1.2}$$

if and only if there exists a homeomorphism h such that

$$f = h^{-1} \circ g \circ h. \tag{1.3}$$

The relationship (1.3) gives that the relevant topology of the phase spaces of f and g is preserved
under the homeomorphism h. In particular, fixed and periodic orbits of f and g are mapped
into each other in a one-to-one fashion, while their stability is also preserved under the action of
h. Topological conjugacies are often employed to establish local results, such as the Hartman–
Grobman theorem [48], to better understand the dynamics in a small region of interest in the phase
space. The difference here is that we seek a global conjugacy that maps the entire phase space of
f into the entire phase space of g. Therefore, the topological organization of trajectories in phase
space is the same before and after applying the homeomorphism h.
The neural network exploits the concept of a topological conjugacy by circumventing the direct discovery of f and instead discovering the invertible change of variable h along with the ‘simple’ mapping
g that governs iterates of the transformed observations. Here the term simple can be defined by the
user and is informally determined by which and how many candidate functions are used to build
the mapping g. Parsimonious representations are often desired due to interpretability and general-
izability. Thus, unlike [22, 23] where a coordinate system is learned to pair with linear (Koopman)
dynamics, or [6] where a coordinate system is learned to pair with a parsimonious (SINDy) representation of the dynamics, here the coordinate system is learned that pairs with a conjugacy
mapping. Therefore, although there are computational methods for determining homeomorphisms
between two given mappings [49, 50], here we take a data-driven approach and discover the homeo-
morphism and the conjugate mapping simultaneously. The advantage the network presents is that
the candidate functions can be kept relatively basic, while the network performs the difficult task
of transforming the data to be fit by the simple mapping. Discovering simple mappings g that
generate equivalent dynamics to the unknown map f is a data-driven generalization of what has
been done for decades with chaotic dynamics: providing a simple mapping that can be used to
understand the chaotic dynamics of a system of interest. The difference now is that through the
discovered homeomorphism h, a direct correspondence is learned between the heuristic mapping
and the original system dynamics.
Employing neural networks to discover conjugate mappings provides the following benefits:

1. Improved forecasting of chaotic dynamics.

2. Dimensionality reduction.

3. Actionable and interpretable mappings.

Point (1) is achieved through the homeomorphism h by transforming into a coordinate system
where the section data is better fit by a sparse combination of the SINDy library; without h, the
original data often does not admit a simple representation in a standard library. An improved fit
to g in the transformed variables results in an improved fit for f = h−1 ◦ g ◦ h in the original
coordinates.
Point (2) comes from the fact that the homeomorphism h only operates on section data intersected
with the chaotic attractor. Therefore, it may be the case that observations are made up of a low-
dimensional chaotic attractor embedded in a high-dimensional phase space. The neural network
can obtain a change of coordinates that removes redundancies in the observation data and describes the chaotic attractor with the fewest degrees of freedom possible. To estimate the dimension of the
attractor we will use the Kaplan–Yorke dimension [51] which is derived from the Lyapunov spectrum
of the system. This is motivated in part by Kaplan and Yorke’s conjecture that for many chaotic
systems their measure of dimension is equal to the information dimension of the attractor [52], and
that it is useful in approximating an attractor’s fractal and Hausdorff dimensions [53].
Point (3) is a consequence of the limited library used to build the conjugate mapping g. The
main advantage of having an explicit form for the Poincaré mapping is the ability to identify its
periodic/recurrent orbits. It is known that infinitely many unstable periodic orbits (UPOs) densely
fill a chaotic attractor, and these UPOs manifest themselves as recurrent points in the Poincaré
map. In this way, UPOs form the skeleton of the attractor and a chaotic transient can be visualized
as closely following one UPO and then randomly switching to follow another, ad infinitum. This
approach to understanding chaotic dynamics is further emphasized by noting that concepts such as
Lyapunov exponents, entropy, and fractal dimensions can be expressed in terms of weighted sums
over the embedded UPOs [54–56]. Taking advantage of the equivalence of the dynamics generated
by the unknown Poincaré map f and the map g, one may identify UPOs in g and then map
them bijectively to UPOs in f using h−1 . This results in a quick and easy method for identifying
the location of UPOs in a Poincaré section that is potentially much simpler than other data-driven
methods [57]. Furthermore, the identified UPOs in the section can be used to seed shooting methods
to find the corresponding UPOs in the full continuous-time dynamical system, after which a fast
fixed point iteration will converge to the true UPO.

The combination of dimensionality reduction and obtaining explicit mappings holds great promise
for exploiting UPOs in the flow of high- and infinite-dimensional systems, such as partial differential
equations (PDEs). First, by identifying UPOs in the Poincaré section, the Ott-Grebogi-Yorke
(OGY) method can be implemented for controlling the output of a chaotic system [58]. This
method relies on applying small, precise parameter kicks each time the flow crosses the section to
keep trajectories close to a chosen UPO. In this way, one no longer has a chaotic output from the
system, but one that is periodic and therefore predictable. Second, much like in low-dimensional
chaotic systems, UPOs have been shown to guide the spatio-temporal chaos observed in turbulent
fluid models [59–67]. The work of Yalniz et al. exploits the transient visits of turbulent solutions to
the 3D Navier–Stokes equations to neighbourhoods of UPOs by reducing the system to a Markov
chain where each node corresponds to the neighbourhood of a UPO [68]. Since finding UPOs is
a major technical challenge for implementing this method on a wide variety of fluid flows, we are
hopeful that the methods presented here will be used to facilitate this analysis. To this end, the
utility of the methods is demonstrated for chaotic infinite-dimensional flows on the Kuramoto–
Sivashinsky PDE, a prototypical chaotic PDE used to understand turbulent flow.
This paper is organized as follows. In Section 2 the relevant mathematical concepts that form
the basis for the deep conjugacy computational algorithms are presented. This includes defining
Poincaré sections, conjugacies, and Lyapunov exponents, while also presenting numerical procedures
for seeding initial guesses for UPOs and controlling chaos. Section 3 contains a description of the
network architecture which is entirely designed to mimic the conjugacy f = h−1 ◦ g ◦ h, as described
above. Various aspects of the method are illustrated in Section 4 on three distinct three-dimensional
chaotic systems. Each system is strategically chosen to demonstrate that the neural network can
be used to provide nonlinear changes of variables to better fit polynomial mappings, target certain
types of mappings to confirm long-standing conjectures, and reduce the dimension of the observed
variables down to the dimension of the attractor. Once the various advantages to the neural network
are illustrated, two infinite-dimensional systems are considered in Section 5. These systems are the
Kuramoto–Sivashinsky PDE and the Mackey–Glass delay-differential equation. In both cases we
show that the Poincaré section dynamics can be accurately captured by simple quadratic mappings,
resulting in the extraction of a number of UPOs in the case of the former and numerical evidence
for a decades old conjecture in the case of the latter. We conclude in Section 6 with a discussion
of our findings and some avenues for future work.

2 Mathematical Framework

The construction of deep conjugacy relations requires a diversity of mathematical methods. As such,
the following section highlights the critical mathematical methods leveraged for the task, allowing
for an implementation and interpretation of the neural network architecture. We complement this theoretical presentation with brief discussions of numerical implementations.

2.1 Poincaré Maps

Many of our modern methods for understanding periodic, recurrent, and chaotic dynamics come
from the formative work of Henri Poincaré in the late nineteenth century. He proposed that to
better analyze the flow of a continuous time dynamical system

$$\dot{x} = F(x), \qquad F : \mathbb{R}^D \to \mathbb{R}^D \tag{2.1}$$

one could define a transverse-to-the-flow hypersurface X ⊂ RD and simply track the values of the
trajectories x(t) each time they intersect the surface. This takes the dynamics of a continuous
system with a D-dimensional phase space to a discrete collection of iterates in the (at most)
(D − 1)-dimensional hypersurface X. For a trajectory x(t) we obtain a sequence

$$\{x_n := x(t_n) \;|\; x(t_n) \in X,\; t_n > t_{n-1}\} \tag{2.2}$$

consisting of the successive intersections of the trajectory with the surface X. One may then
consider an iterative scheme
$$x_{n+1} = f(x_n). \tag{2.3}$$
The above mapping f is referred to as a Poincaré map or first return map, with the hypersurface
X typically referred to as a Poincaré section.
One major benefit in moving from a continuous time dynamical system to a Poincaré map is the
reduction in the phase space since we may discard information about the trajectories away from
their intersection with the lower-dimensional Poincaré section. Beyond this, periodic orbits of a
continuous time system manifest themselves as a recurrent set of points in the Poincaré section,
and therefore Poincaré maps provide an accessible method of identifying periodic orbits in a con-
tinuous flow field. This approach is predicated on the idea that one has access to a closed form
Poincaré mapping f : X → X, which traditionally has been a difficult problem. Recent methods
have leveraged data-driven model discovery methods to obtain such mappings [21], although these
methods become less accurate as the dimensionality D ≥ 1 of the phase space becomes large. In
what follows we will outline how we may leverage state-of-the-art machine learning techniques to
perform dimensionality reduction and nonlinear coordinate transforms to obtain simple mappings
which generate equivalent dynamics to an unknown Poincaré map.

2.2 Conjugacies and Dynamical Equivalences

The previous subsection motivates our study of discrete dynamical systems of the form (2.3). In
practice, the dynamics generated by the function f : X → X may be difficult to analyze, and so it is
often strategic to establish an equivalence of the dynamics with another mapping g : Y → Y whose
dynamics are better understood. Such an equivalence is achieved by obtaining a homeomorphism,
or conjugacy, h : X → Y so that g ◦ h = h ◦ f. Importantly, this gives that iterating (2.3) is equivalent to applying h to xn, then g, and then h^{-1}; that is, f = h^{-1} ◦ g ◦ h. The situation is
summarized with the commutative diagram:

$$\begin{array}{ccc} X & \xrightarrow{\;f\;} & X \\ \downarrow{\scriptstyle h} & & \downarrow{\scriptstyle h} \\ Y & \xrightarrow{\;g\;} & Y \end{array} \tag{2.4}$$

Hence, iterates governed by f lie in one-to-one correspondence with iterates of g, mapped into each
other by h and h−1 . If such an h can be found, we say that f and g are topologically conjugate,
or simply conjugate. Conjugacies are used to demonstrate dynamical equivalence of two mappings
and they form an equivalence relation on the set of all continuous surjections of a topological space
to itself. We note that conjugacies have also been referred to as factor maps, particularly in the
Koopman operator literature [69, 70].
In practice, finding a conjugacy between mappings is exceedingly difficult, but potentially tremendously rewarding. For example, suppose that f is a Poincaré mapping from the intersection of the
chaotic attractor with the Poincaré section back into itself. It may be the case that the dynamics
are difficult to analyze, and so establishing a conjugacy of f with a simpler mapping g would help
to better understand the chaotic dynamics of f . Importantly, periodic orbits of f and g lie in one-
to-one correspondence and their stability properties are transferred between the mappings using
the conjugacy. As we will see in what follows, we can identify periodic orbits in the (purportedly)
simpler mapping g, use h−1 to transfer them over to points in the Poincaré section as periodic
orbits of f , and use these points to either seed initial guesses for identifying UPOs or implement
an algorithm to tame the output of the chaotic system. This method of identifying UPOs and the
algorithm for controlling chaos are outlined in the following two subsections.

2.3 Discovery and Construction of UPOs

As discussed in the introduction, UPOs can be thought of as forming the skeleton of a chaotic
attractor. Therefore, to best understand the geometry and dynamics of the attractor, we seek to
identify these periodic orbits. To begin, a T -periodic solution of the ordinary differential equation

$$\dot{x} = F(x) \tag{2.5}$$

satisfies x(t + T ) = x(t) for all t ∈ R. In almost every application the period T > 0 is unknown,
and therefore we will introduce t = sT so that the unknown T becomes a parameter in the system.
Then, T -periodic solutions of (2.5) are equivalent to 1-periodic solutions of

$$\frac{dx}{ds} = T\, F(x). \tag{2.6}$$
Following [71], the search for 1-periodic UPOs in (2.6) is equivalent to seeking a solution (x, T ) to
the system of (nonlinear) equations

$$\begin{aligned} \frac{dx}{ds} - T\, F(x) &= 0 \\ x(1) - x(0) &= 0 \\ \int_0^1 \langle \dot{x}_0(s),\, x_0(s) - x(s) \rangle\, ds &= 0 \end{aligned} \tag{2.7}$$

where x0 (s) comes from an initial guess (x0 , T0 ) at the solution (x, T ). The first two equations
guarantee that the solution x(s) = x(t/T ) is a 1-periodic solution of (2.6), and therefore a T -
periodic solution of (2.5). Since the system is autonomous, it follows that any time shift of a
solution is also a solution, leading to a non-uniqueness of solutions to the first two equations in
(2.7). To remedy this, the third equation fixes a specific time-shift, based on the initial guess, and
in turn makes the solution unique.
The nonlinear equations above can be discretized in s to produce a nonlinear system of equations
that can be solved using any number of root-finding algorithms. We elect to use the built-in
MATLAB routine fsolve. The major difficulty of searching for UPOs then becomes finding an
appropriate initial condition that converges to a desired UPO. Recall that a chaotic attractor has
infinitely many UPOs that densely fill the attractor, and therefore (2.7) may have infinitely many
solutions. In the previous subsection we described a method for obtaining the location of periodic
points in the Poincaré section using a conjugate mapping. These periodic points correspond to

6
the UPOs intersecting the section, and therefore we may use these periodic points to initialize our
root-finding routine. Indeed, given a sequence of periodic points {x̄1 , x̄2 , . . . , x̄N } in the Poincaré
section, we may consider the initial value problems

$$\dot{x} = F(x), \qquad x(0) = \bar{x}_k, \qquad k = 1, \dots, N. \tag{2.8}$$

We flow each of these trajectories until they reach the Poincaré section again. This produces a
sequence of trajectories starting from an iterate of the UPO in the Poincaré section that flow
forward until reaching the section again, likely reaching a point near the next iterate in the section.
Sequentially concatenating all of these trajectories together produces an approximation of the
UPO, although with jump discontinuities each time a trajectory reaches the section due to the
concatenation. The length of this initial guess further produces an initial guess for the period, T0 .
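To make this concrete, the following is a minimal sketch of the discretized version of (2.7), using SciPy in place of the MATLAB routine and a low-order periodic finite difference in s; the vector field F is generic, and the initial guess x0 is assumed to be an (m, D) array sampled from the concatenated section-to-section guess described above.

```python
import numpy as np
from scipy.optimize import fsolve

def upo_residual(z, F, x0, m, D):
    # Unknowns: m states on a periodic grid in s, plus the period T.
    X, T = z[:-1].reshape(m, D), z[-1]
    ds = 1.0 / m
    dX = (np.roll(X, -1, axis=0) - X) / ds        # periodic forward difference
    flow = dX - T * np.array([F(x) for x in X])   # dx/ds - T F(x) = 0 at each node
    dx0 = (np.roll(x0, -1, axis=0) - x0) / ds
    phase = np.sum(dx0 * (x0 - X)) * ds           # discretized phase condition
    return np.append(flow.ravel(), phase)         # m*D + 1 equations, m*D + 1 unknowns

def find_upo(F, x0, T0):
    m, D = x0.shape
    z0 = np.append(x0.ravel(), T0)
    z = fsolve(lambda z: upo_residual(z, F, x0, m, D), z0, xtol=1e-12)
    return z[:-1].reshape(m, D), z[-1]            # the UPO samples and its period T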
In our present formulation we seek to employ a neural network not to discover the Poincaré mapping
explicitly, but a mapping that is conjugate to it along with the homeomorphism that transfers one
between the mappings. Therefore, we will see that with an explicit conjugate mapping, g, we may
extract a sequence of its periodic points {ȳ1 , ȳ2 , . . . , ȳN } and then use the inverse homeomorphism
h^{-1} to obtain the desired sequence {x̄1, x̄2, . . . , x̄N} such that x̄k = h^{-1}(ȳk) for all k = 1, . . . , N. We
will demonstrate with our examples below that this process produces a fast and accurate method
for obtaining UPOs in chaotic systems.

2.4 Controlling Chaos

In their seminal work [58], Ott, Grebogi, and Yorke described a method for exploiting the UPOs
that densely fill a chaotic attractor to tame the output of a parameter-dependent system. This
approach to controlling chaos relies on finding small parameter manipulations to keep the flow of a
chaotic system near a UPO, thus making the output periodic and therefore predictable. We briefly
illustrate by assuming that
$$x_{n+1} = f(x_n, \mu) \tag{2.9}$$
is a parameter-dependent Poincaré mapping of a chaotic dynamical system. Suppose {x̄1 , x̄2 , . . . , x̄N }
is a periodic sequence of the Poincaré mapping at a parameter value µ = µ̄. Linearizing system
(2.9) about each of the N points (x̄k , µ̄) means that locally we have

$$x_{n+1} \approx \bar{x}_{k+1} + A_k (x_n - \bar{x}_k) + B_k (\mu_n - \bar{\mu}), \tag{2.10}$$

where x̄N+1 = x̄1 by periodicity. Here the matrices Ak and Bk are the Jacobian matrices of the
Poincaré map evaluated at each pair (x̄k , µ̄). Since the periodic orbit is assumed to be unstable,
the matrix A1 A2 · · · AN has at least one eigenvalue outside the unit circle in the complex plane.
Following [72], we can employ the pole-placement method to stabilize the periodic orbit. This is
achieved by introducing small parameter manipulations at each iterate of the form
$$\mu_n = \begin{cases} \bar{\mu} + C_k (x_n - \bar{x}_k) & |x_n - \bar{x}_k| \le \eta \\ \bar{\mu} & \text{otherwise} \end{cases} \tag{2.11}$$

where η > 0 is small. Notice that with µn as above, near each x̄k we have

$$x_{n+1} \approx \bar{x}_{k+1} + (A_k + B_k C_k)(x_n - \bar{x}_k), \tag{2.12}$$

and so the matrices Ck are chosen in such a way to ensure that all eigenvalues of

$$(A_1 + B_1 C_1)(A_2 + B_2 C_2) \cdots (A_N + B_N C_N) \tag{2.13}$$

lie inside the unit circle of the complex plane. With such Ck the UPO is now stable. Note that the
choice of Ck can be automated using linear matrix inequalities [73].
Although the above controlling chaos algorithm works perfectly on paper, it suffers from the fact
that it requires an explicit description of the Poincaré map (2.9). In fact, much work has been done
to circumvent this issue and it is understood that one may implement such a control algorithm
with only the locations of the UPOs in the section. Indeed, given a sequence {x̄1 , x̄2 , . . . , x̄N }
corresponding to a UPO, the derivatives in Ak and Bk can be estimated directly from the collected
section data. For example, if the governing equations are known, one may estimate derivatives
evaluated at a pair (x̄k , µ̄) in the following way. Initialize a trajectory of the system with initial
condition (x̄k ± hej, µ̄), where ej is the jth coordinate vector and h > 0 is small. We flow the trajectory forward until reaching the section again, so that this terminal value gives the value of f(x̄k ± hej, µ̄). Hence, we may estimate

$$\partial_j f(\bar{x}_k, \bar{\mu}) \approx \frac{1}{2h}\left[ f(\bar{x}_k + h e_j, \bar{\mu}) - f(\bar{x}_k - h e_j, \bar{\mu}) \right]. \tag{2.14}$$
A similar procedure can be used to estimate the derivative in µ with the initial conditions (x̄k, µ̄ ± h).
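The sketch below illustrates these estimates for the simplest case of a period-one orbit (N = 1). It is an illustration only: P(x, µ) is assumed to be a user-supplied routine that flows a section point forward to its next intersection, and the gain is chosen with SciPy's pole-placement routine.

```python
import numpy as np
from scipy.signal import place_poles

def estimate_jacobians(P, xbar, mubar, h=1e-6):
    # Central-difference estimates of A = D_x f and B = D_mu f at (xbar, mubar),
    # following (2.14); P(x, mu) returns the next intersection with the section.
    D = xbar.size
    A = np.zeros((D, D))
    for j in range(D):
        e = np.zeros(D); e[j] = h
        A[:, j] = (P(xbar + e, mubar) - P(xbar - e, mubar)) / (2 * h)
    B = ((P(xbar, mubar + h) - P(xbar, mubar - h)) / (2 * h)).reshape(D, 1)
    return A, B

def ogy_gain(A, B, poles):
    # place_poles returns K with eig(A - B K) = poles, so C = -K renders
    # A + B C stable, as required in (2.12)-(2.13) for a period-one orbit.
    return -place_poles(A, B, np.asarray(poles)).gain_matrix

# Control law (2.11): mu_n = mubar + (C @ (x_n - xbar)).item() whenever
# |x_n - xbar| <= eta, and mu_n = mubar otherwise.
```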
Therefore, the main challenge to applying the control of chaos algorithm to a broad range of physical
systems is identifying the location of the UPOs in the Poincaré section. Although there are methods
to extract these UPOs from section data [57], UPOs can be identified quickly and efficiently by
applying the inverse homeomorphism h−1 to the UPOs of the discovered conjugate mapping. Since
the goal of this work is to identify an explicit, relatively simple, and low-dimensional conjugate
mapping, it follows that extracting its UPOs can be automated with relative ease. In
Section 4 we discuss this efficiency as it applies to both the Rössler and Gissinger systems, while also
providing MATLAB implementations for controlling these systems in the code repository associated
to the manuscript.

2.5 Lyapunov Exponents

A standard approach to quantify chaotic behaviour is by analyzing the Lyapunov exponents of an attractor, which we now describe. We start with an ordinary differential equation

$$\dot{x} = F(x) \tag{2.15}$$

and a solution to the system x(t). The Lyapunov exponents (LEs) describe the rate of contraction
or separation of trajectories that start nearby x(t). Precisely, let J(t) := DF(x(t)) be the Jacobian
matrix of F at x(t) for each t. Then, we can define the evolution of the tangent vectors M(t) along the trajectory via the differential equation

$$\dot{M} = J(t)\, M \tag{2.16}$$

with the initial condition M(0) = I, the identity matrix. The LEs are defined to be the eigenvalues
of the matrix
$$\Lambda := \lim_{t \to \infty} \frac{1}{2t} \log\!\left( M(t)\, M(t)^{\mathsf{T}} \right). \tag{2.17}$$

The limiting matrix, Λ, is guaranteed to exist by Oseledets theorem and is a symmetric matrix,
making the eigenvalues real. When x(t) is a steady-state solution, i.e. independent of t, then the
LEs are just the real part of the eigenvalues of the constant matrix J. Similarly, when x(t) is
periodic, then the LEs correspond to the real part of the Floquet exponents.
Although the LEs can be used to understand the stability of steady-state and periodic solutions,
they are primarily employed to quantify systems with chaotic behaviour. Throughout we can order
the real-valued LEs such that
$$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_D, \tag{2.18}$$
where D ≥ 1 is the dimension of the system. For chaotic systems, Oseledets theorem guarantees
that the LEs are independent of x(0), and chaos is typically indicated by the presence of at least one
positive LE. Furthermore, one LE is always zero, corresponding to the eigenvector in the direction
of the flow of x(t). Throughout we will calculate the LEs using the work of [74], for which the LEs
will only be reported to two decimal places due to potential numerical inaccuracy.
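For reference, a bare-bones sketch of such a calculation is given below; it evolves (2.15) and (2.16) together and accumulates growth rates through repeated QR re-orthonormalization. A simple Euler step is used purely for brevity, whereas the computations reported here follow the more careful procedure of [74].

```python
import numpy as np

def lyapunov_spectrum(F, DF, x0, dt=1e-3, steps=1_000_000, ortho_every=100):
    # Integrate the flow (2.15) and tangent dynamics (2.16) side by side,
    # logging the growth captured in the diagonal of R at each QR step.
    D = x0.size
    x, M = x0.astype(float), np.eye(D)
    sums, time = np.zeros(D), dt
    for n in range(1, steps + 1):
        x = x + dt * F(x)                 # Euler step of the flow
        M = M + dt * DF(x) @ M            # Euler step of the tangent flow
        if n % ortho_every == 0:
            Q, R = np.linalg.qr(M)
            sums += np.log(np.abs(np.diag(R)))
            M, time = Q, n * dt
    return np.sort(sums / time)[::-1]     # ordered as in (2.18)
```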
The significance of the LEs is that they can be used to estimate the dimension of a chaotic attractor.
Indeed, the Kaplan–Yorke dimension (sometimes Lyapunov dimension) is defined as
$$D_{KY} = k + \sum_{i=1}^{k} \frac{\lambda_i}{|\lambda_{k+1}|}, \tag{2.19}$$

where k is the maximum integer such that the sum of the k largest LEs is nonnegative. The
Kaplan–Yorke dimension is conjectured to be the dimension of the chaotic attractor and from this
conjecture we will use it as an approximation of the dimension of the latent variables inside the
network. This implies that although we may have a description of the iterates inside a Poincaré
section using a large number of variables, it could be the case that the attractor dimension is small,
as measured by the Kaplan–Yorke dimension. Thus, a conjugacy between the data collected in the
section and a new map g does not need to use the same number of variables as the collected data
since the conjugacy is only established from attractor to attractor. This will allow for a notion of
dimensionality reduction from the observed variables to the latent variables governed by the map
g. The dimension of these latent variables, d ≥ 1, will be given by

$$d \approx D_{KY} - 1, \tag{2.20}$$

where some rounding will be necessary since DKY is rarely an integer. Recall that DKY is the
dimension of the attractor in the continuous system (2.15), and so we subtract one dimension as
we are moving to discrete iterates in the Poincaré section.
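As a small worked example, the following sketch evaluates (2.19) and the rounding in (2.20) on the Lorenz spectrum reported later in Section 4.2:

```python
import numpy as np

def kaplan_yorke(les):
    # (2.19): k is the largest integer with a nonnegative partial sum of LEs.
    les = np.sort(np.asarray(les))[::-1]
    csum = np.cumsum(les)
    k = np.max(np.where(csum >= 0)[0]) + 1
    return k + csum[k - 1] / abs(les[k])

DKY = kaplan_yorke([0.90, 0.0, -14.57])   # ≈ 2.06 for the Lorenz system
d = round(DKY - 1)                        # latent dimension d ≈ 1, as in (2.20)
```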

3 Network Architecture

The proposed network architecture is based on the concept of obtaining a conjugate mapping and
is illustrated in Figure 1. This neural network takes the form of a feedforward autoencoder. The
general idea is that if {xn } ⊂ RD−1 is a set of observations obtained from intersections of the flow

Figure 1: Diagram of the deep learning architecture for discovering nonlinear mappings f. The network first conjugates the inputs xn into the latent variables yn = h(xn). Then, the latent variables are stepped forward using a discovered mapping g : R^d → R^d. Finally, the outputs of the unknown mapping f are recovered by applying h^{-1} to the stepped latent variables, so that f(xn) = h^{-1}(g(h(xn))).

with the Poincaré section, then instead of attempting to identify a mapping

$$x_{n+1} = f(x_n), \tag{3.1}$$

we use the network to find a homeomorphism h and a conjugate mapping g such that f =
h−1 ◦ g ◦ h. The neural network combines both nonlinear changes of coordinates and dimensionality
reduction to arrive at a latent mapping g as a function of the latent variables y ∈ Rd . Like other
data-driven discovery algorithms, the function g is expanded as a basis of library functions. The
coefficients in front of each basis function are set as variables in the network which are tuned
along with the weights of the network. The conjugacy function mimics an autoencoder with h and
h−1 trained as an encoder and decoder, respectively. We may collect training data from one long
trajectory of the nonlinear dynamical system since the transitivity of orbits on a chaotic attractor
can be relied upon to densely fill the attractor. This means that the longer the trajectory, the more
we have mapped out the entire attractor in the observable variables in the section, thus densely
filling out the domain of the conjugacy function h.
The goal is that the conjugate/latent mapping g : Rd → Rd is simple, as defined by the user. For
example, polynomials are some of the simplest basis functions that a function can be expanded in,
but it is unlikely that the unknown function f is itself a polynomial. With this in mind, we can
create a library of monomials up to a finite degree. The conjugacy h could then be used to identify
a nonlinear change of variable, under which the iterates of the latent variables yn := h(xn ) are
governed by a polynomial map. A well-known example of this is the conjugacy between the tent
map
$$f(x) = 2 \min\{x, 1 - x\} \tag{3.2}$$
and the logistic map
$$g(y) = 4y(1 - y). \tag{3.3}$$

Indeed, the conjugacy is known to be h(x) = (2/π) arcsin(√x), thus providing a nonlinear change of coordinates that takes iterates of a piecewise continuous mapping, f, to a quadratic polynomial map, g.
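This example is easily checked numerically: pushing tent map iterates through y = sin²(πx/2), the inverse of the arcsin change of coordinates above, produces iterates governed exactly by the logistic map. A minimal sketch:

```python
import numpy as np

f = lambda x: 2 * min(x, 1 - x)              # tent map (3.2)
g = lambda y: 4 * y * (1 - y)                # logistic map (3.3)
phi = lambda x: np.sin(np.pi * x / 2) ** 2   # inverse of the arcsin map above

x = 0.1234
for _ in range(20):
    # phi maps each tent iterate onto the corresponding logistic iterate
    assert abs(phi(f(x)) - g(phi(x))) < 1e-12
    x = f(x)
```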
As mentioned above, an advantage to using the neural network to obtain conjugate mappings is that it provides a notion of dimensionality reduction. Recall that the conjugacy is only between
attractors and the dimension of the attractor is approximated by the Kaplan–Yorke dimension,
DKY . If the flow of the governing system lies in RD , then necessarily we have that DKY ≤ D. In
almost every case of interest this inequality is strict. Recalling from Subsection 2.5 that we take
the latent mapping g to have phase-space Rd with d ≈ DKY − 1, we have the potential to greatly
reduce the dimension from that of the observations {xn } ⊂ RD−1 to that of the latent variables
y ∈ Rd . This dimensionality reduction provides one of the major benefits for obtaining conjugacies
with a neural network since it allows us to understand relatively low-dimensional attractors in
high-dimensional and even infinite-dimensional systems. We will illustrate this utility with the
Kuramoto–Sivashinsky equation in Section 5.1.
The loss function used to train the network is the sum of the following losses (a code sketch of the main terms is given after the list):

1. Loss 1: Conjugacy loss. A requirement for obtaining a conjugacy is that h be a homeomorphism, and is therefore invertible. To ensure this, we require h^{-1} ◦ h to reconstruct the training data. This loss is given by ‖xn − h^{-1}(h(xn))‖_MSE, where ‖·‖_MSE refers to the mean square error.

2. Loss 2: Prediction loss. Applying the network to xn results in h^{-1}(g(h(xn))), and to ensure this replicates the dynamics of the unknown Poincaré map f we want the output to be xn+1 = f(xn). To enforce this we use the loss ‖xn+1 − h^{-1}(g(h(xn)))‖_MSE. Furthermore, a conjugacy maps orbits of f to orbits of g, and so we may generalize the loss to incorporate this. In general we have

$$\frac{1}{s} \sum_{j=1}^{s} \left\| x_{n+j} - h^{-1}\big(g^{j}(h(x_n))\big) \right\|_{\mathrm{MSE}} \tag{3.4}$$

where the number of steps s ≥ 1 is a hyperparameter in the system and g^j is the composition of g with itself j times.

3. Loss 3: Latent mapping loss. Starting with xn and xn+1, following the commutative diagram (2.4) means that g(h(xn)) and h(xn+1) should be the same. This is reflected by the loss ‖h(xn+1) − g(h(xn))‖_MSE, which can similarly be generalized to s ≥ 1 steps into the future with

$$\frac{1}{s} \sum_{j=1}^{s} \left\| h(x_{n+j}) - g^{j}(h(x_n)) \right\|_{\mathrm{MSE}} \tag{3.5}$$

4. Loss 4: Network regularization. We apply elastic net regularization by including ℓ1 and ℓ2 regularization terms to avoid overfitting.

In some cases we have found it advantageous to introduce one more loss:

5. Loss 5: Attractor stretching. If the image of h is very small then Loss 3 will also be small, even if the latent mapping g is inaccurate. To get around this we introduce a loss that stretches the attractor across the hypercube [0, 1]^d. This loss takes the form

$$\frac{1}{4d} \sum_{j=1}^{d} \left[ \big(1 - \max_n\{h_j(x_n)\}\big)^2 + \big(\min_n\{h_j(x_n)\}\big)^2 + \big(1 - \max_n\{g_j(h(x_n))\}\big)^2 + \big(\min_n\{g_j(h(x_n))\}\big)^2 \right] \tag{3.6}$$

where hj and gj denote the jth components of h and g, respectively. This loss guarantees that the domain and range of g go from 0 to 1 in each component of h(xn). Note that such a stretching or compressing in each dimension can be achieved by conjugating any map with a linear conjugacy function. Therefore, this loss works to make the choice of g unique by fixing the extremal values of the domain and range.
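As mentioned above, a code sketch of the main terms is as follows. It is a minimal PyTorch illustration, not the accompanying repository's implementation, with s = 1 prediction steps and a simple diagonal quadratic library for g; the elastic net and stretching terms are assembled analogously.

```python
import torch
import torch.nn as nn

class DeepConjugacy(nn.Module):
    def __init__(self, D, d, width=128):
        super().__init__()
        # Encoder h and decoder h^{-1}, trained as an autoencoder pair.
        self.h = nn.Sequential(nn.Linear(D, width), nn.SELU(),
                               nn.Linear(width, width), nn.SELU(),
                               nn.Linear(width, d))
        self.h_inv = nn.Sequential(nn.Linear(d, width), nn.SELU(),
                                   nn.Linear(width, width), nn.SELU(),
                                   nn.Linear(width, D))
        # Trainable coefficients of the latent library, here {y, y^2}.
        self.coeffs = nn.Parameter(0.1 * torch.randn(2, d))

    def g(self, y):
        return self.coeffs[0] * y + self.coeffs[1] * y ** 2

def conjugacy_loss(model, xn, xn1):
    mse = nn.MSELoss()
    yn = model.h(xn)
    loss1 = mse(model.h_inv(yn), xn)            # Loss 1: h^{-1}(h(x_n)) = x_n
    loss2 = mse(model.h_inv(model.g(yn)), xn1)  # Loss 2: one-step prediction
    loss3 = mse(model.g(yn), model.h(xn1))      # Loss 3: latent mapping
    return loss1 + loss2 + loss3
```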

Network weights are initialized randomly according to a normal distribution with mean zero, and each hidden layer of the network uses the SELU activation function [75]. To optimize performance, all training data is scaled to only take values between 0 and 1. Complete Python code and Jupyter note-
books to implement these methods are available at github.com/jbramburger/Deep-Conjugacies,
along with saved models that reproduce the results in this work. These saved models are the result
of extensive searches for the smallest loss over the various parameters used to build the network
and the above components of the loss function. These parameter values are given in the appendix.

4 Low-Dimensional Systems

We begin our investigation with three chaotic systems with three-dimensional phase spaces. These
examples are meant to illustrate the various tasks that the network is able to perform. Particularly,
we will see that in the case of the Rössler system the network can perform a nonlinear change of
coordinates that improves the fit of the data to a polynomial Poincaré map. In the case of the
Lorenz system we will see how the network can be used to obtain conjugacies between the section
data and heuristic mappings that have long been used to understand the section dynamics. Finally,
we will use the Gissinger system to demonstrate the dimensionality reduction aspect of the network
by moving from the three-dimensional phase space of the continuous flow, to the two-dimensional
Poincaré section data, to a conjugated one-dimensional mapping.

4.1 Rössler System

Our first case study will be the Rössler system, given by

$$\begin{aligned} \dot{x}_1 &= -x_2 - x_3 \\ \dot{x}_2 &= x_1 + 0.1 x_2 \\ \dot{x}_3 &= 0.1 + x_3 (x_1 - c) \end{aligned} \tag{4.1}$$

where c ∈ R is a bifurcation parameter. It is well known that when x1 crosses 0 on the attractor, so does x3. Thus, we take x1 = 0 with ẋ1 > 0 as our Poincaré section, reducing ourselves to iterating
in a single variable x2 as we cross the section. In Figure 3 we provide plots of the first return map
on the attractor for varying c. We observe that the iterates appear to lie approximately along a
quadratic function, prompting many to use the logistic map

$$y_{n+1} = r y_n (1 - y_n), \qquad r \in (0, 4] \tag{4.2}$$

as a simple heuristic for understanding the dynamics of (4.1) on the attractor.


Previous attempts to fit the first return maps to simple polynomial functions were met with varying
degrees of success [21]. For parameter values c where the attractor intersects the Poincaré section
at finitely many points (non-chaotic), the SINDy method [4] is able to obtain polynomial mappings
that accurately describe these iterations. This is not surprising since we are only required to fit the polynomial to finitely many points.

Figure 2: Typical trajectories of (4.1) on the chaotic attractor for varying values of c in the (x1, x2)-plane. The black line is a projection of the Poincaré section x1 = 0 and ẋ1 > 0.

As the c value increases and the attractor becomes chaotic, the
polynomial functions become less well-suited to the first return maps of the Rössler system (4.1).
The inability for polynomial mappings to faithfully describe the chaotic return map dynamics thus
limits our ability to forecast the chaotic system, understand the statistics of the attractor, and pull
out periodic orbits [73].
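At its core, such a fit is an ordinary least-squares regression of the next iterate onto a library of monomials; the SINDy method layers sparsity-promoting thresholding on top of this. A minimal sketch for one-dimensional section data (the array x2_iterates is a stand-in for collected section iterates):

```python
import numpy as np

def fit_quadratic_map(x):
    # x: 1D array of successive section iterates x_0, x_1, ..., x_N.
    # Regress x_{n+1} onto the monomial library {1, x_n, x_n^2}.
    library = np.column_stack([np.ones_like(x[:-1]), x[:-1], x[:-1] ** 2])
    coeffs, *_ = np.linalg.lstsq(library, x[1:], rcond=None)
    return coeffs            # [constant, linear, quadratic] coefficients

# e.g. coeffs = fit_quadratic_map(x2_iterates) for section data as in Figure 3
```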
One method to obtain a function that better represents the first return map of the Rössler attractor
is to include more than just polynomials in the library of candidate functions whose span the first return map must belong to. There is an immediate limitation to this though, and
that is deciding which functions to include. Instead, we will use the network to obtain a nonlinear
change of variables which conjugates the dynamics of the first return map of the Rössler attractor
into a simple quadratic function of the form

$$g(y) = c_1 y + c_2 y^2, \tag{4.3}$$

where c1 , c2 are variables to be learned by the network. We first note that we do not include any
constant term in our mapping g since a simple linear shift of the variable y can eliminate this. There-
fore, we leave this linear shift to be learned by the network. Second, a linear stretching/compressing
of the variable y will conjugate the map (4.3) into the map

$$\tilde{g}(y) = c_1 y \left( 1 + \frac{c_2}{|c_2|}\, y \right), \tag{4.4}$$

and therefore if c2 < 0 we get that the mapping (4.3) is dynamically equivalent to the logistic map
with r = c1 . The election to use two variables, c1 and c2 , allows the network greater freedom to
fine-tune the conjugacy between the training data and the iterates of the latent map.

Figure 3: First return maps on the attractor of the Rössler system (4.1) for varying c.

                      Neural Network                       SINDy Method
Parameter    Mapping                 r-value     Mapping                          r-value
c = 9        3.6075y − 4.9044y²      3.6075      1.0219 − 2.9885y + 2.2063y²      3.6219
c = 11       3.8302y − 4.4801y²      3.8302      1.0217 − 3.4481y + 2.9242y²      3.7985
c = 13       3.8502y − 6.5346y²      3.8502      1.0075 − 3.5638y + 3.1613y²      3.8456
c = 18       3.9661y − 4.6718y²      3.9661      0.9776 − 3.5219y + 3.4089y²      3.6737

Table 1: Comparison of discovered Rössler Poincaré mappings along with the corresponding value
of r in the Logistic map (4.2) that they are conjugate to.

The results for c = 9, 11, 13 and 18 are presented in Table 1. We also provide the quadratic function
produced by the SINDy method using the same training data. The obtained quadratic SINDy
mappings are easily conjugated to the logistic map (4.2), and so we also provide the corresponding
logistic parameter r for comparison. We have the best agreement between the discovered models at
c = 9, 13, when the attractor and the LEs are relatively small. The greatest discrepancy is at c = 18,
where the attractor and the LEs are relatively large. To compare the results at c = 18, in Figure 4
we provide the one-step error between the two maps and the training data. That is, we iterate
every training data point on the attractor through the network (red) and through the SINDy model
(black) and measure the distance from the true value, as obtained through numerically integrating
the Rössler system. This relatively large error in the SINDy model means that attempts to forecast
the iterates in the section will become inaccurate quickly due to the sensitivity of the mapping
to both initial conditions and the coefficients. We illustrate this inaccuracy in the right image
of Figure 4 with a random initial condition iterated forward using the network and the SINDy
method. We also plot the true iterates as obtained through numerical integration as blue squares

Figure 4: Left: One-step error of every training data point on the attractor of the Rössler system
with c = 18. Plotted is a comparison between the discovered SINDy model (black, circles) and
neural network (red, stars). Right: The reduced step error with the neural network results in more
accurate forecasting in the Poincaré section. Compare the training data (blue, squares) with the
SINDy model forecast (black) and the neural network model forecast (red), all with the same random
initial condition on the attractor.

for comparison. We see that the SINDy model becomes inaccurate after about two steps, while the
network mapping does not become inaccurate until about seven steps. Finally, we emphasize that the time between iterates in the section is approximately 7 time units, meaning the network has forecast the
trajectory nearly 50 time units into the future.
We further highlight the utility of the conjugate mapping in the context of controlling chaos. In a
related work it was shown that the SINDy models in chaotic regions are able to locate period-one and period-two UPOs [73], but attempts to locate higher-period UPOs were prevented by numerical inaccuracy. To demonstrate the utility of our results, we report here that using the latent mapping
g we are able to locate and stabilize periodic orbits at c = 11 up to at least period six. These results
are not presented here for brevity, but these calculations can be reproduced using an accompanying
MATLAB script included in the code repository for this article. We do highlight the fact that
these orbits were stabilized using the methods of Subsection 2.4 and the derivatives were estimated
directly from numerical integrations of the full Rössler system.
Finally, we remind the reader that the trained neural networks are the result of extensive hyperparameter searches. The values of the hyperparameters for each value of c are reported in the appendix. At present we do not have any analytical results on the robustness of the discovered mappings with respect to the neural network hyperparameters, but preliminary numerical explorations have been promising. For example, taking c = 11 and varying the hyperparameters in a neighbourhood of the values reported in the appendix returns similar values of the loss function. Particularly, taking widths of at least 100 and adding or removing one or more blocks from the network
results in nearly identical results to those reported here. Therefore, we hope to provide convergence
guarantees and robustness results in a follow-up investigation.

Figure 5: Left: A typical chaotic trajectory of the Lorenz system (4.5) projected into the (x1 , x3 )-
plane. In red are the two nontrivial equilibria, and the black line represents a side view of the Poincaré section
given by x3 = 27 and ẋ3 < 0. Right: The intersection of the chaotic attractor with the Poincaré
section.

4.2 Lorenz System

Let us turn our attention now to one of the most celebrated equations in all of chaotic dynamics:
the Lorenz system. The governing equations are given by

$$\begin{aligned} \dot{x}_1 &= 10(x_2 - x_1) \\ \dot{x}_2 &= x_1(28 - x_3) - x_2 \\ \dot{x}_3 &= x_1 x_2 - \tfrac{8}{3} x_3 \end{aligned} \tag{4.5}$$
where we are using the standard parameter values that induce chaotic dynamics. We take the
Poincaré section to be x3 = 27 and ẋ3 < 0, which contains the two nontrivial equilibria at the
centres of the chaotic lobes. In Figure 5 we plot a typical chaotic trajectory of system (4.5), the
two nontrivial equilibria, and the Poincaré section in the (x1 , x3 )-plane. The classical method to
understand the Poincaré map is to follow [48], where they assume there is a change of variable
from the (x1 , x2 ) coordinate system in the section to a new set of coordinates (y1 , y2 ) for which the
Poincaré map can be written in the skew-product form

$$g(y_1, y_2) = \begin{pmatrix} g_1(y_1) \\ g_2(y_1, y_2) \end{pmatrix}. \tag{4.6}$$

Furthermore, the map is assumed to satisfy g(−y1 , −y2 ) = −g(y1 , y2 ) and has a jump discontinuity
at y1 = 0.
Despite having no exact correspondence between the Poincaré map of the Lorenz system and the
function (4.6), it has primarily been through such skew-product maps that we have formed our
modern understanding of the chaotic Lorenz system. To strengthen this connection, we attempt
to learn a conjugacy between the training data shown on the right of Figure 5 and a mapping
of the form (4.6) using our proposed neural network architecture. We have elected to attempt to
conjugate this data with a skew-product map of the form

$$\begin{aligned} g_1(y_1) &= -\mathrm{sgn}(y_1) + c_1 y_1 + c_2 |y_1|\, y_1 \\ g_2(y_1, y_2) &= d_0\, \mathrm{sgn}(y_1) + d_1 y_2, \end{aligned} \tag{4.7}$$

where sgn(·) is the function that returns 1 if the argument is positive and −1 if it is negative. The

Symbolic Sequence    Period    fsolve Iterates    fsolve Time (seconds)
LR                   1.5560    7                  0.9253
LLR                  2.3032    8                  2.8569
LLLR                 3.0208    9                  7.8822
LLRR                 3.0816    7                  7.3054
LLLLR                3.7228    8                  12.2913
LLLRR                3.8174    9                  20.1933
LLRLR                3.8667    8                  12.0158

Table 2: Symbolic sequences of periodic orbits in the discovered conjugate mapping are used to obtain periodic orbits in the Lorenz system (4.5). All symbolic sequences of length up to 5 are presented, up to symmetry, along with the period of the related UPO of the Lorenz system, the number of fsolve iterates needed to converge to the UPO, and the time taken.

specific form of g2 was proposed in [76], thus motivating the choice here. The form for g1 is taken to
be the simplest quadratic model that satisfies the constraints assumed in [48]. The goal of employing
the neural network is not only to find the coordinate transformation from (x1 , x2 ) to (y1 , y2 ), but
to obtain an invertible nonlinear change of coordinates that simplifies the right-hand-side of the
mapping as well.
Using 26631 training data points gathered from a single trajectory of system (4.5), we find a
numerical conjugacy to a mapping of the form (4.7) with

$$\begin{aligned} c_1 &= 2.5248, & d_0 &= -0.34275, \\ c_2 &= 1.6595, & d_1 &= 1.7825. \end{aligned} \tag{4.8}$$

The conjugacy to the mapping (4.7) with the coefficients (4.8) now allows one to extract the inter-
section of the UPOs of (4.5) with the Poincaré section. From the method outlined in Section 2.3,
this in turn allows one to seed initial guesses for the UPOs to be found in the continuous time flow.
For example, a period 2 point of the map (4.7) is obtained by solving

$$\begin{aligned} y_1 &= g_1(g_1(y_1)) \\ y_2 &= g_2\big(g_1(y_1),\, g_2(y_1, y_2)\big) \end{aligned} \tag{4.9}$$

for (y1 , y2 ). Indeed, such a point is mapped back to itself after exactly two iterations of (4.7). De-
noting the extracted period 2 point that solves (4.9) by (ȳ1 , ȳ2 ), we then use the discovered function
h−1 from the neural network to define x̄1 = h−1 (ȳ1 , ȳ2 ) and x̄2 = h−1 (g1 (ȳ1 ), g2 (ȳ1 , ȳ2 )), using the
notation of Section 2.3. Since periodic points are mapped into each other by the homeomorphism
h, it follows that x̄1 and x̄2 are period 2 points in the Poincaré section of (4.5), thus belonging to
a UPO that crosses the section exactly twice before completing a full period. With these points we
can then seed initial guesses for the continuous-time UPOs of (4.5) using the method outlined in
Section 2.3.
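A minimal sketch of this extraction, using the discovered coefficients (4.8) with SciPy's fsolve in place of the MATLAB routine, is given below. Note that fixed points of g also satisfy (4.9), so the initial guess should be seeded away from them, near the desired period-2 orbit.

```python
import numpy as np
from scipy.optimize import fsolve

c1, c2, d0, d1 = 2.5248, 1.6595, -0.34275, 1.7825
g1 = lambda y1: -np.sign(y1) + c1 * y1 + c2 * abs(y1) * y1
g2 = lambda y1, y2: d0 * np.sign(y1) + d1 * y2

def period2_residual(y):
    # The system (4.9): a period-2 point returns to itself after two iterates.
    y1, y2 = y
    return [g1(g1(y1)) - y1, g2(g1(y1), g2(y1, y2)) - y2]

ybar = fsolve(period2_residual, [0.25, 0.0], xtol=1e-15)
# h^{-1}(ybar) and h^{-1}(g(ybar)) then give the period-2 points in the section.
```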
To demonstrate the utility of the discovered mapping, we will follow the above outlined procedure
to obtain UPOs of (4.5). We use the built-in MATLAB function fsolve to obtain periodic points of the map (4.7) and to solve (2.7) for the desired continuous-time UPOs. All tolerances have been set to at least 10^{-15}. Our results are summarized in Table 2, where we enumerate information for all periodic orbits that have up to 5 intersections with the Poincaré section before getting back
to where they started. The ‘L’ denotes the trajectory winding around the left lobe of the attractor,
while the ‘R’ denotes the trajectory winding around the right lobe. Note that system (4.5) is

Figure 6: Left: A typical chaotic trajectory of the Gissinger system (4.10) projected into the (x̂1, x̂2)-plane. In red is the equilibrium (−1, 1, 0) and the black line represents a side view of the Poincaré section x̂1 + x̂2 = 0 with d(x̂1 + x̂2)/dt > 0. Right: The first return map of the x̂2-variable on the attractor.

equivariant with respect to the action (x1, x2, x3) ↦ (−x1, −x2, x3), meaning that a sequence of L’s
and R’s representing a periodic orbit is mapped by the symmetry to another periodic orbit obtained
by flipping the L’s to R’s and the R’s to L’s. Hence the sequence LRR, for example, is absent from
Table 2 since the corresponding UPO can be obtained from the sequence LLR and therefore has
the same period. The periods of the UPOs agree with those in [76] up to at least three significant
digits, while greater accuracy can be obtained by increasing the temporal resolution. We note that
increasing the resolution does increase the time taken for each iterate, but rarely does it increase
the number of iterates. This testifies to the accuracy of the conjugacy between the Poincaré map of
the Lorenz system and to our discovered mapping. The corresponding UPOs can be plotted using
a simple MATLAB script, included in the code repository for this article.
The skew-product form of the discovered mapping (4.7) emphasizes that almost all of the in-
formation about the attractor in the section can be understood through the dynamics of the y1
variable. This is further emphasized by examining the Lyapunov spectrum. The LEs are given
by λ1 = 0.90, λ2 = 0, and λ3 = −14.57, giving a Kaplan–Yorke dimension of 2.06, and implying
that a one-dimensional map should be able to accurately describe the dynamics. The projection of
the attractor of our discovered mapping g(y1 , y2 ) onto the first component gives a bijection since
the periodic orbits of g1 (y1 ) lie in one-to-one correspondence with the periodic orbits of g(y1 , y2 )
which densely fill the attractor. With these facts as motivation, we are thus able to use the network
to obtain a conjugacy between the Lorenz section data and the mapping g1 (y1 ), providing for a
successful dimensionality reduction of the dynamics on the attractor.

4.3 Gissinger’s System

As an application of the dimensionality reduction capabilities of the neural network, we present another simple three-dimensional system known to exhibit chaotic dynamics. First presented by Gissinger [77], we will consider the model

$$\begin{aligned} \dot{x}_1 &= \mu x_1 - x_2 x_3 \\ \dot{x}_2 &= -\nu x_2 + x_1 x_3 \\ \dot{x}_3 &= \Gamma - x_3 + x_1 x_2 \end{aligned} \tag{4.10}$$

Gissinger showed that by varying the parameters (µ, ν, Γ) ∈ R³ one moves from simple periodic motion to chaotic motion, most notably through a period-doubling cascade. The Poincaré map used

in the original analysis is slightly tedious; therefore, to simplify its definition we will shift and re-scale
the (x1 , x2 , x3 ) variables appropriately. Applying the change of variable

$$\hat{x}_1 = x_1/\eta_1, \qquad \hat{x}_2 = x_2/\eta_2, \qquad \hat{x}_3 = (x_3 - \Gamma)/\eta_3, \tag{4.11}$$

where $\eta_1 = \sqrt{\nu + \Gamma\sqrt{\nu/\mu}}$, $\eta_2 = \sqrt{\mu + \Gamma\sqrt{\mu/\nu}}$, and $\eta_3 = -\sqrt{\mu\nu} - \Gamma$, we move one equilibrium to the origin and scale the remaining two to be (±1, ∓1, 1). Then, Gissinger's Poincaré section becomes x̂1 + x̂2 = 0 with d(x̂1 + x̂2)/dt > 0 in the hatted variables. The reader should note that (4.11) is an
invertible change of variable and therefore is a conjugacy itself. This means that the dynamics in
the Poincaré section of the hatted variables lie in one-to-one correspondence with the dynamics in
Gissinger’s section for (4.10) by simply applying a linear change of variable.
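These scalings are easily verified numerically. The short check below (a sketch, using the parameter values fixed in the next paragraph) confirms that the nontrivial equilibria of (4.10) are placed at (±1, ∓1, 1):

```python
import numpy as np

mu, nu, Gamma = 0.12, 0.1, 0.85
eta1 = np.sqrt(nu + Gamma * np.sqrt(nu / mu))
eta2 = np.sqrt(mu + Gamma * np.sqrt(mu / nu))
eta3 = -np.sqrt(mu * nu) - Gamma

x3 = -np.sqrt(mu * nu)                      # x3-value at the nontrivial equilibria
x1 = eta1                                   # one equilibrium branch
x2 = x1 * x3 / nu
F = [mu * x1 - x2 * x3, -nu * x2 + x1 * x3, Gamma - x3 + x1 * x2]
assert np.allclose(F, 0)                    # (x1, x2, x3) is an equilibrium of (4.10)
print(x1 / eta1, x2 / eta2, (x3 - Gamma) / eta3)   # -> 1.0, -1.0, 1.0
```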
We will focus on the parameter values µ = 0.12, ν = 0.1, and Γ = 0.85, with the left panel of
Figure 6 presenting a typical chaotic trajectory at these parameter values. The section data is
given in the variables (x̂2 , x̂3 ). The right panel of Figure 6 presents the next iterate of x̂2 in the
section against the current iterate. This data gives the appearance that the iterates of x̂2 in the
section are almost entirely independent of x̂3 . The next iterate of x̂3 in the section against the
current iterate looks similar but is not provided here for brevity. We have calculated the LEs to
two decimal places to be λ1 = 0.07, λ2 = 0, and λ3 = −1.05. This gives a Kaplan–Yorke dimension
of 2.07, and from the discussion above we would expect the dynamics in the section to be conjugate
to a one-dimensional mapping.
We employ the neural network to discover a conjugacy of the Gissinger Poincaré section data with
a one-dimensional homogeneous cubic map. Using the specifications described in the appendix we
arrive at a conjugacy with the mapping

$$g(y) = 8.5344\, y - 18.2999\, y^2 + 9.8172\, y^3. \tag{4.12}$$

The election to conjugate to a cubic map is motivated by the appearance of the training data in
Figure 6. In much the same way that the Rössler system gives the appearance that its section
data is governed by a quadratic map, the Gissinger system gives the appearance that its section
is governed by a cubic map. The difference is that the section data is governed by two variables,
(x̂2 , x̂3 ), and so attempting to discover a mapping governing these iterates would either require a
two-dimensional mapping or losing information about the potential slight influence the variables
have upon each other. Hence, the advantage of using the network to discover conjugacies is that it can automate the task of dimensionality reduction while also providing a nonlinear change of variable to
fit to a simple model. Moreover, the simplicity of the mapping (4.12) lends itself well to extracting
UPOs and mapping them back to UPOs in the Gissinger system (4.10). To provide evidence for
this fact, the accompanying supplementary material contains a MATLAB script to stabilize UPOs
that intersect the Poincaré section once, twice, three times, and four times.
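For example, the fixed point of (4.12), which seeds the lowest-period UPO, is available directly as a root of a cubic polynomial; a small sketch (the roots here happen to be all real):

```python
import numpy as np

# Fixed points of (4.12) solve g(y) - y = 0; take the real roots in [0, 1],
# where the latent training data lives after scaling.
roots = np.roots([9.8172, -18.2999, 8.5344 - 1.0, 0.0]).real
fixed_points = roots[(roots > 0) & (roots < 1)]   # -> approximately 0.614
```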
Prior to concluding this subsection we comment that additional explorations of the Gissinger system
(4.10) were undertaken to verify the consistency of our results. This includes working with the
original variables (x1 , x2 , x3 ) instead of introducing the change of variable (4.11). In this case the
coefficients of the discovered conjugate cubic mapping were within 4.5% of those of (4.12). One way to compare the conjugacies trained on the Poincaré section data with and without the change of variable (4.11) is through the location of the fixed point (corresponding to the lowest-period UPO) that each gives in the original Gissinger
model (4.10). To do this we solve g(y) = y for the respective conjugacy mapping and then map
the result back to the Gissinger system using the associated inverse of the conjugacy, h−1 . The
resulting location from the network trained on the original Gissinger variables is within 0.3% of
the location from the neural network trained on the hatted variables (4.11). Similar conjugacies
to mappings nearly identical to (4.12) have also been established when using different Poincaré
sections.
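For reference, the fixed points of (4.12) used in this comparison can be obtained with a few lines of Python. Mapping a root back to the Gissinger section requires the trained decoder; h_inverse below is only a hypothetical stand-in for that network.

```python
import numpy as np

# Fixed points of the cubic map (4.12) solve g(y) = y, i.e.
# 9.8172 y^3 - 18.2999 y^2 + 7.5344 y = 0.
roots = np.roots([9.8172, -18.2999, 8.5344 - 1.0, 0.0])
print(np.sort(roots.real))  # y ≈ 0, 0.614, 1.250

# Hypothetical: push a fixed point back to the Poincaré section with
# the trained decoder h^{-1} (not reproduced here).
# x_section = h_inverse(y_star)
```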

5 Infinite Dimensional Systems

Having demonstrated the utility of the neural network on low-dimensional chaotic systems, we now turn to more complex examples. In particular, we will focus on the Kuramoto–
Sivashinsky PDE and the Mackey–Glass equation. In the case of the former we demonstrate how
we can use high-dimensional approximations of the system coming from Galerkin projections to
identify the low-dimensional dynamics on the attractor. From these low-dimensional dynamics
we can then extract the intersections of the UPOs with the Poincaré section and implement the
methods of Subsection 2.3 to obtain the UPOs in the continuous-time system. In the case of the
Mackey–Glass equation we show how two different Poincaré maps are conjugate to the same simple
quadratic mapping and therefore can be used to provide evidence for a long-standing conjecture
surrounding the infinite-dimensional system.

5.1 Kuramoto–Sivashinsky Equation

Here we will consider the Kuramoto–Sivashinsky equation (KSE), given by the PDE

ut + νuξξξξ + uξξ + uuξ = 0, (5.1)

where u = u(ξ, t) is a function of space ξ ∈ [−π, π] and time t ≥ 0. Following [78, 79], we restrict
ourselves to the flow-invariant set of odd periodic functions u(ξ, t). We perform the Galerkin/Fourier projection onto N ≥ 1 spatial modes with time-varying coefficients

u(ξ, t) = Σ_{k=1}^{N} xk(t) sin(kξ). (5.2)

This leads to an N -dimensional coupled system of differential equations for the time-dependent
coefficients, given by
ẋk = k²(1 − νk²)xk + (k/2) Σ_{i=1}^{N−k} xi xk+i − (k/4) Σ_{j=1}^{k−1} xj xk−j,   k = 1, . . . , N. (5.3)

In what follows we will fix N = 14 and generate training data with initial conditions xk (0) drawn
from the uniform distribution on [0, 0.1]. The top left image in Figure 7 presents a typical chaotic
trajectory of (5.3) with ν = 0.0298 projected into the (x1 , x10 )-plane.
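A sketch of how such trajectories might be generated is given below; the solver, tolerances, and final time are illustrative choices rather than the settings used for the reported results.

```python
import numpy as np
from scipy.integrate import solve_ivp

# N-mode Galerkin truncation (5.3) of the KSE.
N, nu = 14, 0.0298

def kse_galerkin(t, x):
    dx = np.empty(N)
    for k in range(1, N + 1):
        lin = k**2 * (1.0 - nu * k**2) * x[k - 1]
        fwd = sum(x[i - 1] * x[k + i - 1] for i in range(1, N - k + 1))
        bwd = sum(x[j - 1] * x[k - j - 1] for j in range(1, k))
        dx[k - 1] = lin + 0.5 * k * fwd - 0.25 * k * bwd
    return dx

rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 0.1, N)  # xk(0) drawn uniformly from [0, 0.1]
sol = solve_ivp(kse_galerkin, [0, 500], x0, max_step=1e-2)
```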
Previous numerical investigations of the KSE using the system (5.3) have shown that a period-doubling sequence into chaos takes place as ν descends towards ∼0.0297. Furthermore, the works [78, 79] calculate that this period-doubling sequence obeys the Feigenbaum scaling, a universal feature of one-dimensional maps with a quadratic extremum. Notably, these works draw frequent comparisons to the logistic map (4.2), but lack an analytical backing for such a correspondence. Here we will follow [80] and define a Poincaré section by the half-plane x1 = 0 and ẋ1 > 0 to produce a 13-dimensional collection of discrete section data. With the neural network architecture described above, we are now in a position to obtain a conjugacy between this 13-dimensional section data and the logistic map.

Figure 7: The chaotic attractor (black) of (5.1) with ν = 0.0298 projected into the (x1, x10)-plane along with identified UPOs (blue). Top row: R and LR orbits. Bottom row: LLR, LRR, and LLLR.
To illustrate our results we will take ν = 0.0298, which is near the end of the period-doubling
cascade and represents a region of chaotic dynamics in the system. The three largest LEs are
λ1 = 0.74, λ2 = 0, and λ3 = −5.92, thus giving a Kaplan–Yorke dimension of 2.13. This provides
further justification for attempting to obtain a conjugacy with a one-dimensional mapping. We
find a conjugacy with the quadratic mapping

g(y) = 3.9653y − 3.9153y², (5.4)

and comment that applications of the method with the addition of cubic and/or quartic terms to
the mapping show little improvement over the obtained quadratic model. From the discussion on
the Rössler attractor, the discovered latent map (5.4) generates equivalent dynamics to the logistic
map (4.2) with r = 3.9653. As expected from the numerical results, we find that the chosen value
ν = 0.0298 is near the culmination of the period-doubling cascade and firmly in a chaotic parameter
region.
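The claimed correspondence is an exact linear conjugacy: with g(y) = ay − by² and h(x) = (a/b)x one has g(h(x)) = h(f(x)) for the logistic map f with r = a, which the following snippet verifies numerically.

```python
import numpy as np

# Check the linear conjugacy between the discovered map (5.4) and the
# logistic map with r = a under the change of variable y = (a / b) x.
a, b = 3.9653, 3.9153
g = lambda y: a * y - b * y**2
f = lambda x: a * x * (1.0 - x)
h = lambda x: (a / b) * x

x = np.linspace(0.0, 1.0, 11)
print(np.max(np.abs(g(h(x)) - h(f(x)))))  # zero up to round-off
```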
The utility of the discovered mapping (5.4) is that it provides a low-dimensional model that can be used to understand the chaotic attractor in the infinite-dimensional system (5.1). To
demonstrate this utility, we extract periodic orbits from (5.4) and relate them to the 13-dimensional
Poincaré section via the neural-network conjugacy. These section points are then used to seed initial
guesses to produce UPOs which form the skeleton of the chaotic attractor in the same way that
was done for the Lorenz system above. Furthermore, we search for the UPOs in system (5.3) with
N = 28, double the number of variables used for the training data, to improve accuracy. The
additional 14 variables are simply initialized as zero functions, which helps to show that our 14-mode truncation indeed captures much of the dynamical structure of the chaotic system.
Symbolic Sequence Period fsolve Iterates fsolve Time (seconds)
L 0.8630 7 4.9900
R 0.8567 6 6.0526
LR 1.7356 9 37.5210
LLR 2.6242 10 96.2644
LRR 2.6031 8 61.9304
LLLR 3.5167 10 167.6217
LLRR 3.5166 9 124.7951
LRRR 3.4623 10 159.5762
LLRLR 4.3651 11 441.0552
LLRRR 4.3539 12 306.9639
LRRLR 4.3424 11 236.0808
LRRRR 4.3247 9 210.8622

Table 3: Symbolic sequences of periodic orbits in the discovered conjugate mapping are used to obtain
periodic orbits in the KSE (5.1). All symbolic sequences that are present in the discovered mapping
up to length 5 are given, along with the same information provided in table 2.

We label the
identified UPOs using the symbols L and R, meaning that the iterate of the map (5.4) is either to
the left or the right of its global maximum, respectively. Our results are summarized in table 3,
where we emphasize how few iterates of the solver are required to produce a UPO solution to
(5.3). We provide data on all periodic orbits with sequence length up to five and comment that
the sequences LLLLR and LLLRR are notably absent since such periodic orbits are not present in
(5.4). Figure 7 presents visual demonstrations of some of these identified UPOs projected onto the
(x1 , x10 )-plane.
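One simple way to harvest such symbolic sequences directly from the latent map is to solve gⁿ(y) = y from many seeds and label the resulting orbit points; the sketch below assumes every fsolve call converges (a check one should add in practice).

```python
import numpy as np
from scipy.optimize import fsolve

# Period-n points of the latent map (5.4), labelled L/R relative to
# the global maximum at y* = a / (2 b).
a, b = 3.9653, 3.9153
g = lambda y: a * y - b * y**2
ystar = a / (2.0 * b)

def gn(y, n):
    # n-fold composition of g.
    for _ in range(n):
        y = g(y)
    return y

n, words = 3, set()
for seed in np.linspace(0.05, 1.0, 200):
    y = float(fsolve(lambda z: gn(z, n) - z, seed)[0])
    orbit = [y]
    for _ in range(n - 1):
        orbit.append(g(orbit[-1]))
    word = "".join("L" if p < ystar else "R" for p in orbit)
    words.add(min(word[i:] + word[:i] for i in range(n)))  # rotation-invariant label

# Discard words coming from fixed points (LLL, RRR) of the composed map.
print(sorted(w for w in words if len(set(w)) > 1))  # expect ['LLR', 'LRR']
```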
As ν > 0 is lowered beyond the culmination of the period-doubling cascade, the system (5.1)
exhibits a variety of exotic behaviour that is not fully understood [78, 79]. In particular, the
chaotic attractor becomes increasingly complicated, indicated by larger LEs and an increasing
Kaplan–Yorke dimension. The further the Kaplan–Yorke dimension of the attractor is above 2,
the more difficulty we have in understanding its topology. Therefore, the choice for the form of
the latent map g becomes less clear since we cannot easily relate the Poincaré section dynamics to
well-studied universal maps.
Taking N = 14 and ν = 0.021, the first four LEs of the KSE (5.3) are given by

λ1 = 1.75, λ2 = 0, λ3 = −1.65, λ4 = −4.85, (5.5)

giving a Kaplan–Yorke dimension of 3.02. In the top left of Figure 8 we provide a typical chaotic
trajectory of the system at ν = 0.021. Based on the size of the Kaplan–Yorke dimension, we seek a
latent mapping that is two-dimensional and conjugate to the 13-dimensional Poincaré section data.
The network is able to identify a conjugacy with the quadratic function

g(y1, y2) = ( 0.1009 + 1.5589y1 + 0.5601y2 − 0.2534y1² − 2.4928y1y2 + 0.4457y2² ,
              0.6545 + 0.4098y1 − 1.2802y2 + 0.2022y1² + 0.6164y1y2 − 0.3460y2² ). (5.6)

We again demonstrate the utility of this mapping by using it to identify UPOs in the chaotic flow of
(5.3). We use a 28-mode truncation to identify UPOs with the final 14 modes initialized identically
to zero. Figure 8 shows plots of three UPOs projected into the (x1, x10)-plane; the presented UPOs intersect the Poincaré section at exactly one, two, or three points.

Figure 8: The chaotic attractor (black) of (5.1) with ν = 0.0210 projected into the (x1, x10)-plane along with identified UPOs (blue). Top right: A UPO that intersects the Poincaré section exactly once. Bottom: UPOs that intersect the Poincaré section exactly twice and three times, respectively.
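Since (5.6) is an explicit mapping, its attractor can be explored by direct iteration; in the sketch below the seed point is an arbitrary choice that we assume lies in the basin of attraction, so in practice one may need to try several seeds.

```python
import numpy as np

# Iterate the discovered two-dimensional latent map (5.6).
def g(y):
    y1, y2 = y
    return np.array([
        0.1009 + 1.5589 * y1 + 0.5601 * y2
        - 0.2534 * y1**2 - 2.4928 * y1 * y2 + 0.4457 * y2**2,
        0.6545 + 0.4098 * y1 - 1.2802 * y2
        + 0.2022 * y1**2 + 0.6164 * y1 * y2 - 0.3460 * y2**2,
    ])

y, traj = np.array([0.5, 0.5]), []
for i in range(5000):
    y = g(y)
    if i >= 100:  # discard the transient
        traj.append(y)
traj = np.array(traj)  # a scatter plot of traj approximates the latent attractor
```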

5.2 Mackey–Glass Equation

Let us now consider the Mackey–Glass delay differential equation [81]

ẋ = 2xτ/(1 + xτ^p) − x, (5.7)
where xτ = x(t − τ ) is the value of the function τ > 0 time units in the past. We will fix τ = 2
throughout this investigation. Then, increasing p from approximately 7 results in a period-doubling
cascade into chaos [81]. On the left of Figure 9 we present typical chaotic trajectories with p = 9.65
in both the (x(t), x(t − 2))-plane and the (x(t), ẋ(t))-plane. This sequence of period-doubling
bifurcations has led to conjectures on whether there is a correspondence between the well-known
sequence of bifurcations leading to chaos in quadratic maps, such as the logistic equation (4.2), and
the infinite-dimensional Mackey–Glass equation (5.7). This remains an open problem that we will
provide numerically-assisted evidence for.
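Because the delay τ = 2 can be made an exact multiple of the time step, (5.7) is straightforward to simulate with a history buffer; the following is a minimal sketch in which the step size, final time, and forward-Euler scheme are our own illustrative choices.

```python
import numpy as np

# Forward Euler for the Mackey-Glass equation (5.7) with tau = 2 and
# p = 9.65; dt divides tau exactly, so x(t - tau) is a stored grid node.
tau, p, dt, T = 2.0, 9.65, 1e-3, 500.0
lag, n = int(round(tau / dt)), int(round(T / dt))

x = np.empty(lag + n + 1)
x[: lag + 1] = 0.5  # constant history x(t) = 0.5 on [-tau, 0]

for i in range(lag, lag + n):
    x_tau = x[i - lag]  # the delayed value x(t - tau)
    x[i + 1] = x[i] + dt * (2.0 * x_tau / (1.0 + x_tau**p) - x[i])

# Local minima of x(t) below 0.8 approximate the second Poincaré
# section used below (xdot = 0 with xddot > 0).
mid = x[lag + 1 : -1]
mins = mid[(mid < x[lag:-2]) & (mid < x[lag + 2 :]) & (mid < 0.8)]
```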
There are two different Poincaré sections that are used to analyze the infinite-dimensional dynamics
of (5.7). They are given by [82]

(1) x(t) = x(t − 2), s.t. ẋ(t) − ẋ(t − 2) ≥ 0, x(t) < 0.96,
(2) ẋ(t) = 0, s.t. ẍ(t) > 0, x(t) < 0.8. (5.8)

These sections are illustrated in Figure 9, along with their respective first return maps. To illustrate
how correspondences between these two distinct Poincaré maps and the logistic map (4.2) can be
found using the network, we fix p = 9.65 and show that both maps correspond to the same value of the logistic map parameter, denoted r. Indeed, at p = 9.65 the network produces conjugacies

Figure 9: Left: A typical chaotic trajectory of the Mackey–Glass delay equation (5.7) with p = 9.65
projected into the (top) (x(t), x(t − 2))-plane and into the (bottom) (x(t), ẋ(t))-plane. The black line
represents the Poincaré section (top) x(t) = x(t − 2) and (bottom) ẋ(t) = 0. Right: The first return
maps of the x-variable on the attractor for the respective Poincaré sections shown on the left.

with the quadratic maps

(1) g1(y) = 3.8390y − 3.9016y²
(2) g2(y) = 3.8321y − 3.8976y², (5.9)

where (1) and (2) denote the choices of Poincaré maps in (5.8). From the discussion on the Rössler
system, the maps are equivalent to the logistic map with r = 3.8390 and r = 3.8321, respectively.

6 Discussion

Through a number of illustrative examples, we have seen how the proposed deep learning architecture can be used to understand and classify chaotic behaviour by learning paired coordinates and dynamics that realize topological conjugacies. Indeed, these examples have demonstrated that
by using the network to learn conjugate mappings we can improve forecasting of chaotic systems,
achieve a dimensionality reduction from the observed variables, and obtain actionable and in-
terpretable mappings. These aspects of the method were highlighted with three low-dimensional
chaotic systems: the Rössler, Lorenz, and Gissinger systems. Then, with an improved intuition and
understanding of the method, we applied it to two infinite-dimensional systems: the Kuramoto–
Sivashinsky PDE and the Mackey–Glass delay-differential equation. It was evident from the analysis
of the Kuramoto–Sivashinsky PDE that this method can go a long way towards better understand-
ing spatio-temporal chaos. It therefore becomes appealing to apply these methods to fluid systems,
where the dimensionality reduction can particularly help to understand low-dimensional turbulent
behaviour in extremely high-dimensional systems. Results in this direction will be reported in a

follow-up investigation.
We emphasize that our methods are comparable to other data-driven discovery algorithms in that
the goal is to obtain an explicit dynamical system by expanding an unknown function in a library
of candidate functions. Hence, the discovery process amounts to tuning the coefficients of the
linear span to which the discovered mapping is assumed to belong. The advantage of using the
autoencoder structure proposed in our work is that finding the change of variable, h, and tuning the
coefficients in the linear span are done simultaneously during the neural network training process.
Therefore, the neural network is given the opportunity to obtain a change of variable that transforms
the coordinates so that they can be best fit to the (potentially) limited library provided by the
user. This was on display with the Rössler, Kuramoto–Sivashinsky, and Mackey–Glass equations, where our libraries contained only degree one and two monomials, and the Gissinger system, where we used only degree one, two, and three monomials. These examples are guided by intuition of
what the mapping should be, but the process described above is best demonstrated on the Lorenz
system. From years of previous work we have a good idea of what the Lorenz Poincaré mapping
should look like in terms of symmetries and a jump discontinuity, but there is no indication that
it should be (piecewise) polynomial. Our work shows that the network can transform the variables
in such a way that they fit a piecewise quadratic map whose coefficients are tuned simultaneously.
The result is a mapping guided by analytical rigour that is simple enough to gain specific insight
from.
Mappings that are conjugate to each other generate equivalent dynamics, thus forming an equivalence relation on the set of all continuous surjections of a topological space to itself. Hence, with the
neural network we are able to classify and categorize distinct kinds of chaotic systems. For example,
the Rössler, Kuramoto–Sivashinsky, and Mackey–Glass equations were all shown to be conjugate
to the logistic map in certain parameter regions. Therefore, their chaotic attractors can be un-
derstood as a process of stretching and folding, as was made famous by Smale’s horseshoe [83].
This lies in contrast to the chaotic switching between lobes observed in the Lorenz system. Beyond
one-dimensional mappings lies a world of under-explored and little-understood chaotic dynamics.
Therefore, the work initiated in this manuscript can help to classify the topology of chaotic attrac-
tors, while also providing simple mappings that can be analyzed with this goal in mind. In this
way, we will be able to move toward better understanding not just chaotic, but hyperchaotic (more
than one positive Lyapunov exponent) behaviour. This also would include the study of periodically
driven and Hamiltonian systems, which were notably absent from our discussion. In the case of
Hamiltonian systems, the discovered mappings must necessarily be measure-preserving. This prop-
erty requires the targeted mapping to be at least two-dimensional and possess a specific structure,
thus presenting an additional challenge to applying the method. This is an obvious nontrivial extension that we hope to report on in the future.
Finally, the ability to obtain conjugacies of Poincaré mappings with actionable and interpretable
nonlinear functions provides avenues for future computer-assisted analytical undertakings of chaotic
systems. For instance, a computer-assisted proof first showed that the Lorenz system is indeed chaotic
using the Poincaré map [84]. A more recent example of such an undertaking has coupled Poincaré
mappings and computer-assisted proofs to show that the Kuramoto–Sivashinsky PDE is chaotic [85].
Therefore, we conjecture that by using the neural network architecture of this manuscript one will
be able to reach similar conclusions. Following similar computer-assisted proof methods, one could
seek a desired threshold for the loss function of the neural network to prove that an exact conjugacy
exists, followed by then proving that the simple conjugate mapping is chaotic. Since chaos is a
topological invariant, it would then follow that the original system is also chaotic. If such a method
of computer-assisted proof is possible, we suspect that it would considerably ease the difficulty of

proving a system is chaotic.

Acknowledgements

J.J.B. thanks Joseph Bakarji for his help with learning TensorFlow 2. SLB and JNK acknowledge
funding support from the Air Force Office of Scientific Research (AFOSR FA9550-19-1-0386). SLB
acknowledges support from the Army Research Office (ARO W911NF-19-1-0045). JNK acknowl-
edges support from the Air Force Office of Scientific Research (FA9550-19-1-0011).

Data Availability

The data that support the findings of this study can be generated using the scripts in the repository
github.com/jbramburger/Deep-Conjugacies.

References
[1] O.E. Rössler. An equation for continuous chaos. Physics Letters A, 57(5):397–398, 1976.
[2] M. Hénon. A two-dimensional mapping with a strange attractor. Communications in Mathematical
Physics, 50(1):69–77, 1976.
[3] Michael Schmidt and Hod Lipson. Distilling free-form natural laws from experimental data. Science,
324(5923):81–85, 2009.
[4] Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Discovering governing equations from
data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of
Sciences, 113(15):3932–3937, 2016.
[5] Alejandro Carderera, Sebastian Pokutta, Christof Schütte, and Martin Weiser. Cindy: Condi-
tional gradient-based identification of non-linear dynamics–noise-robust recovery. arXiv preprint
arXiv:2101.02630, 2021.
[6] K. Champion, B. Lusch, J. N. Kutz, and S. L. Brunton. Data-driven discovery of coordinates and
governing equations. Proceedings of the National Academy of Sciences, 116(45):22445–22451, 2019.
[7] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part i):
Data-driven solutions of nonlinear partial differential equations, 2017.
[8] Maziar Raissi, Paris Perdikaris, and George Em Karniadakis. Physics informed deep learning (part ii):
Data-driven discovery of nonlinear partial differential equations, 2017.
[9] Samuel H. Rudy, Steven L. Brunton, Joshua L. Proctor, and J. Nathan Kutz. Data-driven discovery of
partial differential equations. Science Advances, 3(4), 2017.
[10] H. Schaeffer. Learning partial differential equations via data discovery and sparse optimization. Proc.
Roy. Soc. A, 473(2197):20160446, 2017.
[11] Chen Yao and Erik M. Bollt. Modeling and nonlinear parameter estimation with Kronecker product
representation for coupled oscillators and spatiotemporal systems. Physica D: Nonlinear Phenomena,
227(1):78–99, 2007.
[12] Steven L Brunton, Marko Budišić, Eurika Kaiser, and J Nathan Kutz. Modern Koopman theory for
dynamical systems. arXiv preprint arXiv:2102.12086, 2021.
[13] Jason J. Bramburger, Daniel Dylewsky, and J. Nathan Kutz. Sparse identification of slow timescale
dynamics. Phys. Rev. E, 102:022204, Aug 2020.
[14] K. P. Champion, S. L. Brunton, and J. N. Kutz. Discovery of nonlinear multiscale systems: Sampling
strategies and embeddings. SIAM Journal on Applied Dynamical Systems, 18(1):312–333, 2019.
[15] Daniel E Shea, Steven L Brunton, and Nathan Kutz. SINDy-BVP: Sparse identification of nonlinear
dynamics for boundary value problems, 2020.
[16] Eurika Kaiser, J Nathan Kutz, and Steven L Brunton. Discovering conservation laws from data for
control. In 2018 IEEE Conference on Decision and Control (CDC), pages 6415–6421. IEEE, 2018.

[17] K. Kaheman, J. N. Kutz, and S. L. Brunton. SINDy-PI: a robust algorithm for parallel implicit sparse
identification of nonlinear dynamics. Proc. Roy. Soc. A, 476(2242):20200279, 2020.
[18] J.-C. Loiseau and S. L. Brunton. Constrained sparse Galerkin regression. Journal of Fluid Mechanics,
838:42–67, 2018.
[19] Lorenzo Boninsegna, Feliks Nüske, and Cecilia Clementi. Sparse learning of stochastic dynamical equa-
tions. The Journal of Chemical Physics, 148(24):241723, 2018.
[20] Jared L Callaham, Jean-Christophe Loiseau, Georgios Rigas, and Steven L Brunton. Nonlinear stochas-
tic modeling with Langevin regression. arXiv preprint arXiv:2009.01006, 2020.
[21] Jason J. Bramburger and J. Nathan Kutz. Poincaré maps for multiscale physics discovery and nonlinear
floquet theory. Physica D: Nonlinear Phenomena, 408:132479, 2020.
[22] Bethany Lusch, J Nathan Kutz, and Steven L Brunton. Deep learning for universal linear embeddings
of nonlinear dynamics. Nature communications, 9(1):1–10, 2018.
[23] Craig Gin, Bethany Lusch, Steven L Brunton, and J Nathan Kutz. Deep learning models for global
coordinate transformations that linearize PDEs. arXiv preprint arXiv:1911.02710, 2019.
[24] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio. Deep learning. MIT press Cambridge, 2016.
[25] J. Pathak, B. Hunt, M. Girvan, Z. Lu, and E. Ott. Model-free prediction of large spatiotemporally
chaotic systems from data: a reservoir computing approach. Phys. Rev. Lett., 120(2):024102, 2018.
[26] Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi,
Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational
inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.
[27] Christoph Wehmeyer and Frank Noé. Time-lagged autoencoders: Deep learning of slow collective
variables for molecular kinetics. The Journal of Chemical Physics, 148(241703):1–9, 2018.
[28] Andreas Mardt, Luca Pasquali, Hao Wu, and Frank Noé. VAMPnets: Deep learning of molecular
kinetics. Nature Communications, 9(5), 2018.
[29] Lu Lu, Xuhui Meng, Zhiping Mao, and George E Karniadakis. DeepXDE: A deep learning library for
solving differential equations. arXiv preprint arXiv:1907.04502, 2019.
[30] Y. Bar-Sinai, S. Hoyer, J. Hickey, and M. P. Brenner. Learning data-driven discretizations for partial
differential equations. Proceedings of the National Academy of Sciences, 116(31):15344–15349, 2019.
[31] Miles D Cranmer, Rui Xu, Peter Battaglia, and Shirley Ho. Learning symbolic physics with graph
networks. arXiv preprint arXiv:1909.05862, 2019.
[32] M Raissi, P Perdikaris, and GE Karniadakis. Physics-informed neural networks: A deep learning
framework for solving forward and inverse problems involving nonlinear partial differential equations.
Journal of Computational Physics, 378:686–707, 2019.
[33] Steven L Brunton and J Nathan Kutz. Data-driven science and engineering: Machine learning, dynam-
ical systems, and control. Cambridge University Press, 2019.
[34] Karthik Duraisamy, Gianluca Iaccarino, and Heng Xiao. Turbulence modeling in the age of data. Annual
Reviews of Fluid Mechanics, 51:357–377, 2019.
[35] Frank Noé, Simon Olsson, Jonas Köhler, and Hao Wu. Boltzmann generators: Sampling equilibrium
states of many-body systems with deep learning. Science, 365(6457):eaaw1147, 2019.
[36] Steven L. Brunton, Bernd R. Noack, and Petros Koumoutsakos. Machine learning for fluid mechanics.
Annual Review of Fluid Mechanics, 52:477–508, 2020.
[37] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew
Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations.
arXiv preprint arXiv:2010.08895, 2020.
[38] Maziar Raissi, Alireza Yazdani, and George Em Karniadakis. Hidden fluid mechanics: Learning velocity
and pressure fields from flow visualizations. Science, 367(6481):1026–1030, 2020.
[39] Miles Cranmer, Sam Greydanus, Stephan Hoyer, Peter Battaglia, David Spergel, and Shirley Ho. La-
grangian neural networks. arXiv preprint arXiv:2003.04630, 2020.
[40] Kookjin Lee and Kevin T Carlberg. Model reduction of dynamical systems on nonlinear manifolds using
deep convolutional autoencoders. Journal of Computational Physics, 404:108973, 2020.
[41] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew
Stuart, and Anima Anandkumar. Multipole graph neural operator for parametric partial differential
equations. arXiv preprint arXiv:2006.09535, 2020.
[42] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew
Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equa-
tions. arXiv preprint arXiv:2003.03485, 2020.

[43] Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar,
Dominic Skinner, and Ali Ramadhan. Universal differential equations for scientific machine learning.
arXiv preprint arXiv:2001.04385, 2020.
[44] Dmitrii Kochkov, Jamie A Smith, Ayya Alieva, Qing Wang, Michael P Brenner, and Stephan Hoyer.
Machine learning accelerated computational fluid dynamics. arXiv preprint arXiv:2102.01010, 2021.
[45] R. González-Garcı́a, R. Rico-Martı́nez, and I.G. Kevrekidis. Identification of distributed parameter
systems: A neural net based approach. Computers & Chemical Engineering, 22:S965–S968, 1998.
European Symposium on Computer Aided Process Engineering-8.
[46] Tom Bertalan, Felix Dietrich, Igor Mezić, and Ioannis G Kevrekidis. On learning Hamiltonian systems
from data. Chaos: An Interdisciplinary Journal of Nonlinear Science, 29(12):121107, 2019.
[47] Ramiro Rico-Martinez and Ioannis G Kevrekidis. Continuous time modeling of nonlinear systems: A
neural network-based approach. In IEEE International Conference on Neural Networks, pages 1522–
1525. IEEE, 1993.
[48] J. Guckenheimer and P. Holmes. Nonlinear Oscillations, Dynamical Systems, and Bifurcations of Vector
Fields. Applied Mathematical Sciences. Springer New York, 2002.
[49] Joseph D Skufca and Erik M Bollt. A concept of homeomorphic defect for defining mostly conjugate
dynamical systems. Chaos: An Interdisciplinary Journal of Nonlinear Science, 18(1):013118, 2008.
[50] Joseph D Skufca and Erik M Bollt. Relaxing conjugacy to fit modeling in dynamical systems. Physical
Review E, 76(2):026220, 2007.
[51] Paul Frederickson, James L Kaplan, Ellen D Yorke, and James A Yorke. The Liapunov dimension of
strange attractors. Journal of Differential Equations, 49(2):185–207, 1983.
[52] James L Kaplan and James A Yorke. Chaotic behavior of multidimensional difference equations. In
Functional Differential equations and approximation of fixed points, pages 204–227. Springer, 1979.
[53] Nikolay Kuznetsov and Volker Reitmann. Attractor dimension estimates for dynamical systems: theory
and computation. Springer, 2020.
[54] R Artuso, E Aurell, and P Cvitanovic. Recycling of strange sets: I. Cycle expansions. Nonlinearity, 3(2):325–359, May 1990.
[55] Ditza Auerbach, Predrag Cvitanović, Jean-Pierre Eckmann, Gemunu Gunaratne, and Itamar Procaccia.
Exploring chaotic motion through periodic orbits. Phys. Rev. Lett., 58:2387–2389, Jun 1987.
[56] P. Cvitanović. Invariant measurement of strange sets in terms of cycles. Phys. Rev. Lett., 61:2729–2732,
1988.
[57] Paul So, Edward Ott, Steven J. Schiff, Daniel T. Kaplan, Tim Sauer, and Celso Grebogi. Detecting
unstable periodic orbits in chaotic experimental data. Phys. Rev. Lett., 76:4705–4708, Jun 1996.
[58] E. Ott, C. Grebogi, and J. A. Yorke. Controlling chaos. Phys. Rev. Lett., 64:1196–1199, 1990.
[59] N. B. Budanur, K. Y. Short, M. Farazmand, A. P. Willis, and P. Cvitanovic. Relative periodic orbits
form the backbone of turbulent pipe flow. Journal of Fluid Mechanics, 833:274–301, 2017.
[60] P Cvitanović and J F Gibson. Geometry of the turbulence in wall-bounded shear flows: periodic orbits.
Physica Scripta, T142:014007, Dec 2010.
[61] L Fazendeiro, Bruce M Boghosian, Peter V Coveney, and Jonas Lätt. Unstable periodic orbits in weak
turbulence. Journal of Computational Science, 1(1):13–23, 2010.
[62] Valter Franceschini and Claudio Tebaldi. Sequences of infinite bifurcations and turbulence in a five-mode
truncation of the Navier-Stokes equations. Journal of Statistical Physics, 21(6):707–726, 1979.
[63] Dan Lucas and Colmcille Patrick Caulfield. Irreversible mixing by unstable periodic orbits in buoyancy
dominated stratified turbulence. arXiv preprint arXiv:1706.02536, 2017.
[64] Valery Petrov, Michael F Schatz, Kurt A Muehlner, Stephen J VanHook, WD McCormick, JB Swift, and
Harry L Swinney. Nonlinear control of remote unstable states in a liquid bridge convection experiment.
Physical review letters, 77(18):3779, 1996.
[65] Bruno Eckhardt, Tobias M Schneider, Bjorn Hof, and Jerry Westerweel. Turbulence transition in pipe
flow. Annu. Rev. Fluid Mech., 39:447–468, 2007.
[66] Balachandra Suri, Logan Kageorge, Roman O Grigoriev, and Michael F Schatz. Capturing turbu-
lent dynamics and statistics in experiments with unstable periodic orbits. Physical Review Letters,
125(6):064501, 2020.
[67] Michael D Graham and Daniel Floryan. Exact coherent states and the nonlinear dynamics of wall-
bounded turbulent flows. Annual Review of Fluid Mechanics, 53, 2021.
[68] Gökhan Yalnız, Björn Hof, and Nazmi Burak Budanur. Coarse graining the state space of a turbulent
flow using periodic orbits. arXiv preprint arXiv:2007.02584, 2020.

[69] Igor Mezić. Spectral properties of dynamical systems, model reduction and decompositions. Nonlinear
Dynamics, 41(1):309–325, 2005.
[70] Igor Mezić. Spectrum of the Koopman operator, spectral expansions in functional spaces, and state-space
geometry. Journal of Nonlinear Science, pages 1–55, 2019.
[71] Björn Engquist. Encyclopedia of Applied and Computational Mathematics. Springer Publishing Com-
pany, Incorporated, 1st edition, 2016.
[72] Filipe J. Romeiras, Celso Grebogi, Edward Ott, and W.P. Dayawansa. Controlling chaotic dynamical
systems. Physica D: Nonlinear Phenomena, 58(1):165–192, 1992.
[73] Jason J Bramburger, J Nathan Kutz, and Steven L Brunton. Data-driven stabilization of periodic
orbits. IEEE Access, 9:43504–43521, 2021.
[74] Alan Wolf, Jack B. Swift, Harry L. Swinney, and John A. Vastano. Determining Lyapunov exponents
from a time series. Physica D: Nonlinear Phenomena, 16(3):285 – 317, 1985.
[75] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, and Sepp Hochreiter. Self-normalizing neural
networks. In NIPS, pages 972–981, 2017.
[76] Divakar Viswanath. Symbolic dynamics and periodic orbits of the Lorenz attractor. Nonlinearity,
16(3):1035–1056, Apr 2003.
[77] C. Gissinger. A new deterministic model for chaotic reversals. The European Physical Journal B,
85(4):137, 2012.
[78] Y S Smyrlis and D T Papageorgiou. Predicting chaos for infinite dimensional dynamical systems:
the Kuramoto-Sivashinsky equation, a case study. Proceedings of the National Academy of Sciences,
88(24):11129–11132, 1991.
[79] Demetrios T. Papageorgiou and Yiorgos S. Smyrlis. The route to chaos for the Kuramoto-Sivashinsky
equation. Theoretical and Computational Fluid Dynamics, 3(1):15–42, 1991.
[80] Yoshitaka Saiki, Michio Yamada, Abraham C.-L. Chian, Rodrigo A. Miranda, and Erico L. Rempel.
Reconstruction of chaotic saddles by classification of unstable periodic orbits: Kuramoto-Sivashinsky
equation. Chaos: An Interdisciplinary Journal of Nonlinear Science, 25(10):103123, 2015.
[81] Leon Glass and Michael C. Mackey. Pathological conditions resulting from instabilities in physiological
control systems. Annals of the New York Academy of Sciences, 316(1):214–235, 1979.
[82] L. Glass and M. Mackey. Mackey-Glass equation. Scholarpedia, 5(3):6908, 2010. revision #186443.
[83] S. Smale. Differentiable dynamical systems. Bull. Amer. Math. Soc., 73(6):747–817, 1967.
[84] Z Galias and P Zgliczyński. Computer assisted proof of chaos in the Lorenz equations. Physica D:
Nonlinear Phenomena, 115(3-4):165–188, 1998.
[85] D. Wilczak and P. Zgliczyński. A geometric method for infinite-dimensional chaos: Symbolic dynamics
for the Kuramoto-Sivashinsky PDE on the line. J. Differ. Equ., 269(10):8509–8548, 2020.

A Model Information

Here we report the specifications used to produce the models described throughout the manuscript. The first table collects the initial conditions used to generate the training data, as well as listing the location of the Poincaré section. The second table lists the network parameters, found through parameter searches, used to obtain the models. 'Layers In' refers to the number of network layers used to build h and 'Layers Out' to the number used to build h−1. In every application these two values are set to be the same, but this is not necessary.

System Name              Reference   Initial Conditions          Section Location                Iterates

Rössler                  (4.1)       (0, −10, 0) for c = 9;      x1 = 0, ẋ1 > 0                  9877 (c = 9); 9845 (c = 11);
                                     (0, −15, 0) otherwise                                       9788 (c = 13); 9738 (c = 18)
Lorenz                   (4.5)       (2, 0, 27)                  x3 = 27, ẋ3 < 0                 26631
Gissinger                (4.10)      (−1.5, 1.5, 1.3)            x̂1 + x̂2 = 0, x̂˙1 + x̂˙2 > 0       2615
Kuramoto–Sivashinsky     (5.1)       Random from [0, 0.1]^14     x1 = 0, ẋ1 > 0                  5728
Mackey–Glass             (5.7)       x(t) = 0.5, t ∈ [−τ, 0]     (1) x(t) = x(t − τ);            (1) 1739; (2) 1739
                                                                 (2) ẋ = 0 ∧ ẍ > 0

System Name                  Layer Width   Layers In   Layers Out   Steps   Learning Rate

Rössler (c = 9)              100           1           1            2       5 × 10^−3
Rössler (c = 11)             80            1           1            2       5 × 10^−3
Rössler (c = 13)             80            1           1            2       5 × 10^−3
Rössler (c = 18)             100           1           1            2       5 × 10^−3
Lorenz                       200           3           3            2       1 × 10^−3
Gissinger                    100           2           2            1       1 × 10^−3
Kuramoto–Sivashinsky (1D)    200           4           4            1       1 × 10^−4
Kuramoto–Sivashinsky (2D)    200           4           4            1       1 × 10^−4
Mackey–Glass (1)             200           4           4            2       5 × 10^−4
Mackey–Glass (2)             300           4           4            2       1 × 10^−4
