
The Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI-21)

Learning Dynamics Models with Stable Invariant Sets


Naoya Takeishi 1,2, Yoshinobu Kawahara 3,1
1 RIKEN Center for Advanced Intelligence Project
2 University of Applied Sciences and Arts Western Switzerland
3 Kyushu University
[email protected], [email protected]

Abstract

Invariance and stability are essential notions in dynamical systems study, and thus it is of great interest to learn a dynamics model with a stable invariant set. However, existing methods can only handle the stability of an equilibrium. In this paper, we propose a method to ensure that a dynamics model has a stable invariant set of general classes such as limit cycles and line attractors. We start with the approach by Manek and Kolter (2019), where they use a learnable Lyapunov function to make a model stable with regard to an equilibrium. We generalize it for general sets by introducing projection onto them. To resolve the difficulty of specifying a to-be stable invariant set analytically, we propose defining such a set as a primitive shape (e.g., sphere) in a latent space and learning the transformation between the original and latent spaces. It enables us to compute the projection easily, and at the same time, we can maintain the model's flexibility using various invertible neural networks for the transformation. We present experimental results that show the validity of the proposed method and its usefulness for long-term prediction.

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

1 Introduction

Machine learning of dynamical systems appears in diverse disciplines, such as physics (Raissi, Perdikaris, and Karniadakis 2019), biology (Costello and Martin 2018), chemistry (Li, Kermode, and De Vita 2015), and engineering (Morton et al. 2018). Recent progress in dynamics models includes the Gaussian process dynamics models (Wang, Fleet, and Hertzmann 2006) and models based on deep neural networks (e.g., Takeishi, Kawahara, and Yairi 2017; Lusch, Kutz, and Brunton 2018; Chen et al. 2018; Manek and Kolter 2019; Greydanus, Dzamba, and Yosinski 2019). Whatever models are employed, we often would like to know and control the nature of a learned dynamics model, for example, to reflect prior knowledge of the dynamics and to ensure specific behavior of the learned model.

Invariance and stability of some subsets of a state space play a key role in dynamical systems study as they concisely describe the asymptotic behavior (i.e., in t → ∞) of a system. For example, dynamical systems with stable invariant sets are used to explain neurological functions such as working memory and eye control (Eliasmith 2005). Moreover, various self-sustained oscillations in physical, chemical, or biological phenomena are modeled as systems with stable closed orbits (Strogatz 2015). In dynamical systems study, analyzing a system's stability has been a classical yet challenging problem. In contrast, synthesizing (i.e., achieving) stability of dynamics models is a problem that has been addressed mainly in automatic control and machine learning. Whereas control theory has been trying to achieve stable systems by designing control inputs, another possible strategy is to learn a dynamics model from data with a constraint that the model attains some desired stability property.

In this work, we tackle the problem of learning dynamics models with guaranteed invariance and stability. This problem is important in many practices of machine learning. For example, we often have prior knowledge that a target phenomenon shows self-sustained oscillations (Strogatz 2015). Such prior knowledge is a powerful inductive bias and can be incorporated into learning by forcing a model to have a stable limit cycle. Likewise, we often want to assure the invariance and stability (i.e., time-asymptotic behavior) of a learned forecasting model for meaningful prediction or safety issues. To these ends, we need a method to guarantee invariance and stability of a dynamics model.

Learning dynamics models with provable stability is not a new task. Learning linear dynamical systems with stability (e.g., Lacy and Bernstein 2003; Siddiqi, Boots, and Gordon 2008; Huang et al. 2016) is a long-standing problem, and learning stable nonlinear dynamics has also been addressed by many researchers (e.g., Khansari-Zadeh and Billard 2011; Neumann, Lemme, and Steil 2013; Umlauft and Hirche 2017; Duncker et al. 2019; Chang, Roohi, and Gao 2019; Manek and Kolter 2019; Massaroli et al. 2020). However, these methods can only handle the stability of a finite number of equilibria (i.e., points where a state remains if no external perturbation applies) and are not suitable for guaranteeing general stable invariant sets (e.g., limit cycles, limit tori, and continuous sets of infinitely many equilibria). This limitation has been hindering useful applications of machine learning on dynamical systems, for example in physics and biology.

We develop a dynamics model with provable stability of general invariant sets. The starting point of our model is the approach by Manek and Kolter (2019), where they use a learnable Lyapunov function to modify a base dynamics model to ensure the stability of an equilibrium. We generalize

it for handling general subsets of state space (e.g., closed orbits, surfaces, and volumes) as stable invariant sets, by introducing projection onto such sets in the definition of the learnable Lyapunov function. A practical difficulty arising here is that, in general, we cannot specify the geometry of a to-be stable invariant set analytically. To resolve this difficulty, we propose defining such a set as a primitive shape (e.g., a sphere) in a latent space and learning the transformation between the original state space and the latent space (see Figure 1). We can configure such a primitive shape so that the projection is easily computed. At the same time, we can maintain the flexibility of the model using the rich machinery of invertible neural networks that have been actively studied recently (see, e.g., Papamakarios et al. 2019).

In the remainder, we review the technical background in Section 2. We give a general definition of the proposed dynamics model in Section 3 and then show its concrete constructions in Section 4. We introduce some related studies in Section 5. We present experimental results in Section 6, which confirm the validity of the proposed method and its usefulness for long-term prediction. The paper is concluded in Section 7.

2 Background

2.1 Invariant Sets of Dynamical Systems

We primarily consider a continuous-time dynamical system described by an ordinary differential equation

ẋ = f(x),   (1)

where x ∈ X ⊂ R^d is a state vector in a state space X, and ẋ denotes the time derivative, dx/dt. We assume that f : X → X is a locally Lipschitz function. We denote the solution of (1) with initial condition x(0) = x0 as x(t). An invariant set of a dynamical system is defined as follows:

Definition 1 (Invariant set). An invariant set S of dynamical system (1) is a subset of X such that a trajectory x(t) starting from x0 ∈ S remains in S, i.e., x(t) ∈ S for all t ≥ 0.

2.2 Stability of Equilibrium

A state xe s.t. f(xe) = 0 is called an equilibrium of (1) and constitutes a particular class of invariant sets. One of the common interests in analyzing dynamical systems is the Lyapunov stability of equilibria (e.g., Hirsch, Smale, and Devaney 2003; Giesl and Hafstein 2015). Informally, an equilibrium xe is stable if the trajectories starting near xe remain around it for all time. More formally:

Definition 2 (Stability of equilibrium). An equilibrium xe is said to be Lyapunov stable if for every ε > 0, there exists δ > 0 such that, if ‖x(0) − xe‖ < δ, then ‖x(t) − xe‖ < ε for all t ≥ 0. Moreover, if xe is stable, and x(t) → xe as t → ∞, xe is said to be asymptotically stable.

The (asymptotic) stability of equilibria plays a crucial role in analyzing dynamical systems as well as in applications. For example, in computational neuroscience, dynamical systems with stable equilibria are used to explain phenomena such as associative memory and pattern completion. In physics, coupled phase oscillators whose equilibria are stable are used to model synchronization phenomena. In engineering, equilibrium stability is used for roughly assessing the safety of controlled agents and the plausibility of forecasting.

Lyapunov's direct method is a well-known way to assess the stability of equilibria (see, e.g., Hirsch, Smale, and Devaney 2003), which can be summarized as follows:

Theorem 1 (Lyapunov's direct method). Let xe be an equilibrium of dynamical system (1). Let V : U → R be a function on a neighborhood U of xe, and further suppose:

(A) V has a minimum at xe; e.g., a sufficient condition is: (∀x ∈ U, V(x) ≥ 0) ∧ (V(x) = 0 ⇔ x = xe).

(B) V is strictly decreasing along trajectories of (1); e.g., when V is differentiable, a sufficient condition is: ∀x ∈ U \ {xe}, V̇ = dV/dt = ⟨∇V(x), f(x)⟩ < 0.

If such a function V exists, then it is called a Lyapunov function, and xe is asymptotically stable.

2.3 Dynamics Models with Stable Equilibrium

Manek and Kolter (2019) proposed a concise method to ensure the stability of an equilibrium of a dynamics model by construction. They suggested learning a function V that satisfies condition (A) in Theorem 1 with neural networks and projecting outputs of a base dynamics model onto a space where condition (B) also holds. Consequently, the modified model's equilibrium becomes asymptotically stable.

Their dynamics model, ẋ = f(x), is built as

f(x) = f̂(x) − (β(x)/‖∇V(x)‖₂²) ∇V(x),   if β(x) ≥ 0,
f(x) = f̂(x),   otherwise,   (2)

where β(x) = ∇V(x)ᵀ f̂(x) + αV(x).

Here, f̂ : R^d → R^d is a base dynamics model, and α ≥ 0 is a nonnegative constant. Function V : R^d → R works as a Lyapunov (candidate) function. V is designed so that it has a global minimum at the equilibrium and no local minima:

V(x) = σ(q(x) − q(xe)) + ε‖x − xe‖₂²,   (3)

where ε > 0 is a positive constant to ensure the positivity of V, and σ : R → R≥0 = [0, ∞) is a convex nondecreasing function with σ(0) = 0. Function q : R^d → R also needs to be convex, and they use the input convex neural networks (Amos, Xu, and Kolter 2017) for q.

3 Proposed Method

3.1 Stability of General Invariant Set

We begin by reviewing the theory around stability of general invariant sets, which comprises the theoretical backbone of the proposed method. First, stability of a general invariant set is formally defined as follows:

Definition 3 (Stability of invariant set). Let S ⊂ X be a positively invariant set of dynamical system (1), and let dist(x, S) = inf_{s ∈ S} ‖x − s‖ denote the distance between x and S. S is said to be stable if for every ε > 0, there exists δ > 0 such that, if dist(x(0), S) < δ, then dist(x(t), S) < ε for all t ≥ 0. Moreover, if S is stable, and dist(x(t), S) → 0 as t → ∞, S is said to be asymptotically stable.
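The projection step of Manek and Kolter (2019) in (2) is easy to sketch numerically. The following minimal NumPy illustration uses a stand-in quadratic Lyapunov candidate and a stand-in linear base model (not the learned networks of the paper), assuming the equilibrium is at the origin:

```python
import numpy as np

def stable_dynamics(x, f_hat, V, grad_V, alpha=0.1):
    """Projection of eq. (2): whenever beta(x) = <grad V(x), f_hat(x)>
    + alpha * V(x) is nonnegative, subtract the component of f_hat
    along grad V that violates the decrease condition (B).
    Assumes grad_V(x) != 0, which holds outside the equilibrium."""
    g = grad_V(x)
    beta = g @ f_hat(x) + alpha * V(x)
    if beta >= 0:
        return f_hat(x) - (beta / (g @ g)) * g
    return f_hat(x)

# Stand-in Lyapunov candidate with its minimum at the origin,
# and a base model that would be unstable on its own.
V = lambda x: x @ x
grad_V = lambda x: 2.0 * x
f_hat = lambda x: 0.5 * x

x = np.array([1.0, -2.0])
f = stable_dynamics(x, f_hat, V, grad_V)
# Along the modified field, <grad V(x), f> = -alpha * V(x) < 0,
# so V strictly decreases outside the equilibrium.
```

When the base model already satisfies the decrease condition (β(x) < 0), the projection leaves it untouched, so a well-behaved f̂ is not distorted.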

Stable invariant sets appear in various forms in a variety of dynamics. For example, a closed invariant orbit is called a limit cycle, and if it is asymptotically stable, nearby trajectories approach it as t → ∞. Such stable limit cycles play a key role in understanding the behavior of various physical and biological phenomena with oscillation (Strogatz 2015). Moreover, invariant sets comprising infinitely many equilibria are often considered in analyzing higher-order coupled oscillators in physics (Tanaka and Aoyagi 2011) and continuous attractor networks in neuroscience (Eliasmith 2005).

LaSalle's theorem characterizes the asymptotic stability of a general invariant set (see, e.g., Khalil 2002):

Theorem 2 (LaSalle's theorem). Let Ω ⊂ D ⊂ R^d be a compact set that is positively invariant for the dynamical system (1). Let V : D → R be a differentiable function such that V̇(x) ≤ 0 in Ω. Let E ⊂ Ω be the set of all points in Ω such that V̇(x) = 0. Let S ⊂ E be the largest invariant set in E. Then, every solution of (1) starting from a point in Ω approaches S as t → ∞.

Figure 1: Proposed dynamics model, ẋ = f(x), for the case where the latent stable invariant set S̃ is defined as S̃^surf in (7b). Two states, x ∉ S (blue) and x ∈ S (red), are shown. The dotted orbits are the original and latent stable invariant sets, S ⊂ X and S̃ ⊂ Z. Input x ∈ X is first transformed into a latent state z ∈ Z by a learnable bijective function φ and then fed into a base dynamics model h. h(z) is modified to ensure the stability and/or invariance of S̃ by (6) and/or (8), respectively. Finally, things are projected back to X by φ⁻¹.

3.2 Dynamics Models with Stable Invariant Set

We give a general framework to construct a dynamics model with a general stable invariant set. This framework can be implemented with any parametric function approximators, such as neural networks, as its components. We provide concrete examples of implementation later in Section 4.

The proposed dynamics model, ẋ = f(x), is depicted in Figure 1. It comprises five steps. Given a state vector x as an input, it first computes a transformed latent state z = φ(x) via a learnable bijective function φ (Step 1). Latent state z is fed into a base dynamics model h (Step 2). Then, h(z) may be modified to g(z) to ensure stability of some set S̃ (Step 3). g(z) may further be modified to f̃(z) to ensure invariance of S̃ (Step 4). Finally, it computes f(x) = φ⁻¹(f̃(φ(x))) via the inverse of φ (Step 5). In the following, we explain the details of the five steps.

Step 1: Learnable Invertible Feature Transform. Given a state vector x ∈ X ⊂ R^d as an input, we transform it into a latent state z ∈ Z ⊂ R^d using a learnable bijective function φ : X → Z, that is,

z = φ(x).   (4)

We restrict φ to be bijective for provable existence of a stable invariant set. A bijective function φ can be modeled in a number of ways given the development of invertible neural networks in the area of normalizing flows (see, e.g., Papamakarios et al. 2019, and references therein). Hence, we believe that restricting φ to be bijective does not severely limit the flexibility of the proposed dynamics model.

Such a feature transform, together with its inverse in Step 5, is indispensable when we cannot exactly parametrize the geometry of a to-be stable invariant set (in X) that a learned dynamics model should have. This is often the case in practice; for example, we may only know the presence of a limit cycle but cannot describe its shape analytically in advance. Our proposal lies in avoiding such a difficulty by defining a set that has a primitive shape (e.g., a unit sphere) in Z, expecting (the inverse of) φ to learn an appropriate transformation of the primitive shape in Z into a to-be stable invariant set in the original space, X. Hereafter, a to-be stable invariant set in X and the corresponding primitive set in Z are denoted by S ⊂ X and S̃ ⊂ Z, respectively.

Step 2: Base Dynamics Model. The second component is a base dynamics model h : Z → Z that acts on the latent state z. We can use any parametric model as h. Note that we do not have control over the invariance and stability properties of the base dynamics model ż = h(z). Hence, we need to modify the output of h to make S̃ a stable invariant set.

Step 3: Ensuring Stability. In this step, we modify the output of the base dynamics model, h, so that z's trajectories converge to some limit set S̃ ⊂ Z as t → ∞. According to Theorem 2, for trajectories to converge to S̃, it is sufficient that there exists a function V : Z → R whose value decreases along trajectories everywhere outside S̃. To this end, we construct a candidate function for V by generalizing the method of Manek and Kolter (2019).

Suppose S̃ ⊂ Z is convex. This assumption does not make us lose much generality because even if S̃ is convex, the corresponding S ⊂ X is not necessarily convex, thanks to the feature transform φ in Step 1 and Step 5. Let PS̃ z denote the orthogonal projection of z onto S̃, that is, PS̃ z = arg min_{s ∈ S̃} ‖z − s‖₂². Let q : Z → R be a convex function, and σ : R → R≥0 be a convex nonnegative nondecreasing function with σ(0) = 0. We define a function V as

V(z) = σ(q(z) − q(PS̃ z)) + ε‖z − PS̃ z‖₂²,   (5)

with ε > 0. It reaches the minimum V(z) = 0 at z ∈ S̃ and

does not have any local minima at z ∉ S̃ by construction. Given such a function (5), we modify the outputs of the base dynamics model, h(z), into g(z) as follows:

g(z) = h(z),   if z ∈ S̃,
g(z) = h(z) − u(β(z)) ((β(z) + η(z))/‖∇V(z)‖₂²) ∇V(z),   if z ∉ S̃,   (6)

where β(z) = ∇V(z)ᵀ h(z) + αV(z).

Here, u is the unit step function (i.e., u(a) = 1 if a ≥ 0 and u(a) = 0 otherwise), and α ≥ 0 is a nonnegative constant. η : R^d → R≥0 is a nonnegative function that works like a slack variable. Merely setting η(z) = 0 also ensures the stability of S̃, but it may be useful to make η a learnable component if we want more flexibility. Note that (5) and (6) do not ensure anything about the positive invariance of S̃; that is deferred to Step 4.

Comparing (5) and (6) to (3) and (2), we can see that Step 3 here is indeed a generalized version of the method of Manek and Kolter (2019). We note that such a generalization is meaningful only with the other components of the proposed method; we need the learnable feature transform of Step 1 and Step 5 to avoid the difficulty of parametrizing a stable invariant set analytically, and the procedure of Step 4 is indispensable to ensure that trajectories do not escape from a limit set. Step 3 does not work without these remedies.

We may compute PS̃ in closed form when S̃ is a simple-shaped set like a sphere or a 2-torus. Such a simple S̃ does not severely drop the flexibility of a dynamics model if we set φ to be flexible enough. Meanwhile, we can also adopt S̃ with nontrivial PS̃ if needed, by employing the technique of the convex optimization layer (Agrawal et al. 2019).

Step 4: Ensuring Invariance. Recall that the previous step only ensures S̃ is a limit set. Even if trajectories converge to S̃ as t → ∞, they may escape unless it is also invariant. To make S̃ invariant, we further modify the output of g.

Without loss of generality, we consider the following two types of definition of S̃:

S̃^vol = {z | CS̃(z) ≥ 0},   (7a)
S̃^surf = {z | CS̃(z) = 0},   (7b)

where CS̃ : Z → R is a continuously differentiable function. The invariance of such sets can be characterized as follows:

Proposition 1. For a dynamical system ż = F(z) with some F : Z → Z,
(a) If CS̃(z) = 0 ⇒ ∇CS̃(z)ᵀ F(z) > 0, then S̃^vol in (7a) is a positively invariant set.
(b) If CS̃(z) = 0 ⇒ ∇CS̃(z)ᵀ F(z) = 0, then S̃^surf in (7b) is a positively invariant set.

Proof. Let z(t) be a trajectory of the dynamical system ż = F(z). Let c : R → R be a function such that c(τ) = CS̃(z(τ)). First, let us consider case (a). For a proof by contradiction, assume that CS̃(z) = 0 ⇒ ∇CS̃(z)ᵀ F(z) > 0 holds but S̃ = S̃^vol is not a positively invariant set (i.e., z(t) ∈ S̃ and z(s) ∉ S̃ for some t ≤ s). Then, from the definition of S̃^vol in (7a), we have c(t) ≥ 0 and c(s) < 0. By the continuity of c, there is at least one point r ∈ [t, s] where c(r) = 0 and ċ(r) ≤ 0. At this point, CS̃(z(r)) = 0 and ∇CS̃(z(r))ᵀ F(z(r)) = ċ(r) ≤ 0, which contradicts what we assumed. Therefore, (a) holds.

Second, let us see (b). Analogously to the above case, assume that CS̃(z) = 0 ⇒ ∇CS̃(z)ᵀ F(z) = 0 holds but S̃ = S̃^surf is not a positively invariant set. Then, from the definition of S̃^surf in (7b), we have c(t) = 0 and c(s) ≠ 0. Hence, we have c(r) = 0 and ċ(r) ≠ 0 at some point r ∈ [t, s], which is a contradiction. Therefore, (b) holds.

Given this fact, we modify the outputs of the previous step, g(z), into f̃(z) as follows:

f̃(z) = g(z) − ((γ(z) − ξ(z))/‖∇CS̃(z)‖₂²) ∇CS̃(z),   if CS̃(z) = 0,
f̃(z) = g(z),   if CS̃(z) ≠ 0,   (8)

where γ(z) = ∇CS̃(z)ᵀ g(z).

The definition of ξ depends on that of S̃; if S̃ is defined as S̃^vol in (7a), ξ : Z → R>0 is a positive-valued function; if S̃ is defined as S̃^surf in (7b), it is simply ξ(z) = 0. Note that in actual computation, the condition CS̃(z) = 0 in (8) should be replaced by |CS̃(z)| ≤ ε with a tiny ε.

Step 5: Projecting Back. Things have been described in terms of the latent state z ∈ Z after Step 1. However, what we want is a dynamics model on x ∈ X, namely ẋ = f(x). As the final part of the proposed method, we project things back to X via the inverse of φ, that is,

f(x) = φ⁻¹(f̃(z)) = φ⁻¹(f̃(φ(x))).   (9)

Recall that we assumed φ is invertible in Step 1.

3.3 Analysis

The dynamics model in (9) has a stable invariant set that can be learned from data. We summarize such a property as follows, where we describe the cases of ·^vol and ·^surf in parallel.

Proposition 2. Let S̃^vol (or S̃^surf) be a subset of Z ⊂ R^d defined in (7a) (or (7b)). Let f̃ : Z → Z be the function in (8). Then, for a dynamical system ż = f̃(z), S̃^vol (or S̃^surf) is a positively invariant set and is asymptotically stable.

Proof. Let us consider the case of S̃^vol (the discussion holds analogously for S̃^surf). Recall that, from the definition, CS̃(z) = 0 implies z ∈ S̃. Hence, from (8), if CS̃(z) = 0, then ∇CS̃(z)ᵀ f̃(z) = ξ(z) > 0, which proves the invariance of S̃ (Proposition 1). As for stability, we should show

V̇(z) = 0, if and only if z ∈ S̃,
V̇(z) < 0, otherwise.   (10)

Suppose z ∈ S̃. We have V(z) = 0 for every z ∈ S̃ from the construction, and the orbits of f̃ stay in S̃ because S̃ is a

positively invariant set. Hence, V̇(z) = 0 for z ∈ S̃. On the other hand, suppose z ∉ S̃. Then, we have

∇V(z)ᵀ f̃(z) + αV(z)
= ∇V(z)ᵀ g(z) + αV(z)
= ∇V(z)ᵀ h(z) + αV(z) − u(∇V(z)ᵀ h(z) + αV(z)) · (∇V(z)ᵀ h(z) + αV(z) + η(z))
= β(z) − u(β(z)) (β(z) + η(z)).

First, suppose β(z) ≥ 0. Then u(β(z)) = 1, and thus ∇V(z)ᵀ f̃(z) + αV(z) = −η(z) ≤ 0. Second, suppose β(z) < 0. Then u(β(z)) = 0, and thus ∇V(z)ᵀ f̃(z) + αV(z) = β(z) < 0. As V(z) > 0 at z ∉ S̃ from the construction of V, in either of the cases above, we have ∇V(z)ᵀ f̃(z) ≤ −αV(z) < 0. Hence, V̇(z) < 0 for all z ∉ S̃. This proves (10), from which we can say that S̃ is the largest subset of the state space such that V̇(z) = 0. Therefore, from Theorem 2, S̃ is asymptotically stable.

Corollary 1. Consider a subset of X, namely S^vol = {x | CS̃(φ(x)) ≥ 0} (or S^surf = {x | CS̃(φ(x)) = 0}), where CS̃ is the function that appeared in (7). Let f = φ⁻¹ ∘ f̃ ∘ φ as in (9). Then, S^vol (or S^surf) is an asymptotically stable invariant set of the dynamical system defined as ẋ = f(x).

3.4 Extension

The proposed dynamics model in Section 3.2 works not only as a standalone machine learning model, but also as a module embedded in a larger machine learning method. For example, suppose we have high-dimensional observations y ∈ Y (e.g., observations of fluid flow). In such a case, we often try to transform y into lower-dimensional vectors, namely x, using methods like principal component analysis and autoencoders. We can then consider a dynamics model (with a stable invariant set) on x as in Section 3.2, rather than on y directly. Temporal forecasting is performed in the space of x and then returned to the space of y.

Such an extension is straightforward yet useful, but a drawback is that if the dimensionality reduction is lossy, which is often the case, we can no longer guarantee a stable invariant set in Y. Nonetheless, such an approximative model may still be useful. We exemplify such a case in Section 6.4, where we reduce the dimensionality of fluid flow observations by principal component analysis and learn a dynamics model on the low-dimensional space.

4 Implementation Examples

4.1 Learnable Components

The components of the proposed method, namely φ, h, q, η, and ξ, can be any parametric models such as neural networks. Let us introduce examples for each component.

φ: The choice of φ's model depends on the availability of prior knowledge of the dynamics to be learned (see also Table 1). For example, if we know the topology of S (e.g., it is a closed orbit), we can model φ as a diffeomorphic function such as the neural ODE (NODE) (Chen et al. 2018) and coupling flows (see, e.g., Teshima et al. 2020). In fact, such prior knowledge is often available from our scientific understanding of physical, chemical, and biological phenomena (see, e.g., Strogatz 2015; Tanaka and Aoyagi 2011; Eliasmith 2005). We may use other types of invertible models if less prior knowledge is available; for example, a neural ODE with auxiliary variables (namely, ANODE) (Dupont, Doucet, and Teh 2019) can represent non-homeomorphic functions. If perfect prior knowledge is available (i.e., we can specify the geometry of S ⊂ X analytically), simply set z = φ(x) = x.

h: We can substitute arbitrary models for the base dynamics h, in accordance with the nature of the data.

q: The convex function q in (5) can be modeled using the input-convex neural networks (Amos, Xu, and Kolter 2017) as in the previous work (Manek and Kolter 2019).

η and ξ: The slack-like functions η and ξ in (6) and (8), respectively, can be modeled as neural networks with output values clipped to be nonnegative or positive.

4.2 Stable Invariant Set

Besides the learnable components, we should prepare a to-be stable invariant set. If we do not know the analytic form of S ⊂ X (i.e., CS), which is usually the case, we define S̃ ⊂ Z (i.e., CS̃) instead. A general guideline we suggest is to set S̃ as a simple primitive shape, such as a sphere or torus. As stated earlier, setting S̃ to be a primitive shape does not severely restrict the flexibility of the model, thanks to the learnable feature transform φ, which "deforms" a simple S̃ into various S. We can regard unknown coefficients in CS̃ (e.g., the radius of a sphere) as learnable parameters, too.

We can also consider the case of low-dimensional S by setting S̃ to be low-dimensional as well. Here, axes of Z ignored by a low-dimensional S̃ can be arbitrary because the feature transform φ modeled by a neural network is usually flexible enough to learn a rotation between Z and X.

Care may have to be taken in the computation of PS̃. If we do not know a closed form of PS̃ z, as long as S̃ is convex as we assumed, we can use the differentiable convex optimization layer (Agrawal et al. 2019) to allow gradient-based optimization. For example, suppose we have S̃ of the type of (7a). Then, PS̃ is an optimization problem:

PS̃ z = arg min_s ‖z − s‖₂²  s.t.  CS̃(s) ≥ 0.   (11)

The derivative of its output (i.e., ∂PS̃ z/∂z) can be computed by the techniques of Agrawal et al. (2019) via the implicit function theorem on the optimality condition of (11).

Examples. Let us provide concrete examples of the configuration of φ and CS̃ (also summarized in Table 1).

Example 1. If we exactly know that S is a sphere around the origin, we can set φ to be the identity function, i.e., z = φ(x) = x, and set CS̃(z) = ‖z‖² − r². In this case, PS̃ z = PS x = rx/‖x‖ for x ≠ 0 (and arbitrary for x = 0). Radius r may or may not be a learnable parameter.

Example 2. If we know the dynamics should have a stable limit cycle, we can set S̃ to be a circle along a pair of axes of Z, expecting φ to learn an appropriate coordinate transform

φ(x) CS̃ (z)
Steil 2013; Manek and Kolter 2019; Tuor, Drgona, and Vra-
bie 2020; Massaroli et al. 2020). However, they only handle
Perfect knowledge available identity S̃ = S the stability of finite number of equilibria.
(i.e., exact S ⊂ X is known) (cf. Example 1)
We note that, in parallel to the current work, Urain et al.
Partial knowledge available (2020) developed a method for learning a dynamics model
(i.e., rough behavior of phenomenon is known)
as an invertible transform from a primitive model for which
e.g., self-sustained oscillations NODE Example 2
asymptotic behavior is specified. Their method requires to
e.g., quasiperiodic patterns NODE Example 3
e.g., neural integrators (A)NODE Example 4 specify a particular primitive model with desired stability
property. In contrast, our method is more general while it
might need more meticulous attention in model configuration.
Table 1: Implementation examples in accordance with avail-
ability of prior knowledge. NODE (Chen et al. 2018) and Learning Stabilizing Controllers Another related direc-
ANODE (Dupont, Doucet, and Teh 2019) are mentioned tion is to learn a controller that stabilizes a given dynamical
here, but other invertible neural nets are applicable, too. For system. For example, Chang, Roohi, and Gao (2019) pro-
concrete examples of each case of “partial knowledge,” e.g., posed a method to learn neural controllers by constructing a
Strogatz (2015) and Eliasmith (2005) are informative. neural Lyapunov function simultaneously. They adopt a self-
supervised learning scheme where a neural controller and a
neural Lyapunov function are trained so that the violation of
the stability condition (in Theorem 1) is minimized. Such an
to adjust the axes to those in the original space. For example,
approach is also applicable to dynamics learning, but existing
CS̃ (z) = z12 + z22 − r2 (and ignore zi ’s for i > 2).
methods only focus on discrete equilibria, too.
Example
Example 3. We may set S̃ to be a 2-torus, C_S̃(z) = (√(z₁² + z₂²) − R)² + z₃² − r², onto which the orthogonal projection P_S̃ z can be computed analytically.
Example 4. Another common option is a hyperplane C_S̃(z) = cᵀz − b. This is useful in modeling, for example, sets of infinitely many equilibria, which often appear in computational neuroscience (Eliasmith 2005).
Example 5. More generally, we may set S̃ to be a quadric, C_S̃(z) = zᵀQz + pᵀz + r. In this case, we need the differentiable optimization layer (Agrawal et al. 2019).

4.3 Learning Procedures
Given a dataset and a dynamics model ẋ = f(x) constructed as above, we are to learn the parameters of the unknown functions, φ, h, and q, and possibly η, ξ, and C. The learning scheme can be designed in either or both of the following two ways. First, if we have paired observations of x and ẋ, we simply minimize some loss (e.g., the square loss) between ẋ and f(x). This is also applicable when we can estimate ẋ from x's (e.g., Chartrand 2011). Second, if we have unevenly sampled sequences (x_{t1}, . . . , x_{tn}), we utilize the adjoint state method or backpropagation through forward ODE solvers for optimization (see, e.g., Chen et al. 2018).

5 Related Work
Learning Stable Dynamics  Learning stable linear dynamical systems, e.g., x_{t+1} = Ax_t subject to ρ(A) < 1, is indeed a nontrivial problem and has been addressed for decades (e.g., Lacy and Bernstein 2003; Siddiqi, Boots, and Gordon 2008; Huang et al. 2016; Mamakoukas, Xherija, and Murphey 2020). The problem of learning stable nonlinear systems has also been studied for various models, such as Gaussian mixtures (Khansari-Zadeh and Billard 2011; Blocher, Saveriano, and Lee 2017; Umlauft and Hirche 2017), kernel methods (Khosravi and Smith 2021), Gaussian processes (Duncker et al. 2019), and neural networks (Neumann, Lemme, and Steil 2013).

Learning Physically Meaningful Systems  Another related thread of studies is to learn physical systems, such as Lagrangian (Lutter, Ritter, and Peters 2019; Cranmer et al. 2020) and Hamiltonian (Greydanus, Dzamba, and Yosinski 2019; Matsubara, Ishikawa, and Yaguchi 2020) mechanics, using neural networks. Extension to port-Hamiltonian systems (Zhong, Dey, and Chakraborty 2020) is also considered.

6 Experiment
6.1 Configuration
Implementation  We implemented the learnable components (i.e., φ, h, q, η, and ξ) with neural networks. We used ANODE (Dupont, Doucet, and Teh 2019) for φ in Sections 6.3 and 6.4 to allow much flexibility, while non-augmented NODE (Chen et al. 2018) was also sufficient. For the other components, we used networks with fully connected hidden layers. We used the exponential linear unit as the activation function. Other details are found in the appendix.

Baselines  Besides the proposed model in (9), we tried either or both of the following models as baselines:
1) Base dynamics model without stability or invariance, i.e., ẋ = φ⁻¹(h(φ(x))); we may refer to this baseline as a vanilla model.
2) Stable dynamics model like ours, but with the stable invariant set fixed to be an equilibrium at x = 0 (i.e., almost the same as Manek and Kolter (2019)); we may refer to this baseline as a stable equilibrium model.

6.2 Simple Examples
As a proof of concept, we examined the performance of the proposed method on simple dynamical systems whose stable invariant set is known analytically. Hence, we do not need φ (i.e., we set φ(x) = x) in the two experiments in this section.

Limit Cycle  We examined the system with a limit cycle:

ẋ₁ = x₁ − x₂ − x₁(x₁² + x₂²),  ẋ₂ = x₁ + x₂ − x₂(x₁² + x₂²),
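This system's attractor is known in closed form (the unit circle), so its stability can be verified numerically. Below is a minimal sketch using forward Euler integration; the integrator, step size, and horizon are our own illustrative choices, not the paper's.

```python
import numpy as np

def f(x):
    # Vector field with a stable limit cycle on the unit circle:
    # x1' = x1 - x2 - x1*(x1^2 + x2^2),  x2' = x1 + x2 - x2*(x1^2 + x2^2)
    x1, x2 = x
    r2 = x1**2 + x2**2
    return np.array([x1 - x2 - x1 * r2, x1 + x2 - x2 * r2])

def simulate(x0, dt=0.01, steps=5000):
    # Forward Euler; crude but sufficient for a qualitative check.
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * f(x)
    return x

# Starting off the attractor, the orbit should settle near radius 1.
x_final = simulate([-0.1, 0.1])
print(np.linalg.norm(x_final))  # close to 1
```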
Figure 2: Test results on the system with limit cycle in Section 6.2. (left) Examples of long-term prediction from x₁, x₂ = −.1, .1 for 200 steps. (right) Average (and stdev) long-term prediction errors against prediction steps.

Figure 3: Contour plot of V(x) learned on the data generated from the system with line attractor in Section 6.2. The dotted line is the true line attractor, x₁ = 0.
Figure 4: Results of Section 6.3. (a) True vector field of (12) and two trajectories. Red dashed-line rectangle is the training data region. (b) Learned vector field and trajectories from it. (c) Learned V(x). (d) Learned V(x) without φ. (e) Prediction errors.

whose orbits approach a unit circle as t → ∞. We generated four sequences of length 20 with ∆t = .075 and used the pairs of x and ẋ as training data. For the proposed model, we set C_S(x) = x₁² + x₂² − 1 (i.e., the truth) with S defined as in (7b).
In Figure 2, we show the results of long-term prediction given only x_{t=0}, which was not in the training data. The left panel depicts trajectories of length 200 predicted by the true system, the vanilla model, and the proposed model. The proposed model's trajectory successfully reaches the plausible limit cycle, though this is not surprising as a natural consequence of the model's construction. The right panel shows the average long-term prediction errors against prediction steps (a single step corresponds to the ∆t). The average was taken over 20 test sequences with different x_{t=0}. We can observe that the proposed stable model achieves consistently lower prediction errors.

Line Attractor  We examined another simple system:

ẋ₁ = x₁(1 − x₂),  ẋ₂ = x₁².

The line x₁ = 0 constitutes a line attractor of this system as a set of infinitely many stable equilibria; every orbit starting at x₁ ≠ 0 approaches some point on this line as t → ∞. We generated eight sequences of length 80 with ∆t = .05 as training data. We learned the proposed model with φ(x) = x and C_S(x) = c₁x₁ + c₂x₂, where c₁ and c₂ were learnable parameters, and S was defined as in (7b).
In Figure 3, we show the values of the learned V(x). We can say it is successfully learned because V(x) monotonically decreases toward the line x₁ = 0. Moreover, it reflects the fact that a state of this system moves faster when |x₁| ≫ 0 and x₂ ≪ 0 (i.e., in the lower part of the x-plane).

6.3 Learning Vector Field of Nonlinear Oscillator
The Van der Pol oscillator:

ẋ₁ = x₂,  ẋ₂ = µ(1 − x₁²)x₂ − x₁  (12)

is well known as a basis for modeling many physical and biological phenomena. It has a stable limit cycle, whose exact shape cannot be described analytically. As training data (Figure 4a), we used the values of x and ẋ sampled from an even grid on the area [−2.5, 2.5] × [−4.5, 4.5] with µ = 2. For the proposed model, we set S̃ to be a circle defined as in (7b), expecting that φ would be learned so that it transforms a circle into the limit cycle of the system.
In Figure 4b, we show the learned vector field and two trajectories generated from it. They successfully resemble the truth (in Figure 4a), even outside the area of the training data. In Figure 4c, we depict the values of the learned V(x), wherein we can observe that V(x) decreases toward the limit cycle of the system (the dashed orbit). In Figure 4d, for comparison, we show the learned V(x) without a learnable feature transform φ; not surprisingly, it fails to capture the shape of the limit cycle. In Figure 4e, we show an example of long-term prediction errors against prediction steps (a single step corresponds to the ∆t). The model with the proposed stability guarantee achieves significantly lower long-term prediction errors.

6.4 Application: Fluid Flow Prediction
We apply the proposed stable dynamics model to an application of fluid flow prediction. The target flow is the so-called cylinder wake (see Figure 5a); a cylinder-like object is located in a 2D field, fluids come from one side uniformly, and a series of vortices occurs past the object under certain conditions. This is a limit cycle known as the Kármán vortex

Figure 5: Long-term predictions of fluid flow (snapshots at t = 0, 40, 80, 120, 160, 200). Red&yellow and blue&cyan denote positive and negative values of vorticity, respectively. (a) Ground truth. (b) The vanilla model. (c) The stable equilibrium model. (d) The proposed model.

street. Before the flow reaches the limit cycle, it typically starts from an unstable equilibrium, and then the vortices begin to grow gradually. This stage is called off-attractor. Cylinder wake has been studied as one of the standard problems of fluid dynamics and also appears as a testbed of forecasting methods even recently (see, e.g., Kutz et al. 2016).
As training data, we generated such flow using the immersed boundary projection method (Taira and Colonius 2007; Colonius and Taira 2008) and used the part from near the equilibrium to a time point before the limit cycle is completely observed; hence the training data were off-attractor. The data comprised the observations of the vorticity in the field of size 199 × 449. As preprocessing, we reduced the dimensionality of the data from 89351 to 26 by PCA, which lost only 0.1% of the energy. We contaminated the data with Gaussian noise. We estimated ẋ by (x_{t+∆t} − x_t)/∆t and learned the proposed dynamics model with a cycle along the first two axes of Z as S̃ (i.e., C_S̃(z) = z₁² + z₂² − r²).
In Figure 5, we show the results of long-term prediction starting at a time point where the flow is almost on the limit cycle (i.e., on-attractor). The two baselines (in Figures 5b and 5c) fail to replicate the true limit cycle (in Figure 5a). In contrast, the long-term prediction by the proposed method (in Figure 5d) shows a plausible oscillating pattern, though the oscillation phase is slightly different from the truth. It is worth noting that with the proposed stable dynamics model, we were able to predict the on-attractor oscillating patterns only from off-attractor training data.

7 Conclusion
We proposed a dynamics model with the provable existence of a stable invariant set. It can handle the stability of general types of invariant sets, for example, limit cycles and line attractors. Future directions of research include the treatment of random dynamical systems, as the current method is limited to deterministic dynamics. Consideration of the input-to-state stability of controlled systems is also an interesting problem.

Acknowledgements
This work was done when the first author was working at RIKEN Center for Advanced Intelligence Project. It was supported by JSPS KAKENHI Grant Numbers JP19K21550, JP18H03287, and JST CREST Grant Number JPMJCR1913.

References
Agrawal, A.; Amos, B.; Barratt, S.; Boyd, S.; Diamond, S.; and Kolter, J. Z. 2019. Differentiable convex optimization layers. In Advances in Neural Information Processing Systems 32, 9562–9574.
Amos, B.; Xu, L.; and Kolter, J. Z. 2017. Input convex neural networks. In Proceedings of the 34th International Conference on Machine Learning, 146–155.
Blocher, C.; Saveriano, M.; and Lee, D. 2017. Learning stable dynamical systems using contraction theory. In Proceedings of the 14th International Conference on Ubiquitous Robots and Ambient Intelligence, 124–129.
Chang, Y.-C.; Roohi, N.; and Gao, S. 2019. Neural Lyapunov control. In Advances in Neural Information Processing Systems 32, 3245–3254.
Chartrand, R. 2011. Numerical differentiation of noisy, nonsmooth data. ISRN Applied Mathematics 2011: 164564.
Chen, T. Q.; Rubanova, Y.; Bettencourt, J.; and Duvenaud, D. K. 2018. Neural ordinary differential equations. In Advances in Neural Information Processing Systems 31, 6572–6583.
Colonius, T.; and Taira, K. 2008. A fast immersed boundary method using a nullspace approach and multi-domain far-field boundary conditions. Computer Methods in Applied Mechanics and Engineering 197(25): 2131–2146.
Costello, Z.; and Martin, H. G. 2018. A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data. npj Systems Biology and Applications 4(1): 19.
Cranmer, M.; Greydanus, S.; Hoyer, S.; Battaglia, P.; Spergel, D.; and Ho, S. 2020. Lagrangian neural networks. arXiv:2003.04630.
Duncker, L.; Bohner, G.; Boussard, J.; and Sahani, M. 2019. Learning interpretable continuous-time models of latent stochastic dynamical systems. In Proceedings of the 36th International Conference on Machine Learning, 1726–1734.
Dupont, E.; Doucet, A.; and Teh, Y. W. 2019. Augmented neural ODEs. In Advances in Neural Information Processing Systems 32, 3140–3150.
JP18H03287, and JST CREST Grant Number JPMJCR1913. Systems 32, 3140–3150.

Eliasmith, C. 2005. A unified approach to building and controlling spiking attractor networks. Neural Computation 17(6): 1276–1314.
Giesl, P.; and Hafstein, S. 2015. Review on computational methods for Lyapunov functions. Discrete and Continuous Dynamical Systems Series B 20(8): 2291–2331.
Greydanus, S.; Dzamba, M.; and Yosinski, J. 2019. Hamiltonian neural networks. In Advances in Neural Information Processing Systems 32, 15379–15389.
Hirsch, M. W.; Smale, S.; and Devaney, R. L. 2003. Differential equations, dynamical systems, and an introduction to chaos. Academic Press, 2nd edition.
Huang, W.; Cao, L.; Sun, F.; Zhao, D.; Liu, H.; and Yu, S. 2016. Learning stable linear dynamical systems with the weighted least square method. In Proceedings of the 25th International Joint Conference on Artificial Intelligence, 1599–1605.
Khalil, H. K. 2002. Nonlinear systems. Prentice Hall, 3rd edition.
Khansari-Zadeh, S. M.; and Billard, A. 2011. Learning stable nonlinear dynamical systems with Gaussian mixture models. IEEE Transactions on Robotics 27(5): 943–957.
Khosravi, M.; and Smith, R. S. 2021. Nonlinear system identification with prior knowledge of the region of attraction. IEEE Control Systems Letters 5(3): 1091–1096.
Kutz, J. N.; Brunton, S. L.; Brunton, B. W.; and Proctor, J. L. 2016. Dynamic mode decomposition: Data-driven modeling of complex systems. SIAM.
Lacy, S. L.; and Bernstein, D. S. 2003. Subspace identification with guaranteed stability using constrained optimization. IEEE Transactions on Automatic Control 48(7): 1259–1263.
Li, Z.; Kermode, J. R.; and De Vita, A. 2015. Molecular dynamics with on-the-fly machine learning of quantum-mechanical forces. Physical Review Letters 114(9): 096405.
Lusch, B.; Kutz, J. N.; and Brunton, S. L. 2018. Deep learning for universal linear embeddings of nonlinear dynamics. Nature Communications 9(1): 4950.
Lutter, M.; Ritter, C.; and Peters, J. 2019. Deep Lagrangian networks: Using physics as model prior for deep learning. In Proceedings of the 7th International Conference on Learning Representations.
Mamakoukas, G.; Xherija, O.; and Murphey, T. D. 2020. Memory-efficient learning of stable linear dynamical systems for prediction and control. In Advances in Neural Information Processing Systems 33.
Manek, G.; and Kolter, J. Z. 2019. Learning stable deep dynamics models. In Advances in Neural Information Processing Systems 32, 11128–11136.
Massaroli, S.; Poli, M.; Bin, M.; Park, J.; Yamashita, A.; and Asama, H. 2020. Stable neural flows. arXiv:2003.08063.
Matsubara, T.; Ishikawa, A.; and Yaguchi, T. 2020. Deep energy-based modeling of discrete-time physics. In Advances in Neural Information Processing Systems 33.
Morton, J.; Jameson, A.; Kochenderfer, M. J.; and Witherden, F. 2018. Deep dynamical modeling and control of unsteady fluid flows. In Advances in Neural Information Processing Systems 31, 9258–9268.
Neumann, K.; Lemme, A.; and Steil, J. J. 2013. Neural learning of stable dynamical systems based on data-driven Lyapunov candidates. In Proceedings of the 2013 IEEE/RSJ International Conference on Intelligent Robots and Systems, 1216–1222.
Papamakarios, G.; Nalisnick, E.; Rezende, D. J.; Mohamed, S.; and Lakshminarayanan, B. 2019. Normalizing flows for probabilistic modeling and inference. arXiv:1912.02762.
Raissi, M.; Perdikaris, P.; and Karniadakis, G. E. 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378: 686–707.
Siddiqi, S. M.; Boots, B.; and Gordon, G. J. 2008. A constraint generation approach to learning stable linear dynamical systems. In Advances in Neural Information Processing Systems 20, 1329–1336.
Strogatz, S. H. 2015. Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering. CRC Press, 2nd edition.
Taira, K.; and Colonius, T. 2007. The immersed boundary method: A projection approach. Journal of Computational Physics 225(2): 2118–2137.
Takeishi, N.; Kawahara, Y.; and Yairi, T. 2017. Learning Koopman invariant subspaces for dynamic mode decomposition. In Advances in Neural Information Processing Systems 30, 1130–1140.
Tanaka, T.; and Aoyagi, T. 2011. Multistable attractors in a network of phase oscillators with three-body interactions. Physical Review Letters 106(22): 224101.
Teshima, T.; Ishikawa, I.; Tojo, K.; Oono, K.; Ikeda, M.; and Sugiyama, M. 2020. Coupling-based invertible neural networks are universal diffeomorphism approximators. In Advances in Neural Information Processing Systems 33.
Tuor, A.; Drgona, J.; and Vrabie, D. 2020. Constrained neural ordinary differential equations with stability guarantees. arXiv:2004.10883.
Umlauft, J.; and Hirche, S. 2017. Learning stable stochastic nonlinear dynamical systems. In Proceedings of the 34th International Conference on Machine Learning, 3502–3510.
Urain, J.; Ginesi, M.; Tateo, D.; and Peters, J. 2020. ImitationFlow: Learning deep stable stochastic dynamic systems by normalizing flows. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems, 5231–5237.
Wang, J. M.; Fleet, D. J.; and Hertzmann, A. 2006. Gaussian process dynamical models. In Advances in Neural Information Processing Systems 18, 1441–1448.
Zhong, Y. D.; Dey, B.; and Chakraborty, A. 2020. Dissipative SymODEN: Encoding Hamiltonian dynamics with dissipation and control into deep learning. arXiv:2002.08860.

