Goyal P, Benner P. 2022 Discovery of nonlinear dynamical systems using a Runge–Kutta inspired dictionary-based sparse regression approach. Proc. R. Soc. A 478: 20210883. https://doi.org/10.1098/rspa.2021.0883
Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstraße 1, 39106 Magdeburg, Germany
PG, 0000-0003-3072-7780; PB, 0000-0003-3362-4103
© 2022 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
1. Introduction

Data-driven discovery of dynamical models has recently drawn significant attention; however, the discovery of nonlinear dynamical systems is not as mature
as that for linear systems. Inferring nonlinear systems often requires an a priori model hypothesis
by practitioners. A compelling breakthrough towards discovering nonlinear governing equations
appeared in [9,10], where an approach based on genetic programming or symbolic regression
is developed to identify nonlinear models from measurement data. It provides parsimonious
models, fulfilling a long-standing desire of the engineering community. A parsimonious
model is determined by examining the Pareto front that discloses a trade-off between the
identified model’s complexity and accuracy. In a similar spirit, there have been efforts to
develop sparsity promoting approaches to discover nonlinear dynamical systems [11–15]. It
is often observed that the dynamics of physical processes can be given by collecting a few
nonlinear feature candidates from a high-dimensional nonlinear function space, referred to as
a feature dictionary. These sparsity-promoting methods are able to discover models that are
parsimonious, which in some situations can lead to better interpretability than black-box models.
For motivation, we take an example from [14], where using data for fluid flow dynamics
behind a cylinder, it is shown that one can obtain a model describing the on-attractor
and off-attractor dynamics and characterizing a slow parabolic manifold. Fluid dynamics practitioners
can readily interpret this model. Another example may come from biological modelling, where
parsimonious models can describe how one species affects the dynamics of other species. Hence,
models discovered using dictionary-based sparse learning admit this kind of domain interpretation.
Significant progress in solving sparse regression problems [16–18] and in compressed sensing
[19–22] supports the development of these approaches. Although all these methods have gained
much popularity, the success largely depends on the feature candidates included in the dictionary
and on the ability to accurately approximate derivative information from measurement data.
Approximating derivatives from sparsely sampled and noisy measurements poses a tough
challenge, though there are approaches to deal with noise, e.g. [23]. We also highlight additional
directions explored in the literature to discover nonlinear governing equations, which include
discovery of models using time-series data [8], automated inference of dynamics [9,24,25] and
equation-free modelling [13,26,27].
In this work, we re-conceptualize the problem of discovering nonlinear differential equations
by blending sparse identification with a classical numerical integration tool. We focus on a widely
known integration scheme, namely the classical fourth-order Runge–Kutta (RK4) method [28], noting
that any other explicit higher-order integration scheme, e.g. the 3/8-rule fourth-order Runge–Kutta
method, could be used instead, as could the idea of neural ODEs proposed in [29], which can incorporate any numerical integrator. In contrast to
previously studied sparse identification approaches, e.g. [9,11,14], our approach does not require
direct access or approximation of temporal gradient information. Therefore, we do not commit
errors due to a gradient approximation. The approach becomes an attractive choice when the
collected measurement data are sparsely sampled and corrupted with noise.
However, we mention that using numerical integration schemes in the course of learning
dynamics has a relatively long history. The work goes back to [30,31], where the fourth-
order Runge–Kutta scheme is coupled with neural networks to learn a function, describing
the underlying vector field. In recent times, making use of numerical integration schemes with
neural networks has again received attention and has been studied from the perspective of
dynamical modelling, e.g. [32–34]. We particularly emphasize the work [34] that also uses a
similar concept to learn dynamical systems using noisy measurements; precisely, it realizes the
idea using neural networks. However, since neural networks make hardly any
assumptions about the underlying system or structure of dynamical models, they are often
black-box models; thus, the interpretability and generalization of these models is unclear.
In this work, we also discuss an essential class of dynamical models that typically explains
the dynamics of biological networks. It is also shown that regulatory and metabolic networks
are sparse in nature, i.e. not all components influence each other. Furthermore, such dynamical
models are often given by rational nonlinear functions. Consequently, the classical dictionary-
based sparse identification ideology is not applicable as building all possible rational feature
candidates is infeasible. To deal with this, the authors in [36] have recast the problem as finding
the sparsest vector in a given null space. However, computing a null space using corrupted
measurement data is a non-trivial task, though there is some work in this direction [37]. Here,
we instead characterize identifying rational functions as a fraction of two functions, where each
function is identified using dictionary learning. Hence, we inherently retain the primary principle
of sparse identification in the course of discovering models. In addition to these, we discuss the
case where a dictionary contains parameterized candidates, e.g. e^{αx}, where x is the dependent
variable, and α is an unknown parameter. We extend our discussion to parametric and controlled
dynamic processes. The organization of the paper is as follows. In §2, we briefly recap the classical
fourth-order Runge–Kutta method for the integration of ordinary differential equations. After
that, we propose a methodology to discover differential equations by synthesizing the integration
scheme with sparse identification. Furthermore, since the method involves solving nonlinear
and non-convex optimization problems that promote sparse solutions, §3 discusses algorithms
inspired by a sparse-regression approach in [14,18]. In §4, we examine a number of extensions to
other classes of models, e.g. when the governing equations are given by a fraction of two functions
and involve model parameters and external control inputs. In the subsequent section, we illustrate
the efficiency of the proposed methods by discovering a broad variety of benchmark examples,
namely the chaotic Lorenz model, the FitzHugh–Nagumo (FHN) model, Michaelis–Menten kinetics
and the parameterized Hopf normal form. We extensively study the performance of the proposed
approach even under noisy measurements and compare it to the approach proposed in [14]. We
conclude the paper with a summary and high-priority research directions.
Consider a nonlinear dynamical system of the form

ẋ(t) = f(x(t)), (2.1)

where x(t) := [x_1(t), x_2(t), …, x_n(t)]ᵀ with x_j(t) being the jth element of the vector x(t), and the
function f : ℝⁿ → ℝⁿ defines its vector field. Assume that we aim at predicting x(t_{k+1}) for a given
x(t_k), where k ∈ {0, 1, …, N}. Then, using the RK4 scheme, x(t_{k+1}) can be given as a weighted sum
of four increments, which are products of the time-step and the vector field f(·) evaluated at
specific locations. Precisely, it is given as
x(t_{k+1}) ≈ x(t_k) + (h_k/6)(k_1 + 2k_2 + 2k_3 + k_4), h_k = t_{k+1} − t_k, (2.2)

where

k_1 = f(x(t_k)), k_2 = f(x(t_k) + h_k k_1/2), k_3 = f(x(t_k) + h_k k_2/2) and k_4 = f(x(t_k) + h_k k_3).
The RK4 scheme as a network is illustrated in figure 1a. The local integration error due to the RK4
scheme is in O(h_k⁵); hence, the approach is very accurate for small time-steps. Furthermore, if we
integrate equation (2.1) from t_0 to t_f, we can take N steps with time-steps h_k, k ∈ {1, …, N}, so that
t_f = t_0 + Σ_{k=1}^{N} h_k. In the rest of the paper, we use the short-hand notation F_RK4(f, x(t_k), h_k) for
the step in (2.2), i.e.

x(t_{k+1}) = x(t_k + h_k) ≈ F_RK4(f, x(t_k), h_k). (2.3)
Lastly, we stress the point that the RK4 scheme readily handles integration backward in time,
meaning that h_k in (2.2) can also be negative. Hence, we can predict both x(t_{k+1}) and x(t_{k−1}) from
x(t_k) very accurately using the RK4 scheme.
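As a minimal illustration, the step F_RK4 of (2.2)–(2.3) can be written in a few lines of Python; the function name rk4_step and the example vector field below are our own and are not part of the original formulation:

```python
import numpy as np

def rk4_step(f, x, h):
    """One classical fourth-order Runge-Kutta step F_RK4(f, x, h).

    A negative step size h integrates backward in time, cf. (2.2).
    """
    k1 = f(x)
    k2 = f(x + h * k1 / 2)
    k3 = f(x + h * k2 / 2)
    k4 = f(x + h * k3)
    return x + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)

# Example: forward and backward predictions for a harmonic oscillator.
f = lambda x: np.array([x[1], -x[0]])
x_k = np.array([1.0, 0.0])
x_next = rk4_step(f, x_k, 0.01)   # approximates x(t_{k+1})
x_prev = rk4_step(f, x_k, -0.01)  # approximates x(t_{k-1})
```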
The next important ingredient to sparse identification is the construction of a huge symbolic
dictionary Φ, containing potential nonlinear features. We assume that the function f(·) can be
given by a linear combination of a few terms from the dictionary. For example, one can consider a
dictionary containing polynomial, exponential and trigonometric functions, which, for any given
vector v := [v_1, …, v_n], can be given as

Φ(v) = [1, v, v^{P_2}, v^{P_3}, …, sin(v), cos(v), …],

in which v^{P_i}, i ∈ {2, 3}, denote higher-order polynomials; e.g. v^{P_2} contains all possible degree-2
polynomials of the elements of v,

v^{P_2} = [v_1², v_1v_2, …, v_1v_n, v_2², v_2v_3, …, v_n²].
[Figure 1 appears here. Panel (a) depicts the RK4 scheme of (2.2) as a network combining dictionary evaluations Φ(X) weighted by sparse coefficients; panel (b) illustrates RK4-SINDy on the FitzHugh–Nagumo model v̇ = v − w − v³/3 + 0.5, ẇ = 0.040v − 0.028w + 0.032, whose terms are recovered in the sparse vectors ξ_v ≈ [0.499, 0.998, −0.998, −0.333] and ξ_w ≈ [0.032, 0.040, −0.028] for the features {1, v, w, v³}.]
Figure 1. In (a), we show the RK4 scheme to predict variables at the next time-step as a network. It resembles a residual-type
network with skip connections (e.g. [38,39]). In (b), we present a schematic illustration of the RK4-SINDy approach to discover
governing equations using the FitzHugh–Nagumo model. In the first step, we collect a time history of the variables v(t) and w(t).
Next, we build a symbolic feature dictionary Φ, containing potential features. This is followed by solving a nonlinear sparse
regression problem to pick the right features from the dictionary (encoded in the sparse vectors ξ_v and ξ_w). Here, we presume that
variables at the next time-steps are given by following the RK4 scheme. The non-zero entries in the vectors ξ_v and ξ_w determine
the right-hand side of the differential equations. As shown, we pick the right features from the dictionary upon solving the
optimization problem, and the corresponding coefficients are accurate to within 0.1%. (Online version in colour.)
Each element in the dictionary Φ is a potential candidate to describe the function f. Moreover,
depending on the application, one may use empirical or expert knowledge to construct a
meaningful feature dictionary.
Having an extensive dictionary, one has many choices of candidates. However, our goal is to
choose as few candidates as possible, describing the nonlinear function f in (2.1). Hence, we set up
a sparsity-promoting optimization problem to pick a few candidate functions from the dictionary,
e.g.

f_k(x) = Φ(x) ξ_k with ξ_k as sparse as possible,

where f_k : ℝⁿ → ℝ is the kth element of f, and ξ_k ∈ ℝᵐ is a sparse vector with m the total number
of features in the dictionary Φ; hence, selecting appropriate candidates from the dictionary
determines the governing equations. As a result, we can write the function f(·) in (2.1) as follows:

f(x) ≈ [Φ(x)ξ_1, Φ(x)ξ_2, …, Φ(x)ξ_n].

The benefit of avoiding any derivative approximation manifests in our results in §5. The approach is also appealing when data are collected at irregular
time-steps.
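To make the dictionary concrete, here is one way such a feature matrix might be assembled in Python; this is a sketch, and the helper names and the particular feature set are ours:

```python
import numpy as np
from itertools import combinations_with_replacement

def dictionary(X, max_degree=3, trig=True):
    """Evaluate a symbolic dictionary Phi row-wise on samples X (N x n)."""
    N, n = X.shape
    cols = [np.ones((N, 1))]                    # constant feature
    for d in range(1, max_degree + 1):          # all monomials up to max_degree
        for idx in combinations_with_replacement(range(n), d):
            cols.append(np.prod(X[:, idx], axis=1, keepdims=True))
    if trig:
        cols += [np.sin(X), np.cos(X)]          # trigonometric features
    return np.hstack(cols)

def vector_field(Xi, **kwargs):
    """f(x) = Phi(x) @ Xi for a sparse coefficient matrix Xi (m x n)."""
    return lambda x: dictionary(np.atleast_2d(x), **kwargs)[0] @ Xi
```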
When the data are corrupted with noise or do not follow RK4 exactly, we may need to
regularize the above optimization problem. Since ℓ_1-regularization promotes sparsity in the
solution, one can solve an ℓ_1-regularized optimization problem, e.g.

min_Ξ ‖X − X_F(Φ(·)Ξ)‖²_F + γ‖Ξ‖_1, (2.10)

where X stacks the measurements x(t_1), …, x(t_N), X_F(Φ(·)Ξ) stacks the corresponding forward predictions F_RK4(f, x(t_k), h_k) and γ > 0 weights the sparsity penalty. Moreover, since the RK4 scheme also steps backward in time, we can define
X_b := [x(t_0), x(t_1), …, x(t_{N−1})]ᵀ, i.e. the matrix whose ith row is [x_1(t_{i−1}), x_2(t_{i−1}), …, x_n(t_{i−1})], and

X_F^b(f) := [F_RK4(f, x(t_1), −h_1), F_RK4(f, x(t_2), −h_2), …, F_RK4(f, x(t_N), −h_N)]ᵀ,

so that each row of X_F^b(f) is the backward RK4 prediction of the corresponding row of X_b.
Therefore, we can have a more involved optimization by including both forward and backward
predictions in time. This helps particularly in the presence of noisy measurement data. But, on
the other hand, it would make the optimization problems yielding the sparse vectors ξ_i harder
to solve. In the following subsection, we discuss an efficient procedure to solve the optimization
problem (2.9).
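As a sketch of this combined objective (our own formulation, reusing the rk4_step and dictionary helpers sketched earlier, which are passed in as arguments):

```python
import numpy as np

def forward_backward_loss(Xi, X, h, dictionary, rk4_step):
    """Sum of squared forward and backward RK4 prediction errors.

    X is the (N+1) x n data matrix; Xi the m x n coefficient matrix.
    """
    f = lambda x: dictionary(np.atleast_2d(x))[0] @ Xi
    loss = 0.0
    for k in range(len(X) - 1):
        loss += np.sum((X[k + 1] - rk4_step(f, X[k], h)) ** 2)    # forward
        loss += np.sum((X[k] - rk4_step(f, X[k + 1], -h)) ** 2)   # backward
    return loss
```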
Input: measurement data {x(t_0), x(t_1), …, x(t_N)} and the cut-off parameter λ.
Output: the sparse Ξ that picks the right features from the dictionary.

At each iteration, the optimization problem is solved, all coefficients whose magnitude falls below λ are set to zero, and the problem is re-solved over the remaining features whose
coefficients are equal to or larger than λ. This procedure is efficient as the current value of the non-zero coefficients can be used as an initial guess for the next iteration, and the optimal Ξ can be
found with little computational effort. Note that the cut-off parameter λ is important to obtain a
suitably sparse solution, but it can be found using the concept of cross-validation. We sketch the
discussed procedure in algorithm 1.
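Since the paper solves the optimization with gradient descent in PyTorch, a compact sketch of algorithm 1 in that spirit might look as follows; the helper phi (a torch version of the dictionary), the loop sizes and the optimizer settings are illustrative choices of ours:

```python
import torch

def rk4_sindy_threshold(phi, X, h, m, lam=0.05, iters=10, n_grad=2000, lr=1e-2):
    """Sequentially thresholded RK4-SINDy (algorithm-1-style sketch).

    phi: maps a batch of states (N x n) to dictionary features (N x m);
    X:   (N+1) x n tensor of snapshots at a uniform time-step h.
    """
    n = X.shape[1]
    Xi = torch.zeros(m, n, requires_grad=True)
    mask = torch.ones(m, n)
    for _ in range(iters):                       # thresholding iterations
        opt = torch.optim.Adam([Xi], lr=lr)
        for _ in range(n_grad):                  # gradient-descent refit
            opt.zero_grad()
            f = lambda z: phi(z) @ (Xi * mask)
            k1 = f(X[:-1])
            k2 = f(X[:-1] + h * k1 / 2)
            k3 = f(X[:-1] + h * k2 / 2)
            k4 = f(X[:-1] + h * k3)
            pred = X[:-1] + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
            loss = ((X[1:] - pred) ** 2).sum()
            loss.backward()
            opt.step()
        with torch.no_grad():                    # cut off small coefficients
            mask = mask * ((Xi * mask).abs() >= lam).float()
    return (Xi * mask).detach()
```

The surviving coefficients are never reset, so each refit warm-starts from the previous solution, which is what makes the procedure cheap in practice.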
The iterative thresholding algorithm proposed for sparse regression in [14] is analysed
in [42], where a rapid convergence of the algorithm is shown. In contrast to the algorithm in [14], algorithm 1 is more complex, and the underlying
optimization problem is non-convex; thus, a thorough study of its convergence is out of the scope
of this paper. However, we here mention that a rapid convergence of algorithm 1 is observed in
numerical experiments, but its analysis will be an important topic for future research.
We also note that algorithm 1 always terminates as either the number of indices set to zero is
increased (which terminates when the dictionary is exhausted), or the error criterion is satisfied.
But the question remains as to whether the algorithm will converge to the correct sparse solution.
A remedy to this can be to use the rationale of an ensemble, proposed in [43] to build an ensemble
of sparse models. It can provide statistical quantities for the feature candidates in the dictionary.
Based on these, we can construct a final sparse model based on statistical tools such as the
p-values.
[Algorithm 2 appears here: starting from the full dictionary, the smallest remaining non-zero coefficient is forced to zero in each iteration, the model is refitted, and the error E := X − X_F(Φ(·)Ξ) is recorded. Output: the sparse Ξ that picks the right features from the dictionary.]

This iterative thresholding is more robust, particularly when data are corrupted with noise. However, it may be computationally more
expensive than the fixed cut-off thresholding approach, as it may need more iterations to converge.
Therefore, an efficient approach combining the fixed and iterative thresholding approaches is a
worthy future research direction.
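The loop of algorithm 2 can be paraphrased as follows (a sketch; the fit callback stands in for the gradient-descent refit of the surviving coefficients):

```python
import numpy as np

def iterative_threshold(fit, m, n, max_removals):
    """Algorithm-2-style loop (a sketch): repeatedly refit, force the
    smallest remaining coefficient to zero and record the data-fidelity
    error, yielding points on a Pareto front.

    fit(mask) is assumed to return (Xi, error) for a given sparsity mask.
    """
    mask = np.ones((m, n), dtype=bool)
    pareto = []                                   # (no. forced zeros, error)
    for _ in range(max_removals):
        Xi, err = fit(mask)
        pareto.append(((~mask).sum(), err))
        nonzero = np.flatnonzero(mask)
        if nonzero.size == 0:
            break
        smallest = nonzero[np.argmin(np.abs(Xi.ravel()[nonzero]))]
        mask.ravel()[smallest] = False            # force it to zero
    return pareto
```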
Φ^{(g)}(x) = [1, x, x², x³, …, sin(x), cos(x), sin(x²), cos(x²), sin(x³), cos(x³), …] (4.2)

and

Φ^{(h)}(x) = [x, x², x³, …, sin(x), cos(x), sin(x²), cos(x²), sin(x³), cos(x³), …], (4.3)

and write

g(x) = Φ^{(g)}(x) ξ_g (4.4)

and

h(x) = Φ^{(h)}(x) ξ_h, (4.5)
where ξ_g and ξ_h are sparse vectors. Then, we can readily apply the framework discussed in the
previous section by assuming f(x) := g(x)/(1 + h(x)) in (2.1). We can determine the sparse coefficients
ξ_g and ξ_h by employing the thresholding concepts presented in algorithms 1 and 2. In many cases, however, it is more economical to consider models of the form

ẋ(t) = k(x(t)) + g(x(t))/(1 + h(x(t))). (4.6)

For example, consider the model
ẋ(t) = −x(t) − x(t)/(1 + x(t)), (4.7)
which fits to the form considered in (4.6). In this case, all nonlinear functions k(·), g(·) and h(·) are
degree-1 polynomials. On the other hand, if the model (4.7) is written in the form (4.1), then we
have
ẋ(t) = (−2x(t) − x(t)²)/(1 + x(t)). (4.8)
Thus, the nonlinear functions g(·) and h(·) in (4.1) are of degrees 2 and 1, respectively. This gives a
hint that if we aim at learning governing equations using sparse identification, it might be efficient
to consider the form (4.6) due to a smaller size of the necessary dictionary. It becomes even more
adequate in multi-dimensional differential equations. To discover a dynamical model of the form
(4.6), we extend the idea of learning nonlinear functions using dictionaries. Let us construct three
dictionaries as follows:
Φ^{(k)}(x) = [1, x, x², x³, …, sin(x), cos(x), sin(x²), cos(x²), sin(x³), cos(x³), …], (4.9)
Φ^{(g)}(x) = [1, x, x², x³, …, sin(x), cos(x), sin(x²), cos(x²), sin(x³), cos(x³), …] (4.10)
and Φ^{(h)}(x) = [x, x², x³, …, sin(x), cos(x), sin(x²), cos(x²), sin(x³), cos(x³), …]. (4.11)
Then, we believe that the nonlinear functions in (4.6) can be given as a sparse linear
combination of the dictionaries, i.e.
k(x) = Φ (k) (x)ξ k , g(x) = Φ (g) (x)ξ g and h(x) = Φ (h) (x)ξ h . (4.12)
To determine the sparse coefficients {ξ k , ξ g , ξ h }, we can employ the RK4-SINDy framework, and
algorithms 1 and 2. We will illustrate this approach by discovering enzyme kinetics given by a rational
function in §5d.
We note that learning a rational dynamical model with a small denominator may lead to
numerical challenges. This could be related to fast transient behaviour, as the gradient can be
significantly larger when the denominator is small. Therefore, such cases need to be appropriately
handled, for example, with proper data normalization and sampling, although, in our experiment
to identify a Michaelis–Menten kinetic model from data (see §5d), we have not noticed any
unexpected behaviour.
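A sketch of how the rational form (4.6) plugs into the same RK4-SINDy machinery, with dictionary evaluators and coefficient vectors as assumed in the snippets above:

```python
def rational_vector_field(Phi_k, Phi_g, Phi_h, xi_k, xi_g, xi_h):
    """f(x) = k(x) + g(x)/(1 + h(x)), cf. (4.6) and (4.12); each of k, g, h
    is a sparse linear combination of its own dictionary. Works with
    numpy arrays or torch tensors alike."""
    def f(x):
        return Phi_k(x) @ xi_k + (Phi_g(x) @ xi_g) / (1.0 + Phi_h(x) @ xi_h)
    return f
```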
Consider a controlled dynamical system of the form

ẋ(t) = f(x(t), u(t)), (4.13)

where x(t) ∈ ℝⁿ and u(t) ∈ ℝᵐ are the state and control input vectors. The goal here is to discover
f(x(t), u(t)) using the state trajectory x(t) generated by a control input u(t). We aim at
discovering governing equations using dictionary-based identification. As discussed in §2, we
construct a symbolic dictionary Φ of possible candidate features using x and u, i.e.
Φ(x, u) = [1, x, u, x_u^{P_2}, x_u^{P_3}, …], (4.14)

where x_u^{P_i} comprises polynomial terms of degree i; e.g. x_u^{P_2} contains all degree-2 polynomial terms,
including cross terms,

x_u^{P_2} = [x_1², …, x_n², u_1², …, u_m², x_1u_1, …, x_nu_1, x_1u_2, …, x_nu_m], (4.15)
where ui is the ith element of u. Using measurements of x and u, we can cast the problem exactly
as done in §2 by assuming that f(x(t), u(t)) can be determined by selecting appropriate functions
from the dictionary Φ(x, u). Similarly, system parameters can also be incorporated to discover
parametric differential equations of the form

ẋ(t) = f(x(t), μ), (4.16)

where μ ∈ ℝᵖ contains the system parameters. This can be considered a special case of (4.13), since a
constant input can be thought of as a parameter in the course of discovering governing equations.
We illustrate the RK4-SINDy framework for discovering parametrized Hopf normal form using
measurement data (see §5e).
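For instance, the joint dictionary (4.14)–(4.15) might be evaluated as follows (a sketch with illustrative names):

```python
import numpy as np
from itertools import combinations_with_replacement

def dictionary_xu(X, U, max_degree=2):
    """Features in states X (N x n) and inputs U (N x m), including
    cross terms such as x_i * u_j, cf. (4.14)-(4.15)."""
    Z = np.hstack([X, U])                      # treat [x, u] jointly
    N, d = Z.shape
    cols = [np.ones((N, 1))]
    for deg in range(1, max_degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(Z[:, idx], axis=1, keepdims=True))
    return np.hstack(cols)
```

A constant parameter vector μ can be handled by the same function, simply by stacking μ as an input held fixed along the trajectory.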
Let us assume that we are concerned with discovering the model (4.17) using a time history
of x(t) without any prior knowledge, except that we expect exponential nonlinearities based
on expert knowledge. For instance, an electrical circuit containing diode components typically involves
exponential nonlinearities, but the corresponding coefficient is unknown.
We conventionally build a dictionary containing exponential functions using several possible
coefficients as follows:
However, it is impossible to add infinitely many exponential terms with different coefficients
to the dictionary. As a remedy, we discuss the idea of a parameterized dictionary, which was also
discussed in [44], e.g.

Φ(x, η) = [1, x, x², …, sin(η_1 x), cos(η_2 x), e^{η_3 x}, …].

In this case, we do not need to include all frequencies for trigonometric functions and coefficients
for exponential functions. However, it comes at the cost of finding suitable coefficients {ηi },
along with a vector, selecting the right features from the dictionary. Since we solve optimization
problems, e.g. (2.10), using gradient descent, we can easily incorporate the parameters η_i along
with ξ_i as learning parameters and can readily employ algorithms 1 and 2 with little alteration.
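In PyTorch, the dictionary parameters η can simply be registered as additional learnable tensors alongside Ξ; the following is a sketch under those assumptions, with an illustrative one-dimensional feature set:

```python
import torch

# Learnable dictionary parameter eta inside the feature exp(eta * x),
# optimized jointly with the sparse coefficients Xi (a sketch).
eta = torch.tensor(1.0, requires_grad=True)
Xi = torch.zeros(4, 1, requires_grad=True)

def phi(x):
    """Parameterized dictionary [1, x, x^2, exp(eta * x)] for 1-d states x."""
    return torch.stack([torch.ones_like(x), x, x ** 2, torch.exp(eta * x)],
                       dim=-1)

# eta is treated like any other learning parameter of the optimization:
opt = torch.optim.Adam([Xi, eta], lr=1e-2)
```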
5. Results

Here, we demonstrate the success of RK4-SINDy in discovering governing equations using
measurement data. Among the benchmark examples is a model for enzyme kinetics, which
is given by a fraction of two functions and contains rational nonlinearities. In the last example, we showcase that RK4-SINDy also
successfully discovers the parametric Hopf normal form from collected noisy measurement data
for various parameters. Lastly, we mention that we have generated the data using the adaptive
solver solve_ivp from the Python library SciPy with default settings. We have implemented
algorithms 1 and 2 using the PyTorch library and have used a gradient descent method with a
fixed learning rate to solve equation (2.10).
As a first example, we consider a two-dimensional linear model,

ẋ(t) = −0.1x(t) + 2.0y(t) (5.1a)

and

ẏ(t) = −2.0x(t) − 0.1y(t). (5.1b)
To infer governing equations from measurement data, we first assume to have clean data at a
regular time-step dt. We then build a symbolic dictionary containing polynomial nonlinearities up
to degree 5. Next, we learn governing equations using RK4-SINDy (algorithm 1 with λ = 5 × 10⁻²)
and observe the quality of inferred equations for different dt. We also present a comparison with
Std-SINDy.
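The data-generation step described above can be reproduced along these lines; the right-hand side matches (5.1), while the time grid and initial condition are illustrative choices of ours:

```python
import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, z):
    x, y = z
    return [-0.1 * x + 2.0 * y, -2.0 * x - 0.1 * y]   # linear model (5.1)

dt = 1e-2                                    # sampling time-step
t_eval = np.linspace(0.0, 25.0, 2501)        # regular grid with step dt
sol = solve_ivp(rhs, (0.0, 25.0), [2.0, 0.0], t_eval=t_eval)  # adaptive RK45
X = sol.y.T                                  # snapshots passed to RK4-SINDy
```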
The results are shown in figure 2 and table 1. We note that RK4-SINDy is much more robust
to the variation in time-step when compared with Std-SINDy, and discovers the governing
equations accurately. We also emphasize that for large time-steps, Std-SINDy fails to capture
part of the dynamics; in fact, for the time-step dt = 5 × 10⁻¹, Std-SINDy even yields unstable models,
figure 2d.
Next, we study the performance of both methodologies under corrupted data. We corrupt
the measurement data by adding zero-mean Gaussian white noise of different variances. We
present the results in figure 3 and table 2 and note that RK4-SINDy discovers better-quality
parsimonious models than Std-SINDy, even under significantly corrupted data.
This is most visible in figure 3d. Naturally, RK4-SINDy also breaks down for a very large
amount of noise in the measurements, but this breakdown happens much later than for Std-SINDy.
¹Most of the examples are taken from [14].
²We use the Python implementation of the method, the so-called PySINDy [45].
Figure 2. Linear two-dimensional model: identified models using data at various regular time-steps. (a) Time-step
dt = 1 × 10⁻², (b) time-step dt = 1 × 10⁻¹, (c) time-step dt = 3 × 10⁻¹, (d) time-step dt = 5 × 10⁻¹. (Online version
in colour.)
Table 1. Linear two-dimensional model: the discovered governing equations using RK4-SINDy and Std-SINDy are reported for
different regular time-steps at which data are collected.
Figure 3. Linear two-dimensional model: the transient responses of discovered models using corrupted data are compared.
(a) Noise level σ = 1 × 10⁻², (b) noise level σ = 5 × 10⁻², (c) noise level σ = 1 × 10⁻¹, (d) noise level σ = 2 × 10⁻¹.
(Online version in colour.)
Table 2. Linear two-dimensional model: the discovered governing equations, by employing RK4-SINDy and Std-SINDy, are
reported. In this scenario, the measurement data are corrupted using zero-mean Gaussian white noise of different variances.
Like in the linear case, we aim at discovering the governing equations using measurement data.
We repeat the study done in the previous example using different regular time-steps. We report
the quality of discovered models using RK4-SINDy and Std-SINDy in figure 4 and table 3.
We observe that RK4-SINDy successfully discovers the governing equations quite accurately,
whereas Std-SINDy struggles to identify the governing equations when measurement data are
collected at a larger time-step; it simply fails to obtain a stable model for the time-step dt = 0.1. This
showcases the robustness of RK4-SINDy in discovering interpretable models even when data are
collected sparsely.
Figure 4. Cubic two-dimensional model: a comparison of the transient responses of discovered models using data at different
regular time-steps. (a) Time-step dt = 5 × 10⁻³, (b) time-step dt = 1 × 10⁻², (c) time-step dt = 5 × 10⁻² and (d) time-step
dt = 1 × 10⁻¹. (Online version in colour.)
Table 3. Cubic two-dimensional model: the table reports the discovered governing equations by employing RK4-SINDy and
Std-SINDy.
Figure 5. FHN model: a comparison of the transient responses of the discovered differential equations using data collected at
different regular time-steps. (a) Time-step dt = 1.0 × 10⁻¹, (b) time-step dt = 2.5 × 10⁻¹, (c) time-step dt = 5.0 × 10⁻¹
and (d) time-step dt = 7.5 × 10⁻¹. (Online version in colour.)
Table 4. FHN model: discovered models using data at various time-steps using RK4-SINDy and Std-SINDy.

dt = 1.0 × 10⁻¹:
  RK4-SINDy: v̇(t) = 0.499 + 0.998v − 0.998w − 0.333v³; ẇ(t) = 0.032 + 0.040v − 0.028w
  Std-SINDy: v̇(t) = 0.498 + 0.996v − 0.996w − 0.332v³; ẇ(t) = 0.032 + 0.040v − 0.028w

dt = 2.5 × 10⁻¹:
  RK4-SINDy: v̇(t) = 0.499 + 0.998v − 0.998w − 0.333v³; ẇ(t) = 0.032 + 0.040v − 0.028w
  Std-SINDy: v̇(t) = 0.494 + 0.985v − 0.989w − 0.328v³; ẇ(t) = 0.032 + 0.040v − 0.028w

dt = 5.0 × 10⁻¹:
  RK4-SINDy: v̇(t) = 0.501 + 1.001v − 1.001w − 0.334v³; ẇ(t) = 0.032 + 0.040v − 0.028w
  Std-SINDy: v̇(t) = 0.482 + 0.943v − 0.959w − 0.034vw − 0.311v³ + 0.024vw²; ẇ(t) = 0.032 + 0.040v − 0.028w

dt = 7.5 × 10⁻¹:
  RK4-SINDy: v̇(t) = 0.502 + 1.001v − 1.003w − 0.334v³; ẇ(t) = 0.032 + 0.040v − 0.027w
  Std-SINDy: v̇(t) = 0.459 + 0.816v − 0.982w − 0.013v² + ⋯ + 0.131vw² − 0.021w³; ẇ(t) = 0.032 + 0.040v − 0.028w
RK4-SINDy (algorithm 1 with λ = 10⁻²) and Std-SINDy. We discover governing equations using
the data collected in the time interval [0, 600]s. We identify models under different conditions,
namely, different time-steps at which data are collected. We report the results in figure 5 and
table 4. It can be observed that RK4-SINDy faithfully discovers the underlying governing
equations by picking the correct features from the dictionary and estimates the corresponding
coefficients to within 1% accuracy. On the other hand, Std-SINDy breaks down when data are taken
at a large time-step.
Figure 6. Chaotic Lorenz model: (a) the collected data (dotted) and a finely spaced trajectory of the ground truth are shown.
(b,c) The trajectories obtained from the models discovered by RK4-SINDy and Std-SINDy, respectively. (Online version in
colour.)
Table 5. Chaotic Lorenz model: discovered governing equations using RK4-SINDy and Std-SINDy.

RK4-SINDy:
  x̃˙(t) = −10.004x̃(t) + 10.004ỹ(t)
  ỹ˙(t) = 2.966x̃(t) − 0.956ỹ(t) − 7.953x̃(t)z̃(t)
  z̃˙(t) = 7.944x̃(t)ỹ(t) − 2.669z̃(t) − 8.336

Std-SINDy:
  x̃˙(t) = −9.983x̃(t) + 9.983ỹ(t)
  ỹ˙(t) = 2.912x̃(t) − 0.922ỹ(t) − 7.911x̃(t)z̃(t)
  z̃˙(t) = 7.972x̃(t)ỹ(t) − 2.662z̃(t) − 8.313
ẋ(t) = −10x(t) + 10y(t),
ẏ(t) = x(t)(28 − z(t)) − y(t)
and ż(t) = x(t)y(t) − (8/3)z(t). (5.4)
We collect the data by simulating the model from time t = 0 to t = 20 with a time-step of
dt = 10⁻². To discover the governing equations using the measurement data, we employ RK4-
SINDy and Std-SINDy with the fixed cut-off parameter λ = 0.5. However, before employing the
methodologies, we perform a normalization step. The reason behind this is that the mean value
of the variable z is large, and the standard deviations of all three variables are much larger than
1. Consequently, a dictionary containing polynomial terms would be highly ill-conditioned. To
circumvent this, we perform a normalization of data. Ideally, one performs normalization such
that the mean and variance of the transformed data are 0 and 1. But for this particular example,
we normalize such that the interactions between the transformed variables are similar to (5.4).
Hence, we propose the transformation

x̃(t) := x(t)/8, ỹ(t) := y(t)/8 and z̃(t) := (z(t) − 25)/8, (5.5)

under which the Lorenz system (5.4) becomes

x̃˙(t) = −10x̃(t) + 10ỹ(t),
ỹ˙(t) = 3x̃(t) − ỹ(t) − 8x̃(t)z̃(t)
and z̃˙(t) = 8x̃(t)ỹ(t) − (8/3)z̃(t) − 25/3. (5.6)
The models (5.4) and (5.6) look alike, and the basis features in which the dynamics of both models
lie are the same except for a constant. However, an appealing property of the model (5.6), or of
the transformed data, is that the data become well-conditioned, and hence so does the dictionary containing
polynomial features. Next, we discover models by employing RK4-SINDy and Std-SINDy using
the transformed data. For this, we construct a dictionary with polynomial nonlinearities up to
Figure 7. Chaotic Lorenz model: the figure shows the noisy measurements of {x, y, z} that are corrupted by adding zero-mean
Gaussian noise of variance one. It also shows the denoised signal obtained using a Savitzky–Golay filter [48]. (Online version in
colour.)
Figure 8. Chaotic Lorenz model: the left figure shows the collected noisy data (dotted) together with a continuous trajectory of the
ground truth. We have added Gaussian white noise of mean zero and variance one. The middle and right figures present
the transient trajectories obtained from the models discovered by RK4-SINDy and Std-SINDy, respectively, and show that the
dynamics of the discovered models are intact on the attractor. (Online version in colour.)
degree 3. We report the results in figure 6 and table 5. We note that both methods identify the correct
features from the dictionary with coefficients that are close to the ground truth, but the RK4-
SINDy model coefficients are closer to the ground-truth ones. It is also worthwhile to note that
although the coefficients of the obtained RK4-SINDy model are only 0.01% off the ground truth, the
dynamics still seem quite different, figure 6. A reason behind this is the highly chaotic behaviour
of the dynamics. As a result, a tiny deviation in the coefficients can significantly impact the
transient behaviour in an absolute sense; however, the dynamics on the attractor are perfectly
captured. Next, we study the performance of the approaches under noisy measurements. For
this, we add zero-mean Gaussian white noise of variance one. To employ RK4-SINDy, we first
apply a Savitzky–Golay filter [48] to denoise the time-history data, figure 7. For Std-SINDy as
well, we use the same filter to denoise the signal and approximate the derivative information.
We plot the trajectories of the discovered models and the ground truth in figure 8 and observe
that the dynamics on the attractor are still intact; however, we note that the discovered equations
are very different from the ground truth, table 6. The learning can be improved by employing
algorithm 2, where we iteratively remove the smallest coefficient and determine the sparsest
solutions by looking at the Pareto-front. However, it comes at a slightly higher computational
cost.
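The denoising step might look as follows in SciPy; the window length, polynomial order and the synthetic stand-in signal are illustrative choices of ours:

```python
import numpy as np
from scipy.signal import savgol_filter

dt = 1e-2
t = np.arange(0.0, 20.0, dt)
# Stand-in for noisy snapshots of the three Lorenz states:
X_noisy = np.column_stack([np.sin(t), np.cos(t), np.sin(2 * t)])
X_noisy += np.random.normal(0.0, 1.0, X_noisy.shape)

# Denoise each state trajectory before fitting RK4-SINDy.
X_denoised = savgol_filter(X_noisy, window_length=51, polyorder=3, axis=0)

# For Std-SINDy, the same filter also yields derivative estimates.
dX = savgol_filter(X_noisy, window_length=51, polyorder=3, deriv=1,
                   delta=dt, axis=0)
```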
[Figure 9 appears here. Its panels show: training trajectories s(t) of the ground-truth model ṡ(t) = 0.6 − 1.5s(t)/(0.3 + s(t)); the Pareto front of the loss against the number of forced zero coefficients; the learned rational model s̃˙(t) = (−0.666 − 1.335s̃(t))/(1.000 + 0.512s̃(t)), which normalizes to s̃˙(t) = (−0.449 − 0.900s̃(t))/(0.674 + 0.345s̃(t)); and simulations of the learned and ground-truth models, including outside the training regime.]
Figure 9. Michaelis–Menten kinetics: in the first step, we have collected data for four initial conditions at a time-step
dt = 5 × 10⁻². In the second step, we performed data processing to normalize the data using the mean and standard
deviation. In the third step, we employed RK4-SINDy (algorithm 2) to discover the most parsimonious model. For this, we observe
the Pareto front and pick the model that best fits the data, yet has the maximum number of zero coefficients. We then compare
the discovered model with the ground truth and find that the proposed methodology could find the precise candidates from the
polynomial dictionary. The corresponding coefficients have less than 1% error. (Online version in colour.)
Table 6. Lorenz model: discovered governing equations using RK4-SINDy and Std-SINDy from noisy measurements.

RK4-SINDy:
  x̃˙(t) = −9.016x̃ + 8.221ỹ − 1.675x̃z̃ + 0.895x̃³ + 0.603x̃²ỹ − 0.579x̃ỹ² + 1.065ỹz̃
  ỹ˙(t) = 1.025ỹ − 6.133x̃z̃ − 1.033ỹz̃
  z̃˙(t) = −8.345 − 2.708z̃ + 7.971x̃ỹ

Std-SINDy:
  x̃˙(t) = −8.842x̃ + 8.373ỹ − 3.107x̃z̃ + 2.165x̃³ − 1.215x̃²ỹ
  ỹ˙(t) = 1.811x̃ − 7.580x̃z̃
  z̃˙(t) = −8.344 − 2.710z̃ + 7.950x̃ỹ
To identify the correct model while employing algorithm 2, we keep track of the loss
(data fidelity) and the number of non-zero coefficients, as shown in figure 9c. This allows us
to build a Pareto front for the optimization problem and to choose the most parsimonious model
that describes the dynamics present in the collected data. One of the most attractive features of
learning parsimonious models is that they avoid over-fitting and generalize better in regions where
data are not collected. This is exactly what we observe. As shown in figure 9e, the learned
model predicts the dynamics very accurately in a region far away from the training one.
Next, we study the performance of the method under noisy measurements. For this, we
corrupt the collected data using zero-mean Gaussian noise of variance σ = 2 × 10⁻². Then, we
process the data by first employing a noise-reduction filter, namely Savitzky–Golay, followed by
normalizing the data. In the third step, we focus on learning the most parsimonious model by
picking appropriate candidates from the polynomial dictionary. Remarkably, the method allows
us to find a model with the correct features from the dictionary and coefficients accurate to within 5%.
Furthermore, the model faithfully generalizes to regimes outside the training one, even using noisy
measurements (figure 10).
Next, we consider the parameterized Hopf normal form, which exhibits a bifurcation with respect to the parameter μ. For this example, we collect
measurements for eight different parameter values μ at a time-step of 0.2, fixing ω = 1 and A = 1.
Then, we corrupt the measurement data by adding Gaussian sensor noise of 1%, as shown
in figure 11a. Next, we construct a symbolic polynomial dictionary Φ by including the
parameter μ as a dependent variable. While building a polynomial dictionary, it is important
to choose the degree of the polynomial as well. Moreover, it is known that the polynomial basis
becomes numerically unstable as the degree increases. Hence, solving the optimization problem
(2.9) becomes challenging. By means of this example, we discuss an assessment test to choose
the appropriate degree of the polynomial in the dictionary. Essentially, we inspect data fidelity
with respect to the degree of the polynomial in the dictionary. When the dictionary contains all
essential polynomial features, then a sharp drop in the error is expected. We observe in figure 11b
a sharp drop in the error at the degree 3, and the error remains almost the same even when higher
polynomial features are added. It indicates that polynomial degree 3 is sufficient to describe the
dynamics. Using the dictionary containing degree 3 polynomial features, we seek to identify
the minimum number of features from the dictionary that explains the underlying dynamics.
We achieve this by employing RK4-SINDy, and comparing the performance with Std-SINDy.
Figure 10. Michaelis–Menten kinetics: the figure demonstrates the necessary steps to uncover the most parsimonious model
using noisy measurement data. It also testifies to the capability of the discovered parsimonious model to generalize
beyond the training regime. (Online version in colour.)
Table 7. Hopf normal form: here, we report discovered governing equations using noisy measurement data, representing
dynamics of Hopf bifurcation. We note that RK4-SINDy recovers the Hopf normal form very accurately; on the other hand,
Std-SINDy breaks down.
We report the discovered governing equations in table 7, noting the impressive
performance of RK4-SINDy in discovering the exact form of the underlying parametric equations,
with coefficients accurate to within 1%. On the other hand, Std-SINDy is not able to identify
the correct form of the model. Furthermore, we compare simulations of the model discovered by
RK4-SINDy with the ground truth beyond the training regime of the parameter μ in figure 11c,d. This
exposes the strength of the parsimonious and interpretable discovered models.
6. Discussion
This work has introduced a compelling approach (RK4-SINDy) to discover nonlinear differential
equations. For this, we have blended sparsity-promoting identification with a numerical
integration scheme, namely, the classical fourth-order Runge–Kutta method. We note that the
Figure 11. Hopf normal form: (a) the noisy measurements obtained for various values of the parameter μ. To identify the correct
polynomial degree for the dictionary, we perform an assessment test, which indicates that degree-3 polynomials are sufficient to describe
the dynamics (b). (c,d) A comparison of simulations of the ground-truth model and the identified models over the parameter μ,
illustrating the capability of generalizing beyond the training parameter regime. (Online version in colour.)
RK4 scheme could easily be exchanged for other explicit high-order integrators, or for the adaptive
integrators presented in [29], and similar results could be expected. The appeal of the proposed
methodology is that we do not require derivative information at any stage, yet we still discover
differential equations. Hence, the proposed algorithm differs from previously suggested sparsity-
promoting identification methods in the literature in this aspect. Consequently, we expect
RK4-SINDy to perform better under sparsely sampled and corrupted data. We have demonstrated
the efficiency of the approach on a variety of examples, namely linear and nonlinear damped
oscillators, a model describing neural dynamics, chaotic behaviour and parametric differential
equations. We have accurately discovered the Fitz–Hugh Nagumo model that describes the
activation and de-activation of neurons. We have also illustrated the identification of the Lorenz
model and have shown that the dynamics of the identified models are intact on the attractor, which is
what matters most for chaotic dynamics. The example of Michaelis–Menten kinetics highlights that
the proposed algorithm can discover models that are given by a fraction of two functions. The
example also shows the power of determining parsimonious models, that is, their generalization
beyond the region in which data are collected. Furthermore, we have demonstrated the robustness
of the proposed RK4-SINDy algorithm to sparsely sampled and corrupted measurement data.
In the case of large noise, a noise-reduction filter such as Savitzky–Golay helps to improve the
quality of the discovered governing equations. We have also reported a comparison with the
sparse identification approach [15] and have observed the out-performance of RK4-SINDy over
the latter approach.
This work opens many exciting doors for further research from both theoretical and practical
perspectives. Since the approach aims at selecting the correct features from a dictionary
containing a high-dimensional nonlinear feature basis, the construction of these feature bases
remains a crucial step.
References
1. Jordan MI, Mitchell TM. 2015 Machine learning: trends, perspectives, and prospects. Science
349, 255–260. (doi:10.1126/science.aaa8415)
2. Marx V. 2013 The big challenges of big data. Nature 498, 255–260. (doi:10.1038/498255a)
3. Ljung L. 1999 System identification: theory for the user. Englewood Cliffs, NJ: Prentice Hall.
4. Van Overschee P, de Moor B. 1996 Subspace identification of linear systems: theory, implementation,
applications. Dordrecht (Hingham, MA): Kluwer Academic Publishers.
5. Kumpati SN, Kannan P. 1990 Identification and control of dynamical systems using neural
networks. IEEE Trans. Neural Netw. 1, 4–27. (doi:10.1109/72.80202)
6. Suykens JA, Vandewalle JP, de Moor BL. 1996 Artificial neural networks for modelling and control
of non-linear systems. New York, NY: Springer.
7. Kantz H, Schreiber T. 2004 Nonlinear time series analysis. Cambridge, UK: Cambridge
University Press.
8. Crutchfield JP, McNamara BS. 1987 Equations of motion from a data series. Complex Syst. 1, 121.
9. Bongard J, Lipson H. 2007 Automated reverse engineering of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 104, 9943–9948. (doi:10.1073/pnas.0609476104)
10. Schmidt M, Lipson H. 2009 Distilling free-form natural laws from experimental data. Science 324, 81–85. (doi:10.1126/science.1165893)
13. Proctor JL, Brunton SL, Brunton BW, Kutz JN. 2014 Exploiting sparsity and equation-free
architectures in complex systems. Eur. Phys. J. Spec. Top. 223, 2665–2684. (doi:10.1140/
epjst/e2014-02285-8)
14. Brunton SL, Proctor JL, Kutz JN. 2016 Discovering governing equations from data by sparse
identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. USA 113, 3932–3937.
(doi:10.1073/pnas.1517384113)
15. Brunton SL, Proctor JL, Kutz JN. 2016 Sparse identification of nonlinear dynamics with control
(SINDYc). IFAC-PapersOnLine 49, 710–715. (doi:10.1016/j.ifacol.2016.10.249)
16. Friedman J, Hastie T, Tibshirani R. 2001 The elements of statistical learning, vol. 1. New York,
NY: Springer.
17. James G, Witten D, Hastie T, Tibshirani R. 2013 An introduction to statistical learning, vol. 112.
New York, NY: Springer.
18. Tibshirani R. 1996 Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B
(Methodological) 58, 267–288.
19. Donoho DL. 2006 Compressed sensing. IEEE Trans. Inform. Theory 52, 1289–1306. (doi:10.1109/
TIT.2006.871582)
20. Candès EJ, Romberg J, Tao T. 2006 Robust uncertainty principles: exact signal reconstruction
from highly incomplete frequency information. IEEE Trans. Inform. Theory 52, 489–509.
(doi:10.1109/TIT.2005.862083)
21. Candès EJ, Romberg JK, Tao T. 2006 Stable signal recovery from incomplete and inaccurate
measurements. Commun. Pure Appl. Math. 59, 1207–1223. (doi:10.1002/cpa.20124)
22. Tropp JA, Gilbert AC. 2007 Signal recovery from random measurements via orthogonal
matching pursuit. IEEE Trans. Inform. Theory 53, 4655–4666. (doi:10.1109/TIT.2007.909108)
23. Chartrand R. 2011 Numerical differentiation of noisy, nonsmooth data. ISRN Appl. Math. 2011,
1–11. (doi:10.5402/2011/164564)
24. Schmidt MD, Vallabhajosyula RR, Jenkins JW, Hood JE, Soni AS, Wikswo JP, Lipson H. 2011
Automated refinement and inference of analytical models for metabolic networks. Phys. Biol.
8, 055011. (doi:10.1088/1478-3975/8/5/055011)
25. Daniels BC, Nemenman I. 2015 Efficient inference of parsimonious phenomenological models
of cellular dynamics using S-systems and alternating regression. PLoS ONE 10, e0119821.
26. Kevrekidis IG, Gear CW, Hyman JM, Kevrekidis PG, Runborg O, Theodoropoulos C.
2003 Equation-free, coarse-grained multiscale computation: enabling microscopic simulators
to perform system-level analysis. Commun. Math. Sci. 1, 715–762. (doi:10.4310/CMS.
2003.v1.n4.a5)
27. Ye H, Beamish RJ, Glaser SM, Grant SC, Hsieh C, Richards LJ, Schnute JT, Sugihara G. 2015
Equation-free mechanistic ecosystem forecasting using empirical dynamic modeling. Proc.
Natl Acad. Sci. USA 112, E1569–E1576.
28. Ascher UM, Petzold LR. 1998 Computer methods for ordinary differential equations and differential-
algebraic equations, vol. 61. Philadelphia, PA: SIAM.
29. Chen RT, Rubanova Y, Bettencourt J, Duvenaud DK. 2018 Neural ordinary differential
equations. In Advances in Neural Information Processing Systems, pp. 6571–6583 (eds S Bengio, H Wallach,
H Larochelle, K Grauman, N Cesa-Bianchi, R Garnett). Red Hook, NY: Curran Associates.
30. Rico-Martinez R, Anderson J, Kevrekidis I. 1994 Continuous-time nonlinear signal processing:
a neural network based approach for gray box identification. In Proc. of IEEE Workshop on
Neural Networks for Signal Processing, pp. 596–605. IEEE.
31. Gonzalez-Garcia R, Rico-Martinez R, Kevrekidis I. 1998 Identification of distributed
parameter systems: a neural net based approach. Comput. Chem. Eng. 22, S965–S968.
(doi:10.1016/S0098-1354(98)00191-4)