
Journal of Computational Science 47 (2020) 101237


Physics-informed echo state networks


N.A.K. Doan a,b, W. Polifke a, L. Magri c,d,*

a Department of Mechanical Engineering, Technical University of Munich, Germany
b Institute for Advanced Study, Technical University of Munich, Germany
c Institute for Advanced Study, Technical University of Munich, Germany (visiting)
d Department of Engineering, University of Cambridge, United Kingdom

Keywords: Echo state networks; Physics-informed neural networks; Chaotic dynamical systems

Abstract

We propose a physics-informed echo state network (ESN) to predict the evolution of chaotic systems. Compared to conventional ESNs, the physics-informed ESNs are trained to solve supervised learning tasks while ensuring that their predictions do not violate physical laws. This is achieved by introducing an additional loss function during the training, which is based on the system's governing equations. The additional loss function penalizes non-physical predictions without the need of any additional training data. This approach is demonstrated on a chaotic Lorenz system and a truncation of the Charney–DeVore system. Compared to the conventional ESNs, the physics-informed ESNs improve the predictability horizon by about two Lyapunov times. This approach is also shown to be robust with regard to noise. The proposed framework shows the potential of using machine learning combined with prior physical knowledge to improve the time-accurate prediction of chaotic dynamical systems.

1. Introduction

Over the past few years, there has been a rapid increase in the development of machine learning techniques, which have been applied with success to various disciplines, from image or speech recognition [1,2] to playing Go [3]. However, the application of such methods to the study and forecasting of physical systems has only recently been explored, including some applications in the field of fluid dynamics [4–7]. One of the major challenges in using machine learning algorithms for the study of complex physical systems is the prohibitive cost of data generation and acquisition for training [8,9]. However, in complex physical systems, there exists a large amount of prior knowledge, such as governing equations and conservation laws, which can be exploited to improve existing machine learning approaches. These hybrid approaches, called physics-informed machine learning or theory-guided data science [10], have been applied with some success to flow-structure interaction problems [4], turbulence modelling [5], the solution of partial differential equations (PDEs) [9], cardiovascular flow modelling [11], and physics-based object tracking in computer vision [12].

In this study, we propose an approach to combine physical knowledge with a machine learning algorithm to time-accurately forecast the evolution of chaotic dynamical systems. The machine learning tools we use are based on reservoir computing [13], in particular, echo state networks (ESNs). ESNs are used here instead of more conventional recurrent neural networks (RNNs), such as the long short-term memory unit, because ESNs have proved particularly accurate in predicting chaotic dynamics over a longer time horizon than other machine learning networks [13]. ESNs are also generally easier to train than other RNNs, and they have recently been used to predict the evolution of spatiotemporal chaotic systems [14,15]. In the present study, ESNs are augmented by physical constraints to accurately forecast the evolution of two prototypical chaotic systems, the Lorenz system [16] and the Charney–DeVore system [17]. The robustness of the proposed approach with regard to noise is also analysed. Compared to previous physics-informed machine learning approaches, which mostly focused on identifying solutions of PDEs using feedforward neural networks [4,9,11], the approach proposed here is applied to a form of RNN for the modelling of chaotic systems. The objective is to train the ESN in conjunction with physical knowledge to reproduce the dynamics of the original system, and so for the ESN to be a digital twin of the real system.

Section 2 details the method used for training and for forecasting the dynamical systems, both with conventional ESNs and with the newly proposed physics-informed ESNs (PI-ESNs). Results are presented in Section 3 and final comments are summarized in Section 4.

* Corresponding author at: Institute for Advanced Study, Technical University of Munich, Germany (visiting).
E-mail address: [email protected] (L. Magri).

https://doi.org/10.1016/j.jocs.2020.101237
Received 18 February 2020; Received in revised form 11 August 2020; Accepted 18 October 2020
Available online 31 October 2020
2. Methodology

The echo state network (ESN) approach presented in [18] is used here. Given a training input signal u(n) of dimension N_u and a desired known target output signal y(n) of dimension N_y, the ESN learns a model with output ŷ(n) matching y(n). Here, n = 1, …, N_t is the discrete time index, and N_t is the number of data points in the training dataset, which covers a time window from 0 until T = (N_t − 1)Δt. Because the forecasting of a dynamical system is under investigation, the desired output signal is equal to the input signal at the next time step, i.e., y(n) = u(n + 1) ∈ ℝ^{N_y}.

The ESN is composed of a randomized high-dimensional dynamical system, called a reservoir, whose states at time n are represented by a vector, x(n) ∈ ℝ^{N_x}, of reservoir neuron activations. The reservoir is coupled to the input signal, u, via an input-to-reservoir matrix, W_in ∈ ℝ^{N_x × N_u}. The output of the reservoir, ŷ, is deduced from the states via the reservoir-to-output matrix, W_out ∈ ℝ^{N_y × N_x}, as a linear combination of the reservoir states:

ŷ = W_out x   (1)

In this work, a non-leaky reservoir is used, in which the state of the reservoir evolves according to:

x(n + 1) = tanh(W_in u(n + 1) + W x(n))   (2)

where W ∈ ℝ^{N_x × N_x} is the recurrent weight matrix and the (element-wise) tanh function is used as the activation function for the reservoir neurons. The commonly used tanh activation offers good accuracy [13,18] for the systems studied here, as discussed in Sections 3.1 and 3.2. While different activation functions have been proposed [19], it is beyond the scope of the present work to study the effect of activation functions on the echo state network accuracy.

In the conventional ESN approach (Fig. 1a), the input and recurrent matrices, W_in and W, are randomly initialized only once and are not trained. These are typically sparse matrices constructed so that the reservoir verifies the Echo State Property [20]. Only the output matrix, W_out, is trained to minimize the mean squared error, E_d, between the ESN predictions and the data:

E_d = (1/N_y) Σ_{i=1}^{N_y} (1/N_t) Σ_{n=1}^{N_t} (ŷ_i(n) − y_i(n))²   (3)

(The subscript d indicates the error based on the available data.) Following [14], W_in is generated so that each row of the matrix has only one randomly chosen nonzero element, which is independently drawn from a uniform distribution in the interval [−σ_in, σ_in]. W is constructed to have an average connectivity ⟨d⟩, with the non-zero elements drawn from a uniform distribution over the interval [−1, 1]. All the coefficients of W are then multiplied by a constant coefficient for the largest absolute eigenvalue of W to be equal to a value Λ, where Λ ≤ 1, to ensure the Echo State Property [18].

After training, to obtain predictions for future times t > T, the output of the ESN is looped back as an input, and the network then evolves autonomously (Fig. 1b).

Fig. 1. Schematic of the ESN during (a) training and (b) future prediction. The physical constraints are imposed during the training phase (a).
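The construction and operation described above can be summarized in a short sketch. The following Python snippet is a minimal illustration of Eqs. (1) and (2) and of the generation of W_in and W; the function names and the random seed are our assumptions, not part of the original work:

```python
import numpy as np

def build_reservoir(N_x, N_u, sigma_in, avg_degree, Lambda, seed=0):
    """Generate W_in and W as described above (one nonzero entry per row of
    W_in; sparse W rescaled to spectral radius Lambda <= 1)."""
    rng = np.random.default_rng(seed)
    W_in = np.zeros((N_x, N_u))
    W_in[np.arange(N_x), rng.integers(0, N_u, N_x)] = rng.uniform(-sigma_in, sigma_in, N_x)
    mask = rng.random((N_x, N_x)) < avg_degree / N_x      # average connectivity <d>
    W = np.where(mask, rng.uniform(-1.0, 1.0, (N_x, N_x)), 0.0)
    W *= Lambda / np.max(np.abs(np.linalg.eigvals(W)))    # Echo State Property
    return W_in, W

def run_open_loop(W_in, W, U):
    """Teacher-forced reservoir states, Eq. (2): x(n+1) = tanh(W_in u(n+1) + W x(n))."""
    X = np.zeros((U.shape[0], W.shape[0]))
    x = np.zeros(W.shape[0])
    for n, u in enumerate(U):
        x = np.tanh(W_in @ u + W @ x)
        X[n] = x
    return X

def run_closed_loop(W_in, W, W_out, x, n_steps):
    """Autonomous prediction (Fig. 1b): the output y = W_out x (Eq. (1))
    is fed back as the next input."""
    Y = np.zeros((n_steps, W_out.shape[0]))
    for n in range(n_steps):
        y = W_out @ x
        x = np.tanh(W_in @ y + W @ x)
        Y[n] = y
    return Y
```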
2.1. Training

The training of the ESN consists of the optimization of W_out. As the outputs of the ESN, ŷ, are a linear combination of the states, x, W_out can be obtained by using ridge regression:

W_out = Y Xᵀ (X Xᵀ + γI)⁻¹   (4)

where Y and X are respectively the column-concatenations of the various time instants of the output data, y, and of the associated ESN states, x, and γ is a Tikhonov regularization factor. The optimization in Eq. (4) is:

W_out = argmin_{W_out} (1/N_y) Σ_{i=1}^{N_y} [ Σ_{n=1}^{N_t} (ŷ_i(n) − y_i(n))² + γ ||w_{out,i}||² ]   (5)

where w_{out,i} denotes the i-th row of W_out. This optimization problem penalizes large values of W_out, which generally improves the feedback stability and avoids overfitting [18].
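As a minimal sketch (our own transcription, with states and targets stored column-wise), Eq. (4) can be implemented in a few lines:

```python
import numpy as np

def train_ridge(X, Y, gamma):
    """Ridge regression of Eq. (4): W_out = Y X^T (X X^T + gamma I)^(-1),
    with X of shape (N_x, N_t) and Y of shape (N_y, N_t)."""
    N_x = X.shape[0]
    # Solve the symmetric linear system instead of forming the inverse explicitly.
    return np.linalg.solve(X @ X.T + gamma * np.eye(N_x), X @ Y.T).T
```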
In this work, following the approach of [9] for artificial deep feedforward neural networks, we propose an alternative approach for training W_out, which combines the available data with prior physical knowledge of the system under investigation. Let us first assume that the dynamical system is governed by the following nonlinear differential equation:

ℱ(y) ≡ ẏ + 𝒩(y) = 0   (6)

where ℱ is a general nonlinear operator, (˙) is the time derivative and 𝒩 is a nonlinear differential operator. Eq. (6) represents a formal equation describing the dynamics of a generic nonlinear system. The training phase can be reframed to make use of our knowledge of ℱ by minimizing both the mean squared error, E_d, and a physical error, E_p, based on ℱ:

E_tot = E_d + E_p, where E_p = (1/N_y) Σ_{i=1}^{N_y} (1/N_p) Σ_{p=1}^{N_p} |ℱ(ŷ_i(n_p))|²   (7)

Here, the set {ŷ(n_p)}_{p=1}^{N_p} denotes the "collocation points" for ℱ, which are defined as a prediction horizon of N_p datapoints obtained from the ESN covering the time period (T + Δt) ≤ t ≤ (T + N_p Δt). Compared to the conventional approach, where the regularization of W_out is based on avoiding extreme values of W_out, the proposed method regularizes W_out by using the prior physical knowledge. Eq. (7), which is a key equation, shows how to constrain the prior physical knowledge in the loss function. Therefore, this procedure ensures that the ESN becomes predictive because of data training and that the ensuing prediction is consistent with the physics. It is motivated by the fact that, in many complex physical systems, the cost of data acquisition is prohibitive, and thus there are many instances where only a small amount of data is available for the training of neural networks. In this context, most existing machine learning approaches lack robustness. The proposed approach better leverages the information content of the data that the recurrent neural network uses. The physics-informed framework is straightforward to implement because it only requires the evaluation of the residual; it does not require the computation of the exact solution. Practically, the optimization of W_out is performed using the L-BFGS-B algorithm [21], with the W_out obtained by ridge regression (Eq. (4)) as the initial guess.
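A minimal sketch of this training loop is given below, assuming the residual of Eq. (6) is approximated with the explicit Euler discretization used in Section 3.1; `f(y)` returns the governing right-hand side ẏ = −𝒩(y), and the gradient of the loss is left to SciPy's finite differences for brevity (these names and simplifications are ours):

```python
import numpy as np
from scipy.optimize import minimize

def train_physics_informed(W_in, W, W_out0, X_train, Y_train, x_last, N_p, dt, f):
    """Minimize E_tot = E_d + E_p (Eq. (7)) over W_out with L-BFGS-B,
    starting from the ridge-regression solution W_out0 (Eq. (4))."""
    shape = W_out0.shape

    def loss(w_flat):
        W_out = w_flat.reshape(shape)
        # Data error E_d (Eq. (3)) over the training window.
        E_d = np.mean((W_out @ X_train - Y_train) ** 2)
        # Collocation points: run the ESN in closed loop for N_p steps after T.
        x, Y_c = x_last.copy(), np.zeros((N_p, shape[0]))
        for p in range(N_p):
            y = W_out @ x
            x = np.tanh(W_in @ y + W @ x)
            Y_c[p] = y
        # Physical error E_p, with F(y) ~ (y(n+1) - y(n))/dt - f(y(n)).
        res = (Y_c[1:] - Y_c[:-1]) / dt - np.array([f(y) for y in Y_c[:-1]])
        return E_d + np.mean(res ** 2)

    result = minimize(loss, W_out0.ravel(), method="L-BFGS-B")
    return result.x.reshape(shape)
```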
2.2. Hybrid ESN

For a machine learning model comparison, the PI-ESN, which includes physical knowledge as a penalty term in the loss function, is compared to the hybrid approach of [14]. The hybrid approach combines an ESN with an approximate model, which provides a one-step forward prediction that is fed both as an input into the ESN and directly into the output layer. The reservoir is excited by both the original input data and the prediction of the approximate model. The output layer, W_out, is trained by blending the reservoir states and the prediction from the approximate model. This approach increases the size of the input and output layers of the ESN by the number of degrees of freedom of the approximate model. In [14], the approximate model was based on the same governing equations as the original system (with one of the coefficients being slightly altered), which doubles the size of the output and input layers. A similar approach will be carried out here for comparison.
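As a schematic reading of this architecture (our own sketch, not the reference implementation of [14]), one step of the hybrid ESN can be written as:

```python
import numpy as np

def hybrid_step(W_in, W, W_out, x, u, approx_step):
    """One step of a hybrid ESN: the approximate model's one-step forecast is
    fed both into the reservoir and directly into the output layer."""
    y_approx = approx_step(u)                  # one-step forecast of the approximate model
    x = np.tanh(W_in @ np.concatenate([u, y_approx]) + W @ x)
    y = W_out @ np.concatenate([x, y_approx])  # blend reservoir states and model forecast
    return x, y
```

Here, W_in and W_out are correspondingly enlarged, which is the doubling of the input and output layers mentioned above when the approximate model has the same number of states as the original system.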

3. Results

3.1. Lorenz system

The approach described in Section 2 is applied to forecasting the chaotic evolution of the Lorenz system, which is governed by the following equations [16]:

u̇1 = σ(u2 − u1)   (8a)
u̇2 = u1(ρ − u3) − u2   (8b)
u̇3 = u1u2 − βu3   (8c)

where ρ = 28, σ = 10 and β = 8/3. These are the standard values of the Lorenz system that spawn a chaotic solution [16]. The size of the training dataset is N_t = 1000 and the timestep between two time instants is Δt = 0.01. This corresponds to roughly 10 Lyapunov times [22].

The parameters of the reservoir, both for the conventional and PI-ESNs, are: σ_in = 0.15, Λ = 0.4 and ⟨d⟩ = 3. In the case of the conventional ESN, γ = 0.0001. These values of the hyperparameters are taken from previous studies [14,15].
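For concreteness, the Lorenz right-hand side and a training set of this size can be generated as below; the integration scheme matches the explicit Euler discretization used for the physical error, while the initial condition is our arbitrary choice:

```python
import numpy as np

def lorenz_rhs(u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system, Eqs. (8a)-(8c)."""
    u1, u2, u3 = u
    return np.array([sigma * (u2 - u1), u1 * (rho - u3) - u2, u1 * u2 - beta * u3])

dt, N_t = 0.01, 1000                 # training set: N_t points, about 10 Lyapunov times
U = np.zeros((N_t, 3))
U[0] = [1.0, 1.0, 1.0]               # arbitrary initial condition (our assumption)
for n in range(N_t - 1):
    U[n + 1] = U[n] + dt * lorenz_rhs(U[n])   # explicit Euler step
```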
For the PI-ESN, a prediction horizon of N_p = 1000 points is used, and the physical error is estimated by discretizing Eq. (8) using an explicit Euler time-integration scheme. The choice of N_p = 1000 is used to balance the error based on the data and the error based on the physical constraints. A balancing factor, similar to the Tikhonov regularization factor, could potentially also be used to do this. However, the proposed method based on collocation points provides additional information for the training of the PI-ESN, as the physical residual has to be minimized at the collocation points. Increasing N_p may be beneficial for the accuracy of the PI-ESN, but at the cost of a more computationally expensive training. Therefore, N_p is chosen as a trade-off.

The predictions of the Lorenz system by the conventional ESN and the PI-ESN, for a particular case where the reservoir has 200 units, are compared with the actual evolution in Fig. 2, where the time is normalized by the largest Lyapunov exponent, λ_max = 0.934. Fig. 3 shows the evolution of the associated normalized error, which is defined as

E(n) = ||u(n) − û(n)|| / ⟨||u||²⟩^{1/2}   (9)

where ⟨⋅⟩ denotes the time average. The PI-ESN shows a remarkable improvement of the time over which the predictions are accurate. Indeed, the time for the normalized error to exceed 0.2, which is the threshold used here to define the predictability horizon, increases from 4 Lyapunov times for the data-only ESN to 5.5 for the PI-ESN.

Fig. 2. Prediction of the Lorenz system (a) u1, (b) u2, (c) u3 with the conventional ESN (dotted red lines) and the PI-ESN (dashed blue lines). The actual evolution of the Lorenz system is shown with full black lines.

Fig. 3. Error, E, from the conventional ESN (dotted red lines) and the PI-ESN (dashed blue lines) of the predictions shown in Fig. 2. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
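The normalized error of Eq. (9) and the corresponding predictability horizon can be computed as in the following sketch (the 0.2 threshold is the one stated above; the handling of trajectories that never exceed it is our simplification):

```python
import numpy as np

def normalized_error(U_true, U_pred):
    """Normalized error E(n) of Eq. (9): ||u(n) - u_hat(n)|| / <||u||^2>^(1/2)."""
    norm = np.sqrt(np.mean(np.sum(U_true**2, axis=1)))   # time average <||u||^2>^(1/2)
    return np.linalg.norm(U_true - U_pred, axis=1) / norm

def predictability_horizon(U_true, U_pred, dt, lam_max, threshold=0.2):
    """Time, in Lyapunov times, at which E(n) first exceeds the threshold."""
    E = normalized_error(U_true, U_pred)
    n_cross = np.argmax(E > threshold)   # caveat: 0 if the threshold is never exceeded
    return lam_max * dt * n_cross
```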
The statistical dependence of the predictability horizon on the reservoir size, and the comparison with a hybrid ESN [14], are shown in Fig. 4. In the hybrid ESN, the approximate model consists of the same governing equations (Eqs. (8)) with a slightly different parameter ρ, which is perturbed as (1 + ϵ)ρ (as in [14]). Values of ϵ = 0.05 and ϵ = 1.0 are considered here to have a higher- and a lower-accuracy approximate model. This statistical predictability horizon is estimated as follows. First, the trained PI-ESNs and conventional ESNs are run for an ensemble of 100 different initial conditions. Second, for each run, the predictability horizon is calculated. Third, the mean of the predictability horizon is computed from the ensemble.

It is observed that the physics-informed approach provides a marked improvement of the predictability horizon over conventional ESNs, most significantly for reservoirs of intermediate sizes. The only exception is for the smallest reservoir (N_x = 50). In principle, it may be conjectured that a conventional ESN may achieve a performance similar to that of a PI-ESN by ad hoc optimization of the hyperparameters. However, no efficient methods are available (to date) for hyperparameter optimization [13]. The approach proposed here allows us to improve the performance of the ESN (optimizing W_out) by adding a constraint on the physics, i.e., the governing equations, without changing the hyperparameters of the ESN and so, without performing an ad hoc tuning of the hyperparameters. This suggests that the physics-informed approach may be more robust than the conventional approach and could provide an improvement of the accuracy of a given ESN without having to perform an expensive additional hyperparameter optimization.

Fig. 4. Mean predictability horizon of the conventional ESN (red line with circles), PI-ESN (blue line with crosses), hybrid method with ϵ = 0.05 (green line with triangles) and hybrid method with ϵ = 1.0 (dashed green line with triangles) as a function of the reservoir size (N_x) for the Lorenz system. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The hybrid methods have a larger predictability horizon than both the PI-ESN and the conventional ESN, with a downward, or constant, trend with increasing reservoir sizes. The hybrid model is more prone to overfitting, as its output matrix is twice the size of the output matrix of the PI-ESN and conventional ESN. Furthermore, as may be expected, the predictability horizon of the higher-accuracy case (ϵ = 0.05) is larger than that of the lower-accuracy case (ϵ = 1.0). The high predictability horizon of the hybrid ESN is due to the fact that the approximate solution is very close to the correct dynamics, because the approximate model consists of the exact governing equations (with a small perturbation to a parameter). Technically, the only difference between the approximate model and the governing equations is the ϵρu1 term in Eq. (8b). To compensate for this small error, the hybrid ESN does not need to learn the actual chaotic dynamics of the Lorenz system because the approximate model provides an accurate estimate. In practice, reduced-order models may contain larger model errors and may not model the actual dynamics of the system so accurately. Therefore, if the approximate solution is sufficiently far from the real dynamics, the gain in predictability horizon could become negligible despite the input and output layers being larger. This loss in accuracy of the hybrid ESN is illustrated in Section 3.3, where noisy training data are considered. In contrast, the PI-ESN enables an improvement in predictability horizon without modifying the underlying network architecture.

3.2. Charney–DeVore system

The truncated Charney–DeVore (CDV) system is now considered. This model is based on a Galerkin projection and truncation to 6 modes of the barotropic vorticity equation in a β-plane channel with orography [17]. The 6 retained modes exhibit chaos and intermittency for an appropriate choice of parameters. The model equations are [17,23,24]:
u̇1 = γ*1 u3 − C(u1 − u*1)
u̇2 = −(α1u1 − β1)u3 − Cu2 − δ1u4u6
u̇3 = (α1u1 − β1)u2 − γ1u1 − Cu3 + δ1u4u5   (10)
u̇4 = γ*2 u6 − C(u4 − u*4) + ϵ(u2u6 − u3u5)
u̇5 = −(α2u1 − β2)u6 − Cu5 − δ2u4u3
u̇6 = (α2u1 − β2)u5 − γ2u4 − Cu6 + δ2u4u2

where the model coefficients are given by:

α_m = 8√2 m²(b² + m² − 1) / [π(4m² − 1)(b² + m²)],   β_m = βb² / (b² + m²)
δ_m = (64√2 / 15π)(b² − m² + 1) / (b² + m²),   γ*_m = γ 4√2 mb / [π(4m² − 1)]   (11)
ϵ = 16√2 / (5π),   γ_m = γ 4√2 m³b / [π(4m² − 1)(b² + m²)]

for m = 1, 2. Here, we set the parameters as in [23], (u*1, u*4, C, β, γ, b) = (0.95, −0.76095, 0.1, 1.25, 0.2, 0.5), which ensures a chaotic and intermittent behaviour.
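A transcription of Eqs. (10) and (11) into code reads as follows (a sketch under the parameter values just stated; vectorizing the coefficients over m = 1, 2 is our choice):

```python
import numpy as np

def cdv_coefficients(b=0.5, beta=1.25, gamma=0.2):
    """Model coefficients of Eq. (11) for m = 1, 2 (array indices 0 and 1)."""
    m = np.array([1.0, 2.0])
    alpha = 8 * np.sqrt(2) / np.pi * m**2 * (b**2 + m**2 - 1) / ((4 * m**2 - 1) * (b**2 + m**2))
    beta_m = beta * b**2 / (b**2 + m**2)
    delta = 64 * np.sqrt(2) / (15 * np.pi) * (b**2 - m**2 + 1) / (b**2 + m**2)
    gamma_star = gamma * 4 * np.sqrt(2) * m * b / (np.pi * (4 * m**2 - 1))
    gamma_m = gamma * 4 * np.sqrt(2) * m**3 * b / (np.pi * (4 * m**2 - 1) * (b**2 + m**2))
    eps = 16 * np.sqrt(2) / (5 * np.pi)
    return alpha, beta_m, delta, gamma_star, gamma_m, eps

def cdv_rhs(u, u1_star=0.95, u4_star=-0.76095, C=0.1):
    """Right-hand side of the truncated CDV system, Eq. (10)."""
    a, b_m, d, gs, g, eps = cdv_coefficients()
    u1, u2, u3, u4, u5, u6 = u
    return np.array([
        gs[0] * u3 - C * (u1 - u1_star),
        -(a[0] * u1 - b_m[0]) * u3 - C * u2 - d[0] * u4 * u6,
        (a[0] * u1 - b_m[0]) * u2 - g[0] * u1 - C * u3 + d[0] * u4 * u5,
        gs[1] * u6 - C * (u4 - u4_star) + eps * (u2 * u6 - u3 * u5),
        -(a[1] * u1 - b_m[1]) * u6 - C * u5 - d[1] * u4 * u3,
        (a[1] * u1 - b_m[1]) * u5 - g[1] * u4 - C * u6 + d[1] * u4 * u2,
    ])
```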
The time evolution of this system is illustrated in Fig. 5. It can be seen that the CDV system shows two distinct regimes: one characterized by a slow evolution (and a large decrease in u1) and one with strong fluctuations of all modes. These correspond to "blocked" and "zonal" flow regimes, respectively, which originate from the combination of topographic and barotropic instabilities [17]. This intermittent characteristic of the CDV system makes it significantly more challenging than the Lorenz system. The dataset illustrated in Fig. 5 is obtained by discretizing the set of Eqs. (10) with an Euler-explicit scheme with a timestep of Δt = 0.1. The first 9000 timesteps of Fig. 5, highlighted in the grey box, are kept for training. This corresponds to approximately 30 Lyapunov times. The largest Lyapunov exponent of the CDV system is equal to λ_max = 0.033791.

Fig. 5. (a) Evolution of the modal amplitudes of the CDV system (black to light gray: u1 to u6). The shaded grey box indicates the data used for training. (b) Phase plots of the u1 − u4 trajectory.

For the prediction, the parameters of the ESNs are: σ_in = 2.0, Λ = 0.9 and ⟨d⟩ = 3. For the conventional ESN, γ = 0.0001. These values are obtained after performing a grid search. For the PI-ESN, a prediction horizon of N_p = 3000 points is used. Compared to the Lorenz system, where the same number of collocation points as training points was used, here comparatively fewer collocation points are used. This choice was made to decrease the computational cost of the optimization process, as the cost of computing E_p is proportional to N_p. Nonetheless, that number of collocation points was sufficient to improve the prediction, as is shown next.

In Fig. 6, the predictions of the evolution of the CDV system by the ESN and PI-ESN with a reservoir of 600 units are presented alongside the true evolution. The associated normalized error (Eq. (9)) is shown in Fig. 7. The PI-ESN outperforms the conventional ESN and maintains a good accuracy for 2 Lyapunov times beyond the conventional ESN.

Fig. 6. Prediction of the CDV system for (a) u1, u2 and u3 and (b) u4, u5 and u6 with the conventional ESN (dotted lines) and the PI-ESN (dashed lines). The actual evolution of the CDV system is shown with full lines.

Fig. 7. Error on the prediction from the conventional and PI-ESN for the prediction shown in Fig. 6.

To assess the robustness of the results and compare the PI-ESN with the hybrid ESN, a statistical analysis similar to Section 3.1 is shown in Fig. 8. Similarly to the Lorenz system, the approximate model used consists of the exact governing equations (Eqs. (10)) with one parameter being slightly perturbed. Two cases are considered: one in which b is perturbed as (1 + ϵ)b (hybrid-b), and one in which C is perturbed as (1 + ϵ)C (hybrid-C), where ϵ = 0.05 or 1.0.

The mean predictability horizon is computed from 100 different initial conditions and for different reservoir sizes. Similarly to the Lorenz system, the PI-ESN outperforms the conventional ESN by up to 2 Lyapunov times. However, the evolution of the predictability horizon of the PI-ESN, and also of the conventional ESN, shows some degradation for very large reservoirs. It is conjectured that this behaviour originates from overfitting and from the more complicated evolution of the CDV system, which exhibits two different regimes. Indeed, for the PI-ESN, the training is performed using the training timeseries, which contains mostly a zonal regime evolution, and the collocation points, which are at times corresponding to a zonal regime as they are directly after the training dataset. As a result, the conventional ESN and the PI-ESN with very large reservoirs may be overfitting to predict only the zonal regime. It is possible that, by extending the collocation points of the PI-ESN, the prediction of the PI-ESN improves, as those added collocation points may then cover a blocked regime evolution.

Fig. 8. Mean predictability horizon of the conventional ESN (red line with circles), PI-ESN (blue line with crosses), hybrid-b with ϵ = 0.05 (full green line with triangles), hybrid-b with ϵ = 1.0 (dashed green line with triangles), hybrid-C with ϵ = 0.05 (full orange line with downward triangles) and hybrid-C with ϵ = 1.0 (dashed orange line with downward triangles) as a function of the reservoir size (N_x) for the CDV system. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The hybrid ESN has a larger predictability horizon for small reservoirs because of the extra information added by the approximate model, which is close to the exact model.

The accuracy is, however, less marked than in the Lorenz system, because an error in the parameters b or C is amplified by more significant model nonlinearities, as these parameters appear in all the governing equations of the CDV system (Eq. (10)). The accuracy of hybrid-b is lower than the accuracy of hybrid-C because the nonlinear dynamics is more sensitive to small errors in b, which affects all the coefficients of the CDV equations (Eqs. (10) and (11)). Similarly to the Lorenz system, when the model error is larger (ϵ = 1.0), the predictability horizon is smaller than with the accurate approximate model (ϵ = 0.05).

3.3. Robustness with respect to noise

In this section, we study the robustness of the results presented in the previous sections for the Lorenz and CDV systems with regard to noise. To do so, the training data used in Sections 3.1 and 3.2 are perturbed by adding Gaussian measurement noise to the training datasets. Two cases with signal-to-noise ratios (SNRs) of 20 and 30 dB are considered, which are typical noise levels encountered in experimental fluid mechanics [25].
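Such a perturbation can be generated as in the sketch below; defining the noise variance per component from the signal variance is our assumption about the setup:

```python
import numpy as np

def add_noise(U, snr_db, seed=0):
    """Add Gaussian measurement noise at a prescribed signal-to-noise ratio (dB)."""
    rng = np.random.default_rng(seed)
    noise_var = np.var(U, axis=0) / 10**(snr_db / 10)   # per-component noise variance
    return U + rng.normal(0.0, np.sqrt(noise_var), U.shape)

U_noisy = add_noise(U, snr_db=20)   # the SNR = 20 dB case; U is the training data
```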
The evolution of the Lorenz and CDV systems and the predictions from the conventional and PI-ESNs are shown in Figs. 9 and 10, respectively. In those figures, it is seen that the proposed approach still improves the prediction capability of the PI-ESN despite the training with noisy data. This originates from the physics-based regularization term in the loss function in Eq. (7), which provides the information required during the training as to how to appropriately filter the noise. Indeed, the physics-based loss provides the constraints that the components of the output have to satisfy, therefore providing an indication as to how to filter the noise. In addition, for the Lorenz system, the conventional ESN diverges during its prediction, while the PI-ESN's prediction remains bounded. This highlights the improved robustness of the physics-informed approach. This is an encouraging result, which can potentially enable the use of the proposed approach with noisy data from physical experiments whose governing equations are known.

Fig. 9. (a) Prediction of the Lorenz system with the conventional ESN (dotted lines) and the PI-ESN (dashed lines) with 200 units trained from noisy data (SNR = 20 dB) and (b) zoom of the evolution before the divergence of the conventional ESN. The actual (noise-free) evolution of the Lorenz system is shown with full grey lines. (c) Error on the prediction for the conventional ESN and PI-ESN.

Fig. 10. (a, b) Prediction of the CDV system with the conventional ESN (dotted lines) and the PI-ESN (dashed lines) with 600 units trained from noisy data (SNR = 20 dB). The actual (noise-free) evolution of the CDV system is shown with full red lines and the noisy data is shown with full greyscale lines. (c) Error on the prediction from the conventional ESN and PI-ESN.

The mean predictability horizon for the two systems and the two noise levels is shown in Fig. 11, which also shows a comparison with the hybrid approach with ϵ = 0.05. For the Lorenz system, compared to the ESN trained on non-noisy data in Fig. 4, the mean predictability horizon is smaller. Furthermore, for the data-only ESN, the predictability horizon decreases for large reservoirs. This is because the ESN starts overfitting the noisy data and, thereby, reproduces a noisy behaviour and deteriorates its prediction. On the other hand, the PI-ESN maintains a satisfactory predictability horizon for the same large reservoirs. This indicates that the physics-based regularization in the loss function (E_p in Eq. (7)) enhances the robustness of the PI-ESN.

Fig. 11. Mean predictability horizon of the conventional ESN (dotted line with circles), PI-ESN (full line with crosses), hybrid or hybrid-b (dashed-dotted line with upward triangles) and hybrid-C (dashed line with downward triangles) trained from noisy data (red: SNR = 20 dB, blue: SNR = 30 dB) as a function of the reservoir size (N_x) for the (a) Lorenz and (b) CDV systems. Hybrid methods are used with ϵ = 0.05. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The predictability horizon of the hybrid method is close to the predictability horizon of the PI-ESN for a small noise level. This is due to the effect of noise in the training data. During the training, the approximate model time-integrates noisy input data; therefore, the approximate prediction is far from the target output. As a result, during the training, the hybrid ESN learns to rely mostly on the reservoir states to make a forecast, and to use the prediction from the approximate model only in a limited way. This is more apparent for a higher noise level, for which the predictability horizon of the hybrid method becomes shorter than the predictability horizon of the PI-ESN. This shows that the hybrid ESN does not filter out the noise as efficiently as the PI-ESN. The performance of the hybrid ESN deteriorates for a higher noise level.

For the CDV system, similar observations as for the Lorenz system can be made. However, the decrease in mean predictability horizon of the ESN and PI-ESN with large reservoir sizes is not observed, as the CDV system has a larger dimension than the Lorenz system. Hence, it would require larger reservoirs than those considered here before the occurrence of noise overfitting. Finally, the accuracy of the hybrid method is similar to that of the PI-ESN. Similarly to the Lorenz system, this is because of the effect of the noisy data used in training.

4. Conclusions and future directions

In this paper, we propose an approach for training echo state networks (ESNs) by constraining the knowledge of the physical equations that govern a dynamical system. This physics-informed ESN (PI-ESN) is shown to be more robust than purely data-trained ESNs. The proposed PI-ESN needs minimal modification of the original architecture, requiring only the estimation of the physical residual. The predictability horizon is markedly increased without requiring additional training data. This is assessed on the Lorenz system and the Charney–DeVore system, both of which exhibit strong intermittency. Furthermore, the robustness to noise of the proposed PI-ESN is assessed. It is observed that, compared to a Tikhonov regularization, the PI-ESN performs more robustly, even with larger reservoirs where the conventional ESN may overfit the noisy data. As compared to other nonlinear filters used for denoising, such as the ensemble Kalman filter, the proposed approach does not require ensemble calculations.

For noise-free data, the predictability of the hybrid ESN [14] can be higher than the predictability of the PI-ESN, but the model errors of the additional approximate model in the hybrid ESN, which requires an additional time-integration, should be very small. In engineering practice, we expect model errors to be more significant. Additionally, the hybrid method needs larger output and input layers, up to twice the original size if the approximate model has the same number of states as the original system as in [14], and a time integrator for the approximate model. For noisy data, the predictability of the PI-ESN is higher than the predictability of the hybrid method of [14].

In addition, in ongoing work, the PI-ESN is being applied to high-dimensional fluid dynamics systems. This work opens up new possibilities for the time-accurate prediction of the dynamics of chaotic systems by using the underlying physical laws as constraints.

Conflict of interest

The authors declare no conflict of interest. Luca Magri on behalf of all the authors.

Declaration of Competing Interest

The authors report no declarations of interest.

Acknowledgements

The authors acknowledge the support of the Technical University of Munich – Institute for Advanced Study, funded by the German Excellence Initiative and the European Union Seventh Framework Programme under grant agreement no. 291763. L.M. also acknowledges the Royal Academy of Engineering Research Fellowship Scheme.

References

[1] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Adv. Neural Inf. Process. Syst. 25 (2012) 1097–1105.
[2] G. Hinton, L. Deng, D. Yu, G. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, B. Kingsbury, Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Process. Mag. 29 (2012) 82–97.
[3] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, D. Hassabis, Mastering the game of Go with deep neural networks and tree search, Nature 529 (2016) 484–489.
[4] M. Raissi, Z. Wang, M.S. Triantafyllou, G. Karniadakis, Deep learning of vortex-induced vibrations, J. Fluid Mech. 861 (2019) 119–137.
[5] J. Ling, A. Kurzawski, J. Templeton, Reynolds averaged turbulence modelling using deep neural networks with embedded invariance, J. Fluid Mech. 807 (2016) 155–166.
[6] S. Jaensch, W. Polifke, Uncertainty encountered when modelling self-excited thermoacoustic oscillations with artificial neural networks, Int. J. Spray Combust. Dyn. 9 (2017) 367–379.
[7] J.-L. Wu, H. Xiao, E. Paterson, Physics-informed machine learning approach for augmenting turbulence models: a comprehensive framework, Phys. Rev. Fluids 3 (2018) 074602, arXiv:1801.02762v3.
[8] K. Duraisamy, G. Iaccarino, H. Xiao, Turbulence modeling in the age of data, Annu. Rev. Fluid Mech. 51 (2019) 357–377, arXiv:1804.00183.
[9] M. Raissi, P. Perdikaris, G. Karniadakis, Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys. 378 (2019) 686–707.
[10] A. Karpatne, G. Atluri, J.H. Faghmous, M. Steinbach, A. Banerjee, A. Ganguly, S. Shekhar, N. Samatova, V. Kumar, Theory-guided data science: a new paradigm for scientific discovery from data, IEEE Trans. Knowl. Data Eng. 29 (2017) 2318–2331, arXiv:1612.08544.
[11] G. Kissas, Y. Yang, E. Hwuang, W.R. Witschey, J.A. Detre, P. Perdikaris, Machine learning in cardiovascular flows modeling: predicting arterial blood pressure from non-invasive 4D flow MRI data using physics-informed neural networks, Comput. Methods Appl. Mech. Eng. 358 (2020) 112623, arXiv:1905.04817.
[12] R. Stewart, S. Ermon, Label-free supervision of neural networks with physics and domain knowledge, 31st AAAI Conf. Artif. Intell., AAAI 2017, volume 1 (2017) 2576–2582, arXiv:1609.05566.
[13] M. Lukoševičius, H. Jaeger, Reservoir computing approaches to recurrent neural network training, Comput. Sci. Rev. 3 (2009) 127–149.
[14] J. Pathak, A. Wikner, R. Fussell, S. Chandra, B.R. Hunt, M. Girvan, E. Ott, Hybrid forecasting of chaotic processes: using machine learning in conjunction with a knowledge-based model, Chaos 28 (2018) 041101, arXiv:1803.04779.
[15] J. Pathak, B. Hunt, M. Girvan, Z. Lu, E. Ott, Model-free prediction of large spatiotemporally chaotic systems from data: a reservoir computing approach, Phys. Rev. Lett. 120 (2018) 024102.
[16] E.N. Lorenz, Deterministic nonperiodic flow, J. Atmos. Sci. 20 (1963) 130–141.
[17] D.T. Crommelin, J.D. Opsteegh, F. Verhulst, A mechanism for atmospheric regime behavior, J. Atmos. Sci. 61 (2004) 1406–1419.
[18] M. Lukoševičius, A practical guide to applying echo state networks, in: G. Montavon, G.B. Orr, K.-R. Müller (Eds.), Neural Networks: Tricks of the Trade, Springer, 2012.
[19] P. Verzelli, C. Alippi, L. Livi, Echo state networks with self-normalizing activations on the hyper-sphere, Sci. Rep. 9 (2019) 1–14, arXiv:1903.11691.
[20] H. Jaeger, H. Haas, Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication, Science 304 (2004) 78–80.
[21] R.H. Byrd, P. Lu, J. Nocedal, C. Zhu, A limited memory algorithm for bound constrained optimization, SIAM J. Sci. Comput. 16 (1995) 1190–1208.
[22] S.H. Strogatz, Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, Perseus Books Publishing, 1994.
[23] Z.Y. Wan, P. Vlachas, P. Koumoutsakos, T.P. Sapsis, Data-assisted reduced-order modeling of extreme events in complex dynamical systems, PLOS ONE 13 (2018) 1–22, arXiv:1803.03365.
[24] D.T. Crommelin, A.J. Majda, Strategies for model reduction: comparing different optimal bases, J. Atmos. Sci. 61 (2004) 2206–2217.
[25] N.T. Ouellette, H. Xu, E. Bodenschatz, A quantitative study of three-dimensional Lagrangian particle tracking algorithms, Exp. Fluids 40 (2006) 301–313.
