Physics-Informed Neural Nets for Control of Dynamical Systems

Eric Aislan Antonelo (a), Eduardo Camponogara (a), Laio Oriel Seman (a,b), Eduardo Rehbein de Souza (a), Jean Panaioti Jordanou (a), Jomi Fred Hübner (a)
(a) Department of Automation and Systems Engineering, Federal University of Santa Catarina, Florianópolis - Brazil, CEP: 88040-900
(b) Graduate Program in Applied Computer Science, University of Vale do Itajaí, Itajaí - Brazil, CEP: 88302-901

Abstract
Physics-informed neural networks (PINNs) impose known physical laws into the learning of deep neural networks,
making sure they respect the physics of the process while decreasing the demand for labeled data. For systems represented by Ordinary Differential Equations (ODEs), the conventional PINN has a continuous time input variable and outputs the solution of the corresponding ODE. In their original form, PINNs do not allow control inputs, nor can they simulate for variable long-range intervals without serious degradation of their predictions. In this
context, this work presents a new framework called Physics-Informed Neural Nets for Control (PINC), which pro-
poses a novel PINN-based architecture that is amenable to control problems and able to simulate for longer-range
time horizons that are not fixed beforehand, making it a very flexible framework when compared to traditional
PINNs. Furthermore, this long-range time simulation of differential equations is faster than numerical methods
since it relies only on signal propagation through the network, making it less computationally costly and, thus, a
better alternative for simulation of models in Model Predictive Control. We showcase our proposal in the control
of two nonlinear dynamic systems: the Van der Pol oscillator and the four-tank system.
Keywords: physics-informed neural networks, deep learning, nonlinear model predictive control.

1. Introduction

In the era of Industry 4.0, the simulation and control of complex real-world systems in smart and efficient ways become increasingly important. Thus, harnessing deep learning for smart automation and control of real plants is not only desirable but also inevitable. One way to achieve that is by the use of deep neural networks
as models in Model Predictive Control (MPC) (Grüne & Pannek, 2011). MPC is a technique that has become
standard for multivariate control in industry and academia (Camacho & Bordons, 2013). Since its inception in the
1970s, MPC has been successfully applied in the oil and gas (Jordanou et al., 2018), aerospace (Eren et al., 2017)
and process industries, as well as in robotics (Nascimento et al., 2018). The main idea of MPC is to control a
system by employing a prediction model: at every iteration of the control loop, an optimization problem is solved
using a model of the plant in a receding horizon approach.
There are two main cases in which the practical application of MPC, or even just the efficient simulation of a dynamic system, is a challenge: (a) when historical data of the real plant is too sparse or insufficient to build a sufficiently accurate machine learning model; (b) when the numerical simulation of a precise model given by
Ordinary Differential Equations (ODEs) or Partial Differential Equations (PDEs) is too costly to be considered
in a real-time application. However, a recently introduced approach for training deep neural networks using
laws of physics, namely Physics-Informed Neural Networks (PINN) (Raissi et al., 2017, 2019), is one effective
approach that addresses both of the aforementioned challenges. For the first challenge (a), we assume that a priori
knowledge built previously by experts or borrowed from the laws of nature is available. For (b), instead of relying
on numerical solutions of differential equations, PINNs can be used, easing the computational burden of solving ODEs or PDEs and consequently extending the application of MPC to more real-time scenarios.
A standard PINN has a continuous-time t as input, and the system’s state variables as output y. The main
outcome of this approach is that the need for real data collection is reduced to a minimum, since the behavior

∗ Corresponding author. E-mail: [email protected]
Email addresses: [email protected] (Eduardo Camponogara), [email protected] (Laio Oriel Seman), [email protected] (Eduardo Rehbein de Souza), [email protected] (Jean Panaioti Jordanou), [email protected] (Jomi Fred Hübner)


of the deep network is constrained to follow physical laws given in terms of PDEs or ODEs. These differ-
ential equations are included in the learning problem’s cost function as nonlinear differential operators on the
output y of the networks, defining a second cost term that regularizes the learning process. Effectively, this
approach also allows solving complex PDEs or ODEs by using deep learning since the network output y rep-
resents the solution of these equations. Since Raissi et al. (2017), many extensions and alternative approaches
have been proposed by Zhu et al. (2019); Sirignano & Spiliopoulos (2018); Meng et al. (2020); Yang et al. (2021);
Pang & Karniadakis (2020); Stinis (2020). Applications of PINNs are widespread in many engineering areas, in-
cluding Harp et al. (2021) for underground reservoir pressure management, Chen & Liu (2021) for fatigue data
analysis, and Kumar et al. (2021) for characterizing non-Newtonian fluid flow to achieve optimal use of energy
resources.
However, to the best of our knowledge, there are no PINN architectures in continuous time in the literature
that allow optimal control techniques such as Multiple Shooting (MS) (Biegler, 2010) and Model-based Predictive
Control (MPC) to be readily applied. Previous works used neural networks such as Echo State Networks and Long
Short-Term Memory (LSTM) networks as models of the plant or process to be controlled (Jordanou et al., 2021).
These networks are trained exclusively on data collected from the process and are thus not as sample efficient as PINNs, as the latter can benefit from prior knowledge of the physics laws involved in the processes. In this sense,
the challenge is to make PINNs compliant with control applications so that they can be used as a predictive model of a plant or process. In their original form, PINNs do not allow control inputs, nor can they simulate for variable long-range intervals without serious degradation of their predictions.
With those limitations in mind, this work presents a new framework called Physics-Informed Neural Nets
for Control (PINC), which proposes a novel PINN-based architecture that is amenable to control problems. In
particular:
(i) our PINN-based architecture, called hereafter PINC net, is augmented with extra inputs such as the initial
state of the system and control input, in addition to the continuous-time t. This augmentation is inspired
by the multiple shooting and collocation methods (Biegler, 2010), numerical methods for solving boundary value problems in ODEs that split the time horizon over which a solution is sought into several
shorter intervals (shooting intervals). In our proposal, a single network learns the ODE solution conditioned
on the initial state and the given control input over the shorter interval.
(ii) this innovation enhances the simulation capabilities of conventional PINNs, which cannot correctly sustain a simulation beyond the time interval that was fixed during network training. This degradation of
the network prediction is related to the maximum value allowed for t, which is fixed at training time. On
the other hand, our proposed PINC network can run for an indefinite time horizon as long as it is necessary,
without significant deterioration of network prediction. This is achieved by chaining the network prediction
in a self-feedback mode, by setting the initial state (input) of the next interval k to the last predicted state
(network output) of the previous interval k − 1. The work in Meng et al. (2020) also intends to solve this
problem, but it requires many individual PINNs to be independently trained and, besides, is not ready for
control applications.
(iii) the particular structure of PINC makes physics-informed nets in continuous time amenable to MPC applications; to the best of the authors' knowledge, this is the first work in the literature to do so.
(iv) finally, the real-time requirements for simulation of differential equations, in particular for MPC applica-
tions, are better satisfied with PINC than traditional numerical simulation methods, since the inference of
an already trained PINC network can replace a numerical solution method at each timestep of the prediction
horizon in MPC.
In the following, some related works are presented in Section 2. PINNs, MPC and PINC are introduced in
Section 3, whereas the experiments on identifying and controlling two known dynamical systems in the literature
are shown in Section 4. Section 5 concludes this work.

2. Related Works

2.1. Neural networks and MPC


Neural networks have long been used as models in MPC tasks or as controllers themselves. Previous works
have trained neural networks to imitate MPC strategies using the usual mean squared error cost functions (Ortega & Camacho,
1996; Cavagnari et al., 1999). In Åkesson & Toivonen (2006), the control law is represented by a neural network

approximator, trained offline to minimize a control-related cost function directly, without the need to calculate a
model predictive controller during training.
In the vein of Recurrent Neural Networks (RNN), works such as Jordanou et al. (2021) and Pan & Wang
(2012) utilize Echo State Networks as dynamical models for the MPC. Jordanou et al. (2021) uses a trajectory linearization approach (Ławryńczuk, 2014), differentiating the input-output sensitivities along the nonlinear free
response over the prediction horizon to calculate a forced response (Camacho & Bordons, 2013). In Pan & Wang
(2012), the whole ESN is approximated by a state-space system for computation of the control action. Terzi et al. (2020) rely on the same reduction approach, but employ LSTMs instead of ESNs.
Another example is the classical Approximate Predictive Control (Witt & Werner, 2010), which employs a
feedforward neural network that implements dynamics through the application of delayed outputs as inputs (an
external dynamics model (Nelles, 2001)), obtains an ARX (Auto Regressive with eXogenous input) model from
the networks through derivation, and performs GPC (Generalized Predictive Control) calculations per time step
(Camacho & Bordons, 2013). Hertneck et al. (2018) consider a neural network as an approximation to an MPC, in the same vein as Åkesson & Toivonen (2006).

2.2. Long-range simulation with PINNs


In Meng et al. (2020), parareal PINNs are proposed for long time integration of time-dependent PDEs. They
decompose a long-time problem into several independent short-time problems supervised by a fast coarse-grained solver, which provides approximate predictions of the solution at discrete times. Several smaller, fine PINNs are
trained in parallel with the help of the supervision given by the fast solver. Each PINN solves the problem for a
particular time interval independently. Notice that their approach does not include the possibility of control inputs and, thus, cannot be readily used for control applications. On the other hand, while the initial goal of our proposal is to extend PINNs for control, we also benefit from being able to simulate a PINN for ODEs over a long time interval (we expect that our architecture can be extended to PDEs as well).

2.3. PINNs for control


Since the first appearance of this work (Antonelo et al., 2021), some architectures based on PINNs for control
have been introduced. In Zhai & Sands (2021), PINNs are supposedly used to control chaos in van der Pol oscil-
lating circuits. However, the circuits employed in their paper have no control input, nor does their method output a control signal to be applied. Their PINN architecture still has only time t as input, and has an unusual loss
function which includes the loss for the data points and also for the reference to be followed. Surprisingly, we
have not found any physics law that was included in the network training, making their network just an ordinary
one. Besides, no control input can be derived from it to actually control a plant or dynamical system.
In Liu & Wang (2021), a model-based Reinforcement Learning (RL) algorithm employs, for the first time, physical laws to learn the state transition dynamics of an agent's environment. The model corresponds to an
encoder-decoder recurrent network architecture that learns the state transition function by minimizing the violation
of conservation laws. The real samples (state-action data pairs and corresponding rewards) from the environment
are used to train the agent and the transition model simultaneously. In turn, the latter is used to generate samples
into an alternative replay buffer that ultimately improves sample efficiency in the RL update and reduces real-world
interaction. As the transition function is part of a Markov Decision Process (MDP) formulation, it represents a
discrete evolution of the environment dynamics. For this reason, the physical loss function is built on the laws of
the system in their discretized form, instead of the continuous form as proposed in our work. Notice that while
they require training a recurrent network, our work is based on feedforward networks as time is explicitly given
as input here.
In Gokhale et al. (2022), PINNs are employed to learn a control-oriented thermal model of a building. As in
Liu & Wang (2021), they assume that the model is a discrete transition function in an MDP that predicts the next state, given the current state and action. In that way, control actions can be input to the model. Their physical
loss also has to be discretized, differently from our work. Although their proposal is control-oriented, they do not
show actual control experiments with the trained PINNs, as we do in Section 4.

3. Methods

3.1. Physics-informed Neural Networks (PINNs)


Raissi et al. (2017, 2019) introduced physics-informed neural networks, training deep neural networks in a
supervised way to respect any physical law described by partial differential equations (PDEs). The PINN approach

allows one to find data-driven solutions of PDEs or ODEs automatically. In this paper, nonlinear ODEs are
considered in the following general form:

∂t y + N [y] = 0, t ∈ [0, T ] (1)

where N [·] is a nonlinear differential operator and y represents the state of the dynamic system (the latent ODE
solution).
We define F (y) to be equivalent to the left-hand side of Equation (1):

F (y) := ∂t y + N [y] (2)

Here, y also represents the output of a multilayer neural network (hence the notation y instead of x) which has
the continuous time t as input: y = fw (t), where fw represents the mapping function obtained by a deep network
parameterized by adaptive weights w. This formulation implies that a neural network must learn to compute the
solution of a given ODE.
Assuming an autonomous system for this formulation, a given neural network y(t) is trained using optimizers
such as ADAM (Kingma & Ba, 2014) or L-BFGS (Andrew & Gao, 2007) to minimize a mean squared error
(MSE) cost function:
MSE = MSE_y + MSE_F,   (3)

where

MSE_y = (1/N_y) ∑_{i=1}^{N_y} (1/N_t) ∑_{j=1}^{N_t} |y_i(t_j) − ŷ_{ij}|²,   (4a)

MSE_F = (1/N_y) ∑_{i=1}^{N_y} (1/N_F) ∑_{k=1}^{N_F} |F(y_i(t_k))|²,   (4b)

where N_t, N_F, and N_y correspond to the number of training data samples, the number of collocation points, and the number of outputs of the neural network, respectively; y_i(·) is the i-th output of the network; and ŷ_{ij} represents the desired i-th output for y_i(·), considering the j-th data pair (t_j, ŷ_{ij}). The first loss term MSE_y corresponds to the usual cost function for regression (Bishop, 2006) based on collected training data {(t_j, ŷ_{ij})}_{j=1}^{N_t}, which usually provides the boundary (initial or terminal) conditions of ODEs when solving these equations.
The second loss term MSEF penalizes the misadjusted behavior of y(t), measured by F (y) in Equation (2),
whereby the physical structure of the solution is imposed by F (y) at a finite set of randomly sampled collocation
points {t_k}_{k=1}^{N_F}. Experiments show that the training data size N_t required for learning a certain dynamical behavior is drastically reduced due to the a priori information assimilated from MSE_F. As the differential equation of the
physical system is assumed to be represented by F (y) = 0, the term MSEF is a measure of how well the PINN
adheres to the solution of the physical model. This physics-informed approach provides a framework that unifies a
previously available theoretical, possibly approximate model and measured data from processes, which is capable
of correcting imprecisions in the theoretical model or providing sample efficiency in process modeling.
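To make the two loss terms concrete, the following is a minimal sketch of Equation (3) for a scalar ODE, using TensorFlow automatic differentiation for the residual of Equation (2); the names `net` and `N_op` and the tensor shapes are our own illustrative assumptions, not code from the original work.

```python
import tensorflow as tf

def pinn_loss(net, N_op, t_data, y_data, t_colloc):
    """Total loss MSE_y + MSE_F of Eq. (3) for a scalar ODE dy/dt + N[y] = 0.

    net      -- Keras model mapping time t to the predicted state y(t)
    N_op     -- callable implementing the nonlinear operator N[y]
    t_data   -- training times t_j, shape (Nt, 1)
    y_data   -- desired outputs yhat_j, shape (Nt, 1)
    t_colloc -- collocation times t_k, shape (NF, 1)
    """
    # Data term MSE_y (Eq. 4a): fit the boundary/initial conditions.
    mse_y = tf.reduce_mean(tf.square(net(t_data) - y_data))

    # Physics term MSE_F (Eq. 4b): penalize the residual F(y) = dy/dt + N[y]
    # at the collocation points, with dy/dt obtained by autodiff.
    with tf.GradientTape() as tape:
        tape.watch(t_colloc)
        y = net(t_colloc)
    dy_dt = tape.gradient(y, t_colloc)
    mse_f = tf.reduce_mean(tf.square(dy_dt + N_op(y)))

    return mse_y + mse_f
```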

3.2. Nonlinear Model Predictive Control


Model Predictive Control (MPC) has evolved considerably over the last two decades, significantly impacting
industrial process control. This impact can be attributed to its generality in posing the process control problem
in the time domain, being suitable for SISO (Single-Input Single-Output), and MIMO (Multiple-Input Multiple-
Output) systems. Soft and hard constraints can be imposed on the formulation of the control law through opti-
mization problems, while minimizing an objective function over a prediction horizon (Normey-Rico & Camacho,
2007).
MPC is not a specific control strategy, but rather a denomination of a vast set of control methods developed
considering some standard ideas and predictions (Normey-Rico & Camacho, 2007). Figure 1 shows a represen-
tation of the output prediction at a time instant, where the proposed actions generate a predicted behavior that
reduces the distance between the value predicted by the model and a reference trajectory.
The MPC strategy uses a discrete mathematical model based on the real process of interest. A predicted output
is calculated in a prediction horizon by comparing the mathematical model to the real process’s output. To propose
control actions, the MPC strategy uses an iterative optimization process, taking into account the mathematical model of interest and the constraints to which it is subjected. Based on objectives and constraints, the optimization problem is composed of mathematical expressions established in the controller's design phase, taking many forms.

Figure 1: Representation of the output prediction at a time instant tk, where the proposed actions generate a predicted behavior that reduces the distance between the value predicted by the model and a reference trajectory.
Usually, quadratic functions are used to penalize the error in the reference tracking.
According to Camacho & Bordons (2013), there are several ways to classify these controllers taking into
account characteristics such as model linearity, treatment of uncertainties, and how the optimization problem is
solved. In this work, we focus on the lack of model linearity, more specifically in the Nonlinear Model Predictive
Control (NMPC) (Grüne & Pannek, 2011). The discrete NMPC formulation is given by:
J = ∑_{j=N1}^{N2} ‖x[k+j] − x_ref[k+j]‖²_Q + ∑_{i=0}^{Nu−1} ‖Δu[k+i]‖²_R   (5a)

while being subject to:


x[k+j+1] = f(x[k+j], u[k+j]),   ∀j = 0, …, N2 − 1   (5b)
u[k+j] = u[k−1] + ∑_{i=0}^{j} Δu[k+i],   ∀j = 0, …, Nu − 1   (5c)
u[k+j] = u[k+Nu−1],   ∀j = Nu, …, N2 − 1   (5d)
h(x[k+j], u[k+j]) ≤ 0,   ∀j = N1, …, N2   (5e)
g(x[k+j], u[k+j]) = 0,   ∀j = N1, …, N2   (5f)

where k represents the time step at which the MPC problem is being computed, x[k] is the recurrent state of the
dynamic system which, for simplification purposes, is also the output (i.e., x = y), xref is the set-point signal
over the prediction horizon (i.e., reference), being defined by the first penalized instant k + N1 and the last instant
k +N2 . The cost function J is the penalization of the quadratic error between the model output x and the reference
xref along the horizon, and the penalization of the control increment ∆u. Each penalization is weighted by the
diagonal matrices Q and R, respectively. Eqn. (5b) is the constraint imposed by the considered state-equation
model with x as the state, and equations (5e) and (5f) refer to inequality and equality constraints imposed by
functions h and g, respectively. Eqns. (5c) and (5d) define the relation between the control action u and the
control increments, which are aggregated into the control action from time k up until either the control action time
k + j or k + Nu − 1.
The optimization problem is defined by equations from (5a) to (5f) and results in a Non-Linear Programming
(NLP) Problem, which can be solved using well-established methods like Sequential Quadratic Programming
(SQP) (Nocedal & Wright, 2006) and the Interior-Point (IP) method, available in commercial (Gill et al., 2005)
and non-commercial solvers (Wächter & Biegler, 2006). The NLP is solved at each time step k, and typical
approaches only apply the first control increment into the system (Camacho & Bordons, 2013).
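As a small illustration of Equations (5c) and (5d), the hypothetical helper below expands a vector of control increments Δu into the full control sequence over the prediction horizon; the function name and arguments are ours, introduced only for illustration.

```python
import numpy as np

def expand_controls(u_prev, du, N2):
    """Build u[k .. k+N2-1] from increments (Eqs. 5c-5d).

    u_prev -- last applied control u[k-1]
    du     -- increments Delta-u[k .. k+Nu-1], length Nu
    N2     -- prediction horizon length (N2 >= Nu)
    """
    Nu = len(du)
    u = np.empty(N2)
    # Eq. (5c): accumulate increments over the control horizon.
    u[:Nu] = u_prev + np.cumsum(du)
    # Eq. (5d): hold the last control for the rest of the horizon.
    u[Nu:] = u[Nu - 1]
    return u

# Example: u[k-1] = 0.0, three increments, horizon of 5 steps.
print(expand_controls(0.0, np.array([0.1, -0.05, 0.0]), 5))  # [0.1 0.05 0.05 0.05 0.05]
```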

3.3. Physics-Informed Neural Nets-based Control (PINC)


Unlike PINNs that assume fixed inputs and conditions, the proposed PINC framework operates with variable
initial conditions as well as control inputs that can change over the complete simulation, making it suitable for
model predictive control tasks. The network is augmented with two, possibly multidimensional inputs: control
action u and initial state y(0), as illustrated in Figure 2. The output of the network is given by:

y(t) = fw (t, y(0), u), t ∈ [0, T ] (6)

where fw represents the mapping given by a deep network parameterized by weights w. In this work, we assume
the control input to be a constant value for the time interval t ∈ [0, T ]. Thus, the new formulation provides a
conditioned response y(t) on u and y(0) during this interval of T seconds.
Traditional PINNs tend to degrade rapidly for long time intervals and can only accept input t in the range the
network was trained. The PINC framework significantly alleviates this degrading issue as well as enables control
applications by dividing the problem into M equidistant control time intervals, each of T seconds (see Fig. 4). We call this shorter period of T seconds the inner continuous time interval of the problem, in which a solution of
an ODE is obtained given some initial condition y(0) (which models the current system state) and control input
u for t ∈ [0, T ]. This ODE solution y(t), which is the output of the network, is found by a single PINC network,
that is, the same network solves all M intermediate problems, which results from learning the ODE solution for
a particular range of initial conditions and control inputs that vary over the complete time horizon, but which stay
constant for t ∈ [0, T ].
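A minimal sketch of the mapping in Equation (6), assuming a fully connected Keras network whose input is the concatenation [t, y(0), u]; the layer sizes and names are illustrative, not the exact architecture used later in the experiments.

```python
import tensorflow as tf

def build_pinc_net(n_states, n_controls, hidden=(20, 20, 20, 20)):
    """PINC net f_w(t, y(0), u) -> y(t) of Eq. (6): time, initial state,
    and control input are concatenated into a single input vector."""
    t = tf.keras.Input(shape=(1,), name="t")
    y0 = tf.keras.Input(shape=(n_states,), name="y0")
    u = tf.keras.Input(shape=(n_controls,), name="u")
    x = tf.keras.layers.Concatenate()([t, y0, u])
    for width in hidden:
        x = tf.keras.layers.Dense(width, activation="tanh")(x)
    y = tf.keras.layers.Dense(n_states, name="y_t")(x)
    return tf.keras.Model(inputs=[t, y0, u], outputs=y)

net = build_pinc_net(n_states=2, n_controls=1)  # e.g., the Van der Pol case
```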

3.3.1. Combining the intermediate solutions


Each of the M intermediate solutions of T seconds can be viewed in Figure 4. The states y[k] inferred by the
network can be seen at the top of the figure as a dashed trajectory, while its corresponding inputs are located in
the lower part. Here, the notation changed to represent the output in discrete time k. Between steps k and k + 1,
one intermediate solution is given by Equation (6), fixing the control input to some constant and the initial state to
the last state of step k − 1.
Since t is an input to the network, the state at t = T can be directly inferred by a single forward network
propagation:
y[k] = fw (T, y[k − 1], u[k]) (7)
where the initial state is set to the last state of the previous step, i.e., y[k − 1]; and the control input u[k] has an
index k indicating which fixed value is applied in the inner continuous time interval between steps k − 1 and k.

3.3.2. Free-run simulation in the prediction horizon


The initial state of the dynamical system in step k can be either the true state ŷ[k − 1] coming from the process or the previous network prediction y[k − 1] at timestep k − 1.
Within one iteration of MPC, the PINC net is used for a certain prediction horizon without feedback from the
process. This means that the network prediction y[k − 1], and not the true state ŷ[k − 1], is fed back as input to the
same network in the next timestep k of the prediction horizon (Fig. 3a).
In discrete time control applications, a sampling period Ts must be chosen. The setting of Ts usually depends
on the particular dynamics of the process being modeled. Here, T is equal to the sampling period Ts . Thus, using
Equation (7), we can encapsulate the PINC prediction function so that it is only a function of the control action
u[k] and previous prediction y[k − 1], leaving T implicit:

y[k] = f̂_w(y[k − 1], u[k]) = f_w(T, y[k − 1], u[k])   (8)

We call f̂_w the control interface for the PINC framework. Thus, ∂f̂_w/∂u can be computed to provide the
Jacobian matrix to solvers used in MPC, possibly by means of automatic differentiation. This control interface
provides the prediction of the states of the dynamic system at the vertical lines in Fig. 4, that is, at every Ts
seconds, the state y[k] is predicted in a single forward net propagation operation, for k = 1, ..., M . This differs
from numerical integration methods that need to integrate over the continuous inner interval (Iserles, 1996).
Since the prediction is fed back as an input at every discrete timestep, it is expected that errors accumulate in the long free run. This is not exclusive to this approach, and is common to recurrent neural networks. However, because MPC works in a receding horizon control approach, at every timestep k of the control loop, the input y[k − 1] representing the initial state is set to the real system's state ŷ[k − 1] (Fig. 3b). Thus, the prediction horizon in MPC always starts from the true initial state ŷ[k − 1], that is, Equation (8) becomes

y[k] = f̂_w(ŷ[k − 1], u[k])   (9)

which counters error accumulation between consecutive control iterations.
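The free-run (self-loop) rollout of Equations (8) and (9) can be sketched as follows; `pinc_step` stands for a single forward pass of the trained network at t = T and is a hypothetical wrapper around the control interface, not the authors' code.

```python
import numpy as np

def free_run(pinc_step, y_init, u_seq):
    """Chain one-step PINC predictions over a horizon (Fig. 3a).

    pinc_step -- callable (y_prev, u) -> y_next, i.e., f_w(T, y_prev, u)
    y_init    -- initial state; the true plant state in MPC (Eq. 9)
    u_seq     -- control inputs u[1..M], one per interval of T seconds
    """
    y = np.asarray(y_init, dtype=float)
    trajectory = [y]
    for u in u_seq:
        # Feed the last prediction back as the next initial state.
        y = np.asarray(pinc_step(y, u), dtype=float)
        trajectory.append(y)
    return np.stack(trajectory)  # shape (M+1, n_states)
```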


Figure 2: The PINC network has initial state y(0) of the dynamic system and control input u as inputs, in addition
to continuous time scalar t. Both y(0) and u can be multidimensional. The output y(t) corresponds to the state
of the dynamic system as a function of t ∈ [0, T ], and initial conditions given by y(0) and u. The deep network
is fully connected even though not all connections are shown.

(a) PINC in self-loop mode   (b) PINC connected to the plant

Figure 3: Modes of operation of the PINC network. (a) PINC net operates in self-loop mode, using its own output
prediction as next initial state, after T seconds. This operation mode is used within one iteration of MPC, for
trajectory generation until the prediction horizon of MPC completes (predicted output from Fig. 1). (b) Block
diagram for PINC connected to the plant. One pass through the diagram arrows corresponds to one MPC iteration
applying a control input u for Ts timesteps for both plant and PINC network. Note that the initial state of the
PINC net is set to the real output of the plant. In practice, in MPC, these two operation modes are executed in an
alternated way (optimization in the prediction horizon, and application of control action).

The error might accumulate when the MPC model is used in a future finite prediction horizon to solve a
constrained optimization problem. In this case, the prediction y[k − 1] is fed back as no readings from the real
process at a future time are possible (Fig. 3a).

3.3.3. Training
The first loss term in Equation (3) can be generalized to the PINC network as:
MSE_y = (1/N_y) ∑_{i=1}^{N_y} (1/N_t) ∑_{j=1}^{N_t} |y_i(v_j) − ŷ_{ij}|²,   (10)

where the pair (v_j, ŷ_j) corresponds to the j-th training example and v_j = (t, y(0), u)_j is the whole input to the
network (i.e., time, initial state, and control input). Usually, this dataset comes from measured data, but in this
work we will show that, if we assume that the given ODE is an exact representation of the process, it is enough
for this dataset to contain only the initial conditions of the modelled ODE. For instance, one such training data
pair is ((0, 0.4, 0.6), 0.4), which means: at t = 0 the initial state is 0.4, the control input is 0.6 and the desired
output is equal to the initial state (0.4). Thus, for all data points, t = 0, while y(0) and u are randomly sampled
from intervals defined according to the modelled dynamic system. This means that MSEy represents the mean
squared error for all randomly sampled initial conditions of the considered ODE and control inputs. Note also that
the input y_j(0) is equal to the desired output ŷ_j in the training set, such that the network must learn to reproduce
Figure 4: Representation of a trained PINC network evolving through time in self-loop mode (Fig. 3a) for tra-
jectory generation in prediction horizon. The top dashed black curve corresponds to a predicted trajectory y of
a hypothetical dynamic system in continuous time. The states y[k] are snapshots of the system in discrete time
k positioned at the equidistant vertical lines. Between two vertical lines (during the inner continuous interval
between steps k and k + 1), the PINC net learns the solution of an ODE with t ∈ [0, T ], conditioned on a fixed
control input u[k] (blue solid line) and initial state y(0) (green thick dashed line). Control action u[k] is changed
at the vertical lines and kept fixed for T seconds, and the initial state y(0) in the interval between steps k and
k + 1 is updated to the last state of the previous interval k − 1 (indicated by the red curved arrow). The PINC net
can directly predict the states at the vertical lines without the need to infer intermediate states t < T as numerical
simulation does. Here, we assume that T = Ts and, thus, the number of discrete timesteps M is equal to the
length of the prediction horizon in MPC.

the initial state y_j(0) into the network output y(v_j) at t = 0. In practice, the aforementioned assumption allows training with randomly sampled data for solving the ODE, without ever requiring measured process data.
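A sketch of this dataset construction for the Van der Pol case of Section 4.1, under the assumption stated above that only initial conditions are needed (the ranges are those given in Section 4.1.1; everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, NF, T = 1_000, 100_000, 0.5  # data points, collocation points, interval

# Data points for Eq. (10): t = 0, random y(0) and u; the target equals y(0).
t_data = np.zeros((Nt, 1))
y0_data = rng.uniform(-3.0, 3.0, size=(Nt, 2))  # x1, x2 in [-3, 3]
u_data = rng.uniform(-1.0, 1.0, size=(Nt, 1))   # u in [-1, 1]
y_target = y0_data.copy()                       # network must echo y(0)

# Collocation points for Eq. (11): time t is sampled as well, from [0, T].
t_col = rng.uniform(0.0, T, size=(NF, 1))
y0_col = rng.uniform(-3.0, 3.0, size=(NF, 2))
u_col = rng.uniform(-1.0, 1.0, size=(NF, 1))
```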
The second loss term in Equation (3) is rewritten as:
MSE_F = (1/N_y) ∑_{i=1}^{N_y} (1/N_F) ∑_{k=1}^{N_F} |F(y_i(v_k))|²,   (11)

where v_k corresponds to the k-th collocation point (t, y(0), u)_k; now all three types of inputs (and not only the last two), i.e., time, initial condition, and control input, are randomly sampled from their respective particular
intervals. Specifically, the interval for t is [0, T ], where T is the inner continuous interval of the PINC framework.
Basically, this formulation means that the PINC net is trained with data points that lie on the boundary of
simulations, i.e., only initial states of ODEs are presented for the loss function in Eq. (10). Practically, this does
not require collecting data from ODE simulators. On the other hand, the collocation points in MSEF serve to
regularize the PINC net to satisfy the behavior defined by F . Thus, in the training process, the PINC net is only
directly informed with an initial state in Eq. (10), and its physics-informed loss in Eq. (11) must enforce the structure of the differential equation on its output y(·) for the remaining inner continuous interval of T seconds
(e.g., t ∈ (0, T ]).
The total loss can be generalized to MSE = MSEy + λ · MSEF , where λ represents a rescaling factor so that
both terms are approximately in the same scale. Once the PINC net structure, datasets and the losses are defined,
the training process starts with the ADAM optimizer (Kingma & Ba, 2014) for K1 epochs, and subsequently
continues with the L-BFGS optimizer (Andrew & Gao, 2007) for K2 iterations in order to adapt the net weights
w towards the minimization of MSE. Note that automatic differentiation is employed for the physics-informed
term MSE_F in Eq. (11), using deep learning frameworks such as TensorFlow.
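A condensed sketch of this two-stage optimization is given below; `loss_fn` is assumed to return MSE_y + λ·MSE_F for the current network weights, and SciPy's L-BFGS-B is used here purely to illustrate the second stage.

```python
import numpy as np
import tensorflow as tf
from scipy.optimize import minimize

def train_pinc(net, loss_fn, K1=500, K2=20_000):
    """Stage 1: ADAM for K1 epochs; stage 2: L-BFGS for up to K2 iterations."""
    adam = tf.keras.optimizers.Adam()
    for _ in range(K1):
        with tf.GradientTape() as tape:
            loss = loss_fn(net)  # MSE_y + lambda * MSE_F
        grads = tape.gradient(loss, net.trainable_variables)
        adam.apply_gradients(zip(grads, net.trainable_variables))

    # Flatten the weights so SciPy's L-BFGS sees a single parameter vector.
    shapes = [tuple(v.shape) for v in net.trainable_variables]

    def set_flat(theta):
        offset = 0
        for v, s in zip(net.trainable_variables, shapes):
            n = int(np.prod(s))
            v.assign(theta[offset:offset + n].reshape(s)
                     .astype(v.dtype.as_numpy_dtype))
            offset += n

    def value_and_grad(theta):
        set_flat(theta)
        with tf.GradientTape() as tape:
            loss = loss_fn(net)
        grads = tape.gradient(loss, net.trainable_variables)
        flat_g = np.concatenate([g.numpy().ravel() for g in grads])
        return float(loss.numpy()), flat_g.astype(np.float64)

    theta0 = np.concatenate(
        [v.numpy().ravel() for v in net.trainable_variables]).astype(np.float64)
    result = minimize(value_and_grad, theta0, jac=True,
                      method="L-BFGS-B", options={"maxiter": K2})
    set_flat(result.x)
```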

3.3.4. NMPC
After training, the PINC net is used as a model in nonlinear MPC, whose algorithm is described in Section 3.2.
Thus, the control interface function f̂_w in Equation (8) replaces Equation (5b) in the MPC formulation, redefining
the notation of a dynamic system’s state by the prediction given by the PINC network, i.e., x[k] = y[k]. After
these substitutions, we arrive at a Multiple Shooting (MS)-inspired formulation for the NMPC problem under the
PINC framework:
J = ∑_{j=N1}^{N2} ‖y[k+j] − y_ref[k+j]‖²_Q + ∑_{i=0}^{Nu−1} ‖Δu[k+i]‖²_R   (12a)
while being subject to:

y[k+j+1] = f̂_w(y[k+j], u[k+j]),   ∀j = 0, …, N2 − 1   (12b)
u[k+j] = u[k−1] + ∑_{i=0}^{j} Δu[k+i],   ∀j = 0, …, Nu − 1   (12c)
u[k+j] = u[k+Nu−1],   ∀j = Nu, …, N2 − 1   (12d)
h(y[k+j], u[k+j]) ≤ 0,   ∀j = N1, …, N2   (12e)
g(y[k+j], u[k+j]) = 0,   ∀j = N1, …, N2   (12f)

3.4. Metrics
The evaluation of the PINC net prediction performance is done on a validation set in self-loop mode (Figures 3a
and 4). In particular, the generalization MSE is computed only at the discrete time steps (vertical lines in Fig. 4):
MSE_gen = (1/N_y) ∑_{i=1}^{N_y} (1/N) ∑_{k=1}^{N} (y_i[k] − ŷ_i[k])²,   (13)

where: y[k] is the prediction of the PINC net given by Equation (8) and ŷ[k] is obtained with Runge-Kutta (RK)
simulation of the true model of the plant; N is the length of vector y; and the same control input signal u[k] is
given to both the PINC net and the RK model.
The control performance is measured by employing the Integral of Absolute Error (IAE) on a simulation of
C iterations:
IAE = (1/N_y) ∑_{i=1}^{N_y} ∑_{k=1}^{C} |y_i^ref[k] − y_i[k]|   (14)

and the Root Mean Squared Error (RMSE):


RMSE = (1/N_y) ∑_{i=1}^{N_y} √( (1/C) ∑_{k=1}^{C} (y_i^ref[k] − y_i[k])² )   (15)

where y_i^ref[k] is the reference value of y_i[k] at timestep k.


The IAE is ideal for comparing simulation runs with the exact same reference signal, as the sum of absolute
errors is very sensitive to changes in control performance (Schultz & Rideout, 1961). Meanwhile, the RMSE can
capture the average error behavior of the controller.
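Both metrics translate directly into NumPy; the array convention below (time along axis 0, outputs along axis 1) is our own:

```python
import numpy as np

def iae(y_ref, y):
    """Integral of Absolute Error (Eq. 14), averaged over the Ny outputs."""
    return np.abs(y_ref - y).sum(axis=0).mean()

def rmse(y_ref, y):
    """Root Mean Squared Error (Eq. 15), averaged over the Ny outputs."""
    return np.sqrt(((y_ref - y) ** 2).mean(axis=0)).mean()
```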

3.5. PINC Algorithms


In this section, an overview of the proposal is presented with the help of high-level algorithms. Algorithm
1's objective is to train the PINC network. It uses data points and collocation points (generated as described in Section 3.3.3) to minimize Eq. (10) + Eq. (11), first with the ADAM optimizer and then with the L-BFGS optimizer. Algorithm 2 employs MPC with PINC, using the minimization process (NMPC) described in Section 3.3.4 at each timestep k out of C iterations (i.e., the total length of the reference signal) to yield a control action u[k] to be
applied to the plant.

4. Experiments

This section presents experiments regarding the application of PINC to the modeling and control of the Van der
Pol Oscillator and the four-tank system, which are two dynamical systems often considered for nonlinear analysis
in the literature.

Algorithm 1: PINC Training Algorithm
input: K1, K2, F(·), {(v_j, ŷ_j) : j = 1, …, Nt}, {v_k : k = 1, …, NF};
initialize PINC weights w with Xavier normal distribution;
// Train with ADAM
for K1 epochs do
    Compute the gradients of Eq. (10) + Eq. (11) (with F(·)) with respect to w, using the data points {(v_j, ŷ_j) : j = 1, …, Nt} and collocation points {v_k : k = 1, …, NF};
    Update w with the ADAM optimizer and the obtained gradients;
// Train with L-BFGS
for K2 iterations do
    Compute the gradients of Eq. (10) + Eq. (11) (with F(·)) with respect to w, using the data points {(v_j, ŷ_j) : j = 1, …, Nt} and collocation points {v_k : k = 1, …, NF};
    Update w with the L-BFGS optimizer and the obtained gradients;
    Save the network w with the best performance seen so far on a validation set, using Eq. (13);
output: network w with lowest validation error;

Algorithm 2: MPC with PINC Algorithm
input: Q, R, Nu, N1, N2, y_ref, f̂_w, x[0]
// Use the trained PINC to perform the control procedure
for k := 0, 1, 2, …, C do
    Set initial state y[k] to the plant's current state x[k];
    Minimize (12a) s.t. (12b)-(12f) with respect to control u for reference y_ref at timestep k, using the trained network f̂_w as predictive model, control horizon Nu, prediction horizon M = N2 − N1 + 1, and weight matrices Q and R;
    Apply u[k] to the plant, obtaining the next state x[k + 1];
output: control action u
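A minimal sketch of Algorithm 2's optimization step for a single-input system, reusing the `free_run` rollout sketched in Section 3.3.2 and a general-purpose SciPy solver in place of the SQP/IP solvers cited in Section 3.2; it assumes Nu = N2 and omits the constraints (12e)-(12f) for brevity, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def mpc_step(pinc_step, y_now, u_prev, y_ref, Nu, Q=10.0, R=1.0):
    """One NMPC iteration (Eqs. 12a-12d), returning the control to apply.

    y_ref -- reference trajectory, shape (>= Nu, n_states); Q, R scalar weights
    """
    def cost(du):
        u_seq = u_prev + np.cumsum(du)              # Eq. (12c)
        y = free_run(pinc_step, y_now, u_seq)[1:]   # predictions y[k+1 ..]
        err = y - y_ref[:Nu]
        return float((Q * err**2).sum() + (R * du**2).sum())

    res = minimize(cost, np.zeros(Nu), method="SLSQP")
    return u_prev + res.x[0]  # apply only the first control increment
```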

4.1. Van der Pol Oscillator


4.1.1. Model
The Van der Pol oscillator (Y. Hafeez et al., 2015) is an ODE originally proposed by Balthazar Van der Pol to model triode oscillations in electric circuits. Since then, the ODE has been used
for other purposes, such as seismology and biological neuron modeling (Y. Hafeez et al., 2015), and as a standard
proof-of-concept dynamical system for optimal control applications (Andersson et al., 2012). The equations that
govern the Van der Pol Oscillator are as follows:

ẋ1 = x2   (16a)
ẋ2 = μ(1 − x1²)x2 − x1 + u   (16b)

where µ = 1 is referred to as the damping parameter, which affects how much the system will oscillate, x =
(x1 , x2 ) is the system state, and u is an exogenous control action.
By inspection, the Van der Pol oscillator has an equilibrium at x̄ = (u, 0), which is stable for a constant input u ∈ (−√3, −1) or u ∈ (1, √3). The oscillator also has a limit cycle that can be perceived in polar coordinates (Y. Hafeez et al., 2015). For our experiments, u ∈ [−1, 1] and x1, x2 ∈ [−3, 3].
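For this system, the residual F(y) used in the physics loss of Eq. (11) follows directly from Equations (16a)-(16b); a sketch with TensorFlow autodiff, where `net` is a PINC model with inputs [t, y(0), u] and the helper name is ours:

```python
import tensorflow as tf

MU = 1.0  # damping parameter of Eq. (16b)

def vdp_residual(net, t, y0, u):
    """ODE residuals of the Van der Pol oscillator at collocation points."""
    with tf.GradientTape(persistent=True) as tape:
        tape.watch(t)
        y = net([t, y0, u])
        x1, x2 = y[:, 0:1], y[:, 1:2]
    dx1_dt = tape.gradient(x1, t)
    dx2_dt = tape.gradient(x2, t)
    del tape
    f1 = dx1_dt - x2                                    # from Eq. (16a)
    f2 = dx2_dt - (MU * (1.0 - x1**2) * x2 - x1 + u)    # from Eq. (16b)
    return f1, f2
```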

4.1.2. PINC Analysis


To find the most suitable configuration for the PINC net to control a dynamical system, we propose first
running grid search experiments over hyperparameters, such as the network complexity and the number of data
points (Nt ) and collocation points (Nf ).
Here, the sampling time is chosen according to the particular dynamics of the Van der Pol oscillator: T =
Ts = 0.5s. At first, we use Nt = 1, 000 and Nf = 100, 000 as they provide a sufficient number of points to train
a PINC net.
For training the PINC net, ADAM is used to optimize the loss function for K1 = 500 epochs, and afterward,
L-BFGS is used for K2 = 2, 000 iterations to enhance the stability of the training process. Note that this K2 does
not exhaust the training and, as such, it will need to be increased before the final deployment of the PINC net.
(a) Effect of the number of layers and neurons per layer (log10 of validation MSE):

Layers \ Neurons     3       5       10      15      20
2                  -0.22   -0.55   -1.00   -1.56   -1.91
4                  -0.31   -0.61   -1.45   -2.62   -2.84
5                  -0.30   -0.75   -1.85   -2.31   -2.83
8                  -0.15   -0.61   -1.60   -2.36   -2.80
10                 -0.17   -0.46   -2.08   -1.86   -2.87

(b) Effect of the number of collocation points Nf and data points Nt (log10 of validation MSE):

Nf \ Nt             40      80      100     500     1000
2000               -1.13   -2.46   -2.40   -0.68    0.30
4000               -1.19   -2.31   -2.73   -2.56   -0.80
10000              -1.23   -2.79   -2.50   -2.83   -2.68
100000             -1.19   -2.52   -2.75   -2.54   -2.51

Figure 5: Analysis of the PINC net for the Van der Pol Oscillator. The network training time is fixed to a constant
number of iterations. The MSE validation error is computed according to Equation (13). (a) The log10 of the
MSE error as a function of network complexity, averaged over 10 different simulations. The best generalization
error (10^−2.87) is achieved with a deep network of 10 layers with 20 neurons each. (b) The effect of the number
of collocation points Nf and data points Nt on generalization performance, averaged over 5 different randomly
initialized networks.

The parameter λ is set empirically so that MSEy and MSEF are not in disparate scales. The validation dataset is
composed of 1810 points obtained using a randomly generated control action u (e.g., Fig. 9), which is equivalent
to 905s of simulation, since Ts = 0.5s. The validation or generalization error considers the self-loop mode of
PINC to compute Eq. (13).
The first experiment analyses the network complexity (Fig. 5a) and shows the validation MSE using Eq. (13)
averaged over 10 different random initializations of the network weights. In general, as the network grows deeper
and with more neurons per layer, the performance increases. Besides, layers with 3 or 5 neurons are not sufficient
to model the required task. Note that these errors would decrease even further if training had been extended
for more epochs (correcting the lower performance of the net of 10 layers with 15 neurons each, for instance).
Although the network of 10 layers with 20 neurons each achieves the best performance, we choose to continue
the following experiments with a configuration of 4 layers of 20 neurons each, which also showed excellent
performance, but with less computational overhead.
In Fig. 5b, the proportion between data points and collocation points is investigated. Each error cell in the
plot corresponds to the average of 5 different experiments with randomly generated networks. Clearly, 40 data
points are not enough, and the proportion Nf /Nt should be considerably higher than 4 (hence the dark cells in the
upper-right corner of the plot).

4.1.3. Long-range Simulation


In order to showcase the long-range simulation capacity of the proposed approach in relation to the conventional PINN, we trained traditional PINNs that have the same complexity as the PINC net, i.e., 4 layers of 20 neurons each, but that have only one input t, as usual for PINNs. A new PINN is trained for each considered interval T ∈ {0.5, 1, 2, 5, 10} seconds. Note that these PINNs do not allow for arbitrary initial conditions after training, as PINC nets do. Furthermore, once the interval value T is chosen before training for conventional PINNs, further simulation beyond T rapidly deteriorates, as we shall see.
The PINNs were trained with the ADAM optimization algorithm for K1 = 500 epochs initially with a learning rate of 0.0035, then for another K1 = 700 × T epochs with a learning rate of 0.001, and finally for K2 = 1,000 × T iterations of the L-BFGS optimization method. In addition, the number of collocation points also increased with the value of T: Nf = 5,000 × T. Thus, the longer the interval T, the longer the training and the higher the number of collocation points employed. On the other hand, only one PINC network was trained, following the configuration from the previous section, but for longer, as indicated in Fig. 8.
In Fig. 6, the results are shown, which compare the trajectories of the single PINC net that works for any
considered interval T (e.g., shorter or longer than 10s) with the ones from the PINN networks. The two rows in
the plot correspond to the two states of the oscillator. Each subplot involves the training of one new PINN from
scratch for a certain T ∈ {0.5, 1, 2, 5, 10}, except for the PINC net, which is trained only once. Besides, each
PINN is trained considering a fixed control input u = 0.54 along the run, with fixed initial conditions x1 = −2.14
and x2 = 0.25, both randomly chosen. Unlike PINC, conventional PINNs would need to be retrained from scratch
if a different initial condition or control input is required.
In the plot of Fig. 6, the dots in the predicted PINC trajectories, in blue and pink colors, mark the moments
at which the final predicted states at T = 0.5s are fed back as new initial conditions and input to the network,
corresponding to the vertical lines in Fig. 4. Although PINC is trained with a fixed T = 0.5s, its chained (self-
loop) prediction can be used to perform long-range simulation for an arbitrary total simulation time T without
fixing it beforehand as with traditional PINNs, whose trajectory is shown by the dashed grey lines in the plots
of Fig. 6. Note that the true target trajectory of the dynamical system, drawn as a solid black line, is completely
superimposed by the predicted PINC trajectory. In addition, observe that only the PINN trained specifically with
T = 10s can simulate without degradation until 10s, and not beyond that, for the given fixed initial condition and
control input.

[Figure 6: five panels, T = 0.5, 1.0, 2.0, 5.0, 10.0; rows show states y1 and y2 vs. time (s).]

Figure 6: Comparison between conventional PINN (dashed grey line) and proposed PINC (solid blue and pink
lines) for long-range simulation of the Van der Pol oscillator with fixed control input u and fixed initial condition x = (x1, x2) along the simulation. The target trajectories for states x1 and x2 are also plotted as solid black lines, but are completely superimposed by the PINC predictions y1 and y2. From left to right, the PINN nets are trained with fixed T ∈ {0.5, 1, 2, 5, 10}, while the PINC net is trained only once with T = 0.5s, even though it can run for arbitrarily longer simulation times not fixed beforehand.

The RMSE errors for these experiments are summarized in Fig. 7, making clear the high prediction error obtained by the conventional PINN in relation to the proposed PINC approach when the T used for PINN training is lower than 10s. At T = 10s, PINN has a slightly lower error than PINC, likely because of the small accumulation of prediction errors during self-loop simulation for PINC.

[Figure 7: RMSE (log scale) for PINN and PINC vs. T (s) ∈ {0.5, 1.0, 2.0, 5.0, 10.0}.]

Figure 7: Performance comparison in terms of RMSE between the target trajectory of the Van der Pol oscillator
and the predicted trajectory for the PINN and PINC networks from Fig. 6 for a simulation of 10s. The horizontal
axis corresponds to the fixed T used for training the PINN network. See text for more details.

The control input u was kept fixed here to compare the conventional and the new approach. However, PINC
can have a variable u along the simulation, yielding an additional advantage for allowing control applications, as
showcased in the next section.

4.1.4. PINC Control


The final PINC net is chosen to have 4 hidden layers each of 20 neurons for the Van der Pol oscillator. Besides,
we continue setting Nt = 1, 000, Nf = 100, 000, and K1 = 500, but the training is extended with K2 = 20, 000,
which allows the MSE to settle in an asymptotic curve (Fig. 8). For comparison, a vertical black dashed line
is plotted in Fig. 8, indicating the moment at which training would have stopped for earlier experiments from
Fig. 5. Thus, further training improves the validation error (according to Equation (13)) by at least one order of magnitude. Note that the validation error does not increase permanently as training proceeds, arguably due to the regularization effect of MSE_F in the loss function.

[Figure 8: training and validation MSE (log scale) vs. epoch, from 0 to 60000.]

Figure 8: MSE evolution during training of the final PINC net. Previous experiments from Section 4.1.2 stopped
training at the vertical dashed line. The validation dataset consists of 1810 points, or 905s of simulation, since T = Ts = 0.5s. The validation MSE is noisier because it is computed on a much smaller dataset and in self-loop mode using Eq. (13).

To view the PINC prediction after training, we randomly generate a control input u for 10s. In Fig. 9, the
predicted trajectory is given for such a control input. With our method, we can directly infer each circle in the
trajectory using Equation (8) every T = 0.5s. The trajectory between two consecutive circles can be predicted by varying the input t of the network and keeping the other inputs y(0) and u fixed. The prediction matches the target trajectory very well; the latter is also plotted, but is superimposed by the former.
The resulting control from PINC can be seen in Fig. 10 in a simulation of 60s, where MPC was employed
to find the optimal value of the control input, considering a prediction and control horizon of 5T (or 2.5s). The
control parameters are given as follows: N1 = 1, N2 = 5, Nu = N2 , Q = 10I, and R = I. Here, the
optimization in MPC to find a control input at the current timestep uses the PINC network’s predicted trajectory
for future timesteps, i.e., for the prediction horizon of 2.5s. This procedure is repeated for all 120 points of
the plotted trajectory. Here, the role of the plant to be controlled is taken by the Van der Pol oscillator, whose
states are obtained by an RK integrator. The control performance for a 60s simulation is presented in Table 1,
which also shows the result when the original ODE model is used as the predictive model in NMPC instead
of the PINC net. In this case, the classic fourth-order Runge-Kutta method (RK4) is employed as a numerical solution method to compute the states of the system for NMPC. In practice, other approximations of the plant/system are not likely to improve on the ODE model itself, which justifies our comparison to this baseline NMPC. Remarkably, PINC achieves practically the same result as the ODE/RK approach in terms of RMSE and
IAE, while being slightly faster on average when executed with 10 repetitions on the same desktop computer. The
right plot in Fig. 10 also shows the NMPC considering the ODE/RK model as the predictive model, in thick yellow
line, showing that the lines from PINC and ODE/RK match very well.

[Figure 9: Van der Pol prediction; outputs y1, y2 and input u vs. time (s).]

Figure 9: PINC net prediction for the Van der Pol oscillator on test data. The grey dashed line gives the randomly
generated input u, while the predictions for the oscillator states x1 and x2 correspond to the solid blue and pink
lines, respectively. The target trajectory, from the RK method, is also plotted in black, but is not visible as the prediction completely superimposes it. Each dot in the predicted trajectory corresponds to the vertical lines in Fig. 4
when the control action and initial state change (after Ts = T = 0.5s).

[Figure 10: states x1, x2 (top) and input u (bottom) vs. time (s); left panel: 60s simulation; right panel: first 20s with ODE/RK comparison.]

Figure 10: Control of the Van der Pol oscillator with PINC and comparison with the ODE model. The reference
trajectory for x1 (x2 ) is given by a dashed black step signal (fixed at zero), while the controlled variables are the
states x1 and x2 given by blue and pink lines. The control input u is the manipulated variable in the lower plot,
found by MPC. Left: simulation totalling 60s exclusively by PINC. Right: comparison with ODE model (plant)
as predictive model in yellow thick line during the first 20s. See text for more details.

4.2. Four Tanks


4.2.1. Model
The four tanks system is a widely used benchmark for multivariate control systems (Johansson, 2000), as it is
a nonlinear and multivariate system with some degree of coupling between variables. By setting its parameters to
a given combination of values, it is possible to induce the system to have non-minimum phase transmission zeros,
which are an additional difficulty for PID controllers (Johansson, 2000).
As Figure 11 shows, the four tanks system is composed of four tanks, denoted by the index i = {1, 2, 3, 4},
and two pumps j = {1, 2} supplying each tank with water. Each tank has a cylindrical form with a base area Ai and an orifice of area ai at the center of the base. Tank 1 (2) is located right below tank 3 (4), so that the flow ωi from the tank above goes directly into the tank below. Both pumps are linear actuators controlled by the voltage uj
with coefficient kj converting the voltage into the pump flow. Pump 1 (2) is associated with a directional valve
that distributes the resulting flow into tanks 1 and 4 (2 and 3), which is the coupling source in this system. The
directional valves have an opening γj , which is the amount they distribute to the bottom tanks. The adjustment of
γj is one of the main factors in regulating the control problems associated with the system (Johansson, 2000). The
level of water in each tank is denoted hi .
The following equations govern the four tank system, which are obtained from mass balance:

ḣ1 = (γ1 k1 u1 + ω3 − ω1) / A1   (17a)
ḣ2 = (γ2 k2 u2 + ω4 − ω2) / A2   (17b)
ḣ3 = ((1 − γ2) k2 u2 − ω3) / A3   (17c)
ḣ4 = ((1 − γ1) k1 u1 − ω4) / A4   (17d)
where the flow through each tank orifice ωi is described by the Bernoulli orifice equation, which adds the sole nonlinearity of the system:

ωi = ai √(2 g hi)   (18)
with g as the acceleration of gravity. The parameters used for this application are the same as the ones stipulated
for the non-minimum phase experiment in Johansson (2000).
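For reference, Equations (17a)-(18) translate into the right-hand-side function below, usable both for the physics residual and for RK simulation of the plant; the parameter values themselves are omitted and would come from Johansson (2000), and g is taken in cm/s² on the assumption that levels are measured in cm.

```python
import numpy as np

def four_tanks_rhs(h, u, A, a, k, gamma, g=981.0):
    """dh/dt for the four-tank system (Eqs. 17a-17d).

    h -- tank levels h1..h4; u -- pump voltages u1, u2
    A, a -- base and orifice areas; k -- pump gains; gamma -- valve openings
    """
    w = a * np.sqrt(2.0 * g * np.maximum(h, 0.0))  # orifice flows, Eq. (18)
    return np.array([
        (gamma[0] * k[0] * u[0] + w[2] - w[0]) / A[0],   # Eq. (17a)
        (gamma[1] * k[1] * u[1] + w[3] - w[1]) / A[1],   # Eq. (17b)
        ((1.0 - gamma[1]) * k[1] * u[1] - w[2]) / A[2],  # Eq. (17c)
        ((1.0 - gamma[0]) * k[0] * u[0] - w[3]) / A[3],  # Eq. (17d)
    ])
```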

4.2.2. PINC Control


We have followed an approach similar to that of the first control problem with respect to finding a suitable configuration for network complexity and the proportion between data and collocation points. We have observed that 5 is the minimum number of layers needed to obtain sufficient prediction performance for the four tanks system, since
it is a more complex plant, with multiple inputs and multiple outputs (MIMO) operating at different timescales.
The following experiments consider a PINC net with 5 layers of 20 neurons each. Besides, we continue setting
Nt = 1, 000, Nf = 100, 000, K1 = 500, and K2 = 20, 000. The sampling period is T = Ts = 10s. The control
parameters are once again given by N1 = 1, N2 = 5, Nu = N2 , Q = 10I, and R = I.
After training the PINC net, the prediction on test data, with new randomly generated control actions (not
shown), is presented in Fig. 12. The deviation in prediction at longer ranges, as seen in the first plot for h1 and h2 ,
is expected, since the network works in self-loop mode, feeding its prediction of the last state back as input for the
initial state (Fig. 3a), every T = 10s. Thus, the error is accumulated in this chaining procedure. However, MPC


Figure 11: Schematic representation of the four tanks system, from Brandão (2018).
[Figure 12: predicted and target tank levels h1, h2 (left) and h3, h4 (right) vs. time (s).]

Figure 12: PINC net prediction for the four tanks system on test data, with randomly generated control input
signals similar to Fig. 9. The predictions for the levels of tanks h1 and h2 correspond to the solid blue and pink lines, respectively. The target trajectories, from the RK method, are also plotted as dark solid lines without dots. At
each dot in the predicted trajectory, the PINC net receives new inputs for control action and initial state (every
Ts = T = 10s), as explained in Fig. 3a. The vertical dashed line indicates the prediction horizon used for MPC.

            Van der Pol (4 layers of 20 units)     Four tanks (5 layers of 20 units)
            RMSE    IAE     time (s)               RMSE    IAE    time (s)
PINC        0.15    123.6   3.32 ± 0.15            0.811   876    10.85 ± 0.14
ODE / RK    0.15    122     3.41 ± 0.04            0.807   544    14.15 ± 0.13

Table 1: Results for the control experiments.

uses this trajectory only up to 50s, equivalent to a prediction horizon of 5 steps, indicated by the vertical dashed
line in the figure, and the next optimization procedure in MPC resets the initial state to the true value as obtained
by sensors of the real process (Fig. 3b).
PINC’s control employs prediction and control horizons both of 5 steps (50s in simulation time) for this four
tanks system. Besides, both h3 and h4 tank levels are constrained to the interval [0.6, 5.5]cm. The results are
shown in Fig. 13, where the controlled and constrained tank levels are presented in the first two topmost plots, and
the control action found by MPC is shown at the bottom plot. The plots on the right-hand side show a close-up
during the initial 160s of the simulation. The control was successful in spite of the constraints imposed on h3 and h4 (which were respected) and some minor steady-state error, which can be corrected by adding the
calculation of a correction factor through filtering the error between the measurement and the network prediction,
as done in Jordanou et al. (2018) for a recurrent network. In Fig. 14, we use the same simulation setup, focusing
on the timesteps between 500s and 1300s, to compare with the response (in yellow color) of the control using
the plant's reference model as the predictive model in MPC. This ODE/RK-based model is the reference model that represents the plant itself, which justifies the negligible steady-state error observed in the figure. We can see that the PINC simulation is very close to the nominal MPC given by the ODE/RK model. This comparison suffices, as any other NMPC would employ an approximation of the ODE/RK model as its predictive model.
The control performance in terms of RMSE and IAE is shown in Table 1. Although the IAE differs more noticeably
between PINC and ODE/RK, the RMSE values of both methods are almost equivalent. The average time spent on a
desktop computer for the complete control simulation using PINC, repeated 10 times, was 10.85s, which is 23.3%
lower than using the ODE of the four tanks as the MPC model (14.15s). This is remarkable given the architecture of
5 hidden layers with 20 neurons each that is used for PINC.
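The metrics in Table 1 are assumed to take their standard forms, discretized with the sampling period; the snippet below is a sketch of that assumption rather than the exact evaluation code used for the experiments.

```python
import numpy as np

def rmse(y, ref):
    """Root-mean-square tracking error over the whole simulation."""
    return np.sqrt(np.mean((y - ref) ** 2))

def iae(y, ref, Ts=10.0):
    """Integral of Absolute Error (IAE), approximated as a Riemann sum
    with the sampling period Ts."""
    return np.sum(np.abs(y - ref)) * Ts
```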

4.2.3. Sensitivity to Perturbations


The PINC approach, as introduced in this work, does not have an inherent method to deal with modeling
errors and completely reject disturbances, making these points valid topics for future research. Nonetheless, some
degree of robustness to model mismatch and disturbances can be implemented in the control algorithm by means
of error-correction filtering (Camacho & Bordons, 2013) and Kalman filters (Brown & Hwang, 1992).
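One simple realization of such error-correction filtering, in the spirit of Jordanou et al. (2018), is sketched below; the filter constant `alpha` and the purely additive correction are assumptions, not the cited implementation.

```python
import numpy as np

class ErrorCorrector:
    """Low-pass filter of the measurement/prediction mismatch, added as
    an offset to the PINC predictions to reduce steady-state error."""

    def __init__(self, alpha=0.9, n_outputs=2):
        self.alpha = alpha               # filter constant (assumption)
        self.d = np.zeros(n_outputs)     # current mismatch estimate

    def update(self, y_measured, y_predicted):
        # filter the discrepancy between sensor reading and PINC prediction
        self.d = (self.alpha * self.d
                  + (1.0 - self.alpha) * (y_measured - y_predicted))

    def correct(self, y_predicted):
        # shifted prediction passed on to the MPC optimization
        return y_predicted + self.d
```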
While works such as Cheng et al. (2021) and Wei et al. (2021) focus on proving stability theoretically through
the use of Lyapunov functions, we showcase the robustness of PINC experimentally, since our focus is more
application-oriented. To test the robustness of the proposed formulation to parameter mismatch, a sensitivity
analysis was performed for the four tanks scenarios previously presented.

Figure 13: Control of the four tanks system with PINC. The controlled variables are the tank levels h1 and h2
given by blue and pink lines, respectively, whereas the reference trajectory for h1 (h2 ) is given by a dashed black
(grey) step signal. The control inputs u are the manipulated voltages shown in the lower plot, found by MPC.
Dashed grey horizontal lines represent the lower and upper limits for both h3 and h4 . Left: simulation totalling
2400s. Right: close-up on the first 160s. The initial conditions for h1 and h2 are (2, 2), which is the minimum of
the allowed interval [2, 20]. See text for more details.


Figure 14: Control of the four tanks system with PINC and with the RK/ODE model, for timesteps 500s to 1300s
from Fig. 13. All yellow lines correspond to the simulation where the ODE model itself is used as the predictive
model (solved numerically at each timestep of the prediction horizon), whereas the remaining lines belong to the
simulation with the PINC net as the predictive model.


(a) 5 layers of 20 neurons each (left: k1 and k2 perturbed; right: initial condition perturbed)


(b) 8 layers of 20 neurons each (left: k1 and k2 perturbed; right: initial condition perturbed)

Figure 15: Sensitivity to modeling errors (left) and to initial conditions (right) for the MPC of the four tanks
system with the PINC net as the model. For each plot, 151 runs of the MPC algorithm are executed with a trained
PINC net of 5 layers (top plots) or 8 layers (bottom plots). The resulting IAEs between the references (as in
Fig. 13) and the controlled signals h1 and h2 are computed and shown as a histogram. The initial conditions are
h1 = h2 = 9, which is the middle point of the allowed interval.

The analysis assumed random deviations in the values of the k1 and k2 parameters. The perturbed values k̃1 and
k̃2 are sampled from the uniform distribution U5%(x) = U(0.95x, 1.05x), with x being the nominal value with
which the PINN was trained.
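A sketch of this sampling procedure is given below; the nominal values of k1 and k2 are illustrative placeholders, since they are not restated in this section.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # fixed seed for reproducibility

def perturb(x, pct=0.05):
    """Draw a value from U5%(x), i.e., uniformly from [0.95x, 1.05x]."""
    return rng.uniform((1.0 - pct) * x, (1.0 + pct) * x)

k1_nominal, k2_nominal = 3.33, 3.35    # placeholder nominal values (assumption)
k1_tilde, k2_tilde = perturb(k1_nominal), perturb(k2_nominal)
```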
Altogether, 151 simulations were carried out, injecting the deviations into the system. The results are shown in
the first column of Fig. 15 for two different networks, one with 5 hidden layers of 20 neurons each and another
with 8 hidden layers of 20 neurons each. Despite the random variation of the parameters k1 and k2, the IAE of
the system varies within a tolerance range considered adequate. Fig. 16 shows the control of the four tanks when
the controlled plant has the maximum deviation of 5% in the k1 and k2 parameters, showcasing that the
perturbation only slightly biases the trajectories.
A second experiment consisted of perturbing the initial condition with a uniform distribution. The perturbed
initial conditions for h1 and h2 are sampled from U5%(x) = U(0.95x, 1.05x), with x being the nominal value of 9
for both states. The results are shown in the second column of Fig. 15. The peak of the histogram approximately
coincides with the IAE obtained by the unperturbed model of the plant; thus, other initial conditions can imply
relatively lower or higher IAE. Notice that the bottom plots show instances with lower IAE, evidencing the higher
accuracy of the deeper network, with 8 hidden layers, in this particular situation.
In summary, the sensitivity experiments indicate that the system does not lose much performance in terms of
IAE; in fact, simulations with lower IAE occur more often. Since the PINC control strategy has no integrators, a
small steady-state error that depends on model match is expected. As the model mismatch grows when the
parameters k1 and k2 deviate from the nominal ones, the steady-state error is expected to be higher, but still
within an acceptable range of IAEs.


Figure 16: Control of the four tanks system with PINC as in Fig. 13, but with the maximum perturbation of 5% in
the k1 and k2 parameters of the model being controlled. Initial conditions as in Fig. 15, that is, h1 = h2 = 9.
Control results were adequate even though the IAE was 1022, which lies in the tail of the histogram in the left
plot of Fig. 15(a).

4.3. Discussion
In the context of ODE simulation, our proposal has shown that, after the network is trained, it is possible to
speed up the runtime of these simulations (by up to 30% on average). With further enhancements in network
inference (e.g., parallelization), or when extending our method to PDEs, we envision an even higher gain (e.g.,
the 10x speedup reported in the literature for the simulation of PDEs with PINNs).
One of the main obstacles to a fully effective simulation with a PINC network is the long training time of such
networks. Nonetheless, this is a common issue in most proposals dealing with deep learning, and especially with
PINNs. After training, however, the network can directly predict any state in the range [0, T ] without requiring
integration through intermediate points, as numerical simulation methods do.
We noticed that a precise optimization algorithm towards the end of training (e.g., L-BFGS) is essential for
obtaining a precise model. Besides, preliminary work on identifying more complex plants (e.g., in the oil and gas
industry) shows that skip connections (Lee et al., 2015; He et al., 2016) can ease the training of deep networks by
backpropagating the gradient to the deepest layers during training, further improving the precision of the final
trained model.
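A minimal sketch of this two-stage training is shown below, assuming the composite PINC loss is available as a function returning both the loss value and its gradient with respect to the flattened weights. The hand-rolled Adam loop and the L-BFGS refinement via SciPy are illustrative, with K1 = 500 and K2 = 20,000 matching the settings used earlier.

```python
import numpy as np
from scipy.optimize import minimize

def train(loss_and_grad, theta0, adam_steps=500, lbfgs_steps=20000):
    """Two-stage training: coarse Adam phase, then precise L-BFGS.

    loss_and_grad(theta) -> (loss, gradient) for flattened weights theta
    (an assumed wrapper around the composite PINC loss)."""
    theta, m, v = theta0.copy(), 0.0, 0.0
    lr, b1, b2, eps = 1e-3, 0.9, 0.999, 1e-8
    for t in range(1, adam_steps + 1):       # stage 1: Adam (K1 steps)
        _, g = loss_and_grad(theta)
        m = b1 * m + (1 - b1) * g            # first-moment estimate
        v = b2 * v + (1 - b2) * g**2         # second-moment estimate
        theta -= lr * (m / (1 - b1**t)) / (np.sqrt(v / (1 - b2**t)) + eps)
    # stage 2: L-BFGS refinement (up to K2 iterations)
    res = minimize(loss_and_grad, theta, jac=True, method="L-BFGS-B",
                   options={"maxiter": lbfgs_steps})
    return res.x
```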
Furthermore, challenges to the learning of PINNs can arise from discontinuities in the ODEs that model the
plant, such as the presence of the max operator. Also, the random initialization of the network weights may cause
results to vary across runs and can produce invalid arguments to functions such as the square root, if present in
the ODE equations. Notice that fixes or workarounds can be applied in these cases.

5. Conclusion

We have proposed a new framework that makes Physics-Informed Neural Networks (PINNs) amenable to
control methods, such as MPC, opening a wide range of application opportunities. This Physics-Informed Neural
Nets-based Control (PINC) approach allows a PINN to work for longer-range time intervals that are not fixed
beforehand, without the severe prediction degradation a conventional PINN normally exhibits, and makes it easy
to employ such networks in MPC applications. In control applications, this framework (a) provides a way to
identify a system by integrating collected data from a plant with a priori expert knowledge in the form of ordinary
differential equations; and (b) can simulate differential equations faster than numerical solution methods,
especially if extended to Partial Differential Equations (PDEs), making PINNs more appealing to real-time control
applications. Although we only used initial conditions as real training data, we foresee that using additional
sparse data will make the training of PINC nets much faster.
In future work, we intend to extend the framework to systems described by Differential-Algebraic Equations
(DAEs) and PDEs, and to systems for which prior knowledge is uncertain (unknown parameters), as well as to
apply PINC to industrial control problems, such as in the oil and gas industry, for which some prior knowledge of
the ODEs is available in addition to very noisy or sparse data. We expect that the reduction in the computational
burden of using PINC in control scenarios will be even more relevant, in comparison to the numerical solution
approach, as the model becomes increasingly complex or when the model is described by PDEs. Finally, we
envision that the application of system identification in industrial settings will expand if we use complementary
sources of information for training deep networks, that is, physical laws together with sparse historical data,
making feasible a wide range of previously unsolved applications in systems and control.

Acknowledgments

This work was funded in part by CNPq (Grant 308624/2021-1) and FAPESC (Grant 2021TR2265).

References

Åkesson, B. M., & Toivonen, H. T. (2006). A neural network model predictive controller. Journal of Process
Control, 16, 937–946.
Andersson, J., Åkesson, J., & Diehl, M. (2012). Dynamic optimization with CasADi. In Proceedings of the IEEE
Conference on Decision and Control (pp. 681–686). doi:10.1109/CDC.2012.6426534.
Andrew, G., & Gao, J. (2007). Scalable training of L1-regularized log-linear models. In Proceedings of the 24th
International Conference on Machine Learning ICML’07 (pp. 33–40). New York, NY, USA: Association for
Computing Machinery. doi:10.1145/1273496.1273501.
Antonelo, E. A., Camponogara, E., Seman, L. O., de Souza, E. R., Jordanou, J. P., & Hübner, J. F. (2021).
Physics-informed neural nets for control of dynamical systems. URL: https://arxiv.org/abs/2104.02556.
doi:10.48550/ARXIV.2104.02556.
Biegler, L. T. (2010). Nonlinear Programming: Concepts, Algorithms, and Applications to Chemical Processes.
Philadelphia: SIAM.
Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer.
Brandão, A. S. M. (2018). Controle Preditivo com Geração de Código: Um Estudo Comparativo. Master’s thesis
Universidade Federal de Santa Catarina.
Brown, R. G., & Hwang, P. Y. C. (1992). Introduction to Random Signals and Applied Kalman Filtering. John
Wiley & Sons.
Camacho, E. F., & Bordons, C. (2013). Model Predictive Control. Springer Science & Business Media.
Cavagnari, L., Magni, L., & Scattolini, R. (1999). Neural network implementation of nonlinear receding-horizon
control. Neural Computing & Applications, 8, 86–92.
Chen, J., & Liu, Y. (2021). Probabilistic physics-guided machine learning for fatigue data analysis. Expert Systems
with Applications, 168, 114316. doi:10.1016/j.eswa.2020.114316.
Cheng, P., Chen, M., Stojanovic, V., & He, S. (2021). Asynchronous fault detection filtering for piecewise
homogeneous Markov jump linear systems via a dual hidden Markov model. Mechanical Systems and Signal
Processing, 151, 107353. doi:10.1016/j.ymssp.2020.107353.
Eren, U., Prach, A., Koçer, B. B., Raković, S. V., Kayacan, E., & Açıkmeşe, B. (2017). Model predictive control
in aerospace systems: Current state and opportunities. Journal of Guidance, Control, and Dynamics, 40,
1541–1566. doi:10.2514/1.G002507.
Gill, P. E., Murray, W., & Saunders, M. A. (2005). SNOPT: An SQP algorithm for large-scale constrained
optimization. SIAM Review, 47, 99–131. doi:10.1137/S0036144504446096.
Gokhale, G., Claessens, B., & Develder, C. (2022). Physics informed neural networks for control oriented thermal
modeling of buildings. Applied Energy, 314, 118852.
Grüne, L., & Pannek, J. (2011). Nonlinear Model Predictive Control: Theory and Algorithms. Springer.
Harp, D. R., O’Malley, D., Yan, B., & Pawar, R. (2021). On the feasibility of using physics-informed machine
learning for underground reservoir pressure management. Expert Systems with Applications, 178, 115006.
doi:10.1016/j.eswa.2021.115006.
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
Hertneck, M., Köhler, J., Trimpe, S., & Allgöwer, F. (2018). Learning an approximate model predictive controller
with guarantees. IEEE Control Systems Letters, 2, 543–548.
Iserles, A. (1996). A First Course in the Numerical Analysis of Differential Equations. Cambridge University
Press.
Johansson, K. (2000). The quadruple-tank process: A multivariable laboratory process with an adjustable zero.
IEEE Transactions on Control Systems Technology, 8, 456–465. doi:10.1109/87.845876.
Jordanou, J. P., Antonelo, E. A., & Camponogara, E. (2021). Echo state networks for practical nonlinear model
predictive control of unknown dynamic systems. IEEE Transactions on Neural Networks and Learning
Systems, (pp. 1–15). doi:10.1109/TNNLS.2021.3136357.
Jordanou, J. P., Camponogara, E., Antonelo, E. A., & Aguiar, M. A. S. (2018). Nonlinear model
predictive control of an oil well with echo state networks. IFAC-PapersOnLine, 51, 13–18.
doi:10.1016/j.ifacol.2018.06.348.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kumar, A., Ridha, S., Narahari, M., & Ilyas, S. U. (2021). Physics-guided deep neural network to characterize
non-Newtonian fluid flow for optimal use of energy resources. Expert Systems with Applications, 115409.
doi:10.1016/j.eswa.2021.115409.
Ławryńczuk, M. (2014). Computationally Efficient Model Predictive Control Algorithms. Springer International
Publishing.
Lee, C.-Y., Xie, S., Gallagher, P., Zhang, Z., & Tu, Z. (2015). Deeply-supervised nets. In Artificial Intelligence
and Statistics (pp. 562–570). PMLR.
Liu, X.-Y., & Wang, J.-X. (2021). Physics-informed dyna-style model-based deep reinforcement learning for
dynamic control. Proceedings of the Royal Society A, 477, 20210618.
Meng, X., Li, Z., Zhang, D., & Karniadakis, G. E. (2020). PPINN: Parareal physics-informed neural network
for time-dependent PDEs. Computer Methods in Applied Mechanics and Engineering, 370, 113250.
doi:10.1016/j.cma.2020.113250.
Nascimento, T. P., Dórea, C. E. T., & Gonçalves, L. M. G. (2018). Nonlinear model predictive control for trajectory
tracking of nonholonomic mobile robots: A modified approach. International Journal of Advanced Robotic
Systems, 15. doi:10.1177/1729881418760461.
Nelles, O. (2001). Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy
Models. (1st ed.). Berlin: Springer.
Nocedal, J., & Wright, S. J. (2006). Numerical Optimization. (2nd ed.). New York, NY, USA: Springer.
Normey-Rico, J. E., & Camacho, E. F. (2007). Control of Dead-time Processes. Springer London.
doi:10.1007/978-1-84628-829-6.
Ortega, J. G., & Camacho, E. (1996). Mobile robot navigation in a partially structured static environment, using
neural predictive control. Control Engineering Practice, 4, 1669–1679.
Pan, Y., & Wang, J. (2012). Model predictive control of unknown nonlinear dynamical systems based on recurrent
neural networks. IEEE Transactions on Industrial Electronics, 59, 3089–3101.
Pang, G., & Karniadakis, G. E. (2020). Physics-informed learning machines for partial differential equations:
Gaussian processes versus neural networks. Kevrekidis P., Cuevas-Maraver J., Saxena A. (eds) Emerging
Frontiers in Nonlinear Science. Nonlinear Systems and Complexity, 32, 323–343.
Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2017). Physics informed deep learning (Part I): Data-driven
solutions of nonlinear partial differential equations. arXiv preprint arXiv:1711.10561.
Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning
framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of
Computational Physics, 378, 686–707. doi:10.1016/j.jcp.2018.10.045.
Schultz, W. C., & Rideout, V. C. (1961). Control system performance measures: Past, present, and future. IRE
Transactions on Automatic Control, AC-6, 22–35. doi:10.1109/TAC.1961.6429306.
Sirignano, J., & Spiliopoulos, K. (2018). DGM: A deep learning algorithm for solving partial differential
equations. Journal of Computational Physics, 375, 1339–1364. doi:10.1016/j.jcp.2018.08.029.
Stinis, P. (2020). Enforcing constraints for time series prediction in supervised, unsupervised and reinforcement
learning. In Proceedings of the AAAI 2020 Spring Symposium on Combining Artificial Intelligence and Machine
Learning with Physical Sciences. volume 2587. URL: http://ceur-ws.org/Vol-2587/article_5.pdf.
Terzi, E., Bonetti, T., Saccani, D., Farina, M., Fagiano, L., & Scattolini, R. (2020). Learning-based predictive
control of the cooling system of a large business centre. Control Engineering Practice, 97, 104348.
doi:10.1016/j.conengprac.2020.104348.
Wächter, A., & Biegler, L. T. (2006). On the implementation of an interior-point filter line-search algorithm for
large-scale nonlinear programming. Mathematical Programming, 106, 25–57. doi:10.1007/s10107-004-0559-y.
Wei, T., Li, X., & Stojanovic, V. (2021). Input-to-state stability of impulsive reaction–diffusion neural networks
with infinite distributed delays. Nonlinear Dynamics, 103, 1733–1755. doi:10.1007/s11071-021-06208-6.
Witt, J., & Werner, H. (2010). Approximate model predictive control for nonlinear multivariable systems. Model
Predictive Control, (pp. 141–166). doi:10.5772/46955.
Y. Hafeez, H., Ndikilar, C. E., & Isyaku, S. (2015). Analytical study of the Van der Pol equation in the autonomous
regime. Progress in Physics, 11, 252–255.
Yang, L., Meng, X., & Karniadakis, G. E. (2021). B-PINNs: Bayesian physics-informed neural networks for
forward and inverse PDE problems with noisy data. Journal of Computational Physics, 425, 109913.
Zhai, H., & Sands, T. (2021). Physics-informed deep operator control: Controlling chaos in Van der Pol oscillating
circuits. arXiv preprint arXiv:2112.14707.
Zhu, Y., Zabaras, N., Koutsourelakis, P.-S., & Perdikaris, P. (2019). Physics-constrained deep learning for
high-dimensional surrogate modeling and uncertainty quantification without labeled data. Journal of
Computational Physics, 394, 56–81. doi:10.1016/j.jcp.2019.05.024.
