Data-Based Modelling

This paper discusses recent developments in the data-based modeling and control of nonlinear chemical process systems using sparse identification of nonlinear dynamics (SINDy). SINDy is a recent nonlinear system identification technique that uses only measurement data to identify models of dynamical systems in the form of first-order nonlinear differential equations. In this work, the challenges of handling time-scale multiplicities and noisy sensor data when using SINDy are addressed. Specifically, a brief overview of novel methods devised to overcome these challenges is given, along with modeling guidelines for using the proposed techniques for process systems. When applied to two-time-scale systems, to overcome model stiffness, which leads to ill-conditioned controllers, a reduced-order modeling approach is proposed in which SINDy is used to model the slow dynamics, and nonlinear principal component analysis is used to algebraically ‘‘slave’’ the fast states to the slow states. The resulting model can then be used in a Lyapunov-based model predictive controller with guaranteed closed-loop stability, provided the separation of the fast and slow dynamics is sufficiently large. To handle high levels of sensor noise, SINDy is combined with subsampling and co-teaching to improve modeling accuracy. The challenges of modeling and controlling large-scale systems using noisy industrial data are then addressed by using ensemble learning with SINDy. After summarizing these advances, a nonlinear chemical process is used to provide an end-to-end demonstration of process modeling using sparse identification, with guidelines for chemical engineering practitioners. Finally, several future research directions for the incorporation of SINDy into process systems engineering are proposed.

Keywords: Nonlinear processes; Sparse identification; Subsampling; Two-time-scale processes; Singular perturbations; Co-teaching; Dropout; Ensemble learning; Model predictive control
∗ Corresponding author at: Department of Chemical and Biomolecular Engineering, University of California, Los Angeles, CA, 90095-1592, USA.
E-mail address: [email protected] (P.D. Christofides).
https://doi.org/10.1016/j.compchemeng.2023.108247
Received 17 February 2023; Received in revised form 27 March 2023; Accepted 28 March 2023
Available online 31 March 2023
0098-1354/© 2023 Elsevier Ltd. All rights reserved.
F. Abdullah and P.D. Christofides Computers and Chemical Engineering 174 (2023) 108247
particularly of interest in recent years, artificial neural networks. For example, in Wu et al. (2019a,b), recurrent neural networks were used to model nonlinear processes, and subsequent closed-loop stability results under recurrent neural network-based model predictive control were derived. Autoencoders, which are feedforward neural networks (FNN) that replicate the input at their output, have been the subject of several studies. While linear autoencoders correspond exactly to PCA, Kramer (1991) proposed the use of nonlinear autoencoders, which use nonlinear activation functions, as a form of nonlinear PCA. Autoencoders, particularly undercomplete autoencoders, are a powerful tool for dimensionality reduction due to the enforced reduction of the dimension in the intermediate or hidden layers of the network. Tsay and Baldea (2020) used undercomplete autoencoders to carry out nonlinear dimensionality reduction and build reduced-order models for integrated scheduling and control of chemical process operations. When the system identification and optimal scheduling computations were conducted in the latent variables with reduced dimensionality, it was found that the computational efficiency as well as the level of dynamic information provided were improved. In Schulze et al. (2022), Koopman theory was used to derive a Wiener-type formulation for handling multiple-input multiple-output (MIMO) input-affine dynamical systems. Specifically, reduced-order surrogate models were developed by combining autoencoders with linear dynamic blocks. The models were hypothesized to be particularly useful in control applications due to the high accuracy and dimensionality reduction capabilities of the proposed Wiener-type Koopman models. The integration of a Gaussian process model with MPC was proposed in Likar and Kocijan (2007) and applied to a gas–liquid separation process. The simple model structure of the Gaussian process model and, more importantly, the statistical information such as the prediction uncertainty provided by such a model were found to be desirable qualities for control-centric applications.

A potential drawback of traditional ML models has been their black-box nature, which limits their applicability and adoption in process systems engineering. Therefore, the field of hybrid modeling, sometimes referred to as ‘‘gray-box’’ modeling, which aims to combine a priori first-principles knowledge or domain expertise with black-box approaches such as ML models to improve both the accuracy and interpretability of the overall model, has recently attracted significant attention. Bikmukhametov and Jäschke (2020) outlines a number of approaches to incorporating physics into data-driven modeling including but not limited to:

(1) feature engineering, which refers to domain experts selecting and/or creating physically meaningful features from the data set obtained from sensors rather than using the raw measurements directly,
(2) residual modeling, which refers to building an ML model to model the residual between the known first-principles model and sensor measurements in order to build a model that captures the plant-model mismatch, and
(3) linear meta-model of models, where the solutions from multiple sub-models, which correspond to various parts of the overall system and are obtained using feature engineering, are combined into a linear meta-model by taking a weighted linear combination of all the models to represent the overall system accurately once the weights are tuned.

Alhajeri et al. (2021) investigated the use of FNNs to build state estimators in the absence of full-state feedback. Specifically, an FNN was used to model the nonlinear terms in the dynamics such as those corresponding to chemical reactions. In Alhajeri et al. (2022a,b), the links between layers of a recurrent neural network (RNN) were disconnected (i.e., corresponding weights zeroed) based on the process structure, leading to the elimination of erroneous model predictions and improved overall model accuracy. Sansana et al. (2021) provides a detailed overview of hybrid modeling and its evolution over the last three decades since it became a subject of interest in the scientific community. Sansana et al. (2021) reports that the a priori knowledge to be incorporated into hybrid modeling has typically been in the form of equations, and other forms of data/information such as plant floor experience and process flow sheets have not been investigated exhaustively. It was also found that data-driven modeling has typically been used to enhance previously known or derived mechanistic models, but the reverse, i.e., using mechanistic models to improve or constrain data-driven models, is largely unexplored. In the field of process monitoring and fault diagnostics, in particular, Sansana et al. (2021) highlighted the benefit of knowledge of causality that can be inferred via hybrid modeling.

The area of surrogate modeling in process systems engineering has, in parallel with the above directions, increased in research intensity. McBride and Sundmacher (2019) provides a comprehensive overview of advances in surrogate modeling in chemical engineering over the past three decades. A primary reason for the surge of interest in surrogate models is the increasing complexity of modern, highly accurate models used to simulate or model the nonlinear processes, scheduling problems, and complex thermodynamics that are ubiquitous in chemical engineering. Despite their increasing accuracy, such models encounter a number of challenges in downstream optimization and control applications. Due to the model complexity, the computational expense in terms of both processing power and time required to evaluate such models is exorbitantly large in many cases. While single function evaluations may be feasible in a practical setting, if the models are to be embedded into an optimization problem, such as set-point optimization or closed-loop control under an MPC, the computational demand becomes prohibitive due to the large number of function evaluations (typically hundreds or thousands) required to find such solutions. This is further complicated by black-box models since no simplification, such as omission of a term or otherwise, may be performed to find a compromise between model complexity and accuracy. If the type of model used is noisy or has discontinuities, this further complicates the problem, especially since, in this case, finite differences cannot be used to estimate derivatives, which are crucial in optimization. To overcome these challenges, mathematically simpler models known as surrogate models have been proposed to approximate the input/output relationship of the complex models using much fewer model parameters and with much lower computational costs.

Although surrogate models can be used to approximate more complex models, another approach is to start with simpler model structures to model the desired system of interest and only add complexity as required. While methods such as N4SID and MOESP have been widely used over the past decades with varying degrees of success depending on the application and severity of the nonlinearities present, sparse identification for nonlinear dynamical systems (SINDy) is a recent method that aims to identify nonlinear ODEs directly from data, which are explicit and in closed form, allowing them to be directly incorporated into MPC or any other optimization problem. Due to the availability of efficient differential equation solvers, the computational cost of integrating such models is generally low, especially if the models are well-conditioned. In the field of chemical engineering, SINDy has been used to identify reaction networks (Hoffmann et al., 2019) and to build reduced-order models for modeling and controlling a hydraulic fracturing process (Narasingam and Kwon, 2018). Despite the application of SINDy to several chemical process examples in the literature, a number of specific issues encountered in the modeling and control of chemical processes and plants remain to be addressed adequately, based on our review of the literature. Therefore, this paper provides a unified summary of recent advancements and novel extensions to SINDy to overcome numerous challenges that are encountered when applying SINDy to the domain of chemical engineering. Besides providing general guidance with respect to basis functions and numerical concerns, more specifically, the difficulties of modeling multiscale systems, noisy sensor data, and industrial processes are discussed.

In this manuscript, we apply SINDy to model and control three types of process systems:
(1) processes with time-scale multiplicities,
(2) simulated processes with high levels of sensor noise, and
(3) large-scale processes corrupted with high levels of industrial noise.

Each category of systems has associated challenges, which are addressed using different improvements upon the original SINDy algorithm; the details will be discussed in the respective sections. We note that, although SINDy was introduced with the intent of identifying the governing physical laws as closed-form differential equations consistent with the known physics of the system of interest, the application of SINDy is not limited to such cases. As the product of SINDy is a closed-form ODE model with explicit nonlinearities, the resulting model can be directly incorporated into an MPC for efficient computations. Therefore, in this work, we use SINDy as a system identification algorithm with the ultimate goal of building dynamical models for controllers.

The rest of this manuscript is outlined as follows: in Section 2, the general class of nonlinear process systems under consideration is described. Section 3 details the SINDy algorithm along with general guidelines and tuning considerations for building SINDy models, and its formulation in a model predictive controller. In Section 4, the challenges of two-time-scale systems are discussed, while Sections 5 and 6 address the challenges of noisy data for simulated processes and large-scale industrial processes, respectively. A detailed, end-to-end practical demonstration of applying SINDy to a highly nonlinear chemical process is given in Section 7. Finally, Section 8 provides a number of research directions for furthering the application of SINDy for process modeling and control.

2. Class of nonlinear process systems

We consider the class of nonlinear process systems described by the following first-order ODE:

𝑥̇(𝑡) = 𝑓(𝑥) + 𝑔(𝑥)𝑢 + 𝑤,  𝑥(𝑡₀) = 𝑥₀   (1)

where 𝑥 ∈ Rⁿ is the state vector, 𝑢 ∈ Rʳ is the manipulated input vector, and 𝑤 ∈ Rⁿ is the noise vector. The unknown vector and matrix functions 𝑓 ∈ Rⁿ and 𝑔 ∈ Rⁿˣʳ, respectively, constitute the process model representing the inherent physical laws constraining the system and are assumed to be locally Lipschitz vector and matrix functions of their arguments with 𝑓(0) = 0. The manipulated input is constrained to lie in 𝑟 nonempty convex sets defined as 𝑢ᵢ ∈ 𝑈ᵢ ⊆ R, 𝑖 = 1, … , 𝑟. The sensor noise 𝑤 is assumed to be bounded within the set 𝑊 ∶= {𝑤 ∈ Rⁿ ∶ ‖𝑤‖₂ ≤ 𝜃}, 𝜃 > 0. The class of systems of the form of Eq. (1) is further restricted to the family of stabilizable nonlinear systems, i.e., there exist a sufficiently smooth control Lyapunov function 𝑉(𝑥) and a control law 𝛷(𝑥) = [𝛷₁(𝑥) ⋯ 𝛷ᵣ(𝑥)]⊤ that renders the nominal (𝑤 ≡ 0) closed-loop system of Eq. (1) asymptotically stable under 𝑢 = 𝛷(𝑥). The stability region 𝛺𝜌 is defined as the largest level set of 𝑉 where 𝑉̇ is rendered negative. Without loss of generality, the initial time 𝑡₀ is taken to be 0 throughout the article.

3. Sparse identification of nonlinear dynamics (SINDy)

Based on sparse regression and compressive sensing, sparse identification of nonlinear dynamics (SINDy) is a novel method in the field of system identification (Bai et al., 2015; Brunton et al., 2016) and has been applied to a diverse array of engineering problems (Bhadriraju et al., 2020). The aim of SINDy is to use only input/output data from a system to represent the dynamics in the form of the nominal system of Eq. (1),

𝑥̂̇(𝑡) = 𝑓̂(𝑥̂) + 𝑔̂(𝑥̂)𝑢   (2)

where 𝑥̂ ∈ Rⁿ is the state vector of the sparse-identified model, and 𝑓̂ and 𝑔̂ are the model parameters that capture the physical laws governing the system.

Since most physical systems contain only a few terms in the right-hand side of Eq. (2), if a large number of nonlinear basis functions are considered as possible terms in 𝑓̂ and 𝑔̂, the space of all candidate functions considered is rendered sparse. Hence, SINDy aims to identify the small number of active functions in 𝑓̂ and 𝑔̂ using algorithms that leverage sparsity. We first obtain a discrete set of full-state measurements from open-loop simulations or experiments and concatenate them into a data matrix 𝑋 and an input matrix 𝑈,

$$X = \begin{bmatrix} x_1(t_1) & x_2(t_1) & \cdots & x_n(t_1) \\ x_1(t_2) & x_2(t_2) & \cdots & x_n(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(t_m) & x_2(t_m) & \cdots & x_n(t_m) \end{bmatrix} \tag{3a}$$

$$U = \begin{bmatrix} u_1(t_1) & u_2(t_1) & \cdots & u_r(t_1) \\ u_1(t_2) & u_2(t_2) & \cdots & u_r(t_2) \\ \vdots & \vdots & \ddots & \vdots \\ u_1(t_m) & u_2(t_m) & \cdots & u_r(t_m) \end{bmatrix} \tag{3b}$$

where 𝑥ᵢ(𝑡𝓁) and 𝑢ⱼ(𝑡𝓁) represent the measurement of the 𝑖th state and 𝑗th input at the 𝓁th sampling time, respectively, where 𝑖 = 1, … , 𝑛, 𝑗 = 1, … , 𝑟, and 𝓁 = 1, … , 𝑚. 𝑋̇, the time-derivative of 𝑋, is a required matrix in the sparse identification algorithm and is either measured if possible (e.g., velocity) or otherwise estimated from 𝑋. Subsequently, a function library 𝛩(𝑋, 𝑈) is constructed with 𝑠 nonlinear functions of 𝑋 and 𝑈. These 𝑠 functions are the candidate nonlinear functions that may be zero or nonzero in the right-hand side of Eq. (2). The sparse identification algorithm exploits sparsity to calculate the coefficients associated with the terms in the library, 𝛩. Given the universality of monomials, polynomials, and trigonometric functions in engineering systems (Brunton et al., 2016), they are often selected as the initial library in 𝛩. An example of an augmented library is

𝛩(𝑋, 𝑈) = [ 𝟏  𝑋  sin(𝑋)  e^𝑋  𝑈  𝑈𝑋² ]   (4)

The goal of sparse identification is to find each of the 𝑠 coefficients associated with the 𝑠 nonlinear functions considered in 𝛩 for each row of Eq. (2). Each state 𝑥ᵢ corresponds to a sparse vector of coefficients, 𝜉ᵢ ∈ Rˢ, that represents the nonzero terms in 𝑓̂ᵢ and 𝑔̂ᵢ in the respective ODE, 𝑥̂̇ᵢ = 𝑓̂ᵢ(𝑥̂ᵢ) + 𝑔̂ᵢ(𝑥̂ᵢ)𝑢. Consequently, there are 𝑛 such coefficient vectors that must be calculated. In matrix notation, the unknown quantity is

𝛯 = [ 𝜉₁  𝜉₂  ⋯  𝜉ₙ ]   (5)

which is found by solving the following equation:

𝑋̇ = 𝛩(𝑋, 𝑈)𝛯   (6)

𝛯 = arg min_{𝛯′} ‖𝑋̇ − 𝛩(𝑋, 𝑈)𝛯′‖₂ + 𝜆‖𝛯′‖₁   (7)

where the first term maximizes the fidelity of the model to the data, while the second term is an 𝐿₁ regularization term that ensures sparsity of 𝛯. In Eq. (7), 𝛯′ is a notational substitute for 𝛯. To solve Eq. (7), the least-squares problem is written in the following form, which may be solved using a standard solver for a linear system of equations:

𝛯 = arg min_{𝛯″} ‖𝑋̇ − 𝛩(𝑋, 𝑈)𝛯″‖₂   (8)
where the matrix 𝛯″ is 𝛯′ with all coefficients having an absolute value below 𝜆 set to zero. Eq. (8) is repeatedly solved until convergence of the non-zero coefficients. The iterations typically converge rapidly due to the sparse structure of 𝛯. An alternate algorithm to solve Eq. (6) is known as Sparse Relaxed Regularized Regression (SR3), which is based on the well-known LASSO operator (Zheng et al., 2019). After finding 𝛯 using either method, the identified model can be formulated as the continuous-time differential equation,

𝑥̇ = 𝛯⊤(𝛩(𝑥⊤, 𝑢⊤))⊤

where 𝛩(𝑥⊤, 𝑢⊤) is a column vector containing symbolic functions of 𝑥 and 𝑢 from the chosen function library, and 𝑥⊤ represents the transpose of 𝑥.

3.2. Data generation and SINDy modeling considerations

When applying SINDy to an engineering problem, a number of factors affect the results and must be carefully considered before and during the construction of a SINDy model.

3.2.1. Data generation
Data for system identification methods is typically obtained from either open-loop simulations or open-loop experiments. The sampling period used to record the data, the variation of system inputs and outputs considered for data generation, and the distribution of the data set are some of the properties that affect the amount of dynamic information contained in the data set and, as a result, the model quality. Firstly, as information is lost when continuous data is sampled into discrete data, a higher sampling rate (lower sampling period) generally leads to better system identification for any method including SINDy, especially since SINDy requires estimates of the time-derivative of the states using finite differences or some variant thereof. However, it is important to consider practical limitations in terms of sampling. While an extremely small sampling period of 10⁻⁵ units may produce a data set with high information density, from which derivative estimations can likely be made very accurately, leading to the identification of better models, such a high sampling rate is typically not possible to achieve in a chemical process application or even in many other engineering disciplines. Instead, the sampling period should be chosen to be as small as reasonably possible, which would also be desirable in practice. Manufacturer specifications of the relevant type of sensors for process variables may be used as a lower bound on the sampling period for simulations-based studies.

Secondly, the dynamic information captured in a data set is dependent on the initial conditions chosen, the input signal variation, and the total simulation duration. The chosen combinations of initial conditions and input variables must cover as much of the operating region of interest as possible, and the simulation should be run until it reaches the desired steady-state of operation, in order to maximize the dynamic information captured in the data set. In contrast, if the data collection is carried out using a narrow range of initial conditions and/or inputs, or if a large part of the trajectories are zero values at the steady-state due to excessive run time, the data set may be large but contain little dynamic information to build an adequate model from. Furthermore, based on our studies, SINDy modeling works best with longer trajectories, even if from fewer initial conditions, rather than a large number of extremely short trajectories from many random initial conditions. The open-loop runs, whether experimental or simulations-based, should also reflect the various types of actions that are relevant in a control setting. For example, a number of trajectories should use a nonzero input to drive the system to various regions of the state space, which will assist the model in identifying the input dynamics. However, a few runs should also initiate the system away from the steady-state and let the states approach the steady-state under no control action, provided that the steady-state of interest is a stable one. If the data set is generated following the above best practices, it should yield an independent and identically distributed (i.i.d.) data set with maximum dynamic information and the least redundancy/repetition.

Lastly, when dealing with a specific type of system, any unique characteristics of the system that may hinder or facilitate data generation and quality should be considered. For example, since the goal is to capture as much dynamic information as possible and not collect redundant data over a large period of the simulation with constant values for all variables (i.e., after the system reaches a steady state), when dealing with multiscale systems, techniques such as ‘‘burst sampling’’ have been proposed in Champion et al. (2019). Burst sampling refers to the use of a short sampling time in regions with higher gradients and faster dynamics, such as the fast transient of the fast subsystem(s) of a multiscale system, while reducing the sampling rate once the fast states converge to the slow manifold. Such advanced sampling strategies greatly reduce data storage requirements, and allow the user to retain only the most informative bits of data to be used for modeling and control. These advanced data acquisition strategies should be used instead of mere iterative procedures. On the other hand, if the system operates at or near an unstable steady-state, integrating the system for extended periods of time may lead to the states diverging (if the system does not have another steady-state that is stable), which will cause errors during run time and hinder the data generation. Hence, for unstable operating points, it is desirable to use multiple shorter trajectories. Such facilitation and difficulties of data generation must be considered on a case-to-case basis for the system being studied.

3.2.2. Data preprocessing
In any machine learning (ML) application, it is essential to preprocess the data before training a model. The two preprocessing steps required to apply SINDy are the train/test split and the normalization of the data set.

With respect to the split, the data set must first be split into the training and test sets. Most of the training data set is used to regress the model coefficients, while a small fraction of the training set is reserved as the validation set, which is used to tune the hyper-parameters. Once the optimal set of hyper-parameters is found, the model is finalized on the entire training data with the selected hyper-parameters. The final model is then benchmarked against the unseen data, which is the test set, also referred to as open-loop tests in control applications. The train–validation–test split ratio is arbitrary to an extent, although general rules and best practices exist. The training set should generally be the largest because the model performance is mostly related to the training data set, which is used to find the model parameters, while the test set is only used to gauge the model accuracy post-training. In fact, as long as the data set is i.i.d., increasing the size of the training set will always lead to an improvement of the model accuracy. The train–validation–test proportions are also determined by the application. For example, for methods with a large number of hyper-parameters to tune, such as neural networks, where large volumes of data are usually available, it may be more valuable to have a larger validation set, e.g., a 50-25-25 train–validation–test split. On the other hand, for SINDy, since there is only one key hyper-parameter to tune, a larger training set may be warranted, such as a 60-20-20 or 70-15-15 split. When the data set poses additional challenges such as noise, it may warrant an 80-10-10 split, since training on noisy data is often a difficult task and requires significant amounts of data, especially if any smoothing or other, additional preprocessing steps are required.

A number of methods exist to normalize, i.e., center and scale, the data set. Three common methods for normalization are the 𝑧-score scaler, Min-Max scaler, and Max-Abs scaler, of which the first two methods both center and scale the data, while the Max-Abs scaler only scales it. Specifically, the 𝑧-score scaler first centers the data set to its mean value by subtracting the mean, and then scales the data set to have unit variance by dividing by the standard deviation. The Min-Max scaler divides each number by the range of the data after subtracting the minimum value of the data set and then adds the minimum value
back to the scaled number in order to transform all data points to values between a lower and upper limit, usually 0 and 1, respectively. The Max-Abs scaler only scales the data to be between ±1 by dividing the data set by its maximum absolute value, without any subtraction or centering. While the methods are described for a single variable, for the multivariate case, the above operations are independently carried out on each variable or column of the data set. For chemical processes, as the process inputs and outputs are often written in deviation form from their steady-state values, further centering may not be as crucial; all variables will attain a value of zero at the steady-state. However, due to the large differences in the orders of magnitude between the variables, such as between concentrations and temperatures, scaling the data set is necessary in most cases. When using SINDy, where the sign of the coefficients associated with certain terms can contain information on the process dynamics (e.g., an increase in the input heating rate should lead to an increase in the temperature), methods that scale without centering, such as the Max-Abs scaler, can be a reasonable starting point when deciding on a normalization method, as was also observed in some of our results.

3.2.3. Hyper-parameter tuning
In the basic SINDy algorithm, the model structure and accuracy are simultaneously controlled and balanced by a single hyper-parameter, 𝜆. Therefore, tuning it is essential and is usually carried out via a fine search or a coarse-to-fine search. The latter is computationally efficient and is used in this work. A coarse-to-fine search can be justified by the fact that, for appropriately scaled data, for most systems, no nonzero terms will remain in the SINDy model for large values of 𝜆, such as a 𝜆 that is an order of magnitude greater than the scaled data set. In contrast, extremely small values of 𝜆 will yield dense models that are prone to instability as well as redundant in the basis functions. Hence, a coarse search can be used to bound the region where a finer search can be carried out to identify the optimal model that yields the lowest loss or error metric. Fig. 1 demonstrates how this process can be used to select the optimal model (corresponding to the orange point) through a very fine search, or even models very close to the optimal in terms of accuracy through a much coarser search (corresponding to the green region). As expected, it can be observed that values of 𝜆 > 1.0 zero all terms in the SINDy model, leading to a constant error for all such 𝜆. At the lower extreme of values of 𝜆, the model is no longer sparse, and some terms that may even lead to an unstable model can start to have nonzero coefficients, in which case the MSE rapidly increases even beyond the case of all the terms being zero. This is especially the case since Fig. 1 is based on the work in Abdullah et al. (2022a), where the case of noisy data is considered.

The basis functions chosen for the candidate library are another central element of the SINDy method and may be treated similarly to a hyper-parameter, in the sense that the library is not entirely arbitrary and may require the addition/removal of basis functions as necessary. Expanding it without computational considerations is not recommended, as the overall optimization problem will then suffer from the curse of dimensionality, while also rendering the model more prone to instabilities due to dense model structures. Therefore, if there is any physical insight on the type of nonlinearities that are potentially relevant to the system of interest, this physical insight should be incorporated into the optimization search (e.g., biasing the order with which the nonlinearities are considered in the optimization search in an approach similar to the ALAMO modeling technique (Wilson and Sahinidis, 2017)). For chemical processes with nonlinear reaction terms, a common consideration may be to include exponential terms involving the temperature, as the Arrhenius rate law is widely used in deriving mass and energy balances for reactors.

For the estimation of the time-derivative matrix 𝑋̇ from noise-free data, a simple forward or centered finite difference is usually adequate and will eventually yield similar results for the model coefficients at the end of the SINDy algorithm. However, if the data is noisy, finite differences are unstable even at low noise levels. Hence, methods robust to noise, such as the total variation regularized derivative (TVRD) and the smoothed finite-difference (SFD), have been proposed (Brunton et al., 2016). TVRD is based on total-variation regularization, which has been widely used in image processing applications. In TVRD, the derivative is computed as the minimizer of a functional using gradient descent. In contrast, in SFD, the data set is first presmoothed using a filter, which may be a low-pass filter or the Savitzky-Golay filter, and finite differences are then computed from the resulting, smoothed data set. As no gradient descent is involved, SFD is computationally faster than TVRD in general. However, when both methods were used in Abdullah et al. (2022b), each method yielded some of the final, optimal models for the various cases studied, making them both reasonable choices to test.

3.3. Incorporation of SINDy within MPC

Model predictive control is an advanced control methodology that utilizes a model of the process to predict the states/outputs over a prediction horizon and compute the optimal control actions by solving an online optimization problem. The formulation of a Lyapunov-based model predictive controller (LMPC) that uses a sparse-identified ODE, 𝐹ₛᵢ(⋅), as the process model is presented below:

J = min_{𝑢∈𝑆(𝛥)} ∫_{𝑡ₖ}^{𝑡ₖ₊ₙ} L(𝑥̃(𝑡), 𝑢(𝑡)) d𝑡   (9a)
s.t.  𝑥̃̇(𝑡) = 𝐹ₛᵢ(𝑥̃(𝑡), 𝑢(𝑡))   (9b)
      𝑥̃(𝑡ₖ) = 𝑥(𝑡ₖ)   (9c)
      𝑢(𝑡) ∈ 𝑈,  ∀ 𝑡 ∈ [𝑡ₖ, 𝑡ₖ₊ₙ)   (9d)
      𝑉̂̇(𝑥(𝑡ₖ), 𝑢) ≤ 𝑉̂̇(𝑥(𝑡ₖ), 𝛷ₛᵢ(𝑥(𝑡ₖ))),  if 𝑥(𝑡ₖ) ∈ 𝛺𝜌̂ ∖ 𝛺𝜌ₛᵢ   (9e)
      𝑉̂(𝑥̃(𝑡)) ≤ 𝜌ₛᵢ,  ∀ 𝑡 ∈ [𝑡ₖ, 𝑡ₖ₊ₙ),  if 𝑥(𝑡ₖ) ∈ 𝛺𝜌ₛᵢ   (9f)

where 𝑥̃ is the predicted state trajectory, 𝑆(𝛥) represents the set of piece-wise constant functions with a period of 𝛥, and 𝑁 is the number of sampling periods within each prediction horizon. 𝑉̂̇(𝑥, 𝑢) is the time-derivative of the Lyapunov function and is equal to (∂𝑉̂(𝑥)/∂𝑥) 𝐹ₛᵢ(𝑥, 𝑢). 𝑢 = 𝑢*(𝑡), 𝑡 ∈ [𝑡ₖ, 𝑡ₖ₊ₙ), denotes the optimal input sequence over the prediction horizon, which is provided by the optimizer. The LMPC applies only the first value in 𝑢*(𝑡ₖ) over the next sampling period 𝑡 ∈ [𝑡ₖ, 𝑡ₖ₊₁), and solves the optimization again at the next sampling time 𝑡ₖ₊₁.

In the MPC formulation, Eq. (9a) is the objective function to be minimized and is chosen to be equal to the integral of L(𝑥̃(𝑡), 𝑢(𝑡)) over the prediction horizon. A typical cost function that achieves a value of zero at the steady-state in the absence of manipulated input action, while simultaneously weighing the deviations in both the state and the input from the origin, is the quadratic stage cost, which is often used in LMPC and is formulated as follows:

L(𝑥̃(𝑡), 𝑢(𝑡)) = 𝑥⊤𝑄₁𝑥 + 𝑢⊤𝑄₂𝑢   (10)

Eq. (9b) describes the sparse-identified model that is used to predict the closed-loop states over the prediction horizon starting from the initial condition of Eq. (9c), while 𝑢 is varied within the constraints defined by Eq. (9d). The last two constraints, Eqs. (9e) and (9f), based on the Lyapunov function 𝑉̂ = 𝑥⊤𝑃𝑥, guarantee that the closed-loop state either moves towards the origin at the next sampling time if the state is outside 𝛺𝜌ₛᵢ, or is contained within 𝛺𝜌ₛᵢ for the entire prediction horizon once the state enters 𝛺𝜌ₛᵢ.
For estimating the time-derivative 𝑋̇ in the right-hand side of The generally nonlinear, non-convex optimization problem of
Eq. (6), which is typically unavailable from sensor measurements, the Eq. (9) is solved at every sampling period, and the first entry of the
ideal method to be used depends on the nature of the data set. For clean optimal 𝑢∗ calculated is sent to the actuator, following which the
data, any finite difference-based approach such as forward, backward, optimization is re-solved at the next sampling period using the new
F. Abdullah and P.D. Christofides Computers and Chemical Engineering 174 (2023) 108247
Fig. 1. Values of two error metrics, the Akaike Information Criterion (AIC) and the mean-squared error, as functions of 𝜆 for model selection.
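As a concrete illustration of the coarse-to-fine search over 𝜆, the sketch below reimplements sequentially thresholded least squares together with an AIC-type selection criterion on a hypothetical one-state system; the function names, toy dynamics, library, and noise level are our own illustrative choices, not those of the cited works:

```python
import numpy as np

def stlsq(theta, x_dot, lam, n_iter=10):
    """Sequentially thresholded least squares: the sparse regression step at
    the core of SINDy. Coefficients below lam are zeroed, survivors refit."""
    xi = np.linalg.lstsq(theta, x_dot, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < lam
        xi[small] = 0.0
        for k in range(x_dot.shape[1]):
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(theta[:, big], x_dot[:, k],
                                             rcond=None)[0]
    return xi

def aic(theta, x_dot, xi):
    """AIC-type criterion balancing fit error against the number of terms."""
    m = x_dot.shape[0]
    mse = np.mean((x_dot - theta @ xi) ** 2)
    return m * np.log(mse) + 2 * np.count_nonzero(xi)

# toy data set: dx/dt = 2x - 0.5x^3 plus mild measurement noise
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=(200, 1))
x_dot = 2.0 * x - 0.5 * x**3 + 0.01 * rng.standard_normal((200, 1))
theta = np.hstack([x**p for p in range(4)])   # library: 1, x, x^2, x^3

# coarse sweep to bound the promising region, then a finer sweep inside it
coarse = [10.0**k for k in range(-4, 1)]
lam_c = min(coarse, key=lambda l: aic(theta, x_dot, stlsq(theta, x_dot, l)))
fine = np.linspace(lam_c / 10.0, lam_c * 10.0, 25)
lam_f = min(fine, key=lambda l: aic(theta, x_dot, stlsq(theta, x_dot, l)))
xi = stlsq(theta, x_dot, lam_f)   # dominant terms: x and x^3
```

As in Fig. 1, thresholds above the largest true coefficient zero every term, while very small thresholds leave noise-level terms in the model; the AIC minimum lies in between.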
state measurement. The optimization is solved using the numerical solver Ipopt (Wächter and Biegler, 2006) with its Python front-end named PyIpopt. For the contractive constraint of Eq. (9e), the universal Sontag control law (Lin and Sontag, 1991) or a well-tuned, stabilizing proportional-only controller may be used. It is important to note that the matrices 𝑃, 𝑄1, and 𝑄2 must be tuned for the LMPC to achieve the best results, and poorly tuned weight matrices may lead to the solver not converging to a solution within the sampling period or the maximum allowed number of iterations.

4. Reduced-order modeling for two-time-scale systems

Time-scale separation is a common phenomenon found in chemical processes such as distillation columns and catalytic continuous stirred-tank reactors (CSTRs) (Chang and Aluko, 1984). If the time-scale separation is not accounted for in a standard nonlinear feedback controller, the controller may be ill-conditioned or even unstable in closed-loop (Kokotović et al., 1999). Due to the distinct slow and fast dynamics in such systems, the process will be represented by stiff ODEs in time when using SINDy without any modification. Such stiff ODEs, when integrated with an explicit integration method such as forward Euler, require a very small integration step size to prevent divergence and yield sufficiently accurate solutions. Hence, Abdullah et al. (2021a) used the mathematical framework of singular perturbations to propose the decomposition of the original two-time-scale system into two lower-order subsystems, each separately modeling the slow and fast dynamics of the original multiscale system. Specifically, following a short transient period, the fast states converge to a slow manifold and can be algebraically related to the slow states using nonlinear functional representations. In Abdullah et al. (2021a), we applied nonlinear principal component analysis (NLPCA), developed by Dong and McAvoy (1996), to capture the nonlinear relationship between the slow and fast states, while using sparse identification to derive well-conditioned, reduced-order ODE models for only the slow states that could then be integrated with much larger integration time steps due to their numerical stability. Once the slow states are predicted with the ODE model, it is possible to use NLPCA to algebraically predict the fast states without any integration.

Nonlinear principal component analysis is a nonlinear extension of principal component analysis (PCA). PCA is a commonly used dimensionality reduction technique that finds a linear mapping between a higher-dimensional space (of the data) and a lower-dimensional space with minimal loss of information by minimizing the squared sum of orthogonal distances between the data points and a straight line. NLPCA attempts to generalize this to the nonlinear case in two steps: first, a 1-D curve that passes through the ‘‘middle’’ of the data points, known as the ‘‘principal curve’’, is found; second, the principal curve is parametrized in terms of the distance of each point along the curve by using a feedforward neural network with nonlinear activation functions. Overall, to make a prediction of the state of the two-time-scale system,
Fig. 2. Demonstration of the evolution of NLPCA based on PCA and its relation to nonlinear regression.
the measurement of the slow states at the current sampling time is passed to an explicit integrator (such as a Runge–Kutta scheme) that integrates the sparse-identified model to predict the slow states over the prediction horizon, which are then sent to the FNN to yield a prediction of the fast states.

Two-time-scale systems can be written in the form,

𝑥̇𝑠 = 𝑓𝑠(𝑥𝑠, 𝑥𝑓, 𝑢, 𝜖)   (11a)

𝜖𝑥̇𝑓 = 𝑓𝑓(𝑥𝑠, 𝑥𝑓, 𝑢, 𝜖)   (11b)

where 𝑥𝑠 ∈ R^𝑛𝑠 and 𝑥𝑓 ∈ R^𝑛𝑓 denote the slow and fast states, respectively, with 𝑛𝑠 + 𝑛𝑓 = 𝑛. 𝜖 is a small positive parameter that represents the ratio of slow to fast dynamics of the original system. By making standard assumptions from the singular perturbation framework, the slow subsystem of Eq. (11a) can be rewritten in the form required for sparse identification,

𝑥̂̇𝑠 = 𝐹𝑠𝑖(𝑥̂𝑠, 𝑢) ∶= 𝑓̂(𝑥̂𝑠) + 𝑔̂(𝑥̂𝑠)𝑢,   𝑥̂𝑠(𝑡0) = 𝑥𝑠0   (12)

where 𝐹𝑠𝑖 is the sparse-identified slow subsystem.

In the first step of NLPCA, we capture the unidimensional principal curve in the 𝑛-dimensional state space to find the nonlinear algebraic relationship between the slow and fast states, as shown in Fig. 2. The curve, denoted by 𝑓(𝜇), is parametrized in terms of the ordered arc-length along the curve, 𝜇. If 𝑥̄ ∈ R^𝑛 is the full-state vector, we can define the projection index 𝜇 ∶ R^𝑛 → R as:

networks (Hornik et al., 1990; Hornik, 1991). To improve the network capability, a two-hidden-layer FNN was used in this work, as depicted in Fig. 3. The learning rate, which is the most influential hyper-parameter, requires careful tuning to obtain the optimal FNN model in the second step of NLPCA.

An LMPC that uses Eq. (12) as the process model of Eq. (9b) may be constructed. Such an LMPC will predict the slow states of the two-time-scale system and optimize the cost function based on the predicted slow states. Due to the coupled nature of the states, it is sufficient to stabilize the slow states to guarantee asymptotic stability for the entire system. However, if computational resources are available, the FNN may be used to predict the fast states, and the LMPC can then account for the full state of the system. In Abdullah et al. (2021b), only the slow subsystem was used to ensure the LMPC optimization can be solved within every sampling period.

The primary advantage of the reduced-order model in LMPC is that the lower computational cost of the SINDy model inference, with nearly zero loss in model accuracy, directly impacts the difficulty of the optimization required to be solved by the LMPC. Hence, the LMPC based on the reduced-order SINDy model can use a longer prediction horizon, which has the potential to improve closed-loop performance in terms of faster convergence to the origin and a lower total cost function over the simulation duration, the former of which is demonstrated most clearly in the concentration profile in Fig. 4, which is based on the work in Abdullah et al. (2021b).
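The overall prediction scheme can be sketched as follows; the toy two-time-scale system, the trivial algebraic slaving map (standing in for the trained NLPCA FNN), and all numerical values below are illustrative assumptions rather than the systems studied in the cited works:

```python
import numpy as np

# Toy two-time-scale system in the form of Eq. (11) (illustrative only):
#   dxs/dt = -2*xs + xf,   eps*dxf/dt = xs - xf,   with eps << 1.
# Setting eps*dxf/dt ≈ 0 "slaves" the fast state to the slow one (xf ≈ xs),
# which gives the non-stiff reduced slow model dxs/dt = -xs.

def slow_model(xs):
    """Reduced-order slow subsystem, playing the role of F_si in Eq. (12)."""
    return -xs

def slave_map(xs):
    """Algebraic fast-state prediction (stand-in for the trained NLPCA FNN)."""
    return xs

def rk4_step(f, x, h):
    k1 = f(x)
    k2 = f(x + 0.5 * h * k1)
    k3 = f(x + 0.5 * h * k2)
    k4 = f(x + h * k3)
    return x + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# integrate ONLY the slow state, with a step size far larger than an explicit
# scheme could use on the stiff full-order system (which would need h ~ eps),
# then recover the fast state algebraically at every sampling instant
h, n_steps = 0.05, 100
xs = 1.0
slow_pred, fast_pred = [], []
for _ in range(n_steps):
    xs = rk4_step(slow_model, xs, h)
    slow_pred.append(xs)
    fast_pred.append(slave_map(xs))   # no integration of the fast dynamics

t = h * np.arange(1, n_steps + 1)
err = np.max(np.abs(np.array(slow_pred) - np.exp(-t)))  # vs. exact exp(-t)
```

The large explicit step is stable here precisely because the reduced slow model contains no fast mode; the fast state costs only one algebraic evaluation per sampling instant.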
Fig. 4. Concentration (state) profile for a CSTR in closed-loop under the LMPC utilizing the first-principles (FP) model with 𝑁 = 16 (blue line) and the SINDy slow model with
𝑁 = 24 (orange line).
variance of 10⁻⁴, which is very small in the context of process systems. Although a number of works can be found that focus on alternate approaches to build dynamical models in the presence of noise, such as Runge–Kutta time-steppers with embedded neural networks to handle the nonlinear elements (González-García et al., 1998; Fablet et al., 2018; Raissi et al., 2018; Rudy et al., 2019), these are alternatives to SINDy rather than improvements upon the original SINDy method. As a result, the methods to assist the modeling of noisy data as well as the subsequent results are largely different from SINDy and its extensions. For example, the unexpected results of Raissi et al. (2018) when using Runge–Kutta time-steppers were later explained using well-known characteristics of neural networks. More advanced time-steppers such as the work of Rudy et al. (2019) also emphasize their limitations when integrating the models from new initial conditions or attempting to capture dynamics away from a steady state, both of which are relevant in control-centric applications. While a detailed discussion of the comprehensive literature can be found in Abdullah et al. (2022b), in summary, one paper proposed an improvement upon the SINDy algorithm in the presence of moderate noise that demonstrated promise and could be developed further. This method, proposed by Zhang and Lin (2021), termed subsampling-based threshold sparse Bayesian regression (SubTSBR), involved randomly subsampling a fraction of the entire data set multiple times and selecting the best model by using a model-selection criterion. The issue of noisy data has also been
Fig. 5. Data flow diagram of subsampling with co-teaching for noisy data.
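The data flow of Fig. 5 can be sketched under simplifying assumptions (a hypothetical scalar system, plain least squares in place of the full sparse regression step, and arbitrary values of the noisy and noise-free subsampling fractions and the number of sub-models, denoted 𝑝, 𝑞, and 𝐿 later in this section):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy scalar dynamics dx/dt = -x + 0.5x^2; the "plant" data is heavily noisy,
# while the "FP" data comes from a simplified first-principles model (dx/dt = -x)
m = 400
x = rng.uniform(-1.5, 1.5, m)
xdot_noisy = -x + 0.5 * x**2 + 0.8 * rng.standard_normal(m)  # very noisy
x_fp = rng.uniform(-1.5, 1.5, m)
xdot_fp = -x_fp                                              # noise-free, approximate

def library(z):
    """Candidate function library Theta: [1, x, x^2]."""
    return np.column_stack([np.ones_like(z), z, z**2])

p, q, L = 0.2, 0.05, 30   # subsampling fractions and number of sub-models
candidates = []
for _ in range(L):
    i_n = rng.choice(m, int(p * m), replace=False)   # noisy subsample
    i_f = rng.choice(m, int(q * m), replace=False)   # noise-free subsample
    X = np.concatenate([x[i_n], x_fp[i_f]])          # mix the two data sets
    Y = np.concatenate([xdot_noisy[i_n], xdot_fp[i_f]])
    xi = np.linalg.lstsq(library(X), Y, rcond=None)[0]
    # AIC-style model selection, scored on the full noisy data set
    mse = np.mean((xdot_noisy - library(x) @ xi) ** 2)
    candidates.append((m * np.log(mse) + 2 * np.count_nonzero(xi), xi))

best_xi = min(candidates, key=lambda c: c[0])[1]   # roughly [0, -1, 0.5]
```

Keeping 𝑞 small, as recommended below, biases each sub-model only mildly toward the approximate first-principles dynamics while still regularizing the fit against the noise.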
studied in the field of computer science, where fitting a neural network to noisy data often leads to the neural network overfitting the data and capturing the noisy pattern instead. A recent technique proposed to overcome this challenge is co-teaching, where a simplified first-principles process model is used to generate noise-free training data to assist the model training step by reducing overfitting. In this section, we propose a novel extension to SINDy that combines it with subsampling and co-teaching to handle highly noisy sensor data.

Subsampling is a classical statistical technique where a fraction of the total number of samples in a data set is randomly extracted and analyzed to estimate statistical parameters (Efron and Stein, 1981) or speed up algorithms (Rudy et al., 2017). However, subsampling can also be used to instead improve the modeling accuracy of SINDy when the data set is highly noisy. This is because common regression methods such as least squares utilize the complete data set by assuming that only a small fraction of the data samples are highly noisy or outliers. As a result, if the entire data set is used, the higher percentage of ‘‘good’’ data samples should smooth the large noise present in the data set. However, this assumption breaks down if the noise is either very high or uniformly present throughout the data set. In such a case, there are insufficient ‘‘good’’ data samples to smooth out the noise from the very highly corrupted data samples. In the context of SINDy, subsampling refers to selecting random fractions of the data set multiple times in order to sample only the less noisy data points for carrying out the sparse regression. The key requirement for subsampling is that the number of unknown weights to be estimated in the SINDy procedure has to be fewer than the number of total data samples available (i.e., the problem has to be overdetermined), which is the case for most practical data sets. Although, as a standalone improvement, subsampling greatly improves the performance of SINDy under moderate noise levels, it is insufficient at higher noise levels, where co-teaching becomes necessary.

Co-teaching is a method that has been used in the field of computer science, primarily in image recognition, where neural networks are trained to categorize images into pre-defined classes. However, often, a small proportion of the images in the training data set may be mislabeled, greatly deteriorating the performance of the neural network. As manually relabeling vast amounts of images is not feasible, the method of co-teaching was proposed, wherein newly generated noise-free data is fed during model training to reduce the impact of the noisy data. The concept has recently been extended to regression problems, specifically the modeling of dynamical systems using long short-term memory (LSTM) networks (Wu et al., 2021a,b). The central idea of co-teaching, which was highlighted in Wu et al. (2021b), is that neural networks fit simpler patterns in the early iterations of model training, which implies that noise-free data will yield low values of the loss function, while noisy data will tend to produce high loss function values. Therefore, the training can be made more robust to noise and overfitting if the noisy data is augmented with a nonzero proportion of noise-free data from simulations of simplified, approximate first-principles models that can be derived for the complex, original nonlinear system.

Improving the sparse identification algorithm with both subsampling and co-teaching enables it to tackle consistently noisy data sets where subsampling alone is insufficient. This is because subsampling only extracts, in the best-case scenario, the least noisy data points, which are still too noisy to yield an adequate model. In the proposed method, first, a random subset of the entire data set 𝑋noisy and its
corresponding 𝑋̇noisy are sampled, which are then mixed with noise-free data generated from approximate first-principles models of the process, 𝑋FP and 𝑋̇FP. The resulting mixed data set is used to solve for the unknown weights of the terms in the SINDy function library. Once a model is identified, a model-selection criterion is used to evaluate the model performance. Three parameters must be specified in the algorithm: 𝑝 ∈ (0, 1), the subsampling fraction; 𝑞 ∈ (0, 1), the noise-free subsampling fraction; and 𝐿 ≥ 1, the number of times to independently subsample and identify a SINDy model. The algorithm randomly subsamples and mixes 𝑝 × 𝑚 data points from the noisy data set with 𝑞 × 𝑚 data points from the noise-free data set to produce the data and derivative submatrices, 𝑋𝑖 and 𝑋̇𝑖, respectively, for subsample 𝑖 with 𝑖 = 1, 2, … , 𝐿. 𝑈𝑖 are the corresponding (𝑝 + 𝑞) × 𝑚 points from the input matrix 𝑈. The sparse regression equation to be solved is then

𝑋̇𝑖 = 𝛩(𝑋𝑖, 𝑈𝑖)𝛯𝑖   (15)

where 𝛯𝑖 are the coefficients associated with each library function, identified using the data subset {𝑋𝑖, 𝑋̇𝑖, 𝑈𝑖}. Once 𝛯𝑖 is determined and, therefore, the 𝑖th ODE model is found, the process is repeated 𝐿 times until all 𝐿 models are found, following which the model selection criterion is used to extract the optimal model. An example of a model selection criterion that balances the error with the model sparsity, which is crucial for SINDy, is the Akaike Information Criterion (AIC), given by the expressions,

MSE = (1/𝑚) ∑𝑗=1…𝑚 (𝑥(𝑡𝑗) − 𝑥̂(𝑡𝑗))²   (16)

AIC = 𝑚 log(MSE) + 2𝐿0   (17)

where MSE is the mean-squared error, and 𝐿0 denotes the zeroth norm, which is equal to the number of nonzero terms in the sparse-identified model.

The hyper-parameters unique to the subsampling-with-co-teaching algorithm, besides the ones described in Section 3.2.3, are the values of 𝑝, 𝑞, and 𝐿. It should be noted that the goal is to capture the original noisy data rather than the noise-free data from first-principles simulations. Hence, the fraction 𝑞 should generally be small, while 𝑝 can be any real number between 0 and 1 as long as the two fractions satisfy 𝑝 + 𝑞 ≤ 1. While increasing 𝐿 will generally improve the model performance, because a larger number of sub-models are identified for the optimal model to be chosen from, the computational costs of increasing 𝐿 must be considered. Fig. 5 shows the flow of the data throughout the algorithm.

Open-loop modeling results for a CSTR system are shown in Fig. 6, where the base SINDy model is observed to deteriorate in performance at the level of noise considered (Gaussian noise with a standard deviation of 𝜎𝑇 = 4 K in the temperature). Subsampling, even by itself, greatly improves the SINDy model performance, while subsampling with co-teaching further improves the performance. The improvement using co-teaching is most significant at the highest level of noise considered (Gaussian noise with a standard deviation of 𝜎𝑇 = 6 K for the temperature) since models constructed using only subsampling even diverged in some cases (Abdullah et al., 2022b). Visually, the models can be assessed in terms of how close the model predictions are to the data as well as whether the states evolve in the correct direction. Although this is difficult to do for the entire simulation domain at the higher levels of noise, analyzing specific time domains in Fig. 6 can reveal differences between the models. In Fig. 6, in the ranges 𝑡 ∈ [2, 4] ∪ [14, 16], the base SINDy model clearly deteriorates and deviates from the other models and the data, which is mostly concentrated much higher, near the steady-state, indicating the poor performance of the base SINDy model in these regions. The models using subsampling with and without co-teaching can be further differentiated in the regions 𝑡 ∈ [6, 10] ∪ [12, 14], where the subsampling-only model predicts smaller deviations from the steady-state, but the data deviates further from the steady-state than predicted by either subsampling-based model. Therefore, the co-teaching-based subsampling model is closer to the data than the subsampling-only model in these ranges where the states deviate further from the steady-state. However, especially when dealing with noisy data, the modeling performance is best characterized quantitatively in terms of the MSE, which is shown in Table 1.

Table 1
Test set MSE for the CSTR system for four noise levels.

𝜎𝑇 (K)    Base       Only subsampling    Subsampling + co-teaching
0.4       0.01113    0.01059             0.01102
2         0.10510    0.09922             0.10370
4         0.49837    0.40037             0.36283
6         0.98210    1.89607             0.77613

The MSE for subsampling with co-teaching is consistently the lowest across all noise levels except the lowest noise level, where all methods show very similar MSE and the differences are insignificant because of the superior performance of the models from all three methods. At higher noise levels, the differences become more significant, with the subsampling-only model even diverging when 𝜎𝑇 = 6 K. At low to moderate noise levels, however, the MSEs of the models using subsampling, whether with co-teaching or not, are very similar. Therefore, co-teaching should be used once the model performance from using only subsampling deteriorates.

6. Ensemble-based dropout-SINDy to model highly noisy industrial data sets

While subsampling with co-teaching is a viable option to tackle the issue of high sensor noise in the data measurements, the primary drawback of co-teaching is its requirement for a first-principles process model that is at least similar to the original system with respect to the dynamics and the steady-state values. However, in the case of industrial data, the dynamics may be far too complex for any theoretically derived ODE to adequately capture the system. Therefore, for the case of dealing with high levels of industrial noise, a new direction and improvement on SINDy is proposed, which is a form of ensemble learning that we term ‘‘dropout-SINDy’’.

Ensemble learning refers to the use of multiple models in place of one model. Homogeneous ensemble learning involves the use of multiple models of the same type, while heterogeneous ensemble learning strategies use a combination of different types of models to improve the predictive performance. In this work, only homogeneous ensemble learning is considered. However, in the context of SINDy, even the terminology ‘‘homogeneous ensemble learning’’ can refer to two distinct methods: either the data set can be subsampled to produce multiple models with the same underlying model structure, or multiple models with varying function libraries may be built using the same data set. The subsampling method described in Section 5 is an example of the former, but it was shown in Abdullah et al. (2022b) that subsampling, by itself, cannot improve the SINDy algorithm under high noise levels. In contrast, for the case of industrial noise, the proposed dropout-SINDy method uses only a fraction of the function library 𝛩 to identify each submodel. Hence, multiple models can be identified, each with a random subset of the library. Similarly to co-teaching, this can reduce the impact of noisy data and, additionally, improve the stability properties of the SINDy models because a large number of nonzero terms (a dense coefficient matrix 𝛯) can often lead to instabilities. The sparse regression equation to be solved for dropout-SINDy is similar to Eq. (15), but the state, input, and derivative data sets remain as 𝑋, 𝑈, and 𝑋̇, respectively, while only the library 𝛩𝑖 and coefficients 𝛯𝑖 are varied between the 𝑛models models in the ensemble, where 𝑖 = 1, 2, … , 𝑛models:

𝑋̇ = 𝛩𝑖(𝑋, 𝑈)𝛯𝑖   (18)
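A minimal sketch of the dropout-SINDy regression of Eq. (18) on a hypothetical one-state system follows; the library, noise level, and ensemble sizes are illustrative choices rather than those of the original study, and the sub-models are aggregated with an elementwise median:

```python
import numpy as np

rng = np.random.default_rng(2)

# toy data for dx/dt = -x + 0.5x^2 under heavy noise
m = 500
x = rng.uniform(-1.5, 1.5, m)
xdot = -x + 0.5 * x**2 + 0.5 * rng.standard_normal(m)

# candidate library Theta(X): [1, x, x^2, x^3]
theta = np.column_stack([x**k for k in range(4)])
n_lib = theta.shape[1]
n_dropout, n_models = 1, 15   # functions dropped per sub-model, ensemble size

xis = np.zeros((n_models, n_lib))
for i in range(n_models):
    keep = rng.choice(n_lib, n_lib - n_dropout, replace=False)  # random dropout
    xis[i, keep] = np.linalg.lstsq(theta[:, keep], xdot, rcond=None)[0]
    # dropped entries of Xi_i simply remain zero in this sub-model

# elementwise median across the ensemble: a term survives only if it is
# retained (and consistently estimated) by a majority of sub-models, so the
# final model stays sparse, unlike an elementwise mean
xi_final = np.median(xis, axis=0)   # dominated by the x and x^2 entries
```

Replacing `np.median` with `np.mean` in this sketch would propagate every spurious nonzero entry of any single sub-model into the final coefficient vector, which is the densification issue discussed in the following section.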
Fig. 6. Comparison of original noisy data (gray dots) with results from sparse identification without any subsampling (blue line), subsampling without co-teaching with 𝑝 = 0.2
(green line), and subsampling with co-teaching and 𝑝 = 0.16, 𝑞 = 0.04 (red line) for the temperature 𝑇 of a CSTR system.
Eq. (18) is solved using a different subset of the computed library 𝛩𝑖 each time to find the corresponding set of model coefficients for the nonzeroed terms, 𝛯𝑖. In each 𝛩𝑖, 𝑛dropout library functions are randomly dropped out, with the corresponding entries in 𝛯𝑖 also being zeroed before solving Eq. (18). Once all the sub-models are found, the final model must be selected from 𝛯1, … , 𝛯𝑛models. In this case, although the mean, median, and mode are all possible methods to find the central tendency of all the 𝛯𝑖, the mean is likely to yield dense models because even one nonzero value for a certain coefficient in any one of the sub-models will cause the coefficient to be nonzero. In contrast, the median and mode do not suffer from this. However, the mode may not be useful since even two sub-models with a zero coefficient for a library term will lead to the term being zeroed if none of the other nonzero values are repeated exactly equally, leading to excessive sparsity. Hence, the median is determined to be the most reasonable measure of central tendency for dropout-SINDy.

The number of functions of the candidate library to be omitted in each model, 𝑛dropout, as well as the number of models, 𝑛models, must be tuned when building a dropout-SINDy model. A very small value of 𝑛dropout implies that the sub-models in the ensemble are very similar to the base SINDy model without any dropout, negating most if not all performance gains of the proposed method. But if 𝑛dropout is too large, excessive sparsity will lead to models that lack the complexity required to capture the dynamics. Similarly, a small value of 𝑛models may lead to the optimal model not being identified, as the search is conducted over a smaller set, but increasing 𝑛models also increases computational costs and might even promote instability if the median of the model coefficients is shifted by a larger proportion of poor models. Hence, this balance between computational cost, model improvement, model complexity, and stability must be considered when tuning 𝑛models and 𝑛dropout when using dropout-SINDy. The data flow throughout the algorithm is outlined in Fig. 7.

In this section, ‘‘industrial’’ data refers not to an experimental data set but to data generated from a chemical process simulated in the high-fidelity chemical process simulator Aspen Plus Dynamics, which is a widely used simulator in the chemical sector that has been used to build steady-state and dynamic simulations of chemical processes to aid chemical engineers in process design and optimization. Chemical process simulators have several advantages over first-principles models as they contain numerous built-in packages to handle most common unit operations, thermodynamic properties, molecular interactions, etc., which result in significantly more accurate models that more closely represent the plant process dynamics. In Abdullah et al. (2022a), Aspen Plus Dynamics was used to build the process flow diagram shown in Fig. 8, which was then used for both data generation and closed-loop simulations in order to imitate the industrial process.

When using the basic SINDy algorithm to model the highly noisy industrial data from Aspen Plus Dynamics, it is found that basic SINDy is unable to model the dynamics or even correctly predict the final steady-state of the open-loop system, the latter of which greatly affects the performance of a controller. However, when dropout-SINDy is used on the industrial data set, it is able to capture most of the dynamics and correctly predict the final steady-state values of the states. When an MPC is designed with the dropout-SINDy model, it can be demonstrated to achieve closed-loop stability and converge to the steady-state faster and with less energy and overshoot than a corresponding proportional controller, as shown in Fig. 9.

7. Demonstration of the use of SINDy to model a nonlinear chemical process

In this section, the modeling of a highly nonlinear CSTR operating at an unstable steady-state using SINDy is considered.

Table 2
Parameter values for the nonisothermal CSTR example.

𝐹 = 5.0 m³/h                        𝑉 = 1.0 m³
𝑘0 = 8.46 × 10⁶ m³ kmol⁻¹ h⁻¹       𝐸 = 5.0 × 10⁴ kJ/kmol
𝑅 = 8.314 kJ kmol⁻¹ K⁻¹             𝜌𝐿 = 1000.0 kg/m³
𝛥𝐻 = −1.15 × 10⁴ kJ/kmol            𝑇0 = 300 K
𝐶𝐴0𝑠 = 4 kmol/m³                    𝑄𝑠 = 0 MJ/h
𝐶𝐴𝑠 = 1.95 kmol/m³                  𝑇𝑠 = 402 K
𝐶𝑝 = 0.231 kJ kg⁻¹ K⁻¹
Fig. 8. Aspen Plus model process flow diagram of an ethylbenzene production process.
Fig. 9. State and input profiles for a CSTR in closed-loop under no control (blue line), a P-controller (red line), and the LMPC utilizing the dropout-SINDy model (black line)
throughout the simulation period 𝑡𝑝 = 1.5 h.
Specifically, a perfectly mixed, nonisothermal CSTR where an irreversible, exothermic reaction with second-order kinetics, A → B, takes place is studied. The rate constant of the reaction, 𝑘, is not assumed to be constant; instead, an Arrhenius relation of the following form is used to determine the rate constant as a function of the Kelvin temperature, 𝑇:

𝑘 = 𝑘0 e^(−𝐸/𝑅𝑇)   (19)

where 𝑘0, 𝐸, and 𝑅 represent the pre-exponential constant, activation energy of the reaction, and the ideal gas constant, respectively. Using material and energy balances, the differential equation model describing the CSTR dynamics is derived as follows:

d𝐶𝐴/d𝑡 = (𝐹/𝑉)(𝐶𝐴0 − 𝐶𝐴) − 𝑘0 e^(−𝐸/𝑅𝑇) 𝐶𝐴²   (20a)

d𝑇/d𝑡 = (𝐹/𝑉)(𝑇0 − 𝑇) + ((−𝛥𝐻)/(𝜌𝐿𝐶𝑝)) 𝑘0 e^(−𝐸/𝑅𝑇) 𝐶𝐴² + 10³𝑄/(𝜌𝐿𝐶𝑝𝑉)   (20b)
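The model of Eq. (20) with the parameter values of Table 2 can be simulated directly; the sketch below (our own illustrative code, with an arbitrary initial condition and arbitrary step inputs) checks that the right-hand sides nearly vanish at the unstable steady-state and generates sampled open-loop step-test data:

```python
import numpy as np

# CSTR model of Eq. (20) with the parameter values of Table 2
F, V = 5.0, 1.0                       # m^3/h, m^3
k0, E, R = 8.46e6, 5.0e4, 8.314       # Arrhenius parameters
dH, rho, Cp = -1.15e4, 1000.0, 0.231  # kJ/kmol, kg/m^3, kJ/(kg K)
T0 = 300.0                            # inlet temperature, K

def cstr_rhs(CA, T, CA0, Q):
    r = k0 * np.exp(-E / (R * T)) * CA**2           # second-order rate
    dCA = F / V * (CA0 - CA) - r                    # Eq. (20a)
    dT = (F / V * (T0 - T) + (-dH) / (rho * Cp) * r
          + 1.0e3 * Q / (rho * Cp * V))             # Eq. (20b)
    return dCA, dT

# both right-hand sides should (nearly) vanish at the unstable steady-state
dCA_s, dT_s = cstr_rhs(1.95, 402.0, 4.0, 0.0)

# open-loop step test: explicit Euler with step size 1e-4 h, recording a
# sample every 0.01 h; the initial condition and inputs here are arbitrary
h, n_per_sample = 1.0e-4, 100
CA, T = 1.0, 420.0
CA0, Q = 4.5, 50.0
samples = []
for _ in range(100):                  # 100 samples = 1 h of operation
    for _ in range(n_per_sample):
        dCA, dT = cstr_rhs(CA, T, CA0, Q)
        CA, T = CA + h * dCA, T + h * dT
    samples.append((CA, T))           # sampled (concentration, temperature)
```

Because the operating point is open-loop unstable, such step tests settle at the stable steady-states of the reactor, as noted in the data-generation discussion below.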
Fig. 10. Three types of data generation for the nonisothermal CSTR operating at an unstable steady-state.
where 𝐶𝐴, 𝑉, and 𝑇 denote the concentration of reactant A in the reactor, the volume of the reacting liquid inside the reactor, and the time-varying absolute temperature of the reactor, respectively. The concentration of species A in the inlet stream, the inlet temperature, and the volumetric flow rate fed to the reactor are represented by 𝐶𝐴0, 𝑇0, and 𝐹, respectively. A heating jacket supplies/removes heat to/from the CSTR at a rate of 𝑄. The density and heat capacity of the reacting liquid are assumed to have constant values of 𝜌𝐿 and 𝐶𝑝, respectively, while 𝛥𝐻 denotes the enthalpy of the reaction. The values of the process parameters are provided in Table 2. With the values from Table 2 substituted into Eq. (20), the exact ODE model to be identified using SINDy can be found to be

d𝐶𝐴/d𝑡 = 5𝐶𝐴0 − 5𝐶𝐴 − 8.46 × 10⁶ e^(−6013.95236949723/𝑇) 𝐶𝐴²   (21a)

d𝑇/d𝑡 = 1500 − 5𝑇 + 4.211688 × 10⁸ e^(−6013.95236949723/𝑇) 𝐶𝐴² + 4.3𝑄   (21b)

The objective is to build a SINDy model for the CSTR system of Eq. (20), ideally for the entire state-space or at least a large region of the state-space around the desired operating point, which is the unstable steady-state, (𝐶𝐴𝑠, 𝑇𝑠) = (1.95 kmol/m³, 402 K). The factors that most significantly impact the quality of the SINDy model for this system were found to be the data generation and the candidate library of basis functions considered for 𝛩(𝑋, 𝑈), both of which are discussed in detail over the next two subsections. To compare models quantitatively, for the sake of brevity, rather than reporting every model obtained from each data generation method or candidate library considered, in the rest of this section, the maximum absolute error in the Kelvin temperature will often be reported because the errors in the temperature are larger in terms of absolute value and intuitively understood.

Remark 1. Due to the explicit nature of SINDy models, once the ODE models are obtained from SINDy, incorporating them into an MPC is generally straightforward. The challenge of SINDy-based MPC, however, lies in the modeling rather than the MPC implementation, as opposed to entirely black-box approaches such as recurrent neural networks and other deep learning models, which can approximate practically any input/output data if provided with sufficient data and tuned thoroughly, but can encounter computational and technical challenges when implemented in closed-loop MPC. Hence, this section focuses solely on the modeling of the nonlinear CSTR, since past works (Abdullah et al., 2021b, 2022b) have already demonstrated the application of SINDy models in MPC with open- and closed-loop simulation results for a diverse array of systems. The goal of this section is to familiarize

practical method while also providing sufficient dynamic information for an algorithm to capture. Therefore, all simulations of Eq. (20) were carried out using an integration step size of ℎ𝑐 = 10⁻⁴ h and sampled every 𝛥 = 0.01 h (36 s), which is a reasonable sampling period for such a chemical process. The simulations were carried out for a duration of 𝑡𝑓 = 1 h since most trajectories reached a steady-state within 1 h of simulation duration.

Due to the various ways that one may generate or obtain data for this system, three types of data generation were carried out, and each data set was then used to attempt to build SINDy models. Representative trajectories for each data set are shown in Fig. 10. The types of data generation and their advantages/disadvantages are summarized as follows:

1. Method: Open-loop step tests are carried out using numerous, random initial conditions and input signals until a steady-state is reached.

   • 1000 such trajectories were generated in this data set.
   • Initial conditions were randomly selected with the following restrictions on the initial states: 𝐶𝐴 ∈ [0.2, 3.7] kmol/m³ and 𝑇 ∈ [327, 477] K.
   • Input signals were randomly generated with the following restrictions on the inputs: 𝐶𝐴0 ∈ [0.5, 7.5] kmol/m³ and 𝑄 ∈ [−500, 500] MJ/h.
   • This is a standard method of data generation within chemical engineering in simulations-based applications as well as experimental practices. As an established method, data generation via this method is easily conducted, a wide area of the state space can be covered by exciting the input signals as desired, and a large amount of dynamic information is present in the data set.
   • Due to the operating region being the unstable steady-state, the trajectories, being in open-loop, will settle at the stable steady-states. However, this was not found to deteriorate the performance, likely because the dynamics of the reactor itself are independent of the region.
   • As the states may achieve extreme values when the input signals are varied too widely (such as temperatures as high as 1000 K or as low as 1 K for certain excessively large/small values of 𝑄), the best practice is to limit the range of input signals when using this method of data generation. This is particularly important when using finite-differences to estimate 𝑋̇, which is the only
the reader with the intricacies of building SINDy models in a chemical estimation method available in a practical setting. It was
engineering paradigm. found that when data generated indiscriminately including
trajectories that settle at 1000 K or 1 K were included
7.1. Data generation in the training data set, i.e., all 1000 trajectories were
used in training, SINDy had difficulties identifying the
For system identification, the data set used to identify the system is correct model if the derivatives were estimated with finite-
a crucial element. Hence, the data generation must be carried out in a differences. If the exact derivatives were provided (which
F. Abdullah and P.D. Christofides Computers and Chemical Engineering 174 (2023) 108247
Fig. 11. State-space profiles for open-loop simulation using the first-principles model of Eq. (20) and the SINDy model obtained using type 1 data generation, respectively, for
various sets of inputs and initial conditions (marked as solid dots) 𝑥0 in the vicinity of the desired operating point.
Fig. 12. State-space profiles for open-loop simulation using the first-principles model of Eq. (20) and the SINDy model obtained using type 2 data generation, respectively, for
various sets of inputs, starting from the steady-state.
would not be available in most chemical engineering applications), then SINDy was able to identify the model correctly. Upon further analysis of the derivatives at the regions of the fastest dynamics, it was found that the states changed very abruptly within the sampling period of 𝛥 = 0.01 h, causing numerical instabilities in the derivative estimation. Hence, providing the exact derivatives resolved the issue. As expected, the issue was also resolved if the data was sampled ten times as frequently, i.e., with 𝛥 = 0.001 h. However, using all the 1000 trajectories is not necessary to capture the dynamics of this system, as described next.
   • When the data set was truncated to only retain trajectories that never exceeded a temperature of 500 K or dropped below 300 K, i.e., 𝑇 ∈ [300, 500] K ∀𝑡, in order to only select trajectories close to the desired steady-state, 53 out of the initial 1000 trajectories were retained. However, SINDy was able to identify the best model with these 53 trajectories, producing a model with a maximum absolute error in the temperature of only 0.5 K. A few representative open-loop simulations (i.e., the test set) for the first-principles model and this sparse-identified model are shown in Fig. 11, demonstrating close agreement throughout the region of state-space. Hence, it can be concluded that 53 trajectories contain sufficient dynamic information to build a highly accurate SINDy model, and there is no necessity to use all 1000 trajectories, which introduce faster and more complex dynamics in certain regions, which eventually require a finer sampling to be practically useful.

2. Method: Open-loop step tests are carried out with the system initiated from the desired steady-state and excited using various input signals only until the desired level set, 𝛺𝜌̂, or operating region is exited.
   • 1000 such trajectories were generated in this data set.
   • The initial condition was fixed to be the unstable steady-state, (𝐶𝐴, 𝑇) = (𝐶𝐴𝑠, 𝑇𝑠) = (1.95 kmol∕m^3, 402 K).
   • Input signals were randomly generated with the following restrictions on the inputs: 𝐶𝐴0 ∈ [0.5, 7.5] kmol∕m^3 and 𝑄 ∈ [−500, 500] MJ∕h.
   • As data is generated only within the operating region of interest, an advantage is that almost the entire region of the state-space that is of interest can be captured via a large number of simulations.
   • Due to the unstable nature of the steady-state, one disadvantage is that a very large number of the 1000 trajectories in the data set are incomplete and too short to provide sufficient dynamic information, especially for SINDy, which generally performs better with longer trajectories rather than short bursts of trajectories. Out of the 1000 trajectories, only 14 trajectories are able to be simulated until 𝑡𝑓 = 1 h. Since second-order finite differences are used for the gradient approximation in our work, as well as due to the internal mechanisms of the integrator used, at least 4 data points are required to be able to use a trajectory for model identification. Only 381 of the 1000 trajectories had at least 4 data points and could be used to build a SINDy model. However, the data set of 381 trajectories did not contain enough dynamic information, producing a SINDy model with a maximum temperature prediction error of 10.4 K. If the size of the data set was increased to 2000 trajectories, however, 807 trajectories with at least 4 data points remained, which then produced a highly accurate SINDy model with a maximum temperature prediction error of 0.6 K.
   • State-space profiles for some open-loop simulations are shown in Fig. 12 for the first-principles model and the identified SINDy model, showing close agreement throughout.

3. Method: Closed-loop simulations are carried out under a proportional-only controller with the state initialized from various initial conditions.
   • Two data sets were used to attempt to build a SINDy model using this method of data generation: one data set with 121 trajectories, spanning an 11 × 11 grid for 𝑥0 in the state-space, while the second data set consisted of 961 trajectories, covering a 31 × 31 grid for 𝑥0 in the state-space.
   • Initial conditions were selected within the grid, 𝐶𝐴 × 𝑇 = [0.2, 3.7] kmol∕m^3 × [327, 477] K, with each range uniformly spaced into 10 or 30 intervals with 11 or 31 points, respectively.
   • Input signals were calculated using the equation for a proportional controller, 𝑄 = −1000(𝑇 − 𝑇𝑠), where 1000 represents the controller gain, and 𝐶𝐴0 was fixed at its steady-state value of 𝐶𝐴0𝑠 = 4 kmol∕m^3.
   • A purported advantage of this method of data generation is that, due to the presence of the controller, the state can be driven to the desired unstable steady-state from any initial condition, providing dynamic information for trajectories from any point in the state-space up to the unstable steady-state.
   • The models obtained using this data set could very accurately predict the derivatives within the test set, i.e., the right-hand side of the model evaluates to the correct value of the derivative of the test set trajectories. However, all the simulations diverged from the steady-state after a short period at the initial stages of the simulation duration. This phenomenon was also observed in Brunton et al. (2016) with the glycolytic oscillator model and attributed to the identification of wrong basis terms for some of the variables. In the models obtained for Eq. (20) using SINDy with the data set generated using closed-loop simulations, the heat input rate, 𝑄, erroneously appeared with a relatively large coefficient in the first ODE representing 𝐶̇𝐴, which may be the cause of the divergence. While further analyses may allow such data to be used for SINDy model identification, based on our current results, this method of data generation was not found to produce accurate SINDy models.

Based on the above analysis, the first method of data generation was found to be the optimal method when using SINDy. Since the optimal results were obtained with limited trajectories that were able to be integrated to 𝑡𝑓 and also stayed relatively close to the steady-state of interest, the best method of data generation for this system seems to be to conduct a modest number of step tests near the desired region. However, the second method can also be used if a much larger data set is used and caution is taken to only use trajectories with at least 4 data points when using a second-order finite-difference method for estimating the time-derivative of the states. The use of closed-loop data to identify SINDy models was not found to yield satisfactory results, and further analyses should be carried out in the future to assess the viability of such data for SINDy modeling of chemical processes.

7.2. Candidate library of basis functions

Since its inception, multiple studies have reported the central role of the candidate library, 𝛩, in the SINDy algorithm (Brunton et al., 2016; Kaheman et al., 2020). In Brunton et al. (2016), for example, a standard benchmark problem for system identification, the glycolytic oscillator model, could only be partially identified, i.e., the dynamics of only four out of the seven states could be correctly identified. The reason was attributed to the presence of rational functions in the right-hand sides of the ODEs corresponding to the remaining three states, which were not considered in the polynomial basis set used. Hence, choosing the correct basis functions is critical to the success of SINDy. For the remainder of the section, the data set used to study the effect of the candidate library is the data set generated using the first type of data generation described in Section 7.1 (53 trajectories from open-loop step tests).

In the absence of any a priori knowledge, the nonisothermal CSTR of Eq. (20) is a particularly challenging system to obtain the correct basis for and, hence, model. This is primarily due to the fact that, by design, SINDy can only regress the pre-multiplying coefficients for each basis function, which appear linearly in the right-hand side of the ODE. The basis functions themselves must be selected and the 𝛩 calculated before carrying out the regression step for identifying the pre-multiplying coefficients by solving Eq. (8). Since the activation energy is generally unknown, the exponential term must be carefully selected. For the set of parameters chosen, from Eq. (21), it can be observed that the numerator of the argument of the exponential term is −6013.95236949723. Due to the extreme dissimilarity between e^(−1∕𝑇) and e^(−6013.95236949723∕𝑇), using the former exponential term as a basis function will not yield an accurate SINDy model: the dynamics of the e^(−6013.95236949723∕𝑇) term cannot be captured by any linear multiple of e^(−1∕𝑇). However, choosing large numbers over a wide range is also inadvisable since only a narrow range of the exponent can yield an accurate model with a maximum absolute error in the temperature below 1 K, as shown in Fig. 13. While it may be possible to tune the exponent in this particular case by conducting a fine search for the exponent over a wide range with shortly spaced intervals of approximately 10 units, this is generally intractable when the exponent is even larger in magnitude (increasing the required search region) or if there are multiple reactions, in which case tuning each exponent term using a multidimensional grid search
Fig. 13. Validation error as a function of the numerator of the argument of the exponential function in the candidate library for the original data set, (𝐶𝐴 , 𝑇 ).
at such a high resolution becomes prohibitively expensive. Therefore, two approaches are proposed to overcome this challenge, both of which are shown to yield accurate SINDy models.

Remark 2. This challenge has been overcome in some past studies by assuming that the activation energy is known a priori, and the exact term, e^(−6013.95236949723∕𝑇), is included in the candidate library, largely simplifying the modeling problem (e.g., Bhadriraju et al., 2019, 2020). In other studies using SINDy to model reaction networks, the objective was to identify the reactions rather than investigate any temperature dependence (Hoffmann et al., 2019). Hence, the specific challenge of obtaining an appropriate basis for SINDy to model nonisothermal reactors is considered here.

7.2.1. Non-dimensionalization of the temperature

The first approach we consider is non-dimensionalizing the temperature by scaling it by a reference temperature, 𝑇ref. We consider, for simplicity and without loss of generality, 𝑇ref = 𝑇𝑠, and define the new dimensionless temperature as 𝑇̄ = 𝑇∕𝑇𝑠. Hence, the ODE system of Eq. (20), after the appropriate substitutions, takes the form,

d𝐶𝐴∕d𝑡 = (𝐹∕𝑉)(𝐶𝐴0 − 𝐶𝐴) − 𝑘0 e^(𝛾∕𝑇̄) 𝐶𝐴^2 (22a)

d𝑇̄∕d𝑡 = (𝐹∕𝑉)(𝑇0∕𝑇𝑠 − 𝑇̄) + (−𝛥𝐻∕(𝜌𝐿 𝐶𝑝 𝑇𝑠)) 𝑘0 e^(𝛾∕𝑇̄) 𝐶𝐴^2 + 10^3 𝑄∕(𝜌𝐿 𝐶𝑝 𝑉 𝑇𝑠) (22b)

where 𝛾 = −𝐸∕𝑅𝑇𝑠. For the set of process parameters and reference temperature chosen, 𝛾 = −14.96. Due to the much smaller magnitude of 𝛾 and the lower sensitivity of the basis functions to 𝛾, it is possible to conduct a fine search for a value of 𝛾 that produces an accurate SINDy model. The maximum absolute errors in the variables for the validation set for 𝛾 ∈ [−20, 0] are shown in Fig. 14. A value of 𝛾 = −15 yields the highly accurate model,

d𝐶𝐴∕d𝑡 = 5.051𝐶𝐴0 − 5.058𝐶𝐴 − 8.8 × 10^6 e^(−15∕𝑇̄) 𝐶𝐴^2 (23a)

d𝑇̄∕d𝑡 = 3.768 − 5.046𝑇̄ + 1.09 × 10^6 e^(−15∕𝑇̄) 𝐶𝐴^2 + 0.011𝑄 (23b)

where every coefficient is within 5% of the true values. There are two further advantages of non-dimensionalization in this case. Firstly, when multiple reactions are present, in many practical cases, since the reference temperature is similar to the specific process temperatures, all the 𝛾 will often be approximately of the same order of magnitude or within an order of magnitude difference (e.g., Alanqar et al., 2017a). Therefore, a ‘‘mean’’ or representative value of the 𝛾 values of all the reactions will produce an accurate SINDy model, owing to the greatly reduced sensitivity of the basis functions to 𝛾. Secondly, even if the (nearly) exact value of 𝛾 = −15 is not found using the search methodology described, simply using every integer value of 𝛾 ∈ [−20, −10] to create 11 basis functions also yields an accurate (but dense) SINDy model with a maximum absolute error in the temperature of 0.4 K. In contrast, in the original variables, if multiple basis functions, for example, the set 𝛾 ∈ {−7000, −6000, −5000, −4000, −3000, −2000, −1000}, are chosen to be in the candidate library, the results are poor due to the large dissimilarities between successive basis functions, as noted previously. We reiterate that the goal of using SINDy in this work is not to capture the underlying ODE but to use SINDy as a system identification method. Although the original system was nearly reproduced when 𝛾 = −15 was correctly identified, this need not be the case to apply SINDy, especially when the models will subsequently be used for model-based feedback control. Hence, the ‘‘brute-force’’ approach of using all 11 basis functions with 𝛾 ∈ [−20, −10] is considered to yield a satisfactory model as well. This latter approach may also handle multiple reactions more easily since it is likely that the correct value of 𝛾 for each reaction is captured in the candidate library.

Remark 3. To apply non-dimensionalization to the system when applying SINDy, the only change that must be made is that the temperature data must be scaled by 𝑇𝑠 before providing the data set to the SINDy algorithm. Since finite differences are used to estimate the time-derivative, 𝑋̇, the derivative estimates will scale accordingly once the data set itself is scaled.

7.2.2. Higher-order Taylor series approximation

A possibly more general approach that can handle any value of the activation energy or any number of reactions is to express the exponential term using its Taylor series expansion such that the activation energy appears as a pre-multiplier, which can then be regressed using SINDy. As SINDy is a nonlinear method, any order of the Taylor series can be retained. If multiple reactions are present, the pre-multiplier should account for all the reactions since the temperature variable is independent of the activation energy, and all the approximated terms
Fig. 14. Validation error as a function of 𝛾 for the data set with dimensionless temperature, (𝐶𝐴 , 𝑇̄ ).
can be summed to yield one final pre-multiplying coefficient value for each term of the Taylor expansion.

Due to the length of the models involving the Taylor expansion, only error metrics and discussions are provided for this method. When the candidate library includes up to 5th-order terms of the Taylor expansion, i.e., (𝑇 − 402)^5, an accurate SINDy model with a maximum absolute error in the temperature of 0.4 K is obtained, with open-loop test results nearly identical to Fig. 11. When the sparse-identified model is compared to the original ODE of Eq. (21) with parameters substituted in and the exponential term replaced by its 5th-order Taylor series, it is found that the SINDy model neglects terms above third-order, which are of the order of 10^−8 and 10^−6 for 𝐶𝐴 and 𝑇, respectively, in the actual equation (i.e., when a 5th-order Taylor expansion of the exponential term is used in the first-principles system of Eq. (21)). As for terms up to third-order, the SINDy model correctly identifies all terms for 𝐶𝐴 and identifies the terms in 𝑇 correctly as well, but also identifies a few erroneous terms such as linear 𝐶𝐴 and 𝐶𝐴0 terms. However, the contribution of the extra terms is extremely minor and does not affect the accuracy, as seen in the extremely low maximum absolute error.

Remark 4. While this method may be reminiscent of linearization of a nonlinear ODE, there are two key differences. Firstly, a nonlinear higher-order Taylor expansion is used to approximate the exponential function rather than a linear approximation. This greatly affects the region of accurate model predictions compared to a linearized model. When the open-loop tests shown in Fig. 12 were repeated with the linear state-space model obtained for this system in Wu et al. (2019b) using N4SID, all trajectories were found to diverge, while Fig. 12 demonstrates the high accuracy of the nonlinear SINDy model. Secondly, only the exponential term in the Arrhenius relationship is approximated using the Taylor expansion, but the remaining terms in the ODE model and candidate library remain in their original, nonlinear forms. Hence, all other nonlinear terms can still be identified exactly without any approximation, while model linearization includes linearizing even such polynomial and trigonometric terms.

7.3. Summary of data generation and candidate library guidelines and final steps to build the SINDy model

Based on extensive results from using the various types of data generation and basis functions considered, the following points can be summarized:

1. Open-loop step tests were found to be the optimal method of data generation for obtaining a SINDy model for the system studied, although short bursts within the desired stability region can yield a good model if a sufficient number of trajectories with at least 4 data points are obtained.
2. Data from closed-loop simulations did not yield an accurate model for the system studied.
3. A sampling period of 𝛥 = 0.01 h, or 36 s, is sufficient for obtaining an accurate SINDy model as long as enough dynamic information is captured via open-loop tests with a large number of input signals.
4. Due to the sensitivity of the argument of the exponential term, −𝐸∕𝑅 or 𝛾, the exponential basis term should be selected carefully.
   • Specifically, if a priori knowledge of the reaction (such as an estimate of the activation energy) is available, the system may be modeled directly without any modifications as long as the correct values of the activation energy are used to build the candidate library.
   • In the event that no a priori knowledge is available, the system should either be non-dimensionalized with respect to temperature or a higher-order Taylor series used to approximate the exponential terms.
5. Non-dimensionalization of the temperature has the potential to reproduce the exact system.
6. Using Taylor series approximations of the exponential term can yield highly accurate SINDy models, but their performance is expected to deteriorate when sufficiently far from the point of expansion. However, since a nonlinear, higher-order approximation is used, the region where the model performs accurately will be significantly larger than for any model obtained from a linearization of the original system, and likely large enough for any practical application.
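As a compact illustration of how the guidelines above come together, the sketch below runs the key steps on synthetic data: screen trajectories, estimate 𝑋̇ with second-order finite differences, evaluate a candidate library in the spirit of Eq. (24), and solve the sparse regression of Eq. (8) with a minimal sequential thresholded least-squares loop (Brunton et al., 2016). The paper itself uses the PySINDy implementation of this optimizer; everything here, including the helper names, the threshold value, and the toy data, is our own simplified stand-in.

```python
import numpy as np

def screen(trajs, T_min=300.0, T_max=500.0, min_points=4):
    """Keep trajectories that stay in [T_min, T_max] K throughout and have at
    least 4 samples (needed for second-order finite differences)."""
    return [X for X in trajs
            if X.shape[0] >= min_points
            and np.all((X[:, 1] >= T_min) & (X[:, 1] <= T_max))]

def derivatives(X, dt):
    """Second-order central differences in the interior, second-order
    one-sided differences at the endpoints (np.gradient with edge_order=2)."""
    return np.gradient(X, dt, axis=0, edge_order=2)

def library(CA, Tbar, CA0, Q, gamma=-15.0):
    """Candidate library in the spirit of Eq. (24), with the temperature
    non-dimensionalized and a fixed trial exponent gamma."""
    e = np.exp(gamma / Tbar)
    return np.column_stack([np.ones_like(CA), CA, CA**2, Tbar, CA0, Q,
                            e * CA, e * CA**2, e])

def stlsq(Theta, dX, threshold=5.0, n_iter=10):
    """Sequential thresholded least squares: alternate a least-squares solve
    with zeroing of coefficients whose magnitude falls below the threshold."""
    Xi, *_ = np.linalg.lstsq(Theta, dX, rcond=None)
    for _ in range(n_iter):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for j in range(Xi.shape[1]):            # refit each state on its active terms
            active = ~small[:, j]
            if active.any():
                Xi[active, j], *_ = np.linalg.lstsq(Theta[:, active], dX[:, j],
                                                    rcond=None)
    return Xi

# Toy demonstration: synthesize noiseless samples with the structure of
# Eq. (23a) and check that STLSQ recovers the sparse coefficient set.
rng = np.random.default_rng(0)
n = 500
CA = rng.uniform(0.2, 3.7, n)
Tbar = rng.uniform(0.85, 1.15, n)
CA0 = rng.uniform(0.5, 7.5, n)
Q = rng.uniform(-500.0, 500.0, n)
dCA = 5.051 * CA0 - 5.058 * CA - 8.8e6 * np.exp(-15.0 / Tbar) * CA**2
Xi = stlsq(library(CA, Tbar, CA0, Q), dCA[:, None], threshold=5.0)
```

On this noiseless toy data, the only surviving terms are 𝐶𝐴, 𝐶𝐴0, and e^(−15∕𝑇̄)𝐶𝐴^2, mirroring the structure of Eq. (23a); with real, noisy data the threshold would be tuned by validation as described in the next subsection.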
Once the data set and method of handling the exponential term are finalized based on the aforementioned guidelines, the SINDy model is obtained by using the PySINDy package in Python (de Silva et al., 2020; Kaptanoglu et al., 2022). Specifically, the data set is loaded into Python and split into a 70%/10%/20% training/validation/test set. The time-derivative of the states, 𝑋̇, is estimated using second-order central finite differences (except at the first and last points, which use second-order forward and backward finite differences, respectively). The optimizer is chosen to be the sequential thresholded least squares described in Brunton et al. (2016), with 𝜆 tuned via a coarse search to a value of 5.0, although similar results were obtained for the SR3 optimizer as well. The candidate library, for both the non-dimensionalization and Taylor series approaches, was chosen to include up to second-order polynomial terms for the concentration 𝐶𝐴, the bias term, and linear input terms. The remaining terms for the non-dimensionalization method included a linear temperature term, the exponential term with 𝛾 = −15, and interaction terms between the polynomial 𝐶𝐴 terms and the exponential term. Specifically, the candidate library for the non-dimensionalization approach takes the following form:

𝛩(𝐶𝐴, 𝑇̄, 𝐶𝐴0, 𝑄) = [1  𝐶𝐴  𝐶𝐴^2  𝑇̄  𝐶𝐴0  𝑄  e^(−15∕𝑇̄)𝐶𝐴  e^(−15∕𝑇̄)𝐶𝐴^2  e^(−15∕𝑇̄)] (24)

For the Taylor series approach, the only change is that the exponential term is replaced with (𝑇 − 402), (𝑇 − 402)^2, …, (𝑇 − 402)^5. Hence, the last three functions and the 𝑇̄ function in Eq. (24) are replaced by 15 terms (five exponential approximation terms and ten interaction terms with 𝐶𝐴), producing a library of 20 functions. Once all of the above selections are made, the SINDy model can be obtained by calling the model fitting method in PySINDy. The SINDy model of Eq. (23), for example, is obtained by using the first type of data generation (53 open-loop step tests) and the candidate library of Eq. (24).

8. Future directions

8.1. Neural network basis functions

For highly complex systems, it may be possible that the initially chosen nonlinear basis functions do not produce adequate results, but no prior knowledge is available to intelligently expand the function library. Moreover, adding random, additional nonlinear candidate functions may fail to improve the SINDy model performance if the functions added are completely dissimilar to the relevant functions that are required to model the system. An example is the challenge of the exponential basis term encountered and discussed in Section 7.2 with the nonisothermal CSTR example. In such cases, one option is to add more powerful and general function approximators such as feedforward neural networks, which are well-known for their universal approximation property, which dictates that they can approximate any static nonlinear function if they are designed with enough neurons and at least one sigmoidal hidden layer (Hornik et al., 1990; Hornik, 1991). Such hybrid models, consisting of partly first-principles/ODE models and partly data-based black-box models, are increasingly being used (Porru et al., 2000; Oliveira, 2003; von Stosch et al., 2014; Zendehboudi et al., 2018; Bangi and Kwon, 2020; Lee et al., 2020). Specifically, hybrid models involving ODE models and FNNs have been successfully applied to state estimation problems in the recent work of Alhajeri et al. (2021). Therefore, a similar approach may be proposed for SINDy, where the right-hand side of the SINDy model of Eq. (2) may be modified to

𝑥̂̇(𝑡) = 𝑓̂(𝑥̂) + 𝑔̂(𝑥̂)𝑢 + FNN(𝑥̂, 𝑢) (25)

where FNN denotes a feedforward neural network model that can capture any nonlinearities not modeled by the function library. One advantage of such a model, as opposed to a purely FNN model for the right-hand side of Eq. (25), may include reduced computational time due to the requirement of simpler models with fewer parameters, since only a fraction of the model must be captured by an FNN. Moreover, neural network training generally requires large volumes of data with wide variation and coverage of the operating region, which may be difficult to obtain in an experimental or plant setting. In contrast, when only a fraction of the overall model requires an FNN to be modeled, the data acquisition may be eased as well.

Once a model of the form of Eq. (25) is identified, if possible, converting the FNN part of the SINDy model back to symbolic functions will greatly improve the model inference time, as explicit nonlinearities are computationally desirable. Such advances have already been initiated in recent papers on modeling biological systems (Rackauckas et al., 2020).

8.2. Real-time model updates

In the presence of disturbances or changes in process behavior due to, for example, catalyst deactivation or feed stream disruptions, the process model in a model-based controller such as MPC must be updated in real-time to reflect the changes. Much of the research in model re-identification is concentrated on the mathematical details of the algorithms used for the model update, such as recursive least-squares or recursive singular value decomposition (Moonen et al., 1989; Lovera et al., 2000; Mercere et al., 2004), rather than on developing a rigorous framework for the triggering of the model re-identification procedure. Research on the triggering procedure includes error-triggered as well as event-triggered model re-identification (Alanqar et al., 2017a,b; Wu et al., 2020), but mostly uses first-principles process models. In the context of SINDy, Quade et al. (2018) proposed a model re-identification procedure, where the SINDy model coefficients could be updated or terms could be added or deleted as required. The trigger for re-identification was a significant divergence between the local Lyapunov exponent and the prediction horizon estimate (although the definition of ‘‘prediction horizon’’ in Quade et al. (2018) differs from its usage in this manuscript). However, the results of Quade et al. (2018) were only in the context of modeling. Hence, a future direction for research in sparse identification would be to consider real-time updates to a data-based SINDy model based on the error- or event-triggering mechanism of Wu et al. (2020).

9. Conclusions

In this paper, we have provided an overview of several recent advancements in the sparse identification of nonlinear dynamics (SINDy) method to overcome the challenges of modeling and controlling two-time-scale systems and noisy data. The methods considered included combining SINDy with nonlinear principal component analysis, feedforward neural networks, subsampling, co-teaching, and ensemble learning. The novel methods were described in detail, and best practices, tuning guidelines, as well as common pitfalls to avoid for their successful application in process systems engineering were provided for control practitioners. To demonstrate their effectiveness, results from applying the proposed algorithms to chemical processes were provided. Subsequently, SINDy was used to model a nonlinear chemical process to provide a demonstration of its application as well as to highlight specific challenges faced when applying SINDy in process systems engineering. Finally, a number of future research directions were outlined.

CRediT authorship contribution statement

Fahim Abdullah: Conceptualization, Methodology, Software, Writing. Panagiotis D. Christofides: Supervision, Reviewing and editing.
Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

Abdullah, F., Alhajeri, M.S., Christofides, P.D., 2022a. Modeling and control of nonlinear processes using sparse identification: Using dropout to handle noisy data. Ind. Eng. Chem. Res. 61 (49), 17976–17992.
Abdullah, F., Wu, Z., Christofides, P.D., 2021a. Data-based reduced-order modeling of nonlinear two-time-scale processes. Chem. Eng. Res. Des. 166, 1–9.
Abdullah, F., Wu, Z., Christofides, P.D., 2021b. Sparse-identification-based model predictive control of nonlinear two-time-scale processes. Comput. Chem. Eng. 153, 107411.
Abdullah, F., Wu, Z., Christofides, P.D., 2022b. Handling noisy data in sparse model identification using subsampling and co-teaching. Comput. Chem. Eng. 157, 107628.
Aggelogiannaki, E., Sarimveis, H., 2008. Nonlinear model predictive control for distributed parameter systems using data driven artificial neural network models. Comput. Chem. Eng. 32 (6), 1225–1237.
Alanqar, A., Durand, H., Christofides, P.D., 2017a. Error-triggered on-line model identification for model-based feedback control. AIChE J. 63, 949–966.
Alanqar, A., Durand, H., Christofides, P.D., 2017b. Fault-tolerant economic model predictive control using error-triggered online model identification. Ind. Eng. Chem. Res. 56, 5652–5667.
Alhajeri, M.S., Abdullah, F., Wu, Z., Christofides, P.D., 2022a. Physics-informed machine learning modeling for predictive control using noisy data. Chem. Eng. Res. Des. 186,
González-García, R., Rico-Martínez, R., Kevrekidis, I., 1998. Identification of distributed parameter systems: A neural net based approach. Comput. Chem. Eng. 22, S965–S968.
Hoffmann, M., Fröhner, C., Noé, F., 2019. Reactive SINDy: Discovering governing reactions from concentration data. J. Chem. Phys. 150 (2), 025101.
Holkar, K.S., Waghmare, L.M., 2010. An overview of model predictive control. Int. J. Control Autom. 3 (4), 47–63.
Hornik, K., 1991. Approximation capabilities of multilayer feedforward networks. Neural Netw. 4 (2), 251–257.
Hornik, K., Stinchcombe, M., White, H., 1990. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw. 3 (5), 551–560.
Kaheman, K., Kutz, J.N., Brunton, S.L., 2020. SINDy-PI: A robust algorithm for parallel implicit sparse identification of nonlinear dynamics. Proc. R. Soc. A: Math., Phys. Eng. Sci. 476 (2242), 20200279.
Kaptanoglu, A.A., de Silva, B.M., Fasel, U., Kaheman, K., Goldschmidt, A.J., Callaham, J., Delahunt, C.B., Nicolaou, Z.G., Champion, K., Loiseau, J.-C., Kutz, J.N., Brunton, S.L., 2022. PySINDy: A comprehensive Python package for robust sparse system identification. J. Open Source Softw. 7 (69), 3994.
Kokotović, P., Khalil, H.K., O’Reilly, J., 1999. Singular Perturbation Methods in Control: Analysis and Design. Society for Industrial and Applied Mathematics, pp. 93–156.
Kramer, M.A., 1991. Nonlinear principal component analysis using autoassociative neural networks. AIChE J. 37 (2), 233–243.
Lee, D., Jayaraman, A., Kwon, J.S., 2020. Development of a hybrid model for a partially known intracellular signaling pathway through correction term estimation and neural network modeling. PLoS Comput. Biol. 16, 1–31.
Likar, B., Kocijan, J., 2007. Predictive control of a gas–liquid separation plant based on a Gaussian process model. Comput. Chem. Eng. 31 (3), 142–152.
Lin, Y., Sontag, E.D., 1991. A universal formula for stabilization with bounded controls. Systems Control Lett. 16, 393–397.
Lovera, M., Gustafsson, T., Verhaegen, M., 2000. Recursive subspace identification of linear and non-linear Wiener state-space models. Automatica 36 (11), 1639–1650.
Mangan, N.M., Brunton, S.L., Proctor, J.L., Kutz, J.N., 2016. Inferring biological networks by sparse identification of nonlinear dynamics. IEEE Trans. Mol., Biol. Multi-Scale Commun. 2 (1), 52–63.
34–49. McBride, K., Sundmacher, K., 2019. Overview of surrogate modeling in chemical process
Alhajeri, M.S., Luo, J., Wu, Z., Albalawi, F., Christofides, P.D., 2022b. Process structure- engineering. Chem. Ing. Tech. 91 (3), 228–239.
based recurrent neural network modeling for predictive control: A comparative Mercere, G., Lecoeuche, S., Lovera, M., 2004. Recursive subspace identification based
study. Chem. Eng. Res. Des. 179, 77–89. on instrumental variable unconstrained quadratic optimization. Internat. J. Adapt.
Alhajeri, M.S., Wu, Z., Rincon, D., Albalawi, F., Christofides, P.D., 2021. Machine- Control Signal Process. 18, 771–797.
learning-based state estimation and predictive control of nonlinear processes. Chem. Moonen, M., De Moor, B., Vandenberghe, L., Vandewalle, J., 1989. On-and off-line
Eng. Res. Des. 167, 268–280. identification of linear state-space models. Internat. J. Control 49, 219–232.
Bai, Z., Wimalajeewa, T., Berger, Z., Wang, G., Glauser, M., Varshney, P.K., 2015. Low- Moore, C., 1986. Application of singular value decomposition to the design, analysis,
dimensional approach for reconstruction of airfoil data via compressive sensing. and control of industrial processes. In: 1986 American Control Conference. Seattle,
AIAA J. 53 (4), 920–933. WA, USA, pp. 643–650.
Bangi, M.S.F., Kwon, J.S.-I., 2020. Deep hybrid modeling of chemical process: Narasingam, A., Kwon, J.S.I., 2018. Data-driven identification of interpretable
Application to hydraulic fracturing. Comput. Chem. Eng. 134, 106696. reduced-order models using sparse regression. Comput. Chem. Eng. 119, 101–111.
Bhadriraju, B., Bangi, M.S.F., Narasingam, A., Kwon, J.S.-I., 2020. Operable adaptive Oliveira, R., 2003. Combining first principles modelling and artificial neural networks: A
sparse identification of systems: Application to chemical processes. AIChE J. 66 general framework. In: Kraslawski, A., Turunen, I. (Eds.), European Symposium on
(11), e16980. Computer Aided Process Engineering-13. In: Computer Aided Chemical Engineering,
Bhadriraju, B., Narasingam, A., Kwon, J.S.-I., 2019. Machine learning-based adaptive vol. 14, Elsevier, pp. 821–826.
model identification of systems: Application to a chemical process. Chem. Eng. Res. Porru, G., Aragonese, C., Baratti, R., Servida, A., 2000. Monitoring of a CO oxidation
Des. 152, 372–383. reactor through a grey model-based EKF observer. Chem. Eng. Sci. 55 (2), 331–338.
Bikmukhametov, T., Jäschke, J., 2020. Combining machine learning and process Quade, M., Abel, M., Nathan Kutz, J., Brunton, S.L., 2018. Sparse identification of
engineering physics towards enhanced accuracy and explainability of data-driven nonlinear dynamics for rapid model recovery. Chaos 28, 063116.
models. Comput. Chem. Eng. 138, 106834. Rackauckas, C., Ma, Y., Martensen, J., Warner, C., Zubov, K., Supekar, R., Skinner, D.,
Brunton, S.L., Proctor, J.L., Kutz, J.N., 2016. Discovering governing equations from Ramadhan, A., Edelman, A., 2020. Universal differential equations for scientific
data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. machine learning. arXiv preprint arXiv:2001.04385.
113 (15), 3932–3937. Raissi, M., Perdikaris, P., Karniadakis, G.E., 2018. Multistep neural networks for
Champion, K.P., Brunton, S.L., Kutz, J.N., 2019. Discovery of nonlinear multiscale data-driven discovery of nonlinear dynamical systems. arXiv:1801.01236.
systems: Sampling strategies and embeddings. SIAM J. Appl. Dyn. Syst. 18, Rudy, S.H., Brunton, S.L., Proctor, J.L., Kutz, J.N., 2017. Data-driven discovery of
312–333. partial differential equations. Sci. Adv. 3 (4), e1602614.
Chang, H.C., Aluko, M., 1984. Multi-scale analysis of exotic dynamics in surface Rudy, S.H., Nathan Kutz, J., Brunton, S.L., 2019. Deep learning of dynamics and
catalyzed reactions–I: Justification and preliminary model discriminations. Chem. signal-noise decomposition with time-stepping constraints. J. Comput. Phys. 396,
Eng. Sci. 39 (1), 37–50. 483–506.
de Silva, B., Champion, K., Quade, M., Loiseau, J.C., Kutz, J., Brunton, S., 2020. Sansana, J., Joswiak, M.N., Castillo, I., Wang, Z., Rendall, R., Chiang, L.H., Reis, M.S.,
PySINDy: A Python package for the sparse identification of nonlinear dynamical 2021. Recent trends on hybrid modeling for industry 4.0. Comput. Chem. Eng. 151,
systems from data. J. Open Source Softw. 5 (49), 2104. 107365.
Dong, D., McAvoy, T., 1996. Nonlinear principal component analysis–Based on principal Schulze, J.C., Doncevic, D.T., Mitsos, A., 2022. Identification of MIMO Wiener-type
curves and neural networks. Comput. Chem. Eng. 20 (1), 65–78. Koopman models for data-driven model reduction using deep learning. Comput.
Efron, B., Stein, C., 1981. The Jackknife estimate of variance. Ann. Statist. 9 (3), Chem. Eng. 161, 107781.
586–596. Tsay, C., Baldea, M., 2020. Integrating production scheduling and process control using
Fablet, R., Ouala, S., Herzet, C., 2018. Bilinear residual neural network for the latent variable dynamic models. Control Eng. Pract. 94, 104201.
identification and forecasting of geophysical dynamics. In: Proceedings of the 26th Van Overschee, P., De Moor, B., 1994. N4SID: Subspace algorithms for the identification
European Signal Processing Conference. pp. 1477–1481. of combined deterministic-stochastic systems. Automatica 30 (1), 75–93.
19
F. Abdullah and P.D. Christofides Computers and Chemical Engineering 174 (2023) 108247
von Stosch, M., Oliveira, R., Peres, J., Feyo de Azevedo, S., 2014. Hybrid semi-parametric modeling in process systems engineering: Past, present and future. Comput. Chem. Eng. 60, 86–101.

Wächter, A., Biegler, L.T., 2006. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106, 25–57.

Wilson, Z.T., Sahinidis, N.V., 2017. The ALAMO approach to machine learning. Comput. Chem. Eng. 106, 785–795.

Wu, Z., Luo, J., Rincon, D., Christofides, P.D., 2021a. Machine learning-based predictive control using noisy data: Evaluating performance and robustness via a large-scale process simulator. Chem. Eng. Res. Des. 168, 275–287.

Wu, Z., Rincon, D., Christofides, P.D., 2020. Real-time adaptive machine-learning-based predictive control of nonlinear processes. Ind. Eng. Chem. Res. 59 (6), 2275–2290.

Wu, Z., Rincon, D., Luo, J., Christofides, P.D., 2021b. Machine learning modeling and predictive control of nonlinear processes using noisy data. AIChE J. 67 (4), e17164.

Wu, Z., Tran, A., Rincon, D., Christofides, P.D., 2019a. Machine learning-based predictive control of nonlinear processes. Part I: Theory. AIChE J. 65, e16729.

Wu, Z., Tran, A., Rincon, D., Christofides, P.D., 2019b. Machine learning-based predictive control of nonlinear processes. Part II: Computational implementation. AIChE J. 65, e16734.

Zendehboudi, S., Rezaei, N., Lohi, A., 2018. Applications of hybrid models in chemical, petroleum, and energy systems: A systematic review. Appl. Energy 228, 2539–2566.

Zhang, S., Lin, G., 2018. Robust data-driven discovery of governing physical laws with error bars. Proc. R. Soc. A: Math., Phys. Eng. Sci. 474 (2217), 20180305.

Zhang, S., Lin, G., 2021. Subtsbr to tackle high noise and outliers for data-driven discovery of differential equations. J. Comput. Phys. 428, 109962.

Zheng, P., Askham, T., Brunton, S.L., Kutz, J.N., Aravkin, A.Y., 2019. A unified framework for sparse relaxed regularized regression: SR3. IEEE Access 7, 1404–1423.