
Computers and Chemical Engineering 173 (2023) 108195

Physics-informed recurrent neural networks and hyper-parameter optimization for dynamic process systems

Tuse Asrav a,1, Erdal Aydin a,b,1,*

a Department of Chemical and Biological Engineering, Koç University, Istanbul 34450, Turkey
b Koç University TUPRAS Energy Center (KUTEM), Koç University, Istanbul 34450, Turkey

* Corresponding author at: Department of Chemical and Biological Engineering, Koç University, Istanbul 34450, Turkey. E-mail address: [email protected] (E. Aydin).
1 These authors contributed equally to this work.

https://doi.org/10.1016/j.compchemeng.2023.108195
Received 11 November 2022; Received in revised form 28 January 2023; Accepted 17 February 2023; Available online 18 February 2023
0098-1354/© 2023 Elsevier Ltd. All rights reserved.

Keywords: Machine learning; Recurrent neural networks; Physics-informed neural networks; Hybrid neural networks; Hyper-parameter optimization

ABSTRACT: Many of the processes in chemical engineering applications are of a dynamic nature. Mechanistic modeling of these processes is challenging due to their complexity and uncertainty. On the other hand, recurrent neural networks are useful for modeling dynamic processes from the available data. Although these networks can capture the complexities, they might be prone to overfitting and require high-quality, adequate data. In this study, two different physics-informed training approaches are investigated. The first approach uses a multi-objective loss function in the training, including the discretized form of the differential equation. The second approach uses a hybrid recurrent neural network cell with embedded physics-informed and data-driven nodes performing Euler discretization. Physics-informed neural networks can improve test performance even though a decrease in training performance might be observed. Finally, smaller and more robust architectures are obtained using hyper-parameter optimization when physics-informed training is performed.

1. Introduction

Mechanistic mathematical models are developed based on first principles to predict the actual behavior of processes. However, chemical engineering often deals with complex systems, and obtaining an accurate mechanistic model for these systems is quite challenging (Thebelt et al., 2022).

Artificial neural networks are utilized to empirically model nonlinear systems due to their capability of capturing complex relationships. They are data-driven black-box models inspired by the human nervous system. One class of artificial neural networks is the recurrent neural networks, which extend the standard feed-forward artificial neural networks to handle time-dependent responses. Recurrent neural networks are best suited for sequential data and might be considered as an alternative modeling approach for dynamical systems once proper data is available. For instance, in Xiao et al. (2022) a recurrent neural network model is utilized as the prediction model to solve the model predictive control problem for a complex system governed by partial differential equations.

A major disadvantage of the traditional recurrent neural networks is the vanishing gradient problem, which makes learning long-term dependencies difficult. On the other hand, LSTM (Long Short-Term Memory) or GRU (Gated Recurrent Unit) networks, with more complex mechanisms in the network structure, process longer sequences by regulating the flow of information (Chung et al., 2014).

Data-driven machine learning models may also have some limitations. The performance of a purely data-driven model depends on both the quality and the quantity of the data. In Thebelt et al. (2022), four data characteristics that make data-driven modeling difficult, including variance, volume, veracity, and physical restrictions, are explained. Additionally, standard data-driven models usually neglect the physical laws governing the real phenomena.

Physics-informed neural networks can be used to find the solutions of differential equations and to discover the form of differential equations. Raissi et al. (2019) propose physics-informed feed-forward neural networks to solve nonlinear differential equations through selected collocation points by encoding the initial/boundary conditions as well as the differential equation into the loss function. In addition, the authors introduce the inverse problem to find the coefficients of the differential equations. Similarly, a toolbox called SciANN is developed in Python by Haghighat and Juanes (2021), aiming to simplify the use of artificial neural networks for scientific computations while inheriting
the functionalities of the Tensorflow and Keras packages (Abadi et al., 2016; Chollet, 2015). Another open-source toolbox, HybridML, is developed to train hybrid models; it employs Tensorflow for training neural networks and CasADi for the integration of ordinary differential equations (Andersson et al., 2019; Merkelbach et al., 2022). Subraveti et al. (2022) extend neural networks trained with a physics-constrained loss function to learn the solutions of partial differential equations of cyclic processes, in which the initial conditions change in every cycle, through selected collocation points using arbitrary initial profiles and step parameters. Automatic differentiation is used to calculate the residuals of the partial differential equations (Subraveti et al., 2022).

In this work, the Euler backward method is used for the numerical integration of the ordinary differential equations. The discretized forms of the differential equations are incorporated into the loss function, taking all predictions and target data into account to solve the left-hand side and right-hand side of the equations, respectively, at each time step. Raissi et al. (2019) and Subraveti et al. (2022) aim to find the data-driven solutions of differential equations. This paper aims to improve the learning process of neural networks by introducing physical knowledge into the loss function.

Another approach for physics-informed modeling is embedding physical knowledge directly into the neural network structures. The advantages of hybrid semi-parametric models over mechanistic or data-driven models and the possible hybrid model structures are described in von Stosch et al. (2014). Nascimento and Viana (2019) extend recurrent neural networks to cumulative damage models for wind turbines, aircraft, and jet engines by proposing a recurrent neural network cell using embedded data-driven and physics-based layers performing Euler integration. Dourado and Viana (2019) focus on the hybrid modeling of corrosion fatigue, where data-driven layers are used to estimate the prediction bias of a physical model. In Viana et al. (2021), the proposed recurrent neural network cell, including both series and parallel physics-informed and data-driven nodes, is used to model fatigue crack growth, corrosion-fatigue crack growth, and bearing fatigue, while data-driven layers are used to compensate for the model discrepancies.

In addition to formulating physics-informed machine learning into the training objective function, this paper also suggests the use of the hybrid Euler recurrent neural network cell, first proposed by Nascimento and Viana (2019), to model dynamic process systems involved in chemical engineering applications. Data-driven nodes are used to model the more complex dynamics, and physics-informed nodes are used to impose well-known physical knowledge to improve the data-driven modeling performance. Finally, in most industrial applications, the ordinary differential equations describing the real dynamic behavior of the processes are unknown or only partly known, but some well-known physical terms may be available. Therefore, the use of these well-known physical terms in the data-driven models implemented in this paper can be very useful in industrial case studies (Asrav et al., 2023).

Furthermore, the performance of a neural network depends on the hyper-parameters, which are generally determined by a trial-and-error approach before the training process. However, there are also various search algorithms in the literature used for hyper-parameter optimization, eliminating the computational and practical challenges of the trial-and-error procedure (Yu, 2020). One of the simplest methods is grid search, which searches over all of the hyper-parameter combinations from the defined search space (Alibrahim and Ludwig, 2021). On the other hand, Bayesian optimization aims to find the global optimum with the minimum number of trials by using the information from previous evaluations (Snoek et al., 2012). In Escapil-Inchauspé et al. (2022), physics-informed feed-forward neural network models are developed for the solution of forward Helmholtz problems following the methodology proposed by Raissi et al. (2019), and the hyper-parameters of the developed models are optimized through Gaussian processes-based Bayesian optimization. In Hansen et al. (2022), the number of hidden layers and the number of neurons in each layer of LSTM networks are optimized through Bayesian optimization for the purpose of modeling phosphorus concentration in a wastewater treatment plant. Another method, whose main advantage is avoiding getting trapped in a local optimal solution by searching in parallel from a population of points rather than a single point, is the genetic algorithm (Goldberg, 1989). In Gorgolis et al. (2019), hyper-parameter optimization of an LSTM network is done through a genetic algorithm. In this study, a grid search, a Gaussian processes-based Bayesian optimization, and a genetic algorithm are applied to optimize the hyper-parameters of bi-objective physics-informed recurrent neural networks. The main aim of combining physics-informed recurrent neural networks with hyper-parameter optimization is to investigate the impact of physics-informed training on the optimal architectures of the networks.

The key contributions of this paper can be summarized as follows: (i) developing a bi-objective loss function training for recurrent neural networks, including the discretized form of the ordinary differential equations, coupled with hyper-parameter tuning; (ii) using the hybrid Euler recurrent neural network cell with the combination of both physics-informed and data-driven nodes for modeling systems governed by ordinary differential equations; (iii) comparison of physics-uninformed, physics-informed bi-objective, and hybrid neural network performances for systems governed by ordinary differential equations; (iv) the deployment of physics-informed bi-objective RNNs into hyper-parameter optimization and a detailed comparison of data-driven and physics-informed approaches for hyper-parameter optimization.

The paper has the following structure: Section 2 introduces the methodology behind the training algorithms of the machine learning models. Section 3 details the two different case studies, a semi-batch reactor and a wastewater treatment unit, and presents the implementations, results, and discussions in subsections for the different training algorithms. Finally, Section 4 presents the conclusions.

2. Methodology

2.1. Feed-forward artificial neural networks

In feed-forward neural networks, the information is processed in one direction, through input nodes, hidden nodes, and output nodes, as shown in Fig. 1.

Fig. 1. Feed-forward Artificial Neural Network.

A fully connected feed-forward artificial neural network is expressed as follows:

\hat{y}_t = f_1(W_y f_2(W_x x_t + B_h) + B_y)    (1)

where f_1 and f_2 are the output and hidden layer activation functions, W_x and W_y are the weight matrices associated with inputs and outputs, respectively, and B_h and B_y are the bias vectors.

2.2. Recurrent neural networks

Recurrent neural networks repeatedly apply transformations to the cell states throughout the time series, as shown in Fig. 2, following the equation below:

h_t = f_2(W_x x_t + W_h h_{t-1} + B_h)    (2)

where W_h is the weight matrix associated with the hidden units.

Fig. 2. Unfolded Recurrent Neural Network.

Fig. 3. LSTM and GRU cells (Viana et al., 2021).

Therefore, a traditional recurrent neural network can be expressed as follows:

\hat{y}_t = f_1(W_y f_2(W_x x_t + W_h h_{t-1} + B_h) + B_y)    (3)

Since the cell states can learn from past information, output predictions depend not only on the inputs at the current time step but also on the memory of the cell state. Consequently, recurrent neural networks can capture the sequential information in the input data.
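For concreteness, the following is a minimal Keras sketch of the two architectures in Eqs. (1) and (3). The layer sizes follow the configuration used later in Section 3 (two hidden layers of 25 tanh neurons and 5 time steps); the input dimension and all other details are illustrative assumptions, not the paper's released code.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

n_steps, n_inputs = 5, 2  # e.g., 5 time steps; two inputs such as (C_A, C_B)

# Feed-forward network, Eq. (1): the output depends only on the current input x_t
ann = keras.Sequential([
    layers.Input(shape=(n_inputs,)),
    layers.Dense(25, activation="tanh"),  # f2(W_x x_t + B_h)
    layers.Dense(25, activation="tanh"),
    layers.Dense(1),                      # linear output layer f1
])

# Simple recurrent network, Eq. (3): the cell state h_t carries memory over time
rnn = keras.Sequential([
    layers.Input(shape=(n_steps, n_inputs)),
    layers.SimpleRNN(25, activation="tanh", return_sequences=True),
    layers.SimpleRNN(25, activation="tanh"),  # h_t = f2(W_x x_t + W_h h_{t-1} + B_h)
    layers.Dense(1),
])

ann.compile(optimizer="adam", loss="mse")
rnn.compile(optimizer="adam", loss="mse")
```

Swapping the SimpleRNN layers for layers.LSTM or layers.GRU yields the gated variants discussed next.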
To train the neural networks, the backward propagation algorithm is preferred extensively. By sweeping backward during the propagation, gradients of the loss function are found with respect to the weights, and the weights are adjusted repeatedly to minimize the loss. If the gradients become very small during further backward sweeping, standard recurrent neural networks can suffer from the vanishing gradient problem and the network cannot be effectively trained. Therefore, if there is a long-term dependency between the relevant information and the output, long short-term memory (LSTM) or gated recurrent unit (GRU) networks can be better choices than standard recurrent networks (Chung et al., 2014). LSTM and GRU are capable of tracking long-term dependencies by controlling what information to add to or remove from the hidden state through structures called gates (Hochreiter and Schmidhuber, 1997; Cho et al., 2014).

Recurrent neural networks have a chain of repeating units. In standard RNNs, the repeating units have a simple computation node such as a single tanh layer. On the other hand, LSTM and GRU networks have more complex recurrent units containing computational blocks which control the information flow, as shown in Fig. 3.

2.3. Physics-informed neural networks for dynamical systems

2.3.1. Training with a multi-objective loss function
Training of the neural networks aims to find the appropriate weights and biases which minimize the loss function. The mean squared error (MSE) loss function, shown below, is commonly used in regression models:

L = \frac{1}{N} \sum_{i}^{N} (y_i - \hat{y}_i)^2    (4)

where N is the size of the training data, y_i is the ith target value, and \hat{y}_i is the ith predicted value.

While training the networks, the physical knowledge of the data can be included by proposing a multi-objective loss function. This ensures that the learning procedure does not depend only on the training data information but also on the physical information about the model.

A bi-objective loss function can be proposed as follows:

L = \frac{1}{N} \sum_{i}^{N} (y_i - \hat{y}_i)^2 + W_p P    (5)

where W_p is a scalar weight and P is the function derived from physical knowledge. Please note that choosing a proper W_p value is critical for effective and efficient training.

If a neural network is used to model a dynamic system that is governed by ordinary differential equations, the discretized form of the differential equations can be added to the loss function:

\frac{dy}{dt} = f(x(t), y, t)    (6)

The ordinary differential equations expressed in Eq. (6) can be discretized using Euler's backward method as follows:

y_n = y_{n-1} + h f(x_n, y_n, t_n)    (7)

where h = t_n - t_{n-1} denotes the step size, and t_n, x_n, y_n denote the time, input, and output at the nth time step, respectively.

The equation can be rewritten as follows:

y_n - y_{n-1} = h f(x_n, y_n, t_n)    (8)

When the left-hand side of Eq. (8) is calculated for each point in the target data and the right-hand side of Eq. (8) is calculated using the predicted values, the sum of the differences between these calculations must be zero if all target values are predicted perfectly. Accordingly, aiming to minimize the mean of the squares of these differences, in addition to the mean squared error between the neural network predictions and the target data, can increase the learning performance of the neural network model.
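The following small NumPy illustration makes the left-hand/right-hand split of Eq. (8) concrete: the differences of the target data form the left-hand side, while the right-hand side is evaluated with the network's predictions. The ODE used here is a toy placeholder, not one of the paper's case studies.

```python
import numpy as np

def physics_residuals(y_target, y_pred, x, t, f):
    """Residuals of Eq. (8): (y_n - y_{n-1}) - h * f(x_n, yhat_n, t_n)."""
    h = t[1:] - t[:-1]                     # step sizes t_n - t_{n-1}
    lhs = y_target[1:] - y_target[:-1]     # differences of the target data
    rhs = h * f(x[1:], y_pred[1:], t[1:])  # right-hand side with predictions
    return lhs - rhs

# Toy example with dy/dt = -y + x (hypothetical f, for illustration only)
f = lambda x, y, t: -y + x
t = np.linspace(0.0, 1.0, 11)
x = np.ones_like(t)
y_target = np.exp(-t)        # stand-in target trajectory
y_pred = y_target + 0.01     # stand-in network predictions

# Mean squared residual: this is the quantity that becomes P in the loss
P = np.mean(physics_residuals(y_target, y_pred, x, t, f) ** 2)
```

If all targets were predicted perfectly and the data obeyed the discretized ODE exactly, every residual, and hence P, would be zero.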

Similarly, in Patel et al. (2022), feed-forward neural networks are trained with physics-constrained loss functions. However, the authors do not work with differential equations and do not develop recurrent neural network models.

The function derived from physical knowledge, including the discretized ordinary differential equation for data having equidistant time intervals, can be written as:

P = \frac{1}{N} \sum_{i}^{N} (y_i - y_{i-1} - h f(x_i, \hat{y}_i, t_i))^2    (9)

A loss function with augmented physical knowledge could reduce the risk of overfitting, since the loss function tends to keep the predictions within the physical limits required by the equation. Therefore, an increase in the training error might be observed in the physics-informed models compared to the physics-uninformed models; however, the test error may decrease regardless.
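A sketch of Eqs. (5) and (9) as a custom Keras loss is given below. It assumes full-batch, time-ordered training on equidistant data (shuffle disabled), so that consecutive samples in y_true and y_pred are adjacent in time, and it captures the exogenous inputs x and times t in a closure; a custom train_step would be a more general alternative. The function f and all names are assumptions for illustration.

```python
import tensorflow as tf

def bi_objective_loss(f, x, t, w_p):
    """Loss of Eq. (5): MSE + w_p * P, with P from Eq. (9).

    Assumes the training set is passed as one time-ordered batch with
    equidistant steps; x and t are the full training inputs and times.
    """
    h = float(t[1] - t[0])  # equidistant step size

    def loss(y_true, y_pred):
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        lhs = y_true[1:] - y_true[:-1]           # target differences, Eq. (8) LHS
        rhs = h * f(x[1:], y_pred[1:], t[1:])    # f evaluated with predictions
        p = tf.reduce_mean(tf.square(lhs - rhs)) # Eq. (9)
        return mse + w_p * p

    return loss
```

Under these assumptions the model would be compiled with, e.g., model.compile(optimizer="adam", loss=bi_objective_loss(f, x_train, t_train, w_p=1.0)) and trained with batch_size=len(x_train) and shuffle=False so the time ordering of the batch is preserved.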
2.3.2. Training with physics-informed layers embedded as a hybrid cell
The ordinary differential equation expressed in Eq. (10) is usually solved through numerical integration based on the known initial values:

\frac{dy}{dt} = f(x(t), y)    (10)

As a numerical integration method, Euler's forward method with unit time step can be used to obtain Eq. (11):

y_n = y_0 + \sum_{t=1}^{n} f(x_t, y_{t-1})    (11)

The formulation is similar to a single traditional RNN cell, where the hidden cell takes both the past hidden cell information and the observations at the current time step as inputs. These inputs are passed through a perceptron, such as a single tanh layer, and the output of the current time step is obtained. Accordingly, a recurrent neural network cell can be designed as illustrated in Fig. 4, which performs the forward Euler integration method for the implementation of Eq. (10) (Viana et al., 2021).

Fig. 4. Euler RNN Cell.

The model for f(x_t, y_{t-1}) in the proposed Euler RNN cell can be defined as a hybrid model by combining the physics-informed node and the data-driven node in parallel, as shown in Fig. 5 (Dourado and Viana, 2019).

Fig. 5. Hybrid Euler RNN Cell.

First, the data-driven node is modeled as a multi-layer perceptron (MLP). Then this MLP and the known physical relation are embedded into the recurrent neural network cell to march in time while integrating starting from y_0. Finally, the proposed PI-RNN model with a hybrid Euler cell is trained with the observed inputs and outputs using the mean squared error loss function as expressed in Eq. (4).

Physics-informed layers are used to model the well-known dynamics, while data-driven layers are used to model the missing, oversimplified, or difficult-to-model dynamics. Having a physics-informed node in the network structure may reduce overfitting, since the recurrent neural network cell is more inclined to conform to the physical knowledge. Even though the training error of the physics-informed layer augmented RNN is the highest, its test error is the lowest among the neural network models. Hybrid recurrent neural networks can predict the trend because of the physics-informed layers in their structure, regardless of the learning performances of the data-driven layers. They are expected to have better test performances since the physical knowledge in their structure is also valid for the unseen data.
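A minimal sketch of such a hybrid Euler cell as a custom Keras RNN cell is shown below. The state y_{n-1} is updated by a forward Euler step whose rate is the sum of a fixed physics-informed node and a trainable data-driven MLP, mirroring the parallel structure of Fig. 5. The physical term f_phys, the MLP size, and the step size are placeholders, not the paper's exact implementation.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

class HybridEulerCell(layers.Layer):
    """Hybrid Euler cell sketch: y_n = y_{n-1} + h*(f_phys + f_data)."""
    state_size = 1
    output_size = 1

    def __init__(self, f_phys, h=1.0, **kwargs):
        super().__init__(**kwargs)
        self.f_phys = f_phys  # known physical term, e.g. a kinetic expression
        self.h = h            # integration step size
        self.f_data = keras.Sequential([   # data-driven node (MLP)
            layers.Dense(16, activation="tanh"),
            layers.Dense(1),
        ])

    def call(self, inputs, states):
        y_prev = states[0]
        z = tf.concat([inputs, y_prev], axis=-1)
        dydt = self.f_phys(inputs, y_prev) + self.f_data(z)  # parallel nodes
        y = y_prev + self.h * dydt                           # forward Euler step
        return y, [y]

# Placeholder physics node; the real one would encode the known rate term.
cell = HybridEulerCell(f_phys=lambda x, y: tf.zeros_like(y))
hybrid_rnn = layers.RNN(cell, return_sequences=True)
# The initial condition y_0 would be supplied via initial_state when calling
# the layer, so the cell marches in time starting from the known value.
```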
2.4. Hyper-parameter optimization

The performance of machine learning models is highly dependent on their parameters. There are two types of parameters in machine learning: model parameters and hyper-parameters (Luo, 2016). Model parameters in neural networks are the weights and biases, which are estimated from the training data. Finding the proper weights and biases is an optimization problem for the model. On the other hand, hyper-parameters cannot be estimated by the model; they are determined either by trial and error or by using search algorithms. The hyper-parameters in neural networks include the number of hidden layers, the number of neurons in the layers, the number of epochs, batch size, activation function, optimizer, learning rate, and the number of time steps in recurrent neural networks (Yu, 2020; Gorgolis et al., 2019). The hyper-parameters are critical variables for the learning process. Thus, they have a great impact on the estimation of the model parameters (Luo, 2016). Usually, increasing hyper-parameters such as the number of epochs, hidden layers, and hidden layer neurons provides better learning of the training data, but it can also cause overfitting, worsening the test performance. Therefore, choosing the proper hyper-parameters considering the aforementioned trade-off is crucial for both the training and test performances of the model. The choice of tuning strategy is critical for managing the hyper-parameter configuration successfully (Alibrahim and Ludwig, 2021). In this paper, our main aim is to compare the impact of physics-informed modeling on hyper-parameter tuning. Accordingly, hyper-parameter optimization will be conducted in order to find the optimal number of hidden layers and number of neurons in the hidden layers of GRUs, and the comparison is made for physics-uninformed and physics-informed models.

2.4.1. Grid search algorithm
Grid search is a basic algorithm which searches exhaustively over the specified search space for hyper-parameters (Yu, 2020). The purpose is to find the optimal hyper-parameter combination that minimizes the objective function. Cross-validation scores can be used as the objective function for regression problems. Suppose the cross-validation score is evaluated using mean squared error. In that case, the hyper-parameters which deliver the minimum of the validation mean squared error are selected as the optimal hyper-parameters.
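A minimal, library-free sketch of this procedure in the setting used later in Section 3.2.2 (k-fold cross-validation MSE over a grid of layer and neuron counts) is given below. The helper build_model and the epoch count are placeholder assumptions.

```python
import itertools
import numpy as np

def grid_search(build_model, x, y, layer_grid, neuron_grid, n_folds=2):
    """Exhaustive search: score every (n_layers, n_neurons) pair by
    k-fold cross-validation MSE and keep the minimizer.
    build_model(n_layers, n_neurons) is assumed to return a compiled model."""
    folds = np.array_split(np.arange(len(x)), n_folds)
    best, best_score = None, np.inf
    for n_layers, n_neurons in itertools.product(layer_grid, neuron_grid):
        scores = []
        for k in range(n_folds):
            val = folds[k]
            trn = np.setdiff1d(np.arange(len(x)), val)
            model = build_model(n_layers, n_neurons)
            model.fit(x[trn], y[trn], epochs=50, verbose=0)
            scores.append(model.evaluate(x[val], y[val], verbose=0))
        score = float(np.mean(scores))
        if score < best_score:
            best, best_score = (n_layers, n_neurons), score
    return best, best_score

# e.g., grid_search(build_model, x, y, layer_grid=range(1, 4),
#                   neuron_grid=range(15, 41))
```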

2.4.2. Gaussian processes-based Bayesian optimization algorithm
Bayesian optimization provides a probabilistically principled procedure that uses the Bayes theorem. A probability model, called the surrogate model, is built to approximate the objective function and directs future sampling. The Gaussian process is commonly used as a surrogate model, assuming that the function values follow a multivariate Gaussian distribution (Bergstra et al., 2014). The aim is to find the hyper-parameters which minimize or maximize the objective function. The objective function can be the cross-validation score, which can be evaluated using the mean squared error. Hyper-parameters are selected from the defined search space, and an acquisition function is used to determine which hyper-parameter combination to try next. The surrogate model is updated with the selected hyper-parameter combination and the corresponding objective function score. The algorithm usually terminates when a certain number of evaluations has been performed. Bayesian optimization can use the knowledge from all past evaluations while deciding which hyper-parameter combinations to try; therefore, the optimal results can be obtained with fewer iterations (Snoek et al., 2012). If the objective function is complex, non-convex, and computationally expensive to evaluate, as in hyper-parameter optimization, using Bayesian optimization can be advantageous.
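One way to realize this procedure is with scikit-optimize's gp_minimize, sketched below with the settings reported later in Section 3.2.2 (search space of 1-3 layers and 15-40 neurons, ten evaluations, five initial points). The acq_func="gp_hedge" option chooses probabilistically among the lower confidence bound, negative expected improvement, and negative probability of improvement, which matches the description above; cross_val_mse is a hypothetical helper (e.g., the grid-search scorer sketched earlier), and the paper does not state which library it used.

```python
from skopt import gp_minimize
from skopt.space import Integer

# Search space matching the setting used in Section 3.2.2
space = [Integer(1, 3, name="n_layers"), Integer(15, 40, name="n_neurons")]

def objective(params):
    n_layers, n_neurons = params
    # cross_val_mse is assumed to build/train a model and return the
    # cross-validation mean squared error for this configuration.
    return cross_val_mse(n_layers, n_neurons)

result = gp_minimize(objective, space,
                     n_calls=10, n_initial_points=5,
                     acq_func="gp_hedge",  # probabilistic LCB/EI/PI choice
                     random_state=0)
best_layers, best_neurons = result.x
```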
2.4.3. Genetic algorithm
A genetic algorithm is an optimization-oriented search method that is inspired by natural selection theory (Goldberg, 1989). The algorithm starts with an initial population, which is generally created randomly from the defined search spaces. This population includes a set of individuals known as chromosomes, which are the solutions to the problem. The chromosomes are characterized by a set of integer variables, known as genes, which represent the values of the hyper-parameters. Each chromosome is given a fitness score based on the fitness function. The fitness function can be selected as the mean squared error, root mean squared error, mean absolute error, and so on. The fittest chromosome pairs (parents) are selected as those with the lowest mean squared error in the population. The genes of these parents are transferred to the next generation. Crossover sites are selected randomly for each parent, and the genes of the parents are exchanged up to the crossover point, creating new chromosomes called offspring. Mutation occurs for some of the genes in the offspring to protect diversity. The process continues in an iterative manner from the fitness evaluation step until the termination condition is reached. Usually, the algorithm terminates when a certain number of generations is reached or when the best fitness value does not change for some given time. Its advantages over other search algorithms are listed below (Goldberg, 1989); a minimal sketch of such a search loop follows the list:

(1) A genetic algorithm searches in parallel from a population of points, not a single point; therefore, getting trapped in a local optimal solution might be avoided.
(2) A genetic algorithm uses just the objective function to calculate the fitness score; it does not require derivatives or auxiliary knowledge.
(3) A genetic algorithm uses probabilistic selection rules rather than deterministic rules.
(4) A genetic algorithm does not work on the parameters themselves; it works on the chromosomes.
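The sketch below implements the loop just described (fittest-pair selection, random crossover site, mutation, termination after a fixed number of generations). The fitness function is a placeholder; the population size, chromosome length, and generation count default to the values used later in Section 3.2.2.

```python
import numpy as np

rng = np.random.default_rng(0)

def genetic_search(fitness, n_bits=9, pop_size=4, n_generations=4, p_mut=0.1):
    """Minimal GA loop: fitness(chromosome) returns a value to minimize
    (e.g., training MSE); chromosomes are binary arrays of length n_bits."""
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    for _ in range(n_generations):
        scores = np.array([fitness(c) for c in pop])
        parents = pop[np.argsort(scores)[:2]]      # fittest pair (lowest MSE)
        children = []
        while len(children) < pop_size:
            cut = rng.integers(1, n_bits)          # random crossover site
            child = np.concatenate([parents[0][:cut], parents[1][cut:]])
            flip = rng.random(n_bits) < p_mut      # mutation protects diversity
            children.append(np.where(flip, 1 - child, child))
        pop = np.array(children)
    scores = np.array([fitness(c) for c in pop])
    return pop[np.argmin(scores)]
```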
(i) Training is done with a bi-objective customized loss function as
3. Results and discussion

3.1. Semi-batch reactor

Semi-batch reactors have non-linear and non-stationary dynamic behavior. For this case study, an elementary liquid-phase reaction is assumed to occur in an isothermal semi-batch reactor. Molecule B enters the semi-batch reactor continuously with a constant molar feed and reacts with molecule A, which is already present in the reactor. Accordingly, molecule C is obtained as the product following the reaction shown in Eq. (12):

A + B \rightarrow C    (12)

Fig. 6. Semi-Batch Reactor.

The mole balances of the molecules in terms of concentrations are written as follows:

\frac{dC_A}{dt} = -k C_A C_B - \frac{v_0}{V} C_A    (13)

\frac{dC_B}{dt} = -k C_A C_B + \frac{v_0}{V} (C_{Bf} - C_B)    (14)

\frac{dC_C}{dt} = k C_A C_B - \frac{v_0}{V} C_C    (15)

where C_A, C_B, C_C are the concentrations of A, B, and C, respectively, k is the rate constant, v_0 is the volumetric flow rate of B, V is the liquid volume in the semi-batch reactor, and C_{Bf} is the concentration of B in the feed.

By assuming constant density and a constant volumetric flow rate of B, the liquid volume varies with time as follows, which is derived from an overall mass balance:

\frac{dV}{dt} = v_0    (16)

The initial liquid volume in the reactor is V_0; therefore, the liquid volume can be written as:

V = V_0 + v_0 t    (17)

It is assumed that k = 9.439 x 10^{-5} m^3·mol^{-1}·s^{-1}, C_{Bf} = 17.66 mol·m^{-3}, v_0 = 1.416 x 10^{-3} m^3·s^{-1}, and V_0 = 3531 m^3. The initial concentrations of the molecules in the reactor (at t = 0) are C_{A0} = 70.63 mol·m^{-3} and C_{B0} = C_{C0} = 0.

The three ordinary differential equations are solved simultaneously using ode45 in Matlab, given the initial conditions of the concentrations, over a time interval of 30,000 s. As a result, a synthetic dataset of size 500 is created. The data is normalized between -1 and 1. The concentrations of A and B are taken as inputs and the concentration of C is taken as output for the machine learning models.
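The paper generates this dataset with MATLAB's ode45; an equivalent sketch in Python using SciPy's RK45 integrator (the same family of method) is shown below, with the parameter values and initial conditions quoted above. Variable names are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Parameters from the text (units as given there)
k, CBf, v0, V0 = 9.439e-5, 17.66, 1.416e-3, 3531.0

def reactor_odes(t, c):
    CA, CB, CC = c
    V = V0 + v0 * t                      # Eq. (17)
    r = k * CA * CB
    return [-r - (v0 / V) * CA,          # Eq. (13)
            -r + (v0 / V) * (CBf - CB),  # Eq. (14)
             r - (v0 / V) * CC]          # Eq. (15)

t_eval = np.linspace(0.0, 30000.0, 500)  # 30,000 s horizon, dataset of size 500
sol = solve_ivp(reactor_odes, (0.0, 30000.0), [70.63, 0.0, 0.0],
                t_eval=t_eval, method="RK45", rtol=1e-8)
CA, CB, CC = sol.y  # inputs (CA, CB) and output (CC) for the ML models
```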

Physics-uninformed and physics-informed ANN, RNN, LSTM, and GRU networks are used to model the proposed semi-batch reactor. Machine learning models for this study are developed in Python using the TensorFlow and Keras frameworks (Abadi et al., 2016; Chollet, 2015). The codes and dataset are publicly available.^2

^2 https://github.com/TuseAsrav/Physics-Informed-Neural-Networks-and-Hyper-parameter-Optimization-for-Dynamic-Process-Systems.

Two hidden layers with 25 neurons and a hyperbolic tangent activation function are used for all machine learning models. The number of time steps is taken as 5 in the recurrent neural network models. The models are trained using the Adam optimizer (Kingma and Ba, 2017).

For the physics-uninformed models, the objective is to minimize the mean squared error loss function as shown in Eq. (4). For the physics-informed models, unnormalized data values are used when physical knowledge is included in order to be able to capture the relation. There are two different training approaches, training with a bi-objective loss function and training with embedded physics-informed layers:

(i) Training is done with a bi-objective customized loss function as expressed in Eq. (5), and the discretized form of Eq. (15) is embedded in the loss function following Eq. (9), with a step size of 1 min. The calculation of the function with augmented physical knowledge is given in Eq. (18):

P = \frac{1}{N} \sum_{i}^{N} \left( C_{Ci} - C_{C(i-1)} - 1 \cdot \left( k C_{Ai} C_{Bi} - \frac{v_0}{V_0 + v_0 t_i} \hat{C}_{Ci} \right) \right)^2    (18)

(ii) Data-driven layers and physics-informed layers are combined in the proposed RNN cell. The multiplication of k, C_A, and C_B in the ordinary differential equation expressed in Eq. (15) is introduced to the recurrent neural network cell as physics-informed layers. The remaining part of the equation is estimated through data-driven layers. The term that is modeled as physics-informed includes the rate constant, which depends only on the constant temperature assumption according to the Arrhenius equation, and is a relatively small value. However, the term that is modeled as data-driven depends on both the constant volumetric flow rate and constant density assumptions. Additionally, saturation might occur as time passes, so a data-driven prediction is more reliable for the part that depends on time.

Fig. 7. Hybrid Euler RNN cell for Semi-Batch Reactor.

Training data is selected in two different ways:

(i) The first 30% of the normalized data is used as training data and the remainder is used as test data, to see the performance of the model when the output changes in the opposite direction.
(ii) The training data and test data are swapped with each other, to observe the performance of the model when the transiency in the training data is weaker.

Performances of the machine learning models when the first 30% of the normalized data is selected as training data and as test data are given in Sections 3.1.1 and 3.1.2, respectively. In the figures, the concentration of molecule C is given in mol·ft^{-3} and the time is given in minutes (Figs. 6-11).

3.1.1. First scenario results

Fig. 8. The performance of feed-forward artificial neural networks.

Fig. 9. The performance of recurrent neural networks.

3.1.2. Second scenario results

Fig. 10. The performance of feed-forward artificial neural networks.

Fig. 11. The performance of recurrent neural networks.

Training and test performances of the machine learning models for both scenarios are evaluated using the root mean squared error and reported in Table 3.1.

Table 3.1
Performance of the Semi-Batch Neural Network Models.

Machine learning model | First scenario training RMSE | First scenario test RMSE | Second scenario training RMSE | Second scenario test RMSE
ANN | 1.7290e-03 | 0.0612 | 1.0094e-03 | 0.0911
Bi-objective PI-ANN | 2.3334e-03 | 0.0568 | 4.8760e-04 | 0.0710
RNN | 9.2246e-03 | 0.0238 | 0.0141 | 0.0700
Bi-objective PI-RNN | 6.4320e-03 | 8.6770e-03 | 8.0581e-03 | 0.0141
LSTM | 1.2713e-03 | 0.0356 | 9.0861e-04 | 0.0265
Bi-objective PI-LSTM | 1.2338e-03 | 0.0349 | 1.0828e-03 | 0.0245
GRU | 1.2713e-03 | 0.0360 | 9.0861e-04 | 0.0141
Bi-objective PI-GRU | 1.2338e-03 | 0.0336 | 1.0828e-03 | 0.0173
Hybrid PI-RNN | 2.6681e-03 | 0.0120 | 3.2014e-04 | 9.3182e-03

In the first scenario, 30% of the data from the beginning is taken as training data. It is shown in Figs. 8 and 9 that the training data and test data move in opposite directions. All proposed feed-forward artificial neural networks and recurrent neural networks can predict the direction of the concentration of C; however, recurrent neural networks perform better than feed-forward artificial neural networks, as expected due to the dynamic behavior. Even though the training mean squared errors of some recurrent neural network models are slightly higher than those of the feed-forward neural networks, all recurrent neural network models deliver better test performances. When the feed-forward artificial neural network is trained with the bi-objective loss function, the training mean squared error increases slightly; however, the test mean squared error decreases. Yet the decrease in test error may still be insufficient for the model to capture the desired behavior, as can be seen in Fig. 8. All the root mean squared error values in Table 3.1 seem small because the actual concentration values in the data are small. The simple recurrent neural network gives the worst training mean squared error, but its test mean squared error is lower compared to LSTM and GRU. However, it is seen from Fig. 9 that the simple RNN does not produce the correct trend of the test data, so it may not be preferred over the LSTM and GRU models. Therefore, it can be said that mean squared error values could be misleading and may require improvement. Integrating physical knowledge into the loss function of the simple RNN gives the best test performance, but the recurrent neural network that combines the physics-informed layers and data-driven layers might be preferred since it predicts the trend better.

For the second scenario, 70% of the data from the end is taken as training data. When physical knowledge is added to the loss function of the feed-forward artificial neural network, the performance of the model slightly improves, but it can only predict the direction of the test data, not the trend. On the other hand, recurrent neural networks perform better, as expected, since semi-batch reactors have dynamic behavior and recurrent neural networks are best suited for dynamic process systems. Even though the training mean squared errors of some of the recurrent neural networks are higher than those of the feed-forward artificial neural networks, the test mean squared errors of the recurrent neural networks are much lower. Similarly, the physics-informed neural networks always decrease the test mean squared error for this case study, regardless of whether the training mean squared error is increased or decreased.

There is only one exception with GRU: when it is trained with the bi-objective loss function, it delivers higher training and test mean squared error values compared to the case where the GRU is trained with the mean squared error loss function. The positive effect of the physical knowledge in the loss function is mostly observed in the simple recurrent neural networks. The best test performance is observed for the hybrid recurrent neural network model with embedded physics-informed layers, showing the improvement gap for recurrent neural networks integrated with physics-informed methods.

3.2. Wastewater treatment unit

Wastewater treatment plants are usually operated in a cyclic-batch manner, contributing to complex non-linear systems subject to great disturbances in flow and load. First-principles modeling for these systems is challenging because they include large numbers of biological, physicochemical, and biochemical sub-processes (Guo et al., 2015). Developing a promising mechanistic model, especially for industrial wastewater treatment plants, is quite hard; even two different wastewater treatment plants in the same factory exhibit distinct process behaviors.

In order to design neural network models for biological wastewater treatment, an open-loop simulation of Benchmark Simulation Model 1 (BSM1), described by Alex et al. (2008), was utilized to create a synthetic dataset, as shown in Fig. 12. The model includes two anaerobic tanks, three aerated tanks, and one settler.

Fig. 12. General overview of the BSM1 plant (taken from Alex et al., 2008).

The Activated Sludge Model no. 1 (Henze et al., 1987) is a mechanistic dynamical model which is selected in BSM1 to describe the phenomena occurring in the bioreactor, including the biological processes of carbonaceous energy removal, nitrification, and denitrification.

The influent data of size 1345, initially proposed by Vanhooren and Nguyen (1996), is used in the simulation. There are three different influent datasets for three different weather conditions: dry weather, rainy weather, and stormy weather. In this case study, the rainy weather influent data is selected. The rainy weather influent data is a combination of dry weather and a long rain period that begins in the middle of the eighth day and lasts approximately 2 days.

The variables in the dataset are given in Table 3.2. Organic constituents are expressed in chemical oxygen demand (COD) units.

Table 3.2
ASM1 variables.

Definition | Notation | Unit
Time | t | d
Volumetric air flow rate | Q | m^3·d^{-1}
Soluble inert organic matter concentration | S_I | g COD·m^{-3}
Readily biodegradable substrate concentration | S_S | g COD·m^{-3}
Particulate inert organic matter concentration | X_I | g COD·m^{-3}
Slowly biodegradable substrate concentration | X_S | g COD·m^{-3}
Active heterotrophic biomass concentration | X_B,H | g COD·m^{-3}
Active autotrophic biomass concentration | X_B,A | g COD·m^{-3}
Particulate products arising from biomass decay concentration | X_P | g COD·m^{-3}
Dissolved oxygen concentration | S_O | g (-COD)·m^{-3}
Soluble nitrate and nitrite nitrogen concentration | S_NO | g NO3-N·m^{-3}
Soluble NH4+ + NH3 nitrogen concentration | S_NH | g NH3-N·m^{-3}
Soluble biodegradable organic nitrogen concentration | S_ND | g N·m^{-3}
Particulate biodegradable organic nitrogen concentration | X_ND | g N·m^{-3}
Alkalinity | S_ALK | mol·m^{-3}

In any influent: S_O = 0 g (-COD)·m^{-3}, X_B,A = 0 g COD·m^{-3}, S_NO = 0 g NO3-N·m^{-3}, X_P = 0 g COD·m^{-3}, S_ALK = 7 mol·m^{-3}.

S_NH is a critical determinant of wastewater pollution due to its toxicity. Therefore, predicting S_NH using recurrent neural networks is vital to be aware of limit violations (Pisa et al., 2018).

In Quaghebeur et al. (2022), the parameter μ_A in Eq. (19) is assumed to be zero, and the performance of the resulting hybrid model with S_NH taken as output is observed. The neural network is expected to predict the missing dynamics.

ρ_3 = μ_A \left( \frac{S_{NH}}{K_{NH} + S_{NH}} \right) \left( \frac{S_O}{K_{O,A} + S_O} \right) X_{B,A}    (19)

According to BSM1, the aerobic growth of autotrophs in the bioreactor is governed by Eq. (19), which affects the S_NH dynamics as follows:

\frac{dS_{NH}}{dt} = -\left( i_{XB} + \frac{1}{Y_A} \right) ρ_3    (20)

The kinetic parameters and stoichiometric parameters that are used in Eq. (19) and Eq. (20) are given in Table 3.3 and Table 3.4, respectively.

Table 3.3
Kinetic parameters for Eq. (19) (Alex et al., 2008).

Parameter | Value | Unit
μ_A | 0.5 | d^{-1}
K_NH | 1.0 | g NH3-N·m^{-3}
K_O,A | 0.4 | g (-COD)·m^{-3}

Table 3.4
Stoichiometric parameters for Eq. (20) (Alex et al., 2008).

Parameter | Value | Unit
i_XB | 0.08 | g N·(g COD)^{-1} in biomass
Y_A | 0.24 | g cell COD formed·(g N oxidized)^{-1}

For the machine learning models, S_NH in the 5th tank is taken as the output, while S_S, X_S, X_I, Q, S_ND, X_ND, X_B,H, and S_NH in the influent are taken as inputs. The data is normalized between -1 and 1.

Linear regression, support vector machine, and physics-informed and physics-uninformed ANN, RNN, LSTM, and GRU networks are used to model the wastewater treatment unit. All proposed machine learning models for this study are developed in Python using the TensorFlow and Keras frameworks (Abadi et al., 2016; Chollet, 2015). The codes and dataset are publicly available.^3

^3 https://github.com/TuseAsrav/Physics-Informed-Neural-Networks-and-Hyper-parameter-Optimization-for-Dynamic-Process-Systems.

Two hidden layers with 25 neurons and a hyperbolic tangent activation function are used for all machine learning models. The number of time steps is taken as 5 in the recurrent neural network models. The models are trained using the Adam optimizer (Kingma and Ba, 2017).

For the physics-uninformed models, the objective is to minimize the mean squared error loss function as shown in Eq. (4). For the physics-informed models, unnormalized data values are used when physical knowledge is included, to be able to capture the relation. There are two different training approaches, training with a bi-objective loss function and training with embedded physics-informed layers:

(i) Training is done with a bi-objective customized loss function as expressed in Eq. (5), and the discretized form of Eq. (20) is embedded in the loss function following Eq. (9), with a step size of 0.01042 days. The calculation of the function with augmented physical knowledge is given in Eq. (21):

P = \frac{1}{N} \sum_{i}^{N} \left( S_{NHi} - S_{NH(i-1)} + 0.01042 \left( i_{XB} + \frac{1}{Y_A} \right) ρ_3 \right)^2    (21)

(ii) Data-driven layers and physics-informed layers are combined in the proposed RNN cell. The coefficient of ρ_3 in the ordinary differential equation expressed in Eq. (20) is introduced to the neural network as physical knowledge, while ρ_3 given in Eq. (19) is estimated through a multilayer perceptron because of the complex dynamics.
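The sketch below spells out Eqs. (19)-(21) numerically, using the parameter values from Tables 3.3 and 3.4. Here rho3_pred is assumed to be evaluated from predicted quantities (in approach (ii) it would come from the MLP); the function names are illustrative.

```python
import numpy as np

# Parameters from Tables 3.3 and 3.4
mu_A, K_NH, K_OA = 0.5, 1.0, 0.4
i_XB, Y_A = 0.08, 0.24
h = 0.01042  # step size in days

def rho3(S_NH, S_O, X_BA):
    """Monod-type autotrophic growth rate of Eq. (19)."""
    return mu_A * (S_NH / (K_NH + S_NH)) * (S_O / (K_OA + S_O)) * X_BA

def physics_term(S_NH_target, rho3_pred):
    """Eq. (21): mean squared residual of the discretized S_NH balance,
    with rho3_pred evaluated from predicted quantities."""
    lhs = S_NH_target[1:] - S_NH_target[:-1]       # target differences
    rhs = -h * (i_XB + 1.0 / Y_A) * rho3_pred[1:]  # Eq. (20) discretized
    return np.mean((lhs - rhs) ** 2)
```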

Fig. 13. Hybrid Euler RNN cell for Wastewater Treatment.

Training data is selected from the influent data before the rain event occurs, and test data is selected from the influent data when the weather is rainy, which is quite different from the dry weather influent data. Therefore, the performances of the models can be evaluated on predicting unseen operational regions, as presented in Section 3.2.1.

3.2.1. Results
The training and test performances of the machine learning models are evaluated using the mean squared error and reported in Table 3.5.

Table 3.5
Performance of the Wastewater Treatment Unit Neural Network Models.

Machine learning model | Training MSE | Test MSE
Linear Regression | 1.6507 | 34.7919
Support Vector Regression | 0.8737 | 36.6601
ANN | 1.0892 | 27.2682
Bi-objective PI-ANN | 1.1440 | 25.6443
RNN | 0.0467 | 3.2956
Bi-objective PI-RNN | 0.0558 | 1.3629
LSTM | 0.0162 | 2.1453
Bi-objective PI-LSTM | 0.0126 | 2.1216
GRU | 0.0162 | 0.6943
Bi-objective PI-GRU | 0.0126 | 0.4326
Hybrid PI-RNN | 0.1499 | 0.3605

Fig. 14. The performance of linear regression and SVM regression.

Fig. 15. The performance of feed-forward artificial neural networks.

Fig. 16. The performance of recurrent neural networks.

Linear regression, support vector regression, and feed-forward artificial neural network models give oscillatory predictions and fail to learn the test data. The support vector machine model exhibits a lower training error than the linear regression and feed-forward artificial neural network models; however, it yields the highest test mean squared error. Linear regression and feed-forward artificial neural networks can partly predict the trend of the test data, but their test mean squared errors are quite high. The bi-objective physics-informed feed-forward artificial neural network (Bi-objective PI-ANN) also fails to predict the test data, even though the test mean squared error decreases slightly (Fig. 15).

Recurrent neural networks perform well, as expected, because of the dynamic behavior. The simple RNN gives oscillatory test predictions, which are not observed for the other recurrent neural networks. Training the simple RNN with the bi-objective loss function significantly improves the test performance; however, it still delivers a slightly higher training error. For LSTM and GRU, when the physical knowledge is added to the loss function, both the training performances and test performances improve. Even though the training errors of the LSTM and GRU models are equal, GRU performs better than LSTM on the test data. The hybrid PI-RNN has the highest training mean squared error among the recurrent neural networks. It is shown in Fig. 16 that the trend of the training data can be fully captured by the network, but the prediction slips slightly, which also affects the test performance. Even then, the hybrid PI-RNN with embedded layers brings about the best test performance, with a small mean squared error.

3.2.2. Hyper-parameter optimization
Both the physics-uninformed and physics-informed GRU deliver better performance than the simple RNN and LSTM for the wastewater treatment unit. In this section, the number of hidden layers and the number of neurons in the hidden layers are determined through grid search, Gaussian processes-based Bayesian optimization, and a genetic algorithm for the physics-informed and physics-uninformed GRU models. The main aim is to investigate the impact of physics-informed training on the performance of hyper-parameter optimization. The hyperbolic tangent activation function is used in the hidden layers and the number of time steps is taken as 5. The models are trained using the Adam optimizer (Kingma and Ba, 2017).

Grid search. The objective function for grid search is determined as the 2-fold cross-validation score evaluated by mean squared error. The search space for the hyper-parameters includes the integers between 1 and 3 for the number of hidden layers and the integers between 15 and 40 for the number of neurons in the hidden layers. The training and test performances of three runs for each model, along with the hyper-parameter values found by grid search, are reported in Table 3.6.

Table 3.6
Hyper-parameter Optimization Performance through grid search.

Machine learning model | Run # | Training MSE | Test MSE | # of neurons in the hidden layers | # of hidden layers
GRU | 1 | 0.0629 | 0.9309 | 27 | 2
GRU | 2 | 0.0332 | 0.4818 | 33 | 2
GRU | 3 | 0.0188 | 0.3680 | 31 | 3
Bi-objective PI-GRU | 1 | 0.0402 | 0.4993 | 23 | 2
Bi-objective PI-GRU | 2 | 0.0278 | 0.4373 | 31 | 2
Bi-objective PI-GRU | 3 | 0.0203 | 0.3458 | 30 | 3

Comparing the second run of GRU and the first run of PI-GRU, PI-GRU delivers almost the same test mean squared error as GRU but uses ten fewer neurons. Moreover, although the GRU in the second run uses two additional neurons compared with the second run of PI-GRU, the test mean squared error of PI-GRU is lower than that of GRU. Similarly, when the third runs are compared, it can be seen that PI-GRU gives better test performance while using one fewer neuron, even though its training mean squared error is slightly higher. A similar observation can be made by analyzing the first runs, as the training mean squared error of the first run of GRU is the highest among the others.

Gaussian-processes based Bayesian optimization. The objective function and the search space for the hyper-parameters are selected as the same as for the grid search algorithm. The total number of evaluations is determined as ten, and the number of initial points before approximating with the Gaussian process estimator is determined as five. At every iteration, the acquisition function is selected probabilistically among the lower confidence bound, negative expected improvement, and negative probability of improvement. The training and test performances of three runs for each model, along with the hyper-parameter values found by the Gaussian processes-based Bayesian optimization, are reported in Table 3.7.

Table 3.7
Hyper-parameter Optimization Performance through GP-based Bayesian Optimization.

Machine learning model | Run # | Training MSE | Test MSE | # of neurons in the hidden layers | # of hidden layers
GRU | 1 | 0.0273 | 0.8613 | 24 | 2
GRU | 2 | 0.0225 | 0.4264 | 34 | 2
GRU | 3 | 0.0157 | 0.3740 | 27 | 3
Bi-objective PI-GRU | 1 | 0.0289 | 0.7252 | 16 | 2
Bi-objective PI-GRU | 2 | 0.0135 | 0.3818 | 32 | 2
Bi-objective PI-GRU | 3 | 0.0191 | 0.3169 | 26 | 3

When the first runs, second runs, and third runs are compared, the PI-GRU models deliver better test performances regardless of the slight increases or decreases in the training mean squared errors, even though they use an equal number of hidden layers and fewer neurons in the hidden layers, showing the improving impact of physics-informed methods for hyper-parameter optimization.

Genetic algorithm. For the genetic algorithm, the population size is determined as 4, and a binary array of size 9 is used as the genetic solution representation (chromosome). The first six genes represent the number of neurons in the hidden layers and the last three genes represent the number of hidden layers. The fitness function for the genetic algorithm is determined as the training mean squared error, and the algorithm terminates when a certain number of generations is reached, which is 4.
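The paper does not spell out the exact binary-to-integer mapping of this chromosome, so the decoding below is an assumption for illustration: the two gene groups are read as plain binary integers.

```python
def decode_chromosome(bits):
    """Decode the 9-bit chromosome described above: the first six genes
    encode the number of neurons, the last three the number of hidden
    layers (plain binary-to-integer mapping assumed; not stated in the
    paper)."""
    n_neurons = int("".join(str(b) for b in bits[:6]), 2)  # range 0-63
    n_layers = int("".join(str(b) for b in bits[6:]), 2)   # range 0-7
    return n_layers, n_neurons

# e.g., decode_chromosome([1, 0, 0, 1, 1, 1, 0, 1, 0]) -> (2, 39)
```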
The training and test performances of three runs for each model, along with the hyper-parameter values found by the genetic algorithm, are reported in Table 3.8.

Table 3.8
Hyper-parameter Optimization Performance through Genetic Algorithm.

Machine learning model | Run # | Training MSE | Test MSE | # of neurons in the hidden layers | # of hidden layers
GRU | 1 | 0.0646 | 1.2906 | 15 | 2
GRU | 2 | 0.0263 | 0.7542 | 36 | 2
GRU | 3 | 0.0248 | 0.5734 | 28 | 3
Bi-objective PI-GRU | 1 | 0.0263 | 0.5810 | 15 | 2
Bi-objective PI-GRU | 2 | 0.0208 | 0.5118 | 28 | 2
Bi-objective PI-GRU | 3 | 0.0173 | 0.1858 | 36 | 2

When the runs with similar training mean squared errors are compared, the test mean squared errors have lower values in the PI-GRU models. For example, the second run of GRU and the first run of PI-GRU deliver exactly the same training mean squared errors, but the test mean squared error of PI-GRU is lower even though it uses considerably fewer neurons in the hidden layers. A similar observation can be made by comparing the third run of GRU and the second run of PI-GRU, which have an equal number of neurons in their hidden layers; although the GRU has one more hidden layer, it brings about a higher test mean squared error.

Upon examining the first runs, even though the same number of hidden layers and neurons are used, PI-GRU delivers better training and test performances. The training mean squared error of GRU is significantly higher compared with the other runs; accordingly, it can be said that the first run of GRU gives an underfit model. Similarly, the second run of GRU and the third run of PI-GRU use the same number of hidden layers and neurons; however, there is a significant difference in the test performances. PI-GRU brings about better test performance, having the lowest test mean squared error.

From all the results obtained after hyper-parameter optimization through grid search, Gaussian processes-based Bayesian optimization, and the genetic algorithm, we conclude that using physics-informed bi-objective training for hyper-parameter optimization enables more robust data-driven dynamic modeling in this study.

4. Conclusion

Physics-informed neural networks have significant advantages over purely data-driven black-box models for the two case studies presented for the dynamic modeling of process systems in this study. Two different physics-informed training approaches are considered. In the first one, the loss function includes the error between the left-hand side, solved with all the real values, and the right-hand side, solved with all the predicted values, of the discretized form of an ordinary differential equation. In the second one, missing or complex dynamics are modeled through data-driven layers and the well-known dynamics are embedded as physical knowledge in a hybrid recurrent neural network cell performing integration. The impact of physics-informed neural networks is investigated on a semi-batch reactor and a wastewater treatment unit, which are dynamic process systems governed by ordinary differential equations. Different machine learning algorithms such as feed-forward artificial neural networks, linear regression, and support vector regression are also used to model the semi-batch reactor and the wastewater treatment unit; however, recurrent neural networks delivered considerably better performances since they can capture the sequential information. Two different scenarios are investigated for the semi-batch reactor; the training data of the first scenario is taken as the test data in the second scenario. In both scenarios, the hybrid recurrent neural network gives the minimum test error and predicts the test trend better than the other networks. For the wastewater treatment unit, the hybrid RNN delivers the best test performance, and the second-best performance is obtained by the physics-informed GRU. Physics-informed training improves the test performances compared to the physics-uninformed models in most cases, regardless of whether the training error is decreased or increased. Finally, hyper-parameter optimization is done for the physics-uninformed and physics-informed GRU models of the wastewater treatment unit, and it is observed that similar test errors can be obtained using fewer neurons with physics-informed training.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D.G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Zheng, X., 2016. TensorFlow: a system for large-scale machine learning. In: OSDI'16: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation.

Alex, J., Benedetti, L., Copp, J., Gernaey, K.V., Jeppsson, U., Nopens, I., Pons, M.N., Steyer, J.P., Vanrolleghem, P., 2008. Benchmark Simulation Model no. 1 (BSM1). Report by the IWA Taskgroup on Benchmarking of Control Strategies for WWTPs.

Alibrahim, H., Ludwig, S.A., 2021. Hyperparameter optimization: comparing genetic algorithm against grid search and Bayesian optimization. In: Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2021), pp. 1551-1559. https://doi.org/10.1109/CEC45853.2021.9504761.

Andersson, J.A.E., Gillis, J., Horn, G., Rawlings, J.B., Diehl, M., 2019. CasADi: a software framework for nonlinear optimization and optimal control. Math. Program. Comput. 11 (1), 1-36. https://doi.org/10.1007/s12532-018-0139-4.

Asrav, T., Koksal, E.S., Esenboga, E.E., Cosgun, A., Kusoglu, G., Aydin, E., 2023. Physics-informed neural network based modeling of an industrial wastewater treatment unit. In: Proceedings of the European Symposium on Computer Aided Process Engineering. Accepted.

Bergstra, J., Bardenet, R., Bengio, Y., Kegl, B., 2014. Algorithms for hyper-parameter optimization. In: Proceedings of the Neural Information Processing Systems Conference.

Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y., 2014. On the properties of neural machine translation: encoder-decoder approaches. https://doi.org/10.48550/arXiv.1409.1259.

Chollet, F., 2015. Keras.

Chung, J., Gulcehre, C., Cho, K., Bengio, Y., 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. https://doi.org/10.48550/arXiv.1412.3555.

Dourado, A., Viana, F.A.C., 2019. Physics-informed neural networks for corrosion-fatigue prognosis. https://doi.org/10.36001/phmconf.2019.v11i1.814.

Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning.

Gorgolis, N., Hatzilygeroudis, I., Istenes, Z., Gyenne, L.G., 2019. Hyperparameter optimization of LSTM network models through genetic algorithm. In: Proceedings of the 10th International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1-4. https://doi.org/10.1109/IISA.2019.8900675.

Guo, H., Jeong, K., Lim, J., Jo, J., Kim, Y.M., Park, J.P., Kim, J.H., Cho, K.H., 2015. Prediction of effluent concentration in a wastewater treatment plant using machine learning models. J. Environ. Sci. (China) 32, 90-101. https://doi.org/10.1016/j.jes.2015.01.007.

Haghighat, E., Juanes, R., 2021. SciANN: a Keras/TensorFlow wrapper for scientific computations and physics-informed deep learning using artificial neural networks. Comput. Methods Appl. Mech. Eng. 373. https://doi.org/10.1016/j.cma.2020.113552.

Hansen, L.D., Bjerregaard, M.S., Durdevic, P., 2022. Modeling phosphorous dynamics in a wastewater treatment process using Bayesian optimized LSTM. Comput. Chem. Eng. 160. https://doi.org/10.1016/j.compchemeng.2022.107738.

Henze, M., Grady Jr., C.P.L., Gujer, W., Marais, G.v.R., Matsuo, T., 1987. Activated Sludge Model no. 1. IAWQ Scientific and Technical Report No. 1, IAWQ, London, UK.

Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9, 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735.

Kingma, D.P., Ba, J., 2017. Adam: a method for stochastic optimization. https://doi.org/10.48550/arXiv.1412.6980.

Luo, G., 2016. A review of automatic selection methods for machine learning algorithms and hyper-parameter values. Netw. Model. Anal. Health Inform. Bioinform. 5, 18. https://doi.org/10.1007/s13721-016-0125-6.

Merkelbach, K., Schweidtmann, A.M., Müller, Y., Schwoebel, P., Mhamdi, A., Mitsos, A., Schuppert, A., Mrziglod, T., Schneckener, S., 2022. HybridML: open source platform for hybrid modeling. Comput. Chem. Eng. 160. https://doi.org/10.1016/j.compchemeng.2022.107736.

Nascimento, R.G., Viana, F.A.C., 2019. Fleet prognosis with physics-informed recurrent neural networks. https://doi.org/10.48550/arXiv.1901.05512.

Patel, R.S., Bhartiya, S., Gudi, R.D., 2022. Physics constrained learning in neural network based modeling. IFAC-PapersOnLine 55 (7), 79-85. https://doi.org/10.1016/j.ifacol.2022.07.425.

Pisa, I., Santin, I., Vicario, J.L., Morell, A., 2018. A recurrent neural network for wastewater treatment plant effluents' prediction. In: Proceedings of the Actas de Las XXXIX Jornadas de Automática.

Quaghebeur, W., Torfs, E., De Baets, B., Nopens, I., 2022. Hybrid differential equations: integrating mechanistic and data-driven techniques for modelling of water systems. Water Res. 213. https://doi.org/10.1016/j.watres.2022.118166.

Raissi, M., Perdikaris, P., Karniadakis, G.E., 2019. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686-707. https://doi.org/10.1016/j.jcp.2018.10.045.

Snoek, J., Larochelle, H., Adams, R.P., 2012. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems, 25.

Subraveti, S.G., Li, Z., Prasad, V., Rajendran, A., 2022. Physics-based neural networks for simulation and synthesis of cyclic adsorption processes. Ind. Eng. Chem. Res. 61 (11), 4095-4113. https://doi.org/10.1021/acs.iecr.1c04731.

Thebelt, A., Wiebe, J., Kronqvist, J., Tsay, C., Misener, R., 2022. Maximizing information from chemical engineering data sets: applications to machine learning. Chem. Eng. Sci. 252. https://doi.org/10.1016/j.ces.2022.117469.

Vanhooren, H., Nguyen, K., 1996. Development of a Simulation Protocol for Evaluation of Respirometry-Based Control Strategies. Report, University of Gent and University of Ottawa.

Viana, F.A.C., Nascimento, R.G., Dourado, A., Yucesan, Y.A., 2021. Estimating model inadequacy in ordinary differential equations with physics-informed neural networks. Comput. Struct. 245. https://doi.org/10.1016/j.compstruc.2020.106458.

von Stosch, M., Oliveira, R., Peres, J., Feyo de Azevedo, S., 2014. Hybrid semi-parametric modeling in process systems engineering: past, present and future. Comput. Chem. Eng. 60, 86-101. https://doi.org/10.1016/j.compchemeng.2013.08.008.

Xiao, T., Wu, Z., Christofides, P.D., Armaou, A., Ni, D., 2022. Recurrent neural-network-based model predictive control of a plasma etch process. Ind. Eng. Chem. Res. 61 (1), 638-652. https://doi.org/10.1021/acs.iecr.1c04251.

Yu, T., Zhu, H., 2020. Hyper-parameter optimization: a review of algorithms and applications. https://doi.org/10.48550/arXiv.2003.05689.