Physics-Informed Recurrent Neural Networks and Hyper-Parameter Optimization for Dynamic Process Systems
Keywords: Machine learning; Recurrent neural networks; Physics-informed neural networks; Hybrid neural networks; Hyper-parameter optimization

Abstract: Many of the processes in chemical engineering applications are dynamic in nature. Mechanistic modeling of these processes is challenging due to complexity and uncertainty. On the other hand, recurrent neural networks can be utilized to model dynamic processes using the available data. Although these networks can capture the complexities, they are prone to overfitting and require high-quality and adequate data. In this study, two different physics-informed training approaches are investigated. The first approach uses a multi-objective loss function in training, including the discretized form of the differential equation. The second approach uses a hybrid recurrent neural network cell with embedded physics-informed and data-driven nodes performing Euler discretization. Physics-informed neural networks can improve test performance even though a decrease in training performance might be observed. Finally, smaller and more robust architectures are obtained using hyper-parameter optimization when physics-informed training is performed.
1. Introduction

Mechanistic mathematical models are developed based on first principles to predict the actual behavior of processes. However, chemical engineering often deals with complex systems, and obtaining an accurate mechanistic model for these systems is quite challenging (Thebelt et al., 2022).

Artificial neural networks are utilized to empirically model nonlinear systems due to their capability of capturing complex relationships. They are data-driven black-box models inspired by the human nervous system. One class of artificial neural networks is the recurrent neural networks, which extend the standard feed-forward artificial neural networks to handle time-dependent responses. Recurrent neural networks are best suited for sequential data and might be considered as an alternative modeling approach for dynamical systems once proper data is available. For instance, in Xiao et al. (2022) a recurrent neural network model is utilized as the prediction model in a model predictive control problem for a complex system governed by partial differential equations.

A major disadvantage of the traditional recurrent neural networks is the vanishing gradient problem, which makes learning long-term dependencies difficult. On the other hand, LSTM (Long Short Term Memory) or GRU (Gated Recurrent Unit) networks, with more complex mechanisms in the network structure, process longer sequences by regulating the flow of information (Chung et al., 2014).

Data-driven machine learning models may also have some limitations. The performance of a purely data-driven model depends on both the quality and the quantity of the data. In Thebelt et al. (2022), four data characteristics that make data-driven modeling difficult, namely variance, volume, veracity, and physical restrictions, are explained. Additionally, standard data-driven models usually neglect the physical laws governing the real phenomena.

Physics-informed neural networks can be used to find the solutions of differential equations and to discover the form of differential equations. Raissi et al. (2019) propose physics-informed feed-forward neural networks to solve nonlinear differential equations through selected collocation points by encoding the initial/boundary conditions as well as the differential equation into the loss function. In addition, the authors introduce the inverse problem to find the coefficients of the differential equations. Similarly, a toolbox called SciANN is developed in Python by Haghighat and Juanes (2021), aiming to simplify the use of artificial neural networks for scientific computations while inheriting
The formulation is similar to a single traditional RNN cell where the hidden cell takes both the past hidden cell information and the observations at the current time step as inputs. These inputs are passed through a perceptron, such as a single tanh layer, and the output of the current time step is obtained. Accordingly, a recurrent neural network cell can be designed as illustrated in Fig. 4, which performs the forward Euler integration method for the implementation of Eq. (10) (Viana et al., 2021).

The model for f(x_t, y_{t-1}) in the proposed Euler RNN cell can be defined as a hybrid model by combining the physics-informed node and the data-driven node in parallel, as shown in Fig. 5 (Dourado and Viana, 2019).
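As an illustration, such a hybrid Euler cell can be sketched as a custom Keras layer; the scalar state, the layer sizes and the f_physics callable are assumptions made for this sketch, not the exact architecture of the paper.

```python
import tensorflow as tf

class HybridEulerRNNCell(tf.keras.layers.Layer):
    """Forward-Euler RNN cell: y_t = y_{t-1} + dt * f(x_t, y_{t-1}), where f is
    the sum of a known physics node and a trainable data-driven node."""

    def __init__(self, dt, f_physics, hidden_units=16, **kwargs):
        super().__init__(**kwargs)
        self.dt = dt                 # integration step size
        self.f_physics = f_physics   # known part of the dynamics, callable(x, y)
        self.state_size = 1          # scalar state y, for illustration
        # data-driven node estimating the missing part of the dynamics
        self.mlp = tf.keras.Sequential([
            tf.keras.layers.Dense(hidden_units, activation="tanh"),
            tf.keras.layers.Dense(1),
        ])

    def call(self, inputs, states):
        y_prev = states[0]
        # physics-informed node and data-driven node act in parallel
        dydt = self.f_physics(inputs, y_prev) + self.mlp(
            tf.concat([inputs, y_prev], axis=-1))
        y = y_prev + self.dt * dydt  # forward Euler step, cf. Eq. (10)
        return y, [y]

# usage sketch: assumed known dynamics f(x, y) = -0.5 * y
cell = HybridEulerRNNCell(dt=0.1, f_physics=lambda x, y: -0.5 * y)
model = tf.keras.Sequential([tf.keras.layers.RNN(cell, return_sequences=True)])
```

Keeping the physics node fixed and training only the data-driven node lets the network learn just the model inadequacy, in the spirit of Viana et al. (2021).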
function for regression problems. Suppose the cross-validation score is evaluated using the mean squared error. In that case, the hyper-parameters which deliver the minimum validation mean squared error are selected as the optimal hyper-parameters.

2.4.2. Gaussian processes-based Bayesian optimization algorithm

Bayesian optimization provides a probabilistically principled procedure that uses the Bayes theorem. A probability model, called the surrogate model, is built to approximate the objective function and directs future sampling. The Gaussian process is commonly used as a surrogate model, assuming that the function values follow a multivariate Gaussian distribution (Bergstra et al., 2014). The aim is to find the
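To make the surrogate-directed sampling concrete, the sketch below fits a Gaussian process to a few evaluated hyper-parameter settings and picks the next candidate by expected improvement; the data points and the use of scikit-learn here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

# illustrative evaluated settings (n_layers, n_neurons) and their validation MSEs
X = np.array([[1.0, 15.0], [2.0, 25.0], [3.0, 40.0]])
y = np.array([0.9, 0.4, 0.6])

gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)

def expected_improvement(x, best):
    """Expected improvement acquisition for a minimization problem."""
    mu, sigma = gp.predict(np.atleast_2d(x), return_std=True)
    z = (best - mu) / np.maximum(sigma, 1e-12)
    return float((best - mu) * norm.cdf(z) + sigma * norm.pdf(z))

# the next setting to evaluate maximizes the acquisition over the search space
candidates = [(l, n) for l in range(1, 4) for n in range(15, 41)]
best_next = max(candidates, key=lambda c: expected_improvement(c, y.min()))
print("next point to evaluate:", best_next)
```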
Although the training mean squared errors of the recurrent neural network models are slightly higher than those of the feed-forward neural networks, all recurrent neural network models deliver better test performances. When the feed-forward artificial neural network is trained with the bi-objective loss function, the training mean squared error increases slightly; however, the test mean squared error decreases. Yet, even with this decrease, the test error may still be unacceptably high for the model to capture the desired behavior, as can be seen in Fig. 8. All the root mean squared error values in Table 3.1 seem small because the actual concentration values in the data are small. The simple recurrent neural network gives the worst training mean squared error, but its test mean squared error is lower compared to LSTM and GRU. However, it is seen from Fig. 9 that the simple RNN does not reproduce the correct trend of the test data, so it may not be preferred over the LSTM and GRU models. Therefore, mean squared error values alone could be misleading, and the evaluation may require additional criteria. Integrating physical knowledge into the loss function of the simple RNN gives the best test performance, but the recurrent neural network that combines the physics-informed layers and data-driven layers might be preferred since it predicts the trend better.
Table 3.1
Performance of the Semi-Batch Neural Network Models.
Machine learning model    First Scenario (Training RMSE, Test RMSE)    Second Scenario (Training RMSE, Test RMSE)
For the second scenario, 70% of the data from the end is taken as training data. When physical knowledge is added to the loss function of the feed-forward artificial neural network, the performance of the model slightly improves, but it can only predict the direction of the test data, not the trend. On the other hand, recurrent neural networks perform better, as expected, since semi-batch reactors have dynamic behavior and recurrent neural networks are best suited for dynamic process systems. Even though the training mean squared errors of some of the recurrent neural networks are higher than those of the feed-forward artificial neural networks, the test mean squared errors of the recurrent neural networks are much lower. Similarly, physics-informed neural networks always decrease the test mean squared error for this case study, regardless of whether the training mean squared error increases or decreases. There is only one exception with GRU,
Fig. 12. General overview of the BSM1 plant (taken from Alex et al., 2008).
Fig. 13. Hybrid Euler RNN cell for Wastewater Treatment.

ρ3, given in Eq. (19), is estimated through a multilayer perceptron because of its complex dynamics.

Training data is selected from the influent data before the rain event occurs, and test data is selected from the influent data when the weather is rainy, which is quite different from the dry weather influent data. Therefore, the ability of the models to predict unseen operational regions can be evaluated, as given in Section 3.2.1.

3.2.1. Results

The training and test performances of the machine learning models are evaluated using the mean squared error and reported in Table 3.5.

Table 3.5
Performance of the Wastewater Treatment Unit Neural Network Models.
Machine learning model        Training MSE   Test MSE
Linear Regression             1.6507         34.7919
Support Vector Regression     0.8737         36.6601
ANN                           1.0892         27.2682
Bi-objective PI-ANN           1.1440         25.6443
RNN                           0.0467         3.2956
Bi-objective PI-RNN           0.0558         1.3629
LSTM                          0.0162         2.1453
Bi-objective PI-LSTM          0.0126         2.1216
GRU                           0.0162         0.6943
Bi-objective PI-GRU           0.0126         0.4326
Hybrid PI-RNN                 0.1499         0.3605

Linear regression, support vector regression and feed-forward artificial neural network models give oscillatory predictions and fail to learn the test data. The support vector regression model exhibits a lower training error than the linear regression and feed-forward artificial neural network models; however, it yields the highest test mean squared error. Linear regression and feed-forward artificial neural networks can partly predict the trend of the test data, but their test mean squared errors are quite high. The bi-objective physics-informed feed-forward artificial neural network (Bi-objective PI-ANN) also fails to predict the test data even though the test mean squared error decreases slightly (Fig. 15).

Recurrent neural networks perform well, as expected, because of the dynamic behavior. The simple RNN gives oscillatory test predictions, which are not observed for the other recurrent neural networks. The simple RNN trained with the bi-objective loss function significantly improves the test performance; however, it still delivers a slightly higher training error. For LSTM and GRU, when the physical knowledge is added to the loss function, both training and test performances improve. Even though the training errors of the LSTM and GRU models are equal, GRU performs better than LSTM over the test data. The hybrid PI-RNN has the highest training mean squared error among the recurrent neural networks. It is shown in Fig. 16 that the network fully captures the trend of the training data, although with a slight offset, which also affects the test performance. Even then, the hybrid PI-RNN with embedded layers brings about the best test performance with a small mean squared error.
3.2.2. Hyper-parameter optimization

Both the physics-uninformed and physics-informed GRU deliver better performance than the simple RNN and LSTM for the wastewater treatment unit. In this section, the number of hidden layers and the number of neurons in the hidden layers are determined through grid search, Gaussian-processes based Bayesian optimization and a genetic algorithm for the physics-informed and physics-uninformed GRU models. The main aim is to investigate the impact of physics-informed training on the performance of hyper-parameter optimization. The hyperbolic tangent activation function is used in the hidden layers and the number of time steps is taken as 5. The models are trained using the Adam optimizer (Kingma and Ba, 2017).
Table 3.6
Hyper-parameter Optimization Performance through grid search.
Machine learning model Run # Training MSE Test MSE # of neurons in the hidden layers # of hidden layers
Table 3.7
Hyper-parameter Optimization Performance through GP-based Bayesian Optimization.
Machine learning model Run # Training MSE Test MSE # of neurons in the hidden layers # of hidden layers
Table 3.8
Hyper-parameter Optimization Performance through Genetic Algorithm.
Machine learning model Run # Training MSE Test MSE # of neurons in the hidden layers # of hidden layers
Grid search. The objective function for grid search is the 2-fold cross-validation score evaluated by the mean squared error. The search space for the hyper-parameters includes the integers between 1 and 3 for the number of hidden layers and the integers between 15 and 40 for the number of neurons in the hidden layers.
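As a concrete illustration, the exhaustive search can be sketched as below; the objective callable standing in for GRU training and 2-fold cross-validation scoring is a hypothetical placeholder, not the paper's implementation.

```python
from itertools import product

def grid_search(objective, layer_range=range(1, 4), neuron_range=range(15, 41)):
    """Exhaustively evaluate every (layers, neurons) pair in the search space.
    `objective` is assumed to return the 2-fold cross-validation MSE."""
    best_params, best_score = None, float("inf")
    for n_layers, n_neurons in product(layer_range, neuron_range):
        score = objective(n_layers, n_neurons)
        if score < best_score:
            best_params, best_score = (n_layers, n_neurons), score
    return best_params, best_score

# toy objective standing in for training a GRU and scoring it by cross-validation
params, score = grid_search(lambda l, n: (l - 2) ** 2 + (n - 30) ** 2 / 100)
print(params, score)  # -> (2, 30) 0.0
```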
Training and test performances of three runs for each model, along with the hyper-parameter values found by grid search, are reported in Table 3.6.

Comparing the second run of GRU and the first run of PI-GRU, PI-GRU delivers almost the same test mean squared error as GRU but utilizes ten fewer neurons. Moreover, although GRU in the second run uses two more neurons than the second run of PI-GRU, the test mean squared error of PI-GRU is lower than that of GRU. Similarly, when the third runs are compared, PI-GRU gives better test performance with one fewer neuron, even though its training mean squared error is slightly higher. A similar observation can be made by analyzing the first runs, as the training mean squared error of the first run of GRU is the highest among the others.
Gaussian-processes based Bayesian optimization. The objective function and the search space for the hyper-parameters are the same as in the grid search algorithm. The total number of evaluations is set to ten, and the number of initial points sampled before fitting the Gaussian process estimator is set to five. At every iteration, the acquisition function is selected probabilistically among the lower confidence bound, the negative expected improvement and the negative probability of improvement.
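These settings map directly onto a standard Bayesian-optimization call; the sketch below uses scikit-optimize's gp_minimize, whose gp_hedge option selects probabilistically among LCB, EI and PI, with a toy objective in place of the actual GRU cross-validation. The use of scikit-optimize here is an assumption for illustration, not necessarily the paper's tooling.

```python
from skopt import gp_minimize
from skopt.space import Integer

# toy stand-in for training a GRU and returning the cross-validation MSE
def objective(params):
    n_layers, n_neurons = params
    return (n_layers - 2) ** 2 + (n_neurons - 30) ** 2 / 100.0

result = gp_minimize(
    objective,
    [Integer(1, 3, name="n_layers"), Integer(15, 40, name="n_neurons")],
    n_calls=10,           # total number of evaluations, as in the paper
    n_initial_points=5,   # random points before fitting the GP surrogate
    acq_func="gp_hedge",  # chooses probabilistically among LCB, EI and PI
    random_state=0,
)
print("best hyper-parameters:", result.x, "cv MSE:", result.fun)
```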
The training and test performances of three runs for each model, along with the hyper-parameter values found by the Gaussian processes-based Bayesian optimization, are reported in Table 3.7.

When the first, second and third runs are compared, the PI-GRU models deliver better test performances regardless of slight increases or decreases in the training mean squared errors, even though they use an equal number of hidden layers and fewer neurons in the hidden layers, showing the improving impact of physics-informed methods on hyper-parameter optimization.
Genetic algorithm. For the genetic algorithm, the population size is set to 4 and a binary array of size 9 is used as the genetic solution representation (chromosome). The first six genes represent the number of neurons in the hidden layers and the last three genes represent the number of hidden layers. The fitness function for the genetic algorithm is the training mean squared error, and the algorithm terminates when a certain number of generations, here 4, is reached.
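A minimal sketch of the chromosome decoding is given below; the paper specifies the 6/3 gene split, while the exact mapping of the binary values into the search ranges [15, 40] and [1, 3] is an assumption made for illustration.

```python
def decode_chromosome(genes):
    """Decode the 9-gene binary chromosome: the first six genes encode the
    number of neurons, the last three the number of hidden layers."""
    assert len(genes) == 9
    neurons_raw = int("".join(str(g) for g in genes[:6]), 2)  # 0..63
    layers_raw = int("".join(str(g) for g in genes[6:]), 2)   # 0..7
    # assumed clipping into the search ranges [1, 3] and [15, 40]
    return 1 + layers_raw % 3, 15 + neurons_raw % 26

# example: decoding one member of the population of 4
print(decode_chromosome([1, 0, 1, 1, 0, 1, 0, 1, 0]))  # -> (3, 34)
```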
The training and test performances of three runs for each model, along with the hyper-parameter values found by the genetic algorithm, are reported in Table 3.8.

When the runs with similar training mean squared errors are compared, the test mean squared errors are lower for the PI-GRU models. For example, the second run of GRU and the first run of PI-GRU deliver exactly the same training mean squared errors, but the test mean squared error of PI-GRU is lower even though it uses considerably fewer neurons in the hidden layers. A similar observation can be made by comparing the third run of GRU and the second run of PI-GRU, which have an equal number of neurons in their hidden layers; although GRU has one more hidden layer, it brings about a higher test mean squared error.

Upon examining the first runs, even though the same numbers of hidden layers and neurons are used, PI-GRU delivers better training and test performances. The training mean squared error of GRU is significantly higher compared with the other runs; accordingly, it can be said that the first run of GRU gives an underfit model. Similarly, the second run of GRU and the third run of PI-GRU use the same numbers of hidden layers and neurons; however, there is a significant difference in the test performances, with PI-GRU delivering the least test mean squared error.

From all the results obtained after hyper-parameter optimization through grid search, Gaussian-processes based Bayesian optimization and the genetic algorithm, we conclude that physics-informed bi-objective training enables more robust data-driven dynamic modeling for hyper-parameter optimization in this study.
4. Conclusion

Physics-informed neural networks show significant advantages over purely data-driven black-box models for the two case studies presented for the dynamic modeling of process systems in this study. Two different physics-informed training approaches are considered. In the first one, the loss function includes the error between the left-hand side of the discretized form of an ordinary differential equation evaluated with the measured values and the right-hand side evaluated with the predicted values. In the second one, missing or complex dynamics are modeled through data-driven layers, while the well-known dynamics are embedded as physical knowledge in a hybrid recurrent neural network cell performing the integration.
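For concreteness, the first approach's bi-objective loss can be sketched as below; the forward-Euler residual form, the f_phys callable and the weighting factor lam are illustrative assumptions rather than the paper's exact formulation.

```python
import tensorflow as tf

def bi_objective_loss(y_true, y_pred, x, f_phys, dt, lam=1.0):
    """Data mismatch plus the residual of the discretized ODE
    y_t = y_{t-1} + dt * f(x_t, y_{t-1}) along each trajectory."""
    data_loss = tf.reduce_mean(tf.square(y_true - y_pred))
    # left-hand side from the measured values, right-hand side from predictions
    lhs = y_true[:, 1:] - y_true[:, :-1]
    rhs = dt * f_phys(x[:, 1:], y_pred[:, :-1])
    physics_loss = tf.reduce_mean(tf.square(lhs - rhs))
    return data_loss + lam * physics_loss
```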
The impact of physics-informed neural networks is investigated on a semi-batch reactor and a wastewater treatment unit, which are dynamic process systems governed by ordinary differential equations. Other machine learning algorithms such as feed-forward artificial neural networks, linear regression and support vector regression are also used to model the semi-batch reactor and the wastewater treatment unit; however, recurrent neural networks deliver considerably better performances since they can capture the sequential information. Two different scenarios are investigated for the semi-batch reactor; the training data of the first scenario is taken as the test data in the second scenario. In both scenarios, the hybrid recurrent neural network gives the minimum test error and predicts the test trend better than the other networks. For the wastewater treatment unit, the hybrid RNN delivers the best test performance, and the second-best performance is obtained by the physics-informed GRU. Physics-informed training improves the test performance compared to the physics-uninformed models in most cases, regardless of whether the training error increases or decreases. Finally, hyper-parameter optimization is performed for the physics-uninformed and physics-informed GRU models of the wastewater treatment unit, and it is observed that similar test errors can be obtained with fewer neurons when physics-informed training is used.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., Devin, M., Ghemawat, S., Irving, G., Isard, M., Kudlur, M., Levenberg, J., Monga, R., Moore, S., Murray, D.G., Steiner, B., Tucker, P., Vasudevan, V., Warden, P., Zheng, X., 2016. TensorFlow: a system for large-scale machine learning. In: OSDI'16: Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation.
Alex, J., Benedetti, L., Copp, J., Gernaey, K.V., Jeppsson, U., Nopens, I., Pons, M.N., Steyer, J.P., Vanrolleghem, P., 2008. Benchmark Simulation Model No. 1 (BSM1). Report by the IWA Taskgroup on Benchmarking of Control Strategies for WWTPs.
Alibrahim, H., Ludwig, S.A., 2021. Hyperparameter optimization: comparing genetic algorithm against grid search and Bayesian optimization. In: Proceedings of the IEEE Congress on Evolutionary Computation, CEC 2021, pp. 1551–1559. https://doi.org/10.1109/CEC45853.2021.9504761.
Andersson, J.A.E., Gillis, J., Horn, G., Rawlings, J.B., Diehl, M., 2019. CasADi: a software framework for nonlinear optimization and optimal control. Math. Program. Comput. 11 (1), 1–36. https://doi.org/10.1007/s12532-018-0139-4.
Asrav, T., Koksal, E.S., Esenboga, E.E., Cosgun, A., Kusoglu, G., Aydin, E., 2023. Physics-informed neural network based modeling of an industrial wastewater treatment unit. In: Proceedings of the European Symposium on Computer Aided Process Engineering. Accepted.
Bergstra, J., Bardenet, R., Bengio, Y., Kegl, B., 2014. Algorithms for hyper-parameter optimization. In: Proceedings of the Neural Information Processing Systems Conference.
Cho, K., van Merrienboer, B., Bahdanau, D., Bengio, Y., 2014. On the properties of neural machine translation: encoder-decoder approaches. https://doi.org/10.48550/arXiv.1409.1259.
Chollet, F., 2015. Keras.
Chung, J., Gulcehre, C., Cho, K., Bengio, Y., 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. https://doi.org/10.48550/arXiv.1412.3555.
Dourado, A., Viana, F.A.C., 2019. Physics-informed neural networks for corrosion-fatigue prognosis. https://doi.org/10.36001/phmconf.2019.v11i1.814.
Goldberg, D.E., 1989. Genetic Algorithms in Search, Optimization and Machine Learning.
Gorgolis, N., Hatzilygeroudis, I., Istenes, Z., Gyenne, L.G., 2019. Hyperparameter optimization of LSTM network models through genetic algorithm. In: Proceedings of the 10th International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1–4. https://doi.org/10.1109/IISA.2019.8900675.
Guo, H., Jeong, K., Lim, J., Jo, J., Kim, Y.M., Park, J.P., Kim, J.H., Cho, K.H., 2015. Prediction of effluent concentration in a wastewater treatment plant using machine learning models. J. Environ. Sci. (China) 32, 90–101. https://doi.org/10.1016/j.jes.2015.01.007.
Haghighat, E., Juanes, R., 2021. SciANN: a Keras/TensorFlow wrapper for scientific computations and physics-informed deep learning using artificial neural networks. Comput. Methods Appl. Mech. Eng. 373. https://doi.org/10.1016/j.cma.2020.113552.
Hansen, L.D., Bjerregaard, M.S., Durdevic, P., 2022. Modeling phosphorous dynamics in a wastewater treatment process using Bayesian optimized LSTM. Comput. Chem. Eng. 160. https://doi.org/10.1016/j.compchemeng.2022.107738.
Henze, M., Grady Jr., C.P.L., Gujer, W., Marais, G.v.R., Matsuo, T., 1987. Activated Sludge Model No. 1. IAWQ Scientific and Technical Report No. 1, IAWQ, London, UK.
Hochreiter, S., Schmidhuber, J., 1997. Long short-term memory. Neural Comput. 9, 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735.
Kingma, D.P., Ba, J., 2017. Adam: a method for stochastic optimization. https://doi.org/10.48550/arXiv.1412.6980.
Luo, G., 2016. A review of automatic selection methods for machine learning algorithms and hyper-parameter values. Netw. Model. Anal. Health Inform. Bioinform. 5, 18. https://doi.org/10.1007/s13721-016-0125-6.
Merkelbach, K., Schweidtmann, A.M., Müller, Y., Schwoebel, P., Mhamdi, A., Mitsos, A., Schuppert, A., Mrziglod, T., Schneckener, S., 2022. HybridML: open source platform for hybrid modeling. Comput. Chem. Eng. 160. https://doi.org/10.1016/j.compchemeng.2022.107736.
Nascimento, R.G., Viana, F.A.C., 2019. Fleet prognosis with physics-informed recurrent neural networks. https://doi.org/10.48550/arXiv.1901.05512.
Patel, R.S., Bhartiya, S., Gudi, R.D., 2022. Physics constrained learning in neural network based modeling. IFAC-PapersOnLine 55 (7), 79–85. https://doi.org/10.1016/j.ifacol.2022.07.425.
Pisa, I., Santin, I., Vicario, J.L., Morell, A., 2018. A recurrent neural network for wastewater treatment plant effluents' prediction. In: Actas de las XXXIX Jornadas de Automática.
Quaghebeur, W., Torfs, E., De Baets, B., Nopens, I., 2022. Hybrid differential equations: integrating mechanistic and data-driven techniques for modelling of water systems. Water Res. 213. https://doi.org/10.1016/j.watres.2022.118166.
Raissi, M., Perdikaris, P., Karniadakis, G.E., 2019. Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378, 686–707. https://doi.org/10.1016/j.jcp.2018.10.045.
Snoek, J., Larochelle, H., Adams, R.P., 2012. Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems 25.
Subraveti, S.G., Li, Z., Prasad, V., Rajendran, A., 2022. Physics-based neural networks for simulation and synthesis of cyclic adsorption processes. Ind. Eng. Chem. Res. 61 (11), 4095–4113. https://doi.org/10.1021/acs.iecr.1c04731.
Thebelt, A., Wiebe, J., Kronqvist, J., Tsay, C., Misener, R., 2022. Maximizing information from chemical engineering data sets: applications to machine learning. Chem. Eng. Sci. 252. https://doi.org/10.1016/j.ces.2022.117469.
Vanhooren, H., Nguyen, K., 1996. Development of a Simulation Protocol for Evaluation of Respirometry-Based Control Strategies. Report, University of Gent and University of Ottawa.
Viana, F.A.C., Nascimento, R.G., Dourado, A., Yucesan, Y.A., 2021. Estimating model inadequacy in ordinary differential equations with physics-informed neural networks. Comput. Struct. 245. https://doi.org/10.1016/j.compstruc.2020.106458.
von Stosch, M., Oliveira, R., Peres, J., Feyo de Azevedo, S., 2014. Hybrid semi-parametric modeling in process systems engineering: past, present and future. Comput. Chem. Eng. 60, 86–101. https://doi.org/10.1016/j.compchemeng.2013.08.008.
Xiao, T., Wu, Z., Christofides, P.D., Armaou, A., Ni, D., 2022. Recurrent neural-network-based model predictive control of a plasma etch process. Ind. Eng. Chem. Res. 61 (1), 638–652. https://doi.org/10.1021/acs.iecr.1c04251.
Yu, T., Zhu, H., 2020. Hyper-parameter optimization: a review of algorithms and applications. https://doi.org/10.48550/arXiv.2003.05689.