Machine Learning For Dynamic Process Systems
Machine Learning For Dynamic Process Systems
2024 Edition
Ankur Kumar
Jesus Flores-Cerrillo
`
Dedicated to our spouses, family, friends, motherland, and all the data-science enthusiasts
`
www.MLforPSE.com
All rights reserved. No part of this book may be reproduced or transmitted in any form or in
any manner without the prior written permission of the authors.
.
Every effort has been made in the preparation of this book to ensure the accuracy
of the information presented and obtain permissions for usage of copyrighted materials.
However, the authors make no warranties, expressed or implied, regarding errors or
omissions and assume no legal liability or responsibility for loss or damage resulting from
the use of information contained in this book.
Series Introduction
In the 21st century, data science has become an integral part of the work culture at every
manufacturing industry and process industry is no exception to this modern phenomenon.
From predictive maintenance to process monitoring, fault diagnosis to advanced process
control, machine learning-based solutions are being used to achieve higher process reliability
and efficiency. However, few books are available that adequately cater to the needs of
budding process data scientists. The scant available resources include: 1) generic data
science books that fail to account for the specific characteristics and needs of process plants
2) process domain-specific books with rigorous and verbose treatment of underlying
mathematical details that become too theoretical for industrial practitioners. Understandably,
this leaves a lot to be desired. Books are sought that have process systems in the backdrop,
stress application aspects, and provide a guided tour of ML techniques that have proven
useful in process industry. This series ‘Machine Learning for Process Industry’ addresses
this gap to reduce the barrier-to-entry for those new to process data science.
The first book of the series ‘Machine Learning in Python for Process Systems
Engineering’ covers the basic foundations of machine learning and provides an overview of
broad spectrum of ML methods primarily suited for static systems. Step-by-step guidance on
building ML solutions for process monitoring, soft sensing, predictive maintenance, etc. are
provided using real process datasets. Aspects relevant to process systems such as modeling
correlated variables via PCA/PLS, handling outliers in noisy multidimensional dataset,
controlling processes using reinforcement learning, etc. are covered. This second book of
the series is focused on dynamic systems and provides a guided tour along the wide range
of available dynamic modeling choices. Emphasis is paid to both the classical methods (ARX,
CVA, ARMAX, OE, etc.) and modern neural network methods. Applications on time series
analysis, noise modeling, system identification, and process fault detection are illustrated
with examples. Future books of the series will continue to focus on other aspects and needs
of process industry. It is hoped that these books can help process data scientists find
innovative ML solutions to the real-world problems faced by the process industry.
Books of the series will be useful to practicing process engineers looking to ‘pick up’ machine
learning as well as data scientists looking to understand the needs and characteristics of
process systems. With the focus on practical guidelines and real industrial case studies, we
hope that these books lead to wider spread of data science in the process industry.
`
Book 1 Book 3
`
Preface
Model predictive control (MPC) and real-time optimization (RTO) are among the most critical
technologies that drive the process industry. Any experienced process engineer would vouch
that the success of MPCs and RTOs depends heavily on the accuracy of the underlying
process models. Similarly, the key requirement for other important process technologies like
dynamic data reconciliation, process monitoring, etc. is availability of accurate process
models. Modeling efforts during commissioning of these tools can easily consume up to 90%
of the project cost and time. Modern data revolution, whereby abundant amount of process
data are easily available, and practical difficulties in building high-fidelity first-principles
dynamic models for complex industrial processes have popularized the usage of empirical
data-driven/machine learning (ML) models. Building dynamic models using process data is
called system identification (SysID) and it’s a very mature field with extensive literature.
However, it is also very easy for a process data scientist (PDS) new to this field to get
overwhelmed with the SysID mathematics and ‘drowned’ in the sea of SysID terminology.
Several noteworthy ML books have been written on time-series analysis; however, in process
industry, input-output models are of greater import. Unfortunately, there aren’t many books
that cater to the needs of modern PDSs interested in dynamic process modeling (DPM)
without weighing them down with too much mathematical details and therein lies our
motivation for authoring this book: specifically, a reader-friendly and easy to understand book
that provides a comprehensive coverage of ML techniques that have proven useful for
building dynamic process models with focus on practical implementations.
It would be clear to you by now that this book is designed to teach working process engineers
and budding PDSs about DPM. While doing so, this book attempts to avoid a pitfall that
several generic ML books fall into: overemphasis on ‘modern’ and complex ML techniques
such as artificial neural networks (ANNs) and undertreatment of classical DPM methods.
Classical techniques like FIR and ARX still dominate the dynamic modeling solutions offered
by commercial vendors of industrial solutions and are no less ‘machine-learning’ than the
ANNs. These two along with other classical techniques (such as OE, ARIMAX, CVA) predate
the ANN-craze era and have stood the test of time in providing equal (if not superior)
performance compared to ANNs. Correspondingly, along with modern ML techniques like
RNNs, considerable portion of the book is devoted to classical dynamic models with the
emphasis on understanding the implications of modeling decisions such as the impact of
implicit noise model assumption with ARX model, the implication of differencing data, etc.
`
Guided by our own experience from building process models for varied industrial applications
over the past several years, this book covers a curated set of ML techniques that have proven
useful for DPM. The broad objectives of the book can be summarized as follows:
• reduce barrier-to-entry for those new to the field of SysID
• provide working-level knowledge of SysID techniques to the readers
• enable readers to make judicious selection of a SysID technique appropriate for their
problems through intuitive understanding of the advantages and drawbacks of
different methods
• provide step-by-step guidance for developing tools for soft sensing, process
monitoring, predictive maintenance, etc. using SysID methods
This book adopts a tutorial-style approach. The focus is on guidelines and practical
illustrations with a delicate balance between theory and conceptual insights. Hands-on-
learning is emphasized and therefore detailed code examples with industrial-scale datasets
are provided to concretize the implementation details. A deliberate attempt is made to not
weigh readers down with mathematical details, but rather use it as a vehicle for better
conceptual understanding. Complete code implementations have been provided in the
GitHub repository. Although most of the existing literature on SysID use MATLAB as the
programming environment, we have adopted Python in this book due to its immense
popularity among the broad ML community. Several Python libraries are now available which
makes DPM using Python convenient.
We are quite confident that this text will enable its readers to build dynamic models for
challenging problems with confidence. We wish them the best of luck in their career.
Pre-requisites
No prior experience with machine learning or Python is needed. Undergraduate-level
knowledge of basic linear algebra and calculus is assumed.
Book organization
Under the broad theme of ML for process systems engineering, this book is an extension of
the first book of the series (which dealt with fundamentals of ML and its varied applications
in process industry); however, it can also be used as a standalone text. To give due treatment
to various aspects of SysID, the book has been divided into three parts. Part 1 of the book
provides a perspective on the importance of ML for dynamic process modeling and lays down
the basic foundations of ML-DPM (machine learning for dynamic process modeling). Part 2
provides in-detail presentation of classical ML techniques and has been written keeping in
mind the different modeling requirements and process characteristics that determine a
model’s suitability for a problem at hand. These include, amongst others, presence of
multiple correlated outputs, process nonlinearity, need for low model bias, need to model
disturbance signal accurately, etc. Part 3 is focused on artificial neural networks and deep
learning. While deep learning is the current buzzword in ML community, we would like to
caution the reader against the temptation to deploy a deep learning model for every problem
at hand. For example, the models covered in Part 2 still dominate the portfolio of models
used in industrial controllers and can often provide comparable (or even superior)
performance compared to ANNs with relatively less hassle.
Symbol notation
The following notation has been adopted in the book for representing different types of
variables:
- lower-case letters refer to vectors (𝑥 ∈ ℝ𝑚×1 ) and upper-case letters denote matrices
(𝑋 ∈ ℝ𝑛×𝑚 )
- individual element of a vector and a matrix are denoted as 𝑥𝑗 and 𝑥𝑖𝑗 , respectively.
- any ith vector in a dataset gets represented as subscripted lower-case letter (e.g., 𝑥𝑖 ∈
ℝ𝑚×1 ). Its distinction from an individual element 𝑥𝑗 would be clear from the
corresponding context.
`
Table of Contents
1.1. Process Systems Engineering, Dynamic Process Modeling, and Machine Learning
-- components of a dynamic process model
1.2. ML-DPM Workflow
1.3. Taxonomy of ML-based Dynamic Models
1.4. Applications of DPM in Process Industry
Part 1
Introduction & Fundamentals
Chapter 1
Machine Learning and Dynamic Process
Modeling: An Introduction
P
rocess industry operations are dynamic in nature. In complex process plants such as
oil refineries, 1000s of process measurements may be recorded every second to
capture crucial process trends. Plant engineers often employ dynamic process models
to predict future values of process variables in applications such as process control, process
monitoring, etc. Machine learning (ML) provides a convenient mechanism to bring together
the dynamic process modeling (DPM) needs and the large data resources available in modern
process plants.
This chapter provides an overview of what ML has to offer for DPM. This chapter also
addresses a dichotomy between ML community and DPM community. While the former
generally tends to claim that ML has been reduced to mere execution of ‘model-fitting’ actions,
the later vouches for the need of ‘experts’ to build ‘successful’ models. We will attempt to
reconcile these two differing viewpoints.
Overall, this chapter provides a whirlwind tour of how the power of machine learning is
harnessed for dynamic process modeling. Specifically, the following topics are covered
• Introduction to dynamic process modeling and the need for machine learning
• Typical workflow in a ML-based DPM (ML-DPM) project
• Taxonomy of popular ML-DPM methods and models
• Applications of DPM in process industry
Let’s now tighten our seat-belts as we embark upon this exciting journey of de-mystifying
machine learning for dynamic process modeling.
`
Process industry is a parent term used to refer to industries like petrochemical, chemical,
power, paper, cement, pharmaceutical, etc. These industries use processing plants to
manufacture intermediate or final consumer products. As emphasized in Figure 1.1, the prime
concerns of the management of these plants include, amongst others, optimal design and
operations, high reliability through proactive process monitoring, quality control, and data
reconciliation. All these tasks fall under the ambit of process systems engineering (PSE).
Figure 1.1: Overview of industries that constitute process industry and the tasks process
systems engineers perform
The common theme among the PSE tasks is that they all rely on reliable dynamic
mathematical process models that can accurately predict the future state of the process using
current and past process measurements. Therefore, DPM is one of the defining skills of
process systems engineers. Process industry has historically utilized both first
principles/phenomenological and empirical/data-based models for DPM. While the former
models provide higher fidelity, the later models are easier to build for complex systems. The
immense rise in popularity of machine learning/data science in recent years and exponential
increase in sensor measurements collected at plants have led to renewed interest in ML-
based DPM. Several classical and ‘modern’ ML methods for DPM are at a process data
scientist’s disposal. The rest of the book will take you on a whirlwind tour of these methods.
Let’s now jump straight into the nitty-gritty of ML-DPM.
`
Figure 1.2: (a) Process system (and its dynamic model) with input, output, and noise signals (b) A pH
neutralization process broken down into its dynamic components
The impact of measurement and process noise can be clubbed together and explained via a
stochastic model as shown in the figure above. The resulting disturbance signal summarizes
all the uncertain characteristics of the process. The stochastic and deterministic models can
be estimated simultaneously as well as separately.
where α and β are estimable model parameters and the disturbance variable, v(k),
summarizes all the uncertainties. Here, the value of the output at the kth time instant is
predicated upon the past values of both output and input, and the disturbance. In the later
chapters, we will study how to characterize v(k) using stochastic models.
⋯ 𝑦𝑠𝑡𝑒𝑎𝑑𝑦 ⋯
transition
dynamics
Figure 1.3: Representative dynamic changes in a SISO process output upon a step change
in input
`
➢ In static model, we are only concerned with being able to predict ysteady
as the impact of the step change in u.
➢ On the other hand, in dynamic model, we care about the transition
dynamics as well.
➢ It is obvious that a dynamic model estimation is a more demanding
problem and the complexity only increases for multivariable systems
with stochastic components.
𝑑𝑦
= 𝑓(𝑦(𝑡), 𝑢(𝑡))
𝑑𝑡
In an alternate scenario, if the input variables are constant between samples, then
an exact analytical discrete-time representation can be derived for linear processes.
`
Figure 1.4 shows the typical steps involved in system identification. As you can see, SysID is
more than just curve fitting. Overall, there are five broad tasks: data collection, exploratory
data analysis, data pre-treatment, model identification, and model validation. Although we will
study each step in detail in Chapter 4, let’s take a quick overview:
• Data pre-treatment: This step consists of several activities that are designed to
remove the portions of training data that are unimportant (or even detrimental) to model
identification. It may entail removal of outliers and noise, removal of trends, etc.
Alternatively, training data can also be massaged to manipulate the model’s accuracy
as suited for the end purpose of the model. For example, if the model is to be used for
control purpose, then the training data may be pre-filtered to bolster model’s accuracy
for high frequency signals. Overall, the generic guideline is that you have better
chances of a successful SysID with a better conditioned dataset.
• Model training: Model training is the most critical step in SysID and entails a few sub-
steps. First, a choice must be made on the type of disturbance model and deterministic
model. The end use of the model, the available a priori system knowledge, and the
desired degree of modeling complexity dictate these selections. For example, if the
model is to be used for simulation purposes, then OE1 structure may be preferred over
ARX; if process disturbance has different dynamics compared to process input, then
ARMAX will be preferable; if computational convenience is sought, then ARX is usually
1
We will cover these model structures in the upcoming chapters
`
the first choice and is preferred over other complex structures such as recurrent neural
networks. Model structure selection is followed by model parameter estimation. This
sub-step is mostly ‘hands-off’ due to the availability of several specialized libraries that
perform parameter estimation.
Representative illustrations
Time plots
detrending
ARX modeling
Figure 1.4: Steps (with sample sub-steps) involved in a typical ML-based dynamic modeling
`
• Model validation: Once a model has been fitted, the next step is to check if the model
truly represents the underlying process. Several techniques are at a modeler’s
disposal. For example, you could check the model’s performance on a dataset that has
not been used during parameter estimation, plot the modeling errors to look for leftover
patterns, or check if the model agrees with the a priori information about the system.
Often, your first model will fail the validation procedure. An expert modeler can use the
validation results to infer the reasons for the failure. Some examples of the cause could
be
➢ Training data was not ‘rich’ enough
➢ Training data was not pre-treated adequately
➢ Choice of model structure was wrong
➢ Estimation algorithm did not converge
Once diagnosed, appropriate corrections are made, and the iterative procedure
continues.
We hope that you get the understanding that SysID is more than just black-box application of
modeling algorithm for estimation of model parameters. There are several practical aspects
that require active participation of the modeler during the SysID process. Without
exaggeration, we can say that SysID is an art, and the rest of the book will help you obtain
the necessary skills to become the SysID artist!
2
Models where time variable does not appear as an explicit variable.
`
Figure 1.5: Some popular SysID methods and models covered in this book
There are several noteworthy points in Figure 1.5. First, the apparent domination of linear
models: you may have expected nonlinear models to be preferable for modeling complex
industrial process systems, however, as it turns out, linear models are often justified for DPM.
The justification stems from the fact that industrial processes often operate around an optimal
point making the linear models pretty good approximation for usage in process control and
process monitoring applications. If the end use of the model is process simulation or design
where the system response over wide range of input variables is of interest, then nonlinear
models can prove to be more suitable. Within linear category, different modeling options exist
to cater to process systems with different characteristics, viz, multivariable outputs, presence
of correlated noise, disturbances sharing dynamics with inputs, presence of measurement
noise only, presence of drifts or non-stationarities, etc.
The classification at the root of linear model sub-tree is based on the methodology used for
model fitting. While PEM methods employ minimization of prediction errors, subspace
methods are based on matrix algebra and do not involve any optimization for parameter
optimization.3 Do not worry if these terms do not make much sense right now; they will soon
become ‘obvious’ to you. If not trained about the nuanced differences between these different
models and methods, you may not have much idea at the outset about which model would be
the best one for your system. This book will help you gain adequate conceptual understanding
to become adept at making the right choice and obtaining your coveted model quickly.
3
Another distinction is that PEM is used to obtain input-output models while SIM is used to obtain state-space models.
`
Types of models
In Eq. 1 we saw one way of representing a dynamic process model. The models from Figure
1.5 can be used to generate other representations of your dynamic systems; the figure below
shows the different forms/types of models that we will learn to derive in this book using
machine learning. We will also understand the pros and cons of these different model forms.
Figure 1.6: Different forms of dynamic process model (that we will learn to generate in this
book), their defining characteristics, and corresponding representative examples
`
u Process model:
y
𝑥(𝑘 + 1) + 𝑎𝑥(𝑘) = 𝑏𝑢(𝑘)
If not careful, you may just ignore the presence of measurement noise and attempt to fit
the following input-output model to estimate the model parameters a and b.
Data file ‘simpleProcess.csv’ contains 1000 samples of u and y obtained from the true
process. Below are the parameter estimates we get using this data,
Two things are striking here: the estimates are grossly inaccurate, and the parameter
error estimates seem to suggest high confidence in these wrong values! What went
wrong in our approach? The reason is that the SNR (signal-to-noise ratio) value is not
very high (data was generated with SNR ~ 10) and the fitted ARX model is wrong input-
output form of the true process. The correct form would be the following
In Figure 1.1, we saw some of the applications of dynamic models in process industry. To
provide further perspectives into how a typical plant operator or management may use these
varied applications, Figure 1.7 juxtaposes DPM applications alongside the typical decision-
making hierarchy in a process plant. Every step of plant operation is now-a-days heavily
reliant on DPM-based tools and machine learning has proven to be a useful vehicle for quickly
building these tools. You can use the models from Figure 1.5 for building these tools or if you
use commercial vendor solutions, you can find them employing these models in their products.
For example, in the process control field, FIR models have been the bedrock of industrial MPC
controllers. In the last few years, commercial vendors have inducted CVA models in their
offering due to the advantages provided by subspace models. The latest offering by Aspen,
DMC34, incorporates neural networks for MPC and inferential modeling. ARX, BJ models are
also used for industrial MPC5.
4
https://www.aspentech.com/en/products/msc/aspen-dmc3
5
Qin and Badgwell, A survey of industrial model predictive control technology. Control Engineering Practice, 2003
`
This concludes our quick attempt to establish the connection between process industry,
dynamic process modeling, and machine learning. It must now be obvious to you that modern
process industry relies heavily on dynamic process modeling to achieve its objectives of
reducing maintenance costs, and increasing productivity, reliability, safety, and product
quality. ML-based DPM helps build tools quickly to facilitate achieving these objectives.
This introductory chapter has also cautioned against blind application of ‘convenient’ ML
models for dynamic modeling. Your process data will throw several questions at you at each
stage of system identification. No straightforward answers exist to these questions and only
some time-tested guiding principles are available. While the rest of the book will familiarize
you with these principles, the onus still lies on you to use your process insights and SysID
`
understanding to make the right modeling choices. And yes, remember the age-old advice6,
“All models are wrong, but some are useful.”
Summary
This chapter impressed upon you the importance of DPM in process industry and the role
machine learning plays in it. We familiarized ourselves with the typical SysID workflow,
explored its different tasks, and looked at different ML models available at our disposal for
model identification. We also explored the application areas in process industry where ML has
proved useful. In the next chapter we will take the first step and learn about the environment
we will use to execute our Python scripts containing SysID code.
6
Attributed to the famous statistician George E. P. Box. It basically implies that your (SysID) model will seldom exactly
represent the real process. However, it can be close enough to be useful for practical purposes.
`
Chapter 2
The Scripting Environment
I
n the previous chapter we studied the various aspects of system identification and learned
about its different uses in process industry. In this chapter we will quickly familiarize
ourselves with the Python language and the scripting environment that we will use to write
ML codes, execute them, and see results. This chapter won’t make you an expert in Python
but will give you enough understanding of the language to get you started and help understand
the several in-chapter code implementations in the upcoming chapters. If you already know
the basics of Python, have a preferred code editor, and know the general structure of a typical
SysID script, then you can skip to Chapter 3.
If you skim through the system identification literature, you will find almost exclusive usage of
MATLAB software as the computing environment. This is mostly attributed to the System
Identification toolbox, a very powerful MATLAB toolbox developed by Prof. Lennart Jung (a
legend in system identification). Unfortunately, this tool not freely available, and MATLAB is
not yet as popular as Python among the ML community. Luckily, several good souls in the
Python community have developed specialized libraries for all aspects of SysID. Most of the
popular SysID models can now be generated using off-the-shelf Python libraries. Considering
the dominance of Python for deep learning, Python becomes an excellent choice for SysID
scripting.
In the above context, we will cover the following topics to familiarize you to Python
• Introduction to Python language
• Introduction to Spyder and Jupyter, two popular code editors
• Overview of Python data structures and scientific computing libraries
• Python libraries for system identification
• Overview of a typical ML-DPM/SysID script
`
Python is a high-level general-purpose computer programming language that can be used for
application development and scientific computing. If you have used other computer languages
like Visual Basic, C#, C++, Java, then you would understand the fact that Python is an
interpreted and dynamic language. If not, then think of Python as just another name in the list
of computer languages. What is more important is that Python offers several features that
sets it apart from the rest of the pack making it the most preferred language for machine
learning. Figure 2.1 lists some of these features. Python provides all tools to conveniently
carry out all steps of an ML-DPM project, namely, data collection, data exploration, data pre-
processing, model ID, visualization, and solution deployment to end-users. In addition, freely
available tools make writing Python code very easy7.
Installing Python
One can download official and latest version of Python from the python.com website.
However, the most convenient way to install and use Python is to install Anaconda
(www.anaconda.com) which is an open-source distribution of Python. Along with the core
Python, Anaconda installs a lot of other useful packages. Anaconda comes with a GUI called
Anaconda Navigator (Figure 2.2) from where you can launch several other tools.
7
Most of the content of this chapter is similar to that in Chapter 2 of the book ‘Machine Learning in Python for Process
Systems Engineering’ and have been re-produced with appropriate changes to maintain the standalone nature of this
book.
`
Chapter 3
Exploratory Analysis and Visualization of
Dynamic Dataset: Graphical Tools
I
n Chapter 1 we had remarked that system identification is an art. A part of the art lies in
making right inferences from exploratory analysis of data and model residuals via visual
plots. Yeah, you read that right: even with all the fancy algorithms at our disposal,
sometimes visual inspection of data is still the best tool for a quick assessment of dataset and
model validity. Visual plotting can often provide crucial clues about model structure and model
troubleshooting, and forms an important component of deductive phase of SysID. These plots
will be used repeatedly in the in-chapter illustrations in this book and therefore we thought it
would be best to introduce the concepts behind these graphical tools right away.
Visual plots can range from simple time plots to advanced spectral density plots. These may
help us make some quick educated judgement about data stationarity, deterministic vs
stochastic trends, presence of colored noise, whiteness of model residuals, etc. Don’t worry
if you don’t understand these dynamic modeling-specific jargons for now. We will focus on
generation and conceptual understanding of these plots in this chapter and learn inference-
making using these plots in the later chapters.
In system identification, careful scrutiny of data plots forms a critical component right from the
initial step of exploratory data analysis (EDA) to the last step of model validation. Consider
plots a and b in Figures 3.1 which are time plots of output signals in two different scenarios.
Just a visual inspection makes it apparent that the raw output signals contain trends. One can
further infer that while plot a exhibits a deterministic trend, plot b exhibits a stochastic trend8.
Such assessment during EDA can help a process modeler select appropriate class of models
for their systems.
(a) (b)
(c) (d)
Consider further the plots 3.1c and d. Here, the shown noisy signals could correspond to the
disturbance signals or the model residuals (difference between observations and model
predictions). Here, unlike the previous example, the time plots don’t provide any quick obvious
assessment. However, the derived visualizations, autocorrelation and spectral density plots
make the distinction between the two scenarios very evident. These plots confirm that the
shown signals have white noise and colored noise properties, respectively. If these were
model residuals, then non-zero autocorrelations in plot 3.1d immediately suggest inadequate
modeling. Moreover, the spectral density plot looks qualitatively similar to that of an
autoregressive (AR) process and therefore, an AR model could be attempted for the colored
8
We will see the details on the distinction between these two types of trends in Chapter 4.
`
Chapter 4
Machine Learning-based Dynamic Modeling:
Workflow and Best Practices
I
n Figure 1.4 in Chapter 1 we looked at a typical workflow for system identification. What is
noteworthy is that while model ID is only a part of this workflow, very often too much focus
goes into application of model ID to estimate model parameters at the expense of other
aspects of SysID. Dumping raw data into model ID module will invariably generate
unsatisfactory model if data is not pre-treated appropriately. Additionally, you will seldom be
right in your modeling choices in the very first attempt. The take-home message is that
knowledge about how to prepare raw data to enhance its information content about the
system, how to assess validity of obtained model, how to analyze modeling results critically,
and how to iterate judiciously is absolutely critical for successful SysID. Fortunately, several
guidelines and best practices have been devised to guide a process modeler at each step of
the SysID workflow. We will learn these guidelines and best practices in this chapter.
We will not cover the best practices associated with generic machine learning workflow.
Concepts like feature extraction, feature engineering, cross-validation, regularization, etc.
have already been covered in detail in our first book of the series. In this chapter our focus
will be on the unique challenges presented by dynamic processes and the specific best
practices to deal with them. Specifically, the following topics are covered
So far in this book we have been consciously emphasizing the difference between model
identification and system identification. While model ID is all about estimating model
parameters, SysID is about ensuring that the model truly represents the underlying system
and stands consistent w.r.t. any prior available knowledge and any modeling assumptions.
Figure 4.1 reproduces the SysID workflow we had seen in Chapter 1 and shows the several
steps that serve to achieve the stated goals. The rest of the chapter will sensitize you about
these varied aspects of SysID and help you gain working-level understanding about how to
successfully execute a SysID project.
Figure 4.1: Typical steps in a SysID workflow that we will cover in this chapter
The first step of the workflow is data collection. Whether you are using pre-existing historical
data or conducting fresh identification tests, you must ensure that the dataset has enough
‘juice’ in it to enable identification of the desired model. We will soon see some guidelines
around this task. The next step corresponds to sanitizing the data to remove the unwanted
components that can negatively impact model ID. For example, elimination of any slow drift
in the output signals is essential to avoid incorrect model parameters. After the first two steps,
our dataset is ready and we come to model structure selection where the onus is on the
process modeler to choose the right class of process and noise models. We saw in Chapter
1 that an incorrect choice can lead to biased parameters estimates. Your expert domain
knowledge and any prior information on process disturbances can come in quite handy at this
`
Part 2
Classical Machine Learning Methods for Dynamic
Modeling
`
Chapter 5
Time Series Analysis: Concepts and
Applications
A
s alluded to before, time series analysis (TSA) refers to study of time series signals
from processes without exogenous inputs. Before we dive into the relatively more
complex world of input-output modeling, it would be prudent to first get acquainted with
how to model stochastic variations in signals and compactly describe the dependencies
among adjacent observations in a time series. The study of time series is not just for
pedagogical convenience. SysID derives many concepts from time series analysis and
therefore, a strong foundation in modeling time series is important for mastering the art of
SysID.
Time series analysis can help us to characterize I/O model residuals and get clues regarding
further model refinement. If your process has measured disturbance signals (inputs that can’t
be manipulated; for example, ambient temperature), then they can be modeled via TSA to
provide better forecasts and control. Furthermore, TSA can be used to model controlled
process variables and build process monitoring tools. In this chapter, we will work through
case-studies illustrating these applications.
Apart from setting a strong foundation of digital signal processing, this chapter will also
introduce the backshift notation and transfer function operator. These constructs are quite
useful for compact description and algebraic manipulation of the difference equations.
Specifically, the following topics are covered
• Introduction to AR, MA, and ARMA models for stationary signals
• Using ACF and PACF for model structure selection
• Monitoring controlled process variable in a CSTR using ARMA models
• Introduction to ARIMA models for nonstationary signals
• Forecasting measured signals using ARIMA models
`
Time series analysis refers to the study and modeling of dependencies among sequential data
points in time series signals. Several approaches exist for modeling time series. In this
chapter, we will look at four of the most common models as shown in Figure 5.1. While AR,
MA, and ARMA models are used to describe stationary signals, ARIMA is used for
(homogeneous) nonstationary signals. In this chapter, we will understand the differences,
similarities, and the inherent connections between these models.
To further motivate the study of time series analysis, let’s briefly look at two of the use cases
that were mentioned previously. The illustration below shows a CSTR unit that has measured
disturbance signals (feed concentration and temperature). An accurate model for these
signals can help generate accurate forecasts and control the vessel temperature more
effectively. Also, it is easy to show that if the setpoint of the controllers do not change, then
the controlled variables (here, output product temperature and concentration) depend on
process disturbances alone. Therefore, a TSA model for these variables can be built to
eventually look for significant mismatch between measurements and model predictions as an
indicator of process faults.
`
Chapter 6
Input-Output Modeling - Part 1:
Simple Yet Popular Classical Linear Models
T
here is beauty is simplicity - this age-old truth perfectly sums up the reason behind the
popularity of two simple input-output models, FIR and ARX models, that we will learn
about in this chapter. In the age of artificial neural networks, these models still hold on
their own and are an essential part of any PDS’s toolkit. As alluded to before, I/O processes
have inputs signals that can be manipulated to control the output signals. The objective,
therefore, of I/O modeling is to find the deterministic relationship between the input and output
signals, and, optionally, find a suitable model for the stochastic disturbances affecting the
outputs.
FIR model is a non-parametric model and is used in thousands of MPC solutions in process
industry globally. The remarkable success of the FIR-based MPCs is testament to the fact
that simple models can be quite useful representation of complex industrial processes. FIR
models are intuitive and simple enough to be understood by plant operators. However, nothing
is all rosy with anything. FIR models, owing to their flexibility, often lead to overfitting. ARX
models, on the other hand, are parametric models and require much lower number of
parameters. Consequently, the resulting models are ‘smoother’. ARX models are often the
initial choice in any SysID exercise. Nonetheless, ARX models can easily suffer from high
bias errors. Since these two models are so critical, this chapter is devoted to understanding
their various aspects, along with their pros and cons, in detail.
Impulse response of a system is its response to a special impulse input as shown in Figure
6.1. The impulse response model approximates the system output as a linear combination of
several impulse responses due to past inputs. Since the impulse responses usually decay to
zero, it suffices to include only the past few values (say M) of u(k) to approximate y(k). The
value ‘M’ is called the model order and the resulting model is called the finite impulse response
(FIR) model. You will notice that there is no attempt to model stochastic disturbances and a
non-parametric modeling form (there is no compact mathematical structure) is adopted.
Consequently, FIR model is among the simplest SysID model.
Unit Impulse Input
𝑦(𝑘) ≈ σ∞
𝑖=1 𝑔𝑖 𝑢(𝑘 − 𝑖) IIR (Infinite impulse response)
each of these can be treated as
impulse inputs of magnitudes u(k)
𝑦(𝑘) ≈ σ𝑀
𝑖=1 𝑔𝑖 𝑢(𝑘 − 𝑖)
Generic Input
k
Figure 6.1: Impulse response model for a SISO system
Fitting a FIR model entails estimation of the impulse response coefficients. Although a FIR
model’s structure is quite simple, it is very flexible and given sufficiently large M, any type of
complex impulse response can be fitted. It is common to employ model orders of 90 to 120 to
get all the impulse coefficients. However, such large parameters dimensionality often leads to
overfitting (impulse response curves show wiggles or spikes due to output signal corruption
by process disturbances). Apart from model flexibility, another advantage of FIR model is that
an estimate of I/O delay is automatically obtained after model fitting. For a system with delay
d, the first d impulse coefficients will be zero or close to zero. This is in fact a classical method
for delay estimation. FIR models also provide unbiased parameter estimates as the
deterministic part of the model is independently parametrized (recall our discussion from
Chapter 4). These favorable properties of FIR models make them the dominant SysID model
deployed in industrial MPCs9.
9
Darby and Nikolaou, MPC: Current practice and challenges. Control Engineering Practice, 2012
`
To showcase an industrial application of FIR models, we will consider fired heaters used in
petroleum refineries wherein, as shown in the figure below, fuel is combusted to heat feed oil
and vaporize it for subsequent fractionation10. In the SISO version of the system, we will
consider the task of keeping the outlet temperature of the heated oil (TO) in control. Changes
in feed oil temperature, TI, or the feed flow will impact TO. The controller adjusts the flow of
the fuel gas (although indirectly, by manipulating the set-point of the flue-gas PID controller.
The PID controller, not shown in the schematic, adjusts the fuel-gas stream’s valve opening
to meet the specified flow)11. The process controller, however, needs a model to understand
the relationship between flue gas flow set-point (FGSP) and TO.
To obtain the model, we will use data provided in the file IndustrialFiredHeater_SISO.csv
which contains 1000 samples of TO and FGSP obtained from an open-loop simulation of the
system wherein FGSP was excited using a GBN signal and the system was subjected to
disturbances in TI.12 Note that the time interval here is in minutes and not seconds.
10
The reader is referred to https://apmonitor.com/dde/index.php/Main/FiredHeaterSimulation for more details on the
system. At this link, readers can also find the system model that is used to generate the simulation dataset used in this
case-study.
11
Using PID controller’s setpoints as manipulated variables is a very standard approach in modern (MPC) controllers.
12
Online code repository shows how the identification test data was generated
`
Chapter 7
Input-Output Modeling - Part 2:
Handling Process Noise the Right Way
I
n the previous chapter we saw that the FIR and high-order models can provide unbiased
and linear-in-parameter models, but, unfortunately, suffer from high variance errors due to
large number of parameters. Therefore, in this chapter, we will look at other popular model
structures, namely, ARMAX, OE, and Box-Jenkins, that can provide more parsimonious
description of the underlying process and generate unbiased models. These models allow
greater flexibility in noise dynamics and capture the fact that noise may enter (or originate in)
the system at different points which, in turn, determines the structure of the noise transfer
operator. However, there is no free lunch! These models generate non-linear-in-parameter
predictors and therefore the parameter estimation procedures are computationally more
complex. Nonetheless, addition of these models to your ML-DPM toolkit will allow you to
choose model structure specific to the problem at hand.
In the later part of this chapter, we will relax the stationarity conditions on disturbance signals
and look at how to model processes with non-stationary disturbances. We will see how
differencing input and output signals helps in removing disturbance non-stationarities. Using
simulated illustrations, we will try to understand the implications of choosing these model
structures. Specifically, we will study the following topics.
• Introduction to ARMAX and Box-Jenkins models
• Modeling distillation columns using ARMAX models
• Introduction to OE model structure and its comparison to ARX model structure
• Introduction to ARIMAX models
• Modeling gas furnace systems using Box-Jenkins models
`
In Chapter 6, we saw how the ARX model provides a very restrictive treatment of the process
disturbances. We also saw how this shortcoming can lead to significant model bias.
Accordingly, the focus in this chapter is to overcome this shortcoming and look at other
popular alternatives to ARX model, namely, the output error (OE) model, the autoregressive
moving average with eXogenous (ARMAX) inputs model, and the Box-Jenkins (BJ) model.
As shown in Figure 7.1, these parametric models provide a broad spectrum of options
regarding the parametrization of the noise transfer operator (H). For example, while
disturbances are not modeled and only input transfer operator is sought in OE framework,
completely independent parametrization of both G and H operators are allowed in the BJ
model. In the next few sections, we will explore these models’ properties and learn how to use
them judiciously.
v(k) e(k)
[No model]
u(k) 𝐵(𝑞 −1 ) y(k) u(k) 1 y(k)
𝑞 −𝑑 𝑞 −𝑑 𝐵(𝑞 −1 ) OE
𝐴(𝑞 −1 ) 𝐴(𝑞 −1 ) N
o
i
• Disturbance directly perturbs the output • White noise enters early in the process such s
• No attempt is made to model v(k) that it shares dynamics with u(k) e
ARX
m
o
d
e
Box-Jenkins model ARMAX model l
s
𝐶(𝑞 −1 ) e(k) ARM
e(k) 𝐶(𝑞 −1 ) p
AX
𝐷(𝑞 −1 ) v(k) v(k) e
c
u(k) y(k) t
y(k) 1
u(k) −𝑑
𝐵(𝑞 −1 ) 𝑞 −𝑑 𝐵(𝑞 −1 ) r
𝑞 𝐴(𝑞 −1 ) u
𝐴(𝑞 −1 )
m BJ
• Disturbance is modeled as an ARMA • An MA stochastic signal enters early in the [Most general]
stochastic process process and shares dynamics with u(k)
All the models in Figure 7.1 belong to the family of prediction error methods wherein the model
parameters are estimated by minimizing the one-step ahead prediction errors (ε(k)) where,
`
Rest of the Section 7.1 and Section 7.2 not shown in this preview
`
Let’s start by importing the required packages and exploring the provided I/O dataset.
13
The model for the distillation column was taken from ‘Digital control systems: design, identification and
implementation. Springer, 2006’.
`
Chapter 8
State-Space Models:
Efficient Modeling of MIMO Systems
I
n the previous chapter we alluded to the problems related to the classical methodology of
modeling a MIMO system through separate MISO models for each output. In a physical
system, the different output variables seldom behave independently; they often share
common parameters and/or correlated noise. Therefore, identifying all the output models
simultaneously while being cognizant of shared dynamics between them leads to better and
more robust models. This alternative approach to SysID is provided by state-space (SS)
models. State-space models add an intermediate layer of ‘state variables’ between the input
and output variables, wherein, the inputs determine the states, and the outputs are derived
as linear combination of the states. This modeling paradigm leads to parsimonious models
capable of fitting complex processes with both fast and slow dynamics.
In this chapter, we will cover in detail the subspace identification techniques which are utilized
for fitting the SS models. While several popular subspace identification methods (SIMs) such
as N4SID, CVA, MOESP exist, CVA will be our primary focus. Subspace identification offers
several additional advantages over I/O models. SIMs involve only linear algebra for model ID
(no iterative nonlinear optimization schemes are needed). SIMs also provide in-situ
mechanism for automated optimal model order selection. Subspace models are more robust
to noise and provide smooth (step and impulse) response curves. Infact, it is not uncommon
to fit SIM model to I/O data and then derive FIR model using the SIM model for usage in MPC
applications. Due to such superior properties of SIM, they are among the ‘mainstream’ models
provided by advanced process control (APC) solution vendors.
CVA models are also inherently suitable for computing fault detection indices, performing fault
diagnosis, and therefore building process monitoring tools. We will build one such application
in this chapter. Specifically, the following topics will be covered
• Introduction to state-space models and CVA modeling
• Modeling MIMO glass furnace via CVA
• Process monitoring of Industrial Chemical Plants using CVA
`
Consider the distillation column in Figure 8.1 where we have four inputs and four outputs.
Let’s focus on the relationship between the reboiler duty QB and distillate composition xD. We
can build an I/O model between QB and xD, but it is clear that QB doesn’t affect xD directly.
There are internal dynamics or states (e.g., arising from the liquid holdup at each column tray)
that changes in QB have to pass through before impacting xD. Similar argument could be made
for QB vs D relationship. The unobserved intermediate variables which characterize the
internal dynamics or ‘states’ of the column are termed state variables and, as argued before,
the outputs share these states. The modeling framework that reflects such interrelationship
between the inputs and outputs is the state-space model whose structure14 is shown in Figure
8.1. In a generic modeling exercise, the estimated state variables may not always have any
physical meanings and even the choice of the states is not unique.
estimate the system order n, the system matrices (A, B, C, D), the initial state x(0), and the statistical properties
(covariance matrices) of the noises (𝛴𝑤𝑤 , 𝛴𝑣𝑣 )
Our familiar PEM methodology could be used to estimate SS models but that would involve
solving a nonlinear optimization problem. The better alternative is the subspace identification
method.
14The matrix D is often kept equal to zero due to no immediate direct impact of u(k) on y(k)
15Column diagram created by Milton Beychok under Creative Commons Attribution-Share Alike 3.0
[http://commons.wikimedia.org/wiki/File:Continuous_Binary_Fractional_Distillation.PNG]
`
Chapter 9
Nonlinear System Identification:
Going Beyond Linear Models
I
n the previous chapters we restricted our attention to only linear processes. However, most
industrial processes exhibit linear behavior only in a limited range of operating conditions.
If a global model of complex processes (such as high-purity distillation, pH neutralization,
polymerization, etc.) is sought, then the demon of nonlinearity would inadvertently surface. It
won’t be an exaggeration to state that nonlinearity is not an exception, rather the rule in
process industry. We won’t be surprised if you are tempted to always resort to artificial neural
networks-based structures which have become synonymous with nonlinear modeling now-a-
days. However, there exist simple classical nonlinear model structures which are more
interpretable and equally powerful. We will explore these models in this chapter.
Nonlinear models can be powerful tools but there are certain costs to their usage. These
include greater demand for variability in training data (which means more laborious
experiment design), more complex model selection procedure, and more computationally
intensive parameter estimation. Therefore, the decision to use nonlinear models should be
judiciously made. If you do decide to go ahead, then the concepts covered in this chapter may
help you greatly. Specifically, the following topics will be covered
Nonlinear SysID becomes a necessary evil when accurate predictions are sought over wide
range of operating conditions. Although nonlinear SysID literature is not as mature as linear
SysID, SysID community has developed several useful approaches for nonlinear modeling.
Figure 9.1 shows a few popular approaches. In the 1st approach, a nonlinear relationship is
directly fitted between y(k) and past data The nonlinear form could be polynomial-based or
more complex such as neural network-based. We will focus on the polynomial form in this
chapter. Just like linear SysID, different model structures can be postulated, namely, nonlinear
ARX (NARX), NOE, NARMAX, etc. In the 2nd approach, the linear and nonlinear parts are
defined in separate blocks, leading to block-structured models. Depending upon whether the
nonlinear block comes before or after the linear block, one could have the Hammerstein model
or the Wiener model (the two most popular models from this class). In the 3rd approach,
several linear models are built for different regions of the operating space and then they are
combined to give one global model. Numerous industrial applications of these approaches
have been reported for predictive modeling, process control, and process monitoring.
past inputs 𝑤1
Nonlinear Dynamic 𝑦̂𝑚1 (𝑘)
𝑢(𝑘 − 1) … Nonlinear function
static Linear
𝑦̂(𝑘) 𝑤2 𝑦̂(𝑘)
past outputs 𝑓(𝑢(𝑘 − 1) ⋯ 𝑦̂(𝑘) 𝑢(𝑘) block block 𝑦̂𝑚2 (𝑘)
𝑦(𝑘 − 1) … 𝑦(𝑘 − 1) ⋯ 𝑓(𝑢(𝑘)) 𝐺(𝑞−1 ) ⋮
𝑦̂(𝑘 − 1) ⋯ ) 𝑤𝑀
past predictions 𝑦̂𝑚𝑀 (𝑘)
Nonlinear and linear blocks can
𝑦̂(𝑘 − 1) … be arranged differently Predictions from M linear models
Nonlinear transformation
Considering high computational burden of nonlinear SysID, if the nature of nonlinearity is
known beforehand, then a nonlinear transformation of the input or output variables may be
utilized to avoid explicit nonlinear fitting. Illustrations below provide some example scenarios.
𝑥𝐷 𝑇𝑖𝑛
Rest of the Section 9.1 and Section 9.3 not shown in this preview
`
To demonstrate the utility of nonlinear SysID, we will use data from a liquid-saturated steam
heat exchanger16 where hot saturated steam is used to heat cold liquid. The provided dataset
contains 4000 samples (sampling time 1s) of input (liquid flow rate) and output (outlet liquid
temperature) variables. For this benchmark nonlinear process, our objective will be to build
a dynamic model to predict the outlet liquid temperature as a function of incoming liquid
flowrate. The dataset can be downloaded from the DaISy datasets repository.
We will use the SysIdentPy package for this nonlinear SysID exercise. Let’s begin by exploring the
dataset.
16
De Moor B.L.R. (ed.), DaISy: Database for the Identification of Systems, Department of Electrical Engineering,
ESAT/STADIUS, KU Leuven, Belgium, URL: http://homes.esat.kuleuven.be/~smc/daisy/
`
Rest of the Chapter 9 and Part 2 of the book not shown in this preview
`
Part 3
Artificial Neural Networks & Deep Learning
`
Chapter 10
Artificial Neural Networks:
Handling Complex Nonlinear Systems
t won’t be an exaggeration to say that artificial neural networks (ANNs) are currently the
I most powerful modeling construct for describing generic nonlinear processes. ANNs can
capture any kind of complex nonlinearities, don’t impose any specific process
characteristics, and don’t demand specification of process insights prior to model fitting.
Furthermore, several recent technical breakthroughs and computational advancements have
enabled (deep) ANNs to provide remarkable results for a wide range of problems.
Correspondingly, ANNs have re(caught) the fascination of data scientists and the process
industry is witnessing a surge in successful applications of ML-based process control,
predictive maintenance, and inferential modeling involving ANN-based system identification.
For SysID, ANNs are commonly used in NARX framework wherein an ANN model is fitted to
find the nonlinear relationship between current output and past input/output measurements.
In such applications, the FFNN (feed-forward neural networks) is invariably employed.
However, for dynamic modeling, a specialized type of network called RNN (recurrent neural
network) exist which is designed to be aware of the temporal nature of dynamic process data.
Consequently, RNNs tend to be more parsimonious than FFNN. We will see SysID
applications of both these popular architectures in this chapter.
There is no doubt that ANNs have proven to be monstrously powerful. However, it is not easy
to tame this monster. If the model hyperparameters are not set judiciously, it is very easy to
end up with disappointing results. The reader is referred to Part 3 of Book 1 of this series for
a detailed exposition on ANN training strategies and different facets of ANN models. In this
chapter, SysID relevant concepts are sufficiently described to enable an uninitiated reader to
learn how to use ANNs. Specifically, the following topics are covered
• Introduction to ANN, FFNN, RNN, LSTM
• Heat exchanger SysID via FFNN-based NARX
• Heat exchanger SysID via LSTM
`
Artificial neural networks (ANNs) are nonlinear empirical models which can capture complex
relationships between input-output variables via supervised learning or recognize data
patterns via unsupervised learning. Architecturally, ANNs were inspired by human brain and
are a complex network of interconnected neurons as shown in Figure 10.1. An ANN consists
of an input layer, a series of hidden layers, and an output layer. The basic unit of the network,
neuron, accepts a vector of inputs from the source input layer or the previous layer of the
network, takes a weighted sum of the inputs, and then performs a nonlinear transformation to
produce a single real-valued output. Each hidden layer can contain any number of neurons.
Single Neuron
Nonlinear mapping
f(.)
Figure 10.1: Architecture of a single neuron and feedforward neural network with 2 hidden layers
for a SISO (single input single output) process within NARX framework
By the process data scientists, for the process data scientists, of process data science
www.MLforPSE.com