Data-Driven Process Modeling for Control and Optimization
Sudhakar Kathari
Senior Research Scientist, Connected Industrials, Honeywell
January 8, 2025
Sudhakar Kathari | APC | Connected Industrials | HCE | HON ATAL FDP, NIT Trichy 1/22
Outline
1 Introduction
2 Identification
Prepare Data for Identification
Identification of ARX Models
Identification of State-Space Models
Identification of Inferential Models
3 Summary
Introduction
Benefits of Control and Optimization
Implementing control and optimization solutions in process industries maximizes profitability
and enhances operational efficiency.
The key value drivers are:
+ increased production,
+ reduced energy usage,
+ minimized downtime,
+ enhanced product quality,
+ improved process stability, and
+ safer operations.
Introduction
PID Control, Feedback, Single Variable
– Proportional-integral-derivative (PID) control is the most common control strategy used in industry.
– The OP is a weighted sum of three terms:
   – the error (SP − PV),
   – the integral of the error, and
   – the derivative (rate of change) of the error.
– The weights need to be adjusted (tuned) according to the process conditions.
– Tuning methods are based on process parameters (gain, time constant and delay).
[Figure: PID feedback loop; SP: set-point, PV: process variable, OP: output from controller]
To tune the PID parameters, it is necessary to identify the process parameters (process model).
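As an illustration of how tuning uses the process parameters, the sketch below assumes a first-order-plus-dead-time (FOPDT) model with gain K, time constant tau and delay theta, and applies the SIMC PI rules (Skogestad), which are one of many published tuning methods and not necessarily the one intended here. The OP is computed as the weighted sum of the error, its integral and its derivative. Function names and numbers are hypothetical.

```python
import numpy as np

def simc_pi(K, tau, theta, tau_c=None):
    """SIMC PI tuning for a FOPDT model K*exp(-theta*s)/(tau*s + 1)."""
    tau_c = theta if tau_c is None else tau_c      # tau_c = theta: a common "tight but robust" choice
    Kc = tau / (K * (tau_c + theta))               # controller gain
    tau_i = min(tau, 4.0 * (tau_c + theta))        # integral time
    return Kc, tau_i

def simulate_pid(K, tau, theta, Kc, tau_i, tau_d=0.0, sp=1.0, dt=0.1, t_end=100.0):
    """Closed-loop set-point response of a sampled FOPDT process under a parallel PID."""
    n = int(round(t_end / dt))
    delay = int(round(theta / dt))                 # dead time in samples
    a = np.exp(-dt / tau)                          # zero-order-hold discretization of 1/(tau*s + 1)
    op_buffer = np.zeros(n + delay)                # OP values waiting out the dead time
    pv = np.zeros(n)
    y, integral, prev_err = 0.0, 0.0, None
    for k in range(n):
        err = sp - y                               # error = SP - PV
        integral += err * dt                       # integral of error
        deriv = 0.0 if prev_err is None else (err - prev_err) / dt  # rate of change of error
        prev_err = err
        # OP is a weighted sum of the error, its integral and its derivative
        op_buffer[k + delay] = Kc * (err + integral / tau_i + tau_d * deriv)
        y = a * y + K * (1.0 - a) * op_buffer[k]   # process responds to the delayed OP
        pv[k] = y
    return pv

# Hypothetical process: gain 2, time constant 10 s, dead time 2 s
Kc, tau_i = simc_pi(K=2.0, tau=10.0, theta=2.0)
pv = simulate_pid(2.0, 10.0, 2.0, Kc, tau_i)
print(Kc, tau_i, pv[-1])
```

The integral term removes the steady-state offset, so the PV settles at the set-point; the derivative weight tau_d is left at zero here, as is common for PI tuning rules.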
Introduction
Limitations of PID Control
– Tuning is difficult for processes with large dead-times (delays).
– Processes with inverse responses and other unusual dynamic behaviors are difficult to control.
– Disturbances cannot be rejected before they affect the PV, because feedback acts only on the
resulting error; certain processes cannot tolerate such deviations.
– Unable to handle interactions (where many OPs affect many PVs) and constraints
(e.g., minimizing certain utilities while maximizing product).
Introduction
Model Predictive Control −→ Advanced Process Control, Multivariable
Advanced process control (APC) technology is based on model predictive control (MPC), which
relies on dynamic models of the underlying plant.
Basic elements of MPC
1 Specification of the reference trajectory
2 Prediction of the process output
3 Computation of the input moves
4 Update of the prediction (using the prediction error)
To implement MPC-based advanced control solutions, it is necessary to identify the dynamic
process model between inputs and outputs.
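The four basic elements of MPC can be seen in a minimal, unconstrained dynamic matrix control (DMC) style sketch. It assumes a SISO plant described by a finite impulse response (FIR) model h, a first-order reference trajectory (parameter alpha), a prediction horizon P, a control horizon M and a move-suppression weight lam; these names and values are hypothetical. Industrial MPC also handles constraints and multiple inputs and outputs, which this sketch omits.

```python
import numpy as np

def dmc_gain(h, P, M, lam):
    """Unconstrained DMC gain: optimal moves du = Kmpc @ (reference - free response)."""
    s = np.cumsum(h)                                # step-response coefficients from the FIR model
    G = np.zeros((P, M))                            # dynamic matrix
    for i in range(P):
        for j in range(min(i + 1, M)):
            G[i, j] = s[min(i - j, len(s) - 1)]     # effect of move j on the (i+1)-step prediction
    # minimizes ||ref - f - G du||^2 + lam * ||du||^2
    return np.linalg.solve(G.T @ G + lam * np.eye(M), G.T)

def free_response(h, u_hist, P):
    """Model predictions y[k+1..k+P] if the input stays at its last value u[k-1]."""
    N = len(h)
    u_ext = np.concatenate([u_hist[-N:], np.full(P, u_hist[-1])])  # u[k-N..k-1], then held input
    return np.array([h @ u_ext[i:N + i][::-1] for i in range(1, P + 1)])

# Hypothetical plant: first-order response (gain 2, pole 0.8) as N = 60 FIR coefficients
N, P, M, lam, alpha, sp = 60, 20, 5, 0.1, 0.7, 1.0
h = 2.0 * (1 - 0.8) * 0.8 ** np.arange(N)
Kmpc = dmc_gain(h, P, M, lam)
u_hist = np.zeros(N)                                 # u[k-N], ..., u[k-1]
ys = []
for k in range(80):
    y_model = h @ u_hist[::-1]                       # model output y[k]
    y_meas = y_model + (0.3 if k >= 40 else 0.0)     # "plant" = model + unmeasured step disturbance
    ref = sp - (sp - y_meas) * alpha ** np.arange(1, P + 1)   # 1. reference trajectory
    f = free_response(h, u_hist, P)                            # 2. prediction of process output
    f = f + (y_meas - y_model)                                 # 4. update prediction with current error
    du = Kmpc @ (ref - f)                                      # 3. input moves over control horizon
    u_hist = np.append(u_hist[1:], u_hist[-1] + du[0])         # apply only the first move
    ys.append(y_meas)
print(ys[39], ys[-1])
```

Because the current prediction error is added to every predicted value, the controller removes the offset caused by the unmeasured step disturbance at k = 40.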
Introduction
What is Dynamic Behaviour?
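A process (variable) whose behaviour changes over time is said to exhibit dynamic behaviour, and
modeling such behaviour is called dynamic modeling.
How can dynamics be incorporated into the static model $y = \beta_0 + \beta_1 u_1 + \beta_2 u_2 + \cdots + \beta_m u_m$?
Dynamics in the input variables (the process output depends on past inputs):
$$y[k] = \beta_0 + \sum_{i=0}^{p} \beta_{1,i}\, u_1[k-i] + \sum_{i=0}^{p} \beta_{2,i}\, u_2[k-i] + \cdots + \sum_{i=0}^{p} \beta_{m,i}\, u_m[k-i] + \varepsilon[k] \tag{1}$$
With the inclusion of dynamics in the output variable (the output also depends on its own past values):
$$\sum_{i=0}^{p} \alpha_i\, y[k-i] = \beta_0 + \sum_{i=0}^{p} \beta_{1,i}\, u_1[k-i] + \sum_{i=0}^{p} \beta_{2,i}\, u_2[k-i] + \cdots + \sum_{i=0}^{p} \beta_{m,i}\, u_m[k-i] + \varepsilon[k] \tag{2}$$
where $p$ is the model order, $m$ is the number of inputs, $\beta_{j,i}$ is the coefficient of input $u_j$ at lag $i$, and $\alpha_0$ is usually normalized to 1.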
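Equation (1) is linear in its coefficients, so it can be fitted by least squares once lagged copies of each input are arranged as regressors. A sketch assuming numpy, with a separate coefficient for each input and lag; the helper name lagged_input_matrix and the synthetic data are hypothetical.

```python
import numpy as np

def lagged_input_matrix(U, p):
    """Rows k = p..N-1 of [1, u_1[k], ..., u_1[k-p], ..., u_m[k], ..., u_m[k-p]] (Eq. (1))."""
    N, m = U.shape
    cols = [np.ones(N - p)]                         # intercept beta_0
    for j in range(m):
        for i in range(p + 1):
            cols.append(U[p - i:N - i, j])          # u_j[k - i] for k = p..N-1
    return np.column_stack(cols)

# Hypothetical data: m = 2 inputs, model order p = 3
rng = np.random.default_rng(0)
N, m, p = 500, 2, 3
U = rng.standard_normal((N, m))
beta_true = np.array([0.5, 1.0, 0.6, 0.3, 0.1, -0.4, 0.2, 0.0, 0.05])
X = lagged_input_matrix(U, p)
y = X @ beta_true + 0.01 * rng.standard_normal(N - p)   # eps[k]: small white noise
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(beta_hat, 3))
```

The first p samples are dropped because their lagged inputs are not available; Eq. (2) adds lagged outputs as further columns in the same way.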
Introduction
Continuous Measurements of Process Quality Variables −→ Requires Inferential Models
Implementing advanced control, optimization and monitoring solutions requires
– continuous measurement of the key process quality variables and properties (e.g., kerosene sulfur,
product recovery, gas oil density and Reid vapor pressure).
However, measuring them at the required frequency is often impossible or too costly because of
the limitations of sensors and instrumentation.
To obtain continuous measurements of the key process quality variables and properties, it is
necessary to develop inferential models (soft sensors).
Introduction
Models for Process Control and Optimization
To summarize, the following models are required to implement control and optimization solutions
in process industries.
1 Models (dynamic, SISO) for tuning PID parameters
2 Models (dynamic, MIMO) for implementing control and optimization solutions
3 Models (static/dynamic, MISO) for measuring process quality variables and properties
How to identify these models?
Identification
First Principles vs Data-Driven Approaches
– First principles: to build models, we require good knowledge of the physics of the process,
which is rarely available for a large class of processes.
– Data driven: identifying process models from measured data is a natural alternative and a
commonly used approach in industry.
Identification
Problem formulation
Process → Data → Generic-form, higher-order models → Specific-form, reduced-order models
Identification
Model Structures: Transfer-function Form
A general description of transfer-function models (in polynomial form) is given by
$$A(q)\,y[k] = \frac{B(q)}{F(q)}\,u[k] + \frac{C(q)}{D(q)}\,e[k] \tag{3}$$
where A(q) captures the dynamics common to the plant and disturbance models.
Model    Polynomial values
FIR      A(q) = F(q) = C(q) = D(q) = 1
ARX      F(q) = D(q) = C(q) = 1
ARMAX    F(q) = D(q) = 1
OE       A(q) = C(q) = D(q) = 1
BJ       A(q) = 1
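To make the table concrete, the sketch below simulates the general form (3) by rewriting it as y = B/(AF) u + C/(AD) e. It assumes polynomials are given as coefficient arrays in powers of q^-1 with a leading 1 in A, C, D and F; the helpers filt and simulate are hypothetical, and leaving a polynomial at its default value 1 gives the special cases in the table.

```python
import numpy as np

def filt(num, den, x):
    """y = num(q)/den(q) x for polynomials in q^-1 (coefficient arrays, den[0] == 1)."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = sum(num[i] * x[k - i] for i in range(len(num)) if k - i >= 0)
        acc -= sum(den[i] * y[k - i] for i in range(1, len(den)) if k - i >= 0)
        y[k] = acc
    return y

def simulate(u, e, A=(1.0,), B=(1.0,), C=(1.0,), D=(1.0,), F=(1.0,)):
    """A(q) y = B(q)/F(q) u + C(q)/D(q) e, i.e. y = B/(A F) u + C/(A D) e."""
    return filt(B, np.convolve(A, F), u) + filt(C, np.convolve(A, D), e)

# Hypothetical first-order examples with a one-sample input delay (B = 0.5 q^-1)
rng = np.random.default_rng(0)
u, e = rng.standard_normal(200), 0.1 * rng.standard_normal(200)
y_arx = simulate(u, e, A=[1.0, -0.7], B=[0.0, 0.5])   # ARX: F = C = D = 1, noise filtered by 1/A
y_oe = simulate(u, e, B=[0.0, 0.5], F=[1.0, -0.7])    # OE: A = C = D = 1, noise added unfiltered
```

In the ARX case the noise passes through the same denominator A(q) as the input, which is the "common dynamics" noted above; in the OE case it does not.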
Identification
Model Structures: State-space Form
A general state-space model form is given by
x[k + 1] = Ax[k] + Bu[k] + n[k]
y[k] = Cx[k] + Du[k] + v[k]
where A, B, C, and D are the state, input, output, and direct feed-through matrices, respectively;
$x[k] \in \mathbb{R}^{n \times 1}$ is the vector of states,
$u[k] \in \mathbb{R}^{m \times 1}$ is the vector of inputs (MVs),
$y[k] \in \mathbb{R}^{l \times 1}$ is the vector of outputs (CVs), and
n[k] and v[k] are the process and measurement noises, respectively.
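State-space to input-output (TF) model: since the model is in discrete time, the transfer function
of the deterministic part is
$$G(z) = C\,(zI - A)^{-1} B + D \tag{4}$$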
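Equation (4) can be checked numerically. Because the state-space model above is in discrete time, this sketch evaluates the transfer function at a point z: z = 1 gives the steady-state gain and z = e^{jω} gives the frequency response. The example matrices and the helper name G are hypothetical; the steady-state gain is cross-checked against the sum of the impulse-response terms C A^k B.

```python
import numpy as np

def G(A, B, C, D, z):
    """Evaluate G(z) = C (zI - A)^{-1} B + D at a (complex) point z."""
    n = A.shape[0]
    return C @ np.linalg.solve(z * np.eye(n) - A, B) + D

# Hypothetical 2-state SISO example (eigenvalues 0.8 and 0.5)
A = np.array([[0.8, 0.1], [0.0, 0.5]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

gain = G(A, B, C, D, 1.0)                          # steady-state gain of a discrete-time model: z = 1
w = 0.3
freq_resp = G(A, B, C, D, np.exp(1j * w))          # frequency response at w rad/sample
markov = sum(C @ np.linalg.matrix_power(A, k) @ B for k in range(500))  # sum of C A^k B terms
print(gain, markov, np.poly(A))                    # np.poly(A): characteristic polynomial (denominator)
```

Solving the linear system instead of forming (zI − A)^{-1} explicitly is the usual numerically safer choice.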
Challenges with Industrial Data − Need for Data Preparation
There are several possible issues with industrial data:
◦ oversampling,
◦ offsets (non-zero means) and different engineering units,
◦ drifts, trends, and low-frequency disturbances,
◦ unmeasured disturbances, and
◦ non-stationary disturbances (e.g., random walks).
Furthermore, an industrial process can have
◦ large (and different) input-output delays, and
◦ integrating-type dynamics.
All these issues can affect the efficacy of the identification algorithm; a suitable data
pre-processing approach is required to handle them.
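A minimal pre-processing sketch for some of the issues listed above, assuming numpy and data arranged with one column per tag: down-sampling for oversampled data, optional differencing for random-walk disturbances, removal of offsets and linear drifts, and scaling to unit variance so that different engineering units become comparable. The function name and signals are hypothetical; the plain slicing used for down-sampling has no anti-aliasing filter (scipy.signal.decimate is one option that includes one), and delays and integrating dynamics are not handled here.

```python
import numpy as np

def preprocess(data, factor=1, difference=False):
    """Down-sample, remove offsets and linear drifts, and scale each column to unit variance.

    data: array of shape (N, n_signals), one column per tag (MV or CV).
    """
    x = np.asarray(data, dtype=float)[::factor]    # naive down-sampling (no anti-aliasing filter)
    if difference:
        x = np.diff(x, axis=0)                     # differencing, e.g. for random-walk disturbances
    t = np.arange(x.shape[0])
    slope, intercept = np.polyfit(t, x, deg=1)     # per-column least-squares line
    x = x - (np.outer(t, slope) + intercept)       # removes offset (mean) and linear drift
    scale = x.std(axis=0)
    return x / scale, scale                        # unit-free, unit-variance signals

# Hypothetical tags: offset 100 with drift 0.05/sample, and a slow oscillation around 20; oversampled 10x
rng = np.random.default_rng(0)
t = np.arange(5000)
raw = np.column_stack([100 + 0.05 * t + rng.standard_normal(5000),
                       20 + np.sin(t / 200) + 0.1 * rng.standard_normal(5000)])
xp, scale = preprocess(raw, factor=10)
print(xp.shape, xp.mean(axis=0), xp.std(axis=0))
```

The returned scale factors are kept so that an identified model can be converted back to engineering units.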
Identification of ARX Models
Autoregressive with eXtra input (eXogenous variables)
$$A(q)\,y[k] = B(q)\,u[k] + e[k]$$
$$y[k] + a_1 y[k-1] + \cdots + a_{n_a} y[k-n_a] = b_1 u[k-n_k] + \cdots + b_{n_b} u[k-n_b-n_k+1] + e[k]$$
where $A(q) = 1 + a_1 q^{-1} + \cdots + a_{n_a} q^{-n_a}$ and $B(q) = b_1 q^{-n_k} + \cdots + b_{n_b} q^{-n_k-n_b+1}$.
Plant and noise models share common dynamics (the polynomial A(q)).
If $n_a = 0$, it reduces to the FIR model.
Given N measurements of the input u[k] and output y[k] of an unknown process, identify the
– appropriate orders of the A and B polynomials ($n_a$, $n_b$) and the delay ($n_k$),
– coefficients of the A and B polynomials, and
– second-order properties of the noise.
Identification of ARX Models
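Unknowns (parameters):
$$\theta = [a_1\ a_2\ \cdots\ a_{n_a}\ b_1\ b_2\ \cdots\ b_{n_b}]^T$$
Knowns (regressors):
$$\varphi[k] = [-y[k-1]\ \cdots\ -y[k-n_a]\ \ u[k-n_k]\ \cdots\ u[k-n_k-n_b+1]]^T$$
(for $n_k = 1$, the input part is $u[k-1] \cdots u[k-n_b]$).
Predictor:
$$\hat{y}[k|k-1] = B(q)u[k] + (1 - A(q))y[k] = \varphi^T[k]\,\theta$$
The predictor is linear in the parameters. Therefore, stacking $\varphi^T[k]$ for all samples into the
matrix $\Phi$ and the outputs into the vector $\mathbf{y}$, the coefficients of the A and B polynomials
are obtained uniquely by least squares, provided $\Phi^T\Phi$ is invertible (sufficiently exciting input):
$$\hat{\theta}_{LS} = (\Phi^T \Phi)^{-1} \Phi^T \mathbf{y}$$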
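A sketch of ARX estimation by least squares, assuming numpy and SISO data. It builds the regressor matrix Φ row by row and solves the least-squares problem; the data come from a hypothetical second-order process with white equation error, which is the case where the ARX least-squares estimate is consistent. The helper names arx_regressors and arx_ls are hypothetical.

```python
import numpy as np

def arx_regressors(y, u, na, nb, nk):
    """Stack phi[k]^T = [-y[k-1] ... -y[k-na], u[k-nk] ... u[k-nk-nb+1]] for all usable k."""
    start = max(na, nk + nb - 1)                 # first k for which every lagged sample exists
    Phi = np.array([[-y[k - i] for i in range(1, na + 1)] +
                    [u[k - nk - i] for i in range(nb)]
                    for k in range(start, len(y))])
    return Phi, y[start:]

def arx_ls(y, u, na, nb, nk=1):
    """Least-squares ARX estimate; returns (a_1..a_na, b_1..b_nb)."""
    Phi, Y = arx_regressors(y, u, na, nb, nk)
    # lstsq minimizes ||Y - Phi theta||^2; same solution as (Phi^T Phi)^-1 Phi^T Y
    # when Phi^T Phi is invertible, but without forming the inverse
    theta = np.linalg.lstsq(Phi, Y, rcond=None)[0]
    return theta[:na], theta[na:]

# Hypothetical process: A = 1 - 1.5 q^-1 + 0.7 q^-2, B = q^-1 + 0.5 q^-2 (nk = 1)
rng = np.random.default_rng(1)
N = 2000
u = rng.choice([-1.0, 1.0], size=N)              # random binary input for persistent excitation
e = 0.1 * rng.standard_normal(N)                 # white equation error, as the ARX structure assumes
y = np.zeros(N)
for k in range(2, N):
    y[k] = 1.5 * y[k - 1] - 0.7 * y[k - 2] + 1.0 * u[k - 1] + 0.5 * u[k - 2] + e[k]
a_hat, b_hat = arx_ls(y, u, na=2, nb=2, nk=1)
print(a_hat, b_hat)
```

In practice the orders and delay are not known; one common approach is to fit several ($n_a$, $n_b$, $n_k$) combinations and compare them on validation data.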
Identification of State-Space Models
Problem Formulation
An innovation form of the state-space representation is
$$x[k+1] = A\,x[k] + B\,u[k] + K\,e[k]$$
$$y[k] = C\,x[k] + D\,u[k] + e[k] \tag{5}$$
Given N measurements of the input u[k] and output y[k] of an unknown process, identify the
– appropriate order of the process (n),
– state-space matrices A, B, C, and D, and
– second-order properties of the noise.
Identification algorithm: Subspace identification methods.
Identification of State-Space Models
Subspace Algorithm
An innovation form of the state-space representation is
$$x[k+1] = A\,x[k] + B\,u[k] + K\,e[k]$$
$$y[k] = C\,x[k] + D\,u[k] + e[k] \tag{6}$$
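Define the past and future data matrices as
$$Y_p = \begin{bmatrix} y[0] & y[1] & \cdots & y[j-1] \\ y[1] & y[2] & \cdots & y[j] \\ \vdots & \vdots & \ddots & \vdots \\ y[p-1] & y[p] & \cdots & y[p+j-2] \end{bmatrix}, \quad U_p = \begin{bmatrix} u[0] & u[1] & \cdots & u[j-1] \\ u[1] & u[2] & \cdots & u[j] \\ \vdots & \vdots & \ddots & \vdots \\ u[p-1] & u[p] & \cdots & u[p+j-2] \end{bmatrix}$$
$$Y_f = \begin{bmatrix} y[p] & y[p+1] & \cdots & y[p+j-1] \\ y[p+1] & y[p+2] & \cdots & y[p+j] \\ \vdots & \vdots & \ddots & \vdots \\ y[p+f-1] & y[p+f] & \cdots & y[p+f+j-2] \end{bmatrix}, \quad U_f = \begin{bmatrix} u[p] & u[p+1] & \cdots & u[p+j-1] \\ u[p+1] & u[p+2] & \cdots & u[p+j] \\ \vdots & \vdots & \ddots & \vdots \\ u[p+f-1] & u[p+f] & \cdots & u[p+f+j-2] \end{bmatrix}$$
Choosing the subspace parameters f, p and j:
– f > guessed maximum order of the system (n), and p ≥ f > n;
– j ≫ f (in fact, j is nearly equal to N, that is, j = N − (p + f) + 1).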
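The past and future data matrices can be built with a small block-Hankel helper. This sketch assumes numpy and signals stored as arrays of shape (N,) or (N, dim); the function names are hypothetical. With j = N − (p + f) + 1, the last column of the future matrices ends exactly at the last sample, index N − 1.

```python
import numpy as np

def block_hankel(x, start, rows, cols):
    """Block Hankel matrix whose (i, c) block is x[start + i + c]; x has shape (N,) or (N, dim)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                                   # treat a single signal as dim = 1
    dim = x.shape[1]
    H = np.empty((rows * dim, cols))
    for i in range(rows):
        H[i * dim:(i + 1) * dim, :] = x[start + i:start + i + cols].T
    return H

def past_future(y, u, p, f):
    """Past (Yp, Up) and future (Yf, Uf) data matrices using all N samples."""
    N = len(y)
    j = N - (p + f) + 1                                  # number of columns
    Yp, Up = block_hankel(y, 0, p, j), block_hankel(u, 0, p, j)
    Yf, Uf = block_hankel(y, p, f, j), block_hankel(u, p, f, j)
    return Yp, Up, Yf, Uf

# Toy signals whose values equal their sample index, so the indexing is easy to check
y = np.arange(20.0)
u = 10.0 * np.arange(20.0)
Yp, Up, Yf, Uf = past_future(y, u, p=4, f=3)
print(Yp.shape, Yf.shape, Yp[-1, -1], Yf[-1, -1])   # (4, 14) (3, 14) 16.0 19.0
```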
Identification of State-Space Models
Subspace Algorithm
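Extending the state-space model over f future samples gives
$$\underbrace{\begin{bmatrix} y[k] \\ y[k+1] \\ \vdots \\ y[k+f-1] \end{bmatrix}}_{Y_f} = \underbrace{\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{f-1} \end{bmatrix}}_{O_f} x[k] + \underbrace{\begin{bmatrix} D & 0 & \cdots & 0 \\ CB & D & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ CA^{f-2}B & \cdots & CB & D \end{bmatrix}}_{G_f} \underbrace{\begin{bmatrix} u[k] \\ u[k+1] \\ \vdots \\ u[k+f-1] \end{bmatrix}}_{U_f} + \underbrace{\begin{bmatrix} I & 0 & \cdots & 0 \\ CK & I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ CA^{f-2}K & \cdots & CK & I \end{bmatrix}}_{H_f} \underbrace{\begin{bmatrix} e[k] \\ e[k+1] \\ \vdots \\ e[k+f-1] \end{bmatrix}}_{E_f}$$
Writing this relation for the j columns k = p, p+1, …, p+j−1 of the data matrices gives the extended state-space model
$$Y_f = O_f X_f + G_f U_f + H_f E_f \tag{7}$$
where $X_f = [x[p]\ x[p+1]\ \cdots\ x[p+j-1]]$ is the state sequence.
The state-space matrices can be obtained from the knowledge of either $O_f$ (the extended observability matrix) or the state sequence $X_f$.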
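One way to use the extended model (7) is to estimate O_f and then read C and A from it. The sketch below is a deterministic, noise-free simplification in the spirit of MOESP-type methods, not the N4SID algorithm: it removes the G_f U_f term by projecting Y_f onto the orthogonal complement of the row space of U_f, takes the n dominant left singular vectors as a basis for the column space of O_f, and uses the shift structure of O_f to obtain A. With noisy data, practical subspace algorithms also use the past data (Y_p, U_p) as instruments, which this sketch omits; because there is no noise here, Y_f and U_f are simply built from the start of the record. The system and helper names are hypothetical.

```python
import numpy as np

def hankel(x, start, rows, cols):
    """SISO Hankel matrix whose (i, c) entry is x[start + i + c]."""
    return np.vstack([x[start + i:start + i + cols] for i in range(rows)])

def estimate_A_C(Yf, Uf, n, l):
    """Noise-free sketch: estimate O_f (up to a similarity transform), then C and A."""
    # Remove the G_f U_f term: project Yf onto the orthogonal complement of the row space of Uf,
    # leaving (without noise) O_f X_f Pi_perp, whose column space is that of O_f.
    Yf_perp = Yf - (Yf @ np.linalg.pinv(Uf)) @ Uf
    U_svd, s, _ = np.linalg.svd(Yf_perp, full_matrices=False)
    Of = U_svd[:, :n] * np.sqrt(s[:n])           # the n dominant directions give O_f
    C = Of[:l, :]                                # first block row of O_f is C
    # Shift invariance: (O_f without last block row) @ A = (O_f without first block row)
    A = np.linalg.lstsq(Of[:-l, :], Of[l:, :], rcond=None)[0]
    return A, C, s

# Hypothetical 2nd-order SISO system (eigenvalues 0.8 and 0.5), D = 0, no noise
rng = np.random.default_rng(0)
A_true = np.array([[0.8, 0.1], [0.0, 0.5]])
B_true = np.array([0.0, 1.0])
C_true = np.array([1.0, 0.0])
N, f = 500, 5
u = rng.standard_normal(N)
x, y = np.zeros(2), np.zeros(N)
for k in range(N):
    y[k] = C_true @ x
    x = A_true @ x + B_true * u[k]
j = N - f + 1
A_hat, C_hat, s = estimate_A_C(hankel(y, 0, f, j), hankel(u, 0, f, j), n=2, l=1)
print(s)                                         # two dominant singular values, the rest numerically zero
print(np.sort(np.linalg.eigvals(A_hat).real))
```

A_hat differs from A_true by a similarity transform, so its eigenvalues (the system poles), not its entries, are what should match; the gap in the singular values is also what suggests the model order n.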
Identification of Inferential Models
Problem Formulation
Given N measurements of m input variables $(IV_1, IV_2, \ldots, IV_m)$ and n process output (quality)
variables $(TV_1, TV_2, \ldots, TV_n)$, build a soft sensor (inferential model) that relates the input
variables to each output variable, and deploy it to continuously measure/predict the
process quality variables:
$$\widehat{TV}_i = f(IV_1, IV_2, \ldots, IV_m) \quad \text{or} \quad \hat{y}_i = f(u_1, u_2, \ldots, u_m)$$
where
– input variables are also referred to as independent variables (IVs),
– output variables are also referred to as training (target) variables (TVs),
– measurements can be continuous, analyzer or lab measurements, and
– the model can be linear or nonlinear, and static or dynamic.
Identification of Inferential Models
Modeling Techniques
Model structure (linear, static)
$$y = \beta_0 + \beta_1 u_1 + \beta_2 u_2 + \cdots + \beta_m u_m \tag{8}$$
Modeling techniques
1 Ordinary least squares (OLS) −→ linear and appropriate for equal error variances in data.
2 Weighted least squares (WLS) −→ linear and appropriate for unequal error variances.
3 Partial least squares (PLS) −→ linear and appropriate for correlated input variables.
···
4 Non-linear least squares (NLS) −→ useful for identifying non-linear models.
Advanced modeling techniques are required to capture more complex features and nonlinear
relationships.
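A short comparison of OLS and WLS for the linear static model (8), assuming numpy and known per-sample error standard deviations (for example, precise lab samples mixed with noisier analyzer readings). WLS weights each sample by 1/σ_k², implemented by scaling each row by 1/σ_k before an ordinary least-squares solve. The data and names are hypothetical; PLS and NLS are not shown.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares for Eq. (8), with an intercept beta_0."""
    Xa = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(Xa, y, rcond=None)[0]

def wls(X, y, sigma):
    """Weighted least squares with weights 1/sigma_k^2 (sigma_k: known error std of sample k)."""
    Xa = np.column_stack([np.ones(len(y)), X])
    w = 1.0 / np.asarray(sigma, dtype=float)     # scaling rows by sqrt(weight) = 1/sigma
    return np.linalg.lstsq(Xa * w[:, None], y * w, rcond=None)[0]

# Hypothetical data: half the targets from a precise lab (std 0.05), half from a noisy analyzer (std 1.0)
rng = np.random.default_rng(0)
N = 400
X = rng.standard_normal((N, 2))
beta_true = np.array([2.0, 1.5, -0.8])
sigma = np.where(np.arange(N) < N // 2, 0.05, 1.0)
y = beta_true[0] + X @ beta_true[1:] + sigma * rng.standard_normal(N)
print("OLS:", ols(X, y))
print("WLS:", wls(X, y, sigma))
```

Both estimators are unbiased here, but WLS gives far less weight to the noisy analyzer samples, so its estimates are typically much closer to beta_true.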
Summary
Issues that still need to be solved for industry:
– A modeling technique that quickly generates accurate process dynamic models from both
open-loop and closed-loop data.
– A faster modeling technique that generates accurate models from minimal excitation.
– An efficient and explicit method for delay estimation.
Some of the major developments expected to emerge in process industries are:
– Advancements (artificial intelligence, advanced analytics) in production automation for
more reliable and safer operations.
– Generation of meaningful insights to help operators and engineers optimize operational
performance.
– Physics-informed neural network models for modeling process dynamics.
– Reinforcement learning (RL)-based advanced process control and PID control tuning.
Thank you