
Computers and Chemical Engineering 179 (2023) 108411

Contents lists available at ScienceDirect

Computers and Chemical Engineering


journal homepage: www.elsevier.com/locate/cace

Formulating data-driven surrogate models for process optimization


Ruth Misener a,*, Lorenz Biegler b,*

a Department of Computing, Imperial College London, London, SW7 2AZ, UK
b Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15217, United States of America

ARTICLE INFO

Keywords:
Optimization formulations
Surrogate modeling
Data-driven optimization
Optimization under uncertainty
Software tools

ABSTRACT

Recent developments in data science and machine learning have inspired a new wave of research into data-driven modeling for mathematical optimization of process applications. This paper first considers essential conditions for robustness to uncertainties and accurate extrapolation, which are required to integrate surrogates into process optimization. Next we consider two perspectives for developing process engineering surrogates: a surrogate-led and a mathematical programming-led approach. As these data-driven surrogate models must be integrated into a larger process optimization problem, we discuss the verification problem, i.e., checking that the optimum of the surrogate corresponds to the optimum of the truth model. The paper investigates two case studies on surrogate-based optimization for heat exchanger network synthesis and drill scheduling.

1. Introduction

Both data-driven techniques and mathematical optimization have been pillars of process systems engineering (PSE) since its inception (Sargent, 1972; Pistikopoulos et al., 2021). But recent developments in data science and machine learning have inspired a new wave of research into data-driven techniques for mathematical optimization (Ning and You, 2019). This research belongs to larger efforts at the intersection of data science and PSE (Qin and Chiang, 2019; Shang and You, 2019; Tsay and Baldea, 2019; Schweidtmann et al., 2021b; Thebelt et al., 2022b). Moreover, recent research integrating data-driven techniques into mathematical optimization includes: derivative-free optimization (Rios and Sahinidis, 2013), hybrid data-driven/mechanistic modeling (Von Stosch et al., 2014; Boukouvala et al., 2016), using surrogate models in optimization (Bhosekar and Ierapetritou, 2018; McBride and Sundmacher, 2019), and optimization under uncertainty using data-driven techniques (Ning and You, 2019; Thebelt et al., 2022b).

This paper considers three perspectives for developing process engineering surrogates:

• The surrogate-led perspective first selects an appropriate surrogate, e.g., polynomial regression for its smoothness or Gaussian processes for their statistical properties, and then develops an effective optimization formulation for that particular surrogate model, which is solved with available mathematical programming algorithms that converge to (at least) locally optimal solutions. Integration of data-driven surrogate models into mathematical optimization is a rapidly evolving field (Bhosekar and Ierapetritou, 2018; McBride and Sundmacher, 2019).

• The mathematical programming-led perspective selects a specific surrogate model based on its desired optimization properties, e.g., linearity and convexity. This second perspective is important for PSE domains where the optimization problems being solved require a surrogate model that conforms to these properties. Therefore, this paper also views the literature through the lens of selecting optimization properties. This is especially important for discrete optimization and mixed-integer problems that require global solutions.

• Integration of data-driven surrogate models within larger process optimization problems also requires consideration of the verification problem, i.e., checking that the optimum of the surrogate corresponds to the optimum of the truth model. In this paper, an optimization strategy that guarantees convergence to the truth model is discussed and demonstrated with a case study on heat exchanger network synthesis.

We conclude by describing an open challenge in using surrogate models for process optimization, namely that the notion of convergence differs between mathematical optimization and machine learning (Zhang et al., 2022). In the next section we first consider two essential conditions for integrating surrogates into process optimization and discuss how to achieve those conditions. Here, we also discuss how our strategies differ from machine learning algorithms.
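The surrogate-led workflow described above, sample a truth model, fit a chosen surrogate class, then hand the fitted surrogate to a standard solver and verify against the truth model, can be sketched in a few lines. The quadratic truth function, sample grid, and polynomial degree below are illustrative assumptions, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical "truth" model, standing in for an expensive black-box simulation.
def truth(w):
    return (w - 1.3) ** 2 + 0.1 * np.sin(5.0 * w)

# Step 1 (surrogate-led): sample the truth model over the design range.
w_samples = np.linspace(0.0, 3.0, 25)
y_samples = truth(w_samples)

# Step 2: select a surrogate class (here a quadratic polynomial) and fit it
# by least squares.
coeffs = np.polyfit(w_samples, y_samples, deg=2)
surrogate = np.poly1d(coeffs)

# Step 3: optimize the surrogate with a standard mathematical programming solver.
res = minimize_scalar(surrogate, bounds=(0.0, 3.0), method="bounded")

# Verification: compare the surrogate optimum against the truth model.
gap = abs(truth(res.x) - surrogate(res.x))
print(f"surrogate optimum w* = {res.x:.3f}, truth-surrogate gap = {gap:.4f}")
```

The verification step in the last lines is the crux of the paper's third perspective: a small truth-surrogate gap at the surrogate optimum is exactly what the later sections formalize via the fully linear property and the trust region filter method.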

∗ Corresponding authors.
E-mail addresses: [email protected] (R. Misener), [email protected] (L. Biegler).

https://doi.org/10.1016/j.compchemeng.2023.108411
Received 2 July 2023; Received in revised form 6 September 2023; Accepted 8 September 2023
Available online 12 September 2023
0098-1354/© 2023 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

2. Surrogate models for process optimization

We mention two essential conditions for process optimization surrogates:

• Condition 1. Overfitting data-driven models must be avoided for surrogate models embedded into optimization problems. Unfortunately, this condition is rarely checked in machine learning algorithms, and as described below, in applications such as image recognition and language processing it is not considered at all. On the other hand, with process optimization, the optimum determined from an overfitted surrogate model is unlikely to be robust to changes in the problem data or input conditions.

• Condition 2. Surrogate model accuracy can be enforced by constraining the models to the desired space so that they do not extrapolate with large errors. Since process optimization often extrapolates beyond known operating conditions, it is essential that the surrogate-based optimization be feasible and at least locally optimal for the actual process.

These conditions are also closely tied to the stability of optimal solutions, which are partially characterized by Lipschitz continuity for input/output relations and data uncertainties. The Lipschitz continuity property depends strongly on surrogate model parameters with bounded confidence regions that result from data uncertainties. Moreover, for gradient-based optimization algorithms where both the 'truth' model t(w) and the surrogate model r(w) are smooth with respect to their inputs w and basepoint w̄, the κ-fully linear property (Conn et al., 2009):

‖t(w) − r(w)‖ ≤ κ_f Δ², ‖∇t(w) − ∇r(w)‖ ≤ κ_g Δ        (1)
∀ w, w̄ ∈ B_Δ, where B_Δ = {w, w̄ ∈ R^{n_w} | ‖w − w̄‖ ≤ Δ}

is a necessary condition for a convergent surrogate-based optimization formulation.

2.1. Optimization algorithms for machine learning

Machine learning, AI and ''big data'' have seen widespread application in information capture, data classification, image/visual recognition and natural language processing. These deep learning applications form classes of optimization problems that are significantly different from process applications. Moreover, the enabling tools are recent optimization algorithms, which differ significantly from the optimization algorithms applied to process applications.

First, machine learning models lead to predictions that are largely interpolative rather than extrapolative. Many surrogates, such as neural networks, have nonsmooth activation functions, for example ReLU. As a result, Zhang et al. (2022) show that neural network weights, with either differentiable or non-differentiable activation functions, are not expected to converge to stationary points of the loss function. Nevertheless, stable convergence behavior approaching the minimum of the loss function can still be shown with algorithms based on stochastic gradient descent (SGD). In addition, data fitting tasks tend to use weights that are generally more numerous than data entries; despite the occurrence of nonunique weights, the resulting overfitted models are surprisingly stable with low fitting error. In this way, machine learning for surrogate models can be analyzed analogously to the stability theory of observers for stochastic dynamic processes (Zhang et al., 2022). In particular, modern approaches to data fitting are governed by different concepts than classical approaches. These include over-parametrized kernel machines, nonunique minimizers and the ''double descent curve'', which leads to stable loss functions even with overfitted models (Belkin, 2021). Consequently, such models have good convergence properties, and algorithms such as SGD often perform well, even with data mini-batches and early stopping.

On the other hand, overfitted deep learning models may be non-robust to data uncertainties, and also may have limited predictive capabilities, particularly for extrapolation and for satisfying additional constraints. As a result, they may not be compatible with constrained models, such as Physics Informed Neural Networks (PINNs) (Djeumou et al., 2022).

Second, optimization algorithms for machine learning are based on different problem characteristics, i.e., data sets that are often too large to fully include for the evaluation of functions, gradients and higher derivatives, which must instead be approximated by data sampling. Also, for stable performance, first order, unconstrained SGD methods and their extensions are used. In a recent text, Wright and Recht (2022) present a comprehensive exposition and analysis of SGD and accelerated methods based on extended dynamic systems. While these are generally for unconstrained, convex problems, they can also be extended beyond this class.

In Section 3 we discuss alternative approaches to deep learning that are based on classical regression, can be used to provide confidence bands to capture uncertainties due to fitting (Wilson and Sahinidis, 2017; Wiebe and Misener, 2021), can be monitored to prevent overfitting, and can be applied directly to surrogate-based optimization. To close this section, we also consider the impact of surrogate process models at multiple system levels.

2.2. Surrogate process modeling levels

For process optimization problems, so-called truth models for plant system-wide equations can be classified at the physical property, unit and system or plant levels, through the following form:

y = g(z_sys, z_unit, z_pp, w_sys)
0 = f_sys(z_sys, z_unit, z_pp, w_sys)        (2)
0 = f_unit(z_sys, z_unit, z_pp)
0 = f_pp(z_pp, z_unit)

as shown in Fig. 1. Here y and w are system output and input vectors, respectively, and z_l are state variables at level l. We now consider the application of surrogate models at each of the three levels.

At the system level, the surrogate model abstraction is the most generalized and seeks to substitute the entire set of Eqs. (2) with a single surrogate model,

y = g_surr(w_sys) = g(z_sys, w_sys) + ε_y,
0 = f_sys(z_sys, w_sys),

where ε_y is the approximation error vector of the surrogate model. This approach avoids calculating mass or energy balances and attempts to determine a simulation topology solely based upon the simulation input variables w_sys. Such models have the following advantages. First, plant surrogate models typically have few input variables (degrees of freedom) for the entire plant, which may lead to high fidelity interpolative plant models. Moreover, they are often solved with simpler derivative-free optimization solvers. On the other hand, extrapolating these surrogates will likely violate conservation laws and other first principle relations. Moreover, these surrogates are not reusable for related cases, as they need to be reconstructed for every specific plant case.

Instead, surrogate models at the unit level can be linked to form a plant model. At the unit level, these represent an intermediate level of surrogacy that satisfies the overall mass and energy balances of the plant, even though these surrogate models are not designed to account for conservation or other first principle laws within the unit. The underlying system of Eqs. (2) is reduced to the form:

y = g(z_sys, z_unit, w_sys)
0 = f_sys(z_sys, z_unit, w_sys)        (3)
0 = f_surr,unit(z_sys, z_unit) = f_unit(z_sys, z_unit) + ε_unit

where ε_unit is the approximation error vector of the surrogate model. Such models have the following features. Unit-level surrogate models with few input variables (degrees of freedom) based on the unit


Fig. 1. Surrogate model levels in plant models.

structure can lead to high fidelity interpolative unit models. These surrogates can be combined with other first principle models to form a hybrid system model. Moreover, conservation laws hold at the plant level and these surrogates are reusable for new plant-level cases. On the other hand, extrapolating the surrogate model may violate first principle relations and conservation laws in the unit, and surrogate extrapolation errors may lead to convergence failures at the plant level. Also, plant-wide optimization with embedded unit surrogates requires mid-scale optimization solvers, which are more computationally expensive.

Finally, surrogate models describing first-principles relations within common process units, e.g., flash calculations and physical properties, represent the lowest level of abstraction, leaving the rest of the unit- and systems-level model equations in place. Eq. (2) becomes:

y = g(z_sys, z_unit, z_pp, w_sys)
0 = f_sys(z_sys, z_unit, z_pp, w_sys)        (4)
0 = f_unit(z_sys, z_unit, z_pp)
0 = f_surr,pp(z_pp, z_unit) = f_pp(z_pp, z_unit) + ε_pp

where ε_pp is the approximation error for the surrogate model. Rigorous first-principle or ''physics-based'' models remain only within the unit models, and the rest of the process is modeled with rigorous unit- and plant-level equations. These equations, along with the surrogates, form an equation-oriented model, which must be solved with a large-scale optimization solver. Surrogate subunit models generally have few input variables (degrees of freedom), leading to high fidelity interpolative models. Moreover, conservation laws hold at the unit and plant level, and these subunit surrogates are reusable for new plant-level and unit-level cases. On the other hand, the optimization problem using Eq. (4) must be solved with large-scale optimization solvers in order to evaluate plant-level and unit-level cases efficiently, and non-smooth features and ill-conditioning in these particular surrogate models can lead to convergence failures at both unit and plant levels. Ma et al. (2022) and Goldstein et al. (2022) explore the performance of surrogates at different modeling levels.

Moreover, Ma et al. (2022) also consider data-driven optimization formulations at the plant level, in order to minimize the energy cost of extractive distillation with truth models from an Aspen simulator. First, they apply surrogate-based optimization using ALAMO's generalized linear models (Wilson and Sahinidis, 2017). Second, they consider neural networks with the rectified linear unit (ReLU) activation function. Third, they use derivative-free optimization (DFO) to optimize problems directly using simulation results. In a detailed comparison, they show that ALAMO-based optimization performs well for smaller and simpler systems, while the ReLU network models perform better for more complex ones. On the other hand, at the plant level the most effective performance was obtained with DFO, using smooth penalty functions to handle constraints.

3. Process optimization using data-driven surrogate models

The following classes of surrogate models have been investigated for optimization applications.

• Regression based on polynomials: Developing polynomial regression models is common and has been extensively reviewed (Bhosekar and Ierapetritou, 2018). Some newer ideas in the process systems engineering literature adaptively select which polynomial regressors to use, e.g., ALAMO (Wilson and Sahinidis, 2017), and build a polynomial surrogate explicitly for global optimization, e.g., ARGONAUT (Boukouvala and Floudas, 2017). The ALAMO solver applies a mixed-integer linear formulation, linear least squares and modified Akaike or Bayesian information criteria. Moreover, these approaches can provide estimates of confidence regions, which can be used to assess model accuracy within an optimization framework.

• Regression trees: Mixed-integer programming formulations of gradient-boosted regression trees are from Mišić (2020) and Mistry et al. (2021). These formulations are available, for example, in the black-box optimizer ENTMOOT (Thebelt et al., 2021, 2022a) and the formulation tool OMLT (Ceccon et al., 2022). Related codes like reluMIP (Lueg et al., 2021) and OMLT (Ceccon et al., 2022) allow users to directly add data-driven surrogates to Pyomo (Bynum et al., 2021), an optimization modeling platform.

• Neural Networks: As mentioned in the previous section, care is needed to prevent overfitting with neural networks embedded within optimization formulations. Nevertheless, some of the earliest formulations for optimizing over neural networks are big-M mixed-integer programming formulations relevant to ReLU activation functions (Lomuscio and Maganti, 2017; Fischetti and Jo, 2018). Grimstad and Andersson (2019) considered tightening big-M parameters for optimization over neural networks with ReLU activation functions. Alternative mixed-integer formulations for ReLU activation functions include adding cuts representing the convex hull of a single neural network node (Anderson et al.,


2020) or a partition-based formulation that includes a subset of the convex hull constraints (Tsay et al., 2021).
Other mathematical programming formulations for ReLU activation functions include a semidefinite relaxation (Raghunathan et al., 2018) and a quadratic relaxation derived through applying the S-Lemma (Fazlyab et al., 2019). Moreover, for optimization with ReLU activation functions of the form r = max(0, x), Yang et al. (2021) also apply convex hull constraints to prevent extrapolation of neural networks and consider three problem formulations: (i) embedded ReLU, where r = a/(1 + e^(−ax)) ≈ max(0, x) is smoothed for the nonlinear optimization solver; (ii) binary variables to handle the max functions within a mixed-integer linear strategy; (iii) complementarity formulations with r = x + y, 0 ≤ y ⟂ x + y ≥ 0, using basic complementarity formulations within the nonlinear optimization problem. While the second and third options both converge robustly, faster performance was observed with the third approach.
Different types of neural network activation functions have also been considered, e.g., binarized neural networks (Khalil et al., 2018), as well as a reduced space formulation for nonlinear smooth activation functions (Schweidtmann and Mitsos, 2019).

• Gaussian processes: Optimization over Gaussian processes can be managed in many different ways. First, Gaussian processes are a natural fit for robust optimization strategies (Bertsimas et al., 2010a,b; Bogunovic et al., 2018; Wiebe and Misener, 2021; Wiebe et al., 2022). Of course, there are infinitely many possible functions in a Gaussian process. Depending on the application, we can either optimize over the mean while integrating some notion of uncertainty or use pathwise conditioning to sample from the Gaussian process posterior (Wilson et al., 2020). Schweidtmann et al. (2021a) have also developed a reduced space formulation for global optimization.

3.1. Selecting a data-driven surrogate based on its optimization properties

The preceding discussion develops optimization models for data-driven surrogates. But we often wish to focus first on the needs of an application and then choose a corresponding surrogate. One example of a consideration for selecting a data-driven surrogate based on its optimization properties is making sure that the constraints associated with a data-driven surrogate somehow match the constraints or objective of the larger decision-making problem. For example, if the larger decision-making problem is a discrete, unit-commitment model, it may make sense to use a surrogate such as a binarized neural network (Khalil et al., 2018), which admits a discrete optimization formulation.

Early foundational work initializing this line of inquiry explored how individual surrogates may fit into larger decision-making problems (Palmer and Realff, 2002; Caballero and Grossmann, 2008; Henao and Maravelias, 2011). Other ideas in this area include: developing a decision tree with desired properties (Bertsimas and Dunn, 2017), selecting ReLU neural networks to expand the applicability of multiparametric programming (Katz et al., 2020), and developing a neural network with desired properties (Tsay, 2021). As mentioned in the example with a unit-commitment problem and a discrete formulation, one major consideration in selecting a data-driven surrogate based on its optimization properties is whether the data-driven surrogate will be discrete or not. Optimization models incorporating decision trees (Mistry et al., 2021), neural networks with ReLU activation functions (Anderson et al., 2020), and graph neural networks (Zhang et al., 2023) will typically create binary decisions, although there has been significant work developing continuous relaxations or reformulations of neural networks with ReLU activation functions (Raghunathan et al., 2018; Fazlyab et al., 2019; Yang et al., 2021). Other possible considerations involve selecting surrogates with convex nonlinearities or surrogates with a built-in notion of uncertainty (Thebelt et al., 2022b).

There has been significant work solving optimization problems based on hybrid data-driven/mechanistic models and the consequences of these algorithms for surrogate models (Eason and Biegler, 2018; Bajaj et al., 2018; Kim and Boukouvala, 2020). The next two subsections further develop these ideas. For optimization, all models are imperfect, but some are useful. As a result, surrogate models ranging from modified first principles to shortcut models to data-driven models have been applied in the context of optimization studies.

3.2. Strategies for surrogate-based optimization

Derivative-free optimization methods are generally formulated for unconstrained optimization problems, i.e., min_{x ∈ R^n} f(x). These methods are either stochastic or deterministic in nature. The former are based on opportunistic sampling and selection algorithms, which converge asymptotically but offer no guarantees for a finite number of samples. On the other hand, deterministic methods are based on generalized pattern searches, which adapt themselves to the response surface and often provide guarantees of convergence to local optimality of the truth model. These include the DFO, NOMAD and DIRECT solvers; see Conn et al. (2009).

For the development of constrained optimization formulations with data-driven models, optimization strategies are applied with embedded surrogate models, which substitute for high-fidelity (or ''truth'') models; these are widely used in process engineering. Here, the high-fidelity model is replaced over the entire optimization space with a surrogate such as a polynomial, neural network or Kriging model (Bhosekar and Ierapetritou, 2018). While this approach no longer requires additional evaluation of the high-fidelity model once the surrogate model is established, it is likely that the optimization will lead to extrapolation of the surrogate model, and these extrapolation errors can lead to convergence failures, or termination at a point that is not the optimum of the high-fidelity model. Consequently, it is challenging to maintain the accuracy of the surrogate model over the entire optimization space.

Often, it is important to quantify the uncertainty of surrogate models and integrate this uncertainty into the optimization problem. Our case studies present two ways of doing this. In the surrogate-based heat exchanger network synthesis example, the detailed exchanger design is regularly queried. Effectively, this is a way of consistently testing the veracity of the surrogate model. In the drill scheduling example, this truth model is not available, so we use the modeled uncertainty from the Gaussian process to try to mitigate against uncertainty. For more information about how surrogate models incorporate uncertainty, see Bhosekar and Ierapetritou (2018).

3.3. Conditions where the optimum of the surrogate model corresponds to the optimum of the truth model

For global optimization there are a number of DFO methods with convergence guarantees ''in the limit'', e.g., see Huyer and Neumaier (2008). However, unlike conventional methods based on spatial branch and bound search, they do not provide lower bounds and certificates for global solutions.

On the other hand, for local optimization strategies, this challenge can be addressed through locally approximated surrogate models that are updated with recourse to the truth model, t(w) in (1), as part of the optimization strategy. Building on the unconstrained approaches of Fahl and Sachs (2003) and Conn et al. (2009), Eason and Biegler (2018) developed the trust region filter (TRF) method for constrained optimization, which samples from high-fidelity models that have smooth input-output properties and surrogate models that satisfy Eq. (1). Here we consider the truth model optimization problem given by:

min f(x)  s.t.  c(x) = 0,  g(x) ≤ 0,  y − t(w) = 0        (5)


Fig. 2. Flowchart for the trust region filter (TRF) algorithm. The algorithm simplifies, and the dotted steps can be removed, when ∇t(w_k) and FOC terms are available.

where x = (w, y, z); w and y are the truth model input and output variables, respectively; z are the remaining variables; and c(x) = 0, g(x) ≤ 0 are the remaining (equation-oriented) constraints. To solve (5) we first represent the truth model with a suitable local surrogate y − r_k(w) = 0, which may be updated through truth model sample updates at iteration k. Since r_k(w) is only a local approximation, we add a trust region constraint and form the trust region subproblem (TRSP):

min f(x)  s.t.  c(x) = 0,  g(x) ≤ 0,  y − r_k(w) = 0,  ‖x − x_k‖ ≤ Δ_k        (6)

with a solution given by x* = x_k + s_k. In addition, if derivatives are available from the truth model, then the surrogate r_k(w) can be redefined by a global surrogate model r̃(w) plus zero-order (ZOC) and first-order correction (FOC) terms, as follows:

r_k(w) = r̃(w) + (t(w_k) − r̃(w_k)) + (∇t(w_k) − ∇r̃(w_k))^T (w − w_k)        (7)

As seen in Fig. 2, the TRF method repeatedly solves TRSP (6) with local surrogate models under trust region constraints, along with stabilizing filter methods that shrink or enlarge the trust region to promote convergence.

The TRF method for surrogate-based optimization has rigorous guarantees of convergence to locally optimal solutions for the truth model. These are based on the DFO properties in Conn et al. (2009) and apply to any surrogate model r_k(w) that satisfies the κ-fully linear property (1). We note that convergence to the optimum of the truth model problem (5) requires lim_{k→∞} Δ_k → 0 if ∇t(w_k) is unavailable. On the other hand, when derivatives of the high-fidelity model, ∇t(w), are available, then the κ-fully linear condition (1) always holds for any smooth r̃(w), and faster convergence occurs without requiring Δ_k → 0. Additional details and convergence properties of this algorithm can be found in Eason and Biegler (2016) and Yoshio and Biegler (2020).

The TRF method has been applied to a number of surrogate-based optimization case studies where direct optimization of the truth models is prohibitive. These include air-fired and oxycombustion power plants (Dowling et al., 2016) and surrogate equations of state and MWD models for polymerization (Eason et al., 2018; Kang et al., 2019). Moreover, approaches with FOC have been used to surrogate large models that are otherwise computationally expensive, such as aerodynamics and pressure swing optimization (Alexandrov et al., 1998; Agarwal and Biegler, 2013), real-time optimization of petroleum refineries (Chen et al., 2021), and optimization of benzene chlorination processes (Yoshio and Biegler, 2020). Performance of the TRF method depends on the accuracy of the surrogate model, and sampling of the truth model is required as TRF proceeds. On the other hand, this approach is usually easier to set up, initialize and solve than a large-scale equation-based optimization problem.

Finally, we note that the TRF approach extends to problems that would be intractable without surrogate-based optimization. In the next section we present a case study on heat exchanger network synthesis (HENS) with detailed exchanger models from Kazi et al. (2021).

4. Surrogate-led case study: Surrogate-based HENS

The Heat Exchanger Network Synthesis (HENS) problem has been researched in several hundred publications for over 75 years. However, there are very few HENS studies that include detailed equipment designs within the synthesis task. HENS approaches can be generally divided into two categories. The first deals with pinch-based targeting and design approaches, pioneered by Linnhoff and Hindmarsh (1983), which lead to well-known engineering workflows. The second category is devoted to optimization approaches that simultaneously consider the trade-offs of energy minimization, operating costs and capital costs. The most widespread representation of the simultaneous, mathematical programming network synthesis problem is the stage-wise superstructure (SWS) formulation (Yee and Grossmann, 1990), which is solved as a nonconvex MINLP. On the other hand, many literature studies attest that large integrated optimization models are difficult to solve, particularly when multiple exchangers are considered. These difficulties are exacerbated when stream splitting, intermediate temperatures, variable heat transfer coefficients and multiple shells are considered. To alleviate these difficulties, Kazi et al. (2021) applied the TRF approach and considered detailed PDE-based heat exchangers as truth models, with surrogates provided by multi-pass LMTD-based models. These surrogates, along with zero- and first-order corrections, were embedded within the TRF framework. The PDE-based models, described in Kazi et al. (2020a) as finite element models (see Fig. 3), can model heat exchangers with temperature dependent physical properties, phase changes, multiple passes and shells, baffles
R. Misener and L. Biegler Computers and Chemical Engineering 179 (2023) 108411

Fig. 3. Detailed heat exchanger models for HENS.

and other structural details. The HENS solution strategy then combines the SWS MINLP model with an NLP subproblem that incorporates detailed heat exchanger models along with the TRF algorithm.

4.1. Surrogate-based optimization problem for HENS

The HENS optimization model is divided into three parts: (1) the stage-wise superstructure (SWS) MINLP model, (2) the non-isothermal mixing model within the TRSP (6) problem, and (3) the PDE-based exchanger ''truth'' model.

4.1.1. MINLP model

The MINLP model used in this study is based on the SWS representation given by Yee and Grossmann (1990). The superstructure model has 𝑁 + 2 stages, where 𝑁 = max(𝑁ℎ, 𝑁𝑐) and 𝑁ℎ, 𝑁𝑐 denote the number of hot and cold streams, respectively. The first and last stages are used to represent feasible connections between process streams and utility streams. Each possible assignment between one hot and one cold stream (including utility streams) is represented by a binary variable, and only single hot and cold utilities are considered in this study. Energy balances for individual streams are written at the stage boundaries with the assumption of isothermal mixing of split streams. Heat exchanger areas are calculated using the LMTD approximation (Kazi et al., 2020c) with constant values for the overall heat transfer coefficient 𝑈. Pressure drops across each individual exchanger and the number of shells are also considered at the MINLP level.

The objective function consists of minimizing the total annualized cost (TAC):

    𝑇𝐴𝐶 = Exchanger Area Cost (Fixed Cost × Number of Shells + Variable Cost × Area)
        + Pumping Cost (Pumping Cost per kPa × Pressure Drop (kPa))
        + Utility Costs (Steam Cost × Hot Utility + Cooling Water Cost × Cold Utility).

At the MINLP level the heat exchanger areas are based on the LMTD equation with a correction factor 𝐹𝑡 = 1, which corresponds to ideal counter-current flow. The number of shell passes is assumed small, and the pressure drops are fixed to nominal values lower than would be expected from the detailed design. All of these parameter choices ensure that the MINLP will underestimate the true objective value and serve as a valid lower bound to the overall problem.

4.1.2. Trust region subproblem

The TRSP (6) comprises the network model that includes stream splitting and stream bypasses for a given topology. Here the binary variables from the MINLP solution are fixed variables in the NLP, and the flowrates and inlet-outlet temperatures are defined as free variables. The model constraints include:

• Non-isothermal bilinear constraints for mixing streams and intermediate temperatures, which are selected from the MINLP solution at each stage,
• Bypass constraints, which reduce individual pressure drops and increase area costs,
• Reduced models based on shortcut models, supplemented by zero-order and first-order corrections from the truth models,
• Trust region constraints that ensure the solution of the NLP, 𝑥∗, remains close enough to 𝑧𝑘 to provide a valid reduced model.

The resulting TRSP model is then solved using the TRF approach described in Section 3.3 and Fig. 2.

4.1.3. Detailed exchanger design

For this study, the detailed design of each exchanger forms the ''truth'' model derived from first-principles partial differential (PDE) heat equations. As detailed in Kazi et al. (2020b), the truth model is more accurate than the LMTD-based model, especially for multiple-shell heat exchangers, and it is derived using fewer assumptions. The model is also easily extended to cases where physical properties vary within the exchanger and for phase changes. The PDE system is discretized using geometric design variables (tube diameters, number of baffles and tube passes). These can be determined quickly by evaluating an LMTD-based model with different sets of design configurations and enumerated discrete variable values. The discretized PDE system is solved as an NLP optimization problem with the tube length as the degree of freedom. Modeling the PDE design as a separate NLP allows us to extract the parametric sensitivity of the solution with respect to the mass flow rates and terminal temperatures. These sensitivities are used to determine the FOC terms in (7) for the surrogate models in the TRSP (6). The details of this NLP model for heat exchanger design are given in Kazi et al. (2020b).

4.2. Optimization strategy

The overall HENS optimization strategy is presented in Fig. 4 and described in more detail in Kazi et al. (2021). Once a topology (i.e., stream matches) is obtained from an MINLP solution (solved with BARON; Sahinidis, 1996), a nonconvex NLP subproblem (solved with IPOPT 3.12; Wächter and Biegler, 2006) is considered with fixed matches, non-isothermal mixing, and detailed ''truth'' heat exchanger models. From this, the TRSP problem (6) is formed, and the TRF algorithm further samples the truth models, updates the ZOC and FOC terms, and manages the size of the trust region in (6). Once a detailed design is obtained via the TRF algorithm, this solution serves as the upper bound to the overall MINLP problem.

At this point, an integer cut is added to avoid a repeated topology and the MINLP is re-solved. If the objective value (new lower bound) is above the current upper bound, we declare the current upper-bound solution as the optimum. Otherwise, a new upper bound is obtained from the NLP subproblem and the process continues. Further details of this TRF-based optimization strategy can be found in Kazi et al. (2021).
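As a concrete illustration of the LMTD shortcut model used at the MINLP level, the sketch below computes an exchanger area from a duty and the two terminal temperature differences. The numbers are illustrative placeholders, not values from the case study.

```python
import math

def lmtd(dT1: float, dT2: float) -> float:
    """Log-mean temperature difference between the two exchanger ends."""
    if abs(dT1 - dT2) < 1e-9:
        return dT1  # limiting case: dT1 == dT2
    return (dT1 - dT2) / math.log(dT1 / dT2)

def exchanger_area(Q: float, U: float, dT1: float, dT2: float,
                   Ft: float = 1.0) -> float:
    """Shortcut area A = Q / (U * Ft * LMTD); Ft = 1 is ideal counter-current flow."""
    return Q / (U * Ft * lmtd(dT1, dT2))

# Hypothetical match: 500 kW duty, U = 0.8 kW/(m2 K), terminal
# temperature differences of 40 K and 20 K.
area = exchanger_area(Q=500.0, U=0.8, dT1=40.0, dT2=20.0)  # ~21.7 m2
```

With 𝐹𝑡 fixed at 1 and constant 𝑈, this shortcut underestimates the area of multi-shell exchangers, which is why the MINLP solution serves only as a lower bound on the overall problem.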


Fig. 4. The overall algorithm used in this study.
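The outer MINLP–NLP alternation in Fig. 4 can be sketched schematically as follows. Every function name here is a hypothetical placeholder standing in for the solvers described in Section 4.2, not the authors' implementation.

```python
def hens_outer_loop(solve_minlp, solve_trf_nlp, max_cycles=20):
    """Alternate between topology selection (MINLP, lower bound) and
    detailed TRF-based design (NLP, upper bound), with integer cuts
    excluding already-visited topologies."""
    upper_bound, best_design = float("inf"), None
    integer_cuts = []
    for _ in range(max_cycles):
        topology, lower_bound = solve_minlp(integer_cuts)
        if lower_bound >= upper_bound:
            break  # no unvisited topology can improve on the incumbent
        cost, design = solve_trf_nlp(topology)  # TRF samples the truth model
        if cost < upper_bound:
            upper_bound, best_design = cost, design
        integer_cuts.append(topology)  # forbid revisiting this topology
    return upper_bound, best_design
```

The lower bound is valid because the MINLP's shortcut models underestimate the true cost; the TRF-based NLP supplies feasible detailed designs, hence upper bounds.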

4.3. HENS case study

The case study example has 7 hot process streams, 3 cold process streams, and single hot and cold utilities, with stream data taken from Mizutani et al. (2003). This network is known to contain relatively large exchangers, where multiple shells are common. Nevertheless, the algorithm terminates after only 2 MINLP cycles; the trust region algorithm terminates in 60 TRSP iterations and the overall algorithm converges in 1.7 CPU h. The times taken in previous HEN optimization studies (4.5 CPU h in Xiao et al. (2019) and over 7 CPU h in Kazi et al. (2020c)) show that the TRF-based algorithm obtains optimal solutions in significantly less time. The optimal HEN with bypass streams is presented in Fig. 5 and compared with two previous studies. Despite the larger number of shells and higher pumping requirements in the network, we note that it still obtains a lower TAC than the previous studies, which use single-shell and single-pass exchangers.

Table 1
Summary of HENS case study and comparison with other studies.

                           TRF/MINLP    Mizutani et al.    Short et al.
  Total annual cost ($/a)  3,764,984    5,183,221          4,203,057
  Utility costs ($/a)      3,497,138    5,154,291          4,091,975
  Area costs ($/a)         60,874       11,123             42,981.97
  Pumping costs ($/a)      163,972      4,807              46,099
  Fixed costs ($/a)        43,000       13,000             22,000
  Number of matches        10           8                  8
  Number of exchangers     43           13                 22

As shown in Table 1, the TRF method shows considerable improvement over previous studies because it directly integrates the crucial problem elements: network matches, utility minimization, and detailed equipment modeling. In comparison to previous studies, the crucial difference lies in the lower utility costs, as shown in Table 1. In contrast to the method of Short et al. (2016), we solve the NLP subproblem using Eq. (6) to update the models.

This HENS strategy employs the TRF approach to simultaneously optimize the split flows, pressure drops, areas, number of shells and utilities for the given topology. In turn, the TRF algorithm manages the updates and accuracy of the reduced models with only a few function and gradient values supplied by the truth model. Moreover, this approach ensures that the NLPs with fixed topology (shown in the dashed box in Fig. 4) are guaranteed to converge to the solution of the NLPs with finite element PDE (truth) models. Since (7) always satisfies the 𝜅-fully linear condition (1), convergence of the TRF algorithm ensures that the verification problem is satisfied for these surrogate-based NLPs. Consequently, the optimal solutions with reduced heat exchanger models correspond to the same optimal solutions as with the truth models. As a result, even though global MINLP solutions cannot be guaranteed for the nonconvex MINLP in Fig. 4, the embedded TRF-based NLP solutions still provide high-quality upper bounds that lead to improved HENs that are faithful to PDE-based heat exchanger models.
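The correction terms in (7) enforce first-order consistency: after a zero-order (value) and a first-order (gradient) correction, the surrogate matches the truth model exactly at the current iterate 𝑤𝑘, which is what makes the 𝜅-fully linear condition attainable (cf. Eason and Biegler, 2016). A one-variable numerical sketch, with toy stand-ins for the truth and shortcut models (not the paper's exchanger models):

```python
def corrected_surrogate(r, grad_r, t, grad_t, wk):
    """Return w -> r(w) + ZOC + FOC*(w - wk), so that the corrected
    surrogate matches t and its derivative at wk."""
    zoc = t(wk) - r(wk)            # zero-order (value) correction
    foc = grad_t(wk) - grad_r(wk)  # first-order (gradient) correction
    return lambda w: r(w) + zoc + foc * (w - wk)

t = lambda w: w**3           # toy "truth" model
grad_t = lambda w: 3 * w**2
r = lambda w: w**2           # toy cheap shortcut model
grad_r = lambda w: 2 * w

wk = 2.0
rt = corrected_surrogate(r, grad_r, t, grad_t, wk)
# rt(wk) == t(wk) and rt'(wk) == t'(wk), while rt stays cheap away from wk.
```

In the HENS setting, the gradient information for the FOC term comes from the parametric sensitivities of the detailed exchanger NLPs described in Section 4.1.3.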


Fig. 5. HEN case study solution with bypasses.

5. Math programming-led case study: Drill scheduling

Like HENS, drill scheduling is well-studied (Detournay et al., 2008; Ba et al., 2016). The objective in drilling oil or geothermal wells
is typically to minimize total well completion time by choosing the
rotational speed 𝑁̇ ∈ R and the weight on bit 𝑊̇ ∈ R as a function
of rock type and depth 𝑥 ∈ R below the surface. Fig. 6 diagrams an
example where the rock changes from type 1 to type 2 at 𝑥1 , mainte-
nance is required at 𝑥2 , and the drilling has to proceed until reaching
𝑥3 . The total completion time of the well depends on any required
maintenance, and nonlinearly on the drill motor power curves and the
interaction between the drill bit and the rock (Wiebe et al., 2022).
The challenge centers on considering predicted equipment degradation
in a scheduling application. The need to drill through several time
periods effectively means that the aggregated black-box constraint sums
several draws from the same black-box function, with different decision
variables as arguments in each individual black-box term.
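To make the aggregation concrete: if each per-period degradation term were modeled by an ordinary (unwarped) Gaussian process, the chance constraint P(∑𝑖 𝑎̃𝑖 𝑥𝑖 ≤ 𝑏) ≥ 1 − 𝜀 has the classical deterministic reformulation 𝜇ᵀ𝑥 + 𝛷⁻¹(1 − 𝜀)√(𝑥ᵀ𝛴𝑥) ≤ 𝑏. The sketch below uses hypothetical posterior values; the warped-GP case treated by Wiebe et al. (2022) does not admit this exact form and motivates their robust approximation.

```python
import math
from statistics import NormalDist

def chance_constraint_lhs(x, mu, cov, eps):
    """Deterministic left-hand side of P(sum_i a_i*x_i <= b) >= 1 - eps
    when the a_i are jointly Gaussian with mean mu and covariance cov."""
    n = len(x)
    mean = sum(mu[i] * x[i] for i in range(n))
    var = sum(x[i] * cov[i][j] * x[j] for i in range(n) for j in range(n))
    z = NormalDist().inv_cdf(1.0 - eps)  # safety factor, e.g. ~1.645 for eps = 0.05
    return mean + z * math.sqrt(var)

# Hypothetical GP posterior over three drilling periods (illustrative numbers).
mu = [1.0, 1.2, 0.9]                  # posterior mean degradation per period
cov = [[0.04, 0.00, 0.00],            # posterior covariance (diagonal here)
       [0.00, 0.09, 0.00],
       [0.00, 0.00, 0.01]]
x = [1.0, 1.0, 1.0]                   # time spent in each period
lhs = chance_constraint_lhs(x, mu, cov, eps=0.05)
# Constraint holds iff lhs <= b, the motor's degradation budget.
```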
Fig. 6. An example of drill scheduling with two rock types.
Source: Taken from Wiebe et al. (2022).

For this application, there is no underlying truth model other than the expensive, time-consuming experiment of actually doing the drilling. Unlike the HENS case study, there is no underlying PDE because the rock conditions are highly uncertain and difficult to quantify. As a result, verification that the data-driven model converges to the truth optimum cannot be considered here. Instead, the overall challenge is that this kind of optimization problem ends up aggregating and solving with black-box functions, i.e.,

    ∑𝑖 𝑎̃𝑖 𝑥𝑖 ≤ 𝑏,   (8a)
    𝑎̃𝑖 = 𝑔(𝑦⃗𝑖),   (8b)

where 𝑥𝑖 is a decision variable and 𝑎̃𝑖 depends on a vector of decision variables 𝑦⃗𝑖 ∈ R𝑘 through a black-box function 𝑔(⋅). In the drilling example, the rate of motor degradation is a black-box function and the motor degradation aggregates over time. Wiebe et al. (2022) give


the complete model, but the essential idea is that this black-box model should be somehow amenable to either reformulating or approximating a chance constraint which bounds the probability of failure for the motor. If the motor degradation is well-represented by a standard Gaussian process, then the chance constraint can be reformulated into a deterministic constraint. If the motor degradation is better represented by a warped Gaussian process (Snelson et al., 2003), then Wiebe et al. (2022) develop a robust approximation. The advantage of this mathematical programming-led approach in this instance is that there are many possible surrogate models for purely data-driven quantities like the model of motor degradation. By using a surrogate model like warped Gaussian processes that directly incorporate uncertainty, we can either develop an exact reformulation of the chance constraint or a good robust approximation, depending on the appropriate model.

6. Conclusions

This paper offers a concise review of the formulation and implementation of data-driven surrogate models for process optimization strategies and applications. First, we note that data-driven models must avoid large extrapolation errors and overfitting in order to mitigate surrogate model instability and ill-conditioning. Next, recent advances in the development of data-driven models are briefly reviewed to describe polynomial regression and Kriging, regression trees, neural networks and Gaussian processes.

We also note that the proper selection of the surrogate and its level of implementation within process systems plays a key role, and trade-offs between construction costs, computational complexity, reusability and robustness must be considered carefully. Moreover, we review several current optimization approaches including DFO, gradient-based local optimization, global strategies based on spatial branch and bound, and stochastic optimization. For all of these approaches, interaction with data sampling and (potentially expensive) truth models is needed to avoid extrapolation errors. In particular, with the trust region filter method, convergence to local optima can be guaranteed under mild conditions for the surrogate model.

Future research directions include the development, analysis and evaluation of accelerated algorithms with much less recourse to truth model information. Moreover, the treatment of residual errors from the construction of surrogate regression models needs to be analyzed within an optimization framework, in order to provide an efficient, rigorous bound on the optimum of the ''truth'' model, or the real-world plant. Finally, we note that there are still many open questions related to machine learning algorithms, data-driven models, and how they interact with mathematical programming strategies and first-principles process models. Much exciting research is still to be explored!

CRediT authorship contribution statement

Ruth Misener: Conceptualization, Investigation, Writing – original draft, Writing – review & editing. Lorenz Biegler: Conceptualization, Investigation, Writing – original draft, Writing – review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

No data was used for the research described in the article.

Acknowledgments & disclosure of funding

RM gratefully acknowledges support from a BASF/Royal Academy of Engineering Research Chair and EPSRC, United Kingdom EP/W003317/1. LB gratefully acknowledges support from the Covestro Chair at Carnegie Mellon University, United States.

References

Agarwal, A., Biegler, L.T., 2013. A trust-region framework for constrained optimization using reduced order modeling. Optim. Eng. 14 (1), 3–36.
Alexandrov, N.M., Dennis, J.E., Lewis, R.M., Torczon, V., 1998. A trust-region framework for managing the use of approximation models in optimization. Struct. Optim. 15 (1), 16–23.
Anderson, R., Huchette, J., Ma, W., Tjandraatmadja, C., Vielma, J.P., 2020. Strong mixed-integer programming formulations for trained neural networks. Math. Prog. 183, 3–39.
Ba, S., Pushkarev, M., Kolyshkin, A., Song, L., Yin, L.L., 2016. Positive displacement motor modeling: skyrocketing the way we design, select, and operate mud motors. In: Abu Dhabi International Petroleum Exhibition & Conference. OnePetro.
Bajaj, I., Iyer, S.S., Hasan, M.F., 2018. A trust region-based two phase algorithm for constrained black-box and grey-box optimization with infeasible initial point. Comput. Chem. Eng. 116, 306–321.
Belkin, M., 2021. Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation. Acta Numer. 30, 203–248.
Bertsimas, D., Dunn, J., 2017. Optimal classification trees. Mach. Learn. 106 (7), 1039–1082.
Bertsimas, D., Nohadani, O., Teo, K.M., 2010a. Nonconvex robust optimization for problems with constraints. INFORMS J. Comput. 22 (1), 44–58.
Bertsimas, D., Nohadani, O., Teo, K.M., 2010b. Robust optimization for unconstrained simulation-based problems. Oper. Res. 58 (1), 161–178.
Bhosekar, A., Ierapetritou, M., 2018. Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Comput. Chem. Eng. 108, 250–267.
Bogunovic, I., Scarlett, J., Jegelka, S., Cevher, V., 2018. Adversarially robust optimization with Gaussian processes. In: NeurIPS, Vol. 31.
Boukouvala, F., Floudas, C.A., 2017. ARGONAUT: AlgoRithms for global optimization of coNstrAined grey-box computational problems. Optim. Lett. 11 (5), 895–913.
Boukouvala, F., Misener, R., Floudas, C.A., 2016. Global optimization advances in Mixed-Integer Nonlinear Programming, MINLP, and Constrained Derivative-Free Optimization, CDFO. European J. Oper. Res. 252 (3), 701–727.
Bynum, M.L., Hackebeil, G.A., Hart, W.E., Laird, C.D., Nicholson, B.L., Siirola, J.D., Watson, J.-P., Woodruff, D.L., 2021. Pyomo – Optimization Modeling in Python, Vol. 67. Springer Nature.
Caballero, J.A., Grossmann, I.E., 2008. An algorithm for the use of surrogate models in modular flowsheet optimization. AIChE J. 54 (10), 2633–2650.
Ceccon, F., Jalving, J., Haddad, J., Thebelt, A., Tsay, C., Laird, C.D., Misener, R., 2022. OMLT: Optimization & machine learning toolkit. J. Mach. Learn. Res. 23 (1), 15829–15836.
Chen, X., Wu, K., Bai, A., Masuku, C.M., Niederberger, J., Liporace, F.S., Biegler, L.T., 2021. Real-time refinery optimization with reduced-order fluidized catalytic cracker model and surrogate-based trust region filter method. Comput. Chem. Eng. 153, 107455.
Conn, A.R., Scheinberg, K., Vicente, L.N., 2009. Introduction to Derivative-Free Optimization. SIAM, Philadelphia, PA.
Detournay, E., Richard, T., Shepherd, M., 2008. Drilling response of drag bits: Theory and experiment. Int. J. Rock Mech. Min. Sci. 45 (8), 1347–1360.
Djeumou, F., Neary, C., Goubault, E., Putot, S., Topcu, U., 2022. Neural networks with physics-informed architectures and constraints for dynamical systems modeling. Proc. Mach. Learn. Res. 168, 1–15.
Dowling, A.W., Eason, J.P., Ma, J., Miller, D.C., Biegler, L.T., 2016. Equation-based design, integration, and optimization of oxycombustion power systems. In: Martin, M. (Ed.), Alternative Energy Sources and Technologies. Springer, Switzerland, pp. 119–158.
Eason, J.P., Biegler, L.T., 2016. A trust region filter method for glass box/black box optimization. AIChE J. 62 (9), 3124–3136.
Eason, J.P., Biegler, L.T., 2018. Advanced trust region optimization strategies for glass box/black box models. AIChE J. 64 (11), 3934–3943.
Eason, J.P., Kang, J.-Y., Chen, X., Biegler, L.T., 2018. Surrogate equations of state for equation-oriented optimization of polymerization processes. 44, 781–786.
Fahl, M., Sachs, E.W., 2003. Reduced-order modelling approaches to PDE-constrained optimization based on proper orthogonal decomposition. In: Large-Scale PDE-Constrained Optimization. Springer, Heidelberg, Germany, pp. 268–280.
Fazlyab, M., Morari, M., Pappas, G.J., 2019. Probabilistic verification and reachability analysis of neural networks via semidefinite programming. In: 2019 IEEE 58th CDC Conference. pp. 2726–2731.
Fischetti, M., Jo, J., 2018. Deep neural networks and mixed integer linear optimization. Constraints 23 (3), 296.
Goldstein, D., Heyer, M., Jakobs, D., Schultz, E.S., Biegler, L.T., 2022. Multilevel surrogate modeling of an amine scrubbing process. AIChE J. 68 (6), e17705.


Grimstad, B., Andersson, H., 2019. ReLU networks as surrogate models in mixed-integer linear programs. Comput. Chem. Eng. 131, 106580.
Henao, C.A., Maravelias, C.T., 2011. Surrogate-based superstructure optimization framework. AIChE J. 57 (5), 1216–1232.
Huyer, W., Neumaier, A., 2008. SNOBFIT – stable noisy optimization by branch and fit. ACM Trans. Math. Softw. 35 (2).
Kang, J., Shao, Z., Chen, X., Biegler, L.T., 2019. Reduced order models for dynamic molecular weight distribution in polymerization processes. Comput. Chem. Eng. 126, 280–291.
Katz, J., Pappas, I., Avraamidou, S., Pistikopoulos, E.N., 2020. Integrating deep learning models and multiparametric programming. Comput. Chem. Eng. 136, 106801.
Kazi, S., Short, M., Biegler, L.T., 2020a. Heat exchanger network optimization including detailed heat exchanger models using trust region methods. Comput. Aided Chem. Eng. 48, 1147–1152.
Kazi, S.R., Short, M., Biegler, L.T., 2020b. Heat exchanger network synthesis with detailed exchanger designs – 1. A discretized differential algebraic equation (DAE) model for shell and tube heat exchanger design. e17056.
Kazi, S.R., Short, M., Biegler, L.T., 2020c. Heat exchanger network synthesis with detailed exchanger designs – 2. Hybrid optimization strategy for synthesis of heat exchanger networks with detailed individual heat exchanger designs. e17057.
Kazi, S.R., Short, M., Biegler, L.T., 2021. Synthesis of combined heat and mass exchange networks via a trust region filter optimization algorithm including detailed unit designs. Comput. Aided Chem. Eng. 50 (1), 3–18.
Khalil, E.B., Gupta, A., Dilkina, B., 2018. Combinatorial attacks on binarized neural networks. In: International Conference on Learning Representations.
Kim, S.H., Boukouvala, F., 2020. Surrogate-based optimization for mixed-integer nonlinear problems. Comput. Chem. Eng. 140, 106847.
Linnhoff, B., Hindmarsh, E., 1983. The pinch design method for heat exchanger networks. Chem. Eng. Sci. 38 (5), 745–763.
Lomuscio, A., Maganti, L., 2017. An approach to reachability analysis for feed-forward ReLU neural networks. arXiv preprint arXiv:1706.07351.
Lueg, L., Grimstad, B., Mitsos, A., Schweidtmann, A.M., 2021. reluMIP: Open source tool for MILP optimization of ReLU neural networks. https://github.com/ChemEngAI/ReLU_ANN_MILP. 1.0.0.
Ma, K., Sahinidis, N.V., Bindlish, R., Bury, S.J., Haghpanah, R., Rajagopalan, S., 2022. Data-driven strategies for extractive distillation unit optimization. Comput. Chem. Eng. 167, 107970.
McBride, K., Sundmacher, K., 2019. Overview of surrogate modeling in chemical process engineering. Chem. Ing. Tech. 91 (3), 228–239.
Mišić, V.V., 2020. Optimization of tree ensembles. Oper. Res. 68 (5), 1605–1624.
Mistry, M., Letsios, D., Krennrich, G., Lee, R.M., Misener, R., 2021. Mixed-integer convex nonlinear optimization with gradient-boosted trees embedded. INFORMS J. Comput. 33 (3), 1103–1119.
Mizutani, F.T., Pessoa, F.L., Queiroz, E.M., Hauan, S., Grossmann, I.E., 2003. Mathematical programming model for heat-exchanger network synthesis including detailed heat-exchanger designs. 2. Network synthesis. Ind. Eng. Chem. Res. 42, 4019–4027.
Ning, C., You, F., 2019. Optimization under uncertainty in the era of big data and deep learning: When machine learning meets mathematical programming. Comput. Chem. Eng. 125, 434–448.
Palmer, K., Realff, M., 2002. Metamodeling approach to optimization of steady-state flowsheet simulations: Model generation. Chem. Eng. Res. Des. 80 (7), 760–772.
Pistikopoulos, E.N., Barbosa-Povoa, A., Lee, J.H., Misener, R., Mitsos, A., Reklaitis, G.V., Venkatasubramanian, V., You, F., Gani, R., 2021. Process systems engineering – the generation next? Comput. Chem. Eng. 147, 107252.
Qin, S.J., Chiang, L.H., 2019. Advances and opportunities in machine learning for process data analytics. Comput. Chem. Eng. 126, 465–473.
Raghunathan, A., Steinhardt, J., Liang, P.S., 2018. Semidefinite relaxations for certifying robustness to adversarial examples. In: NeurIPS, Vol. 31. pp. 10877–10887.
Rios, L.M., Sahinidis, N.V., 2013. Derivative-free optimization: a review of algorithms and comparison of software implementations. J. Global Optim. 56 (3), 1247–1293.
Sahinidis, N.V., 1996. BARON: A general purpose global optimization software package. J. Global Optim. 8, 201–205.
Sargent, R., 1972. Forecasts and trends in systems engineering. Chem. Eng. 262, 226–230.
Schweidtmann, A.M., Bongartz, D., Grothe, D., Kerkenhoff, T., Lin, X., Najman, J., Mitsos, A., 2021a. Deterministic global optimization with Gaussian processes embedded. Math. Prog. Comput. 13 (3), 553–581.
Schweidtmann, A.M., Esche, E., Fischer, A., Kloft, M., Repke, J.-U., Sager, S., Mitsos, A., 2021b. Machine learning in chemical engineering: A perspective. Chem. Ing. Tech. 93 (12), 2029–2039.
Schweidtmann, A.M., Mitsos, A., 2019. Deterministic global optimization with artificial neural networks embedded. J. Optim. Theory Appl. 180 (3), 925–948.
Shang, C., You, F., 2019. Data analytics and machine learning for smart process manufacturing: recent advances and perspectives in the big data era. Engineering 5 (6), 1010–1016.
Short, M., Isafiade, A.J., Fraser, D.M., Kravanja, Z., 2016. Synthesis of heat exchanger networks using mathematical programming and heuristics in a two-step optimisation procedure with detailed exchanger design. Chem. Eng. Sci. 144, 372–385.
Snelson, E., Ghahramani, Z., Rasmussen, C., 2003. Warped Gaussian processes. In: Advances in Neural Information Processing Systems, Vol. 16.
Thebelt, A., Kronqvist, J., Mistry, M., Lee, R.M., Sudermann-Merx, N., Misener, R., 2021. ENTMOOT: A framework for optimization over ensemble tree models. Comput. Chem. Eng. 151, 107343.
Thebelt, A., Tsay, C., Lee, R.M., Sudermann-Merx, N., Walz, D., Tranter, T., Misener, R., 2022a. Multi-objective constrained optimization for energy applications via tree ensembles. Appl. Energy 306, 118061.
Thebelt, A., Wiebe, J., Kronqvist, J., Tsay, C., Misener, R., 2022b. Maximizing information from chemical engineering data sets: Applications to machine learning. Chem. Eng. Sci. 252, 117469.
Tsay, C., 2021. Sobolev trained neural network surrogate models for optimization. Comput. Chem. Eng. 153, 107419.
Tsay, C., Baldea, M., 2019. 110th anniversary: Using data to bridge the time and length scales of process systems. Ind. Eng. Chem. Res. 58, 16696–16708.
Tsay, C., Kronqvist, J., Thebelt, A., Misener, R., 2021. Partition-based formulations for mixed-integer optimization of trained ReLU neural networks. In: NeurIPS.
Von Stosch, M., Oliveira, R., Peres, J., de Azevedo, S.F., 2014. Hybrid semi-parametric modeling in process systems engineering: Past, present and future. Comput. Chem. Eng. 60, 86–101.
Wächter, A., Biegler, L.T., 2006. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming. Math. Program. 106, 25–57.
Wiebe, J., Cecílio, I., Dunlop, J., Misener, R., 2022. A robust approach to warped Gaussian process-constrained optimization. Math. Prog. 1–35.
Wiebe, J., Misener, R., 2021. ROmodel: Modeling robust optimization problems in Pyomo. Optim. Eng. 1–22.
Wilson, J., Borovitskiy, V., Terenin, A., Mostowsky, P., Deisenroth, M., 2020. Efficiently sampling functions from Gaussian process posteriors. In: ICML. pp. 10292–10302.
Wilson, Z.T., Sahinidis, N.V., 2017. The ALAMO approach to machine learning. Comput. Chem. Eng. 106, 785.
Wright, S.J., Recht, B., 2022. Optimization for Data Analysis. Cambridge University Press, Cambridge, UK.
Xiao, W., Wang, K., Jiang, X., Li, X., Wu, X., Ze, H., He, G., 2019. Simultaneous optimization strategies for heat exchanger network synthesis and detailed shell-and-tube heat-exchanger design involving phase changes using GA/SA. Energy 1166–1177.
Yang, D., Balaprakash, P., Leyffer, S., 2021. Modeling design and control problems involving neural network surrogates. arXiv preprint arXiv:2111.10489.
Yee, T.F., Grossmann, I.E., 1990. Simultaneous optimization models for heat integration—II. Heat exchanger network synthesis. Comput. Chem. Eng. 14 (10), 1165–1184.
Yoshio, N., Biegler, L.T., 2020. Demand-based optimization of a chlorobenzene process with high-fidelity and surrogate reactor models under trust region strategies. AIChE J. e17054.
Zhang, J., Li, H., Sra, S., Jadbabaie, A., 2022. Neural network weights do not converge to stationary points: An invariant measure perspective. In: ICML.
Zhang, S., Salazar, J.S.C., Feldmann, C., Walz, D., Sandfort, F., Mathea, M., Tsay, C., Misener, R., 2023. Optimizing over trained GNNs via symmetry breaking. arXiv preprint arXiv:2305.09420.
