RESEARCH ARTICLE
Bayesian optimization with informative parametric models via sequential Monte Carlo
Abstract
Bayesian optimization (BO) has been a successful approach to optimize expensive functions for which prior knowledge can
be specified by means of a probabilistic model. Due to their expressiveness and tractable closed-form predictive
distributions, Gaussian process (GP) surrogate models have been the default go-to choice when deriving BO frameworks.
However, as nonparametric models, GPs offer very little in terms of interpretability and informative power when applied
to model complex physical phenomena in scientific applications. In addition, the Gaussian assumption also limits the
applicability of GPs to problems where the variables of interest may deviate significantly from Gaussianity. In this article, we
investigate an alternative modeling framework for BO which makes use of sequential Monte Carlo (SMC) to perform
Bayesian inference with parametric models. We propose a BO algorithm to take advantage of SMC’s flexible posterior
representations and provide methods to compensate for bias in the approximations and reduce particle degeneracy.
Experimental results on simulated engineering applications, namely water leak detection and contaminant source localization,
are presented, showing performance improvements over GP-based BO approaches.
Impact Statement
The methodology we present in this article can be applied to a wide range of problems involving sequential
decision making. As demonstrated in a water leak detection experiment, one may apply the algorithm to guide robots
in automated monitoring of underground water lines. Other applications include environmental monitoring,
chemical synthesis, disease control, and so forth. One of the main advantages of the proposed framework when
compared to previous Bayesian optimization approaches is the interpretability of the model, which allows for
inferring variables important for analysis and decision support. In addition, practical performance improvements
are also observed in experiments.
© The Author(s), 2022. Published by Cambridge University Press. This is an Open Access article, distributed under the terms of the Creative Commons
Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the
original article is properly cited.
1. Introduction
Bayesian optimization (BO) offers a principled approach to integrate probabilistic models into processes
of decision making under uncertainty (Shahriari et al., 2016). In particular, BO has been successful in
optimization problems involving black-box functions and very little prior information, such as smooth-
ness of the objective function. Examples include hyperparameter tuning (Snoek et al., 2012), robotic
exploration (Souza et al., 2014), chemical design (Shields et al., 2021), and disease control (Spooner
et al., 2020). Most of its success relies on the flexibility and well-understood properties of nonparametric
modeling frameworks, particularly Gaussian process (GP) regression (Rasmussen and Williams, 2006).
Although nonparametric models are usually the best approach for problems with scarce prior information,
they offer little in terms of interpretability and may be a suboptimal guide when compared to expert
parametric models. In these expert models, parameters are attached to variables with physical meaning
and reveal aspects of the nature of the problem, since models are derived from domain knowledge.
As a motivating example, consider the problem of localizing leaks in underground water distribution
pipes (Mashford et al., 2012). Measurements from pipe monitoring stations are usually sparse and
excavations are costly (Sadeghioon et al., 2014). As a possible alternative, microgravimetric sensors
have recently allowed detecting gravity anomalies of parts in a billion (Brown et al., 2016; Hardman et al.,
2016), making them an interesting data source for subsurface investigations (Hauge and Kolbjørnsen,
2015; Rossi et al., 2015). One could then design a GP-based BO algorithm to localize a leak in a pipe by
searching for a maximum in the gravity signal on the surface due to the heavier wet soil. The determined
2D location, however, tells nothing of the depth or the volume of leaked water. In this case, a physics-
based probabilistic model of a simulated water leak could better guide BO and the decision-making end
users. Bayesian inference on complex parametric models, however, is usually intractable, requiring the
use of sampling-based techniques, like Markov chain Monte Carlo (MCMC) (Andrieu et al., 2003), or
variational inference methods (Bishop, 2006; Ranganath et al., 2014). Either of these approaches can lead
to high computational overheads during the posterior updates in a BO loop.
In this article, we investigate a relatively unexplored modeling approach in the BO community which
offers a balanced trade-off between MCMC and approximate inference methods for problems where
domain knowledge is available. Namely, we consider sequential Monte Carlo (SMC) (Doucet et al.,
2001), also known as particle filtering, algorithms as posterior update mechanisms for BO. SMC has an
accuracy-speed trade-off controlled by the number of particles it uses, and it suffers less from the drawbacks
of other approximate inference methods (Bishop, 2006), while still enjoying asymptotic convergence
guarantees similar to MCMC (Crisan and Doucet, 2002; Beskos et al., 2014). SMC methods have
traditionally been applied to state-space models for localization (Thrun et al., 2006) and tracking (Doucet
et al., 2000) problems, but SMC has also found use in more general areas, including experimental design
(Kuck et al., 2006) and likelihood-free inference (Sisson et al., 2007).
As contributions, we present an approach to efficiently incorporate SMC within BO frameworks. In
particular, we derive an acquisition function to take advantage of the flexibility of SMC’s particle
distributions by means of empirical quantile functions. Our approach compensates for the approximation
bias in the empirical quantiles by taking into account SMC’s effective sample size (ESS). We also propose
methods to reduce the correlation and bias among SMC samples and improve its predictive power. Lastly,
experimental results demonstrate practical performance improvements over GP-based BO approaches.
2. Related Work
Other than the GP approach, BO frameworks have applied a few other methods for surrogate modeling,
including linear parametric models with nonlinear features (Snoek et al., 2015), Bayesian neural networks
(Springenberg et al., 2016), and random forests (Hutter et al., 2011). Linear models and limits of Bayesian
neural networks when the number of neurons approaches infinity can be directly related to GPs
(Rasmussen and Williams, 2006; Khan et al., 2019). Most of these approaches, however, consider a
black-box setting for the optimization problem, where very little is known about the stochastic process
defining the objective function. In this article, we take BO instead toward a gray-box formulation, where
we know a parametric structure which can accurately describe the objective function, but whose true
parameters are unknown.
SMC has previously been combined with BO in GP-based approaches. Benassi et al. (2012) applies
SMC to the problem of learning the hyperparameters of the GP surrogate during the BO process by
keeping and updating a set of candidate hyperparameters according to the incoming observations. Bijl
et al. (2016) provide a method for Thompson sampling (Russo and Van Roy, 2016) using SMC to keep
track of the distribution of the global optimum. These approaches still use GPs as the main modeling
framework for the objective function. Lastly and more related to this article, Dalibard et al. (2017) presents
an approach to use SMC for inference with semiparametric models, where one combines GPs with
informative parametric models. Their framework is tailored to automatically tuning computer programs that
follow dynamical systems, where the system state transitions over time. In contrast, our approach is based on a
static formulation of SMC, where the system state corresponds to a probabilistic model’s parameters
vector, which does not change over time. Simply adapting a dynamics-based SMC model to a static
system is problematic due to particle degeneracy in the absence of a transition model (Doucet et al., 2000). We
instead apply MCMC to move and rejuvenate particles in the static SMC framework, as originally
proposed by Chopin (2002).
In the multiarmed bandits literature, whose methods have frequently been applied to BO (Srinivas
et al., 2010; Wang and Jegelka, 2017; Berkenkamp et al., 2019), SMC has also been shown as a useful
modeling framework (Kawale et al., 2015; Urteaga and Wiggins, 2018). In particular, Urteaga and
Wiggins (2018) present an SMC approach to bandits in dynamic problems, where the reward function
evolves over time. A generalization of their approach has recently been proposed to include linear and
discrete reward functions (Urteaga and Wiggins, 2019), supported by empirical results. Bandit
problems seek policies which maximize long-term payoffs. In this article, we instead focus on investi-
gating and addressing the effects of the SMC approximation on a more general class of problems. We also
provide experimental results on applications where domain knowledge offers informative models.
Lastly, BO algorithms provide model-based solutions for black-box derivative-free optimization prob-
lems (Rios and Sahinidis, 2013). In this context, there are plenty of other model-free approaches, such as
evolutionary algorithms, which include the popular covariance matrix adaptation evolution strategy (CMA-
ES) algorithm (Arnold and Hansen, 2010). However, these approaches are usually not focused on improving
data efficiency, as they typically require hundreds or even thousands of steps to converge to a global optimum.
In contrast, BO algorithms are usually applied to problems where the number of evaluations of the objective
function is very limited, often on the order of tens of evaluations, due to a high cost of collecting
observations (Shahriari et al., 2016), for example, drilling a hole to find natural gas deposits. The algorithm
we propose in this article also targets this kind of use case, with the difference that we apply more
informative predictive models than the usual GP-based formulations.
3. Preliminaries
In this section, we specify our problem setup and review relevant theoretical background. We consider an optimization problem over a function $f : \mathcal{X} \to \mathbb{R}$ within a compact search space $\mathcal{S} \subset \mathcal{X} \subseteq \mathbb{R}^d$:
$$x^* \in \arg\max_{x \in \mathcal{S}} f(x). \tag{1}$$
We assume a parametric formulation for $f(x) = h(x, \theta)$, where $h : \mathcal{X} \times \Theta \to \mathbb{R}$ has a known form, but $\theta \in \Theta \subset \mathbb{R}^m$ is an unknown parameter vector. The only prior information about $\theta$ is a probability distribution $p(\theta)$. We are allowed to collect up to $T$ observations $o_t$ distributed according to an observation (or likelihood) model $p(o_t \mid \theta, x_t)$, for $t \in \{1, \dots, T\}$. For instance, in the classical white Gaussian noise setting, we have $o_t = f(x_t) + \varepsilon_t$, with $\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)$, so that $p(o_t \mid \theta, x_t) = \mathcal{N}\big(o_t; h(x_t, \theta), \sigma_\varepsilon^2\big)$. However, our optimization problem is not restricted by Gaussian noise assumptions.
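To make the setup concrete, the following is a minimal Python sketch of a parametric model $h(x, \theta)$ with the Gaussian observation model above; the bump-shaped form of `h` and all names here are illustrative assumptions, not the models used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x, theta):
    """Hypothetical parametric model h(x, theta): a bump centred at theta[:-1]
    with amplitude theta[-1]. Any known forward model could take its place."""
    centre, amplitude = theta[:-1], theta[-1]
    return amplitude * np.exp(-0.5 * np.sum((x - centre) ** 2))

def observe(x, theta_true, noise_std=0.1):
    """Draw a noisy observation o_t ~ N(h(x, theta_true), noise_std^2)."""
    return h(x, theta_true) + noise_std * rng.normal()

theta_true = np.array([0.3, -0.2, 1.5])  # unknown in practice
print(observe(np.array([0.0, 0.0]), theta_true))
```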
As formulated above, we highlight that the objective function $f$ is a black-box function, and the model $h : \mathcal{X} \times \Theta \to \mathbb{R}$ is simply an assumption over the real function, which is unknown in practice. For example, in one of our experiments, we have $f$ as the gravity measured on the surface above an underground water leak of unknown location. In this case, gradients and analytic formulations for the objective function are not available. Therefore, we need derivative-free optimization algorithms to solve Equation (1). In addition, we assume that the budget of observations $T$ is relatively small (on the order of tens or a few hundred) and incrementally built, so that a maximum-likelihood or interpolation approach, as common in response surface methods (Rios and Sahinidis, 2013), would lead to suboptimal results, as it would not properly account for the uncertainty due to the limited amount of data and their inherent noise. We then seek a Bayesian approach to solve Equation (1). In the following, we review the theoretical background on the main components of the method we propose.
The acquisition function informs BO of the utility of collecting an observation at a given location $x \in \mathcal{S}$ based on the posterior predictions for $f(x)$. For example, with a GP model, one can apply the upper confidence bound (UCB) criterion:
$$a(x \mid \mathcal{D}_{t-1}) := \mu_{t-1}(x) + \beta_t \sigma_{t-1}(x), \tag{3}$$
where $f(x) \mid \mathcal{D}_{t-1} \sim \mathcal{N}\big(\mu_{t-1}(x), \sigma_{t-1}^2(x)\big)$ represents the GP posterior at iteration $t \ge 1$, and $\beta_t$ is a parameter which can be annealed over time to maintain a high-probability confidence interval over $f(x)$ (Chowdhury and Gopalan, 2017). Besides the UCB, the BO literature is filled with many other types of acquisition functions, including expected improvement (Jones et al., 1998; Bull, 2011), Thompson sampling (Russo and Van Roy, 2016), and information-theoretic criteria (Hennig and Schuler, 2012; Hernández-Lobato et al., 2014; Wang and Jegelka, 2017).
1. In these and the following equations, we omit the dependence on the observation locations $\{x_j\}_{j=1}^{t} \subset \mathcal{X}$ to avoid notation clutter. However, we will make this dependence explicit whenever needed to emphasize that locations are also part of the observed data.
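For illustration, a minimal sketch of the UCB criterion in Equation (3) for a generic posterior with mean and standard-deviation functions; the stand-in posterior below is purely hypothetical, and in practice these quantities would come from a fitted GP.

```python
import numpy as np

def ucb(x_candidates, posterior_mean, posterior_std, beta_t=2.0):
    """UCB acquisition a(x | D_{t-1}) = mu_{t-1}(x) + beta_t * sigma_{t-1}(x).

    posterior_mean / posterior_std are callables returning arrays over the
    candidate points; beta_t controls the width of the confidence bound.
    """
    return posterior_mean(x_candidates) + beta_t * posterior_std(x_candidates)

# Toy usage with stand-in posterior functions over a 1D grid of candidates.
xs = np.linspace(0.0, 1.0, 101)
mu = lambda x: np.sin(2 * np.pi * x)              # stand-in posterior mean
sigma = lambda x: 0.2 + 0.1 * np.cos(np.pi * x)   # stand-in posterior std
x_next = xs[np.argmax(ucb(xs, mu, sigma))]
```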
Based on this decomposition, SMC methods maintain an approximation of the posterior $p(\theta_t \mid o_1, \dots, o_t)$ based on a set of particles $\{\theta_t^i\}_{i=1}^{n} \subset \Theta$, where each particle $\theta_t^i$ represents a sample from the posterior. Despite the many variants of SMC available in the literature (Doucet et al., 2001; Naesseth et al., 2019), in its basic form, the SMC algorithm is simple and straightforward to implement, given a transition model $p(\theta_t \mid \theta_{t-1})$ and an observation model $p(o_t \mid \theta_t)$. For instance, a time-varying spatial model $h : \mathcal{X} \times \Theta \to \mathbb{R}$ may have a Gaussian observation model $p(o_t \mid \theta_t) = p(o_t \mid \theta_t, x_t) = \mathcal{N}\big(o_t; h(x_t, \theta_t), \sigma_\varepsilon^2\big)$, with $\sigma_\varepsilon > 0$, and the transition model $p(\theta_t \mid \theta_{t-1})$ may be given by a known stochastic partial differential equation describing the system dynamics.
Basic SMC follows the procedure outlined in Algorithm 1. SMC starts with a set of particles initialized as samples from the prior $\{\theta_0^i\}_{i=1}^{n} \sim p(\theta)$. At each time step, the algorithm proposes new particles by moving them according to the state transition model $p(\theta_t \mid \theta_{t-1})$. Given a new observation $o_t$, SMC updates its particle distribution by first weighting each particle according to its likelihood $w_t^i = p(o_t \mid \theta_t^i)$ under the new observation. A set of $n$ new particles is then sampled (with replacement) from the resulting weighted empirical distribution. The new particle distribution is then carried over to the next iteration until another observation is received. Additional steps can be performed to reduce the bias in the approximation and improve convergence rates (Chopin, 2002; Naesseth et al., 2019).
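A minimal sketch of one such basic SMC update (propagate, reweight, resample), assuming user-supplied transition and likelihood functions; this is an illustrative reading of the procedure, not a reproduction of Algorithm 1.

```python
import numpy as np

def smc_step(particles, o_t, transition, likelihood, rng):
    """One basic SMC update: move particles through the transition model,
    weight them by the likelihood of the new observation, then resample."""
    n = len(particles)
    proposed = np.array([transition(theta, rng) for theta in particles])
    weights = np.array([likelihood(o_t, theta) for theta in proposed])
    weights = weights / weights.sum()
    idx = rng.choice(n, size=n, replace=True, p=weights)  # multinomial resampling
    return proposed[idx]

# Toy example: static parameter (identity transition) with a Gaussian likelihood.
rng = np.random.default_rng(1)
particles = rng.normal(size=(500, 1))                       # samples from the prior
transition = lambda theta, rng: theta                       # static model
likelihood = lambda o, theta: np.exp(-0.5 * (o - theta[0]) ** 2)
particles = smc_step(particles, o_t=0.7, transition=transition,
                     likelihood=likelihood, rng=rng)
```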
4. BO via SMC
In this section, we present a quantile-based approach to solve Equation (1) via BO. The method uses SMC
particles to determine a high-probability UCB on f via an empirical quantile function. We start with a
description of the proposed version of the SMC modeling framework, followed by the UCB approach.
approximation by resampling the particles only when their ESS goes below a certain pre-specified threshold $n_{\min} \le n$,² which reduces computational costs. Whenever this happens, we also move the particles according to the MCMC kernel. This allows us to maintain a diverse particle set, avoiding particle degeneracy. Otherwise, if the ESS is still acceptable, we simply update the particle weights and continue.
2. Usually $n_{\min}$ is set to a fraction of $n$, with 50% being a common setting.
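The resample-move step described above could be sketched as follows, with the MCMC kernel left as a user-supplied function (e.g., a few Metropolis-Hastings steps targeting the current posterior); the threshold and function names here are placeholders.

```python
import numpy as np

def ess(weights):
    """Classical effective sample size estimate: (sum w)^2 / sum w^2."""
    return weights.sum() ** 2 / np.sum(weights ** 2)

def update(particles, weights, o_t, likelihood, mcmc_move, rng, n_min):
    """Reweight particles with the new observation; resample and apply an MCMC
    move only when the ESS drops below the threshold n_min."""
    weights = weights * np.array([likelihood(o_t, th) for th in particles])
    weights = weights / weights.sum()
    if ess(weights) < n_min:
        idx = rng.choice(len(particles), size=len(particles), replace=True, p=weights)
        particles = np.array([mcmc_move(th, rng) for th in particles[idx]])
        weights = np.full(len(particles), 1.0 / len(particles))  # reset weights
    return particles, weights
```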
posterior probability measure of $f(x)$, denoted by $\hat{P}_t^{f(x)} := \sum_{i=1}^{n} w_t^i\, \delta_{h(x,\theta_t^i)}$. With the empirical posterior, we approximate the quantile function of $f(x)$ as:
$$q_{f(x)}(\tau) \approx \hat{q}_t(x, \tau) := \inf\left\{ s \in \mathbb{R} \,\middle|\, \hat{P}_t^{f(x)}\big(f(x) \le s\big) \ge \tau \right\}, \quad \tau \in (0, 1). \tag{7}$$
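A sketch of the empirical quantile in Equation (7), computed from the weighted particle predictions $h(x, \theta_t^i)$; this is illustrative code rather than the authors' implementation.

```python
import numpy as np

def empirical_quantile(values, weights, tau):
    """Weighted empirical quantile: the smallest s such that the weighted CDF of
    `values` at s is at least tau. Returns +inf if tau exceeds the total mass."""
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cdf = np.cumsum(weights) / weights.sum()
    idx = np.searchsorted(cdf, tau, side="left")
    return values[idx] if idx < len(values) else np.inf

# Example: preds[i] = h(x, theta_i) for each particle, with normalized weights w.
preds = np.array([0.1, 0.4, 0.2, 0.9])
w = np.array([0.25, 0.25, 0.25, 0.25])
print(empirical_quantile(preds, w, tau=0.9))
```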
Although SMC samples are inherently biased with respect to the true posterior, we explore methods to
transform SMC particles into approximately i.i.d. samples from the true posterior. For instance, the
MCMC moves follow the true posterior, which turn the particles into approximate samples of the true
posterior. In addition, other methods such as density estimation and the jackknife (Efron, 1992) allow for
decorrelation and bias correction. We explore these approaches in Section 5. As a consequence of Lemma
1, we have the following bound on the empirical quantiles.
Theorem 1 (Szörényi et al. (2015), Proposition 1). Under the assumptions in Lemma 1, given $x \in \mathcal{X}$ and $\delta \in (0,1)$, the following holds for every $\rho \in [0,1]$ with probability at least $1-\delta$:
$$\forall n \ge 1, \quad \hat{q}_\xi\big(\rho - c_n(\delta)\big) \le q_\xi(\rho) \le \hat{q}_\xi\big(\rho + c_n(\delta)\big), \tag{8}$$
where $c_n(\delta) := \sqrt{\frac{1}{2n}\log\frac{\pi^2 n^2}{3\delta}}$.
Note that Theorem 1 provides a confidence interval around the true quantile based on i.i.d. empirical
approximations. In the case of SMC, however, its particles do not follow the i.i.d. assumption. We address
this problem in our method, with further approaches for bias reduction presented in Section 5.
where $c_n(\delta) := \sqrt{\frac{1}{2n}\log\frac{\pi^2 n^2}{3\delta}}$. In the non-i.i.d. case, however, the approximation above is no longer valid. We instead replace $n$ by the ESS $n_{\mathrm{ESS}}$, which is defined as the ratio between the variance of an i.i.d. Monte Carlo estimator and the variance, or mean squared error (Elvira et al., 2018), of the SMC estimator (Martino et al., 2017). With $n_{\mathrm{ESS}}^t$ denoting the ESS at iteration $t$, we set:
$$\delta_t := 1 - \delta + c_{n_{\mathrm{ESS}}^t}(\delta). \tag{11}$$
Several approximations for the ESS are available in the SMC literature (Martino et al., 2017; Huggins and Roy, 2019), which are usually based on the distribution of the weights of the particles. A classical example is $\hat{n}_{\mathrm{ESS}} := \frac{\left(\sum_{i=1}^{n} w_i\right)^2}{\sum_{i=1}^{n} w_i^2}$ (Doucet et al., 2000; Huggins and Roy, 2019). In practice, the simple substitution of $n$ by $\hat{n}_{\mathrm{ESS}}$ defined above can be enough to compensate for the correlation and bias in the SMC samples. In Section 5, we present further approaches to reduce the SMC approximation error with respect to an i.i.d. sample-based estimator.
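Putting these pieces together, a sketch of the ESS-adjusted level $\delta_t$ from Equation (11) and the resulting empirical upper confidence bound, under the substitution of $n$ by the ESS discussed above; function names are illustrative.

```python
import numpy as np

def c_n(n, delta):
    """Quantile confidence radius c_n(delta) = sqrt(log(pi^2 n^2 / (3 delta)) / (2 n))."""
    return np.sqrt(np.log(np.pi ** 2 * n ** 2 / (3.0 * delta)) / (2.0 * n))

def ess(weights):
    return weights.sum() ** 2 / np.sum(weights ** 2)

def smc_ucb(preds, weights, delta=0.1):
    """Upper confidence bound q_hat(x, delta_t), with delta_t = 1 - delta + c_ESS(delta).
    Returns +inf when the adjusted level exceeds 1 (no valid empirical quantile)."""
    delta_t = 1.0 - delta + c_n(ess(weights), delta)
    if delta_t >= 1.0:
        return np.inf
    order = np.argsort(preds)
    cdf = np.cumsum(weights[order]) / weights.sum()
    return preds[order][np.searchsorted(cdf, delta_t, side="left")]
```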
We, however, do not have access to $p(\mathcal{D}_t)$ to compute $p(\theta \mid \mathcal{D}_t)$. Therefore, we do another approximation using the same proposal samples:
$$p(\mathcal{D}_t) = \int_\Theta p(\mathcal{D}_t \mid \theta)\, p(\theta)\, \mathrm{d}\theta = \int_\Theta p(\mathcal{D}_t \mid \theta)\, \frac{p(\theta)}{\hat{p}_t(\theta)}\, \hat{p}_t(\theta)\, \mathrm{d}\theta \approx \sum_{i=1}^{n} p\big(\mathcal{D}_t \mid \theta_{\hat{p}_t}^i\big)\, \frac{p\big(\theta_{\hat{p}_t}^i\big)}{\hat{p}_t\big(\theta_{\hat{p}_t}^i\big)}. \tag{15}$$
Setting $\alpha_t^i := p\big(\mathcal{D}_t \mid \theta_{\hat{p}_t}^i\big)\, \frac{p(\theta_{\hat{p}_t}^i)}{\hat{p}_t(\theta_{\hat{p}_t}^i)}$, we then have:
$$\mathbb{E}[u(\theta) \mid \mathcal{D}_t] \approx \frac{1}{\eta_t} \sum_{i=1}^{n} \alpha_t^i\, u\big(\theta_{\hat{p}_t}^i\big), \quad x \in \mathcal{X}, \tag{16}$$
where $\eta_t := \sum_{i=1}^{n} \alpha_t^i \approx p(\mathcal{D}_t)$. This approach has been recently applied to estimate intractable marginal likelihoods of probabilistic models (Tran et al., 2021).
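A sketch of the reweighted estimator in Equations (15) and (16), assuming the proposal $\hat{p}_t$ is a kernel density estimate fitted to the SMC particles (a KDE is only one possible choice of density estimate); the log-space stabilization is an implementation detail added here.

```python
import numpy as np
from scipy.stats import gaussian_kde

def reweighted_expectation(u, log_lik, log_prior, particles, n_samples=1000):
    """Self-normalized importance-sampling estimate of E[u(theta) | D_t], using a
    kernel density estimate of the SMC particles as the proposal p_hat_t."""
    kde = gaussian_kde(particles.T)            # proposal density fitted to particles
    samples = kde.resample(n_samples).T        # fresh (approximately i.i.d.) proposals
    log_alpha = np.array([log_lik(th) + log_prior(th) - kde.logpdf(th)[0]
                          for th in samples])
    alpha = np.exp(log_alpha - log_alpha.max())  # alpha_t^i, stabilized in log-space
    u_vals = np.array([u(th) for th in samples])
    return np.sum(alpha * u_vals) / np.sum(alpha)
```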
$$\hat{u}_{\mathrm{bias}} := (n-1)\left(\frac{1}{n}\sum_{i=1}^{n} \hat{u}_i - \hat{u}\right). \tag{17}$$
Having an estimate of the bias, we can subtract it from the approximation in Equation (16) to compensate for it. We compare the effect of this correction combined with the previous approaches in preliminary experiments presented next.
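A sketch of the jackknife bias correction in Equation (17) for a generic estimator built from the proposal samples; `estimator` is a placeholder for, e.g., the reweighted expectation above restricted to a subset of samples.

```python
import numpy as np

def jackknife_corrected(estimator, samples):
    """Jackknife bias correction: estimate the bias as
    (n - 1) * (mean of leave-one-out estimates - full estimate) and subtract it."""
    n = len(samples)
    full = estimator(samples)
    loo = np.array([estimator(np.delete(samples, i, axis=0)) for i in range(n)])
    bias = (n - 1) * (loo.mean() - full)
    return full - bias
```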
Figure 1. Posterior CDF approximation errors for the exponential-gamma model using T = 2 observations. For each sample size, which corresponds to the number of SMC particles, SMC runs were repeated 400 times for each method, except for the jackknife, which was rerun 40 times due to a longer run time. The theoretical upper confidence bound on the CDF approximation error $c_n(\delta)$ (Theorem 1) is shown as the plotted blue line. The frequency of violation of the theoretical bounds for i.i.d. empirical CDF errors is also presented on the top of each plot, alongside the target ($\delta = 0.1$).
Figure 2. Posterior CDF approximation errors for the exponential-gamma model using T = 5 observations. For each sample size, which corresponds to the number of SMC particles, SMC runs were repeated 400 times for each method, except for the jackknife, which was rerun 40 times due to a longer run time. The theoretical upper confidence bound on the CDF approximation error $c_n(\delta)$ (Theorem 1) is shown as the plotted blue line. The frequency of violation of the theoretical bounds for i.i.d. empirical CDF errors is also presented on the top of each plot, alongside the target ($\delta = 0.1$).
posteriors after a larger number of observations is collected. In particular, we highlight the difference
between importance reweighting and jackknife results, which is more noticeable for T = 5. The main
drawback of the jackknife approach, however, is its increased computational cost due to the repeated
leave-one-out estimates. In our main experiments, therefore, we present the effects of the importance
reweighting correction compared to the standard SMC predictions.
5.5. Limitations
As a few remarks on the limitations of the proposed framework, we highlight that the use of more complex
forward models and bias correction methods for the predictions comes at an increased cost in terms of
runtime, when compared to simpler models, such as GPs. In practice, one should consider applications
where the cost of observations is much larger than the cost of evaluating model predictions. For example,
in mineral exploration, collecting observations may involve expensive drilling operations whose cost far exceeds that of running simulations for a few minutes, or even days, to obtain an SMC estimate. This runtime cost, however, becomes amortized as the number of observations grows, since the computational complexity of GP updates is $O(t^3)$, growing cubically with the number of observations, compared to $O(nt)$ for SMC. Even when considering importance reweighting, or optionally combining it with the jackknife method for bias correction, SMC updates only incur an additional $O(nt)$ or $O(n^2 t)$ cost, respectively, both of which are linear with respect to the number of observations. Lastly, the compromise between accuracy and speed can be controlled by adjusting the number of particles $n$ used by the algorithm. As each particle corresponds to an independent simulation of the forward model, these simulations can also be executed in parallel, further reducing the algorithm's runtime. The following section presents experimental results on practical applications involving water resources monitoring, which present a suitable use case for our framework, given the cost of observations and the availability of informative physics simulation models.
6. Experiments
In this section, we present experimental results comparing SMC-UCB against its GP-based counterpart
GP-UCB (Srinivas et al., 2010) and the GP expected improvement (GP-EI) algorithm (Jones et al.,
1998), which does not depend on a confidence parameter. We assess the effects of the SMC approxi-
mation and its optimization performance in different problem settings. As a performance metric, we use
the algorithm's regret $r_t := \max_{x \in \mathcal{S}} f(x) - f(x_t)$. The average regret provides an upper bound on the minimum regret of the algorithm up to iteration $T$, that is, $\min_{t \le T} r_t \le \frac{1}{T}\sum_{t=1}^{T} r_t$. A vanishing average regret (as $T \to \infty$) indicates that the choices of the algorithm get arbitrarily close to the optimal solution $x^*$ in Equation (1).
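For reference, the regret bookkeeping behind the curves reported below could be computed as in this small sketch, assuming the maximum of the simulated objective is known (as it is in these controlled experiments); names are illustrative.

```python
import numpy as np

def average_regret(f_values, f_max):
    """Running average regret (1/t) * sum_{s<=t} (f_max - f(x_s)), an upper
    bound on the minimum regret min_{s<=t} r_s."""
    regrets = f_max - np.asarray(f_values)
    return np.cumsum(regrets) / np.arange(1, len(regrets) + 1)
```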
(a) Regret (b) Quantile difference (c) Comparison with EI (d) Dimensionality effect
Figure 3. Linear Gaussian case: (a) mean regret of SMC-UCB for different n compared to the GP-UCB baseline with parameter dimension $m := 10$; (b) approximation error between the SMC quantile $\hat{q}_t(x_t, \delta_t)$ and the true $q_t(x_t, \delta_t)$ at SMC-UCB's selected query point $x_t$ for different n settings (absent values correspond to cases where $\hat{q}_t(x, \delta_t) = \infty$); (c) comparison with the non-UCB, GP-based expected improvement algorithm; and (d) effect of parameter dimension m on optimization performance when compared to the median performance of the GP optimization baselines. All results were averaged over 10 runs. The shaded areas correspond to ±1 standard deviation.
6.2.3. Likelihood
As an observation model, we use Equations (20) and (21) to compose a Gaussian likelihood model based on the gravity of the pipe $g_P$ and of the wet-soil sphere $g_S$. As gravity is an integral over distributions of mass, Gaussian noise is a well-suited assumption due to the central limit theorem (Bauer, 1981). So we set $p(o_t \mid \theta, x) = \mathcal{N}\big(o_t;\, g_{P,3}(x) + g_{S,3}(x),\, \sigma_\varepsilon^2\big)$ for the gravity model parameters $\theta := \big[x_P^\top, r_P, \rho_P, u_P^\top, x_S^\top, r_S, \rho_S\big]^\top$, based on the vertical-axis gravity measurements, with $\sigma_\varepsilon := 5 \times 10^{-8}\ \mathrm{m\,s^{-2}}$, the noise level of the gravity sensor we consider (Hardman et al., 2016).
3. Simulations based on COMSOL Multiphysics software environment: https://www.comsol.com
4. Available at: https://www.comsol.com
5. Details at: https://www.comsol.com/subsurface-flow-module
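Since Equations (20) and (21) are not reproduced here, the following sketch only illustrates how such a Gaussian likelihood could be assembled, using a point-mass vertical-gravity term as a stand-in for the pipe and sphere gravity models and a simplified, hypothetical packing of $\theta$; it is not the parameterization used in the paper.

```python
import numpy as np

G = 6.674e-11  # gravitational constant [m^3 kg^-1 s^-2]

def g_z_point_mass(x_obs, centre, mass):
    """Vertical gravity component at surface point x_obs due to a buried point mass.
    A stand-in for the sphere/pipe gravity models of Equations (20) and (21)."""
    d = centre - x_obs
    return G * mass * d[2] / np.linalg.norm(d) ** 3

def log_likelihood(o_t, x_obs, theta, sigma_eps=5e-8):
    """Gaussian log-likelihood of a gravimeter reading o_t at x_obs, combining
    pipe and wet-soil-sphere contributions. theta packs (hypothetically) the
    sphere centre, sphere mass, pipe-equivalent centre and pipe-equivalent mass."""
    sphere_centre, sphere_mass = theta[:3], theta[3]
    pipe_centre, pipe_mass = theta[4:7], theta[7]
    mean = g_z_point_mass(x_obs, sphere_centre, sphere_mass) \
         + g_z_point_mass(x_obs, pipe_centre, pipe_mass)
    return -0.5 * ((o_t - mean) / sigma_eps) ** 2 - np.log(sigma_eps * np.sqrt(2 * np.pi))
```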
Figure 4. Pipe simulation diagram: A water pipe of 2 in diameter is buried 3 m underground in a large block of soil (100 × 100 × 50 m).
(a) Objective and regret (b) Final estimates (c) SMC leak location
Figure 5. Performance results for water leak detection experiment: (a) The gravity objective function generated by CFD simulations and the mean regret curves for each algorithm. The shaded areas in the plot correspond to ±1 standard deviation; results were averaged over 10 trials. (b) The gravity estimates according to the final SMC and GP posteriors after 100 iterations. (c) SMC estimates for the parameters concerning the location of the leak. The upper plot in (c) is colored according to an estimate for the mass of leaked water.
6. The pipe can, for example, be assumed to be aligned with a street of known location.
with $\alpha = 2$ and $\beta = 1$ on the pipe's depth, that is, $x_{P,3}$. For the radius, we set $r_P \sim \Gamma(\alpha_r, \beta_r)$ with $\alpha_r = 2.25$ and $\beta_r = 0.75$, yielding a 95% confidence interval of $r_P \in [0.4, 8]$ m. The pipe's density is set tightly around the water density, $\rho_P \sim \mathcal{N}(\hat{\rho}_P, \sigma_{\rho_P}^2)$ with $\hat{\rho}_P := 1000\ \mathrm{kg\,m^{-3}}$ and $\sigma_{\rho_P} := 10\ \mathrm{kg\,m^{-3}}$, which allows for small variations in the pipe's mass distribution due to the outflow of water with the leak.
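As an illustration, the priors above could be sampled as in the sketch below, under the assumption that the Gamma distributions are parameterized by shape $\alpha$ and rate $\beta$ (consistent with the stated 95% interval for $r_P$); names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_pipe_prior(n):
    """Draw n samples of the pipe prior parameters described above, assuming the
    Gamma distributions are parameterized by shape alpha and rate beta."""
    depth = rng.gamma(shape=2.0, scale=1.0 / 1.0, size=n)      # x_{P,3} ~ Gamma(2, 1)
    radius = rng.gamma(shape=2.25, scale=1.0 / 0.75, size=n)   # r_P ~ Gamma(2.25, 0.75)
    density = rng.normal(loc=1000.0, scale=10.0, size=n)       # rho_P ~ N(1000, 10^2)
    return np.column_stack([depth, radius, density])

print(sample_pipe_prior(3))
```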
6.2.8. Results
Figure 5 presents results for the leak detection experiment. As the plot in Figure 5a shows, the proposed SMC-UCB algorithm is able to outperform GP-UCB in this setting, with a clearly lower regret from the beginning. The SMC method also presents final predictions better approximating the provided CFD data, with low uncertainty on the sphere position parameters, as seen in Figure 5b. Due to the exploration–exploitation trade-off of BO algorithms (Brochu et al., 2010; Shahriari et al., 2016), it is natural that the uncertainty regarding the pipe's parameters is slightly higher, as its gravity does not directly contribute to the maximum of the gravity objective function. Yet we can see that SMC's estimate of the pipe's gravity signature is still evident when compared to the GP's estimated gravity predictions (see Figure 5c). In addition, SMC is able to directly estimate the leak's location online based on its empirical posterior parameter distribution. With GPs, the same would require running a second inference scheme using all the observations.
7. Parameters with wider priors are set with larger step sizes.
8. Data available at: https://github.com/gpirot/BGICLP
locations over a window of 300 days, with candidate contaminant source locations $\theta := x_C$ spread over a 51-by-51, 3-meter cell grid, totalling 2,601 possible locations.
6.3.1. Data
The contaminant concentration data is composed of two datasets, one with the simulated concentration
values for all 25 wells and all 301 measurement times over the 51-by-51 source locations grid, while the
other dataset consists of reference “real” concentration values over the 25 wells and the 301 time steps.
The “real” concentration values are in fact simulated concentration values at locations not revealed to the
BO algorithms. To simulate different scenarios, each experiment trial was run on a random day, sampled
uniformly among the 300 days. To provide observations for each algorithm, we also add Gaussian noise
with $\sigma_\varepsilon := 1 \times 10^{-3}$, corresponding to roughly a 10% measurement error level.
6.3.4. Results
Figure 6 presents our experimental results for the contaminant source localization problem. As Figure 6a
shows, SMC-UCB is able to achieve better performance than GP-UCB by using the informative prior coming from the hydraulic simulations. In addition, the final SMC parameter estimates (Figure 6b) place high probability mass around the true source location.
Figure 6. Contaminant source localization problem: (a) The optimization performance of each algorithm
in terms of regret. Results were averaged over 10 runs. (b) A final SMC estimate for the source location,
while the true location is marked as a red star.
7. Conclusion
This article presented SMC as an alternative to GP-based approaches for Bayesian optimization when
domain knowledge is available in the form of informative computational models. We presented a quantile-
based acquisition function for BO which is adjusted for the effects of the approximation bias in the SMC
estimates and allows for inference based on non-Gaussian posterior distributions. The resulting SMC
algorithm is shown to outperform the corresponding GP-based baseline in different problem scenarios.
The work in this article can be seen as a starting point toward a wider adoption of SMC algorithms in
BO applications. Further studies with theoretical assessments of the SMC approximation should
strengthen the approach and allow for the derivation of new algorithms which can make an impact on
the practical use of BO in data-driven science and engineering applications. Another interesting direction
of future research is the incorporation of likelihood-free SMC (Sisson et al., 2007) methods into the BO
loop in cases where forward models do not allow for a closed-form expression of the observation model.
Acknowledgment. Rafael Oliveira was supported by the Medical Research Future Fund Applied Artificial Intelligence in Health
Care grant (MRFAI000097).
Data Availability Statement. The data for the contaminant source localization experiment is available at https://github.com/
gpirot/BGICLP.
Author Contributions. Conceptualization: R.O., R.S., R.K., S.C., J.C.; Data curation: K.H., N.T.; Formal analysis: R.O.; Funding
acquisition: R.S., R.K., S.C., J.C., C.L.; Investigation: R.O., K.H., N.T., C.L.; Methodology: R.O., R.K.; Software: R.O.;
Supervision: R.S., R.K., S.C., C.L.; Writing—original draft: R.O. All authors approved the final submitted draft.
Funding Statement. Parts of this research were conducted by the Australian Research Council Industrial Transformation Training
Centre in Data Analytics for Resources and the Environment (DARE), through project number IC190100031.
References
Andrieu C, De Freitas N, Doucet A and Jordan MI (2003) An introduction to MCMC for machine learning. Machine Learning 50
(1–2), 5–43.
Arnold DV and Hansen N (2010) Active covariance matrix adaptation for the (1+1)-CMA-ES. In Proceedings of the 12th Annual
Conference on Genetic and Evolutionary Computation - GECCO ’10, Portland, OR. New York: ACM.
Bauer H (1981) Probability Theory and Elements of Measure Theory. Probability and Mathematical Statistics. New York:
Academic Press.
Benassi R, Bect J and Vazquez E (2012) Bayesian optimization using sequential Monte Carlo. In Hamadi Y and Schoenauer M
(eds), Learning and Intelligent Optimization. Berlin: Springer, pp. 339–342.
Berkenkamp F, Schoellig AP and Krause A (2019) No-regret Bayesian optimization with unknown hyperparameters. Journal of
Machine Learning Research (JMLR) 20, 1–24.
Beskos A, Crisan DO, Jasra A and Whiteley N (2014) Error bounds and normalising constants for sequential Monte Carlo
samplers in high dimensions. Advances in Applied Probability 46(1), 279–306.
Bijl H, Schön TB, van Wingerden J-W and Verhaegen M (2016) A sequential Monte Carlo approach to Thompson sampling for
Bayesian optimization. arXiv.
Bishop CM (2006) Chapter 10: Approximate inference. In Pattern Recognition and Machine Learning. Berlin: Springer,
pp. 461–522.
Brochu E, Cora VM and de Freitas N (2010) A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application
to Active User Modeling and Hierarchical Reinforcement Learning. Technical Report. Vancouver, BC: University of British
Columbia.
Brown G, Ridley K, Rodgers A and de Villiers G (2016) Bayesian signal processing techniques for the detection of highly
localised gravity anomalies using quantum interferometry technology. In Emerging Imaging and Sensing Technologies, 9992.
Bellingham, WA: SPIE.
Bull AD (2011) Convergence rates of efficient global optimization algorithms. Journal of Machine Learning Research (JMLR) 12,
2879–2904.
Chopin N (2002) A sequential particle filter method for static models. Biometrika 89(3), 539–551.
Chowdhury SR and Gopalan A (2017) On Kernelized multi-armed bandits. In Proceedings of the 34th International Conference
on Machine Learning (ICML), Sydney, Australia: Proceedings of Machine Learning Research.
Cornaton F (2007) Ground Water: A 3-d Ground Water and Surface Water Flow, Mass Transport and Heat Transfer Finite Element
Simulator. Technical Report. Neuchâtel: University of Neuchâtel.
Crisan D and Doucet A (2002) A survey of convergence results on particle filtering methods for practitioners. IEEE Transactions
on Signal Processing 50(3), 736–746.
Dalibard V, Schaarschmidt M and Yoneki E (2017) Boat: Building auto-tuners with structured bayesian optimization. In
Proceedings of the 26th International Conference on World Wide Web, WWW ’17. Geneva: Republic and Canton of Geneva,
CHE, International World Wide Web Conferences Steering Committee, pp. 479–488.
Del Moral P, Doucet A and Jasra A (2012) On adaptive resampling strategies for sequential Monte Carlo methods. Bernoulli 18
(1), 252–278.
Dinh L, Sohl-Dickstein J and Bengio S (2017) Density estimation using real NVP. In 5th International Conference on Learning
Representations (ICLR 2017), Toulon, France.
Doucet A, de Freitas N and Gordon N (eds) (2001) Sequential Monte Carlo Methods in Practice. Information Science and
Statistics, 1st Edn. New York: Springer.
Doucet A, Godsill S and Andrieu C (2000) On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and
Computing 10, 197–208.
Durand A, Maillard O-A and Pineau J (2018) Streaming kernel regression with provably adaptive mean, variance, and
regularization. Journal of Machine Learning Research 19, 1–48.
Efron B (1992) Bootstrap methods: Another look at the Jackknife. In Kotz S and Johnson NL (eds), Breakthroughs in Statistics:
Methodology and Distribution. New York: Springer, pp. 569–593.
Elvira V, Martino L and Robert CP (2018) Rethinking the Effective Sample Size. arXiv e-prints, arXiv:1809.04129.
Hardman KS, Everitt PJ, McDonald GD, Manju P, Wigley PB, Sooriyabandara MA, Kuhn CCN, Debs JE, Close JD and
Robins NP (2016) Simultaneous precision gravimetry and magnetic gradiometry with a Bose-Einstein condensate: A high
precision, quantum sensor. Physical Review Letters 117, 138501.
Hauge VL and Kolbjørnsen O (2015) Bayesian inversion of gravimetric data and assessment of CO2 dissolution in the Utsira
formation. Interpretation 3(2), SP1–SP10.
Hennig P and Schuler CJ (2012) Entropy search for information-efficient global optimization. Journal of Machine Learning
Research (JMLR) 13, 1809–1837.
Hernández-Lobato JM, Hoffman MW and Ghahramani Z (2014) Predictive entropy search for efficient global optimization of
black-box functions. In Proceedings of the Advances in Neural Information Processing Systems (NIPS 2014), Montréal, Canada:
Curran Associates.
Huggins JH and Roy DM (2019) Sequential Monte Carlo as approximate sampling: Bounds, adaptive resampling via ∞-ESS, and
an application to particle Gibbs. Bernoulli 25(1), 584–622.
Hutter F, Hoos HH and Leyton-Brown K (2011) Sequential model-based optimization for general algorithm configuration. In
International Conference on Learning and Intelligent Optimization. Cham: Springer, pp. 507–523.
Jones DR, Schonlau M and Welch WJ (1998) Efficient global optimization of expensive black-box functions. Journal of Global
Optimization 13(4), 455–492.
Kaufmann E, Cappé O and Garivier A (2012) On Bayesian upper confidence bounds for bandit problems. In Proceedings of the
15th International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 22. La Palma. JMLR: W&CP,
pp. 592–600.
Kawale J, Bui H, Kveton B, Thanh LT and Chawla S (2015) Efficient Thompson sampling for online matrix-factorization
recommendation. In Advances in Neural Information Processing Systems, Montréal, Canada: Curran Associates, pp. 1297–1305.
Khan ME, Immer A, Abedi E and Korzepa M (2019) Approximate inference turns deep networks into Gaussian processes. In
Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada: Curran Associates.
Kobyzev I, Prince S and Brubaker M (2020) Normalizing flows: An introduction and review of current methods. IEEE
Transactions on Pattern Analysis and Machine Intelligence 43, 3964–3979.
Kuck H, de Freitas N and Doucet A (2006) SMC samplers for Bayesian optimal nonlinear design. In 2006 IEEE Nonlinear
Statistical Signal Processing Workshop, Cambridge, UK: IEEE.
Martino L, Elvira V and Louzada F (2017) Effective sample size for importance sampling based on discrepancy measures. Signal
Processing 131, 386–401.
Mashford J, De Silva D, Burn S and Marney D (2012) Leak detection in simulated water pipe networks using SVM. Applied
Artificial Intelligence 26(5), 429–444.
Massart P (1990) The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of Probability 18(3), 1269–1283.
Naesseth CA, Lindsten F and Schön TB (2019) Elements of sequential Monte Carlo. Foundations and Trends in Machine
Learning 12(3), 187–306.
Neal RM (2011) MCMC using Hamiltonian dynamics. In Brooks S, Gelman A, Jones G and Meng X-L (eds), Handbook of Markov
Chain Monte Carlo. New York: Chapman & Hall, pp. 113–162.
Pirot G, Krityakierne T, Ginsbourger D and Renard P (2019) Contaminant source localization via Bayesian global optimization.
Hydrology and Earth System Sciences 23, 351–369.
Ranganath R, Gerrish S and Blei DM (2014) Black box variational inference. In Proceedings of the 17th International Conference
on Artificial Intelligence and Statistics (AISTATS), Reykjavik, Iceland: Proceedings of Machine Learning Research.
Rasmussen CE and Williams CKI (2006) Gaussian Processes for Machine Learning. Cambridge, MA: The MIT Press.
Rios LM and Sahinidis NV (2013) Derivative-free optimization: A review of algorithms and comparison of software implemen-
tations. Journal of Global Optimization 56(3), 1247–1293.
Rossi L, Reguzzoni M, Sampietro D and Sansò F (2015) Integrating geological prior information into the inverse gravimetric
problem: The Bayesian approach. In VIII Hotine-Marussi Symposium on Mathematical Geodesy. Cham: International Associ-
ation of Geodesy Symposia, pp. 317–324.
Russo D and Van Roy B (2016) An information-theoretic analysis of Thompson sampling. Journal of Machine Learning Research
(JMLR) 17, 1–30.
Sadeghioon AM, Metje N, Chapman DN and Anthony CJ (2014) SmartPipes: Smart wireless sensor networks for leak detection
in water pipelines. Journal of Sensor and Actuator Networks 3(1), 64–78.
Schuster I, Mollenhauer M, Klus S and Muandet K (2020) Kernel conditional density operators. In Proceedings of the 23rd
International Conference on Artificial Intelligence and Statistics (AISTATS), Palermo, Italy Vol. 108 Proceedings of Machine
Learning Research.
Shahriari B, Swersky K, Wang Z, Adams RP and De Freitas N (2016) Taking the human out of the loop: A review of Bayesian
optimization. Proceedings of the IEEE 104(1), 148–175.
Shields BJ, Stevens J, Li J, Parasram M, Damani F, Alvarado JI, Janey JM, Adams RP and Doyle AG (2021) Bayesian
reaction optimization as a tool for chemical synthesis. Nature 590(7844), 89–96.
Sisson SA, Fan Y and Tanaka MM (2007) Sequential Monte Carlo without likelihoods. Proceedings of the National Academy of
Sciences of the United States of America 104(6), 1760–1765.
Snoek J, Larochelle H and Adams RP (2012) Practical Bayesian optimization of machine learning algorithms. In Pereira F, Burges
CJC, Bottou L and Weinberger KQ (eds), Advances in Neural Information Processing Systems 25. Red Hook, NY: Curran
Associates, pp. 2951–2959.
Snoek J, Rippel O, Swersky K, Kiros R, Satish N, Sundaram N, Patwary M, Prabhat, and Adams R (2015) Scalable Bayesian
optimization using deep neural networks. In International Conference on Machine Learning (ICML), Lille, France: Proceedings
of Machine Learning Research (PMLR).
Souza JR, Marchant R, Ott L, Wolf DF and Ramos F (2014) Bayesian Optimisation for active perception and smooth navigation.
In IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China: IEEE.
Spooner T, Jones AE, Fearnley J, Savani R, Turner J and Baylis M (2020) Bayesian optimisation of restriction zones for
bluetongue control. Scientific Reports 10(1), 1–18.
Springenberg JT, Aaron K, Falkner S and Hutter F (2016) Bayesian optimization with robust Bayesian neural networks. In
Advances in Neural Information Processing Systems (NIPS), Barcelona, Spain: Curran Associates.
Srinivas N, Krause A, Kakade SM and Seeger M (2010) Gaussian process optimization in the bandit setting: No regret and
experimental design. In Proceedings of the 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel.
Omnipress, pp. 1015–1022.
Szörényi B, Busa-Fekete R, Weng P and Hüllermeier E (2015) Qualitative multi-armed bandits: A quantile-based approach. In
Proceedings of the 32nd International Conference on Machine Learning (ICML), Lille, France: Proceedings of Machine
Learning Research.
Thrun S, Burgard W and Fox D (2006) Probabilistic Robotics. Cambridge, MA: The MIT Press.
Tran M-N, Scharth M, Gunawan D, Kohn R, Brown SD and Hawkins GE (2021) Robustly estimating the marginal likelihood
for cognitive models via importance sampling. Behavior Research Methods 53(3), 1148–1165.
Urteaga I and Wiggins CH (2018) Sequential Monte Carlo for dynamic softmax bandits. In 1st Symposium on Advances in
Approximate Bayesian Inference (AABI), Montréal, Canada.
Urteaga I and Wiggins CH (2019) (Sequential) Importance Sampling Bandits. arXiv e-prints, arXiv:1808.02933.
Wand MP and Jones MC (1994) Kernel Smoothing. Boca Raton, FL: CRC Press.
Wang Z and Jegelka S (2017) Max-value entropy search for efficient Bayesian optimization. In 34th International Conference on
Machine Learning, ICML 2017, Sydney, Australia, Vol. 7. Proceedings of Machine Learning Research.
Young H and Freedman R (2015) University Physics with Modern Physics. Hoboken, NJ: Pearson Education.
Cite this article: Oliveira R, Scalzo R, Kohn R, Cripps S, Hardman K, Close J, Taghavi N and Lemckert C (2022). Bayesian
optimization with informative parametric models via sequential Monte Carlo. Data-Centric Engineering, 3: e5. doi:10.1017/
dce.2022.5