
Chimera: enabling hierarchy based multi-objective optimization for self-driving laboratories

Florian Häse,1 Loïc M. Roch,1 and Alán Aspuru-Guzik1, 2, ∗
1 Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, 02138, USA
2 Senior Fellow, Canadian Institute for Advanced Research, Toronto, Ontario M5G 1Z8, Canada
(Dated: April 27, 2018)

We introduce Chimera, a general purpose achievement scalarizing function (ASF) for multi-objective optimization problems in experiment design. Chimera combines concepts of a priori scalarizing with ideas from lexicographic approaches. It constructs a single merit-based function which implicitly accounts for a provided hierarchy in the objectives. The performance of the suggested ASF is demonstrated on several well-established analytic multi-objective benchmark sets using different single-objective optimization algorithms. We further illustrate the performance and applicability of Chimera on two practical applications: (i) the auto-calibration of a virtual robotic sampling sequence for direct-injection, and (ii) the inverse-design of a system for efficient excitation energy transport. The results indicate that Chimera enables a wide class of optimization algorithms to rapidly find solutions. The presented applications highlight the interpretability of Chimera to corroborate design choices on tailoring system parameters. Additionally, Chimera appears to be applicable to any set of n unknown objective functions, and more importantly does not require detailed knowledge about these objectives. We recommend the use of Chimera in combination with a variety of optimization algorithms for an efficient and robust optimization of multi-objective problems.

∗ Electronic address: [email protected]

I. INTRODUCTION

The transition from automated experimentation platforms to self-driving laboratories requires artificial intelligence (AI) methods to learn experimental conditions satisfying pre-defined targets.1–3 These targets could include the yield and selectivity of a reaction, the production cost and overall execution time of a process, or the optimization of materials with properties tailored to specific needs. Self-driving laboratories design and execute experiments without human interaction, and compare experimental results to the pre-defined targets, assessing the merit of the machine-proposed conditions. The learning algorithms are then used to refine and improve the experimental conditions to design future experiments with a higher merit.4–9 Often, these experiments are subject to multiple targets and experimental constraints, introducing additional complexity into the optimization process. This complexity manifests as a stringent requirement on the AI algorithms to satisfy multiple, possibly competing targets simultaneously without violating the experimental constraints. As such, a key component of self-driving laboratories is robust and efficient AI algorithms evolving on multi-dimensional surfaces to reach optimal experimental parameters in a minimum number of distinct experiments.

Experiment design can be formulated as a multi-objective (Pareto) optimization problem. Such optimization problems are generally concerned with finding a set of parameters which yields optimal values for a set of n objectives. In general, parameter points where all objectives are at their optimal values do not exist, which makes such multi-objective optimization problems challenging. As a matter of fact, improving one objective might only be possible at the expense of degrading other objectives. Therefore, approaches capable of balancing competing criteria and identifying the parameter point yielding the highest merit with respect to user-defined preferences are highly desirable. Hereafter, we propose Chimera, a versatile achievement scalarizing function (ASF) for multi-objective optimization combining concepts of a priori scalarizing with ideas from lexicographic approaches.

Methods for multi-objective optimization have already been successfully applied to various scenarios in science and engineering. Examples include the rational design of dielectric nanoantennas10 and plasmonic waveguides,11 the optimization of Stirling heat pumps,12 the design of thermal-energy storage systems,13–15 and optimizations of scheduling problems in combined hydro-thermo-wind power plants.16 In the aforementioned applications, analytic models assessed the merit of a set of parameters by evaluating the objectives with fast computations. As such, these optimization problems could be approached with methods identifying the entire set of solutions which cannot be further optimized in at least one of the objectives, at the expense of numerous evaluations of proposed parameter sets. Preference information regarding specific solutions could then be expressed knowing the surface of optimal points.

However, applications involving experimentation cannot be approached by optimization methods which
rely on numerous objective evaluations, as the number of conducted experiments must be kept low. Examples of single-objective evaluations involving experimentation exist in chemistry in the context of self-optimizing reactors.17–20 Self-optimizing reactors comprise a set of experimental devices capable of unsupervised experimentation to optimize chemical processes.21 As such, the merit of a set of parameters is not evaluated computationally, but experimentally. In this context, multi-objective optimization methods have already been employed, for example, to determine trade-offs in the reaction rate and yield of methylated ethers,22 maximize the intensity of quantum dots at a target wavelength,21 or balance the production rate and conversion efficiency of Paal-Knorr reactions.23 These optimization problems have been approached with methods which allow for preference information expressed prior to starting the optimization procedures. Preference information is frequently provided by constructing a single merit function from all considered objectives, such that the single merit-based function accounts for the provided preferences. Optimizations were then conducted on the merit-based function using single-objective optimization algorithms.

Indeed, the aforementioned examples display the successful application and benefit of multi-objective optimization methods on self-optimizing reactors to power self-driving laboratories. Yet, the merit-based functions employed in these examples are often handcrafted. Constructing a suitable and versatile merit-based function with little prior knowledge about the objectives is challenging.24,25 As a matter of fact, compositions of merit-based functions can sometimes require refinements after initial optimization runs, as the desired preference in the objectives is not achieved.23 Recently, Walker et al. have introduced a method for formulating merit-based multi-objective optimization problems as constrained optimization problems for the synthesis of o-xylenyl adducts of Buckminsterfullerene.26 Their approach aims to optimize a main objective, while keeping other objectives at desired levels by considering them as constraints. However, their method depends on the choice of constraints, which requires substantial prior knowledge about the objective surfaces. Therefore, the lack of a universal, general purpose method for constructing merit-based functions from multiple objectives appears as a major obstacle to the massive deployment of self-optimizing reactors and self-driving laboratories. Notably, we identify two main constraints: (i) objective evaluations involve timely and costly experimentation, and thus must be kept to a minimum; (ii) no prior knowledge is available about the surface of the objectives.

In this work, we suggest Chimera, an approach to multi-objective optimization problems for self-driving laboratories. We show on several well-established benchmark functions and on two applications how Chimera fulfills the aforementioned constraints. Our proposed method relies on preference information provided in the form of a hierarchy in the objectives. A single merit-based function is constructed from the provided hierarchy, and shapes a surface which can be optimized by a variety of single-objective optimization algorithms. Chimera does not require detailed assumptions about the surfaces of the objective functions, and it improves on the hierarchy of objectives from the beginning of the optimization procedure, without any required warm-up iterations.

This manuscript is organized as follows. We start with an overview of the multi-objective formulation and machine-learning based algorithms. Then, we detail the implementation of Chimera, and assess its performance on multi-objective benchmark functions. Before drawing our conclusions, we further demonstrate the applicability of our ASF on an automated experimental procedure for real-time reaction monitoring, and on the inverse-design of an excitonics system for the efficient transport of excitation energy.

II. BACKGROUND AND RELATED WORK

Multi-objective (Pareto) optimization problems are concerned with the simultaneous optimization of a set of objective functions, {f_k}_{k=0}^{n-1}, where each of the objective functions, f_k, is defined on the same compact parameter space P ⊂ R^d.27 Although the desired goal of an optimization procedure is to find a point in parameter space x∗ ∈ P for which each of the objectives f_k(x∗) assumes its desired optimal value (e.g. minimum/maximum), objectives in multi-objective optimization problems oftentimes conflict with each other. Indeed, improving on one objective could imply an unavoidable degradation of other objectives. As a consequence, a single global solution cannot be defined for the generic multi-objective optimization problem.

A. Defining and identifying solutions to multi-objective optimization problems

A commonly used criterion for determining solutions to multi-objective optimization problems is Pareto optimality.28 A point is called Pareto optimal if and only if there exists no other point such that all objectives are improved simultaneously. Therefore, deviating from a Pareto optimal point always implies a degradation in at least one of the objectives. As Pareto optimal points cannot be collectively improved in two or more objectives, solving a multi-objective optimization problem translates to finding Pareto optimal points. Note that for a given multi-objective optimization problem, multiple Pareto optimal points can coexist.29
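The Pareto-optimality criterion above can be stated compactly in code. The following sketch (our own illustration in plain Python, assuming minimization of all objectives; the function names are ours, not from the paper) extracts the non-dominated subset of a finite set of observed objective vectors:

```python
def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb (minimization):
    fa is no worse in every objective and strictly better in at least one."""
    return all(a <= b for a, b in zip(fa, fb)) and any(a < b for a, b in zip(fa, fb))

def pareto_front(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Three candidate evaluations of two objectives (f0, f1):
candidates = [(1.0, 3.0), (2.0, 2.0), (3.0, 3.0)]
print(pareto_front(candidates))  # → [(1.0, 3.0), (2.0, 2.0)]
```

Note that multiple points can survive the filter, consistent with the observation that several Pareto optimal points can coexist.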

Typically, approaches to solving multi-objective optimization problems aim to assist a decision maker in identifying the favored solution from the set of Pareto optimal solutions (the Pareto front). The favored solution is determined from preference information provided by the decision maker. Methods for multi-objective optimization can be divided into a posteriori methods, which discover Pareto optimal points first, such that preferences can be expressed knowing the set of Pareto optimal points, and a priori methods, which require preference information prior to starting the optimization procedure.

A posteriori methods are commonly realized as mathematical programming approaches such as Normal Boundary Intersection (NBI),30,31 Normal Constraint,32,33 or Successive Pareto Optimization,34 which repeat algorithms for finding Pareto optimal solutions. Another strategy consists in evolutionary algorithms such as the Non-dominated Sorting Genetic Algorithm-II,35 or the Sub-population Algorithm based on Novelty,36 where a single run of the algorithm produces a set of Pareto optimal solutions. Recently, a posteriori methods have also been developed following Bayesian approaches to optimization.37–41 However, determining the preferred Pareto point from the entire Pareto front requires a substantial number of objective function evaluations compared to scenarios in which only a subset of the Pareto front is of interest. Such scenarios can be found in the context of experiment design, where preferences are available prior to the optimization procedure. As such, a priori methods appear to be better suited to multi-objective optimization problems in the context of designing experiments, as they keep the number of objective evaluations to a minimum.

A common a priori approach for expressing preferences for multi-objective optimization problems is to construct a single achievement scalarizing function (ASF) from all involved objective functions. The premise of the constructed ASF is that its optimal solution coincides with the preferred Pareto optimal solution of the multi-objective optimization problem. Typically, ASFs are constructed with a set of parameters which account for the expressed preferences regarding the individual objectives. ASFs can be constructed via, e.g., weighted sums or weighted products. In such approaches, the ASF is computed by summing up each objective function f_k multiplied by a pre-defined weight w_k accounting for the user preferences. The first set of a priori approaches includes different formulations of weighted sum approaches,42 and methods have been developed to learn these weights adaptively.43 Weighted sum approaches are usually simple to implement, but the challenge lies in finding suitable weight vectors to yield Pareto optimal solutions. In addition, Pareto optimal solutions might not be found for non-convex objective spaces.

A second set of approaches to solving multi-objective optimization problems with a priori knowledge consists in considering only one of the objectives for optimization while constraining the other objectives based on user preferences.44–46 These approaches, referred to as ε-constraint methods, have been shown to find Pareto optimal points even on non-convex objective spaces.29,47 However, the constraint vector needs to be carefully chosen, which typically requires detailed prior knowledge about the objectives.

A third set of methods, known as lexicographic methods, follows yet a different approach.48 Lexicographic methods require preference information expressed in terms of an importance hierarchy in the objectives. To start the optimization procedure with a lexicographic method, the objectives are sorted in descending order of importance. Each objective is then subsequently optimized without degrading objectives higher up in the hierarchy.49 Variants of the lexicographic approach allow for minimal violations of the imposed constraints.50,51

B. Single-objective optimization methods

Most a priori methods reformulate multi-objective optimization problems into single-objective optimization problems. The latter are well studied, and a plethora of algorithms has been developed to solve them.52–55 Some single-objective optimization algorithms aim to optimize an objective function locally, while others aim to determine the global optimum. In some cases, optimization algorithms are based not only on the objective function, but also on its gradients and possibly higher derivatives.

For unknown objective functions which are costly to evaluate, the employed optimizer must be gradient-free and global to keep the number of function evaluations to a minimum. In addition, such an algorithm must support optimization on possibly non-convex surfaces. Several algorithms have been developed for this purpose. In the following paragraphs we describe four such techniques, which we will consider hereafter to study the performance of Chimera.

Systematic grid searches and (fractional) factorial design strategies are popular methods for experimental design.56–58 These strategies rely on the construction of a grid of parameter points within the parameter (sub)space, from which points are sampled for evaluation. Grid searches are embarrassingly parallel, as the parameter grid can be constructed prior to running any experiments. However, a constructed grid cannot fully take into account the most recent experimental results for proposing new parameter points. Moreover, parameter samples proposed from grid searches are correlated, and thus might miss important features of the objective surface or even the Pareto optimal point.
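Before examining the remaining optimizers, the two a priori scalarization strategies discussed in Sec. II A can be made concrete in a short sketch. This is our own toy illustration, not code from the paper; in particular, the penalty-based handling of ε-constraint violations is one common choice among several:

```python
def weighted_sum_asf(fs, weights):
    """A priori scalarization: merit = sum_k w_k * f_k(x) (minimization)."""
    return sum(w * f for w, f in zip(weights, fs))

def eps_constraint_asf(fs, eps, penalty=1.0e6):
    """epsilon-constraint reformulation: minimize f_0 subject to
    f_k <= eps_k for the remaining objectives; violations are penalized
    so that a single-objective optimizer is driven away from them."""
    merit = fs[0]
    for f, e in zip(fs[1:], eps):
        if f > e:
            merit += penalty * (f - e)
    return merit

print(weighted_sum_asf([0.25, 0.5], [1.0, 1.0]))   # → 0.75
print(eps_constraint_asf([0.25, 0.4], eps=[0.5]))  # feasible, merit = f_0 → 0.25
```

Both formulations expose the weakness noted above: the weight vector and the constraint vector ε must be chosen with knowledge of the objective ranges before any optimization starts.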

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) samples parameter points from a multinomial distribution defined on the parameter space.59,60 After evaluation of all proposed parameter points, distribution parameters are updated via a maximum-likelihood approach. As a consequence, the means of the multinomial distribution follow a natural gradient descent, while the covariance matrix is updated via iterated principal component analysis retaining all principal components. While CMA-ES is successful on highly multi-modal functions, its efficiency drops on well-behaved convex functions.

Recently, Bayesian optimization methods have gained increased attention. Spearmint implements Bayesian optimization based on Gaussian processes.61,62 Gaussian processes associate every point in the parameter space with a normal distribution to construct an approximation of the unknown objective function. Parameter points can be proposed from this approximation via an acquisition function, implicitly balancing explorative and exploitative behavior of the optimization procedure. While Gaussian process based optimization provides high flexibility, it suffers from the adverse cubic scaling of the approach with the number of observations.

Recently, we introduced Phoenics for the rapid optimization of unknown black-box functions.63 Phoenics combines concepts from Bayesian optimization with ideas from Bayesian kernel density estimation. Phoenics was shown to be an effective, flexible optimization algorithm on a wide range of objective functions and allows for an efficient parallelization by proposing parameter points based on different sampling strategies. These strategies are enabled by the introduction of an intuitive bias towards exploitation or exploration.

III. METHODS

We consider a multi-objective (Pareto) optimization problem with n objective functions {f_k}_{k=0}^{n-1}, where each objective function f_k is defined on the d-dimensional compact subset P ⊂ R^d. We further assume that no prior information about the objectives is available and that evaluations of the objectives are demanding in terms of budgeted resources such as time or money. Numerous evaluations of the objectives are therefore impractical, which motivates approaching such problems via a priori methods for multi-objective optimization with gradient-free global optimization algorithms.

As previously mentioned, a priori methods, such as the weighted sum method or the ε-constraint method, require significant prior knowledge about the objectives. Weight vectors need to be chosen such that objectives are appropriately balanced, or explicit bounds have to be defined on objectives. While initial sampling of the objectives can provide sufficient information to formulate suitable weight vectors or constraints, objective function evaluations are assumed to be costly and are thus to be kept to a minimum. The lexicographic method, in contrast, requires less prior knowledge about the objectives, but optimizes objectives subsequently instead of simultaneously.

In this work, we propose Chimera, which follows the idea of lexicographic methods by providing preference information in the form of a hierarchy in the objectives, but formulates a single ASF based on the provided hierarchy. The formulation of a single hierarchy-based ASF avoids the subsequent optimization of objectives in the hierarchy without requiring detailed information about the objectives. The ASF for this purpose should be formulated to enable the following procedure: (i) Given a hierarchy in the objectives, relative tolerances are defined for each objective, indicating the allowed relative deviation with respect to the full range of objective values. (ii) Improvements on the main objective should always be realized, unless sub-objectives can be improved without degrading the main objective beyond the defined tolerance. (iii) Furthermore, changes in the order of the hierarchy and the tolerances on the objectives should enable the optimization procedure to reach different Pareto-optimal points.

A. Constructing Chimera

In this section we detail the design of Chimera for the three aforementioned purposes. We assume the set of f = (f_0, ..., f_{n-1}) objective functions to be ordered in descending hierarchy, i.e. f_0 is the main objective, and that the optimization procedure aims to minimize each of the objectives. Chimera is updated at every optimization iteration based on all available observed pairs of parameter points and objectives, D_j = {(x_i, f_i)}_{i=1}^{j}.

Chimera is based on, and implicitly accounts for, the hierarchy in the objectives. Using prior observations D_j, relative tolerances f̃_k^tol defined prior to the optimization procedure are used to compute absolute tolerances f_k^tol on all objectives at each optimization iteration (see Eq. 1). Note that absolute tolerances for individual objectives are computed based on the minimum and maximum of this objective only in the subset of the parameter space, Y_{k-1} ⊂ P, where the objective one level up the hierarchy meets its tolerance criteria.
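The cascaded tolerance computation described here can be sketched in a few lines. This is our own toy illustration in plain Python, not the reference implementation; we additionally assume that the absolute threshold is taken as the region minimum plus the tolerated fraction of the objective range over that region:

```python
def absolute_tolerances(observations, rel_tols):
    """observations: objective vectors (f_0, ..., f_{n-1}) at the points
    evaluated so far; rel_tols: relative tolerance per objective.
    Each absolute tolerance is computed from the min/max of objective k
    restricted to the region where objective k-1 meets its tolerance."""
    region = list(observations)  # before any constraint: all observed points
    abs_tols = []
    for k, rtol in enumerate(rel_tols):
        vals = [obs[k] for obs in region]
        lo, hi = min(vals), max(vals)
        tol = lo + rtol * (hi - lo)  # best observed value + tolerated range
        abs_tols.append(tol)
        # Y_k: points where objective k satisfies its absolute tolerance
        region = [obs for obs in region if obs[k] <= tol]
    return abs_tols

obs = [(0.0, 5.0), (1.0, 1.0), (2.0, 0.0), (10.0, 2.0)]
print(absolute_tolerances(obs, [0.2, 0.5]))  # → [2.0, 2.5]
```

In the example, the second objective's tolerance is computed only over the three points whose first objective already lies within its tolerance, mirroring the region restriction of Eq. 1.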

f_k^{tol} = \tilde{f}_k^{tol} \left( \max_{x_i \in Y_{k-1}} f_k(x_i) - \min_{x_i \in Y_{k-1}} f_k(x_i) \right).    (1)

We can determine whether a given objective function value is above or below the given tolerance via the Heaviside function Θ,

\Theta(f_k^{tol} - f_k(x)) = \begin{cases} 0 & \text{if } f_k(x) \ge f_k^{tol} \\ 1 & \text{if } f_k(x) < f_k^{tol} \end{cases}.    (2)

For the following considerations we introduce the abbreviations

\Theta_k^{+}(x) = \Theta(f_k(x) - f_k^{tol}),    (3)

\Theta_k^{-}(x) = \Theta(f_k^{tol} - f_k(x)) = 1 - \Theta_k^{+}(x),    (4)

such that Θ_k^+(x) indicates that objective k violates its tolerance at x, while Θ_k^-(x) indicates that it satisfies it.

Using the Heaviside function to scale the involved objectives, a single ASF can be constructed. Furthermore, this ASF is sensitive to a single objective only in specific regions of the parameter space. However, as the Heaviside function is not continuous, a formulation of the ASF in terms of the Heaviside function introduces discontinuities. Some of the discontinuities can be avoided by shifting objectives f_k based on the minimum of f_{k-1} in the parameter regions Y_{k-1} ⊂ P for which f_{k-1} does not satisfy the defined tolerance. We denote the shifting parameters by f_{k-1}^{min}. Chimera, χ(x), is then constructed to account for the hierarchy of individual objectives via Eq. 5. Fig. 1 illustrates the construction and implementation of Chimera for an example of three objective functions.

\chi(x) = f_0(x)\,\Theta_0^{+}(x) + \sum_{k=1}^{n-1} \left( f_k(x) - f_{k-1}^{min} \right) \Theta_k^{+}(x) \prod_{m=0}^{k-1} \Theta_m^{-}(x) + \left( f_0(x) - f_{n-1}^{min} \right) \prod_{k=0}^{n-1} \Theta_k^{-}(x).    (5)

FIG. 1: Example of the construction of Chimera from three objective functions. Left panel: pseudocode showcasing the conceptual ideas of Chimera. Right panels: illustration of the three considered objectives (top panel) used to construct Chimera (lower panel). Bottom panel: analytic expression of Chimera for this specific example.

    function Chimera(x, Y)
        f0_min ← min_{xi ∈ Y} f0(xi)
        f1_min ← min_{xi ∈ Y0} f1(xi)
        f2_min ← min_{xi ∈ Y1} f2(xi)
        if f0(x) ≤ f0_tol then
            y1 ← f1(x) − f0_min
            if f1(x) ≤ f1_tol then
                y2 ← f2(x) − f1_min
                if f2(x) ≤ f2_tol then
                    y0 ← f0(x) − f2_min
                    return y0
                else
                    return y2
            else
                return y1
        else
            return f0(x)

Analytic expression of Chimera for this example:

\chi(x) = \Theta_0^{+}(x) f_0(x) + \Theta_0^{-}(x)\Theta_1^{+}(x)\left( f_1(x) - f_0^{min} \right) + \Theta_0^{-}(x)\Theta_1^{-}(x)\Theta_2^{+}(x)\left( f_2(x) - f_1^{min} \right) + \Theta_0^{-}(x)\Theta_1^{-}(x)\Theta_2^{-}(x)\left( f_0(x) - f_2^{min} \right).

Within this formulation of the ASF, and its associated relative tolerances, a single-objective optimization algorithm is motivated to improve on the main objective. In addition, the algorithm will be enforced to optimize the sub-objectives as well, from the beginning of the optimization procedure on. Nevertheless, improvements on the sub-objectives will not be realized if they cause degradations in objectives higher up the hierarchy.

In the general case, not all discontinuities in Chimera can be removed by shifting the objectives (see Fig. 1), as continuous connections of the objectives might not be possible at all locations without reshaping the objectives. However, these discontinuities can be avoided with the logistic function as a smooth alternative to the Heaviside function,

\theta(f_k^{tol} - f_k(x)) = \left( 1 + \exp\left( -\frac{f_k^{tol} - f_k(x)}{\tau} \right) \right)^{-1},    (6)

where τ > 0 can be interpreted as a smoothing parameter. Note that the logistic function converges to the Heaviside function in the limit τ → 0+, i.e. lim_{τ→0+} θ(f) = Θ(f). We illustrate the influence of the smoothing parameter on the shape of χ(x) in Fig. 1. In general, we observe that small values of τ still retain sharp features in the ASF, although discontinuities are lifted. Large values of τ, however, may cause deviations in the global minimum of the ASF and in the location of the Pareto-optimal point.

The impact of the smoothing parameter on the performance of an optimization run is reported in the supplementary information (see Sec. VII A). We ran Phoenics on the three one-dimensional objective functions illustrated in Fig. 1 and constructed Chimera with different smoothing parameter values. We find that generally large values of τ result in considerable deviations in the objectives after a given number of optimization iterations, eventually causing the optimization algorithm not to find parameter points yielding objectives within the user-defined tolerances. In contrast, small values of τ
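The cascade of the Fig. 1 pseudocode and its smoothed variant (Eq. 6) can be condensed into a short sketch. This is our own illustration for three objectives, not the reference implementation; tolerances and shift parameters are assumed to be precomputed as described above, and the sign convention follows the cascade in Fig. 1:

```python
import math

def chimera_merit(f, tol, fmin, tau=None):
    """Hierarchy-based merit for three objectives (minimization), following
    the cascade of Fig. 1. f: objective values (f0, f1, f2) at x; tol:
    absolute tolerances; fmin: shift parameters (f0_min, f1_min, f2_min).
    tau=None uses the hard Heaviside cascade; tau > 0 uses the logistic
    smoothing of Eq. 6."""
    def step(z):
        if tau is None:
            return 1.0 if z > 0.0 else 0.0                  # Heaviside
        return 1.0 / (1.0 + math.exp(min(-z / tau, 700.0)))  # logistic; capped exponent

    sat = [step(tol[k] - f[k]) for k in range(3)]  # Theta_k^-: tolerance met
    vio = [1.0 - s for s in sat]                   # Theta_k^+: tolerance violated

    return (vio[0] * f[0]
            + sat[0] * vio[1] * (f[1] - fmin[0])
            + sat[0] * sat[1] * vio[2] * (f[2] - fmin[1])
            + sat[0] * sat[1] * sat[2] * (f[0] - fmin[2]))

tols, shifts = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
print(chimera_merit((2.0, 0.5, 0.5), tols, shifts))  # f0 violated → 2.0
print(chimera_merit((0.5, 2.0, 0.5), tols, shifts))  # f0 met, f1 violated → 2.0
print(chimera_merit((0.5, 0.5, 0.5), tols, shifts))  # all met, shifted f0 → 0.5
```

With a small τ (e.g. 10^-3, the value recommended below), the smoothed merit is numerically indistinguishable from the hard cascade away from the tolerance boundaries, while removing the discontinuities at the boundaries themselves.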

(including τ → 0+) cause the optimization algorithm to need slightly more objective function evaluations to find parameter points yielding objectives within the defined tolerances. However, we did not observe any significant differences in performance for intermediate values of τ. We recommend the use of τ within the [10^-4, 10^-2] interval. For all the tests performed and reported in the results section, as well as for the two applications, a value of τ = 10^-3 was used.

IV. RESULTS

Scalarizing multiple objectives allows one to reformulate a multi-objective optimization problem as two separate problems: (i) finding a suitable ASF which predicts Pareto-optimal solutions, and (ii) finding the optimum of this ASF with a chosen single-objective optimization algorithm. With the benchmarks presented in this section, we address these two different aspects. We start with a focus on the question whether Chimera reasonably predicts the locations of Pareto optimal points for a given set of hierarchies and tolerances. We then proceed with evaluating the performance and behavior of different single-objective optimization algorithms on Chimera.

To benchmark the performance of Chimera, we consider six different sets of well-established analytic objective functions. Five of the sets consist of two objectives, while the sixth set contains three objectives. Details on the objective functions are reported in the supplementary information (see Sec. VII A). For all benchmark optimizations reported in this section, we employed the same set of tolerances and constraints on the objectives in the benchmark set, which are reported in the supplementary information as well (see Sec. VII A).

A. Deviations of the expected optimum from the actual optimum

We start our discussion with benchmarking the accuracy of Chimera in predicting the location of Pareto-optimal points given a set of hierarchies and tolerances. The performance of Chimera is compared to the behavior of the ASF introduced by Walker et al.,26 which we refer to as c-ASF from now on due to its constrained approach. Pareto-optimal points were determined from evaluating each objective on a 1000 × 1000 grid in the parameter spaces. While tolerances on the objectives for Chimera can be defined a priori without detailed knowledge about the shapes of the objectives, the c-ASF requires absolute constraints on the objectives. For a fair comparison between the two ASFs, we therefore also compute constraint values matching the pre-defined tolerances from this grid evaluation.

After these initial computations, we emulate an optimization procedure set up as a grid search, which is a common strategy for experimental design.56–58 During the optimization procedure we construct both Chimera and c-ASF from obtained observations. We designed the grid from 20 × 20 equidistant parameter points. From the resulting 400 grid points, we construct 25 different sampling sequences by shuffling the order of grid points. All objective functions are evaluated at parameter points in sequential order. At each iteration in the optimization procedure, we reconstruct both ASFs and determine their predicted Pareto optimal points. Deviations in the objective values of the predicted Pareto optimal points and the true Pareto optimal points are used as a measure to determine how well Pareto optimal objectives are predicted by either ASF. Average deviations between predicted and true Pareto optimal objectives, with respect to the full range of all objectives, are reported in Fig. 2.

FIG. 2: Average relative distance from the Pareto-optimal point determined by the applied constraints. We compare the achieved relative distances of Chimera and c-ASF. Parameter spaces were searched via a grid search (see main text for details).
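The shuffled-grid emulation described above can be sketched as follows (our paraphrase in plain Python; the grid dimensions follow the text, everything else, including the seed, is illustrative):

```python
import itertools
import random

def shuffled_grid_sequences(n_per_dim=20, n_sequences=25, seed=42):
    """Construct an equidistant n_per_dim x n_per_dim grid on [0, 1]^2 and
    return n_sequences independently shuffled evaluation orders, mimicking
    the emulated grid-search optimization runs."""
    axis = [i / (n_per_dim - 1) for i in range(n_per_dim)]
    grid = list(itertools.product(axis, axis))
    rng = random.Random(seed)  # fixed seed for reproducibility
    sequences = []
    for _ in range(n_sequences):
        order = grid[:]        # copy the grid, then shuffle its order
        rng.shuffle(order)
        sequences.append(order)
    return sequences

sequences = shuffled_grid_sequences()
print(len(sequences), len(sequences[0]))  # → 25 400
```

Each sequence visits the same 400 grid points in a different order, so differences between runs reflect only the order in which observations become available to the ASFs.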

Based on the benchmark results, we find that the


Pareto optimal point predicted by Chimera is closer
to the true Pareto optimal point with respect to all
involved objectives after the full evaluation of the 20 × 20
grid for four out of the six benchmark sets. With the
Viennet benchmark set, we find similar performance in
both ASFs, and c-ASF predicts Pareto optimal with
slightly smaller deviations on the ZDT2 benchmark set.

Besides the prediction accuracy, it is important to em-


phasize a major difference between Chimera and c-ASF:
c-ASF requires detailed knowledge about the individual
objective surfaces to set appropriate constraints. The
Pareto optimal point can only be determined if reason-
able bounds have been defined. In addition, changing
the hyperparameters in c-ASF can significantly influence
how individual objectives are balanced. Chimera, how-
ever, only contains a single hyperparameter τ (see Eq. 6),
which is used for smoothing the constructed χ. From
the presented benchmark, we find that Chimera shows
good performance with the same choice of τ on a diverse
set of benchmark functions. We have also illustrated
that the performance of an optimization procedure aug-
mented with Chimera only weakly depends on the par-
ticular choice of τ over several orders of magnitude (see
Sec. VII B).
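To make the hierarchy-plus-tolerances idea concrete, the sketch below ranks sampled points lexicographically using thresholds derived from relative tolerances and the observed objective ranges. This is a deliberately simplified stand-in, not the smooth ASF χ defined by Chimera; in particular it omits the τ-smoothing discussed above, and the helper names are our own.

```python
import numpy as np

def thresholds_from_tolerances(observed, tolerances):
    """Turn relative tolerances into absolute thresholds using the
    observed range of each objective, mirroring how Chimera avoids
    requiring absolute constraints up front."""
    obs = np.asarray(observed, dtype=float)
    lo, hi = obs.min(axis=0), obs.max(axis=0)
    return lo + np.asarray(tolerances, dtype=float) * (hi - lo)

def hierarchy_key(sample, thresholds):
    """Lexicographic sort key: samples are compared first by how much
    they violate the tolerance on the top objective, then by violations
    further down the hierarchy, and finally by the raw value of the
    lowest-priority objective. Smaller keys are better."""
    violations = tuple(max(s - t, 0.0) for s, t in zip(sample[:-1], thresholds[:-1]))
    return violations + (sample[-1],)

# Example: two minimization objectives with a 30 % tolerance on the first
observed = [[0.0, 5.0], [0.2, 1.0], [1.0, 0.0]]
thresh = thresholds_from_tolerances(observed, tolerances=[0.3, 1.0])
best = min(observed, key=lambda s: hierarchy_key(s, thresh))
print(best)  # → [0.2, 1.0]
```

The selected point satisfies the tolerance on the top objective and is the best remaining candidate on the sub-objective; the point with the lowest sub-objective value is rejected because it degrades the top objective beyond its tolerance, which is the behavior the hierarchy is meant to enforce.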

B. Performance with various optimization algorithms

In this section, we investigate the performance of four single-objective optimization algorithms on both Chimera and c-ASF. In particular, we employ four different gradient-free optimization procedures: grid search,56–58 CMA-ES,59,60 spearmint61,62 and Phoenics.63 Details about the optimization procedures are reported in Sec. II B. The resulting combinations of optimization algorithms and ASFs were then applied to the six analytic benchmark sets and used to determine how fast the Pareto-optimal points can be reached.

In all optimization runs we applied the same set of constraints and tolerances as discussed in the previous section. The performance of each optimization algorithm augmented with each of the ASFs is quantified by computing the smallest relative deviation in the objectives between all sampled parameter points and the Pareto-optimal point. The average smallest achieved relative deviations after a total of 100 objective set evaluations for the Fonseca set and the Viennet set are reported in Fig. 3. Note that the performance of grid search does not depend on the ASF, as decisions about which parameter point to evaluate next are not updated based on prior evaluations. Results on the remaining four benchmark sets are reported in the supplementary information (see Sec. VII C).

FIG. 3: Average smallest relative deviations between objectives sampled by different optimization algorithms after 100 objective function evaluations, averaged over 25 different optimization runs. Panel (A) reports results on the Fonseca benchmark set, and panel (B) depicts results for the Viennet variant benchmark set.

We find that optimization runs of different optimization algorithms augmented with Chimera reach low deviations from the Pareto-optimal points after 100 objective set evaluations. When comparing to the deviations in objectives achieved by optimization algorithms augmented with c-ASF, Chimera generally leads optimization algorithms closer to the true Pareto-optimal objectives. Although the degree of improvement in the deviations of Chimera over c-ASF varies across all objectives, we did not observe a case where c-ASF significantly outperforms Chimera. These observations hold for the duration of the entire optimization procedure, as reflected by the individual optimization traces reported in the supplementary information (see Sec. VII C 2). In particular, the fact that the tolerances in Chimera are defined relative to the observed range of objectives does not appear to be disadvantageous. Indeed, optimization runs with Chimera achieve relatively low deviations in
all objectives from the beginning of the optimization procedure on. Furthermore, we find that optimization algorithms based on Bayesian methods (spearmint and Phoenics) generally outperform CMA-ES and grid search, although the degree of improvement can vary with the objectives.

C. Behavior of optimization procedures

In addition to the differences in performance of Chimera and c-ASF with different optimization algorithms, we also observe differences in the general behavior of the optimization runs regarding the trade-off between objectives. Optimization traces generated by optimization algorithms augmented with Chimera closely follow the user-defined hierarchy in the objectives. As such, improvements on sub-objectives are only realized if superior objectives are not degraded beyond the specified tolerance. Optimization runs generated from optimization procedures augmented with c-ASF do not strictly follow this hierarchy. Instead, we observe cases in which c-ASF appears to favor improvements on the sub-objectives even if these improvements cause degradations in superior objectives. An example is given in Fig. 4, where optimization traces of grid search and Phoenics augmented with both ASFs on the ZDT2 benchmark set are depicted.

FIG. 4: Optimization traces representing the smallest relative deviations between sampled objectives and Pareto-optimal objectives, averaged over 25 individual optimization runs on the ZDT2 benchmark set. Panel (A) shows deviations in the main objective, and panel (B) displays deviations in the sub-objective.

While Chimera only allows for improvements on the sub-objective if the main objective is not degraded substantially, c-ASF favors improvements on the sub-objective over improvements on the main objective. This observation, and the fact that it can only be made for some of the benchmark sets, is consistent with the functional form of c-ASF. Depending on the considered objectives, improvements on sub-objectives can decrease the penalty term such that degradations in the main objective are allowed. In contrast, Chimera strictly enforces the user-defined hierarchy for a wide range of different objective functions, as demonstrated in this benchmark study.

In summary, the benchmarks presented in this section illustrate that Chimera can identify Pareto-optimal points for the provided set of hierarchies and tolerances in the objectives. Moreover, the ASF constructed by Chimera enables a variety of optimization algorithms to locate the Pareto-optimal point. Chimera strictly follows the hierarchy imposed by the user and requires less prior information about the shape of the objectives. Therefore, Chimera is well suited for multi-objective optimization problems where evaluations of the objective functions are costly, thus satisfying the two constraints identified and discussed in the introduction.

V. APPLICATIONS OF CHIMERA

In this section we further demonstrate the applicability and performance of Chimera on two different examples: the auto-calibration of a robotic sampling sequence for direct injection, and an inverse-design problem for excitonic systems. Both applications involve a larger number of parameters, and include three different objectives to be optimized.

A. Auto-calibrating an automated experimentation platform

In this first application we apply Chimera to find optimal parameters for an automated experimental procedure designed for real-time reaction monitoring, as previously reported in the literature.64 The procedure is used to characterize chemicals via high-performance liquid chromatography (HPLC). The goal of the optimization procedure is to maximize the response of the HPLC, while minimizing the amount of sample used in the analysis along with the overall execution time.

To benchmark the performance of Chimera, experiments were not executed on the robotic hardware, but on a probabilistic model (virtual robot) trained to reproduce the behavior of the real-life experiment. The virtual robot is trained on experimental data collected over two distinct autonomous calibration runs orchestrated by the ChemOS software package.4 During this process, both the HPLC response and the execution times were recorded (see supplementary information of Ref. 4).
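The virtual-robot benchmarking loop can be sketched as follows. The surrogate below is a hypothetical stand-in with made-up response surfaces (the actual trained BNN is the model published on GitHub), and random search is used purely as a placeholder for an optimizer such as Phoenics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the trained virtual robot: maps six experiment
# parameters in [0, 1] to (HPLC response, sample volume, execution time).
# The functional forms below are invented for illustration only.
def virtual_robot(params):
    response = 2000.0 * params[0] * (1.0 - 0.3 * params[1])  # counts
    sample = 5.0 + 20.0 * params[1]                          # microliters
    time = 40.0 + 60.0 * params[2]                           # seconds
    return np.array([response, sample, time])

# Closed-loop benchmark: propose parameters, query the surrogate instead
# of the robotic hardware, and track the best scalar merit seen so far.
def run_benchmark(merit, n_iter=100):
    best_params, best_merit = None, np.inf
    for _ in range(n_iter):
        candidate = rng.random(6)
        m = merit(virtual_robot(candidate))
        if m < best_merit:
            best_params, best_merit = candidate, m
    return best_params, best_merit

# e.g. maximize the HPLC response alone (minimize its negative); in the
# multi-objective setting, merit would instead be the scalarized ASF value
params, merit_val = run_benchmark(lambda obj: -obj[0])
```

Because every objective evaluation is a cheap surrogate query rather than a robotic experiment, thousands of such closed-loop runs can be repeated with different seeds, which is what makes the comparison between Chimera and c-ASF in the next subsections affordable.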

1. Constructing a probabilistic model (virtual robot)

The probabilistic model (virtual robot) was trained on HPLC responses (peak areas) and execution times obtained from 1,500 independent experiments conducted fully autonomously, without human interaction. For these experiments, the six experimental parameters of the procedure were sampled from a uniform distribution to ensure unbiased and uncorrelated coverage of the parameter space.

The virtual robot was set up as a Bayesian neural network (BNN), which was trained to predict HPLC responses and execution times for any possible set of experimental parameters. For a dense enough sampling of the parameter space, the BNN smoothly interpolates experimental results between two executed experiments. It is important to emphasize that the virtual robot thus allows experimental results to be queried for parameters which have not been realized by the actual experimental setup. As such, the virtual robot trained in this work is well suited to inexpensively benchmark algorithms for experiment design.

The BNN was trained via variational expectation-maximization with respect to the network model parameters. Details on the network architecture, the training procedure and the prediction accuracy on both observed (training set) and unobserved (test set) data are reported in the supplementary information (see Sec. VII D). The probabilistic model is made available on GitHub.65

2. Experimental procedure

The goal of this optimization procedure is to (i) maximize the response of the HPLC, (ii) keep the amount of drawn sample low and (iii) minimize the execution time of the experimental procedure. All results presented in this section were obtained with the Phoenics optimization algorithm,63 and objectives were sampled from the trained virtual robot. Phoenics was set up with three different sampling strategies and sequential evaluation of proposed parameter points.

We compare the behavior and performance of Chimera and c-ASF in two different scenarios, defined by different tolerances and constraints on the individual objectives. By sampling the objective space at 10^5 random uniform parameter points, we can find loose constraints on the objectives such that a parameter point fulfilling all constraints (feasible point) exists. At the same time, such a dense sampling of the parameter space allows us to define a set of objectives which likely cannot be achieved for any set of experimental parameters. As we assume no prior knowledge about the objectives, both scenarios can possibly occur when setting up a new optimization procedure.

Based on the 10^5 random uniform evaluations of the probabilistic model, we chose the objective constraints reported in Tab. I for both scenarios. Tolerances were defined such that they match up with the limits relative to the entire range of the observed objective function values. A detailed influence analysis of each parameter on the objectives, as well as the ranges of the observed objectives, is reported in the supplementary information (see Sec. VII D).

             Scenario   Response      Sample    Time
Tolerances   Loose      50 %          25 %      50 %
             Tight      20 %          10 %      10 %
Limits       Loose      1250 counts   15 µl     70 s
             Tight      2000 counts   7.5 µl    54 s

TABLE I: Constraints on the objectives for multi-objective optimization runs on the probabilistic model. Uniform sampling of 10^5 parameter points revealed that loose constraints are achievable by parameter points in a sub-region of the parameter space, while tight constraints cannot be achieved by any parameter point in the parameter space.

3. Optimization results

We carried out a total of 50 optimization runs with different random seeds and a total of 400 optimization iterations for each set of constraints (loose/tight) and each ASF (Chimera/c-ASF). Average traces of the recorded objectives are presented in Fig. 5 for loose constraints (A) and tight constraints (B), as defined in Tab. I.

When applying loose constraints to the optimization procedure, we observe similar behavior of Chimera and the c-ASF. In both cases, Phoenics quickly discovers acceptable HPLC responses above the lower constraint, and is then driven to further minimize the sample volume and the execution time below the specified bounds. We observe a slight trend of Chimera causing Phoenics to find very large peak areas only after more conducted experiments, at the advantage of finding still acceptable peak areas at lower solvent amounts earlier on. This trade-off reflects the hierarchical nature of Chimera.

With tight constraints, however, we observe a more significant difference between the two optimization strategies. While with both ASFs Phoenics finds acceptable peak areas much faster than for loose constraints, Chimera appears to help Phoenics in finding acceptable peak areas in fewer experiments. Moreover, the amount of solvent used in the experiments is lower with Chimera from the earliest experiments on, and reaches acceptable
levels much faster than with c-ASF. However, the upper bound on the execution time is always exceeded, as there is no point in parameter space for which the peak area is above the chosen lower bound and the execution time below the specified upper bound simultaneously (see Sec. VII D).

FIG. 5: Achieved objective function values for multi-objective optimization runs on a virtual robot model, obtained with Phoenics on Chimera and c-ASF and averaged over 50 individual runs. The goal of the optimization runs is to maximize the HPLC response, minimize the sample volume and minimize the execution time beyond the set bounds, indicated with black dashed lines.

Chimera therefore enables optimization algorithms to rapidly find parameter points yielding objectives close to the user specifications. In the scenario where such a parameter point does not exist, Chimera still leads optimization algorithms to parameter points yielding acceptable objective values based on the provided hierarchy, and achieves as many objectives as possible.

B. Inverse-design of excitonic systems

In this section we demonstrate the applicability of Chimera to inverse-design problems, in which systems are reverse-engineered based on desired properties. We focus on the design of a system for efficient excitation energy transport (EET). EET phenomena have been of great interest in recent years across different fields such as evolutionary biology and solar cell engineering.66–69 In particular, studies have focused on understanding the relation between the structure of an excitonic system and its transfer properties, fostering the design of novel excitonic devices.

1. System definition

The inverse-design challenge in this application focuses on an excitonic system consisting of four sites located along the axis ex. Each excitonic site is defined by a position xi on ex, an excited state energy εi, and a transition dipole with a fixed oscillator strength of |µi|^2 = 37.1 D^2 and an orientation angle ϕi = arccos(ei · ex) with respect to the main axis. As such, the excitonic system is fully characterized by a total of ten parameters: four transition dipole orientations, {ϕ0, ϕ1, ϕ2, ϕ3}; three excited state energies of the last three sites, {ε1, ε2, ε3}, relative to the excited state energy of the first site, ε0 = 0; and three relative distances between two consecutive sites, {d1, d2, d3}, where di = xi − xi−1 and d0 = 0. Each of the system parameters was constrained to domains motivated by parameter values for biological light-harvesting complexes.70–73 Ranges for all parameters are reported in Tab. II.

Parameter     size   lower bound   upper bound
Distances d   3      5 Å           40 Å
Energies ε    3      −800 cm−1     800 cm−1
Angles ϕ      4      0             2π

TABLE II: Parameters for the excitonic system studied in this application. All parameter ranges are inspired by parameter ranges for biological light-harvesting complexes.

The goal of the optimization procedure is to design excitonic systems with highly efficient energy transport at a low energy gradient across a large distance. These three objectives are quantified as follows: assuming the system transfers excitons from the first site to the fourth site, we compute the total transfer distance as d = d1 + d2 + d3. Furthermore, we consider the energy gradient between the first and the last site, ε = |ε3|. Lastly, we also compute the efficiency η of the EET. The transfer efficiency is computed from a full population dynamics calculation in the hierarchical equations of motion (HEOM) approach,74–76 with the
QMaster software package, version 0.2.77–80 HEOM is a numerically exact method which accurately accounts for the reorganization process.

To run a full population dynamics calculation we construct the Frenkel exciton Hamiltonian81,82 for each proposed excitonic system from the system parameters. The Frenkel exciton Hamiltonian accounts for the excitation energy of each excitonic site and the Coulomb coupling between the sites. While excitation energies are provided as parameters during the optimization, excitonic couplings are computed from the geometry of the system using a point-dipole approximation (see Eq. 7).83 We denote the unit vector along the spatial displacement of sites i and j with eij and the distance between the two sites with dij. Note that the point-dipole approximation only holds for large distances.

    Vij = (µi µj / dij^3) [ei · ej − 3 (ei · eij) (ej · eij)] .    (7)

The coupling of the excitonic sites, J(ω), in the system to the surrounding bath is modeled via single-peak Drude-Lorentz spectral densities (see Eq. 8). For all spectral densities, we chose λ = 35 cm−1 and ν−1 = 50 fs. In all calculations, we use a trapping rate of Γtrap = 1 ps−1 and exciton life-times of Γloss−1 = 0.25 ns.

    J(ω) = 2λ ων / (ω^2 + ν^2) .    (8)

2. Optimization procedure

Calculations of the population dynamics on the described excitonic system are computationally demanding, with execution times ranging from about five to about twenty minutes. To accelerate the optimization procedure, we employ Phoenics, which allows generating multiple excitonic systems per optimization iteration for parallel evaluation. Note that we extended the sampling procedure in Phoenics to account for periodicities in the orientation angles by computing periodic distances when constructing the approximation to the objective function from the kernel density distributions. Details on the procedure are provided in the supplementary information (see Sec. VII E).

Phoenics was used with four different sampling strategies, each proposing a different set of parameters in one optimization iteration. For each of the proposed parameter sets, we construct the Frenkel exciton Hamiltonian and start the population dynamics calculation with QMaster. It is important to mention that the execution time of the population dynamics calculation can vary, as it depends on the parameters of the computed system. We therefore set up the optimization procedure in an asynchronous feedback-loop to process results from population dynamics calculations as soon as they are available. In this feedback-loop, a database is used to store system parameters for future evaluation. When a population dynamics calculation completes, a new set of system parameters obtained from the database is submitted for evaluation. Optimization iterations with Phoenics are triggered right after all three objectives (transfer efficiency, total distance and energy gradient) have been retrieved from the completed population dynamics calculation. At the end of an optimization iteration, the system parameters in the database are updated with the parameters proposed from this optimization iteration. A similar procedure has recently been employed in the context of self-driving laboratories.4

For the problem of reverse-engineering an excitonic system, we illustrate the performance of Chimera on all possible permutations of hierarchies among all three objectives. For each permutation, we execute a total of 25 individual optimization runs with 400 iterations. All optimization runs aim to design excitonic systems with highly efficient energy transport at a low energy gradient across a large distance. Note that large transfer efficiencies compete with large distances and low energy gradients. To emphasize the importance of large efficiencies and low energy loss of the transport, we chose to apply a tolerance of 10 % on the transfer efficiency, 12.5 % on the energy gradient and 40 % on the total distance.

We find that Chimera enables Phoenics to discover excitonic systems with the desired objectives in all six studied hierarchy permutations. Details about these permutations are provided in the supplementary information (see Sec. VII F). Independently of the order of the objectives in the hierarchy, Chimera guides Phoenics to the parameter space region for which the associated objectives satisfy all tolerances, following different sampling paths. We illustrate this in Fig. 6, which highlights the objectives sampled for two of the six studied permutations: permutation 2 (green dots), which (i) maximizes the transfer efficiency, (ii) minimizes the energy gradient and (iii) maximizes the total distance, and permutation 5 (red triangles), which (i) minimizes the energy gradient, (ii) maximizes the transfer efficiency and (iii) maximizes the total distance. In Fig. 6a we show the points with the most desirable objectives discovered during the optimization runs. Bootstrapped sampling paths leading from the initial (random) points to the best performing points are presented as projections on each of the three planes. Fig. 6b to Fig. 6d further detail the projected paths by supplementing the individually sampled points for each of the permutations.

For both permutations presented in Fig. 6, Chimera
successfully leads Phoenics to the region in objective space where all tolerances are satisfied. However, we observe differences in the sampling paths. While with permutation 2 Phoenics samples higher transfer efficiencies earlier on in the optimization procedure, the algorithm is biased towards first sampling lower energy gradients with permutation 5. The sampling paths displayed in Fig. 6 are in agreement with the order of hierarchies in the objectives for the two permutations. These differences in the sampling paths can be rationalized by the fact that high transfer efficiencies and low energy gradients are competing objectives, i.e. it is not possible to improve on both objectives with the same changes in the parameters.

Optimization traces for all permutations averaged over the 25 individual optimizations are reported in the supplementary information (see Sec. VII F). In accordance with previous results on the analytic benchmarks (see Sec. IV) and the auto-calibration of an automated experimentation platform (see Sec. V A), we find that excitonic systems satisfying the main objective are typically discovered within a few optimization iterations. Sub-objectives are then easily realized in cases where the first and the second objectives do not compete, e.g. permutation 4, where the first objective is the total distance and the second objective the energy gradient. However, if the first and the second objective do compete with each other (e.g. transfer efficiency and energy gradient in Fig. 6), Chimera gradually leads to improvements on the second objective without allowing for degradations in the first objective. This behavior is observed across all studied permutations. Chimera therefore implements the means to realize as many objectives as possible.

FIG. 6: Objective function values sampled in optimization runs with two different hierarchies in the objectives. Hierarchy order shown in green dots: (i) transfer efficiency, (ii) energy gradient, (iii) total distance. Hierarchy order shown in red triangles: (i) energy gradient, (ii) transfer efficiency, (iii) total distance. (A) Optimal points with respect to all objectives discovered during individual optimizations. Projections illustrate bootstrapped sampling paths leading to the best performing points. (B)-(D) Detailed illustration of projected sample traces. Arrows indicate the general paths taken by the optimization algorithm for the different hierarchy orders. More transparent points have been sampled earlier in the optimization procedure, and more opaque points have been sampled at a later stage. White regions indicate the target values for all considered objectives.

3. Deriving design choices

In the previous sections we observed that optimization algorithms strictly follow the implicit objective hierarchy in the ASF constructed by Chimera. As such, the excitonic systems sampled during the optimization procedure will achieve objectives in the order of the imposed hierarchy. We now study the excitonic systems sampled during the optimization procedures to retrieve design choices made by the algorithm in order to subsequently achieve the objectives in the imposed hierarchy.

Fig. 7 illustrates excitonic systems produced by optimization runs with the following hierarchy: (i) lower the energy gradient, (ii) maximize the transfer efficiency and (iii) increase the total distance covered by the excitonic system. Fig. 7A shows the average optimization traces, highlighting the portions where only the first objective is reached, the first and second objectives are reached, and all objectives are reached (Fig. 7A.I to Fig. 7A.III, respectively). Since both low energy gradients and large distances compete with high transport efficiency, only a few parameter points satisfy all three objectives.

Fig. 7B illustrates examples of parameters for excitonic systems matching the portions highlighted in Fig. 7A. The depicted excitonic systems are the earliest encountered sets of parameters in these portions. Arrows indicate both the location and the orientation of transition dipoles. Associated excited state energies for these sampled systems are presented in Fig. 7C.

For the sampled excitonic systems achieving the first objective (low energy gradient, Fig. 7I) we do not observe preferences regarding the distances between excitonic sites, orientations of transition dipoles or excited state energies for all but the last site. These observations are in accordance with the defined objective, as the energy gradient is only controlled by the excited state of the last site.

To subsequently achieve the second objective (high
transport efficiency, Fig. 7II) we observe a tendency of sampling shorter overall distances and excited state energies which are lower in magnitude. By further constraining the system to maximize the overall distance (Fig. 7III), transition dipoles are required to align. This sampling behavior provides empirical evidence about the influence of individual system parameters on the considered objectives.

FIG. 7: Results for the inverse-design of an excitonic system with (i) a low energy gradient, (ii) high transfer efficiency and (iii) a large total distance between the first and the last site. (A): optimization traces averaged over 25 individual optimization runs, indicating the average number of designed systems required to achieve one, two or all objectives. (B): illustrations of sampled excitonic systems achieving one, two or three objectives. Arrows represent transition dipoles with their location and orientation to the principal axis. (C): excited state energies of the systems depicted in (B). Overall energy gradients are reported in the legends.

Overall, we find that Chimera is well suited to approach inverse-design challenges and discover systems with desired properties, even if the properties of the system are determined by a larger number of parameters. In addition, the formulation of Chimera in terms of a hierarchy in the objectives allows us to study the systems sampled at different stages of the optimization procedure, when different objectives are achieved. As demonstrated on the example of designing excitonic systems in Fig. 7, general design choices can be identified empirically from the sampled systems.

VI. CONCLUSIONS

In this work we introduced Chimera, a novel achievement scalarizing function for multi-objective optimization problems. Chimera uses concepts of lexicographic methods to combine any n objectives into a single, smooth objective function based on a user-defined hierarchy in the objectives. Additionally, tolerances for acceptable ranges in these objectives can be provided prior to the optimization procedure. Chimera strictly follows the imposed hierarchy in the objectives and their associated tolerances. This avoids degradation of objectives upon improvement of objectives with lower importance along the hierarchy. Chimera contains a single hyperparameter τ controlling the degree of smoothness of the ASF. However, the performance of Chimera appears to be rather insensitive to the value of τ across several orders of magnitude. We nonetheless recommend τ = 10^−3 based on our benchmark results. When comparing to the formulation of other a priori methods, Chimera requires less prior information about the shapes of individual objectives, while providing the flexibility to reach any Pareto-optimal point in the Pareto front and keeping the number of objective evaluations to a minimum.

We assessed the performance of Chimera on well-established analytic benchmark sets for multi-objective optimization methods. Our results indicate that Chimera is well suited to predict the location of Pareto
optimal points following the provided preference information. Chimera provides additional flexibility by enabling various single-objective optimization algorithms to efficiently run on top of the constructed ASF. In comparison to the general-purpose constrained ASF suggested by Walker et al.,26 we find that Chimera enables optimization algorithms to identify Pareto-optimal points in fewer objective function evaluations while requiring less detailed knowledge about the objective surfaces.

We further illustrated the capabilities of Chimera on two different applications involving up to ten independent parameters: the auto-calibration of a robotic sampling sequence for direct injection, and an inverse-design problem for excitonic systems. The auto-calibration application revealed that Chimera always aims to achieve as many objectives as possible following the provided hierarchy, and does not improve on sub-objectives if this would imply degradations of the main objective. This observation is also confirmed by the excitonics application. In addition, we found that the permutations in the objective hierarchies allow design principles to be deduced from the sampled parameters. This can find important applications in molecular and structural design with tailored properties. Furthermore, it allows us to understand the influence of distinct features on the global properties of the system.

With the versatile formulation of Chimera and its low requirements on a priori available information, Chimera is readily applicable to problems beyond the scope of the two presented illustrations. We envision Chimera to be successfully used in scenarios where slow merit-evaluation processes, such as involved computations or experimentation, most notably in chemistry and materials science, present a challenge to other methods. Moreover, Chimera enables the use of single-objective optimization algorithms and quickly determines conditions yielding the desired merit. As such, Chimera constitutes an important step towards the deployment of self-optimizing reactors and self-driving laboratories, as it provides an approach to overcome the identified constraints: (i) objective evaluations involve time-consuming and costly experimentation, and (ii) no prior knowledge about the objective functions is available.

In summary, we suggest that researchers in automation, and more generally in multi-objective optimization, test and/or employ Chimera for Pareto problems when evaluations of the objectives are expensive and no prior information about the experimental response is available.

Acknowledgments

We thank Dr. Christoph Kreisbeck for helpful comments and fruitful discussions on the excitonics application. F.H. was supported by the Herchel Smith Graduate Fellowship. L.M.R. and A.A.G. were supported by the Tata Sons Limited - Alliance Agreement (A32391). F.H., L.M.R. and A.A.G. acknowledge financial support from Dr. Anders Frøseth. All computations reported in this paper were completed on the Odyssey cluster supported by the FAS Division of Science, Research Computing Group at Harvard University.


VII. SUPPLEMENTARY INFORMATION

A. Benchmark functions

The influence of the smoothing parameter τ on the shape of the achievement scalarizing function has been demonstrated on a set of three one-dimensional objectives, presented in Eqs. 9–11. Note that all objectives were considered on the interval x ∈ [−1, 5]. The three objectives are also shown in Fig. 1 in the main text (see Sec. III A).


f_0(x) =
\begin{cases}
-2x + 1 & \text{if } x \leq 1 \\
x - 2 & \text{if } 1 < x \leq 3 \\
7 - 2x & \text{if } 3 < x \leq 3.5 \\
2x - 7 & \text{if } 3.5 < x
\end{cases}
\qquad (9)

f_1(x) = 1 - 2\exp\left[-(x - 2.5)^2\right] \qquad (10)

f_2(x) = \left[(5 - x)^2 + \exp(x - 2)\right]/100 \qquad (11)
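The three objectives can also be implemented directly; a minimal NumPy sketch (function names are ours, mirroring Eqs. 9–11):

```python
import numpy as np

def f0(x):
    # Piecewise-linear objective of Eq. 9
    x = np.asarray(x, dtype=float)
    return np.select(
        [x <= 1, x <= 3, x <= 3.5],
        [-2 * x + 1, x - 2, 7 - 2 * x],
        default=2 * x - 7,
    )

def f1(x):
    # Inverted Gaussian centered at x = 2.5 (Eq. 10)
    x = np.asarray(x, dtype=float)
    return 1 - 2 * np.exp(-(x - 2.5) ** 2)

def f2(x):
    # Shifted parabola with an exponential tail (Eq. 11)
    x = np.asarray(x, dtype=float)
    return ((5 - x) ** 2 + np.exp(x - 2)) / 100
```

The conditions in `np.select` are evaluated in order, so each input falls into the first matching branch, reproducing the four intervals of Eq. 9.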

In addition, we benchmarked the performance of Chimera on six well-established analytic sets of objective functions. All of these functions are defined on two-dimensional parameter spaces, and the benchmark sets comprise two or three different objectives. Contour plots of all objective functions, as well as the locations of the global minimum of each objective, are presented in Fig. 8. Note that for all objective functions the parameter space has always been rescaled to the unit square [0, 1]² and the objective function values were rescaled to the [0, 1] interval. Python implementations of all objective functions are provided on GitHub.1

FIG. 8: Contour plots of the employed benchmark functions

For all benchmark optimization procedures we chose a particular set of tolerances and limits on the objective functions in each benchmark set. The tolerances and limits used throughout all optimization runs are listed in Tab. III. The goal of all optimization procedures on the analytic benchmark sets is to minimize each individual objective given the defined hierarchies and tolerances.

B. Influence of the smoothing parameter on optimization behavior

In this section we study the influence of the value of the smoothing parameter τ on the behavior of the optimization
procedure. On the one hand, smoothing parameters which are too small yield a rather rough ASF, which could be
more challenging for the optimization algorithm. On the other hand, smoothing parameters which are too large

Benchmark set   Objective   Tolerance   Limit

Ackley          f0          50 %        1.808
                f1          20 %        2.616
Fonseca         f0          50 %        0.500
                f1          50 %        0.873
Viennet         f0          20 %        1.650
                f1          12 %        17.296
                f2          20 %        −0.136
ZDT1            f0          25 %        2.653
                f1          25 %        0.150
ZDT2            f0          25 %        2.980
                f1          25 %        0.150
ZDT3            f0          25 %        2.391
                f1          25 %        0.150

TABLE III: Tolerances and limits for all analytic benchmark sets used for assessing the performance of Chimera.

might smoothen the objectives too much, such that the location of the global optimum of the ASF is shifted away from the Pareto optimal point (see Sec. III A). To test this hypothesis, we ran Phoenics on the set of one-dimensional objectives presented in Sec. VII A with tolerances of 30 % on objective 0, 40 % on objective 1, and 50 % on objective 2, in combination with different choices of the smoothing parameter τ.

To assess the influence of the value of the smoothing parameter τ on the optimization behavior, we consider an
ensemble of 25 individual optimization runs for each smoothing parameter value. For each parameter value, we count
how many of the individual optimization runs found parameter points for which all objectives meet the specified
tolerances. Results are reported in Fig. 9 for a total of 400 optimization iterations in each run.

FIG. 9: Rates of successful optimizations on the one dimensional benchmark set out of 25 individual optimization runs conducted
with Phoenics on Chimera with different values of the smoothing parameter τ (see legend). Larger values of τ generally show
lower success rates later on in the optimization procedures.

In addition to the success rates, we report average optimization traces in Fig. 10, where we depict the average
closest achieved objective function values for all three objectives over the progress of the optimization. The target
values for all three objectives are indicated via dashed lines. Background colors in the plots indicate whether the currently best performing parameter set violates objective 0 (red), objective 1 (yellow), objective 2 (blue), or none of the constraints (white).

FIG. 10: Influence of the value of the smoothing parameter τ on the convergence of the optimization run. Reported traces
have been averaged over 25 individual optimization runs with different random seeds. We do not observe a strong dependence
of the optimization behavior on the smoothing parameter.

We observe that all three objectives are quickly achieved by all optimization runs if the smoothing parameter τ is small (τ ≤ 0.03). At the same time, we observe that smoothing parameter values slightly above zero allow for faster convergence towards the target objective values. Based on these observations, we chose to apply a smoothing value of τ = 0.001 in all other optimization procedures presented in this work.
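The qualitative role of τ can be illustrated with a generic logistic relaxation of a hard step (an illustrative sketch only; Chimera's actual soft-step construction is defined in the main text and is not reproduced here):

```python
import math

def smooth_step(x, tau):
    # Logistic relaxation of the Heaviside step: as tau -> 0 the
    # threshold becomes sharp, while large tau over-smooths it and
    # can shift the apparent location of features in the ASF.
    return 1.0 / (1.0 + math.exp(-x / tau))
```

For τ = 0.001 the step is effectively hard on the scale of the benchmark objectives, consistent with the value adopted above, while τ on the order of the objective range visibly blurs the threshold.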

C. Analytic benchmarks

We benchmarked Chimera by running a number of optimization algorithms based on different methods on six
well-established analytic benchmark sets introduced in Sec. VII A. Details of the benchmark studies are provided in
the main text (see Sec. IV B). In this section, we report the complete results of this benchmark study.

1. Performance after completion of the optimization procedures

In Fig. 11, we report the average smallest achieved relative deviation of objectives from Pareto optimal objectives
for the remaining four benchmark sets: ZDT1, ZDT2, ZDT3 and Ackley. Benchmark results on the Fonseca and
the Viennet sets are reported in the main text (see Sec. IV B). Four different optimization algorithms (grid search,
CMA-ES, spearmint, and Phoenics; see Sec. II for details) have been run on both Chimera and c-ASF for a total of
100 optimization iterations.

In accordance with the results on the Fonseca and the Viennet benchmark sets (see Sec. IV B), we find that Chimera leads the optimization algorithms closer to the Pareto optimal values within the same number of function evaluations.

FIG. 11: Average smallest deviations from Pareto optimal points achieved by the four studied optimization algorithms on the
remaining benchmark sets. Results are averaged over 25 independent runs executed for 100 iterations each.

2. Performance during the optimization procedures

In addition to the average achieved smallest deviations at the end of the optimization procedures in Sec. VII C 1
we also report the traces of achieved objectives over the duration of the optimization procedure. Fig. 12 depicts the
objective traces for all six benchmark functions after every iteration of the optimization procedures. Uncertainty
bands highlight the 68 % confidence interval computed from the 25 repetitions of each optimization run.

FIG. 12: Optimization traces depicting the smallest deviations between sampled objectives and Pareto optimal objectives
averaged over 25 optimization runs with the four employed optimization algorithms on Chimera and c-ASF. Uncertainty bands
highlight the 68 % confidence interval computed from the 25 repetitions of each optimization run.

Overall, we observe that Chimera enables the studied optimization algorithms to approach the Pareto optimal values faster from the very beginning of the optimization procedure. The fact that tolerances are defined with respect to the currently observed minima and maxima of the objectives therefore does not seem to significantly delay the optimization procedure.

Furthermore, as reported in the main text (see Sec. IV B), we find that Chimera samples points in accordance with the imposed hierarchy, i.e., improvements on the sub-objective are not realized if they come along with degradations of the main objective. c-ASF, in contrast, sometimes violates the hierarchy in this way.

D. Virtual model of the N9

In this section we detail the construction of a probabilistic model to reproduce and interpolate the experimental results obtained from autonomous calibrations of a robotic sampling sequence for direct-injection HPLC analysis.2 Provided sufficient coverage of the space of experimental parameters, the trained probabilistic model can be used to query experimental results for any set of experimental conditions without running the experiment.

1. Dataset of experimental results

The experimental procedure consists of the characterization of an unknown chemical sample via high-performance liquid chromatography (HPLC). The calibration of the setup involves the optimization of six experimental parameters. For each experiment, the response of the HPLC as well as the execution time of the experiment can be measured. Details of the experimental procedure are described elsewhere.2

An autonomous sampling sequence of the experimental procedure allowed for the efficient execution of a total of 1,500 individual experiments to acquire experimental outcomes for given experimental conditions. Parameters for the experimental procedure were generated by sampling the parameter space uniformly, ensuring uniform and uncorrelated coverage. Experimental results are depicted in Fig. 13.

FIG. 13: Experimental results of individual auto-calibration experiments on the robotic sampling sequence. Panel (A): Achieved
peak areas for different parameter choices. Panel (B): Achieved execution times for different parameter choices.

2. Training a Bayesian neural network

We construct a test set by randomly sampling 10 % of all points in the dataset. From the remaining 90 % of the dataset, we select the most diverse 80 % for the training set based on a principal component analysis (PCA), following a procedure reported in the literature.3 The remaining 10 % of the dataset is used as a validation set for early stopping.
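A diversity-based split of this kind can be sketched as a greedy max–min selection in PCA space (our illustrative reconstruction; the exact selection procedure is the one of Ref. 3, and the function name is ours):

```python
import numpy as np

def diverse_subset(X, fraction=0.8, seed=0):
    # Project the data onto its principal components, then greedily
    # pick the point farthest from the already selected set
    # (max-min criterion) until the requested fraction is reached.
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # PCA via SVD; 'scores' are coordinates in the principal-component basis
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt.T
    n_select = int(round(fraction * len(X)))
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(X)))]
    min_dist = np.linalg.norm(scores - scores[selected[0]], axis=1)
    while len(selected) < n_select:
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(scores - scores[nxt], axis=1)
        )
    return np.array(selected)
```

Because the minimum distance to the selected set is zero at already chosen points, the argmax never revisits them, so the returned indices are distinct.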

The probabilistic model was set up as a fully connected Bayesian neural network with three layers and 192 neurons per layer. Distributions of weights and biases were adapted via variational expectation-maximization, which was carried out in Edward, version 1.3.5,4 with the Adam optimization algorithm,5 a learning rate of 10^{−2.5}, and 100 randomly chosen training points per batch.

For the peak area, one of the experimental results to reproduce, we observe that some of the experimentally obtained values are exactly zero, but never negative. To incorporate these features in our model, we employ a modified version of the leaky ReLU activation function, as shown in Eq. 12. The modification to the traditional leaky ReLU consists of splitting the activation function into three piece-wise linear parts. While positive inputs x > 0 are processed just as in traditional ReLU or leaky ReLU activation functions, slightly negative inputs are mapped onto zero, and large negative inputs are mapped onto a linear function with small slope.


f(x) =
\begin{cases}
x & \text{if } 0 < x \\
0 & \text{if } -dx < x \leq 0 \\
\alpha x & \text{if } x \leq -dx
\end{cases}
\qquad (12)

We chose α = 0.1 and dx = 2. With this choice of the leakage parameter, the activation function is flat for
inputs x ∈ [−2, 0]. Weights and biases were regularized via a Laplacian prior, corresponding to L1 regularization in
traditional neural networks. Every 200 training epochs we computed the prediction accuracy on the training and the
validation set by sampling predictions from 200 network instances. Network training was aborted if the prediction
error on the validation set was found to either increase or to be twice as large as the prediction error on the training set.
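Eq. 12 with the chosen parameters α = 0.1 and dx = 2 can be written compactly as follows (a sketch; the function name is ours):

```python
import numpy as np

def shifted_leaky_relu(x, alpha=0.1, dx=2.0):
    # Three-piece activation of Eq. 12: identity for positive inputs,
    # a flat zero region on (-dx, 0], and a small linear leak below -dx.
    x = np.asarray(x, dtype=float)
    return np.select([x > 0, x > -dx], [x, 0.0], default=alpha * x)
```

The flat region lets the network output exact zeros for slightly negative pre-activations, matching the observed zero-valued peak areas, while the leak below −dx keeps gradients nonzero for strongly negative inputs.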

Fig. 14 illustrates the prediction accuracies for the peak areas and execution times obtained by averaging over 200 samples after completion of the training procedure. Prediction errors for the two experimental results are reported in Tab. IV. The correlations between predicted and target properties are 97.8 % for the peak areas and 99.7 % for the execution times.

Dataset      HPLC response [a.u.]   Execution time [s]

Training     117.16                 2.20
Validation   147.04                 1.90
Test         193.64                 1.75

TABLE IV: RMSDs of experimental results predicted by the trained Bayesian neural network from the actual experimental
results for all three datasets.

E. Optimizations on periodic domains

The inverse-design problem of finding excitation energy transfer systems discussed in the main text (see Sec. V B)
involves a total of ten independent parameters. Four of these parameters describe the orientation of transition dipoles
with respect to a principal axis, expressed in terms of an angle ϕ ∈ [0, 2π]. The orientation of the transition dipoles
is periodic, which imposes a constraint on the response surface of objectives with respect to these parameters. This
constraint can be taken into account when constructing approximations to response surfaces during the optimization
procedure. Indeed, by accounting for this periodicity constraint, a more accurate approximation to the response
surface can be found, which has the potential to determine the location of the global minimum in fewer optimization
iterations.

FIG. 14: Predictions of Bayesian neural networks compared to experimental results on the N9 system. Panel (A) displays
scatter plots for comparing peak areas, and panel (B) shows results for execution times. The lines of perfect agreement are
represented in black. Pearson correlation coefficients ρ are reported for both properties.

In this section, we demonstrate how Phoenics can be expanded to account for periodic boundary conditions on the
parameter domain. Phoenics constructs approximations to an objective function by estimating the kernel density of
observed parameter points and reweighting those by the corresponding observed objective function value.6 Kernel
densities pk are estimated via a Bayesian neural network as shown in Eq. 13. The Bayesian neural network is used
to sample random variables φ3 in the parameter domain based on previously observed parameter points xk. Ref. 6 provides a detailed description of the construction of the kernel densities.

p_k(x) = \left\langle \sqrt{\frac{\tau_n}{2\pi}} \, \exp\!\left[ -\frac{\tau_n}{2} \bigl( x - \varphi_3(\theta; x_k) \bigr)^2 \right] \right\rangle_{\text{BNN}} . \qquad (13)
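In practice, the BNN expectation in Eq. 13 is approximated by a Monte Carlo mean over sampled values of φ3; a one-dimensional sketch:

```python
import numpy as np

def kernel_density(x, phi_samples, tau):
    # Monte Carlo estimate of Eq. 13: average Gaussian kernels of
    # precision tau over samples phi_samples drawn from the BNN.
    phi_samples = np.asarray(phi_samples, dtype=float)
    k = np.sqrt(tau / (2 * np.pi)) * np.exp(-0.5 * tau * (x - phi_samples) ** 2)
    return float(k.mean())
```

Each sample contributes a normalized Gaussian kernel of precision τ_n centered at the sampled φ3, so the estimate concentrates around previously observed parameter points.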

Importantly, the construction of the kernel densities pk at an arbitrary point x ∈ R^d in the parameter domain depends on the distance d(x, φ3(θ; xk)) = ‖x − φ3(θ; xk)‖ between this parameter point x and the random variable φ3(θ; xk) sampled from the Bayesian neural network. We now consider a scenario where the objective f is periodic with periodicity P, i.e. f(x) = f(x + P) for all x ∈ R^d. A periodicity constraint on the parameter domain can then be formulated by replacing this distance d(x, φ3(θ; xk)) with a periodic distance dperiodic(x, φ3(θ; xk)).

Computing the periodic distance from all periodic images of the kernel density is computationally costly. As a compromise between computational demand and the accuracy of the periodicity constraint, we only account for the nearest periodic images and neglect higher-order periodic images. This approximation becomes more accurate with more optimization iterations, as the precision τn increases.
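Restricting to nearest periodic images amounts to the minimum-image convention; a sketch of the replacement for the raw difference in Eq. 13 (the function name is ours):

```python
import numpy as np

def periodic_difference(x, phi, period=1.0):
    # Minimum-image (nearest periodic image) difference: the raw
    # difference is wrapped into the interval [-period/2, period/2).
    d = (np.asarray(x, dtype=float) - phi) % period
    return np.where(d >= period / 2, d - period, d)
```

Substituting this wrapped difference for x − φ3(θ; xk) in the kernel construction makes each kernel respect the periodicity with respect to its nearest images.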

We illustrate the construction of periodic objective function approximations on a one-dimensional example. The considered objective function f consists of the product of two cosine functions, as shown in Eq. 14, with a period of P = 1. Phoenics was used to determine the location of the global minimum of this function within the interval x ∈ [0, 1] by constructing the approximation with and without periodicity support. Note that the global minimum is located at x∗ = 0.05.

f(x) = -\cos\bigl(\pi(x - 0.05)\bigr)\,\cos\bigl(3\pi(x - 0.05)\bigr) \qquad (14)
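Eq. 14 can be checked directly: the function has period P = 1, and its global minimum on [0, 1] lies at x∗ = 0.05:

```python
import math

def objective(x):
    # Product of two cosines (Eq. 14); shifting x by one full period
    # flips the sign of both factors, so the product is 1-periodic.
    return -math.cos(math.pi * (x - 0.05)) * math.cos(3 * math.pi * (x - 0.05))
```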

Fig. 15 shows the approximations constructed for the objective function after two, five, and eight optimization iterations, with and without periodicity support. We find that the optimization runs without periodicity support tend to sample the objective function at large values of x, and only discover the location of the global minimum at small values of x after several optimization iterations. In contrast, the optimization procedure supporting periodicity in the objective function discovers the location of the global minimum within far fewer iterations, and needs fewer observations to construct reasonable approximations to the objective function.

FIG. 15: Optimization runs using Phoenics on a periodic one-dimensional objective function. Upper panels depict the approxi-
mation to the objective function constructed by Phoenics without accounting for the periodicity of the objective. Lower panels,
in contrast, depict the approximations constructed with periodicity taken into account.

F. Excitonics application

Here we present the average optimization traces for all studied objective hierarchy permutations of the excitonics
application. While the order of the objectives in the hierarchy was changed between different permutation runs, all
other parameters, such as tolerances, were kept the same. Fig. 16 shows optimization traces for all six permutations
of hierarchies averaged over 25 individual optimization runs. Optimization traces are sorted by hierarchy from top
to bottom for all permutations.

FIG. 16: Optimization traces for the three objectives in the excitonics application averaged over 25 individual optimization
runs for all six possible permutations. Top panels present the optimization traces for the main objectives in each permutation,
central panels the optimization traces for the sub-objective and bottom panels the traces for the least important objective.
Dashed black lines indicate the lower/upper limits on each of the objectives.

