Automated Algorithm Configuration and Parameter Tuning

Holger H. Hoos

Department of Computer Science, University of British Columbia, 2366 Main Mall, Vancouver, BC, V6T 1Z4, Canada, e-mail: [email protected]

3.1 Introduction

1 We note that the use of heuristic techniques does not imply that the resulting algorithms are necessarily incomplete or do not have provable performance guarantees, but often results in empirical performance far better than the bounds guaranteed by rigorous theoretical analysis.
efficacy with which a heuristic algorithm solves a given problem instance or class of
problem instances. For example, the run-time of CPLEX 12.1 – a widely used, com-
mercial solver for mixed integer programming problems – has recently been demon-
strated to vary by up to a factor of over 50 with the settings of 76 user-accessible
parameters [42].
A problem routinely encountered by designers as well as end users of parame-
terised algorithms is that of finding parameter settings (or configurations) for which
the empirical performance on a given set of problem instances is optimised. For-
mally, this algorithm configuration or parameter tuning problem can be stated as
follows:
Given
• an algorithm A with parameters p1 , . . . , pk that affect its behaviour,
• a space C of configurations (i.e., parameter settings), where each config-
uration c ∈ C specifies values for A’s parameters such that A’s behaviour
on a given problem instance is completely specified (up to possible ran-
domisation of A),
• a set of problem instances I,
• a performance metric m that measures the performance of A on instance
set I for a given configuration c,
find a configuration c∗ ∈ C that results in optimal performance of A on I ac-
cording to metric m.
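To make this problem statement concrete, the following sketch (in Python) evaluates a tiny, fully discretised configuration space by brute force; the toy target algorithm, instance set and metric are purely illustrative placeholders and not part of any existing configurator.

import random
from itertools import product
from statistics import mean

def target_algorithm(config, instance, seed):
    """Hypothetical randomised target algorithm A(c): returns a run-time-like cost
    (lower is better) that depends on the configuration and the instance."""
    rng = random.Random(seed)
    cost = abs(config["alpha"] - 0.3 * instance) + (0.0 if config["restart"] else 0.5)
    return cost + rng.uniform(0.0, 0.1)

# Configuration space C: Cartesian product of discretised parameter domains.
space = [dict(alpha=a, restart=r) for a, r in product([0.1, 0.3, 0.5, 0.9], [True, False])]
instances = [1, 2, 3, 4, 5]          # instance set I (toy numeric "instances")
metric = mean                        # performance metric m: mean cost over I

def evaluate(config, n_seeds=3):
    """m(A(c), I): average performance of configuration c over all instances and seeds."""
    return metric(target_algorithm(config, i, s) for i in instances for s in range(n_seeds))

best = min(space, key=evaluate)      # brute-force search for c* (only feasible for tiny C)
print("c* =", best, "with performance", round(evaluate(best), 3))

For realistically sized configuration spaces such exhaustive evaluation quickly becomes infeasible, which motivates the procedures discussed in the remainder of this chapter.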
In the context of this problem, the algorithm whose performance is to be optimised
is often called the target algorithm, and we use A(c) to denote target algorithm A
under a specific configuration c. The set of values any given parameter p can take
is called the domain of p. Depending on the given target algorithm, various types of
parameters may occur. Categorical parameters have a finite, unordered set of dis-
crete values; they are often used to select from a number of alternative mechanisms
or components. Using Boolean parameters, heuristic mechanisms can be activated
or deactivated, while the behaviour and interaction of these mechanisms is often controlled by integer- and real-valued parameters (the former being a special case of ordinal parameters, whose domains are discrete and ordered). Conditional
parameters are only active when other parameters are set to particular values; they
routinely arise in the context of mechanisms that are activated or selected using some
parameter, and whose behaviour is then controlled by other parameters (where the
latter parameters conditionally depend on the former). Sometimes, it is useful to
place additional constraints on configurations, e.g., to exclude certain combinations
of parameter values that would lead to ill-defined, incorrect or otherwise undesirable
behaviour of a given target algorithm.
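The following sketch illustrates one possible plain-Python representation of such a mixed configuration space, including conditional parameters and a constraint that forbids a particular combination; the parameter names and the solver they allude to are hypothetical, not taken from any existing tool.

import random

# Hypothetical declarative description of a configuration space with mixed parameter
# types; 'active_if' marks conditional parameters, and violates_constraints() below
# excludes forbidden combinations of parameter values.
PARAMS = {
    "heuristic":   {"type": "categorical", "domain": ["novelty", "gsat", "walksat"]},
    "use_tabu":    {"type": "boolean",     "domain": [True, False]},
    "tabu_length": {"type": "integer",     "domain": (1, 50),
                    "active_if": lambda c: c.get("use_tabu") is True},
    "noise":       {"type": "real",        "domain": (0.0, 1.0),
                    "active_if": lambda c: c.get("heuristic") != "gsat"},
}

def violates_constraints(config):
    # Example constraint: forbid the combination of 'gsat' with tabu search.
    return config.get("heuristic") == "gsat" and config.get("use_tabu")

def sample_configuration(rng=random):
    """Sample a random configuration, only assigning values to active parameters."""
    config = {}
    for name, spec in PARAMS.items():
        if "active_if" in spec and not spec["active_if"](config):
            continue  # conditional parameter is inactive under the current partial config
        domain = spec["domain"]
        if spec["type"] in ("categorical", "boolean"):
            config[name] = rng.choice(domain)
        elif spec["type"] == "integer":
            config[name] = rng.randint(*domain)
        else:  # real-valued
            config[name] = rng.uniform(*domain)
    if violates_constraints(config):
        return sample_configuration(rng)  # resample forbidden configurations
    return config

print(sample_configuration())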
Clearly, the number and types of parameters, along with the occurrence of con-
ditional parameters and constraints on configurations, determine the nature of the
configuration space C and have profound implications on the methods to be used
for finding performance-optimising configurations within that space. These meth-
ods range from well-known numerical optimisation procedures, such as the Nelder-
Mead Simplex algorithm [49, 13] or the more recent gradient-free CMA-ES algo-
rithm [25, 27, 26], to approaches based on experimental design methods (see, e.g., [11, 5, 1]), response-surface models (see, e.g., [44, 7]) and stochastic local search procedures (see, e.g., [36, 37]).
In general, when configuring a specific target algorithm, it is desirable to find
parameter configurations that work well on problem instances other than those in
the given instance set I. To this end, care needs to be taken in selecting the instances
in I to be representative of the kinds of instances to which the optimised target
algorithm configuration is expected to be applied. Difficulties can arise when I is
small, yet contains very different types of instances. To recognise situations in which
a configured target algorithm, A(c∗ ), fails to perform well when applied to instances
other than those used in the configuration process, it is advisable to test it on a set
of instances not contained in I; this can be done by including in I only part of the
overall set of instances available, or by means of cross-validation.
It is also advisable to investigate performance variation of A(c∗ ) over instance set
I, since, depending on the performance metric m used for the configuration of A and
differences between instances in I, the optimised configuration c∗ might represent a
trade-off between strong performance on some instances and weaker performance
on others. In particular, when using a robust statistic, such as median run-time, as
a performance metric, poor performance on large parts of a given instance set can
result. To deal effectively with target algorithm runs in which no solution was pro-
duced (in particular, time-outs encountered when optimising run-time), it is often
useful to use a performance metric based on penalised averaging, in which a fixed
penalty is assigned to any unsuccessful run of A (see also [37]).
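As an illustration, the following sketch (assuming run results are available as (solved, run-time) pairs) computes such a penalised average run-time, in which every unsuccessful run is counted as a fixed multiple of the cutoff time – the widely used PAR-style convention; the penalty factor of 10 is only an example.

def penalised_average_runtime(results, cutoff, penalty_factor=10):
    """Penalised averaging: unsuccessful runs (time-outs) are counted as a fixed
    penalty, here penalty_factor * cutoff (the common 'PAR-10' convention).

    results: list of (solved, runtime) pairs, one per target algorithm run."""
    scores = [runtime if solved else penalty_factor * cutoff
              for solved, runtime in results]
    return sum(scores) / len(scores)

# Example: three successful runs and one time-out with a 60-second cutoff.
runs = [(True, 2.5), (True, 10.0), (False, 60.0), (True, 1.2)]
print(penalised_average_runtime(runs, cutoff=60))   # (2.5 + 10.0 + 600 + 1.2) / 4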
In the existing literature, the terms algorithm configuration and parameter tuning
are often used interchangeably. We prefer to use parameter tuning in the context of
target algorithms with relatively few parameters with mostly real-valued domains,
and algorithm configuration in the context of target algorithms with many categor-
ical parameters. Following Hoos [28], we note that algorithm configuration prob-
lems arise when dealing with an algorithm schema that contains a number of instan-
tiable components (typically, subprocedures or functions), along with a discrete set
of concrete choices for each of these. While most standard numerical optimisation
methods are not applicable to these types of algorithm configuration problems, F-
Race [11, 5], Calibra [1] and ParamILS [36, 37] have been used successfully in this
context. However, so far only ParamILS has been demonstrated to be able to deal
with the vast design spaces resulting from schemata with many independently in-
stantiable components (see, e.g., [45, 66]), and promising results have been achieved
by a genetic programming procedure applied to the configuration of local search
algorithms for SAT [18, 19], as well as by a recent gender-based genetic algorithm
[2].
In the remainder of this chapter, we discuss three classes of methods for solving
algorithm configuration and parameter tuning problems. Racing procedures itera-
tively evaluate target algorithm configurations on problem instances from a given
set and use statistical hypothesis tests to eliminate candidate configurations that
are significantly outperformed by other configurations; ParamILS uses a powerful
stochastic local search (SLS) method to search within potentially vast spaces of candidate configurations; and sequential model-based optimisation methods construct predictive models of target algorithm performance and use these models to guide the search for good configurations.

3.2 Racing Procedures
Given a number of candidate solvers for a given problem, the concept of racing is
based on a simple yet compelling idea: sequentially evaluate the candidates on a
series of benchmark instances and eliminate solvers as soon as they have fallen too
far behind the current leader, i.e., the candidate with the overall best performance at
a given stage of the race.
Racing procedures were originally introduced for solving model selection prob-
lems in machine learning. The first such technique, dubbed Hoeffding Races [48],
was introduced in a supervised learning scenario, where a black-box learner is eval-
uated by measuring its error on a set of test instances. The key idea is to test a
given set of models, one test instance at a time, and to discard models as soon as
they are shown to perform significantly worse than the best ones. Performance is
measured as error over all test instances evaluated so far, and models are eliminated
from the race using non-parametric bounds on the true error, determined based on
Hoeffding’s inequality (which gives an upper bound on the probability of the sum
of random variables deviating from its expected value). More precisely, a model is
discarded from the race if the lower bound on its true error (for a given confidence
level 1 − δ ) is worse than the upper bound on the error of the currently best model.
As a result, the computational effort expended in evaluating models becomes in-
creasingly focussed on promising candidates, and the best candidate models end up
getting evaluated most thoroughly.
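A minimal sketch of this elimination rule is given below, assuming error values bounded between 0 and 1; the confidence radius follows Hoeffding's inequality, and the model names and error data are made up purely for illustration.

import math

def hoeffding_radius(n, delta, value_range=1.0):
    """Half-width of a (1 - delta) confidence interval for a mean of n observations
    bounded within value_range, from Hoeffding's inequality."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def surviving_models(errors_so_far, delta=0.05):
    """errors_so_far: dict mapping model name -> list of per-instance errors in [0, 1].
    A model is eliminated if the lower bound on its true error exceeds the upper
    bound on the error of the currently best (lowest mean error) model."""
    means = {m: sum(e) / len(e) for m, e in errors_so_far.items()}
    radii = {m: hoeffding_radius(len(e), delta) for m, e in errors_so_far.items()}
    best = min(means, key=means.get)
    best_upper = means[best] + radii[best]
    return [m for m in means if means[m] - radii[m] <= best_upper]

errors = {"model_a": [0.10, 0.12, 0.08] * 50,   # 150 test instances each
          "model_b": [0.35, 0.40, 0.38] * 50,
          "model_c": [0.11, 0.15, 0.09] * 50}
print(surviving_models(errors))                  # model_b falls out of the race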
This idea can be easily transferred to the problem of selecting an algorithm from
a set of candidates, where each candidate may correspond to a configuration of a pa-
rameterised algorithm [11]. In this context, candidate algorithms (or configurations)
are evaluated on a given set of problem instances. As in the case of model selection,
the race proceeds in steps, where in each step every candidate is evaluated on the
same instance, taken from the given instance set, and candidates that performed sig-
nificantly worse on the instances considered so far are eliminated from the race. (We
note that the evaluation of candidates in each step can, in principle, be performed
independently in parallel.)
This procedure requires that the set of candidate algorithms be finite and, since
in the initial steps of a race all candidates will need to be evaluated, of somewhat
reasonable size. Therefore, when applied to algorithm configuration or parameter
tuning scenarios with continuous parameters, racing approaches need to make use
of discretisation or sampling techniques. In the simplest case, all continuous param-
eters are discretised prior to starting the race. Alternatively, stages of sampling and
racing can be interleaved, such that the candidate configurations being considered
become increasingly concentrated around the best performing configurations.
In the following, we will first present the F-Race procedure of Birattari et al.
[11] in more detail and outline its limitations. We will then discuss variations of
F-Race that overcome those weaknesses [5, 12], and finally summarise some results
achieved by these racing procedures in various algorithm configuration scenarios.
3.2.1 F-Race
The F-Race algorithm by Birattari et al. [11] closely follows the previously dis-
cussed racing procedure. Similarly to Hoeffding races, it uses a non-parametric
test as the basis for deciding which configurations to eliminate in any given step.
However, rather than just performing pairwise comparisons with the currently best
configuration (the so-called incumbent), F-Race first uses the rank-based Friedman
test (also known as Friedman two-way analysis of variance by ranks) for ni inde-
pendent s-variate random variables, where s is the number of configurations still in
the race, and ni is the number of problem instances evaluated so far. The Friedman
test assesses whether the s configurations show no significant performance differ-
ences on the ni given instances; if this null hypothesis is rejected, i.e., if there is
evidence that some configurations perform better than others, a series of pairwise
procedure F-Race
input target algorithm A, set of configurations C, set of problem instances I,
performance metric m;
parameters integer ni_min;
output set of configurations C∗ ;
C∗ := C; ni := 0;
repeat
randomly choose instance i from set I;
run all configurations of A in C∗ on i;
ni := ni + 1;
if ni ≥ ni_min then
perform rank-based Friedman test on results for configurations in C∗ on all instances
in I evaluated so far;
if test indicates significant performance differences then
c∗ := best configuration in C∗ (according to m over instances evaluated so far);
for all c ∈ C∗ \ {c∗ } do
perform pairwise Friedman post hoc test on c and c∗ ;
if test indicates significant performance differences then
eliminate c from C∗ ;
end if;
end for;
end if;
end if;
until termination condition met;
return C∗ ;
end F-Race
Fig. 3.1: Outline of F-Race for algorithm configuration (original version, according to [11]). In typical applications, ni_min is set to values between 2 and 5; further details are explained in the text. When used on its own, the procedure would typically be modified to return c∗ ∈ C∗ with the best performance (according to m) over all instances evaluated within the race
post hoc tests between the incumbent and all other configurations is performed. All
configurations found to have performed significantly worse than the incumbent are
eliminated from the race. An outline of the F-Race procedure for algorithm config-
uration, as introduced by [11], is shown in Figure 3.1; as mentioned by [5], runs on
a fixed number of instances are performed before the Friedman test is first applied.
The procedure is typically terminated either when only one configuration remains,
or when a user-defined time budget has been exhausted.
The Friedman test involves ranking the performance results of each configura-
tion on a given problem instance; in the case of ties, the average of the ranks that
would have been assigned without ties is assigned to each tied value. The test then
determines whether some configurations tend to be ranked better than others when
considering the rankings for all instances considered in the race up to the given iter-
ation. Following Birattari et al. [11], we note that performing the ranking separately
for each problem instance amounts to a blocking strategy on instances. The use of
this strategy effectively reduces the impact of noise effects that may arise from the performance variation observed over the given instance set for any configuration of the target algorithm under consideration; this can become critical when those performance variations are large, as has been observed for many algorithms for various hard combinatorial problems (see, e.g., [21, 29]).
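The following sketch (in Python, relying on scipy for the Friedman test) illustrates the overall race loop of Figure 3.1; for simplicity, the pairwise post hoc comparisons are approximated here with Wilcoxon signed-rank tests rather than the Friedman post hoc tests used in F-Race proper, and the toy configurations and cost function are purely illustrative.

import random
from scipy.stats import friedmanchisquare, wilcoxon

def f_race(configs, run, instances, ni_min=5, alpha=0.05, budget=200):
    """Illustrative sketch of the race in Figure 3.1; run(config, instance) returns a
    cost to be minimised. Not a faithful reimplementation of F-Race."""
    survivors = list(configs)
    results = {c: [] for c in survivors}          # per-instance costs, blocked on instances
    used = 0
    for instance in instances:
        if used >= budget or len(survivors) == 1:
            break
        for c in survivors:                       # step: evaluate all survivors on one instance
            results[c].append(run(c, instance))
            used += 1
        if len(results[survivors[0]]) >= ni_min and len(survivors) > 2:
            _, p = friedmanchisquare(*(results[c] for c in survivors))
            if p < alpha:                          # some configurations differ significantly
                best = min(survivors, key=lambda c: sum(results[c]))
                for c in [c for c in survivors if c != best]:
                    _, p_pair = wilcoxon(results[c], results[best])
                    if p_pair < alpha and sum(results[c]) > sum(results[best]):
                        survivors.remove(c)
        # with only two survivors left, this sketch simply keeps evaluating until the
        # budget is exhausted (or only one configuration remains)
    return survivors

# Toy example: configurations are noise parameters of a hypothetical solver; the cost
# depends on the parameter value plus instance-independent noise.
rng = random.Random(0)
run = lambda c, i: abs(c - 0.4) + rng.random() * 0.2
print(f_race([0.1, 0.2, 0.4, 0.8], run, instances=range(40)))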
A major limitation of this basic version of F-Race stems from the fact that in the
initial steps all given configurations have to be evaluated. This property of basic
F-Race severely limits the size of the configuration spaces to which the procedure
can be applied effectively – particularly when dealing with configuration spaces
corresponding to so-called full factorial designs, which contain all combinations of
values for a set of discrete (or discretised) parameters. Two more recent variants of
F-Race, Sampling F-Race and Iterative F-Race, have been introduced to address this
limitation [5]; both use the previously described F-Race procedure as a subroutine.
Sampling F-Race (short: RSD/F-Race) is based on the idea of using a sampling
process to determine the initial set of configurations subsequently used in a stan-
dard F-Race. In RSD/F-Race, a fixed number r of samples is determined using a so-
called Random Sampling Design, in which each configuration is drawn uniformly at
random from the given configuration space C. (In the simplest case, where no con-
ditional parameters or forbidden configurations exist, this can be done by sampling
values for each parameter independently and uniformly at random from the respec-
tive domain.) As noted by Balaprakash et al. [5], the performance of this procedure
depends substantially on r, the number of configurations sampled in relation to the
size of the given configuration space.
A somewhat more effective approach for focussing a procedure based on F-Race
on promising configurations is Iterative F-Race (short: I/F-Race). The key idea be-
hind I/F-Race is the use of an iterative process, where in the first stage of each iter-
ation configurations are sampled from a probabilistic model M, while in the second
stage a standard F-Race is performed on the resulting sample, and the configurations
surviving this race are used to define or update the model M used in the following
iteration. (See Figure 3.2.)
The probabilistic model used in each iteration of I/F-Race consists of a series
of probability distributions, D1 , . . . , Ds , each of which is associated with one of s
‘promising’ parameter configurations, c1 , . . . , cs . Balaprakash et al. [5] consider only
numerical parameters and define each distribution Di to be a k-variate normal distri-
bution Ni := N (μi , Σi ) that is centred on configuration ci , i.e., μi = ci . They further
define the covariance between any two different parameters in a given Ni to be zero,
such that Ni can be factored into k independent, univariate normal distributions. To
start the process with an unbiased probabilistic model, in the first iteration of I/F-
Race a single k-variate uniform distribution is used, which is defined as the product
of the k independent uniform distributions over the ranges of each given parameter
procedure I/F-Race
input target algorithm A, set of configurations C, set of problem instances I,
performance metric m;
output set of configurations C∗ ;
initialise probabilistic model M;
C′ := ∅; // later, C′ is the set of survivors from the previous F-Race
repeat
based on model M, sample set of configurations C′′ ⊆ C;
perform F-Race on configurations in C′ ∪ C′′ to obtain set of configurations C∗;
update probabilistic model M based on configurations in C∗;
C′ := C∗;
until termination condition met;
return c∗ ∈ C∗ with best performance (according to m) over all instances evaluated;
end I/F-Race
Fig. 3.2: High-level outline of Iterated F-Race, as introduced by [5]; details are explained in the text. The most recent version of I/F-Race slightly deviates from this outline (see [12])
(we note that this can be seen as a degenerate case of the normal distributions used
subsequently, in which the variance is infinite and truncation is applied).
In each iteration of I/F-Race, a certain number of configurations are sampled
from the distributions N1 , . . . , Ns . In the first iteration, this corresponds to sampling
configurations uniformly at random from the given configuration space. In subse-
quent iterations, for each configuration to be sampled, first, one of the Ni is chosen
using a rank-based probabilistic selection scheme based on the performance of the
configuration ci associated with Ni (for details, see [5]), and then a configuration
is sampled from this distribution. Values that are outside the range allowable for a
given parameter are set to the closer of the two boundaries, and settings for param-
eters with integer domains are rounded to the nearest valid value. The number a
of configurations sampled in each iteration depends on the number s of configura-
tions that survived the F-Race in the previous iteration; Balaprakash et al. [5] keep
the overall number of configurations considered in each iteration of I/F-Race con-
stant at some value r, and therefore simply replace those configurations eliminated
by F-Race with newly sampled ones (i.e., a := r − s, where in the first iteration,
s = 0).
The resulting population of a + s configurations is subjected to a standard F-
Race; this race is terminated using a complex, disjunctive termination condition
that involves a (lower) threshold on the number of surviving configurations as well
as upper bounds on the computational budget (measured in target algorithm runs)
and the number of problem instances considered.² Each of the F-Races conducted
within I/F-Race uses a random permutation of the given instance set in order to
2 The threshold mechanism ends the race as soon as the number of survivors has fallen below k,
the number of target algorithm parameters.
3 Automated Algorithm Configuration and Parameter Tuning 45
avoid bias due to a particular instance ordering. The s configurations that survived
the race (where the value of s depends on the part of the termination condition that
determined the end of that race) induce the probabilistic model used in the following
iteration of I/F-Race.
To increasingly focus the sampling process towards the most promising config-
urations, the standard deviations of the component distributions of the probabilistic
models Ni are gradually decreased using a volume reduction technique. More pre-
cisely, after each iteration, the standard deviation vector σi of each distribution Ni is
scaled by a factor (1/r)^(1/k), where r is the total number of configurations entered into
the F-Race, and k is the number of given parameters; this corresponds to a reduction
of the total volume of the region bounded by μi ± σi (over all k parameters) by a fac-
tor of r. At the beginning of I/F-Race, when configurations are sampled uniformly,
the standard deviation values are (somewhat arbitrarily) set to half of the range of
the respective parameter values.
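The sampling and volume-reduction steps just described can be sketched as follows for purely numerical parameters; the rank-based selection weights and the example parameter space are illustrative simplifications of the scheme of Balaprakash et al. [5], not a faithful reimplementation.

import random

def sample_around_elites(elites, sigmas, bounds, integer_params, r, rng=random):
    """Illustrative sampling stage of Iterated F-Race for numerical parameters.
    elites  : surviving configurations (dicts), best first
    sigmas  : current standard deviation per parameter
    bounds  : (low, high) per parameter; integer_params: names rounded to integers
    r       : total number of configurations entered into the next race."""
    s = len(elites)
    # Rank-based selection weights: the best elite gets the largest weight.
    weights = [s - rank for rank in range(s)]
    new_configs = []
    for _ in range(r - s):                       # replace configurations eliminated last round
        centre = rng.choices(elites, weights=weights)[0]
        config = {}
        for name, (low, high) in bounds.items():
            value = rng.gauss(centre[name], sigmas[name])
            value = min(max(value, low), high)   # clip to the allowable range
            if name in integer_params:
                value = round(value)
            config[name] = value
        new_configs.append(config)
    # Volume reduction: shrink every standard deviation by (1/r)**(1/k), so the total
    # volume spanned by mu_i +/- sigma_i (over all k parameters) shrinks by a factor of r.
    k = len(bounds)
    new_sigmas = {name: sd * (1.0 / r) ** (1.0 / k) for name, sd in sigmas.items()}
    return elites + new_configs, new_sigmas

bounds = {"alpha": (0.0, 1.0), "restarts": (1, 100)}
elites = [{"alpha": 0.3, "restarts": 20}, {"alpha": 0.5, "restarts": 40}]
sigmas = {"alpha": 0.5, "restarts": 50.0}        # initially half of each parameter range
population, sigmas = sample_around_elites(elites, sigmas, bounds, {"restarts"}, r=6)
print(len(population), sigmas)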
I/F-Race, as specified by Balaprakash et al. [5], assumes that all parameters
are numerical. This limitation is overcome in a later variant [12], which supports
categorical parameters by sampling their values from discrete probability distri-
butions that are updated by redistributing probability mass to values seen in good
configurations, as determined by F-Race. This version of I/F-Race, which we call
I/F-Race-10 for clarity, also differs from the one described previously in several
other aspects. Notably, the number of iterations in I/F-Race-10 is determined as
⌊2 + log2(k) + 0.5⌋, and the overall computational budget (i.e., number of target al-
gorithm runs) is distributed equally over these iterations. Furthermore, the number r
of configurations considered at iteration number t is set to ⌊b/(5 + t)⌋, where b is the
computational budget available for that iteration; this leads to fewer configurations
being considered in later iterations. The threshold on the number of survivors below
which any given F-Race is terminated is also determined as ⌊2 + log2(k) + 0.5⌋. Fi-
nally, I/F-Race-10 handles conditional parameters by only sampling values for them
when they are active, and by only updating the respective component of the model
in situations where such parameters are active in a configuration surviving one of
the subsidiary F-Races. (For further details, see [12].)
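The budget computations of I/F-Race-10 described above can be transcribed directly; the sketch below assumes that the overall budget b is measured in target algorithm runs and split equally over the iterations.

import math

def ifrace10_schedule(k, total_budget):
    """Budget schedule of I/F-Race-10 as described above: number of iterations and the
    per-iteration number of candidate configurations (k = number of parameters)."""
    n_iterations = int(2 + math.log2(k) + 0.5)          # floor(2 + log2(k) + 0.5)
    budget_per_iteration = total_budget // n_iterations  # budget split equally
    schedule = []
    for t in range(1, n_iterations + 1):
        r_t = budget_per_iteration // (5 + t)            # configurations in iteration t
        schedule.append((t, r_t))
    return n_iterations, schedule

print(ifrace10_schedule(k=7, total_budget=3000))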
3.2.3 Applications
Balaprakash et al. [5] describe applications of F-Race, Sampling F-Race and Iter-
ative F-Race to three high-performance stochastic local search algorithms: MAX-
MIN Ant System for the TSP with six parameters [64], an estimation-based local
search algorithm for the probabilistic TSP (PTSP) with three parameters [6], and a
simulated annealing algorithm for vehicle routing with stochastic demands (VRP-
SD) with four parameters [53]. The empirical results from these case studies indi-
cate that both Sampling F-Race and Iterative F-Race can find good configurations in spaces that are too big to be handled effectively by F-Race, and that Iterative F-
Race tends to give better results than Sampling F-Race, especially when applied to
more difficult configuration problems. Both the PTSP and the VRP-SD algorithms
as configured by Iterative F-Race represented the state of the art in solving these
problems at the time of this study.
More applications of F-Race have recently been summarised by Birattari et al.
[12]. These include tuning the parameters of various meta-heuristic algorithms for
university timetabling problems [58], of a control system for simple robots [52], and
of a new state-of-the-art memetic algorithm for the linear ordering problem [61]. In
all of these cases, the basic F-Race algorithm was applied to target algorithms with
few parameters and rather small configuration spaces (48–144 configurations).
Yuan et al. [71] report an application of I/F-Race for tuning various heuristic
algorithms for solving a locomotive scheduling problem provided by the German
railway company, Deutsche Bahn. The target algorithms considered in this work
had up to five parameters, mostly with continuous domains. The most complex
application of I/F-Race reported by Birattari et al. [12] involves 12 parameters of
the ACOTSP software, some of which conditionally depend on the values of oth-
ers.
While these (and other) racing procedures have been demonstrated to be useful
for accomplishing a broad range of parameter tuning tasks, it is somewhat unclear
how well they perform when applied to target algorithms with many more param-
eters, and how effectively they can deal with the many categorical and conditional
parameters arising in the context of more complex computer-aided algorithm design
tasks, such as the ones considered by Hutter et al. [35], KhudaBukhsh et al. [45],
Hutter et al. [42], and Tompkins and Hoos [66].
3.3 ParamILS
At the core of the ParamILS framework for automated algorithm configuration [36,
37] lies Iterated Local Search (ILS), a well-known and versatile stochastic local
search method that has been applied with great success to a wide range of difficult
combinatorial problems (see, e.g., [47, 30]). ILS iteratively performs phases of simple
local search designed to rapidly reach or approach a locally optimal solution to
the given problem instance, interspersed with so-called perturbation phases, whose
purpose is to effectively escape from local optima. Starting from a local optimum
x, in each iteration one perturbation phase is performed, followed by a local search
phase, with the aim of reaching (or approaching) a new local optimum x′. Then, a so-called acceptance criterion is used to decide whether to continue the search process from x′ or whether to revert to the previous local optimum, x. Using this
mechanism, ILS aims to solve a given problem instance by effectively exploring the
space of its locally optimal solutions. At a lower level, ILS – like most SLS methods
– visits (i.e., moves through) a series of candidate solutions such that at any given
time there is a current candidate solution, while keeping track of the incumbent (i.e.,
the best solution encountered so far).
ParamILS uses this generic SLS method to search for high-performance config-
urations of a given algorithm as follows (see also Figure 3.3). The search process is
initialised by considering a given configuration (which would typically be the given
target algorithm’s default configuration) as well as r further configurations that are
chosen uniformly at random from the given configuration space. These r + 1 con-
figurations are evaluated in a way that is specific to the given ParamILS variant, and
the best-performing configuration is selected as the starting point for the iterated
local search process. This initialisation mechanism can be seen as a combination of
the intuitive choice of starting from a user-defined configuration (such as the tar-
get algorithm’s default settings) and a simple experimental design technique, where
the latter makes it possible to exploit situations where the former represents a poor
choice for the given set of benchmark instances. Clearly, there is a trade-off between
the effort spent on evaluating randomly sampled configurations at this point and the
effort used in the subsequent iterated local search process. Hutter et al. [39] reported
empirical results suggesting that r = 10 results in better performance than r = 0 and
r = 100 across a number of configuration scenarios. However, we suspect that more
sophisticated initialisation procedures, in particular ones based on racing or sequen-
tial model-based optimisation techniques, might result in even better performance.
The subsidiary local search procedure used in ParamILS is based on the one-
exchange neighbourhood induced by arbitrary changes in the values of a single
target algorithm parameter. ParamILS supports conditional parameters by pruning
neighbourhoods such that changes in inactive parameters are excluded from con-
sideration; it also supports exclusion of (complete or partial) configurations ex-
plicitly declared ‘forbidden’ by the user. Using the one-exchange neighbourhood,
ParamILS performs iterative first-improvement search – an obvious choice, con-
sidering the computational cost of evaluating candidate configurations. We believe
that larger neighbourhoods might prove effective in situations in which parame-
procedure ParamILS
input target algorithm A, set of configurations C, set of problem instances I,
performance metric m;
parameters configuration c0 ∈ C, integer r, integer s, probability pr;
output configuration c∗ ;
c∗ := c0 ;
for i := 1 to r do
draw c from C uniformly at random;
assess c against c∗ based on performance of A on instances from I according to metric m;
if c found to perform better than c∗ then
c∗ := c;
end if;
end for;
c := c∗ ;
perform subsidiary local search on c;
while termination condition not met do
c′ := c;
perform s random perturbation steps on c′;
perform subsidiary local search on c′;
assess c′ against c based on performance of A on instances from I according to metric m;
if c′ found to perform better than c then // acceptance criterion
update overall incumbent c∗;
c := c′;
end if;
with probability pr do
draw c from C uniformly at random;
end with probability;
end while;
return c∗ ;
end ParamILS
Fig. 3.3: High-level outline of ParamILS, as introduced by [36]; details are explained
in the text
ter effects are correlated, as well as in conjunction with mechanisms that recog-
nise and exploit such dependencies in parameter response. Furthermore, search
strategies other than iterative first-improvement could be considered in variants of
ParamILS that build and maintain reasonably accurate models of local parameter
responses.
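A minimal sketch of the one-exchange neighbourhood and the iterative first-improvement search is given below; the configuration space, the activity check for conditional parameters and the comparison function 'better' are illustrative placeholders for the evaluation machinery of the respective ParamILS variant.

def one_exchange_neighbours(config, domains, is_active):
    """All configurations that differ from 'config' in the value of exactly one
    *active* parameter (changes to inactive conditional parameters are pruned)."""
    for name, domain in domains.items():
        if not is_active(name, config):
            continue
        for value in domain:
            if value != config[name]:
                neighbour = dict(config)
                neighbour[name] = value
                yield neighbour

def first_improvement_search(config, domains, is_active, better):
    """Iterative first-improvement: move to the first strictly better neighbour found,
    and stop once no neighbour improves on the current configuration."""
    improved = True
    while improved:
        improved = False
        for neighbour in one_exchange_neighbours(config, domains, is_active):
            if better(neighbour, config):      # comparison via the configurator's metric
                config = neighbour
                improved = True
                break
    return config

# Toy usage with a hypothetical two-parameter space and a synthetic cost function.
domains = {"noise": [0.0, 0.25, 0.5, 0.75, 1.0], "tabu": [0, 5, 10]}
cost = lambda c: abs(c["noise"] - 0.5) + abs(c["tabu"] - 5) / 10
print(first_improvement_search({"noise": 0.0, "tabu": 0}, domains,
                               is_active=lambda name, c: True,
                               better=lambda a, b: cost(a) < cost(b)))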
The perturbation procedure used in the ParamILS framework performs a fixed
number, s, of steps chosen uniformly at random in the same one-exchange neigh-
bourhood used during the local search phases. Computational experiments in which
various fixed values of s as well as several multiples of the number of target algo-
rithm parameters were considered suggest that relatively small perturbations (i.e.,
s = 2) are sufficient for obtaining good performance of the overall configuration
procedure [39]. Considering the use of iterative first-improvement during the local
search phases, this is not overly surprising; still, larger perturbations might be effec-
3.3.2 BasicILS
exceeds the number of given problem instances; in this case, the list of runs per-
formed is determined by a sequence of random permutations of the given set of
instances, and the random number seed used in each run is determined uniformly at
random.
This approach works well for configuration scenarios where a relatively small set
of benchmark instances is representative of all instances of interest. Furthermore,
the N target algorithm runs per configuration can be performed independently in
parallel. As for all ParamILS variants – and, indeed, for any SLS algorithm – fur-
ther parallelisation can be achieved by performing multiple runs of BasicILS(N)
in parallel. Finally, in principle, it is possible to perform multiple parallel runs of
the subsidiary local search in each iteration or to evaluate multiple neighbours of a
configuration in each search step independently in parallel.
3.3.3 FocusedILS
One drawback of BasicILS is that it tends to spend substantial effort evaluating poor
configurations, especially when used to configure a given target algorithm for min-
imised run-time. The only way to reduce that effort is to choose a small number of
runs, N; however, this can (and often does) result in poor generalisation of perfor-
mance to problem instances other than those used during the configuration process.
FocusedILS addresses this problem by initially evaluating configurations using few
target algorithm runs and subsequently performing additional runs to obtain increas-
ingly precise performance estimates for promising configurations. We note that the
idea of focussing the computational effort in evaluating configurations on candi-
dates that have already shown promising performance is exactly the same as that
underlying the concept of racing. However, unlike the previously discussed racing
procedures, FocusedILS determines promising configurations heuristically rather
than using statistical tests.
The mechanism used by FocusedILS to assess configurations is based on the
following concept of domination: Let c1 and c2 be configurations for which N(c1 )
and N(c2 ) target algorithm runs have been performed, respectively. As in the case
of BasicILS, the runs performed for each configuration follow the same sequence of
instances (and pseudo random number seeds). Then c1 dominates c2 if, and only if,
N(c1 ) ≥ N(c2 ) and the performance estimate for c1 based on its first N(c2 ) runs is at
least as good as that for c2 based on all of its N(c2 ) runs. This definition incorporates
the previously discussed idea of blocking, as configurations are compared based on
their performance on a common set of instances (and pseudo random number seeds).
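The domination check itself is straightforward to express; the sketch below assumes that each configuration's run results are stored, in the common instance/seed order, as a list of costs to be minimised.

def mean_cost(runs, n):
    """Performance estimate based on the first n runs (lower is better)."""
    return sum(runs[:n]) / n

def dominates(runs1, runs2):
    """c1 dominates c2 iff N(c1) >= N(c2) and c1's estimate over its first N(c2) runs is
    at least as good as c2's estimate over all of its N(c2) runs. Both run lists must
    follow the same sequence of (instance, seed) pairs (blocking)."""
    n1, n2 = len(runs1), len(runs2)
    return n1 >= n2 and mean_cost(runs1, n2) <= mean_cost(runs2, n2)

# c1 has been evaluated on five runs, c2 only on three (same instance/seed order).
runs_c1 = [1.2, 0.8, 1.0, 2.0, 0.9]
runs_c2 = [1.5, 1.1, 1.4]
print(dominates(runs_c1, runs_c2), dominates(runs_c2, runs_c1))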
Whenever FocusedILS decides that one configuration, c1 , performs better than
another, c2 , it ensures that c1 dominates c2 by performing additional runs on either
or both configurations. More precisely, when comparing two configurations, an ad-
ditional run is first performed for the configuration whose performance estimate is
based on fewer runs or, in the case of a tie, on both configurations. Then, as long as
neither configuration dominates the other, further runs are performed based on the
3.3.4 Adaptive Capping
Both BasicILS and FocusedILS can be improved by limiting under certain condi-
tions the time that is spent evaluating poorly performing configurations. The key
idea is that when comparing two configurations c1 and c2 , a situation may arise
where, regardless of the results of any further runs, c2 cannot match or exceed the
performance of c1 [37]. This is illustrated by the following example, taken from
Hutter et al. [37]: Consider the use of BasicILS(100) for minimising the expected
run-time of a given target algorithm on a set of 100 benchmark instances, where
configuration c1 has solved all 100 instances in a total of ten CPU seconds, and c2
has run for the same ten CPU seconds on the first instance without solving it.
Clearly, we can safely terminate the latter run after 10 + ε CPU seconds (for some small time ε), since the average run-time of c2 must exceed 0.1 CPU seconds, regardless of its performance in the remaining N − 1 runs, and therefore be worse than that of c1.
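The following sketch expresses this idea for the simple case of minimising mean run-time over a fixed list of instances; the function and its arguments are illustrative, not ParamILS's actual interface.

def remaining_cap(incumbent_total_time, challenger_completed_times, epsilon=0.01):
    """Trajectory-preserving capping (sketch) when minimising mean run-time over a
    common list of N instances: as soon as the challenger's accumulated time exceeds
    the incumbent's total time over all N runs, it can no longer win, so its current
    run can be cut off once the remaining slack (plus a small epsilon) is used up."""
    slack = incumbent_total_time - sum(challenger_completed_times)
    return max(slack, 0.0) + epsilon

# Example from the text: the incumbent c1 solved all 100 instances in ten CPU seconds
# in total, so c2's very first run can be cut off after roughly 10 + epsilon CPU seconds.
print(remaining_cap(10.0, []))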
Based on this insight, the trajectory-preserving adaptive capping mechanism of
Hutter et al. [37] limits the effort spent on evaluating configurations based on com-
paring lower bounds on the performance of one configuration c2 to upper bounds (or
exact values) on that of another configuration c1 , based on the results of given sets
of runs for c1 and c2 . We note that this corresponds to the notion of racing, where
each of the two configurations works independently through a given number of runs,
but the race is terminated as soon as the winner can be determined with certainty.
Apart from the potential for savings in running time, the use of trajectory-preserving
capping does not change the behaviour of ParamILS.
A heuristic generalisation of this capping mechanism makes it possible to achieve
even greater speedups, albeit at the price of possibly substantial changes to the
search trajectory followed by the configuration procedure. The key idea behind
this generalisation (dubbed aggressive capping) is to additionally bound the time
allowed for evaluating configurations based on the performance observed for the
current incumbent, i.e., the best-performing configuration encountered since the
beginning of the ParamILS run. The additional bound is obtained by multiplying
the performance estimate of the incumbent by a constant, bm, called the bound
multiplier. Formally, for bm = ∞, the additional bound becomes inactive (assum-
ing the performance measure is to be minimised), and the behaviour of trajectory-
preserving capping is obtained. For bm = 1, on the other hand, a very aggressive
heuristic is obtained, which limits the evaluation of any configuration to the time
spent on evaluating the current incumbent. In practice, bm = 2 appears to result
in good performance and is used as a default setting in ParamILS. Despite its
heuristic nature, this modified capping mechanism preserves the PAC property of
FocusedILS.
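A sketch of how the additional bound combines with the trajectory-preserving one is given below; the argument names are illustrative, and the incumbent-based bound simply multiplies the incumbent's estimated mean cost by bm and the number of runs in the comparison.

def aggressive_cap(trajectory_bound, incumbent_mean_cost, n_runs, bound_multiplier=2.0):
    """Aggressive capping (sketch): in addition to the trajectory-preserving bound,
    limit the challenger's total evaluation time to bound_multiplier times the time
    the incumbent needed for the same number of runs; the tighter bound applies."""
    incumbent_based_bound = bound_multiplier * incumbent_mean_cost * n_runs
    return min(trajectory_bound, incumbent_based_bound)

# With bm = 2 (the ParamILS default), a challenger compared on 10 runs against an
# incumbent averaging 0.1 seconds per run gets at most 2 seconds of evaluation time.
print(aggressive_cap(trajectory_bound=10.0, incumbent_mean_cost=0.1, n_runs=10))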
Although Hutter et al. [37] spelled out their adaptive capping mechanisms for
the performance objective of minimising a target algorithm’s mean run-time only,
these mechanisms generalise to other objectives in a rather straightforward way (a
discussion of capping in the context of minimising quantiles of run-time is found
in Ch. 7 of [32]). We note, however, that – especially when several target algorithm
runs are conducted in parallel – adaptive capping would be most effective in the
case of run-time minimisation. Particularly substantial savings can be achieved dur-
ing the assessment of the r + 1 configurations considered during initialisation of the
search process, as well as towards the end of each local search phase. Finally, it
should be noted that adaptive capping mechanisms can be used in the context of
configuration procedures other than ParamILS; for example, Hutter et al. [37] men-
tion substantial speedups achieved by using adaptive capping in combination with
simple random sampling (the same procedure as that used during the initialisation
of ParamILS).
3.3.5 Applications
ParamILS variants, and in particular FocusedILS, have been very successfully ap-
plied to a broad range of high-performance algorithms for several hard combina-
torial problems. An early version of FocusedILS was used by Thachuk et al. [65]
to configure a replica-exchange Monte Carlo (REMC) search procedure for the 2D
and 3D HP protein structure prediction problems; the performance objective was to
minimise the mean run-time for finding ground states for a given set of sequences
in these abstract but prominent models of protein structure, and the resulting config-
urations of the REMC procedure represented a considerable improvement over the
state-of-the-art techniques for solving these challenging problems.
FocusedILS has also been used in a series of studies leading to considerable
advances in the state-of-the-art in solving the satisfiability problem in propositional
logic, one of the most widely studied NP-hard problems in computer science. Hut-
ter et al. [35] applied this procedure to SPEAR, a complete, DPLL-type SAT solver with 26 parameters (ten of which are categorical), which jointly give rise to a total of 8.34 · 10^17 possible configurations. The design of SPEAR was influenced con-
siderably by the availability of a powerful configuration tool such as FocusedILS,
whose application ultimately produced configurations that solved a given set of SAT-
encoded software verification problems about 100 times faster than previous state-
of-the-art solvers for these types of SAT instances and won the first prize in the
QF_BV category of the 2007 Satisfiability Modulo Theories (SMT) Competition.
KhudaBukhsh et al. [45] used FocusedILS to find performance-optimised in-
stantiations of SATenstein-LS, a highly parametric framework for stochastic local
search (SLS) algorithms for SAT. This framework was derived from components
found in a broad range of high-performance SLS-based SAT solvers; its 41 parame-
ters induce a configuration space of size 4.82 · 10^12. Using FocusedILS, performance improvements of up to three orders of magnitude were achieved over the previous
best-performing SLS algorithms for various types of SAT instances, for several of
which SLS-based solvers are the most effective SAT algorithms overall. Several
automatically determined configurations of SATenstein-LS were used in the most
recent SATzilla solvers, which led the field in the 2009 SAT Competition, winning
prizes in five of the nine main categories [69].
Very recently, Xu et al. [70] used FocusedILS in an iterative fashion to obtain
sets of configurations of SATenstein-LS that were then used in combination with
state-of-the-art per-instance algorithm selection techniques (here: SATzilla). In each
iteration of the overall procedure, dubbed Hydra, FocusedILS was used to find con-
figurations that would best complement a given portfolio-based per-instance algo-
rithm selector. This approach resulted in a portfolio-based SAT solver that, while
derived in a fully automated fashion from a single, highly parameterised algorithm,
achieved state-of-the-art performance across a wide range of benchmark instances.
Tompkins and Hoos [66] applied FocusedILS to a new, flexible framework for
SLS-based SAT solvers called VE-Sampler (which is conceptually orthogonal to the
previously mentioned SATenstein-LS framework). VE-Sampler has a large number
of categorical and conditional parameters, which jointly give rise to more than 10^50
not cause major losses in the performance of the configurations found by ParamILS)
could be difficult, in most cases generic approaches such as even or geometric sub-
divisions of a given interval seem to give good results. Where this is not the case,
multiple runs of the configuration procedure can be used to iteratively refine the do-
mains of continuous parameters. The same approach can be used to extend domains
in cases where parameter values in an optimised configuration lie at the boundary
of their respective domains. Nevertheless, the development of ParamILS variants
that natively deal with continuous parameters and support dynamic extensions of
parameter domains remains an interesting direction for future work.
3.4 Sequential Model-Based Optimisation

procedure SMBO
input target algorithm A, set of configurations C, set of problem instances I,
performance metric m;
output configuration c∗ ;
determine initial set of configurations C0 ⊂ C;
for all c ∈ C0 , measure performance of A on I according to metric m;
build initial model M based on performance measurements for C0 ;
determine incumbent c∗ ∈ C0 for which best performance was observed or predicted;
repeat
based on model M, determine set of configurations C′ ⊆ C;
for all c ∈ C′, measure performance of A on I according to metric m;
update model M based on performance measurements for C′;
update incumbent c∗ ;
until termination condition met;
return c∗ ;
end SMBO
Fig. 3.4: High-level outline of the general sequential model-based optimisation ap-
proach to automated algorithm configuration; model M is used to predict the perfor-
mance of configurations that have not (yet) been evaluated, and set C′ is typically
chosen to contain configurations expected to perform well based on those predic-
tions. Details of various algorithms following this approach are explained in the text
The efficient global optimisation (EGO) algorithm for black-box function optimi-
sation by Jones et al. [44] uses a response surface model obtained via noise-free
Gaussian process regression in combination with an expected improvement crite-
rion for selecting the next configuration to be evaluated. The noise-free Gaussian
process (GP) model utilised by EGO is also known as the DACE model, after its
prominent use in earlier work by Sacks et al. [59]. It defines for every input x a
random variable F̂(x) that characterises the uncertainty over the true response value
f (x) at point x.
The model-based optimisation process carried out by EGO starts with about 10·k
design points determined using a k-dimensional space-filling Latin hypercube de-
sign (LHD). After measuring the response values at these design points, the 2 · k + 2 parameters of a DACE model are fitted to the pairs of design points and response values, using maximum likelihood estimates (as described by Jones et al. [44], this can be partially done
in closed form). The resulting model is assessed by means of so-called standard-
ized cross-validated residuals, which reflect the degree to which predictions made
by the model agree with the observed response values on the design points used for
constructing the model. If the model is deemed unsatisfactory, the response values
may be transformed using a log- or inverse-transform (i.e., modified by applying the
function ln y or 1/y) and the model fitted again.
After a satisfactory initial model has been obtained, it is used in conjunction
with an expected improvement criterion to determine a new design point to be
evaluated. The expected improvement measure used in this context uses the cur-
rent DACE model M to estimate the expected improvement over the best response
value measured so far, fmin , at any given design point x, and is formally defined as
EI(x) := E[max{ fmin − F̂(x), 0}], where F̂(x) is the random variable describing the
response for a design point x according to model M. Using a closed-form expres-
sion for this measure given by Jones et al. [44] and a branch & bound search method
(which can be enhanced heuristically), the EGO algorithm then determines a design
point x′ with maximal expected improvement EI(x′). If EI(x′) is less than 1% of the best response value observed so far, the procedure terminates. Otherwise, the response value f(x′) is measured, and the DACE model is refitted on the previous set of data extended by the pair (x′, f(x′)), and a new iteration begins, in which the updated model is used to determine the next design point using the same process that yielded x′.
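For a Gaussian predictive distribution with mean μ(x) and standard deviation s(x), this expected improvement has a well-known closed form, sketched below using scipy; the numerical example is purely illustrative.

from scipy.stats import norm

def expected_improvement(mu, sigma, f_min):
    """Closed-form expected improvement EI(x) = E[max{f_min - F(x), 0}] for a Gaussian
    predictive distribution F(x) with mean mu and standard deviation sigma (minimisation)."""
    if sigma <= 0.0:
        return max(f_min - mu, 0.0)      # deterministic prediction: no uncertainty left
    z = (f_min - mu) / sigma
    return (f_min - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# A point predicted slightly worse than f_min but with high uncertainty can still have a
# larger expected improvement than a point predicted marginally better with low uncertainty.
print(expected_improvement(mu=1.05, sigma=0.30, f_min=1.0),
      expected_improvement(mu=0.99, sigma=0.01, f_min=1.0))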
Note that in every iteration of this process, the DACE model has to be fitted,
which involves a matrix inversion of cost O(n^3), where n is the number of design
points used. Depending on the cost of measuring the response value for a given
design point, this may represent a substantial computational overhead. Furthermore,
the noise-free Gaussian process model used in EGO cannot directly characterise
the stochastic responses obtained when solving algorithm configuration problems
involving randomised target algorithms.
We now discuss two black-box optimisation procedures that deal with stochastic
responses, as encountered when modelling phenomena subject to observation noise
or configuring randomised algorithms.
where fˆmin is the model’s prediction for the current best input (as in EGO, obtained
by considering all design points used for building the model), F̂(x) is the random
variable describing the response for a design point x according to model M, σε is
the standard deviation of the measurement noise (assumed to be identical for all
inputs), and s^2(x) is the variance of the response F̂(x) given by the model at point x,
where the second term in the product decreases as the predictions of M become more
accurate. Based on the given model M, the next design point to be evaluated, x′, is determined by maximising EI′(x) using the Nelder-Mead Simplex algorithm [49]. Next, the model is refitted, taking into account x′ and a response value sampled at x′, and a new iteration begins, in which the updated model is used to determine the next design point using the same process that yielded x′. If the maximum EI′(x) values from d + 1 successive iterations all fall below a user-defined threshold, the iterative sampling process is terminated. (This threshold can be specified as an absolute value
or as a fraction of the difference between the largest and smallest observed response
values.)
Unfortunately, SKO assumes that the variability of the response values at each
design point is characterised by a Gaussian distribution, and that the standard de-
viations of those distributions are the same across the entire design space. Both of
these assumptions are problematic in the context of configuring randomised algo-
rithms, particularly when minimising run-time (see, e.g., [30]). Furthermore, the time
required for fitting the model in each iteration of SKO is cubic in the number of
response values sampled, which can represent a substantial computational burden.
where fmin is the best value of the measure to be optimised observed so far, and
F̂(x) is the distribution over the predictions obtained from model M at design point
x. This EI measure has been introduced by Schonlau et al. [62] with the aim of
encouraging the exploration of design points for which the current model produces
highly uncertain predictions.
At each of the design points determined in this way, r new response values are
measured. Furthermore, additional response values are measured for the current in-
cumbent to ensure that it is evaluated based on as many samples as available for any
of the new design points. Then, the best of all the design points considered so far,
according to the given measure to be optimised, is selected as the new incumbent
(with ties broken uniformly at random). If the design point thus selected has been an
incumbent at any point earlier in the search process, r is increased; in SPO version
0.3 [8], r is doubled, while in the newer version 0.4 [7], it is merely incremented by
1, and in both cases values of r are limited to a user-specified maximum value rmax .
At this point, a new iteration of SPO begins, in which a noise-free GP is fitted on
the augmented set of data.
Based on a detailed investigation of the core components of the SPO algorithm, Hut-
ter et al. [38] introduced a variant called SPO+ that shows considerably more robust
performance on standard benchmarks than the SPO 0.3 and SPO 0.4 algorithms
described previously.
The main difference between SPO+ and the previous SPO procedures lies in
the way in which new design points are accepted as incumbents. Inspired by Fo-
cusedILS, SPO+ uses a mechanism that never chooses a new incumbent x̂ without
ensuring that at least as many responses have been sampled at x̂ as at any other de-
sign point x ≠ x̂. To achieve this, for any challenger to the current incumbent x̂, i.e.,
for any design point x that appears to represent an improvement over x̂ based on the
samples taken so far, additional response values are sampled until either x ceases to
represent an improvement, or the number of response values sampled at x reaches
that taken at x̂, with x still winning the comparison with x̂ based on the respective
samples; only in the latter case does x become the new incumbent, while in the
former case it is dismissed, and as many additional response values are sampled for
x̂ as newly measured for x .
The new response values determined for a challenger x are sampled in batches,
with the number of new samples taken doubling in each successive batch. As noted
by Hutter et al. [38], using this mechanism, rejection of challengers is done in a
rather aggressive, heuristic manner, and frequently occurs after only a single re-
sponse value has been sampled at x – long before a statistical test could conclude
that the x is worse than the current incumbent.
The Time-Bounded Sequential Parameter Optimisation (TB-SPO) algorithm by
Hutter et al. [41] introduces a number of further modifications to the SMBO frame-
work underlying the previously described SPO variants. In particular, in contrast
to all SMBO procedures discussed so far, TB-SPO does not construct its initial
model based on a large set of samples determined using a Latin hypercube design,
but rather interleaves response measurements at randomly chosen points with ones
taken at points that appear to be promising based on the current model. The initial
model is based on a single sample only; when used for algorithm configuration,
where the black-box function to be optimised represents the output of a parame-
terised algorithm, the default configuration for the algorithm is used as the design
point at which this initial sample is taken. At any stage of the iterative model-based
search process that follows, response values are sampled at a series of design points
in which odd-numbered points are determined by optimising an expected improve-
ment measure (as is done in SPO+ ), while even-numbered points are sampled uni-
formly at random from the given design space. (Mechanisms that achieve a different
balance between promising and randomly chosen design points could lead to better
performance but have not been explored so far.)
The number of design points at which response values are sampled between any
two updates to the model is determined based on the time t required for constructing
a new model and the search for promising parameter settings; to be precise, after at
least two design points have been evaluated, further points are considered until the
time used for evaluating design points since the last model update exceeds a user-
defined multiple (or fraction) of the overhead t.
Finally, in order to reduce the computational overhead incurred by the model con-
struction process, TB-SPO uses an approximate version of the standard Gaussian
process models found in the other SPO variants. This so-called projected process
(PP) approximation is based on the idea of representing explicitly only a randomly
sampled subset of the given data points (here: pairs of input and response values)
when building the Gaussian process model; if this subset comprises s data points,
while the complete set has n data points, the time complexity of fitting a GP model
decreases from O(n^3) to O((s + n) · s^2), while the time required for predicting a
response value (mean and variance of the predictive distribution at a given design
point) decreases from O(n^2) to O(s^2) [56]. In the context of an SMBO procedure,
this will typically lead to substantial savings, since the number of data points avail-
able increases over time, and n can easily reach values of several thousand, while
effective PP approximations can be based on constant-size subsets with s no larger
than 300 [41]. (Details on other, minor differences between TB-SPO and SPO+ can
be found in [41].)
3.4.4 Applications
Due to the nature of the response surface models used, SMBO methods are usually restricted to dealing with
real- and integer-valued target algorithm parameters (although, very recently, [43]
has introduced techniques that can handle categorical parameters).
Following an example from Bartz-Beielstein et al. [9], the SPO variants discussed
in this section have been empirically evaluated using CMA-ES [25, 26, 27] – one
of the best-performing gradient-free numerical optimisation procedures currently
known – on several standard benchmark functions from the literature on gradient-
free numerical optimisation (see, e.g., [26]). The configuration space considered in
these examples, which involve the convex Sphere function as well as the non-convex
Ackley, Griewank and Rastrigin functions, is spanned by three real- and one integer-
valued parameters of CMA-ES, and the performance measure was solution quality,
achieved after a fixed number of evaluations of the respective benchmark function.
The empirical results reported by Hutter et al. [38] for CMA-ES applied to the ten-
dimensional instances of these functions indicate that SPO+ tends to perform sig-
nificantly better than SPO 0.3 and 0.4, which in turn appear to perform substantially
better than SKO. In addition, Hutter et al. [38] considered the minimisation of the
median number of search steps required by SAPS [33], a well-known stochastic lo-
cal search algorithm for SAT, to solve a single benchmark instance obtained from
encoding a widely studied quasi-group completion problem into SAT; in this case,
four continuous parameters were optimised. The results from that experiment con-
firmed that SPO+ tends to perform better than previous SPO variants and suggest
that, at least on some configuration problems with a relatively modest number of
predominantly real-valued parameters, it can also yield slightly better results than
FocusedILS when allowed the same number of target algorithm runs.
TB-SPO has been empirically compared to SPO+ and FocusedILS on relatively
simple algorithm configuration tasks involving the well-known SAT solver SAPS
[33], with four continuous parameters, running on single SAT instances. In these
experiments, TB-SPO was shown to perform significantly better than SPO+ (some-
times achieving over 250-fold speedups), and moderately better than FocusedILS
[41]. However, it is important to keep in mind that, unlike TB-SPO (and all other
SMBO procedures covered in this section), FocusedILS explicitly deals with multi-
ple problem instances, and can therefore be expected to perform substantially better
on realistic algorithm configuration tasks. Furthermore, while SMBO procedures
like TB-SPO do not require continuous algorithm parameters to be discretised, they
presently cannot deal with conditional parameters, which are routinely encountered
in the more challenging algorithm configuration tasks on which FocusedILS has
been shown to be quite effective.
In addition to the methods covered in the previous sections, there are many other
procedures described in the literature that can, at least in principle, be applied to the
algorithm configuration problem considered here.
Like most genetic programming approaches, CLASS closely links the specification of the
configuration space and the evolutionary algorithm used for exploring this space.
The Composer system developed by Gratch and Dejong [23] is based on an it-
erative improvement procedure not unlike that used in ParamILS; this procedure is
conceptually related to racing techniques in that it moves to a new configuration
only after gathering sufficient statistical evidence to conclude that this new configu-
ration performs significantly better than the current one. In a prominent application,
Gratch and Chien [22] used the Composer system to optimise five parameters of an
algorithm for scheduling communication between a spacecraft and a set of ground-
based antennas.
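The following sketch (in Python; not Gratch and Dejong's actual implementation, and with a simple z-test standing in for their statistical machinery) illustrates this kind of statistically safeguarded iterative improvement: a candidate configuration replaces the incumbent only once paired performance measurements provide sufficient evidence that it is better.

import math
import random
import statistics

def significantly_better(evaluate, candidate, incumbent, tasks,
                         min_samples=5, max_samples=30, z_crit=1.96):
    # gather paired cost differences until the candidate is clearly better or we give up
    diffs = []
    for task in tasks[:max_samples]:
        diffs.append(evaluate(incumbent, task) - evaluate(candidate, task))
        if len(diffs) >= min_samples:
            mean = statistics.mean(diffs)
            sem = statistics.stdev(diffs) / math.sqrt(len(diffs))
            if sem > 0 and mean / sem > z_crit:
                return True
    return False

def iterative_improvement(evaluate, neighbours, start, tasks, steps=100):
    incumbent = start
    for _ in range(steps):
        for cand in neighbours(incumbent):
            if significantly_better(evaluate, cand, incumbent, tasks):
                incumbent = cand
                break
    return incumbent

# toy usage: configurations are single numbers, the (noisy) cost is the distance to 3
rng = random.Random(0)
evaluate = lambda c, task: abs(c - 3) + rng.gauss(0, 0.1)
neighbours = lambda c: [c - 0.5, c + 0.5]
print(iterative_improvement(evaluate, neighbours, start=0.0, tasks=list(range(30))))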
Ansótegui et al. [2] recently developed a gender-based genetic algorithm for solv-
ing algorithm configuration problems. Their GGA procedure supports categorical,
ordinal and real-valued parameters; it also allows its user to express independen-
cies between parameter effects by means of so-called variable trees – a concept that
could be of particular interest in the context of algorithm configuration problems
where such independencies are known by construction, or heuristic methods are
available for detecting (approximate) independencies. Although there is some evi-
dence that GGA can solve some moderately difficult configuration problems more
effectively than FocusedILS without capping [2], it appears to be unable to reach
the performance of FocusedILS version 2.3 with aggressive capping on the most
challenging configuration problems [40]. Unfortunately, GGA also offers less flex-
ibility than FocusedILS in terms of the performance metric to be optimised. More
algorithm configuration procedures based on evolutionary algorithms are covered in
Chapter 2 of this book.
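The variable-tree idea can be illustrated with a small sketch (this is not GGA's actual genome representation; the parameter names and grouping are purely hypothetical): parameters are organised into subtrees, and recombination treats each subtree as a unit, so that groups of interacting parameters are inherited together, while groups assumed to be independent can be recombined freely.

import random

variable_tree = {
    "preprocessing": ["simplify_level", "bound_tightening"],
    "search": ["branching_rule", "restart_interval", "noise"],
}

def crossover(parent_a, parent_b, tree, rng=random.Random(0)):
    # each subtree is inherited as a whole from one randomly chosen parent
    child = {}
    for params in tree.values():
        donor = parent_a if rng.random() < 0.5 else parent_b
        for p in params:
            child[p] = donor[p]
    return child

a = {"simplify_level": 2, "bound_tightening": True,
     "branching_rule": "most-fractional", "restart_interval": 100, "noise": 0.2}
b = {"simplify_level": 0, "bound_tightening": False,
     "branching_rule": "pseudo-cost", "restart_interval": 500, "noise": 0.05}
print(crossover(a, b, variable_tree))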
Finally, work originating from the Ph.D. project of Hutter [32] has recently over-
come two major limitations of the sequential model-based optimisation methods
discussed in Section 3.4 of this chapter by introducing a procedure that can han-
dle categorical parameters while explicitly exploiting the fact that performance is
evaluated on a set of problem instances. There is some evidence that this procedure,
dubbed Sequential Model-based Algorithm Configuration (SMAC), can, at least on
some challenging configuration benchmarks, reach and sometimes exceed the per-
formance of FocusedILS [43], and we are convinced that, at least in cases where
the parameter response of a given target algorithm is reasonably regular and per-
formance evaluations are very costly, such advanced SMBO methods hold great
promise.
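To give a flavour of how such a procedure operates over a mixed categorical/continuous space, the following highly simplified sketch runs a generic sequential model-based configuration loop with a random forest as the empirical cost model; it is an illustration of the general idea only, not of SMAC itself (which, among other differences, uses an expected-improvement criterion and intensification over multiple instances), and all parameter names and the toy cost function are hypothetical.

import random
from sklearn.ensemble import RandomForestRegressor

HEURISTICS = ["novelty", "walksat", "gsat"]                # hypothetical categorical domain

def sample_config(rng):
    return {"heuristic": rng.choice(HEURISTICS), "noise": rng.uniform(0.0, 1.0)}

def encode(c):
    # one-hot encoding of the categorical parameter, plus the real-valued one
    return [float(c["heuristic"] == h) for h in HEURISTICS] + [c["noise"]]

def smbo(measure_cost, budget=60, n_init=10, n_candidates=500, seed=0):
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(n_init)]
    costs = [measure_cost(c) for c in configs]
    for _ in range(budget - n_init):
        model = RandomForestRegressor(n_estimators=50, random_state=0)
        model.fit([encode(c) for c in configs], costs)
        # select the candidate with the lowest predicted cost (a proper SMBO procedure
        # would use an expected-improvement criterion instead of the raw prediction)
        cands = [sample_config(rng) for _ in range(n_candidates)]
        preds = model.predict([encode(c) for c in cands])
        best = cands[int(preds.argmin())]
        configs.append(best)
        costs.append(measure_cost(best))
    return configs[costs.index(min(costs))]

# toy cost function standing in for actual runs of a target algorithm
toy_cost = lambda c: ({"novelty": 1.0, "walksat": 0.5, "gsat": 2.0}[c["heuristic"]]
                      + (c["noise"] - 0.3) ** 2)
print(smbo(toy_cost))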
Automated algorithm configuration and parameter tuning methods have been de-
veloped and used for more than a decade, and many of the fundamental techniques
date back even further. However, it has only recently become possible to effectively
solve complex configuration problems involving target algorithms with dozens of
parameters, which are often categorical and conditional. This success is based in
part on the increased availability of computational resources, but has mostly been
enabled by methodological advances underlying recent configuration procedures.
Still, we see much room (and, indeed, need) for future work on automated algo-
rithm configuration and parameter tuning methods. We believe that in developing
such methods, the fundamental features underlying all three types of methods dis-
cussed in this chapter can play an important role, and that the best methods will
employ combinations of these. We further believe that different configuration pro-
cedures will likely be most effective for solving different types of configuration
problems (depending, in particular, on the number and type of target algorithm pa-
rameters, but also on regularities in the parameter response). Therefore, we see a
need for research aiming to determine which configurator is most effective under
which circumstances. In fact, we expect to find situations in which the sequential
or iterative application of more than one configuration procedure turns out to be
effective – for example, one could imagine applying FocusedILS to find promising
configurations in vast, discrete configuration spaces, followed by a gradient-free nu-
merical optimisation method, such as CMA-ES, for fine-tuning a small number of
real-valued parameters.
Overall, we believe that algorithm configuration techniques, such as the ones
discussed in this chapter, will play an increasingly crucial role in the develop-
ment, evaluation and use of state-of-the-art algorithms for challenging computa-
tional problems, where the challenge could arise from high computational complex-
ity (in particular, NP-hardness) or from tight resource constraints (e.g., in real-
time applications). Therefore, we see great value in the design and development of
software frameworks that support the real-world application of various algorithm
configuration and parameter tuning procedures. The High-Performance Algorithm
Lab (HAL), recently introduced by Nell et al. [50], is a software environment de-
signed to support a wide range of empirical analysis and design tasks encountered
during the development, evaluation and application of high-performance algorithms
for challenging computational problems, including algorithm configuration and pa-
rameter tuning. Environments such as HAL not only facilitate the application of
automated algorithm configuration and parameter tuning procedures, but also their
development, efficient implementation and empirical evaluation.
In closing, we note that the availability of powerful and effective algorithm
configuration and parameter tuning procedures has a number of interesting con-
sequences for the way in which high-performance algorithms are designed and used
in practice. Firstly, for developers and end users, it is now possible to automatically
optimise the performance of (highly) parameterised solvers specifically for certain
classes of problem instances, leading to potentially much improved performance in
real-world applications. Secondly, while on-line algorithm control mechanisms that
adjust parameter settings during the run of a solver (as covered, for example, in
Chapters 6, 7 and 8 of this book) can in principle lead to better performance than
the (static) algorithm configuration procedures considered in this chapter, we expect
these latter procedures to be very useful in the context of (statically) configuring the
parameters and heuristic components that determine the behaviour of the on-line
control mechanisms. Finally, during algorithm development, it is no longer neces-
sary (or even desirable) to eliminate parameters and similar degrees of freedom,
but instead, developers can focus more on developing ideas for realising certain
heuristic mechanisms or components, while the precise instantiation can be left to
automated configuration procedures [28]. We strongly believe that this last effect
will lead to a fundamentally different and substantially more effective way of de-
signing and implementing high-performance solvers for challenging computational
problems.
Acknowledgements This chapter surveys and discusses to a large extent work carried out by
my research group at UBC, primarily involving Frank Hutter, Kevin Leyton-Brown and Kevin
Murphy, as well as Thomas Stützle at Université Libre de Bruxelles, to all of whom I am deeply
grateful for their fruitful and ongoing collaboration. I gratefully acknowledge valuable comments
by Frank Hutter, Thomas Stützle and Maverick Chan on earlier drafts of this chapter, and I thank
the members of my research group for the many intellectually stimulating discussions that provide
a fertile ground for much of our work on automated algorithm configuration and other topics in
empirical algorithmics.
References
[1] Adenso-Diaz, B., Laguna, M.: Fine-tuning of algorithms using fractional ex-
perimental design and local search. Operations Research 54(1):99–114 (2006)
[2] Ansótegui, C., Sellmann, M., Tierney, K.: A gender-based genetic algorithm
for the automatic configuration of algorithms. In: Proceedings of the 15th In-
ternational Conference on Principles and Practice of Constraint Programming
(CP 2009), pp. 142–157 (2009)
[3] Applegate, D. L., Bixby, R. E., Chvátal, V., Cook, W. J.: The Traveling Sales-
man Problem: A Computational Study. Princeton University Press (2006)
[4] Audet, C., Orban, D.: Finding optimal algorithmic parameters using the mesh
adaptive direct search algorithm. SIAM Journal on Optimization 17(3):642–
664 (2006)
[5] Balaprakash, P., Birattari, M., Stützle, T.: Improvement strategies for the
F-Race algorithm: Sampling design and iterative refinement. In: Bartz-
Beielstein, T., Blesa, M., Blum, C., Naujoks, B., Roli, A., Rudolph, G., Sam-
pels, M. (eds) 4th International Workshop on Hybrid Metaheuristics, Proceed-
ings, HM 2007, Springer Verlag, Berlin, Germany, Lecture Notes in Computer
Science, vol. 4771, pp. 108–122 (2007)
[6] Balaprakash, P., Birattari, M., Stützle, T., Dorigo, M.: Estimation-based meta-
heuristics for the probabilistic traveling salesman problem. Computers & OR
37(11):1939–1951 (2010)
[7] Bartz-Beielstein, T.: Experimental Research in Evolutionary Computation:
The New Experimentalism. Natural Computing Series, Springer Verlag,
Berlin, Germany (2006)
[8] Bartz-Beielstein, T., Lasarczyk, C., Preuß, M.: Sequential parameter optimiza-
tion. In: McKay, B., et al. (eds) Proceedings of the 2005 IEEE Congress on Evolutionary
Computation (CEC 2005), IEEE Press (2005)
[22] Gratch, J., Chien, S. A.: Adaptive problem-solving for large-scale scheduling
problems: A case study. Journal of Artificial Intelligence Research 4:365–396
(1996)
[23] Gratch, J., Dejong, G.: Composer: A probabilistic solution to the utility prob-
lem in speed-up learning. In: Rosenbloom, P., Szolovits, P. (eds) Proceedings
of the 10th National Conference on Artificial Intelligence (AAAI-92), AAAI
Press / The MIT Press, Menlo Park, CA, USA, pp. 235–240 (1992)
[24] Guerri, A., Milano, M.: Learning techniques for automatic algorithm portfo-
lio selection. In: Proceedings of the 16th European Conference on Artificial
Intelligence (ECAI 2004), pp. 475–479 (2004)
[25] Hansen, N.: The CMA evolution strategy: A comparing review. In: Lozano,
J., Larranaga, P., Inza, I., Bengoetxea, E. (eds) Towards a new evolutionary
computation. Advances on estimation of distribution algorithms, Springer, pp.
75–102 (2006)
[26] Hansen, N., Kern, S.: Evaluating the CMA evolution strategy on multimodal
test functions. In: Yao, X., et al. (eds) Parallel Problem Solving from Nature
PPSN VIII, Springer, LNCS, vol. 3242, pp. 282–291 (2004)
[27] Hansen, N., Ostermeier, A.: Completely derandomized self-adaptation in evo-
lution strategies. Evolutionary Computation 9(2):159–195 (2001)
[28] Hoos, H.: Computer-aided design of high-performance algorithms. Tech. Rep.
TR-2008-16, University of British Columbia, Department of Computer Sci-
ence (2008)
[29] Hoos, H., Stützle, T.: Local search algorithms for SAT: An empirical evalua-
tion. Journal of Automated Reasoning 24(4):421–481 (2000)
[30] Hoos, H., Stützle, T.: Stochastic Local Search—Foundations and Applications.
Morgan Kaufmann Publishers, USA (2004)
[31] Huang, D., Allen, T. T., Notz, W. I., Zeng, N.: Global optimization of stochas-
tic black-box systems via sequential kriging meta-models. Journal of Global
Optimization 34(3):441–466 (2006)
[32] Hutter, F.: Automated configuration of algorithms for solving hard computa-
tional problems. Ph.D. thesis, University of British Columbia, Department of
Computer Science, Vancouver, BC, Canada (2009)
[33] Hutter, F., Tompkins, D. A., Hoos, H.: Scaling and Probabilistic Smoothing:
Efficient Dynamic Local Search for SAT. In: Principles and Practice of Con-
straint Programming – CP 2002, Springer-Verlag, LNCS, vol. 2470, pp. 233–
248 (2002)
[34] Hutter, F., Hamadi, Y., Hoos, H., Leyton-Brown, K.: Performance prediction
and automated tuning of randomized and parametric algorithms. In: Principles
and Practice of Constraint Programming – CP 2006, Springer-Verlag, LNCS,
vol. 4204, pp. 213–228 (2006)
[35] Hutter F., Babić, D., Hoos, H., Hu, A. J.: Boosting verification by automatic
tuning of decision procedures. In: Proc. Formal Methods in Computer-Aided
Design (FMCAD’07), IEEE Computer Society Press, pp. 27–34 (2007)
[36] Hutter, F., Hoos, H., Stützle, T.: Automatic algorithm configuration based on
local search. In: Proceedings of the 22nd National Conference on Artificial
Intelligence (AAAI-07), pp. 1152–1157 (2007)
[37] Hutter, F., Hoos, H., Leyton-Brown, K., Murphy, K.: An experimental investi-
gation of model-based parameter optimisation: SPO and beyond. In: Proceed-
ings of the 11th Annual Conference on Genetic and Evolutionary Computation
(GECCO’09), ACM, pp. 271–278 (2009)
[38] Hutter, F., Hoos, H., Leyton-Brown, K., Stützle, T.: ParamILS: An automatic
algorithm configuration framework. Journal of Artificial Intelligence Research
36:267–306 (2009)
[39] Hutter, F., Hoos, H., Leyton-Brown, K., Stützle, T.: ParamILS: An automatic
algorithm configuration framework (extended version). Tech. Rep. TR-2009-
01, University of British Columbia, Department of Computer Science (2009)
[40] Hutter, F., Hoos, H., Leyton-Brown K.: Sequential model-based optimization
for general algorithm configuration (extended version). Tech. Rep. TR-2010-
10, University of British Columbia, Department of Computer Science (2010)
[41] Hutter, F., Hoos, H., Leyton-Brown, K., Murphy, K.: Time-bounded sequential
parameter optimization. In: Proceedings of the 4th International Conference
on Learning and Intelligent Optimization (LION 4), Springer-Verlag, LNCS,
vol. 6073, pp. 281–298 (2010)
[42] Hutter, F., Hoos, H., Leyton-Brown, K.: Automated configuration of mixed
integer programming solvers. In: Proceedings of the 7th International Confer-
ence on the Integration of AI and OR Techniques in Constraint Programming
for Combinatorial Optimization Problems (CPAIOR 2010), Springer-Verlag,
LNCS, vol. 6140, pp. 186–202 (2010)
[43] Hutter, F., Hoos, H., Leyton-Brown, K.: Extending sequential model-based
optimization to general algorithm configuration. To appear in: Proceedings
of the 5th International Conference on Learning and Intelligent Optimization
(LION 5) (2011)
[44] Jones, D. R., Schonlau, M., Welch, W. J.: Efficient global optimization of
expensive black box functions. Journal of Global Optimization 13:455–492
(1998)
[45] KhudaBukhsh, A., Xu, L., Hoos, H., Leyton-Brown, K.: SATenstein: Automat-
ically building local search SAT solvers from components. In: Proceedings of
the 21st International Joint Conference on Artificial Intelligence (IJCAI-09),
pp 517–524 (2009)
[46] Leyton-Brown, K., Nudelman, E., Andrew, G., McFadden, J., Shoham, Y.:
A portfolio approach to algorithm selection. In: Rossi, F. (ed) Principles and
Practice of Constraint Programming – CP 2003, Springer Verlag, Berlin, Ger-
many, Lecture Notes in Computer Science, vol. 2833, pp. 899–903 (2003)
[47] Lourenço, H. R., Martin, O., Stützle, T.: Iterated local search. In: Glover, F.,
Kochenberger, G. (eds) Handbook of Metaheuristics, Kluwer Academic Pub-
lishers, Norwell, MA, USA, pp. 321–353 (2002)
[48] Maron, O., Moore, A. W.: Hoeffding races: Accelerating model selection
search for classification and function approximation. In: Advances in neural
information processing systems 6, Morgan Kaufmann, pp. 59–66 (1994)
[49] Nelder, J. A., Mead, R.: A simplex method for function minimization. The
Computer Journal 7(4):308–313 (1965)
[50] Nell, C. W., Fawcett, C., Hoos, H., Leyton-Brown K.: HAL: A framework for
the automated design and analysis of high-performance algorithms. To appear
in: Proceedings of the 5th International Conference on Learning and Intelli-
gent Optimization (LION 5) (2011)
[51] Nocedal, J., Wright, S. J.: Numerical Optimization, 2nd edn. Springer-Verlag
(2006)
[52] Nouyan, S., Campo, A., Dorigo, M.: Path formation in a robot swarm: Self-
organized strategies to find your way home. Swarm Intelligence 2(1):1–23
(2008)
[53] Pellegrini, P., Birattari, M.: The relevance of tuning the parameters of meta-
heuristics. A case study: The vehicle routing problem with stochastic de-
mand. Tech. Rep. TR/IRIDIA/2006-008, IRIDIA, Université Libre de Brux-
elles, Brussels, Belgium (2006)
[54] Pop, M., Salzberg, S. L., Shumway, M.: Genome sequence assembly: Algo-
rithms and issues. Computer 35(7):47–54 (2002)
[55] Prasad, M. R., Biere, A., Gupta, A.: A survey of recent advances in SAT-based
formal verification. International Journal on Software Tools for Technology
Transfer 7(2):156–173 (2005)
[56] Rasmussen, C. E., Williams, C. K. I.: Gaussian Processes for Machine Learn-
ing. The MIT Press (2006)
[57] Rice, J.: The algorithm selection problem. Advances in Computers 15:65–118
(1976)
[58] Rossi-Doria, O., Sampels, M., Birattari, M., Chiarandini, M., Dorigo, M.,
Gambardella, L. M., Knowles, J. D., Manfrin, M., Mastrolilli, M., Paechter,
B., Paquete, L., Stützle, T.: A comparison of the performance of different
metaheuristics on the timetabling problem. In: Burke, E. K., Causmaecker,
P. D. (eds) Practice and Theory of Automated Timetabling IV, 4th Interna-
tional Conference, PATAT 2002, Selected Revised Papers, Springer, Lecture
Notes in Computer Science, vol. 2740, pp. 329–354 (2003)
[59] Sacks, J., Welch, W., Mitchell, T., Wynn, H.: Design and analysis of computer
experiments (with discussion). Statistical Science 4:409–435 (1989)
[60] Santner, T., Williams, B., Notz, W.: The Design and Analysis of Computer
Experiments. Springer Verlag, New York (2003)
[61] Schiavinotto, T., Stützle, T.: The linear ordering problem: Instances, search
space analysis and algorithms. Journal of Mathematical Modelling and Algo-
rithms 3(4):367–402 (2004)
[62] Schonlau, M., Welch, W. J., Jones, D. R.: Global versus local search in con-
strained optimization of computer models. In: Flournoy, N., Rosenberger, W.,
Wong, W. (eds) New Developments and Applications in Experimental Design,