

A Racing Algorithm for Configuring Metaheuristics

Mauro Birattari†
IRIDIA, Université Libre de Bruxelles, Brussels, Belgium

Thomas Stützle, Luis Paquete, and Klaus Varrentrapp
Intellektik/Informatik, Technische Universität Darmstadt, Darmstadt, Germany

† This research was carried out while MB was with Intellektik, Technische Universität Darmstadt.

Abstract

This paper describes a racing procedure for finding, in a limited amount of time, a configuration of a metaheuristic that performs as well as possible on a given instance class of a combinatorial optimization problem. Taking inspiration from methods proposed in the machine learning literature for model selection through cross-validation, we propose a procedure that empirically evaluates a set of candidate configurations by discarding bad ones as soon as statistically sufficient evidence is gathered against them. We empirically evaluate our procedure using as an example the configuration of an ant colony optimization algorithm applied to the traveling salesman problem. The experimental results show that our procedure is able to quickly reduce the number of candidates, and allows us to focus on the most promising ones.

1 INTRODUCTION

A metaheuristic is a general algorithmic template whose components need to be instantiated and properly tuned in order to yield a fully functioning algorithm. The instantiation of such an algorithmic template requires choosing among a set of different possible components and assigning specific values to all free parameters. We will refer to such an instantiation as a configuration. Accordingly, we call configuration problem the problem of selecting the optimal configuration.

Practitioners typically configure their metaheuristics in an iterative process on the basis of some runs of different configurations that are felt to be promising. Usually, such a process is heavily based on personal experience and is guided by a mixture of rules of thumb. Most often this leads to tedious and time-consuming experiments. In addition, it is very rare that a configuration is selected on the basis of some well-defined statistical procedure.

The aim of this work is to define an automatic hands-off procedure for finding a good configuration through statistically guided experimental evaluations, while minimizing the number of experiments. The solution we propose is inspired by a class of methods proposed for solving the model selection problem in memory-based supervised learning (Maron and Moore, 1994; Moore and Lee, 1994). Following the terminology introduced by Maron and Moore (1994), we call a racing method for selection a method that finds a good configuration (model) from a given finite pool of alternatives through a sequence of steps.[1] As the computation proceeds, if sufficient evidence is gathered that some candidate is inferior to at least one other, such a candidate is dropped from the pool and the procedure is iterated over the remaining ones. The elimination of inferior candidates speeds up the procedure and allows a more reliable evaluation of the promising ones.

[1] Several metaheuristics involve continuous parameters. This would actually lead to an infinite set of candidate configurations. In practice, typically only a finite set of possible parameter values is considered, obtained by discretizing the range of the continuous parameters.

This paper makes two main contributions. First, we give a formal definition of the metaheuristic configuration problem. Second, we show that a metaheuristic can be tuned efficiently and effectively by a racing procedure. Our results confirm the general validity of racing algorithms and extend their area of applicability. On a more technical level, leaving aside the specific application to metaheuristics, we also contribute to the general class of racing algorithms. In particular, our method adopts a blocking design (Dean and Voss, 1999) in a nonparametric setting. In some sense, therefore, the method fills the gap between Hoeffding race (Maron and Moore, 1994) and BRACE (Moore and Lee, 1994): similarly to Hoeffding race it features a nonparametric test, and similarly to BRACE it considers a blocking design.

The rest of the paper is structured as follows. Section 2 gives a formal definition of the problem of configuring a metaheuristic. Section 3 describes the general ideas behind racing algorithms and introduces F-Race, a racing method specifically designed to match the peculiar characteristics of the metaheuristic configuration problem. Section 4 provides some background information on MAX–MIN-Ant-System and on the traveling salesman problem (TSP), which are respectively the metaheuristic and the problem considered in this paper. In particular, the section gives a description of the subclass of TSP instances and of the candidate configurations of MAX–MIN-Ant-System that we consider in our experimental evaluation. Section 5 presents some experimental results, and Section 6 concludes the paper.

2 CONFIGURING A METAHEURISTIC

This section introduces and defines the general problem of configuring a metaheuristic. Before proposing a formal definition, it is worth outlining briefly, with the help of an example, the type of problem setting to which our procedure applies. Namely, our methodology is meant to be applied to repetitive problems, that is, problems where many similar instances appear over time.

2.1 An Example: Delivering Pizza

The example we propose is admittedly simplistic and does not cover all possible aspects of the configuration problem; still, it has the merit of highlighting those elements that are essential for the discussion that follows.

Let us consider the following pizza delivery problem. Orders are collected for a (fixed) time period of, say, 30 minutes. At the end of the time period, a pizza delivery boy has some limited amount of time for scheduling a reasonably short tour that visits all the customers who have called in the last 30 minutes. Then the boy leaves and delivers the pizzas following the chosen route. The time available for scheduling may be constant or may be expressed as a function of some characteristic of the instance itself, for example its size, which in the pizza delivery problem might be measured by the number of customers to visit.

In such a setting, every 30 minutes a new instance of an optimization problem is given, and a solution as good as possible has to be found in a limited amount of time. It is very likely that every instance will be different from all previous ones in the location of the customers that need to be visited. Further, a certain variability in the instance size, that is, the number of customers to be served, is to be expected, too.

The occurrence of different instances can be conveniently represented as the result of random experiments governed by some unknown probability measure, say PI, defined on the class of the possible instances. In the example discussed here, it is reasonable to assume that different experiments are independent and all governed by the same probability measure. In Section 2.3, we will briefly discuss how situations in which such assumptions appear unreasonable might be tackled.

Now, our pizza delivery boy loves metaheuristics and uses one to find a shortest possible tour visiting all the customers. Since such a metaheuristic is a general algorithmic template, different configurations are possible (see Section 4.2 for a more detailed example). In our setting, the problem that the delivery boy has to solve is to find the configuration that is expected to yield the best solution to the instances that he typically faces. The concept of typical instance, used here informally, has to be understood in relation to the probability measure PI, and will receive a clear mathematical meaning presently.

Since PI is unknown, the only information that can be used for finding the best configuration must be extracted from a sample of previously seen instances. Adopting the terminology used in machine learning, we will use the expression training instances to denote the available previous instances. On the basis of such training instances, we will look for the configuration that is expected to have the best performance over the whole class of possible instances.

Extending results obtained on a usually small training set to a possibly infinite set of instances is a genuine act of generalization, as intended in supervised learning (Mitchell, 1997). In the context of metaheuristic configuration, generalization is fully justified by the assumption that the same probability measure PI governs the selection of all the instances: both those used for training and those that will be solved afterwards. The training instances are in this sense representative of the whole set of instances.

2.2 The Formal Statement

In order to give a formal definition of the general problem of configuring a metaheuristic, we consider the following objects:

• Θ is the finite set of candidate configurations.

• I is the possibly infinite set of instances.

• PI is a probability measure over the set I of instances: with some abuse of notation, we indicate with PI(i) the probability that the instance i is selected for being solved.[2]

• t : I → ℝ is a function associating to every instance the computation time that is allocated to it.

• c(θ, i) = c(θ, i, t(i)) is a random variable representing the cost of the best solution found by running configuration θ on instance i for t(i) seconds.[3]

• C ⊂ ℝ is the range of c, that is, the possible values for the cost of the best solution found in a run of a configuration θ ∈ Θ on an instance i ∈ I.

• PC is a probability measure over the set C: with the notation[4] PC(c|θ, i), we indicate the probability that c is the cost of the best solution found by running configuration θ on instance i for t(i) seconds.

• C(θ) = C(θ|Θ, I, PI, PC, t) is the criterion that needs to be optimized with respect to θ. In the most general case it measures, in some sense, the desirability of θ.

[2] Since a probability measure is associated with (sub)sets and not with single elements, the correct notation should be PI({i}). Our notational abuse consists therefore in using the same symbol i both for the element i ∈ I and for the singleton {i} ⊂ I.
[3] In the following, for the sake of a lighter notation, the dependency of c on t will often be left implicit.
[4] The same remark as in Note 2 applies here.

On the basis of these concepts, the problem of configuring a metaheuristic can be formally described by the 6-tuple ⟨Θ, I, PI, PC, t, C⟩. The solution of this problem is the configuration θ∗ such that:

    θ∗ = arg minθ C(θ).                                             (1)

As far as the criterion C is concerned, different alternatives are possible. In this paper, we consider the optimization of the expected value of the cost c(θ, i). Such a criterion is adopted in many different applications and, besides being quite natural, it is often very convenient from both the theoretical and the practical point of view. Formally:

    C(θ) = EI,C[c(θ, i)] = ∫I ∫C c(θ, i) dPC(c|θ, i) dPI(i),         (2)

where the expectation is considered with respect to both PI and PC, and the integration is taken in the Lebesgue sense (Billingsley, 1986).

The measures PI and PC are usually not explicitly available, and the analytical solution of the integrals in Equation 2, one for each configuration θ, is not possible. In order to overcome such a limitation, the integrals defined in Equation 2 will be estimated in a Monte Carlo fashion on the basis of a training set of instances, as will be explained in Section 3.
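
To make this Monte Carlo estimation concrete, the following minimal sketch averages the observed costs of one configuration over a sample of training instances. It is only an illustration of Equation 2: the function run_configuration, assumed to run a configuration θ on an instance for t(i) seconds and return the cost of the best solution found, is a hypothetical placeholder and not part of the paper.

def estimate_criterion(theta, training_instances, run_configuration):
    """Monte Carlo estimate of C(theta) = E[c(theta, i)] over training instances.

    `theta` is a candidate configuration, `training_instances` is a list of
    instances assumed to be drawn according to P_I, and `run_configuration`
    is a (hypothetical) function run_configuration(theta, instance) returning
    the cost c(theta, i) of the best solution found within t(i) seconds.
    """
    costs = [run_configuration(theta, instance) for instance in training_instances]
    return sum(costs) / len(costs)

# Usage sketch: picking the candidate with the smallest estimated expected cost
# corresponds to the brute-force approach discussed in Section 3.
# best = min(candidates, key=lambda th: estimate_criterion(th, instances, run_configuration))
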
2.3 Further Considerations and Possible Extensions

The formal configuration problem, as described in Section 2.2, assumes that, as far as a given instance is concerned, no information on the performance of the various candidate configurations can be obtained prior to their actual execution on the instance itself. In this sense, the instances are a priori indistinguishable.

In many practical situations, it is known a priori that various types of instances with different characteristics may arise. In such a situation, all available prior knowledge should be used to cluster the instances into homogeneous classes and to find, for each class, the most suitable configuration.

The case mentioned in Section 2.1, in which it is not reasonable to accept that all instances are extracted independently and according to the same probability measure, can possibly be handled in a similar way. Often, some temporal correlation is observed among instances. In other words, temporal patterns can be observed on previous instances that carry a priori information on the characteristics of the current instance. This phenomenon can be handled by assuming that the instances are generated by a process akin to a time series. Also in this case, different configuration problems should be formulated: each class of instances to be treated separately would be composed of instances that follow a given pattern in time and that are therefore supposed to share similar characteristics. The aim is again to match the hypothesis of a priori indistinguishability of instances within each of the different configuration problems into which the original one is reformulated.

3 A RACING ALGORITHM

Before giving a definition of a racing algorithm for solving the problem given in Equation 1, it is convenient to describe a somewhat naive brute-force approach, in order to highlight some of the difficulties associated with the configuration problem.

A brute-force approach to the problem defined in Equation 1 consists in estimating the quantities defined in Equation 2 by means of a sufficiently large number of runs of each candidate on a sufficiently large set of training instances. The candidate configuration with the smallest estimated quantity is then selected.

However, such a brute-force approach presents some drawbacks. First, the size of the training set must be defined prior to any computation. A criterion is missing to avoid considering, on the one hand, too few instances, which could prevent us from obtaining reliable estimates, and on the other hand, too many instances, which would then require a great deal of useless computation. Second, no criterion is given for deciding how many runs of each configuration on each instance should be performed in order to cope with the stochastic nature of metaheuristics. Finally, the same computational resources are allocated to each configuration: manifestly poor configurations are thoroughly tested to the same extent as the best ones are.

3.1 Racing Algorithms: The Idea

Racing algorithms are designed to provide a better allocation of computational resources among candidate configurations and therefore to overcome the last of the three drawbacks of brute-force described above. At the same time, the racing framework indirectly allows for a clean solution to the first two problems of brute-force, that is, the problems of fixing the number of instances and the number of runs to be considered.

To do so, racing algorithms sequentially evaluate candidate configurations and discard poor ones as soon as statistically sufficient evidence is gathered against them. The elimination of inferior candidates speeds up the procedure and makes it possible to evaluate the promising configurations on more instances and to obtain more reliable estimates of their behavior. Figure 1 visualizes the two different ways of allocating computational resources to candidate configurations that are adopted by brute-force and by racing algorithms.

Figure 1: A visual representation of the amount of computation needed by the two methods. The surface of the dashed rectangle represents the amount of computation for brute-force, the shaded area the one for racing.

Let us suppose that a random sequence of training instances i is available, where the generic k-th term ik is drawn from I according to PI, independently for each k. We assume that i can be extended at will and at a negligible cost, by sampling further from I.

With the notation ck(θ, i) we indicate an array of k terms whose generic l-th component is the cost c(θ, il) of the best solution found by configuration θ on instance il in a run of t(il) seconds. It is clear therefore that, for a given θ, the array ck of length k can be obtained from ck−1 by appending to the latter the cost concerning the k-th instance in i.

A racing algorithm tackles the optimization problem in Equation 1 by generating a sequence of nested sets of candidate configurations:

    Θ0 ⊇ Θ1 ⊇ Θ2 ⊇ . . . ,

starting from Θ0 = Θ. The step from a set Θk−1 to Θk is obtained by possibly discarding some configurations that appear to be suboptimal on the basis of the information available at step k.

At step k, when the set of candidates still in the race is Θk−1, a new instance ik is considered. Each candidate θ ∈ Θk−1 is executed on ik and each observed cost c(θ, ik) is appended to the respective ck−1 to form the different arrays ck(θ, i), one for each θ. Step k terminates by defining the set Θk, obtained by dropping from Θk−1 the configurations that appear to be suboptimal in the light of some statistical test that compares the arrays ck(θ, i) for all θ ∈ Θk−1. The description of the test considered in this paper is given in Section 3.2.

It should be noticed here that, for any θ, each component of the array ck(θ, i), that is, any cost c(θ, i) of the best solution found by a single run of θ on one generic i extracted according to PI, is an estimate of C(θ), as defined in Equation 2. The sample average of ck(θ, i) is therefore itself an estimate of C(θ) and can be used for comparing the performance yielded by different configurations.

The above described procedure is iterated and stops either when all configurations but one are discarded, or when some predefined total time T of computation is reached. That is, the procedure would stop before considering the (k + 1)-th instance if Σl t(il+1) |Θl| > T, where the sum runs over l = 1, . . . , k.
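
The generic racing scheme just described can be summarized in a few lines of code. The sketch below is only illustrative: run_configuration, sample_instance, instance_time and the statistical elimination test select_inferior are assumed to be supplied by the caller (Section 3.2 describes the test actually used by F-Race); none of these names come from the paper.

def race(candidates, sample_instance, run_configuration, instance_time,
         select_inferior, total_budget):
    """Generic racing scheme: evaluate all surviving candidates on one new
    instance per step and drop those that a statistical test declares inferior.

    `candidates` is the initial set Theta_0; `sample_instance()` draws a new
    instance according to P_I; `run_configuration(theta, inst)` returns the
    cost c(theta, inst); `instance_time(inst)` returns t(inst);
    `select_inferior(costs)` returns the candidates to discard, given the
    cost arrays collected so far; `total_budget` is the time limit T.
    """
    survivors = list(candidates)
    costs = {theta: [] for theta in survivors}   # the arrays c_k(theta, i)
    spent = 0.0

    while len(survivors) > 1:
        instance = sample_instance()
        step_cost = instance_time(instance) * len(survivors)
        if spent + step_cost > total_budget:     # would exceed T: stop here
            break
        for theta in survivors:                  # run every survivor on the new instance
            costs[theta].append(run_configuration(theta, instance))
        spent += step_cost
        for theta in select_inferior({t: costs[t] for t in survivors}):
            survivors.remove(theta)              # drop statistically inferior candidates

    # return the surviving candidate with the best sample average cost
    return min(survivors, key=lambda t: sum(costs[t]) / len(costs[t]))

With select_inferior instantiated by the Friedman-based test of Section 3.2, this skeleton becomes F-Race; with a paired t-test it corresponds to the tn-Race and tb-Race variants used for comparison in Section 5.
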

3.2 F-Race

The racing algorithm we propose, called F-Race in the following, is based on the Friedman test, a statistical method for hypothesis testing also known as the Friedman two-way analysis of variance by ranks (Conover, 1999).

To describe the test, let us assume that F-Race has reached step k, and that n = |Θk−1| configurations are still in the race. The Friedman test assumes that the observed costs are k mutually independent n-variate random variables (c(θ1, il), c(θ2, il), . . . , c(θn, il)), called blocks (Dean and Voss, 1999), where each block corresponds to the computational results obtained on instance il by each configuration in the race at step k. Within each block the quantities c(θ, il) are ranked from the smallest to the largest; average ranks are used in case of ties. For each configuration θj ∈ Θk−1, let Rlj be the rank of θj within block l, and Rj the sum of its ranks over all instances il, with 1 ≤ l ≤ k. The Friedman test considers the following statistic (Conover, 1999):

    T = (n − 1) Σj (Rj − k(n + 1)/2)² / ( Σl Σj Rlj² − kn(n + 1)²/4 ),

where the sums run over j = 1, . . . , n and l = 1, . . . , k. Under the null hypothesis that all possible rankings of the candidates within each block are equally likely, T is approximately χ² distributed with n − 1 degrees of freedom. If the observed T exceeds the 1 − α quantile of such a distribution, the null is rejected, at the approximate level α, in favor of the hypothesis that at least one candidate tends to yield a better performance than at least one other.

If the null is rejected, we are justified in performing pairwise comparisons between individual candidates. Candidates θj and θh are considered different if

    |Rj − Rh| / sqrt( 2k (1 − T/(k(n − 1))) ( Σl Σj Rlj² − kn(n + 1)²/4 ) / ((k − 1)(n − 1)) ) > t1−α/2,

where t1−α/2 is the 1 − α/2 quantile of Student's t distribution (Conover, 1999).

In F-Race, if at step k the null of the aggregate comparison is not rejected, all candidates in Θk−1 pass to Θk. On the other hand, if the null is rejected, pairwise comparisons are executed between the best candidate and each of the others. All candidates that result significantly worse than the best are discarded and will not appear in Θk.

3.3 Discussion on the Role of Ranking in F-Race

In F-Race, ranking plays an important two-fold role. The first is connected with the nonparametric nature of a test based on ranking. The main merit of nonparametric analysis is that it does not require formulating hypotheses on the distribution of the observations. Discussions of the relative pros and cons of the parametric and nonparametric approaches can be found in most textbooks on statistics (Larson, 1982). For a comprehensive presentation of the topic, we refer the reader, for example, to Conover (1999). Here we limit ourselves to mentioning some widely accepted facts about parametric and nonparametric hypothesis testing: when the hypotheses they formulate are met, parametric tests have a higher power than nonparametric ones and usually require much less computation. Further, when a large amount of data is available, the hypotheses for the application of parametric tests tend to be met by virtue of the central limit theorem. Finally, it is well known that the t-test, the classical parametric test that is of interest here, is robust against departures from some of its hypotheses, namely the normality of the data: when the hypothesis of normality is not strictly met, the t-test gracefully loses power.

As far as the metaheuristic configuration problem is concerned, we are in a situation in which these arguments look suspicious. First, since we wish to reduce the number of candidates as soon as possible, we deal with very small samples, and it is exactly on these small samples, for which the central limit theorem cannot be invoked, that we wish to have the maximum power. Second, the computational costs of the test are not really relevant, since in any case they are negligible compared to the computational cost of executing configurations of the metaheuristic in order to enlarge the available samples. Section 5 shows that the doubts expressed here find some empirical support in our experiments.

A second role played by ranking in F-Race is to implement in a natural way a blocking design (Dean and Voss, 1999). The variation in the observed costs c is due to different sources: metaheuristics are intrinsically stochastic algorithms, the instances might be very different from one another, and finally some configurations perform better than others. This last source of variation is the one that is of interest in the configuration problem, while the others might be considered as disturbing elements. Blocking is an effective way of normalizing the costs observed on different instances. By focusing only on the ranking of the different configurations within each instance, blocking eliminates the risk that the variation due to the differences among instances washes out the variation due to the differences among configurations.

The work proposed in this paper was openly and largely inspired by some algorithms proposed in the machine learning community (Maron and Moore, 1994; Moore and Lee, 1994), but it is precisely in the adoption of a statistical test based on ranking that it diverges from previously published works. Maron and Moore (1994) proposed Hoeffding Race, which adopts a nonparametric approach but does not consider blocking. In a subsequent paper, Moore and Lee (1994) describe BRACE, which adopts blocking but discards the nonparametric setting in favor of a Bayesian approach. Other relevant work was proposed by Gratch et al. (1993) and by Chien et al. (1995), who consider blocking in a parametric setting.

This paper, to the best of our knowledge, is the first work in which blocking is considered in a nonparametric setting. Further, in all the above mentioned works blocking was always implemented through multiple pairwise paired comparisons (Hsu, 1996), and only the most recent one (Chien et al., 1995) considers a correction for multiple tests. F-Race is the first racing algorithm to implement blocking through ranking and to adopt an aggregate test over all candidates, to be performed prior to any pairwise test.
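
To make the test of Section 3.2 concrete, the sketch below computes the within-block ranks, the statistic T, and the pairwise comparisons against the best candidate. It is a plain re-implementation of the formulas above, written against scipy's chi-squared and Student's t quantiles; the degrees of freedom for the t quantile follow Conover (1999). It is meant only as an illustration of the select_inferior step of the racing sketch in Section 3.1, not as the authors' original code.

from scipy.stats import rankdata, chi2, t as student_t

def friedman_eliminate(costs, alpha=0.05):
    """Return the candidates to discard at the current F-Race step.

    `costs` maps each surviving candidate to its list of observed costs
    c(theta, i_1), ..., c(theta, i_k); all lists have the same length k.
    """
    names = list(costs)
    n, k = len(names), len(costs[names[0]])
    if n < 2 or k < 2:
        return []

    # Rank candidates within each block (instance); ties get average ranks.
    ranks = [rankdata([costs[name][l] for name in names]) for l in range(k)]
    R = {name: sum(ranks[l][j] for l in range(k)) for j, name in enumerate(names)}
    sum_sq = sum(r ** 2 for block in ranks for r in block)
    denom = sum_sq - k * n * (n + 1) ** 2 / 4.0
    if denom == 0.0:                      # identical rankings everywhere: no decision
        return []
    T = (n - 1) * sum((R[name] - k * (n + 1) / 2.0) ** 2 for name in names) / denom

    # Aggregate test: keep everybody if the null of equal rankings is not rejected.
    if T <= chi2.ppf(1.0 - alpha, n - 1):
        return []

    # Pairwise comparisons of every candidate against the current best (lowest rank sum).
    best = min(names, key=lambda name: R[name])
    spread = (2.0 * k * max(0.0, 1.0 - T / (k * (n - 1))) * denom
              / ((k - 1) * (n - 1))) ** 0.5
    threshold = student_t.ppf(1.0 - alpha / 2.0, (k - 1) * (n - 1)) * spread
    return [name for name in names if name != best and abs(R[name] - R[best]) > threshold]

Plugged into the race skeleton of Section 3.1 as select_inferior, this reproduces the F-Race elimination step.
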

4 MAX–MIN-ANT-SYSTEM FOR TSP

In this paper we illustrate F-Race by using as an example the configuration of MAX–MIN-Ant-System (MMAS) (Stützle and Hoos, 1997; Stützle and Hoos, 2000), a particular Ant Colony Optimization algorithm (Dorigo and Di Caro, 1999; Dorigo and Stützle, 2002), over a class of instances of the Traveling Salesman Problem (TSP).

4.1 A Class of TSP Instances

Given a complete graph G = (N, A, d) with N being the set of n = |N| nodes, A being the set of arcs fully connecting the nodes, and d being the weight function that assigns each arc (i, j) ∈ A a length dij, the Traveling Salesman Problem (TSP) is the problem of finding a shortest closed tour visiting each node of G once. We assume the TSP is symmetric, that is, we have dij = dji for every pair of nodes i and j.

The TSP is extensively studied in the literature and serves as a standard benchmark problem (Johnson and McGeoch, 1997; Lawler et al., 1985; Reinelt, 1994). For our study we randomly generate Euclidean TSP instances with a random distribution of city coordinates and a random number of cities. Euclidean TSPs were chosen because such instances are used in a large number of experimental studies on the TSP (Johnson and McGeoch, 1997; Johnson et al., 2001). In our case, city locations are randomly chosen according to a uniform distribution in a square of dimension 10,000 × 10,000, and the resulting distances are rounded to the nearest integer. The number of cities in each instance is chosen as an integer randomly sampled according to a uniform distribution in the interval [300, 500]. We generated a total of 400 such instances for the experiments reported in Section 5.

4.2 MAX–MIN-Ant-System

Ant Colony Optimization (ACO) (Dorigo et al., 1999; Dorigo and Di Caro, 1999; Dorigo and Stützle, 2002) is a population-based approach inspired by the foraging behavior of ants for the solution of hard combinatorial optimization problems. In ACO, artificial ants implement stochastic construction procedures that are biased by pheromone trails and heuristic information on the problem being solved. The solutions obtained by the ants may then be improved by applying some local search routine. ACO algorithms typically follow the high-level procedure given in Figure 2.

    procedure Ant Colony Optimization
        Init pheromones, calculate heuristic
        while (termination condition not met) do
            p = ConstructSolutions(pheromones, heuristic)
            p = LocalSearch(p)   % optional
            GlobalUpdateTrails(p)
        end
    end Ant Colony Optimization

Figure 2: Algorithmic skeleton of ACO for static combinatorial optimization problems.

MMAS (Stützle and Hoos, 1996; Stützle and Hoos, 1997; Stützle and Hoos, 2000) is currently one of the best performing ACO algorithms for the TSP.

MAX–MIN-Ant-System constructs tours as follows: initially, each of the m ants is put on some randomly chosen city. At each construction step, ant k applies a probabilistic action choice rule. In particular, when located at city i, ant k chooses to go to a still unvisited city j at the t-th iteration with probability

    pij^k(t) = [τij(t)]^α · [ηij]^β / Σ_{l ∈ Nik} [τil(t)]^α · [ηil]^β,    if j ∈ Nik;    (3)

where ηij = 1/dij is an a priori available heuristic value, α and β are two parameters which determine the relative influence of the pheromone trail and the heuristic information, and Nik is the feasible neighborhood of ant k, that is, the set of cities which ant k has not yet visited; if j ∉ Nik, we have pij^k(t) = 0.

After all ants have constructed a solution, the pheromone trails are updated according to

    τij(t + 1) = (1 − ρ) · τij(t) + Δτij^best,    (4)

where Δτij^best = 1/L^best if arc (i, j) ∈ T^best and zero otherwise. Here T^best is either the iteration-best solution T^ib or the global-best solution T^gb, and L^best is the corresponding tour length. Experimental results showed that the best performance is obtained by gradually increasing the frequency of choosing T^gb for the pheromone trail update (Stützle and Hoos, 2000).

In MMAS, lower and upper limits τmin and τmax on the possible pheromone strengths on any arc are imposed to avoid search stagnation. The pheromone trails in MMAS are initialized to their upper pheromone trail limit τmax, leading to an increased exploration of tours at the start of the algorithm.

In our experimental study, we have chosen a number of configurations of MMAS that differ in particular parameter settings. We focused on alternative settings for the main algorithm parameters as identified in earlier studies; in particular, we considered values of α ∈ {1, 1.25, 1.5, 2}, m ∈ {1, 5, 10, 25}, β ∈ {0, 1, 3, 5}, and ρ ∈ {0.6, 0.7, 0.8, 0.9}. Each possible combination of these parameter settings leads to one particular algorithm configuration, for a total of 4 × 4 × 4 × 4 = 256 configurations. In our experiments each solution is improved by a 2.5-opt local search procedure (Bentley, 1992).
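
As an illustration of the instance class of Section 4.1 and of Equations 3 and 4, the following sketch generates one random Euclidean instance, draws the next city for an ant according to the action choice rule, and applies the pheromone update with the MMAS trail limits. It is a simplified reading of the algorithm (no local search, no iteration-best/global-best schedule), and all function and variable names are our own, not the authors'.

import math
import random

def random_instance(rng):
    """One instance of the class in Section 4.1: n ~ U[300, 500] cities,
    coordinates uniform in a 10,000 x 10,000 square, integer distances."""
    n = rng.randint(300, 500)
    pts = [(rng.uniform(0, 10000), rng.uniform(0, 10000)) for _ in range(n)]
    return [[round(math.dist(a, b)) for b in pts] for a in pts]

def next_city(i, unvisited, tau, eta, alpha, beta, rng):
    """Probabilistic action choice rule of Equation 3 for an ant located at city i.
    `eta[i][j]` is assumed to hold the heuristic value 1/d_ij."""
    candidates = list(unvisited)
    weights = [(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def update_pheromones(tau, best_tour, best_length, rho, tau_min, tau_max):
    """Evaporation plus deposit on the arcs of the best tour (Equation 4),
    with the MMAS trail limits tau_min and tau_max enforced afterwards."""
    deposit = {(best_tour[k], best_tour[(k + 1) % len(best_tour)]): 1.0 / best_length
               for k in range(len(best_tour))}
    n = len(tau)
    for i in range(n):
        for j in range(n):
            value = (1.0 - rho) * tau[i][j]
            value += deposit.get((i, j), 0.0) + deposit.get((j, i), 0.0)
            tau[i][j] = min(tau_max, max(tau_min, value))

In a full implementation, each of the m ants would call next_city repeatedly until its tour is complete, the 2.5-opt local search would then be applied, and the iteration-best or global-best tour would be passed to update_pheromones.
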

5 EXPERIMENTAL RESULTS

In this section we propose a Monte Carlo evaluation of F-Race based on a resampling technique (Good, 2001). For comparison, we consider two other instances of racing algorithms, both based on the paired t-test. They are therefore parametric, and they adopt a blocking design. We refer to them as tn-Race and tb-Race. The first does not adopt any correction for multiple tests, while the second adopts the Bonferroni correction and is therefore not unlike the method described by Chien et al. (1995).

The goal is to select a configuration that is as good as possible out of the 256 configurations of MAX–MIN-Ant-System described in Section 4.2.

Each configuration was executed once on each of the 400 instances for 10 s on an Athlon 1.4 GHz CPU with 512 MB of RAM, for a total time of about 12 days, in order to allow the application of the resampling analysis in a following phase. The costs of the best solutions found in these experiments were stored in a two-dimensional 400 × 256 array. In the following, when saying that we run configuration j on instance i, we will simply mean that we execute the pseudo-experiment that consists in reading the value in position (i, j) from the array of results.

From the 400 instances, we extract 1000 pseudo-samples, each of which is obtained by randomly re-ordering the original instances. Each pseudo-sample is used for a pseudo-trial, that is, for simulating a run of a racing algorithm: one after the other the instances are considered and, on the basis of the results of pseudo-experiments, configurations are progressively discarded. Each algorithm stops after executing 5 × 256 pseudo-experiments.[5] Upon time expiration, the best candidate in the pseudo-trial is selected and is tested on 10 instances that were not used during the selection itself. The results obtained on these previously unseen instances are recorded and are used for comparing the three racing methods. To summarize, after 1000 pseudo-trials a vector of 10 × 1000 components is obtained for each of F-Race, tn-Race, and tb-Race. It is important to note that the three algorithms face the same pseudo-samples and that the candidates selected in each pseudo-trial by each algorithm are tested on the same unseen instances. The generic i-th components of the three 10 × 1000 vectors refer therefore to the results obtained by the champions of the three races respectively, where the three races were conducted on the basis of the same pseudo-sample: we are therefore justified in using paired statistical tests when comparing the three races among themselves.

[5] In such a time, by definition, brute-force would be able to test the 256 candidates on only 5 instances. The 5 × 256 pseudo-experiments simulate 3.5 hours of actual computation on the computer used for producing the results proposed here.

On the basis of a paired Wilcoxon test we can state that F-Race is significantly better, at a significance level of 5%, than both tn-Race and tb-Race.[6]

[6] The same conclusion can be drawn on the basis of a paired t-test.

Some insight into this result can be obtained from the following observation. By dropping the less interesting candidates early, F-Race is able to perform more experiments on the more promising candidates. Over the 1000 pseudo-trials considered, at the moment in which the computation time was up and a decision among the surviving candidates had to be taken, the set of survivors was on average composed of 7.9 candidates, and such survivors had been tested on average on 77.9 instances. In the case of tn-Race, the average size of the set of survivors upon expiration of the computation time was 31.1, while the number of instances seen by such survivors was on average 18.2. For tb-Race the numbers are 253.8 and 5, respectively. In this sense, F-Race proved to be the bravest of the three, while tb-Race appeared to be extremely conservative: on average it dropped only slightly more than 2 candidates before the time limit.

On the basis of our Monte Carlo evaluation, a stronger statement can be made on the quality of the results obtained by F-Race. We have shown above that the performance of F-Race was good in a relative sense: F-Race produced better results than its competitors. We state now that, in a precise sense to be defined presently, the performance of F-Race was also good in an absolute sense. We compare F-Race with Cheat, a brute-force method that, rather unfairly, uses in each pseudo-trial the same number of instances used by F-Race and on these instances runs all the candidate configurations. In doing so, Cheat allows itself an enormously large amount of computation time. In our experiments, Cheat performed on average about 19950 experiments per trial, which is equivalent to about 55 hours of computation against the 3.5 hours available to F-Race. The selection operated by Cheat is the optimum that can be obtained from the fixed set of training instances, considering only one run of each configuration on each instance. F-Race can be seen as an approximation of Cheat: the set of experiments performed by F-Race is a proper subset of the experiments performed by Cheat.

Now, in the statistical analysis of the results obtained in our Monte Carlo experiments, we were not able to reject the null hypothesis that F-Race and Cheat produce equivalent results. Also in this case, we worked at the significance level of 5%: neither the Wilcoxon test nor the t-test was able to show significance.
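
A single pseudo-trial of the resampling evaluation described above can be sketched as follows. The 400 × 256 cost array is assumed to be available as a nested list `results`; `friedman_eliminate` refers to the sketch given after Section 3.3, and the time budget of Section 3.1 is reduced here to a simple count of pseudo-experiments, which is all the pseudo-trial needs. This is our own reading of the protocol, not the authors' code.

import random

def pseudo_trial(results, rng, budget=5 * 256, n_test=10):
    """Simulate one pseudo-trial: race on a shuffled prefix of the stored
    results, then evaluate the selected configuration on unseen instances.

    `results[i][j]` holds the stored cost of configuration j on instance i;
    `budget` is the total number of pseudo-experiments allowed (5 x 256 here).
    """
    n_instances, n_configs = len(results), len(results[0])
    order = rng.sample(range(n_instances), n_instances)   # one pseudo-sample

    survivors = list(range(n_configs))
    costs = {j: [] for j in survivors}
    used, step = 0, 0
    while (len(survivors) > 1 and used + len(survivors) <= budget
           and step < n_instances - n_test):
        i = order[step]                                    # next training instance
        for j in survivors:
            costs[j].append(results[i][j])                 # one pseudo-experiment each
        used += len(survivors)
        for j in friedman_eliminate({j: costs[j] for j in survivors}):
            survivors.remove(j)
        step += 1

    winner = min(survivors, key=lambda j: sum(costs[j]) / len(costs[j]))
    test_instances = order[-n_test:]                       # instances never raced on
    return [results[i][winner] for i in test_instances]

# Example: costs_of_winner = pseudo_trial(results, random.Random(0))
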

6 CONCLUSIONS

The paper has given a formal definition of the problem of configuring a metaheuristic and has presented F-Race, an algorithm belonging to the class of racing algorithms proposed in the machine learning community for solving the model selection problem (Maron and Moore, 1994).

In giving a formal definition of the configuration problem, we have stressed the important role played by the probability measure defined on the class of the instances. Without such a concept, it is impossible to give a meaning to the generalization process that is implicit when a configuration is selected on the basis of its performance on a limited set of instances.

F-Race, the algorithm we propose in this paper, is the specialization of the generic class of racing algorithms to the configuration of metaheuristics. The adoption of the Friedman test, which is nonparametric and two-way, indeed matches the specific characteristics of the configuration problem. As shown by the experimental results presented in Section 5, F-Race obtains better results than its competitors that adopt a parametric approach. This better performance can indeed be explained by its ability to discard inferior candidates earlier and faster than the competitors. Still, we do not wish to use these results to claim a general superiority of F-Race over its fellow racing algorithms. Rather, we wish to stress the appeal of the racing idea in itself, and we interpret our results as evidence that this idea is extremely promising for configuring metaheuristics and should be further investigated.

Acknowledgments

This work was supported by the "Metaheuristics Network", a Research Training Network funded by the Improving Human Potential programme of the CEC, grant HPRN-CT-1999-00106. The information provided is the sole responsibility of the authors and does not reflect the Community's opinion. The Community is not responsible for any use that might be made of data appearing in this publication.

References

Bentley, J. L. (1992). Fast algorithms for geometric traveling salesman problems. ORSA Journal on Computing, 4(4):387–411.

Billingsley, P. (1986). Probability and Measure. John Wiley & Sons, New York, NY, USA, second edition.

Chien, S., Gratch, J., and Burl, M. (1995). On the efficient allocation of resources for hypothesis evaluation: A statistical approach. Pattern Analysis and Machine Intelligence, 17(7):652–665.

Conover, W. J. (1999). Practical Nonparametric Statistics. John Wiley & Sons, New York, NY, USA, third edition.

Dean, A. and Voss, D. (1999). Design and Analysis of Experiments. Springer Verlag, New York, NY, USA.

Dorigo, M. and Di Caro, G. (1999). The Ant Colony Optimization meta-heuristic. In Corne, D., Dorigo, M., and Glover, F., editors, New Ideas in Optimization, pages 11–32. McGraw Hill, London, UK.

Dorigo, M., Di Caro, G., and Gambardella, L. M. (1999). Ant algorithms for discrete optimization. Artificial Life, 5(2):137–172.

Dorigo, M. and Stützle, T. (2002). The ant colony optimization metaheuristic: Algorithms, applications and advances. In Metaheuristics Handbook. Kluwer Academic Publishers. In press.

Good, P. I. (2001). Resampling Methods. Birkhäuser, Boston, MA, USA, second edition.

Gratch, J., Chien, S., and DeJong, G. (1993). Learning search control knowledge for deep space network scheduling. In International Conference on Machine Learning, pages 135–142.

Hsu, J. (1996). Multiple Comparisons. Chapman & Hall/CRC, Boca Raton, FL, USA.

Johnson, D. S. and McGeoch, L. A. (1997). The travelling salesman problem: A case study in local optimization. In Aarts, E. H. L. and Lenstra, J. K., editors, Local Search in Combinatorial Optimization, pages 215–310. John Wiley & Sons, Chichester, UK.

Johnson, D. S., McGeoch, L. A., Rego, C., and Glover, F. (2001). 8th DIMACS implementation challenge. http://www.research.att.com/~dsj/chtsp/.

Larson, H. (1982). Introduction to Probability Theory and Statistical Inference. John Wiley & Sons, New York, NY, USA.

Lawler, E. L., Lenstra, J. K., Kan, A. H. G. R., and Shmoys, D. B. (1985). The Travelling Salesman Problem. John Wiley & Sons, Chichester, UK.

Maron, O. and Moore, A. W. (1994). Hoeffding races: Accelerating model selection search for classification and function approximation. In Cowan, J. D., Tesauro, G., and Alspector, J., editors, Advances in Neural Information Processing Systems, volume 6, pages 59–66. Morgan Kaufmann Publishers, Inc.

Mitchell, T. M. (1997). Machine Learning. McGraw-Hill, New York, NY, USA.

Moore, A. W. and Lee, M. S. (1994). Efficient algorithms for minimizing cross validation error. In International Conference on Machine Learning, pages 190–198. Morgan Kaufmann Publishers, Inc.

Reinelt, G. (1994). The Traveling Salesman: Computational Solutions for TSP Applications, volume 840 of Lecture Notes in Computer Science. Springer Verlag, Berlin, Germany.

Stützle, T. and Hoos, H. (1996). Improving the Ant-System: A detailed report on the MAX–MIN Ant System. Technical Report AIDA–96–12, FG Intellektik, TU Darmstadt, Germany.

Stützle, T. and Hoos, H. H. (1997). The MAX–MIN Ant System and local search for the traveling salesman problem. In Bäck, T., Michalewicz, Z., and Yao, X., editors, Proceedings of the 1997 IEEE International Conference on Evolutionary Computation (ICEC'97), pages 309–314. IEEE Press, Piscataway, NJ, USA.

Stützle, T. and Hoos, H. H. (2000). MAX–MIN Ant System. Future Generation Computer Systems, 16(8):889–914.
