Dynamic Algorithm Selection For Pareto Optimal Set Approximation
Abstract This paper presents a meta-algorithm for approximating the Pareto optimal
set of costly black-box multiobjective optimization problems given a limited number
of objective function evaluations. The key idea is to switch among different algo-
rithms during the optimization search based on the predicted performance of each
algorithm at the time. Algorithm performance is modeled using a machine learning
technique based on the available information. The predicted best algorithm is then
selected to run for a limited number of evaluations. The proposed approach is tested
on several benchmark problems and the results are compared against those obtained
using any one of the candidate algorithms alone.
Keywords multiobjective optimization · expensive black-box function · machine
learning · classification · algorithm selection · hypervolume metric · features
1 Introduction
Ingrida Steponavičė
School of Mathematical Sciences, Monash University, Clayton, Australia
Tel.: +61 3 9905 8511
E-mail: [email protected]
Rob J Hyndman
Department of Econometrics & Business Statistics, Monash University, Clayton, Australia
Laura Villanova
School of Mathematical Sciences, Monash University, Clayton, Australia
Kate Smith-Miles
School of Mathematical Sciences, Monash University, Clayton, Australia
In multiobjective optimization problems, there is generally no single solution that simultaneously optimizes all the objectives; instead, there is a set
of solutions representing the best possible trade-offs among the objectives. Therefore,
multiobjective optimization is a very important research area due to the multiobjec-
tive nature of most real-life problems, with many challenging issues to tackle.
The development of multiobjective optimization techniques has been an active
area of research for many years, resulting in a wide variety of approaches [4, 23, 24].
Besides the challenge caused by multiple objectives, practical problems arising in
engineering often require the solution of optimization problems where analytical ex-
pressions of the objective functions are unavailable and the evaluation of the objective
functions is very expensive. Such problems might involve computationally expen-
sive black-box simulation, or require costly experiments to be conducted in order to
obtain the objective function values. One simulation or experiment may take several
hours, days or even weeks. In addition to time restrictions, there can be other limita-
tions such as financial and physical constraints. Therefore, in order to keep the cost
affordable, it is important to find approximate solutions of the optimization problem
within a very restricted number of function evaluations (often only a few hundred
evaluations can be made).
Methods have been developed to solve expensive black-box optimization (BBO)
problems by building a surrogate model that approximates the objective function
and predicts promising new solutions at a smaller evaluation cost [14, 31]. One of
the state-of-the-art methods for expensive multiobjective optimization problems, named
ParEGO, was developed by Knowles [16]. It is essentially a multiobjective translation
of the efficient global optimization (EGO) method [14], where multiple objectives are
converted to a single objective using a scalarization function with different parameter
values at each step. The idea of modelling challenging functions by statistical mod-
els has a very long history, and was popularized for optimization problems in [25,
34]. Other EGO modifications to address costly multiobjective optimization prob-
lems are also available, including SMS-EGO [29], -EGO [37], MOEA/D-EGO [41],
and EGO-MO [7].
In addition to the EGO family of algorithms, we have previously proposed the
EPIC (Efficient Pareto Iterative Classification) algorithm [32]. In this approach, the
Pareto optimal set is identified by classifying regions of the decision space as likely to
be part of the Pareto set or not. A support-vector-machine (SVM) is applied in order
to capture nonlinear relationships between class labels and features (i.e., decision
variable values in this case). The advantage of this approach is that it does not depend
on the dimensionality of the objective space and so is suitable for high-dimensional
multiobjective problems.
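To illustrate the kind of classification step EPIC relies on, the sketch below labels the evaluated decision vectors as non-dominated or dominated and fits an RBF-kernel SVM to score unevaluated candidates. This is our own minimal illustration using scikit-learn, not the authors' implementation, and all function names are ours.

```python
import numpy as np
from sklearn.svm import SVC

def is_nondominated(F):
    """Boolean mask of the non-dominated rows of the objective matrix F (minimization)."""
    mask = np.ones(len(F), dtype=bool)
    for i, f in enumerate(F):
        # f is dominated if another point is no worse in every objective and better in at least one
        mask[i] = not np.any(np.all(F <= f, axis=1) & np.any(F < f, axis=1))
    return mask

def score_candidates(X_evaluated, F_evaluated, X_candidates):
    """Fit an RBF-kernel SVM on dominated/non-dominated labels of the evaluated
    decision vectors and score unevaluated candidates by distance to the boundary."""
    y = is_nondominated(F_evaluated).astype(int)
    clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_evaluated, y)
    # larger values suggest a candidate is more likely to be non-dominated
    return clf.decision_function(X_candidates)
```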
Other approaches are also possible. It is an open question how to best select the
optimization algorithm for the particular problem of interest. Comparisons of various
methods for expensive multiobjective black-box optimization are becoming increasingly common in the
literature, but they are usually limited in scope and highlight the advantages of some proposed
modification to an existing method over its predecessor. There is a need for a deeper
analysis of the characteristics of the available algorithms and how well-suited they
are to specific problems.
To our knowledge, the methods developed so far each have some strengths and
weaknesses, and despite advances made in recent years, they are still far from being
able to solve a variety of real-life problems efficiently. Often, they are better suited to
some restricted problem classes. Moreover, for the same problem, some algorithms
can perform very well at the beginning and then lose their power, while other al-
gorithms can perform badly at the beginning but later demonstrate their superiority.
Such an example is shown in Figure 1, which presents the percentage of runs in which
each of the considered algorithms (ParEGO, EPIC and EPIC-NM, a hybrid of EPIC and
Nelder-Mead) outperformed the others with respect to the hypervolume (HV) met-
ric (see Section 3.3) on the benchmark problem ZDT3 over 100 runs; the horizontal
axis represents the number of objective function evaluations (or iterations). This fig-
ure clearly shows that there is no single algorithm (at least among those considered)
performing better than the rest for all runs and at all points in time. For example, if
one can afford more than 70 evaluations, one should use ParEGO; in case of fewer
than 20 affordable evaluations, one should run EPIC-NM. As this figure suggests,
one might think that we can obtain good results by running EPIC-NM for the first 20
iterations, then EPIC for the next 50 iterations, and then ParEGO for the remaining
iterations. However, algorithm performance depends on the decision space already
explored. If we switch algorithms, then we would also have a different historical ex-
ploration of the decision space, so the performance may not match that presented in
Figure 1. Thus, there is no guarantee that the results obtained using this simple idea
will outperform the algorithms running separately.
Fig. 1 Percentage of runs (out of 100) in which each algorithm (ParEGO, EPIC, EPIC-NM) performed best with respect to the HV metric on the ZDT3 problem, plotted against the number of objective function evaluations (iterations)
Therefore, we are interested in learning how to select the right algorithm at each
stage of the optimization process when very little is known about the multiobjective
optimization problem in advance. In particular, we focus on expensive black-box
problems where we wish to limit the number of function evaluations.
The paper has the following structure. Section 2 introduces the main concepts
involved in multiobjective optimization, and our proposed approach is described in
Section 3. In Section 4, we outline our experimental setup and the selected algo-
rithms, and present and analyse the results we have obtained on some test problems.
Section 5 draws some conclusions and briefly discusses some future research direc-
tions.
This is an initial exploration of an approach to this problem, describing how it
can be implemented, and highlighting and discussing the results obtained on a small
number of optimization problems. Much larger computational experiments involving
many more optimization problems would be required in order to draw general con-
clusions, and validate the proposed approach. This would take a vast amount of time
and so is left for future research.
There are many different algorithms that perform well on some problem classes and
struggle on others, and it is difficult to accurately predict the performance of an al-
gorithm on a particular problem. In practice, the number of function evaluations
required by the candidate algorithm to solve some particular problem can be vast.
When faced with a particular problem, especially one that must be solved within a limited number of func-
tion evaluations, one must select an algorithm without being sure that the choice is the most
appropriate one. A bad decision may lead to an unacceptable number of function
evaluations and poor approximation of the true Pareto set. Algorithm selection is a
learning problem where we use a model to predict the expected performance of each
algorithm on a given problem; the model is trained on a set of performance data for a
number of problems [30]. For each new problem, the model is used to select the algo-
rithm that is expected to give the best results. In addition to static approaches, where
the selection is performed before running the algorithms, dynamic approaches have also been proposed in which the choice of algorithm is revisited during the search.
Here, we suggest switching among different algorithms during the search, based
on the information collected in the objective and decision spaces. For this purpose, we
use a model that predicts which algorithm will perform the best in a given situation
according to a selected performance metric. In multiobjective optimization, algorithm
performance can be assessed taking into account different qualities of the estimated
Pareto optimal set such as spread, convergence, distribution, etc. The choice of met-
rics to use in evaluating algorithm performance is somewhat subjective.
The basic idea of dynamic algorithm selection is to circumvent the following
challenges associated with expensive multiobjective black-box optimiza-
tion problems: (i) selecting the ‘right’ algorithm to solve the problem with very little
(or no) knowledge about it; and (ii) obtaining a high quality approximation of the
Pareto optimal set within a limited number of evaluations.
In Stage A, we collect the training data and use them to build the performance
prediction model(s). That is, a classification algorithm is used to learn the relation-
ship between the descriptive metrics of a current situation and subsequent algorithm
performance over the next few evaluations. This is based on a large dataset where all
considered algorithms have been applied to a large number of problems at various
points in time and their performance has been monitored.
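A minimal sketch of how one such training instance could be assembled is given below. Here `run_for` (continue the search with a given algorithm for k further evaluations) and `hypervolume` are hypothetical helpers standing in for the machinery described above, and the labelling rule (the algorithm with the largest HV after k further evaluations) follows the labelling used later in the experiments.

```python
def make_training_instance(metrics, state, algorithms, run_for, hypervolume, k=5):
    """One Stage A instance: descriptive metrics of the current situation, labelled
    with the algorithm achieving the largest HV after k further evaluations.
    `run_for` and `hypervolume` are problem-specific helpers (assumed, not shown)."""
    hv_after = {name: hypervolume(run_for(algo, state.copy(), k))
                for name, algo in algorithms.items()}
    label = max(hv_after, key=hv_after.get)   # name of the best-performing algorithm
    return metrics, label
```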
Stage B employs the prediction model to approximate the Pareto optimal set and
can be decomposed into the following steps:
Step B1 Generate an initial set in the decision space and evaluate the objective func-
tions;
Step B2 Given some evaluated vectors, calculate descriptive metrics;
Step B3 Ask the prediction model to predict the best algorithm based on the calcu-
lated metrics;
Step B4 Run the suggested algorithm for a limited number of evaluations;
Step B5 Stop if the maximum number of function evaluations is reached. Otherwise,
go to B2.
Stage B is represented in Figure 2. The most important elements are the prediction
model and descriptive metrics which at the very beginning are calculated from the
initial set of evaluated solutions. After running a suggested algorithm for a small
number of iterations, the solution set is updated. Metrics are updated and the same
steps are repeated until the maximum number of evaluations is reached.
6 Ingrida Steponavičė et al.
Initial
Set Which algorithm
should I run?
Suggested
Descriptive Prediction
Metrics Model Algorithm
If # sol ≥ N
Finish
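Steps B1–B5 amount to a short loop. The sketch below is our reading of Figure 2, with every problem-specific operation (initial design, evaluation, metric computation, running an algorithm for a few evaluations) passed in as a callable; none of these names come from the paper.

```python
def dynamic_selection(init_design, evaluate, descriptive_metrics, run_for,
                      algorithms, model, max_evaluations, block=5):
    """Stage B: repeatedly ask the prediction model which algorithm to run next."""
    X = init_design()                                  # Step B1: initial set in the decision space
    F = evaluate(X)
    used = len(X)
    while used < max_evaluations:                      # Step B5: stop at the evaluation budget
        feats = descriptive_metrics(X, F)              # Step B2: metrics of the current situation
        name = model.predict([feats])[0]               # Step B3: predicted best algorithm
        X, F = run_for(algorithms[name], X, F, block)  # Step B4: run it for a few evaluations
        used = len(X)                                  # run_for is assumed to append evaluated points
    return X, F
```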
To build a classifier, we need to have knowledge of what features make good predic-
tors of class membership for the algorithms we are considering. In real-world situa-
tions, we often have little knowledge about relevant features. Therefore, we have to
find a set of features that separates classes as cleanly as possible.
It is clear that the selected features must have some relationship with the perfor-
mance of the algorithms. Usually, published comparisons of different multiobjective
optimization algorithms report only numerical experiments and a discussion of algorithm
performance, without deeper analysis of why some algorithms work well on certain
problems, why others struggle, and which characteristics make them succeed or fail.
Intuitively, the metrics characterizing the observed Pareto set and the progress
made in searching for non-dominated solutions should be important. There exist
many metrics used to assess the quality of the obtained solution set in multiobjective
optimization that can be categorized into cardinality (or capacity), convergence, di-
versity and hybrid measures [12]. Most of these metrics were developed to compare
an obtained solution set with the true Pareto optimal set, and include generational
distance [5], inverted generational distance [35], ε-indicator [45] and hypervolume
difference [38] among others. However, these are not suitable for our purpose as in
practice the true Pareto set is not known a priori. Hence, this significantly reduces our
choice.
We now describe the quality metrics we have considered.
Ratio of non-dominated solutions. This cardinality metric is the proportion of evaluated solutions that are non-dominated:

$$\mathrm{RON}(S, P) = \frac{|P|}{|S|}, \qquad (2)$$

where $|S|$ is the number of solutions in the observed solution set $S$ and $|P|$ is the
number of non-dominated solutions in the observed Pareto set $P$.
Generalized Spread metric [6]. This diversity metric indicates the distribution of
solutions in the observed Pareto set P :
$$\Delta^{*}(P, T) = \frac{\sum_{i=1}^{m} d(e_i, P) + \sum_{X \in P} |d(X, P) - \bar{d}|}{\sum_{i=1}^{m} d(e_i, P) + |P|\,\bar{d}}, \qquad (3)$$

where $(e_1, \ldots, e_m)$ are the $m$ extreme solutions in $T$, the true Pareto optimal set, and

$$\bar{d} = \frac{1}{|P|} \sum_{X \in P} d(X, P).$$
Smaller values are preferable. This metric requires knowledge of extreme val-
ues of T. When solving real-world problems, this information is not available
beforehand. Therefore, one can use some estimates of the extreme values of the
objective space.
Number of distinct choices [38]. This metric divides the objective space into a grid
of (1/µ)^m m-dimensional hypercubes (µ ∈ [0, 1]) and calculates the number of
hypercubes containing solutions; i.e., it indicates the number of distinct solutions
that exist in an observed Pareto solution set P:
$$\mathrm{NDC}_{\mu}(P) = \sum_{\ell_m=0}^{\nu-1} \cdots \sum_{\ell_2=0}^{\nu-1} \sum_{\ell_1=0}^{\nu-1} N_{\mu}(q, P), \qquad (4)$$

where $\nu = 1/\mu$, $q$ denotes the hypercube indexed by $(\ell_1, \ell_2, \ldots, \ell_m)$, and $N_{\mu}(q, P)$ equals 1 if the hypercube $q$ contains at least one solution of $P$ and 0 otherwise.

Cluster metric [38]. This metric relates the size of the observed Pareto set $P$ to the number of distinct choices:

$$\mathrm{CL}_{\mu}(P) = \frac{|P|}{\mathrm{NDC}_{\mu}(P)}. \qquad (5)$$

In the ideal case, where every non-dominated solution obtained is distinct, the value of $\mathrm{CL}_{\mu}(P)$ equals 1. The higher the value of $\mathrm{CL}_{\mu}(P)$, the more clustered the non-dominated solution set $P$, and hence the less preferred it is. In our opinion, the term ‘cluster’ is misleading; therefore, we refer to this metric as ‘dispersion’.
Correct classification. This is the percentage of correctly classified non-dominated
and dominated solutions obtained by a support vector machine (SVM). This met-
ric was selected because some of the considered algorithms use an SVM. Thus,
the quality of an SVM at a given point in the search is a useful descriptor of the
current situation and how an algorithm relying on SVM modelling is likely to
perform.
Correct classification of non-dominated class. This is the percentage of correctly
classified non-dominated solutions. As the non-dominated class is usually (sig-
nificantly) smaller than the dominated one, the total correct classification rate may still be
high even if all examples from the non-dominated class are misclassified.
Correct classification of dominated class. This is the percentage of dominated so-
lutions correctly classified by an SVM.
Hypervolume metric [44]. This metric has attracted a lot of interest in recent years
as it describes both the convergence towards the Pareto optimal set and the dis-
tribution along it. Basically it calculates the volume covered by non-dominated
solutions (see Figure 3). Mathematically, for each solution i ∈ P , a hypercube vi
is constructed with a reference point W and the solution i as the diagonal corners
of the hypercube. The reference point can simply be obtained by composing a
vector of the worst objective function values. Then, the union of all the hypercubes is formed and its volume gives the HV value.
One of the main reasons for the popularity of HV is that it not only reflects domi-
nance, but also promotes diverse sets. Moreover, it is the only indicator known to
be strictly monotonic with respect to Pareto dominance, thereby guaranteeing
that the Pareto optimal set achieves the maximum hypervolume possible, while
any worse set will be assigned a worse indicator value [1].
Despite the attractive features of HV, it has a few major issues. First, it is
computationally intensive, especially for high dimensional problems. Second, the
metric varies with the choice of the reference point [42]. Finally, if the scales of
the objective functions are very different, it can be biased in favour of objectives
with a larger scale. To eliminate the bias of different scales, it is suggested [4] to
calculate the HV metric using normalized objective function values.
We use the HV metric for two purposes: first, as one of the features to char-
acterize the current situation; and second, to assess algorithm performance or
superiority to derive class membership.
All the descriptive metrics that depend on the scale of the objective functions
should be calculated using normalized (scaled) objective function values in order to
eliminate any bias in the metric values. Therefore, we estimated the extreme values
of the objective space and used this information for normalization.
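As an illustration of how some of these descriptive metrics can be computed on normalized objective values, a sketch is given below covering RON (Eq. 2), the dispersion metric (Eq. 5) and a simple hypervolume for two objectives. This is our own code; the grid size µ and the reference point are arbitrary illustrative choices.

```python
import numpy as np

def nondominated_mask(F):
    """True for rows of the objective matrix F that no other row dominates (minimization)."""
    return np.array([not np.any(np.all(F <= f, axis=1) & np.any(F < f, axis=1)) for f in F])

def normalize(F, ideal, nadir):
    """Scale objectives to [0, 1] using estimated extreme values of the objective space."""
    return (F - ideal) / (nadir - ideal)

def ron(F):
    """Ratio of non-dominated solutions, Eq. (2)."""
    return nondominated_mask(F).mean()

def dispersion(F, mu=0.1):
    """Cluster/'dispersion' metric, Eq. (5): |P| / NDC_mu(P), for objectives in [0, 1]."""
    P = F[nondominated_mask(F)]
    cells = np.unique(np.floor(np.clip(P, 0, 1 - 1e-12) / mu).astype(int), axis=0)
    return len(P) / len(cells)

def hypervolume_2d(F, ref=(1.0, 1.0)):
    """Volume dominated by the non-dominated set for two minimized objectives,
    assuming every point is no worse than the reference point `ref`."""
    P = F[nondominated_mask(F)]
    P = P[np.argsort(P[:, 0])]              # sort by the first objective
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in P:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv
```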
4 Experimental Analysis
To test the proposed approach, we switch among three algorithms: ParEGO, EPIC,
and Nelder-Mead (NM). The performance of our dynamic switching algorithm was
compared with ParEGO, EPIC and EPIC-NM algorithms running separately. They
are briefly discussed below.
ParEGO This method employs a Gaussian process (GP) model to predict objective
function values. It converts the multiobjective optimization problem into a single ob-
jective problem using the augmented Tchebycheff function:
$$f_{\lambda}(x) = \max_{j=1,\ldots,m} \bigl( \lambda_j f_j(x) \bigr) + \rho \sum_{j=1}^{m} \lambda_j f_j(x), \qquad (7)$$
where ρ > 0 is a small positive number and λ is a weight vector. At each itera-
tion of the algorithm, a different weight vector is drawn uniformly at random from
the set of evenly distributed vectors allowing the model to gradually build up an ap-
proximation to the true Pareto set. Before scalarization, the objective functions are
normalized with respect to the known (or estimated) limits of the objective space to
the range [0, 1]. At each iteration, the method uses a genetic algorithm to search for
the solution that maximizes the expected improvement criterion with respect to a
surrogate model. After evaluation of the selected solution on the real expensive func-
tion, ParEGO updates the GP surrogate model of the landscape and repeats the same
steps.
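The scalarization in Eq. (7) and the random choice of a weight vector from an evenly spaced set are straightforward to write down. The sketch below assumes the objectives are already normalized to [0, 1]; the value of ρ and the grid resolution s are illustrative choices, not necessarily those used in [16].

```python
import numpy as np

def augmented_tchebycheff(F_norm, lam, rho=0.05):
    """Eq. (7), applied row-wise: max_j(lambda_j f_j) + rho * sum_j(lambda_j f_j)."""
    weighted = np.asarray(F_norm) * lam
    return weighted.max(axis=1) + rho * weighted.sum(axis=1)

def random_weight_vector(m, s=10, rng=None):
    """Draw a weight vector uniformly from the evenly spaced set
    {lambda >= 0 : sum_j lambda_j = 1, each lambda_j a multiple of 1/s}."""
    rng = np.random.default_rng(rng)
    cuts = np.sort(rng.choice(np.arange(1, s + m), size=m - 1, replace=False))
    parts = np.diff(np.concatenate(([0], cuts, [s + m]))) - 1   # stars-and-bars composition
    return parts / s
```

With a weight vector drawn in this way, `augmented_tchebycheff(F_norm, random_weight_vector(m))` gives the scalarized values of the evaluated points, which is the single-objective quantity the surrogate model is fitted to at each iteration.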
The main disadvantage of employing a GP is that model construction can be a
very time-consuming process [13], where the time increases with the number of eval-
uated vectors used to model the GP. To overcome this issue, when the iteration num-
ber is greater than or equal to 25, ParEGO uses a subset of the evaluated vectors to build
the GP model, thus attempting to balance model accuracy and computation time.
Moreover, using a GP becomes increasingly problematic in high dimensional spaces
[8], so these methods do not scale well as the dimension of the problem increases.
EPIC The EPIC algorithm approximates the Pareto optimal set with a limited num-
ber of objective function evaluations. Its main idea is to learn about the evaluated
non-dominated and dominated vectors in the decision space and to predict which un-
evaluated vectors are likely to be non-dominated, thus gradually building an approx-
imation of the Pareto optimal set by evaluating the most promising decision vectors.
A discussion of how to select vectors for evaluation can be found in [32].
A major advantage of this method is that it does not use any statistical model
of the objective function, such as GP, and so it involves more modest computational
requirements, and scales easily to handle high dimensional spaces. Moreover, it is
simple to implement, has no limitations on high dimensional problems, and multiple
decision vectors can be selected at each iteration [32]. However, its weakness is that
it does not generate new decision vectors but rather selects a decision vector from a
given set representing the decision space; the quality of this set has an impact on the method’s
performance. To overcome this issue, the EPIC method can be modified by introduc-
ing a mechanism for generating new decision vectors in the most promising areas of
the decision space.
All algorithms were implemented in Matlab and their parameters were set to de-
fault values. In particular, our ParEGO implementation was based on the C code by
Knowles, which can be downloaded from www.cs.bham.ac.uk/~jdk/parego/;
this implementation corresponds to the algorithm described in [16]. The default val-
ues for the ParEGO implementation were used, namely (i) population size equal to
20, (ii) number of restarts when optimising the likelihood function is equal to 30, and
(iii) crossover is equal to 0.2. The implementation of EPIC is described in [32]. In
both EPIC and EPIC-NM, we used an SVM with a radial basis function kernel; the SVM
kernel parameters were obtained through cross-validation performed at each iteration.
In EPIC-NM, an initial simplex was composed of the vertices having the best scalar-
ized problem values. A local search was called after EPIC could not make progress
(i.e., no change occurred in HV metric values for the last four iterations), and run for
five evaluations. All algorithms started with the same initial set consisting of 11n − 1
decision vectors, where n is the dimension of the decision space, as suggested in
[14]. The Latin hypercube technique was used to sample the decision space. In addi-
tion, for EPIC and EPIC-NM, we sampled a design space representation consisting
of 500 vectors although the objective function values for these points were not eval-
uated unless selected by an algorithm. The maximum number of evaluations was
restricted to 200 including the initial sampling. The algorithms were run 100 times
with different initial sets (as their performance is influenced by the initial set), and the
average values of the HV metric were calculated. The performance of the algorithms
was measured at every iteration to assess the progress obtained after each objective
function evaluation.
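For reference, the initial design of 11n − 1 points can be generated with any Latin hypercube routine; below is a sketch using SciPy's quasi-Monte Carlo module (our choice of library; the paper does not specify the implementation used).

```python
from scipy.stats import qmc

def initial_design(n_var, lower, upper, seed=None):
    """Latin hypercube sample of 11*n_var - 1 decision vectors scaled to the box bounds."""
    sampler = qmc.LatinHypercube(d=n_var, seed=seed)
    unit = sampler.random(11 * n_var - 1)      # points in the unit hypercube
    return qmc.scale(unit, lower, upper)       # map to [lower, upper] in each dimension
```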
Our training set consisted of more than 12000 instances (snapshots in time) from
solving the following four benchmark problems: ZDT3 [43], OKA2 [27], Kursawe
[19] and Viennet [36]. These presented different challenges for approximating the
true Pareto optimal set.
ZDT3. This problem has two objective functions and three decision variables. The
Pareto optimal set comprises several discontinuous convex parts in the objective
space (its objective functions are given in the sketch after this list).
Kursawe. This problem has two objective functions and a scalable number of deci-
sion variables. In our experiment, three decision variables were used. Its Pareto
optimal set is disconnected and symmetric in the decision space, and disconnected
and concave in the objective space.
OKA2. This problem has two objective functions and three decision variables. Its
true Pareto optimal set is a spiral shaped curve in the objective space, and the
density of the Pareto optimal solutions in the objective space is low.
Viennet. This problem consists of three objective functions and two decision vari-
ables. Its true Pareto optimal set is convex in the objective space.
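As a concrete example of the benchmark problems listed above, the standard ZDT3 objective functions [43] are shown below (both objectives are minimized over x ∈ [0, 1]^n; n = 3 in our experiments).

```python
import numpy as np

def zdt3(x):
    """ZDT3 test problem: two objectives, x in [0, 1]^n (n = 3 in the experiments)."""
    x = np.asarray(x, dtype=float)
    f1 = x[0]
    g = 1.0 + 9.0 * x[1:].sum() / (len(x) - 1)
    h = 1.0 - np.sqrt(f1 / g) - (f1 / g) * np.sin(10.0 * np.pi * f1)
    return np.array([f1, g * h])
```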
There are many classification methods available including linear classifiers, support
vector machines, decision trees and neural networks. To model our algorithm perfor-
mance we have used a random forest [3], which is an ensemble of randomly trained
decision trees. Algorithm performance was assessed with the HV metric calculated
using normalized objective function values. Each instance was associated with the
name of the algorithm that had the largest value of the HV metric. The distribution
of instances among the three classes was as follows: 4599 instances in the ParEGO
class, 5890 in the EPIC class and 2930 in the NM class. Hence, the largest class
consists of instances where EPIC was best, while the NM class was the smallest.
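Fitting the performance model from these labelled instances is then routine; a sketch with scikit-learn's random forest is given below (our illustration; the features are the descriptive metrics described earlier and the labels are the algorithm names).

```python
from sklearn.ensemble import RandomForestClassifier

def fit_performance_model(X_train, y_train):
    """X_train: one row of descriptive metrics (HV, RON, spread, NDC, dispersion,
    SVM classification rates, ...) per snapshot; y_train: the name of the algorithm
    with the largest HV for that snapshot. Returns the fitted random forest."""
    return RandomForestClassifier(random_state=0).fit(X_train, y_train)
```

At search time, `fit_performance_model(X_train, y_train).predict([metrics])[0]` returns the suggested algorithm name for the current snapshot, which is what Step B3 uses.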
Fig. 4 Percentage distribution of HV values when each algorithm was the best
Fig. 5 Percentage distribution of RoN values when each algorithm was the best
Fig. 6 Percentage distribution of NDC values when each algorithm was the best
A number of computational experiments were carried out to test the ability of the pro-
posed approach to approximate the Pareto optimal set with a limited number of eval-
uations. The performance of our proposed dynamic switching algorithm is demon-
strated in Figure 7. Here, it dynamically switches among three algorithms at every
fifth evaluation, using model predictions to decide which algorithm to run; these
switching moments are marked by dots. Its performance is also compared with the perfor-
mance of the three algorithms run separately.
Fig. 7 HV values obtained by the proposed dynamic switching approach, with the evaluations at which ParEGO, EPIC or NM was called marked by dots, compared with EPIC, EPIC-NM and ParEGO run separately, plotted against the number of evaluations
The comparison, based on the average HV metric over 100 runs and calculated
using both normalized and original objective values for the ZDT3 and Kursawe problems, is presented in Fig-
ures 8–13. The figures depict the average HV measured after the initial sampling
(i.e., starting from the (11n)th function evaluation). The initial sampling does not pro-
vide relevant information for algorithm comparisons because, for each of the 100
runs, all the algorithms started from the same initial sample. Results simi-
lar to those reported in Figures 8–13 were obtained for OKA2 and Viennet problems.
They also show that the proposed approach is competitive. It can be noted that for the
ZDT3 problem, algorithm superiority depends on how the HV metric is calculated.
For example, Figure 10 demonstrates that the proposed approach is the most efficient
with respect to the HV metric calculated using the original scale while Figure 8 does
not provide a clear winner.
This raises the question of which metric should be used to judge the algorithm
performance. If the objectives have different scales and we aim to find a uniformly
Fig. 8 Average normalized HV on the ZDT3 problem over 100 runs for EPIC, EPIC-NM, ParEGO and the proposed approach, plotted against the number of iterations
Fig. 9 Average normalized HV on the ZDT3 problem, shown in separate panels for each algorithm
Fig. 10 Average HV on the ZDT3 problem calculated using original (unnormalized) objective values, for EPIC, EPIC-NM, ParEGO and the proposed approach
Fig. 11 Average normalized HV on the Kursawe problem over 100 runs for EPIC, EPIC-NM, ParEGO and the proposed approach, plotted against the number of iterations
Fig. 12 Average normalized HV on the Kursawe problem, shown in separate panels for each algorithm (including ParEGO and the proposed approach), plotted against the number of evaluations
Fig. 13 Average HV on the Kursawe problem calculated using original (unnormalized) objective values, for EPIC, EPIC-NM, ParEGO and the proposed approach
5.1 Conclusions
References
1. Bader, J., Zitzler, E.: HypE: An algorithm for fast hypervolume-based many-objective optimization.
Evolutionary Computation 19(1), 45–76 (2011)
2. Borrett, J.E., Tsang, E.P.: Adaptive constraint satisfaction: the quickest first principle. In: Computa-
tional Intelligence, pp. 203–230. Springer (2009)
3. Breiman, L.: Random forests. Machine learning 45(1), 5–32 (2001)
4. Deb, K.: Multi-objective optimization using evolutionary algorithms, vol. 16. John Wiley & Sons
(2001)
5. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic algorithm:
NSGA-II. Evolutionary Computation, IEEE Transactions on 6(2), 182–197 (2002)
6. Durillo, J.J., Nebro, A.J.: jMetal: A Java framework for multi-objective optimization. Advances in
Engineering Software 42(10), 760–771 (2011)
7. Feng, Z., Zhang, Q., Zhang, Q., Tang, Q., Yang, T., Ma, Y.: A multiobjective optimization based
framework to balance the global exploration and local exploitation in expensive optimization. Journal
of Global Optimization pp. 1–18 (2014)
8. Forrester, A.I., Keane, A.J.: Recent advances in surrogate-based optimization. Progress in Aerospace
Sciences 45(1–3), 50–79 (2009)
9. Gao, F., Han, L.: Implementing the Nelder-Mead simplex algorithm with adaptive parameters. Com-
putational Optimization and Applications 51(1), 259–277 (2012)
10. Garrett, D., Dasgupta, D.: Multiobjective landscape analysis and the generalized assignment problem.
In: Learning and Intelligent Optimization, pp. 110–124. Springer (2008)
11. Han, L., Neumann, M.: Effect of dimensionality on the Nelder-Mead simplex method. Optimization
Methods and Software 21(1), 1–16 (2006)
12. Jiang, S., Ong, Y.S., Zhang, J., Feng, L.: Consistencies and contradictions of performance metrics in
multiobjective optimization. Cybernetics, IEEE Transactions on 44(12), 2391–2404 (2014)
13. Jin, R., Chen, W., Simpson, T.: Comparative studies of metamodelling techniques under multiple
modelling criteria. Structural and Multidisciplinary Optimization 23(1), 1–13 (2001)
14. Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box func-
tions. Journal of Global Optimization 13(4), 455–492 (1998)
15. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey. Journal of Artificial
Intelligence Research 4, 237–285 (1996)
16. Knowles, J.: ParEGO: A hybrid algorithm with on-line landscape approximation for expensive mul-
tiobjective optimization problems. IEEE Transactions on Evolutionary Computation 10(1), 50–66
(2006)
17. Koduru, P., Dong, Z., Das, S., Welch, S.M., Roe, J.L., Charbit, E.: A multiobjective evolutionary-
simplex hybrid approach for the optimization of differential equation models of gene networks. Evo-
lutionary Computation, IEEE Transactions on 12(5), 572–590 (2008)
18. Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search: New perspectives on some
classical and modern methods. SIAM review 45(3), 385–482 (2003)
19. Kursawe, F.: A variant of evolution strategies for vector optimization. In: H.P. Schwefel, R. Männer
(eds.) Parallel Problem Solving from Nature, vol. 496, pp. 193–197. Springer Berlin Heidelberg
(1991)
20. Lagarias, J.C., Reeds, J.A., Wright, M.H., Wright, P.E.: Convergence properties of the Nelder-Mead
simplex method in low dimensions. SIAM Journal on optimization 9(1), 112–147 (1998)
21. Lagoudakis, M.G., Littman, M.L.: Algorithm selection using reinforcement learning. In: ICML, pp.
511–518. Citeseer (2000)
22. Luersen, M.A., Le Riche, R.: Globalized Nelder-Mead method for engineering optimization. Com-
puters & structures 82(23), 2251–2260 (2004)
23. Marler, R.T., Arora, J.S.: Survey of multi-objective optimization methods for engineering. Structural
and multidisciplinary optimization 26(6), 369–395 (2004)
20 Ingrida Steponavičė et al.
24. Miettinen, K.: Nonlinear multiobjective optimization, vol. 12. Springer Science & Business Media
(1999)
25. Mockus, J.: Bayesian Approach to Global Optimization. Kluwer Academic Publishers, Dordrecht
(1989)
26. Nelder, J.A., Mead, R.: A simplex method for function minimization. The computer journal 7(4),
308–313 (1965)
27. Okabe, T., Jin, Y., Olhofer, M., Sendhoff, B.: On test functions for evolutionary multi-objective optimization.
In: X. Yao, E. Burke, J. Lozano, J. Smith, J. Merelo-Guervós, J. Bullinaria, J. Rowe, P. Tiňo, A. Kabán,
H.P. Schwefel (eds.) Parallel Problem Solving from Nature – PPSN VIII, vol. 3242, pp. 792–802.
Springer Berlin Heidelberg (2004)
28. Pham, N., Wilamowski, B.M.: Improved Nelder Mead's simplex method and applications. Journal of
Computing 3(3), 55–63 (2011)
29. Ponweiser, W., Wagner, T., Biermann, D., Vincze, M.: Multiobjective optimization on a limited budget
of evaluations using model-assisted S-metric selection. In: G. Rudolph, T. Jansen, N. Beume, S. Lu-
cas, C. Poloni (eds.) Parallel Problem Solving from Nature – PPSN X, Lecture Notes in Computer
Science, vol. 5199, pp. 784–794. Springer Berlin Heidelberg (2008)
30. Rice, J.R.: The algorithm selection problem (1975)
31. Santana-Quintero, L., Montaño, A., Coello, C.C.: A review of techniques for handling expensive func-
tions in evolutionary multi-objective optimization. In: Y. Tenne, C.K. Goh (eds.) Computational In-
telligence in Expensive Optimization Problems, vol. 2, pp. 29–59. Springer Berlin Heidelberg (2010)
32. Steponavičė, I., Hyndman, R.J., Smith-Miles, K., Villanova, L.: Efficient identification of the Pareto
optimal set. In: Learning and Intelligent Optimization, pp. 341–352. Springer International Publishing
(2014)
33. Torczon, V.J.: Multi-directional search: a direct search algorithm for parallel machines. Ph.D. thesis,
Citeseer (1989)
34. Törn, A., Žilinskas, A.: Global Optimization, Lecture Notes in Computer Science, vol. 350 (1989)
35. Van Veldhuizen, D.A., Lamont, G.B.: Multiobjective evolutionary algorithm test suites. In: Proceed-
ings of the 1999 ACM symposium on Applied computing, pp. 351–357. ACM (1999)
36. Viennet, R., Fonteix, C., Marc, I.: New multicriteria optimization method based on the use of a diploid
genetic algorithm: Example of an industrial problem. In: Selected Papers from the European confer-
ence on Artificial Evolution, pp. 120–127. Springer-Verlag, London, UK, (1996)
37. Wagner, T.: Planning and Multi-objective Optimization of Manufacturing Processes by Means of
Empirical Surrogate Models. Vulkan (2013)
38. Wu, J., Azarm, S.: Metrics for quality assessment of a multiobjective design optimization solution set.
Journal of Mechanical Design 123(1), 18–25 (2001)
39. Zahara, E., Kao, Y.T.: Hybrid Nelder-Mead simplex search and particle swarm optimization for con-
strained engineering design problems. Expert Systems with Applications 36(2), 3880–3886 (2009)
40. Zapotecas-Martínez, S., Coello, C.A.C.: MONSS: A multi-objective nonlinear simplex search approach.
Engineering Optimization (ahead-of-print), 1–23 (2015)
41. Zhang, Q., Liu, W., Tsang, E., Virginas, B.: Expensive multiobjective optimization by MOEA/D with
Gaussian process model. Evolutionary Computation, IEEE Transactions on 14(3), 456–474 (2010)
42. Zitzler, E., Brockhoff, D., Thiele, L.: The hypervolume indicator revisited: On the design of Pareto-
compliant indicators via weighted integration. In: Evolutionary Multi-Criterion Optimization, pp. 862–
876. Springer (2007)
43. Zitzler, E., Deb, K., Thiele, L.: Comparison of multiobjective evolutionary algorithms: Empirical
results. Evolutionary Computation 8(2), 173–195 (2000)
44. Zitzler, E., Thiele, L.: Multiobjective optimization using evolutionary algorithms – a comparative case
study. In: Parallel Problem Solving from Nature - PPSN-V, pp. 292–301. Springer (1998)
45. Zitzler, E., Thiele, L., Laumanns, M., Fonseca, C.M., Da Fonseca, V.G.: Performance assessment of
multiobjective optimizers: an analysis and review. Evolutionary Computation, IEEE Transactions on
7(2), 117–132 (2003)