Computational Optimization, Methods and Algorithms
Studies in Computational Intelligence, Volume 356
Editor-in-Chief
Prof. Janusz Kacprzyk
Systems Research Institute
Polish Academy of Sciences
ul. Newelska 6
01-447 Warsaw
Poland
E-mail: [email protected]
Slawomir Koziel and Xin-She Yang (Eds.)
Computational Optimization,
Methods and Algorithms
Dr. Slawomir Koziel
Reykjavik University
School of Science and Engineering
Engineering Optimization & Modeling Center
Menntavegur 1
101 Reykjavik
Iceland
E-mail: [email protected]

Dr. Xin-She Yang
Mathematics and Scientific Computing
National Physical Laboratory
Teddington TW11 0LW
UK
E-mail: [email protected]
DOI 10.1007/978-3-642-20859-1
© 2011 Springer-Verlag Berlin Heidelberg
This work is subject to copyright. All rights are reserved, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilm or in any other
way, and storage in data banks. Duplication of this publication or parts thereof is
permitted only under the provisions of the German Copyright Law of September 9,
1965, in its current version, and permission for use must always be obtained from
Springer. Violations are liable to prosecution under the German Copyright Law.
The use of general descriptive names, registered names, trademarks, etc. in this
publication does not imply, even in the absence of a specific statement, that such
names are exempt from the relevant protective laws and regulations and therefore
free for general use.
Typeset & Cover Design: Scientific Publishing Services Pvt. Ltd., Chennai, India.
Preface
problem and to search for optimal solutions, though such optimality is not always
achievable.
Contemporary engineering design is heavily based on computer simulations. This
introduces additional difficulties to optimization. The growing demand for accuracy and
the ever-increasing complexity of structures and systems result in the simulation pro-
cess being more and more time consuming. Even with an efficient optimization
algorithm, the evaluations of the objective functions are often time-consuming. In
many engineering fields, the evaluation of a single design can take as long as sev-
eral hours up to several days or even weeks. On the other hand, simulation-based
objective functions are inherently noisy, which makes the optimization process even
more difficult. Still, simulation-driven design becomes a must for a growing number
of areas, which creates a need for robust and efficient optimization methodologies
that can yield satisfactory designs even in the presence of analytically intractable
objectives and limited computational resources.
In most engineering design and industrial applications, the objective cannot be
expressed in explicit analytical form, as the dependence of the objective on de-
sign variables is complex and implicit. This black-box type of optimization often
requires a numerical, often computationally expensive, simulator such as computa-
tional fluid dynamics and finite element analysis. Furthermore, almost all optimiza-
tion algorithms are iterative, and require numerous function evaluations. Therefore,
any technique that improves the efficiency of simulators or reduces the function
evaluation count is crucially important. Surrogate-based and knowledge-based op-
timization uses certain approximations to the objective so as to reduce the cost of
objective evaluations. The approximations are often local, while the quality of ap-
proximations is evolving as the iterations proceed. Applications of optimization in
engineering and industry are diverse. The contents of this book are quite representative and cover
all major topics of computational optimization and modelling.
This book contains contributions from worldwide experts working in these excit-
ing areas, and each chapter is practically self-contained. This book strives to review
and discuss the latest developments concerning optimization and modelling with a
focus on methods and algorithms of computational optimization, and also covers
relevant applications in science, engineering and industry.
We would like to thank our editors, Drs Thomas Ditzinger and Holger Schaepe,
and staff at Springer for their help and professionalism. Last but not least, we thank
our families for their help and support.
Slawomir Koziel
Xin-She Yang
2011
List of Contributors
Editors
Slawomir Koziel
Engineering Optimization & Modeling Center, School of Science and Engineering,
Reykjavik University, Menntavegur 1, 101 Reykjavik, Iceland ([email protected])
Xin-She Yang
Mathematics and Scientific Computing, National Physical Laboratory, Teddington,
Middlesex TW11 0LW, UK ([email protected])
Contributors
Carlos A. Coello Coello
CINVESTAV-IPN, Departamento de Computación, Av. Instituto Politécnico Na-
cional No. 2508, Col. San Pedro Zacatenco, Delegación Gustavo A. Madero, México,
D.F. C.P. 07360. MEXICO ([email protected])
Kathleen R. Fowler
Clarkson University, Department of Math & Computer Science, P.O. Box 5815,
Potsdam, NY 13699-5815, USA ([email protected])
Christian A. Hochmuth
Manufacturing Coordination and Technology, Bosch Rexroth AG, 97816, Lohr am
Main, Germany ([email protected])
Ming-Fu Hsu
Department of International Business Studies, National Chi Nan University, Taiwan,
ROC ([email protected])
Ivan Jeliazkov
Department of Economics, University of California, Irvine, 3151 Social Science
Plaza, Irvine CA 92697-5100, U.S.A. ([email protected])
Jörg Lässig
Institute of Computational Science, University of Lugano, Via Giuseppe Buffi 13,
6906 Lugano, Switzerland ([email protected])
Slawomir Koziel
Engineering Optimization & Modeling Center, School of Science and Engineering,
Reykjavik University, Menntavegur 1, 101 Reykjavik, Iceland ([email protected])
Oliver Kramer
UC Berkeley, CA 94704, USA, ([email protected])
Leifur Leifsson
Engineering Optimization & Modeling Center, School of Science and Engineering,
Reykjavik University, Menntavegur 1, 101 Reykjavik, Iceland ([email protected])
Alicia Lloro
Department of Economics, University of California, Irvine, 3151 Social Science
Plaza, Irvine CA 92697-5100, U.S.A. ([email protected])
Alfredo Arias-Montaño
CINVESTAV-IPN, Departamento de Computación, Av. Instituto Politécnico Na-
cional No. 2508, Col. San Pedro Zacatenco, Delegación Gustavo A. Madero, México,
D.F. C.P. 07360. MEXICO ([email protected])
Efrén Mezura-Montes
Laboratorio Nacional de Informática Avanzada (LANIA A.C.), Rébsamen 80, Cen-
tro, Xalapa, Veracruz, 91000, MEXICO ([email protected])
Stanislav Ogurtsov
Engineering Optimization & Modeling Center, School of Science and Engineering,
Reykjavik University, Menntavegur 1, 101 Reykjavik, Iceland ([email protected])
Ping-Feng Pai
Department of Information Management, National Chi Nan University, Taiwan,
ROC ([email protected])
Stefanie Thiem
Institute of Physics, Chemnitz University of Technology, 09107 Chemnitz, Ger-
many, ([email protected])
Xin-She Yang
Mathematics and Scientific Computing, National Physical Laboratory, Teddington,
Middlesex TW11 0LW, UK ([email protected])
Table of Contents
2 Optimization Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Xin-She Yang
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Derivative-Based Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Newton’s Method and Hill-Climbing . . . . . . . . . . . . . . . . . . . 14
2.2.2 Conjugate Gradient Method . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Derivative-Free Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.1 Pattern Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.2 Trust-Region Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.4 Metaheuristic Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.1 Simulated Annealing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.4.2 Genetic Algorithms and Differential Evolution . . . . . . . . . . 19
2.4.3 Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.4.4 Harmony Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.4.5 Firefly Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.4.6 Cuckoo Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5 A Unified Approach to Metaheuristics . . . . . . . . . . . . . . . . . . . . . . . . 26
2.5.1 Characteristics of Metaheuristics . . . . . . . . . . . . . . . . . . . . . . 26
3 Surrogate-Based Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Slawomir Koziel, David Echeverrı́a Ciaurri, Leifur Leifsson
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.2 Surrogate-Based Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.3 Surrogate Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.1 Design of Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3.2 Surrogate Modeling Techniques . . . . . . . . . . . . . . . . . . . . . . . 41
3.3.3 Model Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.3.4 Surrogate Correction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4 Surrogate-Based Optimization Techniques . . . . . . . . . . . . . . . . . . . . . 49
3.4.1 Approximation Model Management Optimization . . . . . . . . 50
3.4.2 Space Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.4.3 Manifold Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.4.4 Surrogate Management Framework . . . . . . . . . . . . . . . . . . . . 53
3.4.5 Exploitation versus Exploration . . . . . . . . . . . . . . . . . . . . . . . 55
3.5 Final Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4 Derivative-Free Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Oliver Kramer, David Echeverrı́a Ciaurri, Slawomir Koziel
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.2 Derivative-Free Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.3 Local Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3.1 Pattern Search Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.3.2 Derivative-Free Optimization with Interpolation and
Approximation Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.4 Global Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4.1 Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
4.4.2 Estimation of Distribution Algorithms . . . . . . . . . . . . . . . . . . 72
4.4.3 Particle Swarm Optimization . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.4.4 Differential Evolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.5 Guidelines for Generally Constrained Optimization . . . . . . . . . . . . . 74
4.5.1 Penalty Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
4.5.2 Augmented Lagrangian Method . . . . . . . . . . . . . . . . . . . . . . . 75
4.5.3 Filter Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
4.5.4 Other Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Chapter 1
Computational Optimization: An Overview

1.1 Introduction
Optimization is everywhere, from airline scheduling to finance and from the Internet
routing to engineering design. Optimization is an important paradigm itself with a
wide range of applications. In almost all applications in engineering and industry,
we are always trying to optimize something – whether to minimize the cost and
energy consumption, or to maximize the profit, output, performance and efficiency.
In reality, resources, time and money are always limited; consequently, optimization
is far more important in practice [1, 7, 27, 29]. The optimal use of available resources
of any sort requires a paradigm shift in scientific thinking; this is because most real-
world applications have far more complicated factors and parameters affecting how
the system behaves. The integrated components of such an optimization process are
the computational modelling and search algorithms.
Xin-She Yang
Mathematics and Scientific Computing,
National Physical Laboratory, Teddington, Middlesex TW11 0LW, UK
e-mail: [email protected]
Slawomir Koziel
Engineering Optimization & Modeling Center,
School of Science and Engineering, Reykjavik University,
Menntavegur 1, 101 Reykjavik, Iceland
e-mail: [email protected]
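Written out in the standard nonlinear programming form (with the same symbols fi, hj and gk used in the paragraphs below), the general problem under consideration reads

$$
\begin{aligned}
\underset{\mathbf{x} \in \mathbb{R}^n}{\text{minimize}} \quad & f_i(\mathbf{x}), \qquad i = 1, 2, \dots, M,\\
\text{subject to} \quad & h_j(\mathbf{x}) = 0, \qquad j = 1, 2, \dots, J,\\
& g_k(\mathbf{x}) \le 0, \qquad k = 1, 2, \dots, K.
\end{aligned}
$$

Here the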
functions fi are called objective or cost functions, and when M > 1, the optimization
is multiobjective or multicriteria [21]. It is possible to combine different objectives
into a single objective, and we will focus on single-objective optimization problems
in most parts of this book. It is worth pointing out that here we write the problem
as a minimization problem; it can also be written as a maximization problem by simply
replacing fi (x) by − fi (x).
In a special case when K = 0, we have only equality constraints, and the opti-
mization becomes an equality-constrained problem. As an equality h(x) = 0 can be
written as two inequalities: h(x) ≤ 0 and −h(x) ≤ 0, some formulations in the opti-
mization literature use constraints with inequalities only. However, in this book, we
will explicitly write out equality constraints in most cases.
When all functions are nonlinear, we are dealing with nonlinear constrained prob-
lems. In some special cases when fi , h j , gk are all linear, the problem becomes lin-
ear, and we can use the widely used linear programming techniques such as the simplex
method. When some design variables can only take discrete values (often integers),
while other variables are real continuous, the problem is of mixed type, which is
often difficult to solve, especially for large-scale optimization problems.
A very special class of optimization is the convex optimization [2], which has
guaranteed global optimality. Any optimal solution is also the global optimum, and
most importantly, there are efficient algorithms of polynomial time to solve such
problems [3]. These efficient algorithms such the interior-point methods [12] are
widely used and have been implemented in many software packages.
On the other hand, if some of the functions such as fi involve integrals, while others such
as h j are differential equations, the problem becomes an optimal control problem,
and special techniques are required to achieve optimality.
For most applications in this book, we will mainly deal with nonlinear con-
strained global optimization problems with a single objective. In one chapter by
Coello Coello, multiobjective optimization will be discussed in detail. Optimal con-
trol and other cases will briefly be discussed in the relevant context in this book.
(Figure: the integrated components of an optimization process – the model, the optimizer, and the simulator.)
Another important step is to use the right algorithm or optimizer so that an op-
timal combination of design variables can be found. An important capability
of optimization is to generate or search for new solutions from a known solution
(often a random guess or a known solution from experience), which will lead to
the convergence of the search process. The ultimate aim of this search process is
to find solutions which converge at the global optimum, though this is usually very
difficult.
In terms of computing time and cost, the most important step is the use of an effi-
cient evaluator or simulator. In most applications, once a correct model representa-
tion is made and implemented, an optimization process often involves the evaluation
of the objective function (such as the aerodynamic efficiency of an airfoil) for many
configurations, often thousands or even millions. Such evaluations often involve
the use of extensive computational tools such as a computational fluid dynamics
simulator or a finite element solver. This is the step that is most time-consuming,
often taking 50% to 90% of the overall computing time.
1.4 Optimizer
1.4.1 Optimization Algorithms
An efficient optimizer is very important to ensure the optimal solutions are reach-
able. The essence of an optimizer is a search or optimization algorithm implemented
correctly so as to carry out the desired search (though not necessarily efficiently).
It can be integrated and linked with other modelling components. There are many
optimization algorithms in the literature and no single algorithm is suitable for all
problems, as dictated by the No Free Lunch Theorems [24].
search all fit into this category. It is worth pointing out that we should not confuse
the use of memory with the simple record of the current state and the elitism or se-
lection of the fittest. On the other hand, some algorithms indeed use memory/history
explicitly. In the Tabu search [9], tabu lists are used to record the move history, and
recently visited solutions will not be tried again in the near future; this encourages
the exploration of completely different new solutions, which may save computing effort
significantly.
Another type of algorithm is the so-called mixed-type or hybrid, which uses
some combination of deterministic components and randomness, or combines one algorithm
with another so as to design more efficient algorithms. For example, genetic al-
gorithms can be hybridized with many algorithms such as particle swarm optimiza-
tion; more specifically, hybridization may involve the use of genetic operators to modify some
components of another algorithm.
From the mobility point of view, algorithms can be classified as local or global.
Local search algorithms typically converge towards a local optimum, not necessar-
ily (often not) the global optimum, and such algorithms are often deterministic and
have no ability of escaping local optima. Simple hill-climbing is an example. On
the other hand, we always try to find the global optimum for a given problem, and
if this global optimality is robust, it is often the best, though it is not always possi-
ble to find such global optimality. For global optimization, local search algorithms
are not suitable. We have to use a global search algorithm. Modern metaheuris-
tic algorithms in most cases are intended for global optimization, though they are not
always successful or efficient. A simple strategy such as hill-climbing with random restart
may change a local search algorithm into a global search. In essence, randomization
is an efficient component for global search algorithms. A detailed review of opti-
mization algorithms will be provided later in the chapter on optimization algorithms
by Yang.
Straightforward optimization of a given objective function is not always prac-
tical. Particularly, if the objective function comes from a computer simulation,
it may be computationally expensive, noisy or non-differentiable. In such cases,
so-called surrogate-based optimization algorithms may be useful where the direct
optimization of the function of interest is replaced by iterative updating and re-
optimization of its model – a surrogate [5]. The surrogate model is typically con-
structed from the sampled data of the original objective function; however, it is
supposed to be cheap, smooth, easy to optimize and yet reasonably accurate so
that it can produce a good prediction of the function’s optimum. Multi-fidelity or
variable-fidelity optimization is a special case of the surrogate-based optimization
where the surrogate is constructed from the low-fidelity model (or models) of the
system of interest [15]. Using variable-fidelity optimization is particularly useful if
the reduction of the computational cost of the optimization process is of primary
importance.
Whatever the classification of an algorithm is, we have to make the right choice
to use an algorithm correctly, and sometimes a proper combination of algorithms may
achieve better results.
1.5 Simulator
To solve an optimization problem, the most computationally intensive part is prob-
ably the evaluation of the design objective to see if a proposed solution is feasible
and/or if it is optimal. Typically, we have to carry out these evaluations many times,
often thousands and even millions of times [25, 27]. Things become even more
challenging computationally, when each evaluation task takes a long time via some
black-box simulators. If this simulator is a finite element or CFD solver, the running
time of each evaluation can take from a few minutes to a few hours or even weeks.
Therefore, any approach to save computational time either by reducing the number
of evaluations or by increasing the simulator’s efficiency will save time and money.
Many design problems can be simulated by using neural networks and support
vector machines. In this case, we know certain objectives of the design, but the re-
lationship between the parameter setting and the system performance/output is not
only implicit, but also dynamically changing based on iterative learning/training.
Fuzzy systems are another example; in this case, special techniques and methods
are used, which essentially forms a different subject.
In this book, we will mainly focus on the cases in which the objective can be
evaluated either using explicit formulas or using black-box numerical tools/solvers.
Some case studies of optimization using neural networks will be provided as well.
New sample points are allocated for local exploitation, for exploration of the design space, or for a mixture of both [5]. The new data is used
to update the surrogate. A detailed review of surrogate-modeling techniques and
surrogate-based optimization methods will be given by Koziel et al. later.
References
1. Arora, J.: Introduction to Optimum Design. McGraw-Hill, New York (1989)
2. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press,
Cambridge (2004)
3. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-region methods. SIAM & MPS (2000)
4. Dantzig, G.B.: Linear Programming and Extensions. Princeton University Press,
Princeton (1963)
Chapter 2
Optimization Algorithms

Xin-She Yang
2.1 Introduction
Algorithms for optimization are more diverse than the types of optimization, though
the right choice of algorithms is an important issue, as we discussed in the first chap-
ter where we have provided an overview. There are a wide range of optimization al-
gorithms, and a detailed description of each can take up the whole book of more than
several hundred pages. Therefore, in this chapter, we will introduce a few important
algorithms selected from a wide range of optimization algorithms [4, 27, 31], with a
focus on the metaheuristic algorithms developed after the 1990s. This selection does
not mean that the algorithms not described here are not popular. In fact, they may
be equally widely used. Whenever an algorithm is used in this book, we will try to
provide enough details so that readers can see how they are implemented; alterna-
tively, in some cases, enough citations and links will be provided so that interested
readers can pursue further research using these references as a good start.
Xin-She Yang
Mathematics and Scientific Computing,
National Physical Laboratory,
Teddington, Middlesex TW11 0LW, UK
e-mail: [email protected]
2.2 Derivative-Based Algorithms

2.2.1 Newton's Method and Hill-Climbing

Newton's method uses a local second-order approximation of the objective function about a known point $x = x_n$, which leads to the updating formula

$$x = x_n - H^{-1}\,\nabla f(x_n), \tag{2.2}$$

where $H^{-1}(x^{(n)})$ is the inverse of the symmetric Hessian matrix $H = \nabla^2 f(x_n)$, which is defined as

$$H(\mathbf{x}) \equiv \nabla^2 f(\mathbf{x}) \equiv \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{pmatrix}. \tag{2.3}$$
Starting from an initial guess vector x(0) , the iterative Newton’s formula for the nth
iteration becomes
$$x^{(n+1)} = x^{(n)} - H^{-1}(x^{(n)})\,\nabla f(x^{(n)}). \tag{2.4}$$

In order to speed up the convergence, we can use a smaller step size $\alpha \in (0, 1]$, and we have the modified Newton's method

$$x^{(n+1)} = x^{(n)} - \alpha\, H^{-1}(x^{(n)})\,\nabla f(x^{(n)}). \tag{2.5}$$
For gradient-based hill-climbing, we write the new solution as $x^{(n+1)} = x^{(n)} + \Delta s$, where $\Delta s$ is the increment vector. Since we are trying to find a better (higher) approximation to the objective function, it requires that $f(x^{(n)} + \Delta s) > f(x^{(n)})$; from the first-order Taylor expansion $f(x^{(n)} + \Delta s) \approx f(x^{(n)}) + (\nabla f(x^{(n)}))^T \Delta s$, this means $(\nabla f)^T \Delta s > 0$.
From vector analysis, we know that the inner product uT v of two vectors u and v is
the largest when they are parallel. Therefore, we have
Δ s = α ∇ f (x(n) ), (2.9)
where α > 0 is the step size. In the case of minimization, the direction Δ s is along
the steepest descent in the negative gradient direction.
It is worth pointing out that the choice of the step size α is very important. A very
small step size means slow movement towards the local optimum, while a large step
may overshoot and subsequently makes it move far away from the local optimum.
Therefore, the step size α = α (n) should be different at each iteration and should
be chosen so as to maximize or minimize the objective function, depending on the
context of the problem.
2.2.2 Conjugate Gradient Method

The conjugate gradient method is among the most widely used methods for solving linear systems $A\mathbf{u} = \mathbf{b}$ with a symmetric positive definite matrix $A$, which is equivalent to minimizing the quadratic function

$$f(\mathbf{u}) = \frac{1}{2}\,\mathbf{u}^T A \mathbf{u} - \mathbf{b}^T \mathbf{u} + v, \tag{2.11}$$
where $v$ is a constant and can be taken to be zero. We can easily see that
$\nabla f(\mathbf{u}) = 0$ leads to $A\mathbf{u} = \mathbf{b}$. In theory, these iterative methods are closely related to
the Krylov subspace $\mathcal{K}_n$ spanned by $A$ and $\mathbf{b}$, as defined by

$$\mathcal{K}_n = \mathrm{span}\{\mathbf{b},\, A\mathbf{b},\, A^2\mathbf{b},\, \dots,\, A^{n-1}\mathbf{b}\}, \tag{2.12}$$

where $A^0 = I$.
If we use an iterative procedure to obtain the approximate solution un to Au = b
at the nth iteration, the residual is given by
rn = b − Aun , (2.13)
The solution and the residual are updated along conjugate search directions $\mathbf{d}_n$,

$$\mathbf{u}_{n+1} = \mathbf{u}_n + \alpha_n \mathbf{d}_n, \tag{2.14}$$

$$\mathbf{r}_{n+1} = \mathbf{r}_n - \alpha_n A\mathbf{d}_n, \tag{2.15}$$

and

$$\mathbf{d}_{n+1} = \mathbf{r}_{n+1} + \beta_n \mathbf{d}_n, \tag{2.16}$$

where

$$\alpha_n = \frac{\mathbf{r}_n^T \mathbf{r}_n}{\mathbf{d}_n^T A \mathbf{d}_n}, \qquad \beta_n = \frac{\mathbf{r}_{n+1}^T \mathbf{r}_{n+1}}{\mathbf{r}_n^T \mathbf{r}_n}. \tag{2.17}$$
Iterations stop when a prescribed accuracy is reached. In the case when A is not
symmetric, we can use other algorithms such as the generalized minimal residual
(GMRES) algorithm developed by Y. Saad and M. H. Schultz in 1986.
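The following short Python sketch illustrates the conjugate gradient iteration (2.14)–(2.17); the small symmetric positive-definite test system A, b is an illustrative assumption.

```python
import numpy as np

def conjugate_gradient(A, b, u0=None, tol=1e-10, max_iter=None):
    """Solve A u = b for symmetric positive-definite A via CG."""
    n = len(b)
    u = np.zeros(n) if u0 is None else np.asarray(u0, dtype=float)
    r = b - A @ u            # residual r_n = b - A u_n  (2.13)
    d = r.copy()             # initial search direction
    max_iter = max_iter or n
    for _ in range(max_iter):
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        alpha = rr / (d @ (A @ d))       # alpha_n in (2.17)
        u = u + alpha * d                # solution update (2.14)
        r = r - alpha * (A @ d)          # residual update (2.15)
        beta = (r @ r) / rr              # beta_n in (2.17)
        d = r + beta * d                 # direction update (2.16)
    return u

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b))  # matches np.linalg.solve(A, b)
```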
2.3 Derivative-Free Algorithms

2.3.1 Pattern Search

Some derivative-free methods use only the values of the objective itself at step n. If the past information such as the steps at n − 1 and n is properly used to generate a new move at step n + 1, it may speed up the convergence. The Hooke-Jeeves pattern search method is one such method that incorporates the past history of iterations in producing a new search direction.
The Hooke-Jeeves pattern search method consists of two moves: exploratory
move and pattern move. The exploratory moves explore the local behaviour and
information of the objective function so as to identify any potential sloping valleys
if they exist. For any given step size (each coordinate direction can have a different
increment) $\Delta_i$ ($i = 1, 2, ..., d$), the exploratory move is performed from an initial starting
point along each coordinate direction by increasing or decreasing $\pm\Delta_i$. If the
new value of the objective function does not increase (for a minimization problem),
that is $f(x_i^{(n)}) \le f(x_i^{(n-1)})$, the exploratory move is considered successful. If it is
not successful, then a step is tried in the opposite direction, and the result is updated
only if it is successful. When all the $d$ coordinates have been explored, the resulting
point forms a base point $x^{(n)}$.
The pattern move intends to move the current base $x^{(n)}$ along the base line $(x^{(n)} - x^{(n-1)})$ from the previous (historical) base point to the current base point. The move
is carried out by the following formula

$$x^{(n+1)} = x^{(n)} + \left(x^{(n)} - x^{(n-1)}\right). \tag{2.18}$$
Then x(n+1) forms a new temporary base point for further new exploratory moves.
If the pattern move produces improvement (lower value of f (x)), the new base point
x(n+1) is successfully updated. If the pattern move does not lead to any improvement
or a lower value of the objective function, then the pattern move is discarded and
a new search starts from $x^{(n)}$, and the new search moves should use a smaller step
size by reducing the increments to $\Delta_i/\gamma$, where $\gamma > 1$ is the step reduction factor. Iterations
continue until the prescribed tolerance $\varepsilon$ is met.
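A simplified Python sketch of the Hooke-Jeeves procedure described above (exploratory moves, a pattern move, and step reduction by γ); the Rosenbrock test function is an illustrative assumption, and the pattern-move bookkeeping is deliberately minimal.

```python
import numpy as np

def hooke_jeeves(f, x0, step=0.5, gamma=2.0, eps=1e-6):
    """Hooke-Jeeves pattern search: exploratory moves along each
    coordinate, followed by a pattern move along the base line."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    while step > eps:
        # Exploratory move around the current base point
        x_new, f_new = x.copy(), fx
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x_new.copy()
                trial[i] += delta
                ft = f(trial)
                if ft < f_new:              # accept improving coordinate move
                    x_new, f_new = trial, ft
                    break
        if f_new < fx:
            # Pattern move along (x_new - x) from the old base point (2.18)
            pattern = x_new + (x_new - x)
            x, fx = x_new, f_new
            fp = f(pattern)
            if fp < fx:
                x, fx = pattern, fp
        else:
            step /= gamma                   # reduce increments by gamma > 1
    return x, fx

rosenbrock = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
print(hooke_jeeves(rosenbrock, [-1.0, 1.0]))
```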
2.3.2 Trust-Region Method

In the trust-region method, the objective function $f$ is approximated locally by a model function $\phi_k$, and the quality of the approximation at iteration $k$ is measured by the ratio

$$\gamma_k = \frac{f(x_k) - f(x_{k+1})}{\phi_k(x_k) - \phi_k(x_{k+1})}. \tag{2.19}$$
If this ratio is close to unity, we have a good approximation and then should move
the trust region to xk+1 . The trust-region should move and update iteratively until
the (global) optimality is found or until a fixed number of iterations is reached.
There are many other methods, and one of the most powerful and widely used is
the polynomial-time efficient algorithm, called the interior-point method [16], and
many variants have been developed since 1984.
2.4 Metaheuristic Algorithms

All the algorithms discussed above are deterministic, as they have no random
components. Thus, they usually have some disadvantages in dealing with highly
nonlinear, multimodal, global optimization problems. In fact, some randomization is
useful and necessary in algorithms, and metaheuristic algorithms are such powerful
techniques.
where kB is the Boltzmann’s constant, and T is the temperature for controlling the
annealing process. Δ E is the change of the energy level. This transition probability
is based on the Boltzmann distribution in statistical mechanics.
The simplest way to link Δ E with the change of the objective function Δ f is to
use
Δ E = γΔ f , (2.21)
where γ is a real constant. For simplicity without losing generality, we can use
$k_B = 1$ and $\gamma = 1$. Thus, the probability $p$ simply becomes

$$p = \exp(-\Delta f / T). \tag{2.22}$$
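As a concrete illustration, here is a minimal one-dimensional simulated annealing sketch in Python using the acceptance probability p = exp(−Δf/T) from (2.22); the test function, the Gaussian neighbourhood move, and the geometric cooling schedule are illustrative assumptions.

```python
import math, random

def simulated_annealing(f, x0, T0=1.0, cooling=0.95, steps=2000, sigma=0.5):
    """Minimize f by accepting worse moves with probability exp(-df/T)."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    T = T0
    for _ in range(steps):
        x_new = x + random.gauss(0.0, sigma)     # random neighbour
        f_new = f(x_new)
        df = f_new - fx
        # Boltzmann-type acceptance: always accept improvements,
        # accept deteriorations with probability p = exp(-df/T)  (2.22)
        if df <= 0 or random.random() < math.exp(-df / T):
            x, fx = x_new, f_new
            if fx < fbest:
                best, fbest = x, fx
        T *= cooling                              # geometric cooling schedule
    return best, fbest

print(simulated_annealing(lambda x: x**2 + 10 * math.sin(x), 5.0))
```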
2.4.2 Genetic Algorithms and Differential Evolution

Genetic algorithms use a population of solutions to carry out the search, which may have some advantage due to its potential parallelism.
Genetic algorithms are a class of algorithms based on the abstraction of Dar-
win's evolution of biological systems, pioneered by J. Holland and his collaborators
in the 1960s and 1970s [14]. Holland was the first to use genetic operators such
as crossover and recombination, mutation, and selection in the study of adaptive
and artificial systems. Genetic algorithms have two main advantages over traditional
algorithms: the ability to deal with complex problems, and parallelism. Whether
the objective function is stationary or transient, linear or nonlinear, continuous or
discontinuous, it can be dealt with by genetic algorithms. Multiple genes or solutions can be
dealt with simultaneously, which makes genetic algorithms suitable for parallel implementation.
Three main components or genetic operators in genetic algorithms are: crossover,
mutation, and selection of the fittest. Each solution is encoded in a string (often bi-
nary or decimal), called a chromosome. The crossover of two parent strings pro-
duces offspring (new solutions) by swapping parts or genes of the chromosomes.
Crossover has a higher probability, typically 0.8 to 0.95. On the other hand, muta-
tion is carried out by flipping some digits of a string, which generates new solutions.
This mutation probability is typically low, from 0.001 to 0.05. New solutions gen-
erated in each generation will be evaluated by their fitness which is linked to the
objective function of the optimization problem. The new solutions are selected ac-
cording to their fitness – selection of the fittest. Sometimes, in order to make sure
that the best solutions remain in the population, the best solutions are passed onto
the next generation without much change; this is called elitism.
Genetic algorithms have been applied to almost all areas of optimization, design
and applications. There are hundreds of good books and thousands of research arti-
cles. There are many variants and hybridizations with other algorithms, and interested
readers can refer to more advanced literature such as [12, 14].
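The following compact Python sketch puts the three genetic operators together — one-point crossover, bit-flip mutation, and selection of the fittest with elitism; the binary one-max fitness function and the tournament selection scheme are illustrative assumptions.

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=30, generations=100,
                      p_cross=0.9, p_mut=0.02):
    """Binary-encoded GA with one-point crossover, bit-flip mutation,
    tournament selection and single-elite preservation."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    def select(pop, fits):                      # tournament of size 2
        i, j = random.randrange(pop_size), random.randrange(pop_size)
        return pop[i] if fits[i] > fits[j] else pop[j]
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        elite = pop[fits.index(max(fits))]      # elitism: keep the best
        new_pop = [elite[:]]
        while len(new_pop) < pop_size:
            p1, p2 = select(pop, fits), select(pop, fits)
            if random.random() < p_cross:       # one-point crossover
                cut = random.randrange(1, n_bits)
                p1 = p1[:cut] + p2[cut:]
            # bit-flip mutation with small probability per bit
            child = [b ^ 1 if random.random() < p_mut else b for b in p1]
            new_pop.append(child)
        pop = new_pop
    fits = [fitness(ind) for ind in pop]
    return pop[fits.index(max(fits))]

# Toy fitness: maximize the number of ones in the bit string
print(genetic_algorithm(lambda ind: sum(ind)))
```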
Differential evolution (DE) was developed by R. Storn and K. Price in their sem-
inal papers in 1996 and 1997 [25, 26]. It is a vector-based evolutionary algorithm,
and can be considered as a further development to genetic algorithms. It is a stochas-
tic search algorithm with self-organizing tendency and does not use the information
of derivatives. Thus, it is a population-based, derivative-free method.
As in genetic algorithms, design parameters in a d-dimensional search space are
represented as vectors, and various genetic operators are operated over their bits
of strings. However, unlike genetic algorithms, differential evolution carries out
operations over each component (or each dimension of the solution). Almost every-
thing is done in terms of vectors. For example, in genetic algorithms, mutation is
carried out at one site or multiple sites of a chromosome, while in differential evolu-
tion, a difference vector of two randomly-chosen population vectors is used to per-
turb an existing vector. Such vectorized mutation can be viewed as a self-organizing
search, directed towards an optimality.
For a d-dimensional optimization problem with d parameters, a population of n
solution vectors are initially generated; we have $x_i$ where $i = 1, 2, ..., n$. For each
solution $x_i$ at any generation $t$, we use the conventional notation

$$x_i^t = \left(x_{1,i}^t,\; x_{2,i}^t,\; \dots,\; x_{d,i}^t\right). \tag{2.25}$$

The mutation scheme generates a donor vector from three distinct, randomly chosen population vectors $x_p$, $x_q$ and $x_r$:

$$v_i^{t+1} = x_p^t + F\left(x_q^t - x_r^t\right), \tag{2.26}$$
Most studies have focused on the choice of F, Cr and n as well as the modifica-
tion of (2.26). In fact, when generating mutation vectors, we can use many different
ways of formulating (2.26), and this leads to various schemes with the naming con-
vention: DE/x/y/z where x is the mutation scheme (rand or best), y is the number
of difference vectors, and z is the crossover scheme (binomial or exponential). The
basic DE/Rand/1/Bin scheme is given in (2.26). Following a similar strategy, we can
design various schemes. In fact, 10 different schemes have been formulated, and for
details, readers can refer to [23].
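A minimal Python sketch of the basic DE/rand/1/bin scheme in (2.26) with binomial crossover; the sphere test function, the bounds, and the parameter values are illustrative assumptions.

```python
import random

def differential_evolution(f, bounds, n=30, F=0.8, Cr=0.9, generations=200):
    """DE/rand/1/bin: donor v = x_p + F (x_q - x_r)  (2.26),
    followed by binomial crossover with rate Cr and greedy selection."""
    d = len(bounds)
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    fits = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(n):
            # pick three distinct indices p, q, r, all different from i
            p, q, r = random.sample([j for j in range(n) if j != i], 3)
            donor = [pop[p][k] + F * (pop[q][k] - pop[r][k]) for k in range(d)]
            # binomial crossover; jrand guarantees at least one donor component
            jrand = random.randrange(d)
            trial = [donor[k] if (random.random() < Cr or k == jrand)
                     else pop[i][k] for k in range(d)]
            ft = f(trial)
            if ft <= fits[i]:                # greedy selection
                pop[i], fits[i] = trial, ft
    best = fits.index(min(fits))
    return pop[best], fits[best]

sphere = lambda x: sum(v * v for v in x)
print(differential_evolution(sphere, [(-5, 5)] * 3))
```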
2.4.3 Particle Swarm Optimization

Particle swarm optimization (PSO), developed by Kennedy and Eberhart [15], uses a swarm of particles, each characterized by a position vector, a velocity, and a random component. Each particle is attracted toward the position of the current global best
g∗ and its own best location x∗i in history, while at the same time it has a tendency
to move randomly.
Let xi and vi be the position vector and velocity for particle i, respectively. The
new velocity vector is determined by the following formula

$$v_i^{t+1} = v_i^t + \alpha\,\boldsymbol{\varepsilon}_1 \odot [g^* - x_i^t] + \beta\,\boldsymbol{\varepsilon}_2 \odot [x_i^* - x_i^t], \tag{2.28}$$

where $\boldsymbol{\varepsilon}_1$ and $\boldsymbol{\varepsilon}_2$ are two random vectors, with each entry taking values between 0 and 1, and $\odot$ denotes the Hadamard product of two matrices, defined as the entrywise product $[u \odot v]_{ij} = u_{ij} v_{ij}$. The parameters $\alpha$ and $\beta$ are the learning parameters or acceleration constants, which can typically be taken as, say, $\alpha \approx \beta \approx 2$.
The initial locations of all particles should be distributed relatively uniformly so that
they can sample over most regions, which is especially important for multimodal
problems. The initial velocity of a particle can be taken as zero, that is, $v_i^{t=0} = 0$. The
new position can then be updated by

$$x_i^{t+1} = x_i^t + v_i^{t+1}. \tag{2.29}$$

Although $v_i$ can take any value, it is usually bounded in some range $[0, v_{\max}]$.
There are many variants which extend the standard PSO algorithm [15, 30, 31],
and the most noticeable improvement is probably to use an inertia function $\theta(t)$, so that
$v_i^t$ is replaced by $\theta(t)\, v_i^t$:

$$v_i^{t+1} = \theta\, v_i^t + \alpha\,\boldsymbol{\varepsilon}_1 \odot [g^* - x_i^t] + \beta\,\boldsymbol{\varepsilon}_2 \odot [x_i^* - x_i^t], \tag{2.30}$$
where θ takes the values between 0 and 1. In the simplest case, the inertia function
can be taken as a constant, typically θ ≈ 0.5 ∼ 0.9. This is equivalent to introducing
a virtual mass to stabilize the motion of the particles, and thus the algorithm is
expected to converge more quickly.
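A short Python sketch of PSO with the inertia-weighted velocity update (2.30) and the position update (2.29); the sphere function and the parameter settings are illustrative assumptions.

```python
import random

def pso(f, bounds, n=25, alpha=2.0, beta=2.0, theta=0.7, iters=200):
    """PSO with inertia: v <- theta*v + alpha*e1*(g*-x) + beta*e2*(x*-x),
    cf. (2.30); e1, e2 are uniform random numbers per component."""
    d = len(bounds)
    xs = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    vs = [[0.0] * d for _ in range(n)]          # zero initial velocities
    pbest = [x[:] for x in xs]                  # personal bests x_i*
    pfit = [f(x) for x in xs]
    g = pbest[pfit.index(min(pfit))][:]         # global best g*
    for _ in range(iters):
        for i in range(n):
            for k in range(d):
                e1, e2 = random.random(), random.random()
                vs[i][k] = (theta * vs[i][k]
                            + alpha * e1 * (g[k] - xs[i][k])
                            + beta * e2 * (pbest[i][k] - xs[i][k]))
                xs[i][k] += vs[i][k]            # position update (2.29)
            fx = f(xs[i])
            if fx < pfit[i]:
                pbest[i], pfit[i] = xs[i][:], fx
        g = pbest[pfit.index(min(pfit))][:]
    return g, f(g)

sphere = lambda x: sum(v * v for v in x)
print(pso(sphere, [(-10, 10)] * 2))
```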
2.4.4 Harmony Search

In harmony search [9], the use of harmony memory is important: we can assign a parameter $r_{\text{accept}} \in [0, 1]$, called the harmony memory accepting or
considering rate. If this rate is too low, only a few best harmonies are selected, and
the algorithm may converge too slowly. If this rate is extremely high (near 1), almost all the
harmonies are used in the harmony memory, then other harmonies are not explored
well, leading to potentially wrong solutions. Therefore, typically, raccept = 0.7 ∼
0.95.
To adjust the pitch slightly in the second component, we have to use a method
such that it can adjust the frequency efficiently. In theory, the pitch can be adjusted
linearly or nonlinearly, but in practice, linear adjustment is used. If $x_{\text{old}}$ is the current
solution (or pitch), then the new solution (pitch) $x_{\text{new}}$ is generated by

$$x_{\text{new}} = x_{\text{old}} + b_p\,(2\varepsilon - 1), \tag{2.31}$$
where ε is a random number drawn from a uniform distribution [0, 1]. Here b p is the
bandwidth, which controls the local range of pitch adjustment. In fact, we can see
that the pitch adjustment (2.31) is a random walk.
Pitch adjustment is similar to the mutation operator in genetic algorithms. We
can assign a pitch-adjusting rate (rpa ) to control the degree of the adjustment. If rpa
is too low, then there is rarely any change. If it is too high, then the algorithm may
not converge at all. Thus, we usually use r pa = 0.1 ∼ 0.5 in most simulations.
The third component is randomization, which is to increase the diversity of
the solutions. Although adjusting pitch has a similar role, it is limited to certain
local pitch adjustments and thus corresponds to a local search. The use of random-
ization can drive the system further to explore various regions with high solution
diversity so as to find the global optimality. HS has been applied to solve many
optimization problems including function optimization, water distribution network,
groundwater modelling, energy-saving dispatch, structural design, vehicle routing,
and others.
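A minimal Python sketch of harmony search combining the three components discussed above (memory considering, pitch adjustment, and randomization); the parameter values and the test function are illustrative assumptions.

```python
import random

def harmony_search(f, bounds, hms=20, r_accept=0.9, r_pa=0.3, bw=0.2,
                   iters=2000):
    """Harmony search: memory considering (rate r_accept), pitch
    adjustment x + bw*(2*eps - 1) with rate r_pa  (2.31), else pure
    randomization; the worst harmony in memory is replaced."""
    d = len(bounds)
    memory = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    fits = [f(x) for x in memory]
    for _ in range(iters):
        new = []
        for k, (lo, hi) in enumerate(bounds):
            if random.random() < r_accept:       # use harmony memory
                val = random.choice(memory)[k]
                if random.random() < r_pa:       # pitch adjustment (2.31)
                    val += bw * (2 * random.random() - 1)
            else:                                # randomization
                val = random.uniform(lo, hi)
            new.append(min(hi, max(lo, val)))
        fn = f(new)
        worst = fits.index(max(fits))            # replace worst harmony
        if fn < fits[worst]:
            memory[worst], fits[worst] = new, fn
    best = fits.index(min(fits))
    return memory[best], fits[best]

print(harmony_search(lambda x: sum(v * v for v in x), [(-5, 5)] * 2))
```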
2.4.5 Firefly Algorithm

In the firefly algorithm [32], the attractiveness of a firefly varies with the distance $r$ as

$$\beta = \beta_0\, e^{-\gamma r^2}, \tag{2.32}$$

where $\beta_0$ is the attractiveness at $r = 0$. Randomization can be carried out using step lengths drawn from a Lévy distribution,
which has an infinite variance with an infinite mean. Here the steps essentially form
a random walk process with a power-law step-length distribution with a heavy tail.
Some of the new solutions should be generated by a Lévy walk around the best solution
obtained so far; this will speed up the local search.
A demo version of the firefly algorithm implementation, without Lévy flights, can be
found at the Mathworks file exchange web site.1 The firefly algorithm has attracted much
attention [1, 24]. A discrete version of FA can efficiently solve NP-hard scheduling
problems [24], while a detailed analysis has demonstrated the efficiency of FA over
a wide range of test problems, including multiobjective load dispatch problems [1].
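A compact Python sketch of the firefly algorithm with the attractiveness (2.32) and a simple uniform random term (Lévy flights omitted, as in the demo version mentioned above); the test function and parameter values are illustrative assumptions.

```python
import math, random

def firefly(f, bounds, n=20, beta0=1.0, gamma=1.0, alpha=0.2, iters=100):
    """Firefly algorithm: dimmer fireflies move toward brighter ones with
    attractiveness beta = beta0 * exp(-gamma r^2)  (2.32), plus a small
    random perturbation scaled by alpha."""
    d = len(bounds)
    xs = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    light = [f(x) for x in xs]                   # lower f = brighter
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                if light[j] < light[i]:          # j is brighter: i moves to j
                    r2 = sum((xs[i][k] - xs[j][k]) ** 2 for k in range(d))
                    beta = beta0 * math.exp(-gamma * r2)
                    for k in range(d):
                        lo, hi = bounds[k]
                        xs[i][k] += (beta * (xs[j][k] - xs[i][k])
                                     + alpha * (random.random() - 0.5))
                        xs[i][k] = min(hi, max(lo, xs[i][k]))
                    light[i] = f(xs[i])
    best = light.index(min(light))
    return xs[best], light[best]

print(firefly(lambda x: sum(v * v for v in x), [(-5, 5)] * 2))
```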
2.4.6 Cuckoo Search

Cuckoo search (CS), developed by Yang and Deb [34], is inspired by the brood parasitism of some cuckoo species, which lay their eggs in the nests of host birds. If a host bird discovers that the eggs are not its own, it will either get rid of these alien eggs or simply abandon its nest and build a new nest elsewhere. Some cuckoo species such as the New World brood-parasitic Tapera have
evolved in such a way that female parasitic cuckoos are often very specialized in the
mimicry in colour and pattern of the eggs of a few chosen host species. This reduces
the probability of their eggs being abandoned and thus increases their reproductivity.
In addition, the timing of egg-laying of some species is also amazing. Parasitic
cuckoos often choose a nest where the host bird just laid its own eggs. In general, the
cuckoo eggs hatch slightly earlier than their host eggs. Once the first cuckoo chick
is hatched, the first instinct action it will take is to evict the host eggs by blindly
propelling the eggs out of the nest, which increases the cuckoo chick’s share of food
provided by its host bird. Studies also show that a cuckoo chick can also mimic the
call of host chicks to gain access to more feeding opportunity.
For simplicity in describing the Cuckoo Search, we now use the following three
idealized rules:
• Each cuckoo lays one egg at a time, and dumps it in a randomly chosen nest;
• The best nests with high-quality eggs will be carried over to the next generations;
• The number of available host nests is fixed, and the egg laid by a cuckoo is
discovered by the host bird with a probability pa ∈ [0, 1]. In this case, the host bird
can either get rid of the egg, or simply abandon the nest and build a completely
new nest.
As a further approximation, this last assumption can be implemented by replacing a fraction
$p_a$ of the $n$ host nests with new nests (with new random solutions).
For a maximization problem, the quality or fitness of a solution can simply be
proportional to the value of the objective function. Other forms of fitness can be
defined in a similar way to the fitness function in genetic algorithms.
From the implementation point of view, we can use the following simple representation: each egg in a nest represents a solution, and each cuckoo can lay only
one egg (thus representing one solution); the aim is to use the new and potentially
better solutions (cuckoos) to replace not-so-good solutions in the nests. Obviously,
better solutions (cuckoos) to replace a not-so-good solution in the nests. Obviously,
this algorithm can be extended to the more complicated case where each nest has
multiple eggs representing a set of solutions. For this present work, we will use the
simplest approach where each nest has only a single egg. In this case, there is no
distinction between egg, nest or cuckoo, as each nest corresponds to one egg which
also represents one cuckoo.
Based on these three rules, the basic steps of the Cuckoo Search (CS) can be
summarized as the pseudo code shown in Fig. 2.1.
When generating new solutions x(t+1) for, say, a cuckoo i, a Lévy flight is
performed
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \text{Lévy}(\lambda), \tag{2.35}$$
where α > 0 is the step size which should be related to the scales of the problem
of interests. In most cases, we can use α = O(L/10) where L is the characteristic
scale of the problem of interest, while in some cases α = O(L/100) can be more
effective and avoid flying too far. The above equation is essentially the stochastic
equation for a random walk. In general, a random walk is a Markov chain whose
next status/location only depends on the current location (the first term in the above
equation) and the transition probability (the second term). The product ⊕ means
entrywise multiplications. This entrywise product is similar to those used in PSO,
but here the random walk via Lévy flight is more efficient in exploring the search
space, as its step length is much longer in the long run. However, a substantial
fraction of the new solutions should be generated by far-field randomization,
whose locations should be far enough from the current best solution; this will make
sure that the system will not be trapped in a local optimum [35].
The pseudo code given here is sequential; however, vectors should be used from
an implementation point of view, as vectors are more efficient than loops. A Matlab
implementation is given by the author, and can be downloaded.2
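A minimal Python sketch of cuckoo search with Lévy-flight steps as in (2.35); the Lévy steps are generated with Mantegna's algorithm, and the test function, bounds, and parameter values are illustrative assumptions.

```python
import math, random

def levy_step(lam=1.5):
    """Mantegna's algorithm for Lévy-stable step lengths of index lam."""
    sigma = (math.gamma(1 + lam) * math.sin(math.pi * lam / 2)
             / (math.gamma((1 + lam) / 2) * lam
                * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    u, v = random.gauss(0, sigma), random.gauss(0, 1)
    return u / abs(v) ** (1 / lam)

def cuckoo_search(f, bounds, n=15, pa=0.25, alpha=0.1, iters=500):
    """Cuckoo search: Lévy-flight moves (2.35) around current nests; a
    fraction pa of the worst nests is abandoned and rebuilt at random."""
    d = len(bounds)
    nests = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    fits = [f(x) for x in nests]
    for _ in range(iters):
        for i in range(n):
            new = [min(hi, max(lo, nests[i][k] + alpha * levy_step()))
                   for k, (lo, hi) in enumerate(bounds)]
            fn = f(new)
            j = random.randrange(n)              # compare with a random nest
            if fn < fits[j]:
                nests[j], fits[j] = new, fn
        # abandon a fraction pa of the worst nests
        order = sorted(range(n), key=lambda i: fits[i], reverse=True)
        for i in order[:int(pa * n)]:
            nests[i] = [random.uniform(lo, hi) for lo, hi in bounds]
            fits[i] = f(nests[i])
    best = fits.index(min(fits))
    return nests[best], fits[best]

print(cuckoo_search(lambda x: sum(v * v for v in x), [(-5, 5)] * 2))
```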
2.5 A Unified Approach to Metaheuristics

Metaheuristic algorithms are typically built on two major components: intensification and diversification. Intensification focuses the search around the current promising solutions, while diversification makes sure the algorithm explores the search space globally
(hopefully also efficiently).
Furthermore, intensification is also called exploitation, as it typically searches
around the current best solutions and selects the best candidates or solutions. Simi-
larly, diversification is also called exploration, as it tends to explore the search space
more efficiently, often by large-scale randomization.
A fine balance between these two components is very important to the overall
efficiency and performance of an algorithm. Too little exploration and too much
exploitation could cause the system to be trapped in local optima, which makes it
very difficult or even impossible to find the global optimum. On the other hand, if
there is too much exploration but too little exploitation, it may be difficult for the
system to converge and thus slows down the overall search performance. A proper
balance itself is an optimization problem, and one of the main tasks of designing new
algorithms is to find a certain balance concerning this optimality and/or tradeoff.
Furthermore, just exploitation and exploration are not enough. During the search,
we have to use a proper mechanism or criterion to select the best solutions. The most
common criterion is to use the Survival of the Fittest, that is, to keep updating
the current best found so far. In addition, certain elitism is often used; this is to
ensure the best or fittest solutions are not lost, and should be passed onto the next
generations.
A local random walk around the current best solution $g^*$ can be written as

$$x_{t+1} = g^* + \mathbf{w}, \tag{2.36}$$

and

$$\mathbf{w} = \varepsilon\, \mathbf{d}, \tag{2.37}$$
where $\varepsilon$ is drawn from a Gaussian (normal) distribution $N(0, \sigma^2)$, and
$\mathbf{d}$ is the step length vector, which should be related to the actual scales of the independent
variables. For simplicity, we can take $\sigma = 1$.
Initialize a population of solutions and find the current best g∗
while (stopping criterion not met)
    if (rand < α)
        Local search: xt+1 = g∗ + ε d    (2.38)
    else
        Global search: randomization (Uniform, Lévy flights etc)
    end
    Evaluate new solutions and find the current best gt∗;
    t = t + 1;
end while
Postprocess results and visualization;
Here the global search step can, for example, generate new solutions uniformly as xt+1 = L + εu (U − L), where εu is drawn from a uniform distribution Unif[0,1], and U and L are the upper and
lower bound vectors, respectively.
Typically, α ≈ 0.25 ∼ 0.7. We will use α = 0.5 in our implementation. Interested
readers can try to do some parametric studies.
Again two important issues are: 1) the balance of intensification and diversifica-
tion controlled by a single parameter α , and 2) the choice of the step size of the
random walk. Parameter α is typically in the range of 0.25 to 0.7. The choice of the
right step size is also important. Simulations suggest that the ratio of the step size to
its length scale can typically be around 0.001 to 0.01 for most applications.
Another important issue is the selection of the best and/or elitism, as we intend
to discard the worst solution and replace it by generating a new solution. This may
implicitly weed out the least-fit solutions, while the solution with the highest fitness
remains in the population. The selection of the best and elitism can be guaranteed
implicitly in the evolutionary walkers.
Furthermore, the number (n) of random walkers is also important. Too few walk-
ers are not efficient, while too many may lead to slow convergence. In general, the
choice of n should follow the similar guidelines as those for all population-based
algorithms. Typically, we can use n = 15 to 50 for most applications.
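The scheme in (2.36)–(2.38) can be sketched in a few lines of Python; the population handling, the bounds, and the parameter values below are illustrative assumptions.

```python
import random

def random_walk_metaheuristic(f, bounds, n=30, alpha=0.5, sigma=0.01,
                              iters=1000):
    """Generic evolutionary-walker scheme: with probability alpha do a
    local Gaussian walk around the current best g*  (2.36)-(2.38);
    otherwise do a global uniform restart. The worst walker is replaced."""
    d = len(bounds)
    scale = [sigma * (hi - lo) for lo, hi in bounds]   # step-length vector d
    walkers = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n)]
    fits = [f(x) for x in walkers]
    for _ in range(iters):
        g = walkers[fits.index(min(fits))]             # current best g*
        if random.random() < alpha:                    # local search (2.38)
            new = [min(hi, max(lo, g[k] + random.gauss(0, 1) * scale[k]))
                   for k, (lo, hi) in enumerate(bounds)]
        else:                                          # global randomization
            new = [random.uniform(lo, hi) for lo, hi in bounds]
        fn = f(new)
        worst = fits.index(max(fits))                  # implicit elitism:
        if fn < fits[worst]:                           # the best is never lost
            walkers[worst], fits[worst] = new, fn
    best = fits.index(min(fits))
    return walkers[best], fits[best]

print(random_walk_metaheuristic(lambda x: sum(v * v for v in x),
                                [(-5, 5)] * 2))
```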
References
1. Apostolopoulos, T., Vlachos, A.: Application of the Firefly Algorithm for Solving
the Economic Emissions Load Dispatch Problem. International Journal of Combina-
torics, vol. 2011, Article ID 523806 (2011),
http://www.hindawi.com/journals/ijct/2011/523806.html
2. Blum, C., Roli, A.: Metaheuristics in combinatorial optimization: Overview and concep-
tual comparison. ACM Comput. Surv. 35, 268–308 (2003)
3. Cox, M.G., Forbes, A.B., Harris, P.M.: Discrete Modelling, SSfM Best Practice Guide
No. 4, National Physical Laboratory, UK (2002)
4. Boyd, S.P., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cam-
bridge (2004)
5. Celis, M., Dennis, J.E., Tapia, R.A.: A trust region strategy for nonlinear equality con-
strained optimization. In: Boggs, P., Byrd, R., Schnabel, R. (eds.) Numerical Optimiza-
tion 1994, pp. 71–82. SIAM, Philadelphia (1994)
6. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust-region methods. SIAM&MPS (2000)
7. Dorigo, M., Stützle, T.: Ant Colony Optimization. MIT Press, Cambridge (2004)
8. Farmer, J.D., Packard, N., Perelson, A.: The immune system, adaptation and machine
learning. Physica D 22, 187–204 (1986)
9. Geem, Z.W., Kim, J.H., Loganathan, G.V.: A new heuristic optimization: Harmony
search. Simulation 76, 60–68 (2001)
10. Gill, P.E., Murray, W., Wright, M.H.: Practical optimization. Academic Press Inc,
London (1981)
11. Glover, F., Laguna, M.: Tabu Search. Kluwer Academic Publishers, Boston (1997)
12. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning.
Addison Wesley, Reading (1989)
13. Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems.
Journal of Research of the National Bureaus of Standards 49(6), 409–436 (1952)
14. Holland, J.: Adaptation in Natural and Artificial Systems. University of Michigan Press,
Ann Arbor (1975)
15. Kennedy, J., Eberhart, R.C.: Particle swarm optimization. In: Proc. of IEEE International
Conference on Neural Networks, Piscataway, NJ, pp. 1942–1948 (1995)
16. Karmarkar, N.: A new polynomial-time algorithm for linear programming. Combinator-
ica 4(4), 373–395 (1984)
17. Kirkpatrick, S., Gelatt, C.D., Vecchi, M.P.: Optimization by simulated annealing. Sci-
ence 220(4598), 671–680 (1983)
18. Koziel, S., Yang, X.S.: Computational Optimization and Applications in Engineering and
Industry. Springer, Germany (2011)
19. Nelder, J.A., Mead, R.: A simplex method for function minimization. Computer Journal 7,
308–313 (1965)
20. Matthews, C., Wright, L., Yang, X.S.: Sensitivity Analysis, Optimization, and Sam-
pling Methods Applied to Continuous Models. National Physical Laboratory Report,
UK (2009)
21. Pavlyukevich, I.: Lévy flights, non-local search and simulated annealing. J. Computa-
tional Physics 226, 1830–1844 (2007)
22. Powell, M.J.D.: A new algorithm for unconstrained optimization. In: Rosen, J.B., Man-
gasarian, O.L., Ritter, K. (eds.) Nonlinear Programming, pp. 31–65 (1970)
23. Price, K., Storn, R., Lampinen, J.: Differential Evolution: A Practical Approach to Global
Optimization. Springer, Heidelberg (2005)
24. Sayadi, M.K., Ramezanian, R., Ghaffari-Nasab, N.: A discrete firefly meta-heuristic with
local search for makespan minimization in permutation flow shop scheduling problems.
Int. J. of Industrial Engineering Computations 1, 1–10 (2010)
25. Storn, R.: On the usage of differential evolution for function optimization. In: Biennial
Conference of the North American Fuzzy Information Processing Society (NAFIPS), pp.
519–523 (1996)
26. Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global
optimization over continuous spaces. Journal of Global Optimization 11, 341–359 (1997)
27. Talbi, E.G.: Metaheuristics: From Design to Implementation. John Wiley & Sons, Chich-
ester (2009)
28. Yang, X.S.: Introduction to Computational Mathematics. World Scientific Publishing,
Singapore (2008)
29. Yang, X.S.: Nature-Inspired Metaheuristic Algorithms, 1st edn. Luniver Press, UK (2008)
30. Yang, X.S.: Nature-Inspired Metaheuristic Algorithms, 2nd edn. Luniver Press, UK
(2010)
31. Yang, X.S.: Engineering Optimization: An Introduction with Metaheuristic Applications.
John Wiley & Sons, Chichester (2010)
32. Yang, X.-S.: Firefly algorithms for multimodal optimization. In: Watanabe, O., Zeug-
mann, T. (eds.) SAGA 2009. LNCS, vol. 5792, pp. 169–178. Springer, Heidelberg (2009)
33. Yang, X.-S.: A new metaheuristic bat-inspired algorithm. In: González, J.R., Pelta,
D.A., Cruz, C., Terrazas, G., Krasnogor, N. (eds.) NICSO 2010. SCI, vol. 284,
pp. 65–74. Springer, Heidelberg (2010)
34. Yang, X.S., Deb, S.: Cuckoo search via Lévy flights. In: Proc. of World Congress on Na-
ture & Biologically Inspired Computing (NaBic 2009), pp. 210–214. IEEE Publications,
USA (2009)
35. Yang, X.S., Deb, S.: Engineering optimization by cuckoo search. Int. J. Math. Modelling
Num. Optimisation 1(4), 330–343 (2010)
Chapter 3
Surrogate-Based Methods

Slawomir Koziel, David Echeverría Ciaurri, and Leifur Leifsson
Abstract. Objective functions that appear in engineering practice may come from
measurements of physical systems and, more often, from computer simulations. In
many cases, optimization of such objectives in a straightforward way, i.e., by ap-
plying optimization routines directly to these functions, is impractical. One reason
is that simulation-based objective functions are often analytically intractable (dis-
continuous, non-differentiable, and inherently noisy). Also, sensitivity information
is usually unavailable, or too expensive to compute. Another, and in many cases
even more important, reason is the high computational cost of measure-
ment/simulations. Simulation times of several hours, days or even weeks per ob-
jective function evaluation are not uncommon in contemporary engineering, de-
spite the increase of available computing power. Feasible handling of these
unmanageable functions can be accomplished using surrogate models: the optimi-
zation of the original objective is replaced by iterative re-optimization and updat-
ing of the analytically tractable and computationally cheap surrogate. This chapter
briefly describes the basics of surrogate-based optimization, various ways of creat-
ing surrogate models, as well as several examples of surrogate-based optimization
techniques.
3.1 Introduction
Contemporary engineering is more and more dependent on computer-aided design
(CAD). In most engineering fields, numerical simulations are used extensively,
not only for design verification but also directly in the design process. As a matter
of fact, because of increasing system complexity, ready-to-use theoretical (e.g.,
analytical) models are not available in many cases. Thus, simulation-driven design
and design optimization becomes the only option to meet the specifications
prescribed, improve the system reliability, or reduce the fabrication cost.
The simulation-driven design can be formulated as a nonlinear minimization
problem of the following form
$$x^* = \arg\min_{x} f(x), \tag{3.1}$$
where f(x) denotes the objective function to be minimized evaluated at the point
x ∈ Rn (x is the design variable vector). In many engineering problems f is of the
form f(x) = U(Rf(x)), where Rf ∈ Rm denotes the response vector of the system of in-
terest (in particular, one may have m > n or even m >> n [1]), whereas U is a given
scalar merit function. In particular, U can be defined through a norm that measures
the distance between Rf(x) and a target vector y. An optimal design vector is denoted
by x*. In many cases, Rf is obtained through computationally expensive computer
simulations. We will refer to it as a high-fidelity or fine model. To simplify notation,
f itself will also be referred to as the high-fidelity (fine) model.
Unfortunately, a direct attempt to solve (3.1) by embedding the simulator di-
rectly in the optimization loop may be impractical. The underlying simulations can
be very time-consuming (in some instances, the simulation time can be as long as
several hours, days or even weeks per single design), and access to massive
computing resources does not always translate into a computational speedup. This is
due to a growing demand for simulation accuracy, both by including multiphysics
and second-order effects, and by using finer discretization of the structure
under consideration. As conventional optimization algorithms (e.g., gradient-based
schemes with numerical derivatives) require tens, hundreds or even thousands of
objective function calls per run (depending on the number of design variables),
the computational cost of the whole optimization process may not be acceptable.
Another problem is that objective functions coming from computer simulations
are often analytically intractable (i.e., discontinuous, non-differentiable, and in-
herently noisy). Moreover, sensitivity information is frequently unavailable, or too
expensive to compute. While in some cases it is possible to obtain derivative in-
formation inexpensively through adjoint sensitivities [2], numerical noise is an
important issue that can complicate simulation-driven design. We should also
mention that adjoint-based sensitivities require detailed knowledge of and access
to the simulator source code, and this is something that cannot be assumed to be
generally available.
Surrogate-based optimization (SBO) [1,3,4] has been suggested as an effective
approach for the design with time-consuming computer models. The basic concept
of SBO is to replace direct optimization of the expensive model by iterative
re-optimization of a surrogate model,

$x^{(i+1)} = \arg\min_{x} s^{(i)}(x)$ ,   (3.2)

where $s^{(i)}$ denotes the surrogate model at iteration i.
This scheme generates a sequence of points (designs) x(i) that (hopefully) converge
to a solution (or a good approximation) of the original design problem (3.1). Each
x(i+1) is the optimal design of the surrogate model s(i), which is assumed to be a
computationally cheap and sufficiently reliable representation of the fine model f,
particularly in the neighborhood of the current design x(i). Under these assump-
tions, the algorithm (3.2) aims at a sequence of designs to quickly approach x*.
Typically, and for verification purposes, the high-fidelity model is evaluated only
once per iteration (at every new design x(i+1)). The data obtained from the valida-
tion is used to update the surrogate model. Because the surrogate model is compu-
tationally cheap, the optimization cost associated with (3.2) can—in many cases—
be viewed as negligible, so that the total optimization cost is determined by the
evaluation of the high-fidelity model. The number of iterations needed
within a surrogate-based optimization algorithm is typically substantially smaller
than for any method that optimizes the high-fidelity model directly (e.g., gradient-
based schemes with numerical derivatives) [5].
If the surrogate model satisfies zero- and first-order consistency conditions with
the high-fidelity model (i.e., s(i)(x(i)) = f (x(i)) and ∇s(i)(x(i)) = ∇f(x(i)) [7]; it should
be noticed that the verification of the latter requires high-fidelity model sensitivity
data), and the surrogate-based algorithm is enhanced by, for example, a trust re-
gion method [11] (see Section 3.4.1), then the sequence of intermediate solutions
is provably convergent to a local optimizer of the fine model [12] (some standard
assumptions concerning the smoothness of the functions involved are also neces-
sary) [13]. Convergence can also be guaranteed if the SBO algorithm is embedded
within the framework given in [5,14] (space mapping), [13] (manifold mapping)
or [9] (surrogate management framework). A more detailed description of several
surrogate-based optimization techniques is given in Section 3.4.
Space mapping [1,5,6] is an example of a surrogate-based methodology that
does not normally rely on using sensitivity data or trust region convergence safe-
guards; however, it requires the surrogate model to be constructed from a physi-
cally-based coarse model [1]. This usually gives remarkably good performance in
the sense of the algorithm being able to locate a satisfactory design quickly.
Unfortunately, space mapping suffers from convergence problems [14], and it is
sensitive to the quality of the coarse model and to the specific analytical
formulation of the surrogate [15,16].
Fig. 3.1 Flowchart of the surrogate-based optimization process: starting from an initial
design, the algorithm iterates between constructing/updating the surrogate model, optimizing
it, and evaluating the high-fidelity model at the new design, until the termination condition
is satisfied and the final design is returned.
It should be noticed that the evaluation of a physical surrogate may involve, for
example, the numerical solution of partial differential equations or even actual
measurements of the physical system.
The main advantage of physically-based surrogates is that the amount of high-
fidelity model data necessary for obtaining a given level of accuracy is generally
substantially smaller than for functional surrogates (physical surrogates inherently
embed knowledge about the system of interest) [1]. Hence, surrogate-based opti-
mization algorithms that exploit physically-based surrogate models are usually
more efficient than those using functional surrogates (in terms of the number of
high-fidelity model evaluations required to find a satisfactory design) [5].
Functional (or approximation) surrogate models [20,4]:
An initial functional surrogate can be generated using high-fidelity model data ob-
tained through sampling of the design space. Figure 3.2 shows the model construc-
tion flowchart for a functional surrogate. Design of experiments involves the use
of strategies for allocating samples within the design space. The particular choice
depends on the number of samples one can afford (in some occasions only a few
points may be allowed), but also on the specific modeling technique that will be
used to create the surrogate. Though in some cases the surrogate can be found us-
ing explicit formulas (e.g., polynomial approximation) [3], in most situations it is
computed by means of a separate minimization problem (e.g., when using kriging
[21] or neural networks [22]). The accuracy of the model should be tested in order
to estimate its prediction/generalization capability. The main difficulty in obtain-
ing a good functional surrogate lies in keeping a balance between accuracy at the
known and at the unknown data (training and testing set, respectively). The surro-
gate could be subsequently updated using new high-fidelity model data that is ac-
cumulated during the run of the surrogate-based optimization algorithm.
In this section we first describe the fundamental steps for generating functional
surrogates. Various sampling techniques are presented in Section 3.3.1. The surrogate
creation and model validation steps are tackled in Section 3.3.2 and Section 3.3.3,
respectively. If the quality of the surrogate is not sufficient, more data
points can be added, and/or the model parameters can be updated to improve accu-
racy. Several correction methods, both for functional and physical surrogates, are
described in Section 3.3.4.
Fig. 3.2 Surrogate model construction flowchart: design of experiments, high-fidelity model
data acquisition, model identification (data fitting), and model validation. If the quality of
the model is not satisfactory, the procedure can be iterated (more data points will be
required).
Fig. 3.3 Factorial designs for three design variables (n = 3): (a) full factorial design, (b)
fractional factorial design, (c) central composite design, (d) star design, and (e) Box-
Behnken design.
Polynomial regression [3] assumes the following relation between the function of
interest f and K polynomial basis functions vj using p samples f(x(i)), i = 1, … , p:
$f(x^{(i)}) = \sum_{j=1}^{K} \beta_j v_j(x^{(i)})$ .   (3.3)
$s(x) = s([x_1\; x_2\; \ldots\; x_n]^T) = \beta_0 + \sum_{j=1}^{n} \beta_j x_j + \sum_{i=1}^{n} \sum_{j \le i} \beta_{ij} x_i x_j$ .   (3.7)
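Since the model (3.7) is linear in its coefficients, it can be identified by ordinary
least squares. The following Python sketch (the sample array X of size p-by-n and the
vector y of fine-model values are illustrative inputs) builds the monomial basis of
(3.7) and returns a predictor:

import numpy as np

def fit_quadratic_surrogate(X, y):
    # Fit the second-order polynomial model (3.7) by linear least squares.
    # X: p-by-n array of samples; y: corresponding fine-model values.
    p, n = X.shape
    def basis(Z):
        cols = [np.ones(len(Z))]                     # beta_0
        cols += [Z[:, j] for j in range(n)]          # linear terms
        for i in range(n):
            for j in range(i + 1):                   # quadratic terms, j <= i
                cols.append(Z[:, i] * Z[:, j])
        return np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(basis(X), y, rcond=None)
    return lambda Z: basis(np.atleast_2d(Z)) @ beta  # surrogate s(x)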
where again $f = [f(x^{(1)})\; f(x^{(2)})\; \ldots\; f(x^{(p)})]^T$, and the $p \times K$ matrix $\Phi$ is defined as

$\Phi = \begin{bmatrix} \varphi(\|x^{(1)} - c^{(1)}\|) & \varphi(\|x^{(1)} - c^{(2)}\|) & \cdots & \varphi(\|x^{(1)} - c^{(K)}\|) \\ \varphi(\|x^{(2)} - c^{(1)}\|) & \varphi(\|x^{(2)} - c^{(2)}\|) & \cdots & \varphi(\|x^{(2)} - c^{(K)}\|) \\ \vdots & \vdots & \ddots & \vdots \\ \varphi(\|x^{(p)} - c^{(1)}\|) & \varphi(\|x^{(p)} - c^{(2)}\|) & \cdots & \varphi(\|x^{(p)} - c^{(K)}\|) \end{bmatrix}$ .   (3.10)
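A short Python sketch of RBF interpolation is given below for the common special case
in which the centers c(j) coincide with the samples (K = p), so that the matrix in
(3.10) is square and the coefficients solve a linear system; the Gaussian basis
phi(r) = exp(-r^2) is assumed purely for illustration, and in practice this system may
be ill-conditioned, calling for regularization.

import numpy as np

def fit_rbf(X, y, phi=lambda r: np.exp(-r**2)):
    # Gaussian RBF interpolant with centers at the samples (K = p in (3.10)).
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    lam = np.linalg.solve(phi(r), y)                 # solve Phi lambda = f
    def s(x):
        d = np.linalg.norm(X - np.asarray(x), axis=1)
        return phi(d) @ lam
    return s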
3.3.2.3 Kriging
where R is a p×p correlation matrix with Rij = R(x(i),x(j)). Here, R(x(i), x(j)) is the
correlation function between sampled data points x(i) and x(j). The most popular
choice is the Gaussian correlation function
$R(x, y) = \exp\left[ -\sum_{k=1}^{n} \theta_k\, |x_k - y_k|^2 \right]$ ,   (3.13)
where θk are unknown correlation parameters, and xk and yk are the kth component
of the vectors x and y, respectively.
where r(x) = [R(x, x(1)) … R(x, x(p))]T, f = [f(x(1)) f(x(2)) … f(x(p))]T, and G is a
p×K matrix with Gij = gj(x(i)).
The vector of model parameters β can be computed as
$\beta = (G^T R^{-1} G)^{-1} G^T R^{-1} f$ .   (3.15)
The resulting maximum-likelihood problem, in which the variance $\sigma^2$ and $|R|$ are both
functions of $\theta_k$, is solved for positive values of $\theta_k$ as optimization variables.
It should be noted that, once the kriging-based surrogate has been obtained, the
random process Z(x) gives valuable information regarding the approximation error
that can be used for improving the surrogate [4,35].
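The kriging predictor itself can be sketched in a few lines of Python for the case of a
constant regression term and fixed correlation parameters theta; in a complete
implementation theta would come from the maximum-likelihood problem mentioned above,
and the small "nugget" added here is purely for numerical stability. The prediction
formula used is the standard one, consistent with the quantities r(x), R, G, and beta
defined above.

import numpy as np

def fit_kriging(X, y, theta):
    # Kriging with the Gaussian correlation (3.13), a constant regression
    # term, and fixed theta (assumed given; normally estimated by ML).
    def corr(A, B):
        d2 = (A[:, None, :] - B[None, :, :])**2
        return np.exp(-(d2 * theta).sum(axis=-1))
    R = corr(X, X) + 1e-10 * np.eye(len(X))          # nugget for stability
    G = np.ones((len(X), 1))                         # constant regression term
    Ri = np.linalg.inv(R)
    beta = np.linalg.solve(G.T @ Ri @ G, G.T @ Ri @ y)   # cf. (3.15)
    resid = Ri @ (y - G @ beta)
    def s(x):
        r = corr(np.atleast_2d(x), X)[0]
        return beta[0] + r @ resid
    return s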
The basic structure in a neural network [39,40] is the neuron (or single-unit per-
ceptron). A neuron performs an affine transformation followed by a nonlinear op-
eration (see Fig. 3.5(a)). If the inputs to a neuron are denoted as x1, …, xn, the neu-
ron output y is computed as

$y = \frac{1}{1 + \exp(-\eta / T)}$ ,   (3.18)

where $\eta$ is the output of the affine transformation (a weighted sum of the inputs plus
a bias) and T is a slope parameter.
Fig. 3.5 Neural networks: (a) basic structure of a neuron; (b) two-layer feed-forward neural
network architecture.
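As a concrete illustration of (3.18), a single neuron can be written in Python as
follows, assuming the usual convention that the affine transformation is
eta = w.x + b with weights w and bias b (these names are illustrative assumptions):

import numpy as np

def neuron(x, w, b, T=1.0):
    # Single neuron: affine transformation eta = w.x + b followed by the
    # sigmoid nonlinearity (3.18); w, b, T are user-supplied parameters.
    eta = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-eta / T))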
This section covers some other approximation techniques that have been gaining
popularity recently. One of the most prominent, which has proven to be a very
general approximation tool, is support vector regression (SVR) [41,42]. SVR
resorts to quadratic programming to solve the underlying approximation problem
in a robust way [43]. SVR is a variant of the sup-
port vector machines (SVMs) methodology developed by Vapnik [44], which was
originally applied to classification problems. SVR/SVM exploits the structural risk
minimization (SRM) principle, which has been shown (see, e.g., [41]) to be supe-
rior to the traditional empirical risk minimization (ERM) principle employed by
several modeling technologies (e.g., neural networks). ERM is based on minimiz-
ing an error function for the set of training points. When the model structure is
complex (e.g., higher order polynomials), ERM-based surrogates often result in
overfitting. SRM incorporates the model complexity in the regression, and there-
fore yields surrogates that may be more accurate outside of the training set.
Moving least squares (MLS) [45] is a technique particularly popular in aero-
space engineering. MLS is formulated as weighted least squares (WLS) [46]. In
MLS, the error contribution from each training point x(i) is multiplied by a weight
ωi that depends on the distance between x and x(i). A common choice for the
weights is
$\omega_i(\|x - x^{(i)}\|) = \exp(-\|x - x^{(i)}\|^2)$ .   (3.19)
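A minimal MLS predictor, assuming the Gaussian weights (3.19) and a local linear basis
centered at the evaluation point x (so that the model value at x is simply the fitted
intercept), might look as follows in Python:

import numpy as np

def mls_predict(x, X, y):
    # Moving least squares with a local linear basis and the weights (3.19).
    d = np.linalg.norm(X - x, axis=1)
    w = np.exp(-d**2)                                # weights (3.19)
    V = np.column_stack([np.ones(len(X)), X - x])    # basis centered at x
    W = np.diag(w)
    beta = np.linalg.solve(V.T @ W @ V, V.T @ W @ y)
    return beta[0]                                   # model value at x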
In this section we will describe two strategies for improving surrogates locally.
The corrections described in Section 3.3.4.1 are based on mapping objective function
values. In some cases, the cost function can be expressed as a function of
a model response. Section 3.3.4.2 presents the space-mapping concept, which gives
rise to a whole surrogate-based optimization paradigm (see Section 3.4.2).
Most of the objective function corrections used in practice fall into one of three
groups: compositional, additive, or multiplicative corrections. We will briefly
illustrate each of these categories for correcting the surrogate s(i)(x), and discuss
whether zero- and first-order consistency conditions with f(x) [7] can be satisfied.
The following compositional correction [20]
s (i +1) ( x ) = g ( s (i ) ( x )) (3.20)
represents a simple scaling of the objective function. Since the mapping g is a real-
valued function of a real variable, a compositional correction will not in general
yield first-order consistency conditions. By selecting a mapping g that satisfies

$g'(s^{(i)}(x^{(i)})) = \frac{\nabla f(x^{(i)})\, \nabla s^{(i)}(x^{(i)})^T}{\nabla s^{(i)}(x^{(i)})\, \nabla s^{(i)}(x^{(i)})^T}$ ,   (3.21)

the first-order condition is matched as closely as possible in a least-squares sense:
the gradient of the corrected surrogate at $x^{(i)}$ becomes the projection of
$\nabla f(x^{(i)})$ onto the direction of $\nabla s^{(i)}(x^{(i)})$.
If f(x(i)) is not in the range of s(i)(x), then the condition s(i)(p(x(i))) = f(x(i)) is not
achievable. We can overcome that issue by combining both compositional correc-
tions. In that case, the following selection for g and p
$g(t) = t - s^{(i)}(x^{(i)}) + f(x^{(i)})$ ,   (3.23)

$p(x) = x^{(i)} + J_p (x - x^{(i)})$ ,   (3.24)
and
$\nabla \lambda(x^{(i)}) = \nabla f(x^{(i)}) - \nabla s^{(i)}(x^{(i)})$ .   (3.27)
$s^{(i+1)}(x) = \alpha(x)\, s^{(i)}(x)$ .   (3.29)

$\alpha(x^{(i)}) = \frac{f(x^{(i)})}{s^{(i)}(x^{(i)})}$ ,   (3.30)
and
The requirement s(i)(x(i)) ≠ 0 is not strong in practice since very often the range of
f(x) (and thus, of the surrogate s(i)(x)) is known beforehand, and hence, a bias can
be introduced both for f(x) and s(i)(x) to avoid cost function values equal to zero.
In these circumstances the following multiplicative correction
$s^{(i+1)}(x) = \left[ \frac{f(x^{(i)})}{s^{(i)}(x^{(i)})} + \frac{\nabla f(x^{(i)})\, s^{(i)}(x^{(i)}) - f(x^{(i)})\, \nabla s^{(i)}(x^{(i)})}{\left(s^{(i)}(x^{(i)})\right)^2}\, (x - x^{(i)}) \right] s^{(i)}(x)$ ,   (3.32)
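For illustration, the correction (3.32) can be implemented as a higher-order function
that takes the current surrogate and the fine-model value and gradient at x(i) and
returns the corrected surrogate; in this sketch the surrogate gradient is approximated
by finite differences, which is affordable because the surrogate is cheap (the helper
names are hypothetical).

import numpy as np

def multiplicative_correction(s, f_xi, grad_f_xi, x_i, eps=1e-7):
    # First-order multiplicative correction (3.32) of the surrogate s at x_i;
    # f_xi and grad_f_xi are the fine-model value and gradient at x_i.
    s_xi = s(x_i)
    # finite-difference surrogate gradient (cheap, since s is cheap)
    grad_s = np.array([(s(x_i + eps * e) - s_xi) / eps
                       for e in np.eye(len(x_i))])
    def s_new(x):
        alpha = (f_xi / s_xi
                 + (grad_f_xi * s_xi - f_xi * grad_s) @ (x - x_i) / s_xi**2)
        return alpha * s(x)
    return s_new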
$f(x) = U(R_f(x))$ .   (3.33)
The fine model response Rf(x) is assumed to be accurate but computationally ex-
pensive. The coarse model response Rc(x)∈Rm is much cheaper to evaluate than
the fine model response at the expense of being an approximation of it. SM estab-
lishes a correction between model responses rather than between objective func-
tions. The corrected model response will be denoted as Rs(x; pSM) ∈ Rm, and pSM
represents a set of parameters that describes the type of correction performed.
We can find in the literature four different groups of coarse model response
corrections [1,5]:
$R_s(x; p_{SM}) = R_s(x; r_0, r_1) = R_c(x; r_0 + r_1 t)$ .   (3.36)
In Fig. 3.6 we illustrate by means of block diagrams the four SM-based correction
strategies introduced above, together with a combination of three of them.
The surrogate response is usually optimized with respect to the SM parameters
$p_{SM}$ in order to reduce the model discrepancy for all or part of the data available,
$R_f(x^{(1)}), R_f(x^{(2)}), \ldots, R_f(x^{(p)})$:

$p_{SM} = \arg\min \sum_{k=1}^{p} \omega^{(k)}\, \| R_f(x^{(k)}) - R_s(x^{(k)}; p_{SM}) \|$ ,   (3.37)

where $0 \le \omega^{(k)} \le 1$ are weights for each of the samples. The corrected surrogate
Rs(x; pSM) can be used as an approximation to the fine response Rf(x) in the vicin-
ity of the sampled data. The minimization in (3.37) is known in SM literature as
parameter extraction [1]. The solving of this optimization process is not exempt
from difficulties, since in many cases the problem is ill-conditioned. We can find
in [1] a number of techniques for addressing parameter extraction in a robust
manner.
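In code, parameter extraction amounts to a (possibly weighted) nonlinear least-squares
fit of the SM parameters. The sketch below is generic and illustrative: Rs(x, p) is any
parameterized corrected response (assumed to return a vector), Rf_data holds the stored
pairs (x_k, Rf(x_k)), and scipy's least_squares is used as the extraction engine; the
safeguards against ill-conditioning discussed above are omitted.

import numpy as np
from scipy.optimize import least_squares

def extract_parameters(Rs, Rf_data, p0, weights=None):
    # Parameter extraction, cf. (3.37): fit the SM parameters p so that the
    # corrected response Rs(x, p) matches the stored fine responses
    # Rf_data = [(x_k, Rf(x_k)), ...].
    if weights is None:
        weights = np.ones(len(Rf_data))
    def residuals(p):
        return np.concatenate([np.sqrt(w) * np.atleast_1d(Rs(x, p) - Rfx)
                               for w, (x, Rfx) in zip(weights, Rf_data)])
    return least_squares(residuals, p0).x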
² This type of space mapping is known as frequency space mapping [4]; it was originally
proposed in microwave engineering applications (in these applications t usually refers
to frequency).
Fig. 3.6 Basic space-mapping surrogate correction types: (a) input SM, (b) output SM, (c)
implicit SM, (d) frequency SM, and (e) composite using input, output and frequency SM.
and

$s^{(i)}(x) = U(R_s(x; p_{SM}^{(i)}))$ ,   (3.39)
for i > 0. The parameters $p_{SM}^{(i)}$ are obtained by parameter extraction as in (3.37).
The accuracy of the corrected surrogate will clearly depend on the quality of the
coarse model response [16]. In microwave design applications it has often been
observed that the number of points p needed for obtaining a satisfactory SM-based
corrected surrogate is on the order of the number of optimization variables n [1].
Though output SM can be used to obtain both zero- and first-order consistency
conditions with f(x), many other SM-based optimization algorithms that have been
applied in practice do not satisfy those conditions, and on some occasions convergence
problems have been identified [14]. Additionally, the choice of an adequate
SM correction approach is not always obvious [14]. However, on multiple occasions
and in several different disciplines [52,53,1], space mapping has been reported
as a very efficient means for obtaining satisfactory optimal designs.
Convergence properties of space-mapping optimization algorithms can be
improved when these are safeguarded by a trust region [54]. Similarly to AMMO,
the SM surrogate model optimization is restricted to a neighborhood of x(i) (this
time by using the Euclidean norm) as follows

$x^{(i+1)} = \arg\min_{\|x - x^{(i)}\| \le \delta^{(i)}} s^{(i)}(x)$ ,

where $\delta^{(i)}$ denotes the trust-region radius at iteration i. The trust region is updated
at every iteration by means of precise criteria [11]. A number of enhancements for
space mapping have been suggested recently in the literature (e.g., zero-order and
approximate/exact first-order consistency conditions with f(x) [54], or adaptively
constrained parameter extraction [55]).
The quality of a surrogate within space mapping can be assessed by means of
the techniques described in [14,16]. These methods are based on evaluating the
high-fidelity model at several points (and thus, they require some extra
computational effort). With that information, some conditions required for
convergence are approximated numerically, and as a result, low-fidelity models
can be compared based on these approximate conditions. The quality assessment
algorithms presented in [14,16] can also be embedded into SM optimization
algorithms in order to throw some light on the delicate issue of selecting the most
adequate SM surrogate correction.
It should be emphasized that space mapping is not a general-purpose
optimization approach. The existence of the computationally cheap and
sufficiently accurate low-fidelity model is an important prerequisite for this
technique. If such a coarse model does exist, satisfactory designs are often
obtained by space mapping after a relatively small number of evaluations of the
high-fidelity model. This number is usually on the order of the number of
optimization variables n [14], and very frequently represents a dramatic reduction
in the computational cost required for solving the same optimization problem with
other methods that do not rely on surrogates. In the absence of the above-
mentioned low-fidelity model, space-mapping optimization algorithms may not
perform efficiently.
The matrix $S^{(0)}$ is typically taken as the identity matrix $I_m$. Here, $\dagger$ denotes the
pseudoinverse operator, defined for $\Delta C$ as

$\Delta C^{\dagger} = V_{\Delta C}\, \Sigma_{\Delta C}^{\dagger}\, U_{\Delta C}^{T}$ ,   (3.45)

where $U_{\Delta C}$, $\Sigma_{\Delta C}$, and $V_{\Delta C}$ are the factors in the singular value decomposition of
$\Delta C$. The matrix $\Sigma_{\Delta C}^{\dagger}$ is the result of inverting the nonzero entries of $\Sigma_{\Delta C}$, leaving
the zeros invariant [8]. Some mild general assumptions on the model responses
are made in theory [56] so that every pseudoinverse introduced is well defined.
The response correction $R_s^{(i)}(x)$ is an approximation of

$R_s^*(x) = R_f(x^*) + S^* \left( R_c(x) - R_c(x^*) \right)$ ,   (3.46)

with $S^* = J_f(x^*)\, J_c(x^*)^{\dagger}$, where $J_f(x^*)$ and $J_c(x^*)$ stand for the fine and coarse
model response Jacobians, respectively, evaluated at $x^*$. Obviously, neither $x^*$ nor
$S^*$ is known beforehand. Therefore, one needs to use an iterative approximation, such
as the one in (3.41)-(3.45), in the actual manifold-mapping algorithm.
The manifold-mapping model alignment is illustrated in Fig. 3.7 for the least-
squares optimization problem
$U(R_f(x)) = \| R_f(x) - y \|_2^2$ ,   (3.48)
with $y \in R^m$ being the given design specifications. In that figure, the point $x_c^*$
denotes the minimizer of the coarse model cost function $U(R_c(x))$. We note that,
in the absence of constraints, the optimality associated with (3.48) translates into
orthogonality between the tangent plane for $R_f(x)$ at $x^*$ and the vector
$R_f(x^*) - y$.
Fig. 3.7 Illustration of the manifold-mapping model alignment for a least-squares optimiza-
tion problem. The point xc* denotes the minimizer corresponding to the coarse model re-
sponse, and the point y is the vector of design specifications. Thin solid and dashed straight
lines denote the tangent planes for the fine and coarse model response at their optimal de-
signs, respectively. By the linear correction S *, the point Rc(x*) is mapped to Rf(x*), and
the tangent plane for Rc(x) at Rc(x*) to the tangent plane for Rf(x) at Rf(x*) [13].
for some of them improves on f(x(i)) the search step is declared successful, the cur-
rent pattern is centered at this new point, and a new search step is started. Other-
wise a poll step is taken. Polling requires computing f(x) for points in the pattern.
If one of these points is found to improve on f(x(i)), the poll step is declared suc-
cessful, the pattern is translated to this new point, and a new search step is per-
formed. Otherwise the whole pattern search iteration is considered unsuccessful
and the termination condition is checked. This stopping criterion is typically based
on the pattern size Δ [9,61]. If, after the unsuccessful pattern search iteration an-
other iteration is needed, the pattern size Δ is decreased, and a new search step is
taken with the pattern centered again at x(i). Surrogates are incorporated in the
SMF through the search step. For example, kriging (with Latin hypercube sam-
pling) is considered in the SMF application studied in [61].
In order to guarantee convergence to a stationary point, the set of vectors
formed by each pattern point and the pattern center should be a generating (or
positive spanning) set [60,61]. A generating set for Rn consists of a set of vectors
whose non-negative linear combinations span Rn. Generating sets are crucial in
proving convergence (for smooth objective functions) due to the following prop-
erty: if a generating set is centered at x(i) and ∇f(x(i)) ≠ 0, then at least one of the
vectors in the generating set defines a descent direction [60]. Therefore, if f(x) is
smooth and ∇f(x(i)) ≠ 0, we can expect that for a pattern size Δ small enough, some
of the points in the associated stencil will improve on f(x(i)).
Though pattern search optimization algorithms typically require many more
function evaluations than gradient-based techniques, the computations in both the
search and poll steps can be performed in a distributed fashion. On top of that, the
use of surrogates, as is the case for the SMF, generally accelerates the entire
optimization process noticeably.
contemporary engineering design, and the importance of this role will most likely
increase in the near future. One of the reasons for this increase is the fact that
computer simulations have become a major design tool in most engineering areas.
In order for these simulations to be sufficiently accurate, more and more phenom-
ena have to be captured. This level of sophistication renders simulations computa-
tionally expensive, particularly when they deal with the time-varying three-
dimensional structures considered in many engineering fields. Hence, evaluation
times of several days, or even weeks, are nowadays not uncommon. The direct use
of CPU-intensive numerical models in some off-the-shelf automated optimization
procedures (e.g., gradient-based techniques with approximate derivatives) is very
often prohibitive. Surrogate-based optimization can be a very useful approach in
this context, since, apart from reducing significantly the number of high-fidelity
expensive simulations in the whole design process, it also helps in addressing im-
portant high-fidelity cost function issues (e.g., presence of discontinuities and/or
multiple local optima).
References
1. Bandler, J.W., Cheng, Q.S., Dakroury, S.A., Mohamed, A.S., Bakr, M.H., Madsen, K.,
Søndergaard, J.: Space mapping: the state of the art. IEEE Trans. Microwave Theory
Tech. 52, 337–361 (2004)
2. Pironneau, O.: On optimum design in fluid mechanics. J. Fluid Mech. 64, 97–110
(1974)
3. Queipo, N.V., Haftka, R.T., Shyy, W., Goel, T., Vaidyanathan, R., Tucker, P.K.: Surro-
gate-based analysis and optimization. Progress in Aerospace Sciences 41, 1–28 (2005)
4. Forrester, A.I.J., Keane, A.J.: Recent advances in surrogate-based optimization. Prog.
Aerospace Sciences 45, 50–79 (2009)
5. Koziel, S., Bandler, J.W., Madsen, K.: A space mapping framework for engineering
optimization: theory and implementation. IEEE Trans. Microwave Theory Tech. 54,
3721–3730 (2006)
6. Koziel, S., Cheng, Q.S., Bandler, J.W.: Space mapping. IEEE Microwave Magazine 9,
105–122 (2008)
7. Alexandrov, N.M., Lewis, R.M.: An overview of first-order model management for
engineering optimization. Optimization and Engineering 2, 413–430 (2001)
8. Echeverria, D., Hemker, P.W.: Space mapping and defect correction. CMAM Int.
Mathematical Journal Computational Methods in Applied Mathematics 5, 107–136
(2005)
9. Booker, A.J., Dennis, J.E., Frank, P.D., Serafini, D.B., Torczon, V., Trosset, M.W.: A
rigorous framework for optimization of expensive functions by surrogates. Structural
Optimization 17, 1–13 (1999)
10. Simpson, T.W., Peplinski, J., Koch, P.N., Allen, J.K.: Metamodels for computer-based
engineering design: survey and recommendations. Engineering with Computers 17,
129–150 (2001)
11. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust Region Methods. MPS-SIAM Series on
Optimization (2000)
12. Alexandrov, N.M., Dennis, J.E., Lewis, R.M., Torczon, V.: A trust region framework
for managing use of approximation models in optimization. Struct. Multidisciplinary
Optim. 15, 16–23 (1998)
13. Echeverría, D., Hemker, P.W.: Manifold mapping: a two-level optimization technique.
Computing and Visualization in Science 11, 193–206 (2008)
14. Koziel, S., Bandler, J.W., Madsen, K.: Quality assessment of coarse models and surro-
gates for space mapping optimization. Optimization Eng. 9, 375–391 (2008)
15. Koziel, S., Bandler, J.W.: Coarse and surrogate model assessment for engineering de-
sign optimization with space mapping. In: IEEE MTT-S Int. Microwave Symp. Dig,
Honolulu, HI, pp. 107–110 (2007)
16. Koziel, S., Bandler, J.W.: Space-mapping optimization with adaptive surrogate model.
IEEE Trans. Microwave Theory Tech. 55, 541–547 (2007)
17. Alexandrov, N.M., Nielsen, E.J., Lewis, R.M., Anderson, W.K.: First-order model
management with variable-fidelity physics applied to multi-element airfoil optimiza-
tion. In: AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Design and
Optimization, Long Beach, CA, AIAA Paper 2000-4886 (2000)
18. Wu, K.-L., Zhao, Y.-J., Wang, J., Cheng, M.K.K.: An effective dynamic coarse model
for optimization design of LTCC RF circuits with aggressive space mapping. IEEE
Trans. Microwave Theory Tech. 52, 393–402 (2004)
19. Robinson, T.D., Eldred, M.S., Willcox, K.E., Haimes, R.: Surrogate-based optimiza-
tion using multifidelity models with variable parameterization and corrected space
mapping. AIAA Journal 46, 2814–2822 (2008)
20. Søndergaard, J.: Optimization using surrogate models – by the space mapping tech-
nique. Ph.D. Thesis, Informatics and Mathematical Modelling, Technical University of
Denmark, Lyngby (2003)
21. Kleijnen, J.P.C.: Kriging metamodeling in simulation: a review. European Journal of
Operational Research 192, 707–716 (2009)
22. Rayas-Sanchez, J.E.: EM-based optimization of microwave circuits using artificial
neural networks: the state-of-the-art. IEEE Trans. Microwave Theory Tech. 52,
420–435 (2004)
23. Giunta, A.A., Wojtkiewicz, S.F., Eldred, M.S.: Overview of modern design of experi-
ments methods for computational simulations. American Institute of Aeronautics and
Astronautics, paper AIAA 2003–0649 (2003)
24. Santner, T.J., Williams, B., Notz, W.: The Design and Analysis of Computer Experi-
ments. Springer, Heidelberg (2003)
25. Koehler, J.R., Owen, A.B.: Computer experiments. In: Ghosh, S., Rao, C.R. (eds.)
Handbook of Statistics, vol. 13, pp. 261–308. Elsevier Science B.V., Amsterdam
(1996)
26. Cheng, Q.S., Koziel, S., Bandler, J.W.: Simplified space mapping approach to en-
hancement of microwave device models. Int. J. RF and Microwave Computer-Aided
Eng. 16, 518–535 (2006)
27. McKay, M., Conover, W., Beckman, R.: A comparison of three methods for selecting
values of input variables in the analysis of output from a computer code. Technomet-
rics 21, 239–245 (1979)
28. Beachkofski, B., Grandhi, R.: Improved distributed hypercube sampling. American In-
stitute of Aeronautics and Astronautics, Paper AIAA 2002–1274 (2002)
29. Leary, S., Bhaskar, A., Keane, A.: Optimal orthogonal-array-based Latin hypercubes.
Journal of Applied Statistics 30, 585–598 (2003)
30. Ye, K.Q.: Orthogonal column Latin hypercubes and their application in computer ex-
periments. Journal of the American Statistical Association 93, 1430–1439 (1998)
31. Palmer, K., Tsui, K.-L.: A minimum bias Latin hypercube design. IIE Transactions 33,
793–808 (2001)
32. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. The Johns Hopkins Uni-
versity Press, Baltimore (1996)
33. Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimiza-
tion. MPS-SIAM Series on Optimization, MPS-SIAM (2009)
34. Wild, S.M., Regis, R.G., Shoemaker, C.A.: ORBIT: Optimization by radial basis func-
tion interpolation in trust-regions. SIAM J. Sci. Comput. 30, 3197–3219 (2008)
35. Journel, A.G., Huijbregts, C.J.: Mining Geostatistics. Academic Press, London (1981)
36. O’Hagan, A.: Curve fitting and optimal design for predictions. Journal of the Royal
Statistical Society B 40, 1–42 (1978)
37. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT
Press, Cambridge (2006)
38. Jones, D., Schonlau, M., Welch, W.: Efficient global optimization of expensive black-
box functions. Journal of Global Optimization 13, 455–492 (1998)
39. Haykin, S.: Neural Networks: A Comprehensive Foundation, 2nd edn. Prentice-Hall,
Englewood Cliffs (1998)
40. Minsky, M.L., Papert, S.A.: Perceptrons: An Introduction to Computational Geometry.
MIT Press, Cambridge (1969)
41. Gunn, S.R.: Support vector machines for classification and regression. Technical Re-
port. School of Electronics and Computer Science, University of Southampton (1998)
42. Angiulli, G., Cacciola, M., Versaci, M.: Microwave devices and antennas modeling by
support vector regression machines. IEEE Trans. Magn. 43, 1589–1592 (2007)
43. Smola, A.J., Schölkopf, B.: A tutorial on support vector regression. Statistics and
Computing 14, 199–222 (2004)
44. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer, New York (1995)
45. Levin, D.: The approximation power of moving least-squares. Mathematics of Compu-
tation 67, 1517–1531 (1998)
46. Aitken, A.C.: On least squares and linear combinations of observations. Proceedings of
the Royal Society of Edinburgh 55, 42–48 (1935)
47. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT
Press, Cambridge (2006)
48. Geisser, S.: Predictive Inference. Chapman and Hall, Boca Raton (1993)
49. Koziel, S., Cheng, Q.S., Bandler, J.W.: Implicit space mapping with adaptive selection
of preassigned parameters. IET Microwaves, Antennas & Propagation 4, 361–373
(2010)
50. Alexandrov, N.M., Lewis, R.M., Gumbert, C.R., Green, L.L., Newman, P.A.: Ap-
proximation and model management in aerodynamic optimization with variable-
fidelity models. AIAA Journal of Aircraft 38, 1093–1101 (2001)
51. Moré, J.J.: Recent developments in algorithms and software for trust region methods.
In: Bachem, A., Grötschel, M., Korte, B. (eds.) Mathematical Programming. The State
of Art, pp. 258–287. Springer, Heidelberg (1983)
52. Leary, S.J., Bhaskar, A., Keane, A.J.: A constraint mapping approach to the structural
optimization of an expensive model using surrogates. Optimization and Engineering 2,
385–398 (2001)
53. Redhe, M., Nilsson, L.: Optimization of the new Saab 9-3 exposed to impact load us-
ing a space mapping technique. Structural and Multidisciplinary Optimization 27,
411–420 (2004)
54. Koziel, S., Bandler, J.W., Cheng, Q.S.: Robust trust-region space-mapping algorithms
for microwave design optimization. IEEE Trans. Microwave Theory and Tech. 58,
2166–2174 (2010)
55. Koziel, S., Bandler, J.W., Cheng, Q.S.: Adaptively constrained parameter extraction
for robust space mapping optimization of microwave circuits. IEEE MTT-S Int. Mi-
crowave Symp. Dig., 205–208 (2010)
56. Echeverría, D.: Multi-Level optimization: space mapping and manifold mapping.
Ph.D. Thesis, Faculty of Science, University of Amsterdam (2007)
57. Koziel, S., Echeverría Ciaurri, D.: Reliable simulation-driven design optimization of
microwave structures using manifold mapping. Progress in Electromagnetics Research
B 26, 361–382 (2010)
58. Hemker, P.W., Echeverría, D.: A trust-region strategy for manifold mapping optimiza-
tion. JCP Journal of Computational Physics 224, 464–475 (2007)
59. Echeverría, D.: Two new variants of the manifold-mapping technique. COMPEL The
International Journal for Computation and Mathematics in Electrical Engineering 26,
334–344 (2007)
60. Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search: new perspec-
tives on some classical and modern methods. SIAM Review 45, 385–482 (2003)
61. Marsden, A.L., Wang, M., Dennis, J.E., Moin, P.: Optimal aeroacoustic shape design
using the surrogate management framework. Optimization and Engineering 5, 235–262
(2004)
Chapter 4
Derivative-Free Optimization
4.1 Introduction
Efficient optimization very often hinges on the use of derivative information of
the cost function and/or constraints with respect to the design variables. In the last
Oliver Kramer
UC Berkeley, Berkeley, CA 94704, USA
e-mail: [email protected]
David Echeverría Ciaurri
Department of Energy Resources Engineering,
Stanford University, Stanford, CA 94305-2220, USA
e-mail: [email protected]
Slawomir Koziel
Engineering Optimization & Modeling Center,
School of Science and Engineering, Reykjavik University, Menntavegur 1,
101 Reykjavik, Iceland
e-mail: [email protected]
Many derivative-free methods are easy to implement, and this feature makes them
attractive when approximate solutions are required in a short time frame. An obvi-
ous statement that is often neglected is that the computational cost of an iteration
of an algorithm is not always a good estimate of the time needed within a project
(measured from its inception) to obtain results that are satisfactory. However, one
important drawback of derivative-free techniques (when compared, for example,
with adjoint-based approaches) is the limitation on the number of optimization vari-
ables that can be handled. For example, in [3] and [2] the limit given is a few hundred
variables. However, this limit in the problem size can be overcome, at least to some
extent, if one is not restricted to a single sequential environment. For some of the
algorithms though, adequately exploiting parallelism may be difficult or even im-
possible. When distributed computing resources are scarce or not available, and for
simulation-based designs with significantly more than a hundred optimization vari-
ables, some form of parameter reduction is mandatory. In these cases, surrogates
or reduced order models [8] for the cost function and constraints are desirable ap-
proaches. Fortunately, suitable parameter and model order reduction techniques can
often be found in many engineering applications, although they may give rise to in-
accurate models. We should add that, even in theory, as long as a problem with
nonsmooth/noisy cost functions/constraints can be reasonably approximated by a
smooth function (see [9], Section 10.6), some derivative-free optimization algorithms
perform well with such cost functions, as has been observed in practice [2, 3].
In the last decade, there has been a renaissance of gradient-free optimization
methodologies, and they have been successfully applied in a number of areas. Exam-
ples of this are ubiquitous; to name a few, derivative-free techniques have been used
within molecular geometry [10], aircraft design [11, 12], hydrodynamics [13, 14],
medicine [15, 16] and earth sciences [17, 18, 19, 20]. These references include
generally constrained cases with derivative-free objective functions and constraints,
continuous and integer optimization variables, and local and global approaches. In
spite of all this apparent abundance of results, we should not disregard the general
recommendation (see [3, 2]) of strongly preferring gradient-based methods if accu-
rate derivative information can be computed reasonably efficiently and globally.
This chapter is structured as follows. In Section 4.2 we introduce the gen-
eral problem formulation and notation. A number of derivative-free methodologies
for unconstrained continuous optimization are presented in the next two sections.
Section 4.3 refers to local optimization, and Section 4.4 is devoted to global op-
timization. Guidelines for extending all these algorithms to generally constrained
optimization are given in Section 4.5. We bring the chapter to an end with some
conclusions and recommendations.
The general optimization problem considered in this chapter can be stated as

$\min_{x \in \Omega} f(x)$ subject to $g(x) \le 0$ ,   (4.1)

where f (x) is the objective function, x ∈ Rn is the vector of control variables, and
g : Rn → Rm represents the nonlinear constraints in the problem. Bound and linear
constraints are included in the set Ω ⊂ Rn . For many approaches it is natural to treat
any constraints for which derivatives are available separately. In particular, bounds
and linear constraints, and any other structure than can be exploited, should be. So
for example, nonlinear least-squares problems should exploit that inherent structure
whenever possible (see e.g. [21]). We are interested in applications for which the
objective function and constraint variables are computed using the output from a
simulator, rendering function evaluations expensive and derivatives unavailable.
We will begin by discussing some general issues with respect to optimization
with derivatives since they have important relevancy to the derivative-free case.
Essentially all approaches to the former are somewhere between steepest descent
and Newton’s method, or equivalently use something that is between a linear and
a quadratic model. This is reinforced by the realization that almost all practical
computation is linear at its core, and (unconstrained) minima are characterized by
the gradient being zero, and quadratic models give rise to linear gradients. In fact,
theoretically at least, steepest descent is robust but slow (and in fact sometimes so
slow that in practice it is not robust) whereas Newton’s method is fast but may have
a very small radius of convergence. That is, one needs to start close to the solu-
tion. It is also computationally more demanding. Thus in a sense, most practical
unconstrained algorithms are intelligent compromises between these two extremes.
Although somewhat oversimplified, one can say that the constrained case is dealt
with by staying feasible, determining which constraints are tight, linearizing these
constraints and then solving the reduced problem determined by these linearizations.
Therefore, some reliable first-order model is essential, and for faster convergence,
something more like a second-order model is desirable. In the unconstrained case
with derivatives these are typically provided by a truncated Taylor series model (in
the first-order case) and some approximation to a truncated second-order Taylor se-
ries model. A critical property of such models is that as the step sizes become small
the models become more accurate. In the case where derivatives, or good approx-
imations to them, are not available, clearly, one cannot use truncated Taylor series
models. It thus transpires that if, for example, one uses interpolation or regression
models that depend only on function values, one can no longer guarantee that as the
step sizes become small the models become more accurate. Thus one has to have
some explicit way to make this guarantee, at least approximately. It turns out that
this is usually done by considering the geometry of the points at which the func-
tion is evaluated, at least, before attempting to decrease the effective maximum step
size. In pattern search methods, this is done by explicitly using a pattern with good
geometry, for example, a regular mesh that one only scales while maintaining the a
priori good geometry.
In the derivative case the usual stopping criterion relates to the first-order optimality
conditions. In the derivative-free case, one does not explicitly have these, since
they require (approximations to) the derivatives. At this stage we just remark that
any criteria used should relate to the derivative case conditions, so, for example one
needs something like a reasonable first-order model, at least asymptotically.
Generalized pattern search (GPS; [22, 23]) refers to a whole family of optimiza-
tion methods. GPS relies on polling (local exploration of the cost function on the
pattern) but may be enhanced by additional searches, see [23]. At any particular it-
eration a stencil (pattern) is centered at the current solution. The stencil comprises
a set of directions such that at least one direction is a descent direction. This is also
called a generating set (see e.g. [2]). If any of the points in the stencil represent an
improvement in the cost function, the stencil is moved to one of them. Otherwise,
the stencil size is decreased. The optimization progresses until some stopping crite-
rion is satisfied (typically, a minimum stencil size). Generalized pattern search can
be further generalized by polling in an asymptotically dense set of directions (this
set varies with the iterations). The resulting algorithm is the mesh adaptive direct
search (MADS; [24]). In particular, some generalization of a simple fixed pattern is
essential for constrained problems. The GPS method parallelizes naturally since, at a
particular iteration, the objective function evaluations at the polling points can be ac-
complished in a distributed fashion. The method typically requires on the order of n
function evaluations per iteration (where n is the number of optimization variables).
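A minimal instance of this family is compass search, which polls along the 2n signed
coordinate directions (a generating set) and halves the stencil size after an
unsuccessful iteration; the following Python sketch assumes an objective f accepting
NumPy arrays, with illustrative defaults for the stencil size and stopping criteria.

import numpy as np

def compass_search(f, x0, delta=1.0, tol=1e-6, max_eval=10000):
    # Minimal pattern search: poll the 2n signed coordinate directions
    # (a generating set) and halve the stencil on an unsuccessful iteration.
    x = np.asarray(x0, dtype=float)
    fx, evals = f(x), 1
    D = np.vstack([np.eye(len(x)), -np.eye(len(x))])
    while delta > tol and evals < max_eval:
        for d in D:                                  # poll step
            y = x + delta * d
            fy = f(y)
            evals += 1
            if fy < fx:                              # success: move stencil
                x, fx = y, fy
                break
        else:
            delta *= 0.5                             # failure: shrink stencil
    return x, fx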
The Hooke-Jeeves direct search (HJDS; [25]) is another pattern search method and
was the first to use the term ‘direct search’ method and take advantage of the idea
of a pattern. HJDS is based on two types of moves: exploratory and pattern. These
moves are illustrated in Figure 4.1 for some optimization iterations in R².

Fig. 4.1 Illustration of exploratory and pattern moves in Hooke-Jeeves direct search
(modified from [19]). The star represents the optimum.
The iteration starts with a base point x0 and a given step size. During the ex-
ploratory move, the objective function is evaluated at successive changes of the
base point in the search (for example coordinate) directions. All the directions are
polled sequentially and in an opportunistic way. This means that if d1 ∈ Rn is the
first search direction, the first function evaluation is at x0 + d1 . If this represents
an improvement in the cost function, the next point polled will be, assuming n > 1,
x0 + d1 + d2 , where d2 is the second search direction. Otherwise the point x0 − d1
is polled. Upon success at this last point, the search proceeds with x0 − d1 + d2 , and
alternatively with x0 + d2 . The exploration continues until all search directions have
been considered. If after the exploratory step no improvement in the cost function is
found, the step size is reduced. Otherwise, a new point x1 is obtained, but instead of
centering another exploratory move at x1 , the algorithm performs the pattern move,
which is a more aggressive step that moves further in the underlying successful di-
rection. After the pattern move, the next polling center x2 is set at x0 + 2(x1 − x0 ).
If the exploratory move at x2 fails to improve upon x1 , a new polling is performed
around x1 . If this again yields no cost function decrease, the step size is reduced,
keeping the polling center at x1 .
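The exploratory and pattern moves just described can be condensed into the following
Python sketch; the coordinate directions, opportunistic polling, and the step-halving
rule follow the description above, while details such as evaluation budgets are
deliberately omitted.

import numpy as np

def hooke_jeeves(f, x0, step=1.0, tol=1e-6):
    # Minimal Hooke-Jeeves direct search: exploratory moves along the
    # coordinate directions followed by aggressive pattern moves.
    def explore(base, fbase):
        x, fx = base.copy(), fbase
        for i in range(len(x)):
            for sgn in (+step, -step):               # opportunistic polling
                y = x.copy()
                y[i] += sgn
                fy = f(y)
                if fy < fx:
                    x, fx = y, fy
                    break
        return x, fx
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    while step > tol:
        x1, f1 = explore(x, fx)
        if f1 < fx:
            while True:                              # pattern moves
                x2 = x1 + (x1 - x)                   # move further along trend
                x, fx = x1, f1
                x1, f1 = explore(x2, f(x2))
                if f1 >= fx:
                    break
        else:
            step *= 0.5                              # reduce the step size
    return x, fx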
Notice the clear serial nature of the algorithm. This makes HJDS a reason-
able pattern search option when distributed computing resources are not available.
Because of the pattern move, HJDS may also be beneficial in situations where an op-
timum is far from the initial guess. One could argue that pattern search techniques
should initially use a relatively large stencil size, in the hope that this enables
them to avoid some local minima and, perhaps, provides some robustness against
noisy cost functions.
$\rho = \frac{f(x_0) - f(y_1)}{m(x_0) - m(y_1)}$ .
Then typically one assigns some updating strategy to the trust-region radius $\Delta_0$, like

$\Delta_1 = \begin{cases} 2 \cdot \Delta_0 & \text{if } \rho > 0.9\,, \\ \Delta_0 & \text{if } 0.1 \le \rho \le 0.9\,, \\ 0.5 \cdot \Delta_0 & \text{if } \rho < 0.1\,, \end{cases}$
where Δ1 denotes the updated radius. In the first two cases x1 = y1 and in the third
case x1 = x0 .
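In code, this radius update is a simple case distinction on the agreement ratio; the
thresholds 0.1 and 0.9 and the factors 2 and 0.5 are those of the rule above.

def update_trust_region(rho, delta):
    # Radius update following the rule above; the step y1 is accepted
    # (x1 = y1) in the first two cases and rejected in the third.
    if rho > 0.9:
        return 2.0 * delta        # very good agreement: expand
    if rho >= 0.1:
        return delta              # acceptable agreement: keep the radius
    return 0.5 * delta            # poor agreement: shrink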
Thus, although oversimplified, if we are using Taylor series approximations for
our models, within the trust management scheme one can ensure convergence to a
solution satisfying first-order optimality conditions [9]. Perhaps the most important
difference once derivatives are not available is that we cannot take Taylor series
models and so, in general, optimality can no longer be guaranteed. In fact, we have
to be sure that when we reduce the trust-region radius it is because of the problem
and not just a consequence of having a bad model as a result of poor geometry of
the sampling points. So it is here that one has to consider the geometry. Fortunately,
it can be shown that one can constructively ensure good geometry, and with that,
support the whole derivative-free approach with convergence to solutions that satisfy
first-order optimality conditions. For details see [3], Chapter 6.
1 Start
2 Initialize solutions xi of population P
3 Evaluate objective function for the solutions xi in P
4 Repeat
5 For i = 0 To λ
6 Select ρ parents from P
7 Create new xi by recombination
8 Mutate xi
9 Evaluate objective function for xi
10 Add xi to P
11 Next
12 Select μ parents from P and form new P
13 Until termination condition
14 End
4.4.1.2 Recombination
In intermediate recombination, the new solution is the arithmetic mean of the ρ
selected parents $x^1, \ldots, x^{\rho}$:

$x_i := \frac{1}{\rho} \sum_{k=1}^{\rho} x^{k}$ .   (4.3)
4.4.1.3 Mutation
Mutation is the second main source for evolutionary changes. According to Beyer
and Schwefel [38], a mutation operator is supposed to fulfill three conditions. First,
from each point in the solution space each other point must be reachable. Second,
in unconstrained solution spaces a bias is disadvantageous, because the direction to
the optimum is not known. And third, the mutation strength should be adjustable
in order to adapt to solution space conditions. In the following, we concentrate on
the well-known Gaussian mutation operator. We assume that solutions are vectors
of real values. Random numbers based on the Gaussian distribution N (0, 1) satisfy
these conditions in continuous domains. The Gaussian distribution can be used to
describe many natural and artificial processes. By isotropic Gaussian mutation each
component of x is perturbed independently with a random number from a Gaussian
distribution with zero mean and standard deviation σ .
Fig. 4.3 Gaussian mutation: isotropic Gaussian mutation (left) uses one step size σ for each
dimension, multivariate Gaussian mutation (middle) allows independent step sizes for each
dimension, and correlated mutation (right) introduces an additional rotation of the coordinate
system
The standard deviation σ plays the role of the mutation strength, and is also
known as step size. The step size σ can be kept constant, but convergence can be
improved by adapting σ according to the local solution space characteristics. In
case of high success rates, i.e., a high number of offspring solutions being better
than their parents, large step sizes are advantageous in order to promote the explo-
ration of the search space. This is often the case at the beginning of the search.
Small step sizes are appropriate for low success rates. This is frequently adequate
in later phases of the search, when the optimization history can be exploited while
the optimum is approximated. An example for an adaptive control of step sizes is
the 1/5-th success rule by Rechenberg [39] that increases the step size if the success
rate is over 1/5-th, and decreases it, if the success rate is lower.
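A sketch of the 1/5-th success rule follows; the adaptation factor (here 1.22) is a
typical but arbitrary choice.

def one_fifth_rule(sigma, success_rate, factor=1.22):
    # Rechenberg's 1/5-th success rule for the mutation strength sigma.
    if success_rate > 0.2:
        return sigma * factor     # many successes: increase exploration
    if success_rate < 0.2:
        return sigma / factor     # few successes: exploit locally
    return sigma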
The isotropic Gaussian mutation can be extended to the multivariate Gaussian
mutation by introducing a step size vector σ with independent step sizes σi . Fig-
ure 4.3 illustrates the differences between isotropic Gaussian mutation (left) and
the multivariate Gaussian mutation (middle). The multivariate variant considers a
mutation ellipsoid that adapts flexibly to local solution space characteristics.
Even more flexibility can be obtained through the correlated mutation proposed
by Schwefel [44] that aligns the coordinate system to the solution space charac-
teristics. The mutation ellipsoid is rotated by means of an orthogonal matrix, and
this rotation can be modified along iterations. The rotated ellipsoid is also shown in
Figure 4.3 (right). The covariance matrix adaptation evolution strategy (CMA-ES)
and its derivatives [45, 46] are self-adaptive control strategies based on an automatic
alignment of the coordinate system.
4.4.1.4 Selection
Selection can be utilized at two points. Mating selection selects individuals for recombina-
tion. Another popular selection operator is survivor selection, corresponding to the
Darwinian principle of survival of the fittest. Only the individuals selected by sur-
vivor selection are allowed to confer genetic material to the following generation.
The elitist strategies plus and comma selection choose the μ best solutions and are
usually applied for survivor selection. Plus selection selects the μ best solutions
from the union P ∪ P′ of the last parental population P and the current offspring
population P′, and is denoted by (μ + λ)-EA. In contrast to plus selection, comma
selection, which is denoted by (μ, λ)-EA, selects exclusively from the offspring
population, neglecting the parental population, even if individuals have superior
fitness. Though disregarding these apparently promising solutions may seem to be
disadvantageous, this strategy that prefers the new population to the old population
can be useful to avoid being trapped in unfavorable local optima.
The deterministic selection scheme described in the previous paragraph is a char-
acteristic feature of ES. Most evolutionary algorithms use selection schemes con-
taining random components. An example is fitness proportionate selection (also
called roulette-wheel selection) popular in the early days of genetic algorithms [41].
Another example is tournament selection, a widely used selection scheme for EAs.
Here, the candidate with the highest fitness out of a randomly chosen subset of the
population is selected for the new population. The stochastic selection schemes
permit survival of not-so-fit individuals and thus help prevent premature
convergence and preserve genetic material that may come in handy at later
stages of the optimization process.
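Putting recombination, mutation, and selection together, a minimal (μ + λ)-ES with
intermediate recombination, isotropic Gaussian mutation with fixed step size, and plus
(survivor) selection can be sketched in Python as follows; self-adaptation of the step
size, discussed above, is deliberately left out, and all defaults are illustrative.

import numpy as np

def evolution_strategy(f, n, mu=5, lam=20, rho=2, sigma=0.1,
                       generations=200, seed=0):
    # Minimal (mu + lambda)-ES: intermediate recombination of rho parents,
    # isotropic Gaussian mutation with fixed step size sigma, plus selection.
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((mu, n))                 # initial population
    fit = np.array([f(x) for x in P])
    for _ in range(generations):
        parents = P[rng.integers(mu, size=(lam, rho))]
        children = parents.mean(axis=1)              # recombination, cf. (4.3)
        children += sigma * rng.standard_normal((lam, n))   # mutation
        cfit = np.array([f(x) for x in children])
        pool = np.vstack([P, children])              # plus (survivor) selection
        pfit = np.concatenate([fit, cfit])
        best = np.argsort(pfit)[:mu]
        P, fit = pool[best], pfit[best]
    return P[0], fit[0]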
individuals, may survive, and the variance estimate in the direction of the gradient
could be larger than without AMS.
In its basic form, the particle velocity v is updated according to

$v' := v + c_1 r_1 (x_p^* - x) + c_2 r_2 (x_s^* - x)$ ,

where $x_p^*$ and $x_s^*$ denote the best previous positions of the particle and of the swarm,
respectively. The weights c1 and c2 are acceleration coefficients that determine the
bias of the particle towards its own or the swarm history. The recommendation given
by Kennedy and Eberhart is to set both parameters to one. The stochastic compo-
nents r1 and r2 are uniformly drawn from the interval [0, 1], and can be used to
promote the global exploration of the search space.
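A single velocity/position update for one particle, following the update rule above,
reads as follows in Python (inertia and constriction variants found in the PSO
literature are omitted):

import numpy as np

def pso_step(x, v, x_p, x_s, c1=1.0, c2=1.0, rng=np.random.default_rng()):
    # One velocity/position update for a single particle; x_p and x_s are
    # the best previous positions of the particle and of the swarm.
    r1, r2 = rng.uniform(size=2)
    v_new = v + c1 * r1 * (x_p - x) + c2 * r2 * (x_s - x)
    return x + v_new, v_new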
3. The position of the new agent y is computed by means of the following iteration
over i ∈ {1, . . . , n}:
i) select a random number ri ∈ (0, 1) with uniform probability distribution;
ii) if i = p or ri < CR let yi = ai + F(bi − ci ), otherwise let yi = xi ; here, F ∈ [0, 2]
is the differential weight and CR ∈ [0, 1] is the crossover probability, both
defined by the user;
iii) if f (y) < f (x) then replace x by y; otherwise reject y and keep x.
Although DE resembles some other stochastic optimization techniques, unlike tra-
ditional EAs, DE perturbs the solutions in the current generation vectors with scaled
differences of two randomly selected agents. As a consequence, no separate prob-
ability distribution has to be used, and thus the scheme presents some degree of
self-organization. Additionally, DE is simple to implement, uses very few con-
trol parameters, and has been observed to perform satisfactorily in a number of
multi-modal optimization problems [52].
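Steps i)-iii) translate directly into code. The sketch below assumes an initial
population array pop of shape (N, n) with N >= 4 and loops the DE generation a fixed
number of times; all parameter defaults are illustrative.

import numpy as np

def differential_evolution(f, pop, F=0.8, CR=0.9, generations=100, seed=0):
    # Minimal DE following steps i)-iii) above; pop is an (N, n) float array.
    rng = np.random.default_rng(seed)
    N, n = pop.shape
    fit = np.array([f(x) for x in pop])
    for _ in range(generations):
        for k in range(N):
            others = [i for i in range(N) if i != k]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            p = rng.integers(n)                      # index mutated for sure
            y = pop[k].copy()
            for i in range(n):
                if i == p or rng.uniform() < CR:
                    y[i] = a[i] + F * (b[i] - c[i])
            fy = f(y)
            if fy < fit[k]:                          # greedy selection
                pop[k], fit[k] = y, fy
    return pop[np.argmin(fit)], fit.min()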
A penalty method replaces the constrained problem (4.1) by one of the form

$\min_{x \in \Omega} f(x) + \rho\, h(x)$ ,   (4.6)

where ρ > 0 is a penalty parameter. The modified optimization problem may still
have constraints that are straightforward to handle.
If the penalty parameter is iteratively increased (tending to infinity), the solution
of (4.6) converges to that of the original problem in (4.1). However, in certain cases,
a finite (and fixed) value of the penalty parameter ρ also yields the correct solution
(this is the so-called exact penalty; see [7]). For exact penalties, the modified cost
function is not smooth around the solution [7], and thus the corresponding optimiza-
tion problem can be significantly more involved than that in (4.6). However, one can
argue that in the derivative-free case exact penalty functions may in some cases be
attractive. Common definitions of h(x), where I and J denote the indices that refer
to inequality and equality constraints, respectively, are
$h(x) = \frac{1}{2} \left[ \sum_{i \in I} \max(0, g_i(x))^2 + \sum_{j \in J} g_j(x)^2 \right]$
and

$h(x) = \sum_{i \in I} \max(0, g_i(x)) + \sum_{j \in J} |g_j(x)|$ ,

the latter being an exact penalty. It should be noticed that with these penalties, the search considers
both feasible and infeasible points. Those optimization methodologies where the
optimum can be approached from outside the feasible region are known as exterior
methods.
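As an illustration, the quadratic penalty above can be wrapped around any objective
as follows; g_ineq and g_eq are placeholder callables returning the inequality and
equality constraint values, respectively.

import numpy as np

def quadratic_penalty(f, g_ineq, g_eq, rho):
    # Penalized objective f + rho * h with the quadratic h(x) above.
    def fp(x):
        gi = np.maximum(0.0, np.atleast_1d(g_ineq(x)))
        ge = np.atleast_1d(g_eq(x))
        return f(x) + rho * 0.5 * (np.sum(gi**2) + np.sum(ge**2))
    return fp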
The log-barrier penalty (for inequality constraints),

$h(x) = -\sum_{i \in I} \log(-g_i(x))$ ,

has to be used with a decreasing penalty parameter (tending to zero). This type of
penalty methods (also known as barrier methods) confines the optimization to the
feasible region of the search space. Interior methods aim at reaching the optimum
from inside the feasible region.
In [53], non-quadratic penalties have been suggested for pattern search tech-
niques. However, the optimizations presented in that work are somewhat simpler
than those found in many practical situations, so the recommendations given might
not be generally applicable. In future research, it will be useful to explore further
the performance of different penalty functions in the context of simulation-based
optimization.
The augmented Lagrangian for equality constraints can be written as

$L_{\rho}(x, \lambda) = f(x) + \lambda^{T} g(x) + \frac{\rho}{2}\, \| g(x) \|_2^2$ ,   (4.7)

where ρ > 0 is a penalty parameter, and λ ∈ Rm are Lagrange multipliers. This cost
function can indeed be interpreted as a quadratic penalty with the constraints shifted
by some constant term [56]. As in penalty methods, the penalty parameter and the
Lagrange multipliers are iteratively updated. It turns out that if one is sufficiently
stationary for Equation (4.7), which is exactly when we have good approximations
for the Lagrange multipliers, then λ can be updated via
λ + = λ + ρ g (x) , (4.8)
Fig. 4.4 An idealized (pattern search) filter at iteration k (modified from [19])
where $\lambda^{+}$ denotes the updated Lagrange multipliers. Otherwise, one should increase the penalty parameter ρ (say, by multiplying it by 10). The Lagrange multipliers are
typically initialized to zero. What is significant is that one can prove (see e.g. [56])
that after a finite number of iterations the penalty parameter is never updated, and
that the whole scheme eventually converges to a solution of the original optimization
problem in (4.1). Inequality constraints can also be incorporated in the augmented
Lagrangian framework by introducing slack variables and simple bounds [56]. The
augmented Lagrangian approach can be combined with most optimization algo-
rithms. For example, refer to [58] for a nonlinear programming methodology based
on generalized pattern search.
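The following sketch illustrates the scheme for equality constraints; the inner subproblems are solved with SciPy's Nelder-Mead as an arbitrary derivative-free choice, and the test for being "sufficiently stationary" is replaced by a simple violation-reduction heuristic (both are assumptions of the sketch):

```python
import numpy as np
from scipy.optimize import minimize

def augmented_lagrangian(f, g, x0, rho=1.0, outer_iters=20, tol=1e-6):
    """Minimize f(x) subject to g(x) = 0 with the augmented Lagrangian (4.7).
    Multipliers start at zero and are updated via (4.8); otherwise rho grows."""
    x = np.asarray(x0, dtype=float)
    lam = np.zeros_like(np.asarray(g(x), dtype=float))
    prev_viol = np.inf
    for _ in range(outer_iters):
        L = lambda z: f(z) + lam @ g(z) + 0.5 * rho * np.sum(np.asarray(g(z)) ** 2)
        x = minimize(L, x, method="Nelder-Mead").x   # derivative-free inner solve
        viol = np.linalg.norm(g(x))
        if viol < tol:
            break
        if viol < 0.25 * prev_viol:   # proxy for "sufficiently stationary"
            lam = lam + rho * np.asarray(g(x))       # multiplier update (4.8)
        else:
            rho *= 10.0                              # increase penalty parameter
        prev_viol = viol
    return x, lam, rho

# usage: minimize x0^2 + x1^2 subject to x0 + x1 - 1 = 0
x, lam, rho = augmented_lagrangian(lambda z: z @ z,
                                   lambda z: np.array([z[0] + z[1] - 1.0]),
                                   x0=[2.0, 0.0])
```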
In filter methods, constrained optimization is treated as a bi-objective problem in the cost f and the aggregate constraint violation h, and points computed in each iteration are accepted if they are not dominated by any point in the filter. The filter is updated at each iteration based on all the points evaluated by the optimizer. We
reiterate that, as for exterior methods, the optimization search is enriched by con-
sidering infeasible points, although the ultimate solution is intended to be feasible
(or very nearly so). Filters are often observed to lead to faster convergence than
methods that rely only on feasible iterates.
Pattern search optimization techniques have been previously combined with fil-
ters [60]. In Hooke-Jeeves direct search, the filter establishes the acceptance crite-
rion for each (unique) new solution. For schemes where, in each iteration, multiple
solutions can be accepted by the filter (such as in GPS), the new polling center must
be selected from the set of validated points. When the filter is not updated in a par-
ticular iteration (and thus the best feasible point is not improved), the pattern size is
decreased. As in [60], when we combine GPS with a filter, the polling center at a given iteration will be the feasible point with the lowest cost function or, if no feasible points remain, the infeasible point with the lowest constraint violation. These two points, $(0, f_k^F)$ and $(h_k^I, f_k^I)$, respectively, are shown in Figure 4.4 (it is assumed that both points have just been accepted by the filter, and thus it makes sense to use one of them as the new polling center). Refer to [60] and [61] for more details on pattern search filter methods.
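A minimal sketch of the acceptance logic of such a filter, with points represented as pairs (h, f) of constraint violation and cost (the class and method names are assumptions):

```python
def dominates(p, q):
    """p = (h, f) dominates q if p is no worse in both violation h and cost f."""
    return p[0] <= q[0] and p[1] <= q[1]

class PatternSearchFilter:
    """Minimal (h, f) filter: a trial point is acceptable iff no entry dominates it."""
    def __init__(self, h_max):
        # the entry (h_max, -inf) rejects every point with violation >= h_max
        self.entries = [(h_max, float("-inf"))]

    def acceptable(self, point):
        return not any(dominates(e, point) for e in self.entries)

    def add(self, point):
        if self.acceptable(point):
            # remove entries that the new point dominates, then insert it
            self.entries = [e for e in self.entries if not dominates(point, e)]
            self.entries.append(point)
            return True
        return False
```

Feasible points enter the filter as (0, f), so a new feasible point is accepted only if it improves on the best feasible cost found so far.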
Acknowledgements. We are grateful to the industry sponsors of the Stanford Smart Fields
Consortium for partial funding of this work, and also to J. Smith for his valuable suggestions.
References
[1] Pironneau, O.: On optimum design in fluid mechanics. Journal of Fluid Mechanics 64,
97–110 (1974)
[2] Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search: new perspec-
tives on some classical and modern methods. SIAM Review 45(3), 385–482 (2003)
[3] Conn, A.R., Scheinberg, K., Vicente, L.N.: Introduction to Derivative-Free Optimiza-
tion. MPS-SIAM Series on Optimization. MPS-SIAM (2009)
[4] Gilmore, P., Kelley, C.T.: An implicit filtering algorithm for optimization of functions
with many local minima. SIAM Journal on Optimization 5, 269–285 (1995)
[5] Kelley, C.T.: Iterative Methods for Optimization. Frontiers in Applied Mathematics. SIAM, Philadelphia (1999)
[6] Dennis Jr., J.E., Schnabel, R.B.: Numerical Methods for Unconstrained Optimization
and Nonlinear Equations. SIAM’s Classics in Applied Mathematics Series. SIAM,
Philadelphia (1996)
[7] Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Heidelberg
(2006)
[8] Schilders, W.H.A., van der Vorst, H.A., Rommes, J.: Model Order Reduction: The-
ory, Research Aspects and Applications. Mathematics in Industry Series. Springer,
Heidelberg (2008)
[9] Conn, A.R., Gould, N.I.M., Toint, Ph.L.: Trust-Region Methods. MPS-SIAM Series on Optimization. MPS-SIAM (2000)
[10] Meza, J.C., Martinez, M.L.: On the use of direct search methods for the molecular
conformation problem. Journal of Computational Chemistry 15, 627–632 (1994)
[11] Booker, A.J., Dennis Jr., J.E., Frank, P.D., Moore, D.W., Serafini, D.B.: Optimization
using surrogate objectives on a helicopter test example. In: Borggaard, J.T., Burns, J.,
Cliff, E., Schreck, S. (eds.) Computational Methods for Optimal Design and Control,
pp. 49–58. Birkhäuser, Basel (1998)
[12] Marsden, A.L., Wang, M., Dennis Jr., J.E., Moin, P.: Trailing-edge noise reduction using derivative-free optimization and large-eddy simulation. Journal of Fluid Mechanics 572, 13–36 (2007)
[13] Duvigneau, R., Visonneau, M.: Hydrodynamic design using a derivative-free method.
Structural and Multidisciplinary Optimization 28, 195–205 (2004)
[14] Fowler, K.R., Reese, J.P., Kees, C.E., Dennis Jr., J.E., Kelley, C.T., Miller, C.T., Audet,
C., Booker, A.J., Couture, G., Darwin, R.W., Farthing, M.W., Finkel, D.E., Gablonsky,
J.M., Gray, G., Kolda, T.G.: Comparison of derivative-free optimization methods for
groundwater supply and hydraulic capture community problems. Advances in Water
Resources 31(5), 743–757 (2008)
[15] Oeuvray, R., Bierlaire, M.: A new derivative-free algorithm for the medical image
registration problem. International Journal of Modelling and Simulation 27, 115–124
(2007)
[16] Marsden, A.L., Feinstein, J.A., Taylor, C.A.: A computational framework for
derivative-free optimization of cardiovascular geometries. Computational Methods in
Applied Mechanics and Engineering 197, 1890–1905 (2008)
[17] Artus, V., Durlofsky, L.J., Onwunalu, J.E., Aziz, K.: Optimization of nonconven-
tional wells under uncertainty using statistical proxies. Computational Geosciences 10,
389–404 (2006)
[18] Dadashpour, M., Echeverría Ciaurri, D., Mukerji, T., Kleppe, J., Landrø, M.: A
derivative-free approach for the estimation of porosity and permeability using time-
lapse seismic and production data. Journal of Geophysics and Engineering 7, 351–368
(2010)
[19] Echeverría Ciaurri, D., Isebor, O.J., Durlofsky, L.J.: Application of derivative-free methodologies for generally constrained oil production optimization problems. International Journal of Mathematical Modelling and Numerical Optimisation 2(2), 134–161 (2011)
[20] Onwunalu, J.E., Durlofsky, L.J.: Application of a particle swarm optimization algo-
rithm for determining optimum well location and type. Computational Geosciences 14,
183–198 (2010)
[21] Zhang, H., Conn, A.R., Scheinberg, K.: A derivative-free algorithm for least-squares minimization. SIAM Journal on Optimization 20(6), 3555–3576 (2010)
[22] Torczon, V.: On the convergence of pattern search algorithms. SIAM Journal on Opti-
mization 7(1), 1–25 (1997)
[23] Audet, C., Dennis Jr., J.E.: Analysis of generalized pattern searches. SIAM Journal on
Optimization 13(3), 889–903 (2002)
[24] Audet, C., Dennis Jr., J.E.: Mesh adaptive direct search algorithms for constrained
optimization. SIAM Journal on Optimization 17(1), 188–217 (2006)
[25] Hooke, R., Jeeves, T.A.: Direct search solution of numerical and statistical problems.
Journal of the ACM 8(2), 212–229 (1961)
[26] Powell, M.J.D.: The NEWUOA software for unconstrained optimization without
derivatives. Technical report DAMTP 2004/NA5, Dept. of Applied Mathematics and
Theoretical Physics, University of Cambridge (2004)
[27] Oeuvray, R., Bierlaire, M.: BOOSTERS: a derivative-free algorithm based on radial basis functions. International Journal of Modelling and Simulation 29(1), 26–36 (2009)
[28] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. Journal of Chemical Physics 21(6), 1087–1092 (1953)
[29] Kirkpatrick, S., Gelatt Jr., C.D., Vecchi, M.P.: Optimization by simulated annealing. Science 220(4598), 671–680 (1983)
[30] Glover, F.: Tabu search – part I. ORSA Journal on Computing 1(3), 190–206 (1989)
[31] Glover, F.: Tabu search – part II. ORSA Journal on Computing 2(1), 4–32 (1990)
[32] Dorigo, M.: Optimization, Learning and Natural Algorithms. PhD thesis, Dept. of
Electronics, Politecnico di Milano (1992)
[33] Dorigo, M., Stützle, T.: Ant Colony Optimization. Prentice-Hall, Englewood Cliffs
(2004)
[34] Farmer, J., Packard, N., Perelson, A.: The immune system, adaptation and machine
learning. Physica 2, 187–204 (1986)
[35] de Castro, L.N., Timmis, J.: Artificial Immune Systems: A New Computational Intelligence Approach. Springer, Heidelberg (2002)
[36] Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan
Press (1975)
[37] Fogel, D.B.: Artificial Intelligence through Simulated Evolution. Wiley, Chichester
(1966)
[38] Beyer, H.-G., Schwefel, H.-P.: Evolution strategies - a comprehensive introduction.
Natural Computing 1, 3–52 (2002)
[39] Rechenberg, I.: Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzip-
ien der Biologischen Evolution. Frommann-Holzboog (1973)
[40] Schwefel, H.-P.: Numerische Optimierung von Computer-Modellen mittels der Evolutionsstrategie. Birkhäuser, Basel (1977)
[41] Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley, Reading (1989)
[42] Holland, J.H.: Hidden Order: How Adaptation Builds Complexity. Addison-Wesley, London (1995)
[43] Beyer, H.-G.: An alternative explanation for the manner in which genetic algorithms
operate. BioSystems 41(1), 1–15 (1997)
[44] Schwefel, H.-P.: Adaptive mechanismen in der biologischen evolution und ihr einfluss
auf die evolutionsgeschwindigkeit. In: Interner Bericht der Arbeitsgruppe Bionik und
Evolutionstechnik am Institut für Mess- und Regelungstechnik, TU Berlin (1974)
[45] Beyer, H.-G., Sendhoff, B.: Covariance matrix adaptation revisited – the CMSA evo-
lution strategy –. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.)
PPSN 2008. LNCS, vol. 5199, pp. 123–132. Springer, Heidelberg (2008)
[46] Ostermeier, A., Gawelczyk, A., Hansen, N.: A derandomized approach to self-adaptation of evolution strategies. Evolutionary Computation 2(4), 369–380 (1994)
[47] Teytaud, F., Teytaud, O.: Why one must use reweighting in estimation of distribution algorithms. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation (GECCO 2009), pp. 453–460 (2009)
[48] Grahl, J., Bosman, P.A.N., Rothlauf, F.: The correlation-triggered adaptive variance
scaling idea. In: Proceedings of the 8th Annual conference on Genetic and Evolution-
ary Computation (GECCO 2006), pp. 397–404 (2006)
[49] Bosman, P.A.N., Grahl, J., Thierens, D.: Enhancing the performance of maximum-likelihood Gaussian EDAs using anticipated mean shift. In: Rudolph, G., Jansen, T., Lucas, S., Poloni, C., Beume, N. (eds.) PPSN 2008. LNCS, vol. 5199, pp. 133–143. Springer, Heidelberg (2008)
[50] Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of the IEEE
International Conference on Neural Networks, pp. 1942–1948 (1995)
[51] Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global
optimization over continuous spaces. Journal of Global Optimization 11, 341–359
(1997)
[52] Chakraborty, U.: Advances in Differential Evolution. SCI. Springer, Heidelberg (2008)
[53] Griffin, J.D., Kolda, T.G.: Nonlinearly-constrained optimization using asynchronous
parallel generating set search. Technical report SAND2007-3257, Sandia National
Laboratories (2007)
[54] Hestenes, M.R.: Multiplier and gradient methods. Journal of Optimization Theory and Applications 4(5), 303–320 (1969)
[55] Powell, M.J.D.: A method for nonlinear constraints in minimization problems. In:
Fletcher, R. (ed.) Optimization, pp. 283–298. Academic Press, London (1969)
[56] Conn, A.R., Gould, N.I.M., Toint, P.L.: A globally convergent augmented Lagrangian
algorithm for optimization with general constraints and simple bounds. SIAM Journal
on Numerical Analysis 28(2), 545–572 (1991)
[57] Conn, A.R., Gould, N.I.M., Toint, P.L.: LANCELOT: A Fortran Package for Large-
Scale Nonlinear Optimization (Release A). Computational Mathematics. Springer,
Heidelberg (1992)
[58] Lewis, R.M., Torczon, V.: A direct search approach to nonlinear programming prob-
lems using an augmented Lagrangian method with explicit treatment of the linear con-
straints. Technical report WM-CS-2010-01, Dept. of Computer Science, College of
William & Mary (2010)
[59] Fletcher, R., Leyffer, S., Toint, P.L.: A brief history of filter methods. Technical report
ANL/MCS/JA-58300, Argonne National Laboratory (2006)
[60] Audet, C., Dennis Jr., J.E.: A pattern search filter method for nonlinear programming
without derivatives. SIAM Journal on Optimization 14(4), 980–1010 (2004)
[61] Abramson, M.A.: NOMADm version 4.6 User’s Guide. Dept. of Mathematics and
Statistics, Air Force Institute of Technology (2007)
[62] Belur, S.V.: CORE: constrained optimization by random evolution. In: Koza, J.R. (ed.)
Late Breaking Papers at the Genetic Programming 1997 Conference, pp. 280–286
(1997)
[63] Coello Coello, C.A.: Theoretical and numerical constraint handling techniques used
with evolutionary algorithms: a survey of the state of the art. Computer Methods in
Applied Mechanics and Engineering 191(11-12), 1245–1287 (2002)
[64] Parmee, I.C., Purchase, G.: The development of a directed genetic search technique
for heavily constrained design spaces. In: Parmee, I.C. (ed.) Proceedings of the Con-
ference on Adaptive Computing in Engineering Design and Control, pp. 97–102.
University of Plymouth (1994)
[65] Surry, P.D., Radcliffe, N.J., Boyd, I.D.: A multi-objective approach to constrained
optimisation of gas supply networks: the COMOGA method. In: Fogarty, T.C. (ed.)
AISB-WS 1995. LNCS, vol. 993, pp. 166–180. Springer, Heidelberg (1995)
[66] Coello Coello, C.A.: Treating constraints as objectives for single-objective evolution-
ary optimization. Engineering Optimization 32(3), 275–308 (2000)
[67] Coello Coello, C.A.: Constraint handling through a multiobjective optimization tech-
nique. In: Proceedings of the 8th Annual conference on Genetic and Evolutionary
Computation (GECCO 1999), pp. 117–118 (1999)
[68] Jimenez, F., Verdegay, J.L.: Evolutionary techniques for constrained optimization
problems. In: Zimmermann, H.J. (ed.) 7th European Congress on Intelligent Tech-
niques and Soft Computing (EUFIT 1999). Springer, Heidelberg (1999)
[69] Mezura-Montes, E., Coello Coello, C.A.: Constrained optimization via multiobjective
evolutionary algorithms. In: Knowles, J., Corne, D., Deb, K., Deva, R. (eds.) Multiob-
jective Problem Solving from Nature. Natural Computing Series, pp. 53–75. Springer,
Heidelberg (2008)
[70] Deb, K., Agrawal, S., Pratap, A., Meyarivan, T.: A fast and elitist multiobjective
genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6(2),
182–197 (2002)
[71] Zitzler, E., Laumanns, M., Thiele, L.: SPEA2: improving the strength Pareto evolution-
ary algorithm for multiobjective optimization. In: Evolutionary Methods for Design,
Optimisation and Control with Application to Industrial Problems (EUROGEN 2001),
pp. 95–100 (2002)
[72] Schoenauer, M., Xanthakis, S.: Constrained GA optimization. In: Forrest, S. (ed.)
Proceedings of the 5th International Conference on Genetic Algorithms (ICGA 1993),
pp. 573–580. Morgan Kaufmann, San Francisco (1993)
[73] Montes, E.M., Coello Coello, C.A.: A simple multi-membered evolution strategy to
solve constrained optimization problems. IEEE Transactions on Evolutionary Compu-
tation 9(1), 1–17 (2005)
[74] Liang, J., Suganthan, P.: Dynamic multi-swarm particle swarm optimizer with a novel
constraint-handling mechanism. In: Yen, G.G., Lucas, S.M., Fogel, G., Kendall, G.,
Salomon, R., Zhang, B.-T., Coello Coello, C.A., Runarsson, T.P. (eds.) Proceedings of
the 2006 IEEE Congress on Evolutionary Computation (CEC 2006), pp. 9–16. IEEE
Press, Los Alamitos (2006)
[75] Raidl, G.R.: A unified view on hybrid metaheuristics. In: Almeida, F., Blesa Aguilera,
M.J., Blum, C., Moreno Vega, J.M., Pérez Pérez, M., Roli, A., Sampels, M. (eds.) HM
2006. LNCS, vol. 4030, pp. 1–12. Springer, Heidelberg (2006)
[76] Talbi, E.G.: A taxonomy of hybrid metaheuristics. Journal of Heuristics 8(5), 541–564
(2002)
[77] Griewank, A.: Generalized descent for global optimization. Journal of Optimization
Theory and Applications 34, 11–39 (1981)
[78] Duran Toksari, M., Güner, E.: Solving the unconstrained optimization problem by
a variable neighborhood search. Journal of Mathematical Analysis and Applica-
tions 328(2), 1178–1187 (2007)
Chapter 5
Maximum Simulated Likelihood Estimation: Techniques and Applications in Economics

Ivan Jeliazkov and Alicia Lloro
5.1 Introduction
The econometric analysis of models for multivariate discrete data is often compli-
cated by intractability of the likelihood function, which can rarely be evaluated di-
rectly and typically has to be estimated by simulation. In such settings, the efficiency
of likelihood estimation plays a key role in determining the theoretical properties and
practical appeal of standard optimization algorithms that rely on those estimates. For
this reason, the development of fast and statistically efficient techniques for estimat-
ing the value of the likelihood function has been at the forefront of much of the
research on maximum simulated likelihood estimation in econometrics.
In this paper we examine the performance of a method for estimating the ordi-
nate of the likelihood function which was recently proposed in [8]. The method is
rooted in Markov chain Monte Carlo (MCMC) theory and simulation [3, 4, 15, 18],
and its ingredients have played a central role in Bayesian inference in econometrics
and statistics. The current implementation of those methods, however, is intended to
examine their applicability to purely frequentist problems such as maximum likeli-
hood estimation and hypothesis testing.
We implement the methodology to study firm-level patent registration data in
four patent categories in the “computers & instruments” industry during the 1980s.
One goal of this application is to examine how patent counts in each category are
affected by firm characteristics such as sales, workforce size, and research & devel-
opment (R&D) capital. A second goal is to study the degree of complementarity or
substitutability that emerges among patent categories due to a variety of unobserved
factors, such as firms’ internal R&D decisions, resource concentration, managerial
dynamics, technological spillovers, and the relevance of innovations across category
boundaries. These factors can affect multiple patent categories simultaneously and
necessitate the specification of a joint empirical structure that can flexibly capture
interdependence patterns.
We approach these tasks by considering a copula model for multivariate count
data which enables us to pursue joint modeling and estimation. Because the out-
come probabilities in the copula model are difficult to evaluate, we rely on MCMC
simulation to evaluate the likelihood function. Moreover, to improve the perfor-
mance of the optimization algorithm, we implement a quasi-Newton optimization
method due to [1] that exploits a fundamental statistical relation to avoid direct
computation of the Hessian matrix of the log-likelihood function. The application
demonstrates that the simulated likelihood algorithm performs very well – even with
few MCMC draws, the precision of the likelihood estimate is sufficient for produc-
ing reliable parameter estimates and hypothesis tests. The results support the case
for joint modeling and estimation in our application and reveal interesting comple-
mentarities among several patent categories.
The remainder of this chapter is organized as follows. In Section 5.2, we present
the copula model that we use in our application and the likelihood function that we
use in estimation. The likelihood function is difficult to evaluate because it is given
by a set of integrals with no closed-form solution. For this reason, in Section 5.3,
we present the MCMC-based simulation algorithm for evaluating this function and
discuss how it can be embedded in a standard optimization algorithm to maxi-
mize the log-likelihood function and yield parameter estimates and standard errors.
Section 5.4 presents the results from our patent application and demonstrates the
performance of the estimation algorithm. Section 5.5 offers concluding remarks.
5.2 The Copula Model

The generality of the approach rests on the recognition that a copula can be viewed as a q-dimensional distribution function with uniform marginals, each of which can be related to an arbitrary known cumulative distribution function (cdf) $F_j(\cdot)$, j = 1, . . . , q. For example, if a random variable $u_j$ is uniform, $u_j \sim U(0,1)$, and $y_j = F_j^{-1}(u_j)$, then it is easy to show that $y_j \sim F_j(\cdot)$. As a consequence, if the variables $y_1, \ldots, y_q$ have corresponding univariate cdfs $F_1(y_1), \ldots, F_q(y_q)$ taking values in [0, 1], a copula is a function that can be used to link or "couple" those univariate marginal distributions to produce the joint distribution function $F(y_1, \ldots, y_q)$:

$F(y_1, \ldots, y_q) = C\{F_1(y_1), \ldots, F_q(y_q)\}. \qquad(5.1)$
A detailed overview of copulas is provided in [9], [13], and [20]. The key feature
that will be of interest here is that they provide a way to model dependence among
multiple random variables when their joint distribution is not easy to specify, in-
cluding cases where the marginal distributions {Fj (·)} belong to entirely different
parametric classes.
There are several families of copulas, but the Gaussian copula is a natural mod-
eling choice when one is interested in extensions beyond the bivariate case. The
Gaussian copula is given by
where u = (u1 , . . . , uq ) , Φ represents the standard normal cdf, and Φq is the cdf for
a multivariate normal vector z = (z1 , . . . , zq ) , z ∼ N(0, Ω ), where Ω is in correlation
form with ones on the main diagonal. The data generating process implied by the
Gaussian copula specification is given by
$y_{ij} = F_{ij}^{-1}\{\Phi(z_{ij})\}, \quad z_i \sim N(0,\Omega), \quad i = 1, \ldots, n, \; j = 1, \ldots, q, \qquad(5.3)$
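Simulating from this data generating process is straightforward; the sketch below anticipates the negative binomial marginals introduced next, and the mapping from the mean/overdispersion form to SciPy's (r, p) parameterization is an assumption of the example:

```python
import numpy as np
from scipy.stats import norm, nbinom

def sample_copula_counts(n, Omega, mu, alpha, rng=np.random.default_rng()):
    """Draw n observations via (5.3): z_i ~ N(0, Omega), y_ij = F_j^{-1}{Phi(z_ij)},
    with negative binomial marginals of mean mu[j] and overdispersion alpha[j]."""
    q = Omega.shape[0]
    z = rng.standard_normal((n, q)) @ np.linalg.cholesky(Omega).T
    u = norm.cdf(z)                      # uniform marginals via Phi(z_ij)
    y = np.empty((n, q), dtype=int)
    for j in range(q):
        r = 1.0 / alpha[j]               # assumed mapping to scipy's (r, p) form
        p = r / (r + mu[j])
        y[:, j] = nbinom.ppf(u[:, j], r, p).astype(int)
    return y

# usage: two positively correlated count series
Omega = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
y = sample_copula_counts(500, Omega, mu=[3.0, 8.0], alpha=[0.5, 0.2])
```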
The marginal distributions are specified as negative binomial with mean $\lambda_{ij}$ and overdispersion parameter $\alpha_j$,

$y_{ij} \sim NB(\lambda_{ij}, \alpha_j), \qquad(5.4)$

with pmf

$\Pr(y_{ij}|\lambda_{ij},\alpha_j) = \frac{\Gamma(y_{ij}+\alpha_j^{-1})}{\Gamma(\alpha_j^{-1})\,\Gamma(y_{ij}+1)} \left(\frac{\alpha_j^{-1}}{\alpha_j^{-1}+\lambda_{ij}}\right)^{\alpha_j^{-1}} \left(\frac{\lambda_{ij}}{\alpha_j^{-1}+\lambda_{ij}}\right)^{y_{ij}} \qquad(5.5)$

and cdf

$F_j(y_{ij}|\lambda_{ij},\alpha_j) = \sum_{t=0}^{y_{ij}} \Pr(t|\lambda_{ij},\alpha_j). \qquad(5.6)$
To relate the negative binomial distribution to the Gaussian copula, the pmf and cdf computed in (5.5) and (5.6), respectively, can be used to find unique, recursively determined cutpoints

$\gamma_{ij,L} = \Phi^{-1}\{F_j(y_{ij}-1|\lambda_{ij},\alpha_j)\}, \qquad \gamma_{ij,U} = \Phi^{-1}\{F_j(y_{ij}|\lambda_{ij},\alpha_j)\} \qquad(5.7)$

that partition the standard normal distribution so that for $z_{ij} \sim N(0,1)$ we have $\Pr(z_{ij} \le \gamma_{ij,U}) = F_j(y_{ij}|\lambda_{ij},\alpha_j)$ and $\Pr(\gamma_{ij,L} < z_{ij} \le \gamma_{ij,U}) = \Pr(y_{ij}|\lambda_{ij},\alpha_j)$. Hence, the cutpoints in (5.7) provide the range $B_{ij} = (\gamma_{ij,L}, \gamma_{ij,U}]$ of $z_{ij}$ that is consistent with each observed outcome $y_{ij}$ in (5.3). In turn, because $z_i = (z_{i1}, \ldots, z_{iq})' \sim N(0,\Omega)$, the
Gaussian copula representation implies that the joint probability of observing the vector $y_i = (y_{i1}, \ldots, y_{iq})'$ is given by

$\Pr(y_i|\theta,\Omega) = \int_{B_{iq}} \cdots \int_{B_{i1}} f_N(z_i|0,\Omega)\, dz_i, \qquad(5.8)$

in which $f_N(\cdot)$ denotes the normal density and, for notational convenience, we let $\theta = (\theta_1', \ldots, \theta_q')'$, where $\theta_j = (\beta_j', \alpha_j)'$ represents the parameters of the jth marginal model, which determine the regions of integration $B_{ij} = (\gamma_{ij,L}, \gamma_{ij,U}]$, j = 1, . . . , q.
Figure 5.1 offers an example of how the region of integration is constructed in the
simple bivariate case. Because of the dependence introduced by the correlation ma-
trix Ω , the probabilities in 5.8 have no closed-form solution and will be estimated
by MCMC simulation methods in this chapter. Once computed, the probabilities
in 5.8, also called likelihood contributions, can be used to construct the likelihood
function

$f(y|\theta,\Omega) = \prod_{i=1}^{n} \Pr(y_i|\theta,\Omega). \qquad(5.9)$
Fig. 5.1 An example of the region of integration implied by a bivariate Gaussian copula model
5.3 Likelihood Evaluation and Optimization

Each likelihood contribution can be expressed, for any point $z_i^* \in B_i$, as

$\Pr(y_i|\theta,\Omega) = \frac{1\{z_i^* \in B_i\}\, f_N(z_i^*|0,\Omega)}{f_{TN_{B_i}}(z_i^*|0,\Omega)}, \qquad(5.10)$

where $B_i = B_{i1} \times B_{i2} \times \cdots \times B_{iq}$ and $f_{TN_{B_i}}(\cdot)$ represents the truncated normal density that accounts for the truncation constraints reflected in $B_i$. This representation follows by Bayes' formula because $\Pr(y_i|\theta,\Omega)$ is the normalizing constant of a truncated normal distribution, and its representation in terms of the quantities in (5.10) is useful for developing an estimation strategy that is simple and efficient. As discussed
in [3], this identity is particularly useful because it holds for any value of $z_i \in B_i$; therefore, given that the numerator quantities $1\{z_i^* \in B_i\}$ and $f_N(z_i^*|0,\Omega)$ in (5.10) are directly available, the calculation is reduced to finding an estimate of the ordinate $f_{TN_{B_i}}(z_i^*|0,\Omega)$ at a single point $z_i^* \in B_i$, typically taken to be the sample mean of the draws $z_i \sim TN_{B_i}(0,\Omega)$ that will be simulated in the estimation procedure (details will be presented shortly). The log-probability is subsequently obtained as

$\ln \Pr(y_i|\theta,\Omega) = \ln f_N(z_i^*|0,\Omega) - \ln f_{TN_{B_i}}(z_i^*|0,\Omega), \qquad(5.11)$
To estimate $f_{TN_{B_i}}(z_i^*|0,\Omega)$ in (5.11), the CRT method relies on random draws $z_i \sim TN_{B_i}(0,\Omega)$, which are produced by the Gibbs sampling algorithms of [5] or [16], where a new value for $z_i$ is generated by iteratively simulating each element $z_{ij}$ from its full-conditional density $z_{ij} \sim f(z_{ij}|y_{ij}, \{z_{ik}\}_{k\neq j}, \Omega) = TN_{B_{ij}}(\mu_{ij}, \sigma_{ij}^2)$ for
j = 1, . . . , q. In the preceding, $\mu_{ij}$ and $\sigma_{ij}^2$ are the conditional mean and variance of $z_{ij}$ given $\{z_{ik}\}_{k\neq j}$ and are obtained by the usual formulas for a conditional Gaussian density. MCMC simulation of $z_i \sim TN_{B_i}(0,\Omega)$ is an important tool for drawing from this density, which is non-standard due to the multiple constraints defining the set $B_i$ and the correlations in Ω.
The Gibbs transition kernel for moving from a point $z_i$ to $z_i^*$ is given by the product of univariate truncated normal full-conditional densities

$K(z_i, z_i^*|y_i,\theta,\Omega) = \prod_{j=1}^{q} f(z_{ij}^*|y_i, \{z_{ik}^*\}_{k<j}, \{z_{ik}\}_{k>j}, \theta, \Omega), \qquad(5.12)$
a more general version of which was exploited for density estimation in [15]. The kernel in (5.12) satisfies the invariance condition

$f_{TN_{B_i}}(z_i^*|0,\Omega) = \int K(z_i, z_i^*|y_i,\theta,\Omega)\, f_{TN_{B_i}}(z_i|0,\Omega)\, dz_i. \qquad(5.13)$

Therefore, an estimate of $f_{TN_{B_i}}(z_i^*|0,\Omega)$ for use in (5.10) or (5.11) can be obtained by invoking (5.13) and averaging the transition kernel $K(z_i, z_i^*|y_i,\theta,\Omega)$ with respect to draws $z_i^{(g)} \sim TN_{B_i}(0,\Omega)$, g = 1, . . . , G, from the truncated normal distribution, i.e.

$\hat f_{TN_{B_i}}(z_i^*|0,\Omega) = \frac{1}{G} \sum_{g=1}^{G} K(z_i^{(g)}, z_i^*|y_i,\theta,\Omega). \qquad(5.14)$
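Putting (5.11) through (5.14) together for a single observation, a sketch of the estimator could look as follows (all function names are assumptions; the univariate truncated normal full-conditionals are sampled by the inverse-cdf method):

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def crt_log_prob(lower, upper, Omega, G=500, burn=100, rng=np.random.default_rng()):
    """Estimate ln Pr(y_i | theta, Omega) for B_i = (lower, upper] under z ~ N(0, Omega),
    combining (5.11), (5.12) and (5.14); lower/upper are length-q arrays (+-inf allowed)."""
    q = Omega.shape[0]
    P = np.linalg.inv(Omega)                       # precision matrix

    def cond(z, j):                                # moments of z_j | z_{-j}
        var = 1.0 / P[j, j]
        mean = -var * (P[j, :] @ z - P[j, j] * z[j])
        return mean, np.sqrt(var)

    def gibbs_pass(z):
        for j in range(q):                         # inverse-cdf truncated normal draw
            m, s = cond(z, j)
            a = norm.cdf((lower[j] - m) / s)
            b = norm.cdf((upper[j] - m) / s)
            z[j] = m + s * norm.ppf(a + rng.uniform() * (b - a))
        return z

    z = np.clip(np.zeros(q), lower + 1e-3, upper - 1e-3)
    draws = []
    for g in range(burn + G):
        z = gibbs_pass(z.copy())
        if g >= burn:
            draws.append(z.copy())
    z_star = np.mean(draws, axis=0)                # sample mean of the retained draws

    def log_kernel(z):                             # log of the product in (5.12) at z*
        lk, zc = 0.0, z.copy()
        for j in range(q):
            m, s = cond(zc, j)
            a = norm.cdf((lower[j] - m) / s)
            b = norm.cdf((upper[j] - m) / s)
            lk += norm.logpdf(z_star[j], m, s) - np.log(b - a)
            zc[j] = z_star[j]
        return lk

    log_tn = logsumexp([log_kernel(z) for z in draws]) - np.log(G)   # (5.14)
    _, logdet = np.linalg.slogdet(Omega)
    log_fn = -0.5 * (q * np.log(2.0 * np.pi) + logdet + z_star @ P @ z_star)
    return log_fn - log_tn                                           # (5.11)
```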
Given an estimate of the log-likelihood, parameter estimates can be obtained by Newton-Raphson iterations of the form

$\psi_{t+1} = \psi_t + \lambda\,(-H_t)^{-1} g_t, \qquad(5.15)$

where $g_t = \partial \ln f(y|\psi_t)/\partial\psi_t$ and $H_t = \partial^2 \ln f(y|\psi_t)/\partial\psi_t\,\partial\psi_t'$ are the gradient vector and Hessian matrix, respectively, of the log-likelihood function at $\psi_t$, and λ is a step size. Gradient-based methods are widely used in log-likelihood optimization because many statistical models have well-behaved log-likelihood functions, and gradients and Hessian matrices are often required for statistical inference, e.g. in obtaining standard errors or Lagrange multiplier test statistics. The standard Newton-Raphson method, however, has well-known drawbacks. One is that computation
of the Hessian matrix can be quite computationally intensive. For a k-dimensional parameter vector ψ, computing the Hessian numerically requires O(k²) evaluations of the log-likelihood function. In the context of simulated likelihood estimation, where k can
be very large and each likelihood evaluation can be very costly, evaluation of the
Hessian presents a significant burden that adversely affects the computational effi-
ciency of Newton-Raphson. Another problem is that (−H) may fail to be positive
definite. This may be due to purely numerical issues (e.g. the computed Hessian may
be a poor approximation to the analytical one) or it may be caused by non-concavity
of the log-likelihood function. In those instances, the Newton-Raphson iterations
will fail to converge to a local maximum.
To deal with these difficulties, [1] noted that an application of a fundamental sta-
tistical relationship, known as the information identity, obviates the need for direct
computation of the Hessian. Because we are interested in maximizing a statistical
function given by the sum of the log-likelihood contributions over a sample of obser-
vations, it is possible to use statistical theory to speed up the iterations. In particular,
by definition we have

$\int f(y|\psi)\, dy = 1, \qquad(5.16)$

where it is assumed that if there are any limits of integration, they do not depend on the parameters ψ. With this assumption, an application of Leibniz's theorem implies that $\partial\{\int f(y|\psi)\,dy\}/\partial\psi = \int \partial f(y|\psi)/\partial\psi\; dy$. Moreover, because $\partial f(y|\psi)/\partial\psi = \{\partial \ln f(y|\psi)/\partial\psi\}\, f(y|\psi)$, upon differentiation of both sides of (5.16) with appropriate substitutions, we obtain

$\int \frac{\partial \ln f(y|\psi)}{\partial\psi}\, f(y|\psi)\, dy = 0. \qquad(5.17)$
Differentiating (5.17) with respect to ψ once again (recalling that under our assumptions we can interchange integration and differentiation), we get

$\int \left[ \frac{\partial^2 \ln f(y|\psi)}{\partial\psi\,\partial\psi'}\, f(y|\psi) + \frac{\partial \ln f(y|\psi)}{\partial\psi}\, \frac{\partial f(y|\psi)}{\partial\psi'} \right] dy = 0, \qquad(5.18)$

where, taking advantage of the equality $\partial f(y|\psi)/\partial\psi = \{\partial \ln f(y|\psi)/\partial\psi\}\, f(y|\psi)$ once again, we obtain the primary theoretical result underlying the BHHH approach

$-\int \frac{\partial^2 \ln f(y|\psi)}{\partial\psi\,\partial\psi'}\, f(y|\psi)\, dy = \int \frac{\partial \ln f(y|\psi)}{\partial\psi}\, \frac{\partial \ln f(y|\psi)}{\partial\psi'}\, f(y|\psi)\, dy. \qquad(5.19)$
The left side of equation (5.19) gives E(−H), whereas on the right side we have E(gg′), which also happens to be Var(g) because from (5.17) we know that E(g) = 0. Now, because the log-likelihood is the sum of independent log-likelihood contributions, i.e. $\ln f(y|\psi) = \sum_{i=1}^{n} \ln f(y_i|\psi)$, it follows that

$\mathrm{Var}(g) = \sum_{i=1}^{n} \mathrm{Var}(g_i) \approx \sum_{i=1}^{n} g_i g_i',$

in which $g_i = \partial \ln f(y_i|\psi)/\partial\psi$. Therefore, the BHHH algorithm for maximizing the log-likelihood function relies on the recursions

$\psi_{t+1} = \psi_t + \lambda \left( \sum_{i=1}^{n} g_i g_i' \right)^{-1} g_t. \qquad(5.20)$
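A minimal sketch of the BHHH recursion follows; the score contributions $g_i$ are approximated here by forward differences, which is an assumption of the example rather than part of the method as described above:

```python
import numpy as np

def bhhh(loglik_contribs, psi0, lam=1.0, max_iters=100, tol=1e-6, eps=1e-5):
    """Maximize sum_i ln f(y_i | psi) via (5.20); loglik_contribs(psi) returns the
    n-vector of log-likelihood contributions, and the scores g_i are approximated
    by forward differences."""
    psi = np.asarray(psi0, dtype=float)
    for _ in range(max_iters):
        base = loglik_contribs(psi)                # n-vector of contributions
        k = psi.size
        scores = np.empty((base.size, k))
        for j in range(k):                         # numerical score contributions g_i
            step = np.zeros(k)
            step[j] = eps
            scores[:, j] = (loglik_contribs(psi + step) - base) / eps
        g = scores.sum(axis=0)                     # gradient of the log-likelihood
        B = scores.T @ scores                      # sum_i g_i g_i'
        delta = lam * np.linalg.solve(B, g)        # BHHH step (5.20)
        psi = psi + delta
        if np.linalg.norm(delta) < tol:
            break
    return psi
```

Because the outer-product matrix is positive semidefinite by construction, the iterations avoid the indefiniteness problems of the raw Hessian mentioned above.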
Table 5.1 Descriptive statistics for the explanatory variables in the patent count application
5.4 Application
In this section, we implement the methodology developed earlier to study the joint
behavior of firm-level patent registrations in four technology categories in the “com-
puters & instruments” industry during the 1980s. We use the data sample of [10],
which consists of n = 498 observations on 254 manufacturing firms from the U.S.
Patent & Trademark Office data set discussed in [7] and [11]. The response variable
is a 4 × 1 vector yi (i = 1, . . . , 498) containing firm-level counts of registered patents
in communications (COM), computer hardware & software (CHS), computer pe-
ripherals (CP), and information storage (IS). The explanatory variables reflect the
characteristics of individual firms and, in addition to a category specific intercept,
include sales (SALES), workforce size (WF), and R&D capital (RDC). Sales are
measured by the annual sales revenue of each firm, while the size of the workforce
is given by the number of employees that the firm reports to stockholders. R&D
capital is a variable constructed from the history of R&D investment using inven-
tory and depreciation rate accounting standards discussed in [7]. All explanatory
variables, except the intercept, are measured on the logarithmic scale. Table 5.1
contains variable explanations along with descriptive statistics.
To analyze these multivariate counts, we use a Gaussian copula model with nega-
tive binomial marginals which was presented in Section 5.2. The negative binomial
specification is suitable for this application because patent counts exhibit a heavy
right tail, and hence it is useful to specify a model that can account for the possible
presence and extent of over-dispersion. In addition to examining how patents in each
category are affected by firm characteristics, joint modeling allows us to study the
interdependence of patent counts that emerges due to technological spillovers, man-
agerial incentives, and internal R&D decisions. For instance, technological break-
throughs and know-how in one area may produce positive externalities and spill
over to other areas. Moreover, significant discoveries may produce patents in multi-
ple categories, resulting in positive correlation among patent counts. Alternatively,
the advancement of a particular technology may cause a firm to re-focus and concentrate its resources on that area at the expense of others, thereby producing negative correlations. The dependence structure embodied in the correlation matrix Ω of
the Gaussian copula model that we consider is intended to capture these and other
factors that can affect multiple patent categories simultaneously.
We estimate the copula model by first estimating the parameters of each negative
binomial model separately by maximum likelihood and then using those estimates
as a starting point for maximizing the copula log-likelihood. The individual nega-
tive binomial models have well-behaved log-likelihood functions and are relatively
fast and straightforward to estimate by standard optimization techniques such as
those presented in Section 5.3.2. Parameter estimates for the independent negative
binomial models and the joint Gaussian copula model are presented in Table 5.2.
Table 5.2 Maximum simulated likelihood estimates of independent negative binomial (NB)
models and joint Gaussian copula model with standard errors in parentheses
The results in Table 5.2 largely accord with economic theory. Of particular in-
terest is the fact that in all cases the coefficients on ln(RDC) are positive, and for
CHS, CP, and IS, they are also economically and statistically significant. Specifi-
cally, those point estimates are relatively large in magnitude and lie more than 1.96
standard errors away from zero, which is the 5% critical value for a two-sided test
under asymptotic normality. This indicates that innovation in those categories is
capital-intensive and the stock of R&D capital is a key determinant of patenting ac-
tivity. The results also suggest that, all else being equal, patents tend to be introduced by large firms, as measured by the size of the company workforce ln(WF). The coefficient on that variable in the communications category is large
and statistically significant, whereas in the other three categories the estimates are
positive but not significant at the customary significance levels. Interestingly, and
perhaps counter-intuitively, the coefficients on ln(SALES) in these categories are
predominantly negative (with the exception of computer peripherals), and none are
statistically significant. To explain this puzzling finding, economists have proposed
Fig. 5.2 Numerical standard errors (NSE) of the log-likelihood estimate as a function of the MCMC sample size in the CRT method (the axes, but not the values, are on the logarithmic scale)
Fig. 5.3 Boxplots of the ratios of parameter standard errors estimated for each MCMC sample size setting G relative to those for G = 1500; the lines in the boxes mark the quartiles, the whiskers extend to values within 1.5 times the interquartile range, and outliers are displayed by "+"
G ∈ {25, 50, 100, 500, 1500}. We then compare the behavior of the standard errors
for lower values of G relative to those for large G. Figure 5.3 presents boxplots of
the ratios of the parameter standard errors estimated for each setting of G relative
to those at the highest value G = 1500. The results suggest that while at lower val-
ues of G the standard error estimates are somewhat more volatile than at G = 1500,
neither the volatility nor the possible downward bias in the estimates represents a
significant concern. Because the CRT method produces very efficient estimates of
the log-likelihood ordinate, such issues are not problematic even with small MCMC
samples, although in practice G should be set conservatively high, subject to one’s
computational budget.
5.5 Concluding Remarks

In this chapter, we examined an MCMC-based method for estimating the likelihood ordinate and applied it to the joint behavior of firm-level patent registrations in four categories of U.S. technology patents using a Gaussian copula model for multi-
variate count data. The results support the case for joint modeling and estimation of
the patent categories and suggest that the estimation techniques perform very well in
practice. Additionally, the CRT estimates of the log-likelihood function are very ef-
ficient and produce reliable parameter estimates, standard errors, and hypothesis test
statistics, mitigating any potential problems (discussed at the end of Section 5.3.2)
that could arise due to maximizing a simulation-based estimate of the log-likelihood
function.
We note that the simulated likelihood methods discussed here can be applied in
optimization algorithms that do not require differentiation, for example in simulated
annealing and metaheuristic algorithms which are carefully examined and summa-
rized in [22]. At present, however, due to the computational intensity of evaluating
the log-likelihood function at each value of the parameters, algorithms that require
numerous evaluations of the objective function can be very time consuming, espe-
cially if standard errors have to be computed by bootstrapping. Nonetheless, the
application of such algorithms is an important new frontier in maximum simulated
likelihood estimation.
References
1. Berndt, E., Hall, B., Hall, R., Hausman, J.: Estimation and Inference in Nonlinear Struc-
tural Models. Annals of Economic and Social Measurement 3, 653–665 (1974)
2. Cameron, A.C., Trivedi, P.K.: Regression Analysis of Count Data. Cambridge University
Press, Cambridge (1998)
3. Chib, S.: Marginal Likelihood from the Gibbs Output. Journal of the American Statistical
Association 90, 1313–1321 (1995)
4. Chib, S., Greenberg, E.: Markov Chain Monte Carlo Simulation Methods in Economet-
rics. Econometric Theory 12, 409–431 (1996)
5. Geweke, J.: Efficient Simulation from the Multivariate Normal and Student-t Dis-
tributions Subject to Linear Constraints. In: Keramidas, E.M. (ed.) Proceedings of
the Twenty-Third Symposium on the Interface. Computing Science and Statistics,
pp. 571–578. Interface Foundation of North America, Inc, Fairfax (1991)
6. Greene, W.: Functional forms for the negative binomial model for count data. Economics
Letters 99, 585–590 (2008)
7. Hall, B.H.: The Manufacturing Sector Master File: 1959–1987. NBER Working paper
3366 (1990)
8. Jeliazkov, I., Lee, E.H.: MCMC Perspectives on Simulated Likelihood Estimation. Ad-
vances in Econometrics: Maximum Simulated Likelihood 26, 3–39 (2010)
9. Joe, H.: Multivariate Models and Dependence Concepts. Chapman and Hall, London
(1997)
10. Lee, E.H.: Essays on MCMC Estimation, Ph.D. Thesis, University of California, Irvine
(2010)
11. Mairesse, J., Hall, B.H.: Estimating the Productivity of Research and Development: An
Exploration of GMM Methods Using Data on French and United States Manufacturing
Firms. NBER Working paper 5501 (1996)
12. McFadden, D., Train, K.: Mixed MNL Models for Discrete Response. Journal of Applied
Econometrics 15, 447–470 (2000)
Chapter 6
Optimizing Complex Multi-location Inventory Models

Christian A. Hochmuth, Jörg Lässig, and Stefanie Thiem

6.1 Introduction
Reducing cost and improving service is the key to success in a competitive economic climate. Although these objectives seem contradictory, there is a way to achieve both: spreading service locations improves service, and pooling resources can decrease cost if lateral transshipments between the locations are allowed. The design and control of such multi-location systems is an important, non-trivial
demand during the transshipment time and the time interval elapsing from the re-
lease moment of a TD until the next order quantity will arrive. Therefore, continu-
ous review MLIMTs are usually investigated under several simplifying assumptions,
e.g., two locations [7, 33], Poisson demand [25], a fixed ordering policy not consid-
ering future transshipments [27], restriction to simple rules such as a one-for-one
ordering policy [25] and an all-or-nothing transshipment policy [7], or the limita-
tion that at most one transshipment with negligible time and a single shipping point
during an order cycle is possible [33]. The question of optimal ordering and transshipment policies has not been answered anywhere; all models work with a given ordering policy and heuristic transshipment rules. In a few cases simulation is used, either for testing approximate analytical models [27, 33] or for determining the best reorder point s for an (s, S)-ordering policy [24] by linear search and simulation. Thus, the investigations are restricted to small-size models. Herer et al. [12] calculate optimal order-up-to levels S using a sample-path-based optimization procedure and subsequently find optimal transshipment policies for given ordering policies by applying linear programming to a network flow model. Extensions include finite trans-
portation capacities [28] and positive replenishment lead times [10]. Furthermore,
we investigate the effect of non-stationary transshipment policies under continuous
review. Thus, the complexity of this general model motivates the application of
simulation-based optimization with PSO instead of gradient-based methods. In
this regard, we follow an approach similar to Arnold et al. [1], and recently Bel-
gasmi et al. [2], who analyze the effect of different parameters using evolutionary
optimization.
Fig. 6.1 Scheme for the iterative Simulation Optimization approach. The optimizer proposes new candidate solutions for the given problem, whose performance is estimated by simulation experiments. Depending on the estimated performance, the optimizer decides either to accept the current decisions and stop the search process, or to reject them and continue
As seen from Figure 6.1, iterative SO is based upon two main elements – a sim-
ulator for the system to be investigated and an optimizer that finds acceptable solu-
tions. Generic simulators and optimizers are compatible, and thus SO is suited for
the solution of arbitrarily complex optimization problems. In the past, different applications of the outlined approach, especially to inventory problems [21, 22, 16, 20], have been implemented. In most cases Genetic Algorithms (GAs) have been applied so far, but like GAs, Particle Swarm Optimization (PSO) is suitable for very general optimization problems. In contrast to gradient-based approaches, local optima can be escaped. Hence, these methods are predestined for unknown or complicated fitness landscapes. Although finding the global optimum is not guaranteed, a very good solution is usually returned in reasonable time. Furthermore, these methods rely only on a small amount of information and can be designed independently of the application domain. It will be interesting to see whether PSO copes as well as GAs with the random output of stochastic simulation.
Fig. 6.2 Logical view of a general Multi-Location Inventory Model with Lateral Transshipments (MLIMT). Each of the N locations may refill its stock either by ordering from an outside supplier or by transshipments from other locations in order to meet the demand. The locations are on an equal level without any predefined structure
Each of the N locations faces a certain demand for a single product or a finite
number of products. In the latter case a substitution order between products may be
defined. Most approaches assume a single product. For the consideration of multiple
products, sequential simulation and optimization is feasible, unless fixed cost or
finite resources are shared among products. However, this limitation is negligible
provided that shared fixed cost is insignificant relative to total fixed and variable
cost, and capacities for storage and transportation are considered to be infinite.
The ordering mode defines when to order, i.e., the review scheme, and what
ordering policy to use. The review scheme defines the time moments for order-
ing. Discrete and continuous review are the alternatives. Under the discrete review
scheme the planning horizon is divided into periods. Usually the ordering policy
is defined by its type and corresponding parameters (e.g., order-up-to, one-for-one,
(s, S), (R, Q)).
Central to the model specification is the definition of the demand process. It
may be deterministic or random, identical or different for all locations, station-
ary or non-stationary in time, independent or dependent across locations and time,
account for the ordering decision at the beginning of a period. But after the demand
realization at the end of a period the optimal transshipment decision results from
an open (linear) transportation problem. Such problems do not have closed-form solutions. Therefore, prior to the demand realization no expression is available for
the cost savings from transshipments. Both approximate models and simulation are
potential solutions, e.g., Köchel [18, 19] and Robinson [30].
Fig. 6.3 $(s_i, S_i)$-ordering policy. If the inventory position $r_i$ of location i drops below the reorder point $s_i$ at the end of an order period, an order is released to the order-up-to level $S_i$, i.e., $S_i - r_i$ product units are ordered. Analogously, but under continuous review, a transshipment order (TO) of $H_i - f_{TO,i}(t)$ product units is released if the state function $f_{TO,i}(t)$ falls below $h_i$
review. However, in almost all such models a Poisson demand process is assumed –
a strong restriction as well.
Concerning the demand satisfaction mode, most models assume the back-order
or the lost-sales cases. An arriving client is enqueued according to a specific ser-
vice policy, such as First-In-First-Out (FIFO) and Last-In-First-Out (LIFO), sorting
clients by their arrival time, Smallest-Amount-Next (SAN) and Biggest-Amount-
Next (BAN), sorting clients by their unserved demand, and Earliest-Deadline-First
(EDF). In addition, a random impatient time is realized for each client.
To balance excess and shortage, the simulation model permits all pooling modes
from complete to time-dependent partial pooling. A symmetric N × N matrix P =
$(p_{ij})$ defines pooling groups in such a way that two locations i and j belong to the same group if and only if $p_{ij} = 1$, and $p_{ij} = 0$ otherwise. The following consideration is crucial: transshipments allow the fast elimination of shortages, but near the end of an order period transshipments may be less advantageous. Therefore, a parameter
tpool,i ∈ [0,tP,i ] is defined for each location i. After the kth order request, location i
can get transshipments from all other locations as long as for the actual time t ≤
ktP,i + tpool,i holds. For all other times location i can receive transshipments only
from locations that are in the same pooling group. Thus, the transshipment policies
become non-stationary in time.
Transshipments are the focus of this chapter. Regarding the transshipment mode, our simulation model allows transshipments at any time during an order cycle (continuous review) as well as multiple shipping points and partial deliveries to realize a transshipment decision (TD). To answer the question of when to transship
what amount between which locations, a great variety of rules can be defined. Broad
applicability is achieved by three main ideas – priorities, introduction of a state func-
tion and generalization of common transshipment rules. Difficulties are caused by
the problem to calculate the effects of a TD. Therefore, TDs should be based on ap-
propriate forecasts for the dynamics of the model, especially the stock levels. The
MLIMT simulator offers several possibilities. For each location transshipment orders
(TO) and product offers (PO) are distinguished. Times for TOs or POs are the arrival
times of clients or deliveries, respectively. Priorities are used to define the sequence of
transshipments in one-to-many and many-to-one situations. Because of continuous
time only such situations occur, and thus, all possible cases are considered. The three
rules, Biggest-Amount-Next (BAN), Minimal-Transshipment-Cost per unit (MTC)
and Minimal-Transshipment-Time (MTT) may be combined arbitrarily. State func-
tions are used to decide when to release a TO or PO. The following variables for each location i and time t ≥ 0 are used in further statements: the available stock (net inventory) $y_i(t)$, the on-hand stock $y_i^+(t)$, and the expected quantity in transit from transshipments, $b_{tr,i}(t)$.
To decide at time t in location i about a TO or PO, the state functions $f_{TO,i}(t)$ and $f_{PO,i}(t)$ are defined based on the available stock plus expected transshipments, $f_{TO,i}(t) = y_i(t) + b_{tr,i}(t)$, and the on-hand stock, $f_{PO,i}(t) = y_i^+(t)$, respectively. Since fixed cost components for transshipments are feasible, a heuristic $(h_i, H_i)$-rule for TOs is suggested in the following way, which is inspired by the $(s_i, S_i)$-rule for order requests ($h_i \le H_i$):

If $f_{TO,i}(t) < h_i$, release a TO for $H_i - f_{TO,i}(t)$ product units.
However, in case of positive transshipment times it may be advantageous to take future demand into account. Thus, a TO is released on the basis of a forecast of the state function $f_{TO,i}(t')$ for a time moment $t' \ge t$, and the transshipment policies become non-stationary in time. The MLIMT simulator offers three such time moments: $t' = t$, the current time (i.e., no forecast); $t' = t_1$, the next order review moment; and $t' = t_2$, the next potential moment of an order supply. For instance, the
state function $f_{TO,i}(t) = y_i(t) + b_{tr,i}(t)$, $t \ge 0$, is considered. Let $k\,t_{P,i} \le t < (k+1)\,t_{P,i}$, i.e., we assume that we are in the review period after the kth order request. Then $t_1$ is defined as follows:

$t_1 = (k+1)\, t_{P,i}. \qquad(6.1)$

For $t_2$ we introduce the two events $\overline{ev}(t) \leftrightarrow$ {in the actual period there has not been an order supply until t} and $ev(t) \leftrightarrow$ {there has been an order supply until t}:

$t_2 = (k - n_{ord,i})\, t_{P,i} + t_{A,i} + \begin{cases} 0 & \text{if } \overline{ev}(t),\ \text{i.e., } t < (k - n_{ord,i})\, t_{P,i} + t_{A,i}, \\ t_{P,i} & \text{if } ev(t),\ \text{i.e., } t \ge (k - n_{ord,i})\, t_{P,i} + t_{A,i}. \end{cases} \qquad(6.2)$
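In code, the two forecast moments reduce to a few lines (a sketch; the argument names mirror the symbols in (6.1) and (6.2)):

```python
def next_review_moment(k, t_P):
    """t1 in (6.1): the next order review moment after the kth order request."""
    return (k + 1) * t_P

def next_supply_moment(t, k, n_ord, t_P, t_A):
    """t2 in (6.2): the next potential moment of an order supply at time t."""
    supply = (k - n_ord) * t_P + t_A
    return supply if t < supply else supply + t_P
```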
Thus, replacing the function $f_{TO,i}(t)$ by various forecast functions, a great variety of rules can be described to control the release of TOs. We remark that in case of linear transshipment cost functions without a set-up component, the $(h_i, H_i)$-rule degenerates to $(H_i, H_i)$. A well-designed optimization algorithm will approximate that solution.
Therefore, we work generally with the (hi , Hi )-rule. To serve a TO, at least one loca-
tion has to offer some product quantity. To decide when to offer what amount, an ad-
ditional control parameter is introduced – the offering level oi , corresponding to the
Fig. 6.4 Forecast functions for $\overline{ev}(t) \leftrightarrow t < (k - n_{ord,i})\, t_{P,i} + t_{A,i}$. In the actual period there has not been an order supply until t. Thus, the time moment $t_2$ of the next order supply $k - n_{ord,i}$ is in the current period, and the supplied amount must be considered to forecast $\hat f_i(t_1)$
Fig. 6.5 Forecast functions for $ev(t) \leftrightarrow t \ge (k - n_{ord,i})\, t_{P,i} + t_{A,i}$. In the actual period there has been an order supply until t, and thus the time moment $t_2$ of the next order supply $k + 1 - n_{ord,i}$ is in the next period, not affecting $\hat f_i(t_1)$
hold-back level introduced by Xu et al. [33]. Since only on-hand stock can be transshipped, the state function $f_{PO,i}(t) = y_i^+(t)$ is defined. The offered amount $y_i^+(t) - o_i$ must not be smaller than a certain value $\Delta o_{min,i}$ to prevent undesirably small and frequent transshipments. Similar forecasts are applied to take future demand into account, with forecast moments t, $t_1$, and $t_2$. For details we refer to Hochmuth [13]. Thus, the PO rule is as follows:

If $\hat f_{PO,i}(t) - o_i \ge \Delta o_{min,i}$, release a PO for $\hat f_{PO,i}(t) - o_i$ product units.
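Both release rules amount to a few lines of logic; the following sketch (function and argument names are assumptions) returns the quantity to request or to offer:

```python
def transshipment_order(f_TO, h, H):
    """(h, H)-rule: if the (possibly forecast) state f_TO falls below h,
    request a transshipment up to H; otherwise request nothing."""
    return H - f_TO if f_TO < h else 0.0

def product_offer(f_PO, o, delta_o_min):
    """Offer the stock above the offering level o, but only if the offered
    amount reaches the minimum quantity delta_o_min."""
    amount = f_PO - o
    return amount if amount >= delta_o_min else 0.0

# usage: a location with forecast state 40 under the rule (h, H) = (50, 120)
qty_requested = transshipment_order(f_TO=40.0, h=50.0, H=120.0)   # -> 80.0
qty_offered = product_offer(f_PO=75.0, o=50.0, delta_o_min=10.0)  # -> 25.0
```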
Thus, the set of available transshipment policies is extended, including all com-
monly used policies, and allowing multiple shipping points with partial deliveries.
In order to measure the system performance by cost and gain functions, order,
holding, shortage (waiting) and transshipment cost functions may consist of fixed
values, components linear in time, and components linear in time and units. Fixed
cost arises from each non-served demand unit. All cost values are location-related.
The gain from a unit, sold by any location, is a constant. To track cost for infi-
nite planning horizons, appropriate approximations must be used. The only problem
with respect to finite horizons is the increase in computing time to get a sufficiently
accurate estimate, although the extent can be limited using parallelization.
Choosing cost function components in a specific way, cost criteria as well as
non-cost criteria can be used, e.g., the average ratio of customers experiencing a
stock-out or the average queue time measured by out-of-stock cost, or the efficiency
of logistics, indicated by order and transshipment cost.
The diagonal matrices $R_1$ and $R_2$ contain uniform random numbers in [0, 1) and thus randomly weight each component of the connecting vector $(r_i^{lbsf} - r_i)$ from the current position $r_i$ to the locally best solution $r_i^{lbsf}$. The vector $(r_i^{bsf} - r_i)$ is treated analogously. Since every component is multiplied with a different random
number, the vectors are not only changed in length but also perturbed from their original direction. The new position then follows from the superposition of the three vectors. By choosing the cognitive parameter $c_1$ and the social parameter $c_2$, the influence of these two components can be adjusted. The Standard PSO 2007 setup uses $c_1 = c_2 = 0.5 + \ln 2 \approx 1.193$ and $w = 1/(2 \ln 2) \approx 0.721$, as proposed by Clerc and Kennedy [5]. A number of N = 100 particles is chosen with K = 3 informants.
The pseudo-code of the solution update for the swarm is shown in Algorithm 1.
The position and the velocity components for the different particles i and dimension
d are written as subscripts, i.e., vid is the d-th component of the velocity vector vi of
particle i. The iterative solution update of the vector vi is visualized in Figure 6.6.
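A compact sketch of this update (the vectorized form over all particles and the array names are assumptions; the parameter values follow the Standard PSO 2007 setup quoted above):

```python
import numpy as np

def pso_step(r, v, r_lbsf, r_bsf, rng=np.random.default_rng()):
    """One update of all particles: rows of r are positions and rows of v are
    velocities; r_lbsf holds the locally best solutions and r_bsf the
    best-so-far solutions, as in the update equation quoted in the text."""
    w = 1.0 / (2.0 * np.log(2.0))        # inertia weight, approx. 0.721
    c1 = c2 = 0.5 + np.log(2.0)          # cognitive and social parameters, approx. 1.193
    R1 = rng.uniform(size=r.shape)       # random component-wise weights (diagonal R1)
    R2 = rng.uniform(size=r.shape)       # random component-wise weights (diagonal R2)
    v_new = w * v + c1 * R1 * (r_lbsf - r) + c2 * R2 * (r_bsf - r)
    return r + v_new, v_new
```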
Fig. 6.6 Iterative solution update of PSO in two dimensions. From the current particle position $r_i^t$ the new position $r_i^{t+1}$ is obtained by vector addition of the velocity, the cognitive and the social component

6.6 Experimentation
Fig. 6.7 Topology of the five-location model. An edge between two locations indicates that these belong to the same pooling group, i.e., lateral transshipments are feasible at all times. Along the edges, distances in kilometers are shown
The other parameters are identical for all optimization runs. All locations i use an $(s_i, S_i)$-ordering policy, where the initial reorder points are $s_i = 0$ and the initial order-up-to levels are $S_1$ = 600, $S_2$ = 900, $S_3$ = 1,200, $S_4$ = 1,500 and $S_5$ = 1,800.
For all locations i an order period is equal to 10 days, i.e., $t_{P,i}$ = 10 days. The distances between all locations are visualized in Figure 6.7, and the transshipment velocity is chosen to be 50.00 km/h. The state function chosen for TOs is $f_{TO,i}(t) = y_i(t) + b_{tr,i}(t)$ and for POs $f_{PO,i}(t) = y_i^+(t)$. To analyze the effect of forecasting demand for ordering transshipments, all combinations of the current time t and the forecast moment $t_1$ are compared for TOs. For offering product units, the current time t is used. The priority sequence for TOs and POs is MTC, BAN, MTT. The inter-arrival time of customers to a location i is an exponentially distributed random variable with mean $T_i$ = 2 h. The impatient time is triangularly distributed in the interval (0 h, 8 h), i.e., with mean $W_i$ = 4 h. The customer demand is for all locations i uniformly distributed in $[0, B_{max}]$ but with different maximum values, i.e., $B_{max,1}$ = 10, $B_{max,2}$ = 15, $B_{max,3}$ = 20, $B_{max,4}$ = 25, $B_{max,5}$ = 30, respectively. The initial inventory of the five locations i is chosen to be $I_{start,1}$ = 600, $I_{start,2}$ = 900, $I_{start,3}$ = 1,200, $I_{start,4}$ = 1,500, $I_{start,5}$ = 1,800, respectively. However, the initialization values will not have an influence after the transition time. The maximum capacity of the storage is 10,000 product units for each location. The regular order delivery times at the end of each period are location dependent as well; for locations 1 to 5 they are 2.0 d, 2.5 d, 3.0 d, 3.5 d and 4.0 d, respectively.
The cost for storing product units is 1.00 € per unit and day, whereas the order
and transshipment cost is 1.00 € per unit and per day of transportation time. The
fixed transshipment cost is 10.00 € for each location and the gain per unit sold is
100.00 €. The out-of-stock cost per product unit and waiting time is 1.00 €/h and
the out-of-stock cost for a canceling customer is 50.00 €. The fixed cost for each
periodic order is 500.00 €, and the order cost per product unit and day is 1.00 €.
The optimization criterion is the minimum total cost expected.
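For reference, the setup above can be collected in a small configuration structure; the following Python snippet merely restates the stated parameter values, with field names of our own choosing:

five_location_setup = {
    "order_period_days": 10,                              # tP,i for all i
    "transshipment_speed_kmh": 50.0,
    "initial_order_up_to": [600, 900, 1200, 1500, 1800],  # S1 ... S5
    "initial_inventory":   [600, 900, 1200, 1500, 1800],  # Istart,i
    "max_demand_per_customer": [10, 15, 20, 25, 30],      # Bmax,i
    "mean_interarrival_time_h": 2,                        # exponential, Ti
    "mean_impatience_time_h": 4,                          # triangular on (0, 8), Wi
    "storage_capacity_units": 10000,
    "order_delivery_time_days": [2.0, 2.5, 3.0, 3.5, 4.0],
    "holding_cost_eur_per_unit_day": 1.00,
    "transport_cost_eur_per_unit_day": 1.00,
    "fixed_transshipment_cost_eur": 10.00,
    "gain_per_unit_sold_eur": 100.00,
    "oos_cost_eur_per_unit_hour": 1.00,
    "oos_cost_eur_per_cancellation": 50.00,
    "fixed_order_cost_eur": 500.00,
    "order_cost_eur_per_unit_day": 1.00,
}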
The simulation time is 468 weeks plus an additional transition time of 52 weeks
in the beginning. For optimization we use PSO with a population of 100 individuals
i, where an individual is a candidate solution ri , i.e., a real-valued vector of the
following policy parameters for each location i.
si        Reorder point for periodic orders
Si        Order-up-to level for periodic orders
hi        Reorder point for transshipment orders
Hi        Order-up-to level for transshipment orders
oi        Offer level
Δomin,i   Minimum offer quantity
tpool,i   Pooling time
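A candidate solution ri thus concatenates these seven policy parameters for each of the five locations, giving a 35-dimensional real-valued vector. A possible encoding, sketched in Python (the class and field names are ours):

from dataclasses import dataclass

@dataclass
class LocationPolicy:       # the seven policy parameters of one location
    s: float                # reorder point for periodic orders
    S: float                # order-up-to level for periodic orders
    h: float                # reorder point for transshipment orders
    H: float                # order-up-to level for transshipment orders
    o: float                # offer level
    d_omin: float           # minimum offer quantity
    t_pool: float           # pooling time

def encode(policies):
    """Flatten the per-location policies into one real-valued position vector ri."""
    return [x for p in policies
              for x in (p.s, p.S, p.h, p.H, p.o, p.d_omin, p.t_pool)]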
The optimization stops if a new optimum has not occurred for the last 2,000 cy-
cles, but at least 10,000 cycles must be realized to prevent early convergence in
a local optimum. On machines with two dual-core Opteron 270 2GHz processors,
one iteration consumes about 15 seconds of runtime. For all experiments, total results,
optimized parameter values and cost function values are determined. After the
optimization, the minimum absolute values of all parameters that leave the cost
function value unchanged are determined using a binary search.
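This post-processing step can be sketched as follows; here cost is a hypothetical stand-in for a deterministic evaluation of the cost function, and monotone behavior of the parameter is assumed:

def minimal_absolute_value(params, idx, cost, tol=1e-6):
    """Binary search for the smallest magnitude of parameter idx that leaves
    the cost function value unchanged (a sketch under the stated assumptions)."""
    baseline = cost(params)
    sign = 1.0 if params[idx] >= 0 else -1.0
    lo, hi = 0.0, abs(params[idx])
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        trial = list(params)
        trial[idx] = sign * mid
        if cost(trial) == baseline:   # cost unchanged: try an even smaller value
            hi = mid
        else:                         # cost changed: magnitude must stay larger
            lo = mid
    return sign * hi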
Table 6.1 Overall results for the service policy First-In-First-Out (FIFO). Monitoring the net-inventory level at the current time t while limiting transshipments to adjacent locations at the end of the order period is the optimal policy

    Transshipment order     Pooling    Result in € p.a.   Cycle optimum (total)   Rank
1   current time t          all        −19,666,901.01     9,562 (11,562)          3
2   current time t          adjacent   −19,758,378.05     9,562 (11,562)          2
3   time of next order t1   all        −19,530,460.22     9,562 (11,562)          8
4   time of next order t1   adjacent   −19,562,191.79     9,562 (11,562)          7
Table 6.2 Total results for the service policy Earliest-Deadline-First (EDF). Observing the current net inventory level and restricting pooling dominates the other choices, while EDF is even slightly better than FIFO

    Transshipment order     Pooling   Result in € p.a.   Cycle optimum (total)   Rank
1   current time t          all       −19,652,741.48     8,259 (10,259)          4
2   current time t          next      −19,762,585.05     8,259 (10,259)          1
3   time of next order t1   all       −19,565,348.10     9,562 (11,562)          6
4   time of next order t1   next      −19,587,244.27     9,562 (11,562)          5
Looking at the total results, there exists a lower bound regardless of the individual
policies. However, for both service policies solution 2 yields the best performance.
It is advantageous for this system to order transshipments based on the current net-
inventory level and to limit transshipments to adjacent locations at the end of an or-
der period. But even though the total results are similar, the optimal model structure
varies significantly. Therefore, all solutions are investigated in detail. The optimal
parameter values of the four considered systems are listed in Tables 6.3 and 6.4 for
FIFO and EDF, respectively. Pooling times tpool,i are optimized if transshipments
are restricted to adjacent locations, and set to the order period time tP,i otherwise.
The resulting flows are visualized in Figures 6.8–6.10 for the three best solutions.
Fig. 6.8 Flows for solution 2 (rank 1) using the service policy Earliest-Deadline-First (EDF). Locations 2 and 4 act as hubs. The volumes ordered per period are listed next to these locations, as well as the order frequency in square brackets. Transshipments are indicated by directed edges, in conjunction with transshipment volumes and frequencies in square brackets
Fig. 6.9 Flows for solution 2 (rank 2) using the service policy First-In-First-Out (FIFO). This solution is similar to EDF solution 2. Locations 2 and 4 act as hubs, while locations 1, 3 and 5 just receive transshipments, and thereby act as spokes. Periodic order volumes are listed next to the hubs, as well as transshipment volumes along the edges, and frequencies in square brackets
The figures illustrate that solutions 2 for FIFO and EDF, respectively, are
very similar. Moreover, two observations can be made. First, there are locations periodically
receiving and offering product units. These locations act as hubs in a hub-and-spoke
structure. Second, there are locations just receiving transshipments from other
Fig. 6.10 Flows for First-In-First-Out (FIFO) solution 1 (rank 3). Location 1 is isolated, ordering products as indicated by the periodic order volume and frequency in square brackets next to the location, but not exchanging transshipments. The remaining network is integrated with location 3 as a hub. Transshipments are visualized by directed edges along with volumes per period and frequencies in square brackets
locations, and thus, never receiving periodic orders. Thereby, these locations act as
spokes. In EDF solution 2 – the best solution – locations 2 and 4 are considered as
hubs, while locations 1, 3 and 5 are spokes, see Figure 6.8. Thus, transshipments
take the role of periodic orders rather than just eliminating shortages due to stochas-
tic demand. This is a consequence of the general definition of transshipments under
continuous review, fixed order cost, and order lead times.
Furthermore, some solutions show a specific characteristic. A particular location
is isolated, receiving periodic orders but never exchanging transshipments, e.g., location 1
in FIFO solution 1, cf. Figure 6.10. This points to the limitations of the
proposed heuristic. Ordering and offering decisions are based upon the inventory
level, not differentiating between target locations. Therefore, in specific situations it
may be more economical not to exchange transshipments at all. Setting up pooling
groups is a potential way to limit the complexity and to guide the optimization pro-
cess in this case. Of course, complete linear optimization of the transport problem
would be feasible, too, but at the cost of continuous review.
After studying elaborate model structures, which solution should a user imple-
ment? Tables 6.1 and 6.2 show the individual overall cost function values of all
solutions for FIFO and EDF, respectively. By further evaluating specific cost func-
tions as presented in Tables 6.5 and 6.6, decisions are better informed. Low out-
of-stock cost corresponds to high service quality, and low order and transshipment
cost indicates efficient logistics, if the total results are comparable. FIFO solution 1
in Table 6.5 leads to the least out-of-stock cost for all considered systems. A case
in point for conflicting objectives is FIFO solution 4. Product units are constantly
Table 6.3 Parameter values for the service policy First-In-First-Out (FIFO) corresponding
to the systems specified in Table 6.1. Prohibitive values, e.g., reorder points si never leading
to a positive ordering decision, are enclosed in square brackets. Thus, hubs can be identified
as locations which periodically order and offer product units. Spokes never receive periodic
orders but replenish their stock via transshipments
Table 6.4 Parameter values for the service policy Earliest-Deadline-First (EDF) correspond-
ing to the systems specified in Table 6.2. Square brackets indicate prohibitive values, never
leading to a positive ordering or transshipment decision. Hence, analogous to FIFO, there are
hubs, which order periodically, and spokes, which receive transshipments but never periodic
orders
Table 6.5 Cost function values for service policy First-In-First-Out (FIFO) corresponding to
the systems specified in Table 6.1. Different performance aspects correlate with individual
cost functions, e.g., high service quality with low out-of-stock cost, and efficient logistics
with low order and transshipment cost
being shipped, and thus, inventory cost is low, while transshipment cost is excessive.
Therefore, it is reasonable to evaluate the comparative effects of all solutions in
certain aspects, if the total results are inconclusive. To emphasize the importance of
these aspects, the cost function coefficients are adjusted accordingly.
Table 6.6 Cost function values for service policy Earliest-Deadline-First (EDF) correspond-
ing to the systems specified in Table 6.2. Out-of-stock cost monitors service quality, while
order and transshipment cost highlight logistics efficiency
depends on the model specification and shows a specific structure. The development
of such a structure is one of the most intriguing aspects, and the question arises as to
which conditions have a promoting effect.
As mentioned above, an advantage of the Simulation Optimization of multi-location
inventory systems with lateral transshipments is that the model itself is straightforwardly
extendable. Functional extensions are, e.g., policies for periodic orders, transshipment
orders and product offers. Extending the parameter set itself, the capacity
of the locations can be optimized by introducing estate and energy cost for unused
storage. Thus, not only the flows of transshipments are optimized, but also the allocation
of capacities. In addition to static aspects of the model, the parameter set
may be extended by dynamic properties such as the location-specific order period
time. Besides these extensions, there is an idea regarding orders from more than one
location at a time. Under specific circumstances one location evolves as a supplier,
ordering and redistributing product units. Therefore, the basic idea is to release an
order by several locations jointly and to solve the resulting Traveling Salesman Problem
with minimal cost. However, the existing heuristics already seem to approximate such a
transportation logic well, and thus, the inclusion of more elaborate policies is expected
merely to increase complexity. Further research may also concentrate on characteristics
favoring demand forecasting and promoting certain flows through a location network
leading to a structure.
Acknowledgment. The authors would like to thank the Robert Bosch doctoral program, the
German Academic Exchange Service and the Foundation of the German Business for fund-
ing their research.
References
[1] Arnold, J., Köchel, P., Uhlig, H.: With Parallel Evolution towards the Optimal Order
Policy of a Multi-Location Inventory with Lateral Transshipments. In: Papachristos, S.,
Ganas, I. (eds.) Research Papers of the 3rd ISIR Summer School, pp. 1–14 (1997)
[2] Belgasmi, N., Saı̈d, L.B., Ghédira, K.: Evolutionary Multiobjective Optimization of the
Multi-Location Transshipment Problem. Operational Research 8(2), 167–183 (2008)
[3] Chiou, C.-C.: Transshipment Problems in Supply Chain Systems: Review and Exten-
sions. Supply Chain, Theory and Applications, 558–579 (2008)
[4] Clerc, M.: Standard PSO (2007), http://www.particleswarm.info/Programs.html (online; accessed July 31, 2010)
[5] Clerc, M., Kennedy, J.: The Particle Swarm – Explosion, Stability, and Convergence
in a Multidimensional Complex Space. IEEE Transactions on Evolutionary Computa-
tion 6(1), 58–73 (2002)
[6] Dye, C.-Y., Hsieh, T.-P.: A Particle Swarm Optimization for Solving Joint Pricing and
Lot-Sizing Problem with Fluctuating Demand and Unit Purchasing Cost. Computers &
Mathematics with Applications 60, 1895–1907 (2010)
[7] Evers, P.T.: Heuristics for Assessing Emergency Transshipments. European Journal of
Operational Research 129, 311–316 (2001)
[8] Fu, M.C., Healy, K.J.: Techniques for Optimization via Simulation: An Experimental
Study on an (s;S) Inventory System. IIE Transactions 29, 191–199 (1997)
[9] Fu, M.C., Glover, F.W., April, J.: Simulation Optimization: A Review, New Develop-
ments and Applications. In: Kuhl, M.E., Steiger, N.M., Armstrong, F.P., Joines, J.A.
(eds.) Proceedings of the 2005 Winter Simulation Conference, pp. 83–95 (2005)
[10] Gong, Y., Yücesan, E.: Stochastic Optimization for Transshipment Problems with Posi-
tive Replenishment Lead Times. International Journal of Production Economics (2010)
(in Press, Corrected Proof)
[11] Guariso, G., Hitz, M., Werthner, H.: An Integrated Simulation and Optimization Mod-
elling Environment for Decision Support. Decision Support Systems 16(2), 103–117
(1996)
[12] Herer, Y.T., Tzur, M., Yücesan, E.: The Multilocation Transshipment Problem. IIE
Transactions 38, 185–200 (2006)
[13] Hochmuth, C.A.: Design and Implementation of a Software Tool for Simulation Op-
timization of Multi-Location Inventory Systems with Transshipments. Master’s thesis,
Chemnitz University of Technology, In German (2008)
[14] Hochmuth, C.A., Lässig, J., Thiem, S.: Simulation-Based Evolutionary Optimization of
Complex Multi-Location Inventory Models. In: 3rd IEEE International Conference on
Computer Science and Information Technology (ICCSIT), vol. 5, pp. 703–708 (2010)
[15] Iassinovski, S., Artiba, A., Bachelet, V., Riane, F.: Integration of Simulation and Op-
timization for Solving Complex Decision Making Problems. International Journal of
Production Economics 85(1), 3–10 (2003)
[16] Kämpf, M., Köchel, P.: Simulation-Based Sequencing and Lot Size Optimisation for a
Production-and-Inventory System with Multiple Items. International Journal of Produc-
tion Economics 104, 191–200 (2006)
[17] Kennedy, J., Eberhart, R.: Particle Swarm Optimization. In: Proceedings of IEEE Inter-
national Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995)
[18] Köchel, P.: About the Optimal Inventory Control in a System of Locations: An Approxi-
mate Solution. Mathematische Operationsforschung und Statistik, Serie Optimisation 8,
105–118 (1977)
[19] Köchel, P.: A Survey on Multi-Location Inventory Models with Lateral Transship-
ments. In: Papachristos, S., Ganas, I. (eds.) Inventory Modelling in Production and
Supply Chains, Research Papers of the 3rd ISIR Summer School, Ioannina, Greece,
pp. 183–207 (1998)
[20] Köchel, P.: Simulation Optimisation: Approaches, Examples, and Experiences. Tech-
nical Report CSR-09-03, Department of Computer Science, Chemnitz University of
Technology (2009)
[21] Köchel, P., Arnold, J.: Evolutionary Algorithms for the Optimization of Multi-Location
Systems with Transport. In: Simulationstechnik, Proceedings of the 10th Symposium in
Dresden, pp. 461–464. Vieweg (1996)
[22] Köchel, P., Nieländer, U.: Simulation-Based Optimisation of Multi-Echelon Inventory
Systems. International Journal of Production Economics 93-94, 505–513 (2005)
[23] Köchel, P., Thiem, S.: Search for Good Policies in a Single-Warehouse, Multi-Retailer
System by Particle Swarm Optimisation. International Journal of Production Economics
(2010) (in press, corrected proof)
[24] Kukreja, A., Schmidt, C.P.: A Model for Lumpy Parts in a Multi-Location Inventory
System with Transshipments. Computers & Operations Research 32, 2059–2075 (2005)
[25] Kukreja, A., Schmidt, C.P., Miller, D.M.: Stocking Decisions for Low- Usage Items in
a Multilocation Inventory System. Management Science 47, 1371–1383 (2001)
[26] Li, J., González, M., Zhu, Y.: A Hybrid Simulation Optimization Method for Produc-
tion Planning of Dedicated Remanufacturing. International Journal of Production Eco-
nomics 117(2), 286–301 (2009)
[27] Minner, S., Silver, E.A., Robb, D.J.: An Improved Heuristic for Deciding on Emergency
Transshipments. European Journal of Operational Research 148, 384–400 (2003)
[28] Özdemir, D., Yücesan, E., Herer, Y.T.: Multi-Location Transshipment Problem
with Capacitated Transportation. European Journal of Operational Research 175(1),
602–621 (2006)
[29] Parsopoulos, K.E., Skouri, K., Vrahatis, M.N.: Particle swarm optimization for tack-
ling continuous review inventory models. In: Giacobini, M., Brabazon, A., Cagnoni, S.,
Di Caro, G.A., Drechsler, R., Ekárt, A., Esparcia-Alcázar, A.I., Farooq, M., Fink, A.,
McCormack, J., O’Neill, M., Romero, J., Rothlauf, F., Squillero, G., Uyar, A.Ş., Yang,
S. (eds.) EvoWorkshops 2008. LNCS, vol. 4974, pp. 103–112. Springer, Heidelberg
(2008)
[30] Robinson, L.W.: Optimal and Approximate Policies in Multi-Period Multi- Location
Inventory Models with Transshipments. Operations Research 38, 278–295 (1990)
[31] Ruppeiner, G., Pedersen, J.M., Salamon, P.: Ensemble Approach to Simulated Annealing.
Journal de Physique I 1(4), 455–470 (1991)
[32] Willis, K.O., Jones, D.F.: Multi-Objective Simulation Optimization through Search
Heuristics and Relational Database Analysis. Decision Support Systems 46(1),
277–286 (2008)
[33] Xu, K., Evers, P.T., Fu, M.C.: Estimating Customer Service in a Two- Location Contin-
uous Review Inventory Model with Emergency Transshipments. European Journal of
Operational Research 145, 569–584 (2003)
[34] Zhan, Z.-H., Feng, X.-L., Gong, Y.-J., Zhang, J.: Solving the Flight Frequency Program-
ming Problem with Particle Swarm Optimization. In: Proceedings of the 11th Congress
on Evolutionary Computation, CEC 2009, pp. 1383–1390. IEEE Press, Los Alamitos
(2009)
Chapter 7
Traditional and Hybrid Derivative-Free
Optimization Approaches for Black Box
Functions
G.A. Gray and K.R. Fowler
can help overcome difficult regions of the domain and method B can be applied for
fast convergence and efficiency. This Chapter also explores the promise of hybrid
approaches and demonstrates some results for the water management problem.
Throughout this Chapter, the problem of interest is

    min f(x) subject to x ∈ Ω,                                   (7.1)

where the objective function is f : IR^n → IR and Ω defines the feasible search space.
In practice, Ω may be comprised of component-wise bound constraints on the de-
cision variable x in combination with linear and nonlinear equality or inequality
constraints. Often, Ω may be further defined in terms of state variables determined
by simulation output. The example in this Chapter includes such constraints. In ad-
dition, integer and categorical variables (for example those which require a ‘yes’
or ‘no’) are present in many engineering applications. There are a variety of DFO
methods equipped to handle these classes of problems and several are discussed
later. For the application in this work, we consider both real-valued and mixed
integer problem formulations.
The rest of this Chapter is outlined as follows: In Section 2, an example of a
black box optimization problem from hydrology is introduced. Then, in Section
3, some DFO approaches are introduced including a genetic algorithm (GA), DI-
RECT, asynchronous parallel pattern search (APPS) and implicit filtering. In ad-
dition, these methods are demonstrated on the example introduced in Section 2.
Section 4 describes some hybrid methods created using the classical DFO methods
from Section 3, and describes their performance on the example problem. Finally,
Section 5 summarizes all the information given in this Chapter and gives some ideas
regarding future research directions for hybrid optimization.
the well locations {(x̂k, ŷk)}, k = 1, . . . , n, and the number of wells n in the final design. A
negative pumping rate, Qk < 0 for some k, means that a well is extracting and a positive
pumping rate, Qk > 0 for some k, means that a well is injecting. The objective
function and constraints of WS rely on the solution to a nonlinear partial differential
equation to obtain values of the hydraulic head, h, which determines the direction
of flow. Thus, h would be considered a state variable. In this example, for each well
k = 1, . . . , n in a candidate set of wells, the hydraulic head hk must be obtained via
simulation.
The objective function, based on the one proposed in [72, 71], is given by

    fT = fc + fo                                                 (7.2)

with the capital cost

    fc = ∑_{k=1}^{n} Dk + ∑_{k: Qk<0} c1 |1.5 Qk|^{b1} (zgs − hmin)^{b2}

and the operational cost

    fo = ∫_0^{tf} [ ∑_{k: Qk<0} c2 Qk (hk − zgs) + ∑_{k: Qk>0} c3 Qk ] dt,
where the cj and bj are cost coefficients and exponents given in [71]. In the first term
of fc, Dk is the cost to drill and install well k. The second term of fc includes the cost to
install a pump for each extraction well; this cost is based on the extraction rate,
the minimum allowable hydraulic head hmin = 10 m, and the ground surface
elevation zgs. The calculation of fo is for tf = 5 years. The first part of the integral
includes the cost to lift the water to the surface which depends on the hydraulic head
hk in well k. The second part accounts for any injection wells, which are assumed to
operate under gravity. Details pertaining to the aquifer and groundwater flow model
are fully described in [72] and are not included here as they fall outside of the scope
of the application of optimization methods to solve the WS problem.
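To make the structure of Equation (7.2) concrete, the following Python sketch evaluates fT from simulator output; the argument names and the rectangle-rule time discretization are our assumptions, and in practice the head values would come from a MODFLOW run:

def ws_cost(Q, D, heads, dt, c1, c2, c3, b1, b2, zgs, hmin=10.0):
    """Q[k]: pumping rate of well k (negative = extraction); D[k]: drilling and
    installation cost; heads[t][k]: hydraulic head of well k at time step t."""
    n = len(Q)
    fc = sum(D) + sum(c1 * abs(1.5 * Q[k]) ** b1 * (zgs - hmin) ** b2
                      for k in range(n) if Q[k] < 0.0)
    fo = 0.0
    for h in heads:  # approximate the time integral over [0, tf] step by step
        fo += dt * (sum(c2 * Q[k] * (h[k] - zgs) for k in range(n) if Q[k] < 0.0)
                    + sum(c3 * Q[k] for k in range(n) if Q[k] > 0.0))
    return fc + fo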
Note that although the well locations {(x̂k, ŷk)}, k = 1, . . . , n, do not explicitly appear in
Equation (7.2), they enter through the state variable h as output from a simulation
tool. For this work, the U.S. Geological Survey code MODFLOW [92] was used to
calculate the head values. MODFLOW is a widely used and well supported block-
centered finite difference code that simulates saturated groundwater flow. Since the
well locations must lie on the finite difference grid, real-valued locations must be
rounded to grid locations for the simulation. This results in small steps and low
amplitude noise in the optimization landscapes.
The constraints for the WS application are given as limitations on the pumping
rates,
−0.0064 m³/s ≤ Qk ≤ 0.0064 m³/s, k = 1, . . . , n, (7.3)
and impact on the aquifer in terms of the hydraulic head,
10 m ≤ hk ≤ 30 m, k = 1, ..., n. (7.4)
The constraints given in Equations (7.3) and (7.4) are enforced at each well. The
total amount of water to supply is defined by the constraint
    ∑_{k=1}^{n} Qk ≤ −0.032 m³/s.                                (7.5)
While the pumping rates and locations are real-valued, there are options for how to
define the variable which indicates the appropriate number of wells. One approach
is to start with a large number of candidate wells, Nw , and run multiple optimization
scenarios where at the end of each one, wells with sufficiently low pumping rates
are removed before the optimization routine continues. However, for realistic water
management problems, simulations are time-consuming so it is more attractive to
determine the number of wells as the optimization progresses. One way to do this is
to include integer variables {zi}, i = 1, . . . , n, where each zi ∈ {0, 1} is a binary indicator for
assigning a well as off or on. Since this formulation requires an optimization algo-
rithm that can handle integer variables, alternatives have been developed. In [50],
three formulations that implicitly determine the number of wells while avoiding the
inclusion of integer variables are compared. Two formulations are based on a multi-
plicative penalty formulation ([69]) and one is based on removing a well during the
course of the optimization if the rate becomes sufficiently low. This third technique
is implemented here using an inactive-well threshold, given by Equation (7.6).
Note that the cost to install a well is roughly $20,000, and the operational cost is
about $1,000 per year. Thus, using as few wells as possible drives the optimiza-
tion regardless of the formulation. However, the inclusion of Equation (7.6) in the
formulation results in a narrow region of decrease for an optimization method to
find, but a large decrease in cost. Mathematically, using Equation (7.6) allows for
real-valued DFO methods, but adds additional discontinuities in the minimization
landscapes.
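In code, this removal rule amounts to filtering the candidate wells between optimization steps; a sketch, where threshold stands in for the value defined in Equation (7.6):

def prune_inactive_wells(Q, locations, threshold):
    """Drop wells whose pumping rate has fallen below the inactive-well
    threshold; 'threshold' is a hypothetical stand-in for Equation (7.6)."""
    keep = [k for k in range(len(Q)) if abs(Q[k]) >= threshold]
    return [Q[k] for k in keep], [locations[k] for k in keep]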
The implementation of the WS problem considered in this study was taken from
http://www4.ncsu.edu/~ctk/community.html, where the entire package
of simulation data files and objective function/constraint subroutines is available
for download. The final design solution is known to be five wells all operating
at Qi = −0.0064 m³/s with locations aligned with the north and east boundaries,
as shown in Table 7.1. See [32] for details. To study the DFO methods described in
this Chapter, a starting point with six candidate wells was used. In order to find
the solution, the optimization methods must determine that one well must be shut
off while simultaneously optimizing the rates and locations of the remaining wells.
Furthermore, the rates must lie on the boundary of the constraint in Equation (7.3) in
order to satisfy the constraint given in Equation (7.5). Thus, the WS problem con-
tains challenging features for simulation-based optimization problems that are not
unique to environmental engineering but that can be seen across many scientific and
engineering disciplines. To summarize, the challenges of the WS problem include a
black box objective function and constraints, linear and nonlinear constraints on the
Table 7.1 Five well solution to WS with pumping rates Qi = −0.0064 m³/s, i = 1, . . . , 5

Well Number    1     2     3     4     5
x̂ [m]        350   788   722   170   800
ŷ [m]        724   797   579   800   152
decision and state variables, multiple problem formulations, low amplitude noise, a
discontinuous and disconnected feasible region, and multiple local minima.
Often, GAs are criticized for their computational complexity and dependence on
optimization parameter settings, which are not known a priori [22, 42, 68]. Also,
since the GA incorporates randomness into the search phase, multiple optimization
runs may be needed. However, if the user is willing to exhaust a large number of
function evaluations, a GA can help provide insight into the design space and locate
initial points for fast, local, single search methods. The GA has many alternate forms
and has been applied to a wide range of engineering design problems as shown in
references such as [60]. Moreover, hybrid GAs have been developed at all levels of
the algorithm and with a variety of other global and local search DFO methods. See
for example [6, 83, 76] and the references therein.
In [32, 29, 50], the NSGA-II implementation [21, 93, 19, 20] of the GA was
used on the WS problem for both the mixed-integer formulation and the inactive
well-threshold to determine the wells. It was shown that for this problem the GA
performed better if (1) the number of wells was determined directly by including
the binary on-off switch compared to using the inactive well threshold and (2) if
the initial population was seeded with points that had at least five wells operating at
-0.0064 m3 /s. If a random initial population was used, the algorithm could not iden-
tify the solution after 4,000 function evaluations. If the GA was seeded accordingly,
a solution was found within 161 function calls, but the function evaluation budget,
which for that work was set to 900, would be exhausted before the algorithm would
terminate.
7.3.2.1 Asynchronous Parallel Pattern Search

Asynchronous Parallel Pattern Search (APPS) [55, 62] is a direct search method
which uses a predetermined pattern of points to sample a given function domain.
APPS is an example of a generating set search (GSS), a class of algorithms for
bound and linearly constrained optimization that obtain conforming search direc-
tions from generators of local tangent cones [65, 64]. In the case that only bound
constraints are present, GSS is identical to a pattern search. The majority of the
computational cost of pattern search methods is the function evaluations, so parallel
pattern search (PPS) techniques have been developed to reduce the overall computa-
tion time. Specifically, PPS exploits the fact that once the points in the search pattern
have been defined, the function values at these points can be computed simultane-
ously [23, 84]. For example, for a simple two-dimensional function, consider the
illustrations in Figure 7.3. First, the points f , g, h, and i in the stencil around point
c are evaluated. Then, since f results in the smallest function value, the second
Fig. 7.3 Illustration of the steps of Parallel Pattern Search (PPS) for a simple two-dimensional function. On the left, an initial PPS stencil around starting point c is shown. In the middle, a new stencil is created after successfully finding a new local min ( f ). On the right, PPS shrinks the stencil after failing to find a new minimum
picture shows a new stencil around point f . Finally, in the third picture, since none of
the iterates in this new stencil result in a new local minimum, the step size of the
stencil is reduced.
The APPS algorithm is a modification of PPS that eliminates the synchronization
requirements that the function values of all the points in the current search pattern
must be completed before the algorithm can progress. It retains the positive features
of PPS, but reduces processor latency and requires less total time than PPS to return
results [55]. Implementations of APPS have minimal requirements on the number
of processors (i.e., 2 instead of n + 1 for PPS) and do not assume that the amount of
time required for an objective function evaluation is constant or that the processors
are homogeneous.
The implementation of the APPS algorithm is more complicated than a basic GSS
in that it requires careful bookkeeping. However, the details are irrelevant to the
overall understanding of the method. Instead we present a basic GSS algorithm and
direct interested readers to [40] for a detailed description and analysis of the APPS
algorithm and corresponding APPSPACK software. The basic GSS algorithm is:
Let x0 be the starting point, Δ0 be the initial step size, and D be the set of positive
spanning directions.
While not converged Do
  1. Generate the trial points xk + Δk di for all di ∈ D
  2. Evaluate f at the trial points
  3. If a trial point improves sufficiently on f(xk), accept it as the new best point xk+1
  4. Otherwise, reduce the step size Δk

The step size may be retained or increased when an iteration is successful; when the
iteration was unsuccessful, the step size is necessarily reduced. A defining
difference between the basic GSS and APPS is that the APPS algorithm processes
the directions independently, and each direction may have its own corresponding
step size. Global convergence to locally optimal points is ensured using a sufficient
decrease criterion for accepting new best points. A trial point xk + Δ di is considered
better than the current best point xk if

    f(xk + Δ di) < f(xk) − α Δ²,

where α > 0 is a user-specified constant.
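A serial sketch of this loop in Python, using the coordinate directions ±ei as the generating set and the sufficient decrease test above; APPS differs mainly in evaluating the trial points asynchronously and keeping a separate step size per direction:

import numpy as np

def gss(f, x0, delta0=1.0, alpha=0.01, delta_tol=1e-6):
    x = np.asarray(x0, dtype=float)
    n = x.size
    D = np.vstack([np.eye(n), -np.eye(n)])   # positive spanning set {+ei, -ei}
    fx, delta = f(x), delta0
    while delta > delta_tol:
        improved = False
        for d in D:
            trial = x + delta * d
            ft = f(trial)
            if ft < fx - alpha * delta**2:   # sufficient decrease test
                x, fx, improved = trial, ft, True
                break                        # successful iteration
        if not improved:
            delta /= 2.0                     # unsuccessful: contract the step
    return x, fx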
7.3.2.2 Implicit Filtering

Implicit filtering is based on the notion that if derivatives were available and reliable,
Newton-like methods would yield fast results. The method evaluates a stencil
of points at each iteration, which is used simultaneously to form finite difference gradients
and as a pattern for direct search [35]. Then, a secant approach, called a quasi-Newton
method, is used to solve the resulting system of nonlinear equations at each itera-
tion [61] to avoid finite difference approximations of the Hessian matrix. In contrast
to classical finite-difference based Newton algorithms, implicit filtering begins with
a much larger stencil to account for noise. This step-size is reduced as the opti-
mization progresses to improve the accuracy of the gradient approximation and take
advantage of the fast convergence of quasi-Newton methods near a local minimum.
To solve the WS problem, a FORTRAN implementation called IFFCO (Implicit
Filtering For Constrained Optimization) was used. IFFCO depends on a symmetric
rank one quasi-Newton update [11]. The user must supply an objective function and
initial iterate and then optimization is terminated based on a function evaluation
budget or by exhausting the number of times the finite difference stencil is reduced.
We denote the finite difference gradient with increment size p by ∇ p f and the model
Hessian matrix as H. For each p, the projected quasi-Newton iteration proceeds
until the center of the finite difference stencil yields the smallest function value or
||∇ p f || ≤ τ p, which means the gradient has been reduced as much as possible on
the current scale. After this, the difference increment is halved (unless the user has
specified a particular sequence of increments) and the optimization proceeds until
the function evaluation budget is met.
The general unconstrained algorithm can be outlined as follows:

While not converged
  Do until ||∇p f|| ≤ τ p
    1. Compute ∇p f
    2. Find the least integer λ such that sufficient decrease holds
    3. x ← x − λ H⁻¹ ∇p f(x)
    4. Update H via a quasi-Newton method
  Reduce p
This can be illustrated on a small perturbation of a quadratic function as illustrated
in Figure 7.4. Given an initial iterate x0 = −1.25 and p = 0.25, the resulting
centered finite difference stencil is shown on the left. The center of the stencil, f(x0),
is denoted with an “*” and f(x0 ± p) is denoted with “o”. Since the center of the
stencil has the lowest function value, the algorithm would proceed and take a descent
step. Then, suppose the next iterate is as in the center picture of Figure 7.4. There,
the lowest function value occurs on the stencil at f(x1) and thus stencil failure has
occurred. In this case, p is reduced by half. Then, the stencil would be as in the
right picture, and stencil failure would not occur, so the algorithm would proceed.
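For illustration, one possible rendering of this outline in Python, using central differences and, for brevity, the identity matrix in place of the quasi-Newton model Hessian that IFFCO maintains (all other simplifications are ours):

import numpy as np

def implicit_filtering(f, x0, p0=0.25, tau=1.0, p_min=1e-4, max_armijo=10):
    """Sketch: difference gradient on scale p, Armijo-style line search, halve
    p on stencil/line-search failure or when the gradient is small."""
    x, p = np.asarray(x0, dtype=float), p0
    n = x.size
    while p > p_min:
        while True:
            grad = np.array([(f(x + p*e) - f(x - p*e)) / (2*p)
                             for e in np.eye(n)])   # central differences
            if np.linalg.norm(grad) <= tau * p:
                break                               # gradient small on this scale
            fx, step_taken = f(x), False
            for m in range(max_armijo):             # least integer with decrease
                lam = 2.0 ** (-m)
                trial = x - lam * grad              # H = I in this sketch
                if f(trial) < fx - 1e-4 * lam * (grad @ grad):
                    x, step_taken = trial, True
                    break
            if not step_taken:
                break                               # treat as stencil failure
        p /= 2.0                                    # reduce the difference increment
    return x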
IFFCO was used to solve the WS problem in [32, 29] using the inactive-well
threshold and also in [50] using a multiplicative penalty term to determine the number of
wells. The behavior for both implementations was similar to that of APPS in that a
good initial iterate was needed to identify the five well solution. With good initial
data, IFFCO identified the solution within 200 function evaluations.
IFFCO and the implicit filtering algorithm in general have been successfully ap-
plied to a variety of other challenging simulation-based optimization applications
including mechanical engineering [12], polymer processing [31], and physiological
modeling [30]. There are several convergence theorems for implicit filtering [13],
Fig. 7.4 The illustration on the left shows the first implicit filtering stencil for a small perturbation of a quadratic function. The center picture shows stencil failure and the picture on the right illustrates a new stencil with a reduced step size
which was particularly designed for the optimization of noisy functions with bound
constraints [35].
Linear and nonlinear constraints may be incorporated into the objective function
via a penalty or barrier approach. The default in IFFCO is to handle constraint viola-
tions using an extreme barrier approach and simply assign a large function value to
any infeasible point. The performance of IFFCO on nonsmooth, nonconvex, noisy
problems, and even those with disconnected feasible regions, is strong, but the dependence
on the initial starting point is well documented [8, 61]. Also, note that
IFFCO includes a projection operator to handle bound constraints.
7.3.2.3 DIRECT
DIRECT, an acronym for DIviding RECTangles, was designed for global optimization
of bound constrained problems as an extension of Shubert's Lipschitz optimization
method [58]. Since its introduction in the early 1990s, a significant number of
papers have been written analyzing, describing, and developing new variations of
this highly effective algorithm. Some of these include [57, 34, 16, 5, 86, 80, 28, 10].
DIRECT is essentially a partitioning algorithm that sequentially refines the re-
gion defined by bound constraints at each iteration by selecting a hyper-rectangle to
trisect [27, 33, 58]. To balance the global and local search, at each iteration a set S of
potentially optimal rectangles is identified based on the function value at the center
of the rectangle and the size of the rectangle. The basic algorithm is as follows:
1. Normalize the bound constraints to form a unit hypercube search space with center c1
2. Find f(c1), set fmin = f(c1), i = 0
3. Evaluate f(c1 ± (1/3) ei), 1 ≤ i ≤ n, where ei is the ith unit vector
4. While not converged Do
   a. Identify the set S of all potentially optimal rectangles
   b. For all j ∈ S, identify the longest sides of rectangle j, evaluate f at centers, and trisect j into smaller rectangles
   c. Update fmin, i = i + 1
Note that DIRECT requires that both the upper and lower bounds be finite. The algorithm
begins by mapping the rectangular feasible region onto the unit hypercube;
that is, DIRECT optimizes the transformed problem

    min f̃(x),  x ∈ Ω̃ = {x ∈ IR^n : 0 ≤ xi ≤ 1, i = 1, . . . , n},   (7.8)

where f̃ denotes the objective f composed with the affine map from the unit hypercube
back to the original bounds.
Fig. 7.5 For a two-dimensional problem, DIRECT iteratively subdivides the optimal hyper-
rectangle into thirds
The criterion for a potentially optimal hyper-rectangle, given a constant ε > 0,
is as follows [58]: Suppose there are K enumerated hyper-rectangles subdividing
the unit hypercube from Equation (7.8) with centers ci, 1 ≤ i ≤ K. Let γi denote
the corresponding distance from the center ci to its vertices. A hyper-rectangle j is
considered potentially optimal if there exists αK > 0 such that

    f̃(cj) − αK γj ≤ f̃(ci) − αK γi,  1 ≤ i ≤ K,  and
    f̃(cj) − αK γj ≤ fmin − ε |fmin|,

where fmin is the best function value found so far. The set of potentially optimal
hyper-rectangles forms the lower convex hull of the set of points {(γi, f̃(ci))}. Figure 7.6
illustrates this. Notice that the user-defined parameter ε controls whether or not the
algorithm performs more of a global or local search.
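The selection of potentially optimal hyper-rectangles can be implemented directly from these two inequalities by an O(K²) feasibility check on αK rather than an explicit convex hull computation; a sketch (the interface is ours):

def potentially_optimal(f_vals, gammas, f_min, eps=1e-4):
    """Return indices j for which some alpha_K > 0 satisfies both inequalities."""
    K, S = len(f_vals), []
    for j in range(K):
        lo, hi, ok = 0.0, float("inf"), True
        for i in range(K):
            df, dg = f_vals[j] - f_vals[i], gammas[j] - gammas[i]
            if dg > 0:
                lo = max(lo, df / dg)       # first inequality: alpha >= df/dg
            elif dg < 0:
                hi = min(hi, df / dg)       # first inequality: alpha <= df/dg
            elif df > 0:
                ok = False                  # same size but worse center value
        if gammas[j] > 0:                   # second inequality bounds alpha below
            lo = max(lo, (f_vals[j] - f_min + eps * abs(f_min)) / gammas[j])
        elif f_vals[j] > f_min - eps * abs(f_min):
            ok = False
        if ok and lo <= hi and hi > 0:      # a strictly positive alpha exists
            S.append(j)
    return S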
Although DIRECT has been shown to be highly effective for relatively small
problems and has proven global convergence, it does suffer at higher dimensions
[16, 87, 80, 28] and requires an excessive number of function evaluations. In [29],
DIRECT was unable to identify a five well solution to the WS problem when start-
ing with an initial six well configuration and using the constraint in Equation (7.6).
These results are not surprising given that the five well solution has all of the pump-
ing rates lying on the bound constraint. The sampling strategy of DIRECT does not
make it a good candidate for this problem.
Fig. 7.6 Potentially optimal hyper-rectangles can be found by forming the convex hull of the set of points {(γi, f(ci))}, where ci denotes the center point of the ith hyper-rectangle and γi the corresponding distance to the hyper-rectangle's vertices
calculated at each candidate location. The points with highest expected improve-
ment are selected as candidates for the new best point.
Standard GP models have several drawbacks, including strong assumptions of
stationarity and poor computational scaling. To reduce these problems, treed Gaus-
sian process (TGP) models partition the input space using a recursive tree structure;
and independent GP models are fit within each partition [37, 38]. Such models are
a natural extension of standard GP models, and combine partitioning ideas with
Bayesian methods to produce smooth fitted functions [9]. The partitions can be fit
simultaneously with the parameters of the embedded GP models using reversible
jump Markov chain Monte Carlo [41].
Note that the statistical emulation via TGP has the disadvantage of computa-
tional expense. As additional points are evaluated, the computational work load of
creating the GP model increases significantly. This, coupled with some convergence
issues as TGP approaches the solution, indicates that TGP alone is not an effective
method for solving the WS problem. However, TGP is an excellent method for
inclusion in a hybrid because these disadvantages can be overcome.
7.4.1 APPS-TGP
Some optimization methods have introduced an oracle to predict additional points at
which a decrease in the objective function might be observed. Analytically, an oracle
is free to choose points by any finite process. (See [63] and references therein.)
The addition of an oracle is particularly amenable to pattern search methods like
APPS. The iterate(s) suggested by the oracle are merely additions to the pattern.
Furthermore, the asynchronous nature of the APPSPACK implementation makes it
adept at handling the evaluation of the additional points. The idea of an oracle is
used as a basis for creating a hybrid optimization scheme which combines APPS
and the statistical emulator TGP.
In the APPS-TGP hybrid, the TGP statistical model serves as the oracle. The
hope in utilizing the TGP oracle is added robustness and the introduction of
some global properties to APPSPACK. When the oracle is called, the TGP algorithm
is applied to the set of evaluated iterates in order to choose additional candidate
points. In other words, APPSPACK is still optimizing as normal, but throughout
the optimization process, the iterate pairs (xi , f (xi )) are collected. Then, the TGP
model is fit to the existing output, and the expected improvement is calculated at
each candidate location. The points with highest expected improvement are passed
back to the APPS algorithm to determine if it is a new best point. If not, the point is
merely discarded and the APPS algorithm continues without any changes. However,
if a TGP point is a new best point, the APPSPACK search pattern continues from
that new location. The general flow of this algorithm is illustrated in Figure 7.7.
Note that both APPS and TGP generate points and both methods are informed of
the function values associated with these iterates.
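The loop behind Figure 7.7 can be sketched as follows; apps and tgp_oracle are hypothetical wrappers around the two point generators, not the APPSPACK interface:

def apps_tgp(f, apps, tgp_oracle, budget):
    cache = {}                                   # maps points to function values
    evaluations = 0
    while evaluations < budget and not apps.converged():
        # Both generators propose points; proposals are merged into one list.
        candidates = apps.propose() + tgp_oracle.propose(cache)
        for x in candidates:
            key = tuple(x)
            if key not in cache:                 # evaluate only unseen points
                cache[key] = f(x)
                evaluations += 1
            apps.observe(x, cache[key])          # may become the new best point
        tgp_oracle.refit(cache)                  # fit TGP to all (x, f(x)) pairs
    return apps.best()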
This hybrid technique is loosely coupled as APPS and TGP run independently of
each other. Since the iterates suggested by the TGP algorithm are used in addition
Fig. 7.7 The flow of the APPS-TGP hybrid. Both APPS and TGP generate iterates. The
iterates are merged into one list. Then, the function value of each iterate is either obtained
from cache or evaluated. Finally, the results are shared with both methods
to the iterates suggested by APPSPACK, there is no adverse effect on the local convergence
properties of APPS. As noted earlier, pattern search methods have strong
local convergence properties [25, 66, 85]. However, their weakness is that they are
local methods. In contrast, TGP performs a global search of the feasible region,
but does not have strong local convergence properties. Hence, using the hybridiza-
tion scheme, TGP lends a globalization to the pattern search and the pattern search
further refines TGP iterates by local search. This benefit is clearly illustrated on a
model calibration problem from electrical engineering in [82] and on a groundwater
remediation problem in [39].
APPS-TGP is also collaborative since APPS and TGP are basically run indepen-
dently of one another. From the perspective of TGP, a growing cache of function
evaluations is being cultivated, and the sole task of TGP is to build a model and se-
lect a new set of promising points to be evaluated. The TGP algorithm is not depen-
dent on where this cache of points comes from. Thus in this approach, we may eas-
ily incorporate other optimization strategies where each strategy is simply viewed
as an external point generating mechanism leveraged by TGP. From the perspective
of APPS, points suggested by TGP are interpreted in an identical fashion to other
trial points and are ignored unless deemed better than the current best point held
by APPS. Thus neither algorithm is aware that a concurrent algorithm is running in
parallel. However, the hybridization is integrative in the sense that points submitted
by TGP are given a higher priority in the queue of iterates to be evaluated. In the
parallel execution of APPS and TGP, TGP is given one processor (because it is computationally
intensive) while APPS directs the use of the remaining processors to
perform point evaluations. Communication between TGP and APPSPACK occurs
intermittently throughout the optimization process, whenever TGP completes and
is ready to look at a new cache of points.
7.4.2 EAGLS
To address mixed-variable, nonlinear optimization problems (MINLPs) of the form

    min f(x) subject to c(x) ≤ 0, with some components of x restricted to integer values,

where c(x) : IR^n → IR^m, consider a hybrid of a GA and a direct search. The APPS-GA
hybrid, commonly referred to as EAGLS (Evolutionary Algorithm Guiding Local
Search), uses the GA’s handling of integer and real variables for global search, and
APPS’s handling of real variables in parallel for local search [43].
As previously discussed, a GA carries forward a population of points that are it-
eratively mutated, merged, selected, or dismissed. However, individuals in the pop-
ulation are not given the opportunity to make intergeneration improvements. This
is not reflective of the real world, where an organism is not constant throughout its
life span, but instead can grow, improve, or become stronger. Improvements within
a generation are allowed in EAGLS. The GA still governs point survival, mutation,
and merging as an outer iteration, but, during an inner iteration, individual points
are improved via APPS applied to the real variables, with the integer variables held
fixed. For simplicity, consider the synchronous EAGLS algorithm:
1. Evaluate initial population
2. While not converged Do
a. Perform selection, mutation, crossover
b. Evaluate new points
c. Choose points for local search
d. Make multiple calls to APPS for real-valued subproblems
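In code form, the synchronous outline reads roughly as follows; all helper objects are hypothetical stand-ins, not the actual NSGA-II or APPSPACK interfaces:

def eagls(f, ga, apps, n_local, n_generations):
    population = ga.evaluate(ga.initial_population(), f)
    for _ in range(n_generations):
        # Outer iteration: the GA handles both integer and real variables.
        population = ga.evaluate(ga.select_mutate_crossover(population), f)
        # Inner iteration: refine a few promising, distinct individuals.
        for ind in ga.rank_distinct_candidates(population)[:n_local]:
            # APPS optimizes the real variables; integers are held fixed.
            ind.real_vars = apps(lambda x, z=ind.int_vars: f(x, z),
                                 ind.real_vars)
    return ga.best(population)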
Of course, to allow the entire population to grow as such would be computationally
prohibitive. Thus, EAGLS employs a ranking algorithm that takes into account individual
proximity to other, better points. The goal of this step is to select promising
individuals representing distinct subdomains. Note that the flow of the asynchronous
EAGLS algorithm is slightly different than that of APPS-TGP. In this case, NSGA-
II generates iterates and multiple instances of APPS also generate iterates. Returned
Fig. 7.8 The flow of EAGLS. The NSGA-II algorithm generates iterates. Then, some iterates are selected for refinement by multiple instances of APPS. The iterates are merged into one list, and the function value of each iterate is either obtained from cache or evaluated. Finally, the results are returned so that the APPS instances and the GA can proceed
function values are distributed to the appropriate instance of APPS or the GA. This
is illustrated in Figure 7.8.
Note that it is the combinatorial nature of integer variables that makes the solution
of MINLPs difficult. If the integer variables are relaxable (i.e. the objective function
is defined for rational variables), more sophisticated schemes such as branch and
bound may be preferred options. However, for simulation-based optimization prob-
lems, the integer variables often represent a descriptive category (ı.e. a color or a
building material) and may lack the natural ordering required for relaxation. That
is, there is may be no well-defined mathematical definition for what is meant by
“nearby.” In the WS problem, the number of wells is not a relaxable variable be-
cause, for example, one-half a well cannot be installed. The other results for the
WS problem given in this chapter consider the strictly real-valued WS formula-
tion. However, since EAGLS was designed to handle MINLPs, it was applied to the
MINLP formulation of WS. Moreover, EAGLS combines a global and local search
in order to take advantage of the global properties and overcome the computational
expense of the GA.
To illustrate the global properties of EAGLS, the problem was solved without
an initial point. In [43], EAGLS was able to locate a five well solution using only
random points in the initial GA population. Moreover, this was done after about
65 function evaluations. This is an improvement both for the GA and for the local
search method APPS. The function evaluation budget was 3000, and roughly 1000
of those evaluations were spent on points that did not satisfy the linear constraint in
Equation (7.5), which means the simulator was never called for them.
7.4.3 DIRECT-IFFCO
A simple sequential hybrid was proposed in [8] where the global search strengths
of DIRECT were used to generate feasible starting points for IFFCO. This hybrid
further addresses the weakness that DIRECT may require a large number of func-
tion evaluations to find a highly accurate solution. In that work, DIRECT and IF-
FCO, which was initialized using random points, were compared to the sequen-
tial pairing for a gas pipeline design application and a set of global test problems.
The pipeline problems were significantly challenging since the underlying objective
function would often fail to return a value. This is referred to as a hidden constraint
in simulation-based or black-box optimization. DIRECT showed some evidence of
robustness in terms of locating global minima but often required an excessive num-
ber of function evaluations. IFFCO alone showed mixed performance; sometimes
refining the best solution once a feasible point was located but often converging to
a suboptimal local minimum. For the hybrid, the results were promising. Even if
the function value at the end of the DIRECT iteration was high, IFFCO was able
to avoid entrapment in a local minimum using the results. In fact, using DIRECT as
a generator of starting points for local searches has been actively studied over the
years and applied to a variety of applications. For example, in [17] DIRECT was
paired with a sequential quadratic programming method for the local search and
outperformed a variety of other global methods applied to an aircraft design prob-
lem and in [70] a gradient-based local search was shown to accelerate convergence
to the global minimum for a flight control problem.
A different idea was used in [29] where DIRECT was used in conjunction with
IFFCO to find starting points for the WS problem. In this case, DIRECT was used to
minimize an aggregate of constraint violation and thereby identify sets of feasible
starting points and then IFFCO was used to minimize the true objective function.
This approach was not successful in that the points identified by DIRECT were so
close to multiple local minima that IFFCO was unable to improve the objective func-
tion value. In particular, IFFCO would only converge if initial points contained five
wells operating on the bound constraint for their pumping rates, and DIRECT did
not identify any points of this sort. The advantages obtained by combining DIRECT
and IFFCO do not address the characteristics of the WS problem that make it diffi-
cult to solve. However, it should be noted that the idea of using DIRECT and IFFCO
together in this sort of bi-objective approach certainly warrants further investigation
despite the performance on the WS problem.
7.4.4 DIRECT-TGP
Another attempt to improve the local search of DIRECT involves TGP with a
gradient-based method on the surrogate model, which is cheap to minimize [52].
Hybridization in this case is performed at the iteration level in that the center of the
current rectangle is used as a starting point for a local search on the surrogate. Es-
sentially, the procedure for dividing hyper-rectangles in Step 4(b) in Section 3.2.3
above is replaced with the following steps once the number of function evaluations
is larger than 2n + 1, which allows for the initial hypercube sampling:
1. Build TGP surrogate using all known function evaluations
2. Start local search on the surrogate, constrained to the rectangle, using the center
of the rectangle as the initial point
3. Evaluate f at the local optimum xloc
4. Return f(xloc) instead of f(ci)
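These four steps can be sketched as follows; fit_tgp and local_search_on are hypothetical helpers standing in for the surrogate fit and the cheap gradient-based search on it:

def evaluate_center_with_surrogate(f, history, rect):
    surrogate = fit_tgp(history)                 # step 1: build the TGP surrogate
    x_loc = local_search_on(surrogate,           # step 2: local search on the
                            start=rect.center,   # surrogate, constrained to
                            bounds=rect.bounds)  # the current rectangle
    f_loc = f(x_loc)                             # step 3: one true evaluation
    history.append((x_loc, f_loc))
    return f_loc                                 # step 4: used instead of f(ci)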
The algorithm, although relatively new, has been tested on a suite of bound con-
strained and nonlinearly constrained problems and a cardiovascular modeling
problem proposed in [30].
These promising preliminary results indicate this new hybrid can improve the
local search capabilities of DIRECT. This is achieved without compromising the
computational efficiency and with practically no additional algorithmic parameters
to fine-tune. It should also be noted that other hybrids that attempt to improve the lo-
cal search of DIRECT have been proposed. For example, [44] proposes a DIRECT-
GSS hybrid. The resulting algorithm does show some promising results in terms
of reducing the computational workload required to solve the optimization prob-
lem, but it has only been investigated on test problems from the literature. Further
tests are needed to determine its applicability to engineering applications. Given the
performance of the DIRECT-IFFCO approach above, any local search hybrid with
DIRECT would likely not perform well on the WS problem.
Table 7.2 Summary of the Performance of some Derivative-Free Methods for the WS
Problem
The current state of the art is to accept an iterate as an optimum based on the inability
to find a better guess within a decreasing search region. This may lead to solutions
to design problems that are undesirable due to a lack of robustness to small design
perturbations. Instead, algorithms that allow designers to choose a solution based
on additional criteria can be created in the hybrid framework. For example, a re-
gional optimum could be used to generate a set of multiple solutions from which
the designer can choose.
Acknowledgments. The authors would like to thank the American Institute of Mathematics
(AIM) for their support of the EAGLS research. We thank Josh Griffin for his assistance with
Figures 7.5 and 7.6 included in Section 3.2.3 to illustrate DIRECT and Tammy Kolda for
her assistance with Figure 7.3 included in Section 3.2.1 to illustrate APPS. We also thank
Josh Griffin, Matt Parno, and Thomas Hemker for their contributions to the hybrid meth-
ods research. Sandia National Laboratories is a multiprogram laboratory operated by Sandia
Corporation, a Lockheed Martin Company, for the United States Department of Energy’s
National Nuclear Security Administration under Contract DE-AC04-94AL85000.
References
[1] Alba, E.: Parallel Metaheuristics. John Wiley & Sons, Chichester (2005)
[2] Audet, C., Booker, A., et al.: A surrogate-model-based method for constrained opti-
mization. In: AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis
and Optimization (2000)
[3] Audet, C., Couture, G., Dennis Jr, J.E.: Nonlinear optimization with mixed variables
and derivatives, NOMAD (2002)
[4] Audet, C., Dennis Jr., J.E.: Mesh adaptive direct search algorithms for constrained op-
timization. Technical report, Ecole Polytechnique de Montreal, Departement de Math-
ematiques et de Genie Industriel, Montreal (Quebec), H3C 3A7 Canada (2004)
[5] Bartholomew-Biggs, M.C., Parkhurst, S.C., Wilson, S.P.: Global optimization –
stochastic or deterministic? In: Albrecht, A.A., Steinhöfel, K. (eds.) SAGA 2003.
LNCS, vol. 2827, pp. 125–137. Springer, Heidelberg (2003)
[6] Blum, C., Blesa Aquilera, M.J., Roli, A., Sampels, M.: Hybrid Metaheuristics. SCI.
Springer, Heidelberg (2008)
[7] Booker, A.J., Meckesheimer, M.: Reliability based design optimization using design
explorer. Opt. Eng. 5, 170–205 (2004)
[8] Carter, R., Gablonsky, J.M., Patrick, A., Kelley, C.T., Eslinger, O.J.: Algorithms for
noisy problems in gas transmission pipeline optimization. Opt. Eng., 139–157 (2001)
[9] Chipman, H.A., George, E.I., McCulloch, R.E.: Bayesian treed models. Machine Learn-
ing 48, 303–324 (2002)
[10] Chiter, L.: Direct algorithm: A new definition of potentially optimal hyperrectangles.
Appl. Math. Comput. 179(2), 742–749 (2006)
[11] Choi, T.D., Eslinger, O.J., Gilmore, P., Patrick, A., Kelley, C.T., Gablonsky, J.M.:
IFFCO: Implicit Filtering for Constrained Optimization, Version 2. Technical Report
CRSC-TR99-23, North Carolina State University (July 1999)
[12] Choi, T.D., Eslinger, O.J., Kelley, C.T., David, J.W., Etheridge, M.: Optimization of
automotive valve train components with implict filtering. Optim. Engrg. 1, 9–28 (2000)
[13] Choi, T.D., Kelley, C.T.: Superlinear convergence and implicit filtering. SIAM J.
Opt. 10, 1149–1162 (2000)
[14] Conn, A., Scheinberg, K., Vincente, L.N.: Introduction to Derivative-Free Optimization.
SIAM, Philadelphia (2009)
[15] Cotta, E.-G., Talbi, E.A.: Parallel Hybrid Metaheuristics. In: Parallel Metaheuristics,
pp. 347–370. John Wiley & Sons, Inc, Chichester (2005)
[16] Cox, S.E., Hart, W.E., Haftka, R., Watson, L.: DIRECT algorithm with box penetra-
tion for improved local convergence. In: 9th AIAA/ISSMO Symposium on Multidisci-
plinary Analysis and Optimization (2002)
[17] Cox, S.L., Haftka, R.T., Baker, C.A., Grossman, B., Mason, W.H., Watson, L.T.: A com-
parison of global optimization methods for the design of a high-speed civil transport.
Journal of Global Optimization 21, 415–433 (2001)
[18] Cressie, N.A.C.: Statistics for Spatial Data, revised edition. John Wiley & Sons,
Chichester (1993)
[19] Deb, K.: An efficient constraint handling method for genetic algorithms. Comp. Meth-
ods Appl. Mech. Eng. 186(2-4), 311–338 (2000)
[20] Deb, K., Goel, T.: Controlled elitist non-dominated sorting genetic algorithms for
better convergence. In: Zitzler, E., Deb, K., Thiele, L., Coello Coello, C.A., Corne,
D.W. (eds.) EMO 2001. LNCS, vol. 1993, p. 67. Springer, Heidelberg (2001)
[21] Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A fast and elitist multiobjective genetic
algorithm: NSGA-II. IEEE Trans. Evolutionary Comp. 6(2), 182–197 (2002)
[22] Ting, C.-K.: An analysis of the effectiveness of multi-parent crossover. In: Yao, X.,
Burke, E.K., Lozano, J.A., Smith, J., Merelo-Guervós, J.J., Bullinaria, J.A., Rowe, J.E.,
Tiňo, P., Kabán, A., Schwefel, H.-P. (eds.) PPSN 2004. LNCS, vol. 3242, pp. 131–140.
Springer, Heidelberg (2004)
[23] Dennis Jr., J.E., Torczon, V.: Direct search methods on parallel machines. SIAM J.
Opt. 1, 448–474 (1991)
[24] Diaconis, P.: Bayesian numerical analysis. In: Gupta, S.S., Berger, J.O. (eds.) Statistical
Decision Theory and Related Topics IV, Springer, Heidelberg (1988)
[25] Dolan, E.D., Lewis, R.M., Torczon, V.: On the local convergence properties of parallel
pattern search. Technical Report 2000-36, NASA Langley Research Center, Inst. Com-
put. Appl. Sci. Engrg., Hampton, VA (2000)
[26] Fan, S.-K.S., Zahara, E.: A hybrid simplex search and particle swarm optimization for
unconstrained optimization. Eur. J. Oper. Res. 181(2), 527–548 (2007)
148 G.A. Gray and K.R. Fowler
[27] Finkel, D.E.: Global Optimization with the DIRECT Algorithm. PhD thesis, North Car-
olina State Univ., Raleigh, NC (2005)
[28] Finkel, D.E., Kelley, C.T.: Additive scaling and the DIRECT algorithm. J. Global Op-
tim. 36(4), 597–608 (2006)
[29] Fowler, K.R., et al.: A comparison of derivative-free optimization methods for water
supply and hydraulic capture community problems. Adv. Water Resourc. 31(5), 743–
757 (2008)
[30] Fowler, K.R., Gray, G.A., Olufsen, M.S.: Modeling heart rate regulation part ii: Param-
eter identification and analysis. J. Cardiovascular Eng. 8(2) (2008)
[31] Fowler, K.R., Jenkins, E.W., LaLonde, S.L.: Understanding the effects of polymer ex-
trusion filter layering configurations using simulation-based optimization. Optim. En-
grg. 11, 339–354 (2009)
[32] Fowler, K.R., Kelley, C.T., Miller, C.T., Kees, C.E., Darwin, R.W., Reese, J.P.,
Farthing, M.W., Reed, M.S.C.: Solution of a well-field design problem with implicit
filtering. Opt. Eng. 5, 207–234 (2004)
[33] Gablonsky, J.M.: DIRECT Version 2.0 User Guide. Technical Report CRSCTR01- 08,
Center for Research in Scientific Computation, NC State (2001)
[34] Gablonsky, J.M., Kelley, C.T.: A locally-biased form of the DIRECT algorithm. J.
Global Optim. 21(1), 27–37 (2001)
[35] Gilmore, P., Kelley, C.T.: An implicit filtering algorithm for optimization of functions
with many local minima. SIAM J. Opt. 5, 269–285 (1995)
[36] Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning.
Addison-Wesley, Reading (1989)
[37] Gramacy, R.B., Lee, H.K.H.: Bayesian treed Gaussian process models. Technical report,
Dept. of Appl. Math & Statist., Univ. of California, Santa Cruz (2006)
[38] Gramacy, R.B., Lee, H.K.H.: Bayesian treed Gaussian process models with an applica-
tion to computer modeling. J. Amer. Statist. Assoc. 103, 1119–1130 (2008)
[39] Gray, G.A., Fowler, K., Griffin, J.D.: Hybrid optimization schemes for simulation based
problems. Procedia Comp. Sci. 1(1), 1343–1351 (2010)
[40] Gray, G.A., Kolda, T.G.: Algorithm 856: APPSPACK 4.0: Asynchronous parallel
pattern search for derivative-free optimization. ACM Trans. Math. Software 32(3),
485–507 (2006)
[41] Green, P.J.: Reversible jump markov chain monte carlo computation and bayesian
model determination. Biometrika 82, 711–732 (1995)
[42] Grefenstette, J.J.: Optimization of control parameters for genetic algorithms. IEEE
Trans. Sys. Man Cybernetics, SMC 16(1), 122–128 (1986)
[43] Griffin, J.D., Fowler, K.R., Gray, G.A., Hemker, T., Parno, M.D.: Derivative-free opti-
mization via evolutinary algorithms guiding local search (EAGLS) for MINLP. Pac. J.
Opt (2010) (to appear)
[44] Griffin, J.D., Kolda, T.G.: Asynchronous parallel hybrid optimization combining
DIRECT and GSS. Optim. Meth. Software 25(5), 797–817 (2010)
[45] Griffin, J.D., Kolda, T.G.: Nonlinearly-constrained optimization using heuristic penalty
methods and asynchronous parallel generating set search. Appl. Math. Res. ex-
press 2010(1), 36–62 (2010)
[46] Griffin, J.D., Kolda, T.G., Lewis, R.M.: Asynchronous parallel generating set search for
linearly-constrained optimization. SIAM J. Sci. Comp. 30(4), 1892–1924 (2008)
[47] Gropp, W., Lusk, E., Doss, N., Skjellum, A.: A high-performance, portable implemen-
tation of the MPI message passing interface standard. Parallel Comput. 22, 789–828
(1996)
7 Traditional and Hybrid Derivative-Free Optimization Approaches 149
[48] Gropp, W.D., Lusk, E.: User’s guide for mpich, a portable implementation of MPI.
Technical Report ANL-96/6, Mathematics and Computer Science Division, Argonne
National Lab (1996)
[49] Hackett, P.: A comparison of selection methods based on the performance of a genetic
program applied to the cart-pole problem (1995) ; A Bachelor’s thesis for Griffith Uni-
versity, Gold Coast Campus, Queensland
[50] Hemker, T., Fowler, K.R., Farthing, M.W., von Stryk, O.: A mixed-integer simulation-
based optimization approach with surrogate functions in water resources management.
Opt. Eng. 9(4), 341–360 (2008)
[51] Hemker, T., Fowler, K.R., von Stryk, O.: Derivative-free optimization methods for han-
dling fixed costs in optimal groundwater remediation design. In: Proc. of the CMWR
XVI - Computational Methods in Water Resources, June 19-22 (2006)
[52] Hemker, T., Werner, C.: Direct using local search on surrogates. Submitted to Pac. J.
Opt. (2010)
[53] Holland, J.H.: Adaption in Natural and Artificial Systems. Univ. of Michigan Press,
Ann Arbor (1975)
[54] Holland, J.H.: Genetic algorithms and the optimal allocation of trials. SIAM J. Com-
put. 2 (1975)
[55] Hough, P.D., Kolda, T.G., Torczon, V.: Asynchronous parallel pattern search for non-
linear optimization. SIAM J. Sci. Comput. 23, 134–156 (2001)
[56] Hough, P.D., Meza, J.C.: A class of trust-region methods for parallel optimization.
SIAM J. Opt. 13(1), 264–282 (2002)
[57] Jones, D.R.: The direct global optimization algorithm. In: Encyclopedia of Optimiza-
tion, vol. 1, pp. 431–440. Kluwer Academic, Boston (2001)
[58] Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the
lipschitz constant. J. Opt. Theory Apps. 79(1), 157–181 (1993)
[59] Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive
blackbox functions. J. Global Optim. 13, 455–492 (1998)
[60] Karr, C., Freeman, L.M.: Industrial Applications of Genetic Algorithms. International
Series on Computational Intelligence. CRC Press, Boca Raton (1998)
[61] Kelley, C.: Iterative methods for optimization. SIAM, Philadelphia (1999)
[62] Kolda, T.G.: Revisiting asynchronous parallel pattern search. Technical Report
SAND2004-8055, Sandia National Labs, Livermore, CA 94551 (February 2004)
[63] Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search: New perspectives
on some classical and modern methods. SIAM Rev. 45(3), 385–482 (2003)
[64] Kolda, T.G., Lewis, R.M., Torczon, V.: Stationarity results for generating set search for
linearly constrained optimization. SIAM J. Optim. 17(4), 943–968 (2006)
[65] Lewis, R.M., Shepherd, A., Torczon, V.: Implementing generating set search methods
for linearly constrained minimization. Technical Report WMCS- 2005-01, Department
of Computer Science, College of William & Mary, Williamsburg, VA (July 2006) (re-
vised)
[66] Lewis, R.M., Torczon, V.: Rank ordering and positive basis in pattern search algorithms.
Technical Report 96-71, NASA Langley Research Center, Inst. Comput. Appl. Sci. En-
grg., Hampton, VA (1996)
[67] Lewis, R.M., Torczon, V., Trosset, M.W.: Direct search methods: Then and now. J.
Comp. Appl. Math. 124(1-2), 191–207 (2000)
[68] Lobo, F.G., Lima, C.F., Michalewicz, Z. (eds.): Parameter settings in evolutionary
algorithms. Springer, Heidelberg (2007)
150 G.A. Gray and K.R. Fowler
[69] McKinney, D.C., Lin, M.D.: Approximate mixed integer nonlinear programming meth-
ods for optimal aquifer remdiation design. Water Resour. Res. 31, 731–740 (1995)
[70] Menon, P.P., Bates, D.G., Postlethwaite, I.: A deterministic hybrid optimization algo-
rithm for nonlinear flight control systems analysis. In: Proceedings of the 2006 Amer-
ican Control Conference, Minneapolis, MN, pp. 333–338. IEEE Computer Society
Press, Los Alamitos (2006)
[71] Meyer, A.S., Kelley, C.T., Miller, C.T.: Electronic supplement to ”optimal design
for problems involving flow and transport in saturated porous media”. Adv. Water
Resources 12, 1233–1256 (2002)
[72] Meyer, A.S., Kelley, C.T., Miller, C.T.: Optimal design for problems involving flow and
transport in saturated porous media. Adv. Water Resources 12, 1233–1256 (2002)
[73] Payne, J.L., Eppstein, M.J.: A hybrid genetic algorithm with pattern search for finding
heavy atoms in protein crystals. In: GECCO 2005: Proceedings of the 2005 conference
on Genetic and evolutionary computation, pp. 374–384. ACM Press, New York (2005)
[74] Plantenga, T.D.: HOPSPACK 2.0 User Manual (v 2.0.1). Technical Report SAND2009-
6265, Sandia National Labs, Livermore, CA (2009)
[75] Powell, M.J.D.: Direct search algorithms for optimization calculations. Acta Numer. 7,
287–336 (1998)
[76] Raidl, G.R.: A unified view on hybrid metaheuristics. In: Almeida, F., Blesa Aguilera,
M.J., Blum, C., Moreno Vega, J.M., Pérez Pérez, M., Roli, A., Sampels, M. (eds.) HM
2006. LNCS, vol. 4030, pp. 1–12. Springer, Heidelberg (2006)
[77] Regis, R.G., Shoemaker, C.A.: Constrained global optimization of expensive black box
functions using radial basis functions. J. Global Opt. 31 (2005)
[78] Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P.: Design and analysis of computer
experiments. Statist. Sci. 4, 409–435 (1989)
[79] Santner, T.J., Williams, B.J., Notz, W.I.: The Design and Analysis of Computer Exper-
iments. Springer, New York (2003)
[80] Siah, E.S., Sasena, M., Volakis, J.L., Papalambros, P.Y., Wiese, R.W.: Fast parameter
optimization of large-scale electromagnetic objects using DIRECT with Kriging meta-
modeling. IEEE T. Microw. Theory 52(1), 276–285 (2004)
[81] Stein, M.L.: Interpolation of Spatial Data. Springer, New York (1999)
[82] Taddy, M., Lee, H.K.H., Gray, G.A., Griffin, J.D.: Bayesian guided pattern search for
robust local optimization. Technometrics 51(4), 389–401 (2009)
[83] Talbi, E.G.: A taxonomy of hybrid metaheurtistics. J. Heuristics 8, 541–564 (2004)
[84] Torczon, V.: PDS: Direct search methods for unconstrained optimization on either se-
quential or parallel machines. Technical Report TR92-09, Rice Univ., Dept. Comput.
Appl. Math., Houston, TX (1992)
[85] Torczon, V.: On the convergence of pattern search algorithms. SIAM J. Opt. 7, 1–25
(1997)
[86] Wachowiak, K.P., Peters, T.M.: Combining global and local parallel optimization for
medical image registration. In: Fitzpatrick, J.M., Reinhardt, J.M. (eds.) Medical Imag-
ing 2005: Image Processing, vol. 5747, pp. 1189–1200. SPIE, San Jose (2005)
[87] Wachowiak, M.P., Peters, T.M.: Parallel optimization approaches for medical image
registration. In: Barillot, C., Haynor, D.R., Hellier, P. (eds.) MICCAI 2004. LNCS,
vol. 3216, pp. 781–788. Springer, Heidelberg (2004)
[88] Wild, S., Regis, R.G., Shoemaker, C.A.: ORBIT: optimization by radial basis function
interpolation in trust region. SIAM J. Sci. Comput. 30(6), 3197–3219 (2008)
7 Traditional and Hybrid Derivative-Free Optimization Approaches 151
[89] Wright, M.H.: Direct search methods: Once scorned, now respectable. In: Griffiths,
D.F., Watson, G.A. (eds.) Numerical Analysis 1995 (Proceedings of the 1995 Dundee
Biennial Conference in Numerical Analysis). Pitman Research Notes in Mathematics,
vol. 344, pp. 191–208. CRC Press, Boca Raton (1996)
[90] Yehui, P., Zhenhai, L.: A derivative-free algorithm for unconstrained optimization.
Appl. Math. - J. Chinese Univ. 20(4), 491–498 (2007)
[91] Zhang, T., Choi, K.K., et al.: A hybrid surrogate and pattern search optimiza-
tion method and application to microelectronics. Struc. Multidisiciplinary Opt. 32,
327–345 (2006)
[92] Zheng, C., Hill, M.C., Hsieh, P.A.: MODFLOW2000, The U.S.G.S Survey Modular
Ground-Water Model User Guide to the LMT6 Package, the Linkage With MT3DMS
for Multispecies Mass Transport Modeling. USGS, user’s guide edition (2001)
[93] Zitzler, E., Deb, K., Thiele, L.: Comparison of multiobjective evolutionary algorithms:
Empirical results. Evolutionary Comp. J. 8(2), 173–195 (2000)
Chapter 8
Simulation-Driven Design in Microwave Engineering: Methods
8.1 Introduction
Computer-aided full-wave electromagnetic analysis has been used in microwave
engineering for a few decades. Initially, its main application area was design
verification. Electromagnetic (EM) simulations can be highly accurate but, at the same time, computationally expensive. The design task can be formulated as a nonlinear minimization problem of the form

x* = arg min_x U(Rf(x))   (8.1)
where Rf ∈ Rm denotes the response vector of the device of interest, e.g., the
modulus of the transmission coefficient |S21| evaluated at m different frequencies. U
is a given scalar merit function, e.g., a minimax function with upper and lower
specifications [7]. Vector x* is the optimal design to be determined. Normally, Rf is
obtained through computationally expensive electromagnetic simulation. It is re-
ferred to as the high-fidelity or fine model.
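To make the merit function concrete, the sketch below shows one possible minimax-type U with upper and lower specifications on a response in dB; the function name and spec format are illustrative assumptions, not notation from the chapter.

```python
import numpy as np

def merit_minimax(response_db, freq, upper_specs=(), lower_specs=()):
    """Minimax merit function U for a response such as |S21| in dB.

    Each spec is a (f_lo, f_hi, level) tuple: the response should stay
    below 'level' on [f_lo, f_hi] for upper specs, above it for lower
    specs. U <= 0 means all specifications are satisfied."""
    violations = []
    for f_lo, f_hi, level in upper_specs:
        band = (freq >= f_lo) & (freq <= f_hi)
        violations.append(np.max(response_db[band] - level))
    for f_lo, f_hi, level in lower_specs:
        band = (freq >= f_lo) & (freq <= f_hi)
        violations.append(np.max(level - response_db[band]))
    return max(violations)  # worst-case (minimax) spec violation
```

For a 5 GHz bandpass filter, for example, one might use lower_specs=[(4.8e9, 5.2e9, -3.0)] for the passband and upper_specs=[(1e9, 4.2e9, -20.0), (5.8e9, 1e10, -20.0)] for the stopbands.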
The conventional way of handling the design problem (8.1) is to employ the EM
simulator directly within the optimization loop as illustrated in Fig. 8.1. This direct
approach faces some fundamental difficulties. The most important one is the compu-
tational cost. EM simulation of a microwave device at a single design can take from minutes to hours, or even longer.
Fig. 8.1 Conventional simulation-driven design optimization: the EM solver is directly em-
ployed in the optimization loop. Each modification of the design requires additional simula-
tion of the structure under consideration. Typical (e.g., gradient-based) optimization algo-
rithms may require tens or hundreds of computationally expensive iterations.
In several application areas, particularly antenna design, population-based search techniques are used, such as genetic algorithms [13], [14] or particle swarm optimizers [15], [16].
These algorithms are mostly exploited to handle issues such as multiple local op-
tima for antenna-related design problems, although they suffer from substantial
computational overhead. Probably the best picture of the state of the art in the
automated EM-simulation-based design optimization is given by the methods that
are available in major commercial software packages such as CST Microwave
Studio [9], Sonnet Software [17], HFSS [10], or FEKO [18]. All these packages
offer traditional techniques including gradient-based algorithms, simplex search, or
genetic algorithms. Practical use of these methods is quite limited.
One of possible ways of alleviating the difficulties of EM-simulation-based de-
sign optimization is the use of adjoint sensitivities. The adjoint sensitivity ap-
proach dates back to the 1960s work of Director and Rohrer [8]. Bandler et al.
[19] also addressed adjoint circuit sensitivities, e.g., in the context of microwave
design. Interest in EM-based adjoint calculations was revived after the work [20]
was published. Since 2000, a number of interesting publications addressed the ap-
plication of the so-called adjoint variable method (AVM) to different numerical
EM solvers. These include the time-domain transmission-line modeling (TLM)
method [21], the finite-difference time-domain (FDTD) method [22], the finite-
element method (FEM) [23], the method of moments (MoM) [24], the frequency
domain TLM [25], and the mode-matching method (MM) [26]. These approaches
can be classified as either time-domain adjoint variable methods or frequency-
domain adjoint variable methods. Adjoint sensitivity is an efficient way to speed
up (and, in most cases, actually make feasible) gradient-based optimization using
EM solvers, as the derivative information can be obtained with no extra EM simu-
lation of the structure in question. As mentioned before, adjoint sensitivities are
currently implemented in some major commercial EM simulation packages, particu-
larly in CST Microwave Studio [9] and in HFSS [10]. As of now, adjoint sensitivity is only available for frequency-domain solvers; however, CST plans to implement it in the time domain in one of the next releases.
Another way of improving efficiency of simulation-driven design is circuit de-
composition, i.e., breaking down an EM model into smaller parts and combining
them in a circuit simulator to reduce the CPU-intensity of the design process [27]-
[29]. Co-simulation or co-optimization of EM/circuit is a common industry solu-
tion to blend EM-simulated components into circuit models. In general, though, this is only a partial solution because the EM-embedded co-simulation model is still subject to direct optimization.
Surrogate-based optimization (SBO) methods are treated in some detail in Chapter 3; here, only some background information is presented. The primary reason for using the SBO approach in microwave engineering is to speed up the design process by shifting the optimization burden to an inexpensive yet reasonably accurate surrogate model of the device. In the generic SBO framework described here, the direct optimization of the computationally expensive EM-simulated high-fidelity model Rf is replaced by an iterative procedure [7], [32]
x(i+1) = arg min_x U(Rs(i)(x))   (8.2)

where Rs(i) is the surrogate model at iteration i.

Fig. 8.2 Surrogate-based optimization: at each iteration the fine model is evaluated at x(i) by the EM solver, the surrogate model is updated (parameter extraction), and the surrogate is optimized to yield x(i+1); the loop repeats until a termination condition is met.
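A minimal sketch of this generic loop is given below; evaluate_fine, build_surrogate, and optimize_surrogate are user-supplied placeholders (the names are illustrative), with build_surrogate standing in for, e.g., the space mapping parameter extraction discussed later in this chapter.

```python
import numpy as np

def sbo_loop(x0, evaluate_fine, build_surrogate, optimize_surrogate,
             max_iter=20, tol=1e-3):
    """Generic surrogate-based optimization loop, cf. (8.2)."""
    x = np.asarray(x0, dtype=float)
    data = []
    for _ in range(max_iter):
        rf = evaluate_fine(x)                 # expensive EM simulation: Rf(x(i))
        data.append((x.copy(), rf))
        surrogate = build_surrogate(data)     # update Rs(i), e.g., parameter extraction
        x_new = optimize_surrogate(surrogate, x)  # x(i+1) = arg min_x U(Rs(i)(x))
        if np.linalg.norm(x_new - x) < tol:   # termination condition
            return x_new
        x = x_new
    return x
```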
Functional surrogates are constructed by approximating sampled high-fidelity model data, which typically requires a substantial number of data points to ensure reasonable accuracy. This is justified in the case of library models created for multiple usage but not so much in the case of ad hoc surrogates created for specific tasks such as parametric optimization, yield-driven design, and/or statistical analysis at a given (e.g., optimal) design.
Physical surrogates are based on underlying physically-based low-fidelity models of the structure of interest (denoted here as Rc). Physically-based models describe the same physical phenomena as the high-fidelity model, however, in a simplified manner. In microwave engineering, the high-fidelity model describes the behavior of the system in terms of the distributions of the electric and magnetic fields within the structure (and, sometimes, in its surroundings) that are calculated by solving the corresponding set of Maxwell equations [47]. Furthermore, the system performance is expressed through certain characteristics related to its input/output ports (such as so-called S-parameters [47]). All of these are obtained as a result of high-resolution electromagnetic simulation where the structure under consideration is finely discretized. In this context, the physically-based low-fidelity model of the microwave device can be obtained through: (i) analytical models, (ii) equivalent circuits, or (iii) coarse-discretization EM simulation.
The three groups of models have different characteristics. While analytical and
equivalent-circuit models are computationally cheap, they may lack accuracy and
they are typically not available for structures such as antennas and substrate-
integrated circuits. On the other hand, coarsely-discretized EM models are available
for any device. They are typically accurate but relatively expensive. The cost
is a major bottleneck in adopting coarsely-discretized EM models to surrogate-based
optimization in microwave engineering. One workaround is to build a function-
approximation model using coarse-discretization EM-simulation data (using, e.g.,
kriging [31]). This, however, requires dense sampling of the design space, and
should only be done locally to avoid excessive CPU cost. Table 8.1 summarizes the
characteristics of the low-fidelity models available in microwave engineering. A
common feature of physically-based low-fidelity models is that the amount of
high-fidelity model data necessary to build a reliable surrogate model is much
smaller than in the case of functional surrogates [48].
Consider an example microstrip bandpass filter [48] shown in Fig. 8.3(a). The
high-fidelity filter model is simulated using EM solver FEKO [18]. The low-
fidelity model is an equivalent circuit implemented in Agilent ADS [49]
(Fig. 8.3(b)). Figure 8.4(a) shows the responses (here, the modulus of transmission
coefficient, |S21|, versus frequency) of both models at certain reference design x(0).
While having similar shape, the responses are severely misaligned. Figure 8.4(b)
shows the responses of the high-fidelity model and the surrogate constructed using
the low-fidelity model and space mapping [48]. The surrogate is built using a single training point – high-fidelity model data at x(0) – and exhibits very good
matching with the high-fidelity model at x(0). Figure 8.4(c) shows the high-fidelity
and surrogate model response at a different design: the good alignment between
the models is still maintained. This comes from the fact that the physically-based
low-fidelity model has similar properties to the high-fidelity one and local model
alignment usually results in relatively good global matching.
Fig. 8.3 Microstrip bandpass filter [48]: (a) geometry, (b) low-fidelity circuit model.
Fig. 8.4 Microstrip bandpass filter [48]: (a) high- (—) and low-fidelity (- - -) model re-
sponse at the reference design x(0); (b) responses of the high-fidelity model (—) and surro-
gate model constructed from the low-fidelity model using space mapping (- - -) at x(0); (c)
responses of the high-fidelity model (—) and the surrogate (- - -) at another design x. The
surrogate model was constructed using a single high-fidelity model response (at x(0)) but a
good matching between the models is preserved even away from the reference design,
which is due to the fact that the low-fidelity model is physically based.
The surrogate model Rs(i) at iteration i is constructed as

Rs(i)(x) = Rs.g(x, p(i))   (8.3)

where Rs.g is a generic space mapping surrogate model, i.e., the low-fidelity model composed with suitable transformations, whereas

p(i) = arg min_p Σk=0,...,i wi.k ||Rf(x(k)) – Rs.g(x(k), p)||   (8.4)

is a vector of model parameters and wi.k are weighting factors; a common choice of wi.k is wi.k = 1 for all i and all k.
Various space mapping surrogate models are available [7], [30]. They can be
roughly categorized into four groups: (i) Models based on a (usually linear) distor-
tion of coarse model parameter space, e.g., input space mapping of the form
Rs.g(x, p) = Rs.g(x, B, c) = Rc(B·x + c) [7]; (ii) Models based on a distortion of the
coarse model response, e.g., output space mapping of the form
Rs.g(x, p) = Rs.g(x, d) = Rc(x) + d [30]; (iii) Implicit space mapping, where the pa-
rameters used to align the surrogate with the fine model are separate from the de-
sign variables, i.e., Rs.g(x, p) = Rs.g(x, xp) = Rc.i(x, xp), with Rc.i being the coarse
model dependent on both the design variables x and so-called preassigned parame-
ters xp (e.g., dielectric constant, substrate height) that are normally fixed in the
fine model but can be freely altered in the coarse model [30]; (iv) Custom models
exploiting parameters characteristic to a given design problem; the most character-
istic example is the so-called frequency space mapping
Rs.g(x, p) = Rs.g(x, F) = Rc.f(x, F) [7], where Rc.f is a frequency-mapped coarse
model, i.e., the coarse model evaluated at frequencies different from the original
frequency sweep for the fine model, according to the mapping ω → f1 + f2 ω, with
F = [f1 f2]T.
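To make categories (i) and (ii) concrete, the sketch below combines input space mapping Rc(B·x + c) with an output correction term d, and performs a simple parameter extraction with unit weights in the spirit of (8.4); the coarse model and all names are placeholders, and scipy's Nelder-Mead is just one possible extraction engine.

```python
import numpy as np
from scipy.optimize import minimize

def surrogate_response(x, p, coarse_model, n):
    """Input+output space mapping: Rs.g(x, p) = Rc(B*x + c) + d."""
    B = p[:n * n].reshape(n, n)          # input mapping matrix
    c = p[n * n:n * n + n]               # input shift vector
    d = p[n * n + n:]                    # output correction vector
    return coarse_model(B @ x + c) + d

def extract_parameters(designs, fine_responses, coarse_model, n, m):
    """Parameter extraction in the spirit of (8.4), with weights wi.k = 1:
    match the surrogate to all fine-model data accumulated so far."""
    def misalignment(p):
        return sum(np.linalg.norm(rf - surrogate_response(x, p, coarse_model, n))
                   for x, rf in zip(designs, fine_responses))
    # start from the identity mapping with zero shifts/corrections
    p0 = np.concatenate([np.eye(n).ravel(), np.zeros(n), np.zeros(m)])
    return minimize(misalignment, p0, method="Nelder-Mead").x
```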
Space mapping usually comprises combined transformations. For instance, a surrogate model employing input, output, and frequency SM transformations would be Rs.g(x, p) = Rs.g(x, c, d, F) = Rc.f(x + c, F) + d. The rationale for this is that a properly chosen mapping may significantly improve the performance of the space mapping algorithm; however, the optimal selection of the mapping type for a given design problem is not trivial [38]. Work has been done to ease the
selection process for a given design problem [39], [48]. However, regardless of
the mapping choice, coarse model accuracy is what principally affects the per-
formance of the space mapping design process. One can quantify the quality of the
surrogate model through rigorous convergence conditions [38]. These conditions, al-
though useful for developing more efficient space mapping algorithms and auto-
matic surrogate model selection techniques, cannot usually be verified because of
the limited amount of data available from the fine model. In practice, the most im-
portant criterion for assessing the quality or accuracy of the coarse model is still vis-
ual inspection of the fine and coarse model responses at certain points and/or exam-
ining absolute error measures such as ||Rf(x) – Rc(x)||.
The coarse model is the most important factor that affects the performance of the space mapping algorithm, and two of its characteristics are essential. The first is accuracy. Coarse model accuracy (more generally, the accuracy of the space mapping surrogate [38]) is the
main factor that determines the efficiency of the algorithm in terms of finding a
satisfactory design. The more accurate the coarse model, the smaller the number
of fine model evaluations necessary to complete the optimization process. If the
coarse model is insufficiently accurate, the space mapping algorithm may need
more fine model evaluations or may even fail to find a good quality design.
The second important characteristic is the evaluation cost. It is essential that the
coarse model is computationally much cheaper than the fine model because both
parameter extraction (8.4) and surrogate optimization (8.2) require large numbers
of coarse model evaluations. Ideally, the evaluation cost of the coarse model
should be negligible when compared to the evaluation cost of the fine model, in
which case the total computational cost of the space mapping optimization process
is merely determined by the necessary number of fine model evaluations. If the
evaluation time of the coarse model is too high, say, larger than 1% of the fine
model evaluation time, the computational costs of surrogate model optimization and, especially, parameter extraction start playing important roles in the total cost
of space mapping optimization and may even determine it. Therefore, practical ap-
plicability of space mapping is limited to situations where the coarse model is com-
putationally much cheaper than the fine model. The majority of SM models reported in the literature (e.g., [7], [30], [36]) concern microstrip filters, transformers, or junc-
tions where fast and reliable equivalent circuit coarse models are easily available.
Tuning space mapping (TSM) [50] combines the concept of tuning, widely
used in microwave engineering [55], [56], and space mapping. It is an iterative op-
timization procedure that assumes the existence of two surrogate models: both are
less accurate but computationally much cheaper than the fine model. The first
model is a so-called tuning model Rt that contains relevant fine model data (typi-
cally a fine model response) at the current iteration point and tuning parameters
(typically implemented through circuit elements inserted into tuning ports). The
tunable parameters are adjusted so that the model Rt satisfies the design specifica-
tions. The second model, Rc, is used for calibration purposes: it allows us to trans-
late the change of the tuning parameters into relevant changes of the actual design
variables; Rc is dependent on three sets of variables: design parameters, tuning pa-
rameters (which are actually the same parameters as the ones used in Rt), and SM
parameters that are adjusted using the usual parameter extraction process [7] in
order to have the model Rc meet certain matching conditions. Typically, the model
Rc is a standard SM surrogate (i.e., a coarse model composed with suitable trans-
formations) enhanced by the same or corresponding tuning elements as the model
Rt. The conceptual illustrations of the fine model, the tuning model and the cali-
bration model are shown in Fig. 8.5.
The iteration of the TSM algorithm consists of two steps: optimization of the
tuning model and a calibration procedure. First, the current tuning model Rt(i) is
built using fine model data at point x(i). In general, because the fine model with in-
serted tuning ports is not identical to the original structure, the tuning model re-
sponse may not agree with the response of the fine model at x(i) even if the values
of the tuning parameters xt are zero, so that these values must be adjusted to, say,
xt.0(i), in order to obtain alignment [50]:

xt.0(i) = arg min_xt ||Rf(x(i)) – Rt(i)(xt)||   (8.5)
In the next step, one optimizes Rt(i) to have it meet the design specifications. Op-
timal values of the tuning parameters xt.1(i) are obtained as follows:
xt.1(i) = arg min_xt U(Rt(i)(xt))   (8.6)
The calibration model is then optimized with respect to the design variables in order to obtain the next iteration point x(i+1). First, the space mapping parameters p(i) are extracted so that the calibration model, with the tuning parameters set to xt.0(i), matches the fine model at the current design:

p(i) = arg min_p ||Rf(x(i)) – Rc(x(i), p, xt.0(i))||   (8.7)

The next design is then found so that the calibration model response reproduces the optimized tuning model response:

x(i+1) = arg min_x ||Rt(i)(xt.1(i)) – Rc(x, p(i), xt.0(i))||   (8.8)
Note that xt.0(i) is used in (8.7), which corresponds to the state of the tuning model
after performing the alignment procedure (8.5), and xt.1(i) in (8.8), which
Simulation-Driven Design in Microwave Engineering: Methods 165
corresponds to the optimized tuning model (cf. (8.6)). Thus, (8.7) and (8.8) allow
finding the change of design variable values x(i+1) – x(i) necessary to compensate
the effect of changing the tuning parameters from xt.0(i) to xt.1(i).
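One TSM iteration can be sketched compactly as follows; the models (fine, tuning, calibration), the merit function, and all names are user-supplied placeholders, and scipy's general-purpose minimize merely stands in for the alignment, optimization, and extraction subproblems (8.5)-(8.8).

```python
import numpy as np
from scipy.optimize import minimize

def tsm_iteration(x, nt, nsm, fine_model, tuning_model, calib_model, merit_U):
    """One tuning space mapping iteration in the spirit of (8.5)-(8.8).

    tuning_model(xt, rf): tuning model Rt built around fine model data rf
    calib_model(x, p, xt): calibration model Rc(x, p, xt)
    merit_U(response):     scalar merit function U"""
    rf = fine_model(x)
    # (8.5): adjust tuning parameters so that Rt agrees with Rf at x
    xt0 = minimize(lambda xt: np.linalg.norm(rf - tuning_model(xt, rf)),
                   np.zeros(nt)).x
    # (8.6): optimize the tuning model against the design specifications
    xt1 = minimize(lambda xt: merit_U(tuning_model(xt, rf)), xt0).x
    # (8.7): extract SM parameters so Rc matches the fine model at x
    p = minimize(lambda pp: np.linalg.norm(rf - calib_model(x, pp, xt0)),
                 np.zeros(nsm)).x
    # (8.8): move the design so Rc reproduces the optimized tuning response
    target = tuning_model(xt1, rf)
    return minimize(lambda xx: np.linalg.norm(target - calib_model(xx, p, xt0)),
                    x).x
```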
Fig. 8.5 Conceptual illustrations of the fine model, the tuning model and the calibration
model: (a) the fine model is typically based on full-wave simulation, (b) the tuning model
exploits the fine model “image” (e.g., in the form of S-parameters corresponding to the cur-
rent design imported to the tuning model using suitable data components) and a number of
circuit-theory-based tuning elements, (c) the calibration model is usually a circuit equiva-
lent dependent on the same design variables as the fine model, the same tuning parameters
as the tuning model and, additionally, a set of space mapping parameters used to align the
calibration model with both the fine and the tuning model during the calibration process.
It should be noted that the calibration procedure described here represents the
most generic approach. In some cases, there is a formula that establishes an ana-
lytical relation between the design variables and the tuning parameters so that the
updated design can be found simply by applying that formula [50]. In particular,
the calibration formula may be just a linear function so that
x(i+1) = x(i) + s(i)∗(xt.1(i) – xt.0(i)), where s(i) is a real vector and ∗ denotes a Hadamard
product (i.e., component-wise multiplication) [50]. If the analytical calibration is
possible, there is no need to use the calibration model. Other approaches to the
calibration process can be found in the literature [50], [57]. In some cases (e.g.,
[57]), the tuning parameters may be in identity relation with the design variables,
which simplifies the implementation of the algorithm.
The operation of the tuning space mapping algorithm can be clarified using a
simple example of a microstrip transmission line [50]. The fine model is imple-
mented in Sonnet em [17] (Fig. 8.6(a)), and the fine model response is taken as the
inductance of the line as a function of the line’s length. The original length of the
line is chosen to be x(0) = 400 mil with a width of 0.635 mm. The goal is to find a
length of line such that the corresponding inductance is 6.5 nH at 300 MHz. The
Sonnet em simulation at x(0) gives the value of 4.38 nH, i.e., Rf(x(0)) = 4.38 nH.
Fig. 8.6 TSM optimization of the microstrip line [50]: (a) original structure of the microstrip line in Sonnet, (b) the microstrip line after being divided, with the co-calibrated ports inserted, (c) the tuning model, (d) the calibration model.
The tuning model Rt is developed by dividing the structure in Fig. 8.6(a) into two
separate parts and adding the two tuning ports as shown in Fig. 8.6(b). A small induc-
tor is then inserted between these ports as a tuning element. The tuning model is im-
plemented in Agilent ADS [49] and shown in Fig. 8.6(c). The model contains the fine
model data at the initial design in the form of the S4P element as well as the tuning
element (inductor). Because of Sonnet’s co-calibrated ports technology [56], there is
a perfect agreement between the fine and tuning model responses when the value of
the tuning inductance is zero, so that xt.0(0) is zero in this case.
Next, the tuning model should be optimized to meet the target inductance of 6.5
nH. The optimized value of the tuning inductance is xt.1(0) = 2.07 nH.
The calibration model is shown in Fig. 8.6(d). Here, the dielectric constant of the
microstrip element is used as a space mapping parameter p. The original value of this
parameter, 9.8, is adjusted using (8.7) to 23.7 so that the response of the calibration
model is 4.38 nH at 400 mil, i.e., it agrees with the fine model response at x(0).
Now, the new value of the microstrip length is obtained using (8.8). In particular,
one optimizes x with the tuning inductance set to xt.0(0) = 0 nH to match the total in-
ductance of the calibration model to the optimized tuning model response, 6.5 nH.
The result is x(1) = 585.8 mil; the fine model response at x(1) obtained by Sonnet em
simulation is 6.48 nH. This result can be further improved by performing a second
iteration of the TSM, which gives the length of the microstrip line equal to
x(2) = 588 mil and its corresponding inductance of 6.5 nH.
Simulation-based tuning and tuning space mapping can be extremely efficient
as demonstrated in Chapter 12. In particular, a satisfactory design can be obtained
after just one or two iterations. However, the tuning methodology has limited
applications. It is well suited for structures such as microstrip filters but it can hardly be applied to radiating structures (antennas). Also, tuning of cross-sectional parameters (e.g., microstrip width) is not straightforward [50]. Moreover, the tuning procedure is invasive in the sense that the structure may
need to be cut. The fine model simulator must allow such cuts and allow tuning
elements to be inserted. This can be done using, e.g., Sonnet em [17]. Also, EM
simulation of a structure containing a large number of tuning ports is computation-
ally far more expensive than the simulation of the original structure (without the
ports). Depending on the number of design variables, the number of tuning ports
may be as large as 30, 50 or more [50], which may increase the simulation time by
one order of magnitude or more. Nevertheless, recent results presented in [58] indicate the possibility of speeding up the tuning process by using so-called reduced
structures.
where

Rs(i)(x, ωj) = Rf(x(i), F(ωj, {–ωkt}k=1,...,K)) + R(ωj, {rkt}k=1,...,K)   (8.10)
Fig. 8.7 SPRP concept: (a) Example low-fidelity model response at the design x(i), Rc(x(i))
(solid line), the low-fidelity model response at x, Rc(x) (dotted line), characteristic points of
Rc(x(i)) (circles) and Rc(x) (squares), and the translation vectors (short lines); (b) High-
fidelity model response at x(i), Rf(x(i)) (solid line) and the predicted high-fidelity model response
at x (dotted line) obtained using SPRP based on characteristic points of Fig. 8.7(a); characteristic
points of Rf(x(i)) (circles) and the translation vectors (short lines) were used to find the character-
istic points (squares) of the predicted high-fidelity model response; low-fidelity model responses
Rc(x(i)) and Rc(x) are plotted using thin solid and dotted line, respectively [51].
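A rough sketch of the SPRP prediction step is given below, assuming the characteristic points of Rc(x(i)) and Rc(x) have already been extracted as (frequency, level) pairs; the interpolation-based translation is a simplified stand-in for the procedure of [51], and all names are illustrative.

```python
import numpy as np

def sprp_predict(freq, rf_i, pts_c_i, pts_c_x):
    """Shape-preserving response prediction (simplified sketch).

    freq:     frequency sweep (1-D array, increasing)
    rf_i:     high-fidelity response Rf(x(i)) sampled on freq
    pts_c_i:  (K, 2) characteristic points (freq, level) of Rc(x(i))
    pts_c_x:  (K, 2) characteristic points (freq, level) of Rc(x)
    Returns the predicted high-fidelity response at x."""
    # translation vectors between corresponding characteristic points,
    # interpolated over the whole frequency sweep
    df = np.interp(freq, pts_c_i[:, 0], pts_c_x[:, 0] - pts_c_i[:, 0])
    dr = np.interp(freq, pts_c_i[:, 0], pts_c_x[:, 1] - pts_c_i[:, 1])
    # shift Rf(x(i)) in frequency and level accordingly
    return np.interp(freq - df, freq, rf_i) + dr
```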
\[
\begin{bmatrix}
1 & x_{-n.1}^{(K)} & \cdots & x_{-n.n}^{(K)} & (x_{-n.1}^{(K)})^2 & \cdots & (x_{-n.n}^{(K)})^2 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
1 & x_{0.1}^{(K)} & \cdots & x_{0.n}^{(K)} & (x_{0.1}^{(K)})^2 & \cdots & (x_{0.n}^{(K)})^2 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
1 & x_{n.1}^{(K)} & \cdots & x_{n.n}^{(K)} & (x_{n.1}^{(K)})^2 & \cdots & (x_{n.n}^{(K)})^2
\end{bmatrix}
\begin{bmatrix}
\lambda_{j.0} \\ \lambda_{j.1} \\ \vdots \\ \lambda_{j.2n}
\end{bmatrix}
=
\begin{bmatrix}
R_j^{(-n)} \\ \vdots \\ R_j^{(0)} \\ \vdots \\ R_j^{(n)}
\end{bmatrix}
\qquad (8.12)
\]
where xk.j(K) is the jth component of the vector xk(K), and Rj(k) is the jth component of the vector R(k) = Rc.K(xk(K)).
In order to account for unavoidable misalignment between Rc.K and Rf, instead of optimizing the quadratic model q, it is recommended to optimize a corrected model q(x) + [Rf(x(K)) – Rc.K(x(K))] that ensures a zero-order consistency [34] between Rc.K and Rf. The refined design can then be found as

x* = arg min_{x(K)–d ≤ x ≤ x(K)+d} U(q(x) + [Rf(x(K)) – Rc.K(x(K))])   (8.13)
This kind of correction is also known as output space mapping [30]. If necessary, the step (8.13) can be performed a few times starting from a refined design, i.e., x* = argmin{x(K) – d ≤ x ≤ x(K) + d : U(q(x) + [Rf(x*) – Rc.K(x*)])} (each iteration requires only one evaluation of Rf).
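The system (8.12) and the corrected refinement (8.13) can be sketched with numpy as follows; the sample layout, names, and the choice of optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_quadratic(points, responses):
    """Fit the reduced quadratic model of (8.12): for each response
    component j, solve for lambda_j.0, ..., lambda_j.2n in least squares.

    points:    (2n+1, n) array of designs x(-n), ..., x(0), ..., x(n)
    responses: (2n+1, m) array of coarse model responses at those designs
    Returns an (m, 2n+1) coefficient array."""
    A = np.hstack([np.ones((points.shape[0], 1)), points, points**2])
    coeff, *_ = np.linalg.lstsq(A, responses, rcond=None)
    return coeff.T   # row j holds [lambda_j.0, ..., lambda_j.2n]

def q(x, coeff):
    """Evaluate the quadratic model at design x."""
    basis = np.concatenate([[1.0], x, x**2])
    return coeff @ basis

def refine(xK, coeff, d, Rf, RcK, merit_U):
    """Refined design per (8.13): optimize q plus the output-SM
    correction within the box x(K) +/- d."""
    corr = Rf(xK) - RcK(xK)                  # zero-order consistency term
    bounds = list(zip(xK - d, xK + d))
    res = minimize(lambda x: merit_U(q(x, coeff) + corr), xK, bounds=bounds)
    return res.x
```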
The design optimization procedure can be summarized as follows (input argu-
ments are: initial design x(0) and the number of coarse-discretization models K):
1. Set j = 1;
2. Optimize coarse-discretization model Rc.j to obtain a new design x(j) using
x(j–1) as a starting point;
3. Set j = j + 1; if j < K go to 2;
4. Obtain a refined design x* as in (8.13);
5. END;
Note that the original model Rf is only evaluated at the final stage (step 4) of the
optimization process. Operation of the algorithm is illustrated in Fig. 8.8. Coarse-
discretization models can be optimized using any available algorithm.
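The whole procedure, steps 1-5 above, admits a very short sketch; optimize_model and refine are user-supplied placeholders (refine implementing (8.13), e.g., as in the previous listing), and the names are illustrative.

```python
def multi_fidelity_optimize(x0, coarse_models, optimize_model, refine):
    """Multi-fidelity optimization, steps 1-5 of the procedure above.
    Each coarse-discretization model Rc.j is optimized in turn, starting
    from the previous optimum; the fine model enters only in refine()."""
    x = x0
    for rc in coarse_models:          # steps 1-3: j = 1, ..., K
        x = optimize_model(rc, x)     # x(j) = arg min U(Rc.j(x)) from x(j-1)
    return refine(x)                  # step 4: corrected quadratic model (8.13)
```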
Fig. 8.8 Operation of the multi-fidelity design optimization procedure for K = 3 (three coarse-
discretization models). The design x(j) is obtained as the optimal solution of the model Rc.j,
j = 1, 2, 3. A reduced second-order approximation model q is set up in the neighborhood of
x(3) (gray area) and the final design x* is obtained by optimizing the corrected model q as in (8.13).
The first coarse-discretization model Rc.1 should be very fast, as its optimum only needs to bring the design reasonably close to the fine model optimum. The second (and, possibly, third) coarse-discretization model should be more accurate but still at least about 10 times faster than the fine model. This can be achieved by proper manipulation of the solver mesh density.
The coarse model design obtained this way is then taken as an approximate solution to the original design problem (i.e., optimization of the fine model with respect to the original specifications). Steps 1 and 2 (listed above) can be repeated if necessary. Substantial design improvement is typically observed after the first iteration; however, additional iterations may bring further enhancement [53].
Fig. 8.9 Bandstop filter example (responses of Rf and Rc are marked with solid and dashed
line, respectively): (a) fine and coarse model responses at the initial design (optimum of Rc)
as well as the original design specifications, (b) characteristic points of the responses corre-
sponding to the specification levels (here, –3 dB and –30 dB) and to the local response
maxima, (c) fine and coarse model responses at the initial design and the modified design
specifications.
In the first step of the optimization procedure, the design specifications are
modified (or mapped) so that the level of satisfying/violating the modified specifi-
cations by the coarse model response corresponds to the satisfaction/violation lev-
els of the original specifications by the fine model response. It is assumed that the
coarse model is physically-based, in particular, that the adjustment of the design
variables has similar effect on the response for both Rf and Rc. In such a case the
coarse model design that is obtained in the second stage of the procedure (i.e., op-
timal with respect to the modified specifications) will be (almost) optimal for Rf
with respect to the original specifications. As shown in Fig. 8.9, the absolute
matching between the models is not as important as the shape similarity.
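One simple realization of this first step is sketched below, assuming each specification is an (f_lo, f_hi, level, kind) tuple and using the response extrema over each band as the characteristic points; this is an illustrative sketch, not the exact procedure of [53].

```python
import numpy as np

def adapt_specs(specs, freq, rf, rc):
    """Shift each specification level so that the coarse model violates the
    modified spec by the same amount the fine model violates the original.

    specs:  list of (f_lo, f_hi, level, kind), kind in {'upper', 'lower'}
    rf, rc: fine and coarse model responses sampled on freq"""
    adapted = []
    for f_lo, f_hi, level, kind in specs:
        band = (freq >= f_lo) & (freq <= f_hi)
        if kind == 'upper':      # violation measured at the band maximum
            shift = np.max(rc[band]) - np.max(rf[band])
        else:                    # 'lower': measured at the band minimum
            shift = np.min(rc[band]) - np.min(rf[band])
        adapted.append((f_lo, f_hi, level + shift, kind))
    return adapted
```

With this choice, the worst-case violation of the modified specs by Rc equals, by construction, the worst-case violation of the original specs by Rf, which is the intended satisfaction/violation correspondence.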
In order to reduce the overhead related to coarse model optimization (step 2 of
the procedure) the coarse model should be computationally as cheap as possible.
For that reason, equivalent circuits or models based on analytical formulas are pre-
ferred. Unfortunately, such models may not be available for many structures in-
cluding antennas, certain types of waveguide filters and substrate integrated cir-
cuits. In all such cases, it is possible to implement the coarse model using the
same EM solver as the one used for the fine model but with coarser discretization.
To some extent, this is the easiest and the most generic way of creating the coarse
model. Also, it allows a convenient adjustment of the trade-off between the quality
of Rc (i.e., the accuracy in representing the fine model) and its computational cost.
For popular EM solvers (e.g., CST Microwave Studio [9], Sonnet em [17], FEKO
[18]) it is possible to make the coarse model 20 to 100 times faster than the fine model while maintaining accuracy that is sufficient for methods such as SPRP.
When compared to space mapping and tuning, the adaptively adjusted design
specifications technique appears to be much simpler to implement. Unlike space
mapping, it does not use any extractable parameters (which are normally found by
solving a separate nonlinear minimization problem), the problem of the surrogate
model selection [38], [39] (i.e., the choice of the transformation and its parameters)
does not exist, and the interaction between the models is very simple (only through
the design specifications). Unlike tuning methodologies, the method presented in this
section does not require any modification of the optimized structure (such as “cut-
ting” and insertion of the tuning components [50]). The lack of extractable parame-
ters is its additional advantage compared to some other approaches (e.g., space map-
ping) because the computational overhead related to parameter extraction, while
negligible for very fast coarse model (e.g., equivalent circuit), may substantially in-
crease the overall design cost if the coarse model is relatively expensive (e.g., imple-
mented through coarse-discretization EM simulation).
If the similarity between the fine and coarse model responses is not sufficient, the adaptive design specifications technique may not work well. In many cases, however, using a different reference design for the fine and coarse models may help. In
particular, Rc can be optimized with respect to the modified specifications starting
not from x(0) (the optimal solution of Rc with respect to the original specifications),
but from another design, say xc(0), at which the response of Rc is as similar to the re-
sponse of Rf at x(0) as possible. Such a design can be obtained as follows [7]:

xc(0) = arg min_x ||Rc(x) – Rf(x(0))||   (8.14)
At iteration i of the optimization process, the optimal design of the coarse model
Rc with respect to the modified specifications, xc(i), has to be translated to the cor-
responding fine model design, x(i), as follows x(i) = xc(i) + (x(0) – xc(0)). Note that the
preconditioning procedure (8.14) is performed only once for the entire optimiza-
tion process. The idea of coarse model preconditioning is borrowed from space
mapping (more specifically, from the original space mapping concept [7]). In prac-
tice, the coarse model can be “corrected” to reduce its misalignment with the fine
model using any available degrees of freedom, for example, preassigned parameters
as in implicit space mapping [33].
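A sketch of the preconditioning step (8.14) and of the design translation described above, with Rf and Rc as user-supplied callables and scipy's minimize as a stand-in optimizer (all names illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def precondition(x0, Rf, Rc):
    """Coarse model preconditioning, cf. (8.14): find the design xc(0)
    at which Rc best reproduces the fine model response at x(0)."""
    rf0 = Rf(x0)
    return minimize(lambda x: np.linalg.norm(Rc(x) - rf0), x0).x

def translate(xc_i, x0, xc0):
    """Map the coarse model optimum back to the fine model design:
    x(i) = xc(i) + (x(0) - xc(0))."""
    return xc_i + (x0 - xc0)
```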
8.6 Summary
Simulation-driven optimization has become an important design tool in contempo-
rary microwave engineering. Its importance is expected to grow in the future due to the rise of new technologies and novel classes of devices and systems
for which traditional design methods are not applicable. The surrogate-based
approach and methods described in this chapter can make the electromagnetic-
simulation-based design optimization feasible and cost efficient. In Chapter 12, a
number of applications of the techniques presented here are demonstrated in the
design of common microwave devices including filters, antennas and interconnect
structures.
References
1. Bandler, J.W., Chen, S.H.: Circuit optimization: the state of the art. IEEE Trans.
Microwave Theory Tech. 36, 424–443 (1988)
2. Bandler, J.W., Biernacki, R.M., Chen, S.H., Swanson, J.D.G., Ye, S.: Microstrip filter
design using direct EM field simulation. IEEE Trans. Microwave Theory Tech. 42,
1353–1359 (1994)
3. Swanson Jr., D.G.: Optimizing a microstrip bandpass filter using electromagnetics. Int.
J. Microwave and Millimeter-Wave CAE 5, 344–351 (1995)
4. De Zutter, D., Sercu, J., Dhaene, V., De Geest, J., Demuynck, F.J., Hammadi, S., Paul,
C.-W.: Recent trends in the integration of circuit optimization and full-wave electro-
magnetic analysis. IEEE Trans. Microwave Theory Tech. 52, 245–256 (2004)
5. Schantz, H.: The art and science of ultrawideband antennas. Artech House, Boston
(2005)
6. Wu, K.: Substrate Integrated Circuits (SiCs) – A new paradigm for future GHz and THz
electronic and photonic systems. IEEE Circuits Syst. Soc. Newsletter 3 (2009)
7. Bandler, J.W., Cheng, Q.S., Dakroury, S.A., Mohamed, A.S., Bakr, M.H., Madsen, K.,
Søndergaard, J.: Space mapping: the state of the art. IEEE Trans. Microwave Theory
Tech. 52, 337–361 (2004)
8. Director, S.W., Rohrer, R.A.: The generalized adjoint network and network sensitivi-
ties. IEEE Trans. Circuit Theory CT-16, 318–323 (1969)
9. CST Microwave Studio, ver. 2010, CST AG, Bad Nauheimer Str. 19, D-64289 Darmstadt, Germany (2010)
10. HFSS, release 13.0, ANSYS (2010),
http://www.ansoft.com/products/hf/hfss/
11. Wright, S.J., Nocedal, J.: Numerical Optimization. Springer, Heidelberg (1999)
12. Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search: new perspec-
tives on some classical and modern methods. SIAM Rev. 45, 385–482 (2003)
13. Lai, M.-I., Jeng, S.-K.: Compact microstrip dual-band bandpass filters design using
genetic-algorithm techniques. IEEE Trans. Microwave Theory Tech. 54, 160–168
(2006)
14. Haupt, R.L.: Antenna design with a mixed integer genetic algorithm. IEEE Trans. An-
tennas Propag. 55, 577–582 (2007)
15. Jin, N., Rahmat-Samii, Y.: Parallel particle swarm optimization and finite- difference
time-domain (PSO/FDTD) algorithm for multiband and wide-band patch antenna de-
signs. IEEE Trans. Antennas Propag. 53, 3459–3468 (2005)
16. Jin, N., Rahmat-Samii, Y.: Analysis and particle swarm optimization of correlator an-
tenna arrays for radio astronomy applications. IEEE Trans. Antennas Propag. 56,
1269–1279 (2008)
17. Sonnet em. Ver. 12.54, Sonnet Software. North Syracuse, NY (2009)
18. FEKO User’s Manual. Suite 5.5, EM Software & Systems-S.A (Pty) Ltd, 32 Techno
Lane, Technopark, Stellenbosch, 7600, South Africa (2009)
19. Bandler, J.W., Seviora, R.E.: Wave sensitivities of networks. IEEE Trans. Microwave
Theory Tech. 20, 138–147 (1972)
20. Chung, Y.S., Cheon, C., Park, I.H., Hahn, S.Y.: Optimal design method for microwave
device using time domain method and design sensitivity analysis-part II: FDTD case.
IEEE Trans. Magn. 37, 3255–3259 (2001)
21. Bakr, M.H., Nikolova, N.K.: An adjoint variable method for time domain TLM with
fixed structured grids. IEEE Trans. Microwave Theory Tech. 52, 554–559 (2004)
22. Nikolova, N.K., Tam, H.W., Bakr, M.H.: Sensitivity analysis with the FDTD method
on structured grids. IEEE Trans. Microwave Theory Tech. 52, 1207–1216 (2004)
23. Webb, J.P.: Design sensitivity of frequency response in 3-D finite-element analysis of
microwave devices. IEEE Trans. Magn. 38, 1109–1112 (2002)
24. Nikolova, N.K., Bandler, J.W., Bakr, M.H.: Adjoint techniques for sensitivity analysis
in high-frequency structure CAD. IEEE Trans. Microwave Theory Tech. 52, 403–419
(2004)
25. Ali, S.M., Nikolova, N.K., Bakr, M.H.: Recent advances in sensitivity analysis with
frequency-domain full-wave EM solvers. Applied Computational Electromagnetics
Society J. 19, 147–154 (2004)
26. El Sabbagh, M.A., Bakr, M.H., Nikolova, N.K.: Sensitivity analysis of the scattering
parameters of microwave filters using the adjoint network method. Int. J. RF and Mi-
crowave Computer-Aided Eng. 16, 596–606 (2006)
27. Snyder, R.V.: Practical aspects of microwave filter development. IEEE Microwave
Magazine 8(2), 42–54 (2007)
28. Shin, S., Kanamaluru, S.: Diplexer design using EM and circuit simulation techniques.
IEEE Microwave Magazine 8(2), 77–82 (2007)
29. Bhargava, A.: Designing circuits using an EM/circuit co-simulation technique. RF De-
sign. 76 (January 2005)
30. Koziel, S., Bandler, J.W., Madsen, K.: A space mapping framework for engineering
optimization: theory and implementation. IEEE Trans. Microwave Theory Tech. 54,
3721–3730 (2006)
31. Queipo, N.V., Haftka, R.T., Shyy, W., Goel, T., Vaidyanathan, R., Tucker, P.K.: Surro-
gate based analysis and optimization. Progress in Aerospace Sciences 41, 1–28 (2005)
32. Forrester, A.I.J., Keane, A.J.: Recent advances in surrogate-based optimization. Prog.
Aerospace Sciences 45, 50–79 (2009)
33. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust Region Methods. MPS-SIAM Series on
Optimization (2000)
34. Alexandrov, N.M., Dennis, J.E., Lewis, R.M., Torczon, V.: A trust region framework
for managing use of approximation models in optimization. Struct. Multidisciplinary
Optim. 15, 16–23 (1998)
35. Booker, A.J., Dennis Jr., J.E., Frank, P.D., Serafini, D.B., Torczon, V., Trosset, M.W.:
A rigorous framework for optimization of expensive functions by surrogates. Struc-
tural Optimization 17, 1–13 (1999)
36. Amari, S., LeDrew, C., Menzel, W.: Space-mapping optimization of planar coupled-
resonator microwave filters. IEEE Trans. Microwave Theory Tech. 54, 2153–2159
(2006)
37. Crevecoeur, G., Sergeant, P., Dupre, L., Van de Walle, R.: Two-level response and pa-
rameter mapping optimization for magnetic shielding. IEEE Trans. Magn. 44, 301–308
(2008)
38. Koziel, S., Bandler, J.W., Madsen, K.: Quality assessment of coarse models and surro-
gates for space mapping optimization. Optimization Eng. 9, 375–391 (2008)
39. Koziel, S., Bandler, J.W.: Space-mapping optimization with adaptive surrogate model.
IEEE Trans. Microwave Theory Tech. 55, 541–547 (2007)
40. Simpson, T.W., Peplinski, J., Koch, P.N., Allen, J.K.: Metamodels for computer-based
engineering design: survey and recommendations. Engineering with Computers 17,
129–150 (2001)
41. Miraftab, V., Mansour, R.R.: EM-based microwave circuit design using fuzzy logic
techniques. IEE Proc. Microwaves, Antennas & Propagation 153, 495–501 (2006)
42. Yang, Y., Hu, S.M., Chen, R.S.: A combination of FDTD and least-squares support
vector machines for analysis of microwave integrated circuits. Microwave Opt. Tech-
nol. Lett. 44, 296–299 (2005)
43. Xia, L., Meng, J., Xu, R., Yan, B., Guo, Y.: Modeling of 3-D vertical interconnect us-
ing support vector machine regression. IEEE Microwave Wireless Comp. Lett. 16,
639–641 (2006)
44. Burrascano, P., Dionigi, M., Fancelli, C., Mongiardo, M.: A neural network model for
CAD and optimization of microwave filters. In: IEEE MTT-S Int. Microwave Symp.
Dig., Baltimore, MD, pp. 13–16 (1998)
45. Zhang, L., Xu, J., Yagoub, M.C.E., Ding, R., Zhang, Q.-J.: Efficient analytical formu-
lation and sensitivity analysis of neuro-space mapping for nonlinear microwave device
modeling. IEEE Trans. Microwave Theory Tech. 53, 2752–2767 (2005)
46. Kabir, H., et al.: Neural network inverse modeling and applications to microwave filter
design. IEEE Trans. Microwave Theory Tech. 56, 867–879 (2008)
47. Pozar, D.M.: Microwave Engineering, 3rd edn. Wiley, Chichester (2004)
48. Koziel, S., Cheng, Q.S., Bandler, J.W.: Implicit space mapping with adaptive selection
of preassigned parameters. IET Microwaves, Antennas & Propagation 4, 361–373
(2010)
49. Agilent ADS. Version 2009, Agilent Technologies, 395 Page Mill Road, Palo Alto,
CA, 94304 (2009)
50. Koziel, S., Meng, J., Bandler, J.W., Bakr, M.H., Cheng, Q.S.: Accelerated microwave
design optimization with tuning space mapping. IEEE Trans. Microwave Theory and
Tech. 57, 383–394 (2009)
51. Koziel, S.: Shape-preserving response prediction for microwave design optimization.
IEEE Trans. Microwave Theory and Tech. (2010) (to appear)
52. Koziel, S., Ogurtsov, S.: Robust multi-fidelity simulation-driven design optimization
of microwave structures. In: IEEE MTT-S Int. Microwave Symp. Dig., Anaheim, CA,
pp. 201–204 (2010)
53. Koziel, S.: Efficient optimization of microwave structures through design specifica-
tions adaptation. In: IEEE Int. Symp. Antennas Propag., Toronto, Canada (2010)
54. Bandler, J.W., Salama, A.E.: Functional approach to microwave postproduction tun-
ing. IEEE Trans. Microwave Theory Tech. 33, 302–310 (1985)
55. Swanson, D., Macchiarella, G.: Microwave filter design by synthesis and optimization.
IEEE Microwave Magazine 8(2), 55–69 (2007)
56. Rautio, J.C.: EM-component-based design of planar circuits. IEEE Microwave Maga-
zine 8(4), 79–90 (2007)
57. Cheng, Q.S., Bandler, J.W., Koziel, S.: Tuning Space Mapping Optimization Exploit-
ing Embedded Surrogate Elements. In: IEEE MTT-S Int. Microwave Symp. Dig.,
Boston, MA, pp. 1257–1260 (2009)
58. Koziel, S., Bandler, J.W., Cheng, Q.S.: Design optimization of microwave circuits
through fast embedded tuning space mapping. In: European Microwave Conference,
Paris, September 26-October 1 (2010)
Chapter 9
Variable-Fidelity Aerodynamic Shape Optimization
9.1 Introduction
Aerodynamic and hydrodynamic design optimization is of primary importance in
several disciplines [1-3]. In aircraft design, both for conventional transport aircraft
and unmanned air vehicles, the aerodynamic wing shape is designed to provide
maximum efficiency under a variety of takeoff, cruise, maneuver, loiter, and land-
ing conditions [1, 4-7]. Constraints on aerodynamic noise are also becoming in-
creasingly important [8, 9]. In the design of turbines, such as gas, steam, or wind
turbines, the blades are designed to maximize energy output for a given working
fluid and operating conditions [2, 10]. The shapes of the propeller blades of ships
are optimized to increase efficiency [11]. The fundamental design problem, com-
mon to all these disciplines, is to design a streamlined wing (or blade) shape that
provides the desired performance for a given set of operating conditions, while at
the same time fulfilling one or multiple design constraints [12-20].
Fig. 9.1 A CAD drawing of a typical transport aircraft with a turbofan jet engine. The aircraft wing and the turbine blades of the turbofan engines are streamlined aerodynamic surfaces defined by airfoil sections
In the early days of engineering design, the designer would have to rely on ex-
perience and physical experiments. Nowadays, most engineering design is per-
formed using computational tools, especially in the early phases, i.e., conceptual
and preliminary design. This is commonly referred to as simulation-driven (or si-
mulation-based) design. Physical experiments are normally performed at the final
design stages only, mostly for validation purposes. The fidelity of the computa-
tional methods used in design has been steadily increasing. Over forty years ago,
computational fluid dynamics (CFD) tools were only capable of simulating po-
tential flow past simplified wing configurations [1, 21]. Today’s commercial CFD
tools, e.g., [22, 23], are capable of simulating three-dimensional viscous flows
past full aircraft configurations using the Reynolds-Averaged Navier-Stokes
(RANS) equations with the appropriate turbulence models [24].
The use of optimization methods in the design process, either as a design sup-
port tool or for automated design, has now become commonplace. In aircraft de-
sign, the use of numerical optimization techniques began in the mid-1970s by
coupling CFD tools with gradient-based optimization methods [1]. Substantial
progress has been made since then, and the exploitation of higher-fidelity meth-
ods, coupled with optimization techniques, has led to improved design efficiency
[4, 12-16]. An overview of the relevant work is provided in the later sections. In
spite of being widespread, simulation-driven aerodynamic design optimization
involves numerous challenges:
• High-fidelity CFD simulations are computationally expensive. A CFD si-
mulation involves solving the governing flow equations on a computa-
tional mesh. The resulting system of algebraic equations can be very large,
with a number of unknowns equal to the product of the number of flow va-
riables and the number of mesh points. For a three-dimensional turbulent
RANS flow simulation with one million mesh points, and a two-equation
turbulence model, there will be seven flow variables, leading to an alge-
braic system of seven million equations with seven million unknowns. De-
pending on the computational resources, this kind of simulation can take several hours, or even days, to complete.
Fig. 9.2 A single-element airfoil section (the solid line) of chord length c and thickness t. The free-stream velocity V∞ is at an angle of attack α relative to the x-axis. F is the force acting on the airfoil due to the airflow, where the component perpendicular to V∞ is called the lift force l and the component parallel to V∞ is called the drag force d. p is the pressure acting normal to a surface element of length ds, and τ is the viscous wall shear stress acting parallel to the surface element. θ is the angle that p and τ make relative to the z- and x-axis, respectively, positive clockwise
The section lift coefficient is defined as

$$ C_l \equiv \frac{l}{q_\infty S} \qquad (9.1) $$
where l is the magnitude of the lift force, q∞ ≡ (1/2)ρ∞V∞2 is the dynamic pressure,
ρ∞ is the air density, V∞ is the free-stream velocity, and S is a reference surface.
For a two-dimensional airfoil, the reference area is taken to be the chord length
multiplied by a unit depth, i.e., S = c. Similarly, the drag coefficient is defined as
$$ C_d \equiv \frac{d}{q_\infty S} \qquad (9.2) $$
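For concreteness, the two definitions translate directly into code. The following is a minimal sketch with assumed free-stream values (illustrative only, not data from this chapter):

```python
# A minimal sketch of Eqs. (9.1)-(9.2). All input values are assumed
# for illustration; q_inf is the dynamic pressure defined above.

def dynamic_pressure(rho_inf, V_inf):
    """q_inf = (1/2) * rho_inf * V_inf**2."""
    return 0.5 * rho_inf * V_inf**2

def force_coefficient(force, q_inf, S):
    """Normalize a force (lift l or drag d) by q_inf * S."""
    return force / (q_inf * S)

# Example: sea-level air density, V_inf = 50 m/s, unit-chord airfoil (S = c = 1).
q_inf = dynamic_pressure(rho_inf=1.225, V_inf=50.0)         # Pa
C_l = force_coefficient(force=1000.0, q_inf=q_inf, S=1.0)   # lift coefficient
C_d = force_coefficient(force=60.0, q_inf=q_inf, S=1.0)     # drag coefficient
```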
Table 9.1 Typical problem formulations for two-dimensional airfoil shape optimization. Additionally, a constraint on the minimum allowable airfoil cross-sectional area is included
The design task is formulated as a nonlinear constrained minimization problem

$$ \min_{\mathbf{x}} \; f(\mathbf{x}) \quad \text{s.t.} \quad g_j(\mathbf{x}) \leq 0, \quad \mathbf{l} \leq \mathbf{x} \leq \mathbf{u} \qquad (9.3) $$
where f(x) is the objective function, x is the design variable vector (parameters de-
scribing the airfoil shape), gj(x) are the design constraints, and l and u are the low-
er and upper bounds, respectively. The detailed formulation depends on the par-
ticular design problem. Typical problem formulations for two-dimensional airfoil
optimization are listed in Table 9.1. Additional constraints are often prescribed.
For example, to account for the wing structural components inside the airfoil, one
sets a constraint on the airfoil cross-sectional area, which can be formally written
as g2(x) = Amin – A(x) ≤ 0, where A(x) is the cross-sectional area of the airfoil for
the design vector x and Amin is the minimum allowable cross-sectional area. Other
constraints can be included depending on the design situation, e.g., a maximum
pitching moment or a maximum local allowable pressure coefficient [41].
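To make the formulation concrete, the sketch below casts (9.3) in SciPy for three NACA-style design variables. The functions evaluate_drag, evaluate_lift, and airfoil_area, and all numerical values, are hypothetical analytic stand-ins for the CFD and geometry evaluations used in practice:

```python
import numpy as np
from scipy.optimize import minimize

def evaluate_drag(x):   # f(x): drag coefficient to be minimized (toy model)
    m, p, tc = x
    return 0.006 + 0.05 * tc**2 + 0.02 * (m - 0.02)**2 + 0.001 * (p - 0.4)**2

def evaluate_lift(x):   # Cl(x); enters g1(x) = Cl,min - Cl(x) <= 0 (toy model)
    m, p, tc = x
    return 0.2 + 20.0 * m + 0.5 * tc

def airfoil_area(x):    # A(x); enters g2(x) = Amin - A(x) <= 0 (toy model)
    return 0.68 * x[2]  # thin-airfoil estimate for a unit chord

C_L_MIN, A_MIN = 0.5, 0.07
x0 = np.array([0.02, 0.4, 0.12])                  # (m, p, t/c), NACA-style
bounds = [(0.0, 0.09), (0.1, 0.9), (0.06, 0.18)]  # l <= x <= u
constraints = [                                   # SciPy convention: fun(x) >= 0
    {"type": "ineq", "fun": lambda x: evaluate_lift(x) - C_L_MIN},   # -g1(x)
    {"type": "ineq", "fun": lambda x: airfoil_area(x) - A_MIN},      # -g2(x)
]
res = minimize(evaluate_drag, x0, method="SLSQP",
               bounds=bounds, constraints=constraints)
print(res.x, res.fun)
```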
An aircraft wing and a turbomachinery blade are three-dimensional aerody-
namic surfaces. A schematic of a typical wing (or a blade) planform is shown in
Fig. 9.3, where—at each spanstation (numbered 1 through 4)—the wing cross-
section is defined by an airfoil shape. The number of spanstations can be smaller
or larger than four, depending on the design scenario. Between each station, there
is a straight-line wrap. Parameters controlling the planform shape include the wing
span, the quarter-chord wing sweep angle, the chord lengths and thickness-to-
chord ratio at each spanstation, the wing taper ratio, and the twist distribution.
Numerical design optimization of the three-dimensional wing (or blade) is per-
formed in a similar fashion as for the two-dimensional airfoil [4]. In the problem for-
mulation, the section lift and drag coefficients are replaced by the overall lift and drag
coefficients. However, the number of design variables is much larger and the fluid
flow domain is three-dimensional. These factors increase the computational burden,
and the setup of the optimization process becomes even more important [4].
Fig. 9.3 A schematic of a wing planform of semi-span b/2 and quarter-chord sweep angle Λ. Other planform parameters (not shown) are the taper ratio (ratio of tip chord to root chord) and the twist distribution. V∞ is the free-stream speed
The fluid flow past an aerodynamic surface is governed by the Navier-Stokes equations. For compressible viscous flow of a Newtonian fluid in two dimensions, without body forces, mass diffusion, finite-rate chemical reactions, heat conduction, or external heat addition, the Navier-Stokes equations can be written in Cartesian coordinates as [24]

$$ \frac{\partial U}{\partial t} + \frac{\partial E}{\partial x} + \frac{\partial F}{\partial y} = 0 \qquad (9.4) $$
$$ U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ E_t \end{bmatrix}, \quad E = \begin{bmatrix} \rho u \\ \rho u^2 + p - \tau_{xx} \\ \rho u v - \tau_{xy} \\ (E_t + p)u - u\tau_{xx} - v\tau_{xy} \end{bmatrix}, \quad F = \begin{bmatrix} \rho v \\ \rho u v - \tau_{xy} \\ \rho v^2 + p - \tau_{yy} \\ (E_t + p)v - u\tau_{xy} - v\tau_{yy} \end{bmatrix} \qquad (9.5) $$
Here, ρ is the fluid density, u and v are the x and y velocity components, respec-
tively, p is the static pressure, Et = ρ(e + V²/2) is the total energy per unit volume, e
is the internal energy per unit mass, V²/2 is the kinetic energy, and τ is the viscous
shear stress tensor given by [24]
$$ \tau_{xx} = \frac{2}{3}\mu\left(2\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right), \quad \tau_{yy} = \frac{2}{3}\mu\left(2\frac{\partial v}{\partial y} - \frac{\partial u}{\partial x}\right), \quad \tau_{xy} = \mu\left(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\right) \qquad (9.6) $$
Here μ is the dynamic viscosity. The system is closed by the perfect gas equation of state

$$ p = \rho R T \qquad (9.7) $$

where R is the gas constant and T is the temperature.
Fig. 9.4 A hierarchy of the governing fluid flow equations with the associated assumptions and approximations (from the Navier-Stokes and Reynolds-averaged (RANS) equations, through the thin-layer Navier-Stokes equations, to the Laplace, Prandtl-Glauert, and transonic small-disturbance equations)
Direct Numerical Simulation (DNS) aims to simulate the whole range of the turbulent statistical fluctuations at all relevant physical scales. This is a formidable challenge, which grows with increasing Reynolds number, as the total computational effort for DNS is proportional to Re^3 for homogeneous turbulence [25]. Due to the limitations of current computational capabilities, DNS is not feasible for typical engineering flows, such as those encountered in airfoil design for aircraft and turbomachinery, i.e., with Reynolds numbers from 10^5 to 10^7.
Large-Eddy Simulation (LES) is of the same category as DNS, in that it com-
putes directly the turbulent fluctuations in space and time, but only above a certain
length scale. Below that scale, the turbulence is modeled by semi-empirical laws.
The total computational effort for LES simulations is proportional to Re^{9/4}, which
is significantly lower than for DNS [25]. However, it is still excessively high for
large Reynolds number applications.
The Reynolds equations (also called the Reynolds-averaged Navier-Stokes (RANS) equations) are obtained by decomposing each turbulent flow quantity into its mean and fluctuating components and time-averaging the governing equations. The averaging introduces unclosed Reynolds-stress terms, which means that turbulence has to be treated through turbulence models. As a result, a loss in accuracy is introduced, since the available
turbulence models are not universal. A widely used turbulence model for simula-
tion of the flow past airfoils and wings is the Spalart-Allmaras one-equation turbu-
lence model [43]. The model was developed for aerospace applications and is con-
sidered to be accurate for attached wall-bounded flows and flows with mild
separation and recirculation. However, the RANS approach retains the viscous ef-
fects in the fluid flow, and, at the same time, significantly reduces the computa-
tional effort since there is no need to resolve all the turbulent scales (as it is done
in DNS and partially in LES). This approach is currently the most widely applied
approximation in the CFD practice and can be applied to both low-speed, such as
take-off and landing conditions of an aircraft, and high-speed design [25].
The inviscid flow assumption leads to the Euler equations. These equations
hold, in the absence of separation and other strong viscous effects, for any shape
of the body, thick or thin, and at any angle of attack [44]. Shock waves appear in
transonic flow where the flow goes from being supersonic to subsonic. Across the
shock, there is almost a discontinuous increase in pressure, temperature, density,
and entropy, but a decrease in Mach number (from supersonic to subsonic). The
shock is termed weak if the change in pressure is small, and strong if the change in
pressure is large. The entropy change is of third order in terms of shock strength.
If the shocks are weak, the entropy change across shocks is small, and the flow
can be assumed to be isentropic. This, in turn, allows for the assumption of irrota-
tional flow. Then, the Euler equations reduce to a single nonlinear partial differ-
ential equation, called the full potential equation (FPE). In the case of a slender
body at a small angle of attack, we can make the assumption of a small distur-
bance. Then, the FPE becomes the transonic small-disturbance equation (TSDE).
These three different sets of equations, i.e., the Euler equations, FPE, and TSDE,
represent a hierarchy of models for the analysis of inviscid, transonic flow past
airfoils [44]. The Euler equations are exact, while FPE is an approximation (weak
shocks) to those equations, and TSDE is a further approximation (thin airfoils at
small angle of attack). These approaches can be applied effectively for high-speed
design, such as the cruise design of transport aircraft wings [13, 14] and the design
of turbomachinery blades [2].
There are numerous airfoil and wing models that are not typical CFD models,
but they are nevertheless widely used in aerodynamic design. Examples of such
methods include thin airfoil theory, lifting line theory (unswept wings), vortex lat-
tice methods (wings), and panel methods (airfoils and wings). These methods are
outside the scope of this chapter, but the interested reader is directed to [45] and
[46] for the details. In the following section, we describe the elements of a typical
CFD simulation of the RANS or Euler equations.
9.3.2.1 Geometry
Several methods are available for describing the airfoil shape numerically, each
with its own benefits and drawbacks. In general, these methods are based on two
different approaches, either the airfoil shape itself is parameterized, or, given an
initial airfoil shape, the shape deformation is parameterized.
Fig. 9.5 Elements of a single CFD simulation in numerical airfoil shape optimization: generate geometry, generate grid, flow solution, and evaluation of objective(s) and constraints
The shape deformation approach is usually performed in two steps. First, the
surface of the airfoil is deformed by adding values computed from certain
functions to the upper and lower sides of the surfaces. Several different types of
functions can be considered, such as the Hicks-Henne bump functions [1], or the
transformed cosine functions [47]. After deforming the airfoil surface, the compu-
tational grid needs to be regenerated. Either the whole grid is regenerated based on
the airfoil shape deformation, or the grid is deformed locally, accounting for the
airfoil shape deformation. The latter is computationally more efficient. An exam-
ple grid deformation method is the volume spline method [47]. In some cases, the
first step described above is skipped, and the grid points on the airfoil surface
are used directly for the shape deformation [14].
Numerous airfoil shape parameterization methods have been developed. The
earliest development of parameterized airfoil sections was performed by the National Advisory Committee for Aeronautics (NACA) in the 1930s [48]. Their development was based on wind tunnel experiments and, therefore, the shapes generated by this method are limited to the families considered in those investigations. However, only three
parameters are required to describe their shape. Nowadays, the most widely used
airfoil shape parameterization methods are the Non-Uniform Rational B-Spline
(NURBS) [27], and the Bézier curves [49] (a special case of NURBS). These me-
thods use a set of control points to define the airfoil shape and are general enough
so that (nearly) any airfoil shape can be generated. In numerical optimization,
these control points are used as design variables and they provide sufficient con-
trol of the shape so that local changes on the upper and lower surfaces can be
made separately. The number of control points varies depending on how accu-
rately the shape is to be controlled. NURBS requires as few as thirteen control
points to represent a large family of airfoils [27]. Other parameterization methods
include the PARSEC method [50], which uses 11 specific airfoil geometry pa-
rameters (such as leading edge radius, and upper and lower crest location includ-
ing curvature), and the Bezier-PARSEC method [51], which combines the Bezier
and PARSEC methods.
In this chapter, for the sake of simplicity, we use the NACA airfoil shapes [48]
to illustrate some variable-fidelity optimization methods. In particular, we use the
NACA four-digit airfoil parameterization method, where the airfoil shape is de-
fined by three parameters m (the maximum ordinate of the mean camberline as a
fraction of chord), p (the chordwise position of the maximum ordinate) and t/c (the
thickness-to-chord ratio). The airfoils are denoted by NACA mpxx, where xx is the thickness-to-chord ratio, t/c, in percent of chord.
The NACA airfoils are constructed by combining a thickness function yt(x)
with a mean camber line function yc(x). The airfoil surface coordinates are [48]

$$ x_{u,l} = x \mp y_t \sin\theta \qquad (9.8) $$

$$ z_{u,l} = y_c \pm y_t \cos\theta \qquad (9.9) $$

where u and l refer to the upper and lower surfaces, respectively, yt(x) is the thickness function, yc(x) is the mean camber line function, and

$$ \theta = \tan^{-1}\!\left(\frac{dy_c}{dx}\right) \qquad (9.10) $$
is the mean camber line slope. The NACA four-digit thickness distribution is given
by
$$ y_t = t \left( a_0 x^{1/2} - a_1 x - a_2 x^2 + a_3 x^3 - a_4 x^4 \right) \qquad (9.11) $$
and the mean camber line by

$$ y_c = \begin{cases} \dfrac{m}{p^2}\left(2px - x^2\right), & x < p \\[6pt] \dfrac{m}{(1-p)^2}\left(1 - 2p + 2px - x^2\right), & x \geq p \end{cases} \qquad (9.12) $$
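The construction above can be coded compactly. The sketch below assumes the standard thickness coefficients of the four-digit series from [48] and writes the leading factor as 5(t/c), i.e., the coefficients a0, ..., a4 in Eq. (9.11) are taken to absorb the conventional 1/0.2 scaling:

```python
import numpy as np

# Standard NACA four-digit thickness coefficients [48]; the 5*(t/c)
# prefactor below makes them equivalent to t*a_i in Eq. (9.11).
A0, A1, A2, A3, A4 = 0.2969, 0.1260, 0.3516, 0.2843, 0.1015

def naca4(m, p, tc, n=101):
    """Upper and lower surface coordinates of a NACA mpxx airfoil."""
    x = 0.5 * (1.0 - np.cos(np.linspace(0.0, np.pi, n)))   # cosine clustering
    yt = 5.0 * tc * (A0 * np.sqrt(x) - A1 * x - A2 * x**2
                     + A3 * x**3 - A4 * x**4)              # Eq. (9.11)
    yc, dyc = np.zeros_like(x), np.zeros_like(x)
    if m > 0.0:                                            # cambered airfoil
        fwd = x < p
        yc[fwd] = m / p**2 * (2.0 * p * x[fwd] - x[fwd]**2)       # Eq. (9.12)
        dyc[fwd] = 2.0 * m / p**2 * (p - x[fwd])
        yc[~fwd] = m / (1 - p)**2 * (1 - 2*p + 2*p*x[~fwd] - x[~fwd]**2)
        dyc[~fwd] = 2.0 * m / (1 - p)**2 * (p - x[~fwd])
    theta = np.arctan(dyc)                                 # Eq. (9.10)
    xu, zu = x - yt * np.sin(theta), yc + yt * np.cos(theta)  # Eqs. (9.8)-(9.9)
    xl, zl = x + yt * np.sin(theta), yc - yt * np.cos(theta)
    return (xu, zu), (xl, zl)

# Example: the NACA 2412 of Fig. 9.6 (m = 0.02, p = 0.4, t/c = 0.12).
upper, lower = naca4(0.02, 0.4, 0.12)
```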
Fig. 9.6 Three different NACA four-digit airfoil sections: NACA 0012 (m = 0, p = 0, t/c = 0.12), shown by a solid line (–); NACA 2412 (m = 0.02, p = 0.4, t/c = 0.12), shown by a dashed line (--); and NACA 4608 (m = 0.04, p = 0.6, t/c = 0.08), shown by a dash-dot line (-⋅-)
The governing equations are solved on a computational grid. The grid needs to re-
solve the entire solution domain, as well as the detailed airfoil geometry. Further-
more, the grid needs to be sufficiently fine to capture the flow physics accurately.
For example, a fine grid resolution is necessary near the airfoil surface, especially
near the leading edge (LE) where the flow gradients are large. Also, if viscous effects are in-
cluded, then the grid needs to be fine near the entire airfoil surface (and any other
wall surface in the solution domain). The grid can be much coarser several chord
lengths away from the airfoil and in the farfield. For a detailed discussion on grid
generation the reader is referred to [24] and [25].
For illustration purposes, a typical grid for an airfoil used in aircraft design,
generated using the computer program ICEM CFD [52], is shown in Fig. 9.7. This
is a structured curvilinear body-fitted grid of C-topology (a topology that can be
associated to the letter C, i.e., at the inlet the grid surrounds the leading-edge of
the airfoil, but is open at the other end). The size of the computational region is
made large enough so that it will not affect the flow solution. In this case, there are
24 chord lengths in front of the airfoil, 50 chord lengths behind it, and 25 chord
lengths above and below it. The airfoil leading-edge (LE) is located at the origin.
Most commercially available CFD flow solvers are based on the Finite Volume
Method (FVM). According to FVM, the solution domain is subdivided into a fi-
nite number of small control volumes (cells) by a grid. The grid defines the boun-
daries of the control volumes, while the computational node lies at the center of
the control volume. Integral conservation of mass, momentum, and energy are sat-
isfied exactly over each control volume. The result is a set of linear algebraic eq-
uations, one for each control volume. The set of equations are then solved itera-
tively, or simultaneously. Iterative solution is usually performed with relaxation to
suppress numerical oscillations in the flow solution that result from numerical er-
rors. The iterative process is repeated until the change in the flow variables in two
Fig. 9.7 (a) An example computational grid for the NACA 0012 airfoil with a C-topology, (b) a view of the computational grid close to the airfoil
An example convergence history of the residuals for mass, momentum, and energy is shown in Fig. 9.8(a), and the convergence of the lift and drag coefficients is shown in Fig. 9.8(b). The limit on the residuals to indicate convergence was set to 10^{-6}. The solver needed 216 iterations to reach full convergence of the flow solution. However, only about 50 iterations are necessary to reach convergence of the lift and drag coefficients. The
Mach number contour plot of the flow field around the airfoil is shown in
Fig. 9.9(a) and the pressure distribution on the airfoil surface is shown in
Fig. 9.9(b). On the upper surface there is a strong shock with associated decrease
in flow speed and an increase in pressure.
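The iterative procedure described above can be summarized by the following minimal sketch; solver_step is a hypothetical placeholder for one relaxation sweep of the discretized equations, and the residual limit of 10^{-6} mirrors the example above:

```python
# A residual-monitored iteration loop: one relaxation sweep per
# iteration until every residual drops below the prescribed limit.
def iterate_to_convergence(solver_step, state, tol=1e-6, max_iter=1000):
    for it in range(1, max_iter + 1):
        state, residuals = solver_step(state)  # residuals: iterable of floats
        if max(residuals) < tol:
            return state, it                   # converged after `it` sweeps
    return state, max_iter                     # iteration limit reached
```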
Fig. 9.8 (a) Convergence history of the simulation of the flow past the NACA 2412 at M∞ = 0.75 and α = 1 deg., (b) convergence of the lift and drag coefficients. The converged value of the lift coefficient is Cl = 0.67 and of the drag coefficient Cd = 0.0261
The aerodynamic forces are calculated by integrating the pressure (p) and the vis-
cous wall shear stress (τ), as defined in Figure 9.2, over the surface of the airfoil.
The pressure coefficient is defined as

$$ C_p \equiv \frac{p - p_\infty}{q_\infty} \qquad (9.13) $$

where p∞ is the free-stream static pressure.
Fig. 9.9 (a) Mach contour plot of the flow past the NACA 2412 at M∞ = 0.75 and α = 1 deg., showing the strong shock on the upper surface, (b) the pressure distribution on the surface of the airfoil. The lift coefficient is Cl = 0.67 and the drag coefficient Cd = 0.0261
Similarly, the skin friction coefficient is defined as

$$ C_f \equiv \frac{\tau}{q_\infty} \qquad (9.14) $$
The normal force coefficient (parallel to the z-axis) acting on the airfoil is [46]

$$ c_n = \frac{1}{c}\left[ -\int_{LE}^{TE} \left( C_{p,u}\cos\theta + C_{f,u}\sin\theta \right) ds_u + \int_{LE}^{TE} \left( C_{p,l}\cos\theta - C_{f,l}\sin\theta \right) ds_l \right] \qquad (9.15) $$

where ds is the length of a surface element, θ is the angle (positive clockwise) that p and τ make relative to the z- and x-axis, respectively, and the subscripts u and l refer to the upper and lower airfoil surfaces, respectively. The horizontal force coefficient (parallel to the x-axis) acting on the airfoil is [46]

$$ c_a = \frac{1}{c}\left[ \int_{LE}^{TE} \left( -C_{p,u}\sin\theta + C_{f,u}\cos\theta \right) ds_u + \int_{LE}^{TE} \left( C_{p,l}\sin\theta + C_{f,l}\cos\theta \right) ds_l \right] \qquad (9.16) $$

The lift force coefficient is then obtained as

$$ c_l = c_n \cos\alpha - c_a \sin\alpha \qquad (9.17) $$

where α is the airfoil angle of attack, and the drag force coefficient is calculated as

$$ c_d = c_n \sin\alpha + c_a \cos\alpha \qquad (9.18) $$
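As a minimal numerical illustration, the sketch below evaluates the lift and drag coefficients from a discrete pressure distribution only, i.e., with the skin friction and axial force contributions neglected; the inputs are assumed surface distributions such as the one plotted in Fig. 9.9(b):

```python
import numpy as np

def lift_drag_from_cp(x_c, cp_upper, cp_lower, alpha):
    """Pressure-only estimate: cn from the Cp difference, then Eqs. (9.17)-(9.18)."""
    cn = np.trapz(np.asarray(cp_lower) - np.asarray(cp_upper), x_c)
    ca = 0.0                                      # axial contribution neglected
    cl = cn * np.cos(alpha) - ca * np.sin(alpha)  # Eq. (9.17)
    cd = cn * np.sin(alpha) + ca * np.cos(alpha)  # Eq. (9.18)
    return cl, cd
```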
Fig. 9.10 Flow of direct, simulation-driven design optimization: starting from an initial design x(0), the optimization loop evaluates the high-fidelity CFD model at each design x(i), updates the design to x(i+1), and repeats until the termination condition is satisfied, yielding the final design
In many situations, the functions one wants to optimize are difficult to handle.
This is particularly the case in aerodynamic design where the objective and con-
straint functions are typically based on CFD simulations. The major issue is the computational cost of the simulation, which may be very high (e.g., up to several days or even weeks for a high-fidelity 3D wing simulation at a single design). Another problem is the numerical noise that is always present in CFD tools. Also, the simulation may fail for specific sets of design variables (e.g., due to convergence issues). In order to alleviate these problems, it is often advantageous to replace the original objective function, within the optimization process, by its surrogate model. To make this replacement successful, the surrogate should be a sufficiently accurate representation of the original function, yet analytically tractable (smooth) and, preferably, computationally cheap (so as to reduce the overall cost of the optimization process). In practice, surrogate-based optimization is often an iterative process, where the surrogate model is re-optimized and updated using the data from the original function that is accumulated during the algorithm run.
The flow of a typical SBO algorithm is shown in Fig. 9.11. The surrogate model is optimized, in place of the high-fidelity one, to yield a prediction of the high-fidelity minimizer. This prediction is verified by evaluating the high-fidelity model, which is typically done only once per iteration (at every new design x(i+1)). Depending on the result of this verification, the optimization process may be terminated or may continue, in which case the surrogate model is updated using the newly available high-fidelity model data, and then re-optimized to obtain a new, and hopefully better, approximation of the minimizer. For a well-performing surrogate-based algorithm, the number of iterations is substantially smaller than for most methods optimizing the high-fidelity model directly (e.g., gradient-based ones).
Fig. 9.11 Flow of a typical surrogate-based optimization algorithm: the surrogate model is optimized to yield a new design x(i+1), the high-fidelity CFD model is evaluated at x(i+1), the surrogate model is updated with the new data, and the loop repeats until the termination condition is satisfied
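The loop of Fig. 9.11 can be summarized by the following minimal sketch; f_high and build_surrogate are placeholders for the expensive CFD-based objective and for whatever surrogate construction a given application uses:

```python
import numpy as np
from scipy.optimize import minimize

def sbo(f_high, build_surrogate, x0, bounds, max_iter=20, tol=1e-4):
    """Generic surrogate-based optimization loop (one hi-fi evaluation per iteration)."""
    X, F = [np.asarray(x0, dtype=float)], [f_high(x0)]   # accumulated hi-fi data
    for _ in range(max_iter):
        surrogate = build_surrogate(np.array(X), np.array(F))
        x_new = minimize(surrogate, X[-1], bounds=bounds).x  # optimize the surrogate
        f_new = f_high(x_new)                # verify prediction with hi-fi model
        X.append(x_new); F.append(f_new)
        if np.linalg.norm(X[-1] - X[-2]) < tol:   # termination condition
            break
    best = int(np.argmin(F))
    return X[best], F[best]
```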
The functional surrogate models are typically cheap to evaluate. However, a considerable amount of data is required to set up a surrogate model that ensures reasonable accuracy. The methodology of constructing the functional surrogates is
generic, and, therefore, is applicable to a wide class of problems.
As the low-fidelity model shares the same underlying physics with the high-fidelity one, it is able to predict the general behavior of the high-fidelity model. However,
the low-fidelity model needs to be corrected to match the sampled data of the
high-fidelity model to become a reliable and accurate predictor. Popular correction
techniques include response correction [34] and space mapping [65]. One of the
recent techniques is shape-preserving response prediction (SPRP), introduced in [26]. The application of this technique to the design of airfoils at high-lift and transonic conditions is given in the next chapter.
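As a minimal illustration of the simplest output correction, the sketch below shifts the low-fidelity response additively so that the corrected surrogate matches the high-fidelity model at the current iterate; space mapping and SPRP perform more elaborate, structure-exploiting corrections:

```python
def additive_correction(f_low, f_high_at_xi, x_i):
    """Return s(x) = c(x) + [f(x_i) - c(x_i)], matching f at x_i."""
    d = f_high_at_xi - f_low(x_i)     # misalignment at the current iterate
    return lambda x: f_low(x) + d
```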
The physics-based surrogates are typically more expensive to evaluate than the
functional surrogates. Furthermore, they are problem specific, i.e., reuse across
different problems is rare. On the other hand, their fundamental advantage is that
much less high-fidelity model data is needed to obtain a given accuracy level than
in the case of functional surrogates. Some SBO algorithms exploiting physics-based
low-fidelity models (often referred to as variable- or multi-fidelity ones) require
just a single high-fidelity model evaluation per algorithm iteration to construct the
surrogate [34, 38]. One of the consequences is that variable-fidelity SBO methods scale better to larger numbers of design variables (assuming that no derivative information is required) than SBO using functional surrogates.
The AMMO algorithm is a general approach for controlling the use of variable-fidelity models when solving a nonlinear minimization problem, such as Eq. (9.3) [34-36]. A flowchart of the AMMO algorithm is shown in Fig. 9.12. The opti-
mizer receives the function and constraint values, as well as their sensitivities,
from the low-fidelity model. The response of the low-fidelity model is corrected to
satisfy at least zero- and first-order consistency conditions with the high-fidelity
model, i.e., agreement between the function values and the first-order derivatives
at a given iteration point. The expensive high-fidelity computations are performed
outside the optimization loop and serve to re-calibrate the low-fidelity model oc-
casionally, based on a set of systematic criteria. AMMO exploits the trust-region
methodology [66], which is an adaptive move limit strategy for improving the
global behavior of optimization algorithms based on local models. By combining the trust-region approach with the use of a low-fidelity model satisfying at least first-order consistency conditions, convergence of AMMO to the optimum of the high-fidelity model can be guaranteed.
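A minimal sketch of a surrogate satisfying these zero- and first-order consistency conditions is given below; the high-fidelity value and gradient at the current iterate are assumed available (e.g., from adjoints or finite differences), and all names are illustrative:

```python
import numpy as np

def consistent_surrogate(f_low, grad_low, f_hi_xi, grad_hi_xi, x_i):
    """s(x) with s(x_i) = f(x_i) and grad s(x_i) = grad f(x_i)."""
    x_i = np.asarray(x_i, dtype=float)
    d0 = f_hi_xi - f_low(x_i)                                 # value mismatch
    d1 = np.asarray(grad_hi_xi) - np.asarray(grad_low(x_i))   # gradient mismatch
    return lambda x: f_low(x) + d0 + d1 @ (np.asarray(x) - x_i)
```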
Fig. 9.12 Flowchart of the AMMO algorithm: starting from an initial design, each candidate design is accepted or rejected, the trust-region (TR) radius is updated accordingly, and the process repeats until convergence to the final design
Fig. 9.13 Flowchart of the surrogate management framework (SMF): starting from an initial surrogate, the SEARCH step evaluates surrogate-predicted designs with the high-fidelity model and updates the surrogate; if no improved design is found, the POLL step evaluates mesh neighbors, and the mesh is refined when polling also fails to improve the design
The convergence of the SMF is ensured by the POLL step, where the neighbors
of the current best solution are evaluated using the high-fidelity model on the
mesh in a positive spanning set of directions [69] to look for a local objective
function improvement. In case the POLL step fails to improve the objective func-
tion value, the mesh is being refined and the new iteration begins starting with the
SEARCH step.
The surrogate model is updated in each iteration using all accumulated
high-fidelity data.
In [69], the SMF algorithm is applied to the optimal aeroacoustic shape design of an airfoil in laminar flow. The airfoil shape is designed to minimize the total radiated acoustic power while constraining lift and drag. The high-fidelity model is implemented through the solution of the unsteady incompressible two-dimensional Navier-Stokes equations, with roughly a 24-hour analysis time for a single CFD evaluation. The surrogate function is constructed using kriging, which is typical when using the SMF algorithm. As the acoustic noise is generated at the airfoil trailing edge (TE), only the upper TE of the airfoil is parameterized, with a spline using five control points. Optimal shapes that minimize noise are reported. Results show a significant reduction (as much as 80%) in acoustic power with reasonable computational cost (fewer than 88 function evaluations).
One of the important steps of the SBO process is to update the surrogate model using the high-fidelity model data accumulated during the algorithm run. In particular, the high-fidelity model is evaluated at any new design obtained from the prediction provided by the surrogate model. The new points at which we
evaluate the high-fidelity model are referred to as infill points [33]. Selection of
these points is based on certain infill criteria. These criteria can be either exploita-
tion- or exploration-based.
A popular exploitation-based strategy is to select the surrogate minimizer as the
new infill point [33]. This strategy is able to ensure finding at least a local mini-
mum of the high-fidelity model provided that the surrogate model satisfies zero-
and first-order consistency conditions. In general, using the surrogate model
optimum as a validation point corresponds to exploitation of a certain region of the design space, i.e., the neighborhood of a local optimum. Selecting the surrogate mini-
mizer as the infill point is utilized by AMMO [34-36], SM [65, 73-75], MM [76],
and can also be used by SMF [38].
In exploration-based strategies, the new sample points are located in between
the existing ones. This allows building a surrogate model that is globally accurate.
A possible infill criterion is to allocate the new samples at the points of maximum
estimated error [33]. Pure exploration, however, may not be a good way of updating the surrogate model in the context of optimization, because the time spent on accurately modeling sub-optimal regions may be wasted if the global optimum is the only interest.
Probably the best way of performing global search is to balance exploration and
exploitation of the design space. The details regarding several possible approaches
can be found in [33].
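As a minimal illustration of one such balanced criterion, the sketch below computes the expected improvement (see, e.g., [33]) from a surrogate's predicted mean and error estimate; it is large both where the predicted mean is low (exploitation) and where the estimated error is large (exploration):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_min):
    """EI at candidate infill points with surrogate mean mu and error sigma."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-12)  # avoid division by 0
    z = (f_min - mu) / sigma
    return (f_min - mu) * norm.cdf(z) + sigma * norm.pdf(z)
```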
9.6 Summary
Although aerodynamic shape optimization (ASO) is widely used in engineering
design, there are numerous challenges involved. One of the biggest challenges is
that high-fidelity computational fluid dynamics (CFD) simulations are (usually) computationally expensive. As a result, the overall computational cost of the optimization process can become prohibitive when the high-fidelity model is optimized directly.
References
1. Hicks, R.M., Henne, P.A.: Wing Design by Numerical Optimization. Journal of Air-
craft 15(7), 407–412 (1978)
2. Braembussche, R.A.: Numerical Optimization for Advanced Turbomachinery Design.
In: Thevenin, D., Janiga, G. (eds.) Optimization and Computational Fluid Dynamics,
pp. 147–189. Springer, Heidelberg (2008)
3. Percival, S., Hendrix, D., Noblesse, F.: Hydrodynamic Optimization of Ship Hull
Forms. Applied Ocean Research 23(6), 337–355 (2001)
4. Leoviriyakit, K., Kim, S., Jameson, A.: Viscous Aerodynamic Shape Optimization of
Wings including Planform Variables. In: 21st Applied Aerodynamics Conference, Or-
lando, Florida, June 23-26 (2003)
5. van Dam, C.P.: The aerodynamic design of multi-element high-lift systems for trans-
port airplanes. Progress in Aerospace Sciences 8(2), 101–144 (2002)
6. Lian, Y., Shyy, W., Viieru, D., Zhang, B.: Membrane Wing Aerodynamics for Micro
Air Vehicles. Progress in Aerospace Sciences 39(6), 425–465 (2003)
7. Secanell, M., Suleman, A., Gamboa, P.: Design of a Morphing Airfoil Using Aerody-
namic Shape Optimization. AIAA Journal 44(7), 1550–1562 (2006)
8. Antoine, N.E., Kroo, I.A.: Optimizing Aircraft and Operations for Minimum Noise. In:
AIAA Paper 2002-5868, AIAA‘s Aircraft Technology, Integration, and Operations
(ATIO) Technical Forum, Los Angeles, California, October 1-3 (2002)
9. Hosder, S., Schetz, J.A., Grossman, B., Mason, W.H.: Airframe Noise Modeling Ap-
propriate for Multidisciplinary Design and Optimization. In: 42nd AIAA Aerospace
Sciences Meeting and Exhibit, Reno, NV, AIAA Paper 2004-0698, January 5-8 (2004)
10. Giannakoglou, K.C., Papadimitriou, D.I.: Adjoint Methods for Shape Optimization. In:
Thevenin, D., Janiga, G. (eds.) Optimization and Computational Fluid Dynamics,
pp. 79–108. Springer, Heidelberg (2008)
11. Celik, F., Guner, M.: Energy Saving Device of Stator for Marine Propellers. Ocean
Engineering 34(5-6), 850–855 (2007)
12. Jameson, A.: Aerodynamic Design via Control Theory. Journal of Scientific Comput-
ing 3, 233–260 (1988)
13. Reuther, J., Jameson, A.: Control Theory Based Airfoil Design for Potential Flow and
a Finite Volume Discretization. In: AIAA Paper 94-0499, AIAA 32nd Aerospace Sci-
ences Meeting and Exhibit, Reno, Nevada (January 1994)
14. Jameson, A., Reuther, J.: Control Theory Based Airfoil Design using Euler Equations.
In: Proceedings of AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary
Analysis and Optimization, Panama City Beach, pp. 206–222 (September 1994)
15. Reuther, J., Jameson, A., Alonso, J.J., Rimlinger, M.J., Sauders, D.: Constrained Mul-
tipoint Aerodynamic Shape Optimization Using an Adjoint Formulation and Parallel
Computing. In: 35th AIAA Aerospace Sciences Meeting & Exhibit, Reno, NV, AIAA
Paper 97-0103, January 6-9 (1997)
16. Kim, S., Alonso, J.J., Jameson, A.: Design Optimization of High-Lift Configurations
Using a Viscous Continuous Adjoint Method. In: AIAA paper 2002-0844, AIAA 40th
Aerospace Sciences Meeting & Exhibit, Reno, NV (January 2002)
17. Eyi, S., Lee, K.D., Rogers, S.E., Kwak, D.: High-lift design optimization using Navier-
Stokes equations. Journal of Aircraft 33(3), 499–504 (1996)
18. Nemec, M., Zingg, D.W.: Optimization of high-lift configurations using a Newton-
Krylov algorithm. In: 16th AIAA Computational Fluid Dynamics Conference,
Orlando, Florida, June 23-26 (2003)
19. Nemec, M., Zingg, D.W., Pulliam, T.H.: Multi-Point and Multi-Objective Aerody-
namic Shape Optimization. In: AIAA Paper 2002-5548, 9th AIAA/ISSMO Sympo-
sium on Multidisciplinary Analysis and Optimization, Atlanta, Georgia, September 4-6
(2002)
20. Giannakoglou, K.C.: Design of optimal aerodynamic shapes using stochastic optimiza-
tion methods and computational intelligence. Progress in Aerospace Sciences 38(2),
43–76 (2002)
21. Hicks, R.M., Murman, E.M., Vanderplaats, G.: An Assessment of Airfoil Design by
Numerical Optimization. NASA TM 3092 (July 1974)
22. FLUENT, ver. 6.3.26, ANSYS Inc., Southpointe, 275 Technology Drive, Canonsburg,
PA 15317 (2006)
23. GAMBIT. ver. 2.4.6, ANSYS Inc., Southpointe, 275 Technology Drive, Canonsburg,
PA 15317 (2006)
24. Tannehill, J.C., Anderson, D.A., Pletcher, R.H.: Computational Fluid Mechanics and
Heat Transfer, 2nd edn. Taylor & Francis, Abington (1997)
25. Hirsch, C.: Numerical Computation of Internal and External Flows – Fundamentals of
Computational Fluid Dynamics, 2nd ed., Butterworth-Heinemann (2007)
26. Leifsson, L., Koziel, S.: Multi-fidelity design optimization of transonic airfoils using
physics-based surrogate modeling and shape-preserving response prediction. Journal of
Computational Science 1(2), 98–106 (2010)
27. Lepine, J., Guibault, F., Trepanier, J.-Y., Pepin, F.: Optimized Nonuniform Rational
B-Spline Geometrical Representation for Aerodynamic Design of Wings. AIAA
Journal 39(11), 2033–2041 (2001)
28. Li, W., Huyse, L., Padula, S.: Robust Airfoil Optimization to Achieve Consistent Drag
Reduction Over a Mach Range. In: NASA/CR-2001-211042 (August 2001)
29. Giunta, A.A., Dudley, J.M., Narducci, R., Grossman, B., Haftka, R.T., Mason, W.H.,
Watson, L.T.: Noisy Aerodynamic Response and Smooth Approximations in HSCT
Design. In: 5th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analy-
sis and Optimization, Panama City, FL (September 1994)
30. Burman, J., Gebart, B.R.: Influence from numerical noise in the objective function for
flow design optimisation. International Journal of Numerical Methods for Heat &
Fluid Flow 11(1), 6–19 (2001)
31. Dudley, J.M., Huang, X., MacMillin, P.E., Grossman, B., Haftka, R.T., Mason, W.H.:
Multidisciplinary Design Optimization of a High Speed Civil Transport. In: First In-
dustry/University Symposium on High Speed Transport Vehicles, December 4-6. NC
A&T University, Greensboro (1994)
32. Queipo, N.V., Haftka, R.T., Shyy, W., Goel, T., Vaidyanathan, R., Tucker, P.K.:
Surrogate-Based Analysis and Optimization. Progress in Aerospace Sciences 41(1),
1–28 (2005)
33. Forrester, A.I.J., Keane, A.J.: Recent advances in surrogate-based optimization. Pro-
gress in Aerospace Sciences 45(1-3), 50–79 (2009)
34. Alexandrov, N.M., Lewis, R.M.: An overview of first-order model management for
engineering optimization. Optimization and Engineering 2(4), 413–430 (2001)
35. Alexandrov, N.M., Nielsen, E.J., Lewis, R.M., Anderson, W.K.: First-Order Model
Management with Variable-Fidelity Physics Applied to Multi-Element Airfoil Optimi-
zation. In: 8th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Design
and Optimization, AIAA Paper 2000-4886, Long Beach, CA (September 2000)
36. Alexandrov, N.M., Lewis, R.M., Gumbert, C.R., Green, L.L., Newman, P.A.: Optimi-
zation with Variable-Fidelity Models Applied to Wing Design. In: 38th Aerospace
Sciences Meeting & Exhibit, Reno, NV, AIAA Paper 2000-0841(January 2000)
37. Robinson, T.D., Eldred, M.S., Willcox, K.E., Haimes, R.: Surrogate-Based Optimiza-
tion Using Multifidelity Models with Variable Parameterization and Corrected Space
Mapping. AIAA Journal 46(11) (November 2008)
38. Booker, A.J., Dennis Jr., J.E., Frank, P.D., Serafini, D.B., Torczon, V., Trosset, M.W.:
A rigorous framework for optimization of expensive functions by surrogates. Struc-
tural Optimization 17(1), 1–13 (1999)
39. Lee, D.S., Gonzalez, L.F., Srinivas, K., Periaux, J.: Multi-objective robust design op-
timisation using hierarchical asynchronous parallel evolutionary algorithms. In: 45th
AIAA Aerospace Science Meeting and Exhibit, AIAA Paper 2007-1169, Reno,
Nevada, USA, January 8-11 (2007)
40. Barrett, T.R., Bressloff, N.W., Keane, A.J.: Airfoil Design and Optimization Using
Multi-Fidelity Analysis and Embedded Inverse Design. In: 47th
AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Con-
ference, Newport, AIAA Paper 2006-1820, Rhode Island, May 1-4 (2006)
41. Vanderplaats, G.N.: Numerical Optimization Techniques for Engineering Design, 3rd
edn. Vanderplaats Research and Development (1999)
42. Rai, M.M.: Robust optimal design with differential evolution. In: 10th AIAA/ISSMO
Multidisciplinary Analysis and Optimization Conference, AIAA Paper 2004-4588,
Albany, New York, August 30 - September 1 (2004)
43. Spalart, P.R., Allmaras, S.R.: A one-equation turbulence model for aerodynamic flows.
In: 30th AIAA Aerospace Sciences Meeting and Exhibit, Reno, Nevada, January 6-9
(1992)
44. Anderson, J.D.: Modern Compressible Flow – With Historical Perspective, 3rd edn.
McGraw-Hill, New York (2003)
45. Katz, J., Plotkin, A.: Low-Speed Aerodynamics, 2nd edn. Cambridge University Press,
Cambridge (2001)
46. Anderson, J.D.: Fundamentals of Aerodynamics, 4th edn. McGraw-Hill, New York
(2007)
47. Gauger, N.R.: Efficient Deterministic Approaches for Aerodynamic Shape Optimiza-
tion. In: Thevenin, D., Janiga, G. (eds.) Optimization and Computational Fluid Dy-
namics, pp. 111–145. Springer, Heidelberg (2008)
48. Abbott, I.H., Von Doenhoff, A.E.: Theory of Wing Sections. Dover Publications, New
York (1959)
49. Peigin, S., Epstein, B.: Robust Optimization of 2D Airfoils Driven by Full Navier-
Stokes Computations. Computers & Fluids 33, 1175–1200 (2004)
50. Sobieczky, H.: Parametric Airfoils and Wings. In: Fuji, K., Dulikravich, G.S. (eds.)
Notes on Numerical Fluid Mechanics, vol. 68. Vieweg, Wiesbaden (1998)
51. Derksen, R.W., Rogalsky, T.: Bezier-PARSEC: An Optimized Aerofoil Parameteriza-
tion for Design. Advances in Engineering Software 41, 923–930 (2010)
52. ICEM CFD. ver. 12.1, ANSYS Inc., Southpointe, 275 Technology Drive, Canonsburg,
PA 15317 (2006)
53. Kolda, T.G., Lewis, R.M., Torczon, V.: Optimization by direct search: new perspec-
tives on some classical and modern methods. SIAM Review 45(3), 385–482 (2003)
54. Nelder, J.A., Mead, R.: A simplex method for function minimization. Computer Jour-
nal 7, 308–313 (1965)
55. Goldberg, D.: Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley, Reading (1989)
56. Michalewicz, Z.: Genetic Algorithm + Data Structures = Evolutionary Programs, 3rd
edn. Springer, Heidelberg (1996)
57. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of IEEE Inter-
national Conference on Neural Networks, pp. 1942–1948 (1995)
58. Clerc, M., Kennedy, J.: The particle swarm - explosion, stability, and convergence in a
multidimensional complex space. IEEE Transactions on Evolutionary Computa-
tion 6(1), 58–73 (2002)
59. Storn, R., Price, K.: Differential evolution - a simple and efficient heuristic for global
optimization over continuous spaces. Journal of Global Optimization 11, 341–359
(1997)
60. Kirkpatrick, S., Gelatt Jr, C., Vecchi, M.: Optimization by simulated annealing.
Science 220(4498), 671–680 (1983)
61. Giunta, A.A., Wojtkiewicz, S.F., Eldred, M.S.: Overview of modern design of experi-
ments methods for computational simulations. In: 41st AIAA Aerospace Sciences
Meeting and Exhibit, AIAA Paper 2003-0649, Reno, NV (2003)
62. Simpson, T.W., Peplinski, J., Koch, P.N., Allen, J.K.: Metamodels for computer-based
engineering design: survey and recommendations. Engineering with Computers 17(2),
129–150 (2001)
63. Gunn, S.R.: Support vector machines for classification and regression. Tech. Rep.,
School of Electronics and Computer Science, University of Southampton (1998)
64. Forrester, A.I.J., Bressloff, N.W., Keane, A.J.: Optimization Using Surrogate Models
and Partially Converged Computational Fluid Dynamics Simulations. Proceedings of
the Royal Society A: Mathematical, Physical and Engineering Sciences 462(2071),
2177–2204 (2006)
65. Bandler, J.W., Cheng, Q.S., Dakroury, S.A., Mohamed, A.S., Bakr, M.H., Madsen, K.,
Søndergaard, J.: Space mapping: the state of the art. IEEE Trans. Microwave Theory
Tech. 52(1), 337–361 (2004)
66. Conn, A.R., Gould, N.I.M., Toint, P.L.: Trust Region Methods. MPS-SIAM Series on
Optimization (2000)
67. Rao, S.S.: Engineering Optimization: Theory and Practice, 3rd edn. Wiley, Chichester
(1996)
68. Eldred, M.S., Giunta, A.A., Collis, S.S.: Second-Order Corrections for Surrogate-
Based Optimization with Model Hierarchies. In: 10th AIAA/ISSMO Multidisciplinary
Analysis and Optimization Conference, AIAA Paper 2004-4457, Albany, NY (2004)
69. Marsden, A.L., Wang, M., Dennis, J.E., Moin, P.: Optimal aeroacoustic shape design
using the surrogate management framework. Optimization and Engineering 5, 235–262
(2004)
70. Robinson, T.D., Eldred, M.S., Willcox, K.E., Haimes, R.: Strategies for Multifidelity
Optimization with Variable Dimensional Hierarchical Models. In: 47th
AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Con-
ference, AIAA Paper 2006-1819, Newport, Rhode Island, May 1-4 (2006)
71. Robinson, T.D., Willcox, K.E., Eldred, M.S., Haimes, R.: Multifidelity Optimization
for Variable-Complexity Design. In: 11th AIAA/ISSMO Multidisciplinary Analysis
and Optimization Conference, AIAA Paper 2006-7114, Portsmouth, VA (September
2006)
72. Robinson, T.D., Eldred, M.S., Willcox, K.E., Haimes, R.: Surrogate-Based Optimiza-
tion Using Multifidelity Models with Variable Parameterization and Corrected Space
Mapping. AIAA Journal 46(11) (November 2008)
73. Koziel, S., Bandler, J.W., Madsen, K.: A space mapping framework for engineering
optimization: theory and implementation. IEEE Trans. Microwave Theory
Tech. 54(10), 3721–3730 (2006)
74. Koziel, S., Cheng, Q.S., Bandler, J.W.: Space mapping. IEEE Microwave
Magazine 9(6), 105–122 (2008)
75. Redhe, M., Nilsson, L.: Using space mapping and surrogate models to optimize vehicle
crashworthiness design. In: 9th AIAA/ISSMO Multidisciplinary Analysis and Optimi-
zation Symp., AIAA Paper 2002-5536, Atlanta, GA (September 2002)
76. Echeverria, D., Hemker, P.W.: Space mapping and defect correction. CMAM Int.
Mathematical Journal Computational Methods in Applied Mathematics 5(2), 107–136
(2005)
77. Koziel, S.: Efficient Optimization of Microwave Circuits Using Shape-Preserving Re-
sponse Prediction. In: IEEE MTT-S Int. Microwave Symp. Dig., Boston, MA, pp.
1569–1572 (2009)
78. Koziel, S., Leifsson, L.: Multi-Fidelity High-Lift Aerodynamic Optimization of Sin-
gle-Element Airfoils. In: Int. Conf. Engineering Optimization, Lisbon (September 6-9,
2010)
Chapter 10
Evolutionary Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization
A. Arias-Montaño, C.A. Coello Coello, and E. Mezura-Montes
10.1 Introduction
There are many industrial areas in which optimization processes help to find new
solutions and/or to increase the performance of an existing one. Thus, in many cases
a research goal can be translated into an optimization problem. Optimal design in
aeronautical engineering is, by nature, a multiobjective, multidisciplinary and highly
difficult problem. Aerodynamics, structures, propulsion, acoustics, manufacturing
and economics, are some of the disciplines involved in this type of problems. In
fact, even if a single discipline is considered, many design problems in aeronautical
engineering have conflicting objectives (e.g., to optimize a wing’s lift and drag or
a wing’s structural strength and weight). The increasing demand for optimal and
robust designs, driven by economics and environmental constraints, along with the
advances in computational intelligence and the increasing computing power, has
expanded the role of computational simulations from being just analysis tools to
becoming design optimization tools.
In spite of the fact that gradient-based numerical optimization methods have been successfully applied in a variety of aeronautical/aerospace design problems [30, 16, 42], their use is considered a challenge due to the following difficulties found in practice:
1. The design space is frequently multimodal and highly non-linear.
2. Evaluating the objective function (performance) for the design candidates is usu-
ally time consuming, due mainly to the high fidelity and high dimensionality
required in the simulations.
3. By themselves, single-discipline optimizations may provide solutions which do not necessarily satisfy objectives and/or constraints considered in other disciplines.
4. The complexity of the sensitivity analyses in Multidisciplinary Design Optimization (MDO) increases as the number of disciplines involved becomes larger.
5. In MDO, a trade-off solution, or a set of them, is sought.
Based on the previously indicated difficulties, designers have been motivated to
use alternative optimization techniques such as Evolutionary Algorithms (EAs)
[31, 20, 33]. Multi-Objective Evolutionary Algorithms (MOEAs) have gained an
increasing popularity as numerical optimization tools in aeronautical and aerospace
engineering during the last few years [1, 21]. These population-based methods
mimic the evolution of species and the survival of the fittest, and compared to tradi-
tional optimization techniques, they present the following advantages:
(b) Multiple Solutions per Run: As MOEAs use a population of candidates, they are
designed to generate multiple trade-off solutions in a single run.
(c) Easy to Parallelize: The design candidates in a MOEA population, at each
generation, can be evaluated in parallel using diverse paradigms.
(d) Simplicity: MOEAs use only the objective function values for each design can-
didate. They do not require a substantial modification or complex interfacing
for using a CFD (Computational Fluid Dynamics) or CSD/M (Computational
Structural Dynamics/Mechanics) code.
(e) Easy to Hybridize: Along with the simplicity previously stated, MOEAs also allow an easy hybridization with alternative methods, e.g., memetic algorithms, which introduce problem-specific features into the implementation without compromising the simplicity of the MOEA.
(f) Novel Solutions: In many cases, gradient-based optimization techniques con-
verge to designs which have little variation even if produced with very different
initial setups. In contrast, the inherent explorative capabilities of MOEAs allow
them to produce, sometimes, novel and non-intuitive designs.
An important volume of information has been published on the use of MOEAs in
aeronautical engineering applications (mainly motivated by the advantages previ-
ously addressed). In this chapter, we provide a review of some representative works,
dealing specifically with multi-objective aerodynamic shape optimization.
The remainder of this chapter is organized as follows: In Section 10.2, we present
some basic concepts and definitions adopted in multi-objective optimization. Next,
in Section 10.3, we review some of the work done in the area of multi-objective
aerodynamic shape optimization. This review covers: surrogate based optimization,
hybrid MOEA optimization, robust design optimization, multidisciplinary design op-
timization, and data mining and knowledge extraction. In Section 10.4 we present
a case study and, finally, in Section 10.5, we present our conclusions and final remarks.
10.2 Basic Concepts and Definitions

A multi-objective optimization problem (MOP) consists of minimizing a vector of k objective functions

$$ \min_{\mathbf{x}} \; \mathbf{f}(\mathbf{x}) = [f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_k(\mathbf{x})]^T \qquad (10.1) $$

subject to

$$ g_i(\mathbf{x}) \leq 0, \quad i = 1, 2, \ldots, m \qquad (10.2) $$

$$ h_i(\mathbf{x}) = 0, \quad i = 1, 2, \ldots, p \qquad (10.3) $$

Without loss of generality, minimization is assumed in the following definitions, since any maximization problem can be transformed into a minimization one.
In order to say that a solution dominates another one, it needs to be strictly better in
at least one objective, and not worse in any of them.
Formally, x∗ ∈ F is Pareto optimal if there exists no feasible vector x ∈ F such that fi(x) ≤ fi(x∗) for all i = 1, ..., k and fj(x) < fj(x∗) for at least one j. In words, this definition says that x∗ is Pareto optimal if there exists no feasible vector x which would decrease some objective without causing a simultaneous increase in at least one other objective (assuming minimization). This definition does not provide us with a single solution (in decision variable space), but a set of solutions which form the so-called Pareto Optimal Set (P∗), whose formal definition is given by:
$$ P^* = \{ \mathbf{x} \in F \mid \mathbf{x} \text{ is Pareto optimal} \} $$
The vectors that correspond to the solutions included in the Pareto optimal set are
said to be nondominated.
The corresponding set of nondominated objective vectors constitutes the so-called Pareto Front:

$$ PF^* = \{ \mathbf{f}(\mathbf{x}) \in \mathbb{R}^k \mid \mathbf{x} \in P^* \} $$
The goal in a MOP consists of determining P∗ from the set F of all the decision variable vectors that satisfy (10.2) and (10.3). Thus, when solving a MOP, we aim to find not one, but a set of solutions representing the best possible trade-offs among the objectives (the so-called Pareto optimal set).
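As a minimal illustration of the definitions above, the following sketch tests Pareto dominance between objective vectors and filters the nondominated subset of a finite set of candidate designs (minimization assumed):

```python
import numpy as np

def dominates(fa, fb):
    """fa dominates fb: no worse in every objective, better in at least one."""
    fa, fb = np.asarray(fa), np.asarray(fb)
    return bool(np.all(fa <= fb) and np.any(fa < fb))

def nondominated_indices(F):
    """Indices of objective vectors in F not dominated by any other vector."""
    F = np.asarray(F)
    return [i for i in range(len(F))
            if not any(dominates(F[j], F[i]) for j in range(len(F)) if j != i)]

# Example: three designs with objectives (drag, -lift); the third design
# is dominated by the first, so only the first two are returned.
F = [(0.010, -1.2), (0.008, -1.0), (0.012, -1.1)]
print(nondominated_indices(F))   # -> [0, 1]
```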
(i) maximization of the stage pressure rise, and (ii) minimization of the entropy
generation. Constraints were imposed on the mass flow rate to have a difference
less than 0.1% between the new one and the reference design. The blade ge-
ometry was constructed from airfoil shapes defined at four span stations, with a
total of 32 design variables. The authors adopted a MOEA based on MOGA [14]
with real-number encoding. The optimization process was coupled to a second-
order RSM, which was built with 1,024 design candidates using the Improved
Hypercube Sampling (IHS) algorithm. The authors reported that the evaluation
of the 1,024 sampling individuals took approximately 128 hours (5.33 days) us-
ing eight processors and a Reynolds-Averaged Navier-Stokes CFD simulation. In
their experiments, 12 design solutions were selected from the RSM-Pareto front
obtained, and such solutions were verified with a high fidelity CFD simulation.
The objective function values slightly differed from those obtained by the ap-
proximation model, but all the selected solutions were better in both objective
functions than the reference design.
• Song and Keane [46] performed the shape optimization of a civil aircraft en-
gine nacelle. The primary goal of the study was to identify the trade-off between
aerodynamic performance and noise effects associated with various geometric
features for the nacelle. For this, two objective functions were defined: i) scarf
angle, and ii) total pressure recovery. The nacelle geometry was modeled us-
ing 40 parameters, from which 33 were considered design variables. In their
study, the authors implemented the NSGA-II [12] as the multi-objective search
engine, while commercial CFD software was used for the evaluation of the three-
dimensional flow characteristics. A kriging-based surrogate model was adopted
in order to keep the number of designs being evaluated with the CFD tool to
a minimum. In their experiments, the authors reported difficulties in obtaining
a reliable Pareto front (there were large discrepancies between two consecutive
Pareto front approximations). They attributed this behavior to the large number
of variables in the design problem, and also to the associated difficulties to ob-
tain an accurate kriging model for these situations. In order to alleviate this, they
performed an analysis of variance (ANOVA) test to find the variables that con-
tributed the most to the objective functions. After this test, they presented results
with a reduced surrogate model, employing only 7 decision variables. The au-
thors argued that they obtained a design similar to the previous one, but requiring
a lower computational cost because of the use of a reduced number of variables
in the kriging model.
• Arabnia and Ghaly [2] presented the aerodynamic shape optimization of turbine
stages in three-dimensional fluid flow, so as to minimize the adverse effects of
three-dimensional flow features on the turbine performance. Two objectives were
considered: (i) maximization of isentropic efficiency for the stage, and (ii) mini-
mization of the streamwise vorticity. Additionally, constraints were imposed on:
(1) inlet total pressure and temperature, (2) exit pressure, (3) axial chord and
spacing, (4) inlet and exit flow angles, and (5) mass flow rate. The blade ge-
ometry, both for rotor and stator blades, was based on the E/TU-3 turbine.
The accuracy of the surrogate model relies on the number and on the distribution
of samples provided in the search space, as well as on the selection of the appropri-
ate model to represent the objective functions and constraints. One important fact is
that Pareto-optimal solutions based on the computationally cheap surrogate model
do not necessarily satisfy the real CFD evaluation. So, as indicated in the previ-
ous references, it is necessary to verify the whole set of Pareto-optimal solutions
found from the surrogate, which can render the problem very time consuming. If discrepancies are large, this condition might attenuate the benefit of using a surrogate model. The verification process is also needed in order to update the surrogate
model. This latter condition raises the question of how often in the design process it
is necessary to update the surrogate model. There are no general rules for this, and
many researchers rely on previous experiences and trial and error guesses.
CFD analyses rely on discretization of the flow domain and in numerical models
of the flow equations. In both cases, some sort of reduced model can be used as
fitness approximation methods, which can be further used to generate a surrogate
model. For example, Lee et al. [24] use different grid resolutions for the CFD sim-
ulations. Coarse grids are used for global exploration, while fine grids are used for
solution exploitation purposes.
Finally, many of the approaches using surrogates build them by relating the design variables to the objective functions. However, Leifsson and Koziel [25] have recently proposed the use of physics-based surrogate models which are built by relating the design variables to pressure distributions (instead of objective functions). The premise behind this approach is that, in aerodynamics, the objective functions are not directly related to the design variables, but to the pressure distributions.
distributions. The authors have presented successful results using this new kind of
surrogate model for global transonic airfoil optimization. Its extension to multiob-
jective aerodynamic shape optimization is straightforward and very promising.
Experience has shown that hybridizing MOEAs with gradient-based techniques can,
to some extent, increase their convergence rate. However, in the examples presented
above, the gradient information relies on local and/or global surrogate models. For this, one major concern is how to build a high-fidelity surrogate model from the existing designs in the current population, since their distribution in the design space can introduce some undesired bias in the surrogate model. Additionally, there are
no rules for choosing the number of points for building the surrogate model, nor
for defining the number of local searches to be performed. These parameters are
empirically chosen. Another idea that has not been explored in multi-objective evolutionary optimization is to use adjoint-based CFD solutions to obtain gradient information. Adjoint-based methods are mature techniques currently used for single-objective aerodynamic optimization [28], and with these techniques the gradient can be obtained at a cost roughly equivalent to one additional objective function evaluation.
The last objective function can be considered as a robust condition for the de-
sign, since it is computed as the average of the pressure loss coefficients at two
off-design incidence angles. The airfoil blade geometry was defined by twelve
design variables. The authors adopted MOGA [14] with real-numbers encoding
as their search engine. Aerodynamic performance evaluation for the compressor
blade was done using Navier-Stokes CFD simulations. The optimization process
was parallelized using 24 processors in order to reduce the computational time
required.
• Rai [37] dealt with the robust optimal aerodynamical design of a turbine blade
airfoil shape, taking into account the performance degradation due to manufac-
turing uncertainties. The objectives considered were: (i) to minimize the vari-
ance of the pressure distribution over the airfoil’s surface, and (ii) to maximize
the probability of constraint satisfaction. Only one constraint was considered, re-
lated to the minimum thickness of the airfoil shape. The author adopted a multi-
objective version of the differential evolution algorithm and used a high-fidelity
CFD simulation on a perturbed airfoil geometry in order to evaluate the aerody-
namic characteristics of the airfoil generated by the MOEA. The geometry used
in the simulation was perturbed, following a probability density function that is
observed for manufacturing tolerances. This process had a high computational
cost, which the author reduced using a neural network surrogate model.
• Shimoyama et al. [44] applied the design for multi-objective six sigma (DFMOSS) methodology [43] to the robust aerodynamic airfoil design of a Mars exploratory airplane. The aim was to find the trade-off between the optimality of the design and its robustness. The idea of the DFMOSS methodology was to incorporate a MOEA to
simultaneously optimize the mean value of an objective function, while minimiz-
ing its standard deviation due to the uncertainties in the operating environment.
The airfoil shape optimization problems considered two cases: a robust design of
(a) airfoil aerodynamic efficiency (lift to drag ratio), and (b) airfoil pitching mo-
ment constraint. In both cases, only the variability in the flow Mach number was
taken into account. The authors adopted MOGA [14] as their search engine. The
airfoil geometry was defined with 12 design variables. The aerodynamic perfor-
mance of the airfoil was evaluated by CFD simulations using the Favre-Averaged
compressible thin-layer Navier-Stokes equations. The authors reported computa-
tional times of about five minutes per airfoil, and about 56 hours for the total
optimization process, using a NEC SX-6 computing system with 32 processors.
Eighteen robust nondominated solutions were obtained in the first test case. From
this set, almost half of the population attained the 6σ condition. In the second test
case, more robust nondominated solutions were found, and they satisfied a sigma
level as high as 25σ .
• Lee et al. [24] presented the robust design optimization of an ONERA M6 Wing
Shape. The robust optimization was based on the concept of the Taguchi method
in which the optimization problem is solved considering uncertainties in the de-
sign environment, in this case, the flow Mach number. The problem had two ob-
jectives: (i) minimization of the mean value of an objective function with respect
to variability of the operating conditions, and (ii) minimization of the variance
of the objective function of each candidate solution, with respect to its mean
value. In the sample problems, the wing was defined by means of its planform
shape (sweep angle, aspect ratio, taper ratio, etc.) and of the airfoil geometry, at
three wing locations (each airfoil shape was defined with a combination of mean
lines and camber distributions), using a total of 80 design variables to define the
wing designs. Geometry constraints were defined by upper and lower limits of
the design variables. The authors adopted the Hierarchical Asynchronous Paral-
lel Multi-Objective Evolutionary Algorithm (HAPMOEA) algorithm [15], which
is based on evolution strategies, incorporating the concept of Covariance Matrix
Adaptation (CMA). The aerodynamic evaluation was done with a CFD simulation. Twelve solutions were obtained in the robust design of the wing. All the nondom-
inated solutions showed a better behavior, in terms of aerodynamic performance
(lift-to-drag ratio) with a varying Mach number, as compared to the baseline de-
sign. During the evolutionary process, a total of 1100 individuals were evaluated
in approximately 100 hours of CPU time.
As can be seen from the previous examples, robust solutions can be achieved in evolutionary optimization in different ways. One simple approach is to add perturbations to the design variables or environmental parameters before the fitness is evaluated, which is known as implicit averaging [50]. An alternative to implicit averaging is explicit averaging, which means that the fitness value of a given design is averaged over a number of designs generated by adding random perturbations to the original design. One drawback of the explicit averaging method is the number of additional quality evaluations needed, which can render the approach impractical. In
order to tackle this problem, metamodeling techniques have been considered [32].
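The following minimal sketch illustrates explicit averaging under the stated assumptions; the objective function, perturbation level, and sample count are hypothetical placeholders.

import numpy as np

def fitness(x):
    # placeholder for a cheap quality evaluation
    return float(np.sum(x ** 2))

def robust_fitness(x, sigma=0.01, n_samples=20, rng=np.random.default_rng(1)):
    # average the fitness over randomly perturbed copies of the design;
    # each sample costs one extra quality evaluation, which is the
    # drawback mentioned above
    samples = x + rng.normal(0.0, sigma, size=(n_samples, x.size))
    return float(np.mean([fitness(s) for s in samples]))

x = np.array([0.2, 0.4, 0.1])
print(robust_fitness(x))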
to the wing’s geometry and two more to the operating conditions in lift coeffi-
cient and to the fuel volume required for a predefined aircraft mission. The wing
geometry was defined by 35 design variables. The authors adopted ARMOGA
[40]. The disciplines involved included aerodynamics and structural analysis and
during the optimization process, an iterative aeroelastic solution was generated
in order to minimize the wing weight, with constraints on flutter and strength
requirements. Also, a flight envelope analysis was done, obtaining high-fidelity
Navier-Stokes solutions for various flight conditions. Although the authors used
very small population sizes (eight individuals), about 880 hours of CPU time
were required at each generation, since an iterative process was performed in or-
der to optimize the wing weight, subject to aeroelastic and strength constraints.
The population was reinitialized every 5 generations for range adaptation of the design variables. In spite of the use of such a reduced population size, the authors were able to find several nondominated solutions outperforming the initial design. They also noted that during the evolution the wing-box weight tended to increase, but this degrading effect was offset by an increase in aerodynamic efficiency, yielding a reduction in block fuel of over one percent, which would translate into significant savings in an airline's operational costs.
• Sasaki et al. [41] used MDO for the design of a supersonic wing shape. In this
case, four objective functions were minimized: (i) drag coefficient at transonic
cruise, (ii) drag coefficient at supersonic cruise, (iii) bending moment at the wing
root at supersonic cruise condition, and (iv) pitching moment at supersonic cruise
condition. The problem was defined by 72 design variables. Constraints were
imposed on the variables ranges and on the wing section’s thickness and camber,
all of them being geometrical constraints. The authors adopted ARMOGA [40],
and the aerodynamic evaluation of the design solutions was done by high-fidelity
Navier-Stokes CFD simulations. No aeroelastic analysis was performed, which
considerably reduced the total computational cost. The objective associated with
the bending moment at wing root was evaluated by numerical integration of the
pressure distribution over the wing surface, as obtained by the CFD analysis. The
authors indicated that among the nondominated solutions there were designs that
were better in all four objectives with respect to a reference design.
• Lee et al. [23] utilized a generic Framework for MDO to explore the improve-
ment of aerodynamic and radar cross section (RCS) characteristics of an Un-
manned Combat Aerial Vehicle (UCAV). In this application, two disciplines were
considered, the first concerning the aerodynamic efficiency, and the second re-
lated to the visual and radar signature of an UCAV airplane. In this case, three
objective functions were minimized: (i) inverse of the lift/drag ratio at ingress
condition, (ii) inverse of the lift/drag ratio at cruise condition, and (iii) frontal
area. The number of design variables was approximately 100, and only side constraints (bounds on the design variables) were considered. The first two objective func-
tions were evaluated using a Potential Flow CFD Solver (FLO22) [17] coupled to
FRICTION code to obtain the viscous drag, using semi-empirical relations. The
authors adopted the Hierarchical Asynchronous Parallel Multi-Objective Evolu-
tionary Algorithm (HAPMOEA) [15]. The authors reported a processing time
of 200 hours for their approach, on a single 1.8 GHz processor. It is important
to consider that HAPMOEA operates with different CFD grid levels (i.e. ap-
proximation levels): coarse, medium, and fine. In this case, the authors adopted
different population sizes for each of these levels. Also, solutions were allowed
to migrate from a low/high fidelity level to a higher/lower one in an island-like
mechanism.
The increasing complexity of engineering systems has raised the interest in multidis-
ciplinary optimization, as can be seen from the examples presented in this section.
For this task, MOEAs facilitate the integration of several disciplines, since they do
not require additional information other than the evaluation of the corresponding
objective functions, which is usually done by each discipline and by the use of sim-
ulations. Additionally, an advantage of the use of MOEAs for MDO is that they can easily manage any combination of variable types coming from the involved disciplines, e.g., the variables from the aerodynamic discipline can be continuous, while those for the structural optimization can be discrete. Kuhn
et al. [22] presented an example of this condition for the multi-disciplinary design
of an airship. However, one challenge in MDO is the increasing dimensionality at-
tained in the design space, as the number of disciplines also increases.
• Jeong et al. [18] and Chiba et al. [7, 6] explored the trade-offs among four aero-
dynamic objective functions in the optimization of a wing shape for a Reusable
Launch Vehicle (RLV). The objective functions were: (i) the shift of the aerodynamic center between supersonic and transonic flight conditions, (ii) pitching moment in the transonic flight condition, (iii) drag in the transonic flight condition, and (iv) lift for the subsonic flight condition. The first three objectives were minimized while the fourth was maximized. These objectives were selected for attaining control, stability, range, and take-off constraints, respectively. The RLV
definition comprised 71 design variables to define the wing planform, the wing
position along the fuselage and the airfoil shape at prescribed wingspan stations.
The authors adopted ARMOGA [40], and the aerodynamic evaluation of the RLV
was done with a Reynolds-Averaged Navier-Stokes CFD simulation. A trade-off analysis was conducted with 102 nondominated individuals generated by the MOEA. Data mining was performed with SOM in [7]; with SOM, Batch-SOM, ANOVA, and rough sets in [6]; and with SOM, Batch-SOM, and ANOVA in [18]. In all cases, some knowledge was extracted regarding the correlation of each design variable with the objective functions.
• Oyama et al. [35] applied a design exploration technique to extract knowledge in-
formation from a flapping wing MAV (Micro Air Vehicle). The flapping motion
of the MAV was analyzed using multi-objective design optimization techniques
in order to obtain nondominated solutions. Such nondominated solutions were
further analyzed with SOMs in order to extract knowledge about the effects of the
flapping motion parameters on the objective functions. The conflicting objectives
considered were: (i) maximization of the time-averaged lift coefficient, (ii) max-
imization of the time-averaged thrust coefficient, and (iii) minimization of the
time-averaged required power coefficient. The problem had five design variables
and the geometry of the flying wing was kept fixed. Constraints were imposed
on the averaged lift and thrust coefficients so that they were positive. The authors
adopted a GA-based MOEA. The objective functions were obtained by means of
CFD simulations, solving the unsteady incompressible Navier-Stokes equations.
Objective functions were averaged over one flapping cycle. The purpose of the
study was to extract trade-off information from the objective functions and the
flapping motion parameters such as plunge amplitude and frequency, pitching
angle amplitude and offset.
• Tani et al. [49] solved a multiobjective rocket engine turbopump blade shape op-
timization design which considered three objective functions: (i) shaft power, (ii)
entropy rise within the stage, and (iii) angle of attack of the next stage. The first
objective was maximized while the others were minimized. The design candi-
dates defined the turbine blade aerodynamic shape and consisted of 58 design
variables. The authors adopted MOGA [14] as their search engine. The objective
function values were obtained from a CFD Navier-Stokes flow simulation. The
authors reported using SOMs to extract correlation information for the design
variables with respect to each objective function.
The data mining techniques used in the above examples, in which the objective function values are correlated with the design parameter values of the Pareto-optimal solutions, yield valuable information. However, in many other cases involving aerodynamic flows, the knowledge required is related to the physics of the flow rather than to the geometry given by the design variables; for example, understanding the relation between shock wave formation and the aerodynamic characteristics in transonic airfoil optimization. For this, Oyama et al. [34] have recently proposed a new approach to extract useful design information from one-dimensional, two-dimensional, and three-dimensional flow data of Pareto-optimal solutions. They analyze the flow data by Proper Orthogonal Decomposition (POD), a statistical approach that can extract dominant features in the data by decomposing it into a set of optimal orthogonal base vectors of decreasing importance.
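The following minimal sketch shows the usual SVD-based route to such a decomposition on a synthetic snapshot matrix; it illustrates POD in general, not the specific flow data treatment of [34].

import numpy as np

rng = np.random.default_rng(2)
snapshots = rng.random((500, 12))    # 12 flow snapshots, 500 grid values each

mean_flow = snapshots.mean(axis=1, keepdims=True)
fluctuations = snapshots - mean_flow

# columns of U are the orthogonal POD modes, ordered by decreasing
# singular value (i.e., decreasing importance/energy)
U, s, Vt = np.linalg.svd(fluctuations, full_matrices=False)
energy = s ** 2 / np.sum(s ** 2)
print("energy captured by first 3 modes:", energy[:3].sum())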
The lower and upper bounds of the 12 design variables are:

Variable:  rleup   rlelo  αte   βte   Zte     ΔZte    Xup   Zup   Zxxup  Xlo   Zlo     Zxxlo
min:       0.0085  0.002  7.0   10.0  -0.006  0.0025  0.41  0.11  -0.9   0.20  -0.023  0.05
max:       0.0126  0.004  10.0  14.0  -0.003  0.0050  0.46  0.13  -0.7   0.26  -0.015  0.20
$$Z_{lower} = \sum_{n=1}^{6} b_n\, x^{\,n-1/2} \qquad (10.5)$$

(together with the analogous upper-surface expansion $Z_{upper} = \sum_{n=1}^{6} a_n\, x^{\,n-1/2}$). In the above equations, the coefficients $a_n$ and $b_n$ are determined as functions of the 12 geometric parameters described above, by solving the following two systems of linear equations:
Upper surface:
$$
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
X_{up}^{1/2} & X_{up}^{3/2} & X_{up}^{5/2} & X_{up}^{7/2} & X_{up}^{9/2} & X_{up}^{11/2} \\
\tfrac{1}{2} & \tfrac{3}{2} & \tfrac{5}{2} & \tfrac{7}{2} & \tfrac{9}{2} & \tfrac{11}{2} \\
\tfrac{1}{2}X_{up}^{-1/2} & \tfrac{3}{2}X_{up}^{1/2} & \tfrac{5}{2}X_{up}^{3/2} & \tfrac{7}{2}X_{up}^{5/2} & \tfrac{9}{2}X_{up}^{7/2} & \tfrac{11}{2}X_{up}^{9/2} \\
-\tfrac{1}{4}X_{up}^{-3/2} & \tfrac{3}{4}X_{up}^{-1/2} & \tfrac{15}{4}X_{up}^{1/2} & \tfrac{35}{4}X_{up}^{3/2} & \tfrac{63}{4}X_{up}^{5/2} & \tfrac{99}{4}X_{up}^{7/2} \\
1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \end{bmatrix}
=
\begin{bmatrix}
Z_{te} + \tfrac{1}{2}\Delta Z_{te} \\ Z_{up} \\ \tan((2\alpha_{te} - \beta_{te})/2) \\ 0 \\ Z_{xxup} \\ \sqrt{r_{leup}}
\end{bmatrix}
\qquad (10.6)
$$
It is important to note that the geometric parameters $r_{leup}/r_{lelo}$, $X_{up}/X_{lo}$, $Z_{up}/Z_{lo}$, $Z_{xxup}/Z_{xxlo}$, $Z_{te}$, $\Delta Z_{te}$, $\alpha_{te}$, and $\beta_{te}$ are the actual design variables in the optimization process, and that the coefficients $a_n$, $b_n$ serve as intermediate variables for interpolating the airfoil's coordinates, which are used by the CFD solver (we used the Xfoil CFD code [13]) for its discretization process.
Lower surface:
$$
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 \\
X_{lo}^{1/2} & X_{lo}^{3/2} & X_{lo}^{5/2} & X_{lo}^{7/2} & X_{lo}^{9/2} & X_{lo}^{11/2} \\
\tfrac{1}{2} & \tfrac{3}{2} & \tfrac{5}{2} & \tfrac{7}{2} & \tfrac{9}{2} & \tfrac{11}{2} \\
\tfrac{1}{2}X_{lo}^{-1/2} & \tfrac{3}{2}X_{lo}^{1/2} & \tfrac{5}{2}X_{lo}^{3/2} & \tfrac{7}{2}X_{lo}^{5/2} & \tfrac{9}{2}X_{lo}^{7/2} & \tfrac{11}{2}X_{lo}^{9/2} \\
-\tfrac{1}{4}X_{lo}^{-3/2} & \tfrac{3}{4}X_{lo}^{-1/2} & \tfrac{15}{4}X_{lo}^{1/2} & \tfrac{35}{4}X_{lo}^{3/2} & \tfrac{63}{4}X_{lo}^{5/2} & \tfrac{99}{4}X_{lo}^{7/2} \\
1 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \end{bmatrix}
=
\begin{bmatrix}
Z_{te} - \tfrac{1}{2}\Delta Z_{te} \\ Z_{lo} \\ \tan((2\alpha_{te} + \beta_{te})/2) \\ 0 \\ Z_{xxlo} \\ -\sqrt{r_{lelo}}
\end{bmatrix}
\qquad (10.7)
$$
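As an illustration, the upper-surface system (10.6) can be assembled and solved numerically as in the following sketch; the parameter values are arbitrary samples from the ranges given above, and the function name is ours.

import numpy as np

def upper_coefficients(r_leup, X_up, Z_up, Z_xxup, Z_te, dZ_te, a_te, b_te):
    # angles a_te, b_te must be given in radians
    e = np.array([1, 3, 5, 7, 9, 11]) / 2.0           # exponents n - 1/2
    A = np.vstack([
        np.ones(6),                                    # Z at the trailing edge
        X_up ** e,                                     # Z at the crest
        e,                                             # dZ/dx at x = 1
        e * X_up ** (e - 1),                           # dZ/dx = 0 at the crest
        e * (e - 1) * X_up ** (e - 2),                 # d2Z/dx2 at the crest
        np.eye(6)[0],                                  # leading-edge radius
    ])
    rhs = np.array([Z_te + 0.5 * dZ_te, Z_up,
                    np.tan((2 * a_te - b_te) / 2.0), 0.0,
                    Z_xxup, np.sqrt(r_leup)])
    return np.linalg.solve(A, rhs)

a = upper_coefficients(0.01, 0.43, 0.12, -0.8, -0.004, 0.004,
                       np.deg2rad(8.0), np.deg2rad(12.0))
x = np.linspace(1e-6, 1.0, 50)
z_upper = sum(a[n] * x ** ((2 * n + 1) / 2.0) for n in range(6))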
10.4.3 Constraints
For this case study, five constraints are considered. The first three are defined in terms of the flight speed for each objective function: the prescribed lift coefficients, CL = 0.63 for objective (i), CL = 0.86 for objective (ii), and CL = 1.05 for objective (iii), enable the glider to fly at a given design speed and to produce the necessary amount of lift to balance the gravity force for each design condition being analyzed. It is important to note that, by prescribing the required CL, the corresponding angle of attack α for the airfoil is obtained as an additional variable. For this, given the design candidate geometry, the flow solver solves the flow equations with a constraint on the CL value, i.e., it additionally determines the operating angle of attack α. Two ad-
ditional constraints are defined for the airfoil geometry. First, the maximum airfoil
thickness range is defined by 13.0% ≤ t/c ≤ 13.5%. For handling this constraint,
every time a new design candidate is created by the evolutionary operators, its maximum thickness is checked and corrected before being evaluated. The correction is done by scaling the design parameters Zup and Zlo accordingly, as they mainly define the thickness distribution of the airfoil (a minimal sketch of this repair step is given below). In this way, only feasible solutions are eval-
uated by the simulation process. The final constraint is the trailing edge thickness,
whose range is defined by 0.25% ≤ Δ Zte ≤ 0.5%. This constraint is directly handled
in the lower and upper bounds by the corresponding Δ Zte design parameter.
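A minimal sketch of the repair step referred to above, under the simplifying assumption that the maximum thickness scales approximately linearly with Z_up and Z_lo:

def repair_thickness(z_up, z_lo, t_max, t_lo=0.130, t_hi=0.135):
    # t_max: maximum thickness of the candidate airfoil (fraction of chord)
    if t_lo <= t_max <= t_hi:
        return z_up, z_lo                        # already feasible
    target = min(max(t_max, t_lo), t_hi)         # clip to the feasible range
    scale = target / t_max                       # assumed linear scaling
    return z_up * scale, z_lo * scale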
v ← u1 + F · (u2 − u3 ) (10.8)
where F > 0 is a real constant scaling factor which controls the amplification of the difference (u2 − u3). Using this mutant vector, a new offspring P′i (also called trial vector in DE) is created by crossing over the mutant vector v and the current solution Pi, in accordance with:

$$P'_j = \begin{cases} v_j & \text{if } rand_j(0,1) \le CR \ \text{ or } \ j = j_{rand} \\ P_j & \text{otherwise} \end{cases} \qquad (10.9)$$
Algorithm 1 MODE-LD+SS
1: INPUT:
   P[1, . . . , N] = Population
   N = Population size
   F = Scaling factor
   CR = Crossover rate
   λ[1, . . . , N] = Weight vectors
   NB = Neighborhood size
   GMAX = Maximum number of generations
2: OUTPUT:
   PF = Pareto front approximation
3: Begin
4: g ← 0
5: Randomly create P_i^g, i = 1, . . . , N
6: Evaluate P_i^g, i = 1, . . . , N
7: while g < GMAX do
8:   {LND} ← {}
9:   for i = 1 to N do
10:    DetermineLocalDominance(P_i^g, NB)
11:    if P_i^g is locally nondominated then
12:      {LND} ← {LND} ∪ P_i^g
13:    end if
14:  end for
15:  for i = 1 to N do
16:    Randomly select u1, u2, and u3 from {LND}
17:    v ← CreateMutantVector(u1, u2, u3)
18:    P_i^{g+1} ← Crossover(P_i^g, v)
19:    Evaluate P_i^{g+1}
20:  end for
21:  Q ← P^g ∪ P^{g+1}
22:  Determine z* for Q
23:  for i = 1 to N do
24:    P_i^{g+1} ← MinimumTchebycheff(Q, λ_i, z*)
25:    Q ← Q \ P_i^{g+1}
26:  end for
27:  PF ← {P}^{g+1}
28:  g ← g + 1
29: end while
30: Return PF
31: End
In the above expression, the index j refers to the jth component of the decision variable vectors. CR is a positive constant and j_rand is a randomly selected integer in the range [1, . . . , D] (where D is the dimension of the solution vectors), ensuring that the offspring differs from the current solution Pi in at least one component. The above DE variant is known as rand/1/bin, and is the version adopted here.
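The following minimal sketch shows rand/1/bin variation as defined by Eqs. (10.8) and (10.9); in MODE-LD+SS the donors u1, u2, u3 would be drawn from the set {LND} of locally nondominated solutions, whereas here they are drawn from the whole population for simplicity.

import numpy as np

def rand_1_bin(population, i, F=0.5, CR=0.5, rng=np.random.default_rng(3)):
    N, D = population.shape
    # pick three mutually distinct donors, all different from individual i
    choices = [k for k in range(N) if k != i]
    u1, u2, u3 = population[rng.choice(choices, size=3, replace=False)]
    v = u1 + F * (u2 - u3)                       # mutant vector, Eq. (10.8)
    trial = population[i].copy()
    j_rand = rng.integers(D)                     # guarantees one new component
    for j in range(D):
        if rng.random() <= CR or j == j_rand:    # binomial crossover, Eq. (10.9)
            trial[j] = v[j]
    return trial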
Additionally, the proposed algorithm incorporates two mechanisms for improving both the convergence towards the Pareto front and the uniform distribution of solutions along the Pareto front:
- We say that a solution x is locally nondominated with respect to its neighborhood ℵ(x) if and only if there is no x′ ∈ ℵ(x) such that f(x′) ≺ f(x) (see the sketch below).
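A minimal sketch of this local-dominance test for a minimization problem, where neighborhood is assumed to hold the objective vectors of the NB neighbors of x:

import numpy as np

def dominates(fa, fb):
    # fa ≺ fb: fa is no worse in every objective and better in at least one
    return np.all(fa <= fb) and np.any(fa < fb)

def locally_nondominated(fx, neighborhood):
    return not any(dominates(fn, fx) for fn in neighborhood)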
In SMS-EMOA [5], (1) nondominated sorting is used as a ranking criterion, and (2) the hypervolume is applied as a selection criterion to discard the individual which contributes the least hypervolume to the worst-ranked front.
The basic algorithm is described in Algorithm 2. Starting with an initial pop-
ulation of μ individuals, a new individual is generated by means of randomised
variation operators. We adopted simulated binary crossover (SBX) and polynomial-
based mutation as described in [11]. The new individual will become a member of
the next population, if replacing another individual leads to a higher quality of the
population with respect to the hypervolume.
Algorithm 2 SMS-EMOA
1: P_0 ← init() /* initialize random population of μ individuals */
2: t ← 0
3: repeat
4:   q_{t+1} ← generate(P_t) /* generate one offspring by variation */
5:   P_{t+1} ← reduce(P_t ∪ {q_{t+1}}) /* select μ best individuals */
6:   t ← t + 1
7: until termination condition is fulfilled
The procedure Reduce used in Algorithm 2 selects the μ individuals of the sub-
sequent population; the definition of this procedure is given in Algorithm 3. The
algorithm fast-nondominated-sort used in NSGA-II [12] is applied to partition the
population into v sets R1 , . . . , Rv . The subsets are called fronts and are provided
with an index representing a hierarchical order (the level of domination) whereas
the solutions within each front are mutually nondominated. The first subset con-
tains all nondominated solutions of the original set Q. The second front consists of
individuals that are nondominated in the set (Q\R1), i.e., each member of R2 is dominated by at least one member of R1. More generally, the ith front consists of
individuals that are nondominated if the individuals of the fronts j with j < i were
removed from Q.
Algorithm 3 Reduce(Q)
1: {R1, . . . , Rv} ← fast-nondominated-sort(Q) /* all v fronts of Q */
2: r ← argmin_{s∈Rv} [ΔS(s, Rv)] /* s ∈ Rv with the lowest ΔS(s, Rv) */
3: return (Q \ {r})
Due to the high computational effort of the hypervolume calculation, a steady state
selection scheme is used. Since only one individual is created, only one has to be
deleted from the population at each generation. Thus, the selection operator has to
compute at most μ + 1 values of the S-metric (exactly μ + 1 values in case all solutions are nondominated). These are the values of the subsets of the worst-ranked front in which one point of the front is left out, respectively. A (μ + λ) selection scheme would require the calculation of $\binom{\mu+\lambda}{\lambda}$ possible S-metric values to identify an optimally composed population, maximising the net S-metric value.
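For two objectives the exclusive hypervolume contributions, and hence the Reduce step, can be computed exactly, as in the following minimal sketch (minimization; the data are illustrative only):

import numpy as np

def hv_2d(points, ref):
    # exact hypervolume of a mutually nondominated 2-D set w.r.t. ref
    pts = sorted(points)                        # ascending in f1
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

def reduce_front(front, ref):
    total = hv_2d(front, ref)
    # exclusive contribution of s = HV(front) - HV(front \ {s})
    contrib = [total - hv_2d(front[:k] + front[k + 1:], ref)
               for k in range(len(front))]
    worst = int(np.argmin(contrib))
    return front[:worst] + front[worst + 1:]

front = [(0.1, 0.9), (0.4, 0.5), (0.8, 0.2)]
print(reduce_front(front, ref=(1.0, 1.0)))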
The parameters used for solving the present case study, and for each algorithm
were set as follows: N = 120 (population size) for both MOEAs, F = 0.5 (mutation
scaling factor for MODE-LD+SS), CR = 0.5 (crossover rate for MODE-LD+SS),
NB = 5 (neighborhood size for MODE-LD+SS), ηm = 20 (mutation index for SBX
in SMS-EMOA), and ηc = 15 (crossover index for SBX in SMS-EMOA).
10.4.5 Results
Both MODE-LD+SS and SMS-EMOA were run for 100 generations. The simulation process in each case took approximately 8 hours of CPU time. Five independent runs were executed in order to extract some statistics. Figs. 10.2 and 10.3 show the Pareto
front approximations (of the median run) at different evolution times. For compar-
ison purposes, in these figures the corresponding objective functions of a reference
airfoil (a720o [48]) are plotted. At t = 10 generations (the corresponding figure is
not shown due to space constraints), the number of nondominated solutions is 26
for SMS-EMOA and 27 for MODE-LD+SS. With this small number of nondominated solutions it is difficult to identify the trade-off surface for this problem. However, as the number of evolution steps increases, the trade-off surface is more clearly
revealed. At t = 50 generations (see Fig. 10.2), the number of nondominated solu-
tions is 120 for SMS-EMOA, and 91 for MODE-LD+SS. At this point, the trade-off
surface shows a steeper variation of objective (iii) toward the compromise region of
the Pareto front. Also, the trade-off shows a plateau where the third objective has a
small variation with respect to the other objectives. Finally, at t = 100 generations (see Fig. 10.3), the shape of the trade-off surface is more clearly defined, and a clear trade-off between the three objectives is evident. It is important to note in Fig. 10.3 that the trade-off surface shows some void regions. This condition is captured by both MOEAs and is attributed to the constraints defined in the airfoil geome-
try. Table 10.2 summarizes the maximum possible improvement with respect to the
reference solution, that can be attained for each objective and by each MOEA.
In the context of MOEAs, it is common to compare results on the basis of some
performance measures. Next, and for comparison purposes between the algorithms
Table 10.2 Maximum improvement per objective for the median run of each MOEA used

            SMS-EMOA                          MODE-LD+SS
Gen   ΔObj1(%)  ΔObj2(%)  ΔObj3(%)    ΔObj1(%)  ΔObj2(%)  ΔObj3(%)
10    11.43     10.19     5.43        11.93     10.38     5.47
50    12.84     10.67     6.06        13.22     10.67     6.21
100   12.75     10.79     6.28        13.63     10.80     6.40
used, we present the hypervolume values attained by each MOEA, as well as the values of the two set coverage performance measure C(A,B) between them. Next, we present the definitions of these two performance measures:
Hypervolume (HV): this performance measure computes the volume, in objective space, covered by the set of nondominated vectors found, with respect to a reference point; that is, $HV = \mathrm{volume}\bigl(\bigcup_{i} vol_i\bigr)$, where $vec_i$ is a nondominated vector from the Pareto set approximation, and $vol_i$ is the volume of the hypercube formed by the reference point and the nondominated vector $vec_i$. Here, the reference point $z_{ref}$ in objective space for the 3-objective MOP was set to (0.007610, 0.005895, 0.005236), which corresponds to the objective values of the reference airfoil. High values of this measure indicate that the solutions are closer to the true Pareto front and that they cover a wider extension of it.
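The following minimal sketch estimates this measure by Monte Carlo sampling for a three-objective minimization problem; it is an approximation for illustration, not the exact procedure used to produce the reported values, and the front shown is synthetic.

import numpy as np

def hv_monte_carlo(front, z_ref, n=200_000, rng=np.random.default_rng(4)):
    front = np.asarray(front)
    lower = front.min(axis=0)                   # bounding box of the region
    samples = rng.uniform(lower, z_ref, size=(n, len(z_ref)))
    # a sample is dominated if some front member is <= it in all objectives
    dominated = np.zeros(n, dtype=bool)
    for p in front:
        dominated |= np.all(samples >= p, axis=1)
    box = np.prod(np.asarray(z_ref) - lower)
    return box * dominated.mean()

z_ref = (0.007610, 0.005895, 0.005236)
front = [(0.0068, 0.0057, 0.0051), (0.0072, 0.0054, 0.0052)]
print(hv_monte_carlo(front, z_ref))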
Two Set Coverage (C-Metric): this performance measure estimates the coverage proportion, in terms of the percentage of dominated solutions, between two sets. Given two sets A and B, both containing only nondominated solutions, the C-metric is mathematically defined as:

$$C(A,B) = \frac{\left|\{\, b \in B \mid \exists\, a \in A : a \preceq b \,\}\right|}{|B|}$$

so that C(A,B) = 1 when every member of B is weakly dominated by some member of A, and C(A,B) = 0 when none is.
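A minimal sketch of this measure for minimization problems; the sets A and B below are synthetic:

import numpy as np

def dominates_or_equal(a, b):
    return np.all(np.asarray(a) <= np.asarray(b))

def c_metric(A, B):
    covered = sum(any(dominates_or_equal(a, b) for a in A) for b in B)
    return covered / len(B)

A = [(0.1, 0.8), (0.5, 0.3)]
B = [(0.2, 0.9), (0.4, 0.4), (0.6, 0.1)]
print(c_metric(A, B))   # fraction of B weakly dominated by some member of A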
[Fig. 10.2 Pareto front approximations at t = 50 generations in objective space (f1, f2, f3) for SMS-EMOA and MODE-LD+SS, together with the reference airfoil a720o]

[Fig. 10.3 Pareto front approximations at t = 100 generations in objective space (f1, f2, f3) for SMS-EMOA and MODE-LD+SS, together with the reference airfoil a720o]

[Fig. 10.4 Geometries of the reference airfoil a720o and the selected trade-off airfoils obtained by SMS-EMOA and MODE-LD+SS]
of the five independent runs executed by each algorithm. Our results indicate that MODE-LD+SS converges closer to the true Pareto front, and provides more nondominated solutions, than SMS-EMOA. Finally, Figure 10.4 presents the geometries of the reference airfoil, a720o, and two selected airfoils from the trade-off surface of this problem, obtained by SMS-EMOA and MODE-LD+SS at t = 100 generations. These two airfoils were selected as those with the closest distance to the origin of the objective space, since they are considered to represent the best trade-off solutions.
and low-order physics-based models without major changes. They can also be easily
parallelized, since MOEAs normally have low data dependency.
From an algorithmic point of view, it is clear that the use of Pareto-based MOEAs remains a popular choice in the previous group of applications. It is also evident
that, when dealing with expensive objective functions such as those of the above ap-
plications, the use of careful statistical analysis of parameters is unaffordable. Thus,
the parameters of such MOEAs were simple guesses or taken from values suggested
by other researchers. The use of surrogate models also appears in these costly ap-
plications. However, the use of other simpler techniques such as fitness inheritance
or fitness approximation [39] seems to be uncommon in this domain and could be
a good alternative when dealing with high-dimensional problems. Additionally, the
authors of this group of applications have relied on very simple constraint-handling
techniques, most of which discard infeasible individuals. Alternative approaches ex-
ist, which can exploit information from infeasible solutions and can make a more
sophisticated exploration of the search space when dealing with constrained prob-
lems (see for example [29]) and this has not been properly studied yet. Finally, it is
worth emphasizing that, in spite of the difficulty of these problems and of the evident limitations of MOEAs in dealing with them, most authors report finding improved designs when using MOEAs, even though in all cases only a fairly small number of fitness function evaluations was allowed. This clearly illustrates the high potential of
MOEAs in this domain.
Acknowledgements. The first author acknowledges support from both CONACyT and IPN
to pursue graduate studies in computer science at CINVESTAV-IPN. The second author ac-
knowledges support from CONACyT project no. 103570. The third author acknowledges
support from CONACyT project no. 79809.
References
1. Anderson, M.B.: Genetic Algorithms in Aerospace Design: Substantial Progress, Tremendous Potential. Technical report, Sverdrup Technology Inc./TEAS Group, 260 Eglin Air Force Base, FL 32542, USA (2002)
2. Arabnia, M., Ghaly, W.: A Strategy for Multi-Objective Shape optimization of Turbine
Stages in Three-Dimensional Flow. In: 12th AIAA/ISSMO Multidisciplinary Analysis
and Optimization Conference, Victoria, British Columbia Canada, September 10 –12
(2008)
3. Arias-Montaño, A., Coello, C.A.C., Mezura-Montes, E.: MODE-LD+SS: A Novel Differential Evolution Algorithm Incorporating Local Dominance and Scalar Selection Mechanisms for Multi-Objective Optimization. In: 2010 IEEE Congress on Evolutionary Computation (CEC 2010), Barcelona, Spain. IEEE Press, Los Alamitos (2010)
4. Benini, E.: Three-Dimensional Multi-Objective Design Optimization of a Transonic Compressor Rotor. Journal of Propulsion and Power 20(3), 559–565 (2004)
5. Beume, N., Naujoks, B., Emmerich, M.: SMS-EMOA: Multiobjective Selection Based on Dominated Hypervolume. European Journal of Operational Research 181, 1653–1659 (2007)
6. Chiba, K., Jeong, S., Obayashi, S., Yamamoto, K.: Knowledge Discovery in Aero-
dynamic Design Space for Flyback–Booster Wing Using Data Mining. In: 14th
AIAA/AHI Space Planes and Hypersonic System and Technologies Conference, Can-
berra, Australia, November 6–9 (2006)
7. Chiba, K., Obayashi, S., Nakahashi, K.: Design Exploration of Aerodynamic Wing
Shape for Reusable Launch Vehicle Flyback Booster. Journal of Aircraft 43(3),
832–836 (2006)
8. Chiba, K., Oyama, A., Obayashi, S., Nakahashi, K., Morino, H.: Multidisciplinary De-
sign Optimization and Data Mining for Transonic Regional-Jet Wing. AIAA Journal of
Aircraft 44(4), 1100–1112 (2007), doi:10.2514/1.17549
9. Chung, H.-S., Choi, S., Alonso, J.J.: Supersonic Business Jet Design using a
Knowledge-Based Genetic Algorithm with an Adaptive, Unstructured Grid Method-
ology. In: AIAA Paper 2003-3791, 21st Applied Aerodynamics Conference, Orlando,
Florida, USA, June 23-26 (2003)
10. Coello, C.A.C.: Theoretical and Numerical Constraint Handling Techniques used with
Evolutionary Algorithms: A Survey of the State of the Art. Computer Methods in Ap-
plied Mechanics and Engineering 191(11-12), 1245–1287 (2002)
11. Deb, K.: Multi-Objective Optimization using Evolutionary Algorithms. John Wiley &
Sons, Chichester (2001) ISBN 0-471-87339-X
12. Deb, K., Pratap, A., Agarwal, S., Meyarivan, T.: A Fast and Elitist Multiobjective
Genetic Algorithm: NSGA–II. IEEE Transactions on Evolutionary Computation 6(2),
182–197 (2002)
13. Drela, M.: XFOIL: An Analysis and Design System for Low Reynolds Number Aerody-
namics. In: Conference on Low Reynolds Number Aerodynamics. University of Notre
Dame, IN (1989)
14. Fonseca, C.M., Fleming, P.J.: Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. In: Forrest, S. (ed.) Proceedings of the Fifth International Conference on Genetic Algorithms, pp. 416–423. University of Illinois at Urbana-Champaign, Morgan Kaufmann Publishers, San Mateo, California (1993)
15. Gonzalez, L.F.: Robust Evolutionary Methods for Multi-objective and Multidisciplinary Design Optimization in Aeronautics. PhD thesis, School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Australia (2005)
16. Hua, J., Kong, F., Liu, P.y., Zingg, D.: Optimization of Long-Endurance Airfoils. In:
AIAA-2003-3500, 21st AIAA Applied Aerodynamics Conference, Orlando, FL, June
23-26 (2003)
17. Jameson, A., Caughey, D.A., Newman, P.A., Davis, R.M.: NYU Transonic Swept-Wing
Computer Program - FLO22. Technical report, Langley Research Center (1975)
18. Jeong, S., Chiba, K., Obayashi, S.: Data Mining for Aerodynamic Design Space. In: AIAA Paper 2005–5079, 23rd AIAA Applied Aerodynamics Conference, Toronto, Ontario, Canada, June 6–9 (2005)
19. Jin, Y.: A comprehensive survey of fitness approximation in evolutionary computation.
Soft Computing 9(1), 3–12 (2005)
20. Kroo, I.: Multidisciplinary Optimization Applications in Preliminary Design – Status and Directions. In: 38th AIAA/ASME/AHS Adaptive Structures Forum, Kissimmee, FL, April 7-10 (1997)
21. Kroo, I.: Innovations in Aeronautics. In: 42nd AIAA Aerospace Sciences Meeting,
Reno, NV, January 5-8 (2004)
22. Kuhn, T., Rösler, C., Baier, H.: Multidisciplinary Design Methods for the Hybrid
Universal Ground Observing Airship (HUGO). In: AIAA Paper 2007–7781, Belfast,
Northern Ireland, September 18-20 (2007)
23. Lee, D.S., Gonzalez, L.F., Srinivas, K., Auld, D.J., Wong, K.C.: Aerodynamics/RCS Shape Optimisation of Unmanned Aerial Vehicles using Hierarchical Asynchronous Parallel Evolutionary Algorithms. In: AIAA Paper 2006-3331, 24th AIAA Applied Aerodynamics Conference, San Francisco, California, USA, June 5-8 (2006)
24. Lee, D.S., Gonzalez, L.F., Periaux, J., Srinivas, K.: Robust Design Optimisation Using
Multi-Objective Evolutionary Algorithms. Computer & Fluids 37, 565–583 (2008)
25. Leifsson, L., Koziel, S.: Multi-fidelity design optimization of transonic airfoils using
physics-based surrogate modeling and shape-preserving response prediction. Journal
of Computational Science, 98–106 (2010)
26. Lian, Y., Liou, M.-S.: Multiobjective Optimization Using Coupled Response Surface Model and Evolutionary Algorithm. In: AIAA Paper 2004–4323, 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference, Albany, New York, USA, August 30 – September 1 (2004)
27. Lian, Y., Liou, M.-S.: Multi-Objective Optimization of Transonic Compressor Blade
Using Evolutionary Algorithm. Journal of Propulsion and Power 21(6), 979–987 (2005)
28. Liao, W., Tsai, H.M.: Aerodynamic Design optimization by the Adjoint Equation
Method on Overset Grids. In: AIAA Paper 2006-54, 44th AIAA Aerospace Science
Meeting and Exhibit, Reno, Nevada, USA, January 9-12 (2006)
29. Mezura-Montes, E. (ed.): Constraint-Handling in Evolutionary Optimization. SCI,
vol. 198. Springer, Heidelberg (2009)
30. Mialon, B., Fol, T., Bonnaud, C.: Aerodynamic Optimization Of Subsonic Flying Wing
Configurations. In: AIAA-2002-2931, 20th AIAA Applied Aerodynamics Conference,
St. Louis Missouri, June 24-26 (2002)
31. Obayashi, S., Tsukahara, T.: Comparison of Optimization Algorithms for Aerodynamic Shape Design. In: AIAA-96-2394-CP, AIAA 14th Applied Aerodynamics Conference, New Orleans, LA, USA, June 17-20 (1996)
32. Ong, Y.-S., Nair, P.B., Lum, K.Y.: Max-min surrogate-assisted evolutionary algorithm
for robust design. IEEE Trans. Evolutionary Computation 10(4), 392–404 (2006)
33. Oyama, A.: Wing Design Using Evolutionary Algorithms. PhD thesis, Department of
Aeronautics and Space Engineering. Tohoku University, Sendai, Japan (March 2000)
34. Oyama, A., Nonomura, T., Fujii, K.: Data Mining of Pareto-Optimal Transonic Air-
foil Shapes Using Proper Orthogonal Decomposition. AIAA Journal Of Aircraft 47(5),
1756–1762 (2010)
35. Oyama, A., Okabe, Y., Shimoyama, K., Fujii, K.: Aerodynamic Multiobjective Design
Exploration of a Flapping Airfoil Using a Navier-Stokes Solver. Journal Of Aerospace
Computing, Information, and Communication 6(3), 256–270 (2009)
36. Price, K.V., Storn, R., Lampinen, J.A.: Differential Evolution. A Practical Approach to
Global Optimization. Springer, Berlin (2005)
37. Rai, M.M.: Robust Optimal Design With Differential Evolution. In: AIAA Paper 2004-
4588, 10th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference,
Albany, New York, USA, August 30 - September 1 (2004)
38. Ray, T., Tsai, H.M.: A Parallel Hybrid Optimization Algorithm for Robust Airfoil De-
sign. In: AIAA Paper 2004–905, 42nd AIAA Aerospace Science Meeting and Exhibit,
Reno, Nevada, USA, January 5 -8 (2004)
39. Sierra, M.R., Coello, C.A.C.: A Study of Fitness Inheritance and Approximation Techniques for Multi-Objective Particle Swarm Optimization. In: 2005 IEEE Congress on Evolutionary Computation (CEC 2005), vol. 1, pp. 65–72. IEEE Service Center, Edinburgh (2005)
40. Sasaki, D., Obayashi, S.: Efficient search for trade-offs by adaptive range multiobjec-
tive genetic algorithm. Journal Of Aerospace Computing, Information, and Communi-
cation 2(1), 44–64 (2005)
41. Sasaki, D., Obayashi, S., Nakahashi, K.: Navier-Stokes Optimization of Supersonic
Wings with Four Objectives Using Evolutionary Algorithms. Journal Of Aircraft 39(4),
621–629 (2002)
42. Secanell, M., Suleman, A.: Numerical Evaluation of Optimization Algorithms for Low-
Reynolds Number Aerodynamic Shape Optimization. AIAA Journal 10, 2262–2267
(2005)
43. Shimoyama, K., Oyama, A., Fujii, K.: A New Efficient and Useful Robust Optimization Approach – Design for Multi-objective Six Sigma. In: 2005 IEEE Congress on Evolutionary Computation (CEC 2005), vol. 1, pp. 950–957. IEEE Service Center, Edinburgh (2005)
44. Shimoyama, K., Oyama, A., Fujii, K.: Development of Multi-Objective Six-Sigma Ap-
proach for Robust Design Optimization. Journal of Aerospace Computing, Information,
and Communication 5(8), 215–233 (2008)
45. Sobieczky, H.: Parametric Airfoils and Wings. In: Fuji, K., Dulikravich, G.S. (eds.)
Notes on Numerical Fluid Mechanics, vol. 68, pp. 71–88. Vieweg Verlag, Wiesbaden
(1998)
46. Song, W., Keane, A.J.: Surrogate-based aerodynamic shape optimization of a civil aircraft engine nacelle. AIAA Journal 45(10), 2565–2574 (2007), doi:10.2514/1.30015
47. Srinivas, N., Deb, K.: Multiobjective Optimization Using Nondominated Sorting in Ge-
netic Algorithms. Evolutionary Computation 2(3), 221–248 (1994)
48. Szöllös, A., Smı́d, M., Hájek, J.: Aerodynamic optimization via multiobjective micro-
genetic algorithm with range adaptation, knowledge-based reinitialization, crowding
and epsilon-dominance. Advances in Engineering Software 40(6), 419–430 (2009)
49. Tani, N., Oyama, A., Okita, K., Yamanishi, N.: Feasibility study of multi ob-
jective shape optimization for rocket engine turbopump blade design. In: 44th
AIAA/ASME/SAE/ASEE Joint Propulsion Conference & Exhibit, Hartford, CT, July
21 - 23 (2008)
50. Tsutsui, S., Ghosh, A.: Genetic algorithms with a robust solution searching scheme.
IEEE Trans. Evolutionary Computation 1(3), 201–208 (1997)
51. Yamaguchi, Y., Arima, T.: Multi-Objective Optimization for the Transonic Compressor Stator Blade. In: AIAA Paper 2000–4909, 8th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, Long Beach, CA, USA, September 6-8 (2000)
52. Zhang, Q., Li, H.: MOEA/D: A Multiobjective Evolutionary Algorithm Based on De-
composition. IEEE Transactions on Evolutionary Computation 11(6), 712–731 (2007)
53. Zitzler, E., Thiele, L.: Multiobjective Evolutionary Algorithms: A Comparative Case
Study and the Strength Pareto Approach. IEEE Transactions on Evolutionary Compu-
tation 3(4), 257–271 (1999)
Chapter 11
An Enhanced Support Vector Machines Model
for Classification and Rule Generation
Ping-Feng Pai
Department of Information Management, National Chi Nan University, Taiwan, ROC
e-mail: [email protected].
Ming-Fu Hsu
Department of International Business Studies, National Chi Nan University, Taiwan, ROC
e-mail: [email protected]
(ROA), quick ratio, and return on investment (ROI). This is a classification task, and data mining techniques are suitable for it. The goal of data mining is to build a suitable model for a labeling process that approximates the original process as closely as possible. Thus, investors can adopt the well-developed model to learn the status of a firm.
Support vector machines (SVM) were proposed by Vapnik [42, 43] originally
for typical binary classification problems. The SVM implements the structural risk
minimization (SRM) principle rather than the empirical risk minimization (ERM)
principle employed by most traditional neural network models. The most impor-
tant concept of SRM is the minimization of an upper bound to the generalization
error instead of minimizing the training error. In addition, training the SVM is equivalent to solving a linearly constrained quadratic programming (QP) problem, so that the solution of the SVM is always unique and globally optimal [6, 12, 14, 41, 42, 43].
Given a training set of instance-label pairs $(x_i, y_i)$, $i = 1,\ldots,m$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{\pm 1\}$, SVM determines an optimal separating hyperplane with the maximum margin by solving the following optimization problem:

$$\min_{w,g} \ \frac{1}{2} w^T w \qquad (11.1)$$
$$\text{s.t.} \ \ y_i(w \cdot x_i + g) - 1 \ge 0$$
where w denotes the weight vector, and g denotes the bias term.
The Lagrange function's saddle point is the solution to the quadratic optimization problem:

$$L_h(w, g, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{m} \alpha_i \left( y_i (w \cdot x_i + g) - 1 \right) \qquad (11.2)$$

and the resulting decision function takes the form:

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{m} y_i \alpha_i^* \langle x_i, x \rangle + g^* \right) \qquad (11.4)$$
In a binary classification task, only a small subset of the Lagrange multipliers αi usually tends to be greater than zero. The respective training vectors are the closest to the optimal hyperplane and are called support vectors, as the optimal decision hyperplane f(x, α*, g*) depends on them exclusively. Figure 11.1 illustrates the basic structure of SVM.
Very few data sets in the real world are linearly separable. What makes SVM
so remarkable is that the basic linear framework is easily extended to the case
where the data set is not linearly separable. The fundamental concept behind this
extension is to transform the input space where the data set is not linearly
separable into a higher-dimensional space, where the data are linearly separable.
Figure 11.2 illustrates the mapping concept of SVM.
[Fig. 11.1 The basic structure of SVM, showing the margin and the support vectors]
Fig. 11.2 Mapping a non-linear data set into a feature space [6]
For data sets that are not perfectly separable, non-negative slack variables $\xi_i$ are introduced, and the optimization problem becomes:

$$\min_{w,g,\xi} \ \frac{1}{2} w^T w + C \sum_{i=1}^{m} \xi_i \qquad (11.5)$$
$$\text{s.t.} \ \ y_i(w \cdot x_i + g) + \xi_i - 1 \ge 0, \quad \xi_i \ge 0$$
where C is a penalty parameter on the training error, and ξi is the non-negative slack variable. The constant C is used to determine the trade-off between margin size and error. Observe that C is positive and cannot be zero; that is, we cannot simply ignore the slack variables by setting C = 0. With a large value for C, the optimization will try to discover a solution with a small number of non-zero slack variables, because errors are costly [14]. Above all, it can be concluded that a large C implies a small margin, and a small C implies a large margin.
The Lagrangian method can be used to solve the optimization model, which is
almost equivalent to the method for dealing with the optimization problem in the
separable case. One has to maximize the dual variables Lagrangian:
$$\max_{\alpha} \ L_E(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \qquad (11.6)$$
$$\text{s.t.} \ \ 0 \le \alpha_i \le C, \ i = 1,\ldots,m \quad \text{and} \quad \sum_{i=1}^{m} \alpha_i y_i = 0$$
The penalty parameter C is an upper bound on αi, and is determined by the user.
The mapping function Φ is used to map the training samples from the input space into a higher-dimensional feature space. In Eq. (11.6), the inner products are substituted by the kernel function $\langle \Phi(x_i), \Phi(x_j) \rangle = K(x_i, x_j)$, and the nonlinear SVM dual Lagrangian LE(α), shown in Eq. (11.7), is similar to that of the linear generalized case:
$$L_E(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j) \qquad (11.7)$$
$$\text{s.t.} \ \ 0 \le \alpha_i \le C, \ i = 1,\ldots,m \quad \text{and} \quad \sum_{i=1}^{m} \alpha_i y_i = 0$$
Hence, following the steps illustrated in the linear generalized case, we derive the decision function of the following form:

$$f(x) = \operatorname{sign}\left( \sum_{i=1}^{m} y_i \alpha_i^* \langle \Phi(x), \Phi(x_i) \rangle + g^* \right) = \operatorname{sign}\left( \sum_{i=1}^{m} y_i \alpha_i^* K(x, x_i) + g^* \right) \qquad (11.8)$$
The function K is defined as the kernel function for generating the inner products needed to construct machines with different types of nonlinear decision hyperplanes in the input space. Commonly used kernel functions include the linear kernel $K(x_i, x_j) = x_i \cdot x_j$, the polynomial kernel $K(x_i, x_j) = (x_i \cdot x_j + 1)^d$, the radial basis function (RBF) kernel $K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$, and the sigmoid kernel $K(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j + \theta)$. The determination of the kernel function type depends on the problem's complexity [12].
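As a usage illustration, the following minimal sketch trains a soft-margin SVM with an RBF kernel in scikit-learn on synthetic, non-linearly-separable data; C and gamma play the roles of the penalty and kernel parameters discussed above.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # not linearly separable

clf = SVC(C=1.0, kernel="rbf", gamma="scale")  # decision function of Eq. (11.8)
clf.fit(X, y)
print("support vectors:", clf.support_vectors_.shape[0])
print("training accuracy:", clf.score(X, y))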
[Figure: flowchart of the genetic algorithm procedure — start, parent selection, crossover, mutation, generation increment, end]
• Selection: select the antibodies in the memory cells. Antibodies with higher Antigen values are treated as candidates to enter the memory cell. However, antibody candidates with Antibodiesij values exceeding the threshold are not qualified to enter the memory cell.
• Crossover and mutation: the antibody population undergoes crossover and mutation, which are used to generate new antibodies. When conducting the crossover operation, strings representing antibodies are paired randomly, and segments of paired strings between two predetermined break-points are swapped.
• Perform tabu search [11] on each antibody: evaluate neighbor antibodies and adjust the tabu list. The antibody with the better classification error that is not recorded on the tabu list is placed on the tabu list. If the best neighbor antibody is the same as one of the antibodies on the tabu list, then the next set of neighbor antibodies is generated and the classification error of the antibody calculated. The next set of neighbor antibodies is generated from the best neighbor antibodies in the current iteration (a minimal sketch of this step is given after this list).
• Current antibody selection by tabu search: if the best neighbor antibody is better than the current antibody, then the current antibody is replaced by the best neighbor antibody. Otherwise, the current antibody is retained.
• Next generation: form a population for the next generation.
• Stop criterion: if the number of epochs is equal to a given scale, then the best antibodies are presented as a solution; otherwise go to Step (b) [32, 33].
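A minimal sketch of the tabu-search step described above; classification_error is a hypothetical fitness callback, and the encoding of antibodies is left abstract.

def tabu_search_step(current, neighbors, tabu_list, classification_error):
    # evaluate neighbor antibodies, skipping those already on the tabu list
    admissible = [n for n in neighbors if n not in tabu_list]
    if not admissible:
        return current, tabu_list
    best_neighbor = min(admissible, key=classification_error)
    # record the best neighbor on the tabu list so it is not revisited
    tabu_list.append(best_neighbor)
    # replace the current antibody only if the best neighbor improves it
    if classification_error(best_neighbor) < classification_error(current):
        return best_neighbor, tabu_list
    return current, tabu_list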
To search for an optimal solution, each particle changes its velocity according to cognition and sociality, and then moves to a new potential solution. The use of the PSO algorithm to select the SVM parameters is described as follows. First, initialize a random population of particles and velocities. Second, define the fitness of each particle; the fitness function of PSO is represented by the classification accuracy of the SVM models. Each particle's velocity is updated by Eq. (11.11), and each particle then moves to the next position according to Eq. (11.12).
$$S_{ig}^{t} = S_{ig}^{t-1} + c_1 j_1 \left( B_{ig}^{t} - Y_{ig}^{t} \right) + c_2 j_2 \left( B_{mg}^{t} - Y_{mg}^{t} \right), \quad g = 1,\ldots,G \qquad (11.11)$$

where c1 is the cognitive learning factor, c2 is the social learning factor, j1 and j2 are random numbers uniformly distributed in U(0,1), $B_{ig}^{t}$ is the best previous solution of particle i, and $B_{mg}^{t}$ is the best solution found by the population at iteration t.

$$Y_{ig}^{t+1} = Y_{ig}^{t} + S_{ig}^{t}, \quad g = 1,\ldots,G \qquad (11.12)$$
Finally, if the termination criterion is reached, the algorithm stops; otherwise it returns to the fitness measurement step [34]. The architecture of PSO is illustrated in Fig. 11.9.
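The following minimal sketch applies the updates of Eqs. (11.11) and (11.12) to particles encoding the two SVM parameters (C, gamma); the accuracy function stands in for the SVM cross-validation accuracy and is purely illustrative, as are the parameter ranges.

import numpy as np

rng = np.random.default_rng(5)
G = 2                                    # dimensions: (C, gamma)
n_particles = 10
low, high = [0.1, 1e-4], [100.0, 1.0]    # assumed search ranges
Y = rng.uniform(low, high, size=(n_particles, G))   # positions
S = np.zeros_like(Y)                     # velocities
B = Y.copy()                             # personal bests B_ig
B_m = Y[0].copy()                        # global best B_mg
c1, c2 = 2.0, 2.0                        # cognitive and social learning factors

def accuracy(params):                    # placeholder for SVM CV accuracy
    C, gamma = params
    return -((np.log10(C) - 1) ** 2 + (np.log10(gamma) + 2) ** 2)

for t in range(50):
    for i in range(n_particles):
        j1, j2 = rng.random(G), rng.random(G)       # U(0,1) random numbers
        S[i] = S[i] + c1 * j1 * (B[i] - Y[i]) + c2 * j2 * (B_m - Y[i])  # (11.11)
        Y[i] = np.clip(Y[i] + S[i], low, high)      # (11.12), kept in range
        if accuracy(Y[i]) > accuracy(B[i]):
            B[i] = Y[i].copy()
        if accuracy(Y[i]) > accuracy(B_m):
            B_m = Y[i].copy()
print("best (C, gamma):", B_m)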
provide specific reasons why the application was rejected; and indefinite and vague reasons for denial are illegal [23]. Comprehensibility can be added to SVM by extracting symbolic rules from the trained model. Rule extraction techniques can be used to open up the black box of SVM and generate comprehensible decision rules with approximately the same predictive power as the model itself.
There are two ways to open up the black box of SVM, as shown in Fig. 11.10.
Fig. 11.10 Experimental (A) and de-compositional (B) rule extraction techniques [23]
The SVM with the best cross-validation (CV) result is then fed into a rule-based classifier (e.g., a decision tree, rough sets, and so on) to derive comprehensible decision rules for humans to understand (the experimental rule extraction technique). The concept behind this procedure is the assumption that the trained model represents the data more appropriately than the original dataset does; that is to say, the data of the best CV result are cleaner and free of crucial conflicts. CV is a re-sampling technique which adopts multiple random training and test subsamples to overcome the overfitting problem. Overfitting would lead to the SVM losing its applicability, as shown in Fig. 11.11. The CV analysis yields useful insights into the reliability of the SVM model with respect to sampling variation.
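A minimal sketch of such a CV analysis with scikit-learn's k-fold utilities on synthetic data:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(7)
X = rng.normal(size=(150, 4))
y = (X.sum(axis=1) > 0).astype(int)

# repeated random train/test subsamples guard against overfitting; the
# spread of the scores indicates sensitivity to sampling variation
scores = cross_val_score(SVC(C=1.0, kernel="rbf"), X, y, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))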
Fig. 11.11 Classification errors vs. model complexity of SVM models [12]
Decompositional rule extraction was proposed by Nunez et al. [25, 26]; it defines rule regions based on prototype vectors and support vectors [23]. Prototype vectors are the representatives of the obtained clusters, and the clustering task is carried out by vector quantization. There are two kinds of rules which can be extracted in this way.
[Figure: illustration of rule extraction with prototype vectors in a two-attribute (Age, Income) input space]
In this section, the scheme of the proposed ESVM model is illustrated. Figure 11.13 shows the flowchart of the ESVM model, including functions of data preprocessing, parameter determination, and rule generation. First, the raw data are processed by data-preprocessing techniques comprising data cleaning, data transformation, feature selection, and dimension reduction. Second, the preprocessed data are divided into two sets: a training data set and a testing data set. The training data set is used to select the data set used for rule generation. To prevent overfitting, a cross-validation (CV) procedure is performed at this stage. The testing data set is employed to examine the classification performance of a well-trained SVM model. Subsequently, metaheuristics are used to determine the SVM parameters. The training errors of the SVM models are formulated as fitness functions of the metaheuristics; thus, each succeeding iteration produces a smaller classification error. The parameter search procedure is performed until the stop criterion of the metaheuristic is reached. The two parameters resulting in the smallest training error are then employed to undertake the testing procedure, and the testing accuracy is thereby obtained. Finally, the CV training data set with the smallest testing error is utilized to derive decision rules by rule extraction mechanisms. Accordingly, the proposed ESVM model can provide decision rules as well as classification accuracy for decision makers.
[Fig. 11.13 Flowchart of the ESVM model: original data → data preprocessing (data cleaning, data transformation) → parameter determination by metaheuristics, looping until the metaheuristic stop conditions are met → finalized SVM models → testing accuracy and rule extraction from the CV training data set with the most accurate testing result → rule sets for decision makers]
A numerical example borrowed from Pai et al. [34] was used here to illustrate the
classification and rule generation of SVM models. The original data used in this
example contain 75 listed firms in Taiwan’s stock market. These firms were di-
vided into 25 fraudulent financial statement (FFS) firms and 50 non-fraudulent
financial statement (non-FFS) firms. Published indication or proof of involvement
in issuing FFS was found for the 25 FFS firms. The classification of a financial
Method   Features
SFS      A1: Net income to Fixed asset; A2: Net profit to Total asset;
         A3: Earnings before Interest and Tax; A4: Inventory to Sales;
         A5: Total debt to Total Asset; A6: Pledged shares of Directors
11.7 Conclusion
In this chapter, the three essential issues influencing the performance of SVM models were pointed out: data preprocessing, parameter determination, and rule extraction. Some investigations have been conducted into each issue separately; however, this chapter is the first study proposing an enhanced SVM model which deals with all three issues at the same time. Thanks to the data preprocessing procedure, the computation cost decreases and the classification accuracy increases. Furthermore, the ESVM model provides rules for decision makers. Rather than interpreting complicated mathematical functions, decision makers can intuitively grasp the relation, and its strength, between condition attributes and the outcome from a set of rules. These rules can be reasoned in both forward and backward directions. For the example in Section 11.6, forward reasoning can provide a good direction for managers to improve the current financial status, while backward reasoning can protect the wealth of investors and sustain the stability of the financial market.
Acknowledgments. The authors would like to thank the National Science Council
of the Republic of China, Taiwan for financially supporting this research under
Contract No. 96-2628-E-260-001-MY3 & 99-2221-E-260-006.
References
1. Agarwal, S., Agrawal, R., Deshpande, P.M., Gupta, A., Naughton, J.F., Ramakrishnan,
R., Sarawagi, S.: On the computation of multidimensional aggregates. In: Proc. Int.
Conf. Very Large Data Bases, pp. 506–521 (1996)
2. Barbará, D., DuMouchel, W., Faloutsos, C., Haas, P.J., Hellerstein, J.M., Ioannidis, Y., Jagadish, H.V., Johnson, T., Ng, R., Poosala, V., Ross, K.A., Sevcik, K.C.: The New Jersey data reduction report. Bull. Technical Committee on Data Engineering 20, 3–45 (1997)
3. Ballou, D.P., Tayi, G.K.: Enhancing data quality in data warehouse environments.
Comm. ACM 78, 42–73 (1999)
4. Breiman, L., Friedman, J., Olshen, R., Stone, C.: Classification and Regression Trees. Wadsworth International Group (1984)
5. Chakrabarti, S., Cox, E., Frank, E., Güting, R.H., Han, J., Jiang, X., Kamber, M., Lightstone, S.S., Nadeau, T.P., Neapolitan, R.E., Pyle, D., Refaat, M., Schneider, M., Teorey, T.J., Witten, I.H.: Data Mining: Know It All. Morgan Kaufmann, San Francisco (2008)
6. Shawe-Taylor, J., Cristianini, N.: Support Vector Machines and other kernel-based learning
methods. Cambridge University Press, Cambridge (2000)
7. Dash, M., Liu, H.: Feature selection methods for classification. Intell. Data Anal. (1),
131–156 (1997)
8. Dwyer, D.W., Kocagil, A.E., Stein, R.M.: Moody's KMV RiskCalc v3.1 model (2004)
9. English, L.: Improving Data Warehouse and Business Information Quality: Methods for Reducing Costs and Increasing Profits. John Wiley & Sons, Chichester (1999)
10. Farmer, J.D., Packard, N.H., Perelson, A.: The immune system, adaptation, and ma-
chine learning. Physica D 22(1–3), 187–204 (1986)
11. Glover, F., Kelly, J.P., Laguna, M.: Genetic algorithms and tabu search: hybrids for
optimization. Comput. Oper. Res. 22, 111–134 (1995)
12. Hamel, L.H.: Knowledge Discovery with Support Vector Machines. Wiley, Chichester
(2009)
13. Holland, J.H.: Adaptation in Natural and Artificial Systems. University of Michigan
Press, Ann Arbor (1975)
14. Huang, C.L., Chen, M.C., Wang, C.J.: Credit scoring with a data mining approach
based on support vector machines. Expert Systems with Applications 33(4), 847–856
(2007)
15. Kennedy, R.L., Lee, Y., Van Roy, B., Reed, C.D., Lippmann, R.P.: Solving Data Mining Problems Through Pattern Recognition. Prentice-Hall, Englewood Cliffs (1998)
16. Kennedy, J., Eberhart, R.: Particle swarm optimization. In: Proceedings of the IEEE International Conference on Neural Networks, vol. 4, pp. 1942–1948 (1995)
17. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97,
273–324 (1997)
18. Langley, P., Simon, H.A., Bradshaw, G.L., Zytkow, J.M.: Scientific Discovery: Com-
putational Explorations of the Creative Processes. MIT Press, Cambridge (1987)
19. Liu, H., Motoda, H.: Feature Extraction, Construction, and Selection: A Data Mining
Perspective. Kluwer Academic Publishers, Dordrecht (1998)
20. Lin, S.W., Shiue, Y.R., Chen, S.C., Cheng, H.M.: Applying enhanced data mining ap-
proaches in predicting bank performance: A case of Taiwanese commercial banks.
Expert Syst. Appl. (36), 11543–11551 (2009)
21. Loshin, D.: Enterprise Knowledge Management: The Data Quality Approach. Morgan
Kaufmann, San Francisco (2001)
22. Lopez, F.G., Torres, G.M., Batista, B.M.: Solving feature subset selection problem by
parallel scatter search. Eur. J. Oper. Res. (169), 477–489 (2006)
23. Martens, D., Baesens, B., Gestel, T.V., Vanthienen, J.: Comprehensible credit scoring
models using rule extraction from support vector machines. Eur. J. Oper. Res. 183(3),
1466–1476 (2007)
24. Martin, D.: Early warning of bank failure a logit regression approach. J. Bank.
Financ. (1), 249–276 (1977)
25. Nunez, H., Angulo, C., Catala, A.: Rule extraction from support vector machines. In:
European Symposium on Artificial Neural Networks Proceedings, pp. 107–112 (2002)
26. Nunez, H., Angulo, C., Catala, A.: Rule based learning systems from SVM and
RBFNN. Tendencias de la mineria de datos en espana, Red Espaola de Minera de Da-
tos (2004)
27. Neter, J., Kutner, M.H., Nachtsheim, C.J., Wasserman, W.: Applied Linear Statistical
Models. Irwin (1996)
28. Olson, J.E.: Data Quality: The Accuracy Dimension. Morgan Kaufmann, San
Francisco (2003)
29. Pai, P.F., Hong, W.C.: Forecasting regional electricity load based on recurrent support
vector machines with genetic algorithms. Electr. Pow. Syst. Res. 74(3), 417–425
(2005)
30. Pai, P.F., Lin, C.S.: A hybrid ARIMA and support vector machines model in stock
price forecasting. Omega 33(6), 497–505 (2005)
31. Pai, P.F.: System reliability forecasting by support vector machines with genetic algorithms. Math. Comput. Model. 43(3-4), 262–274 (2006)
32. Pai, P.F., Chen, S.Y., Huang, C.W., Chang, Y.H.: Analyzing foreign exchange rates by
rough set theory and directed acyclic graph support vector machines. Expert Syst.
Appl. 37(8), 5993–5998 (2010)
33. Pai, P.F., Chang, Y.H., Hsu, M.F., Fu, J.C., Chen, H.H.: A hybrid kernel principal
component analysis and support vector machines model for analyzing sonographic pa-
rotid gland in Sjogren’s Syndrome. International Journal of Mathematical Modelling
and Numerical Optimisation (2010) (in press)
34. Pai, P.F., Hsu, M.F., Wang, M.C.: A support vector machine-based model for detect-
ing top management fraud. Knowl.-Based Syst. 24(2), 314–321 (2011)
35. Pyle, D.: Data Preparation for Data Mining. Morgan Kaufmann, San Francisco (1999)
36. Quinlan, J.R.: Unknown attribute values in induction. In: Proc. 1989 Int. Conf. Ma-
chine Learning (ICML 1989), Ithaca, NY, pp. 164–168 (1989)
37. Redman, T.: Data Quality: Management and Technology. Bantam Books (1992)
38. Ross, K., Srivastava, D.: Fast computation of sparse datacubes. In: Proc Int. Conf.
Very Large Data Bases, pp. 116–125 (1997)
39. Sarawagi, S., Stonebraker, M.: Efficient organization of large multidimensional arrays.
In: Proc. Int. Conf. Data Engineering, ICDE 1994 (1994)
40. Siedlecki, W., Sklansky, J.: On automatic feature selection. Int. J. Pattern Recognition
and Artificial Intelligence (2), 197–220 (1988)
41. Scholkopf, B., Smola, A.J.: Learning with Kernels: Support Vector Machines,
Regularization, Optimization, and Beyond. MIT Press, Cambridge (2001)
42. Vapnik, V.: Statistical learning theory. John Wiley and Sons, New York (1998)
43. Vapnik, V., Golowich, S., Smola, A.: Support vector machine for function approxima-
tion, regression estimation, and signal processing. Advances in Neural Information
processing System (9), 281–287 (1996)
44. Wang, R., Storey, V., Firth, C.: A framework for analysis of data quality research.
IEEE Trans. Knowledge and Data Engineering (7), 623–640 (1995)
45. Zhao, Y., Deshpande, P.M., Naughton, J.F.: An array-based algorithm for simultane-
ous multi-dimensional aggregates. In: Proc. 1997 ACM-SIGMOD Int. Conf. Manage-
ment of Data, pp. 159–170 (1997)
Chapter 12
Benchmark Problems in Structural
Optimization
properties so as to make sure whether or not the tested algorithm can solve certain types of optimization problems efficiently. According to the nature of structural optimization problems, we can first divide them into two groups: truss and non-truss design problems. Selected lists of test problems for each optimization group are given below:
Truss design problems:
10-bar plane truss
25-bar space truss
72-bar truss
120-bar truss dome
200-bar plane truss
26-story truss tower
Non-truss design problems:
Welded beam
Reinforced concrete beam
Compression Spring
Pressure vessel
Speed reducer
Stepped cantilever beam
Frame optimization
The basic requirement for an efficient structural design is that the response of the structure be acceptable for the given specifications. That is, a set of design parameters should at least define a feasible design. There can be a very large number of feasible designs, but it is desirable to choose the best of these designs. The best design can be identified using minimum cost, minimum weight, maximum performance, or a combination of these [1]. Obviously, parameters may have
associated uncertainties, and in this case, a robust design solution, not necessarily
the best solution, is often the best choice in practice. As parameter variations are
usually very large, systematic adaptive searching or optimization procedures are
required. In the past several decades, researchers have developed many optimiza-
tion algorithms. Examples of conventional methods are hill climbing, gradient
methods, random search, simulated annealing, and heuristic methods. Examples of
evolutionary or biology-inspired algorithms are genetic algorithms [2], neural networks [3], particle swarm optimization [4], the firefly algorithm [5], cuckoo search
[6], and many others. The methods used to solve a particular structural problem
depend largely on the type and characteristics of the optimization problem itself.
There is no universal method that works for all structural problems, and there is generally no guarantee of finding the globally optimal solution in highly nonlinear global optimization problems. In general, we can only aim for the best estimate or suboptimal solutions under the given conditions. Knowledge about a particular
problem is always helpful to make the appropriate choice of the best or most effi-
cient methods for the optimization procedure.
Truss structures are widely used in structural engineering [7, 8], and examples of truss structure design optimization are used extensively in the literature to compare the efficiency of optimization algorithms [9-12]; these are presented first. Then, we introduce five examples of non-truss optimization problems under static constraints.
The general truss sizing problem can be written as the minimization of the structural weight

\[ W(A) = \sum_{i=1}^{NM} \gamma_i L_i A_i , \qquad A_{min} \le A_i \le A_{max} , \]

where W(A) is the weight of the structure; NM is the number of members in the structure; γi represents the material density of member i; Li is the length of member i; and Ai is the cross-sectional area of member i, chosen between Amin and Amax (the lower and upper bounds, respectively). Any optimal design also has to satisfy some inequality constraints that limit the design variable sizes and structural responses [15].
The main issue in truss optimization is how to deal with the constraints, because the weight of each truss structure can be simplified to an explicit formula [16]. Generally, a truss structure has the following three kinds of constraints:
Stress constraints: each member carries a tensile or compressive stress, so for each member of the structure the tensile stress should be less than the allowable tensile stress (σmax), while the compressive stress should stay within the allowable compressive stress (σmin). In each truss optimization problem, we therefore have 2NM stress constraints. These constraints can be formulated as follows:

\[ \sigma_{min} \le \sigma_i \le \sigma_{max} , \qquad i = 1, \ldots, NM . \]
For truss optimization problems, there are only two constant mechanical properties: the elastic modulus (E) and the material density (γ). The structural analysis of each truss can readily be carried out using the finite element method.
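To illustrate how the sizing formulation above is typically implemented, the sketch below evaluates W(A) and folds the 2NM stress constraints into a single penalized objective. The member data, the penalty weight and the stress routine are placeholder assumptions; in practice the stresses σi would come from the finite element analysis just mentioned.

import numpy as np

# Hypothetical member data for a small truss (units arbitrary).
lengths = np.array([360.0, 360.0, 509.1])        # L_i
density = 0.1                                    # gamma (one material for all members)
SIGMA_MAX, SIGMA_MIN = 25000.0, -25000.0         # allowable tensile/compressive stress

def member_stresses(areas):
    # Placeholder for a finite element analysis returning sigma_i;
    # a made-up inverse relation keeps the sketch runnable.
    return 40000.0 / areas

def penalized_weight(areas, penalty=1e3):
    weight = np.sum(density * lengths * areas)   # W(A) = sum of gamma_i * L_i * A_i
    sigma = member_stresses(areas)
    viol = np.maximum(sigma - SIGMA_MAX, 0) + np.maximum(SIGMA_MIN - sigma, 0)
    return weight + penalty * np.sum(viol**2)    # exterior quadratic penalty

print(penalized_weight(np.array([2.0, 2.0, 2.0])))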
This truss example is one of the most well-known structural optimization benchmarks [17]. It has been widely used by many researchers as a standard 2D benchmark for truss optimization (e.g., [16-18]). The geometry and loading of the 10-bar truss are presented in Figure 12.1. This problem has many variations and has been solved with either continuous or discrete variables. The main objective is to find the minimum weight of the truss by changing the cross-sectional areas of the elements, so it has 10 variables in total.
This spatial truss structure has been solved by many researchers as a benchmark structural problem [19]. The topology and node numbers of the 25-bar spatial truss are shown in Figure 12.2, where the 25 members are categorized into eight groups, so the problem has eight independent variables. It has been solved under various loading conditions (e.g., [10, 11, 20, 21]).
Group Element(s)
1 A1
2 A2–A5
3 A6–A9
4 A10–A11
5 A12–A13
6 A14–A17
7 A18–A21
8 A22–A25
The 72-bar truss is a challenging benchmark that has also been used by many researchers (e.g., [9, 18, 22, 23]). As shown in Figure 12.3, this truss has 16 independent groups of design variables. It is usually subjected to two different loading cases.
Group Element(s)
1 A1–A4
2 A5–A12
3 A13–A16
4 A17–A18
5 A19–A22
6 A23–A30
7 A31–A34
8 A35–A36
9 A37–A40
10 A41–A48
11 A49–A52
12 A53–A54
13 A55–A58
14 A59–A66
15 A67–A70
16 A71–A72
The 120-bar truss dome is used as a benchmark problem in several studies (e.g., [10, 16]). This symmetric space truss, shown in Figure 12.4, has a diameter of 31.78 m, and its 120 members are divided into 7 groups, taking the symmetry of the structure into account. Because of this symmetry, designing one-fourth of the dome is sufficient. The truss is subjected to vertical loading at all unsupported joints. According to the American Institute of Steel Construction (AISC) code for allowable stress design (ASD) [24], the allowable tensile stress (σmax) is equal to 0.6Fy (where Fy is the yield stress of the steel), and the allowable compressive stress (σmin) is calculated according to the slenderness ratio.
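As an implementation note, the ASD allowable compressive stress as a function of the slenderness ratio λ = kl/r can be coded as below; the two branches correspond to inelastic and elastic (Euler) buckling, and the default material constants are assumed values only.

import math

def allowable_compression(lam, E=2.1e6, Fy=2400.0):
    # AISC-ASD allowable compressive stress; E and Fy here are assumed
    # values (e.g., kgf/cm^2) and should be replaced by the problem's data.
    Cc = math.sqrt(2.0 * math.pi**2 * E / Fy)       # boundary slenderness ratio
    if lam <= Cc:                                   # inelastic buckling branch
        num = (1.0 - lam**2 / (2.0 * Cc**2)) * Fy
        den = 5.0 / 3.0 + 3.0 * lam / (8.0 * Cc) - lam**3 / (8.0 * Cc**3)
        return num / den
    return 12.0 * math.pi**2 * E / (23.0 * lam**2)  # elastic (Euler) branch

print(allowable_compression(60.0), allowable_compression(150.0))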
The benchmark 200-bar plane truss structure, shown in Figure 12.5, has been solved in many papers with different numbers of design variables. The 200 structural members of this planar truss have been categorized, using symmetry, into 29 [11], 96 [25] or 105 [26] groups in the literature, and some researchers have also solved it with other groupings.
Figure 12.6 shows the geometry and the element groups of the recently developed 26-story space truss tower ([10, 11, 29-31]). This is a large-scale truss problem containing 244 nodes and 942 elements, which are grouped into 59 element groups by exploiting the symmetry of the structure. The problem has been solved both as a continuous problem and as a discrete problem [30]. More details of this problem can be found in [31].
The design of a welded beam that minimizes the overall cost of fabrication was introduced as a benchmark structural engineering problem by Rao [19]. Figure 12.7 shows a beam of low-carbon steel (C-1010) welded to a rigid support. The welded beam is fixed and designed to support a load (P). The thickness of the weld (h), the length of the welded joint (l), the width of the beam (t) and the thickness of the beam (b) are the design variables. The values of h and l can only take integer multiples of 0.0065, but many researchers consider them continuous variables [32]. The objective function of the problem is expressed as follows:
Minimize:

\[ f(h, l, t, b) = 1.10471\,h^2 l + 0.04811\,t b\,(14 + l) \quad (12.8) \]

Subject to the shear stress, bending stress, weld geometry, buckling and deflection (side) constraints:

\[ g_1 = \tau_{max} - \tau \ge 0 \quad (12.9) \]

\[ g_2 = \sigma_{max} - \sigma \ge 0 \quad (12.10) \]

\[ g_3 = b - h \ge 0 \quad (12.11) \]

\[ g_4 = P_c - P \ge 0 \quad (12.12) \]

\[ g_5 = 0.25 - \delta \ge 0 \quad (12.13) \]

where

\[ \tau = \sqrt{(\tau')^2 + (\tau'')^2 + \frac{l\,\tau'\,\tau''}{\sqrt{0.25\,(l^2 + (h + t)^2)}}} \quad (12.14) \]

\[ \sigma = \frac{504000}{t^2 b} \quad (12.15) \]

\[ P_c = 64746.022\,(1 - 0.0282346\,t)\,t b^3 \quad (12.16) \]

\[ \delta = \frac{2.1952}{t^3 b} \quad (12.17) \]

\[ \tau' = \frac{6000}{\sqrt{2}\,h l} \quad (12.18) \]

\[ \tau'' = \frac{6000\,(14 + 0.5\,l)\sqrt{0.25\,(l^2 + (h + t)^2)}}{2\left\{0.707\,h l\left(l^2/12 + 0.25\,(h + t)^2\right)\right\}} \quad (12.19) \]
The simple bounds of the problem are 0.125 ≤ h ≤ 5, 0.1 ≤ l, t ≤ 10 and 0.1 ≤ b ≤ 5. The constant values for the formulation are given in Table 12.2. This problem has been solved by many researchers in the literature (e.g., [15, 33, 34]), and two different solutions are presented here: one has an optimal function value of around 2.38, and the other (with a difference in one of the constraints) has an optimal function value of about 1.7. Deb and Goyal [35] extended this problem to choose among four types of beam material and two types of welded joint configurations.
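To make the formulation concrete, the sketch below solves the variant given by Eqs. (12.8)-(12.19) with SciPy's SLSQP solver. The allowable values τmax = 13,600 psi and σmax = 30,000 psi and the load P = 6000 lb are assumptions taken from values commonly quoted in the literature (standing in for Table 12.2), so this is a sketch rather than a definitive implementation.

import numpy as np
from scipy.optimize import minimize

TAU_MAX, SIG_MAX, DELTA_MAX, P = 13600.0, 30000.0, 0.25, 6000.0  # assumed constants

def cost(x):
    h, l, t, b = x
    return 1.10471 * h**2 * l + 0.04811 * t * b * (14.0 + l)      # Eq. (12.8)

def cons(x):
    h, l, t, b = x
    tau_p = 6000.0 / (np.sqrt(2.0) * h * l)                       # Eq. (12.18)
    tau_pp = (6000.0 * (14.0 + 0.5 * l)
              * np.sqrt(0.25 * (l**2 + (h + t)**2))
              / (2.0 * 0.707 * h * l * (l**2 / 12.0 + 0.25 * (h + t)**2)))  # Eq. (12.19)
    tau = np.sqrt(tau_p**2 + tau_pp**2
                  + l * tau_p * tau_pp / np.sqrt(0.25 * (l**2 + (h + t)**2)))  # Eq. (12.14)
    sigma = 504000.0 / (t**2 * b)                                 # Eq. (12.15)
    p_c = 64746.022 * (1.0 - 0.0282346 * t) * t * b**3            # Eq. (12.16)
    delta = 2.1952 / (t**3 * b)                                   # Eq. (12.17)
    return [TAU_MAX - tau, SIG_MAX - sigma, b - h, p_c - P, DELTA_MAX - delta]

res = minimize(cost, x0=[0.3, 6.0, 8.0, 0.3],
               bounds=[(0.125, 5.0), (0.1, 10.0), (0.1, 10.0), (0.1, 5.0)],
               constraints=[{'type': 'ineq', 'fun': lambda x, i=i: cons(x)[i]}
                            for i in range(5)],
               method='SLSQP')
print(res.x, res.fun)  # roughly (0.244, 6.22, 8.29, 0.244), f about 2.38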
The problem of designing a reinforced concrete beam has many variations and has
been solved by various researchers with different kinds of constraints (e.g., [36, 37]).
A simplified optimization problem minimizing the total cost of a reinforced concrete
beam, shown in Figure 12.8, was presented by Amir and Hasegawa [38]. The beam is simply supported with a span of 30 ft and subjected to a live load of 2.0 klbf and a dead load of 1.0 klbf, including the weight of the beam. The concrete compressive strength (σc) is 5 ksi, and the yield stress of the reinforcing steel (Fy) is 50 ksi. The cost of concrete is $0.02/in²/linear ft and the cost of steel is $1.0/in²/linear ft. The aim of the design is to determine the area of the reinforcement (As), the width of the beam (b) and the depth of the beam (h) such that the total cost of the structure is minimized.
Herein, the cross-sectional area of the reinforcing bar (As) is taken as a discrete variable that must be chosen from the standard bar dimensions listed in [38]. The width of the concrete beam (b) is assumed to be an integer variable, and the depth (h) of the beam is a continuous variable. The effective depth is assumed to be 0.8h.
With the unit costs above and the 30 ft span (steel displaces concrete, so its net cost is (1.0 − 0.02) × 30 = 29.4 per square inch of reinforcement), the total cost to be minimized is

\[ f(A_s, b, h) = 29.4\,A_s + 0.6\,b h . \]

The flexural strength of the beam must satisfy

\[ M_u = 0.9\,A_s F_y\,(0.8 h)\left(1.0 - 0.59\,\frac{A_s F_y}{0.8\,b h\,\sigma_c}\right) \ge 1.4\,M_d + 1.7\,M_l \quad (12.22) \]
where Mu, Md and Ml are, respectively, the flexural strength, dead load and live
load moments of the beam. In this case, Md = 1350 in.kip and Ml = 2700 in.kip.
This constraint can be simplified as [40]:
\[ g_2 = 180 + 7.375\,\frac{A_s^2}{b} - A_s h \le 0 \quad (12.23) \]
The bounds of the variables are b ∈ {28, 29, …, 40} inches and 5 ≤ h ≤ 10 inches, and As is a discrete variable that must be chosen from the reinforcing bar sizes permitted by ACI. The best solution obtained by the existing methods so far is 359.208, with h = 34, b = 8.5 and As = 6.32 (15#6 or 11#7 bars), using the firefly algorithm [41].
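As a quick consistency check, the best design quoted above can be substituted directly into the cost function and the simplified constraint (12.23):

# Verify the reported best design of the reinforced concrete beam
# against the cost function and the simplified constraint (12.23).
As, b, h = 6.32, 8.5, 34.0           # best design as printed above
cost = 29.4 * As + 0.6 * b * h       # steel + concrete cost over the 30 ft span
g2 = 180.0 + 7.375 * As**2 / b - As * h
print(cost)                          # about 359.208
print(g2)                            # about -0.23, i.e. the constraint is satisfied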
The problem of spring design has many variations and has been solved by various researchers. Sandgren [42] minimized the volume of a coil compression spring with mixed variables, and Deb and Goyal [35] minimized the weight of a Belleville spring. The most well-known spring problem is the design of a tension–compression spring for minimum weight [43]. Figure 12.9 shows a tension–compression spring with three design variables: the wire diameter (d), the mean coil diameter (D), and the number of active coils (N). The weight of the spring is to be minimized, subject to constraints on the minimum deflection (g1), the shear stress (g2) and the surge frequency (g3), and to a limit on the outside diameter (g4) [43]. The problem can be expressed as follows:
Minimize:

\[ f(N, D, d) = (N + 2)\,D d^2 \quad (12.24) \]

Subject to:

\[ g_1 = 1 - \frac{D^3 N}{71785\,d^4} \le 0 \quad (12.25) \]

\[ g_2 = \frac{4 D^2 - D d}{12566\,(D d^3 - d^4)} + \frac{1}{5108\,d^2} - 1 \le 0 \quad (12.26) \]

\[ g_3 = 1 - \frac{140.45\,d}{D^2 N} \le 0 \quad (12.27) \]

\[ g_4 = \frac{D + d}{1.5} - 1 \le 0 \quad (12.28) \]

where 0.05 ≤ d ≤ 1, 0.25 ≤ D ≤ 1.3 and 2 ≤ N ≤ 15.
Many researchers have tried to solve this problem (e.g., [33, 44, 45]), and it seems the best result obtained for this problem is 0.0126652, with d = 0.05169, D = 0.35673 and N = 11.28846, using the bat algorithm [46].
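Because the formulation is explicit, the reported design can be evaluated directly, as in the sketch below; with the solution printed to five digits, constraint residuals of order 10⁻³ are to be expected.

# Evaluate the tension-compression spring objective and constraints
# at the best design reported above (printed to five digits).
d, D, N = 0.05169, 0.35673, 11.28846
f = (N + 2.0) * D * d**2
g1 = 1.0 - D**3 * N / (71785.0 * d**4)
g2 = (4.0 * D**2 - D * d) / (12566.0 * (D * d**3 - d**4)) \
     + 1.0 / (5108.0 * d**2) - 1.0
g3 = 1.0 - 140.45 * d / (D**2 * N)
g4 = (D + d) / 1.5 - 1.0
print(f)               # about 0.012665
print(g1, g2, g3, g4)  # g1 and g2 are near-active; g1 is about +1e-3 here
                       # purely from rounding of the printed digits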
A pressure vessel is a closed container that holds gases or liquids at a pressure typically much higher than the ambient pressure. A cylindrical pressure vessel capped at both ends by hemispherical heads is presented in Figure 12.10. Pressure vessels are widely used for engineering purposes, and this optimization problem was proposed by Sandgren [42]. The compressed air tank considered here has a working pressure of 3000 psi and a minimum volume of 750 ft³, and is designed according to the American Society of Mechanical Engineers (ASME) boiler and pressure vessel code. The total cost, which includes a welding cost, a material cost and a forming cost, is to be minimized. The variables are the thickness of the shell (Ts), the thickness of the head (Th), the inner radius (R), and the length of the cylindrical section of the vessel (L). The thicknesses (Ts and Th) can only take integer multiples of 0.0625 inch. The total cost can be written as

\[ f(T_s, T_h, R, L) = 0.6224\,T_s R L + 1.7781\,T_h R^2 + 3.1661\,T_s^2 L + 19.84\,T_s^2 R \quad (12.29) \]
The constraints are defined in accordance with the ASME design codes, where g3 represents the minimum-volume requirement of 750 ft³ and the others are geometrical constraints. The constraints are as follows:

\[ g_1 = -T_s + 0.0193\,R \le 0 \quad (12.30) \]

\[ g_2 = -T_h + 0.00954\,R \le 0 \quad (12.31) \]

\[ g_3 = -\pi R^2 L - \frac{4}{3}\,\pi R^3 + 750 \times 1728 \le 0 \quad (12.32) \]

\[ g_4 = L - 240 \le 0 \quad (12.33) \]
where 1 × 0.0625 ≤ Ts, Th ≤ 99 × 0.0625 and 10 ≤ R, L ≤ 200. The minimum cost and the statistical values of the best solutions obtained in about forty different studies are reported in [47]; according to that paper, the best result is a total cost of $6059.714. Although nearly all researchers use 200 as the upper limit of the variable L, it was extended to 240 in a few studies (e.g., [41]) in order to investigate the last constrained region of the problem. Using this bound, the best result decreased to about $5850, so this variation may be a new, more challenging benchmark problem. It should also be noted that if a crude approximation of π is used in evaluating the g3 constraint, the best result above cannot be reproduced (a slightly smaller value will be obtained); from the implementation point of view, a sufficiently accurate value of π should therefore be used in this problem.
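As a quick check of the formulation, the sketch below evaluates the cost (12.29) and the constraints (12.30)-(12.33) at a best-known design quoted from the literature (not from this chapter), using math.pi in g3 in line with the remark above.

import math

# Pressure vessel cost and constraints at a best-known design from the
# literature; note the use of math.pi in g3, per the remark above.
Ts, Th, R, L = 0.8125, 0.4375, 42.0984, 176.6366
cost = (0.6224 * Ts * R * L + 1.7781 * Th * R**2
        + 3.1661 * Ts**2 * L + 19.84 * Ts**2 * R)
g1 = -Ts + 0.0193 * R
g2 = -Th + 0.00954 * R
g3 = -math.pi * R**2 * L - 4.0 / 3.0 * math.pi * R**3 + 750.0 * 1728.0
g4 = L - 240.0
print(cost)            # about 6059.7
print(g1, g2, g3, g4)  # all <= 0; g1 and g3 are essentially active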
A speed reducer is part of the gear box of a mechanical system, and it is also used in many other types of applications. The design of a speed reducer is a more challenging benchmark [48] because it involves seven design variables. As shown in Figure 12.11, these variables are the face width (b), the module of the teeth (m), the number of teeth on the pinion (z), the length of the first shaft between bearings (l1), the length of the second shaft between bearings (l2), the diameter of the first shaft (d1), and the diameter of the second shaft (d2).
The objective is to minimize the total weight of the speed reducer. There are
nine constraints, including the limits on the bending stress of the gear teeth, sur-
face stress, transverse deflections of shafts 1 and 2 due to transmitted force, and
stresses in shafts 1 and 2.
The mathematical formulation can be summarized as follows:
Minimize:

\[ f = 0.7854\,b m^2 \left(3.3333\,z^2 + 14.9334\,z - 43.0934\right) - 1.508\,b\left(d_1^2 + d_2^2\right) + 7.477\left(d_1^3 + d_2^3\right) + 0.7854\left(l_1 d_1^2 + l_2 d_2^2\right) \quad (12.34) \]

Subject to:

\[ g_1 = \frac{27}{b m^2 z} - 1 \le 0 \quad (12.35) \]

\[ g_2 = \frac{397.5}{b m^2 z^2} - 1 \le 0 \quad (12.36) \]

\[ g_3 = \frac{1.93\,l_1^3}{m z\,d_1^4} - 1 \le 0 \quad (12.37) \]

\[ g_4 = \frac{1.93\,l_2^3}{m z\,d_2^4} - 1 \le 0 \quad (12.38) \]

\[ g_5 = \frac{\sqrt{\left(745\,l_1/(m z)\right)^2 + 16.9 \times 10^6}}{110\,d_1^3} - 1 \le 0 \quad (12.39) \]

\[ g_6 = \frac{\sqrt{\left(745\,l_2/(m z)\right)^2 + 157.5 \times 10^6}}{85\,d_2^3} - 1 \le 0 \quad (12.40) \]

\[ g_7 = \frac{m z}{40} - 1 \le 0 \quad (12.41) \]

\[ g_8 = \frac{5 m}{b} - 1 \le 0 \quad (12.42) \]

\[ g_9 = \frac{b}{12 m} - 1 \le 0 \quad (12.43) \]
In addition, the design variables are also subject to the simple bounds listed in column 2 of Table 12.2. This problem has been solved by many researchers (e.g., [49, 50]), and it seems the best weight of the speed reducer is about 3000 kg [47, 51]. The corresponding variable values of this solution are presented in Table 12.2.
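Since the formulation is fully explicit, candidate designs are easy to verify; the sketch below evaluates the objective (12.34) and the nine constraints at a best-known design quoted from the literature (not from this chapter), for which f is about 2994.47, consistent with the "about 3000" figure above.

import math

# Speed reducer objective and the nine constraints at a best-known
# design quoted from the literature.
b, m, z = 3.5, 0.7, 17.0
l1, l2, d1, d2 = 7.3, 7.715319, 3.350214, 5.286654

f = (0.7854 * b * m**2 * (3.3333 * z**2 + 14.9334 * z - 43.0934)
     - 1.508 * b * (d1**2 + d2**2) + 7.477 * (d1**3 + d2**3)
     + 0.7854 * (l1 * d1**2 + l2 * d2**2))

g = [27.0 / (b * m**2 * z) - 1.0,                                  # (12.35)
     397.5 / (b * m**2 * z**2) - 1.0,                              # (12.36)
     1.93 * l1**3 / (m * z * d1**4) - 1.0,                         # (12.37)
     1.93 * l2**3 / (m * z * d2**4) - 1.0,                         # (12.38)
     math.sqrt((745.0 * l1 / (m * z))**2 + 16.9e6)
     / (110.0 * d1**3) - 1.0,                                      # (12.39)
     math.sqrt((745.0 * l2 / (m * z))**2 + 157.5e6)
     / (85.0 * d2**3) - 1.0,                                       # (12.40)
     m * z / 40.0 - 1.0,                                           # (12.41)
     5.0 * m / b - 1.0,                                            # (12.42)
     b / (12.0 * m) - 1.0]                                         # (12.43)

print(f)       # about 2994.47
print(max(g))  # <= 0 up to rounding of the printed digits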
The stepped cantilever beam has been solved with different types of design variables in different cases in the literature [8, 53]. Figure 12.12 illustrates a five-stepped cantilever beam with a rectangular cross section. In this problem, the heights and widths of the beam in all five steps of the cantilever beam are the design variables, and the volume of the beam is to be minimized. The objective function is formulated as follows:
Minimize:

\[ V = b_1 h_1 l_1 + b_2 h_2 l_2 + b_3 h_3 l_3 + b_4 h_4 l_4 + b_5 h_5 l_5 \quad (12.44) \]
Subject to the following stress constraints in the five segments:

\[ g_1 = \frac{6 P\,l_5}{b_5 h_5^2} - \sigma_d \le 0 \quad (12.45) \]

\[ g_2 = \frac{6 P\,(l_5 + l_4)}{b_4 h_4^2} - \sigma_d \le 0 \quad (12.46) \]

\[ g_3 = \frac{6 P\,(l_5 + l_4 + l_3)}{b_3 h_3^2} - \sigma_d \le 0 \quad (12.47) \]

\[ g_4 = \frac{6 P\,(l_5 + l_4 + l_3 + l_2)}{b_2 h_2^2} - \sigma_d \le 0 \quad (12.48) \]

\[ g_5 = \frac{6 P\,(l_5 + l_4 + l_3 + l_2 + l_1)}{b_1 h_1^2} - \sigma_d \le 0 \quad (12.49) \]

and the tip deflection constraint:

\[ g_6 = \frac{P l^3}{3 E} \left( \frac{1}{I_5} + \frac{7}{I_4} + \frac{19}{I_3} + \frac{37}{I_2} + \frac{61}{I_1} \right) - \Delta_{max} \le 0 \quad (12.50) \]
• A specific aspect ratio of 20 has to be maintained between the height and width of each of the five cross sections of the beam:

\[ g_7 = \frac{h_5}{b_5} - 20 \le 0 \quad (12.51) \]

\[ g_8 = \frac{h_4}{b_4} - 20 \le 0 \quad (12.52) \]

\[ g_9 = \frac{h_3}{b_3} - 20 \le 0 \quad (12.53) \]

\[ g_{10} = \frac{h_2}{b_2} - 20 \le 0 \quad (12.54) \]

\[ g_{11} = \frac{h_1}{b_1} - 20 \le 0 \quad (12.55) \]
The initial design space for the cases with continuous, discrete and mixed variable formulations can be found in Thanedar and Vanderplaats [52]. This problem can be used as a large-scale optimization problem if the number of segments of the beam is increased. When the beam has N segments, it has 2N+1 constraints, including N stress constraints, N aspect ratio constraints and one displacement constraint; a sketch of how such an instance can be generated is given below. Vanderplaats [54] solved this problem as a very large-scale structural optimization with up to 25,000 segments and 50,000 variables.
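The sketch below generates such an N-segment instance programmatically, returning the volume objective and the 2N+1 constraint functions. The load, material and allowable values are assumed here (values commonly quoted for the five-segment case), and the closed-form deflection coefficients reduce to 61, 37, 19, 7, 1 for N = 5, as in Eq. (12.50).

import numpy as np

def make_cantilever(N, P=50000.0, E=2e7, sigma_d=14000.0, delta_max=2.7,
                    seg_len=100.0):
    """Build the N-segment cantilever problem: volume objective plus
    N stress, N aspect-ratio and 1 displacement constraint (2N+1 total).
    The numeric data are illustrative assumptions."""
    l = np.full(N, seg_len)

    def volume(x):                       # x = [b_1..b_N, h_1..h_N]
        b, h = x[:N], x[N:]
        return float(np.sum(b * h * l))

    def constraints(x):
        b, h = x[:N], x[N:]
        g = []
        # Segments numbered 1 (support) .. N (tip); the stress is checked
        # at the support-side end of each segment, so for N = 5 these are
        # g5 .. g1 of the text, in that order.
        for i in range(N):
            arm = np.sum(l[i:])          # moment arm from the tip load
            g.append(6.0 * P * arm / (b[i] * h[i]**2) - sigma_d)
        g += list(h / b - 20.0)          # aspect ratio constraints
        I = b * h**3 / 12.0
        coef = np.array([3 * (N - k)**2 + 3 * (N - k) + 1
                         for k in range(1, N + 1)])   # 61, 37, 19, 7, 1 for N = 5
        g.append(float(P * seg_len**3 / (3.0 * E) * np.sum(coef / I)) - delta_max)
        return g

    return volume, constraints

vol, cons = make_cantilever(5)
x0 = np.concatenate([np.full(5, 3.0), np.full(5, 60.0)])
print(vol(x0), len(cons(x0)))            # volume and 11 (= 2N+1) constraints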
Frame design is one of the popular structural optimization benchmarks. Many researchers have attempted to solve frame structures as real-world, discrete-variable problems using different methods (e.g., [55, 56]). The design variables of frame structures are the cross sections of the beams and columns, which have to be chosen from standardized cross sections. Recently, Hasançebi et al. [57] compared seven well-known structural design algorithms for the weight minimization of steel frames: ant colony optimization, evolution strategies, the harmony search method, simulated annealing, particle swarm optimization, tabu search and genetic algorithms. Among these algorithms, they showed that simulated annealing and evolution strategies performed best for frame optimization.
One of the well-known frame structures was introduced by Khot et al. [58]. This problem has been solved by many researchers (e.g., [59, 60]) and can now be considered a frame-structure benchmark. The frame has one bay and eight stories, with the applied loads shown in Figure 12.13. The problem has eight element groups, and the values of the cross-section groups are chosen from all 267 W-shapes of the AISC list.
[Figure 12.13. The one-bay, eight-story frame, with a bay width of 3.4 m and lateral loads of 12.529, 8.743, 7.264, 6.054, 4.839, 3.630, 2.420 and 1.210 kN applied at the story levels.]
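From an implementation standpoint, the key detail in such discrete frame problems is mapping an algorithm's continuous outputs onto the list of standard sections; one common decoding is sketched below, with a deliberately abbreviated, hypothetical W-shape table standing in for the full set of 267 sections.

import numpy as np

# Map continuous design values in [0, 1) to discrete W-shape sections.
# The table below is a short, hypothetical stand-in for the full AISC
# list of 267 W-shapes (name, cross-sectional area in in^2).
W_SHAPES = [("W8x18", 5.26), ("W10x30", 8.84), ("W12x40", 11.7),
            ("W14x61", 17.9), ("W18x76", 22.3), ("W21x93", 27.3)]

def decode(u):
    """u: array of values in [0, 1), one per element group."""
    idx = np.minimum((np.asarray(u) * len(W_SHAPES)).astype(int),
                     len(W_SHAPES) - 1)
    return [W_SHAPES[i] for i in idx]

# Eight element groups, as in the frame of Khot et al. [58].
u = np.random.default_rng(1).random(8)
print(decode(u))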
There are many other benchmark problem sets in engineering optimization, and there is no agreed-upon guideline for their use. Interested readers can find more information about additional benchmarks in recent books and review articles [61, 62].
References
1. Iyengar, N.G.R.: Optimization in Structural design. DIRECTIONS, IIT Kanpur 6,
41–47 (2004)
2. Goldberg, D.E.: Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, Reading (1989)
3. McCulloch, W.S., Pitts, W.: A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics 5, 115–133 (1943)
4. Eberhart, R.C., Kennedy, J.: A new optimizer using particle swarm theory. In: Pro-
ceedings of the sixth international symposium on micro machine and human science,
Nagoya, Japan (1995)
5. Yang, X.S.: Nature-Inspired Metaheuristic Algorithms. Luniver Press (2008)
6. Yang, X.S., Deb, S.: Cuckoo search via Lévy flights. In: World Congress on Nature &
Biologically Inspired Computing (NaBIC 2009), pp. 210–214. IEEE publication, Los
Alamitos (2009)
7. Stolpe, M.: Global optimization of minimum weight truss topology problem with
stress, displacement, and local buckling constraints using branch-and-bound. Interna-
tional Journal For Numerical Methods In Engineering 61, 1270–1309 (2004)
8. Lamberti, L., Pappalettere, C.: Move limits definition in structural optimization with
sequential linear programming. Part II: Numerical examples. Computers and Structures 81, 215–238 (2003)
9. Kaveh, A., Talatahari, S.: Size optimization of space trusses using Big Bang–Big
Crunch algorithm. Computers and Structures 87, 1129–1140 (2009)
10. Kaveh, A., Talatahari, S.: Optimal design of skeletal structures via the charged system
search algorithm. Struct. Multidisc. Optim. 41, 893–911 (2010)
11. Gandomi, A.H., Yang, X.S., Talatahari, S.: Optimum Design of Steel Trusses using
Cuckoo Search Algorithm (submitted for publication)
12. Floudas, C.A., Pardalos, P.M.: Encyclopedia of Optimization, 2nd edn. Springer,
Heidelberg (2009)
13. Maglaras, G., Ponslet, E., Haftka, R.T., Nikolaidis, E., Sensharma, P., Cudney, H.H.:
Analytical and experimental comparison of probabilistic and deterministic optimiza-
tion (structural design of truss structure). AIAA Journal 34(7), 1512–1518 (1996)
14. Hasançebi, O., Çarbas, S., Dogan, E., Erdal, F., Saka, M.P.: Performance evaluation of
metaheuristic search techniques in the optimum design of real size pin jointed struc-
tures. Computers and Structures 87, 284–302 (2009)
15. Lee, K.S., Geem, Z.W.: A new structural optimization method based on the harmony
search algorithm. Comput. Struct. 82, 781–798 (2004)
16. Iranmanesh, A., Kaveh, A.: Structural optimization by gradient-based neural networks.
Int. J. Numer. Meth. Engng. 46, 297–311 (1999)
17. Kirsch, U.: Optimum Structural Design, Concepts, Methods and Applications.
McGraw-Hill, New York (1981)
18. Lemonge, A.C.C., Barbosa, H.J.C.: An adaptive penalty scheme for genetic algorithms
in structural optimization. Int. J. Numer. Meth. Engng. 59, 703–736 (2004)
19. Rao, S.S.: Engineering Optimization, 3rd edn. Wiley, New York (1996)
20. Patnaik, S.N., Gendy, A.S., Berke, L., Hopkins, D.A.: Modified fully utilized design (MFUD) method for stress and displacement constraints. Int. J. Numer. Meth. Engng. 41,
1171–1194 (1998)
21. Kaveh, A., Rahami, H.: Analysis, design and optimization of structures using force
method and genetic algorithm. Int. J. Numer. Meth. Engng. 65, 1570–1584 (2006)
22. Sonmez, M.: Discrete optimum design of truss structures using artificial bee colony al-
gorithm. Struct. Multidisc. Optim. 43, 85–97 (2011)
23. Sedaghati, R.: Benchmark case studies in structural design optimization using the force
method. International Journal of Solids and Structures 42, 5848–5871 (2005)
24. AISC: Manual of Steel Construction – Allowable Stress Design, 9th edn. American Institute of Steel Construction, Chicago (1989)
25. Dede, T., Bekiroğlu, S., Ayvaz, Y.: Weight minimization of trusses with genetic
algorithm. Appl. Soft Comput. (2011), doi:10.1016/j.asoc.2010.10.006
26. Genovese, K., Lamberti, L., Pappalettere, C.: Improved global–local simulated anneal-
ing formulation for solving non-smooth engineering optimization problems. Interna-
tional Journal of Solids and Structures 42, 203–237 (2005)
27. Korycki, J., Kostreva, M.: Norm-relaxed method of feasible directions: application in
structural optimization. Structural Optimization 11, 187–194 (1996)
28. Lamberti, L.: An efficient simulated annealing algorithm for design optimization of
truss structures. Computers and Structures 86, 1936–1953 (2008)
29. Hasancebi, O., Erbatur, F.: On efficient use of simulated annealing in complex struc-
tural optimization problems. Acta. Mech. 157, 27–50 (2002)
30. Hasancebi, O.: Adaptive evolution strategies in structural optimization: Enhancing
their computational performance with applications to large-scale structures. Comput.
Struct. 86, 119–132 (2008)
31. Adeli, H., Cheng, N.T.: Concurrent genetic algorithms for optimization of large struc-
tures. J. Aerospace Eng. 7, 276–296 (1994)
32. Zhang, M., Luo, W., Wang, X.: Differential evolution with dynamic stochastic selec-
tion for constrained optimization. Information Sciences 178, 3043–3074 (2008)
33. Coello, C.A.C.: Constraint-handling using an evolutionary multiobjective optimization
technique. Civil Engineering and Environmental Systems 17, 319–346 (2000)
34. Deb, K.: An efficient constraint handling method for genetic algorithms. Computer
Methods in Applied Mechanics and Engineering 186, 311–338 (2000)
35. Deb, K., Goyal, M.: A combined genetic adaptive search (GeneAS) for engineering
design. Comput. Sci. Informatics 26(4), 30–45 (1996)
36. Coello, C.A.C., Hernandez, F.S., Farrera, F.A.: Optimal Design of Reinforced Con-
crete Beams Using Genetic Algorithms. Expert systems with Applications 12(1),
101–108 (1997)
37. Saini, B., Sehgal, V.K., Gambhir, M.L.: Genetically Optimized Artificial Neural Net-
work Based Optimum Design Of Singly And Doubly Reinforced Concrete Beams.
Asian Journal of Civil Engineering (Building And Housing) 7(6), 603–619 (2006)
38. Amir, H.M., Hasegawa, T.: Nonlinear mixed-discrete structural optimization. J. Struct.
Engng. 115(3), 626–645 (1989)
39. ACI 318-77, Building code requirements for reinforced concrete. American Concrete
Institute, Detroit, Mich (1977)
40. Liebman, J.S., Khachaturian, N., Chanaratna, V.: Discrete structural optimization. J.
Struct. Div. 107(ST11), 2177–2197 (1981)
41. Gandomi, A.H., Yang, X.S., Alavi, A.H.: Mixed Discrete Structural Optimization
Using Firefly Algorithm (2010) (submitted for publication)
42. Sandgren, E.: Nonlinear Integer and Discrete Programming in Mechanical Design
Optimization. J. Mech. Design 112(2), 223–229 (1990)
43. Arora, J.S.: Introduction to Optimum Design. McGraw-Hill, New York (1989)
44. Aragon, V.S., Esquivel, S.C., Coello, C.A.C.: A modified version of a T-Cell Algorithm
for constrained optimization problems. Int. J. Numer. Meth. Engng. 84, 351–378 (2010)
45. Coello, C.A.C.: Self-adaptive penalties for GA based optimization. In: Proceedings of
the Congress on Evolutionary Computation, vol. 1, pp. 573–580 (1999)
46. Gandomi, A.H., Yang, X.S., Alavi, A.H.: Bat Algorithm for Solving Nonlinear Con-
strained Engineering Optimization Tasks (2010) (submitted for publication)
47. Gandomi, A.H., Yang, X.S., Alavi, A.H.: Cuckoo Search Algorithm: A Metaheuristic
Approach to Solve Structural Optimization Problems (2010) (submitted for publica-
tion)
48. Golinski, J.: An adaptive optimization system applied to machine synthesis. Mech. Mach. Theory (1973)
49. Akhtar, S., Tai, K., Ray, T.: A socio-behavioural simulation model for engineering de-
sign optimization. Eng. Optim. 34(4), 341–354 (2002)
50. Kuang, J.K., Rao, S.S., Chen, L.: Taguchi-aided search method for design optimization
of engineering systems. Eng. Optim. 30, 1–23 (1998)
51. Yang, X.S., Gandomi, A.H.: Bat Algorithm: A Novel Approach for Global Engineer-
ing Optimization (2011) (submitted for publication)
52. Thanedar, P.B., Vanderplaats, G.N.: Survey of discrete variable optimization for struc-
tural design. Journal of Structural Engineering 121(2), 301–305 (1995)
53. Huang, M.W., Arora, J.S.: Optimal Design With Discrete Variables: Some Numerical
Experiments. International Journal for Numerical Methods in Engineering 40, 165–188
(1997)
54. Vanderplaats, G.N.: Very Large Scale Optimization. NASA/CR-2002-211768 (2002)
55. Degertekin, S.O.: Optimum design of steel frames using harmony search algorithm.
Struct. Multidisc. Optim. 36, 393–401 (2008)
56. Greiner, D., Emperador, J.M., Winter, G.: Single and multiobjective frame optimiza-
tion by evolutionary algorithms and the auto-adaptive rebirth operator. Computer
Methods in Applied Mechanics and Engineering 193(33-35), 3711–3743 (2004)
57. Hasançebi, O., Çarbas, S., Dogan, E., Erdal, F., Saka, M.P.: Comparison of non-
deterministic search techniques in the optimum design of real size steel frames.
Computers and Structures 88, 1033–1048 (2010)
58. Khot, N.S., Venkayya, V.B., Berke, L.: Optimum structural design with stability constraints. Int. J. Numer. Methods Eng. 10, 1097–1114 (1976)
59. Kaveh, A., Shojaee, S.: Optimal design of skeletal structures using ant colony optimisation. Int. J. Numer. Methods Eng. 70(5), 563–581 (2007)
60. Camp, C.V., Pezeshk, S., Cao, G.: Optimized design of two dimensional structures us-
ing a genetic algorithm. J. Struct. Eng. ASCE 124(5), 551–559 (1998)
61. Alimoradi, A., Foley, C.M., Pezeshk, S.: Benchmark problems in structural design and
performance optimization: past, present and future – part I. In: Senapathi, S.,
Hoit, C.K. (eds.) 19th ASCE Conf. Proc., State of the Art and Future Challenges in
Structure. ASCE Publications (2010)
62. Yang, X.S.: Engineering Optimization: An Introduction with Metaheuristic Applica-
tions. John Wiley and Sons, Chichester (2010)