Nature-Inspired Metaheuristic Algorithms
Second Edition
Xin-She Yang
Luniver Press
Published in 2010 by Luniver Press
Frome, BA11 6TT, United Kingdom
www.luniver.com
All rights reserved. This book, or parts thereof, may not be reproduced in
any form or by any means, electronic or mechanical, including photocopy-
ing, recording or by any information storage and retrieval system, without
permission in writing from the copyright holder.
ISBN-13: 978-1-905986-28-6
ISBN-10: 1-905986-28-9
While every attempt is made to ensure that the information in this publi-
cation is correct, no liability can be accepted by the authors or publishers
for loss, damage or injury caused by any errors in, or omission from, the
information given.
CONTENTS
1 Introduction
1.1 Optimization
1.2 Search for Optimality
1.3 Nature-Inspired Metaheuristics
1.4 A Brief History of Metaheuristics
3 Simulated Annealing
3.1 Annealing and Boltzmann Distribution
3.2 Parameters
3.3 SA Algorithm
3.4 Unconstrained Optimization
3.5 Stochastic Tunneling
5 Genetic Algorithms
5.1 Introduction
5.2 Genetic Algorithms
5.3 Choice of Parameters
6 Differential Evolution
6.1 Introduction
6.2 Differential Evolution
6.3 Variants
6.4 Implementation
8 Swarm Optimization
8.1 Swarm Intelligence
8.2 PSO Algorithms
8.3 Accelerated PSO
8.4 Implementation
8.5 Convergence Analysis
9 Harmony Search
9.1 Harmonics and Frequencies
9.2 Harmony Search
9.3 Implementation
10 Firefly Algorithm
10.1 Behaviour of Fireflies
10.2 Firefly Algorithm
10.3 Light Intensity and Attractiveness
10.4 Scalings and Asymptotics
10.5 Implementation
10.6 FA Variants
10.7 Spring Design
11 Bat Algorithm
11.1 Echolocation of Bats
11.1.1 Behaviour of Microbats
11.1.2 Acoustics of Echolocation
11.2 Bat Algorithm
11.2.1 Movement of Virtual Bats
11.2.2 Loudness and Pulse Emission
11.3 Validation and Discussions
11.4 Implementation
11.5 Further Topics
References
Since the publication of the first edition of this book in 2008, significant developments have been made in metaheuristics, and new nature-inspired metaheuristic algorithms have emerged, including cuckoo search and the bat algorithm. Many readers have taken time to write to me personally, providing valuable feedback, asking for more details of algorithm implementation, or simply expressing interest in applying these new algorithms to their own problems.
In this revised edition, we strive to review the latest developments in metaheuristic algorithms, to incorporate readers' suggestions, and to provide a more detailed description of the algorithms. Firstly, we have added detailed descriptions of how to incorporate constraints in the actual implementation. Secondly, we have added three chapters on differential evolution, cuckoo search and bat algorithms, while some existing chapters, such as those on ant algorithms and bee algorithms, are combined into one because of their similarity. Thirdly, we have also explained artificial neural networks and support vector machines in the framework of optimization and metaheuristics. Finally, we have tried throughout this book to provide a consistent and unified approach to metaheuristic algorithms, from a brief history in the first chapter to the unified approach in the last chapter.
Furthermore, we have provided more Matlab programs. At the same time, we also omit some of the implementations, such as those for genetic algorithms, as we know that there are many good software packages (both commercial and open source). This allows us to focus more on the implementation of new algorithms. Some of the programs also have a version for constrained optimization, and readers can modify them for their own applications.
Even with the good intention to cover the most popular metaheuristic algorithms, the choice of algorithms is a difficult task, as we do not have the space to cover every algorithm. The omission of an algorithm does not mean that it is not popular. In fact, some algorithms are very powerful and routinely used in many applications. Good examples are Tabu search and combinatorial algorithms, and interested readers can refer to the references provided at the end of the book. The effort in writing this little book becomes worthwhile if this book can in some way encourage readers' interest in metaheuristics.
Xin-She Yang
August 2010
Xin-She Yang
Cambridge, 2008
Chapter 1
INTRODUCTION
1.1 OPTIMIZATION
It is worth pointing out that we can also write the inequalities the other way round (≥ 0), and we can also formulate the objectives as a maximization problem.
In a rare but extreme case where there is no objective at all, there are
only constraints. Such a problem is called a feasibility problem because
any feasible solution is an optimal solution.
If we try to classify optimization problems according to the number
of objectives, then there are two categories: single objective M = 1 and
multiobjective M > 1. Multiobjective optimization is also referred to as
multicriteria or even multi-attribute optimization in the literature. In
real-world problems, most optimization tasks are multiobjective. Though
the algorithms we will discuss in this book are equally applicable to mul-
tiobjective optimization with some modifications, we will mainly place the
emphasis on single objective optimization problems.
Similarly, we can also classify optimization in terms of number of con-
straints J + K. If there is no constraint at all J = K = 0, then it is
called an unconstrained optimization problem. If K = 0 and J ≥ 1, it is
called an equality-constrained problem, while J = 0 and K ≥ 1 becomes
an inequality-constrained problem. It is worth pointing out that in some
formulations in the optimization literature, equalities are not explicitly in-
cluded, and only inequalities are included. This is because an equality
can be written as two inequalities. For example h(x) = 0 is equivalent to
h(x) ≤ 0 and h(x) ≥ 0.
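For reference, the general problem implied by this notation can be written compactly as follows (a standard restatement using the same symbols fi, hj, gk and M, J, K introduced above):

minimize_{x ∈ ℝⁿ} f_i(x), (i = 1, 2, ..., M),
subject to h_j(x) = 0, (j = 1, 2, ..., J),
           g_k(x) ≤ 0, (k = 1, 2, ..., K),

where x = (x1, x2, ..., xn)ᵀ is the vector of design variables.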
We can also use the actual function forms for classification. The objective functions can be either linear or nonlinear. If the constraints hj and gk are all linear, then it becomes a linearly constrained problem. If both the constraints and the objective functions are all linear, it becomes a linear programming problem. Here 'programming' has nothing to do with computer programming; it means planning and/or optimization. However, generally speaking, all fi, hj and gk can be nonlinear, and then we have to deal with a nonlinear optimization problem.
Among the quality solutions found, it is expected that some of them are nearly optimal, though there is no guarantee of such optimality.
Two major components of any metaheuristic algorithm are intensification and diversification, or exploitation and exploration. Diversification means to generate diverse solutions so as to explore the search space on the global scale, while intensification means to focus the search in a local region by exploiting the information that a current good solution is found in this region. This is combined with the selection of the best solutions. The selection of the best ensures that the solutions will converge to optimality, while diversification via randomization prevents the solutions from being trapped at local optima and, at the same time, increases the diversity of the solutions. A good combination of these two major components will usually ensure that global optimality is achievable.
Metaheuristic algorithms can be classified in many ways. One way is
to classify them as: population-based and trajectory-based. For example,
genetic algorithms are population-based as they use a set of strings, so
is the particle swarm optimization (PSO) which uses multiple agents or
particles.
On the other hand, simulated annealing uses a single agent or solution
which moves through the design space or search space in a piecewise style.
A better move or solution is always accepted, while a not-so-good move
can be accepted with a certain probability. The steps or moves trace a tra-
jectory in the search space, with a non-zero probability that this trajectory
can reach the global optimum.
Before we introduce all popular metaheuristic algorithms in detail, let
us look at their history briefly.
There is some strong evidence that PSO is better than traditional search algorithms and even better than genetic algorithms for many types of problems, though this is far from conclusive.
In around 1996 and later in 1997, R. Storn and K. Price developed their vector-based evolutionary algorithm, called differential evolution (DE), and this algorithm has proved more efficient than genetic algorithms in many applications.
In 1997, the publication of the 'no free lunch theorems for optimization' by D. H. Wolpert and W. G. Macready sent a shock wave through the optimization community. Researchers had always been trying to find better algorithms, or even universally robust algorithms, for optimization, especially for tough NP-hard optimization problems. However, these theorems state that if algorithm A performs better than algorithm B for some optimization functions, then B will outperform A for other functions. That is to say, when averaged over the space of all possible functions, both algorithms A and B perform equally well. In other words, no universally better algorithm exists. That is disappointing, right? Then, people realized that for a given optimization problem we do not need to average over all possible functions. What we want is to find the best solutions for the problem at hand, which has nothing to do with the average over all possible function space. In addition, we can accept the fact that there is no universal or magical tool, but we do know from experience that some algorithms indeed outperform others for given types of optimization problems. So research now focuses on finding the best and most efficient algorithm(s) for a given problem. The objective is to design better algorithms for most types of problems, not for all problems. Therefore, the search is still on.
At the turn of the 21st century, things became even more exciting. First, Zong Woo Geem et al. in 2001 developed the harmony search (HS) algorithm, which has been widely applied in solving various optimization problems such as water distribution, transport modelling and scheduling. In 2004, S. Nakrani and C. Tovey proposed the honey bee algorithm and its application for optimizing Internet hosting centers, which was followed by the development of a novel bee algorithm by D. T. Pham et al. in 2005 and the artificial bee colony (ABC) by D. Karaboga in 2005. In 2008, the author of this book developed the firefly algorithm (FA)1. Quite a few research articles on the firefly algorithm then followed, and this algorithm has attracted a wide range of interest. In 2009, Xin-She Yang at Cambridge University, UK, and Suash Deb at Raman College of Engineering, India, introduced an efficient cuckoo search (CS) algorithm, and it has been demonstrated that CS is far more effective than most existing metaheuristic algorithms.
2 Novel cuckoo search 'beats' particle swarm optimization, Science Daily, news article (28 May 2010), www.sciencedaily.com
which means that the next state S_N will only depend on the current state S_{N−1} and the motion or transition X_N from the existing state to the next state. This is typically the main property of a Markov chain, to be introduced later.
Here the step size or length in a random walk can be fixed or varying.
Random walks have many applications in physics, economics, statistics,
computer sciences, environmental science and engineering.
Consider a scenario: a drunkard walks on a street, and at each step he can randomly go forward or backward. This forms a random walk in one dimension. If this drunkard walks on a football pitch, he can walk in any direction randomly, and this becomes a 2D random walk. Mathematically speaking, a random walk is given by the following equation

S_{t+1} = S_t + w_t, (2.8)
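As a simple illustration of equation (2.8), the following Matlab/Octave sketch simulates a one-dimensional and a two-dimensional random walk with unit steps (the number of steps and the plotting are arbitrary illustrative choices, not part of the book's programs):

N=1000;                              % number of steps
steps=2*(rand(N,1)>0.5)-1;           % +1 or -1 with equal probability
S1=cumsum(steps);                    % 1-D walk: S_{t+1}=S_t+w_t
theta=2*pi*rand(N,1);                % random directions for the 2-D walk
S2=cumsum([cos(theta) sin(theta)]);  % 2-D walk with unit step length
plot(S2(:,1),S2(:,2));               % display the 2-D path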
Broadly speaking, Lévy flights are a random walk whose step length is
drawn from the Lévy distribution, often in terms of a simple power-law
formula L(s) ∼ |s|−1−β where 0 < β ≤ 2 is an index. Mathematically
speaking, a simple version of Lévy distribution can be defined as
L(s, γ, µ) = √(γ/(2π)) exp[−γ/(2(s − µ))] · 1/(s − µ)^{3/2} for 0 < µ < s < ∞, and L(s, γ, µ) = 0 otherwise. (2.11)
p(x, γ, µ) = (1/π) · γ/[γ² + (x − µ)²], (2.16)

where µ is the location parameter, while γ controls the scale of this distribution.
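Samples from the Cauchy distribution (2.16) can be drawn by the standard inverse-transform method, since its cumulative distribution F(x) = 1/2 + (1/π) tan⁻¹[(x − µ)/γ] can be inverted in closed form. A short illustrative Matlab/Octave sketch (the parameter values are arbitrary):

mu=0; gam=1;                    % location and scale parameters (illustrative)
u=rand(10000,1);                % uniform random numbers in (0,1)
x=mu+gam*tan(pi*(u-0.5));       % inverse of the Cauchy cumulative distribution
hist(x(abs(x)<20),50);          % histogram (extreme tails cut off for display)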
For the general case, the inverse integral

L(s) = (1/π) ∫₀^∞ cos(ks) exp[−α|k|^β] dk, (2.17)

has the asymptotic behaviour for large s

L(s) → α β Γ(β) sin(πβ/2) / (π|s|^{1+β}),  s → ∞. (2.18)
Studies show that Lévy flights can maximize the efficiency of resource
searches in uncertain environments. In fact, Lévy flights have been observed
among foraging patterns of albatrosses and fruit flies, and spider monkeys.
Even humans such as the Ju/’hoansi hunter-gatherers can trace paths of
Lévy-flight patterns. In addition, Lévy flights have many applications.
Many physical phenomena such as the diffusion of fluorescent molecules,
cooling behavior and noise could show Lévy-flight characteristics under the
right conditions.
S_{t+1} = S_t + w_t, (2.25)
chain with known transition probability. Since the 1990s, Markov chain Monte Carlo (MCMC) has become a powerful tool for Bayesian statistical analysis, Monte Carlo simulations, and potentially optimization with high nonlinearity.
An important link between MCMC and optimization is that some heuristic and metaheuristic search algorithms such as simulated annealing (to be introduced later) use a trajectory-based approach. They start with some initial (random) state, and propose a new state (solution) randomly. Then, the move is accepted or not, depending on some probability. This is strongly similar to a Markov chain. In fact, the standard simulated annealing is a random walk.
Mathematically speaking, a great leap in understanding metaheuristic algorithms is to view a Markov chain Monte Carlo as an optimization procedure. If we want to find the minimum of an objective function f(θ) at θ = θ∗ so that f∗ = f(θ∗) ≤ f(θ), we can convert it to a target distribution for a Markov chain

π(θ) = e^{−βf(θ)}, (2.26)

where β > 0 is a parameter which acts as a normalizing factor. The value of β should be chosen so that the probability is close to 1 when θ → θ∗. At θ = θ∗, π(θ) should reach a maximum π∗ = π(θ∗) ≥ π(θ). This requires that the formulation of L(θ) should be non-negative, which means that some objective functions can be shifted by a large constant A > 0, that is f ← f + A, if necessary.
By constructing a Markov chain Monte Carlo, we can formulate a generic
framework as outlined by Ghate and Smith in 2008, as shown in Figure 2.3.
In this framework, simulated annealing and its many variants are simply a
special case with
P_t = exp[−∆f/T_t]   if f_{t+1} > f_t,
P_t = 1              if f_{t+1} ≤ f_t,
In this case, only the difference ∆f between the function values is impor-
tant.
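As an illustration of this framework, a minimal Matlab/Octave sketch of a random-walk minimizer using the acceptance probability Pt above might look as follows (the objective, step size and temperature schedule are arbitrary illustrative choices, not the book's own program):

f=@(x) x.^2+4*sin(5*x);         % an arbitrary one-dimensional test objective
x=2; fx=f(x); T=1.0;            % initial state and initial temperature
for t=1:1000
    xnew=x+0.1*randn;           % propose a move by a small random walk
    fnew=f(xnew);
    if fnew<=fx || rand<exp(-(fnew-fx)/T)   % accept with probability P_t
        x=xnew; fx=fnew;
    end
    T=0.99*T;                   % gradually reduce the 'temperature'
end
disp([x fx])                    % final state and its objective value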
Algorithms such as simulated annealing, to be discussed in the next chapter, use a single Markov chain, which may not be very efficient. In practice, it is usually advantageous to use multiple Markov chains in parallel to increase the overall efficiency. In fact, algorithms such as particle swarm optimization can be viewed as multiple interacting Markov chains, though such theoretical analysis remains almost intractable. The theory of interacting Markov chains is complicated and still under development; however, any progress in this area will play a central role in understanding how population- and trajectory-based metaheuristic algorithms perform under various conditions. Even though we do not fully understand why metaheuristic algorithms work, this does not hinder us from using these algorithms efficiently. On the contrary, such mysteries can drive and motivate us to pursue further research and development in metaheuristics.
Chapter 3
SIMULATED ANNEALING
One of the earliest and yet most popular metaheuristic algorithms is simu-
lated annealing (SA), which is a trajectory-based, random search technique
for global optimization. It mimics the annealing process in material pro-
cessing when a metal cools and freezes into a crystalline state with the
minimum energy and larger crystal size so as to reduce the defects in
metallic structures. The annealing process involves the careful control of
temperature and its cooling rate, often called annealing schedule.
The transition probability p of accepting such a change is given by

p = exp[−∆E/(k_B T)], (3.1)
where k_B is Boltzmann's constant, and for simplicity we can use k to denote k_B because k = 1 is often used. T is the temperature for controlling the annealing process, and ∆E is the change of the energy level. This transition probability is based on the Boltzmann distribution in statistical mechanics.
The simplest way to link ∆E with the change of the objective function ∆f is to use

∆E = γ∆f, (3.2)

where γ is a real constant. For simplicity without losing generality, we can use k_B = 1 and γ = 1. Thus, the probability p simply becomes

p = exp[−∆f/T].
3.2 PARAMETERS
Here the choice of the right initial temperature is crucially important. For
a given change ∆f , if T is too high (T → ∞), then p → 1, which means
almost all the changes will be accepted. If T is too low (T → 0), then any
∆f > 0 (worse solution) will rarely be accepted as p → 0 and thus the
diversity of the solution is limited, but any improvement ∆f will almost
always be accepted. In fact, the special case T → 0 corresponds to the
gradient-based method because only better solutions are accepted, and the
system is essentially climbing up or descending along a hill. Therefore,
if T is too high, the system is at a high energy state on the topological
landscape, and the minima are not easily reached. If T is too low, the
system may be trapped in a local minimum (not necessarily the global
minimum), and there is not enough energy for the system to jump out the
local minimum to explore other minima including the global minimum. So
a proper initial temperature should be calculated.
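Since p = exp(−∆f/T), a rough initial temperature can be estimated from a desired initial acceptance probability p0 for a typical uphill change ∆f by solving for T, which gives T0 = −∆f/ln p0. For example (illustrative values):

df=0.5; p0=0.8;           % typical uphill change and desired acceptance rate
T0=-df/log(p0)            % gives T0 about 2.24 for these values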
Another important issue is how to control the annealing or cooling pro-
cess so that the system cools down gradually from a higher temperature
to ultimately freeze to a global minimum state. There are many ways of
controlling the cooling rate or the decrease of the temperature.
Two commonly used annealing schedules (or cooling schedules) are: lin-
ear and geometric. For a linear cooling schedule, we have
T = T0 − βt, (3.5)
or T → T − δT , where T0 is the initial temperature, and t is the pseudo
time for iterations. β is the cooling rate, and it should be chosen in such a
way that T → 0 when t → tf (or the maximum number N of iterations); this usually gives β = (T0 − Tf)/tf.
On the other hand, a geometric cooling schedule essentially decreases
the temperature by a cooling factor 0 < α < 1 so that T is replaced by αT
or
T (t) = T0 αt , t = 1, 2, ..., tf . (3.6)
The advantage of the second method is that T → 0 when t → ∞, and thus
there is no need to specify the maximum number of iterations. For this
reason, we will use this geometric cooling schedule. The cooling process
should be slow enough to allow the system to stabilize easily. In practice, α = 0.7 ∼ 0.99 is commonly used.
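The two schedules can be compared in a few lines of Matlab/Octave (the values of T0, Tf, tf and α are arbitrary illustrations):

T0=10; Tf=1e-5; tf=100;           % initial/final temperatures and iterations
beta=(T0-Tf)/tf;                  % linear cooling rate from eq. (3.5)
alpha=0.95;                       % geometric cooling factor
t=1:tf;
T_linear=T0-beta*t;               % eq. (3.5)
T_geometric=T0*alpha.^t;          % eq. (3.6), never reaches zero exactly
plot(t,T_linear,t,T_geometric);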
In addition, for a given temperature, multiple evaluations of the objective function are needed. If there are too few evaluations, there is a danger that the system will not stabilize and subsequently will not converge to its global optimality. If there are too many evaluations, it is time-consuming, and the system will usually converge too slowly, as the number of iterations to achieve stability might be exponential in the problem size.
Therefore, there is a fine balance between the number of evaluations and
solution quality. We can either do many evaluations at a few temperature
levels or do few evaluations at many temperature levels. There are two
major ways to set the number of iterations: fixed or varied. The first uses
a fixed number of iterations at each temperature, while the second intends
to increase the number of iterations at lower temperatures so that the local
minima can be fully explored.
3.3 SA ALGORITHM
high temperature (so that almost all changes are accepted) and reduce
the temperature quickly until about 50% or 60% of the worse moves are
accepted, and then use this temperature as the new initial temperature T0
for proper and relatively slow cooling.
For the final temperature, it should be zero in theory so that no worse
move can be accepted. However, if Tf → 0, more unnecessary evaluations
are needed. In practice, we simply choose a very small value, say, Tf =
10−10 ∼ 10−5 , depending on the required quality of the solutions and time
constraints.
we know that its global minimum f∗ = 0 occurs at (1, 1) (see Fig. 3.2). This
is a standard test function and quite tough for most algorithms. However,
by modifying the program given later in the next chapter, we can find this
Figure 3.2: Rosenbrock’s function with the global minimum f∗ = 0 at (1, 1).
Figure 3.3: 500 evaluations during the annealing iterations. The final global best
is marked with •.
global minimum easily and the last 500 evaluations during annealing are
shown in Fig. 3.3.
This banana function is still relatively simple as it has a curved nar-
row valley. We should validate SA against a wide range of test functions,
especially those that are strongly multimodal and highly nonlinear. It is
straightforward to extend the above program to deal with highly nonlinear
multimodal functions.
Chapter 4
HOW TO DEAL WITH CONSTRAINTS
and

∂L/∂λ_j = g_j = 0, (j = 1, ..., M). (4.5)

These M + n equations will determine the n components of x and the M Lagrange multipliers. As ∂L/∂g_j = λ_j, we can consider λ_j as the rate of the change of the quantity L(x, λ_j) as a functional of g_j.
Now let us look at a simple example: maximize f = u^{2/3} v^{1/3} subject to

3u + v = 9.

First, we write it as an unconstrained problem using a Lagrange multiplier λ, and we have

L = u^{2/3} v^{1/3} + λ(3u + v − 9).
The conditions to reach optimality are

∂L/∂u = (2/3) u^{−1/3} v^{1/3} + 3λ = 0,   ∂L/∂v = (1/3) u^{2/3} v^{−2/3} + λ = 0,

and

∂L/∂λ = 3u + v − 9 = 0.
The first two conditions give 2v = 3u, whose combination with the third condition leads to

u = 2, v = 3.

Thus, the maximum of f∗ is ∛12.
Here we only discussed the equality constraints. For inequality con-
straints, things become more complicated. We need the so-called Karush-
Kuhn-Tucker conditions.
Let us consider the following generic nonlinear optimization problem:

minimize_{x ∈ ℝⁿ} f(x),

µ0 ∇f(x∗) + Σ_{i=1}^{M} λi ∇φi(x∗) + Σ_{j=1}^{N} µj ∇ψj(x∗) = 0, (4.7)
and
ψj (x∗ ) ≤ 0, µj ψj (x∗ ) = 0, (j = 1, 2, ..., N ), (4.8)
where
µj ≥ 0, (j = 0, 1, ..., N ). (4.9)
The last non-negativity conditions hold for all µj , though there is no con-
straint on the sign of λi .
The constants satisfy the following condition:

Σ_{j=0}^{N} µj + Σ_{i=1}^{M} |λi| ≥ 0. (4.10)
As random walks are widely used for randomization and local search, a
proper step size is very important. In the generic equation
x^{t+1} = x^t + s ε_t, (4.14)

where ε_t is drawn from a standard normal distribution with zero mean and unit standard deviation. Here the step size s determines how far a random walker (e.g., an agent or particle in metaheuristics) can go for a fixed number of iterations.
If s is too large, then the new solution xt+1 generated will be too far
away from the old solution (or more often the current best). Then, such a
move is unlikely to be accepted. If s is too small, the change is too small
to be significant, and consequently such search is not efficient. So a proper
step size is important to maintain the search as efficient as possible.
From the theory of simple isotropic random walks, we know that the
average distance r traveled in the d-dimension space is
r2 = 2dDt, (4.15)
s² = τ r² / (t d). (4.16)
For a typical length scale L of a dimension of interest, the local search is
typically limited in a region of L/10. That is, r = L/10. As the iterations
are discrete, we can take τ = 1. Typically in metaheuristics, we can expect
that the number of generations is usually t = 100 to 1000, which means
that
s ≈ r/√(t d) = (L/10)/√(t d). (4.17)
For d = 1 and t = 100, we have s = 0.01L, while s = 0.001L for d = 10
and t = 1000. As step sizes could differ from variable to variable, a step
size ratio s/L is more generic. Therefore, we can use s/L = 0.001 to 0.01
for most problems. We will use this step size factor in our implementation,
to be discussed later in the last section of this chapter.
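As a quick numerical check of (4.17), the step size can be computed directly; the values of d and t below follow the examples in the text:

L=1;                                  % typical length scale of a variable
d=1;  t=100;  s1=(L/10)/sqrt(t*d)     % gives s1 = 0.01*L
d=10; t=1000; s2=(L/10)/sqrt(t*d)     % gives s2 = 0.001*L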
The welded beam design problem is a standard test problem for constrained
design optimization, which was described in detail in the literature (Rags-
dell and Phillips 1976, Cagnina et al 2008). The problem has four design
variables: the width w and length L of the welded area, the depth d and
thickness h of the beam. The objective is to minimize the overall fabri-
cation cost, under the appropriate constraints of shear stress τ , bending
stress σ, buckling load P and end deflection δ.
The problem can be written as

minimize f(x) = 1.10471 w² L + 0.04811 d h (14 + L),

subject to
g1 (x) = τ (x) − 13, 600 ≤ 0
g2 (x) = σ(x) − 30, 000 ≤ 0
g3 (x) = w − h ≤ 0
g4 (x) = 0.10471w2 + 0.04811hd(14 + L) − 5.0 ≤ 0 (4.19)
g5 (x) = 0.125 − w ≤ 0
g6 (x) = δ(x) − 0.25 ≤ 0
g7 (x) = 6000 − P (x) ≤ 0,
where

σ(x) = 504000/(h d²),   δ(x) = 65856/(30000 h d³),   Q = 6000 (14 + L/2),

D = (1/2) √(L² + (w + d)²),   J = √2 w L [L²/6 + (w + d)²/2],   β = Q D/J,

α = 6000/(√2 w L),   τ(x) = √(α² + α β L/D + β²),

P = 0.61423 × 10⁶ (d h³/6) (1 − d √(30/48)/28). (4.20)
The simple limits or bounds are 0.1 ≤ L, d ≤ 10 and 0.1 ≤ w, h ≤ 2.0.
If we use the simulated annealing algorithm to solve this problem (see
next section), we can get the optimal solution which is about the same
solution obtained by Cagnina et al (2008)
It is worth pointing out that you have to run the programs a few times using values such as α = 0.95 (default) and α = 0.99 to see how the results vary. In addition, as SA is a stochastic optimization algorithm, we cannot expect the results to be the same. In fact, they will be slightly different every time we run the program. Therefore, we should understand and interpret the results using statistical measures such as the mean and standard deviation.
4.5 SA IMPLEMENTATION
We just formulated the welded beam design problem using different notations from some of the literature. Here we try to illustrate a point.
As the input to a function is a vector (either a column vector or, less often, a row vector), we have to write

x = [w L d h] = [x(1) x(2) x(3) x(4)]. (4.22)
function [bestsol,fval,N]=sa_mincon(alpha)
% Default cooling factor
if nargin<1,
alpha=0.95;
end
% Display usage
disp('sa_mincon or [Best,fmin,N]=sa_mincon(0.9)');
if length(Lb) ~=length(Ub),
disp('Simple bounds/limits are improper!');
return
end
else
init_flag=0;
ns=newsolution(best,Lb,Ub,init_flag);
end
totaleval=totaleval+1;
E_new = Fun(ns);
% Decide to accept the new solution
DeltaE=E_new-E_old;
% Accept if improved
if (DeltaE <0)
best = ns; E_old = E_new;
accept=accept+1; j = 0;
end
% Accept with a probability if not improved
if (DeltaE>=0 & exp(-DeltaE/(k*T))>rand );
best = ns; E_old = E_new;
accept=accept+1;
else
j=j+1;
end
% Update the estimated optimal solution
f_opt=E_old;
end
bestsol=best;
fval=f_opt;
N=totaleval;
%% New solutions
function s=newsolution(u0,Lb,Ub,init_flag)
% Either search around
if length(Lb)>0 & init_flag==1,
s=Lb+(Ub-Lb).*rand(size(u0));
else
% Or local search by random walk
stepsize=0.01;
s=u0+stepsize*(Ub-Lb).*randn(size(u0));
end
s=bounds(s,Lb,Ub);
%% Cooling
function T=cooling(alpha,T)
T=alpha*T;
38 CHAPTER 4. HOW TO DEAL WITH CONSTRAINTS
function ns=bounds(ns,Lb,Ub)
if length(Lb)>0,
% Apply the lower bound
ns_tmp=ns;
I=ns_tmp<Lb;
ns_tmp(I)=Lb(I);
% Apply the upper bounds
J=ns_tmp>Ub;
ns_tmp(J)=Ub(J);
% Update this new move
ns=ns_tmp;
else
ns=ns;
end
% d-dimensional objective function with penalty
function z=Fun(u)
% Objective
z=fobj(u);
% Apply the nonlinear constraints by the penalty method
z=z+getnonlinear(u);

function Z=getnonlinear(u)
Z=0;
% Penalty constant
lam=10^15; lameq=10^15;
[g,geq]=constraints(u);
% Inequality constraints
for k=1:length(g),
Z=Z+ lam*g(k)^2*getH(g(k));
end
% Index function H(g) for the inequality constraints
function H=getH(g)
if g<=0,
H=0;
else
H=1;
end
% Objective functions
function z=fobj(u)
% Welded beam design optimization
z=1.10471*u(1)^2*u(2)+0.04811*u(3)*u(4)*(14.0+u(2));
% All constraints
function [g,geq]=constraints(x)
% Inequality constraints
Q=6000*(14+x(2)/2);
D=sqrt(x(2)^2/4+(x(1)+x(3))^2/4);
J=2*(x(1)*x(2)*sqrt(2)*(x(2)^2/12+(x(1)+x(3))^2/4));
alpha=6000/(sqrt(2)*x(1)*x(2));
beta=Q*D/J;
tau=sqrt(alpha^2+2*alpha*beta*x(2)/(2*D)+beta^2);
sigma=504000/(x(4)*x(3)^2);
delta=65856000/(30*10^6*x(4)*x(3)^3);
tmpf=4.013*(30*10^6)/196;
P=tmpf*sqrt(x(3)^2*x(4)^6/36)*(1-x(3)*sqrt(30/48)/28);
g(1)=tau-13600;
g(2)=sigma-30000;
g(3)=x(1)-x(4);
g(4)=0.10471*x(1)^2+0.04811*x(3)*x(4)*(14+x(2))-5.0;
g(5)=0.125-x(1);
g(6)=delta-0.25;
g(7)=6000-P;
% Equality constraints
geq=[];
%% End of the program --------------------------------
To get the files of all the Matlab programs provided in this book, readers
can send an email (with the subject ‘Nature-Inspired Algorithms: Files’)
to [email protected] – A zip file will be provided
(via email) by the author.
REFERENCES
1. Cagnina L. C., Esquivel S. C., and Coello C. A., Solving engineering op-
timization problems with the simple constrained particle swarm optimizer,
Informatica, 32, 319-326 (2008)
2. Cerny V., A thermodynamical approach to the travelling salesman problem:
an efficient simulation algorithm, Journal of Optimization Theory and Ap-
plications, 45, 41-51 (1985).
3. Deb K., Optimisation for Engineering Design: Algorithms and Examples,
Prentice-Hall, New Delhi, (1995).
4. Gill P. E., Murray W., and Wright M. H., Practical optimization, Academic
Press Inc, (1981).
5. Hamacher K., Wenzel W., The scaling behaviour of stochastic minimization
algorithms in a perfect funnel landscape, Phys. Rev. E., 59, 938-941(1999).
6. Kirkpatrick S., Gelatt C. D., and Vecchi M. P., Optimization by simulated
annealing, Science, 220, No. 4598, 671-680 (1983).
7. Metropolis N., Rosenbluth A. W., Rosenbluth M. N., Teller A. H., and Teller
E., Equations of state calculations by fast computing machines, Journal of
Chemical Physics, 21, 1087-1092 (1953).
8. Ragsdell K. and Phillips D., Optimal design of a class of welded structures
using geometric programming, J. Eng. Ind., 98, 1021-1025 (1976).
9. Wenzel W. and Hamacher K., A stochastic tunneling approach for global
optimization, Phys. Rev. Lett., 82, 3003-3007 (1999).
10. Yang X. S., Biology-derived algorithms in engineering optimization (Chapter
32), in Handbook of Bioinspired Algorithms, edited by Olariu S. and Zomaya
A., Chapman & Hall / CRC, (2005).
11. Talbi E. G., Metaheuristics: From Design to Implementation, Wiley, (2009).
Chapter 8
SWARM OPTIMIZATION
Many algorithms such as ant colony algorithms and virtual ant algorithms
use the behaviour of the so-called swarm intelligence. Particle swarm opti-
mization may have some similarities with genetic algorithms and ant algo-
rithms, but it is much simpler because it does not use mutation/crossover
operators or pheromone. Instead, it uses real-number randomness and global communication among the swarm particles. In this sense, it is
also easier to implement as there is no encoding or decoding of the param-
eters into binary strings as those in genetic algorithms (which can also use
real-number strings).
This algorithm searches the space of an objective function by adjusting
the trajectories of individual agents, called particles, as the piecewise paths
formed by positional vectors in a quasi-stochastic manner. The movement
of a swarming particle consists of two major components: a stochastic com-
ponent and a deterministic component. Each particle is attracted toward
the position of the current global best g ∗ and its own best location x∗i in
history, while at the same time it has a tendency to move randomly.
When a particle finds a location that is better than any previously found location, it updates it as the new current best for particle i. There
is a current best for all n particles at any time t during iterations. The
aim is to find the global best among all the current best solutions until the
objective no longer improves or after a certain number of iterations. The
movement of particles is schematically represented in Fig. 8.1 where x∗i is
the current best for particle i, and g∗ ≈ min{f(x_i)} for (i = 1, 2, ..., n) is the current global best.

Figure 8.1: Schematic representation of the motion of a particle in PSO, moving towards the global best g∗ and the current best x∗_i for each particle i.
v_i^{t+1} = v_i^t + α ε1 ⊙ [g∗ − x_i^t] + β ε2 ⊙ [x∗_i − x_i^t], (8.1)

where ε1 and ε2 are two random vectors, with each entry taking values between 0 and 1. The Hadamard product of two matrices u ⊙ v is defined as the entrywise product, that is [u ⊙ v]_{ij} = u_{ij} v_{ij}. The parameters α and β are the learning parameters or acceleration constants, which can typically be taken as, say, α ≈ β ≈ 2.
The initial locations of all particles should distribute relatively uniformly
so that they can sample over most regions, which is especially important
for multimodal problems. The initial velocity of a particle can be taken as zero, that is, v_i^{t=0} = 0. The new position can then be updated by

x_i^{t+1} = x_i^t + v_i^{t+1}. (8.2)

Although v_i can take any values, it is usually bounded in some range [0, v_max].
There are many variants which extend the standard PSO algorithm, and
the most noticeable improvement is probably to use an inertia function θ(t) so that v_i^t is replaced by θ(t) v_i^t:

v_i^{t+1} = θ(t) v_i^t + α ε1 ⊙ [g∗ − x_i^t] + β ε2 ⊙ [x∗_i − x_i^t], (8.3)

where θ takes values between 0 and 1. In the simplest case, the inertia
function can be taken as a constant, typically θ ≈ 0.5 ∼ 0.9. This is
equivalent to introducing a virtual mass to stabilize the motion of the
particles, and thus the algorithm is expected to converge more quickly.
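As a minimal illustrative sketch (not the program given later in this chapter, which implements the accelerated version), one iteration of the updates (8.2) and (8.3) for a single particle can be coded as:

alpha=2; beta=2; theta=0.7;      % learning parameters and a constant inertia
d=2;                             % number of dimensions
x=rand(1,d); v=zeros(1,d);       % current position and velocity of particle i
xstar=x; gstar=[0.5 0.5];        % individual best and global best (illustrative)
e1=rand(1,d); e2=rand(1,d);      % random vectors epsilon_1 and epsilon_2
v=theta*v+alpha*e1.*(gstar-x)+beta*e2.*(xstar-x);   % eq. (8.3)
x=x+v;                           % eq. (8.2)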
The standard particle swarm optimization uses both the current global best g∗ and the individual best x∗_i. The reason for using the individual best is primarily to increase the diversity of the quality solutions; however, this diversity can be simulated using some randomness. Subsequently, there is no compelling reason for using the individual best, unless the optimization problem of interest is highly nonlinear and multimodal.
A simplified version which could accelerate the convergence of the al-
gorithm is to use the global best only. Thus, in the accelerated particle
swarm optimization, the velocity vector is generated by a simpler formula
v_i^{t+1} = v_i^t + α(ε − 1/2) + β(g∗ − x_i^t), (8.4)

where ε is a random variable with values from 0 to 1. Here the shift 1/2 is purely out of convenience. We can also use a standard normal distribution α ε_n, where ε_n is drawn from N(0, 1), to replace the second term. The update of the position is simply

x_i^{t+1} = x_i^t + v_i^{t+1}. (8.5)
In order to increase the convergence even further, we can also write the
update of the location in a single step
x_i^{t+1} = (1 − β) x_i^t + β g∗ + α ε_n. (8.6)
This simpler version will give the same order of convergence. The typical
values for this accelerated PSO are α ≈ 0.1 ∼ 0.4 and β ≈ 0.1 ∼ 0.7, though
α ≈ 0.2 and β ≈ 0.5 can be taken as the initial values for most unimodal
objective functions. It is worth pointing out that the parameters α and β
should in general be related to the scales of the independent variables xi
and the search domain.
A further improvement to the accelerated PSO is to reduce the random-
ness as iterations proceed. This means that we can use a monotonically
decreasing function such as
α = α0 e−γt , (8.7)
or
α = α0 γ t , (0 < γ < 1), (8.8)
where α0 ≈ 0.5 ∼ 1 is the initial value of the randomness parameter. Here t
is the number of iterations or time steps. 0 < γ < 1 is a control parameter.
For example, in our implementation, we will use
α = 0.7t , (8.9)
where t ∈ [0, 10]. Obviously, these parameters are fine-tuned to suit the
current optimization problems as a demonstration.
8.4 IMPLEMENTATION
For the two-dimensional Michalewicz function, the stationarity conditions ∂f/∂x = 0 and ∂f/∂y = 0 lead to

−(4m/π) x sin(x) cos(x²/π) − cos(x) sin(x²/π) = 0,

and

−(8m/π) y sin(y) cos(2y²/π) − cos(y) sin(2y²/π) = 0.
Figure 8.3: Michalewicz's function with a global minimum at about (2.20319, 1.57049).
function [best]=pso_simpledemo(n,Num_iterations)
% n=number of particles
% Num_iterations=total number of iterations
if nargin<2, Num_iterations=10; end
if nargin<1, n=20; end
% Michalewicz Function f*=-1.801 at [2.20319,1.57049]
% Splitting two parts to avoid long lines in printing
str1='-sin(x)*(sin(x^2/3.14159))^20';
str2='-sin(y)*(sin(2*y^2/3.14159))^20';
funstr=strcat(str1,str2);
% Converting to an inline function and vectorization
f=vectorize(inline(funstr));
% range=[xmin xmax ymin ymax];
range=[0 4 0 4];
% ----------------------------------------------------
% Setting the parameters: alpha, beta
% Random amplitude of roaming particles alpha=[0,1]
% alpha=gamma^t=0.7^t;
% Speed of convergence (0->1)=(slow->fast)
beta=0.5;
% ----------------------------------------------------
% Grid values of the objective function
% These values are used for visualization only
Ngrid=100;
dx=(range(2)-range(1))/Ngrid;
dy=(range(4)-range(3))/Ngrid;
xgrid=range(1):dx:range(2); ygrid=range(3):dy:range(4);
[x,y]=meshgrid(xgrid,ygrid);
z=f(x,y);
% Display the shape of the function to be optimized
figure(1);
surfc(x,y,z);
% ---------------------------------------------------
best=zeros(Num_iterations,3); % initialize history
% ----- Start Particle Swarm Optimization -----------
% generating the initial locations of n particles
[xn,yn]=init_pso(n,range);
% Display the paths of particles in a figure
% with a contour of the objective function
figure(2);
% Start iterations
for i=1:Num_iterations,
% Show the contour of the function
contour(x,y,z,15); hold on;
% Find the current best location (xo,yo)
zn=f(xn,yn);
zn_min=min(zn);
xo=min(xn(zn==zn_min));
yo=min(yn(zn==zn_min));
zo=min(zn(zn==zn_min));
% Trace the paths of all roaming particles
% Display these roaming particles
plot(xn,yn,'.',xo,yo,'*'); axis(range);
% The accelerated PSO with alpha=gamma^t
gamma=0.7; alpha=gamma.^i;
% Move all the particles to new locations
[xn,yn]=pso_move(xn,yn,xo,yo,alpha,beta,range);
drawnow;
% Use "hold on" to display paths of particles
hold off;
% History
best(i,1)=xo; best(i,2)=yo; best(i,3)=zo;
end %%%%% end of iterations
% ----- All subfunctions are listed here -----
% Initial locations of n particles
function [xn,yn]=init_pso(n,range)
xrange=range(2)-range(1); yrange=range(4)-range(3);
xn=rand(1,n)*xrange+range(1);
yn=rand(1,n)*yrange+range(3);
% Move all the particles toward (xo,yo)
function [xn,yn]=pso_move(xn,yn,xo,yo,a,b,range)
nn=size(yn,2); %a=alpha, b=beta
xn=xn.*(1-b)+xo.*b+a.*(rand(1,nn)-0.5);
yn=yn.*(1-b)+yo.*b+a.*(rand(1,nn)-0.5);
[xn,yn]=findrange(xn,yn,range);
% Make sure the particles are within the range
function [xn,yn]=findrange(xn,yn,range)
nn=length(yn);
for i=1:nn,
if xn(i)<=range(1), xn(i)=range(1); end
if xn(i)>=range(2), xn(i)=range(2); end
if yn(i)<=range(3), yn(i)=range(3); end
if yn(i)>=range(4), yn(i)=range(4); end
end
If we run the program, we will get the global optimum after about 200
evaluations of the objective function (for 20 particles and 10 iterations).
The results and the locations of the particles are shown in Fig. 8.4.
8.5 CONVERGENCE ANALYSIS
From the statistical point of view, each particle in PSO forms a Markov chain, though this Markov chain is biased towards the current best, as
the transition probability often leads to the acceptance of the move towards
the current global best. In addition, the multiple Markov chains are inter-
acting in terms of partly deterministic attraction movement. Therefore, the
v_i^{t+1} = v_i^t + γ(g∗ − x_i^t),   γ = α + β, (8.10)

and

x_i^{t+1} = x_i^t + v_i^{t+1}. (8.11)
Following the 1D dynamical system analysis of particle swarm optimization by Clerc and Kennedy (2002), we can replace g∗ by a parameter constant p so that we can see whether or not the particle of interest will converge towards p. Now we can write the above system as a simple dynamical system
or

Y_{t+1} = A Y_t,   A = [1  γ; −1  1−γ],   Y_t = [v_t; u_t]. (8.14)
The general solution of this dynamical system can be written as

Y_t = Y_0 exp[A t]. (8.15)

For a detailed analysis, please refer to Clerc and Kennedy (2002). Since p is linked with the global best, as the iterations continue, it can be expected that all particles will aggregate towards the global best.
Various studies show that PSO algorithms can outperform genetic algorithms and other conventional algorithms for solving many optimization problems. This is partially due to the fact that the broadcasting ability of the current best estimates gives better and quicker convergence towards the optimality. However, PSO algorithms are almost memoryless since they do not record the movement paths of each particle, and it is expected that PSO can be further improved using short-term memory in a similar fashion to Tabu search. Further development is under active research.
Chapter 9
HARMONY SEARCH
which means that the A4 note has a pitch number of 69. On this scale, an octave corresponds to a size of 12, and a semitone corresponds to a size of 1. Furthermore, the ratio of frequencies of two notes which are an octave apart is 2:1. Thus, the frequency of a note is doubled (halved) when it is raised (lowered) by an octave. For example, A2 has a frequency of 110 Hz, while A5 has a frequency of 880 Hz.
The measurement of harmony when different pitches occur simultaneously is, like any aesthetic quality, somewhat subjective. However, it is possible to use some standard estimation for harmony. The frequency ratio, pioneered by the ancient Greek mathematician Pythagoras, is a good way of making such estimations. For example, the octave with a ratio of 1:2 sounds pleasant when played together, as do notes with a ratio of 2:3 (see Fig. 9.1). However, it is unlikely for random notes such as those shown in Fig. 9.2 to produce a pleasant harmony.
Harmony search can be explained in more detail with the aid of the dis-
cussion of the improvisation process by a musician. When a musician is
improvising, he or she has three possible choices: (1) play any famous
piece of music (a series of pitches in harmony) exactly from his or her
memory; (2) play something similar to a known piece (thus adjusting the
pitch slightly); or (3) compose new or random notes. If we formalize these
three options for optimization, we have three corresponding components:
usage of harmony memory, pitch adjusting, and randomization.
The usage of harmony memory is important, as it is similar to choosing the best-fit individuals in genetic algorithms. This will ensure that the best harmonies will be carried over to the new harmony memory. In order to use this memory more effectively, we can assign a parameter raccept ∈ [0, 1], called the harmony memory accepting or considering rate. If this rate is too low, only a few best harmonies are selected, and the algorithm may converge too slowly.
If this rate is extremely high (near 1), almost all the harmonies in the harmony memory are used, and then other harmonies are not explored well, leading to potentially wrong solutions. Therefore, typically, raccept = 0.7 ∼ 0.95.
To adjust the pitch slightly in the second component, we have to use a
method such that it can adjust the frequency efficiently. In theory, the pitch
can be adjusted linearly or nonlinearly, but in practice, linear adjustment
is used. If x_old is the current solution (or pitch), then the new solution (pitch) x_new is generated by

x_new = x_old + b_p (2 rand − 1), (9.3)

where rand is a random number drawn from a uniform distribution [0, 1]. Here b_p is the bandwidth, which controls the local range of pitch adjustment. In fact, we can see that the pitch adjustment (9.3) is a random walk.
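In code, the pitch adjustment (9.3) is a single line; a small Matlab/Octave illustration with arbitrary values of the bandwidth and the current pitch:

bp=0.2;                     % bandwidth of pitch adjustment (illustrative)
xold=1.0;                   % current solution (pitch)
xnew=xold+bp*(2*rand-1)     % new pitch within [xold-bp, xold+bp]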
Harmony Search
Objective function f (x), x = (x1 , ..., xp )T
Generate initial harmonics (real number arrays)
Define pitch adjusting rate (rpa ) and pitch limits
Define harmony memory accepting rate (raccept )
while ( t <Max number of iterations )
Generate new harmonics by accepting best harmonics
Adjust pitch to get new harmonics (solutions)
if (rand> raccept ),
choose an existing harmonic randomly
else if (rand> rpa ),
adjust the pitch randomly within a bandwidth (9.3)
else
generate new harmonics via randomization (9.4)
end if
Accept the new harmonics (solutions) if better
end while
Find the current best estimates
9.3 IMPLEMENTATION
it has the global minimum fmin = 0 at (1, 1). The following Matlab/Octave program can be used to find its optimum.
HM(HSmaxNum, :) = x;
HMbest(HSmaxNum) = fbest;
end
solution=x; % Record the solution
end %% (end of harmony search)
The best estimate solution (1.005, 1.0605) is obtained after 25000 itera-
tions. On a modern desktop computer, it usually takes less than a minute.
The variations of these solutions are shown in Fig. 9.4.
We have used raccept = HMacceptRate = 0.95 and the pitch adjusting rate rpa = PArate = 0.7. From Fig. 9.4, we can see that the pitch adjustment is more intensive in local regions (two thin strips), which indicates that the harmony search can be more efficient than genetic algorithms. However, such comparisons for different types of problems are still an area of active research.
Harmony search is emerging as a powerful algorithm, and its relevant
literature is expanding. It is still an interesting area of active research.
Chapter 10
FIREFLY ALGORITHM
Firefly Algorithm
Objective function f (x), x = (x1 , ..., xd )T
Generate initial population of fireflies xi (i = 1, 2, ..., n)
Light intensity Ii at xi is determined by f (xi )
Define light absorption coefficient γ
while (t <MaxGeneration)
for i = 1 : n all n fireflies
for j = 1 : n all n fireflies (inner loop)
if (Ii < Ij ), Move firefly i towards j; end if
Vary attractiveness with distance r via exp[−γr]
Evaluate new solutions and update light intensity
end for j
end for i
Rank the fireflies and find the current global best g ∗
end while
Postprocess results and visualization
• All fireflies are unisex so that one firefly will be attracted to other
fireflies regardless of their sex;
In the firefly algorithm, there are two important issues: the variation of
light intensity and formulation of the attractiveness. For simplicity, we
can always assume that the attractiveness of a firefly is determined by its
brightness which in turn is associated with the encoded objective function.
In the simplest case for maximization problems, the brightness I of a firefly at a particular location x can be chosen as I(x) ∝ f(x). However, the attractiveness β is relative; it should be seen in the eyes of the beholder or judged by the other fireflies. Thus, it will vary with
the distance rij between firefly i and firefly j. In addition, light intensity
decreases with the distance from its source, and light is also absorbed in
the media, so we should allow the attractiveness to vary with the degree of
absorption.
In the simplest form, the light intensity I(r) varies according to the inverse square law

I(r) = I_s/r², (10.1)
where Is is the intensity at the source. For a given medium with a fixed
light absorption coefficient γ, the light intensity I varies with the distance
r. That is
I = I_0 e^{−γr}, (10.2)

where I_0 is the original light intensity. In order to avoid the singularity at r = 0 in the expression I_s/r², the combined effect of both the inverse square law and absorption can be approximated as the following Gaussian form

I(r) = I_0 e^{−γr²}. (10.3)
As a firefly’s attractiveness is proportional to the light intensity seen by
adjacent fireflies, we can now define the attractiveness β of a firefly by
β = β_0 e^{−γr²}, (10.4)

where β_0 is the attractiveness at r = 0. As it is often faster to calculate 1/(1 + r²) than an exponential function, the above function, if necessary, can conveniently be approximated as

β = β_0/(1 + γr²). (10.5)

Both (10.4) and (10.5) define a characteristic distance Γ = 1/√γ over which the attractiveness changes significantly from β_0 to β_0 e^{−1} for equation (10.4) or β_0/2 for equation (10.5).
In the actual implementation, the attractiveness function β(r) can be any monotonically decreasing function, such as the following generalized form

β(r) = β_0 e^{−γr^m},  (m ≥ 1). (10.6)
where x_{i,k} is the kth component of the spatial coordinate x_i of the ith firefly. In the 2-D case, we have

r_ij = √[(x_i − x_j)² + (y_i − y_j)²]. (10.10)

It is worth pointing out that the distance r defined above is not limited to the Euclidean distance. We can define other distances r in the n-dimensional
10.5 IMPLEMENTATION
where (x, y) ∈ [−5, 5] × [−5, 5]. This function has four peaks: two local peaks with f = 1 at (−4, 4) and (4, 4), and two global peaks with fmax = 2 at (0, 0) and (0, −4), as shown in Figure 10.2. We can see that all these four optima can be found using 25 fireflies in about 20 generations (see Fig. 10.3). So the total number of function evaluations is about 500. This is much more efficient than most existing metaheuristic algorithms.
% ------------------------------------------------
alpha=0.2; % Randomness 0--1 (highly random)
gamma=1.0; % Absorption coefficient
% ------------------------------------------------
% Grid values are used for display only
Ngrid=100;
dx=(range(2)-range(1))/Ngrid;
dy=(range(4)-range(3))/Ngrid;
[x,y]=meshgrid(range(1):dx:range(2),...
range(3):dy:range(4));
z=f(x,y);
% Display the shape of the objective function
figure(1); surfc(x,y,z);
% ------------------------------------------------
% generating the initial locations of n fireflies
[xn,yn,Lightn]=init_ffa(n,range);
% Display the paths of fireflies in a figure with
% contours of the function to be optimized
figure(2);
% Iterations or pseudo time marching
for i=1:MaxGeneration, %%%%% start iterations
% Show the contours of the function
contour(x,y,z,15); hold on;
% Evaluate new solutions
zn=f(xn,yn);
Figure 10.3: The initial locations of 25 fireflies (left) and their final locations after 20 iterations (right).
xrange=range(2)-range(1);
yrange=range(4)-range(3);
xn=rand(1,n)*xrange+range(1);
yn=rand(1,n)*yrange+range(3);
Lightn=zeros(size(yn));
10.6 FA VARIANTS
The basic firefly algorithm is very efficient, but we can see that the solutions are still changing as the optima are approached. It is possible to improve the solution quality by reducing the randomness.
A further improvement on the convergence of the algorithm is to vary the randomization parameter α so that it decreases gradually as the optima are approached. For example, we can use
where t ∈ [0, tmax ] is the pseudo time for simulations and tmax is the max-
imum number of generations. α0 is the initial randomization parameter
while α∞ is the final value. We can also use a similar function to the
geometrical annealing schedule. That is
α = α0 θt , (10.13)
g2(x) = (4x2² − x1 x2)/[12566 (x1³ x2 − x1⁴)] + 1/(5108 x1²) − 1 ≤ 0,

g3(x) = 1 − 140.45 x1/(x2² x3) ≤ 0,

g4(x) = (x1 + x2)/1.5 − 1 ≤ 0. (10.15)
The simple bounds on the design variables are 0.05 ≤ x1 ≤ 2.0, 0.25 ≤ x2 ≤ 1.3 and 2.0 ≤ x3 ≤ 15.0.
The best solution found in the literature (e.g., Cagnina et al. 2008) is
% -------------------------------------------------------%
% Firefly Algorithm for constrained optimization %
% by Xin-She Yang (Cambridge University) Copyright @2009 %
% -------------------------------------------------------%
function fa_mincon_demo
% Simple bounds/limits
disp('Solve the simple spring design problem ...');
Lb=[0.05 0.25 2.0];
Ub=[2.0 1.3 15.0];
[u,fval,NumEval]=ffa_mincon(@cost,@constraint,u0,Lb,Ub,para);
% Display results
bestsolution=u
bestobj=fval
total_number_of_function_evaluations=NumEval
%%% --------------------------------------------------%%%
%%% Do not modify the following codes unless you want %%%
%%% to improve its performance etc %%%
% -------------------------------------------------------
% ===Start of the Firefly Algorithm Implementation ======
% Inputs: fhandle => @cost (your own cost function,
% can be an external file )
% nonhandle => @constraint, all nonlinear constraints
% can be an external file or a function
% Lb = lower bounds/limits
% Ub = upper bounds/limits
% para == optional (to control the Firefly algorithm)
% Outputs: nbest = the best solution found so far
% fbest = the best objective value
% NumEval = number of evaluations: n*MaxGeneration
% Optional:
% The alpha can be reduced (as to reduce the randomness)
% ---------------------------------------------------------
% Start FA
function [nbest,fbest,NumEval]...
=ffa_mincon(fhandle,nonhandle,u0, Lb, Ub, para)
% Check input parameters (otherwise set as default values)
if nargin<6, para=[20 50 0.25 0.20 1]; end
if nargin<5, Ub=[]; end
if nargin<4, Lb=[]; end
if nargin<3,
disp('Usage: FA_mincon(@cost,@constraint,u0,Lb,Ub,para)');
end
% n=number of fireflies
% MaxGeneration=number of pseudo time steps
% ------------------------------------------------
% alpha=0.25; % Randomness 0--1 (highly random)
% betamn=0.20; % minimum value of beta
% gamma=1; % Absorption coefficient
% ------------------------------------------------
n=para(1); MaxGeneration=para(2);
alpha=para(3); betamin=para(4); gamma=para(5);
% Check if the upper bound & lower bound are the same size
if length(Lb) ~=length(Ub),
disp('Simple bounds/limits are improper!');
return
end
% Calculate dimension
d=length(u0);
zn(i)=Fun(fhandle,nonhandle,ns(i,:));
Lightn(i)=zn(i);
end
% -------------------------------------------------------
% ----- All the subfunctions are listed here ------------
% The initial locations of n fireflies
function [ns,Lightn]=init_ffa(n,d,Lb,Ub,u0)
% if there are bounds/limits,
if length(Lb)>0,
for i=1:n,
ns(i,:)=Lb+(Ub-Lb).*rand(1,d);
end
else
% generate solutions around the random guess
for i=1:n,
ns(i,:)=u0+randn(1,d);
end
end
% Updating fireflies
for i=1:n,
% The attractiveness parameter beta=exp(-gamma*r)
for j=1:n,
r=sqrt(sum((ns(i,:)-ns(j,:)).^2));
% Update moves
if Lightn(i)>Lighto(j), % Brighter and more attractive
beta0=1; beta=(beta0-betamin)*exp(-gamma*r.^2)+betamin;
tmpf=alpha.*(rand(1,d)-0.5).*scale;
ns(i,:)=ns(i,:).*(1-beta)+nso(j,:).*beta+tmpf;
end
end % end for j
% -----------------------------------------
% d-dimensional objective function
function z=Fun(fhandle,nonhandle,u)
% Objective
z=fhandle(u);
function Z=getnonlinear(nonhandle,u)
Z=0;
% Penalty constant >> 1
lam=10^15; lameq=10^15;
% Get nonlinear constraints
[g,geq]=nonhandle(u);
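The demo above passes the handles @cost and @constraint to ffa_mincon; a minimal sketch of these two functions for the spring design problem, consistent with the constraints (10.15), could be written as follows (the objective is assumed to be the standard spring weight, and the first constraint g1 is the standard deflection constraint, as neither is shown above):

% Objective: standard spring weight (assumed form of eq. (10.14))
function z=cost(x)
z=(2+x(3))*x(1)^2*x(2);

% Nonlinear constraints of (10.15); g1 is assumed in its standard form
function [g,geq]=constraint(x)
g(1)=1-x(2)^3*x(3)/(71785*x(1)^4);
g(2)=(4*x(2)^2-x(1)*x(2))/(12566*(x(2)*x(1)^3-x(1)^4))+1/(5108*x(1)^2)-1;
g(3)=1-140.45*x(1)/(x(2)^2*x(3));
g(4)=(x(1)+x(2))/1.5-1;
% No equality constraints
geq=[];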
Chapter 12
CUCKOO SEARCH
Cuckoos are fascinating birds, not only because of the beautiful sounds they can make, but also because of their aggressive reproduction strategy. Some species such as the ani and Guira cuckoos lay their eggs in communal nests, though they may remove others' eggs to increase the hatching probability of their own eggs. Quite a number of species engage in obligate brood parasitism by laying their eggs in the nests of other host birds (often other species).
There are three basic types of brood parasitism: intraspecific brood parasitism, cooperative breeding, and nest takeover. Some host birds can engage in direct conflict with the intruding cuckoos. If a host bird discovers the eggs are not its own, it will either get rid of these alien eggs or simply abandon its nest and build a new nest elsewhere. Some cuckoo species, such as the New World brood-parasitic Tapera, have evolved in such a way that female parasitic cuckoos are often very specialized in mimicking the colour and pattern of the eggs of a few chosen host species. This reduces the probability of their eggs being abandoned and thus increases their reproductivity.
In addition, the timing of egg-laying of some species is also amazing. Parasitic cuckoos often choose a nest where the host bird has just laid its own eggs. In general, the cuckoo eggs hatch slightly earlier than their host eggs. Once the first cuckoo chick is hatched, the first instinctive action it will take is to evict the host eggs by blindly propelling them out of the nest, which increases the cuckoo chick's share of food provided by its host bird. Studies also show that a cuckoo chick can mimic the call of host chicks to gain access to more feeding opportunities.
On the other hand, various studies have shown that the flight behaviour of many
animals and insects demonstrates the typical characteristics of Lévy flights. A
recent study by Reynolds and Frye shows that fruit flies, or Drosophila melanogaster,
explore their landscape using a series of straight flight paths punctuated by sudden
90° turns, leading to a Lévy-flight-style intermittent, scale-free search pattern.
Studies on human behaviour, such as the Ju/'hoansi hunter-gatherer foraging pat-
terns, also show the typical features of Lévy flights. Even light can be related to
Lévy flights. Subsequently, such behaviour has been applied to optimization and
optimal search, and preliminary results show its promising capability.
For simplicity in describing our new Cuckoo Search, we now use the following
three idealized rules:
• Each cuckoo lays one egg at a time, and dumps its egg in a randomly
chosen nest;
• The best nests with high-quality eggs will be carried over to the next
generations;
• The number of available host nests is fixed, and the egg laid by a cuckoo
is discovered by the host bird with a probability pa ∈ [0, 1]. In this case,
the host bird can either get rid of the egg, or simply abandon the nest and
build a completely new nest.
As a further approximation, this last assumption can be implemented by replacing
a fraction pa of the n host nests with new nests (containing new random
solutions).
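To make this rule concrete, a minimal Matlab sketch of the abandonment step is
given below; the variable names (nest, fbest, fobj) and the use of simple bounds
Lb and Ub are assumptions of this sketch rather than the book's own listing.
% Abandon nests discovered by the host bird with probability pa and
% build new ones at random locations (a sketch; the names are illustrative)
function [nest,fbest]=empty_nests(nest,fbest,pa,Lb,Ub,fobj)
[n,d]=size(nest);
for k=1:n,
if rand<pa,
% Build a completely new nest within the simple bounds
nest(k,:)=Lb+(Ub-Lb).*rand(1,d);
fbest(k)=fobj(nest(k,:));
end
end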
For a maximization problem, the quality or fitness of a solution can simply be
proportional to the value of the objective function. Other forms of fitness can be
defined in a similar way to the fitness function in genetic algorithms.
From the implementation point of view, we can use the following simple rep-
resentation: each egg in a nest represents a solution, and each cuckoo can
lay only one egg (thus representing one solution); the aim is to use the new and
potentially better solutions (cuckoos) to replace not-so-good solutions in the
nests. Obviously, this algorithm can be extended to the more complicated case
where each nest has multiple eggs representing a set of solutions. For the present
work, we will use the simplest approach where each nest has only a single egg.
In this case, there is no distinction between an egg, a nest or a cuckoo, as each nest
corresponds to one egg which also represents one cuckoo.
Based on these three rules, the basic steps of the Cuckoo Search (CS) can be
summarized as the pseudo code shown in Fig. 12.1.
When generating new solutions x^(t+1) for, say, cuckoo i, a Lévy flight is
performed:
x_i^(t+1) = x_i^(t) + α ⊕ Lévy(λ),   (12.1)
where α > 0 is the step size, which should be related to the scale of the problem
of interest. In most cases, we can use α = O(L/10), where L is the characteristic
scale of the problem of interest. The above equation is essentially the stochastic
equation for a random walk. In general, a random walk is a Markov chain whose
next status/location only depends on the current location (the first term in the
above equation) and the transition probability (the second term). The product ⊕
means entrywise multiplication. This entrywise product is similar to that used
in PSO, but here the random walk via Lévy flight is more efficient in exploring
the search space, as its step length is much longer in the long run.
The Lévy flight essentially provides a random walk whose random step lengths
are drawn from a Lévy distribution
Lévy ∼ u = t^(−λ),   (1 < λ ≤ 3),   (12.2)
which has an infinite variance with an infinite mean. Here the steps essentially
form a random walk process with a power-law step-length distribution with a
heavy tail. Some of the new solutions should be generated by a Lévy walk around
the best solution obtained so far; this will speed up the local search. However, a
substantial fraction of the new solutions should be generated by far-field random-
ization, with locations far enough from the current best solution; this makes sure
that the system will not be trapped in a local optimum.
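One common way to draw such heavy-tailed steps is Mantegna's algorithm; the
following minimal sketch illustrates Eq. (12.1) under that assumption (Mantegna's
scheme and the value λ = 1.5 are choices of this sketch, not prescriptions of the text).
% Generate a heavy-tailed (approximately Lévy) step and apply Eq. (12.1)
% using Mantegna's algorithm (an assumed, commonly used choice)
function xnew=levy_step(x,alpha)
lambda=1.5; d=length(x);
sigma=(gamma(1+lambda)*sin(pi*lambda/2)/ ...
      (gamma((1+lambda)/2)*lambda*2^((lambda-1)/2)))^(1/lambda);
u=randn(1,d)*sigma; v=randn(1,d);
step=u./abs(v).^(1/lambda);   % step lengths with a power-law tail
xnew=x+alpha*step;            % entrywise update with step size alpha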
From a quick look, it seems that there is some similarity between CS and
hill-climbing in combination with some large-scale randomization. But there are
some significant differences. Firstly, CS is a population-based algorithm, in a
way similar to GA and PSO, but it uses some sort of elitism and/or selection
similar to that used in harmony search. Secondly, the randomization in CS is
more efficient as the step length is heavy-tailed, and any large step is possible.
Thirdly, the number of parameters in CS to be tuned is fewer than in GA and
PSO, so CS is potentially more generic and adaptable to a wider class of problems.
After implementation, we have to validate the algorithm using test functions with
analytical or known solutions. For example, one of the many test functions we
have used is the bivariate Michalewicz function
f(x, y) = −sin(x) [sin(x²/π)]^(2m) − sin(y) [sin(2y²/π)]^(2m),   (12.3)
where m = 10 and (x, y) ∈ [0, 5] × [0, 5]. This function has a global minimum
f∗ ≈ −1.8013 at (2.20319, 1.57049). This global optimum can easily be found
using Cuckoo Search, and the results are shown in Fig. 12.2, where the final
locations of the nests are also marked in the figure. Here we have used
n = 15 nests, α = 1 and pa = 0.25. In most of our simulations, we have used
n = 15 to 50.
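The test function (12.3) can be coded directly; the function name below is only
illustrative, and the handle can then be passed to the solver as its objective.
% Bivariate Michalewicz function of Eq. (12.3) with m=10;
% the global minimum is about -1.8013 at (2.20319, 1.57049)
function z=michalewicz(u)
m=10; x=u(1); y=u(2);
z=-sin(x)*(sin(x^2/pi))^(2*m)-sin(y)*(sin(2*y^2/pi))^(2*m);
For example, michalewicz([2.20319 1.57049]) returns approximately −1.8013.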
From the figure, we can see that, as the optimum is approached, most nests
aggregate towards the global optimum. We also notice that the nests are also
distributed at different (local) optima in the case of multimodal functions. This
means that CS can find all the optima simultaneously if the number of nests
is much higher than the number of local optima. This advantage may become
more significant when dealing with multimodal and multiobjective optimization
problems.
We have also tried to vary the number of host nests (or the population size
n) and the probability pa . We have used n = 5, 10, 15, 20, 30, 40, 50, 100,
150, 250, 500 and pa = 0, 0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5. From our
simulations, we found that n = 15 to 40 and pa = 0.25 are sufficient for most
optimization problems. Results and analysis also imply that the convergence
rate, to some extent, is not sensitive to the parameters used. This means that
fine adjustment of the parameters is not needed for any given problem.
12.5 IMPLEMENTATION
% -------------------------------------------------------
% Cuckoo algorithm by Xin-She Yang and Suash Deb %
% Programmed by Xin-She Yang at Cambridge University %
% -------------------------------------------------------
function [bestsol,fval]=cuckoo_search(Ngen)
% Here Ngen is the max number of function evaluations
if nargin<1, Ngen=1500; end
Figure 12.2: Search paths of nests using Cuckoo Search. The final locations of
the nests are also marked in the figure.
n=25;
% (The initialization of the nests, their objective values fbest and the
% discovery rate pa belongs to the full listing and is not shown here.)
for j=1:Ngen,
% Find the current best
Kbest=get_best_nest(fbest);
% Choose a random nest (avoid the current best)
k=choose_a_nest(n,Kbest);
bestnest=nest(Kbest,:);
% Generate a new solution (but keep the current best)
s=get_a_cuckoo(nest(k,:),bestnest);
% (The evaluation of s and its greedy comparison with nest k are part
% of the full listing and are omitted in this excerpt.)
% Discovery and randomization: abandon the worst nest and build a new one
k=get_max_nest(fbest);
s=emptyit(nest(k,:));
nest(k,:)=s;
fbest(k)=fobj(s);
end
%% Post-optimization processing
%% Find the best and display
[fval,I]=min(fbest)
bestsol=nest(I,:)
If we run this program using some standard test functions, we can observe that
CS outperforms many existing algorithms such as GA and PSO. The primary
reasons are: 1) a fine balance of randomization and intensification, and 2) a smaller
number of control parameters. As for any metaheuristic algorithm, a good
balance of intensive local search and an efficient exploration of the whole search
space will usually lead to a more efficient algorithm. On the other hand, there
are only two parameters in this algorithm: the population size n and pa. Once
n is fixed, pa essentially controls the elitism and the balance of randomization
and local search. Few parameters make an algorithm less complex and thus
potentially more generic. Such observations deserve more systematic research
and further elaboration in future work.
It is worth pointing out that there are three ways to carry out randomization:
uniform randomization, random walks and heavy-tailed walks. The simplest way
is to use a uniform distribution so that new solutions are limited between the upper
and lower bounds. Random walks can be used for global randomization or local
randomization, depending on the step size used in the implementation. Lévy
flights are heavy-tailed, which makes them most suitable for randomization on the
global scale.
As an example of solving constrained optimization, we now solve the spring
design problem discussed in the chapter on the firefly algorithm. The Matlab code
is given below.
% Discovery rate
pa=0.25;
% Random initial solutions
nest=init_cuckoo(n,d,Lb,Ub);
fbest=ones(n,1)*10^(10); % minimization problems
% (Fragments of the nest-selection subfunctions; the full definitions of
% get_best_nest and choose_a_nest are not reproduced in this excerpt.)
Kbest=1;
k=mod(k+1,n)+1;
end
function Z=getnonlinear(u)
Z=0;
% Penalty constant
lam=10^15;
% Inequality constraints
g(1)=1-u(2)^3*u(3)/(71785*u(1)^4);
gtmp=(4*u(2)^2-u(1)*u(2))/(12566*(u(2)*u(1)^3-u(1)^4));
g(2)=gtmp+1/(5108*u(1)^2)-1;
g(3)=1-140.45*u(1)/(u(2)^2*u(3));
g(4)=(u(1)+u(2))/1.5-1;
% Apply the penalty; only violated inequalities (g>0) contribute
% (this final step is a reconstruction, as it is not shown in this excerpt)
Z=lam*sum((g>0).*g.^2);
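For completeness, a sketch of the corresponding objective (the weight of the spring
in its standard formulation, with u(1) the wire diameter, u(2) the mean coil diameter
and u(3) the number of active coils) could be written as follows; this short listing is
an assumption of this sketch, not reproduced from the book.
% Spring design objective: minimize the weight of the coil spring
% (standard formulation; assumed here, as the book's listing is not shown)
function z=fobj(u)
z=(2+u(3))*u(2)*u(1)^2;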
REFERENCES
1. Barthelemy P., Bertolotti J., Wiersma D. S., A Lévy flight for light, Nature,
453, 495-498 (2008).
2. Bradley D., Novel ‘cuckoo search algorithm’ beats particle swarm optimiza-
tion in engineering design (news article), Science Daily, May 29, (2010). Also
in Scientific Computing (magazine), 1 June 2010.
3. Brown C., Liebovitch L. S., Glendon R., Lévy flights in Dobe Ju/’hoansi
foraging patterns, Human Ecol., 35, 129-138 (2007).
4. Chattopadhyay R., A study of test functions for optimization algorithms, J.
Opt. Theory Appl., 8, 231-236 (1971).
5. Passino K. M., Biomimicry of Bacterial Foraging for Distributed Optimiza-
tion, University Press, Princeton, New Jersey (2001).
6. Payne R. B., Sorenson M. D., and Klitz K., The Cuckoos, Oxford University
Press, (2005).
7. Pavlyukevich I., Lévy flights, non-local search and simulated annealing, J.
Computational Physics, 226, 1830-1844 (2007).
8. Pavlyukevich I., Cooling down Lévy flights, J. Phys. A:Math. Theor., 40,
12299-12313 (2007).
9. Reynolds A. M. and Frye M. A., Free-flight odor tracking in Drosophila is
consistent with an optimal intermittent scale-free search, PLoS One, 2, e354
(2007).
10. A. M. Reynolds and C. J. Rhodes, The Lévy flight paradigm: random search
patterns and mechanisms, Ecology, 90, 877-87 (2009).
11. Schoen F., A wide class of test functions for global optimization, J. Global
Optimization, 3, 133-137, (1993).
12. Shlesinger M. F., Search research, Nature, 443, 281-282 (2006).
13. Yang X. S. and Deb S., Cuckoo search via Lévy flights, in: Proc. of World
Congress on Nature & Biologically Inspired Computing (NaBIC 2009), IEEE
Publications, USA, pp. 210-214 (2009).
14. Yang X. S. and Deb S., Engineering optimization by cuckoo search, Int. J.
Math. Modelling & Numerical Optimisation, 1, 330-343 (2010).
Chapter 13
Figure 13.2: Schematic representation of a three-layer neural network with n_i
inputs, m hidden nodes and n_o outputs.
and
o_k = f(S_k) = 1/(1 + e^(−S_k)),   (13.8)
we have
f' = f(1 − f),   (13.9)
∂o_k/∂w_hk = (∂o_k/∂S_k)(∂S_k/∂w_hk) = o_k(1 − o_k) o_h,   (13.10)
and
∂E/∂o_k = o_k − y_k.   (13.11)
Combining these results, the weights can be updated iteratively, and the whole
back propagation procedure can be summarized as the following pseudo code:
BPNN
Initialize weight matrices Wih and Whk randomly
for all training data points
while ( residual errors are not zero )
Calculate the output for the hidden layer oh using (13.13)
Calculate the output for the output layer ok using (13.14)
Compute errors δk and δh using (13.15) and (13.16)
Update weights wih and whk via (13.17) and (13.18)
end while
end for
Here the outputs for the hidden nodes are
o_h = 1/(1 + exp[−Σ_{i=1}^{n_i} w_ih u_i]),   (h = 1, 2, ..., m),   (13.13)
and the outputs for the output nodes are
o_k = 1/(1 + exp[−Σ_{h=1}^{m} w_hk o_h]),   (k = 1, 2, ..., n_o).   (13.14)
The errors for the output nodes can be written as
δ_k = o_k (1 − o_k)(y_k − o_k),   (k = 1, 2, ..., n_o),   (13.15)
where y_k (k = 1, 2, ..., n_o) are the data (real outputs) for the inputs u_i
(i = 1, 2, ..., n_i). Similarly, the errors for the hidden nodes can be written as
δ_h = o_h (1 − o_h) Σ_{k=1}^{n_o} w_hk δ_k,   (h = 1, 2, ..., m).   (13.16)
The weights are then updated as
w_hk^{t+1} = w_hk^t + η δ_k o_h,   (13.17)
and
w_ih^{t+1} = w_ih^t + η δ_h u_i,   (13.18)
where 0 < η ≤ 1 is the learning rate.
Here we can see that the weight increment is Δw_ih = η δ_h u_i, with a similar
updating formula for w_hk. An improved version is to use the so-called weight
momentum α to increase the learning efficiency, by adding to each update a
fraction α of the previous weight increment.
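As a minimal illustration of (13.13)-(13.18), the following sketch performs one back
propagation update for a single training pair; the matrix shapes, the omission of the
momentum term, and the sign convention of δ_k (chosen so that the '+' updates of
(13.17)-(13.18) perform gradient descent) are assumptions of this sketch.
% One back propagation update for a single input u (1-by-ni) and target
% y (1-by-no); Wih is ni-by-m and Whk is m-by-no (shapes assumed here).
function [Wih,Whk]=bp_step(Wih,Whk,u,y,eta)
oh=1./(1+exp(-u*Wih));             % hidden outputs, Eq. (13.13)
ok=1./(1+exp(-oh*Whk));            % network outputs, Eq. (13.14)
deltak=ok.*(1-ok).*(y-ok);         % output errors, cf. Eq. (13.15)
deltah=oh.*(1-oh).*(deltak*Whk');  % hidden errors, Eq. (13.16)
Whk=Whk+eta*(oh'*deltak);          % cf. Eq. (13.17)
Wih=Wih+eta*(u'*deltah);           % Eq. (13.18)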
13.2.1 Classifications
In many applications, the aim is to separate some complex data into different
categories. For example, in pattern recognition, we may need to simply separate
circles from squares; that is, to label them as two different classes. In other
applications, we have to answer a yes-no question, which is a binary classification.
If there are n different classes, we can in principle first classify them into two
classes: a class, say k, and non-class k. We then focus on the non-class k and divide
them into two different classes, and so on and so forth.
Mathematically speaking, for a given (but scattered) data set, the objective
is to separate it into different regions/domains or types. In the simplest case,
the output is either class A or class B; that is, either +1 or −1.
In essence, the aim of learning from the data is to find a function f_α(x) such
that the expected risk E(α) is minimal; that is, we minimize the risk
E(α) = (1/2) ∫ |f_α(x) − y| dP(x, y),   (13.22)
A main drawback of this approach is that a small risk or error on the training
set does not necessarily guarantee a small error on prediction if the number n of
training data is small.
[Figure: linearly separable samples (spheres versus triangles), the separating
hyperplane w · x + b = 0, and the two margin hyperplanes w · x + b = ±1.]
In the framework of structural risk minimization and statistical learning theory,
there exists an upper bound for such errors. For a given probability of at least
1 − p, such a bound can be written as
E(α) ≤ E_p(α) + φ(h/n, log(p)/n),   (13.24)
where
φ(h/n, log(p)/n) = sqrt{ (1/n) [ h (log(2n/h) + 1) − log(p/4) ] }.   (13.25)
Here h is a parameter, often referred to as the Vapnik-Chervonenkis dimension
(or simply VC-dimension). This dimension describes the capacity for prediction
of the function set f_α. In the simplest binary classification with only two values
of +1 and −1, h is essentially the maximum number of points which can be
classified into two distinct classes in all possible 2^h combinations.
For linear classification, the aim is to construct a hyperplane
w · x + b = 0,   (13.26)
so that these samples can be divided into classes with triangles on one side
and the spheres on the other side. Here the normal vector w has the same size
as x, and w and the scalar b can be determined using the data, though the method
of determining them is not straightforward. This requires the existence of a
hyperplane; otherwise, this approach will not work. In this case, we have to use
other methods.
In essence, if we can construct such a hyperplane, we should construct two
hyperplanes (shown as dashed lines) that are as far apart as possible, with no
samples lying between them. Mathematically, these two hyperplanes can be written as
w · x + b = +1,   (13.27)
and
w · x + b = −1. (13.28)
From these two equations, it is straightforward to verify that the normal (per-
pendicular) distance between these two hyperplanes is related to the norm ||w||
via
d = 2/||w||.   (13.29)
A main objective of constructing these two hyperplanes is to maximize the dis-
tance or the margin between the two planes. The maximization of d is equivalent
to the minimization of ||w|| or more conveniently ||w||2 . From the optimization
point of view, the maximization of margins can be written as
minimize  (1/2)||w||² = (1/2)(w · w).   (13.30)
If we can classify all the samples completely, for any sample (xi , yi ) where
i = 1, 2, ..., n, we have
w · x_i + b ≥ +1,  if (x_i, y_i) ∈ one class,   (13.31)
and
w · x_i + b ≤ −1,  if (x_i, y_i) ∈ the other class.   (13.32)
As y_i ∈ {+1, −1}, the above two equations can be combined as
y_i (w · x_i + b) ≥ 1,   (i = 1, 2, ..., n).   (13.33)
However, in practice the data may be noisy or only partially separable, so
non-negative slack variables η_i ≥ 0 are introduced so that
y_i (w · x_i + b) ≥ 1 − η_i,   (i = 1, 2, ..., n).   (13.35)
Now the optimization for the support vector machine becomes
minimize  Ψ = (1/2)||w||² + λ Σ_{i=1}^{n} η_i,   (13.36)
subject to  y_i (w · x_i + b) ≥ 1 − η_i,   (13.37)
            η_i ≥ 0,   (i = 1, 2, ..., n),   (13.38)
where λ > 0 is a parameter to be chosen appropriately. Here, the term Σ_{i=1}^{n} η_i
is essentially a measure of the upper bound of the number of misclassifications
on the training data.
This constrained optimization problem can be solved by introducing a Lagrangian
with multipliers α_i ≥ 0 for the constraints (13.37); the corresponding
Karush-Kuhn-Tucker (KKT) conditions include ∂L/∂w = w − Σ_{i=1}^{n} α_i y_i x_i = 0,
∂L/∂b = − Σ_{i=1}^{n} α_i y_i = 0,   (13.41)
y_i (w · x_i + b) − (1 − η_i) ≥ 0,   (13.42)
α_i [y_i (w · x_i + b) − (1 − η_i)] = 0,   (i = 1, 2, ..., n),   (13.43)
α_i ≥ 0,  η_i ≥ 0,   (i = 1, 2, ..., n).   (13.44)
From the first KKT condition, we get
w = Σ_{i=1}^{n} y_i α_i x_i.   (13.45)
It is worth pointing out here that only the nonzero coefficients α_i contribute to
the overall solution. This comes from the KKT condition (13.43), which implies
that when α_i ≠ 0, the inequality (13.37) must be satisfied exactly, while α_i = 0
means the inequality is automatically met. In this latter case, η_i = 0. Therefore,
only the corresponding training data (x_i, y_i) with α_i > 0 contribute to the
solution, and thus such x_i form the support vectors (hence, the name support
vector machine). All the other data with α_i = 0 become irrelevant.
V. Vapnik and B. Schölkopf et al. have shown that the solution for αi can be
found by solving the following quadratic programming
maximize  Σ_{i=1}^{n} α_i − (1/2) Σ_{i,j=1}^{n} α_i α_j y_i y_j (x_i · x_j),   (13.46)
subject to
Σ_{i=1}^{n} α_i y_i = 0,   0 ≤ α_i ≤ λ,   (i = 1, 2, ..., n).   (13.47)
From the coefficients αi , we can write the final classification or decision function
as
f(x) = sgn[ Σ_{i=1}^{n} α_i y_i (x · x_i) + b ].   (13.48)
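As an illustration, the dual problem (13.46)-(13.47) can be solved with a standard
quadratic programming routine; the sketch below uses quadprog from the Matlab
Optimization Toolbox, and the tolerance used to pick the support vectors is an
arbitrary choice of this sketch, not a value from the book.
% Train a linear SVM by solving (13.46)-(13.47) with quadprog and recover
% w and b for the decision function (13.48); X is n-by-d, y is n-by-1 (+1/-1).
function [alpha,w,b]=linear_svm(X,y,lambda)
n=size(X,1);
H=(y*y').*(X*X');                 % H_ij = y_i y_j (x_i . x_j)
f=-ones(n,1);                     % quadprog minimizes 0.5*a'*H*a + f'*a
Aeq=y'; beq=0;                    % the constraint sum_i alpha_i y_i = 0
lb=zeros(n,1); ub=lambda*ones(n,1);
alpha=quadprog(H,f,[],[],Aeq,beq,lb,ub);
w=X'*(alpha.*y);                  % Eq. (13.45)
sv=find(alpha>1e-6 & alpha<lambda-1e-6);   % margin support vectors
b=mean(y(sv)-X(sv,:)*w);          % offset from the KKT conditions
% classify a new point xnew (1-by-d) with sign(xnew*w+b), Eq. (13.48)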
[Figure: samples that cannot be separated by a simple straight line or hyperplane.]
If the data are not linearly separable, a kernel function can be used to map the
data into a higher-dimensional feature space. A commonly used choice is the
Gaussian radial basis function kernel
K(x, x_i) = exp[−||x − x_i||²/(2σ²)] = exp[−γ ||x − x_i||²],
for nonlinear classifiers. This kernel can easily be extended to any high dimensions.
Here σ² is the variance and γ = 1/(2σ²) is a constant.
Following a similar procedure as discussed earlier for linear SVM, we can
obtain the coefficients αi by solving the following optimization problem
maximize  Σ_{i=1}^{n} α_i − (1/2) Σ_{i,j=1}^{n} α_i α_j y_i y_j K(x_i, x_j).   (13.51)
It is worth pointing out that, under Mercer's conditions for kernel functions, the
matrix A = y_i y_j K(x_i, x_j) is a symmetric positive definite matrix, which implies
that the above maximization problem is concave and has a well-defined global optimum.
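The positive (semi)definiteness required by Mercer's conditions can be checked
numerically; the sketch below builds the Gaussian kernel matrix used in (13.51)
(the helper name and the pairwise-distance construction are assumptions of this sketch).
% Build the Gaussian kernel matrix K and the matrix A used in Eq. (13.51);
% under Mercer's conditions the eigenvalues of A are non-negative.
function A=kernel_matrix(X,y,gamma_rbf)
n=size(X,1);
sq=sum(X.^2,2);
D2=repmat(sq,1,n)+repmat(sq',n,1)-2*(X*X');   % ||x_i - x_j||^2
K=exp(-gamma_rbf*D2);                         % K(x_i, x_j)
A=(y*y').*K;
% min(eig(A)) should be >= 0 up to rounding errors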
Chapter 14
There are many ways of carrying out intensification and diversification. In fact,
each algorithm and its variants use different ways of achieving the balance between
exploration and exploitation.
By analyzing all the metaheuristic algorithms, we can categorically say that
the way to achieve exploration or diversification is mainly by a certain degree of
randomization in combination with a deterministic procedure. This ensures that
the newly generated solutions are distributed as diversely as possible in the feasible
search space. One of the simplest and yet most commonly used randomization
techniques is to use
x_new = L + (U − L) * u,   (14.1)
where L and U are the lower bound and upper bound, respectively. u is a uni-
formly distributed random variable in [0,1]. This is often used in many algorithms
such as harmony search, particle swarm optimization and firefly algorithm.
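In vectorized Matlab form, Eq. (14.1) is a one-line operation; the dimension d and
the bound vectors below are assumed purely for illustration.
% Uniform randomization of Eq. (14.1) within simple bounds
d=5;                              % illustrative dimension
L=-3*ones(1,d); U=3*ones(1,d);    % assumed lower and upper bounds
xnew=L+(U-L).*rand(1,d);          % u ~ Unif[0,1], applied entrywise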
It is worth pointing out that the use of a uniform distribution is not the only
way to achieve randomization. In fact, random walks such as Lévy flights on a
global scale are more efficient. We can use the same equation (14.2) to carry out
randomization, and the only difference is to use a large step size s so that the
random walk can cover a larger region of the search space.
A more elaborate way to obtain diversification is to use mutation and crossover.
Mutation makes sure that new solutions are as far/different as possible from their
parents or existing solutions, while crossover limits the degree of over-diversification,
as new solutions are generated by swapping parts of the existing solutions.
The main way to achieve the exploitation is to generate new solutions around
a promising or better solution locally and more intensively. This can be easily
achieved by a local random walk of the form x^{t+1} = x^t + s w,
where w is typically drawn from a Gaussian distribution with zero mean. Here
s is the step size of the random walk. In general, the step size should be small
enough so that only local neighbourhood is visited. If s is too large, the region
visited can be too far away from the region of interest, which will increase the
diversification significantly but reduce the intensification greatly. Therefore, a
proper step size should be much smaller than (and be linked with) the scale of
the problem. For example, the pitch adjustment in harmony search and the move
in simulated annealing are both random walks of this kind.
If we want to increase the efficiency of this random walk (and thus increase
the efficiency of exploration as well), we can use other forms of random walks
such as Lévy flights where s is drawn from a Lévy distribution with large step
sizes. In fact, any distribution with a long tail will help to increase the step size
and distance of such random walks.
Even with the standard random walk, we can use a more selective or con-
trolled walk around the current best x_best, rather than around any good solution.
This is equivalent to replacing x^t in the above equation by the current best,
that is, x^{t+1} = x_best + s w.
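In the same notation, the local walk and the more selective walk around the current
best can be sketched as follows; here x and xbest are assumed to hold the current
solution and the current best, U and L are the bound vectors from the previous
example, and the step size (about 1% of the domain scale) is an assumption in line
with the discussion above.
% Local random walk and the controlled walk around the current best
s=0.01*(U-L);                     % step size linked to the problem scale
xlocal=x+s.*randn(1,d);           % walk around the current solution
xelite=xbest+s.*randn(1,d);       % walk around the current best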
From the above discussion of all the major components and their characteris-
tics, we realize that a good combination of local search and global search with
a proper selection mechanism should produce a good metaheuristic algorithm,
whatever name it may be called.
In principle, the global search should be carried out more frequently at the
initial stage of the search or iterations. Once a number of good quality solutions
are found, exploration should be sparse on the global scale, but frequent enough
so as to escape any local trap if necessary. On the other hand, the local search
should be carried out as efficiently as possible, so a good local search method should
be used. The proper balance of these two is paramount.
Using these basic components, we can now design a generic metaheuristic al-
gorithm for optimization, which we can call the Generalized Evolutionary Walk
Algorithm (GEWA). An evolutionary walk is a random walk, but with a biased
selection towards optimality. This is a generalized framework for global optimization.
There are three major components in this algorithm: 1) global exploration by
randomization, 2) intensive local search by random walk, and 3) the selection of
the best with some elitism. The pseudo code of GEWA is shown below.
The random walk should be carried out around the current global best g ∗ so as
to exploit the system information such as the current best more effectively. We
have
xt+1 = g ∗ + w, (14.5)
and
w = εd, (14.6)
where ε is drawn from a Gaussian distribution or normal distribution N(0, σ 2 ),
and d is the step length vector which should be related to the actual scales of
independent variables. For simplicity, we can take σ = 1.
The randomization step can be achieved by x^{t+1} = L + (U − L) u,
where u is drawn from a uniform distribution Unif[0,1]. U and L are the upper
and lower bound vectors, respectively.
The balance between the local walk and the global randomization is controlled by
a parameter α, the probability of taking a local step. Typically, α ≈ 0.25 ∼ 0.7.
We will use α = 0.5 in our implementation. Interested readers can try to do some
parametric studies.
Again two important issues are: 1) the balance of intensification and diversi-
fication controlled by a single parameter α, and 2) the choice of the step size of
the random walk. Parameter α is typically in the range of 0.25 to 0.7. The choice
of the right step size is also important, as discussed in Section 4.3. The ratio of
the step size to its length scale can be determined by (4.17), which is typically
0.001 to 0.01 for most applications.
Another important issue is the selection of the best and/or elitism. As we
intend to discard the worst solution and replace it with a newly generated solution,
this implicitly weeds out the least-fit solutions, while the solution with the
highest fitness remains in the population. In this way, the selection of the best
and elitism are guaranteed implicitly in the evolutionary walkers.
Initialize a population of n walkers, evaluate them and find the current best g*
while (t < MaxGeneration)
if rand < α,
Local search: random walk around the current best
x^{t+1} = g* + ε d    (14.7)
else
Global search: randomization
end
Evaluate new solutions and find the current best g*_t;
t = t + 1;
end while
Postprocess results and visualization;
Furthermore, the number (n) of random walkers is also important. Too few
walkers are not efficient, while too many may lead to slow convergence. In general,
the choice of n should follow similar guidelines to those for other population-
based algorithms. Typically, we can use n = 15 to 50 for most applications.
function [bestsol,fval]=gewa(N_iter)
% Default number of iterations
if nargin<1, N_iter=5000; end
% Dimension or number of variables
d=3;
% Lower and upper bounds
Lb=-2*ones(1,d); Ub=2*ones(1,d);
% Number of walkers and the local/global balance
% (these values are assumed here, following the discussion in the text)
n=15; alpha=0.5;
% Random initial walkers and their fitness
ns=repmat(Lb,n,1)+rand(n,d).*repmat(Ub-Lb,n,1);
for i=1:n, fitness(i)=fobj(ns(i,:)); end
[fval,I]=min(fitness); sbest=ns(I,:);
% Iterations begin
for j=1:N_iter,
% Replace the worst walker (an assumption consistent with the text)
[tmp,k]=max(fitness);
if rand<alpha,
% Local search by random walk
ns(k,:)=rand_walk(sbest,Lb,Ub);
else
% Global search by randomization
ns(k,:)=randomization(Lb,Ub);
end
% Evaluate the new solution and update the current best
fitness(k)=fobj(ns(k,:));
if fitness(k)<fval, fval=fitness(k); sbest=ns(k,:); end
end
bestsol=sbest;
% Local random walk around the best; the 1% step scale is an assumption
function s=rand_walk(sbest,Lb,Ub)
s=sbest+0.01*(Ub-Lb).*randn(size(sbest));
s=max(min(s,Ub),Lb);
% Uniform randomization within the bounds, as in Eq. (14.1)
function s=randomization(Lb,Ub)
s=Lb+(Ub-Lb).*rand(size(Lb));
% Objective function
function z=fobj(u)
% Rosenbrock's 3D function
z=(1-u(1))^2+100*(u(2)-u(1)^2)^2+(1-u(3))^2;
% -------- end of the GEWA implementation -------
Eagle Strategy
Objective functions f1 (x), ..., fN (x)
Initialization and random initial guess xt=0
while (stop criterion)
Global exploration by randomization (e.g. Lévy flights)
Evaluate the objectives and find a promising solution
Intensive local search around a promising solution
via an efficient local optimizer (e.g. hill-climbing)
if (a better solution is found)
Update the current best
end
Update t = t + 1
end
Post-process the results and visualization.
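A minimal Matlab sketch of this two-stage strategy is given below; the heavy-tailed
(Cauchy-like) exploration steps, the number of global samples per stage and the use
of fminsearch (a standard Matlab routine) as the local optimizer are assumptions of
this sketch, not prescriptions of the text.
% A two-stage eagle-strategy sketch: heavy-tailed global exploration
% followed by an intensive local search with fminsearch.
function [best,fmin]=eagle_strategy(fobj,Lb,Ub,N_stage)
if nargin<4, N_stage=10; end
d=length(Lb);
best=Lb+(Ub-Lb).*rand(1,d); fmin=fobj(best);
for t=1:N_stage,
% Stage 1: global exploration with heavy-tailed jumps around the best
for i=1:20,
x=best+0.1*(Ub-Lb).*tan(pi*(rand(1,d)-0.5));
x=max(min(x,Ub),Lb);
fx=fobj(x);
if fx<fmin, best=x; fmin=fx; end
end
% Stage 2: intensive local search from the promising solution
% (fminsearch does not enforce the bounds; acceptable for a sketch)
[best,fmin]=fminsearch(fobj,best);
end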
If a set of such algorithms and the criteria for switching between them can be built
in such a way that the algorithms can be selected automatically and evolve
accordingly, then intelligent algorithms can be developed to solve complex
optimization problems efficiently.
There are a few other nature-inspired algorithms that are used in the literature.
Some such as Tabu search are widely used, while others are gaining momentum.
For example, both the photosynthetic algorithm and the enzyme algorithm are
very specialized algorithms. In this chapter, we will briefly outline the basic con-
cepts of these algorithms without implementation. Readers who are interested in
these algorithms can refer to recent research journals and conference proceedings
for more details.
For example, the photosynthetic algorithm is based on photosynthesis, the process
by which plants convert carbon dioxide and water into glucose and oxygen using
chloroplasts and light. The actual reaction is quite complicated, though it is often
simplified as the following overall reaction:
6 CO2 + 6 H2O + light → C6H12O6 + 6 O2.   (14.10)
Obviously, more and more metaheuristic algorithms will appear in the future.
Interested readers can follow the latest literature and research journals.
If we do not have good solutions at hand, it is always a good idea to learn
from nature.
Another important question is which algorithm to choose for a given problem.
This depends on many factors, such as the type of problem, the solution qual-
ity, the available computing resources, the time limit (before which a problem must
be solved), the balance of advantages and disadvantages of each algorithm (another
optimization problem!), and the expertise of the decision-makers. When we study
the algorithms, their efficiency and advantages as well as their disadvantages, to a
large extent, essentially determine the type of problem they can solve and their
potential applications. In general, for analytical function optimization problems,
nature-inspired algorithms should not be the first choice if analytical methods
work well. If the function is simple, we can use the stationary conditions (first
derivatives must be zero) and extreme points (boundaries) to find the optimal
solution(s). If this is not the best choice for a given problem, then calculus-based
algorithms such as the steepest descent method should be tried. If these options
fail, then nature-inspired algorithms can be used. On the other hand, for large-
scale, nonlinear, global optimization problems, modern approaches tend to use
metaheuristic algorithms (unless a particular algorithm has already worked very
well for the problem of interest).
Now the question is why almost all the examples of numerical algorithms
in this book (and in other books as well) are discussed using analytical functions.
This is mainly for the purpose of validating new algorithms. The standard test
functions such as Rosenbrock’s banana function and De Jong’s functions are
becoming standard tests for comparing new algorithms against established al-
gorithms because the latter have been well validated using these test functions.
This will provide a good standard for comparison.
Another important question is how to develop new algorithms. There are
many ways of achieving a good formulation of new algorithms. Two good and
successful ways are based on two basic ways of natural selection: explore new
strategies and inherit the fittest strategies. Therefore, the first way of developing
new algorithms is to design new algorithms based on new discoveries. The second
way is to formulate new algorithms by hybridization and crossover of existing ones.
Another successful strategy used by nature is the adaptation to its new envi-
ronment. We can also use the same strategy to modify existing algorithms and ex-
plore their new applications. For any new optimization problem we might meet,
we can try to modify existing successful algorithms to suit new applications,
by either changing the controlling parameters or introducing new functionality.
Many variants of numerical algorithms have been developed this way.
In order to develop completely new nature-inspired algorithms, we have to
observe, study and learn from nature. For example, I was always fascinated by
the cobwebs spun by spiders. For a given environment, how does a spider decide
to spin a web with such regularity so as to maximize the chance of catching some
food? Are the cobweb’s location and pattern determined by the airflow? Surely,
these cobwebs are not completely random? Can we design a new algorithm – the
spider algorithm?
Nature provides almost unlimited ways for problem-solving. If we observe
carefully, we will surely be inspired to develop more powerful and efficient new-
generation algorithms. Intelligence is a product of biological evolution in nature.
Ultimately some intelligent algorithms (or systems) may appear in the future, so
that they can evolve and optimally adapt to solve NP-hard optimization problems
efficiently and intelligently.
REFERENCES
1. Bersini H. and Varela F. J., Hints for adaptive problem solving gleaned from
immune networks, Parallel Problem Solving from Nature, PPSW1, Dort-
mund, FRG, (1990).
2. Blum, C. and Roli, A., Metaheuristics in combinatorial optimization: Overview
and conceptual comparison, ACM Comput. Surv., 35, 268-308 (2003).
3. Farmer J.D., Packard N. and Perelson A., The immune system, adaptation
and machine learning, Physica D, 2, 187-204 (1986).
4. Moscato, P. On Evolution, Search, Optimization, Genetic Algorithms and
Martial Arts: Towards Memetic Algorithms. Caltech Concurrent Computa-
tion Program (report 826), (1989).
5. Rubinstein R.Y., Optimization of computer simulation models with rare
events, European Journal of Operations Research, 99, 89-112 (1997).
6. Passino K. M., Biomimicry of bacterial foraging for distributed optimization
and control, IEEE Control System Magazine, pp. 52-67 (2002).
7. Schoen, F., 1993. A wide class of test functions for global optimization, J.
Global Optimization, 3, 133-137.
8. Shilane D., Martikainen J., Dudoit S., Ovaska S. J., 2008. A general frame-
work for statistical performance comparison of evolutionary computation al-
gorithms, Information Sciences: an Int. Journal, 178, 2870-2879 (2008).
9. Yang, X. S. and Deb, S., 2009. Cuckoo search via Lévy flights, Proceedings of
World Congress on Nature & Biologically Inspired Computing (NaBIC 2009,
India), IEEE Publications, USA, pp. 210-214 (2009).
10. Yang X. S., 2009. Harmony search as a metaheuristic algorithm, in: Music-
Inspired Harmony Search: Theory and Applications (Eds Z. W. Geem),
Springer, pp.1-14.
11. Yang X. S. and Deb S., Eagle strategy using Lévy walk and firefly algorithms
for stochastic optimization, in: Nature Inspired Cooperative Strategies for
Optimization (NICSO 2010) (Eds. J. R. Gonzalez et al.), Springer, SCI 284,
101-111 (2010).
REFERENCES
12. Deb K., Optimisation for Engineering Design: Algorithms and Examples,
Prentice-Hall, New Delhi, (1995).
13. Dorigo M., Optimization, Learning and Natural Algorithms, PhD thesis,
Politecnico di Milano, Italy, (1992).
14. Dorigo M. and Stützle T., Ant Colony Optimization, MIT Press, Cambridge,
(2004).
15. El-Beltagy M. A., Keane A. J., A comparison of various optimization al-
gorithms on a multilevel problem, Engin. Appl. Art. Intell., 12, 639-654
(1999).
16. Engelbrecht A. P., Fundamentals of Computational Swarm Intelligence, Wi-
ley, (2005).
17. Fathian M., Amiri B., Maroosi A., Application of honey-bee mating opti-
mization algorithm on clustering, Applied Mathematics and Computation,
190, 1502-1513 (2007).
18. Flake G. W., The Computational Beauty of Nature, MIT Press, (1998).
19. Fogel L. J., Owens A. J., and Walsh M. J., Artificial Intelligence Through
Simulated Evolution, Wiley, (1966).
20. Fowler A. C., Mathematical Models in the Applied Sciences, Cambridge
University Press, (1997).
21. Geem Z. W., Kim J. H., and Loganathan G. V., A new heuristic optimiza-
tion algorithm: Harmony search, Simulation, 76, 60-68 (2001).
22. Gill P. E., Murray W., and Wright M. H., Practical optimization, Academic
Press Inc, (1981).
23. Glover F., Heuristics for Integer Programming Using Surrogate Constraints,
Decision Sciences, 8, 156-166 (1977).
24. Glover F. and Laguna M., Tabu Search, Kluwer Academic, (1997).
25. Goldberg D. E., Genetic Algorithms in Search, Optimisation and Machine
Learning, Reading, Mass.: Addison Wesley (1989).
26. Haddad O. B., Afshar A., Marino M. A., Honey bees mating optimization
algorithm (HBMO), in: First Int. Conf. on Modelling, Simulation & Appl.
Optimization, UAE, (2005).
27. Holland J., Adaptation in Natural and Artificial Systems, University of
Michigan Press, Ann Arbor, (1975).
28. Jaeggi D., Parks G. T., Kipouros T., Clarkson P. J., A multi-objective Tabu
search algorithm for constrained optimization problem, 3rd Int. Conf. Evol.
Multi-Criterion Optimization, 3410, 490-504 (2005).
29. Pearl J., Heuristics, Addison-Wesley, (1984).
30. Karaboga D. and Basturk B., On the performance of artificial bee colony
(ABC) algorithm, Applied Soft Computing, 8, 687-697 (2008).
31. Keane A. J., Genetic algorithm optimization of multi-peak problems: stud-
ies in convergence and robustness, Artificial Intelligence in Engineering, 9,
75-83 (1995).
52. Sawaragi Y., Nakayama H., Tanino T., Theory of Multiobjective Optimisa-
tion, Academic Press, (1985).
53. Schrijver A., On the history of combinatorial optimization (till 1960), in:
Handbook of Discrete Optimization (Eds K. Aardal, G. L. Nemhauser, R.
Weismantel), Elsevier, Amsterdam, p.1-68 (2005).
54. Sirisalee P., Ashby M. F., Parks G. T., and Clarkson P. J.: Multi-criteria
material selection in engineering design, Adv. Eng. Mater., 6, 84-92 (2004).
55. Siegelmann H. T. and Sontag E. D., Turing computability with neural nets,
Appl. Math. Lett., 4, 77-80 (1991).
56. Seeley T. D., The Wisdom of the Hive, Harvard University Press, (1995).
57. Seeley T. D., Camazine S., Sneyd J., Collective decision-making in honey
bees: how colonies choose among nectar sources, Behavioural Ecology and
Sociobiology, 28, 277-290 (1991).
58. Spall J. C., Introduction to Stochastic Search and optimization: Estimation,
Simulation, and Control, Wiley, Hoboken, NJ, (2003).
59. Storn R., On the usage of differential evolution for function optimization,
Biennial Conference of the North American Fuzzy Information Processing
Society (NAFIPS), pp. 519-523 (1996).
60. Storn R., web pages on differential evolution with various programming
codes, http://www.icsi.berkeley.edu/∼storn/code.html
61. Storn R. and Price K., Differential evolution - a simple and efficient heuristic
for global optimization over continuous spaces, Journal of Global Optimiza-
tion, 11, 341-359 (1997).
62. Swarm intelligence, http://www.swarmintelligence.org
63. Talbi E. G., Metaheuristics: From Design to Implementation, Wiley, (2009).
64. Vapnik V., The Nature of Statistical Learning Theory, Springer, (1995).
65. Wolpert D. H. and Macready W. G., No free lunch theorems for optimiza-
tion, IEEE Trans. on Evol. Computation, 1, 67-82 (1997).
66. Wikipedia, http://en.wikipedia.org
67. Yang X. S., Engineering optimization via nature-inspired virtual bee algo-
rithms, IWINAC 2005, Lecture Notes in Computer Science, 3562, 317-323
(2005).
68. Yang X. S., Biology-derived algorithms in engineering optimization (Chap-
ter 32), in Handbook of Bioinspired Algorithms, edited by Olariu S. and
Zomaya A., Chapman & Hall / CRC, (2005).
69. Yang X. S., New enzyme algorithm, Tikhonov regularization and inverse
parabolic analysis, in: Advances in Computational Methods in Science and
Engineering, ICCMSE 2005, 4, 1880-1883 (2005).
70. Yang X. S., Lees J. M., Morley C. T.: Application of virtual ant algorithms
in the optimization of CFRP shear strengthened precracked structures, Lec-
ture Notes in Computer Sciences, 3991, 834-837 (2006).
71. Yang X. S., Firefly algorithms for multimodal optimization, 5th Symposium
on Stochastic Algorithms, Foundations and Applications, SAGA 2009, Eds.
O. Watanabe & T. Zeugmann, LNCS, 5792, 169-178 (2009).
72. Yang X. S. and Deb S., Cuckoo search via Lévy flights, in: Proc. of World
Congress on Nature & Biologically Inspired Computing (NaBIC 2009), IEEE
Publications, USA, pp. 210-214 (2009).
73. Yang X. S. and Deb S., Engineering optimization by cuckoo search, Int. J.
Math. Modelling & Num. Optimization, 1, 330-343 (2010).
74. Yang X. S., A new metaheuristic bat-inspired algorithm, in: Nature Inspired
Cooperative Strategies for Optimization (NICSO 2010) (Eds. J. R. Gonzalez
et al.), Springer, SCI 284, 65-74 (2010).
75. Yang X. S. and Deb S., Eagle strategy using Lévy walk and firefly algorithms
for stochastic optimization, in: Nature Inspired Cooperative Strategies for
Optimization (NICSO 2010) (Eds. J. R. Gonzalez et al.), Springer, SCI 284,
101-111 (2010).