An Introduction to Genetic Algorithms for Numerical Optimization
NCAR/TN-450+IA
NCAR TECHNICAL NOTE
March 2002
Paul Charbonneau
TABLE OF CONTENTS
List of Figures
List of Tables
Preface
1. Introduction: Optimization
   1.1 Optimization and hill climbing
   1.2 The simplex method
   1.3 Iterated simplex
   1.4 A set of test problems
   1.5 Performance of the simplex and iterated simplex methods
2. Evolution, optimization, and genetic algorithms
   2.1 Biological evolution
   2.2 The power of cumulative selection
   2.3 A basic genetic algorithm
   2.4 Information transfer in genetic algorithms
3. PIKAIA: A genetic algorithm for numerical optimization
   3.1 Overview and problem definition
   3.2 Minimal algorithmic components
   3.3 Additional components
   3.4 A case study: GA2 on P1
   3.5 Hamming walls and creep mutation
   3.6 Performance on test problems
4. A real application: orbital elements of binary stars
   4.1 Binary stars
   4.2 Radial velocities and Keplerian orbits
   4.3 A genetic algorithm solution using PIKAIA
5. Final thoughts and further readings
   5.1 To cross over or not to cross over?
   5.2 Hybrid methods
   5.3 When should you use genetic algorithms?
   5.4 Further readings
Bibliography
PREFACE
The Tutorial Page also includes various animations for some of the solutions discussed in the text. The PIKAIA Web Page contains links to the HAO ftp archive, from which you can obtain, in addition to the source code for PIKAIA, a User's Guide, as well as source codes for the various examples discussed therein. The idea behind all this is that by the time you are done reading through this paper and doing the Exercises, you should be in good shape to solve global numerical optimization problems you might encounter in your own research.
The writing of this preface offers a fine opportunity to thank my friends and colleagues Viggo Hansteen and Mats Carlsson for their invitation and financial support to attend their 1998 Mini-Workshop on Numerical Methods in Astrophysics, as well as for their kind hospitality during my extended stay in Norway. The CrB data and some source codes for the orbital element fitting problem of §4 were provided by Tim Brown, who was also generous with his time in explaining to me some of the subtleties of orbital element determinations. Thorough readings of the 1998 draft of this paper by Sandy and Gene Arnn, Tim Brown, Sarah Gibson, Barry Knapp and Hardi Peter are also gratefully acknowledged.
Throughout my twelve years working at NCAR's High Altitude Observatory, it has been my privilege to interact with a large number of bright and enthusiastic students and postdocs. My forays into genetic algorithms have particularly benefited from such collaborators. Since 1995, I have had to keep up in turn with Ted Kennelly, Sarah Gibson, Hardi Peter, Scott McIntosh, and Travis Metcalfe. I thank them all for keeping me on my toes all this time.
Paul Charbonneau
March 2002, Boulder
1. INTRODUCTION: OPTIMIZATION
Figure 2: Two-dimensional surface f(x, y), with x, y ∈ [0, 1], defining a hard maximization problem. The global maximum is f(x, y) = 1 at (x, y) = (0.5, 0.5), and is indicated by the arrow.
Rule of Global Optimization, also known as
Faced with the landscape of Figure 2, the most straightforward solution lies with a technique called iterated hill climbing. This is a fancy name for something very simple, as illustrated on Figure 3. You just run your favorite local hill climbing method repeatedly, each time from a different randomly chosen starting point. While doing so you keep track of the various maxima so located, and once you are satisfied that all maxima have been found you pick the tallest one and you are done with your global optimization problem. As you might imagine, deciding when to stop is the crux of this otherwise straightforward procedure.
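To make the procedure concrete, here is a minimal, self-contained Fortran sketch of iterated hill climbing. The local climber is deliberately crude (accept only uphill random steps), the trial and step counts are arbitrary illustrative choices, and the function f is the test function of Figure 2 (its closed form is given as eq. [1] in §1.4); this is a sketch, not a production implementation.

  program iterhill
    implicit none
    integer, parameter :: ntrial = 500, nstep = 1000
    real, parameter :: pi = 3.1415927, sig2 = 0.15
    real :: x(2), xtry(2), xbest(2), fcur, ftry, fbest
    integer :: i, j
    fbest = -1.0
    do i = 1, ntrial
       call random_number(x)                ! random restart in [0,1]^2
       fcur = f(x)
       do j = 1, nstep                      ! crude local climb:
          call random_number(xtry)
          xtry = x + 0.01*(2.0*xtry - 1.0)  ! small random step...
          xtry = max(0.0, min(1.0, xtry))   ! ...kept inside the unit square
          ftry = f(xtry)
          if (ftry > fcur) then             ! accept only uphill moves
             x = xtry
             fcur = ftry
          end if
       end do
       if (fcur > fbest) then               ! keep track of the tallest peak
          fbest = fcur
          xbest = x
       end if
    end do
    print *, 'best maximum found:', fbest, ' at ', xbest
  contains
    real function f(p)                      ! eq. (1): cos^2(n pi r) exp(-r^2/sigma^2)
      real, intent(in) :: p(2)
      real :: r2
      r2 = (p(1) - 0.5)**2 + (p(2) - 0.5)**2
      f = cos(9.0*pi*sqrt(r2))**2 * exp(-r2/sig2)
    end function f
  end program iterhill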
iterations. And indeed it does: repeatedly running the simplex (500 times) on the test problem of Figure 2 leads to the central peak being located in 99.5% of trials⁵. The price to pay, of course, is in the number of function evaluations required to achieve this level of global performance: nearly 10⁴ function evaluations per iterated simplex run, on average⁶. Welcome back to the No Free Lunch Rule...
⁵ It is recommended practice when using the simplex in single-run mode to carry out a random restart once the simplex has converged; this entails reinitializing randomly all but one of the converged simplex's vertices, and letting the simplex reconverge. What is described here as iterated simplex consists in reinitializing all vertices randomly, so as to make each successive trial fully independent of all others.
⁶ A single simplex move may entail more than one function evaluation. For
1.4 A set of test problems
One should rightfully suspect that the simplex method's performance on the
test problem of Figure 2 might not be representative of its performance on other
problems. This very legitimate concern will evidently carry over to the various
genetic algorithm-based optimization schemes discussed further below. It will
therefore prove useful to have available not just one, but a set of test problems. The
four test problems described below are all very hard global optimization problems,
on which most conventional local optimization algorithms would fail miserably.
Also keep in mind that it is always possible to design a test problem that will
defeat any global optimization method⁷.
1.4.1 P1: maximizing a function of two variables [2 parameters]
Our first test problem (hereafter labeled "P1") is our now familiar 2-D landscape of Figure 2. Mathematically, it is defined as

    f(x, y) = \cos^2(n\pi r)\,\exp(-r^2/\sigma^2),                    (1a)

    r^2 = (x - 0.5)^2 + (y - 0.5)^2, \qquad x, y \in [0, 1],          (1b)

where n = 9 and σ² = 0.15 are constants. The global maximum is located at (x, y) = (0.5, 0.5), with f(x, y) = 1.0. This global maximum is surrounded by concentric rings of secondary maxima, centered on the global maximum at radial distances

    r_{\max} = \{0.110192, 0.220385, 0.330582, 0.440782, 0.550986\}.  (2)

Between these are located another series of concentric rings corresponding to minima:

    r_{\min} = (m - 1/2)/n, \qquad m = 1, \ldots, 6.                  (3)

The error ε associated with a given solution (x, y) can be defined as

    ε = 1 - f(x, y).                                                  (4)
example, if the trial move does not lead to an increase in f, the move might be repeated with a halved or doubled displacement length (or a different type of move might be attempted, depending on implementation). On the maximization problem of Figure 2, one simplex move requires 1.8 function evaluations, on average.
⁷ The high-n, high-D version of the fractal function discussed in §3.5 of Bäck (1996) is a pretty good candidate for the ultimate killer test problem.
Note that the "peak" corresponding to the global maximum covers a surface area π/(4n²) in parameter space. If a hill climbing scheme were used, the probability of a randomly chosen starting point landing close enough to this peak for the method to locate the true global maximum is only about 1% for n = 9.
1.4.2 P2: maximizing a function of two variables, again [2 parameters]
Test function P2, shown on Figure 5, is again a 2-D landscape to be maximized. It is defined by

    f(x, y) = 0.8\,\exp(-r_1^2/(0.3)^2) + 0.879008\,\exp(-r_2^2/(0.03)^2),   (5a)

    r_1^2 = (x - 0.5)^2 + (y - 0.5)^2,                                       (5b)

    r_2^2 = (x - 0.6)^2 + (y - 0.1)^2.                                       (5c)

The maximum f(x, y) = 1 is at (x, y) = (0.6, 0.1), and corresponds to the peak of the second, narrower Gaussian. P2 is about as hard a global optimization problem as P1 (the simplex succeeds 141 times out of 10⁴ trials), but for a different reason. There are now only two local maxima, with the global maximum again covering about 1% of parameter space. Unlike P1, where moving toward successively higher secondary extrema actually brings one closer to the true maximum, with P2 moving to the secondary maximum pulls solutions away from the global maximum. Problems exhibiting this characteristic are sometimes called "deceptive" in the optimization literature.
1.4.3 P3: maximizing a function of four variables [4 parameters]
Test problem P3 is a direct generalization of P1 to four independent variables (w, x, y, z):

    f(w, x, y, z) = \cos^2(n\pi r)\,\exp(-r^2/\sigma^2),                                  (6a)

    r^2 = (w - 0.5)^2 + (x - 0.5)^2 + (y - 0.5)^2 + (z - 0.5)^2, \qquad w, x, y, z \in [0, 1],   (6b)

again with n = 9 and σ² = 0.15. Comparing performance on P1 and P3 will provide a measure of scalability of the method under consideration, namely how performance degrades as parameter space dimensionality is increased, everything else being equal. P3 is a very hard global optimization problem; the simplex method manages to find the global maximum only 6 times out of 10⁵ trials.
1.4.4 P4: Minimizing a least squares residual [6 parameters]
Our fourth and final test problem is defined as a "real" nonlinear least squares fitting problem. Consider a function of one variable (x) defined as the sum of two Gaussians:

    y(x) = \sum_{j=1}^{2} A_j \exp\!\left( -(x - x_j)^2 / \sigma_j^2 \right).   (7)

The task is to minimize the residual

    R(A_1, x_1, \sigma_1, A_2, x_2, \sigma_2) = \sum_{k=1}^{K} \left[ y_k - y(x_k; A_1, x_1, \sigma_1, A_2, x_2, \sigma_2) \right]^2   (8)

with respect to the 6 parameters defining the two Gaussians. If one is told a priori that two Gaussians are to be fit to the data, then this residual minimization problem is obviously equivalent to a 6-D function maximization problem for 1/R (say), which simply defines a function in 6-D space. Figure 6 shows the dataset generated using the parameter set

    [A_1, x_1, \sigma_1, A_2, x_2, \sigma_2] = [0.9, 0.3, 0.1, 0.3, 0.8, 0.025].   (9)
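For concreteness, here is a short Fortran sketch of the residual evaluation of eqs. (7)-(8); the data arrays xk and yk (the K points of the Figure 6 dataset) are assumed to be supplied by the caller, and the packing order of the six parameters into par is an arbitrary choice made here.

  real function residual(par, xk, yk, nk)
    implicit none
    integer, intent(in) :: nk            ! number of data points, K
    real, intent(in) :: par(6)           ! (A1, x1, sigma1, A2, x2, sigma2)
    real, intent(in) :: xk(nk), yk(nk)   ! the dataset of Figure 6
    real :: ymod
    integer :: k, j
    residual = 0.0
    do k = 1, nk
       ymod = 0.0
       do j = 1, 2                       ! eq. (7): sum of two Gaussians
          ymod = ymod + par(3*j-2)*exp(-(xk(k) - par(3*j-1))**2/par(3*j)**2)
       end do
       residual = residual + (yk(k) - ymod)**2   ! eq. (8)
    end do
  end function residual

A genetic-algorithm-based optimizer would then maximize, e.g., 1/R built from this function, as described in the text.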
global maximum is

    p_G = 1 - (1 - p)^{N_t}. \qquad [Iterated hill climbing]   (10)

On the basis of eq. (10) one would predict global performances (0.884, 0.930, 0.029, 0.927) on P1 through P4, given the number of hill climbing iterations listed in the rightmost column of Table 1, which compares quite well with the actual measured global performance. One can also rewrite eq. (10) as
Doesn't look much like the original sentence... although careful comparison will show that two letters actually coincide. The total number of distinct 27-character-long sentences that can be made out of a 30-character alphabet is 30²⁷ ≃ 7.63 × 10³⁹. This is a very large number, even by astronomical standards. The corresponding probability of generating our first, target sentence by this random process on the first trial is then (30)⁻²⁷ ≃ 10⁻⁴⁰. This is such a small number that invoking the Dirty Harry Rule at this point would be moot. Instead consider the following procedure:
(1) Generate 10 sentences of 27 randomly chosen characters;
(2) Select the sentence that has the most correct letters;
(3) Duplicate this best sentence ten times;
(4) For each such duplicate, randomly replace a few letters¹⁰;
(5) Repeat steps (2) through (4) until the target sentence has been matched (a runnable sketch of this procedure follows the list).
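In Fortran, the procedure might be sketched as follows. The 27-character target sentence below is a made-up stand-in (the actual target sentence of Figure 7 is not reproduced here); the 30-character alphabet and the mutation rate p = 0.01 follow the text, while all variable names and the iteration cap are illustrative.

  program sentence_search
    implicit none
    integer, parameter :: nlen = 27, npop = 10
    real, parameter :: pmut = 0.01            ! mutation rate p
    character(len=30) :: alpha                ! 30-character alphabet
    character(len=nlen) :: target, pop(npop), best
    real :: u
    integer :: i, j, it, score, sbest
    alpha = 'abcdefghijklmnopqrstuvwxyz .,;'
    target = 'it is a riddle wrapped in a'    ! hypothetical 27-char target
    do i = 1, npop                            ! step (1): random sentences
       do j = 1, nlen
          call random_number(u)
          pop(i)(j:j) = alpha(1+int(30.0*u):1+int(30.0*u))
       end do
    end do
    do it = 1, 100000
       sbest = -1                             ! step (2): pick best sentence
       do i = 1, npop
          score = 0
          do j = 1, nlen
             if (pop(i)(j:j) == target(j:j)) score = score + 1
          end do
          if (score > sbest) then
             sbest = score
             best = pop(i)
          end if
       end do
       print *, 'iteration', it, '  error =', nlen - sbest
       if (sbest == nlen) exit                ! step (5): target matched
       do i = 1, npop
          pop(i) = best                       ! step (3): duplicate the best
          do j = 1, nlen                      ! step (4): mutate a few letters
             call random_number(u)
             if (u < pmut) then
                call random_number(u)
                pop(i)(j:j) = alpha(1+int(30.0*u):1+int(30.0*u))
             end if
          end do
       end do
    end do
  end program sentence_search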
This search algorithm incorporates the three ingredients mentioned previously as essential to the evolutionary process. Step (2) is natural selection, in fact in a deterministic and rather extreme form, since the best and only the best acts as progenitor to the next "generation". Step (3) is inheritance, again of a rather extreme form, as offspring start off as exact replicas of the (single) progenitor. Step (4) is a stochastic process which provides the required variability. Note also that the algorithm operates with minimal "fitness" information: all it has available is how many correct letters a sentence contains, but not which letters are correct or incorrect. What is still missing is exchange of information between trial solutions, but be patient, this will come in due time.
Figure 7 illustrates the "evolution" of the best-of-10 sentence, starting from an initial ten random sentences, as described above. The mutation rate was set at p = 0.01, meaning that any given letter has a probability 0.01 of being subjected to random replacement. Iteration count is listed in the leftmost column, and error in the rightmost column. Error is defined here simply as the number of incorrect letters in the best sentence generated in the course of the current iteration. Note how the error decreases rather rapidly at first, but much more slowly later on; it takes about as many iterations to get the first 20 letters right as it takes to get the last one. The target sentence is found after only 918 iterations, in the course
¹⁰ More precisely, define a mutation rate as the probability p (∈ [0, 1]) that a given constituent letter be randomly replaced.
Figure 8: Convergence curves for the sentence search problem, for three different mutation rates. The curves show the error associated with the best sentence produced at each iteration. The solid line corresponds to the solution shown on Figure 7.
is a mixed blessing. It is clearly needed as a source of variability, but too much of it is definitely deleterious. Second, the general shape of the convergence curves in Figure 8 is worth noting. Convergence is rather swift at first, but then levels off. This is a behavior we will meet again and again in what follows. Time now to move on, finally, to genetic algorithms.
environment without self-destructing, and so rapidly take over the soup (see Eigen 1971 for a comprehensive though somewhat dated review). This is conjectured to be the explanation behind the universality of the genetic code among very nearly all living organisms.
2.3 A basic genetic algorithm
Fundamentally, genetic algorithms are a class of search techniques that use simplified forms of the biological processes of selection/inheritance/variation. Strictly speaking they are not optimization methods per se, but can be used to form the core of a class of robust and flexible methods known as genetic algorithm-based optimizers.
Let's go back to a generic optimization problem. One is given a "model" that depends on a set of parameters u, and a functional relation f(u) that returns a measure of quality, or fitness, associated with the corresponding model (this could be a χ²-type goodness-of-fit measure if the model is compared to data, for example). The optimization task usually consists in finding the "point" u in parameter space corresponding to the model that maximizes the fitness function f(u). Define now a population as a set of Np realizations of the parameters u. A top-level view of a basic genetic algorithm is then as follows:
(1) Randomly initialize the population and evaluate the fitness of its members;
(2) Breed selected members of the current population to produce an offspring population (selection based on fitness);
(3) Replace the current population by the offspring population;
(4) Evaluate the fitness of the new population members;
(5) Repeat steps (2) through (4) until the fittest member of the current population is deemed fit enough (a minimal runnable skeleton of these five steps is sketched below).
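Here is one way the five steps above might be fleshed out into a runnable Fortran skeleton. The fitness function, the selection rule (a simple two-way tournament, one common fitness-based choice; PIKAIA's own rank-based scheme is described in §3.2.1), and the real-coded breeding operators are placeholders for illustration only; the string-based crossover and mutation actually used by genetic algorithms are described next.

  program basic_ga
    implicit none
    integer, parameter :: np = 100, nd = 2, ngen = 200
    real :: pop(nd,np), newpop(nd,np), fit(np)
    integer :: ip, ig, i1, i2
    call random_number(pop)                   ! (1) random initial population
    do ip = 1, np
       fit(ip) = fitness(pop(:,ip))           !     ...and evaluate its fitness
    end do
    do ig = 1, ngen                           ! generational (outer) loop
       do ip = 1, np, 2                       ! (2) breed: two offspring per pass
          i1 = pick(fit)                      !     fitness-based selection
          i2 = pick(fit)
          call breed(pop(:,i1), pop(:,i2), newpop(:,ip), newpop(:,ip+1))
       end do
       pop = newpop                           ! (3) replace current population
       do ip = 1, np
          fit(ip) = fitness(pop(:,ip))        ! (4) evaluate new members
       end do
       print *, ig, maxval(fit)               ! (5) here: fixed generation count
    end do
  contains
    real function fitness(x)                  ! placeholder smooth test function
      real, intent(in) :: x(nd)
      fitness = exp(-sum((x - 0.5)**2)/0.15)
    end function fitness
    integer function pick(f)                  ! two-way tournament: the fitter
      real, intent(in) :: f(np)               ! of two random picks is selected
      real :: u
      integer :: a, b
      call random_number(u); a = 1 + int(np*u)
      call random_number(u); b = 1 + int(np*u)
      pick = merge(a, b, f(a) >= f(b))
    end function pick
    subroutine breed(p1, p2, c1, c2)          ! placeholder: parameter-wise
      real, intent(in)  :: p1(nd), p2(nd)     ! swap plus random-replacement
      real, intent(out) :: c1(nd), c2(nd)     ! mutation (NOT the one-point
      real :: u                               ! string crossover of Fig. 9)
      integer :: k
      do k = 1, nd
         call random_number(u)
         if (u < 0.5) then
            c1(k) = p1(k);  c2(k) = p2(k)
         else
            c1(k) = p2(k);  c2(k) = p1(k)
         end if
         call random_number(u)
         if (u < 0.01) call random_number(c1(k))
         call random_number(u)
         if (u < 0.01) call random_number(c2(k))
      end do
    end subroutine breed
  end program basic_ga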
Were it not that what is being cycled through the iteration is a population of solutions rather than a single trial solution, this would very much smell of iterated hill climbing. It should also give you that uncanny feeling of déjà vu, unless your memory is really shot or, shame on you, you have skipped over the preceding section. The crucial novelty lies with step (2): breeding. It is in the course of breeding that information is passed and exchanged across population members. How this information transfer takes place is rather peculiar, and merits discussion in some detail, and not only because this is where genetic algorithms justify the "genetic" in their name.
Figure 9 illustrates the breeding process in the context of a simple 2-D maximization problem, such as the P1 or P2 test problems. In this case an individual is an (x, y) point, and so is "defined" by two floating-point numbers. The first step is to encode the two floating-point numbers defining each individual selected for breeding. Here this is done simply by dropping the decimal point and concatenating the resulting set of simple decimal integers into a "chromosome"-like string (lines 01-06 on Figure 9). Breeding proper is a two-step process. The first step
is crossover. The two strings generated by the encoding process are laid side by side, and a cutting point is randomly selected along the length of the defining strings. The string fragments located right of the cutting point are then interchanged, and spliced onto the fragments originally located left of the cutting point (lines 07-12, for a cutting point located between the third and fourth decimal digit). The second breeding step is mutation. For each string produced by the crossover process, a few randomly selected digits (or "genes") are replaced by a new, randomly selected digit value (lines 13-16, for a mutation hitting the tenth digit of the second offspring string). The resulting fragments are then decoded into two (x, y) pairs, whose fitness is then evaluated, here simply by computing the function value f(x, y).
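A minimal Fortran sketch of this encode/crossover/mutate/decode cycle follows. The five-digits-per-parameter encoding and the single mutation applied to the second offspring mirror the example of Figure 9, but all names, sizes, and rates here are illustrative choices.

  program breeding_demo
    implicit none
    integer, parameter :: ndig = 5, nd = 2, nl = nd*ndig   ! string length
    real :: p1(nd), p2(nd), c1(nd), c2(nd), u
    integer :: g1(nl), g2(nl), icut, imut, itmp, k
    call random_number(p1)                  ! two parent (x,y) points
    call random_number(p2)
    call encode(p1, g1)                     ! drop decimal point, concatenate
    call encode(p2, g2)
    call random_number(u)
    icut = 1 + int((nl-1)*u)                ! crossover: random cutting point
    do k = icut+1, nl                       ! swap fragments right of the cut
       itmp = g1(k);  g1(k) = g2(k);  g2(k) = itmp
    end do
    call random_number(u)
    imut = 1 + int(nl*u)                    ! mutation: one random digit site,
    call random_number(u)                   ! applied here to the second
    g2(imut) = int(10.0*u)                  ! offspring, as in lines 13-16
    call decode(g1, c1)                     ! decode offspring back to (x,y)
    call decode(g2, c2)
    print *, 'parents  :', p1, p2
    print *, 'offspring:', c1, c2
  contains
    subroutine encode(p, g)
      real, intent(in) :: p(nd)
      integer, intent(out) :: g(nl)
      integer :: i, j, iv
      do i = 1, nd
         iv = int(p(i)*10.0**ndig)          ! 0.12345 -> 12345
         do j = ndig, 1, -1
            g((i-1)*ndig+j) = mod(iv, 10)
            iv = iv/10
         end do
      end do
    end subroutine encode
    subroutine decode(g, p)
      integer, intent(in) :: g(nl)
      real, intent(out) :: p(nd)
      integer :: i, j, iv
      do i = 1, nd
         iv = 0
         do j = 1, ndig
            iv = 10*iv + g((i-1)*ndig+j)
         end do
         p(i) = real(iv)/10.0**ndig         ! 12345 -> 0.12345
      end do
    end subroutine decode
  end program breeding_demo

Note that the decoded offspring necessarily land back in [0, 1), which is the range-preservation property discussed below.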
Some additional comments are in order. First, note that offspring incorporate intact "chunks" of genetic material coming from both parents; that's the needed inheritance, as well as the promised exchange of information between trial solutions. However, both the crossover and mutation operations also involve purely stochastic components, such as the choice of cutting point, site of mutation, and new value of the mutated digit. This is where we get the variability needed to sustain the evolutionary process, as discussed earlier. Second, the encoding/decoding process illustrated on Figure 9 is just one of many possible such schemes. Traditionally, genetic algorithms have made use of binary encoding, but this is often not particularly advantageous for numerical optimization. The use of a decimal genetic "alphabet" is no more artificial than a binary representation, even more so given that very nearly all known living organisms encode their genetic information in a base-4 alphabet. In fact, in terms of encoding floating-point numbers, both binary and decimal alphabets suffer from significant shortcomings that can affect the performance of the resulting optimization algorithms. Third, the crossover and mutation operators, operating in conjunction with the encoding/decoding processes as illustrated on Figure 9, preserve the total range in parameter space. That is, if the floating-point parameters defining parent solutions are restricted to the range [0.0, 1.0], then the offspring solution parameters will also be restricted to [0.0, 1.0]. This is a very important property, through which one can effortlessly hardwire constraints such as positivity. Fourth, having the mutation operator act on the encoded form of the parent solution has the interesting consequence that offspring can differ very much or very little from their parents, depending on whether the digits affected by mutation decode into one of the leading or trailing digits of the corresponding floating-point number. This means that from the point of view of parameter space exploration, a genetic algorithm can carry out both wide exploration and fine tuning in parallel. Fifth, it takes two parents to produce (simultaneously) two offspring. One can of course devise orgiastic breeding schemes that involve more than two parents and yield any number of offspring.
Experience shows that this rarely improves the performance of the resulting algorithms. Sixth, f(u) must obviously be computable for all u, but not necessarily
differentiable, since derivatives of the fitness function with respect to its input parameters are not required for the algorithm to operate. From a practical point of view, this can be a great advantage.
2.4 Information transfer in genetic algorithms
Time to step back and revisit the issue of information processing. Genetic algorithms achieve transfer of information through the breeding of trial solutions selected on the basis of their fitness, which is why the crossover operator is usually deemed to be the defining feature of genetic algorithms, as compared to other classes of evolutionary algorithms (see, e.g., Bäck 1996).
The joint action of crossover and fitness-based selection on a population of strings encoding trial solutions is to increase the occurrence frequency of substrings that convey their decoded trial solution above-average fitness, at a rate proportional to the difference between the average fitness of all trial solutions including that substring in their "genotype" (i.e., the string-encoded version of their defining parameter set), and the average fitness of the whole population. The mathematical expression of the preceding mouthful, adequately expanded to take into account the possibility of substring disruption by crossover or mutation, is known as the Schema Theorem, and is originally due to Holland (1975; see also Goldberg 1989). As the population evolves in response to breeding and fitness-based selection, advantageous substrings are continuously sorted and combined by crossover into single individuals, leading to an inexorable fitness increase in the population as a whole. Because this involves the concurrent processing of a great many distinct substrings, Holland dubbed this property intrinsic parallelism, and argued that therein fundamentally lies the exploratory power of genetic algorithms.
3. PIKAIA: A GENETIC ALGORITHM FOR NUMERICAL OPTIMIZATION

This section opens with a brief overview of the operators and techniques included in PIKAIA. Internally, PIKAIA seeks to maximize a user-defined function f(x) in a bounded n-dimensional space, i.e.,

    \mathbf{x} \equiv (x_1, x_2, \ldots, x_n), \qquad x_k \in [0.0, 1.0] \quad \forall k.   (12)
The restriction of parameter values to the range [0.0, 1.0] allows greater flexibility and portability across problem domains. This, however, implies that the user must adequately normalize the input parameters of the function to be maximized with respect to those bounds.
The maximization is carried out on a population made up of Np individuals (trial solutions). This population size remains fixed throughout the evolution. Rather than evolving the population until some tolerance criterion is satisfied, PIKAIA carries the evolution over a user-defined, preset number of generations Ng.
PIKAIA offers the user the flexibility to specify a number of other input parameters that control the behavior of the underlying genetic algorithm. The subroutine does include built-in default settings that have proven robust across problem domains. All such input parameters are passed to PIKAIA in the 12-dimensional control vector ctrl. See Section 4 of the PUG for the allowed and default values of those control parameters.
The top-level structure of PIKAIA is the same as the sequence of algorithmic steps listed in §2.3: an outer loop controlling the generational iteration, and an inner loop controlling breeding. Since breeding involves the production of two offspring, the inner loop executes Np/2 times per generational iteration, where Np is the population size (Np = 100 is the default value).
All parameter values defining the individual members of the initial population are assigned a random number in the range [0.0, 1.0], extracted from a uniform distribution of random deviates (see §3.3 of the PUG). This ensures that no initial bias whatsoever is introduced by the initialization.
3.2 Minimal algorithmic components
3.2.1 Selection [PUG, §3.4]
PIKAIA uses a stochastic selection process to assign to each individual in the population a probability of being selected for breeding. Specifically, that probability is made linearly proportional to the fitness-based rank of each individual within the current population. This is carried out using a scheme known as the Roulette Wheel Algorithm, as detailed in §3.4 of the PUG (see also Davis 1991, chap. 1). Note that in general it is not a good idea to make the selection probability directly proportional to the fitness value, as this often leads to a loss of selection pressure late in the evolutionary run, once most population members have "found" the global optimum. In some cases it can also lead, early on, to a "superindividual" being selected so frequently that the population becomes degenerate through the computational equivalent of inbreeding. The proportionality constant between fitness-based rank and selection probability is specified as an input parameter to PIKAIA. The default value is 1.0.
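As an illustration, here is a Fortran sketch of one spin of a rank-based roulette wheel. The O(Np²) ranking loop and the stand-in random fitness values are for clarity only; this is not PIKAIA's actual source, and the rank-to-probability mapping here is the simplest linear one (rank divided by the sum of all ranks).

  program rank_roulette
    implicit none
    integer, parameter :: np = 100
    real :: fit(np), wheel(np), u, total
    integer :: rank(np), i, j, ipick
    call random_number(fit)                  ! stand-in fitness values
    do i = 1, np                             ! rank 1 = least fit,
       rank(i) = 1                           ! rank np = fittest
       do j = 1, np
          if (fit(j) < fit(i)) rank(i) = rank(i) + 1
       end do
    end do
    total = real(np*(np+1))/2.0              ! sum of ranks 1..np
    wheel(1) = real(rank(1))/total           ! cumulative selection
    do i = 2, np                             ! probabilities
       wheel(i) = wheel(i-1) + real(rank(i))/total
    end do
    call random_number(u)                    ! spin the wheel once
    ipick = np
    do i = 1, np
       if (u <= wheel(i)) then
          ipick = i
          exit
       end if
    end do
    print *, 'selected individual', ipick, ' with rank', rank(ipick)
  end program rank_roulette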
Figure 10: Panel (A) shows convergence curves for 10 distinct runs of GA2 on P1. As before, the error is defined as 1 − f(x, y). Panel (B) shows, for the single run plotted with a thicker line on panel (A), the variations with generation count of the best individual of the population (solid line), the median-ranked individual (dashed line), and the mutation rate (dotted line).
Figure 11: Evolution of the population of trial solutions in parameter space, for the GA2 run shown as a thicker line on Fig. 10. The concentric circles indicate the rings of secondary maxima, and the larger, solid black dot is the fittest solution of the current generation.
the current best via both crossover and mutation (Fig. 11 [F]). Note how elitism is essential here: otherwise the "mutant" having landed on the slopes of the central peak would have a low likelihood of replicating itself intact into the subsequent generation, in view of the high mutation rate.
GA1 basically behaves in exactly the same way, with the important exception that many more generations are needed for the favorable mutation to show up; this is because GA1 operates with a fixed, low mutation rate, while GA2 lets this rate vary depending on the degree of convergence of the population (cf. §3.3.2).
3.5 Hamming walls and creep mutation
We are doing pretty well with GA2, but we still need to correct a fundamental shortcoming of the one-point mutation operator arising from the decimal encoding scheme of Fig. 9. Consider a problem where the sought-after optimal solution requires the following substring to be produced by the evolutionary process:
..........21000..........
decoding into the floating-point number 2.1000. Now, early in the evolutionary run, an individual having, say,
..........19123..........
will likely be fitter than average, and so this genetic material will spread throughout the population. After a while, following favorable mutations or crossover recombinations, the substring might look like, say,
..........19994..........
which is admittedly quite close to 21000. However, two very well coordinated mutations are needed to push this towards the target 21000: the "1" must mutate to a "2" and the first "9" to a "0". Note that either mutation occurring in isolation, and/or mutating to a different digit value, takes us farther from the target floating-point number. Mutation being a slow process, the probability of the needed pair of mutations occurring simultaneously will in general be quite small, meaning that the evolution would have to be pushed over many generations for it to happen. The population is getting "piled up" at internal boundaries of the encoding system.
These boundaries are called Hamming walls. They can be bypassed by choosing an encoding scheme such that successive single mutations can always lead to a continuous variation in the decoded parameter. This is why the so-called Gray binary coding (e.g., Press et al. 1992, §20.2) is now used almost universally in genetic algorithms based on binary encoding. Another possibility is to devise mutation operators that can jump over Hamming walls.
Creep mutation does precisely this. Once a digit on the encoding string has been targeted for mutation, instead of simply replacing the existing digit by a randomly chosen one, just add either +1 or −1 (with equal probability), and if the resulting digit is < 0 (because a "0" has been hit with "−1") or > 9 (because a "9" has been hit with "+1"), carry the one over to the next digit on the left. Just like in grade school. So, for example, creep mutation hitting the middle "9" with +1 in the last substring above would lead to
..........20094..........
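Here is a Fortran sketch of that carry-over logic, applied to the example above. The digit string is stored most-significant-digit first, and the mutation site and sign are passed in for reproducibility (in practice the sign is chosen at random, with equal probability); what to do when a carry runs off the left end of the string is deliberately left out of the sketch.

  program creep_demo
    implicit none
    integer :: g(5) = (/1, 9, 9, 9, 4/)      ! the substring 19994
    call creep(g, 5, 3, +1)                  ! hit the middle 9 with +1
    print '(5i1)', g                         ! prints 20094, as in the text
  contains
    subroutine creep(g, nl, isite, istep)
      integer, intent(in) :: nl, isite, istep   ! length, site, +1 or -1
      integer, intent(inout) :: g(nl)           ! decimal digits, 0-9
      integer :: k
      k = isite
      do while (k >= 1)
         g(k) = g(k) + istep
         if (g(k) > 9) then                  ! carry the one leftward...
            g(k) = 0
         else if (g(k) < 0) then             ! ...or borrow from the left
            g(k) = 9
         else
            exit                             ! no carry needed: done
         end if
         k = k - 1
      end do
    end subroutine creep
  end program creep_demo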
same number of generations as the GA2 and GA3 runs. Once again, performance measures are established on the basis of 1000 distinct runs for each method. Evidently GA2 and GA3 outperform GA1 on all aspects of performance, to a staggering degree. GA1 is not much of a global numerical optimization algorithm. Comparison with Table I shows that its global performance somewhat exceeds that of the simplex method in single-run mode, but the number of function evaluations required by GA1 to achieve this is orders of magnitude larger.
What is also plainly evident in Table II is the degree to which GA2 and GA3 outperform iterated simplex for a given level of initial sampling of parameter space. Although the number of function evaluations required is typically an order of magnitude larger, both algorithms are far better than iterated simplex at actively exploring parameter space. This is plain evidence for the positive effects of transfer of information between trial solutions in the course of the search process.
The worth of creep mutation can be ascertained by comparing the global performance of the GA2 and GA3 solutions. The results are not clear-cut: GA3 does better than GA2 on P1 and P3, a little worse on P2, and significantly worse on P4. The usefulness of creep mutation is contingent on there actually being Hamming walls in the vicinity of the global solution; if there are, creep mutation helps, sometimes quite a bit. Otherwise, it effectively decreases the probability of taking large jumps in parameter space, and so can be deleterious in some cases. This is what is happening here with P2, where moving away from the secondary maximum requires a large jump in parameter space to take place, from (x, y) = (0.5, 0.5) to (0.6, 0.1).
At any rate, the above discussion amply illustrates the degree to which global performance is problem-dependent. This cannot be overemphasized. You should certainly beware of any empirical comparison between global optimization methods that relies on a small set of test problems, especially of low dimensionality. You should also keep in mind that GA2 is one specific instance of a genetic algorithm-based optimizer, and that other incarnations may behave differently (either better or worse) on the same test problems.
4. A REAL APPLICATION:
ORBITAL ELEMENTS OF BINARY STARS
where n is the dimension of parameter space and x(n) is a vector of n floating-point numbers defining a trial solution. Of course the function's name, here orbit, can be whatever you like, but do declare it external in the calling program. For the orbital element fitting problem we have n = 6. The function itself basically goes through the sequence of steps listed in §4.3.2 to compute a χ², which is then used to define a fitness as per eq. (22).
One important thing relates to the scaling of the input parameters. The scaled versions of the x(n)'s (cf. §4.3.1) must be stored in new variables local to the fitness function. Storing the rescaled parameters back into the x(n)'s, i.e.,

    x(1)=200.+x(1)*600.        [*****NEVER DO THIS*****]
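Instead, store the scaled values in local variables, as in the following sketch of the fitness function's scaling step. Only the 200.+x(1)*600. scaling of the period appears in the text; the remaining five parameter names and ranges below are hypothetical placeholders, and eq. (22) in the text defines the actual χ²-to-fitness mapping.

  real function orbit(n, x)
    implicit none
    integer, intent(in) :: n
    real, intent(in) :: x(n)        ! intent(in): overwriting is impossible
    real :: per, tzero, ecc, omega, vzero, ak, chi2
    per   = 200.0 + x(1)*600.0      ! period, in days: a LOCAL copy, not x(1)
    tzero = x(2)*per                ! the remaining five elements use
    ecc   = x(3)                    ! placeholder ranges, for illustration
    omega = x(4)*360.0              ! only
    vzero = -50.0 + x(5)*100.0
    ak    = x(6)*50.0
    ! ... the steps of sec. 4.3.2 would compute chi2 from the scaled
    ! elements; a stand-in value is used here ...
    chi2  = 1.0
    orbit = 1.0/(1.0 + chi2)        ! one monotonic chi2 -> fitness map;
  end function orbit                ! eq. (22) defines the actual choice

Declaring x with intent(in), as above, makes the compiler itself enforce the "never do this" rule.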
You can of course pick any seed value other than 123456, as long as it is a nonzero positive integer²³. Initializing all components of ctrl to some negative value, as done here, forces PIKAIA to use its internal default settings. This yields GA2, evolving a population of Np = 100 individuals over Ng = 500 generations. For other possible settings (and their algorithmic consequences) see §4.5 of the PUG.

²³ PIKAIA is distributed with a random number generator which is deterministic, meaning that it must be given a seed value, from which it will always produce the same sequence of random deviates (on a given platform).
With the fitness function defined as described above, a call to PIKAIA looks like
call pikaia(orbit,n,ctrl,xb,fb,status)
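A minimal driver around this call might look as follows. The 12-element ctrl vector, the negative-entries-mean-defaults convention, and the seed value 123456 are from the text; the name of the seed-initialization routine (rninit below) follows the example driver distributed with PIKAIA, so treat it as an assumption and check your own copy of the source. This sketch must be compiled and linked against the PIKAIA source itself.

  program driver
    implicit none
    integer, parameter :: n = 6      ! six orbital elements
    real :: ctrl(12), xb(n), fb
    integer :: seed, status
    real, external :: orbit          ! the fitness function, declared external
    seed = 123456                    ! any nonzero positive integer will do
    call rninit(seed)                ! initialize the random generator
    ctrl = -1.0                      ! negative entries -> built-in defaults
    call pikaia(orbit, n, ctrl, xb, fb, status)
    print *, 'status =', status
    print *, 'best fitness:', fb
    print *, 'best (scaled) parameters:', xb
  end program driver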
Figure 14: Evolution of a typical solution to the binary orbit fitting problem. Part (A) shows the χ² (inversely proportional to fitness) of the best and median individuals in the population, as well as the mutation rate (dotted line). Part (B) shows the corresponding variations of the six parameters defining the best individual (scaled to [0, 1]).
4.3.6 Error estimates
You might think we're done, but we certainly are not: our allegedly global solution of §4.3.5 is almost worthless until we can specify error bars of some sort on the best-fit model parameters.
The traditional way derivative-based local hill climbing methods compute error bars "automatically" is via the Hessian matrix of second derivatives evaluated at the best-fit solution (e.g., Press et al. 1992, §15.5). This local curvature information is not particularly useful when dealing with a global optimization problem. What we want is some information about the shape and extent in parameter space of the region where solutions with χ² ≲ 1 are to be found. This is usually done by Monte Carlo simulation, by perturbing the best-fit solution and computing the χ² of these perturbed solutions (see, e.g., Bevington & Robinson 1992, §11.6; Press et al. 1992, §15.6). This is undoubtedly the most reliable way to get error estimates. All it requires is the ability to compute a χ² given a set of (perturbed) model parameters; if you have found your best-fit solution using a genetic algorithm-based optimizer such as PIKAIA, you already have available the required computational machinery: it is nothing else than your fitness function.
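The following Fortran sketch shows how little machinery this requires: the same χ² routine used inside the fitness function is simply re-evaluated over a grid spanning a small hypercube around the best-fit point. Here only a 2-D slice through the first two parameters is mapped, with a stand-in quadratic χ² function, a placeholder best-fit point, and made-up grid sizes.

  program chi2map
    implicit none
    integer, parameter :: nres = 41          ! grid resolution per axis
    real, parameter :: half = 0.05           ! half-width of the hypercube
    real :: xbest(6), xp(6), grid(nres, nres)
    integer :: i, j
    xbest = (/0.5, 0.5, 0.5, 0.5, 0.5, 0.5/) ! placeholder: in practice,
    do i = 1, nres                           ! the solution PIKAIA returned
       do j = 1, nres
          xp = xbest                         ! perturb two parameters only
          xp(1) = xbest(1) - half + (i-1)*2.0*half/(nres-1)
          xp(2) = xbest(2) - half + (j-1)*2.0*half/(nres-1)
          grid(i, j) = chi2(xp)              ! same machinery as the fitness
       end do
    end do
    print *, 'minimum chi2 on the slice:', minval(grid)
  contains
    real function chi2(x)                    ! stand-in for the real chi2
      real, intent(in) :: x(6)
      chi2 = sum((x - 0.45)**2)/0.01         ! hypothetical quadratic bowl
    end function chi2
  end program chi2map

Contouring grid then produces exactly the kind of χ² isocontour maps shown in Figure 15.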
In relatively low-dimensionality parameter spaces such as that of our orbital fitting problem, it is often even simpler to just construct a hypercube centered about the best-fit solution and directly compute χ²(u) at some preset spatial resolution across the cube. Figure 15 shows the result of such an exercise, in the context of our orbital fitting problem. The Figure shows χ² isocontours, with the best-fit solution of §4.3.5 indicated by a solid dot. A strong, well-defined error correlation between ω and τ is seen on panel (D). This "valley" in parameter space becomes longer and flatter as orbital eccentricities approach zero. The gradual, parallel increase in ω and τ visible on Fig. 14(B) corresponds to the population slowly "crawling" along the valley floor; this process is greatly facilitated by the use of creep mutation. Weaker error correlations are also apparent between e, V₀ and K. The diamonds are a series of best-fit solutions returned by a series of GA2 runs extending over 2500 generations. Only the runs having returned a solution with χ² ≤ 1.715 are shown. The dashed lines are the means of the inferred parameter values. Notice, on panels (A) and (B), the "pileup" of solutions at K = 8.3696, and a similar accumulation at ω = 324° on panel (D). These parameter values map onto Hamming walls in the scaled [0, 1] parameter range used internally by PIKAIA. It just so happens that for these data and adopted parameter ranges, two such walls lie close to the best-fit solution; this does not prevent GA2 from converging, but it does slow it down significantly: in this case the solutions stuck at walls still lie well within the 68.3% confidence level, but GA2 needs to be pushed to a few thousand generations to reliably locate the true χ² minimum. Since
When, then, should you consider using genetic algorithms on a real-life research problem? There is no clear-cut answer to this question, but based on my own relatively limited experience I would offer the following list:
(1) Markedly multimodal optimization problems, where it is difficult to make a reliable initial guess as to the approximate location of the global optimum;
(2) Optimization problems for which derivatives with respect to defining parameters are very hard or impossible to evaluate in closed form. If a reliable initial guess is available, the simplex method is a strong contender; if not, genetic algorithms become the method of choice;
(3) Ill-conditioned data modeling problems, in particular those described by integral equations;
(4) Problems subject to positivity or monotonicity constraints that can be hardwired in the genetic algorithm's encoding scheme.
This is of course not meant to be exclusive of other classes of problems. One constraint to keep in mind is the fact that genetic algorithm-based optimization can be CPU-time consuming, because of the large number of model evaluations typically required in dealing with a hard problem. The relative ease with which genetic algorithms can be parallelized can offset in part this difficulty: on a "real-life" problem most work goes into the fitness evaluation, which proceeds completely independently for each individual in the population (see Metcalfe and Charbonneau 2002 for a specific example). Never forget that your time is always more precious than computer time.
In the introductory essay opening his book on genetic programming, Koza (1992) lists seven basic features of "good" conventional optimization methods: correctness, consistency, justifiability, certainty, orderliness, parsimony, and decisiveness. He then goes on to argue that genetic algorithms embody none of these presumably sound principles. Is this grounds to reject optimization methods based on genetic algorithms? Koza does not think so, and neither do I. From a practical point of view the bottom line always is: use whatever works. In fact, that is precisely the message conveyed, loud and clear, by the biological world.
I would like to bring this tutorial to a close with a final, Third Rule of Global Optimization. Unlike the first two, you probably would not find something equivalent in optimization textbooks. In fact I did not come up with this rule, although I took the liberty of renaming it. It originates with Francis Crick, co-discoverer of the structure of DNA and 1962 Nobel Prize winner. So here is the Third Rule of Global Optimization, also known as²⁴
²⁵ There is no answer to this exercise on the Tutorial Web Page, only a few hints; but if you do get interesting results (namely, significantly enhanced performance that remains robust across problem domains), I would very much like to hear about it.
BIBLIOGRAPHY