An Introduction to Genetic Algorithms for Numerical Optimization
Paul Charbonneau
HIGH ALTITUDE OBSERVATORY
NATIONAL CENTER FOR ATMOSPHERIC RESEARCH
BOULDER, COLORADO
TABLE OF CONTENTS
List of Figures
List of Tables
Preface

1. Introduction: Optimization
   1.1 Optimization and hill climbing
   1.2 The simplex method
   1.3 Iterated simplex
   1.4 A set of test problems
   1.5 Performance of the simplex and iterated simplex methods

2. Evolution, optimization, and genetic algorithms
   2.1 Biological evolution
   2.2 The power of cumulative selection
   2.3 A basic genetic algorithm
   2.4 Information transfer in genetic algorithms

3. PIKAIA: A genetic algorithm for numerical optimization
   3.1 Overview and problem definition
   3.2 Minimal algorithmic components
   3.3 Additional components
   3.4 A case study: GA2 on P1
   3.5 Hamming walls and creep mutation
   3.6 Performance on test problems

4. A real application: orbital elements of binary stars
   4.1 Binary stars
   4.2 Radial velocities and Keplerian orbits
   4.3 A genetic algorithm solution using PIKAIA

5. Final thoughts and further readings
   5.1 To cross over or not to cross over?
   5.2 Hybrid methods
   5.3 When should you use genetic algorithms?
   5.4 Further readings

Bibliography
LIST OF FIGURES
1 Operation of a generic hill climbing method
2 A hard maximization problem
3 An iterated hill climbing scheme
4 Absolute performance of the simplex method
5 Test problem P2
6 Test problem P4
7 Accelerated Norsk learning by means of cumulative selection
8 Convergence curves for the sentence learning search problem
9 Breeding in genetic algorithms
10 Convergence curves for GA2 on P1
11 Evolution of the population in parameter space
12 Global convergence probability
13 Radial velocity variations in Bootis
14 Evolution of a typical solution to the binary orbit fitting problem
LIST OF TABLES
I Simplex performance measures on test problems
II GA1, GA2, and GA3 vs iterated simplex
PREFACE
In 1998 I was invited to present a lecture on genetic algorithms at a Mini-Workshop on Numerical Methods in Astrophysics, held June 3-5 at the Institute for Theoretical Astrophysics in Oslo, Norway. I subsequently prepared a written version of the lecture in the form of a tutorial introduction to genetic algorithms for numerical optimization. However, for reasons beyond the organizers' control, the planned Proceedings of the Workshop were never published. Because the written version, available through the PIKAIA Web Page since September 1998, continues to prove popular with users of the PIKAIA software, I decided to "publish" the paper in the form of the present NCAR Technical Note.

The paper is organized as follows. Section 1 establishes the distinction between local and global optimization, and the meaning of performance measures in the context of global optimization. Section 2 introduces the general idea of a genetic algorithm, as inspired from the biological process of evolution by means of natural selection. Section 3 provides a detailed comparison of the performance of three genetic algorithm-based optimization schemes against iterated hill climbing using the simplex method. Section 4 describes in full detail the use of a genetic algorithm to solve a real data modeling problem, namely the determination of orbital elements of a binary star system from observed radial velocities. The paper closes in Section 5 with reflections on matters of a somewhat more philosophical nature, and includes a list of suggested further readings.

I ended up making very few modifications to the text originally prepared in 1998, even though if I were to rewrite it now some things undoubtedly would turn out differently. The suite of test functions I now use to test modifications to PIKAIA has evolved significantly from that presented in §1.4 herein. Version 1.2 of PIKAIA, publicly released in April 2002, would compare even more favorably to the iterated simplex method against which PIKAIA 1.0 is pitted in §3 herein. I updated and expanded the list of further readings (§5.4) to better reflect current topics and trends in the genetic algorithm literature. In addition to some minor rewording here and there throughout the text, I also restored a Figure to §1, and a final subsection to §2, both originally eliminated to fit within the 50-page limit of the above-mentioned ill-fated Workshop Proceedings.

Back in 1998, I chose to give this paper the flavor of a tutorial. Each section ends with a summary of important points to remember from that section. You are
of course encouraged to remember more than whatever is listed there. You will also find at the end of each section a series of Exercises. Some are easy, others less so, and some require programming on your part. These are designed to be done using PIKAIA, a public domain, self-contained, genetic algorithm-based optimization subroutine. The source code for PIKAIA, as well as answers to most exercises, is available on the tutorial Web Page, from which you can also access the PIKAIA Web Page:
http://www.hao.ucar.edu/public/research/si/pikaia/tutorial.html
The Tutorial Page also includes various animations for some of the solutions discussed in the text. The PIKAIA Web Page contains links to the HAO ftp archive, from which you can obtain, in addition to the source code for PIKAIA, a User's Guide, as well as source codes for the various examples discussed therein. The idea behind all this is that by the time you are done reading through this paper and doing the Exercises, you should be in good shape to solve global numerical optimization problems you might encounter in your own research.

The writing of this preface offers a fine opportunity to thank my friends and colleagues Viggo Hansteen and Mats Carlsson for their invitation and financial support to attend their 1998 Mini-Workshop on Numerical Methods in Astrophysics, as well as for their kind hospitality during my extended stay in Norway. The CrB data and some source codes for the orbital element fitting problem of §4 were provided by Tim Brown, who was also generous with his time in explaining to me some of the subtleties of orbital element determinations. Thorough readings of the 1998 draft of this paper by Sandy and Gene Arnn, Tim Brown, Sarah Gibson, Barry Knapp and Hardi Peter are also gratefully acknowledged. Throughout my twelve years working at NCAR's High Altitude Observatory, it has been my privilege to interact with a large number of bright and enthusiastic students and postdocs. My forays into genetic algorithms have particularly benefited from such collaborators. Since 1995, I have had to keep up in turn with Ted Kennelly, Sarah Gibson, Hardi Peter, Scott McIntosh, and Travis Metcalfe. I thank them all for keeping me on my toes all this time.
1. INTRODUCTION: OPTIMIZATION
detail. In nearly all cases, those methods will fall under the broad category of hill climbing schemes. The operation of a generic hill climbing scheme is illustrated on Figure 1, in the context of maximizing a function of two variables, i.e., finding the maximum "altitude" in a 2-D "landscape". Hill climbing begins by choosing a starting location in parameter space (panels [A]-[B]). One then determines the local steepest uphill direction, moves a certain distance in that direction (panel [C]), re-evaluates the local uphill direction, and so on until a location in parameter space is arrived at where all surrounding directions are downhill. This marks the successful completion of the maximization task (panel [D]). Most textbook optimization methods basically operate in this way, and simply differ in how they go about determining the steepest uphill direction, choosing how big a step is to be taken in that direction, and whether or not in doing so use is made of gradient information accumulated in the course of previous steps.

Hill climbing methods work great if faced with unimodal landscapes such as the one towards which the rabid paratrooper of Fig. 1(A) is about to deposit his lower backside. Unfortunately, life is not always that simple. Consider instead the 2-D landscape shown on Figure 2; the maximum is the narrow central spike indicated by the arrow, and is surrounded by concentric rings of secondary maxima. The only way that hill climbing can find the true maximum in this case is if our paratrooper happens to land somewhere on the slopes of the central maximum; hill climbing from any other landing site will lead to one of the rings. The central peak covers a fractional surface area of about 1% of the full parameter space (0 ≤ x, y ≤ 1). Unlike on the landscape of Fig. 1(A), here the starting point is critical if hill climbing is to work.

Hill climbing is a local optimization strategy. Figure 2 offers a global optimization problem. Of course, if the specific optimization problem you are working on happens to be such that you can always come up with a good enough starting guess, then all you need is local hill climbing, and you can proceed merrily ever after. But what if you are in the situation most people find themselves in when dealing with a hard global optimization problem, namely not being in a position to pull a good starting guess out of your hat? I know what you're thinking. If the central peak covers about 1% of parameter space, it means that you have about one chance in a hundred for a random drop to land close enough for hill climbing to work. So the question you have to ask yourself is: do I feel lucky? Your answer to this question is embodied in the First Rule of Global Optimization, also known as the Dirty Harry Rule.
Figure 1: Operation of a generic hill climbing method. From a randomly chosen starting point (panel [A]), the direction of maximum slope is followed (panel [C]) until one reaches a point where all surrounding directions are downhill (panel [D]). Landing (panel [B]) is not problematic from the computational point of view.
Figure 2: A hard maximization problem. The global maximum is f(x, y) = 1 at (x, y) = (0.5, 0.5), and is indicated by the arrow.
Faced with the landscape of Figure 2, the most straightforward solution lies with a technique called iterated hill climbing. This is a fancy name for something very simple, as illustrated on Figure 3. You just run your favorite local hill climbing method repeatedly, each time from a different randomly chosen starting point. While doing so you keep track of the various maxima so located, and once you are satisfied that all maxima have been found you pick the tallest one, and you are done with your global optimization problem. As you might imagine, deciding when to stop is the crux of this otherwise straightforward procedure.
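In code, iterated hill climbing is just a loop around a local optimizer. Below is a minimal sketch; Python with scipy's Nelder-Mead routine is used here (and in later snippets) purely as an illustrative stand-in, the tutorial's own software being Fortran:

```python
import numpy as np
from scipy.optimize import minimize

def iterated_hill_climb(f, n_dim, n_trials, seed=0):
    """Iterated hill climbing: run a local optimizer from many random
    starting points and keep the best maximum found."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_trials):
        x0 = rng.uniform(0.0, 1.0, n_dim)          # random landing site
        res = minimize(lambda u: -f(u), x0, method="Nelder-Mead")
        if best is None or res.fun < best.fun:     # res.fun holds -f
            best = res
    return best.x, -best.fun                       # location and f value
```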
With a fractional coverage of 1% for the central peak of Figure 2, you might expect to have to run, on average, something of the order of 10² iterated hill climbing trials before finding the central peak. As one is faced with optimization problems of increasing parameter space dimensionality, and/or situations where the global maximum spans only a tiny fraction of parameter space, iterated hill climbing can add up to a lot of work. This leads us naturally to the Second Rule of Global Optimization, also known as THE NO FREE LUNCH RULE: "If you really want the global optimum, you will have to work for it."

These considerations also lead us to distinguish between three distinct aspects of performance when dealing with a global optimization problem:

(1) Absolute performance: how numerically accurate is the solution returned by my adopted method?
(2) Global performance: how certain can I be that the solution returned by my method is the true global maximum in parameter space?
(3) Relative performance: how much computational work is required by my method to return a solution?

Most fancy optimization methods you might read about in textbooks are designed to do as well as possible on (1) and (3) simultaneously. Such methods will do well on (2) only if provided with a suitable starting guess. If such a guess is consistently available for the problems you are working on, you need not read any further. But rest assured that Dirty Harry will catch up with you one of these days.
1.2 The simplex method

The simplex method of Nelder & Mead (1965) is actually a very robust hill climbing scheme. A brief yet clear introduction to the method can be found in Press et al. (1992, §10.4). A simplex is a geometrical figure with n + 1 vertices that lives in a parameter space of dimension n. In 2-D space a simplex is a triangle, in 3-D space a tetrahedron, and so on. Given the function value (here the "altitude" f(x, y)) at each of the simplex's vertices (here an (x, y) point), the worst vertex is displaced by having the simplex undergo one of three possible types of "moves", namely contraction, expansion, or reflection (see Fig. 10.4.1 in Press et al.). The move is executed in a manner such that the function value of the displaced vertex is increased by the move (in the context of a maximization problem). The simplex undergoes successive such moves until no move can be found that leads to further improvement beyond some preset tolerance. Watching the simplex contract and expand and squirt around the landscape of Fig. 2 is good visual fun, and justifies well the name given by Press et al. to their simplex subroutine: amoeba. This is the implementation used here.

By the standards of local optimization methods, the simplex passes for a "slow" method. The absolute accuracy of the solution increases approximately linearly with the number of simplex moves. However, the simplex can pull itself out of situations that would defeat or seriously impede faster, "smarter" gradient-based local methods; it can efficiently crawl up long flat valleys, and squeeze through saddle points. In this sense, it can be said to exhibit pseudo-global capabilities. Evidently the simplex method requires that one provide initial coordinates (x, y) for the simplex's three vertices. Despite the simplex method's pseudo-global abilities, on a multimodal, global problem the choice of initial location for the simplex often determines whether the global maximum is ultimately found. Figure 4 shows a series of convergence curves for the test problem of Figure 2. Each curve corresponds to a different, random initial simplex configuration. When the simplex finds the central peak, it does so rather quickly, requiring about 25 moves for 10⁻⁵ accuracy. The problem, of course, is that the simplex often does not converge to the central peak. Repeated trials reveal that the method achieves global convergence for only 2% or so of trials.
Figure 4: Absolute performance of the simplex method on the test problem of Figure 2. Each curve corresponds to a different starting simplex. Failure of the simplex to locate the central peak leads to the convergence curves leveling off at relatively high values of 1 − f(x, y). With 2 converged runs out of 10 trials, this plot is not representative of the simplex method's global performance on this problem, which is in fact significantly poorer, namely about 2%.
1.3 Iterated simplex

Iterated hill climbing should then locate the central peak of Figure 2 within a hundred or so trials. And indeed it does: repeatedly running the simplex (500 times) on the test problem of Figure 2 leads to the central peak being located in 99.5% of trials⁵. The price to pay, of course, is in the number of function evaluations required to achieve this level of global performance: nearly 10⁴ function evaluations per iterated simplex run, on average⁶. Welcome back to the No Free Lunch Rule...

⁵ It is recommended practice when using the simplex in single-run mode to carry out a random restart once the simplex has converged; this entails reinitializing randomly all but one of the converged simplex's vertices, and letting the simplex reconverge again. What is described here as iterated simplex consists in reinitializing all vertices randomly, so as to make each successive trial fully independent from all others.

⁶ A single simplex move may entail more than one function evaluation. For example, if the trial move does not lead to an increase in f, the move might be repeated with a halved or doubled displacement length (or a different type of move might be attempted, depending on implementation). On the maximization problem of Figure 2, one simplex move requires 1.8 function evaluations, on average.

1.4 A set of test problems

1.4.1 P1: maximizing a function of two variables [2 parameters]

Test problem P1 is the hard maximization problem of Figure 2, defined by

    f(x, y) = cos²(nπr) exp(−r²/σ²),                                  (1a)

    r² = (x − 0.5)² + (y − 0.5)²,   x, y ∈ [0, 1],                    (1b)

where n = 9 and σ² = 0.15 are constants. The global maximum is located at (x, y) = (0.5, 0.5), with f(x, y) = 1.0. This global maximum is surrounded by concentric rings of secondary maxima, centered on the global maximum at radial distances

    r_max = {0.110192, 0.220385, 0.330582, 0.440782, 0.550986}.       (2)

Between these are located another series of concentric rings corresponding to minima:

    r_min = (m − 1/2)/n,   m = 1, ..., 6.                             (3)

The error measure used in what follows is

    ε = 1 − f(x, y).                                                  (4)

⁷ The high-n, high-D version of the fractal function discussed in §3.5 of Bäck (1996) is a pretty good candidate for the ultimate killer test problem.
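For concreteness, P1 translates directly into code; a sketch following eqs. (1):

```python
import numpy as np

def p1(x, y, n=9, sigma2=0.15):
    """Test problem P1: cos^2(n*pi*r) * exp(-r^2/sigma^2), eqs. (1a)-(1b)."""
    r2 = (x - 0.5)**2 + (y - 0.5)**2
    return np.cos(n * np.pi * np.sqrt(r2))**2 * np.exp(-r2 / sigma2)

print(p1(0.5, 0.5))   # global maximum: 1.0
```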
Note that the "peak" corresponding to the global maximum covers a surface area π/(4n²) in parameter space. If a hill climbing scheme were used, the probability of a randomly chosen starting point landing close enough to this peak for the method to locate the true global maximum is only about 1% for n = 9.

1.4.2 P2: maximizing a function of two variables again [2 parameters]

Test function P2, shown on Figure 5, is again a 2-D landscape to be maximized. It is defined by

    f(x, y) = 0.8 exp(−r₁²/(0.3)²) + 0.879008 exp(−r₂²/(0.03)²),      (5a)

    r₁² = (x − 0.5)² + (y − 0.5)²,                                    (5b)

    r₂² = (x − 0.6)² + (y − 0.1)².                                    (5c)

The maximum f(x, y) = 1 is at (x, y) = (0.6, 0.1), and corresponds to the peak of the second, narrower Gaussian. P2 is about as hard a global optimization problem as P1 (the simplex succeeds 141 times out of 10⁴ trials), but for a different reason. There are now only two local maxima, with the global maximum again covering about 1% of parameter space. Unlike P1, where moving toward successively higher secondary extrema actually brings one closer to the true maximum, with P2 moving to the secondary maximum pulls solutions away from the global maximum. Problems exhibiting this characteristic are sometimes called "deceptive" in the optimization literature.

1.4.3 P3: maximizing a function of four variables [4 parameters]

Test problem P3 is a direct generalization of P1 to four independent variables (w, x, y, z):

    f(w, x, y, z) = cos²(nπr) exp(−r²/σ²),                            (6a)

    r² = (w − 0.5)² + (x − 0.5)² + (y − 0.5)² + (z − 0.5)²,
    w, x, y, z ∈ [0, 1],                                              (6b)

again with n = 9 and σ² = 0.15. Comparing performance on P1 and P3 will provide a measure of the scalability of the method under consideration, namely how performance degrades as parameter space dimensionality is increased, everything else being equal. P3 is a very hard global optimization problem; the simplex method manages to find the global maximum only 6 times out of 10⁵ trials.

1.4.4 P4: minimizing a least squares residual [6 parameters]

Our fourth and final test problem is defined as a "real" nonlinear least squares fitting problem. Consider a function of one variable x defined as the sum of two Gaussians:

    y(x) = Σⱼ₌₁² Aⱼ exp(−(x − xⱼ)²/σⱼ²).                              (7)
Figure 5: Test problem P2, a function of two variables defined by two Gaussians (see eqs. [5]). The global maximum is f(x, y) = 1 at (x, y) = (0.6, 0.1), and is indicated by the arrow.
Define now a "dataset" by evaluating this function for a set of K equidistant values xₖ in the interval [0, 1], i.e., yₖ ≡ y(xₖ), xₖ₊₁ − xₖ = Δx, for some set values of A₁, x₁, σ₁, etc. Given that dataset and the functional form used to generate it (i.e., eq. [7]), the optimization problem is then to recover the parameter values A₁, x₁, etc. originally used to produce the dataset. This is done by minimizing the square residual

    R(A₁, x₁, σ₁, A₂, x₂, σ₂) = Σₖ₌₁ᴷ [yₖ − y(xₖ; A₁, x₁, σ₁, A₂, x₂, σ₂)]²   (8)

with respect to the 6 parameters defining the two Gaussians. If one is told a priori that two Gaussians are to be fit to the data, then this residual minimization problem is obviously equivalent to a 6-D function maximization problem for 1/R (say), which simply defines a function in 6-D space. Figure 6 shows the dataset generated using the parameter set

    [A₁, x₁, σ₁, A₂, x₂, σ₂] = [0.9, 0.3, 0.1, 0.3, 0.8, 0.025]       (9)
and K = 51 discretization points in x. Once again, the resulting minimization problem is not an easy one: given the discretization in x, the minimization is largely dominated by the need to accurately fit the broader, high amplitude first Gaussian; the second Gaussian is not only of much lower amplitude, it is also poorly sampled in x. Fitting only the first Gaussian leads to a reasonably low residual (R ≃ 0.25); global accuracy requires the second Gaussian to be also "detected" and fit, in which case only does R → 0. The simplex succeeds in properly fitting both Gaussians 123 times out of 10³ trials. What are the "secondary minima" on which the simplex remains stuck? They can be divided into two broad classes: (1) one of the model Gaussians fits the broad, higher amplitude component, and the other is driven to zero, either by having A → 0 or σ → 0; (2) the method returns a two-Gaussian solution, where x₁ = x₂ = 0.3, σ₁ = σ₂ = 0.1, and A₁ + A₂ = 0.9. The 6-D parameter space contains long, flat "valleys" and "plains" of low but suboptimal residual values in which the simplex grinds to a halt.
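The construction of P4 is easy to reproduce; a sketch, using the parameter set of eq. (9):

```python
import numpy as np

def model(x, A1, x1, s1, A2, x2, s2):
    """Sum of two Gaussians, eq. (7)."""
    return (A1 * np.exp(-((x - x1) / s1)**2) +
            A2 * np.exp(-((x - x2) / s2)**2))

# Synthetic dataset: K = 51 equidistant points in [0, 1], eq. (9) parameters.
xk = np.linspace(0.0, 1.0, 51)
yk = model(xk, 0.9, 0.3, 0.1, 0.3, 0.8, 0.025)

def residual(params):
    """Square residual R of eq. (8); P4 minimizes this over the 6 parameters
    (equivalently, maximizes 1/R)."""
    return np.sum((yk - model(xk, *params))**2)
```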
Table I
Simplex performance measures on test problems

Test problem   Method             ⟨1−f⟩     pG        ⟨Nf⟩    Nt
P1             Simplex            0.194     0.0213    37      1
               Iterated simplex   0.00793   0.898     3872    100
P2             Simplex            0.413     0.0263    44      1
               Iterated simplex   0.0619    0.931     4412    100
P3             Simplex            0.633     0.00006   70      1
               Iterated simplex   0.07713   0.016     35252   500
P4             Simplex            0.332     0.123     753     1
               Iterated simplex   0.0069    0.941     37638   20

For P4 the error entry is the mean residual ⟨R⟩ of eq. (8) rather than ⟨1−f⟩.
At this stage only a few comments need be made on the basis of Table I. The first is that, as advertised, all four test problems are hard global optimization problems, as can be judged from the poor global performance of the basic simplex method on each. Turning to iterated simplex leads to spectacular improvement in global performance in all cases, but of course the number of required function evaluations goes up by a few orders of magnitude.

In fact, the global performance of iterated simplex can be predicted on the basis of the single-run simplex. The global performance of the latter can be viewed as a probability p of locating the global maximum; the (complementary) probability of a given run failing to do so is 1 − p; the probability of all Nt iterated simplex runs failing to find the global maximum is then (1 − p)^Nt, so that the probability of any one of Nt iterations locating the global maximum is

    pG = 1 − (1 − p)^Nt.                                              (10)
On the basis of eq. (10), one would predict global performances (0.884, 0.930, 0.029, 0.927) on P1 through P4, given the numbers of hill climbing iterations listed in the rightmost column of Table I, which compare quite well with the actual measured global performances. One can also rewrite eq. (10) as

    Nt = log(1 − pG) / log(1 − p)                                     (11)

to predict the expected number of hill climbing iterations required to achieve a global performance level pG. With p ≃ 6 × 10⁻⁵ for P3, requiring pG = 0.95 would demand (on average) Nt ≃ 50000 hill climbing trials, adding up to a grand total of about 3.5 × 10⁶ function evaluations, since a single simplex run on P3 carries out on average 70 function evaluations (cf. Table I). Iterated hill climbing certainly works, but there really is no such thing as a free lunch...

It is easy to predict the expected global performance of iterated simplex because each trial proceeds completely independently. The improvement in global performance simply reflects the better initial sampling of parameter space associated with the initial distribution of simplex vertices. Everything else being equal, as problem dimensionality n increases, the number of trials Nt required can be expected to scale as Nt ∝ aⁿ, where a is some number characterizing in this case the fraction of parameter space covered by the global maximum. Iterated simplex is not only demanding in terms of function evaluations; in addition, it does not scale well at all on a given problem as dimensionality is increased. This, in fact, is the central problem facing iterated hill climbing in general, not just its simplex-based incarnation.
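Equations (10) and (11) are one-liners to evaluate; the following snippet reproduces the predictions quoted above from the Table I entries:

```python
import numpy as np

def p_global(p, n_trials):
    """Eq. (10): probability that at least one of n_trials independent
    hill climbing runs finds the global maximum."""
    return 1.0 - (1.0 - p)**n_trials

def trials_needed(p, p_goal):
    """Eq. (11): expected number of independent trials to reach p_goal."""
    return np.log(1.0 - p_goal) / np.log(1.0 - p)

# Single-run success rates p and trial counts Nt for P1-P4 (Table I):
for p, nt in [(0.0213, 100), (0.0263, 100), (0.00006, 500), (0.123, 20)]:
    print(round(p_global(p, nt), 3))   # close to 0.884, 0.930, 0.029, 0.927

print(round(trials_needed(0.00006, 0.95)))   # about 50000, as quoted above
```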
The poor scalability of iterated hill climbing stems from the fact that each trial proceeds independently. The challenge in developing global methods that are to outperform iterated hill climbing consists in introducing a transfer of information between trial solutions, in a manner that continuously "broadcasts" to each paratrooper in the squadron the topographical information garnered by each individual paratrooper in the course of his/her local hill climb. The challenge, of course, is to achieve this without overly biasing the ensemble of trials. A relatively well-known method that often achieves this reliably is simulated annealing (Metropolis et al. 1953; see also Press et al. 1992, §10.9). Simulated annealing is inspired by the global transfer of energy/information achieved by colliding constituent particles of a cooling liquid metal, which allows the substance to achieve the crystalline/metallic configuration that minimizes the total energy of the whole system. The algorithmic implementation of the technique for numerical optimization requires the specification of a cooling schedule, which is far from trivial: fast cooling is computationally efficient (low ⟨Nf⟩) but can lead to convergence on a secondary extremum (low pG), while slow cooling improves global convergence (high pG), but at the expense of a high ⟨Nf⟩. No Free Lunch, remember...

Genetic algorithms achieve the same goal, but are inspired by the exchange of genetic information occurring in a breeding population subjected to natural selection. They can be used to form the core of very robust, global numerical optimization methods, as detailed in Section 3 below. The following section provides a brief introduction to genetic algorithms in a more general sense.
- Global optimization is a totally different game from local optimization.
- You should never feel lucky.
- There is no such thing as a free lunch.
- You can always design a problem that will defeat any global optimization method.
(1) Look back at Figure 4. Whenever the simplex fails to achieve global convergence (i.e., 1 − f → 0), it seems to remain stuck at a discrete set of 1 − f values. What do these values correspond to?

(2) Consider again the use of iterated simplex on the test problem of Figure 2; calculate the fractional surface area of the part of the central peak that lies higher than the innermost ring of secondary maxima. On this basis, what would you predict the required number of simplex trials to be, on average, for iterated simplex to locate the central peak?

(3) Repeat the same analysis as for Exercise (2) above, but in the context of the P3 test problem. Are your results in basic agreement with Table I? How can you explain the differences (if any)?
2. EVOLUTION, OPTIMIZATION, AND GENETIC ALGORITHMS
to the fact that an offspring receives complementary genes from two parents (which is true of most animals), provides the needed source of variability. The individual that moves, feeds and mates in real space can be looked at as an outer manifestation of its defining genes⁸. Think then of an individual's fitness as a function of the values assumed by its genes. What evolution does is to drive a gradual increase in average fitness values over the course of many generations. This is what Darwin called adaptation. Now that's beginning to sound like hill climbing, doesn't it?

In fact evolution does not optimize, at least not in the mathematical sense of the word. Evolution is blind. Evolution does not give a damn about globally maximal fitness (no offense to Teilhard de Chardin). Even if it did, evolution must accommodate physical constraints associated with development and growth, so that not all paths are possible in genetic "parameter space". All evolution does is produce individuals of above-average fitness. Nonetheless, the basic ideas of natural selection and inheritance with variation can be used to construct very robust algorithms for global numerical optimization.

⁸ This is said without at all denying that a large part of what makes us who we are arises from learning and other interactions with the environment in the course of development and growth; what genes encode is some sort of basic behavioral Bauplan from which these higher level processes take off.
2.2 The power of cumulative selection

This sentence is 27 characters long including blank spaces, and is made up of an alphabet of 30 letters if a blank character is included (please note that I am taking into account the famous Scandinavian letters Å, Æ, and Ø)⁹. Consider now the process of producing 27-character-long sentences by randomly selecting letters from the 30 available characters of the alphabet. Here's an example:

⁹ The original sentence used by Dawkins is METHINKS IT IS LIKE A WEASEL, which, of course, is taken from Shakespeare's Hamlet.
GE YTAUMNBGH JH A QMWCXNES
Doesn't look much like the original sentence... although careful comparison will show that two letters actually coincide. The total number of distinct 27-character-long sentences that can be made out of a 30-character alphabet is 30²⁷ = 7.63 × 10³⁹. This is a very large number, even by astronomical standards. The corresponding probability of generating our first, target sentence by this random process on the first trial is then (30)⁻²⁷ ≃ 10⁻⁴⁰. This is such a small number that invoking the Dirty Harry Rule at this point would be moot. Instead, consider the following procedure:

(1) Generate 10 sentences of 27 randomly chosen characters;
(2) Select the sentence that has the most correct letters;
(3) Duplicate this best sentence ten times;
(4) For each such duplicate, randomly replace a few letters¹⁰;
(5) Repeat steps (2) through (4) until the target sentence has been matched.

This search algorithm incorporates the three ingredients mentioned previously as essential to the evolutionary process. Step (2) is natural selection, in fact in a deterministic and rather extreme form, since the best and only the best acts as progenitor to the next "generation". Step (3) is inheritance, again of a rather extreme form, as offspring start off as exact replicas of the (single) progenitor. Step (4) is a stochastic process which provides the required variability. Note also that the algorithm operates with minimal "fitness" information: all it has available is how many correct letters a sentence contains, but not which letters are correct or incorrect. What is still missing is exchange of information between trial solutions, but be patient, this will come in due time.

Figure 7 illustrates the "evolution" of the best-of-10 sentence, starting from an initial ten random sentences, as described above. The mutation rate was set at p = 0.01, meaning that any given letter has a probability 0.01 of being subjected to random replacement. Iteration count is listed in the leftmost column, and error in the rightmost column. Error is defined here simply as the number of incorrect letters in the best sentence generated in the course of the current iteration. Note how the error decreases rather rapidly at first, but much more slowly later on; it takes about as many iterations to get the first 20 letters right as it takes to get the last one. The target sentence is found after only 918 iterations, in the course of which 9180 trial sentences were generated and "evaluated" against the target. This is almost infinitely less than the 10⁴⁰ of enumerative or purely random search.

¹⁰ More precisely, define a mutation rate as the probability p (∈ [0, 1]) that a given constituent letter be randomly replaced.
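The five-step procedure fits in a few lines of code. Here is a sketch, using Dawkins' original English target sentence (28 characters from a 27-character alphabet) as a stand-in for the tutorial's Norwegian one; population size and mutation rate follow the text:

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # 27 characters here, not 30

def error(s):
    """Number of incorrect letters: the only fitness information used."""
    return sum(a != b for a, b in zip(s, TARGET))

def mutate(s, p=0.01):
    """Replace each letter with probability p (random replacement)."""
    return "".join(random.choice(ALPHABET) if random.random() < p else c
                   for c in s)

best = "".join(random.choice(ALPHABET) for _ in TARGET)   # step (1), one seed
iteration = 0
while error(best) > 0:
    iteration += 1
    # Steps (2)-(4): ten mutated copies of the current best; keep the fittest.
    best = min((mutate(best) for _ in range(10)), key=error)
print(iteration, best)
```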
Figure 7: Accelerated Norsk learning by means of cumulative selection. Iteration count is listed in the left column, and the error, defined as the number of incorrect letters, in the rightmost column. The target sentence is found after 918 iterations.

Figure 8 shows convergence curves for three runs starting with the same initial random sentence, but evolving under different mutation rates. The solid line is the solution of Figure 7. Note how the solution with the highest mutation rate converges more rapidly at first, but eventually levels off at a finite, nonzero error level. What is happening here is that mutations are producing the needed correct letters as fast as they are destroying currently correct letters. Given an alphabet size and sentence length, there will always exist a critical mutation rate above which this will happen¹¹.

¹¹ This is in fact a notion central to our understanding of the emergence of life. Among a variety of self-replicating molecules of different lengths "competing" for chemical constituents in limited supply in the primaeval soup, those lying closest to the critical mutation rate can adapt the fastest to an evolving chemical environment without self-destructing, and so rapidly take over the soup (see Eigen 1971 for a comprehensive though somewhat dated review). This is conjectured to be the explanation behind the universality of the genetic code among very nearly all living organisms.

Figure 8: Convergence curves for the sentence search problem, for three different mutation rates. The curves show the error associated with the best sentence produced at each iteration. The solid line corresponds to the solution shown on Figure 7.

There are two important things to remember at this point. First, mutation is a mixed blessing. It is clearly needed as a source of variability, but too much of it is definitely deleterious. Second, the general shape of the convergence curves in Figure 8 is worth noting. Convergence is rather swift at first, but then levels off. This is a behavior we will meet again and again in what follows. Time now to move on, finally, to genetic algorithms.
2.3 A basic genetic algorithm
is crossover. The two strings generated by the encoding process are laid side by side, and a cutting point is randomly selected along the length of the defining strings. The string fragments located right of the cutting point are then interchanged, and spliced onto the fragments originally located left of the cutting point (lines 07-12, for a cutting point located between the third and fourth decimal digit). The second breeding step is mutation. For each string produced by the crossover process, a few randomly selected digits (or "genes") are replaced by a new, randomly selected digit value (lines 13-16, for a mutation hitting the tenth digit of the second offspring string). The resulting fragments are then decoded into two (x, y) pairs, whose fitness is then evaluated, here simply by computing the function value f(x, y).

Some additional comments are in order. First, note that offspring incorporate intact "chunks" of genetic material coming from both parents; that's the needed inheritance, as well as the promised exchange of information between trial solutions. However, both the crossover and mutation operations also involve purely stochastic components, such as the choice of cutting point, site of mutation, and new value of the mutated digit. This is where we get the variability needed to sustain the evolutionary process, as discussed earlier. Second, the encoding/decoding process illustrated on Figure 9 is just one of many possible such schemes. Traditionally, genetic algorithms have made use of binary encoding, but this is often not particularly advantageous for numerical optimization. The use of a decimal genetic "alphabet" is no more artificial than a binary representation, even more so given that very nearly all known living organisms encode their genetic information in a base-4 alphabet. In fact, in terms of encoding floating-point numbers, both binary and decimal alphabets suffer from significant shortcomings that can affect the performance of the resulting optimization algorithms. Third, the crossover and mutation operators, operating in conjunction with the encoding/decoding processes as illustrated on Figure 9, preserve the total range in parameter space. That is, if the floating-point parameters defining parent solutions are restricted to the range [0.0, 1.0], then the offspring solution parameters will also be restricted to [0.0, 1.0]. This is a very important property, through which one can effortlessly hardwire constraints such as positivity. Fourth, having the mutation operator act on the encoded form of the parent solution has the interesting consequence that offspring can differ very much or very little from their parents, depending on whether the digits affected by mutation decode into one of the leading or trailing digits of the corresponding floating-point number. This means that from the point of view of parameter space exploration, a genetic algorithm can carry out both wide exploration and fine tuning in parallel. Fifth, it takes two parents to produce (simultaneously) two offspring. One can of course devise orgiastic breeding schemes that involve more than two parents and yield any number of offspring. Experience shows that this rarely improves the performance of the resulting algorithms.
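A sketch of the encoding, one-point crossover, and one-point mutation just described (the five-digits-per-parameter resolution is illustrative, not a PIKAIA setting):

```python
import random

N_DIGITS = 5   # decimal digits per parameter; illustrative resolution

def encode(x, y):
    """Concatenate the leading decimal digits of x and y (both in [0, 1])."""
    def digits(v):
        i = min(int(v * 10**N_DIGITS), 10**N_DIGITS - 1)   # clamp v = 1.0
        return f"{i:0{N_DIGITS}d}"
    return digits(x) + digits(y)

def decode(s):
    """Inverse of encode; offspring therefore stay inside [0, 1]."""
    return (int(s[:N_DIGITS]) / 10**N_DIGITS,
            int(s[N_DIGITS:]) / 10**N_DIGITS)

def crossover(s1, s2):
    """One-point crossover: swap string tails at a random cutting point."""
    cut = random.randint(1, len(s1) - 1)
    return s1[:cut] + s2[cut:], s2[:cut] + s1[cut:]

def mutate(s, pmut=0.005):
    """One-point mutation: each digit is replaced with probability pmut."""
    return "".join(random.choice("0123456789") if random.random() < pmut else c
                   for c in s)
```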
Figure 9: Breeding in genetic algorithms. Here the process is illustrated in the context of a 2-D maximization problem (such as P1 or P2 of §1.4). An individual is an (x, y) point, and two such parent individuals are needed for breeding (denoted P(P1) and P(P2) here). The one-point crossover and one-point mutation operators act on string representations of the parents (S(P1) and S(P2)) to produce offspring strings S(O1) and S(O2), which are finally decoded into two offspring (x, y) points P(O1) and P(O2).

Sixth, f(u) must obviously be computable for all u, but not necessarily
differentiable, since derivatives of the fitness function with respect to its input parameters are not required for the algorithm to operate. From a practical point of view, this can be a great advantage.
- Natural selection alone cannot lead to evolution; inheritance and variation are also needed.
- Cumulative selection can accelerate an otherwise random search process by a factor that is astronomically enormous.
- Genetic algorithms are search techniques that make use of simplified forms of the biological selection/inheritance/variation triad.
All exercises for this part of the tutorial aim at letting you explore quantitatively the probabilistic aspects of the sentence search example of §2.2.
(1) First, some basic probability calculations, to warm up. (a) What is the probability of getting all of the letters wrong on an initial random trial? (b) Of getting at least one letter (any letter) right? (c) Of getting exactly one letter (any letter) right?

(2) In the run of Figure 7, it took 671 iterations to get to the point of having 26 correct letters out of 27. What is now the probability of obtaining a fully correct sentence in one of the ten mutated copies after the subsequent iteration? What is the probability of all mutated copies having regressed to only 25 correct letters?

(3) Given the sentence length S = 27, alphabet size A = 30, and a mutation rate p, obtain an estimate (i.e., not a formal calculation) for the number of iterations required, on average, to reach zero error. How does your estimate compare to Figure 7? Do you think that Figure 7 is a typical solution?

(4) Given again a sentence length S, an alphabet size A, and a mutation rate p, calculate the error level at which the sentence search algorithm will saturate (like the dotted line on Figure 8). Use this result to estimate an optimal mutation rate, as a function of S and A, that will, on average, lead to convergence in the smallest possible number of iterations.

(5) In terms of an analogy for biological evolution, what do you think are the most significant failings of the sentence search example?
3. PIKAIA: A GENETIC ALGORITHM FOR NUMERICAL OPTIMIZATION
This section opens with a brief overview of the operators and techniques included in PIKAIA. Internally, PIKAIA seeks to maximize a user-defined function f(x) in a bounded n-dimensional space, i.e.,

    x ≡ (x₁, x₂, ..., xₙ),   xₖ ∈ [0.0, 1.0] ∀k.                      (12)
The restriction of parameter values to the range [0.0, 1.0] allows greater flexibility and portability across problem domains. This, however, implies that the user must adequately normalize the input parameters of the function to be maximized with respect to those bounds. The maximization is carried out on a population made up of Np individuals (trial solutions). This population size remains fixed throughout the evolution. Rather than evolving the population until some tolerance criterion is satisfied, PIKAIA carries the evolution over a user-defined, preset number of generations Ng.
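The required normalization is typically a one-line rescaling inside the user-supplied function; a sketch, in which the bounds and the merit function my_model_quality are placeholders:

```python
# Physical search ranges for each parameter (placeholder values).
LO = [  0.0, 1.0e-3, -5.0]
HI = [100.0, 1.0e+2,  5.0]

def fitness(u):
    """u is the normalized vector supplied by the optimizer, u[k] in [0, 1];
    rescale to physical units before evaluating the model."""
    x = [lo + uk * (hi - lo) for uk, lo, hi in zip(u, LO, HI)]
    return my_model_quality(x)   # user-supplied merit function (placeholder)
```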
PIKAIA offers the user the flexibility to specify a number of other input parameters that control the behavior of the underlying genetic algorithm. The subroutine does include built-in default settings that have proven robust across problem domains. All such input parameters are passed to PIKAIA in the 12-dimensional control vector ctrl. See Section 4 of the PUG for the allowed and default values of those control parameters.

The top-level structure of PIKAIA is the same as the sequence of algorithmic steps listed in §2.3: an outer loop controlling the generational iteration, and an inner loop controlling breeding. Since breeding involves the production of two offspring, the inner loop executes Np/2 times per generational iteration, where Np is the population size (Np = 100 is the default value). All parameter values defining the individual members of the initial population are assigned a random number in the range [0.0, 1.0], extracted from a uniform distribution of random deviates (see §3.3 of the PUG). This ensures that no initial bias whatsoever is introduced by the initialization.
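That top-level structure, an outer generational loop around an inner breeding loop producing two offspring at a time, can be sketched as follows; the selection and breeding operators here are deliberately simplified stand-ins (crossover acts directly at parameter boundaries rather than on the digit strings of Figure 9):

```python
import random

def evolve(fitness, n_dim, n_pop=100, n_gen=200, pmut=0.005):
    """PIKAIA-style generational skeleton: the population size stays fixed,
    and the run lasts exactly n_gen generations (no tolerance criterion).
    Assumes fitness(ind) >= 0 for the roulette-wheel selection below."""
    rand = random.random
    # Uniform random initialization in [0,1]^n_dim: no initial bias.
    pop = [[rand() for _ in range(n_dim)] for _ in range(n_pop)]

    for _ in range(n_gen):
        scores = [fitness(ind) for ind in pop]    # Nf = n_pop per generation
        total = sum(scores)

        def select():
            # Fitness-proportional ("roulette wheel") parent selection.
            r = rand() * total
            for ind, s in zip(pop, scores):
                r -= s
                if r <= 0.0:
                    return ind
            return pop[-1]

        def breed(p1, p2):
            # Simplified crossover at a parameter boundary, then mutation.
            cut = random.randrange(1, n_dim) if n_dim > 1 else 0
            kids = [p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]]
            for kid in kids:
                for k in range(n_dim):
                    if rand() < pmut:
                        kid[k] = rand()
            return kids

        offspring = []
        for _ in range(n_pop // 2):               # inner loop: 2 kids each
            offspring.extend(breed(select(), select()))
        pop = offspring                           # full generational replacement
    return max(pop, key=fitness)
```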
probability that the crossover operation actually takes place (default is 0.85); (2) the mutation rate, which sets the probability, for each digit making up the defining string of an offspring, that a mutation takes place at that digit location (default is 0.005).

3.2.3 Population replacement [PUG, §3.8]

Under PIKAIA's default settings, the offspring population is accumulated into temporary storage, and once the number of such offspring equals that of the current breeding population, the latter is deleted and replaced by the offspring population. This is the default strategy used by PIKAIA, although it is possible for the user to specify other population replacement techniques (see PUG, §§3.8.2, 3.8.3).
it provides the much needed source of variability through which novel parameter values are injected into the population. However, it also leads to the destruction of good solutions. This was precisely the point of Figure 8 (dotted line). Finding the exact value for the mutation rate that achieves optimal balance between those two effects, to maximize the former while minimizing the latter, is of course possible. However, in doing so one finds that the optimal parameter settings often end up being highly problem dependent.

One powerful solution to this problem is to dynamically adjust the mutation rate. The key to this strategy lies with recognizing that as long as the population is broadly distributed in parameter space, the crossover operator leads to a pretty efficient "search" as it recombines fragments of existing solutions. However, once the population has converged, whether on a secondary or absolute optimum, crossover no longer achieves much, as it leads to the exchange of fragments that are nearly identical, since all parents have nearly identical parameter values. This, obviously, is where a high mutation rate is needed to reinject variability into the population.

Consider then the following procedure. At any given time, keep track of the fitness value of the fittest population member, and of the median-ranked member. The fitness difference Δf between those two individuals is clearly a measure of population convergence: if Δf is large, the population is presumably distributed more broadly in parameter space than if Δf is very small. Therefore, if Δf becomes too small, increase the mutation rate; if it becomes too large, decrease the mutation rate again. This is how PIKAIA dynamically adjusts its mutation rate during run-time. This strategy represents a simple form of self-adaptation of a parameter controlling the behavior of the underlying genetic algorithm. Further details and implementation issues are discussed in §3.7.2 of the PUG.
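One possible implementation of this adjustment logic is sketched below; the thresholds and multiplicative factors are illustrative, not PIKAIA's actual settings (see §3.7.2 of the PUG for those):

```python
def adjust_mutation_rate(pmut, fitnesses, df_low=0.05, df_high=0.25,
                         pmut_min=0.0005, pmut_max=0.25):
    """Self-adaptation step: compare the fittest and median-ranked members
    and nudge the mutation rate accordingly."""
    ranked = sorted(fitnesses, reverse=True)
    df = ranked[0] - ranked[len(ranked) // 2]   # best minus median fitness
    if df < df_low:        # population has converged: explore more
        pmut = min(pmut * 1.5, pmut_max)
    elif df > df_high:     # population widely spread: exploit more
        pmut = max(pmut / 1.5, pmut_min)
    return pmut
```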
3.4 A case study: GA2 on P1

Figure 10(A) shows the fitness value of the fittest individual versus generation count, for 10 separate runs of GA2. Figure 10(A) should be compared to Figure 4, showing the convergence of the simplex on the same problem. Early on, the curves have qualitatively similar shapes¹⁶: either convergence occurs relatively quickly (much more quickly for simplex, when it does converge), or solutions remain "stuck" on one of the rings of secondary extrema (cf. Fig. 2), which leads to the error leveling off at a fixed value. Unlike simplex, however, GA2 is able to pull itself off the secondary extrema rings. It does so primarily through mutation, although crossover between two parents properly positioned in parameter space can achieve the same effect. Mutation being a fundamentally stochastic process, it is then not surprising to see different GA2 runs requiring different generation counts before the needed favorable mutation takes place.

Clearly mutation plays a critical role here. Figure 10(B) shows the fitnesses of the best (solid line) and median-ranked (dashed line) individuals in the population as a function of generational count, for the GA2 run plotted with a thicker line on panel (A). The dotted line shows the variation of the mutation rate. Figure 11 shows the distribution of the population in 2-D parameter space¹⁷, at the epochs indicated by solid dots on Fig. 10(B). To start with, note on Fig. 11(A) that no individual in the initial random population has landed anywhere close enough to the central peak for hill climbing to work. The first few generational iterations see the population cluster itself closer and closer to center (Fig. 11[B]), but the fitness difference between best and median is still quite large. The mutation rate decreases slightly from its initial (low) value, but then remains constant. By the 15th generation (Fig. 11[C]), most of the population has converged somewhere on the inner ring of secondary extrema (f = 0.9216), so that the fitnesses of the best and median are now comparable. This leads to a sharp increase of the mutation rate (between the 12th and 20th generations). The high mutation rate results in offspring being knocked all over parameter space in the course of breeding (Fig. 11[D]). While some mutant individuals do land regularly on the slope of the central peak, it is only by the 55th generation that one such mutant is catapulted high enough to become the fittest of the current population (Fig. 11[E]).

¹⁶ You might notice that GA2 already starts off doing significantly better than the simplex method; this merely results from the initial random population of GA2 having "sampled" 50 points in parameter space, compared to only 3 for the simplex.

¹⁷ An animation of the evolving population for this solution can be viewed on the Tutorial Web Page.
Figure 10: Panel (A) shows convergence curves for 10 distinct runs of GA2 on P1. As before, the error is defined as 1 − f(x, y). Panel (B) shows, for the single run plotted with a thicker line on panel (A), the variations with generation count of the best individual of the population (solid line), the median-ranked individual (dashed line), and the mutation rate (dotted line).
Figure 11: Evolution of the population of trial solutions in parameter space, for the GA2 run shown as a thicker line on Fig. 10. The concentric circles indicate the rings of secondary maxima, and the larger, solid black dot is the fittest solution of the current generation.
Further breeding during subsequent generations brings more and more individuals to the central peak, and further increases in the fitness of the current best, via both crossover and mutation (Fig. 11[F]). Note how elitism is essential here; otherwise the "mutant" having landed on the slopes of the central peak would have a low likelihood of replicating itself intact into the subsequent generation, in view of the high mutation rate. GA1 basically behaves in exactly the same way, with the important exception that many more generations are needed for the favorable mutation to show up; this is because GA1 operates with a fixed, low mutation rate, while GA2 lets this rate vary depending on the degree of convergence of the population (cf. §3.3.2).
3.5 Hamming walls and creep mutation

Consider a parameter whose target value corresponds to the substring

..........21000..........

decoding into the floating point number 2.1000. Now, early in the evolutionary run, an individual having, say,
..........19123..........
will likely be fitter than average, and so this genetic material will spread throughout the population. After a while, following favorable mutations or crossover recombinations, the substring might look like, say,
..........19994..........
which is admittedly quite close to 21000. However, two very well coordinated mutations are needed to push this towards the target 21000: the "1" must mutate to a "2" and the first "9" to a "0". Note that either mutation occurring in isolation, and/or mutating to a different digit value, takes us farther from the target floating point number. Mutation being a slow process, the probability of the needed pair of mutations occurring simultaneously will in general be quite small, meaning that the evolution would have to be pushed over many generations for it to happen. The population is getting "piled up" at internal boundaries of the encoding system. These boundaries are called Hamming walls. They can be bypassed by choosing an encoding scheme such that successive single mutations can always lead to a continuous variation in the decoded parameter. This is why the so-called Gray binary coding (e.g., Press et al. 1992, §20.2) is now used almost universally in genetic algorithms based on binary encoding. Another possibility is to devise mutation operators that can jump over Hamming walls.
Creep mutation does precisely this. Once a digit on the encoding string has been targeted for mutation, instead of simply replacing the existing digit by a randomly chosen one, just add either +1 or −1 (with equal probability), and if the resulting digit is < 0 (because a "0" has been hit with "−1") or > 9 (because a "9" has been hit with "+1"), carry the one over to the next digit on the left. Just like in grade school. So, for example, creep mutation hitting the middle "9" with +1 in the last substring above would lead to
..........20094..........
which achieves the desired effect of "jumping" the wall. The one thing creep mutation does not allow is to take large jumps in parameter space. As argued before, jumping is actually a needed capability; consequently, in practice, for each offspring individual a probability test decides whether one-point or creep mutation is to be used (with equal probabilities). Creep mutation is not included in the original release of PIKAIA (now known as PIKAIA 1.0), although it is in version 1.2, released in April 2002 (see the PIKAIA Web Page and the Release Notes for PIKAIA 1.2, NCAR Technical Note 451-STR). The results described in what follows were obtained using a modified version of PIKAIA 1.0, GA3, which includes creep mutation but is otherwise identical to GA2.
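Creep mutation with carry is only a few lines; a sketch (behavior at the leftmost digit, unspecified in the text, is clamped here):

```python
import random

def creep_mutate(digits, pos, step=None):
    """Add +1 or -1 at position pos of a decimal string, carrying leftward,
    so that a single mutation can step across a Hamming wall."""
    if step is None:
        step = random.choice((+1, -1))
    d = [int(c) for c in digits]
    i = pos
    d[i] += step
    while i > 0 and not 0 <= d[i] <= 9:   # carry (or borrow) to the left
        d[i] -= 10 * step
        i -= 1
        d[i] += step
    d[i] = min(max(d[i], 0), 9)           # clamp at the leftmost boundary
    return "".join(map(str, d))

print(creep_mutate("19994", 2, +1))   # -> "20094", jumping the wall
```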
3.6 Performance on test problems

Such considerations are easily quantified. Let Np and Ng be the population size and generation length of a run; the required number of function evaluations, Nf, is obviously

    Nf = Np × Ng,                          [GA1, GA2, GA3]            (13a)

while for iterated simplex, Nf is the number of hill climbing trials (Nt) times the average number of function evaluations required by a single simplex run (Ns; this quantity is run- and problem-dependent):

    Nf = Nt × Ns.                          [Iterated simplex]         (13b)
So we play the following game: we run iterated simplex and GA2 for increasing numbers of generations/iterations, and check whether global convergence is achieved; to get statistically meaningful results, we do this 1000 times for each method and each generation/iteration count. This allows us to empirically establish the probability of global convergence (pG ∈ [0, 1]) as a function of generation/iteration count. In doing so, to decide whether or not a given run has globally converged we use again the criteria f ≥ 0.95 for P1, P2 and P3, and R ≤ 0.1 for P4. The results of this procedure, applied to each test problem, are shown in Figure 12. It should be easy to convince yourself of the following: (1) on P1 and P2, both iterated simplex and GA2 perform equally well on all aspects of performance when pushed long enough to have pG > 0.9; (2) P3 is a hard problem, and neither technique performs satisfactorily on it; still, GA2 largely outperforms iterated simplex on global performance; (3) on P4, GA2 and iterated simplex do equally well up to pG ≃ 0.5, but then GA2's performance starts to lag behind as the solutions are pushed to pG > 0.95.

¹⁸ ... evaluation involves (1) the construction of a 2-D rotation curve, (2) a large matrix-vector multiplication, and (3) the calculation of a χ² against some 600 data points. This adds up to about half a CPU-second on a Cray J90. All test problems of §1.4 require very little computation in comparison.

An obvious conclusion to be drawn at this juncture is that iterated hill climbing using the simplex method makes for a pretty decent global optimization scheme. Not quite what you were expecting as a sales pitch for genetic algorithm-based optimization, right? This is in part a consequence of the relatively low dimensionality of our test problems. Recall from §1.5 that iterated simplex leads to improved performance (with respect to single-run simplex) primarily as a consequence of the better sampling of parameter space associated with the initial (random) distribution of simplex vertices; given enough trials, one is almost guaranteed to have one initial simplex vertex landing close enough to the
global maximum to ensure subsequent global convergence. In low dimensional search spaces, iterated simplex thus ends up being quite competitive. Figure 12 already indicates that this "edge" does not carry over to higher dimensionality (compare results for P1 and P3).

Figure 12: Global convergence probability as a function of the number of function evaluations Nf required by iterated simplex (diamonds) and GA2 (solid dots) on the four test problems (P1: dotted line; P2: dashed line; P3: solid line; P4: dash-dotted line). The probabilities were estimated from 1000 distinct trials and, in the case of iterated simplex, Nf is an average over the 1000 trials.

GA2's performance on P2 is actually a delicate matter. Take another look at Figure 5 and consider what happens once the population has converged to the broad, secondary maximum (as it does early in the run for nearly every single trial): for mutation to propel a solution from (x, y) ≃ (0.5, 0.5) to the narrow peak at (x, y) = (0.6, 0.1), two very well coordinated mutations must take place simultaneously; otherwise mutant solutions end up in regions of rather low elevation and do not contribute much to the next generation. This is a low probability occurrence even at relatively high mutation rates¹⁹, so the process takes time.

¹⁹ Notice on Figure 11(D)-(F) how few solutions show up in the corners of the domain.

GA2's global performance on P2 then results from an interplay between one-point
GA2's global performance on P2 then results from an interplay between one-point mutation and the rather direct relationship that exists here between a solution's defining parameters and its string representation, on which mutation and crossover operate. If the narrow Gaussian is instead centered on (x, y) = (0.5, 0.9), then a single mutation can propel a solution from the broad, central Gaussian to the narrow one. Not surprisingly, on this modified problem GA2 outperforms iterated simplex to a significant degree: pG = 0.987 with only Nf = 2500, i.e., faster than iterated simplex by a factor of 5. Encoding is a tricky business, with potentially far-reaching consequences for performance.

Iterated simplex's superior performance on P4 is certainly noteworthy, yet it reflects in part the peculiar structure of the parameter space defined by the Gaussian fitting problem, which is relatively well accommodated by the simplex method's pseudo-global capabilities. Other local optimization methods do not fare nearly as well. For a detailed comparison of genetic algorithms and other methods on fitting Gaussian profiles to real and synthetic data, see McIntosh et al. (1998).

It is really only with very hard problems, such as P3, that GA2 starts showing its worth. By any standard, P3 is a very hard global optimization problem. While on its 2-D version GA2 and iterated simplex do about equally well, as dimensionality is increased the global performance of iterated simplex degrades much more rapidly than GA2's. This is in fact where the power of genetic algorithm-based optimizers lies, although for search spaces of high dimensionality (n > 20, say) the one-point crossover and mutation operators described in §2.3 are usually suboptimal and must be improved upon (PIKAIA 1.2, to be released in April 2002, includes a two-point crossover operator, which generally improves performance for problems involving many parameters; see, e.g., Section 3 of the Release Notes for PIKAIA 1.2, Charbonneau 2002).

In some sense, a fairer comparison of the respective exploratory capabilities of GA2 and iterated simplex can be carried out by setting the number of trials in iterated simplex so that the original distribution of simplex vertices samples parameter space with the same density as GA2's initial random population; in other words, using the notation of eqs. (13), we set

    Nt = Np/(n + 1)        (14)

where n is the dimensionality of parameter space, and compare the results of the resulting iterated simplex runs to some "standard" GA2 and GA3 runs. Such a comparison is presented in Table II, in a format essentially identical to Table I. Performance measures are also listed for a set of GA1 runs extending over the same number of generations as the GA2 and GA3 runs.
Table II: Performance on test problems (with eq. (14) enforced). For each test problem (P1-P4), the table lists performance measures for iterated simplex, GA1, GA2, and GA3, in a format essentially identical to Table I. [Numerical entries scrambled in extraction; not reconstructed.]
Once again, performance measures are established on the basis of 1000 distinct runs for each method. Evidently, GA2 and GA3 outperform GA1 on all aspects of performance to a staggering degree. GA1 is not much of a global numerical optimization algorithm. Comparison with Table I shows that its global performance somewhat exceeds that of the simplex method in single-run mode, but the number of function evaluations required by GA1 to achieve this is orders of magnitude larger. What is also plainly evident in Table II is the degree to which GA2 and GA3 outperform iterated simplex for a given level of initial sampling of parameter space. Although the number of function evaluations required is typically an order of magnitude larger, both algorithms are far better than iterated simplex at actively exploring parameter space. This is plain evidence for the positive effects of transfer of information between trial solutions in the course of the search process.

The worth of creep mutation can be ascertained by comparing the global performance of the GA2 and GA3 solutions. The results are not clear-cut: GA3 does better than GA2 on P1 and P3, a little worse on P2, and significantly worse on P4. The usefulness of creep mutation is contingent on there actually being Hamming walls in the vicinity of the global solution; if there are, creep mutation helps, sometimes quite a bit. Otherwise, it effectively decreases the probability of taking large jumps in parameter space, and so can be deleterious in some cases. This is what is happening here with P2, where moving away from the secondary maximum requires a large jump in parameter space to take place, from (x, y) = (0.5, 0.5) to (0.6, 0.1). At any rate, the above discussion amply illustrates the degree to which global performance is problem-dependent. This cannot be overemphasized. You should certainly beware of any empirical comparison between global optimization methods that relies on a small set of test problems, especially of low dimensionality. You should also keep in mind that GA2 is one specific instance of a genetic algorithm-based optimizer, and that other incarnations may behave differently (either better or worse) on the same test problems.
- Through random initialization of the population, genetic algorithms introduce no initial bias whatsoever in the search process.
- For numerical optimization, elitism and an adjustable mutation rate are two crucial additions to a basic genetic algorithm.
- Iterated hill climbing using the simplex method makes a pretty decent global optimization technique, especially for low-dimensionality problems.
- Performance measures of any global optimization method are highly problem-dependent.
(1) Look back at Figure 10. The dynamically adjusting mutation rate levels off at a value of about 0.1. One could have predicted this average value before running the code. How? (Hint: re-read §2.2.)

(2) Code up 3-D, 5-D and 6-D versions of P1. Using PIKAIA in its GA2 form (default settings except for generation count), investigate how global performance degrades with problem dimensionality. Keep the generation count fixed at 2500 (ctrl(2)=2500). How does this compare to iterated simplex?
21 Amusingly, this spectroscopic binary is the brighter component of the first visual binary discovered by Riccioli: the star Mizar A, in the constellation Ursa Majoris. Even better, it was later realized that Mizar B is also a spectroscopic binary.

22 This figure is for 2002; back in 1998, when this paper was originally written, it was given as 10 m s⁻¹. Pretty remarkable improvement, in just a little over three years...
Figure 13: Radial velocity variations of the [...] component of the binary star Bootis. The time axis is given in units of Julian Date (one JD = one solar day). Data are from Bertiau (1957), with one lone datum at JD = 23175 missing from this plot. The solid line is the best-fit solution obtained later in this section. The asymmetrical shape of the curve is due to the eccentricity of the orbit; a circular orbit would lead to a purely sinusoidal radial velocity variation.
For a star on a Keplerian orbit about the center of mass of a binary system, the observed radial velocity varies as

    V(t) = V0 + K [cos(v(t) + ω) + e cos ω]        (16)
The quantity V0 is the radial velocity of the binary system's center of mass, and eq. (16) only holds once the Earth's orbital motion about the Sun has been subtracted out. Note that the quantity V0 cannot simply be "read off" the radial velocity curve, unless the orbit is perfectly circular. The velocity amplitude K is a function of other orbital parameters:

    K = (2π/P) a sin i / (1 − e²)^(1/2)        (17)

where P is the orbital period, e the orbital eccentricity, a the semi-major orbital axis, and i the inclination angle of the orbital plane with respect to the plane of the sky (i = 0 is an orbit in the plane of the sky, i = 90° an orbit seen edge-on). Because i usually cannot be inferred from the velocity curve (unless the system happens to also be an eclipsing binary), the velocity amplitude K is usually treated as a single parameter. The so-called true anomaly, v, is the time-like variable, and corresponds to the angle between the radius vector joining the star to the center of mass and that joining the orbital perihelion to the center of mass. The longitude of the perihelion (ω) is the angle subtended by the latter line segment to the line segment defined by the intersection of the orbital plane with the plane of the sky (see Smart 1971, §195 and Figure 132). The chief complication arises from the fact that for an elliptical orbit the angular velocity about the center of mass is not constant, but obeys instead Kepler's second law (the orbital radius vector sweeps equal areas in equal time intervals). The true anomaly v is related to the so-called eccentric anomaly E via the relation

    tan(v/2) = [(1 + e)/(1 − e)]^(1/2) tan(E/2).        (18)

The eccentric anomaly, in turn, is related to time via Kepler's equation (see Smart 1971, §68):

    E − e sin E = (2π/P)(t − τ)        (19)

where τ is the time of perihelion passage, by convention the zero-point of the orbit. At the risk of oversimplifying, in essence what E measures is the deviation from constant angular velocity (as in a circular orbit) due to the orbit's eccentricity. Note that eq. (19) is transcendental in E, i.e., it cannot be solved analytically for E as a function of t. Of course it can be solved using any of the classical methods for nonlinear root finding, such as bisection (Press et al. 1992, §9.1); a minimal bisection solver is sketched after the parameter list below. Kepler, of course, did it all by hand. Going over the preceding expressions, one can identify 6 parameters that need to be determined to relate the radial velocity curve to the orbital elements and related quantities. These six parameters are:
(1) P, the orbital period;
(2) τ, the time of perihelion passage;
(3) ω, the longitude of the perihelion;
(4) e, the orbital eccentricity;
(5) K, the orbital velocity amplitude;
(6) V0, the system's radial velocity.

For later use we will group these six parameters into a vector:

    u = (P, τ, ω, e, K, V0).        (20)
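As promised above, here is a minimal bisection solver for Kepler's equation (19); the function name keplerE, the fixed iteration count, and the requirement that the mean anomaly be pre-reduced to [0, 2π) are illustrative choices of this sketch, not part of PIKAIA:

      real function keplerE(em, e)
c     Bisection solver for Kepler's equation (19): returns E such
c     that E - e*sin(E) = em, where em is the mean anomaly
c     2*pi*(t - tau)/P reduced to [0,2*pi), and e the eccentricity.
      implicit none
      real em, e
      real elo, ehi, emid, pi
      integer i
      parameter (pi=3.1415926)
c     f(E) = E - e*sin(E) - em increases monotonically with E
c     (f' = 1 - e*cos(E) > 0 for e < 1) and changes sign over
c     [0,2*pi], so bisection converges unconditionally
      elo = 0.
      ehi = 2.*pi
      do i=1,50
         emid = 0.5*(elo+ehi)
         if (emid - e*sin(emid) - em .lt. 0.) then
            elo = emid
         else
            ehi = emid
         endif
      enddo
      keplerE = 0.5*(elo+ehi)
      return
      end

Fifty bisections narrow the initial bracket by a factor of 2⁵⁰, far below single-precision roundoff, so no explicit convergence test is needed.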
A number of methods have been devised to infer these parameters from a given radial velocity curve (see, e.g., Smart 1971, §197; Petrie 1962). In what follows we treat this fitting problem as a nonlinear least-squares minimization problem. We seek to find the parameter set u that minimizes the reduced χ², given N data points V_j^obs ≡ V(t_j) with associated error estimates σ_j:

    χ²(u) = [1/(N − 6)] Σ_{j=0}^{N−1} [(V_j^obs − V(t_j; u))/σ_j]²        (21)
where N − 6 is the number of degrees of freedom of the fit; under this normalization, χ² ≲ 1 indicates an acceptable fit. The minimization problem defined by eq. (21) can (and will) be solved using a genetic algorithm-based optimizer, specifically GA3's version of PIKAIA. At this stage you should perhaps only note that performing this minimization using an explicitly gradient-based method would be a real mess; if you need to be convinced of this, try differentiating eq. (16) with respect to e, and see how you like it... This, however, is not the only difficulty one encounters in carrying out the fit. The most serious problem is related to the existence of solution degeneracies, i.e., widely differing sets of fitting parameters that lead to very similar radial velocity curves, for some classes of orbits. This can become a severe problem for noisy and/or poorly sampled radial velocity curves: the search space then becomes markedly multimodal in χ², and a global method is essential. The Bootis data of Fig. 13 offer a moderately difficult global optimization problem. So let's give it a go using PIKAIA.
Given a trial solution, as defined by a 6-vector u, computing a χ² requires the construction of a synthetic radial velocity curve evaluated at the N data abscissae t_j. Once the input parameters have been properly rescaled (§4.3.1), for each of the t_j's the steps involved are:

(1) Given t_j and the trial period P, eccentricity e, and time of perihelion passage τ, solve Kepler's equation (19) for E. This defines a nonlinear root finding problem, for which the bisection method is well-suited.
(2) Now knowing E, calculate the true anomaly v using equation (18).
(3) Now knowing v, and given the trial velocity amplitude K, system velocity V0, and perihelion longitude ω, compute the radial velocity V using equation (16).
(4) Once V has been computed for all t_j, calculate χ² using equation (21).

(A code sketch assembling these four steps and eq. (22) below is given at the end of this subsection.) One final step is required, to relate χ² to fitness, in order to set the selection probability of the trial solution. PIKAIA is set up to maximize the user-defined function, and requires fitness to be a positive definite quantity. We thus set

    Fitness = (χ²)⁻¹.        (22)

Because PIKAIA uses ranking to set selection probability, you need not worry about the functional form you impose between fitness and χ²; making fitness proportional to, say, the inverse square root of χ² would lead to the same rank distribution, and so to the same selection probabilities. Naturally, you'd better make sure that the relationship you define between fitness and goodness-of-fit is single-valued and monotonic; otherwise you can't expect PIKAIA to produce anything sensible. Please do not set fitness equal to −χ², as PIKAIA's implementation of the Roulette Wheel Algorithm for parent selection requires fitness to be a positive-definite quantity. This has been a common initial mistake of PIKAIA users attempting χ² minimization.

4.3.3 Setting PIKAIA's internal parameters

Unless you have good reasons to do otherwise, use PIKAIA's default parameter settings. This is done by initializing all twelve elements of the control vector ctrl to some negative value, and will result in what we have been calling GA2. The one parameter you will most likely want to set explicitly is the generation count Ng. This corresponds to the second element of the control vector, so that, for example, setting ctrl(2) = 2000 would force PIKAIA to run for 2000 generations, instead of its default value of 500. As you have hopefully figured out by now, the required number of generations is very much problem-dependent. Just be sure to remember the Dirty Harry Rule: if you're not sure how to set Ng, err on the high side.
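Assembling the four steps of §4.3.2 and the fitness definition (22) into code, a fitness function for this problem might look like the following minimal sketch. The COMMON block /veldat/ holding the data, the scaling bounds (except the period's, which is quoted in §4.3.4 below), and the routine keplerE (sketched in §4.2) are illustrative assumptions; only the two-argument interface required by PIKAIA (described in §4.3.4) is fixed:

      real function orbit(n,x)
c     Sketch of a fitness function for the orbital fitting problem:
c     steps (1)-(4) of Sec. 4.3.2 plus eq. (22). The /veldat/ block,
c     most scaling bounds and keplerE are illustrative assumptions.
      implicit none
      integer nmax
      parameter (nmax=1000)
      integer n, ndata, j
      real x(n)
      real t(nmax), vobs(nmax), sig(nmax)
      common /veldat/ t, vobs, sig, ndata
      real p, tau, om, e, ak, v0, pi, em, ea, v, vmod, chi2
      real keplerE
      parameter (pi=3.1415926)
c     rescale the inputs into LOCAL variables, never back into x(n)
      p   = 200. + x(1)*600.
      tau = 22000. + x(2)*2000.
      om  = 2.*pi*x(3)
      e   = x(4)
      ak  = 20.*x(5)
      v0  = -20. + 40.*x(6)
      chi2 = 0.
      do j=1,ndata
c        (1) mean anomaly reduced to [0,2*pi), then Kepler's eq. (19)
         em = 2.*pi*(t(j)-tau)/p
         em = mod(em, 2.*pi)
         if (em .lt. 0.) em = em + 2.*pi
         ea = keplerE(em, e)
c        (2) true anomaly, eq. (18), in a numerically safe atan2 form
         v = 2.*atan2( sqrt(1.+e)*sin(ea/2.), sqrt(1.-e)*cos(ea/2.) )
c        (3) synthetic radial velocity, eq. (16)
         vmod = v0 + ak*( cos(v+om) + e*cos(om) )
         chi2 = chi2 + ( (vobs(j)-vmod)/sig(j) )**2
      enddo
c     (4) reduced chi2, eq. (21), then fitness, eq. (22)
      chi2  = chi2/real(ndata-6)
      orbit = 1./chi2
      return
      end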
4.3.4 Running PIKAIA

The first thing to do is to write a FORTRAN function that is given a trial solution parameter vector u, and returns a fitness. This is really the only interface between PIKAIA and the problem at hand. The argument specification of the function is hardwired into PIKAIA, so that the beginning of the function must look like

      real function orbit(n,x)
      dimension x(n)
where n is the dimension of parameter space and x(n) is a vector of n floating point numbers defining a trial solution. Of course the function's name, here orbit, can be whatever you like, but do declare it external in the calling program. For the orbital element fitting problem we have n = 6. The function itself basically goes through the sequence of steps listed in §4.3.2 to compute a χ², which is then used to define a fitness as per eq. (22). One important point relates to the scaling of the input parameters: the scaled versions of the x(n)'s (cf. §4.3.1) must be stored in new variables local to the fitness function. Storing the rescaled parameters back into the x(n)'s, i.e.,

      x(1)=200.+x(1)*600.        [*****NEVER DO THIS*****]

is guaranteed to have disastrous consequences (besides being poor programming style). This has also been a relatively common initial mistake of PIKAIA users so far. Prior to calling PIKAIA itself, three things need to be done: (1) read the time and radial velocity data and make them accessible to the fitness function, through an appropriately defined COMMON block (for example); (2) initialize the random number generator (PIKAIA is distributed with a deterministic random number generator: given a seed value, it will always produce the same sequence of random deviates on a given platform); and (3) initialize PIKAIA's control vector. The last two steps are carried out as follows:
      seed=123456
      call urand_init(seed)
      do i=1,12
         ctrl(i)=-1
      enddo
You can of course pick any seed value other than 123456, as long as it is a positive integer. Initializing all components of ctrl to some negative value, as done here, forces PIKAIA to use its internal default settings. This yields GA2, evolving a population of Np = 100 individuals over Ng = 500 generations. For other possible settings (and their algorithmic consequences) see §4.5 of the PUG. With the fitness function defined as described above, a call to PIKAIA looks like
call pikaia(orbit,n,ctrl,xb,fb,status)
Upon successful termination (indicated by status = 0 on output), the n-dimensional array xb contains the parameters (scaled to [0, 1]) of the best trial solution of the last generation, and its fitness is given by the scalar output variable fb. By default this is the only output returned by PIKAIA, although additional run-time output can be produced by appropriately setting ctrl(12) to 1 or 2 (see PUG, §4.5).
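Putting the pieces together, a minimal driver might look like the following sketch; the data-reading routine readvel (which fills the /veldat/ COMMON block assumed in the fitness sketch above) and the 2000-generation setting are illustrative assumptions:

      program fitorbit
c     Minimal driver sketch for the orbital fitting run. The routine
c     readvel is hypothetical; everything else is as in the text.
      implicit none
      integer n
      parameter (n=6)
      integer seed, status, i
      real ctrl(12), xb(n), fb
      real orbit
      external orbit
c     (1) read times, radial velocities and errors into /veldat/
      call readvel
c     (2) initialize the random number generator
      seed=123456
      call urand_init(seed)
c     (3) default settings (GA2), but a longer run than the default
      do i=1,12
         ctrl(i)=-1.
      enddo
      ctrl(2)=2000.
      call pikaia(orbit,n,ctrl,xb,fb,status)
      if (status .eq. 0) then
         write(*,*) 'best scaled parameters: ', (xb(i), i=1,n)
         write(*,*) 'best fitness (1/chi2) : ', fb
      endif
      end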
4.3.5 Results

Figure 14 shows results for a typical GA3 run. Part (A) shows convergence curves, namely the χ² value for the best (solid line) and median (dashed line) individual as a function of generation count. Part (B) shows the corresponding variations of the six parameters defining the best solution (scaled to [0, 1]). Most parameters undergo rapid variations over the first ten generational iterations, but subsequently tend to remain pretty stable until favorable mutations or crossover produce better individuals. The first such "key" mutation occurs at generation 184, when a period close enough to the true period is finally produced. Note how the solution then remains "stuck" on an e = 0 secondary minimum up to generation 430. The subsequent evolution is characterized by a gradual increase in e, ω and τ, accompanied by smaller adjustments in K and V0. Notice again on Fig. 14(A) how, especially in the first few hundred generations, the mutation rate (dotted line) is highest when the best solution is "stable", and decreases again following significant changes in best fitness. The final, best solution vector after 1000 generations is

    (P, τ, ω, e, K, V0) = (...)        (23)

with units as in §4.3.1. This best-fit GA3 solution, with χ² = 1.63, is plotted as a solid line on Fig. 13. It turns out that for this problem the use of creep mutation is advantageous, as detailed in the following section. The best-fit solution differs slightly from the best-fit solution of Bertiau (1957), but lies well within Bertiau's one-σ range. The fact that the solution has a χ² significantly larger than 1 should not be deemed extremely alarming, as error estimates on V given in Bertiau (1957) are based in part on a subjective assessment of the "quality" of his photographic plates.
Figure 14: Evolution of a typical solution to the binary orbit fitting problem. Part (A) shows the χ² (inversely proportional to fitness) of the best and median individuals in the population, as well as the mutation rate (dotted line). Part (B) shows the corresponding variations of the six parameters defining the best individual (scaled to [0, 1]).
4.3.6 Error estimates

You might think we're done, but we certainly are not: our allegedly global solution of §4.3.5 is almost worthless until we can specify error bars of some sort on the best-fit model parameters. The traditional way derivative-based local hill climbing methods compute error bars "automatically" is via the Hessian matrix of second derivatives evaluated at the best-fit solution (e.g., Press et al. 1992, §15.5). This local curvature information is not particularly useful when dealing with a global optimization problem. What we want is some information about the shape and extent, in parameter space, of the region where solutions with χ² ∼ 1 are to be found. This is usually done by Monte Carlo simulation, by perturbing the best-fit solution and computing the χ² of these perturbed solutions (see, e.g., Bevington & Robinson 1992, §11.6; Press et al. 1992, §15.6). This is undoubtedly the most reliable way to get error estimates. All it requires is the ability to compute a χ² given a set of (perturbed) model parameters; if you have found your best-fit solution using a genetic algorithm-based optimizer such as PIKAIA, you already have the required computational machinery available: it is nothing else than your fitness function. In relatively low-dimensionality parameter spaces, such as for our orbital fitting problem, it is often even simpler to just construct a hypercube centered on the best-fit solution and directly compute χ²(u) at some preset spatial resolution across the cube (a code sketch of such a scan closes this subsection).

Figure 15 shows the result of such an exercise, in the context of our orbital fitting problem. The Figure shows χ² isocontours, with the best-fit solution of §4.3.5 indicated by a solid dot. A strong, well-defined error correlation between ω and τ is seen on panel (D). This "valley" in parameter space becomes longer and flatter as orbital eccentricities approach zero. The gradual, parallel increase in ω and τ visible on Fig. 14(B) corresponds to the population slowly "crawling" along the valley floor; this process is greatly facilitated by the use of creep mutation. Weaker error correlations are also apparent between e, V0 and K. The diamonds are a series of best-fit solutions returned by a series of GA2 runs extending over 2500 generations. Only the runs having returned a solution with χ² ≤ 1.715 are shown. The dashed lines are the means of the inferred parameter values. Notice, on panels (A) and (B), the "pileup" of solutions at K = 8.3696, and a similar accumulation at ω = 324 on panel (D). These parameter values map onto Hamming walls in the scaled [0, 1] parameter range used internally by PIKAIA. It just so happens that for these data and adopted parameter ranges, two such walls lie close to the best-fit solution; this does not prevent GA2 from converging, but it slows it down significantly. In this case the solutions stuck at walls still lie well within the 68.3% confidence level, but GA2 needs to be pushed to a few thousands of generations to reliably locate the true χ² minimum. Since most of the GA2 solutions would be deemed acceptable on the basis of their χ² value, you might have missed this (I certainly did at first) unless you chose, as you should, to heed the Dirty Harry Rule and err on the high side in setting the generation count Ng. So here is a case where the use of creep mutation is definitely advantageous.

Figure 15: [...] Contour spacing is ∆χ² = 0.0053, and the thicker contours correspond to the 68.3% confidence level for the 2-parameter joint probability distribution (∆χ² = 1.6513; conceptually this interval is similar to a one-σ region, see Press et al. 1992, §15.6). The solid dots mark the best-fit solution of §4.3.5. Note the clear error correlation between ω and τ on panel (D). The diamonds are best-fit solutions returned by a series of GA2 runs. The dashed lines indicate the corresponding mean parameter values.

The error-estimation procedure described above is straightforward, though (CPU) time-consuming. It certainly works if the computing costs associated with evaluating one model are sufficiently low to allow the calculation of many additional solutions once the genetic algorithm has converged. If this is not the case, other strategies must be used. One possibility is to accumulate information about χ² isosurfaces in the course of a single evolutionary run. After all, while the population evolves through many hundreds (or thousands) of generations, a significant (but non-homogeneous) sampling of parameter space is taking place. Throughout the evolutionary run, one stores away all solutions that fall below whichever χ² value is deemed to indicate an adequate fit. This information is then used a posteriori to construct χ² isosurfaces, without having to carry out additional model evaluations. Gibson & Charbonneau (1998) describe one such technique, in the context of a coronal modeling problem. We actually had to take steps to slow down the convergence of our genetic algorithm, in order to achieve a suitable sampling of parameter space in the course of the evolutionary run (see the Gibson & Charbonneau paper for more details). One thing you should definitely not do is use your population of trial solutions at the last generational iteration to establish error bounds: while the final population will evidently be distributed about the best-fit solution if the algorithm has converged, the way population members are distributed in parameter space is greatly influenced by features of the genetic algorithm, notably the value of the mutation rate over the last few generations. Such extraneous factors are clearly unrelated to the structure of parameter space in the vicinity of the best-fit solution.
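As a concrete illustration of the hypercube scan just described, here is a minimal sketch mapping χ² over one 2-D slice (the ω-τ plane) about the best-fit solution xb; the routine name chi2slice, the grid size, and the half-widths are illustrative choices, and writing out the results is left as an exercise:

      subroutine chi2slice(xb)
c     Map chi2 over a 2-D slice of the hypercube centered on the
c     best-fit solution xb, here the omega-tau plane (x(3) and x(2)
c     in the ordering of eq. (20)); all values in scaled [0,1] units.
      implicit none
      real xb(6), x(6), chi2(41,41)
      real orbit
      external orbit
      integer i, j, k
      real dom, dtau
c     half-widths of the scan (illustrative; keep x inside [0,1])
      parameter (dom=0.05, dtau=0.05)
      do i=1,41
         do j=1,41
            do k=1,6
               x(k)=xb(k)
            enddo
c           perturb omega and tau about the best-fit values
            x(3)=xb(3)-dom +real(i-1)*2.*dom /40.
            x(2)=xb(2)-dtau+real(j-1)*2.*dtau/40.
c           the fitness function already computes chi2, via eq. (22)
            chi2(i,j)=1./orbit(6,x)
         enddo
      enddo
c     ... write chi2(i,j) to file and contour it
      return
      end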
- Set upper and lower bounds on allowed parameter values that are as wide as possible, while remaining physically meaningful.
- When using PIKAIA, always make your fitness a positive-definite quantity.
- In your fitness function, always store your rescaled input parameters in local variables, rather than back into the input parameter vector x(n).
- Whenever in doubt as to the number of generations through which you should let your solutions evolve, err on the high side.
- With global optimization, a posteriori Monte Carlo simulation is the safest way to get reliable error estimates on the global solution.
(1) This Exercise lets you have a go at a real orbital fitting problem. Your target is the star ρ CrB (meaning, the 17th brightest star in the constellation Corona Borealis), one of the first stars around which a planet was detected (see Noyes et al. 1997). You can obtain radial velocity data from the Tutorial Web Page under Data. These data were obtained using the Advanced Fiber Optic Echelle (AFOE) spectrograph. Follow the procedure outlined in §4.3 to obtain a set of best-fit orbital elements.

(2) Physically, what do you think causes the (ω, τ) degeneracy so obviously apparent on Fig. 15(D)?
PIKAIA uses a dynamically adjustable mutation rate in a manner reminiscent of Evolution Strategies. In evolutionary phases where the mutation rate is low, PIKAIA operates pretty much like a classical genetic algorithm; when the mutation rate is high, PIKAIA functions more like a stochastic hill-climber. This is a powerful algorithmic combination indeed. As for the ultimate worth of classical crossover, you are hereby encouraged to form your own opinion by doing Exercise 5.1 below.
If you have something that works well enough for you, don't mess with it! This is in fact the First Rule of Scientific Computing, also known as

HAMMING'S FIRST RULE: "The purpose of computing is insight, not numbers."

When, then, should you consider using genetic algorithms on a real-life research problem? There is no clear-cut answer to this question, but based on my own relatively limited experience I would offer the following list:

(1) Markedly multimodal optimization problems, where it is difficult to make a reliable initial guess as to the approximate location of the global optimum.
(2) Optimization problems for which derivatives with respect to defining parameters are very hard or impossible to evaluate in closed form. If a reliable initial guess is available, the simplex method is a strong contender; if not, genetic algorithms become the method of choice.
(3) Ill-conditioned data modeling problems, in particular those described by integral equations.
(4) Problems subject to positivity or monotonicity constraints that can be hardwired into the genetic algorithm's encoding scheme.

This is of course not meant to be exclusive of other classes of problems. One constraint to keep in mind is the fact that genetic algorithm-based optimization can be CPU-time consuming, because of the large number of model evaluations typically required in dealing with a hard problem. The relative ease with which genetic algorithms can be parallelized can offset in part this difficulty: on a "real life" problem most work goes into the fitness evaluation, which proceeds completely independently for each individual in the population (see Metcalfe and Charbonneau 2002 for a specific example). Never forget that your time is always more precious than computer time.

In the introductory essay opening his book on genetic programming, Koza (1992) lists seven basic features of "good" conventional optimization methods: correctness, consistency, justifiability, certainty, orderliness, parsimony, and decisiveness. He then goes on to argue that genetic algorithms embody none of these presumably sound principles. Is this ground to reject optimization methods based
on genetic algorithms? Koza does not think so, and neither do I. From a practical point of view the bottom line always is: use whatever works. In fact, that is precisely the message conveyed, loud and clear, by the biological world. I would like to bring this tutorial to a close with a final, Third Rule of Global Optimization. Unlike the first two, you probably would not find anything equivalent in optimization textbooks. In fact I did not come up with this rule, although I took the liberty of renaming it. It originates with Francis Crick, co-discoverer of the structure of DNA and 1962 Nobel Prize winner. So here is the Third Rule of Global Optimization, also known as

ORGEL'S SECOND RULE: "Evolution is cleverer than you are."
[...] functions. I have retained a soft spot for Davis' book because this is the book I used to teach myself genetic algorithms many years ago. It is far less comprehensive in its coverage than Goldberg's or Bäck's books, but has an application-oriented, no-nonsense flavor that I found, and continue to find, very refreshing. Two other books well worth looking into are

Michalewicz, Z. 1996, Genetic Algorithms + Data Structures = Evolution Programs, third ed., New York: Springer,
Mitchell, M. 1996, An Introduction to Genetic Algorithms, Cambridge: MIT Press.

Both of these put more emphasis on the type of non-numerical optimization problems that are closer to the hearts of most computer scientists, such as the Traveling Salesman Problem. Mitchell's book has a nice chapter describing applications of genetic algorithms in modeling evolutionary processes in biology. Genetic algorithms were originally developed in a much broader context, centering on the phenomenon of adaptation in quite general terms. Anybody serious about using genetic algorithms for complex optimization tasks should make it a point to work through the early bible in the field:

Holland, J.H. 1975, Adaptation in Natural and Artificial Systems, Ann Arbor: The University of Michigan Press (second ed. 1992, MIT Press).

If you want to know where the name Pikaia comes from, or if you think you just might enjoy an excellent book on evolution, read

Gould, S.J. 1989, Wonderful Life: The Burgess Shale and the Nature of History, New York: W.W. Norton & Company.

Finally, here are some samples from the computing science literature that I found useful and intellectually stimulating in pondering over some of the more "philosophical" material contained in this tutorial:

Bäck, T., Hammel, U., and Schwefel, H.-P. 1997, Evolutionary computation: comments on the history and current state, IEEE Transactions on Evolutionary Computation, 1, 3-16,
Culberson, J.C. 1998, On the futility of blind search: an algorithmic view of "No Free Lunch", Evolutionary Computation, 6, 109-127,
DeJong, K.A. 1993, Genetic algorithms are NOT function optimizers, in Foundations of Genetic Algorithms 2, ed. L.D. Whitley (San Mateo: Morgan Kaufmann),
Hamming, R.W. 1962, Numerical Methods for Scientists and Engineers, New York: McGraw-Hill, chap. N + 1,
Holland, J.H. 1995, Hidden Order: How Adaptation Builds Complexity (Reading: Addison-Wesley),
Koza, J.R. 1992, Genetic Programming: On the Programming of Computers by Means of Natural Selection (Cambridge: MIT Press), chap. 1,
Wolpert, D.H., and Macready, W.G. 1997, No Free Lunch theorems for optimization, IEEE Transactions on Evolutionary Computation, 1, 67-82.
[...] of the population in parameter space. Run this modified PIKAIA on the four test problems of §1.4 and compare the results to the GA2 algorithm.

25 There is no answer to this exercise on the Tutorial Web Page, only a few hints; but if you do get interesting results (namely, significantly enhanced performance that remains robust across problem domains), I would very much like to hear about it.
BIBLIOGRAPHY
Bäck, T. 1996, Evolutionary Algorithms in Theory and Practice, Oxford: Oxford University Press
Bertiau, F.C. 1957, Astrophys. J., 125, 696
Bevington, P.R., & Robinson, D.K. 1992, Data Reduction and Error Analysis for the Physical Sciences, second ed., New York: McGraw-Hill
Charbonneau, P. 2002, Release Notes for PIKAIA 1.2, NCAR Technical Note 451+STR, Boulder: National Center for Atmospheric Research
Charbonneau, P., & Knapp, B. 1995, A User's Guide to PIKAIA 1.0, NCAR Technical Note 418+IA, Boulder: National Center for Atmospheric Research (PUG)
Charbonneau, P., Tomczyk, S., Schou, J., & Thompson, M.J. 1998, Astrophys. J., 496, 1015
Darwin, C. 1859, On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life, London: J. Murray
Dawkins, R. 1986, The Blind Watchmaker, New York: W.W. Norton
Eigen, M. 1971, Die Naturwissenschaften, 58(10), 465
Gibson, S.E., & Charbonneau, P. 1998, J. Geophys. Res., 103(A7), 14511
Goldberg, D.E. 1989, Genetic Algorithms in Search, Optimization & Machine Learning, Reading: Addison-Wesley
Hamming, R.W. 1962, Numerical Methods for Scientists and Engineers, New York: McGraw-Hill
Holland, J.H. 1975, Adaptation in Natural and Artificial Systems, Ann Arbor: The University of Michigan Press (second ed. 1992, MIT Press)
Jenkins, F.A., & White, H.E. 1976, Fundamentals of Optics, fourth ed., New York: McGraw-Hill
Koza, J.R. 1992, Genetic Programming: On the Programming of Computers by Means of Natural Selection, Cambridge: MIT Press
Maynard Smith, J. 1989, Evolutionary Genetics, Oxford: Oxford University Press
McIntosh, S.W., Diver, D.A., Judge, P.G., Charbonneau, P., Ireland, J., & Brown, J.C. 1998, Astron. Ap. Suppl., 132, 145
Metcalfe, T., & Charbonneau, P. 2002, J. Comp. Phys., submitted
Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., & Teller, E. 1953, J. Chem. Phys., 21, 1087
Michalewicz, Z. 1996, Genetic Algorithms + Data Structures = Evolution Programs, third ed., New York: Springer
Nelder, J.A., & Mead, R. 1965, Computer J., 7, 308
Noyes, R.W., Jha, S., Korzennik, S.G., Krockenberger, M., Nisenson, P., Brown, T.M., Kennelly, E.J., & Horner, S.S. 1997, Astrophys. J. Lett., 483, L111
Petrie, R.M. 1962, in Astronomical Techniques, ed. W.A. Hiltner, vol. II of Stars and Stellar Systems, eds. G.P. Kuiper & B.M. Middlehurst, Chicago: University of Chicago Press, chap. 23
Press, W.H., Teukolsky, S.A., Vetterling, W.T., & Flannery, B.P. 1992, Numerical Recipes, second ed., Cambridge: Cambridge University Press
Smart, W.M. 1971, Textbook on Spherical Astronomy, fifth ed., Cambridge: Cambridge University Press