Genetic Algorithms in Search, Optimization, and Machine Learning

GENETIC ALGORITHMS in Search, Optimization & Machine Learning
DAVID E. GOLDBERG

As a graduate student at the University of Michigan, David E. Goldberg spearheaded a successful project applying genetic algorithms and classifier systems to the control of natural gas pipelines. After receiving his Ph.D. at the University of Michigan, Dr. Goldberg joined the faculty of the University of Alabama at Tuscaloosa, where he is now Associate Professor of Engineering Mechanics. Dr. Goldberg has continued his research in genetic algorithms and classifier systems and received a 1985 NSF Presidential Young Investigator Award for his work. Dr. Goldberg has 12 years of consulting experience in industry and government and has published numerous articles and papers.

Genetic Algorithms in Search, Optimization, and Machine Learning
David E. Goldberg
The University of Alabama

ADDISON-WESLEY PUBLISHING COMPANY, INC.
Reading, Massachusetts · Menlo Park, California · Sydney · Don Mills, Ontario · Madrid · San Juan · New York · Singapore · Amsterdam · Wokingham, England · Tokyo · Bonn

The procedures and applications presented in this book have been included for their instructional value. They have been tested with care but are not guaranteed for any particular purpose. The publisher does not offer any warranties or representations, nor does it accept any liabilities with respect to the programs or applications.

Library of Congress Cataloging-in-Publication Data
Goldberg, David E. (David Edward), 1953-
Genetic algorithms in search, optimization, and machine learning.
Bibliography: p.
Includes index.
1. Combinatorial optimization. 2. Algorithms. 3. Machine learning. I. Title.
QA402.5.G635 1989  006.3'1  88-6276
ISBN 0-201-15767-5

Reprinted with corrections January, 1989

Copyright © 1989 by Addison-Wesley Publishing Company, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher. Printed in the United States of America. Published simultaneously in Canada.

Foreword

I first encountered David Goldberg as a young, Ph.D.-bound civil engineer inquiring about my course Introduction to Adaptive Systems. He was something of an anomaly because his severely practical field experience in the gas-pipeline industry, and his evident interest in that industry, did not seem to dampen his interest in what was, after all, an abstract course involving a lot of "biological stuff." After he enrolled in the course, I soon realized that his expressed interests in control, gas pipelines, and AI were the surface tell-tales of a wide-ranging curiosity and a talent for careful analysis. He was, and is, an engineer interested in building, but he was, and is, equally interested in ideas.

Not long thereafter, Dave asked if I would be willing to co-chair (with Ben Wylie, the chairman of our Civil Engineering Department) a dissertation investigating the use of genetic algorithms and classifier systems in the control of gas-pipeline transmission. My first reaction was that this was too difficult a problem for a dissertation—there are no closed analytic solutions to even simple versions of the problem, and actual operation involves long, craftsmanlike apprenticeships. Dave persisted, and in a surprisingly short time produced a dissertation that, in turn, produced for him a 1985 NSF Presidential Young Investigator Award.
So much for my intuition as to what constitutes a reasonable dissertation.

In the past few years GAs have gone from an arcane subject known to a few of my students, and their students, to a subject engaging the curiosity of many different research communities including researchers in economics, political science, psychology, linguistics, immunology, biology, and computer science. A major reason for this interest is that GAs really work. GAs offer robust procedures that can exploit massively parallel architectures and, applied to classifier systems, they provide a new route toward an understanding of intelligence and adaptation.

David Goldberg's book provides a turnpike into this territory. One cannot be around David Goldberg for long without being infected by his enthusiasm and energy. That enthusiasm comes across in this book. It is also an embodiment of his passion for clear explanations and carefully worked examples. His book does an exceptional job of making the methods of GAs and classifier systems available to a wide audience. Dave is deeply interested in the intellectual problems of GAs and classifier systems, but he is interested even more in seeing these systems used. This book, I think, will be instrumental in realizing that ambition.

John Holland
Ann Arbor, Michigan

Preface

This book is about genetic algorithms (GAs)—search procedures based on the mechanics of natural selection and natural genetics. In writing it, I have tried to bring together the computer techniques, mathematical tools, and research results that will enable you to apply genetic algorithms to problems in your field. If you choose to do so, you will join a growing group of researchers and practitioners who have come to appreciate the natural analogues, mathematical analyses, and computer techniques comprised by the genetic algorithm methodology.

The book is designed to be a textbook and a self-study guide. I have tested the draft text in a one-semester, senior-level undergraduate/first-year graduate course devoted to genetic algorithms. Although the students came from different backgrounds (biochemistry, chemical engineering, computer science, electrical engineering, engineering mechanics, English, mathematics, mechanical engineering, and physics) and had wide differences in mathematical and computational maturity, they all acquired an understanding of the basic algorithm and its theory of operation. To reach such a diverse audience, the tone of the book is intentionally casual, and rigor has almost always been sacrificed in the interest of building intuition and understanding. Worked-out examples illustrate major topics, and computer assignments are available at the end of each chapter.

I have minimized the mathematics, genetics, and computer background required to read this book. An understanding of introductory college-level mathematics (algebra and a little calculus) is assumed. Elementary notions of counting and finite probability are used, and Appendix A summarizes the important concepts briefly. I assume no particular knowledge of genetics and define all required genetic terminology and concepts within the text. Last, some computer programming ability is necessary. If you have programmed a computer in any language, you should be able to follow the computer examples I present. All computer code in this book is written in Pascal, and Appendix B presents a brief introduction to the essentials of that language.
Although I have not explicitly subdivided the book into separate parts, the chapters may be grouped in two major categories: those dealing with search and optimization and those dealing with machine learning.

The first five chapters are devoted to genetic algorithms in search and optimization. Chapter 1 introduces the topic of genetic search; it also describes a simple genetic algorithm and illustrates the GA's application through a hand calculation. Chapter 2 introduces the essential theoretical basis of GAs, covering topics including schemata, the fundamental theorem, and extended analysis. If you dislike theory, you can safely skip Chapter 2 without excessive loss of continuity; however, before doing so, I suggest you try reading it anyway. The mathematical underpinnings of GAs are not difficult to follow, but their ramifications are subtle; some attention to analysis early in the study of GAs promotes fuller understanding of algorithm power. Chapter 3 introduces computer implementation of genetic algorithms through example. Specifically, a Pascal code called the simple genetic algorithm (SGA) is presented along with a number of extensions. Chapter 4 presents a historical account of early genetic algorithms together with a potpourri of current applications. Chapter 5 examines more advanced genetic operators and presents a number of applications illustrating their use. These include applications of micro- and macro-level operators as well as hybrid techniques.

Chapters 6 and 7 present the application of genetic algorithms in machine learning systems. Chapter 6 gives a generic description of one type of genetics-based machine learning (GBML) system, a classifier system. The theory of operation of such a system is briefly reviewed, and one Pascal implementation called the simple classifier system (SCS) is presented and applied to the learning of a boolean function. Chapter 7 rounds out the picture of GBML by presenting a historical review of early GBML systems together with a selective survey of other current systems and topics.

ACKNOWLEDGMENTS

In writing acknowledgments for a book on genetic algorithms, there is no question who should get top billing. I thank John H. Holland from the University of Michigan for his encouragement of this project and for giving birth to the infant we now recognize as the genetic algorithms movement. It hasn't been easy nurturing such a child. At times she showed signs of stunted intellectual growth, and the other kids on the block haven't always treated her very nicely. Nonetheless, John stood by his baby with the quiet confidence only a father can possess, knowing that his daughter would one day take her rightful place in the community of ideas.

I also thank two men who have influenced me in more ways than they know: E. Benjamin Wylie and William D. Jordan. Ben Wylie was my dissertation adviser in Civil Engineering at the University of Michigan. When I approached him with the idea for a dissertation about gas pipelines and genetic algorithms, he was appropriately skeptical, but he gave me the rope and taught me the research and organizational skills necessary not to hang myself. Bill Jordan was my Department Head in Engineering Mechanics at The University of Alabama (he retired in 1986). He was and continues to be a model of teaching quality and administrative fairness that I still strive to emulate.

I thank my colleagues in the Department of Engineering Mechanics at Alabama, A. E. Carden, C. H. Chang, C. R. Evces, S. C. Gambrell, J.
L. Hill, E. Jones, D. C. Raney, and H. B. Wilson, for their encouragement and support. I also thank my many colleagues in the genetic algorithms community. Particular thanks are due Stewart Wilson at the Rowland Institute for Science for providing special encouragement and a sympathetic ear on numerous occasions.

I thank my students (the notorious Bama Gene Hackers), including C. L. Bridges, K. Deb, C. L. Karr, C. H. Kuo, R. Lingle, Jr., M. P. Samtani, P. Segrest, T. Sivapalan, R. E. Smith, and M. Valenzuela-Rendon, for lots of long hours and hard work. I also recognize the workmanlike assistance rendered by a string of right-hand persons: A. L. Thomas, S. Damsky, B. Korb, and K. Y. Lee.

I acknowledge the editorial assistance provided by Sarah Bane Wood at Alabama. I am also grateful to the team at Addison-Wesley, including Peter Gordon, Helen Goldstein, Helen Wythe, and Cynthia Benn, for providing expert advice and assistance during this project. I thank the reviewers, Ken De Jong, John Holland, and Stewart Wilson, for their comments and suggestions.

A number of individuals and organizations have granted permission to reprint or adapt materials originally printed elsewhere. I gratefully acknowledge the permission granted by the following individuals: L. B. Booker, G. E. P. Box, K. A. De Jong, S. Forrest, J. J. Grefenstette, J. H. Holland, J. D. Schaffer, S. F. Smith, and S. W. Wilson. I also acknowledge the permission granted by the following organizations: Academic Press, Academic Press London Ltd. (Journal of Theoretical Biology), the American Society of Civil Engineers, the Association for Computing Machinery, the Conference Committee of the International Conference on Genetic Algorithms, Kluwer Academic Publishers (Machine Learning), North-Holland Physics Publishing, the Royal Statistical Society (Journal of the Royal Statistical Society, C), and John Wiley and Sons, Inc.

I thank my spouse and still best friend, Mary Ann, for her patience and assistance. There were more than a few evenings and weekends I didn't come home when I said I would, and she proofread the manuscript, judiciously separating my tolerable quips from my unacceptable quirks. Untold numbers of readers would thank you, Mary, if they knew the fate they have been spared by your sound judgment.

This material is based upon work supported by the National Science Foundation under Grant MSM-8451610. I am also grateful for research support provided by the Alabama Research Institute, Digital Equipment Corporation, Intel Corporation, Mr. Peter Prater, the Rowland Institute for Science, Texas Instruments Incorporated, and The University of Alabama.

Last, it has become a cliche in textbooks and monographs: after thanking one and all for their assistance, the author gallantly accepts blame for all remaining errors in the text. This is usually done with no small amount of pomp and circumstance—a ritualistic incantation to ward off the evil spirits of error. I will forgo this exercise and close these acknowledgments by paraphrasing a piece of graffiti that I first spotted on the third floor of the West Engineering Building at the University of Michigan:

To err is human. To really foul up, use a computer.

Unfortunately, in writing this book, I find myself subject to both of these sources of error, and no doubt many mistakes remain. I can only take comfort in knowing that error is the one inevitable side effect of our human past and the probable destiny of our artificially intelligent future.
Contents

FOREWORD
PREFACE

1  A GENTLE INTRODUCTION TO GENETIC ALGORITHMS
   What Are Genetic Algorithms?
   Robustness of Traditional Optimization and Search Methods
   The Goals of Optimization
   How Are Genetic Algorithms Different from Traditional Methods?
   A Simple Genetic Algorithm
   Genetic Algorithms at Work—A Simulation by Hand
   Grist for the Search Mill—Important Similarities
   Similarity Templates (Schemata)
   Learning the Lingo
   Summary
   Problems
   Computer Assignments

2  GENETIC ALGORITHMS REVISITED: MATHEMATICAL FOUNDATIONS
   Who Shall Live and Who Shall Die? The Fundamental Theorem
   Schema Processing at Work: An Example by Hand Revisited
   The Two-armed and k-armed Bandit Problem
   How Many Schemata Are Processed Usefully?
   The Building Block Hypothesis
   Another Perspective: The Minimal Deceptive Problem
   Schemata Revisited: Similarity Templates as Hyperplanes
   Summary
   Problems
   Computer Assignments

3  COMPUTER IMPLEMENTATION OF A GENETIC ALGORITHM
   Data Structures
   Reproduction, Crossover, and Mutation
   A Time to Reproduce, a Time to Cross
   Get with the Main Program
   How Well Does It Work?
   Mapping Objective Functions to Fitness Form
   Fitness Scaling
   Codings
   A Multiparameter, Mapped, Fixed-Point Coding
   Discretization
   Constraints
   Summary
   Problems
   Computer Assignments

4  SOME APPLICATIONS OF GENETIC ALGORITHMS
   The Rise of Genetic Algorithms
   Genetic Algorithm Applications of Historical Interest
   De Jong and Function Optimization
   Improvements in Basic Technique
   Current Applications of Genetic Algorithms
   Summary
   Problems
   Computer Assignments

5  ADVANCED OPERATORS AND TECHNIQUES IN GENETIC SEARCH
   Dominance, Diploidy, and Abeyance
   Inversion and Other Reordering Operators
   Other Micro-operators
   Niche and Speciation
   Multiobjective Optimization
   Knowledge-Based Techniques
   Genetic Algorithms and Parallel Processors
   Summary
   Problems
   Computer Assignments

6  INTRODUCTION TO GENETICS-BASED MACHINE LEARNING
   Genetics-Based Machine Learning: Whence It Came
   What Is a Classifier System?
   Rule and Message System
   Apportionment of Credit: The Bucket Brigade
   Genetic Algorithm
   A Simple Classifier System in Pascal
   Results Using the Simple Classifier System
   Summary
   Problems
   Computer Assignments

7  APPLICATIONS OF GENETICS-BASED MACHINE LEARNING
   The Rise of GBML
   Development of CS-1, the First Classifier System
   Smith's Poker Player
   Other Early GBML Efforts
   A Potpourri of Current Applications
   Summary
   Problems
   Computer Assignments

A LOOK BACK, A GLANCE AHEAD

APPENDIXES

A  A REVIEW OF COMBINATORICS AND ELEMENTARY PROBABILITY
   Counting
   Permutations
   Combinations
   Binomial Theorem
   Events and Spaces
   Axioms of Probability
   Equally Likely Outcomes
   Conditional Probability
   Partitions of an Event
   Bayes' Rule
   Independent Events
   Two Probability Distributions: Bernoulli and Binomial
   Expected Value of a Random Variable
   Limit Theorems
   Summary
   Problems

B  PASCAL WITH RANDOM NUMBER GENERATION FOR FORTRAN, BASIC, AND COBOL PROGRAMMERS
   Simple1: An Extremely Simple Code
   Simple2: Functions, Procedures, and More I/O
   Let's Do Something
   Last Stop Before Freeway
   Summary

C  A SIMPLE GENETIC ALGORITHM (SGA) IN PASCAL
D  A SIMPLE CLASSIFIER SYSTEM (SCS) IN PASCAL

E  PARTITION COEFFICIENT TRANSFORMS FOR PROBLEM-CODING ANALYSIS
   Partition Coefficient Transform
   An Example: f(x) = x^2 on Three Bits a Day
   What Do the Partition Coefficients Mean?
   Using Partition Coefficients to Analyze Deceptive Problems
   Designing GA-Deceptive Problems with Partition Coefficients
   Summary
   Problems
   Computer Assignments

BIBLIOGRAPHY
INDEX

1  A Gentle Introduction to Genetic Algorithms

In this chapter, we introduce genetic algorithms: what they are, where they came from, and how they compare to and differ from other search procedures. We illustrate how they work with a hand calculation, and we start to understand their power through the concept of a schema or similarity template.

WHAT ARE GENETIC ALGORITHMS?

Genetic algorithms are search algorithms based on the mechanics of natural selection and natural genetics. They combine survival of the fittest among string structures with a structured yet randomized information exchange to form a search algorithm with some of the innovative flair of human search. In every generation, a new set of artificial creatures (strings) is created using bits and pieces of the fittest of the old; an occasional new part is tried for good measure. While randomized, genetic algorithms are no simple random walk. They efficiently exploit historical information to speculate on new search points with expected improved performance.

Genetic algorithms have been developed by John Holland, his colleagues, and his students at the University of Michigan. The goals of their research have been twofold: (1) to abstract and rigorously explain the adaptive processes of natural systems, and (2) to design artificial systems software that retains the important mechanisms of natural systems. This approach has led to important discoveries in both natural and artificial systems science.

The central theme of research on genetic algorithms has been robustness, the balance between efficiency and efficacy necessary for survival in many different environments. The implications of robustness for artificial systems are manifold. If artificial systems can be made more robust, costly redesigns can be reduced or eliminated. If higher levels of adaptation can be achieved, existing systems can perform their functions longer and better. Designers of artificial systems—both software and hardware, whether engineering systems, computer systems, or business systems—can only marvel at the robustness, the efficiency, and the flexibility of biological systems. Features for self-repair, self-guidance, and reproduction are the rule in biological systems, whereas they barely exist in the most sophisticated artificial systems.

Thus, we are drawn to an interesting conclusion: where robust performance is desired (and where is it not?), nature does it better; the secrets of adaptation and survival are best learned from the careful study of biological example. Yet we do not accept the genetic algorithm method by appeal to this beauty-of-nature argument alone. Genetic algorithms are theoretically and empirically proven to provide robust search in complex spaces. The primary monograph on the topic is Holland's (1975) Adaptation in Natural and Artificial Systems.
Many papers and dissertations establish the validity of the technique in function optimization and control applications. Having been established as a valid approach to problems requiring efficient and effective search, genetic algorithms are now finding more widespread application in business, scientific, and engineering circles. The reasons behind the growing numbers of applications are clear. These algorithms are computationally simple yet powerful in their search for improvement. Furthermore, they are not fundamentally limited by restrictive assumptions about the search space (assumptions concerning continuity, existence of derivatives, unimodality, and other matters). We will investigate the reasons behind these attractive qualities; but before this, we need to explore the robustness of more widely accepted search procedures.

ROBUSTNESS OF TRADITIONAL OPTIMIZATION AND SEARCH METHODS

This book is not a comparative study of search and optimization techniques. Nonetheless, it is important to question whether conventional search methods meet our robustness requirements. The current literature identifies three main types of search methods: calculus-based, enumerative, and random. Let us examine each type to see what conclusions may be drawn without formal testing.

Calculus-based methods have been studied heavily. These subdivide into two main classes: indirect and direct. Indirect methods seek local extrema by solving the usually nonlinear set of equations resulting from setting the gradient of the objective function equal to zero. This is the multidimensional generalization of the elementary calculus notion of extremal points, as illustrated in Fig. 1.1. Given a smooth, unconstrained function, finding a possible peak starts by restricting search to those points with slopes of zero in all directions. On the other hand, direct (search) methods seek local optima by hopping on the function and moving in a direction related to the local gradient. This is simply the notion of hill climbing: to find the local best, climb the function in the steepest permissible direction.

FIGURE 1.1 The single-peak function is easy for calculus-based methods.

While both of these calculus-based methods have been improved, extended, hashed, and rehashed, some simple reasoning shows their lack of robustness. First, both methods are local in scope: the optima they seek are the best in a neighborhood of the current point. For example, suppose that Fig. 1.1 shows a portion of the complete domain of interest; a more complete picture is shown in Fig. 1.2. Clearly, starting the search or zero-finding procedures in the neighborhood of the lower peak will cause us to miss the main event (the higher peak). Furthermore, once the lower peak is reached, further improvement must be sought through random restart or other trickery. Second, calculus-based methods depend upon the existence of derivatives (well-defined slope values). Even if we allow numerical approximation of derivatives, this is a severe shortcoming. Many practical parameter spaces have little respect for the notion of a derivative and the smoothness this implies. Theorists interested in optimization have been too willing to accept the legacy of the great eighteenth- and nineteenth-century mathematicians who painted a clean world of quadratic objective functions, ideal constraints, and ever-present derivatives.
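The locality problem is easy to see in code. The following sketch is an illustration of mine, not a listing from the book: the two-peak test function and all names in it are invented for this demonstration only. A climber that always steps to a better neighboring point stalls on whichever peak it happens to start near.

```pascal
program HillClimbDemo;
{ Toy illustration of hill climbing on a two-peak function of one integer
  variable x in [0, 31].  The function is invented for this sketch only:
  it has a lower peak at x = 6 and a higher peak at x = 26. }

function f(x: integer): real;
begin
  if x < 16 then
    f := 40.0 - sqr(x - 6)    { local peak: f(6) = 40 }
  else
    f := 90.0 - sqr(x - 26);  { global peak: f(26) = 90 }
end;

var
  x: integer;
begin
  x := 0;  { start in the basin of the lower peak }
  { Step to a neighboring point whenever it improves the function value. }
  while ((x < 31) and (f(x + 1) > f(x))) or ((x > 0) and (f(x - 1) > f(x))) do
    if (x < 31) and (f(x + 1) > f(x)) then
      x := x + 1
    else
      x := x - 1;
  writeln('Climber stops at x = ', x, ' with f = ', f(x):6:1,
          ' and never sees the higher peak at x = 26.');
end.
```

Started at x = 0, the climber walks up to x = 6 and stops, which is exactly the dilemma of Fig. 1.2 below: without a restart or some other trickery, a purely local method never learns that a better hill exists.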
The real world of search is fraught with discontinuities and vast, multimodal, noisy search spaces as depicted in a less calculus-friendly function in Fig. 1.3. It comes as no surprise that methods depending upon the restrictive requirements of continuity and derivative existence are unsuitable for all but a very limited problem domain. For this reason and because of their inherently local scope of search, we must reject calculus-based methods. They are insufficiently robust in unintended domains.

FIGURE 1.2 The multiple-peak function causes a dilemma. Which hill should we climb?

FIGURE 1.3 Many functions are noisy and discontinuous and thus unsuitable for search by traditional methods.

Enumerative schemes have been considered in many shapes and sizes. The idea is fairly straightforward; within a finite search space, or a discretized infinite search space, the search algorithm starts looking at objective function values at every point in the space, one at a time. Although the simplicity of this type of algorithm is attractive, and enumeration is a very human kind of search (when the number of possibilities is small), such schemes must ultimately be discounted in the robustness race for one simple reason: lack of efficiency. Many practical spaces are simply too large to search one at a time and still have a chance of using the information to some practical end. Even the highly touted enumerative scheme dynamic programming breaks down on problems of moderate size and complexity, suffering from a malady melodramatically labeled the "curse of dimensionality" by its creator (Bellman, 1961). We must conclude that less clever enumerative schemes are similarly, and more abundantly, cursed for real problems.

Random search algorithms have achieved increasing popularity as researchers have recognized the shortcomings of calculus-based and enumerative schemes. Yet, random walks and random schemes that search and save the best must also be discounted because of the efficiency requirement. Random searches, in the long run, can be expected to do no better than enumerative schemes. In our haste to discount strictly random search methods, we must be careful to separate them from randomized techniques. The genetic algorithm is an example of a search procedure that uses random choice as a tool to guide a highly exploitative search through a coding of a parameter space. Using random choice as a tool in a directed search process seems strange at first, but nature contains many examples. Another currently popular search technique, simulated annealing, uses random processes to help guide its form of search for minimal energy states. A recent book (Davis, 1987) explores the connections between simulated annealing and genetic algorithms. The important thing to recognize at this juncture is that randomized search does not necessarily imply directionless search.

While our discussion has been no exhaustive examination of the myriad methods of traditional optimization, we are left with a somewhat unsettling conclusion: conventional search methods are not robust. This does not imply that they are not useful. The schemes mentioned and countless hybrid combinations and permutations have been used successfully in many applications; however, as more complex problems are attacked, other methods will be necessary. To put this point in better perspective, inspect the problem spectrum of Fig. 1.4.
In the figure a mythical effectiveness index is plotted across a problem continuum for a specialized scheme, an enumerative scheme, and an idealized robust scheme. The gradient technique performs well in its narrow problem class, as we expect, but it becomes highly inefficient (if useful at all) elsewhere. On the other hand, the enumerative scheme performs with egalitarian inefficiency across the spectrum of problems, as shown by the lower performance curve. Far more desirable would be a performance curve like the one labeled Robust Scheme. It would be worthwhile sacrificing peak performance on a particular problem to achieve a relatively high level of performance across the spectrum of problems. (Of course, with broad, efficient methods we can always create hybrid schemes that combine the best of the local search method with the more general robust scheme. We will have more to say about this possibility in Chapter 5.) We shall soon see how genetic algorithms help fill this robustness gap.

FIGURE 1.4 Many traditional schemes work well in a narrow problem domain. Enumerative schemes and random walks work equally inefficiently across a broad spectrum. A robust method works well across a broad spectrum of problems.

THE GOALS OF OPTIMIZATION

Before examining the mechanics and power of a simple genetic algorithm, we must be clearer about our goals when we say we want to optimize a function or a process. What are we trying to accomplish when we optimize? The conventional view is presented well by Beightler, Phillips, and Wilde (1979, p. 1):

   Man's longing for perfection finds expression in the theory of optimization. It studies how to describe and attain what is Best, once one knows how to measure and alter what is Good or Bad. ... Optimization theory encompasses the quantitative study of optima and methods for finding them.

Thus optimization seeks to improve performance toward some optimal point or points. Note that this definition has two parts: (1) we seek improvement to approach some (2) optimal point. There is a clear distinction between the process of improvement and the destination or optimum itself. Yet, in judging optimization procedures we commonly focus solely upon convergence (does the method reach the optimum?) and forget entirely about interim performance. This emphasis stems from the origins of optimization in the calculus. It is not, however, a natural emphasis.

Consider a human decision maker, for example, a businessman. How do we judge his decisions? What criteria do we use to decide whether he has done a good or bad job? Usually we say he has done well when he makes adequate selections within the time and resources allotted. Goodness is judged relative to his competition. Does he produce a better widget? Does he get it to market more efficiently? With better promotion? We never judge a businessman by an attainment-of-the-best criterion; perfection is all too stern a taskmaster. As a result, we conclude that convergence to the best is not an issue in business or in most walks of life; we are only concerned with doing better relative to others. Thus, if we want more humanlike optimization tools, we are led to a reordering of the priorities of optimization. The most important goal of optimization is improvement.
Can we get to some good, "satisficing" (Simon, 1969) level of performance quickly? Attainment of the optimum is much less important for complex systems. It would be nice to be perfect; meanwhile, we can only strive to improve. In the next chapter we watch the genetic algorithm for these qualities; here we outline some important differences between genetic algorithms and more traditional methods.

HOW ARE GENETIC ALGORITHMS DIFFERENT FROM TRADITIONAL METHODS?

In order for genetic algorithms to surpass their more traditional cousins in the quest for robustness, GAs must differ in some very fundamental ways. Genetic algorithms are different from more normal optimization and search procedures in four ways:

1. GAs work with a coding of the parameter set, not the parameters themselves.
2. GAs search from a population of points, not a single point.
3. GAs use payoff (objective function) information, not derivatives or other auxiliary knowledge.
4. GAs use probabilistic transition rules, not deterministic rules.

Genetic algorithms require the natural parameter set of the optimization problem to be coded as a finite-length string over some finite alphabet. As an example, consider the optimization problem posed in Fig. 1.5. We wish to maximize the function f(x) = x^2 on the integer interval [0, 31]. With more traditional methods we would be tempted to twiddle with the parameter x, turning it like the vertical hold knob on a television set, until we reached the highest objective function value. With GAs, the first step of our optimization process is to code the parameter x as a finite-length string. There are many ways to code the x parameter, and Chapter 3 examines some of these in detail. At the moment, let's consider an optimization problem where the coding comes a bit more naturally.

FIGURE 1.5 A simple function optimization example, the function f(x) = x^2 on the integer interval [0, 31].

Consider the black box switching problem illustrated in Fig. 1.6. This problem concerns a black box device with a bank of five input switches. For every setting of the five switches, there is an output signal f; mathematically f = f(s), where s is a particular setting of the five switches. The objective of the problem is to set the switches to obtain the maximum possible f value. With other methods of optimization we might work directly with the parameter set (the switch settings) and toggle switches from one setting to another using the transition rules of our particular method. With genetic algorithms, we first code the switches as a finite-length string. A simple code can be generated by considering a string of five 1's and 0's where each of the five switches is represented by a 1 if the switch is on and a 0 if the switch is off. With this coding, the string 11110 codes the setting where the first four switches are on and the fifth switch is off. Some of the codings introduced later will not be so obvious, but at this juncture we acknowledge that genetic algorithms use codings.

FIGURE 1.6 A black box optimization problem with five on-off switches illustrates the idea of a coding and a payoff measure. Genetic algorithms only require these two things: they don't need to know the workings of the black box.
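To make the coding idea concrete, here is a minimal sketch in the spirit of the book's Pascal listings, though it is my own illustration and not one of them. It treats a five-character string of 1's and 0's both as a switch setting and as an unsigned binary integer, and evaluates the f(x) = x^2 payoff of Fig. 1.5; the helper name decode and the demonstration string are choices made for this sketch.

```pascal
program CodingDemo;
{ A five-character string of '1's and '0's codes the five switch settings;
  the same string, read as an unsigned binary integer, codes a value of x
  in [0, 31] for the example function f(x) = x * x. }

function decode(s: string): integer;
var
  i, accum: integer;
begin
  accum := 0;
  for i := 1 to Length(s) do
    accum := 2 * accum + Ord(s[i]) - Ord('0');  { shift left and add the next bit }
  decode := accum;
end;

var
  s: string;
  x: integer;
begin
  s := '11110';                      { switches 1-4 on, switch 5 off }
  x := decode(s);                    { read the same string as a number }
  writeln('string ', s, ' decodes to x = ', x, ', f(x) = x^2 = ', x * x);
end.
```

With a decode routine like this in hand, the black box of Fig. 1.6 could be replaced by any payoff routine whatsoever; the algorithm itself only ever sees strings and the payoff values they earn.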
Later it will be apparent that genetic algorithms exploit coding similarities in a very general way; as a result, they are largely unconstrained by the limitations of other methods (continuity, derivative existence, unimodality, and so on).

In many optimization methods, we move gingerly from a single point in the decision space to the next using some transition rule to determine the next point. This point-to-point method is dangerous because it is a perfect prescription for locating false peaks in multimodal (many-peaked) search spaces. By contrast, GAs work from a rich database of points simultaneously (a population of strings), climbing many peaks in parallel; thus, the probability of finding a false peak is reduced over methods that go point to point. As an example, let's consider our black box optimization problem (Fig. 1.6) again. Other techniques for solving this problem might start with one set of switch settings, apply some transition rules, and generate a new trial switch setting. A genetic algorithm starts with a population of strings and thereafter generates successive populations of strings. For example, in the five-switch problem, a random start using successive coin flips (head = 1, tail = 0) might generate the initial population of size n = 4 (small by genetic algorithm standards):

01101
11000
01000
10011

After this start, successive populations are generated using the genetic algorithm. By working from a population of well-adapted diversity instead of a single point, the genetic algorithm adheres to the old adage that there is safety in numbers; we will soon see how this parallel flavor contributes to a genetic algorithm's robustness.

Many search techniques require much auxiliary information in order to work properly. For example, gradient techniques need derivatives (calculated analytically or numerically) in order to be able to climb the current peak, and other local search procedures like the greedy techniques of combinatorial optimization (Lawler, 1976; Syslo, Deo, and Kowalik, 1983) require access to most if not all tabular parameters. By contrast, genetic algorithms have no need for all this auxiliary information: GAs are blind. To perform an effective search for better and better structures, they only require payoff values (objective function values) associated with individual strings. This characteristic makes a GA a more canonical method than many search schemes. After all, every search problem has a metric (or metrics) relevant to the search; however, different search problems have vastly different forms of auxiliary information. Only if we refuse to use this auxiliary information can we hope to develop the broadly based schemes we desire. On the other hand, the refusal to use specific knowledge when it does exist can place an upper bound on the performance of an algorithm when it goes head to head with methods designed for that problem. Chapter 5 examines ways to use nonpayoff information in so-called knowledge-directed genetic algorithms; however, at this juncture we stress the importance of the blindness assumption to pure genetic algorithm robustness.

Unlike many methods, GAs use probabilistic transition rules to guide their search. To persons familiar with deterministic methods this seems odd, but the use of probability does not suggest that the method is some simple random search; this is not decision making at the toss of a coin. Genetic algorithms use random choice as a tool to guide a search toward regions of the search space with likely improvement.
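As a small illustration of the random start described above, the following sketch (again mine, not a listing from the book) generates an initial population of n = 4 five-bit strings by simulated coin flips; the constant names popsize and lchrom are simply labels chosen for this sketch.

```pascal
program InitPopDemo;
{ Generate a random initial population of four 5-bit strings,
  one simulated coin flip per bit position (heads = '1', tails = '0'). }
const
  popsize = 4;   { number of strings }
  lchrom  = 5;   { string length }
var
  i, j: integer;
  chrom: string;
begin
  Randomize;                      { seed the random number generator }
  for i := 1 to popsize do
  begin
    chrom := '';
    for j := 1 to lchrom do
      if Random < 0.5 then        { Random (no argument) is uniform on [0,1) }
        chrom := chrom + '1'
      else
        chrom := chrom + '0';
    writeln('string ', i, ': ', chrom);
  end;
end.
```

Each run, of course, produces a different draw; the four strings used throughout this chapter (01101, 11000, 01000, 10011) are simply one such random start.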
Taken together, these four differences—direct use of a coding, search from a population, blindness to auxiliary information, and randomized operators—contribute to a genetic algorithm's robustness and resulting advantage over other more commonly used techniques. The next section introduces a simple three-operator genetic algorithm.

A SIMPLE GENETIC ALGORITHM

The mechanics of a simple genetic algorithm are surprisingly simple, involving nothing more complex than copying strings and swapping partial strings. The explanation of why this simple process works is much more subtle and powerful. Simplicity of operation and power of effect are two of the main attractions of the genetic algorithm approach.

The previous section pointed out how genetic algorithms process populations of strings. Recalling the black box switching problem, remember that the initial population had four strings:

01101
11000
01000
10011

Also recall that this population was chosen at random through 20 successive flips of an unbiased coin. We now must define a set of simple operations that take this initial population and generate successive populations that (we hope) improve over time. A simple genetic algorithm that yields good results in many practical problems is composed of three operators:

1. Reproduction
2. Crossover
3. Mutation

Reproduction is a process in which individual strings are copied according to their objective function values, f (biologists call this function the fitness function). Intuitively, we can think of the function f as some measure of profit, utility, or goodness that we want to maximize. Copying strings according to their fitness values means that strings with a higher value have a higher probability of contributing one or more offspring in the next generation. This operator, of course, is an artificial version of natural selection, a Darwinian survival of the fittest among string creatures. In natural populations fitness is determined by a creature's ability to survive predators, pestilence, and the other obstacles to adulthood and subsequent reproduction. In our unabashedly artificial setting, the objective function is the final arbiter of the string-creature's life or death.

TABLE 1.1 Sample Problem Strings and Fitness Values

No.    String    Fitness    % of Total
1      01101       169         14.4
2      11000       576         49.2
3      01000        64          5.5
4      10011       361         30.9
Total             1170        100.0

The reproduction operator may be implemented in algorithmic form in a number of ways. Perhaps the easiest is to create a biased roulette wheel where each current string in the population has a roulette wheel slot sized in proportion to its fitness. Suppose the sample population of four strings in the black box problem has objective or fitness function values f as shown in Table 1.1 (for now we accept these values as the output of some unknown and arbitrary black box—later we will examine a function and coding that generate these same values). Summing the fitness over all four strings, we obtain a total of 1170. The percentage of population total fitness is also shown in the table. The corresponding weighted roulette wheel for this generation's reproduction is shown in Fig. 1.7. To reproduce, we simply spin the weighted roulette wheel thus defined four times. For the example problem, string number 1 has a fitness value of 169, which represents 14.4 percent of the total fitness. As a result, string 1 is given 14.4 percent of the biased roulette wheel, and each spin turns up string 1 with probability 0.144. Each time we require another offspring, a simple spin of the weighted roulette wheel yields the reproduction candidate. In this way, more highly fit strings have a higher number of offspring in the succeeding generation.

FIGURE 1.7 Simple reproduction allocates offspring strings using a roulette wheel with slots sized according to fitness. The sample wheel is sized for the problem of Tables 1.1 and 1.2.

Once a string has been selected for reproduction, an exact replica of the string is made. This string is then entered into a mating pool, a tentative new population, for further genetic operator action.

After reproduction, simple crossover (Fig. 1.8) may proceed in two steps. First, members of the newly reproduced strings in the mating pool are mated at random. Second, each pair of strings undergoes crossing over as follows: an integer position k along the string is selected uniformly at random between 1 and the string length less one [1, l - 1]. Two new strings are created by swapping all characters between positions k + 1 and l inclusively. For example, consider strings A1 and A2 from our example initial population:

A1 = 0110|1
A2 = 1100|0

Suppose in choosing a random number between 1 and 4, we obtain a k = 4 (as indicated by the separator symbol |). The resulting crossover yields two new strings where the prime (') means the strings are part of the new generation:

A1' = 01100
A2' = 11001

FIGURE 1.8 A schematic of simple crossover shows the alignment of two strings and the partial exchange of information, using a cross site chosen at random.

The mechanics of reproduction and crossover are surprisingly simple, involving random number generation, string copies, and some partial string exchanges. Nonetheless, the combined emphasis of reproduction and the structured, though randomized, information exchange of crossover give genetic algorithms much of their power. At first this seems surprising. How can two such simple (and computationally trivial) operators result in anything useful, let alone a rapid and robust search mechanism? Furthermore, doesn't it seem a little strange that chance should play such a fundamental role in a directed search process? We will examine a partial answer to the first of these two questions in a moment; the answer to the second question was well recognized by the mathematician J. Hadamard (1949, p. 29):

   We shall see a little later that the possibility of imputing discovery to pure chance is already excluded. ... On the contrary, that there is an intervention of chance but also a necessary work of unconsciousness, the latter implying and not contradicting the former. ... Indeed, it is obvious that invention or discovery, be it in mathematics or anywhere else, takes place by combining ideas.

Hadamard suggests that even though discovery is not a result—cannot be a result—of pure chance, it is almost certainly guided by directed serendipity. Furthermore, Hadamard hints that a proper role for chance in a more humanlike discovery mechanism is to cause the juxtaposition of different notions. It is interesting that genetic algorithms adopt Hadamard's mix of direction and chance in a manner that efficiently builds new solutions from the best partial solutions of previous trials.
To see this, consider a population of n strings (perhaps the four-string population for the black box problem) over some appropriate alphabet, coded so that each is a complete idea or prescription for performing a particular task (in this case, each string is one complete switch-setting idea). Substrings within each string (idea) contain various notions of what is important or relevant to the task. Viewed in this way, the population contains not just a sample of n ideas; rather, it contains a multitude of notions and rankings of those notions for task performance. Genetic algorithms ruthlessly exploit this wealth of information by (1) reproducing high-quality notions according to their performance and (2) crossing these notions with many other high-performance notions from other strings. Thus, the action of crossover with previous reproduction speculates on new ideas constructed from the high-performance building blocks (notions) of past trials.

In passing, we note that despite the somewhat fuzzy definition of a notion, we have not limited a notion to simple linear combinations of single features or pairs of features. Biologists have long recognized that evolution must efficiently process the epistasis (positionwise nonlinearity) that arises in nature. In a similar manner, the notion processing of genetic algorithms must effectively process notions even when they depend upon their component features in highly nonlinear and complex ways.

Exchanging of notions to form new ideas is appealing intuitively, if we think in terms of the process of innovation. What is an innovative idea? As Hadamard suggests, most often it is a juxtaposition of things that have worked well in the past. In much the same way, reproduction and crossover combine to search potentially pregnant new ideas. This experience of emphasis and crossing is analogous to the human interaction many of us have observed at a trade show or scientific conference. At a widget conference, for example, various widget experts from around the world gather to discuss the latest in widget technology. After the lecture sessions, they all pair off around the bar to exchange widget stories. Well-known widget experts, of course, are in greater demand and exchange more ideas, thoughts, and notions with their lesser known widget colleagues. When the show ends, the widget people return to their widget laboratories to try out a surfeit of widget innovations. The process of reproduction and crossover in a genetic algorithm is this kind of exchange. High-performance notions are repeatedly tested and exchanged in the search for better and better performance.
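The two operators discussed so far are easy to state in code. The sketch below is my own minimal version, not the SGA listing of Appendix C: it performs fitness-proportionate reproduction with a simulated roulette wheel and a single-point crossover on the four-string example population. The function name select, the Copy-based crossover, and the variable names are all choices made for this sketch.

```pascal
program SelectCrossDemo;
{ Roulette-wheel reproduction and single-point crossover on the
  four-string example population of Table 1.1. }
const
  popsize = 4;
  lchrom  = 5;
var
  pop: array[1..popsize] of string;
  fitness: array[1..popsize] of real;
  mate1, mate2, site: integer;
  child1, child2: string;

function select: integer;             { spin the biased roulette wheel once }
var
  total, spin, partsum: real;
  j: integer;
begin
  total := 0.0;
  for j := 1 to popsize do total := total + fitness[j];
  spin := Random * total;             { random point on the wheel }
  partsum := 0.0;
  j := 0;
  repeat
    j := j + 1;
    partsum := partsum + fitness[j];  { walk the slots until the spin point is covered }
  until (partsum >= spin) or (j = popsize);
  select := j;
end;

begin
  Randomize;
  pop[1] := '01101'; fitness[1] := 169;
  pop[2] := '11000'; fitness[2] := 576;
  pop[3] := '01000'; fitness[3] := 64;
  pop[4] := '10011'; fitness[4] := 361;

  mate1 := select;                    { two spins pick the parents }
  mate2 := select;
  site  := Random(lchrom - 1) + 1;    { cross site k in [1, l - 1] }

  { swap all characters after position k }
  child1 := Copy(pop[mate1], 1, site) + Copy(pop[mate2], site + 1, lchrom - site);
  child2 := Copy(pop[mate2], 1, site) + Copy(pop[mate1], site + 1, lchrom - site);

  writeln('parents ', pop[mate1], ' and ', pop[mate2], ' crossed at site ', site);
  writeln('offspring ', child1, ' and ', child2);
end.
```

With parents 01101 and 11000 and a cross site of 4, the two Copy calls reproduce the hand-worked offspring 01100 and 11001; different spins and different sites give different pairings, which is precisely the point of the operator.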
If reproduction according to fitness combined with crossover gives genetic algorithms the bulk of their processing power, what then is the purpose of the mutation operator? Not surprisingly, there is much confusion about the role of mutation in genetics (both natural and artificial). Perhaps it is the result of too many B movies detailing the exploits of mutant eggplants that consume mass quantities of Tokyo or Chicago, but whatever the cause for the confusion, we find that mutation plays a decidedly secondary role in the operation of genetic algorithms. Mutation is needed because, even though reproduction and crossover effectively search and recombine extant notions, occasionally they may become overzealous and lose some potentially useful genetic material (1's or 0's at particular locations). In artificial genetic systems, the mutation operator protects against such an irrecoverable loss. In the simple GA, mutation is the occasional (with small probability) random alteration of the value of a string position. In the binary coding of the black box problem, this simply means changing a 1 to a 0 and vice versa. By itself, mutation is a random walk through the string space. When used sparingly with reproduction and crossover, it is an insurance policy against premature loss of important notions.

To see that the mutation operator plays a secondary role in the simple GA, we simply note that the frequency of mutation needed to obtain good results in empirical genetic algorithm studies is on the order of one mutation per thousand bit (position) transfers. Mutation rates are similarly small (or smaller) in natural populations, leading us to conclude that mutation is appropriately considered as a secondary mechanism of genetic algorithm adaptation.

Other genetic operators and reproductive plans have been abstracted from the study of biological example. However, the three examined in this section, reproduction, simple crossover, and mutation, have proved to be both computationally simple and effective in attacking a number of important optimization problems. In the next section, we perform a hand simulation of the simple genetic algorithm to demonstrate both its mechanics and its power.

GENETIC ALGORITHMS AT WORK—A SIMULATION BY HAND

Let's apply our simple genetic algorithm to a particular optimization problem step by step. Consider the problem of maximizing the function f(x) = x^2, where x is permitted to vary between 0 and 31, a function displayed earlier as Fig. 1.5. To use a genetic algorithm we must first code the decision variables of our problem as some finite-length string. For this problem, we will code the variable x simply as a binary unsigned integer of length 5.

Before we proceed with the simulation, let's briefly review the notion of a binary integer. As decadigited creatures, we have little problem handling base 10 integers and arithmetic. For example, the five-digit number 53,095 may be thought of as

5·10^4 + 3·10^3 + 0·10^2 + 9·10^1 + 5·10^0 = 53,095.

In base 2 arithmetic, we of course only have two digits to work with, 0 and 1, and as an example the number 10011 decodes to the base 10 number

1·2^4 + 0·2^3 + 0·2^2 + 1·2^1 + 1·2^0 = 16 + 2 + 1 = 19.

With a five-bit (binary digit) unsigned integer we can obtain numbers between 0 (00000) and 31 (11111).

With a well-defined objective function and coding, we now simulate a single generation of a genetic algorithm with reproduction, crossover, and mutation. To start off, we select an initial population at random. We select a population of size 4 by tossing a fair coin 20 times. We can skip this step by using the initial population created in this way earlier for the black box switching problem. Looking at this population, shown on the left-hand side of Table 1.2, we observe that the decoded x values are presented along with the fitness or objective function values f(x). To make sure we know how the fitness values f(x) are calculated from the string representation, let's take a look at the third string of the initial population, string 01000. Decoding this string as an unsigned binary integer, we note that there is a single one in the 2^3 = 8's position. Hence for string 01000 we obtain x = 8. To calculate the fitness or objective function we simply square the x value and obtain the resulting fitness value f(x) = 64. Other x and f(x) values may be obtained similarly.
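Before following the hand calculation, it may help to see the bookkeeping in code. The sketch below is an illustration of mine, not a listing from the book: it decodes each string, computes f(x) = x^2, the selection probability pselect_i = f_i/Σf, and the expected count f_i/f̄ reported in Table 1.2, and then applies bit-by-bit mutation with pm = 0.001 to show how rarely anything flips at that rate. The names pmutation, decode, and the output layout are choices made for this sketch.

```pascal
program HandSimStats;
{ Reproduce the left-hand columns of Table 1.2 for the initial population,
  then apply bit-by-bit mutation with pm = 0.001 and count the flips. }
const
  popsize = 4;
  lchrom  = 5;
  pmutation = 0.001;
var
  pop: array[1..popsize] of string;
  fit: array[1..popsize] of real;
  sumfit, avgfit: real;
  i, j, x, nmutations: integer;

function decode(s: string): integer;
var
  k, accum: integer;
begin
  accum := 0;
  for k := 1 to Length(s) do
    accum := 2 * accum + Ord(s[k]) - Ord('0');
  decode := accum;
end;

begin
  Randomize;
  pop[1] := '01101'; pop[2] := '11000'; pop[3] := '01000'; pop[4] := '10011';

  sumfit := 0.0;
  for i := 1 to popsize do
  begin
    x := decode(pop[i]);
    fit[i] := x * x;                      { f(x) = x^2 }
    sumfit := sumfit + fit[i];
  end;
  avgfit := sumfit / popsize;

  writeln('string      x      f   pselect   expected count');
  for i := 1 to popsize do
  begin
    x := decode(pop[i]);
    writeln(pop[i]:6, x:7, fit[i]:7:0, fit[i] / sumfit:10:2, fit[i] / avgfit:12:2);
  end;
  writeln('sum f = ', sumfit:6:0, '   average f = ', avgfit:8:1);

  { Bit-by-bit mutation: with 20 transferred bits and pm = 0.001 we expect
    20 * 0.001 = 0.02 mutations per generation, so usually none occur. }
  nmutations := 0;
  for i := 1 to popsize do
    for j := 1 to lchrom do
      if Random < pmutation then
      begin
        if pop[i][j] = '1' then pop[i][j] := '0' else pop[i][j] := '1';
        nmutations := nmutations + 1;
      end;
  writeln('mutations this generation: ', nmutations);
end.
```

The printed pselect and expected-count columns match the hand-computed values that follow (0.14/0.58, 0.49/1.97, 0.06/0.22, 0.31/1.23); the exact average fitness of 292.5 appears in the table below as 293 after rounding.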
You may notice that the fitness or objective function values are the same as the black box values (compare Tables 1.1 and 1.2). This is no coincidence, and the black box optimization problem was well represented by the particular function, f(x), and coding we are now using. Of course, the genetic algorithm need not know any of this; it is just as happy to optimize some arbitrary switching function (or any other finite coding and function for that matter) as some polynomial function with straightforward binary coding. This discussion simply reinforces one of the strengths of the genetic algorithm: by exploiting similarities in codings, genetic algorithms can deal effectively with a broader class of functions than can many other procedures.

A generation of the genetic algorithm begins with reproduction. We select the mating pool of the next generation by spinning the weighted roulette wheel (shown in Fig. 1.7) four times. Actual simulation of this process using coin tosses has resulted in string 1 and string 4 receiving one copy in the mating pool, string 2 receiving two copies, and string 3 receiving no copies, as shown in the center of Table 1.2. Comparing this with the expected number of copies (n·pselect_i), we have obtained what we should expect: the best get more copies, the average stay even, and the worst die off.

TABLE 1.2 A Genetic Algorithm by Hand

No.      Initial Population      x Value    f(x) = x^2    pselect_i    Expected count    Actual count
         (Randomly Generated)                             (f_i/Σf)     (f_i/f̄)           (from Roulette Wheel)
1            01101                  13          169          0.14           0.58               1
2            11000                  24          576          0.49           1.97               2
3            01000                   8           64          0.06           0.22               0
4            10011                  19          361          0.31           1.23               1
Sum                                            1170          1.00           4.00               4.0
Average                                         293          0.25           1.00               1.0
Max                                             576          0.49           1.97               2.0

TABLE 1.2 (continued)

Mating Pool after         Mate          Crossover Site      New           x        f(x)
Reproduction              (Randomly     (Randomly           Population    Value    = x^2
(Cross Site Shown)        Selected)     Selected)
0110|1                        2              4              01100           12       144
1100|0                        1              4              11001           25       625
11|000                        4              2              11011           27       729
10|011                        3              2              10000           16       256
                                                            Sum                     1754
                                                            Average                  439
                                                            Max                      729

NOTES:
1) Initial population chosen by four repetitions of five coin tosses where heads = 1, tails = 0.
2) Reproduction performed through 1-part-in-8 simulation of roulette wheel selection (three coin tosses).
3) Crossover performed through binary decoding of 2 coin tosses (TT = 00 = 0 = cross site 1, HH = 11 = 3 = cross site 4).
4) Crossover probability assumed to be unity, pc = 1.0.
5) Mutation probability assumed to be 0.001, pm = 0.001. Expected mutations = 5·4·0.001 = 0.02. No mutations expected during a single generation. None simulated.

With an active pool of strings looking for mates, simple crossover proceeds in two steps: (1) strings are mated randomly, using coin tosses to pair off the happy couples, and (2) mated string couples cross over, using coin tosses to select the crossing sites. Referring again to Table 1.2, random choice of mates has selected the second string in the mating pool to be mated with the first. With a crossing site of 4, the two strings 01101 and 11000 cross and yield two new strings 01100 and 11001. The remaining two strings in the mating pool are crossed at site 2; the resulting strings may be checked in the table.

The last operator, mutation, is performed on a bit-by-bit basis. We assume that the probability of mutation in this test is 0.001. With 20 transferred bit positions we should expect 20·0.001 = 0.02 bits to undergo mutation during a given generation. Simulation of this process indicates that no bits undergo mutation for this probability value. As a result, no bit positions are changed from 0 to 1 or vice versa during this generation.

Following reproduction, crossover, and mutation, the new population is ready to be tested. To do this, we simply decode the new strings created by the simple genetic algorithm and calculate the fitness function values from the x values thus decoded. The results of a single generation of the simulation are shown at the right of Table 1.2. While drawing concrete conclusions from a single trial of a stochastic process is, at best, a risky business, we start to see how genetic algorithms combine high-performance notions to achieve better performance. In the table, note how both the maximal and average performance have improved in the new population. The population average fitness has improved from 293 to 439 in one generation. The maximum fitness has increased from 576 to 729 during that same period. Although random processes help cause these happy circumstances, we start to see that this improvement is no fluke. The best string of the first generation (11000) receives two copies because of its high, above-average performance. When this combines at random with the next highest string (10011) and is crossed at location 2 (again at random), one of the resulting strings (11011) proves to be a very good choice indeed.

This event is an excellent illustration of the ideas and notions analogy developed in the previous section. In this case, the resulting good idea is the combination of two above-average notions, namely the substrings 11--- and ---11. Although the argument is still somewhat heuristic, we start to see how genetic algorithms effect a robust search. In the next section, we expand our understanding of these concepts by analyzing genetic algorithms in terms of schemata or similarity templates.

The intuitive viewpoint developed thus far has much appeal. We have compared the genetic algorithm with certain human search processes commonly called innovative or creative. Furthermore, hand simulation of the simple genetic algorithm has given us some confidence that indeed something interesting is going on here. Yet, something is missing. What is being processed by genetic algorithms and how do we know whether processing it (whatever it is) will lead to optimal or near optimal results in a particular problem? Clearly, as scientists, engineers, and business managers we need to understand the what and the how of genetic algorithm performance.

To obtain this understanding, we examine the raw data available for any search procedure and discover that we can search more effectively if we exploit important similarities in the coding we use. This leads us to develop the important notion of a similarity template, or schema. This in turn leads us to a keystone of the genetic algorithm approach, the building block hypothesis.

GRIST FOR THE SEARCH MILL—IMPORTANT SIMILARITIES

For much too long we have ignored a fundamental question. In a search process given only payoff data (fitness values), what information is contained in a population of strings and their objective function values to help guide a directed search for improvement? To ask this question more clearly, consider the strings and fitness values originally displayed in Table 1.1 from the simulation of the previous section (the black box problem) and gathered below for convenience:

String    Fitness
01101       169
11000       576
01000        64
10011       361

What information is contained in this population to guide a directed search for improvement? On the face of it, there is not very much: four independent samples of different strings with their fitness values. As we stare at the page, however, quite naturally we start scanning up and down the string column, and we notice certain similarities among the strings. Exploring these similarities in more depth, we notice that certain string patterns seem highly associated with good performance. The longer we stare at the strings and their fitness values, the greater is the temptation to experiment with these high-fitness associations. It seems perfectly reasonable to play mix and match with some of the substrings that are highly correlated with past success. For example, in the sample population, the strings starting with a 1 seem to be among the best. Might this be an important ingredient in optimizing this function? Certainly with our function (f(x) = x^2) and our coding (a five-bit unsigned integer) we know it is (why is this true?).

But what are we doing here? Really, two separate things. First, we are seeking similarities among strings in the population. Second, we are looking for causal relationships between these similarities and high fitness. In so doing, we admit a wealth of new information to help guide a search. To see how much and precisely
To ask this question more clearly, consider the strings and fitness values originally displayed in Table 1.1 from the simulation of the previous section (the black box problem) and gathered below for convenience:

String    Fitness
01101     169
11000     576
01000      64
10011     361

What information is contained in this population to guide a directed search for improvement? On the face of it, there is not very much: four independent samples of different strings with their fitness values. As we stare at the page, however, quite naturally we start scanning up and down the string column, and we notice certain similarities among the strings. Exploring these similarities in more depth, we notice that certain string patterns seem highly associated with good performance. The longer we stare at the strings and their fitness values, the greater is the temptation to experiment with these high-fitness associations. It seems perfectly reasonable to play mix and match with some of the substrings that are highly correlated with past success. For example, in the sample population, the strings starting with a 1 seem to be among the best. Might this be an important ingredient in optimizing this function? Certainly with our function (f(x) = x^2) and our coding (a five-bit unsigned integer) we know it is (why is this true?).

But what are we doing here? Really, two separate things. First, we are seeking similarities among strings in the population. Second, we are looking for causal relationships between these similarities and high fitness. In so doing, we admit a wealth of new information to help guide a search. To see how much and precisely what information we admit, let us consider the important concept of a schema (plural, schemata), or similarity template.

SIMILARITY TEMPLATES (SCHEMATA)

In some sense we are no longer interested in strings as strings alone. Since important similarities among highly fit strings can help guide a search, we question how one string can be similar to its fellow strings. Specifically we ask: in what ways is a string a representative of other string classes with similarities at certain string positions? The framework of schemata provides the tool to answer these questions.

A schema (Holland, 1968, 1975) is a similarity template describing a subset of strings with similarities at certain string positions. For this discussion, let us once again limit ourselves, without loss of generality, to the binary alphabet {0, 1}. We motivate a schema most easily by appending a special symbol to this alphabet; we add the *, or don't care, symbol. With this extended alphabet we can now create strings (schemata) over the ternary alphabet {0, 1, *}, and the meaning of the schema is clear if we think of it as a pattern-matching device: a schema matches a particular string if at every location in the schema a 1 matches a 1 in the string, a 0 matches a 0, or a * matches either. As an example, consider the strings and schemata of length 5. The schema *0000 matches two strings, namely {10000, 00000}. As another example, the schema *111* describes a subset with four members, {01110, 01111, 11110, 11111}. As one last example, the schema 0*1** matches any of the eight strings of length 5 that begin with a 0 and have a 1 in the third position. As you can start to see, the idea of a schema gives us a powerful and compact way to talk about all the well-defined similarities among finite-length strings over a finite alphabet.
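Pattern matching against a schema is mechanical. The sketch below is an illustrative Python fragment (not from the text) that tests membership and enumerates the example schemata just discussed.

    from itertools import product

    def matches(schema, string):
        # A string is an instance of a schema if every fixed position agrees;
        # '*' (don't care) matches either 0 or 1.
        return all(s == '*' or s == c for s, c in zip(schema, string))

    def instances(schema):
        # Enumerate every binary string of the schema's length that it describes.
        return [''.join(bits) for bits in product('01', repeat=len(schema))
                if matches(schema, ''.join(bits))]

    print(instances('*0000'))        # ['00000', '10000']
    print(instances('*111*'))        # ['01110', '01111', '11110', '11111']
    print(len(instances('0*1**')))   # 8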
We should emphasize that the * is only a metasymbol (a symbol about other symbols); it is never explicitly processed by the genetic algorithm. It is simply a notational device that allows description of all possible similarities among strings of a particular length and alphabet.

Counting the total number of possible schemata is an enlightening exercise. In the previous example, with l = 5, we note there are 3 * 3 * 3 * 3 * 3 = 3^5 = 243 different similarity templates because each of the five positions may be a 0, 1, or *. In general, for alphabets of cardinality (number of alphabet characters) k, there are (k + 1)^l schemata. At first blush, it appears that schemata are making the search more difficult. For an alphabet with k elements there are only (only?) k^l different strings of length l. Why consider the (k + 1)^l schemata and enlarge the space of concern? Put another way, the length 5 example now has only 2^5 = 32 different alternative strings. Why make matters more difficult by considering 3^5 = 243 schemata? In fact, the reasoning discussed in the previous section makes things easier. Do you recall glancing up and down the list of four strings and fitness values and trying to figure out what to do next? We recognized that if we considered the strings separately, then we only had four pieces of information; however, when we considered the strings, their fitness values, and the similarities among the strings in the population, we admitted a wealth of new information to help direct our search. How much information do we admit by considering the similarities? The answer to this question is related to the number of unique schemata contained in the population. To count this quantity exactly requires knowledge of the strings in a particular population. To get a bound on the number of schemata in a particular population, we first count the number of schemata contained in an individual string, and then we get an upper bound on the total number of schemata in the population.

To see this, consider a single string of length 5: 11111, for example. This string is a member of 2^5 schemata because each position may take on its actual value or a don't care symbol. In general, a particular string contains 2^l schemata. As a result, a population of size n contains somewhere between 2^l and n * 2^l schemata, depending upon the population diversity. This fact verifies our earlier intuition. The original motivation for considering important similarities was to get more information to help guide our search. The counting argument shows that a wealth of information about important similarities is indeed contained in even moderately sized populations. We will examine how genetic algorithms effectively exploit this information. At this juncture, some parallel processing appears to be needed if we are to make use of all this information in a timely fashion.
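The counting argument is easy to check numerically; a small illustrative calculation (not from the text) for the l = 5, binary (k = 2) case:

    l, k, n = 5, 2, 4          # string length, alphabet size, population size from Table 1.1
    print((k + 1) ** l)        # 243 similarity templates over {0, 1, *}
    print(k ** l)              # 32 distinct strings of length 5
    print(2 ** l)              # 32 schemata matched by any single string
    print(n * 2 ** l)          # 128: upper bound on schemata present in a population of 4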
These counting arguments are well and good, but where does this all lead? More pointedly, of the 2^l to n * 2^l schemata contained in a population, how many are actually processed in a useful manner by the genetic algorithm? To obtain the answer to this question, we consider the effect of reproduction, crossover, and mutation on the growth or decay of important schemata from generation to generation.

The effect of reproduction on a particular schema is easy to determine: since more highly fit strings have higher probabilities of selection, on average we give an ever-increasing number of samples to the observed best similarity patterns (this is a good thing to do, as is shown in the next chapter); however, reproduction alone samples no new points in the space. What then happens to a particular schema when crossover is introduced? Crossover leaves a schema unscathed if it does not cut the schema, but it may disrupt a schema when it does. For example, consider the two schemata 1***0 and **11*. The first is likely to be disrupted by crossover, whereas the second is relatively unlikely to be destroyed. As a result, schemata of short defining length are left alone by crossover and reproduced at a good sampling rate by the reproduction operator. Mutation at normal, low rates does not disrupt a particular schema very frequently, and we are left with a startling conclusion. Highly fit, short-defining-length schemata (we call them building blocks) are propagated generation to generation by giving exponentially increasing samples to the observed best; all this goes on in parallel with no special bookkeeping or special memory other than our population of n strings. In the next chapter we will count how many schemata are processed usefully in each generation. It turns out that the number is something like n^3. This compares favorably with the number of function evaluations (n). Because this processing leverage is so important (and apparently unique to genetic algorithms), we give it a special name, implicit parallelism.

LEARNING THE LINGO

The power behind the simple operations of our genetic algorithm is at least intuitively clearer if we think of building blocks. Some questions remain: How do we know that building blocks lead to improvement? Why is it a near optimal strategy to give exponentially increasing samples to the best? How can we calculate the number of schemata usefully processed by the genetic algorithm? These questions are answered fully in the next chapter, but first we need to master the terminology used by researchers who work with genetic algorithms. Because genetic algorithms are rooted in both natural genetics and computer science, the terminology used in the GA literature is an unholy mix of the natural and the artificial. Until now we have focused on the artificial side of the genetic algorithm's ancestry and talked about strings, alphabets, string positions, and the like. We review the correspondence between these terms and their natural counterparts to connect with the growing GA literature and also to permit our own occasional slip of a natural utterance or two.

Roughly speaking, the strings of artificial genetic systems are analogous to chromosomes in biological systems. In natural systems, one or more chromosomes combine to form the total genetic prescription for the construction and operation of some organism. In natural systems the total genetic package is called the genotype. In artificial genetic systems the total package of strings is called a structure (in the early chapters of this book, the structure will consist of a single string, so the text refers to strings and structures interchangeably until it is necessary to differentiate between them). In natural systems, the organism formed by the interaction of the total genetic package with its environment is called the phenotype.
In artificial genetic systems, the structures decode to form a particular parameter set, solution alternative, or point (in the solution space). The designer of an artificial genetic system has a variety of alternatives for coding both numeric and nonnumeric parameters. We will confront codings and coding principles in later chapters; for now, we stick to our consideration of GA and natural terminology.

In natural terminology, we say that chromosomes are composed of genes, which may take on some number of values called alleles. In genetics, the position of a gene (its locus) is identified separately from the gene's function. Thus, we can talk of a particular gene, for example an animal's eye color gene, its locus, position 10, and its allele value, blue eyes. In artificial genetic search we say that strings are composed of features or detectors, which take on different values. Features may be located at different positions on the string. The correspondence between natural and artificial terminology is summarized in Table 1.3.

Thus far, we have not distinguished between a gene (a particular character) and its locus (its position); the position of a bit in a string has determined its meaning (how it decodes) uniformly throughout a population and throughout time. For example, the string 10000 is decoded as a binary unsigned integer 16 (base 10) because implicitly the 1 is in the 16's place. It is not necessary to limit codings like this, however. A later chapter presents more advanced structures that treat locus and gene separately.

TABLE 1.3  Comparison of Natural and GA Terminology

Natural       GA
chromosome    string
gene          feature, character, or detector
allele        feature value
locus         string position
genotype      structure
phenotype     parameter set, alternative solution, a decoded structure
epistasis     nonlinearity

SUMMARY

This chapter has laid the foundation for understanding genetic algorithms, their mechanics, and their power. We are led to these methods by our search for robustness; natural systems are robust (efficient and efficacious) as they adapt to a wide variety of environments. By abstracting nature's adaptation algorithm of choice in artificial form we hope to achieve similar breadth of performance. In fact, genetic algorithms have demonstrated their capability in a number of analytical and empirical studies.

The chapter has presented the detailed mechanics of a simple, three-operator genetic algorithm. Genetic algorithms operate on populations of strings, with the string coded to represent some underlying parameter set. Reproduction, crossover, and mutation are applied to successive string populations to create new string populations. These operators are simplicity itself, involving nothing more complex than random number generation, string copying, and partial string exchanging; yet, despite their simplicity, the resulting search performance is wide-ranging and impressive. Genetic algorithms realize an innovative notion exchange among strings and thus connect to our own ideas of human search or discovery. A simulation of one generation of the simple genetic algorithm has helped illustrate both the detail and the power of the method.

Four differences separate genetic algorithms from more conventional optimization techniques:

1. Direct manipulation of a coding
2. Search from a population, not a single point
3. Search via sampling, a blind search
4. Search using stochastic operators, not deterministic rules

Genetic algorithms manipulate decision or control variable representations at the string level to exploit similarities among high-performance strings. Other methods usually deal with functions and their control variables directly. Because genetic algorithms operate at the coding level, they are difficult to fool even when the function may be difficult for traditional schemes.

Genetic algorithms work from a population; many other methods work from a single point. In this way, GAs find safety in numbers. By maintaining a population of well-adapted sample points, the probability of reaching a false peak is reduced.

Genetic algorithms achieve much of their breadth by ignoring information except that concerning payoff. Other methods rely heavily on such information, and in problems where the necessary information is not available or is difficult to obtain, these other techniques break down. GAs remain general by exploiting information available in any search problem. Genetic algorithms process similarities in the underlying coding together with information ranking the structures according to their survival capability in the current environment. By exploiting such widely available information, GAs may be applied to virtually any problem.

The transition rules of genetic algorithms are stochastic; many other methods have deterministic transition rules. A distinction exists, however, between the randomized operators of genetic algorithms and other methods that are simple random walks. Genetic algorithms use random choice to guide a highly exploitative search. This may seem unusual, using chance to achieve directed results (the best points), but nature is full of precedent.

We have started a more rigorous appraisal of genetic algorithm performance through the concept of schemata or similarity templates. A schema is a string over an extended alphabet, {0, 1, *}, where the 0 and the 1 retain their normal meaning and the * is a wild card or don't care symbol. This notational device greatly simplifies the analysis of the genetic algorithm method because it explicitly recognizes all the possible similarities in a population of strings. We have discussed how building blocks (short, high-performance schemata) are combined to form strings with expected higher performance. This occurs because building blocks are sampled at near optimal rates and recombined via crossover. Mutation has little effect on these building blocks; like an insurance policy, it helps prevent the irrecoverable loss of potentially important genetic material.

The simple genetic algorithm studied in this chapter has much to recommend it. In the next chapter, we will analyze its operation more carefully. Following this, we will implement the simple GA in a short computer program and examine some applications in practical problems.

PROBLEMS

1.1. Consider a black box containing eight multiple-position switches. Switches 1 and 2 may be set in any of 16 positions. Switches 3, 4, and 5 are four-position switches, and switches 6-8 have only two positions. Calculate the number of unique switch settings possible for this black box device.

1.2. For the black box device of Problem 1.1, design a natural string coding that uses eight positions, one position for each switch.
Count the number of switch settings represented by your coding and count the number of schemata or similarity templates inherent in your coding.

1.3. For the black box device of Problem 1.1, design a minimal binary coding for the eight switches and compare the number of schemata in this coding to the number in the coding of Problem 1.2.

1.4. Consider a binary string of length 11, and consider a schema, 1**********. Under crossover with uniform crossover site selection, calculate a lower limit on the probability of this schema surviving crossover. Calculate survival probabilities under the same assumptions for the following schemata: ****10*****, *10********.

1.5. If the distance between the outermost alleles of a particular schema is called its defining length δ, derive an approximate expression for the survival probability of a particular schema of total length l and defining length δ under the operation of simple crossover.

1.6. Six strings have the following fitness function values: 5, 10, 15, 25, 50, 100. Under roulette wheel selection, calculate the expected number of copies of each string in the mating pool if a constant population size, n = 6, is maintained.

1.7. Instead of using roulette wheel selection during reproduction, suppose we define a copy count for each string, ncount_i, as follows: ncount_i = f_i / f̄, where f_i is the fitness of the ith string and f̄ is the average fitness of the population. The copy count is then used to generate the number of members of the mating pool by giving the integer part of ncount_i copies to the ith string and an additional copy with probability equal to the fractional part of ncount_i. For example, with f_i = 100 and f̄ = 80, string i would receive an ncount_i of 1.25, and thus would receive one copy with probability 1.0 and another copy with probability 0.25. Using the string fitness values in Problem 1.6, calculate the expected number of copies for each of the six strings. Calculate the total number of strings expected in the mating pool under this form of reproduction.

1.8. The form of reproduction discussed in Problem 1.7 is sometimes called reproduction with expected number control. In a short essay, explain why this is so. In what ways are roulette wheel selection and expected number control similar? In what ways are they different?

1.9. Suppose the probability of a mutation at a single bit position is 0.1. Calculate the probability of a 10-bit string surviving mutation without change. Calculate the probability of a 20-bit string surviving mutation without change. Recalculate the survival probabilities for both 10- and 20-bit strings when the mutation probability is 0.01.

1.10. Consider the strings and schemata of length 11. For the following schemata, calculate the probability of surviving mutation if the probability of mutation is 0.1 at a single bit position: ***1**0****, 1*********0, ***111*****, 1000010**11. Recalculate the survival probabilities for a mutation probability p_m = 0.01.

COMPUTER ASSIGNMENTS

A. One of the primitive functions required in doing genetic algorithms on a computer is the ability to generate pseudorandom numbers. The numbers are pseudorandom because, as von Neumann once said, "Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." As part of this assignment, go forth and sin some more. Use the random number generator given in Appendix B to create a program where you generate 1000 random numbers between 0 and 1.
Keep track of how many numbers are generated in each of the four quartiles, 0-0.25, 0.25-0.5, 0.5-0.75, 0.75-1.0, and compare the actual counts with the expected number. Is the difference within reasonable limits? How can you quantify whether the difference is reasonable?

B. Suppose you have 10 strings with the following probabilities of selection in the next generation: 0.1, 0.2, 0.05, 0.15, 0.0, 0.11, 0.07, 0.04, 0.00, 0.12, 0.16. Given that these are the only possible alternatives, calculate whether the probabilities are consistent. Write a computer program that simulates roulette wheel selection for these 10 strings. Spin the wheel 1000 times and keep track of the number of selections for each string, comparing this number to the expected number of selections.

C. Write a function that generates a pseudorandom integer between some specified lower limit and some specified upper limit. Test the program by generating 1000 numbers between 3 and 12. Keep track of the quantity of each number selected and compare these figures to the expected quantities.

D. Create a procedure that receives two binary strings and a crossing site value, performs simple crossover, and returns two offspring strings. Test the program by crossing the following strings of length 10: 1011101011, 0000110100. Try crossing site values of -3, 1, 6, and 20.

E. Create a function mutation that complements a particular bit value with specified mutation probability p_m. Test the function by performing 1000 calls to mutation using mutation probabilities p_m = 0.001, 0.01, 0.1. Compare the realized number of mutations to the expected number.

F. Using the simple crossover operator of Assignment D, repeatedly apply the crossover operator to strings contained within the following population of size n = 200 and l = 5:

100 copies of 11100
100 copies of 00011

Perform crossover (p_c = 1.0) for 50 generations without replacement under no selection. Compare the initial and final distributions of strings. Also compare the expected quantity of each string to the realized quantity in generation 50.

Genetic Algorithms Revisited: Mathematical Foundations

The broad brush of Chapter 1 painted an accurate, if somewhat crude, picture of genetic algorithms and their mechanics and power. Perhaps these brush strokes appeal to your own sense of human discovery and search. That somehow a regular though randomized procedure can achieve some of the breadth and intuitive flair of human search seems almost too good to be true. That this discovery procedure should mirror the natural processes that created the species possessing the procedure is a recursion of which Gödel, Escher, or Bach (Hofstadter, 1979) could each have been proud. Despite their intuitive appeal, and despite their symmetry, it is crucial that we back these fuzzy feelings and speculations about genetic algorithms using cold, mathematical facts.

Actually, we have already begun a more rigorous appraisal of GAs. Toward the end of the last chapter, the fundamental concept of a schema or similarity template was introduced. Quantitatively, we found that there are indeed a large number of similarities to exploit in a population of strings. Intuitively, we saw how genetic algorithms exploit in parallel the many similarities contained in building blocks or short, high-performance schemata. In this chapter, we make these observations more rigorous by doing several things.
First, we count the schemata represented within a population of strings and consider which grow and which decay during any given generation. To do this, we consider the effect of reproduction, crossover, and mutation on a particular schema. This analysis leads to the fundamental theorem of genetic algorithms, which quantifies these growth and decay rates more precisely; it also points to the mathematical form of this growth. This form is connected to an important and classical problem of decision theory, the two-armed bandit problem (and its extension, the k-armed bandit). The mathematical similarity between the optimal (minimal loss) solution to the two-armed and k-armed bandit and the equation describing the number of trials given to successive generations of schemata in the simple genetic algorithm is striking. Counting the number of schemata that are usefully processed by the simple genetic algorithm reveals tremendous leverage in the building block processing. Finally, we consider an important question: How do we know that combining building blocks leads to high performance in arbitrary problems? The question sparks our consideration of some relatively new tools of genetic algorithm analysis: schema transforms and the minimal deceptive problem.

WHO SHALL LIVE AND WHO SHALL DIE? THE FUNDAMENTAL THEOREM

The operation of genetic algorithms is remarkably straightforward. After all, we start with a random population of n strings, copy strings with some bias toward the best, mate and partially swap substrings, and mutate an occasional bit value for good measure. Even though genetic algorithms directly manipulate a population of strings in this straightforward manner, in Chapter 1 we started to recognize that this explicit processing of strings really causes the implicit processing of many schemata during each generation. To analyze the growth and decay of the many schemata contained in a population, we need some simple notation to add rigor to the discussion. We consider the operation of reproduction, crossover, and mutation on the schemata contained in the population.

We consider strings, without loss of generality, to be constructed over the binary alphabet V = {0, 1}. As a notational convenience, we refer to strings by capital letters and individual characters by lowercase letters subscripted by their position. For example, the seven-bit string A = 0111000 may be represented symbolically as follows:

    A = a_1 a_2 a_3 a_4 a_5 a_6 a_7.

Here each of the a_i represents a single binary feature or detector (in accordance with the natural analogy, we sometimes call the a_i's genes), where each feature may take on a value 1 or 0 (we sometimes call the a_i values alleles). In the particular string 0111000, a_1 is 0, a_2 is 1, a_3 is 1, and so on. It is also possible to have strings where the detectors are not ordered sequentially as in string A; for example, a string A' could have its detectors arranged in some other, permuted order. A later chapter explores the effect of extending the representation to allow features to be located in a manner independent of their function. For now, assume that a feature's function may be determined by its position.

Meaningful genetic search requires a population of strings, and we consider
a population of individual strings A_j, j = 1, 2, ..., n, contained in the population A(t) at time (or generation) t, where the boldface is used to denote a population.

Besides notation to describe populations, strings, bit positions, and alleles, we need convenient notation to describe the schemata contained in individual strings and populations. Let us consider a schema H taken from the three-letter alphabet V+ = {0, 1, *}. As discussed in the previous chapter, the additional symbol, the asterisk or star *, is a don't care or wild card symbol which matches either a 0 or a 1 at a particular position. For example, consider the length 7 schema H = *11*0**. Note that the string A = 0111000 discussed above is an example of the schema H, because the string alleles match the schema at its fixed positions 2, 3, and 5.

From the results of the last chapter, recall that there are 3^l schemata or similarity templates defined over a binary string of length l. In general, for alphabets of cardinality k there are (k + 1)^l schemata. Furthermore, recall that in a string population with n members there are at most n * 2^l schemata contained in the population, because each string is itself a representative of 2^l schemata. These counting arguments give us some feel for the magnitude of information being processed by genetic algorithms; however, to really understand the important building blocks of future solutions, we need to distinguish between different types of schemata.

All schemata are not created equal. Some are more specific than others. For example, the schema 011*1** is a more definite statement about important similarity than the schema 0******. Furthermore, certain schemata span more of the total string length than others. For example, the schema 1****1* spans a larger portion of the string than the schema 1*1****. To quantify these ideas, we introduce two schema properties: schema order and defining length.

The order of a schema H, denoted by o(H), is simply the number of fixed positions (in a binary alphabet, the number of 1's and 0's) present in the template. In the examples above, the order of the schema 011*1** is 4 (symbolically, o(011*1**) = 4), whereas the order of the schema 0****** is 1.

The defining length of a schema H, denoted by δ(H), is the distance between the first and last specific string position. For example, the schema 011*1** has defining length δ = 4 because the last specific position is 5 and the first specific position is 1, and the distance between them is δ(H) = 5 - 1 = 4. In the other example (the schema 0******), the defining length is particularly easy to calculate. Since there is only a single fixed position, the first and last specific positions are the same, and the defining length δ = 0.
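Both properties reduce to a scan over a schema's fixed positions. The fragment below is an illustrative Python sketch (not from the text) that computes o(H) and δ(H) for the two examples above; it indexes positions from zero, but the difference cancels in the defining length.

    def order(schema):
        # o(H): the number of fixed (non-'*') positions in the template.
        return sum(1 for c in schema if c != '*')

    def defining_length(schema):
        # delta(H): distance between the first and last fixed positions
        # (zero for a schema with a single fixed position).
        fixed = [i for i, c in enumerate(schema) if c != '*']
        return fixed[-1] - fixed[0] if fixed else 0

    print(order('011*1**'), defining_length('011*1**'))   # 4 4
    print(order('0******'), defining_length('0******'))   # 1 0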
Schemata and their properties are interesting notational devices for rigorously discussing and classifying string similarities. More than this, they provide the basic means for analyzing the net effect of reproduction and genetic operators on building blocks contained within the population. Let us consider the individual and combined effect of reproduction, crossover, and mutation on the schemata contained within a population of strings.

The effect of reproduction on the expected number of schemata in the population is particularly easy to determine. Suppose at a given time step t there are m examples of a particular schema H contained within the population A(t), where we write m = m(H, t) (there are possibly different quantities of different schemata H at different times t). During reproduction, a string is copied according to its fitness, or more precisely, a string A_i gets selected with probability p_i = f_i / Σ f_j. After picking a nonoverlapping population of size n with replacement from the population A(t), we expect to have m(H, t + 1) representatives of the schema H in the population at time t + 1, as given by the equation

    m(H, t + 1) = m(H, t) · n · f(H) / Σ f_j,

where f(H) is the average fitness of the strings representing schema H at time t. If we recognize that the average fitness of the entire population may be written as f̄ = Σ f_j / n, then we may rewrite the reproductive schema growth equation as follows:

    m(H, t + 1) = m(H, t) · f(H) / f̄.

In words, a particular schema grows as the ratio of the average fitness of the schema to the average fitness of the population. Put another way, schemata with fitness values above the population average will receive an increasing number of samples in the next generation, while schemata with fitness values below the population average will receive a decreasing number of samples. It is interesting to observe that this expected behavior is carried out with every schema H contained in a particular population A in parallel. In other words, all the schemata in a population grow or decay according to their schema averages under the operation of reproduction alone. In a moment, we examine why this might be a good thing to do. For the time being, simply note that many things go on in parallel with simple operations on the strings in the population.

The effect of reproduction on the number of schemata is qualitatively clear; above-average schemata grow and below-average schemata die off. Can we learn anything else about the mathematical form of this growth (decay) from the schema difference equation? Suppose we assume that a particular schema H remains above average by an amount c·f̄, with c a constant. Under this assumption we can rewrite the schema difference equation as follows:

    m(H, t + 1) = m(H, t) · (f̄ + c·f̄) / f̄ = (1 + c) · m(H, t).

Starting at t = 0 and assuming a stationary value of c, we obtain the equation

    m(H, t) = m(H, 0) · (1 + c)^t.

Business-oriented readers will recognize this equation as the compound interest equation, and mathematically oriented readers will recognize a geometric progression or the discrete analog of an exponential form. The effect of reproduction is now quantitatively clear; reproduction allocates exponentially increasing (decreasing) numbers of trials to above- (below-) average schemata. We will connect this rate of schemata allocation to the multiarmed bandit problem, but for right now we will investigate how crossover and mutation affect this allocation of trials.
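A quick numeric illustration of this geometric growth (the values here are chosen for illustration and are not from the text): a schema that stays 20 percent above average, starting from two copies, would be expected to claim about a dozen copies within ten generations.

    m0, c = 2, 0.2                            # illustrative: initial copies and constant advantage
    for t in (0, 5, 10):
        print(t, round(m0 * (1 + c) ** t, 2))
    # 0 2.0, 5 4.98, 10 12.38: exponentially increasing trials under reproduction alone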
To some extent it is curious that reproduction can allocate exponentially increasing and decreasing numbers of schemata to future generations in parallel; many, many different schemata are sampled in parallel according to the same rule through the use of n simple reproduction operations. On the other hand, reproduction alone does nothing to promote exploration of new regions of the search space, since no new points are searched: if we only copy old structures without change, then how will we ever try anything new? This is where crossover steps in. Crossover is a structured yet randomized information exchange between strings. Crossover creates new structures with a minimum of disruption to the allocation strategy dictated by reproduction alone. This results in exponentially increasing (or decreasing) proportions of many of the schemata contained in the population.

To see which schemata are affected by crossover and which are not, consider a particular string of length l = 7 and two representative schemata within that string:

    A  = 0 1 1 1 0 0 0
    H1 = * 1 * * * * 0
    H2 = * * * 1 0 * *

Clearly the two schemata H1 and H2 are represented in the string A, but to see the effect of crossover on the schemata, we first recall that simple crossover proceeds with the random selection of a mate, the random selection of a crossover site, and the exchange of substrings from the beginning of the string to the crossover site inclusively with the corresponding substring of the chosen mate. Suppose string A has been chosen for mating and crossover. In this string of length 7, suppose we roll a single die to choose the crossing site (there are six sites in a string of length 7). Further suppose that the die turns up a 3, meaning that the cross cut will take place between positions 3 and 4. The effect of this cross on our two schemata H1 and H2 can be seen easily in the following example, where the crossing site has been marked with the separator symbol |:

    A  = 0 1 1 | 1 0 0 0
    H1 = * 1 * | * * * 0
    H2 = * * * | 1 0 * *

Unless string A's mate is identical to A at the fixed positions of the schema (a possibility that we conservatively ignore), the schema H1 will be destroyed because the 1 at position 2 and the 0 at position 7 will be placed in different offspring (they are on opposite sides of the separator symbol marking the cross point, or cut point). It is equally clear that with the same cut point (between bits 3 and 4), schema H2 will survive because the 1 at position 4 and the 0 at position 5 will be carried intact to a single offspring. Although we have used a specific cut point for illustration, it is clear that schema H1 is less likely to survive crossover than schema H2 because on average the cut point is more likely to fall between the extreme fixed positions. To quantify this observation, we note that schema H1 has a defining length of 5. If the crossover site is selected uniformly at random among the l - 1 = 7 - 1 = 6 possible sites, then clearly schema H1 is destroyed with probability p_d = δ(H1)/(l - 1) = 5/6 (it survives with probability p_s = 1 - p_d = 1/6). Similarly, the schema H2 has defining length δ(H2) = 1, and it is destroyed during that one event in six where the cut site is selected to occur between positions 4 and 5, so that p_d = 1/6, or the survival probability is p_s = 1 - p_d = 5/6.

More generally, we see that a lower bound on crossover survival probability p_s can be calculated for any schema. Because a schema survives when the cross site falls outside the defining length, the survival probability under simple crossover is p_s = 1 - δ(H)/(l - 1), since the schema is likely to be disrupted whenever a site within the defining length is selected from the l - 1 possible sites. If crossover is itself performed by random choice, say with probability p_c at a particular mating, the survival probability may be given by the expression

    p_s ≥ 1 - p_c · δ(H)/(l - 1),

which reduces to the earlier expression when p_c = 1.0. The combined effect of reproduction and crossover may now be considered.
As when we considered reproduction alone, we are interested in calculating the number of copies of a particular schema H expected in the next generation. Assuming independence of the reproduction and crossover operations, we obtain the estimate

    m(H, t + 1) ≥ m(H, t) · [f(H)/f̄] · [1 - p_c · δ(H)/(l - 1)].

Comparing this to the previous expression for reproduction alone, the combined effect of crossover and reproduction is obtained by multiplying the expected number of schemata for reproduction alone by the survival probability under crossover p_s. Once again the effect of the operations is clear. Schema H grows or decays depending upon a multiplication factor. With both crossover and reproduction, that factor depends on two things: whether the schema is above or below the population average and whether the schema has relatively short or long defining length. Clearly, those schemata with both above-average observed performance and short defining lengths are going to be sampled at exponentially increasing rates.

The last operator to consider is mutation. Using our previous definition, mutation is the random alteration of a single position with probability p_m. In order for a schema H to survive, all of the specified positions must themselves survive. Therefore, since a single allele survives with probability (1 - p_m), and since each of the mutations is statistically independent, a particular schema survives when each of the o(H) fixed positions within the schema survives. Multiplying the survival probability (1 - p_m) by itself o(H) times, we obtain the probability of surviving mutation, (1 - p_m)^o(H). For small values of p_m (p_m << 1), the schema survival probability may be approximated by the expression 1 - o(H)·p_m. We therefore conclude that a particular schema H receives an expected number of copies in the next generation under reproduction, crossover, and mutation as given by the following equation (ignoring small cross-product terms):

    m(H, t + 1) ≥ m(H, t) · [f(H)/f̄] · [1 - p_c · δ(H)/(l - 1) - o(H) · p_m].

The addition of mutation changes our previous conclusions little. Short, low-order, above-average schemata receive exponentially increasing trials in subsequent generations. This conclusion is important, so important that we give it a special name: the Schema Theorem, or the Fundamental Theorem of Genetic Algorithms. Although the calculations that led us to the schema theorem were not too demanding, the theorem's implications are far reaching and subtle. To see this, we examine the effect of the three-operator genetic algorithm on schemata in a population through another visit to the hand-calculated GA of Chapter 1.
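Numerically, the bound is a one-line computation once f(H), f̄, δ(H), o(H), p_c, and p_m are known. The fragment below is an illustrative Python sketch (not from the text); the example values are those of the hand calculation revisited in the next section (schema 1**** with two copies, schema average 468.5, population average 293).

    def schema_bound(m, f_H, f_bar, delta, o, l, p_c, p_m):
        # Schema theorem lower bound on expected copies in the next generation:
        # m(H, t+1) >= m(H, t) * (f(H)/f_bar) * [1 - p_c*delta(H)/(l-1) - o(H)*p_m]
        return m * (f_H / f_bar) * (1 - p_c * delta / (l - 1) - o * p_m)

    # Schema 1****: m = 2, f(H) = 468.5, f_bar = 293, delta = 0, o = 1, l = 5, p_c = 1.0, p_m = 0.001
    print(schema_bound(2, 468.5, 293, 0, 1, 5, 1.0, 0.001))
    # ~3.19; the reproduction-only figure 2 * 468.5 / 293 = 3.20 is the value used in the text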
SCHEMA PROCESSING AT WORK: AN EXAMPLE BY HAND REVISITED

Chapter 1 demonstrated the mechanics of the simple GA through a hand calculation of a single generation. Let us return to that example, this time observing how the GA processes schemata, not individual strings, within the population. The hand calculation of Chapter 1 is reproduced in Table 2.1; the string-processing columns repeat those of Table 1.2. In addition to the information presented earlier, we also keep a running count of three particular schemata, which we call H1, H2, and H3, where H1 = 1****, H2 = *10**, and H3 = 1***0.

TABLE 2.1  GA Processing of Schemata: Hand Calculations (schema processing before reproduction; string processing as in Table 1.2)

Schema         String Representatives   Schema Average Fitness f(H)
H1 = 1****     2, 4                     468.5
H2 = *10**     2, 3                     320
H3 = 1***0     2                        576

Observe the effect of reproduction, crossover, and mutation on the first schema, H1. During the reproduction phase, the strings are copied probabilistically according to their fitness values. Looking at the first column of the table, we notice that strings 2 and 4 are both representatives of the schema 1****. After reproduction, we note that three copies of the schema have been produced (strings 2, 3, and 4 in the mating pool column). Does this number correspond with the value predicted by the schema theorem? From the schema theorem we expect to have m·f(H)/f̄ copies. Calculating the schema average f(H1), we obtain (576 + 361)/2 = 468.5. Dividing this by the population average f̄ = 293 and multiplying by the number of H1 schemata at time t, m(H1, t) = 2, we obtain the expected number of H1 schemata at time t + 1, m(H1, t + 1) = 2 · 468.5/293 = 3.20. Comparing this to the actual number of schemata (three), we see that we have the correct number of copies. Taking this one step further, we realize that crossover cannot have any further effect on this schema because a defining length δ(H1) = 0 prevents disruption of the single bit. Furthermore, with the mutation rate set at p_m = 0.001 we expect to have m·p_m = 3 · 0.001 = 0.003, or no bits changed within the three schema copies in the three strings. As a result, we observe that for schema H1, we do obtain the expected exponentially increasing number of schemata as predicted by the schema theorem.

So far, so good; but schema H1 with its single fixed bit seems like something of a special case. What about the propagation of important similarities with longer defining lengths? For example, consider the propagation of the schema H2 = *10** and the schema H3 = 1***0. Following reproduction and prior to crossover, the replication of schemata is correct. The case of H2 starts with two examples in the initial population and ends with two copies following reproduction. This agrees with the expected number of copies, m(H2) = 2 · 320/293 = 2.18, where 320 is the schema average and 293 is the population average fitness. The case of H3 starts with a single example (string 2) and ends with two copies following reproduction (strings 2 and 3 in the string copies column). This agrees with the expected number of copies, m(H3) = 1 · 576/293 = 1.97, where 576 is the schema's average fitness and 293 is the population's average fitness. The circumstances following crossover are a good bit different. Notice that for the short schema, schema H2, the two copies are maintained even though crossover has
