
Adaptation in Natural and Artificial Systems

Genetic algorithms are playing an increasingly important role in studies of complex adaptive systems, ranging from adaptive agents in economic theory to the use of machine learning techniques in the design of complex devices such as aircraft turbines and integrated circuits. Adaptation in Natural and Artificial Systems is the book that initiated this field of study, presenting the theoretical foundations and exploring applications. In its most familiar form, adaptation is a biological process, whereby organisms evolve by rearranging genetic material to survive in environments confronting them. In this now classic work, Holland presents a mathematical model that allows for the nonlinearity of such complex interactions. He demonstrates the model's universality by applying it to economics, physiological psychology, game theory, and artificial intelligence and then outlines the way in which this approach modifies the traditional views of mathematical genetics. Initially applying his concepts to simply defined artificial systems with limited numbers of parameters, Holland goes on to explore their use in the study of a wide range of complex, naturally occurring processes, concentrating on systems having multiple factors that interact in nonlinear ways. Along the way he accounts for major effects of coadaptation and coevolution: the emergence of building blocks, or schemata, that are recombined and passed on to succeeding generations to provide innovations and improvements.



This excerpt from Adaptation in Natural and Artificial Systems, John H. Holland, 1992, The MIT Press, is provided in screen-viewable form for personal use only by members of MIT CogNet. Unauthorized use or dissemination of this information is expressly forbidden. If you have any questions about this material, please contact [email protected].

Preface to the 1992 Edition

When this book was originally published I was very optimistic, envisioning extensive reviews and a kind of "best seller" in the realm of monographs. Alas! That did not happen. After five years I did regain some optimism because the book did not "die," as is usual with monographs, but kept on selling at 100-200 copies a year. Still, research in the area was confined almost entirely to my students and their colleagues, and it did not fit into anyone's categories. "It is certainly not a part of artificial intelligence" and "Why would somebody study learning by imitating a process that takes billions of years?" are typical of comments made by those less inclined to look at the work.
Five more years saw the beginnings of a rapid increase in interest. Partly, this interest resulted from a change of focus in artificial intelligence. Learning, after several decades at the periphery of artificial intelligence, was again regarded as pivotal in the study of intelligence. A more important factor, I think, was an increasing recognition that genetic algorithms provide a tool in areas that do not yield readily to standard approaches. Comparative studies began to appear, pointing up the usefulness of genetic algorithms in areas ranging from the design of integrated circuits and communication networks to the design of stock market portfolios and aircraft turbines. Finally, and quite important for future studies, genetic algorithms began to be seen as a theoretical tool for investigating the phenomena generated by complex adaptive systems, a collective designation for nonlinear systems defined by the interaction of large numbers of adaptive agents (economies, political systems, ecologies, immune systems, developing embryos, brains, and the like).
The last five years have seen the number of researchers studying genetic algorithms increase from dozens to hundreds. There are two recent innovations that will strongly affect these studies. The first is the increasing availability of massively parallel machines. Genetic algorithms work with populations, so they are intrinsically suited to execution on computers with large numbers of processors, using a processor for each individual in the population. The second innovation is a unique interdisciplinary consortium, the Santa Fe Institute, dedicated to the study of complex adaptive systems. The Santa Fe Institute, by providing a focus for intensive interactions among its collection of Nobel Laureates, MacArthur Fellows, Old and Young Turks, and bright young postdocs, has already made a substantial impact in the field of economics. Current work emanating from the Institute promises similar effects in fields ranging from studies of the immune system to studies of new approaches to cognitive science. The future for studies of adaptive systems looks bright indeed.
Fifteen years should provide perspective and a certain detachment. Despite that, or because of it, I still find the 1975 preface surprisingly relevant. About the only change I would make would be to put more emphasis on improvement and less on optimization. Work on the more complex adaptive systems (ecologies, for example) has convinced me that their behavior is not well described by the trajectories around global optima. Even when a relevant global optimum can be defined, the system is typically so far away from that optimum that "basins of attraction," fixed points, and the other apparatus used in studying optima tell little about the system's behavior. Instead, competition between components of the system, aimed at getting "an edge" over neighboring competitors, determines the aggregate behavior. In all other respects, I would hold to the points made in the earlier preface.
There are changes in emphasis reflected by two changes in terminology since 1975. Soon after the book was published, doctoral students in Ann Arbor began using the term genetic algorithm in place of genetic plan, emphasizing the centrality of computation in defining and implementing the plans. More recently, I've advocated "implicit parallelism" over "intrinsic parallelism" to distinguish the implicit workings of the algorithm, via schemata, from the parallel processing of the populations used by the algorithm.
As a way of detailing some more recent ideas and research, I've added a new chapter, chapter 10, to this edition. In part, this chapter concerns itself with further work on the advanced questions posed in section 9.3 of the previous edition. Questions concerning the design of systems that build experience-based, hierarchical models of their environments are addressed in section 10.1 of the new chapter. Questions concerning speciation and the evolution of ecologies are addressed in terms of the Echo models in section 10.3. The Echo models, besides being concerned with computer-based gedanken experiments on these questions, have a broader purpose. They are designed to facilitate investigation of mechanisms, such as competition and trading, found in a wide range of complex adaptive systems. In addition to these discussions, the new chapter also includes, in section 10.2, some corrections to the original edition. Section 10.4 concludes the chapter with a new set of advanced questions and some new speculations.
There is more recent work, still in its earliest stages, that is not discussed in chapter 10. Freddy Christiansen, Marc Feldman, and I, working through the Santa Fe Institute, have begun to introduce the effects of schemata into much-generalized versions of Fisher's equations. This work is, in part, a follow-up of work of a decade ago, by Bob Axelrod and Bill Hamilton, that began to study the relation of recombination to the prevalence of sex in spite of the two-fold genetic load it incurs. In another direction, some preliminary theoretical investigations, stimulated by the Echo models, suggest that there is a schema theorem that is relevant to any adaptive system that can be described in terms of resource flow; such a system may involve neither reproduction nor defined fitness functions. Also in the wings is a characterization of a broad class of problems or "landscapes" that are relatively easy for genetic algorithms but difficult for both traditional optimization techniques and new weight-changing techniques such as artificial nerve nets and simulated annealing.

At a metalevel, the problem landscapes we've been studying may describe an essential aspect of all problems encountered by complex adaptive systems: Easy (linear, hill-climbing) problems have a short-lived influence on complex adaptive systems because they are quickly exploited and absorbed into the system's structure. Extremely difficult problems ("spikes") almost never influence the behavior because they are almost never solved. This leaves as a major continuing influence the presence of certain kinds of "bottlenecks." These bottlenecks are regions in the problem space that offer improvement but are surrounded by "valleys" of lowered performance. The time it takes to traverse these valleys determines the trajectory, and rate of improvement, of the adaptive system. It seems likely that this rate will be determined, to a great degree, by recombination applied to building blocks (schemata) supplied by solutions attached to other regions of high performance.
It is an exciting time to study adaptation in natural and artificial systems; perhaps these studies will yield another edition sometime in the next millennium.

JOHN H. HOLLAND
OCTOBER 1991


Preface

The first technical descriptions and definitions of adaptation come from biology. In that context adaptation designates any process whereby a structure is progressively modified to give better performance in its environment. The structures may range from a protein molecule to a horse's foot or a human brain or, even, to an interacting group of organisms such as the wildlife of the African veldt. Defined more generally, adaptive processes have a critical role in fields as diverse as psychology ("learning"), economics ("optimal planning"), control, artificial intelligence, computational mathematics and sampling ("statistical inference"). Basically, adaptive processes are optimization processes, but it is difficult to subject them to unified study because the structures being modified are complex and their performance is uncertain. Frequently nonadditive interaction (i.e., "epistasis" or "nonlinearity") makes it impossible to determine the performance of a structure from a study of its isolated parts. Moreover possibilities for improved performance must usually be exploited at the same time that the search for further improvements is pressed. While these difficulties pose a real problem for the analyst, we know that they are routinely handled by biological adaptive processes, qua processes.
The approach of this book is to set up a mathematical framework which makes it possible to extract and generalize critical factors of the biological processes. Two of the most important generalizations are: (1) the concept of a schema as a generalization of an interacting, coadapted set of genes, and (2) the generalization of genetic operators such as crossing-over, inversion, and mutation. The schema concept makes it possible to dissect and analyze complex "nonlinear" or "epistatic" interactions, while the generalized genetic operators extend the analysis to studies of learning, optimal planning, etc. The possibility of "intrinsic parallelism" (the testing of many schemata by testing a single structure) is a direct offshoot of this approach. The book develops an extensive study of intrinsically parallel processes and illustrates their uses over the full range of adaptive processes, both as hypotheses and as algorithms.
The book is written on the assumption that the reader has a familiarity with probability and combinatorics at the level of a first course in finite mathematical structures, plus enough familiarity with the concept of a system to make the notion of "state" a comfortable working tool. Readers so prepared should probably read the book in the given order, moving rapidly (on the first reading) over any example or proof offering more than minor difficulties. A good deal of meaning can still be extracted by those with less mathematics if they are willing to abide the notation, treating the symbols (with the help of the Glossary) as abbreviations for familiar intuitive concepts. For such a reading I would recommend chapter 1 (skipping over section 1.3), the first part of chapter 4 and the summary at the end, the discussions throughout chapter 6 (particularly section 6.6), section 7.5, and most of chapter 9, combined with use of the Index to locate familiar examples and topics. The reader whose first interest is the mathematical development (exclusive of applications) will find section 2.2, chapters 4 and 5, sections 6.2, 6.3, 6.4, 7.1, 7.2, 7.3, 7.5, and 9.1 the core of the book. By a judicious use of the Glossary and Index it should be possible for a well-trained system scientist to tackle this part of the book directly. (This is not a procedure I would recommend except as a way of getting a mathematical overview before further reading; in a book of this sort the examples have a particularly important role in establishing the meaning of the formalism.)
The pattern of this book, as the reader sees it now, only distantly resembles the one projected at its inception. The first serious writing began almost seven years ago at Pohoiki on the Big Island under the kamaaina hospitality of Carolyn and Gilbert Hay. No book could start in a finer setting. Since that time whole chapters, including chapters on hierarchies, the Kuhn-Tucker fixed point theorem, and cellular automata, have come and gone, a vital ρ emerged, blossomed and disappeared, 2-armed bandits arrived, and so on. At this remove it would be about as difficult to chronicle those changes as to acknowledge properly the people who have influenced the book along the way. Arthur Burks stands first among those who provided the research setting and encouragement which made the book feasible; Michael Arbib's comments on a near-final draft stand as the culmination of readings, written comments, commentaries, and remarks by more than a hundred students and colleagues; and Monna Whipp's perseverance through the typing of the final draft and revised revisions of changes brings to fruition the tedious work of her predecessors. For the rest, I cannot conceive that appearance in a long list of names is a suitable reward, but I also cannot conceive a good alternative (beyond personal expression), so they remain anonymous and bereft of formal gratitude beyond some appearances in the references. They deserve better.

JOHN H. HOLLAND


1. The General Setting

1. INTRODUCTION

How does evolution produce increasingly fit organisms in environments which are highly uncertain for individual organisms?

What kinds of economic plan can upgrade an economy's performance in spite of the fact that relevant economic data and utility measures must be obtained as the economy develops?

How does an organism use its experience to modify its behavior in beneficial ways (i.e., how does it "learn" or "adapt under sensory guidance")?

How can computers be programmed so that problem-solving capabilities are built up by specifying "what is to be done" rather than "how to do it"?

What control procedures can improve the efficiency of an ongoing process, when details of changing component interactions must be compiled and used concurrently?

Though these questions come from very different areas, it is striking how much they have in common. Each involves a problem of optimization made difficult by substantial complexity and uncertainty. The complexity makes discovery of the optimum a long, perhaps never-to-be-completed task, so the best among tested options must be exploited at every step. At the same time, the uncertainties must be reduced rapidly, so that knowledge of available options increases rapidly. More succinctly, information must be exploited as acquired so that performance improves apace. Problems with these characteristics are even more pervasive than the questions above would indicate. They occur at critical points in fields as diverse as evolution, ecology, psychology, economic planning, control, artificial intelligence, computational mathematics, sampling, and inference.

There is no collective name for such problems, but whenever the term adaptation (ad + aptare, to fit to) appears it consistently singles out the problems of interest. In this book the meaning of "adaptation" will be extended to encompass the entire collection of problems. (Among rigorous studies of adaptation, Tsypkin's [1971] usage comes closest to this in breadth, but he deliberately focuses on the man-made systems.) This extension, if taken seriously, entails a commitment to view adaptation as a fundamental process, appearing in a variety of guises but subject to unified study. Even at the outset there is a powerful warrant for this view. It comes from the observation that all variations of the problem give rise to the same fundamental questions.
To what parts of its environment is the organism (system, organization) adapting?
How does the environment act upon the adapting organism (system, organization)?
What structures are undergoing adaptation?
What are the mechanisms of adaptation?
What part of the history of its interaction with the environment does the organism (system, organization) retain?
What limits are there to the adaptive process?
How are different (hypotheses about) adaptive processes to be compared?

Moreover, as we attempt to answer these questions in different contexts, essentially the same obstacles to adaptation appear again and again. They appear with different guises and names, but they have the same basic structure. For example, "nonlinearity," "false peak," and "epistatic effect" all designate versions of the same difficulty. In the next section we will look more closely at these obstacles; for now let it be noted that the study of adaptation is deeply concerned with the means of overcoming these obstacles.

Despite a wealth of data from many different fields and despite many insights, we are still a long way from a general understanding of adaptive mechanisms. The situation is much like that in the old tale of blind men examining an elephant: different aspects of adaptation acquire different emphases because of the points of contact. A specific feature will be prominent in one study, obscure in another. Useful and suggestive results remain in comparative isolation. Under such circumstances theory can be a powerful aid. Successful analysis separates incidental or "local" exaggerations from fundamental features. A broadly conceived analytic theory brings data and explanation into a coherent whole, providing opportunities for prediction and control. Indeed there is an important sense in which a good theory defines the objects with which it deals. It reveals their interactions, the methods of transforming and controlling them, and predictions of what will happen to them.

Theory will have a central role in all that follows, but only insofar as it illuminates practice. For natural systems, this means that theory must provide techniques for prediction and control; for artificial systems, it must provide practical algorithms and strategies. Theory should help us to know more of the mechanisms of adaptation and of the conditions under which new adaptations arise. It should enable us to better understand the processes whereby an initially unorganized system acquires increasing self-control in complex environments. It should suggest procedures whereby actions acquired in one set of circumstances can be transferred to new circumstances. In short, theory should provide us with means of prediction and control not directly suggested by compilations of data or simple tinkering. The development here will be guided accordingly.

The fundamental questions listed above can serve as a starting point for a unified theory of adaptation, but the informal phrasing is a source of difficulty. With the given phrasing it is difficult to conceive of answers which would apply unambiguously to the full range of problems. Our first task, then, is to rephrase the questions in a way which avoids ambiguity and encourages generality. We can avoid ambiguity by giving precise definitions to the terms appearing in the questions, and we can assure the desired generality if the terms are defined by embedding them in a common formal framework. Working within such a framework we can proceed with theoretical constructions which are of real help in answering the questions. This, in broad outline, is the approach we will take.
2. PRELIMINARY SURVEY

(Since we are operating outside of a formal framework in this chapter, some of the statements which follow will be susceptible of different, possibly conflicting interpretations. Precise versions will be formulated later.)

Just what are adaptation's salient features? We can see at once that adaptation, whatever its context, involves a progressive modification of some structure or structures. These structures constitute the grist of the adaptive process, being largely determined by the field of study. Careful observation of successive structural modifications generally reveals a basic set of structural modifiers or operators; repeated action of these operators yields the observed modification sequences. Table 1 presents a list of some typical operators along with the associated structures for several fields of interest.
Table 1: Typical Structures and Operators

Field                        Structures         Operators
Genetics                     chromosomes        mutation, recombination, etc.
Economic planning            mixes of goods     production activities
Control                      policies           Bayes's rule, successive approximation, etc.
Physiological psychology     cell assemblies    synapse modification
Game theory                  strategies         rules for iterative approximation of optimal strategy
Artificial intelligence      programs           "learning rules"

A system undergoing adaptation is largely characterized by the mixture of operators acting on the structures at each stage. The set of factors controlling this changing mixture, the adaptive plan, constitutes the works of the system as far as its adaptive character is concerned. The adaptive plan determines just what structures arise in response to the environment, and the set of structures attainable
by applying all possible operator sequences marks out the limits of the adaptive plan's domain of action. Since a given structure performs differently in different environments (the structure is more or less fit), it is the adaptive plan's task to produce structures which perform well (are "fit") in the environment confronting it. "Adaptations" to the environment are persistent properties of the sequence of structures generated by the adaptive plan.

A precise statement of the adaptive plan's task serves as a key to uniform treatment. Three major components are associated in the task statement: (1) the environment, E, of the system undergoing adaptation, (2) the adaptive plan, τ, whereby the system's structure is modified to effect improvements, and (3) a measure, μ_E, of performance, i.e., the fitness of the structures for the environment. (The formal framework developed in chapter 2 is built around these three components.) The crux of the problem for the plan τ is that initially it has incomplete information about which structures are most fit. To reduce this uncertainty the plan must test the performance of different structures in the environment. The "adaptiveness" of the plan enters when different environments cause different sequences of structures to be generated and tested.
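The triple (E, τ, μ_E) can be summarized in a small interface. The sketch below is purely illustrative and is not Holland's formalism (which is developed in chapter 2); the names Environment, AdaptivePlan, mu, and propose are invented for the sketch.

from typing import Iterable, Protocol, Tuple

class Environment(Protocol):
    """One environment E drawn from the class of possible environments."""
    def mu(self, structure: object) -> float:
        """Performance measure mu_E: how fit the structure is in this E."""

class AdaptivePlan(Protocol):
    """A plan tau: chooses the next structure to test, given past outcomes."""
    def propose(self, history: Iterable[Tuple[object, float]]) -> object:
        """Return the next structure to test, conditioned on the
        (structure, performance) pairs observed so far."""

A plan that ignores the history (an enumerative plan) and one that conditions on it (such as the error-correction plan described in the next section) both fit this signature; only the rule for proposing the next structure differs.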
In more detail and somewhat more formally: A characteristic of the environment can be unknown (from the adaptive plan's point of view) only if alternative outcomes of the plan's tests are allowed for. Each distinct combination of alternatives is a distinct environment E in which the plan may have to act. The set of all possible combinations of alternatives indicates the plan's initial uncertainty about the environment confronting it, the range of environments in which the plan should be able to act. This initial uncertainty about the environment will be formalized by designating a class ℰ of possible environments. The domain of action of the adaptive plan will be formalized by designating a set 𝓐 of attainable structures. The fact that different E ∈ ℰ in general elicit different performances from a given structure A ∈ 𝓐 means formally that there will be a different performance measure μ_E associated with each E. Each field of study is typified as much by its performance measures as by its structures and operators. For the fields mentioned in connection with examples of structures and operators, we have a corresponding list of performance measures:

Field                        Performance Measure
Genetics                     fitness
Economic planning            utility
Control                      error functions
Physiological psychology     performance rate (in some contexts, but often unspecified)
Game theory                  payoff
Artificial intelligence      comparative efficiency (if specified at all)

The successive structural modifications dictated by a plan τ amount to a sequence or trajectory through the set 𝓐. For the plan to be adaptive the trajectory through 𝓐 must depend upon which environment E ∈ ℰ is present. Symbolizing the set of operators by Ω, this last can be stated another way by saying that the order of application of operators from Ω must depend upon E.

It is clear that the organization of 𝓐, the effects of the operators Ω upon structures in 𝓐, and the form of the performance measure μ_E all affect the difficulty of adaptation. Among the specific obstacles confronting an adaptive plan are the following:

1. 𝓐 is large, so that there are many alternatives to be tested.

2. The structures in 𝓐 are complicated, so that it is difficult to determine which substructures or components (if any) are responsible for good performance.

3. The performance measure μ_E is a complicated function with many interdependent parameters (e.g., it has many dimensions and is nonlinear, exhibiting local optima, discontinuities, etc.).

4. The performance measure varies over time and space, so that given adaptations are only advantageous at certain places and times.

5. The environment E presents to τ a great flux of information (including performances) which must be filtered and sorted for relevance.
By describing these obstacles, and the adaptive plans meant to overcome them,
within a general framework, we open the possibility of discovering plans useful
in any situation requiring adaptation.
Before going further let us flesh out these abstractions by using them in the description of two distinct adaptive systems, one simple and artificial, the other complex and natural.

3. A SIMPLE ARTIFICIAL ADAPTIVE SYSTEM

The artificial adaptive system of this example is a pattern recognition device. (The device to be described has very limited capabilities; while this is important in applications, it does not detract from the device's usefulness as an illustration.) The information to be fed to the adaptive device is preprocessed by a rectangular array of sensors, a units high by b units wide. Each sensor is a threshold device which is activated when the light falling upon it exceeds a fixed threshold. Thus, when a "scene" is presented to the sensor array at some time t, each individual sensor is either "on" or "off" depending upon the amount of light reaching it. Let the activity of the ith sensor, i = 1, 2, . . . , ab, at time t be represented formally by the function δ_i(t), where δ_i(t) = 1 if the sensor is "on" and δ_i(t) = 0 if it is "off." A given scene thus gives rise to a configuration of ab "ones" and "zeros." All told there are 2^ab possible configurations of sensor activation; let C designate this set of possible configurations. It will be assumed that a particular subset C_1 of C corresponds to (instances of) the pattern to be recognized. The particular subset involved, among the 2^(2^ab) possible, will be unknown to the adaptive device. (E.g., C_1 might consist of all configurations containing a connected X-shaped array of ones, or it might consist of all configurations containing as many ones as zeros, or it might be any one of the other 2^(2^ab) possible subsets of C.) This very large set of possibilities constitutes the class of possible environments ℰ; it is the set of alternatives the adaptive plan must be prepared to handle. The adaptive device's task is to discover or "learn" which element of ℰ is in force by learning what configurations belong to C_1. Then, when an arbitrary configuration is presented, the device can reliably indicate whether the configuration belongs to C_1, thereby detecting an instance of the pattern.

[Figure 1. A simple pattern recognizer. A scene falls on the a x b array of threshold sensors; the sensor outputs δ_i(t) are weighted, summed by a linear threshold device, and compared with the threshold. The scene shown is classified as belonging to C+ because the weighted sum Σ_i w_i δ_i(t) exceeds the threshold.]
The particular pattern recognition device considered here, a linear threshold device, processes the input signals δ_i(t) by first multiplying each one by some weight w_i and then summing them to yield Σ_{i=1}^{ab} w_i δ_i(t). When this sum exceeds a given fixed threshold K the input configuration will be said to be a member of the set C+, otherwise a member of the set C-. (It should be clear that C+ ∪ C- = C and that C+ ∩ C- is empty, so that the linear threshold device partitions C into two classes.) More precisely, C+ is supposed to be an approximation to C_1, so that when the sum exceeds the fixed threshold K, the device indicates (rightly or wrongly) that the input configuration is an instance of the pattern. The object of the adaptive plan, then, is to discover as rapidly as possible a set of weights for which the partition (C+, C-) approximates the partition (C_1, C_0), so that C+ ≈ C_1 and C- ≈ C_0. (This device, as noted earlier, is quite limited; there are many partitions (C_1, C_0) that can only be poorly approximated by (C+, C-), no matter what set of weights is chosen.) Now, let W = {v_1, v_2, . . . , v_k} be the set of possible values for the weights w_i; that is, each w_i ∈ W, i = 1, . . . , ab. Thus, with a fixed threshold K, the set of attainable structures is the set of all ab-tuples, W^ab.

The natural performance measure, μ_E, relative to any particular partition E ∈ ℰ is the proportion of all configurations correctly assigned (to C_1 and C_0). That is, μ_E maps each ab-tuple into the fraction of correct recognitions achieved thereby, a number in the interval [0, 1]: μ_E: W^ab → [0, 1]. (In this example the outcome of each test, "configuration correctly classified" or "configuration incorrectly classified," will be treated as the plan's input. The same ab-tuple may have to be tested repeatedly to establish an estimate of its performance.)

A simple plan τ_0 for discovering the best set of weights in W^ab is to try various ab-tuples, either in some predetermined order or at random, estimating the performance of each in its turn; the best ab-tuple encountered up to a given point in time is saved for comparison with later trials, this "best-to-date" ab-tuple being replaced immediately by any better ab-tuple encountered in a later trial. It should be clear that this procedure must eventually uncover the "best" ab-tuple in W^ab. But note that even for k = 10 and a = b = 10, W^ab has 10^100 elements. This is a poor augury for any plan which must exhaustively search W^ab. And that is exactly what the plan just described must undertake, since the outcome of earlier tests in no way affects the ordering of later tests.
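As a concrete illustration of τ_0, here is a minimal sketch in Python. It is not from the book: the tiny array size, the weight grid, the threshold, and the stand-in pattern ("scenes with at least two sensors on") are assumptions chosen so that the exhaustive search actually finishes.

import itertools

a, b, k = 2, 2, 3                          # tiny sensor array and weight grid, for illustration only
W = [float(v) for v in range(k)]           # admissible weight values v_1 < v_2 < v_3
K = 2.0                                    # fixed threshold

def classify(weights, scene):
    """Linear threshold device: scene is put in C+ iff sum_i w_i * delta_i(t) > K."""
    return sum(w * d for w, d in zip(weights, scene)) > K

def mu(weights, labelled_scenes):
    """Performance measure mu_E: fraction of configurations assigned correctly."""
    hits = sum(classify(weights, s) == in_C1 for s, in_C1 in labelled_scenes)
    return hits / len(labelled_scenes)

# Stand-in environment E (an assumption): the pattern C_1 is "at least two sensors on".
scenes = list(itertools.product([0, 1], repeat=a * b))
labelled = [(s, sum(s) >= 2) for s in scenes]

# Plan tau_0: test every ab-tuple of weights, keeping the best-to-date.
# The outcome of earlier tests never changes the order of later tests.
best, best_mu = None, -1.0
for weights in itertools.product(W, repeat=a * b):     # k**(a*b) candidates in all
    score = mu(weights, labelled)
    if score > best_mu:
        best, best_mu = weights, score
print("best weights:", best, "mu_E:", best_mu)

Even in this toy setting the loop visits k^(ab) = 81 weight vectors; with k = 10 and a = b = 10 the same loop would have to visit 10^100 of them.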
Let's look at a (fairly standard) plan τ_00 which does use the outcome of each test to help determine the next structure for testing. The basic idea of this plan is to change the weights whenever a presentation is misassigned so as to decrease the likelihood of similar misassignments in the future. In detail: Let the values in W be ordered in increasing magnitude so that v_{j+1} > v_j, j = 1, 2, . . . , k - 1 (for instance, the weights might be located at uniform intervals so that v_{j+1} = v_j + Δ). Then the algorithm proceeds according to the following prescription (a code sketch follows the two rules):

1. If the presentation at time t is assigned to C_0 when it should have been assigned to C_1 then, for each i such that δ_i(t) = 1, replace the corresponding weight by the next highest weight (in the case of uniform intervals the new weight would be the old weight w_i increased by Δ, w_i + Δ). Leave the other weights unchanged.

2. If the presentation at time t is assigned to C_1 instead of C_0 then, for each i such that δ_i(t) = 1, replace the corresponding weight by the next lowest weight (for uniform intervals, the new weight is w_i - Δ).
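A sketch of τ_00, continuing the toy setup of the previous sketch (the array size, weight set W, threshold K, labelled scenes, and the function mu defined there), is given below. It is illustrative only: clamping at the ends of W, the starting weights, and the presentation order are assumptions, not part of the prescription above.

def tau_00_step(weights, scene, in_C1, W, K, delta):
    """One step of plan tau_00: change weights only on a misassigned presentation."""
    assigned_to_C1 = sum(w * d for w, d in zip(weights, scene)) > K
    if assigned_to_C1 == in_C1:
        return weights                                  # correctly assigned: no change
    step = delta if in_C1 else -delta                   # rule 1 raises weights, rule 2 lowers them
    lo, hi = min(W), max(W)                             # clamp at the ends of W (an assumption,
    return [min(hi, max(lo, w + step)) if d == 1 else w # since there is no "next" weight there)
            for w, d in zip(weights, scene)]

# Hypothetical run: start from the middle weight value and present the scenes repeatedly.
weights = [W[len(W) // 2]] * (a * b)
for scene, in_C1 in labelled * 5:
    weights = tau_00_step(weights, scene, in_C1, W, K, delta=1.0)
print("weights after training:", weights, "mu_E:", mu(weights, labelled))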
We cannot yet fruitfully discuss the merits of this plan in comparison to alternatives; we can only note that the order in which τ_00 tests the elements of 𝓐 does indeed depend upon the information it receives. That is, the trajectory through 𝓐 = W^ab is conditional on the outcomes μ_E(A), A ∈ 𝓐, of prior tests.
4. A COMPLEX NATURAL ADAPTIVE SYSTEM

Here we will look at biological adaptation via changes in genetic makeup, the first of a series of progressively more detailed examinations. This section will present only biological facts directly relevant to adaptation, with a caveat to the reader about the dangers of unintentional emphasis and oversimplification inherent in such a partial picture.

It is a familiar fact (but one we will delve into later) that every organism is an amalgam of characteristics determined by the genes in its chromosomes. Each gene has several forms or alternatives (alleles) producing differences in the set of characteristics associated with that gene. (E.g., certain strains of garden pea have a single gene which determines blossom color, one allele causing the blossom to be white, the other pink; bread mold has a gene which in normal form causes synthesis of vitamin B1, but several mutant alleles of the gene are deficient in this ability; human sickle cell anemia results from an abnormal allele of one of the genes determining the structure of hemoglobin; interestingly enough, in environments where malaria is endemic, the abnormal allele can confer an advantage.) There are tens of thousands of genes in the chromosomes of a typical vertebrate, each of which (on the evidence available) has several alleles. Taking the set of attainable structures 𝓐 to be the set of chromosomes obtained by making all possible combinations of alleles, we see that 𝓐 contains on the order of 2^10,000 ≈ 10^3000 structures for a typical vertebrate species (assuming 2 alleles for each of 10,000 genes). Even a very large population, say 10 billion individuals of that species, contains only a minuscule fraction of the possibilities.
The enormous number of possible genetic structures (genotypes) for a single vertebrate species is an indicator of the complexity of such systems, but it is only an indicator. The basic complexity of these systems comes from the interactions of the genes. To see just how extensive these interactions are, it is worth looking briefly at some of the related biochemistry. Without going into detail, different alleles of the same gene produce related proteins, which in turn produce the variations in expressed characteristics associated with that gene. Typically these proteins (or combinations of them) are powerful biological catalysts called enzymes, capable of modifying reaction rates by factors of 10,000 and more. For this reason, genes exercise extensive control over the ongoing reactions in a cell; the enzymes they produce modulate ongoing reactions so strongly that they are the major determinants of the cell's form. Moreover, the products of any given enzyme-controlled reaction may, and generally do, enter into several subsequent reactions. Thus the effects of changes in a single enzyme are often widespread, causing gross changes in cell form and function. The human hereditary disorder called phenylketonuria results from an (undesirable) allele of a single gene; the presence of this allele has pronounced effects upon a whole battery of characteristics ranging from hair color and head size through intelligence. It is equally true that several genes may jointly determine a given characteristic, e.g., eye color in humans.

All of this adds considerably to the complexity of the system, but the greatest complexities come about because the effects of different enzymes are not additive, a phenomenon known as epistasis. For example, if a sequence of reactions depends upon several enzymes, for practical purposes the sequence does not proceed at all until all of the enzymes are present; subtraction of one enzyme stops the reaction completely. More complicated reactions involving positive and negative feedback are common, particularly those in which the output of a reaction sequence is a catalyst or inhibitor for some intermediate step of the reaction. The main point is that the effect of each allele depends strongly upon what other alleles are present and small changes can often produce large effects. The amalgam of observed characteristics, the phenotype, depends strongly upon these epistatic effects.
Because of epistasis there is no simple way to apportion credit to individual alleles for the performance of the resulting phenotype. What may be a good allele when coordinated with an appropriate set of alleles for other genes can be disastrous in a different genetic context. Thus adaptation cannot be accomplished by selecting among the alleles for one gene independently of what alleles appear for other genes. The problem is like the problem of adjusting the "height," "vertical linearity," and "vertical hold" controls on a television set. A "best setting" for "height," ignoring the settings of the other two controls, will be destroyed as soon as one attempts to better the setting of either of the other two controls. The problem is vexing enough when there are three interdependent controls, as anyone who has attempted these adjustments can testify, but it pales in comparison to the genetic case where dozens or hundreds of interdependent alleles can be involved. Roughly, the difficulty of the problem increases by an order of magnitude for each additional gene when the interdependencies are intricate (but see the discussions in chapter 4 and pp. 160-61).
Given the pervasiveness of epistasis, adaptation via changes in genetic makeup becomes primarily a search for coadapted sets of alleles, alleles of different genes which together significantly augment the performance of the corresponding phenotype. (In chapter 4 the concept of a coadapted set of alleles will be generalized, under the term schema, to the point where it applies to the full range of adaptive systems.) It should be clear that coadaptation depends strongly upon the environment of the phenotype. The large coadapted set of alleles which produces gills in fish augments performance only in aquatic environments. This dependence of coadaptation upon characteristics of the environment gives rise to the notion of an environmental niche, taken here to mean a set of features of the environment which can be exploited by an appropriate organization of the phenotype. (This is a broader interpretation than the usual one, which limits niche to those environmental features particularly exploited by a given species.) Examples of environmental niches fitting this interpretation are: (i) an oxygen-poor, sulfur-rich environment such as is found at the bottom of ponds with large amounts of decaying matter; a class of anaerobic bacteria, the thiobacilli, exploits this niche by means of a complex of enzymes enabling them to use sulfur in place of oxygen to carry out oxidation; (ii) the "bee-rich" environment exploited by the orchid Ophrys apifera, which has a flower mimicking the bee closely enough to induce pollination via attempted copulation by the male bees; (iii) the environment rich in atmospheric vibrations in the frequency range of 50 to 50,000 cycles per second; the bones of the mammalian ear are a particular adaptation of parts of the reptilian jaw which aids in the detection of these vibrations, an adaptation which clearly must be coordinated with many other adaptations, including a sophisticated information-processing network, before it can improve an organism's chances of survival. It is important to note that quite distinct coadapted sets of alleles can exploit the same environmental niche. Thus, the eye of aquatic mammals and the (functionally similar) eye of the octopus exploit the same environmental niche, but are due to coadapted sets of alleles of entirely unrelated sets of genes.
The various environmental niches E ∈ ℰ define different opportunities for adaptation open to the genetic system. To exploit these opportunities the genetic system must select and use the sets of coadapted alleles which produce the appropriate phenotypic characteristics. The central question for genetic systems is: How are initially unsuited structures transformed to (an observed range of) structures suited to a variety of environmental niches ℰ? To attempt a general answer to this question we need a well-developed formal framework. The framework available at this point is insufficient, even for a careful description of a candidate adaptive plan τ for genetic systems, unlike the case of the simpler artificial system. A fortiori, questions about such adaptive plans, and critical questions about efficiency, must wait upon further development of the framework. We can explore here some of the requirements an adaptive plan τ must meet if it is to be relevant to data about genetics and evolution.
In beginning this exploration we can make good use of a concept from mathematical genetics. The action of the environment E ∈ ℰ upon the phenotype (and thereby upon the genotype A ∈ 𝓐) is typically summarized in mathematical studies of genetics by a single performance measure μ_E called fitness. Roughly, the fitness of a phenotype is the number of its offspring which survive to reproduce (precise definitions will be given later in connection with the appropriate formal models; see section 3.1). This measure rests upon a universal, and familiar, feature of biological systems: Every individual (phenotype) exists as a member of a population of similar individuals, a population constantly in flux because of the reproduction and death of the individuals comprising it. The fitness of an individual is clearly related to its influence upon the future development of the population. When many offspring of a given individual survive to reproduce, then many members of the resulting population, the "next generation," will carry the alleles of that individual. Genotypes and phenotypes of the next generation will be influenced accordingly.

Fitness, viewed as a measure of the genotype's influence upon the future, introduces a concept useful through the whole spectrum of adaptation. A good way to see this concept in wider context is to view the testing of genotypes as a sampling procedure. The sample space in this case is the set of all genotypes 𝓐 and the outcome of each sample is the performance μ_E of the corresponding phenotype. The general question associated with fitness, then, is: To what extent does the outcome μ_E(A) of a sample A ∈ 𝓐 influence or alter the sampling plan τ (the kinds of samples to be taken in the future)? Looking backward instead of forward, we encounter a closely related question: How does the history of the outcomes of previous samples influence the current sampling plan? The answers to these questions go far toward determining the basic character of any adaptive process.

We have already seen that the answer to the first question, for genetic systems, is that the future influence of each individual A ∈ 𝓐 is directly proportional to the sampled performance μ_E(A). This relation need not be so in general: there are many well-established procedures for optimization, inference, mathematical learning, etc., where the relation between sampled performance and future sampling is quite different. Nevertheless reproduction in proportion to measured performance is an important concept which can be generalized to yield sampling plans (reproductive plans) applicable to any adaptive problem (including the broad class of problems where there is no natural notion of reproduction). Moreover, once reproductive plans have been defined in the formal framework, it can be proved that they are efficient (in a reasonable sense) over a very broad range of conditions.
A part of the answer to the second question, for genetic systems, comes from the observation that future populations can only develop via reproduction of individuals in the current population. Whatever history is retained must be represented in the current population. In particular, the population must serve as a summary of observed sample values (performances). The population thereby has the same relation to an adaptive process that the notion of (complete) state has to the laws of physics or the transition functions of automata theory. Knowing the population structure or state enables one to determine the future without any additional information about the past of the system. (That is, different sampling sequences which arrive at the same population will have exactly the same influence on the future.) The state concept has been used as a foundation stone for formal models in a wide variety of fields; in the formal development to follow, generalizations of population structure will have this role.
An understanding of the two questions just posed leads to a deeper understanding of the requirements on a genetic adaptive plan. It also leads to an apparent dilemma. On the one hand, if offspring are simple duplicates of fit members of the population, fitness is preserved but there is no provision for improvement. On the other hand, letting offspring be produced by simple random variation (a process practically identical to enumeration) yields a maximum of new variants but makes no provision for retention of advances already made. The dilemma is sharpened by two biological facts: (1) In biological populations consisting of advanced organisms (say vertebrates) no two individuals possess identical chromosomes (barring identical twins and the like). This is so even if we look over many (all) successive generations. (2) In realistic cases, the overwhelming proportion of possible variants (all possible allele combinations, not just those observed) are incapable of surviving to produce offspring in the environments encountered. Thus, by observation (1), advances in fitness are not retained by simple duplication. At the same time, by observation (2), the observed lack of identity cannot result from simple random variation because extinction would almost certainly follow in a single generation; variants chosen completely at random are almost certain to be sterile.
In attempting to see how this "dilemma" is resolved, we begin to encounter some of the deeper questions about adaptation. We can only hint at the dilemma's resolution in this preliminary survey. Even a clear statement of the resolution requires a considerable formal structure, and proof that it is in fact a resolution requires still more effort. Much of the understanding hinges on posing and answering two questions closely related to the questions generated by the concept of fitness: How can an adaptive plan τ (specifically, here a plan for genetic systems) retain useful portions of its (rapidly growing) history along with advances already made? How is the adaptive plan τ to access and use its history (the portion stored) to increase the likelihood of fit variants (A ∈ 𝓐 such that μ_E(A) is above average)? Once again these are questions relevant to the whole spectrum of fields mentioned at the outset.

The resolution of the dilemma lies in the action of the genetic operators Ω within the reproductive plan τ. The best-known genetic operators exhibit two properties strongly affecting this action: (1) The operators do not directly affect the size of the population; their main effect is to alter and redistribute alleles within the population. (The alleles in an individual typically come from more than one source in the previous generation, the result, for example, of the mating of parents in the case of vertebrates, or of transduction in the case of bacteria.) (2) The operators infrequently separate alleles which are close together on a chromosome. That is, alleles close together typically remain close together after the operators have acted.
Useful clues to the dilemma's resolution emerge when we look at the effect of these operators in a simple reproductive plan, τ_1. This plan can be thought of as unfolding through repeated application of a two-phase procedure: During phase one, additional copies of (some) individuals exhibiting above-average performance are added to the population while (some) individuals of subaverage performance are deleted. More carefully, each individual has an expected number of offspring, or rate of reproduction, proportional to its performance. (If the population is to be constant in size, the rates of reproduction must be "normalized" so that their average over the population at any time is 1.) During phase two, the genetic operators in Ω are applied, interchanging and modifying sets of alleles in the chromosomes of different individuals, so that the offspring are no longer identical to their progenitors. The result is a new, modified population. The process is iterated to produce successive generations of variants.
More formally, in an environment which assigns an observable performance to each individual, τ_1 acts as follows: At the beginning of each time period t, the plan's accumulated information about the environment resides in a finite population 𝓐(t) selected from 𝓐. The most important part of this information is given by the discrete distributions which give the proportions of different sets of alleles in the population 𝓐(t). 𝓐(t) serves not only as the plan's repository of accumulated information, but also as the source of new variants which will give rise to 𝓐(t + 1). As indicated earlier, the formation of 𝓐(t + 1) proceeds in two phases. During the first phase, 𝓐(t) is modified to form 𝓐'(t) by copying each individual in 𝓐(t) a number of times dependent upon the individual's observed performance. The number of copies made will be determined stochastically so that the expected number of copies increases in proportion to observed performance. During the second phase, the operators are applied to the population 𝓐'(t), interchanging and modifying the sets of alleles, to produce the new generation 𝓐(t + 1).
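The two-phase procedure can be sketched directly in code. The sketch below is illustrative, not Holland's specification of τ_1 or Ω: the chromosomes are plain bit strings, Ω is reduced to one-point crossover plus pointwise mutation, the population size is assumed even, and the fitness function (the count of 1-alleles) is made up purely for the example.

import random

def tau_1_generation(population, fitness, cross_prob=0.9, mut_prob=0.01):
    """One generation of a simple reproductive plan tau_1 (illustrative sketch).

    Phase one: stochastic duplication, expected copies proportional to performance.
    Phase two: genetic operators (one-point crossover, mutation) applied to the copies.
    """
    scores = [fitness(ind) for ind in population]
    copied = random.choices(population, weights=scores, k=len(population))  # phase one
    next_gen = []
    for mate_a, mate_b in zip(copied[0::2], copied[1::2]):                  # phase two
        child_a, child_b = mate_a, mate_b
        if random.random() < cross_prob:
            cut = random.randrange(1, len(mate_a))                          # one-point crossover
            child_a = mate_a[:cut] + mate_b[cut:]
            child_b = mate_b[:cut] + mate_a[cut:]
        for child in (child_a, child_b):
            next_gen.append([1 - g if random.random() < mut_prob else g for g in child])
    return next_gen

# Made-up example: 30 chromosomes of 20 binary alleles, fitness = number of 1-alleles.
population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(50):
    population = tau_1_generation(population, fitness=sum)
print("best fitness after 50 generations:", max(map(sum, population)))

The normalization mentioned above is implicit in the sketch: random.choices uses the scores only through their proportions, so the expected number of copies of an individual is its performance divided by the population average.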
One key to understanding τ_1's resolution of the dilemma lies in observing what happens to small sets of adjacent alleles under its action. In particular, what happens if an adjacent set of alleles appears in several different chromosomes of above-average fitness and not elsewhere? Because each of the chromosomes will be duplicated an above-average number of times, the given alleles will occupy an increased proportion of the population after the duplication phase. This increased proportion will of course result whether or not the alleles had anything to do with the above-average fitness. The appearance of the alleles in the extra-fit chromosomes might be happenstance, but it is equally true that any correlation between the given selection of alleles and above-average fitness will be exploited by this action. Moreover, the more varied the chromosomes containing the alleles, the less likely it is that the alleles and above-average fitness are uncorrelated.

What happens now when the genetic operators Ω are applied to form the next generation? As indicated earlier, the closer alleles are to one another in the chromosome the less likely they are to be separated during the operator phase. Thus the operator phase usually transfers adjacent sets of genes as a unit, placing them in new chromosomal contexts without disturbing them otherwise. These new contexts further test the sets of alleles for correlation with above-average fitness. If the selected set of alleles does indeed augment fitness, the chromosomes containing the set will again (on the average) be extra fit. On the other hand, if the prior associations were simply happenstance, sustained association with extra-fit chromosomes becomes increasingly less likely as the number of trials (new contexts) increases. The net effect of the genetic plan over several generations will be an increasing predominance of alleles and sets of alleles augmenting fitness in the given environment.

In observing what happens to small sets of genes under its action, we have seen one way in which the plan τ_1 preserves the history of its interactions with the environment. It also retains certain kinds of advances thereby, favoring structural components which have proved their worth by augmenting fitness. At the same time, since these components are continually tried in new contexts and combinations, stagnation is avoided. In brief, sets of alleles engendering above-average performance provide comparative success in reproduction for the chromosomes carrying them. This in turn assures that these alleles become predominant components of later generations of chromosomes. Though this description is sketchy, it does indicate that reproductive plans using genetic operators proceed in a way which is neither enumeration nor simple duplication of fit structures. The full story is both more intricate and more sophisticated. Because reproductive plans are provably efficient over a broad range of conditions, we will spend considerable time later unraveling the skeins of this story.
5. SOME GENERAL OBSERVATIONS
One point which comes through clearly from the examples is the enormous size of 𝒜, even for a very modest system. This size has a fatal bearing on what is at first sight a candidate for a "universal" adaptive plan. The candidate, called τ₀ in the first example, and henceforth designated an enumerative plan, exhaustively tests the structures in 𝒜. Enumerative plans are characterized by the fact that the order in which they test structures is unaffected by the outcome of previous tests. For example, the plan first generates and tests all structures attainable (from an initially given structure) by single applications of the basic operators, then all structures attainable by two applications of the operators, etc. The plan preserves the fittest structure it has encountered up to any given point in the process, replacing that structure immediately upon generating a structure which is still more fit. Thus, given enough time (and enough stability of the environment so that the fitness of structures does not change during the process) an enumerative plan is guaranteed to discover the structure most fit for any environment confronting it. The simplicity of this plan, together with the guarantee of discovering the most fit structure, would seem to make it a very important adaptive plan. Indeed enumerative plans have been repeatedly proposed and studied in most of the areas mentioned in section 1.1. They are often set forth in a form not obviously enumerative, particularly in evolutionary studies (mutation in the absence of other genetic operators), learning (simple trial-and-error), and artificial intelligence (random search).


However, in all but the most constrained situations, enumerative plans are a false lead. The flaw, and it is a fatal one, asserts itself when we begin to ask, "How long is eventually?" To get some feeling for the answer we need only look back at the first example. For that very restricted system there were 10^100 structures in 𝒜. In most cases of real interest, the number of possible structures vastly exceeds this number, and for natural systems like the genetic systems we have already seen that numbers like 2^10,000 ≈ 10^3000 arise. If 10^12 structures could be tried every second (the fastest computers proposed to date could not even add at this rate), it would take a year to test about 3 × 10^19 structures, or a time vastly exceeding the estimated age of the universe to test 10^100 structures.
It is clear that an attempt to adapt by means of an enumerative plan is foredoomed in all but the simplest cases because of the enormous times involved. This extreme inefficiency makes enumerative plans uninteresting either as hypotheses about natural processes or as algorithms for artificial systems. It follows at once that an adaptive plan cannot be considered good simply because it will eventually produce fit structures for the environments confronting it; it must do so in a reasonable time span. What a "reasonable" time span is depends strongly on the environments (problems) under consideration, but in no case will it be a time large with respect to the age of the universe. This question of efficiency or "reasonable time span" is the pivotal point of the most serious contemporary challenge to evolutionary theory: Are the known genetic operators sufficient to account for the changes observed in the allotted geological intervals? There is of course evidence for the existence of adaptive plans much more efficient than enumeration. Arthur Samuel (1959) has written a computer program which learned to play tournament calibre checkers, and humans do manage to adapt to very complex environments in times considerably less than a century. It follows that a major part of any study of the adaptive process must be the discovery of factors which provide efficiency while retaining the "universality" (robustness) of enumeration. It does not take much analysis to see that an enumerative plan is inefficient just because it always generates structures in the same order, regardless of the outcome of tests on those structures. The way to improvement lies in avoiding this constraint.
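To see how quickly the numbers above rule out enumeration, a one-line calculation suffices. The sketch below assumes the generous testing rate quoted in the text and a rough current estimate of the age of the universe; the constants are illustrative, the orders of magnitude are the point.

```python
# Back-of-the-envelope check of the enumeration argument above
# (assumed rates; only the orders of magnitude matter).
structures      = 10**100          # size of the search space in the first example
tests_per_sec   = 10**12           # generously fast testing rate
seconds_per_yr  = 3.15e7
age_universe_yr = 1.4e10           # rough current estimate

years_needed = structures / tests_per_sec / seconds_per_yr
print(f"{years_needed:.2e} years, i.e. {years_needed / age_universe_yr:.2e} ages of the universe")
```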
The foregoing points up again the critical nature of the adaptive plan's initial uncertainty about its environment, and the central role of the procedures it uses to store and access the history of its interactions with that environment. Since different structures perform differently in different environments, the plan's task is set by the aspects of the environment which are unknown to it initially. It must


generate structures which perform well (are fit) in the particular environment confronting it, and it must do this efficiently. Interest centers on robust adaptive plans - plans which are efficient over the range of environments ℰ they may encounter. Giving robustness precise definition and discovering something of the factors which make an adaptive plan robust is the formal distillation of questions about efficiency. Because efficiency is critical, the study of robustness has a central place in the formal development.
The discussion of genetic systems emphasized two general requirements bearing directly on robustness: (1) The adaptive plan must retain advances already made, along with portions of the history of previous plan-environment interactions. (2) The plan must use the retained history to increase the proportion of fit structures generated as the overall history lengthens. The same discussion also indicated the potential of a particular class of adaptive plans - the reproductive plans. One of the first tasks, after setting out the formal framework, will be to provide a general definition of this class of plans. Lifting the reproductive plans from the specific genetic context makes them useful across the full spectrum of fields in which adaptation has a role. This widened role for reproductive plans can be looked upon as a first validation of the formalism. A much more substantial validation follows closely upon the definition, when the general robustness of reproductive plans is proved via the formalism. Later we will see how reproductive plans using generalized genetic operators retain and exploit their histories. Throughout the development, reproductive plans using genetic operators will serve to illuminate key features of adaptation and, in the process, we will learn more of the robustness, wide applicability, and general sophistication of such plans.
Summarizing: This entire survey has been organized around the concept of an adaptive plan. The adaptive plan, progressively modifying structure by means of suitable operators, determines what structures are produced in response to the environment. The set of operators Ω and the domain of action of the adaptive plan 𝒜 (i.e., the attainable structures) determine the plan's options; the plan's objective is to produce structures which perform well in the environment E confronting it. The plan's initial uncertainty about the environment - its room for improvement - is reflected in the range of environments ℰ in which it may have to act. The related performance measures μ_E, E ∈ ℰ, change from environment to environment since the same structure performs differently in different environments. These objects lie at the center of the formal framework set out in chapter 2. Chapter 3 provides illustrations of the framework as applied to genetics, economics, game-playing, searches, pattern recognition, statistical inference, control, function optimization, and the central nervous system.


A brief look at the enormous times taken by enumerative plans to discover fit structures, even when the domain of action 𝒜 is greatly constrained, makes it clear that efficiency is a sine qua non of studies of adaptation. Efficiency acts as a "cutting edge," shearing away plans too slow to serve as hypotheses about natural systems or as algorithms for artificial systems. Whether an adaptive plan is to serve as hypothesis or algorithm, information about its robustness - its efficiency in the environments ℰ - is critical. The latter part of this book will be much concerned with this topic. Chapter 4 introduces a critical tool for the investigation and construction of efficient adaptive plans - schemata. This generalization of coadapted sets of alleles provides an efficient way of defining and exploiting properties associated with above-average performance. Chapter 5 develops a criterion for measuring the efficiency with which adaptive plans improve average performance and then relates this criterion to the exploitation of schemata. Chapter 6 introduces generalized genetic plans and chapter 7 establishes their robustness. Chapter 8 studies mechanisms which enable genetic plans to use predictive modeling for flexible exploitation of the large fluxes of information provided by typical environments.
The emphasis throughout the book is on general principles which help to resolve the problems and questions raised in this chapter. One particular interest will be the solution of problems involving hundreds to hundreds of thousands of interdependent parameters and multitudes of local optima - problems which largely lie outside the prescriptions of present-day computational mathematics.


2. A Formal Framework

1. DISCUSSION
Three associated objects occupied the center of the preliminary survey:
E, the environment of the system undergoing adaptation,
τ, the adaptive plan which determines successive structural modifications in response to the environment,
μ, a measure of the performance of different structures in the environment.

Implicit in the discussion is a decomposition of the overall process into two disjoint parts - the adaptive system employing τ, and its environment E. This decomposition is usually fixed or strongly suggested by the particular emphasis of each study, but occasionally it can be arbitrary and, rarely, it can be a source of difficulty. Thus, in some biological studies the epidermis naturally serves as the adaptive system-environment boundary, while in other biological studies we deal with populations which have no fixed spatial boundaries, and in ecological settings the boundary shifts with every change in emphasis. Similarly, the emphasis of the study usually determines what notion of performance is relevant and how it is to be measured to yield μ. Because E, τ, and μ are central and can be regularly identified in problems of adaptation, the formal framework will be built around them.
In the basic formalism the adaptive plan τ will be taken to act at discrete instants of time, t = 1, 2, 3, . . . , rather than continuously. The primary reason for adopting a discrete time-scale is the simpler form it confers on most of the important results. Also this formalism intersects smoothly with extant mathematical theories in several fields of interest where much of the development is based on a discrete time-scale, viz., mathematical economics, sequential sampling theory, the theory of self-reproducing automata, and major portions of population genetics. Where continuity is more appropriate, it is often straightforward to obtain continuous counterparts of definitions and theorems, though in some cases appropriate


restatements are full-fledged research problems with the discrete results serving only as guidelines. In any case, the instants of time can be freely reinterpreted in different applications - they may be nanoseconds in one application (e.g., artificial intelligence), centuries in another (e.g., evolutionary theory). The properties and relations established with the formalism remain valid, only their durations will vary. Thus, at the outset, we come upon a major advantage of the formalism: Features or procedures easily observed in one process can be abstracted, set within the framework, and analyzed so that they can be interpreted in other processes where duration of occurrence, or other detail, obscures their role.
As our starting point for constructing the formalism let us take the domain of action of the adaptive plan, the set of structures 𝒜. At the most abstract level 𝒜 will simply be an arbitrary, nonempty set; when the theory is applied, 𝒜 will designate the set of structures appropriate to the field of interest. Because the more general parts of the theory are valid for any nonempty set 𝒜, we have great latitude in interpreting or applying the notion of structure in particular cases. Stated the other way around, the diversity of objects which can serve as elements of 𝒜 assures flexibility in applying the theory. In practice, the elements of 𝒜 can be the formal counterparts of objects much more complex than the basic structures (chromosomes, mixes of goods, etc.) of the preliminary survey. They may be sets, sequences, or probability distributions over the basic structures; moreover, portions of the adaptive system's past history may be explicitly represented as part of the structure. Often the basic structures themselves will exhibit additional properties, being presented as compositions of interacting components (chromosomes composed of alleles, programs composed of sets of instructions, etc.). Thus (referring to section 1.4), if the elements of 𝒜 are to represent chromosomes with l specified genes, where the ith gene has a set of k_i alleles A_i = {a_{i1}, . . . , a_{ik_i}}, then the set of structures becomes the set of all combinations of alleles,

𝒜 = A_1 × A_2 × . . . × A_l = ∏_{i=1}^{l} A_i.
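For a concrete (and deliberately tiny) instance of this cross product, the sketch below builds the structure set from made-up allele sets; the allele names and the number of genes are purely illustrative.

```python
from itertools import product

# Sketch of the structure set as a cross product of allele sets,
# A = A_1 x A_2 x ... x A_l.  The allele sets are invented for illustration.
allele_sets = [
    {"a11", "a12"},          # A_1, k_1 = 2 alleles at gene 1
    {"a21", "a22", "a23"},   # A_2, k_2 = 3 alleles at gene 2
    {"a31", "a32"},          # A_3, k_3 = 2 alleles at gene 3
]

structures = list(product(*allele_sets))   # every combination of alleles
print(len(structures))                     # 2 * 3 * 2 = 12 chromosomes
```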

Finally, the set 𝒜 will usually be potential rather than actual. That is, elements become available to the plan only by successive modification (e.g., by rearrangement of components or construction from primitive elements), rather than by selection from an extant set. We will examine all of these possibilities as we go along, noting that relevant elaborations of the elements of 𝒜 provide a way of specializing the general parts of the theory for particular applications.
The adaptive plan τ produces a sequence of structures, i.e., a trajectory through 𝒜, by making successive selections from a set of operators Ω. The particular


selections made are influenced by information obtained from the environment E, so that the plan τ typically generates different trajectories in different environments. The adaptive system's ability to discriminate among various environments is limited by the range I of stimuli or signals it can receive. More formally: Let the structure tried at time t be 𝒜(t) ∈ 𝒜. Then the particular environment E confronting the adaptive system reacts by producing a signal I(t). Different structures may of course be capable of receiving different ranges of signals. That is, if I_A is the range of signals which A can receive, then for A' ≠ A it may be that I_A ≠ I_{A'}. To keep the presentation simple, I is used to designate the total range of signals ∪_{A∈𝒜} I_A receivable by structures in 𝒜. The particular information I(t) received by the adaptive system at time t will then be constrained to the subset of signals I_{𝒜(t)} ⊂ I which the structure at time t, 𝒜(t), can receive. I may have many components corresponding, say, to different sensors. Thus, referring to the example of section 1.3, I consists of the components I_1 × I_2 × . . . × I_n = ∏_i I_i. In this case I_i = {0, 1} for all i since the ith component of I represents the range of values the ith sensor s_i
[Figure: schematic of the adaptive plan τ in operation - operators selected by the plan, structures tried, and performances observed in the environment E.]
Fig. 2. Schematic of adaptive plan τ's operation


can transmit. That is, given a particular signal I(t) ∈ I at time t, the ith component I_i(t) is the value s_i(t) of the ith sensor at time t. In general the sets I_i may be quite different, corresponding to different kinds of sensors or sensory modalities.
The formal presentation of an adaptive plan τ can be simplified by requiring that 𝒜(t) serve as the state of the plan at time t. That is, in addition to being the structure tried at time t, 𝒜(t) must summarize whatever accumulated information is to be available to τ. We have just provided that the total information received by τ up to time t is given by the sequence (I(1), I(2), . . . , I(t-1)). Generally only part of this information is retained. To provide for the representation of the retained information we can make use of the latitude in specifying 𝒜. Think of 𝒜 as consisting of two components 𝒜_1 and ℳ, where 𝒜_1(t) is the structure tested against the environment at time t, and the memory ℳ(t) represents other retained parts of the input history (I(1), I(2), . . . , I(t-1)). Then the plan can be represented by the two-argument function

τ: I × 𝒜 → 𝒜.

Here the structure to be tried at time t+1, 𝒜_1(t+1), along with the updated memory ℳ(t+1), is given by

(𝒜_1(t+1), ℳ(t+1)) = 𝒜(t+1) = τ(I(t), 𝒜(t)) = τ(I(t), (𝒜_1(t), ℳ(t))).

(The projection of τ on ℳ,

τ_ℳ: I × 𝒜_1 × ℳ → ℳ,

defined so that

τ_ℳ(I(t), 𝒜_1(t), ℳ(t)) = proj_2[τ(I(t), 𝒜(t))] = ℳ(t+1),

is that part of τ which determines how the plan's memory is updated.) It is clear that any theorems or interpretations established for the simple form

τ: I × 𝒜 → 𝒜

can at once be elaborated, without loss of generality or range of application, to the form

τ: I × (𝒜_1 × ℳ) → (𝒜_1 × ℳ).

Thus the framework can be developed in terms of the simple, two-argument form of τ, elaborating it whenever we wish to study the mechanisms of trial selection or memory update in greater detail.
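As a concrete reading of the two-argument form with explicit memory, the following sketch treats the state as a pair (structure, memory) and updates both from the incoming signal. The numeric types, the running-average memory, and the update rule are invented for illustration; they stand in for whatever 𝒜_1 and ℳ are in a given application.

```python
from typing import Callable, Tuple

# Sketch of the two-argument plan with explicit memory:
# the state is a pair (structure, memory) and tau maps (input, state) -> state.
State = Tuple[float, float]          # (structure A_1(t), memory M(t))
Plan  = Callable[[float, State], State]

def tau(signal: float, state: State) -> State:
    structure, memory = state
    new_memory = 0.9 * memory + 0.1 * signal      # retained part of the input history
    new_structure = structure + (0.5 if signal > new_memory else -0.5)
    return (new_structure, new_memory)            # = (A_1(t+1), M(t+1))

state: State = (0.0, 0.0)
for signal in [1.0, 2.0, 0.5, 3.0]:               # stand-in for I(1), I(2), ...
    state = tau(signal, state)
print(state)
```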


In what follows it will often be convenient to treat the adaptive plan τ as a stochastic process; instead of determining a unique structure 𝒜(t+1) from I(t) and 𝒜(t), τ assigns probabilities to a range of structures and then selects accordingly. That is, given I(t), 𝒜(t) may be transformed into any one of several structures A_1, A_2, . . . , A_j, . . . , the structure A_j being selected with probability P_j. More formally: Let 𝒫 be a set of admissible probability distributions over 𝒜. Then

τ: I × 𝒜 → 𝒫

will be interpreted as assigning to each pair (I(t), 𝒜(t)) a particular distribution over 𝒜, P(t+1) ∈ 𝒫. The structure 𝒜(t+1) to be tried at time t+1 will then be determined by drawing a random sample from 𝒜 according to the probability distribution P(t+1) = τ(I(t), 𝒜(t)). For those cases where the plan τ is to determine the next structure 𝒜(t+1) uniquely, the distribution P(t+1) simply becomes a degenerate, one-point distribution where a single structure in 𝒜 is assigned probability 1. Hence the form

τ: I × 𝒜 → 𝒫

includes the previous

τ: I × 𝒜 → 𝒜

as a special case.
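The stochastic form can be sketched the same way: the plan returns a distribution over candidate structures and the next structure is drawn from it. The three candidate structures and the weighting rule below are invented; only the shape of the computation - input and structure in, distribution out, sample taken - matters.

```python
import random

# Sketch of the stochastic form tau: I x A -> P.  Given the current input and
# structure, the plan returns a probability distribution over candidate
# structures; the next structure is sampled from that distribution.
CANDIDATES = ["A1", "A2", "A3"]

def tau(signal: float, structure: str) -> dict:
    # Weight the current structure more heavily when the payoff signal is positive.
    weights = {a: (2.0 if a == structure and signal > 0 else 1.0) for a in CANDIDATES}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}   # the distribution P(t+1)

def next_structure(signal: float, structure: str) -> str:
    dist = tau(signal, structure)
    return random.choices(list(dist), weights=list(dist.values()), k=1)[0]

print(next_structure(signal=1.0, structure="A1"))
```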
In practice the transformation of 𝒜(t) to 𝒜(t+1) is usually accomplished by means of a function

τ': I × 𝒜 → Ω

and the set of operators

Ω = {ω: 𝒜 → 𝒫},

where the stochastic aspect is now embodied in the operators. If

ω_t = τ'(I(t), 𝒜(t))

designates the particular operator selected by τ' at time t, then

P(t+1) = ω_t(𝒜(t)) = [τ'(I(t), 𝒜(t))](𝒜(t))

gives the resulting distribution over 𝒜. Hence τ' determines τ once the functions in Ω are specified:

τ(I(t), 𝒜(t)) = [τ'(I(t), 𝒜(t))](𝒜(t)) = P(t+1).


That is, the range of τ' can be changed from Ω to 𝒫 with τ' being redefined so that

τ'(I(t), 𝒜(t)) = [τ'(I(t), 𝒜(t))](𝒜(t)) = P(t+1).

With this extension τ' and τ become identical; for this reason one symbol "τ" will be used to designate both functions, the range being specified whenever the distinction is important.
The general objective of this formalism is comparison of adaptive plans, either as hypotheses about natural phenomena or as algorithms for artificial systems. The comparison naturally centers on the efficiency of different plans in locating high-performance structures under a variety of environmental conditions. For a comparison to be made there must be a set of plans, given either explicitly or implicitly, which are candidates for comparison. This set will be formally designated 𝒯. Often 𝒯 will be the set of all possible plans employing the operators in Ω, but in some cases there will be constraints restricting 𝒯, while in others 𝒯 will be enlarged to include all possible plans over 𝒜 (i.e., all possible functions of the form τ: I × 𝒜 → 𝒫). 𝒯, however defined, represents the set of technical or feasible options for the adaptive system under consideration.
As indicated in the survey, a nontrivial problem of adaptation exists only when the adaptive plan is faced with an initial uncertainty about its environment. This uncertainty is formalized by designating the set ℰ of alternatives corresponding to characteristics of the environment unknown to the adaptive plan. The dependence of the plan's action upon the environment finds its formal counterpart in the dependence of the input I(t) upon which environment E ∈ ℰ actually confronts the plan. One case of particular importance is that in which the adaptive plan receives a direct indication of the performance of each structure it tries. That is, a part of the input I(t) will be the payoff μ_E(𝒜(t)) determined by the function

μ_E: 𝒜 → Reals,

which measures the performance of each structure in the given environment.
Sometimes, when the performance of a structure in the environment E depends upon random factors, it is useful to treat the utility function as assigning a random variable from some predetermined set 𝒰 to each structure in 𝒜. Thus

μ_E: 𝒜 → 𝒰

and the payoff assigned to 𝒜(t) is determined by a trial of the random variable μ_E(𝒜(t)). This extension does not add any generality to the framework (and hence is unnecessary at the abstract level) because any randomness involved in the interaction between the adaptive system and its environment can be subsumed in the stochastic action of the operators. (See chapter 5 and section 7.2, however.)

Much can be learned about adaptive plans in general by studying plans which act only in terms of payoff, plans for which

I(t) = μ_E(𝒜(t)).

In particular, plans which receive information in addition to payoff should do at least as well as plans which receive only payoff information. Thus, the efficiency of payoff-only plans with respect to ℰ sets a nontrivial lower bound on the efficiency of other plans.
To pose a problem in adaptation unambiguously one more element is required: a criterion 𝒳 for comparing the efficiency of different plans τ ∈ 𝒯 under the uncertainty represented by ℰ. Such a criterion must of necessity be fairly sophisticated since it must somehow take into account the varying efficiency of a plan in different environments. Thus, even with a definite measure of efficiency such as the average rate of increase of payoff, there is still the problem of variations across the environments ℰ. How is a plan which is highly efficient only in some subset of ℰ to be compared with a plan which is moderately efficient in all the environments in ℰ? It should be clear that the plan favored will often depend upon the particular application. In spite of this there are some broadly based criteria which have quite general applicability. The simplest of these requires that a plan accumulate payoff in each E ∈ ℰ more rapidly than an enumerative plan which has the same domain of action 𝒜. The intuitive content of this criterion is clear: A plan which does not accumulate payoff at least as rapidly as the extremely inefficient enumerative plans should, except in simple situations, be eliminated as a hypothesis (about natural systems) or an algorithm (for artificial systems). Because it is often useful to smooth out short-term variations in judging a plan, several broadly based criteria are stated in terms of the long-term average rate of payoff. When the adaptive plan has the deterministic form τ: I × 𝒜 → 𝒜, other, more general criteria are based on the cumulative payoff function

U_{τ,E}(T) = Σ_{t=1}^{T} μ_E(𝒜_E(τ, t)),

where 𝒜_E(τ, t) is the structure selected by τ in E at time t, μ_E(𝒜_E(τ, t)) is the corresponding payoff, and U_{τ,E}(T) is the total payoff received by τ in E to time T. (The average rate of payoff is just the function U_{τ,E}(T)/T based on the cumulative payoff function U_{τ,E}(T).) When the adaptive plan is stochastic, τ: I × 𝒜 → 𝒫, it is natural


to substitute the expected payoff under P(t), ρ_E(τ, t), for μ_E(𝒜_E(τ, t)). (If 𝒜 is countable, ρ_E(τ, t) is simply given by ρ_E(τ, t) = Σ_j P(A_j, t) μ_E(A_j), where P(A_j, t) is the probability of selecting A_j ∈ 𝒜 when the distribution over 𝒜 is P(t).) Thus, for stochastic adaptive plans,

U_{τ,E}(T) = Σ_{t=1}^{T} ρ_E(τ, t).

Following this line, a useful performance target can be formulated in terms of the greatest possible cumulative payoff in the first T time-steps,

U*_E(T) = lub_{τ∈𝒯} U_{τ,E}(T).

An important criterion, appearing frequently in the literature of control theory and mathematical economics (see chapter 3, "Illustrations"), can be concisely formulated in terms of U*_E: τ accumulates payoff at an asymptotically optimal rate if

lim_{T→∞} [(U_{τ,E}(T)/T)/(U*_E(T)/T)] = lim_{T→∞} [U_{τ,E}(T)/U*_E(T)] = 1.

In other words, the rate at which τ accumulates payoff is, in the limit, the same as the best possible rate. Often it is desirable to have a much stronger criterion setting standards on interim behavior. That is, even though the payoff rate approaches the optimum, it may take an intolerably long time before it is reasonably close. Thus, the stronger criterion sets a lower bound on the rate of approach to the optimum. For example, the criterion would designate a sequence (c_T) approaching 0 (such as c_T = k/(T + k) or c_T = k/(k + e^{jT}), for 0 < j < ∞) and then require for all T

[U_{τ,E}(T)/U*_E(T)] > (1 - c_T).

Clearly the plan τ satisfies the asymptotic optimal rate criterion when it satisfies this criterion and, in addition, τ can approach that rate no more slowly than c_T approaches 0.

The simplest way to extend these criteria to ℰ is to require that a plan τ ∈ 𝒯 meet the given criterion in each E ∈ ℰ:

τ is robust in ℰ with respect to the asymptotic optimal rate criterion for 𝒯 when

glb_{E∈ℰ} lim_{T→∞} [U_{τ,E}(T)/U*_E(T)] = 1.

τ is robust in ℰ with respect to the interim behavior criterion (c_T) for 𝒯 when, for all T,

glb_{E∈ℰ} [U_{τ,E}(T)/U*_E(T)] > (1 - c_T).


Each criterion in effect classifies the plans in 𝒯 as "good" or "bad" according to whether or not it is satisfied. The first of these criteria is commonly met in a wide range of applications, while the second proves to be relevant to questions of survival under competition. (Once again, a plan satisfying the second criterion automatically meets the first but not vice versa.) Other criteria can be based on the cumulative payoff function and indeed criteria of a quite different kind can be useful in particular situations. Nevertheless the criteria given are representative and of general use; they will play a prominent role later.
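The way these criteria are applied can be seen in a short numerical sketch: accumulate payoff for a plan and for the target, and compare the ratio against a bound 1 - c_T. The payoff sequences and the choice c_T = k/(T + k) below are illustrative only.

```python
# Sketch of the cumulative-payoff comparison behind the criteria above.
# payoff_seq stands for mu_E(A(tau, t)); best_seq for the payoffs behind U*_E(T).
def cumulative(seq):
    total, out = 0.0, []
    for x in seq:
        total += x
        out.append(total)
    return out

payoff_seq = [1, 2, 4, 6, 7, 8, 8, 9]       # payoffs gathered by the plan tau
best_seq   = [9, 9, 9, 9, 9, 9, 9, 9]       # payoffs of the best structure each step
U_tau, U_star = cumulative(payoff_seq), cumulative(best_seq)

k = 4.0
for T, (u, u_star) in enumerate(zip(U_tau, U_star), start=1):
    c_T = k / (T + k)                        # a sequence approaching 0
    ok = u / u_star > 1 - c_T                # interim behavior criterion
    print(f"T={T}  ratio={u/u_star:.2f}  bound={1-c_T:.2f}  satisfied={ok}")
```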

2. PRESENTATION
A problem in adaptation will be said to be well posed once 𝒯, ℰ, and 𝒳 have been specified within the foregoing framework. An adaptive system is specified within this framework by the set of objects (𝒜, Ω, I, τ) where

𝒜 = {A_1, A_2, . . .} is the set of attainable structures, the domain of action of the adaptive plan,

Ω = {ω_1, ω_2, . . .} is the set of operators for modifying structures, with ω ∈ Ω being a function ω: 𝒜 → 𝒫, where 𝒫 is some set of probability distributions over 𝒜,

I is the set of possible inputs to the system from the environment, and

τ: I × 𝒜 → Ω is the adaptive plan which, on the basis of the input and structure at time t, determines what operator is to be applied at time t:

τ(I(t), 𝒜(t)) = ω_t ∈ Ω and ω_t(𝒜(t)) = P(t+1),

where P(t+1) is a particular distribution over 𝒜. 𝒜(t+1) is determined by drawing a random sample from 𝒜 according to the distribution P(t+1). Given the input sequence (I(1), I(2), . . .), τ completely determines the stochastic process. (Occasionally, when the adaptive system is to be deterministic, with 𝒜(t+1) being uniquely determined once I(t) and 𝒜(t) are given, τ will be defined without the use of operators so that τ: I × 𝒜 → 𝒜.) The structure of the adaptive system at time t, 𝒜(t), will be required to summarize whatever aspects of the input history are to be available to the plan. Hence it will often be useful to represent 𝒜 as 𝒜_1 × ℳ, where 𝒜_1 is the set of structures to be directly tested and ℳ is the set of possible memory configurations for retaining past history not directly incorporated in the tested structures.


𝒯 is the set of feasible or possible plans of the form τ: I × 𝒜 → Ω (or τ: I × 𝒜 → 𝒫) appropriate to the problem being investigated.

ℰ represents the range of possible environments or, equivalently, the initial uncertainty of the adaptive system about its environment. When the plan τ tries a structure 𝒜(t) ∈ 𝒜 at time t, the particular environment E ∈ ℰ confronting the adaptive system signals a response I(t) ∈ I. The performance or payoff μ_E(𝒜(t)), given by the function μ_E: 𝒜 → Reals, is generally an important part of the information I(t). Given E ≠ E' for E, E' ∈ ℰ, the corresponding functions μ_E, μ_E' are generally not identical, so that a major part of the uncertainty about the environment is just about how well a structure will perform therein. When a plan employs, or receives, only information about payoff, so that I(t) = μ_E(𝒜(t)), it will be called a payoff-only plan.

Finally, the various plans in 𝒯 are to be compared over ℰ according to a criterion 𝒳. Comparisons will often be based on the cumulative payoff functions U_{τ,E}(T) = Σ_{t=1}^{T} ρ_E(τ, t), where ρ_E(τ, t) is the expected payoff under P(t), and the "target" function U*_E(T) = lub_{τ∈𝒯} U_{τ,E}(T). An interim behavior criterion, based on a selected sequence (c_T) → 0 and of the form

glb_{E∈ℰ} [U_{τ,E}(T)/U*_E(T)] > (1 - c_T),

will be important in the sequel.
With the help of this framework each of the fundamental questions about adaptation posed in chapter 1, section 1, can be translated into a formal counterpart:

Original: To what parts of its environment is the organism (system, organization) adapting?  Formal: What is ℰ?
Original: How does the environment act upon the adapting organism (system, organization)?  Formal: What is I?
Original: What structures are undergoing adaptation?  Formal: What is 𝒜?
Original: What are the mechanisms of adaptation?  Formal: What is Ω?
Original: What part of the history of its interaction with the environment does the organism (system, organization) retain in addition to that summarized in the structure tested?  Formal: What is ℳ?
Original: What limits are there to the adaptive process?  Formal: What is 𝒯?
Original: How are different (hypotheses about) adaptive processes to be compared?  Formal: What is 𝒳?


3. COMPARISON WITH THE DUBINS-SAVAGE FORMALIZATION OF THE GAMBLER'S PROBLEM

. . . much of the mathematical essence of a theory of gambling consists of the discovery and demonstration of sharp inequalities for stochastic processes . . . this theory is closely akin to dynamic programming and Bayesian statistics. In the reviewer's opinion, [How to Gamble If You Must] is one of the most original books published since World War II.
M. Iosifescu, Math. Rev. 38, 5, Review 5276 (1969).

For those who have read, or can be induced to read, Dubins and Savage's influential book, this section (which requires special knowledge not essential for subsequent development) shows how to translate their formulation of the abstract gambler's problem to the present framework and vice versa. Briefly, their formulation is based on a progression of fortunes f_0, f_1, f_2, . . . which the gambler attains by a sequence of gambles. A gamble is naturally given as a probability distribution over the set of all possible fortunes F. The gambler's range of choice at any time t depends directly and only upon his current fortune f, so that, as Dubins and Savage remark, the word "state" might be more appropriate than "fortune." The gambler's range of choice for each fortune f is dictated by the gambling house Γ. The strategy σ for confronting the house is a function which at each time t selects a gamble in Γ on the basis of the sequence or partial history of fortunes to that time (f_0, f_1, . . . , f_t). Finally the utility of a given fortune f to the gambler is specified by a utility function u. Thus an abstract gambler's problem is well posed when the objects (F, Γ, u) have been specified; the gambler's response to the problem is given by his strategy σ.
The objects of the Dubins-Savage framework can be put in a one-to-one correspondence with formally equivalent objects in the present framework. With the help of this correspondence any theorem proved in one framework can automatically be translated to a statement which is a valid theorem in the other framework. The relation between the intended interpretations of corresponding objects is in itself enlightening, but the real advantage accrues from the ability to transfer results from one framework to the other with a guarantee of validity.
The following table presents the formal correspondence with an indication of the intended interpretation of each formal object. In this table the superscript "*" on a set will indicate the set of all finite sequences (or strings) which can be formed from that set; thus F* is the set of all partial histories.


Dubins-Savage / Adaptive Systems

F, fortunes  ↔  𝒜_1, basic structures (see τ below).
γ, a probability distribution over fortunes, i.e., a gamble  ↔  P, a probability distribution over structures, i.e., P ∈ 𝒫.
Γ, a function assigning a set of gambles to each f ∈ F, the house  ↔  the (induced) function which assigns to each A ∈ 𝒜_1 the set of distributions 𝒫_A = {ω(A), ω ∈ Ω}.
σ: F* → {γ}, a strategy which assigns to each partial history p ∈ F* a gamble in Γ(f), where f is the latest fortune in the sequence p  ↔  τ: 𝒜 → 𝒫, an adaptive plan; τ uses only the retained history ℳ in 𝒜 = 𝒜_1 × ℳ, but τ has the same generality as σ if F* ≅ ℳ and F ≅ 𝒜_1.
u: F → Reals, utility  ↔  μ_E: 𝒜 → Reals, performance.

As implied by their terminology, Dubins and Savage treat situations wherein the expectation for any strategy σ, given an initial fortune f, is less than f. That is, the strategies are operating in environments wherein continued operation makes degraded performance ever more likely. (This is similar to adaptation in an environment having only nonreplaceable resources, so that performance can only decline in the long run.) In contrast, the present work is primarily concerned with complex environments wherein performance can be permanently improved, if only the right information can be acquired and exploited. Despite the differences, or more likely because of them, theorems from one framework have interesting, and sometimes surprising, translations in the other framework.


3. Illustrations

The formal framework set out in chapter 2 is intended, first of all, as an instrument for uniform treatment of adaptation. If it is to be useful a wide variety of adaptive processes must fit comfortably within its confines. To give us a better idea of how the framework serves this end, the present chapter applies the framework in several different fields. It will repay the reader to skim through all of the illustrations on first reading, but he should skip without hesitation over difficult points on unfamiliar ground, reserving concentration for illustrations from familiar fields. Although each of the illustrations adds something to the substantiation of the framework no one of them is essential in itself to later developments. The interpretations, limited usually to one commonly used model per field, are of necessity largely informal, but two points can be checked in each case: (1) the facility of the framework in picking out and organizing the facts relevant to adaptation, and (2) the fit of established mathematical models within the framework.

1. GENETICS

. . . genes act in many ways, affecting many physiological and morphological characteristics which are relevant to survival. All of these come together into the sufficient parameter "fitness" or selective value. . . . Similarly environmental fluctuation, patchiness, and productivity can be combined . . . in . . . [a] measure of environmental uncertainty. . . .
Levins in Changing Environments (pp. 6-7)

The phenotype is the product of the harmonious interaction of all genes. The genotype is a "physiological team" in which a gene can make a maximum contribution to fitness by elaborating its chemical "gene product" in the needed quantity and at the time when it is needed in development. There is extensive interaction not only among the alleles of a locus, but also between loci. The main locale of these epistatic interactions is the developmental pathway. Natural selection will tend to bring together those genes that constitute a balanced system. The process by which genes are accumulated in the gene pool that collaborate harmoniously is called "integration" or "coadaptation." The result of this selection has been referred to as "internal balance." Each gene will favor the selection of that genetic background on which it can make its maximum contribution to fitness. The fitness of a gene thus depends on and is controlled by the totality of its genetic background.
Mayr in Animal Species and Evolution (p. 295)
We have already looked at genetic processes at some length in the preliminary survey, so this illustration will be brief, mostly recapitulating the main points of the earlier discussion, but within the formal framework. Typically, only a certain range of basic structures, i.e., chromosomes, is admitted to studies in genetics, so that only a species, family, or other taxonomic grouping is involved. Still, in principle, one can study all possible variations, including variations in chromosome number and type. The range of the study will be primarily determined by the set Ω of genetic operators admitted, since the possible variants (genotypic and phenotypic) will be those produced by sequences of genetic operators from Ω. Familiar examples of genetic operators are mutation, crossover, inversion, dominance modification, translocation, and deletion (see the formal definitions given in chapter 6).
The genetic adaptive plan develops in terms of an ever-changing population of chromosomes which, interacting with the environment, provides a concurrent sequence of phenotype populations. For many purposes, it is convenient to represent a population as a probability distribution over the set of genotypes 𝒜_1, where the probability assigned to genotype A ∈ 𝒜_1 is the fraction of the total population consisting of that genotype (cf. Crow and Kimura 1970). Thus the population at time t can be specified by 𝒜(t) ∈ 𝒜, where 𝒜 is the set of distributions over 𝒜_1. In very general terms, each element of the population is tested against the environment and is ranked according to its fitness - its ability to survive and reproduce. It is often useful to think of the environment E in terms of environmental niches, each of which can be exploited by an appropriate set of phenotypic characteristics. Then fitness μ_E becomes a function of the coadapted sets of alleles which produce these characteristics (see chapter 4). From this point of view the population 𝒜(t) can be looked upon as a reservoir of coadapted sets, preserving the history of past advances, particularly the environmental niches encountered.
Most mathematical models of genetic adaptation are based on very simple reproductive plans, where each individual allele a_i is assigned a fitness μ_E(a_i) and the fitness of any set of alleles {a_1, . . . , a_n} is taken to be the sum of the fitnesses of the alleles in the set,

Σ_{i=1}^{n} μ_E(a_i).
However, in general, the fitness of an allele depends critically upon the influence of other alleles (epistasis). The replacement of any single allele in a coadapted set may completely destroy the complex of phenotypic characteristics necessary for adaptation to a particular environmental niche. The genetic operators provide for the preservation of coadapted sets by inducing a "linkage" between adjacent alleles - the closer together a set of alleles is on a chromosome, the more immune it is to separation by the genetic operators. Thus a more realistic set of adaptive plans provides for emphasis of coadapted sets through reproduction, combined with application of the genetic operators to provide new candidates and test established coadapted sets in new combinations and contexts.
More formally, an interesting set of plans can be defined in terms of a two-phase procedure: First the number of offspring of each individual A in a finite population 𝒜(t) is determined probabilistically, so that the expected number of offspring of A is proportional to A's observed fitness μ_E(A). The result is a population 𝒜'(t) with certain chromosomes emphasized, along with the coadapted sets they contain. Then, in the second phase, the genetic operators from Ω are applied (in some predetermined order) to yield the new population 𝒜(t+1). One class of plans of considerable practical relevance can be defined by assuming that operator ω_i from Ω is applied to an individual A ∈ 𝒜'(t) with probability p_i (constant over time). It is easy to see that the efficiency of such a plan will depend upon the values of the p_i; it is perhaps less clear that once each of the p_i has a value within a certain critical range, the plan remains efficient, relative to other possible plans, over a very broad range of fitness functions {μ_E, E ∈ ℰ}. In particular, if chromosomes containing a given linked set of alleles repeatedly exhibit above-average fitness, the set will spread throughout the population. On the other hand, if a linked set occurs by happenstance in a chromosome of above-average fitness, later tests will eliminate it (see chapters 6 and 7). It is this mode of operation (and others similar) which gives such plans robustness - the ability to discover complex combinations of coadapted sets appropriate to a wide variety of environmental niches.
Because of the central role of fitness, it is natural to discuss the efficiency and robustness of a plan τ in terms of the average fitnesses of the populations it produces. Formally, the average fitness in E of a finite population of genotypes 𝒜_τ(t) produced by τ at time t is given by


p̄_E(τ, t) = Σ_{A ∈ 𝒜_τ(t)} μ_E(A) / M(τ, t),

where M(τ, t) is the number of individuals in 𝒜_τ(t). If we take the ratio p̄_E(τ, t)/p̄_E(τ', t), we have an indication of how close τ comes to extinction relative to τ' (in the sense that extinction occurs when the population produced by τ becomes negligible relative to the population produced by τ'). If we take the greatest lower bound of this ratio relative to some set of possible plans 𝒯, we have an indication of the worst that can happen to τ in E, relative to 𝒯, at time t. Continuing in this vein we get the following criterion for ranking plans as to robustness over ℰ:

glb_{E∈ℰ} glb_t glb_{τ'∈𝒯} p̄_E(τ, t)/p̄_E(τ', t).

In effect this criterion ranks plans according to how close they come to extinction under the most unfavorable conditions.
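A toy computation shows how the ranking works: compute the average fitness of the populations produced by two plans and take the worst ratio over the recorded generations. The fitness function and the populations below are invented for illustration.

```python
# Sketch of the extinction-ratio comparison above: average population fitness
# for two plans, and the worst-case ratio over the recorded generations.
def average_fitness(population, mu_E):
    return sum(mu_E(a) for a in population) / len(population)

mu_E = lambda a: a                                    # toy fitness: the genotype's own value
pops_tau       = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]    # populations produced by tau
pops_tau_prime = [[2, 2, 2], [3, 3, 3], [4, 4, 4]]    # populations produced by tau'

ratios = [average_fitness(p, mu_E) / average_fitness(q, mu_E)
          for p, q in zip(pops_tau, pops_tau_prime)]
print(min(ratios))    # glb over t of the ratio p_E(tau, t) / p_E(tau', t)
```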
The fantastic variety of possible genotypes, the effects of epistasis, changing environments, and the difficulty of retaining adaptations while maintaining variability (genetic variance) all constitute difficulties which genetic processes must surmount. In terms of the (𝒯, ℰ, 𝒳) framework these are, respectively, problems of the large size of 𝒜, the nonlinearity and high dimensionality of μ_E, the nonstationarity of μ_E, and the mutual interference of search and exploitation. The (𝒯, ℰ, 𝒳) framework enables the definition of concepts (chapters 4 and 5) which in turn (chapters 6 through 9) help explain how genetic processes meet these difficulties in times consistent with paleontological and current biological observations.
Summarizing:
𝒜, populations of chromosomes represented, for example, by the set of distributions over the set of genotypes 𝒜_1.
Ω, genetic operators such as mutation, crossover, inversion, dominance modification, translocation, deletion, etc.
𝒯, reproductive plans combining duplication according to fitness with the application of genetic operators; for example, if each operator ω_i ∈ Ω is applied to individuals with a fixed probability p_i, then the set of possible plans can be represented by the set {(p_1, . . . , p_i, . . . , p_b), where 0 ≤ p_i ≤ 1}.
ℰ, the set of possible fitness functions {μ_E: 𝒜 → ℝ}, each perhaps stated as a function of combinations of coadapted sets.


𝒳, comparison of plans according to average fitnesses of the populations produced; for example,

glb_{E∈ℰ} glb_t glb_{τ'∈𝒯} p̄_E(τ, t)/p̄_E(τ', t).

2. ECONOMICS

The specification of how goods can be transformed into each other is called the technology of the model and the specification of how goods are transformed to satisfaction is called the utility function. Given this structure and some initial bundle of goods, the problem of optimal development is to decide at each point of time how much to invest and how much to consume in order to maximize utility summed over time in some suitable way.
Gale in "A Mathematical Theory of Optimal Economic Development," Bull. AMS 74, 2 (p. 207)
One of the most important formulations of mathematical economics is the von Neumann technology. This technology can be presented (following David Gale 1968) in terms of a finite set of goods and a finite set of activities, where each activity transforms some goods into others. If the goods are indexed, then the goods available to the economy at any given time can be presented as a vector where the ith component gives the amount of the ith good. In the same way, the input to the jth activity and the resultant output can be given by a pair of vectors w_j and w'_j, where the ith component of w_j specifies the amount of the ith good required by the activity, while the ith component of w'_j specifies the amount produced. An activity can be operated at various levels of effort so that, for instance, if the amount of input of each required good is doubled then the amount of output will be doubled. More generally, if the level of effort for activity j is c_j then the pair (w_j c_j, w'_j c_j) specifies the input and output of the activity. If a mixture of activities is allowed, the overall technology can be specified as the set of pairs

{(Wc, W'c) : c ∈ Q},

where W and W' are matrices having the vectors w_j and w'_j as their respective jth columns, each c is a vector having the level of the jth activity as its jth component, and Q designates the set of admissible activity mixes (corresponding to the real constraints limiting the total activity at any time). A program for utilizing the technology is given by a sequence of activities (c_t) satisfying the intuitive "local" requirement that the total amount of each good required as input for the activities at time

" Tool
"
Fabrication
Output
.

Input

'

" Coal
"
Mining
Input Output

TYPICAL ACTIVITIES:
" Coal
"
Storage
Input Output

Matrix W'
(CombiningActivity
Outputs
)
3/2 000000
0 100004
0 010000
0 001300
0 000010

0000

Matrix W
(Combining Activity
Input Reauirements
)

ON -

0
0
0
1

00 -

Iron Ore
Steel
Tools

TYPICALPRODUCTION
:
SEQUENCE

~"

II

W' .

\ ~N

I ~

<t(l )

5
0
2
/0
5
/2
l

10
10
0
~"

Wood
Coal
Iron Ore
Steel
Tools

I In

Ci(t) :

0
4
0

= w

Fig. 3. Exampleolvon Neumanntechnology

t cannot exceed the total amount of that good produced as output in the preceding period:

W'c_{t-1} ≥ Wc_t

(using matrix multiplication and the obvious extension of inequality to vectors). Activities which dispose of or store goods can be introduced so that the given inequalities can be changed to equalities without loss of generality. Thus, given an initial supply of goods V(0), the set of admissible programs becomes

ℭ = {sequences C_β = (c_{β,t}), where β ∈ B, an indexing set, and t = 0, 1, 2, . . . : (i) c_{β,t} ∈ Q, (ii) Wc_{β,0} = V(0), (iii) W'c_{β,t} = Wc_{β,t+1}}.
It is assumed that each activity vector c can be assigned a unique utility u(c) designating the satisfaction to society of engaging in the mix of activities specified by the vector. (This way of assigning utility has the nice feature that satisfying activities which do not directly consume goods, such as viewing pictures in a museum or conserving goods for future use, can be included in the model.) The object of the study is to compare various programs in terms of the utility sequences they produce. Typically, programs are compared over some interval of time (0, T) by taking the difference of their accumulated utilities

U_β(T) - U_{β'}(T),

where

U_β(T) = Σ_{t=0}^{T} u(c_{β,t}).

A program C_{β*} is considered optimal if

glb_{β∈B} liminf_{T→∞} [U_{β*}(T) - U_β(T)] ≥ 0.

Because Q sets an upper limit to levels of effort, an optimal program always exists. A program C_β will often be satisfactory if its rate of accrual of utility U_β(T)/T is comparable to that of C_{β*}.
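The "local" feasibility requirement on a program is easy to check mechanically. The sketch below uses plain lists as the matrices W and W'; the two goods, two activities, and the activity levels are invented for illustration.

```python
# Sketch of the feasibility requirement W'c_{t-1} >= W c_t for a program,
# using plain lists as matrices.  All numbers are invented for illustration.
def mat_vec(M, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

W     = [[1, 0],    # input requirements: rows = goods, columns = activities
         [0, 2]]
W_out = [[0, 3],    # outputs produced by each activity (the matrix W')
         [2, 0]]

program = [[1.0, 1.0], [1.5, 0.5], [0.5, 1.0]]   # activity levels c_0, c_1, c_2

feasible = all(
    out >= need
    for c_prev, c_next in zip(program, program[1:])
    for out, need in zip(mat_vec(W_out, c_prev), mat_vec(W, c_next))
)
print(feasible)
```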
"
"
Generally interest centers on noncontracting economieswhere, once an
activity is possible, it continuesto be possibleat any subsequenttime. This can be
"
"
guaranteed, for example, if there is a set of initial goods which are regenerated
by all activities (cf. sunlight, water, and air ) and from which all other goods can be
produced by appropriate sequencesof activities. In such economies a mix of
activities can be tried and, if found to be of above-averageutility , can be employed
again in the future.
In the (𝒯, ℰ, 𝒳) framework, the set of admissible activity mixes Q becomes the set of structures 𝒜. An adaptive plan τ generates a program C_τ by selecting a sequence of activity vectors (c_{τ,t}) on the basis of information received from the environment (economy). The environment E in this case makes itself felt only through the observed utility sequence u(c_{τ,t}); thus different utility functions correspond to different environments. Within this framework, the basic concern is discovery of an adaptive plan which, over a broad variety of environments, generates programs which work "near-optimally." A typical criterion of "near-optimality" would be that for all utility functions of interest the ratio of the rate of accrual of the adaptive plan τ, U_τ(T)/T, to that of C_{β*(E)}, U_{β*(E)}(T)/T, approaches 1 for each E ∈ ℰ. That is,

lim_{T→∞} [U_τ(T)/U_{β*(E)}(T)] = 1, for all E ∈ ℰ.

Generally there will be some additional requirement that the rates be comparable for all times T.

Adaptation becomes important when there is uncertainty about just what utility should be assigned to given activity mixes, or when it is difficult to project μ_E into the future, or when Q is a function of time (reflecting technological innovations). The key to formulating an adaptive plan here, paralleling the procedure in other contexts, is continual use of incoming information (about satisfactions and dissatisfactions, changing technology, etc.) to modify activity levels. A well-formulated plan should respond automatically, specifying adjustments needed, as information accumulates. Since, in von Neumann's formulation, the environment is characterized by the utility assigned to different activity vectors, we can limit consideration to payoff-only plans. The fact that reproductive plans are payoff-only plans which can be proved near-optimal (in the sense defined above) for any set of utilities makes it likely that such plans can supply the responsiveness required here. In (𝒯, ℰ, 𝒳) terms the basic problems here, as in the genetics illustration, are the large size of 𝒜 coupled with nonlinearity and high dimensionality of μ_E. Because the concepts of chapters 4 and 5 are formulated in terms of the general framework, they apply here as readily as to genetics. The resulting techniques are specifically interpreted as optimization procedures throughout chapter 6, at the end of section 7.1, and throughout section 7.2.

Summarizing:
𝒜, the set of admissible activity vectors Q.
Ω, transformations of Q into itself.
𝒯, plans for selecting a program (c_t) ∈ ℭ, where c_t is an activity vector in Q, on the basis of observed utilities {μ_E(c_{t'}), t' < t}, i.e., payoff-only plans.
ℰ, an indexing set of possible utility functions {μ_E: Q → ℝ, E ∈ ℰ}.
𝒳, typically a requirement that, for all utility functions μ_E, E ∈ ℰ, the limiting rate of accrual of a plan, lim_{T→∞} (U_τ(T)/T), equal that of the best possible program C_{β*(E)} in each E ∈ ℰ.


3. GAME-PLAYING

Lacking such knowledge [of machine-learning techniques], it is necessary to specify methods of problem solution in minute and exact detail, a time-consuming and costly procedure. Programming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort.
Samuel in "Some Studies in Machine Learning Using the Game of Checkers," IBM J. Res. Dev. 3 (p. 211)
Most competitive games played by man (board games, card games, etc.) can be presented in terms of a tree of moves where each vertex (point, node) of the tree corresponds to a possible game configuration and each directed edge (arrow) leading from a given vertex represents a legal move of the game. The edge points to a new vertex corresponding to a configuration which can be attained from the given one in one move (turn, action); the options open to a player from a given configuration are thus indicated by the edges leading from the corresponding vertex. The tree has a single distinguished vertex with no edges leading into it, the initial vertex, and there are terminal vertices, having no edges leading from them, which designate outcomes of the game. In a typical two-person game which does not involve chance, the first player selects one of the options leading from the initial configuration; then the second player selects one of the options leading from the resulting configuration; the play of the game proceeds with the two players alternately selecting options. The result is a path from the initial vertex to some terminal vertex. The outcomes are ranked, usually by a payoff function which assigns a value to each terminal vertex.
In these terms, a pure strategy for a given player is an algorithm (program, procedure) which, for each nonterminal configuration, selects a particular option leading therefrom. Once each player chooses a pure strategy, the outcome of the game is completely determined, although in practice it is usually possible to determine this outcome only by actually playing the game. Thus, in a strictly determined (non-chance) two-person game, each pair of pure strategies (one for each player) can be assigned a unique payoff. The object of either player, then, is to find a strategy which does as well as possible against the opponent as measured by the expected payoff. This informal object ramifies into a whole series of cases, depending upon the initial information about the opponent and the form of the game.
One of the simplest cases occurs when it is known that the opponent, say the second player, has selected a single pure strategy for all future plays of the game.

[Game tree figure: levels alternate between the plan's moves and the opponent's moves; one branch marks the move selected by the opponent's predetermined strategy, another the move selected by the adaptive plan's strategy.]
Fig. 4. Example of a game tree

The object of the first player, then, is to learn enough of the strategy chosen by the second player to find an opposing strategy which maximizes payoff. When the game tree involves only a finite number of vertices, as is often the case, it is at least theoretically possible to locate the maximizing strategy by enumerating and testing all strategies against the opponent. However, if there is an average of k options proceeding from each configuration, and if the average play involves m moves, there will be in excess of k^m pure strategies. The situation is quite comparable to the examples of enumeration given earlier. Even for a quite modest game with k = 10 and m = 20, and a machine which tests strategies at the exceptional rate of one every 10^-9 second, it would require in excess of 10^11 seconds, or about 30 centuries, to test all possibilities. Efficiency thus becomes the critical issue, and interest centers on the discovery of plans which enable a player to do well while learning to do better.
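As a quick back-of-the-envelope check of these figures, the following sketch recomputes them; only k, m, and the assumed 10^-9-second test rate enter.

# Back-of-the-envelope check of the strategy-enumeration estimate in the text.
# Assumptions: k options per configuration, m moves per play, one strategy
# tested every 10**-9 seconds (the "exceptional rate" mentioned above).

k, m = 10, 20
strategies = k ** m                 # number of pure strategies, k^m = 10^20
seconds = strategies * 1e-9         # total test time at 10^-9 s per strategy
centuries = seconds / (3600 * 24 * 365.25 * 100)

print(f"{strategies:.1e} strategies -> {seconds:.1e} s ~ {centuries:.0f} centuries")
# prints roughly 1.0e+20 strategies -> 1.0e+11 s ~ 32 centuries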

Adaptation

in

Natural

and

Artificial

Systems

If plans are compared in terms of accumulated payoff, a criterion emerges analogous to the classical "gambler's ruin" of elementary probability. Let U_{τ,E}(t) be the payoff accumulated to time t by plan τ ∈ 𝒯 confronting the (unknown) pure strategy E ∈ ℰ, and require that

glb_{E ∈ ℰ} glb_{τ' ∈ 𝒯} lim_{t → ∞} [ U_{τ,E}(t) / U_{τ',E}(t) ] ≥ c.
That is, the payoff accumulated by τ never falls to less than c of that accumulated by any other admissible plan τ' ∈ 𝒯, no matter what strategy the opponent chooses (even if that other plan by happenstance hits upon a good opposing strategy in its first trial). The smaller c is, the less stringent the criterion and, in general, the larger the number of plans satisfying the criterion. The usefulness of this criterion and the kinds of plans satisfying it will be discussed at length later (see especially the discussion of Samuel's algorithm in section 7.3); for now it is sufficient to notice that: (i) the criterion depends upon the accumulation function U_{τ,E}(t), (ii) for a given opposing strategy E, the lower the efficiency of a plan in accumulating payoff in relation to other plans in 𝒯, the smaller c becomes, and (iii) the rating of a plan will be determined by its performance against the opposing strategy which gives it the most difficulty.
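As a rough illustration of how such a criterion might be computed from data, the sketch below takes tables of accumulated payoffs (purely illustrative numbers, not from the text) and finds the greatest lower bound, over opposing strategies and rival plans, of the payoff ratio; a finite observation horizon stands in for the limit.

# A minimal sketch of the "gambler's ruin" style criterion: for a plan tau, find
# the greatest lower bound, over opposing strategies E and rival plans, of the
# ratio of accumulated payoffs U.  The payoff curves below are illustrative
# numbers; the minimum over the observed horizon stands in for the limit.

# U[plan][E] is a list of payoffs accumulated by that plan against strategy E.
U = {
    "tau":  {"E1": [1, 3, 6, 10], "E2": [2, 3, 5, 8]},
    "tau2": {"E1": [2, 4, 7, 11], "E2": [1, 2, 4, 7]},
}

def criterion_c(plan, payoffs):
    """glb over environments E and rival plans of U_plan,E(t) / U_rival,E(t)."""
    ratios = []
    for rival, by_env in payoffs.items():
        if rival == plan:
            continue
        for E, rival_curve in by_env.items():
            own_curve = payoffs[plan][E]
            ratios.extend(u / v for u, v in zip(own_curve, rival_curve) if v > 0)
    return min(ratios)

print(criterion_c("tau", U))   # the largest c for which the bound holds on this data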
Even when it is known that the opponent has selected a single pure strategy, there is a wide range of sophistication of adaptive plans. One class of simpler plans, the payoff-only plans, proves to be quite instructive because it sets a nontrivial lower bound on the performance of more sophisticated plans and it can be analyzed in some detail. In this context, a payoff-only plan ranks strategies it has tried according to the payoff obtained, and it generates new trial strategies on the basis of (selected parts of) this information alone (see section 7.3). More sophisticated plans use the large amounts of information generated during plays of the game, information concerning configurations encountered and the sequence in which they occur (see chapter 8). Obviously a plan which makes proper use of this additional information should do no worse than a payoff-only plan (since the sophisticated plan can reduce its operation to that of a payoff-only plan by ignoring the additional information), and there are certainly situations in which the information will enable the plan to accumulate payoff at a greater rate than a payoff-only plan.
The other extreme from a fixed opposing pure strategy occurs when any sequential mix of strategies is presumed possible on the part of the opponent. The object then (following von Neumann 1947) is usually to minimize the maximum loss (negative of the payoff) the opponent can impose. It is interesting that often (checkers, chess, go) this minimax strategy is a pure strategy. Thus, although the payoff may vary on successive trials of the same strategy, the plan can still restrict its search to pure strategies in such cases. In more general situations, however, the plan will have to employ stochastic mixtures of pure strategies and, if it is to exploit its opponents maximally, it will even associate particular mixtures with particular kinds of opponents (assuming it is supplied with enough information to enable it to identify individual opponents).
Considered in the (𝒯, ℰ, 𝒳) framework, the strategies become the elements of the domain of action 𝒜 and the plans for employing these strategies become elements of 𝒯. The set of admissible environments ℰ depends upon the particular case considered. If it is known that the opponent has chosen a single pure strategy, then the set of admissible environments ℰ is given by the set of pure strategies. The criterion 𝒳 for ranking the plans is then built up from the unique payoff determined by each pair of opposing pure strategies, the example given being the "gambler's ruin" criterion

glb_{E ∈ ℰ} glb_{τ' ∈ 𝒯} lim_{t → ∞} [ U_{τ,E}(t) / U_{τ',E}(t) ].

In the more complicated cases, the set of environments ℰ is enlarged, ultimately including plans over 𝒜; however, the accumulation functions U_{τ,E}(t) are still defined and criteria such as the "gambler's ruin" criterion can still be used to rank the plans in 𝒯.
Once again, as in the previous two illustrations, the large size of 𝒜 and the complex relation of its elements to performance constitute a major barrier to improvement. Section 7.3 specifically discusses the role of adaptive algorithms in game strategy spaces defined in the manner of Samuel. In addition, the necessity of using nonpayoff information generated during the play of more complex games presents special difficulties. This latter problem is addressed in section 8.4 as an elaboration of the concepts and techniques developed in the earlier chapters.
Summarizing:
𝒜, strategies for the game.
Ω, dependent upon the way strategies are represented; genetic operators will function if descriptors are used so that each strategy is designated by a string of descriptor values (see the predictive modeling technique of the next section for suggestions concerning operations on the strategy during the play; section 8.4 extends these ideas).
𝒯, plans for testing strategies.
ℰ, the strategic options open to the opponent; in simple cases, the set of pure strategies.
𝒳, ranking of plans using the cumulative payoff functions, the "gambler's ruin" criterion being an example.
4. SEARCHES, PATTERN RECOGNITION, AND STATISTICAL INFERENCE
Searches occur as the principal element in most problem-solving and goal-attainment attempts, from maze-running through resource allocation to very complicated planning situations in business, government, and research. Games and searches have much in common and, from one viewpoint, a game is just a search (perturbed by opponents) in which the object is to find a winning position. The complementary viewpoint is that a search is just a game in which the moves are the transformations (choices, inferences) permissible in carrying out the search. Thus, this discussion of searches complements the previous discussion of games.
In complicated searches the attainable situations S are not given explicitly; instead some initial situation S₀ ∈ S (position in a maze, collection of facts, etc.) is specified and the searcher is given a repertory of transformations {ω_i} which can be applied (repeatedly) in carrying out the search. As in the case of games, a tree is a convenient abstract representation of the search. For searches, each edge corresponds to a possible transformation ω_i and the traverse of any path in the tree corresponds to the application of the associated sequence of transformations. The vertex at the end of a path extending from the initial vertex corresponds to the situation produced from the initial situation by the transformations associated with the path. The difficulty of solving a problem or attaining a goal is primarily a function of the size of the search tree and the cost of applying the transformations. In most cases of interest the trees are so vast that hope of tracing out all alternative paths must be abandoned. Somehow one must formulate a search plan which, over a wide range of searches, will act with sufficient efficiency to attain the goal or solve the problem.
A typical search plan (see Newell, Shaw, and Simon's [1959] GPS or Samuel's [1959] procedure) involves the following elements:
(i) An (ordered) set of feature detectors {δ_i: S → V_i, where V_i is the range of readings or outputs of the ith detector}. Typically, each detector is an algorithm which, when presented with a "scene" or situation, calculates a number; if the number is restricted to 0 or 1, it is convenient to think of the algorithm as detecting the presence or absence of a property (cf. the simple artificial adaptive system of section 1.3). The need for detectors arises from the overwhelming flow of information in most realistic situations; the intent is to filter out as much "irrelevant" information as possible.

[Maze figure: an entrance, six choice points, and goals G1, G2, G3.] At each choice point, 1 through 6, there is a sign associated with each of the 3 possible directions x, y, z. If the symbol "Λ" occurs at the top of a sign the associated corridor belongs to the shortest path from the entrance to the goal; on the other hand, if the symbol "V" occurs at the bottom of a sign the associated corridor is to be avoided. Either symbol may be dark on a light background or vice versa. Thus, reduced to a 4-by-4 array of sensors (see section 1.3), either configuration (the "Λ" dark on a light background, or light on a dark background) indicates the direction of the goal. Each experiment involves a set of signs indicating uniquely the shortest path to one of the three possible goals G1, G2, G3.
In the terminology of section 3.4, the state at each choice point is given by the triple of signs there. That is,
S = {triples of 4-by-4 arrays}
{ω_x, ω_y, ω_z} = {"follow direction x," "follow direction y," "follow direction z"}
Fig. 5. A simple search setting: a maze with six choice points

The device is supplied with four detectors (see section 1.3):
δ₁ = 1 if the array is predominantly dark (8 or more squares dark), 0 otherwise
δ₂ = 1 if the right half is darker than the left half, 0 otherwise
δ₃ = 1 if the upper half is darker than the lower half, 0 otherwise
δ₄ = 1 if the array is dark and the upper half is darker (i.e., δ₄ = δ₁δ₃), 0 otherwise
Thus the array pictured in the figure would be assigned the quadruple of detector values (1, 1, 0, 0). (Note the large reduction in the number of situations to be evaluated: there are 2^16 = 65,536 different arrays but only 2^4 = 16 detector-value quadruples.)
The threshold devices of interest are specified by
f(S) = Σ_{i=1}^{4} w_i δ_i(S)
where each w_i can be chosen from the set {-2, -1, 0, 1, 2}. A 4-by-4 array S is assigned to C+ if f(S) ≥ ½; otherwise it is assigned to C-.
To determine what transformation from {ω_x, ω_y, ω_z} is to be invoked at choice point j, each of the three arrays S_jx, S_jy, S_jz is submitted to f. If exactly one of S_jx, S_jy, S_jz is assigned to C+ the corresponding corridor is followed; otherwise a corridor is chosen at random (and, presumably, the adaptive plan is invoked to modify the weights because of the lack of a unique prediction).
Fig. 6. A threshold device for the setting of figure 5


Setting I: The goal is at G1; the signs at choice point 1 are S_1x, S_1y, S_1z, those at choice point 2 are S_2x, S_2y, S_2z, and so on (i.e., the shortest path is indicated by dark symbols on a light background). [The sign arrays themselves are pictured in the original figure.]
Setting II: The goal is at G3; the signs at choice point 1 are again pictured, and at choice point 2 they are the same as in Setting I except for S_2x.
If the shortest path to the goal were always indicated as in Setting I, i.e., with dark symbols on a light background, then the function f′(S) = δ₃(S) (i.e., w₃ = 1, w₁ = w₂ = w₄ = 0) would always suffice for following the path. Notice, however, that in Setting II f′ assigns exactly the same set of values (0, 1, 0) at point 1, indicating that f′ does not distinguish the two settings. But, in Setting I f′ assigns (1, 0, 0) at point 2, while in Setting II f′ assigns (0, 0, 0) at point 2. Thus, starting from the same initial state (0, 1, 0) and invoking the same response ω_y, f′ arrives at two different states. Changing the weight assigned to δ₃ cannot correct the difficulty. This is a clear indication that the set of detectors (δ₃ in this case) is inadequate.
A quick check of the possibilities shows that consistently correct choices in the two settings can be achieved only by assigning a nonzero weight to δ₄, which is a nonlinear combination of δ₁ and δ₃. The function f″(S) = δ₁ + δ₃ - 2δ₄ then performs correctly in both settings and, in fact, performs consistently with any proper sequence of signs.
Fig. 7. Some searches using the devices of figure 6 in the settings of figure 5
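The figures above can be paraphrased in a few lines of code. The sketch below implements the four detectors and the linear threshold device under the reconstruction given here (threshold ½, δ₄ = δ₁δ₃); the two 4-by-4 example arrays are made up for illustration and are not the arrays pictured in the original figures. It shows that f″ = δ₁ + δ₃ - 2δ₄ assigns the goal symbol to C+ whether it appears dark-on-light or light-on-dark.

# A sketch of the threshold device of figure 6, under the reconstruction above:
# a sign is a 4-by-4 array of 0/1 "dark" cells, delta_1..delta_4 are the four
# detectors, and f(S) = sum_i w_i * delta_i(S) assigns S to C+ when f(S) >= 1/2.

def detectors(S):                      # S: 4x4 list of 0/1, 1 = dark
    dark = sum(map(sum, S))
    d1 = 1 if dark >= 8 else 0                                      # predominantly dark
    left  = sum(S[r][c] for r in range(4) for c in range(2))
    right = sum(S[r][c] for r in range(4) for c in range(2, 4))
    d2 = 1 if right > left else 0                                   # right half darker
    upper = sum(S[r][c] for r in range(2) for c in range(4))
    lower = sum(S[r][c] for r in range(2, 4) for c in range(4))
    d3 = 1 if upper > lower else 0                                  # upper half darker
    d4 = d1 * d3                                                    # nonlinear combination
    return (d1, d2, d3, d4)

def f(S, w):                            # linear threshold device
    return sum(wi * di for wi, di in zip(w, detectors(S)))

w2 = (1, 0, 1, -2)                      # f'' = delta1 + delta3 - 2*delta4

dark_on_light = [[0,1,1,0], [1,0,0,1], [0,0,0,0], [0,0,0,0]]   # goal symbol, dark cells
light_on_dark = [[1,0,0,1], [0,1,1,0], [1,1,1,1], [1,1,1,1]]   # same symbol, inverted

for S in (dark_on_light, light_on_dark):
    print(detectors(S), "C+" if f(S, w2) >= 0.5 else "C-")
# both arrays are assigned to C+, even though their detector quadruples differ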

(ii) An evaluator. The evaluator calculates an estimate of the "distance" of any given situation from the goal, using the detector outputs (an ordered set of real numbers) produced by that situation. The estimates are supposed to take the costs of the transformations, etc., into account; that is, the "distances" are usually weighted path lengths, where the paths involved are (conjectured) sequences of transformations leading from given situations to the goal. The intent is to use these estimates to determine which transformations should be carried out next. An evaluation is made of each of the situations which could be produced from the current one by the application of allowed (simple sequences of) transformations, and then that (sequence of) transformation(s) is executed which leads to the new situation estimated to be "nearest" the goal.
(iii) Error correction procedures. Before the search plan has been tried, the detectors and evaluator must be set up in more or less arbitrary fashion, using whatever information is at hand. The purpose of the error correction procedures is to improve the detectors and evaluators as the plan accumulates data. The shorter term problem is that of evaluator improvement. A typical procedure is to explore the search tree to some distance ahead of the current situation, either actually or by simulation, evaluating the situations encountered for their estimated distances from the goal. The evaluation of the situation estimated to be "nearest" the goal is then compared with the evaluation of the current situation and the evaluator is modified to make the estimates consistent. This "lookahead" procedure decreases the likelihood of contradictory distance estimates at different points on the same path. (A similar procedure can be carried out without lookahead, using predictors to make predictions about future situations, subsequently modifying the predictors to bring predictions more in line with observed outcomes.) As a result, the consistency of the evaluator is improved with each successive evaluation. At the same time, in most searches, the difficulty of estimating the distance to the goal decreases as the goal is approached, becoming perfect when the lookahead actually encounters the goal. Thus increasing the consistency ultimately increases the relevance of the evaluator.
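One way to read the evaluator-improvement procedure in (iii) is as an iterative consistency correction; the sketch below (a toy line-shaped search environment, an arbitrary learning rate, all names hypothetical) nudges each situation's estimated distance to the goal toward the lookahead estimate obtained from its best successor.

# A minimal sketch of the "lookahead" consistency correction in (iii): nudge the
# evaluator's distance-to-goal estimate for the current situation toward the
# estimate obtained by looking one step ahead.  The tiny line-graph "search
# environment", the step cost, and the learning rate are illustrative assumptions.

successors = {0: [1], 1: [0, 2], 2: [1, 3], 3: []}     # situation 3 is the goal
estimate = {0: 0.0, 1: 0.0, 2: 0.0, 3: 0.0}            # initial (arbitrary) evaluator
step_cost, rate = 1.0, 0.5

def correct(s):
    """One lookahead correction at situation s; returns the chosen successor."""
    if not successors[s]:
        return s
    best = min(successors[s], key=lambda n: estimate[n])
    target = step_cost + estimate[best]                 # lookahead estimate for s
    estimate[s] += rate * (target - estimate[s])        # reduce the inconsistency
    return best

for sweep in range(20):                                 # repeated searches from the start
    s = 0
    while s != 3:
        s = correct(s)

print({k: round(v, 2) for k, v in estimate.items()})    # approaches {0: 3, 1: 2, 2: 1, 3: 0}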
There is, however, a caveat. If the set of detectors is inadequate, for whatever reason, the improvement of the evaluator will be blocked. This raises the broad issue of pattern recognition, for the set of detectors is, of course, meant to enable the plan to recognize critical features for goal-attainment. The plan must be able to classify each situation encountered according to the goal-directed transformation which should be applied to it. The long-term problem is that of determining whether the set of detectors is adequate to this task. Important shortcomings are indicated when, from application of identical transformations to situations classed as equivalent by the detectors, situations with critically different evaluations result. When this happens, the detectors have clearly failed to distinguish some feature which makes a critical difference as far as the transformations are concerned. The object, then, is to generate a detector which gives different readings for the previously indistinguishable situations. Among the obvious candidates are modifications of the detectors which made the distinctions after the transformations were applied. Usually simple modifications will enable such detectors to make the distinction before the transformation as well as after.
We can look at this whole problem in another way, a way which makes contact with standard definitions in the theory of probability. Assume that the search plan assigns to each transformation ω a probability dependent upon the observed situation. That is, if S_a is the current situation, then each situation S_b ∈ S can be assigned a conditional probability of occurrence p_{ab}, where p_{ab} is simply the sum of the probabilities of all transformations leading from S_a to S_b. (It may, of course, be that there are no transformations of S_a to S_b, in which case p_{ab} = 0.) A sequence of trials performed according to the probabilities p_{ab} is a Markov chain, the outcome of each trial being a random variable (dependent upon the outcome of prior trials). The sample space underlying this random variable is the set of situations S. Let us assign a measure of utility or relevance to each of these situations. (For example, goals could be assigned utility 1 and all other situations utility 0, or some more complicated assignment ranking goals and intermediate situations could be used.) Then, formally, the function W making this assignment is also a random variable. Accordingly, we can assign an expected utility to the random variable representing the outcome of each trial in the Markov chain. In these terms, the plan continually redefines the Markov chain (by changing the transformation probabilities). It attempts in this way to increase the average (over time) of the expected values of the sequence of random variables corresponding to its trials.
The role of detectors here is, as already suggested, reduction of the size of the sample space and simplification of the search. More formally, consider a set of n detectors (not necessarily all those available), H = {δ₁, . . . , δ_n}, where H is arbitrarily ordered. The detectors in H assign to each S ∈ S an n-tuple of readings (v₁, . . . , v_n) belonging to the direct product ∏_{i=1}^{n} V_i.


In general there will be many situations producing a given set of detector readings; let S(v₁, . . . , v_n) be the set of situations in S producing the particular n-tuple of readings (v₁, . . . , v_n). In probabilistic terms S(v₁, . . . , v_n) is an event defined on the sample space S. Events themselves can be treated as random variables. (In fact, an occurrence of the situation S can be construed as the occurrence of all the events of which it is an instance.) Moreover, the function W assigning values to elements of S can be restricted to the event S(v₁, . . . , v_n) so that it becomes a random variable W(v₁, . . . , v_n) over S(v₁, . . . , v_n). As such W(v₁, . . . , v_n) has a well-defined expected value Ŵ(v₁, . . . , v_n) over S(v₁, . . . , v_n).
This probabilistic view of search plans is closely related to statistical inference based on sampling plans. The estimation of Ŵ(v₁, . . . , v_n) from observation of a few samples drawn from S(v₁, . . . , v_n) is a standard problem of statistical inference. We can think of a subset of detectors H as detecting one kind of critical feature when the corresponding Ŵ(v₁, . . . , v_n) is greater than W̄, where W̄ is the average value of the random variable W over the sample space S. Search plans go further in attempting to infer something of the value of Ŵ(v₁, . . . , v_n) for S(v₁, . . . , v_n) which have not been sampled. For example S(v₁, v₂) is contained in both S(v₁) and S(v₂); often it is possible to infer something of Ŵ(v₁, v₂) from knowledge of Ŵ(v₁) and Ŵ(v₂), though not necessarily by standard statistical techniques.
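The estimation just described can be sketched directly: sample situations, group them by detector readings into the events S(v₁, . . . , v_n), and average the utility W within each event, comparing with the overall average. The detectors and utility function below are illustrative assumptions.

# A sketch of estimating W-hat(v1, ..., vn): sample situations, group them by
# the n-tuple of detector readings (the event S(v1, ..., vn)), and average the
# utility W within each group.  The situations here are integers, and the two
# detectors and the utility function are illustrative assumptions.

import random
from collections import defaultdict

random.seed(0)

detectors = [lambda s: s % 2,                     # delta_1: parity
             lambda s: 1 if s >= 50 else 0]       # delta_2: "large" situation
W = lambda s: 1.0 if s % 10 == 0 else 0.0         # utility: 1 on goal-like situations

samples = [random.randrange(100) for _ in range(5000)]

by_event = defaultdict(list)
for s in samples:
    readings = tuple(d(s) for d in detectors)
    by_event[readings].append(W(s))

overall = sum(map(W, samples)) / len(samples)            # W-bar over the sample space
for readings, values in sorted(by_event.items()):
    print(readings, round(sum(values) / len(values), 3)) # estimated W-hat(v1, v2)
print("overall", round(overall, 3))
# events whose average exceeds the overall average flag a "critical feature"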
The earlier concern with distinguishability is also directly stated in these terms: Let d(t) be the particular n-tuple of detector readings at time t (d(t) ∈ ∏_i V_i) and let f: ∏_i V_i → {ω} be a search plan. That is, f is a prescription which specifies, for each set of detector readings, a transformation. The object of the search plan is to transform the current situation into one of high utility. But, for this to be possible, the effects of the transformations must be reliably indicated by the detectors. In particular, consider S₁ and S₂ ∈ S(v₁, . . . , v_n), so that at t = 1 either would show the same reading d(1) = (v₁, . . . , v_n) ∈ ∏_i V_i. The plan f specifies the action ω(1) = f(d(1)), and this in turn produces a new detector reading d(2). The whole procedure is iterated to yield a sequence of pairs ([d(1), f(d(1))], [d(2), f(d(2))], . . . , [d(t), f(d(t))]). The requirement on distinguishability is simply that, using the information provided by the detectors, f reliably transforms S₁ and S₂ into situations S₁′ and S₂′, respectively, for which W(S₁′) ≈ W(S₂′). (Notice that this is a much weaker requirement than would be necessary for a completely "autonomous" model wherein future situations would be wholly predictable on the basis of d(1) without any further information from the environment. That is, in an autonomous model, knowledge of d(1) and ω(1), . . . , ω(t) must suffice to determine d(t + 1). This requirement for autonomy, technically a requirement that the detectors induce a homomorphism, can be quite difficult to meet and, for intricate
situations, there may be no nontrivial homomorphisms.) The requirement on distinguishability permits the detectors to lump together situations having approximately equal expected utilities. This permits us to construct new, much smaller Markov chains based on events of interest. With this interpretation, the object of the adaptive plan τ is to test different sets of detectors, H, and search plans, f.
The error correction procedures for the detectors and the evaluator can be coordinated via a model of the search environment. In complex search environments, it is the detectors which make a model possible. From considerations of storage alone, a model would be impossible if, for each observed transformation, the initial situation and its successor had to be recorded in full. If the system records just the effect of transformations on the detector readings, it can be reduced to manageable proportions. Stated another way, the state space of the search environment is reduced to the manageable space of detector readings, and the effects of the transformations are observed on this reduced space. The construction of the model proceeds as data accumulate. When new detectors are required to increase distinguishability, an augmented model can be built around the old model as a nucleus. In particular, when a new detector is added to the set of detectors, this does not affect the data or the part of the model concerned with the old set of detectors. The task is to add information about the effect of the transformations upon the new detector, particularly in those situations that were causing difficulty. Once a model is available, it can be used with the evaluator to generate predictions and these can be checked against the outcome of that segment of the search. The resulting error indications, together with simulated lookahead, can then be used to improve the consistency of the evaluator.
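A minimal sketch of such a model on the reduced space: record, for each observed (detector reading, transformation) pair, the readings that follow, and note that lengthening the reading tuple for a newly added detector leaves the entries already recorded for the old detectors untouched. The readings and transformations below are made up for illustration.

# A sketch of the model built on the reduced space of detector readings: for
# each observed (reading, transformation) pair we record the readings that
# followed, rather than recording full situations.

from collections import defaultdict

model = defaultdict(lambda: defaultdict(int))   # (reading, transform) -> counts of next reading

def observe(reading, transform, next_reading):
    """Accumulate one observed effect of `transform` on the detector readings."""
    model[(reading, transform)][next_reading] += 1

def predict(reading, transform):
    """Most frequently observed next reading, or None if never seen."""
    seen = model.get((reading, transform))
    return max(seen, key=seen.get) if seen else None

# Observations with the old detector set (readings are pairs).
observe((0, 1), "x", (1, 0))
observe((0, 1), "x", (1, 0))
observe((1, 0), "y", (0, 0))

print(predict((0, 1), "x"))       # (1, 0)

# A new detector only lengthens the reading tuple for *new* data; the entries
# recorded above, keyed by the old readings, are untouched.
observe((0, 1, 1), "x", (1, 0, 0))
print(predict((0, 1, 1), "x"))    # (1, 0, 0)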
The sophistication of a model-evaluator search plan can only be justified when repeated searches must be made in the same overall search environment. As a prototype, we can consider an environment which (i) is complicated enough to make the exact recurrence of any particular situation extremely unlikely, but (ii) is regular enough to exhibit critical features (patterns) "pointing the way" to goals. The object of a plan, then, is to search out a succession of goals, improving its performance by incorporating the critical features in its detector-evaluator scheme.
The overall objective of a formal study of this area is to find a plan which, when presented with any of a broad range of complex search environments, rapidly increases its search efficiency by extracting and exploiting critical features. Considered within the (𝒯, ℰ, 𝒳) framework, the domain of action 𝒜 of a search plan τ ∈ 𝒯 consists of the various combinations of detectors, models, and evaluators that the plan can generate. Usually these will all be specified as algorithms or
programs in some common formal language (see chapter 8). A search plan τ ∈ 𝒯 thus amounts to a data-dependent algorithm for modifying the combination of detectors, model, and evaluator along the lines indicated above. The outward effect, at each point in time, of the combination produced by the search plan is a transformation ω in the search environment. The range of the plan's action at each moment is therefore circumscribed by the set of possible transformations {ω}. The set of admissible environments E ∈ ℰ consists of the set of search environments over which the search plan is expected to operate, each element E being presentable as a tree generated by the possible transformations. Let U_{τ,E}(t) be the cost in E of the transformations applied by τ through time t. If n_{τ,E}(t) is the number of goals achieved to time t, then c_{τ,E}(t) = U_{τ,E}(t)/n_{τ,E}(t) is the average cost to time t of each goal in E. A conservative measure of a plan's performance over all of ℰ would then be

lub_{E ∈ ℰ} lim_{t → ∞} c_{τ,E}(t),

which yields the criterion 𝒳 wherein plan τ has a higher rank than plan τ′ if it is assigned a lower number by the above measure. Suggestions for a model-evaluator plan, based on the genetic algorithms of chapter 6 and capable of modifying its representations, are advanced in section 8.4.
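The conservative measure can be sketched numerically as follows; the cumulative cost and goal-count curves are illustrative numbers, and a short tail of the run stands in for the limit.

# A sketch of the conservative performance measure described above: for each
# environment E, c(t) = U(t)/n(t) is the average cost per goal achieved by the
# plan up to time t; the plan is scored by the worst of these averages over all
# environments (smaller is better).

runs = {   # per environment: (cumulative cost U(t), goals achieved n(t)) over time
    "E1": [(3, 1), (7, 2), (12, 4), (16, 5)],
    "E2": [(5, 1), (9, 2), (15, 3), (20, 4)],
}

def score(runs, tail=1):
    """lub over E of the late-time average cost per goal."""
    worst = 0.0
    for curve in runs.values():
        late = [u / n for u, n in curve[-tail:] if n > 0]
        worst = max(worst, max(late))
    return worst

print(score(runs))   # 5.0 here: environment E2 is the hardest for this plan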
Summarizing:
𝒜, [probabilistic] Markov chains induced by the sets of conditional probabilities {P_i(S), the probability of applying transformation ω_i to situation S ∈ S}; [general] admissible detector-evaluator-model combinations.
Ω, [probabilistic] rules for modifying the conditional probabilities {P_i(S)}; [general] "lookahead" error correction, detector generation, and model revision procedures.
𝒯, algorithms for applying operators from Ω to 𝒜, using information about the (sampled) average cost of goal attainment and (in the general case) errors in prediction ("lookahead") and observed inadequacies in detectors.
ℰ, the set of search environments characterized by search trees along with a transformation cost function μ_E(ω_i, S) giving the cost of applying ω_i in situation S ∈ S.
𝒳, the ranking of plans in 𝒯 according to performance measures such as lub_{E ∈ ℰ} lim_{t → ∞} c_{τ,E}(t).


5. CONTROL AND FUNCTION OPTIMIZATION
The fact that we need time to determine the minimum of the [performance] functional or the optimal [control] vector c* is sad, but unavoidable; it is a cost that we have to pay in order to solve a complex problem in the presence of uncertainty. . . . adaptation and learning are characterized by a sequential gathering of information and the usage of current information to eliminate the uncertainty created by insufficient a priori information.
Tsypkin in Adaptation and Learning in Automatic Systems (p. 69)

In the usual version, a controlled process is defined in terms of a set of variables {x₁, . . . , x_n} which are to be controlled. (For example, a simple process of air conditioning may involve three critical variables: temperature, humidity, and air flow.) The set of states or the phase space for the process, X, is the set of all possible combinations of values for these variables. (Thus, for an air conditioning process the phase space would be a 3-dimensional space of all triples of real numbers (x₁, x₂, x₃) where the temperature x₁ in degrees centigrade might have a range 0 ≤ x₁ ≤ 50, etc.) Permissible changes or transitions in phase space are determined as a function of the state variable itself and a set of control parameters C. Typically X is a region in n-dimensional Euclidean space and the control parameters assume values in a region C of an m-dimensional space. Accordingly, the equation takes the form of a "law of motion" in the space X,

dX/dt = f(X(t), C(t)), where X(t) ∈ X, C(t) ∈ C.

Often X will have several components X₁, . . . , X_k following distinct laws f₁, . . . , f_k so that

f(X(t), C(t)) = (f₁(X₁(t), C(t)), . . . , f_k(X_k(t), C(t))).

For example, given a pursuit problem with a moving target having coordinates X₂(t) at time t, f₂(X₂(t), C(t)) would be the law of motion of the target while f₁(X₁(t), C(t)) would determine the pursuit curve. If some component, say X₃, represents time, then f₃(X₃(t), C(t)) = 1 and the law of motion becomes an explicit function of time.
When a rule or policy λ is given for selecting elements of C as a function of time, a unique trajectory (X(t), C(t)) through X × C is determined by the law of motion f. The object is to select a policy λ minimizing a given function J which assigns a performance or cost to each possible trajectory (X(t), C(t)). In practice, the function J is usually determined as the cumulation over time of some instantaneous cost rate Q(X(t), C(t)); i.e., J(X(t), C(t)) = ∫ Q(X(t), C(t)) dt. Typically, the cost function is derived from an explicit control objective such as attainment of a target state or a target region in minimal time or minimization of cumulative error. (Error is defined in terms of a measure of distance imposed on the phase space; the distance of the current state from the target region is the current error.) Control is thus a continuing search in phase space for the (usually moving) target or goal; as such the considerations of the preceding illustration are directly relevant. In the formulation of the pursuit problem stated above a natural measure of the cost of pursuit over some interval T would be the change in distance between target and pursuer divided by the fuel expenditure (with suitable conventions for trajectories where the distance does not decrease).
Although the controlled process is defined above in terms of continuous functions, discrete finite-state versions closely approximating the continuous version almost always exist. Indeed, if the problem is to be solved with the help of a digital computer, it must be put in finite-state form. Because the framework we are using is discrete, we will reformulate the problem in discrete form. The law of motion is given by

X(t + 1) = f(X(t), C(t)),

and the cumulative cost for a given trajectory over T units of time is given by

J((X(1), C(1)), . . . , (X(T), C(T))) = Σ_{t=1}^{T} Q(X(t), C(t)).
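As a sketch of this discrete formulation, the following loop iterates a made-up scalar law of motion under a simple proportional policy and accumulates J; the law of motion, cost rate, target, and policy are illustrative assumptions, not anything specified in the text.

# A sketch of the discrete control formulation above: iterate the law of motion
# X(t+1) = f(X(t), C(t)) under a fixed policy and accumulate the cost
# J = sum_t Q(X(t), C(t)).

target = 10.0

def f(x, c):                 # law of motion: the control nudges the state
    return x + c

def Q(x, c):                 # instantaneous cost: squared error plus control effort
    return (x - target) ** 2 + 0.1 * c ** 2

def policy(x):               # a simple proportional policy choosing C(t)
    return 0.5 * (target - x)

x, J = 0.0, 0.0
for t in range(20):
    c = policy(x)
    J += Q(x, c)
    x = f(x, c)

print(round(x, 3), round(J, 2))   # state near the target, with its cumulative cost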
If we look at the controlled process in the (𝒯, ℰ, 𝒳) framework we see that the law of motion f determines the environment of the adaptive system. A problem in control becomes a problem of adaptation when there is significant uncertainty about the law of motion f; that is, it is only known that f ∈ {f_E, E ∈ ℰ}. Such problems are generally unsolvable by contemporary methods of optimal control theory (cf., for example, the comments of Tsypkin [1971, p. 178]). Clearly under such circumstances the adaptive plan will have to try out various policies in an attempt to determine a good one. To fix ideas, let us assume that each policy λ ∈ 𝒜₁ can be assigned an average or expected performance Q̄(λ, f) for each possible f. Moreover let us assume that this average can be estimated as closely as desired by simply trying λ long enough from any arbitrary time t onward. The object then is to search for the policy in 𝒜₁ with the best average performance Q̄, exploiting the best among known possibilities at each step along the way.
A control policy λ ∈ 𝒜₁ generates a sequence of control parameters (C(t)). Different trials of the policy λ, say at times t₁, t₂, . . . , t_k, will in general elicit different costs Q(t₁), Q(t₂), . . . , Q(t_k). However, the (𝒯, ℰ, 𝒳) framework requires
that each A ∈ 𝒜 be assigned a unique cost μ_E(A). To satisfy this requirement we can let 𝒜 = 𝒜₁ × N where N is the set of natural numbers {1, 2, 3, . . .}. Then unique elements of 𝒜, namely (λ, t₁), (λ, t₂), . . . , (λ, t_k), correspond to the successive trials of λ and the cost Q(t_i) of trial t_i can be assigned as required,

μ_E(A) = μ_E((λ, t_i)) = Q(t_i).

An adaptive plan τ will modify the policy at intervals on the basis of observed costs. With the definition of 𝒜 just given this means that, if λ is tried at time t and is to be retained for trial at time t + 1,

τ(I(t), 𝒜(t)) = τ(I(t), (λ, t)) = (λ, t + 1);

on the other hand, if a new policy λ′ is to be tried,

τ(I(t), 𝒜(t)) = τ(I(t), (λ, t)) = (λ′, t + 1).

A sophisticated adaptive plan will probably retain a measure of the average performance of various policies tried, so that 𝒜 would be further extended by a memory component 𝔐 (see section 2.2) to 𝒜 = 𝒜₁ × N × 𝔐. A still more sophisticated plan will progressively reduce uncertainty about the environment by deliberately selecting elements of C to elicit critical information, perhaps constructing a model of f_E. Then by exploiting predictions of the model τ can adjust the sequence (C(t)) to better performance as measured by the function J. At this level the illustration concerning searches, pattern recognition, and statistical inference applies in toto. If the plan is to be a payoff-only plan, then

I(t) = μ_E(𝒜(t)) = Q(t),

and 𝔐(t + 1) is updated by using Q(t) in a recalculation of the average performance of 𝒜₁(t).
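A payoff-only plan of the kind just described can be sketched as follows: the memory component holds a running average cost per policy, updated from the observed Q(t) alone, and the plan mostly re-tries the best policy so far. The three candidate policies, their noisy costs, and the exploration rate are illustrative assumptions.

# A sketch of a payoff-only plan for the control problem: the memory component
# keeps a running average cost per policy, updated from the observed Q(t) alone,
# and the plan mostly re-tries the best policy so far while occasionally
# trying another.

import random
random.seed(1)

true_cost = {"A1": 5.0, "A2": 3.0, "A3": 4.0}           # hidden average cost of each policy
memory = {p: {"avg": 0.0, "n": 0} for p in true_cost}   # the memory component

def trial(policy):
    """One trial of the policy; returns the observed cost Q(t)."""
    return true_cost[policy] + random.uniform(-1, 1)

def update(policy, q):
    m = memory[policy]
    m["n"] += 1
    m["avg"] += (q - m["avg"]) / m["n"]                  # running average of observed costs

def next_policy():
    untried = [p for p, m in memory.items() if m["n"] == 0]
    if untried:
        return untried[0]
    if random.random() < 0.1:                            # occasionally try something else
        return random.choice(list(memory))
    return min(memory, key=lambda p: memory[p]["avg"])   # exploit the best average

policy = "A1"
for t in range(200):
    update(policy, trial(policy))
    policy = next_policy()

print({p: round(m["avg"], 2) for p, m in memory.items()})  # A2 should look best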
Finally the function J determines a ranking for every control sequence (C(t)), whether or not it is generated by a single policy. That is, an adaptive plan τ confronted with a law of motion f_E may try several policies, thereby generating a control sequence which no single λ ∈ 𝒜₁ could generate. However every control action C(t) has a definite cost Q(t). Thus the trajectory (C(t)) through C generated by τ can be ranked according to J. In this way J determines a criterion for ranking any τ ∈ 𝒯 in any E ∈ ℰ. As a specific example, consider the case where the object is minimization of cumulative error. By assigning maximum payoff to the target region and reducing the payoff of other states in proportion to the associated error, the performance of a plan τ can be measured in terms of the cumulative payoff function U_{τ,E}(t). The greater U_{τ,E}(t), the less the cumulative error to time t.


The foregoing discussion can be made applicable to function optimization by so arranging it that a single trial of a policy λ produces a sufficient estimate of its average performance Q̄(λ, f). For instance the policy could be repeatedly tried over some extended interval of time which would then be taken to be a single time-step in the discrete formulation. In any case let us assume that each λ ∈ 𝒜₁ has a unique value μ_E(λ) = Q̄(λ, f) which can be determined (with sufficient accuracy) in a single time-step. Moving the problem of estimating Q̄ into the background in this way reduces the control objective to finding the optimum of the function μ_E.
If the elements of 𝒜₁ are represented as points in an n-dimensional Euclidean space ℝⁿ the problem becomes one of optimizing an n-dimensional (nonlinear) real function. For example the elements of 𝒜₁ can be represented as strings of length n over some basic alphabet Σ. Since Σ can be recoded as a subset of {0, 1}^m, where m is the first integer greater than log₂(card Σ), this can be looked upon as optimization of an n-dimensional function having m-place binary fractions as arguments. (Thus, if Σ = {σ₀, σ₁, σ₂, . . .}, the coding σ₀ ↔ .00 . . . 00, σ₁ ↔ .00 . . . 01, σ₂ ↔ .00 . . . 10, etc., can be used. Then with λ ∈ 𝒜₁ represented by the string σ₂σ₂σ₁, say, the argument of μ_E becomes (.00 . . . 010, .00 . . . 010, .00 . . . 001).)
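The recoding can be sketched in a few lines; the 5-symbol alphabet is an illustrative choice, and for it the "first integer greater than log₂(card Σ)" is m = 3.

# A sketch of the recoding just described: each symbol of the alphabet is coded
# as an m-place binary fraction, and a string of length n over Sigma becomes an
# n-dimensional argument for the function to be optimized.

import math

sigma = ["s0", "s1", "s2", "s3", "s4"]                  # card Sigma = 5
m = math.floor(math.log2(len(sigma))) + 1               # first integer > log2(5), i.e. 3

def code(symbol):
    """m-place binary fraction assigned to a symbol: s0 -> .000, s1 -> .001, ..."""
    index = sigma.index(symbol)
    bits = format(index, f"0{m}b")
    return sum(int(b) * 2 ** -(i + 1) for i, b in enumerate(bits))

def as_argument(string):
    """Map a length-n string over Sigma to the n-dimensional argument of mu_E."""
    return tuple(code(sym) for sym in string)

print(as_argument(["s2", "s2", "s1"]))   # (0.25, 0.25, 0.125): .010, .010, .001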
With this arrangement an adaptive plan τ uses its operators to generate a sequence of points 𝒜₁(1), 𝒜₁(2), 𝒜₁(3), . . . converging to an optimum, much in the manner of standard iterative procedures. The adaptive approach, however, suggests important differences in what information from prior calculations should be retained (in 𝔐(t)) in preparation for generation of the next point 𝒜₁(t + 1). In particular certain adaptive plans proceed simultaneously and efficiently with global and local optimization of μ_E. (See chapters 4 and 5 for basic techniques.)
In the case of function optimization, high dimensionality and nonlinearity of the function to be optimized (μ_E), in all but a few special cases, constitute insurmountable barriers to standard optimization algorithms. In the general control problem there is the added difficulty of nonstationarity. The schemata concept (first interpreted in function optimization terms in chapter 4, pp. 70-71) and the algorithms based upon it (chapter 6) provide specific remedies for the first two problems. The latter problem is substantial and difficult; it is discussed in section 9.3.
Summarizing:
𝒜, [control] a set having as its basic component the set of admissible control policies 𝒜₁, augmented by a memory component 𝔐 and a set of time subscripts N so that 𝒜(t) = (𝒜₁(t), t, 𝔐(t)); [function optimization] the domain of the function Q to be optimized.


6. CENTRAL NERVOUS SYSTEMS
Behavior is primarily adaptation to the environment under sensory guidance. It takes the organism away from harmful events and toward favorable ones, or introduces changes in the immediate environment that make survival more likely.
Hebb in A Textbook of Psychology (pp. 44-45)

I introduce this last example of an adaptive system with some hesitation. Not because the central nervous system (CNS hereafter) lacks qualifications as an adaptive system (on the contrary, this complex system exhibits a combination of breadth, flexibility, and rapidity of response unmatched by any other system known to man) but because there is so little prior mathematical theory aimed at explaining adaptive aspects of the CNS. Even an intuitive understanding of the relation between physiological micro-data and behavioral macro-data is only sporadically available. Perforce, mathematical theories enabling us to see some overall action of the CNS as a consequence of the actions and interactions of its parts are, when available at all, in their earliest formative stages.
Here, more than with the other examples, the initial advantage of the formal framework will be restatement of the familiar in a broader context. The best that can be hoped for at this stage is an occasional suggestion of new consequences of familiar facts: Without the advantages of a deductive theory, statements made within the framework can do little more than provide an experimenter with guideposts and cautions, suggesting possibilities and impossibilities, phenomena to anticipate, and conclusions to be accepted warily. This is a preliminary, heuristic stage marking the transition from unmathematical plausibility to the formal deductions of mathematical theory. In common with most heuristic and loose-textured
arguments, it is difficult to eliminate ambiguities and contradictions; as with proverbs, proper application depends upon the intuition of the user. Specific applications of the formalism may arise, but these will probably be in areas of little uncertainty, where theory was not actually required; the formal procedures will be primarily corroborative. Only after considerable effort at this level can we hope for a theory mathematically rigorous and conceptually general, a genuinely predictive theory organizing large masses of data at many levels.
One of the earliest suggestions (or corroborations) from the formal underpinnings of the (𝒯, ℰ, 𝒳) framework is quite fundamental. It can be established that major aspects of the behavior of any very complex system fall outside the explanatory power of simple input-output (S-R or switching) theory. This result is a rigorous version of the observation that ongoing activity in a complex system usually depends upon the past history of that system. This dependence, which both psychologists and computer theorists call "memory," finds its formal counterpart in the notion of state: distinct stimulus-state pairs generally giving rise to different responses. If there are many states (and, by any reasonable definition of state, the CNS has an astronomical number) the same stimulus may give rise to a great many different responses. Thus, observation of stimulus-response pairs will not enable us to discover the mode of operation of any system with a substantial number of states. For a system as complex as the CNS, such a result can be ignored only to the great detriment of the ensuing theory. It is a corollary of this result that complex systems can act in autonomous fashion, producing continuing response sequences in the absence of new stimulus. Thus, a stimulus may serve only to modify ongoing activity rather than to initiate it. In short, the responses of the CNS cannot be explained wholly in terms of concurrent stimuli.
The (𝒯, ℰ, 𝒳) framework also emphasizes a second important point. An adequate theory must include more than a formal counterpart of the internal processes of the system being studied. The environment (or range of possible environments), the information received therefrom, and the ways the system can affect the environment must also be represented. Moreover, the criterion 𝒳 emphasizes the importance of performance "along the way." The CNS cannot wait indefinitely for "useful" outcomes; some minimal level of ongoing performance is required. (E.g., if food is not obtained with sufficient frequency, death ensues, totally removing the possibility of further goal-oriented behavior.) Such observations are not new, but the (𝒯, ℰ, 𝒳) framework does provide a form for fitting and arranging them, and it lends them emphasis. This at least gives us a fresh look at familiar facts, occasionally suggesting new consequences which might otherwise be overwhelmed in the plethora of macro- and micro-data (behavioral and physiological).


The discussion which follows will be based upon the informal theory of CNS action introduced by Hebb in his signal 1949 work and subsequently importantly enlarged by P. M. Milner (1957) and I. J. Good (1965). I will attempt a brief recapitulation of some of the main assumptions here, with the intention of orienting the reader having some knowledge of the area. This is only one view of CNS processes and the presentation has been kept deliberately simplistic. (E.g., a more sophisticated theory would take account of substantial evidence for distinct physiological mechanisms underlying short-term, medium-term, and long-term memory.) The object is to indicate, in as simple a context as possible, the relevance of the (𝒯, ℰ, 𝒳) framework to understanding the CNS as a means of "adaptation to the environment under sensory guidance." The reader without a relevant background can gain a significant understanding by reading the papers of Milner and Good; a reading of Hebb's excellent textbook (1958) will give a much more comprehensive view.
The basic element of Hebb's theory is the cell assembly. It is assumed to exhibit the following essential characteristics. (Comments in parentheses in the presentation of characteristics refer to possible neurophysiological mechanisms):


creases the likelihood of activity in all assemblies with which it is negatively associated. Positive association between a pair of cell assemblies increases whenever they are active at the same time. Negative association is asymmetrical in that one cell assembly may be negatively associated with a second, while the second is not necessarily negatively associated with the first; this negative association increases each time the first assembly is active and the second is inactive. (The underlying neural assumption here is that, if neuron n₂ produces a pulse immediately after it receives a pulse from neuron n₁, then n₁ is better able to elicit a pulse from n₂ in the future; contrariwise, if n₂ produces no pulse upon receiving a pulse from n₁, then n₁ is more likely to inhibit n₂ in the future. It is usually assumed that this process is the result of changing synapse levels. The same process can be invoked in explaining the origin of cell assemblies.) It should be noted that, under this assumption, there is a tendency for cell assemblies to become active in fixed combinations, at the same time actively suppressing alternative combinations. (Because a cell assembly involves only a minute fraction of the neurons in a CNS, a great many can be excited at any instant, different configurations corresponding to different perceived objects, etc.) Temporal association (i.e., probable action sequences) can occur via appropriate asymmetries; e.g., assembly α can arouse β via positive association while β inhibits α through negative association. Thus the action sequence is always αβ, never the reverse.
4. At any instant the response of the CNS to sensory input is determined by the configuration of active cell assemblies. (Overt behavior such as release of voluntary muscle sequences, activation of reflexes, eye movement, and so on, will accompany most sensory events. Via the mechanisms of (3), neurons involved in this behavior will tend to become components of cell assemblies active at the same time. Since pulse trains from the active cell assemblies dominate overall CNS activity, overt behavior will thus be determined by the active configurations. In effect, the sensory input modulates the ongoing activity in the CNS to produce overt behavior.)
5. Cell assemblies involved in temporal sequences yielding "need satisfaction" (satisfaction of hunger, thirst, etc.) have their associations enhanced; the greater the "need," the greater the enhancement. ("Needs" are internal conditions in the CNS-controlled organism, conditions primarily concerned with survival, which set basic restrictions on CNS
action. In typical environmental situations, overt behavior is required for "need satisfaction," and then the satisfaction is only temporary; the organism consumes the resources involved in order to maintain itself. Innate internal mechanisms in the organism automatically "reward" satisfaction of hunger, thirst, etc., and perhaps some more generalized needs such as curiosity. These "rewards" may be mediated by innately organized neural networks which exhibit increasing activity as a corresponding need increases. Such internally generated stimuli would progressively disturb established configurations and sequences, unless they resulted in reduction of the corresponding need. Ultimately, in the absence of satisfaction, this disruption would cause an increasingly broad search through the organism's behavioral repertory, a kind of hunt through increasingly unusual cell assembly configurations in an attempt to produce an appropriate overt response. Temporal associations of cell assemblies, active when such a disturbance is reduced, would retain their incremented synapse levels. Those active during a period of increasing disturbance would encounter subsequent interference, causing synapse level increments to be transitory. Assemblies having precursors occurring early in "need satisfaction" sequences acquire a particular role. They serve as "leading indicators," becoming active in advance of actual primitive needs; they may serve as "learned needs" [goals]. A hierarchy of precursors of precursors, etc., can provide the system with a hierarchy of "learned needs," some of them quite remote from the primitive needs. That is, assemblies containing substantial segments of the innately organized networks as components, or assemblies closely associated therewith, could give rise to secondary and higher-order "learned needs." The effects of these new assemblies will be much like those generated by the innately organized networks.)
6. An active cell assembly primes cell assemblies associated with it as successors in temporal sequences, making them more likely to be active subsequently. (A neuron producing pulses at a high rate tends to become fatigued, with a consequent drop in pulse rate. A neuron that is being inhibited tends to exhibit less fatigue than normal because of its very low pulse rate. A kind of inhibitory priming results, because the neuron is hyperresponsive once the inhibition ceases. It is also likely that priming occurs by transmission of "priming molecules" through the synapses of active neurons. Priming provides the CNS with expectations and predictions. In effect the system expects and is ready to respond to selected sets
of features from the myriads of possibilities. When primed cell assemblies subsequently become active, i.e., when the corresponding predictions are verified, the associations involved are strengthened by the mechanisms of (3). The resulting network of associations constitutes a model of the environment within the CNS. The model is dynamic in the sense that it takes sensory data as input and primes different temporal sequences on the basis of the model's predictions. Introspection confirms that, for the human CNS, models of the environment are indeed used to compare alternative courses of action. This model is ultimately dedicated to keeping "primitive needs" fulfilled, but it incorporates "leading indicators," etc., so that needs rarely become acute enough to determine action directly.)
How can the (𝒯, ℰ, 𝒳) framework help in analyzing this model? Certain analogies with other processes are suggestive. Individual cell assemblies act, in part, like the detectors in pattern recognizers: they are activated by particular features of the environment, features presumably relevant to the organism's needs. At the same time, the configuration of cell assemblies active at any given time defines the organism's response to the environment. In the terms used earlier, such a configuration is an element of the system's repertory. Assuming that the set of all cell assemblies is fixed (as it might be, to a first approximation, in a mature organism) or, at least, that the potentially available cell assemblies can be enumerated, the set of all possible assembly configurations constitutes the system's repertory of techniques for confronting the environment.
When cell assemblies are in mutual negative association (cross-inhibition), they act much as the alleles of a chromosomal locus; any active configuration can contain at most one of the assemblies, because it will actively suppress the others in the set. Positive associations between cell assemblies which favor particular configurations are analogous to the linkage of coadapted alleles in a chromosome. Indeed there are many potentially fruitful "genetic" analogies. As the CNS gains experience, some assemblies in a cross-inhibited set are likely to be expressed in a broadened range of environmental conditions, at the expense of others in the set, a process suggestive of the evolution of (partial) dominance. Various genetic operators such as crossover and inversion find their counterparts in the ways in which cell assembly associations are modified. Temporal associations correspond to feedback among gene-products and sequential expression of genes. The list can be extended easily.
The needs of the organism define its goals, and ultimately set a criterion on performance. Just as in games and searches, there is the problem that individual responses do not often directly yield need satisfaction. However, the internal model discussed in connection with property (6) enables the CNS to constantly improve performance in the absence of current need satisfaction. Two kinds of improvement are possible. First of all, cell assemblies typically respond to too broad a range of situations when first formed, yielding inconsistencies in the model. That is, situations activating the same combinations of cell assemblies, and hence the same responses, are followed by radically different outcomes. The remedy here is much like that for inadequacy of detectors discussed in the illustration on searches. Because of the inconsistencies new associations are formed between the cell assemblies involved, causing them to split and recombine so that their responses are more discriminative. (Hebb calls the related procedures fractionation and recruitment.) The second kind of improvement consists in "filling in" the model; generally there will be many situations where no expectations or predictions have been developed. This clearly provides an important role for curiosity. The CNS must experience a wide enough range of situations to provide an adequate repertory of relevant temporal sequences. Just as with the coadapted sets of genetics, the basic laws of cell assemblies permit flexible recombination (association) under environmental (sensory) guidance, the actual combinations being influenced by the parts of the model (associations) already extant. In this way a tremendous range of useful procedures can be formed from relatively few elements. More importantly, a single experience then constitutes a trial of a great many relevant associations, just as in genetics a single organism tests a great many coadapted sets. Property (3) assures that many associations will be "tested" and modified. The ultimate survival of various combinations of assemblies is determined by their consistency within the model and their success in contributing to learned or unlearned need satisfaction.
While the foregoing analogies are ready offshoots of the formal framework, the basic task of theory in this area is quite difficult. It must enable one to judge whether proposed mechanisms for CNS operation permit the learning rates, utilization of cues, transfer of learning, etc., that one actually observes. How does the CNS maintain its rapidity and appropriateness of response, while extending its breadth and filling in its model of the environment? Section 8.4 indicates one way in which concepts from the (𝒯, ℰ, 𝒳) framework can be brought to bear. In particular, the robustness of reproductive plans, when interpreted in this area, indicates some promising directions, but we even lack good general measures of performance here. A kind of error function based upon average need levels might be interesting for organisms not quite so efficient as man at keeping their primitive needs satisfied. A criterion 𝒳 could then be formulated, much as it was in the optimal control
illustration, and it might be possible to use the framework more precisely in this context (especially for animals in the wild state). Some suggestions for bringing cell assembly theory within the range of the (𝒯, ℰ, 𝒳) framework are made in section 8.4. There is much to be done before we can hope for definite, general results from theory.
Summarizing:
𝒜, repertory of possible cell assemblies.
Ω, possible association rules (Hebb's rule for synapse change, short-term memory rules, etc.).
𝒯, possible (or hypothetical) organizations of the CNS in terms of conditions under which the rules of Ω are to operate.
ℰ, the range of environments in which the CNS being studied is expected to operate (relevant features, cues, etc.).
𝒳, the ranking of organizations in 𝒯 according to performance over ℰ, for example, according to ability to keep average needs low under any situation in ℰ (cf. the optimal control illustration).
These illustrations are intended to demonstrate the broad applicability of the (𝒯, ℰ, X) framework. They can also serve to demonstrate something else. The obstacles described informally at the end of section 1.2 do indeed appear as central problems in each of the fields examined. This is an additional augury for making a unified approach to adaptation: common problems should have common solutions. Much of the work that follows is directed to the resolution of these general problems. In section 9.1 the problems are listed again, more formally, and the relevance of this work to their resolution is recapitulated.


4. Schemata

An adaptive system faces its principal challenge when the set of possible structures 𝒜 is very large and the performance functions μ_E involve many local maxima. It is important then for the adaptive system to provide itself with whatever insurance it can against a prolonged search. It is clear that the search of 𝒜 must go on so long as significant improvements are possible (unless the system is to settle for inferior performance throughout the remainder of its history). At the same time, unless it exploits possibilities for improved performance while the search goes on, the system pays the implicit cost of a performance less even than the best among known alternatives. Moreover, unexploited possibilities may contain the key to optimal performance, dooming the system to fruitless search until they are implemented. There is only one insurance against these contingencies. The adaptive system must, as an integral part of its search of 𝒜, persistently test and incorporate structural properties associated with better performance. As with most insurance, this particular policy contains a limiting clause: useful properties must be identified to be exploited. The present chapter is concerned with this limitation.
Almost by definition useful properties are points of comparison between structures yielding better-than-average performance. The question then is: How are the structures in 𝒜 to be compared? If the structures are built up from components, comparison in terms of common components is natural and the question becomes: How is credit for the above-average performance of a structure to be apportioned to its components? A more general approach uses feature detectors (see section 3.4) to make comparisons. Since one can find an appropriate detector for any effectively describable feature of structures in 𝒜 (including the presence or absence of given components) this approach is well suited to present purposes.

To begin with let us see how comparisons can be developed when a finite set of detectors {δ_i: 𝒜 → V_i, i = 1, …, ℓ} is given. In terms of the given detectors each structure A ∈ 𝒜 will have a representation (δ_1(A), δ_2(A), …, δ_ℓ(A)); that is, each structure A will be described by its particular ordered set of ℓ attributes or detector values δ_i(A) ∈ V_i, i = 1, …, ℓ. Thus, for a chromosome A, V_i can designate the set of alleles of locus i (see section 3.1) and the corresponding representation of A is the specification of the ordered set of alleles which make up the chromosome. For a von Neumann economy (section 3.2) the V_i can designate the possible levels of the ith activity so that the representation of a mixture of activities is simply the corresponding activity vector. Similar considerations apply to each of the remaining illustrations of chapter 3. Clearly, with a given set of ℓ detectors, two structures will be distinguishable only insofar as they have distinct representations. Since, in the present chapter, we are only interested in comparisons let us assume that all structures in 𝒜 are distinguishable (have distinct representations) or, equivalently, that 𝒜 is used to designate distinguishable subsets of the original set of structures. For simplicity in what follows 𝒜 will simply be taken to be the set of representations provided by the detectors (rather than the abstract elements so represented).
[Figure 9 diagrams how sets of the form {A such that δ_i(A) = v_ij} are combined to form the subsets of 𝒜 designated by schemata such as v_1j □ □ … □, □ v_2j □ … □, and v_1j v_2j □ … □.]

Fig. 9. Diagrammatic presentation of some schemata


Now our objective is to designate subsets of 𝒜 which have attributes in common. To do this let the symbol □ indicate that we "don't care" what attribute occurs at a given position (i.e., for a given detector). Thus (v_11, □, □, …, □) designates the subset of all elements in 𝒜 having the attribute v_11 ∈ V_1. (Equivalently, (v_11, □, …, □) designates the set of all ℓ-tuples in 𝒜 beginning with the symbol v_11; hence, for ℓ = 3, (v_11, v_22, v_32) and (v_11, v_21, v_31) belong to (v_11, □, □), but (v_12, v_22, v_32) does not.) The set of all ℓ-tuples involving combinations of "don't cares" and attributes is given by the augmented product set Ξ = ∏_{i=1}^{ℓ} {V_i ∪ {□}}. Then any ℓ-tuple ξ = (ξ_1, ξ_2, …, ξ_ℓ) ∈ Ξ designates a subset of 𝒜 as follows: A ∈ 𝒜 belongs to the subset if and only if (i) whenever ξ_j = □, any attribute from V_j may occur at the jth position of A, and (ii) whenever ξ_j ∈ V_j, the attribute ξ_j must occur at the jth position of A. (For example, (v_11, v_21, v_31, v_41) and (v_12, v_21, v_32, v_41) belong to (□, v_21, □, v_41), but (v_11, v_21, v_31, v_42) does not.) The set of ℓ-tuples belonging to Ξ will be called the set of schemata; Ξ amounts to a decomposition of 𝒜 into a large number of subsets based on the representation in terms of the ℓ detectors {δ_i: 𝒜 → V_i, i = 1, …, ℓ}.
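A small sketch may make the membership rule concrete. The Python below is an illustration only (the tuple encoding and the use of None for □ are assumptions of this sketch, not the book's notation); it tests whether a structure belongs to the subset designated by a schema.

```python
# Illustrative sketch: a structure is a tuple of attribute values; a schema is a
# tuple whose positions hold either a required value or None for "don't care" (□).
def is_instance(structure, schema):
    """True iff the structure lies in the subset designated by the schema."""
    return all(s is None or s == a for a, s in zip(structure, schema))

A1 = ("v11", "v21", "v31", "v41")
A2 = ("v12", "v21", "v32", "v41")
schema = (None, "v21", None, "v41")                        # (□, v21, □, v41)
print(is_instance(A1, schema))                             # True
print(is_instance(A2, schema))                             # True
print(is_instance(("v11", "v21", "v31", "v42"), schema))   # False
```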
Schemata provide a basis for associating combinations of attributes with potential for "improving" current performance. To see this, let improvement be defined as any increment in the average performance over past history. That is, if μ(𝒜(t)) is the performance of the structure 𝒜(t) tried at time t, the object is to discover ways of incrementing

$$\hat\mu(T) = \frac{1}{T}\sum_{t=1}^{T} \mu(\mathcal{A}(t)).$$

(A more sophisticated measure would give more weight to recent history, using

$$\hat\mu'(T) = \Big(\sum_{t=1}^{T} c_t\,\mu(\mathcal{A}(t))\Big)\Big/\Big(\sum_{t=1}^{T} c_t\Big), \qquad c_t > c_{t'}\ \text{for}\ t > t',$$

but the simple average suffices for the present discussion.) Though μ̂(T) can be incremented by simply repeating the structure yielding the best performance up to time T this does not yield new information. Hence the object is to find new structures which have a high probability of incrementing μ̂(T) significantly. An adaptive plan can use schemata to this end as follows: Let A ∈ 𝒜 have a probability P(A) of being tried by the plan τ at time T + 1. That is, τ induces a probability distribution P over 𝒜 and, under this distribution, 𝒜 becomes a sample space. The performance measure μ then becomes a random variable over 𝒜, A ∈ 𝒜 being tried with probability P(A) and yielding payoff μ(A). More importantly, any schema ξ ∈ Ξ designates an event on the sample space 𝒜. Thus, the restriction μ|ξ of μ to


the subset designated by ξ, is also a random variable, A ∈ ξ being tried with probability P(A)/(∑_{A′∈ξ} P(A′)) and yielding payoff μ(A). In what follows ξ will be used to designate both an element of Ξ and the corresponding random variable with sample space ξ, the particular usage being specified when it is not clear from context. As a random variable, ξ has a well-defined average μ_ξ (and variance σ_ξ^2) where, intuitively, μ_ξ is the payoff expected when an element of ξ is randomly selected under the marginal distribution P(A)/(∑_{A′∈ξ} P(A′)).
Clearly, when μ_ξ > μ̂(T), instances of ξ (i.e., A ∈ ξ) are likely to exhibit performance better than the current average μ̂(T). This suggests a simple procedure (a bit too simple as it turns out) for exploiting combinations of attributes associated with better-than-average performance while further searching 𝒜: (i) try instances of various schemata until at least one schema ξ is located which exhibits a sample average μ̂_ξ > μ̂(T); (ii) generate new instances of the (observed) above-average schema ξ, returning to step (i) from time to time (particularly when the increasing overall average μ̂(T) comes close to μ̂_ξ) to locate new schemata {ξ′} for which μ̂_{ξ′} > μ̂(T). In effect, then, credit is apportioned to a combination of attributes in accord with the observed average performance of its instances. This procedure has some immediate advantages over a fixed random (or enumerative) search of 𝒜: it generates improvements with high probability while gathering new information by trying new A ∈ 𝒜; furthermore, the new trials of the above-average schema ξ increase confidence that the observed average μ̂_ξ closely approximates μ_ξ. It is oversimple because each instance A ∈ 𝒜 tried yields information about a great many schemata other than ξ, information which is not used.
Given ℓ detectors, a single structure A ∈ 𝒜 is an instance of 2^ℓ distinct schemata, as can easily be affirmed by noting that A is an instance of any schema ξ defined by substituting □'s for one or more of the ℓ attribute values in A's representation. Thus a single trial of A constitutes a trial of 2^ℓ distinct random variables, yielding information about the expected payoff μ_ξ of each. (If ℓ is only 20 this is still information about a million schemata!) Any procedure which uses even a fraction of this information to locate ξ for which μ_ξ > μ̂(T) has a substantial advantage over the one-at-a-time procedure just proposed.
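The count of 2^ℓ schemata per structure can be checked directly. The short Python sketch below (hypothetical helper names; None again stands in for □) enumerates every schema that a single structure instantiates.

```python
from itertools import combinations

def schemata_of(structure):
    """Yield every schema (None marks a don't-care) instantiated by the structure."""
    l = len(structure)
    for r in range(l + 1):
        for positions in combinations(range(l), r):
            yield tuple(None if i in positions else structure[i] for i in range(l))

A = (1, 0, 1)
all_schemata = list(schemata_of(A))
print(len(all_schemata))    # 8 == 2**3; for l = 20 this would be 1,048,576
print(all_schemata[:3])     # (1, 0, 1), (None, 0, 1), (1, None, 1)
```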
Exploiting this tremendous flow of information poses a much more clearly defined challenge than the one which started the chapter. Schemata have advanced our understanding, in this sense, but the new problem is difficult. The amount of storage required quickly exceeds all feasible bounds if one attempts to record the average payoff of the observed instances of each schema sampled. Moreover, the information will be employed effectively only if it is used to generate new A ∈ 𝒜 which, individually, test as many above-average schemata as possible. The adaptive system is thus faced with a specific problem of compact storage, access, and effective use of information about extremely large numbers of schemata. Chapter 6 ("Reproductive Plans and Genetic Operators") sets forth a resolution of these difficulties, but a closer look at schemata (the remainder of this chapter) and the optimal allocation of trials to sets of schemata (the next chapter) provides the proper setting.
Let us begin with a concrete, but fairly general, interpretation of schemata stemming from the earlier discussion of control and function optimization (section 3.5, p. 57). Consider an arbitrary bounded function f(x), 0 ≤ x < 1, and assume that x is specified to an accuracy of one part in a million or, equivalently, that values of x are discretely represented by 20 bits. Define 𝒜 to be the set of 2^20 discrete values of x represented with 20 detectors {δ_j: 𝒜 → {0, 1}, j = 1, …, 20} where δ_j(x), x ∈ 𝒜, assigns to x the value of the jth bit in the binary expansion of x. The schema 1□□…□ then is just the right half-interval ½ ≤ x < 1, while the schema □□0□…□ is a set of four strips {0 ≤ x < ⅛, ¼ ≤ x < ⅜, ½ ≤ x < ⅝, ¾ ≤ x < ⅞} and the schema 1□0□…□ is the intersection of the two previous schemata, {½ ≤ x < ⅝, ¾ ≤ x < ⅞} (see Figure 10).

[Figure 10 plots a one-dimensional function f(x) on 0 ≤ x < 1 and marks the intervals belonging to the schemata 1□□…□, □□0□…□, and 1□0□…□.]

Fig. 10. Some schemata for a one-dimensional function

With this representation there are 3^20 distinct schemata since any 20-tuple over the set {0, 1, □} defines a schema. (More technically, the schemata are simply hyperplanes, of dimension 20 or less, in the 20-dimensional space of detector-value combinations.) Note that there are many points, such as x = 13/16 = .11010…0, which are instances of all three of the schemata just singled out. Note also that f has a well-defined average value f_ξ on each schema ξ (for any weighting of the values f(x), as by a probability distribution). Clearly, for any x, knowledge of f(x) is relevant to estimating f_ξ for any schema for which x ∈ ξ. Moreover, observations of f for relatively few x will enable f_ξ to be estimated for a great many ξ ∈ Ξ. Even a sequence of four observations, say x(1) = .0100010…0, x(2) = .110100…0, x(3) = .100010…0, x(4) = .1111010…0, enables one to calculate three-point estimates for many schemata, e.g. (assuming all points are equally likely or equally weighted), f̂_{1□□…□} = (f(x(2)) + f(x(3)) + f(x(4)))/3, and two-point estimates for even more schemata, e.g., f̂_{11□…□} = (f(x(2)) + f(x(4)))/2.
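The way a handful of observations of f feeds estimates of f_ξ for many schemata at once can be sketched as follows (Python; the 6-bit accuracy, the sample points, and the test function are arbitrary assumptions made for illustration, not values from the text).

```python
from collections import defaultdict
from itertools import combinations
import math

def schemata_of(bits):
    for r in range(len(bits) + 1):
        for pos in combinations(range(len(bits)), r):
            yield tuple(None if i in pos else bits[i] for i in range(len(bits)))

def schema_estimates(samples, f, n_bits=6):
    """Credit each observation f(x) to every schema that x instantiates."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x in samples:
        bits = tuple((int(x * 2**n_bits) >> (n_bits - 1 - j)) & 1 for j in range(n_bits))
        for s in schemata_of(bits):
            sums[s] += f(x)
            counts[s] += 1
    return {s: sums[s] / counts[s] for s in sums}

est = schema_estimates([0.27, 0.52, 0.81, 0.94], lambda x: math.sin(math.pi * x) + x)
print(est[(1, None, None, None, None, None)])   # estimate of f over 1/2 <= x < 1
```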
The picture is not much changed if f is a function of many variables x_1, …, x_d. Using binary representations again, we now have 20d detectors (assuming the same accuracy as before), 3^{20d} schemata, and each point is an instance of 2^{20d} schemata. In the one-dimensional case the representation transformed the problem to one of sampling in a 20-dimensional space (already a space of high dimensionality) so the increase to a 20d-dimensional space really involves no significant conceptual changes. Interestingly, each point (x_1, …, x_d) is now an instance of 2^{20d} schemata rather than 2^20 schemata, an exponential (dth power) increase. Thus, for a given number of points tried, we can expect an exponential (dth power) increase in the number of schemata for which f_ξ can be estimated with a given confidence. As a consequence, if the information about the schemata can be stored and used to generate relevant new trials, high dimensionality of the argument space {0 ≤ x_j < 1, j = 1, …, d} imposes no particular barrier.
It is also interesting in this context to compare two different representations for the same underlying space. Six detectors with a range of 10 values can yield approximately the same number of distinct representations as 20 detectors with a range of 2 values, since 10^6 ≈ 2^20 = 1.05 × 10^6 (cf. decimal encoding vs. binary encoding). However the numbers of schemata in the two cases are vastly different: 11^6 = 1.77 × 10^6 vs. 3^20 ≈ 3.49 × 10^9. Moreover in the first case each A ∈ 𝒜 is an instance of only 2^6 = 64 schemata, whereas in the second case each A ∈ 𝒜 is an instance of 2^20 = 1.05 × 10^6 schemata. This suggests that, for adaptive plans which can use the increased information flow (such as the reproductive plans), many detectors deciding among few attributes are preferable to few detectors with a range of many attributes. In genetics this would correspond to chromosomes with many loci and few alleles per locus (the usual case) rather than few loci and many alleles per locus.
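The arithmetic behind this comparison is easy to verify; the Python lines below (purely a check, not part of the original text) reproduce the counts for the two encodings.

```python
# 6 detectors with 10 values each versus 20 detectors with 2 values each.
print(10**6, 2**20)    # distinct representations: 1000000 vs 1048576
print(11**6, 3**20)    # schemata: 1771561 vs 3486784401
print(2**6, 2**20)     # schemata instantiated by a single structure: 64 vs 1048576
```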
Returning to the view of schemata as random variables, it is instructive to determine how many schemata receive at least some given number n < N of trials when N elements of 𝒜 are selected at random. This will give us a better idea of the intrinsic parallelism wherein a sequence of trials drawn from 𝒜 is at the same time a (usually shorter) sequence of trials for each of a large number of schemata ξ ∈ Ξ. It will be helpful in approaching this calculation (and in later discussions) to more carefully classify the schemata. A schema will be said to be defined on the set of positions {i_1, …, i_h} at which ξ_{i_j} ≠ □. If V = ∪_i V_i has k elements and we consider all schemata over V, i.e., Ξ = {V ∪ {□}}^ℓ, then there are k^h distinct schemata defined on any given set of h ≤ ℓ positions. Moreover, for any given set of h positions, every A ∈ V^ℓ is an instance of one of these k^h schemata. That is, the set of schemata defined on a given set of positions partitions 𝒜, and each distinct set of positions gives rise to a different partition of 𝒜. (For example, if V = {0, 1} and ℓ = 4, the set of schemata defined on position 1 is {0□□□, 1□□□}, where 0□□□ abbreviates (0, □, □, □), etc. It is clear that every element in 𝒜 = V^ℓ begins either with the symbol 0 or else the symbol 1, hence the given set partitions 𝒜. Similarly the set defined on position 2, {□0□□, □1□□}, partitions 𝒜, and the set defined on positions 2 and 4, {□0□0, □0□1, □1□0, □1□1}, is still a different partition of 𝒜, a refinement of the one just previous.) There are (ℓ choose h) distinct ways of choosing h positions {1 ≤ i_1 < i_2 < … < i_h ≤ ℓ} along the ℓ-tuple, and h can be any number between 1 and ℓ. Thus there are ∑_{h=1}^{ℓ} (ℓ choose h) = 2^ℓ − 1 distinct partitions induced on 𝒜 by these sets of schemata. It follows that any sequence of N trials of 𝒜 will be simultaneously distributed over each of these partitions. That is, each of the 2^ℓ − 1 sets of schemata defined on the distinct choices of positions receives N trials.
On the assumption that elements of 𝒜 are tried at random, uniformly (elements equally likely) and independently, we can use the Poisson distribution to determine the number of schemata receiving at least n < N trials. The basic parameter required is the average number of trials per schema for any set of schemata defined on h positions. The value of this parameter is just N/k^h since there are k^h schemata defined on a fixed set of h positions. The Poisson distribution then gives

$$P(n, N) = \sum_{y=n}^{\infty} \frac{(N/k^h)^y}{y!}\, e^{-N/k^h}$$

as the proportion of schemata defined on the positions i_1, …, i_h and receiving at least n out of the N trials.

This can be directly generalized to give a lower bound in the case where the distribution over 𝒜 is no longer uniform. Let λ_ε(i_1, …, i_h) designate the fraction of schemata defined on i_1, …, i_h for which the probability of a trial is at least ε/k^h, let γ_h be the proportion of the (ℓ choose h) sets of schemata defined on h positions for which λ_ε(i_1, …, i_h) ≥ β_0 and, finally, let γ_0 = min_h γ_h. Then the expression above, by a simple manipulation, yields

$$\bar P(n, \varepsilon, N) = \sum_{h=1}^{\ell} \gamma_h \binom{\ell}{h}\beta_0\, k^h \sum_{y=n}^{\infty} \frac{(\varepsilon N/k^h)^y}{y!}\, e^{-\varepsilon N/k^h} \;\ge\; \gamma_0\,\beta_0 \sum_{h=1}^{\ell} \binom{\ell}{h} k^h \sum_{y=n}^{\infty} \frac{(\varepsilon N/k^h)^y}{y!}\, e^{-\varepsilon N/k^h}.$$


Values for this bound can be obtained from standard tables for the Poisson distribution, but the following representative cases will give some feeling for the numbers involved. Setting k = 2 and ℓ = 32 (so that 𝒜 contains 2^32 ≈ 4.3 × 10^9 distinct elements) we get:
(n, ε, N)        P̄(n, ε, N)

(8, 1, 16)       > 7000 γ₀β₀
(8, ½, 16)       > 50 γ₀β₀
(4, ¼, 32)       > 20 γ₀β₀
(8, 1, 32)       > 9000 γ₀β₀
(8, ½, 32)       > 50 γ₀β₀

(If the distribution of trials is uniform, γ₀ = β₀ = 1.)

Even for the small values of N considered here it is clear that a great many schemata will receive a significant number of trials. Moreover the figures given are quite conservative since at least one schema defined on each distinct set of positions must receive at least 1 trial in k^h, whereas the bound assumes none receive more than 1 in εk^h.
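For readers who prefer computing the Poisson tail directly rather than consulting tables, the sketch below (Python; the particular values of n, N, k, and h are illustrative assumptions) evaluates the proportion of schemata, defined on a fixed set of h positions, expected to receive at least n of N uniformly distributed trials.

```python
from math import exp

def poisson_tail(n, lam):
    """P{Y >= n} for Y ~ Poisson(lam), computed from the series for the lower tail."""
    term, lower = exp(-lam), exp(-lam)
    for y in range(1, n):
        term *= lam / y
        lower += term
    return max(0.0, 1.0 - lower)

k, h, N, n = 2, 2, 32, 8
print(poisson_tail(n, N / k**h))   # average of N / k**h = 8 trials per schema
```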
The figures just given also hint at a compact way for storing a great deal of information about schemata. Suppose that the object is to store the relative rank of a large number of schemata where, say, ξ ranks higher than ξ′ when μ̂_ξ > μ̂_{ξ′}. Let us limit the number of ranks to M (e.g., by dividing the range of μ into M intervals and assigning ξ the same rank as ξ′ if their average payoffs μ̂_ξ and μ̂_{ξ′} fall in the same interval). Now with a set of M elements, ℬ = {A_j ∈ 𝒜, j = 1, …, M}, it is possible to represent the rank of a given schema ξ by the number of instances of ξ, A_j ∈ ξ, in the set. That is, if ξ has rank m ≤ M, there will be m instances of ξ in ℬ, {A_{j_h} ∈ ξ, h = 1, …, m} ⊂ ℬ. Note that there is no requirement that the instances representing one schema be distinct from those representing another; given some other schema ξ′ we may have A_j ∈ ξ′ but A_{j′} ∉ ξ′. Thus the same instances used to represent the rank of ξ can be used in differing numbers to represent the ranks of other schemata.
For example, given the 8 individuals

A₁ = 000010
A₂ = 001111
A₃ = 011000
A₄ = 011011
A₅ = 101100
A₆ = 110011
A₇ = 111011
A₈ = 111101

we see that the schema ξ = □110□□ is assigned rank 3 by the three instances A₃, A₄, A₇; the schema ξ′ = □□□01□ is assigned rank 4 using A₄ ∈ ξ′ and A₇ ∈ ξ′ together with two other instances; and ξ″ = 11□□□□ is assigned rank 3 using A₇ ∈ ξ″ and two other instances.
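The bookkeeping in this example is simple enough to express directly; the Python below (with "*" standing in for □, an assumption of this sketch) counts instances in the stored set and so recovers the ranks just described.

```python
def matches(individual, schema):
    return all(s == "*" or s == c for c, s in zip(individual, schema))

def rank(schema, B):
    """Rank represented by the number of instances of the schema in B."""
    return sum(matches(a, schema) for a in B)

B = ["000010", "001111", "011000", "011011",
     "101100", "110011", "111011", "111101"]
print(rank("*110**", B))   # 3  (A3, A4, A7)
print(rank("***01*", B))   # 4  (A1, A4, A6, A7)
print(rank("11****", B))   # 3  (A6, A7, A8)
```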
If we set M = 32 then the above calculation for P̄(8, 1, 32) indicates that some sets of size 32 drawn from 𝒜 (randomly generated ones in this case) can assign a rank m ≥ 8 to 9000 distinct schemata (for k = 2, ℓ = 32). The problem then is one of using this potential to represent the relative ranking of the sample averages μ̂_ξ for a large set of observed schemata. Once again we must wait upon the discussion of reproductive plans in chapter 6 to see that this can be done.
Summarizing: Given a set of detectors {δ_i: 𝒜 → V_i, i = 1, …, ℓ} the elements A ∈ 𝒜 each have a representation (δ_1(A), …, δ_ℓ(A)) in terms of the ordered set of ℓ attributes δ_i(A) ∈ V_i, i = 1, …, ℓ. Each ξ ∈ Ξ = ∏_{i=1}^{ℓ} {V_i ∪ {□}} designates a particular subset of 𝒜, namely all elements of 𝒜 for which the corresponding representations match all positions in ξ which are not □'s. Given a set of observations 𝒜(1), 𝒜(2), …, 𝒜(t) from 𝒜, the average payoff μ̂_ξ of the observed instances 𝒜(t′) ∈ ξ is apportioned to ξ as its credit for the performances of the A ∈ 𝒜 possessing the corresponding set of attributes. Since each A ∈ 𝒜 is an instance of 2^ℓ schemata it constitutes a valid sample point of 2^ℓ distinct subsets of (or events on) 𝒜. This suggests the existence of algorithms which, by testing many possibilities with a single trial, are intrinsically parallel and which store the relative rankings of μ̂_ξ for a great many schemata by selecting a small set ℬ ⊂ 𝒜. The algorithms introduced in chapter 6 will realize these possibilities. Later (chapter 8) dependence on the detectors {δ_i} will be eliminated by subjecting the detectors themselves to adaptation.


5. The Optimal Allocation of Trials

In the last chapter a schema ξ was defined as potentially useful when μ̂_ξ, the observed average performance of instances of that schema, was significantly greater than the overall average performance. However, μ̂_ξ is basically a sample average for a random variable (or sequence of random variables) and, as such, is subject to sampling error. For any two schemata ξ and ξ′, there is always a non-zero probability that μ_{ξ′} > μ_ξ even though μ̂_ξ > μ̂_{ξ′}. This reintroduces in a sharp form the conflict of exploiting what is known vs. obtaining new information. Confidence that the ranking μ̂_ξ > μ̂_{ξ′} reflects a true ranking μ_ξ > μ_{ξ′} can be increased significantly only by allocating additional trials to both ξ and ξ′. Thus, we can allocate a trial to exploit the observed best or we can allocate a trial to reduce the probability of error as much as possible but we cannot generally do both at once. Given a string-represented domain 𝒜, it is important to have some idea of what proportion of trials should be allocated to each purpose as the number of trials increases.
Corresponding to each of these objectives, exploitation vs. new information, there is a source of loss. A trial allocated to the observed best may actually incur a loss because of sampling error, the observed best being in fact less than the best among the alternatives examined. On the other hand trials intended to maximally reduce the probability of error will generally be allocated to a schema other than the observed best. This means a performance less on the average than the best among known alternatives, when the observations reflect the true ranking. Stated succinctly, more information means a performance loss, while exploitation of the observed best runs the risk of error perpetuated.
Competing sources of loss suggest the possibility of optimization: minimizing expected losses by a proper allocation of trials. If we can determine the optimal allocation for arbitrary numbers of trials, then we can determine the minimum losses to be expected as a function of the number of trials. This in turn can be used as a criterion against which to measure the performance of suggested adaptive plans. Such a criterion will be particularly useful in determining the worth of plans which use schemata to compare structures in 𝒜. The objective of this chapter is to determine this criterion. In the process we will learn a good deal more about schemata and intrinsic parallelism.
1. THE 2-ARMED BANDIT

The simplest precise version of the optimal allocation problem arises when we restrict attention to two random variables, ξ and ξ′, with only two possible payoffs, 0 or 1. A trial of ξ produces the payoff 1 with probability p₁ and the payoff 0 with probability 1 − p₁; similarly ξ′ produces 1 with probability p₂ and 0 with probability 1 − p₂. (For example, such trials could be produced by flipping either of two unbalanced coins, one of which produces heads with probability p₁, the other with probability p₂.) One is allowed N trials on each of which either ξ or ξ′ can be selected. The object is to maximize the total payoff (the cumulative number of heads). Clearly if we know that ξ produces payoff 1 with a higher probability then all N trials should be allocated to ξ with a resulting expected accumulation p₁·N. On the other hand if we know nothing initially about ξ and ξ′ it would be unwise not to test both. How trials should be allocated to accomplish this is certainly not immediately obvious. (This is a version of the much-studied 2-armed bandit problem, a prototype of important decision problems. Bellman [1961] and Hellman and Cover [1970] give interesting discussions of the problem.)
If we allow the two random variables to be completely general (having probability distributions over an arbitrary number of outcomes), we get a slight generalization of the original problem which makes direct contact with our discussion of schemata. The outcome of a trial of either random variable is to be interpreted as a payoff (performance). The object once more is to discover a procedure for distributing an arbitrary number of trials, N, between ξ and ξ′ so as to maximize the expected payoff over the N trials. As before, if we know for each ξ_i the mean and variance (μ_i, σ_i²) of its distribution (actually the mean μ_i would suffice), the problem has a trivial solution (allocate all trials to the random variable with maximal mean). The conflict asserts itself, however, if we inject just a bit more uncertainty. Thus we can know the mean-variance pairs but not which variable is described by which pair; i.e., we know the pairs (μ₁, σ₁²) and (μ₂, σ₂²) but not which pair describes ξ.
If it could be determined through some small number of trials which of ξ and ξ′ has the higher mean, then from that point on all trials could be allocated to that random variable. Unfortunately, unless the distributions are non-overlapping, no finite number of observations will establish with certainty which random variable has the higher mean. (E.g., given μ_ξ > μ_{ξ′} along with a probability P > 0 that a trial of ξ′ will yield an outcome x > μ_ξ, there is still a probability P^N after N trials of ξ′ that all of the trials have had outcomes exceeding μ_ξ. A fortiori their average μ̂_{ξ′} will exceed μ_ξ with probability at least P^N, even though μ_{ξ′} < μ_ξ.) Here the tradeoff between gathering information and exploiting it appears in its simplest terms. To see it in exact form let ξ⁽¹⁾(N) name the random variable with the highest observed payoff rate (average per trial) after N trials and let ξ⁽²⁾(N) name the other random variable. For any number of trials n, 0 ≤ n ≤ N, allocated to ξ⁽²⁾(N) (and assuming overlapping distributions) there is a positive probability, q(N − n, n), that ξ⁽²⁾(N) is actually the random variable with the highest mean, max{μ_ξ, μ_{ξ′}}. The two possible sources of loss are: (1) The observed best ξ⁽¹⁾(N) is really second best, whence the N − n trials given ξ⁽¹⁾(N) incur an (expected) cumulative loss (N − n)·|μ_ξ − μ_{ξ′}|; this occurs with probability q(N − n, n). (2) The observed best is in fact the best, whence the n trials given ξ⁽²⁾(N) incur a loss n·|μ_ξ − μ_{ξ′}|; this occurs with probability (1 − q(N − n, n)). The total expected loss for any allocation of n trials to ξ⁽²⁾ and N − n trials to ξ⁽¹⁾ is thus

$$L(N-n,\,n) = \big[q(N-n,\,n)\cdot(N-n) + \big(1 - q(N-n,\,n)\big)\cdot n\big]\cdot|\mu_\xi - \mu_{\xi'}|.$$

We shall soon see that, for n not too large, the first source of loss decreases as n increases because both N − n and q(N − n, n) decrease. At the same time the second source of loss increases. By making a tradeoff between the first and second sources of loss, then, it is possible to find for each N a value n*(N) for which the losses are minimized; i.e.,

$$L(N-n^*,\,n^*) \le L(N-n,\,n) \quad\text{for all } n \le N.$$

For the determination of n* let us assume that initially one random variable is as likely as the other to be best. (This would be the case for example if the two unbalanced coins referred to earlier have no identifying external characteristics and are positioned initially at random. More generally, the result is the same if the labels of the random variables are assigned at random. The proof of the theorem will indicate the modifications necessary for cases where one random variable is initially more likely than the other to be the best.) For convenience let us adopt the convention that ξ₁ is the random variable with the highest mean and let μ₁ be that mean; accordingly ξ₂ is the other random variable with mean μ₂ ≤ μ₁. (The observer, of course, does not know this.) Using these conventions we can now establish
THEOREM 5.1: Given N trials to be allocated to two random variables, with means μ₁ > μ₂ and variances σ₁², σ₂² respectively, the minimum expected loss results when the number of trials allocated ξ⁽²⁾(N) is

$$n \simeq n^* \sim b^2 \ln\!\left[\frac{N^2}{8\pi b^4 \ln N^2}\right],$$

where b = σ₁/(μ₁ − μ₂). If, initially, one random variable is as likely as the other to be best, n = n* and the expected loss per trial is

$$L^*(N) \sim \frac{b^2(\mu_1 - \mu_2)}{N}\left[2 + \ln\frac{N^2}{8\pi b^4 \ln N^2}\right].$$

(Given two arbitrary functions, Y(t) and Z(t), of the same variable t, "Y(t) ∼ Z(t)" will be used to mean lim_{t→∞}(Y(t)/Z(t)) = 1, while "Y(t) ≃ Z(t)" means that under stated conditions the difference Y(t) − Z(t) is negligible.)
Proof: In order to select an n which minimizes the expected loss, it is necessary first to write q(N − n, n) as an explicit function of n. As defined above, q(N − n, n) is the probability that ξ⁽²⁾(N), say ξ′, is actually the random variable with the higher mean. More carefully, given the observation that ξ′ = ξ⁽²⁾(N), we wish to determine the probability that ξ′ = ξ₁. That is, we wish to determine

$$q(N-n,\,n) = \Pr\{\xi' = \xi_1 \mid \xi^{(2)} = \xi'\}$$

as an explicit function of N − n and n. Bayes's theorem then gives us the equation

$$\Pr\{\xi' = \xi_1 \mid \xi^{(2)} = \xi'\} = \frac{\Pr\{\xi' = \xi^{(2)} \mid \xi' = \xi_1\}\,\Pr\{\xi' = \xi_1\}}{\Pr\{\xi' = \xi^{(2)} \mid \xi' = \xi_1\}\,\Pr\{\xi' = \xi_1\} + \Pr\{\xi' = \xi^{(2)} \mid \xi' = \xi_2\}\,\Pr\{\xi' = \xi_2\}}.$$

Letting q′, q″, and p designate Pr{ξ′ = ξ⁽²⁾ | ξ′ = ξ₁}, Pr{ξ′ = ξ⁽²⁾ | ξ′ = ξ₂}, and Pr{ξ′ = ξ₁}, respectively, and using the fact that ξ′ must be ξ₂ if it is not ξ₁, this can be rewritten as

$$q(N-n,\,n) = \frac{q'p}{q'p + q''(1-p)}.$$

(If one random variable is as likely as the other to be best, then p = (1 − p) = ½.)


To derive q′ let us assume that ξ′ has received n trials out of the N total. Let S₂^{N−n} be the sum of the outcomes (payoffs) of the N − n trials of ξ₂ and let S₁ⁿ be the corresponding sum for the n trials of ξ₁. Since q′ has ξ′ = ξ₁ as a condition, q′ is just the probability that S₁ⁿ/n < S₂^{N−n}/(N − n) or, equivalently, the probability that (S₁ⁿ/n) − (S₂^{N−n}/(N − n)) < 0. By the central limit theorem S₂^{N−n}/(N − n) approaches a normal distribution with mean μ₂ and variance σ₂²/(N − n); similarly, S₁ⁿ/n has mean μ₁ and variance σ₁²/n. The distribution of (S₁ⁿ/n) − (S₂^{N−n}/(N − n)) is given by the product (convolution) of the distributions of S₁ⁿ/n and −(S₂^{N−n}/(N − n)); by an elementary theorem (on the convolution of normal distributions) this is a normal distribution with mean μ₁ − μ₂ and variance σ₁²/n + σ₂²/(N − n). Thus the probability Pr{(S₁ⁿ/n) − (S₂^{N−n}/(N − n)) < 0} is the tail Φ(−x₀) of a canonical normal distribution Φ(x), where

$$x = \frac{y - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n + \sigma_2^2/(N-n)}}$$

and −x₀ is the value of x when y = 0. (I.e., y, which describes the distribution of (S₁ⁿ/n) − (S₂^{N−n}/(N − n)), is transformed to x, which describes the canonical normal distribution with mean 0 and variance 1.) The tail of a normal distribution is well approximated by

$$\Phi(-x) = 1 - \Phi(x) \sim \frac{1}{x\sqrt{2\pi}}\, e^{-x^2/2}.$$

Thus

$$q' \lesssim \frac{1}{x_0\sqrt{2\pi}}\,\exp\!\left[-\,\frac{(\mu_1 - \mu_2)^2}{2\big(\sigma_1^2/n + \sigma_2^2/(N-n)\big)}\right].$$

Using the same line of reasoning (but now with N − n observations of ξ₁, etc.) we have

$$q'' \simeq 1 - \Pr\!\left\{\frac{S_1^{\,N-n}}{N-n} < \frac{S_2^{\,n}}{n}\right\} \gtrsim 1 - \frac{1}{x_0'\sqrt{2\pi}}\,\exp\!\left[-\,\frac{(\mu_1 - \mu_2)^2}{2\big(\sigma_1^2/(N-n) + \sigma_2^2/n\big)}\right].$$

From this we see that both q′ and q″ are functions of the variances and means as well as the total number of trials, N, and the number of trials, n, given ξ′. More importantly, both q′ and 1 − q″ decrease exponentially with n, yielding

$$q(N-n,\,n) = \frac{q'p}{q'p + q''(1-p)} \approx q'\cdot\frac{p}{1-p},$$

with the approximation being quite good even for relatively small n. For p = ½ this reduces to

$$q(N-n,\,n) \approx q',$$

where the error is less than min{(q′)², (1 − q″)²}. (If one random variable is a priori more likely than the other to be best, i.e., if p ≠ ½, then we can see from the above and from what follows that fewer trials can be allocated to attain the same reduction of q(N − n, n). The expected loss is reduced accordingly.)

[Figure 11 plots the distribution of (S₁ⁿ/n) − (S₂^{N−n}/(N − n)) against the canonical normal Φ(x), with the shaded area Pr{(S₁ⁿ/n) − (S₂^{N−n}/(N − n)) < 0} corresponding to the tail below x = −x₀ (y = 0).]

Fig. 11. The convolution of S₁ⁿ/n with −S₂^{N−n}/(N − n)
The observation that q′ and hence q(N − n, n) decreases exponentially with n makes it clear that, to minimize loss as N increases, the number of trials allocated the observed best, N − n, should be increased dramatically relative to n. This observation (which will be verified in detail shortly) enables us to simplify the expression for x₀. Whatever the value of σ₂², there will be an N₀ such that, for any N > N₀, σ₂²/(N − n) ≪ σ₁²/n for n close to its optimal value. (In most cases of interest this occurs even for small numbers of trials since, usually, σ₁² is at worst an order of magnitude or two larger than σ₂².) Using this we see that, for n close to its optimal value,

$$x_0 \simeq \frac{(\mu_1 - \mu_2)\sqrt{n}}{\sigma_1}, \qquad N > N_0.$$

We can now proceed to determine what value of n will minimize the loss L(n) by taking the derivative of L with respect to n:

$$\frac{dL}{dn} = |\mu_1 - \mu_2|\left[-q + (N-n)\frac{dq}{dn} + 1 - q - n\,\frac{dq}{dn}\right] = |\mu_1 - \mu_2|\left[(1 - 2q) + (N - 2n)\frac{dq}{dn}\right],$$

where

$$\frac{dq}{dn} \lesssim \frac{dq'}{dn} = -\left[\frac{1}{x_0^2} + 1\right]\frac{e^{-x_0^2/2}}{\sqrt{2\pi}}\,\frac{dx_0}{dn} = -\,q\left[\frac{1}{x_0} + x_0\right]\frac{dx_0}{dn}$$

and dx₀/dn = (μ₁ − μ₂)/(2σ₁√n) = x₀/2n. Thus

$$\frac{dL}{dn} \lesssim |\mu_1 - \mu_2|\left[(1 - 2q) - (N - 2n)\,q\,\frac{x_0^2 + 1}{2n}\right].$$

n*, the optimal value of n, satisfies dL/dn = 0, whence we obtain a bound on n* as follows:

$$0 \simeq (1 - 2q) - (N - 2n^*)\,q\,\frac{x_0^2 + 1}{2n^*}$$

$$(N - 2n^*) \simeq \frac{2n^*(1 - 2q)}{q\,(x_0^2 + 1)}.$$

Noting that 1/(x₀² + 1) ≲ 1/x₀² and that (1 − 2q) rapidly approaches 1 because q decreases exponentially with n, we see that

$$N - 2n^* \lesssim \frac{2n^*}{q\,x_0^2},$$

where the error rapidly approaches zero as N increases. Thus the observation of the preceding paragraph is verified, the ratio of trials of the observed best to trials of the second best growing exponentially.
Finally, to obtain n* as an explicit function of N, q must be written in terms of n*:

$$N - 2n^* \lesssim \frac{2\sqrt{2\pi}\,\sigma_1\sqrt{n^*}}{\mu_1 - \mu_2}\,\exp\!\left[\frac{(\mu_1 - \mu_2)^2\,n^*}{2\sigma_1^2}\right].$$

Introducing b = σ₁/(μ₁ − μ₂) and N₁ = N − n* for simplification, we obtain

$$N_1 \lesssim \sqrt{8\pi}\;b\,\exp\!\big[(b^{-2} n^* + \ln n^*)/2\big]$$

$$n^* + b^2 \ln n^* \gtrsim 2b^2 \ln\frac{N_1}{b\sqrt{8\pi}},$$

where the fact that (N − 2n*) ∼ (N − n*) has been used, with the inequality generally holding as soon as N₁ exceeds n* by a small integer. We obtain a recursion for an ever better approximation to n* as a function of N₁ by rewriting this as

$$n^* \gtrsim b^2 \ln\!\left[\frac{N_1^2}{8\pi\,b^2\,n^*}\right].$$

Whence, iterating the substitution,

$$n^* \gtrsim b^2 \ln\!\left[\frac{N_1^2}{8\pi\,b^4 \ln N_1^2}\right],$$

where, again, the error rapidly approaches zero as N increases. Finally, where it is desirable to have n* approximated by an explicit function of N, the steps here can be redone in terms of N/n*, noting that N₁/n* rapidly approaches N/n* as N increases. Then

$$n^* \sim b^2 \ln\!\left[\frac{N^2}{8\pi\,b^4 \ln N^2}\right],$$

where, still, the error rapidly approaches zero as N increases.
The expected loss per trial L*(N) when n* trials have been allocated to ξ⁽²⁾(N) is

$$L^*(N) = \frac{|\mu_1 - \mu_2|}{N}\Big[(N - n^*)\,q(N - n^*, n^*) + n^*\big(1 - q(N - n^*, n^*)\big)\Big] = \frac{|\mu_1 - \mu_2|}{N}\Big[(N - 2n^*)\,q(N - n^*, n^*) + n^*\Big]$$

$$\simeq \frac{|\mu_1 - \mu_2|}{N}\Big[\frac{2n^*}{x_0^2} + n^*\Big] \sim \frac{b^2\,|\mu_1 - \mu_2|}{N}\Big[2 + \ln\frac{N_1^2}{8\pi b^4 \ln N_1^2}\Big]. \qquad\text{Q.E.D.}$$

The main point of this theorem quickly becomes apparent if we rearrange the results to give the number of trials N⁽¹⁾ allocated to ξ⁽¹⁾ as a function of the number of trials n* allocated to ξ⁽²⁾:

$$N^{(1)} = N - n^* \sim N \sim \sqrt{8\pi b^4 \ln N^2}\;\,e^{\,n^*/2b^2}.$$

Thus the loss rate will be optimally reduced if the number of trials allocated ξ⁽¹⁾ grows slightly faster than an exponential function of the number of trials allocated ξ⁽²⁾. This is true regardless of the form of the distributions defining ξ and ξ′. Later we will see that the random variables defined by schemata are similarly treated by reproductive plans.
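The asymptotic expressions of Theorem 5.1 are easy to evaluate numerically; the sketch below (Python; the means and variance are arbitrary illustrative values, and the formulas are the large-N approximations only) tabulates n* and the loss rate L*(N).

```python
from math import log, pi

def n_star(N, mu1, mu2, sigma1):
    b = sigma1 / (mu1 - mu2)
    return b**2 * log(N**2 / (8 * pi * b**4 * log(N**2)))

def loss_rate(N, mu1, mu2, sigma1):
    b = sigma1 / (mu1 - mu2)
    return (b**2 * (mu1 - mu2) / N) * (2 + log(N**2 / (8 * pi * b**4 * log(N**2))))

for N in (100, 1000, 10000, 100000):
    print(N, round(n_star(N, 1.0, 0.8, 0.5), 1), round(loss_rate(N, 1.0, 0.8, 0.5), 5))
```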
It should be emphasized that the above approximation for n* will be misleading for small N when

(i) μ₁ − μ₂ is small enough that, for small N, the standard deviation of (S₁ⁿ/n) − (S₂^{N−n}/(N − n)) is large relative to μ₁ − μ₂ and, as a consequence, the approximation for the tail Φ(−x₀) fails, or

(ii) σ₂² is large relative to σ₁² so that, for small N, the approximation for x₀ is inadequate.

Neither of these cases is important for our objectives here. The first is unimportant because the cumulative losses will be small until N is large since the cost of trying ξ₂ is just μ₁ − μ₂. The second is unimportant because the uncertainty and therefore the expected loss depends primarily on σ₁² until N − n* is large; hence the expected loss rate will be reduced near optimally as long as N − n ≈ N (i.e., most trials go to ξ⁽¹⁾), as will be the case if n is at least as small as the value given by the approximation for n*.

Finally, to get some idea of n* when σ₁ is not known, note that for bounded payoff, μ: 𝒜 → [r₀, r₁], the maximum variance occurs when all payoff is concentrated at the extremes, i.e., p(r₀) = p(r₁) = ½. Then

$$\sigma_{\max}^2 = \tfrac{1}{2}\Big(r_1 - \frac{r_1 + r_0}{2}\Big)^2 + \tfrac{1}{2}\Big(r_0 - \frac{r_1 + r_0}{2}\Big)^2 = \Big(\frac{r_1 - r_0}{2}\Big)^2.$$
2. REALIZATION OF MINIMAL LOSSES

This section points out, and resolves, a difficulty in using L*(N) as a performance criterion. The difficulty occurs because, in a strict sense, the minimal expected loss rate just calculated cannot be obtained by any feasible plan for allocating trials in terms of observations. As such L*(N) constitutes an unattainable lower bound and, if it is too far below what can be attained, it will not be a useful criterion. However, we will see here that such loss rates can be approached quite closely (arbitrarily closely as N increases) by feasible plans, thus verifying L*(N)'s usefulness.
The source of the difficulty lies in the expression for n*, which was obtained on the assumption that the n* trials were allocated to ξ⁽²⁾(N). However there is no realizable plan (sequential algorithm) which can "foresee" in all cases which of the two random variables will be ξ⁽²⁾(N) at the end of N trials. No matter what the plan τ, there will be some observational sequences for which it allocates n > n* trials to a random variable ξ (on the assumption that ξ will be ξ⁽¹⁾(N)) only to have ξ turn out to be ξ⁽²⁾(N) after N trials. (For example, the observational sequence may be such that at the end of 2n* trials τ has allocated n* trials to each random variable. τ must then decide where to allocate the next trial even though each random variable has a positive probability of being ξ⁽²⁾(N).) For these sequences the loss rate will perforce exceed the optimum. Hence L*(N) is not attainable by any realizable τ; there will always be payoff sequences which lead τ to allocate too many trials to ξ⁽²⁾(N).
There is, however, a realizable plan τ⁽*⁾ for which the expected loss per trial L(τ⁽*⁾, N) quickly approaches L*(N), i.e.,

$$\lim_{N\to\infty} L(\tau^{(*)}, N)\,/\,L^*(N) = 1.$$

Proof: The expected loss per trial L(τ⁽*⁾, N) for τ⁽*⁾ is determined by applying the earlier discussion of sources of loss to the present case:

$$L(\tau^{(*)}, N) = \frac{1}{N}\,(\mu_1 - \mu_2)\,\big[(N - n^*)\,q(n^*, n^*) + n^*\big(1 - q(n^*, n^*)\big)\big],$$

where q is the same function as before, but here the probability of error is irrevocably determined after 2n* trials. That is,

$$q(n^*, n^*) \simeq \frac{\sqrt{\sigma_1^2/n^* + \sigma_2^2/n^*}}{\sqrt{2\pi}\,(\mu_1 - \mu_2)}\,\exp\!\left[-\,\frac{(\mu_1 - \mu_2)^2}{2\big(\sigma_1^2/n^* + \sigma_2^2/n^*\big)}\right].$$

Rewriting L(τ⁽*⁾, N) we have

$$L(\tau^{(*)}, N) = (\mu_1 - \mu_2)\left[\frac{N - 2n^*}{N}\,q(n^*, n^*) + \frac{n^*}{N}\right].$$

Since, asymptotically, q decreases as rapidly as N⁻¹, it is clear that the second term in the brackets will dominate as N grows. Inspecting the earlier expression for L*(N) we see the same holds there. Thus, since the second terms are identical,

$$\lim_{N\to\infty} L(\tau^{(*)}, N)\,/\,L^*(N) = 1. \qquad\text{Q.E.D.}$$

From this we see that, given the requisite information (μ₁, σ₁²) and (μ₂, σ₂²), there exist plans which have loss rates closely approximating L*(N) as N increases.
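A small simulation makes the behaviour of such a plan tangible. The sketch below (Python; the Gaussian payoffs and the particular parameter values are assumptions, and the plan is the simple one that spends n* trials on each variable before committing to the observed best) returns the realized loss per trial for one run.

```python
import random

def realized_loss_per_trial(N, n_star, mu=(1.0, 0.8), sigma=(0.5, 0.5), seed=0):
    rng = random.Random(seed)
    best = max(mu)
    sums, loss = [0.0, 0.0], 0.0
    for i in (0, 1):                      # exploration: n_star trials of each variable
        for _ in range(n_star):
            sums[i] += rng.gauss(mu[i], sigma[i])
            loss += best - mu[i]
    chosen = 0 if sums[0] >= sums[1] else 1
    loss += (N - 2 * n_star) * (best - mu[chosen])   # exploitation: remaining trials
    return loss / N

print(realized_loss_per_trial(N=10000, n_star=25))
```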

3. MANY OPTIONS

The function L*(N) sets a very stringent criterion when there are two uncertain options, specifying a high goal which can only be approached where uncertainty is very limited. Adaptive plans, however, considered in terms of testing schemata, face many more than two uncertain options at any given time. Thus a general performance criterion for adaptive plans must treat loss rates for arbitrary numbers of options. Though the extension from two options to an arbitrary number r of options is conceptually straightforward, the actual derivation of L*(N) is considerably more intricate. The derivation proceeds by indexing the r random variables ξ₁, ξ₂, …, ξ_r so that the means are in decreasing order μ₁ > μ₂ > ⋯ > μ_r (again, without the observer knowing that this ordering holds).
THEOREM 5.3: Under the same conditions as for Theorem 5.1, but now with r random variables, the minimum expected loss after N trials must exceed

$$(\mu_1 - \mu_2)\,(r - 1)\,b^2\left[2 + \ln\frac{N^2}{8\pi\,(r-1)^2\,b^4\,\ln N^2}\right],$$

where b = σ₁/(μ₁ − μ_r).

Proof: Following Theorem 5.1 we are interested in the probability that the average of the observations of any ξ_i, i > 1, exceeds the average for ξ₁. This probability of error is accordingly

$$q(n_2, \ldots, n_r) \simeq \Pr\left\{\left(\frac{S_1^{\,n_1}}{n_1} < \frac{S_2^{\,n_2}}{n_2}\right)\ \text{or}\ \left(\frac{S_1^{\,n_1}}{n_1} < \frac{S_3^{\,n_3}}{n_3}\right)\ \text{or}\ \cdots\ \text{or}\ \left(\frac{S_1^{\,n_1}}{n_1} < \frac{S_r^{\,n_r}}{n_r}\right)\right\},$$

where n_i is the number of trials given ξ_i, and the loss ranges from (μ₁ − μ₂) to (μ₁ − μ_r) depending on which ξ_i is mistakenly taken for best.

Let n = ∑_{i=2}^{r} n_i, let m = min{n₂, n₃, …, n_r}, and let j be the largest index of the random variables (if more than one) receiving m trials.

The proof of Theorem 5.1 shows that a lower bound on the expected loss is attained by minimizing with respect to any lower bound on the probability q (a point which will be verified in detail for r variables). In the present case q must exceed

$$\Pr\left\{\frac{S_1^{\,N-n}}{N-n} < \frac{S_j^{\,m}}{m}\right\} \;\gtrsim\; q' = \frac{\sigma_1}{(\mu_1 - \mu_r)\sqrt{2\pi m}}\,\exp\!\left[-\,\frac{(\mu_1 - \mu_r)^2\,m}{2\sigma_1^2}\right],$$

using the fact that (μ₁ − μ_r) ≥ (μ₁ − μ_j) for any j > 1. By the definition of q,

$$L_{N,r}(n) > L'_{N,r}(n) = (\mu_1 - \mu_2)\big[(N - n)\,q + n\,(1 - q)\big],$$

using the fact that (μ₁ − μ₂) ≤ (μ₁ − μ_i) for i ≥ 2. Moreover the same value of n minimizes both L_{N,r}(n) and L'_{N,r}(n). To find this value of n, set

$$\frac{dL'_{N,r}}{dn} = (\mu_1 - \mu_2)\left[(N - 2n)\frac{dq}{dn} + (1 - 2q)\right] = 0.$$

Solving this for n* and noting that 1 − 2q rapidly approaches 1 as N increases, gives

$$n^* \sim \frac{N}{2} + \frac{1}{2}\left(\frac{dq}{dn}\right)^{-1}.$$

Noting that q must decrease less rapidly than q′ with increasing n, we have (dq′/dn) < (dq/dn) and, taking into account the negative sign of the derivatives,

$$n^* > \frac{N}{2} + \frac{1}{2}\left(\frac{dq'}{dn}\right)^{-1}.$$

(This verifies the observation at the outset, since the expected loss approaches n* as N increases; see below.) Finally, noting that n > (r − 1)m, we can proceed as in the two-variable case by using (r − 1)m in place of n and taking the derivative of q′ with respect to m instead of n. The result is

$$n^* > (r - 1)\,m^* \sim (r - 1)\,b^2\,\ln\!\left[\frac{N^2}{8\pi\,(r-1)^2\,b^4\,\ln N^2}\right],$$

where b = σ₁/(μ₁ − μ_r). Accordingly,

$$L_{N,r}(n^*) > L'_{N,r}\big((r - 1)\,m^*\big) > (\mu_1 - \mu_2)\,(r - 1)\,b^2\left[2 + \ln\frac{N^2}{8\pi\,(r-1)^2\,b^4\,\ln N^2}\right].$$

Q.E.D.

4. APPLICATION TO SCHEMATA

We are ready now to apply the criterion just developed to the general problem of ranking schemata. The basic problem was rephrased as one of minimizing the performance losses inevitably coupled with any attempt to increase confidence in an observed ranking of schemata. The theorem just proved provides a guideline for solving this problem by indicating how trials should be allocated among the schemata of interest.

To see this note first that the central limit theorem, used at the heart of the proof of Theorem 5.3, applies to any sequence of independent random variables having means and variances. As such it applies to the observed average payoff μ̂_ξ of a sequence of trials of the schema ξ under any probability distribution P over 𝒜 (cf. chapter 4). It even applies when the distribution over 𝒜 changes with time (a fact we will take advantage of with reproductive plans). In particular, then, Theorem 5.3 applies to any given set of r schemata. It indicates that under a good adaptive plan the number of trials of the (observed) best will increase exponentially relative to the total number of trials allocated to the remainder.
Near the end of chapter 4 it was proposed that the observed performance rankings of schemata be stored by selecting an appropriate (small) set of elements ℬ from 𝒜 so that the rank of each schema ξ would be indicated by the relative number of instances of ξ in ℬ. Theorem 5.3 suggests an approach to developing ℬ, or rather a sequence ℬ(1), ℬ(2), …, ℬ(t), according to the sequence of observations of schemata. Let the number of instances of ξ in the set ℬ(t) represent the number of observations of ξ at time t. Then the number of instances of ξ in the set ∪_{t=1}^{T} ℬ(t) represents the total number of observations of ξ through time T. If schema ξ should persist as the observed best, Theorem 5.3 indicates that ξ's portion of ∪_{t=1}^{T} ℬ(t) should increase exponentially with respect to the remainder. We can look at this in a more "instantaneous" sense. ξ's portion of ℬ(t) corresponds to the rate at which ξ is being observed, i.e., to the "derivative" of the function giving ξ's increase. Since the derivative of an exponential is an exponential, it seems natural to have ξ's portion M_ξ(t) of ℬ(t) increase exponentially with t (at least until ξ occupies most of ℬ(t)). This will be the case if ξ's rate of increase is proportional to the observed average payoff μ̂_ξ(t) of instances of ξ at time t or, roughly,

$$dM_\xi(t)/dt = \hat\mu_\xi(t)\,M_\xi(t).$$

It will still be the case if the rate is proportional to the schema's "usefulness," the difference between μ̂_ξ(t) and the overall average performance μ̂(t) of instances in ℬ(t), so that dM_ξ(t)/dt = (μ̂_ξ(t) − μ̂(t))M_ξ(t). (In genetics μ̂_ξ(t) − μ̂(t) is called the "excess" of ξ when ξ is defined on a single locus, i.e., when ξ is a specific allele.)
The discussion of "intrinsic parallelism" in chapter 4 would imply here that each ξ represented in ℬ(t) should increase (or decrease) at a rate proportional to its observed "usefulness" μ̂_ξ(t) − μ̂(t). If this could be done consistently then each ξ would be automatically and properly ranked within ℬ(t) as t increases. The reasoning behind this, as well as the proof that reproductive plans accomplish the task, will be developed in full in the next two chapters.
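A discrete version of this growth rule is easy to simulate. The sketch below (Python; the three schema averages and the unit-step update are illustrative assumptions, with payoffs held fixed) shows the above-average schema coming to dominate the set ℬ(t).

```python
payoffs = {"xi_1": 1.4, "xi_2": 1.0, "xi_3": 0.7}   # hypothetical schema averages
M = {s: 10.0 for s in payoffs}                       # equal initial representation

for t in range(20):
    total = sum(M.values())
    mu_bar = sum(payoffs[s] * M[s] for s in M) / total
    for s in M:
        M[s] *= 1.0 + (payoffs[s] - mu_bar)          # dM/dt = (mu_xi - mu_bar) * M, unit step
shares = {s: round(M[s] / sum(M.values()), 3) for s in M}
print(shares)    # the schema with the highest average payoff dominates
```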


6. Reproductive Plans and Genetic Operators

In the earlier informal discussion of genetics (sections 1.4 and 3.1) reproductive plans were introduced as the fundamental procedure of genetic adaptation. The present chapter lifts reproductive plans from the specific context of genetics to the general framework of chapter 2. This, at one stroke, makes reproductive plans suitable objects for rigorous study and yields a class of plans applicable to the full range of adaptive systems. Genetic plans, i.e., reproductive plans using generalized genetic operators, will be the prime focus; emphasis will be laid upon the operators' retention and use of relevant history as they exploit opportunities for improved performance.

Genetic plans can be applied to any domain of structures 𝒜 represented by strings (ℓ-tuples). (To build a better intuition for this flexibility the reader may find it useful to consistently interpret the properties and theorems advanced here in the most familiar of the nongenetic illustrations of chapter 3.) We will see that each structure generated and tested by a genetic plan in effect tests a multitude of schemata and that the plan actually preserves and exploits this information. Genetic plans do this by generating sequences of structures in such a way that, once a few instances of any given schema ξ occur, one can count on the cumulative number of instances of ξ increasing at a rate closely related to μ_ξ. The generalized genetic operators act so as to test old schemata in new contexts, generate instances of schemata not previously tested, and so on (see sections 7.2-7.5), without disturbing the rates of increase. Genetic plans thus exhibit the intrinsic parallelism discussed at the ends of chapters 4 and 5.

Interpreted in genetics, the results of the next two chapters indicate that adaptation proceeds largely in terms of pools of coadapted sets of alleles rather than gene pools. As one important offshoot, this approach yields an extension of Fisher's (1930) classical result (on the rates of increase of alleles) to coadapted sets of alleles with epistatic interaction (see section 7.4). A typical interpretation for artificial systems can be obtained by looking again at the function f(x) of Figure 10.

We see that the average value μ_{1□□…□} of all points in the schema 1□□…□ is the area under the curve over the interval ½ ≤ x < 1 divided by ½ (the length of the interval), i.e., approximately 1.5. Similarly, for □□0□…□ the value is approximately 1, for 1□0□…□ the value is approximately 2, etc. Thus instances of 1□□…□ will accumulate at a higher rate than those of □□0□…□, and instances of 1□0□…□ will accumulate still more rapidly. The result is an ever greater clustering of test points (instances) in intervals (schemata) of above average value (see Figure 13 and the example of section 7.3). In this way the genetic plan locates a global optimum of f(x), exploiting false peaks (without entrapment) to rapidly increase the average value of points tested.
We will see that genetic plans act with a combination of simplicity and subtlety both pleasing to the eye and useful in application. They also act with robustness and efficiency, a fact that will be finally established in the next chapter. It should be emphasized that the plans (algorithms) set forth have a dual role. When the plan's parameter values (and functions) are determined from data about a particular natural process, the plan serves as an idealized model or hypothesis about that process. As such it is subject to the general observation-modification cycle applicable to physical theories in general. Because the model is already in algorithmic form, it is particularly suitable for simulations of the process. The other role occurs in relation to artificial (designed) processes. Here the plans serve as optimization procedures which can be fitted into the process to control its direction. In either role the theorems proved hereafter yield predictions which must come true if (for the natural systems) the basic model is verified or (for artificial systems) the algorithm is incorporated as a control.
1. GENERALIZED REPRODUCTIVE PLANS

To embed reproductive plans in the (𝒯, ℰ, X) framework of chapter 2 we must define a class of plans (algorithms) applicable to an arbitrary set of structures 𝒜. Moreover each plan must be a mapping of the form τ: I × 𝒜 → 𝒫. It must use only the input from the environment, I(t), and the structure tried at time t, 𝒜(t), to determine a random variable over 𝒜, 𝒫_t(𝒜(t)), which is in turn sampled to determine the next trial, 𝒜(t + 1). We will begin by defining a relatively narrow class of reproductive plans ℛ₁. Later ℛ₁ will be extended in ways which make some applications more natural, and we will see that the new algorithms are essentially no more powerful than those from ℛ₁.
To begin let 𝒜₁ be the set of structures to be tested and, as in chapter 4, assume that the elements of 𝒜₁ are representations. (As long as each structure is represented by a finite string of attributes, 𝒜₁ can be made countably infinite without affecting the presentation of ℛ₁. This will be discussed in chapter 8.) Each plan in ℛ₁ is an algorithm which acts at each instant t upon a small set of structures ℬ(t) from 𝒜₁ (interpretable, for instance, as a population or data base). The algorithm uses a single basic cycle to modify elements of the small set, one at a time, thereby producing a sequence of new structures for trial. In general terms, the basic steps of the cycle are:
1. Select one structure from ℬ(t) probabilistically, after assigning each structure a probability proportional to its observed performance.
2. Copy the selected structure, then apply operators to the copy to produce a new structure.
3. Select a second element from ℬ(t) at random (all elements equally likely) and replace it by the new structure produced in step 2.
4. Observe and record the performance of the new structure.
Return to step 1.

Note that the number of elements in ℬ(t) remains constant. (From the point of view of genetics, it is convenient to look upon the size of ℬ(t) as an upper bound on population size determined, say, by the "carrying capacity" of the environment.) The number of structures in ℬ(t) can be varied up to the maximum number by allowing null structures or vacancies.
With this outline as a guide, we can now go on to the rigorous definition of the algorithms in ℛ1. The following symbols and definitions will be used with the interpretations given:

𝒜₁: the set of basic structures being tested.
𝒜₁ᴹ: the set of all M-tuples of structures corresponding to possible compositions of ℬ.
ℬ(t): the particular set of M structures {A₁(t), A₂(t), . . . , A_M(t)} available to the adaptive plan at time t.
𝔍_M = {1, 2, . . . , M}: the first M positive integers, used as an index set for ℬ.
Ω: the set of stochastic operators for modifying structures.
𝔍_M × 𝒜₁ᴹ: compositions of ℬ with one structure selected (for modification by an operator); i.e., (i, A₁(t), . . . , A_M(t)) ∈ 𝔍_M × 𝒜₁ᴹ corresponds to ℬ(t) with the ith structure, A_i(t), selected.
𝒫: a set of probability distributions over 𝒜₁, one of which is selected by each application of a stochastic operator ω ∈ Ω.
p: 𝒜₁ → Ω: assigns to each basic structure A ∈ 𝒜₁ the stochastic operator ω ∈ Ω which is to be used to modify A.
ω: 𝔍_M × 𝒜₁ᴹ → 𝒫: an arbitrary operator from Ω which determines, from ℬ(t) and a selection i(t), a distribution P ∈ 𝒫 over 𝒜₁.
Once the set of structures 𝒜₁ has been given, along with an observation procedure which assigns a payoff μ_E(A) to each trial of a structure A ∈ 𝒜₁, a reproductive plan of type ℛ1 is determined by specifying the function p and the operators {ω ∈ Ω}. The algorithm proceeds as follows:
1. Set t = 0 and initialize ℬ by selecting M structures at random from 𝒜₁ to form ℬ(0) = {A_h(0), h = 1, . . . , M}.
2.1 Observe and store the performances {μ_E(A_h(0)), h = 1, . . . , M}.
2.2 Observe the performance of A′(t) and replace μ_E(A_{j(t)}(t)) by μ_E(A′(t)).
3. Increment t by 1.
4. Select one structure A_{i(t)}(t) from ℬ(t) by taking one sample of ℬ(t) using the probabilities
   Prob(A_h(t)) = μ_E(A_h(t)) / Σ_{h′} μ_E(A_{h′}(t)), h = 1, . . . , M.
5. Determine the operator ω_t ∈ Ω to be applied to A_{i(t)}(t), ω_t = p(A_{i(t)}(t)), and then use ω_t to determine a new structure A′(t) by taking a sample of 𝒜₁ according to the probability distribution P_t = ω_t(i(t), A₁(t), . . . , A_M(t)) ∈ 𝒫.
6. Assign probability 1/M to each number 1, . . . , M, select one number 1 ≤ j(t) ≤ M accordingly, and replace A_{j(t)}(t) by A′(t). (Return to step 2.2.)
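As an illustration only, the ℛ1 cycle just listed can be sketched in Python. The payoff function, the string length, and the particular operator below are placeholder assumptions made for the example (a simple point mutation stands in for a generic ω ∈ Ω); they are not part of the formal definition.

```python
import random

L, M = 6, 5                       # string length and population size (illustrative)

def payoff(a):                    # placeholder for the observation procedure mu_E
    x = int("".join(map(str, a)), 2) / 2**L
    return 1.0 + x                # any nonnegative performance measure will do

def operator(copy, population):   # placeholder for omega = p(A); here a point mutation
    i = random.randrange(L)
    copy[i] ^= 1
    return copy

# Step 1: initialize B(0) with M random structures from A_1
B = [[random.randint(0, 1) for _ in range(L)] for _ in range(M)]
# Step 2.1: observe and store the performances
perf = [payoff(A) for A in B]

for t in range(1, 100):
    # Step 4: select A_i(t) with probability proportional to observed performance
    i = random.choices(range(M), weights=perf)[0]
    # Step 5: copy the selected structure and apply the operator to produce A'(t)
    A_new = operator(list(B[i]), B)
    # Step 6: replace a uniformly chosen member by A'(t)
    j = random.randrange(M)
    B[j] = A_new
    # Step 2.2: observe A'(t) and update the stored performance
    perf[j] = payoff(A_new)
```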

Algorithms of type ℛ1 are strictly sequential in the sense that one individual A′(t) is tested at a time. ℬ(t) serves as a reservoir of information about the environment and as a basis for generating new trials. ℬ(t) remains constant in size because each new individual A′(t) replaces an individual already in the population. Under the operators Ω of interest (particularly the generalized genetic operators), A′(t) can be looked upon as the "offspring" of A_{i(t)}(t), retaining many (but generally not all) of the attributes of A_{i(t)}(t). Via the function p each structure in the population carries a specification of the operator appropriate to it (a kind of "species" designation). (The apparent generalization to stochastic selection of one of a set of operators can actually be subsumed in the stochastic selection of offspring. See below.) The operators are computation procedures using random numbers; generally, they use at most one other member of the population, in addition to A_{i(t)}(t), in the determination of A′(t). (For instance, the operator may randomly select a "mate" for A_{i(t)}(t).) The argument of each ω ∈ Ω includes the whole population, because any structure in the population is a conceivable candidate for the second operand, even when ω is essentially a binary operator. (E.g., the probable outcomes of a "mating" will depend upon the range of "mates" available.)
It should be noted that the state of the algorithm at the beginning of any cycle includes not only the population ℬ(t), but also the retained performances μ_E(A_h(t)), h = 1, . . . , M, of the structures in ℬ(t). Thus, in the general formalism of chapter 2,
𝒜 = 𝒜₁ᴹ × [0, r]ᴹ,
where [0, r] is the interval of possible payoffs (performances), i.e. [0, r] is the range of μ_E, μ_E: 𝒜₁ → [0, r]. Accordingly,
𝒜(t) = (A₁(t), . . . , A_M(t), μ_E(A₁(t)), . . . , μ_E(A_M(t))).
The new information I(t), from the environment E ∈ ℰ at each time t, is simply the payoff μ_E(A′(t)) of the new structure A′(t). Thus any adaptive plan τ ∈ ℛ1 has the required form
τ: ℐ × 𝒜 → 𝒜
since
τ(μ_E(A′(t)), [A₁(t), . . . , A_M(t), μ_E(A₁(t)), . . . , μ_E(A_M(t))])
= [A₁(t + 1), . . . , A_M(t + 1), μ_E(A₁(t + 1)), . . . , μ_E(A_M(t + 1))].
Informally, a reproductive plan is one under which the better an individual performs the more offspring it has. For plans τ ∈ ℛ1 a precise counterpart of this property can be established with the help of the following
LEMMA 6.1: If, at any time-step, p_i is the probability that a structure A produces an "offspring" during that time-step and p_d is the probability that A is deleted during that time-step, then the expected number of "offspring" of A is p_i/p_d.
Proof: This is immediately established by noting that, when p_i and p_d are constant, the expected lifespan of A is 1/p_d and the expected number of offspring is simply the number of offspring expected during the expected lifespan, i.e., p_i/p_d. In more detail, the probability of A surviving for exactly T time-steps is P(T) = (1 − p_d)^(T−1)·p_d, and the expected number of offspring during that interval is u_A(T) = p_i·T. Thus, the expected number of offspring during A's lifespan is
Σ_{T=1} P(T)·u_A(T) = p_i·p_d·Σ_{T=1} T·(1 − p_d)^(T−1).
But Σ_{T=1} T·(1 − p_d)^(T−1) converges to 1/p_d² (as may be easily verified by taking the derivative of both sides of the identity 1/(1 − x) = 1 + x + x² + · · ·). Therefore
Σ_{T=1} P(T)·u_A(T) = p_i/p_d.
Q.E.D.
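For completeness, the series manipulation invoked in the proof can be written out explicitly; this is a standard identity, recorded here only as a reminder:

$$\frac{1}{1-x}=\sum_{T=0}^{\infty}x^{T}
\;\Longrightarrow\;
\frac{d}{dx}\,\frac{1}{1-x}=\frac{1}{(1-x)^{2}}=\sum_{T=1}^{\infty}T\,x^{T-1},$$

so that, with $x = 1-p_d$,

$$\sum_{T=1}^{\infty}T\,(1-p_d)^{T-1}=\frac{1}{p_d^{2}},
\qquad
\sum_{T=1}^{\infty}P(T)\,u_A(T)=p_i\,p_d\cdot\frac{1}{p_d^{2}}=\frac{p_i}{p_d}.$$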
For plans in ℛ1 the interpretation of this lemma is direct: The probability of A_h being selected to produce an offspring A′ during time t is μ_h/Σ_{h′} μ_{h′}, where μ_{h′} = μ_E(A_{h′}(t)), while the probability of A_h being deleted at the end of that time-step is 1/M. Hence, if Σ_{h′} μ_{h′} changes negligibly over A_h's lifespan, the expected number of offspring is
(μ_h / Σ_{h′} μ_{h′}) / (1/M) = μ_h / (Σ_{h′} μ_{h′}/M) = μ_h/μ̂.
μ_h/μ̂ can be looked upon as a normalized payoff, the usefulness of A_h being measured relative to the average performance of the other members in the population. With this arrangement the expected number of offspring of A_h is greater than 1 just in case A_h's performance is above average. Since μ̂ is not stationary for plans τ ∈ ℛ1, the probability p_i does not in fact remain constant (though, over the expected lifespan of a structure, it will not often change greatly). If μ̂ increases (as it will generally with a good plan), then A_h will receive fewer offspring than predicted by the calculation of p_i at the time A_h originated. That is, the performance of A_h looks less promising relative to the current average, so trials of A_h are curtailed. If μ̂ decreases, the opposite effect occurs. Still, the expected number of offspring varies in direct relation to A_h's relative performance, so that plans in ℛ1 satisfy the (informal) characterization of reproductive plans.
A slight change in the form of the algorithms in ℛ1 yields a class of algorithms ℛd wherein a time-step is a "generation" during which each individual A_h(t) ∈ ℬ(t) is replaced, deterministically instead of as an expectation, by μ_E(A_h(t))/μ̂ offspring, where μ̂ = (Σ_{h=1}^{M} μ_E(A_h(t)))/M. Thus, for ℛd, ℬ(t + 1) consists of the set of all offspring of the individuals in ℬ(t); the new set ℬ′ is then substituted in place of ℬ. (To keep the population level at M individuals a special kind of rounding must be used to make each μ_E(A_h(t))/μ̂ an integer.)
Algorithms in the class ℛd are closer to some of the deterministic models of mathematical genetics. It is easier, in some respects, to interpret the role of the population ℬ(t) in these plans than it is for the strictly sequential, stochastic plans in ℛ1. On the other hand the algorithms in ℛ1 look more like the "one-point-at-a-time" algorithms of numerical analysis and computational mathematics. Though ℛ1 and ℛd behave similarly, it is useful to have both in mind, translating from one to the other as it aids understanding.
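A minimal sketch of one ℛd generation follows. The rounding rule used here (floor plus one probabilistic extra copy, then trimming or padding back to M) is an assumption made for the illustration; the text only requires that some special rounding keep the population at M.

```python
import random

def next_generation(B, payoff, M):
    """One R_d 'generation': each A_h is replaced by roughly mu_h / mu_bar offspring.

    The genetic operators (crossover, etc.) would then be applied to members of
    the new generation; only reproduction is sketched here.
    """
    perf = [payoff(A) for A in B]
    mu_bar = sum(perf) / M
    offspring = []
    for A, mu in zip(B, perf):
        whole, frac = divmod(mu / mu_bar, 1.0)
        copies = int(whole) + (1 if random.random() < frac else 0)
        offspring.extend(list(A) for _ in range(copies))
    random.shuffle(offspring)
    while len(offspring) < M:                 # pad if the rounding fell short
        offspring.append(list(random.choice(B)))
    return offspring[:M]                      # B' is substituted in place of B
```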
For both types of plan the operators brought into play in step 5 are critical in determining just how past history is stored and exploited. The examination of specific operators can be expedited by subsuming ℛ1 and ℛd in a single overall diagram. Plans which satisfy this diagram and retain a recognizable variant of the "reproduction according to performance" procedures in ℛ1 or ℛd will be called plans of type ℛ.

The steps of the diagram (given as a flow chart in the original) are:

1. Set t = 0 and initialize ℬ.
2. Observe and store the performances {μ_E(A_h(t)) for A_h(t) ∈ ℬ(t)}.
3. Increment t by 1 and initialize parameters to begin a new time-step.
3.1 Is the time-step completed? If yes, go to step 7; if no, go to step 3.2.
3.2 Modify parameters for production of a new structure ("offspring").
4. Select an A(t) ∈ ℬ(t) for modification.
5. Determine the operator ω_t ∈ Ω to be applied to A(t) from ω_t = p(A(t)). Then use ω_t to produce a new structure A′(t) by taking a sample of 𝒜₁ according to the distribution P_t ∈ 𝒫 selected by ω_t.
6. Store A′(t) in ℬ′. Return to step 3.1.
7. Substitute ℬ′ in place of ℬ. Return to step 2.

(In ℛ1 steps 6 and 7 are amalgamated and the tests in 3 are unnecessary because exactly one new structure is formed per time-step.)
The next four sections will investigate the role of generalized genetic operators in plans of type ℛ. We will see that ℬ(t) is used basically as a pool of schemata. (Recall from chapter 4 that this means that ℬ(t) acts as a repository for somewhere between 2^l and M·2^l schemata; i.e., it contains instances of this many distinct schemata.) Past history is recorded in terms of the ranking (number of instances) of each schema in ℬ(t), much as discussed at the end of chapter 5. From this point of view crossing-over acts to generate new instances of schemata already in the pool while simultaneously generating (instances of) new schemata (see section 6.2). In general a total of 2^l schemata will be affected by each crossing-over (see Lemma 6.2.1). Inversion (section 6.3) affects the pool of schemata by changing the linkage (association) of alleles (attributes) defining various schemata. In combination with reproduction, the net effect is to increase the linkage of schemata of high rank (coadapted sets of alleles), making such schemata less subject to decomposition. Mutation (section 6.4) generally has a background role, supplying new alleles or new instances of lost alleles. All of this goes on without seriously disturbing the intrinsic rates of increase {μ_ξ/μ̂} of most schemata instanced in ℬ(t). Chapter 7 establishes the robustness and intrinsic parallelism of these type ℛ plans for arbitrary string-representable domains 𝒜.
2. GENERALIZED GENETIC OPERATORS - CROSSING-OVER
When genetic operators are used with reproductive plans we get a surprisingly sophisticated set of adaptive plans. Like the rules of a well-constructed game (chess, go, poker), genetic operators are simply defined but subtle in their consequences.
Our first objective, as with reproductive plans, will be to lift genetic operators from their specific biological context to the general (𝒜, ℰ, 𝒳) framework. With the help of this framework we can then define and investigate rigorously two critical advantages (first discussed in chapter 4) conferred by genetic operators:
(i) intrinsic parallelism in the testing and exploitation of schemata, and
(ii) compact storage and use of the large amounts of information resulting from prior observations of schemata.
This contrasts with the common view of evolutionary processes as successive selection of the best of a sequence of variants produced by mutation, a process which we will see amounts to an enumeration of structures, with its attendant disadvantages.
The reader should be warned that the generalized operators presented in the next three sections are idealized to varying degrees. This has been done to emphasize the basic functions of the operators, at the cost of exploring the complex (and fascinating) biological mechanisms underlying their execution. Even so an attempt has been made to keep the correspondence close enough to allow ready translation of the results to the original biological context.
Because it serves well as a paradigm for other genetic operators, we will look first at "crossing-over." In biological systems, crossing-over is a process yielding recombination of alleles via exchange of segments between pairs of chromosomes. We can lift this process to the level of a general operator on structures by providing the structures with representations as in chapter 4. As before, for simplicity, 𝒜 will be taken to be the set of representations. Besides facilitating the generalization to arbitrary structures this emphasizes the effects of crossing-over on schemata. Crossing-over proceeds in three steps:

1. Two structures, A = a₁a₂···a_l and A′ = a′₁a′₂···a′_l, are selected (usually at random) from the current population ℬ(t). (a_i and a′_i are elements of the set of attribute values V_i. Hence, if A₀ is the basic structure prior to representation, δ_i(A₀) = a_i. Again a₁a₂···a_l abbreviates (a₁, a₂, . . . , a_l), etc.)
2. A number x is selected from {1, 2, . . . , l − 1} (again at random).
3. Two new structures are formed from A and A′ by exchanging the sets of attributes to the right of position x, yielding a₁···a_x a′_{x+1}···a′_l and a′₁···a′_x a_{x+1}···a_l.

(Diagram: the two strings A and A′ aligned one above the other, with the segments to the right of the crossover point x exchanged.)

(To incorporate crossing-over directly into plans of type ℛ one of the resultant structures is discarded.)
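One-point crossing-over as defined in steps 1-3 is easy to express in code; the sketch below is illustrative only, and returns both resultants so that a plan of type ℛ as described here would simply discard one of them.

```python
import random

def crossover(A, A_prime):
    """One-point crossing-over of two equal-length strings (lists of attribute values).

    Step 2: pick x uniformly from {1, ..., l-1}; step 3: exchange the attributes
    to the right of position x.  Both resultants are returned.
    """
    l = len(A)
    x = random.randint(1, l - 1)
    return A[:x] + A_prime[x:], A_prime[:x] + A[x:]

# Example: crossing 110011 with 001100 at a randomly chosen point
offspring1, offspring2 = crossover(list("110011"), list("001100"))
```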
The quickest way to get a feeling for the role crossing-over plays in adaptation is to look at its effect upon schemata. To do this, consider ℬ(t) as a pool of schemata (following the suggestions of chapter 4) where the number M_ξ(t) of instances of ξ in ℬ(t) reflects ξ's current "usefulness." The two direct effects of crossing-over on this pool are:
1. Generation of new instances of schemata already in the pool. E.g., A = a₁a₂···a_l is an instance of the schema a₁a₂□···□ and, after crossing-over with A′ = a′₁a′₂···a′_l, we have a new instance of a₁a₂□···□, namely a₁a₂···a_x a′_{x+1}···a′_l (assuming a_i ≠ a′_i for some i > x). Each new instance of a schema ξ amounts to a new trial of the random variable corresponding to ξ. As such it increases the likelihood that the observed average performance μ̂_ξ of the instances of ξ closely approximates the expectation μ_ξ of the random variable ξ.
2. Generation of new schemata (i.e. schemata having neither A nor A′ as an instance). E.g., after the crossing-over of A with A′ the schema □···□ a_x a′_{x+1} □···□ has an instance, though neither A nor A′ is an instance of it (if a_x ≠ a′_x and a_{x+1} ≠ a′_{x+1}). Thus □···□ a_x a′_{x+1} □···□ will receive its first trial with the instance a₁a₂···a_x a′_{x+1}···a′_l, unless the schema has previously been introduced to the pool from another source.
Once again f(x) of Figure 10 provides a simple illustration. New instances of a schema such as 1□□...□ increase confidence that the observed average μ̂ of evaluations of f(x) for selected x ∈ 1□□...□ approaches the expectation for that schema. At the same time an instance of some previously untried schema, say 11□□...□, allows a plan of type ℛ to exploit the new schema (by giving it high rank) if it is above average.
LEMMA 6.2.1: Let A = a₁a₂···a_l and A′ = a′₁a′₂···a′_l differ in attribute values at x′ positions to the left of x + 1 and x″ positions to the right of x. Then either resultant of a single crossing-over of A with A′ at x will be an instance of 2^l − 2^(l−x′) − 2^(l−x″) + 2^(l−(x′+x″)) "new" schemata (instanced by neither A nor A′). It will also be a new instance of 2^(l−x′) + 2^(l−x″) − 2^(l−(x′+x″)) schemata already instanced by A or A′ (assuming x′ ≠ 0 and x″ ≠ 0).
Proof: After crossing-over, any schema which is defined at one or more of the x′ positions on the left and at one or more of the x″ positions on the right will have neither A nor A′ as an instance. On the left there are 2^x′ − 1 ways of combining one or more of the x′ attribute values with "□"s; similarly there are 2^x″ − 1 ways on the right; at the other l − (x′ + x″) positions either an attribute value or a □ is allowable without restriction. Thus there are (2^x′ − 1)(2^x″ − 1)·2^(l−(x′+x″)) = 2^l − 2^(l−x′) − 2^(l−x″) + 2^(l−(x′+x″)) "new" schemata of which the resultant is an instance.
If x′ > 0 and x″ > 0 the remainder of the 2^l schemata instanced by the resultant, i.e., 2^(l−x′) + 2^(l−x″) − 2^(l−(x′+x″)), will have a new instance (though they are not "new" schemata) since the resultant must differ by at least one attribute value from both A and A′.
Q.E.D.
In other words each of the 2^l schemata instanced by the resultant arises from a potentially useful manipulation of schemata already in the pool (those instanced by A and A′). Note also that, even when l is only 20, a single operation is processing over a million schemata!
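The counts in Lemma 6.2.1 are easy to confirm by brute force for small l. The script below is a throwaway check with an arbitrarily chosen pair of strings and crossover point, not part of the argument:

```python
from itertools import product

def is_instance(s, schema):
    """True if string s is an instance of schema (a tuple of values and '*')."""
    return all(c == '*' or c == v for c, v in zip(schema, s))

# Arbitrary example: l = 6, crossover point x = 3
A, A_prime, x = "110100", "001110", 3
resultant = A[:x] + A_prime[x:]

new, old = 0, 0
for mask in product([0, 1], repeat=len(A)):
    schema = tuple(v if m else '*' for m, v in zip(mask, resultant))
    if is_instance(A, schema) or is_instance(A_prime, schema):
        old += 1          # schema already instanced by A or A'
    else:
        new += 1          # "new" schema, instanced by neither precursor

x1 = sum(a != b for a, b in zip(A[:x], A_prime[:x]))   # x' differing positions on the left
x2 = sum(a != b for a, b in zip(A[x:], A_prime[x:]))   # x'' differing positions on the right
l = len(A)
assert new == 2**l - 2**(l - x1) - 2**(l - x2) + 2**(l - (x1 + x2))
assert old == 2**(l - x1) + 2**(l - x2) - 2**(l - (x1 + x2))
```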
We can gain additional insight concerning crossing-over by considering its effect, over an extended interval, on the whole pool of schemata in ℬ(t). In the absence of reproduction and other operators, crossing-over generates a kind of diffusion from the pool to schemata not represented therein. More precisely, repeated application of crossing-over to the individuals in ℬ(t) yields a "steady state" wherein, at any instant (time-step), each schema ξ has a well-defined probability of occurrence λ(ξ). It follows that the expected interval between occurrences of ξ will be just the reciprocal 1/λ(ξ) of this probability. Thus, if the proportions of schemata in ℬ(t) are not far removed from steady-state values, 1/λ(ξ) is a reasonable measure of the expected time to an occurrence of ξ. Of course, no actively adapting system (natural or artificial) following a plan of type ℛ will even begin to approach the steady state. Under such a plan, the steady state is continually "modulated" by changes in the number of instances of various ξ resulting from reproduction according to μ̂_ξ. In effect, with reproduction added, 1/λ(ξ) is a continually changing "background testing rate," giving at any time a rough estimate of the expected time to first occurrence of ξ. These ideas, together with values for λ(ξ), are established rigorously by
LEMMA 6.2.2: Repeated crossing-over (with uniform random pairing of individuals and in the absence of other operators) in a population ℬ(t) yields a "steady state" (i.e., a fixed point of the stochastic transformation) in which each schema ξ occurs with probability λ(ξ) = Π_j P_j(ξ), where P_j(ξ) is the overall proportion in ℬ(t) of the allele occurring at the jth position of ξ (if a "□" occurs at the jth position take P_j(ξ) = 1).
Proof: Let sξ₁ and sξ₂ be the resultants of a crossing-over of ξ₁ and ξ₂ at point x. Then a crossing-over of the resultants sξ₁ and sξ₂ at point x will bring back ξ₁ and ξ₂ (i.e., as may be determined directly from its definition, the crossover operator is self-dual).
Letting P(ξ) designate the proportion of (instances of) ξ in ℬ(t), we have P(ξ₁)P(ξ₂) as the probability that ξ₁ will be paired with ξ₂ for crossing-over (under uniform random pairing). Thus the probability that sξ₁, sξ₂ arise from a crossing-over of ξ₁, ξ₂ at x is P(ξ₁)P(ξ₂)P_x, where P_x is the probability that crossover takes place at x.
Similarly the probability of a reversion (ξ₁, ξ₂ arising from sξ₁, sξ₂ by crossover at x) is P(sξ₁)P(sξ₂)P_x.
Considering only the effects of crossing-over at x on the pairs ξ₁, ξ₂ and sξ₁, sξ₂, there will be no changes in their probabilities of occurrence if
P(ξ₁)P(ξ₂)P_x = P(sξ₁)P(sξ₂)P_x.
If (and only if) such an equation holds for every x and every ordered quadruple (ξ₁, ξ₂, sξ₁, sξ₂) will there be no change in the probability of occurrence of any schema.
To balance all of these equations simultaneously, note first that the set of alleles occurring at the jth positions of ξ₁ and ξ₂ is identical to the set occurring at the jth positions of sξ₁ and sξ₂ since, after crossing-over, the same alleles are still present at the jth positions (though to the right of x they will have been interchanged). Hence
P_j(ξ₁)P_j(ξ₂) = P_j(sξ₁)P_j(sξ₂).
Thus, if P(ξ) = Π_j P_j(ξ) for each ξ, as defined in the statement of the lemma, we have for any x, ξ₁, ξ₂, sξ₁, sξ₂,
P(ξ₁)P(ξ₂) = (Π_j P_j(ξ₁))(Π_j P_j(ξ₂)) = Π_j P_j(ξ₁)P_j(ξ₂)
= Π_j P_j(sξ₁)P_j(sξ₂) = (Π_j P_j(sξ₁))(Π_j P_j(sξ₂))
= P(sξ₁)P(sξ₂).
In other words, each of the equations will be balanced if the schemata occur with probabilities λ(ξ) = Π_j P_j(ξ); it is also clear that any departure from these probabilities will unbalance the equations in such a way as to result in changes in some of the probabilities of occurrence. Thus, the assignment λ(ξ) is the unique "steady state" (fixed point) of the crossover operator.
Q.E.D.
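As a small numerical illustration (mine, not from the text), the steady-state probability of Lemma 6.2.2 is simply the product of the population proportions of the schema's defining alleles:

```python
def steady_state_probability(schema, proportions):
    """lambda(xi) = product over the defining positions j of P_j(xi).

    schema:      e.g. "1*0**", with '*' standing for the 'don't care' symbol
    proportions: proportions[j][allele] = fraction of the population carrying
                 that allele at position j
    """
    p = 1.0
    for j, allele in enumerate(schema):
        if allele != '*':
            p *= proportions[j][allele]
    return p

# Population in which '1' occurs at position 1 with proportion 0.6 and
# '0' at position 3 with proportion 0.5:
props = [{'0': 0.4, '1': 0.6}, {'0': 0.5, '1': 0.5},
         {'0': 0.5, '1': 0.5}, {'0': 0.7, '1': 0.3}, {'0': 0.2, '1': 0.8}]
print(steady_state_probability("1*0**", props))   # 0.6 * 0.5 = 0.3
```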
We can see from the proof of this lemma that a kind of "pressure" toward the steady state,
Δ = P(ξ₁)P(ξ₂) − P(sξ₁)P(sξ₂),
can be defined for each quadruple ξ₁, ξ₂, sξ₁, sξ₂. If Δ ≠ 0 for any quadruple then probabilities of occurrence will start changing and there will be a diffusion toward the resultants sξ₁, sξ₂ (Δ > 0) or the precursors ξ₁, ξ₂ (Δ < 0). For example, if P(ξ₁) > λ(ξ₁) while the other components remain at their steady-state values, there will be a "movement to the right," that is, a tendency to increase the probabilities of the result. The following heuristic argument gives some idea of the rate of approach to steady state from such departures:
A given individual has probability 2/M of being involved in a crossover when ℬ(t) contains M individuals (since two individuals are involved in each application of the crossover operator). Thus in N trials a given individual can expect to undergo 2N/M crossing-overs. When N is in the vicinity of lM/2, where l is the length of individual representations, each individual in the population can be expected to have undergone independent crossing-over at almost every position. As a result even extreme departures from steady state should be much reduced in lM/2 trials.

The reduction to steady state does not, however, proceed uniformly with respect to all schemata because the crossover operator induces a linkage phenomenon. Simply stated, linkage arises because a schema is less likely to be affected by crossover if its defining positions are close together. In more detail, let ξ's defining positions (those not having a "□") be i₁ < i₂ < · · · < i_k and let the length of ξ be defined as l(ξ) = (i_k − i₁). Then the probability of the crossover falling somewhere in ξ, once an instance of ξ has been selected for crossing-over, is just l(ξ)/(l − 1). E.g., if A = a₁a₂a₃···a_l is selected for crossing-over, the probability of the crossover point x falling within ξ = □ a₂ □ □ a₅ □ · · · □ is 3/(l − 1). Clearly the smaller the length of a schema, the less likely it is to be affected by crossing-over. Thus, the smaller the length of ξ, the more slowly will a departure from λ(ξ) be reduced.
Alleles defining a schema ξ of small length l(ξ) which exhibits above-average performance will be tried ever more frequently as a unit under an adaptive plan of type ℛ. I.e., the alleles will be associated and tried accordingly. More modifications and tests of such schemata will be tried, and many of these trials will be of a variety of combinations with other similarly favored schemata defined at other positions. In effect such schemata serve as provisional structural elements or primitives. This observation is made precise by the following simple but important
THEOREM 6.2.3: Consider a reproductive plan of type ℛ using only the simple crossover operator, defined as a crossover operator with both precursors, and the single crossover point, determined by uniform random selection. Then the expected proportion of each schema represented in ℬ(t) changes in one generation from P(ξ, t) to
P(ξ, t + 1) ≥ [1 − P_c·(l(ξ)/(l − 1))·(1 − P(ξ, t))]·(μ̂_ξ(t)/μ̂(t))·P(ξ, t),
where P_c is the proportion of individuals undergoing crossover during a generation and μ̂(t) is the observed average performance of ℬ(t). (The unit of time here, a generation, is the expected time for an individual to produce its offspring.)
Proof: During one generation each individual A ∈ ℬ(t) can be expected to produce μ_E(A)/μ̂(t) offspring under a reproductive plan of type ℛ. The total expected offspring of the set of instances ℬ_ξ(t) of ξ in ℬ(t) is thus given by
(Σ_{A ∈ ℬ_ξ(t)} μ_E(A))/μ̂(t) = μ̂_ξ(t)·M_ξ(t)/μ̂(t).
If P_c is the proportion of ℬ(t) selected to undergo crossover and l(ξ) is the length of ξ, then a proportion P_c·l(ξ)/(l − 1) of these offspring will have a crossover falling within the defining positions of ξ. When an instance of ξ is crossed with another instance of ξ the result will also be an instance of ξ; otherwise the resultant may not be an instance of ξ. Since the probability of ξ crossing with ξ is P(ξ, t), no more than a proportion (1 − P(ξ, t))·P_c·l(ξ)/(l − 1) of the modified offspring of ξ can be expected to be instances of schemata other than ξ; the remainder, [1 − (1 − P(ξ, t))·P_c·l(ξ)/(l − 1)], will be instances of ξ.
That is,
P(ξ, t + 1) = M_ξ(t + 1)/M
≥ [1 − (1 − P(ξ, t))·P_c·l(ξ)/(l − 1)]·[μ̂_ξ(t)·M_ξ(t)/μ̂(t)]/M
= [1 − P_c·(l(ξ)/(l − 1))·(1 − P(ξ, t))]·(μ̂_ξ(t)/μ̂(t))·P(ξ, t).
(It should be noted that crossing-over applied to precursors which are not instances of ξ may yield a resultant which is an instance of ξ. Thus M_ξ(t + 1) may be enlarged, by a small amount usually, from sources outside ℬ_ξ(t); this of course only strengthens the above bound.)
Q.E.D.
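To make the bound concrete, here is a small illustrative calculation of the one-generation change predicted by Theorem 6.2.3, with arbitrarily chosen numbers:

```python
def schema_bound(P_xi, mu_xi, mu_avg, length_xi, l, P_c):
    """Lower bound on P(xi, t+1) from Theorem 6.2.3."""
    loss = P_c * (length_xi / (l - 1)) * (1 - P_xi)   # chance a crossover cuts xi
    return (1 - loss) * (mu_xi / mu_avg) * P_xi

# A short, above-average schema: length 2 out of l = 20, 10% of the population,
# observed average 1.25 times the population average, every individual crossed over.
print(schema_bound(P_xi=0.10, mu_xi=1.25, mu_avg=1.0,
                   length_xi=2, l=20, P_c=1.0))
# -> about 0.113: the schema's proportion still grows despite crossover losses.
```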
From this result we see that the proportion of (instances of) a schema ξ will increase as long as
[1 − P_c·(l(ξ)/(l − 1))·(1 − P(ξ, t))]·(μ̂_ξ(t)/μ̂(t)) ≥ 1
or, using the fact that 1/(1 − c) ≅ 1 + c for c ≪ 1,
μ̂_ξ(t) ≥ [1 + P_c·(l(ξ)/(l − 1))·(1 − P(ξ, t))]·μ̂(t).
Since the worst case occurs when P_c = 1 (every individual in ℬ(t) subjected to crossing-over) and P(ξ, t) is small, we see that ξ will always increase its representation if
μ̂_ξ(t) ≥ [1 + l(ξ)/(l − 1)]·μ̂(t).
Since 1/l ≤ l(ξ)/(l − 1) ≤ 1, short schemata need perform only slightly above average to increase, while the longest schemata (if they occur in small proportion) may have to exhibit twice the population average performance to increase.
Theorem 6.2.3 provides the first evidence of the intrinsic parallelism of genetic plans. Each schema represented in the population ℬ(t) increases or decreases according to the above formulation independently of what is happening to other schemata in the population. The proportion of each schema is essentially determined by its average performance in relation to the population average. Thus we see the evolution of a ranking of schemata based on observed performance, as suggested at the end of chapter 4 and amplified in section 5.4. Crossing-over serves this adaptive process by continually introducing new schemata for trial, while testing extant schemata in new contexts, all this without much disturbing the ranking process (except for the longer schemata). Moreover, crossing-over makes it possible for the schemata represented in ℬ(t) to move automatically to appropriate rankings through the application of the genetic plan to individual structures from 𝒜. As a result this very large number of rankings is compactly stored in a selected, relatively small population of individuals (exploiting the possibility suggested at the end of chapter 4).
By extending the pressure analogy introduced just before Theorem 6.2.3 we can gain a global view of the interaction of reproduction and crossover. Whenever some schema ξ exhibits better-than-average performance, reproduction introduces "pressures" Δ > 0, disturbing the steady state which would result from the action of the crossover operator alone. The disturbances both shift the steady-state values λ(ξ′) for large numbers of schemata, because of changes in the proportions P_j(ξ) of the alleles, 1 ≤ j ≤ l, and also introduce local transitory departures because P(ξ) > λ(ξ). Because all schemata are being affected simultaneously, and because reproduction affects them according to observed performance, we have a diffusion "outward" from schemata currently represented in ℬ(t), a diffusion which proceeds rapidly in the vicinity of schemata exhibiting above-average performance. This is closely analogous to a gas diffusing from some central location through a medium of varying porosity, where above-average porosity is the analogue of above-average performance. The gas will exhibit a quickened rate of diffusion wherever it encounters a region of higher porosity, rapidly saturating the whole region. All the while it slowly but steadily infuses enclaves of low porosity. In effect, high porosity is exploited wherever it occurs, without prejudicing eventual penetration into regions of lower porosity. As a result the overall rate of penetration is much more determined by regions of high porosity and their proximity to each other than by average porosity.
Restated in terms of schemata, regions of higher porosity correspond to sets of schemata of above-average performance which can be produced from each other by relatively few crossovers. Thus, following the analogy, local optima in performance are thoroughly explored in an intrinsically parallel fashion. At the same time the genetic plan does not get entrapped by settling on some local optimum when further improvements are possible. Instead all observed regions of high performance are exploited without significantly slowing the overall search for better optima. Here we begin to see in a more precise context the powers of generalized genetic plans, powers first suggested in the specific context of section 1.4.
One final point: Plans of type ℛ measure a schema's performance relative to the current average performance of the population. Thus, as time elapses, schemata must meet progressively higher criteria to attain (or retain) a high ranking. (This is, again, somewhat analogous to the slowed rate of occupation of a gas as it occupies successively larger volumes, higher porosity being required for the same occupation rate.) As a result, older schemata associated with local optima steadily lose ranking as better optima are located (unless the older schemata are components of the new schemata), so that capacity is not wasted on superseded regions.
The overall results of this section can be illustrated by elaborating the comment (on page 99) about f(x) of Figure 10. Using 6 bits of accuracy (l = 6), assume A₁ = .001100, A₂ = .000100, A₃ = .101000, A₄ = .110011, and A₅ = .011100 have been chosen at random to form ℬ(0). (The size of ℬ(0), M = 5, is of course much too small to be realistic even for an algorithm for artificial systems, but it is adequate to illustrate the effects of crossing-over.) Looking at Figure 10 we see that μ₁ = f(A₁) = f(.001100); similarly μ₂ = f(A₂), μ₃ ≈ 2, μ₄, and μ₅ can be read from the figure, and from these the observed average μ̂ of ℬ(0) follows. Accordingly A₁ will produce μ₁/μ̂ offspring on the average; i.e., at each selection A₁ has μ₁/Σ_h μ_h chances of being reproduced. Similarly A₂ will have μ₂/μ̂ offspring; and so on. Figure 12 shows a typical outcome for a plan of type ℛ using only reproduction and simple crossover on ℬ(0). (Thus, for the reproduction of A₁, a trial was made of a random variable yielding 1 with the appropriate probability and 0 otherwise; the outcome of the trial was 0.)

Fig. 12. Some effects of a type ℛ plan on a one-dimensional function. (The figure tabulates, for each member of ℬ(0) = {.001100, .000100, .101000, .110011, .011100}, the integer approximation to f(x), the crossover point used, and the resulting members of ℬ(1) = {.001000, .100100, .101011, .110000, .011100}.)

The crossing-over of A₄ with one of the replicates of A₃ at the marked intersection serves both to generate another (different) instance of 1□□...□ and to generate a first instance of a schema not previously represented in the pool. Clearly such a crossover becomes increasingly likely as instances of the above-average schemata 1□□...□ and □□□000 proliferate. (Points from these schemata are likely to exhibit above-average values and hence will have more offspring on the average.) Similar effects will be happening to all other schemata instanced in ℬ(0), ℬ(1), etc. Figure 13 displays a more elaborate example of these effects.
3. GENERALIZED GENETIC OPERATORS - INVERSION
Crossover, by inducing a linkage between alleles, offers the possibility of an adaptable net of associations between alleles. By changing the length of a schema we modify the probability of its being affected by crossover; instances of a shorter schema are less likely to have the defining alleles separated by crossover. In consequence, under a plan of type ℛ, instances of the shorter schema proliferate more rapidly. The long-term effect is a selective increase in the linkage of various schemata exhibiting above-average performance. The corresponding alleles (attributes) are more frequently found in association (on the same string) in successive generations. Since schemata are defined for any string-representable domain 𝒜, such an adaptable network of associations can be induced for any such domain by introducing an appropriate operator for changing linkage.
The linkage between the alleles defining a schema can be altered only by changing the length of the schema. That is, the positions of the alleles defining the schema (particularly the end-points) must be modifiable. However, up to this point, the functional meaning of an allele has been determined by its position. The allele a_i at the ith position of the representation of the structure A is the value δ_i(A) of the ith detector when A is its argument. Thus, if linkage is to be changed, an allele must have the same functional interpretation in any position (as is the case generally in genetics). This in turn requires a change in the method of representation.
The simplest way to change the method of representation formally is to assign each allele an index indicating the detector with which it is associated. That is, each allele is now taken to be a pair (i, a) indicating that a = δ_i(A). It follows that a structure A can be represented by any permutation of
((1, δ₁(A)), (2, δ₂(A)), . . . , (l, δ_l(A))).
For example,
((3, δ₃(A)), (2, δ₂(A)), (1, δ₁(A)), (4, δ₄(A)), . . . , (l, δ_l(A)))
would still represent A. Moreover, the schema ((1, δ₁(A)), □, □, (4, δ₄(A)), □, . . . , □) designates the same subset of 𝒜 as the schema (□, □, (1, δ₁(A)), (4, δ₄(A)), □, . . . , □), though the latter is more tightly linked than the former. To define this enlarged set of representations precisely, let V_i be redefined to be the set of pairs V_i′ = {(i, v), for all v ∈ V_i} and let σ† indicate the set of all permutations of the string (or l-tuple) σ. Then 𝒜₁′ = (Π_{i=1}^{l} V_i′)† is the enlarged set of all representations of elements in 𝒜. The set of schemata is enlarged accordingly to Ξ = (Π_{i=1}^{l} {V_i′ ∪ {□}})†.
The object now is to find an operator which, when used with crossover and reproduction, will tend to replace an above-average schema ξ with a permutation ξ′ ∈ ξ† of shorter length l(ξ′) < l(ξ). The genetic operator which fits this specification is inversion. It works by producing a crossover within a single structure as follows:
1. A structure A = a₁a₂···a_l is selected (usually at random) from the current population ℬ(t) (where each a_i, i = 1, . . . , l, now represents a pair (j, v) ∈ V_j′).
2. Two numbers, x₁′ and x₂′, are selected from {0, 1, 2, . . . , l + 1} (again at random) and are used to define x₁ = min{x₁′, x₂′} and x₂ = max{x₁′, x₂′}.
3. A new structure is formed from A by inverting the segment which lies to the right of position x₁ and to the left of position x₂, yielding
a₁ · · · a_{x₁} a_{x₂−1} a_{x₂−2} · · · a_{x₁+1} a_{x₂} a_{x₂+1} · · · a_l.

It is clear that a single inversion can bring previously widely separated alleles into close proximity, viz., a_{x₁} and a_{x₂−1} in the description. It is also clear that any possible permutation of the representation can be produced by an appropriate sequence of inversions. (More technically, the inversions (x₁ = 0, x₂ = 2), (x₁ = 1, x₂ = 3), . . . , (x₁ = l − 1, x₂ = l + 1) are sufficient to generate the group of all permutations of order l.) The effect of the inversion operator upon (the instances of) a schema ξ is to randomly produce permutations ξ′ of ξ with varying lengths. Though inversion alters the linkage of schemata, it does not alter the subsets of 𝒜 which they designate. Every permutation ξ′ of ξ designates the same subset in the set of (original) structures 𝒜 (since the same set of detector values occurs in both ξ and ξ′). The lengths of many schemata are affected simultaneously by a single inversion, so this operator too exhibits intrinsic parallelism. As with crossover, schemata of shorter lengths are less frequently affected by the inversion operator.
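A sketch of the inversion operator on the indexed representation follows (illustrative only; the endpoints are chosen as in step 2 above):

```python
import random

def inversion(A):
    """Invert the segment lying strictly between two randomly chosen cut points.

    A is a list of (detector_index, value) pairs, so each allele keeps its
    functional meaning when its position changes.
    """
    l = len(A)
    x1, x2 = sorted(random.randint(0, l + 1) for _ in range(2))
    if x2 - x1 < 2:                       # no interior segment to invert
        return list(A)
    return A[:x1] + A[x1:x2 - 1][::-1] + A[x2 - 1:]

A = [(1, 'a1'), (2, 'a2'), (3, 'a3'), (4, 'a4'), (5, 'a5'), (6, 'a6')]
print(inversion(A))
```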

Let us define the simple inversion operator as an inversion with both the structure selected for inversion and the two points x₁ and x₂ determined by uniform random selection. To see the combined effect of simple inversion, simple crossover, and reproduction we need only refer to Theorem 6.2.3. The theorem guarantees that, if inversion has produced a permutation ξ′ of ξ where l(ξ′) < l(ξ), then the proportion of ξ′ in ℬ(t) increases more rapidly than the proportion of ξ. For example, if P_c = 1 and P(ξ, t), P(ξ′, t) ≪ 1, we can expect
P(ξ′, t + 1) ≈ [(l − 1 − l(ξ′))/(l − 1 − l(ξ))]·(P(ξ′, t)/P(ξ, t))·P(ξ, t + 1)
since μ_ξ = μ_ξ′. Or, after T generations,
P(ξ′, t + T) ≈ [(l − 1 − l(ξ′))/(l − 1 − l(ξ))]^T·(P(ξ′, t)/P(ξ, t))·P(ξ, t + T).
As a result, any time inversion yields a shorter permutation ξ′ of a schema ξ of above-average performance, that permutation will rapidly predominate. Because the rate of reproduction of a schema is dependent upon its length, there is a constant "pressure" toward tighter linkage of the defining alleles of schemata. Because only schemata exhibiting above-average performance occupy substantial fractions of ℬ(t), the "pressure" is only important for such schemata. Inversion, by repeatedly varying the linkage, gives this pressure a chance to act.
A great many schemata are affected by each inversion, but tightly linked schemata are much less likely to be affected than loosely linked ones, so that variations are primarily in the loosely linked schemata. That is, changes in linkage are concentrated in the loosely linked (long) schemata of above-average performance, where changes are desirable. More precisely, if P_I is the proportion of the population undergoing inversion in a given generation, then the probability of a schema ξ of length l(ξ) being affected is
2·P_I·(l(ξ)/(l − 1))·(1 − l(ξ)/(l − 1)) = 2·P_I·[l(ξ)/(l − 1) − (l(ξ)/(l − 1))²],
where the second factor comes from the fact that an inversion wholly inside a schema does not affect its length. Hence, if l(ξ) = b·l(ξ′) < l/4, b > 1, for two schemata ξ and ξ′, ξ is almost b times as likely to have its length altered.
One new restriction must be made upon the crossover operator when it is used in combination with inversion. Because of inversion, two l-tuples in ℬ(t) will not always have the alleles for a given detector at the same position. Crossing-over can thus produce resultants with two (or more) alleles for a given detector, or resultants with no alleles for a given detector. For example, crossing
((1, a₁), (2, a₂), (3, a₃))  with  ((1, a₁′), (3, a₃′), (2, a₂′))
at x = 2 yields ((1, a₁), (2, a₂), (2, a₂′)) as one of the resultants. The simplest way to remedy this is to permit crossing-over only between homologous representations, where two representations are defined to be homologous if the detector indices (first number of each pair in the representation) are in the same order. For example, ((1, a₁), (3, a₃), (2, a₂)) is homologous to ((1, a₁′), (3, a₃′), (2, a₂′)), even if a_j ≠ a_j′ for some or all j, while ((1, a₁), (2, a₂), (3, a₃)) is not homologous to either of the foregoing. This remedy requires that the probability of inversion P_I be small so that there will exist substantial homologous subpopulations for the crossover operator to act upon. A second alternative (with a biological precedent) would be to temporarily make the second of the l-tuples chosen for crossover homologous to the first by reordering it, returning it to the population in its original order after the resultants of the crossing-over are formed. Under this alternative inversion can be unrestricted, i.e., P_I can be as large as desired.
Summing up: Inversion, in combination with reproduction and crossover, selectively increases the linkage (decreases the length) of schemata exhibiting above-average performance, and it does this in an intrinsically parallel fashion.
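The second alternative, temporarily reordering the mate, can be sketched as follows (the indexed-pair representation of this section is assumed):

```python
def make_homologous(A, mate):
    """Reorder `mate` so its detector indices follow the same order as A's.

    Both arguments are lists of (detector_index, value) pairs.  The reordered
    copy is used only for the crossover; the original `mate` stays in B(t).
    """
    by_index = {idx: (idx, val) for idx, val in mate}
    return [by_index[idx] for idx, _ in A]

A    = [(1, 'a1'), (3, 'a3'), (2, 'a2')]
mate = [(2, 'b2'), (1, 'b1'), (3, 'b3')]
print(make_homologous(A, mate))   # [(1, 'b1'), (3, 'b3'), (2, 'b2')]
```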
4. GENERALIZED GENETIC OPERATORS - MUTATION
Though mutation is one of the most familiar of the genetic operators, its role in adaptation is frequently misinterpreted. In genetics mutation is a process wherein one allele of a gene is randomly replaced by (or modified to) another to yield a new structure. Generally there is a small probability of mutation at each gene in the structure. In the formal framework this means that each structure A = a₁a₂···a_l in the population ℬ(t) is operated upon as follows:
1. The positions x₁, x₂, . . . , x_h to undergo mutation are determined (by a random process where each position has a small probability of undergoing mutation, independently of what happens at other positions).
2. A new structure A′ = a₁···a_{x₁−1} a′_{x₁} a_{x₁+1}···a_{x_h−1} a′_{x_h} a_{x_h+1}···a_l is formed, where a′_{x₁} is drawn at random from the range V_{x₁} of the detector δ_{x₁} corresponding to position x₁, each element in V_{x₁} being an equally likely candidate; a′_{x₂}, . . . , a′_{x_h} are determined in the same way.
If P_M is the probability of mutation at each position, then the probability of h mutations in a single representation is given by the Poisson distribution with parameter l·P_M.
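A minimal sketch of the mutation operator (binary alleles are assumed for the example; any finite V_i would serve):

```python
import random

def mutate(A, P_M, V=(0, 1)):
    """Independently, with probability P_M, redraw each position's allele
    uniformly from the detector's range V.  (A redraw may return the same
    value; drawing only *different* values would be a minor variant.)"""
    return [random.choice(V) if random.random() < P_M else a for a in A]

A_new = mutate([0, 1, 1, 0, 1, 0], P_M=0.01)
```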

If successive populations are produced by mutation alone (without reproduction), the result is a random sequence of structures drawn from 𝒜. The process is evidently enumerative (see section 1.5) since the order in which structures are generated is unaffected by the observed performances of the structures. Even a reproductive plan of type ℛ using only the mutation operator is little more than an enumerative plan retaining the best structure encountered to each point in time. That is, if P_M is small enough, reproduction will assure that structures with above-average performance predominate in successive generations, thus retaining the better structures generated by the mutation operator. There is actually a bit of history dependence since, with P_M small, the most likely structures resulting from mutation will differ by one or two alleles from the current "best" structures. Thus, the sequence of tests is not entirely random, though the dependence on observations is very unsophisticated compared to that generated by crossing-over.
Since enumerative plans are, at best, useful in very limited situations, it would seem that mutation's primary role is not one of generating new structures for trial, a role very efficiently filled by crossing-over. It might be objected that crossing-over cannot generate all possible combinations of alleles unless the population ℬ(t) contains at least one copy of every allele. However this is not a burdensome requirement. If k is the maximum number of alleles for any detector, then as few as k strings will suffice to provide a copy of each allele. (E.g., if V_i = {0, 1}, i = 1, . . . , l, then the two l-tuples 00···0 and 11···1 suffice.) There is nevertheless a difficulty which is remedied by mutation. In a population that is small relative to 𝒜, there is always the possibility that the last copy of some allele will be eliminated during the deletion phase of a plan of type ℛ. Alleles which occur in structures of below-average performance are particularly susceptible; yet at some later stage these same alleles may be required in a combination necessary for further improvement. Stated another way, the lost allele may be necessary for the adaptive plan to escape a false peak. Once an allele is lost from a population, the crossover operator has no way of reintroducing it. Here, then, is a role uniquely filled by mutation, because it assures that no allele permanently disappears from the population.
Mutation introduces an additional source of loss for schemata undergoing reproduction. If the probability of mutation at each position is less than or equal to P_M, then a schema ξ defined on l₀(ξ) positions can expect to undergo one or more mutations with probability
1 − (1 − P_M)^(l₀(ξ)),
which is approximately equal to l₀(ξ)·P_M when P_M is small relative to 1/l. Thus, adding mutation to the list of operators in Theorem 6.2.3, we get
COROLLARY 6.4.1: Under a reproductive plan of type ℛ using the simple crossover operator and mutation, the expected proportion of each schema represented in ℬ(t) changes in one generation from P(ξ, t) to
P(ξ, t + 1) ≥ [1 − P_c·(l(ξ)/(l − 1))·(1 − P(ξ, t))]·(1 − P_M)^(l₀(ξ))·(μ̂_ξ(t)/μ̂(t))·P(ξ, t).
Unlike the case for crossing-over, mutation is a constant source of loss for a schema ξ, with P_M fixed, even when P(ξ, t) = 1. In effect it is a "disturbance" introduced to prevent entrapment on a false peak.
Summing up: Mutation is a "background" operator, assuring that the crossover operator has a full range of alleles so that the adaptive plan is not trapped on local optima. (Of course if there are many possible alleles, e.g., if we consider a great many variants of the nucleotide sequences defining a given gene, then even a large population will not contain all variants. Then mutation serves an enumerative function, producing alleles not previously tried.)

5. FURTHER INCREASES IN POWER
The next chapter will establish that the three genetic operators just described are adequate for a robust and general purpose set of adaptive plans, with one important reservation which will be discussed at the end of this section. However, there are additional operators which can make significant contributions to efficiency in more complex situations. Chief among these is the dominance-change operator which (among other things) helps to control losses resulting from mutation. Because losses resulting from mutation, for given P_M, do not diminish as schema ξ gains high rank, a constant "load" is placed on the adaptive plan by the random movements away from optimal configurations. For this reason it is desirable to keep the mutation rate P_M as low as possible consistent with mutation's role of supplying missing alleles. In particular, if the rate of disappearance of alleles can be lowered without affecting the efficiency of the adaptive plan, then the mutation rate can be proportionally lower. Since the main cause of disappearance of alleles is sustained below-average performance, the rate of loss can be reduced by shielding such alleles from continued testing against the environment. Dominance provides just such shielding.
To introduce dominance, we must extend the method of representation once again. Pairs of alleles will be used for each detector, so that a representation involves a pair of homologous l-tuples. The object is to let some of the extra alleles be carried along with the others in an unexpressed form, forming a kind of reservoir of protected alleles. Precisely, then, the set of representations will be extended to the set of all permutations of homologous pairs, 𝒜₂′ = (Π_{i=1}^{l} (V_i′)²)†. Since there is now a pair of alleles at each position there is no longer a direct correspondence between the detector values for a structure A and the representation of A.
Let (A′, A″) be a homologous pair of l-tuples drawn from 𝒜₂′ and let the pair of alleles occurring at the ith position of the l-tuples be ((h, v′), (h, v″)), where v′, v″ ∈ V_h. The most direct way to relate this pair of l-tuples to a structure is to designate either v′ or else v″ as the value of detector h, ignoring the other allele. The allele so designated will be called dominant, the other recessive. For each position i, this designation should be completely determined by information available in the pair (A′, A″). Formally, for each i there should be a dominance map d_i: 𝒜₂′ → 𝒜 such that, for each (A′, A″) ∈ 𝒜₂′, d_i(A′, A″) is either the first allele or the second allele of the ith pair of (A′, A″). It should be emphasized that in this general form the determination of the dominant allele at the ith position may depend upon the whole context (i.e., the other alleles in (A′, A″)). (This corresponds closely with Fisher's [1937, Chapter III] theory of dominance.) A simpler approach makes the determination dependent only upon the ith pair itself. Thus, for each h,
d_h: V_h² → V_h such that d_h(v′, v″) ∈ {v′, v″}
and
d_i: 𝒜₂′ → 𝒜 such that, for the ith pair ((h, v′), (h, v″)), d_i(A′, A″) = d_h(v′, v″).
Accordingly (A′, A″) represents the structure A ∈ 𝒜 for which
δ_h(A) = d_{i(h)}(A′, A″),
where i(h) is the index of the pair of alleles in (A′, A″) for detector h.
A particularly interesting example of the simpler dominance map, useful for binary (two allele) codings (see chapter 4), can be constructed as follows. Let V_h = {1, 1₀, 0}, where 1₀ is to be recessive whenever it is paired with 0, and let the mapping d_h: V_h² → V_h be given by the following table:

v′   v″    d_h(v′, v″)
1    0     1
1    1₀    1
1    1     1
1₀   0     0
1₀   1₀    1
1₀   1     1
0    0     0
0    1₀    0
0    1     1

Then, for example, the representation
A′ = ((1, 0), (3, 1₀), (2, 1₀), (4, 1)),
A″ = ((1, 1₀), (3, 0), (2, 1), (4, 0))
maps to the (unpermuted) representation
((1, 0), (2, 1), (3, 0), (4, 1)) = 0101.
That is, (A′, A″) represents the structure A ∈ 𝒜 for which
δ₁(A) = 0, δ₂(A) = 1, δ₃(A) = 0, δ₄(A) = 1.
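The table-based dominance map is easy to encode. The sketch below is an illustration only; it uses '1o' for the allele 1₀ and a pair of l-tuples like the example above:

```python
# Expressed value for each ordered pair of alleles drawn from {1, 1o, 0};
# '1o' behaves as a 1 except that it is recessive whenever paired with 0.
DOMINANCE = {
    ('1', '0'): '1',  ('1', '1o'): '1',  ('1', '1'): '1',
    ('1o', '0'): '0', ('1o', '1o'): '1', ('1o', '1'): '1',
    ('0', '0'): '0',  ('0', '1o'): '0',  ('0', '1'): '1',
}

def express(A1, A2):
    """Map a homologous pair of (detector, allele) l-tuples to detector values."""
    lookup = {det: allele for det, allele in A2}
    expressed = {det: DOMINANCE[(allele, lookup[det])] for det, allele in A1}
    return [expressed[det] for det in sorted(expressed)]

A1 = [(1, '0'), (3, '1o'), (2, '1o'), (4, '1')]
A2 = [(1, '1o'), (3, '0'), (2, '1'), (4, '0')]
print("".join(express(A1, A2)))   # -> 0101
```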
In order to examine the effect of dominance on genetic plans, the simple crossover operator must be extended to this new type of representation (inversion takes place, as before, on the individual l-tuples in the homologous pairs). To cross the homologous pair (A′, A″) with the pair (A‴, A⁗), the procedure will be to cross A′ with A‴ with probability P_c, and then select one of the resultants at random. Similarly, A″ is crossed with A⁗ and, again, one of the resultants is selected at random. The two selected resultants are then paired to yield one of the outcomes of the extended operation; the other two resultants are paired to yield the other outcome (if it is to be used).
To see the effect of dominance on the mutation rate, let us consider the case of two alleles v₁, v₀ at position i, where v₁ is dominant and v₀ is recessive. There are four distinct pairs of these alleles which can occur at position i in (A′, A″), namely {(v₁, v₁), (v₁, v₀), (v₀, v₁), (v₀, v₀)}, and only one of these maps to v₀ under the dominance map. That is, in the pairs (v₁, v₀) and (v₀, v₁) the allele v₀ is shielded or stored without test (because the representation maps to one where only allele v₁ is present). Stated another way, allele v₀ is only expressed or tested when it occurs in the pair (v₀, v₀). Let us assume that, on the average, the adaptive plan is to provide at least one occurrence of each allele in every T generations. That is, P(v₀, t) ≥ 1/MT must be assured. In the absence of dominance (using the earlier single l-tuple representation), let the reproduction rate of v₀ (corrected for operator losses) be (1 − ε(v₀)), exclusive of additions resulting from mutation. Then
P(v₀, t + 1) = (1 − ε(v₀))·P(v₀, t) + P_M·(1 − P(v₀, t)) − P_M·P(v₀, t).
To keep P(v₀, t) ≥ 1/MT for all t, P_M must be at least large enough to maintain the steady state P(v₀, t) = P(v₀, t + 1) = 1/MT. That is,
1/MT = (1 − ε(v₀))/MT + P_M·(1 − 2/MT)
or
P_M = ε(v₀)/((1 − 2/MT)·MT).
If MT is at all large (as it will be for all cases of interest) this reduces to
P_M ≅ ε(v₀)/MT
as a close approximation to the mutation rate required without dominance. (In the extreme case that alleles v₀ are deleted whenever they are tested, P_M = 1/MT.)
With dominance, the allele v₀ is subject to selection only when the pair (v₀, v₀) occurs. Under crossover, as extended to homologous pairs, the pair (v₀, v₀) occurs with probability P²(v₀, t). The loss from selection then is
2·ε(v₀)·P²(v₀, t)·M,
the factor 2 occurring because 2 copies of v₀ are lost each time the pair (v₀, v₀) is deleted. Again the gains from mutation are
P_M·(1 − P(v₀, t))·2M − P_M·P(v₀, t)·2M,
where the factor 2 occurs because the M homologous pairs are 2M l-tuples. Thus
P(v₀, t + 1) = P(v₀, t) − 2·ε(v₀)·P²(v₀, t) + 2·P_M·(1 − 2P(v₀, t))
for the homologous pairs with dominance. Setting P(v₀, t) = P(v₀, t + 1) = 1/MT as before, and solving, we get
P_M = ε(v₀)/((1 − 2/MT)·(MT)²).
We have thus established

LEMMA 6.5.1: To assure that, at all times, each allele a occurs with probability P(a, t) ≥ 1/MT, the mutation rate P_M must be ≅ 1/MT in the absence of dominance, but only ≅ (1/MT)² with dominance.
For example, to sustain an average density of at least 10⁻³ for every allele, the mutation rate would have to be 10⁻³ without dominance, but only 10⁻⁶ with dominance.
It should be noted that, with dominance, P(v₀, t) is no longer the expected testing rate. Although dominance allows the constant mutation load to be reduced, while maintaining a given proportion of disfavored alleles as a reserve, the testing rate of the reserved alleles is only P²(v₀, t), not P(v₀, t). This reservoir is only released through a change in dominance.
Dominance change in the general case di : a ~- + a , i = I , . . . , I, occurs
simply through a changein context, so that dominance is directly subject to adaptation
by selectionof appropriate contexts. In the more restricted casedAn - + VA
a special operator is required. The example using VA= { I , 10, O} will serve to
illustrate the process. The basic idea will be to replace some or all occurrencesof
I by 10, and vice versa, in an I-tuple. Thus the previous recessivesbecomedominant
and vice versa, this changebeing transmitted to all progeny of the I-tuple. A simple
way to do this is to designatea specialinversion operator which not only inverts a
segment but carries out the replacement in the inverted segment. ( In genetics,
there is a distant analogue in the effects produced by changesof context when a
region is inverted, but it should not be taken literally .) Thus for the dominancechangeinversion operator, step 3, p.l O7of the inversion operator is followed by
4. In the inverted segment each occurrence
versa.

of 1 is replacedby 10and vice

With this operator the defining alleles of an arbitrary schema can be "put in reserve" in a single operation, to be "released" later, again in a single operation. Dominance provides a reserved status not only for alleles but, more importantly, for schemata. A useful schema ξ₁ defined on many positions may be the result of an extensive search. As such it represents a considerable fragment of the adaptive plan's history, embodying important adaptations. When it is superseded by a schema ξ₂ exhibiting better performance, it is important that ξ₁ not be discarded until it is established that ξ₂ is useful over the same range of contexts as ξ₁. ξ₂'s performance advantage may be temporary or restricted in some way, or ξ₁ may be useful again in some context engendered by ξ₂. In any case it is useful to retain ξ₁ for a period comparable to the time it took to establish it. Dominance makes this possible.
Summing up: Under dominance, a given minimal rate of occurrence of alleles can be maintained with a mutation rate which is the square of the rate required in the absence of dominance. Moreover, with the dominance-change operator the combination of alleles defining a schema can be "reserved" (as "recessives") or "released" (as dominants) in a single operation.
When the performance function depends upon many more or less independent factors, there is another pair of operators, segregation and translocation, which can make a significant contribution to efficiency. In such situations it is useful to make provision for distinct and independent sets of associations (linkages) between genes. This again calls for an extension in the method of representation. Let each element in 𝒜 be represented by a set of homologous pairs of n-tuples, and let crossover be restricted to homologous n-tuples. After two elements of 𝒜, A and A′, are chosen for crossover, and after all homologous pairs have been crossed (as detailed under the discussion of dominance change), then from each pair of resultants one is chosen at random to yield the offspring's n-tuples. Each offspring thereby consists of the same number of homologous pairs of n-tuples as its progenitors. The genetic counterpart of this random selection of resultants is known as segregation. Clearly, under segregation, there is no linkage between alleles on separate nonhomologous n-tuples, while alleles on homologous n-tuples are linked as before. With this representation it is natural to provide an operator which will shift genes from one linkage set to another (so that, for example, schemata that are useful in one context of associations can be tested in another). The easiest way to accomplish this is to introduce an exceptional crossover operator, the translocation operator, which produces crossing-over between randomly chosen nonhomologous pairs.
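A minimal sketch of these two operators, assuming each individual is simply a list of homologous pairs of n-tuples (the representation names and the one-point crossover are illustrative choices, not the book's specification):

```python
import random

# Sketch: segregation keeps one resultant at random from each crossed
# homologous pair; translocation is an exceptional crossover between two
# randomly chosen nonhomologous n-tuples.

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def segregate(pairs):
    """From each crossed homologous pair keep one resultant at random."""
    return [random.choice(crossover(x, y)) for x, y in pairs]

def translocate(pairs):
    """Cross two randomly chosen nonhomologous n-tuples."""
    p, q = random.sample(range(len(pairs)), 2)
    new_a, new_b = crossover(pairs[p][0], pairs[q][0])
    pairs[p] = (new_a, pairs[p][1])
    pairs[q] = (new_b, pairs[q][1])
    return pairs

individual = [(list("ABCD"), list("abcd")), (list("WXYZ"), list("wxyz"))]
print(segregate(individual))
print(translocate(individual))
```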
Another genetic operation provides a means of adaptively modifying the effective mutation rate for different closely linked sets of alleles. The operator involved is intrachromosomal duplication (see Britten 1968); it acts by providing multiple copies of alleles on the same n-tuple. To interpret this operation, n-tuples with multiple copies of the alleles for a given gene must be mapped into the set of original structures. This can be done most directly by extending the concept of dominance to multiple copies of alleles. With this provision, if there are k copies of a given allele a, the probability of one or more mutations of allele a is k times greater than if there were but one copy. That is, the probability of occurrence, via mutation, of an allele a′ ≠ a is increased k times. Thus, increases and decreases in the number of copies of an allele have the effect of modifying the (local) mutation rate. In genetics the decreases are provided by deletion. The easiest generalization of these operators is an operator which doubles (or halves) the number of copies at a randomly chosen (set of adjacent) location(s).
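The effect on the local mutation rate can be made concrete with a short sketch (illustrative assumptions only: copies are held as integer counts per locus, and mutation of each copy is independent with probability p_M):

```python
import random

# Sketch: holding k copies of an allele multiplies the chance that at least
# one of them mutates, so duplication/deletion acts as a local control on
# the effective mutation rate.

def effective_mutation_rate(p_m, k):
    """Probability of one or more mutations among k independent copies."""
    return 1.0 - (1.0 - p_m) ** k     # ~ k * p_m when p_m is small

def duplicate_or_halve(copies, double=True):
    """Double (or halve) the number of copies at a randomly chosen locus."""
    i = random.randrange(len(copies))
    copies[i] = copies[i] * 2 if double else max(1, copies[i] // 2)
    return copies

print(effective_mutation_rate(1e-4, 1))   # 1.0e-4
print(effective_mutation_rate(1e-4, 8))   # ~8.0e-4
print(duplicate_or_halve([1, 1, 4, 1]))
```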
Though the operators just described are useful, they are not necessary. Moreover they do not compensate for the major shortcoming of genetic plans which use just the first three operators described. That shortcoming is the complete dependence of such plans upon the detectors determining the representation. If the set of detectors {δᵢ} is inadequate, in any way, the plan must operate within that constraint. However, if the plan could add or modify detectors at need, it could circumvent the difficulty. This implies making the detectors themselves subject to adaptation. When we note that each detector can be specified by an appropriate subroutine (string of instructions) for a general purpose computer, a way of making this extension suggests itself. By keeping the number of basic instructions from which the subroutines are constructed small, we can treat them as alleles. 𝒜 can then be extended to include all strings of basic instructions. In this way 𝒜 contains a representation of any possible detector, set of detectors, or, in fact, any effectively describable way of processing information. Moreover, under this extension, favored schemata correspond to useful coordinated sets of instructions (such as detectors). Genetic plans applied to 𝒜, so extended, can thus develop whatever functions or representations they need. This problem and the suggested approach are complex enough to merit a chapter, chapter 8.
The Jacob-Monod (1961) "operon" model of the functioning of the chromosome has an interesting relation to the extension of 𝒜 just suggested. In the extension, we can think of each element of 𝒜 as a program processing inputs from the environment to produce outputs affecting that environment (cf. chapter 3.4, where such transformations are the outputs). The performance of the element is thus directly determined by the relevance or fitness of the program. The "operon" model treats the chromosome as a similar information-processing device. Each gene can either be active (cf. the execution of an instruction) or inactive. When active the gene is participating in the production of signals (enzymes) which modulate the ongoing activity of the cell. It thereby determines the cell's modes of action and critical aspects of its structure. The genes are collected in groups (operons) such that all genes in the group are either simultaneously active or inactive, as determined by one control gene in the group called an "operator gene" (or, more recently, a "receptor gene" in Britten and Davidson 1969; see Fig. 14). The remainder of the cell is treated as the chromosome's environment. The action of the "receptor gene" is conditional upon the presence of signals (proteins) from the cell (usually through the mediation of other genes, the "repressor" or "sensor" genes). In this way one operon can cause the cell to produce signals which (with controlled delays) turn on other operons. This provision for action conditional upon previous (conditional) actions gives the chromosome tremendous information-processing power. In fact, as will be shown in chapter 8, any effectively describable information-processing program can be produced in this way.

6. INTERPRETATIONS
For the geneticist, the picture of the process of adaptation which emerges from the mathematical treatment thus far exhibits certain familiar landmarks:
Natural selection directs evolution not by accepting or rejecting mutations as they occur, but by sorting new adaptive combinations out of a gene pool of variability which has been built up through the combined action of mutation, gene recombination, and selection over many generations. For the most part Darwin's concept of descent with modification fits in with our modern concept of interaction between evolutionary processes, because each new adaptive combination is a modification of an adaptation to a previous environment. (p. 31)

Inversions and translocations of chromosomal segments, when present in the heterozygous condition, can increase genetic linkage and so bind together adaptive gene combinations. . . . The importance of such increased linkage is due to the number of diverse genes which must contribute to any adaptive mechanism in a higher plant or animal. (p. 57)

Stebbins in Processes of Organic Evolution

Not only do we claim in this case [of inversions found in D. pseudoobscura and D. persimilis] that the precise pairing of the chromosomes in the species hybrids shows that the chromosomal material has had a common source, but we also claim that the sequence of rearrangements [produced by inversions] that occurred in the chromosome reconstructs for us the precise pattern of change that led up to and then beyond the point of speciation.

Wallace in Chromosomes, Giant Molecules, and Evolution (p. 49)

At the same time the emphasis on gene interaction poses a series of difficult problems:

Intricate adaptations, involving a great complexity of genetic substitutions to render them efficient, would only be established, or even maintained in the species, by the agency of selective forces, the intensity of which may be thought of, broadly, as proportional to their complexity.

Fisher in Evolution as a Process, ed. Huxley et al. (p. 117)

The interaction of genes is more and more recognized as one of the great evolutionary factors. The longer a genotype is maintained in evolution, the stronger will its developmental homeostasis, its canalizations, its system of internal feedbacks become. . . . one of the real puzzles of evolution is how to break up such a perfectly co-adapted system in such a way so as not to induce extinction . . .

Mayr in Mathematical Challenges to the Neo-Darwinian Interpretation of Evolution, ed. Moorhead & Kaplan (p. 53)

The other and I think more interesting problem, which we have hardly begun to solve, is the question: How many changes of information are necessary to explain evolution?

Waddington in Mathematical Challenges to the Neo-Darwinian Interpretation of Evolution, ed. Moorhead & Kaplan (p. 96)
of
And, even though the centennial for the Origin of Species has passed, speciation still lacks a general mathematical explanation. Moreover, the question of "enough time" plagues the neo-Darwinian almost as much as it did his predecessors. It is a question which weighs heavily if it is assumed that coadapted sets of alleles occur only by the spread of mutant alleles to the point that relevant combinations are likely (see Eden's [1967] comments).
In the present context each of these questions can be rephrased in terms of the processing of schemata by genetic operators. This allows us to probe the origin and development of coadapted sets of alleles much more deeply, particularly the way in which different genetic mechanisms enable exploitation of useful epistatic effects. In the next chapter we will be able to extend Corollary 6.4.1 to demonstrate the simultaneous rapid spread of sets of alleles, as sets, whenever they are associated with above-average performance (because of epistasis or otherwise). Theorem 7.4 establishes the efficiency of this process for epistatic interactions of arbitrary complexity (i.e., for any fitness function μ: 𝒜 → 𝒰, however complex). Section 7.4 gives a specific example of the process in genetic terms and exhibits a version of Fisher's (1930) theorem applicable to arbitrary coadapted sets. Finally, in section 9.3, the formalism is extended to give an approach to speciation. This extension suggests reasons for competitive exclusion within a niche, coupled with a proliferation of (hierarchically organized) species when there are many niches.
For the nongeneticist, the illustration at the end of section 6.2 should convey some of the flavor of algorithms of type ℛ as optimization procedures. It is easy enough to extend that illustration to cover inversion and mutation. For example, under the revised representation of section 6.3 each bit δ is paired with a number j designating its significance (i.e., (j, δ) designates the bit δ·2^j). Thus bits of different orders can be set adjacent to each other in a string without changing their significance. In consequence, under the combined effect of inversion and reproduction, bits defining various regions of above-average values for f(x) will be ever more tightly linked. This in turn increases the rate of exploration of intersections and refinements of these regions. Filling in the remaining details to complete the extension of the illustration is a straightforward exercise. Section 7.3 in the next chapter provides a detailed example of the response of an algorithm of type ℛ to nonlinearities. Theorem 7.4 of that chapter, coupled with the comments on dimensionality in chapter 4 (p. 71), shows that, whatever the form of f (i.e., for any f mapping a bounded d-dimensional space into the reals), an algorithm of type ℛ optimizes expeditiously. Moreover, the algorithm does this while rapidly increasing the average value of the points it tests (though they may be scattered through many different hyperplanes), thus making the algorithm useful for "online" control. Sections 9.1 and 9.3 provide more detailed summaries of these advantages.


7. The Robustness of Genetic Plans

We cannot distinguish between a realistic and an unrealistic adaptive hypothesis or algorithm without a good estimate of the underlying adaptive plan's robustness: its efficiency over the range of environments it may encounter. By determining the speed and flexibility of proposed adaptive mechanisms (in the intended domains of action), we gain a critical index of their adequacy. The framework of concepts and theorems has expanded now to the point that we can tackle such questions rigorously. The robustness established here is a general property holding for particular plans of type ℛ in any string-represented domain 𝒜; furthermore, the basic theorem holds for any payoff function μ: 𝒜 → 𝒰. We can also address ourselves directly to related questions of the automatic determination, retention, and use of relevant history to increase efficiency.

1. ADAPTIVE PLANS OF TYPE ℛ₁(P_C, P_I, P_M, {c_t})

Genetic plans will be the main vehicle for this investigation, both as test cases and to illustrate formal approaches to questions of robustness. In particular, the investigation will use, as prototypes, plans of type ℛ₁ employing the three operators simple crossover, simple inversion, and mutation. (To retain the one-operator format of the original specification of ℛ₁, the combined effect of the three operators could easily be reinterpreted as the effect of a single composite operator; for expository purposes it is easier to treat the operators individually.) The basic parameters are:

P_C, the constant probability of applying simple crossover to a selected individual,

P_I, the constant probability of applying simple inversion to a selected individual,

P_M, the initial probability of mutation of an allele (all alternatives for the allele being equilikely outcomes),
C"

an arbitrary sequencesatisfyingthe conditions: (i) 0 ~ c, ~ 1, (ii )


: ~ 1.
C, - + 0, (iii ) E ,c, - + ~ ; e.g., c, = ( l / t) , 0 < cx

[The steps of the ℛ₁ algorithm are only partly recoverable at this point in the extraction. The recoverable portion: define the random variable Rand_t on S_M = {1, . . . , M} by assigning to each h ∈ S_M a probability proportional to its observed performance μ̂_h(t); make one trial of Rand_t and designate the outcome i(t).]

(The sequence {c_t} is included primarily for its effects when genetic plans are used as algorithms for artificial systems; it is used to drive the mutation rate to zero, while assuring that every allele is tried in all possible contexts. {c_t} is not intended to have a natural system counterpart and its effects can be ignored in that context. See below.)
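A short Python sketch of a schedule satisfying conditions (i)-(iii) above; the schedule c_t = (1/t)^α and the use of c_t·P_M as the step-t mutation probability follow the text, while the particular parameter values are illustrative only.

```python
# Sketch of a mutation schedule satisfying conditions (i)-(iii):
# c_t = (1/t)**alpha with 0 < alpha <= 1 stays in [0, 1], tends to 0,
# and its partial sums grow without bound (so every allele keeps being tried).

def c(t, alpha=1.0):
    return (1.0 / t) ** alpha

def effective_mutation_probability(t, p_m, alpha=1.0):
    # the plan applies mutation with probability c_t * P_M at step t
    return c(t, alpha) * p_m

partial_sum = sum(c(t) for t in range(1, 100_001))
print(partial_sum)          # ~12.1 and still growing (diverges like log t)
print(effective_mutation_probability(1_000, 1e-2))
```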

[A further step, partly recoverable, assigns probability 1/M to each element and makes a random trial accordingly.] Plans in the subclass of ℛ₁ just described will be designated ℛ₁(P_C, P_I, P_M, {c_t}). The performances observed at step 2.3 will be taken to be trials of a random variable assigned from some predetermined set 𝒰 to each structure A ∈ 𝒜₁ (see section 2.1); successive trials of the same structure will in general yield different performances. (For clarity, this stochastic effect is treated explicitly here, rather than using the formally equivalent approach of subsuming the effect in the stochastic action of the operators.) It will be assumed that each random variable in 𝒰 has a well-defined mean and variance.
The study below shows that the algorithm works well and efficiently when 𝒜₁ is small (e.g., when 𝒜₁ has two elements, as in the two-armed bandit problem) as well as when 𝒜₁ is large. When 𝒜₁ is small relative to M the genetic operators are unimportant, replication alone (step 4) being adequate to the task. However the algorithm's power is most evident when it is confronted with problems involving high dimensionality (hundreds to hundreds of thousands of attributes, as in genetics and economics) and multitudes of local optima. Computational mathematics has little to offer at present toward the solution of such problems and, when they arise in a natural context, they are consistently a barrier to understanding. For this reason it will be helpful in evaluating the algorithm to keep in mind the case where 𝒜₁ (and 𝒰 as well) is a very large set, finite only by virtue of a limited ability to distinguish its elements (e.g., because the detectors have a limited resolution). The ultimate finiteness of 𝒜₁ is convenient, since then the number of attributes or detectors l can be held fixed, but it is not essential. Chapter 8 will discuss the changes required when l is an unbounded function of t.
That step 5.3 assures continued testing of all alleles in all contexts follows from

LEMMA 7.1: Under algorithms of type ℛ₁(P_C, P_I, P_M, {c_t}), the expected number of trials of the jth value for the ith attribute (i.e., allele j of detector i), for any i and j, is infinite.

Proof: (Essentially this proof is a specialized version of the Borel zero-one criterion.) Let P_ij(t) be the probability of occurrence at time t of v_ij, the jth value for the ith attribute. Then Σ_{t=1}^∞ P_ij(t)·M is the expected number of occurrences of v_ij over the history of the system. Unless Σ_t P_ij(t)·M is infinite, v_ij can be expected to occur only a finite number of times. That is, unless Σ_t P_ij(t)·M is infinite, v_ij will at best be tested only a finite number of times in each context, and it may not be tested at all in some contexts. (Despite this, a plan for which Σ_t P_ij(t)·M is large relative to the size of 𝒜₁ may be quite interesting in practical circumstances.) Since Σ_t c_t → ∞, we have Σ_t c_t·P_M → ∞. But P_ij(t) ≥ c_t·P_M for all t, whence Σ_t P_ij(t)·M ≥ Σ_t c_t·P_M·M. Hence Σ_t P_ij(t)·M is also infinite in the limit.
Q.E.D.

2. THE ROBUSTNESS OF PLANS ℛ₁(P_C, P_I, P_M, {c_t})
It might seem that the natural first step in establishing the robustness of an adaptive plan would be to show that it will ultimately converge to an optimal structure. However, as early as section 1.5 it was possible to make a good argument against convergence as a criterion for distinguishing useful plans. Enumerative plans converge, yet in all but the most restricted circumstances they are useless either as hypotheses or algorithms. Moreover, when data can be retained by no more than M structures and M < |𝒜₁|, no plan for searching 𝒜₁ can yield convergence. More formally, for any M < |𝒜₁| there exists ε(M) > 0 such that, as T → ∞,

(1/T) Σ_{t=1}^{T} P(𝒜*, t) → 1 − ε(M),

where 𝒜* is a subset of 𝒜₁ consisting of one or more structures with optimal mean performance (i.e., structures A* ∈ 𝒜* such that the mean of the random variable μ_A* is at least as high as the mean for any A′ ∈ 𝒜₁). This is so because, for any finite sequence of trials of a suboptimal structure in 𝒜₁, there is a non-zero probability that its observed average performance will exceed the observed performance of the optimal structure(s) (assuming overlapping distributions). Clearly, if enough of the structures being tested exhibit observed performances above that of the optimal structure(s) (again an event with a non-zero probability), the result will be the deletion of data concerning the optimal structure. Thus, unless possible convergence to a suboptimal structure is to be allowed, each structure must be repeatedly tested (infinitely often in the limit). But this repeated testing (and the law of large numbers) assures that suboptimal structures which have a finite probability of displacing an optimal structure will do so with a limiting frequency approaching that probability. Hence, for M < |𝒜₁|, no plan can yield (1/T) Σ_t P(𝒜*, t) → 1. At the same time, ε(M′) < ε(M) for M′ > M (even when M′ < |𝒜₁|) because of two effects:

(i) the more copies there are of a suboptimal structure A in a given generation, the smaller the variance of the associated average payoff μ̂_A(t) (making it less likely that μ̂_A(t) > μ̂_A*(t), A* ∈ 𝒜*);

(ii) more generations are required to displace A* in the whole population (meaning that μ̂_A(t) has to exceed μ̂_A*(t) over a longer period, a progressively less likely event).
At the cost of a small increase in the complexity of the algorithms ℛ₁(P_C, P_I, P_M, {c_t}), we can assure that they converge when M > |𝒜₁| (as, for example, in the 2-armed bandit problem). The reader can learn a great deal more about convergence properties of reproductive plans from N. Martin's excellent 1973 study.
Since convergence is not a useful guide, we must turn to the stronger "minimal expected losses" criterion introduced in chapter 5. Results there (Theorems 5.1 and 5.3) indicate that the number of trials allocated to the observed best option should be an exponential function of the trials allocated to all other options. It is at once clear that enumerative plans do not fare well under this criterion. Enumeration, by definition, allocates trials in a uniform fashion, with no increase in the number of trials allocated to the observed best at any state prior to completion; accordingly, as the number of observations increases, expected losses climb precipitously in comparison to the criterion. On the other hand, plans of type ℛ₁(P_C, P_I, P_M, {c_t}) do award an exponentially increasing number of trials to the observed best, as we shall see in a moment. More importantly, plans of this type actually treat schemata from Ξ as options, rather than structures from 𝒜₁. In doing this the plans exhibit intrinsic parallelism, effectively modifying the rank of large numbers of schemata each time a structure A ∈ 𝒜₁ is tried. The effect is pronounced, even in an example as simple as that of Figure 13, which illustrates 2 generations of a small population (M = 8, l = 9) undergoing reproduction and crossover. Specifically, under plans of type ℛ₁(P_C, P_I, P_M, {c_t}), the number of instances of a schema increases (or decreases) at a rate closely related to its observed performance μ̂_ξ(t) at each instant. That is, the portion M_ξ(t) of each schema ξ represented in population 𝔅(t) changes simultaneously according to an equation much like that suggested at the end of chapter 5:

dM_ξ(t)/dt = μ̂_ξ(t)·M_ξ(t).
The foregoing statements can be established with the help of

LEMMA 7.2: Under a plan of type ℛ₁(P_C, P_I, P_M, {c_t}), given M_ξ(t₀) instances of ξ in the population 𝔅(t₀) at time t₀, the expected number of instances of ξ at time t, M_ξ(t), is bounded below by

M_ξ(t₀)·∏_{t′=t₀}^{t−1} (1 − ε_ξ)·μ̂_ξ(t′)/μ̂(t′),

where

ε_ξ = (P_C + 2P_I)·l(ξ)/(l − 1) + k(ξ),

a time-invariant constant generally close to zero, depending only upon the parameters of the plan, the length l(ξ) of ξ, and the number l₀(ξ) of defining positions for ξ.

[Figure 13 appears here in the original; it is not recoverable from this extraction. It illustrates two generations of a small population (M = 8, l = 9) undergoing reproduction and crossover.]


Proof: Using Corollary 6.4.1 in combination with the expression from section 6.3 for the probability of a schema being affected by inversion, we have, for any t₀,

P(ξ, t₀+1) ≥ [1 − P_C·(l(ξ)/(l − 1))·(1 − P(ξ, t₀))]·[1 − 2P_I·(l(ξ)/(l − 1))·(1 − l(ξ)/(l − 1))]·[(1 − c_{t₀}P_M)^{l(ξ)}]·[μ̂_ξ(t₀)/μ̂(t₀)]·P(ξ, t₀)
          ≥ [(1 − ε_ξ)·μ̂_ξ(t₀)/μ̂(t₀)]·P(ξ, t₀).

Or, by iteration of the relation,

P(ξ, t₀+h) ≥ P(ξ, t₀)·∏_{t′=t₀}^{t₀+h−1} [(1 − ε_ξ)·μ̂_ξ(t′)/μ̂(t′)].

But the expected number of instances of ξ at time t = t₀ + h is just

M·P(ξ, t₀+h) ≥ M·P(ξ, t₀)·∏_{t′=t₀}^{t−1} [(1 − ε_ξ)·μ̂_ξ(t′)/μ̂(t′)]
           ≥ M_ξ(t₀)·∏_{t′=t₀}^{t−1} [(1 − ε_ξ)·μ̂_ξ(t′)/μ̂(t′)].

Q.E.D.
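The growth bound of Lemma 7.2 can be iterated directly. The following Python sketch is an illustration only: the per-generation performance ratio and the value of ε_ξ are invented for the example, and the ratio is held constant, which the lemma does not require.

```python
# Sketch of the Lemma 7.2 lower bound: the expected number of instances of a
# schema xi is multiplied each generation by at least
# (1 - eps_xi) * (observed schema performance / population average).

def expected_instances(m0, perf_ratios, eps_xi):
    """Lower bound on M_xi(t) after applying the per-generation ratios."""
    m = m0
    history = [m]
    for ratio in perf_ratios:            # ratio = mu_hat_xi(t) / mu_hat(t)
        m *= (1.0 - eps_xi) * ratio
        history.append(m)
    return history

# a schema observed at 1.25x the population average, with a small operator loss
print(expected_instances(m0=4, perf_ratios=[1.25] * 10, eps_xi=0.02))
```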

Lemma 7.2, though simple, makes one very important point. Even though plans of type ℛ₁(P_C, P_I, P_M, {c_t}) try structures from 𝒜₁ one at a time, it is really schemata which are being tested and ranked. There are somewhere between 2^l and M·2^l schemata with instances in 𝔅(t). Each one changes its proportion in 𝔅(t) at a rate largely determined by its observed performance, μ̂_ξ(t), and is largely uninfluenced by what is happening to other schemata. This is the foundation of the intrinsic parallelism of plans of type ℛ.
While Lemma 7.2 is sharp enough as it stands to enable us to establish the efficiency of plans of type ℛ₁(P_C, P_I, P_M, {c_t}), some of the properties implied by the sharper inequality of the first line of its proof are also worth noting. As P(ξ, t) → 1 the operator losses from crossing-over approach 0 and the first factor in brackets approaches 1. That is, as P(ξ, t) → 1, the rate of change is very nearly

[1 − 2P_I·(l(ξ)/(l − 1))·(1 − l(ξ)/(l − 1))]·[(1 − c_t·P_M)^{l(ξ)}]·[μ̂_ξ(t)/μ̂(t)] − 1.

Moreover, if M_ξ(t) is at all large, μ̂_ξ(t) will closely approximate the expected payoff μ_ξ(t) of ξ under the distribution (over 𝒜₁ at time t) corresponding to 𝔅(t), because of the central limit theorem. Now recall that two schemata defined on different positions, but having identical sets of functional attribute pairs, designate the same subset of 𝒜₁. Thus all permutations of ξ induced by inversion exhibit the same expected payoff μ_ξ(t) at any given time. If we treat these permutations as versions of the same schema, then inversion does not in fact result in instances of ξ being lost during the operator phase. This leaves mutation as the only important source of loss when P(ξ, t) is near one. But as t advances c_t → 0, so that

[(1 − c_t·P_M)^{l(ξ)}] → 1

and the rate of change approaches [μ̂_ξ(t)/μ̂(t)] − 1. In particular, if some schema begins to occupy a large fraction of the population (through consistent above-average performance) its rate of increase will come very close to [μ̂_ξ(t)/μ̂(t)] − 1.
We can now go on to determine the number of trials allocated to the observed best schema as a function of the number of trials allocated to structures which are not instances of ξ. In this determination n_ξ(t₀) designates the number of structures in 𝔅(t₀) which are not instances of schema ξ. N_ξ and n_ξ designate the number of trials allocated from t₀ to t to structures which are, respectively, instances of ξ and not instances of ξ. (That is, N_ξ + n_ξ = (t − t₀ + 1)·M, for t ≥ t₀.) The logarithm of the effective payoff to ξ, or log payoff, bounded below by ln[(1 − ε_ξ)·μ̂_ξ(t)/μ̂(t)], plays a direct role in

LEMMA 7.3: If each instance of ξ gives rise, on the average, to at least one new instance of ξ in each generation over the interval (t₀, t), i.e., if N_ξ ≥ (t − t₀ + 1)·M_ξ(t₀ − 1), then the trials from t₀ onward satisfy

N_ξ ≥ M_ξ(t₀ − 1)·exp[(Ẑ_ξ/n_ξ(t₀))·n_ξ],

where Ẑ_ξ = (1/(t − t₀ + 1))·Σ_{t′=t₀}^{t} ln[(1 − ε_ξ)·μ̂_ξ(t′ − 1)/μ̂(t′ − 1)] is (a lower bound on) the average log payoff over (t₀, t).
Proof:

N_ξ = Σ_{t′=t₀}^{t} M_ξ(t′)
    ≥ M_ξ(t)
    ≥ M_ξ(t₀ − 1)·∏_{t′=t₀}^{t} [(1 − ε_ξ)·μ̂_ξ(t′ − 1)/μ̂(t′ − 1)]     (using Lemma 7.2)
    = M_ξ(t₀ − 1)·exp[ln ∏_{t′=t₀}^{t} [(1 − ε_ξ)·μ̂_ξ(t′ − 1)/μ̂(t′ − 1)]]
    = M_ξ(t₀ − 1)·exp[Σ_{t′=t₀}^{t} ln[(1 − ε_ξ)·μ̂_ξ(t′ − 1)/μ̂(t′ − 1)]]
    = M_ξ(t₀ − 1)·exp[Ẑ_ξ·(t − t₀ + 1)].

However,

n_ξ(t)/n_ξ(t₀) = (M·(t − t₀ + 1) − N_ξ(t))/(M − M_ξ(t₀))     by definition
              ≤ (M·(t − t₀ + 1) − (t − t₀ + 1)·M_ξ(t₀))/(M − M_ξ(t₀))     by the premise of the lemma
              = t − t₀ + 1.

Substituting for (t − t₀ + 1) in the previous expression we get

N_ξ ≥ M_ξ(t₀ − 1)·exp[Ẑ_ξ·(n_ξ(t)/n_ξ(t₀))].

Q.E.D.

This lemma holds a fortiori for any schema ξ consistently exhibiting an effective rate of increase at least equal to 1, i.e., μ̂_ξ(t′) ≥ μ̂(t′)/(1 − ε_ξ), over the interval (t₀, t). As noted first in sections 6.2 and 6.3, when l(ξ)/l is small, ε_ξ will be small and the factor 1/(1 − ε_ξ) will be very close to one. Let ξ* denote a schema which consistently yields the best observed performance μ̂_ξ*(t′), t₀ ≤ t′ < t, among the schemata which persist over that interval. In all but unusual circumstances μ̂_ξ*(t′) will exceed μ̂(t′) by more than the factor 1/(1 − ε_ξ). If l is large this is the more certain since, until the adaptation is far advanced, l(ξ)/l will with overwhelming probability be small (see the discussion at the end of section 6.2). Thus, for any ξ* for which μ̂_ξ* significantly exceeds μ̂(t′), t₀ ≤ t′ < t, the number of trials N_ξ* allocated to ξ* is an exponential function of n_ξ*.

(For natural systems the reproduction rate is determined by the environment (cf. fitness in genetics), hence it cannot be manipulated as a parameter of the adaptive system. However, for artificial systems this is not the case; the adaptive plan can manipulate the observed performance, as a piece of data, to produce more efficient adaptation. In particular, the reproductive step of ℛ₁(P_C, P_I, P_M, {c_t}) algorithms, step 4, can be modified to assure that the reproduction rate of ξ* automatically exceeds μ̂(t)/(1 − ε_ξ).)
From all of this it is clear that, whatever the complexity of the function μ, plans of type ℛ₁(P_C, P_I, P_M, {c_t}) behave in a way much like that dictated by the optimal allocation criterion: the number of trials allocated to the observed best increases as an exponential function of the total number of trials n_ξ* allocated to structures which are not instances of ξ*. However we can learn a good deal more by comparing the expected loss per trial of the genetic plans ℛ₁(P_C, P_I, P_M, {c_t}) to the loss rate under optimal allocation. Theorem 5.3 established a lower bound

(r − 1)·b²·(μ_ξ* − μ₂)·[2 + ln[N²/((r − 1)²·8πb⁴·ln N²)]]

for the expected loss under optimal allocation, where b = σ₁/(μ_ξ* − μ₂). For the genetic plan, the expected loss per trial is bounded above by

L_ℛ(N) = (μ_ξ*/N)·[N_ξ*·r′·q(N_ξ*, n′) + (1 − r′·q(N_ξ*, n′))·n_ξ*],

where r′ is the number of schemata which have received n′ or more trials under the genetic plan, and, as in Theorem 5.3, q(N_ξ*, n′) is the probability that a given option other than ξ* is observed as best. (This expression is simply L_ξ* from Theorem 5.3 rewritten in the terms of the genetic plan's allocation of trials, N_ξ* and n′, noting that r′·q(N_ξ*, n′) is an upper bound on q(n₁, . . . , n_r).) It is critical to what follows that r′·n′ need not be equal to n_ξ*. As 𝔅(t) is transformed into 𝔅(t + 1) by the genetic plan, each schema ξ having instances in 𝔅(t) can be expected to have (1 − ε_ξ)·μ̂_ξ(t)/μ̂(t) instances in 𝔅(t + 1). Thus, over the course of several time steps, the number of schemata r′ receiving n′ trials will be much, much greater than the number of trials allocated to individuals A ∉ ξ*, even when n′ approaches or exceeds n_ξ*. This observation, that generally r′·n′ ≫ n_ξ*, is an explicit consequence of the genetic plan's intrinsic parallelism.
With these observations for guidance, we can establish that the losses of genetic plans are decreased by a factor 1/(r′ − 1) in comparison to the losses under optimal allocation. Specifically, we have

THEOREM 7.4: If r′ is the number of schemata for which

n′ ≥ [2Ẑ_ξ*·b²/n_ξ*(0)]·n_ξ*,

i.e., if r′ is the number of schemata for which the number of trials n′ increases at least proportionally to n_ξ*, then for any performance function μ: 𝒜 → 𝒰,

L_ℛ(N)/L*(N) → L < [1/(r′ − 1)]·(μ_ξ*·n_ξ*(0)/(2b²·Ẑ_ξ*))

as N → ∞, where the parameters are defined as in Lemma 7.3.
Proof: Substituting the expression for N_ξ* (from Lemma 7.3) and the expression for q(N_ξ*, n′) (from the proof of Theorem 5.1) in L_ℛ(N), and noting that (1 − r′·q(N_ξ*, n′))·n_ξ* < n_ξ*, gives

L_ℛ(N) ≤ (μ_ξ*/N)·[(r′·M_ξ*(0)/√(2π))·exp[Ẑ_ξ*·n_ξ*/n_ξ*(0) − (b⁻²n′ + ln b⁻²n′)/2] + n_ξ*].

If b⁻²n′/2 ≥ Ẑ_ξ*·n_ξ*/n_ξ*(0), it is clear that the first term (the exponential term) decreases as n_ξ* increases, but the second term, n_ξ*, increases. In other words, if n′ ≥ [2Ẑ_ξ*·b²/n_ξ*(0)]·n_ξ*, i.e., if n′ increases at least proportionally with n_ξ*, the expected loss per trial will soon depend almost entirely on the second term. We have already seen (in the proof of Corollary 5.2) that the same holds for the second term of the expression for expected loss under an optimal allocation of N trials. Thus, for r′ and n′ as specified, the ratio of the upper bound on the reproductive plan's losses to the lower bound on the optimal allocation's losses approaches

μ_ξ*·n_ξ*/((μ_ξ* − μ₂)·(r′ − 1)·n*)

as N increases. (This comparison yields a lower bound on the ratio since the upper bound in one case is being compared to the lower bound in the other. It can be established easily, on comparison of the respective first terms of the two expressions, that the condition on n′ is sufficient to assure that the first term of L_ℛ(N) is always less than the first term of L*(N). It should be noted that the condition on n′ can be made as weak as desired by simply choosing n_ξ*(0) large enough.)

To proceed, substitute the explicit expressions derived earlier for n_ξ* (Lemma 7.3) and n* (Theorem 5.3) in the ratio μ_ξ*·n_ξ*/((μ_ξ* − μ₂)·(r′ − 1)·n*), yielding

L_ℛ/L* → L ≤ [μ_ξ*/((μ_ξ* − μ₂)·(r′ − 1))]·(n_ξ*(0)/Ẑ_ξ*)·ln[(N − n_ξ*)/M_ξ*(0)]·(b²·ln[N²/(8πb⁴(r′ − 1)²·ln N²)])⁻¹.

Simplifying and deleting terms which do not affect the direction of the inequality we get

L_ℛ/L* → L < [1/(r′ − 1)]·(μ_ξ*·n_ξ*(0)/Ẑ_ξ*)·ln N/[2b²·ln N − b²·ln(8πb⁴(r′ − 1)²·ln N²)].

Or, as N grows,

L_ℛ/L* → L < [1/(r′ − 1)]·(μ_ξ*·n_ξ*(0)/(2b²·Ẑ_ξ*)).

Q.E.D.

Thus algorithms of type ℛ₁(P_C, P_I, P_M, {c_t}) effectively exploit their intrinsic parallelism, however intricate the assignment of payoff μ(A) to structures A ∈ 𝒜, reducing their losses by a factor r′ in comparison to one-schema-at-a-time searches. We can get some idea of the size of r′ by referring to the last few paragraphs of chapter 4. Given a representation produced by l = 32 detectors with k = 2 values ("alleles") each, r′ > 9000 when N = 32 and n′ = 8 (with all elements of 𝒜 equally likely). This is a startling speed-up for a space which is, after all, small relative to the 𝒜 spaces in, say, genetics or economics, which may involve chromosomes or goods vectors with l ≥ 100. Even small increases in N, or decreases in n′, produce dramatic increases in r′; similar increases result from increases in l. Increases in l may result from representing a larger space 𝒜, or they may be deliberately introduced for a given 𝒜 (either by selecting k′ < k, so that (k′)^{l′} ≥ |𝒜| necessitates l′ > l, or else by using additional [redundant] detectors).
To get a better picture of the implications of Lemma 7.2 and Theorem 7.4 let us look at two applications. Once again, as in chapter 1, one application is to a system which is simple and artificial, while the other is to a system which is complex and natural.

3. ROBUSTNESS VIS-À-VIS A SIMPLE ARTIFICIAL ADAPTIVE SYSTEM

The first application concerns game-playing algorithms. The game-playing illustration (section 3.3) begins by pointing out that the outcome of a 2-person game without chance moves (a strictly determined game) is fixed once each player has selected a pure strategy. Assume, for present purposes, that the opponent has adopted the best pure strategy available to him (the minimax strategy) for use in all plays of the game. Then any pure strategy selected by the adaptive plan will lead to a unique outcome and a unique payoff (again, as pointed out in section 3.3; see Figure 4). Thus, the function μ which assigns payoff to outcomes can be extended to the strategies 𝒜₁ employed by the adaptive plan, assigning to each strategy the unique payoff it achieves against the opponent's fixed strategy. (It is helpful, though not necessary, to think of these payoffs as wide ranging: numerical equivalents of "close win," "loss by a wide margin," etc., rather than just 1, 0, −1 for "win," "draw," "loss.") The strategies available to the adaptive plan will be limited to a set of strategies fundamentally little different from the threshold pattern recognition devices of section 1.3. These strategies are based on the recognition and evaluation of positions (configurations) in the game tree and are substantially the same as those employed by Samuel in his 1959 checkers player. Each strategy in 𝒜₁ is defined by a linear form Σ_{i=1}^{l} w_i·δ_i where: (i) δ_i: 𝒮 → Reals evaluates each configuration S ∈ 𝒮 for a property relevant to winning the game (e.g., in checkers, δ₁ might assign to each configuration the difference in the number of kings on each side, δ₂ might count the number of pieces advanced beyond the centerline, etc.); (ii) w_i ∈ 𝒲 weights the property according to its estimated importance in the play of the game. The linear form determines a move by assigning a rank

P(S) = Σ_{i=1}^{l} w_i·δ_i(S)

to each S ∈ 𝒮(y), where 𝒮(y) is the set of configurations legally attainable on the yth move; then that move is chosen which leads to a configuration S* ∈ 𝒮(y) of maximal rank, i.e., P(S*) = max_{S∈𝒮(y)} {P(S)}.
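A minimal Python sketch of such a linear-form strategy follows. The detectors, weights, and board encoding here are placeholders invented for the example (they are not Samuel's program); only the shape of the computation, rank each attainable configuration and pick the maximum, comes from the text.

```python
# Sketch of a linear-form strategy: detectors map a configuration to numbers,
# weights w_i combine them into a rank, and the move of maximal rank is chosen.

def rank(configuration, detectors, weights):
    return sum(w * d(configuration) for w, d in zip(weights, detectors))

def choose_move(attainable_configurations, detectors, weights):
    return max(attainable_configurations,
               key=lambda s: rank(s, detectors, weights))

# toy detectors over a configuration represented as a dict of piece counts
detectors = [
    lambda s: s["my_kings"] - s["their_kings"],   # king advantage
    lambda s: s["my_advanced"],                   # pieces past the centerline
]
weights = [2.0, 0.5]
moves = [
    {"my_kings": 1, "their_kings": 1, "my_advanced": 3},
    {"my_kings": 2, "their_kings": 1, "my_advanced": 0},
]
print(choose_move(moves, detectors, weights))
```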
The objective now is to find an adaptive plan which searches the set of strategies 𝒜₁ so that performance improves rapidly. To keep the example simple, only the special case of correction of weights at the end of each play of the game will be considered here. (The more complicated case, involving "predictive correction" during play of the game, is discussed in the latter half of section 8.4.) Because the detectors δ_i are given and fixed, the strategies in 𝒜₁ are completely determined by the weights w_i, i = 1, . . . , l, so the search is actually a search through the space of l-tuples of weights, 𝒲^l.

A typical plan for optimization in 𝒲^l adjusts the weights independently of each other (ignoring the interactions). However, in complex situations (such as playing checkers) this plan is almost certain to lead to entrapment on a false peak, or to oscillations between points distant from the optimum. Clearly such a plan is not robust. To make the reasons for this loss of robustness explicit, consider the plan T_ℛ with an initial population 𝔅(0) drawn from 𝒲^l, but with steps 3 and 4 of ℛ₁(0, 0, 0, 0) extended as follows:
3.1 [from 3] Set A′(t) = (0, 0, . . . , 0) and set r = 1. [The rest the same as before.]
4.1 Set the rth position of A′(t) to a_r(i(t), t), the value of the rth position of A_{i(t)}(t) = (a₁(i(t), t), . . . , a_l(i(t), t)) ∈ 𝒲^l.
4.2 Is r = l? (yes → [to 6.1]; no → continue)
4.3 Increase r by 1.

Clearly T_ℛ makes no use of the genetic operators. Over successive generations this plan has the same (stochastic) effect as repetition of the following sequence:

1. Form 𝔅′(t) from 𝔅(t) by making μ(A_i(t)) copies of each element A_i(t), i = 1, . . . , M, in 𝔅(t). (A fractional payoff yields an additional copy with probability equal to the fraction, so that the expected number of elements in 𝔅′(t) is Σ_{i=1}^{M} μ(A_i(t)).)
2. All the copies of weights associated with position j of the l-tuples in 𝔅′(t) are collected in a single set W_j(t), j = 1, . . . , l. W_j(t) thus, typically, contains many duplicates of each weight in 𝒲.
3. Element A_i(t+1) = (a₁(i(t+1), t+1), . . . , a_l(i(t+1), t+1)), i = 1, . . . , M, is formed from 𝔅′(t) by drawing weight a₁(i(t+1), t+1) at random from set W₁(t), weight a₂(i(t+1), t+1) from W₂(t), etc. 𝔅(t+1) thus consists of M l-tuples formed by M successive drawings from the l sets W_j(t).
4. Return to step 1 to generate the next generation.
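The sequence just listed can be paraphrased in a few lines of Python. This is a sketch under stated assumptions: the payoff function and weight values are placeholders, and payoff-proportional copying is rounded to the nearest integer for simplicity.

```python
import random

# Sketch of the independent-weight plan T_R: copy each l-tuple roughly in
# proportion to payoff, pool the copies position by position, then rebuild
# each new l-tuple by drawing every position independently.

def next_generation(population, payoff):
    copies = []
    for a in population:                       # step 1: payoff-proportional copies
        copies.extend([a] * max(1, round(payoff(a))))
    l = len(population[0])
    pools = [[a[j] for a in copies] for j in range(l)]      # step 2: one pool per position
    return [tuple(random.choice(pools[j]) for j in range(l))  # step 3: independent draws
            for _ in range(len(population))]

payoff = lambda a: 1.0 + 0.1 * sum(a)          # placeholder payoff on weight vectors
population = [tuple(random.choice((0, 1, 2)) for _ in range(4)) for _ in range(8)]
print(next_generation(population, payoff))
```

Because every position is drawn independently in step 3, any association between weights at different positions is destroyed each generation; that is exactly the failure analyzed next.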

Because T_ℛ makes no use of genetic operators it is a plan for adjusting weights independently. Specifically, under this procedure, the probability of occurrence of A = a₁a₂ . . . a_l at time t + 1 is just ∏_{r=1}^{l} P(a_r, t), where P(a_r, t) is the proportion of a_r ∈ 𝒲 in W_r(t). It follows at once that an arbitrary schema ξ occurs with probability λ(ξ) = ∏_i P(ξ_i), as would be the case under the equilibrium discussed in section 6.2. Moreover,

P(ξ_i, t+1) = M_{ξ_i}(t+1)/M = (μ̂_{ξ_i}(t)/μ̂(t))·M_{ξ_i}(t)/M = (μ̂_{ξ_i}(t)/μ̂(t))·P(ξ_i, t),

so that

P(ξ, t+1) = [∏_i μ̂_{ξ_i}(t)/μ̂(t)]·P(ξ, t).

Clearly the weights at distinct positions are chosen independently of each other under the plan T_ℛ. Hence if a pair of weights contributes to a better performance than could be expected from the presence of either of the two weights separately, there will be no way to preserve that observation. This can lead to quite maladaptive behavior wherein the plan ranks mediocre schemata highly and fails to exploit useful schemata. For example, consider the set of schemata defined on positions 1 and 2 when 𝒲 = {w₁, w₂, w₃}. Assume that all weights are equally likely at each position (so that an instance of schema w₂w₃□ . . . □, say, occurs with probability (1/3)²), and let the expected payoff of each schema be given by the following table:

Table 3: An Assignment of μ_ξ to Schemata Defined on Two Positions
[The body of Table 3 is not recoverable from this extraction. As used below, it assigns the highest expected payoff, 1.6, to the coadapted combination w₁w₃□ . . . □ and 1.1 to w₂w₁□ . . . □.]

Since all instances are equally likely we can calculate from this table the following expectations for single weights:

Table 4: P(ξ) and μ_ξ for the One-Position Schemata Implicit in Table 3

ξ                 P(ξ)    μ_ξ
w₁□□ . . . □      1/3     0.9
w₂□□ . . . □      1/3     1.1
w₃□□ . . . □      1/3     1.0
□w₁□ . . . □      1/3     1.1
□w₂□ . . . □      1/3     1.0
□w₃□ . . . □      1/3     0.9

Clearly the combination w₂w₁ becomes increasingly likely under T_ℛ; in fact

P(w₂w₁□ . . . □, t+1) = P(w₂□ . . . □, t+1)·P(□w₁□ . . . □, t+1)
                      = [(μ̂_{w₂□...□}(t)/μ̂(t))·P(w₂□ . . . □, t)]·[(μ̂_{□w₁□...□}(t)/μ̂(t))·P(□w₁□ . . . □, t)]
                      = 1.21·P(w₂w₁□ . . . □, t).

On the other hand, the best combination w₁w₃□ . . . □ by the same calculation satisfies

P(w₁w₃□ . . . □, t+1) = 0.81·P(w₁w₃□ . . . □, t),

so that its probability of occurrence actually decreases. It is true that, as w₂w₁□ . . . □ becomes more probable, the values of μ_{w₂□...□} and μ_{□w₁□...□} decrease, eventually dropping below 1, but w₁w₃□ . . . □ is still selected against, as the following table shows:

Table 5: μ_ξ for the Schemata of Table 4 When Instances Are Not Equilikely
[The body of Table 5 is not recoverable from this extraction.]

On the other hand the nonlinearities of μ_ξ (Table 3) have no effect on ℛ₁(1, −, −, −). Lemma 7.2 makes this quite clear.

P(w₁w₃□ . . . □, t+1) = M_{w₁w₃□...□}(t+1)/M
                      ≥ (1 − ε_{w₁w₃□...□})·μ̂_{w₁w₃□...□}(t)·M_{w₁w₃□...□}(t)/M
                      = (1 − ε)·1.6·P(w₁w₃□ . . . □, t),

whereas w₂w₁□ . . . □ now satisfies

P(w₂w₁□ . . . □, t+1) = (1 − ε)·1.1·P(w₂w₁□ . . . □, t).

Clearly w₁w₃□ . . . □ quickly gains the ascendancy. Thus a plan of type ℛ₁(1, −, −, −) preserves and exploits useful interactions between the weights. Moreover Lemma 7.2, in conjunction with Theorem 7.4, makes it clear that such a plan can actually exploit local optima (false peaks) to improve its interim performance on the way to a global optimum.
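The contrast can be iterated numerically. The short sketch below simply applies the multipliers derived above (1.21 and 0.81 for independent selection, 1.6 and 1.1 for the ℛ₁-type plan, with ε taken as 0 and the multipliers held fixed, which describes only the initial behaviour); it is illustrative, not part of the original treatment.

```python
# Iterate the two update rules derived above: under independent selection
# w2w1 grows while the better pair w1w3 shrinks; under an R1-type plan
# w1w3 quickly takes over.

def iterate(p0, factor, steps=5):
    ps = [p0]
    for _ in range(steps):
        ps.append(min(1.0, ps[-1] * factor))
    return ps

p0 = 1.0 / 9.0                      # both pairs start equally likely
print("independent, w2w1:", iterate(p0, 1.21))
print("independent, w1w3:", iterate(p0, 0.81))
print("R1 plan,     w1w3:", iterate(p0, 1.6))
print("R1 plan,     w2w1:", iterate(p0, 1.1))
```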
4. ROBUSTNESS VIS-À-VIS A COMPLEX NATURAL ADAPTIVE SYSTEM

Many points made in connection with the game-playing algorithm can be translated to the much more complex situation in genetics. We shall see that these points weigh strongly against the (still widely held) view that biological adaptation proceeds by the substitution of advantageous mutant genes under natural selection. In addition, they directly contradict the closely related view (in mathematical genetics) that alleles are replaced independently of each other, increasing or decreasing according to their individual average excesses. Rather, the results of this chapter suggest that the adaptive process works largely in terms of pools of schemata (potentially coadapted sets of genes) instead of gene pools. Because the pool of schemata corresponding to a population is so much larger than the pool of genes, selection has broader scope (some multiple of 2^l vs. 2l, or with k = 2 alleles and just l = 100 loci, some multiple of 10³⁰ vs. 200) with many more pathways to improvement, and the great advantage of intrinsic parallelism.
To translate the results on robustness to genetics, the central genetic parameter, "average excess" (of fitness), must be defined in terms of observational quantities μ̂_ξ. First let μ̂_ξ(t) =df M_ξ(t+1)/M_ξ(t); that is, μ̂_ξ(t) is the effective rate of increase of the schema ξ at time t. For adaptive plans of type ℛ₁(P_C, P_I, P_M, {c_t}), μ̂_ξ(t) is bounded below by (1 − ε_ξ) times the observed average fitness of ξ's instances (following Lemma 7.2). For a fraction Δt of a generation we can write

ΔM_ξ(t) = M_ξ(t + Δt) − M_ξ(t) = (μ̂_ξ(t)·M_ξ(t) − M_ξ(t))·Δt = (μ̂_ξ(t) − 1)·M_ξ(t)·Δt.

If M(t) is the size of the population at time t (allowing the overall population size to be variable for the time being), then

P(ξ, t) = M_ξ(t)/M(t)

and

ΔP(ξ, t) = (M_ξ(t) + ΔM_ξ(t))/(M(t) + ΔM(t)) − M_ξ(t)/M(t)
         = (M_ξ(t)/M(t))·[(1 + (μ̂_ξ(t) − 1)·Δt)/(1 + (μ̂(t) − 1)·Δt) − 1]
         = P(ξ, t)·(μ̂_ξ(t) − μ̂(t))·Δt/(1 + (μ̂(t) − 1)·Δt),

using the fact that the population as a whole increases at a rate determined by the observed average fitness μ̂(t). It follows that

ΔP(ξ, t)/Δt = a(ξ, Δt)·P(ξ, t),

where a(ξ, Δt) =df (μ̂_ξ(t) − μ̂(t))/(1 + (μ̂(t) − 1)·Δt).

If we use a discrete time-scale t = 1, 2, 3, . . . then Δt = (t + 1) − t = 1 and

a(ξ, 1) = (μ̂_ξ(t) − μ̂(t))/μ̂(t).

If we take the limit as Δt → 0, in effect going to a continuous time-scale, we have

lim_{Δt→0} [ΔP(ξ, t)/Δt] = dP(ξ, t)/dt = a(ξ, 0)·P(ξ, t) = [μ̂_ξ(t) − μ̂(t)]·P(ξ, t).

The equation dP(ξ, t)/dt = P(ξ, t)·a(ξ, 0), when restricted to alleles (schemata defined on one position), is just Fisher's (1930) classical result, relating the change in proportion of an allele to its average excess. We see however that the equation holds for arbitrary schemata. This gives us a way of predicting the rate of increase of a set of alleles with epistatic interactions from a sample average μ̂_ξ of the fitnesses of chromosomes carrying the set of alleles.

Consider, now, how such a prediction would differ from one made under the assumption of independent substitution of alleles, using the earlier example (the tables of section 7.3). In the present case the elements of 𝒲 play the role of indices: w₁ at position 1 indicates that allele 1 for position 1 is present; the same w₁ at position 2 indicates the presence of allele 1 for position 2, an allele which may be quite different from the former one. Under independent selection

P(w_i w_j □ . . . □, t) = P(w_i □ . . . □, t)·P(□ w_j □ . . . □, t),

so that

d/dt [P(w_i w_j □ . . . □, t)] = d/dt [P(w_i □ . . . □, t)·P(□ w_j □ . . . □, t)]
  = P(□ w_j □ . . . □, t)·d/dt [P(w_i □ . . . □, t)] + P(w_i □ . . . □, t)·d/dt [P(□ w_j □ . . . □, t)]
  = P(w_i w_j □ . . . □, t)·[a(w_i □ . . . □, 0) + a(□ w_j □ . . . □, 0)].

Thus, under independent selection, combinations of alleles have a rate of change which is the sum of their average excesses.

Reinterpreting Table 4 in terms of average excesses (noting that μ̂(t) ≈ 1), we see that the rate of change of the favorable w₁w₃□ . . . □ (Table 3) is

−0.2·P(w₁w₃□ . . . □),

while that of the less favorable w₂w₁□ . . . □ is +0.2·P(w₂w₁□ . . . □) under independent selection. Thus independent selection leads to maladaptation here.

As mentioned earlier, adaptation under independent selection amounts to adaptation under the operator equilibrium of section 6.2,

P(ξ, t) ≈ λ(ξ, t) ≈ ∏_i P(ξ_i, t).

This is a common assumption in mathematical genetics, but it clearly leads to maladaptations whenever

a(ξ) ≠ Σ_i a(ξ_i).

The above equation for a(ξ) in terms of μ̂_ξ shows this to be the case whenever μ_ξ ≠ Σ_i μ_{ξ_i}, which occurs whenever the fitness is a nonlinear function of the alleles present, i.e., whenever there is epistasis.


On the other hand, under reproductive plans of type ℛ₁(P_C, P_I, P_M, {c_t}), operator equilibrium is persistently destroyed by reproduction. In effect, useful linkages are preserved and nonlinearities (epistases) are exploited. Indeed, it would seem that the term "coadapted" is only reasonably used when alleles are peculiarly suited to each other, giving a performance when combined which is not simply the sum of their individual performances. Following Lemma 7.2, each coadapted set of alleles (schema) changes its proportion at a rate determined by the particular average (observed) fitness of its instances, not by the sum of the fitnesses of its component alleles.

(Because of the stochastic nature of the operators in genetic plans, each chromosome A ∈ 𝒜₁ has a probability of appearing in the next generation 𝔅(t+1), a probability which is conditional on the elements appearing in 𝔅(t). If there are enough instances of ξ in 𝔅(t), the central limit theorem assures that μ̂_ξ(t) ≈ μ_ξ, where μ_ξ is the expected fitness of the coadapted set ξ under the given probability distribution over 𝒜₁. Thus the observed rate of increase of a coadapted set of alleles ξ will closely approximate the theoretical expectation once ξ gains a foothold in the population.)

Returning to the example just above, but now for genetic (ℛ₁) plans, we see (from Table 3) that w₁w₃□ . . . □ has a rate of change given by

+0.6·P(w₁w₃□ . . . □, t),

while w₂w₁□ . . . □ changes as

+0.1·P(w₂w₁□ . . . □, t).

Consequently, the coadapted set of alleles with the higher average fitness quickly predominates. Thus, when epistasis is important, plans of type ℛ₁ (and the corresponding theorems involving schemata) provide a better hypothesis than the hypothesis of independent selection (and least mean squares estimates of the fitness of sets of alleles).
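The contrast between additive average excesses and schema fitnesses can be tabulated directly from the numbers quoted above (Table 4 single-position values, the joint values 1.6 and 1.1, average fitness taken as 1). The following Python check is illustrative only.

```python
# Additive (sum of average excesses) versus schema-based rates of change,
# using the values quoted in the text with average fitness 1.

single_excess = {            # from Table 4: mu_xi - 1 for one-position schemata
    ("w1", 1): -0.1, ("w2", 1): +0.1, ("w3", 1): 0.0,
    ("w1", 2): +0.1, ("w2", 2): 0.0,  ("w3", 2): -0.1,
}
joint_fitness = {("w1", "w3"): 1.6, ("w2", "w1"): 1.1}   # as used above (Table 3)

for pair, fit in joint_fitness.items():
    additive = single_excess[(pair[0], 1)] + single_excess[(pair[1], 2)]
    schema_rate = fit - 1.0
    print(pair, "additive rate:", round(additive, 2), "schema rate:", round(schema_rate, 2))
# ('w1','w3'): additive -0.2 vs schema +0.6 ; ('w2','w1'): additive +0.2 vs +0.1
```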

5. GENERAL CONSEQUENCES

We see from Lemmas 7.2 and 7.3 that, under a genetic plan, a schema ξ which persists in the population 𝔅(t) for more than a generation or two will be ranked according to its observed performance. This is accomplished in a way which satisfies the desiderata put forth at the end of chapter 5. Specifically, the proportion of ξ's instances in the population 𝔅(t) will grow at a rate proportional to the amount by which ξ's average performance μ_ξ exceeds the average performance μ̂ of the whole population. At the same time the rankings are stored compactly in the way suggested at the end of chapter 4, at least 2^l schemata being ranked in a population which may consist of only a few dozen elements from 𝒜₁. Moreover, genetic plans automatically access this information, update it, and use it to generate new structures, each of which efficiently tests large numbers of schemata.

In detail: Schemata of above-average performance are combined and tested in new contexts by crossing-over outside their defining locations. Because (the instances of) schemata increase or decrease exponentially in terms of observed performance (Lemma 7.3), the overall average performance is close to the best observed. Because a wide range of promising variants is generated and tested (section 6.2), "entrapment" on false peaks (local optima) is prevented. Even for moderate sizes of population and representation, say M = 100 and l = 20, if the initial population 𝔅(0) is varied, a crossover probability P_C > ½ will make it almost certain that every structure A generated during the initial stages of adaptation is new. Nevertheless, this high value of P_C does not disturb the rankings of schemata which are consistently above average. Thus, sampling efficiency remains high, while ranking information is preserved and used. In conjunction with these processes, inversion by changes in linkage assures that schemata consistently associated with above-average performance are steadily shortened (l(ξ) is decreased), thereby reducing operator losses (section 6.4 and the definition of ε_ξ in Lemma 7.2).

Overall, genetic plans, by simple operations on the current "data base" 𝔅(t), produce sophisticated, intrinsically parallel tests of the space of schemata Ξ. Large numbers of local optima, instead of diverting the plan from further improvement, are exploited to improve performance on an interim basis while the search for more global optima goes on. High dimensionality (such as a multitude of factors affecting fitness or play of a game) creates no difficulties for genetic plans, in contrast to its effect on classical procedures, because of the intrinsic parallelism (the r′ factor of Theorem 7.4).


8. Adaptation of Codings and Representations

To this point the major limitation of genetic plans has been their dependence upon the fixed representation of the structures 𝒜. The object of the present chapter is to show how to relax this limitation by subjecting the representation itself to adaptation. This will be approached by reconsidering representation via detectors {δ_i: 𝒜 → V_i, i = 1, . . . , l} (chapter 4) in the light of the comment that detectors can be looked upon as algorithms for assigning attributes (section 3.4). Since algorithms can be presented as strings of instructions, the possibility opens of treating them by genetic plans, much as the strings of attributes are treated. (The mode of action of the genetic operators, of course, puts some unique requirements on the form and interaction of the instructions.) Actually, with a set of instructions of adequate power, we can go much further. We can define structures capable of achieving any effectively describable behavior vis-à-vis the environment. We can do this by setting up algorithms which act conditionally in terms of environmental and internal conditions. In particular, the predictive modeling technique of sections 3.4 and 3.5 can be implemented and subjected to adaptation. The Jacob-Monod "operon-operator" model (see the end of chapter 6) is suggestive in this respect, and we'll look at it more closely after the question of a "language" of algorithms (the instructions and their grammar) has been considered.

1. FIXEDREPRESENTATION
Before proceedingto a " language" suited to the modification of representationsit
is worth looking at just how flexible a fixed representationcan be. That a fixed
representationhas limitations is clear from the fact that only a limited number of
subsetsof (t can be representedor defined in terms of schemata based on that
representation. If (t is a set of structures uniquely representedby I detectors, each
141

142

Adaptation In Natural and Artificial Systems

'
taking on k values, then of the P distinct subsetsof (J, only (k + I ) ' can be defined
by schemata. However, the question is not so much one of defining all possible
subsets, as it is one of defining enough " enriched" subsets, where an " enriched"
subset is one which contains an above-average number of high-performance
structures.
It is instructive, then, to determine how many schemata (on the given representation) are "enriched" in the foregoing sense. Let 𝒜 contain x structures which are of interest at time t (because their performance exceeds the average by some specified amount). If the attributes are randomly distributed over structures, determination of "enrichment" is a straightforward combinatoric exercise. More precisely, let each δ_i be a pseudorandom function and let V_i = {0, 1}, i = 1, . . . , l, so that a given structure A ∈ 𝒜 has property i (i.e., δ_i(A) = 1) with probability 1/2. Under this arrangement peculiarities of the payoff function cannot bias concentration of exceptional structures in relation to schemata.

Now, two exceptional structures can belong to the same schema only if they are assigned the same attributes on the same defining positions. If there are h defining positions this occurs with probability (1/2)^h. For j exceptional structures, instead of 2, the probability is (1/2^(j-1))^h. Since there are C(l, h) ways of choosing h out of l detectors, and C(x, j) ways of choosing j out of x exceptional structures, the expected number of schemata defined on h positions and containing exactly j exceptional structures is

(1/2^(j-1))^h · C(l, h) · C(x, j).

For example, with l = 40 and x = 10^5 (so that the density of exceptional structures is x/2^l = 10^5/2^40 ≈ 10^-7), h = 20, and j = 10, this comes to

(1/2^9)^20 · (40!/(20! 20!)) · (10^5!/(99,990! 10!)) ≈ 3.

Noting that a schema defined on 20 positions out of 40 has 2^20 ≈ 10^6 instances, we see that the 10 exceptional structures occur with density 10^-5, an "enrichment" factor of 100. A few additional calculations show that in excess of 20 schemata defined on 20 positions contain 10 or more exceptional structures.
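The figures just given can be checked directly. The following short Python sketch (not part of the original text; the function name is illustrative) evaluates the expected-count formula and the overall density of exceptional structures for the numbers above.

    # Sketch: expected number of schemata on h defining positions containing
    # exactly j of the x exceptional structures,
    #   E = (1/2^(j-1))^h * C(l, h) * C(x, j).
    from math import comb

    def expected_enriched(l, x, h, j):
        return (0.5 ** (j - 1)) ** h * comb(l, h) * comb(x, j)

    if __name__ == "__main__":
        l, x, h, j = 40, 10**5, 20, 10
        print(expected_enriched(l, x, h, j))   # about 2.5, i.e. "roughly 3" as in the text
        print(x / 2**l)                        # overall density of exceptional structures, about 1e-7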
For given h and j, the "enrichment" factor rises steeply as l increases. On the other hand an increase in x (corresponding to an extension of interest to structures with performances not so far above the average) acts most directly on the expected number of schemata containing j structures. With an adequate number of pseudorandom functions as detectors (and a procedure for assuring that every combination of attributes designates a testable structure), the adaptive plan will have adequate grist for its mill. Stated another way, even when there can be no correlation between attributes and performance, the set of schemata cuts through the space of structures in enough ways to provide a variety of "enriched" subsets. Intrinsic parallelism assures that these subsets will be rapidly explored and exploited.

"
LANGUAGB
2. THE " BROADCAST
Though the foregoing is encouraging as to the range of partitions offered by a given set of schemata, something more is desirable when long-term adaptation is involved. First of all, when the payoff function is very complex, it is desirable to adapt the representation so that correlations between attributes and performance are generated. Both higher proportions of "enriched" schemata and higher "enrichment" factors result. It is still more important, when the environment provides signals in addition to payoff, that the adaptive plan be able to model the environment by means of appropriate structures (the ℳ component of 𝒜 = 𝒜₁ × ℳ in section 2.1). In this way large (non-payoff) information flows from the environment can be used to improve performance. As suggested in sections 3.4 and 3.5, by a process of generating predictions with the model, observing subsequent outcomes, and then compensating the model for false predictions, adaptation can take place even when payoff is a rare occurrence.

To provide these possibilities, the set of representations and models available to the plan must be defined. Further flexibility results if provision is made, within the same framework, for defining operators useful in modifying representations and models. A natural way to do this is to provide a "language" tailored to the precise specification of the representations and operators, one which can be employed by the adaptive plan. Some earlier observations suggest additional, desirable properties of this language:

1. It should be convenient to present the representations, models, operators, etc. as strings so that schemata and generalized genetic operators can be defined for these extensions.
2. The functional "units" (cf. detectors, etc.) should have the same interpretation (function) regardless of their positioning within a string, so that advantage can be taken of the associations provided by positional proximity (section 6.3).
3. The number of alternatives at each position in a string should be small so that a richer set of schemata is provided for a given size of 𝒜 (see the comments in the middle of chapter 4).

4. "Completeness," in the sense of being able to define within the language all effective representations and operators, should be provided so that the language itself places no long-term limits on the adaptive plan.

What follows is an outline of one "language" satisfying these conditions. It has the additional property (which will be discussed after the presentation) of offering straightforward representations of several models of natural systems, including operon-operator models, cell-assembly models (section 3.6), and various physical signaling and radiation models.
The basic units of the language are broadcast units. Each broadcast unit can be thought of as broadcasting an output signal "to whom it may concern" whenever it detects certain other signals in its environment. For example, a given unit, upon detecting the presence of signals I and I' (perhaps broadcast by other units), would broadcast signal I''. Some broadcast units actually process signals so that the signal broadcast is some modification of the signals detected. In keeping with suggestion (1), broadcast units are specified by strings of symbols. A set of broadcast units, usually combined in a string, will constitute a device or structure (an element of 𝒜). Some broadcast units broadcast strings which can be interpreted as (new) broadcast units; broadcast units can also detect the presence of other broadcast units (treating them as signals). Thus, given broadcast units can modify and create others; they serve as operators on 𝒜.

The language's ten symbols Λ = {0, 1, *, :, ◊, ▽, ▼, Δ, p, '}, along with informal descriptions of intended usage, follow. (Exact interpretations for strings of the symbols follow the listing.)

0, 1   These two symbols constitute the basic alphabet for specifying signals. Thus 01011 is a signal which, within the language, has no other meaning than its ability to activate certain broadcast units. Generally a string such as 01011 will be interpreted as the name of a particular signal (e.g. the binary encoding of a frequency or amino acid sequence).

*   Indicates that the following string of symbols (up to the next occurrence of a *, if any) is to be interpreted as an active broadcast unit.

:   This symbol is the basic punctuation mark, used in separating the arguments of a broadcast unit.

For example, *1100:11 designates a broadcast unit which will broadcast the signal 11 one unit of time after the signal 1100 is detected in the unit's environment (see the intended interpretations for strings below).


◊   When this symbol occurs in the argument of a broadcast unit it indicates a "don't care" condition, i.e. any symbol can occur at that particular position of a signal without affecting its acceptance or rejection by the broadcast unit. If the symbol ◊ occurs at the last position of an argument it indicates that any terminal string (suffix) may occur from that point onward without affecting acceptance or rejection.

For example, *1◊00:11 will broadcast 11 if it detects either the signal 1100 or the signal 1000 (or in fact a signal with any of the other symbols at the second position); *100◊:11 will broadcast 11 if it detects any string having the prefix 100, such as 1001 or 10010110 or even 100.
▽   Designates an arbitrary initial or terminal string of symbols (an arbitrary prefix or suffix) when used in the arguments of a given broadcast unit. This symbol gives the unit string-processing capability.

For example, *11▽:▽ will broadcast the suffix of any signal having the symbols 11 as prefix. Thus if 1100 is detected, the signal 00 will be broadcast, whereas if the signal 11010 is detected, the signal 010 will be broadcast. (The resolution of conflicts, where more than one signal satisfies the input condition, is detailed below.) All occurrences of ▽ within a given broadcast unit designate the same substring, but occurrences in different broadcast units are independent of each other.

▼   A second symbol used in the same manner as ▽. It makes concatenation of inputs possible (see below).

Δ   This symbol serves much as ▽ and ▼, but designates an arbitrary single symbol in the arguments of a given broadcast unit.

For example, *11Δ0:1Δ broadcasts 10 if 1100 is detected, or 11 if 1110 is detected.

p   When p occurs as the first symbol of a string it designates a string which persists through time (until deleted), even though it is not a broadcast unit.

'   This symbol is used to quote symbols in the arguments of a broadcast unit.

For example, *11'Δ:10 broadcasts 10 only if the (unique) string 11Δ is detected (i.e., the symbol Δ occurs literally at the third position); without the quote the unit would broadcast 10 whenever any 3-symbol string with the prefix 11 is detected.


The interpretations of the various strings from Λ*, the set of strings over Λ, along with the conventions for resolving conflicts, follow.

Let I be an arbitrary string from Λ*. In I a symbol is said to be quoted if the symbol ' occurs at its immediate left. I is parsed into broadcast units as follows: The first broadcast unit is designated by the segment from the leftmost unquoted * to (but not including) the next unquoted * on the right (if any). (Any prefix to the left of the leftmost unquoted * is ignored.) The second, third, etc., broadcast units are obtained by repeating this procedure for each successive unquoted * from the left. If I contains no unquoted *s it designates the null unit, i.e., it does not broadcast a signal under any condition. Thus

p10*11'*Δ0:1Δ*:11Δ:11Δ

designates two broadcast units, namely

*11'*Δ0:1Δ   and   *:11Δ:11Δ.

There are four types of broadcast unit (other than the null unit). To determine the type of a broadcast unit from its designation, first determine if there are three or more (unquoted) : to the right of the *. If so ignore the third : and everything to the right of it. The remaining substring, which has a * at the initial position and at most two :s elsewhere, designates one of the four types if it has one of the following four organizations.

1. *I1:I2
2. *:I1:I2
3. *I1::I2
4. *I1:I2:I3

where I1, I2, and I3 are arbitrary non-null strings from Λ* except that they contain neither unquoted *s nor unquoted :s. If the substring does not have one of these organizations it designates the null unit. The four basic types have the following functions (subject to the conventions for eliminating ambiguities, which follow).

1. *I1:I2 - If a signal of type I1 is present at time t, then the signal I2 is broadcast at time t + 1.
2. *:I1:I2 - If there is no signal of type I1 present at time t, then the signal I2 is broadcast at time t + 1.
3. *I1::I2 - If a signal of type I1 is present at time t, then a persistent string of type I2 (if any exists) is deleted at the end of time t.
4. *I1:I2:I3 - If a signal of type I1 and a signal of type I2 are both present at time t, then the signal I3 is broadcast at the same time t unless I3 contains unquoted occurrences of the symbols {▽, ▼, Δ} or singly quoted occurrences of *, in which case I3 is broadcast at time t + 1.

When the final string of any of these units (I2 for (1), (2), and (3), I3 for (4)) is interpreted for broadcast, one quote is stripped from each quoted symbol.
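The parsing and typing rules just stated are mechanical enough to be written out in a few lines. The following Python sketch (not part of the original text) does so; ordinary ASCII characters D (for Δ) and ' (for the quote) stand in for the special symbols, and the returned unit bodies omit the leading *.

    # Sketch: parse a string over the broadcast alphabet into broadcast units
    # and classify each unit into one of the four types (or the null unit).
    def split_unquoted(s, sep):
        """Split s at unquoted occurrences of sep (a symbol preceded by ' is quoted)."""
        parts, cur, i = [], "", 0
        while i < len(s):
            if s[i] == "'" and i + 1 < len(s):   # quoted symbol: copy it verbatim
                cur += s[i:i + 2]
                i += 2
            elif s[i] == sep:
                parts.append(cur)
                cur = ""
                i += 1
            else:
                cur += s[i]
                i += 1
        parts.append(cur)
        return parts

    def parse_units(string):
        """Segments between unquoted *s; any prefix before the first * is ignored."""
        return split_unquoted(string, "*")[1:]

    def unit_type(unit):
        """Type 1-4 of a unit body (text after its *), or None for the null unit."""
        args = split_unquoted(unit, ":")[:3]     # ignore a third ":" and what follows
        if len(args) == 2 and args[0] and args[1]:
            return 1
        if len(args) == 3:
            a, b, c = args
            if not a and b and c:
                return 2
            if a and not b and c:
                return 3
            if a and b and c:
                return 4
        return None

    if __name__ == "__main__":
        s = "p10*11'*D0:1D*:11D:11D"             # the parsing example above, D for Δ
        for u in parse_units(s):
            print(u, unit_type(u))               # a type 1 unit and a type 2 unit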
The concept of the state of a (finite) collection of broadcast units facilitates discussion of potential ambiguities in the actions and interactions of the four types of unit. This state at time t is, quite simply, the set of all signals present at time t, including the strings defining devices in the collection, the signals generated by those devices, and the signals generated in the environment of the collection (input signals). Thus the initial state is the set of strings used to specify the initial collection of units, together with all signals present initially. If we look again at the definition of type 4 broadcast units, we see that they may actually use signals in the current state to contribute additional signals to the current state (i.e., they can act with negligible delay much as the switching elements of computer theory). For example, given the broadcast units

*11▽:0▼:11▽▼
*100:100:000

with environmental (input) signals {100, 110} at t = 0 and {100} at t = 1, the state S(0) at t = 0 is

S(0) = {*11▽:0▼:11▽▼, *100:100:000, 100, 110, 000},

and at t = 1 it is

S(1) = {*11▽:0▼:11▽▼, *100:100:000, 100, 000, 11000}.

The latter signal in S(1), 11000, occurs because the unit *11▽:0▼:11▽▼ receives both the signal 110 and the signal 000 at t = 0, so that ▽ = 0 and ▼ = 00, and hence the output 11000 occurs at time t = 1. A little thought shows that the instantaneous action of type 4 units does not interfere with the determination of a unique state at each time since type 4 units can add at most a finite number of signals to the current state.
Since the symbols ▽ and ▼ are meant only to designate initial or terminal strings, their placement within arguments of a broadcast unit can be critical to unambiguous interpretation. For types 1 through 4, if I1 contains exactly one unquoted occurrence of a symbol from the set {▽, ▼}, then that symbol must occur in either the first or last position to be interpreted; otherwise ▽ or ▼ is treated simply as a null symbol without function or interpretation. If I1 contains more than one unquoted occurrence of symbols from the set {▽, ▼} then only the leftmost is operative and then only if it occupies the first position. For type 4 the same convention applies to I2, with the additional stipulation that, if the operative symbol is the same in both I1 and I2, then only the occurrence in I1 is interpreted. Similarly, for types 1 through 4, only the leftmost occurrence of an unquoted Δ is operative. Moreover, if ▽, ▼, or Δ occur unquoted in the output signal of the broadcast unit without interpretable occurrences in the arguments, they are once again treated as null symbols (and are not broadcast). Thus the broadcast unit *▽11▽Δ0:11Δ "looks for" any signal with a 4-symbol suffix beginning with 11 and ending in 0; for example, the signal 001110 would yield the output 111 one time step later.
The final source of ambiguity arises when two or more signals satisfy the same argument of a given string-processing broadcast unit. For example, when the state is

S(t) = {*11▽:▽, 111, 1100}

the broadcast unit *11▽:▽ could process either 111 or 1100, producing either the output 1 or else the output 00. This difficulty is resolved by having the unit select one of the two signals at random. That is, if there are c signals satisfying a given argument at time t, then each is assigned a probability 1/c and one is chosen at random under this distribution. This method of resolving the difficulty extends the power of the language, allowing the representation of random processes.
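The core of these matching conventions can be illustrated with a small, deliberately simplified Python sketch (not part of the original text). It covers only the "don't care" symbol (written ?) and the single-symbol variable (written D); the string variables ▽/▼ and quoting are omitted, and conflicts are resolved by a uniform random choice as described above.

    import random

    def match(arg, signal):
        """Return {'D': bound symbol} if signal matches arg, else None.
        A trailing '?' in arg accepts any suffix from that point on."""
        binding = {}
        if arg.endswith("?"):
            arg = arg[:-1]
            if len(signal) < len(arg):
                return None
            signal = signal[:len(arg)]
        if len(arg) != len(signal):
            return None
        for a, s in zip(arg, signal):
            if a == "?":
                continue
            if a == "D":
                if binding.setdefault("D", s) != s:
                    return None
            elif a != s:
                return None
        return binding

    def fire(arg, out, state):
        """One activation: pick one matching signal at random, substitute D in out."""
        candidates = [s for s in state if match(arg, s) is not None]
        if not candidates:
            return None
        chosen = random.choice(candidates)
        return out.replace("D", match(arg, chosen).get("D", ""))

    if __name__ == "__main__":
        print(fire("11D0", "1D", {"1100"}))        # -> "10", as with *11Δ0:1Δ above
        print(fire("100?", "11", {"10010110"}))    # trailing don't-care: prefix match
        print(fire("11D", "1D", {"110", "111"}))   # conflict: "10" or "11", chosen at random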

3. USAGE
The following examples exhibit typical constructions and operations within the "broadcast language":

1. The object is to produce the concatenation of two arbitrary persistent strings uniquely identified by the prefixes I1 and I2 respectively. In so doing the prefixes should be dropped and the result should be identified by the new prefix I3. This is accomplished by the broadcast unit

*pI1▽:pI2▼:pI3▽▼.


The object is to generatea sample of the random variable defined by

assigningprobability Lin to each of the numbers { I , 2, . . . , n} . To do this each

Adaptation of Codingsand Representations

149

string in a set of n persistentstrings, saythe binary representationsof the numbers I


through n, is prefixed by the samestring, say I , which uniquely indicates that the
strings are to serveas the data basefor the random number generator. When the
state S(t) contains these strings and the signal (string) J , the broadcast unit
*
plV :J :ph V
then accomplishes the task, with J signaling that the sample-taking procedure is
to be initiated , and h indicating the result. Simple, nonuniform random variables
can be approximated by making multiple copies of numbers in the base (so that
their proportions approximate the nonuniform distribution ). More complex distributions
can be handled by using the general computational powers of sets of
broadcast units in conjunction with the above procedure.
3. The object is to generate a sample as in (2) but without replacement (the number drawn is deleted from the data base tagged by I). To accomplish this a second broadcast unit is added to the one in (2), giving

*pI▽:J:pI1▽     *pI1▽::pI▽.

The second unit deletes the string just selected from the data base since just that string is uniquely prefixed with I1 by the first broadcast unit.
4. A particular substring I0 is to serve as a special punctuation mark and the object is to cleave an arbitrary string at the first (ith) occurrence of I0 (if it occurs) in that string. To accomplish this let I1 be a prefix identifying the string to be cleaved, let I2 identify the component to the left of I0 after the cleavage, and let I3 identify the component to the right (including I0). A set of broadcast units accomplishes the cleavage at the first occurrence of I0 in the following stages.

A signal initiates the process, and the string which will be developed into the right component is given its initial configuration, i.e., it is "set equal" to the string to be cleaved.

A test is then made (by a pair of units of the form *pI3I0◊:J1 and *:pI3I0◊:J2) to see if the punctuation I0 is a prefix of the current version of the right component. If so the signal J1 is emitted, indicating that cleavage has been accomplished. Otherwise J2 is emitted, indicating that the test should be repeated one place to the right. (A short chain of units converts J2 into a signal that the punctuation test failed exactly two time-steps ago.)

On such a failure the left component is readied for a new test by having the leftmost symbol of the old right component added to its right end, and the right component is "updated" by deleting the symbol just added to the left component. The "old" right and left components are deleted (simultaneously with the "updating" above) so that only the "new" components for testing appear in the state after two time-steps. A further signal then indicates that the punctuation test is to be reinitiated for the "new" components.

To cleave the string at the ith occurrence of I0, instead of the first, a count must be made of successive occurrences of I0. Since the signal J1 signals such an occurrence, this means counting successive occurrences of J1, restarting the process each time the count is incremented (by issuing a restart signal), until the count reaches i. The next example indicates how a binary counter can be set up to record the count.
5. The object is to count (modulo 2^n) occurrences of a signal J. The basic technique is illustrated by the construction of a one-stage binary counter. The transition function (table) for a one-stage binary counter is: if no count signal occurs, state S0 goes to S0 and S1 goes to S1; if the count signal J occurs, S0 goes to S1 and S1 goes to S0.

The set of broadcast units which realizes this function for an initial state (signal) S0 is:

*:J:J0       Makes the current condition of the input available for calculation of the new state (on the next time-step).
*J:J1
*S0:J10      Makes the current state available for calculation of the new state (on the next time-step).
*S1:J11
*J0:J10:S0   Realization of the transition table.
*J0:J11:S1
*J1:J10:S1
*J1:J11:S0

For example, if J occurs at times t = 1 and t = 3, the counter remains in state S0 through t = 1, switches to S1 at t = 2, and switches back to S0 at t = 4.

The use of broadcast units to realize the given transition table is perfectly general and allows the realization of arbitrary transition functions (including counts modulo 2^n).
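The behaviour claimed for the counter can be checked by stepping its transition table directly. The short Python sketch below (not from the original; it bypasses the broadcast units and works on the table itself) reproduces the toggling for count signals at t = 1 and t = 3.

    # Sketch: the one-stage counter's transition table, stepped directly.
    TABLE = {  # (count signal present?, current state) -> next state
        (False, 0): 0,
        (False, 1): 1,
        (True, 0): 1,
        (True, 1): 0,
    }

    def run(pulses, steps, state=0):
        """Step the counter; `pulses` is the set of times at which the count signal occurs."""
        history = [state]
        for t in range(1, steps + 1):
            state = TABLE[(t in pulses, state)]
            history.append(state)
        return history

    if __name__ == "__main__":
        print(run({1, 3}, steps=4))   # [0, 1, 1, 0, 0]: S0, S0, S1, S1, S0 in the text's terms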
6. Treating the persistent strings as data implies that it should be possible to process them in standard computational ways. As a typical operation consider the addition of two persistent binary integers. The object, then, is to set up broadcast units which will carry out this addition. Let A1 and A2 be the suffixes which identify the two strings. The addition can be carried out serially, digit by digit, from right to left. Much as in example (4) the "rightmost" digits are successively extracted by the broadcast units

*▽ΔA1:I1Δ     *▽ΔA2:I2Δ.

These digits, together with the "carry" from the operation on the previous pair of digits, identified by prefix I3, are submitted as in example (5) to broadcast units realizing the transition table:

I1 (Addend 1)   I2 (Addend 2)   I3 (Carry)   I4 (Sum)   I5 (New Carry)
0               0               0            0          0
0               0               1            1          0
0               1               0            1          0
0               1               1            0          1
1               0               0            1          0
1               0               1            0          1
1               1               0            0          1
1               1               1            1          1

Successive digits of the sum are assembled by the broadcast unit *I4Δ:pA▽:pAΔ▽ where, at the end of the process, the prefix A designates the sum. A few additional "housekeeping" units, such as *pA▽:p▽A, which puts the sum in the same form as the addends, are required to start up the process, keep track of position, etc. The overall process is simply a straightforward extension of techniques already illustrated.
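The sum/carry table above is just the familiar full adder, and the serial, right-to-left procedure it drives can be checked in a few lines. The Python sketch below (not from the original text; names are illustrative) adds two binary strings using exactly that table.

    # Sketch: serial addition of two binary strings, rightmost digits first.
    FULL_ADDER = {  # (addend1 digit, addend2 digit, carry) -> (sum digit, new carry)
        (0, 0, 0): (0, 0), (0, 0, 1): (1, 0), (0, 1, 0): (1, 0), (0, 1, 1): (0, 1),
        (1, 0, 0): (1, 0), (1, 0, 1): (0, 1), (1, 1, 0): (0, 1), (1, 1, 1): (1, 1),
    }

    def serial_add(a, b):
        """Add two binary strings using the table, digit by digit from right to left."""
        width = max(len(a), len(b))
        a, b = a.zfill(width), b.zfill(width)
        carry, digits = 0, []
        for da, db in zip(reversed(a), reversed(b)):
            s, carry = FULL_ADDER[(int(da), int(db), carry)]
            digits.append(str(s))
        if carry:
            digits.append("1")
        return "".join(reversed(digits))

    if __name__ == "__main__":
        print(serial_add("1011", "0110"))   # 11 + 6 = 17 -> "10001"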
7. As a final example note that any string identified with a suffix I can be reproduced by the broadcast unit *▽I:▽I. Note additionally that this unit itself has the suffix I! Hence, if we start with this unit alone, there will be 2^t copies of it after t time-steps. By revising the unit a bit, so that its action is conditional on a signal J, *▽I:J:▽I, this self-reproduction can be controlled from outside (say by other broadcast units). By extending this idea, with the help of the techniques outlined previously, we can put together a set of broadcast units which reproduces an arbitrary set of broadcast units (including itself). The result is a self-reproducing entity which can be given any of the powers expressible in the broadcast language.

At this point it would not be difficult to give the "broadcast language" a precise, axiomatic formulation, developing the foregoing examples into a formal proof of its powers. (For anyone familiar with the material presented, say, in Arbib [1964] or Minsky [1967] this turns out to be little more than a somewhat tedious exercise.) However, our present objectives would be little advanced thereby. It is already reasonably clear that the "broadcast language" exhibits the desiderata outlined at the beginning of section 2. In particular, the broadcast units satisfy the functional integrity requirement (2) in a straightforward way. Consequently, strings of broadcast units can be manipulated by generalized genetic operators with attendant advantages vis-a-vis schemata (see section 6.3 and the close of chapter 7).
Moreover a little thought shows that by using the techniques of usage (4) along with those of (2), units can be combined to define a crossover operator which acts only at specified "punctuations" (such as *s or :s or at a particular indicator string I). The other generalized genetic operators can be similarly defined. New detectors can be formed naturally from environmental signals (represented as binary strings). For example, a signal can be converted to an argument which will detect similar signals (elements of a superset) simply by inserting "don't cares" (◊) at one or more points. Thus, *E▽:pI▽ converts any signal with prefix E ("environmental") into a permanent piece of data which can then be manipulated as in usages (4) and (6) to form a new broadcast unit with some modification of the signal as an argument.

The collection of broadcast units employed by an adaptive system at any time will, in effect, determine its representation of the environment. Since the units themselves are strings which can be manipulated by generalized genetic operators, strings of units ("devices") can be made subject to reproductive plans and intrinsic parallelism in processing. More than this, the adaptive plan can modify and coordinate the broadcast units to form models of the environment. By implementing, within the language, the "prediction and correction" techniques for models discussed in Illustration 3.4, we can arrive at a very sophisticated adaptive plan, one which can rapidly overcome inadequacies in its representation of the environment. This approach will be elaborated in the next section. The "broadcast language" already makes it clear that there exist languages suitable both for defining arbitrary representations and for defining the operators which allow these representations to be adapted to environmental requirements.
4. CONCERNING APPLICATIONS AND THE USE OF GENETIC PLANS TO MODIFY REPRESENTATIONS

The broadcast language provides unusually straightforward representations for a


variety of natural models . Such representation not only provides a uniform context
for comparisons and rigorous study , it also makes clear the " computational "
or processing power of the model and its susceptibility to adaptation .
The Britten - Davidson generalization ( 1969) of the " operon - operator "
model serves well to illustrate the point . This is a model for regulation in higher
cells ; as such it includes many mechanisms not found in the simpler bacterial cells
modeled by Jacob and Monod ( 1961) . The model consists of four basic types of
gene ( see also Figure 14) :
1. The sensor gene is activated (perhaps via intermediary molecules) by any of various agents (enzymes, hormones, metabolites) involved in inter- and intracellular control. That is, the sensor gene is a detector sensitive to the state of the cell and its environment.
2. The producer genes are the specific controls for the actual production of
cell structures ( membranes , organelles , etc.) and operating agents
(enzymes, etc.) . They are the output controls of the regulation procedure .
3. Each integrator gene is associated with a sensor gene and sends out a
specific signal ( molecule ) to other genes when the sensor is activated .
Several integrator genes may be associated with a single sensor, thus
allowing the sensor to initiate a variety of signals .
4. The receptor gene is a link between integrator genes and producer genes.
Each receptor gene is associated with a single producer gene and is
sensitive to a single integrator signal . When the signal is received the
producer gene is activated . A given producer gene may have several
associated receptors .


[Figure 14 appears here in the original.]
Fig. 14. Schematic of the Britten-Davidson generalized "operon-operator" model for gene regulation in higher cells. (Legend: sensor gene, integrator gene, receptor gene, producer gene, molecular signal (chain).)

Translated to the broadcast language, a sensor-integrator gene complex S I1 I2 I3 is directly represented by the set of broadcast units *S:I1 *S:I2 *S:I3. A receptor-producer complex R1 R2 P is similarly represented by *R1:P *R2:P. If the receptor R1 responds only to the integrator I1, say, then *R1:P would be replaced by *I1:P. It is apparent, however, that the receptor R1 could be activated by several related signals, say IA, IB, IC, . . . , in which case the producer complex would be represented by *I◊:P. Similarly a sensor S could be sensitive to any product with an initial radical X so that S I1 I2 I3 could be represented by *X◊:I1 *X◊:I2 *X◊:I3. Clearly extremely complex feedback loops can be constructed, allowing a great range of conditional actions dependent upon substrate and products already produced. As an example (simplified for brevity) the sensor-integrator-receptor-producer complex *X1◊:I1A *I1◊:X1P *X1P:I1C maintains production of X1P if a metabolite with initial radical X1 ever makes an appearance. In fact, if sensors and receptors can have the same range of sensitivity as the arguments of broadcast units, it is easy to show that there is an appropriate Britten-Davidson model for producing any arbitrarily given sequence of products.
A very similar representation can be produced for the lymphocyte immune network, well described by Niels K. Jerne (1973) and presented more technically by M. Sela (1973). In this case the "detectors" are combining sites on antibody molecules produced by lymphocyte cells. The environmental "signals" are invading antigens (e.g., foreign protein molecules). The presence of a detected antigen causes the production of additional lymphocytes (additional "broadcast units," see usage (7) of section 3) which in turn secrete additional antibodies which combine with (and neutralize) the antigens.
A bit further afield, the broadcast language can also serve for a straightforward representation of the cell-assembly model of the central nervous system (section 3.6). Here the broadcast units are cell assemblies, while the "to-whom-it-may-concern" aspect of the broadcast language is reasonably approximated by the large number of neurons (10^3 to 10^4) in other assemblies contacted by each neuron in a given cell assembly. (More specific interconnections can be represented by appropriate "tagging" (prefixes) as in section 3.) Then, synaptic "learning rules" which induce fractionation and recruitment in cell assemblies find counterparts in generalized genetic operators which modify representations. Closely associated cell assemblies become the counterparts of tested representational components (cf. schemata), and so on. (The interested reader should consult Plum [1972] for the details of a related model.)
In the context of the broadcast language, the cell-assembly model fits smoothly with the predictive modeling technique of section 3.4. A discussion of the latter implementation also gives an indication of how the broadcast language is applied to artificial systems. One implementation which emphasizes the cell-assembly/predictive-modeling fit relies on a set of behavioral units which generate action sequences and are modified on the basis of the outcome. Each behavioral unit consists of a population of behavioral atoms realized as devices in the broadcast language. If we look back to the search strategies of Figure 6 it is the detectors which have a role comparable to the atoms here. In the broadcast language, the detectors would be broadcast units (or sets of them) with arguments corresponding to the conditions defining the detector. (For example, the atom corresponding to δ1 in Figure 6 would be activated by any 4-by-4 array with 8 or more dark squares.)

The unit(s) implementing the detector, when activated, would broadcast a signal with an identifying prefix. (For the reader familiar with the early history of pattern recognition, these units act much like the demons at the lowest level of Selfridge's "Pandemonium" [1959].) Other devices would "weight" the signals, sum them, and "compare" the result to a threshold (cf. section 7.3) to determine which response signal (from the set of transformations) should be broadcast. More generally, the behavioral atoms would be a string of broadcast units with an "initiate" condition C which specifies the set of signals capable of activating the atom, an "end" signal J which indicates the end of the atom's activity, and a "predicted value" signal which is meant to indicate the ultimate value to the behavioral unit of that atom's activation. With this arrangement we can treat the behavioral unit as a population of atoms. The atom activated at any given time is determined by a competition between whatever atom is already activated and all atoms having condition C satisfied by a signal in the current state S(t). The higher the predicted value of the atom the more likely it is to win the competition. The object at this level is to have each atom's predicted value V consistent with that of its successor, so that a set of atoms acting in sequence provides a consistent prediction of their value to the behavioral unit. (In this way the atoms satisfy the error correction requirements discussed at the beginning of section 3.4 under element (iii) of a typical search plan.) This object can be accomplished via a genetic plan applied to the population of atoms: the reproduction of any atom is determined by the match between its predicted value and that of whatever atom is next activated. For example, consider two atoms, a1 with parameters (C, J, V) and a2 with parameters (C', J', V'), where a1's end signal acts as a2's initiate signal. Then (V' - |V' - V|) could be used as a payoff to a1 for purposes of the genetic plan, since the quantity measures the match between V' and V. The population would then be modified as outlined in section 6.1, new atoms being assigned the predicted value of the successor a2. That is, the offspring of a1 would be assigned the predicted value V'. All atoms active since the last actual payoff from the environment, and their offspring, are tagged and their predicted values are adjusted up or down at the time of the next environmental payoff. The adjustment is determined by the difference between predicted value and the actual payoff rate (payoff received from the environment divided by the elapsed time since last payoff). After each environmental payoff the active behavioral unit is subjected to a genetic plan (again as described in section 6.1). The behavioral unit next active (after the environmental payoff) is determined by the winner of a competition among all atoms in all behavioral units. The outcome of the competition is determined in the same way as the within-unit competition. Finally, a behavioral unit may be subjected to competition in the absence of environmental payoff, the probability of such an event increasing as a function of elapsed time since last payoff.
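The bookkeeping just described can be caricatured in a few lines of Python. Everything below is an illustrative assumption (the names and the simple additive update stand in for the full genetic plan); only the consistency payoff V' - |V' - V| and the adjustment toward the observed payoff rate come from the text.

    # Sketch (not Holland's implementation): consistency payoff and predicted-value adjustment.
    def consistency_payoff(v_pred, v_next):
        """Payoff credited to an atom whose successor carries predicted value v_next:
        equal to v_next when the two predictions agree, lower as they diverge."""
        return v_next - abs(v_next - v_pred)

    def adjust(v_pred, payoff_rate, step=0.1):
        """Move an atom's predicted value toward the observed payoff rate
        (environmental payoff divided by elapsed time since the last payoff)."""
        return v_pred + step * (payoff_rate - v_pred)

    if __name__ == "__main__":
        print(consistency_payoff(v_pred=3.0, v_next=5.0))   # 3.0: mismatch is penalized
        print(consistency_payoff(v_pred=5.0, v_next=5.0))   # 5.0: a consistent chain
        v = 3.0
        for _ in range(5):
            v = adjust(v, payoff_rate=5.0)
        print(round(v, 3))                                   # drifts toward the observed rate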
Much more detail would have to be supplied for this implementation of predictive modeling to reach the level of precision earlier given to the description of genetic plans. However the objective here is only to indicate the potential of the broadcast language for predictive modeling with changing representations.

As a final example, note that the world of radiative signals (sound, light, etc.) is susceptible to modeling as a complex broadcast system. In fact one physical realization of devices specified in the broadcast language would assign a unique frequency to each signal and realize broadcast units as a variety of frequency modulation devices.
Even where the broadcast language does not so directly represent extant models, it still supplies a rigorous framework for the description and modification of representations. In particular it makes possible the application of genetic plans to the problem of discovering suitable representations. Because devices are represented by strings and because the functional elements (the broadcast units) are self-defined, the generalized genetic operators of sections 6.2 and 6.3 can be used to modify the devices. Moreover, as indicated in section 3, these operators can themselves be defined within the broadcast language. This makes possible a hierarchy of operators defined with respect to various kinds of punctuation. Thus, one crossover operator could be defined to produce crossing-over anywhere along the string, another could be defined to produce crossing-over only at the symbol : (thereby providing for exchange of arguments between broadcast units), still another only at * (thereby exchanging broadcast units between devices), and so on. The operators so defined introduce a hierarchy of schemata ranging from schemata concerned primarily with varieties of arguments, through schemata concerned with combinations of broadcast units, and on to higher levels of organization (e.g., behavioral atoms, behavioral units, etc.).

Note that for the broadcast language schemata are generally defined with respect to sets of arbitrarily long strings. That is, the set of all devices specifiable in the broadcast language would be the set of all strings which can be formed from the ten basic symbols; since a schema designates the set of all strings which match it on its defining positions, each schema designates a countable subset of devices. Using this extension of the notion of a schema, we see that the results of chapters 6 and 7, particularly those pertaining to intrinsic parallelism, extend directly to the adaptation of codings and representations. Since the operators themselves can also be specified within the broadcast language, they can also be made subject to the same adaptive processes.

In sum: This chapter has been concerned with removing the limitations imposed by fixed representations. To this end it is possible to devise languages (the broadcast language is an example) which use strings to rigorously define all effectively specifiable representations, models, operators, etc. Since the objects of the language are presented as strings, they can be made grist for the mill provided by genetic plans. As a consequence the advantages of compact storage of accrued information, operational simplicity, intrinsic parallelism, robustness, etc., discussed in chapters 6 and 7, extend to adaptation of representations.


9. An Overview

Enough of the theoretical framework has now been erected that we can begin to view it as a whole. To this end, the present chapter will discuss three general aspects of the theory. Section 1 will concentrate on those insights offered by the theory which are useful across the full spectrum of adaptive problems. Section 2 provides a synopsis of several computer studies to give the reader an idea of how the overall theory works in particular contexts. Section 3 will outline several difficult long-range problems which fall within the scope of the theory.

1. INSIGHTS
Within the theoretical framework problems of adaptation have been phrased in terms of generating structures of progressively higher performance. Because the framework itself places no constraints on what objects can be taken as structures, other than that it be possible to rank them according to some measure of performance, the resulting theory has considerable latitude. Once adaptation has been characterized along these lines, it is also relatively easy to describe several pervasive, interrelated obstacles to adaptation, obstacles which occur in some combination in all but the most trivial problems:

1. High cardinality of 𝒜. The set of potentially interesting structures is extensive, making searches long and storage of relevant data difficult.
2. Apportionment of credit. Knowledge of properties held in common by structures of above-average performance is incomplete, making it difficult to infer from past tests what untested structures are likely to yield above-average performance.
3. High dimensionality of μ_E. Performance is a function of large numbers of variables, making it difficult to use classical optimization methods employing gradients, etc.

4. Nonlinearity of μ_E. The performance measure is nonlinear, exhibiting "false peaks" and making it difficult to avoid concentration of trials in suboptimal regions.
5. Mutual interference of search and exploitation. Exploitation of what is known (generation of structures observed to give above-average performance) interferes with acquisition of new information (generation of new structures) and vice versa.
6. Relevant non-payoff information. The environment provides much information in addition to performance values (payoff), some of which is relevant to improved performance.
The schema concept suggests a coordinated array of robust procedures for meeting these obstacles. The procedures are all founded on the view that each structure is a "carrier" (or selected sample point) of each of the great number of schemata it instances. Because arbitrary structures are easily represented as strings (by using detectors or more sophisticated techniques such as the broadcast language) the resulting procedures apply to adaptation in all its forms. Once schemata have been defined, there is a natural means (p. 69) of comparing structures and apportioning credit by assigning to each schema the average of payoffs to its observed instances (compensating obstacle (2)). A small population of structures, when properly selected (pp. 139-40), can then store the relative performance rankings for very large numbers of schemata (compensating obstacle (1)). It is this broad data base vis-a-vis schemata (p. 87) which enables genetic plans to escape false peaks and other difficulties engendered by nonlinearities (compensating obstacle (4)). Recasting the search problem in terms of the space of schemata sidesteps dimensionality effects (obstacle (3)), at least for intrinsically parallel procedures such as genetic plans (p. 71). Under such plans the succession of structures generated from the data base (the current population) induces a highly parallel, diffusion-like spread of trials in the space of schemata (pp. 104-6). This takes place in such a fashion that there is:

(1) progressive exploitation of the best observed schemata,
(2) increasing confidence in the estimates of the expected payoff to the best observed schemata, and
(3) testing of great numbers of new combinations of schemata (both newly generated schemata and new combinations of already tested schemata of high rank).

In the particular case of the interaction of crossover and inversion with reproduction, a net of associations is induced (p. 108). Coadapted attributes (attributes defining schemata of above-average performance) become tightly linked and increase their proportion in the population (p. 127). In fact (p. 137) the expected rate of increase dP_ξ/dt of the proportion of any given schema ξ is closely approximated by

dP_ξ/dt = P_ξ(μ_ξ - μ̄) = P_ξ a_ξ,

where a_ξ is the average excess of the random variable μ_ξ in the population ℬ(t). This formula is analogous to Fisher's (1930) classical result for single alleles and reduces to it when ξ is restricted to a single defining position. The resulting intrinsic parallelism greatly ameliorates the conflict between search and exploitation (obstacle (5)). By building up representations and models in terms of a language like the broadcast language (p. 152) the overall advantages of the schema approach can also be brought to bear on the problem of non-payoff information (obstacle (6)). The schemata provide for apportionment of credit to various aspects of the model on the basis of their relevance to realized predictions.
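The growth law dP/dt = P(mu_schema - mu_mean) can be illustrated numerically. The Python sketch below (not from the text; the fitness values and the Euler step are arbitrary choices) integrates it for two competing schemata and shows the proportion of the fitter one rising.

    # Sketch: replicator-style growth of a schema's proportion under the law above.
    def simulate(mu1=1.2, mu2=1.0, p=0.1, dt=0.01, steps=1000):
        history = [p]
        for _ in range(steps):
            mean = p * mu1 + (1 - p) * mu2     # population average payoff
            p += dt * p * (mu1 - mean)         # average excess drives the growth rate
            history.append(p)
        return history

    if __name__ == "__main__":
        h = simulate()
        # proportion of the fitter schema rises steadily (roughly 0.1, 0.23, 0.45 here)
        print(round(h[0], 3), round(h[500], 3), round(h[-1], 3))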

2. COMPUTER STUDIES
At the time of this writing several computer studies of genetic plans have been completed (and more are underway). Four studies closely related to the theoretical framework will be outlined here: R. S. Rosenberg's Simulation of Genetic Populations with Biochemical Properties (1967), D. J. Cavicchio's Adaptive Search Using Simulated Evolution (1970), R. B. Hollstien's Artificial Genetic Adaptation in Computer Control Systems (1971), and D. R. Frantz's Non-linearities in Genetic Adaptive Search (1972).

Richard Rosenberg completed his computer study of closed, small populations while formulation of the theoretical framework was still in its early stages. He concentrated on the complicated relationship between genotype and phenotype under dynamic interaction between the population and its environment. The model's central feature is the definition of phenotype by chemical concentrations. These concentrations are controlled by enzymes under genetic control. Epistasis has a critical role because several enzymes (and hence the corresponding genes) can affect any given phenotypic characteristic (chemical concentration). Though the variety of molecules, enzyme-controlled reactions, and genes is kept small to make the study feasible, it still presents a detailed picture of the propagation of advantageous, linked genes through a small population. Moreover the study suggests general relations between the number of genes, crossover probabilities, and the rate of adaptation under epistasis. Equally important, the study makes it clear that quite complex ("molecular") definitions of phenotype can be simulated without losing relevance, up to and including suggestions for experiments in vivo and in vitro. (At least two subsequent detailed studies of biological cells were directly encouraged by this experience, R. Weinberg's Computer Simulation of a Living Cell [1970] and E. D. Goodman's Adaptive Behavior of Simulated Bacterial Cells Subjected to Nutritional Shifts [1972].)
The first study based directly upon the theoretical framework was that of Daniel Cavicchio. (J. D. Bagley's The Behavior of Adaptive Systems Which Employ Genetic and Correlation Algorithms [1967] is an earlier study which is a direct precursor of both this study and Frantz's.) The set of structures 𝒜 is taken to be a broad class of pattern classification devices based on those developed by Bledsoe and Browning (1959) and Uhr (1973). Specifically each device uses a set of detectors to process information presented by the sensors in a 25 by 25 array (cf. section 1.3 and Figures 5 through 7). After an initial "training" period, during which the device A ∈ 𝒜 accumulates information about one or more handwritten alphabets, A is tested and scored on its classification of letters from another handwritten alphabet. This score amounts to A's performance rating, its payoff μ(A). The adaptive plan, a version of the class of reproductive plans described earlier (pp. 94-95), generates new detectors (and, in the process, new devices) by using genetic operators which are variations on the operators discussed in sections 6.2 through 6.4.

Because of the sophistication of the problem environment, the first objective is to develop some estimate of the task's difficulty vis-a-vis the devices in 𝒜. Cavicchio does this by testing, in the problem setting, a large number of devices drawn at random from 𝒜. The observed distribution of performances is Gaussian. For a typical environment (Cavicchio calls it the "difficult task"), the mean score is 17 with a standard deviation of 5. (A perfect score would be 100.) This implies that in 1000 random trials of devices drawn from 𝒜 we can expect the best performance to be about 32.
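The "about 32" figure follows from the Gaussian fit: the best of 1000 draws lies near the (1 - 1/1000) quantile, roughly 3 standard deviations above the mean. The short Python check below (not from the study) reproduces it with the standard library.

    # Sketch: expected best score of 1000 random trials from a Gaussian fit.
    from statistics import NormalDist

    mean, sd, trials = 17.0, 5.0, 1000
    best = NormalDist(mean, sd).inv_cdf(1 - 1 / trials)
    print(round(best, 1))   # about 32, i.e. roughly 3 standard deviations above the mean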
To obtain an idea of the performance of a nonreproductive, but adaptive, plan in the same environment, Cavicchio applied a version of Uhr and Vossler's (1973) "detector evaluation" procedure to the search of 𝒜. This procedure amounts to identifying inferior detectors and replacing them with "mutated" versions. The best performance observed over a great many runs of 600 trials each was a score of 52; each of the runs "leveled out" somewhere between the 300th and the 600th trial. This is considerably better than a random search, being 7 standard deviations above the mean in less than 600 trials (as compared to 3 standard deviations in 1000 trials).
Against this background Cavicchio then developed and tested a series of reproductive plans. The best of these attained a score of 75.5 in 780 trials, a score considerably beyond that attained in any of the "detector evaluation" runs. (In qualitative terms, a score of 52 would correspond to a "poor" human performance, while a score of 75.5 would correspond to a "good" human performance. Because many characters in the "difficult task" are quite similar in form, increments in scoring are difficult to attain after the easily distinguished characters have been handled.) An important general observation of this study is that the sophistication and power of a genetic plan is lost whenever M, the size of the population (data base), is very small. It is an overwhelming handicap to use only the most recent trial (M = 1) as the basis for generating new trials (cf. Fogel et al. 1966). On the other hand, the population need not be large to give considerable scope to genetic plans (20 was a population size commonly used by Cavicchio).
Roy Hollstien added considerably to our detailed understanding of genetic plans by making an extensive study of genetic plans as adaptive control procedures. His emphasis is on domains wherein classical "linear" and "quadratic" approaches are unavailing, i.e., domains where the performance function exhibits discontinuities, multiple peaks, plateaus, elongated ridges, etc. To give the problems a uniform setting he transforms them to discrete function optimization problems, encoding points in the domain as strings (see p. 57). An unusual and productive aspect of Hollstien's study is his translation of breeding plans for artificial genetic selection into control policies. A breeding plan which employs inbreeding within related (akin) policies, and recurrent crossbreeding of the best policies from the best families, is found to exhibit very robust performance over a range of 14 carefully selected, difficult test functions. (The test functions include such "standards" as Rosenbrock's ridge, the sum of three Gaussian 2-dimensional density functions, and a highly discontinuous "checkerboard" pattern.) The test functions are represented on a grid of 10,000 points (100 by 100). In each case the region in which the test function exceeds 90 percent of its maximum value is small. For example, test function 7 with two false peaks (the sum of three Gaussian 2-dimensional densities) exceeds 90 percent of its maximum value on only 42 points out of the 10,000. The breeding plans are tested over 20 generations of 16 individuals each, special provisions being made to control random effects of small sample size ("genetic drift"). The breeding plan referred to above, when confronted with test function 7, placed all of its trials in the "90 percent region" after 12 generations (192 trials). A random search would be expected to take 250 trials (10,000/42) to place a single point in the "90 percent region". The same breeding plan performs as well or better on the 13 other test functions. Given the variety of the test functions, the simplicity of the basic algorithms, and the restricted data base, this is a striking performance.
Daniel Frantz concentrated on the internal workings of genetic plans, observing the effect, upon the population, of dependencies in the performance function. Specifically, he studies situations in which the quantity to be optimized is a function of 25 binary parameters. I.e., ℰ consists of functions which are 25-dimensional and have a domain of 2^25 ≈ 3.4 × 10^7 points. Dependencies between the parameters (nonlinearities) are introduced to make it impossible to optimize the functions dimension by dimension (unimodality is avoided). Frantz's procedure is to detect the effects of these dependencies upon population structure (gene associations) by using a multidimensional chi-square contingency table. As expected from theoretical considerations (see Lemma 7.2 and the discussion following it) algebraic dependencies (between the parameters) induce statistical dependencies (between alleles). That is, in the population, combinations of alleles associated with dependent parameters occur with probabilities different from the product of the probabilities of the individual alleles. Moreover there is a positional effect on the rate of improvement: For functions with dependencies the rate of improvement is significantly greater when the corresponding alleles are close together in the representation. This effect corresponds to the theoretical result that the ability to pass good combinations on to descendants depends upon the combinations' immunity to disruption by crossover. It is significant that, for the problems studied, the optimum was attained in too short a time for the inversion operator to effectively augment the rate of improvement (by varying positional effects).
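The kind of contingency-table check described can be sketched for a single pair of loci. The Python fragment below (not Frantz's code; the counts are invented and scipy is assumed to be available) tests whether the alleles at two loci occur independently in a population of strings.

    # Sketch: chi-square contingency test for association between alleles at two loci.
    from scipy.stats import chi2_contingency

    # counts of (locus A allele, locus B allele) over a population of 200 strings
    table = [[70, 30],   # A = 0: B = 0, B = 1
             [30, 70]]   # A = 1: B = 0, B = 1
    chi2, p, dof, expected = chi2_contingency(table)
    print(round(chi2, 2), round(p, 4))   # small p: the alleles are not independent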

3. ADVANCED QUESTIONS
The results presented in this book have a bearing on several problem areas substantially more difficult than those recounted in section 9.1. Each of these problems has a long history and is complex enough to make sudden resolution unlikely. Nevertheless the general framework does help to focus several disparate results, providing suggestions for further progress.

As a first example, let us look at the complex of problems concerned with the dynamics of speciation. These problems have their origin in biology, but a close look shows them to be closely related to problems in the optimal allocation of limited resources. To see this, consider the following idealized situation. There are two one-armed bandits, bandit B1 paying 1 unit with probability p1 on each trial, bandit B2 paying 1 unit with probability p2 < p1. There are also M players. The casino is so organized that the bandits are continuously (and simultaneously) operated, so that at any time t, for a modest fee, a player may elect to receive the payoff (possibly zero) of one of the two bandits. The manager has, however, introduced a gimmick. If M1 players elect to play bandit B1 at time t, they must share the unit of payoff if the outcome is successful. That is, on that particular trial, each of the M1 players will receive a payoff of 1/M1 with probability p1. Now, let us assume that the M players must participate for a period of T consecutive trials. If there is but one player (M = 1), clearly he will maximize his income (or minimize his losses) by playing bandit B1 at all times. However, if there are M > 1 players the situation changes. There will be stable queues, where no player can improve his payoff by shifting from one bandit to another. These occur when the players distribute themselves in the ratio M1/M2 = p1/p2 (at least as closely as allowed by the requirement that M1 and M2 be integers summing to M). For example, if p1 = 4/5, p2 = 1/5, and M = 10, there will be 8 players queued in front of bandit B1 and 2 players in front of bandit B2. We see that with limited resources (in the numerical example, a maximum of 2 units payoff per trial and an expectation of 1 unit) the population of players must divide into two subpopulations in order to optimize individual shares of the resources (the "bandit B1 players" and the "bandit B2 players"). Similar considerations apply when there are r > 2 bandits.
We have here a rough analogy to the differentiation of individuals (the subpopulations) to exploit environmental niches (the bandits). The analogy can be made more precise by recasting it in terms of schemata. Let us consider a population of M individuals and the set of 2^l0 schemata defined on a given set of l0 positions. Assume that schema ξ_i, i = 1, . . . , 2^l0, exploits a unique environmental "niche" which produces a total of Q_i units of payoff per time step. (Q_i corresponds to the renewal rate of a critical, volatile resource exploited by ξ_i.) If the population contains M_i instances of ξ_i, the Q_i units are shared among them so that each instance of ξ_i receives a payoff of Q_i/M_i. Let Q(1) > Q(2) > . . . > Q(2^l0) so that schema ξ(1) is associated with the most productive niche, ξ(2) with the second most productive niche, etc. Clearly when M(1) is large enough that Q(1)/M(1) < Q(2), an instance of ξ(2) will be at a reproductive advantage. Following the same line of argument as in the case of the 2 one-armed bandits, we get as a stable distribution the obvious generalization:

M(i) = cQ(i)/Q(j)

JSystems
Adaptation in Natural and Artificia

166

wherej is the smallestindexsuchthat


/ QU+l ) > M
Ei :tl Q(11
and c is chosenso that
E {. 1cQ(t')/ Q<l1 ~ M
, let /0 ~ 2 with
). For example
(modifiedso that the actualsolutionis in integers
2 alleles(attributes) at eachlocus, yielding schemata~1, &, &, ~4 with Ql ~ I ,
Qt ~ 4, Qa~ 8, Q4~ I. Then for M ~ 9 there will be 6 instancesof &, 3
instancesof &, and no instancesof ~1or ~4in the stabledistribution.
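
To see how the allocation works out, here is a minimal sketch (my own illustration, not a procedure from the book; the greedy assignment is simply one convenient way to reach the integer solution) that reproduces the stable distribution of the example above:

    def stable_distribution(Q, M):
        """Greedily place M individuals into niches with payoff rates Q:
        each newcomer joins the niche currently offering it the largest
        per-instance payoff Q[i] / (count[i] + 1).  For the example in the
        text this reproduces the stable (integer) distribution."""
        counts = [0] * len(Q)
        for _ in range(M):
            gains = [q / (c + 1) for q, c in zip(Q, counts)]
            counts[gains.index(max(gains))] += 1
        return counts

    # Example from the text: Q1 = 1, Q2 = 4, Q3 = 8, Q4 = 1 and M = 9
    print(stable_distribution([1, 4, 8, 1], 9))   # -> [0, 3, 6, 0]
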
Here we have a simple example of speciation. If the population is restricted to M individuals (by factors other than the niche payoff rates), certain combinations of alleles appear in a stable competition while other combinations are proscribed by the same competition. The example can rapidly be made more realistic by letting the payoff to each schema ξ be a random variable with expected payoff

μ_ξ(t) = min{ μ_ξ, Q_ξ/M_ξ(t), Q/M(t) },

where Q_ξ is the minimum of the renewal rates of resources characterizing the environmental niche associated with ξ, M_ξ(t) is the number of instances of ξ at time t, Q is the minimum of the renewal rates of resources required by all the schemata, and M(t) is the total population at time t. Now the schema ξ will increase its proportion at an intrinsic rate set by μ_ξ until it reaches the "carrying capacity" of its niche, determined by Q_ξ, or until the total population has increased to a point that the overall "carrying capacity," determined by Q, limits further expansion. (For the reader familiar with MacArthur and Wilson's [1967] work, the effect of Q_ξ corresponds to a K selection (crowded niche) effect, whereas μ_ξ is the intrinsic rate of increase, possibly wasteful of resources, under classical r selection. Q sets an ultimate limit on the carrying capacity of the environment, no matter what the diversity or organization of the species.) With typical values for the {Q_ξ} and Q, the population will once again develop into subpopulations characterized by certain combinations of alleles (schemata), with many combinations being proscribed.
The really interesting form of this theory would characterize niches (and hence the overall payoff function μ) in terms of the varieties of schemata that could exploit them, with different schemata exploiting a given niche with differing efficiencies. The dynamics of speciation would then be determined by competition within and across niches. It is interesting that under these circumstances speciation could take place in the absence of isolation (in contrast to the usual view, cf. Mayr 1963).
Once an adaptive system discovers that given combinations of genes (or their alleles) offer a persistent advantage, new modes of advance become possible. If the given combinations can be handled as units they can serve as components ("super genes") for higher order units. In effect the system can ignore the details underlying the advantage conferred by a combination, and operate simply in terms of the advantage conferred. By so doing the system can explore regions of the space of structures, i.e., combinations of the new units, which would otherwise be tried with a much lower probability. (For example, consider two combinations of 10 alleles each under the steady state of section 7.2. If each of the alleles involved occurs with a frequency of 0.8, the overall combination of 20 alleles will occur with a frequency (0.8)^20 ≈ 0.01. On the other hand, if each of the two 10-allele combinations is maintained at a frequency of only 0.5, then the 20-allele combination will occur with frequency (0.5)^2 = 0.25. I.e., the expected time to occurrence will be reduced by a factor of 25.) Since combinations of advantageous units often offer an advantage beyond that of the individual units, as when the units' effects are additive (linear independence) or cooperative, they are good candidates for early testing. (The cooperative case where one unit effects an enrichment which can be exploited by another is particularly common; cf. cooperating cell assemblies or stages of a complex production activity such as illustrated in Figure 3.)
We have already discussed (section 6.3) the way in which inversion can favor association between genes. However, by controlling representation, the adaptive system can bring about changes which go much further, producing a hierarchy of units. The basic mechanism stems from the introduction of arbitrary punctuation marks to control operators (see usage (4) in section 8.3 and the discussion on pages 152-53). The adaptive system introduces a distinct punctuation mark (specific symbol string) to mark off the combinations which are to be treated as units at a given level of the hierarchy. Then the operators for that level are restricted to act only at that punctuation. (E.g., crossover takes place only at the positions marked by the given punctuation.) By introducing another punctuation mark to treat combinations of these units, in turn, as new units, and so on, the hierarchy can be extended to any number of levels. The resulting structure offers the possibility of quickly pinpointing responsibility for good or bad performance. (E.g., a hierarchy of 5 levels in which each unit is composed of 10 lower level units allows any one of 10^5 components to be selected by a sequence of 5 tests.) In the hierarchy, the units at each level are subject to the same "stability" considerations as schemata (pp. 100-102), being continually modified by operators at lower levels. Thus certain hierarchies will be favored because of their stability, the corresponding punctuations and operators becoming common features of the overall population. Chapter 4 of Simon's book, The Sciences of the Artificial (1969), gives a good qualitative discussion of this and related topics.
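
As a concrete illustration of the punctuation mechanism just described, here is a minimal sketch (my own, not from the book; the use of '|' as the punctuation mark is an arbitrary choice) in which crossover is permitted only at positions carrying the punctuation mark, so that the marked-off segments are exchanged as intact units:

    import random

    def punctuated_crossover(parent_a, parent_b, mark="|"):
        """Cross two strings only at positions where both carry the
        punctuation mark, so the delimited segments act as indivisible units."""
        cuts = [i for i, (x, y) in enumerate(zip(parent_a, parent_b))
                if x == mark and y == mark]
        if not cuts:                       # no shared punctuation: no crossover
            return parent_a, parent_b
        cut = random.choice(cuts)
        return (parent_a[:cut] + parent_b[cut:],
                parent_b[:cut] + parent_a[cut:])

    # two parents whose three-symbol units are delimited by '|'
    print(punctuated_crossover("111|000|111", "000|111|000"))
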
It is natural to ask whether these operator-induced hierarchies can account for important features of such observed hierarchies as the organelle, cell, organ, organism, species, . . . hierarchy of biology, or the hierarchical organization of the CNS or a computer program. There would seem to be a strong relation between operator-induced hierarchies and the sequences of developmental biology (embryogenesis and morphogenesis) whereby, for example, a fertilized egg develops into a mature multicellular organism.
As a final problem area we can look to situations wherein payoff to a given structure varies in time and space. For example, in the case of limited resources, the resource renewal rates Q_ξ may be both temporally and spatially inhomogeneous, being described by a function Q_ξ(x_1, . . . , x_d, t). In such cases we would also expect the population at time t to be distributed spatially, yielding ℬ(x_1, . . . , x_d, t) as the component at coordinate (x_1, . . . , x_d). After some adaptation any one component of the population, in response to the spatial variations in payoff, will generally exhibit different proportions of schemata than its neighbors.
In ecological situations, as well as in certain control situations, it is appropriate to consider the migration of structures from one component of the population to another (one coordinate to another). That is, under the direction of the adaptive plan, the jth structure A_j(x_1, . . . , x_d, t) in the population component ℬ(x_1, . . . , x_d, t) may be transferred to a neighboring coordinate (x'_1, . . . , x'_d), becoming an element of ℬ(x'_1, . . . , x'_d, t + 1). (Such systems can be usefully described with the help of cellular automata; see R. F. Brender's A Programming System for Cellular Spaces 1969 and Essays on Cellular Automata edited by A. W. Burks 1970.) Under these conditions we would expect to observe a spatial diffusion of schemata. Thus schemata having a large number of instances in ℬ(x_1, . . . , x_d, t) would be expected to appear in fair numbers in neighboring components of the population, even if their performance there is "poor." At the boundaries between different niches the genetic operators will produce unusual "hybrids" of schemata common in each of the niches. That is, where there are sharp changes in the Q_ξ(x_1, . . . , x_d, t), crossover will yield a wide range of new schemata, which would otherwise occur with low probability. Many of these schemata will be unfit or fit only in the boundary region, but some may exhibit exceptional performance on one or both niches. The relation to Mayr's (1963) description of speciation as the result of contact between previously isolated, locally adapted populations is manifest. (See, however, the comment on page 166.) There is much to be learned about these processes, particularly with reference to schemata or coadapted sets. (Some of the most interesting work to date has been carried out by A. Brues 1972.)

It is clear that the addition of migration rules to reproductive plans affords a sophisticated approach to spatially inhomogeneous environments, but we need to know a great deal more about the efficiency and robustness of such an approach (paralleling the development of chapters 5 and 7 for the homogeneous case).
So far we have been discussing spatial inhomogeneity of payoff, but temporal inhomogeneity or nonstationarity is an even more difficult problem. There are four points at which the results of this book have a bearing on such problems. First, and most obvious, the rapid response of reproductive plans, exhibited concretely in the studies of Cavicchio (1970) and Hollstien (1971), permits "tracking" of the changing payoff function. As long as the payoff function changes at a rate comparable to the response rate, overall performance will be good. The proportions of schemata in the population will change rapidly enough to take advantage of current features of the environment. As a second point, it should be noted that the rank bestowed on a schema (its proportion in successive generations) is the geometric mean of the observed averages û_ξ(t) (see Lemma 7.2). Thus more rapid fluctuations will favor schemata which exhibit the best (geometric) mean performance when subjected to the fluctuations. Third, if there are repetitive (not necessarily cyclic) features over time, dominance change provides a mechanism for retaining useful schemata when the features are not in force (see pages 115-16). By occasionally (say once every few generations) giving recessive status to instances of currently favored schemata, they can be reserved against adverse environmental configurations. In particular, these recessive instances have a much reduced testing rate (see page 115). As a result the recessive versions are relatively unaffected by environmental changes which quickly eliminate the dominant version. By occasionally returning an instance of a recessive schema to dominant status it can be tested against the current environmental configuration. If the dominant instance achieves above-average performance it will reproduce rapidly, producing an increasing proportion of dominant instances in the population. (If the performance is below average the newly dominant instance will quickly disappear, at no great cost to the efficiency of the adaptive plan.) Finally, by making the intrachromosomal duplication of a schema ξ subject to the disappearance of an environmental feature currently exploited by ξ, the effective mutation rate of ξ can be increased. For example, let the schema ξ be associated with a sensor (see pages 153-54) which detects the environmental feature exploited by ξ. Let intrachromosomal duplication be an operator controlled by the sensor; i.e., whenever the sensor is deactivated, intrachromosomal duplication takes place on the sets of genes associated with the sensor. In consequence, disappearance of the environmental feature will result in many copies of the genes, and hence the schemata, associated with the sensor. With a fixed mutation rate for each gene, the number of mutants of a given schema in the population will depend upon the number of copies thereof. Thus by providing many copies within a chromosome, the effective mutation rate is correspondingly increased. As a result, this (hypothetical) mechanism provides many variants relevant to the crisis. At the same time it retains whatever advantage remains to the original schema ξ. In biology there are varying amounts of evidence for the foregoing responses to nonstationarity, and some of the predicted effects have been demonstrated in simulations, but again we are a long way from a theory, or even good experimental confirmation, of their efficiency.
In these nine chapters we have come only a short way in the study of adaptation as a general process. The book's main objective has been to make it plausible that simple mechanisms can generate complex adaptations; however, the book will have fulfilled its role if it has communicated enough of adaptation's inherent fascination to make the reader's effort worthwhile.


10. Interim and Prospectus

Adaptation in Natural and Artificial Systems, after a seven-year gestation, made its appearance in 1975. It is now 1991 and much has happened in the interim. Topics that were speculative in 1975 have been carefully explored; extensions, applications, and new areas of investigation abound. More than 150 papers were submitted to the 1991 International Conference on Genetic Algorithms (Belew and Booker 1991), and several new books have been written about genetic algorithms (e.g., Davis 1987 or Davis 1991). There is even a textbook (Goldberg 1989). Most of this new research has been reported in the published proceedings of the genetic algorithm conferences of 1985, 1987, 1989, and 1991 (Grefenstette 1985, Grefenstette 1987, Schaffer 1989, and Belew and Booker 1991) and is readily accessible there, so I will not attempt to review it here; the review would be, at best, little more than an annotated listing. Instead, I'll follow the pattern of the rest of the book, using this new chapter to report on lines of research I've pursued since 1975. A new edition also provides an opportunity to correct errors in the original edition. Most of these are simple and innocuous, but an error in one proof, discovered and corrected by Dr. Daniel Frantz, is subtle and important. By good fortune, after the correction the theorem involved stands as stated. Finally, a new chapter offers an opportunity to look further into the future; this too I'll attempt.

1. IN THE INTERIM

Classifier Systems

Classifier systems were introduced in Holland 1976 and were later revised to the current "standard" form in Holland 1980. There is a comprehensive description of the standard form, with examples, in Holland 1986, but there are now many variants (see Belew and Booker 1991). A classifier system is more restricted than the broadcast language in just one major respect: A broadcast unit can directly create other broadcast units, but a classifier, the broadcast unit's counterpart in a classifier system, cannot directly create other classifiers. This restriction permits a much simpler syntax based on only three atomic symbols, {1, 0, # ("don't care")}. A classifier system creates new classifiers through the action of the genetic algorithm on the system's population of classifiers.
Classifier systems aim at a question that seems to me central to a deeper understanding of learning: How does a system improve its performance in a perpetually novel environment where overt ratings of performance are only rarely available? A learning task of this kind is more easily described if we think of the system as playing a strategic game, like checkers or chess. After a long sequence of actions (moves), the system receives some notification of a "win" or a "loss" and, perhaps, some indication of the strength of the win or loss. But there is almost no information about what moves should have been changed to yield better performance. Most learning situations for animals, including humans, have this characteristic: an extended sequence of actions is followed by some general indication of the level of performance, with little information about specific changes that would improve performance.
In defining classifier systems (see Figure 15), I adopted the common view that the state of the environment is conveyed to the system via a set of detectors (e.g., rods and cones in a retina). The outputs of the detectors are treated as standardized packets of information: messages. Messages are used for internal processing as well, and some messages, by directing the system's effectors (e.g., its muscles), determine the system's actions upon its environment. Beside the interactions with the environment provided by detectors and effectors, there is a further interaction that is critical to the learning process. The environment must, upon occasion, provide the system with some measure of its performance. Here, as earlier, I will use the term payoff as the general term for this measure.
The computational basis for classifier systems is provided by a set of condition/action rules, called classifiers. To simplify the computational basis, all interactions between rules are mediated by messages. Under this provision a typical rule, under interpretation, would have the form

IF there is (a message from the detectors indicating) an object left of center in the field of vision,
THEN (by issuing a message to the effectors) cause the eyes to look left.

That is, the condition part of the rule "looks for" certain kinds of messages, and when the rule's conditions are satisfied, the action part specifies a message to be sent. Messages both pass information from the environment and provide communication
between rules, as in the broadcast language (where they were called signals). Thus each rule is a simple message processor. Many rules can be active simultaneously, so many messages may be present at any given instant. It is convenient to think of the messages as collected in a list that changes under the combined impetus of the environment and the rules.

Fig. 15. A classifier system. (The original figure shows a rule list and a message list matched against one another, an input interface of detectors, an output interface of effectors, a bucket brigade that adjusts rule strengths, and a genetic algorithm that generates new rules, all interacting with the environment.)

Parallelism, the concurrent activity of many rules, is an important aspect of classifier systems. Parallelism makes it possible for the system to combine rules into clusters that model the environment, providing two important advantages:

1. Combinatorics work for the system instead of against it. The system builds a "picture" of the situation from parts, rather than treating it as a monolithic whole. The advantage is similar to that obtained when one describes a face in terms of component parts. Select, say, 8 components for the face: hair, forehead, eyebrows, eyes, cheekbones, nose, mouth, and chin. Allow 10 variants for each component part: different hair colors and textures, different forehead shapes, and so on. Then 10^8 = 100 million faces can be described by combining these components in different ways. This at a cost of storing only 8 x 10 = 80 "building block" components.

2. Experience can be transferred to novel situations. On encountering a novel situation, such as a "red car by the side of the road with a flat tire," the system activates several relevant rules, such as those for "red," "car," "flat tire," etc. When building block rules, such as those for "car," have proved useful in past combinations, it is at least plausible that they will prove useful in new, similar combinations. To exploit these possibilities, the rules must be organized in a way that permits clusters of rules to be activated in changing combinations, as dictated by changing situations. Building-block rules then give the system a capacity for transferring experience to new situations.
To define a standard classifier system, we first require all messages to be bit-strings of the same length, k, much as one sets the register size for a computer. Formally, then, messages belong to the set {1,0}^k. The condition part of a rule is specified by the use of a "don't care" symbol, #, reminiscent of the "don't care" used to define schemata. Thus, the set of all conditions is the set {1,0,#}^k. For k = 6, the condition 1##### is satisfied by any message that starts with a 1, while the condition 001001 is satisfied by one and only one message, the message 001001. It is worth noting that a condition's specificity (the reciprocal of the number of messages that satisfy it) depends directly upon the number of #s in the condition: the more #s, the lower the specificity.

In the standard system, all rules consist of two conditions and a single outgoing message, which is sent when the two conditions are satisfied. Rules are specified in the format

1#####, 001001 / 000011.

This format is interpreted as follows:

IF condition 1 is satisfied (in this case, by a message, on the message list, that starts with 1),
AND condition 2 is satisfied (in this case, by a second, specific, message 001001),
THEN the message in the action part (in this case, 000011) is posted to the message list on the next time-step.

Conditions may be negated: For example, -1##### is satisfied if there is no message on the message list that begins with a 1. With these provisions it is easy to show that a classifier system is computationally complete, in the sense that any program that can be written in a standard programming language, such as Fortran, C, or Lisp, can be implemented within a classifier system.

Without any changes to this definition, rules can be given an "address" that can be used by other rules when that is useful. Consider a rule r with a condition of the form 111#...#. Any message that starts with three 1s will satisfy this condition. If this particular prefix, 111, is reserved for the rule r alone, then any message with that prefix will be directed to r and only to r. Such reserved prefixes (they can also be suffixes, or indeed any part of the message) are called tags. Of course, several rules might have the same reserved tag; that simply means that all of them receive messages so tagged, acting as a cluster with respect to that tag. Appropriate use of tags also permits rules to be coupled to act sequentially.
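
To make the condition/message matching used in these definitions concrete, here is a minimal sketch (my own illustration, not code from the book; the function names are arbitrary) that checks {1,0,#} conditions, including a negated one, against a message list:

    def matches(condition, message):
        """A condition matches a message of the same length if every
        position that is not '#' agrees with the message."""
        return len(condition) == len(message) and all(
            c == '#' or c == m for c, m in zip(condition, message))

    def satisfied(condition, message_list):
        """A plain condition needs at least one matching message; a condition
        prefixed with '-' is satisfied only if no message matches it."""
        if condition.startswith('-'):
            return not any(matches(condition[1:], m) for m in message_list)
        return any(matches(condition, m) for m in message_list)

    msgs = ["100110", "001001"]
    print(satisfied("1#####", msgs))    # True: 100110 starts with 1
    print(satisfied("001001", msgs))    # True: exact match
    print(satisfied("-0#####", msgs))   # False: 001001 starts with 0
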
The basic execution cycle of the classifier system consists of an iteration of the following steps:

1. Messages from the environment are placed on the message list.
2. Each condition of each classifier is checked against the message list to see if it is satisfied by (at least one) message thereon.
3. All classifiers that have both conditions satisfied participate in a competition (to be discussed in a moment), and those that win post their messages to the message list.
4. All messages directed to effectors are executed (causing actions in the environment).
5. All messages on the message list from the previous cycle are erased (i.e., messages persist for only a single cycle, unless they are repeatedly posted).
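
A single pass through this cycle might be sketched as follows (my own illustration, not code from the book; the representation of rules as (condition 1, condition 2, action) triples and the placeholder "competition" that simply lets every satisfied rule post are simplifying assumptions; the actual competition is described next):

    def matches(cond, msg):
        return all(c == '#' or c == m for c, m in zip(cond, msg))

    def one_cycle(rules, env_messages, effectors):
        """One execution cycle over rules given as (cond1, cond2, action)."""
        message_list = list(env_messages)                  # step 1
        posted = []
        for cond1, cond2, action in rules:                 # steps 2-3
            if any(matches(cond1, m) for m in message_list) and \
               any(matches(cond2, m) for m in message_list):
                posted.append(action)
        for msg in posted:                                 # step 4
            if msg in effectors:
                print("effector action:", effectors[msg])
        return posted                # step 5: the old message list is discarded

    rules = [("1#####", "001001", "000011")]
    print(one_cycle(rules, ["100110", "001001"], {"000011": "look left"}))
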
Because the message list can hold an arbitrary number of messages, any number of rules can be active simultaneously; because the messages are simply uninterpreted bit-strings, there are no consistency problems in the internal processing. Consistency problems do arise at the effectors; when different, simultaneous messages urge an effector to take mutually exclusive actions, they are resolved by competition.

Competition plays a central role in determining just which rules are active at any given time. To provide a computational basis for the competition, each rule is assigned a quantity, called its strength, that summarizes its average past usefulness to the system. We will see shortly that the strength is automatically adjusted by a credit assignment algorithm, as part of the learning process. Competition allows rules to be treated as hypotheses, more or less confirmed, rather than as incontrovertible facts. The strength of a rule corresponds to its level of confirmation; stronger rules are more likely to win the competition when their conditions are satisfied. Stated another way, the classifier system's reliance upon a rule is based upon the rule's average usefulness in the contexts in which it has been tried previously. Competition also provides a means of resolving conflicts when effectors receive contradictory messages.
A rule, then, enters a competition to post its message any time its conditions are satisfied. The actual competition is based on a bidding process. Each satisfied rule makes a bid based upon its strength and its specificity. In its simplest form, the bid for a rule r of strength s(r) would be

Bid(r) = c · s(r) · log2[specificity(r)],

where c is a constant < 1, say 1/10. A rule that both has been useful to the system in the past (high strength) and uses more information about the current situation (high specificity) thus makes a higher bid. Rules making higher bids are favored in the competition. Various criteria for winning can be employed. For example, the probability that a rule wins the competition can be made proportional to the size of its bid.
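
A minimal sketch of the bid and of one possible winning criterion (my own illustration; the constant c = 1/10, the use of the number of specified, non-# positions as the measure of specificity, and the bid-proportional choice of winner are assumptions made for concreteness):

    import random

    def bid(strength, conditions, c=0.1):
        """Bid grows with past usefulness (strength) and with the amount of
        information in the conditions, counted here as specified positions."""
        specified = sum(ch != '#' for cond in conditions for ch in cond)
        return c * strength * specified

    def pick_winner(satisfied_rules):
        """Choose one winner with probability proportional to its bid."""
        bids = [bid(s, conds) for s, conds in satisfied_rules]
        return random.choices(range(len(satisfied_rules)), weights=bids, k=1)[0]

    # two satisfied rules, given as (strength, (condition 1, condition 2))
    rules = [(100, ("1#####", "001001")), (80, ("######", "0#####"))]
    print(pick_winner(rules))     # index 0 wins most of the time (bid 70 vs 8)
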

We are now ready to discuss the system's learning procedures. There are two basic problems, credit assignment, already mentioned, and rule discovery. Credit assignment rates the rules the system already has. Rule discovery replaces rules of low strength and provides new rules when environmental situations are ill-handled.
Credit assignment

Let us begin with the credit assignment problem. Credit assignment is not particularly difficult where the situation provides immediate reward or precise information about correct actions. Then the rules directly involved are simply strengthened. Credit assignment becomes difficult when credit must be assigned to early acting rules that set the stage, making possible later useful actions. Stage-setting moves are the key to success in complex situations, such as playing chess or investing resources. The problem is to credit an early action, which may look poor (such as the sacrifice of a piece in chess) but which sets the stage for later positive actions (such as the capture of a major piece in chess). When many rules are active simultaneously, the problem is exacerbated. It may be that only a few of the early acting rules contribute to a favorable outcome, while others, active at the same time, are ineffective or even obstructive. Somehow the credit assignment algorithm must sort this out, modifying rule strengths appropriately.
Credit assignment in classifier systems is based on competition. The bidding process mentioned earlier is treated as an exchange of "capital" (strength). That is, when a rule wins the competition, it actually "pays" its bid to the rule(s) that sent the message(s) satisfying its conditions. The rule acts as a kind of go-between or broker in a chain that leads from the stage-setting situation to the favorable outcome.

In a bit more detail, when a rule competes, its suppliers are those rules that have sent messages satisfying its conditions and its consumers are those rules that have conditions satisfied by its message. Under this regime, we treat the strength of a rule as capital and the bid as payment to its suppliers. When a rule wins, its bid is apportioned to its suppliers, increasing their strengths. At the same time, because the bid is treated as a payment for the right to post a message, the strength of the winning rule is reduced by the amount of its bid. Should a rule bid but not win, its strength is unchanged and its suppliers receive no payment. The resulting credit assignment procedure is called a bucket brigade algorithm (see Figure 16).
Winning rules can recoup their payments in two ways: (1) They, in turn, have winning consumers that make payments to them, or (2) they are active at a time when the system receives payoff from the environment. Case (2) is the sole way in which payoff from the environment affects the system. When payoff occurs, it is divided among the rules active at that instant, their strengths being increased accordingly. Rules not active at the time the payoff occurs do not share directly in that payoff. The system must rely on the bucket brigade algorithm to distribute the increased strength to the stage-setting rules, under repeated activations in similar situations.

The bucket brigade works because rules become strong only when they belong to sequences leading to payoff. To see this, first note that rules consistently active at times of payoff tend to become strong because of the payoff they receive from the environment. As these rules grow stronger, they make larger bids. A rule that "supplies" one of the payoff rules then benefits from these larger bids in future transactions.
Its strength increases because its income exceeds its payout: it makes a "profit." Subsequently, the suppliers of the suppliers begin to benefit, and so on, back to the early stage-setting rules.

Fig. 16. The bucket brigade algorithm. (The original figure traces three coupled classifiers C, C', and C'' over successive time-steps: when a classifier wins a competition it immediately (1) posts its message for use on the next time-step and (2) pays its bid to its supplier(s), thereby reducing its strength; in the diagram C' is first a consumer of C and then a supplier of C''.)
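
The strength bookkeeping can be sketched in a few lines (my own illustration, not the book's code; the fixed bid fraction and the even split of bids and payoff are simplifying assumptions):

    def bucket_brigade_step(strengths, winners, suppliers, payoff, bid_fraction=0.1):
        """One credit-assignment step: each winning rule pays a fraction of its
        strength to its suppliers; external payoff is split among the winners."""
        new = dict(strengths)
        for r in winners:
            b = bid_fraction * strengths[r]
            new[r] -= b                              # pay for the right to post
            for s in suppliers.get(r, []):           # apportion the bid upstream
                new[s] += b / len(suppliers[r])
        for r in winners:                            # environmental payoff, if any
            new[r] += payoff / len(winners)
        return new

    strengths = {"early": 100.0, "late": 100.0}
    suppliers = {"late": ["early"]}                  # "early" set the stage for "late"
    print(bucket_brigade_step(strengths, ["late"], suppliers, payoff=20.0))
    # -> both end at 110: "late" pays its bid of 10 but collects the payoff of 20,
    #    and "early" receives the bid
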
Things can go wrong. A supplier might, through a message to the effectors, convert an environmental state to one that diverts its consumer rule from a payoff-directed path. That is, it might fail in its stage-setting role. In that case, the consumer suffers because the diversion will prevent it from receiving payments from its consumers; however, the diverting supplier rule generally suffers even more, because it is at an earlier stage in its "getting rich" effort. Or it may be that the consumer has a condition that attends to the state of the environment and does not even bid when the diverting state occurs. In either case, the diverting supplier soon loses enough strength so that it no longer wins the competition. It then ceases to influence subsequent activity.

The whole process, of course, takes repeated plays of the game. But it only requires that a rule interact with its immediate suppliers and consumers. It requires no overt memory of the long and complicated sequences leading to payoff. Avoiding extensive overt memories is almost a sine qua non for large, parallel systems acting in perpetually novel environments with sparse payoff. Overt memories in such situations necessarily involve many tangled strands, including unnecessary detours and incidentals. To tease out the relevant strands at the time payoff occurs would be an overwhelming, hardly feasible task. Over repeated trials the bucket brigade carries out this task, but in an implicit fashion.
Rule Discovery

Generating plausible replacements for rules assigned low strength under the credit assignment algorithm is an even more daunting task than credit assignment itself. In a rule-based system, the whole process of induction succeeds or fails in proportion to its efficacy in generating plausible new rules, rules that are not obviously incorrect on the basis of experience. However, plausible is not an easy concept to pin down computationally. It implies that experience biases the generation of new rules, but how?

I propose that the concept of plausibility is closely linked to the "schema" concept set forth in the discussion of genetic algorithms. Because the rules in a classifier system are represented by strings defined over a three-letter alphabet, {1,0,#}, we can think of the strings as chromosomes defined on three alleles. Accordingly, we can interpret the set of rules used by the classifier system as a population of chromosomes. Moreover, the strength of each rule can be interpreted directly as its fitness (though it should be noted that there are interesting variants that base fitness on strength in a less simplistic way). A genetic algorithm, then, is easily applied to such a population of rules, and, indeed, classifier systems were designed with just this objective in mind.

In this application of the genetic algorithm, schemata serve as building blocks for rules. The usefulness of any given schema can be estimated, in the usual way, from the average observed strength of the rules that are instances of the schema in the population. Though these estimates are subject to error, they do provide an experience-dependent guideline. Both the possibility of error and the role of experience are consonant with the term plausibility. As always, the genetic algorithm exploits these estimates implicitly (implicit parallelism, née intrinsic parallelism in chapter 4) rather than explicitly, but this does not affect the plausibility of the new rules generated thereby.
It helps in understanding the evolution of a classifier system to note that simple schemata (those with few defining positions) generally have more instances than more complex schemata in a population of fixed size. From a sampling point of view, this means that simple schemata accumulate samples more rapidly. It is not difficult to show that the rate of accumulation falls off exponentially with the complexity.

This automatic differential in sampling rates has a strong influence on what schemata play an important role in rule generation at any point in time. Early on, the system has reliable information only about simple schemata. But simple schemata usually only provide building blocks and estimates for coarse discriminations. Though the classifier system can exploit this information, rules built from these simple schemata are exposed to frequent surprises, departures, and exceptions in more complex contexts. Over time, the system gains more experience, and it gains information about more complicated schemata. This information biases the genetic algorithm toward the construction of more sophisticated, more specific rules. As a consequence, as the classifier system accumulates experience, it is prone to build hierarchies of rules of increasing specificity. These hierarchies grow from early "default" rules, based on simple contexts, to layers of "exception" rules based on later, more detailed contextual information.

When simultaneous messages satisfy both a simpler default rule and a more complex exception rule, the latter tends to outcompete the former (though there can be complications; see Riolo's paper in Belew and Booker 1991). The higher specificity of the exception rule causes it to outbid the default rule if their strengths are comparable. The exception rule only survives under the bucket brigade if it corrects inappropriate actions of some default rule; otherwise, the strength of the exception rule diminishes until it is no longer a factor in the competition. When the exception rule does correct the default rule, a kind of symbiosis results. By saving the default rule from paying a bid in situations where it would not make a profit, the exception rule actually helps the default rule to retain its strength. Thus both the default rule and the system as a whole are better off for the presence of the exception rule.

Because successive layers of exception rules are only added as the necessary information becomes available, these rule hierarchies provide a sophisticated, incremental way of modeling the environment. The formal structures corresponding to these default hierarchies, called quasi-homomorphisms, have been defined and studied in Holland et al. (1986).
Genetic algorithms have another critical effect on the development of classifier systems. Recombination, under the algorithm, discovers useful schemata for tags in just the way it discovers useful schemata for other parts of the rule. For example, a genetic algorithm can recombine parts of established tags to invent new tags. As a result, established tags spawn related tags, providing new clusters of rules, and new couplings between established clusters. Tags survive (or, more carefully, the rules using them survive) if they contribute to useful interactions. Under these evolutionary pressures, the tags develop into a system of experience-based "symbols" for interior use (cf. Hofstadter's [1979] concept of an "active symbol"). The associations provided by these tags flesh out the default hierarchy models. The resulting structures can be quite sophisticated, enabling the system to model new situations by coupling appropriate clusters of established (strong) rules. Moreover, these models can be used in a "lookahead" fashion, permitting the classifier system to act in anticipatory fashion, selecting actions on the basis of future consequences. The interested reader is referred to Holland 1991 and Riolo 1990.
Each of the mechanisms used by the classifier system has been designed to enable the system to continue to adapt to its environment, while using the capabilities it already has to respond instant-by-instant to that environment. In so doing the system is constantly trying to balance exploration (acquisition of new information and capabilities) with exploitation (the efficient use of information and capabilities already available).

2. THE OPTIMAL ALLOCATION OF TRIALS REVISITED
Pride of place in the correction category belongs to Dan Frantz's work on one of the main motivating theorems in the book, Theorem 5.1. This theorem concerns the "optimal" allocation of trials in determining which of two random variables has a higher expected value (the well known 2-armed bandit problem). In chapter 5, an optimal solution is a solution that minimizes the losses incurred by drawing samples from the random variable of lower expectation. The theorem there shows that these losses are minimized if the number of trials allocated to the random variable with the highest observed expectation increases exponentially relative to the number of trials allocated to the observed second best.
Because schemata can be looked upon as random variables, this result illuminates the treatment of schemata under a genetic algorithm (née genetic plan in chapter 7). Under a genetic algorithm, a schema with an above-average fitness in the population increases its proportion exponentially (until its instances constitute a significant fraction of the total population). If we think of the genetic algorithm as generating samples of n random variables (an n-armed bandit), in a search for the best, then this exponential increase is just what Theorem 5.1 suggests it should be.
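
The exponential character of this increase is easy to check numerically. The following sketch (my own illustration, with an arbitrary 20 percent fitness advantage and starting proportion; it is not a calculation from the book) iterates the expected-proportion update of proportional selection, P(t+1) = P(t)(1+a)/(1+aP(t)), for a schema whose instances have fitness 1+a in a population whose remaining members have fitness 1:

    def schema_proportion(p0=0.01, advantage=0.2, generations=30):
        """Expected proportion of a schema under proportional selection when
        its fitness is (1 + advantage) and all other fitnesses are 1.  Growth
        is roughly geometric while the schema is rare and saturates later."""
        p, history = p0, [p0]
        for _ in range(generations):
            mean_fitness = 1.0 + advantage * p        # population average
            p = p * (1.0 + advantage) / mean_fitness  # proportional selection
            history.append(p)
        return history

    h = schema_proportion()
    print([round(x, 3) for x in h[::10]])   # -> [0.01, 0.059, 0.279, 0.706]
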
The problem with the proof of the theorem, as given, turns on its particular use of the central limit theorem. To see the form of the error, let us follow Frantz by using F_n(x) to designate the distribution of the normalized sum of the observations of the random variable X. For the 2-armed bandit, F is the distribution of the difference of the two random variables of interest. Using the notation of chapter 5, q(n) = 1 - F_n(x) when x = b·n^(1/2). That is, 1 - F_n(x) gives the probability of a decision error, q(n), after n trials out of N have been allocated to the random variable observed to be second best. Because x is a function of n, the proof given in chapter 5 implicitly assumes that, as n → ∞, the ratio

[1 - F_n(x)] / [1 - Φ(x)] → 1,

where 1 - Φ(x) is the area under the tail of a normal distribution. However, standard sources (see Feller 1966, for example) show that this is only true when x varies with n as o(n^(1/6)). This is manifestly untrue for Theorem 5.1, where x = b·n^(1/2).

The main result of Theorem 5.1 can be recovered by using the theory of large deviations instead of the central limit theorem. The theory of large deviations makes the additional requirement that the moment-generating functions for the random variables exist, but this is satisfied for the random variables of interest here. Let the moment-generating functions for the two random variables, corresponding to the two arms of the bandit, be m1(t) and m2(t). Then the moment-generating function for X, the difference, is m(t) = m1(-t)·m2(t). There is a uniquely defined constant c such that

c = inf_t m(t).

Define S(n) to be the sum of n samples of X. Then the appropriate theorem on large deviations yields

Pr{S(n) ≥ 0} = [c^n / (2πn)^(1/2)]·d_n·(1 + o(1)),

where log d_n = O(1). Making appropriate provision for ties, this yields

q(n) ≈ b'·c^n / (2πn)^(1/2),

where b' is a constant that depends upon whether or not X is a lattice variable. This relation for q(n) is of the same form (except for constants) as that obtained for q(n) under the inappropriate use of the central limit theorem. Substituting, and proceeding as before, Frantz obtains


THEOREM 5.1 (large deviations): The optimal allocation of trials n* to the observed second best of the two random variables corresponding to the 2-armed bandit problem is given by

n* ≅ (1/2r) ln[ r³c²N² / (π ln(rc²N²/2π)) ],

where r = |ln c|.

This theorem actually goes a step further than the original version, directly providing a "realizable" plan for sample allocation. The original version was based on an "ideal" plan that could not be directly realized, requiring section 5.2 to show that the losses of the ideal plan could be approached by the losses of an associated "realizable" plan. Section 5.2 is now superfluous.
The revised constants for the realizable plan do not affect results in later chapters, because the further analysis of genetic algorithm performance does not depend upon the exact values for the constants. The basic point is that genetic algorithms allocate trials exponentially to the random variables (schemata) corresponding to the arms of an n-armed bandit. Coefficients may vary among schemata, but the implicit parallelism of a genetic algorithm is enough to dominate any differences in the coefficients.
There are two other errors that may trouble the close reader, though they are much less important. The first error occurs, at the top of page 71, in the example giving estimated values for schemata. x(3) should be 1000010 ... 0, with the consequence that the estimated value of the schema in that example becomes (f(x(1)) + f(x(4)))/2.

The second error occurs, on page 103, in the discussion of the effect of crossover on the increase of schemata. In the derivation just below the middle of the page, the approximation 1/(1 - c) ≅ 1 + c, for c ≪ 1, is invoked. But this approximation is in the wrong direction for preserving the inequality being derived; therefore the line that follows the mention of this approximation should be deleted.
Finally there is a point of emphasis that may be troublesome. In the discussion of the role of payoff in the formal framework, near the bottom of page 25, the mapping allows the payoff to be any real number, positive or negative. It would have helped the reader to say that payoff is treated as a nonnegative quantity throughout the book, particularly in the discussion of genetic algorithms.

Other than these corrections, I am only aware of a few (less than a dozen) typographical errors scattered throughout. They are all obvious from context, so there's no need to list them here.

3. RECENT WORK

My most recent work stems from my association with the Santa Fe Institute in Santa Fe, New Mexico. About five years ago the Santa Fe Institute, then newly founded, began developing a new interdisciplinary approach to the study of adaptive systems. The studies center on a class of systems, called complex adaptive systems, that have a crucial role in a wide range of human activities. Economies, ecologies, immune systems, developing embryos, and the brain are all examples of complex adaptive systems. Despite surface dissimilarities, all complex adaptive systems exhibit a common kernel of similarities and difficulties, and they all exhibit complexities that have, until now, blocked broadly based attempts at comprehension:
1. All complex adaptive systems involve large numbers of parts undergoing a kaleidoscopic array of simultaneous nonlinear interactions.

Because of the nonlinear interactions, the behavior of the whole system is not, even to an approximation, a simple sum of the behaviors of its parts. The usual mathematical techniques of linear approximation (linear regression, normal coordinates, mean field approaches, and the like) make little progress in the analysis of complex adaptive systems. The simultaneity of the interactions poses both a challenge and an opportunity for the massively parallel computers now coming on the scene.

2. The impact of these systems in human affairs centers on the aggregate behavior, the behavior of the whole.

Indeed, the aggregate behavior often feeds back to the individual parts, modifying their behavior. Consider the effect of government statistics on the plans of individual businesses in an economy, or the effect of the aggregate retention of nutrients in a rain forest, despite leached, impoverished soils, upon species diversity and niches therein.

3. The interactions evolve over time, as the parts adapt in an attempt to survive in the environment provided by the other parts.

As a result, the parts face perpetual novelty, and the system as a whole typically operates far from a global optimum or equilibrium. Standard theories in physics, economics, and elsewhere are of little help because they typically concentrate on "end points," whereas complex adaptive systems "never get there." Improvement is usually much more important than optimization. When parts of the system do settle down to a local optimum, it is usually temporary, and those parts are almost always "dead," or uninteresting, if they remain at that equilibrium for an extended period.

4. Complex adaptive systems anticipate.

In seeking to adapt to changing circumstance, the parts develop rules (models) that anticipate the consequences of responses. At its simplest, this is a process not much different from Pavlovian conditioning. Even then, the effects are quite complex when large numbers of parts are being conditioned in different ways. The effects are still more profound when the anticipation involves lookahead toward more distant horizons. Moreover, aggregate behavior is sharply modified by anticipations, even when the anticipations are not realized. For example, the anticipation of an oil shortage can cause a sharp rise in oil prices, whether or not the shortage comes to pass. The effect of local anticipations on aggregate behavior is one of the aspects of complex adaptive systems we least understand.
The objective of the Santa Fe Institute is to develop new approaches to the study of complex adaptive systems, particularly approaches that exploit interactions between computer simulation and mathematics. Computer simulation offers new ways of carrying out both realistic experiments, of flight-simulator precision, and well-defined gedanken experiments, of the kind that have played such an important role in the development of physics. For real complex adaptive systems (economies, ecologies, brains, etc.) these possibilities have been hard to come by because (1) the systems lose their major features when parts are isolated for study, (2) the systems are highly history dependent, so that it is difficult to make comparisons or tease out representative behavior, and (3) operation far from equilibrium or a global optimum is a regime not readily handled by standard methods.

The Institute aims to exploit the new experimental possibilities offered by the simulation of complex adaptive systems, providing a much enriched version of the theory/experiment cycle for such systems. In conjunction with these simulations, the common kernel shared by complex adaptive systems suggests several possibilities for theory (cf. the work on the schema theory of genetic algorithms). In an area this complex, it is critical for theory to guide and inform the simulations, if they are not to degenerate into a process of "watching the pot boil." Theory is as necessary for sustained progress here as it is in modern experimental physics, which could not proceed outside the framework of theoretical physics. We need experiment to inform


Echo

While interesting models of complex adaptive systems can be built with classifier systems, and classifier systems have indeed been used for this purpose (Marimon et al. 1989), there is an advantage to having a simpler model that places the interactions in a simpler setting, giving them sharper relief. Echo is one such model, properly a class of models, designed primarily for gedanken experiments rather than precise simulations. Echo provides for the study of populations of evolving, reproducing agents distributed over a geography with different inputs of renewable resources at various sites (see Figures 17 and 18). Each agent has simple capabilities (offense, defense, trading, and mate selection) defined by a set of "chromosomes." Though these capabilities are simple, and simply defined, they provide for a rich set of variations illustrating the four kernel properties of complex adaptive systems previously described. Collections of agents can exhibit analogues of a diverse range of phenomena, including ecological phenomena (e.g., mimicry and biological arms races), immune system responses (e.g., interactions conditioned on identification), evolution of metazoans (e.g., emergent hierarchical organization), and economic phenomena (e.g., trading complexes and the evolution of "money").
A precise description of Echo begins with definition of the individual agents (see Figure 19). The capacities of an agent are completely determined by a small set of strings, the "chromosomes," defined over a small finite alphabet. In the simplest Echo model, this alphabet consists of four letters {a,b,c,d}, called resources, and there are just two classes of chromosomes, tag chromosomes and condition chromosomes. The tag chromosomes determine the agent's external, phenotypic characteristics, and the condition chromosomes determine an agent's responses to the phenotypic characteristics of other agents.

There are just three tag chromosomes in the simplest model: (1) offense tag, (2) defense tag, and (3) mating tag. It is convenient to think of the tags as displayed on the exterior of the agent, counterparts of the signature groups of an antigen or the trademarks of an organization. These tags are a kind of identifying address, quite similar to the tags employed by classifier systems. There are also just three condition
chromosomes: (1) condition for combat, (2) condition for trading, and (3) condition for mating. Conditions serve much as the condition parts of a classifier rule, determining what interactions will take place when agents encounter one another.

Fig. 17. Echo's world. (The original figure shows a geography of sites, each with a fountain of renewable resources, occupied by single-cell agents and multicell "metazoan" agents.)
The fact that an agent's structure is completely defined by its chromosomes, which are just strings over the resource alphabet {a,b,c,d}, plays a critical role in its reproduction. An agent reproduces when it collects enough letters to make copies of its chromosomes. As we will see, an agent can collect these letters through its
interactions: combat, trade, or uptake from the environment. Each agent has a reservoir in which it stores collected letters until there are enough of them for reproduction to take place.

Fig. 18. A site in Echo. (The original figure shows a single site containing several agents together with free units of the renewable resources a, b, and c.)
Interactions between agents, when they come into contact, are determined by a simple sequence of tests based on their tags and conditions. In the simplest model, they first test for combat, then they test for trading, and finally they test for mating:

1. Combat (see Figure 20). Each agent checks its combat condition against the offense tag of the other agent. This is a matching process much like the matching of conditions against messages in classifier systems. For example, if the combat condition is given by the string aad, then this condition is matched by any offense tag that begins with the letters aad. (The condition, in this example, does not care what letters follow the first three in the tag, and it does not match any tag that has less than three letters.)

Fig. 19. A single-cell agent. (The original figure shows an agent's tag chromosomes (offense, defense, mating) and condition chromosomes (combat, trade, mating), together with its reservoir and its uptake of resources from the environment and other organisms. When an organism has enough elements in its reservoir to make copies of its chromosomes, it produces an offspring; the offspring may differ from the parent because of mutation or recombination.)

If the combat condition of either agent matches the offense tag of the other, then combat is initiated. That is, combat can be initiated unilaterally by either agent. If combat is initiated, the offense tag of the first agent is matched against the defense tag of the second and a score is calculated. In the simplest case, this score is calculated on a position by position basis, adding the results to get a total. For example, the score for a single position could be obtained from a score matrix that is used to score the match between corresponding letters in the two tags:

                      Offense
                   a    b    c    d
    Defense   a    4    0    2    2
              b    0    4    2    2
              c    2    2    4    0
              d    1    1    0    4

Under this matrix, the offense tag aab matched against the defense tag aaaad would yield a score of 4 + 4 + 0 = 8. (In this simple example, the additional letters in the defense tag do not enter the scoring; in a more sophisticated scoring procedure, the defense might be given some extra points for additional letters.) A score is also calculated for the second agent by matching its offense tag against the defense tag of the first agent. If the score of one agent exceeds the score of the other, then that agent is declared the winner of the combat. In an interesting variant, the win is a stochastic function of the difference of the two scores. (A small sketch of this scoring appears after the description of mating below.)

Fig. 20. A typical interaction between agents. (The original figure summarizes the combat sequence: (1) when agent 1 moves into the vicinity of agent 2, combat occurs if agent 1's combat condition matches agent 2's offense tag; (2) agent 1 is assigned a score based on the match between its offense tag and the defense tag of agent 2, a similar score is calculated for agent 2, and the agent with the higher score is the winner; (3) the winner acquires from the loser the resources in its reservoir and the resources tied up in its chromosomes and tags, and the loser is deleted.)


The winner collects the loser's resources, both the resources in its reservoir and the resources tied up in its chromosomes (broken into individual letters). In some models, the winner collects only some of the resources of the loser, the rest being dissipated. The provision of separate offense and defense capabilities, with possible asymmetries, allows the system to evolve intransitive relations between agents wherein, for example, X can "eat" Y, and Y can "eat" Z, but X cannot "eat" Z. As a consequence, various kinds of "food webs" can evolve.
2. Trading. If combat does not take place, then the first agent in the pair checks its trading condition against the offense tag of the second agent, and vice versa. Unlike combat, which can be initiated unilaterally, trading is bilateral: a trade does not take place unless the trading conditions of both agents are satisfied. The trading condition in the simplest model has a single letter, as a suffix, that specifies the resource being offered for trade. If the trade is executed, then each agent transfers any excess of the offered resource (amounts over and above the requirements for its own reproduction) from its reservoir to the reservoir of its trading partner. Though this is a very simple rule, with no bidding between agents, it does lead to intricate, rational trading interactions as the system evolves: Trades that provide resources needed for reproduction increase the reproduction rate, assuring that agents with such rational trading conditions become common components of the population.
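A sketch of the bilateral trading rule follows, again with hypothetical attribute names (trading_condition, offense_tag, reservoir, needs). Only the logic comes from the text: both conditions must be satisfied, the condition's suffix letter names the offered resource, and only the excess over the agent's own reproduction requirements changes hands.

    def excess(agent, resource):
        # Amount of the offered resource over and above what the agent needs
        # for its own reproduction.
        return max(0, agent.reservoir.get(resource, 0) - agent.needs.get(resource, 0))

    def attempt_trade(agent1, agent2):
        # Trading is bilateral: each agent's trading condition is checked against
        # the offense tag of the other (a simple prefix match is assumed here).
        if not (agent2.offense_tag.startswith(agent1.trading_condition) and
                agent1.offense_tag.startswith(agent2.trading_condition)):
            return False
        # In the simplest model, the last letter of the trading condition
        # specifies the resource being offered for trade.
        offer1 = agent1.trading_condition[-1]
        offer2 = agent2.trading_condition[-1]
        amount1 = excess(agent1, offer1)
        amount2 = excess(agent2, offer2)
        for giver, taker, res, amt in ((agent1, agent2, offer1, amount1),
                                       (agent2, agent1, offer2, amount2)):
            giver.reservoir[res] = giver.reservoir.get(res, 0) - amt
            taker.reservoir[res] = taker.reservoir.get(res, 0) + amt
        return True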
3. Mating. While an agent can reproduce asexually, simply making a copy of each of its chromosomes when it has accumulated enough resources (letters), there is also a provision for recombination of chromosomes. When agents come into contact and do not engage in combat, the mating condition of each agent is checked against the mating tag of the other. As with trade, mating is only executed as a bilateral action: Both agents must have their mating conditions satisfied for recombination to take place. If this happens, then the agents exchange some of their chromosome material, as with crossover under the genetic algorithm. (The procedure is reminiscent of conjugation between different mating types of paramecia.) This selective recombination provides a powerful mechanism for discovering and exploiting useful schemata. The effect is very like the effect that the schema theorems of chapter 7 project, though the schema theorems cannot be applied directly to Echo's agents because they have no explicit fitness function.
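Because the exchange is "as with crossover under the genetic algorithm," a one-point crossover of the two chromosome strings is enough to convey the idea. The bilateral check against mating tags follows the text; the attribute names and the choice of crossover point are assumptions of this sketch.

    import random

    def attempt_mating(agent1, agent2):
        # Mating is bilateral: each agent's mating condition must match the
        # mating tag of the other (prefix matching, as with other conditions).
        if not (agent2.mating_tag.startswith(agent1.mating_condition) and
                agent1.mating_tag.startswith(agent2.mating_condition)):
            return False
        c1, c2 = agent1.chromosome, agent2.chromosome
        if min(len(c1), len(c2)) < 2:
            return False
        # Exchange chromosome material by a simple one-point crossover.
        point = random.randint(1, min(len(c1), len(c2)) - 1)
        agent1.chromosome = c1[:point] + c2[point:]
        agent2.chromosome = c2[:point] + c1[point:]
        return True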
Sites


The sites are arranged in a regular, or irregular, array (see Figures 17 and 18). Each site has a well-defined set of neighboring sites, and each site can contain a subpopulation of agents. In addition, each site is assigned a production function that determines how rapidly the site produces and accumulates the various resources. For example, one site may produce 10 units of resource a per time step, and nothing of b, c, or d, whereas another may produce 4 units each of a, b, c, and d. If the site is unoccupied by any agents, these resources accumulate, up to some maximum value. In the example of the site that produces 10 units of resource a per time step, the site could continue to accumulate the resource until it had accumulated, say, a total of 100 units. Agents present at a site can "consume" these resources. Thus an agent located at a site that produces the resources it needs can manage reproduction without combat or trade, if it survives combat interactions with other agents. Different agents may have intrinsic limits on the resources they can take up from the site. For example, an agent may only be able to consume resource b from the environment, being dependent upon agent-agent interactions to obtain other needed resources. Resources available at a site are shared among the agents that can consume them.

When neither agent-agent nor agent-environment interactions are providing at least one needed resource at a given site, an agent may migrate from that site to a neighboring site. For example, consider an agent that has already acquired enough of resources a and b to make copies of its chromosomes but that is not acquiring needed resource c. Then that agent will migrate to some neighboring site; in the simplest models the new site is simply selected at random from the neighboring sites.
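The bookkeeping for a site can be sketched along the following lines: per-resource production rates, a ceiling on accumulation, sharing among the agents able to consume a resource, and random migration when a needed resource is not forthcoming. The class layout, the equal-split sharing rule, and the attribute names are assumptions of the sketch; the text says only that resources are shared among the agents that can consume them and that migration targets are chosen at random.

    import random

    class Site:
        def __init__(self, production, capacity, neighbors=()):
            self.production = production          # e.g. {'a': 10} or {'a': 4, 'b': 4, 'c': 4, 'd': 4}
            self.capacity = capacity              # accumulation ceiling, e.g. 100 units
            self.store = {r: 0 for r in production}
            self.neighbors = list(neighbors)      # neighboring Site objects
            self.agents = []

        def produce(self):
            # Unconsumed resources accumulate, up to the site's maximum.
            for resource, rate in self.production.items():
                self.store[resource] = min(self.capacity, self.store[resource] + rate)

        def uptake(self):
            # Resources available at the site are shared among the agents that
            # can consume them (an equal split is used purely for illustration).
            for resource, available in self.store.items():
                eaters = [a for a in self.agents if resource in a.can_consume]
                if not eaters or available == 0:
                    continue
                share = available // len(eaters)
                for a in eaters:
                    a.reservoir[resource] = a.reservoir.get(resource, 0) + share
                self.store[resource] -= share * len(eaters)

    def maybe_migrate(agent, site):
        # An agent that is not acquiring some resource it currently needs moves,
        # in the simplest models, to a neighboring site chosen at random.
        unmet = [r for r, needed in agent.needs.items()
                 if agent.reservoir.get(r, 0) < needed]
        if unmet and site.neighbors:
            site.agents.remove(agent)
            random.choice(site.neighbors).agents.append(agent)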
The Simulation

The actual Echo simulation is designed so that, in effect, the populations at each of the sites in the model undergo their interactions simultaneously. In other words, Echo is well suited to execution on a massively parallel computer. The interactions at each site are carried out by repetition of the following basic cycle.

(1) Pairs of agents from within the site are selected for interaction. (In the simplest model, these pairs are simply selected at random from the local population.) Each pair is tested for the kind(s) of action that will ensue, following the procedures outlined above.
(1.1) First the pair is tested for combat, which may be invoked unilaterally.
(1.2) If combat is not invoked, then the pair is tested for trade, which can only be invoked bilaterally. The same pair is then tested for mating compatibility. If the agents are compatible, then, with low probability, recombination of their chromosomes will follow.


(2) Each agent in the site executes uptake of the resources produced at the site, the available units being shared among the agents that can consume them.
(3) Each agent tests to see if it has acquired enough of the resources it currently needs to make a copy of its chromosomes; if so, it replicates.
(4) Infrequent mutations may alter the copies so produced.
(5) Each agent is "charged" a specified maintenance cost, which must be met by subtraction of resources from its reservoir; an agent that cannot pay the cost is deleted (in some models its resources are returned to the site).
(6) Each agent that is not acquiring at least one resource it currently needs may, with a small random chance, migrate to a neighboring site.
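Gathering the pieces, one pass of the basic cycle at a single site might be organized as below. The interaction procedures are passed in (for instance the sketches given earlier), and the agent methods used for reproduction and maintenance are stand-ins; the ordering follows the cycle just listed and is meant as a schematic, not as Holland's implementation.

    import random

    def random_pairs(agents):
        # Simplest model: shuffle the local population and pair agents off.
        shuffled = list(agents)
        random.shuffle(shuffled)
        return list(zip(shuffled[::2], shuffled[1::2]))

    def site_cycle(site, combat, trade, mate, mating_probability=0.05):
        # (1) Interactions between randomly selected pairs of local agents.
        for a1, a2 in random_pairs(site.agents):
            # (1.1) Combat may be invoked unilaterally; the combat procedure is
            # assumed to settle the outcome (resource transfer, loser deleted).
            if combat(a1, a2):
                continue
            # (1.2) Trade and mating can only be invoked bilaterally; mating
            # leads to recombination only with low probability.
            trade(a1, a2)
            if random.random() < mating_probability:
                mate(a1, a2)
        # (2) Uptake of the resources produced at the site.
        site.produce()
        site.uptake()
        # (3)-(4) Reproduction when enough resources have been gathered,
        # with infrequent mutation of the copies.
        for agent in list(site.agents):
            if agent.has_enough_to_copy():
                site.agents.append(agent.replicate())
        # (5) Maintenance cost; an agent that cannot pay is deleted.
        site.agents = [a for a in site.agents if a.pay_maintenance()]
        # (6) Migration of agents whose needs are not being met.
        for agent in list(site.agents):
            maybe_migrate(agent, site)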

The details of this basic cycle can be filled out in a variety of ways, depending upon the particular range of gedanken experiments of interest. Even the simplest models show surprisingly sophisticated sequences of evolutions. One of the earliest experiments with Echo produced evolving agents with ever longer, more complicated chromosomes, accompanied by a corresponding increase in the complexity of their interactions. The result was a biological "arms race" (Dawkins 1986), wherein ever more sophisticated offense tags developed, and defensive capabilities evolved in turn to overcome the increasingly effective matches.

More recent models, by a modification of the basic cycle, provide for the evolution of "connected communities" of agents ("metazoans"): agents that have internal boundaries and reproduce as a unit. With this provision, agents belonging to a connected community can specialize to the advantage of the whole community. For example, one kind of agent belonging to the community can specialize for offense, while a second kind specializes in resource acquisition (somewhat reminiscent of the stinging cells and cavity cells of the hydra). It is easy to show that the intracommunity trading between these specialists yields a net increase in the reproduction rate of both. As a consequence the metazoans come to occupy a significant place in the overall ecology of agents. Many of the mechanisms investigated by Buss (1987) can be imitated by this model, including the evolution of cooperation between cell lines (cf. Axelrod and Hamilton 1981) and the origin of such developmental mechanisms as induction and competence (see Figure 21).


"
"CATERPILLAR
~

'PREDATION
"'"

~
PRED
,.

t
TRADE
~

QIIANTI
)

" FLY"

~
.-Conditions
..,mm18i
~~
. ~
Taga
.

Condition
~

~
I

_
~

"

,
,

~~..~
~ ~

ag

Condition.

.
.
.

Fig . 21. A small ecologyin Echo


Classifier Systems and Echo: A Comparison

Echo and classifier systems are similar in many ways. The conditions employed by an agent in Echo to determine its actions are quite similar to the condition/action rules of a classifier system. However, the actions in Echo (combat, trading, mating) are much more concrete than the rule-activating messages used by a classifier system. They are much easier to interpret when one is trying to understand aspects of distributed control and emergent computation in complex adaptive systems. Tags also play a critical role in both Echo and classifier systems, but again a tag's effects are much more directly interpretable in Echo.

Echo differs from classifier systems in two important ways. First, geometry is critical in Echo. This goes back to the origin of the Echo models (Holland 1976), where geometry played an important role in the spontaneous emergence of autocatalytic structures. In a similar way, the sites in Echo, with their differing resource production characteristics, encourage sophisticated agent ecologies. Second, there are no explicit fitness functions in Echo. The reproduction rate of an agent depends solely on its ability to gather the necessary resources in the context of other agents and sites. There is no number corresponding to the payoff used by a genetic algorithm, nor is there a counterpart of the payoff-derived strength of a classifier system rule. An economist would say that fitness has become endogenous in Echo, whereas it is exogenous in genetic algorithms and classifier systems. As a consequence, the emergent structures (agents) in Echo are much more a function of the overall context and much less a function of external constraints. This can be both an advantage and a disadvantage, but it does allow studies of emergent functional structures free from the confounding effects of external constraints.
4. POSSIBILITIES

Both Echo and classifier systems point up a salient characteristic of complex adaptive systems: In these multiagent systems it takes only a few primitive activities to generate an amazing array of structures and behaviors. Moreover, when the primitives are chosen with care, counterparts of these structures and behaviors can be found in all kinds of complex adaptive systems. Echo's primitives (combat, trade, and mating) and the phenomena they generate (arms races, cooperation, etc.) directly illustrate the point. Though the range of structures exhibited by complex adaptive systems is daunting, this "generator" characteristic offers real hope for a future general theory.

In pursuing a general theory, there is a traditional tool of physics that can be brought to bear: the gedanken experiment. As the name implies, a gedanken experiment is a thought experiment. It extracts a few elements from a process in order to examine, logically, some critical effect produced by the interaction of these elements. Computers offer a way of extending the scope of gedanken experiments to much more complex situations. Echo has been designed as a computational base for gedanken experiments on complex adaptive systems.

Echo, and other models of complex adaptive systems, are readily designed for direct simulation on massively parallel computers. It is also possible to design interactive interfaces for these simulations that permit ready, intuitive interactions with the ongoing simulation, much as is the case with flight simulators. Thus, the "logic" of the simulation can be combined with the human's intuition and superb pattern recognition ability to provide quick detection of interesting patterns or events. This has the double value of providing reality checks on the design, while allowing investigators to bring their scientific taste and intuitions to bear in creating and exploring unusual variants.
By looking for pervasive phenomena in these gedanken experiments, we can study complex adaptive systems with a new version of the classic hypothesize-test-revise cycle. The "test" part of this cycle is particularly important because complex adaptive systems, as mentioned earlier, typically operate far from a steady state. They are continually undergoing revisions, and their evolution is highly history dependent. This, combined with the nonadditive nature of the internal interactions, makes it difficult to do controlled experiments with real complex systems. Computer-based gedanken experiments should help fill the gap.
In examining complex adaptive systems, there is one property that is particularly hard to examine in situ. Complex adaptive systems form and use internal models to anticipate the future, basing current actions on expected outcomes. A system with an internal model can look ahead to the future consequences of current actions without actually committing itself to those actions. In particular, the system can avoid acts that would set it irretrievably down some road to future disaster ("stepping over a cliff"). More sophisticated uses of an internal model allow the system to select current actions that set up later advantageous situations (as in Samuel's [1959] use of "lookahead"). As pointed up earlier, the very essence of attaining a competitive advantage, whether it be in chess or economics, is the discovery and execution of "stage-setting" moves. Internal models distinguish complex adaptive systems from other kinds of complex systems; they also make the emergent behavior of complex adaptive systems intricate and difficult to understand.
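The point about internal models can be made concrete with a toy one-step lookahead: candidate actions are run forward on the model rather than in the world, predicted disasters are discarded, and the best surviving action is chosen. The interfaces (model, value, is_disaster) are invented for this illustration and correspond to nothing specific in the text.

    def choose_action(state, actions, model, value, is_disaster):
        # model(state, action) predicts the next state without committing to the
        # action; value() and is_disaster() evaluate predicted states.
        best_action, best_value = None, float('-inf')
        for action in actions:
            predicted = model(state, action)      # look ahead on the internal model
            if is_disaster(predicted):            # avoid "stepping over a cliff"
                continue
            v = value(predicted)
            if v > best_value:
                best_action, best_value = action, v
        return best_action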
Internal models offer a second advantage in addition to the advantage of prediction. They enable a system to make improvements in the absence of overt payoff or detailed information about errors. Whenever a model's prediction fails to match subsequent outcome, there is direct information about the need for improvement. An appropriate credit (blame) assignment algorithm can even determine what part(s) of the model should be revised. This is a tremendous advantage in most real-world situations where the rewards for current action are usually much delayed. Internal models enable improvement in the interim.

Though we readily ascribe internal models, cognitive maps, anticipation, and prediction to humans, we rarely think of them as characteristic of other systems. Still, a bacterium moves in the direction of a chemical gradient, implicitly predicting that food lies in that direction. The repertoire of the immune system constitutes its model of its world, including an identity of "self." The butterfly that mimics the foul-tasting monarch butterfly survives because it implicitly forecasts that a certain wing pattern discourages predators. A wolf bases its actions on anticipations generated by a mental map that incorporates landmarks and scents. Because so much of the behavior of a complex adaptive system stems from anticipations based on its internal models, it is important that we understand the way in which such systems build and use internal models.
A general theory of complex adaptive systems that addresses these problems will be built, I think, on a framework that centers on three mechanisms: parallelism, competition, and recombination. Parallelism lets the system use individuals (rules, agents) as building blocks, activating sets of individuals to describe and act upon changing situations (as described in the discussion of classifier systems). Competition allows the system to marshal its resources in realistic environments where torrents of mostly irrelevant information deluge the system. Procedures relying on the mechanism of competition (credit assignment and rule discovery) extract useful, repeatable events from this torrent, incorporating them as new building blocks. Recombination underpins the discovery process, generating plausible new rules from building blocks that form parts of tested rules. It implements the pervasive heuristic that building blocks useful in the past will prove useful in new, similar contexts. Overall, these mechanisms allow a complex adaptive system to respond, instant by instant, to its environment, while improving its performance. In so doing, as with classifier systems, the system balances exploration with exploitation.
When these mechanisms are appropriately incorporated in simulations, the systems that result are well founded in computational terms, and they do indeed get better at attaining goals in perpetually novel environments. It should be possible to take a first step toward a general theory of complex adaptive systems by formalizing a framework based on these mechanisms. A second step would incorporate a mathematics that emphasizes process over end points. This mathematics would emphasize the discovery and recombination of useful components (building blocks) rather than focusing on fixed points and basins of attraction. At that point, the incipient theory should begin to provide the guidelines that make the computer-based experiments more than uncoordinated forays into an endlessly complex domain. Once we get that far, we come at last to a rational discipline of complex adaptive systems providing genuine predictions.


Bibliography

Arbib, M. A. 1964. Brains, Machines, and Mathematics. New York: McGraw-Hill.
Axelrod, R., and Hamilton, W. D. 1981. The evolution of cooperation. Science 211:1390-96.
Bagley, J. D. 1967. The Behavior of Adaptive Systems Which Employ Genetic and Correlation Algorithms. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Belew, R. K., and Booker, L. B., eds. 1991. Proceedings of the Fourth International Conference on Genetic Algorithms. San Mateo: Morgan Kaufmann.
Bellman, R. 1961. Adaptive Control Processes: A Guided Tour. Princeton: Princeton University Press.
Bledsoe, W. W., and Browning, I. 1959. Pattern recognition and reading by machine. Proc. East. Joint Comput. Conf. 16:225-32.
Brender, R. F. 1969. A Programming System for the Simulation of Cellular Spaces. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Britten, R. J., and Kohne, D. E. 1968. Repeated sequences in DNA. Science 161:529-40.
Britten, R. J., and Davidson, E. H. 1969. Gene regulation for higher cells: a theory. Science 165:349-57.
Brues, A. M. 1972. Models of race and cline. Amer. J. Phys. Anthrop. 37:389-99.
Burks, A. W., ed. 1970. Essays on Cellular Automata. Urbana, Ill.: University of Illinois Press.
Buss, L. W. 1987. The Evolution of Individuality. Princeton: Princeton University Press.
Cavicchio, D. J. 1970. Adaptive Search Using Simulated Evolution. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Crow, J. F., and Kimura, M. 1970. An Introduction to Population Genetics Theory. New York: Harper and Row.
Davis, L., ed. 1987. Genetic Algorithms and Simulated Annealing. San Mateo: Morgan Kaufmann.
Davis, L., ed. 1991. Handbook of Genetic Algorithms. New York: Van Nostrand Reinhold.
Dawkins, R. 1986. The Blind Watchmaker. New York: W. W. Norton.
Dubins, L. E., and Savage, L. J. 1965. How to Gamble if You Must. New York: McGraw-Hill.
Eden, M. 1967. Inadequacies of neodarwinian evolution as a scientific theory. Mathematical Challenges to the Neo-Darwinian Interpretation of Evolution, ed. Moorhead, P. S., and Kaplan, M. M., 5-19. Philadelphia: Wistar Institute Press.
Feller, W. 1966. An Introduction to Probability Theory and its Applications, vol. II. New York: Wiley.
Fisher, R. A. 1930. The Genetical Theory of Natural Selection. Oxford: Clarendon Press.
Fisher, R. A. 1963. Retrospect of the criticism of the theory of natural selection. Evolution as a Process, ed. Huxley, J., et al., 103-19. New York: Collier Books.
Fogel, L. J., Owens, A. J., and Walsh, M. J. 1966. Artificial Intelligence Through Simulated Evolution. New York: Wiley.
Frantz, D. R. 1972. Non-linearities in Genetic Adaptive Search. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Gale, D. 1968. A mathematical theory of optimal economic development. Bull. Amer. Math. Soc. 74:207-23.
Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Reading: Addison-Wesley.
Good, I. J. 1965. Speculations concerning the first ultra-intelligent machine. Advances in Computers 6:31-88. New York: Academic Press.
Goodman, E. D. 1972. Adaptive Behavior of Simulated Bacterial Cells Subjected to Nutritional Shifts. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Grefenstette, J. J., ed. 1985. Proceedings of the First International Conference on Genetic Algorithms. San Mateo: Morgan Kaufmann.
Grefenstette, J. J., ed. 1987. Proceedings of the Second International Conference on Genetic Algorithms. San Mateo: Morgan Kaufmann.
Hebb, D. O. 1949. The Organization of Behavior. New York: Wiley.
Hebb, D. O. 1958. A Textbook of Psychology. Philadelphia: Saunders.
Hellman, M. E., and Cover, T. M. 1970. Learning with finite memory. Ann. Math. Stat. 41:765-82.
Hofstadter, D. R. 1979. Godel, Escher, Bach. New York: Basic Books.
Holland, J. H. 1976. Studies of the spontaneous emergence of self-replicating systems using cellular automata and formal grammars. Automata, Languages, Development, ed. Lindenmayer, A., and Rozenberg, G. Amsterdam: North-Holland.
Holland, J. H. 1976. Adaptation. Progress in Theoretical Biology IV, ed. Rosen, R. F. New York: Academic Press.
Holland, J. H. 1980. Adaptive algorithms for discovering and using general patterns in growing knowledge-bases. Int. J. Policy Analysis and Information Systems 4:217-240.
Holland, J. H. 1986. Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. Machine Learning II, ed. Michalski, R. S., Carbonell, J. G., and Mitchell, T. M. Los Altos: Morgan Kaufmann.
Holland, J. H. 1991. Concerning the emergence of tag-mediated lookahead in classifier systems. Emergent Computation, ed. Forrest, S. Cambridge, Massachusetts: M.I.T. Press.
Holland, J. H., Holyoak, K. J., Nisbett, R. E., and Thagard, P. R. 1986. Induction: Processes of Inference, Learning, and Discovery. Cambridge, Massachusetts: M.I.T. Press.
Hollstien, R. B. 1971. Artificial Genetic Adaptation in Computer Control Systems. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Jacob, F., and Monod, J. 1961. Genetic regulatory mechanisms in the synthesis of proteins. Mol. Biol. 3:318-56.
Jerne, N. K. 1973. The immune system. Sci. Amer. 229:52-60.
Levins, R. 1968. Evolution in Changing Environments. Princeton: Princeton University Press.
MacArthur, R. H., and Wilson, E. O. 1967. The Theory of Island Biogeography. Princeton: Princeton University Press.
Marimon, R. E., McGrattan, E., and Sargent, T. J. 1989. Money as a Medium of Exchange in an Economy with Artificially Intelligent Agents. Santa Fe Institute Working Paper 89004. Santa Fe: Santa Fe Institute.
Martin, N. 1973. Convergence Properties of a Class of Probabilistic Adaptive Schemes Called Sequential Reproductive Plans. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Mayr, E. 1963. Animal Species and Evolution. Cambridge, Massachusetts: Harvard University Press.
Mayr, E. 1967. Evolutionary challenges to the mathematical interpretation of evolution. Mathematical Challenges to the Neo-Darwinian Interpretation of Evolution, ed. Moorhead, P. S., and Kaplan, M. M., 47-58. Philadelphia: Wistar Institute Press.
Milner, P. M. 1957. The cell-assembly: mark II. Psych. Rev. 64:242-52.
Minsky, M. L. 1967. Computation: Finite and Infinite Machines. Englewood Cliffs: Prentice-Hall.
Newell, A., Shaw, J. C., and Simon, H. A. 1959. Report on a general problem solving program. Proc. Int. Conf. Info. Process., 256-64. Paris: Unesco House.
Plum, T. W.-S. 1972. Simulations of a Cell-Assembly Model. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Riolo, R. L. 1990. Lookahead planning and latent learning in a classifier system. Simulation of Animal Behavior: From Animals to Animats, ed. Meyer, J.-A., and Wilson, S. Cambridge, Massachusetts: M.I.T. Press.
Rosenberg, R. S. 1967. Simulation of Genetic Populations with Biochemical Properties. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Samuel, A. L. 1959. Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 3:210-29.
Schaffer, J. D., ed. 1989. Proceedings of the Third International Conference on Genetic Algorithms. San Mateo: Morgan Kaufmann.
Sela, M. 1973. Antigen design and immune response. The Harvey Lectures 1971-1972. New York: Academic Press.
Selfridge, O. J. 1959. Pandemonium: a paradigm for learning. Proc. Symp. Mech. Thought Processes, 511-29. London: H. M. Stationery Office.
Simon, H. A. 1969. The Sciences of the Artificial. Cambridge, Massachusetts: M.I.T. Press.
Stebbins, G. L. 1966. Processes of Organic Evolution. Englewood Cliffs: Prentice-Hall.
Tsypkin, Y. Z. 1971. Adaptation and Learning in Automatic Systems. New York: Academic Press.
Uhr, L. 1973. Pattern Recognition, Learning, and Thought. Englewood Cliffs: Prentice-Hall.
von Neumann, J., and Morgenstern, O. 1947. Theory of Games and Economic Behavior. Princeton: Princeton University Press.
Waddington, C. H. 1967. Summary discussion. Mathematical Challenges to the Neo-Darwinian Interpretation of Evolution, ed. Moorhead, P. S., and Kaplan, M. M., 95-102. Philadelphia: Wistar Institute Press.
Wallace, B. 1966. Chromosomes, Giant Molecules, and Evolution. New York: Norton.
Weinberg, R. 1970. Computer Simulation of a Living Cell. Ph.D. Dissertation, Ann Arbor: University of Michigan.


Glossary of Important Symbols

(Page numbers indicate first important or defining occurrences in the text.)


English Symbols
A

a particular structure attainable by an adaptive plan ; A E: (1 (5, 22)

(1

domain of action of an adaptive plan, the structures it can attain (5, 21)

Ct(t)

the particular structure from (1 being tried at time t ( 15, 22)

(11(t)

that part of the structure Ct( t) directly tested against the environment
(23)

(B( t)

the population (set of structures) acted upon by the reproductive plan


at time t (88, 91)

(c,)
( C,J,V)

controlling sequence for mutation rate (122)

("initiation condition," "end signal," "predicted value") for behavioral atom (156)

d..

dominance map for ith position of homologous pair of l-tuples (112)

a particular environment of a system undergoing adaptation (4, 25)

possible environments (uncertainty) of adaptive system (4, 25)

the range of signals the adaptive system can receive from the environment
(22)

I (t )

the particular signal received by the adaptive system from the environment
at time t (22)

8M

first M positive integers (91)

k or k ..

number of attributes (alleles, etc.) associated with the (ith) detector


(gene, etc.) (21, 72)

number of detectors (genes, etc.) used in the representation of structures


in (1 (66)

/(e
JO
(e
L(N)
M


length of schema ~ ( 102)


number of positions on which schema~ is defined ( 110)
expected loss under an allocation of N trials (by plan 1') (77)
size of population (data base) <B( t) acted upon by reproductive plan
( 73, 91)

mt( t)

memory, that part of the input history retainedby the adaptiveplan


in additionto the part summarizedin the testedstructureCi1
(t), where
Ci(t) - (Ci1
(t),3R( t (23)

Mf<t)

numberof instancesof schema~ in the population<B(t) (87, 98)


numberof trials allocatedto randomvariablesother than the bestin
a setof randomvariables(77)

total numberof trials allocatedto a setof randomvariables(76)


- dr.ME<t)/ M , the proportion of ~ in <B(t) ( 102, 127)
probability of operator [ ] being applied to an individual in <B(t)
- over), 108PI (inversion), 110IPJI(mutation
( 102Pc (crossing
a setof probability distributionsover Ci(24)

P(~,t)
p[ ]
<P
Of

limit on rate of reproduction in environmental niche associated with ~, set by renewal rates of resources in that niche (166)

r'

number of schemata receiving n ' or more trials ( under a genetic plan )


( 129)

<R[ )

reproductiveplansof type [ ] (90 ft)

<R.(PC'p" .PM,(C,

special class of type R plans used in the study of robustness (121 ff.)

time (20)

a setof adaptiveplansto be compared(25)


the payoffaccumulatedby plan T in environmentE up to time T (26)

U.,...{T)
'U

a set of random variablesusedwhen payoff is to be assignedstochastically


to elementsof <t (25)
setof attributes(rangeof values
) for the ith detector, 8.. (66)
Greek Symbols

a(~,.tit)
8i: -+ Vi

averageexcess(in genetics) of schema(coadapted set) ~ ( 137)


detector, assigns attributes (values from Yi) to structures A Ed
(66 ; cf. 6, 44)


.11

" 101
- over" pressure
crossing
( )

Ef

fraction of instancesof f in <B(t) lost becauseof action of OperatoR


(125)
steady-state probability of occurrenceof schemaf undercrossing
over ( 100)
- df. {O,I ,*, : ,O, " ,v A ,p,' } , symbolsof the broadcastlanguage( 144)
payoff or performanceof structureA E: (i in environmentE (4, 25)

A(f}
~\
II . : d - + Reala
lit

the expectedpayoff to schemaf (under somegiven probability distribution


P over (i ) (69)

P,f

the observedaverageperformance(payoff) of a set of samplesof f


(69)

p,(t)

the observedaverageperformanceof the structuresin <B(t) ( 102)

p(T) or p(l)
p~,

the averageperformance(payoff) of all trials of (i to time T, or the


averageperformanceof trials of (i at time-stept (69)
- df.JI,.(AA(t (94)

P,

- df. L AIIA'/ M , averageperformanceof population<P.(t) (94)

F.

a schema(designatinga subsetof (i ) ; f E: ,i (68)


schemawith thejth highestobserved
averageafter N trials (77)

~(/)(N)
,:,

the setof schematadefinedover (i (68)

p : d1 - + n

assignsopei-ator to structurefor plansof type CR


[ ) (92)

T:/ X d - + n
or
T:/ X d - + d

an adaptiveplan (4, 21)

a criterion for comparing plans in the set 3 (26)

"' : Cl - + Cl

an operator (for modifying structures), either deterministic or stoor


chastic; ' " E: n (24)

", :Cl- + <P


", :SMX Clf' - + <P a particular operator ( for plans of type <R( ) ( 92)
n
the set of operators (for modifying structures) employed by an adaptive plan (3, 24)

Miscellanea" " Symbols


"
'
"
a don t care indicator used in the definition of schemata(68)

[ ]t

set of all permutations of (elements of) [ ] (107)


'"

ratio is 1 in the limit (78)

difference is negligible (under stated conditions) (78)

dr.

defined to be equal (94)


Index

(df.) following an entry indicates the term is defined or explained on that page

Activities, economic, 36- 39, 67. Seealso


Economics, mathematical
Adaptive agent. SeeAgent
Adaptive plan, 1.2, 8- 9, 12- 14, 16- 19,
2.1; examplesof , ch. 3; genetic, 1316, 34, 7. 1, 7.5, 156, 157, 160- 61;
reproductive, 13- 14, 18, 6. 1, 102- 5,
III
Adaptive System, 20, 25, 28 (df .)
Agent, 186- 94
Aggregatebehavior, 184
Algorithm, 25, 44, 6.1, 105, 119- 20,
122, 131, 141
Allele , 9 (df .), 14- 16, 21, 33- 34, 100,
102, 106, 109- 10, 119, 7.4 ; dominant,
112 (df.); recessive, 112 (df .)
Antibody, 155
Apportionmentof credit, 10, 33- 34, 66,
68- 69, 74, 160- 62. Seealso Obstacles
to adaptation; Schemata
Arbib , M . A . , 152
Artificial Intelligence. SeeApportionment
of credit; Broadcastlanguage; Evaluator;
Falsepeak; Game-playing; Generalproblem
solver; Goal- directedsearch; Lookahead; Model; Patternrecognition;
Prediction; Representation
; Search; Strategy
Artificial selection, 163
Artificial systems, 3, 1.3, 25, 89- 90,
122, 129, 155
Association, 60- 63, 68, 97, 161. Seealso
Linkage
Attribute, 66 (df.), 68, 98, 106, 8.1, 161

Automata. SeeTransition function


Averageexcess, 88, 136- 37, 161. Seealso
Fitness
Axelrod, R. , xi , 193

, J. D. , 162
Bagley
Bandit, two-armed. See Two-armed bandit
Bayesian algorithm, 78-79
Behavioral atom, 155 (df.)
Behavioral unit, 155 (df.)
Bellman
, R., 76
Bledsoe
, W. W., 162
Breedingplan, 163- 64
Brender
, R. F., 168
Britten, R. J., 116, 117, 153, 154
Broadcast
, 8.2- 8.4, 171- 72
language
Broadcast
unit, 144- 47 (df.)
, I. , 162
Browning
Brues, A. M. , 169
Bucketbrigadealgorithm
, 176- 79 (df.).
of credit
SeealsoApportionment
"
"
Buildingblocks, 174, 179- 80, 198. See
alsoRecombination
Burks, A. W. , 168
Buss, L. W., 193
CNS. SeeCentralnervoussystem
, 91, 166
Carryingcapacity
Cavicchio
, D. J., 161, 162, 163, 169
Cell assembly
, 60- 64, 155
Cellularautomata
, 168
Centralnervoussystem
, 3.6, 155- 57, 168
Chi-squarecontingency
, 164
Christiannsen
, F., xi


Chromosome
, 9, 13- 16, 21, 33- 34, 6667, 117- 18, 137, 153- 55. Seealso
Structures
Classifiersystems
, 172- 81, 195; bid, 176;
classifiers
, 172- 73, 174- 75 (df.); detec
tors, 172; effectors
, 172, 176; execution
, 172- 73, 175
cycle, 175; messages
Co- adapted
, II (df.), 34, 88, 136, 139,
161. SeealsoSchemata
Coding, 57. SeealsoRepresentation
, 176, 177, 197
Competition
, 184- 86, 195Complexadaptivesystems
98. SeealsoEcho
, 5, 21, 66, 167. Seealso
Component
Schemata
Studies
, 9.2
Computer
Condition
/actionrules. SeeClassifiersystems
Control, 3.5, 70- 71, 119- 20, 163- 64,
169. SeealsoFunctionoptimization
Controlpolicy, 54 (df.)
, 124- 25. SeealsoCriterion
Convergence
Cover, T. M., 76
Creditassignment
. SeeApportionment
of
credit; Bucketbrigadealgorithm
Criterion
, 12, 16- 19, 26- 28, 41- 42, 7576, 83- 84, 85, 87, 125, 129; examples
of, ch. 3
Crossingover, 6.2, 108.:-9, 110, 113,
121, 140, 152, 164, 167, 168; simple,
102(df.), 108, III , 121. SeealsoOperator
, genetic
Crossover
. SeeCrossingover
operator
Crow, J. F., 33
Cumulative
payoff, 26- 27, 38- 39, 42, 53,
55. SeealsoLossminimization


Domainof action, 4, 18, 28. Seealso


Adaptiveplan; Structures
Dominance
, 112(df.)- 16, 169
Dubins, L. E. , 30, 31
Echo, 186- 95; combat, 188- 91; mating,
191; simulation, 192- 94; site, 191- 92;
trade, 191
Ecology, 168- 69. Seealso Environmental
niche
Economics, mathematical, 3.2, 67, 131.
Seealso Optimization
Eden. M .. 119

, 118, 144
Effectivelydescribable
. SeeCriterion
Efficiency

Enumeration
(enumerative
plan), 8, 13,
16- 17, 19, 26, 41, 69, 110, 124
- 25
Environment
, 1.2, 6, 11- 12, 16- 18,
2.1, 143, 153, 169;examples
of, ch. 3.
SeealsoPerformance
measure
Environmental
niche
, II (df.), 12, 33,
119
- 66, 168
- 69
, 165
- 54, 161
, 10(df.), 117, 153
Enzyme
- 39, 161
- 62.
, 10(df.), 34, 138
Epistasis

Seealso Nonlinearity
Error, 48, 52, 55, 56, 156
Evaluator, 48, 52, 156

Evolution
, 12, 17, 97, 119. SeealsoSpeciation

Falsepeak
. SeeLocaloptimum
Feedback
, 10, 119, 154
Feldman
, M. W., xi
Fisher
, R. A., 89, 118, 119, 137, 161
- 39, 168.
Fitness
, 4, 12- 16, 33- 34, 137
SeealsoAverage
excess
; Performance
measure
Davidson
Fixedpoint, 100
- 101, 134, 138
, E. H., 117, 153, 154
Dawkins
, R. , 193
, L. J., 163
Fogel
Defaulthierarchy
Frantz
, D. R., 161, 162, 164, 171
, 180- 81
', 181
, 20, 66. SeealsoApportionmentFunction
, 57, 70-71, 89- 90,
Decomposition
optimization
- 20, 163
of credit
99, 105
- 6, 119
- 64. Seealso
of
schema
.
See
Schema
Definingpositions
Optimization
Deletion
, 117(df.)
Detector
, 44 (df.), 3.4, 63, 66- 67, 117,
132, 8.1, 153, 155- 56, 162. Seealso
Representation
Distribution
. SeeProbability
, probability
distribution

Gale
, D., 36
, 2.3
Gambling
-playing
Game
;, 3.3, 7.3
Gedanken
186, 196

experiments
Allele.

. See
Gene


K selection, 166
Generalproblem solver, 44
Kimura, M . , 33
Generation, 12 (df.), 13- 16, 94, 102 (df.),
133, 137
Geneticalgorithm. SeeAdaptive plan
Language, broadcast. SeeBroadcastlanguage
Geneticoperator. SeeOperator, genetic
Geneticplan. SeeAdaptive plan, genetic
Learning, 6, 60- 61, 155. Seealso Adaptive
Genetics, 1.4 , 3.I , 131, 7.4 , 153- 55
plan; Central nervoussystem
Length of schema. SeeSchema
Genotype, 9 (df.), 12, 32- 33, 161
Levins, R. , 32
Goal- directedsearch, 49, 52. Seealso
Search
Linkage, 34, 97, 102- 3, 106- 9, 135, 139,
140, 143, 164. Seealso Association
Good, I . J. , 60
Local optimum, 66, 104, 110, 111, 123,
Goodman, E. D. , 162
Goods, 36- 38. Seealso Economics, mathematical 133, 140, 160. Seealso Nonlinearity
Lookahead, 48, 52, 181, 185. Seealso
Prediction
Loss minimization, 42- 43, 75, 77- 83, 85Hamilton, W. D., xi , 193
87, 125, 129- 31. Seealso Perfonnance
Hebb, D. O., 58, 60, 64
measure
M.
E.
Hellman
76
,
,

, 157, 167- 68
Hierarchy
History. SeeInputhistory
Hofstadter
, D. R. , 181
Hollstien
, R. B., 161, 163, 169
, 50- 51
Homomorphism
Hybrid, 168

MacArthur, R. H. , 166
Machine Learning. SeeCla.~ ifier systems;
Samuel, A . L .
Maladaptation, 134- 35, 138. Seealso Local optimum

Immunesystem(biological), 155
Implicit parallelism. SeeIntrinsic parallelism

Marlmon,R. E., 186


Markovchain, 49- 52
Martin, N., 125
. See
Mathematical
economics

Inference, statistical, 13, 50, 159. Seealso


Functionoptimization; Prediction; Twoannedbandit
Input, 6- 7. Seealso Input history; Signal
Input history, 2, 12- 14, 16, 18, 23, 29,
30, 96. Seealso Storage, of input history

mathematical
Mayr. E. . 33. 119. 167. 168
Maze. 45- 47
Memory. 23. 56. 59. 93. 143. Seealso
Storage
Metazoanevolution, 193- 94

Interim perfonnance, I , 26, 29, 140. See


also Loss minimization
Internal model, 181, 185, 196- 97. See
also Lookahead
Intrachromosomalduplication, 116 (df.),
169- 70
Intrinsic parallelism, 71- 72, 88, 99, 1034, 125- 27, 130- 31, 140, 157, 160- 61
Inversion, 6.3, 115, 118, 121, 125- 27,
140, 161; simple, 108 (df .), 121, 127
Iosifescu, M . , 30

Minimum(expected
) loss. SeeLossminimization
Minsky, M. L. , 152
Model, 52- 53, 56, 63- 64, 143- 44, 153,
155- 57
Monod, J. , 117, 141, 153
, 168
Morphogenesis
Mutation
, 97, 16.4, 111, 113- 15, 116,
121-22, 127- 28, 136, 170

Jacob
, F., 117, 141, 153
Jeme
, N. K., 155

- 69
, 168
Migration
Milner
. P. M... 60

Needs
, 61- 64
Nervoussystem
. SeeCentralnervoussystem


Neuron, 60- 62, 155


Newell. A . .- 44

, 2, 5, 35, 39, 57, 136, 138Nonlinearity


39, 160, 164. SeealsoLocaloptimum
;
to adaptation
; Obstacles
Epistasis
Nonstation
arity, 35, 57, 168- 69


, 15, 24, 28, 30- 31,


Probabilitydistribution
68, 70, 77, 87, 92. SeealsoSampling
, economic
, 36 (df.)- 38
Program
Punctuation
, 144, 149- 50, 152, 157, 168
-homomorphism
. SeeDefaulthierarchy
Quasi

Observed
best, 75, 77, 87- 88, 124- 25,
, 165-66
Queue
129, 140
Obstacles
to adaptation
r selection, 166
, 2, 5- 6, 9- 11, 1314, 65, 66, 75, -123, 140, 9.1; examples Randomvariable. SeeSampling; Schemata
of, ch. 3. SeealsoEpistasis
; Local
Ranking, 73- 74, 87- 88, 96, 103- 4 , 105,
139- 40, 160, 169
; Nonlinearity
; Nonstation
optimum
arity
, 1.2, 2.1, 92- 93, 152, 157; examples
Operator
Rankings, storage. SeeStorage, of rankings
of, ch. 3; genetic
, 14- 18, 33,
6.2, 6.3, 6.4, 6.5, 121- 22, 127- 28,
Recombination
, 180, 191, 197, 198. See
140, 152, 157, 161, 167-68, 169- 70;
also Crossingover
losses
: examplesof , 66- 67, 106, 103, 110- 11, 125- 27, 140
Representation
7, 109, 112- 13, 116, 148- 52; homolo, 117- 18(df.), 153- 55
Operon
, 1, 4, 19, 38- 39, 54- 55, 57,
Optimization
gous, 109 (df.); via broadcastlanguage,
75- 76, 90, 120, 123, 140, 160- 61. See
141, 8.2- 8.4 , 167- 68; via detectors,
alsoFunctionoptimization
57, 66 (df.)- 71, 74, 89, 98, 131, 140,
8.1

Parallelism
, 174, 178, 192, 197
Reproductiveplan. SeeAdaptive plan, reproductive
Paralleloperation
, 8.2- 8.4 SeealsoIntrinsic
Resourcerenewalrate, 165- 66, 168
parallelism
Patternrecognition
Riolo, R. L. , 180, 181
, 1.3, 3.4, 63, 132,
155- 56, 162- 63
Robustness
, 17- 19, 27, 34- 35, 121, 12425; examplesof , 7.3, 7.4. Seealso
Payoff, 25 (df.), 26- 27, 40, 42, 76, 132,
160, 165, 166, 172, 177. SeealsoPerformanceCriterion
measure
, R. S. , 161
Rosenberg
Rule discovery, 179- 181
Payoff-onlyplan, 26, 29 (df.), 39, 42, 132
Performance
measure
, 1.2, 8, 12, 18, 66,
69, 74, 75, 87- 88, 124, 139- 40, 159. SeeSampling
Samplespace
61, 168- 69; examples
of, ch. 3. See
, 49- 50, 68- 73, 75- 77, 140,
Sampling
alsoCriterion; Fitness
160; of (i , 8, 12, 23- 24, 66, 74, 90; Payoff; Utility
Permutation
94, 124; of schemata
, 107- 8, 127
, 71- 73, 75, 85Phasespace
88, 98- 99, 127, 128- 30, 139- 40, 157,
, 54 (df.). SeealsoControl
160- 61; of severalrandomvariables
, 10(df.), 11- 12, 33- 34, 160
Phenotype
,
Plum, T. W.-S., 155
5.I , 5.3
. SeeAdaptiveplan
Policy. SeeControlpolicy
Samplingprocedure
: asa database, 73- 74, 87- 88,
Samuel
Population
, A. L., 17, 40, 42, 43, 44, 132,
91, 92- 93, 95, 96, 99- 100, 103- 4, 110,
196
125- 27, 133, 139- 40, 156, 160; biologSantaFeInstitute, x, 184, 185
ical, 12- 15, 33- 35, 136, 139, 161- 62,
, L. J., 30- 31
Savage
165-66, 168- 69
Schema
: lengthof, 102(df.)- 3, 108, 129;
Predict
~~i_O
__lil, 48, 50- 52, 56, 63, 143, 153,
definingpositionsof, 72 (df.), 125, 165
155- 56, 161. SeealsoLookahead
Schemata
;
, 19, 68 (df.), ch. 4. 8.I , 157,
Model
160- 61; andcoadaptation
, 89, 119,


niches
136- 39, 161; andenvironmental
,
of, 74,
11, 165- 67, 168- 70; processing
87- 88, 89, 97- 108, 115- 16, 119- 20,
127, 7.5, 160- 61; rankingof, 73- 74,
75, 5.4, 125, 127, 130, 7.5, 160
Search
, 3.4, 66, 155- 57, 160, 162- 63,
164
, 116(df.)
Segregation
Sela, M. , 155
Self-reproduction
, 152
, O. J., 156
Selfridge
Shaw
, J. C., 44
Signal, 22- 23, 117- 18, 8.2, 152, 153- 57
Simon, H. A. , 44, 168
. SeeCrossing
operator
Simplecrossover
over, simple
. SeeInversion
,
Simpleinversionoperator
simple
, 164- 67, 168- 69
Speciation
State
, 13, 23, 30, 54- 55, 59, 93, 147
. SeeInference
Statisticalinference
, statistical
Steadystate. SeeFixedpoint
Stebbins
, G. L. , 118
Stimulus
. SeeSignal
-response
Stimulus
theory, 59. Seealso
Broadcast
language
Stochastic
, 24, 26--27, 28, 90, 92process
93, 95, 100, 123, 133. Seealsf) Markov
chain
: of inputhistory, 23, 30, 56, 143;
Storage
of rankings
, 69- 70, 73- 74, 87, 96, 98,
103- 4, 125, 140, 160
, 30- 31, 3.3, 7.3
Strategy
Structures
, 1.2, 16--19, 21- 23, 66- 67,
89, 92- 93, 97- 98, 107, 113, 116, 117,
of, ch. 3
144, 167- 68; examples
Supergene, 167
, 61- 62, 155
Synapse
Tag, 175, 186, 188- 91
,
, 36- 38. SeealsoEconomics
Technology
mathematical
device, 6 (df.), 1.3
Threshold
Transitionfunction, 13, 23, 50
Translocation
, 116(df.), 118
Treegraph, 40 (df.), 41
Trial. SeeSampling
Trial anderror. SeeEnumeration

Trials, optimalallocation
, ch. 5
Tsypkin, Y. Z., 2, 54, 55
Two-armed bandit, 5.1; revisited, 10.2
, 10.2
Uhr, L., 162
Utility, 30, 31, 38 (df.)- 39, 49- 51. See
alsoPerformance
measure
VonNeumann
, J., 36, 37, 39, 42
VonNeumann
technology
, 3.2, 67
Vossler
, C. , 162
, C. H., 119
Waddington
Wallace
, B., 118
WeinbergR., 162
Wilson, E. 0 . , 166
