Adaptation in Natural and Artificial Systems
Preface

When this book was originally published I was very optimistic, envisioning extensive reviews and a kind of "best seller" in the realm of monographs. Alas! That did not happen. After five years I did regain some optimism because the book did not "die," as is usual with monographs, but kept on selling at 100-200 copies a year. Still, research in the area was confined almost entirely to my students and their colleagues, and it did not fit into anyone's categories. "It is certainly not a part of artificial intelligence" and "Why would somebody study learning by imitating a process that takes billions of years?" are typical of comments made by those less inclined to look at the work.
Five more years saw the beginnings of a rapid increase in interest. Partly, this interest resulted from a change of focus in artificial intelligence. Learning, after several decades at the periphery of artificial intelligence, was again regarded as pivotal in the study of intelligence. A more important factor, I think, was an increasing recognition that genetic algorithms provide a tool in areas that do not yield readily to standard approaches. Comparative studies began to appear, pointing up the usefulness of genetic algorithms in areas ranging from the design of integrated circuits and communication networks to the design of stock market portfolios and aircraft turbines. Finally, and quite important for future studies, genetic algorithms began to be seen as a theoretical tool for investigating the phenomena generated by complex adaptive systems, a collective designation for nonlinear systems defined by the interaction of large numbers of adaptive agents (economies, political systems, ecologies, immune systems, developing embryos, brains, and the like).
The last five years have seen the number of researchers studying genetic algorithms increase from dozens to hundreds. There are two recent innovations that will strongly affect these studies. The first is the increasing availability of massively parallel machines. Genetic algorithms work with populations, so they are intrinsically suited to execution on computers with large numbers of processors, using a processor for each individual in the population. The second innovation is a unique interdisciplinary ...
The first technical descriptions and definitions of adaptation come from biology. In that context adaptation designates any process whereby a structure is progressively modified to give better performance in its environment. The structures may range from a protein molecule to a horse's foot or a human brain or, even, to an interacting group of organisms such as the wildlife of the African veldt. Defined more generally, adaptive processes have a critical role in fields as diverse as psychology ("learning"), economics ("optimal planning"), control, artificial intelligence, computational mathematics, and sampling ("statistical inference").
Basically, adaptive processes are optimization processes, but it is difficult to subject them to unified study because the structures being modified are complex and their performance is uncertain. Frequently nonadditive interaction (i.e., "epistasis" or "nonlinearity") makes it impossible to determine the performance of a structure from a study of its isolated parts. Moreover, possibilities for improved performance must usually be exploited at the same time that the search for further improvements is pressed. While these difficulties pose a real problem for the analyst, we know that they are routinely handled by biological adaptive processes, qua processes.
The approach of this book is to set up a mathematical framework which makes it possible to extract and generalize critical factors of the biological processes. Two of the most important generalizations are: (1) the concept of a schema as a generalization of an interacting, coadapted set of genes, and (2) the generalization of genetic operators such as crossing-over, inversion, and mutation. The schema concept makes it possible to dissect and analyze complex "nonlinear" or "epistatic" interactions, while the generalized genetic operators extend the analysis to studies of learning, optimal planning, etc. The possibility of "intrinsic parallelism," the testing of many schemata by testing a single structure, is a direct offshoot of this approach. The book develops an extensive study of intrinsically parallel processes and illustrates their uses over the full range of adaptive processes, both as hypotheses and as algorithms.

The book is written on the assumption that the reader has a familiarity with ...
1. The General Setting

1. INTRODUCTION
How does evolution produce increasingly fit organisms in environments which are highly uncertain for individual organisms?

What kinds of economic plan can upgrade an economy's performance in spite of the fact that relevant economic data and utility measures must be obtained as the economy develops?

How does an organism use its experience to modify its behavior in beneficial ways (i.e., how does it "learn" or "adapt under sensory guidance")?

How can computers be programmed so that problem-solving capabilities are built up by specifying "what is to be done" rather than "how to do it"?

What control procedures can improve the efficiency of an ongoing process, when details of changing component interactions must be compiled and used concurrently?
Though these questions come from very different areas, it is striking how much they have in common. Each involves a problem of optimization made difficult by substantial complexity and uncertainty. The complexity makes discovery of the optimum a long, perhaps never-to-be-completed task, so the best among tested options must be exploited at every step. At the same time, the uncertainties must be reduced rapidly, so that knowledge of available options increases rapidly. More succinctly, information must be exploited as acquired so that performance improves apace. Problems with these characteristics are even more pervasive than the questions above would indicate. They occur at critical points in fields as diverse as evolution, ecology, psychology, economic planning, control, artificial intelligence, computational mathematics, sampling, and inference.

There is no collective name for such problems, but whenever the term adaptation (ad + aptare, to fit to) appears it consistently singles out the problems of interest. In this book the meaning of "adaptation" will be extended to encompass ...
Theory will have a central role in all that follows, but only insofar as it illuminates practice. For natural systems, this means that theory must provide techniques for prediction and control; for artificial systems, it must provide practical algorithms and strategies. Theory should help us to know more of the mechanisms of adaptation and of the conditions under which new adaptations arise. It should enable us to better understand the processes whereby an initially unorganized system acquires increasing self-control in complex environments. It should suggest procedures whereby actions acquired in one set of circumstances can be transferred to new circumstances. In short, theory should provide us with means of prediction and control not directly suggested by compilations of data or simple tinkering. The development here will be guided accordingly.
The fundamental questions listed above can serve as a starting point for a unified theory of adaptation, but the informal phrasing is a source of difficulty. With the given phrasing it is difficult to conceive of answers which would apply unambiguously to the full range of problems. Our first task, then, is to rephrase the questions in a way which avoids ambiguity and encourages generality. We can avoid ambiguity by giving precise definitions to the terms appearing in the questions, and we can assure the desired generality if the terms are defined by embedding them in a common formal framework. Working within such a framework we can proceed with theoretical constructions which are of real help in answering the questions. This, in broad outline, is the approach we will take.
2. PRELIMINARY SURVEY
(Since we are operating outside of a formal framework in this chapter, some of the statements which follow will be susceptible of different, possibly conflicting interpretations. Precise versions will be formulated later.)
Just what are adaptation's salient features? We can see at once that adaptation, whatever its context, involves a progressive modification of some structure or structures. These structures constitute the grist of the adaptive process, being largely determined by the field of study. Careful observation of successive structural modifications generally reveals a basic set of structural modifiers or operators; repeated action of these operators yields the observed modification sequences. Table 1 presents a list of some typical structures along with the associated operators for several fields of interest.
A system undergoing adaptation is largely characterized by the mixture of operators acting on the structures at each stage. The set of factors controlling this changing mixture, the adaptive plan, constitutes the works of the system as far as its adaptive character is concerned. The adaptive plan determines just what structures arise in response to the environment, and the set of structures attainable by applying all possible operator sequences marks out the limits of the adaptive plan's domain of action. Since a given structure performs differently in different environments (the structure is more or less fit), it is the adaptive plan's task to produce structures which perform well (are "fit") in the environment confronting it. "Adaptations" to the environment are persistent properties of the sequence of structures generated by the adaptive plan.

Table 1: Typical structures and operators

  Field                       Structures         Operators
  Genetics                    chromosomes        mutation, recombination, etc.
  Economic planning           mixes of goods     production activities
  Control                     policies           Bayes's rule, successive approximation, etc.
  Game theory                 strategies         rules for iterative approximation of optimal strategy
  Artificial intelligence     programs           "learning" rules
  Physiological psychology    cell assemblies    synapse modification
A precise statement of the adaptive plan's task serves as a key to uniform treatment. Three major components are associated in the task statement: (1) the environment, E, of the system undergoing adaptation; (2) the adaptive plan, τ, whereby the system's structure is modified to effect improvements; (3) a measure, μ, of performance, i.e., the fitness of the structures for the environment. (The formal framework developed in chapter 2 is built around these three components.) The crux of the problem for the plan τ is that initially it has incomplete information about which structures are most fit. To reduce this uncertainty the plan must test the performance of different structures in the environment. The "adaptiveness" of the plan enters when different environments cause different sequences of structures to be generated and tested.
In more detail and somewhat more formally: A characteristic of the environment can be unknown (from the adaptive plan's point of view) only if alternative outcomes of the plan's tests are allowed for. Each distinct combination of alternatives is a distinct environment E in which the plan may have to act. The set of all possible combinations of alternatives indicates the plan's initial uncertainty about the environment confronting it, the range of environments in which the plan should be able to act. This initial uncertainty about the environment will be formalized by designating a class ℰ of possible environments. The domain of action ...

  Field                       Performance measure
  Genetics                    fitness
  Economic planning           utility
  Control                     error functions
  Game theory                 payoff
  Artificial intelligence     comparative efficiency (if specified at all)
  Physiological psychology    ...
4. The performance measure varies over time and space, so that given adaptations are only advantageous at certain places and times.
3. A SIMPLE ARTIFICIAL ADAPTIVE SYSTEM
The artificial adaptive system of this example is a pattern recognition device. (The device to be described has very limited capabilities; while this is important in applications, it does not detract from the device's usefulness as an illustration.) The information to be fed to the adaptive device is preprocessed by a rectangular array of sensors, a units high by b units wide. Each sensor is a threshold device which is activated when the light falling upon it exceeds a fixed threshold. Thus, when a "scene" is presented to the sensor array at some time t, each individual sensor is either "on" or "off" depending upon the amount of light reaching it. Let the activity of the ith sensor, i = 1, 2, ..., ab, at time t be represented formally by the function s_i(t), where s_i(t) = 1 if the sensor is "on" and s_i(t) = 0 if it is "off." A given scene thus gives rise to a configuration of ab "ones" and "zeros." All told there are 2^ab possible configurations of sensor activation; let C designate this set of possible configurations. It will be assumed that a particular subset C_1 of C corresponds to (instances of) the pattern to be recognized. The particular subset involved, among the 2^(2^ab) possible, will be unknown to the adaptive device. (E.g., C_1 might consist of all configurations containing a connected X-shaped array of ones, or it might consist of all configurations containing as many ones as zeros, or it might be any one of the other 2^(2^ab) possible subsets of C.) This very large set of possibilities constitutes the class of possible environments ℰ; it is the set of alternatives the adaptive plan must be prepared to handle. The adaptive device's task is to discover or "learn" which element of ℰ is in force by learning what configurations belong to C_1. Then, when an arbitrary configuration is presented, the device can reliably indicate whether the configuration belongs to C_1, thereby detecting an instance of the pattern.
[Figure: a scene presented to the sensor array, each sensor feeding a threshold device.]
... weights for which the partition (C+, C-) approximates the partition (C_1, C_0), so that C+ ≈ C_1 and C- ≈ C_0. (This device, as noted earlier, is quite limited; there are many partitions (C_1, C_0) that can only be poorly approximated by (C+, C-), no matter what set of weights is chosen.) Now, let W = {v_1, v_2, ..., v_k} be the set of possible values for the weights w_i; that is, each w_i ∈ W, i = 1, ..., ab. Thus, with a fixed threshold K, the set of attainable structures is the set of all ab-tuples, W^ab.

The natural performance measure, μ_E, relative to any particular partition E ∈ ℰ is the proportion of all configurations correctly assigned (to C_1 and C_0). That is, μ_E maps each ab-tuple into the fraction of correct recognitions achieved thereby, a number in the interval [0, 1]; μ_E: W^ab → [0, 1]. (In this example the outcome of each test, "configuration correctly classified" or "configuration incorrectly classified," will be treated as the plan's input. The same ab-tuple may have to be tested repeatedly to establish an estimate of its performance.)
A simple plan τ_0 for discovering the best set of weights in W^ab is to try various ab-tuples, either in some predetermined order or at random, estimating the performance of each in its turn; the best ab-tuple encountered up to a given point in time is saved for comparison with later trials, this "best-to-date" ab-tuple being replaced immediately by any better ab-tuple encountered in a later trial. It should be clear that this procedure must eventually uncover the "best" ab-tuple in W^ab. But note that even for k = 10 and a = b = 10, W^ab has 10^100 elements. This is a poor augury for any plan which must exhaustively search W^ab. And that is exactly what the plan just described must undertake, since the outcome of earlier tests in no way affects the ordering of later tests.
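To make the enumerative character of τ_0 concrete, here is a minimal sketch in Python. It assumes the device assigns a configuration to C+ when the weighted sum of its active sensors exceeds the threshold K (the rule implied by the description above); the tiny 2x2 sensor array, the weight set, and the pattern standing in for C_1 are all invented for illustration.

```python
import itertools
import random

def classify(weights, config, K):
    """Assign a configuration to C+ (True) when the weighted sum of
    active sensors exceeds the fixed threshold K (assumed rule)."""
    return sum(w for w, s in zip(weights, config) if s == 1) > K

def performance(weights, configs, in_C1, K):
    """mu_E: the fraction of configurations assigned correctly."""
    correct = sum(classify(weights, c, K) == in_C1(c) for c in configs)
    return correct / len(configs)

def enumerative_plan(W, ab, configs, in_C1, K, trials):
    """tau_0: try ab-tuples at random, keeping the best-to-date;
    outcomes of earlier tests never influence later choices."""
    best, best_score = None, -1.0
    for _ in range(trials):
        candidate = tuple(random.choice(W) for _ in range(ab))
        score = performance(candidate, configs, in_C1, K)
        if score > best_score:            # replace the best-to-date
            best, best_score = candidate, score
    return best, best_score

# Toy instance: a = b = 2 (so ab = 4 sensors), k = 3 weight values,
# and a hypothetical pattern C_1 ("at least two sensors on").
ab, K = 4, 1.0
W = [0.0, 0.5, 1.0]
configs = list(itertools.product([0, 1], repeat=ab))
in_C1 = lambda c: sum(c) >= 2
print(enumerative_plan(W, ab, configs, in_C1, K, trials=200))
```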
Let's look at a (fairly standard) plan τ_00 which does use the outcome of each test to help determine the next structure for testing. The basic idea of this plan is to change the weights whenever a presentation is misassigned, so as to decrease the likelihood of similar misassignments in the future. In detail: Let the values in W be ordered in increasing magnitude so that v_{j+1} > v_j, j = 1, 2, ..., k-1 (for instance, the weights might be located at uniform intervals so that v_{j+1} = v_j + Δ). Then the algorithm proceeds according to the following prescription:

1. If the presentation at time t is assigned to C_0 when it should have been assigned to C_1 then, for each i such that s_i(t) = 1, replace the corresponding weight by the next highest weight (in the case of uniform intervals the new weight would be the old weight w_i increased by Δ, w_i + Δ). Leave the other weights unchanged.
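Only rule 1 of the prescription survives here; the rest of the prescription is not reproduced on this page. The sketch below therefore assumes the natural symmetric counterpart as rule 2 (lowering the weights of active sensors when a configuration is wrongly assigned to C_1); that second rule is a hypothesis, not text from the book.

```python
def adjust_weights(weights, config, assigned_C1, should_be_C1, W):
    """One step of tau_00. W is the ordered value set, W[j+1] > W[j].
    On a misassignment, move the weight of each active sensor one
    step through W; otherwise leave the weights alone."""
    if assigned_C1 == should_be_C1:
        return weights                        # correct: no change
    new = list(weights)
    for i, s in enumerate(config):
        if s == 1:                            # only sensors that were "on"
            j = W.index(weights[i])
            if should_be_C1 and not assigned_C1:
                j = min(j + 1, len(W) - 1)    # rule 1: next highest weight
            else:
                j = max(j - 1, 0)             # assumed rule 2 (hypothetical)
            new[i] = W[j]
    return new
```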
... has attempted these adjustments can testify, but it pales in comparison to the genetic case where dozens or hundreds of interdependent alleles can be involved. Roughly, the difficulty of the problem increases by an order of magnitude for each additional gene when the interdependencies are intricate (but see the discussions in chapter 4 and pp. 160-61).
Given the pervasiveness of epistasis, adaptation via changes in genetic makeup becomes primarily a search for coadapted sets of alleles, alleles of different genes which together significantly augment the performance of the corresponding phenotype. (In chapter 4 the concept of a coadapted set of alleles will be generalized, under the term schema, to the point where it applies to the full range of adaptive systems.) It should be clear that coadaptation depends strongly upon the environment of the phenotype. The large coadapted set of alleles which produces gills in fish augments performance only in aquatic environments. This dependence of coadaptation upon characteristics of the environment gives rise to the notion of an environmental niche, taken here to mean a set of features of the environment which can be exploited by an appropriate organization of the phenotype. (This is a broader interpretation than the usual one, which limits niche to those environmental features particularly exploited by a given species.) Examples of environmental niches fitting this interpretation are: (i) an oxygen-poor, sulfur-rich environment such as is found at the bottom of ponds with large amounts of decaying matter; a class of anaerobic bacteria, the thiobacilli, exploits this niche by means of a complex of enzymes enabling them to use sulfur in place of oxygen to carry out oxidation; (ii) the "bee-rich" environment exploited by the orchid Ophrys apifera, which has a flower mimicking the bee closely enough to induce pollination via attempted copulation by the male bees; (iii) the environment rich in atmospheric vibrations in the frequency range of 50 to 50,000 cycles per second; the bones of the mammalian ear are a particular adaptation of parts of the reptilian jaw which aids in the detection of these vibrations, an adaptation which clearly must be coordinated with many other adaptations, including a sophisticated information-processing network, before it can improve an organism's chances of survival. It is important to note that quite distinct coadapted sets of alleles can exploit the same environmental niche. Thus, the eye of aquatic mammals and the (functionally similar) eye of the octopus exploit the same environmental niche, but are due to coadapted sets of alleles of entirely unrelated sets of genes.
The various environmental niches E ∈ ℰ define different opportunities for adaptation open to the genetic system. To exploit these opportunities the genetic system must select and use the sets of coadapted alleles which produce the appropriate phenotypic characteristics. The central question for genetic systems is: How ...
... to each individual, τ_1 acts as follows: At the beginning of each time period t, the plan's accumulated information about the environment resides in a finite population ℬ(t) selected from 𝒜. The most important part of this information is given by the discrete distributions which give the proportions of different sets of alleles in the population ℬ(t). ℬ(t) serves not only as the plan's repository of accumulated information, but also as the source of new variants which will give rise to ℬ(t+1). As indicated earlier, the formation of ℬ(t+1) proceeds in two phases. During the first phase, ℬ(t) is modified to form ℬ'(t) by copying each individual in ℬ(t) a number of times dependent upon the individual's observed performance. The number of copies made will be determined stochastically, so that the expected number of copies increases in proportion to observed performance. During the second phase, the operators are applied to the population ℬ'(t), interchanging and modifying the sets of alleles, to produce the new generation ℬ(t+1).
One key to understanding τ_1's resolution of the dilemma lies in observing what happens to small sets of adjacent alleles under its action. In particular, what happens if an adjacent set of alleles appears in several different chromosomes of above-average fitness and not elsewhere? Because each of the chromosomes will be duplicated an above-average number of times, the given alleles will occupy an increased proportion of the population after the duplication phase. This increased proportion will of course result whether or not the alleles had anything to do with the above-average fitness. The appearance of the alleles in the extra-fit chromosomes might be happenstance, but it is equally true that any correlation between the given selection of alleles and above-average fitness will be exploited by this action. Moreover, the more varied the chromosomes containing the alleles, the less likely it is that the alleles and above-average fitness are uncorrelated.

What happens now when the genetic operators Ω are applied to form the next generation? As indicated earlier, the closer alleles are to one another in the chromosome, the less likely they are to be separated during the operator phase. Thus the operator phase usually transfers adjacent sets of genes as a unit, placing them in new chromosomal contexts without disturbing them otherwise. These new contexts further test the sets of alleles for correlation with above-average fitness. If the selected set of alleles does indeed augment fitness, the chromosomes containing the set will again (on the average) be extra fit. On the other hand, if the prior associations were simply happenstance, sustained association with extra-fit chromosomes becomes increasingly less likely as the number of trials (new contexts) increases. The net effect of the genetic plan over several generations will be an increasing predominance of alleles and sets of alleles augmenting fitness in the given environment.
However, in all but the most constrained situations, enumerative plans are a false lead. The flaw, and it is a fatal one, asserts itself when we begin to ask, "How long is eventually?" To get some feeling for the answer we need only look back at the first example. For that very restricted system there were 10^100 structures in 𝒜. In most cases of real interest, the number of possible structures vastly exceeds this number, and for natural systems like the genetic systems we have already seen that numbers like 2^10,000 ≈ 10^3000 arise. If 10^12 structures could be tried every second (the fastest computers proposed to date could not even add at this rate), it would take a year to test about 3 × 10^19 structures, or a time vastly exceeding the estimated age of the universe to test 10^100 structures.
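As a check on the arithmetic behind these figures:

$$10^{12}\ \tfrac{\text{structures}}{\text{second}} \times 3.15 \times 10^{7}\ \tfrac{\text{seconds}}{\text{year}} \approx 3 \times 10^{19}\ \tfrac{\text{structures}}{\text{year}},$$

so an exhaustive test of 10^100 structures would require on the order of 10^80 years, while the estimated age of the universe is only on the order of 10^10 years.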
It is clear that an attempt to adapt by means of an enumerative plan is foredoomed in all but the simplest cases because of the enormous times involved. This extreme inefficiency makes enumerative plans uninteresting either as hypotheses about natural processes or as algorithms for artificial systems. It follows at once that an adaptive plan cannot be considered good simply because it will eventually produce fit structures for the environments confronting it; it must do so in a reasonable time span. What a "reasonable" time span is depends strongly on the environments (problems) under consideration, but in no case will it be a time large with respect to the age of the universe. This question of efficiency or "reasonable time span" is the pivotal point of the most serious contemporary challenge to evolutionary theory: Are the known genetic operators sufficient to account for the changes observed in the allotted geological intervals? There is of course evidence for the existence of adaptive plans much more efficient than enumeration. Arthur Samuel (1959) has written a computer program which learned to play tournament calibre checkers, and humans do manage to adapt to very complex environments in times considerably less than a century. It follows that a major part of any study of the adaptive process must be the discovery of factors which provide efficiency while retaining the "universality" (robustness) of enumeration. It does not take analysis to see that an enumerative plan is inefficient just because it always generates structures in the same order, regardless of the outcome of tests on those structures. The way to improvement lies in avoiding this constraint.
The foregoing points up again the critical nature of the adaptive plan's initial uncertainty about its environment, and the central role of the procedures it uses to store and access the history of its interactions with that environment. Since different structures perform differently in different environments, the plan's task is set by the aspects of the environment which are unknown to it initially. It must generate structures which perform well (are fit) in the particular environment confronting it, and it must do this efficiently. Interest centers on robust adaptive plans, plans which are efficient over the range of environments ℰ they may encounter. Giving robustness precise definition and discovering something of the factors which make an adaptive plan robust is the formal distillation of questions about efficiency. Because efficiency is critical, the study of robustness has a central place in the formal development.
The discussion of genetic systems emphasized two general requirements bearing directly on robustness: (1) The adaptive plan must retain advances already made, along with portions of the history of previous plan-environment interactions. (2) The plan must use the retained history to increase the proportion of fit structures generated as the overall history lengthens. The same discussion also indicated the potential of a particular class of adaptive plans, the reproductive plans. One of the first tasks, after setting out the formal framework, will be to provide a general definition of this class of plans. Lifting the reproductive plans from the specific genetic context makes them useful across the full spectrum of fields in which adaptation has a role. This widened role for reproductive plans can be looked upon as a first validation of the formalism. A much more substantial validation follows closely upon the definition, when the general robustness of reproductive plans is proved via the formalism. Later we will see how reproductive plans using generalized genetic operators retain and exploit their histories. Throughout the development, reproductive plans using genetic operators will serve to illuminate key features of adaptation and, in the process, we will learn more of the robustness, wide applicability, and general sophistication of such plans.
Summarizing: This entire survey has been organized around the concept of an adaptive plan. The adaptive plan, progressively modifying structure by means of suitable operators, determines what structures are produced in response to the environment. The set of operators Ω and the domain of action of the adaptive plan 𝒜 (i.e., the attainable structures) determine the plan's options; the plan's objective is to produce structures which perform well in the environment E confronting it. The plan's initial uncertainty about the environment, its room for improvement, is reflected in the range of environments ℰ in which it may have to act. The related performance measures μ_E, E ∈ ℰ, change from environment to environment, since the same structure performs differently in different environments. These objects lie at the center of the formal framework set out in chapter 2. Chapter 3 provides illustrations of the framework as applied to genetics, economics, game-playing, searches, pattern recognition, statistical inference, control, function optimization, and the central nervous system.
2. A Formal Framework

1. DISCUSSION

Three associated objects occupied the center of the preliminary survey:

E, the environment of the system undergoing adaptation,
τ, the adaptive plan which determines successive structural modifications in response to the environment,
μ, a measure of the performance of different structures in the environment.
... $A_1 \times A_2 \times \cdots \times A_\ell = \prod_{i=1}^{\ell} A_i$ ...

Finally, the set 𝒜 will usually be potential rather than actual. That is, elements become available to the plan only by successive modification (e.g., by rearrangement of components or construction from primitive elements), rather than by selection from an extant set. We will examine all of these possibilities as we go along, noting that relevant elaborations of the elements of 𝒜 provide a way of specializing the general parts of the theory for particular applications.

The adaptive plan τ produces a sequence of structures, i.e., a trajectory through 𝒜, by making successive selections from a set of operators Ω. The particular ...
[Figure 2: Schematic of adaptive plan operation. The adaptive plan τ selects operators; the operators produce the structures tried; the performances observed in environment E are μ_E(𝒜(1)), μ_E(𝒜(2)), μ_E(𝒜(3)), ....]
... can transmit. That is, given a particular signal I(t) ∈ I at time t, the ith component I_i(t) is the value s_i(t) of the ith sensor at time t. In general the sets I_i may be quite different, corresponding to different kinds of sensors or sensory modalities.

The formal presentation of an adaptive plan τ can be simplified by requiring that 𝒜(t) serve as the state of the plan at time t. That is, in addition to being the structure tried at time t, 𝒜(t) must summarize whatever accumulated information is to be available to τ. We have just provided that the total information received by τ up to time t is given by the sequence (I(1), I(2), ..., I(t-1)). Generally only part of this information is retained. To provide for the representation of the retained information we can make use of the latitude in specifying 𝒜. Think of 𝒜 as consisting of two components 𝒜₁ and ℳ, where 𝒜₁(t) is the structure tested against the environment at time t, and the memory ℳ(t) represents other retained parts of the input history (I(1), I(2), ..., I(t-1)). Then the plan can be represented by the two-argument function

$$\tau: I \times \mathcal{A} \to \mathcal{A},$$

which can at once be elaborated, without loss of generality or range of application, to the form

$$\tau: I \times (\mathcal{A}_1 \times \mathcal{M}) \to (\mathcal{A}_1 \times \mathcal{M}).$$

Thus the framework can be developed in terms of the simple, two-argument form of τ, elaborating it whenever we wish to study the mechanisms of trial selection or memory update in greater detail.
In what follows it will often be convenient to treat the adaptive plan τ as a stochastic process; instead of determining a unique structure 𝒜(t+1) from I(t) and 𝒜(t), τ assigns probabilities to a range of structures and then selects accordingly. That is, given I(t), 𝒜(t) may be transformed into any one of several structures A'_1, A'_2, ..., A'_j, ..., the structure A'_j being selected with probability P_j. More formally: Let 𝒫 be a set of admissible probability distributions over 𝒜. Then

$$\tau: I \times \mathcal{A} \to \mathcal{P}$$

will be interpreted as assigning to each pair (I(t), 𝒜(t)) a particular distribution over 𝒜, P(t+1) ∈ 𝒫. The structure 𝒜(t+1) to be tried at time t+1 will then be determined by drawing a random sample from 𝒜 according to the probability distribution P(t+1) = τ(I(t), 𝒜(t)). For those cases where the plan τ is to determine the next structure 𝒜(t+1) uniquely, the distribution P(t+1) simply becomes a degenerate, one-point distribution where a single structure in 𝒜 is assigned probability 1. Hence the form

$$\tau: I \times \mathcal{A} \to \mathcal{P}$$

includes the previous

$$\tau: I \times \mathcal{A} \to \mathcal{A}$$

as a special case.
In practice the transformation of 𝒜(t) to 𝒜(t+1) is usually accomplished by composing a function

$$\tau': I \times \mathcal{A} \to \Omega$$

and the set of operators

$$\Omega = \{\omega: \mathcal{A} \to \mathcal{P}\},$$

where the stochastic ... That is, the range of τ' can be changed from Ω to 𝒫, with τ' being redefined so that

$$\tau'(I(t), \mathcal{A}(t)) = [\tau'(I(t), \mathcal{A}(t))](\mathcal{A}(t)) = P(t+1).$$

With this extension τ' and τ become identical; for this reason one symbol "τ" will be used to designate both functions, the range being specified whenever the distinction is important.
The general objective of this formalism is comparison of adaptive plans, either as hypotheses about natural phenomena or as algorithms for artificial systems. The comparison naturally centers on the efficiency of different plans in locating high-performance structures under a variety of environmental conditions. For a comparison to be made there must be a set of plans, given either explicitly or implicitly, which are candidates for comparison. This set will be formally designated 𝒯. Often 𝒯 will be the set of all possible plans employing the operators in Ω, but in some cases there will be constraints restricting 𝒯, while in others 𝒯 will be enlarged to include all possible plans over 𝒜 (i.e., all possible functions of the form τ: I × 𝒜 → 𝒫). 𝒯, however defined, represents the set of technical or feasible options for the adaptive system under consideration.
As indicated in the survey, a nontrivial problem of adaptation exists only when the adaptive plan is faced with an initial uncertainty about its environment. This uncertainty is formalized by designating the set ℰ of alternatives corresponding to characteristics of the environment unknown to the adaptive plan. The dependence of the plan's action upon the environment finds its formal counterpart in the dependence of the input I(t) upon which environment E ∈ ℰ actually confronts the plan. One case of particular importance is that in which the adaptive plan receives a direct indication of the performance of each structure it tries. That is, a part of the input I(t) will be the payoff μ_E(𝒜(t)) determined by the function

$$\mu_E: \mathcal{A} \to \text{Reals},$$

which measures the performance of each structure in the given environment.

Sometimes, when the performance of a structure in the environment E depends upon random factors, it is useful to treat the utility function as assigning a random variable from some predetermined set 𝒰 to each structure in 𝒜. Thus

$$\mu_E: \mathcal{A} \to \mathcal{U},$$

and the payoff assigned to 𝒜(t) is determined by a trial of the random variable μ_E(𝒜(t)). This extension does not add any generality to the framework (and hence is unnecessary at the abstract level) because any randomness involved in the interaction between the adaptive system and its environment can be subsumed in the stochastic action of the operators. (See chapter 5 and section 7.2, however.)
... to substitute the expected payoff under P(t), P_E(τ, t), for μ_E(𝒜(t)). (If 𝒜 is countable, P_E(τ, t) is simply given by $P_E(\tau, t) = \sum_j P(A_j, t)\,\mu_E(A_j)$, where P(A_j, t) is the probability of selecting A_j ∈ 𝒜 when the distribution over 𝒜 is P(t).) Thus, for stochastic adaptive plans,

$$U_{\tau,E}(T) = \sum_{t=1}^{T} P_E(\tau, t).$$

Following this line, a useful performance target can be formulated in terms of the greatest possible cumulative payoff in the first T time-steps,

$$U^*_E(T) = \operatorname{lub}_{\tau \in \mathcal{T}} U_{\tau,E}(T).$$

An important criterion, appearing frequently in the literature of control theory and mathematical economics (see chapter 3, "Illustrations"), can be concisely formulated in terms of U*_E: τ accumulates payoff at an asymptotically optimal rate if

$$\lim_{T\to\infty} \left[ \frac{U_{\tau,E}(T)/T}{U^*_E(T)/T} \right] = \lim_{T\to\infty} \left[ \frac{U_{\tau,E}(T)}{U^*_E(T)} \right] = 1.$$

In other words, the rate at which τ accumulates payoff is, in the limit, the same as the best possible rate. Often it is desirable to have a much stronger criterion setting standards on interim behavior. That is, even though the payoff rate approaches the optimum, it may take an intolerably long time before it is reasonably close. Thus, the stronger criterion sets a lower bound on the rate of approach to the optimum. For example, the criterion would designate a sequence (c_T) approaching 0 (such as $k/(T + k - 1)$, or $k/(k + e^{jT})$ for $0 < j < \infty$) and then require, for all T,

$$\left[ U_{\tau,E}(T)/U^*_E(T) \right] > (1 - c_T).$$

Clearly the plan τ satisfies the asymptotic optimal rate criterion when it satisfies this criterion and, in addition, τ can approach that rate no more slowly than c_T approaches 0.
The simplest way to extend these criteria to ℰ is to require that a plan τ ∈ 𝒯 meet the given criterion in each E ∈ ℰ:

τ is robust in ℰ with respect to the asymptotic optimal rate criterion for 𝒯 when

$$\operatorname{glb}_{E \in \mathcal{E}} \lim_{T\to\infty} \left[ U_{\tau,E}(T)/U^*_E(T) \right] = 1.$$

τ is robust in ℰ with respect to the interim behavior criterion (c_T) for 𝒯 when, for all T,

$$\operatorname{glb}_{E \in \mathcal{E}} \left[ U_{\tau,E}(T)/U^*_E(T) \right] > (1 - c_T).$$

Each criterion in effect classifies the plans in 𝒯 as "good" or "bad" according to whether or not it is satisfied. The first of these criteria is commonly met in a wide range of applications, while the second proves to be relevant to questions of survival under competition. (Once again, a plan satisfying the second criterion automatically meets the first, but not vice versa.) Other criteria can be based on the cumulative payoff function, and indeed criteria of a quite different kind can be useful in particular situations. Nevertheless the criteria given are representative and of general use; they will play a prominent role later.
2. PRESENTATION

A problem in adaptation will be said to be well posed once 𝒯, ℰ, and X have been specified within the foregoing framework. An adaptive system is specified within this framework by the set of objects (𝒜, Ω, I, τ) where

𝒜 = {A_1, A_2, ...} is the set of attainable structures, the domain of action of the adaptive plan,

Ω = {ω_1, ω_2, ...} is the set of operators for modifying structures, with ω ∈ Ω being a function ω: 𝒜 → 𝒫, where 𝒫 is some set of probability distributions over 𝒜,

I is the set of possible inputs to the system from the environment, and

τ: I × 𝒜 → Ω is the adaptive plan which, on the basis of the input and structure at time t, determines what operator is to be applied at time t.
𝒯 is the set of feasible or possible plans of the form τ: I × 𝒜 → Ω (or τ: I × 𝒜 → 𝒫) appropriate to the problem being investigated.

ℰ represents the range of possible environments or, equivalently, the initial uncertainty of the adaptive system about its environment. When the plan τ tries a structure 𝒜(t) ∈ 𝒜 at time t, the particular environment E ∈ ℰ confronting the adaptive system signals a response I(t) ∈ I. The performance or payoff μ_E(𝒜(t)), given by the function μ_E: 𝒜 → Reals, is generally an important part of the information I(t). Given E ≠ E' for E, E' ∈ ℰ, the corresponding functions μ_E, μ_E' are generally not identical, so that a major part of the uncertainty about the environment is just about how well a structure will perform therein. When a plan employs, or receives, only information about payoff, so that I(t) = μ_E(𝒜(t)), it will be called a payoff-only plan.

Finally, the various plans in 𝒯 are to be compared over ℰ according to a criterion X. Comparisons will often be based on the cumulative payoff functions $U_{\tau,E}(T) = \sum_{t=1}^{T} P_E(\tau, t)$, where P_E(τ, t) is the expected payoff under P(t), and the "target" function $U^*_E(T) = \operatorname{lub}_{\tau \in \mathcal{T}} U_{\tau,E}(T)$. An "interim behavior" criterion, based on a selected sequence (c_T) → 0 and of the form

$$\operatorname{glb}_{E \in \mathcal{E}} \left[ U_{\tau,E}(T)/U^*_E(T) \right] > (1 - c_T),$$

will be important in the sequel.
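One minimal reading of the (𝒜, Ω, I, τ) loop, sketched in Python. The integer structures, the two operators, the payoff function standing in for μ_E, and the plan's switching rule are all invented for illustration; the sketch only shows how the pieces of the framework interlock for a payoff-only plan.

```python
import random

A = range(0, 100)                        # attainable structures (toy A)
mu_E = lambda a: -abs(a - 37)            # payoff in the (unknown) environment E

def omega_step(a):
    """An operator omega: A -> P; a distribution over the neighbors of a."""
    choices = [max(0, a - 1), a, min(99, a + 1)]
    return {c: 1 / len(choices) for c in choices}

def omega_jump(a):
    """A second operator: the uniform distribution over all of A."""
    return {c: 1 / len(A) for c in A}

def tau(payoff, a):
    """tau: I x A -> Omega. A payoff-only plan (I(t) is just the payoff):
    search locally once payoff is high, otherwise jump. Illustrative only."""
    return omega_step if payoff > -10 else omega_jump

def run(T):
    a, payoff, U = random.choice(A), float("-inf"), 0.0
    for t in range(T):
        omega = tau(payoff, a)                    # plan selects an operator
        dist = omega(a)                           # operator yields P(t+1)
        a = random.choices(list(dist), weights=list(dist.values()))[0]
        payoff = mu_E(a)                          # environment's response I(t)
        U += payoff                               # cumulative payoff U_{tau,E}(T)
    return U

print(run(100))
```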
With the help of this framework each of the fundamental questions about adaptation posed in chapter 1, section 1, can be translated into a formal counterpart:

Original: To what parts of its environment is the organism (system, organization) adapting?
Formal: What is ℰ?

Original: How does the environment act upon the adapting organism (system, organization)?
Formal: What is I?

Original: What structures are undergoing adaptation?
Formal: What is 𝒜?

Original: What are the mechanisms of adaptation?
Formal: What is Ω?

Original: What part of the history of its interaction with the environment does the organism (system, organization) retain, in addition to that summarized in the structure tested?
Formal: What is ℳ?

Original: What limits are there to the adaptive process?
Formal: What is 𝒯?

Original: How are different (hypotheses about) adaptive processes to be compared?
Formal: What is X?
3. COMPARISON WITH THE DUBINS-SAVAGE FORMALIZATION OF THE GAMBLER'S PROBLEM

... much of the mathematical essence of a theory of gambling consists of the discovery and demonstration of sharp inequalities for stochastic processes ... this theory is closely akin to dynamic programming and Bayesian statistics. In the reviewer's opinion, [How to Gamble If You Must] is one of the most original books published since World War II.

M. Iosifescu, Math. Rev. 38, 5, Review 5276 (1969)
For those who have read, or can be induced to read, Dubins and Savage's influential book, this section (which requires special knowledge not essential for subsequent development) shows how to translate their formulation of the abstract gambler's problem to the present framework and vice versa. Briefly, their formulation is based on a progression of fortunes f_0, f_1, f_2, ... which the gambler attains by a sequence of gambles. A gamble is naturally given as a probability distribution over the set of all possible fortunes F. The gambler's range of choice at any time t depends directly and only upon his current fortune f, so that, as Dubins and Savage remark, the word "state" might be more appropriate than "fortune." The gambler's range of choice for each fortune f is dictated by the gambling house Γ. The strategy σ for confronting the house is a function which at each time t selects a gamble in Γ on the basis of the sequence or partial history of fortunes to that time (f_0, f_1, ..., f_t). Finally the utility of a given fortune f to the gambler is specified by a utility function u. Thus an abstract gambler's problem is well posed when the objects (F, Γ, u) have been specified; the gambler's response to the problem is given by his strategy σ.
The objects of the Dubins-Savage framework can be put in a one-to-one correspondence with formally equivalent objects in the present framework. With the help of this correspondence any theorem proved in one framework can automatically be translated to a statement which is a valid theorem in the other framework. The relation between the intended interpretations of corresponding objects is in itself enlightening, but the real advantage accrues from the ability to transfer results from one framework to the other with a guarantee of validity.

The following table presents the formal correspondence with an indication of the intended interpretation of each formal object. In this table the superscript "*" on a set will indicate the set of all finite sequences (or strings) which can be formed from that set; thus F* is the set of all partial histories.
Gambler's Problem | Adaptive Systems

F, fortunes | 𝒜₁, basic structures (see τ below)

a gamble, a probability distribution over F | P, a probability distribution over structures, i.e., P ∈ 𝒫

Γ, the gambling house | the (induced) function which assigns to each A ∈ 𝒜 the set of distributions {ω(A), ω ∈ Ω}

σ, a strategy over the partial histories F* | τ: 𝒜 → 𝒫, an adaptive plan; τ uses only the retained history ℳ in 𝒜 = 𝒜₁ × ℳ, but τ has the same generality as σ if ℳ ranges over F* and 𝒜₁ over F

u, utility | μ_E: 𝒜 → Reals, performance
3. Illustrations

The formal framework set out in chapter 2 is intended, first of all, as an instrument for uniform treatment of adaptation. If it is to be useful, a wide variety of adaptive processes must fit comfortably within its confines. To give us a better idea of how the framework serves this end, the present chapter applies the framework in several different fields. It will repay the reader to skim through all of the illustrations on first reading, but he should skip without hesitation over difficult points on unfamiliar ground, reserving concentration for illustrations from familiar fields. Although each of the illustrations adds something to the substantiation of the framework, no one of them is essential in itself to later developments. The interpretations, limited usually to one commonly used model per field, are of necessity largely informal, but two points can be checked in each case: (1) the facility of the framework in picking out and organizing the facts relevant to adaptation, and (2) the fit of established mathematical models within the framework.
1. GENETICS

... genes act in many ways, affecting many physiological and morphological characteristics which are relevant to survival. All of these come together into the sufficient parameter "fitness" or selective value. ... Similarly environmental fluctuation, patchiness, and productivity can be combined ... in ... [a] measure of environmental uncertainty. ...

Levins in Changing Environments (pp. 6-7)
The phenotype is the product of the harmonious interaction of all genes. The genotype is a "physiological team" in which a gene can make a maximum contribution to fitness by elaborating its chemical "gene product" in the needed quantity and at the time when it is needed in development. There is extensive interaction not only among the alleles of a locus, but also between loci. The main locale of ...
... the fitness of any set of alleles {α_1, ..., α_n} is taken to be the sum of the fitnesses of the alleles in the set,

$$\sum_{i=1}^{n} \mu(\alpha_i).$$

However, in general, the fitness of an allele depends critically upon the influence of other alleles (epistasis). The replacement of any single allele in a coadapted set may completely destroy the complex of phenotypic characteristics necessary for adaptation to a particular environmental niche. The genetic operators provide for the preservation of coadapted sets by inducing a "linkage" between adjacent alleles: the closer together a set of alleles is on a chromosome, the more immune it is to separation by the genetic operators. Thus a more realistic set of adaptive plans provides for emphasis of coadapted sets through reproduction, combined with application of the genetic operators to provide new candidates and test established coadapted sets in new combinations and contexts.
More formally, an interesting set of plans can be defined in terms of a two-phase procedure: First the number of offspring of each individual A in a finite population ℬ(t) is determined probabilistically, so that the expected number of offspring of A is proportional to A's observed fitness μ_E(A). The result is a population ℬ'(t) with certain chromosomes emphasized, along with the coadapted sets they contain. Then, in the second phase, the genetic operators from Ω are applied (in some predetermined order) to yield the new population ℬ(t+1). One class of plans of considerable practical relevance can be defined by assuming that operator ω_i from Ω is applied to an individual A ∈ ℬ'(t) with probability p_i (constant over time). It is easy to see that the efficiency of such a plan will depend upon the values of the p_i; it is perhaps less clear that once each of the p_i has a value within a certain critical range, the plan remains efficient, relative to other possible plans, over a very broad range of fitness functions {μ_E, E ∈ ℰ}. In particular, if chromosomes containing a given linked set of alleles repeatedly exhibit above-average fitness, the set will spread throughout the population. On the other hand, if a linked set occurs by happenstance in a chromosome of above-average fitness, later tests will eliminate it (see chapters 6 and 7). It is this mode of operation (and others similar) which gives such plans robustness: the ability to discover complex combinations of coadapted sets appropriate to a wide variety of environmental niches.
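A minimal sketch of the two-phase reproductive plan just described, with bit-string chromosomes. The crossover and mutation probabilities, the string length, and the fitness function (which rewards a "linked set" of leading ones) are illustrative choices, not values from the text.

```python
import random

def reproductive_plan_step(pop, fitness, p_cross=0.7, p_mut=0.01):
    """One generation of the two-phase plan.
    Phase 1: duplicate individuals with expected offspring counts
    proportional to observed fitness (B(t) -> B'(t)).
    Phase 2: apply crossover and mutation, each with a fixed
    probability p_i, to yield B(t+1)."""
    # Phase 1: fitness-proportional duplication.
    weights = [fitness(a) for a in pop]
    emphasized = random.choices(pop, weights=weights, k=len(pop))
    # Phase 2: genetic operators.
    next_pop = []
    for a in emphasized:
        if random.random() < p_cross:             # crossover with a random mate
            mate = random.choice(emphasized)
            cut = random.randrange(1, len(a))
            a = a[:cut] + mate[cut:]
        a = ''.join(b if random.random() >= p_mut  # at each locus, with prob
                    else random.choice('01')       # p_mut, resample the allele
                    for b in a)
        next_pop.append(a)
    return next_pop

# Illustrative run: fitness rewards a linked set of four leading ones.
fitness = lambda a: 1 + sum(1 for b in a[:4] if b == '1')
pop = [''.join(random.choice('01') for _ in range(12)) for _ in range(50)]
for t in range(30):
    pop = reproductive_plan_step(pop, fitness)
avg = sum(fitness(a) for a in pop) / len(pop)
print(f"average fitness after 30 generations: {avg:.2f}")
```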
Because of the central role of fitness, it is natural to discuss the efficiency and robustness of a plan τ in terms of the average fitnesses of the populations it produces. Formally, the average fitness in E of a finite population of genotypes ℬ(t) produced by τ at time t is given by

$$\rho_E(\tau, t) = \frac{\sum_{A \in \mathcal{B}(t)} \mu_E(A)}{M(\tau, t)},$$

where M(τ, t) is the number of individuals in ℬ(t). If we take the ratio ρ_E(τ, t)/ρ_E(τ', t), we have an indication of how close τ comes to extinction relative to τ' (in the sense that extinction occurs when the population produced by τ becomes negligible relative to the population produced by τ'). If we take the greatest lower bound of this ratio relative to some set of possible plans 𝒯, we have an indication of the worst that can happen to τ in E, relative to 𝒯, at time t. Continuing in this vein we get the following criterion for ranking plans as to robustness over ℰ:

$$\operatorname{glb}_{E \in \mathcal{E}} \; \operatorname{glb}_{\tau' \in \mathcal{T}} \; \operatorname{glb}_{t} \; \rho_E(\tau, t)/\rho_E(\tau', t).$$

In effect this criterion ranks plans according to how close they come to extinction under the most unfavorable conditions.
The fantastic variety of possible genotypes, the effects of epistasis, changing environments, and the difficulty of retaining adaptations while maintaining variability (genetic variance) all constitute difficulties which genetic processes must surmount. In terms of the (𝒯, ℰ, X) framework these are, respectively, problems of the large size of 𝒜, the nonlinearity and high dimensionality of μ_E, the nonstationarity of μ_E, and the mutual interference of search and exploitation. The (𝒯, ℰ, X) framework enables the definition of concepts (chapters 4 and 5) which in turn (chapters 6 through 9) help explain how genetic processes meet these difficulties in times consistent with paleontological and current biological observations.
Summarizing:

𝒜, populations of chromosomes represented, for example, by the set of distributions over the set of genotypes 𝒜₁.

Ω, genetic operators such as mutation, crossover, inversion, dominance modification, translocation, deletion, etc.

𝒯, reproductive plans combining duplication according to fitness with the application of genetic operators; for example, if each operator ω_i ∈ Ω is applied to individuals with a fixed probability p_i, then the set of possible plans can be represented by the set {(p_1, ..., p_i, ..., p_b), where 0 ≤ p_i ≤ 1}.

ℰ, the set of possible fitness functions {μ_E: 𝒜 → ℝ}, each perhaps stated as a function of combinations of coadapted sets.

X, the criterion $\operatorname{glb}_{E \in \mathcal{E}} \operatorname{glb}_{\tau' \in \mathcal{T}} \operatorname{glb}_{t} \; \rho_E(\tau, t)/\rho_E(\tau', t)$.
2. ECONOMICS

The specification of how goods can be transformed into each other is called the technology of the model, and the specification of how goods are transformed to satisfaction is called the utility function. Given this structure and some initial bundle of goods, the problem of optimal development is to decide at each point of time how much to invest and how much to consume in order to maximize utility summed over time in some suitable way.

Gale in "A Mathematical Theory of Optimal Economic Development," Bull. AMS 74, 2 (p. 207)
One of the most important formulations of mathematical economics is the von Neumann technology. This technology can be presented (following David Gale 1968) in terms of a finite set of goods and a finite set of activities, where each activity transforms some goods into others. If the goods are indexed, then the goods available to the economy at any given time can be presented as a vector where the ith component gives the amount of the ith good. In the same way, the input to the jth activity and the resultant output can be given by a pair of vectors w_j and w'_j, where the ith component of w_j specifies the amount of the ith good required by the activity, while the ith component of w'_j specifies the amount produced. An activity can be operated at various levels of effort so that, for instance, if the amount of input of each required good is doubled then the amount of output will be doubled. More generally, if the level of effort for activity j is c_j then the pair (w_j c_j, w'_j c_j) specifies the input and output of the activity. If a mixture of activities is allowed, the overall technology can be specified as the set of pairs

$$\{(Wc, W'c): c \in Q\},$$

where W and W' are matrices having the vectors w_j and w'_j as their respective jth columns, each c is a vector having the level of the jth activity as its jth component, and Q designates the set of admissible activity mixes (corresponding to the real constraints limiting the total activity at any time). A program for utilizing the technology is given by a sequence of activities (c_t) satisfying the intuitive "local" requirement that the total amount of each good required as input for the activities at time
t cannot exceed the total amount of that good produced as output in the preceding period:

$$W'c_{t-1} \geq Wc_t$$

(using matrix multiplication and the obvious extension of inequality to vectors).

[Figure: typical activities ("tool fabrication," "coal mining," "coal storage") shown as input-output pairs over the goods wood, coal, iron ore, steel, and tools; the matrix W combining activity input requirements, the matrix W' combining activity outputs, and a typical production sequence.]
Activities which dispose of or store goods can be introduced so that the given inequalities can be changed to equalities without loss of generality. Thus, given an initial supply of goods v(0), the set of admissible programs becomes ... A program c* is considered optimal if

$$\operatorname{glb}_{c} \; \liminf_{T \to \infty} \left[ U_{c^*}(T) - U_c(T) \right] \geq 0.$$

Because Q sets an upper limit to levels of effort, an optimal program always exists. A program c' will often be satisfactory if its rate of accrual of utility U_{c'}(T)/T is comparable to that of c*.
"
"
Generally interest centers on noncontracting economieswhere, once an
activity is possible, it continuesto be possibleat any subsequenttime. This can be
"
"
guaranteed, for example, if there is a set of initial goods which are regenerated
by all activities (cf. sunlight, water, and air ) and from which all other goods can be
produced by appropriate sequencesof activities. In such economies a mix of
activities can be tried and, if found to be of above-averageutility , can be employed
again in the future.
In the (𝒯, ℰ, X) framework, the set of admissible activity mixes Q becomes the set of structures 𝒜. An adaptive plan τ generates a program c by selecting a sequence of activity vectors (c_t) on the basis of information received from the environment (economy). The environment E in this case makes itself felt only through the observed utility sequence (μ_E(c_t)); thus different utility functions correspond to different environments. Within this framework, the basic concern is discovery of an adaptive plan which, over a broad variety of environments, generates programs which work "near-optimally." A typical criterion of "near-optimality" would be that for all utility functions of interest the ratio of the rate of accrual of the adaptive plan τ, U_τ(T)/T, to that of c*(E), U_{c*(E)}(T)/T, approaches 1 for each E ∈ ℰ. That is,

$$\lim_{T \to \infty} \left[ U_\tau(T)/U_{c^*(E)}(T) \right] = 1, \quad \text{for all } E \in \mathcal{E}.$$

Generally there will be some additional requirement that the rates be comparable for all times T.
Adaptation becomes important when there is uncertainty about just what utility should be assigned to given activity mixes, or when it is difficult to project μ_E into the future, or when Q is a function of time (reflecting technological innovations). The key to formulating an adaptive plan here, paralleling the procedure in other contexts, is continual use of incoming information (about satisfactions and dissatisfactions, changing technology, etc.) to modify activity levels. A well-formulated plan should respond automatically, specifying adjustments needed, as information accumulates. Since, in von Neumann's formulation, the environment is characterized by the utility assigned to different activity vectors, we can limit consideration to payoff-only plans. The fact that reproductive plans are payoff-only plans which can be proved near-optimal (in the sense defined above) for any set of utilities makes it likely that such plans can supply the responsiveness required here. In (𝒯, ℰ, X) terms the basic problems here, as in the genetics illustration, are the large size of 𝒜 coupled with nonlinearity and high dimensionality of μ_E. Because the concepts of chapters 4 and 5 are formulated in terms of the general framework, they apply here as readily as to genetics. The resulting techniques are specifically interpreted as optimization procedures throughout chapter 6, at the end of section 7.1, and throughout section 7.2.
Summarizing:
𝒜, the set of admissible activity vectors Q.
Ω, transformations of Q into itself.
𝒯, plans for selecting a program (c_t), where c_t is an activity vector in Q, on the basis of observed utilities {μ_E(c_{t'}), t' < t}, i.e., payoff-only plans.
ℰ, an indexing set of possible utility functions {μ_E: Q → ℝ, E ∈ ℰ}.
χ, typically a requirement that, for all utility functions μ_E, E ∈ ℰ, the limiting rate of accrual of a plan, lim_{T→∞} (U_τ(T)/T), equal that of the best possible program C*(E) in each E ∈ ℰ.
3. GAME-PLAYING
Lacking such knowledge [of machine-learning techniques], it is necessary to specify methods of problem solution in minute and exact detail, a time-consuming and costly procedure. Programming computers to learn from experience should eventually eliminate the need for much of this detailed programming effort.
Samuel in "Some Studies in Machine Learning Using the Game of Checkers," IBM J. Res. Dev. 3 (p. 211)
Most competitive games played by man (board games, card games, etc.) can be presented in terms of a tree of moves where each vertex (point, node) of the tree corresponds to a possible game configuration and each directed edge (arrow) leading from a given vertex represents a legal move of the game. The edge points to a new vertex corresponding to a configuration which can be attained from the given one in one move (turn, action); the options open to a player from a given configuration are thus indicated by the edges leading from the corresponding vertex. The tree has a single distinguished vertex with no edges leading into it, the initial vertex, and there are terminal vertices, having no edges leading from them, which designate outcomes of the game. In a typical two-person game which does not involve chance, the first player selects one of the options leading from the initial configuration; then the second player selects one of the options leading from the resulting configuration; the play of the game proceeds with the two players alternately selecting options. The result is a path from the initial vertex to some terminal vertex. The outcomes are ranked, usually by a payoff function which assigns a value to each terminal vertex.
In these terms, a pure strategy for a given player is an algorithm (program, procedure) which, for each nonterminal configuration, selects a particular option leading therefrom. Once each player chooses a pure strategy, the outcome of the game is completely determined, although in practice it is usually possible to determine this outcome only by actually playing the game. Thus, in a strictly determined (non-chance) two-person game, each pair of pure strategies (one for each player) can be assigned a unique payoff. The object of either player, then, is to find a strategy which does as well as possible against the opponent as measured by the expected payoff. This informal object ramifies into a whole series of cases, depending upon the initial information about the opponent and the form of the game.
One of the simplest cases occurs when it is known that the opponent, say the second player, has selected a single pure strategy for all future plays of the game.
[Game tree figure: vertices are game configurations; alternating levels of edges correspond to the plan's moves and the opponent's moves.]
The object of the first player, then, is to learn enough of the strategy chosen by the second player to find an opposing strategy which maximizes payoff. When the game tree involves only a finite number of vertices, as is often the case, it is at least theoretically possible to locate the maximizing strategy by enumerating and testing all strategies against the opponent. However, if there is an average of k options proceeding from each configuration, and if the average play involves m moves, there will be in excess of k^m pure strategies. The situation is quite comparable to the examples of enumeration given earlier. Even for a quite modest game with k = 10 and m = 20, and a machine which tests strategies at the exceptional rate of one every 10^{−9} second, it would require in excess of 10^{11} seconds, or about 30 centuries, to test all possibilities. Efficiency thus becomes the critical issue, and
interest centers on the discovery of plans which enable a player to do well while learning to do better. If plans are compared in terms of accumulated payoff, a criterion emerges analogous to the classical "gambler's ruin" of elementary probability. Let U_E(τ, t) be the payoff accumulated to time t by plan τ ∈ 𝒯 confronting the (unknown) pure strategy E ∈ ℰ, and require that
loss (negative of the payoff) the opponent can impose. It is interesting that often (checkers, chess, go) this minimax strategy is a pure strategy. Thus, although the payoff may vary on successive trials of the same strategy, the plan can still restrict its search to pure strategies in such cases. In more general situations, however, the plan will have to employ stochastic mixtures of pure strategies and, if it is to exploit its opponents maximally, it will even associate particular mixtures with particular kinds of opponents (assuming it is supplied with enough information to enable it to identify individual opponents).
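For the case just described, an opponent committed to a single pure strategy, the value of the best opposing strategy can be computed by a simple tree recursion. The sketch below is an added illustration under assumed conventions (a tree as nested dicts, terminal payoffs as numbers), not a construction from the text:

    # Sketch: value of the best response when the opponent's pure strategy is
    # fixed. A tree is either a terminal payoff (a number) or a dict mapping
    # each legal move to a subtree; levels alternate between the players.

    def best_response_value(tree, opponent_strategy, our_turn=True):
        if not isinstance(tree, dict):          # terminal vertex: payoff
            return tree
        if our_turn:                            # we pick the edge maximizing payoff
            return max(best_response_value(sub, opponent_strategy, False)
                       for sub in tree.values())
        move = opponent_strategy(tree)          # opponent's fixed choice here
        return best_response_value(tree[move], opponent_strategy, True)

    # Example: a two-level game; this opponent always picks the first listed move.
    game = {"L": {"l": 3, "r": 0}, "R": {"l": 1, "r": 5}}
    value = best_response_value(game, lambda node: next(iter(node)))  # -> 3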
Considered in the (𝒯, ℰ, χ) framework, the strategies become the elements of the domain of action 𝒜 and the plans for employing these strategies become elements of 𝒯. The set of admissible environments ℰ depends upon the particular case considered. If it is known that the opponent has chosen a single pure strategy, then the set of admissible environments ℰ is given by the set of pure strategies. The criterion χ for ranking the plans is then built up from the unique payoff determined by each pair of opposing pure strategies, the example given being the "gambler's ruin" criterion.
ℰ, the strategic options open to the opponent; in simple cases, the set of pure strategies.
χ, ranking of plans using the cumulative payoff functions, the "gambler's ruin" criterion being an example.
4. SEARCHES, PATTERN RECOGNITION, AND STATISTICAL INFERENCE
Searches occur as the principal element in most problem-solving and goal-attainment attempts, from maze-running through resource allocation to very complicated planning situations in business, government, and research. Games and searches have much in common and, from one viewpoint, a game is just a search (perturbed by opponents) in which the object is to find a winning position. The complementary viewpoint is that a search is just a game in which the moves are the transformations (choices, inferences) permissible in carrying out the search. Thus, this discussion of searches complements the previous discussion of games.
In complicated searches the attainable situations S are not given explicitly; instead some initial situation s₀ ∈ S (position in a maze, collection of facts, etc.) is specified and the searcher is given a repertory of transformations {ω} which can be applied (repeatedly) in carrying out the search. As in the case of games, a tree is a convenient abstract representation of the search. For searches, each edge corresponds to a possible transformation ω and the traverse of any path in the tree corresponds to the application of the associated sequence of transformations. The vertex at the end of a path extending from the initial vertex corresponds to the situation produced from the initial situation by the transformations associated with the path. The difficulty of solving a problem or attaining a goal is primarily a function of the size of the search tree and the cost of applying the transformations. In most cases of interest the trees are so vast that hope of tracing out all alternative paths must be abandoned. Somehow one must formulate a search plan which, over a wide range of searches, will act with sufficient efficiency to attain the goal or solve the problem.
A typical search plan (see Newell, Shaw, and Simon's [1959] GPS or Samuel's [1959] procedure) involves the following elements:
(i) An (ordered) set of feature detectors {δ_i: S → V_i, where V_i is the range of readings or outputs of the ith detector}. Typically, each detector is an algorithm which, when presented with a "scene" or situation, calculates a number; if the number is restricted to 0 or 1, it is convenient to think of the algorithm as detecting the presence or absence of a property (cf. the simple artificial adaptive system of section 1.3). The need for detectors arises from the overwhelming flow of information in most realistic situations; the intent is to filter out as much "irrelevant" information as possible.
[Fig. 5. A simple search setting: a maze with six choice points. At each choice point, 1 through 6, there is a sign associated with each of the 3 possible directions x, y, z. If the symbol "Λ" occurs at the top of a sign the associated corridor belongs to the shortest path from the entrance to the goal; on the other hand, if the symbol "V" occurs at the bottom of a sign the associated corridor is to be avoided. Either symbol may be dark on a light background or vice versa. Thus, reduced to a 4-by-4 array of sensors (see section 1.3), either of the configurations indicates the direction of the goal. Each experiment involves a set of signs indicating uniquely the shortest path to one of the three possible goals G1, G2, G3. In the terminology of section 3.4, the state at each choice point is given by the triple of signs there. That is,
S = {triples of 4-by-4 arrays},
{ψ₁, ψ₂, ψ₃} = {"follow direction x," "follow direction y," "follow direction z"}.]
"'
&. -
/
"
&, -
0 otherwise
""0
otherwise
1 if the array is dark and the upper half is darker
a. - la - &. -
/
,
.
0 Otherwise
Thusthearray
Illustrations
Six
Sly
Siz
~,
52z
at choicepoint2
~x
and so on (i .e., the shortest path is indicated by dark symbols on a light background) .
Setting II: The goal is at G3, the signs at choice point 1 are S1x, S1y, S1z, and at choice point 2 they are the same as in Setting I except for S2x. If the shortest path to the goal were always indicated as in Setting I, i.e., with dark symbols on a light background, then the function f'(s) = δ₃(s) (i.e., w₃ = 1, w₁ = w₂ = w₄ = 0) would always suffice for following the path. Notice, however, that in Setting II f' assigns exactly the same set of values (0, 1, 0) at point 1, indicating that f' does not distinguish the two settings. But, in Setting I f' assigns (1, 0, 0) at point 2, while in Setting II f' assigns (0, 0, 0) at point 2. Thus, starting from the same initial state (0, 1, 0) and invoking the same response ψ, f' arrives at two different states. Changing the weight assigned to δ₃ cannot correct the difficulty. This is a clear indication that the set of detectors (δ₃ in this case) is inadequate.
A quick check of the possibilities shows that consistently correct choices in the two settings can be achieved only by assigning a nonzero weight to δ₄, which is a nonlinear combination of δ₁ and δ₂. The function f''(s) = δ₁ + δ₂ − 2δ₄ then performs correctly in both settings and, in fact, performs consistently with any proper sequence of signs.
Fig. 7. Some searches using the devices of figure 6 in the settings of figure 5
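The inadequacy of a fixed set of detectors under any linear weighting, illustrated by Settings I and II, can be reproduced in miniature. The following sketch (added here; the truth table and names are illustrative assumptions, not the sign configurations of figures 5-7) shows a case where no weights w₁, w₂ on two detectors suffice, while a nonlinear product detector does:

    # Sketch: two detectors whose linear combinations cannot separate the four
    # situations below (the XOR pattern), and a nonlinear detector that can.

    situations = {(0, 0): "avoid", (0, 1): "follow",
                  (1, 0): "follow", (1, 1): "avoid"}

    def linear(d1, d2, w1, w2, threshold=0.5):
        return "follow" if w1 * d1 + w2 * d2 > threshold else "avoid"

    def with_product(d1, d2):
        d4 = d1 * d2                      # nonlinear combination of d1 and d2
        return "follow" if d1 + d2 - 2 * d4 > 0.5 else "avoid"

    # No (w1, w2) makes `linear` match the table on all four inputs,
    # but `with_product` reproduces it exactly:
    assert all(with_product(*s) == label for s, label in situations.items())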
"
"
(ii ) An evaluator. The evaluator calculatesan estimate of the distance
of any given situation from the goal, using the detector outputs (an
ordered set of real numbers) produced by that situation. The estimates
are supposedto take the costsof the transformations, etc., into account;
that is, the " distances" are usually weighted path lengths, where the
paths involved are (conjectured) sequencesof transformations leading
from given situations to the goal. The intent is to usetheseestimatesto
determine which transformations should be carried out next. An evaluation
is made of each of the situations which could be produced from
the current one by the application of allowed (simple sequencesof)
transformations, and then that (sequenceof) transformation(s) is executed
which leads to the new situation estimated to be " nearest" the
goal.
(iii) Error correction procedures. Before the search plan has been tried, the detectors and evaluator must be set up in more or less arbitrary fashion, using whatever information is at hand. The purpose of the error correction procedures is to improve the detectors and evaluators as the plan accumulates data. The shorter term problem is that of evaluator improvement. A typical procedure is to explore the search tree to some distance ahead of the current situation, either actually or by simulation, evaluating the situations encountered for their estimated distances from the goal. The evaluation of the situation estimated to be "nearest" the goal is then compared with the evaluation of the current situation and the evaluator is modified to make the estimates consistent. This "lookahead" procedure decreases the likelihood of contradictory distance estimates at different points on the same path. (A similar procedure can be carried out without lookahead using predictors to make predictions about future situations, subsequently modifying the predictors to bring predictions more in line with observed outcomes.) As a result, the consistency of the evaluator is improved with each successive evaluation. At the same time, in most searches, the difficulty of estimating the distance to the goal decreases as the goal is approached, becoming perfect when the lookahead actually encounters the goal. Thus increasing the consistency ultimately increases the relevance of the evaluator.
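The evaluator-improvement procedure of (iii) can be sketched as a consistency update on a weighted evaluator. The linear form, the unit transformation cost, and the learning rate below are assumptions for illustration only:

    # Sketch: make the evaluator's estimate for the current situation more
    # consistent with the best estimate found by looking ahead one step.

    def evaluate(weights, detector_readings):
        # Linear evaluator: estimated "distance to goal" from detector outputs.
        return sum(w * v for w, v in zip(weights, detector_readings))

    def lookahead_update(weights, current, successors, rate=0.1):
        # current, successors: detector-reading tuples for the situations.
        best = min(evaluate(weights, s) for s in successors)  # nearest-to-goal estimate
        error = evaluate(weights, current) - (best + 1)       # +1: one step's assumed cost
        return [w - rate * error * v for w, v in zip(weights, current)]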
There is, however, a caveat. If the set of detectors is inadequate, for whatever reason, the improvement of the evaluator will be blocked. This raises the broad issue of pattern recognition, for the set of detectors is, of course, meant to enable the plan to recognize critical features for goal-attainment. The plan must be able to classify each situation encountered according to the goal-directed transformation which should be applied to it. The long-term problem is that of determining whether the set of detectors is adequate to this task. Important shortcomings are indicated when, from application of identical transformations to situations classed as equivalent by the detectors, situations with critically different evaluations result. When this happens, the detectors have clearly failed to distinguish some feature which makes a critical difference as far as the transformations are concerned. The object, then, is to generate a detector which gives different readings for the previously indistinguishable situations. Among the obvious candidates are modifications of the detectors which made the distinctions after the transformations were applied. Usually simple modifications will enable such detectors to make the distinction before the transformation as well as after.
We can look at this whole problem in another way, a way which makes contact with standard definitions in the theory of probability. Assume that the search plan assigns to each transformation ω a probability dependent upon the observed situation. That is, if s_α is the current situation, then each situation s_β ∈ S can be assigned a conditional probability of occurrence p_{αβ}, where p_{αβ} is simply the sum of the probabilities of all transformations leading from s_α to s_β. (It may, of course, be that there are no transformations of s_α to s_β, in which case p_{αβ} = 0.) A sequence of trials performed according to the probabilities p_{αβ} is a Markov chain, the outcome of each trial being a random variable (dependent upon the outcome of prior trials). The sample space underlying this random variable is the set of situations S. Let us assign a measure of utility or relevance to each of these situations. (For example, goals could be assigned utility 1 and all other situations utility 0, or some more complicated assignment ranking goals and intermediate situations could be used.) Then, formally, the function W making this assignment is also a random variable. Accordingly, we can assign an expected utility to the random variable representing the outcome of each trial in the Markov chain. In these terms, the plan continually redefines the Markov chain (by changing the transformation probabilities). It attempts in this way to increase the average (over time) of the expected values of the sequence of random variables corresponding to its trials.
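The Markov-chain view can be sketched directly; the dictionary representation and the simple reinforcement rule below are illustrative assumptions, not the text's procedure:

    # Sketch: the plan as a redefinable Markov chain. Transition probabilities
    # out of the current situation determine the expected utility of the next
    # trial; the plan shifts probability toward transformations that did well.

    def expected_utility(p_next, utility):
        # p_next: {situation: probability}; utility: {situation: utility value}.
        return sum(p * utility[s] for s, p in p_next.items())

    def reinforce(p_next, outcome, step=0.1):
        # Increase the probability of the observed outcome, shrinking the rest
        # proportionally so the distribution stays normalized.
        p = {s: pr * (1 - step) for s, pr in p_next.items()}
        p[outcome] = p.get(outcome, 0.0) + step
        return p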
The role of detectors here is, as already suggested, reduction of the size of the sample space and simplification of the search. More formally, consider a set of n detectors (not necessarily all those available), H = {δ₁, . . . , δ_n}, where H is arbitrarily ordered. The detectors in H assign to each s ∈ S an n-tuple of readings (v₁, . . . , v_n) belonging to the direct product ∏_{i=1}^{n} V_i.
5. CONTROL AND FUNCTION OPTIMIZATION
The fact that we need time to determine the minimum of the [performance] functional or the optimal [control] vector c* is sad, but unavoidable - it is a cost that we have to pay in order to solve a complex problem in the presence of uncertainty. . . . adaptation and learning are characterized by a sequential gathering of information and the usage of current information to eliminate the uncertainty created by insufficient a priori information.
Tsypkin in Adaptation and Learning in Automatic Systems (p. 69)
In the usual version, a controlled process is defined in terms of a set of variables {x₁, . . . , x_n} which are to be controlled. (For example, a simple process of air conditioning may involve three critical variables: temperature, humidity, and air flow.) The set of states or the phase space for the process, X, is the set of all possible combinations of values for these variables. (Thus, for an air conditioning process the phase space would be a 3-dimensional space of all triples of real numbers (x₁, x₂, x₃) where the temperature x₁ in degrees centigrade might have a range 0 ≤ x₁ ≤ 50, etc.) Permissible changes or transitions in phase space are determined as a function of the state variable itself and a set of control parameters C. Typically X is a region in n-dimensional Euclidean space and the control parameters assume values in a region C of an m-dimensional space. Accordingly, the equation takes the form of a "law of motion" in the space X,
dx/dt = f(x(t), C(t)),
where
that each A ∈ 𝒜 be assigned a unique cost μ_E(A). To satisfy this requirement we can let 𝒜 = 𝒜₁ × 𝔑 where 𝔑 is the set of natural numbers {1, 2, 3, . . .}. Then unique elements of 𝒜, namely (A₁, t₁), (A₁, t₂), . . . , (A₁, t_k), correspond to the successive trials of A₁ and the cost Q(t_i) of trial t_i can be assigned as required,
μ_E(A) = μ_E((A₁, t_i)) = Q(t_i).
An adaptive plan τ will modify the policy at intervals on the basis of observed costs. With the definition of 𝒜 just given this means that, if A₁ is tried at time t and is to be retained for trial at time t + 1,
τ(I(t), 𝒜(t)) = τ(I(t), (A₁, t)) = (A₁, t + 1).
6. CENTRAL NERVOUS SYSTEMS
Behavior is primarily adaptation to the environment under sensory guidance. It takes the organism away from harmful events and toward favorable ones, or introduces changes in the immediate environment that make survival more likely.
Hebb in A Textbook of Psychology (pp. 44-45)
I introduce this last example of an adaptive system with some hesitation. Not because the central nervous system (CNS hereafter) lacks qualifications as an adaptive system - on the contrary, this complex system exhibits a combination of breadth, flexibility, and rapidity of response unmatched by any other system known to man - but because there is so little prior mathematical theory aimed at explaining adaptive aspects of the CNS. Even an intuitive understanding of the relation between physiological micro-data and behavioral macro-data is only sporadically available. Perforce, mathematical theories enabling us to see some overall action of the CNS as a consequence of the actions and interactions of its parts are, when available at all, in their earliest formative stages.
Here, more than with the other examples, the initial advantage of the formal framework will be restatement of the familiar in a broader context. The best that can be hoped for at this stage is an occasional suggestion of new consequences of familiar facts: Without the advantages of a deductive theory, statements made within the framework can do little more than provide an experimenter with guideposts and cautions, suggesting possibilities and impossibilities, phenomena to anticipate, and conclusions to be accepted warily. This is a preliminary, heuristic stage marking the transition from unmathematical plausibility to the formal deductions of mathematical theory. In common with most heuristic and loose-textured
The discussion which follows will be based upon the informal theory of CNS action introduced by Hebb in his signal 1949 work and subsequently importantly enlarged by P. M. Milner (1957) and I. J. Good (1965). I will attempt a brief recapitulation of some of the main assumptions here, with the intention of orienting the reader having some knowledge of the area. This is only one view of CNS processes and the presentation has been kept deliberately simplistic. (E.g., a more sophisticated theory would take account of substantial evidence for distinct physiological mechanisms underlying short-term, medium-term, and long-term memory.) The object is to indicate, as simply as possible in this context, the relevance of the (𝒯, ℰ, χ) framework to understanding the CNS as a means of "adaptation to the environment under sensory guidance." The reader without a relevant background can gain a significant understanding by reading the papers of Milner and Good; a reading of Hebb's excellent textbook (1958) will give a much more comprehensive view.
The basic element of Hebb's theory is the cell assembly. It is assumed to exhibit the following essential characteristics. (Comments in parentheses in the presentation of characteristics refer to possible neurophysiological mechanisms):
creases the likelihood of activity in all assemblies with which it is negatively associated. Positive association between a pair of cell assemblies increases whenever they are active at the same time. Negative association is asymmetrical in that one cell assembly may be negatively associated with a second, while the second is not necessarily negatively associated with the first; this negative association increases each time the first assembly is active and the second is inactive. (The underlying neural assumption here is that, if neuron n₂ produces a pulse immediately after it receives a pulse from neuron n₁, then n₁ is better able to elicit a pulse from n₂ in the future; contrariwise, if n₂ produces no pulse upon receiving a pulse from n₁, then n₁ is more likely to inhibit n₂ in the future. It is usually assumed that this process is the result of changing synapse levels. The same process can be invoked in explaining the origin of cell assemblies.) It should be noted that, under this assumption, there is a tendency for cell assemblies to become active in fixed combinations, at the same time actively suppressing alternative combinations. (Because a cell assembly involves only a minute fraction of the neurons in a CNS, a great many can be excited at any instant, different configurations corresponding to different perceived objects, etc.) Temporal association (i.e., probable action sequences) can occur via appropriate asymmetries; e.g., assembly α can arouse β via positive association while β inhibits α through negative association. Thus the action sequence is always αβ, never the reverse.
4. At any instant the response of the CNS to sensory input is determined by the configuration of active cell assemblies. (Overt behavior, such as release of voluntary muscle sequences, activation of reflexes, eye movement, and so on, will accompany most sensory events. Via the mechanisms of (3), neurons involved in this behavior will tend to become components of cell assemblies active at the same time. Since pulse trains from the active cell assemblies dominate overall CNS activity, overt behavior will thus be determined by the active configurations. In effect, the sensory input modulates the ongoing activity in the CNS to produce overt behavior.)
5. Cell assemblies involved in temporal sequences yielding "need satisfaction" (satisfaction of hunger, thirst, etc.) have their associations enhanced; the greater the "need," the greater the enhancement. ("Needs" are internal conditions in the CNS-controlled organism, conditions primarily concerned with survival, which set basic restrictions on CNS
illustration, and it might be possible to use the framework more precisely in this context (especially for animals in the wild state). Some suggestions for bringing cell assembly theory within the range of the (𝒯, ℰ, χ) framework are made in section 8.4. There is much to be done before we can hope for definite, general results from theory.
Summarizing:
𝒜, repertory of possible cell assemblies.
Ω, possible association rules (Hebb's rule for synapse change, short-term memory rules, etc.).
𝒯, possible (or hypothetical) organizations of the CNS in terms of conditions under which the rules of Ω are to operate.
ℰ, the range of environments in which the CNS being studied is expected to operate (relevant features, cues, etc.).
χ, the ranking of organizations in 𝒯 according to performance over ℰ, for example, according to ability to keep average needs low under any situation in ℰ (cf. the optimal control illustration).
These illustrations are intended to demonstrate the broad applicability of the (𝒯, ℰ, χ) framework. They can also serve to demonstrate something else. The obstacles described informally at the end of section 1.2 do indeed appear as central problems in each of the fields examined. This is an additional augury for making a unified approach to adaptation - common problems should have common solutions. Much of the work that follows is directed to the resolution of these general problems. In section 9.1 the problems are listed again, more formally, and the relevance of this work to their resolution is recapitulated.
4. Schemata
nate the set of alleles of locus i (see section 3.1) and the corresponding representation of A is the specification of the ordered set of alleles which make up the chromosome. For a von Neumann economy (section 3.2) the V_i can designate the possible levels of the ith activity so that the representation of a mixture of activities is simply the corresponding activity vector. Similar considerations apply to each of the remaining illustrations of chapter 3. Clearly, with a given set of l detectors, two structures will be distinguishable only insofar as they have distinct representations. Since, in the present chapter, we are only interested in comparisons let us assume that all structures in 𝒜 are distinguishable (have distinct representations) or, equivalently, that 𝒜 is used to designate distinguishable subsets of the original set of structures. For simplicity in what follows 𝒜 will simply be taken to be the set of representations provided by the detectors (rather than the abstract elements so represented).
[Diagram: sets of the form {A such that δ_i(A) = v_i} intersect to give the subsets of 𝒜 designated by schemata such as v₁ □ . . . □, □ v₂ □ . . . □, and v₁ v₂ □ . . . □.]
μ̂'(T) = (Σ_{t=1}^{T} c_t μ(A(t)))/(Σ_{t=1}^{T} c_t), with c_t > c_{t'} for t > t', but the simple average suffices for the present discussion.) Though μ̂(T) can be incremented by simply repeating the structure yielding the best performance up to time T, this does not yield new information. Hence the object is to find new structures which have a high probability of incrementing μ̂(T) significantly. An adaptive plan can use schemata to this end as follows: Let A ∈ 𝒜 have a probability P(A) of being tried by the plan τ at time T + 1. That is, τ induces a probability distribution P over 𝒜 and, under this distribution, 𝒜 becomes a sample space. The performance measure μ then becomes a random variable over 𝒜, A ∈ 𝒜 being tried with probability P(A) and yielding payoff μ(A). More importantly, any schema ξ ∈ Ξ designates an event on the sample space 𝒜. Thus, the restriction μ|ξ of μ to
[Fig. 10. Some schemata for a one-dimensional function, e.g., 1 0 0 . . . 0 and □ □ 0 0 . . . 0.]
of f for relatively few x will enable f_ξ to be estimated for a great many ξ ∈ Ξ. Even a sequence of four observations, say x(1) = .0100010 . . . 0, x(2) = .110100 . . . 0, x(3) = .100010 . . . 0, x(4) = .1111010 . . . 0, enables one to calculate three-point estimates for many schemata, e.g. (assuming all points are equally likely or equally weighted), f̂ for the schema 1 □ □ . . . □ as (f(x(2)) + f(x(3)) + f(x(4)))/3, and two-point estimates for even more schemata, e.g., f̂ for 1 1 □ . . . □ as (f(x(2)) + f(x(4)))/2.
The picture is not much changed if f is a function of many variables x₁, . . . , x_d. Using binary representations again, we now have 20d detectors (assuming the same accuracy as before), 3^{20d} schemata, and each point is an instance of 2^{20d} schemata. In the one-dimensional case the representation transformed the problem to one of sampling in a 20-dimensional space - already a space of high dimensionality - so the increase to a 20d-dimensional space really involves no significant conceptual changes. Interestingly, each point (x₁, . . . , x_d) is now an instance of 2^{20d} schemata rather than 2^{20} schemata, an exponential (dth power) increase. Thus, for a given number of points tried, we can expect an exponential (dth power) increase in the number of schemata for which f_ξ can be estimated with a given confidence. As a consequence, if the information about the schemata can be stored and used to generate relevant new trials, high dimensionality of the argument space {0 ≤ x_j < 1, j = 1, . . . , d} imposes no particular barrier.
It is also interesting in this context to compare two different representations for the same underlying space. Six detectors with a range of 10 values can yield approximately the same number of distinct representations as 20 detectors with a range of 2 values, since 10⁶ ≈ 2²⁰ = 1.05 × 10⁶ (cf. decimal encoding vs. binary encoding). However the numbers of schemata in the two cases are vastly different: 11⁶ = 1.77 × 10⁶ vs. 3²⁰ = 3.48 × 10⁹. Moreover in the first case each A ∈ 𝒜 is an instance of only 2⁶ = 64 schemata, whereas in the second case each A ∈ 𝒜 is an instance of 2²⁰ = 1.05 × 10⁶ schemata. This suggests that, for adaptive plans which can use the increased information flow (such as the reproductive plans), many detectors deciding among few attributes are preferable to few detectors with a range of many attributes. In genetics this would correspond to chromosomes with many loci and few alleles per locus (the usual case) rather than few loci and many alleles per locus.
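These counts are easy to reproduce; the following sketch (a small illustrative program added here, not part of the original text) computes them for n detectors with k attributes each:

    # Sketch: representations vs. schemata for n detectors with k attributes each.
    # A representation fixes one of k values at each of n positions (k**n total);
    # a schema may also leave a position unspecified (k+1 choices, (k+1)**n
    # total), and a given representation instances 2**n schemata (each position
    # either kept or replaced by the "don't care" symbol).

    def counts(k, n):
        return {"representations": k ** n,
                "schemata": (k + 1) ** n,
                "schemata_per_point": 2 ** n}

    print(counts(10, 6))   # six decimal detectors:   10**6, 11**6, 2**6
    print(counts(2, 20))   # twenty binary detectors: 2**20, 3**20, 2**20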
Returning to the view of schemata as random variables, it is instructive to determine how many schemata receive at least some given number n < N of trials when N elements of 𝒜 are selected at random. This will give us a better idea of the intrinsic parallelism wherein a sequence of trials drawn from 𝒜 is at the same time a (usually shorter) sequence of trials for each of a large number of schemata
Values for this bound can be obtained from standard tables for the Poisson distribution, but the following representative cases will give some feeling for the numbers involved. Setting k = 2 and l = 32 (so that 𝒜 contains 2³² ≈ 4.3 × 10⁹ distinct elements) we get:
[Table: representative values of the expected number of schemata receiving at least n_ξ of N trials; the entries are not recoverable in this copy.]
5. The Optimal Allocation of Trials
trials of ξ' that all of the trials have had outcomes exceeding μ_ξ. A fortiori their average μ̂_{ξ'} will exceed μ_ξ with probability at least p^n, even though μ_{ξ'} < μ_ξ.) Here the tradeoff between gathering information and exploiting it appears in its simplest terms. To see it in exact form let ξ^(1)(N) name the random variable with the highest observed payoff rate (average per trial) after N trials and let ξ^(2)(N) name the other random variable. For any number of trials n, 0 ≤ n ≤ N, allocated to ξ^(2)(N) (and assuming overlapping distributions) there is a positive probability, q(N−n, n), that ξ^(2)(N) is actually the random variable with the highest mean, max{μ_ξ, μ_{ξ'}}. The two possible sources of loss are: (1) The observed best ξ^(1)(N) is really second best, whence the N−n trials given ξ^(1)(N) incur an (expected) cumulative loss (N−n)·|μ_ξ − μ_{ξ'}|; this occurs with probability q(N−n, n). (2) The observed best is in fact the best, whence the n trials given ξ^(2)(N) incur a loss n·|μ_ξ − μ_{ξ'}|; this occurs with probability (1 − q(N−n, n)). The total expected loss for any allocation of n trials to ξ^(2) and N−n trials to ξ^(1) is thus
L(N−n, n) = [q(N−n, n)·(N−n) + (1 − q(N−n, n))·n]·|μ_ξ − μ_{ξ'}|.
We shall soon see that, for n not too large, the first source of loss decreases as n increases because both N−n and q(N−n, n) decrease. At the same time the second source of loss increases. By making a tradeoff between the first and second sources of loss, then, it is possible to find for each N a value n*(N) for which the losses are minimized; i.e.,
L(N−n*, n*) ≤ L(N−n, n) for all n ≤ N.
where b = σ₁/(μ₁ − μ₂). If, initially, one random variable is as likely as the other to be best, the expected loss per trial is
L*(N) ∼ (b²(μ₁ − μ₂)/N)·[2 + ln(N²/(8π b⁴ ln N²))].
(Given two arbitrary functions, Y(t) and Z(t), of the same variable t, "Y(t) ∼ Z(t)" will be used to mean lim_{t→∞}(Y(t)/Z(t)) = 1 while "Y(t) ≃ Z(t)" means that under stated conditions the difference Y(t) − Z(t) is negligible.)
Proof: In order to select an n which minimizes the expected loss, it is necessary first to write q(N−n, n) as an explicit function of n. As defined above q(N−n, n) is the probability that ξ^(2)(N) is, say, ξ₁, the variable with the higher mean. More carefully, given the observation ξ' = ξ^(2)(N), we wish to determine the probability that ξ' = ξ₁. That is, we wish to determine
q(N−n, n) = Pr{ξ' = ξ₁ | ξ^(2) = ξ'}
as an explicit function of N−n and n. Bayes's theorem then gives us the equation
Pr{ξ' = ξ₁ | ξ^(2) = ξ'} = Pr{ξ^(2) = ξ' | ξ' = ξ₁}·Pr{ξ' = ξ₁} / [Pr{ξ^(2) = ξ' | ξ' = ξ₁}·Pr{ξ' = ξ₁} + Pr{ξ^(2) = ξ' | ξ' = ξ₂}·Pr{ξ' = ξ₂}].
Letting q', q'', and p designate Pr{ξ' = ξ^(2) | ξ' = ξ₁}, Pr{ξ' = ξ^(2) | ξ' = ξ₂}, and Pr{ξ' = ξ₁}, respectively, and using the fact that ξ' must be ξ₂ if it is not ξ₁, this can be rewritten as
q(N−n, n) = q'p/(q'p + q''(1 − p)).
Each of these probabilities can be approximated, via the central limit theorem, in terms of a canonical normal distribution Φ(x), where
x = [ȳ − (μ₁ − μ₂)] / √(σ₁²/n + σ₂²/(N−n))
and the tail satisfies
Φ(−x) = 1 − Φ(x) ≃ (1/(x√(2π))) e^{−x²/2}.
Thus, writing
x₀ = (μ₁ − μ₂)/√(σ₁²/n + σ₂²/(N−n)),
we have
q' ≲ (1/(√(2π) x₀)) exp(−x₀²/2)
and, by the same argument with the roles of the two random variables interchanged,
q'' ≳ 1 − (1/(√(2π) x₀')) exp(−(x₀')²/2), where x₀' = (μ₁ − μ₂)/√(σ₂²/n + σ₁²/(N−n)).
From this we see that both q' and q'' are functions of the variances and means as well as the total number of trials, N, and the number of trials, n, given ξ'. More importantly, both q' and 1 − q'' decrease exponentially with n. For p = ½ this reduces to
q(N−n, n) ≃ q',
with the approximation being quite good even for relatively small n; the error is less than min{(q')², (1 − q'')²}. (If one random variable is a priori more likely than the other to be best, i.e., if p ≠ ½, then we can see from the above and from what follows that fewer trials can be allocated to attain the same reduction of q(N−n, n). The expected loss is reduced accordingly.)
[Figure: the normal distribution of the difference of the observed averages, centered at μ₁ − μ₂; the tail area below 0 gives the probability of error.]
The observation that q', and hence q(N−n, n), decreases exponentially with n makes it clear that, to minimize loss as N increases, the number of trials allocated the observed best, N−n, should be increased dramatically relative to n. This observation (which will be verified in detail shortly) enables us to simplify the expression for x₀. Whatever the value of σ₂, there will be an N₀ such that, for any N > N₀, σ₂²/(N−n) ≪ σ₁²/n, for n close to its optimal value. (In most cases of interest this occurs even for small numbers of trials since, usually, σ₂ is at worst an order of magnitude or two larger than σ₁.) Using this we see that, for n close to its optimal value,
x₀ ≃ ((μ₁ − μ₂)/σ₁)·√n, N > N₀.
To locate n*, set the derivative of the expected loss to zero, where
dL/dn = |μ₁ − μ₂|·[(1 − 2q) + (N − 2n)·(dq/dn)].
Since q ≃ (1/(√(2π) x₀)) e^{−x₀²/2} and dx₀/dn = x₀/(2n),
dq/dn ≃ −(1/√(2π)) e^{−x₀²/2}·(1 + 1/x₀²)·(dx₀/dn) = −q·(x₀² + 1)/(2n).
Thus the condition dL/dn = 0 becomes
0 ≃ (1 − 2q) − ((N − 2n)/(2n))·q·(x₀² + 1),
i.e.,
(N − 2n)/(2n) = (1 − 2q)/(q·(x₀² + 1)).
Noting that 1/(x₀² + 1) ∼ 1/x₀² and that (1 − 2q) rapidly approaches 1 because q decreases exponentially with n, we see that (N − 2n)/(2n) ≃ 1/(q·x₀²), where the error rapidly approaches zero as N increases. Thus the observation of the preceding paragraph is verified, the ratio of trials of the observed best to trials of the second best growing exponentially.
Finally, to obtain n* as an explicit function of N, q must be written in terms of n*:
(N − 2n*)/(2√n*) ≃ (√(2π) σ₁/(μ₁ − μ₂))·exp[((μ₁ − μ₂)²·n*)/(2σ₁²)].
Introducing b = σ₁/(μ₁ − μ₂) and N₁ = N − 2n* for simplification, we obtain
N₁ ≃ √(8π)·b·exp[(b^{−2}·n* + ln n*)/2],
whence
n* + b² ln n* ≃ 2b²·ln(N₁/(√(8π)·b)),
where the fact that (N − 2n*) ∼ (N − n*) has been used, with the inequality generally holding as soon as N₁ exceeds n* by a small integer. We obtain a recursion for an ever better approximation to n* as a function of N₁ by rewriting this as
n* ≃ b²·ln(N₁²/(8π b² n*)).
Whence
n* ≃ b²·ln[N₁²/(8π b⁴ ln(N₁²/(8π b² n*)))] ≃ b²·ln[N₁²/(8π b⁴ ln N₁²)],
where, again, the error rapidly approaches zero as N increases. Finally, where it is desirable to have n* approximated by an explicit function of N, the steps here can be redone in terms of N/n*, noting that N₁/n* rapidly approaches N/n* as N increases. Then
n* ≃ b²·ln[N²/(8π b⁴ ln N²)].
The minimal expected loss per trial is then
L*(N) = |μ₁ − μ₂|·[(N − n*)·q(N − n*, n*) + n*·(1 − q(N − n*, n*))]/N = |μ₁ − μ₂|·[(N − 2n*)·q(N − n*, n*) + n*]/N ≃ (b²(μ₁ − μ₂)/N)·[2 + ln(N₁²/(8π b⁴ ln N₁²))].
Q.E.D.
Moreover,
N^(1) = N − n* ∼ N ∼ √(8π b⁴ ln N²)·e^{n*/2b²}.
Thus the loss rate will be optimally reduced if the number of trials allocated ξ^(1) grows slightly faster than an exponential function of the number of trials allocated ξ^(2). This is true regardless of the form of the distributions defining ξ and ξ'. Later we will see that the random variables defined by schemata are similarly treated by reproductive plans.
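Under the normal approximations used in the proof, the closed form for n* is easy to evaluate and to compare against direct minimization of L(N−n, n). The sketch below is an added illustration; the parameter values and the use of the simplified x₀ are assumptions:

    import math

    # Sketch: optimal allocation for two options under the normal approximation,
    # q(N-n, n) ~ Phi(-x0) with x0 = ((mu1 - mu2)/sigma1) * sqrt(n), as in the
    # simplified expression above (sigma2's contribution neglected).

    def phi_tail(x):
        return 0.5 * math.erfc(x / math.sqrt(2.0))  # Pr{Z > x}, Z standard normal

    def expected_loss(N, n, mu1, mu2, sigma1):
        x0 = ((mu1 - mu2) / sigma1) * math.sqrt(n)
        q = phi_tail(x0)
        return (q * (N - n) + (1 - q) * n) * (mu1 - mu2)

    def n_star_closed_form(N, mu1, mu2, sigma1):
        b = sigma1 / (mu1 - mu2)
        return b * b * math.log(N * N / (8 * math.pi * b**4 * math.log(N * N)))

    N, mu1, mu2, sigma1 = 10000, 1.0, 0.9, 1.0
    brute = min(range(1, N // 2),
                key=lambda n: expected_loss(N, n, mu1, mu2, sigma1))
    # The asymptotic closed form and the brute-force minimum are of
    # comparable magnitude for moderate N.
    print(brute, n_star_closed_form(N, mu1, mu2, sigma1))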
It should be emphasized that the above approximation for n* will be misleading for small N when
(i) μ₁ − μ₂ is small enough that, for small N, the standard deviation of the difference of the observed averages is large relative to μ₁ − μ₂ and, as a consequence, the approximation for the tail 1 − Φ(x₀) fails, or
(ii) σ₂ is large relative to σ₁ so that, for small N, the approximation for x₀ is inadequate.
Neither of these cases is important for our objectives here. The first is unimportant because the cumulative losses will be small until N is large, since the cost of trying ξ₂ is just μ₁ − μ₂. The second is unimportant because the uncertainty, and therefore the expected loss, depends primarily on σ₁ until N − n* is large; hence the expected loss rate will be reduced near optimally as long as N − n* ≃ N (i.e., most trials go to ξ^(1)), as will be the case if n is at least as small as the value given by the approximation for n*.
Finally, to get some idea of n* when σ₁ is not known, note that for bounded payoff, μ: 𝒜 → [r₀, r₁], the maximum variance occurs when all payoff is concentrated at the extremes, i.e., p(r₀) = p(r₁) = ½. Then
σ²_max = ½r₁² + ½r₀² − ((r₁ + r₀)/2)² = ((r₁ − r₀)/2)².
2. REALIZATION OF MINIMAL LOSSES
This section points out, and resolves, a difficulty in using L*(N) as a performance criterion. The difficulty occurs because, in a strict sense, the minimal expected loss rate just calculated cannot be obtained by any feasible plan for allocating trials in terms of observations. As such L*(N) constitutes an unattainable lower bound and, if it is too far below what can be attained, it will not be a useful criterion. However, we will see here that such loss rates can be approached quite closely (arbitrarily closely as N increases) by feasible plans, thus verifying L*(N)'s usefulness.
The source of the difficulty lies in the expression for n*, which was obtained on the assumption that the n* trials were allocated to ξ^(2)(N). However there is no realizable plan (sequential algorithm) which can "foresee" in all cases which of the two random variables will be ξ^(2)(N) at the end of N trials. No matter what the plan τ, there will be some observational sequences for which it allocates n > n* trials to a random variable ξ (on the assumption that ξ will be ξ^(1)(N)) only to have ξ turn out to be ξ^(2)(N) after N trials. (For example, the observational sequence may be such that at the end of 2n* trials τ has allocated n* trials to each random variable. τ must then decide where to allocate the next trial even though each random variable has a positive probability of being ξ^(2)(N).) For these sequences the loss rate will perforce exceed the optimum. Hence L*(N) is not attainable by any realizable τ: there will always be payoff sequences which lead τ to allocate too many trials to ξ^(2)(N).
There is, however, a realizable plan τ* for which the expected loss per trial L(τ*, N) quickly approaches L*(N), i.e.,
Proof: The expected loss per trial L(τ*, N) for τ* is determined by applying the earlier discussion of sources of loss to the present case:
L(τ*, N) = (1/N)·(μ₁ − μ₂)·[(N − n*)·q(n*, n*) + n*·(1 − q(n*, n*))],
where q is the same function as before, but here the probability of error is irrevocably determined after 2n* trials. That is,
q(n*, n*) ≃ (√(σ₁²/n* + σ₂²/n*)/(√(2π)(μ₁ − μ₂)))·exp[−(μ₁ − μ₂)²/(2(σ₁²/n* + σ₂²/n*))].
Rewriting L(τ*, N) we have
L(τ*, N) = (μ₁ − μ₂)·[(N − 2n*)·q(n*, n*) + n*]/N.
Q.E.D.
Thus there exist plans which have loss rates closely approximating L*(N) as N increases.
3. MANY OPTIONS
The function L*(N) sets a very stringent criterion when there are two uncertain options, specifying a high goal which can only be approached where uncertainty is very limited. Adaptive plans, however, considered in terms of testing schemata, face many more than two uncertain options at any given time. Thus a general performance criterion for adaptive plans must treat loss rates for arbitrary numbers of options. Though the extension from two options to an arbitrary number r of options is conceptually straightforward, the actual derivation of L*(N) is considerably more intricate. The derivation proceeds by indexing the r random variables ξ₁, ξ₂, . . . , ξ_r so that the means are in decreasing order μ₁ > μ₂ > . . . > μ_r (again, without the observer knowing that this ordering holds).
THEOREM 5.3: Under the same conditions as for Theorem 5.1, but now with r random variables, the minimum expected loss after N trials must exceed
(μ₁ − μ₂)·(r − 1)·b²·[2 + ln(N²/(8π(r − 1)² b⁴ ln N²))],
where b = σ₁/(μ₁ − μ_r).
Proof: Following Theorem 5.1 we are interested in the probability that the average of the observations of any ξ_i, i > 1, exceeds the average for ξ₁. This probability of error is accordingly
q(n₂, . . . , n_r) = Pr{(ξ̄₂ > ξ̄₁) or (ξ̄₃ > ξ̄₁) or . . . or (ξ̄_r > ξ̄₁)},
where n_i is the number of trials given ξ_i, and the loss ranges from (μ₁ − μ₂) to (μ₁ − μ_r) depending on which ξ_i is mistakenly taken for best.
Let n = Σ_{i=2}^{r} n_i, let m = min{n₂, n₃, . . . , n_r}, and let j be the largest index of the random variables (if more than one) receiving m trials.
The proof of Theorem 5.1 shows that a lower bound on the expected loss is attained by minimizing with respect to any lower bound on the probability q (a point which will be verified in detail for r variables). In the present case q must exceed q', where
q' ≃ (σ₁/(√(2π)(μ₁ − μ_r)√m))·exp[−(μ₁ − μ_r)²·m/(2σ₁²)],
using the fact that (μ₁ − μ_r) ≥ (μ₁ − μ_j) for any j > 1. By the definition of q
L_{N,r}(n) > L'_{N,r}(n) = (μ₁ − μ₂)·[(N − n)·q + n·(1 − q)],
and, setting the derivative to zero as before,
(μ₁ − μ₂)·[(N − 2n)·(dq/dn) + (1 − 2q)] = 0.
Solving this for n* and noting that 1 − 2q rapidly approaches 1 as N increases gives
n* ∼ N/2 + ½·(dq/dn)^{−1}.
Noting that q' must decrease less rapidly than q with increasing n, we have |dq'/dn| < |dq/dn| and, taking into account the negative sign of the derivatives,
n* > N/2 + ½·(dq'/dn)^{−1}.
(This verifies the observation at the outset, since the expected loss approaches n*·(μ₁ − μ₂) as N increases - see below.) Finally, noting that n > (r − 1)·m, we can proceed as in the two-variable case by using (r − 1)·m in place of n and taking the derivative of q' with respect to m instead of n. The result is
n* > (r − 1)·b²·ln(N²/(8π(r − 1)² b⁴ ln N²)),
whence the minimum expected loss after N trials must exceed
(μ₁ − μ₂)·(r − 1)·b²·[2 + ln(N²/(8π(r − 1)² b⁴ ln N²))].
Q.E.D.
4. APPLICATION TO SCHEMATA
We are ready now to apply the criterion just developed to the general problem of ranking schemata. The basic problem was rephrased as one of minimizing the performance losses inevitably coupled with any attempt to increase confidence in an observed ranking of schemata. The theorem just proved provides a guideline for solving this problem by indicating how trials should be allocated among the schemata of interest.
To see this note first that the central limit theorem, used at the heart of the proof of Theorem 5.3, applies to any sequence of independent random variables having means and variances. As such it applies to the observed average payoff μ̂_ξ of a sequence of trials of the schema ξ under any probability distribution P over 𝒜 (cf. chapter 4). It even applies when the distribution over 𝒜 changes with time (a fact we will take advantage of with reproductive plans). In particular, then, Theorem 5.3 applies to any given set of r schemata. It indicates that under a good adaptive plan the number of trials of the (observed) best will increase exponentially relative to the total number of trials allocated to the remainder.
Near the end of chapter 4 it was proposed that the observed performance rankings of schemata be stored by selecting an appropriate (small) set of elements ℬ from 𝒜 so that the rank of each schema ξ would be indicated by the relative number of instances of ξ in ℬ. Theorem 5.3 suggests an approach to developing ℬ, or rather a sequence ℬ(1), ℬ(2), . . . , ℬ(t), according to the sequence of observations of schemata. Let the number of instances of ξ in the set ℬ(t) represent the number of observations of ξ at time t. Then the number of instances of ξ in the set ∪_{t=1}^{T} ℬ(t) represents the total number of observations of ξ through time T. If schema ξ should persist as the observed best, Theorem 5.3 indicates that ξ's portion of ∪_{t=1}^{T} ℬ(t) should increase exponentially with respect to the remainder. We can look at this in a more "instantaneous" sense. ξ's portion of ℬ(t) corresponds to the rate at which ξ is being observed, i.e., to the "derivative" of the function giving ξ's increase. Since the derivative of an exponential is an exponential, it seems natural to have ξ's portion M_ξ(t) of ℬ(t) increase exponentially with t (at least until ξ occupies most of ℬ(t)). This will be the case if ξ's rate of increase is proportional to the observed average payoff μ̂_ξ(t) of instances of ξ at time t or, roughly,
dM_ξ(t)/dt = μ̂_ξ(t)·M_ξ(t).
It will still be the case if the rate is proportional to the schema's "usefulness," the difference between μ̂_ξ(t) and the overall average performance μ̂(t) of instances in ℬ(t), so that dM_ξ(t)/dt = (μ̂_ξ(t) − μ̂(t))·M_ξ(t). (In genetics μ̂_ξ(t) − μ̂(t) is called the "average excess" of ξ when ξ is defined on a single locus, i.e., when ξ is a specific allele.)
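The growth rule just stated can be exercised in a few lines; the discrete-time update below is an added sketch (the schema names and payoffs are illustrative):

    # Sketch: discrete version of dM/dt = (payoff_xi - payoff_avg) * M.
    # Schemata with above-average observed payoff grow exponentially.

    def update_counts(counts, payoff):
        # counts: {schema: M_xi(t)}; payoff: {schema: observed average payoff}.
        total = sum(counts.values())
        avg = sum(payoff[s] * m for s, m in counts.items()) / total
        return {s: m * (1 + payoff[s] - avg) for s, m in counts.items()}

    counts = {"1**": 10.0, "0**": 10.0}
    for _ in range(5):
        counts = update_counts(counts, {"1**": 0.6, "0**": 0.4})
    print(counts)   # the above-average schema's count grows, the other's shrinks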
The discussionof " intrinsic parallelism" in chapter 4 would imply here that
each~ representedin <B( t ) should increase( or decrease
) at a rate proportional to its
"
"
t
If
use
fulness
t
.
this
could
be
observed
done consistently then each ~
.at< ) il ( )
would be automatically and properly ranked within <B(t) as t increases.The reasoning
behind this, as well as the proof that reproductive plans accomplish the task,
will be developedin full in the next two chapters.
6. Reproductive Plans and Genetic Operators
[Formal definitions of the reproductive plans ℛ1 and ℛd (largely lost in this copy): each maintains a population ℬ(t) of M structures, computes the average performance μ̂ = (Σ_{i=1}^{M} μ(A_i(t)))/M, selects structures for reproduction and modification according to performance, and substitutes the new population ℬ' in place of ℬ.]
Algorithms in the class ℛd are closer to some of the deterministic models of mathematical genetics. It is easier, in some respects, to interpret the role of the population ℬ(t) in these plans than it is for the strictly sequential, stochastic plans in ℛ1. On the other hand the algorithms in ℛ1 look more like the "one-point-at-a-time" algorithms of numerical analysis and computational mathematics. Though ℛ1 and ℛd behave similarly, it is useful to have both in mind, translating from one to the other as it aids understanding.
For both types of plan the operators brought into play in step 5 are critical in determining just how past history is stored and exploited. The examination of specific operators can be expedited by subsuming ℛ1 and ℛd in a single overall diagram. Plans which satisfy this diagram and retain a recognizable variant of the "reproduction according to performance" procedures in ℛ1 or ℛd will be called plans of type ℛ.
[Flowchart: the steps common to plans of type ℛ, including step 1, "Set t = 0 and initialize ℬ," a test on the time-step in step 3, and a step, "Modify parameters for production of a new structure ('offspring')." (In ℛ1 steps 6 and 7 are amalgamated and the tests in 3 are unnecessary because exactly one new structure is formed per time-step.)]
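A minimal sketch in the spirit of this diagram follows (an added illustration, not the book's formal definition of ℛ1 or ℛd; the string encoding, population size, and performance function are assumptions). It reproduces "reproduction according to performance" followed by a genetic operator:

    import random

    # Sketch of a plan of type R: reproduce structures in proportion to
    # observed performance, then apply simple one-point crossover.

    def crossover(a, b):
        x = random.randrange(1, len(a))           # uniform random crossover point
        return a[:x] + b[x:], b[:x] + a[x:]

    def one_generation(population, mu):
        # population: list of equal-length strings; mu: performance function.
        weights = [mu(a) for a in population]
        parents = random.choices(population, weights=weights, k=len(population))
        next_pop = []
        for a, b in zip(parents[::2], parents[1::2]):
            child, _ = crossover(a, b)            # one resultant discarded (see below)
            next_pop.append(child)
            next_pop.append(b)
        return next_pop

    pop = ["".join(random.choice("01") for _ in range(8)) for _ in range(20)]
    for _ in range(30):
        pop = one_generation(pop, mu=lambda a: 1 + a.count("1"))
    print(max(pop, key=lambda a: a.count("1")))   # ones accumulate over generations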
The next four sections will investigate the role of generalized genetic operators in plans of type ℛ. We will see that ℬ(t) is used basically as a pool of schemata. (Recall from chapter 4 that this means that ℬ(t) acts as a repository for somewhere between 2^l and M·2^l schemata; i.e., it contains instances of this many distinct schemata.) Past history is recorded in terms of the ranking (number of instances) of each schema in ℬ(t), much as discussed at the end of chapter 5. From this point of view crossing-over acts to generate new instances of schemata already in the pool while simultaneously generating (instances of) new schemata (see section 6.2). In general a total of 2^l schemata will be affected by each crossing-over (see Lemma 6.2.1). Inversion (section 6.3) affects the pool of schemata by changing the linkage (association) of alleles (attributes) defining various schemata. In combination with reproduction, the net effect is to increase the linkage of schemata of high rank
[Diagram: crossing-over of A = a₁ . . . a_x a_{x+1} . . . a_l with A' = a'₁ . . . a'_x a'_{x+1} . . . a'_l at crossover point x, the segments to the right of x being exchanged.]
(To incorporate crossing-over directly into plans of type ℛ one of the resultant structures is discarded.)
The quickest way to get a feeling for the role crossing-over plays in adaptation is to look at its effect upon schemata. To do this, consider ℬ(t) as a pool of schemata (following the suggestions of chapter 4) where the number M_ξ(t) of instances of ξ in ℬ(t) reflects ξ's current "usefulness." The two direct effects of crossing-over on this pool are (see the sketch after this list):
1. Generation of new instances of schemata already in the pool. E.g., A = a₁a₂ . . . a_l is an instance of the schema a₁a₂ □ . . . □ and, after crossing-over with A' = a'₁a'₂ . . . a'_l, we have a new instance of a₁a₂ □ . . . □, namely a₁a₂ . . . a_x a'_{x+1} . . . a'_l (assuming a_i ≠ a'_i for some i > x). Each new instance of a schema ξ amounts to a new trial of the random variable corresponding to ξ. As such it increases the likelihood that the observed average performance μ̂_ξ of the instances of ξ closely approximates the expectation μ_ξ of the random variable ξ.
2. Generation of new schemata (i.e. schemata having neither A nor A' as an instance). E.g., after the crossing-over of A with A' the schema □ . . . □ a_x a'_{x+1} □ . . . □ has an instance, though neither A nor A' are instances of it (if a_x ≠ a'_x or a_{x+1} ≠ a'_{x+1}). Thus □ . . . □ a_x a'_{x+1} □ . . . □ will receive its first trial with the instance a₁a₂ . . . a_x a'_{x+1} . . . a'_l, unless the schema has previously been introduced to the pool from another source.
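The two effects can be checked mechanically; in the added sketch below, "*" plays the role of the "don't care" symbol □, and the particular strings and crossover point are illustrative assumptions:

    # Sketch: the two effects of crossing-over on the schema pool.
    # instances(schema, s) checks whether string s is an instance of the schema.

    def instances(schema, s):
        return all(c == "*" or c == d for c, d in zip(schema, s))

    A, A2 = "11100000", "00011111"
    x = 3
    child = A[:x] + A2[x:]                      # "11111111"

    # (1) a new instance of a schema already in the pool:
    assert instances("11******", A) and instances("11******", child)

    # (2) an instance of a schema neither parent instances (it spans the cut):
    assert instances("**11****", child)
    assert not instances("**11****", A) and not instances("**11****", A2)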
To balance all of these equations simultaneously, note first that the set of alleles {ξ_j, η_j} is identical to the set of alleles {(sξ)_j, (sη)_j} since, after crossing-over, the same alleles are still present at the jth positions (though to the right of x they will have been interchanged). Hence
P(ξ_j)·P(η_j) = P((sξ)_j)·P((sη)_j).
Thus, if P̂(ξ) = ∏_j P(ξ_j) for each ξ, as defined in the statement of the lemma, we have for any x, ξ, η, sξ, sη,
P̂(ξ)·P̂(η) = (∏_j P(ξ_j))·(∏_j P(η_j)) = ∏_j P(ξ_j)·P(η_j) = ∏_j P((sξ)_j)·P((sη)_j) = (∏_j P((sξ)_j))·(∏_j P((sη)_j)) = P̂(sξ)·P̂(sη).
In other words, each of the equations will be balanced if the schemata occur with probabilities P̂(ξ) = ∏_j P(ξ_j); it is also clear that any departure from these probabilities will unbalance the equations in such a way as to result in changes in some of the probabilities of occurrence. Thus, the assignment P̂(ξ) is the unique "steady state" (fixed point) of the crossover operator.
Q.E.D.
" toward
We canseefrom the proof of this lemmathat a kind of " pressure
the steadystate
4 = P(~)P(Et) - P(s~) P(sft)
can be definedfor eachquadruple~, ft, s~, sEt. If 4 ~ 0 for any quadruplethen
will start changingandtherewill be a diffusiontoward
probabilitiesof occurrence
a
'
the resultants ~, Sft (4 > 0) or the precursors~, ft (4 < 0). For example
, if
P(~) > ~(~) whilethe othercomponents
remainat their steadystatevalues,there
will be a " movementto the right" - a tendencyto increasethe probabilitiesof the
result. The followingheuristicargumentgivessomeidea of the rate of approach
:
to steadystatefrom suchdepartures
A given individual has probability 2/M of being involved in a crossover when ℬ(t) contains M individuals (since two individuals are involved in each application of the crossover operator). Thus in N trials a given individual can expect to undergo 2N/M crossing-overs. When N is in the vicinity of lM/2, where l is the length of individual representations, each individual in the population can be expected to have undergone crossing-over at almost every position. As a result even extreme independent departures from steady state should be much reduced in lM/2 trials.
The reduction to steady state does not, however, proceed uniformly with respect to all schemata because the crossover operator induces a linkage phenomenon. Simply stated, linkage arises because a schema is less likely to be affected by crossover if its defining positions are close together. In more detail, let ξ's defining positions (those not having a □) be i₁ < i₂ < . . . < i_h and let the length of ξ be defined as l(ξ) = (i_h − i₁). Then the probability of the crossover falling somewhere in ξ, once an instance of ξ has been selected for crossing-over, is just l(ξ)/(l − 1). E.g., if A = a₁a₂a₃ . . . a_l is selected for crossing-over, the probability of the crossover point x falling within ξ = □□a₃□□a₆□ . . . □ is 3/(l − 1). Clearly the smaller the length of a schema, the less likely it is to be affected by crossing-over. Thus, the smaller the length of ξ, the more slowly will a departure from λ(ξ) be reduced.
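In code the disruption probability is immediate (a sketch; positions are taken to be 1-indexed and l = 10 is an arbitrary illustration):

    def disruption_probability(defining_positions, l):
        # l(xi) = distance between the outermost defining positions;
        # a crossover point falling inside that span disrupts the schema
        length = max(defining_positions) - min(defining_positions)
        return length / (l - 1)

    # the text's example: defining positions 3 and 6, so l(xi) = 3
    print(disruption_probability([3, 6], l=10))   # 3/9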
Alleles defining a schema ξ of small length l(ξ) which exhibits above-average performance will be tried ever more frequently as a unit under an adaptive plan of type ℛ. I.e., the alleles will be associated and tried accordingly. More modifications and tests of such schemata will be tried, and many of these trials will be of a variety of combinations with other similarly favored schemata defined at other positions. In effect such schemata serve as provisional structural elements or primitives. This observation is made precise by the following simple but important
THEOREM 6.2.3: Consider a reproductive plan of type ℛ using only the simple crossover operator, defined as a crossover operator with both precursors, and the single crossover point, determined by uniform random selection. Then the expected proportion of each schema represented in 𝔅(t) changes in one generation from P(ξ, t) to
this adaptive process by continually introducing new schemata for trial, while testing extant schemata in new contexts, all this without much disturbing the ranking process (except for the longer schemata). Moreover, crossing-over makes it possible for the schemata represented in 𝔅(t) to move automatically to appropriate rankings through the application of the genetic plan to individual structures from 𝒜. As a result this very large number of rankings is compactly stored in a selected, relatively small population of individuals (exploiting the possibility suggested at the end of chapter 4).
By extending the pressure analogy introduced just before Theorem 6.2.3 we can gain a global view of the interaction of reproduction and crossover. Whenever some schema ξ exhibits better-than-average performance, reproduction introduces "pressures" Δ_ξ > 0, disturbing the steady state which would result from the action of the crossover operator alone. The disturbances both shift the steady-state values λ(ξ′) for large numbers of schemata, because of changes in the proportions P(ⱼξ) of the alleles ⱼξ, 1 ≤ j ≤ l, and also introduce local transitory departures because P(ξ) > λ(ξ). Because all schemata are being affected simultaneously, and because reproduction affects them according to observed performance, we have a diffusion "outward" from schemata currently represented in 𝔅(t), a diffusion which proceeds rapidly in the vicinity of schemata exhibiting above-average performance. This is closely analogous to a gas diffusing from some central location through a medium of varying porosity, where above-average porosity is the analogue of above-average performance. The gas will exhibit a quickened rate of diffusion wherever it encounters a region of higher porosity, rapidly saturating the whole region. All the while it slowly but steadily infuses enclaves of low porosity. In effect, high porosity is exploited wherever it occurs, without prejudicing eventual penetration into regions of lower porosity. As a result the overall rate of penetration is much more determined by regions of high porosity and their proximity to each other than by average porosity.
Restated in terms of schemata, regions of higher porosity correspond to sets of schemata of above-average performance which can be produced from each other by relatively few crossovers. Thus, following the analogy, local optima in performance are thoroughly explored in an intrinsically parallel fashion. At the same time the genetic plan does not get entrapped by settling on some local optimum when further improvements are possible. Instead all observed regions of high performance are exploited without significantly slowing the overall search for better optima. Here we begin to see in a more precise context the powers of generalized genetic plans, powers first suggested in the specific context of section 1.4.

One final point: Plans of type ℛ measure a schema's performance relative
Fig. 12. Some effects of a type ℛ plan on a one-dimensional function f(x).
a₁ . . . a_{x₁} a_{x₂} a_{x₂−1} . . . a_{x₁+1} a_{x₂+1} . . . a_l
It is clear that a single inversion can bring previously widely separated alleles into close proximity, viz., a_{x₁} and a_{x₂} in the description. It is also clear that any possible permutation of the representation can be produced by an appropriate sequence of inversions. (More technically, the inversions (x₁ = 0, x₂ = 2), (x₁ = 1, x₂ = 3), . . . , (x₁ = l − 1, x₂ = l + 1) are sufficient to generate the group of all permutations of order l.) The effect of the inversion operator upon (the instances of) a schema ξ is to randomly produce permutations ξ′ of ξ with varying lengths. Though inversion alters the linkage of schemata, it does not alter the subsets of 𝒜 which they designate. Every permutation ξ′ of ξ designates the same subset in the set of (original) structures 𝒜 (since the same set of detector values occurs in both ξ and ξ′). The lengths of many schemata are affected simultaneously by a single inversion, so this operator too exhibits intrinsic parallelism. As with crossover, schemata of shorter lengths are less frequently affected by the inversion operator.
Let us define the simple inversion operator as an inversion with both the structure selected for inversion and the two points x₁ and x₂ determined by uniform random selection. To see the combined effect of simple inversion, simple crossover, and reproduction we need only refer to Theorem 6.2.3. The theorem guarantees that, if inversion has produced a permutation ξ′ of ξ where l(ξ′) < l(ξ), then the proportion of ξ′ in 𝔅(t) increases more rapidly than the proportion of ξ. For example, if P_C = 1 and P(ξ, t) = P(ξ′, t), we can expect

P(ξ′, t + 1) =
with a permuted representation at x = 2 yields ⟨(1, a₁), (2, a₂), (2, a₂′)⟩ as one of the resultants. The simplest way to remedy this is to permit crossing-over only between homologous representations, where two representations are defined to be homologous if the detector indices (first number of each pair in the representation) are in the same order. For example, ⟨(1, a₁), (3, a₃), (2, a₂)⟩ is homologous to ⟨(1, a₁′), (3, a₃′), (2, a₂′)⟩, even if aⱼ ≠ aⱼ′ for some or all j, while ⟨(1, a₁), (2, a₂), (3, a₃)⟩ is not homologous to either of the foregoing. This remedy requires that the probability of inversion P_I be small so that there will exist substantial homologous subpopulations for the crossover operator to act upon. A second alternative (with a biological precedent) would be to temporarily make the second of the l-tuples chosen for crossover homologous to the first by reordering it, returning it to the population in its original order after the resultants of the crossing-over are formed. Under this alternative inversion can be unrestricted, i.e., P_I can be as large as desired.
Summing up: Inversion, in combination with reproduction and crossover, selectively increases the linkage (decreases the length) of schemata exhibiting above-average performance, and it does this in an intrinsically parallel fashion.
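A sketch of these two devices on the paired representation (function and variable names are mine, not the text's):

    def invert(rep, x1, x2):
        # reverse the segment between points x1 and x2;
        # rep is a list of (detector index, allele) pairs
        return rep[:x1] + rep[x1:x2][::-1] + rep[x2:]

    def homologous(r1, r2):
        # homologous: detector indices occur in the same order
        return [i for i, _ in r1] == [i for i, _ in r2]

    def reorder_to(template, rep):
        # the "second alternative": temporarily reorder rep so its
        # detector indices match template's order
        by_index = dict(rep)
        return [(i, by_index[i]) for i, _ in template]

    A = [(1, 'a1'), (2, 'a2'), (3, 'a3'), (4, 'a4')]
    B = invert(A, 1, 3)          # [(1,'a1'), (3,'a3'), (2,'a2'), (4,'a4')]
    print(homologous(A, B))      # False: unrestricted crossover would misalign
    print(homologous(A, reorder_to(A, B)))   # True after temporary reordering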
4. GENERALIZED GENETIC OPERATORS - MUTATION
Though mutation is one of the most familiar of the genetic operators, its role in adaptation is frequently misinterpreted. In genetics mutation is a process wherein one allele of a gene is randomly replaced by (or modified to) another to yield a new structure. Generally there is a small probability of mutation at each gene in the structure. In the formal framework this means that each structure A = a₁a₂ . . . a_l in the population 𝔅(t) is operated upon as follows:

1. The positions x₁, x₂, . . . , x_h to undergo mutation are determined (by a random process where each position has a small probability of undergoing mutation, independently of what happens at other positions).
2. A new structure A′ = a₁ . . . a_{x₁−1}a′_{x₁}a_{x₁+1} . . . a_{x_h−1}a′_{x_h}a_{x_h+1} . . . a_l is formed, where a′_{x₁} is drawn at random from the range V_{i₁} of the detector δ_{i₁} corresponding to position x₁, each element in the range being an equally likely candidate; a′_{x₂}, . . . , a′_{x_h} are determined in the same way.

If P_M is the probability of mutation at each position, then the probability of h mutations in a single representation is given by the Poisson distribution with parameter lP_M.
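A sketch of the two-step operator (binary detectors and the rate below are arbitrary choices for illustration):

    import random

    def mutate(rep, ranges, p_m):
        # step 1: each position independently mutates with probability p_m;
        # step 2: a chosen position is redrawn uniformly from its detector's
        # range, every element being an equally likely candidate
        return [random.choice(ranges[i]) if random.random() < p_m else a
                for i, a in enumerate(rep)]

    random.seed(1)
    l, p_m = 20, 0.05
    ranges = [[0, 1]] * l          # binary detectors
    print(mutate([0] * l, ranges, p_m))
    # the number of mutated positions is Binomial(l, p_m), which approaches
    # the Poisson distribution with parameter l*p_m for small p_m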
[1 − (1 − P_M)^{l₀(ξ)}] P(ξ, t).
"
"
Summing up : Mutation is a background operator, assuring that the
crossoveroperator hasa full rangeof allelesso that the adaptive plan is not trapped
on local optima. (Of course if there are many possiblealleles- e .g., if we consider
a great many variants of the nucleotide sequencesdefining a given gene- then even
a large population will not contain all variants. Then mutation servesan enumerative
function , producing alleles not previously tried.)
5. FURTHER INCREASES IN POWER
The next chapter will establish that the three genetic operators just described are adequate for a robust and general purpose set of adaptive plans, with one important reservation which will be discussed at the end of this section. However, there are additional operators which can make significant contributions to efficiency in more complex situations. Chief among these is the dominance-change operator which (among other things) helps to control losses resulting from mutation. Because losses resulting from mutation, for given P_M, do not diminish as schema ξ gains "high rank," a constant load is placed on the adaptive plan by the random movements away from optimal configurations. For this reason it is desirable to keep the mutation rate P_M as low as possible consistent with mutation's role of supplying missing alleles. In particular, if the rate of disappearance of alleles can be lowered without affecting the efficiency of the adaptive plan, then the mutation rate can be proportionally lower. Since the main cause of disappearance of alleles is sustained below-average performance, the rate of loss can be reduced by shielding such alleles from continued testing against the environment. Dominance provides just such shielding.
To introduce dominance, we must extend the method of representation once again. Pairs of alleles will be used for each detector, so that a representation involves a pair of homologous l-tuples. The object is to let some of the extra alleles be carried along with the others in an unexpressed form, forming a kind of reservoir of protected alleles. Precisely, then, the set of representations will be extended to the set of all permutations of homologous pairs drawn from ∏_{i=1}^{l} (V_i)². Since there is now a pair of alleles at each position there is no longer a direct correspondence between the detector values for a structure A and the representation of A.
Let (A′, A″) be a homologous pair of l-tuples and let δᵢ(A′, A″) = ⟨(h, v′), (h, v″)⟩, where v′, v″ ∈ V_h, designate the pair of alleles occurring at the ith position of the l-tuples. The most direct way to relate this pair of l-tuples to a structure is to designate either v′ or else v″ as the value of detector h, ignoring the other allele. The allele so designated will be called dominant, the other recessive. For each position i, this designation should be completely determined by information available in the pair (A′, A″). Formally, for each i there should be a dominance map dᵢ such that, for each homologous pair (A′, A″), dᵢ(A′, A″) is either the first allele or the second allele of δᵢ(A′, A″). It should be emphasized that in this general form the determination of the dominant allele in δᵢ(A′, A″) may depend upon the whole context (i.e., the other alleles in (A′, A″)). (This corresponds closely with Fisher's [1930, Chapter III] theory of dominance.) A simpler approach makes the determination dependent only upon the pair δᵢ(A′, A″) itself. Thus, for each h, there is a map d_h: V_h² → V_h such that, for δᵢ(A′, A″) = ⟨(h, v′), (h, v″)⟩, dᵢ(A′, A″) = d_h(v′, v″). Accordingly (A′, A″) represents the structure A ∈ 𝒜 for which

δ_h(A) = d_{i(h)}(A′, A″),

where i(h) is the index of the pair of alleles in (A′, A″) for detector h.
A particularly interesting example of the simpler dominance map, useful for binary (two-allele) codings (see chapter 4), can be constructed as follows. Let V_h = {1, 1₀, 0}, where 1₀ is to be recessive whenever it is paired with 0, and let the mapping d_h: V_h² → V_h be given by the following table:

 v′    v″    d_h(v′, v″)
 1     0     1
 1     1₀    1
 1     1     1
 1₀    0     0
 1₀    1₀    1
 1₀    1     1
 0     0     0
 0     1₀    0
 0     1     1
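The table is easy to carry as data (a sketch; '1r' stands in for the text's 1₀):

    # dominance map d_h for the alleles {1, 1r, 0}, following the table above
    D = {('1', '0'): '1',  ('1', '1r'): '1',  ('1', '1'): '1',
         ('1r', '0'): '0', ('1r', '1r'): '1', ('1r', '1'): '1',
         ('0', '0'): '0',  ('0', '1r'): '0',  ('0', '1'): '1'}

    def express(A1, A2):
        # the structure represented by a homologous pair of l-tuples
        return [D[pair] for pair in zip(A1, A2)]

    print(express(['1', '1r', '0', '1r'], ['0', '0', '0', '1']))
    # -> ['1', '0', '0', '1']; the 1r at position 2 is shielded by the 0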
Stated another way, allele v₀ is only expressed or tested when it occurs in the pair (v₀, v₀). Let us assume that, on the average, the adaptive plan is to provide at least one occurrence of each allele in every T generations. That is, P(v₀, t) ≥ 1/MT must be assured. In the absence of dominance (using the earlier single l-tuple representation), let the reproduction rate of v₀ (corrected for operator losses) be (1 − ε(v₀)), exclusive of additions resulting from mutation. Then

P(v₀, t + 1) = (1 − ε(v₀))P(v₀, t) + P_M(1 − P(v₀, t)) − P_M P(v₀, t).

To keep P(v₀, t) ≥ 1/MT for all t, P_M must be at least large enough to maintain the steady state P(v₀, t) = P(v₀, t + 1) = 1/MT. That is,

1/MT = (1 − ε(v₀))/MT + P_M(1 − 2/MT)

or

P_M = ε(v₀)/((1 − 2/MT)(MT)).

If MT is at all large (as it will be for all cases of interest) this reduces to

P_M ≅ ε(v₀)/MT

as a close approximation to the mutation rate required without dominance. (In the extreme case that alleles v₀ are deleted whenever they are tested, P_M = 1/MT.)
With dominance, the allele v₀ is subject to selection only when the pair (v₀, v₀) occurs. Under crossover, as extended to homologous pairs, the pair (v₀, v₀) occurs with probability P²(v₀, t). The loss from selection then is

2ε(v₀)P²(v₀, t)M,

the factor 2 occurring because 2 copies of v₀ are lost each time the pair (v₀, v₀) is deleted. Again the gains from mutation are

P_M(1 − P(v₀, t))·2M − P_M P(v₀, t)·2M,

where the factor 2 occurs because the M homologous pairs are 2M l-tuples. Thus

P(v₀, t + 1) = P(v₀, t) − 2ε(v₀)P²(v₀, t) + 2P_M(1 − 2P(v₀, t))

for the homologous pairs with dominance. Setting P(v₀, t) = P(v₀, t + 1) = 1/MT as before, and solving, we get

P_M = ε(v₀)/((1 − 2/MT)(MT)²).

We have thus established
LEMMA 6.5.1: To assure that, at all times, each allele a occurs with probability P(a, t) ≥ 1/MT, the mutation rate P_M must be of order 1/MT in the absence of dominance, but only of order (1/MT)² with dominance.

For example, to sustain an average density of at least 10⁻³ for every allele, the mutation rate would have to be 10⁻³ without dominance, but only 10⁻⁶ with dominance.
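The derivation above is easy to check numerically (a sketch; M and T are arbitrary choices, and ε(v₀) = 1 is the extreme case in which tested alleles are always deleted):

    M, T = 100, 10
    MT = M * T
    eps = 1.0
    p_m_plain = eps / ((1 - 2 / MT) * MT)          # single l-tuple representation
    p_m_dominant = eps / ((1 - 2 / MT) * MT ** 2)  # homologous pairs, dominance
    print(p_m_plain, p_m_dominant)                 # about 1e-3 versus 1e-6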
It should be noted that, with dominance, P(v₀, t) is no longer the expected testing rate. Although dominance allows the constant mutation load to be reduced, while maintaining a given proportion of disfavored alleles as a reserve, the testing rate of the reserved alleles is only P²(v₀, t), not P(v₀, t). This reservoir is only released through a change in dominance.
Dominance change in the general case dᵢ, i = 1, . . . , l, occurs simply through a change in context, so that dominance is directly subject to adaptation by selection of appropriate contexts. In the more restricted case d_h: V_h² → V_h a special operator is required. The example using V_h = {1, 1₀, 0} will serve to illustrate the process. The basic idea will be to replace some or all occurrences of 1 by 1₀, and vice versa, in an l-tuple. Thus the previous recessives become dominant and vice versa, this change being transmitted to all progeny of the l-tuple. A simple way to do this is to designate a special inversion operator which not only inverts a segment but carries out the replacement in the inverted segment. (In genetics, there is a distant analogue in the effects produced by changes of context when a region is inverted, but it should not be taken literally.) Thus for the dominance-change inversion operator, step 3 of the inversion operator (p. 107) is followed by

4. In the inverted segment each occurrence of 1 is replaced by 1₀, and vice versa.
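A sketch of the operator (again with '1r' for 1₀):

    SWAP = {'1': '1r', '1r': '1'}      # exchange dominant and recessive forms

    def dominance_change_inversion(rep, x1, x2):
        # steps 1-3: ordinary inversion of the segment; step 4: replace each
        # 1 by 1r, and vice versa, inside the inverted segment
        segment = [SWAP.get(a, a) for a in reversed(rep[x1:x2])]
        return rep[:x1] + segment + rep[x2:]

    print(dominance_change_inversion(['1', '1r', '0', '1'], 1, 3))
    # -> ['1', '0', '1', '1']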
With this operator the defining alleles of an arbitrary schema can be "put in reserve" in a single operation, to be "released" later, again in a single operation.

Dominance provides a reserved status not only for alleles but, more importantly, for schemata. A useful schema ξ₁ defined on many positions may be the result of an extensive search. As such it represents a considerable fragment of the adaptive plan's history, embodying important adaptations. When it is superseded by a schema ξ₂ exhibiting better performance, it is important that ξ₁ not be discarded until it is established that ξ₂ is useful over the same range of contexts as ξ₁. ξ₂'s performance advantage may be temporary or restricted in some way, or ξ₁ may be useful again in some context engendered by ξ₂. In any case it is useful to retain ξ₁ for a period comparable to the time it took to establish it. Dominance makes this possible.
Summing up: Under dominance, a given minimal rate of occurrence of alleles can be maintained with a mutation rate which is the square of the rate required in the absence of dominance. Moreover, with the dominance-change operator the combination of alleles defining a schema can be "reserved" (as recessives) or "released" (as dominants) in a single operation.
When the performance function depends upon many more or less independent factors, there is another pair of operators, segregation and translocation, which can make a significant contribution to efficiency. In such situations it is useful to make provision for distinct and independent sets of associations (linkages) between genes. This again calls for an extension in the method of representation. Let each element in 𝒜 be represented by a set of homologous pairs of n-tuples, and let crossover be restricted to homologous n-tuples. After two elements of 𝒜, A and A′, are chosen for crossover and after all homologous pairs have been crossed (as detailed under the discussion of dominance change), then from each pair of resultants one is chosen at random to yield the offspring's n-tuples. Each offspring thereby consists of the same number of homologous pairs of n-tuples as its progenitors. The genetic counterpart of this random selection of resultants is known as segregation. Clearly, under segregation, there is no linkage between alleles on separate nonhomologous n-tuples, while alleles on homologous n-tuples are linked as before. With this representation it is natural to provide an operator which will shift genes from one linkage set to another (so that, for example, schemata that are useful in one context of associations can be tested in another). The easiest way to accomplish this is to introduce an exceptional crossover operator, the translocation operator, which produces crossing-over between randomly chosen nonhomologous pairs.
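A sketch of segregation and translocation over linkage sets (the doubled, dominance-bearing structure of each homologous pair is ignored here for brevity; names are mine):

    import random

    def cross(a, b):
        x = random.randint(1, len(a) - 1)
        return a[:x] + b[x:], b[:x] + a[x:]

    def offspring(A, B):
        # crossover restricted to homologous (same-index) n-tuples, then one
        # resultant of each pair chosen at random: segregation
        return [random.choice(cross(a, b)) for a, b in zip(A, B)]

    def translocate(A):
        # exceptional crossover between two randomly chosen nonhomologous
        # n-tuples, shifting genes from one linkage set to another
        i, j = random.sample(range(len(A)), 2)
        A[i], A[j] = cross(A[i], A[j])
        return A

    random.seed(2)
    A = [[0, 0, 0, 0], [0, 0, 0, 0]]   # two nonhomologous linkage sets
    B = [[1, 1, 1, 1], [1, 1, 1, 1]]
    print(offspring(A, B))   # alleles on separate n-tuples assort independently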
Another genetic operation provides a means of adaptively modifying the effective mutation rate for different closely linked sets of alleles. The operator involved is intrachromosomal duplication (see Britten 1968); it acts by providing multiple copies of alleles on the same n-tuple. To interpret this operation, n-tuples with multiple copies of the alleles for a given gene must be mapped into the set of original structures. This can be done most directly by extending the concept of dominance to multiple copies of alleles. With this provision, if there are k_a copies of a given allele a, the probability of one or more mutations of allele a is k_a times greater than if there were but one copy. That is, the probability of occurrence, via mutation, of an allele a′ ≠ a is increased k_a times. Thus, increases and decreases in the number of copies of an allele have the effect of modifying the (local) mutation
genes). In this way one operon can cause the cell to produce signals which (with controlled delays) turn on other operons. This provision for action conditional upon previous (conditional) actions gives the chromosome tremendous information-processing power. In fact, as will be shown in chapter 8, any effectively describable information-processing program can be produced in this way.
6. INTERPRETATIONS

For the geneticist, the picture of the process of adaptation which emerges from the mathematical treatment thus far exhibits certain familiar landmarks:

Natural selection directs evolution not by accepting or rejecting mutations as they occur, but by sorting new adaptive combinations out of a gene pool of variability which has been built up through the combined action of mutation, gene recombination, and selection over many generations. For the most part Darwin's concept of descent with modification fits in with our modern concept of interaction between evolutionary processes, because each new adaptive combination is a modification of an adaptation to a previous environment. (p. 31)

Inversions and translocations of chromosomal segments, when present in the heterozygous condition, can increase genetic linkage and so bind together adaptive gene combinations. . . . The importance of such increased linkage is due to the number of diverse genes which must contribute to any adaptive mechanism in a higher plant or animal. (p. 57)

Stebbins in Processes of Organic Evolution
Not only do we claim in this case [of inversions found in D. pseudoobscura and D. persimilis] that the precise pairing of the chromosomes in the species hybrids shows that the chromosomal material has had a common source, but we also claim that the sequence of rearrangements [produced by inversions] that occurred in the chromosome reconstructs for us the precise pattern of change that led up to and then beyond the point of speciation.

Wallace in Chromosomes, Giant Molecules, and Evolution (p. 49)
At the same time the emphasis on gene interaction poses a series of difficult problems:

Intricate adaptations, involving a great complexity of genetic substitutions to render them efficient, would only be established, or even maintained in the species, by the agency of selective forces, the intensity of which may be thought of, broadly, as proportional to their complexity.

Fisher in Evolution as a Process, ed. Huxley et al. (p. 117)
The interaction of genes is more and more recognized as one of the great evolutionary factors. The longer a genotype is maintained in evolution, the stronger will its developmental homeostasis, its canalizations, its system of internal feedbacks become. . . . one of the real puzzles of evolution is how to break up such a perfectly co-adapted system in such a way so as not to induce extinction . . .

Mayr in Mathematical Challenges to the Neo-Darwinian Interpretation of Evolution, ed. Moorhead & Kaplan (p. 53)

likely (see Eden's [1967] comments).
In the present context each of these questions can be rephrased in terms of the processing of schemata by genetic operators. This allows us to probe the origin and development of coadapted sets of alleles much more deeply, particularly the way in which different genetic mechanisms enable exploitation of useful epistatic effects. In the next chapter, we will be able to extend Corollary 6.4.1 to demonstrate the simultaneous rapid spread of sets of alleles, as sets, whenever they are associated with above-average performance (because of epistasis or otherwise). Theorem 7.4 establishes the efficiency of this process for epistatic interactions of arbitrary complexity (i.e., for any fitness function μ: 𝒜 → 𝒰, however complex). Section 7.4 gives a specific example of the process in genetic terms and exhibits a version of Fisher's (1930) theorem applicable to arbitrary coadapted sets. Finally, in section 9.3, the formalism is extended to give an approach to speciation. This extension suggests reasons for competitive exclusion within a niche, coupled with a proliferation of (hierarchically organized) species when there are many niches.
For the nongeneticist, the illustration at the end of section 6.2 should convey some of the flavor of algorithms of type ℛ as optimization procedures. It is easy enough to extend that illustration to cover inversion and mutation. For example, under the revised representation of section 6.3 each bit δ is paired with a number j designating its significance (i.e., (j, δ) designates the bit δ·2^j). Thus bits
of different orders can be set adjacent to each other in a string without changing their significance. In consequence, under the combined effect of inversion and reproduction, bits defining various regions of above-average values for f(x) will be ever more tightly linked. This in turn increases the rate of exploration of intersections and refinements of these regions. Filling in the remaining details to complete the extension of the illustration is a straightforward exercise. Section 7.3 in the next chapter provides a detailed example of the response of an algorithm of type ℛ to nonlinearities. Theorem 7.4 of that chapter, coupled with the comments on dimensionality in chapter 4 (p. 71), shows that, whatever the form of f (i.e., for any f mapping a bounded d-dimensional space into the reals), an algorithm of type ℛ optimizes expeditiously. Moreover, the algorithm does this while rapidly increasing the average value of the points it tests (though they may be scattered through many different hyperplanes), thus making the algorithm useful for "online" control. Sections 9.1 and 9.3 provide more detailed summaries of these advantages.
7. The Robustness of Genetic Plans
C"
simple
tria!
designate
random
probability
( The sequence(c,) is included primarily for its effects when genetic plans are
used as algorithms for artificial systems; it is used to drive the mutation rate to
zero, while assuring that every allele is tried in all possible contexts. (c,) is not
intended to have a natural systemcounterpart and its effectscan be ignored in that
context. Seebelow.)
LEMMA 7.1: Under algorithms of type ℛ₁(P_C, P_I, P_M, ⟨c_t⟩), the expected number of trials of the jth value of the ith attribute (i.e., allele j of detector i), for any i and j, is infinite.

Proof: (Essentially this proof is a specialized version of the Borel zero-one criterion.) Let P_{ij}(t) be the probability of occurrence at time t of v_{ij}, the jth value for the ith attribute. Then Σ_{t=1}^∞ P_{ij}(t)M is the expected number of occurrences of v_{ij} over the history of the system. Unless Σ_t P_{ij}(t)M is infinite, v_{ij} can be expected to occur only a finite number of times. That is, unless Σ_t P_{ij}(t)M is infinite, v_{ij} will at best be tested only a finite number of times in each context, and it may not be tested at all in some contexts. (Despite this a plan for which Σ_t P_{ij}(t)M is large relative to the size of 𝒜₁ may be quite interesting in practical circumstances.)

Since Σ_t c_t → ∞, we have Σ_t c_t P_M → ∞. But P_{ij}(t) ≥ c_t P_M for all t, whence Σ_t P_{ij}(t)M ≥ Σ_t c_t P_M M. Hence Σ_t P_{ij}(t)M is also infinite in the limit.
Q.E.D.
2-armed bandit problem). The reader can learn a great deal more about convergence properties of reproductive plans from N. Martin's excellent 1973 study.

Since convergence is not a useful guide, we must turn to the stronger "minimal expected losses" criterion introduced in chapter 5. Results there (Theorems 5.1 and 5.3) indicate that the number of trials allocated to the observed best option should be an exponential function of the trials allocated to all other options. It is at once clear that enumerative plans do not fare well under this criterion. Enumeration, by definition, allocates trials in a uniform fashion, with no increase in the number of trials allocated to the observed best at any state prior to completion; accordingly, as the number of observations increases, expected losses climb precipitously in comparison to the criterion. On the other hand, plans of type ℛ₁(P_C, P_I, P_M, ⟨c_t⟩) do award an exponentially increasing number of trials to the observed best, as we shall see in a moment. More importantly, plans of this type actually treat schemata from Ξ as options, rather than structures from 𝒜₁. In doing this the plans exhibit intrinsic parallelism, effectively modifying the rank of large numbers of schemata each time a structure A ∈ 𝒜₁ is tried. The effect is pronounced, even in an example as simple as that of Figure 13, which illustrates 2 generations of a small population (M = 8, l = 9) undergoing reproduction and crossover. Specifically, under plans of type ℛ₁(P_C, P_I, P_M, ⟨c_t⟩), the number of instances of a schema increases (or decreases) at a rate closely related to its observed performance μ̂_ξ(t) at each instant. That is, the number M_ξ(t) of instances of each schema ξ represented in population 𝔅(t) changes simultaneously according to an equation much like that suggested at the end of chapter 5:

dM_ξ(t)/dt = μ̂_ξ(t)M_ξ(t).
The foregoing statements can be established with the help of

LEMMA 7.2: Under a plan of type ℛ₁(P_C, P_I, P_M, ⟨c_t⟩), given M_ξ(t₀) instances of ξ in the population 𝔅(t₀) at time t₀, the expected number of instances of ξ at time t, M_ξ(t), is bounded below by

M_ξ(t₀) ∏_{t′=t₀}^{t−1} (1 − ε_ξ)μ̂_ξ(t′)/μ̂(t′),

where ε_ξ = (P_C + 2P_I)l(ξ)/(l − 1) + k·l₀(ξ) is a time-invariant constant generally close to zero, depending only upon the parameters of the plan, the length l(ξ) of ξ, and the number l₀(ξ) of defining positions for ξ.
Q.E.D.
Nf,ft = E ~- . Mt<t')
> Mt<t)
~
=
=
=
ME<toME<toME<to ME<to -
'
'
l )n ~_,. [( 1 - Ef)piE
<t - 1)/ .4(t - 1)] usingLemma7~2
'
'
1) exp[In n ~_,. [( 1 Ef)piE
<t - 1)/ p.(t - 1)]]
'
'
l ) exp [E ~_,. ln [( 1 - Ef)piE
<I - 1)/ .4(1 - 1)]]
1) exp[Zl .(t - to+ 1)]
However ,
Q.E.D.
This lemma holds a fortiori for any schema ξ consistently exhibiting an effective rate of increase at least equal to 1, i.e., μ̂_ξ(t′) > μ̂(t′)/(1 − ε_ξ), over the interval (t₀, t). As noted first in sections 6.2 and 6.3, when l(ξ)/l is small, ε_ξ will be small and the factor 1/(1 − ε_ξ) will be very close to one. Let ξ* denote a schema which consistently yields the best observed performance μ̂_{ξ*}(t′), t₀ ≤ t′ < t, among the schemata which persist over that interval. In all but unusual circumstances μ̂_{ξ*}(t′) will exceed μ̂(t′) by more than the factor 1/(1 − ε_ξ). If l is large this is the more certain since, until the adaptation is far advanced, l(ξ)/l will with overwhelming probability be small (see the discussion at the end of section 6.2). Thus, for any ξ* for which μ̂_{ξ*} significantly exceeds μ̂(t′), t₀ ≤ t′ < t, the number of trials N_{ξ*} allocated to ξ* is an exponential function of the number of trials n_{ξ*} allocated to structures which are not instances of ξ*.
(For natural systems the reproduction rate is determined by the environment (cf. fitness in genetics), hence it cannot be manipulated as a parameter of the adaptive system. However, for artificial systems this is not the case; the adaptive plan can manipulate the observed performance, as a piece of data, to produce more efficient adaptation. In particular, the reproductive step of ℛ₁(P_C, P_I, P_M, ⟨c_t⟩) algorithms, step 4, can be modified to assure that the reproduction rate of ξ* automatically exceeds μ̂(t)/(1 − ε_ξ).)
From all of this it is clear that, whatever the complexity of the function μ, plans of type ℛ₁(P_C, P_I, P_M, ⟨c_t⟩) behave in a way much like that dictated by the optimal allocation criterion: the number of trials allocated to the observed best increases as an exponential function of the total number of trials n_{ξ*} allocated to structures which are not instances of ξ*. However we can learn a good deal more by comparing the expected loss per trial of the genetic plans ℛ₁(P_C, P_I, P_M, ⟨c_t⟩) to the loss rate under optimal allocation. Theorem 5.3 established a lower bound

(r − 1)b²(μ_{ξ*} − μ₂)[2 + ln[N²/((r − 1)²8πb⁴ ln N²)]]

for the expected loss under optimal allocation, where b = σ₁/(μ_{ξ*} − μ₂). For the genetic plan, the expected loss per trial is bounded above by

L̄_g(N) = (μ_{ξ*}/N)[N_{ξ*}·r′q(N_{ξ*}, n′) + (1 − r′q(N_{ξ*}, n′))n_{ξ*}],

where r′ is the number of schemata which have received n′ or more trials under the genetic plan, and, as in Theorem 5.3, q(N_{ξ*}, n′) is the probability that a given option other than ξ* is observed as best. (This expression is simply L̄ from Theorem 5.3 rewritten in the terms of the genetic plan's allocation of trials, N_{ξ*} and n′, noting that r′q(N_{ξ*}, n′) is an upper bound on q(n₁, . . . , n_r).) It is critical to what follows that r′·n′ need not be equal to n_{ξ*}. As 𝔅(t) is transformed into 𝔅(t + 1) by the genetic plan, each schema ξ having instances in 𝔅(t) can be expected to have (1 − ε_ξ)μ̂_ξ(t)/μ̂(t) times as many instances in 𝔅(t + 1). Thus, over the course of several time steps, the number of schemata r′ receiving n′ trials will be much, much greater than the number of trials allocated to individuals A ∉ ξ*, even when n′ approaches or exceeds n_{ξ*}. This observation, that generally r′n′ ≫ n_{ξ*}, is an explicit consequence of the genetic plan's intrinsic parallelism.

With these observations for guidance, we can establish that the losses of genetic plans are decreased by a factor 1/(r′ − 1) in comparison to the losses under optimal allocation. Specifically, we have
THEOREM 7.4: If r′ is the number of schemata for which

n′ ≥ [2Z̄(ξ*)b²/n_{ξ*,0}(0)]·n_{ξ*,0},

i.e., if r′ is the number of schemata for which the number of trials n′ increases at least proportionally to n_{ξ*,0}, then for any performance function μ: 𝒜 → 𝒰,

L̄_g(N)/L̄*(N) → L < [1/(r′ − 1)]·[μ_{ξ*}n_{ξ*,0}(0)/2b²Z̄(ξ*)]

as N → ∞, where the parameters are defined as in Lemma 7.3.
Proof: Substituting the expression for N_{ξ*} (from Lemma 7.3) and the expression for q(N_{ξ*}, n′) (from the proof of Theorem 5.1) in L̄_g(N), and noting that (1 − r′q(N_{ξ*}, n′))n_{ξ*} < n_{ξ*}, gives an upper bound on L̄_g(N) consisting of an exponential term and the term n_{ξ*,0}. If b⁻²n′/2 ≥ Z̄(ξ*)n_{ξ*,0}/n_{ξ*,0}(0), it is clear that the first term (the exponential term) decreases as n_{ξ*,0} increases, but the second term, n_{ξ*,0}, increases. In other words, if n′ ≥ [2Z̄(ξ*)b²/n_{ξ*,0}(0)]n_{ξ*,0}, i.e., if n′ increases at least proportionally with n_{ξ*,0}, the expected loss per trial will soon depend almost entirely on the second term. We have already seen (in the proof of Corollary 5.2) that the same holds for the second term of the expression for expected loss under an optimal allocation of N trials. Thus, for r′ and n′ as specified, the ratio of the upper bound on the reproductive plan's losses to the lower bound on the optimal allocation's losses approaches

μ_{ξ*}n_{ξ*,0}/[(μ_{ξ*} − μ₂)(r′ − 1)n*]
as N increases. (This comparison yields a lower bound on the ratio since the upper bound in one case is being compared to the lower bound in the other. It can be established easily, on comparison of the respective first terms of the two expressions, that the condition on n′ is sufficient to assure that the first term of L̄_g(N) is always less than the first term of L̄*(N). It should be noted that the condition on n′ can be made as weak as desired by simply choosing n_{ξ*,0}(0) large enough.)

To proceed, substitute the explicit expressions derived earlier for n_{ξ*,0} (Lemma 7.3) and n* (Theorem 5.3) in the ratio μ_{ξ*}n_{ξ*,0}/[(μ_{ξ*} − μ₂)(r′ − 1)n*], yielding

L̄_g/L̄* → [μ_{ξ*}/((μ_{ξ*} − μ₂)(r′ − 1))]·[n_{ξ*,0}(0)/Z̄(ξ*)]·ln[(N − n_{ξ*,0})/M_{ξ*}(0)]·(b²ln[N²/(8πb⁴(r′ − 1)²ln N²)])⁻¹.

Simplifying and deleting terms which do not affect the direction of the inequality, we get the bound stated in the theorem as N grows.
Q.E.D.
a system which is simple and artificial, while the other is to a system which is complex and natural.
"
will be considered here. ( The more complicated case, involving predictive correction
"
during play of the game, is discussedin the latter half of section 8.4.)
Becausethe detectors 8i are given and fixed, the strategiesin d1 are completely
determined by the weights Wi, i = 1, . . . , I , so the search is actually a search
'
through the spaceof I-tupies of weights, W .
A typical plan for optimization in W^l adjusts the weights independently of each other (ignoring the interactions). However, in complex situations (such as playing checkers) this plan is almost certain to lead to entrapment on a false peak, or to oscillations between points distant from the optimum. Clearly such a plan is not robust. To make the reasons for this loss of robustness explicit, consider the plan 𝒯 with an initial population 𝔅(0) drawn from W^l, but with steps 3 and 4 of ℛ₁(0, 0, 0, 0) extended as follows:

3.1 [from 3] Set r = 1 and set A′(t) = (0, 0, . . . , 0).
4 [the same as before]
4.1 Set the rth position of A′(t) to a_r(i(t), t), the value of the rth position of A^{i(t)}(t) = (a₁(i(t), t), . . . , a_l(i(t), t)) ∈ W^l.
4.2 Is r = l? yes → [to 6.1]
4.3 Increase r by 1. [to 4]

Clearly,

P(ξ, t + 1) = [∏_j μ̂_{ⱼξ}(t)/μ̂(t)] P(ξ, t),

so that the weights at distinct positions are chosen independently
of each other under the plan 𝒯. Hence if a pair of weights contributes to a better performance than could be expected from the presence of either of the two weights separately, there will be no way to preserve that observation. This can lead to quite maladaptive behavior wherein the plan ranks mediocre schemata highly and fails to exploit useful schemata. For example, consider the set of schemata defined on positions 1 and 2 when W = {w₁, w₂, w₃}. Assume that all weights are equally likely at each position (so that an instance of schema w₂w₃□ . . . □, say, occurs with probability 1/9), and let the expected payoff μ_ξ of each schema be given by the following table:

Table 3: A Nonlinear μ_ξ for Two Positions

Since all instances are equally likely we can calculate from this table the following expectations for single weights:
Table 4: P(ξ) and μ_ξ for the One-Position Schemata Implicit in Table 3

 ξ              P(ξ)    μ_ξ
 w₁□□ . . . □   1/3     0.9
 w₂□□ . . . □   1/3     1.1
 w₃□□ . . . □   1/3     1.0
 □w₁□ . . . □   1/3     1.1
 □w₂□ . . . □   1/3     1.0
 □w₃□ . . . □   1/3     0.9
Clearly the combination w₂w₁ becomes increasingly likely under 𝒯; in fact

P(w₂w₁□ . . . □, t + 1) = P(w₂□ . . . □, t + 1)·P(□w₁□ . . . □, t + 1)
= [(μ̂_{w₂□...□}(t)/μ̂(t))P(w₂□ . . . □, t)]·[(μ̂_{□w₁□...□}(t)/μ̂(t))P(□w₁□ . . . □, t)]
= 1.21·P(w₂w₁□ . . . □, t).

On the other hand, the best combination w₁w₃□ . . . □ by the same calculation satisfies

P(w₁w₃□ . . . □, t + 1) = 0.81·P(w₁w₃□ . . . □, t),

so that its probability of occurrence actually decreases. It is true that, as w₂w₁□ . . . □ becomes more probable, the values of μ̂_{w₂□...□} and μ̂_{□w₁□...□} decrease, eventually dropping below 1, but w₁w₃□ . . . □ is still selected against, as the following table shows:

Table 5: μ̂_ξ for the Schemata of Table 4 when Instances Are Not Equally Likely
<R1
( 1, - , - , - ). Lemma1.2 makesthis quite clear.
0 . . . 0 , t + I ) = MtDat
Da
P( WIWa
O.. .O (t + 1)/ M
= ( I - EvlWIO
.. .O).4w
. . .o( t)MW1WIO
. . .o( t)/ M
.WIO
=
I -
whereasW2Wl
0 . . . 0 now satisfies
P(W2Wl
0 . . . 0 , t + I) =
I -
. 1.I .P(W2Wl
0 . . . 0 , I).
the effective rate of increase is bounded below by (1 − ε_ξ)μ̂_ξ(t) (following Lemma 7.2). For a fraction Δt of a generation we can write

ΔP(ξ, t) = [M_ξ(t) + (μ̂_ξ(t) − 1)M_ξ(t)Δt]/[M(t) + (μ̂(t) − 1)M(t)Δt] − M_ξ(t)/M(t)
= P(ξ, t)·[(1 + (μ̂_ξ(t) − 1)Δt)/(1 + (μ̂(t) − 1)Δt) − 1]
= P(ξ, t)·(μ̂_ξ(t) − μ̂(t))Δt/(1 + (μ̂(t) − 1)Δt).

Passing to a continuous time-scale, we have

dP(ξ, t)/dt = P(ξ, t)·(μ̂_ξ(t) − μ̂(t)).
Consider, now, how such a prediction would differ from one made under the assumption of independent substitution of alleles, using the earlier example (the tables of section 7.3). In the present case the elements of W play the role of indices: w₁ at position 1 indicates that allele 1 for position 1 is present; the same w₁ at position 2 indicates the presence of allele 1 for position 2, an allele which may be quite different from the former one. Under independent selection

P(w_{i₁}w_{i₂}□ . . . □, t) = P(w_{i₁}□□ . . . □, t)·P(□w_{i₂}□ . . . □, t),

so that

d/dt[P(w_{i₁}w_{i₂}□ . . . □, t)] = d/dt[P(w_{i₁}□□ . . . □, t)·P(□w_{i₂}□ . . . □, t)]
= P(□w_{i₂}□ . . . □, t)·d/dt[P(w_{i₁}□□ . . . □, t)] + P(w_{i₁}□□ . . . □, t)·d/dt[P(□w_{i₂}□ . . . □, t)]
= P(w_{i₁}w_{i₂}□ . . . □, t)·[a(w_{i₁}□□ . . . □, t) + a(□w_{i₂}□ . . . □, t)].

Thus, under independent selection, combinations of alleles have a rate of change which is the sum of their average excesses. Reinterpreting Table 4 in terms of average excesses (noting that μ̂(t) ≅ 1), we see that the rate of change of the favorable w₁w₃□ . . . □ (Table 3) is

−0.2·P(w₁w₃□ . . . □),

while that of the less favorable w₂w₁□ . . . □ is +0.2·P(w₂w₁□ . . . □) under independent selection. Thus independent selection leads to maladaptation here.

As mentioned earlier, adaptation under independent selection amounts to adaptation under the operator equilibrium of section 6.2,

P(ξ, t) → λ(ξ, t) = ∏_i P(ᵢξ, t).

This is a common assumption in mathematical genetics, but it clearly leads to maladaptations whenever

a(ξ) ≠ Σ_i a(ᵢξ).

The above equation for a(ξ) in terms of μ̂_ξ shows this to be the case whenever μ̂_ξ ≠ Σ_i μ̂_{ᵢξ}, which occurs whenever the fitness is a nonlinear function of the alleles present, i.e., whenever there is epistasis.
On the other hand, under reproductive plans of type ℛ₁(P_C, P_I, P_M, ⟨c_t⟩), operator equilibrium is persistently destroyed by reproduction. In effect, useful linkages are preserved and nonlinearities (epistases) are exploited. Indeed, it would seem that the term "coadapted" is only reasonably used when alleles are peculiarly suited to each other, giving a performance when combined which is not simply the sum of their individual performances. Following Lemma 7.2, each coadapted set of alleles (schema) changes its proportion at a rate determined by the particular average (observed) fitness of its instances, not by the sum of the fitnesses of its component alleles.

(Because of the stochastic nature of the operators in genetic plans, each chromosome A ∈ 𝒜₁ has a probability of appearing in the next generation 𝔅(t + 1), a probability which is conditional on the elements appearing in 𝔅(t). If there are enough instances of ξ in 𝔅(t), the central limit theorem assures that μ̂_ξ(t) ≅ μ_ξ, where μ_ξ is the expected fitness of the coadapted set ξ under the given probability distribution over 𝒜₁. Thus the observed rate of increase of a coadapted set of alleles ξ will closely approximate the theoretical expectation once ξ gains a foothold in the population.)

Returning to the example just above, but now for genetic plans of type ℛ₁, we see (from Table 3) that w₁w₃□ . . . □ has a rate of change given by

+0.6·P(w₁w₃□ . . . □, t),

while w₂w₁□ . . . □ changes as

+0.1·P(w₂w₁□ . . . □, t).

Consequently, the coadapted set of alleles with the higher average fitness quickly predominates. Thus, when epistasis is important, plans of type ℛ₁ (and the corresponding theorems involving schemata) provide a better hypothesis than the hypothesis of independent selection (and least mean squares estimates of the fitness of sets of alleles).
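Iterating the two update rules with the constant factors used in the text shows the reversal directly (a sketch: in a real population the observed averages would shift as the proportions change, and the proportions here are left unnormalized over a short horizon):

    # independent, per-position selection multiplies the marginal ratios
    # (1.1 * 1.1 = 1.21 for w2w1..., 0.9 * 0.9 = 0.81 for w1w3...), while the
    # genetic plan multiplies by the combination's own observed fitness
    # (1.1 for w2w1..., 1.6 for w1w3...)
    indep = {'w1w3': 1 / 9, 'w2w1': 1 / 9}
    genetic = dict(indep)
    for _ in range(3):
        indep = {'w1w3': indep['w1w3'] * 0.81, 'w2w1': indep['w2w1'] * 1.21}
        genetic = {'w1w3': genetic['w1w3'] * 1.6, 'w2w1': genetic['w2w1'] * 1.1}
    print(indep)    # independent selection suppresses the superior pair w1w3
    print(genetic)  # the genetic plan lets the coadapted pair predominate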
5. GENERAL CONSEQUENCES
We see from Lemmas 7.2 and 7.3 that, under a genetic plan, a schema ξ which persists in the population 𝔅(t) for more than a generation or two will be ranked according to its observed performance. This is accomplished in a way which satisfies the desiderata put forth at the end of chapter 5. Specifically, the proportion of ξ's instances in the population 𝔅(t) will grow at a rate proportional to the
1. FIXED REPRESENTATION
Before proceeding to a "language" suited to the modification of representations it is worth looking at just how flexible a fixed representation can be. That a fixed representation has limitations is clear from the fact that only a limited number of subsets of 𝒜 can be represented or defined in terms of schemata based on that representation. If 𝒜 is a set of structures uniquely represented by l detectors, each taking on k values, then of the 2^{k^l} distinct subsets of 𝒜, only (k + 1)^l can be defined by schemata. However, the question is not so much one of defining all possible subsets, as it is one of defining enough "enriched" subsets, where an "enriched" subset is one which contains an above-average number of high-performance structures.
It is instructive, then, to determine how many schemata (on the given representation) are "enriched" in the foregoing sense. Let 𝒜 contain x structures which are of interest at time t (because their performance exceeds the average by some specified amount). If the attributes are randomly distributed over structures, determination of "enrichment" is a straightforward combinatoric exercise. More precisely, let each δᵢ be a pseudorandom function and let Vᵢ = {0, 1}, i = 1, . . . , l, so that a given structure A ∈ 𝒜 has property i (i.e., δᵢ(A) = 1) with probability 1/2. Under this arrangement peculiarities of the payoff function cannot bias the concentration of exceptional structures in relation to schemata.

Now, two exceptional structures can belong to the same schema only if they are assigned the same attributes on the same defining positions. If there are h defining positions this occurs with probability (1/2)^h. For j exceptional structures, instead of 2, the probability is (1/2^{j−1})^h. Since there are (l choose h) ways of choosing h out of l detectors, and (x choose j) ways of choosing j out of x exceptional structures, the expected number of schemata defined on h positions and containing exactly j exceptional structures is

(1/2^{j−1})^h (l choose h)(x choose j).

For example, with l = 40 and x = 10⁵ (so that the density of exceptional structures is x/2^l = 10⁵/2⁴⁰ ≅ 10⁻⁷), h = 20, and j = 10, this comes to

(1/2⁹)²⁰(40!/(20!20!))(100,000!/(99,990!10!)) ≅ 3.

Noting that a schema defined on 20 positions out of 40 has 2²⁰ ≅ 10⁶ instances, we see that the 10 exceptional structures occur with density 10⁻⁵, an "enrichment" factor of 100. A few additional calculations show that in excess of 20 schemata defined on 20 positions contain 10 or more exceptional structures.
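The expectation is a one-line computation (a sketch; math.comb requires Python 3.8 or later):

    from math import comb

    def expected_schemata(l, x, h, j):
        # expected number of schemata on h of l positions containing exactly
        # j of the x exceptional structures, for fair-coin attributes
        return (1 / 2 ** (j - 1)) ** h * comb(l, h) * comb(x, j)

    print(expected_schemata(40, 100_000, 20, 10))   # about 2.5, the text's ~3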
For given h and j, the "enrichment" factor rises steeply as l increases. On the other hand an increase in x (corresponding to an extension of interest to structures with performances not so far above the average) acts most directly on the expected number of schemata containing j structures. With an adequate number of pseudorandom functions as detectors (and a procedure for assuring that every combination of attributes designates a testable structure), the adaptive plan will have adequate grist for its mill. Stated another way, even when there can be no correlation between attributes and performance, the set of schemata cuts through the space of structures in enough ways to provide a variety of "enriched" subsets. Intrinsic parallelism assures that these subsets will be rapidly explored and exploited.
"
LANGUAGB
2. THE " BROADCAST
Though the foregoing is encouraging as to the range of partitions offered by a given set of schemata, something more is desirable when long-term adaptation is involved. First of all, when the payoff function is very complex, it is desirable to adapt the representation so that correlations between attributes and performance are generated. Both higher proportions of "enriched" schemata and higher "enrichment" factors result. It is still more important, when the environment provides signals in addition to payoff, that the adaptive plan be able to model the environment by means of appropriate structures (the ℳ component of 𝒜 = 𝒜₁ × ℳ in section 2.1). In this way large (non-payoff) information flows from the environment can be used to improve performance. As suggested in sections 3.4 and 3.5, by a process of generating predictions with the model, observing subsequent outcomes, and then compensating the model for false predictions, adaptation can take place even when payoff is a rare occurrence.

To provide these possibilities, the set of representations and models available to the plan must be defined. Further flexibility results if provision is made, within the same framework, for defining operators useful in modifying representations and models. A natural way to do this is to provide a "language" tailored to the precise specification of the representations and operators, a language which can be employed by the adaptive plan. Some earlier observations suggest additional, desirable properties of this language:

1. It should be convenient to present the representations, models, operators, etc. as strings so that schemata and generalized genetic operators can be defined for these extensions.
2. The functional "units" (cf. detectors, etc.) should have the same interpretation (function) regardless of their positioning within a string, so that advantage can be taken of the associations provided by positional proximity (section 6.3).
3. The number of alternatives at each position in a string should be small so that a richer set of schemata is provided for a given size of 𝒜 (see the comments in the middle of chapter 4).
"
, in the senseof being able to define within the language
Completeness
all effective representationsand operators, should be provided so that
the languageitself placesno long-term limits on the adaptive plan.
"
145
For example, *11▽:▽ will broadcast the suffix of any signal having the symbols 11 as prefix. Thus if 1100 is detected, the signal 00 will be broadcast, whereas if the signal 11010 is detected, the signal 010 will be broadcast. (The resolution of conflicts, where more than one signal satisfies the input condition, is detailed below.) All occurrences of ▽ within a given broadcast unit designate the same substring, but occurrences in different broadcast units are independent of each other.

▼  A second symbol used in the same manner as ▽. It makes concatenation of inputs possible (see below).

△  This symbol serves much as ▽ and ▼, but designates an arbitrary single symbol in the arguments of a given broadcast unit.

For example, *11△0:1△ broadcasts 10 if 1100 is detected, or 11 if 1110 is detected.

p  When p occurs as the first symbol of a string it designates a string which persists through time (until deleted), even though it is not a broadcast unit.

′  This symbol is used to quote symbols in the arguments of a broadcast unit.

For example, *11′△:10 broadcasts 10 only if the (unique) string 11△ is detected (i.e., the symbol △ occurs literally at the third position); without the quote the unit would broadcast 10 whenever any 3-symbol string with the prefix 11 is detected.
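A minimal matcher for single broadcast-unit arguments makes the conventions concrete (a sketch only: ASCII stand-ins 'v' and 'd' are used for ▽ and △, one ▽-symbol is supported, and quoting and conflict resolution are omitted):

    def match(pattern, signal, bound=None):
        # return the bindings of 'v' and 'd' if pattern matches signal, else None
        bound = dict(bound or {})
        if not pattern:
            return bound if not signal else None
        head = pattern[0]
        if head == 'v':                        # arbitrary substring
            for cut in range(len(signal) + 1):
                if 'v' in bound and bound['v'] != signal[:cut]:
                    continue
                result = match(pattern[1:], signal[cut:],
                               dict(bound, v=signal[:cut]))
                if result is not None:
                    return result
            return None
        if head == 'd':                        # exactly one arbitrary symbol
            if signal and bound.get('d', signal[0]) == signal[0]:
                return match(pattern[1:], signal[1:], dict(bound, d=signal[0]))
            return None
        if signal and signal[0] == head:       # literal 0 or 1
            return match(pattern[1:], signal[1:], bound)
        return None

    def broadcast(pattern, output, signal):
        b = match(pattern, signal)
        return None if b is None else ''.join(b.get(ch, ch) for ch in output)

    print(broadcast('11v', 'v', '11010'))    # '010', as in the text
    print(broadcast('11d0', '1d', '1100'))   # '10',  as in the text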
The interpretations of the various strings from Λ*, the set of strings over Λ, along with the conventions for resolving conflicts, follow.

Let I be an arbitrary string from Λ*. In I a symbol is said to be quoted if ′ occurs at its immediate left. I is parsed into broadcast units as follows: The first broadcast unit is designated by the segment from the leftmost unquoted * to (but not including) the next unquoted * on the right (if any). (Any prefix to the left of the leftmost unquoted * is ignored.) The second, third, etc., broadcast units are obtained by repeating this procedure for each successive unquoted * from the left. If I contains no unquoted *s it designates the null unit, i.e., it does not broadcast a signal under any condition. Thus

p10*11△0:1△*:11▽′*:11▽

designates two broadcast units, namely *11△0:1△ and *:11▽′*:11▽.
There are four types of broadcast unit (other than the null unit). To determine the type of a broadcast unit from its designation, first determine if there are three or more (unquoted) : to the right of the *. If so ignore the third : and everything to the right of it. The remaining substring, which has a * at the initial position and at most two :s elsewhere, designates one of the four types if it has one of the following four organizations:

1. *I₁:I₂
2. *:I₁:I₂
3. *I₁::I₂
4. *I₁:I₂:I₃

where I₁, I₂, and I₃ are arbitrary non-null strings from Λ* except that they contain neither unquoted *s nor unquoted :s. If the substring does not have one of these organizations it designates the null unit. The four basic types have the following functions (subject to the conventions for eliminating ambiguities, which follow).

1. *I₁:I₂. If a signal of type I₁ is present at time t, then the signal I₂ is broadcast at time t + 1.
2. *:I₁:I₂. If there is no signal of type I₁ present at time t, then the signal I₂ is broadcast at time t + 1.
3. *I₁::I₂. If a signal of type I₁ is present at time t, then a persistent string of type I₂ (if any exists) is deleted at the end of time t.
S(0) = {*11▽:0▼:11▽▼, *100:100:000, 100, 110, 000},

and at t = 1 it is

S(1) = {*11▽:0▼:11▽▼, *100:100:000, 100, 000, 11000}.

The latter signal in S(1), 11000, occurs because the unit *11▽:0▼:11▽▼ receives both the signal 110 and the signal 000 at t = 0, so that ▽ = 0 and ▼ = 00, and hence the output 11000 occurs at time t = 1. A little thought shows that the instantaneous action of type 4 units does not interfere with the determination of a unique state at each time since type 4 units can add at most a finite number of signals to the current state.
Since the symbols ▽ and ▼ are meant only to designate initial or terminal strings, their placement within the arguments of a broadcast unit can be critical to unambiguous interpretation. For types 1 through 4, if I₁ contains exactly one
3. USAGE

The following examples exhibit typical constructions and operations within the "broadcast language":
1. The object is to produce the concatenation of two arbitrary persistent strings uniquely identified by the prefixes I₁ and I₂ respectively. In so doing the prefixes should be dropped and the result should be identified by the new prefix I₃. This is accomplished by the broadcast units
To cleave the string at the ith occurrence of I₀, instead of the first, a count must be made of successive occurrences of I₀. Since the signal J₅ signals such an occurrence, this means counting successive occurrences of J₅, restarting the process each time the count is incremented (by issuing the signal J₃) until the count reaches i. The next example indicates how a binary counter can be set up to record the count.

5. The object is to count (modulo 2^m) occurrences of a signal J₅. The basic technique is illustrated by the construction of a one-stage binary counter. The transition function (table) for a one-stage binary counter is
The use of broadcast units to realize the given transition table is perfectly general and allows the realization of arbitrary transition functions (including counts modulo 2^m).
6. Treating the persistent strings as data implies that it should be possible to process them in standard computational ways. As a typical operation consider the addition of two persistent binary integers. The object, then, is to set up broadcast units which will carry out this addition. Let Λ₁ and Λ₂ be the suffixes which identify the two strings. The addition can be carried out serially, digit by digit, from right to left. Much as in example (4) the "rightmost" digits are successively extracted by the broadcast units

*▽△Λ₁:I₁△
*▽△Λ₂:I₂△

These digits, together with the "carry" from the operation on the previous pair of digits, identified by prefix I₃, are submitted as in example (5) to broadcast units realizing the transition table:
 I₁ (Addend 1)   I₂ (Addend 2)   I₃ (Carry)   I₄ (Sum)   I₅ (New Carry)
 0               0               0            0          0
 0               0               1            1          0
 0               1               0            1          0
 0               1               1            0          1
 1               0               0            1          0
 1               0               1            0          1
 1               1               0            0          1
 1               1               1            1          1
Successive digits of the sum are assembled by the broadcast unit *I₄△:pΛ▽:pΛ△▽ where, at the end of the process, the prefix Λ designates the sum. A few additional "housekeeping" units, such as *pΛ▽:p▽Λ, which puts the sum in the same form as the addends, are required to start up the process, keep track of position, etc. The overall process is simply a straightforward extension of techniques already illustrated.
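The transition table drives a conventional serial adder directly; the sketch below (names mine) performs the same right-to-left digit processing the broadcast units carry out:

    # (addend1 digit, addend2 digit, carry) -> (sum digit, new carry)
    ADD = {(0, 0, 0): (0, 0), (0, 0, 1): (1, 0),
           (0, 1, 0): (1, 0), (0, 1, 1): (0, 1),
           (1, 0, 0): (1, 0), (1, 0, 1): (0, 1),
           (1, 1, 0): (0, 1), (1, 1, 1): (1, 1)}

    def serial_add(a, b):
        # digits processed right to left, as the extraction units supply them
        a, b = a[::-1], b[::-1]
        carry, out = 0, []
        for i in range(max(len(a), len(b))):
            d1 = a[i] if i < len(a) else 0
            d2 = b[i] if i < len(b) else 0
            s, carry = ADD[(d1, d2, carry)]
            out.append(s)
        if carry:
            out.append(carry)
        return out[::-1]

    print(serial_add([1, 0, 1, 1], [1, 1, 0]))   # 1011 + 110 = 10001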
7. As a final example note that any string identified with a suffix I can be reproduced by the broadcast unit *▽I:▽I. Note additionally that this unit itself has the suffix I! Hence, if we start with this unit alone, there will be 2^t copies of it after t time-steps. By revising the unit a bit, so that its action is conditional on a signal J, *▽I:J:▽I, this self-reproduction can be controlled from outside (say by other broadcast units). By extending this idea, with the help of the techniques outlined previously, we can put together a set of broadcast units which reproduces an arbitrary set of broadcast units (including itself). The result is a self-reproducing "entity" which can be given any of the powers expressible in the broadcast language.
"
At this point it would not be difficult to give the " broadcast language a
precise, axiomatic formulation , developing the foregoing examplesinto a formal
, say, in
proof of its powers. ( For anyone familiar with the material presented
Arbib [ 1964] or Minsky [ 1967] this turns out to be little more than a somewhat
tedious exercise.) However, our presentobjectiveswould be little advancedthereby.
"
It is already reasonably clear that the " broadcastlanguage exhibits the desiderata
outlined at the beginning of section 2. In particular, the broadcastunits satisfy the
functional integrity requirement(2) in a straightforward way. Consequently, strings
of broadcast units can be manipulated by generalized genetic operators with
attendant advantagesvis-a-vis schemata(seesection6.3 and the closeof chapter 7) .
Moreover a little thought shows that by using the techniques of usage (4) along with those of (2), units can be combined to define a crossover operator which acts only at specified punctuations (such as *'s or :'s, or at a particular indicator string). The other generalized genetic operators can be similarly defined. New detectors can be formed naturally from environmental signals (represented as binary strings). For example, a signal can be converted to an argument which will detect similar signals (elements of a superset) simply by inserting "don't cares" (◊) at one or more points. Thus, *E▽:pI▽ converts any signal with prefix E ("environmental") into a permanent piece of data which can then be manipulated as in usages (4) and (6) to form a new broadcast unit with some modification of the signal as an argument.
The collection of broadcast units employed by an adaptive system at any time will, in effect, determine its representation of the environment. Since the units themselves are strings which can be manipulated by generalized genetic operators, strings of units ("devices") can be made subject to reproductive plans and intrinsic parallelism in processing. More than this, the adaptive plan can modify and coordinate the broadcast units to form models of the environment. By implementing, within the language, the "prediction and correction" techniques for models discussed in Illustration 3.4, we can arrive at a very sophisticated adaptive plan, one which can rapidly overcome inadequacies in its representation of the environment. This approach will be elaborated in the next section. The "broadcast language" already makes it clear that there exist languages suitable both for defining arbitrary representations and for defining the operators which allow these representations to be adapted to environmental requirements.
4. CONCERNING APPLICATIONS AND THE USE OF GENETIC PLANS TO MODIFY REPRESENTATIONS
Fig. 14. Schematic of the Britten-Davidson generalized operon-operator model for gene regulation in higher cells (sensor gene, integrator gene, receptor gene, producer gene, molecular signal chain).
The unit(s) implementing the detector, when activated, would broadcast a signal with an identifying prefix. (For the reader familiar with the early history of pattern recognition these units act much like the "demons" at the lowest level of Selfridge's "Pandemonium" [1959].) Other devices would weight the signals, sum them, and "compare" the result to a threshold (cf. section 7.3) to determine which response signal (from the set of transformations {ηᵢ}) should be broadcast. More generally, the behavioral atoms would be a string of broadcast units with an "initiate" condition C which specifies the set of signals capable of activating the atom, an "end" signal J which indicates the end of the atom's activity, and a "predicted value" signal which is meant to indicate the ultimate value to the behavioral unit of that atom's activation. With this arrangement we can treat the behavioral unit as a population of atoms. The atom activated at any given time is determined by a competition between whatever atom is already activated and all atoms having condition C satisfied by a signal in the current state S(t). The higher the predicted value of the atom the more likely it is to win the competition. The object at this level is to have each atom's predicted value V consistent with that of its successor, so that a set of atoms acting in sequence provides a consistent prediction of their value to the behavioral unit. (In this way the atoms satisfy the error correction requirements discussed at the beginning of section 3.4 under element (iii) of a typical search plan.) This object can be accomplished via a genetic plan applied to the population of atoms: the reproduction of any atom is determined by the match between its predicted value and that of whatever atom is next activated. For example, consider two atoms, a₁ with parameters (C, J, V) and a₂ with parameters (C′, J′, V′), where a₁'s end signal acts as a₂'s initiate signal. Then (V′ − |V′ − V|) could be used as a payoff to a₁ for purposes of the genetic plan, since the quantity measures the match between V′ and V. The population would then be modified as outlined in section 6.1, new atoms being assigned the predicted value of the successor a₂. That is, the offspring of a₁ would be assigned the predicted value V′. All atoms active since the last actual payoff from the environment, and their offspring, are tagged and their predicted values are adjusted up or down at the time of the next environmental payoff. The adjustment is determined by the difference between predicted value and the actual payoff rate (payoff received from the environment divided by the elapsed time since last payoff). After each environmental payoff the active behavioral unit is subjected to a genetic plan (again as described in section 6.1). The behavioral unit next active (after the environmental payoff) is determined by the winner of a competition among all atoms in all behavioral units. The outcome of the competition is determined in the same way as the within-unit competition. Finally, a behavioral unit may be subjected to
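The consistency payoff just described is easy to make concrete. The following is a minimal sketch, not the book's mechanism: the atom names, the strength field, and the roulette-style competition are assumptions introduced for illustration; only the payoff V′ − |V′ − V| and the rule that offspring take the successor's predicted value come from the text.

import random

class Atom:
    def __init__(self, name, value):
        self.name = name        # identifies the atom
        self.value = value      # predicted value V
        self.strength = 0.0     # accumulated payoff under the genetic plan

def consistency_payoff(v_pred, v_next):
    """Payoff V' - |V' - V| to an atom whose successor predicts V'."""
    return v_next - abs(v_next - v_pred)

def compete(candidates):
    """Pick the next atom; higher predicted value -> more likely to win."""
    total = sum(max(a.value, 1e-9) for a in candidates)
    r = random.uniform(0, total)
    for a in candidates:
        r -= max(a.value, 1e-9)
        if r <= 0:
            return a
    return candidates[-1]

a1, a2 = Atom("a1", value=5.0), Atom("a2", value=8.0)
a1.strength += consistency_payoff(a1.value, a2.value)  # 8 - |8 - 5| = 5
winner = compete([a1, a2])   # a2 is more likely to win (higher V)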
In sum: This chapter has been concerned with removing the limitations imposed by fixed representations. To this end it is possible to devise languages (the broadcast language is an example) which use strings to rigorously define all effectively specifiable representations, models, operators, etc. Since the objects of the language are presented as strings, they can be made grist for the mill provided by genetic plans. As a consequence the advantages of compact storage of accrued information, operational simplicity, intrinsic parallelism, robustness, etc., discussed in chapters 6 and 7, extend to adaptation of representations.
9. An Overview

Enough of the theoretical framework has now been erected that we can begin to view it as a whole. To this end, the present chapter will discuss three general aspects of the theory. Section 1 will concentrate on those insights offered by the theory which are useful across the full spectrum of adaptive problems. Section 2 provides a synopsis of several computer studies to give the reader an idea of how the overall theory works in particular contexts. Section 3 will outline several difficult long-range problems which fall within the scope of the theory.
1. INSIGHTS
Within the theoretical framework problems of adaptation have been phrased in terms of generating structures of progressively higher performance. Because the framework itself places no constraints on what objects can be taken as structures, other than that it be possible to rank them according to some measure of performance, the resulting theory has considerable latitude. Once adaptation has been characterized along these lines, it is also relatively easy to describe several pervasive, interrelated obstacles to adaptation, obstacles which occur in some combination in all but the most trivial problems:

1. High cardinality of 𝒜. The set of potentially interesting structures is extensive, making searches long and storage of relevant data difficult.

2. Apportionment of credit. Knowledge of properties held in common by structures of above-average performance is incomplete, making it difficult to infer from past tests what untested structures are likely to yield above-average performance.

3. High dimensionality of μ_E. Performance is a function of large numbers of variables, making it difficult to use classical optimization methods employing gradients, etc.
2. COMPUTER STUDIES
At the time of this writing several computer studies of genetic plans have been completed (and more are underway). Four studies closely related to the theoretical framework will be outlined here: R. S. Rosenberg's Simulation of Genetic Populations with Biochemical Properties (1967), D. J. Cavicchio's Adaptive Search Using Simulated Evolution (1970), R. B. Hollstien's Artificial Genetic Adaptation in Computer Control Systems (1971), and D. R. Frantz's Non-linearities in Genetic Adaptive Search (1972).

Richard Rosenberg completed his computer study of closed, small populations while formulation of the theoretical framework was still in its early stages. He concentrated on the complicated relationship between genotype and phenotype under dynamic interaction between the population and its environment. The model's central feature is the definition of phenotype by chemical concentrations. These concentrations are controlled by enzymes under genetic control. Epistasis has a critical role because several enzymes (and hence the corresponding genes) can affect any given phenotypic characteristic (chemical concentration). Though the variety of molecules, enzyme-controlled reactions, and genes is kept small to make the study feasible, it still presents a detailed picture of the propagation of
deviations above the mean in less than 600 trials (as compared to 3 standard deviations in 1000 trials).
Against this background Cavicchio then developed and tested a series of reproductive plans. The best of these attained a score of 75.5 in 780 trials, a score considerably beyond that attained in any of the "detector evaluation" runs. (In qualitative terms, a score of 52 would correspond to a "poor" human performance, while a score of 75.5 would correspond to a "good" human performance. Because many characters in the "difficult task" are quite similar in form, increments in scoring are difficult to attain after the easily distinguished characters have been handled.) An important general observation of this study is that the sophistication and power of a genetic plan is lost whenever M, the size of the population (data base), is very small. It is an overwhelming handicap to use only the most recent trial (M = 1) as the basis for generating new trials (cf. Fogel et al. 1966). On the other hand, the population need not be large to give considerable scope to genetic plans (20 was a population size commonly used by Cavicchio).
Roy Hollstien added considerably to our detailed understanding of genetic plans by making an extensive study of genetic plans as adaptive control procedures. His emphasis is on domains wherein classical "linear" and "quadratic" approaches are unavailing, i.e., domains where the performance function exhibits discontinuities, multiple peaks, plateaus, elongated ridges, etc. To give the problems a uniform setting he transforms them to discrete function optimization problems, encoding points in the domain as strings (see p. 57). An unusual and productive aspect of Hollstien's study is his translation of breeding plans for artificial genetic selection into control policies. A breeding plan which employs inbreeding within related (akin) policies, and recurrent crossbreeding of the best policies from the best families, is found to exhibit very robust performance over a range of 14 carefully selected, difficult test functions. (The test functions include such "standards" as Rosenbrock's ridge, the sum of three Gaussian 2-dimensional density functions, and a highly discontinuous "checkerboard" pattern.) The test functions are represented on a grid of 10,000 points (100 by 100). In each case the region in which the test function exceeds 90 percent of its maximum value is small. For example, test function 7 with two false peaks (the sum of three Gaussian 2-dimensional densities) exceeds 90 percent of its maximum value on only 42 points out of the 10,000. The breeding plans are tested over 20 generations of 16 individuals each, special provisions being made to control random effects of small sample size ("genetic drift"). The breeding plan referred to above, when confronted with test function 7, placed all of its trials in the "90 percent region" after 12 generations (192 trials). A random search would be expected to take 250 trials
(10,000/42) to place a single point in the "90 percent region." The same breeding plan performs as well or better on the 13 other test functions. Given the variety of the test functions, the simplicity of the basic algorithms, and the restricted data base, this is a striking performance.
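Hollstien's programs are not reproduced here, but the setting is easy to imitate. The following is an illustrative sketch only: the two-peak test function, the 7-bit-per-coordinate encoding, and the selection scheme are invented stand-ins; the population size (16) and number of generations (20) match the study described above.

import math, random

def decode(bits):  # 14 bits -> a point on the 100 x 100 grid
    x = int(bits[:7], 2) * 99 // 127
    y = int(bits[7:], 2) * 99 // 127
    return x, y

def fitness(bits):
    x, y = decode(bits)
    return (math.exp(-((x - 70)**2 + (y - 30)**2) / 200.0)            # true peak
            + 0.6 * math.exp(-((x - 20)**2 + (y - 80)**2) / 200.0))   # false peak

def crossover(a, b):
    cut = random.randint(1, 13)
    return a[:cut] + b[cut:]

def mutate(bits, rate=0.02):
    return "".join(c if random.random() > rate else "10"[int(c)] for c in bits)

random.seed(0)
pop = ["".join(random.choice("01") for _ in range(14)) for _ in range(16)]
for gen in range(20):                       # 20 generations of 16 individuals
    weights = [fitness(s) for s in pop]     # fitness-proportionate selection
    pop = [mutate(crossover(*random.choices(pop, weights, k=2)))
           for _ in range(16)]
best = max(pop, key=fitness)
print(decode(best), fitness(best))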
Daniel Frantz concentrated on the internal workings of genetic plans, observing the effect, upon the population, of dependencies in the performance function. Specifically, he studies situations in which the quantity to be optimized is a function of 25 binary parameters; i.e., ℰ consists of functions which are 25-dimensional and have a domain of 2^25 ≈ 3.4 × 10^7 points. Dependencies between the parameters (nonlinearities) are introduced to make it impossible to optimize the functions dimension by dimension (unimodality is avoided). Frantz's procedure is to detect the effects of these dependencies upon population structure (gene associations) by using a multidimensional chi-square contingency table. As expected from theoretical considerations (see Lemma 7.2 and the discussion following it) algebraic dependencies (between the parameters) induce statistical dependencies (between alleles). That is, in the population, combinations of alleles associated with dependent parameters occur with probabilities different from the product of the probabilities of the individual alleles. Moreover there is a positional effect on the rate of improvement: For functions with dependencies the rate of improvement is significantly greater when the corresponding alleles are close together in the representation. This effect corresponds to the theoretical result that the ability to pass good combinations on to descendants depends upon the combinations' immunity to disruption by crossover. It is significant that, for the problems studied, the optimum was attained in too short a time for the inversion operator to effectively augment the rate of improvement (by varying positional effects).
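The detection idea is the standard contingency-table test. A minimal sketch for the two-locus case (the counts are made up for illustration): under independence the expected cell counts are the products of the marginal frequencies, and a large chi-square statistic signals an allele association of the kind Frantz observed.

def chi_square_2x2(counts):
    """counts[i][j] = individuals with allele i at locus 1 and allele j at locus 2."""
    n = sum(sum(row) for row in counts)
    row = [sum(r) for r in counts]
    col = [sum(c) for c in zip(*counts)]
    stat = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = row[i] * col[j] / n
            stat += (counts[i][j] - expected) ** 2 / expected
    return stat

# Alleles co-occur far more often than independence predicts:
print(chi_square_2x2([[40, 10], [10, 40]]))  # large statistic -> association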
3. ADVANCED QUESTIONS
The results presented in this book have a bearing on several problem areas substantially more difficult than those recounted in section 9.1. Each of these problems has a long history and is complex enough to make sudden resolution unlikely. Nevertheless the general framework does help to focus several disparate results, providing suggestions for further progress.

As a first example, let us look at the complex of problems concerned with the dynamics of speciation. These problems have their origin in biology, but a close look shows them to be closely related to problems in the optimal allocation of
limited resources. To see this, consider the following idealized situation. There are two one-armed bandits, bandit B₁ paying 1 unit with probability P₁ on each trial, bandit B₂ paying 1 unit with probability P₂ < P₁. There are also M players. The casino is so organized that the bandits are continuously (and simultaneously) operated, so that at any time t, for a modest fee, a player may elect to receive the payoff (possibly zero) of one of the two bandits. The manager has, however, introduced a gimmick. If M₁ players elect to play bandit B₁ at time t, they must share the unit of payoff if the outcome is successful. That is, on that particular trial, each of the M₁ players will receive a payoff of 1/M₁ with probability P₁. Now, let us assume that the M players must participate for a period of T consecutive trials. If there is but one player (M = 1), clearly he will maximize his income (or minimize his losses) by playing bandit B₁ at all times. However, if there are M > 1 players the situation changes. There will be stable queues, where no player can improve his payoff by shifting from one bandit to another. These occur when the players distribute themselves in the ratio M₁/M₂ = P₁/P₂ (at least as closely as allowed by the requirement that M₁ and M₂ be integers summing to M). For example, if P₁ = 4/5, P₂ = 1/5, and M = 10, there will be 8 players queued in front of bandit B₁ and 2 players in front of bandit B₂. We see that with limited resources (in the numerical example, a maximum of 2 units payoff per trial and an expectation of 1 unit) the population of players must divide into two subpopulations in order to optimize individual shares of the resources (the "bandit B₁" players and the "bandit B₂" players). Similar considerations apply when there are r > 2 bandits.
We have here a rough analogy to the differentiation of individuals (the subpopulations) to exploit environmental niches (the bandits). The analogy can be made more precise by recasting it in terms of schemata. Let us consider a population of M individuals and the set of 2^l₀ schemata defined on a given set of l₀ positions. Assume that schema ξᵢ, i = 1, . . . , 2^l₀, exploits a unique environmental niche which produces a total of Qᵢ units of payoff per time step. (Qᵢ corresponds to the renewal rate of a critical, volatile resource exploited by ξᵢ.) If the population contains Mᵢ instances of ξᵢ, the Qᵢ units are shared among them so that each instance of ξᵢ receives a payoff of Qᵢ/Mᵢ. Let Q(1) > Q(2) > . . . > Q(2^l₀) so that schema ξ(1) is associated with the most productive niche, ξ(2) with the second most productive niche, etc. Clearly when M(1) is large enough that Q(1)/M(1) < Q(2), an instance of ξ(2) will be at a reproductive advantage. Following the same line of argument as in the case of the 2 one-armed bandits, we get as a stable distribution the obvious generalization:

M(i) = M·Q(i)/Σⱼ Q(j)
could take place in the absence of isolation (in contrast to the usual view, cf. Mayr 1963).
Once an adaptive system discovers that given combinations of genes (or their alleles) offer a persistent advantage, new modes of advance become possible. If the given combinations can be handled as units they can serve as components ("super genes") for higher order units. In effect the system can ignore the details underlying the advantage conferred by a combination, and operate simply in terms of the advantage conferred. By so doing the system can explore regions of 𝒜, i.e., combinations of the new units, which would otherwise be tried with a much lower probability. (For example, consider two combinations of 10 alleles each under the steady state of section 7.2. If each of the alleles involved occurs with a frequency of 0.8, the overall combination of 20 alleles will occur with a frequency (0.8)^20 ≈ 0.01. On the other hand, if each of the two 10-allele combinations is maintained at a frequency of only 0.5, then the 20-allele combination will occur with frequency (0.5)^2 = 0.25. I.e., the expected time to occurrence will be reduced by a factor of 25.) Since combinations of advantageous units often offer an advantage beyond that of the individual units, as when the units' effects are additive (linear independence) or cooperative, they are good candidates for early testing. (The cooperative case where one unit effects an enrichment which can be exploited by another is particularly common; cf. cooperating cell assemblies or stages of a complex production activity such as illustrated in Figure 3.)
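A quick check of the frequencies used in this argument:

print(0.8 ** 20)         # ~0.0115, i.e., roughly the 0.01 quoted above
print(0.5 ** 2)          # 0.25 for the two 10-allele units each held at 0.5
print(0.5 ** 2 / 0.01)   # the factor of 25 quoted in the text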
We have already discussed (section 6.3) the way in which inversion can favor association between genes. However, by controlling representation, the adaptive system can bring about changes which go much further, producing a hierarchy of units. The basic mechanism stems from the introduction of arbitrary punctuation marks to control operators (see usage (4) in section 8.3 and the discussion on pages 152-53). The adaptive system introduces a distinct punctuation mark (specific symbol string) to mark off the combinations which are to be treated as units at a given level of the hierarchy. Then the operators for that level are restricted to act only at that punctuation. (E.g., crossover takes place only at the positions marked by the given punctuation; see the sketch following this paragraph.) By introducing another punctuation mark to treat combinations of these units, in turn, as new units, and so on, the hierarchy can be extended to any number of levels. The resulting structure offers the possibility of quickly pinpointing responsibility for good or bad performance. (E.g., a hierarchy of 5 levels in which each unit is composed of 10 lower level units allows any one of 10^5 components to be selected by a sequence of 5 tests.) In the hierarchy, the units at each level are subject to the same "stability" considerations
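A minimal sketch of level-restricted crossover, with invented punctuation marks: '|' marks off level-1 units and '||' marks off level-2 units, and crossover at a given level cuts only at that level's punctuation, so lower-level units pass from parent to offspring intact.

import random

def crossover_at(punct, a, b):
    """One-point crossover cutting only at occurrences of `punct`."""
    cuts_a = [i for i in range(len(a)) if a.startswith(punct, i)]
    cuts_b = [i for i in range(len(b)) if b.startswith(punct, i)]
    if not cuts_a or not cuts_b:
        return a
    ca, cb = random.choice(cuts_a), random.choice(cuts_b)
    return a[:ca] + b[cb:]

a = "abc|def|ghi||jkl|mno"
b = "rst|uvw||xyz|qqq"
print(crossover_at("||", a, b))   # level-2 units exchanged whole
print(crossover_at("|", a, b))    # may recombine within a level-2 unit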
10. Interim and Prospectus
1. IN THE INTERIM
Classifier Systems

Classifier systems were introduced in Holland 1976 and were later revised to the current "standard" form in Holland 1980. There is a comprehensive description of the standard form, with examples, in Holland 1986, but there are now many variants (see Belew and Booker 1991). A classifier system is more restricted than the broadcast language in just one
Fig. 15. Schematic of a classifier system: input interface (detectors), message list, rule (classifier) list, and output interface (effectors); a bucket brigade adjusts rule strengths and a genetic algorithm generates new rules, in interaction with the environment.
bit-strings, there are no consistency problems in the internal processing. Consistency problems do arise at the effectors; when different simultaneous messages urge an effector to take mutually exclusive actions, they are resolved by competition.
Competition plays a central role in determining just which rules are active at any given time. To provide a computational basis for the competition, each rule is assigned a quantity, called its strength, that summarizes its average past usefulness to the system. We will see shortly that the strength is automatically adjusted by a credit assignment algorithm, as part of the learning process. Competition allows rules to be treated as hypotheses, more or less confirmed, rather than as incontrovertible facts. The strength of a rule corresponds to its level of confirmation; stronger rules are more likely to win the competition when their conditions are satisfied. Stated another way, the classifier system's reliance upon a rule is based upon the rule's average usefulness in the contexts in which it has been tried previously. Competition also provides a means of resolving conflicts when effectors receive contradictory messages.

A rule, then, enters a competition to post its message any time its conditions are satisfied. The actual competition is based on a bidding process. Each satisfied rule makes a bid based upon its strength and its specificity. In its simplest form, the bid for a rule r of strength s(r) would be proportional to both factors, bid(r) = c · specificity(r) · s(r), for some small constant c.
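As a concrete (and assumed) reading of that formula, with specificity taken as the fraction of non-# positions in the condition:

def bid(strength, condition, c=0.1):
    """Bid proportional to strength and to specificity of the condition."""
    specificity = sum(ch != "#" for ch in condition) / len(condition)
    return c * specificity * strength

print(bid(100.0, "1##0#1##"))  # stronger, more specific rules bid more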
Let us begin with the credit assignment problem. Credit assignment is not particularly difficult where the situation provides immediate reward or precise information about correct actions. Then the rules directly involved are simply strengthened. Credit assignment becomes difficult when credit must be assigned to early acting rules that set the stage, making possible later useful actions. Stage-setting moves are the key to success in complex situations, such as playing chess or investing resources. The problem is to credit an early action, which may look poor (such as the sacrifice of a piece in chess) but which sets the stage for later positive actions (such as the capture of a major piece in chess). When many rules are active simultaneously, the problem is exacerbated. It may be that only a few of the early acting rules contribute to a favorable outcome, while others, active at the same time, are ineffective or even obstructive. Somehow the credit assignment algorithm must sort this out, modifying rule strengths appropriately.
Credit assignment in classifier systems is based on competition. The bidding process mentioned earlier is treated as an exchange of "capital" (strength). That is, when a rule wins the competition, it actually "pays" its bid to the rule(s) that sent the message(s) satisfying its conditions. The rule acts as a kind of go-between or broker in a chain that leads from the stage-setting situation to the favorable outcome.

In a bit more detail, when a rule competes, its suppliers are those rules that have sent messages satisfying its conditions and its consumers are those rules that have conditions satisfied by its message. Under this regime, we treat the strength of a rule as capital and the bid as payment to its suppliers. When a rule wins, its bid is apportioned to its suppliers, increasing their strengths. At the same time, because the bid is treated as a payment for the right to post a message, the strength of the winning rule is reduced by the amount of its bid. Should a rule bid but not win, its strength is unchanged and its suppliers receive no payment. The resulting credit assignment procedure is called a bucket brigade algorithm (see Figure 16).

Winning rules can recoup their payments in two ways: (1) They, in turn, have winning consumers that make payments to them, or (2) they are active at a time when the system receives payoff from the environment. Case (2) is the sole way in which payoff from the environment affects the system. When payoff occurs, it is divided among the rules active at that instant, their strengths being increased accordingly. Rules not active at the time the payoff occurs do not share directly in that payoff. The system must rely on the bucket brigade algorithm to distribute the increased strength to the stage-setting rules, under repeated activations in similar situations.

The bucket brigade works because rules become strong only when they belong to sequences leading to payoff. To see this, first note that rules consistently active at times of payoff tend to become strong because of the payoff they receive from the environment. As these rules grow stronger, they make larger bids. A rule that "supplies" one of the payoff rules then benefits from these larger bids in future transactions.
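A minimal sketch of this flow of strength along a fixed supplier-to-consumer chain (the bid fraction, the payoff value, and the assumption that every rule wins each round are all simplifications for illustration):

BID_FRACTION = 0.1          # each winner bids this fraction of its strength

def run_chain(strengths, payoff, rounds):
    """Rules 0..n-1 fire in order each round; rule i pays its bid to rule i-1,
    and the rule active at the end receives the environmental payoff."""
    n = len(strengths)
    for _ in range(rounds):
        bids = [BID_FRACTION * s for s in strengths]
        for i in range(n):
            strengths[i] -= bids[i]          # winner pays its bid...
            if i > 0:
                strengths[i - 1] += bids[i]  # ...to its supplier
            # the first rule's bid simply leaves the system here; in a full
            # classifier system it would go to the input interface
        strengths[-1] += payoff
    return strengths

print(run_chain([100.0, 100.0, 100.0], payoff=50.0, rounds=20))
# Strength flows backward along the chain: early, stage-setting rules become
# strong only because they belong to a sequence that leads to payoff.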
Fig. 16. The bucket brigade algorithm: a chain of coupled rules C, C′, C″ (each satisfied by its predecessor's message) passes bid payments backward at successive times of activation, the rules' strengths being adjusted from one time step to the next.
2. THE OPTIMAL ALLOCATION OF TRIALS REVISITED
Pride of place in the correction category belongs to Dan Frantz's work on one of the main motivating theorems in the book, Theorem 5.1. This theorem concerns the "optimal" allocation of trials in determining which of two random variables has a higher expected value (the well-known 2-armed bandit problem). In chapter 5, an optimal solution is a solution that minimizes the losses incurred by drawing samples from the random variable of lower expectation. The theorem there shows that these losses are minimized if the number of trials allocated to the random variable with the highest observed expectation increases exponentially relative to the number of trials allocated to the observed second best.

Because schemata can be looked upon as random variables, this result illuminates the treatment of schemata under a genetic algorithm (née genetic plan in chapter 7). Under a genetic algorithm, a schema with an above-average fitness in the population increases its proportion exponentially (until its instances constitute a significant fraction of the total population). If we think of the genetic algorithm as generating samples of n random variables (an n-armed bandit), in a search for the best, then this exponential increase is just what Theorem 5.1 suggests it should be.

The problem with the proof of the theorem, as given, turns on its particular
use of the central limit theorem. To see the form of the error, let us follow Frantz by using F_n(x) to designate the distribution of the normalized sum of the observations of the random variable X. For the 2-armed bandit, X is the difference of the two random variables of interest. Using the notation of chapter 5, q(n) = 1 − F_n(x) when x = bn^(1/2). That is, 1 − F_n(x) gives the probability of a decision error, q(n), after n trials out of N have been allocated to the random variable observed to be second best. Because x is a function of n, the proof given in chapter 5 implicitly assumes that, as n → ∞, the ratio

[1 − F_n(x)] / [1 − Φ(x)] → 1,

where 1 − Φ(x) is the area under the tail of a normal distribution. However, standard sources (see Feller 1966, for example) show that this is only true when x varies with n as o(n^(1/6)). This is manifestly untrue for Theorem 5.1, where x = bn^(1/2).
The main result of Theorem 5.1 can be recovered by using the theory of large deviations instead of the central limit theorem. The theory of large deviations makes the additional requirement that the moment-generating functions for the random variables exist, but this is satisfied for the random variables of interest here. Let the moment-generating functions for the two random variables, corresponding to the two arms of the bandit, be m₁(t) and m₂(t). Then the moment-generating function for X, the difference, is m(t) = m₁(−t)·m₂(t). There is a uniquely defined constant c such that

c = inf_t m(t).

Define S(n) to be the sum of n samples of X. Then the appropriate theorem on large deviations yields

Pr{S(n) ≥ 0} = [c^n/(2πn)^(1/2)]·d_n·(1 + o(1)),

where log d_n = O(1). Making appropriate provision for ties, this yields

q(n) ≈ b′·c^n/(2πn)^(1/2),

where b′ is a constant that depends upon whether or not X is a lattice variable. This relation for q(n) is of the same form (except for constants) as that obtained for q(n) under the inappropriate use of the central limit theorem. Substituting, and proceeding as before, Frantz obtains
THEOREM 5.1 (large deviations): The optimal allocation of trials n* to the observed second best of the random variables corresponding to the 2-armed bandit problem is given by

n* = (1/2r)·ln[ r³c²N² / (π·ln(rc²N²/2π)) ],

where r = |ln c|.
This theorem actually goes a step further than the original version, directly providing a "realizable" plan for sample allocation. The original version was based on an "ideal" plan that could not be directly realized, requiring section 5.2 to show that the losses of the ideal plan could be approached by the losses of an associated "realizable" plan. Section 5.2 is now superfluous.

The revised constants for the realizable plan do not affect results in later chapters, because the further analysis of genetic algorithm performance does not depend upon the exact values for the constants. The basic point is that genetic algorithms allocate trials exponentially to the random variables (schemata) corresponding to the arms of an n-armed bandit. Coefficients may vary among schemata, but the implicit parallelism of a genetic algorithm is enough to dominate any differences in the coefficients.
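The constants in the revised theorem are easy to evaluate numerically. A minimal sketch for two Bernoulli arms (the parameter values and the crude grid minimization are assumptions for illustration):

import math

p1, p2 = 0.6, 0.4                      # arm payoff probabilities (assumed)

def m(t):
    m1 = (1 - p1) + p1 * math.exp(-t)  # m1(-t) for a Bernoulli arm
    m2 = (1 - p2) + p2 * math.exp(t)
    return m1 * m2

# crude one-dimensional minimization of m over t >= 0
c = min(m(t / 1000.0) for t in range(0, 5000))
r = abs(math.log(c))

N = 10_000                             # total number of trials
n_star = (1 / (2 * r)) * math.log(r**3 * c**2 * N**2 /
                                  (math.pi * math.log(r * c**2 * N**2 / (2 * math.pi))))
print(c, r, n_star)                    # c = 0.96 for these arms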
There are two other errors that may trouble the close reader, though they are much less important. The first error occurs, at the top of page 71, in the example giving estimated values for schemata. x(3) should be 1000010 . . . 0, with the consequence that the estimated value of the schema in question becomes (f(x(1)) + f(x(4)))/2.

The second error occurs, on page 103, in the discussion of the effect of crossover on the increase of schemata. In the derivation just below the middle of the page, the approximation 1/(1 − c) ≤ 1 + c, for c ≪ 1, is invoked. But this approximation is in the wrong direction for preserving the inequality derived there; therefore the line that follows the mention of this approximation should be deleted.
Finally there is a point of emphasis that may be troublesome. In the discussion of the role of payoff in the formal framework, near the bottom of page 25, the mapping allows the payoff to be any real number, positive or negative. It would have helped the reader to say that payoff is treated as a nonnegative quantity throughout the book, particularly in the discussion of genetic algorithms.
Other than these corrections, I am only aware of a few (less than a dozen) typographical errors scattered throughout. They are all obvious from context, so there's no need to list them here.
3. RECENT WORK
systems "never get there." Improvement is usually much more important than optimization. When parts of the system do settle down to a local optimum, it is usually temporary, and those parts are almost always "dead," or uninteresting, if they remain at that equilibrium for an extended period.

Complex adaptive systems anticipate. In seeking to adapt to changing circumstances, the parts develop rules (models) that anticipate the consequences of responses. At its simplest, this is a process not much different from Pavlovian conditioning. Even then, the effects are quite complex when large numbers of parts are being conditioned in different ways. The effects are still more profound when the anticipation involves lookahead toward more distant horizons. Moreover, aggregate behavior is sharply modified by anticipations, even when the anticipations are not realized. For example, the anticipation of an oil shortage can cause a sharp rise in oil prices, whether or not the shortage comes to pass. The effect of local anticipations on aggregate behavior is one of the aspects of complex adaptive systems we least understand.
The objective of the Santa Fe Institute is to develop new approaches to the study of complex adaptive systems, particularly approaches that exploit the interactions between computer simulation and mathematics. Computer simulation offers new ways of carrying out both realistic experiments, of flight-simulator precision, and well-defined gedanken experiments, of the kind that have played such an important role in the development of physics. For real complex adaptive systems (economies, ecologies, brains, etc.) these possibilities have been hard to come by because (1) the systems lose their major features when parts are isolated for study, (2) the systems are highly history dependent, so that it is difficult to make comparisons or tease out representative behavior, and (3) operation far from equilibrium or a global optimum is a regime not readily handled by standard methods.

The Institute aims to exploit the new experimental possibilities offered by the simulation of complex adaptive systems, providing a much enriched version of the theory/experiment cycle for such systems. In conjunction with these simulations, the common kernel shared by complex adaptive systems suggests several possibilities for theory (cf. the work on the schema theory of genetic algorithms). In an area this complex, it is critical for theory to guide and inform the simulations, if they are not to degenerate into a process of "watching the pot boil." Theory is as necessary for sustained progress here as it is in modern experimental physics, which could not proceed outside the framework of theoretical physics. We need experiment to inform
Echo
Fig. 17. Echo's world (an array of sites populated by single-cell agents and multicell "metazoan" agents).
chromosomes: (1) condition for combat, (2) condition for trading, and (3) condition for mating. Conditions serve much as the condition parts of a classifier rule, determining what interactions will take place when agents encounter one another.

The fact that an agent's structure is completely defined by its chromosomes, which are just strings over the resource alphabet {a, b, c, d}, plays a critical role in its "reproduction." An agent reproduces when it collects enough letters to make copies of its chromosomes. As we will see, an agent can collect these letters through its
188
~
-,1
",,
'-",.-r~,<
,
'
"
"
t!ff~~
aaa
c abaa
bb b aa
a
...
a
c
a.
aa
a
a
.
(::8~
b
c
(: :i..
{a, b , c} are renewmle resources
{ o. . . . . ( ~~~ :~ ~ } areagenm
Fig . 18. A site in Echo
interactions: combat, trade, or uptake from the environment. Each agent has a reservoir in which it stores collected letters until there are enough of them for reproduction to take place.

Interactions between agents, when they come into contact, are determined by a simple sequence of tests based on their tags and conditions. In the simplest model, they first test for combat, then they test for trading, and finally they test for mating:

1. Combat (see Figure 20). Each agent checks its combat condition against the offense tag of the other agent. This is a matching process much like the matching of conditions against messages in classifier systems. For example, if the combat condition is given by the string aad, then this condition is matched by any offense tag that begins with the letters aad. (The condition, in this example, does not "care" what letters follow the first three in the tag, and it does not match any tag that has less than three letters.)
Fig. 19. The structure of an Echo agent: tags (offense, defense), conditions (combat, mating, trading), transformation ("enzyme") segments, and uptake from the environment and other organisms.
Each offense letter is scored against the letter at the corresponding position of the defense tag according to a matrix of the following sort (offense letter selects the row, defense letter the column):

       a   b   c   d
   a   4   0   2   2
   b   0   4   2   2
   c   2   2   4   0
   d   1   1   0   4
Fig. 20. Combat in Echo: (1) Combat occurs if agent 1's combat condition matches agent 2's offense tag. (2) Agent 1 is assigned a score based on the match between its offense tag and the defense tag of agent 2; a similar score is calculated for agent 2. The agent with the higher score is the winner. (3) The winner acquires from the loser the resources in its reservoir and the resources tied up in its chromosomes and tags. The loser is deleted.
Under this matrix, the offense tag aab matched against the defense tag aaaad would yield a score of 4 + 4 + 0 = 8. (In this simple example, the additional letters in the defense tag do not enter the scoring; in a more sophisticated scoring procedure, the defense might be given some extra points for additional letters.) A score is also calculated for the second agent by matching its offense tag against the defense tag of the first agent. If the score of one agent exceeds the score of the other, then that agent is declared the winner of the combat. In an interesting variant, the win is a stochastic function of the difference of the two scores.
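A minimal sketch of the combat test as just described, using the matching rule and the scoring matrix from the text (the function names are invented):

SCORE = {                      # SCORE[offense][defense], from the matrix above
    "a": {"a": 4, "b": 0, "c": 2, "d": 2},
    "b": {"a": 0, "b": 4, "c": 2, "d": 2},
    "c": {"a": 2, "b": 2, "c": 4, "d": 0},
    "d": {"a": 1, "b": 1, "c": 0, "d": 4},
}

def condition_matches(condition, tag):
    """A condition matches any tag that begins with the condition's letters."""
    return len(tag) >= len(condition) and tag.startswith(condition)

def combat_score(offense, defense):
    """Score the offense tag against the defense tag, letter by letter; extra
    defense letters do not enter the score in this simple version."""
    return sum(SCORE[o][d] for o, d in zip(offense, defense))

assert condition_matches("aad", "aadbc")
assert combat_score("aab", "aaaad") == 4 + 4 + 0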
regular, or irregular, array (see Figures 17 and 18). Each site has a well-defined set of neighboring sites, and each site can contain a subpopulation of agents. In addition, each site is assigned a production function that determines how rapidly the site produces and accumulates the various resources. For example, one site may produce 10 units of resource a per time step, and nothing of b, c, or d, whereas another may produce 4 units each of a, b, c, and d. If the site is unoccupied by any agents, these resources accumulate, up to some maximum value. In the example of the site that produces 10 units of resource a per time step, the site could continue to accumulate the resource until it had accumulated, say, a total of 100 units. Agents present at a site can "consume" these resources. Thus an agent located at a site that produces the resources it needs can manage reproduction without combat or trade, if it survives combat interactions with other agents. Different agents may have intrinsic limits on the resources they can take up from the site. For example, an agent may only be able to consume resource b from the environment, being dependent upon agent-agent interactions to obtain other needed resources. Resources available at a site are shared among the agents that can consume them.

When neither agent-agent nor agent-environment interactions are providing at least one needed resource at a given site, an agent may migrate from that site to a neighboring site. For example, consider an agent that has already acquired enough of resources a and b to make copies of its chromosomes but that is not acquiring needed resource c. Then that agent will migrate to some neighboring site; in the simplest models the new site is simply selected at random from the neighboring sites.
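A minimal sketch of this migration rule; the data structures and names are assumptions introduced for illustration:

import random

def needs_unmet(reservoir, chromosome_letters):
    """Letters still needed to copy the chromosomes."""
    need = {}
    for ch in chromosome_letters:
        need[ch] = need.get(ch, 0) + 1
    return {ch: n - reservoir.get(ch, 0) for ch, n in need.items()
            if n > reservoir.get(ch, 0)}

def maybe_migrate(agent_site, neighbors, acquiring, unmet):
    """Move to a random neighbor if some needed resource is not being acquired."""
    if any(ch not in acquiring for ch in unmet):
        return random.choice(neighbors)
    return agent_site

unmet = needs_unmet({"a": 3, "b": 2}, "aabbc")   # still needs one c
print(maybe_migrate("site0", ["site1", "site2"], acquiring={"a", "b"}, unmet=unmet))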
The Simulation
The simulation proceeds at each site through a specified number of basic steps in each time step. Roughly: the site adds the units of the resources it produces; each agent in the site executes uptake of resources; each agent is charged a maintenance cost, met by subtraction of resources from its reservoir (an agent that cannot pay the cost is deleted, and in some models an agent may also be deleted with a small random chance); each agent tests to see if it has acquired enough resources to make a copy of its chromosomes and, if so, replicates, with infrequent mutations causing the copy to differ; and an agent whose needs cannot currently be met at the site migrates to a neighboring site. The details of this basic cycle can be filled out in a variety of ways, depending upon the range of models and gedanken experiments of particular interest. Even the simplest models have produced surprisingly sophisticated evolutions.
One of the earliest experiments produced an ever-increasing biological "arms race" (Dawkins 1986), wherein more and more sophisticated offense tags and defensive capabilities developed, cycle after cycle, each to overcome the other. More recent models, by a modification of the basic cycle, provide for the evolution of "metazoans": connected communities of agents that have internal boundaries and reproduce as a unit. With this provision, agents belonging to a connected community can specialize to the advantage of the whole community. For example, one kind of agent belonging to the community can specialize in combat while another specializes in trading, reminiscent of stinging cells and cavity cells; intracommunity trading between the specialists yields a net increase in the reproduction rate of both. As a consequence the metazoans come to occupy a significant place in the overall ecology of agents. Many of the mechanisms investigated by Buss (1987) can be imitated by this model, including the evolution of cooperation between cell lines (cf. Axelrod and Hamilton) and of such developmental mechanisms as induction and competence (see Figure 21).
"
"CATERPILLAR
~
'PREDATION
"'"
~
PRED
,.
t
TRADE
~
QIIANTI
)
" FLY"
~
.-Conditions
..,mm18i
~~
. ~
Taga
.
Condition
~
~
I
_
~
"
,
,
~~..~
~ ~
ag
Condition.
.
.
.
195
POSSIBILITIES
Bibliography
Martin, N. 1973. Convergence Properties of a Class of Probabilistic Adaptive Schemes Called Sequential Reproductive Plans. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Mayr, E. 1963. Animal Species and Evolution. Cambridge, Massachusetts: Harvard University Press.
Mayr, E. 1967. Evolutionary challenges to the mathematical interpretation of evolution. In Mathematical Challenges to the Neo-Darwinian Interpretation of Evolution, ed. Moorhead, P. S., and Kaplan, M. M., 47-58. Philadelphia: Wistar Institute Press.
Milner, P. M. 1957. The cell-assembly: mark II. Psych. Rev. 64:242-52.
Minsky, M. L. 1967. Computation: Finite and Infinite Machines. Englewood Cliffs: Prentice-Hall.
Newell, A., Shaw, J. C., and Simon, H. A. 1959. Report on a general problem-solving program. Proc. Int. Conf. Info. Process. 256-64. Paris: Unesco House.
Plum, T. W.-S. 1972. Simulations of a Cell-Assembly Model. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Riolo, R. L. 1990. Lookahead planning and latent learning in a classifier system. In Simulation of Animal Behavior: From Animals to Animats, ed. Meyer, J.-A., and Wilson, S. Cambridge, Massachusetts: M.I.T. Press.
Rosenberg, R. S. 1967. Simulation of Genetic Populations with Biochemical Properties. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Samuel, A. L. 1959. Some studies in machine learning using the game of checkers. IBM J. Res. Dev. 3:210-29.
Schaffer, J. D., ed. 1989. Proceedings of the Third International Conference on Genetic Algorithms. San Mateo: Morgan Kaufmann.
Sela, M. 1973. Antigen design and immune response. The Harvey Lectures 1971-1972. New York: Academic Press.
Selfridge, O. J. 1959. Pandemonium: a paradigm for learning. Proc. Symp. Mech. Thought Processes. 511-29. London: H. M. Stationery Office.
Simon, H. A. 1969. The Sciences of the Artificial. Cambridge, Massachusetts: M.I.T. Press.
Stebbins, G. L. 1966. Processes of Organic Evolution. Englewood Cliffs: Prentice-Hall.
Tsypkin, Y. Z. 1971. Adaptation and Learning in Automatic Systems. New York: Academic Press.
Uhr, L. 1973. Pattern Recognition, Learning, and Thought. Englewood Cliffs: Prentice-Hall.
von Neumann, J., and Morgenstern, O. 1947. Theory of Games and Economic Behavior. Princeton: Princeton University Press.
Waddington, C. H. 1967. Summary discussion. In Mathematical Challenges to the Neo-Darwinian Interpretation of Evolution, ed. Moorhead, P. S., and Kaplan, M. M., 95-102. Philadelphia: Wistar Institute Press.
Wallace, B. 1966. Chromosomes, Giant Molecules, and Evolution. New York: Norton.
Weinberg, R. 1970. Computer Simulation of a Living Cell. Ph.D. Dissertation, Ann Arbor: University of Michigan.
Glossary of Important Symbols
𝒜        domain of action of an adaptive plan, the structures it can attain (5, 21)
A₁(t)    that part of the structure A(t) directly tested against the environment (23)
I        the range of signals the adaptive system can receive from the environment (22)
I(t)     the particular signal received by the adaptive system from the environment at time t (22)
t        time (20)
μ_E: 𝒜 → Reals    the performance measure assigned by the environment E
τ: I × 𝒜 → 𝒜      the adaptive plan
=df      defined to be equal (94)
Index
(df.) following an entry indicates the term is defined or explained on that page
Bagley, J. D., 162
Bandit, two-armed. See Two-armed bandit
Bayesian algorithm, 78-79
Behavioral atom, 155 (df.)
Behavioral unit, 155 (df.)
Bellman, R., 76
Bledsoe, W. W., 162
Breeding plan, 163-64
Brender, R. F., 168
Britten, R. J., 116, 117, 153, 154
Broadcast language, 8.2-8.4, 171-72
Broadcast unit, 144-47 (df.)
Browning, I., 162
Brues, A. M., 169
Bucket brigade algorithm, 176-79 (df.). See also Apportionment of credit
"Building blocks," 174, 179-80, 198. See also Recombination
Burks, A. W., 168
Buss, L. W., 193
CNS. See Central nervous system
Carrying capacity, 91, 166
Cavicchio, D. J., 161, 162, 163, 169
Cell assembly, 60-64, 155
Cellular automata, 168
Central nervous system, 3.6, 155-57, 168
Chi-square contingency, 164
Christiannsen, F., xi
Chromosome, 9, 13-16, 21, 33-34, 66-67, 117-18, 137, 153-55. See also Structures
Classifier systems, 172-81, 195; bid, 176; classifiers, 172-73, 174-75 (df.); detectors, 172; effectors, 172, 176; execution cycle, 175; messages, 172-73, 175
Co-adapted, 11 (df.), 34, 88, 136, 139, 161. See also Schemata
Coding, 57. See also Representation
Competition, 176, 177, 197
Complex adaptive systems, 184-86, 195-98. See also Echo
Component, 5, 21, 66, 167. See also Schemata
Computer studies, 9.2
Condition/action rules. See Classifier systems
Control, 3.5, 70-71, 119-20, 163-64, 169. See also Function optimization
Control policy, 54 (df.)
Convergence, 124-25. See also Criterion
Cover, T. M., 76
Credit assignment. See Apportionment of credit; Bucket brigade algorithm
Criterion, 12, 16-19, 26-28, 41-42, 75-76, 83-84, 85, 87, 125, 129; examples of, ch. 3
Crossing over, 6.2, 108-9, 110, 113, 121, 140, 152, 164, 167, 168; simple, 102 (df.), 108, 111, 121. See also Operator, genetic
Crossover operator. See Crossing over
Crow, J. F., 33
Cumulative payoff, 26-27, 38-39, 42, 53, 55. See also Loss minimization
Index
, 118, 144
Effectivelydescribable
. SeeCriterion
Efficiency
Enumeration
(enumerative
plan), 8, 13,
16- 17, 19, 26, 41, 69, 110, 124
- 25
Environment
, 1.2, 6, 11- 12, 16- 18,
2.1, 143, 153, 169;examples
of, ch. 3.
SeealsoPerformance
measure
Environmental
niche
, II (df.), 12, 33,
119
- 66, 168
- 69
, 165
- 54, 161
, 10(df.), 117, 153
Enzyme
- 39, 161
- 62.
, 10(df.), 34, 138
Epistasis
Seealso Nonlinearity
Error, 48, 52, 55, 56, 156
Evaluator, 48, 52, 156
Evolution
, 12, 17, 97, 119. SeealsoSpeciation
Falsepeak
. SeeLocaloptimum
Feedback
, 10, 119, 154
Feldman
, M. W., xi
Fisher
, R. A., 89, 118, 119, 137, 161
- 39, 168.
Fitness
, 4, 12- 16, 33- 34, 137
SeealsoAverage
excess
; Performance
measure
Davidson
Fixedpoint, 100
- 101, 134, 138
, E. H., 117, 153, 154
Dawkins
, R. , 193
, L. J., 163
Fogel
Defaulthierarchy
Frantz
, D. R., 161, 162, 164, 171
, 180- 81
', 181
, 20, 66. SeealsoApportionmentFunction
, 57, 70-71, 89- 90,
Decomposition
optimization
- 20, 163
of credit
99, 105
- 6, 119
- 64. Seealso
of
schema
.
See
Schema
Definingpositions
Optimization
Deletion
, 117(df.)
Detector
, 44 (df.), 3.4, 63, 66- 67, 117,
132, 8.1, 153, 155- 56, 162. Seealso
Representation
Distribution
. SeeProbability
, probability
distribution
Gale
, D., 36
, 2.3
Gambling
-playing
Game
;, 3.3, 7.3
Gedanken
186, 196
experiments
Allele.
. See
Gene
General problem solver, 44
Generation, 12 (df.), 13-16, 94, 102 (df.), 133, 137
Genetic algorithm. See Adaptive plan
Genetic operator. See Operator, genetic
Genetic plan. See Adaptive plan, genetic
Genetics, 1.4, 3.1, 131, 7.4, 153-55
Genotype, 9 (df.), 12, 32-33, 161
Goal-directed search, 49, 52. See also Search
Good, I. J., 60
Goodman, E. D., 162
Goods, 36-38. See also Economics, mathematical
Hamilton, W. D., xi, 193
Hebb, D. O., 58, 60, 64
Hellman, M. E., 76
Hierarchy, 157, 167-68
History. See Input history
Hofstadter, D. R., 181
Hollstien, R. B., 161, 163, 169
Homomorphism, 50-51
Hybrid, 168
Immune system (biological), 155
Implicit parallelism. See Intrinsic parallelism
Jacob, F., 117, 141, 153
Jerne, N. K., 155
K selection, 166
Kimura, M., 33
Language, broadcast. See Broadcast language
Learning, 6, 60-61, 155. See also Adaptive plan; Central nervous system
Length of schema. See Schema
Levins, R., 32
Linkage, 34, 97, 102-3, 106-9, 135, 139, 140, 143, 164. See also Association
Local optimum, 66, 104, 110, 111, 123, 133, 140, 160. See also Nonlinearity
Lookahead, 48, 52, 181, 185. See also Prediction
Loss minimization, 42-43, 75, 77-83, 85-87, 125, 129-31. See also Performance measure
MacArthur, R. H., 166
Machine learning. See Classifier systems; Samuel, A. L.
Maladaptation, 134-35, 138. See also Local optimum
Mayr, E., 33, 119, 167, 168
Maze, 45-47
Memory, 23, 56, 59, 93, 143. See also Storage
Metazoan evolution, 193-94
Migration, 168-69
Milner, P. M., 60
Minimum (expected) loss. See Loss minimization
Minsky, M. L., 152
Model, 52-53, 56, 63-64, 143-44, 153, 155-57
Monod, J., 117, 141, 153
Morphogenesis, 168
Mutation, 97, 6.4, 111, 113-15, 116, 121-22, 127-28, 136, 170
Needs, 61-64
Nervous system. See Central nervous system
Observed best, 75, 77, 87-88, 124-25, 129, 140
Obstacles to adaptation, 2, 5-6, 9-11, 13-14, 65, 66, 75, 123, 140, 9.1; examples of, ch. 3. See also Epistasis; Local optimum; Nonlinearity; Nonstationarity
Operator, 1.2, 2.1, 92-93, 152, 157; examples of, ch. 3; genetic, 14-18, 33, 6.2, 6.3, 6.4, 6.5, 121-22, 127-28, 140, 152, 157, 161, 167-68, 169-70; examples of, 66-67, 106-7, 109, 112-13, 116, 148-52; homologous, 109 (df.)
Operon, 117-18 (df.), 153-55
Optimization, 1, 4, 19, 38-39, 54-55, 57, 75-76, 90, 120, 123, 140, 160-61. See also Function optimization
Parallelism, 174, 178, 192, 197
Parallel operation, 8.2-8.4. See also Intrinsic parallelism
Pattern recognition, 1.3, 3.4, 63, 132, 155-56, 162-63
Payoff, 25 (df.), 26-27, 40, 42, 76, 132, 160, 165, 166, 172, 177. See also Performance measure
Payoff-only plan, 26, 29 (df.), 39, 42, 132
Performance measure, 1.2, 8, 12, 18, 66, 69, 74, 75, 87-88, 124, 139-40, 159-61, 168-69; examples of, ch. 3. See also Criterion; Fitness; Payoff; Utility
Permutation, 107-8, 127
Phase space, 54 (df.). See also Control
Phenotype, 10 (df.), 11-12, 33-34, 160
Plum, T. W.-S., 155
Policy. See Control policy
Population: as a data base, 73-74, 87-88, 91, 92-93, 95, 96, 99-100, 103-4, 110, 125-27, 133, 139-40, 156, 160; biological, 12-15, 33-35, 136, 139, 161-62, 165-66, 168-69
Prediction, 48, 50-52, 56, 63, 143, 153, 155-56, 161. See also Lookahead; Model
Queue, 165-66
r selection, 166
Random variable. See Sampling; Schemata
Ranking, 73-74, 87-88, 96, 103-4, 105, 139-40, 160, 169
Rankings, storage. See Storage, of rankings
Recombination, 180, 191, 197, 198. See also Crossing over
Representation, 103, 110-11, 125-27, 140; losses, examples of, 66-67, 106; via broadcast language, 141, 8.2-8.4, 167-68; via detectors, 57, 66 (df.)-71, 74, 89, 98, 131, 140, 8.1
Reproductive plan. See Adaptive plan, reproductive
Resource renewal rate, 165-66, 168
Riolo, R. L., 180, 181
Robustness, 17-19, 27, 34-35, 121, 124-25; examples of, 7.3, 7.4. See also Criterion
Rosenberg, R. S., 161
Rule discovery, 179-81
Sample space. See Sampling
Sampling, 49-50, 68-73, 75-77, 140, 160; of 𝒜, 8, 12, 23-24, 66, 74, 90, 94, 124; of schemata, 71-73, 75, 85-88, 98-99, 127, 128-30, 139-40, 157, 160-61; of several random variables, 5.1, 5.3
Sampling procedure. See Adaptive plan
Samuel, A. L., 17, 40, 42, 43, 44, 132, 196
Santa Fe Institute, x, 184, 185
Savage, L. J., 30-31
Schema: length of, 102 (df.)-3, 108, 129; defining positions of, 72 (df.), 125, 165
Schemata, 19, 68 (df.), ch. 4, 8.1, 157, 160-61; and coadaptation, 89, 119, 136-39, 161; and environmental niches, 11, 165-67, 168-70; processing of, 74, 87-88, 89, 97-108, 115-16, 119-20, 127, 7.5, 160-61; ranking of, 73-74, 75, 5.4, 125, 127, 130, 7.5, 160
Search, 3.4, 66, 155-57, 160, 162-63, 164
Segregation, 116 (df.)
Sela, M., 155
Self-reproduction, 152
Selfridge, O. J., 156
Shaw, J. C., 44
Signal, 22-23, 117-18, 8.2, 152, 153-57
Simon, H. A., 44, 168
Simple crossover operator. See Crossing over, simple
Simple inversion operator. See Inversion, simple
Speciation, 164-67, 168-69
State, 13, 23, 30, 54-55, 59, 93, 147
Statistical inference. See Inference, statistical
Steady state. See Fixed point
Stebbins, G. L., 118
Stimulus. See Signal
Stimulus-response theory, 59. See also Broadcast language
Stochastic process, 24, 26-27, 28, 90, 92-93, 95, 100, 123, 133. See also Markov chain
Storage: of input history, 23, 30, 56, 143; of rankings, 69-70, 73-74, 87, 96, 98, 103-4, 125, 140, 160
Strategy, 30-31, 3.3, 7.3
Structures, 1.2, 16-19, 21-23, 66-67, 89, 92-93, 97-98, 107, 113, 116, 117, 144, 167-68; examples of, ch. 3
Supergene, 167
Synapse, 61-62, 155
Tag, 175, 186, 188-91
Technology, 36-38. See also Economics, mathematical
Threshold device, 6 (df.), 1.3
Transition function, 13, 23, 50
Translocation, 116 (df.), 118
Tree graph, 40 (df.), 41
Trial. See Sampling
Trial and error. See Enumeration
Trials, optimal allocation, ch. 5
Tsypkin, Y. Z., 2, 54, 55
Two-armed bandit, 5.1; revisited, 10.2
Uhr, L., 162
Utility, 30, 31, 38 (df.)-39, 49-51. See also Performance measure
Von Neumann, J., 36, 37, 39, 42
Von Neumann technology, 3.2, 67
Vossler, C., 162
Waddington, C. H., 119
Wallace, B., 118
Weinberg, R., 162
Wilson, E. O., 166