AM EC Unit4
EVOLUTIONARY ALGORITHM
VARIANTS
Ami Munshi
Syllabus
Course Outcomes
References
Paradigms of Evolutionary
Computing/Evolutionary Algorithm
Ref: A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
Comparing GA and GP
Three important differences exist between GAs and GP:
Structure: GP usually evolves tree structures, while GAs evolve binary or real-number strings
Active vs passive: Because GP usually evolves computer programs, the solutions can be executed without post-processing (active structures), while GAs typically operate on coded binary strings (passive structures), which require post-processing
Variable vs fixed length: In traditional GAs, the length of the binary string is fixed before the solution procedure begins,
whereas a GP parse tree can vary in length throughout the run. (In more advanced GA work, variable-length strings are also used.)
William B. Langdon, Riccardo Poli, Nicholas F. McPhee, and John R. Koza, Genetic Programming: An Introduction and Tutorial, with a Survey of Techniques and Applications
GP in a Nutshell
GP is a special evolutionary algorithm (EA)
where the individuals in the population are computer programs
Generation by generation GP iteratively transforms populations of programs into
other populations of programs
During the process, GP constructs new programs by applying genetic operations
which are specialized to act on computer programs
Algorithmically, GP comprises the steps shown in Algorithm 1. The main genetic operations involved in GP (line 5 of Algorithm 1) are the following:
Crossover: the creation of one or two offspring programs by recombining randomly chosen parts from two selected programs
Mutation: the creation of one new offspring program by randomly altering a randomly chosen part of one selected program
GP in a Nutshell
Some GP systems support structured solutions
These may include architecture-altering operations, which randomly alter the architecture (for example, the number of subroutines) of a program to create a new offspring program
In addition to crossover, mutation and the architecture-altering operations,
an operation called reproduction, which simply copies selected individuals into the next generation, is also used
It is typically applied to produce only a fraction of the new generation
Idea of Genetic Programming
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_6614.pdf
Example: Credit Scoring
A possible model:
IF (NOC = 2) AND (S > 80000) THEN good ELSE bad
In general:
IF formula THEN good ELSE bad
The only unknown is the right formula, hence
our search space (phenotypes) is the set of formulas
Natural fitness of a formula: the percentage of correctly classified cases of the model it stands for
Natural representation of formulas (genotypes): parse trees
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_6614.pdf
Example: Credit Scoring
IF (NOC = 2) AND (S > 80000) THEN good ELSE bad
can be represented by the following tree:
[Figure: parse tree of the IF-THEN-ELSE rule]
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_6614.pdf
Tree based representation
Trees are a universal form, e.g. consider the arithmetic formula
2 · π + ((x + 3) − y / (5 + 1))
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_6614.pdf
Representation in GP
GP programs are usually expressed as syntax
trees rather than as lines of code
Figure shows, for example, the tree
representation of the program
max(x*x,x+3*y)
Note
variables and constants in the program (x, y,
and 3), called terminals in GP, are leaves of the
tree
while the arithmetic operations (+, *, and max)
are internal nodes (typically called functions in
the GP literature)
The sets of allowed functions and terminals
together form the primitive set of a GP system
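To make the tree representation concrete, here is a minimal Python sketch (not from the slides) that encodes max(x*x, x+3*y) as nested tuples and evaluates it; the tuple encoding and the FUNCTIONS table are illustrative choices, not a standard GP API.

```python
import operator

# Function set: internal node symbols mapped to Python callables
FUNCTIONS = {'+': operator.add, '*': operator.mul, 'max': max}

# max(x*x, x + 3*y) as a tree: internal nodes are functions, leaves are terminals
tree = ('max', ('*', 'x', 'x'), ('+', 'x', ('*', 3, 'y')))

def evaluate(node, env):
    """Recursively evaluate a syntax tree under the variable binding `env`."""
    if isinstance(node, tuple):                      # internal node: a function
        func, *args = node
        return FUNCTIONS[func](*(evaluate(a, env) for a in args))
    if isinstance(node, str):                        # terminal: a variable
        return env[node]
    return node                                      # terminal: a constant

print(evaluate(tree, {'x': 2.0, 'y': 1.0}))          # max(4.0, 5.0) -> 5.0
```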
Multi-component program representation
In more advanced forms of GP, programs
can be composed of multiple components
(say, subroutines)
In this case the representation used in GP is
a set of trees (one for each component)
grouped together under a special root
node that acts as glue, as illustrated in Fig
We will call these (sub)trees branches
The number and type of the branches in a
program, together with certain other
features of the structure of the branches,
form the architecture of the program.
Tree based representation
2 · π + ((x + 3) − y / (5 + 1))
[Figure: parse tree of the formula above]
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_6614.pdf
Tree based representation
(x ∧ true) → ((x ∨ y) ∨ (z ↔ (x ∧ y)))
Tree based representation
In GA, ES, EP chromosomes are linear structures
(bit strings, integer string, real-valued vectors,
permutations)
Tree shaped chromosomes are non-linear
structures
In GA, ES, EP the size of the chromosomes is
fixed
Trees in GP may vary in depth and width
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_6614.pdf
Tree based representation
Symbolic expressions can be defined by
Terminal set T
Function set F (with the arities of function symbols)
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_6614.pdf
Tree based representation
[Figure: one more example of subtree crossover, producing Child 1 and Child 2]
William B. Langdon, Riccardo Poli, Nicholas F. McPhee, and John R. Koza, Genetic Programming: An Introduction and Tutorial, with a Survey of Techniques and Applications
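A possible implementation of subtree crossover and subtree mutation on the nested-tuple trees from the earlier sketch; the path enumeration and helper names are our assumptions, not a standard library.

```python
import random

def all_paths(node, path=()):
    """Enumerate index paths to every node in a nested-tuple tree."""
    yield path
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):   # node[0] is the function symbol
            yield from all_paths(child, path + (i,))

def get_subtree(node, path):
    for i in path:
        node = node[i]
    return node

def replace_subtree(node, path, new):
    """Return a copy of the tree with the subtree at `path` replaced by `new`."""
    if not path:
        return new
    i, rest = path[0], path[1:]
    return node[:i] + (replace_subtree(node[i], rest, new),) + node[i + 1:]

def subtree_crossover(parent1, parent2):
    """Replace a random subtree of parent1 with a random subtree of parent2."""
    p1 = random.choice(list(all_paths(parent1)))
    p2 = random.choice(list(all_paths(parent2)))
    return replace_subtree(parent1, p1, get_subtree(parent2, p2))

def subtree_mutation(parent, random_tree):
    """Replace a random subtree with a freshly generated random tree."""
    path = random.choice(list(all_paths(parent)))
    return replace_subtree(parent, path, random_tree())
```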
[Figure: GA flowchart vs GP flowchart]
Compare GA and GP
Offspring creation scheme
GA scheme: crossover AND mutation applied sequentially (be it probabilistically)
GP scheme: crossover OR mutation (chosen probabilistically)
Randomly create an initial
population (generation 0) of
individual computer programs
composed of the available
functions and terminals
Iteratively perform the following
sub-steps (called a generation)
on the population until the
termination criterion is satisfied:
Execute each program in the
population and ascertain its
fitness (explicitly or implicitly)
using the problem’s fitness
measure.
Select one or two individual
program(s) from the population
with a probability based on
fitness (with reselection allowed)
to participate in the genetic
operations
Create new individual program(s)
for the population by applying the
following genetic operations with
specified probabilities:
Reproduction: Copy the selected
individual program to the new
population.
Crossover: Create new offspring
program(s) for the new population by
recombining randomly chosen parts
from two selected programs
Mutation: Create one new offspring
program for the new population by
randomly mutating a randomly chosen
part of one selected program
Architecture-altering operations:
Choose an architecture altering
operation from the available
repertoire of such operations and
create one new offspring program for
the new population by applying the
chosen architecture-altering operation
to one selected program
After the termination
criterion is satisfied, the
single best program in
the population
produced during the run
(the best-so-far
individual) is harvested
and designated as the
result of the run
If the run is successful,
the result may be a
solution (or approximate
solution) to the problem
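The steps above condense into a generational loop. The sketch below is a skeleton under several assumptions: it reuses the subtree_crossover/subtree_mutation helpers sketched earlier, treats lower fitness (error) as better, and uses tournament selection with reselection allowed (so pop_size must be at least the tournament size).

```python
import random

def select(scored, k=7):
    """Tournament selection on (fitness, program) pairs; lower fitness wins."""
    return min(random.sample(scored, k), key=lambda t: t[0])[1]

def gp_run(create_random_program, fitness, pop_size=500, generations=50,
           p_crossover=0.9, p_mutation=0.01):
    """Skeleton of a GP run following Koza's steps; parameter names are illustrative."""
    population = [create_random_program() for _ in range(pop_size)]
    best = min(population, key=fitness)
    for _ in range(generations):
        scored = [(fitness(p), p) for p in population]          # evaluate
        best = min([best] + [p for _, p in scored], key=fitness)
        new_population = []
        while len(new_population) < pop_size:                   # crossover OR mutation OR copy
            r = random.random()
            if r < p_crossover:
                a, b = select(scored), select(scored)
                new_population.append(subtree_crossover(a, b))
            elif r < p_crossover + p_mutation:
                new_population.append(subtree_mutation(select(scored),
                                                       create_random_program))
            else:                                               # reproduction
                new_population.append(select(scored))
        population = new_population
    return best                                                 # best-so-far individual
```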
Getting ready to run a genetic program
Five Preparatory Steps
Step 1: Identify the terminal set, i.e., the inputs to the computer programs that make up the population
Step 2: Identify the function set
Step 3: Define the fitness measure
Step 4: Set the control parameters
Step 5: Choose the termination condition
Example of a genetic program run
Terminal Set
The purpose of the first two preparatory steps is to
specify the ingredients of the to-be-evolved program
Because the problem is to find a mathematical function of one independent variable,
the terminal set (inputs to the to-be-evolved program) includes the independent variable x
The terminal set also includes numerical constants:
Terminal set = {x, ℜ}
where ℜ denotes constant numerical terminals in some range, say [−5, +5]
Example of a genetic program run
Function set
The objective of the problem is somewhat flexible in that it does not specify what functions may be employed
A reasonable choice is the four ordinary arithmetic functions: Function set = {+, −, *, %}
Note: To avoid run-time errors, the division function % is protected:
it returns a value of 1 when division by 0 is attempted (including 0 divided by 0),
but otherwise returns the quotient of its two arguments
Each individual in the population is a composition of functions from the specified function set and terminals from the specified terminal set
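A one-line sketch of the protected division % described above (the name protected_div is ours):

```python
def protected_div(a, b):
    """Protected division %: returns 1 when division by zero is attempted
    (including 0 divided by 0), otherwise the ordinary quotient."""
    return a / b if b != 0 else 1.0

print(protected_div(6.0, 3.0))   # 2.0
print(protected_div(5.0, 0.0))   # 1.0
```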
Example of a genetic program run
Third preparatory step involves constructing the fitness measure
Purpose of the fitness measure is to specify what the human wants
High-level goal of this problem is to find a program whose output is equal to the values of the quadratic polynomial x² + x + 1
Therefore, the fitness assigned to a particular individual in the population for this problem must reflect
how closely the output of an individual program comes to the target polynomial x² + x + 1
Fitness measure could be defined as the value of the integral (taken over values of the independent variable x between −1.0 and +1.0) of the absolute value of the differences (errors) between the value of the individual mathematical expression and the target quadratic polynomial x² + x + 1
A smaller value of fitness (error) is better
The integral is numerically approximated using dozens or hundreds of different values of the independent variable x in the range between −1.0 and +1.0
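A sketch of this fitness measure, approximating the integral by sampling equally spaced points in [−1, 1]; `program` is assumed to be a callable built from the primitive set, and the sample count is an arbitrary choice.

```python
def target(x):
    return x * x + x + 1            # the quadratic polynomial x^2 + x + 1

def fitness(program, n_points=100):
    """Approximate the integral of |program(x) - target(x)| over [-1, 1]
    by sampling n_points equally spaced values of x; smaller is better."""
    step = 2.0 / (n_points - 1)
    xs = [-1.0 + i * step for i in range(n_points)]
    return sum(abs(program(x) - target(x)) for x in xs) * step
```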
Example of a genetic program run
Fourth preparatory step-Control Parameters
The population size in this small illustrative example will be just four
In actual practice
the crossover operation is commonly performed on about 90% of the individuals in
the population
the reproduction operation is performed on about 8% of the population
the mutation operation is performed on about 1% of the population
and the architecture-altering operations are performed on perhaps 1% of the
population
Because this illustrative example involves an abnormally small population of only
four individuals
the crossover operation will be performed on two individuals and the mutation and
reproduction operations will each be performed on one individual
For simplicity, the architecture-altering operations are not used for this problem
Example of a genetic program run
Fifth preparatory step
A reasonable termination criterion for this problem is that
the run will continue from generation to generation until the
fitness of some individual gets below 0.01
Example of a genetic program run
Actual run starts now by randomly creating a
population of size 4
https://www.maths.uq.edu.au/MASCOS/Multi-Agent04/Fleetwood.pdf
Why use Differential Evolution Algorithm?
The differential evolution algorithm is advantageous over direct search approaches and Genetic Algorithms because
it can handle nonlinear and non-differentiable multi-dimensional objective functions, while requiring very few control parameters to steer the minimization
These characteristics make the algorithm easier and more practical to use.
https://machinelearningmastery.com/differential-evolution-from-scratch-in-python/
Why use Differential Evolution (DE)
Algorithm?
An optimization algorithm is considered practical if it has:
Ability to handle non-differentiable, nonlinear and multimodal cost functions
Ability to process computation-intensive cost functions in parallel
Ease of use: few control variables, which should be robust and easy to select
Good convergence properties: consistent convergence to the global minimum
DE fulfills all the above requirements
https://machinelearningmastery.com/differential-evolution-from-scratch-in-python/
Differential Evolution
DE is an Evolutionary Algorithm
Hence follows the following steps
https://www.maths.uq.edu.au/MASCOS/Multi-Agent04/Fleetwood.pdf
DE Algorithm- Notations
Suppose we want to optimize a function of D real parameters
Let us select the population size as N (here N must be at least 4)
The parameter vectors have the form
x_{i,G} = (x_{1,i,G}, x_{2,i,G}, …, x_{D,i,G}), i = 1, 2, …, N
where G is the generation number
https://www.maths.uq.edu.au/MASCOS/Multi-Agent04/Fleetwood.pdf
DE Algorithm
Step 1: Initialization
Define upper and lower bounds for each parameter: x_j^L ≤ x_{j,i,1} ≤ x_j^U
Randomly select the initial parameter values uniformly on the interval [x_j^L, x_j^U]
https://www.maths.uq.edu.au/MASCOS/Multi-Agent04/Fleetwood.pdf
DE Algorithm
Step 2: Mutation
Each of the N parameter vectors undergoes mutation, recombination and selection
Mutation expands the search space
For a given parameter vector x_{i,G}, randomly select three vectors x_{r1,G}, x_{r2,G}, x_{r3,G} such that the indices i, r1, r2, r3 are distinct
Add the weighted difference of two of the vectors to the third to obtain the noisy (donor) vector:
v_{i,G+1} = x_{r1,G} + F · (x_{r2,G} − x_{r3,G})
Here F is called the mutation factor, which ranges over [0, 2]
v_{i,G+1} is the noisy or donor vector
https://www.maths.uq.edu.au/MASCOS/Multi-Agent04/Fleetwood.pdf
DE Algorithm
Step 3: Recombination
Crossover between the successful solution from the previous generation and the donor (noisy) vector is performed to obtain an offspring called the trial vector u_{i,G+1}
A crossover rate CR is used to perform the crossover:
u_{j,i,G+1} = v_{j,i,G+1} if rand_{j,i} ≤ CR or j = I_rand
u_{j,i,G+1} = x_{j,i,G} if rand_{j,i} > CR and j ≠ I_rand
Here rand_{j,i} ~ U[0, 1] and I_rand is a random integer from {1, 2, …, D}
I_rand ensures that u_{i,G+1} ≠ x_{i,G}
https://www.maths.uq.edu.au/MASCOS/Multi-Agent04/Fleetwood.pdf
DE Algorithm
Step 4: Selection
The target vector x_{i,G} is compared with the trial vector u_{i,G+1}, and the one with the lowest function value is admitted to the next generation:
x_{i,G+1} = u_{i,G+1} if f(u_{i,G+1}) ≤ f(x_{i,G})
x_{i,G+1} = x_{i,G} otherwise
https://www.maths.uq.edu.au/MASCOS/Multi-Agent04/Fleetwood.pdf
DE Algorithm
Mutation, recombination and selection continue until
some stopping criterion is reached
https://www.maths.uq.edu.au/MASCOS/Multi-Agent04/Fleetwood.pdf
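Putting Steps 1-4 together, here is a from-scratch DE/rand/1/bin sketch; it is illustrative rather than the canonical Storn-Price code, and clipping the donor vector back into the bounds is our choice.

```python
import numpy as np

def differential_evolution(f, bounds, N=20, F=0.5, CR=0.9, generations=200, seed=0):
    """Classic DE/rand/1/bin sketch following the steps above.
    bounds: sequence of (lower, upper) per dimension; N >= 4 as required."""
    rng = np.random.default_rng(seed)
    low, high = np.asarray(bounds, dtype=float).T
    D = len(low)
    pop = rng.uniform(low, high, size=(N, D))          # Step 1: initialization
    cost = np.array([f(x) for x in pop])
    for _ in range(generations):
        for i in range(N):
            r1, r2, r3 = rng.choice([j for j in range(N) if j != i], 3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])      # Step 2: mutation (donor vector)
            v = np.clip(v, low, high)
            j_rand = rng.integers(D)                   # Step 3: binomial crossover
            mask = rng.random(D) <= CR
            mask[j_rand] = True                        # guarantees trial != target
            u = np.where(mask, v, pop[i])
            fu = f(u)                                  # Step 4: greedy selection
            if fu <= cost[i]:
                pop[i], cost[i] = u, fu
    best = cost.argmin()
    return pop[best], cost[best]
```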
Performance testing of DE algorithm
Using Ackley’s function
In mathematical optimization, the Ackley function is a non-convex function used as a performance test problem for optimization algorithms (Wikipedia)
Performance testing of DE algorithm using Ackley's function
DE with N = 10, F = 0.5 and CR = 0.1
Find the global minimum x* = (0, 0), where f(x*) = 0
https://www.maths.uq.edu.au/MASCOS/Multi-Agent04/Fleetwood.pdf
https://en.wikipedia.org/wiki/Ackley_function
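As a check, the 2-D Ackley function can be minimised with the DE sketch above using the slide's settings N = 10, F = 0.5, CR = 0.1; the search bounds [−5, 5] are our choice.

```python
import numpy as np

def ackley(x):
    """n-D Ackley function; global minimum f(0, ..., 0) = 0."""
    x = np.asarray(x, dtype=float)
    a, b, c = 20.0, 0.2, 2.0 * np.pi
    n = x.size
    return (-a * np.exp(-b * np.sqrt(np.sum(x**2) / n))
            - np.exp(np.sum(np.cos(c * x)) / n) + a + np.e)

best_x, best_f = differential_evolution(ackley, [(-5, 5), (-5, 5)],
                                        N=10, F=0.5, CR=0.1)
print(best_x, best_f)    # should approach x* = (0, 0), f(x*) = 0
```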
Example run for DE
A simple numerical example is presented to
illustrate the DE algorithm
Let us consider the following objective function:
Minimize f(x) = x1 + x2 + x3
[Example run for DE: the worked iterations appear as tables in the original slides]
Flow diagram of DE
[Figure: DE flow: initial population → target vector → two randomly selected vectors → build weighted difference vector → crossover to obtain child (trial) vector]
Ref: https://staff.fmi.uvt.ro/~daniela.zaharie/ma2015/lectures/metaheuristics2015_slides6.pdf
Evolutionary Programming
First (traditional) direction
Evolve systems (e.g. finite state machine) with prediction abilities
The fitness of such a structure is measured by analyzing the behavior of the system = prediction
abilities
The fitness is a quality measure related to the behaviour of the system
Second (current) direction
It is related to optimization methods similar to evolution strategies
There is only a mutation operator (no recombination)
The mutation is based on random perturbation of the current configuration (x' = x + N(0, s))
s is inversely correlated with the fitness value (high fitness leads to small s, low fitness leads to large values of s)
Starting from a population with m elements, m children are constructed by mutation, and the survivors are selected from the 2m elements by tournament or by truncation
There are self-adaptive variants, called meta-EP; these variants are similar to self-adaptive Evolution Strategies
Ref: https://staff.fmi.uvt.ro/~daniela.zaharie/ma2015/lectures/metaheuristics2015_slides6.pdf
Evolutionary Programming
Evolutionary programming was invented by Dr. Lawrence J. Fogel in 1960
Dr. Fogel refrained from modeling the end product of evolution;
rather, he chose to model the process of evolution itself as a vehicle for producing intelligent behavior
Dr. Fogel crafted a series of experiments in which finite state machines (FSMs) represented individual organisms in a population of problem solvers
These graphical models are used to describe the behavior of computer software and hardware, which is why he termed his approach "Evolutionary Programming"
http://www.scholarpedia.org/article/Evolutionary_programming
Evolutionary Programming
The experimental procedure was as follows:
A population of FSMs is exposed to the environment – that is, the sequence of symbols that has been
observed up to the current time
For each parent machine, as each input symbol is presented to the machine, the corresponding output
symbol is compared with the next input symbol
The worth of this prediction is then measured with respect to the payoff function (e.g., all-none, squared
error)
After the last prediction is made, a function of the payoff for the sequence of symbols (e.g., average
payoff per symbol) indicates the fitness of the machine or program
Offspring machines are created by randomly mutating the parents and are scored in a similar manner
Those machines that provide the greatest payoff are retained to become parents of the next
generation, and the process iterates
When new symbols are to be predicted, the best available machine serves as the basis for making such
a prediction and the new observation is added to the available database
Fogel described this process as “evolutionary programming” in contrast to “heuristic programming”
http://www.scholarpedia.org/article/Evolutionary_programming
Overview of Evolutionary Programming
Developed: USA in the 1960s
Early names: L. Fogel, D. Fogel
Typically applied to:
Traditional EP: machine learning tasks executed by finite state machines
Contemporary EP: (numerical) optimization
Attributed features:
Very open framework: any representation and mutation options OK
Crossbred with ES (contemporary EP)
Consequently: hard to determine a “standard” EP
Special:
No recombination
Self-adaptation of parameters standard (contemporary EP)
http://www.scholarpedia.org/article/Evolutionary_programming
https://slideplayer.com/slide/15979842/
Finite State Machine
https://www.cs.mtsu.edu/~xyang/3080/fsm_recognition.html
Traditional Direction: Prediction using
Finite State Machine
Finite State Machine
States S
Input I
Output O
Transition function (also known as state transition):
δ: S × I → S
Output function:
Λ: S × I → O
Transforms an input stream into an output stream
Can be used for prediction
Example: predict the next symbol in the sequence
Ref: https://slideplayer.com/slide/15979842/
Prediction using Finite State Machine
Finite State Machine
States S: {A, B, C}
Input I: {1, 0}
Output O: {1, 0}
Transition function (also known as state transition):
δ: S × I → S
δ as given in the diagram
Example: if the present state is A and the input is 0, then the next state is B
Output function:
Λ: S × I → O
Example: if the present state is A and the input is 0, then the output is 1
Transforms an input stream into an output stream
Ref: https://slideplayer.com/slide/15979842/
Example: FSM as predictor
Consider the following FSM
Task: predict the next input symbol
Given initial state C and input sequence 011101, the FSM diagram gives
output sequence 110111
Each output symbol is compared with the next input symbol; here 3 of the 5 predictions are correct
Quality: 3 out of 5
Example: Parity Problem
A simple test problem:
Design an FSM to check whether a binary string has an even or an odd number of elements equal to 1
S = {even, odd}
I = {0, 1}
O = {0, 1}
FSM output:
final state = 0 (the sequence has an even number of 1s)
final state = 1 (the sequence has an odd number of 1s)
Ref: https://staff.fmi.uvt.ro/~daniela.zaharie/ma2015/lectures/metaheuristics2015_slides6.pdf
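The parity FSM is small enough to write out directly; a sketch with the transition table as a Python dict (the state names and encoding are our choices).

```python
# Parity-checking FSM from the slide: S = {even, odd}, I = {0, 1};
# output 0 for even parity, 1 for odd parity of the input string.
TRANSITION = {('even', 0): 'even', ('even', 1): 'odd',
              ('odd', 0): 'odd',  ('odd', 1): 'even'}

def parity_fsm(bits, state='even'):
    """Run the FSM over a binary sequence; return the output of the final state."""
    for b in bits:
        state = TRANSITION[(state, b)]
    return 0 if state == 'even' else 1

print(parity_fsm([1, 0, 1, 1]))   # three 1s -> odd parity -> 1
```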
Example: Parity Problem
State diagram: labelled graph
EP Design: Choose S, I, O
Population
Generate random FSM
Generate labels for nodes
Generate arcs
Mutation
Mutation of output symbol
Redirect an arc (mutate target node)
Add or eliminate nodes
Change initial state
Ref: https://staff.fmi.uvt.ro/~daniela.zaharie/ma2015/lectures/metaheuristics2015_slides6.pdf
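A sketch of how the EP mutation operators listed above might act on a dict-encoded FSM; the encoding, operator names, and the omission of node deletion are our simplifications.

```python
import random

def mutate_fsm(fsm, rng=random):
    """Apply one randomly chosen EP mutation from the list above to an FSM encoded as
    {'states': [...], 'init': s0, 'trans': {(s, i): s'}, 'out': {(s, i): o}}."""
    fsm = {'states': list(fsm['states']), 'init': fsm['init'],
           'trans': dict(fsm['trans']), 'out': dict(fsm['out'])}   # copy the parent
    op = rng.choice(['mutate_output', 'redirect_arc', 'add_state', 'change_init'])
    if op == 'mutate_output':                    # mutation of an output symbol
        key = rng.choice(list(fsm['out']))
        fsm['out'][key] = rng.choice([0, 1])
    elif op == 'redirect_arc':                   # redirect an arc (mutate target node)
        key = rng.choice(list(fsm['trans']))
        fsm['trans'][key] = rng.choice(fsm['states'])
    elif op == 'add_state':                      # add a node with random arcs
        new = 'q%d' % len(fsm['states'])
        fsm['states'].append(new)
        for i in (0, 1):
            fsm['trans'][(new, i)] = rng.choice(fsm['states'])
            fsm['out'][(new, i)] = rng.choice([0, 1])
    else:                                        # change the initial state
        fsm['init'] = rng.choice(fsm['states'])
    return fsm
```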
Example: Parity Problem
Mutation example
Change of target node of an arc
Ref: https://staff.fmi.uvt.ro/~daniela.zaharie/ma2015/lectures/metaheuristics2015_slides6.pdf
One more example: Predict prime numbers
Evolving an FSM to predict prime numbers
Fitness function:
1 point for a correct prediction of the next input
0 points for a wrong prediction
FSM to predict prime numbers
Parent selection: Each FSM is mutated only once
Mutation operators (one selected randomly)
Change an output symbol
Change a state transition
Add a state
Delete a state
Change initial state
Survivor selection: (μ + μ)
Evolutionary Programming Perspective
Idea was to simulate evolution as a learning process with the aim of
generating artificial intelligence
It was viewed as the
capability of a system to adapt its behaviour
in order to meet some specified goals in a range of environments
Adaptive behaviour is the key term in this definition, and the
capability to predict the environment was considered to be a
prerequisite
The classic EP systems used finite state machines as individuals
Ref:A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
Modern or contemporary EP
No predefined representation, in general, it is
chosen based on features of the problem at hand
No predefined mutation because it must be suitable
for the chosen representation
Often applies self-adaptation of mutation
parameters, as ES does
In the sequel we present one EP variant
Contemporary Evolutionary Programming
Perspective and Evolutionary Strategies
EP frequently uses real-valued representations, and so has almost
merged with ES
The principal differences between ES and EP lie in the biological inspiration:
In EP each individual is seen as corresponding to a distinct species, and so there is no recombination
Furthermore, the selection mechanisms are different
In ES parents are selected stochastically, and the selection of the μ best from the union of the μ parents and λ offspring is deterministic
By contrast, in EP
Each parent generates exactly one offspring (i.e., λ = μ)
These parents and offspring populations are then merged and compete in
stochastic round-robin tournaments for survival
Ref:A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
Technical Summary
Ref:A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
Representation
Continuous parameter optimisation is the most frequent type of application: f: ℝⁿ → ℝ
Chromosomes consist of two parts:
Object variables: x1,…,xn
Mutation step sizes: σ1,…,σn
Full size chromosome: ⟨x1,…,xn, σ1,…,σn⟩
A chromosome ⟨x1,…,xn, σ1,…,σn⟩ is mutated as follows:
σi' = σi · (1 + α · N(0,1)), with α ≈ 0.2
xi' = xi + σi' · Ni(0,1)
Boundary rule: σ' < ε₀ ⇒ σ' = ε₀
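A sketch of this meta-EP mutation for NumPy arrays; the default α = 0.2 and the ε₀ floor follow the summary above, while the array encoding and floor value are assumptions.

```python
import numpy as np

def ep_mutate(x, sigma, alpha=0.2, eps0=1e-8, rng=None):
    """Meta-EP style mutation: perturb each step size first, then use it to
    perturb the matching object variable; the boundary rule keeps sigma' >= eps0."""
    rng = rng or np.random.default_rng()
    sigma_new = sigma * (1.0 + alpha * rng.standard_normal(len(sigma)))
    sigma_new = np.maximum(sigma_new, eps0)        # boundary rule
    x_new = x + sigma_new * rng.standard_normal(len(x))
    return x_new, sigma_new
```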
Recombination
None
Parent Selection
Each individual creates one child by mutation
Hence parent selection is deterministic and not
biased by fitness
Survivor selection
(μ + μ) selection:
the μ parents and μ offspring are the candidates
A pairwise competition in round-robin format (it works on a rotating basis; a chosen pair is moved to the back of the list after the competition) is executed:
Each solution x from the μ + μ candidates is evaluated against q other randomly chosen solutions
For each comparison, a "win" is assigned if x is fitter than its opponent
The μ solutions with the greatest number of wins are retained to be parents of the next generation
Parameter q allows tuning selection pressure:
Typically q = 10
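A sketch of this round-robin survivor selection; it assumes minimisation (lower fitness is better) and q at most one less than the number of candidates.

```python
import numpy as np

def ep_round_robin(candidates, fitnesses, mu, q=10, rng=None):
    """Stochastic round-robin tournament over the mu parents + mu offspring:
    each candidate meets q randomly chosen opponents, scores a 'win' when it
    is fitter, and the mu candidates with the most wins survive."""
    rng = rng or np.random.default_rng()
    n = len(candidates)
    wins = np.zeros(n, dtype=int)
    for i in range(n):
        opponents = rng.choice([j for j in range(n) if j != i], size=q, replace=False)
        wins[i] = sum(fitnesses[i] < fitnesses[j] for j in opponents)
    keep = np.argsort(-wins)[:mu]                  # most wins first
    return [candidates[i] for i in keep]
```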
Optimal control problems using
Evolutionary Programming
Ref:D.B.Fogel, “Applying Evolutionary Programming to Selected Control Problems”, Computers Math. Applic. Vol. 27, No. 11, pp. 89-104, 1994
https://www.youtube.com/watch?v=kL7xOZvdhDo
https://scialert.net/fulltext/?doi=jai.2008.12.20
https://www.youtube.com/watch?v=Wff2InJn3p0
Evolutionary Strategies
Evolution strategies are evolutionary algorithms
Invented in the early 1960s by Rechenberg and Schwefel
Similar to Genetic Algorithms
find a (near-)optimal solution to a problem within a search space (all possible
solutions)
Often used for empirical experiments, numerical optimisation
Attributed features
Fast and good optimizer for real-valued optimisation
Self-adaptation of (mutation) parameters standard
Based on the principle of strong causality: small changes have small effects
Ref: Thomas Back, Gunter Rudolph, Hans-Paul Schwefel, “Evolutionary Programming and Evolutionary Strategies”
http://www.cmap.polytechnique.fr/~nikolaus.hansen/es-overview-2015.pdf
Hans George Beyer, Hans Paul Schwefel, “Evolutionary Strategies a comprehensive introduction”
A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_1072.pdf
Evolutionary Strategy
Evolutionary strategies survivor selection
variations
(μ + λ) Selection
(μ, λ) Selection
Ref:A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_1072.pdf
Evolutionary strategies survivor selection
variations
(μ + λ) Selection
Set of offspring and parents are merged
They are ranked according to (estimated) fitness
Then the top μ are kept to form the next generation
Or we can say that
to keep the population size constant, the λ worst out
of all μ + λ individuals are discarded;
Evolutionary strategies survivor selection
variations
(μ, λ) Selection
(μ, λ) strategy used in Evolution Strategies where typically λ>μ children are created from a population
of μ parents
Method works on a mixture of age and fitness
Age component means that all the parents are discarded, so no individual is kept for more than one
generation
The fitness component comes from the fact that the λ offspring are ranked according to the fitness, and
the best μ form the next generation
Parents are “forgotten” no matter how good or bad their fitness was compared to that of
the new generation
In Evolution Strategies, (μ, λ) selection, is generally preferred over (μ + λ) selection for the
following reasons
The (μ, λ) discards all parents and is therefore in principle able to leave (small) local optima
This may be advantageous in a multimodal search space with many local optima
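Both schemes reduce to one line each; a sketch assuming minimisation.

```python
def es_survivors(parents, offspring, fitness, mu, plus=False):
    """(mu, lambda): rank the offspring only; (mu + lambda): rank the merged
    pool of parents and offspring. Lower fitness is better (minimisation)."""
    pool = parents + offspring if plus else offspring
    return sorted(pool, key=fitness)[:mu]
```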
Two membered Evolutionary Strategy
Simple mutation-selection scheme
Works on one individual (the parent), which creates one offspring by means of mutation
The better of parent and offspring is selected deterministically to survive to the next generation
This is the (1+1) selection Evolutionary Strategy
Ref: Thomas Back, Gunter Rudolph, Hans-Paul Schwefel, “Evolutionary Programming and Evolutionary Strategies”
A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_1072.pdf
Two membered Evolutionary Strategy
Here we describe the basic algorithm, termed the two-membered evolution strategy, for the abstract problem of minimizing an n-dimensional function
Task
Minimise f: ℝⁿ → ℝ
Algorithm: "two-membered ES" using
Vectors from ℝⁿ directly as chromosomes
Population size 1
Only mutation, creating one child
Greedy selection
Ref: Thomas Back, Gunter Rudolph, Hans-Paul Schwefel, “Evolutionary Programming and Evolutionary Strategies”
A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_1072.pdf
https://www.cs.vu.nl/~gusz/ecbook/ecbook-course.html
Two membered Evolutionary Strategy
Set t = 0 (here t denotes the generation counter)
Create an initial point x^t = ⟨x1^t, …, xn^t⟩
REPEAT UNTIL (termination condition is satisfied) DO
Draw zi from a normal distribution, for all i = 1, …, n
yi^t = xi^t + zi
IF f(x^t) < f(y^t) THEN x^{t+1} = x^t
ELSE x^{t+1} = y^t
Set t = t + 1
OD
Ref: Thomas Back, Gunter Rudolph, Hans-Paul Schwefel, “Evolutionary Programming and Evolutionary Strategies”
A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_1072.pdf
https://www.cs.vu.nl/~gusz/ecbook/ecbook-course.html
Two membered Evolutionary Strategy - Mutation mechanism
z values are drawn from a normal distribution N(ξ, σ)
mean ξ is set to 0
standard deviation σ is called the mutation step size
Rechenberg suggested the following heuristic rule for adjusting σ,
the 1/5 success rule:
The ratio of successful mutations to all mutations should be 1/5. If it is greater than 1/5, increase the variance; if it is less, decrease the mutation variance.
σ is varied on the fly by the "1/5 success rule":
The rule resets σ after every k iterations by
σ = σ / c if ps > 1/5
σ = σ · c if ps < 1/5
σ = σ if ps = 1/5
where ps is the fraction of successful mutations, and 0.8 < c < 1
Ref: Thomas Back, Gunter Rudolph, Hans-Paul Schwefel, “Evolutionary Programming and Evolutionary Strategies”
A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_1072.pdf
https://www.cs.vu.nl/~gusz/ecbook/ecbook-course.html
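A compact (1+1)-ES sketch with the 1/5 success rule as described above; the values of k and c and the sphere test function are illustrative choices.

```python
import numpy as np

def one_plus_one_es(f, x0, sigma=1.0, k=50, c=0.9, iters=5000, rng=None):
    """(1+1)-ES with Rechenberg's 1/5 success rule: every k mutations the
    success ratio ps is measured and sigma is rescaled by 1/c or c (0.8 < c < 1)."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    fx, successes = f(x), 0
    for t in range(1, iters + 1):
        y = x + sigma * rng.standard_normal(x.shape)   # Gaussian mutation N(0, sigma)
        fy = f(y)
        if fy < fx:                                    # greedy (1+1) selection
            x, fx, successes = y, fy, successes + 1
        if t % k == 0:                                 # apply the 1/5 rule
            ps = successes / k
            if ps > 0.2:
                sigma /= c                             # too many successes: widen search
            elif ps < 0.2:
                sigma *= c                             # too few: narrow search
            successes = 0
    return x, fx

# e.g., minimise the sphere function
xs, fs = one_plus_one_es(lambda v: float(np.sum(np.square(v))), [3.0, -2.0])
```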
Normal distribution basics
[Figure: the Gaussian probability density N(ξ, σ) and its basic properties]
Two membered Evolutionary Strategy - Mutation mechanism
Given a current solution in the form of a vector of length n, a new candidate is created by
adding a random number zi, for i = 1, …, n, to each of the n components
A Gaussian, or normal, distribution is used, with zero mean and standard deviation σ, for drawing the random numbers
This distribution is symmetric about zero and has the feature that the probability of drawing a random number of any given magnitude decreases rapidly as that magnitude grows relative to the standard deviation
Thus the value σ is a parameter of the algorithm that determines the extent to which given values are perturbed by the mutation operator
Ref: Thomas Back, Gunter Rudolph, Hans-Paul Schwefel, “Evolutionary Programming and Evolutionary Strategies”
A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_1072.pdf
https://www.cs.vu.nl/~gusz/ecbook/ecbook-course.html
Two membered Evolutionary Strategy - Mutation mechanism
For this reason σ is often called the mutation step size
Theoretical studies motivated an on-line adjustment of step sizes by the famous 1/5 success rule of Rechenberg
This rule states that the ratio of successful mutations (those in which the child is fitter than the parent) to all mutations should be 1/5
Hence if the ratio is greater than 1/5 the step size should be increased to make a wider search of the space,
and if the ratio is less than 1/5 then it should be decreased to concentrate the search more around the current solution
The rule is executed at periodic intervals, for instance, after each k iterations:
σ = σ / c if ps > 1/5 (foot of a big hill → increase σ)
σ = σ · c if ps < 1/5 (near the top of the hill → decrease σ)
σ = σ if ps = 1/5
where ps is the relative frequency of successful mutations measured over a number of trials, and the parameter c is in the range 0.8 < c < 1
Ref: Thomas Back, Gunter Rudolph, Hans-Paul Schwefel, “Evolutionary Programming and Evolutionary Strategies”
A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_1072.pdf
https://www.cs.vu.nl/~gusz/ecbook/ecbook-course.html
Evolutionary Strategies- example
Some essential characteristics of evolution strategies are
illustrated here in the following example
Evolution strategies are typically used for continuous parameter
optimization
There is a strong emphasis on mutation for creating offspring
Mutation is implemented by adding some random noise drawn
from a Gaussian distribution
Mutation parameters are changed during a run of the algorithm
Ref: https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_1072.pdf
https://www.cs.vu.nl/~gusz/ecbook/ecbook-course.html
Historical example: the jet nozzle experiment
[Figure: initial and final nozzle shapes evolved during the experiment]
Ref: https://research.iaun.ac.ir/pd/faramarz_safi/pdfs/UploadFile_1072.pdf
https://www.cs.vu.nl/~gusz/ecbook/ecbook-course.html
Jet nozzle experiment: Representation
Here the self-adaptation mechanism for mutation is used instead of the 1/5 success rule adaptation
Hence the vector x = ⟨x1, x2, …, xn⟩ forms only part of a typical ES genotype
Individuals contain some strategy parameters, in particular, parameters of the mutation
operator
Details of mutation are discussed later
Here we focus only on the structure of individuals, and specify the meaning of the special
genes
Strategy parameters can be divided into two sets,
the 𝜎 values and the 𝛼 values
𝜎 values represent the mutation step sizes
𝛼 values, which represent interactions between the step sizes used for different variables, are not
always used
Jet nozzle experiment: Representation
Chromosomes consist of three parts:
Object variables: x1,…,xn
Strategy parameters:
Mutation step sizes: σ1,…,σn
Rotation angles: α1,…,αk
Not every component is always present
Full size: ⟨x1,…,xn, σ1,…,σn, α1,…,αk⟩
where k = n(n−1)/2
Jet nozzle experiment: Mutation
Main mechanism: changing value by adding random noise drawn from a normal distribution
x'i = xi + N(0, σ)
Key idea:
σ is part of the chromosome ⟨x1,…,xn, σ⟩
σ is also mutated into σ' (see later how)
Thus: the mutation step size is coevolving with the solution x
Jet nozzle experiment: Mutate σ first
Net mutation effect: ⟨x, σ⟩ → ⟨x', σ'⟩
Order is important:
first σ → σ' (see later how)
then x → x' = x + N(0, σ')
Rationale: the new pair ⟨x', σ'⟩ is evaluated twice
Primary: x' is good if f(x') is good
Secondary: σ' is good if the x' it created is good
Reversing the mutation order would not work
Mutation case 1: Uncorrelated mutation with one σ
Chromosomes: ⟨x1,…,xn, σ⟩
σ' = σ · exp(τ · N(0,1))
x'i = xi + σ' · Ni(0,1)
Typically the "learning rate" τ ∝ 1/√n
Boundary rule: σ' < ε₀ ⇒ σ' = ε₀
Mutation case 2: Uncorrelated mutation with n σ's
Chromosomes: ⟨x1,…,xn, σ1,…,σn⟩
σ'i = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
x'i = xi + σ'i · Ni(0,1)
Two learning rates: τ' ∝ 1/(2n)^½ and τ ∝ 1/(2n^½)^½
Boundary rule: σ'i < ε₀ ⇒ σ'i = ε₀
Mutation case 3: Correlated mutations
Chromosomes: ⟨x1,…,xn, σ1,…,σn, α1,…,αk⟩ with k = n(n−1)/2
σ'i = σi · exp(τ' · N(0,1) + τ · Ni(0,1))
α'j = αj + β · N(0,1), with β ≈ 5°
x' = x + N(0, C')
x stands for the vector ⟨x1,…,xn⟩
C' is the covariance matrix C after mutation of the σ and α values
σ'i < ε₀ ⇒ σ'i = ε₀ and |α'j| > π ⇒ α'j = α'j − 2π · sign(α'j)
[Figure: mutants with equal likelihood: circles/ellipses of equal probability density for the three mutation schemes]
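A sketch of case 2 (uncorrelated mutation with n step sizes) using the τ and τ' settings above; the array representation and the ε₀ floor value are assumptions.

```python
import numpy as np

def mutate_n_step_sizes(x, sigmas, eps0=1e-8, rng=None):
    """Uncorrelated ES mutation with n step sizes (case 2 above):
    sigma'_i = sigma_i * exp(tau' * N(0,1) + tau * N_i(0,1));
    x'_i = x_i + sigma'_i * N_i(0,1)."""
    rng = rng or np.random.default_rng()
    n = len(x)
    tau_prime = 1.0 / np.sqrt(2.0 * n)           # overall learning rate 1/sqrt(2n)
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))        # coordinate-wise rate 1/sqrt(2*sqrt(n))
    shared = rng.standard_normal()               # one draw shared by all step sizes
    new_sigmas = sigmas * np.exp(tau_prime * shared + tau * rng.standard_normal(n))
    new_sigmas = np.maximum(new_sigmas, eps0)    # boundary rule: sigma' >= eps0
    return x + new_sigmas * rng.standard_normal(n), new_sigmas
```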
Recombination variants for child z of parents x and y:
Two fixed parents, zi = (xi + yi)/2 → local intermediary
Two parents selected anew for each i, zi = (xi + yi)/2 → global intermediary
Two fixed parents, zi is xi or yi chosen randomly → local discrete
Two parents selected anew for each i, zi is xi or yi chosen randomly → global discrete
Recombination
The basic recombination scheme in evolution strategies involves two parents that create one child
To obtain λ offspring, recombination is performed λ times
There are two recombination variants, distinguished by the manner of recombining parent alleles:
Using discrete recombination, one of the parent alleles is randomly chosen with equal chance for either parent
In intermediate recombination, the values of the parent alleles are averaged
Recombination
An extension of this scheme allows the use of more than two recombinants, because the two parents x and y are drawn anew at random for each position i ∈ {1, …, n} in the offspring
These drawings take the whole population of individuals into consideration, and the result is a
recombination operator with possibly more than two individuals contributing to the offspring.
The exact number of parents, however, cannot be defined in advance. This multi-parent variant
is called global recombination
To make terminology unambiguous, the original variant is called local recombination
Evolution strategies typically use global recombination
Interestingly, different recombination is used for the object variable part (discrete is
recommended) and the strategy parameters part (intermediary is recommended).
This scheme preserves diversity within the phenotype (solution) space, allowing the trial of very
different combinations of values, whilst the averaging effect of intermediate recombination
assures a more cautious adaptation of strategy parameters
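A sketch of global recombination creating one child, with discrete recombination on object variables and intermediary recombination on step sizes, as recommended above; the (x, σ) pair encoding is an assumption.

```python
import numpy as np

def global_recombination(population, rng=None):
    """ES global recombination: for each position i two parents are drawn anew
    from the whole population; individuals are (x, sigma) pairs of equal-length arrays."""
    rng = rng or np.random.default_rng()
    n = len(population[0][0])
    child_x, child_s = np.empty(n), np.empty(n)
    for i in range(n):
        (x1, s1), (x2, s2) = (population[rng.integers(len(population))]
                              for _ in range(2))
        child_x[i] = x1[i] if rng.random() < 0.5 else x2[i]   # discrete
        child_s[i] = 0.5 * (s1[i] + s2[i])                    # intermediary
    return child_x, child_s
```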
Parent selection
Parents are selected by uniform random distribution
whenever an operator needs one/some
Thus: ES parent selection is unbiased - every
individual has the same probability to be selected
Note that in ES “parent” means a population member
(in GA’s: a population member selected to undergo
variation)
Survivor selection
Applied after creating λ children from the μ parents by mutation and recombination
Deterministically chops off the "bad stuff"
Basis of selection is either:
The set of children only: (μ,λ)-selection
The set of parents and children: (μ+λ)-selection
Survivor selection cont'd
(μ+λ)-selection is an elitist strategy
(μ,λ)-selection can "forget"
Often (μ,λ)-selection is preferred for:
Better in leaving local optima
Better in following moving optima
Using the + strategy, bad σ values can survive in ⟨x, σ⟩ too long if their host x is very fit
Selective pressure in ES is very high (λ ≈ 7 · μ is the common setting)
Self-adaptation illustrated
Given a dynamically changing fitness landscape
(optimum location shifted every 200 generations)
Self-adaptive ES is able to
follow the optimum and
adjust the mutation step size after every shift !
Prerequisites for self-adaptation
μ > 1 so that different strategies can be carried
λ > μ to generate an offspring surplus
Not "too" strong selection, e.g., λ ≈ 7 · μ
(μ,λ)-selection to get rid of misadapted σ's
Mixing strategy parameters by (intermediary) recombination on them
Learning Classifier Systems
Learning Classifier Systems (LCS) represent an alternative evolutionary approach to model
building based on the use of rule sets, rather than parse trees, to represent knowledge
LCS are used primarily in applications where the objective is to evolve a system that will
respond to the current state of its environment (i.e., the inputs to the system) by suggesting a
response that in some way maximises future reward from the environment
An LCS is therefore a combination of a classifier system and a learning algorithm
The classifier system component is typically a set of rules, each mapping certain inputs to
actions
The whole rule set therefore constitutes a model that covers the space of possible inputs and
suggests the most appropriate actions for each
The learning algorithm component of an LCS is implemented by an evolutionary algorithm,
whose population members either represent individual rules, or complete rule sets, known
respectively as the Michigan and Pittsburgh approaches
Ref:A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
Learning Classifier Systems
The learning algorithm component of an LCS is implemented by an evolutionary
algorithm, whose population members either represent individual rules, or complete
rule sets, known respectively as the Michigan and Pittsburgh approaches
The fitness driving the evolutionary process may be driven by many different forms
of learning, here we restrict ourselves to ‘supervised’ learning, where at each stage
the system receives a training signal (reward) from the environment in response to
the output it proposes
This helps emphasise the difference between the Michigan and Pittsburgh
approaches
In the Michigan approach, data items are presented to the system one-by-one and individual rules are rewarded according to their predictions
By contrast, in a Pittsburgh approach each individual represents a complete model,
so the fitness would normally be calculated by presenting the entire data set and
calculating the mean accuracy of the predictions
Michigan Style Classifier Systems
The list below summarizes the start of the algorithm's main workflow (the full workflow is given later in this section):
1. A new set of inputs is received from the environment.
2. The rule base is examined to find the match-set of rules.
• If the match set is empty, a 'cover operator' is invoked to generate one or more new matching rules with a random action.
Ref:A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
Michigan Style Classifier Systems
Ref:A.E. Eiben, J.E. Smith, Introduction to Evolutionary Computing, 2nd edition, Natural Computing Series, Springer, 2015
Pittsburgh-style LCS
The Pittsburgh-style LCS predates, but is similar to, the better-known GP:
Each member of the evolutionary algorithm’s population represents a complete model of the mapping from input to
output spaces
Each gene in an individual typically represents a rule, and again a new input item may match more than
one rule, in which case typically the first match is taken
This means that the representation should be viewed as an ordered list, and two individuals which contain
the same rules, but in a different order on the genome, are effectively different models. Learning of
appropriately complex models is typically facilitated by using a variable-length representation so that new
rules can be added at any stage
This approach has several conceptual advantages — in particular, since fitness is awarded to complete rule
sets, models can be learned for complex multi-step problems
The downside of this flexibility is that, like GP, Pittsburgh-style LCS suffers from bloat and the search space becomes potentially infinite
Nevertheless, given sufficient computational resources, and effective methods of parsimony to counteract bloat, Pittsburgh-style LCS has demonstrated state-of-the-art performance in several machine learning domains, especially for applications such as bioinformatics and medicine, where human-interpretability of the evolved models is vital and large data-sets are available so that the system can evolve off-line to minimise prediction error
Two recent examples winning Humies Awards for better-than-human performance are in the realms of prostate cancer detection [272] and protein structure prediction [16]
The Michigan-style LCS was first described by Holland in 1976 as a framework for studying learning in condition/action rule-based systems, using genetic algorithms as the principal method for the discovery of new rules and the reinforcement of successful ones [219]. Typically each member of the population was a single rule representing a partial model, that is to say it might only cover a region of the decision space. Thus it is the entire population that together represents the learned model. Each rule is a tuple {condition:action:payoff}.

The condition specifies a region of the space of possible inputs in which the rule applies. The condition parts of rules may contain wildcard, or 'don't-care', characters for certain variables, or may describe a set of values that a given variable may take, for example, a range of values for a continuous variable. Rules may be distinguished by the number of wildcards they contain, and one rule is said to be more specific than another if it contains fewer wildcards, or if the ranges for certain variables are smaller, in other words if it covers a smaller region of the input space. Given this flexibility, it is common for the condition parts of rules to overlap, so a given input may match a number of rules. In the terminology of LCS, the subset of rules whose condition matches the current inputs from the environment is known as the match set. These rules may prescribe different actions, of which one is chosen. The action specifies either the action to be taken (for example, if controlling robots or on-line trading agents) or the system's prediction (such as a class label or a numerical value). The subset of the match set advocating the chosen action is known as the action set.

Holland's original framework maintained lists of which rules have been used, and when a reward was received from the environment a portion was passed back to recently used rules to provide information for the selection mechanism. The intended effect is that the strength of a rule predicts the value of the reward that the system will gain for undertaking the action. However, the framework proved unwieldy and difficult to make work well in practice.
LCS research was reinvigorated in the mid-1990s by Wilson, who removed the concept of memory and stripped out all but the essential components in his minimalist ZCS algorithm [464]. At the same time several authors were noting the conceptual similarity between LCS and reinforcement learning algorithms, which attempt to learn, for each input state, an accurate mapping from possible actions to expected rewards. The XCS algorithm [465] firmly established this link by extending rule-tuples to {condition:action:payoff, accuracy}, where the accuracy value reflects the system's experience of how well the predicted payoff matches the reward received. Unlike ZCS, the EA is restricted at each cycle, originally to the match set, latterly to the action set, which increases the pressure to discover generalised conditions for each action. As per ZCS, a credit assignment mechanism is triggered by the receipt of rewards from the environment to update the predicted pay-offs for rules in the previous action set. However, the major difference is that these are not used directly to drive selection in the evolution process. Instead selection operates on the basis of accuracy, so the algorithm can in principle evolve a complete mapping from input space to actions.
The list below summarizes the main workflow of the algorithm.
1. A new set of inputs is received from the environment.
2. The rule base is examined to find the match-set of rules.
• If the match set is empty, a 'cover operator' is invoked to generate one or more new matching rules with a random action.
3. The rules in the match-set are grouped according to their actions.
4. For each of these groups the mean accuracy of the rules is calculated.
5. An action is chosen, and its corresponding group noted as the action set.
• If the system is in an 'exploit' cycle, the action with the highest mean accuracy is chosen.
• If the system is in an 'explore' cycle, an action is chosen randomly or via fitness-proportionate selection, acting on the mean accuracies.
6. The action is carried out and a reward is received from the environment.
7. The estimated accuracy and predicted payoffs are then updated for the rules in the current and previous action sets, based on the rewards received and the predicted pay-offs, using a Widrow-Hoff style update mechanism.
8. If the system is in an 'explore' cycle, an EA is run within the action-set, creating new rules (with pay-off and accuracies set to the mean of their parents), and deleting others.
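A simplified sketch of steps 2-5 of this workflow; the rule encoding with '#' wildcards is conventional for LCS, but the function names and the omission of covering, reward updates, and the EA are our simplifications.

```python
import random

def matches(condition, inputs):
    """A condition is a tuple over {0, 1, '#'}; '#' is a wildcard ('don't care')."""
    return all(c == '#' or c == b for c, b in zip(condition, inputs))

def michigan_step(rules, inputs, explore):
    """One sense-act cycle over XCS-like rules of the form
    {'cond': (...), 'action': a, 'payoff': p, 'accuracy': acc}."""
    match_set = [r for r in rules if matches(r['cond'], inputs)]
    if not match_set:
        return None                      # a full system would invoke the cover operator
    by_action = {}
    for r in match_set:                  # group the match set by advocated action
        by_action.setdefault(r['action'], []).append(r)
    mean_acc = {a: sum(r['accuracy'] for r in rs) / len(rs)
                for a, rs in by_action.items()}
    if explore:                          # explore cycle: random action choice
        action = random.choice(list(by_action))
    else:                                # exploit cycle: highest mean accuracy
        action = max(mean_acc, key=mean_acc.get)
    action_set = by_action[action]
    return action, action_set
```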