Evolutionary Computation: Theoretical Issues

The document discusses theoretical issues related to comparing evolutionary algorithms. It describes how to evaluate algorithms based on convergence speed and fitness value. Statistical tests can be used to compare algorithms and determine if performance differences are significant. The schema theorem is also covered, which allows predicting the distribution of gene combinations between generations.


Evolutionary Computation

Theoretical Issues
Comparison between evolutionary
algorithms
How to compare Evolutionary Algorithms

 The quality of an Evolutionary Algorithm is evaluated
according to two measures:

 Convergence speed: how many fitness evaluations the
algorithm needs to find a solution having fitness below a
certain threshold (if minimization is the goal)

 Fitness: the best fitness value found within a certain
number of fitness evaluations
How to compare Evolutionary Algorithms

 Convergence speed (or better, execution time) is
approximated as the number of times the fitness function
has been evaluated.

 In fact, computing the fitness function is usually by far the
computationally heaviest part of the algorithm.
How to compare Evolutionary Algorithms

◦ Execute the program N times, annotating the results of each
run (typically, how many generations are needed to reach a
certain fitness value, or what fitness is reached after a number
of fitness evaluations)
◦ Compute mean, standard deviation, and success rate
(percentage of runs which reached the optimum solution) for
the set of results obtained
◦ Perform a statistical test to verify whether the difference
between the performances of the two programs, computed as
the average fitness, is significant.
How to compare Evolutionary Algorithms

 Statistical tests answer the question: “Considering the
outcome of two algorithms as the sampling of two
random variables, what is the probability (the so-called p-
value) of the ‘null’ hypothesis, given their samplings?”

 In our case, the null hypothesis is typically that the two
distributions have the same mean (i.e., there is no
difference in their performance).

Therefore:

◦ The lower the p-value, the more likely it is that the
algorithm scoring better is actually better than the other.
How to compare Evolutionary Algorithms
 If data have a Gaussian distribution, the Student t-test can be
used. It is based on the mean and the standard deviation of
the two samples.
 If data do not follow a Gaussian distribution, a rank test is
necessary:
◦ Sort ALL results from both distributions
◦ Compute the average rank of data from each distribution and
their standard deviation
 The null hypothesis here is that data from the two
distributions have the same average ranking
 The p-value is computed using the Wilcoxon test, properly
corrected according to the number of samples.
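The rank-test procedure above can be sketched in plain Python. The function name and the normal approximation to the rank-sum statistic are illustrative choices, not part of the slides (in practice one would use a statistics library such as SciPy):

```python
import math

def rank_sum_p_value(a, b):
    """Two-sided rank-sum test on two result samples, using a normal
    approximation (reasonable for samples of ~20+ runs each).
    Tied values receive their average rank."""
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:                        # assign average ranks to tied blocks
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # 1-based average rank
        i = j + 1
    n1, n2 = len(a), len(b)
    r1 = sum(r for r, (_, src) in zip(ranks, pooled) if src == 0)
    mu = n1 * (n1 + n2 + 1) / 2         # mean of r1 under the null hypothesis
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (r1 - mu) / sigma
    # two-sided p-value from the standard normal distribution
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```

With clearly separated samples (every result of one algorithm better than every result of the other) the p-value is small; with interleaved samples it approaches 1.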
How to compare Evolutionary Algorithms

 It may happen that, in n out of the N runs of the algorithm,
the objective fitness threshold is not reached (unsuccessful
runs).

 In this case, usually, the algorithm terminates when a
maximum number of generations, set by the user, has been
reached.

 If we evaluated the quality of the algorithm in terms of
convergence speed by using the maximum number of fitness
evaluations for the unsuccessful runs when computing the
average over the N runs, we would obtain a wrong estimate
of the mean.
How to compare Evolutionary Algorithms

 In these cases, one should first compute the ‘success rate’

s% = 100 · (Number of successful runs) / N

 Then, using only the successful runs, estimate the average
number of fitness invocations needed to reach the goal.

 The semantics of this test are: “The algorithm is expected to
reach the goal within avg(Ngeni) generations, averaged over
the successful runs, but in (100-s)% of cases the run will fail
within the generation ‘budget’ it was assigned”
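A minimal sketch of this bookkeeping (function and variable names are illustrative), marking unsuccessful runs with None instead of an evaluation count:

```python
def summarize_runs(evals_to_goal):
    """evals_to_goal: one entry per run, holding the number of fitness
    evaluations needed to reach the goal, or None for a run that
    exhausted its budget without succeeding."""
    successes = [e for e in evals_to_goal if e is not None]
    success_rate = 100.0 * len(successes) / len(evals_to_goal)  # s%
    # average over the successful runs only, to avoid biasing the estimate
    avg_evals = sum(successes) / len(successes) if successes else None
    return success_rate, avg_evals
```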
Theoretical analysis of evolution
Evolutionary algorithms are stochastic optimization processes,
characterized by a non-deterministic component. They can
therefore be studied only using probability theory. For
example, an empirical statistical analysis of the outcome of
several applications of an evolutionary algorithm to the same
problem is the only way to assess the algorithm's
performance.

From a theoretical point of view, there is not much hope of
finding global, final results in the short term. Even if
evolutionary computation is still a young discipline, population
genetics has been studied for a very long time now, and many
problems are still open.
Theoretical analysis of evolution

Two important theoretical results are:

• The schema theorem (Holland), which allows one to make
predictions, within a population, about the distribution of
specific gene combinations (schemata) in generation n+1,
given their distribution in generation n.

• The so-called No Free Lunch theorem (Wolpert & Macready),
which shows that, over the set of all possible problems, all
stochastic optimization methods have the same average
performance.
The schema theorem

The schema theorem has been demonstrated only for genetic
algorithms using a fixed-length binary string representation,
which, anyway, is the most commonly used form of
representation.
If the search space is {0,1}^N (the space of binary strings having
length N), a schema is defined as a hyperplane within such a
space, and is represented by a string which contains the
symbols {0, 1, #}, # denoting the ‘don’t care’ symbol, i.e., string
elements which may assume either value within the schema.
The schema theorem

For instance, using a 5-bit representation, the schema 11###
represents the hyperplane defined by the presence of two
symbols ‘1’ in the first two locations and any acceptable value
in the other 3 locations.
All strings that satisfy such a criterion are called instances or
examples of the schema. In this case the schema admits 2^3 = 8
instances, that is, the number of possible combinations
corresponding to the degrees of freedom offered by the ‘don’t
care’ symbols.
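Schema membership and instance counting can be checked in a few lines of Python (the helper names are ours, not standard):

```python
def is_instance(schema, bitstring):
    """True if the bit string lies on the hyperplane described by the
    schema; '#' matches either bit value."""
    return all(s in ('#', b) for s, b in zip(schema, bitstring))

def num_instances(schema):
    """A schema admits 2**k instances, where k is the number of '#'
    (don't care) symbols."""
    return 2 ** schema.count('#')
```

For the schema 11### above, num_instances returns 8, and strings such as 11010 are instances while 10111 is not.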
The schema theorem
The fitness associated with a schema is the average fitness of all
possible instances of such a schema.
If the number of instances of a schema within a population is
large enough, their average fitness can be used to estimate
the schema fitness.
A global optimization problem can be seen as the search for
the schema, having no ‘don’t care’ symbols, which has
maximum fitness.
The behavior of a genetic algorithm can be studied more
easily in terms of schemata (aggregation) than by studying
single instances separately.
The schema theorem
An L-bit string is an instance of 2^L schemata: in fact, for each
bit in the string, a schema it instantiates may contain, in that
position, either the bit's value or the ‘don’t care’ symbol; it may
not contain the value's negation. This leaves two choices per
position.
Within a population of m individuals, at most m·2^L schemata
can be identified. Usually, they are much fewer: Holland
estimated that m individuals can effectively exploit O(m³)
schemata.
This result, also known as implicit parallelism, is cited as one of
the main reasons for the effectiveness of genetic algorithms.
The schema theorem

Each schema has an order, defined as the number of elements
which differ from ‘#’, and a defining length, that is, the distance
between the outermost elements which differ from ‘#’.

Example
H = 1##0#1#0 has order o(H) = 4
and defining length d(H) = 7
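These two quantities are straightforward to compute (illustrative helper names):

```python
def order(schema):
    """o(H): the number of fixed (non-'#') positions."""
    return sum(1 for c in schema if c != '#')

def defining_length(schema):
    """d(H): the distance between the outermost fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != '#']
    return fixed[-1] - fixed[0] if fixed else 0
```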
The schema theorem
The statistical distribution of the number of instances of a
population’s schemata depends mainly on the effects of
genetic operators.
Selection can only modify the statistical distribution of the
pre-existing instances.
Recombination and mutation may create new instances and
cause the extinction of others.
Pd(H,x) denotes the probability that the application of
operator x has a destructive effect on schema H, while Ps(H)
denotes the probability that an instance of schema H is
selected.
The schema theorem

Holland applied his analysis to the standard genetic algorithm


(SGA), using fitness-proportionate selection, single-point crossover
(1X) and single-bit mutation, within a generational survival
framework.
The schema H associated with a genotype of length L is
destroyed by 1X if the crossover point is located within the
schema extremes, therefore
Pd(H,1X) = d(H) / (L-1)
The schema theorem
Pd(H,mut) associated with mutation, instead, depends on the
schema order
Pd(H,mut) = 1 - (1-Pm)^o(H)
which can be approximated by
Pd(H,mut) = Pm · o(H)
by neglecting the higher-order terms in Pm.

Notice: (1-Pm)^o(H) is the probability that no bits of the
schema are mutated, therefore the probability that the
schema is NOT destroyed. In fact, each (non-)mutation is
considered as an independent event, so the probability that it
occurs n times is (1-Pm)^n
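The two destruction probabilities from these slides can be sketched as follows (helper names are illustrative):

```python
def p_destroy_1x(schema, L):
    """Pd(H,1X) = d(H) / (L-1): the probability that the crossover
    point falls between the schema's outermost fixed positions."""
    fixed = [i for i, c in enumerate(schema) if c != '#']
    return (fixed[-1] - fixed[0]) / (L - 1)

def p_destroy_mut(schema, pm):
    """Pd(H,mut) = 1 - (1-pm)**o(H): the probability that at least
    one fixed bit of the schema is mutated."""
    o = sum(1 for c in schema if c != '#')
    return 1 - (1 - pm) ** o
```

For H = 1##0#1#0 (L = 8), p_destroy_1x gives 7/7 = 1, and with pm = 0.01 the exact mutation-destruction probability 1 - 0.99^4 ≈ 0.0394 is close to the first-order approximation pm·o(H) = 0.04.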
The schema theorem
The probability that a schema is selected depends on the
relative fitness, with respect to the total fitness of the whole
population, of the individuals by which it is instantiated, and on
the number of instances n(H,t).
Ps(H,t) = n(H,t) · f(H)/(m · <f>)
where
f(H) is the schema fitness estimated over its instances
<f> is the average fitness of the population
m is the population size
The schema theorem
The algorithm is considered to be generational, so m
independent samples (individuals) are extracted to create
the next generation. The expected number of instances of
H that are selected therefore amounts to
n’(H, t) = m · Ps(H, t) = n(H, t) · f(H) / <f>
Normalizing to make the result independent of
population size, considering the destructive effects of
operators, and using an inequality sign to take into
account the creation of new instances of H, deriving from
the destruction of other schemata, the result reported in
the following slide is obtained.
The schema theorem
The percentage m(H) of instances of schema H in subsequent
generations satisfies the following inequality:
m(H,t+1) ≥ m(H,t) · (f(H)/<f>) · (1 - pc·d(H)/(L-1)) · (1 - pm·o(H))
pc = crossover rate
pm = mutation rate
(f(H)/<f>) = probability that schema H is selected
(1 - pc·d(H)/(L-1)) = probability that schema H is not
destroyed by crossover
(1 - pm·o(H)) = probability that schema H is not
destroyed by mutation
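The lower bound can be evaluated numerically; the function below is a direct transcription of the inequality's right-hand side (the name and the example numbers are ours):

```python
def schema_bound(m_t, f_H, f_avg, pc, pm, d, o, L):
    """Right-hand side of the schema theorem: a lower bound on the
    expected share of schema H at generation t+1."""
    selection = f_H / f_avg              # fitness-proportionate term
    survives_1x = 1 - pc * d / (L - 1)   # not destroyed by crossover
    survives_mut = 1 - pm * o            # not destroyed by mutation
    return m_t * selection * survives_1x * survives_mut
```

For example, a schema holding 10% of the population with f(H)/<f> = 1.2, pc = 0.7, pm = 0.01, d(H) = 2, o(H) = 3 and L = 10 is expected to hold at least about 9.8% of the next generation.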
The schema theorem

Immediate interpretation: the number of instances of
higher-fitness schemata increases over generations.
However, the theorem has been criticized as follows:
• It is based on fitness-proportionate selection, which does
not ensure that ‘resources’ be optimally allocated to the
different schemata, with respect to convergence.
• Saturation phenomena occur: the ratio f(H)/<f>
decreases as <f> increases, i.e., as the prevalence of schema
H over the other schemata grows.
The schema theorem

• It is based on an estimate of f(H) which makes it possible to
predict the relative frequency of the schema in the next
generation, but not the frequency in the following
generations; in fact, the relative frequency of the other
schemata also changes, modifying, as well, the composition
of the strings belonging to H and therefore the estimate
of f(H).
• It takes into account only the destructive, and not the
constructive, effects of genetic operators.
‘No free lunch’ theorem

A controversial and sometimes abused theorem.
In short, it states that “averaging over the space of all
possible problems, all black-box non-revisiting optimization
algorithms (i.e., those which do not use any a priori
knowledge of the function and evaluate each point of
the search space at most once) have the SAME
performance”. This implies that any algorithm performs,
on average, as well as random search.
‘No free lunch’ theorem
With reference to evolutionary algorithms, however, one
may ask whether the problems they solve are actually a
sample of the WHOLE problem space or whether they
represent only a specific region.
In any case, since the result applies to the set of all
possible problems, it does not mean that, for a
particular problem or class of problems, an optimal
search algorithm may not exist. Consider, for instance,
gradient descent for optimizing unimodal functions, i.e.,
those which do not have any local minima or maxima.
‘No free lunch’ theorem

In any case, from the theorem one can derive that:

 If a new algorithm is created which appears to be the
best for a certain class of problems, the price to be
paid is the existence of other classes of problems for
which it is not suited.
 Considering a specific problem, it is possible to
circumvent the NFL theorem by incorporating specific
knowledge of the solution domain.
Books on Evolutionary Computation

 Eiben, Smith, “Introduction to Evolutionary Computing,
2nd ed.”, Springer, 2015
 Banzhaf, Nordin, Keller, Francone, “Genetic Programming:
an Introduction”, Morgan Kaufmann, 1998
 Poli, Langdon, McPhee, “A Field Guide to Genetic
Programming”, http://Lulu.com, 2008 (free download)
 A. Tettamanzi, M. Tomassini, “Soft Computing: Integrating
Evolutionary, Neural and Fuzzy Systems”, Springer, 2001
 K. De Jong, “Evolutionary Computation: A Unified
Approach”, MIT Press, 2005
