
2.1-Genetic Algorithms

Genetic algorithms are inspired by Darwin's theory of evolution and natural selection. They use techniques such as mutation, crossover, and selection to evolve solutions to problems by starting with a random population of individuals and iterating them through generations. Genetic algorithms have been successfully applied to optimization problems, scheduling, data fitting, and other problems to find high-quality solutions in a more "natural" way compared to traditional algorithms.

Uploaded by

Upender Dhull
Copyright
© Attribution Non-Commercial (BY-NC)

GENETIC ALGORITHMS
Introduction

• Genetic algorithms are inspired by Darwin's theory of evolution (i.e., survival of the fittest):
“New breeds or classes of living things come into existence through the process of reproduction, crossover, and mutation among existing organisms.”
• This concept from the theory of evolution has been translated into an algorithm that searches for solutions to problems in a more ‘natural’ way.
History

• Genetic algorithms originated from the studies of cellular automata conducted by John Holland and his colleagues at the University of Michigan.
• Until the early 1980s, research in genetic algorithms was mainly theoretical, with few real applications.
• From the early 1980s, the genetic algorithm community has experienced an abundance of applications spread across a large range of disciplines. Each additional application gave a new perspective to the theory.
• In the process of improving performance as much as possible by tuning and specializing the genetic algorithm operators, new and important findings regarding the generality, robustness, and applicability of genetic algorithms became available.
• Following the last couple of years of furious development of genetic algorithms in the sciences, engineering, and the business world, these algorithms in various guises have now been successfully applied to optimization problems, scheduling, data fitting and clustering, trend spotting, and path finding.
Biological Background

• Chromosomes are strings of DNA (genes) and serve as a model for the whole organism.
• Each gene encodes a particular protein. Basically, it can be said that each gene encodes a trait, for example eye color.
• Possible settings for a trait (e.g., blue, brown) are called alleles.
• Each gene has its own position in the chromosome. This position is called the locus.
• The complete set of genetic material (all chromosomes) is called the genome.
• A particular set of genes in the genome is called the genotype.
• The genotype is the basis, after birth, for the organism's phenotype: its physical and mental characteristics, such as eye color, intelligence, etc.
http://gslc.genetics.utah.edu/units/basics/tour/
• During reproduction, recombination (or crossover) occurs first. Genes from the parents combine to form a whole new chromosome.
• The newly created offspring can then be mutated. Mutation means that elements of the DNA are changed slightly. These changes are mainly caused by errors in copying genes from the parents.
• The fitness of an organism is measured by the success of the organism in its life (survival).
Biological Metaphor
• The algorithm begins with a set of solutions (represented by chromosomes) called the population.
• Solutions from one population are evaluated for their goodness (fitness) and used to form a new population (reproduction). This is motivated by the hope that the new population will be better than the old one.
• The solutions selected to form new solutions (offspring) are chosen according to their fitness: the more suitable they are, the more chances they have to reproduce.
• This is repeated until some condition (for example, the number of populations or the improvement of the best solution) is satisfied.
Flowchart
START → FITNESS EVALUATION → GENERATE NEW POPULATION → REPLACE → END CONDITION SATISFIED? (NO: back to FITNESS EVALUATION; YES: STOP)
1. Generate a random population of N chromosomes (feasible solutions for the problem).
2. Evaluate the fitness f(x) of each chromosome x in the population.
3. Create a new population by repeating the following steps until the new population is complete: selection, then crossover or mutation.
4. Replace unfit individuals in the old population with the new offspring.
5. If the end condition is satisfied, stop and return the best solution in the current population; else go to step 2.
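The five steps can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `run_ga`, the binary tournament used for selection, the parameter values, and the "OneMax" fitness (count of 1 bits) are all chosen here for demonstration and are not taken from the slides.

```python
import random

def run_ga(fitness, n_bits=20, pop_size=30, generations=60,
           p_cross=0.9, p_mut=0.02, seed=0):
    """Minimal generational GA over bit lists; maximizes `fitness`."""
    rng = random.Random(seed)

    def pick(pop, scores):
        # Binary tournament: an assumed selection scheme, not from the slides.
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if scores[i] >= scores[j] else pop[j]

    # Step 1: random initial population of N chromosomes.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):            # Step 5: stop after a fixed budget
        scores = [fitness(c) for c in pop]  # Step 2: evaluate fitness
        new_pop = [best[:]]                 # keep the best-so-far individual
        while len(new_pop) < pop_size:      # Step 3: build the new population
            p1, p2 = pick(pop, scores), pick(pop, scores)
            child = p1[:]
            if rng.random() < p_cross:      # one-point crossover
                cut = rng.randrange(1, n_bits)
                child = p1[:cut] + p2[cut:]
            # bit-flip mutation, applied gene by gene
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        pop = new_pop                       # Step 4: replace the old population
        best = max(pop, key=fitness)
    return best

best = run_ga(sum)  # "OneMax": fitness is the number of 1 bits
print(sum(best), best)
```

Because the best individual is copied into each new population, the best fitness in this sketch never decreases from one generation to the next.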
An Illustrative Example

• How may a line be fit through a given data set using a genetic algorithm?
• Consider a known model structure, a linear fit
y = C1x + C2
– Encode C1 and C2 in a 6-bit string each (L = 6)
– E.g., 000111 010100 (genotype); read as plain integers, C1 = 7 and C2 = 20, i.e., y = 7x + 20 (phenotype)
– Initial population = 4
– Assume the minimum of C1 and C2 is Cmin = -2 and the maximum of C1 and C2 is Cmax = 5.
– Evaluate the performance from the given data set
– Cut-off value = 0.8 (relative fitness)
• Mapping from genotype to phenotype, where b is the integer value of the bit string:
– $C_i = C_{\min,i} + \frac{b}{2^L - 1}\,(C_{\max,i} - C_{\min,i})$
• Selection: strings with better fitness values receive correspondingly more copies in the new generation, and strings with lower fitness values are eliminated (i.e., fitness proportionate selection)
• Crossover: strings are able to mix and match their desirable qualities in a random fashion (i.e., one-point crossover)
• Mutation: helps to enlarge the search space by allowing a vital bit of information to vary (from 1 to 0 or from 0 to 1) (i.e., bit-flip mutation). Mutation takes place very rarely (e.g., 0.005/bit/generation)
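The genotype-to-phenotype mapping of this example can be sketched as follows. The helper names `decode` and `line_fit_error`, the sum of squared errors as the performance measure, and the two data points are assumptions made for illustration; the slides only say that performance is evaluated from the given data set.

```python
def decode(bits, c_min, c_max):
    """Map a bit string to a real value in [c_min, c_max]."""
    b = int(bits, 2)                  # integer value of the bit string
    n_bits = len(bits)                # L in the slide's formula
    return c_min + b / (2**n_bits - 1) * (c_max - c_min)

def line_fit_error(chromosome, data, c_min=-2.0, c_max=5.0, n_bits=6):
    """Sum of squared errors of y = C1*x + C2 (an assumed performance measure)."""
    c1 = decode(chromosome[:n_bits], c_min, c_max)
    c2 = decode(chromosome[n_bits:], c_min, c_max)
    return sum((y - (c1 * x + c2)) ** 2 for x, y in data)

chromosome = "000111010100"           # the 12-bit genotype from the example
c1 = decode(chromosome[:6], -2.0, 5.0)
c2 = decode(chromosome[6:], -2.0, 5.0)
data = [(0.0, 0.2), (1.0, -1.0)]      # hypothetical data points
print(round(c1, 4), round(c2, 4), line_fit_error(chromosome, data))
```

With Cmin = -2 and Cmax = 5, the bit strings 000111 (b = 7) and 010100 (b = 20) map to about -1.22 and 0.22 rather than to the raw integers 7 and 20.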
Another Illustrative Example

• Given some functional mapping for a system, some membership functions and their shapes are assumed for the various fuzzy variables defined for a problem.
• These membership functions are then coded as bit strings, which are then concatenated.
• An evaluation function (fitness) is used to measure the fitness of each set of membership functions.
• Single-input (x), single-output (y) system:
x 1 2 3 4 5
y 1 4 9 16 25
• Functional mapping in fuzzy logic for the system:
x S (small) L (large) x ∈ [0, 5]
y S (small) VL (very large) y ∈ [0, 25]
• The only parameter needed to describe the membership functions (shape and position) is the length of the bases → encoding
[Figure: membership functions S and L for x, with peak value μx = 1.0; their base lengths are Base 1 and Base 2]
• 6-bit string (binary) to define each base → 24-bit string chromosome
• Initial population = 4
Design Parameters

1. Design a representation
2. Decide how to initialize a population
3. Design a mechanism to map the phenotype to genotype and vice versa
4. Design a way of evaluating an individual
5. Design suitable mutation operators
6. Design suitable crossover operators
7. Decide how to select parents for crossover
8. Decide how to select individuals to be replaced
9. Decide when to stop the algorithm
10. Decide how to manage the population
1. Designing a Representation

• We have to come up with a method of representing an individual as a genotype.
• The way we choose to do it must be relevant to the problem that we are trying to solve.
• When choosing a representation, we have to bear in mind how the genotypes will be evaluated and what the genetic operators might be.
For n-city Traveling Salesman Problem,
how should we design a representation
(phenotype vs. genotype)?
[Figure: plot over the unit square; both axes run from 0 to 1]
Binary Valued Representation

• A chromosome should in some way contain information about the solution that it represents. The most common encoding is a binary string. A chromosome could then look like this:
– Chromosome 1: 1101 1001 0011 0110
– Chromosome 2: 1101 1110 0001 1110
• Each chromosome is represented by a binary string. Each bit in the string can represent some characteristic of the solution. Another possibility is that the whole string represents a number.
Examples

An 8-bit GENOTYPE (here 10100011): what PHENOTYPE can it represent? An integer, a real number, a schedule, ... anything?
• The phenotype can be an integer:
$1\cdot2^7 + 0\cdot2^6 + 1\cdot2^5 + 0\cdot2^4 + 0\cdot2^3 + 0\cdot2^2 + 1\cdot2^1 + 1\cdot2^0 = 128 + 32 + 2 + 1 = 163$
• The phenotype can be a real number, e.g., a number between 2.5 and 20.5 using 8 binary digits:
$x = 2.5 + \frac{163}{256}\,(20.5 - 2.5) = 13.9609$
• The phenotype can be a schedule, e.g., 8 jobs and 2 time steps:
Job        1  2  3  4  5  6  7  8
Time step  2  1  2  1  1  1  2  2
Real Valued Representation

• A very natural encoding: if the solution we are looking for is a list of real-valued numbers, then encode it as a list of real-valued numbers (not as a string of 1's and 0's).
• Lots of applications, e.g., parameter optimization
• Individuals are represented as a tuple of n real-valued numbers: $X = (x_1, x_2, \ldots, x_n)$, $x_i \in \mathbb{R}$
• The fitness function maps tuples of real numbers to a single real number: $f: \mathbb{R}^n \to \mathbb{R}$
Binary vs. Real Representation

• The original genetic algorithm is based on binary coding, due to biological evidence. But it has some drawbacks when applied to multidimensional, high-precision numerical problems.
• There are three advantages of real representation:
– First, the real representation provides adequate precision so that good values are representable in the search space. If a parameter is coded in binary form, there is always the danger that one is not allowed enough precision to represent the parameter values that produce the best solution values.
– Second, with the same number of digits, the real representation has a larger range, and the range does not have to be a power of two.
– Third, real-coded genes have the ability to exploit the gradualness of continuous variables (gradualness means that small changes in the variables correspond to small changes in the function).
• Although the real-representation GA can overcome some disadvantages of the binary one, there are still some weak points, such as inherent premature convergence and weak hill climbing.
Order Based Representation

• Individuals are represented as permutations
• Used for ordering and sequencing problems
• Famous example: the traveling salesman problem, where every city is assigned a unique number from 1 to n. A solution could be (5, 4, 2, 1, 3).
• Famous example: the N-queens problem (or N²-queens)
• Needs special operators to make sure the individuals remain valid permutations.
N Queens Problem

• In chess, a queen can move as far as she pleases, horizontally, vertically, or diagonally. A chess board has 8 rows and 8 columns. The standard 8 by 8 Queen's problem asks how to place 8 queens on an ordinary chess board so that none of them can hit any other in one move.
• One solution - the prettiest in my opinion - is given in the figure nearby. It turns out that there are 12 essentially distinct solutions. (Two solutions are not essentially distinct if you can obtain one from another by rotating your chess board, or placing it in front of a mirror, or combining these two operations.)
Constraint Satisfaction Problem
n                   1  2  3  4   5  6   7   8    9   10    11     12     13      14       15
Unique solutions    1  0  0  1   2  1   6  12   46   92   341   1787   9233   45752   285053
Distinct solutions  1  0  0  2  10  4  40  92  352  724  2680  14200  73712  365596  2279184
* Distinct solutions are derived from unique solutions by rotations, reflections and transpositions
• Variants:
– “Super Queen” or N² Queens (3-D version) problem
– The “3DN²QP” requires placing N² queens in an N×N×N cube so that no two queens threaten each other. There can only be one queen in any row parallel to any axis, or in any diagonal. If a solution is projected onto one face, S, of the cube, so that $S_{ij}$ contains a height, k, then S is clearly a Latin square. A solution to 3DN²QP is also a pandiagonal Latin square or Knut Vik design.
• Demonstration (N: 4-60)
– http://www.apl.jhu.edu/~hall/NQueens.html
Tree Based Representation

• Individuals in the population are trees
• Any expression can be drawn as a tree of functions and terminals
– Functions: sine, cosine, add, sub, if-then-else
– Terminals: X, Y, 0.456, true, false, sensor 1, ...
Tree Based: Closure and Sufficiency

• In the tree-based representation we need to specify a function set and a terminal set. It is very desirable that these sets both satisfy closure and sufficiency.
• By closure we mean that each of the functions in the function set is able to accept as its arguments any value and data type that may possibly be returned by some other function or terminal.
• By sufficiency we mean that there should be a solution in the space of all possible programs constructed from the specified function and terminal sets.
2. Initialization of a Population

• Usually, at random
• Uniformly on the search space, if possible
– Binary representation: 0 and 1 with probability 0.5
– Real-valued representation: uniformly on a given interval (OK for bounded values only)
• Seed the population with previously known values or those from heuristics. With care:
– Possible loss of genetic diversity
– Possible unrecoverable bias
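Random initialization for the three representations might look like the sketch below. The function names, the fixed seed, and the sizes in the usage lines are illustrative assumptions.

```python
import random

rng = random.Random(42)  # fixed seed only to make the sketch repeatable

def init_binary(pop_size, n_bits):
    """Each bit is 0 or 1 with probability 0.5."""
    return [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]

def init_real(pop_size, bounds):
    """Each gene is drawn uniformly from its (low, high) interval."""
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]

def init_permutation(pop_size, n):
    """Each individual is a random ordering of the cities 1..n."""
    return [rng.sample(range(1, n + 1), n) for _ in range(pop_size)]

bits = init_binary(3, 6)
reals = init_real(4, [(-2.0, 5.0), (0.0, 1.0)])
perms = init_permutation(5, 8)
print(bits[0], reals[0], perms[0])
```

Seeding with heuristic solutions would replace some of these random individuals with known ones, at the risk of lost diversity noted above.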
For the n-city Traveling Salesman Problem, is there an effective strategy to initialize the population, given how the solution is coded?
• Randomly initialized path
• Hilbert space-filling curve
• Fast closest path
3. Mapping a Genotype to Phenotype

• Sometimes producing the phenotype from the genotype is a simple and obvious process.
• Other times the genotype might be a set of parameters to some algorithm, which works on the problem data to produce the phenotype.
[Diagram: the Genotype and the Problem Data feed a Growth Function, which produces the Phenotype]
For the n-city Traveling Salesman Problem, the mapping from genotype to phenotype should be straightforward.
4. Evaluating an Individual

• By far the most costly step for real applications
– Do not re-evaluate unmodified individuals
• It might be a subroutine, a black-box simulator, or any external process (e.g., a robot experiment).
• The effectiveness of the process depends on the choice of the fitness function.
• You could use approximate fitness, but not for too long (fitness inheritance or fitness approximation).
• Constraint handling: what if the phenotype breaks some constraint of the problem?
– Penalize the fitness
– Specific evolutionary method
• Multiobjective evolutionary optimization gives a set of compromise solutions
For the n-city Traveling Salesman Problem, what would be the cost function?
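One common answer to this question is the total length of the tour. The sketch below assumes a closed tour (the salesman returns to the starting city), Euclidean distances, and 0-based city indices; the name `tour_length` and the four-city square are made up for illustration.

```python
import math

def tour_length(tour, coords):
    """Total length of the closed tour that visits cities in the given order."""
    total = 0.0
    for i in range(len(tour)):
        a = coords[tour[i]]
        b = coords[tour[(i + 1) % len(tour)]]  # wrap around to the start
        total += math.dist(a, b)
    return total

square = [(0, 0), (1, 0), (1, 1), (0, 1)]      # hypothetical city coordinates
print(tour_length([0, 1, 2, 3], square))       # perimeter of the unit square
print(tour_length([0, 2, 1, 3], square))       # tour that crosses both diagonals
```

Since a GA in this setting minimizes distance, the tour length can be used directly as a cost, or converted (for example by taking its reciprocal) when a selection scheme expects larger-is-better fitness.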
5. Mutation Operators

• The mutation operator should allow every part of the search space to be reached.
• The size of mutation is important and should be controllable.
• Mutation should produce valid chromosomes.
For the n-city Traveling Salesman Problem, how should we design the mutation operator?
[Figure: plot over the unit square; both axes run from 0 to 1]
Binary Valued Representation

before: 1 1 1 1 1 1 1 (Grey Elephant)
after:  1 1 1 0 1 1 1 (Black Elephant); the fourth bit is the mutated gene
Mutation probability pm, for each gene
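A minimal sketch of bit-flip mutation, in which every gene flips independently with probability pm. The function name `bit_flip` and the example rate are assumptions for illustration.

```python
import random

def bit_flip(chromosome, pm, rng=random):
    """Flip each bit independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in chromosome]

parent = [1, 1, 1, 1, 1, 1, 1]
print(bit_flip(parent, 0.1))  # most bits usually stay unchanged
```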
Real Valued Representation

• Perturb values by adding some random noise
Often, a Gaussian/normal distribution N(0, σ) is used, where
– 0 is the mean value
– σ is the standard deviation
and $x'_i = x_i + N(0, \sigma_i)$ for each parameter
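The Gaussian perturbation can be sketched as below, with one standard deviation per parameter. The function name and the sample values are illustrative assumptions; bounds handling (for example clipping to an interval) is left out.

```python
import random

def gaussian_mutation(x, sigmas, rng=random):
    """Add zero-mean normal noise with standard deviation sigmas[i] to x[i]."""
    return [xi + rng.gauss(0.0, s) for xi, s in zip(x, sigmas)]

parent = [1.5, -0.3, 4.0]
child = gaussian_mutation(parent, [0.1, 0.1, 0.5])
print(child)
```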


Order Based Representation

• Bit swapping: randomly select two different genes and swap them.
7 3 1 8 2 4 6 5 → 7 3 6 8 2 4 1 5
• Bit reversal: a sequence of bits is reversed somewhere in the chromosome.
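Both operators keep the chromosome a valid permutation. A minimal sketch, with function names chosen here for illustration:

```python
import random

def swap_mutation(perm, rng=random):
    """Swap the genes at two distinct random positions."""
    i, j = rng.sample(range(len(perm)), 2)
    child = perm[:]
    child[i], child[j] = child[j], child[i]
    return child

def inversion_mutation(perm, rng=random):
    """Reverse the segment between two distinct random positions (inclusive)."""
    i, j = sorted(rng.sample(range(len(perm)), 2))
    return perm[:i] + perm[i:j + 1][::-1] + perm[j + 1:]

parent = [7, 3, 1, 8, 2, 4, 6, 5]
print(swap_mutation(parent), inversion_mutation(parent))
```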
Tree Based Representation

• Single point mutation selects one node and replaces it with a similar one.
Example: 2 * (r * r) → π * (r * r) (the terminal 2 is replaced by π)
Annealing Procedure Conjecture

• Two fundamental parameters to be controlled:
– mutation rate and
– degree of variation
• In early stages, allow mutation to occur frequently and with large jumps, for exploration
• Toward the later stages (convergence), limit the occurrence of mutation and use small jumps, for exploitation
6. Recombination/Crossover Operators

• The child should inherit something from both parents. If this is not the case, then the operator is a mutation operator.
• The recombination/crossover operator should be designed in conjunction with the representation so that recombination is not always catastrophic.
• Recombination should produce valid chromosomes.
For the n-city Traveling Salesman Problem, how should we design the crossover operator?
Binary Valued Representation

Two parents are taken from the whole population:
parents:   1 1 1 1 1 1 1   and   0 0 0 0 0 0 0
offspring: 1 1 1 0 0 0 0   and   0 0 0 1 1 1 1
1-point crossover (here the cut is after the third bit)
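The 1-point crossover shown here can be sketched as follows; the function name and the explicit `cut` argument are assumptions (in a GA the cut point would normally be drawn at random).

```python
def one_point_crossover(p1, p2, cut):
    """Exchange the tails of two parents after position `cut`."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

o1, o2 = one_point_crossover([1] * 7, [0] * 7, cut=3)
print(o1, o2)
```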
Real Valued Representation

• Uniform crossover: given two parents, one child is created by taking each gene from one of the two parents:
Parent 1: a b c d e f g h
Parent 2: A B C D E F G H
Child (for example): a b C d E f g H
• Intermediate recombination (arithmetic crossover): given two parents, one child is created by averaging them gene by gene:
Parent 1: a b c d e f
Parent 2: A B C D E F
Child: (a+A)/2 (b+B)/2 (c+C)/2 (d+D)/2 (e+E)/2 (f+F)/2
• Weighted variant, with a bias toward the center between the two parents:
o1 = p1*bias + p2*(1-bias)
o2 = p1*(1-bias) + p2*bias
• Blend (BLX-α) crossover:
– Pick a random number between min(p1, p2) − αΔ and max(p1, p2) + αΔ, where p1 and p2 are the parents, α is a positive number (typically 0.5), and Δ is |p1 − p2|.
– This formulation allows the BLX operator to choose a child on a baseline that extends some distance beyond the two parents. This allows better placement of children than arithmetic crossover; however, the children will still be biased to lie on baselines between sets of parents. Therefore, children could still lack diversity.
– For gene i, the child is drawn from the interval $[X_i^1, X_i^2]$, where
$X_i^1 = \min(x_i^1, x_i^2) - \alpha d_i$, $X_i^2 = \max(x_i^1, x_i^2) + \alpha d_i$, $d_i = |x_i^1 - x_i^2|$
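A sketch of BLX-α applied gene by gene. Drawing the child uniformly from the extended interval is the usual choice and is assumed here; the function name and the sample parents are illustrative.

```python
import random

def blx_alpha(p1, p2, alpha=0.5, rng=random):
    """Blend crossover: sample each gene from the parents' interval,
    extended on both sides by alpha times the parents' distance."""
    child = []
    for a, b in zip(p1, p2):
        d = abs(a - b)
        lo = min(a, b) - alpha * d
        hi = max(a, b) + alpha * d
        child.append(rng.uniform(lo, hi))
    return child

child = blx_alpha([0.0, 2.0], [1.0, 2.0])
print(child)
```

When both parents carry the same value for a gene, the interval collapses and the child inherits that value unchanged, which is one way to see why diversity can still be lost.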
• Unimodal Normally Distributed (UNDX) crossover:
– It first selects three parents at random from the population.
– Next, it finds the midpoint of the first two parents and calls it $x_p$. Then, it finds the difference vector of the first two parents as $d = x_1 - x_2$. The line containing the first two parents is called the primary search line, and the value D is computed as the distance from the third parent to the primary search line. These terms are then combined to form a child $x^c$:
$x^c = x_p + \xi d + D \sum_{i=1}^{n-1} \eta_i e_i$
– where the $e_i$ are the set of vectors orthogonal to the primary search line, ξ and $\eta_i$ are zero-mean normal random numbers, and their standard deviations $\sigma_\xi$ and $\sigma_\eta$ are determined empirically.
– This crossover operator has the advantage of preserving the mean vector and covariance matrix of the parent population, thereby maintaining a similar distribution to the parent population in child generations.
Order Based Representation

• Choose an arbitrary part from the first parent and copy it to the first child.
• Copy the remaining genes that are not in the copied part to the first child:
– starting right after the cut point of the copied part
– using the order of genes from the second parent
– wrapping around at the end of the chromosome
• Repeat this process with the parent roles reversed.
Parent 1: 7 3 1 8 2 4 6 5
Parent 2: 4 3 2 8 6 7 1 5
Copied part from Parent 1: 1 8 2 (remaining genes: 7, 3, 4, 6, 5)
Order of the remaining genes in Parent 2: 4, 3, 6, 7, 5
Child 1: 7 5 1 8 2 4 3 6
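The worked example can be reproduced with the sketch below. It follows the variant shown in the example, where the remaining genes are read in parent 2's order from the start of parent 2 and written starting right after the copied part; other order-crossover variants read parent 2 starting after the cut point instead. The function name and the explicit `start`/`end` arguments are assumptions.

```python
def order_crossover(p1, p2, start, end):
    """Copy p1[start:end] into the child, then fill the remaining
    positions (starting at `end`, wrapping around) in p2's gene order."""
    n = len(p1)
    child = [None] * n
    child[start:end] = p1[start:end]
    copied = set(p1[start:end])
    rest = [g for g in p2 if g not in copied]   # remaining genes, p2's order
    pos = end % n
    for g in rest:
        child[pos] = g
        pos = (pos + 1) % n
    return child

p1 = [7, 3, 1, 8, 2, 4, 6, 5]
p2 = [4, 3, 2, 8, 6, 7, 1, 5]
child1 = order_crossover(p1, p2, 2, 5)  # copied part is 1 8 2
child2 = order_crossover(p2, p1, 2, 5)  # parent roles reversed
print(child1, child2)
```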
Tree Based Representation

Parent expressions (drawn as trees): 2 * (r * r) and π * (r + (1 / r))
Two sub-trees are selected for swapping: here (r * r) and (r + (1 / r)).
Resulting in 2 new expressions: π * (r * r) and 2 * (r + (1 / r))
Crossover vs. Mutation

• Crossover
– modifications depend on the whole population
– decreasing effects with convergence
– exploitation operator
• Mutation
– mandatory to escape local optima
– exploration operator
• GA emphasizes crossover, while ES and EP emphasize mutation
Exploration vs. Exploitation

Exploration = sample unknown regions
Too much exploration = random search, no convergence
Exploitation = try to improve the best-so-far individuals
Too much exploitation = local search only, convergence to a local optimum
7. Selection Strategy

• We want to have some way to ensure that better fitted individuals have a better chance of being chosen for reproduction than poorly fitted ones.
• This will give us selection pressure, which will drive the population forward.
• On the other hand, we have to be careful to give less good individuals at least some chance of being parents - they may include some useful genetic material.
• Risk of losing diversity
For the n-city Traveling Salesman Problem, how should we design the selection operator?
Roulette Wheel Selection

• Parents are selected according to their fitness. The better the chromosomes are, the more chances they have to be selected.
• The size of each section in the roulette wheel is proportional to the value of the fitness function of the corresponding chromosome - the bigger the value is, the larger the section is.
• Better (fitter) individuals have:
– more space
– more chances to be selected
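A minimal sketch of one spin of the roulette wheel, assuming non-negative fitness values where larger is better. The function name and the three-individual example are illustrative.

```python
import random

def roulette_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    r = rng.uniform(0, total)        # where the wheel stops
    acc = 0.0
    for ind, f in zip(population, fitnesses):
        acc += f                     # right edge of this individual's section
        if r < acc:
            return ind
    return population[-1]            # guard for r == total

picks = [roulette_select(["A", "B", "C"], [5, 2, 19]) for _ in range(1000)]
print({name: picks.count(name) for name in "ABC"})
```

Python's standard library offers the same behaviour through `random.choices(population, weights=fitnesses)`.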
Fitness Proportionate Selection

Disadvantages:
• Danger of premature convergence because outstanding individuals take over the entire population very quickly.
• Low selection pressure when fitness values are near each other.
• Behaves differently on transposed versions of the same function.
Fitness Scaling

• At the start, a few extraordinary individuals will dominate the evolution process → premature convergence
• Later on, the population average fitness may be close to the population best fitness → random walk among the mediocre
• A cure for the problem:
– Start with the raw fitness function
– Standardize to ensure
  • Lower fitness is better fitness
  • Optimal fitness equals 0
– Adjust to ensure
  • Fitness ranges from 0 to 1
– Normalize to ensure
  • The sum of the fitness values equals 1
– Linear scaling: f' = af + b
  • a and b are chosen such that $f'_{avg} = f_{avg}$
Tournament Selection

• Select k random individuals, without replacement
• Take the best
– k is called the size of the tournament
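Tournament selection is short enough to sketch directly; the function name and the toy population of integers (whose fitness is their own value) are assumptions for illustration.

```python
import random

def tournament_select(population, fitness, k=3, rng=random):
    """Draw k individuals without replacement and return the fittest."""
    contestants = rng.sample(population, k)
    return max(contestants, key=fitness)

population = list(range(10))
winner = tournament_select(population, fitness=lambda v: v, k=3)
print(winner)
```

A larger k raises the selection pressure: with k equal to the population size, the overall best individual always wins.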
Rank-Based Selection
• Individuals are sorted on their fitness value from best to worst. The place in this sorted list is called the rank.
• Instead of using the fitness value of an individual, the rank is used by a function to select individuals from this sorted list. The function is biased towards individuals with a high rank (= good fitness).
• Fitness: f(A) = 5, f(B) = 2, f(C) = 19
• Rank: r(A) = 2, r(B) = 3, r(C) = 1
• Function: $h(x) = \max - (\max - \min)\,\frac{r(x) - 1}{n - 1}$; with min = 1, max = 5, and n = 3: h(A) = 3, h(B) = 1, h(C) = 5
• If applied, proportion on the roulette wheel: p(A) = 33.3%, p(B) = 11.1%, p(C) = 55.6%
8. Replacement Strategy

• The selection pressure is also affected by the way in which we decide which members of the population to eliminate in order to make space for the new individuals.
• We can use the stochastic selection methods in reverse, or there are some deterministic replacement strategies.
• We can decide never to replace the best in the population: elitism.
For the n-city Traveling Salesman Problem, how should we design the replacement strategy?
Elitism

• Should fitness constantly improve?
– Re-introduce the previous best-so-far into the population (elitism), or
– Keep the best-so-far in a safe place (preservation)
• Theory
– GA: preservation is mandatory
– ES: no elitism is sometimes better
• Application: avoid user's frustration
• Bottom line is “survival of the fittest”
• Generational Replacement Genetic Algorithm (GRGA) replaces the entire population with each generation. This is the original approach of Holland. If the original population size is N, then N offspring are generated, and they are combined with the parents to get 2N individuals, which are then sorted, and the best N individuals are accepted for the next generation. This approach is rather simple, and places the entire burden of selection and extinction upon the reproductive operations.
• Steady State Genetic Algorithm (SSGA) uses a fixed population wherein a finite fraction of the population is extinguished at every generation. This approach separates natural selection into two phases: parental selection for reproduction, and replacement strategy. Whereas there is no choice of replacement strategy for the GRGA (because everyone is replaced), there are choices for replacement strategy in the SSGA.
• Replace Worst: replaces the worst existing chromosome in the population, with probability equal to one, by the newly created child.
• Replace Most Similar: replaces the most similar existing chromosome in the population by the newly created child, to preserve diversity.
• Replace Parent: the child replaces one of its parents (not very useful).
• Replace Random: replaces a random existing chromosome in the population.
• Kill Tournament: for parent selection, it is possible to pick a number of parents at random and replace the worst, or pick two parents at random and replace the worst of the two, with or without some attached probability check. This can also be combined with roulette wheel selection, where the two candidates are chosen by roulette wheel probabilities based upon absolute fitness and then proceed with the tournament. This is known as a Stochastic Tournament.
• Population Decimation: all chromosomes below the cut-off threshold do not survive to the next generation. Their replacements are formed by mating the remaining individual chromosomes.
• Mass Extinction: extinction allows the repopulation of niches and gives room for new adaptations. In mass extinction models, major parts of the population are occasionally replaced, as opposed to the gradual substitution of single individuals that one might see in a classical GA. Most of the time there are no extinctions, sometimes a few individuals die, and very rarely a large proportion of the population is affected. However, the current best individual (an elite class of size one) always survives extinction events, and is often used to repopulate the extinguished individuals in the extinction zones by mutation. The analogy in nature is the re-colonization of a niche by fit individuals that quickly explore the environment and adapt to it.
9. Stopping Criteria

• The optimum is reached!
• Limit on CPU resources
• Maximum number of evolution generations
• Maximum number of fitness evaluations
• Limit on the user's patience
• After some generations without improvement
For the n-city Traveling Salesman Problem, what would be a reasonable stopping criterion?
• In general, generation number, evolution time, fitness threshold, and fitness convergence are indicators for stopping.
• Population convergence: the population is deemed converged when the average fitness across the current population is less than a user-specified percentage away from the best fitness of the current population.
• Gene convergence: this method stops the evolution when a user-specified percentage of the genes are deemed converged.
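The population-convergence test could be written as the sketch below. It assumes larger fitness is better and measures the gap relative to the best fitness; the function name and the 1% default tolerance are illustrative choices.

```python
def population_converged(fitnesses, tolerance=0.01):
    """True when the average fitness lies within `tolerance` (as a fraction
    of the best fitness) of the best fitness in the population."""
    best = max(fitnesses)
    avg = sum(fitnesses) / len(fitnesses)
    return abs(best - avg) <= tolerance * abs(best)

print(population_converged([10.0, 10.0, 9.95]))  # nearly identical fitness
print(population_converged([10.0, 5.0, 1.0]))    # still widely spread
```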
10. Managing the Population

• In general, a fixed population size is used for simplicity.
• If the predetermined population size is too small, there will not be enough schemas to be exploited, resulting in a sub-optimal solution. If the population size is too large, it may require unnecessarily large computational resources and result in an extremely long running time.
• Optimal population size?
• For MOP, a fixed population size will have great difficulty in obtaining a Pareto front with a desired resolution, because the size and shape of the true Pareto front are unknown a priori for most MOPs
• A dynamic population size will be more reasonable for MOEA if the computational effort can be adaptively adjusted based on the complexity of the problem
Performance

• Never draw any conclusion from a single run
– Use statistical measures (average, median) (box plot)
– From a sufficient number of independent runs (30-50 minimum)
• From the application point of view
– Design perspective: find a very good solution at least once
– Production perspective: find a good solution at almost every run
• “What you test is what you get”: don't tune algorithm performance on toy data and expect it to work with real data
Box Plot
• In descriptive statistics, a boxplot (also known as a box-and-whisker diagram or plot) is a convenient way of graphically depicting groups of numerical data through their five-number summaries (the smallest observation, lower quartile (Q1), median (Q2), upper quartile (Q3), and largest observation). A boxplot may also indicate which observations, if any, might be considered outliers. The boxplot was invented in 1977 by the American statistician John Tukey.
• Boxplots can be useful to display differences between populations without making any assumptions about the underlying statistical distribution. The spacings between the different parts of the box help indicate the degree of dispersion (spread) and skewness in the data, and identify outliers. Boxplots can be drawn either horizontally or vertically.
For this data set:
• smallest non-outlier observation = 5 (left "whisker")
• lower (first) quartile (Q1, $x_{.25}$) = 7
• median (second quartile) (Med, $x_{.5}$) = 8.5
• upper (third) quartile (Q3, $x_{.75}$) = 9
• largest non-outlier observation = 10 (right "whisker")
• the value 3.5 is a "mild" outlier, between
• the value 0.5 is an "extreme" outlier,
• the data are skewed to the left (negatively skewed)
• the mean value of the data can also be labeled with a point.
Flowchart
START → FITNESS EVALUATION → GENERATE NEW POPULATION → REPLACE → END CONDITION SATISFIED? (NO: back to FITNESS EVALUATION; YES: STOP)
1. Generate a random population of N chromosomes (feasible solutions for the problem).
2. Evaluate the fitness f(x) of each chromosome x in the population.
3. Create a new population by repeating the following steps until the new population is complete: selection, then crossover or mutation.
4. Replace unfit individuals in the old population with the new offspring.
5. If the end condition is satisfied, stop and return the best solution in the current population; else go to step 2.
Genetic Algorithm for TSP

• Chromosome representation: (order-based)
• Population size & initialization: (50, at random)
• Fitness function: (distance traveled)
• Mutation: (5%)
• Recombination: (order-based)
• Selection: (rank-based)
• Replacement: (elitism)
• Stopping criteria: (number of generations - 200)
Demonstration
Case Study: 101-city TSP problem (eil101)

Total Distance: 678.854309
Population: 50
Generations: 200
Case Study: 535-airport problem (ali535)

Total Distance: 2433.995486
Population: 50
Generations: 200
Case Study: N Queens Problem

• In chess, a queen can move as far as she pleases, horizontally, vertically, or diagonally. A chess board has 8 rows and 8 columns. The standard 8 by 8 Queen's problem asks how to place 8 queens on an ordinary chess board so that none of them can hit any other in one move.
• One solution - the prettiest in my opinion - is given in the figure nearby. It turns out that there are 12 essentially distinct solutions. (Two solutions are not essentially distinct if you can obtain one from another by rotating your chess board, or placing it in front of a mirror, or combining these two operations.)
• Applications in
– Traffic control
– VLSI design
– Deadlock prevention
– Parallel memory storage schemes
– Sensor deployment
– etc.
Problem Issues in GA

• Population size
• Binary representation vs. real representation
• Population initialization
• Noisy fitness function
• Stochastic fitness function (or dynamic environment)
• Fitness inheritance and fitness approximation
• Selection/ranking strategy
• Crossover/Recombination operator
• Mutation operator
• Replacement strategy
• Stopping criteria
• Elitism strategy
• Benchmark test functions
• Exploration vs. exploitation dilemma
• Constraint handling
• Diversity promotion
• Population management
Homework #3- due 11/7/2009

• Problem #1 (Combinatorial Optimization)
Develop a generic genetic algorithm to solve the traveling salesman problem with 20 cities that are uniformly distributed within a unit square in a 2-dimensional plane. The coordinates of the 20 cities are given below in a matrix (each column holds the two coordinates of one city):

cities = [ 0.6606 0.9695 0.5906 0.2124 0.0398 0.1367 0.9536 0.6091 0.8767 0.8148 0.3876 0.7041 0.0213 0.3429 0.7471 0.5449 0.9464 0.1247 0.1636 0.8668 ;
           0.9500 0.6740 0.5029 0.8274 0.9697 0.5979 0.2184 0.7148 0.2395 0.2867 0.8200 0.3296 0.1649 0.3025 0.8192 0.9392 0.8191 0.4351 0.8646 0.6768 ]

Show the “best” route you find and the associated distance, with the computer code attached (with documentation: show your recipe). An example is given below for reference.
[Figure: example route through the 20 cities in the unit square; both axes run from 0 to 1]
• Problem 2 (Scalability)
Extend your genetic algorithm to solve the benchmark 101-city symmetric TSP problem (i.e., eil101, due to Christofides and Eilon). The benchmark problem can be found in the TSPLIB archive at

http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/

There is no need to turn in the code. Only the best route found (i.e., its configuration and distance, in the form of the figure above) is to be turned in. This problem tests whether your algorithm can scale up properly.
• Problem 3 (Numerical Optimization)
Apply the genetic algorithm to De Jong's Rosenbrock's Valley problem
$F(x_1, x_2) = 100\,(x_1^2 - x_2)^2 + (1 - x_1)^2$
where the global minimum, F = 0, is at $(x_1, x_2) = (1, 1)$. This function is also known as the Banana function. It has a very narrow and long ridge. The global optimum lies on this ridge, which forms a parabolic-shaped flat valley. The difficulty of this problem is to converge to the global optimum.
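As a starting point, the objective itself is easy to write down and check by hand; only the function name `rosenbrock` is an assumption here.

```python
def rosenbrock(x1, x2):
    """De Jong's Rosenbrock valley ("banana") function, to be minimized."""
    return 100.0 * (x1**2 - x2) ** 2 + (1.0 - x1) ** 2

print(rosenbrock(1.0, 1.0))  # value at (1, 1)
print(rosenbrock(0.0, 0.0))  # value at the origin
```

Evaluating it gives F(1, 1) = 0 and F(0, 0) = 1, so the function value is lowest at (1, 1), not at the origin. A GA that maximizes fitness would need to convert this cost, for example by negating it.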

• Problem 4 (Multimodal Function)
Apply the genetic algorithm to De Jong's Shekel's Foxholes function
$F_5(x) = 0.002 + \sum_{j=1}^{25} \frac{1}{f_j}$, where $f_j = j + \sum_{i=1}^{2} (x_i - a_{i,j})^6$

a_ij = [ -32 -16 0 16 32 -32 -16 0 16 32 -32 -16 0 16 32 -32 -16 0 16 32 -32 -16 0 16 32 ;
         -32 -32 -32 -32 -32 -16 -16 -16 -16 -16 0 0 0 0 0 16 16 16 16 16 32 32 32 32 32 ]

where $-65.536 \le x_i \le 65.536$, $i = 1, 2$

This is a multimodal function with many local optima (in this case 25). It is a challenge to many standard optimization algorithms because they can get stuck in a local optimum. Note that all 25 peaks have the same base width and that the peaks are located at (32, 32), (16, 32), ..., (-32, -32).
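The function and its 2 × 25 coefficient matrix can be written compactly. The names `A1`, `A2`, and `foxholes` are assumptions for illustration; the two lists reproduce the rows of a_ij.

```python
# Rows of the a_ij matrix: row 1 cycles through the five values,
# row 2 repeats each value five times.
A1 = [-32, -16, 0, 16, 32] * 5
A2 = [v for v in (-32, -16, 0, 16, 32) for _ in range(5)]

def foxholes(x1, x2):
    """Shekel's foxholes in the form F5 = 0.002 + sum_j 1 / f_j."""
    total = 0.002
    for j in range(25):
        fj = (j + 1) + (x1 - A1[j]) ** 6 + (x2 - A2[j]) ** 6
        total += 1.0 / fj
    return total

print(foxholes(-32, -32))  # on the first peak (j = 1)
print(foxholes(0, 0))      # on the middle peak (j = 13)
```

In this form each peak j has a height of roughly 0.002 + 1/j, so the 25 peaks differ in height and the one at (-32, -32) is the tallest.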
