
Genetic Algorithm

It is about service life assessment of concrete with asr and possible mitigation.

Uploaded by

praveenm.phdse
Copyright
© All Rights Reserved
Genetic algorithm

A genetic algorithm is a search technique used in computing to find true or approximate
solutions to optimization and search problems. The term "genetic algorithm" is often
abbreviated as GA. Genetic algorithms find application in computer science, engineering,
economics, physics, mathematics and other fields. Genetic algorithms are categorized as
global search heuristics. Genetic algorithms are a particular class of evolutionary
algorithms that use techniques inspired by evolutionary biology such as inheritance,
mutation, selection, and crossover (also called recombination).

Genetic algorithms are implemented as a computer simulation in which a population of
abstract representations (called chromosomes or the genotype or the genome) of
candidate solutions (called individuals, creatures, or phenotypes) to an optimization
problem evolves toward better solutions. Traditionally, solutions are represented in
binary as strings of 0s and 1s, but other encodings are also possible. The evolution
usually starts from a population of randomly generated individuals and happens in
generations. In each generation, the fitness of every individual in the population is
evaluated, multiple individuals are stochastically selected from the current population
(based on their fitness), and modified (recombined and possibly mutated) to form a new
population. The new population is then used in the next iteration of the algorithm.

GA procedure
A typical genetic algorithm requires two things to be defined:

1. a genetic representation of the solution domain,
2. a fitness function to evaluate the solution domain.

A standard representation of the solution is as an array of bits. Arrays of other types and
structures can be used in essentially the same way. The main property that makes these
genetic representations convenient is that their parts are easily aligned due to their fixed
size, which facilitates simple crossover operations. Variable-length representations may also
be used, but crossover implementation is more complex in this case. Tree-like
representations are explored in genetic programming, and free-form representations are
explored in human-based genetic algorithms (HBGA).

The fitness function is defined over the genetic representation and measures the quality of
the represented solution. The fitness function is always problem dependent. For instance,
in the knapsack problem we want to maximize the total value of objects that we can put
in a knapsack of some fixed capacity. A representation of a solution might be an array of
bits, where each bit represents a different object, and the value of the bit (0 or 1)
represents whether or not the object is in the knapsack. Not every such representation is
valid, as the size of objects may exceed the capacity of the knapsack. The fitness of the
solution is the sum of values of all objects in the knapsack if the representation is valid,
or 0 otherwise. In some problems, it is hard or even impossible to define the fitness
expression; in these cases, interactive genetic algorithms are used.
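The knapsack fitness function described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `knapsack_fitness` and the three-object instance are assumptions made for the example.

```python
from typing import Sequence


def knapsack_fitness(bits: Sequence[int],
                     values: Sequence[float],
                     sizes: Sequence[float],
                     capacity: float) -> float:
    """Total value of the selected objects, or 0 if they exceed the capacity."""
    total_size = sum(s for b, s in zip(bits, sizes) if b)
    if total_size > capacity:
        return 0.0  # invalid representation: the objects do not fit
    return float(sum(v for b, v in zip(bits, values) if b))


# Hypothetical instance: three objects and a knapsack of capacity 10.
values = [60, 100, 120]
sizes = [2, 5, 6]
print(knapsack_fitness([1, 1, 0], values, sizes, 10))  # 160.0 (total size 7 fits)
print(knapsack_fitness([1, 1, 1], values, sizes, 10))  # 0.0 (total size 13 does not fit)
```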

Once we have the genetic representation and the fitness function defined, GA proceeds to
initialize a population of solutions randomly, then improve it through repetitive
application of mutation, crossover, and selection operators.

Initialization

Initially many individual solutions are randomly generated to form an initial population.
The population size depends on the nature of the problem, but typically contains several
hundreds or thousands of possible solutions. Traditionally, the population is generated
randomly, covering the entire range of possible solutions (the search space).
Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to
be found.
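As a rough sketch of this step, the function below builds a population of random bit strings and optionally places a few "seeded" individuals first. The name `init_population` and its parameters are illustrative assumptions, and a bit-string encoding is assumed.

```python
import random
from typing import List, Optional, Sequence


def init_population(pop_size: int,
                    n_bits: int,
                    rng: random.Random,
                    seeds: Optional[Sequence[Sequence[int]]] = None) -> List[List[int]]:
    """Return pop_size bit strings: seeded individuals first, the rest random."""
    population = [list(s) for s in (seeds or [])][:pop_size]
    while len(population) < pop_size:
        # Each bit is drawn uniformly, so the whole search space is reachable.
        population.append([rng.randint(0, 1) for _ in range(n_bits)])
    return population


rng = random.Random(42)  # fixed seed only to make the example repeatable
population = init_population(pop_size=6, n_bits=8, rng=rng, seeds=[[1] * 8])
for individual in population:
    print(individual)
```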

Selection

During each successive generation, a proportion of the existing population is selected to breed
a new generation. Individual solutions are selected through a fitness-based process, where
fitter solutions (as measured by a fitness function) are typically more likely to be
selected. Certain selection methods rate the fitness of each solution and preferentially
select the best solutions. Other methods rate only a random sample of the population, as
this process may be very time-consuming.

Most selection functions are stochastic and designed so that a small proportion of less fit solutions
are selected. This helps keep the diversity of the population large, preventing premature
convergence on poor solutions. Popular and well-studied selection methods include
roulette wheel selection and tournament selection.
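The two selection methods named above might look as follows. This is a sketch under stated assumptions: fitness values are taken to be non-negative, the tournament size `k` is a free parameter, and the function names are chosen for this example only.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def roulette_wheel_select(population: Sequence[T],
                          fitnesses: Sequence[float],
                          rng: random.Random) -> T:
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)  # no fitness signal: fall back to uniform
    pick = rng.uniform(0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    return population[-1]  # guards against floating-point round-off


def tournament_select(population: Sequence[T],
                      fitnesses: Sequence[float],
                      rng: random.Random,
                      k: int = 3) -> T:
    """Sample k distinct individuals and return the fittest of them."""
    contenders: List[int] = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: fitnesses[i])]


rng = random.Random(0)
pop = ["a", "b", "c", "d"]
fits = [1.0, 2.0, 3.0, 4.0]
print(roulette_wheel_select(pop, fits, rng), tournament_select(pop, fits, rng, k=2))
```

Both functions are stochastic, so less fit individuals keep a small chance of being chosen, which is the diversity-preserving property the text describes.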

Reproduction

The next step is to generate a second generation population of solutions from those
selected through genetic operators: crossover (also called recombination), and/or
mutation.

For each new solution to be produced, a pair of "parent" solutions is selected for breeding
from the pool selected previously. By producing a "child" solution using the above
methods of crossover and mutation, a new solution is created which typically shares
many of the characteristics of its "parents". New parents are selected for each child, and
the process continues until a new population of solutions of appropriate size is generated.

These processes ultimately result in a next-generation population of chromosomes that
is different from the initial generation. Generally, the average fitness of the population will
have increased by this procedure, since mostly the fitter organisms from the first
generation are selected for breeding, along with a small proportion of less fit solutions,
for the reasons mentioned above.

Termination

This generational process is repeated until a termination condition has been reached.
Common terminating conditions are:

• A solution is found that satisfies minimum criteria
• Fixed number of generations reached
• Allocated budget (computation time/money) reached
• The highest ranking solution's fitness is reaching or has reached a plateau such
that successive iterations no longer produce better results
• Manual inspection
• Combinations of the above
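Several of the conditions listed above can be combined in one check. The sketch below is an assumption-laden illustration: `should_stop`, `patience` and `tol` are names invented here, fitness is assumed to be maximized, and `best_history` is assumed to hold the best fitness recorded after each generation.

```python
from typing import Optional, Sequence


def should_stop(best_history: Sequence[float],
                max_generations: int,
                target: Optional[float] = None,
                patience: int = 20,
                tol: float = 1e-9) -> bool:
    """Return True when any of the combined termination conditions holds."""
    generation = len(best_history)
    if generation == 0:
        return False
    if target is not None and best_history[-1] >= target:
        return True  # a solution satisfying the minimum criteria was found
    if generation >= max_generations:
        return True  # fixed number of generations reached
    if generation > patience and best_history[-1] - best_history[-1 - patience] <= tol:
        return True  # plateau: no improvement over the last `patience` generations
    return False


print(should_stop([1.0, 2.0, 3.0], max_generations=100))  # False
print(should_stop([5.0] * 30, max_generations=100))       # True (plateau)
```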

Pseudo-code algorithm

Choose initial population
Evaluate the fitnesses of individuals in the population
Repeat
    Select best-ranking individuals to reproduce
    Breed new generation through crossover and mutation (genetic
    operations) and give birth to offspring
    Evaluate the individual fitnesses of the offspring
    Replace worst ranked part of population with offspring
Until terminating condition
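The pseudo-code can be turned into a small runnable sketch. Everything specific here is an assumption made for illustration: the bit-string encoding, truncation selection of the top half as "best-ranking", one-point crossover, the parameter defaults, and the toy fitness (the number of 1 bits).

```python
import random
from typing import Callable, List, Optional, Tuple


def run_ga(fitness: Callable[[List[int]], float],
           n_bits: int,
           pop_size: int = 30,
           n_offspring: int = 10,
           p_mut: float = 0.02,
           max_generations: int = 200,
           target: Optional[float] = None,
           seed: int = 0) -> Tuple[List[int], float]:
    """Steady-state GA following the pseudo-code; returns (best individual, fitness)."""
    rng = random.Random(seed)
    # Choose the initial population and evaluate it.
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    scored = sorted(((fitness(ind), ind) for ind in pop),
                    key=lambda t: t[0], reverse=True)
    for _ in range(max_generations):
        if target is not None and scored[0][0] >= target:
            break  # terminating condition: good enough solution found
        # Select best-ranking individuals to reproduce (top half, an assumption).
        parents = [ind for _, ind in scored[: pop_size // 2]]
        offspring = []
        for _ in range(n_offspring):
            mum, dad = rng.sample(parents, 2)
            cut = rng.randint(1, n_bits - 1)              # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]  # mutation
            offspring.append((fitness(child), child))     # evaluate the offspring
        # Replace the worst-ranked part of the population with the offspring.
        scored = sorted(scored[: pop_size - n_offspring] + offspring,
                        key=lambda t: t[0], reverse=True)
    return scored[0][1], scored[0][0]


best, best_fitness = run_ga(sum, n_bits=20, target=20)
print(best_fitness, best)
```

Because the best-ranked individuals are never replaced, the best fitness in this sketch cannot decrease from one generation to the next.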

Observations

There are several general observations about the generation of solutions via a genetic
algorithm:

• In many problems of sufficient complexity, GAs may have a tendency to
converge towards local optima rather than the global optimum of the problem.
The likelihood of this occurring depends on the shape of the fitness landscape:
certain problems may provide an easy ascent towards a global optimum, while others
may make it easier for the algorithm to settle on local optima. This problem may be
alleviated by using a different fitness function, increasing the rate of mutation, or
by using selection techniques that maintain a diverse population of solutions.
• Operating on dynamic data sets is difficult, as genomes begin to converge early
on towards solutions which may no longer be valid for later data. Several methods
have been proposed to remedy this by increasing genetic diversity and
preventing early convergence, either by increasing the probability of mutation
when the solution quality drops (called triggered hypermutation), or by
occasionally introducing entirely new, randomly generated elements into the gene
pool (called random immigrants). Recent research has also shown the benefits of
using biological exaptation (or preadaptation) in solving this problem.
• GAs cannot effectively solve problems in which the only fitness measure is
right/wrong, as there is no way to converge on the solution (there is no hill to climb). In
these cases, a random search may find a solution as quickly as a GA.
• Selection is clearly an important genetic operator, but opinion is divided over the
importance of crossover versus mutation. Some argue that crossover is the most
important, while mutation is only necessary to ensure that potential solutions are
not lost. Others argue that crossover in a largely uniform population only serves to
propagate innovations originally found by mutation, and in a non-uniform
population crossover is nearly always equivalent to a very large mutation (which
is likely to be catastrophic).
• Often, GAs can rapidly locate good solutions, even for difficult search spaces.
• For specific optimization problems and problem instantiations, simpler
optimization algorithms may find better solutions than genetic algorithms (given
the same amount of computation time). Alternative and complementary
algorithms include simulated annealing, hill climbing, and swarm intelligence
(e.g. ant colony optimization, particle swarm optimization).
• As with all current machine learning problems, it is worth tuning the parameters
such as mutation probability, recombination probability and population size to
find reasonable settings for the problem class being worked on. A very small
mutation rate may lead to genetic drift (which is non-ergodic in nature) or
premature convergence of the genetic algorithm in a local optimum. A mutation
rate that is too high may lead to loss of good solutions. There are theoretical but
not yet practical upper and lower bounds for these parameters that can help guide
selection.
• The implementation and evaluation of the fitness function is an important factor
in the speed and efficiency of the algorithm.

Variants
The simplest algorithm represents each chromosome as a bit string. Typically, numeric
parameters can be represented by integers, though it is possible to use floating point
representations. The basic algorithm performs crossover and mutation at the bit level.
Other variants treat the chromosome as a list of numbers which are indexes into an
instruction table, nodes in a linked list, hashes, objects, or any other imaginable data
structure. Crossover and mutation are performed so as to respect data element boundaries.
For most data types, specific variation operators can be designed. Different chromosomal
data types seem to work better or worse for different specific problem domains.

When bit-string representations of integers are used, Gray coding is often employed. In
this way, small changes in the integer can be readily effected through mutations or
crossovers. This has been found to help prevent premature convergence at so-called
Hamming walls, in which too many simultaneous mutations (or crossover events) must
occur in order to change the chromosome to a better solution.
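A short worked example makes the Hamming-wall argument concrete. The sketch below uses the standard reflected binary Gray code; the helper names are chosen for this illustration. Moving from 7 to 8 requires flipping four bits in plain binary but only one bit in Gray code.

```python
def binary_to_gray(n: int) -> int:
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Inverse of binary_to_gray."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


print(hamming(7, 8))                                   # 4: 0111 vs 1000 in plain binary
print(hamming(binary_to_gray(7), binary_to_gray(8)))   # 1: 0100 vs 1100 in Gray code
```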
Other approaches involve using arrays of real-valued numbers instead of bit strings to
represent chromosomes. Theoretically, the smaller the alphabet, the better the
performance, but paradoxically, good results have been obtained from using real-valued
chromosomes.

A slight, but very successful variant of the general process of constructing a new
population is to allow some of the better organisms from the current generation to carry
over to the next, unaltered. This strategy is known as elitist selection.
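Elitist selection can be sketched as a small replacement step. The function name and the `n_elite` parameter are assumptions for this example; the offspring are assumed to have been produced already by selection, crossover and mutation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def next_generation_with_elitism(population: Sequence[T],
                                 fitnesses: Sequence[float],
                                 offspring: Sequence[T],
                                 n_elite: int = 2) -> List[T]:
    """Carry the n_elite fittest individuals over unaltered, then fill with offspring."""
    ranked = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    elites = [population[i] for i in ranked[:n_elite]]
    return elites + list(offspring[: len(population) - n_elite])


print(next_generation_with_elitism(["a", "b", "c", "d"], [1, 9, 5, 3],
                                   ["w", "x", "y", "z"]))
# ['b', 'c', 'w', 'x']
```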

Parallel implementations of genetic algorithms come in two flavours. Coarse grained
parallel genetic algorithms assume a population on each of the computer nodes and
migration of individuals among the nodes. Fine grained parallel genetic algorithms
assume an individual on each processor node which acts with neighboring individuals for
selection and reproduction. Other variants, like genetic algorithms for online optimization
problems, introduce time-dependence or noise in the fitness function.

Problem domains
Problems which appear to be particularly appropriate for solution by genetic algorithms
include timetabling and scheduling problems, and many scheduling software packages
are based on GAs. GAs have also been applied to engineering. Genetic algorithms are
often applied as an approach to solve global optimization problems.

As a general rule of thumb genetic algorithms might be useful in problem domains that
have a complex fitness landscape as recombination is designed to move the population
away from local optima that a traditional hill climbing algorithm might get stuck in.

History
Computer simulations of evolution started with Nils Aall Barricelli in 1954. Barricelli
was simulating the evolution of automata that played a simple card game. Starting in
1957, the Australian quantitative geneticist Alex Fraser published a series of papers on
simulation of artificial selection of organisms with multiple loci controlling a measurable
trait. From these beginnings, computer simulation of evolution by biologists became
more common in the early 1960s, and the methods were described in books by Fraser and
Burnell (1970) and Crosby (1973). Many early papers are reprinted by Fogel (1998).

Although Barricelli had also used evolutionary simulation as a general optimization
method, genetic algorithms became a widely recognized optimization method as a result
of the work of John Holland in the early 1970s, and particularly his 1975 book. His work
originated with studies of cellular automata, conducted by Holland and his colleagues at
the University of Michigan. Research in GAs remained largely theoretical until the
mid-1980s, when the First International Conference on Genetic Algorithms was held in
Pittsburgh, Pennsylvania. As academic interest grew, the dramatic increase in desktop
computational power allowed for practical application of the new technique. In 1989, The
New York Times writer John Markoff wrote about Evolver, the first commercially
available desktop genetic algorithm. Custom computer applications began to emerge in a
wide variety of fields, and these algorithms are now used by a majority of Fortune 500
companies to solve difficult scheduling, data fitting, trend spotting and budgeting
problems, and virtually any other type of combinatorial optimization problem.

Related techniques
• Ant colony optimization (ACO) uses many ants (or agents) to traverse the
solution space and find locally productive areas. While usually inferior to genetic
algorithms and other forms of local search, it is able to produce results in
problems where no global or up-to-date perspective can be obtained, and thus the
other methods cannot be applied.

• The cross-entropy (CE) method generates candidate solutions via a
parameterized probability distribution. The parameters are updated
via cross-entropy minimization, so as to generate better samples in the next
iteration.

• Evolution strategies (ES) evolve linear individuals by means of mutation. ES
algorithms are designed particularly to solve problems in the real-value domain.

• Extremal optimization (EO): unlike GAs, which work with a population of
candidate solutions, EO evolves a single solution and makes local modifications
to the worst components. This requires that a suitable representation be selected
which permits individual solution components to be assigned a quality measure
("fitness"). The governing principle behind this algorithm is that of emergent
improvement through selectively removing low-quality components and replacing
them with a randomly selected component. This is decidedly at odds with a GA
that selects good solutions in an attempt to make better solutions.

• Genetic programming (GP) is a related technique popularized by John Koza in
which computer programs, rather than function parameters, are optimized.
Genetic programming often uses tree-based internal data structures to represent
the computer programs for adaptation instead of the list structures typical of
genetic algorithms.

• Interactive genetic algorithms (IGA) are genetic algorithms that use human
evaluation. They are usually applied to domains where it is hard to design a
computational fitness function, for example, evolving images, music, artistic
designs and forms to fit users' aesthetic preference.

• Memetic algorithm (MA), also called hybrid genetic algorithm among others, is a
relatively new evolutionary method where local search is applied during the
evolutionary cycle. The idea of memetic algorithms comes from memes, which,
unlike genes, can adapt themselves. In some problem areas they are shown to be
more efficient than traditional evolutionary algorithms.

• Simulated annealing (SA) is a related global optimization technique that traverses
the search space by testing random mutations on an individual solution. A
mutation that increases fitness is always accepted. A mutation that lowers fitness
is accepted probabilistically based on the difference in fitness and a decreasing
temperature parameter. In SA parlance, one speaks of seeking the lowest energy
instead of the maximum fitness. SA can also be used within a standard GA
by starting with a relatively high rate of mutation and decreasing it over
time along a given schedule.

• Stochastic optimization is an umbrella set of methods that includes GAs and
numerous other approaches.

• Tabu search (TS) is similar to simulated annealing in that both traverse the
solution space by testing mutations of an individual solution. While simulated
annealing generates only one mutated solution, tabu search generates many
mutated solutions and moves to the solution with the lowest energy of those
generated. In order to prevent cycling and encourage greater movement through
the solution space, a tabu list is maintained of partial or complete solutions. It is
forbidden to move to a solution that contains elements of the tabu list, which is
updated as the solution traverses the solution space.

Applications
• Artificial Creativity
• Automated design, including research on composite material design and multi-
objective design of automotive components for crashworthiness, weight savings,
and other characteristics.
• Automated design of mechatronic systems using bond graphs and genetic
programming (NSF).
• Automated design of industrial equipment using catalogs of exemplar lever
patterns.
• Calculation of Bound states and Local-density approximations.
• Chemical kinetics (gas and solid phases)
• Configuration applications, particularly physics applications of optimal molecule
configurations for particular systems like C60 (buckyballs).
• Container loading optimization.
• Code-breaking, using the GA to search large solution spaces of ciphers for the one
correct decryption.
• Design of water distribution systems.
• Distributed computer network topologies.
• Electronic circuit design, known as Evolvable hardware.
• File allocation for a distributed system.
• JGAP: Java Genetic Algorithms Package, also includes support for Genetic
Programming
• Parallelization of GAs/GPs including use of hierarchical decomposition of
problem domains and design spaces nesting of irregular shapes using feature
matching and GAs.
• Game Theory Equilibrium Resolution.
• Learning Robot behavior using Genetic Algorithms.
• Learning fuzzy rule base using genetic algorithms.
• Linguistic analysis, including Grammar Induction and other aspects of Natural
Language Processing (NLP) such as word sense disambiguation.
• Mobile communications infrastructure optimization.
• Molecular Structure Optimization (Chemistry).
• Multiple population topologies and interchange methodologies.
• Optimisation of data compression systems, for example using wavelets.
• Protein folding and protein/ligand docking.
• Plant floor layout.
• Representing rational agents in economic models such as the cobweb model.
• Scheduling applications, including job-shop scheduling. The objective being to
schedule jobs in a sequence dependent or non-sequence dependent setup
environment in order to maximize the volume of production while minimizing
penalties such as tardiness.
• Software engineering
• Solving the machine-component grouping problem required for cellular
manufacturing systems.
• Tactical asset allocation and international equity strategies.
• Timetabling problems, such as designing a non-conflicting class timetable for a
large university.
• Training artificial neural networks when pre-classified training examples are not
readily obtainable (neuroevolution).
• Traveling Salesman Problem.
• Robot learning, obstacle avoidance.

In genetic algorithms, mutation is a genetic operator used to maintain genetic diversity
from one generation of a population of chromosomes to the next. It is analogous to
biological mutation.

The classic example of a mutation operator involves a probability that an arbitrary bit in a
genetic sequence will be changed from its original state. A common method of
implementing the mutation operator involves generating a random variable for each bit in
a sequence. This random variable tells whether or not a particular bit will be modified.

The purpose of mutation in GAs is to allow the algorithm to avoid local minima by
preventing the population of chromosomes from becoming too similar to each other, which
would slow or even stop evolution. This reasoning also explains why most GA
systems avoid taking only the fittest of the population when generating the next generation,
and instead use a random (or semi-random) selection weighted toward those that are fitter.
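The classic per-bit mutation operator described above can be sketched as follows. The name `mutate` and the parameter `p_mut` are assumptions for this example; one random number is drawn for each bit and decides whether that bit is flipped.

```python
import random
from typing import List, Sequence


def mutate(bits: Sequence[int], p_mut: float, rng: random.Random) -> List[int]:
    """Flip each bit independently with probability p_mut."""
    return [b ^ 1 if rng.random() < p_mut else b for b in bits]


rng = random.Random(0)
chromosome = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
print(mutate(chromosome, p_mut=0.1, rng=rng))
```

With `p_mut=0` the chromosome is returned unchanged, and with `p_mut=1` every bit is flipped; practical values lie between these extremes and are usually small.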

In genetic algorithms, crossover is a genetic operator used to vary the programming of a
chromosome or chromosomes from one generation to the next. It is an analogy to
reproduction and biological crossover, upon which genetic algorithms are based.

Crossover techniques
Many crossover techniques exist for organisms which use different data structures to
store themselves.

One point crossover

A crossover point on the parent organism string is selected. All data beyond that point in
the organism string is swapped between the two parent organisms. The resulting
organisms are the children.

Two point crossover

Two point crossover calls for two points to be selected on the parent organism strings.
Everything between the two points is swapped between the parent organisms, rendering
two child organisms.
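The one-point and two-point schemes can be sketched as below. The function names are illustrative, the parents are assumed to be equal-length strings or lists, and the crossover points are assumed to fall strictly inside the string.

```python
import random


def one_point_crossover(a, b, rng: random.Random):
    """Swap everything beyond a single random crossover point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def two_point_crossover(a, b, rng: random.Random):
    """Swap everything between two distinct random crossover points."""
    i, j = sorted(rng.sample(range(1, len(a)), 2))
    return a[:i] + b[i:j] + a[j:], b[:i] + a[i:j] + b[j:]


rng = random.Random(0)
print(one_point_crossover("AAAAAAAA", "BBBBBBBB", rng))
print(two_point_crossover("AAAAAAAA", "BBBBBBBB", rng))
```

Using parents made of a single repeated letter makes it easy to see in the printed children which segments were exchanged.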

"Cut and splice"

Another crossover variant, the "cut and splice" approach, results in a change in length of
the children strings. The reason for this difference is that each parent string has a separate
choice of crossover point.
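A minimal sketch of cut and splice follows. The function name is an assumption, and allowing a cut at either end of a parent (position 0 or its full length) is a choice made here for simplicity rather than something the text specifies.

```python
import random


def cut_and_splice(a, b, rng: random.Random):
    """Each parent gets its own cut point, so the children may differ in length."""
    cut_a = rng.randint(0, len(a))
    cut_b = rng.randint(0, len(b))
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]


rng = random.Random(2)
child1, child2 = cut_and_splice("AAAAAAAA", "BBBBBBBB", rng)
print(child1, child2)
```

The combined length of the two children always equals the combined length of the parents, even though each child individually may be shorter or longer.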

Uniform Crossover and Half Uniform Crossover

In both of these schemes, the two parents are combined to produce two new offspring.

In the uniform crossover scheme (UX) individual bits in the string are compared between
two parents. The bits are swapped with a fixed probability, typically 0.5.

In the half uniform crossover scheme (HUX), exactly half of the nonmatching bits are
swapped. Thus first the Hamming distance (the number of differing bits) is calculated.
This number is divided by two. The resulting number is how many of the bits that do not
match between the two parents will be swapped.
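Both schemes can be sketched in a few lines. The function names and the `p_swap` parameter are assumptions for this example; in the HUX sketch the positions to swap are chosen at random among the differing ones, and an odd Hamming distance is rounded down when halved.

```python
import random
from typing import List, Sequence, Tuple


def uniform_crossover(a: Sequence[int], b: Sequence[int], rng: random.Random,
                      p_swap: float = 0.5) -> Tuple[List[int], List[int]]:
    """UX: swap each position between the parents with probability p_swap."""
    c1, c2 = list(a), list(b)
    for i in range(len(a)):
        if rng.random() < p_swap:
            c1[i], c2[i] = c2[i], c1[i]
    return c1, c2


def half_uniform_crossover(a: Sequence[int], b: Sequence[int],
                           rng: random.Random) -> Tuple[List[int], List[int]]:
    """HUX: swap exactly half of the positions in which the parents differ."""
    c1, c2 = list(a), list(b)
    differing = [i for i in range(len(a)) if a[i] != b[i]]  # Hamming distance = len(differing)
    for i in rng.sample(differing, len(differing) // 2):
        c1[i], c2[i] = c2[i], c1[i]
    return c1, c2


rng = random.Random(3)
p1 = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
p2 = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
print(uniform_crossover(p1, p2, rng))
print(half_uniform_crossover(p1, p2, rng))  # these parents differ in 10 bits, so 5 are swapped
```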

Crossover for Ordered Chromosomes

Depending on how the chromosome represents the solution, a direct swap may not be
possible.

One such case is when the chromosome is an ordered list, such as an ordered list of the
cities to be travelled for the traveling salesman problem. A crossover point is selected on
the parents. Since the chromosome is an ordered list, a direct swap would introduce
duplicates and remove necessary candidates from the list. Instead, the chromosome up to the
crossover point is retained for each parent. The information after the crossover point is
ordered as it is ordered in the other parent. For example, if our two parents are
ABCDEFGHI and IGAHFDBEC and our crossover point is after the fourth character, then the
resulting children would be ABCDIGHFE and IGAHBCDEF.
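The ordered-list crossover described above can be checked with a short sketch. The function name is an assumption; each child keeps the head of one parent and fills the remainder in the order in which the missing elements appear in the other parent.

```python
from typing import List, Sequence, Tuple


def ordered_crossover(p1: Sequence[str], p2: Sequence[str],
                      cut: int) -> Tuple[List[str], List[str]]:
    """Keep each parent's head up to `cut`; order the rest as in the other parent."""
    def make_child(keep_from: Sequence[str], order_from: Sequence[str]) -> List[str]:
        head = list(keep_from[:cut])
        tail = [gene for gene in order_from if gene not in head]
        return head + tail

    return make_child(p1, p2), make_child(p2, p1)


child1, child2 = ordered_crossover("ABCDEFGHI", "IGAHFDBEC", cut=4)
print("".join(child1))  # ABCDIGHFE
print("".join(child2))  # IGAHBCDEF
```

Each child is still a permutation of the original elements, so no city is duplicated or lost.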

Other possible methods include the edge recombination operator and partially mapped
crossover.

Crossover biases
For crossover operators which exchange contiguous sections of the chromosomes (e.g.
k-point), the ordering of the variables may become important. This is particularly true when
good solutions contain building blocks which might be disrupted by a non-respectful
crossover operator.

Consider the following two individuals with 11 binary variables each:

individual 1:   0 1 1 1 0 0 1 1 0 1 0
individual 2:   1 0 1 0 1 1 0 0 1 0 1

The chosen crossover position is:

crossover position:   5

After crossover the new individuals are created:

offspring 1:   0 1 1 1 0| 1 0 0 1 0 1
offspring 2:   1 0 1 0 1| 0 1 1 0 1 0

Fig. 4-6: Single-point crossover

In double-point crossover two crossover positions are selected uniformly at random and
the variables exchanged between the individuals between these points. Then two new
offspring are produced.

Single-point and double-point crossover are special cases of the general method multi-
point crossover.

For multi-point crossover, m crossover positions $k_i \in \{1, 2, \ldots, N_{var}-1\}$, $i = 1, \ldots, m$
(where $N_{var}$ is the number of variables of an individual), are chosen at random with no duplicates and
sorted into ascending order. Then, the variables between successive crossover points
are exchanged between the two parents to produce two new offspring. The section
between the first variable and the first crossover point is not exchanged between
individuals. Figure 4-7 illustrates this process.

Consider the following two individuals with 11 binary variables each:

individual 1:   0 1 1 1 0 0 1 1 0 1 0
individual 2:   1 0 1 0 1 1 0 0 1 0 1

The chosen crossover positions are:

cross pos. (m=3):   2 6 10

After crossover the new individuals are created:

offspring 1:   0 1| 1 0 1 1| 1 1 0 1| 1
offspring 2:   1 0| 1 1 0 0| 0 0 1 0| 0

Fig. 4-7: Multi-point crossover
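The multi-point procedure can be sketched so that the example individuals can be recomputed for comparison with the figures. The function name is an assumption, and the crossover positions are passed in explicitly rather than drawn at random so that the result is reproducible.

```python
from typing import List, Sequence, Tuple


def multi_point_crossover(a: Sequence[int], b: Sequence[int],
                          points: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Exchange alternate segments; the first segment is not exchanged.

    `points` are assumed to be distinct positions in 1..len(a)-1.
    """
    c1: List[int] = []
    c2: List[int] = []
    src1, src2 = a, b
    prev = 0
    for cut in sorted(points) + [len(a)]:
        c1 += src1[prev:cut]
        c2 += src2[prev:cut]
        src1, src2 = src2, src1  # switch donor parent at every crossover point
        prev = cut
    return c1, c2


ind1 = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
ind2 = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
print(multi_point_crossover(ind1, ind2, [5]))         # single-point case
print(multi_point_crossover(ind1, ind2, [2, 6, 10]))  # three crossover positions
```

For the positions 2, 6 and 10, the segment of offspring 1 between positions 6 and 10 comes from individual 1 (1 1 0 1), and the corresponding segment of offspring 2 comes from individual 2 (0 0 1 0).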

The idea behind multi-point, and indeed many of the variations on the crossover
operator, is that parts of the chromosome representation that contribute most to the
performance of a particular individual may not necessarily be contained in adjacent
substrings [Boo87]. Further, the disruptive nature of multi-point crossover appears to
encourage the exploration of the search space, rather than favouring the convergence to
highly fit individuals early in the search, thus making the search more robust [SDJ91b].

4.3.2 Uniform crossover

Single and multi-point crossover define cross points as places between loci where an
individual can be split. Uniform crossover [Sys89] generalizes this scheme to make
every locus a potential crossover point. A crossover mask, the same length as the
individual structure, is created at random, and the parity of the bits in the mask indicates
which parent will supply the offspring with which bits. This method is identical to
discrete recombination, see Section 4.1.

Consider the following two individuals with 11 binary variables each:

individual 1:   0 1 1 1 0 0 1 1 0 1 0
individual 2:   1 0 1 0 1 1 0 0 1 0 1

For each variable the parent who contributes its variable to the offspring is chosen
randomly with equal probability. Here, the offspring 1 is produced by taking the bit
from parent 1 if the corresponding mask bit is 1 or the bit from parent 2 if the
corresponding mask bit is 0. Offspring 2 is usually created using the inverse of the
mask. The two masks (samples) are:

sample 1:   0 1 1 0 0 0 1 1 0 1 0
sample 2:   1 0 0 1 1 1 0 0 1 0 1

After crossover the new individuals are created:

offspring 1:   1 1 1 0 1 1 1 1 1 1 1
offspring 2:   0 0 1 1 0 0 0 0 0 0 0

Uniform crossover, like multi-point crossover, has been claimed to reduce the bias
associated with the length of the binary representation used and the particular coding
for a given parameter set. This helps to overcome the bias in single-point crossover
towards short substrings without requiring precise understanding of the significance of
the individual bits in the individual's representation. [SDJ91a] demonstrated how
uniform crossover may be parameterized by applying a probability to the swapping of
bits. This extra parameter can be used to control the amount of disruption during
recombination without introducing a bias towards the length of the representation used.
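The mask-based scheme of this section can be sketched and checked against the example individuals. The function names are assumptions, and the parameter `p` of `random_mask` (the probability that a mask bit is 1) is only one possible way to parameterize the operator; it is not taken from [SDJ91a].

```python
import random
from typing import List, Sequence, Tuple


def random_mask(n: int, rng: random.Random, p: float = 0.5) -> List[int]:
    """Random crossover mask; p is the (assumed) probability of a 1 bit."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def mask_crossover(p1: Sequence[int], p2: Sequence[int],
                   mask: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Offspring 1 takes parent 1's bit where the mask is 1, else parent 2's bit.

    Offspring 2 is built the same way from the inverse mask.
    """
    inverse = [1 - m for m in mask]
    off1 = [x if m else y for x, y, m in zip(p1, p2, mask)]
    off2 = [x if m else y for x, y, m in zip(p1, p2, inverse)]
    return off1, off2


ind1 = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0]
ind2 = [1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
mask = [0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0]  # "sample 1" from the example
off1, off2 = mask_crossover(ind1, ind2, mask)
print(off1)  # [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
print(off2)  # [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(mask_crossover(ind1, ind2, random_mask(len(ind1), random.Random(0))))
```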
