
Genetic Algorithms Demystified

Unravel the Myths and Power of Genetic Algorithms in Machine Learning

Peter Leow
Copyright © 2016 Peter Leow
All Rights Reserved
To my wife and kids for their love and support
Table of Contents
Introduction

Applications

Minds-on
Core Concepts
Problem Modeling
Evolution Process
Stopping Criteria
Selection Methods
Fitness Proportionate Selection
Tournament Selection
Rank Selection
Crossover Operators
Single-point Crossover
Multi-point Crossover
Mutation Operators
Performance
When to Use
Caveats
Open Source Tools
Commercial Tools

Hands-on
Problem Description
Problem Model
Fitness Evaluation
Linear Rank Selection
Order Crossover
Mutation
Evolution Scheme
Observations
System Schema

Summary

About the Author

Other Books by Peter Leow


Introduction
Genetic Algorithms belong to a larger class of computer-based problem solving systems called
Evolutionary Algorithms, which use computational models that follow the principles of evolution and
heredity in their design and implementation. The other variants of Evolutionary Algorithms are
Evolutionary Programming and Evolution Strategies. Genetic Algorithms were developed by John
Holland, his colleagues, and his students at the University of Michigan in the early 1970s. Genetic Algorithms
deal with optimization problems. Inspired by Darwin's theory of evolution, Genetic Algorithms
employ a repeated process of selection, attrition, and cross-breeding of potential solutions in search
of an optimal solution to a problem.
Applications
The need for optimization is due in part to the scarcity of resources and in part to conflicting desires.
As a daily ritual, we would like to choose the commuting arrangement that enables us to reach our
destination with the cheapest fare, the shortest commuting time, and the greatest comfort. Is this
achievable? We know that the answer is "you cannot have your cake and eat it too". These desires
contradict one another and can only be reconciled through optimization. In this commuting case,
individual commuters have to search all possible combinations of commuting arrangements – taxis,
buses, trains, or a mix of them – then evaluate and compare them in order to reach an optimal solution.
This is optimization at work.
On a more serious note, Genetic Algorithms have been used in the following areas with varying
degrees of success:

1. Resource Allocation – Job shop scheduling, time-table planning, and the classical traveling salesperson problem.

2. Design Optimization – Network routing, satellite orbit selection.

3. Machine Learning – Optimize other machine learning systems, such as weights for neural networks, weights for case-based reasoning systems, and rules for classifier systems.
Minds-on
First, the theoretical aspect of genetic algorithms. I will step through the process of how genetic
algorithms deal with an optimization problem while introducing the core concepts of genetic algorithms
along the way.
Core Concepts
Problem Modeling

1. Define the problem requirements, that is, what we are optimizing for, such as achieving a balanced distribution of containers on a vessel.

2. Translate this problem into the genetic algorithms problem requirements, such as the assignment of containers to locations in the vessel.

3. Identify the information already present in the problem, such as container IDs, weights, and location IDs, as well as information that needs to be computed or derived, such as the center of gravity.

4. Determine the fitness function for evaluating the "goodness" of each candidate solution. For example, the lower the overall center of gravity of a feasible solution, the better.

5. Encode the solution in a form that is analogous to a biological chromosome or sequence of DNA. For example, an array of locations in the vessel with each location being filled with a container ID.

6. Determine the appropriate method to evolve feasible solutions. For example, permutation on the order of container IDs assigned to each location in the array as shown in Figure 1, or permutation on the location indices assigned to each container ID as shown in Figure 2.

Figure 1: Permutation by Container IDs


Figure 2: Permutation by Location Indices
Evolution Process

1. Start with an initial population of randomly generated candidate solutions, also known as chromosomes.

2. Evaluate each solution and assign it a measure of its fitness using the fitness function determined in the problem modeling stage.

3. Select high-fitness chromosomes from the population to form the mating pool for reproduction later on.

4. Combine the features of two of the chromosomes in the mating pool to reproduce two offspring in the hope of conserving and exploiting good genetic traits. This is known as Crossover.

5. Alter one or more components of a selected chromosome randomly to inject diversity into the population. This is known as Mutation. Mutation helps to break the dominance of the elitist group by giving the weaker individuals a chance, albeit a very slim one, to survive and be selected for subsequent evolution.

6. Replace older members of the population with their offspring, forming a new generation of the population.

7. Repeat the process of evaluation, selection, crossover, mutation, and replacement until some stopping criteria are met (a Java sketch of this loop follows).
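To make these steps concrete, below is a minimal, self-contained Java sketch of the evolution loop applied to a toy "OneMax" problem (evolve a binary chromosome towards all 1-bits). The class name, operator choices, and parameter values are illustrative assumptions of mine, not code taken from any particular library or from the project described later in this book.

import java.util.Arrays;
import java.util.Random;

// A minimal sketch of the evolution process on a toy "OneMax" problem:
// the fitness of a binary chromosome is simply its number of 1-bits.
public class OneMaxGA {
    static final int POP_SIZE = 50, GENES = 32, GENERATIONS = 200;
    static final double CROSSOVER_RATE = 0.7, MUTATION_RATE = 0.01;
    static final Random RNG = new Random();

    // Fitness function: count the 1-bits in the chromosome.
    static int fitness(int[] c) {
        return Arrays.stream(c).sum();
    }

    // Binary tournament selection (one of several possible selection methods).
    static int[] select(int[][] pop) {
        int[] a = pop[RNG.nextInt(pop.length)], b = pop[RNG.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;
    }

    public static void main(String[] args) {
        // Step 1: initial population of randomly generated candidate solutions.
        int[][] pop = new int[POP_SIZE][GENES];
        for (int[] c : pop)
            for (int i = 0; i < GENES; i++) c[i] = RNG.nextInt(2);

        for (int gen = 0; gen < GENERATIONS; gen++) {          // Step 7: stopping criterion
            int[][] next = new int[POP_SIZE][];
            for (int i = 0; i < POP_SIZE; i += 2) {
                // Steps 2-3: evaluate fitness and select two parents.
                int[] p1 = select(pop).clone(), p2 = select(pop).clone();
                // Step 4: single-point crossover, applied with some probability.
                if (RNG.nextDouble() < CROSSOVER_RATE) {
                    int point = RNG.nextInt(GENES);
                    for (int g = point; g < GENES; g++) {
                        int tmp = p1[g]; p1[g] = p2[g]; p2[g] = tmp;
                    }
                }
                // Step 5: bit-inversion mutation to inject diversity.
                for (int[] child : new int[][]{p1, p2})
                    for (int g = 0; g < GENES; g++)
                        if (RNG.nextDouble() < MUTATION_RATE) child[g] ^= 1;
                next[i] = p1;
                next[i + 1] = p2;
            }
            pop = next;                                        // Step 6: replacement
        }
        int best = Arrays.stream(pop).mapToInt(OneMaxGA::fitness).max().getAsInt();
        System.out.println("Best fitness after evolution: " + best + " / " + GENES);
    }
}

A typical run climbs quickly toward the maximum of 32 early on and then improves more slowly, mirroring the behaviour described later in the Observations section.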

Stopping Criteria

1. When a pre-determined number of iterations or generations is reached;

2. When a chromosome (solution) that meets or exceeds a target fitness value is found; or

3. When all the chromosomes in the population have attained a certain level of uniformity in terms of fitness.
Selection Methods
There are many methods to select mating chromosomes. The choice of selection method is problem
dependent and can greatly impact the optimization process and outcome. It may be decided after
comparing the outcomes from these methods. Here, we take a look at three of them, i.e. Fitness
Proportionate Selection, Tournament Selection, and Rank Selection.

Fitness Proportionate Selection

1. Calculate the fitness value for each chromosome, e.g. fi for the ith chromosome.

2. Find the total fitness of the whole population, F.

3. Calculate the probability of selection for each chromosome, pi = fi / F.

4. Calculate the cumulative probability for each chromosome, qi = p1 + p2 + … + pi.

5. Generate a random number r in the range [0, 1].

6. If r is less than q1, which is the cumulative probability of the first chromosome, then select the first chromosome; otherwise select the ith chromosome where qi-1 < r ≤ qi.

If a few chromosomes possess overly large fitness values compared to the majority, they will be
selected too often. The consequence is too-quick, premature convergence, resulting in a suboptimal
solution.
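The selection steps above can be expressed as a short Java method. This is a minimal sketch of my own, assuming the fitness values are positive, already computed, and that higher fitness is better; the class and method names are hypothetical.

import java.util.Random;

// A minimal sketch of fitness proportionate (roulette wheel) selection.
// Assumes positive fitness values and that higher fitness is better.
public class RouletteWheelSelection {
    static final Random RNG = new Random();

    // Returns the index of the selected chromosome.
    static int select(double[] fitness) {
        double total = 0;                          // Step 2: total fitness F
        for (double f : fitness) total += f;

        double r = RNG.nextDouble();               // Step 5: random number in [0, 1]
        double cumulative = 0;                     // Steps 3-4: accumulate pi = fi / F into qi
        for (int i = 0; i < fitness.length; i++) {
            cumulative += fitness[i] / total;
            if (r < cumulative) return i;          // Step 6: first chromosome with r < qi
        }
        return fitness.length - 1;                 // guard against floating-point rounding
    }

    public static void main(String[] args) {
        double[] fitness = {10, 40, 30, 20};       // hypothetical fitness values
        int[] counts = new int[fitness.length];
        for (int trial = 0; trial < 10000; trial++) counts[select(fitness)]++;
        // Expect roughly 1000, 4000, 3000 and 2000 selections respectively.
        System.out.println(java.util.Arrays.toString(counts));
    }
}

Running the main method shows the fitter chromosomes being picked proportionally more often, and hints at the dominance problem mentioned above when one fitness value dwarfs the rest.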
Tournament Selection

1. Select two chromosomes from the population.

2. Generate a random number r in the range [0, 1].

3. If r is less than a pre-determined number T (called the tournament size), then select the fitter of the two chromosomes; otherwise select the weaker one.

4. The two chromosomes are then returned to the original population and can be selected again.

This type of selection tends to favour the fitter chromosomes as the tournament size increases.
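Below is a minimal Java sketch of this two-chromosome tournament, using the threshold T from step 3; the fitness array and its values are hypothetical and assume higher fitness is better.

import java.util.Random;

// A minimal sketch of tournament selection between two randomly picked chromosomes.
public class TournamentSelection {
    static final Random RNG = new Random();

    // fitness holds the fitness of each chromosome; t is the threshold T from step 3.
    static int select(double[] fitness, double t) {
        int a = RNG.nextInt(fitness.length);       // Step 1: pick two chromosomes at random
        int b = RNG.nextInt(fitness.length);
        double r = RNG.nextDouble();               // Step 2: random number in [0, 1]
        int fitter = fitness[a] >= fitness[b] ? a : b;
        int weaker = fitness[a] >= fitness[b] ? b : a;
        // Step 3: usually the fitter one wins; occasionally the weaker one survives.
        // Step 4: nothing is removed, so both remain eligible for future tournaments.
        return r < t ? fitter : weaker;
    }

    public static void main(String[] args) {
        double[] fitness = {5, 1, 9, 3};           // hypothetical fitness values
        System.out.println("Selected index: " + select(fitness, 0.8));
    }
}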
Rank Selection

1. Rank the chromosomes in the population in ascending order according to their fitness values. The weakest one gets a ranking of 1, the next weakest 2, and so on.

2. Use the ranking instead of the fitness value to calculate the probability of selection for each chromosome.

This method prevents the too-quick convergence that can occur with the fitness proportionate selection method, but at the expense of slower convergence.
Crossover Operators
Crossover and mutation are two core operators of genetic algorithms. Similar to selection methods,
there are many ways to perform crossover and mutation. Here we discuss the crossover operator first.
In crossover, segments of two mating chromosomes are randomly chosen and swapped to produce
two offspring. There are many crossover operators. Here, we are looking at two of them which are
the Single-point Crossover and Multi-point Crossover.
Single-point Crossover
Refer to Figure 3 below:

Figure 3: Single-point Crossover

1. Select two parent chromosomes, e.g. parent 1 and parent 2, for reproduction.

2. Select the position of the crossover point randomly.

3. Copy the genes on the left side of the crossover point of parent 1's chromosome to child 1.

4. Copy the genes on the right side of the crossover point of parent 2's chromosome to child 1.

5. Child 1 is born.

6. Swap the roles of the two parents and repeat steps 3 and 4 to produce child 2 (see the sketch below).
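As a minimal illustration, the following Java sketch performs the single-point crossover described above on two integer-encoded chromosomes; the class name and sample gene values are my own and purely illustrative.

import java.util.Arrays;
import java.util.Random;

// A minimal sketch of single-point crossover producing two children.
public class SinglePointCrossover {
    static final Random RNG = new Random();

    static int[][] crossover(int[] parent1, int[] parent2) {
        int point = 1 + RNG.nextInt(parent1.length - 1);     // Step 2: random cut point, not at an end
        int[] child1 = new int[parent1.length];
        int[] child2 = new int[parent1.length];
        for (int i = 0; i < parent1.length; i++) {
            // Steps 3-4: left of the point from one parent, right of it from the other.
            child1[i] = (i < point) ? parent1[i] : parent2[i];
            // Step 6: roles of the parents swapped for the second child.
            child2[i] = (i < point) ? parent2[i] : parent1[i];
        }
        return new int[][]{child1, child2};                   // Step 5: both children born
    }

    public static void main(String[] args) {
        int[][] kids = crossover(new int[]{1, 1, 1, 1, 1, 1}, new int[]{0, 0, 0, 0, 0, 0});
        System.out.println(Arrays.toString(kids[0]) + " " + Arrays.toString(kids[1]));
    }
}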

Multi-point Crossover
Refer to Figure 4 below:

Figure 4: Multi-point Crossover

1. Select two parent chromosomes, e.g. parent 1 and parent 2, for reproduction.

2. Select the positions of two (or more) crossover points randomly.

3. Copy the genes outside of the two crossover points of parent 1's chromosome to child 1.

4. Copy the genes in between the two crossover points of parent 2's chromosome to child 1.

5. Child 1 is born.

6. Swap the roles of the two parents and repeat steps 3 and 4 to produce child 2.

Not all chosen chromosome pairs will undergo crossover. We introduce a new parameter called the Probability of Crossover, pc. Before a crossover takes place, we generate a random number in the range [0, 1]; if this random number is less than the probability of crossover, then the crossover takes place, otherwise it is aborted. This gives an expected number of chromosomes undergoing crossover of pc multiplied by the population size. The probability of crossover is also known as the crossover rate.
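The crossover-rate check and a two-point crossover can be sketched together in Java as follows; the crossover rate of 0.7 is only an illustrative value, not a recommendation from this book.

import java.util.Arrays;
import java.util.Random;

// A minimal sketch of two-point crossover guarded by the crossover rate pc.
// If the random draw is not below pc, the parents pass through unchanged.
public class TwoPointCrossover {
    static final Random RNG = new Random();
    static final double CROSSOVER_RATE = 0.7;                 // pc, an illustrative value

    static int[][] crossover(int[] p1, int[] p2) {
        if (RNG.nextDouble() >= CROSSOVER_RATE) {
            return new int[][]{p1.clone(), p2.clone()};       // crossover aborted
        }
        int a = RNG.nextInt(p1.length), b = RNG.nextInt(p1.length);
        int lo = Math.min(a, b), hi = Math.max(a, b);         // Step 2: two random cut points
        int[] c1 = p1.clone(), c2 = p2.clone();               // Step 3: genes outside the cuts kept
        for (int i = lo; i < hi; i++) {                       // Step 4: middle segment swapped
            c1[i] = p2[i];
            c2[i] = p1[i];
        }
        return new int[][]{c1, c2};                           // Steps 5-6: both children produced
    }

    public static void main(String[] args) {
        int[][] kids = crossover(new int[]{1, 1, 1, 1, 1, 1}, new int[]{0, 0, 0, 0, 0, 0});
        System.out.println(Arrays.toString(kids[0]) + " " + Arrays.toString(kids[1]));
    }
}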
Mutation Operators
Mutation replaces the values of some randomly chosen genes of a chromosome with some arbitrary new
values. One of the popular ways to mutate a binary chromosome is Bit Inversion, which simply
flips the values of randomly chosen bits from 1 to 0 or vice versa.
Similar to crossover, mutation does not always take place. It depends on a parameter
called the Probability of Mutation, pm. Before a mutation takes place, we generate a random number
in the range [0, 1]; if this random number is less than the probability of mutation, then the mutation
takes place, otherwise it is aborted. The probability of mutation is also known as the mutation rate.
The objective of mutation is to inject diversity into the population, prompting the genetic algorithms
to explore new solutions and thus lowering the risk of being trapped in a local optimum, as illustrated
in Figure 5.

Figure 5: Overcome Local Optimum through Mutation
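A minimal Java sketch of bit-inversion mutation with a mutation rate pm follows; the 1% rate is an illustrative assumption rather than a recommended setting.

import java.util.Arrays;
import java.util.Random;

// A minimal sketch of bit-inversion mutation applied gene by gene with a mutation rate pm.
public class BitInversionMutation {
    static final Random RNG = new Random();
    static final double MUTATION_RATE = 0.01;     // pm, an illustrative value

    static int[] mutate(int[] chromosome) {
        int[] mutated = chromosome.clone();
        for (int i = 0; i < mutated.length; i++) {
            if (RNG.nextDouble() < MUTATION_RATE) {
                mutated[i] ^= 1;                  // flip the bit: 0 becomes 1, 1 becomes 0
            }
        }
        return mutated;
    }

    public static void main(String[] args) {
        int[] chromosome = {1, 0, 1, 1, 0, 0, 1, 0};
        System.out.println(Arrays.toString(mutate(chromosome)));
    }
}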


Performance
The performance of genetic algorithms depends on a number of factors such as:

1. The encoding method.

2. The selection method.

3. The crossover operators.

4. The mutation operators.

5. The parameter settings, i.e. population size, crossover rate, and mutation rate.
When to Use
Genetic algorithms can be applied to solve problems in the following situations:

1. There are no known ways to reasonably solve the problem;

2. It is impossible to enumerate all possible solutions, e.g. for NP-complete problems; but

3. If presented with alternatives, we can tell the good solutions from the bad ones.
Caveats
If genetic algorithms can converge to an optimal solution, it is equally possible for them to converge to a
poor one. If that occurs, it is most probably owing to poor problem modeling, premature
convergence, a poor fitness function, poor parameter settings, or simply bad luck with the random
numbers.
Do we know whether the final solution generated by genetic algorithms is optimal or anywhere near
it? The answer is that we never know. If we knew how to find the optimal solution, we would not need
to use genetic algorithms in the first place. Bear in mind that the problems genetic algorithms are
asked to solve are typically NP-complete, i.e. very hard problems.
In other words, there is no guarantee that genetic algorithms will find an optimal solution.
Open Source Tools

C++
- GALib – A C++ library of genetic algorithm components.
- The Genetic Algorithm Utility Library (GAUL) – A flexible programming library designed to aid in the development of applications that use genetic algorithms.

Java
- Java Genetic Algorithms Package (JGAP) – A Java framework of genetic algorithms and genetic programming components.
- Java API for Genetic Algorithms (JAGA) – An extensible and pluggable Java API for implementing genetic algorithms and genetic programming applications.

Python
- Pyevolve – A complete Python genetic algorithm framework.

Perl
- AI::Genetic – A pure Perl genetic algorithm implementation.
Commercial Tools
Some commercial genetic algorithm tools include:

1. Generator by New Light Industries.

2. GTO for BrainMaker Professional by California Scientific.

3. MATLAB by MathWorks.
Hands-on
Enough talking; let's put the theory into practice. We shall explore the design and implementation of
a genetic algorithms project to optimize a load distribution problem. The project is code-named
GALoadDistriMiser. The project was implemented in Java, and its execution can be viewed
on YouTube by clicking on Figure 6 below:

Figure 6: Click to View GALoadDistriMiser on YouTube


Problem Description
The project attempts to optimize the load distribution of 64 packages on a vessel. It is assumed that
the holding space for the packages is divided into 64 rectangular spaces as shown in Figure 7.

Figure 7: Stacks of Packages


The problem requirements state that:

1. The distribution of the weights of the packages should be more or less uniform; and

2. It is also necessary to ensure that the lighter packages are placed on top of the heavier ones.
Translating these into genetic algorithms objectives, we have:

1. Optimize the weight distribution horizontally to balance the vessel; and

2. Optimize the weight distribution vertically to maintain a low center of gravity.
To make the project more challenging, I have added two more requirements:

3. Optimize the unloading operation at each port of call, i.e. the unloading and reloading of obstructing packages have to be minimized at each port; and

4. Meet objectives 1 and 2 as much as possible after unloading at each port of call so that no unnecessary reshuffling is needed. (Assuming no loading of new packages at each destination port.)
Problem Model
The following information is provided:

1. Package IDs;

2. Location IDs; and

3. The weight of each of the 64 packages.
The GALoadDistriMiser problem is represented in a genetic algorithm model as shown in Figure 8:
Figure 8: Genetic Algorithm Model

1. In the phenotypic space (i.e. the physical world), the candidate solutions are manifested as different configurations of stacks of 64 packages.

2. In the genotypic space (i.e. the genetic algorithms space), each candidate solution is represented as a chromosome of 64 genes identified by their package IDs.

3. Candidate solutions can be generated by permuting the 64 package IDs, as sketched after this list.

4. On completion of the genetic algorithms evolution, the best solution found is mapped back to the phenotypic space.
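As a rough illustration of this genotype, a chromosome can be held as a permutation of the 64 package IDs, with a gene's position mapping to a location on the vessel; shuffling the array yields a random candidate solution. This is my own sketch of the idea (the actual project uses the JGAP library, as described later), and the index-to-location mapping shown in the comments is an assumption.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A rough sketch of the genotype: a chromosome is a permutation of the 64 package IDs,
// where a gene's index maps to a location on the vessel (assumed here to be
// index = stack * 4 + level). Not taken from the actual GALoadDistriMiser code.
public class LoadChromosome {

    static List<Integer> randomChromosome() {
        List<Integer> genes = new ArrayList<>();
        for (int packageId = 1; packageId <= 64; packageId++) {
            genes.add(packageId);
        }
        Collections.shuffle(genes);               // a random candidate solution
        return genes;
    }

    public static void main(String[] args) {
        List<Integer> chromosome = randomChromosome();
        // Under the assumed mapping, the first four genes form the packages of stack 0.
        System.out.println("Stack 0 (top to bottom): " + chromosome.subList(0, 4));
    }
}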
Fitness Evaluation
To evaluate the quality of each candidate solution, or chromosome, in meeting the four optimization
objectives, I have devised a separate evaluation function for each objective. The overall fitness
is the sum of these four evaluation functions. Here, the smaller the overall fitness value, the better the
solution.

Fitness Function for Objective 1:
Calculate the sum of the deviations of the average weight of each vertical stack (four packages per stack) from the overall average weight of the 64 packages, i.e.
F0 = ∑ (average weight of each stack of 4 packages – average weight of all 64 packages)

Fitness Function for Objective 2:
Two options are available:
- Impose a penalty when a heavier package is placed above a lighter one, i.e. PENALTY_WEIGHT += 1; or
- Implement a repair method to sort the packages in each stack so that they are arranged in weight-ascending order from the top down. If this option is chosen, then PENALTY_WEIGHT = 0.

Fitness Function for Objective 3:
Impose a penalty when a package due for a later port is placed above one due for an earlier port, i.e. PENALTY_PORT += 1.

Fitness Function for Objective 4:
After unloading at each port, calculate the sum of the deviations of the average weight of each stack from the overall average weight of the remaining packages, i.e. at the nth port,
Fn = ∑ (average weight of each stack – average weight of remaining packages)

Finally, the overall fitness of a candidate solution is:

F0 + PENALTY_WEIGHT + PENALTY_PORT + ∑ Fn

The smaller the overall fitness value, the better the candidate solution.
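To show how such an evaluation might look in code, here is a simplified Java sketch of the F0 term (objective 1) and the PENALTY_WEIGHT term (objective 2) for one chromosome. It assumes 16 stacks of 4 packages, genes listed top-down per stack, a weight lookup by package ID, and absolute deviations (since signed deviations would cancel out); it is not the actual GALoadDistriMiser code, and the PENALTY_PORT and per-port Fn terms are omitted.

// A simplified sketch of computing F0 (objective 1) and PENALTY_WEIGHT (objective 2).
// Assumes 16 stacks of 4 packages, genes listed top-down per stack, and a weight
// lookup indexed by package ID; absolute deviations are used for F0.
public class FitnessSketch {
    static final int STACKS = 16, PER_STACK = 4;

    static double evaluate(int[] genes, double[] weightById) {
        double totalWeight = 0;
        for (int id : genes) totalWeight += weightById[id];
        double overallAverage = totalWeight / genes.length;

        double f0 = 0;              // sum of each stack's deviation from the overall average
        int penaltyWeight = 0;      // +1 whenever a heavier package sits above a lighter one

        for (int s = 0; s < STACKS; s++) {
            double stackWeight = 0;
            for (int level = 0; level < PER_STACK; level++) {
                double w = weightById[genes[s * PER_STACK + level]];
                stackWeight += w;
                if (level > 0 && weightById[genes[s * PER_STACK + level - 1]] > w) {
                    penaltyWeight++;   // the package above is heavier than the one below it
                }
            }
            f0 += Math.abs(stackWeight / PER_STACK - overallAverage);
        }
        // Smaller is better; PENALTY_PORT and the per-port Fn terms would be added here too.
        return f0 + penaltyWeight;
    }

    public static void main(String[] args) {
        int[] genes = new int[64];
        double[] weightById = new double[65];      // hypothetical weights, IDs 1..64
        for (int i = 0; i < 64; i++) { genes[i] = i + 1; weightById[i + 1] = 100 + i; }
        System.out.println("Partial fitness: " + evaluate(genes, weightById));
    }
}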
Linear Rank Selection
I have used a selection method called Linear Rank Selection to select the mating chromosomes for
reproduction. It goes like this:

1. Each chromosome in the population is ranked in increasing order of its fitness from 1 to N, where N is the population size and Ri is the ranking of the ith chromosome;

2. Each chromosome is assigned a new fitness value wi = (N - Ri + 1), and these are accumulated into the cumulative fitness qi = w1 + w2 + … + wi;

3. Generate a random number r between 1 and qN (the sum of all the new fitness values);

4. If r is less than or equal to q1, then select the first chromosome; otherwise select the ith chromosome where qi-1 < r ≤ qi (a sketch follows).
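Here is a minimal Java sketch of this linear rank selection, under the assumption that smaller fitness values are better (as in this project) so that rank 1 goes to the best chromosome and receives the largest weight; the class and fitness values are illustrative, not the project's actual code.

import java.util.Arrays;
import java.util.Random;

// A minimal sketch of linear rank selection for a minimisation problem:
// the chromosome ranked Ri receives the weight N - Ri + 1.
public class LinearRankSelection {
    static final Random RNG = new Random();

    // Returns the index of the selected chromosome.
    static int select(double[] fitness) {
        int n = fitness.length;
        // Step 1: order chromosome indices by increasing fitness value (rank 1 first).
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(fitness[a], fitness[b]));

        // Step 2: the weight of rank Ri is N - Ri + 1, accumulated into qi.
        long total = (long) n * (n + 1) / 2;               // qN = N + (N - 1) + ... + 1
        long r = 1 + (long) (RNG.nextDouble() * total);    // Step 3: random number in [1, qN]
        long cumulative = 0;
        for (int pos = 0; pos < n; pos++) {
            cumulative += n - pos;                         // weight of the chromosome ranked pos + 1
            if (r <= cumulative) return order[pos];        // Step 4: first qi that reaches r
        }
        return order[n - 1];                               // guard, should not normally be reached
    }

    public static void main(String[] args) {
        double[] fitness = {42.0, 7.5, 19.2, 3.1};         // hypothetical values, smaller is better
        System.out.println("Selected index: " + select(fitness));
    }
}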

Order Crossover
I have used a crossover operator called Order Crossover to breed offspring from two
mating chromosomes. It works like this:

1. Select two parent chromosomes P1 and P2, each with two cut-points | as shown:
P1: (J K L | M N O | Z Q R)
P2: (Q J N | L K Z | R M O)

2. Create a premature child C1 by copying the elements of P1's mid-section, i.e.
C1: (_ _ _ | M N O | _ _ _)

3. Starting from the second cut-point of P2, i.e. R M O, copy only R to C1, as M and O already exist in C1, i.e.
C1: (_ _ _ | M N O | R _ _)

4. Starting from the left side of P2, copy those elements that are not already in C1, i.e. Q and J, to fill the vacancies after R in C1, i.e.
C1: (_ _ _ | M N O | R Q J)

5. Continuing from the left side of P2, copy the elements that are not already in C1, i.e. L, K, and Z, to fill the remaining vacancies in C1, i.e.
C1: (L K Z | M N O | R Q J)

6. Child 1 has been delivered.

7. Another child can similarly be bred by switching the roles of P1 and P2.

The order crossover operator creates children that preserve the order and position of the elements inherited from one parent. It also preserves the relative order of the remaining elements inherited from the second parent.
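The worked example above can be reproduced with the following Java sketch of the order crossover operator; the cut points are fixed to match the example so the result is easy to verify, and the class and helper names are my own.

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// A minimal sketch of order crossover (OX) reproducing the worked example with letters;
// in GALoadDistriMiser the genes would be package IDs instead.
public class OrderCrossoverSketch {

    static String[] orderCrossover(String[] p1, String[] p2, int cut1, int cut2) {
        int n = p1.length;
        String[] child = new String[n];
        Set<String> used = new HashSet<>();

        // Step 2: copy the mid-section of parent 1 into the child, preserving positions.
        for (int i = cut1; i < cut2; i++) {
            child[i] = p1[i];
            used.add(p1[i]);
        }
        // Steps 3-5: walk parent 2 starting from the second cut-point, skip genes already
        // in the child, and fill the remaining vacancies in the same circular order.
        int fill = cut2 % n;
        for (int k = 0; k < n; k++) {
            String gene = p2[(cut2 + k) % n];
            if (!used.contains(gene)) {
                child[fill] = gene;
                used.add(gene);
                fill = (fill + 1) % n;
            }
        }
        return child;
    }

    public static void main(String[] args) {
        String[] p1 = {"J", "K", "L", "M", "N", "O", "Z", "Q", "R"};
        String[] p2 = {"Q", "J", "N", "L", "K", "Z", "R", "M", "O"};
        // Cut points chosen so the mid-section spans indices 3..5, matching the example.
        System.out.println(Arrays.toString(orderCrossover(p1, p2, 3, 6)));
        // Expected: [L, K, Z, M, N, O, R, Q, J], the child C1 derived above.
        System.out.println(Arrays.toString(orderCrossover(p2, p1, 3, 6)));
        // Swapping the parents' roles produces the second child.
    }
}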
Mutation
Mutation in this project simply swaps two randomly selected genes of a chromosome, as sketched below.
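A minimal Java sketch of this swap mutation follows; note that swapping two genes keeps the chromosome a valid permutation of the 64 package IDs, which is why it suits this encoding.

import java.util.Arrays;
import java.util.Random;

// A minimal sketch of swap mutation: two randomly chosen genes exchange places,
// so a permutation chromosome remains a valid permutation after mutation.
public class SwapMutation {
    static final Random RNG = new Random();

    static void mutate(int[] chromosome) {
        int i = RNG.nextInt(chromosome.length);
        int j = RNG.nextInt(chromosome.length);
        int tmp = chromosome[i];
        chromosome[i] = chromosome[j];
        chromosome[j] = tmp;
    }

    public static void main(String[] args) {
        int[] chromosome = {1, 2, 3, 4, 5, 6, 7, 8};   // a small permutation for illustration
        mutate(chromosome);
        System.out.println(Arrays.toString(chromosome));
    }
}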
Evolution Scheme
The optimization process of the GALoadDistriMiser was designed and coded according to the
following scheme:

BEGIN
INITIALIZE
POPULATION (of randomly generated candidate solutions)
POPULATION SIZE (N)
CROSSOVER RATE (CR)
MUTATION RATE (MR)
NUMBER OF EVOLUTIONS (STOPPING CRITERION)

DO WHILE (STOPPING CRITERION is not met)

INTERIM POPULATION = NULL


INTERIM POPULATION SIZE (I) = 0

COMPUTE fitness of each candidate

DO WHILE (I < N)

SELECT 2 parents randomly using Linear Rank Selection method

GENERATE a random number C

IF C < CR THEN
BREED 2 offspring using Order Crossover operator
ADD the offspring to INTERIM POPULATION
ELSE
ADD the 2 parents to INTERIM POPULATION
END IF

I=I+2

END WHILE

FOR each member in INTERIM POPULATION

GENERATE a random number M

IF M < MR THEN
MUTATE this member by randomly swapping 2 genes
END IF

END FOR

IF (REPAIR CENTRE OF GRAVITY option is selected)


SORT each stack so that heavier packages are placed below the lighter ones
END IF

IF (ADAPTIVE GA option is selected AND the best fitness value remains the same for 10,000
generations)
CR = CR * 1.10 (CR capped at 0.7)
MR = MR * 1.10 (MR capped at 0.1)
END IF

REPLACE POPULATION with INTERIM POPULATION

END WHILE

END

As shown in the scheme, I have added two options to change the behavior of the evolution process.
One of them is a repair option to repair solutions by sorting each stack so that heavier packages are
placed below the lighter ones. Another one is an adaptive option to increase the crossover rate and
mutation rate when the process becomes stagnant over 10,000 generations.
Observations
I have conducted many runs of GALoadDistriMiser under different parameter settings and options in
an effort to understand the behaviour and performance of genetic algorithms. Some of the results are
shown in Figure 9.

Figure 9: Performance of GALoadDistriMiser


The main observations are as follows:

1. The genetic algorithms evolution process never follows a fixed path on each re-run because of its stochastic nature. However, every well-designed genetic algorithms model behaves similarly: it generally improves faster at the initial stage, when population diversity is widespread, but gradually slows down and becomes sluggish at the later stage, when the population is dominated by near-homogeneous individuals.

2. Owing to the multi-modality and non-continuity of the search space, a better solution could lie just next to an infeasible one. Repairing an infeasible solution, where possible, could therefore lead to the discovery of the next best solution and thus reduce the search effort.

3. A static configuration of genetic algorithms parameters that works very well at the initial stage of evolution will generally be of little use once the population reaches a certain local optimum. This results in the genetic algorithms hitting a bottleneck after some time. One way to enable the genetic algorithms to break out of the local optimum and explore new search space is to allow them to change the control parameters dynamically based on some criteria, for example by increasing the crossover and mutation rates when there is no significant improvement in the fitness value over, say, 10,000 generations.
System Schema
GALoadDistriMiser was developed in Java and integrated with the following components as shown
in Figure 10:

1. JGAP (Java Genetic Algorithms Package) was included as the genetic algorithms library.

2. Java Swing provides the GUI for user interaction and for displaying the genetic algorithms' progress.

3. MS Access was used for storing the genetic algorithms parameters, evolution log, and solutions.

4. MS Excel was used to extract the evolution log data from the MS Access database and chart graphs to aid in visual analysis, one of which you have seen in Figure 9.

Figure 10: System Schema of GALoadDistriMiser


Summary
With minds-on followed by hands-on, you have learned the fundamental concepts of genetic
algorithms in theory and walked through the design and implementation of a genetic algorithms project
called GALoadDistriMiser.
About the Author

Peter Leow is a software engineer and systems analyst with more than 15 years of software
development and teaching experience in open source as well as proprietary technologies. Besides
software development, artificial intelligence is another field in which he has a great interest. An avid
pursuer of knowledge and advocate of lifelong learning, he has published many well-received
articles at CodeProject.com and contributed solutions to its coding and discussion forums. For his
efforts, he was awarded the Code Project MVP (Most Valuable Professional) in 2015 and 2016
consecutively.
Learn more about Peter Leow’s reputation at CodeProject.com.
Check out Peter Leow's publications at Amazon Author Central.
Read the many articles by Peter Leow at PeterLeowBlog.com.
Last but not least, follow me on twitter @peterleowblog.
Other Books by Peter Leow

1. Handling Input and Storage on Android

2. Self-Organizing Map Demystified

3. From Indifferent to Responsive Web Design
