Genetic Algorithm Example
Introduction
A few days back, I started working on a practice problem, Big Mart Sales.
After applying some simple models and doing some feature engineering, I
landed at 219th position on the leaderboard.
Yes, a jump from 219th to 15th position just on the basis of a genetic
algorithm. Isn’t that great? By the end of this article, you will be
comfortable applying genetic algorithms and can expect similar benefits on
the problems you are working on.
It is not the strongest of the species that survives, nor the most intelligent,
but the one most responsive to change.
You must be wondering what this quote has to do with genetic algorithms.
Actually, the entire concept of a genetic algorithm is based on the above line.
Let’s take a hypothetical situation where you are the head of a country and, in
order to keep your country safe from bad things, you implement a policy like this.
You select all the good people and ask them to extend their generation by
having children.
This repeats for a few generations.
You will notice that now you have an entire population of good people.
Now, that may not be entirely possible, but this example was just to help you
understand the concept. So the basic idea was that we changed the input
(i.e. population) such that we get better output (i.e. better country).
Now, I suppose you have got some intuition that the concept of a genetic
algorithm is somewhat related to biology. So let us quickly review a few
concepts, so that we can draw a parallel between them.
2. Biological Inspiration
I wanted you to recall these basic concepts of biology before going further.
Let’s get back and understand what a genetic algorithm actually is.
Let’s get back to the example we discussed above and summarize what we
did.
This is how a genetic algorithm actually works: it tries to mimic
human evolution to some extent.
If you haven’t come across the knapsack problem, let me introduce my version
of it.
Let’s say you are going to spend a month in the wilderness. The only thing you
are carrying is a backpack which can hold a maximum weight of 30 kg.
Now you have different survival items, each having its own “Survival Points”
(which are given for each item in the table). So, your objective is to maximise
the survival points without exceeding that weight limit.
4.1 Initialisation
To solve this problem using a genetic algorithm, our first step is to define
our population. Our population will contain individuals, each having their
own set of chromosomes.
Chromosomes are binary strings. For this problem, a 1 at a given position
means that the corresponding item is taken, and a 0 means that it is dropped.
This set of chromosomes is considered our initial population.
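As a rough sketch of this step, the snippet below builds a random initial population of binary chromosomes. The function names, the population size of 4 and the item count of 6 are my own illustrative assumptions, not values fixed by the problem.

```python
import random

def random_chromosome(n_items, rng):
    """One individual: a list of 0/1 genes, one gene per item."""
    return [rng.randint(0, 1) for _ in range(n_items)]

def initial_population(pop_size, n_items, seed=None):
    """Create pop_size random chromosomes (hypothetical helper)."""
    rng = random.Random(seed)
    return [random_chromosome(n_items, rng) for _ in range(pop_size)]

# Assumed sizes, purely for illustration: 4 individuals, 6 items.
population = initial_population(pop_size=4, n_items=6, seed=0)
for chromosome in population:
    print(chromosome)
```

Each printed row is one candidate packing of the backpack.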
So, for this problem, a chromosome is considered fitter when it
contains more survival points.
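Here is a minimal sketch of such a fitness function. The item names, weights and survival points are made-up placeholders (the real values belong to the item table), and scoring an overweight backpack as 0 is one simple assumption about how to handle the 30 kg limit.

```python
# Hypothetical items: (name, weight in kg, survival points).
ITEMS = [
    ("sleeping bag", 15, 15),
    ("rope", 3, 7),
    ("pocket knife", 2, 10),
    ("torch", 5, 5),
    ("water bottle", 9, 8),
    ("glucose", 20, 17),
]
MAX_WEIGHT = 30  # backpack capacity in kg, from the problem statement

def fitness(chromosome):
    """Total survival points of the packed items, or 0 if the pack is too heavy."""
    total_weight = sum(w for gene, (_, w, _) in zip(chromosome, ITEMS) if gene)
    total_points = sum(p for gene, (_, _, p) in zip(chromosome, ITEMS) if gene)
    # Assumption: an overweight backpack is treated as worthless.
    return total_points if total_weight <= MAX_WEIGHT else 0

print(fitness([1, 0, 1, 1, 0, 0]))  # 22 kg packed, so this prints 30
print(fitness([1, 0, 0, 0, 0, 1]))  # 35 kg packed, so this prints 0
```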
4.3 Selection
Now, we can select fit chromosomes from our population which can mate
and create their offspring.
The general thought is that we should select only the fittest chromosomes and
allow them to produce offspring. But that would lead to chromosomes that are
closer to one another within a few generations, and therefore to less diversity.
I suppose we have all seen a roulette wheel, either in real life or in movies.
So, let’s build our roulette wheel.
Consider a wheel, and let’s divide it into m divisions, where m is the
number of chromosomes in our population. The area occupied by each
chromosome will be proportional to its fitness value.
If we read the wheel at two fixed, equally spaced points after a single spin,
we get both our parents in one go. This method is known as Stochastic
Universal Sampling.
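To make the wheel concrete, here is a small sketch of both ideas: a plain roulette wheel spin that picks one parent, and the single-spin, multi-pointer variant (usually called stochastic universal sampling). The function names and the toy population are my own, and both functions assume that the total fitness is greater than zero.

```python
import random

def roulette_wheel_select(population, fitnesses, rng):
    """Spin once: pick one chromosome with probability proportional to fitness."""
    spin = rng.random() * sum(fitnesses)  # where the wheel stops
    running = 0.0
    for chromosome, fit in zip(population, fitnesses):
        running += fit  # right edge of this chromosome's slice
        if spin < running:
            return chromosome
    return population[-1]  # guard against floating-point round-off

def stochastic_universal_sampling(population, fitnesses, n_select, rng):
    """Spin once and read the wheel at n_select equally spaced pointers."""
    step = sum(fitnesses) / n_select
    start = rng.random() * step  # the single random spin
    selected = []
    index = 0
    running = fitnesses[0]
    for k in range(n_select):
        pointer = start + k * step
        # Advance to the slice that contains this pointer.
        while running <= pointer and index < len(population) - 1:
            index += 1
            running += fitnesses[index]
        selected.append(population[index])
    return selected

rng = random.Random(1)
population = ["a", "b", "c", "d"]  # stand-ins for chromosomes
fitnesses = [1, 0, 0, 3]           # "d" owns three quarters of the wheel
print(roulette_wheel_select(population, fitnesses, rng))
print(stochastic_universal_sampling(population, fitnesses, 2, rng))
```

Chromosomes with zero fitness own no area on the wheel, so "b" and "c" are never picked here.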
4.4 Crossover
This is the most basic form of crossover, known as one-point crossover. Here
we select a random crossover point, and the tails of both chromosomes
are swapped to produce new offspring.
If you take two crossover points, it is called multi-point crossover.
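Both kinds of crossover can be sketched in a few lines. The function names are my own, chromosomes are assumed to be equal-length lists of 0s and 1s, and the two-point version needs at least three genes.

```python
import random

def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two parents after one random cut point."""
    point = rng.randint(1, len(parent_a) - 1)  # never cut at the very ends
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

def two_point_crossover(parent_a, parent_b, rng):
    """Swap the middle segment between two distinct random cut points."""
    first, second = sorted(rng.sample(range(1, len(parent_a)), 2))
    child_a = parent_a[:first] + parent_b[first:second] + parent_a[second:]
    child_b = parent_b[:first] + parent_a[first:second] + parent_b[second:]
    return child_a, child_b

rng = random.Random(0)
mother = [0, 0, 0, 0, 0, 0]
father = [1, 1, 1, 1, 1, 1]
print(one_point_crossover(mother, father, rng))
print(two_point_crossover(mother, father, rng))
```

Using an all-zeros and an all-ones parent makes it easy to see which genes came from which parent.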
4.5 Mutation
Now, if you think in the biological sense, do the children produced have exactly
the same traits as their parents? The answer is NO. During their growth, there is
some change in the genes of the children which makes them different from their
parents.
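For binary chromosomes, a common way to model this change is bit-flip mutation, where each gene is flipped with a small probability. The sketch below assumes that operator; the function name and the 10% rate are illustrative choices of mine.

```python
import random

def mutate(chromosome, mutation_rate, rng):
    """Flip each gene independently with probability mutation_rate."""
    return [
        1 - gene if rng.random() < mutation_rate else gene
        for gene in chromosome
    ]

rng = random.Random(0)
child = [1, 0, 1, 1, 0, 0]
print(mutate(child, mutation_rate=0.1, rng=rng))
```

A low rate keeps most of what the child inherited while still introducing some new variation.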
The offspring thus produced are again validated using our fitness function,
and if they are considered fit, they replace the less fit chromosomes in the
population.
But the question is: how will we know that we have reached our best
possible solution?
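Putting the pieces together, here is one possible end-to-end loop. It is only a sketch under my own assumptions: the item values are hypothetical, parents are drawn with fitness-proportional weights, and the run stops either after a fixed number of generations or when the best fitness has not improved for `patience` generations, two commonly used stopping rules.

```python
import random

ITEMS = [(15, 15), (3, 7), (2, 10), (5, 5), (9, 8), (20, 17)]  # hypothetical (kg, points)
MAX_WEIGHT = 30

def fitness(chrom):
    weight = sum(w for g, (w, _) in zip(chrom, ITEMS) if g)
    points = sum(p for g, (_, p) in zip(chrom, ITEMS) if g)
    return points if weight <= MAX_WEIGHT else 0  # overweight packs score 0

def run_ga(pop_size=20, max_generations=100, patience=15, mutation_rate=0.05, seed=0):
    rng = random.Random(seed)
    n = len(ITEMS)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    stale = 0  # generations without improvement
    for _ in range(max_generations):
        # Selection: +1 keeps every weight positive even if all scores are 0.
        weights = [fitness(c) + 1 for c in population]
        mother, father = rng.choices(population, weights=weights, k=2)
        # One-point crossover followed by bit-flip mutation.
        point = rng.randint(1, n - 1)
        children = [mother[:point] + father[point:], father[:point] + mother[point:]]
        children = [[1 - g if rng.random() < mutation_rate else g for g in c]
                    for c in children]
        # A child replaces the current worst chromosome only if it is fitter.
        for child in children:
            worst = min(range(pop_size), key=lambda i: fitness(population[i]))
            if fitness(child) > fitness(population[worst]):
                population[worst] = child
        champion = max(population, key=fitness)
        if fitness(champion) > fitness(best):
            best, stale = champion, 0
        else:
            stale += 1
        if stale >= patience:  # stop early once progress has stalled
            break
    return best, fitness(best)

best, score = run_ga()
print(best, score)
```

Neither stopping rule guarantees the true optimum; they only tell us when further generations are unlikely to help.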
Now, I suppose you have grasped the basics of the genetic
algorithm. So let us look at some of the applications of genetic algorithms
in data science.
How do you select the features that are important in predicting the target
variable? Typically, you look at the feature importance of some model,
manually decide on a threshold, and select the features whose
importance is above that threshold.
Is there a better way to deal with this kind of situation? Actually, one of
the most advanced approaches to feature selection is the genetic algorithm.
The method here is exactly the same as the one we used for the knapsack
problem.
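To see the parallel, here is a sketch of how a chromosome maps to a feature subset. The feature names `f0` to `f4` and the toy scoring function are placeholders of mine; in practice `score_fn` would be something like the validation score of a model trained on only the selected columns.

```python
def selected_features(chromosome, feature_names):
    """Gene i == 1 means feature i is kept, just like an item in the backpack."""
    return [name for gene, name in zip(chromosome, feature_names) if gene]

def feature_subset_fitness(chromosome, feature_names, score_fn):
    """Fitness of a chromosome = score of the feature subset it encodes."""
    subset = selected_features(chromosome, feature_names)
    if not subset:
        return 0.0  # assumption: an empty feature set is worthless
    return score_fn(subset)

# Toy stand-in for a model score: reward two "useful" features and
# charge a small penalty per feature used. Entirely hypothetical.
USEFUL = {"f0", "f3"}

def toy_score(subset):
    return len(USEFUL & set(subset)) - 0.1 * len(subset)

names = ["f0", "f1", "f2", "f3", "f4"]
print(selected_features([1, 0, 0, 1, 0], names))
print(feature_subset_fitness([1, 1, 1, 1, 1], names, toy_score))
```

Selection, crossover and mutation then work on these chromosomes exactly as they did for the backpack.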
So finally, here comes the part for which you have been waiting since the
beginning of this article.
First, let’s take a quick look at TPOT (Tree-based Pipeline Optimization
Tool), which is built upon the scikit-learn library.
So, without going deep into this, let’s directly try to implement it.
To use the TPOT library, you first have to install some existing Python libraries
on which TPOT is built. So let us quickly install them.
# installing TPOT
pip install tpot
For the implementation part, I have used the Big Mart Sales dataset. So
quickly download the train and test files.
If you submit this csv, you will notice that what I promised at the start has not
been fulfilled. Was I lying to make you study all of this?
No. There is a simple rule with the TPOT library: if you don’t run TPOT for
very long, it may not find the best possible pipeline for your problem.
So, increase the number of generations, grab a cup of coffee and go out for
a walk. TPOT will finish your work.
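As a rough illustration of where the number of generations comes in, here is a minimal sketch of a TPOT run. It assumes the classic TPOT API (releases before 1.0) and a regression target such as sales; the wrapper name `fit_tpot` and the default values are my own, and parameter names may differ in newer TPOT releases, so check the documentation for your installed version.

```python
def fit_tpot(X_train, y_train, generations=5, population_size=50, seed=42):
    """Search for a regression pipeline with TPOT and return the fitted optimiser.

    Hypothetical helper, assuming the classic (pre-1.0) TPOT API.
    """
    # Imported lazily so this file can be loaded without TPOT installed.
    from tpot import TPOTRegressor

    optimiser = TPOTRegressor(
        generations=generations,          # more generations = longer search
        population_size=population_size,  # pipelines kept in each generation
        random_state=seed,
    )
    optimiser.fit(X_train, y_train)
    return optimiser

# Usage, assuming X_train, y_train and X_test are already prepared:
# model = fit_tpot(X_train, y_train, generations=20)
# predictions = model.predict(X_test)
```

Raising `generations` is the knob the paragraph above refers to: the search takes longer but has more chances to find a better pipeline.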
Genetic algorithms have many applications in the real world. Here I have listed
some of the interesting ones, but explaining each of them in detail would
require a separate article.
This is a famous problem, and the approach has been adopted by many
sales-based companies because it saves both time and money. This, too, is
achieved using genetic algorithms.
6.3 Robotics
Genetic algorithms are widely used in the field of robotics. They are
being used to create learning robots that behave like
humans and do tasks such as cooking our meals and doing our laundry.
After these, I suppose you have developed enough curiosity to
look for other interesting applications of genetic algorithms.
You can also share them with us in the comments.
7. End Notes
I hope that you have now gained enough understanding of what a genetic
algorithm is and how to implement it using the TPOT library. But this
knowledge is not enough if you don’t apply it somewhere.