Pygad: An Intuitive Genetic Algorithm Python Library: June 2021
All content following this page was uploaded by Ahmed Fawzy Gad on 19 June 2021.
I. INTRODUCTION
Nature has inspired computer scientists to create algorithms for
solving computational problems. These naturally-inspired algorithms
are called evolutionary algorithms (EAs) [1] where the initial solution
(or individual) to a given problem is evolved across multiple iterations
aiming to increase its quality. The EAs can be categorized by different
factors like the number of solutions used. Some EAs evolve a single
initial solution (e.g. hill-climbing) and others evolve a population of
solutions (e.g. particle swarm or genetic algorithm).
The genetic algorithm (GA) is a biologically-inspired EA that solves
optimization problems, inspired by Darwin’s theory of “survival of the
fittest” [1], [2]. Originally, the GA was not designed as a
computational algorithm but rather as a way to understand how natural
selection happens. Later, it became one of the most popular
computational EAs.

The core concepts of the GA are derived from natural selection.
There is a group of organisms called a population. The members of
the population may have the ability to mate and produce new
organisms. Because good organisms can produce better offspring,
the good members of the population are selected for mating and the
others die. When 2 good organisms mate, their offspring will likely
be better, especially when some new, hopefully better, changes are
introduced. These concepts form the core of the GA in computer
science.

The GA steps are explained in Figure 1 [2]. The first step creates
an initial population of solutions for the problem. These solutions
are not the best for the problem and are thus evolved. The evolution
starts by selecting the fittest solutions as parents based on a
maximization fitness function. Each pair of parents mates to produce
one or more children. Each child shares genes from its 2 parents using
the crossover operation.

Fig. 1. Flowchart of the genetic algorithm.

To go beyond the parents’ capabilities, mutation is introduced
to make some changes to the children. Due to the randomness in these
changes, some children may be fitter or worse than their parents. By
mating the best parents and producing more children, a better solution
is likely to be found after each generation. Either the offspring alone
or the offspring combined with the parents form the next generation’s
population. The process is repeated for a limited number of generations
or until a satisfactory solution is found where no further optimization
is necessary.

There are many parameters to tune to make the GA fit the
problem. For experimentation, it is essential to use an easy tool for
building the genetic algorithm.

©2021 Ahmed Fawzy Gad. Published at arXiv on June, 10 2021. Donation is open at PayPal (paypal.me/ahmedfgad or e-mail [email protected]),
OpenCollective (opencollective.com/pygad), and Interac e-Transfer ([email protected]).

This paper introduces PyGAD, an open-source intuitive Python
library for optimization using the genetic algorithm. PyGAD was
released in April 2020 and has over 185K installations at the time of
writing this paper. The library supports single-objective optimization
with a wide range of parameters to customize the GA for different
types of problems in an easy-to-use way with less effort. PyGAD
has a lifecycle to trace how everything is working from population
creation until finding the best solution. The lifecycle can also be
customized to help researchers alter its sequence, enable or disable
some operations, make modifications, or introduce new operators.
PyGAD works with both decimal and binary representations. The
genes can be of int, float, or any supported NumPy numeric
data types (int/uint/float). Given the advantages of the GA over
the gradient-based optimizers, PyGAD supports training Keras and
PyTorch models.

The paper is organized so that section II covers the related
work, section III extensively introduces PyGAD and briefly
compares PyGAD with DEAP and LEAP, and finally, section IV draws
conclusions. Appendix A lists some resources to learn more about
PyGAD and Appendix B lists the GitHub links of some projects built
with PyGAD.

II. RELATED WORK

There are already existing Python libraries for building the genetic
algorithm. Some examples include:
A. DEAP (Distributed Evolutionary Algorithms in Python), Section II-A
B. Pyevolve, Section II-B
C. EasyGA, Section II-C
D. LEAP (Library for Evolutionary Algorithms in Python), Section II-D

This section gives an overview of these libraries by explaining
their objectives and limitations.

A. DEAP

DEAP (Distributed Evolutionary Algorithms in Python) [3] is
considered one of the most common Python libraries for optimization
using the genetic algorithm based on the number of installations,
GitHub issues, and stars (4.2K). One of the reasons is that it was one
of the first EA libraries, published in 2012. DEAP supports
algorithms other than the GA, like the non-dominated sorting genetic
algorithm II (NSGA-II), particle swarm optimization (PSO), and
evolution strategy (ES). Its latest release on the Python Package Index
(PyPI) is 1.3.1, released in January 2020.

DEAP uses 2 main structures, which are creator and toolbox.
The creator module is used for creating data types like fitness,
individual, and population. These data types are empty and are filled
using the toolbox module. Given the special structure of DEAP,
the user needs some time to become familiar with it.
It needs more than a beginner’s level in Python.

One problem with DEAP is that it is not user-friendly, as the user
has to make some effort for each optimization problem. For example,
the user has to write extra code for creating the population and filling
it with appropriate values. The motivation for that is to enable the
user to create custom data types that are not supported. But this
boilerplate code unnecessarily increases the length of the Python
script and forces users to write and understand more DEAP-specific
syntax. Sometimes, the scripts are hard to read.

Moreover, the users have to create a main function in which
everything is grouped in an evolutionary loop. The loop in this
user-defined function is where the user needs to follow the GA
pipeline by 1) calculating the fitness function, 2) selecting the parents,
3) applying crossover and mutation, and 4) repeating that for several
generations. This is not the best interface for users who try to focus
on the experiments and save time on everything else. Writing the
evolutionary loop makes the library unfriendly.

Even if the library supports some ready-to-use algorithms that
save time building the main function, each algorithm does a specific
task and is limited in its features. With the few parameters each
algorithm accepts, there is little customization possible. Examples of
these algorithms are eaSimple, eaMuPlusLambda, and varOr.

The library leaves much to be built by the end-user, which
makes the user feel lost between the modules, classes, and functions
needed to customize the library to solve a problem. For example,
building a population of mixed data types requires the user to:
1) Register a data type for each gene.
2) Register an individual that combines those gene data types.
3) Register a population that uses that individual.
4) Build the population.

There is no way to restrict the gene values to a set of sparse
discrete values. This is necessary for some problems where the gene
value may not fall within a regular range. To select which genes to
mutate, DEAP only supports the mutation probability. There is no
way to specify the exact number of genes to be mutated.

DEAP only supports the traditional mutation operators where all
solutions, regardless of their fitness value, are given equal mutation
probability. This would distort the high-quality solutions.

Another major drawback of DEAP is the lack of means of
visualizing the results after the evolution completes [4]. The users
have to manually create their plots to summarize the results.

B. Pyevolve

Pyevolve is another pure Python library for building the genetic
algorithm [5]. Even though it was published in 2009, it is less popular
than DEAP based on the total number of installations (50K in total),
GitHub stars (301), and citations. It also has a limited
community compared to DEAP.

Its scripts start by creating the fitness function, preparing the
chromosome, setting some parameters like which operators to use,
creating an instance of the GSimpleGA class, and then calling the
evolve() method to start the evolution.

Like DEAP, this library has boilerplate code for configurations that
makes its scripts unnecessarily long even for simple problems. The
supported mutation operators apply mutation equally to all solutions
independent of their fitness value.

Even though Pyevolve is easier than DEAP, it is limited in its
features. It supports some predefined data types for the genes like
integer and real. Creating new data types requires more details about
some classes in the library, like GenomeBase.GenomeBase,
which may be problematic for some users.

A comparison between DEAP and Pyevolve shows that the number
of code lines needed to solve the OneMax problem is 59 for DEAP
and 378 for Pyevolve [3]. Given the simplicity of the problem,
Pyevolve needed too many lines.

This library is no longer maintained, as the latest version was
released at the end of 2014 and the most recent commit on its GitHub
project was at the end of 2015.

C. EasyGA

The EasyGA library allows only defining a continuous range for
the gene values. If the range has some exclusions or if the gene values
do not follow a range at all, then there is no way to define the gene
values. Moreover, the user has to build the solutions manually without
being given a simple interface. Commonly, users would like to focus more
on the algorithm itself and save time building additional modules,
especially if they are not experienced in Python.

This library has a limited number of operators for crossover,
mutation, and parent selection.

The EasyGA library supports a random mutation operation that
applies mutation over any solution in the population, including
parents, and is not restricted to the new offspring. It randomly selects
the solutions to mutate. This is against the nature of the GA, as
random changes should only be introduced to the offspring, not the
parents.

The users have to know about decorators, at some stage, to
build their operators. At the time of writing this paper, the GitHub
project of EasyGA has only 22 stars.
D. LEAP

LEAP (Library for Evolutionary Algorithms in Python) is another
recent Python library published in 2020 for EAs that supports the
genetic algorithm [6]. This library has 3 core classes, which are
Individual, Decoder, and Problem. The decoder is responsible
for converting the genes from one form to another to calculate
the fitness value for each individual given the current problem.

According to the examples posted by the developers, the decoder
is usually set to IdentityDecoder(), which means the gene is
translated to itself. This is a design issue in the library. Maybe this
feature is helpful in some specific problems, but the library uses it
as something essential for all types of problems. It is better to work
directly on the genes without having to decode them to another form
or to leave that decoding part to the fitness function.

Even though it was published in July 2020, the library has many missing
features as mentioned in its documentation. One of these missing
features is the mixed data representation in the individual. There is no
lifecycle in LEAP to help trace what is happening in each generation.

Even if one of the objectives of LEAP is to make it simpler than
the other libraries, the user still has to write the evolution loop. This
results in more lines to solve a problem. Moreover, the user has to
take care of increasing the generations counter
by calling the util.inc_generation() function. Failing to
call it causes an infinite evolution loop. This would be a problem for
users with less experience.

Creating and managing the evolution loop is against another
objective of LEAP, which is to make it suitable for all types of software
users (users who solve problems with existing tools).

This library is not popular, as it has a total of 3.4K installations
since publication in addition to just 39 stars in the GitHub project.

III. PROPOSED PYGAD LIBRARY

PyGAD is an open-source Python library for optimization using
the genetic algorithm. It was published in April 2020 on PyPI
(pypi.org/project/pygad). Its GitHub project has over 525 stars
(github.com/ahmedfgad/GeneticAlgorithmPython). With over 185K
installations over 1 year, PyGAD is the fastest-rising library compared
to the other libraries.

The name PyGAD has 3 parts:
1) Py, which means it is a Python library. This is a common naming
convention for Python libraries.
2) GA, which stands for genetic algorithm.
3) D for decimal, because the library originally supported only the
decimal genetic algorithm. Now, it supports both the decimal and
binary genetic algorithm.

This section gives an overview of PyGAD (III-A), discusses the
steps of its usage (III-B), its lifecycle (III-C), and the main features
of PyGAD (III-D).

A. PyGAD Overview

PyGAD is an intuitive library for optimization using the genetic
algorithm. It is designed with 2 main objectives:
1) Making everything as simple as possible for the users with the
least knowledge.
2) Giving the user control over everything possible.

The simplicity comes from using descriptive names for the classes,
methods, attributes, and parameters. This is in addition to making
it straightforward to build the genetic algorithm and specify a
wide range of easy-to-understand configuration parameters. There
are fewer classes, methods, or functions to call to solve a problem
compared to the other libraries.

The users do not have to build the evolution loop in any situation,
as PyGAD supports an elastic lifecycle that can be altered. This
includes, but is not limited to, enabling or disabling the mutation
or crossover operators and overriding them to build new operators
for research purposes.

The 7 modules included in PyGAD are:
1) pygad.pygad: The main module, which builds everything
in the genetic algorithm. This module is implicitly imported
when the library itself is imported.
2) pygad.nn: Builds fully-connected neural networks (FCNNs)
from scratch using only NumPy.
3) pygad.gann: Uses the pygad module to train networks built
using the nn module.
4) pygad.cnn: Similar to the nn module but builds
convolutional neural networks (CNNs).
5) pygad.gacnn: Similar to the gann module but trains CNNs.
6) pygad.kerasga: Trains Keras models using the pygad
module.
7) pygad.torchga: Trains PyTorch models using the pygad
module.

Given that pygad.pygad is the most critical module in the
library, it is given the most attention.

PyGAD has detailed documentation that covers all of its features
with examples. Moreover, the source code of various projects built
using PyGAD is explained in tutorials. A list of some of these
tutorials is available in Appendix A.

PyGAD gained popularity in the last months and some of its
English articles and tutorials have been translated to different
languages like Korean, Turkish, Hungarian, Chinese, and Russian. A list
of such translated articles and tutorials is found in the PyGAD in Other
Languages section of the documentation.

The documentation of PyGAD has a section called Release History
that summarizes the changes and additions in each release.

Appendix A has extra reading resources about PyGAD. Appendix
B lists some projects built with PyGAD.

B. PyGAD Usage

There are 3 core steps to use PyGAD:
1) Build the fitness function.
2) Instantiate the pygad.GA class with the appropriate
configuration parameters.
3) Call the run() method to start the evolution.

For the following equation with 3 inputs, we can use PyGAD to
find the values of $w_1$, $w_2$, and $w_3$ that satisfy the equation:

$$Y = w_1 X_1 + w_2 X_2 + w_3 X_3$$

where $Y = 44$, $X_1 = 4$, $X_2 = -2$, and $X_3 = 3.5$.

A basic PyGAD example that solves this problem is given in
Listing 1. Line 1 imports the library. This import statement implicitly
imports the pygad module. The NumPy library is also imported in
Line 2 because it is used in the fitness function.
Line 4 creates a Python list to hold the 3 inputs and line 5 holds
the output.

The fitness function in PyGAD is a regular Python function that
accepts 2 parameters:
1) The solution evolved by the genetic algorithm as a 1D vector.
The function should return a single numeric value representing
the fitness of this solution.
2) The index of the solution within the population.

The length of the solution in this example is 3, one value for each
weight in the equation. The fitness function must be a maximization
function (the higher the fitness value, the better the solution).

The fitness function is defined from line 7 to line 10 in Listing 1.
The function calculates the sum of products between each value in
the solution and its corresponding input. The absolute difference is
calculated between the sum and the output.

Returning the absolute difference would make it a minimization function.
This is why the result is returned as 1.0/abs, the reciprocal of the
absolute difference. A tiny value of 0.000001 is added to the
denominator to avoid dividing by zero.
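As a minimal sketch of the fitness function described above, assuming the two-parameter signature (solution, solution index) stated in the text, the computation can be written in plain Python. The names function_inputs, desired_output, and fitness_func are illustrative, and built-in functions are used in place of NumPy only to keep the sketch self-contained. Releases of PyGAD newer than the one described in this paper may expect a different fitness function signature, so the documentation of the installed version should be checked.

```python
# Illustrative names; the values come from the example equation in the text.
function_inputs = [4, -2, 3.5]   # X1, X2, X3
desired_output = 44              # Y


def fitness_func(solution, solution_idx):
    """Return a fitness value that grows as the prediction approaches Y."""
    # Sum of products between each weight and its corresponding input.
    predicted = sum(w * x for w, x in zip(solution, function_inputs))
    # Invert the absolute error so a smaller error gives a higher fitness;
    # the small constant avoids division by zero for an exact solution.
    return 1.0 / (abs(predicted - desired_output) + 0.000001)


# Weights that satisfy the equation exactly: 4*11 + (-2)*0 + 3.5*0 = 44.
print(fitness_func([11, 0, 0], 0))
# Weights that predict 0, leaving an absolute error of 44.
print(fitness_func([0, 0, 0], 1))
```

The first call returns a much larger value than the second, which is what makes the function suitable for maximization.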
A new instance of the pygad.GA class is created in line 12.
All configuration parameters are grouped in the pygad.GA class’s
constructor. With the available IDEs, the user can easily check the
names of all available parameters. This way the user does not have
to memorize the names of functions or classes compared to the other
libraries.
There are 5 required parameters that must be specified for each
problem:
1) num_generations: The number of generations/iterations.
2) sol_per_pop: The number of solutions/chromosomes/indi-
viduals in the population (i.e. population size).
3) num_parents_mating: The number of solutions to be se-
lected from the population as parents for mating and producing
the offspring.
4) num_genes: The number of genes in each solution.
5) fitness_func: The fitness function.
These are the minimum parameters to use PyGAD. Note that the
names of the parameters are self-describing. The documentation has
information about the classes, parameters, attributes, methods, and
functions in all PyGAD modules.

Fig. 2. Evolution of the fitness value for 100 generations.
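To make the role of the five required parameters concrete, the following is a from-scratch sketch of a simple genetic algorithm in plain Python. It is not PyGAD's implementation: the function run_simple_ga, its rank-based parent selection, single-point crossover, single-gene random mutation, the initial gene range, and the seed argument are all assumptions made for illustration. Only the five parameter names mirror those listed above.

```python
import random


def run_simple_ga(num_generations, sol_per_pop, num_parents_mating,
                  num_genes, fitness_func, seed=0):
    """Hypothetical helper (not part of PyGAD) showing what each parameter controls.

    Assumes num_parents_mating >= 2 and num_genes >= 2.
    """
    rng = random.Random(seed)
    # sol_per_pop solutions, each with num_genes genes from an assumed range.
    population = [[rng.uniform(-4.0, 4.0) for _ in range(num_genes)]
                  for _ in range(sol_per_pop)]
    for _ in range(num_generations):
        # Rank solutions by fitness (higher is better); keep the best as parents.
        ranked = sorted(range(sol_per_pop),
                        key=lambda i: fitness_func(population[i], i),
                        reverse=True)
        parents = [population[i] for i in ranked[:num_parents_mating]]
        # Fill the remaining slots with mutated offspring.
        offspring = []
        while len(offspring) < sol_per_pop - num_parents_mating:
            p1, p2 = rng.sample(parents, 2)
            point = rng.randint(1, num_genes - 1)    # single-point crossover
            child = p1[:point] + p2[point:]
            gene = rng.randrange(num_genes)          # mutate one random gene
            child[gene] += rng.uniform(-1.0, 1.0)
            offspring.append(child)
        # Parents survive unchanged, so the best fitness never decreases.
        population = parents + offspring
    best_idx = max(range(sol_per_pop),
                   key=lambda i: fitness_func(population[i], i))
    return population[best_idx]


# Usage with the example equation from the text (illustrative names).
inputs = [4, -2, 3.5]
target = 44


def fitness_func(solution, solution_idx):
    predicted = sum(w * x for w, x in zip(solution, inputs))
    return 1.0 / (abs(predicted - target) + 0.000001)


best = run_simple_ga(num_generations=100, sol_per_pop=10,
                     num_parents_mating=2, num_genes=3,
                     fitness_func=fitness_func)
print(best, fitness_func(best, 0))
```

In PyGAD itself this loop is hidden behind the pygad.GA constructor and the run() method, which is the point the text makes about not having to write the evolution loop by hand.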
REFERENCES
[1] Simon, Dan. Evolutionary optimization algorithms. John Wiley & Sons,
2013.
[2] Gad, Ahmed Fawzy. Practical computer vision applications using deep
learning with CNNs. Apress, 2018.
[3] Fortin, Félix-Antoine, et al. "DEAP: Evolutionary algorithms made easy."
The Journal of Machine Learning Research 13.1 (2012): 2171-2175.
[4] Kim, Jinhan, and Shin Yoo. "Software review: DEAP (distributed
evolutionary algorithm in Python) library." Genetic Programming and
Evolvable Machines 20.1 (2019): 139-142.
[5] Perone, Christian S. "Pyevolve: a Python open-source framework for
genetic algorithms." ACM SIGEVOlution 4.1 (2009): 12-20.
[6] Coletti, Mark A., Eric O. Scott, and Jeffrey K. Bassett. "Library for
evolutionary algorithms in Python (LEAP)." Proceedings of the 2020
Genetic and Evolutionary Computation Conference Companion. 2020.
APPENDIX A
PYGAD SUPPLEMENTAL RESOURCES
This appendix lists some tutorials and articles to get started with
PyGAD.
• 5 Genetic Algorithm Applications Using PyGAD, June 2020,
Ahmed Gad
• Building a Game-Playing Agent for CoinTex Using PyGAD,
July 2020, Ahmed Gad
• Working with Different Genetic Algorithm Representations in
PyGAD, September 2020, Ahmed Gad
• Train Neural Networks Using a Genetic Algorithm in Python
with PyGAD, September 2020, Fatima Ezzahra Jarmouni