Technical Seminar GP
“GENETIC PROGRAMMING”
Submitted in partial fulfillment for the award of
THRIPURASRI S- 1CK20CS057
Under the guidance of
Mr. RAJA A
Assoc. Professor
Dept. of CSE, CBIT, Kolar
Kolar-Srinivasapura Road,
Kolar-563101
CERTIFICATE
ACKNOWLEDGEMENT
The completion of any work is a showcase of the constant dedication and co-operation of many people who lent their hands, seen or unseen.
I would like to thank our beloved Assoc. Professor and HOD, Dr. VASUDEVA R, Department of Computer Science & Engineering, for his guidance, valuable advice and support.
I also thank all our professors and the entire Department of Computer Science & Engineering for their co-operation and suggestions.
The report would be incomplete if I did not thank my parents and friends for their continuous encouragement and moral support.
THRIPURASRI S - 1CK20CS057
TABLE OF CONTENTS
ABSTRACT I
DECLARATION II
ACKNOWLEDGEMENT III
TABLE OF CONTENTS IV
LIST OF FIGURES VI
CHAPTER NO TITLE PAGE NO
1 INTRODUCTION 1
1.1 Organization of Report 2
2 LITERATURE SURVEY 3
3 REPRESENTATION, INITIALIZATION OF PARENT AND OPERATORS 6
3.1 Representation 6
3.2 Initialization of Population 7
3.3 Selection 8
3.4 Recombination and Mutation 8
4 WORKING AND ALGORITHMS 10
4.1 Working 10
4.2 Algorithms 10
4.2.1 Algorithm 1 11
4.2.2 Algorithm 2 - Full and Grow 12
4.2.3 Algorithm 3 - Interpreter of GP 12
5 EXPERIMENTAL RESULTS AND ANALYSIS 13
5.1 Mutation Method 13
5.2 Mutation Rate 14
5.3 Crossover Method 15
5.4 Selection Method 16
6 APPLICATIONS 17
6.1 Curve Fitting, Data Modelling and Symbolic Regression 17
6.2 Financial Trading, Time Series Prediction and Economic Modelling 17
6.3 Medicine, Biology and Bioinformatics 17
6.4 Mixing GP with Other Techniques 18
6.5 GP to Create Searchers and Solvers - Hyper-Heuristics 18
6.6 Artistic 18
6.7 Entertainment and Computer Games 18
7 CONCLUSION 19
REFERENCES 20
LIST OF FIGURES
Figure No Figure Name Page No
3.1 GP syntax tree representation of max(x*x, x+3*y) 6
3.2 Full initialization method 7
3.3 Grow initialization method 7
3.4 Example of subtree crossover 8
3.5 Example of subtree mutation 9
4.1 General working of Genetic Programming 10
5.1 Average fitness for different mutation operators 13
5.2 Average fitness over time for different mutation rates 15
5.3 Average fitness scores over time for crossover techniques 15
5.4 Selection methods: comparison of proportional selection and elitism selection operators 16
CHAPTER 1
INTRODUCTION
Genetic Programming (GP) is a powerful computational technique rooted in the
principles of natural evolution and genetics. It is a subfield of evolutionary computation that
aims to automatically evolve computer programs to solve complex problems. GP treats
computer programs as evolving entities represented in a tree-like structure, where each node
represents an operation or a value.
GP begins by randomly creating an initial population of such programs. The evolutionary process then proceeds through several iterative steps, often referred to as generations. In each generation, the fitness of each program in the population is evaluated. Fitness represents how well the program solves the problem or achieves the desired objective. The evaluation function is problem-specific and can vary depending on the application.
The offspring generated through genetic operations form the next generation of
programs. This new generation undergoes evaluation, selection, crossover, and mutation,
iteratively evolving towards better solutions. The process continues until a termination
condition is met, such as finding a satisfactory solution or reaching a maximum number of
generations.
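The generational cycle described above can be sketched in Python. This is a minimal, illustrative skeleton, not the report's implementation: the toy problem (evolving an all-ones bit string), the operator names and the parameter values are all assumptions made for the example.

```python
import random

def evolve(population, fitness, select, crossover, mutate,
           max_generations=50, target_fitness=1.0):
    """Run the evaluate-select-recombine-mutate cycle until a
    satisfactory solution is found or the generation limit is hit."""
    for generation in range(max_generations):
        # Evaluate the fitness of every program in the population.
        scored = [(fitness(ind), ind) for ind in population]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score >= target_fitness:          # termination condition
            return best, generation
        # Offspring form the next generation via selection, crossover, mutation.
        population = [mutate(crossover(select(scored), select(scored)))
                      for _ in range(len(population))]
    return best, max_generations

# Toy instantiation: maximise the number of ones in an 8-bit string.
def onemax(ind):
    return sum(ind) / len(ind)

def tournament(scored):
    a, b = random.sample(scored, 2)               # binary tournament
    return a[1] if a[0] >= b[0] else b[1]

def one_point(p1, p2):
    cut = random.randint(1, len(p1) - 1)          # single-point crossover
    return p1[:cut] + p2[cut:]

def flip(ind, rate=0.05):
    return [bit ^ (random.random() < rate) for bit in ind]
```

Starting from, say, twenty random 8-bit individuals, `evolve` typically converges on the all-ones string within a handful of generations, mirroring the loop of evaluation, selection, crossover and mutation described above.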
1.1 Organization of Report
The remainder of this report is organized as follows:
Chapter 2: This chapter summarizes the literature survey carried out for genetic programming.
Chapter 3: This chapter details the representation, the initialization of the population, and the operators used to construct trees in genetic programming.
Chapter 4: This chapter describes the working of genetic programming and the algorithms it uses.
Chapter 5: This chapter presents the experimental results and the analysis made on genetic programming.
Chapter 6: This chapter describes the applications of genetic programming in various fields.
Chapter 7: This chapter concludes the report, followed by the references.
CHAPTER 2
LITERATURE SURVEY
Genetic Programming (GP) has been the subject of extensive research and
application across various domains, showcasing its versatility and effectiveness in solving
complex problems. Here's a literature survey highlighting key studies, advancements, and
resources related to GP:
[1] A Field Guide to Genetic Programming, by Poli, Langdon, and McPhee: This comprehensive book provides an in-depth introduction to GP, covering fundamental concepts, algorithmic details, and applications. It serves as an invaluable resource for researchers, practitioners, and students, from beginners to advanced, interested in understanding and applying GP techniques to challenging problems in various domains, offering a comprehensive overview of the field accompanied by practical insights and guidance for implementing and deploying GP algorithms effectively.
[2] Genetic Programming Theory and Practice: edited by Riolo, Soule, and Worzel:
This edited volume presents a collection of papers that delve into theoretical aspects,
practical applications, and recent developments in GP. It offers insights into the
theoretical underpinnings and practical considerations of GP. In classification tasks,
GP can be used to evolve programs that classify data into different categories. The
individuals in the population represent candidate solutions, and each individual is
evaluated based on how well it classifies the training data. The fitness function guides
the evolution process by rewarding individuals that produce more accurate
classifications.
[3] Advancements in Symbolic Regression with Genetic Programming: Symbolic regression using GP has been a prominent area of research. Studies by Vladislavleva et al. (2008) and Nguyen et al. (2015) have introduced innovative techniques and strategies for improving the accuracy and efficiency of symbolic regression with GP.
[4] GP Applications in Classification and Prediction: GP has been widely applied in
classification tasks, ranging from pattern recognition to bioinformatics.
[5] GP for Optimization and Control: GP has also found applications in optimization
problems and control systems. Research by Koza (1992) and Bäck et al. (1997)
demonstrates the efficacy of GP in evolving optimal solutions for a wide range of
optimization tasks.
[6] GP in Real-World Applications: Numerous studies demonstrate the practical applicability of GP in solving real-world problems across various domains, with notable examples in finance, engineering, and healthcare. For instance, research by Cegielski et al. (2015) explores the use of GP for financial forecasting, while studies by Kordon et al. (2011) and Koza et al. (2012) showcase GP's industrial and scientific applications. Hybrid and Ensemble Approaches: Researchers have also explored hybridizing GP with other machine learning techniques, or forming ensembles, to enhance performance and robustness. Studies by Espejo et al. (2002) and Lopes et al. (2008) investigate the effectiveness of hybrid and ensemble approaches in improving GP's performance.
[7] GP for Automated Machine Learning (AutoML): GP has emerged as a promising
technique for automating the machine learning pipeline, including feature
engineering, model selection, and hyperparameter tuning.
[8] Theoretical Advances and Extensions: Researchers continue to explore theoretical
aspects of GP and develop extensions to improve its capabilities. Studies by Fiduccia
and Mattheakis (2001) and Fenton and Nielsen (2016) introduce theoretical
frameworks and extensions to GP, addressing issues such as bloat control and multi-
objective optimization.
[9] Open-Source GP Frameworks and Libraries: Several open-source GP frameworks
and libraries are available, facilitating experimentation, research, and application
development. Examples include DEAP (Distributed Evolutionary Algorithms in
Python), ECJ (Evolutionary Computation in Java), and lil-gp (a small, portable
implementation of GP in C).
[10] GP for Evolving Neural Networks: Evolutionary computation has been extensively utilized for evolving neural network architectures and weights. Research by Stanley and Miikkulainen (2002) introduced NEAT (NeuroEvolution of Augmenting Topologies), a method that evolves both the structure and weights of neural networks, leading to the creation of more efficient and adaptive networks.
[11] GP for Automated Design and Invention: Genetic Programming has been applied
to automate the design and invention of novel solutions to engineering and design
problems. The work by Hornby et al. (2003) showcases the use of GP in evolving
complex designs, leading to innovative solutions in various domains, including
robotics and aerospace engineering.
[12] GP in Bioinformatics and Computational Biology: Genetic Programming has
shown promise in addressing challenges in bioinformatics and computational biology,
such as gene expression analysis and protein structure prediction. Research by Koza
et al. (2004) explores the application of GP in evolving programs for predicting gene
expression levels.
CHAPTER 3
REPRESENTATION, INITIALIZATION OF PARENT AND OPERATORS
3.1 Representation
In GP, programs are usually expressed as syntax trees rather than as lines of code. Figure 3.1 shows, for example, the tree representation of the program max(x*x, x+3*y). Note how the variables and constants in the program (x, y, and 3), called terminals in GP, are leaves of the tree, while the arithmetic operations (+, *, and max) are internal nodes (typically called functions in the GP literature). The sets of allowed functions and terminals together form the primitive set of a GP system.
Fig 3.1: GP syntax tree representation of max(x*x, x+3*y)
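The tree in Fig 3.1 can be held concretely as nested tuples. This is an illustrative sketch, not the report's own code: internal nodes are written `(function, args...)`, and leaves are variable names or constants, matching the terminal/function distinction above.

```python
# The syntax tree of max(x*x, x+3*y) as nested tuples.
tree = ("max",
        ("*", "x", "x"),
        ("+", "x", ("*", 3, "y")))

def terminals(node):
    """Collect the leaves (variables and constants) of a syntax tree."""
    if isinstance(node, tuple):
        return [t for arg in node[1:] for t in terminals(arg)]
    return [node]

def functions(node):
    """Collect the internal (function) nodes of a syntax tree."""
    if not isinstance(node, tuple):
        return []
    return [node[0]] + [f for arg in node[1:] for f in functions(arg)]
```

Here `terminals(tree)` yields the leaves x, x, x, 3, y and `functions(tree)` the internal nodes max, *, +, *, exactly the split the text describes between the terminal set and the function set of the primitive set.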
3.2 Initialization of Population
Similar to other evolutionary algorithms, in GP the individuals in the initial
population are randomly generated. There are a number of different approaches to generating
this random initial population. Here we will describe two of the simplest (and earliest)
methods (the Full and Grow methods), and a widely used combination of the two known as
Ramped half-and-half. In both the Full and Grow methods, the initial individuals are
generated subject to a pre-established maximum depth. In the Full method (so named because
it generates full trees) nodes are taken at random from the function set until this maximum
tree depth is reached, and beyond that depth only terminals can be chosen. Figure 3.2 shows
snapshots of this process in the construction of a full tree of depth 2. The children of the * node, for example, must be leaves, or the resulting tree would be too deep; thus at time t = 3 and time t = 4 terminals must be chosen (x and y in this case).
Fig 3.2: Full initialization method
Where the Full method generates trees of a specific size and shape, the Grow
method allows for the creation of trees of varying size and shape. Here nodes are selected
from the whole primitive set (functions and terminals) until the depth limit is reached, below
which only terminals may be chosen (as is the case in the Full method). Figure 3.3 illustrates
this process for the construction of a tree with depth limit 2. Here the first child of the root +
node happens to be a terminal, thus closing off that branch before actually reaching the depth limit. The other child, however, is a function (−), but its children are forced to be terminals to ensure that the resulting tree does not exceed the depth limit.
Fig 3.3: Grow initialization method
3.3 Selection
Like in most other EAs, genetic operators in GP are applied to individuals that are
probabilistically selected based on fitness. That is, better individuals are more likely to have
more child programs than inferior individuals. The most commonly employed method for
selecting individuals in GP is tournament selection, followed by fitness-proportionate
selection, but any standard EA selection mechanism can be used. Since selection is described extensively in the evolutionary-algorithms literature, we will not provide additional details here.
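Tournament selection, which the text names as the most common choice in GP, can be sketched as below. The function and parameter names are illustrative assumptions, not taken from the report.

```python
import random

def tournament_select(population, fitness, k=3):
    """Pick k individuals uniformly at random and return the fittest.

    Fitter individuals win more tournaments, so they are more likely
    to contribute offspring, without needing global fitness scaling.
    """
    contestants = random.sample(population, k)
    return max(contestants, key=fitness)
```

A larger tournament size `k` increases selection pressure: with `k` equal to the population size the single best individual always wins, while `k = 1` degenerates to uniform random selection.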
3.4 Recombination and Mutation
In subtree crossover, a crossover point is selected in each parent, and the offspring is created by replacing the subtree rooted at the crossover point of one parent with a copy of the subtree rooted at the crossover point of the other (see Fig 3.4). Except in technical studies on the behaviour of GP, crossover points are usually not selected with uniform probability. Typical GP primitive sets lead to trees with an average branching factor of at least two, so the majority of the nodes will be leaves. Consequently, the uniform selection of crossover points leads to crossover operations that frequently exchange only very small amounts of genetic material (that is, small subtrees); many crossovers may in fact reduce to simply swapping two leaves. To counter this, Koza suggested the widely used approach of choosing functions 90% of the time and leaves 10% of the time.
Finally, it is worth mentioning that the notion of a common region is related to the notion of homology, in the sense that the common region represents the result of a matching process between parent trees. It is then possible to imagine that within such a region the transfer of homologous primitives can happen in much the same way as it happens in GAs operating on linear chromosomes. An example of a recombination operator that implements this idea is uniform crossover for GP.
Fig 3.5: Example of subtree mutation
The most commonly used form of mutation in GP (which we will call subtree
mutation) randomly selects a mutation point in a tree and substitutes the sub-tree rooted there
with a randomly generated sub-tree. This is illustrated in Fig.3.5. Subtree mutation is
sometimes implemented as crossover between a program and a newly generated random
program; this operation is also known as ‘headless chicken’ crossover [10]. Another common
form of mutation is point mutation, which is the rough equivalent for GP of the bit-flip
mutation used in GAs. In point mutation a random node is selected and the primitive stored
there is replaced with a different random primitive of the same arity taken from the primitive
set. If no other primitives with that arity exist, nothing happens to that node (but other nodes
may still be mutated). Note that, when subtree mutation is applied, this involves the
modification of exactly one subtree. Point mutation, on the other hand, is typically applied
with a given mutation rate on a per-node basis, allowing multiple nodes to be mutated
independently. While mutation is not necessary for GP to solve many problems, some researchers argue that, in some cases, GP with mutation alone can perform as well as GP using crossover. While mutation was often used sparingly in early GP work, it is more widely used in GP today, especially in modelling applications.
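Point mutation as described above, replacing a primitive with another of the same arity on a per-node basis, can be sketched on a tuple-encoded tree. The primitive sets and the default rate here are illustrative assumptions for the example, not values from the report.

```python
import random

# Illustrative primitive set: all listed functions have arity 2,
# so any of them can replace any other without changing the tree shape.
BINARY_OPS = ["+", "*", "max"]
TERMINALS = ["x", "y", 1, 2, 3]

def point_mutate(node, rate=0.1, rng=random):
    """With probability `rate`, replace each primitive by another of the
    same arity; applied independently per node, as in bit-flip mutation."""
    if isinstance(node, tuple):
        # Internal node: maybe swap the operator, then recurse on children.
        op = rng.choice(BINARY_OPS) if rng.random() < rate else node[0]
        return (op,) + tuple(point_mutate(a, rate, rng) for a in node[1:])
    # Leaf: maybe swap the terminal for another terminal.
    return rng.choice(TERMINALS) if rng.random() < rate else node
```

Because replacements always preserve arity, the shape of the tree never changes, only its labels; this is the property that makes point mutation the GP analogue of the GA bit-flip.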
CHAPTER 4
WORKING AND ALGORITHMS
4.2 Algorithms
Genetic Algorithms (GAs) are a type of evolutionary algorithm inspired by the principles of
natural selection and genetics. They are used to solve optimization and search problems by
mimicking the process of evolution. GAs can be applied to a wide range of problems,
including optimization, combinatorial problems, and machine learning. Here's a detailed
definition and explanation of Genetic Algorithms:
Fitness Function: The fitness function measures how well a candidate solution solves the problem, guiding the selection process during evolution.
Selection: Selection determines which chromosomes are chosen to create the next
generation. Common selection methods include roulette wheel selection, tournament
selection, and rank-based selection. The idea is to select chromosomes with higher
fitness, allowing them to pass on their genes to the next generation.
Crossover: Crossover is a genetic operator that combines two parent chromosomes to
produce one or more offspring. It is designed to introduce new genetic material and
explore the solution space. Common crossover techniques include single-point, two-
point, and uniform crossover.
Mutation: Mutation is another genetic operator that introduces random changes to
chromosomes, promoting diversity in the population. Mutation helps prevent
premature convergence to suboptimal solutions by exploring new areas of the solution
space.
Generations and Termination: The GA process repeats over multiple generations.
Each generation involves selection, crossover, and mutation, leading to a new
population of candidate solutions. The algorithm terminates when a stopping criterion
is met, such as a maximum number of generations or an acceptable fitness level.
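The crossover and mutation operators listed above can be sketched for bit-string chromosomes. This is an illustrative example with assumed names and rates, not the report's implementation; it shows single-point crossover and the bit-flip mutation that promotes diversity.

```python
import random

def single_point_crossover(parent1, parent2, rng=random):
    """Cut both equal-length chromosomes at one random point and
    swap the tails, producing two offspring."""
    cut = rng.randint(1, len(parent1) - 1)
    return (parent1[:cut] + parent2[cut:],
            parent2[:cut] + parent1[cut:])

def bit_flip_mutation(chromosome, rate=0.01, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ (rng.random() < rate) for bit in chromosome]
```

Two-point crossover cuts at two positions and swaps the middle segment, while uniform crossover chooses each gene's parent independently; all three recombine existing genetic material, whereas mutation is what introduces genes absent from both parents.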
4.2.2 Algorithm 2: Pseudo code for recursive program generation with the Full and Grow methods
Procedure: gen_rnd_expr(func_set, term_set, max_d, method)
1. if max_d = 0 or (method = grow and rand() < |term_set| / (|term_set| + |func_set|)) then
2.   expr = choose_random_element(term_set)
3. else
4.   func = choose_random_element(func_set)
5.   for i = 1 to arity(func) do
6.     arg_i = gen_rnd_expr(func_set, term_set, max_d - 1, method)
7.   expr = (func, arg_1, arg_2, ...)
8. return expr
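The Full/Grow generator can be transcribed into Python almost line for line. The primitive sets and the tuple encoding of trees are illustrative assumptions; functions are given as (name, arity) pairs so the recursion knows how many children to generate.

```python
import random

def gen_rnd_expr(func_set, term_set, max_d, method, rng=random):
    """Generate a random expression tree with the Full or Grow method.

    func_set: list of (name, arity) pairs; term_set: list of terminals.
    With method="full", terminals are chosen only at depth 0; with
    method="grow", a terminal may also be chosen early, with probability
    proportional to the share of terminals in the primitive set.
    """
    n_terms, n_funcs = len(term_set), len(func_set)
    if max_d == 0 or (method == "grow" and
                      rng.random() < n_terms / (n_terms + n_funcs)):
        return rng.choice(term_set)           # leaf: pick a terminal
    name, arity = rng.choice(func_set)        # internal node: pick a function
    args = tuple(gen_rnd_expr(func_set, term_set, max_d - 1, method, rng)
                 for _ in range(arity))
    return (name,) + args

def depth(expr):
    """Depth of a tuple-encoded tree (a bare terminal has depth 0)."""
    if isinstance(expr, tuple):
        return 1 + max(depth(arg) for arg in expr[1:])
    return 0
```

As the text explains, "full" trees always reach exactly the maximum depth, while "grow" trees may close off branches early and so vary in size and shape up to the depth limit; ramped half-and-half simply generates half the population with each method over a range of depth limits.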
4.2.3 Algorithm 3: Typical interpreter for GP
1. if expr is a list then
2.   proc = expr(1) {Non-terminal: extract root}
3.   if proc is a function then
4.     value = proc(eval(expr(2)), eval(expr(3)), ...) {Function: evaluate arguments}
5.   else
6.     value = proc(expr(2), expr(3), ...) {Macro: don't evaluate arguments}
7. else
8.   if expr is a variable or expr is a constant then
9.     value = expr {Terminal variable or constant: just read the value}
10.  else
11.    value = expr() {Terminal 0-arity function: execute}
12. return value
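A Python rendering of this interpreter might look as follows. The primitive table is an assumption for the example, and the macro branch (applying a procedure to unevaluated arguments) is omitted for brevity; lists are non-terminals whose head names a function, strings are variables, callables are 0-arity terminal functions, and anything else is a constant.

```python
import operator

# Assumed primitive table mapping function names to implementations.
FUNCTIONS = {"+": operator.add, "*": operator.mul, "max": max}

def interpret(expr, env):
    """Evaluate a prefix-list expression in a variable environment."""
    if isinstance(expr, list):                # non-terminal: extract root
        proc = FUNCTIONS[expr[0]]
        # Function: evaluate the arguments, then apply the function.
        return proc(*(interpret(arg, env) for arg in expr[1:]))
    if isinstance(expr, str):                 # terminal variable: read value
        return env[expr]
    if callable(expr):                        # terminal 0-arity function
        return expr()
    return expr                               # terminal constant
```

For example, interpreting `["max", ["*", "x", "x"], ["+", "x", ["*", 3, "y"]]]` with x = 2 and y = 1 computes max(4, 5) = 5.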
CHAPTER 5
EXPERIMENTAL RESULTS AND ANALYSIS
5.1 Mutation Method
Fig 5.1: Average Fitness for Different Mutation Operators
Fig 5.1 has several interesting properties. First, it is apparent that the line
corresponding to no mutation eventually reaches a maximum fitness which it never exceeds.
This shows that not including mutation reduces the amount of exploration that is possible.
The genetic material present in the initial population will remain the same throughout
subsequent generations. It will only be combined in different ways through crossover
operations. Adding mutation operators allows exploration of new areas of the search space.
Second, the swap mutation operator did not perform better than the classical
mutation operator until very late in the simulation. This is probably because the swap
mutation operator does not explore as widely as the classical mutation operator, since it can
only change the location of genes which are already in the chromosome. The early wide
exploration provided by the classical mutation operator allowed it to quickly find better
solutions than the other three techniques.
Increasing the degree of elitism might also help in this situation, since it will increase the probability that more good solutions survive.
Fig 5.2: Average fitness over time for different mutation rates
Fig 5.3: Average fitness scores over time for different crossover techniques
CHAPTER 6
APPLICATIONS
Since its early beginnings, GP has produced a cornucopia of results. The literature,
which covers more than 5000 recorded uses of GP, reports an enormous number of
applications where GP has been successfully used as an automatic programming tool, a
machine learner or an automatic problem-solving machine.
6.3 Medicine, Biology and Bioinformatics
GP has been used to model biological systems, principally proteins. Oakley, a practising medical doctor, used GP to model blood flow in toes as part of his long-term interest in frostbite.
6.4 Mixing GP with Other Techniques
GP can be hybridised with other techniques. Iba, Nikolaev, and Zhang have incorporated information-theoretic and minimum-description-length ideas into GP fitness functions to provide a degree of regularization and so avoid over-fitting. As mentioned earlier, computer language grammars can be incorporated into GP; indeed, Wong has had success integrating these with GP. The use of simulated annealing and hill climbing to locally fine-tune parts of solutions found by GP has also been described.
6.6 Artistic
Computers have long been used to create purely aesthetic artifacts. Much of today's computer art tends to ape traditional drawing and painting, producing static pictures on a computer monitor. However, the immediate advantage of the computer screen, movement, can also be exploited. In both cases EC can be, and has been, exploited. Indeed, with evolution's capacity for unlimited variation, EC offers the artist the scope to produce ever-changing works. Some artists have also worked with sound.
CHAPTER 7
CONCLUSION
The most difficult part of this project was the development of the fitness function. It
required modelling a complex system, and is based on a number of simplifying assumptions,
particularly with respect to the network. The utility of the solutions generated by this
algorithm could be improved by refining the fitness function so that it more closely reflects
the actual performance possible in the video processing network. However, based on the data
I was able to collect, the fitness function models the network reasonably well. For the
purpose of exploring genetic algorithms, the fitness function provides a complex feature
space with conflicting goals. I suspect that this made it easy to see the difference between
some of the configurations that I tried in my experiments. For example, the difference in
behaviour of elitism selection and proportional selection was very apparent. The comparison
of various mutation methods, including one which was created because I suspected it would
be useful for this problem, was interesting. It showed that not using mutation limited the
maximum fitness that could be achieved, while using both mutation operators in concert gave
better results than either one in isolation. In this case, increased searching gave better results.
When different mutation rates were compared, however, it became apparent that too much
searching could also be a problem.
A mutation rate which is too high has a tendency to destroy information contained
in successful individuals. The examination of one-point, two-point and uniform crossover
also showed that this can have an effect on the performance of the genetic algorithm.
However, it was not clear from the data I was able to collect which of the crossover techniques is the best to use in this case. This could be an area for further exploration.
Finally, the most surprising result was that elitist selection gave better results than
proportional selection in less time. The conventional wisdom seems to be that elitism limits
the ultimate success of the algorithm by excluding useful genetic material, but in this case the
increased selection pressure seems to have been more important.
REFERENCES
[1] Poli, R., Langdon, W. B., & McPhee, N. F. (2008). "A Field Guide to Genetic Programming". https://www.cs.miami.edu/home/poli/GECCO2008/
[2] Riolo, R., Soule, T., & Worzel, B. (Eds.). "Genetic Programming Theory and Practice".
[3] Vladislavleva et al. (2008) and Nguyen et al. (2015). "Advancements in Symbolic Regression with Genetic Programming".
[4] Castelli, M., & Manzoni, L. (2002). "Genetic programming and feature construction for classification tasks". Proceedings of the 2002 Congress on Evolutionary Computation (CEC).
[5] Koza (1992) and Bäck et al. (1997). "GP for Optimization and Control". IEEE Transactions on Evolutionary Computation, 1997.
[6] Cegielski, M., Kowalski, P. A., & Winkler, K. J. (2015). "Financial forecasting using genetic programming: A survey of the literature". Expert Systems with Applications.
[7] Kotthoff et al. (2017) and Orzechowski et al. (2017). "GP for Automated Machine Learning (AutoML)".
[8] Fenton, M., & Nielsen, T. D. (2016). "Grammatical evolution for automated software engineering: A critical review". Genetic Programming and Evolvable Machines.
[9] Open-source GP frameworks and libraries facilitating experimentation, research, and application development, e.g. DEAP (Distributed Evolutionary Algorithms in Python).
[10] Stanley, K. O., & Miikkulainen, R. (2002). "Evolving neural networks through augmenting topologies". Evolutionary Computation, 10(2), 99-127. http://nn.cs.utexas.edu/downloads/papers/stanley.ec02.pdf
[11] Hornby, G. S., Lohn, J. D., & Linden, D. S. (2003). "Computer-automated evolution of an X-band antenna for NASA's Space Technology 5 mission". Evolutionary Computation. https://www.mitpressjournals.org/doi/abs/10.1162/106365603322518171
[12] Koza, J. R., Mydlowec, W., & Lanza, G. (2004). "An empirical investigation of the role of gene expression programming in the prediction of gene expression". Genetic Programming Theory and Practice II. http://www.genetic-programming.org/hc2003/Koza2003GPTP.pdf