Naredo Final Thesis
Enrique Naredo
director:
Dr. Leonardo Trujillo Reyes
advisor:
Dr. Leonardo Trujillo Reyes
Abstract
To my whole dear family and to all my true friends.
Acknowledgements
On a personal note, I would like to thank all the friends (‘contras’)
I made in the ITT postgraduate program in computer science at the
Tomás Aquino campus, for their unconditional support on my bad, and
sometimes worst, days of my PhD journey, and because they always
made me feel part of their family, just as they are part of mine.
On a more personal note, I am very thankful to all my family for
their loving support. They are far away in distance, but always close in
affection.
Lastly, I would like to thank the funding provided by CONACYT
(México) through scholarship No. 232288, as well as the funding provided
by FCT project EXPL/EEISII/1861/2013, by CONACYT Basic Science
Research Project No. 178323, and by DGEST (Mexico) Research
Project 5414.14-P. I am particularly thankful for all the support from
the ACOBSEC project (FP7-PEOPLE-2013-IRSES), financed by the
European Commission under contract No. 612689.
Contents
Resumen i
Dedication v
Acknowledgements vii
Contents ix
List of Figures xi
List of Tables xiii
1. introduction 1
1.1. Evolutionary Algorithms . . . . . . . . . . . . . . . . . . . 3
1.2. GP open issues . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3. Mechanisms to guide the search . . . . . . . . . . . . . . . 4
1.4. Original Contributions . . . . . . . . . . . . . . . . . . . . 6
1.5. Outline of the Dissertation . . . . . . . . . . . . . . . . . . 7
1.6. Summary of publications . . . . . . . . . . . . . . . . . . . 8
2. genetic programming 11
2.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2. Genetic Programming Overview . . . . . . . . . . . . . . . 13
2.3. Nuts and Bolts of GP . . . . . . . . . . . . . . . . . . . . . 14
2.4. Search Spaces in GP . . . . . . . . . . . . . . . . . . . . . . 15
2.5. GP Case Study: Disparity Map Estimation for Stereo Vision 21
2.5.1. Experimental Configuration and Results . . . . . . 22
2.6. Chapter Conclusions . . . . . . . . . . . . . . . . . . . . . 25
3. deception 31
3.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2. Deception in the Artificial Evolution . . . . . . . . . . . . 32
3.3. Deception in Evolution . . . . . . . . . . . . . . . . . . . . 33
3.4. Deceptive Problems . . . . . . . . . . . . . . . . . . . . . . 34
3.4.1. Trap Functions . . . . . . . . . . . . . . . . . . . . . 34
3.4.2. Scalable Fitness Function . . . . . . . . . . . . . . . 35
3.4.3. Deceptive Navigation . . . . . . . . . . . . . . . . . 36
3.5. Deceptive Classification Problem Design . . . . . . . . . . 36
3.5.1. Synthetic Classification Problem . . . . . . . . . . 37
3.5.2. Linear Classifier . . . . . . . . . . . . . . . . . . . . 38
3.5.3. Objective Function . . . . . . . . . . . . . . . . . . 40
3.5.4. Non-Deceptive Fitness Landscape . . . . . . . . . . 40
3.5.5. Local Optima and Global Optimum . . . . . . . . 42
3.5.6. Deceptive Objective Function . . . . . . . . . . . . 42
3.5.7. Deceptive Fitness Landscape . . . . . . . . . . . . . 43
3.5.8. Preliminary Results . . . . . . . . . . . . . . . . . . 45
3.6. Chapter Conclusions . . . . . . . . . . . . . . . . . . . . . 46
4. novelty search 47
4.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2. Open-ended Search . . . . . . . . . . . . . . . . . . . . . . 49
4.3. Nuts & Bolts of NS . . . . . . . . . . . . . . . . . . . . . . . 50
4.3.1. Behavioral representation . . . . . . . . . . . . . . 51
4.3.2. NS algorithm . . . . . . . . . . . . . . . . . . . . . 52
4.3.3. Underlying evolutionary algorithm . . . . . . . . . 53
4.3.4. Minimal Criteria Novelty Search . . . . . . . . . . 54
4.4. Contributions on NS . . . . . . . . . . . . . . . . . . . . . 54
4.4.1. MCNSbsf . . . . . . . . . . . . . . . . . . . . . . . . 55
4.4.2. Probabilistic NS . . . . . . . . . . . . . . . . . . . . 56
4.5. Chapter Conclusions . . . . . . . . . . . . . . . . . . . . . 59
5. ns case study: automatic circuit synthesis 61
5.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2. GA Synthesis and CFs Representation . . . . . . . . . . . 63
5.2.1. Objective Function . . . . . . . . . . . . . . . . . . 64
5.2.2. CFs Representation for a GA . . . . . . . . . . . . . 65
5.2.3. CF synthesis with GA-NS . . . . . . . . . . . . . . . 66
5.3. Results and Analysis . . . . . . . . . . . . . . . . . . . . . 66
5.3.1. CF Topologies . . . . . . . . . . . . . . . . . . . . . 66
5.3.2. Comparison between GA-OS and GA-NS . . . . . . 67
5.4. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6. generalization of ns-based gp controllers for evolutionary robotics 75
6.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.2. Background . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.2.1. Grammatical Evolution . . . . . . . . . . . . . . . . 80
6.2.2. Generalization in Genetic Programming . . . . . . 81
6.3. Experimental Setup . . . . . . . . . . . . . . . . . . . . . . 84
6.3.1. Navigation Task and Environment . . . . . . . . . 85
6.3.2. Training and Testing Set Size . . . . . . . . . . . . 86
6.3.3. Training Set Selection . . . . . . . . . . . . . . . . . 86
6.3.4. Objective Functions . . . . . . . . . . . . . . . . . . 87
6.3.5. Search Algorithms . . . . . . . . . . . . . . . . . . 88
6.3.6. Limit for Allowed Moves . . . . . . . . . . . . . . . 89
6.4. Results and Analysis . . . . . . . . . . . . . . . . . . . . . 90
6.4.1. Results for Randomly Chosen Training Sets . . . . 90
6.4.2. Comparison with a Manually Selected Training Set 92
6.4.3. Statistical Analysis . . . . . . . . . . . . . . . . . . 96
6.5. Heat Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.6. Chapter Conclusions . . . . . . . . . . . . . . . . . . . . . 99
7. gp based on ns for regression 107
7.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 107
7.1.1. Behavioral Descriptor . . . . . . . . . . . . . . . . . 108
7.1.2. Pictorial Example . . . . . . . . . . . . . . . . . . . 108
7.1.3. NS Modifications . . . . . . . . . . . . . . . . . . . 109
7.2. Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.2.1. Test Problems and Parametrization . . . . . . . . . 111
7.2.2. Parameter Settings . . . . . . . . . . . . . . . . . . 112
7.2.3. Experimental Results . . . . . . . . . . . . . . . . . 112
7.3. Chapter Conclusions . . . . . . . . . . . . . . . . . . . . . 113
8. gp based on ns for clustering 117
8.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.2. Clustering with Novelty Search . . . . . . . . . . . . . . . 118
8.2.1. K-means . . . . . . . . . . . . . . . . . . . . . . . . 119
8.2.2. Fuzzy C-means . . . . . . . . . . . . . . . . . . . . 119
8.2.3. Cluster Descriptor (CD) . . . . . . . . . . . . . . . 120
8.2.4. Cluster Distance Ratio (CDR) . . . . . . . . . . . . 121
8.3. Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.3.1. Test Problems . . . . . . . . . . . . . . . . . . . . . 123
8.3.2. Configuration and Parameter Settings . . . . . . . 123
8.3.3. Results . . . . . . . . . . . . . . . . . . . . . . . . . 124
8.4. Chapter Conclusions . . . . . . . . . . . . . . . . . . . . . 125
9. gp based on ns for classification 133
9.1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.1.1. Accuracy Descriptor (AD) . . . . . . . . . . . . . . 134
9.1.2. Binary Classifier: Static Range Selection . . . . . . 136
9.1.3. Multiclass Classifier: M3GP . . . . . . . . . . . . . 136
9.1.4. Preliminary Results . . . . . . . . . . . . . . . . . . 137
9.1.5. Discussion . . . . . . . . . . . . . . . . . . . . . . . 138
9.2. Real-world classification experiments . . . . . . . . . . . . 141
9.2.1. Results: Binary Classification . . . . . . . . . . . . 142
9.2.2. Results: Multiclass Classification . . . . . . . . . . 147
9.2.3. Results: Analysis . . . . . . . . . . . . . . . . . . . 150
9.3. Chapter Conclusions . . . . . . . . . . . . . . . . . . . . . 152
10. conclusions & future directions 157
10.1. Summary and Conclusions . . . . . . . . . . . . . . . . . . 158
10.2. Open Issues in GP . . . . . . . . . . . . . . . . . . . . . . . 158
10.3. Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 159
Bibliography 163
List of Figures
4.1. Two examples of evolutionary robotics applying NS. 49
4.2. Objective-based versus novelty-based search in a navigation task. 50
4.3. Three scenarios for an individual's behavior. 53
4.4. Illustration of the effect of the proposed MCNS approach. 55
4.5. Representation of the PNS novelty measure, where each column represents one feature βi of the AD vector and each row is a different generation. In each column, two graphics are presented: on the left is the frequency of individuals with either a 1 or 0 for that particular feature in the current population, and on the right the cumulative frequency over all generations. 58
5.1. CF representation using (a) nullors and (b) MOSFETs. 65
5.2. Three CF topologies for the synthesis of a CF with a topology size of one MOSFET. 69
5.3. Five CF topologies for the synthesis of a CF with a topology size of two MOSFETs, found by GA-NS. 70
5.4. Four CF topologies for the synthesis of a CF with a topology size of three MOSFETs, found by GA-NS. 71
5.5. Convergence plots for GA-NS and GA-OS for the two-MOSFET CFs, showing the performance (objective function value) of the best solution found over the initial generations of the search. The lines represent the average over 10 runs of the algorithms. 72
5.6. Histograms showing the average composition of the chromosome of the best solution found by each algorithm after 10 generations, for the single-MOSFET circuits. 72
5.7. Histograms showing the average composition of the chromosome of the best solution found by each algorithm after 10 generations, for the two-MOSFET circuits. 73
5.8. Histograms showing the average composition of the chromosome of the best solution found by each algorithm after 10 generations, for the three-MOSFET circuits. 73
6.1. Example of a GE genotype-phenotype mapping process. 79
6.2. Learning environment & test set. 85
6.3. Performance comparison of objective-, novelty- and random-based search. 89
6.4. Box plot comparison of the best solution found using the training set. 93
6.5. Box plot comparison of the best solution found using the test set. 94
6.6. Box plot comparison of the percentage of testing hits. 97
6.7. Box plot comparison of the best solution found using the training set. 98
6.8. Box plot comparison of the best solution found using the test set. 99
6.9. Box plot comparison of the percentage of testing hits. 100
6.10. Comparison of the performance on the test set based on percentage of hits. 101
6.11. Overfit heat maps; at the right of each map is a color scale that goes from low values of overfitting in blue to higher values in red. 103
6.12. Overfit binary maps, obtained from the binarization of the overfit heat maps by a color threshold that divides the map into easy and difficult regions. 103
6.13. Easy-difficult training set & test set. 104
7.1. Graphical depiction of how the descriptor is constructed. Notice how the specified interval determines the values of each βi. 109
7.2. Benchmark regression problems showing the ground-truth function, taken from (Uy et al., 2011a). 112
7.3. Boxplot analysis of NS-GP-R performance on each benchmark problem, showing the best test-set error in each run. 114
8.1. Fitness landscape in behavioral space for the CD descriptor. 121
8.2. Five synthetic clustering problems; the observed clusters represent the ground-truth data. 123
8.3. Comparison of clustering performance on Problem No. 1. 126
8.4. Comparison of clustering performance on Problem No. 2. 127
8.5. Comparison of clustering performance on Problem No. 3. 128
8.6. Comparison of clustering performance on Problem No. 4. 129
8.7. Comparison of clustering performance on Problem No. 5. 130
8.8. Evolution of sparseness for NS-GP-20, showing the average sparseness of the best individual at each generation. 131
8.9. Evolution of sparseness for NS-GP-40, showing the average sparseness of the best individual at each generation. 131
8.10. Evolution of the best solutions found at progressive generations for Problem 5 with NS-GP-40. 132
9.1. Graphical depiction of the Accuracy Descriptor (AD). 135
9.2. Graphical depiction of SRS-GPC. 136
9.3. Five synthetic 2-class problems. 138
9.4. Evolution of the average size of individuals at each generation. 140
9.5. Convergence of the classification error. 144
9.6. Evolution of the average size of the population. 145
9.7. Classification error on the test data. 148
9.8. Convergence of training and testing error for multiclass problems. 150
9.9. Plot showing the percentage of rejected individuals. 151
9.10. Archive size & relative speed-up. 152
9.11. Relative ranking of NS, PNS and OS on the IM-3 problem. 155
9.12. Relative ranking of NS, PNS and OS on the SEG problem. 156
List of Tables
7.4. Comparison of NS-GP-R with two control methods reported in (Uy et al., 2011a): SSC and SGP. Values are the mean error computed over all runs. 114
8.1. Parameters for the GP-based search. 124
8.2. Average classification error and standard error. 125
9.1. Parameters for the GP systems. 139
9.2. Average and standard deviation of the classification error on the test data. 139
9.3. Average program size at the final generation for each algorithm. 140
9.4. Real-world and synthetic datasets for binary and multiclass classification problems. 141
9.5. Binary classification performance for all MCNSbest variants. 143
9.6. Binary classification performance on the test data. 146
9.7. Resulting p-values of the Friedman test with Bonferroni-Dunn correction. 147
9.8. Multiclass classification performance on the test data. 149
9.9. Resulting p-values of the Friedman test with Bonferroni-Dunn correction, for the multiclass problems. 149
1. Introduction
Many of the problems that concern human beings in improving their
quality of life can be posed as optimization problems. However, in
order to do so, we need to be able to describe the system involved
in the problem. With enough knowledge about the system we can derive
a useful abstraction and describe it with a model, identifying the
variables and constraints that are relevant to improving some objective.
Often, for an engineering problem, the objective is to choose the design
parameters that maximize benefits or minimize costs without violating
the constraints; in other words, to find the best feasible solution.
Therefore, solutions must be scored according to the quality they show
when solving the problem. This is typically measured through what is
known as an objective function, or reward function, which precisely
rewards those solutions that are closer to the objective. Depending on
the problem domain, solutions can be encoded with real-valued or discrete
variables; problems of the latter kind are referred to as combinatorial
optimization (CO) problems.
Examples of CO problems are the Travelling Salesman Problem
(TSP), the Quadratic Assignment Problem (QAP), scheduling problems,
and many others. Due to the practical importance of CO problems,
many algorithms to tackle them have been developed. These algorithms
can be classified as either complete or approximate algorithms.
Complete algorithms are guaranteed to find an optimal solution in
bounded time for every finite-size instance of a CO problem, whereas
approximate algorithms, or metaheuristics, relax this guarantee.
Therefore, complete methods might need exponential computation
time in the worst case. This often leads to computation times too high
1.1 evolutionary algorithms
1.3 mechanisms to guide the search
2 More information about NS can be obtained at The Novelty Search Users Page, http://eplex.cs.ucf.edu/noveltysearch/userspage/
1.5 outline of the dissertation
problem, in this case disparity map estimation for stereo computer
vision.
Chapter 3 introduces the notion of deception, which is frequently
present in real-world problems and closely related to problem
difficulty, giving some examples of deceptive functions and deceptive
navigation tasks. Furthermore, a method to design synthetic
classification problems with a deceptive fitness landscape is presented
(Naredo et al., 2015).
Chapter 4 presents the NS algorithm (Lehman and Stanley, 2011a)
and its unique approach to search, inspired by natural evolution's
open-ended property. We highlight that traditional evolutionary search
is driven by a given objective, whereas NS drives the search by
rewarding the most different solutions. Though it may sound strange
and counterintuitive, NS has shown that in some problems ignoring
the objective can outperform traditional search. The reason for
this phenomenon is that sometimes the intermediate steps to the goal
do not resemble the goal itself (Stanley and Lehman, 2015).
Furthermore, related work is presented, analyzing different previous
versions of NS. Two new versions of NS are proposed: MCNSbsf
and PNS (Naredo et al., 2016b). The first is an extension of the
progressive minimal criteria NS (PMCNS) (Gomes et al., 2012). The
second is a probabilistic approach to compute novelty; this last
proposal has the advantage of eliminating all of the underlying NS
parameters, while at the same time reducing the computational overhead
of the original NS algorithm and achieving the same level of
performance.
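Common to all of these NS variants is the sparseness measure of Lehman and Stanley, in which an individual's novelty is its average behavioral distance to its k nearest neighbors in the current population and the archive. The following is a minimal sketch of this measure; the function name and the toy 2-D behavior space are illustrative assumptions, not taken from this thesis:

```python
import numpy as np

def sparseness(behavior, population_behaviors, archive, k=3):
    """Novelty score: mean Euclidean distance from `behavior` to its k
    nearest neighbors among the current population and the archive."""
    pool = np.array(population_behaviors + archive, dtype=float)
    dists = np.linalg.norm(pool - np.array(behavior, dtype=float), axis=1)
    dists.sort()
    return float(np.mean(dists[:k]))

# Toy 2-D behavior space: three similar behaviors and one outlier.
pop = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]]
scores = [sparseness(b, [p for p in pop if p is not b], [], k=2)
          for b in pop]
```

In this toy population the behavioral outlier receives the highest sparseness score, so a novelty-driven search would favor it regardless of any objective value.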
Chapter 5 presents a first and original case study applying NS
within a GA-based search to synthesize topologies of current follower
(CF) circuits. Experimental results show twelve CF topologies
generated by GA-NS, and their main attributes are summarized. This work
confirms that NS can be used as a promising alternative in the field of
automatic circuit synthesis (Naredo et al., 2016a).
Chapter 6 studies the problem of generalization in evolutionary
robotics, using a navigation task for an autonomous agent in a 2D envi-
ronment. The learning system used is a GE system based on NS, which
1.6 summary of publications
conference papers:
1. Naredo, E., Dunn, E., and Trujillo, L. (2013a). Disparity map estimation by combining cost volume measures using genetic programming. In Schütze, O., Coello Coello, C. A., Tantar, A.-A., Tantar, E., Bouvry, P., Del Moral, P., and Legrand, P., editors, EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation II, volume 175 of Advances in Intelligent Systems and Computing, pages 71–86. Springer Berlin Heidelberg.
10. Naredo, E., Trujillo, L., Fernández De Vega, F., Silva, S., and Legrand, P. (2015). Diseñando problemas sintéticos de clasificación con superficie de aptitud deceptiva. In X Congreso Español de Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB 2015), Mérida, España.
2. Martínez, Y., Naredo, E., Trujillo, L., Legrand, P., and López, U. (2016). A comparison of fitness-case sampling methods for genetic programming. Submitted to: Journal of Experimental & Theoretical Artificial Intelligence; currently addressing the reviewers' comments.
3. Naredo, E., Trujillo, L., Legrand, P., Silva, S., and Muñoz, L. (2016b). Evolving genetic programming classifiers with novelty search. To appear in: Information Sciences.
2. Genetic Programming
abstract — Genetic programming (GP) is a powerful tool that
is widely used to solve interesting and complex real-world problems.
This chapter gives an overview of GP, focusing mainly on the
evaluation process, and particularly on the tree representation. We
highlight the difference between the fitness and objective functions, which
is relevant to this research work. Furthermore, we address the different
search spaces that are concurrently sampled during the search
process. Moreover, we delve deeper into the concept of behavior as a
way to observe the performance of a GP program, proposing a variable
scale that contains all the different behaviors, ranging from a low-level
to a higher-level description of the performance observed for each solution.
2.1 Introduction
(Koza et al., 2000, 2008; Koza, 2010). For this reason GP has become
shorthand for the generation of programs, code, algorithms and struc-
tures (O’Neill et al., 2010). GP can genetically breed computer pro-
grams capable of solving, or approximately solving, a wide variety of
problems from a wide variety of fields (Koza, 1992a).
GP is a domain-independent evolutionary method intended to solve
problems without requiring the user to know or specify the form or
structure of the solution in advance (Poli et al., 2008b). GP is inspired
by natural genetic operations, applying similar operations to computer
programs, such as crossover (sexual recombination), mutation and
reproduction. Computer programs can be represented in several ways,
but since trees can be easily evaluated in a recursive manner, this is the
traditional representation used in the literature (Koza, 1992a).
GP trees are composed of two different types of nodes, known as
functions and terminals. Functions are internal tree nodes that
represent the primitive operations used to construct more complex
programs, such as standard arithmetic operations, programming
structures, mathematical functions, logical functions, or domain-specific
functions (Koza, 1992a). Terminal nodes are at the leaves of the GP
trees and usually correspond to the independent variables of the
problem, zero-arity functions or random constants.
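To make the function/terminal distinction concrete, the sketch below evaluates a GP tree recursively. The nested-tuple encoding and the small primitive set are illustrative assumptions, not the representation used later in this thesis:

```python
import math
import operator

# Internal nodes (functions) are tuples: (name, child_1, ..., child_n);
# leaves (terminals) are variable names or numeric constants.
FUNCTIONS = {'+': operator.add, '-': operator.sub, '*': operator.mul,
             'sin': math.sin, 'cos': math.cos}

def evaluate(node, env):
    """Recursively evaluate a GP tree given the variable bindings in `env`."""
    if isinstance(node, tuple):                      # function node
        return FUNCTIONS[node[0]](*(evaluate(c, env) for c in node[1:]))
    if isinstance(node, str):                        # terminal: variable
        return env[node]
    return node                                      # terminal: constant

# K(x) = (x + x) * ((1 + 1) * x), i.e. 4x^2.
K = ('*', ('+', 'x', 'x'), ('*', ('+', 1, 1), 'x'))
```

For instance, `evaluate(K, {'x': 3})` computes (3 + 3) * ((1 + 1) * 3).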
There are other GP versions that use different representations, such
as linear genetic programming (Brameier and Banzhaf, 2010), cartesian
GP (Peter, 2000), MicroGP1 or in short µGP (Squillero, 2005), and some
authors have even developed special languages for GP-based evolution
such as Push, which is used to implement the PushGP system (Spector
and Robinson, 2002).
Moreover, GP is a very flexible technique, a characteristic that has
allowed researchers to apply it in various fields and problem domains
(Koza et al., 2000, 2008; Koza, 2010). In computer vision, for instance,
GP has been used for object recognition (Howard et al., 1999; Ebner,
2009; Hernández et al., 2007), image classification (Krawiec, 2002; Tan
1 More information about MicroGP can be found on the following web pages: http://www.cad.polito.it/research/Evolutionary_Computation/MicroGP.htm and http://ugp3.sourceforge.net/
et al., 2005), feature synthesis (Krawiec and Bhanu, 2005; Puente et al.,
2011), image segmentation (Poli, 1996; Song and Ciesielski, 2008), fea-
ture detection (Trujillo et al., 2008c, 2010; Olague and Trujillo, 2011)
and local image description (Pérez and Olague, 2008; Perez and Olague,
2009). GP has been successfully applied to computer vision problems
since it is easy to define concrete performance criteria and the
development or acquisition of datasets is now straightforward. However,
most problems in computer vision remain open, and the most successful
proposals rely on state-of-the-art machine learning and computational
intelligence techniques.
2.4 search spaces in gp
[Figure: Two example GP trees, K1 and K2, shown in prefix and infix notation.]
GP program K1:
  Prefix notation: K1(x) = times(plus(x, x), times(plus(1, 1), x)).
  Infix notation: K1(x) = (x + x) ∗ ((1 + 1) ∗ x).
GP program K2:
  Prefix notation: K2(x) = plus(power(sin(x), 2), power(cos(x), 2)).
  Infix notation: K2(x) = sin²(x) + cos²(x).
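The prefix notation above maps directly onto nested function calls. The following sketch (the helper names `plus` and `power` are ours) evaluates K2 and confirms that, despite its non-trivial syntax, its semantics collapse to the constant function 1 by the identity sin²x + cos²x = 1:

```python
import math

# Prefix notation written as nested calls (helper names are illustrative).
def plus(a, b):
    return a + b

def power(a, b):
    return a ** b

def K2(x):
    # K2(x) = plus(power(sin(x), 2), power(cos(x), 2)) = sin^2(x) + cos^2(x)
    return plus(power(math.sin(x), 2), power(math.cos(x), 2))
```

Two syntactically different trees can thus share identical semantics, which is precisely why syntactic and semantic search spaces must be distinguished.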
While these spaces have been the focus of most GP research, other
spaces have recently been used to develop new GP-based algorithms.
For instance, the output vector produced by a computer program K has
recently been named in the GP literature as its semantics (Moraglio et al., 2012).
[Figure: Four example GP individuals K1–K4 with fitness values f(K1) = 0.6, f(K2) = 0.4, f(K3) = 0.5 and f(K4) = 0.8, illustrating how subtree crossover recombines two parents into offspring, and how a program maps from its genotype (syntax) through its semantics (the outputs produced for the input stimuli) to a fitness value computed by the objective function against the target.]
account for the observed high-level interactions between the robot and
its environment. In fact, in a broad survey of types of fitness functions
used in ER, Nelson et al. (Nelson et al., 2009) found that much of the
research introduces a priori human knowledge when selecting a fitness
function to solve a given problem. Nelson et al. group the fitness func-
tions into seven classes, called training data, behavioral, functional in-
cremental, tailored, environmental, competitive and aggregate fitness
functions, ordered based on the amount of a priori knowledge incorporated
into the function. The first class of fitness functions coincides
with the most common approach taken in GP, where training data is
used, which incorporates the highest amount of a priori knowledge
about the problem, as depicted in Figure 2.3. However, as Nelson et
al. state, this type of fitness function requires specific knowledge of
what the optimal output should be, something that in many cases is
not feasible or might even be unnecessary, as we argued in the example
of the SRS classifier. Therefore, all other classes of fitness functions in
ER, each to a different extent, incorporate the concept of behavior.
Since the interpretation of the concept of behavior differs across
areas, we adopt the definition given in (Levitis et al., 2009) (p. 108)
from the behavioral biology perspective, which states that a “behavior
is the internally coordinated responses (actions or inactions) of whole
living organisms (individuals or groups) to internal and/or external
stimuli, excluding responses more easily understood as developmental
changes”.
Considering the above definition, in this work we understand the
concept of behavior as a measurable description about the internally
coordinated external responses of a computer program K to internal
and/or external stimuli x within a given environment or context. The
behavior produced by a particular solution K is captured by a domain
dependent descriptor β. In the ER case, context is given by the robot
morphology and parts of the environment that cannot be sensed, while
the inputs are the robot sensors and the outputs of the controller in-
terface directly with the actuators. In this case the robot behavior de-
scriptor can include such quantities as the robot position (Lehman and
Stanley, 2008), the robot velocity (Nelson et al., 2009) or patterns
generated in the robot's path (Trujillo et al., 2008b). Conversely, for the
classification problem described above, the context is provided by the
specified classification rule R, while one way to describe a classifier be-
havior can be related to the accuracy of the classifier on the training
instances. Afterward, the observed behavior β can be used to compute
fitness by a function that considers the objective either explicitly or im-
plicitly.
The explicit approach is the customary way to compute fitness, us-
ing an objective function measuring how close the observed behavior
is to a particular goal. The implicit approach can be to compute fitness
based on the novelty or uniqueness of each behavior, such as in NS.
The concept of behavior is graphically depicted in Figure 2.4, along
with the manner in which behavioral information is included in the
computation of the objective and fitness functions.
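As a concrete illustration of the two options, consider a classifier whose behavior descriptor β is a binary vector marking which training instances it labels correctly, an accuracy-style descriptor similar to the AD used in later chapters. The sketch below is an illustrative assumption (the function names are ours): explicit fitness scores β against the goal of labeling everything correctly, while implicit fitness scores β by its novelty with respect to other behaviors:

```python
import numpy as np

def behavior_descriptor(predictions, labels):
    """Binary behavior vector: 1 where the classifier labels a training
    instance correctly, 0 where it does not."""
    return (np.asarray(predictions) == np.asarray(labels)).astype(int)

def explicit_fitness(beta):
    """Objective-based fitness: training accuracy, i.e. closeness of the
    behavior vector to the goal vector of all ones."""
    return float(np.mean(beta))

def implicit_fitness(beta, other_betas, k=2):
    """Novelty-based fitness: mean Hamming distance from `beta` to its k
    closest behaviors among `other_betas`."""
    dists = sorted(int(np.sum(beta != np.asarray(b))) for b in other_betas)
    return float(np.mean(dists[:k]))
```

The same descriptor β thus supports both fitness computations; only the comparison target changes, from the objective to the rest of the population.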
Figure 2.5: Conceptual view of how the performance of a program can be analyzed. At one extreme we have objective-based analysis, a coarse view of performance. Semantics-based analysis lies at the other extreme, where a high level of detail is sought. Finally, behavior analysis provides a variable scale based on how the problem context is considered. [The figure arranges semantics, behaviors and objectives along an axis running from a fine to a coarse level of detail.]
2.5 gp case study: disparity map estimation for stereo vision
Figure 2.6: Stereo images for experiment tests from Middlebury Stereo
Vision Page.
Parameter               Description
Population size         20 individuals.
Generations             100 generations.
Initialization          Ramped half-and-half, with 6 levels of maximum depth.
Operator probabilities  Crossover pc = 0.8; mutation pµ = 0.2.
Function set            { +, −, ∗, /, √·, sin, cos, log, x², |·| }
Terminal set            { CV_SAD, CV_NCC, CV_BTO }, where CV is the cost volume from the SAD, NCC and BTO functions, respectively.
Bloat control           Dynamic depth control.
Initial dynamic depth   6 levels.
Hard maximum depth      20 levels.
Selection               Lexicographic parsimony tournament.
Survival                Keep-best elitism.
rows N and the number of columns M; in particular, for the experiments the neighborhood size used is 3 × 3.
Fitness is given by the cost function shown in Equation 2.1, which
assigns a cost value to every individual expression K proposed by GP
as a feasible solution. The goal is to minimize the error computed between
the disparity map of every expression K and the ground truth, therefore
f^{S}(K) = \frac{1}{NM} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left| d^{S}_{CV_m}(i,j) - d^{S}_{GT}(i,j) \right| , \quad (2.1)
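Assuming the disparity maps are stored as N × M arrays, the cost function of Equation 2.1 can be sketched directly (the function name is ours):

```python
import numpy as np

def disparity_fitness(d_cv, d_gt):
    """Equation 2.1: mean absolute difference between the disparity map
    of a candidate GP expression and the ground-truth disparity map."""
    d_cv = np.asarray(d_cv, dtype=float)
    d_gt = np.asarray(d_gt, dtype=float)
    n, m = d_gt.shape                      # N rows, M columns
    return float(np.sum(np.abs(d_cv - d_gt)) / (n * m))
```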
[Figure 2.7: Convergence plot for the best run, showing the cost function value (approximately 1.39 down to 1.32) over the number of generations, together with the best evolved GP tree, which combines the CVsad, CVncc and CVbto terminals through sin and plus nodes.]
BP = \frac{1}{N} \sum_{i=0}^{M-1} \sum_{j=0}^{N-1} \left( \left| d^{s}_{CV_m}(i,j) - d^{s}_{GT}(i,j) \right| > \delta_d \right) , \quad (2.2)
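Equation 2.2 counts "bad" pixels whose absolute disparity error exceeds the threshold δd. A sketch follows (the function name is ours; note that, unlike the 1/N factor in the printed equation, this sketch normalizes by the total number of pixels, the standard bad-pixel percentage):

```python
import numpy as np

def bad_pixels(d_cv, d_gt, delta_d=1.0):
    """Bad-pixel measure in the spirit of Equation 2.2: fraction of pixels
    whose absolute disparity error exceeds the threshold delta_d."""
    err = np.abs(np.asarray(d_cv, dtype=float) - np.asarray(d_gt, dtype=float))
    return float(np.mean(err > delta_d))
```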
Table 2.2: Comparison results for the Barn1 image (bold indicates best
result).
We compare the SAD, NCC, and BTO matching costs against the GP cost
volume. The GP search was executed 20 times, achieving similar
performance in all cases. Figure 2.7 presents the convergence plot for
the best run, which shows how the fitness of the best individual evolved
over each generation, together with the best GP solution found.
Table 2.3: Comparison results for the Barn2 image (bold indicates best
result).
2.6 chapter conclusions
Table 2.4: Comparison results for the Bull image (bold indicates best
result).
Table 2.5: Comparison results for the Poster image (bold indicates best
result).
Table 2.6: Comparison results for the Sawtooth image (bold indicates
best result).
Table 2.7: Comparison results for the Venus image (bold indicates best
result).
3
Deception
3.1 Introduction
deception
building block hypothesis (BBH). In this hypothesis, Goldberg states that GAs
can identify segments (blocks) of the optimal solution contained in the
current solutions. Furthermore, Goldberg hypothesises that GAs use
these blocks to generate new and better solutions by recombining them
or by mutating them, which in the end (mostly) builds up the complete
optimal solution.
There is a wide range of benchmark problems to test the performance
of EAs, ranging from easy to hard problems. Among the efforts to
characterize GA-hard problems, some have focused particularly on the
notion of deception. Since the introduction of GAs, many deceptive
problems have been proposed for binary-coded GAs.
More recently, (Lehman and Stanley, 2008) have proposed several
deceptive robotic navigation tasks. However, to the best of our knowledge,
there are no deceptive benchmark problems for pattern recognition,
and particularly no deceptive classification problems.
This work introduces a first attempt to design a deceptive classi-
fication problem that can be used for benchmarking. The following
sections address the notion of deception.
Finding the factors that affect the performance of GAs in solving
optimization problems has been a major interest in the theoretical
community (Jones and Forrest, 1995). In general, performance is measured
in terms of the ability to find the closest solution to the global optimum
of a given problem. So, according to the BBH, if a GA finds the global
optimum, particularly on problems with a binary representation, it is
because it correctly identified the correct building blocks for the
problem, classifying such problems as GA-easy. On the other hand, a
problem is GA-hard if a GA fails to find its global optimum, which means
it did not identify the correct building blocks.
3.3 deception in evolution
3.4 deceptive problems
where l is the binary string length, a is the global optimum, b is the local
(deceptive) optimum, and h ∈ (0, l) is the location of the slope change,
which divides the region where the local optimum is located from the
region where the global optimum can be found, at the binary string of
length l containing only ones; an example is shown in Figure 3.1.
Figure 3.1: Trap function with the global optimum located at point a
and the local optimum located at point b, where u is the number of ones
in the binary string. By varying the parameters of the function, the
degree of deception can be adjusted.
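A common formulation of the trap function sketched in Figure 3.1 is the following; the parameter values below are illustrative assumptions, not taken from the text:

```python
def trap(u, l=20, h=5, a=1.0, b=0.9):
    """Deceptive trap as a function of u, the number of ones in the string.
    The local (deceptive) optimum b is at u = 0, the global optimum a is at
    u = l, and h marks the slope change. Parameter defaults are illustrative."""
    if u <= h:
        # slope leading toward the deceptive local optimum at u = 0
        return b * (h - u) / h
    # narrower slope leading toward the global optimum at u = l
    return a * (u - h) / (l - h)
```

Shrinking the region `l − h` (or raising `b` toward `a`) increases the degree of deception, since more of the search space then slopes toward the local optimum.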
f_{l,\omega}(x) = \begin{cases}
x^2, & |x| \le 1, \\
1 - \frac{1}{2}\sin^2\!\left(\frac{\pi\omega(|x|-1)}{l}\right), & 1 \le |x| \le l+1, \\
(|x|-l)^2, & |x| \ge l+1,
\end{cases}   (3.5)
with parameters l ≥ 0 and ω ∈ N. This function is symmetric around
the global optimum 0, and is based on the standard parabolic curve,
which is cut between +1 and −1, constructing a plateau of length l.
[Figure: Plot of f_{l,ω}(x), showing the global optimum at 0 flanked by
a region of local optima with similar fitness.]
In the region 1 ≤ |x| ≤ l + 1, where both plateaus are located, the
parameter ω controls the amplitude of the sine function: if ω = 0 there
is a flat region, while increasing ω introduces a region of local optima
generated by the sine waves; this controls the degree of deception of
the function.
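Equation 3.5 can be sketched directly; the default values of l and ω below are illustrative assumptions:

```python
import math

def f(x, l=10.0, omega=3):
    """Deceptive plateau function (Eq. 3.5): a parabola cut at |x| = 1,
    with a sine-modulated plateau of length l on each side; the global
    minimum is at x = 0. Defaults for l and omega are illustrative."""
    ax = abs(x)
    if ax <= 1:
        return ax ** 2
    if ax <= l + 1:
        # omega controls the amplitude of the local-optima ripples
        return 1 - 0.5 * math.sin(math.pi * omega * (ax - 1) / l) ** 2
    return (ax - l) ** 2
```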
3.5 deceptive classification problem design
Figure 3.3: Deceptive navigation task. The figure on the left shows the
optimal path to the objective. The figure on the right shows how the
objective acts as an attractor, rewarding the solutions that are closer
to it, even though they are in fact becoming trapped in a local optima
region.
the local optimum. If we observe the trap function shown in Figure 3.1,
we can note that the smaller the region where the global optimum is
located, the more difficult it is to reach that region and to find the
global optimum.
Clearly, a GA is more likely to fall into the larger region and become
trapped in the local optimum. Even if some solution falls in the smaller
region, where the gradient can push it all the way to the global optimum,
there may be a set of counterparts located in the larger region with
higher fitness, and therefore a higher probability of being chosen, such
that the solution on the right track can be lost. Following this
reasoning, we can now attempt to generate a similar fitness landscape
considering a special synthetic dataset, linear classifiers, and an
objective function that measures the deceptive nature of the problem.
for Figure 3.3 we generated p data points that fall within circle-shaped
clusters.
g(x) = w^T x + w_0 = 0   (3.6)
work with hyperplanes in the R^{l+1} space, which pass through the
origin. This is only for notational simplification. Once a w' is
estimated, an x is classified to class ω_1 if

w'^T x' = w^T x + w_0 ≥ 0   (3.8)

or to class ω_2 if

w'^T x' = w^T x + w_0 < 0   (3.9)

for the 2-class classification task. In other words, this classifier
generates a hyperplane decision surface; points lying on one side of it
are classified to ω_1, and points lying on the other side are classified
to ω_2. For notational simplicity, we drop the prime and adhere to the
notation w, x; the vectors are assumed to be augmented with w_0 and 1,
respectively, and they reside in the R^{l+1} space.
Acc = \frac{TP + TN}{TP + FP + TN + FN}   (3.10)
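The decision rule of Equations 3.6-3.9 and the accuracy of Equation 3.10 can be sketched as follows; the function names and the ±1 label encoding are our own assumptions:

```python
import numpy as np

def classify(w, w0, X):
    """Linear decision rule (Eqs. 3.6-3.9): class omega_1 (+1) if
    w^T x + w0 >= 0, class omega_2 (-1) otherwise."""
    scores = np.asarray(X, dtype=float) @ np.asarray(w, dtype=float) + w0
    return np.where(scores >= 0, 1, -1)

def accuracy(y_pred, y_true):
    """Classification accuracy (Eq. 3.10): (TP + TN) / (TP + FP + TN + FN),
    i.e. the fraction of correctly labeled points."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return float((y_pred == y_true).mean())
```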
(a) Search space (b) Fitness landscape
Figure 3.5: The left figure shows the search space showing the sectors
Sg and Sl , where the global optimum and local optima are located re-
spectively. The right figure shows the 2D fitness landscape.
where a blue dotted line shows the performance for the half section,
approximately from 180° to 170°.
Rotating the drawn line in the same direction, approximately from
170° to 155°, the model starts misclassifying data from the majority
sub-dataset Ca1, decreasing the classification rate from 100% to 50%,
as shown in Figure 3.4 (b).
From approximately 155° to 140°, the linear model starts recovering
its classification performance, reaching in this section a maximum of
90%, given by the imbalance factor p = 0.90 for this dataset
configuration.
The next circle sector that our rotating linear model meets is the
sector Sl, which contains the local optima region and generates a plateau
in the fitness landscape from approximately 140° to 15°, as shown in
Figure 3.4 (b); this can be seen as a region of neutrality.
Finally, the linear model increases its classification rate when it
meets the minority sub-dataset Cb1, until it reaches perfect accuracy on
meeting the next sector Sg. Since the geometry of the clusters is
symmetric, the left half of the circle-shaped search space shows behavior
similar to that described previously, mirroring the fitness landscape
from 0°.
ure 3.3 is to easily transmit the general idea, but there could be other
choices, for instance to consider more than two clusters per class with
different amounts of imbalance.
Fitness = \begin{cases}
Acc \cdot \frac{d_{min}}{\Delta_a}, & \Delta_a > 0, \; Acc \ge h, \\
Acc \cdot \frac{d_{min}}{\Delta_b}, & \Delta_b > 2\sigma_{d_1} + 2\sigma_{d_2}, \; \text{otherwise}
\end{cases}   (3.11)
[Figure: Three dataset configurations, each showing the Class-1 and
Class-2 data points (top) and the corresponding fitness landscape over
the rotation in degrees (bottom).]
3.6 chapter conclusions
The next step after generating the dataset configuration and the
deceptive fitness function is to test a set of synthetic classification
problems using some standard classification methods. In this case, the
methods selected are the naive Bayes method and the Support Vector
Machine (SVM). The first method, which is based on the data distribution,
is led to the local optimum with 50% performance on all the datasets
tested, while the SVM achieves a perfect score of 100%, finding the
global optimum, because this method is mainly concerned with the optimal
separation of the data.
4
Novelty Search
Instead of aiming for the objective, novelty search looks for novelty;
surprisingly, sometimes not looking for the goal in this way leads to find-
ing the goal more quickly and consistently. While it may sound strange,
in some problems ignoring the goal outperforms looking for it. The
reason for this phenomenon is that sometimes the intermediate steps
to the goal do not resemble the goal itself. John Stuart Mill termed this
source of confusion the “like-causes-like” fallacy. In such situations,
rewarding resemblance to the goal does not respect the intermediate
steps that lead to the goal, often causing search to fail.
novelty search
4.1 Introduction
Novelty search (NS) was born from a radical idea about artificial
intelligence (AI) proposed by Lehman and Stanley (2008), based on
previous evolutionary art experiments using their Picbreeder system
(Stanley, 2007). Genetic art was first introduced by Richard Dawkins
(Dawkins, 1986) and is currently known as evolutionary art (Romero and
Machado, 2007). Picbreeder is a webpage where visitors can use their
creativity to breed pictures, which can have “children” slightly
different from their parents, eventually producing new pictures with
striking designs.
The radical idea borrowed from Picbreeder to develop NS is that
solutions can be found without really looking for them. The evolutionary
art experiment is just one example of a non-objective system of
discovery (Stanley and Lehman, 2015). Furthermore, it was noticed in the
experiment that webpage visitors frequently picked the available
pictures as parents, according to an interestingness criterion, to breed
children from them. Pictures that were the most different, or most
novel, among all other pictures had better chances of being selected for
survival and reproduction. In other words, novelty is a rough shortcut
for identifying interestingness (Stanley and Lehman, 2015).
Provided with this insight, the next step was to incorporate these
ideas into the design of a search algorithm. But a first obstacle on the
road to designing this algorithm, and then endorsing it, is the
counterintuitive idea that a computer algorithm without an objective can
work properly, since almost every algorithm ever designed does have an
objective.
When solving a problem of relatively low complexity, the objective is
a good way to drive the search toward the desired solution. But when the
problem at hand shows an increasing degree of complexity, it is not so
easy to find the desired (or an approximate) solution. Particularly, in
this context, following an objective could guide the search toward
solutions that seem to be good but are in fact far away from the desired
solution. This is the deceptive phenomenon we
4.2 open-ended search
however, they have mostly been used in Artificial Life (Ofria and Wilke,
2004) and interactive search (Kowaliw et al., 2012). Only recently has
open-ended search been proposed to solve mainstream problems; one
promising algorithm is Novelty Search (NS), proposed by Lehman and
Stanley (2008).
4.3 nuts & bolts of ns
irrelevant for solving the problem. On the other hand, a too simple be-
haviour characterisation might be insufficient for accurately estimating
the novelty of each individual, and can prevent the evolution of some
types of solutions.
It is important to note that the detail of the behaviour characterisa-
tion is not necessarily correlated with the length of the behaviour vec-
tor. In the maze navigation experiments (Lehman and Stanley, 2011a),
the authors expanded the behaviour characterisation to include inter-
mediate points along the path of an individual through the maze, in-
stead of just the final position. The authors experimented with differ-
ent sampling frequencies, resulting in behaviour characterisations of
different lengths, and the results showed that the performance of the
evolution was largely unaffected by the length of the behaviour char-
acterisation. Although a longer characterisation increased the dimen-
sionality of the behaviour space, only a small portion of this space was
reachable since adjacent points in a given path were highly correlated
(i.e. the agent can only move so far in the interval between samples). It
was demonstrated that larger behaviour descriptions do not necessarily
imply a less effective search, despite having a larger behaviour space.
4.3.2 NS algorithm
\rho_{MCNS}(\beta_j) = \begin{cases}
\rho(\beta_j), & \text{if the MC are satisfied} \\
0, & \text{otherwise.}
\end{cases}   (4.13)
4.4 Contributions on NS
NS suffers from several shortcomings, which are the main topic of the
proposals developed in this section. In particular, computing novelty
using Equation 4.11 can lead to several problems. First, it is not
evident which value of k will provide the best performance. Second, the
sparseness computation based on Equation 4.11 has a complexity of
O((m + q)²), where m is the size of the population and q is the archive
size, which will grow unbounded if it is not implemented as a FIFO queue
(Lehman and Stanley, 2008, 2010b,c, 2011a). Third, choosing which
individuals should be stored in the archive is also important; several
approaches have been proposed, and each adds an additional empirical
parameter or decision rule.
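The O((m + q)²) sparseness computation discussed above can be sketched as a brute-force average distance to the k nearest neighbors among the population plus the archive; the function name and the use of Euclidean distance are our assumptions:

```python
import numpy as np

def sparseness(behaviors, k):
    """NS sparseness sketch: for each behavior vector, the mean Euclidean
    distance to its k nearest neighbors among the current population plus
    the archive. `behaviors` is an (m + q, n) array; the full pairwise
    distance matrix makes the O((m + q)^2) cost explicit."""
    B = np.asarray(behaviors, dtype=float)
    D = np.linalg.norm(B[:, None, :] - B[None, :, :], axis=-1)  # pairwise
    D.sort(axis=1)
    # column 0 is the zero self-distance, so average columns 1..k
    return D[:, 1:k + 1].mean(axis=1)
```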
In this section we present two proposals to compute novelty. The
first is an extension of the progressive minimal criteria NS
(Gomes et al., 2012), named MCNSbsf, which considers a dynamic
threshold based on the best-so-far (bsf) solution. The second is a
probabilistic approach to compute novelty, named probabilistic NS (PNS),
which eliminates all of the underlying NS parameters and at the same
time reduces the computational overhead of the original NS algorithm.
4.4.1 MCNSbsf
[Figure: The MCNSbsf criterion for a minimization problem. Solutions
whose quality measure is within 15% of the best-so-far (bsf) solution
satisfy the MCNS and have their novelty computed; solutions more than
15% worse than the bsf do not satisfy the MCNS and are assigned
novelty = 0.]
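The MCNSbsf criterion can be sketched as follows for a minimization problem; the 15% tolerance and the threshold form are illustrative assumptions based on the description above, and the function name is ours:

```python
def mcns_bsf_novelty(quality, novelty, best_so_far, tol=0.15):
    """MCNSbsf sketch (minimization): an individual's novelty counts only
    if its quality is within `tol` (here 15%) of the best-so-far solution;
    otherwise its novelty is set to 0."""
    threshold = best_so_far * (1 + tol)   # minimization: up to 15% worse
    return novelty if quality <= threshold else 0.0
```

Because the threshold tracks the best-so-far solution, the minimal criterion tightens automatically as the search improves, without a fixed user-supplied bound.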
4.4.2 Probabilistic NS
\phi(\beta) = \frac{1}{P(\beta)},   (4.14)

\phi(\beta_j) = \frac{1}{\prod_{i=1}^{n} P_i(\beta_{j,i})},   (4.15)
P_i^t(\beta_{j,i} = 1) = \frac{\hat{\delta}_i^t}{m(t+1)},   (4.18)

P_i^t(\beta_{j,i} = 0) = 1 - P_i^t(\beta_{j,i} = 1).   (4.19)

\phi_j^t = \frac{1}{\prod_{i=1}^{n} P_i^t(\beta_{j,i})},   (4.20)
Figure 4.5: Representation of the PNS novelty measure, where each col-
umn represents one feature βi of the AD vector and each row is a dif-
ferent generation. In each column, two graphics are presented, on the
left is the frequency of individuals with either a 1 or 0 for that particu-
lar feature in the current population, and on the right the cumulative
frequency over all generations.
\log \phi_j^t = \sum_{i=1}^{n} \log \frac{1}{P_i^t(\beta_{j,i})}.   (4.21)
4.5 chapter conclusions
5
NS Case Study: Automatic Circuit Synthesis
5.1 Introduction
ns case study: automatic circuit synthesis
EDA tools increase productivity in the design of ICs, even for the
circuit blocks that are not repetitive. In particular, analog design
automation is more complex than digital design automation, because the
relationships among the specifications are more complex. Moreover,
analog design requires experience, intuition, and creativity, primarily
because it works with a large number of parameters that usually exhibit
complex interactions.
5.2 ga synthesis and cfs representation
Length(Ch_CF) = 2n + n + 3n + 2 = 6n + 2 .
5.3 results and analysis
Parameter        Description
Population size  20
Max generations  200
Stop criterion   10 generations without the best fitness changing
Selection        Tournament (size = 3)
Crossover        One point
Crossover rate   0.9
Mutation         One point
Mutation rate    0.1

Parameter        Description
k-neighbors      Half of population size
ρth              Half of chromosome size
Archive control  FIFO
Archive size     Double of population size
5.3.1 CF Topologies
the CFs, and because we use the same representation and fitness
function as previous works.
For the second series of experiments, considering two O-P pairs
(two MOSFETs), Table 5.4 and Figure 5.3 present only the best topologies
found by GA-NS. Topologies No. 4979 and 5059 can be considered trivial,
because they can be found manually quite easily. Moreover, topology
No. 36387 reproduces a known circuit (Razavi, 2001), which confirms that
GA-NS is guiding the search toward good solutions in the search space.
Finally, topologies No. 4567 and 6147 are novel and should be considered
new CF designs. In both topologies the gain values are near one, but Zin
and Zout are not ideal; however, these impedances can be exploited for
custom filter designs.
The third set of experiments used three O-P pairs to build circuits
with 3 MOSFETs; these results are summarized in Table 5.5 and Figure
5.4. Topology No. 152195 (Figure 5.4(a)) shows three one-MOSFET CFs in
series, a clear example of a synthesized CF built from smaller CFs.
Figures 5.4 (b) and (c) show more elaborate constructions of CFs found
by GA-NS. Finally, topology No. 252746 in Figure 5.4 (d) shows a novel
structure that has not been presented in any related literature. Such a
design confirms the ability of the NS paradigm to explore the search
space and find unorthodox solutions, even to long-standing and
well-known problems.
It is noteworthy that the current sources presented as ideal in
Figures 5.2, 5.3 and 5.4 are generated by modified Wilson-type current
mirrors. All topologies perform as current followers.
Figure 5.2: Three CF topologies for the synthesis of a CF with the topol-
ogy size of one MOSFET.
(a) No. 4567 (b) No. 4979 (c) No. 5059 (d) No. 6147 (e) No. 36387
Figure 5.3: Five CF topologies for the synthesis of a CF with the topol-
ogy size of two MOSFETs found by GA-NS.
Let us now analyze the effect that the NS algorithm has on the
search process for circuit synthesis, relative to objective-based search.
Figure 5.5 shows convergence plots of the best solution found by each
algorithm, showing the average behavior over all runs. The figure plots
(a) No. 152195 (b) No. 62399
(c) No. 116851 (d) No. 252746
Figure 5.4: Four CF topologies for the synthesis of a CF with the topol-
ogy size of three MOSFETs found by GA-NS.
the objective function value of the best solution with respect to the
number of generations. In particular, we focus on the experiments us-
ing two O-P pairs, for circuits with two MOSFETs. Based on this plot,
we can see no significant difference between the two algorithms: the
quality of the solutions found is comparable and the convergence is
similar, even though GA-NS reaches a better average performance.
However, as stated before, the topologies found by GA-NS do not match
those found by GA-OS. Therefore, in terms of the performance of the
synthesized circuits the two algorithms are more or less equivalent;
the difference lies in the actual topologies found by each
Figure 5.5: Convergence plots for GA-NS and GA-OS for the two MOS-
FET CFs, showing the performance (objective function value) of the
best solution found over the initial generations of the search. The lines
represent the average over 10 runs of the algorithms.
         GA-OS                 GA-NS
Run      M1     M2     M3      M1      M2      M3
R1       11     24     20      108     102     140
R2       11     18     11      109     191     201
R3       16     12     13      201     201     97
R4       15     10     21      36      201     97
R5       10     10     22      50      201     113
R6       6      6      6       193     106     176
R7       11     18     11      126     192     70
R8       16     12     13      135     201     201
R9       15     10     21      136     135     201
R10      10     10     22      72      74      201
Aver.    12.1   13.0   16.0    116.6   160.4   149.7
optima. GA-NS certainly finds similar local optima, but the search does
not stagnate given its ability to promote diversity and explore other re-
gions of the search space, allowing it to find solutions that might be-
come inaccessible to GA-OS.
For simplicity, and given the quick convergence of GA-OS, we can
take a snapshot of the type of solutions found by each algorithm after
the first 10 generations. Figures 5.6 - 5.8 compare the composition of
the best solutions found by both GA-OS and GA-NS. The figures present
frequency histograms, where the height of each bar represents the
percentage of runs for which a particular bit in the chromosome was set
to 1 in the best solution found so far. For instance, if a bar reaches a
value of 0.5, this means that 50% of the best solutions found after 10
generations have a 1 at that particular position within the chromosome.
The plots are divided by experimental configuration, with Figure 5.6
showing the results for the single-MOSFET CFs, Figure 5.7 for two
MOSFETs, and Figure 5.8 for three MOSFETs.
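The frequency histograms described above reduce to a per-bit average over the best chromosomes of all runs; a minimal sketch (the helper name is ours):

```python
import numpy as np

def bit_frequencies(best_solutions):
    """Per-bit frequency histogram as in Figures 5.6-5.8: for each bit
    position, the fraction of runs whose best chromosome has a 1 there.
    `best_solutions` is a (runs, chromosome_length) 0/1 array."""
    return np.asarray(best_solutions, dtype=float).mean(axis=0)
```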
(a) GA-OS (b) GA-NS
These figures nicely illustrate our claim, that GA-NS finds solutions
that are different from those found by GA-OS. Moreover, that the solu-
tions found, while being different, achieve the same performance based
on the objective function. This means that GA-NS explores other areas
of the search space, some of which might contain local (or even global)
optima that are not accessible to the standard GA-OS.
5.4 Conclusions
(a) GA-OS (b) GA-NS
(a) GA-OS (b) GA-NS
search is carried out based on the concept of solution novelty. The ex-
perimental results showed that NS allows the algorithm to explore the
search space in a different way than a standard GA-OS does. While the
6
Generalization of NS-based GP Controllers
for Evolutionary Robotics
6.1 Introduction
(Lehman and Stanley, 2008), and (2) f2 considers both the Euclidean
distance and the length of the agent's trajectory without considering
any repeated position (Georgiou, 2012).
Second, the novelty search (NS) algorithm proposes a different per-
spective to define fitness (Lehman and Stanley, 2008). NS was moti-
vated by the goal of dealing with deceptive fitness-landscapes (Gold-
berg, 1987) in ER, where the objective function tends to guide the
search away from the global optimum. NS replaces the objective func-
tion to compute fitness with a measure that quantifies how unique, or
novel, an evolved solution is with respect to all previously found solu-
tions during the search. However, instead of using genotypic diversity,
a common tool in EAs (Nicoară, 2009; Burke et al., 2004), NS uses a
description of what each solution does within its environment, what
can be referred to as a behavior descriptor (Lehman and Stanley, 2008;
Trujillo et al., 2011b, 2008a; Mouret and Doncieux, 2012). In this way,
selective pressure in NS pushes the search towards novel behaviors,
allowing the search to avoid local optima. In this chapter we experi-
mentally compare two different behavior descriptors including the one
proposed in (Lehman and Stanley, 2010a). NS has been widely used
in navigation and other ER problems with strong results (Lehman and
Stanley, 2008, 2010a, 2011a; Urbano and Loukas, 2013; Gomes et al.,
2013), and has recently been extended to more traditional ML prob-
lems with GP (Naredo and Trujillo, 2013; Martı́nez et al., 2013). How-
ever, most works in ER have not studied the issue of generalization in
NS, or how the size of the training set affects it.
This chapter also studies the impact of how the training set is deter-
mined, considering several different variants. First, the training set is
constructed randomly, using a different number of instances, to evalu-
ate how the size of the training set impacts generalization. This random
strategy is compared with manually selected initial conditions, to eval-
uate the bias that a human designer introduces into the learning pro-
cess and to compare the newly found results with our previous work
(Urbano et al., 2014b). Moreover, recent works in GP suggest that vary-
ing the training set during the evolutionary process can improve gener-
alization (Gathercole and Ross, 1994; Gonçalves et al., 2012; Gonçalves
and Silva, 2013), but this has only been validated in traditional ML
problems, not in ER. Therefore, this chapter also studies the effect of
varying the training set during evolution, instead of keeping the
training set fixed during the search. The training set is either set
statically for the entirety of the evolutionary process, randomly varied
at the beginning of each generation, or randomly set at the beginning of
every run and then fixed. It is assumed that by varying the training set
the system might be able to cope with a larger set of different
scenarios. In
Figure 6.1: Genotype-to-phenotype transcription in GE. The binary string
110110110101010100101001 101111110000101100011000 is translated into the
integer string 219 85 41 191 11 24, which is mapped through the BNF
grammar:

(A) <expr>      ::= <line>                                 (0)
                  | <expr> <line>                          (1)
(B) <line>      ::= ifelse <condition> [<expr>] [<expr>]   (0)
                  | [<op>]                                 (1)
(C) <condition> ::= wall-ahead?                            (0)
                  | wall-left?                             (1)
                  | wall-right?                            (2)
(D) <op>        ::= turn-left                              (0)
                  | turn-right                             (1)
                  | move                                   (2)

Transcription:
<expr>            219 % 2 = 1
<expr> <line>      85 % 2 = 1
<line> <line>      41 % 2 = 1
<op> <line>       191 % 3 = 2
move <line>        11 % 2 = 1
move <op>          24 % 3 = 0
move turn-left

Resulting program: move turn-left
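The transcription process of Figure 6.1 can be sketched as a standard GE mapping; the grammar below follows the figure (with the bracket tokens of the <line> rules omitted for simplicity), while the codons in the usage example are illustrative and do not reproduce the figure's exact derivation:

```python
# Simplified version of the Figure 6.1 grammar (bracket tokens omitted).
GRAMMAR = {
    "<expr>": [["<line>"], ["<expr>", "<line>"]],
    "<line>": [["ifelse", "<condition>", "<expr>", "<expr>"], ["<op>"]],
    "<condition>": [["wall-ahead?"], ["wall-left?"], ["wall-right?"]],
    "<op>": [["turn-left"], ["turn-right"], ["move"]],
}

def ge_map(codons, start="<expr>", max_expansions=50):
    """GE mapping sketch: repeatedly expand the leftmost non-terminal,
    choosing production (codon % number-of-rules) and wrapping the codon
    list when exhausted. Real GE bounds wrapping; max_expansions mimics
    that by capping the total number of expansions."""
    symbols, out, i = [start], [], 0
    for _ in range(max_expansions):
        if not symbols:
            break
        sym = symbols.pop(0)
        if sym in GRAMMAR:
            rules = GRAMMAR[sym]
            choice = rules[codons[i % len(codons)] % len(rules)]
            i += 1
            symbols = list(choice) + symbols
        else:
            out.append(sym)           # terminal: emit it
    return " ".join(out)
```

For example, `ge_map([1, 0, 1, 2, 1, 1])` yields the program `move turn-right`.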
ii. The size of the training set is varied, from the simplest case with
a single fitness-case, to a relatively large set with 60 fitness-cases.
Each training instance defines different initial conditions for the
agent within the environment, specifying its position and orienta-
tion. Generalization is then evaluated based on the performance
achieved on a test set of 100 randomly generated instances, con-
sidering both the quality given by the objective functions and the
percentage of cases in which the agent reaches the target (referred
to as hits).
iii. The manner in which the training set is determined is also evalu-
ated, considering three different approaches: (1) randomly setting
the training set at the beginning of each run; (2) randomly chang-
ing the training set at the beginning of each generation; or (3) man-
ually setting the training set based on expert knowledge, which is
fixed for all runs.
6.2 Background
takes a decision and prefers some hypotheses over others. This
preference can be due to a bias in the learning algorithm or to prior
knowledge regarding the problem domain. According to Kushchu, there are
two major types of bias: representational and procedural. Representational
bias is given by the language used to describe the hypotheses, and
procedural bias is given by the search process.
using an objective-based approach, sharing similar characteristics, and
tested the learned behaviors on different sets of similar trails. He was
able to successfully evolve general trail-following agents. In (Lehman
and Stanley, 2010a) using standard GP, and in (Urbano and Loukas, 2013)
using GE, NS was applied successfully to the SFT problem, a known
deceptive problem in GP.
Important open issues that have not been considered include the
effect that the training set size has on generalization, as well as the
manner in which the training set is constructed, or how different ob-
jectives and descriptors might impact the search. However, (Trujillo
et al., 2011b) showed that by promoting behavioral diversity during
the search for navigation behaviors, an EA could find several different
solutions to the same problem, and that such solutions exhibited bet-
ter generalization abilities when placed in an unknown environment.
Those results correlate with the hypothesis that the search for novel
and unique behaviors can help an evolutionary process identify gen-
eral solutions.
6.3 experimental setup
Figure 6.3: Figure (a) shows the learning environment, where the target
is depicted by a black square, along with 12 manually chosen initial
conditions for each artificial agent (labeled I1 to I12). These instances
are grouped into 6 pairs, represented as two triangles of different color;
each pair shares the same location (x, y) but differs in orientation
(North, East, South, or West). Figure (b) shows 100 initial conditions,
which were randomly generated to be used as the test set for all experi-
ments.
cells, so the total number of cells that an agent can visit is (37 × 21) − 11 = 766.
This environment is similar to the one used in (Lehman and Stanley,
2010a), where it is called the ‘medium’ (difficulty) maze. Similar
to the present work, (Lehman and Stanley, 2010a) compared the per-
formance of GP using three different approaches to guide the search:
objective, novelty, and random fitness. However, in (Lehman and
Stanley, 2010a) the training set always contained a single starting point
for the agent (the top-left corner), only 100 moves were allowed for the
agent, and no test set was considered.
In this scenario, each training (or testing) instance is defined by the
pair Ii = (xi, θi), where xi = (x, y) is the initial position of the agent
within the grid environment, specified by row x and column y, and
θi is the initial orientation, which can take four possible values: North (N),
South (S), West (W), and East (E).
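This representation can be sketched in a few lines. The names (`random_instance`, the grid bounds) are illustrative, not from the thesis implementation, and obstacle cells are ignored for simplicity.

```python
import random

# Illustrative sketch of the instance representation described above:
# each instance pairs an initial grid position with an orientation.
# Names and grid bounds are assumptions; obstacle cells are ignored.
ORIENTATIONS = ("N", "S", "W", "E")

def random_instance(rows=21, cols=37, rng=random):
    """Draw a random initial condition Ii = ((x, y), theta)."""
    x = rng.randrange(rows)           # row index
    y = rng.randrange(cols)           # column index
    theta = rng.choice(ORIENTATIONS)  # initial orientation
    return ((x, y), theta)

# A training (or testing) set of a given size is a list of instances.
train_set = [random_instance() for _ in range(12)]
```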
The BNF grammar that defines the space of possible programs is
shown in Figure 6.1, and is used to perform the genotype-to-phenotype mapping.
To the best of our knowledge, previous works have not studied the
effect that the training set size has on generalization for a navigation
problem in ER. Here, we consider seven different training set sizes:
1, 2, 6, 12, 24, 48, and 60 instances. Moreover, to evaluate generaliza-
tion a test set is needed; here the test set is composed of 100 randomly
chosen initial conditions, shown in Figure 6.2(b). We believe that,
given the size of the environment, 100 instances are sufficient to
evaluate the generalization of the evolved solutions.
Three approaches are considered for constructing the training set:
(1) a set of randomly chosen instances determined at the beginning of
each run; (2) a set of randomly chosen instances selected at the beginning
of each generation; and (3) a set of manually chosen instances used in all runs.
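The three construction strategies can be sketched as a schedule of training sets, one per generation of a run. `make_training_schedule` and `sample_instances` are hypothetical names standing in for the thesis implementation.

```python
import random

# Sketch of the three training-set construction strategies discussed
# above. `sample_instances` stands in for any instance generator; all
# names here are illustrative, not from the thesis implementation.
def sample_instances(n, rng):
    return [(rng.randrange(21), rng.randrange(37)) for _ in range(n)]

def make_training_schedule(strategy, size, generations,
                           manual_set=None, seed=0):
    """Return the training set to use at each generation of one run."""
    rng = random.Random(seed)
    if strategy == "per_run":         # (1) fixed random set for the run
        fixed = sample_instances(size, rng)
        return [fixed] * generations
    if strategy == "per_generation":  # (2) fresh random set each generation
        return [sample_instances(size, rng) for _ in range(generations)]
    if strategy == "manual":          # (3) expert-chosen set, fixed for all runs
        return [manual_set] * generations
    raise ValueError(strategy)
```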
The manual approach undoubtedly introduces a human bias into
the learning process, which might compromise any conclusions drawn
from the experimental results. If the problem requires specific initial
conditions then this is not an issue, but determining how to construct
the training set for an arbitrarily complex environment is by no means a
trivial task. Moreover, as the results of this chapter will show, using
random training sets not only simplifies the problem formulation but
actually improves the quality of the results.
The first two approaches, (1) and (2), remove this bias and allow
us to evaluate the effect of training set size irrespective of the initial
positions of the robot. One drawback of randomizing the selection
of the training set in our experimental setup is that training instances
may coincide with those used for testing. However, even in the worst
case, with a training set size of 60 and 100 testing instances, the overlap
between both sets will be below 2%, and it will be smaller still for
the smaller training sets. The second approach (2) induces a dynamic
learning process with a non-static fitness landscape, a scenario where
solution diversity and generalization would seem to be necessary. For
instance, recent work (Gonçalves and Silva, 2011b) has suggested that
changing the fitness cases used at each generation can help improve
generalization and reduce overfitting in GP.
\[ f_1 = \frac{1}{1 + \mathrm{dist}(\alpha, t)} \tag{6.28} \]
\[ f_2 = \frac{1}{1 + \frac{\mathrm{dist}(\alpha, t)}{\beta}} . \tag{6.29} \]
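As a minimal sketch, the two fitness functions might be computed as follows. Here `alpha` is the agent's final position and `t` the target; reading `beta` as the number of distinct visited cells follows Table 6.1 and should be treated as an assumption on our part.

```python
import math

# Minimal sketch of Eqs. (6.28) and (6.29). `alpha` is the agent's final
# position and `t` the target; reading `beta` as the number of distinct
# visited cells follows Table 6.1 and is an assumption, not the thesis code.
def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def f1(alpha, t):
    return 1.0 / (1.0 + dist(alpha, t))         # Eq. (6.28)

def f2(alpha, t, beta):
    return 1.0 / (1.0 + dist(alpha, t) / beta)  # Eq. (6.29)
```

Both functions reach their maximum of 1.0 when the agent ends exactly on the target.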
Table 6.1: A general description of the algorithms and measures used
in the experimental set-up.

Training set selection:
g: randomly chosen at the beginning of each generation.
r: randomly chosen at the beginning of each run.
Manual: manual selection of training cases.

Algorithms:
Ng: novelty-based search with a training set of randomly chosen instances, chosen at the beginning of each generation.
Nr: novelty-based search with a training set of randomly chosen instances, determined at the beginning of a run.
Og: objective-based search with a training set of randomly chosen instances, chosen at the beginning of each generation.
Or: objective-based search with a training set of randomly chosen instances, determined at the beginning of a run.
R: random-based search, with a training set of randomly chosen instances determined at the beginning of a run.

Measures:
f1: fitness function based on the Euclidean distance.
f2: fitness function based on the Euclidean distance and visited cells.
dα: descriptor using the final position of the agent, related to f1.
dβ: descriptor using the number of visited (non-repeated) cells, related to f2.
The limit of allowable moves simulates the energy (i.e. battery life)
that an agent has to solve the navigation task. In this work, we chose
to determine a fixed limit for all the experimental conditions. To do
[Figure: (a) Hit-Test for f1 (OS) and dα (NS); (b) Hit-Test for f2 (OS) and dβ (NS).]
Table 6.2: Parameters used for the experimental work. Codons-min
and Codons-max are the minimal and maximal number of codons in
the initial random population of the GE search.
Table 6.3 summarizes the results for all of the experiments that used
randomly chosen training sets, organized by the number of training in-
stances (column 1) and the manner in which the training set was selected
6.4 results and analysis
Table 6.3: Summary of the experimental results for all of the variants,
showing the average over 100 runs for the configurations that used
a random training set. The training set size ranges from 1 to 60; Sel is the
manner in which the training set is set, either every generation (Gen) or
every run (Run); and three search strategies are considered: Objec-
tive, Novelty, and Random. Results are given for both training and
testing: the best solution found is evaluated on each objective function
(f1 and f2) on the training set as well as on the test set. The percentage
of hits, H1 and H2, is shown only for the test set. In all cases bold
indicates the best performance.
Training Test
Size Sel Fitness f1 f2 H1 H2 f1 f2
1 Rand-Gen Novelty 1.0000 1.0000 12% 16% 0.1966 0.7655
Objective 0.9691 0.9989 4% 7% 0.1227 0.7246
Rand-Run Novelty 1.0000 1.0000 27% 29% 0.3315 0.8559
Objective 0.7068 0.9945 12% 22% 0.1954 0.8326
Random 0.5386 0.9689 8% 6% 0.1610 0.7848
2 Rand-Gen Novelty 1.0000 1.0000 45% 43% 0.5073 0.9206
Objective 0.6576 0.9870 10% 18% 0.1848 0.8939
Rand-Run Novelty 0.9960 1.0000 56% 59% 0.6080 0.9504
Objective 0.7296 0.9824 39% 38% 0.4439 0.9272
Random 0.4485 0.9460 10% 8% 0.1826 0.8771
6 Rand-Gen Novelty 0.9854 1.0000 92% 88% 0.9289 0.9914
Objective 0.4824 0.9655 15% 35% 0.2254 0.9435
Rand-Run Novelty 0.9733 0.9998 92% 91% 0.9280 0.9918
Continued on next page. . .
Table 6.3 – continued from previous page
Training Test
Size Sel Fitness f1 f2 H1 H2 f1 f2
Objective 0.7209 0.9744 56% 48% 0.6057 0.9611
Random 0.2908 0.9335 7% 13% 0.1524 0.9190
12 Rand-Gen Novelty 0.9730 1.0000 94% 98% 0.9485 0.9979
Objective 0.5131 0.9664 29% 39% 0.3609 0.9524
Rand-Run Novelty 0.9731 0.9999 94% 97% 0.9501 0.9978
Objective 0.7520 0.9784 66% 62% 0.6964 0.9735
Random 0.2524 0.9274 8% 12% 0.1673 0.9223
24 Rand-Gen Novelty 0.9548 0.9999 92% 99% 0.9306 0.9990
Objective 0.6375 0.9643 49% 40% 0.5419 0.9540
Rand-Run Novelty 0.9650 0.9999 94% 99% 0.9509 0.9989
Objective 0.8099 0.9740 75% 56% 0.7749 0.9721
Random 0.2682 0.9253 13% 8% 0.2126 0.9188
48 Rand-Gen Novelty 0.9739 1.0000 96% 100% 0.9642 0.9997
Objective 0.6727 0.9660 58% 48% 0.6199 0.9597
Rand-Run Novelty 0.9614 0.9999 95% 99% 0.9521 0.9996
Objective 0.8139 0.9742 78% 58% 0.8068 0.9727
Random 0.2379 0.9267 13% 12% 0.2102 0.9244
60 Rand-Gen Novelty 0.9648 0.9998 95% 99% 0.9581 0.9993
Objective 0.6842 0.9640 58% 41% 0.6223 0.9576
Rand-Run Novelty 0.9530 0.9996 95% 99% 0.9530 0.9993
Objective 0.7390 0.9748 70% 58% 0.7336 0.9737
Random 0.2550 0.9261 15% 10% 0.2313 0.9237
Figure 6.5: Box plot comparison of the best solutions found, based on
training fitness with function f1 and descriptor dα. ‘N’ stands for novelty-
based search, ‘O’ for objective-based search, and ‘R’ for random-based
search. The sub-indices ‘g’ and ‘r’ indicate whether the training set is
randomly chosen every generation or every run, respectively. Plots are
sorted in ascending order according to the number of instances used in
the training sets, from 1 to 60.
hits. All of these figures present results for five of the algorithms sum-
marized in Table 6.1: Ng, Nr, Og, Or, and R.
These results exhibit some clear trends. First, consider training
performance for both objective functions, shown in Figures 6.4 and
6.7. NS achieves good performance irrespective of the size of the train-
ing set or the manner in which it is chosen (per run or per generation)
for both objectives. On the other hand, objective-based search performs
worse than NS, and Or is relatively better than Og in both figures,
suggesting that it is better to keep the training set static across all
generations of a run when using this form of selective search. Random
search clearly performs worst, although R is basically equivalent to Og
for small training sets.
Figure 6.6: Box plot comparison of the best solutions found, based on
test fitness with function f1 and descriptor dα. ‘N’ stands for novelty-
based search, ‘O’ for objective-based search, and ‘R’ for random-based
search. The sub-indices ‘g’ and ‘r’ indicate whether the training set is
randomly chosen every generation or every run, respectively. Plots are
sorted in ascending order according to the number of instances used in
the training sets, from 1 to 60.
Figure 6.7: Box plot comparison of the percentage of testing hits achieved
by the best solution found, using function f1 and descriptor dα. ‘N’ stands
for novelty-based search, ‘O’ for objective-based search, and ‘R’ for
random-based search. The sub-indices ‘g’ and ‘r’ indicate whether the
training set is randomly chosen every generation or every run, respec-
tively. Plots are sorted in ascending order according to the number of
instances used in the training sets, from 1 to 60.
Figure 6.8: Box plot comparison of the best solutions found, based on
training fitness with function f2 and descriptor dβ. ‘N’ stands for novelty-
based search, ‘O’ for objective-based search, and ‘R’ for random-based
search. The sub-indices ‘g’ and ‘r’ indicate whether the training set is
randomly chosen every generation or every run, respectively. Plots are
sorted in ascending order according to the number of instances used in
the training sets, from 1 to 60.
of hits on the test set relative to the number of training instances (x-
axis), for both objective functions (columns) and for both manners in
which the training set is selected (rows). These plots clearly suggest
that NS achieves substantially better generalization performance than
traditional objective-based search and random search. Indeed, NS can
achieve almost perfect generalization for sufficiently large training sets,
above 6 instances for the present task.
Figure 6.9: Box plot comparison of the best solutions found, based on
test fitness with function f2 and descriptor dβ. ‘N’ stands for novelty-
based search, ‘O’ for objective-based search, and ‘R’ for random-based
search. The sub-indices ‘g’ and ‘r’ indicate whether the training set is
randomly chosen every generation or every run, respectively. Plots are
sorted in ascending order according to the number of instances used in
the training sets, from 1 to 60.
reach the target from some positions than from others. The chosen instances
are depicted in Figure 6.2(a), which specifies the location and orientation
of each instance; these instances were used in our preliminary work
(Urbano et al., 2014b). Table 6.4 summarizes the performance of the
learning process when each single instance is used for training
(each row), along with the average over all instances (final row). It is
evident that most of the chosen instances cannot lead the search towards
a general solution for any of the search strategies. Moreover, the averages
reported for the test set are worse than those obtained with a single
randomly chosen training case (first row of Table 6.3). It is important
to note that these training instances were carefully chosen to provide
different scenarios for the learning process, with the hope that they
might allow the learning process to find general solutions
[Figure: percentage of test hits vs. training set size (1 to 60).
(a) Hits-Gen using f1 and dα; (b) Hits-Run using f1 and dα;
(c) Hits-Gen using f2 and dβ; (d) Hits-Run using f2 and dβ.
Legend: Novelty-based, Objective-based, Random-based.]
the Manual approach. Again, it seems that human bias does not im-
prove search performance, and in fact has a negative impact.
Table 6.4: Results for each of the twelve instances I1, ..., I12, where
the sub-index indicates the number of the training instance (shown in the
heading as ‘Size’). Three different fitness metrics are used to evolve
solutions: novelty, objective, and random. Two fitness functions are
used to score the best program from each run: f1 considers the last
position, and f2 considers both the last position and the number of
non-repeated visited cells. The average number of best programs that
hit the target is shown in the columns ‘H’, where the subindex indicates
the function used (f1 or f2). The limit of moves for all experiments is 600.
Bold indicates the best performance.
Training Test
Size Fitness f1 f2 H1 H2 f1 f2
I1 Novelty 1.0000 1.0000 3% 3% 0.2631 0.8735
Objective 0.7445 0.9847 3% 0% 0.2862 0.8588
Random 0.4403 0.9649 1% 0% 0.1496 0.8572
I2 Novelty 1.0000 1.0000 6% 1% 0.3652 0.8889
Objective 0.9524 0.9889 2% 4% 0.2017 0.9098
Random 0.2801 0.9567 2% 2% 0.1594 0.8715
I3 Novelty 1.0000 1.0000 12% 3% 0.4768 0.9297
Objective 0.6977 0.9890 1% 5% 0.3238 0.9212
Random 0.2019 0.9429 0% 0% 0.1455 0.8873
I4 Novelty 1.0000 1.0000 8% 5% 0.4553 0.9261
Objective 1.0000 0.9774 3% 0% 0.6656 0.9055
Random 0.2273 0.9529 1% 2% 0.1651 0.8661
I5 Novelty 1.0000 1.0000 0% 1% 0.0913 0.7812
Objective 0.9802 0.9983 0% 0% 0.0720 0.7779
Random 1.0000 0.9936 0% 0% 0.0839 0.6711
I6 Novelty 1.0000 1.0000 4% 0% 0.1682 0.8468
Objective 0.9527 0.9946 2% 0% 0.1073 0.8236
Random 0.9301 1.0000 0% 0% 0.1068 0.8502
I7 Novelty 1.0000 1.0000 0% 0% 0.1246 0.5477
Objective 0.9917 1.0000 0% 0% 0.1252 0.5442
Random 1.0000 1.0000 0% 0% 0.1203 0.5372
I8 Novelty 1.0000 1.0000 0% 0% 0.1215 0.5638
Objective 1.0000 1.0000 0% 0% 0.1207 0.5246
Random 1.0000 1.0000 0% 0% 0.1154 0.5456
I9 Novelty 1.0000 1.0000 0% 0% 0.0951 0.4644
Objective 1.0000 1.0000 0% 0% 0.0967 0.4462
Continued on next page. . .
Table 6.5: Summary of the performance of the three methods (novelty,
objective, and random-based search) for a set of 12 instances with a
limit of 600 moves, showing the results for the three different ways of
selecting the training set: manually (Manual), randomly chosen each
generation (Rand-Gen), and randomly chosen each run (Rand-Run).
Bold indicates the best performance.
Training Test
Size Sel Fitness f1 f2 H1 H2 f1 f2
12 Manual Novelty 0.9576 0.9996 68% 81% 0.9302 0.9985
Objective 0.6775 0.9701 29% 30% 0.5310 0.9638
Random 0.4100 0.9328 1% 4% 0.1713 0.9177
12 Rand-Gen Novelty 0.9730 1.0000 94% 98% 0.9485 0.9979
Objective 0.5131 0.9664 29% 39% 0.3609 0.9524
Rand-Run Novelty 0.9731 0.9999 94% 97% 0.9501 0.9978
Objective 0.7520 0.9784 66% 62% 0.6964 0.9735
Random 0.2524 0.9274 8% 12% 0.1673 0.9223
6.4.3 Statistical Analysis
6.5 heat maps
Training Test
Size Methods f1 f2 H1 H2 f1 f2
1 Ng vs Nr 4.000 4.000 0.000 * 0.000* 0.646 0.026
Ng vs Og 0.101 0.629 0.000* 0.000* 0.000* 0.438
Ng vs Or 0.000* 0.000* 3.654 0.000* 2.194 0.111
Nr vs Og 0.101 0.629 0.000* 0.000* 0.000* 0.000*
Nr vs Or 0.000* 0.000* 0.000* 0.000* 0.020* 0.646
Og vs R 0.000* 0.000* 0.000* 0.000* 0.047* 0.920
Or vs R 0.004* 0.000* 0.000* 0.000* 0.139 0.287
2 Ng vs Nr 1.269 4.000 0.000* 0.000* 0.006* 0.007*
Ng vs Or 0.000* 0.000* 0.000* 0.000* 0.124 1.909
Nr vs Or 0.000* 0.000* 0.000* 0.000* 0.038* 0.053
Og vs Or 0.059 2.530 0.000* 0.000* 0.000* 0.224
Og vs R 0.000* 0.000* 2.367 0.000* 2.194 0.010*
6 Ng vs Nr 1.269 0.629 3.332 0.000* 2.060 2.929
Og vs Or 0.000* 3.012 0.000* 0.000* 0.000* 0.117
Og vs R 0.000* 0.000* 0.000* 0.000* 0.920 0.026*
12 Ng vs Nr 4.000 1.269 3.324 1.484 2.764 3.545
Og vs Or 0.000* 0.542 0.000* 0.000* 0.000* 0.068
24 Ng vs Nr 2.955 1.269 0.0000* 2.397 1.968 2.360
Og vs Or 0.000* 0.311 0.000* 0.000* 0.000* 0.029*
48 Ng vs Nr 1.693 2.254 0.0000* 2.344 3.526 3.274
Nr vs Or 0.106 0.000* 0.000* 0.000* 0.064 0.000*
Og vs Or 0.135 0.059 0.000* 0.000* 0.021* 0.004*
60 Ng vs Nr 3.091 4.000 0.198 0.155 3.563 0.882
Og vs Or 1.790 0.132 0.000* 0.000* 0.524 0.035*
is computed as the difference between training and test quality over
the 30 experimental runs. Using the experimental results, we build
the heat-maps for each orientation, shown in Figure 6.11, where the
average overfitting over the 30 runs is displayed according to the
color scale: low values appear in blue, while higher values appear in red.
Through a color threshold, these heat-maps can be transformed into
the binary maps shown in Figure 6.12. The region in white contains
the instances that produce higher overfitting values, while the region
in black contains the instances with lower overfitting values. The
hypothesis is that overfitting is directly correlated with poor generaliza-
tion. Instances in the white regions, since they show higher overfitting,
tend to generate very specialized controllers that can hardly find the
target when starting from different instances, especially those located
in the black regions; for this reason they are considered difficult
instances for generalization. The same reasoning applies to the black
region, whose instances are considered easy to generalize.
With this approach we can test different combinations of easy and
difficult instances. For instance, for a training set of size 6, we get
the following set of easy-difficult combinations:
{0-6, 1-5, 2-4, 3-3, 4-2, 5-1, 6-0}, where each initial condition is chosen
randomly from the corresponding region. Figure 6.13 (a) shows
an example of a set of 12 initial conditions, particularly for a combina-
tion of 8 easy and 4 difficult initial conditions, and Figure 6.13 (b)
shows the same test set used in the previous experiments.
Figure 6.13: Overfit binary maps, obtained by binarizing the overfit
heat-maps through a color threshold that divides the map into easy
and difficult regions.
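The enumeration of easy/difficult combinations and the sampling of a training set from the two regions can be sketched as follows. `easy_pool` and `difficult_pool` would hold the instances falling in the black and white regions of Figure 6.12; all names here are illustrative placeholders.

```python
import random

# Sketch of sampling a training set from the binarized overfitting maps.
# `easy_pool` and `difficult_pool` would hold the instances falling in
# the black and white regions of Figure 6.12; the names are placeholders.
def easy_difficult_combinations(size):
    """All (easy, difficult) count pairs for a given training-set size."""
    return [(size - d, d) for d in range(size + 1)]

def sample_training_set(n_easy, n_difficult, easy_pool, difficult_pool,
                        rng=random):
    return rng.sample(easy_pool, n_easy) + rng.sample(difficult_pool, n_difficult)
```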
This experimental framework can give us insight about how this
distinction between easy and difficult initial conditions can benefit to
any of the methods tested, and furthermore to test if any of the possible
combinations is the best suited to improve the generalization abilities
of the navigator.
Figure 6.14: Figure (a) shows the learning environment, where the tar-
get is depicted by a black square, and 12 randomly chosen starting posi-
tions, particularly with a combination (4-8) of 8 “easy” and 4 “difficult”
instances. Each instance has a triplet (x, y, θ ), where x, y is the 2-d loca-
tion and θ is the orientation (North, East, South, and West). Figure (b)
shows 100 initial conditions that were randomly generated to be used
as a test set.
6.6 chapter conclusions
keeping the training set fixed during the run produces almost the ex-
act same results as randomly varying it every generation. On the other
hand, objective-based search is hampered when a dynamic fitness func-
tion is used, both in terms of training and testing. Objective-based
search achieves better performance when the training set size is in-
creased and when it is kept fixed throughout the evolutionary process.
Finally, we compare a manual selection of training instances, the
strategy used in our previous work (Urbano et al., 2014b), with the ran-
dom strategy studied here. Results clearly indicate that manual selec-
tion introduces an undesirable bias that negatively affects generaliza-
tion. In particular, testing performance is clearly better with a random
training set than with a manual a priori set, suggesting that removing
human bias can be very beneficial.
The results presented here open up several possible lines of future
research. While this work provides useful insights regarding the im-
pact of the training set, only coarse features are considered, such as its
size or the overall strategy used to build it (random or manual/fixed or
dynamic). However, more detailed features of the training set can be
controlled. For instance, it is evident that not all training instances are
created equal: some specify more difficult scenarios than others, and
some might lead towards deceptive landscapes while others might not.
Therefore, future work will study the impact that the composition of
the training set has on learning and generalization, considering, for
example, the proportion of easy and difficult training instances, or the
proportion of deceptive and non-deceptive initial conditions. One final
open issue is to perform similar tests to study the issues related to the
so-called reality gap in ER.
7
GP based on NS for Regression
7.1 Introduction
[Figure: illustration of the behavioural descriptor for regression. The
ground truth and a GP individual are plotted over the fitness cases
x1, ..., xL; each case xi has an associated error εi, and thresholding
these errors yields a binary descriptor such as ε = [01011].]
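The figure labels suggest a binary behavioural descriptor built by thresholding the per-case errors. The exact rule is not visible in this excerpt, so the sketch below assumes bit i is set when the error at fitness case i exceeds a threshold ε; this is consistent with the example vector [01011] but remains our reading, not a confirmed definition.

```python
# Hypothetical sketch of the binary descriptor suggested by the figure:
# bit i is 1 when the absolute error at fitness case i exceeds epsilon.
# This thresholding rule is an assumption, not taken from the thesis.
def epsilon_descriptor(errors, epsilon):
    """Binary descriptor over the L fitness-case errors."""
    return [1 if abs(e) > epsilon else 0 for e in errors]
```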
7.1.3 NS Modifications
7.2 Experiments
The three symbolic regression problems are given in Table 7.1. The
problems were chosen based on the difficulty they posed to the
methods published in (Uy et al., 2011a), and they are ordered from the
easiest to the most difficult. It is not claimed that this ordering implies
any deeper understanding of the intrinsic difficulty of the problems;
it is only based on the performance of the algorithms compared in (Uy
et al., 2011a).
The two easiest problems have one independent variable, while the
hardest problem has two independent variables; Figure 7.2 shows the
ground truth function in each problem. Table 7.1 specifies the desired
function and the manner in which the training set (fitness cases) and
testing set are constructed, using the same random procedure in both.
Parameter: Description
Population size: 200 individuals.
Generations: 100 generations.
Initialization: Ramped Half-and-Half, with 6 levels of maximum depth.
Operator probabilities: Crossover pc = 0.8, Mutation pµ = 0.2.
Function set: ( +, −, ×, ÷, |·|, x², √x, log, sin, cos, if ).
Terminal set: x1, ..., xi, ..., xp, where xi is a dimension of the data patterns x ∈ Rⁿ.
Bloat control: Dynamic depth control.
Initial dynamic depth: 6 levels.
Hard maximum depth: 20 levels.
Selection: Tournament of size 6.
Parameter: Value
NS nearest neighbors: k = 15
Sparseness threshold (single-variable problems): ρmin = 3
Sparseness threshold (bivariable problem): ρmin = 13
ε-descriptor threshold: p = 10
Number of runs per problem: 30
7.3 chapter conclusions
[Figure: box plots of the error for Problems 1 to 3.]
8
GP based on NS for Clustering
abstract — This chapter applies NS to unsupervised machine
learning, namely the task of clustering unlabeled data. To this end, a
behavioral descriptor is proposed that describes the performance of a
clustering function. Experimental results show that NS-based search
can be used to derive effective clustering functions. In particular, NS
is best suited to solving difficult problems, where exploration needs to
be encouraged and maintained.
8.1 Introduction
8.2 clustering with novelty search
This section first presents two well-known data clustering algo-
rithms, K-means and Fuzzy C-means. Afterward, the proposed behav-
ioral descriptor for GP-based clustering functions is presented, and we
provide a discussion regarding its fitness landscape.
8.2.1 K-means
\[ \arg\min_{S} \sum_{i=1}^{k} \sum_{x_j \in S_i} \| x_j - \mu_i \|^2 \tag{8.30} \]
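The objective in Eq. (8.30) is typically minimized with Lloyd's algorithm, alternating nearest-center assignment and mean updates. A minimal illustrative sketch, not the experimental code:

```python
import random

# Minimal Lloyd-style K-means sketch for the objective in Eq. (8.30):
# alternate nearest-center assignment and mean updates until the
# centers stop changing. Illustrative only, not the thesis implementation.
def kmeans(points, k, iters=100, rng=None):
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign p to its nearest center
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # recompute each center as the mean of its assigned points
        new = [tuple(sum(vals) / len(vals) for vals in zip(*cl)) if cl else centers[j]
               for j, cl in enumerate(clusters)]
        if new == centers:
            break
        centers = new
    return centers, clusters
```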
\[ J(U, M) = \sum_{i=1}^{c} \sum_{j=1}^{N} (u_{i,j})^m D_{ij} \tag{8.31} \]
where U = [u_{i,j}]_{c×N} is the fuzzy partition matrix and u_{i,j} ∈ [0, 1] is
the membership coefficient of the jth object in the ith cluster; M =
[m_1, ..., m_c] is the cluster prototype (mean or center) matrix; m ∈ [1, ∞)
is the fuzzification parameter, usually set to 2; and D_{ij} = D(x_j, m_i) is
the distance measure between x_j and m_i.
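A direct sketch of evaluating Eq. (8.31), assuming squared Euclidean distance for D_ij (the particular distance measure is not fixed by the text, so treat that choice as an assumption):

```python
# Direct sketch of the Fuzzy C-means objective J(U, M) of Eq. (8.31),
# assuming squared Euclidean distance for D_ij (an assumption here).
def fcm_objective(U, M, X, m=2.0):
    """U: c-by-N memberships, M: c prototypes, X: N data points."""
    total = 0.0
    for i in range(len(U)):
        for j in range(len(X)):
            Dij = sum((a - b) ** 2 for a, b in zip(X[j], M[i]))
            total += (U[i][j] ** m) * Dij
    return total
```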
The training set T contains sample patterns from each cluster.
Then, for a two-cluster problem with Ω = {ω1, ω2}, the clustering de-
scriptor (CD) is constructed in the following way. If T = {y1, y2, ..., yL},
then the behavioural descriptor of each GP clustering function Ki is a
binary vector ai = (a1, a2, ..., aL) of size L, where each vector element aj
is set to 1 if clustering function Ki assigns label ω1 to pattern yj, and is
set to 0 otherwise.
Suppose that the number of training examples from each cluster
is L/2, and suppose that they are ordered in such a way that the first L/2
elements in T correspond to cluster label ω1. Let x represent a binary
vector, and let function u(x) return the number of 1s in x. Moreover, let
K_O be the optimal clustering function that achieves perfect accuracy
on the training set.
Then, the CD of K_O's behaviour is given by a1 = (1_1, 1_2, ..., 1_{L/2},
0_{L/2+1}, ..., 0_L). Moreover, for a two-cluster problem, an equally use-
ful solution is to take the opposite (complement) behaviour and invert
the clustering, such that a 1 is converted to a 0 and vice-versa. This
mirror behaviour is a0 = (0_1, 0_2, ..., 0_{L/2}, 1_{L/2+1}, ..., 1_L) for the CD
descriptors. The complete fitness landscapes in behavioural space are
depicted in Figure 8.1.
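The descriptor construction can be sketched directly. Here `K` stands in for any GP clustering function (a callable returning the label it assigns to a pattern) and is a hypothetical name:

```python
# Sketch of the clustering descriptor (CD) construction described above;
# `K` is a hypothetical stand-in for any GP clustering function.
def clustering_descriptor(K, T, label_one):
    """Binary vector: 1 where K assigns label_one, 0 otherwise."""
    return [1 if K(y) == label_one else 0 for y in T]

def mirror(descriptor):
    """Complement behaviour: swap the two cluster labels."""
    return [1 - a for a in descriptor]

def optimal_descriptors(L):
    """The optimal behaviour a1 and its mirror a0 for an ordered set."""
    a1 = [1] * (L // 2) + [0] * (L - L // 2)
    return a1, mirror(a1)
```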
[Figure 8.1: fitness landscape in behavioural space, plotted as a function of UR(x) and UL(x).]
\[ \mathrm{IntraCluster} = \frac{1}{N} \sum_{i=1}^{K} \sum_{x \in C_i} \| x - v_i \|^2 , \tag{8.32} \]

\[ \mathrm{CDR} = \frac{\mathrm{IntraCluster}}{\mathrm{InterCluster}} \tag{8.34} \]
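Eqs. (8.32) and (8.34) can be sketched as follows. Since the InterCluster definition (Eq. 8.33) is not visible in this excerpt, the minimum squared distance between cluster centers is used as a stand-in; treat that choice as an assumption.

```python
# Sketch of Eqs. (8.32) and (8.34). The InterCluster definition
# (Eq. 8.33) is not visible in this excerpt, so the minimum squared
# distance between cluster centers is assumed as a stand-in.
def intra_cluster(clusters, centers):
    N = sum(len(c) for c in clusters)
    return sum(sum((a - b) ** 2 for a, b in zip(x, v))
               for cl, v in zip(clusters, centers) for x in cl) / N

def inter_cluster(centers):  # assumed definition, not from the text
    return min(sum((a - b) ** 2 for a, b in zip(u, v))
               for i, u in enumerate(centers) for v in centers[i + 1:])

def cdr(clusters, centers):
    return intra_cluster(clusters, centers) / inter_cluster(centers)
```

Lower CDR values indicate compact, well-separated clusters.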
8.3 Experiments
8.3.3 Results
Table 8.2: Average classification error and standard error of the best
solution found by each algorithm on each problem; GT: Ground Truth,
KM: K-means, FC: Fuzzy C-means. Both NS-based algorithms use
k = 15, but CD1 uses ρmin = 20 and CD2 uses ρmin = 40.
Table 8.2 compares NS-GP with the two baseline methods, KM and FC.
The table presents two comparative views of average performance over
all runs. First, the algorithms are compared based on their CDR score,
and the CDR of the ground truth of each problem is also presented.
Additionally, using the ground truth, a classification error was computed,
based on the ordering suggested by each clustering method. In general,
the results indicate two noteworthy trends. First, NS performs much
worse on the simpler problems, where it basically behaves like a ran-
dom search. On the other hand, NS noticeably outperforms the control
methods on the harder problems; this is especially true for the hard-
est, Problem 5. Second, it seems that a lower ρmin encourages better
performance in most cases. A detailed view of how the data is being
clustered provides a complementary analysis of the results. Figures 8.3 -
8.7 present a graphical illustration of the clustering output achieved
by each method. All figures show the ground truth clusters for visual
comparison, along with a typical clustering output from each method.
These figures confirm the data presented in Table 8.2: NS-GP performs
worse on the easy problems and better on the difficult ones.
Figures 8.8 and 8.9 examine how sparseness evolves during the
NS-GP search, for NS-GP-20 and NS-GP-40 respectively. Each figure
shows how the sparseness of the best individual (based on fitness)
evolves over the generations. The plots are averages over the 30 runs
of each experiment and present a curve for each problem. A horizontal
line shows the corresponding threshold value.
8.4 chapter conclusions
[Figures 8.8 and 8.9: sparseness of the best individual over 100 generations for Problems 1–5, for NS-GP-20 and NS-GP-40 respectively; a horizontal line marks the sparseness threshold.]
ity during the search. On the other hand, for simple problems the
exploratory capacity of NS is mostly unexploited. In particular, the CD
descriptor is less restrictive, and for this reason it can be used for clus-
tering. According to the results, the CD descriptor proved that it can be
used with NS to solve clustering problems. Therefore, future work will
focus on exploring the usefulness of NS on the more difficult problem
of non-supervised learning.
9
GP based on NS for Classification
9.1 Introduction
[Figure: binary behavior descriptors for classification; performance grows from random to optimal with the number of ones in the descriptor, from [0 0 ... 0 0] to the optimal [1 1 ... 1 1], e.g. βo = [1 1 1 1 1 1 1 1 1 1] versus β = [0 0 0 1 1 1 0 1 0 0].]
[Figure: mapping of data from feature space to a 1-dimensional space; labels d1, d2 and h.]
9.1.5 Discussion
Figure 9.3: Five synthetic 2-class problems used to evaluate each algo-
rithm, in ascending order of difficulty (according to the standard
objective-based GP performance) from left to right.
Parameter                Description
Population size          100 individuals.
Generations              100 generations.
Initialization           Ramped Half-and-Half, with 6 levels of maximum depth.
Operator probabilities   Crossover pc = 0.8, Mutation pµ = 0.2.
Function set             { + , − , × , ÷ , | · | , x² , √x , log , sin , cos , if }.
Terminal set             { x1 , ..., xi , ..., xp }, where xi is a dimension of the data patterns x ∈ ℝp.
Hard maximum depth       20 levels.
Selection                Tournament of size 4.
Problem OS NS
Trivial 0.004 ±0.007 0.007 ±0.008
Easy 0.105 ±0.040 0.144 ±0.044
Moderate 0.136 ±0.033 0.159 ±0.041
Hard 0.260 ±0.052 0.266 ±0.053
Hardest 0.365 ±0.033 0.370 ±0.043
more bloat on difficult problems. In the case of NS, Figure 9.4(b) shows
that NS controls code growth quite effectively, exhibiting the same av-
erage program size on all problems.
Based on these results, we can revisit the fitness-causes-bloat theory
of Langdon and Poli (Langdon and Poli, 1997). It basically states that
the search for better fitness (given by the objective function) will bias
[Figure 9.4: average program size over 100 generations; panel (b) corresponds to NS.]
Table 9.3: Average program size at the final generation for each algo-
rithm. For the NS algorithm, the population (Pop), archive (Arch) and
both (Pop+Arch) are considered.
the search towards larger trees, simply because there are more large
programs than small ones. Silva and Costa (Silva and Costa, 2009) state
it clearly:
... one cannot help but notice the one thing that all the
[bloat] theories have in common, the one thing that if re-
9.2 real-world classification experiments
Table 9.4: Real-world and synthetic datasets for binary and multiclass
classification problems, taken from the UC Irvine Machine Learning
Repository, from the U.S. Geological Survey (USGS) Earth Resources
Observation Systems (EROS) data center, and from the KEEL dataset
repository.
used for training and the rest for testing; the data partition is randomly
selected for each run. The objective function is given by the classifica-
tion error, which is used by all NS variants to choose the best solution
at the end of the run, and by OS to guide the search.
Since the classification problems differ in sample size, and be-
cause PNS considers all of the individuals generated during the search,
for NS and MCNS all the behaviors in the current population and in the
population archive are used to compute the novelty measure in Equa-
tion 4.11. Moreover, ρth is set to 50% of the largest possible distance,
the number of neighbors k is set to 50% of the population size, and the
archive is a FIFO queue with a size three times that of the population.
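The sparseness computation described above can be sketched as follows; this is a minimal illustration assuming binary behavior descriptors compared with the Hamming distance, with the form of Equation 4.11 taken as the mean distance to the k nearest neighbors (names and archive handling are illustrative):

```python
from collections import deque

def hamming(a, b):
    """Number of positions where two binary behavior descriptors differ."""
    return sum(x != y for x, y in zip(a, b))

def sparseness(descriptor, others, k):
    """Sparseness (novelty) of a descriptor: mean distance to its k
    nearest neighbors among the current population and the archive."""
    dists = sorted(hamming(descriptor, o) for o in others)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

# The archive is a FIFO queue three times the population size.
POP_SIZE = 100
archive = deque(maxlen=3 * POP_SIZE)
```

Here k would be set to 50% of the population size, as stated above.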
For the MCNS algorithm, the minimal criterion for each individual
is that its fitness must be within a certain percentage of the best solu-
tion found so far. Six different versions of MCNS are tested, from 5%
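The minimal-criterion gate described above can be sketched as follows; since the text only states that fitness must be within a certain percentage of the best solution found so far, the additive-margin reading below is an assumption, not necessarily the exact rule used:

```python
def meets_minimal_criterion(error, best_error, tol):
    """MCNS gate (sketch): admit an individual into the novelty ranking
    only if its classification error is within tol of the best error
    found so far (e.g. tol = 0.05 for the 5% variant)."""
    return error <= best_error + tol
```

Individuals failing the gate would simply be ignored by the novelty ranking.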
[Figure: evolution of the classification error over 100 generations for each problem; panels include (j) Cardiotocography C1C2, (k) Indian Liver C1C2 and (l) Fertility C1C2.]
Figure 9.6: Evolution of the average size of the population at each gen-
eration, showing the median value over all runs.
a random search. This is consistent with the size of the evolved pro-
grams generated by OS on these problems, which are among the small-
est of all the algorithms as shown in Figure 9.6. Conversely, the con-
vergence plots for PNS, MCNS and MCPNS are substantially different.

Figure 9.7: Classification error on the test data for the best solution
found, showing box plots of the median value over all runs; each panel
compares OS, NS, MCNS, PNS and MCPNS, including (j) Cardiotocog-
raphy C1C2, (k) Indian Liver C1C2 and (l) Fertility C1C2.
In these tests, we use three problems (IM-3, SEG and M-L) with
different numbers of classes (3, 7 and 15). Moreover, the SEG problem
has 2,310 instances, which leads to a very large behavior descriptor;
i.e., using 70% of the data for training, we obtain a descriptor length
of 1,617 bits, which gives a very large behavior space. Assessing the
performance of the NS variants on this problem is of particular interest,
since previous works have shown that the performance of NS degrades
when behavior space is very large (Kistemaker and Whiteson, 2011).
The numerical comparison of the algorithms is given in Tables 9.8
and 9.9; the former shows the median test error and median population
size, while the latter shows the corresponding p-values of the statistical
tests. Similar to the binary case, all NS variants achieve basically the
same performance as OS, with slight improvements on some problems
(particularly M-L), but these are not statistically significant. Indeed,
the similarity in terms of performance is even more evident when we
analyze the convergence plots for each problem shown in Figure 9.8,
which shows how the classification error evolves for the best solution
found on the training and testing sets. On the other hand, code growth
is not controlled as it was in the previous tests; in only one problem
(M-L) do the NS variants produce statistically significant differences in
terms of average program size.
The above results show that NS can be used to solve binary and mul-
ticlass classification problems, without a performance drop-off relative
to standard OS. This was not expected, given that the search process
omits the use of a standard objective function. Moreover, on some prob-
[Figure 9.8: evolution of the classification error of the best solution over 100 generations on the training (top row) and testing (bottom row) sets for the multiclass problems.]
[Figure: percentage of individuals rejected over 100 generations for (a) MCNS and (b) MCPNS.]
[Figure: archive size over generations (left) and speed-up with respect to OS for NS, MCNS and PNS (right).]
9.3 chapter conclusions
This chapter presented for the first time an application of the NS ap-
proach to supervised classification with GP, with several contributions.
First, the concept of behavior space is framed as a conceptual middle-
ground between the well-known concept of objective space and the re-
cently popular semantic space in GP. Second, a domain-specific descrip-
tor has been proposed and tested on supervised classification tasks,
considering synthetic and real-world data as well as binary and multi-
class problems. The proposed descriptor is a binary vector, where each
element corresponds to a fitness case in the training set, taking a
value of 1 when that fitness case is correctly classified and 0 otherwise.
Third, two extensions to the basic NS approach have been developed,
PNS and MCNS, as well as a hybrid method MCPNS. PNS pro-
vides a probabilistic framework to measure a solution’s novelty, elimi-
nating all of the underlying NS parameters while reducing the compu-
tational overhead that the original NS algorithm suffers from. On the
other hand, the proposed MCNS extends the minimal criteria approach
by combining the objective function with the sparseness measure, con-
straining the NS algorithm by specifying a minimal solution quality, a
dynamic criterion that is proportional to the quality of the best solution
found so far.
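The accuracy descriptor summarized above follows directly from its definition; a minimal sketch (the function name is illustrative):

```python
def accuracy_descriptor(predictions, labels):
    """Binary behavior descriptor for classification: one element per
    fitness case, 1 if that case is correctly classified, 0 otherwise."""
    return [1 if p == y else 0 for p, y in zip(predictions, labels)]
```

A perfect classifier produces the all-ones descriptor, while descriptors with the same error count can still be distinct behaviors, which is what NS exploits.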
Experimental results are evaluated based on two measures, solution
quality and average size of all solutions in the population. In terms of
performance, results show that all NS variants achieve competitive re-
sults relative to the standard OS approach in GP. These results show
that the general open-ended approach towards evolution followed by
NS can compete with objective driven search in traditional machine
learning domains. On the other hand, in terms of solutions size and
the bloat phenomenon, the NS approach can lead the search towards
maintaining smaller program trees, particularly in the simpler binary
tasks. In particular, NS and MCNS show substantial reductions in pro-
gram size relative to OS.
Finally, a promising aspect of the present work is that several future
lines of research can be explored, in no particular order we contem-
plate the following. Firstly, there seems to be a possible link between
the PNS algorithm and several similar methods in evolutionary compu-
tation: estimation of distribution algorithms (EDAs) (Larrañaga and
Lozano, 2001), the frequency fitness assignment (FFA) method (Weise
et al., 2014), and fitness sharing methods (Nguyen et al., 2012). While
EDAs use a distribution over genotype space to generate new individ-
uals, PNS uses a distribution in behavior space to measure the novelty
of each solution. FFA favors solutions with unique objective scores, in-
stead of uniqueness in behavior space as done in PNS.
Nonetheless, many of the theoretical and practical insights derived
from EDA and FFA research might be brought to bear during fur-
ther development of the PNS approach, while further comparisons
with recent diversity preservation techniques might also be of interest
Figure 9.11: Relative ranking of NS, PNS and OS on the IM-3 problem;
panels (a) OS, (c) NS and (e) PNS show the percentage of top-ranked
individuals, while panels (b), (d) and (f) show the average relative rank-
ing over generations.
Figure 9.12: Relative ranking of NS, PNS and OS on the SEG problem;
panels (a) OS, (c) NS and (e) PNS show the percentage of top-ranked
individuals, while panels (b), (d) and (f) show the average relative rank-
ing over generations.
10
Conclusions & Future Directions
The initial motivation of this research was to explore how the open
issues in GP (O’Neill et al., 2010) could be tackled. We must admit that
at the beginning we were somewhat skeptical of the benefits that could
be obtained by the use of an unorthodox search strategy based on solu-
tion novelty instead of solution quality. Even more so because we en-
countered criticism of NS at conferences, comparing it to a random or
exhaustive search. On the other hand, Velez and Clune (2014) recently
showed that NS indeed has general exploration skills and does not be-
have as a random search, which we were also able to verify and extend.
With due caution, we started studying NS and exploring how we could
tackle the open issues in GP: first by applying NS with a simple GA to
a real-world circuit synthesis problem, and then by showing the gener-
alization that can be achieved by using NS in evolutionary
robotics. Afterward we were able to apply NS on several machine learn-
ing tasks, especially regression, clustering and more completely on su-
pervised classification. We also proposed new variants of NS, showing
that several of its issues could be mitigated.
In general, we must say that our initial findings on applying NS in
machine learning are very encouraging. Even though GP systems based
on NS were not better than traditional objective-based search in many
cases, they showed competitive results against their traditional counter-
parts, with several advantages; namely, better performance on more
difficult problems and a reduction in bloat during evolution. All this
was somewhat unexpected, because NS is a counter-intuitive approach:
searching without an explicit objective!
10.1 summary and conclusions
10.3 future work
There are several future research lines to extend the work related
to deception. One is to take into account the methods based on data
separation to design synthetic classification problems that introduce a
degree of deception for these kinds of methods. Another research line
is to apply non-linear classifiers, incorporating different approaches
such as genetic programming to generate the classifiers.
With respect to work related to the evolution of CF circuits, one
possible research line is to optimize the evolved topologies of the CF
circuits and to subject them to real-world experimental validation. An-
other is to apply the NS paradigm to synthesize other specialized circuits
of interest in the field of electronic design automation. Furthermore,
we can enhance the NS approach by attempting to force the search away
from specific areas of the search space. For example, it should be possi-
ble to seed the population with previously known designs that should
be avoided by the search, since they are not as interesting. In this way,
the NS algorithm could be used to explicitly search for circuits that are
unique in the electronics literature.
We can extend our original work on generalization in two different
directions: first, by grouping the initial conditions into easy and diffi-
cult regions, and second, by using different deceptive navigation tasks.
With respect to the machine learning problems, future work can
focus on providing a deep comparison between the proposed behavior-
based search strategy and recent semantics-based approaches, a com-
parison that goes beyond merely experimental results to a detailed
analysis of the main algorithmic differences between both approaches
and their effects on search. Furthermore, future work on this domain
should also study how NS affects the bloat phenomenon during a GP
search.
Particularly with respect to the proposal of computing novelty
through a probability approach, there seems to be a possible link be-
tween the PNS algorithm and two similar methods in evolutionary com-
putation: estimation of distribution algorithms (EDAs) (Larrañaga and
Lozano, 2001) and the frequency fitness assignment (FFA) method
(Weise et al., 2014). The proposed algorithms should be evaluated in
other machine learning problems, such as unsupervised clustering
(Naredo and Trujillo, 2013). On the other hand, we might extend the
proposed PNS variants in other ways, such as testing PNS with real-
valued behavior descriptors or applying PNS within semantic space,
similar to the approach suggested in Castelli et al. (2014).
Finally, future work will also focus on making our proposal more
stable, since the results indicate a large variance in the NS-based runs,
which is understandable given the nature of the search. However, we
believe that the best approach is not to combine the objective function
and the novelty into a single fitness value or to use a multiobjective
formulation. It is our opinion that the best way to move forward is
to use NS to explore the search space and to integrate a local search
method to exploit individuals that exhibit promising new behaviors.
Bibliography
Banzhaf, W., Francone, F. D., and Nordin, P. (1996). The effect of ex-
tensive use of the mutation operator on generalization in genetic pro-
gramming using sparse data sets. In Parallel Problem Solving from
Nature IV, Proceedings of the International Conference on Evolutionary
Computation, pages 300–309. Springer Verlag.
Bezdek, J., Ehrlich, R., and Full, W. (1984). FCM: The fuzzy c-means
clustering algorithm. Computers & Geosciences, 10(2–3):191–203.
Castelli, M., Manzoni, L., Silva, S., and Vanneschi, L. (2010). A compar-
ison of the generalization ability of different genetic programming
frameworks. In Evolutionary Computation (CEC), 2010 IEEE Congress
on, pages 1–8.
Castelli, M., Manzoni, L., Silva, S., and Vanneschi, L. (2011). A quanti-
tative study of learning and generalization in genetic programming.
In Silva, S., Foster, J. A., Nicolau, M., Machado, P., and Giacobini, M.,
editors, EuroGP, volume 6621 of Lecture Notes in Computer Science,
pages 25–36. Springer.
Castelli, M., Trujillo, L., Vanneschi, L., and Popovic̆, A. (2015). Predic-
tion of energy performance of residential buildings: A genetic pro-
gramming approach. Energy and Buildings, 102:67 – 74.
Das, R. and Whitley, D. (1991). The only challenging problems are de-
ceptive: global search by solving order-1 hyperplanes. Number no. 102
in Technical report (Colorado State University. Department of Com-
puter Science). Colorado State University, Department of Computer
Science.
Derrac, J., Garcı́a, S., Molina, D., and Herrera, F. (2011). A practical
tutorial on the use of nonparametric statistical tests as a methodol-
ogy for comparing evolutionary and swarm intelligence algorithms.
Swarm and Evolutionary Computation, 1(1):3–18.
Gonçalves, I., Silva, S., Melo, J., and Carreiras, J. a. M. B. (2012). Ran-
dom sampling technique for overfitting control in genetic program-
ming. In Moraglio, A., Silva, S., Krawiec, K., Machado, P., and Cotta,
C., editors, Genetic Programming, volume 7244 of Lecture Notes in
Computer Science, pages 218–229. Springer Berlin Heidelberg.
Hernández, B., Olague, G., Hammoud, R., Trujillo, L., and Romero, E.
(2007). Visual learning of texture descriptors for facial expression
recognition in thermal imagery. Computer Vision and Image Under-
standing, Special Issue on Vision Beyond the Visual Spectrum, 106(2-
3):258–269.
Ingalalli, V., Silva, S., Castelli, M., and Vanneschi, L. (2014). A multi-
dimensional genetic programming approach for multi-class classi-
fication problems. In Nicolau, M., Krawiec, K., Heywood, M. I.,
Castelli, M., García-Sánchez, P., Merelo, J. J., Rivas Santos, V. M., and
Sim, K., editors, Genetic Programming, volume 8599 of Lecture Notes
in Computer Science, pages 48–60. Springer Berlin Heidelberg.
Mahler, S., Robilliard, D., and Fonlupt, C. (2005). Tarpeian bloat con-
trol and generalization accuracy. In Keijzer, M., Tettamanzi, A., Col-
let, P., van Hemert, J. I., and Tomassini, M., editors, Proceedings of
the 8th European Conference on Genetic Programming, volume 3447 of
Lecture Notes in Computer Science, pages 203–214, Lausanne, Switzer-
land. Springer.
Martı́nez, Y., Naredo, E., Trujillo, L., and López, E. G. (2013). Search-
ing for novel regression functions. In IEEE Congress on Evolutionary
Computation, pages 16–23.
Martı́nez, Y., Naredo, E., Trujillo, L., Pierrick, L., and López, U. (2016).
A comparison of fitness-case sampling methods for genetic program-
ming. Submitted to: Journal of Experimental & Theoretical Artificial
Intelligence, currently working with the reviewers’ comments.
Martı́nez, Y., Trujillo, L., Naredo, E., and Legrand, P. (2014). A com-
parison of fitness-case sampling methods for symbolic regression
with genetic programming. In EVOLVE - A Bridge between Probabil-
ity, Set Oriented Numerics, and Evolutionary Computation V, volume
288 of Advances in Intelligent Systems and Computing, pages 201–212.
Springer International Publishing.
McDermott, J., White, D. R., Luke, S., Manzoni, L., Castelli, M., Van-
neschi, L., Jaskowski, W., Krawiec, K., Harper, R., De Jong, K., and
O’Reilly, U.-M. (2012). Genetic programming needs better bench-
marks. In Proceedings of the 14th Annual Genetic and Evolutionary
Computation Conference, GECCO ’12, pages 791–798, New York, NY,
USA. ACM.
Muñoz, L., Silva, S., and Trujillo, L. (2015). M3GP – multiclass clas-
sification with GP. In Machado, P., Heywood, M. I., McDermott, J.,
Castelli, M., Garcı́a-Sánchez, P., Burelli, P., Risi, S., and Sim, K., edi-
tors, Genetic Programming, volume 9025 of Lecture Notes in Computer
Science, pages 78–91. Springer International Publishing.
Naredo, E., Dunn, E., and Trujillo, L. (2013a). Disparity map estimation
by combining cost volume measures using genetic programming. In
Schütze, O., Coello Coello, C. A., Tantar, A.-A., Tantar, E., Bouvry, P.,
Del Moral, P., and Legrand, P., editors, EVOLVE - A Bridge between
Probability, Set Oriented Numerics, and Evolutionary Computation II,
volume 175 of Advances in Intelligent Systems and Computing, pages
71–86. Springer Berlin Heidelberg.
Naredo, E., Trujillo, L., Fernández De Vega, F., Silva, S., and Legrand,
P. (2015). Diseñando problemas sintéticos de clasificación con super-
ficie de aptitud deceptiva. In X Congreso Español de Metaheurı́sticas,
Algoritmos Evolutivos y Bioinspirados (MAEB 2015), Mérida, España.
Naredo, E., Trujillo, L., Legrand, P., Silva, S., and Muñoz, L. (2016b).
Evolving genetic programming classifiers with novelty search. To
appear: Information Sciences.
Naredo, E., Trujillo, L., and Martı́nez, Y. (2013b). Searching for novel
classifiers. In Proceedings from the 16th European Conference on Ge-
netic Programming, EuroGP 2013, volume 7831 of LNCS, pages 145–
156. Springer-Verlag.
Naredo, E., Urbano, P., and Trujillo, L. (2016c). The training set and
generalization in grammatical evolution for autonomous agent navi-
gation. Soft Computing, pages 1–18.
Nguyen, Q., Nguyen, X., O’Neill, M., and Agapitos, A. (2012). An in-
vestigation of fitness sharing with semantic and syntactic distance
metrics. In Proceedings of the 15th European Conference on Genetic Pro-
gramming, EuroGP’12, pages 109–120. Springer Berlin Heidelberg.
O’Neill, M., Vanneschi, L., Gustafson, S., and Banzhaf, W. (2010). Open
issues in genetic programming. Genetic Programming and Evolvable
Machines, 11(3-4):339–363.
Poli, R., Langdon, W. B., and McPhee, N. F. (2008a). A field guide to ge-
netic programming. Published via http://lulu.com and freely avail-
able at http://www.gp-field-guide.org.uk. (With contributions
by J. R. Koza).
Puente, C., Olague, G., Smith, S., Bullock, S., Hinojosa-Corona, A., and
González-Botello, M. (2011). A genetic programming approach to
estimate vegetation cover in the context of soil erosion assessment.
Photogrammetric Engineering and Remote Sensing, 77(4):363–376.
Rana, S. (1999). Examining the role of local optima and schema pro-
cessing in genetic search.
Robilliard, D., Mahler, S., Verhaghe, D., and Fonlupt, C. (2006). Santa
Fe trail hazards. In Talbi, E.-G., Liardet, P., Collet, P., Lutton, E., and
Schoenauer, M., editors, 7th International Conference on Artificial Evo-
lution EA 2005, volume 3871 of Lecture Notes in Computer Science,
pages 1–12, Lille, France. Springer.
Romero, J. and Machado, P., editors (2007). The Art of Artificial Evolu-
tion: A Handbook on Evolutionary Art and Music. Natural Computing
Series. Springer Berlin Heidelberg.
Silva, S. and Costa, E. (2009). Dynamic limits for bloat control in ge-
netic programming and a review of past and current bloat theories.
Genetic Programming and Evolvable Machines, 10(2):141–179.
Tan, X., Bhanu, B., and Lin, Y. (2005). Fingerprint classification based
on learned features. IEEE Transactions on Systems, Man, and Cybernet-
ics, Part C, 35(3):287–300.
Trujillo, L., Muñoz, L., Naredo, E., and Martı́nez, Y. (2014). Genetic Pro-
gramming: 17th European Conference, EuroGP 2014, Granada, Spain,
April 23-25, 2014, Revised Selected Papers, chapter NEAT, There’s No
Bloat, pages 174–185. Springer Berlin Heidelberg, Berlin, Heidel-
berg.
Trujillo, L., Olague, G., Lutton, E., and de Vega, F. F. (2008a). Behavior-
based speciation for evolutionary robotics. In GECCO, pages 297–
298.
Trujillo, L., Olague, G., Lutton, E., and De Vega, F. F. (2008b). Discov-
ering several robot behaviors through speciation. In Proceedings of
the 2008 conference on Applications of evolutionary computing, Evo’08,
pages 164–174. Springer-Verlag.
Trujillo, L., Olague, G., Lutton, E., de Vega, F. F., Dozal, L., and
Clemente, E. (2011b). Speciation in behavioral space for evolutionary
robotics. Journal of Intelligent and Robotic Systems, 64(3-4):323–351.
Trujillo, L., Olague, G., Lutton, E., and Fernández de Vega, F. (2008c).
Multiobjective design of operators that detect points of interest in
images. In Cattolico, M., editor, Proceedings of the Genetic and Evo-
lutionary Computation Conference (GECCO), Atlanta, GA, July 12-16,
pages 1299–1306, New York, NY, USA. ACM.
Trujillo, L., Silva, S., Legrand, P., and Vanneschi, L. (2011c). An empir-
ical study of functional complexity as an indicator of overfitting in
genetic programming. In Silva, S., Foster, J. A., Nicolau, M., Machado,
P., and Giacobini, M., editors, EuroGP, volume 6621 of Lecture Notes
in Computer Science, pages 262–273. Springer.
Uy, N. Q., Hien, N. T., Hoai, N. X., and O’Neill, M. (2010). Improv-
ing the generalisation ability of genetic programming with semantic
similarity based crossover. In Proceedings of the 13th European Con-
ference on Genetic Programming, EuroGP’10, pages 184–195, Berlin,
Heidelberg. Springer-Verlag.
Uy, N. Q., Hoai, N. X., O’Neill, M., Mckay, R. I., and Galván-López,
E. (2011a). Semantically-based crossover in genetic programming:
application to real-valued symbolic regression. Genetic Programming
and Evolvable Machines, 12(2):91–119.
Uy, N. Q., Hoai, N. X., O’Neill, M., Mckay, R. I., and Galván-López,
E. (2011b). Semantically-based crossover in genetic programming:
application to real-valued symbolic regression. Genetic Programming
and Evolvable Machines, 12(2):91–119.
Vanneschi, L., Castelli, M., and Silva, S. (2010). Measuring bloat, over-
fitting and functional complexity in genetic programming. In Pro-
ceedings of the 12th Annual Conference on Genetic and Evolutionary
Computation, GECCO ’10, pages 877–884, New York, NY, USA. ACM.
Velez, R. and Clune, J. (2014). Novelty search creates robots with gen-
eral skills for exploration. In Proceedings of the 2014 Conference on
Genetic and Evolutionary Computation, GECCO ’14, pages 737–744.
ACM.
Weise, T., Wan, M., Wang, P., Tang, K., Devert, A., and Yao, X. (2014).
Frequency fitness assignment. Evolutionary Computation, IEEE Trans-
actions on, 18(2):226–243.