Comprehensibility, Overfitting and Co-Evolution in Genetic Programming for Technical Trading Rules
Mukund Seshadri
A Thesis
in
Computer Science

May 2003
Abstract
This thesis presents Genetic Programming methodologies to find successful and
understandable technical trading rules for financial markets. The methods, when
applied to the S&P 500, consistently beat the buy-and-hold strategy over a 12-year
period, even when transaction costs are considered. The work describes the use of a
complexity-penalizing factor to avoid overfitting and to improve the comprehensibility
of the rules produced by GPs. The effect of this factor on returns in this domain is
studied, and the results indicate that it increased the predictive ability of the rules.
A restricted set of operators based on domain knowledge is used: some operators from
earlier work are eliminated, and a number of technical indicators in addition to the
widely used moving averages, such as trend lines and local maxima and minima, are
added. A new evaluation function that tests for consistency of returns in addition to
total returns is introduced. Methods for improving returns are studied and the results
analyzed, including the use of paired, separately evolved buy and sell rules.
Acknowledgements
If I have seen further [than certain other men] it is by standing upon the shoulders of
giants - Isaac Newton (The Columbia World of Quotations. Copyright 1996
Columbia University Press.)
None of this work would have been possible without the unfailing enthusiasm,
patience and tolerance of Prof. Lee Becker. His contributions and support are greatly
valued and appreciated, and I am deeply indebted to him. I would also like to thank
Prof. David Brown and Prof. Carolina Ruiz for their feedback, and Prof. Rundensteiner
for her support. My father introduced me to technical analysis and the desire to ask
questions, and my family kept me competitive; they deserve a lot of credit, and their
help and good wishes are deeply appreciated. Thanks are also due to Jignesh Patel,
who was a constant companion through my long hours of work.

-Mukund Seshadri
Table of Contents
Abstract ......................................................................................................................... 2
1 Introduction........................................................................................................... 7
2 Background ........................................................................................................... 9
2.1 Genetic Programming ................................................................................... 9
2.1.1 Introduction to Genetic algorithms ....................................................... 9
2.1.2 Introduction to Genetic Programming ................................................ 13
2.2 Coevolution................................................................................................. 16
2.3 Comprehensibility and Overfitting ............................................................. 19
2.3.1 Comprehensibility............................................................................... 20
2.3.2 Overfitting avoidance.......................................................................... 22
2.4 Technical Analysis...................................................................................... 23
2.4.1 Different types of indicators ............................................................... 24
2.4.2 Work on technical analysis ................................................................. 31
2.5 GP In Finance ............................................................................................. 32
3 Innovations and Experimental Design ................................................................ 35
3.1 Proposed Changes to the Allen and Karjalainen approach......................... 35
3.1.1 Using a Complexity Penalizing Factor ............................................... 35
3.1.2 Technical Trading Rule Expression Language ................................... 36
3.1.3 Fitness functions ................................................................................. 38
3.1.4 Co-evolution: Using Distinct Buy and Sell Rules .............................. 38
3.2 Data ............................................................................................................. 39
3.2.1 Choice of data series ........................................................................... 39
3.2.2 Risk Free Interest Rates ...................................................................... 40
3.2.3 Indicators Derived and Used from the S&P 500 ................................ 41
3.2.4 Moving Averages................................................................................ 42
3.2.5 Previous Maxima and Minima............................................................ 42
3.2.6 Trend Lines or trading ranges ............................................................. 42
3.2.7 Rate of Change.................................................................................... 43
3.2.8 Partitioning the Data ........................................................................... 43
3.3 Details of the Genome Structure................................................................. 45
3.4 Genetic Program Parameters....................................................................... 49
3.4.1 Types of Nodes ................................................................................... 49
3.4.2 Random Tree Initialization ................................................................. 50
3.4.3 Mutation Operator............................................................................... 50
3.4.4 Crossover Operation ........................................................................... 52
3.4.5 The Objective Function....................................................................... 53
3.5 Experiments ................................................................................................ 54
3.5.1 Experiment – I (Effect of complexity-penalizing factor) ................... 54
3.5.2 Experiment – II (Different Evaluation function) ................................ 56
3.5.3 Experiment – III (Using Paired buy and sell rules) ............................ 58
3.5.4 Experiment – IV (Different types of co-evolution) ............................ 61
4 Results................................................................................................................. 65
4.1.1 Experiment-I (Effect of complexity-penalizing factor) ...................... 65
4.1.2 Experiment-II (Different Evaluation function)................................... 69
4.1.3 Experiment-III (Using Paired buy and sell rules) ............................... 71
4.1.4 Experiment –IV (Different types of co-evolution) ............................. 72
5 Conclusions & Future Work ............................................................................... 76
5.1 Conclusions................................................................................................. 76
5.2 Future Work ................................................................................................ 78
Appendix..................................................................................................................... 80
Approximate Calculation of number of trees.......................................................... 80
References................................................................................................................... 82
List of Tables and Figures
List of Tables
Table 1: Action matrix for single rule tree.................................................................. 54
Table 2: Action matrix for paired rule - 1................................................................... 59
Table 3: Action matrix for paired rule - 2................................................................... 60
Table 4: Results for experiment I -1 ........................................................................... 66
Table 5: Result for Buy and hold strategy .................................................................. 67
Table 6: Results for experiment I - 2 .......................................................................... 68
Table 7: Results for experiment II ................................................................ 70
Table 8: Results for experiment III............................................................................. 71
Table 9: Results for experiment IV - 1 ....................................................................... 73
Table 10: Results for experiment IV - 2 ..................................................... 74
List of Figures
Figure 1: Flowchart for GA algorithm........................................................................ 12
Figure 2: Example of a GP genome............................................................................ 13
Figure 3: Crossover for GP trees ................................................................................ 15
Figure 4: Possible choices of collaborators ................................................................ 18
Figure 5: Technical trading rule from Allen and Karjalainen’s paper........................ 19
Figure 6: Technical Analysis Indicators – Volume and Japanese Candles ................ 25
Figure 7: Technical Analysis Indicators – Moving Averages .................................... 27
Figure 8: Technical Analysis Indicators – Trend Lines.............................................. 28
Figure 9: Technical Analysis Indicators – Rate of Change Indicators ....................... 29
Figure 10 : Technical Analysis Indicators – Price and Volume ................................. 30
Figure 11: A rule that is difficult to comprehend ...................................................... 37
Figure 12: Price chart of monthly closing values from 1960-1990 ............................ 44
Figure 13: The simplest tree ....................................................................................... 47
Figure 14: How the Boolean operator fits in .............................................................. 47
Figure 15: A larger tree............................................................................................... 48
Figure 16: Hierarchy of nodes .................................................................................... 49
Figure 17: Sub tree swap mutation ............................................................................. 51
Figure 18 : Tree Node Swap Mutation ....................................................................... 51
Figure 19 : Sub Tree Destructive Mutation ................................................................ 52
Figure 20: Single Point Crossover .............................................................................. 52
Figure 21: Structure of a paired tree ........................................................................... 58
Figure 22: Overview of co-evolution strategies.......................................................... 62
1 Introduction
The last few years have seen the increasing application of Genetic Programming to
computational finance. A lot of this work has been in the area of financial trading and
prediction. This has been especially true in the area of stock price and foreign
exchange rate prediction. Attempts have been made at mining financial time series
data for patterns that may be applied to the future. The development of Genetic
Programming over the last decade, its inherent parallelism, and its ability to find
near-perfect solutions quickly for NP-hard problems have encouraged this application.
Technical analysis is one area that offers a set of building blocks for pattern
detection and is utilized by many market practitioners.
Most studies, however, have suffered either from an inability to beat the buy-and-hold
strategy or, more seriously, from a lack of simple, understandable technical trading
rule patterns. The inherent randomness of the Genetic Programming procedure also
creates problems of variability in returns, and methods that produce consistent
returns are needed.
This study attempts to solve these problems through different choices made for the
Genetic Program, and applies the techniques of co-evolution to further improve on
work already done.
The remainder of the thesis is organized as follows. Chapter 2 introduces Genetic
Programming and Technical Analysis and details related work in those areas. Chapter
3 lists the improvements suggested over previous work and describes experiments to
test them. Chapter 4 tabulates the results of the study, and Chapter 5 concludes with
a discussion of the findings and directions for future work.
2 Background
This chapter gives a basic overview of Genetic Programming and the problem domain
(Technical Analysis). It lists various technical indicators that are used by technical
analysts and describes their usage. Previous studies on technical analysis are
reviewed, and the problems with these previous approaches that we will attempt to
address are identified.
2.1 Genetic Programming

2.1.1 Introduction to Genetic algorithms
Evolutionary algorithms are computer-based problem-solving systems that use models
of biological evolution as key elements of their design and implementation. They can
be applied to a wide variety of search and optimization problems.
GAs were invented by Holland [1] in the 1960s as a method to study evolution. GAs
mimic the biological evolution of organisms by following the “survival of the fittest”
paradigm. Initial solutions are generated randomly and their fitness tested. This
indicates how well a solution satisfies the problem. The fittest solutions are chosen
for mating in proportion to their fitness. In the reproduction process, the components
of the solution are swapped and those children are also evaluated. A small percentage
of solutions are also subjected to mutation to maintain diversity in the gene pool.
Then the fittest among all these are chosen and form the next generation where the
steps are repeated. At the end of a number of generations, the fittest organism is
chosen as the answer. This sort of population-based algorithm, with crossover and
mutation, is called a genetic algorithm. As an illustration, consider a GA designed to
find the positive roots of the equation x^3 + 3x^2 + 3x + 1 = 0. The GA designer
might decide to represent the solution in the form of an 8-bit array. Therefore the
representation of the number 1 would be

0000 0001 = 1

where the number to the left is the binary representation of the number to the right.
Initially the GA would start out with a population of random bit strings, for example
0000 1111 (15), 1010 1000 (168), and 1111 0101 (245).
Then it would measure the fitness of each of these organisms. In this case a valid
fitness measure would be to minimize the distance between 0 and the value of the
expression for the particular organism/genome; organisms whose expression value is
nearer zero are fitter. The next step would be to choose the fittest individuals and
mate them. This would mean crossing over 168 (1010 1000) and 15 (0000 1111) by
choosing a random point for crossover. Cutting after the fourth bit, for example, and
joining the first half of 15 to the second half of 168 generates 0000 1000 (8), which
is closer to the solution than 245 and will replace it. Proceeding in this manner,
generation after generation, we will hopefully get to the right answer. The GA
algorithm is depicted in Fig. 1 and outlined below:
1. Create an initial population of random individuals.
2. Repeat for a fixed number of generations:
2.1 Assign each individual in the population a fitness, based on some domain-specific
evaluation function.
2.2 Create an empty child population.
2.3 Until the size of the child population equals that of the parent population:
2.3.1 Select two members of the parent population, with the probability of selection
proportional to their fitness.
2.3.2 Breed these two members using a crossover operation to produce a child.
2.3.3 Mutate the child with some small probability.
2.4 Replace the parent population with the child population.
3. Choose the fittest individual of the final generation as the answer.
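The loop above can be sketched as a toy GA in Python. Everything here is illustrative rather than taken from the thesis: the population size, mutation rate, and 8-bit encoding are assumptions, and the fitness simply rewards bit strings whose expression value is close to zero.

```python
import random

random.seed(1)  # reproducible run

def fitness(genome: int) -> float:
    """Reward 8-bit values x for which x^3 + 3x^2 + 3x + 1 is near zero."""
    x = genome
    return 1.0 / (1.0 + abs(x**3 + 3 * x**2 + 3 * x + 1))

def crossover(a: int, b: int) -> int:
    """Single-point crossover: high bits from a, low bits from b."""
    point = random.randint(1, 7)
    mask = (1 << point) - 1
    return (a & ~mask) | (b & mask)

def mutate(genome: int, rate: float = 0.05) -> int:
    """Flip each of the 8 bits independently with a small probability."""
    for bit in range(8):
        if random.random() < rate:
            genome ^= 1 << bit
    return genome

def run_ga(pop_size: int = 20, generations: int = 50) -> int:
    pop = [random.randint(0, 255) for _ in range(pop_size)]
    for _ in range(generations):
        weights = [fitness(g) for g in pop]
        # fitness-proportionate selection, then breed a full child population
        pop = [mutate(crossover(*random.choices(pop, weights=weights, k=2)))
               for _ in range(pop_size)]
    return max(pop, key=fitness)

best = run_ga()
```

The sketch replaces the whole parent population each generation; many GA variants instead keep the best parents (elitism).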
Because the best parts of the solution are carried over from generation to generation,
the near optimal solutions tend to be found. Because the solutions are evaluated in
parallel, GAs are used for NP-hard problems where they can find approximations to
the best solution quickly. The theoretical foundations of this process lie in the
Schema Theorem [1]. A schema is a set of chromosomes that share certain values.
The schema theorem relates the fitness of members of a schema to the expected
number of schema members in the next generation: schemas whose members have
above-average fitness receive an increasing number of reproduction opportunities in
successive generations.
2.1.2 Introduction to Genetic Programming
GP is different from genetic algorithms in that the forms of the organisms/genes are
trees. The idea was first introduced by John Koza [3] to evolve LISP programs where
the tree was the program syntax tree. For example, a genome for a genetic program
could be the tree shown in Fig. 2, with operators (* and +) at the internal nodes and
the variable x and numeric constants at the leaves. This particular tree represents
the expression x^2 + 2x + 1. As a genome it can be used
for curve fitting where the evaluation function might be the inverse of the sum of
absolute differences between the tree value and the actual value. For crossover
operations one could replace a random sub tree of one tree with another as illustrated
in Fig.3. For mutation one could destroy the sub tree, or change a node to another
node. This is a very powerful structure that has been used in a wide number of areas.
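The tree representation and the subtree-swap crossover can be illustrated with a small sketch. The encoding of genomes as nested tuples, the helper names, and the restriction to + and * are our own assumptions for illustration, not part of the original work.

```python
import random

# A GP genome as a nested tuple: ('op', left, right), the variable 'x',
# or a numeric constant. TREE encodes x*x + (x*2 + 1), i.e. x^2 + 2x + 1.
TREE = ('+', ('*', 'x', 'x'), ('+', ('*', 'x', 2), 1))

def evaluate(node, x):
    """Recursively evaluate a tree at a given value of x."""
    if node == 'x':
        return x
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    a, b = evaluate(left, x), evaluate(right, x)
    return a + b if op == '+' else a * b

def subtrees(node, path=()):
    """Enumerate (path, subtree) pairs; paths locate crossover points."""
    yield path, node
    if isinstance(node, tuple):
        for i, child in enumerate(node[1:], start=1):
            yield from subtrees(child, path + (i,))

def replace(node, path, new):
    """Return a copy of the tree with the subtree at `path` replaced."""
    if not path:
        return new
    parts = list(node)
    parts[path[0]] = replace(parts[path[0]], path[1:], new)
    return tuple(parts)

def tree_crossover(t1, t2):
    """Swap a random subtree of t1 with a random subtree of t2."""
    p1, _ = random.choice(list(subtrees(t1)))
    _, s2 = random.choice(list(subtrees(t2)))
    return replace(t1, p1, s2)
```

Evaluating TREE at x = 3 gives 16, consistent with (3 + 1)^2; any child produced by `tree_crossover` is again a well-formed tree.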
They have been used to classify one-dimensional two-state cellular automata [4],
where they have performed better than all human-generated rules and other automated
approaches, and to evolve maneuvers [5]. In addition, if the internal nodes of the
tree are operators and the data is stored in the leaf nodes, the tree can represent a
rule. Trees of this form have been used in a number of rule-discovery applications.
[Figure 3: Crossover for GP trees]
2.2 Coevolution
An extension of the evolutionary approach is to simultaneously co-evolve two
different species that depend on each other to achieve a
solution. According to geneticists Futuyuma & Slatkin [8] following Jansen [9], “a
rigorous definition of coevolution requires that a trait in one species has evolved in
response to a trait of another species, which trait was itself evolved in response to the
first species.” This more closely represents natural evolution, where each species
evolves in an environment that consists of the same factors as in single-species
evolution but now also includes the other species. The most familiar examples of
co-evolution are parasitism and symbiosis. Another example is the evolution of deer
and lions, where both need to
learn to run faster. The lions need to run faster to catch their prey and the deer need to
get away from them. The slowest lions fail to catch their prey and will stand a greater
chance of going hungry. This would mean that the faster lions have a better chance of
reproducing and passing on their genes. Similarly the slowest deer would be preyed
on, thus reducing their ability to pass on their genes. The faster ones, on the other
hand, bear children, and the average running speed of the overall deer population
improves.
A co-evolutionary algorithm starts out with two or more populations, each randomly
created. Then the
algorithm proceeds as detailed above in genetic algorithms but with the change that
the evaluation function of each species consists of testing the performance of each
individual with each individual (or a sampling) from the other populations. No
organism therefore has an absolute fitness measure, but only one that depends on the
other populations. This can create an “arms race” [10] between the populations, with
each one getting better and better. These settings are known as “coupled fitness
landscapes” [11] because the fitness curve of each
population is affected by the organisms in the other population. There have been very
interesting applications of this idea, as in Hillis’ [12] sorters vs. sequences,
where sorting networks evolve as one population and the other population consists of
input sequences to be sorted. In such competitive settings, credit is easy to assign:
the score for one organism is the complement of the score of its opponent. In
cooperative settings, where separately evolved parts make up the solution, thought
must be given to how the credit for a successful solution is passed down to its
components.
For example, Holland [13] describes a bucket brigade system for passing credit among
the rules of a classifier system.
Coevolution also forces one to consider the collaborators to use for evaluation.
Potter describes how individuals in one species are evaluated by using collaborators
from each of the other species.
All individuals of each of the other species [15] can be used for evaluations, but
this is time-consuming and can lead to a combinatorial explosion if there are many
species. Alternatively, one can use the best individuals from each of the other
species as collaborators; this has been tried by [16]. In some cases randomly drawn
individuals of each of the other species in the same generation, as in [17] and [18],
have been experimented with. [13] reports work done on choosing fixed partners from
the other species of the same generation.
Fig.4 above succinctly outlines the various combinations that can be exploited in
choosing collaborators.
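The collaborator-selection options discussed above can be sketched as follows. The function name and the three strategy labels ('all', 'best', 'random') are our own shorthand for the schemes in the text, not an API from the cited papers.

```python
import random

def collaborators(other_pops, strategy, fitness=None):
    """Pick collaborators from each of the other species' populations.

    'best'   - the current fittest individual of each other species
    'random' - a randomly drawn individual of each other species
    'all'    - every individual (exhaustive; cost grows combinatorially)
    """
    if strategy == 'best':
        return [[max(pop, key=fitness)] for pop in other_pops]
    if strategy == 'random':
        return [[random.choice(pop)] for pop in other_pops]
    if strategy == 'all':
        return [list(pop) for pop in other_pops]
    raise ValueError(strategy)
```

An individual's fitness would then be computed against each combination of the returned collaborators, which makes the cost trade-off between the strategies explicit.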
2.3 Comprehensibility and Overfitting
Trees evolved by GP can be very large. For example, Allen and Karjalainen [19] find
rules that vary from 9 to 94 nodes, with a minimum depth of 5. Consider the tree
they describe in their paper (Fig. 5), which encodes one complete trading rule.
There are a few problems with trying to understand this rule.
1. Non-intuitive nature of the rule: The meaning of the rule is not immediately
obvious by looking at the tree, even though this is one of the smaller trees found.
2. Redundancy: The paper discusses the structure of the trees and notes that there
are a lot of redundant sub trees in the rules, and even sub trees that are never
evaluated.
3. The size of the tree: This is one of the smaller trees. As the trees get larger,
they become more difficult to understand and more prone to overfitting.
2.3.1 Comprehensibility
The choice of operators strongly affects the comprehensibility of a genetic program
for discovering technical trading rules. Neely [20] uses the following operations:
1. Arithmetic operations: plus, minus, times, divide, norm, average, max, min, lag
2. Boolean operations: and, or, not, greater-than, less-than
3. Conditionals: if-then-else
4. Numerical constants
These operators are very powerful and can be combined to form effective rules. In
addition, Allen and Karjalainen [21] restrict the depth of the trees to 10. Neely
also imposes a limit of depth 10 and size 100. As noted in the paper, the trees
generated have some redundant expressions. In addition, it might be argued that the
trees generated are hard to interpret. For significant insight to be gained into the
rules that work, the GP must produce easily understandable rules. Understanding the
rules is also important for the people who use them: money managers, for example,
would have more confidence in using rules they can understand.
Extensive work has been done on the issue of comprehensibility for decision trees. To
a certain extent one can equate comprehensibility with the simplicity or conciseness
of the model, but consistency with the domain knowledge of the users who must
comprehend the model is also an important factor [21]. If a user were to possess a
chunk of domain knowledge corresponding to a portion of the model, the
comprehensibility of the model would be better than if the model contained a portion
of the same ‘size’ for which the user had no corresponding chunk. For expression
trees this suggests mapping sub trees onto familiar domain concepts; while doing
this by hand may not be practical for large complex trees, building domain-specific
operators into the rule language makes it possible.
In addition, in a number of studies ([24], [25]) it is argued that deeper trees are
less comprehensible. A full binary tree of depth 10, for example, would have over
1000 nodes and would call for a lot of effort to extract meaning from it. It is
important to note that the comprehensibility of a tree is also related to the returns
it generates: any additional complexity in the rule must generate an equivalent
increase in returns.
2.3.2 Overfitting avoidance
Another problem that is encountered in many machine learning and data mining
techniques, including GP, is overfitting. This can occur when the learned or evolved
model fits the particulars of the training data overly well and consequently does
not generalize to unseen data. Common approaches to avoiding overfitting include
pruning models, generating and hypothesizing models in order from simple to complex,
or searching the space of solutions from general to specific and using some stopping
criterion.
There have been a number of studies ([26], [27], [28]) where accuracy has not been
reduced or has even been improved as a result of simplifying trees by pruning. There
have also been theoretical arguments in favor of what has sometimes been referred to
as Occam’s Razor, namely that simpler models have greater predictive power.
Following Jensen & Cohen [32], Domingos regards the number of models considered as a
key determinant of overfitting.
For GP, overfitting is often avoided by limiting the number of generations; limiting
the size of the population also reduces the number of models considered. A validation
data set can be used to directly test generalization error and thus to decide between
different models. It can also be used to cut off the search, again limiting the
number of models considered; such a cutoff can also prevent complexity from growing
unchecked.
2.4 Technical Analysis
Several centuries ago, traders of rice in Japan developed a method to track the price
of the produce [33]. This took the form of marking different levels on a page as the
prices rose and fell and was a primitive form of technical analysis. Modern technical
analysis has added many more technical trading tools that are available to the analyst
in his quest for price direction. In contrast to fundamental analysts who study the
intrinsic reasons for the stock’s rise or fall – profits, earning per share, market share,
recent news, and so on, to establish where they think a stock is headed, technical
analysts use only historical price and volume information. They believe that all
news, fundamental factors and market psychology are reflected in the price. The
technical analyst therefore uses historical price charts to decide when to get in and
out of the market. The following sections describe the technical analysis tools that
we will be using in this study, with some sample analysis.
2.4.1 Different types of indicators
2.4.1.1 Candlesticks
Figure 6 shows the monthly prices for approximately the past 6 years of the S&P 500
index. Each stick represents the opening, closing, high and low price of the period.
The bottom of the thin line represents the low price, the top represents the high,
and the rectangle is formed using the opening and closing prices. A green rectangle
means the closing price was higher than the opening, and a red rectangle means the
closing was lower than the opening. Candlesticks lend themselves to a host of
interesting patterns used by the technical analyst. For example, the reversal in
October 2000 was indicated by a ‘star’ pattern. This pattern, signified by a closing
price that is close or equal to the opening price while there is a considerable
difference between the high and low prices, indicates a reversal or indecision.
There are more than 30 such candlestick patterns in common use.
Figure 6: Technical Analysis Indicators – Volume and Japanese Candles
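A rough test for the 'star' pattern described above might look like the sketch below. The 10% body-to-range threshold is an illustrative assumption of ours, not a standard from the candlestick literature.

```python
def is_star(open_, high, low, close, body_frac=0.1):
    """Hypothetical 'star' test: the real body (|close - open|) is tiny
    relative to the full high-low range of the period."""
    rng = high - low
    return rng > 0 and abs(close - open_) <= body_frac * rng
```

For a period that opened at 100, ranged from 90 to 110, and closed at 100.5, the body (0.5) is small against the range (20), so the sketch flags a star; a close at 108 would not qualify.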
2.4.1.2 Moving Averages
One of the more modern and most commonly used tools of the analyst is the moving
average. For example, in the chart for IBM (Fig. 7), the black line is the
closing price for the month and the red and blue lines indicate the 3 month and 9
month moving average respectively. Moving averages are derived by averaging the
price over past prices. The 3-month moving average in the chart below (Fig. 7) is the
average of the current month and the past 2 months. Moving averages tend to smooth
out the price curve and help visualize the long-term trend. For this reason they are
often used in pairs, one to indicate much longer terms and another to indicate shorter
term movements. They have the disadvantage of being a trend following signal
instead of a trend predicting one and are often too late to be of use for short term
signaling. Moving averages signal turnarounds when the price crosses over the
average. When the average value is less than the price, it indicates that the market
is bullish. The 9-month moving average (blue) is below the price level for the
bullish period from 1994 to 2000. Conversely, the period from 2002 to 2003 is
identified as bearish by the price being below the moving average. Shorter term
moving averages can be used to signal shorter term turning points.
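The moving-average definition above (the 3-month average is the average of the current month and the past 2 months) and the resulting bullish/bearish reading can be sketched as follows; the helper names and the None-padding for undefined entries are our own choices.

```python
def moving_average(prices, n):
    """n-period simple moving average; entry i averages prices[i-n+1..i].
    The first n-1 entries are undefined and returned as None."""
    out = [None] * len(prices)
    for i in range(n - 1, len(prices)):
        out[i] = sum(prices[i - n + 1:i + 1]) / n
    return out

def ma_signal(prices, short_n=3, long_n=9):
    """'bullish' where the short MA is above the long MA, else 'bearish',
    evaluated only where both averages are defined."""
    short = moving_average(prices, short_n)
    long_ = moving_average(prices, long_n)
    return ['bullish' if s > l else 'bearish'
            for s, l in zip(short[long_n - 1:], long_[long_n - 1:])]
```

On a steadily rising price series the short average stays above the long one, so every defined entry reads bullish, matching the chart discussion above.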
Figure 7: Technical Analysis Indicators – Moving Averages
2.4.1.3 Trend Lines
There is perhaps only one technical tool that is easier to use and more widely
followed than the moving average: the trend line, which is visually appealing and
easy to interpret.
Trend lines are constructed by joining local maxima by a straight line and extending it
to the present day. Such lines are upper resistance levels. Similar lower resistance
levels can be drawn by connecting the local minima. In the above graph of IBM
monthly prices (Fig. 8), 4 trend lines have been identified. The 2 blue lines identify
lower resistance levels and the red and orange lines are upper resistance levels. As
can be seen, prices face resistance at these levels and tend to move by oscillating in
the channels created. Any breakout is followed by a large movement in the direction
of the breakout. It is therefore profitable to identify such breakouts. It can also be
noted that after the breakout, resistance lines may become support lines (the red line)
and vice-versa.
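Trend-line construction as described, joining local maxima by a straight line and extending it to the present day, can be sketched as below. Restricting the line to the last two local maxima is an illustrative simplification of ours; analysts typically choose the anchor points by eye.

```python
def local_maxima(prices):
    """Indices where the price is higher than both of its neighbors."""
    return [i for i in range(1, len(prices) - 1)
            if prices[i] > prices[i - 1] and prices[i] > prices[i + 1]]

def trend_line(prices, maxima=None):
    """Resistance line through the last two local maxima, extended to the
    final index. Returns the line's value there (a candidate breakout
    level), or None if fewer than two maxima exist."""
    idx = maxima if maxima is not None else local_maxima(prices)
    if len(idx) < 2:
        return None
    x1, x2 = idx[-2], idx[-1]
    slope = (prices[x2] - prices[x1]) / (x2 - x1)
    return prices[x2] + slope * (len(prices) - 1 - x2)
```

A lower support line is the mirror image: connect local minima instead of maxima, and watch for the price breaking below it.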
[Figure 8: Technical Analysis Indicators – Trend Lines]
2.4.1.4 Rate of Change
The rate of change is a momentum indicator, so called because it indicates the change
in sentiment of the stock. The graph above (Fig. 9) compares the stock price with the
3-month (blue) and the 12-month (red) rate-of-change (ROC) indicator. The value for
ROC is given by

ROC(n) = 100 * (price(t) - price(t - n)) / price(t - n)

This gives the percentage change in the price over the period specified. At the lows
in the price, the red line is at its lowest point, indicating that a change in
direction is soon to occur. Similarly the high of August 1999 is indicated by the
consequent downtrend in the ROC.
ROC. Most analysts use 2 ROC indicators, one for the short term and another for the
longer term.
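The ROC computation above can be sketched directly; returning None for the first n entries, where no price n periods back exists, is our own convention.

```python
def rate_of_change(prices, n):
    """n-period ROC: percentage change from the price n periods ago.
    The first n entries are undefined and returned as None."""
    return [None] * n + [
        100.0 * (prices[i] - prices[i - n]) / prices[i - n]
        for i in range(n, len(prices))
    ]
```

For the series 100, 110, 121 the 1-period ROC is 10% at each step, since each price is 10% above the previous one.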
[Figure 9: Technical Analysis Indicators – Rate of Change Indicators]
2.4.1.5 Volume
Volume indicates the number of shares of a stock that have been traded. The strength
of a trend is validated by the volume. In addition, periods of low volume may indicate
uncertainty and a possible change in trend. Fig. 10 charts the volume in relation to
the price. The peak in volume during Oct-1999, during the short fall of the
price indicates sharp selling pressure and uncertainty in the new levels the stock price
has reached.
[Figure 10: Technical Analysis Indicators – Price and Volume]
2.4.2 Work on technical analysis
There has been a good deal of work on technical analysis, a path-breaking paper
being [34], which shows that the profits generated by Dow Theory (the forerunner of
technical analysis) are worse than buy-and-hold. This work has recently been
challenged by [35] Brown et al. who show that the Dow Theory may have some merit
after all. Fama [36] concludes that the statistical evidence for random walk over
addition [40] Brown and Jennings make the case that technical analysis has value in
markets where traders are looking for short-term gains. Franket and Froot [41] add a
new dimension by suggesting that the market has periods in which it follows
technical analysts and periods in which fundamental analysis has more weight. Neftci
[42] uses moving averages to successfully trade on the Dow Jones Industrial Average.
Academics have long favored the efficient stock market hypothesis, but are now
considering different models that lead to price equilibrium. The technical analysts,
on the other hand, are still around Wall Street and the foreign exchange markets,
and large investment houses use their skills.
In summary, most opponents of technical analysis and market timing make the
following points:
1. Past prices cannot be used to predict future prices.
2. The transaction costs that accompany market-timing strategies must be made up.
3. All current knowledge of the stock is quickly and completely incorporated into
its price.
Proponents of technical analysis counter that:
1. The prices reflect not just the fundamentals of a stock but also the psychology
of the investors.
2. Market information is not as readily factored into the price as it is thought to
be.
2.5 GP In Finance
Over the last 10 years there have been various publications on GAs in finance, but
it is only since 1994 that GPs have begun to be used for financial applications. A
lot
of this work has been in the area of forecasting and trading, which are closely related.
The use of technical analysis has also been studied for forecasting. Technical analysis
has been used to quite an extent for foreign exchange trading. This is partially
because of the number of factors that affect the exchange rate and make it difficult
to model, leaving technical analysis as an attractive alternative. Neely [22]
reports strong evidence of out-of-sample excess returns for
technical trading rules over 6 exchange rates, and of an ability to detect patterns that are not recognized by standard statistical models. There has also been work on the forecasting and trading of stock indices. The most important work here is Allen and Karjalainen's [21], who discover that their rules do not have consistent excess returns. Allen and Karjalainen implement their GP using the language described below:
1. Arithmetic operations: plus, minus, times, divide, norm, average, max, min, lag
4. Numerical constants
Their fitness function calculates the excess return over buy-and-hold on the data. A population of 500 trees is used in a 50-generation GP. The data is the S&P 500 index daily prices. They guard against overfitting by using a validation data set, obtained by splitting the data into 3 periods: a training period (5 years), a validation period (2 years) and a test period (the remainder). They train on the first period and, after each generation, apply the fittest rule (the one that has the largest excess return) to the validation period. If this rule does better on the validation period than any other rule so far encountered, it is saved as the best found so far.
Because of the nature of GPs, they end up with rules from about 5 to 10 levels deep that contain redundant subtrees and subtrees that are never processed. The choice of operators also makes the trees difficult for humans to interpret, and they lack an intuitive feel for the underlying significance. To evaluate the rules generated, they compare the returns of the rules to the returns of a simple buy-and-hold strategy. They conclude that the rules obtained do not earn excess returns over buy-and-hold in the out-of-sample period if transaction costs are considered.
3 Innovations and Experimental Design
This chapter describes changes from previous work, especially Allen and Karjalainen [21]. We suggest changes and describe the experimental design and details for the experiments to test these changes. Details of the genome design, technical indicators and data are also given, along with the experimental approach.
We use a complexity-penalizing factor that reduces the fitness level of the tree as its complexity grows, measuring the complexity of the tree by its depth or its size. If there is, therefore, a small tree with the same return as a larger tree, the smaller tree is declared more fit. This has two effects: by
restricting the growth of the tree to what is essential to increase return, we increase
comprehensibility. Also, the smaller tree has fewer nodes and is less likely to overfit the training data. By not restricting the size of the tree to an absolute value, and only encouraging it to be smaller, it is hoped that the gene pool remains rich and large. It also admits the possibility of finding a very profitable tree that is larger than the size
we are looking for. Such increased profitability would presumably justify the extra size.
There are also a number of different ways the factor can be constructed; [44] describes one of them. In the search for factors, it is important to note that the factor must remain meaningful at different scales: one candidate worked well, but was not general enough to be used at different scales and was rejected. A simpler depth-based factor was used instead.
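As a concrete illustration, the depth-based factor eventually adopted (objective score × 5/max(TreeDepth, 5), given in full in Section 3.5.1) can be sketched in Python; the function name is ours, not the thesis implementation's:

```python
def penalized_fitness(raw_score, tree_depth):
    """Scale the raw objective score by a depth penalty.

    Trees of depth 5 or less keep their full score; deeper trees are
    scaled down in proportion to how far they exceed that depth, so
    growth is discouraged but never forbidden outright.
    """
    return raw_score * (5.0 / max(tree_depth, 5))

# A depth-4 tree keeps its raw score; a depth-10 tree keeps only half.
print(penalized_fitness(2000.0, 4))   # 2000.0
print(penalized_fitness(2000.0, 10))  # 1000.0
```

Because the penalty is multiplicative rather than a hard cap, a much larger tree can still win if its raw return is high enough, which is the richness-preserving property discussed above.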
As discussed above, reducing the complexity of models has sometimes led to fewer and simpler rules without loss of performance.
Most previous studies have used moving average crossovers, moving averages and trading range breakouts as the main technical indicators. It was decided to extend this set with rate-of-change indicators (as volatility measures), trend lines and support levels. The details of these indicators are discussed below. By using these indicators we are biasing the search, but we are adding a large amount of comprehensibility, because these building blocks are well recognized and understood by technical analysts.
Of particular note are the arithmetic operators. Arithmetic operators such as +, -, *, / make the rules difficult to understand. For
example, in initial trial runs we obtained rules that contain subtrees such as the one shown below (Fig. 11), which represents the rule (MA3 × MA3) < (P × MA6). Such a rule might be profitable, but it does not indicate very well what the rule is trying to do.

Figure 11: Parse tree for the rule (MA3 × MA3) < (P × MA6)
It was therefore decided to do away with arithmetic operators and leave only:
c. Technical analysis indicators (prices, moving averages, etc.) that give a number.
This reduces the number of operators and makes the structure less powerful, but it is hoped that with a judicious choice of technical indicators this can be overcome. We have also done away with the usual constructs of "if-then-else" and the boolean constants true and false.
3.1.3 Fitness functions
A common fitness function for technical trading rule generation uses past return over the entire training period. In certain situations, this can result in the GP-produced rule generating large returns over only a small part of the training period. Such a rule would not generate consistent annual returns over a long period, but erratic periods of high and low returns. To avoid this, we propose a fitness function that measures the number of years in which the annual return is greater than or equal to the minimum of the risk-free return and the stock market return for that year. We would expect such a function to produce more consistent returns.
The problem of having a single rule decide when to get in and out of the market may perhaps be better solved by having 2 rules: one to decide when to get in and another to decide when to get out. This would result in specialization of buying and selling rules. These rules would then have to be developed and paired together, which can be accomplished by coevolving the rules. Essentially the structure and nodes comprising the tree remain the same; only the function assigned to each tree changes. Also, because the rules work together to attain a common goal (maximization of the number of positive periods, or of total return), the coevolution is cooperative.
As mentioned above in Chapter 2, this leads to issues with choosing collaborators and
credit assignment. The credit assignment for this cooperative coevolution problem is
easily solved by assigning the return generated by the pair to each. Equal credit is
given to the buy and sell tree that worked together to generate the return.
3.2 Data
For data, historical monthly prices for the S&P 500 index were used. This index was chosen for several reasons:
1. It is the standard benchmark against which fund managers measure the performance of their portfolios. Portfolios are evaluated by their ability to beat the S&P 500's return. It is therefore easy and meaningful to make comparisons against it.
2. The S&P 500 indicates the overall trend of the market and the economic climate.
The companies included in this index are chosen to be representative of the important industries within the U.S. economy. It stands to reason that there has been more long-term sustained interest in predicting the ups and downs of the S&P 500 than in most other stocks and funds.
3. Splits and other capital corrections do not affect the continuity of the index. The
index measures the amount of investment by taking the product of the price of a stock and the number of outstanding shares. The sum is then related to the standard 1941-43 base, which is given a value of 10. This has the effect of ensuring continuity of the index and avoiding distortions introduced by splits and rights. Continuity helps in comparing movements and values over time. We have, however, ignored the changes that might be introduced by changes in the index's composition.
4. Most previous research has been conducted on the S&P 500, which makes it easiest to compare our results with previous work.
5. It is readily available and has a large amount of easily obtained historical data. This provides a large amount of training data, which is essential for machine learning techniques. Ready availability also makes it easier for other researchers to use.
Even though it would be possible to consider just the value of the scrip for training
and comparison, it would not by itself be an accurate measure. The tradeoff between
perceived risk/volatility and return is the defining factor that pushes investment in and
out of the equity markets. As a measure of comparison, the risk free interest rates
available by buying government bonds and treasury bills form an important base level. In common with other researchers, we have used the 3-month government treasury bill rate.
We then derived some technical indicators from this data. The choice of
these indicators was based on 2 considerations: firstly, that they be commonly used by technical analysts and easily understandable; secondly, that there exist some previous studies of them. We chose 3 indicators as focus areas, all of which have been validated in previous studies [44]. For good measure we added 2 indicators that are widely used but not as well studied.
3.2.4 Moving Averages
We use 10-month, 6-month, and 3-month averages. In preliminary runs we discovered that the short-term indicator could be shortened further without generating false signals, so a 2-month moving average was added as well. These were pre-calculated from the monthly closing prices.
Local maxima and minima can be identified for the months previous to the current one. Because we are looking for significant resistance levels, minima and maxima are identified from the moving average values. This helps weed out erratic price movements and smooths the curve to obtain values that are more significant. In order to draw a trend line for the trading range, we need at least 2 previous values of the maxima and/or minima. Consequently, we find the 2 previous maxima and the 2 previous minima.
Trend lines are one of the analyst's most used tools. They are drawn by connecting 2 or more local minima or maxima, producing support levels and resistance levels respectively. It is possible to approximate this process by connecting the 2 previous minima/maxima. One obtains significantly better resistance levels by using the short-term moving average to draw them instead of raw price values; therefore the 2-month moving average values were used. It must be noted that using a number of different moving averages can generate a number of resistance levels and trend lines that might be useful; longer-term moving averages can result in longer-term support levels. Rate-of-change (ROC) indicators over two periods serve as measures of volatility: 3 month and 12 month. The values are pre-computed for computational efficiency.
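The pre-computation described above can be sketched as follows. This is an illustrative Python sketch with assumed function names, not the thesis code; the simple neighbour-comparison test for maxima is a simplification:

```python
def moving_average(prices, window):
    """Simple moving average; entry i averages the `window` closes ending at i."""
    return [sum(prices[i - window + 1:i + 1]) / window if i >= window - 1 else None
            for i in range(len(prices))]

def rate_of_change(prices, lag):
    """Percentage change of the close versus `lag` months earlier."""
    return [100.0 * (prices[i] - prices[i - lag]) / prices[i - lag] if i >= lag else None
            for i in range(len(prices))]

def local_maxima(series):
    """Indices of local maxima of a (smoothed) series, skipping None entries."""
    return [i for i in range(1, len(series) - 1)
            if series[i - 1] is not None and series[i + 1] is not None
            and series[i] > series[i - 1] and series[i] > series[i + 1]]

def trendline_value(series, maxima, month):
    """Extrapolate the line through the two most recent maxima before `month`."""
    prior = [i for i in maxima if i < month]
    if len(prior) < 2:
        return None
    i1, i2 = prior[-2], prior[-1]
    slope = (series[i2] - series[i1]) / (i2 - i1)
    return series[i2] + slope * (month - i2)
```

Running `moving_average` on the monthly closes gives the smoothed series from which `local_maxima` and `trendline_value` derive resistance levels, mirroring the order of computation described above.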
The need for a 10-month moving average and a 12-month ROC value means the first year of the data is unusable for training. In addition, we want the minima and maxima to be defined at all points, so we start the training period effectively at 1960. The data from 1960 to 2002 therefore has to be divided into a training and a test period. For the training to be significant, we want the in-sample period to be at least twice as long as the out-of-sample period, so we use 1960-1990 as the in-sample period and 1991-2002 as the out-of-sample period. Fig. 12 is the price chart for the in-sample period.
Figure 12: Price chart of monthly closing values from 1960-1990
3.3 Details of the Genome Structure
The operators we use for the genome have been discussed above. Here we take a closer look at the technical indicators themselves. An indicator's value matters both in comparison to past values and, to some extent, in its own level.
1. Moving averages
2. Local maxima and minima
3. Trend lines
With the exception of rate of change and volume, all the other indicators give us values that are comparable to the current price, on the same scale. This means they
can all be compared to each other. This is very much what the chartist does. He
compares moving averages, maxima/minima, trend lines, last month’s data and then
takes into consideration a separate graph of the rate of change and another graph of
the volume. For the first set of indicators, the comparisons made are mostly crossovers: the chartist waits for 2 values to cross over. For example, the current price is compared to the trend line that currently exists; if the price goes through the trend line (a crossover), it signals an important point in the market. This is why we hope that using just the comparison operators, and avoiding arithmetic operators, should prove useful.
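A crossover of this kind reduces to comparing two series on consecutive months. A minimal sketch (the function name and series layout are our assumptions):

```python
def crossed_above(series_a, series_b, t):
    """True if series A crossed above series B between month t-1 and t."""
    return series_a[t - 1] <= series_b[t - 1] and series_a[t] > series_b[t]

price =     [95, 98, 103, 101]
trendline = [100, 100, 100, 100]
# The price pierces the trend line between months 1 and 2.
print(crossed_above(price, trendline, 2))  # True
```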
For the volume and rate-of-change indicators, we again take recourse to the comparison operators, but here the patterns are not as easy to discover. For example, one could use moving averages of the volume for crossovers, and for the rate-of-change indicators it might be worthwhile using 2 different periods and looking for crossovers, though admittedly this is a rather poor way of using these 2 indicators. We therefore restrict ourselves to the 3-month and 12-month ROC, and use this month's volume and last month's volume. One might argue that a volume increase is used to substantiate a price movement, and for this purpose comparison of the current month's volume to the previous month's should serve.
Having decided to use only comparison operators, the basic structure of the rules becomes evident. For comparison we use the < and > operators, so a simple rule takes the form

(TI-1 < TI-2)

where TI-1 and TI-2 indicate 2 technical indicators. For example, TI-1 could be MA3 and TI-2 could be MA6. Of course, if TI-1 were this month's volume (V), then TI-2 must be last month's volume (v). Similarly, if ROC12 is TI-1 or TI-2, the other TI must be ROC3.
Now, to generate more complex trading rules, we need to combine these simple rules using boolean operators such as AND, OR and NOT. For example, we could have the two rules (P > MA6) and (MA3 < MA6) put together with either an AND or an OR node:

(MA3 < MA6) AND (P > MA6)
This structure can be extended by using more boolean operators to create still more complex rules, for example by placing an OR node above the AND rule shown previously and a further comparison.
This results in trees in which 2 leaf nodes and the node connecting them form a simple rule, and all the nodes above put these rules together. Each such triple processes real-number values and returns a boolean (true/false) result that is further combined by the nodes above it. The entire tree therefore always returns a True or False value for a particular month.
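The month-by-month evaluation of such a tree can be sketched with two minimal node classes. These class names and the dictionary-of-series layout are illustrative assumptions; the actual implementation is the GAlib-based node hierarchy described in Section 3.4:

```python
class Compare:
    """Leaf triple: two technical-indicator series joined by < or >."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def eval(self, indicators, t):
        a, b = indicators[self.left][t], indicators[self.right][t]
        return a < b if self.op == '<' else a > b

class Bool:
    """Internal boolean node: AND / OR / NOT over child results."""
    def __init__(self, op, *children):
        self.op, self.children = op, children

    def eval(self, indicators, t):
        vals = [c.eval(indicators, t) for c in self.children]
        if self.op == 'AND':
            return all(vals)
        if self.op == 'OR':
            return any(vals)
        return not vals[0]  # NOT

# The example rule (P > MA6) AND (MA3 < MA6), evaluated for one month.
rule = Bool('AND', Compare('>', 'P', 'MA6'), Compare('<', 'MA3', 'MA6'))
indicators = {'P': [105.0], 'MA3': [98.0], 'MA6': [100.0]}
print(rule.eval(indicators, 0))  # True
```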
3.4 Genetic Program Parameters
As defined above, the nodes are implemented as a class hierarchy of different node types, identified by the types of children they can have and the parents that can have them.
This relationship is described graphically in Fig. 16; the dotted lines indicate the permissible parent-child combinations.
3.4.2 Random Tree Initialization
The initial random trees are generated such that they have a maximum depth of 10.
For the mutation operation we have a few choices, some of which are detailed in the GAlib [45] library we used:
1. Subtree Swap Mutation: we could take a subtree and graft it elsewhere in the tree.
Figure 17: Sub tree swap mutation
This cannot be used in this case because, as can be seen, it violates the parent-child node definitions. Specifically, it would leave the left-hand-side dark node with a hanging boolean/comparison operator and would give the right-hand-side dark node a child of the wrong type.
2. Tree Node Swap Mutation: here we swap 2 nodes. The same problem as above exists, although here we can check the nodes swapped to ensure that they are of the same type.
3. Subtree Destructive Mutation: this method destroys the subtree of a random tree node. This again has the same problem as the subtree swap mutation.
Figure 19: Subtree Destructive Mutation
4. We use a fourth option. We choose a random node in the tree and then replace it with a similar node of a different type. For example, if the node we chose was a boolean operator, say AND, we could change it to OR. Similarly, '<' could be changed to '>'. Technical indicators like MA3 can be changed to MA6. We must ensure, however, that for nodes like rate-of-change we don't change the node to an indicator on a different scale.
For the crossover operation, as shown in the figure below, a random node is chosen from one of the 2 parents (say mom) and a random node is chosen from dad. These subtrees are then swapped.
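The type-preserving replacement mutation (option 4) can be sketched as a lookup of same-type alternatives. The pools below are illustrative assumptions and are far from the full node set:

```python
import random

# Illustrative pools of same-type replacements; swapping within a pool
# keeps the tree well formed (the constraint the other schemes violate).
SAME_TYPE = {
    'AND': ['OR'], 'OR': ['AND'],
    '<': ['>'], '>': ['<'],
    'MA3': ['MA2', 'MA6', 'MA10'], 'MA6': ['MA2', 'MA3', 'MA10'],
}

def point_mutate(node_label, rng=random):
    """Replace a node label with a randomly chosen same-type alternative.

    Nodes with no listed alternatives (e.g. a ROC node that must stay
    on its own scale) are left unchanged.
    """
    choices = SAME_TYPE.get(node_label)
    return rng.choice(choices) if choices else node_label

print(point_mutate('AND'))  # OR
```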
Here as well, one must watch out for the types of nodes chosen: they must be of the same type.
The objective function uses the technical trading rule to carry out trades to achieve the required objective. Initially, we start out by attempting to maximize the total return over the training period:
1. Initialize:
b. Buy = true
c. InTheMarket = false
d. Amount = 1000
The action matrix for the tree can be described as in the following table:

Rule returns:         True         False
Out of the market     BUY          Do Nothing
In the market         Do Nothing   SELL

Transaction costs are charged for each transaction (buy/sell). The standard transaction cost of 0.5% is used.
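The resulting trading loop can be sketched as follows. This is a simplified monthly model (trades executed at month end) with the 0.5% transaction cost assumed elsewhere in the thesis; the function name is ours:

```python
def simulate(rule_signals, monthly_returns, cost=0.005, amount=1000.0):
    """Trade on a stream of monthly True/False rule signals.

    True while out of the market -> buy; False while in -> sell.
    Each buy or sell pays a proportional transaction cost; while in
    the market the amount tracks the index's monthly return.
    """
    in_market = False
    for signal, ret in zip(rule_signals, monthly_returns):
        if in_market:
            amount *= 1.0 + ret          # ride the market this month
        if signal and not in_market:
            amount *= 1.0 - cost         # buy
            in_market = True
        elif not signal and in_market:
            amount *= 1.0 - cost         # sell
            in_market = False
    return amount

# Buy at the end of month 0, hold through months 1-2, sell at the end.
final = simulate([True, True, False], [0.10, 0.10, 0.10])
```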
3.5 Experiments
In this section we describe the hypotheses to be tested and the experiments designed to test them. These tests include the effect of the complexity-penalizing factor, the new evaluation function, and the paired and co-evolved strategies.
3.5.1 Experiment – I (The complexity-penalizing factor)
As mentioned in Chapter 2, there have been some results suggesting that reducing the size of the tree may or may not decrease its performance. Our first hypothesis is that penalizing complexity will not decrease, and may improve, predictive performance.
We therefore run the GP with the factor and without, and study the results. We use the following parameters:
1. Steady-state GP
6. Half the population is replaced at each generation; the half replaced is the less fit half.
9. Objective function
10. Factor used in one set of runs: objective score × (5 / max(TreeDepth, 5))
The best tree at the end of each run is saved and run on the out-of-sample data. A total of 10 runs, each starting with a different seed, is made. This ensures that the initial populations of both sets of runs are the same.
3.5.2 Experiment – II (Different Evaluation function)
From the results of Experiment I it was noted that some of the rules generated smaller returns on the test period as the returns on the training period improved. It was thought that this might be because of a tendency to extract maximum profit from the training period, but not necessarily in a consistent manner throughout the period. Also, an inordinate amount of weight is given to the initial years in the training period, because the returns generated there are carried over to investments in the latter part of the period; rules that do well only at the end of the period may not make enough to compensate relative to rules that do well at the start.
Therefore a different evaluation function was proposed. This function works as follows:
1. The amount in hand or in the market is reset to 1000 at the end of each year.
2. The fitness value is defined not as the return, but as the number of years in which the rule gave returns better than or equal to both the buy-and-hold return for that year and the risk-free interest return for that year.
To verify the hypothesis that this function yields better and more consistently performing trees, we ran 10 runs of each and compared the results. The parameters were:
1. Steady-state GP
4. Crossover rate: 0.8
6. Half the population is replaced at each generation; the half replaced is the less fit half.
9. Objective function
a. Value
10. Factor used in both sets of runs: objective score × (5 / max(TreeDepth, 5))
The best tree at the end of each run is saved and run on the out-of-sample data. A total of 10 runs, each starting with a different seed, is made. This ensures that the initial populations of both sets of runs are the same.
3.5.3 Experiment – III (Using Paired buy and sell rules)
In order to test whether using two trees, one specialized for buying and one for selling, improves performance, we used buy-sell pairs as the population. Each species (buy and sell) evolves separately. This affects the crossover operator: crossovers can now only be carried out between two similar subtrees (buy-to-buy, sell-to-sell). This ensures that the 2 subtrees remain specialized.
The tree structure remains the same as for Experiment I but for a small change: we introduce a combination node at the root that combines the two subtrees below it, as shown in Fig. 21. The left subtree is designated as the buy tree and the right subtree as the sell tree. The combination node can be designed to work in a couple of ways.
Figure 21: A combined tree, with a Combine node at the root joining the buy subtree and the sell subtree
Consider, for example, an action table in which the stock is bought only when the buy rule is true and the sell rule is false, and sold only when the sell rule is true and the buy rule is false. For this rule the combination node would work as an XOR node:
Buy → (Buy rule XOR Sell rule) AND (Buy rule = true)
Sell → (Buy rule XOR Sell rule) AND (Sell rule = true)
This method has the advantage of obtaining a very clear signal and confirmation from
both rules for action. On the other hand it is difficult to generate successful rules
because the buy rule has to know about the sell rule and vice versa to develop
effectively. In a way this is also equivalent to a voting scheme of 2 rules, which can be derived from this pair by inverting the truth-value of the sell rule. Therefore we use an alternate arrangement in which we just consult the buy rule for buying decisions and the sell rule for selling decisions. Effectively, we decide which tree to consult depending on the current state and ignore the other tree at that time.
State   Buy rule   Sell rule   Action
Out     True       X           BUY
Out     False      X           Do Nothing
In      X          True        SELL
In      X          False       Do Nothing
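The state-based consultation scheme follows directly from this table and can be sketched as:

```python
def decide(in_market, buy_signal, sell_signal):
    """State-dependent combination: consult only the relevant tree.

    Out of the market, only the buy tree is read; in the market, only
    the sell tree.  The other tree's output is ignored at that time.
    """
    if not in_market:
        return 'BUY' if buy_signal else 'DO_NOTHING'
    return 'SELL' if sell_signal else 'DO_NOTHING'

print(decide(False, True, False))  # BUY
print(decide(True, True, True))    # SELL (buy tree ignored while in the market)
```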
In order to compare the performance of the co-evolved trees with the single-rule trees, we need to run both GPs and compare the results. From the results of Experiment I, we know that the complexity-penalizing factor works, so we use the factor in both sets of runs.
Once again the parameters are much the same as above:
1. Steady-state GP
5. Mutation rate: 0.1
6. Half the population is replaced at each generation; the half replaced is the less fit half.
9. Objective function
The best tree at the end of each run is saved and run on the out-of-sample data. A total of 10 runs, each starting with a different seed, is made. In this case the initial trees produced are not the same.
In Experiment IV we separately evolve populations of buy and sell rules and compare them to the paired trees. Section 2.2 describes the different ways collaborators can be chosen from the other species. Consider Fig. 22 below:
Figure 22: Overview of co-evolution strategies
In Experiment – III we compared the results of the Single Rule and the Buy-Sell Paired configurations. Now we consider separately evolving populations of buy and sell rules, where collaborators have to be chosen for evaluating the performance of the trees. Of the possible schemes, we test choosing the best 5 and choosing 5 at random. For example, for evaluating the buy trees, we choose the best 5 sell trees based on the evaluation function. For the first generation, when fitness values have not yet been assigned, we choose random collaborators. Then each buy tree is evaluated with all
the 5 sell trees and the average value is assigned to it. The process is repeated for the sell trees. Evaluating the trees with different collaborators may help in isolating a tree from the ill effects of a bad collaborator. For example, in a paired configuration a well-performing buy tree may be paired with an ill-performing sell tree and never get credit for its 'goodness'. Choosing a variety of collaborators may mitigate this.
In addition, because trees that are similar to each other tend to have the same fitness levels, choosing the best 5 may result in similar trees. To avoid this, we also replace all the trees except one, instead of just half the population, at every step.
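The collaborator-based evaluation can be sketched as below; `paired_fitness` stands in for the full trading evaluation of a buy-sell pair, and all names are our assumptions:

```python
def evaluate_with_collaborators(tree, collaborators, paired_fitness):
    """Average a tree's fitness over several collaborators from the other species."""
    scores = [paired_fitness(tree, c) for c in collaborators]
    return sum(scores) / len(scores)

def best5(population, fitness_of):
    """Top five members of the opposite species by last-generation fitness."""
    return sorted(population, key=fitness_of, reverse=True)[:5]

# Toy example: identity fitness; the paired fitness just sums the two ids.
print(best5([1, 5, 3, 2, 4, 6], lambda x: x))                          # [6, 5, 4, 3, 2]
print(evaluate_with_collaborators(10, [1, 2, 3], lambda t, c: t + c))  # 12.0
```

Averaging over several collaborators is what shields a good buy tree from a single bad sell partner, at the cost of five evaluations per tree instead of one.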
The hypothesis, therefore, is that the unpaired co-evolving trees are better than the paired trees. We compare three configurations:
1. Best 5 co-evolved
2. Random 5 co-evolved
3. Paired buy-sell
1. Steady – state GP
4. Crossover rate: 0.8
6. Population replacement:
a. Half the population is replaced at each generation for the paired trees.
b. All but one tree is replaced for both species in the separately coevolving GP.
9. Objective function
The best tree at the end of each run is saved and run on the out-of-sample data. A total of 10 runs, each starting with a different seed, is made. In this case the initial trees produced are the same for the BEST5 and RANDOM5 runs.
4 Results
This chapter describes the results of the experiments described in Chapter 3. The results are grouped by experiment number, and the conclusions of each experiment are summarized.
Table 4: Results for experiment I - 1
Without Factor
Size  Depth  1960-1990  1991-2002
41    11     16832      2023
45    11     16367      1977
In comparison to the buy-and-hold strategy:
1960-1990 1991-2002
The entries in green indicate the values that beat the buy-and-hold. The last line of Table 4 gives the averages of the size, depth, in-sample return and out-of-sample return.
It is easily seen that the complexity of the trees is indeed a problem. The smallest tree has a depth of 11 and size 41, and the largest are almost 1000 nodes with a depth of approximately 90. This also has an effect on predictive performance: even though the trees perform on average about 3 times as well as buy-and-hold on the in-sample data, this does not carry over to the out-of-sample data, where the performance is worse than buy-and-hold. In fact, only 2 of the rules discovered beat the buy-and-hold on the out-of-sample data.
The results for the runs with the complexity-penalizing factor are:
With Factor
Size  Depth  1960-1990  1991-2002
15 5 10976 3128
15 5 13690 2006
3 2 8762 3377
15 5 13981 2003
3 2 8762 3377
12 5 14788 1685
15 4 15078 2096
3 2 8762 3377
15 5 9128 3697
3 2 8762 3377
10 4 11269 2812
Once again, the last row indicates the average of each column. Compared to Table 4, the results are much more promising. The average for the in-sample data is only twice as much as buy-and-hold, which is indicative of the fact that the rules have not overfit the data. This is evident from the out-of-sample average, which beats the buy-and-hold, indicating that the main trends in the data have been identified and overfitting has been considerably reduced. The results are significant at about the 96% level. The average size and depth also indicate very clearly that the rules are much more comprehensible. These results clearly bear out the hypothesis that the addition of a complexity-penalizing factor improves out-of-sample performance.
In this section we study the results of running the GP with the newly proposed evaluation function, which measures the number of years the GP-produced rule beats both the buy-and-hold and risk-free returns.
The 10 runs with the new evaluation function return the following values:
1960-1990 1991-2002
7705 2064
7541 2166
7143 2654
7143 2654
6916 2953
7968 2982
7131 3184
7644 3437
7110 3691
8126 3983
7442.7 2976.8
Once again, the last row represents the averages. As can be seen, the average for the out-of-sample period has increased. More importantly, the results are consistently better: only 2 of the 10 rules perform worse than the buy-and-hold.
We can conclude that this evaluation function does no worse than the first while producing more consistent returns.
Here we compare the results of using a specialized buy tree and a specialized sell tree.
Table 8: Results for experiment III - 1
Paired Rule
1960-1990 1991-2002
10816 2807
12374 2823
11818 2837
19861 2856
12667 2911
11530 3258
22074 3434
18467 3475
14957 3476
10646 3541
14521 3141.8
Comparing these results with Table 7, we see significant improvement in the averages for both the out-of-sample period and the in-sample period. In addition, we note that all the rules produced were consistently better than the buy-and-hold; the rules beat the buy-and-hold with 99% significance.
We conclude that the specialization of function obtained by splitting the single rule into two rules helped both the predictive power and the data-fitting capability.
The next question is whether choosing varying collaborators from the other species performs differently from fixed-pair coevolution. The results of the runs using 5 random collaborators from the other species are tabulated below:
Table 9: Results for experiment IV - 1
Buy + Sell
Random 5
1960-1990 1991-2002
12757 2438
13832 2742
12911 2807
15243 2807
14747 2837
12077 2842
16249 3059
16117 3374
17746 3376
6233 3876
13791.2 3015.8
The average for the out-of-sample period beats the buy-and-hold, and only one rule does worse than it. The results are statistically significant at the 99% level for beating the buy-and-hold. In comparison with the paired rule results (Table 8), we don't find much difference.
For the runs with the Best 5, we have the following results:
Buy + Sell
Best 5
1960-1990 1991-2002
18898 2399
18795 2527
21525 2567
14525 2807
12893 2807
14819 2837
14809 2971
14377 3036
12893 3222
13820 3256
15735.4 2842.9
As can be seen, these perform somewhat better in-sample. They still manage to beat the buy-and-hold on average, but not as consistently. It might be conjectured that choosing only the best collaborators reduces diversity and encourages overfitting to the in-sample data.
5 Conclusions & Future Work
This chapter outlines the conclusions from the results of the experiments and suggests directions for future work.
5.1 Conclusions
This thesis has shown that it certainly seems possible to beat the buy-and-hold strategy on the S&P 500 consistently using GP, even when considering the rather high transaction costs of 0.5%. This is at odds with previous work, in which the GP-produced rules do not consistently beat the S&P 500. It is significant because it suggests that the markets may not be as efficient as they are thought to be and that there is structure in the price series that trading rules can exploit.
In addition, it has been shown that reducing the size of the produced rule does not decrease performance; if anything it improves it. In other words, for this application domain at least, simplicity in the size of the rule is compatible with comprehensible trading rules that outperform buy-and-hold strategies. This might be an indication that the use of arithmetic operators adds too much complexity to be worthwhile.
The choice of technical indicators was justified because a large number of generated
rules used the trend line indicators, the moving averages and previous maxima and
minima.
We have also seen that the proposed annual evaluation function might be a better alternative to the commonly used plain return-maximization function, as it produced more consistent returns.
This is also important in building investor confidence. For example, a rule that made
a large profit in a 10 year period but lost money consistently in the first 5 years would
perhaps lose the investor’s trust in the first 5-year period and not be used over the 10-
year period to make up for the loss. Consistent and frequent positive returns are
important in developing this trust. This is similar in some sense to the evaluation of
money managers, whose performance is measured by how many years they have beaten the market.
By separating the buy and sell rules and evolving them separately, we obtain much better and certainly more consistent performance. In the comparison between the paired-rule, Random 5 and Best 5 methods there was not much difference, but the results seem to indicate that the paired rules perform the best: they beat the buy-and-hold strategy with 99% significance. It appears that in the tradeoff between diversity and stability of credit assignment, stability may be more important in this domain. This technique may be applicable to other
domains that are traditionally thought of as having a single undifferentiated solution. For such domains at least, one should consider using fixed collaborators from the other species.
5.2 Future Work
Staged Priming
In the co-evolution schemes, the initial trees we consider are random trees. Instead, we might use the rules generated by one of the schemes to jump-start the others; for example, we could use the rules generated by the paired-rule scheme to seed the separately coevolving populations.
Mixed collaborator selection
So far the choice of collaborators has been Best 5 and Random 5. It is possible to use a best-of-both-worlds strategy by using some best and some random trees as collaborators. This remains to be tested.
Competitive Coevolution
Competitive coevolution could also be applied to trading rules. In a manner similar to Hillis' [13] work on sorting networks, we could
have two species: one of rules and the other consisting of a block of years from the
training period. Then as the rules evolved to perform well on the set of years, the set
of years would evolve to become more difficult to perform on. This would lead to a
rule that did well on a variety of years and this might carry over to the unseen testing
sample.
Voting schemes
Since different runs generate different successful rules, it might be profitable to attempt a voting scheme where different rules vote on the decision to be made.
Appendix
For depth = 2:
There are 13 indicators, apart from the rate-of-change and volume indicators, that can be directly compared. Not counting the ROC and volume indicators, we have C(13, 2) = 78 pairs of indicators, and with the 2 comparison operators this gives 156 possible simple rules.
For trees of depth n > 2, the last 2 levels are made up of triples of 2 technical indicators and a comparison operator. There are 2^(n-2) such triples and 156 possibilities for each, so the last 2 levels contribute 156^(2^(n-2)) combinations. The nodes above them are made up of boolean operators (AND, OR, NOT), and there are 2^(n-2) - 1 such nodes. Ignoring NOT, which has only one subtree, each can be AND or OR, giving 2^(2^(n-2)-1) combinations. The number of trees of depth n is therefore
[156^(2^(n-2))] × [2^(2^(n-2)-1)]
Depth Number of possible trees
2 156
3 48672
4 4737927168
5 44895907698545000000
It must be noted that to find all trees with maximum depth 5 we have to sum the
numbers of possible trees from n=2 to n=5. The large numbers of trees indicate clearly
that GP can find an efficient solution quickly and cheaply compared to exhaustive
enumeration. For example, a GP with a population of 500 run for 150 generations,
replacing half the population at every generation, has to evaluate only
500*150/2 = 37,500 trees.
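The counting formula can be checked with a short script (Python is our choice here; the appendix itself does not specify a language):

```python
def num_trees(n):
    """Number of possible rule trees of exactly depth n (n >= 2).

    The bottom two levels hold 2**(n-2) triples (two indicators plus a
    comparison operator), each with 156 possibilities; the 2**(n-2) - 1
    Boolean nodes above them each have 2 choices (ignoring NOT)."""
    triples = 2 ** (n - 2)
    return 156 ** triples * 2 ** (triples - 1)

# Reproduce the table and the cumulative total up to depth 5.
for n in range(2, 6):
    print(n, num_trees(n))
print("all trees up to depth 5:", sum(num_trees(n) for n in range(2, 6)))
```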
References
[1] Holland, J.H. 1975. Adaptation in Natural and Artificial Systems. Ann Arbor, MI:
The University of Michigan Press. 2nd edition, 1992.
[4] Andre, D., Bennett III, F.H., & Koza, J.R. 1996. Discovery by genetic programming
of a cellular automata rule that is better than any known rule for the majority
classification problem. In Genetic Programming 1996: Proceedings of the First Annual
Conference, July 28-31, 1996. MIT Press.
[6] Calderoni, S. & Marcenac, P. 1998. Genetic programming for automatic design of
self-adaptive robots. In Genetic Programming: Proceedings of the First European
Workshop, Paris, France. Springer.
[8] Futuyma, D.J. & Slatkin, M. 1983. Coevolution. Sunderland, MA: Sinauer.
[9] Janzen, D.H. 1980. When is it coevolution? Evolution 34:611-612.
[11] Kauffman, S.A. & Johnsen, S. 1992. Co-evolution to the edge of chaos: Coupled
fitness landscapes, poised states, and co-evolutionary avalanches. In C.G. Langton,
C. Taylor, J.D. Farmer & S. Rasmussen (Eds.), Artificial Life II: Proceedings of the
Second Artificial Life Workshop, 325-369. Addison-Wesley.
[15] Axelrod, R. 1987. The evolution of strategies in the iterated prisoner's dilemma.
In L.D. Davis (Ed.), Genetic Algorithms and Simulated Annealing, 32-41. New York:
Morgan Kaufmann.
[16] Cliff, D. & Miller, G.F. 1995. Tracking the Red Queen: Measurements of adaptive
progress in coevolutionary simulations. In Moran, F., Moreno, A., Merelo, J.J. &
Cachon, P. (Eds.), Advances in Artificial Life: Proceedings of the Third European
Conference on Artificial Life (ECAL95), Lecture Notes in Artificial Intelligence
929:200-218. Springer Verlag.
[18] Phelps, S., Parsons, S., McBurney, P., & Sklar, E. 2002. Co-evolution of auction
mechanisms and trading strategies: Towards a novel approach to microeconomic design.
In Proceedings of the ECOMAS-2002 Workshop on Evolutionary Computation in Multi-Agent
Systems, at the Genetic and Evolutionary Computation Conference (GECCO-2002).
[19] Allen, F. & Karjalainen, R. 1999. Using genetic algorithms to find technical
trading rules. Journal of Financial Economics 51:245-271.
[20] Neely, C., Weller, P., & Dittmar, R. 1997. Is technical analysis in the foreign
exchange market profitable? A genetic programming approach. Journal of Financial and
Quantitative Analysis 32:405-426.
[21] Pazzani, M., Mani, S., & Shankle, W.R. 1997. Beyond concise and colorful:
Learning intelligible rules. In Proceedings of the Third International Conference on
Knowledge Discovery and Data Mining, 235-238. Newport Beach, CA: AAAI Press.
[22] Miller, G.A. 1956. The magical number seven, plus or minus two: Some limits on
our capacity for processing information. Psychological Review 63:81-97.
[23] Laird, J.E., Rosenbloom, P.S., & Newell, A. 1986. Chunking in Soar: The anatomy
of a general learning mechanism. Machine Learning 1:11-46.
[24] Quinlan, J.R. 1987. Simplifying decision trees. International Journal of
Man-Machine Studies 27:221-234.
[26] Buntine, W. & Niblett, T. 1992. A further comparison of splitting rules for
decision-tree induction. Machine Learning 8:75-86.
[27] Clark, P. & Niblett, T. 1989. The CN2 induction algorithm. Machine Learning
3:261-283.
[28] Mingers, J. 1989. An empirical comparison of pruning methods for decision tree
induction. Machine Learning 4:227-243.
[29] Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M.K. 1987. Occam's Razor.
Information Processing Letters 24:377-380.
[31] Domingos, P. 1999. The role of Occam's Razor in knowledge discovery. Data Mining
and Knowledge Discovery 3:409-425.
[32] Jensen, D. & Cohen, P.R. 2000. Multiple Comparisons in Induction Algorithms.
Machine Learning, 38:309-338.
[33] Bigalow, S.W. 2001. Profitable Candlestick Trading: Pinpointing Market
Opportunities to Maximize Profits. John Wiley & Sons, 1st edition.
[34] Cowles, A., 3rd. 1933. Can stock market forecasters forecast? Econometrica
1(3):309-324.
[35] Brown, S., Goetzmann, W., & Kumar, A. 1998. The Dow Theory: William Peter
Hamilton's track record reconsidered. Journal of Finance 53(4):1311-1333.
[38] Treynor, J. & Ferguson, R. 1985. In defense of technical analysis. Journal of
Finance 40(3):757-773. Papers and Proceedings of the Forty-Third Annual Meeting of
the American Finance Association, Dallas, Texas, December 28-30, 1984.
[41] Frankel, J.A. & Froot, K.A. 1990. Chartists, fundamentalists, and trading in the
foreign exchange market. The American Economic Review 80(2):181-185. Papers and
Proceedings of the Hundred and Second Annual Meeting of the American Economic
Association.
[42] Neftci, S.N. 1991. Naive trading rules in financial markets and
Wiener-Kolmogorov prediction theory: A study of technical analysis. Journal of
Business 64(4):549-571.
[44] Bojarczuk, C.C., Lopes, H.S., & Freitas, A.A. 2001. Data mining with
constrained-syntax genetic programming: Applications in medical data sets. In
Proceedings of the Intelligent Data Analysis in Medicine and Pharmacology Workshop at
MedInfo-2001, London, September 2001.
[45] http://lancet.mit.edu/ga/