


Simultaneous optimization of artificial
neural networks for financial forecasting

Kyoung-jae Kim & Hyunchul Ahn

Applied Intelligence
The International Journal of Artificial
Intelligence, Neural Networks, and
Complex Problem-Solving Technologies

ISSN 0924-669X
Volume 36
Number 4

Appl Intell (2012) 36:887-898


DOI 10.1007/s10489-011-0303-2

Appl Intell (2012) 36:887–898
DOI 10.1007/s10489-011-0303-2

Simultaneous optimization of artificial neural networks for financial forecasting
Kyoung-jae Kim · Hyunchul Ahn

Published online: 3 June 2011


© Springer Science+Business Media, LLC 2011

Abstract  Artificial neural networks (ANNs) have been popularly applied for stock market prediction, since they offer superlative learning ability. However, they often result in inconsistent and unpredictable performance in the prediction of noisy financial data due to the problems of determining factors involved in design. Prior studies have suggested genetic algorithm (GA) to mitigate the problems, but most of them are designed to optimize only one or two architectural factors of ANN. With this background, the paper presents a global optimization approach of ANN to predict the stock price index. In this study, GA optimizes multiple architectural factors and feature transformations of ANN to relieve the limitations of the conventional backpropagation algorithm synergistically. Experiments show our proposed model outperforms conventional approaches in the prediction of the stock price index.

Keywords  Simultaneous optimization · Genetic algorithms · Artificial neural networks · Stock market prediction

K.-j. Kim
Department of Management Information Systems, Dongguk University_Seoul, 3-26 Pil-dong, Chung-gu, Seoul 100-715, Korea
e-mail: [email protected]

H. Ahn
School of Management Information Systems, Kookmin University, 861-1, Jeongneung-dong, Seongbuk-gu, Seoul, 136-702, Korea
e-mail: [email protected]

1 Introduction

Artificial neural networks (ANNs) have been popularly applied to financial problems, including stock market prediction, bankruptcy prediction, and corporate bond rating. ANN demonstrates superlative learning ability. However, it often results in inconsistent and unpredictable performance in the prediction of noisy financial data. In addition, sometimes the amount of data is so large that results from the learning process are unsatisfactory. The existence of continuous data and large numbers of records may pose a challenge to explicit concept extraction from raw data due to the huge solution space determined by continuous features [27]. The reduction and transformation of irrelevant and redundant features may shorten the run time of reasoning and yield more generalizable results [10].

Another reason for this inconsistent and unpredictable performance is the problems associated with the ad hoc nature in designing ANN. The feature subset, the number of processing elements in each layer, the connection weights, and other architectural factors are determined in advance for the ANN modeling process. However, determining these factors is still an art. These factors were usually determined by trial and error and the subjectivity of the designer. This may lead to a locally optimized solution, because it cannot guarantee a global optimum [31, 40]. This is a common restriction of all machine learning algorithms. Even simple algorithms, such as logistic regression and k nearest neighbor, require a parameter configuration process [56, 61]. However, ANN has many more parameters to be set by heuristics compared to other machine learning algorithms.

Many approaches such as early stopping, cross validation, regularization, addition of noise, and averaging multiple models in an ensemble have been proposed to mitigate the local optimization problems. Prior studies have proposed
a heuristic approach, such as a genetic algorithm (GA), to relieve the ad hoc nature of designing ANN without the local optimization problem. These techniques seem more suited to difficult nonlinear optimization problems, since obtaining an optimal solution is the goal of ANN training [45].

In this paper, we propose a new optimization model for ANN using GA. It simultaneously optimizes four major architectural factors of ANN: connection weights, the number of neurons in the hidden layer, feature subset selection, and feature transformation. Studies on the optimization of each factor show that optimization improves the performance of ANNs (see [12, 14, 17, 32, 34, 37, 40, 41, 53, 59]). A few studies apply simultaneous optimization of multiple factors to ANNs [20–24, 28, 35]. These studies have shown that the simultaneous optimization of multiple factors outperforms the optimization of a single factor. With this in mind, the research model of our paper is designed to globally optimize all four major architectural factors. This study uses the proposed model to predict the future direction of change in the Korea Composite Stock Price Index (KOSPI).

The remainder of the paper is organized as follows: The next section reviews prior studies on the determination of architectural factors for designing ANN. Basic concepts of GA are also briefly introduced in this section. Section 3 proposes the simultaneous optimization approach using GA for ANN, describing the benefits of the proposed model. Section 4 summarizes and discusses the empirical results. Finally, Sect. 5 presents the conclusions and limitations of the study.

2 Literature review

2.1 Determination of architectural factors for designing ANN

Zhao and Higuchi [62] suggest that the following three problems must be solved to design efficient ANN: determination of the number of hidden layers, the number of processing elements in the hidden layers, and the connection weights between the layers. In practice, feature subset selection and feature transformation also need to be considered. Appropriate feature subset selection selects a subset of features that are relevant to the target concept, and this helps to generalize the ANN model and improve prediction accuracy [10, 39]. In addition, feature transformation may improve prediction accuracy and generalization ability by transforming an independent feature to maximize its association with the values of dependent and other independent features [10, 23].

Although many ANN studies have used backpropagation (BP) neural networks, most of them do not have methods that are appropriate to determine these factors. The present discussion reviews these architectural factors.

2.1.1 Connection weights

The learning process in ANN is implemented by adjusting the connection weights until the desired response is attained at the output processing elements. Many ANN studies relied on the gradient descent algorithm, typically a variation of BP [36], to obtain the connection weights of the model. In this type of ANN, the error signal from the error function is propagated back through the network from the output processing elements, changing the connection weights in proportion to the error. However, Sexton et al. [40] and Sexton et al. [44] pointed out that the gradient descent algorithm may perform poorly, even on simple problems, when predicting the holdout data. Their indication stems from the fact that BP is a local search algorithm and may tend to fall into a local optimum.

Sexton et al. [41] suggested that one of the most promising directions for handling this limitation is using global search techniques to search the weight vector of ANN. Sexton et al. [40] employed Tabu search to optimize the network and found that Tabu search-derived solutions were significantly superior to BP solutions for the test data. In another paper, Sexton et al. [42] incorporated simulated annealing, one of the global search algorithms, to optimize the network. They compared GA to simulated annealing and concluded that GA outperformed simulated annealing for their experiments.

Sexton et al. [41] and Sexton et al. [44] also employed GA and a modified GA (MGA) to search the weight vector of ANN. The results showed that the GA-derived solution was superior to the corresponding BP solution. Ignizio and Soltys [21], and Gupta and Sexton [17] also suggested that GA-derived solutions are superior to gradient descent algorithm-derived solutions.

2.1.2 Feature subset selection for ANN

Feature subset selection tries to pick a subset of features relevant to the target concept and remove the irrelevant or redundant features [10]. Feature subset selection may reduce the run time. For ANN, generalization by feature subset selection refers to the ability of ANN to predict estimates from patterns that have not been seen by the network [39].

The function computed by ANN is determined by its topology, as well as the computations performed by individual processing elements; thus, selecting an appropriate feature subset is critical. However, there are few formal techniques in the gradient descent algorithms to select a relevant feature subset [8].

Traditional selection techniques associated with linear methods, including linear regression and discriminant analysis, provide relevant variables that have a strong linear relationship with the dependent variable. However, the variables selected by them may not be relevant in ANN, which
presents non-linear relationships between independent variables and a dependent variable [54]. As an alternative, GA is increasingly used in various classifiers including ANN [34, 59], inductive learning [3, 52], and linear regression [55]. GA has the following advantages: it selects feature subsets that are relevant to the specific fitness function of the application. In addition, GA does not require the restrictive monotonicity assumption and readily lends itself to the use of multiple selection criteria [59].

2.1.3 The number of processing elements in the hidden layer

Adding processing elements to the hidden layer may enhance the computational abilities of ANN, because using too few hidden processing elements may starve the network of the resources it needs to solve the complex problem [30]. However, in some cases, models with an excessive number of processing elements in the hidden layer suffer from overfitting by learning the insignificant aspects of the known data [18]. In this respect, some researchers suggested that an ANN with a parsimonious topology could yield consistent performance on unknown data.

In practice, forward selection and backward elimination methods have been used to determine the number of processing elements in the hidden layer. The forward selection method starts with a small number of hidden processing elements, and then adaptively increases the number of hidden processing elements during training. The backward elimination method starts with a large network and reduces the number of hidden processing elements until the proper topology is obtained. While many forward selection and backward elimination methods have been proposed, these methods are usually quite complex, difficult to implement, and cannot guarantee optimal topology selection [8]. Some researchers proposed GA as a technique to determine the optimal topology of ANN. Maniezzo [28], Williamson [57], and other recent research [19, 25] have used GA as the method to determine the optimal topology of ANN. In addition, Yao and Liu [60] suggested a new evolutionary algorithm to optimize ANN's architecture and connection weights. Their system, EPNet, was based on Fogel's evolutionary programming (EP). They empirically showed that EP outperforms GA.

2.1.4 Feature transformation for ANN

When features are loaded into ANN, they must be preprocessed from their numeric range into the numeric range that ANN deals with efficiently. Linear scaling is popularly used as a data preprocessing method for ANN. It is a simplistic method of data preprocessing, but it does not consider the association among each independent and dependent feature [15]. Consequently, it may lead to low prediction performance, since prediction performance is enhanced through the ability of discrimination, not only by a single feature but also by the association among features.

Conversely, feature transformation is one of the most popular preprocessing methods. Feature transformation is the process of creating a new set of features [26]. Feature transformation methods are classified as endogenous and exogenous. Endogenous methods do not consider the value of dependent features, while exogenous ones do [13, 38, 50]. Linear scaling is an example of an endogenous method. In contrast, the exogenous methods transform an independent feature to maximize its association with the values of dependent and other independent features. The entropy minimization heuristic in inductive learning and the k-nearest neighbor method [15, 29, 51], and GA for C4.5 and ANN [23, 52] are classified as exogenous methods. GA is also popularly applied to feature transformation, because it transforms the features according to their specific fitness function and performs well considering the association among each independent and dependent feature.

In this study, the process of feature transformation considers the conversion of continuous variables into discrete ones (i.e. feature discretization). It reduces the dimensionality of the feature space, which not only decreases the operation's cost and time, but also enhances the generalization ability of the classifier.

2.2 Genetic algorithms

This study uses GA to optimize ANN. GA is selected from the many global search algorithms because it has been shown to achieve better solutions in ANN compared to other global search algorithms, such as simulated annealing [42]. GA is usually employed to enhance the performance of AI techniques. GA has been applied to the selection of neural network topology, including optimization of the relevant feature subset and determination of the optimal number of hidden layers and processing elements for ANN. GA has been investigated recently and shown to be effective in exploring a complex space adaptively, guided by the biological evolution mechanisms of selection, crossover, and mutation [2]. This algorithm uses natural selection, survival of the fittest, to solve optimization problems.

The first step of GA is problem representation. The problem must be represented in a suitable form to be handled by GA. Thus, the problem is described in terms of genetic code, like DNA chromosomes. GA often works with a form of binary coding. Once the problems are coded as chromosomes, the populations are initialized. Each chromosome within the population gradually evolves through biological operations. Although there are no general rules to determine the population size, population sizes of 100–200 are commonly used
in GA research. Once the population size is chosen, the initial population is randomly generated [4]. Each chromosome is evaluated by a fitness function after the initialization step. The chromosomes associated with the fittest individuals will be reproduced more often than those associated with unfit individuals, based on the value of the fitness function [11].

GA works with three operators that are used iteratively. The selection operator determines which individuals may survive [58]. The crossover operator allows the search to fan out in diverse directions looking for attractive solutions and permits the combination of genetic material from different parents in a single child. Three popular crossover methods are single-point, two-point, and uniform crossover. The single-point crossover makes only one cut in each chromosome and selects two adjacent genes on the chromosome of a parent. The two-point crossover involves two cuts in each chromosome. The uniform crossover allows two parent strings to produce two children; it permits great flexibility in the way strings are combined. In addition, the mutation operator arbitrarily alters one or more components of a selected chromosome. Mutation randomly changes a gene on a chromosome. It provides the means to introduce new information into the population. Finally, GA tends to converge on an optimal or near-optimal solution through these operators [32].
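As an illustration of these operators, the sketch below implements fitness-proportionate (roulette-wheel) selection, uniform crossover, and gene-wise mutation with NumPy. The paper does not spell out its exact selection scheme or default gene ranges, so the function names, the roulette-wheel rule, and the value bounds are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def roulette_select(population, fitnesses):
    """Fitness-proportionate selection: fitter chromosomes are drawn more often."""
    p = fitnesses / fitnesses.sum()
    return population[rng.choice(len(population), p=p)]

def uniform_crossover(parent_a, parent_b):
    """Uniform crossover: each gene of the two children comes from either parent at random."""
    mask = rng.random(parent_a.size) < 0.5
    return np.where(mask, parent_a, parent_b), np.where(mask, parent_b, parent_a)

def mutate(chromosome, rate, low=-5.0, high=5.0):
    """Mutation: each gene is replaced by a random value in [low, high] with probability `rate`."""
    out = chromosome.copy()
    hit = rng.random(chromosome.size) < rate
    out[hit] = rng.uniform(low, high, hit.sum())
    return out
```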
3 Simultaneous optimization using GA for ANN

As explained in the previous section, many studies have tried to optimize a subset of the four major factors stated in Sect. 2.1. Table 1 summarizes previous studies on hybrid models between GA and ANN.

Performance improvement is observed when these factors are optimized, but it is a partial optimization. However, if all of these factors are considered for optimization, then we expect to achieve considerably better performance, because optimizing all factors synergistically may lead to global optimization as a whole. This paper proposes a novel optimization model for ANN that mitigates these limitations by optimizing all four major factors.

This study employs three modules, described below, to address the limitations of previous studies.

Module 1  Network topology (input and hidden processing elements) optimization module
Module 2  Connection weight optimization module
Module 3  Feature transformation module

The first module uses GA to select the optimal network topology. This module consists of the feature selection sub-module and the selection sub-module for hidden processing elements. In the second module, GA optimizes the connection weights in ANN. The third module adopts feature transformation based on GA.

In this study, we examine the performance of five different models using various combinations of these three modules. We compare these five models with two comparable models to verify the effectiveness of the proposed approach. Each model is described as follows.

3.1 SOGN1 (the first model of simultaneous optimization using GA for ANN)

In this model, we use the first and the second modules and use linear scaling for feature transformation. Maniezzo [28], and Ignizio and Soltys [21] suggested similar models.
Table 1  The optimized factors for ANN using GA in prior studies

Reference                 Connection  Number of processing   Feature  Feature         Other
                          weight      elements in the        subset   transformation  factors
                                      hidden layer
Montana and Davis [32]    O                                                            BW*
Maniezzo [28]             O           O                      O                        HL**
Ignizio and Soltys [21]   O           O                      O
Hansen [18]                           O                      O                        HL/AF***
Dorsey and Sexton [12]    O                                                            BW
Sexton et al. [41]        O
Sexton [39]               O                                  O                        BW
Sexton et al. [42]        O
Gupta and Sexton [17]     O
Kim and Han [23]          O                                           O
Sexton et al. [44]        O           O                      O

* Binary representation of connection weight
** Number of hidden layers
*** Activation function
3.2 SOGN2 (the second model of simultaneous optimization using GA for ANN)

The second and the third modules are employed in this model. In addition, continuous features are discretized as described below. As mentioned earlier, discretization of continuous variables may ease the learning process of the algorithm. In this model, a continuous feature is transformed into a discrete one with three categories. The thresholds for discretization are determined by GA, but the level of discretization is fixed at three, because stock market analysts usually interpret the values of popular technical indicators as low, medium, and high [49].
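A minimal sketch of this threshold-based discretization is shown below. The exact mapping is not given in the text, so the function is only an assumption of how GA-searched cut points could be applied to one technical indicator.

```python
import numpy as np

def discretize(feature, thresholds):
    """Map a continuous indicator to ordinal categories using GA-searched cut points.

    Two thresholds yield the three levels (low / medium / high) used by SOGN2;
    SOGN3 and SOGN5 would pass a variable number of thresholds instead.
    """
    return np.digitize(np.asarray(feature), np.sort(np.asarray(thresholds)))
```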
3.3 SOGN3 (the third model of simultaneous optimization using GA for ANN)

The third model is like SOGN2 and differs only in that the level of discretization is not predetermined to be three. This number is constrained to between one and five in this model. A similar model was previously suggested by [23].

3.4 SOGN4 (the fourth model of simultaneous optimization using GA for ANN)

This model employs all three modules. However, the number of categories into which a continuous feature is discretized is fixed at three.

3.5 SOGN5 (the fifth model of simultaneous optimization using GA for ANN)

Similar to SOGN4, this model employs all three modules. In addition, it finds the optimum number of discrete values that a continuous feature takes. Figure 1 shows the overall framework of SOGN5.

Fig. 1  Framework of SOGN5

3.6 Comparable models: CN (conventional ANN) and SIGN (simple optimization using GA for ANN)

In this study we compare the above five models to two conventional models: CN (the conventional ANN) and SIGN (the conventional ANN, except that it uses the second module to find optimal connection weights).

CN uses the conventional approach of BP learning with a gradient search algorithm. The connection weights of ANN in this model are adjusted by the gradient descent algorithm. All initial features are incorporated as input features. The number of processing elements in the hidden layer is fixed at the number of incorporated features, because BP does not have general rules to determine the optimal number of hidden processing elements. Linear scaling to unit variance is used for feature transformation, because prior studies have usually employed this method for ANN.

In SIGN, GA simply searches for near-optimal connection weights between the layers. The other factors, including the feature subset, the number of processing elements in the hidden layer, and the feature transformation, are determined just as in CN. This model was previously suggested by [12, 17, 32, 40, 41].

Table 2 summarizes these seven models.
Table 2  Summary of the seven models

Model   Connection  Number of processing   Feature subset  Feature         Number of categories
        weight      elements in the        selection       transformation  to be discretized
                    hidden layer
CN      GD*         12                     –               LS**            –
SIGN    GA          12                     –               LS              –
SOGN1   GA          GA                     GA              LS              –
SOGN2   GA          12                     –               GA              3
SOGN3   GA          12                     –               GA              GA
SOGN4   GA          GA                     GA              GA              3
SOGN5   GA          GA                     GA              GA              GA

* Gradient descent algorithm
** Linear scaling to unit variance

This study describes the optimization process of SOGN5 as representative of the above models. The optimization processes of the other four SOGN models and SIGN are like SOGN5, with the exception of some architectural factors in ANN. The SOGN5 process consists of the following three stages.

Stage 1. In the first stage, the model initializes the GA search. Specifically, the populations are initialized to random values before the search process. As mentioned in Sect. 2.2, the search parameters must be appropriately encoded on chromosomes. Thus, the parameters to be searched (i.e. all the architectural factors, including feature subset selection, the number of nodes in the hidden layer, connection weights, and feature transformation) are encoded into a chromosome in a string format.

The encoded chromosomes aim to maximize the fitness function. The fitness function is problem specific. In this study, the objectives of the model are to approximate connection weights and to optimize the architectural factors for the correct solutions. These objectives can be represented by the average prediction accuracy of the training data. Thus, this study applies the average prediction accuracy of the training data as the fitness function. Mathematically, the fitness function is represented as (1):

$$\mathrm{Fitness} = \frac{1}{n}\sum_{i=1}^{n} CR_i, \qquad CR_i = \begin{cases} 1 & \text{if } PO_i = AO_i \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$

where $CR_i$ is the prediction result for the $i$th trading day, denoted by 0 or 1, $PO_i$ is the predicted output from the model for the $i$th trading day, and $AO_i$ is the actual output for the $i$th trading day.
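In code, Eq. (1) is simply the hit ratio over the training days. A minimal sketch, with hypothetical array names:

```python
import numpy as np

def fitness(predicted_outputs, actual_outputs):
    """Average prediction accuracy of Eq. (1): the share of days with PO_i = AO_i."""
    predicted = np.asarray(predicted_outputs)
    actual = np.asarray(actual_outputs)
    return float(np.mean(predicted == actual))
```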
Stage 2. The second stage is the GA search process. In this stage, GA searches for optimal or near-optimal values of all architectural factors in ANN. Feedforward computation in ANN is performed for each chromosome to calculate the fitness value of the chromosome. The sigmoid function is used as the activation function. This function is a popular activation function for BP neural networks, because it can easily be differentiated. A linear function is used as the combination function for the feedforward computation with the connection weights contained in each chromosome. In this stage, GA applies selection, crossover, and mutation to the initial chromosomes and iterates until the stopping conditions are satisfied.
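The feedforward pass described here can be sketched as follows: a linear combination of inputs at each node, a sigmoid activation, and a class decision at the single output node. This is a minimal sketch that ignores the GA's feature and hidden-node selection masks, and the 0.5 cutoff is an assumption, since the paper does not state how the sigmoid output is mapped to the 0/1 direction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward_predict(x, w_input_hidden, w_hidden_output):
    """One forward pass with GA-supplied weights: linear combination plus sigmoid at each layer."""
    hidden = sigmoid(w_input_hidden @ x)        # hidden-layer outputs
    output = sigmoid(w_hidden_output @ hidden)  # single output node
    return 1 if output >= 0.5 else 0            # assumed cutoff for the predicted direction
```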
Stage 3. The connection weights and architectural factors that are finally derived in Stage 2 are applied to the holdout data. This stage is indispensable for validating the generalization ability of the proposed scheme. If this stage is not performed, the model may overfit the training data.

4 Empirical tests and results

4.1 Research data and experiments

The research data used in this study is the daily Korea Composite Stock Price Index (KOSPI) from January 1989 to December 1998. The sample includes 2,928 trading days. Technical indicators are used as inputs, since we attempt to forecast the direction of the daily price change in KOSPI. This study selects 12 technical indicators to form the initial feature subset, as determined by the review of domain experts and prior studies. Table 3 presents the descriptions of the initially selected features.

The aim of the study is to predict the direction of daily change of the stock price index. The directions are categorized as "0" or "1" in the research data: "0" denotes that the index's price at day t+1 is lower than the index's price at day t, and "1" denotes that the index's price at day t+1 is higher than the index's price at day t. About 20% of the data is used for the holdout and 80% for training. We also divide the training dataset into two sub-datasets: the training set (60%) and the test set (20%). The training set is used to search for near-optimal parameters and is employed to evaluate the fitness function. We examine the prediction results for the test dataset during ANN training to avoid overfitting. The holdout data are used to check the performance of the algorithm when unknown data are applied. Table 4 shows the number of cases in the training and holdout datasets.
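A minimal sketch of this labeling and chronological split is given below. The `closing_prices` array and the exact index alignment are assumptions; the paper only specifies the 0/1 direction labels and the 60/20/20 proportions.

```python
import numpy as np

def direction_labels(closing_prices):
    """Label day t with 1 if the index closes higher at day t+1 than at day t, else 0."""
    closing_prices = np.asarray(closing_prices)
    return (closing_prices[1:] > closing_prices[:-1]).astype(int)

def chronological_split(features, labels, train=0.6, test=0.2):
    """Split in time order into training (60%), test (20%), and holdout (20%) sets."""
    n = len(labels)
    i, j = int(n * train), int(n * (train + test))
    return (features[:i], labels[:i]), (features[i:j], labels[i:j]), (features[j:], labels[j:])
```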

Table 3  Initially selected features and their formulas

%K (Stochastic %K): compares a security's closing price to its price range over a given time period. Formula: $\frac{C_t - LL_{t-n}}{HH_{t-n} - LL_{t-n}} \times 100$, where $LL_{t-n}$ and $HH_{t-n}$ denote the lowest low and the highest high in the last $n$ days, respectively. Reference: Achelis [1].

%D (Stochastic %D): moving average of %K. Formula: $\frac{\sum_{i=0}^{n-1} \%K_{t-i}}{n}$. Reference: Achelis [1].

Slow %D (Stochastic slow %D): moving average of %D. Formula: $\frac{\sum_{i=0}^{n-1} \%D_{t-i}}{n}$. Reference: Gifford [16].

Momentum: measures the amount that a security's price has changed over a given time span. Formula: $C_t - C_{t-4}$. Reference: Chang et al. [5].

ROC (Price Rate-of-Change): displays the difference between the current price and the price $n$ days ago. Formula: $\frac{C_t}{C_{t-n}} \times 100$. Reference: Murphy [33].

Williams' %R (Larry William's %R): a momentum indicator that measures overbought/oversold levels. Formula: $\frac{H_n - C_t}{H_n - L_n} \times 100$. Reference: Achelis [1].

A/D Oscillator (Accumulation/Distribution Oscillator): a momentum indicator that associates changes in price. Formula: $\frac{H_t - C_{t-1}}{H_t - L_t}$. Reference: Chang et al. [5].

Disparity5 (5-day disparity): the distance of the current price from the 5-day moving average. Formula: $\frac{C_t}{MA_5} \times 100$. Reference: Choi [7].

Disparity10 (10-day disparity): Formula: $\frac{C_t}{MA_{10}} \times 100$. Reference: Choi [7].

OSCP (Price Oscillator): displays the difference between two moving averages of a security's price. Formula: $\frac{MA_5 - MA_{10}}{MA_5}$. Reference: Achelis [1].

CCI (Commodity Channel Index): measures the variation of a security's price from its statistical mean. Formula: $\frac{M_t - SM_t}{0.015 \times D_t}$, where $M_t = \frac{H_t + L_t + C_t}{3}$, $SM_t = \frac{\sum_{i=1}^{n} M_{t-i+1}}{n}$, and $D_t = \frac{\sum_{i=1}^{n} |M_{t-i+1} - SM_t|}{n}$. References: Achelis [1], Chang et al. [5].

RSI (Relative Strength Index): a price-following oscillator that ranges from 0 to 100. Formula: $100 - \frac{100}{1 + \left(\sum_{i=0}^{n-1} Up_{t-i}/n\right) / \left(\sum_{i=0}^{n-1} Dw_{t-i}/n\right)}$, where $Up_t$ denotes the upward price change and $Dw_t$ the downward price change at time $t$. Reference: Achelis [1].

Notation: $C_t$ is the closing price at time $t$, $L_t$ the low price at time $t$, $H_t$ the high price at time $t$, and $MA_t$ the moving average over $t$ days.
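For illustration, a few of these indicators can be computed from daily bars with pandas as sketched below. The DataFrame and its column names ('high', 'low', 'close') are hypothetical, and the window conventions (for example, whether the current day is included) are assumptions rather than the paper's exact settings.

```python
import pandas as pd

# Hypothetical daily KOSPI bars: a DataFrame with 'high', 'low', and 'close' columns.
def stochastic_k(bars: pd.DataFrame, n: int = 5) -> pd.Series:
    lowest_low = bars['low'].rolling(n).min()
    highest_high = bars['high'].rolling(n).max()
    return (bars['close'] - lowest_low) / (highest_high - lowest_low) * 100

def momentum(bars: pd.DataFrame) -> pd.Series:
    return bars['close'] - bars['close'].shift(4)                   # C_t - C_{t-4}

def disparity(bars: pd.DataFrame, n: int = 5) -> pd.Series:
    return bars['close'] / bars['close'].rolling(n).mean() * 100    # C_t / MA_n * 100
```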

Table 4  Number of cases

Set        1989  1990  1991  1992  1993  1994  1995  1996  1997  1998  Total
Training    232   233   234   236   237   237   235   235   234   234  2,347
Holdout      57    58    58    58    59    59    58    58    58    58    581
Total       289   291   292   294   296   296   293   293   292   292  2,928

4.2 Genetic encoding

As mentioned in Sect. 2.2, problem representation is the first step of GA. Thus, the problem must be represented in terms of strings. The strings used in SOGN5 have the following encoding.

This study uses 12 initial input features, thus employing 12 initial processing elements in the hidden layer. The first 12 bits represent the selection codes for relevant feature subsets. These bits are defined between 0 and 1. If one of these bits is assigned the value of "1" through the genetic search, the feature associated with this bit is selected for analysis. Conversely, if one of these bits is assigned the value of "0", the associated feature is not selected. The following 12 bits are selection codes for processing elements in the hidden layer. These bits are also defined between 0 and 1. These 24 bits are the codes for the first module mentioned in Sect. 3.

Each processing element in the hidden layer receives 12 signals from the input layer. The next 144 bits represent the connection weights between the input layer and the hidden layer. These bits range from −5 to 5. Each processing element in the output layer receives a signal from the hidden layer. The following 12 bits indicate the connection weights between the hidden layer and the output layer. These bits also vary between −5 and 5. These 156 bits correspond to the second module mentioned in Sect. 3. In this study, the search space solutions are coded using real values, as binary coding may reduce GA's functional effectiveness when the number of parameters is increased [44].

Each feature is discretized into at most five categories. Thus, four thresholds for discretization are needed. The following 48 bits are the thresholds for feature transformation, because each of the 12 features needs at most four thresholds. These 48 bits are the codes for the third module mentioned in Sect. 3. In addition, using these bits, GA determines the number of categories into which each feature is discretized: if any threshold exceeds the maximum value of a feature, it is not taken into account. As mentioned earlier, the upper limit of the number of categories is five and the lower limit is one. This number is automatically determined by the GA search process.
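The layout just described (12 + 12 + 144 + 12 + 48 = 228 genes) can be unpacked as in the sketch below. The function name and the row-major reshaping of the weight block are assumptions; only the block sizes and value ranges come from the text.

```python
import numpy as np

N_FEATURES = 12  # initial technical indicators and initial hidden nodes

def decode_chromosome(chromosome):
    """Split a SOGN5 string into its blocks: selection codes, weights, and thresholds."""
    c = np.asarray(chromosome)
    i = 0
    feature_bits = c[i:i + N_FEATURES]; i += N_FEATURES              # 0/1 input-selection codes
    hidden_bits = c[i:i + N_FEATURES]; i += N_FEATURES               # 0/1 hidden-node codes
    w_ih = c[i:i + N_FEATURES ** 2].reshape(N_FEATURES, N_FEATURES)  # input-to-hidden weights in [-5, 5]
    i += N_FEATURES ** 2
    w_ho = c[i:i + N_FEATURES]; i += N_FEATURES                      # hidden-to-output weights in [-5, 5]
    thresholds = c[i:i + 4 * N_FEATURES].reshape(N_FEATURES, 4)      # up to 4 cut points per feature
    return feature_bits, hidden_bits, w_ih, w_ho, thresholds
```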
To control the GA search parameters, the population size is set to 100 organisms, and the crossover and mutation rates are varied to prevent ANN from falling into a local optimum. The range of the crossover rate is set between 0.5 and 0.7, while the mutation rate ranges from 0.05 to 0.1. This study performs the crossover using a uniform crossover routine. The uniform crossover method is considered better at preserving the diversity in the population, and it can generate any offspring from the selected parents, while single-point and two-point crossover techniques may degrade the search by positioning the features irrelevantly. For the mutation method, a random number between 0 and 1 is generated for each of the features in the organism. If a feature obtains a number that is less than or equal to the mutation rate, then that feature is mutated. Only 5,000 trials are allowed before the iterations stop.

Few prior studies proposed an appropriate population size and number of iterations for GA. That is, there is no general rule to set GA parameters. Thus, we use the same configuration settings for these parameters as prior studies that proposed an optimization approach of ANN using GA [46, 47]. In addition, we use the same GA settings regardless of the experimental model, although the size of the search space varies. We use the same settings so that all the experimental models train GA under the same conditions.

4.3 Results and discussions

This study proposed a simultaneous optimization approach to overcome the limitations of prior studies. This section evaluates the results in two aspects: performance and generalization ability.

Performance. All the models proposed in Sect. 3, i.e. SOGN1-5, CN, and SIGN, are implemented and tested on ten datasets from the KOSPI data. We next compare the prediction performance of the five hybrid models and the two comparable models. Table 5 describes the average prediction accuracy of each model for the holdout dataset.

Table 5  Average predictive performance for the holdout data (hit ratio: %)

Year    CN      SIGN    SOGN1   SOGN2   SOGN3   SOGN4   SOGN5
1989    52.63   49.12   54.39   63.16   59.65   61.40   63.16
1990    49.15   56.90   53.45   62.07   60.34   63.79   62.07
1991    60.34   50.00   56.90   63.79   56.90   63.79   70.69
1992    37.93   44.83   46.55   56.90   58.62   58.62   56.90
1993    44.07   44.07   66.10   62.71   61.02   61.02   66.10
1994    61.01   59.32   62.71   66.10   62.71   64.41   64.41
1995    63.79   53.45   62.07   67.24   65.52   72.41   70.69
1996    74.13   50.00   70.69   68.79   67.24   74.14   77.59
1997    58.62   50.00   58.62   63.79   62.07   63.79   63.79
1998    51.72   48.28   53.45   63.79   62.07   62.07   65.52

Total   55.33   50.60   58.52   63.86   61.70   64.54   66.09



In Table 5, all SOGN models outperform CN and SIGN for the holdout data. It appears that the simultaneous optimization approach allows for better learning of noisy patterns than do the conventional and simple optimization approaches. SOGN5 shows the best performance. Our proposed model provides the best prediction accuracy among all the comparable models.

SOGN4 is slightly better than SOGN2, and SOGN5 slightly outperforms SOGN1. These results show that the effect of feature subset selection is positive, but the strength of the effect is quite weak.

Comparing SIGN with SOGN2 or SIGN with SOGN3, SOGN2 and SOGN3 outperform SIGN by about 12–13%. In addition, SOGN4 and SOGN5 outperform SOGN1 by more than 6–7%. In this case, it may be concluded that the effectiveness of feature transformation is very strong.

SOGN5 outperforms SOGN4 by 1.55% for the holdout data. This may be attributed to the effectiveness of automatic determination of the number of categories for feature transformation. However, SOGN2, in which the number of categories is fixed, slightly outperforms SOGN3, which optimizes the number of categories by GA. Thus, we may conclude that the effectiveness of automatic determination of the number of categories for feature transformation varies.

Table 5 also reveals that the average prediction performance of CN is somewhat better than that of SIGN. This may indicate that GA does not always guarantee better performance than the gradient descent algorithm in terms of connection weight optimization in ANN. These experimental results are in line with prior studies [6, 43, 48].

The McNemar tests are used to examine whether the proposed model significantly outperforms the other models. This test is a nonparametric test for two related samples. It may be used with nominal data and is particularly useful with before-after measurements of the same subjects [9]. Table 6 shows the results of the McNemar test.

Table 6  McNemar values for the pairwise comparison of performance

         SIGN    SOGN1    SOGN2     SOGN3     SOGN4     SOGN5
CN       3.407   1.784    11.130**  6.380**   14.460**  20.445**
SIGN             8.804**  31.912**  19.267**  25.703**  34.439**
SOGN1                     4.265*    1.741     6.606**   10.626**
SOGN2                               0.696     0.041     0.709
SOGN3                                         1.414     4.112*
SOGN4                                                   0.413

* Significant at the 5% level
** Significant at the 1% level
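The McNemar statistic for one model pair can be computed from the per-day correctness of the two models on the same holdout set, as sketched below. Whether the original analysis applied the continuity correction is not stated, so that choice (and the scipy p-value call) is an assumption.

```python
import numpy as np
from scipy.stats import chi2

def mcnemar(correct_a, correct_b):
    """McNemar chi-square with continuity correction for two models scored on the same holdout days."""
    correct_a = np.asarray(correct_a, dtype=bool)
    correct_b = np.asarray(correct_b, dtype=bool)
    b = int(np.sum(correct_a & ~correct_b))      # days only model A predicted correctly
    c = int(np.sum(~correct_a & correct_b))      # days only model B predicted correctly
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    return statistic, chi2.sf(statistic, df=1)   # statistic and p-value (df = 1)
```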

As shown in Table 6, SOGN5 outperforms CN, SIGN, and SOGN1 at the 1% significance level and SOGN3 at the 5% significance level. SOGN2–4 significantly outperform CN and SIGN at the 1% significance level, while SOGN1 significantly outperforms SIGN but not CN. In addition, SOGN4 and SOGN2 perform better than SOGN1 at the 1% and the 5% significance levels, respectively. CN does not significantly outperform SIGN.

From the results of the experiment, it is apparent that for stock market prediction the hybrid model of GA and ANN is a superior predictive approach. Empirical results show that SOGN5 offers better predictive performance than CN, SIGN, and SOGN1–4. The predictive performance is enhanced by the additional optimized factors. In addition, SOGN4 and SOGN5 perform better than the other models for both the training and the holdout data. This suggests that simultaneous optimization of multiple factors performs well, not only in the learning process for the known data but also in the generalization process for unknown data.

Generalization ability. Generalization ability can be measured through the consistency between the performance on the training and the holdout data. Where a method exhibits good generalization ability, there are small differences in performance between the training and holdout data. In the opposite situation, where a method shows poor generalization ability (this is the case of over-training or under-training), there are large differences in performance between the training and holdout data. Table 7 shows the difference between the average predictive performances on the training and the holdout data.

Table 7  Difference between the average predictive performance of the training and the holdout data (hit ratio: %)

CN      SIGN    SOGN1   SOGN2    SOGN3   SOGN4   SOGN5
4.52    7.26    7.31    −1.74    4.09    3.25    0.66

In Table 7, the differences of SOGN2–5 are relatively smaller than those of CN, SIGN, and SOGN1. This means that SOGN2–5 generalize better than the other models for the holdout data. In particular, SOGN5 produces the smallest difference among all models. These results may be attributed to the effect of feature transformation. As mentioned earlier, feature transformation reduces dimensionality and may enhance generalization ability for unseen data. Conversely, CN, SIGN, and SOGN1 show similarly large differences.

5 Conclusions and future work

This paper has suggested a new hybrid model of GA and ANN to overcome the limitations of prior studies. In this paper, GA simultaneously searches for near-optimal solutions of the connection weights in the learning algorithm, the
architectural factors in ANN, and the thresholds of feature transformation for dimensionality reduction. We conclude that SOGN5 simultaneously optimizes multiple architectural factors and connection weights in ANN, and thereby enhances the prediction accuracy and the generalization ability of the classifier, based on empirical results.

Our study provides two contributions. First, we propose a novel optimizing factor for ANN: feature transformation (specifically, feature discretization). Similar to feature subset selection, feature discretization intentionally reduces the amount of information by discretizing continuous variables. Thus, in theory, it is information loss. However, ANN performs better when applying feature discretization, because it can mitigate the complexity of the given data. Consequently, we can summarize that our study extended the global optimization model of ANN by adding this novel concept, feature discretization.

Second, our study takes into account most of the approaches used for optimizing ANN proposed by other researchers. In our study, these factors were used to build comparable models (SOGN1 to SOGN4 and SIGN). We tried to examine the individual effect of each factor. That is, this work can be considered a comprehensive study that covers most of the topics, if not all, regarding the design optimization of ANN.

Although our study presents a novel ANN model that may predict the direction of the stock market index more accurately, there are some potential directions for future research. First, the model is designed to predict the direction of the stock market index rather than individual stock prices. Specific modifications and additional components regarding the prediction of individual stock prices are required to apply the proposed algorithm in real-life investment decision making.

Second, the proposed methodology has a time restriction, because it is applied at the present time and, of course, the accuracy of prediction obtained now may not reflect the performance of the algorithm in the future. It should be retrained periodically to implement it for real-life cases. Consequently, several issues, such as the design of a periodic training system and the improvement of training efficiency, should be resolved before using it as a technique to find arbitrage opportunities.

In addition, the prediction performance of ANN may be substantially enhanced if other factors are incorporated in the proposed model. The prediction performance may be increased if GA is employed for appropriate feature weighting or relevant instance selection. This remains a very interesting topic for further study. Although feature selection and transformation in this study may effectively reduce the dimensions of the feature space, instance selection is a direct method of noise and dimensionality reduction. In addition, while ANN performed well with GA-based simultaneous optimization, other learning algorithms may also prove effective in place of ANN. We believe that there is great potential for further research on simultaneous optimization using GA for other AI techniques, including case-based reasoning and decision trees. Future research directions also include the comparison of several variations of GA and other search techniques with ANN, which would be a useful contribution to the study of ANN training. Of course, many tasks still need to be accomplished for SOGN models. The generalization ability of the models should be tested further by applying them to other problem domains. In particular, we used only ten yearly stock market index datasets from one source (KOSPI) to validate our model. We expect that the findings of our study would not be much different with other datasets. However, effort should be expended to apply our model to stock market index datasets collected from other sources.

References

1. Achelis SB (1995) Technical analysis from A to Z. Probus Publishing, Chicago
2. Adeli H, Hung S (1995) Machine learning: neural networks, genetic algorithms, and fuzzy systems. Wiley, New York
3. Bala J, Huang J, Vafaie H, DeJong K, Wechsler H (1995) Hybrid learning using genetic algorithms and decision trees for pattern classification. In: Proc of the int jnt conf on artificial intelligence, pp 19–25
4. Bauer RJ (1994) Genetic algorithms and investment strategies. Wiley, New York
5. Chang J, Jung Y, Yeon K, Jun J, Shin D, Kim H (1996) Technical indicators and analysis methods. Jinritamgu Publishing, Seoul
6. Chen AS, Leung MT, Daouk H (2003) Application of neural networks to an emerging financial market: forecasting and trading the Taiwan Stock Index. Comput Oper Res 30(6):901–923
7. Choi J (1995) Technical indicators. Jinritamgu Publishing, Seoul
8. Coakley JR, Brown CE (2000) Artificial neural networks in accounting and finance: modeling issues. Int J Intell Syst Account Finance Manag 9(2):119–144
9. Cooper DR, Emory CW (1995) Business research methods. Irwin, Chicago
10. Dash M, Liu H (1997) Feature selection methods for classifications. Intell Data Anal 1(3):131–156
11. Davis L (1994) Handbook of genetic algorithms. Van Nostrand Reinhold, New York
12. Dorsey R, Sexton R (1998) The use of parsimonious neural networks for forecasting financial time series. J Comput Intell Finance 6(1):24–31
13. Dougherty J, Kohavi R, Sahami M (1995) Supervised and unsupervised discretization of continuous features. In: Proc of the 12th int conf on machine learning, San Francisco, pp 194–202
14. Durand N, Alliot J, Medioni F (2000) Neural nets trained by genetic algorithms for collision avoidance. Appl Intell 13:205–213
15. Fayyad UM, Irani KB (1993) Multi-interval discretization of continuous-valued attributes for classification learning. In: Proc of the 13th int jnt conf on artificial intelligence, pp 1022–1027
16. Gifford E (1995) Investor's guide to technical analysis: predicting price action in the markets. Pitman, London
17. Gupta JND, Sexton RS (1999) Comparing backpropagation with a genetic algorithm for neural network training. Omega-Int J Manage Sci 27(6):679–684
18. Hansen JV (1998) Comparative performance of backpropagation networks designed by genetic algorithms and heuristics. Int J Intell Syst Account Finance Manag 7(2):69–79
19. Hansen JV, Nelson RD (2003) Forecasting and recombining time-series components by using neural networks. J Oper Res Soc 54(3):307–317
20. Henderson CE, Potter WD, McClendon RW, Hoogenboom G (2000) Predicting aflatoxin contamination in peanuts: a genetic algorithm/neural network approach. Appl Intell 12:183–192
21. Ignizio JP, Soltys R (1996) Simultaneous design and training of ontogenic neural network classifiers. Comput Oper Res 23(6):535–546
22. Kaikhah K, Garlick R (2000) Variable hidden layer sizing in Elman recurrent neuro-evolution. Appl Intell 12:193–205
23. Kim K, Han I (2000) Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Syst Appl 19(2):125–132
24. Lacerda E, Carvalho ACPLF, Braga AP, Ludermir TB (2005) Evolutionary radial basis functions for credit assessment. Appl Intell 22:167–181
25. Leung FHF, Lam HK, Ling SH, Tam PKS (2003) Tuning of the structure and parameters of a neural network using an improved genetic algorithm. IEEE Trans Neural Netw 14(1):79–88
26. Liu H, Motoda H (1998) Feature transformation and subset selection. IEEE Intell Syst Their Appl 13(2):26–28
27. Liu H, Setiono R (1996) Dimensionality reduction via discretization. Knowl-Based Syst 9(1):67–72
28. Maniezzo V (1994) Genetic evolution of the topology and weight distribution of neural networks. IEEE Trans Neural Netw 5(1):39–53
29. Martens J, Wets G, Vanthienen J, Mues C (1998) An initial comparison of a fuzzy neural classifier and a decision tree based classifier. Expert Syst Appl 15(3–4):375–381
30. Masters T (1993) Practical neural network recipes in C++. Academic Press, Boston
31. McNelis PD (2005) Neural networks in finance: gaining predictive edge in the market. Elsevier Academic Press, Amsterdam
32. Montana D, Davis L (1989) Training feedforward neural networks using genetic algorithms. In: Proc of the 11th int jnt conf on artificial intelligence, Detroit, pp 762–767
33. Murphy JJ (1986) Technical analysis of the futures markets: a comprehensive guide to trading methods and applications. Prentice-Hall, New York
34. Ornes C, Sklanski J (1997) A neural network that explains as well as predicts financial market behavior. In: Proc of the IEEE/IAFE, pp 43–49
35. Pujol JCF, Poli R (1998) Evolving the topology and the weights of neural networks using a dual representation. Appl Intell 8:73–84
36. Rumelhart DE, McClelland JL (1986) Parallel distributed processing: explorations in the microstructure of cognition, vol 1. MIT Press, Cambridge
37. Salcedo-Sanz S, Bousono-Calzon C (2005) A hybrid neural-genetic algorithm for the frequency assignment problem in satellite communications. Appl Intell 22:207–217
38. Scott PD, Williams KM, Ho KM (1997) Forming categories in exploratory data analysis and data mining. In: Liu X, Cohen P, Berthold M (eds) Advances in intelligent data analysis. Springer, Berlin, pp 235–246
39. Sexton RS (1998) Identifying irrelevant input variables in chaotic time series problems: using genetic algorithm for training neural networks. J Comput Intell Finance 6(5):34–41
40. Sexton RS, Alidaee B, Dorsey RE, Johnson JD (1998) Global optimization for artificial neural networks: a tabu search application. Eur J Oper Res 106(2–3):570–584
41. Sexton RS, Dorsey RE, Johnson JD (1998) Toward global optimization of neural networks: a comparison of the genetic algorithm and backpropagation. Decis Support Syst 22(2):171–185
42. Sexton RS, Dorsey RE, Johnson JD (1999) Optimization of neural networks: a comparative analysis of the genetic algorithm and simulated annealing. Eur J Oper Res 114(3):589–601
43. Sexton RS, McMurtrey S, Michalopoulos JO, Smith AM (2005) Employee turnover: a neural network solution. Comput Oper Res 32(10):2635–2651
44. Sexton RS, Sriram RS, Etheridge H (2003) Improving decision effectiveness of artificial neural networks: a modified genetic algorithm approach. Decis Sci 34(3):421–442
45. Shang Y, Wah BW (1996) Global optimization for neural network training. Computer 29(3):45–54
46. Shin TS, Han I (2000) Optimal signal multi-resolution by genetic algorithms to support artificial neural networks for exchange-rate forecasting. Expert Syst Appl 18(4):257–269
47. Shin KS, Lee YJ (2002) A genetic algorithm application in bankruptcy prediction modeling. Expert Syst Appl 23(3):321–328
48. Shin K, Shin T, Han I (1998) Neuro-genetic approach for bankruptcy prediction: a comparison to back-propagation algorithms. In: Proc of the int conf of the Korea society of management information systems 1998, Seoul, South Korea, pp 585–597
49. Slowinski R, Zopounidis C (1995) Application of the rough set approach to evaluation of bankruptcy risk. Int J Intell Syst Account Finance Manag 4:27–41
50. Susmaga R (1997) Analyzing discretizations of continuous attributes given a monotonic discrimination function. Intell Data Anal 1(3):157–179
51. Ting KA (1997) Discretization in lazy learning algorithms. Artif Intell Rev 11(1–5):157–174
52. Vafaie H, DeJong K (1998) Feature space transformation using genetic algorithms. IEEE Intell Syst Their Appl 13(2):57–65
53. Valova I, Milano G, Bowen K, Gueorguieva N (2010) Bridging the fuzzy, neural and evolutionary paradigms for automatic target recognition. Appl Intell, Online First
54. Vellido A, Lisboa PJG, Vaughan J (1999) Neural networks in business: a survey of applications. Expert Syst Appl 17(1):51–70
55. Wallet BC, Marchette DJ, Solka JL, Wegman EJ (1996) A genetic algorithm for best subset selection in linear regression. In: Proc of the 28th symp on the interface of computing science and statistics, pp 545–550
56. Wang T, Qin Z, Jin Z, Zhang S (2010) Handling over-fitting in test cost-sensitive decision tree learning by feature selection, smoothing and pruning. J Syst Softw 83(7):1137–1147
57. Williamson AG (1995) Refining a neural network credit application vetting system with a genetic algorithm. J Microcomput Appl 18(3):261–277
58. Wong F, Tan C (1994) Hybrid neural, genetic and fuzzy systems. In: Deboeck GJ (ed) Trading on the edge. Wiley, New York, pp 245–247
59. Yang J, Honavar V (1998) Feature subset selection using a genetic algorithm. IEEE Intell Syst Their Appl 13(2):44–49
60. Yao X, Liu Y (1997) A new evolutionary system for evolving artificial neural networks. IEEE Trans Neural Netw 8(3):694–713
61. Zhang S (2010) Shell-neighbor method and its application in missing data imputation. Appl Intell. doi:10.1007/s10489-009-0207-6
62. Zhao Q, Higuchi T (1996) Efficient learning of NN-MLP based on individual evolutionary algorithm. Neurocomputing 13(2–4):201–215

Kyoung-jae Kim is an associate professor of the Department of Management Information Systems at Dongguk University. He received his Ph.D. from KAIST. He has published in Annals of Operations Research, Applied Intelligence, Applied Soft Computing, Computers in Human Behavior, Expert Systems, Expert Systems with Applications, Intelligent Data Analysis, Intelligent Systems in Accounting, Finance & Management, International Journal of Electronic Commerce, Neural Computing & Applications, Neurocomputing, and other journals. His research interests include data mining, customer relationship management, and electronic commerce.

Hyunchul Ahn is an assistant professor of the School of Management Information Systems at Kookmin University. He received his Ph.D. from KAIST. He has published in Annals of Operations Research, Applied Soft Computing, Expert Systems, Expert Systems with Applications, Information & Management, and other journals. His research interests are in the areas of data mining in finance and marketing, and artificial intelligence techniques such as artificial neural networks, support vector machines, case-based reasoning, and genetic algorithms.
