
Expert Systems with Applications 38 (2011) 10389–10397


Using artificial neural network models in stock market index prediction


Erkam Guresen a, Gulgun Kayakutlu a,*, Tugrul U. Daim b
a Istanbul Technical University, Istanbul, Turkey
b Portland State University, Portland, OR, USA

Keywords: Financial time series (FTS) prediction; Recurrent neural networks (RNN); Dynamic artificial neural networks (DAN2); Hybrid forecasting models

Abstract

Forecasting stock exchange rates is an important financial problem that is receiving increasing attention. During the last few years, a number of neural network models and hybrid models have been proposed for obtaining accurate prediction results, in an attempt to outperform the traditional linear and nonlinear approaches. This paper evaluates the effectiveness of neural network models which are known to be dynamic and effective in stock-market predictions. The models analysed are the multi-layer perceptron (MLP), the dynamic artificial neural network (DAN2), and hybrid neural networks which use generalized autoregressive conditional heteroscedasticity (GARCH) to extract new input variables. The comparison for each model is done from two viewpoints: Mean Square Error (MSE) and Mean Absolute Deviation (MAD), using real daily rate values of the NASDAQ Stock Exchange index.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Forecasting simply means understanding which variables lead to the prediction of other variables (Mcnelis, 2005). This requires a clear understanding of the timing of lead-lag relations among many variables, of the statistical significance of these lead-lag relations, and of which variables are the more important ones to watch as signals for predicting market moves. Better forecasting is the key element of better financial decision making, given increasing financial market volatility and internationalized capital flows.

Accurate forecasting methods are crucial for portfolio management by commercial and investment banks. Assessing expected returns relative to risk presumes that portfolio strategists understand the distribution of returns. Financial experts can easily model the influence of tangible assets on the market value, but not that of intangible assets like know-how and trademarks. The financial time series models expressed by financial theories were the basis for forecasting a series of data in the twentieth century.

Studies focusing on forecasting the stock markets have been mostly preoccupied with forecasting volatilities. There have been few studies bringing in models from other forecasting areas such as technology forecasting.

To model the market value, one of the best ways is the use of expert systems with artificial neural networks (ANN), which do not contain standard formulas and can easily adapt to changes in the market. In the literature, many artificial neural network models are evaluated against statistical models for forecasting the market value. It is observed that in most cases ANN models give better results than other methods. However, there are very few studies comparing the ANN models among themselves; this study fills that gap.

The objective of this study is to compare the performance of the most recent ANN models in forecasting time series of market values. The Autoregressive Conditional Heteroscedasticity (ARCH) model (Engle, 1982), its generalized version, the Generalized ARCH (GARCH) model (Bollerslev, 1986), the Exponential GARCH (EGARCH) model (Nelson, 1991), and the Dynamic Architecture for Artificial Neural Networks (DAN2; Ghiassi & Saidane, 2005) will be analyzed in comparison to the classical Multi-Layer Perceptron (MLP) model. Despite the popularity and direct implementation of ANN models in many complex financial markets, shortcomings are observed: because of the noise caused by changes in market conditions, it is hard to reflect the market variables directly into the models without any assumptions (Roh, 2007). That is why the new models will also be executed in hybrid combination with MLP. The analysed models will be tested on NASDAQ index data for nine months, and the methods will be compared by using Mean Square Error (MSE) and Mean Absolute Deviation (MAD).

The remaining sections of this paper are organized as follows: Section 2 gives the background of the related studies; Section 3 introduces the models used in this study; and Section 4 provides the results of each model using daily rates of the NASDAQ index. The final section gives the conclusion and recommendations for future research.

This study will make a contribution not only to ANN research but also to business implementations of market value calculation.

* Corresponding author. E-mail address: [email protected] (G. Kayakutlu).


2. Background

2.1. Time series forecasting and ANN

The financial time series models expressed by financial theories were the basis for forecasting a series of data in the twentieth century. Yet these theories are not directly applicable to predicting market values, which are subject to external impacts. The development of the multi-layer concept allowed ANN (Artificial Neural Networks) to be chosen as a prediction tool besides other methods. Various models have been used by researchers to forecast market value series using ANN. A brief literature survey is given in Table 1.

Gooijer and Hyndman (2006) reviewed the papers about time series forecasting from 1982 to 2005. Their review was prepared for the silver jubilee volume of the International Journal of Forecasting, for the 25th birthday of the International Institute of Forecasters (IIF). In this review, statistical and simulation methods are analyzed, including exponential smoothing, ARIMA, seasonality, state space and structural models, nonlinear models, long memory models, and ARCH-GARCH. Gooijer and Hyndman (2006) compiled the reported advantages and disadvantages of each methodology and pointed out potential future research fields. They also noted the existence of many outstanding issues associated with ANN utilisation and implementation, stating when ANNs are likely to outperform other methods. In the last few years, research has focused on improving the prediction performance of ANNs and developing new artificial neural network architectures.

Engle (1982) suggested the Autoregressive Conditional Heteroscedasticity (ARCH) model; Bollerslev (1986) generalized the ARCH model and proposed the Generalized ARCH (GARCH) model for time series forecasting. Considering the leverage effect limitation of the GARCH model, the Exponential GARCH (EGARCH) model was proposed by Nelson (1991). Despite the popularity and direct implementation of ANN models in many complex financial markets, shortcomings are observed: because of the noise caused by changes in market conditions, it is hard to reflect the market variables directly into the models without any assumptions (Roh, 2007).

Preminger and Franck (2007) used a robust linear autoregressive and a robust neural network model to forecast exchange rates. Their robust models were better than classical models but still not better than a Random Walk (RW). Roh (2007) combined classical ANN and EWMA (Exponentially Weighted Moving Average), GARCH, and EGARCH models with ANN; the NN-EGARCH model outperformed the other models with a 100% hit ratio for forecasting periods shorter than 10 days.

Kumar and Ravi (2007) reviewed 128 papers about bankruptcy prediction of banks and firms. This review shows that ANN methods outperform many other methods and that hybrid systems can combine the advantages of different methods. Ghiassi, Saidane, and Zimbra (2005) evaluated ANN, ARIMA, and DAN2 (Dynamic Architecture for Artificial Neural Networks) using popular time series from the literature. DAN2, a new NN architecture first developed by Ghiassi and Saidane (2005), clearly outperforms the other methods. DAN2 is a pure feed-forward NN architecture, and detailed information about it is given in Section 3.

Menezes and Nikolaev (2006) used a new NN architecture and named it Polynomial Genetic Programming. It is based on the Polynomial Neural Network first developed by Ivakhnenko (Menezes & Nikolaev, 2006). This architecture uses polynomials to build an ANN, and a genetic algorithm to estimate ANN parameters such as the starting polynomials and weight estimates. The approach gives better results for some problems; it is a new, promising architecture but it needs improvement (Menezes & Nikolaev, 2006).

Zhang and Wan (2007) developed a new ANN architecture, the Statistical Fuzzy Interval Neural Network, based on the Fuzzy Interval Neural Network. JPY/USD and GBP/USD exchange rates are predicted using these methods, which are designed to predict only an interval, not a point in time. Hassan, Nath, and Kirley (2007) used a hybrid model including a Hidden Markov Model, ANN, and a Genetic Algorithm. They tested the hybrid model on stock exchange rates, and it proved better than simulation models.

Yu and Huarng (2008) used bivariate neural networks, bivariate neural network-based fuzzy time series, and a bivariate neural network-based fuzzy time series model with substitutes to apply neural networks to fuzzy time series forecasting. The bivariate neural network-based fuzzy time series model with substitutes performs the best. Zhu, Wang, Xu, and Li (2008) used basic and augmented neural network models to show that trading volume can improve the prediction performance of neural networks. Leu, Lee, and Jou (2009) compared a radial basis-function neural network (RBFNN), a random walk, and distance-based fuzzy time series models on daily closing values of the TAIEX and the exchange rates NTD/USD, KRW/USD, CNY/USD, and JPY/USD. Results show that the RBFNN outperformed the random walk model and the artificial neural network model in terms of mean square error. Cheng, Chen, and Lin (2010) used a PNN (Probabilistic NN), rough sets, and a hybrid model (PNN, Rough Set, C4.5 Decision Tree) to integrate fundamental analysis and technical analysis into a trading model of stock market timing. They report that the hybrid model is helpful for constructing a trading system with better predictive power for stock market timing analysis. Chang, Liu, Lin, Fan, and Ng (2009) used an integrated system (CBDWNN) which combines dynamic time windows, case based reasoning (CBR), and a neural network (NN). Their CBDWNN model outperformed the other compared methods and is very informative and robust for average investors.

Egrioglu, Aladag, Yolcu, Uslu, and Basaran (2009) introduced a new method based on feed-forward artificial neural networks to analyze multivariate high order fuzzy time series forecasting models. Khashei and Bijari (2010) compared the auto-regressive integrated moving average (ARIMA), artificial neural networks (ANNs), and Zhang's hybrid model; the hybrid model outperforms the other models. Hamzacebi, Akay, and Kutay (2009) compared ARIMA and ANN, concluded that direct forecasting with ANN is better, and noted that further research should be done before generalizing the conclusion. Majhi, Panda, and Sahoo (2009) compared the functional link artificial neural network (FLANN), the cascaded functional link artificial neural network (CFLANN), and the LMS model, and observed that the CFLANN model performs the best, followed by the FLANN and the LMS models. Liao and Wang (2010) used a stochastic time effective neural network model and showed some predictive results on global stock indices. Atsalakis and Valavanis (2009a) used an Adaptive Neuro Fuzzy Inference System (ANFIS) to determine the best stock trend prediction model, and their results show that ANFIS clearly demonstrates the potential of neurofuzzy based modeling for financial market prediction. Chen, Ying, and Pan (2010) also used ANFIS, to predict monthly tourist arrivals, and concluded that ANFIS performs better than Markov and fuzzy models. Bildirici and Ersin (2009) combined ANNs with ARCH/GARCH, EGARCH, TGARCH, PGARCH, and APGARCH; the combined models performed better than ANNs or GARCH-based models. Guresen and Kayakutlu (2008) used hybrid models like GARCH-DAN2 and EGARCH-DAN2 to forecast the Istanbul Stock Exchange index (ISE XU100). Yudong and Lenan (2009) used bacterial chemotaxis optimization (BCO) and a back propagation neural network (BPNN) on the S&P 500 index and concluded that their hybrid (IBCO-BP) model offers less computational complexity, better prediction accuracy, and less training time.
Table 1
Financial time series studies (ANN and hybrid models).

| Date | Researchers | Used method | Data years | Data type | Goal | Predicted period | Results |
|---|---|---|---|---|---|---|---|
| 2005 | Ghiassi, Saidane & Zimbra | ANN, ARIMA, DAN2 | — | Time series from literature | To compare the methods | — | DAN2 is an alternative to ANN, gives better results, and only the inputs need to be chosen |
| 2005 | Yümlü, Gürgen & Okay | Mixture of Experts (MoE), MLP, RNN, EGARCH | 1990–2002 | ISE XU100 daily values | Exchange prediction & to compare the methods | 4 years | MoE outperforms the other models; EGARCH is outperformed by all other methods |
| 2006 | Menezes & Nikolaev | Genetic Programming (GP), Polynomial Genetic Programming | — | Time series from literature | To compare the methods | — | Finds the polynomials in time series; promising for future research |
| 2007 | Preminger & Franck | Robust linear autoregressive, robust neural network | 1971–2004 | GBP/USD, JPY/USD | Better forecasting | 1, 3, 6 months | Robust models are better than standard models but still not better than RW (Random Walk) |
| 2007 | Hamzaçebi & Bayramoğlu | ARIMA, ANN | 2002–2006 | ISE-XU100 | To compare ARIMA & ANN | Daily | ANN has better results |
| 2007 | Pekkaya & Hamzaçebi | LR (linear regression), ANN | 1999–2006 | YTL/USD | To compare the forecasts using macro variables | Monthly | ANN gives better results & predicts two important breaking points with 6.611% error |
| 2007 | Roh | ANN, EWMA, GARCH, EGARCH | 930 trade days | KOSPI 200 | To compare ANN with hybrid models | Daily | NN-EGARCH & NN-GARCH: 100% direction prediction for periods shorter than a month, and at least 50% direction prediction for periods shorter than 160 days |
| 2007 | Kumar & Ravi | ANN, fuzzy logic, case-based reasoning, decision trees, rough sets (RS) | — | Review: bankruptcy prediction (128 papers) | — | RS based models outperform logistic regression & decision trees; logistic regression, LDA, QDA, FA clearly outperformed by ANN; hybrid methods can combine the advantages of methods |
| 2007 | Celik & Karatepe | ANN | 1989–2004 | Banking sector data series | Crises prediction | — | Financial ratios successfully predicted for 4 months |
| 2007 | Zhang & Wan | Fuzzy Interval NN (FINN) | 1998–2001 | JPY/USD, GBP/USD | Exchange prediction | 6 weeks | Promising for future research |
| 2007 | Hassan, Nath & Kirley | Hidden Markov Model (HMM), ANN, Genetic Algorithm (GA) | 2003–2004 | Stocks: Apple, IBM, Dell | Exchange prediction | 5 weeks | Hybrid model is better than HMM & ARIMA |
| 2008 | Yu & Huarng | Bivariate NN; bivariate NN-based fuzzy time series; bivariate NN-based fuzzy time series with substitutes | 1999 | Daily closing values of TAIEX & TAIFEX | Applying neural networks to fuzzy time series forecasting | Daily | Bivariate NN-based fuzzy time series model with substitutes performs the best; bivariate NN-based fuzzy time series performs the worst |
| 2008 | Zhu, Wang, Xu & Li | Basic & augmented neural network models | 1989–2005 | NASDAQ, DJIA & STI indices | To investigate the effect of trading volume on prediction with ANN | Daily, weekly & monthly | It is possible to modestly or significantly improve network performance by adding trading volume |
| 2009 | Leu, Lee & Jou | Radial basis-function neural network (RBFNN), random walk, distance-based fuzzy time series | 2006–2007 | TAIEX, NTD/USD, KRW/USD, CNY/USD, JPY/USD | Index prediction & to compare the methods | 1, 3, 5 & 7 days | RBFNN outperformed the random walk model & the artificial neural network model in terms of mean square error |
| 2010 | Cheng, Chen & Lin | PNN (probabilistic NN), rough sets, hybrid (PNN, rough set, C4.5 decision tree) | 1988–2005 | Monthly Taiwan weighted stock index | To integrate fundamental & technical analysis in a trading model | 1, 3, 6 & 12 months | PNN, rough sets & C4.5 classifiers generate trading rule sets, which helps construct a trading system with better predictive power for stock market timing analysis |
| 2009 | Chang, Liu, Lin, Fan & Ng | Integrated system (CBDWNN) combining dynamic time windows, case based reasoning (CBR) & neural network (NN) | 2004–2006 | Daily values of nine different stocks | Efficient forecasting model for making buy/sell decisions | Daily | The CBDWNN outperforms the other two methods, and is very informative and robust for average investors |
| 2009 | Egrioglu, Aladag, Yolcu, Uslu & Basaran | New method based on feed-forward ANNs for multivariate high order fuzzy time series forecasting | 1974–2004 | Annual car road accident casualties in Belgium | A new method that does not require fuzzy logic relation tables to determine fuzzy relationships | Annually | The proposed method provides forecasts with a smaller AFER value than those obtained from the methods in the literature |
| 2010 | Khashei & Bijari | ARIMA, ANNs, Zhang's hybrid model | — | Wolf's sunspot, Canadian lynx, GBP/USD | To demonstrate the appropriateness & effectiveness of the proposed models | — | Zhang's hybrid model outperforms ARIMA & ANNs |
| 2009 | Hamzacebi, Akay & Kutay | ARIMA, ANN (both direct & iterative forecast) | — | Time series used in the literature | To find the method which gives a better result | — | They claim the superiority of the direct method; however, further research is necessary to generalize the conclusion |
| 2009 | Majhi, Panda & Sahoo | Functional link ANN (FLANN), cascaded functional link ANN (CFLANN), LMS model | — | USD to GBP, Indian Rupee & Japanese Yen | To evaluate the performance of the proposed models | — | The CFLANN model performs the best, followed by the FLANN & the LMS models |
| 2010 | Liao & Wang | Stochastic time effective neural network model | 1990–2008 | SAI, SBI, HSI, DJI, IXIC & SP500 | To show predictive results on the global stock indices | Daily | The model shows some predictive results on the global stock indices |
| 2009 | Atsalakis & Valavanis | Adaptive Neuro Fuzzy Inference System (ANFIS) | — | Ten stocks from Athens & NYSE | To determine the best stock trend prediction model | Daily | The proposed system clearly demonstrates the potential of neurofuzzy based modeling for financial market prediction |
| 2010 | Chen, Ying & Pan | Adaptive network-based fuzzy inference system (ANFIS) | 1989–2000 | Tourist arrivals to Taiwan from Hong Kong, USA & Germany | To demonstrate the forecasting performance of ANFIS | Monthly | The ANFIS model yields more accurate tourist arrival forecasts than the Markov, GM & fuzzy models |
| 2009 | Bildirici & Ersin | ARCH/GARCH, EGARCH, TGARCH, PGARCH, APGARCH, ANN | 1987–2008 | ISE-XU100 | To improve forecasts with ANNs | Daily | ANN models provide significant improvement in forecasts |
| 2009 | Yudong & Lenan | Bacterial chemotaxis optimization (BCO), back propagation neural network (BPNN) | 1998–2008 | Standard & Poor's 500 (S&P 500) | Efficient forecasting model for prediction of stock indices | Daily | The IBCO-BP model offers less computational complexity, better prediction accuracy & less training time |
Atsalakis and Valavanis (2009b) surveyed more than 100 related published articles focused on neural networks and neuro-fuzzy techniques derived and applied to forecast stock markets.

This literature survey shows that ANN models generally outperform other methods when applied to time series. Further, new architectures and hybrid models are promising, but only DAN2 clearly outperforms all compared models (Ghiassi & Saidane, 2005; Ghiassi et al., 2005; Ghiassi, Zimbra, & Saidane, 2006).

2.2. Forecasting market values

Currently, many forecasts are established by analysts working for different financial institutions in the US (Ramnath, Rock, & Shane, 2008). Academic forecasts have also been published. Prior research attempted to forecast US stock market indices or US financial metrics. One group of studies focused on understanding volatilities. Maltritz and Eichler (2010) focus on American depository receipts in order to forecast crises and changes in exchange rates. Ando (2009), through the use of Bayesian theory, developed a new portfolio selection method. Scharth and Medeiros (2009) attempted to forecast volatilities in the Dow Jones index by using a combination of regression trees and smooth transition. Another related study, by Chen and So (2006), used a threshold heteroscedastic model to forecast similar volatilities. Wang, Keswani, and Taylor (2006) explored the role of sentiment in forecasting volatilities.

Another group of studies used a different set of methods for forecasting the stock market. They brought tools used for technology forecasting into financial forecasting with some good success. Lee, Lee, and Oh (2005) used the Lotka-Volterra model, which is based on the prey-predator relationship and used for forecasting the diffusion of competing technologies, and applied it to the Korean stock market.

3. ANN models used in time series forecasting

3.1. Multilayer perceptron (MLP)

The multilayer perceptron is one of the most widely implemented neural network topologies. In terms of mapping abilities, the MLP is believed to be capable of approximating arbitrary functions (Principe, Euliano, & Lefebvre, 1999). This has been important in the study of nonlinear dynamics and other function mapping problems.

Two important characteristics of the multilayer perceptron are its nonlinear processing elements (PEs), which have a nonlinearity that must be smooth (the logistic function and the hyperbolic tangent are the most widely used), and their massive interconnectivity, i.e. any element of a given layer feeds all the elements of the next layer (Principe et al., 1999).

MLPs are normally trained with the backpropagation algorithm (Principe et al., 1999). The backpropagation rule propagates the errors through the network and allows adaptation of the hidden PEs. The multilayer perceptron is trained with error correction learning, which means that the desired response for the system must be known.

Error correction learning works in the following way: from the system response at PE $i$ at iteration $n$, $y_i(n)$, and the desired response $d_i(n)$ for a given input pattern, an instantaneous error $e_i(n)$ is defined by

$$e_i(n) = d_i(n) - y_i(n). \quad (1)$$

Using the theory of gradient descent learning, each weight in the network can be adapted by correcting the present value of the weight with a term that is proportional to the present input and error at the weight, i.e.

$$w_{ij}(n+1) = w_{ij}(n) + \eta \, \delta_i(n) \, x_j(n). \quad (2)$$
The local error $\delta_i(n)$ can be computed directly from $e_i(n)$ at the output PE, or as a weighted sum of errors at the internal PEs. The constant $\eta$ is the step size, called the learning rate. This procedure is called the backpropagation algorithm.

Backpropagation computes the sensitivity of a cost functional with respect to each weight in the network and updates each weight proportionally to that sensitivity. The beauty of the procedure is that it can be implemented with local information and requires just a few multiplications per weight, which is very efficient. Because this is a gradient descent procedure, it uses only local information and so can be caught in local minima. Moreover, the procedure is inherently noisy, since we are using a poor estimate of the gradient, causing slow convergence (Principe et al., 1999).

Momentum learning is an improvement on straight gradient descent in the sense that a memory term (the past increment to the weight) is used to speed up and stabilize convergence. In momentum learning the equation to update the weights becomes

$$w_{ij}(n+1) = w_{ij}(n) + \eta \, \delta_i(n) \, x_j(n) + \alpha \left( w_{ij}(n) - w_{ij}(n-1) \right), \quad (3)$$

where $\alpha$ is the momentum. Normally $\alpha$ should be set between 0.1 and 0.9.
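As a minimal illustration of the update rules in Eqs. (2) and (3) (not taken from any implementation in this paper; all names, shapes, and default values are illustrative), one step of the delta rule with a momentum term can be sketched in Python:

```python
import numpy as np

def momentum_update(w, x, delta, prev_step, eta=0.05, alpha=0.7):
    """One step of w(n+1) = w(n) + eta*delta*x + alpha*(w(n) - w(n-1)).

    w:         (n_in, n_out) weight matrix
    x:         (n_in,) input vector at iteration n
    delta:     (n_out,) local errors at the output PEs
    prev_step: (n_in, n_out) previous increment w(n) - w(n-1)
    """
    step = eta * np.outer(x, delta) + alpha * prev_step
    return w + step, step  # new weights, and the increment kept for the next call
```

Setting `alpha = 0` recovers the plain gradient-descent update of Eq. (2).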
Training can be implemented in two ways: either we present a pattern and adapt the weights (on-line training), or we present all the patterns in the input file (an epoch), accumulate the weight updates, and then update the weights with the average weight update, which is called batch learning. To start backpropagation, an initial value for each weight (normally a small random value) must be loaded, and training proceeds until some stopping criterion is met. The three most common criteria are to cap the number of iterations, to threshold the output mean square error, or to use cross validation. Cross validation is the most powerful of the three, since it stops the training at the point of best generalization (i.e. the best performance on the test set) (Principe et al., 1999). To implement cross validation, one must put aside a small part of the training data and use it to see how the trained network is doing (e.g. every 100 training epochs, test the net with a validation set). When the performance starts to degrade on the validation set, training should be stopped (Alpaydın, 2004; Haykin, 1999; Principe et al., 1999).
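This stopping rule can be sketched as a small training loop; the sketch below is only an illustration under assumed helpers (`train_one_epoch`, `mse_on`, and a network object with a `copy` method are hypothetical, not from the paper or any specific library):

```python
import numpy as np

def train_with_early_stopping(net, train_set, val_set, max_epochs=5000,
                              check_every=100, patience=3):
    best_mse, best_state, strikes = np.inf, None, 0
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(net, train_set)        # hypothetical helper
        if epoch % check_every == 0:           # e.g. test every 100 epochs
            val_mse = mse_on(net, val_set)     # hypothetical helper
            if val_mse < best_mse:
                best_mse, best_state, strikes = val_mse, net.copy(), 0
            else:
                strikes += 1                   # validation error is degrading
                if strikes >= patience:
                    break                      # stop near the point of best generalization
    return best_state if best_state is not None else net
```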
Measuring the progress of learning is fundamental in any iterative training procedure, and the learning curve (how the mean square error evolves with the training iteration) is such a quantity. The difficulty of the task and how to control the learning parameters can be judged from the learning curve. When the learning curve is flat, the learning rate should be increased to speed up learning. On the other hand, when the learning curve oscillates up and down, the step size should be decreased. In the extreme, the error can go steadily up, showing that learning is unstable; at this point the network should be reset. When the learning curve stabilizes after many iterations at an error level that is not acceptable, it is time to rethink the network topology (more hidden PEs or more hidden layers, or a different topology altogether) or the training procedure (other, more sophisticated gradient search techniques).

Principe et al. (1999) present a set of heuristics that will help decrease training times and, in general, produce better performance (the first three, and the initialization rule, are sketched in code after this list):

- Normalizing the training data.
- Using the tanh nonlinearity instead of the logistic function.
- Normalizing the desired signal to be just below the output nonlinearity rail voltages (i.e. when using the tanh, desired signals of +/-0.9 instead of +/-1).
- Setting the step size higher towards the input (i.e. for a one-hidden-layer MLP, set the step size at 0.05 in the synapse between the input and hidden layer, and 0.01 in the synapse between the hidden and output layer).
- Initializing the net's weights in the linear region of the nonlinearity (dividing the standard deviation of the random noise source by the fan-in of each PE).
- Using more sophisticated learning methods (quickprop or delta-bar-delta).
- Always having more training patterns than weights. The performance of the MLP on the test set can be expected to be limited by the relation N > W/e, where N is the number of training patterns, W the number of weights, and e the performance error. The MLP should be trained until the mean square error is less than e/2.
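A minimal sketch of those heuristics (array names are illustrative; the scaling constants follow the list above):

```python
import numpy as np

def preprocess(X, d):
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)  # normalize the training data
    dn = 0.9 * d / np.abs(d).max()             # keep targets just inside the tanh rails
    return Xn, dn

def init_weights(n_in, n_out, rng=np.random.default_rng(0)):
    # Initialize in the linear region of the tanh: divide the standard
    # deviation of the random source by the fan-in of each PE.
    return rng.standard_normal((n_in, n_out)) / n_in
```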
3.2. Dynamic architecture for artificial neural networks (DAN2)

This model was developed by Ghiassi and Saidane (2005) and compared with classical ANN models using known time series (Ghiassi et al., 2005). Fig. 1 shows the structure of DAN2.

The algorithm steps given by Ghiassi and Saidane (2005) are as follows. For an input matrix $X = \{X_i;\ i = 1, 2, \ldots, n\}$ of $n$ independent records of $m$ attributes, let $X_i = \{x_{ij};\ j = 1, 2, \ldots, m\}$, and let the reference vector be $R = \{r_j;\ j = 1, 2, \ldots, m\}$.
Fig. 1. The DAN2 network architecture (Ghiassi & Saidane, 2005).



1. The initial linear layer:

$$F_0(X) = a_0 + \sum_j b_{0j} x_{ij}. \quad (3)$$

2. Subsequent hidden layers' CAKE node at iteration $k$:

$$F_k(X_i) = a_k + b_k F_{k-1}(X_i) + c_k G_k(X_i) + d_k H_k(X_i). \quad (4)$$

3. The CURNOLE node's input and transfer function at iteration $k$ ($k = 1, 2, \ldots, K$, where $K$ is the maximum number of sequential iterations, or hidden layers) is defined as follows:

(a) Specify a random set of $m$ constants representing the "reference" vector $R$ (default $r_j = 1$ for all $j = 1, 2, \ldots, m$).

(b) For each input record $X_i$, compute the scalar product:

$$R \cdot X_i = \sum_j r_j x_{ij}. \quad (5)$$

(c) Compute the length (norm) of the vector $R$ and of each record vector $X_i$:

$$\|R\| = \sqrt{\sum_j r_j^2}, \qquad \|X_i\| = \sqrt{\sum_j x_{ij}^2}. \quad (6)$$

(d) Normalize $R \cdot X_i$ to compute

$$(R \cdot X_i)_N = (R \cdot X_i) / (\|R\| \cdot \|X_i\|). \quad (7)$$

Recall that

$$R \cdot X_i = (\|R\| \cdot \|X_i\|) \cos(\mathrm{angle}(R, X_i)), \quad (8)$$

thus

$$\cos(\mathrm{angle}(R, X_i)) = (R \cdot X_i) / (\|R\| \cdot \|X_i\|) = (R \cdot X_i)_N. \quad (9)$$

(e) For $i = 1, 2, \ldots, n$, compute

$$\mathrm{angle}(R, X_i) = \arccos\big((R \cdot X_i)_N\big) = \alpha_i. \quad (10)$$

(f) Compute the transferred nonlinear components of the signal as $G_k(X_i) = \cos(\mu_k \alpha_i)$ and $H_k(X_i) = \sin(\mu_k \alpha_i)$, where $\mu_k$ is a constant multiplier for iteration $k$.

(g) Replacing $G_k(X_i)$ and $H_k(X_i)$ in Eq. (4) gives

$$F_k(X_i) = a_k + b_k F_{k-1}(X_i) + c_k \cos(\mu_k \alpha_i) + d_k \sin(\mu_k \alpha_i). \quad (11)$$
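To make steps 3(a)-(g) concrete, the following Python sketch computes one DAN2 iteration for all records at once. It is a reading of the published equations, not the authors' code; the parameter values ($a_k$, $b_k$, $c_k$, $d_k$, $\mu_k$) would come from training and are passed in here as givens:

```python
import numpy as np

def dan2_layer(X, F_prev, a_k, b_k, c_k, d_k, mu_k, R=None):
    """One CAKE/CURNOLE iteration, Eqs. (5)-(11). X is (n, m), F_prev is (n,)."""
    n, m = X.shape
    R = np.ones(m) if R is None else R                     # default r_j = 1
    dot = X @ R                                            # Eq. (5): R . X_i
    norms = np.linalg.norm(R) * np.linalg.norm(X, axis=1)  # Eq. (6)
    alpha = np.arccos(np.clip(dot / norms, -1.0, 1.0))     # Eqs. (7)-(10): angles
    G, H = np.cos(mu_k * alpha), np.sin(mu_k * alpha)      # CURNOLE transfer
    return a_k + b_k * F_prev + c_k * G + d_k * H          # Eq. (11): CAKE node
```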
Data normalization in DAN2 can be represented by the trigonometric function $\cos(\mu_k \alpha_i + \theta)$: at each layer, the vector $R$ is rotated and shifted to minimize the resulting total error.

If model training stops too early, the network is said to be under-trained or under-fitted. An under-trained model often has high SSE values for the training and/or validation data sets; under-training often occurs when there are insufficient data for model fitting. DAN2 uses the statistic $\epsilon_1 = (\mathrm{SSE}_k - \mathrm{SSE}_{k-1}) / \mathrm{SSE}_k \le \epsilon_1^{*}$ to assess the existence or absence of under-training (Ghiassi & Saidane, 2005). Over-training or over-fitting is a more common problem in neural net modeling: a neural network model is considered over-fitted (over-trained) when it fits the in-sample data well but produces poor out-of-sample results. To avoid over-fitting, Ghiassi and Saidane (2005) divide the available in-sample data into training and validation data sets. At each iteration $k$ ($k > 1$), they compute MSE values for both the training ($\mathrm{MSE}_T$) and validation ($\mathrm{MSE}_V$) sets and use $\epsilon_2 = |\mathrm{MSE}_T - \mathrm{MSE}_V| / \mathrm{MSE}_T \le \epsilon_2^{*}$ to guard against over-fitting. The model is considered fully trained when the user-specified accuracy criterion and the over-fitting constraint are both satisfied. The accuracy levels $\epsilon_1^{*}$ and $\epsilon_2^{*}$ are problem dependent and should be determined experimentally (Ghiassi & Saidane, 2005).
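The two training diagnostics just quoted can be written directly as functions; the thresholds are problem dependent, as the text notes, so the values in the comment are placeholders:

```python
def undertraining_stat(sse_k, sse_prev):
    # eps1 of Ghiassi & Saidane (2005): relative SSE improvement at layer k
    return (sse_k - sse_prev) / sse_k

def overfitting_stat(mse_train, mse_val):
    # eps2: relative gap between training and validation MSE
    return abs(mse_train - mse_val) / mse_train

# Training is considered complete when both statistics fall below their
# user-specified, experimentally determined thresholds, e.g.:
# undertraining_stat(...) <= 0.01 and overfitting_stat(...) <= 0.05
```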
3.3. GARCH-MLP models

The autoregressive conditional heteroscedasticity (ARCH) model considers the variance of the current error term to be a function of the variances of the previous time periods' error terms: ARCH relates the error variance to the square of a previous period's error. If an autoregressive moving average (ARMA) model is assumed for the error variance, the model is a generalized autoregressive conditional heteroscedasticity (GARCH) model. In that case, the GARCH(p, q) model (where $p$ is the order of the GARCH terms $\sigma^2$ and $q$ is the order of the ARCH terms $\epsilon^2$) is given by

$$\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \cdots + \alpha_q \epsilon_{t-q}^2 + \beta_1 \sigma_{t-1}^2 + \cdots + \beta_p \sigma_{t-p}^2 = \alpha_0 + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2. \quad (12)$$

Most financial series are known to be easily modelled by GARCH(1,1), so this research uses the variables extracted from GARCH(1,1), as Roh (2007) suggests. GARCH(1,1) has the following formula:

$$\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2, \quad (13)$$

where $\sigma_t$ is the volatility at $t$, $\alpha_0$ is the nonconditional volatility coefficient, $\epsilon_{t-1}^2$ is the squared residual at $t-1$, and $\sigma_{t-1}^2$ is the variance at $t-1$.

The newly extracted variables are as follows (Roh, 2007):

$$\sigma_t^{2\,\prime} = \beta_1 \sigma_{t-1}^2, \quad (14)$$

$$\epsilon_{t-1}^{2\,\prime} = \alpha_1 \epsilon_{t-1}^2. \quad (15)$$

We use these new variables as additional inputs for every type of ANN given above.
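As a sketch of this Roh-style input extraction, Eqs. (13)-(15), the recursion below rebuilds the conditional variance path and the two extra ANN inputs. The coefficients $\alpha_0$, $\alpha_1$, $\beta_1$ are assumed to have been fitted beforehand (e.g. by maximum likelihood), and the initialization of the variance recursion is a common choice, not specified in the paper:

```python
import numpy as np

def garch_inputs(eps, a0, a1, b1):
    """eps: residual series e_t. Returns sigma2 and the two extracted inputs."""
    sigma2 = np.empty_like(eps, dtype=float)
    sigma2[0] = eps.var()                                     # assumed initialization
    for t in range(1, len(eps)):
        sigma2[t] = a0 + a1 * eps[t-1]**2 + b1 * sigma2[t-1]  # Eq. (13)
    sigma2_lag = np.r_[sigma2[0], sigma2[:-1]]
    eps2_lag = np.r_[eps[0]**2, eps[:-1]**2]
    return sigma2, b1 * sigma2_lag, a1 * eps2_lag             # Eqs. (14)-(15)
```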
3.4. EGARCH-MLP models

EGARCH incorporates the leverage effect with the following formula:

$$\ln \sigma_t^2 = \alpha + \beta \ln \sigma_{t-1}^2 + \gamma \left( \left| \frac{\epsilon_{t-1}}{\sigma_{t-1}} \right| - \sqrt{\frac{2}{\pi}} \right) + \omega \, \frac{\epsilon_{t-1}}{\sigma_{t-1}}, \quad (16)$$

where $\alpha$ is the nonconditional variance coefficient, $\ln \sigma_{t-1}^2$ is the log value of the variance at $t-1$, $\left( |\epsilon_{t-1}/\sigma_{t-1}| - \sqrt{2/\pi} \right)$ is the asymmetric shock by leverage effect, and $\epsilon_{t-1}/\sigma_{t-1}$ is the leverage effect. The newly extracted variables are as follows (Roh, 2007):

$$\ln \sigma_{t-1}^{2\,\prime} = \beta \ln \sigma_{t-1}^2, \quad (17)$$

$$\mathrm{LE}\ \text{(leverage effect)} = \gamma \left( \left| \frac{\epsilon_{t-1}}{\sigma_{t-1}} \right| - \sqrt{\frac{2}{\pi}} \right), \quad (18)$$

$$\mathrm{L}\ \text{(leverage)} = \omega \, \frac{\epsilon_{t-1}}{\sigma_{t-1}}. \quad (19)$$

3.5. Model performance measures: MSE and MAD

In the literature, mean square error (MSE) and mean absolute deviation (MAD) are generally used for evaluating the performance of ANNs. MSE and MAD are obtained by the following formulas:

$$\mathrm{MSE} = \frac{1}{n} \sum_k (r_k - y_k)^2, \quad (20)$$

$$\mathrm{MAD} = \frac{1}{n} \sum_k |r_k - y_k|, \quad (21)$$

where $y_k$ is the actual value of the $k$th observation, $r_k$ is the output of the ANN for the $k$th observation, and $n$ is the number of observations.

In this study another form of MAD is also used: MAD%, the mean absolute deviation in percentage. When the descriptive statistics are analyzed, it is easy to see that the training data lie between 1268.64 and 1844.25, while the test data lie between 1664.19 and 1867.32. Thus, to clarify the errors between training and testing, MAD% values are given. MAD% is obtained by the following formula:

$$\mathrm{MAD}\% = \frac{1}{n} \sum_k \frac{|r_k - y_k|}{y_k} \times 100. \quad (22)$$

In order to facilitate the comparison of the training and testing performance of the models, MAD% values are used.
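The three comparison measures of Eqs. (20)-(22) in code form ($y$ is the actual series, $r$ the model output; array names are illustrative):

```python
import numpy as np

def mse(r, y):
    return np.mean((r - y) ** 2)             # Eq. (20)

def mad(r, y):
    return np.mean(np.abs(r - y))            # Eq. (21)

def mad_pct(r, y):
    return np.mean(np.abs(r - y) / y) * 100  # Eq. (22): MAD %
```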
4. Case: forecasting NASDAQ index

In this research, daily values of the NASDAQ index from October 7, 2008 to June 26, 2009 are used. The first 146 days are used for training and cross validation, and the last 36 for testing. For the hybrid models, the new variables extracted from GARCH and EGARCH are also calculated, using MS Excel. Since EGARCH showed no asymmetric shocks (giving two input values of 0 for EGARCH-ANN), that model is eliminated from the comparisons. For MLP and GARCH-MLP, the NeuroSolutions 5.06 software is used; for calculating DAN2 and GARCH-DAN2, MS Excel is used.

MLP and DAN2 use the last four days to forecast the fifth day, while GARCH-MLP and GARCH-DAN2 use the last four days plus the two additional inputs calculated from the GARCH model. While the MLP models use the data one record at a time, DAN2 uses all the data at once to calculate the model parameters.
model parameters.
input-out mapping feature of ANN models. Input–output map-
Training data is given in Fig. 2 and the forecast period with tests
ping feature is described by Haykin (1999) as learning from
is shown in Fig. 3.
available data by input–output mapping without making prior
As it is observed in Table 2, the analysis shows that DAN2 gives
assumptions on the model or inputs.
the best MSE and MAD in test data MLP gives the best results in the
7. Since DAN2 is a new architecture and studied by its developers
training data.
some parts of the architecture are not clear enough. For exam-
ple updating method of ai values is never mentioned by Ghiassi
& Saidane (2005). Another example is calculating lk, which is a
constant multiplier of CURNOLE node for iteration k. In this
1. If the starting process (regression) of DAN2 forecasts well, the remaining structure of DAN2 can take the forecasts one step further.
2. There are multiple inputs but a single output in the model, since it is based on multi-variable regression, so DAN2 cannot be used for multiple-output problems.
3. DAN2 does not carry the adaptivity feature of ANN models, which allows remodelling when the training data change. As a result, when time passes, new data become available for time series forecasting, and a new forecast is needed, the entire DAN2 model must be thrown away and a new DAN2 model established.
4. Reliability tests on the input data, which are a necessity due to the regression starter, do not exist.
5. Each layer has to be controlled by reliability tests, since it is the input of the following layer's regression model. DAN2 could have a truly dynamic architecture by removing insignificant layer connections.
6. Since this study and other studies show that some significance test should be done for the validity of the model, this contradicts the input-output mapping feature of ANN models. The input-output mapping feature is described by Haykin (1999) as learning from available data by input-output mapping without making prior assumptions on the model or inputs.
7. Since DAN2 is a new architecture studied mainly by its developers, some parts of the architecture are not clear enough. For example, the updating method of the $\alpha_i$ values is never mentioned by Ghiassi and Saidane (2005). Another example is the calculation of $\mu_k$, the constant multiplier of the CURNOLE node for iteration $k$; in this study the suggested bisection method (Ghiassi & Saidane, 2005) gave results very close to the starting $\mu_k$, converging to the starting value.

The problems mentioned about the DAN2 structure clearly show that the DAN2 architecture behaves like a statistical method rather than an artificial neural network.

The overall results show that the classical ANN model MLP gives the most reliable results in forecasting time series. The hybrid methods failed to improve the forecast results.



Table 2
Results of ANN and hybrid models.

| Method | Training MSE | Training MAD | Training MAD% | Test MSE | Test MAD | Test MAD% |
|---|---|---|---|---|---|---|
| MLP | 2227.416 | 36.909 | 2.324 | 2478.1468 | 41.153 | 2.516 |
| GARCH-MLP | 2695.324 | 38.446 | 2.465 | 3665.8387 | 42.739 | 2.775 |
| DAN2 | 2349.259 | 37.290 | 2.409 | 1472.278 | 32.875 | 2.768 |
| GARCH-DAN2 | 19383.400 | 119.081 | 7.361 | 20901.198 | 109.626 | 6.487 |

Table 3
Coefficients of the DAN2 regression model (dependent variable: NASDAQ).

| Model term | Unstandardized B | Std. error | Standardized beta | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 1460.510 | 196.043 | — | 7.450 | .000 |
| LAG1 | .277 | .244 | .224 | 1.138 | .257 |
| LAG2 | .017 | .310 | .014 | .056 | .955 |
| LAG3 | .052 | .308 | .044 | .168 | .867 |
| LAG4 | .243 | .247 | .219 | .985 | .326 |
| σ²(t−1) | 119517.701 | 190445.589 | .071 | .628 | .531 |
| ε²(t−1) | 12831.899 | 37110.957 | .030 | .346 | .730 |
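A regression of the kind reported in Table 3 can be reproduced with standard tools. The sketch below uses statsmodels, which is our choice rather than the paper's software; `nasdaq`, `sigma2_lag`, and `eps2_lag` are assumed arrays, and the fitted values would of course differ from the table:

```python
import numpy as np
import statsmodels.api as sm

def starter_regression(nasdaq, sigma2_lag, eps2_lag, lags=4):
    """Regress the index on its four lags plus the two GARCH terms."""
    y = nasdaq[lags:]
    X = np.column_stack([nasdaq[lags - k:-k] for k in range(1, lags + 1)]
                        + [sigma2_lag[lags:], eps2_lag[lags:]])
    X = sm.add_constant(X)            # intercept, as in Table 3
    return sm.OLS(y, X).fit()         # .summary() reports B, std. error, t, Sig.
```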

When the MLP results are observed, the NASDAQ index seems to show inconsistencies for a considerable time before it can stabilize. Only 17 days can be predicted based on cycles completed in the time series history. It can also be said that longer periods cannot be forecasted by technical analysis alone (by only using previous index values, or variables derived from previous index values); such analyses are to be complemented by fundamental analysis (evaluating the index with global and national economic analysis, market analysis, ratio analysis, etc.) or by case-based scenario analysis. It should be noted that, when the forecasted and realized index data are evaluated, the MLP model clearly showed that the first movement of the NASDAQ index is down, and this forecast was realized at about the 9th day. But, as mentioned before, we do not have a crystal ball, so forecasting the subsequent movements is very difficult.
6. Conclusion & future research

This study searches for ways of reducing the shortcomings of using ANN in predicting market values. With this aim, the study is motivated by a new ANN model, DAN2, developed by Ghiassi and Saidane (2005), and by the hybrid models (GARCH-ANN, EGARCH-ANN) developed by Roh (2007). In order to present the differences in prediction accuracy, all the models are applied to the same set of data retrieved from the NASDAQ stock exchange.

The results show that the classical ANN model MLP outperforms DAN2 and GARCH-MLP by a small margin. The GARCH inputs had a noise effect on DAN2 because of the inconsistencies explained in the previous section, and GARCH-DAN2 clearly had the worst results. Thus further research should focus on improving the DAN2 architecture. At least for now, the simple MLP seems to be the best and most practical ANN architecture.

When the MLP model was used to forecast the future movements of the NASDAQ index, it correctly forecasted the first movement as down. The realized value (1747.17) had a very small difference (0.54%) from the forecasted value (1737.70). Thus MLP is a powerful and practical tool for forecasting stock movements.

Since the hybrid models (GARCH-ANN) do not give satisfying results, despite Roh's (2007) research, many time series should be studied to understand the inner dynamics of hybrid models before drawing conclusions about their performance. Roh reported that 20-25% of the learning of each ANN came from the GARCH or E-GARCH input variables, which are inputs of technical analysis, but 75-80% in that research came from many other correlated variables, which are inputs of fundamental analysis, such as bond yields, bond prices, contract volume, etc. Further research should focus on discovering whether GARCH and E-GARCH have a correcting effect on forecasts, or whether other correlated variables have a corrective effect on forecasts. The results of such further research will lead to many powerful financial time series forecasting models.

References

Alpaydın, E. (2004). Introduction to machine learning. London, England: The MIT Press.
Ando, T. (2009). Bayesian portfolio selection using a multifactor model. International Journal of Forecasting, 25(3), 550–566.
Atsalakis, G. S., & Valavanis, K. P. (2009a). Forecasting stock market short-term trends using a neuro-fuzzy based methodology. Expert Systems with Applications, 36, 10696–10707.
Atsalakis, G. S., & Valavanis, K. P. (2009b). Surveying stock market forecasting techniques – Part II: Soft computing methods. Expert Systems with Applications, 36, 5932–5941.
Bildirici, M., & Ersin, Ö. Ö. (2009). Improving forecasts of GARCH family models with the artificial neural networks: An application to the daily returns in Istanbul stock exchange. Expert Systems with Applications, 36, 7355–7362.
Bollerslev, T. (1986). Generalized autoregressive conditional heteroscedasticity. Journal of Econometrics, 31, 307–327.
Celik, A. E., & Karatepe, Y. (2007). Evaluating and forecasting banking crises through neural network models: An application for Turkish banking sector. Expert Systems with Applications, 33, 809–815.
Chang, P.-C., Liu, C.-H., Lin, J.-L., Fan, C.-Y., & Ng, C. S. P. (2009). A neural network with a case based dynamic window for stock trading prediction. Expert Systems with Applications, 36, 6889–6898.
Chen, C. W. S., & So, M. K. P. (2006). On a threshold heteroscedastic model. International Journal of Forecasting, 22(1), 73–89.
Chen, M.-S., Ying, L.-C., & Pan, M.-C. (2010). Forecasting tourist arrivals by using the adaptive network-based fuzzy inference system. Expert Systems with Applications, 37(2), 1185–1191.
Cheng, J., Chen, H., & Lin, Y. (2010). A hybrid forecast marketing timing model based on probabilistic neural network, rough set and C4.5. Expert Systems with Applications, 37(3), 1814–1820.
Egrioglu, E., Aladag, C. H., Yolcu, U., Uslu, V. R., & Basaran, M. A. (2009). A new approach based on artificial neural networks for high order multivariate fuzzy time series. Expert Systems with Applications, 36, 10589–10594.
Engle, R. F. (1982). Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica, 50(4), 987–1008.
Ghiassi, M., & Saidane, H. (2005). A dynamic architecture for artificial neural networks. Neurocomputing, 63, 397–413.
Ghiassi, M., Saidane, H., & Zimbra, D. K. (2005). A dynamic artificial neural network model for forecasting time series events. International Journal of Forecasting, 21, 341–362.
Ghiassi, M., Zimbra, D. K., & Saidane, H. (2006). Medium term system load forecasting with a dynamic artificial neural network model. Electric Power Systems Research, 76, 302–316.
Gooijer, J. G. D., & Hyndman, R. J. (2006). 25 years of time series forecasting. International Journal of Forecasting, 22, 443–473.
Guresen, E., & Kayakutlu, G. (2008). Forecasting stock exchange movements using artificial neural network models and hybrid models. In Z. Shi, E. Mercier-Laurent, & D. Leake (Eds.), Proceedings of the 5th IFIP international conference on intelligent information processing, Intelligent Information Processing IV (Vol. 288, pp. 129–137). Boston: Springer.
Hamzacebi, C., Akay, D., & Kutay, F. (2009). Comparison of direct and iterative artificial neural network forecast approaches in multi-periodic time series forecasting. Expert Systems with Applications, 36, 3839–3844.
Hassan, M. R., Nath, B., & Kirley, M. (2007). A fusion model of HMM, ANN and GA for stock market forecasting. Expert Systems with Applications, 33, 171–180.
Haykin, S. (1999). Neural networks: A comprehensive foundation. New Jersey, USA: Prentice Hall.
Khashei, M., & Bijari, M. (2010). An artificial neural network (p, d, q) model for time series forecasting. Expert Systems with Applications, 37(1), 479–489.
Kumar, P. R., & Ravi, V. (2007). Bankruptcy prediction in banks and firms via statistical and intelligent techniques – A review. European Journal of Operational Research, 180, 1–28.
Lee, S., Lee, D., & Oh, H. (2005). Technological forecasting at the Korean stock market: A dynamic competition analysis using Lotka–Volterra model. Technological Forecasting and Social Change, 72(8), 1044–1057.
Leu, Y., Lee, C., & Jou, Y. (2009). A distance-based fuzzy time series model for exchange rates forecasting. Expert Systems with Applications, 36, 8107–8114.
Liao, Z., & Wang, J. (2010). Forecasting model of global stock index by stochastic time effective neural network. Expert Systems with Applications, 37(1), 834–841.
Majhi, R., Panda, G., & Sahoo, G. (2009). Efficient prediction of exchange rates with low complexity artificial neural network models. Expert Systems with Applications, 36, 181–189.
Maltritz, D., & Eichler, S. (2010). Currency crisis prediction using ADR market data: An options-based approach. International Journal of Forecasting, 26(4), 858–884.
Mcnelis, P. D. (2005). Neural networks in finance: Gaining predictive edge in the market. USA: Elsevier Academic Press.
Menezes, L. M., & Nikolaev, N. Y. (2006). Forecasting with genetically programmed polynomial neural networks. International Journal of Forecasting, 22, 249–265.
Nelson, D. B. (1991). Conditional heteroskedasticity in asset returns: A new approach. Econometrica, 59(2), 347–370.
Pekkaya, M., & Hamzaçebi, C. (2007). Yapay sinir ağları ile döviz kuru tahmini üzerine bir uygulama [An application on exchange rate forecasting with artificial neural networks]. In Proceedings of the 27th YA/EM National Congress (pp. 973–978). Izmir, Turkey.
Preminger, A., & Franck, R. (2007). Forecasting exchange rates: A robust regression approach. International Journal of Forecasting, 23, 71–84.
Principe, J. C., Euliano, N. R., & Lefebvre, W. C. (1999). Neural and adaptive systems: Fundamentals through simulations. New York, USA: John Wiley & Sons.
Ramnath, S., Rock, S., & Shane, P. (2008). The financial analyst forecasting literature: A taxonomy with suggestions for further research. International Journal of Forecasting, 24(1), 34–75.
Roh, T. H. (2007). Forecasting the volatility of stock price index. Expert Systems with Applications, 33, 916–922.
Scharth, M., & Medeiros, M. C. (2009). Asymmetric effects and long memory in the volatility of Dow Jones stocks. International Journal of Forecasting, 25(2), 304–327.
Wang, Y., Keswani, A., & Taylor, S. J. (2006). The relationships between sentiment, returns and volatility. International Journal of Forecasting, 22(1), 109–123.
Yu, T. H., & Huarng, K. (2008). A bivariate fuzzy time series model to forecast the TAIEX. Expert Systems with Applications, 34, 2945–2952.
Yudong, Z., & Lenan, W. (2009). Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network. Expert Systems with Applications, 36, 8849–8854.
Zhang, Y., & Wan, X. (2007). Statistical fuzzy interval neural networks for currency exchange rate time series prediction. Applied Soft Computing, 7, 1149–1156.
Zhu, X., Wang, H., Xu, L., & Li, H. (2008). Predicting stock index increments by neural networks: The role of trading volume under different horizons. Expert Systems with Applications, 34, 3043–3054.
