SOM-GP Stock Price Prediction System

The document discusses a hybrid procedure for stock price prediction that integrates self-organizing maps (SOM) and genetic programming (GP). This SOM-GP approach clusters data to improve prediction accuracy by modeling the relationship between technical indicators and stock prices. Experimental results demonstrate its effectiveness in predicting the finance and insurance sub-index of TAIEX, highlighting the challenges posed by stock price volatility.



A hybrid procedure for stock price prediction by integrating self-organizing map and genetic programming

Article in Expert Systems with Applications · May 2011

DOI: 10.1016/j.eswa.2011.04.210 · Source: DBLP


1 author:

Chih-Ming Hsu
Minghsin University of Science and Technology

All content following this page was uploaded by Chih-Ming Hsu on 13 September 2018.


Expert Systems with Applications 38 (2011) 14026–14036

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

A hybrid procedure for stock price prediction by integrating self-organizing map and genetic programming
Chih-Ming Hsu ⇑
Department of Business Administration, Minghsin University of Science and Technology, 1 Hsin-Hsing Road, Hsin-Fong, Hsinchu 304, Taiwan, ROC

Article info

Keywords: Stock price prediction; Self-organizing map; Genetic programming

Abstract

Stock price prediction is a very important financial topic, and is considered a challenging task and worthy of the considerable attention received from both researchers and practitioners. Stock price series have properties of high volatility, complexity, dynamics and turbulence, thus the implicit relationship between the stock price and predictors is quite dynamic. Hence, it is difficult to tackle stock price prediction problems effectively by using only a single soft computing technique. This study hybridizes a self-organizing map (SOM) neural network and genetic programming (GP) to develop an integrated procedure, namely, the SOM-GP procedure, in order to resolve problems inherent in stock price predictions. The SOM neural network is utilized to divide the sample data into several clusters, in such a manner that the objects within each cluster possess similar properties to each other, but differ from the objects in other clusters. The GP technique is applied to construct a mathematical prediction model that describes the functional relationship between technical indicators and the closing price of each cluster formed in the SOM neural network. The feasibility and effectiveness of the proposed hybrid SOM-GP prediction procedure are demonstrated through experiments aimed at predicting the finance and insurance sub-index of TAIEX (Taiwan stock exchange capitalization weighted stock index). Experimental results show that the proposed SOM-GP prediction procedure can be considered a feasible and effective tool for stock price predictions, as based on the overall prediction performance indices. Furthermore, it is found that the frequent and alternating rise and fall, as well as the range of daily closing prices during the period, significantly increase the difficulties of predicting.
© 2011 Elsevier Ltd. All rights reserved.

⇑ Tel.: +886 3 5593142x3581; fax: +886 3 5593142x3573.
E-mail address: [email protected]

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.04.210

1. Introduction

Stock price prediction is an important financial subject, which has received considerable attention from researchers in recent years. It is considered a challenging task because of the high volatility, complexity, dynamics, and turbulence of stock prices. In the past, many attempts have been made to predict stock prices using various methodologies, which can be broadly classified into three categories, namely, fundamental analysis, technical analysis, and traditional time series forecasting. Fundamental analysis examines the basic financial information of a corporation in order to forecast profits, supply, demand, industry strength, management abilities, and other intrinsic matters affecting the market value and growth potential of a stock (Thomsett, 1998). In fundamental analysis, investors believe that the fundamentals, which include a corporation's financial statements, interim reports, historical financial trends, and any forecasts concerning future growth, sales, profits, etc., should rule the selection of stocks and the timing of sales (Thomsett, 1999). However, technical analysis studies the stock prices and related issues, including analysis of recent and historical price trends, cycles and factors beyond the stock price, such as dividend payments, trading volume, index trends, industry group trends and popularity, and volatility of a stock (Thomsett, 1999). In technical analysis, rather than relying solely upon historical financial information, analysts draw inferences from recent trends in stock price changes, price and earnings relationships, the activity volume of a particular stock or industry, and other similar indicators in order to determine changes in stocks, and in the market itself (Thomsett, 1999). In addition, traditional time series forecasting techniques, such as autoregressive integrated moving average (ARIMA) (Box & Jenkins, 1970), generalized autoregressive conditional heteroskedasticity (GARCH) (Bollerslev, 1986), and multivariate regression have been applied to the prediction of stock price movements. In recent years, data mining/computational intelligence techniques have become another important approach to predicting stock prices. For example, Kim and Han (2000) utilized genetic algorithms (GAs) to discretize features and determine the connection weights of artificial neural networks (ANNs), thus predicting the stock price index. Experiments conducted on the daily Korea stock price index (KOSPI) showed that their proposed approach outperformed the linear
transformation functions of both a backpropagation neural network (BPLT) and a linear transformation with ANN, as trained by GA (GALT). Kim (2003) applied a support vector machine (SVM) to predict the stock price index, and the feasibility of applying SVM to financial forecasting was examined through comparisons with a backpropagation neural network (BPNN) and case-based reasoning (CBR). The experimental results of the daily Korea stock price index (KOSPI) investigation showed that SVM provides a promising alternative for financial time series forecasting; moreover, it outperforms both BPNN and CBR approaches. Pai and Lin (2005) proposed a hybrid methodology that exploits the strengths of the autoregressive integrated moving average (ARIMA) and support vector machine (SVM) in order to forecast stock prices. The performance of the proposed model was evaluated by testing real data sets of ten stocks, and adequate results were obtained. Tsang et al. (2007) presented a stock buying/selling alert system using a feed-forward backpropagation neural network, called NN5. The system was tested with data from The Hong Kong and Shanghai Banking Corporation (HSBC) Holdings stock, located in Hong Kong, and achieved an overall hit rate of over 70%. Chang and Liu (2008) presented a Takagi–Sugeno–Kang (TSK) type fuzzy rule based system by applying a linear combination consequence of the significant technical index in order to predict stock prices. Their proposed approach was tested on the Taiwan Stock Exchange (TSE) and MediaTek Inc., and the experimental results outperformed other methodologies, such as a back-propagation neural network and multiple regression analysis. Ince and Trafalis (2008) assumed that the future value of a stock price depends on its financial indicators, although there is no existing parametric model able to explain the relationship coming from the technical analysis. Hence, they proposed two nonparametric data driven models, a support vector regression (SVR) and a multi-layer perceptron (MLP), for short term stock price predictions based on technical indicators. The experiments were conducted on the daily stock prices of ten companies traded on the NASDAQ, and comparison results indicated that the SVR approach outperformed the MLP networks in short term predictions, in terms of the mean square error. Huang and Tsai (2009) proposed a hybrid procedure using support vector regression (SVR), self-organizing feature map (SOFM), and filter-based feature selection in order to predict the stock market price index. Their proposed model was demonstrated through a case study of predictions of the next day's price index for Taiwan index futures (FITX), and the experiment results showed that the proposed approach can improve prediction accuracy and reduce the training time over the traditional single SVR model. Lai, Fan, Huang, and Chang (2009) proposed a decision-making system that integrates a data clustering technique, a fuzzy decision tree, and genetic algorithms in order to forecast stock price tendencies. Three particular stocks in the Taiwan Stock Exchange Corporation (TSEC) were selected to test the effectiveness of their proposed system, which yielded the best performance of an 82% average hit rate, in comparison with other approaches. Liang, Zhang, Xao, and Chen (2009) presented a nonparametric methodology based on neural networks (NNs) and support vector regression (SVR) to forecast option prices. In their study, the improved conventional option pricing methods were modified to forecast the option prices, and then, the NN and SVR were further employed to decrease the forecasting errors of the parametric methods. The proposed approach was demonstrated by experimental studies upon data taken from the Hong Kong options market, whose results showed that the NN and SVR approaches can significantly shrink the average forecast errors, thus improving forecasting accuracy. Lee (2009) developed a model based on a support vector machine (SVM) with a hybrid feature selection, namely, F-score and supported sequential forward search (F_SSFS), to predict the trends of stock markets. The experiments of predicting the NASDAQ index direction were used to illustrate their proposed method, and suitable results were obtained. In addition, comparisons with information gain, symmetrical uncertainty, and correlation-based feature selection methods all indicated that their proposed model could yield the highest levels of accurate and generalized performances. Yu, Chen, Wang, and Lai (2009) presented an evolving least squares support vector machine (LSSVM) learning paradigm, with a mixed kernel based on genetic algorithms (GAs), in order to predict the trends of stock markets. The GAs were used to select the input features and optimize parameters of LSSVM. The LSSVM approach was illustrated through testing the S&P 500 index, the Dow Jones Industrial Average (DJIA) index, and New York Stock Exchange (NYSE) index, and experimental results revealed that their proposed learning paradigm was more efficient than other parameter optimization methods, and outperformed all other forecasting models in terms of the hit ratio. Zhang, An, Tang, and Hong (2009) proposed a type-2 fuzzy rule based expert system that applied technical and fundamental indices as the input variables for the analysis of stock prices. Their proposed model was tested on the stock price predictions of an automotive manufacturer in Asia, and successful results were obtained.

In this study, an integrated approach based on a self-organizing map (SOM) neural network and genetic programming (GP), namely, the SOM-GP procedure, is proposed for predicting stock prices. The remainder of this paper is organized as follows: In Section 2, SOM and GP are discussed. The proposed integrated approach is presented in Section 3. Section 4 evaluates the feasibility and effectiveness of the proposed approach by a case study of predicting the finance and insurance sub-index of TAIEX. Finally, Section 5 concludes the paper.

2. Self-organizing map and genetic programming

2.1. Self-organizing map

The self-organizing map (SOM) was first introduced by Kohonen (1989), as an unsupervised and competitive learning neural network able to map a high-dimensional input data space into a lower-dimensional (typically one- or two-dimensional) space. The end-product is called a feature map, and it preserves the most important topological relationships of the input data. The typical SOM consists of two layers, as shown in Fig. 1, where the input layer is fully connected to a two-dimensional Kohonen layer, and the neurons within the Kohonen layer are not connected to one another. Each neuron in the Kohonen layer represents a cluster, whose weight vector serves as an exemplar of the input patterns associated with only this cluster. The self-organizing process chooses a neuron whose weight vector matches the input pattern most closely (usually evaluated by the Euclidean distance) as the winner. The winner and its neighboring neurons (based on the activation zone for each neuron) then update their weights.

Fig. 1. Typical SOM topology consists of two layers.

Fig. 2. Tree-based representation of an individual in GP.

By following the architecture and algorithm for the SOM neural network, input data can be clustered into a certain number (i.e. the total number of neurons in the Kohonen layer) of clusters. Assume that there is a set of continuous-valued input patterns $x = (x_1, x_2, \ldots, x_i, \ldots, x_n)$ and $m$ cluster neurons within the feature map, that the weight vector associated with neuron $j$ in the Kohonen layer is represented by $w_j = (w_{1j}, w_{2j}, \ldots, w_{ij}, \ldots, w_{nj})$, and that the neighborhood function used to control the relaxation process is denoted by $h_{j'j}$ (where $j'$ and $j$ are the subscripts of the neurons in the Kohonen layer). The training steps include competitive and weight adjustment processes, described as follows (Fausett, 1994; Kohonen, 1995):
Step 0: Initialize weights $w_j$ and neighborhood functions $h_{j'j}$; then set the radius of the topological neighborhood $R$ and the learning rate $\alpha$.
Step 1: While the stopping criterion is not fulfilled, repeat Steps 2–8.
Step 2: For each input vector $x$, complete Steps 3–5.
Step 3: For each cluster neuron $j$, compute
$$D(j) = \lVert x - w_j \rVert.$$
Step 4: Find index $c$ such that $D(c)$ is the minimum.
Step 5: For all neurons $j$ within the topological neighborhood of radius $R$ of neuron $c$:
$$w_j(t+1) = w_j(t) + \alpha(t)\, h_{cj}(t)\, [x - w_j(t)],$$
where $t$ is a discrete-time coordinate.
Step 6: Update the learning rate $\alpha$ and neighborhood function $h_{j'j}$.
Step 7: Reduce the radius of the topological neighborhood $R$ at the pre-specified times.
Step 8: Test stopping criterion.

Notably, the learning rate $\alpha$ and the radius of the topological neighborhood $R$ decrease as the clustering process progresses. The neighborhood function $h_{j'j}$ is a smoothing kernel function defined over the lattice that decreases monotonically in time. There are two frequent choices for $h_{j'j}$ in literature (Kohonen, 1995). The simpler of the two refers to a neighborhood set of array points around winner $c$, where the neighborhood function is defined as:
$$h_{cj}(t) = \begin{cases} 1, & \text{if neuron } j \text{ lies within a radius } R \text{ of the winning neuron } c, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$
Another widely used, smoother choice is the Gaussian neighborhood function centered at the winning neuron $c$, defined by:
$$h_{cj}(t) = \exp\left(-\frac{\lVert r_c - r_j \rVert^2}{2\sigma^2(t)}\right), \tag{2}$$
where $r_c$ and $r_j$ are the location vectors of neurons $c$ and $j$, respectively, within the Kohonen layer; the parameter $\sigma(t)$ is a monotonically decreasing function of time that is used to define the width of the kernel. In addition, the performance of SOM is not sensitive to the exact shape of the neighborhoods, and rectangular and hexagonal neighborhoods are suggested by Kohonen (1995) for efficient implementation.

SOM has attracted substantial research interest from a wide range of applications. For example, adequate results were obtained through SOM in literature (Huang & Tsai, 2009; Lin & Wu, 2009; Szczurowska, Kuniszyk-Jozkowiak, & Smolka, 2009; Zhang et al., 2009).

2.2. Genetic programming

Genetic programming (GP), as developed by Koza (1992), is an evolutionary approach that extends genetic algorithms (GAs) (Holland, 1975) to the area of computer programs. GP can automatically create computer programs to solve a user-defined problem through iterative executions of evolutionary procedures. The evolving individuals in GP are themselves computer programs, rather than fixed-length strings consisting of numbers, alphabetic letters, or symbols. In GP, the representation of an individual can be viewed as a tree-based structure composed of terminal and function sets, as shown in Fig. 2. The terminal set defines the terminal elements available for each branch of the to-be-evolved computer program, and includes the independent variables of the problem, zero-argument functions, random constants, etc. The function set is a set of primitive functions available to each branch of the to-be-evolved computer program, e.g. addition, square root, multiplication, sine, etc. Like other evolutionary algorithms, a fitness function is defined and used to explicitly or implicitly measure the fitness (adaptability) of individuals in the population. It specifies a desired goal in the search for GP. Furthermore, in order to apply basic genetic programming, users must specify parameters and set the termination criterion. The parameters that control the generation runs of the GP include population size, maximum size of programs, crossover rate, mutation rate, etc. The termination criterion determines when to stop the evolutionary procedures in GP, and may include the maximum number of generations to be run, the fitness values of the best-of-generation individuals reaching a plateau for numerous successive generations, or a success predicate for the run being satisfied. The general steps of GP are briefly described, as follows (Ciglarič & Kidrič, 2006; Koza, Streeter, & Keane, 2008; Koza et al., 2005):

Step 1: Creating an initial population.

The first step creates an initial population (generation 0) of individual computer programs (typically random), which are composed of functions and terminals appropriate to the problem. In
general, the initial individuals are generated subject to a pre-specified maximum size, and are of different sizes and different shapes.

Step 2: Evaluating individuals.

Each program in the population is executed and measured in terms of how well it performs the task at hand (this is called the fitness value), by using a pre-defined fitness function.

Step 3: Generating the next generation.

This step first selects programs from the population using a probability based on fitness. The genetic operations, including reproduction, crossover, mutation, and architecture-altering operations, are applied to the selected programs. Then, a new population (the next generation) is created by replacing the current population (the now-old generation) with the population of offspring based on a certain strategy, e.g. elitist strategy.

Step 4: Examining the termination criterion.

When the termination criterion is satisfied, the outcome is designated as the final result of the run. Typically, the single best program encountered during the entire run (i.e. the best-so-far individual) is selected as the solution for a specific problem. If the termination criterion is not fulfilled, execute Steps 2–4 iteratively.

Genetic programming has been a highly successful technique for solving problems in numerous fields. Various studies have obtained adequate results through GP (Bae et al., 2010; Etemadi, Rostamy, & Dehkordi, 2009; Hwang et al., 2009).

3. Proposed hybrid SOM-GP prediction procedure

In this study, a hybrid approach based on a self-organizing map (SOM) neural network and genetic programming (GP), namely, the SOM-GP procedure, is proposed to predict stock prices. The SOM-GP procedure comprises three stages. In the first stage, the essential historical stock trading data, e.g. opening price, highest price, lowest price, closing price, trade volume, etc., are first collected. Next, the required technical indicators, e.g. moving average (MA), Williams overbought/oversold index (WMS%R), psychological line (PSY), commodity channel index (CCI), etc., which are used as the independent input variables in the stock price prediction model, are calculated. The acquired technical indicators, denoted by $x = (x_1, x_2, \ldots, x_n)$, serve as the sample data in the succeeding SOM clustering steps. To prevent variables with larger numeric ranges from dominating those with smaller numeric ranges, the technical indicators are normalized into a range between $-1$ and $1$, denoted by $x' = (x'_1, x'_2, \ldots, x'_n)$, according to their corresponding maximum and minimum values. An SOM neural network is then used to divide the sample data $x' = (x'_1, x'_2, \ldots, x'_n)$ into an appropriate number of clusters. The purpose of clustering is to split the sample data into several clusters in such a manner that the objects within each cluster are similar to each other and dissimilar to the objects in other clusters. By doing so, the approximate functional model that describes an implicit mathematical relationship between the technical indicators $x = (x_1, x_2, \ldots, x_n)$, i.e. the independent variables, and the closing price in the next day $y$, i.e. the dependent variable, for the sample data in each cluster can be expected to be constructed more easily and precisely. However, it is difficult to determine the most appropriate number of clusters when clustering through SOM. This study introduces an index to measure the performance of clustering. The average distance of all possible paired normalized objects $(x'_i, x'_j)$, which belong to different clusters, is first calculated by:
$$D_{\text{between-clusters}} = \frac{\sum_i \sum_j \lVert x'_i - x'_j \rVert}{np_{\text{between-clusters}}}, \tag{3}$$
where $np_{\text{between-clusters}}$ is the total number of all possible paired objects $(x'_i, x'_j)$ that belong to different clusters. Similarly, the average distance of all possible paired normalized objects $(x'_k, x'_l)$, which are clustered into the same cluster, can be expressed by:
$$D_{\text{within-clusters}} = \frac{\sum_k \sum_l \lVert x'_k - x'_l \rVert}{np_{\text{within-clusters}}}, \tag{4}$$
where $np_{\text{within-clusters}}$ is the total number of all possible paired objects $(x'_k, x'_l)$ that are grouped into the same cluster. Therefore, the clustering efficiency (CE) that is used to evaluate the clustering performance can be defined as:
$$CE = \frac{D_{\text{between-clusters}}}{D_{\text{within-clusters}}}. \tag{5}$$
By maximizing the value of CE, the optimal number of clusters in SOM clustering can then be determined.

In the second stage, the closing price in the next day $y$ is first normalized into a range between $-1$ and $1$ according to its maximum and minimum values, as denoted by $y'$. The normalized technical indicators $x' = (x'_1, x'_2, \ldots, x'_n)$, along with the normalized closing price in the next day $y'$, are then partitioned into training, testing, and validation data, as based on a pre-specified proportion, e.g. 4:1:1. According to the clustering results previously acquired by SOM, the GP algorithm is then applied to the training and testing sample data of each cluster and constructs several prediction models. Based on simultaneously minimizing the mean squared errors (MSEs) regarding the training and testing data, an optimal GP model is selected for each cluster to describe the functional relationship between the normalized technical indicators $x' = (x'_1, x'_2, \ldots, x'_n)$ of each day and the normalized closing price in the next day $y'$.

In the third stage, the normalized technical indicators $x' = (x'_1, x'_2, \ldots, x'_n)$ of each day in the validation data are first grouped into a cluster, denoted as cluster $c$, by inputting $x'$ into the SOM neural model constructed in Stage 1. Then, the normalized technical indicators $x'$ of each day in the validation data are fed into the well-trained GP model corresponding to cluster $c$, as obtained in Stage 2, in order to acquire the predicted normalized closing price in the next day, denoted by $\hat{y}'$. Therefore, the predicted closing price of the next day $\hat{y}$, given the technical indicators $x = (x_1, x_2, \ldots, x_n)$ for a certain day, can then be obtained through de-normalizing $x' = (x'_1, x'_2, \ldots, x'_n)$ and $\hat{y}'$. Finally, the effectiveness of the proposed SOM-GP prediction procedure is evaluated by using the statistical metrics of the root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), which are defined as follows:
$$RMSE = \sqrt{\sum_{i=1}^{n_o} \frac{(y_i - \hat{y}_i)^2}{n_o}}, \tag{6}$$
$$MAE = \sum_{i=1}^{n_o} \frac{\lvert y_i - \hat{y}_i \rvert}{n_o}, \tag{7}$$
$$MAPE = \sum_{i=1}^{n_o} \frac{\lvert y_i - \hat{y}_i \rvert / y_i}{n_o} \times 100\%, \tag{8}$$
where $n_o$ is the total number of objects in the validation data.

The proposed hybrid SOM-GP prediction procedure in this study is conceptually illustrated in Fig. 3, and its three stages are restated in summary form below.
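As a rough illustration of Eqs. (3)–(8), the following Python sketch computes the scaling into the range between −1 and 1, the clustering efficiency, and the three error metrics. The text only says that values are normalized "according to their corresponding maximum and minimum values", so the linear min-max map used here is an assumption, and all function names are hypothetical rather than taken from the study.

```python
import math
from itertools import combinations


def normalize(values):
    """Min-max scale a list of numbers into [-1, 1] (assumed linear map).

    Assumes the values are not all identical, otherwise hi - lo is zero.
    """
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def denormalize(v, lo, hi):
    """Invert normalize() for one value, given the original min and max."""
    return (v + 1.0) * (hi - lo) / 2.0 + lo


def clustering_efficiency(points, labels):
    """Eq. (5): mean between-cluster distance over mean within-cluster distance."""
    between, within = [], []
    for (p, lp), (q, lq) in combinations(list(zip(points, labels)), 2):
        (within if lp == lq else between).append(math.dist(p, q))
    d_between = sum(between) / len(between)  # Eq. (3)
    d_within = sum(within) / len(within)     # Eq. (4)
    return d_between / d_within


def rmse(y, y_hat):
    """Eq. (6): root mean squared error."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Eq. (7): mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def mape(y, y_hat):
    """Eq. (8), in percent; assumes every actual value is positive."""
    return 100.0 * sum(abs(a - b) / a for a, b in zip(y, y_hat)) / len(y)
```

When the labels are the winning neurons assigned by the SOM, a CE above 1 indicates that objects in different clusters are, on average, farther apart than objects sharing a cluster, which is why a larger CE is preferred when choosing the number of clusters.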
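The SOM training steps of Section 2.1 (Steps 0–8) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the NeuralWorks configuration used in the study: the Kohonen layer is one-dimensional, the Gaussian neighborhood of Eq. (2) is applied to every neuron instead of a hard radius $R$, the learning rate and kernel width decay linearly per epoch, and a fixed number of epochs serves as the stopping criterion. The names `winner` and `train_som` are hypothetical.

```python
import math
import random


def winner(weights, x):
    """Steps 3-4: index c of the neuron minimizing D(j) = ||x - w_j||."""
    return min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))


def train_som(data, n_neurons, n_epochs=30, alpha0=0.06, sigma0=1.0, seed=0):
    """Train a one-dimensional SOM and return its list of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Step 0: random initial weights in [-1, 1], the range of the normalized inputs.
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_neurons)]
    for epoch in range(n_epochs):                  # Steps 1 and 8: fixed epoch budget
        frac = epoch / n_epochs
        alpha = alpha0 * (1.0 - frac)              # Step 6: decaying learning rate (assumed linear)
        sigma = max(sigma0 * (1.0 - frac), 0.1)    # Step 7: shrinking kernel width (assumed linear)
        for x in data:                             # Step 2: sweep over the input vectors
            c = winner(weights, x)
            for j, w in enumerate(weights):        # Step 5: move neurons toward x
                # Eq. (2) on a 1-D lattice, where ||r_c - r_j|| = |c - j|.
                h = math.exp(-((c - j) ** 2) / (2.0 * sigma ** 2))
                for i in range(dim):
                    w[i] += alpha * h * (x[i] - w[i])
    return weights
```

After training, `winner(weights, x)` assigns a normalized indicator vector to its cluster, which is the role the SOM plays in Stage 1 and again in Stage 3. Because each update is a convex combination of the old weight and the input, the weights stay inside the range between −1 and 1 whenever the inputs do.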

Fig. 3. Proposed hybrid SOM-GP prediction procedure.
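The GP loop of Section 2.2 (Steps 1–4) can likewise be sketched as a small symbolic-regression example. Several choices below are assumptions made for illustration only: programs are nested tuples, the function set is limited to addition, subtraction, and multiplication, selection uses size-3 tournaments rather than fitness-proportional probabilities, architecture-altering operations are omitted, and oversized offspring are replaced by a copy of the first parent. All names are hypothetical.

```python
import operator
import random

FUNCS = {"+": operator.add, "-": operator.sub, "*": operator.mul}  # assumed function set


def random_tree(rng, n_vars, depth):
    """Grow a random program from the function set and the terminal set."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_vars))  # terminal: an input variable
        return ("c", rng.uniform(-1.0, 1.0))     # terminal: a random constant
    op = rng.choice(sorted(FUNCS))
    return (op, random_tree(rng, n_vars, depth - 1), random_tree(rng, n_vars, depth - 1))


def evaluate(tree, x):
    """Run a program on one input vector x."""
    if tree[0] == "x":
        return x[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return FUNCS[tree[0]](evaluate(tree[1], x), evaluate(tree[2], x))


def mse(tree, xs, ys):
    """Fitness: mean squared error over the samples (lower is better)."""
    return sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


def paths(tree, prefix=()):
    """Yield the index path of every node in the tree."""
    yield prefix
    if tree[0] in FUNCS:
        yield from paths(tree[1], prefix + (1,))
        yield from paths(tree[2], prefix + (2,))


def subtree(tree, path):
    """Return the node reached by following path from the root."""
    for k in path:
        tree = tree[k]
    return tree


def graft(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = graft(tree[path[0]], path[1:], new)
    return tuple(parts)


def evolve(xs, ys, pop_size=40, generations=20, max_nodes=31, seed=0):
    """Return the best program found and the best MSE recorded per generation."""
    rng = random.Random(seed)
    n_vars = len(xs[0])
    pop = [random_tree(rng, n_vars, 3) for _ in range(pop_size)]          # Step 1
    history = []
    for _ in range(generations):                                          # Step 4: generation cap
        scored = sorted(((mse(t, xs, ys), t) for t in pop), key=lambda p: p[0])  # Step 2
        history.append(scored[0][0])
        nxt = [scored[0][1]]                                              # elitist strategy
        while len(nxt) < pop_size:                                        # Step 3
            a = min(rng.sample(scored, 3), key=lambda p: p[0])[1]         # tournament selection
            b = min(rng.sample(scored, 3), key=lambda p: p[0])[1]
            donor = subtree(b, rng.choice(list(paths(b))))
            child = graft(a, rng.choice(list(paths(a))), donor)           # crossover
            if rng.random() < 0.1:                                        # mutation
                child = graft(child, rng.choice(list(paths(child))),
                              random_tree(rng, n_vars, 2))
            if len(list(paths(child))) > max_nodes:                       # maximum program size
                child = a                                                 # fall back to reproduction
            nxt.append(child)
        pop = nxt
    best = min(pop, key=lambda t: mse(t, xs, ys))
    return best, history + [mse(best, xs, ys)]
```

In the SOM-GP setting, `xs` would hold the normalized technical indicators of one cluster and `ys` the normalized next-day closing prices, with one such run per cluster. Keeping the best program of each generation means the recorded best MSE never increases from one generation to the next.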

Stage 1: Clustering sample data by SOM. anticipated, have great impact on the individual stock price. Hence,
1. Collect the essential historical stock trading data and calculate the weighted stock index in a sector, but not an individual stock
the required technical indicators. price, is selected for exploration in this study. Second, the invest-
2. Normalize the technical indicators x = (x1, x2, . . . , xn) into the ment market of exchange traded funds (ETFs), futures, and options
range of (1, 1), represented by x0 ¼ ðx01 ; x02 ; . . . ; x0n Þ. in Taiwan have flourished and grown in recent years. For example,
3. Apply an SOM neural network to divide the sample data Fubon Taiwan Finance ETF, Finance Sector Index Futures (TF), and
x0 ¼ ðx01 ; x02 ; . . . ; x0n Þ into several clusters, based on the clustering Finance Sector Index Options (TFO), whose underlying index is TAI-
efficiency (CE) in Eq. (5). EX-FISI, achieved trade volumes of 47,018,000 (shares), 1,285,074
(lots), and 927,888 (lots), respectively, in 2008 (http://www.twse.-
Stage 2: Building GP prediction models. com.tw; http://www.taifex.com.tw). Furthermore, the trade vol-
1. Normalize the closing price of the next day y into the range umes of TF and TFO grew 41.3% and 22.6%, respectively, from
between 1 and 1, as denoted by y0 . 2005 to 2008. The finance and insurance sub-index of TAIEX is
2. Partition the sample data x0 ¼ ðx01 ; x02 ; . . . ; x0n Þ and y0 into the therefore selected as the object of experimentation in order to
training, testing, and validation data, as based on a pre-specified examine the practicability and performance of the proposed hybrid
proportion. SOM-GP prediction procedure. In this study, the TAIEX-FISI exper-
3. Apply the GP algorithm to construct several prediction models imental data are collected from the Taiwan Stock Exchange Corpo-
from the training and testing sample data in each cluster gener- ration (TWSE) over a period of approximately thirteen years, from
ated by SOM in Stage 1. January 4, 1996 to September 18, 2009. A total of 3,540 pairs of dai-
4. Select an optimal GP model for each cluster to approximate the ly trading data, including opening price, highest price, lowest price,
implicit functional relationship between the normalized techni- closing price, and trade volume are the initial sample data.
cal indicators ðx01 ; x02 ; . . . ; x0n Þ and the closing price of the next day
y0 , as based on minimizing training and testing MSEs.
4.2. Clustering sample data
Stage 3: Measuring the prediction performance.
1. Input the normalized technical indicators x0 ¼ ðx01 ; x02 ; . . . ; x0n Þ of First, 16 technical indicators are selected as the independent in-
each day in the validation data into the SOM neural model, as put variables for predicting stock price, according to previous stud-
constructed in Stage 1, and group x0 into cluster c. ies of Kim and Han (2000), Kim and Lee (2004), Tsang et al. (2007),
2. Feed the normalized technical indicators x0 ¼ ðx01 ; x02 ; . . . ; x0n Þ of Chang and Liu (2008), Ince and Trafalis (2008), Huang and Tsai
each day in the validation data into the well-trained GP model (2009) and Lai et al. (2009). These technical indicators, which for-
corresponding to cluster c, as obtained in Stage 2, in order to mulas are presented in Appendix A, include 10-day moving
acquire the predicted normalized closing price of the next average (MA_10), 20-day bias (BIAS_20), moving average conver-
^0 .
day y gence/divergence (MACD), 9-day stochastic indicator K (K_9),
3. De-normalize ðx01 ; x02 ; . . . ; x0n Þ and y
^0 in order to acquire the pre- 9-day stochastic indicator D (D_9), 9-day Williams overbought/
dicted closing price of the next day y ^0 , when given the technical oversold index (WMS%R_9), 14-day plus directional indicator
indicators ðx1 ; x2 ; . . . ; xn Þ of a certain day. (+DI_14), 14-day minus directional indicator (DI_14), 10-day
4. Evaluate the effectiveness of the proposed SOM-GP prediction momentum (MTM_10), 10-day rate of change (ROC_10), 5-day rel-
procedure, via RMSE, MAE, and MAPE in Eqs. (6)–(8). ative strength index (RSI_5), 24-day commodity channel index
(CCI_24), 26-day buying/selling momentum indicator (AR_26),
26-day buying/selling willingness indicator (BR_26), 26-day vol-
4. Experiments ume ratio (VR_26), and 13-day psychological line (PSY_13).
According to the formulas in Appendix A and the initial 3540 pairs
4.1. Experimental data of daily trading data, the 16 technical indicators are calculated.
Notably, the technical indicators in the first few days from January
To demonstrate the feasibility and effectiveness of the proposed 4, 1996, are not available due to the definitions of technical indica-
hybrid SOM-GP prediction procedure, experiments on predicting tors. For example, the 10-day moving average (MA_10) can be ob-
the finance and insurance sub-index of TAIEX (Taiwan stock tained only for the period after the 10th trading day from January
exchange capitalization weighted stock index), called TAIEX-FISI 4, 1996. By considering the above limitation and validity of the
in this study, are conducted. There are two major reasons for technical indicators, a total of 3497 pairs of technical indicator data
selecting the TAIEX-FISI as the research target. First, it is difficult from March 1, 1996 to September 17, 2009 are used as the sample
to predict the price of an individual stock because the stock market data for clustering. Next, the above 16 technical indicators are
news and contrived manipulations, which usually cannot be normalized into a range between 1 and 1, according to their

Table 1
Clustering results of SOM.

| Number of neurons in the Kohonen layer | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|
| D_between-clusters | 0.0192 | 0.0193 | 0.0194 | 0.0193 | 0.0192 | 0.0190 | 0.0190 | 0.0188 |
| D_within-clusters | 0.0165 | 0.0155 | 0.0141 | 0.0137 | 0.0132 | 0.0136 | 0.0132 | 0.0137 |
| Clustering efficiency (CE) | 1.1636 | 1.2452 | 1.3759 | 1.4088 | 1.4545 | 1.3971 | 1.4394 | 1.3723 |
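The selection of the number of Kohonen-layer neurons from Table 1 can be sketched in a few lines. The sketch assumes that CE is the ratio of the between-cluster distance to the within-cluster distance; this assumption reproduces the CE row of Table 1 to four decimal places, but the formal definition is not restated here. The variable and function names are illustrative only.

```python
# Distances transcribed from Table 1, keyed by the number of Kohonen-layer neurons.
d_between = {3: 0.0192, 4: 0.0193, 5: 0.0194, 6: 0.0193,
             7: 0.0192, 8: 0.0190, 9: 0.0190, 10: 0.0188}
d_within = {3: 0.0165, 4: 0.0155, 5: 0.0141, 6: 0.0137,
            7: 0.0132, 8: 0.0136, 9: 0.0132, 10: 0.0137}


def clustering_efficiency(between: float, within: float) -> float:
    """Assumed definition: CE = D_between-clusters / D_within-clusters."""
    return between / within


# CE for every candidate map size, then the size that maximizes it.
ce = {k: clustering_efficiency(d_between[k], d_within[k]) for k in d_between}
best_k = max(ce, key=ce.get)

for k in sorted(ce):
    print(f"{k:2d} neurons: CE = {ce[k]:.4f}")
print("selected number of neurons:", best_k)
```

Under this assumption the maximum CE occurs at seven neurons, which matches the map size chosen in the text.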

corresponding maximum and minimum values. The SOM neural network is further designed using NeuralWorks Professional II/Plus (http://www.neuralware.com) to classify the normalized technical indicators into clusters. The SOM consists of sixteen neurons in an input layer, and several neurons arranged in a one-dimensional Kohonen layer. The initial weight vector of each neuron in the Kohonen layer is randomly set, and the total number of learning iterations for the Kohonen layer is set as 104,910 (30 times the total number of the sample data, i.e. 30 × 3497). The learning rate is initially set as 0.06, and is reduced by half at 52,454 and 78,681 learning iterations. The simple neighborhood function, as shown in Eq. (1), is applied to control the relaxation process when updating weights. In addition, the size of the topological neighborhood is initially set as 7, and is reduced to 5 and 3 at 52,454 and 78,681 learning iterations, respectively. Experiments are conducted by setting the total number of neurons in the one-dimensional Kohonen layer between 3 and 10, and the results are summarized in Table 1. Based on maximized clustering efficiency (CE), an SOM neural model with seven neurons in the Kohonen layer is selected to group the normalized technical indicators into seven clusters, as summarized in Table 2.

Table 2
Total numbers of sample data in the seven clusters formed by SOM.

| Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Total |
|---|---|---|---|---|---|---|---|---|
| Total number of sample data | 397 | 669 | 443 | 358 | 534 | 292 | 804 | 3497 |

4.3. Building GP prediction models

The closing prices from March 2, 1996 to September 18, 2009 are first normalized into a range between −1 and 1. The normalized technical indicators from March 1, 1996 to September 17, 2009, along with the normalized closing price of the next day, i.e. from March 2, 1996 to September 18, 2009, are then partitioned into training, testing, and validation sample data groups, based on the proportion of 4:1:1, as shown in Table 3. Notably, the sample data, which consist of normalized technical indicators along with the normalized closing price of the next day, are split into 10 subsets, which is achieved by slicing periods of time to ensure that the training, testing, and validation sample data cover the entire period of research. In this manner, it is believed that the GP models, which are constructed later, can more accurately estimate the relationship between the normalized technical indicators and the closing price of the next day. Table 4 summarizes the distribution of training, testing, and validation sample data in the seven clusters previously formed by SOM. Next, a genetic programming (GP) technique is applied to the training and testing sample data of each cluster in order to establish the estimated mathematical function between the independent input variables (the normalized technical indicators) and the dependent output variable (the normalized closing price in the next day). Here, the GP system Discipulus 4.0 (http://www.rmltech.com), with its default parameter settings, is employed, while the fitness of an individual (program) is evaluated through mean squared error (MSE). For each dataset, a GP algorithm is implemented for 5 runs, and Table 5 summarizes the results. Based on the training and testing MSEs, the 4th, 3rd, 3rd, 2nd, 5th, 2nd, and 4th models from clusters 1 through 7 of Table 5, described as GP_MODEL1 through GP_MODEL7, are selected to predict the normalized closing price of the next day, when given the normalized technical indicators of a certain day that belong to clusters 1 through 7, as formed by SOM, respectively.

Table 3
Dividing TAIEX-FISI sample data into 10 subsets.

| Dataset | Training period | Testing period | Validation period | Dataset size |
|---|---|---|---|---|
| 1 | 1996/03/01–1996/12/24 | 1996/12/26–1997/03/15 | 1997/03/17–1997/05/29 | 360 |
| 2 | 1997/05/30–1998/04/09 | 1998/04/10–1998/06/26 | 1998/06/29–1998/09/11 | 360 |
| 3 | 1998/09/14–1999/08/06 | 1999/08/07–1999/10/27 | 1999/10/28–2000/01/18 | 360 |
| 4 | 2000/01/19–2000/12/05 | 2000/12/06–2001/03/08 | 2001/03/09–2001/06/04 | 360 |
| 5 | 2001/06/05–2002/05/30 | 2002/05/31–2002/08/22 | 2002/08/23–2002/11/18 | 360 |
| 6 | 2002/11/19–2003/11/05 | 2003/11/06–2004/02/06 | 2004/02/09–2004/04/30 | 360 |
| 7 | 2004/05/03–2005/04/20 | 2005/04/21–2005/07/14 | 2005/07/15–2005/10/12 | 360 |
| 8 | 2005/10/13–2006/09/27 | 2006/09/28–2006/12/22 | 2006/12/25–2007/03/28 | 360 |
| 9 | 2007/03/29–2008/03/18 | 2008/03/19–2008/06/12 | 2008/06/13–2008/09/05 | 360 |
| 10 | 2008/09/08–2009/05/15 | 2009/05/18–2009/07/17 | 2009/07/20–2009/09/17 | 257 |
| Total number of sample data | 2330 | 584 | 583 | 3497 |

Table 4
The distribution of training, testing, and validation sample data in the seven clusters formed by SOM.

| Cluster | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Total |
|---|---|---|---|---|---|---|---|---|
| Total number of training data | 282 | 462 | 300 | 258 | 366 | 194 | 468 | 2330 |
| Total number of testing data | 71 | 115 | 75 | 65 | 92 | 49 | 117 | 584 |
| Total number of validation data | 44 | 92 | 68 | 35 | 76 | 49 | 219 | 583 |
| Total number of sample data | 397 | 669 | 443 | 358 | 534 | 292 | 804 | 3497 |

4.4. Measuring the prediction performance

First, the normalized technical indicators of each day lying within the validation period, as shown in Table 3, are fed into the SOM neural model, as constructed in Section 4.2 and clustered as cluster c. The normalized technical indicators then act as independent input variables of the well-trained GP model corresponding to cluster c, i.e. GP_MODELc, as obtained in Section 4.3, in order to acquire the predicted normalized closing price of the next day.
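The prediction step described above (normalize, assign a cluster, apply that cluster's model, map the output back to a price) can be illustrated with a minimal sketch. Several parts are assumptions rather than details taken from the study: the normalization is taken to be a linear min-max mapping onto [−1, 1], the winning Kohonen neuron is taken to be the one with the smallest squared Euclidean distance to the input, and the weight vectors and per-cluster models below are toy placeholders, not the trained SOM or the Discipulus-generated programs. Clusters are indexed from 0 in the code.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def normalize(x: float, lo: float, hi: float) -> float:
    """Assumed min-max mapping of x from [lo, hi] onto [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def denormalize(z: float, lo: float, hi: float) -> float:
    """Inverse of normalize: map z from [-1, 1] back onto [lo, hi]."""
    return (z + 1.0) * (hi - lo) / 2.0 + lo


def winning_neuron(x: Sequence[float], weights: List[Sequence[float]]) -> int:
    """Index of the weight vector closest to x (squared Euclidean distance)."""
    def sq_dist(w: Sequence[float]) -> float:
        return sum((a - b) ** 2 for a, b in zip(x, w))
    return min(range(len(weights)), key=lambda c: sq_dist(weights[c]))


def predict_next_close(
    x: Sequence[float],
    weights: List[Sequence[float]],
    models: Dict[int, Callable[[Sequence[float]], float]],
    close_lo: float,
    close_hi: float,
) -> Tuple[int, float]:
    """Pick cluster c for x, apply the model of cluster c, de-normalize the result."""
    c = winning_neuron(x, weights)
    z = models[c](x)
    return c, denormalize(z, close_lo, close_hi)


# Toy setup (hypothetical): 2 indicators and 2 clusters instead of 16 and 7.
weights = [[-0.5, -0.5], [0.5, 0.5]]
models = {0: lambda x: 0.5 * x[0], 1: lambda x: x[1]}
x = [normalize(75.0, 0.0, 100.0), normalize(30.0, 0.0, 40.0)]
cluster, price = predict_next_close(x, weights, models, 500.0, 1500.0)
print(cluster, price)
```

Keeping the cluster assignment separate from the per-cluster models mirrors the two-stage structure of the procedure: the map decides which model is consulted, and each model only ever sees inputs from its own cluster.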
Table 5
Implementation results of the genetic programming algorithm.

| Cluster | Model No. | Training MSE | Testing MSE | Training R² | Testing R² |
|---|---|---|---|---|---|
| 1 | 1 | 0.000938 | 0.000710 | 0.99195 | 0.99346 |
| 1 | 2 | 0.000822 | 0.000884 | 0.99282 | 0.99158 |
| 1 | 3 | 0.000884 | 0.000789 | 0.99222 | 0.99265 |
| 1 | 4 | 0.000791 | 0.000770 | 0.99318 | 0.99272 |
| 1 | 5 | 0.000702 | 0.000913 | 0.99387 | 0.99144 |
| 2 | 1 | 0.000519 | 0.000658 | 0.99138 | 0.99141 |
| 2 | 2 | 0.000480 | 0.000660 | 0.99382 | 0.99124 |
| 2 | 3 | 0.000499 | 0.000641 | 0.99346 | 0.99152 |
| 2 | 4 | 0.000516 | 0.000691 | 0.99318 | 0.99076 |
| 2 | 5 | 0.000549 | 0.000615 | 0.99291 | 0.99182 |
| 3 | 1 | 0.000552 | 0.000329 | 0.99338 | 0.99667 |
| 3 | 2 | 0.000525 | 0.000398 | 0.99368 | 0.99579 |
| 3 | 3 | 0.000544 | 0.000318 | 0.99356 | 0.99661 |
| 3 | 4 | 0.000540 | 0.000340 | 0.99349 | 0.99653 |
| 3 | 5 | 0.000530 | 0.000421 | 0.99372 | 0.99567 |
| 4 | 1 | 0.000876 | 0.000980 | 0.98008 | 0.98246 |
| 4 | 2 | 0.000833 | 0.000921 | 0.98191 | 0.98358 |
| 4 | 3 | 0.000738 | 0.001092 | 0.98399 | 0.98105 |
| 4 | 4 | 0.000836 | 0.000962 | 0.98198 | 0.98273 |
| 4 | 5 | 0.000706 | 0.001144 | 0.98440 | 0.97988 |
| 5 | 1 | 0.000466 | 0.000463 | 0.99340 | 0.99271 |
| 5 | 2 | 0.000443 | 0.000528 | 0.99351 | 0.99157 |
| 5 | 3 | 0.000425 | 0.000519 | 0.99372 | 0.99199 |
| 5 | 4 | 0.000504 | 0.000463 | 0.99278 | 0.99252 |
| 5 | 5 | 0.000411 | 0.000482 | 0.99394 | 0.99254 |
| 6 | 1 | 0.001290 | 0.000478 | 0.97780 | 0.99026 |
| 6 | 2 | 0.001077 | 0.000464 | 0.98149 | 0.99096 |
| 6 | 3 | 0.001082 | 0.000577 | 0.98094 | 0.98856 |
| 6 | 4 | 0.001166 | 0.000555 | 0.97954 | 0.98866 |
| 6 | 5 | 0.001177 | 0.000543 | 0.97948 | 0.98919 |
| 7 | 1 | 0.000713 | 0.000492 | 0.98847 | 0.99275 |
| 7 | 2 | 0.000711 | 0.000486 | 0.98858 | 0.99261 |
| 7 | 3 | 0.000717 | 0.000487 | 0.98844 | 0.99275 |
| 7 | 4 | 0.000703 | 0.000428 | 0.98860 | 0.99356 |
| 7 | 5 | 0.000706 | 0.000473 | 0.98882 | 0.99312 |

By de-normalizing the predicted normalized closing price, the predicted closing price of the next day can be obtained. Fig. 4 illustrates the predicted and actual values of the closing prices during the period from March 17, 1997 to May 29, 1997. To evaluate the overall performance of the proposed SOM-GP prediction procedure, the root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE) are computed, as shown in Table 6. This table also lists the maximum and minimum absolute percentage errors, denoted by APEmax and APEmin, respectively. According to Table 6, the overall RMSE and MAE are 19.44 and 14.20, respectively. Furthermore, the overall MAPE ($1.44 \times 10^{-2}$) indicates that the absolute percentage of the difference between the actual and predicted closing prices is only 1.44%, on average. The minimum absolute percentage error attains the excellent level of 0.00156%, and the maximum absolute percentage of the differences between the actual and predicted closing prices is a mere 7.32%. Consequently, the proposed SOM-GP prediction procedure is considered an effective method for predicting the TAIEX-FISI for the next day by using 16 technical indicators. In addition, the proposed SOM-GP prediction procedure yields the maximum RMSE (27.37) and MAE (22.04) for the sample data of the 1st validation period, while providing the minimum RMSE (10.25) and MAE (7.29) for the sample data of the 7th validation period. By comparing Fig. 5, which draws the predicted and actual values of the closing prices of the 7th validation period, with Fig. 4, it is found that the TAIEX-FISI closing prices within the 1st validation period fluctuated more frequently than in the 7th validation period. It is believed that such frequent fluctuation renders a prediction more difficult to obtain, and thus results in larger prediction errors. Further observation of the distribution of the predicted and actual closing prices in the 9th period, as shown in Fig. 6, reveals that the TAIEX-FISI closing prices, as compared with Fig. 5, move much more sharply, and the difference between the maximum and minimum closing prices is much larger than that in the 7th validation period. This study considers this the reason for the proposed SOM-GP prediction procedure to produce the maximum MAPE ($2.05 \times 10^{-2}$) for the sample data in the 9th validation period, whereas it provides the minimum MAPE ($7.72 \times 10^{-3}$) for the sample data in the 7th validation period. In conclusion, based on the above analysis, this study believes that the frequent, alternating rise and fall, as well as the range of the daily closing price during a period, will significantly increase prediction difficulty.

5. Conclusions

With the inherent high volatility, complexity, dynamics, and turbulence of stock prices, the prediction of a stock price is a

Fig. 4. Predicted and actual values of the closing prices during the period from March 17, 1997 to May 29, 1997.

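Table 6
Prediction performance of the proposed SOM-GP prediction procedure.

| Period No. | Validation period | RMSE | MAE | MAPE | APEmax | APEmin |
|---|---|---|---|---|---|---|
| 1 | 1997/03/17–1997/05/29 | 27.37 | 22.04 | $1.30 \times 10^{-2}$ | $3.99 \times 10^{-2}$ | $2.79 \times 10^{-4}$ |
| 2 | 1998/06/29–1998/09/11 | 24.70 | 17.99 | $1.52 \times 10^{-2}$ | $6.62 \times 10^{-2}$ | $2.76 \times 10^{-5}$ |
| 3 | 1999/10/28–2000/01/18 | 18.41 | 14.62 | $1.50 \times 10^{-2}$ | $5.28 \times 10^{-2}$ | $8.72 \times 10^{-5}$ |
| 4 | 2001/03/09–2001/06/04 | 13.73 | 10.79 | $1.50 \times 10^{-2}$ | $6.47 \times 10^{-2}$ | $1.56 \times 10^{-5}$ |
| 5 | 2002/08/23–2002/11/18 | 13.48 | 9.86 | $1.51 \times 10^{-2}$ | $7.24 \times 10^{-2}$ | $4.57 \times 10^{-5}$ |
| 6 | 2004/02/09–2004/04/30 | 24.75 | 17.77 | $1.71 \times 10^{-2}$ | $7.32 \times 10^{-2}$ | $1.24 \times 10^{-4}$ |
| 7 | 2005/07/15–2005/10/12 | 10.25 | 7.29 | $7.72 \times 10^{-3}$ | $4.48 \times 10^{-2}$ | $4.04 \times 10^{-4}$ |
| 8 | 2006/12/25–2007/03/28 | 12.84 | 10.10 | $9.86 \times 10^{-3}$ | $4.82 \times 10^{-2}$ | $1.92 \times 10^{-4}$ |
| 9 | 2008/06/13–2008/09/05 | 23.57 | 18.57 | $2.05 \times 10^{-2}$ | $6.74 \times 10^{-2}$ | $2.16 \times 10^{-4}$ |
| 10 | 2009/07/20–2009/09/17 | 15.12 | 12.53 | $1.55 \times 10^{-2}$ | $4.09 \times 10^{-2}$ | $3.24 \times 10^{-4}$ |
| Overall | | 19.44 | 14.20 | $1.44 \times 10^{-2}$ | $7.32 \times 10^{-2}$ | $1.56 \times 10^{-5}$ |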
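The error statistics reported in Table 6 can be computed along the following lines. This is a sketch using the usual textbook definitions of RMSE, MAE, and MAPE, which are assumed to agree with Eqs. (6)–(8); MAPE and the absolute percentage errors are returned as fractions, so that $1.44 \times 10^{-2}$ corresponds to 1.44%. The price series is made up for illustration and is not TAIEX-FISI data.

```python
import math
from typing import Dict, Sequence


def prediction_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE, MAPE and the extreme absolute percentage errors (as fractions)."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    # Absolute percentage error of each day, relative to the actual price.
    ape = [abs(e) / abs(a) for e, a in zip(errors, actual)]
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": sum(ape) / n,
        "APEmax": max(ape),
        "APEmin": min(ape),
    }


# Hypothetical closing prices, for illustration only.
actual = [1000.0, 1010.0, 990.0, 1005.0]
predicted = [1010.0, 1000.0, 1000.0, 1005.0]
stats = prediction_errors(actual, predicted)
for name, value in stats.items():
    print(f"{name}: {value:.6f}")
```

RMSE and MAE are expressed in index points and therefore depend on the price level of each period, whereas MAPE is scale-free, which is why the periods with the largest RMSE and the largest MAPE in Table 6 need not coincide.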

Fig. 5. Predicted and actual values of the closing prices during the period from July 15, 2005 to October 12, 2005.

Fig. 6. Predicted and actual values of the closing prices during the period from June 13, 2008 to September 5, 2008.
challenging task. Fundamental analysis, technical analysis, and traditional time series forecasting, which have their respective merits and limitations, are the three main categories of stock prediction methodologies. In this study, a self-organizing map (SOM) neural network and genetic programming (GP) were utilized to develop an integrated approach, called the SOM-GP procedure, for predicting stock prices. An SOM neural network was applied to split the sample data into several clusters in such a way that the objects within each cluster were highly similar, which aims to facilitate the construction of the approximation functions that describe the implicit mathematical relationship between the technical indicators and the closing prices. In addition, this study introduced the clustering efficiency (CE) index that measures clustering performance, in order that the optimal number of clusters for SOM clustering could be determined. The GP algorithm was then used to construct the prediction models for the sample data of the clusters, as previously formed through SOM; thus, the closing price of the next day can be predicted based on the technical indicators of a certain day. The feasibility and effectiveness of the proposed hybrid SOM-GP prediction procedure were verified through conducting experimental predictions of the finance and insurance sub-index of TAIEX over the period from January 4, 1996 to September 18, 2009. The obtained results delivered the overall root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), as 19.44, 14.20, and $1.44 \times 10^{-2}$, respectively. Specifically, the MAPE of $1.44 \times 10^{-2}$ indicates that the absolute percentage of the differences between the actual and predicted closing prices was only 1.44%, on average. The minimum absolute percentage error (APEmin) can attain the excellent level of 0.00156%, and the maximum absolute percentage (APEmax) of the differences between the actual and predicted closing prices was a mere 7.32%. Based on the above information, the proposed SOM-GP prediction procedure can be considered as a feasible and effective tool for stock price prediction. Through further observations of the distribution of actual closing prices, and their corresponding prediction performance indices for the different periods, this study concluded that the frequent, alternating rise and fall, as well as the range of the daily closing prices during a period, can significantly increase the difficulty of prediction. Further research directions suggested from this study might include using feature selection techniques to choose the most important technical indicators as the input variables of the mathematical prediction models, and optimizing the parameters of GP through other soft computing methods, e.g. particle swarm optimization or ant colony optimization.

Appendix A. Descriptions and definitions of technical indicators used in this study

Notations:

i: the day i
HP_i: the highest price of day i
LP_i: the lowest price of day i
OP_i: the opening price of day i
CP_i: the closing price of day i
TV_i: the trade volume of day i

1. MA_10: 10-day moving average.
The 10-day moving average is the mean price of a security over the most recent 10 days, and is calculated by:

$$\mathrm{MA\_10}_i = \frac{\sum_{j=i-9}^{i} CP_j}{10}. \tag{9}$$

2. BIAS_20: 20-day bias.
The 20-day bias is the deviation between the closing price and the 20-day moving average (MA_20), and is calculated by:

$$\mathrm{BIAS\_20}_i = \frac{CP_i - \mathrm{MA\_20}_i}{\mathrm{MA\_20}_i}. \tag{10}$$

3. MACD: moving average convergence/divergence.
The moving average convergence/divergence is a momentum indicator that shows the relationship between two moving averages. First, define the demand index (DI) as:

$$\mathrm{DI}_i = (HP_i + LP_i + 2 \times CP_i)/4. \tag{11}$$

Next, define the 12-day exponential moving average (EMA_12) and 26-day exponential moving average (EMA_26) as:

$$\mathrm{EMA\_12}_i = \frac{11}{13}\,\mathrm{EMA\_12}_{i-1} + \frac{2}{13}\,\mathrm{DI}_i \tag{12}$$

and

$$\mathrm{EMA\_26}_i = \frac{25}{27}\,\mathrm{EMA\_26}_{i-1} + \frac{2}{27}\,\mathrm{DI}_i, \tag{13}$$

respectively. Then, the difference between EMA_12 and EMA_26 can be calculated by:

$$\mathrm{DIF}_i = \mathrm{EMA\_12}_i - \mathrm{EMA\_26}_i. \tag{14}$$

Hence, the moving average convergence/divergence can be defined by:

$$\mathrm{MACD}_i = \frac{8}{10}\,\mathrm{MACD}_{i-1} + \frac{2}{10}\,\mathrm{DIF}_i. \tag{15}$$

4. K_9: 9-day stochastic indicator K.
The 9-day stochastic indicator K is defined as:

$$\mathrm{K\_9}_i = \frac{2}{3}\,\mathrm{K\_9}_{i-1} + \frac{1}{3} \times \frac{CP_i - \mathrm{LP\_9}_i}{\mathrm{HP\_9}_i - \mathrm{LP\_9}_i} \times 100, \tag{16}$$

where $\mathrm{LP\_9}_i$ and $\mathrm{HP\_9}_i$ are the lowest and highest prices of the previous 9 days, i.e. days $i, i-1, \ldots, i-7$ and $i-8$, respectively.

5. D_9: 9-day stochastic indicator D.
The 9-day stochastic indicator D is defined as:

$$\mathrm{D\_9}_i = \frac{2}{3}\,\mathrm{D\_9}_{i-1} + \frac{1}{3}\,\mathrm{K\_9}_i, \tag{17}$$

where $\mathrm{K\_9}_i$ is the 9-day stochastic indicator K of day i, as previously defined.

6. WMS%R_9: 9-day Williams overbought/oversold index.
The 9-day Williams overbought/oversold index is a momentum indicator that measures overbought and oversold levels, and is calculated by:

$$\mathrm{WMS\%R\_9}_i = \frac{\mathrm{HP\_9}_i - CP_i}{\mathrm{HP\_9}_i - \mathrm{LP\_9}_i}, \tag{18}$$

where $\mathrm{LP\_9}_i$ and $\mathrm{HP\_9}_i$ are the lowest and highest prices of the previous 9 days, i.e. days $i, i-1, \ldots, i-7$ and $i-8$, respectively.

7. +DI_14: 14-day plus directional indicator.
First, define plus directional movement (+DM) and minus directional movement (−DM) as:

$$\mathrm{+DM}_i = HP_i - HP_{i-1} \tag{19}$$

and

$$\mathrm{-DM}_i = LP_{i-1} - LP_i, \tag{20}$$

respectively. The plus true directional movement (+TDM) can be calculated by:

$$\mathrm{+TDM}_i = \begin{cases} \mathrm{+DM}_i, & \text{if } \mathrm{+DM}_i > \mathrm{-DM}_i \text{ and } \mathrm{+DM}_i > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{21}$$

Similarly, the minus true directional movement (−TDM) can be calculated by:
$$\mathrm{-TDM}_i = \begin{cases} \mathrm{-DM}_i, & \text{if } \mathrm{+DM}_i < \mathrm{-DM}_i \text{ and } \mathrm{-DM}_i > 0, \\ 0, & \text{otherwise.} \end{cases} \tag{22}$$

Hence, the 14-day plus directional movement (+DM_14) can be calculated by:

$$\mathrm{+DM\_14}_i = \frac{13}{14}\,(\mathrm{+DM\_14}_{i-1}) + \frac{1}{14}\,(\mathrm{+TDM}_i). \tag{23}$$

Similarly, the 14-day minus directional movement (−DM_14) can be calculated by:

$$\mathrm{-DM\_14}_i = \frac{13}{14}\,(\mathrm{-DM\_14}_{i-1}) + \frac{1}{14}\,(\mathrm{-TDM}_i). \tag{24}$$

Next, define the true range (TR) as:

$$\mathrm{TR}_i = \max\{HP_i - LP_i,\; |HP_i - CP_{i-1}|,\; |LP_i - CP_{i-1}|\}. \tag{25}$$

The 14-day true range (TR_14) can be calculated by:

$$\mathrm{TR\_14}_i = \frac{13}{14}\,\mathrm{TR\_14}_{i-1} + \frac{1}{14}\,\mathrm{TR}_i. \tag{26}$$

Therefore, the 14-day plus directional indicator can be defined as:

$$\mathrm{+DI\_14}_i = \frac{\mathrm{+DM\_14}_i}{\mathrm{TR\_14}_i}. \tag{27}$$

8. −DI_14: 14-day minus directional indicator.
The 14-day minus directional indicator is defined as:

$$\mathrm{-DI\_14}_i = \frac{\mathrm{-DM\_14}_i}{\mathrm{TR\_14}_i}, \tag{28}$$

where $\mathrm{-DM\_14}_i$ and $\mathrm{TR\_14}_i$ are the 14-day minus directional movement and 14-day true range of day i, respectively, as previously defined.

9. MTM_10: 10-day momentum.
The 10-day momentum measures the price changes of a security during a period of 10 days, and is calculated by:

$$\mathrm{MTM\_10}_i = CP_i - CP_{i-10}. \tag{29}$$

10. ROC_10: 10-day rate of change.
The 10-day rate of change measures the percent changes of the current price relative to the price of 10 days ago, and is calculated by:

$$\mathrm{ROC\_10}_i = \frac{CP_i - CP_{i-10}}{CP_{i-10}} \times 100. \tag{30}$$

11. RSI_5: 5-day relative strength index.
The relative strength index is a momentum oscillator that compares the magnitude of recent gains to the magnitude of recent losses. First, define the gain of day i as:

$$G_i = \begin{cases} CP_i - CP_{i-1}, & \text{if } CP_i > CP_{i-1}, \\ 0, & \text{otherwise.} \end{cases} \tag{31}$$

Similarly, the loss of day i is calculated by:

$$L_i = \begin{cases} CP_{i-1} - CP_i, & \text{if } CP_i < CP_{i-1}, \\ 0, & \text{otherwise.} \end{cases} \tag{32}$$

Next, the 5-day average gain (AG_5) and 5-day average loss (AL_5) can be calculated by:

$$\mathrm{AG\_5}_i = \frac{4}{5}\,\mathrm{AG\_5}_{i-1} + \frac{1}{5}\,G_i \tag{33}$$

and

$$\mathrm{AL\_5}_i = \frac{4}{5}\,\mathrm{AL\_5}_{i-1} + \frac{1}{5}\,L_i, \tag{34}$$

respectively. Hence, the 5-day relative strength index can be defined by:

$$\mathrm{RSI\_5}_i = \frac{\mathrm{AG\_5}_i}{\mathrm{AG\_5}_i + \mathrm{AL\_5}_i} \times 100. \tag{35}$$

12. CCI_24: 24-day commodity channel index.
The commodity channel index is used to identify cyclical turns in commodities. First, define the typical price (TP) as:

$$\mathrm{TP}_i = \frac{HP_i + LP_i + CP_i}{3}. \tag{36}$$

Next, calculate the 24-day simple moving average of the typical price (SMATP_24) by:

$$\mathrm{SMATP\_24}_i = \frac{\sum_{j=i-23}^{i} \mathrm{TP}_j}{24}. \tag{37}$$

Then, the 24-day mean deviation (MD_24) can be calculated by:

$$\mathrm{MD\_24}_i = \frac{\sum_{j=i-23}^{i} |\mathrm{TP}_j - \mathrm{SMATP\_24}_i|}{24}. \tag{38}$$

Hence, the 24-day commodity channel index can be defined as:

$$\mathrm{CCI\_24}_i = \frac{\mathrm{TP}_i - \mathrm{SMATP\_24}_i}{0.015 \times \mathrm{MD\_24}_i}. \tag{39}$$

13. AR_26: 26-day buying/selling momentum indicator.
The 26-day buying/selling momentum indicator is defined as:

$$\mathrm{AR\_26}_i = \frac{\sum_{j=i-25}^{i} (HP_j - OP_j)}{\sum_{j=i-25}^{i} (OP_j - LP_j)}. \tag{40}$$

14. BR_26: 26-day buying/selling willingness indicator.
The 26-day buying/selling willingness indicator is defined as:

$$\mathrm{BR\_26}_i = \frac{\sum_{j=i-25}^{i} (HP_j - CP_{j-1})}{\sum_{j=i-25}^{i} (CP_{j-1} - LP_j)}. \tag{41}$$

15. VR_26: 26-day volume ratio.
The 26-day volume ratio is defined by:

$$\mathrm{VR\_26}_i = \frac{\mathrm{TVU\_26}_i + \mathrm{TVF\_26}_i/2}{\mathrm{TVD\_26}_i + \mathrm{TVF\_26}_i/2} \times 100\%, \tag{42}$$

where $\mathrm{TVU\_26}_i$, $\mathrm{TVD\_26}_i$, and $\mathrm{TVF\_26}_i$ represent the total trade volumes of stock prices rising, falling, and holding, respectively, from the previous 26 days, i.e. days $i, i-1, \ldots, i-24$ and $i-25$.

16. PSY_13: 13-day psychological line.
The psychological line is a volatility indicator based on the number of time intervals that the market was up during the preceding period. The 13-day psychological line is defined by:

$$\mathrm{PSY\_13}_i = \frac{\mathrm{TDU\_13}_i}{13} \times 100\%, \tag{43}$$

where $\mathrm{TDU\_13}_i$ is the total number of days regarding stock price rises of the previous 13 days, i.e. days $i, i-1, \ldots, i-11$ and $i-12$.
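A few of the simpler indicators in this appendix can be sketched directly from their definitions. The code below covers the window-based formulas for the moving average, bias, momentum, rate of change, and psychological line; it is illustrative only and makes two assumptions that go beyond the text: days are indexed from 0 in a plain list of closing prices, and a "rise" for the psychological line means a close strictly above the previous day's close. The recursive indicators (e.g. MACD, K_9, RSI_5) are omitted because their initial values are not specified here.

```python
from typing import Sequence


def _check(i: int, lookback: int) -> None:
    """Reject days that do not have `lookback` earlier observations."""
    if i - lookback < 0:
        raise ValueError("not enough history for this indicator")


def moving_average(cp: Sequence[float], i: int, n: int) -> float:
    """Mean closing price over days i-n+1..i (Eq. (9) with n = 10)."""
    _check(i, n - 1)
    return sum(cp[i - n + 1 : i + 1]) / n


def bias(cp: Sequence[float], i: int, n: int) -> float:
    """Relative deviation of the close from its n-day average (Eq. (10) with n = 20)."""
    ma = moving_average(cp, i, n)
    return (cp[i] - ma) / ma


def momentum(cp: Sequence[float], i: int, n: int) -> float:
    """Price change over n days (Eq. (29) with n = 10)."""
    _check(i, n)
    return cp[i] - cp[i - n]


def rate_of_change(cp: Sequence[float], i: int, n: int) -> float:
    """Percent change relative to the close n days ago (Eq. (30) with n = 10)."""
    _check(i, n)
    return (cp[i] - cp[i - n]) / cp[i - n] * 100.0


def psychological_line(cp: Sequence[float], i: int, n: int) -> float:
    """Percentage of the last n days with a rising close (Eq. (43) with n = 13)."""
    _check(i, n)
    rises = sum(1 for j in range(i - n + 1, i + 1) if cp[j] > cp[j - 1])
    return rises / n * 100.0


# Hypothetical, steadily rising closing prices: 100, 101, ..., 124.
cp = [100.0 + k for k in range(25)]
print(moving_average(cp, 24, 10), momentum(cp, 24, 10), psychological_line(cp, 24, 13))
```

The history check reflects the limitation noted in Section 4.1: an n-day indicator is undefined for the first trading days of the sample, so those days have to be dropped before clustering.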
14036 C.-M. Hsu / Expert Systems with Applications 38 (2011) 14026–14036

Box, G. E. P., & Jenkins, G. M. (1970). Time series analysis, forecasting, and control. San Koza, J. R., Keane, M. A., Streeter, M. J., Mydlowec, W., Yu, J., & Lanza, G. (2005).
Francisco: Holden-Day. Genetic programming IV: Routine human-competitive machine intelligence. New
Chang, P.-C., & Liu, C.-H. (2008). A TSK type fuzzy rule based system for stock price York: Springer.
prediction. Expert Systems with Applications, 34(1), 135–144. Koza, J. R., Streeter, M. J., & Keane, M. A. (2008). Routine high-return human-
Ciglarič, I., & Kidrič, A. (2006). Computer-aided derivation of the optimal competitive automated problem-solving by means of genetic programming.
mathematical models to study gear-pair dynamic by using genetic Information Sciences, 178(23), 4434–4452.
programming. Structural and Multidisciplinary Optimization, 32(2), 153–160. Lai, R. K., Fan, C.-Y., Huang, W.-H., & Chang, P.-C. (2009). Evolving and clustering
Etemadi, H., Rostamy, A. A. A., & Dehkordi, H. F. (2009). A genetic programming fuzzy decision tree for financial time series data forecasting. Expert Systems with
model for bankruptcy prediction: empirical evidence from Iran. Expert Systems Applications, 36(2), 3761–3773.
with Applications, 36(2), 3199–3207. Lee, M.-C. (2009). Using support vector machine with a hybrid feature selection
Fausett, L. (1994). Fundamentals of neural networks: Architectures, algorithms, and method to the stock trend prediction. Expert Systems with Applications, 36(8),
applications. New Jersey: Prentice-Hall. 10896–10904.
Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: Liang, X., Zhang, H., Xao, J., & Chen, Y. (2009). Improving option price forecasts with
University of Michigan Press. neural networks and support vector regressions. Neurocomputing, 72(13–15),
Huang, C.-L., & Tsai, C.-Y. (2009). A hybrid SOFM-SVR with a filter-based feature 3055–3065.
selection for stock market forecasting. Expert Systems with Applications, 36(2), Lin, G.-F., & Wu, M.-C. (2009). A hybrid neural network model for typhoon-rainfall
1529–1539. forecasting. Journal of Hydrology, 375(3–4), 450–458.
Hwang, T.-M., Oh, H., Choung, Y.-K., Oh, S., Jeon, M., Kim, J. H., et al. (2009). Pai, P.-F., & Lin, C.-S. (2005). A hybrid ARIMA and support vector machines model in
Prediction of membrane fouling in the pilot-scale microfiltration system using stock price forecasting. Omega, 33(6), 497–505.
genetic programming. Desalination, 247(1–3), 285–294. Szczurowska, I., Kuniszyk-Jozkowiak, W., & Smolka, E. (2009). Speech nonfluency
Ince, H., & Trafalis, T. B. (2008). Short term forecasting with support vector detection using Kohonen networks. Neural Computing and Applications, 18(7),
machines and application to stock price prediction. International Journal of 677–687.
General Systems, 37(6), 677–687. Thomsett, M. C. (1998). Mastering fundamental analysis. Chicago: Dearborn Financial
Kim, K.-J. (2003). Financial time series forecasting using support vector machines. Publishing.
Neurocomputing, 55(1-2), 307–319. Thomsett, M. C. (1999). Mastering technical analysis. Chicago: Dearborn Financial
Kim, K.-J., & Han, I. (2000). Genetic algorithms approach to feature discretization in Publishing.
artificial neural networks for the prediction of stock price index. Expert Systems Tsang, P. M., Kwok, P., Choy, S. O., Kwan, R., Ng, S. C., Mak, J., et al. (2007). Design and
with Applications, 19(2), 125–132. implementation of NN5 for Hong Kong stock price forecasting. Engineering
Kim, K.-J., & Lee, W. B. (2004). Stock market prediction using artificial neural Applications of Artificial Intelligence, 20(4), 453–461.
networks with optimal feature transformation. Neural Computing and Yu, L., Chen, H., Wang, S., & Lai, K. K. (2009). Evolving least squares support vector
Applications, 13(3), 255–260. machines for stock market trend mining. IEEE Transactions on Evolutionary
Kohonen, T. (1995). Self-organizing maps. Berlin: Springer-Verlag. Computation, 13(1), 87–102.
Kohonen, T. (1989). Self-organization and associative memory. Berlin: Springer- Zhang, J., An, L., Tang, T., & Hong, Y. (2009). Visual health subject directory analysis
Verlag. based on users’ traversal activities. Journal of the American Society for
Koza, J. R. (1992). Genetic programming: On the programming of computers by means Information Science and Technology, 60(10), 1977–1994.
of natural selection. Cambridge, MA: MIT Press.

A Limited T,: Memory Algorithm For Bound Constrained T, T
No ratings yet
A Limited T,: Memory Algorithm For Bound Constrained T, T
19 pages
Discovering Intra Day Price Patterns Using Som
No ratings yet
Discovering Intra Day Price Patterns Using Som
7 pages
1 BEST Financial Management For Technology Start-Ups - Alnoor Bhimani
No ratings yet
1 BEST Financial Management For Technology Start-Ups - Alnoor Bhimani
339 pages
Acstv10n5 20
No ratings yet
Acstv10n5 20
12 pages
Variation of Velocity and Acceleration in Suction and Delivery Pipes Due To Acceleration of Piston
100% (1)
Variation of Velocity and Acceleration in Suction and Delivery Pipes Due To Acceleration of Piston
9 pages
BOA Spot Tool Application Guide v1.0
No ratings yet
BOA Spot Tool Application Guide v1.0
48 pages
Software Midterm
No ratings yet
Software Midterm
10 pages
Analysis and Linear Algebra For Finance Part I
No ratings yet
Analysis and Linear Algebra For Finance Part I
127 pages
Lab 3 Matlab
No ratings yet
Lab 3 Matlab
19 pages
Control Proporcional
No ratings yet
Control Proporcional
5 pages
Day 7 - Alligation and Mixture
100% (2)
Day 7 - Alligation and Mixture
55 pages
Computation of Turbulent Buoyant Ows in Enclosures With Low-Reynolds-Number K-X Models
No ratings yet
Computation of Turbulent Buoyant Ows in Enclosures With Low-Reynolds-Number K-X Models
13 pages
Similar Triangles: Chapter - 8
No ratings yet
Similar Triangles: Chapter - 8
25 pages
Converse, Inverse, Contrapositive, and Biconditional: Welcome, Grade 8
No ratings yet
Converse, Inverse, Contrapositive, and Biconditional: Welcome, Grade 8
20 pages
Fundamentals of Computer Programming: Arrays (CLO3)
No ratings yet
Fundamentals of Computer Programming: Arrays (CLO3)
17 pages
9 - Maths - L-3-Coordinate Geometry WS-1
No ratings yet
9 - Maths - L-3-Coordinate Geometry WS-1
6 pages
Michaelis Manten Kinetics
No ratings yet
Michaelis Manten Kinetics
8 pages
Cs 101
No ratings yet
Cs 101
29 pages
Review of SIR Calculations For Distance Protection
No ratings yet
Review of SIR Calculations For Distance Protection
7 pages
Worksheet-1 Trigonometry
No ratings yet
Worksheet-1 Trigonometry
3 pages
Regression Analysis With Scilab
No ratings yet
Regression Analysis With Scilab
57 pages
Ordered Pair:-An Ordered Pair Consist of Two Elements in A Fixed Order
No ratings yet
Ordered Pair:-An Ordered Pair Consist of Two Elements in A Fixed Order
19 pages
Binary Arithmetic: Example of Binary Addition
No ratings yet
Binary Arithmetic: Example of Binary Addition
22 pages
Calculus Strauss 5th Edition Solution Manual
0% (9)
Calculus Strauss 5th Edition Solution Manual
8 pages
CANDIDATE-ELIMINATION Learning Algorithm
0% (1)
CANDIDATE-ELIMINATION Learning Algorithm
3 pages
A Review On Cartans Structure Equations For Certa
No ratings yet
A Review On Cartans Structure Equations For Certa
7 pages
Excercise 13.1
No ratings yet
Excercise 13.1
16 pages
LibreOffice Calc Guide 5
No ratings yet
LibreOffice Calc Guide 5
20 pages
3238-Article Text-5879-1-10-20180104
No ratings yet
3238-Article Text-5879-1-10-20180104
140 pages
Test Review
No ratings yet
Test Review
8 pages
7TH Semester Syllabus
No ratings yet
7TH Semester Syllabus
9 pages
Fuzzy Rule Base and Approximate Reasoning
No ratings yet
Fuzzy Rule Base and Approximate Reasoning
31 pages
Ship Hull Inspection PDF
No ratings yet
Ship Hull Inspection PDF
7 pages


layer is fully connected to a two-dimensional Kohonen layer, and

Fig. 3. Proposed hybrid SOM-GP prediction procedure.

Number of neurons in the Kohonen layer 3 4 5 6 7 8 9 10

Cluster 1 2 3 4 5 6 7 Total Dataset Training Testing Validation Dataset

Period No. Validation period RMSE MAE MAPE APEmax APEmin

13 1 Next, calculate the 24-day simple moving average of the typ-
