
Expert Systems With Applications 97 (2018) 60–69

A novel data-driven stock price trend prediction system

Jing Zhang a,∗, Shicheng Cui a, Yan Xu a, Qianmu Li a, Tao Li b

a School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei Street, Nanjing 210094, China
b School of Computer Science, Florida International University, 11200 SW 8th Street, Miami, FL 33199, USA

∗ Corresponding author. E-mail addresses: [email protected] (J. Zhang), [email protected] (Q. Li), taoli@cs.fiu.edu (T. Li).

The source code of the system is available at https://sourceforge.net/p/xuanwu/svn/.

https://doi.org/10.1016/j.eswa.2017.12.026
Article history: Received 8 September 2017; Revised 12 December 2017; Accepted 13 December 2017; Available online 13 December 2017.

Keywords: Feature selection; Morphological pattern recognition; Random forest; Stock price prediction

Abstract

This paper proposes a novel stock price trend prediction system that can predict both stock price movement and its interval of growth (or decline) rate within predefined prediction durations. It utilizes an unsupervised heuristic algorithm to cut the raw transaction data of each stock into multiple clips of a predefined fixed length and classifies them into four main classes (Up, Down, Flat, and Unknown) according to the shapes of their close prices. The clips in Up and Down can be further classified into different levels reflecting the extents of their growth (or decline) rates with respect to both close price and relative return rate. The features of clips include their prices and technical indices. The prediction models are trained from these clips by a combination of random forests, imbalance learning and feature selection. Evaluations on seven-year Shenzhen Growth Enterprise Market (China) transaction data show that the proposed system makes effective predictions, is robust to market volatility, and outperforms some existing methods in terms of accuracy and return per trade.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Stock price trend prediction is a classic and interesting topic that has attracted many researchers and participants in multiple disciplines such as economics, financial engineering, statistics, operations research, and machine learning. Although much effort has been devoted during the past several decades (Abarbanell & Bernard, 1992; Adam, Marcet, & Nicolini, 2016; Adebiyi, Adewumi, & Ayo, 2014; Blume, Easley, & O’hara, 1994; Göçken, Özçalıcı, Boru, & Dosdoğru, 2016), an accurate forecast of the stock price, or even of its movements, is still not easy to achieve, though some advanced machine learning techniques have been utilized. For instance, Kim (2003) used support vector machines to predict the direction of daily stock price movements in Korea, obtaining a hit rate of 56%. Schumaker and Chen (2009) included text mining techniques in stock price forecasting, achieving a hit rate of 57%. Tsai and Wang (2009) combined decision trees and neural networks to make predictions for the Taiwan stock market; the accuracy of the hybrid model was around 70%. However, their test data sets were relatively small, including only dozens of stocks. According to a recent empirical study (Gerlein, McGinnity, Belatreche, & Coleman, 2016), the prediction accuracies of several machine learning models (such as C4.5, K∗, logistic model tree, etc.) are in the range of 48% ∼ 54%.

Traditional technical analysts have developed many indices and sequential analytical methods that may reflect the trends in the movements of the stock price. However, technical analysis contradicts the efficient-market hypothesis, and no generalised inferences can be made regarding its accuracy. For example, the efficient-market hypothesis states that as long as the market is weak-form efficient, the price of a stock follows the random walk model (Fama, 1995) and cannot be predicted by analyzing prices from the past. Meanwhile, the prices are affected by many macro-economical factors, fundamental factors of companies and the involvement of public investors. Therefore, one criticism of technical analysis is that it only considers the transactional data of stocks and completely ignores the fundamental factors of companies (Nassirtoussi, Aghabozorgi, Wah, & Ngo, 2014; Patel, Shah, Thakkar, & Kotecha, 2015), which might be helpful if the market is in weak-form efficiency.

The fundamental factors of a company cover many aspects such as basic financial status, marketing and development strategies, political events, general economic conditions, commodity price indices, interest rate changes, movements of other stock markets, expectations and psychology of investors, and so on. Comprehensively figuring out the impact of these compound factors on the movement of the stock price is obviously beyond the capability of human analysts. Researchers have therefore begun to develop text-mining based methods that can automatically analyze some of these fundamental factors (Nassirtoussi et al., 2014).


For example, Schumaker and Chen (2009) extracted information from breaking financial news to increase the accuracy of prediction. Bollen, Mao, and Zeng (2011) analyzed the mood of investors from Twitter to reveal the sentiments of investors toward some stocks. Ruiz, Hristidis, Castillo, Gionis, and Jaimes (2012) analyzed the correlations of financial time series with micro-blogging activities. Si et al. (2013) proposed a technique to leverage topic-based sentiments from Twitter to facilitate the prediction of the stock market. Even up-to-date deep learning techniques have been introduced to conduct event-driven stock market prediction, where events are extracted from news (Ding, Zhang, Liu, & Duan, 2015).

However, automatic fundamental factor analysis has some weaknesses. First, even though the messages or reports are released by the companies, public media or some third-party institutes, it still cannot be guaranteed that there is no misleading information. Second, it is not very clear how strong the correlation is between the released information and the stock price movement. Third, when the market is in semi-strong-form or strong-form efficiency, fundamental factor analysis cannot even bring excess returns (Timmermann & Granger, 2004).

Fortunately, in today’s big data age, the above issues could be bypassed, as a new train of thought, saying ”let the data speak for themselves”, has been proposed and drawn more attention. Unlike the information obtained from newspapers, micro-blogging and Twitter, the everyday transaction data taking place in trade systems are absolutely realistic. The rapid development of machine learning provides a lot of new opportunities to utilize these transaction data to predict the trend of the stock price movement. In fact, applying machine learning to stock prediction has been studied for over thirty years. The early studies in the 1990s mainly focused on using neural networks to make predictions (Schöneburg, 1990; Zhang, Patuwo, & Hu, 1998), which partially refuted the validity of the efficient market hypothesis (Lawrence, 1997). For example, Tsibouris and Zeidenberg (1995) utilized neural networks to predict stock prices based only on past stock prices. The performance of these early methods usually was not good because of the size limitation of the neural networks. To address this issue, some recent studies resort to fusion or combination of models (Hadavandi, Shavandi, & Ghanbari, 2010; Tsai & Hsiao, 2010) and ensemble learning (Ballings, Van den Poel, Hespeels, & Gryp, 2015; Barak, Arjmand, & Ortobelli, 2017; Tsai, Lin, Yen, & Chen, 2011). All the above studies have a common weak point: their practical availability is still questionable. In these studies, a small amount of carefully selected and labeled stock data were used to train and test models. Since the data do not cover all stocks and their movements in a stock market, the generalisation capabilities of the models are reduced in real applications.

A real stock market carries out a huge amount of transactions every day. We cannot expect that a real-world computer-aided decision system heavily relies on humans selecting and labeling the data used for model training. Unsupervised pattern recognition becomes more and more important in today’s big data age (Wu, Zhu, Wu, & Ding, 2014). If the problem of automatic data preprocessing cannot be solved, the system will hardly be pushed into real usage, even if the learning algorithms inside are advanced. In this paper, we propose a novel data-driven stock price trend prediction system Xuanwu.¹ The contribution of Xuanwu is three-fold: (1) it introduces unsupervised pattern recognition methods to generate training samples from raw transaction data without any human intervention; (2) it is a system for real usage, in which multiple learning models are trained to meet the prediction goals derived from actual user requirements, and its application interface is of maximum availability, being suitable for any stock and any prediction duration; (3) it provides a simple and easy-to-test framework in which different supervised learning models and feature selection methods can be easily integrated. Experimental results show that the proposed system outperforms some state-of-the-art methods in stock movement prediction even though the models of the compared methods are trained with carefully human-labeled samples.

The remainder of the paper is organized as follows. Section 2 describes the requirements from the aspect of users. Section 3 illustrates the architecture of the proposed system. Section 4 describes our unsupervised method to generate training samples. Section 5 presents our learning method in detail. Section 6 addresses experimental results and discussions on these results. Section 7 concludes the paper and points out some future work.

¹ Xuanwu (Black Tortoise in English) is one of the Four Symbols of the Chinese constellations, usually depicted as a tortoise entwined together with a snake. The creature was thought to have spiritual power to predict the future.

2. Requirements

Xuanwu follows the assumption that the actual investment activities carried out based on the prediction results of the system do not have a far-reaching impact on the movements of the stock prices in the future. Therefore, it is especially suitable for small startup investment companies that collect money from a small population and return the profits in fixed contract periods, whose investment volumes usually do not cause obvious market fluctuation. Otherwise, the prediction will be inaccurate. Moreover, these companies are unlikely to trade a stock very frequently (e.g., daily), nor do they hold a stock for a long time (e.g., more than three months) without making a deal. We outline the key points of user requirements in this section.

2.1. Prediction granularity

Nowadays, stock trades can take place at very high frequency when the market is open. The prediction granularity can be various, such as second, minute, hour, day and even a fixed investment period. Xuanwu chooses the trade day as the prediction granularity. That is, it predicts the trend of a stock in a predefined period measured in trade days. Short-term stock prediction is also interesting (Lin, Yang, & Song, 2009) but not suitable for startup investment companies because of the constraints in capital volumes and transaction costs. The standard prediction durations of Xuanwu (refer to Section 4.1) are 10, 15, 20, 30, 40, 50, and 60 trade days, which span two weeks to three months.

2.2. Automatic pattern discovery

In the era of big data, the continuous growth of generated data requires that the learning models update accordingly within short productive periods (Chen & Zhang, 2014; Sakurai, Matsubara, & Faloutsos, 2015). Obviously, it is no longer possible that training samples are still selected and labeled by humans. Xuanwu aims to get through all machine learning processes, from generating training samples from the raw transaction data to building the prediction models, without any human intervention. All that users need to do is to prepare a copy of the original transaction data and then click to start the learning process. All patterns that we are interested in are extracted from every stock in the market. Then, the pieces of interesting patterns are transformed into training samples.

2.3. Subdivision of classes

Since Xuanwu aims to predict the trend of a stock price movement by the end of a predefined period, it defines four main classes based on the shapes of the close prices of a stock, i.e., the price will (1) rise up (class Up), (2) go down (class Down), (3) stay approximately the same (class Flat), or (4) vibrate with large amplitudes (class Unknown). Besides these four main classes, for class Up (Down), we are also interested in the extent of the growth (decline). Thus, for class Up (Down), it defines two sub-classes, i.e., the growth (decline) rate is (1) within the range [10%, 30%] (class UA1 (DA1)) or (2) greater than 30% (class UA2 (DA2)). In addition to the changes of close prices, we are also interested in the relative return, which is the return achieved by an asset over a specific time period contrasted with a benchmark. For a period of n days, the log relative return can be calculated as follows:

r_t = \sum_{i=1}^{n} \big( \ln(1 + f_i) - \ln(1 + b_i) \big),   (1)

where f_i and b_i are the asset return and benchmark return on the i-th day, respectively. According to the relative return rate, for class Up (Down), it defines three sub-classes, i.e., the increase (decrease) of the relative return rate is (1) less than 10% (class UR1 (DR1)), (2) within the range [10%, 20%] (class UR2 (DR2)), or (3) greater than 20% (class UR3 (DR3)). According to these two kinds of measures, Xuanwu runs in two prediction modes: AbsoluteMode and RelativeMode.
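To make Eq. (1) concrete, here is a minimal sketch of the computation; the class and method names are ours, not part of Xuanwu:

// Sketch of Eq. (1): n-day log relative return from daily asset returns f[i]
// and daily benchmark returns b[i]. Class and method names are assumptions.
public final class RelativeReturn {

    public static double logRelativeReturn(double[] f, double[] b) {
        double rt = 0.0;
        for (int i = 0; i < f.length; i++) {
            // ln(1 + f_i) - ln(1 + b_i), accumulated over the n days
            rt += Math.log(1.0 + f[i]) - Math.log(1.0 + b[i]);
        }
        return rt;
    }

    public static void main(String[] args) {
        double[] asset     = {0.02, -0.01, 0.03};  // daily asset returns
        double[] benchmark = {0.01,  0.00, 0.01};  // daily benchmark returns
        System.out.printf("r_t = %.4f%n", logRelativeReturn(asset, benchmark));
    }
}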
In the RelativeMode, the patterns (shapes) of stocks are recognized and the classes are assigned based on the relative returns rather than the close prices, because investors sometimes are more interested in whether they can beat the ”average”, for example, when the investments are made in a typical bull (or bear) market. In a bull (or bear) market, most predictions are likely to be Up (or Down) if the close prices are used.

2.4. Prediction interface

The system provides an easy-to-use prediction API as follows:

XwResult predict(stockId, duration, modelName, mode),

where XwResult is an object holding the prediction results, stockId is an identifier for a stock, duration is a continuous sequence of dates (i.e., prediction duration), modelName specifies a learning model, and mode specifies a prediction mode (AbsoluteMode or RelativeMode).

When making a prediction, we must specify a continuous sequence of dates (duration) which serves as a time series for prediction. For example, when we input the date sequence from Feb. 1, 2017 to Feb. 10, 2017, the price trend by the end of Feb. 15, 2017 will be returned if parameter modelName is not specified. There are several models in the system (refer to Section 4.1). If we do not explicitly specify one of them, the most suitable model will be automatically chosen for prediction. The prediction results are stored in an object XwResult defined as follows:

class XwResult {
    public Double probClassUp;
    public Double probClassDown;
    public Double probClassFlat;
    public Double probClassUnknown;
    public String modeType;
    public Double probFirstClass;
    public Double probSecondClass;
    public Double probThirdClass;
}

Here, the first four elements are the probabilities of the four main classes Up, Down, Flat and Unknown. The sum of these first four values equals 1. modeType can be AbsoluteMode or RelativeMode. If probClassUp (probClassDown) is the largest one among the first four elements, when parameter mode in the prediction API is set to RelativeMode, the last three elements provide the probabilities that the increase (decrease) of the price will be classified as UR1 (DR1), UR2 (DR2) and UR3 (DR3), respectively. When parameter mode in the prediction API is set to AbsoluteMode, probThirdClass is meaningless and the other two provide the probabilities of classes UA1 (DA1) and UA2 (DA2).
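As an illustration of the interface, a caller might look like the following sketch. The predict signature and the XwResult fields are taken from the text above; the stock identifier, the date-range format and the printing logic are hypothetical:

// Hypothetical caller of the prediction API; the duration format is assumed.
XwResult r = predict("SZ300001", "2017-02-01..2017-02-10", null, "RelativeMode");

double maxProb = Math.max(Math.max(r.probClassUp, r.probClassDown),
                          Math.max(r.probClassFlat, r.probClassUnknown));
if (r.probClassUp == maxProb) {
    // In RelativeMode, the last three fields refine class Up into UR1/UR2/UR3.
    System.out.printf("Up (p=%.2f); UR1=%.2f, UR2=%.2f, UR3=%.2f%n",
            r.probClassUp, r.probFirstClass, r.probSecondClass, r.probThirdClass);
}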
main components of the system include a model training tool, a
2.5. Auxiliary functions

The system has some auxiliary functions to facilitate its usage. For example, we implemented a visualization tool to display the training samples. We can use this tool to check whether the shapes of the samples follow the desired patterns. We also implemented a set of applications that can be easily used for sample generation, model training, prediction and result validation.

2.6. A typical use case

We describe a typical use case to help understand the requirements of the system. Every day, the analyst in the company queries the prediction results by the end of the next five trade days of each stock via specifying the most recent 10 trade days. Then, by ranking all prediction results, s/he can quickly find those stocks which will rise with the largest probabilities and to the largest extent in the next five trade days. Analysts can focus on a small group of stocks with a higher probability of rising, which makes their choices more efficient and, to some extent, avoids selecting the stocks with a high probability of decline. Every three months, the analyst runs the model training tool again to update all the prediction models. The updated models will cover all transaction data of all stocks in the recent three months.

3. System architecture

Fig. 1. The system architecture of Xuanwu.

The system architecture of Xuanwu is illustrated in Fig. 1. The main components of the system include a model training tool, a fundamental data module, a prediction API and a visualization tool (XwExplorer).

3.1. Fundamental data module (FDM)

FDM provides the original transaction data and a technical index tool, which serve as a basic database for generating training samples and the instances to be predicted. The technical index tool is implemented with the open source library Ta-Lib (http://ta-lib.org/). Traditional technical analysis indices can serve as features for model training (Ni, Ni, & Gao, 2011).

3.2. Model training tool (MTT)

MTT is the most complicated component in Xuanwu. It first reads the raw transaction data from FDM. Then, the Training Sample Generator (TSG) divides the raw transaction data into pieces (we call them Clips) and analyzes their shapes to match the patterns defined in a pattern description document. The training samples are generated according to the recognized patterns, combined with some technical indices as features. Finally, machine learning algorithms are utilized to build the learning models. We directly implemented the learning algorithms using the well-known open source machine learning tool WEKA (Hall et al., 2009). The details of the training sample generation and learning model building will be further discussed in Section 4 and Section 5, respectively.

3.3. Prediction API and XwExplorer

The function signature of the prediction API has been discussed in Section 2.4. Another main function of the prediction API is to generate the instance to be predicted given the continuous sequence of dates through parameter duration. This function will be further discussed in Section 4.4 after the details of the training sample generation are introduced. XwExplorer is an easy-to-use visualization tool for users to build the learning models and check the results of the training sample generation and prediction.

4. Training sample generation

In this section, we present our training sample generation scheme in Xuanwu in detail.

4.1. Morphological patterns for classes

Our training sample generation scheme is based on the shape of the close prices of a stock in predefined fixed trade durations. Different from some studies that focus on the traditional technical shapes defined by stock analysts (Jeon, Hong, & Chang, 2017), we only focus on several simple shapes. Traditional prediction by technical shapes makes a prediction when the predefined shapes appear, which usually does not work well because of the weak correlations between the trend of the price movement and these technical shapes (Nassirtoussi et al., 2014; Patel et al., 2015). Our method is to predict the probability of forming a predefined shape in a fixed duration when we only see some of the data in the early trade days. We define the close prices in a fixed duration which form a specific shape as a pattern. In our system, there are three kinds of durations, as follows.

Definition 1. Pattern (Prediction) Duration (PD) is a time span within which we expect the trend of close prices to exhibit a specific pattern.

Definition 2. Model (Training) Duration (MD) is a time span within which all data points are used as a training sample.

Definition 3. Test (Input) Duration (TD) is a time span within which all data points form an input instance whose labels and their probabilities will be predicted.

Fig. 2. Definitions of three durations and the growth rate when preprocessing the raw transaction data for a stock.

Fig. 2 shows the relationship among the three durations. PD is always greater than both MD and TD, because it embraces a pattern (i.e., the shape of the line connecting all close prices) that is expected to be eventually observed. The set of all data points within PD is named a Clip. Obviously, we cannot use a whole Clip as a training sample, because a meaningful forecast makes the prediction as soon as possible. The earlier we make a correct prediction, the greater the benefit will be. In our system, we use all data points within MD as a training sample. The time span of MD is two-thirds that of PD, which not only results in good prediction accuracy but also leaves enough room for price movement (increase or decrease) and enough opportunities for decision making. Simply speaking, the shape of a Clip determines the class label of its corresponding training sample. Usually, an unlabeled instance to be predicted should have the same dimensions as the training sample. However, in a real-world environment, we hope that the system can exhibit some robustness. For example, if the system can make a prediction given 10 days of data, it should be able to work if 9 (or 11) days of data are given, even though it might not be as good. Thus, the length of our input duration (i.e., TD) is not necessarily the same as that of MD.

In our system, we have seven predefined PD and MD pairs, which reflect the actual requirements of the small investment companies. For each pair, we can train five different models: (1) M: a model for the four main classes, (2) UA: a model for classes UA1 and UA2, (3) DA: a model for classes DA1 and DA2, (4) UR: a model for classes UR1, UR2, and UR3, and (5) DR: a model for classes DR1, DR2, and DR3. (The meanings of these classes can be found in Section 2.3.) Table 1 lists all predefined durations and their corresponding models.

Table 1. Predefined durations in Xuanwu.

PD | MD | Models
10 |  6 | M106, UA106, DA106, UR106, DR106
15 | 10 | M1510, UA1510, DA1510, UR1510, DR1510
20 | 13 | M2013, UA2013, DA2013, UR2013, DR2013
30 | 20 | M3020, UA3020, DA3020, UR3020, DR3020
40 | 26 | M4026, UA4026, DA4026, UR4026, DR4026
50 | 33 | M5033, UA5033, DA5033, UR5033, DR5033
60 | 40 | M6040, UA6040, DA6040, UR6040, DR6040

The patterns predefined in the current version of Xuanwu are illustrated in Fig. 3. It is well known that two typical trends - Continuous Up and Sideways Up - usually arouse investors’ interest. These two kinds of trends form the first main class Up. Accordingly, the mirror images of these two trends, namely Continuous Down and Sideways Down, form the second main class Down.

Fig. 3. Patterns predefined in Xuanwu.

If the close prices of the stock do not change considerably in a fixed duration, the Clip belongs to the third class Flat. Those shapes that cannot be classified into the above three classes form the last class Unknown, which usually has large vibrations. The ten sub-classes UA1, UA2, etc., can be simply derived from classes Up and Down by calculating and ranking the growth and decline rates of absolute prices and relative returns.

4.2. Pattern recognition algorithms

To avoid labeling samples by humans, Xuanwu introduces unsupervised heuristic algorithms to recognize patterns. Fig. 4 illustrates the basic idea of our algorithms for the recognition of patterns Continuous Up and Sideways Up.

Fig. 4. Heuristic algorithms for recognition of patterns Continuous Up and Sideways Up.

The recognition of pattern Continuous Up is relatively simple. First, we calculate the growth rate (G) by comparing the close prices of the first and the last trade days. We allow the close prices of the first and last trade days to vary in the range [−δG, δG] (e.g., δ = 20%). Then, we can draw two lines AB and CD. Finally, we check all points between the first and last trade days. If a certain proportion η (e.g., η = 95%) of the points are in the area ABCD, the Clip can be classified as Continuous Up.

The recognition of pattern Sideways Up is more complicated. We outline the main steps in Algorithm 1. The algorithm has more parameters, but the principle is the same as for the recognition of pattern Continuous Up. When we draw an area that the points should lie in, we have a larger search space, because point S can move in the range [50%PD, 80%PD]. All parameters in Algorithm 1 can be set as follows: δ = 20%, α1 = 5%, α2 = 10% and η = 95%. Note that when recognizing Sideways Up we do not need line 4 in Fig. 4.

Algorithm 1 Recognition for pattern Sideways Up.
Input: A Clip, α1, α2, δ and η
Output: Whether the Clip is Sideways Up
1:  Flag = FALSE
2:  Calculate the growth rate (G)
3:  for each point S between 50% and 80% of PD do
4:      if the close price of S is in the range [α1·G, α2·G] then
5:          Draw an area ABCD and a line DE (points A, B, C, and D are determined by α1 and α2; point E is determined by δ)
6:          if a proportion η of the points between the first point and S are in area ABCD and a proportion η of the points between S and the last point are above line DE then
7:              Flag = TRUE
8:              return Flag
9:          end if
10:     end if
11: end for
12: return Flag

For patterns Flat, Continuous Down and Sideways Down, the algorithms are very similar.
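The following is a simplified sketch of the Continuous Up heuristic just described. It approximates the area ABCD by a band of half-width δG around the straight line joining the first and last close prices; this banding rule and all names are our assumptions, not the exact geometry of Fig. 4:

// Simplified sketch of the Continuous Up heuristic (our reading of Fig. 4):
// the area ABCD is approximated by a band around the straight line joining
// the first and last close prices.
public final class ContinuousUp {

    public static boolean isContinuousUp(double[] close, double delta, double eta) {
        int n = close.length;
        double first = close[0], last = close[n - 1];
        double g = (last - first) / first;          // growth rate G
        if (g <= 0) return false;                   // must grow to be "Up"

        int inside = 0;
        for (int i = 0; i < n; i++) {
            // Expected price on the straight trend line at day i
            double expected = first + (last - first) * i / (n - 1.0);
            // Allowed deviation: half-width proportional to delta * G
            if (Math.abs(close[i] - expected) <= delta * g * first) {
                inside++;
            }
        }
        return inside >= eta * n;                   // proportion eta inside the band
    }
}

With δ = 20% and η = 95% as in the text, a Clip whose close prices hug the straight growth line is accepted as Continuous Up.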

4.3. Training samples

Xuanwu is designed with the intention of covering as many shapes as a stock exhibits in its trade history. For each stock, we use a sliding window method to cut its historical transaction data into multiple Clips, as Fig. 5 shows. The window size equals a predefined prediction duration (PD). From the first trade day to the last trade day, the step of the window sliding is one day.

Fig. 5. Using sliding window method to cut transaction data of a stock into Clips.

As we can see in Fig. 5, Clip1 and Clip2 follow pattern Continuous Up; they will be labeled as class Up. As the window moves, Clip3 does not follow any pattern, so it will be labeled as class Unknown. Clip5 again follows Continuous Up, Clip18 follows Continuous Down and Clip30 follows Sideways Up. Thus, our method generates as many samples as we can, which covers all situations we may encounter in the future.

Note that Clips just represent the patterns that we are interested in; they are not training samples themselves. As mentioned in Section 4.1, the training sample should be the first MD points of a Clip whose duration is PD. Formally, for a Clip C = {F_1, F_2, ..., F_PD, y}, the corresponding training sample will be C' = {F_1, F_2, ..., F_MD, y}, where F_i (i ∈ [1, PD]) is the data of trade day i and y is the class label of this sample. F_i serves as a part of the features of the training sample. Because our pattern recognition algorithm is based on the close prices of a stock, F_i at least contains the one element close price. We can extend F_i by adding more information as follows:

F_i = \{ closeprice_i, openprice_i, highprice_i, lowprice_i, volume_i, t_i^{(1)}, t_i^{(2)}, \ldots, t_i^{(n)} \},   (2)

where t_i^{(k)} (k ∈ [1, n]) is the k-th technical index of trade day i. Finally, the features of the training sample can be extended to high dimensions, because they include information from multiple trade days.
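The sliding-window cutting and the truncation of a Clip to its first MD points can be sketched as follows, using close prices only; the Sample record and the classify hook are our own scaffolding, and the classifier simply reuses the isContinuousUp sketch above:

// Sketch of Section 4.3: cut a price series into Clips of length PD with a
// one-day stride; a training sample keeps only the first MD points plus the
// label assigned to the whole Clip by the pattern recognizer.
import java.util.ArrayList;
import java.util.List;

public final class ClipCutter {

    record Sample(double[] features, String label) {}

    public static List<Sample> cut(double[] close, int pd, int md) {
        List<Sample> samples = new ArrayList<>();
        for (int start = 0; start + pd <= close.length; start++) {  // stride = 1 day
            double[] clip = new double[pd];
            System.arraycopy(close, start, clip, 0, pd);
            String label = classify(clip);          // Up / Down / Flat / Unknown
            double[] firstMd = new double[md];      // training sample = first MD points
            System.arraycopy(clip, 0, firstMd, 0, md);
            samples.add(new Sample(firstMd, label));
        }
        return samples;
    }

    private static String classify(double[] clip) {
        // Placeholder for the heuristic pattern recognition of Section 4.2.
        return ContinuousUp.isContinuousUp(clip, 0.20, 0.95) ? "Up" : "Unknown";
    }
}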
4.4. Test instance

As defined in Section 4.1, the test duration (TD) does not necessarily equal the model duration (MD), because the prediction API in Section 2.4 tells us that users can provide a duration with any number of continuous dates as the input. The system will find the most suitable model for prediction if parameter modelName is not specified. For example, if the duration includes 9 days, the system will select model M1510 in Table 1 for prediction. However, this model requires the features of a test instance to have exactly 10 days’ information. To address this issue, we introduce an interpolation method with linear scaling, which has been widely used in image processing (Lehmann, Gonner, & Spitzer, 1999), to extend or shrink the test duration.

5. Model training

The current version of Xuanwu partially utilizes the off-the-shelf classification algorithms implemented in WEKA (Hall et al., 2009) to build learning models, among which we find that the models trained by random forests (Breiman, 2001) perform well. To further improve the performance of the learned classifiers, we focus on two issues in model training: imbalanced class distribution and feature selection.

5.1. Learning with random forests

As described in Section 2.4, the prediction interface outputs the probability of a Clip being a certain pattern (class). The prediction model is built by random forests (Breiman, 2001), which operate by constructing a multitude of decision trees at training time. A decision tree splits on the features through its internal nodes to establish the classification model. When creating partitions on a feature, the goodness of a partition is measured by purity. If a partition is pure, then for each sub-branch of the node, its instances belong to the same class. For node m, let N_m be the number of training samples that arrive at it. For the root node, we have N_m = N. Suppose N_m^k out of N_m instances belong to class k, so that \sum_k N_m^k = N_m. If a test instance (Clip) x arrives at node m, the probability that it belongs to class k is estimated as:

\hat{P}(k|x, m) \equiv p_m^k = N_m^k / N_m.   (3)

For a node m presenting feature f, the best partition maximizes the purity measured by information gain. First, we define the entropy of node m as:

\mathrm{Entropy}(m) = - \sum_{k=1}^{K} p_m^k \log_2 p_m^k,   (4)

where K is the total number of classes. Then, for the v values of feature f, the instances on node m are split into v partitions {m_1, ..., m_v}, where m_j (1 ≤ j ≤ v) contains those instances in m that take the j-th value of f. We define the information gain on node m as follows:

\mathrm{Gain}(m) = \mathrm{Entropy}(m) - \sum_{j=1}^{v} \frac{|m_j|}{|m|} \times \mathrm{Entropy}(m_j).   (5)

To overcome the over-fitting problem of a single classification decision tree, we utilize an ensemble of multiple decision trees (i.e., random forests) to make predictions. Suppose we construct a random forest with B decision trees. Each time, we perform a uniform random sampling with replacement on the training set to get a sub training set with N samples. Then, we use it to build a decision tree h_b whose output vector p_b = [p_b^1, ..., p_b^K] provides the probabilities of an instance being each class. Finally, the probability of a test instance x being class k is calculated as:

\hat{P}(k|x, H) = \frac{1}{B} \sum_{b=1}^{B} p_b^k.   (6)

Finally, the hard class label of instance x can be obtained through plurality voting:

L(x) = \mathrm{argmax}_{k \in 1,\ldots,K} \hat{P}(k|x, H),   (7)

where the function L() returns the hard class label.
Intuitively, the numbers of Clips that follow the patterns Con-
As described in Section 2.4, the prediction interface outputs the tinuous Up, Sideways Up, Continuous Down, Sideways Up, Flat are
probability of a Clip being a certain pattern (class). The prediction far less than the number of Clips that belongs to Unknown. Thus,

Usually, the differences among the numbers of samples belonging to classes Up, Down and Flat are not too large. To form a better training set, we balance the numbers of samples of all classes using an undersampling technique.

In fact, our imbalanced-class treatment is embedded in the construction of the random forests. Algorithm 2 shows the skeleton of the model training for the four main classes. We first count the numbers of Clips (n1, n2, n3, n4) belonging to the four main classes Up, Down, Flat, and Unknown, respectively. The number of instances n for each class will be the minimum of them. Then, we sample n instances for each class (except the class that already has n instances) based on the descending order of D_i in Eq. (8). The new training set contains the 4n selected instances. Finally, we build a random forest H which is an ensemble of B decision trees. Note that before building the decision trees, we may select a subset of features to further optimize the performance of the models. The feature selection procedure will be discussed in the next sub-section. The processes demonstrated in Algorithm 2 can also be applied to build the sub-class models.

Algorithm 2 Skeleton of model training.
Input: Training set E
Output: Random forest H
1:  Count the numbers of instances in the four main classes, i.e., n1, n2, n3 and n4
2:  n := min{n1, n2, n3, n4}
3:  for each class k do
4:      Sample n instances based on the descending order of D_i in Eq. (8)
5:  end for
6:  The selected instances form a new training set E' of size N = 4n
7:  for b := 1 to B do
8:      Randomly sample (with replacement) N instances from training set E'
9:      Conduct feature selection [optional]
10:     Build a decision tree h_b
11: end for
12: Random forest H is the ensemble of {h_1, h_2, ..., h_B} defined by Eq. (6)
13: return H

As for the undersampling in step 4, it is based on a measure of the dissimilarity of instances, which is different from traditional random sampling. Suppose that we will sample n instances from a pool containing in total n_4 instances belonging to class Unknown; for each instance x_i in the pool, we calculate the measure of dissimilarity as follows:

D_i = \frac{1}{n_4} \sum_{j=1}^{n_4} \mathrm{dist}(x_i, x_j),   (8)

where the function dist() returns the Euclidean distance between two instances x_i and x_j. Then, we select the first n instances after all instances are sorted in descending order of D_i. Our undersampling method guarantees that the selected n instances have the largest diversity, which increases the quality of the models learned from them.
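A sketch of this diversity-based undersampling follows, reading D_i in Eq. (8) as the mean Euclidean distance from x_i to the rest of the pool (our reconstruction); all names are ours:

// Sketch of Eq. (8): score each instance by its mean Euclidean distance to
// the pool, then keep the n highest-scoring (most dissimilar) instances.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class DiversitySampler {

    public static List<double[]> sample(List<double[]> pool, int n) {
        List<double[]> sorted = new ArrayList<>(pool);
        sorted.sort(Comparator.comparingDouble((double[] x) -> meanDist(x, pool))
                              .reversed());                 // descending D_i
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    private static double meanDist(double[] x, List<double[]> pool) {
        double sum = 0.0;
        for (double[] y : pool) {
            double d = 0.0;
            for (int k = 0; k < x.length; k++) d += (x[k] - y[k]) * (x[k] - y[k]);
            sum += Math.sqrt(d);                            // Euclidean distance
        }
        return sum / pool.size();                           // D_i
    }
}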
5.3. Feature selection

As mentioned in Section 4.3, the features of a training sample can be extended by adding multiple technical indices for each trade day, which results in very high dimensional features. Usually, for each trade day in a training sample, we use only four pieces of price information - close price, open price, high price and low price - to train the learning models. If the MD of a sample is 6, its feature dimension will be 24. Since we do not know whether other information (e.g., volume and technical indices) has a positive impact on the performance of the learned classifiers, feature selection techniques can be used for tuning their performance, as has been widely done in stock prediction systems (Huang, Chang, Cheng, & Chang, 2012; Ni et al., 2011; Tsai & Hsiao, 2010).

Although there are a lot of feature selection methods that can be used, finding an optimal combination of features is still an NP-hard problem (Dash & Liu, 1997). In our system, we use the Forward Sequential Search method, which selects one among all the candidates to add to the current state. It works in an iterative manner, and once a candidate is selected it is not possible to go back. It does not guarantee an optimal result but has a fast search speed. If the length of the total sequence is n, the number of search steps must be limited by O(n), and the complexity is determined taking into account the number t of evaluated sub-sets, which gives O(n^{t+1}). We need to modify this method a bit, because our feature selection works on each trade day in a sample; that is, once a candidate is selected, MD features will be added. For the features F_i of trade day i in a training sample, the search starts with F_i' = {closeprice_i, openprice_i, highprice_i, lowprice_i}, and the forward step consists of:

F_i' := F_i' \cup \{ f_i^{(k)} \in F_i \setminus F_i' \mid J(F_i' \cup \{f_i^{(k)}\}) \text{ is the biggest} \},   (9)

where F_i is the complete set of features of trade day i and J is an evaluation measure. Xuanwu uses the empirical risk of the learned model as the evaluation measure J, which is defined as:

J \equiv R_{emp}(H) = \sum_{i=1}^{N} I(H(\tilde{x}_i), y_i),   (10)

where I() is an indicator function. The stopping criterion can be: |F_i'| = n (if n has been fixed in advance), the value of J has not increased in the last j steps, or it surpasses a prefixed value J_0.
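A minimal sketch of this per-day forward step is given below; the evaluation measure J is abstracted behind a functional interface, and all names are our assumptions:

// Sketch of Eq. (9): greedy forward selection over candidate per-day features,
// adding, at each step, the candidate that most improves the measure J.
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.ToDoubleFunction;

public final class ForwardSelect {

    public static Set<String> select(Set<String> base, List<String> candidates,
                                     ToDoubleFunction<Set<String>> j, int maxSize) {
        Set<String> current = new HashSet<>(base);   // start: the four OHLC prices
        double bestJ = j.applyAsDouble(current);
        while (current.size() < maxSize) {
            String bestCand = null;
            for (String c : candidates) {
                if (current.contains(c)) continue;
                Set<String> trial = new HashSet<>(current);
                trial.add(c);
                double v = j.applyAsDouble(trial);   // evaluate J(F' ∪ f)
                if (v > bestJ) { bestJ = v; bestCand = c; }
            }
            if (bestCand == null) break;             // no candidate improves J
            current.add(bestCand);                   // greedy, no backtracking
        }
        return current;
    }
}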

6. Evaluation

We evaluate our proposed system Xuanwu on 495 stocks in the Shenzhen Growth Enterprise Market in China. The time span of the transaction data of these stocks is within the range from January 25, 2010 to October 1, 2016.

6.1. Experimental setup

We generated the data sets used for model training from the raw transaction data of all stocks, using the methods described in Section 4 and Section 5.2. Since our prediction scheme is based on several fixed prediction durations, Table 1 shows that we need to build 70 learning models when feature selection is taken into consideration. Thus, we created 70 data sets for our evaluation. For each data set, we randomly held out 30% of its instances with respect to each class for testing, and the remaining 70% of the instances were used to train the models. The models were trained using the random forests implemented in WEKA (Hall et al., 2009) with the default parameter settings. Because we tuned the class distributions to a relatively balanced status, we use simple accuracy as our performance measure. Furthermore, since we randomly conducted 30/70 splits of the data sets, we repeated the experiments ten times, and the average values of accuracy and their standard deviations are reported.

6.2. Results on four main classes

We first evaluated the performance on the four main classes (i.e., Up, Down, Flat and Unknown) with respect to the seven PD-MD pairs (i.e., "10-6", "15-10", "20-13", "30-20", "40-26", "50-33", and "60-40"). The experimental results are listed in Table 2. The baseline performance of the learned models is marked as "B" in Table 2, where the features of each trade day only include four values (i.e., close price, open price, high price and low price). Comparatively, the performance optimized with feature selection is marked as "F". When we conducted the feature selection, we searched the candidates consisting of "volume" and 74 other technical indices calculated by Ta-Lib (http://ta-lib.org).

Table 2. Classification accuracy on four main classes (percent).

B/F | 10-6       | 15-10      | 20-13      | 30-20      | 40-26      | 50-33      | 60-40      | avg.
B   | 62.1 ± 3.2 | 65.7 ± 2.7 | 66.4 ± 3.1 | 70.8 ± 1.9 | 70.2 ± 2.1 | 69.8 ± 3.8 | 67.8 ± 2.5 | 67.5 ± 2.7
F   | 65.2 ± 2.9 | 67.2 ± 3.1 | 69.9 ± 2.8 | 74.2 ± 2.6 | 75.1 ± 3.7 | 72.2 ± 2.8 | 70.3 ± 2.2 | 70.6 ± 2.9

From Table 2, we can draw the following conclusions. (1) When the features of each trade day in each training instance only include four values, the average accuracy of all seven models is 67.5%. Among these models, we find the performance of "30-20" and "40-26" is significantly better than that of the others. That is, our system is more suitable for predicting the trend of a stock price over a relatively long term, with the trade duration in the range of 30 to 40 days. (2) When more information, such as "volume" and some technical indices, is added into the features, the performance of all models improves. The average increment is greater than 3% (in absolute value). (3) Although we randomly choose 70% of the samples for model training, the standard deviations of all models are sound. The maximum absolute value of the standard deviations is only 3.7%. Small standard deviations suggest that our unsupervised algorithms for stock movement shape identification are accurate, which increases the robustness of the learned models.

6.3. Results on ten sub-classes

Our system adopts a two-stage prediction scheme to obtain more refined results. For example, when a test instance is predicted as class Up, it will be re-predicted using the model UA to further determine the level of the increment of the price, i.e., whether the increment will be in the range [10%, 30%] (class UA1) or greater than 30% (class UA2). In this experiment, we still follow this two-stage prediction scheme: a test instance is first predicted by the main model that includes the four main classes and then re-predicted by a specific sub-model according to the result of the main model. That is, if the test instance x belongs to class UA1, our experiment evaluates the joint probability Pr((x = UA1) ∩ (x = Up)). Since our models are trained for different predefined fixed prediction durations, there are in total 56 learned models if feature selection is taken into account. All experimental results are listed in Table 3.
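In other words, the reported sub-class accuracy compounds the two stages. Restating the scheme with the chain rule:

\Pr\big((x = \mathrm{UA1}) \cap (x = \mathrm{Up})\big) = \Pr(x = \mathrm{Up}) \cdot \Pr(x = \mathrm{UA1} \mid x = \mathrm{Up}) \le \Pr(x = \mathrm{Up}),

so each sub-class accuracy in Table 3 is bounded above by the corresponding main-class accuracy in Table 2.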
Table 3. Classification accuracy on ten sub-classes (percent).

PD-MD | B/F | UA         | DA         | UR         | DR
10-6  | B   | 56.9 ± 3.1 | 57.2 ± 2.8 | 52.5 ± 3.7 | 51.8 ± 2.9
      | F   | 59.2 ± 2.4 | 61.1 ± 3.3 | 58.5 ± 4.1 | 57.2 ± 3.1
15-10 | B   | 60.4 ± 2.7 | 62.3 ± 2.2 | 55.6 ± 3.1 | 56.2 ± 3.5
      | F   | 62.7 ± 3.1 | 65.7 ± 2.7 | 60.7 ± 3.3 | 59.7 ± 2.5
20-13 | B   | 62.8 ± 2.4 | 63.5 ± 2.6 | 59.8 ± 2.1 | 57.5 ± 3.3
      | F   | 65.3 ± 2.6 | 68.2 ± 2.1 | 63.9 ± 3.1 | 61.2 ± 4.1
30-20 | B   | 67.3 ± 1.9 | 66.8 ± 2.1 | 64.3 ± 2.4 | 65.3 ± 2.7
      | F   | 71.2 ± 2.2 | 72.6 ± 2.5 | 69.1 ± 2.4 | 68.5 ± 2.9
40-26 | B   | 66.9 ± 2.4 | 65.4 ± 1.9 | 65.4 ± 2.3 | 65.8 ± 2.5
      | F   | 70.1 ± 2.1 | 69.1 ± 3.1 | 68.3 ± 2.8 | 69.6 ± 3.2
50-33 | B   | 65.4 ± 3.3 | 66.3 ± 2.5 | 64.7 ± 2.7 | 63.6 ± 2.2
      | F   | 68.4 ± 2.9 | 70.0 ± 2.7 | 66.1 ± 2.2 | 66.6 ± 2.8
60-40 | B   | 63.0 ± 2.2 | 64.5 ± 3.1 | 64.8 ± 2.8 | 64.3 ± 2.4
      | F   | 66.7 ± 2.5 | 68.1 ± 2.8 | 67.3 ± 2.5 | 68.2 ± 2.7
avg.  | B   | 63.2 ± 2.6 | 63.7 ± 2.5 | 61.0 ± 2.7 | 60.6 ± 2.8
      | F   | 66.2 ± 2.5 | 67.8 ± 2.7 | 64.8 ± 2.9 | 64.4 ± 3.0

From Table 3, we can draw the following conclusions. (1) Compared with Table 2, the performance of all models in Table 3 decreases. This is because the joint probability cannot be greater than the marginal probability. For example, we always have Pr((x = UA1) ∩ (x = Up)) ≤ Pr(x = Up). However, the average performance of these models is greater than 60%, which means that they are still usable in real applications. (2) The performance of the models for absolute price movements (i.e., UA and DA) outperforms that of the models for relative return changes (i.e., UR and DR). The relative return is more related to the market status (e.g., bull or bear). We found that using the relative return rather than the close price makes it more difficult to identify typical shapes, which results in lower accuracies. The models built with the relative return are especially useful when the market is in a typical bull or bear phase, where the shapes of most stocks are Continuous Up (Continuous Down). (3) For the models UA and DA, the performance of "30-20" and "40-26" is greater than that of the others, which is consistent with the results of the main model. For the models UR and DR, besides the "30-20" and "40-26" models, the "50-33" and "60-40" models also perform well. That is, if we consider the relative return, our system is suitable for long-term prediction with the trade duration in the range of 30 to 60 days. (4) Consistent with the results of the main model, when more information such as "volume" and some technical indices is added into the features, the performance of all models improves; the average increment is greater than 4% (in absolute value). (5) According to the standard deviations, the robustness is maintained for these more subtle models. Using the relative return only slightly increases the absolute values of the standard deviations.

6.4. Results under the market volatility

In this experiment, we investigate the performance (in terms of accuracy) of our system under market volatility. We extracted the data between August 2014 and May 2015 to generate the test set T_bull, in which the market is a bull market with the composite index increasing from 1331 to 3542; the data between June 2015 and January 2016 to generate the test set T_bear, in which the market is a bear market with the composite index decreasing from 3718 to 1994; and the data between February 2016 and September 2016 to generate the test set T_shocking, in which the composite index fluctuates between 1192 and 2149 with the maximum value 2324 and the minimum value 1880.

Table 4 shows the classification accuracies for ten sub-classes of fourteen predefined models under three typical market patterns (bull, bear and shocking), where "A" represents that the models (A-Models) are trained using the samples whose classes are identified through their absolute close prices and "R" represents that the models (R-Models) are trained using the samples whose classes are identified through their relative returns. From this table, we can draw the following conclusions. (1) For the bull and bear market, the accuracies of the A-Models are significantly improved compared with the values in Table 3. The reason is that under a bull or bear market, stocks usually have typical ascending or declining forms that are easier to classify correctly.

(2) For the shocking market, the accuracies of the A-Models are similar to their values in Table 3 but obviously worse than their values under a bull or bear market, because the shapes of stocks under a shocking market are more complicated to classify correctly. (3) For all market patterns, the accuracies of the R-Models are similar to their values in Table 3. The models trained using relative returns are more consistent across different market patterns. Overall, because our training data cover sufficient historical information, the generated models are robust to the market volatility.

Table 4. Classification accuracy under three typical market patterns (percent).

PD-MD | A/R | T_bull     | T_bear     | T_shocking
10-6  | A   | 76.8 ± 2.1 | 74.9 ± 2.2 | 58.3 ± 3.1
      | R   | 54.1 ± 2.4 | 53.3 ± 2.8 | 53.7 ± 2.4
15-10 | A   | 78.6 ± 2.2 | 79.9 ± 3.1 | 63.6 ± 3.2
      | R   | 64.7 ± 2.8 | 65.2 ± 2.7 | 59.8 ± 2.3
20-13 | A   | 80.1 ± 1.9 | 78.2 ± 2.6 | 66.7 ± 3.1
      | R   | 63.7 ± 2.4 | 64.8 ± 2.7 | 63.3 ± 4.1
30-20 | A   | 83.1 ± 2.0 | 80.3 ± 1.9 | 72.1 ± 2.7
      | R   | 68.4 ± 1.9 | 67.3 ± 2.2 | 65.8 ± 3.1
40-26 | A   | 82.9 ± 2.1 | 83.5 ± 1.6 | 71.7 ± 2.7
      | R   | 67.7 ± 2.2 | 68.3 ± 1.9 | 66.7 ± 3.1
50-33 | A   | 79.4 ± 1.8 | 82.2 ± 2.1 | 70.2 ± 2.3
      | R   | 67.1 ± 2.6 | 66.3 ± 2.3 | 65.9 ± 3.2
60-40 | A   | 77.8 ± 1.9 | 78.2 ± 2.7 | 68.2 ± 2.6
      | R   | 68.1 ± 3.1 | 66.9 ± 2.3 | 66.6 ± 2.8
avg.  | A   | 79.8 ± 2.0 | 79.6 ± 2.3 | 67.3 ± 2.8
      | R   | 64.8 ± 2.5 | 64.6 ± 2.4 | 63.1 ± 3.0

6.5. Comparisons with existing work in accuracy

Although direct comparisons with existing work are not easy because of differing data preprocessing, model training methods and learning goals, we tried to select some recent work on the prediction of stock price movement and make relatively fair comparisons. Shynkevich, McGinnity, Coleman, Belatreche, and Li (2017) investigated the impact of varying the input window length on the prediction accuracy. In their work, each training example consists of a sequence of technical indicators calculated from transaction data on trade days. They investigated the prediction performance under different learning algorithms (i.e., support vector machines, neural networks and k-nearest neighbors) by setting the number of trade days (i.e., the length of the sequence) to 1, 3, 5, 7, 10, 15, 20, 25 and 30. Table 5 lists the comparison results in accuracy between the existing work (Shynkevich et al., 2017) and ours.

Table 5. Comparisons with Shynkevich et al. (2017) in accuracy (percent).

PD-MD         | SVM        | ANN         | k-NN       | Ours
10-6 (10-7)   | 58.9 ± 3.6 | 55.2 ± 9.3  | 43.9 ± 6.6 | 65.2 ± 2.9
15-10 (15-10) | 60.1 ± 4.6 | 55.9 ± 10.5 | 41.4 ± 5.4 | 67.2 ± 3.1
20-13 (20-15) | 61.5 ± 3.6 | 57.0 ± 8.1  | 42.1 ± 6.4 | 69.9 ± 2.8
30-20 (30-20) | 59.6 ± 3.7 | 52.6 ± 10.0 | 40.8 ± 5.1 | 74.2 ± 2.6
40-26 (30-25) | 60.4 ± 5.1 | 54.4 ± 9.5  | 41.4 ± 5.2 | 75.1 ± 3.7
50-33 (30-30) | 60.9 ± 5.3 | 55.4 ± 8.8  | 41.8 ± 5.4 | 72.2 ± 2.8

The first column of Table 5 represents the output-input models of both methods. The PD-MD pairs in parentheses are the closest values to their counterparts of our method. The columns SVM, ANN and k-NN show the performance of their methods in terms of accuracy, and the last column shows the performance of our method. Obviously, our method outperforms theirs in both mean values and standard deviations under all PD-MD models. Furthermore, compared with their work, our method wins on two points: (1) our method provides more subtle classes, compared with theirs, which only has the three classes Up, Down and NotMove; and (2) the performance of our method increases as the prediction duration increases. In their original study, they found that some output-input (PD-MD) models, such as "7-5", "10-7", "15-10", "20-15", and "30-20", can achieve higher performance than the other pairs. Their results show some consistency with our selection of PD-MD pairs, namely that the performance could be better if the length of the model duration is around two-thirds that of the prediction duration.

6.6. Comparisons with existing work in return per trade

Return per trade is a commonly used metric when evaluating the performance of a trading system. An actual trading system may have complicated trading rules which may be generated dynamically according to the movements of stocks (Arévalo, García, Guijarro, & Peris, 2017; Cervelló-Royo, Guijarro, & Michniuk, 2015; Wang & Chan, 2007). Although trading rules are out of the scope of this study, we still introduce very simple rules to evaluate the returns using our prediction system. When the prediction for a stock is Up, we buy the stock at the moment of the prediction and then sell it on the last day of the prediction duration (PD). The return of this trade can be calculated as:

R_{p,m} = (C_p - C_m) / C_m,   (11)

where C_p is the close price on the last day of the prediction duration (PD), C_m is the close price on the day that the prediction is made, and R_{p,m} is the return from the trade. When the prediction for a stock is Down, we sell the stock at the moment of the prediction and then buy it back on the last day of the prediction duration. The return from this trade can be calculated as:

R_{m,p} = (C_m - C_p) / C_m.   (12)

We must point out that we have made a very large simplification with respect to real-world systems. The return per trade, as defined above, neglects transaction costs. Additionally, actual trades will hardly be negotiated at the same level as closing prices.
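The two trade rules of Eqs. (11) and (12) can be sketched as follows (a hypothetical helper of ours; it ignores transaction costs, as stated above):

// Sketch of Eqs. (11)-(12): return per trade under the simple rules above.
// cm = close on the prediction day, cp = close on the last day of the PD.
public final class ReturnPerTrade {

    public static double returnOfTrade(String prediction, double cm, double cp) {
        switch (prediction) {
            case "Up":   return (cp - cm) / cm;   // buy now, sell at end of PD
            case "Down": return (cm - cp) / cm;   // sell now, buy back at end of PD
            default:     return 0.0;              // no trade for Flat/Unknown
        }
    }
}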
To compare with the existing study (Shynkevich et al., 2017), we randomly chose 50 stocks, and the returns were calculated for the trades made during the testing phase. The return of a single stock is averaged over the total number of trades made for this stock. Finally, the return per trade is averaged over the 50 stocks for each PD-MD pair. Table 6 lists the comparison results in return per trade. The return per trade increases as the investment period increases. Because the longest investment period in the compared study is 30 days, we only list the comparison results under four PD-MD pairs. Consistent with the previous comparison results in Section 6.5, the values of return per trade obtained by our method are obviously greater than those of the existing method under all PD-MD pairs, since the prediction accuracies of our method are always better. The comparison results reveal that even under such simple trade rules, our system can bring extra returns.

Table 6. Comparisons with Shynkevich et al. (2017) in return per trade (percent).

PD-MD         | SVM         | ANN         | k-NN        | Ours
10-6 (10-7)   | 4.23 ± 0.68 | 3.43 ± 1.45 | 1.78 ± 1.14 | 4.58 ± 1.45
15-10 (15-10) | 5.34 ± 0.99 | 4.52 ± 2.01 | 2.12 ± 1.52 | 6.02 ± 1.73
20-13 (20-15) | 6.41 ± 1.18 | 5.43 ± 2.28 | 2.36 ± 1.89 | 7.25 ± 2.28
30-20 (30-20) | 7.72 ± 1.68 | 6.02 ± 3.35 | 2.94 ± 2.46 | 9.65 ± 2.64

7. Conclusion and future work

For small startup investment companies, due to limited funds, it is impossible to trade in the stock market frequently.

7. Conclusion and future work

For small startup investment companies, due to limited funds, it is impossible to trade in the stock market frequently. Instead, they are interested in moderate investment periods that last from a week to three months. To address the prediction of the stock price trend over such periods, this paper proposes a novel data-driven system, Xuanwu. The system runs through the entire machine learning process, from generating training samples out of the original transaction data to building the prediction models, without any human intervention. It first uses a sliding window method to cut the historical transaction data of each stock into multiple Clips whose length equals a predefined prediction duration. Then, according to the shapes of the close prices of these Clips, it utilizes an unsupervised heuristic algorithm to classify them into four main classes: Up, Down, Flat, and Unknown. The Clips belonging to classes Up and Down are further classified into different levels that reflect the extents of their growth and decline rates with respect to both absolute close price and relative return rate. The training sets are derived from these Clips by sampling the different classes to counter the imbalanced class distribution. Finally, learning models are trained from these training sets with or without feature selection.
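As a rough illustration of the Clip generation and labeling steps just summarized, a minimal sketch follows. The window step, flat threshold and shape heuristic are placeholder assumptions, not the exact rules of Xuanwu.

```python
# Illustrative sketch only: the step size, flat threshold and the
# 'Unknown' heuristic below are assumptions, not Xuanwu's actual rules.
def make_clips(close, pd_len):
    """Cut a close-price series into overlapping Clips of length pd_len."""
    return [close[i:i + pd_len] for i in range(len(close) - pd_len + 1)]

def label_clip(clip, flat_thresh=0.02):
    """Assign one of the four main classes from the Clip's overall shape."""
    rate = (clip[-1] - clip[0]) / clip[0]   # relative return rate
    if abs(rate) <= flat_thresh:
        return 'Flat'
    # If the swing inside the Clip dwarfs the net move, no clear trend.
    if max(clip) - min(clip) > 3 * abs(clip[-1] - clip[0]):
        return 'Unknown'
    return 'Up' if rate > 0 else 'Down'

clips = make_clips([10.0, 10.4, 10.9, 11.3, 11.0, 11.8], pd_len=5)
labels = [label_clip(c) for c in clips]   # e.g. ['Up', 'Up']
```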
The real-world evaluations on seven-year Shenzhen Growth Enterprise Market (China) transaction data show the advantages of the proposed system as follows. First, the unsupervised training sample generation is effective and efficient, accelerating model reproduction. Second, our learning models outperform some existing methods in terms of accuracy and return per trade, because our learning method integrates random forests, imbalance learning and feature selection in a uniform process. Finally, our prediction models are robust to market volatility.
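A minimal sketch of how random forests, imbalance-aware sampling and feature selection can be combined in one training process is given below. It uses scikit-learn, random undersampling and univariate feature scoring purely as stand-ins; none of these is necessarily the toolkit or scheme behind Xuanwu, and X, y and k_features are assumed inputs.

```python
# Sketch under stated assumptions: scikit-learn, undersampling to the
# rarest class, and univariate feature scoring stand in for the system's
# actual imbalance learning and feature selection components.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

def train_model(X, y, k_features=10, seed=0):
    rng = np.random.default_rng(seed)
    # Imbalance learning: undersample every class to the rarest class size.
    n_min = min(np.sum(y == c) for c in np.unique(y))
    idx = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_min, replace=False)
        for c in np.unique(y)
    ])
    X_bal, y_bal = X[idx], y[idx]
    # Feature selection: keep the k most discriminative features.
    selector = SelectKBest(f_classif, k=min(k_features, X.shape[1]))
    X_sel = selector.fit_transform(X_bal, y_bal)
    # Random forest trained on the balanced, reduced feature set.
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_sel, y_bal)
    return model, selector
```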
There is still large room for performance improvement. First, we will study the prediction performance when different learning algorithms are applied to the training sets. Second, more sophisticated feature selection methods will be examined to select better combinations of features. Finally, our unsupervised heuristic algorithms for pattern recognition can be further improved to recognize more different shapes.

Acknowledgements
This research has been supported by the National Natural Science Foundation of China under Grant No. 61603186, the Natural Science Foundation of Jiangsu Province, China, under Grant No. BK20160843, the China Postdoctoral Science Foundation under Grant Nos. 2016M590457 and 2017T100370, and the Science Foundation of the Science and Technology Commission of the Central Military Commission (Youth Project), China.
References price movements using technical indicators: Investigating the impact of varying
input window length. Neurocomputing.
Abarbanell, J. S., & Bernard, V. L. (1992). Tests of analysts’ overreac- Si, J., Mukherjee, A., Liu, B., Li, Q., Li, H., & Deng, X. (2013). Exploiting topic based
tion/underreaction to earnings information as an explanation for anomalous twitter sentiment for stock prediction. ACL, 2, 24–29.
stock price behavior. The Journal of Finance, 47(3), 1181–1207. Timmermann, A., & Granger, C. W. (2004). Efficient market hypothesis and forecast-
Adam, K., Marcet, A., & Nicolini, J. P. (2016). Stock market volatility and learning. ing. International Journal of Forecasting, 20(1), 15–27.
The Journal of Finance, 71(1), 33–82. Tsai, C., & Wang, S. (2009). Stock price forecasting by hybrid machine learning tech-
Adebiyi, A. A., Adewumi, A. O., & Ayo, C. K. (2014). Comparison of ARIMA and artifi- niques. In Proceedings of the international multiconference of engineers and com-
cial neural networks models for stock price prediction. Journal of Applied Math- puter scientists (pp. 755–760).
ematics, 2014. Tsai, C.-F., & Hsiao, Y.-C. (2010). Combining multiple feature selection methods for
Arévalo, R., García, J., Guijarro, F., & Peris, A. (2017). A dynamic trading rule based stock prediction: Union, intersection, and multi-intersection approaches. Deci-
on filtered flag pattern recognition for stock market price forecasting. Expert sion Support Systems, 50(1), 258–269.
Systems with Applications, 81, 177–192. Tsai, C.-F., Lin, Y.-C., Yen, D. C., & Chen, Y.-M. (2011). Predicting stock returns by
Ballings, M., Van den Poel, D., Hespeels, N., & Gryp, R. (2015). Evaluating multiple classifier ensembles. Applied Soft Computing, 11(2), 2452–2459.
classifiers for stock price direction prediction. Expert Systems with Applications, Tsibouris, G., & Zeidenberg, M. (1995). Testing the efficient markets hypothesis
42(20), 7046–7056. with gradient descent algorithms. In Neural networks in the capital markets: 8
Barak, S., Arjmand, A., & Ortobelli, S. (2017). Fusion of multiple diverse predictors in (pp. 127–136). Wiley: Chichester.
stock market. Information Fusion, 36, 90–102. Wang, J.-L., & Chan, S.-H. (2007). Stock market trading rule discovery using pat-
Blume, L., Easley, D., & O’hara, M. (1994). Market statistics and technical analysis: tern recognition and technical analysis. Expert Systems with Applications, 33(2),
The role of volume. The Journal of Finance, 49(1), 153–181. 304–315.
Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal Wu, X., Zhu, X., Wu, G.-Q., & Ding, W. (2014). Data mining with big data. IEEE Trans-
of Computational Science, 2(1), 1–8. actions on Knowledge and Data Engineering, 26(1), 97–107.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. Zhang, G., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural net-
works: The state of the art. International Journal of Forecasting, 14(1), 35–62.